Last updated: 2017-11-08
Code version: f869674
I will use this analysis to look at initial data QC and points of interest.
First I looked at the number of reads that map to the genome before and after the UMI deduplication step.
samtools view -c -F 4
The -F 4 filter excludes reads with the unmapped flag set, so -c counts only mapped reads.
For flag info: https://broadinstitute.github.io/picard/explain-flags.html
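To make the flag filter concrete, here is a minimal R sketch of the test that -F 4 applies to each read: a read is skipped when bit 0x4 (read unmapped) of its SAM FLAG is set. The flag values are made-up examples, not reads from these libraries, and is_mapped is a hypothetical helper name.

```r
# Bit 0x4 of the SAM FLAG field means "read unmapped".
UNMAPPED_BIT <- 4L

# Hypothetical helper: TRUE when the unmapped bit is not set.
is_mapped <- function(flag) {
  bitwAnd(as.integer(flag), UNMAPPED_BIT) == 0L
}

# Illustrative flag values (not taken from the real BAM files).
example_flags <- c(0L, 16L, 4L, 77L, 99L)

mapped <- is_mapped(example_flags)
n_mapped <- sum(mapped)  # analogous to the count from samtools view -c -F 4

print(data.frame(flag = example_flags, mapped = mapped))
print(n_mapped)
```

In this toy set, flags 4 and 77 have the unmapped bit set, so three of the five reads are counted.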
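Mayer data:
fastq: 137281933
sorted: 120124203
dedup: 2262387
dedup/sorted: 0.01883373
Note: the table below uses sorted = 17157730 for mayer, which gives dedup/sorted = 0.1318582 instead.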
# Read counts per library at each processing step
library = c("18486-dep", "18486-tot", "18508-dep", "18508-nondep", "19238-dep", "mayer")
fastq = c(45803834, 4006, 70776230, 77223987, 113160855, 137281933)
sorted = c(17336796, 1258, 43247747, 50189574, 40420633, 17157730)
dedup = c(1533069, 1105, 1776330, 1919904, 1870359, 2262387)
# Fraction of mapped reads remaining after UMI deduplication
perc = dedup / sorted
# rbind() on mixed types coerces every value to character
reads_mapped_dedup = data.frame(rbind(library, fastq, sorted, dedup, perc))
reads_mapped_dedup
        X1                 X2                X3
library 18486-dep          18486-tot         18508-dep
fastq   45803834           4006              70776230
sorted  17336796           1258              43247747
dedup   1533069            1105              1776330
perc    0.0884286231435151 0.878378378378378 0.0410733534859053
        X4                 X5                 X6
library 18508-nondep       19238-dep          mayer
fastq   77223987           113160855          137281933
sorted  50189574           40420633           17157730
dedup   1919904            1870359            2262387
perc    0.0382530443474177 0.0462723827209732 0.131858177043234
total_reads= sum(fastq)
sorted/fastq
[1] 0.3785010 0.3140290 0.6110490 0.6499221 0.3571962 0.1249817
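Because rbind() on mixed types turns every value into text, the ratios above cannot be sorted or plotted directly from reads_mapped_dedup. As a sketch of one alternative layout (not the code used for this analysis), the same counts can be stored with one row per library so the numeric columns stay numeric. The names library_id, read_counts, frac_mapped and frac_kept are illustrative.

```r
# Counts copied from the vectors defined above.
library_id <- c("18486-dep", "18486-tot", "18508-dep",
                "18508-nondep", "19238-dep", "mayer")
fastq  <- c(45803834, 4006, 70776230, 77223987, 113160855, 137281933)
sorted <- c(17336796, 1258, 43247747, 50189574, 40420633, 17157730)
dedup  <- c(1533069, 1105, 1776330, 1919904, 1870359, 2262387)

# One row per library; numeric columns stay numeric.
read_counts <- data.frame(
  library = library_id,
  fastq = fastq,
  sorted = sorted,
  dedup = dedup,
  stringsAsFactors = FALSE
)

# Fraction of raw reads that mapped, and fraction of mapped reads
# kept after UMI deduplication.
read_counts$frac_mapped <- read_counts$sorted / read_counts$fastq
read_counts$frac_kept <- read_counts$dedup / read_counts$sorted

# Libraries ordered by the fraction kept after deduplication.
print(read_counts[order(read_counts$frac_kept), ])
```

With this layout, the per-library fractions can be passed straight to plotting or summary functions without converting back from character.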
The undetermined reads can be ignored: they correspond to random reads.
From meeting:
Align with STAR and BWA to compare
Compare to the UCSC Genome Browser view of chr7:5568588-5568715 (hg19): http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr7%3A5568588%2D5568715&hgsid=642260271_FLEwDANY0lSWCFhW4QjbmbASDDnB
#mayer data first
sbatch --mem=8g --wrap="STAR --runThreadN 4 --genomeDir ../genome/ --readFilesIn fastq_extr/SRR1575922_extracted.fastq --readFilesCommand zcat --outFilterMultimapNmax 1 --outSAMtype BAM SortedByCoordinate --outStd BAM_SortedByCoordinate > star/mayer_star_align.bam"
Problem: I need the genome annotation file before I can use STAR to align to the genome.
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 backports_1.1.1 magrittr_1.5 rprojroot_1.2
[5] tools_3.4.2 htmltools_0.3.6 yaml_2.1.14 Rcpp_0.12.13
[9] stringi_1.1.5 rmarkdown_1.6 knitr_1.17 git2r_0.19.0
[13] stringr_1.2.0 digest_0.6.12 evaluate_0.10.1