Last updated: 2017-11-07

Code version: d5f003d

I will use this analysis to look at initial data QC and points of interest.

First, I looked at the number of reads that map to the genome before and after the UMI deduplication step.

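samtools view -c -F 4 sample-sort.bam   # counts records without the unmapped flag (0x4); sample-sort.bam is a placeholder file name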

For an explanation of SAM flags, see: https://broadinstitute.github.io/picard/explain-flags.html
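The following is a minimal R sketch of the filter logic behind `-F 4`. It assumes the standard SAM FLAG bit definitions (0x4 = segment unmapped, 0x100 = secondary, 0x800 = supplementary). The flag values are made up for illustration.

One consequence is worth keeping in mind. `-c` counts alignment records, not unique reads. A read with secondary or supplementary records that lack the 0x4 bit can therefore be counted more than once.

```r
# Sketch: which records `samtools view -c -F 4` keeps.
# -F 4 drops any record whose FLAG has bit 0x4 set; all other records are counted,
# including secondary (256) and supplementary (2048) alignments.
is_counted_by_F4 <- function(flag) {
  bitwAnd(as.integer(flag), 4L) == 0L
}

flags <- c(0, 4, 16, 20, 256, 2048)   # illustrative FLAG values
counted <- is_counted_by_F4(flags)
data.frame(flag = flags, counted = counted)
sum(counted)  # number of records that would be reported
```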

Mayer data:
fastq: 137281933
sorted: 120124203
dedup: 2262387
dedup/sorted: 0.01883373

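library= c("18486-dep", "18486-tot", "18508-dep", "18508-nondep", "19238-dep", "mayer")

# reads in the input fastq
fastq= c(45803834, 4006, 70776230, 77223987, 113160855, 137281933)

# mapped reads in the sorted BAM (samtools view -c -F 4)
sorted= c(17336796, 1258, 43247747, 50189574, 40420633, 17157730)

# mapped reads left after UMI deduplication
dedup= c(1533069, 1105, 1776330, 1919904, 1870359, 2262387)

# fraction of mapped reads retained after deduplication
perc= dedup/sorted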

reads_mapped_dedup= data.frame(rbind(library, fastq, sorted, dedup, perc))

reads_mapped_dedup
                        X1                X2                 X3
library          18486-dep         18486-tot          18508-dep
fastq             45803834              4006           70776230
sorted            17336796              1258           43247747
dedup              1533069              1105            1776330
perc    0.0884286231435151 0.878378378378378 0.0410733534859053
                        X4                 X5                X6
library       18508-nondep          19238-dep             mayer
fastq             77223987          113160855         137281933
sorted            50189574           40420633          17157730
dedup              1919904            1870359           2262387
perc    0.0382530443474177 0.0462723827209732 0.131858177043234
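`rbind()` stacks the character vector of library names with the numeric count vectors. This coerces every value to character, which is why `perc` prints with 15 digits and none of the columns are numeric.

The sketch below builds one row per library instead, so the counts stay numeric. It reuses the same numbers, redefined here so the block runs on its own. The `mapped_frac` column name is my own choice.

```r
# One row per library, so numeric columns stay numeric (rbind would coerce to character).
qc <- data.frame(
  library = c("18486-dep", "18486-tot", "18508-dep", "18508-nondep",
              "19238-dep", "mayer"),
  fastq  = c(45803834, 4006, 70776230, 77223987, 113160855, 137281933),
  sorted = c(17336796, 1258, 43247747, 50189574, 40420633, 17157730),
  dedup  = c(1533069, 1105, 1776330, 1919904, 1870359, 2262387),
  stringsAsFactors = FALSE
)
qc$perc <- qc$dedup / qc$sorted          # fraction of mapped reads kept after dedup
qc$mapped_frac <- qc$sorted / qc$fastq   # fraction of input reads that mapped
str(qc)
```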
total_reads= sum(fastq)

sorted/fastq
[1] 0.3785010 0.3140290 0.6110490 0.6499221 0.3571962 0.1249817
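The two ratios can be compared across libraries in a single plot. The following is a sketch using base R graphics with the counts from the table above.

The vector of names is called `libs` here rather than `library`, to avoid masking the `library()` function. The layout choices (grouped bars, rotated labels) are only one option.

```r
# Side-by-side bars of mapping rate and post-dedup retention per library.
libs   <- c("18486-dep", "18486-tot", "18508-dep", "18508-nondep", "19238-dep", "mayer")
fastq  <- c(45803834, 4006, 70776230, 77223987, 113160855, 137281933)
sorted <- c(17336796, 1258, 43247747, 50189574, 40420633, 17157730)
dedup  <- c(1533069, 1105, 1776330, 1919904, 1870359, 2262387)

rates <- rbind(mapped = sorted / fastq,   # fraction of input reads that mapped
               dedup  = dedup / sorted)   # fraction of mapped reads kept after UMI dedup
colnames(rates) <- libs

# beside = TRUE returns a 2 x 6 matrix of bar midpoints
mids <- barplot(rates, beside = TRUE, las = 2, ylim = c(0, 1),
                legend.text = c("sorted/fastq", "dedup/sorted"),
                ylab = "Fraction of reads")
```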

The Undetermined reads are not informative: they correspond to random reads.

From the meeting:

Explore UMIs

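# Assumes the read name ends in _<UMI> and contains no other underscore:
# tr turns that underscore into a tab, so cut -f 2 returns the UMI.
samtools view YG-SP-NET1-18486-total-2017-10-13_S5_R1_001-sort.bam | tr "_" "\t" | cut -f 2 | sort | uniq -c > UMI_18486_total_stats.txt

samtools view YG-SP-NET1-18486-total-2017-10-13_S5_R1_001-sort.bam | tr "_" "\t" | cut -f 2 > UMI_18486_total.txt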
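The same extraction can be prototyped in R on a handful of read names. This sketch makes the same assumption as the shell pipeline: the UMI is the text after the final underscore of the read name (QNAME). The read names below are hypothetical.

```r
# Hypothetical read names of the form <read_id>_<UMI>.
qnames <- c("read1_ACGTACGT", "read2_ACGTACGT", "read3_TTGCAAGC")

# Keep everything after the last underscore (greedy .* consumes up to it).
umis <- sub(".*_", "", qnames)

# Equivalent of `sort | uniq -c`: one count per distinct UMI, ordered by UMI.
umi_counts <- table(umis)
umi_counts
```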

Run this for each file and save the results in the output directory; I can then make plots based on them.
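Once each `uniq -c` output is saved, it can be read back into R for plotting. This sketch assumes each line has the `uniq -c` layout: leading spaces, a count, then the UMI.

To stay self-contained, the block parses three example lines. For real data you would pass a file such as `UMI_18486_total_stats.txt` to `read.table()` instead. A histogram of log10 reads per UMI is only one possible view; it shows whether a few UMIs dominate.

```r
# Example text in `uniq -c` format: "<count> <UMI>" with leading padding.
stats_text <- "      12 AAAAAAAA
       3 ACGTACGT
       1 TTGCAAGC"

# read.table's default separator is whitespace, and leading spaces are skipped.
# For a real file: read.table("UMI_18486_total_stats.txt", col.names = c("count", "umi"))
umi_stats <- read.table(text = stats_text, col.names = c("count", "umi"),
                        stringsAsFactors = FALSE)

# Most frequent UMIs first.
umi_stats <- umi_stats[order(umi_stats$count, decreasing = TRUE), ]

hist(log10(umi_stats$count), main = "Reads per UMI",
     xlab = "log10(reads per UMI)")
summary(umi_stats$count)
```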

Session information

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.2  backports_1.1.1 magrittr_1.5    rprojroot_1.2  
 [5] tools_3.4.2     htmltools_0.3.6 yaml_2.1.14     Rcpp_0.12.13   
 [9] stringi_1.1.5   rmarkdown_1.6   knitr_1.17      git2r_0.19.0   
[13] stringr_1.2.0   digest_0.6.12   evaluate_0.10.1

This R Markdown site was created with workflowr