Last updated: 2018-02-14
Code version: d48ffe2
This analysis looks at the quality of the Net-seq3 (net-3) library set, which contains the first 8 of the 16 pilot libraries:
18505
18508
18486
19239
19141
19193
19257
19128
The first step is to run the raw reads through my Snakefile. To do this, I change the project directory in /project2/gilad/briana/net_seq_pipeline/config.yaml to /project2/gilad/briana/Net-seq/Net-seq3/
The reads go in: /project2/gilad/briana/Net-seq/Net-seq3/data/fastq/
I first unzip all of the fastq files with gunzip, then submit the Snakemake pipeline from /project2/gilad/briana/net_seq_pipeline with: nohup scripts/submit-snakemake.sh
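Next, look at the number of sequenced reads, mapped reads, and deduplicated reads in each library. Counts from the BAM files come from the following command, which counts alignments that do not have the unmapped flag set:
samtools view -c -F 4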
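The counting step above could be scripted per library along the following lines. This is only a sketch: the BAM directory and the `<library>.bam` naming scheme are assumptions for illustration, not the pipeline's actual output layout, and the block only builds the command strings rather than running them.

```r
# Library IDs as used in this analysis
libraries <- c("18486", "18505", "18508", "19128", "19141", "19193", "19239", "19257")

# ASSUMPTION: hypothetical BAM location and file naming; adjust to the real pipeline output
bam_dir <- "data/bam"
bams <- file.path(bam_dir, paste0(libraries, ".bam"))

# Build the samtools command for one BAM file.
# -c prints a count instead of the alignments;
# -F 4 skips records that have the unmapped flag (0x4) set.
mapped_count_cmd <- function(bam) {
  paste("samtools view -c -F 4", shQuote(bam))
}

cmds <- vapply(bams, mapped_count_cmd, character(1), USE.NAMES = FALSE)
print(cmds)

# To actually run the commands (requires samtools on the PATH and existing files):
# counts <- vapply(cmds, function(cmd) as.numeric(system(cmd, intern = TRUE)), numeric(1))
```

Running the same command on the sorted BAM and on the deduplicated BAM would give the mapped and deduplicated counts respectively, assuming the pipeline writes both.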
libraries=c("18486", "18505", "18508", "19128", "19141", "19193", "19239", "19257")
fastq=c(34342214, 42959246, 205654644, 51413661, 211367408, 52841852, 54253650, 45506210)
mapped=c(14018943, 23607050, 26532568, 38084941, 28165596, 37084191, 37095436, 26636660)
dedup=c(1446379, 1032174, 1642314, 2469438, 1926300, 2717515, 2361238, 1377712)
read_count=rbind(fastq, mapped, dedup)
colnames(read_count)=libraries
total_reads= sum(fastq)
total_reads
[1] 698338885
total_map= sum(mapped)
total_map
[1] 231225385
total_dedup= sum(dedup)
total_dedup
[1] 14973070
total_map/total_reads
[1] 0.3311077
total_dedup/total_map
[1] 0.0647553
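The pooled ratios above weight each library by its read depth, so the two deepest libraries (18508 and 19141, each with over 200 million reads) dominate them. A per-library table may be easier to interpret. The sketch below recomputes both ratios per library from the counts in this analysis; the column names are illustrative choices, not part of the original code.

```r
libraries <- c("18486", "18505", "18508", "19128", "19141", "19193", "19239", "19257")
fastq  <- c(34342214, 42959246, 205654644, 51413661, 211367408, 52841852, 54253650, 45506210)
mapped <- c(14018943, 23607050, 26532568, 38084941, 28165596, 37084191, 37095436, 26636660)
dedup  <- c(1446379, 1032174, 1642314, 2469438, 1926300, 2717515, 2361238, 1377712)

# Per-library mapping fraction and fraction of mapped reads kept after deduplication
rates <- data.frame(
  library = libraries,
  frac_mapped = mapped / fastq,
  frac_unique_of_mapped = dedup / mapped
)
print(rates, digits = 3)

# The pooled ratios are depth-weighted means of the per-library ratios
pooled_mapped <- weighted.mean(rates$frac_mapped, w = fastq)
pooled_unique <- weighted.mean(rates$frac_unique_of_mapped, w = mapped)
print(c(pooled_mapped = pooled_mapped, pooled_unique = pooled_unique))
```

The weighted means reproduce total_map/total_reads and total_dedup/total_map, which shows how much the pooled numbers depend on the deepest libraries.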
Plot these counts by cell line:
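# beside=TRUE: the three counts are nested (deduplicated within mapped within total),
# so stacking them would double count reads
count_plot=barplot(as.matrix(read_count), beside=TRUE, main="Counts for coverage and complexity",
xlab="Library", col=c("lightskyblue2","dodgerblue1","navy"),
ylab="Read counts")
legend("topright", legend = c("total", "mapped", "UMI"), col=c("lightskyblue2","dodgerblue1","navy"), pch=20, cex = .75)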
Look at this by proportions:
percent_mapped= mapped/fastq
percent_mapped
[1] 0.4082131 0.5495220 0.1290152 0.7407553 0.1332542 0.7017958 0.6837408
[8] 0.5853412
percent_UMI= dedup/fastq
# Deduplicated reads are a subset of mapped reads, so split the mapped fraction
# into duplicates and unique (UMI) reads to get categories that do not overlap
percent_not_mapped= 1 - percent_mapped
percent_mapped_dup= percent_mapped - percent_UMI
prop=rbind(percent_not_mapped, percent_mapped_dup, percent_UMI)
colnames(prop)= libraries
prop_plot=barplot(as.matrix(prop), main="Proportions for coverage and complexity",
xlab="Library", col=c("lightskyblue2","dodgerblue1","navy"),
ylab="Proportion of sequenced reads")
legend("bottomright", legend = c("un-mapped", "mapped (duplicate)", "UMI"), col=c("lightskyblue2","dodgerblue1","navy"), pch=20, cex = 0.75)
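Because a stacked proportion plot is only meaningful when its segments partition the sequenced reads, a quick consistency check can be useful before plotting. The sketch below assumes the three counts are nested (deduplicated reads within mapped reads within sequenced reads) and verifies that the resulting segments sum to 1 for every library; the row names are illustrative.

```r
libraries <- c("18486", "18505", "18508", "19128", "19141", "19193", "19239", "19257")
fastq  <- c(34342214, 42959246, 205654644, 51413661, 211367408, 52841852, 54253650, 45506210)
mapped <- c(14018943, 23607050, 26532568, 38084941, 28165596, 37084191, 37095436, 26636660)
dedup  <- c(1446379, 1032174, 1642314, 2469438, 1926300, 2717515, 2361238, 1377712)

# The counts should be nested: dedup <= mapped <= fastq for every library
stopifnot(all(dedup <= mapped), all(mapped <= fastq))

# Non-overlapping categories, each as a fraction of sequenced reads
breakdown <- rbind(
  unmapped         = (fastq - mapped) / fastq,
  mapped_duplicate = (mapped - dedup) / fastq,
  unique_umi       = dedup / fastq
)
colnames(breakdown) <- libraries

# Every column should sum to 1 (up to floating point error)
stopifnot(isTRUE(all.equal(unname(colSums(breakdown)), rep(1, length(libraries)))))

print(round(breakdown, 3))
```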
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
[5] tools_3.4.2 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.15
[9] stringi_1.1.6 rmarkdown_1.8.5 knitr_1.18 git2r_0.21.0
[13] stringr_1.2.0 digest_0.6.14 evaluate_0.10.1
This R Markdown site was created with workflowr