Update Snakefile for pilot analysis

Last updated: 2018-03-02

Code version: 646ee57

Pilot data:

The pilot project has 16 human lines of net-seq data. I have 3 lanes of sequencing for each library of 8. The first library is the same 8 samples from Net3. I can use zcat to combine the samles from seperate lanes into one file. I will call these [sample_name]_combined_Netpilot.fastq. I will use the project directory /project2/gilad/briana/Net-seq-pilot

Snakefile Update:

I will update my snakefile by copying the old one as Snakefile_old.

Changes:

change genome location to /project2/gilad/briana/genome_anotation_data
Add picard tools annotation to the snakefile
compute genome coverage for the dedup and sorted bam files

Genome location

My old snakefile create the reference genome for every new analysis. It would be better to build the reference once and always keep it in one place. I will now be in /project2/gilad/briana/genome_anotation_data. To do this I will change the snakefile to point to this location and change the config file to point to the correct location.

#config file  

dir_genome: /project2/gilad/briana/genome_anotation_data/

#snakefile
dir_gen=config["dir_genome"]
dir_genome= dir_gen + "genome/"

Picard tools

Picard tools will take in a flat file and a rRna file and will assess coverage at genomic regions. Before I make the rule I need to convert the gencode.v19.annotation.gtf to a flat file.

Update net-seq environment:

ucsc-gtfToGenePred
picard >= 2.9.2

#convert gtf to genepred flat  
gtfToGenePred -genePredExt -ignoreGroupsWithoutExons gencode.v19.annotation.gtf gencode.v19.annotation.refFlat

Adapt rule from Seb’s Snakefile

#config file 
ref_flat: /project2/gilad/briana/genome_anotation_data/gencode.v19.annotation.refFlat
ribosome_intervals: /project2/gilad/briana/genome_anotation_data/hg19.rRNA.intervals  

#snakefile  
ref_flat = config["ref_flat"]
ribosome_int = config["ribosome_intervals"]


#rule 
rule collect_rna_metrics:
    input:
        bam = dir_sort + "{samples}-sort.bam"
        ref_flat = ref_flat,
        ribosome_int = ribosome_int
    output:
        rna_metrics = output + "{samples}_RNAmetrics.picard.txt"
    params:
        strand = "NONE",
        memory = "-Xmx12G"
    shell:
        "picard CollectRnaSeqMetrics {params.memory} I={input.bam} O={output.rna_metrics} REF_FLAT={input.ref_flat} STRAND={params.strand} RIBOSOMAL_INTERVALS={input.ribosome_int}"

Genome Coverage

The old snakefile has a genome_cov rule for genome cov of the deduplicated files. I want to do this for the sort and dedup files. I also will change the rule so it is -d -5.

rule genome_cov_dedup:
    input:
       bamdedup = dir_dedup + "{samples}-sort.dedup.bam",
       genome = dir_genome + ensembl_genome + ".reads"
    output:
       dir_cov + "{samples}-sort.dedup.cov.bed"
    shell: "bedtools genomecov -d -5 -ibam {input.bamdedup} -g {input.genome} > {output}"
    
    
rule genome_cov_sort:
    input:
      bamsort = dir_sort + "{samples}-sort.bam"
      genome = dir_genome + ensembl_genome + ".reads"  
    output: 
      dir_cov + "{samples}-sort.cov.bed"
    shell: "bedtools genomecov -d -5 -ibam {input.bamsort} -g {input.genome} > {output}"
   
   
#in rule all:  
expand(dir_cov + "{samples}-sort.dedup.cov.bed", samples=samples),
expand(dir_cov + "{samples}-sort.cov.bed", samples=samples),

Session information

sessionInfo()

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.2  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
 [5] tools_3.4.2     htmltools_0.3.6 yaml_2.1.16     Rcpp_0.12.15   
 [9] stringi_1.1.6   rmarkdown_1.8.5 knitr_1.18      git2r_0.21.0   
[13] stringr_1.2.0   digest_0.6.14   evaluate_0.10.1

This R Markdown site was created with workflowr