Last updated: 2018-03-22
Code version: 0e50352
The purpose of this analysis is to explore the strand specificity of my net seq data. Within genes we expect strand specific expression and at TSS we would expect some convergent transcription. I will filter the overlaping genes as done in the Meyer paper.
Mayer filtering: Pol-II-transcribed genes that do not overlap other genes within 2.5 kb of the TSS and polyadenylation site and are longer than 2 kb
bedtools closest -N
Get the protein coding genes:
grep 'protein_coding' gencode.v19.annotation.bed | awk '$8 == "gene"' |cut -f1-6 |sed 's/^chr//' > gencode.v19.annotation.proteincodinggene.bed
Only keep genes that are 2kb. I want genes where end (3) - start (2) is greater than 2000
awk '{if( ($3 - $2) > 2000) print($1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6)}' gencode.v19.annotation.proteincodinggene.bed > gencode.v19.annotation.proteincodinggene.2kb.bed
Next step is to look for overlab distance with bedtools closest -N.
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
[5] tools_3.4.2 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.15
[9] stringi_1.1.6 rmarkdown_1.8.5 knitr_1.18 git2r_0.21.0
[13] stringr_1.2.0 digest_0.6.14 evaluate_0.10.1
This R Markdown site was created with workflowr