Last updated: 2017-11-16

Code version: 332719e

Goal: Extract the number of bases that mapped and were soft clipped from bam files and make a histogram

Look first at the original bam files from the snake mapping.

To look at bam files in python, use pysam. I added this to my environment.

Process:

  • Python Script: for simplicity at first, this file is called get_mapped.py and is in the /project2/gilad/briana/Net-seq/Net-seq1/data/sort/ directory. The bam file needs to have a bai file in teh same directory to work with the pysam program
#!/usr/bin/env python
import sys
import pysam
import re


mapped_num = open("mapped_18486_dep.txt", "w")
file_bam_open= open("/project2/gilad/briana/Net-seq/Net-seq1/data/sort/YG-SP-NET1-18486-dep-2017-10-13_S4_R1_001-sort.bam", "r")
file_bam= pysam.AlignmentFile(file_bam_open, "rb")
def get_mapped(file):
  mapped=[]
  for read in file.fetch():
      read= str(read)
      cigar= read.split()[5] #python starts at 0
      num= re.findall("\d+M", cigar) #this gives me each "##M" in the cigar
      for i in num:
        mapped.append(i[:-1])
  return(mapped)

get_mapped(file_bam)

mapped_num.write("\n".join(mapped))

mapped_num.close()

  
    

What I want:

6th column (sep by tab) of the line. ex: 7S23M14S. I want the all the numbers before the M in this string.

Pull the txt file with all the numbers into R and make a histogram.

Session information

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.2  backports_1.1.1 magrittr_1.5    rprojroot_1.2  
 [5] tools_3.4.2     htmltools_0.3.6 yaml_2.1.14     Rcpp_0.12.13   
 [9] stringi_1.1.5   rmarkdown_1.6   knitr_1.17      git2r_0.19.0   
[13] stringr_1.2.0   digest_0.6.12   evaluate_0.10.1

This R Markdown site was created with workflowr