<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta name="author" content="Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens" /> <title>Converting FastQTL results to mash format</title> <script src="site_libs/jquery-1.11.3/jquery.min.js"></script> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link href="site_libs/bootstrap-3.3.5/css/readable.min.css" rel="stylesheet" /> <script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script> <script src="site_libs/navigation-1.1/tabsets.js"></script> <link href="site_libs/highlightjs-9.12.0/textmate.css" rel="stylesheet" /> <script src="site_libs/highlightjs-9.12.0/highlight.js"></script> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> pre:not([class]) { background-color: white; } </style> <script type="text/javascript"> if (window.hljs) { hljs.configure({languages: []}); hljs.initHighlightingOnLoad(); if (document.readyState && document.readyState === "complete") { window.setTimeout(function() { hljs.initHighlighting(); }, 0); } } </script> <style type="text/css"> h1 { font-size: 34px; } h1.title { font-size: 38px; } h2 { font-size: 30px; } h3 { font-size: 24px; } h4 { font-size: 18px; } h5 { font-size: 16px; } h6 { font-size: 12px; } .table th:not([align]) { text-align: left; } </style> </head> <body> <style type = "text/css"> .main-container { max-width: 940px; margin-left: auto; margin-right: auto; } code { color: inherit; background-color: rgba(0, 0, 0, 0.04); } img { max-width:100%; height: auto; } .tabbed-pane { padding-top: 12px; } button.code-folding-btn:focus { outline: none; } </style> <style type="text/css"> /* padding for bootstrap navbar */ 
body { padding-top: 51px; padding-bottom: 40px; } /* offset scroll position for anchor links (for fixed navbar) */ .section h1 { padding-top: 56px; margin-top: -56px; } .section h2 { padding-top: 56px; margin-top: -56px; } .section h3 { padding-top: 56px; margin-top: -56px; } .section h4 { padding-top: 56px; margin-top: -56px; } .section h5 { padding-top: 56px; margin-top: -56px; } .section h6 { padding-top: 56px; margin-top: -56px; } </style>
<script>
// manage active state of menu based on current page
$(document).ready(function () {
  // active menu anchor
  href = window.location.pathname
  href = href.substr(href.lastIndexOf('/') + 1)
  if (href === "")
    href = "index.html";
  var menuAnchor = $('a[href="' + href + '"]');

  // mark it active
  menuAnchor.parent().addClass('active');

  // if it's got a parent navbar menu mark it active as well
  menuAnchor.closest('li.dropdown').addClass('active');
});
</script>
<div class="container-fluid main-container">
<!-- tabsets -->
<script> $(document).ready(function () { window.buildTabsets("TOC"); }); </script>
<!-- code folding -->
<div class="navbar navbar-default navbar-fixed-top" role="navigation"> <div class="container"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar"> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand" href="index.html">mash code resources</a> </div> <div id="navbar" class="navbar-collapse collapse"> <ul class="nav navbar-nav"> <li> <a href="index.html">Overview</a> </li> <li> <a href="https://github.com/stephenslab/mashr">mashr</a> </li> <li> <a href="fastqtl2mash.html">Fastqtl to mash</a> </li> <li> <a href="gtex.html">GTEx analysis</a> </li> </ul> <ul class="nav navbar-nav navbar-right"> <li> <a href="https://github.com/stephenslab/gtexresults">source</a> </li> </ul> </div><!--/.nav-collapse --> </div><!--/.container --> </div><!--/.navbar -->
<!-- Add a small 
amount of space between sections. --> <style type="text/css"> div.section { padding-top: 12px; } </style> <div class="fluid-row" id="header"> <h1 class="title toc-ignore">Converting FastQTL results to mash format</h1> <h4 class="author"><em>Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens</em></h4> </div> <p><strong>Last updated:</strong> 2018-06-01</p> <strong>workflowr checks:</strong> <small>(Click a bullet for more information)</small> <ul> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>R Markdown file:</strong> up-to-date </summary></p> <p>Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.</p> </details> </li> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>Repository version:</strong> <a href="https://github.com/stephenslab/gtexresults/tree/cb5e65cb3e8f61e7a5de757d25a7ee401c1932ec" target="_blank">cb5e65c</a> </summary></p> Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated. <br><br> Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use <code>wflow_publish</code> or <code>wflow_git_commit</code>). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. 
Below is the status of the Git repository when the results were generated: <pre><code> Ignored files: Ignored: .sos/ Ignored: data/.sos/ Ignored: output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3.lite.single.expanded.V1.loglik.rds Ignored: workflows/.ipynb_checkpoints/ Ignored: workflows/.sos/ Untracked files: Untracked: fastqtl_to_mash_output/ Untracked: gtex6_workflow_output/ </code></pre> Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes. </details> </li> </ul> <details> <summary> <small><strong>Expand here to see past versions:</strong></small> </summary> <ul> <table style="border-collapse:separate; border-spacing:5px;"> <thead> <tr> <th style="text-align:left;"> File </th> <th style="text-align:left;"> Version </th> <th style="text-align:left;"> Author </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Message </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/cb5e65cb3e8f61e7a5de757d25a7ee401c1932ec/analysis/fastqtl2mash.Rmd" target="_blank">cb5e65c</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”, view = FALSE) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/55397e1688f0751e90d3dd0605dc2a5ee7735a98/docs/fastqtl2mash.html" target="_blank">55397e1</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> Updates to fastqtl2mash demo. 
</td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/68995c41bdb9a7acb2b1d3dd445e5bc97512417a/analysis/fastqtl2mash.Rmd" target="_blank">68995c4</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”, view = FALSE) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/ab370ff229cf6853953f69f94cc19b866b31e181/analysis/fastqtl2mash.Rmd" target="_blank">ab370ff</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> Revising fastqtl2mash instructions. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/f59f05284cd8deb08aae34c10038e3392f59613d/analysis/fastqtl2mash.Rmd" target="_blank">f59f052</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/155cfc91e2df10bf09bb7e819cd164c53fe33b8d/analysis/fastqtl2mash.Rmd" target="_blank">155cfc9</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/930e0f63d6aaccb07131b6466a161697c912cb58/docs/fastqtl2mash.html" target="_blank">930e0f6</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> 
<td style="text-align:left;"> Build site. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/401fd65cb932baf8a0a6ec738732b1bbfcfbb07f/analysis/fastqtl2mash.Rmd" target="_blank">401fd65</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/6a456e6971005dc03629876b99d3a3f181f06d43/analysis/fastqtl2mash.Rmd" target="_blank">6a456e6</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> Moved some output files to data folder; removed some old files from </td> </tr> </tbody> </table> </ul> <p></details></p> <hr /> <div id="overview" class="section level2"> <h2>Overview</h2> <p>We provide code to convert association statistics in <a href="http://fastqtl.sourceforge.net">FastQTL</a> format, or a format similar to FastQTL, to a format that is more convenient for mash analysis. This code was used to generate data file <code>MatrixEQTLSumStats.Portable.Z.rds</code> in the <a href="https://github.com/stephenslab/gtexresults">git repository</a> from the SNP-gene association statistics included as part of Release 6 of the <a href="http://gtexportal.org">GTEx Project</a> (<code>GTEx_Analysis_V6_all-snp-gene-associations.tar</code>).</p> <p>Here we give instructions for using this code, and demonstrate how to convert a toy FastQTL data set. 
This toy data set is included in the <a href="https://github.com/stephenslab/gtexresults">git repository</a>.</p> <p>To facilitate running our conversion procedure, we have also developed a <a href="https://hub.docker.com/r/gaow/hdf5tools">Docker container</a> that includes all the required software components, notably the HDF5 libraries used to create intermediate data files that can be efficiently queried. Docker can run on most popular operating systems (Mac, Windows and Linux) and cloud computing services such as Amazon Web Services and Microsoft Azure. If you have not used Docker before, you might want to read <a href="https://docs.docker.com/engine/docker-overview">this</a> to learn the basic concepts and understand the main benefits of Docker.</p> <p>For details on how the Docker image was configured, see <code>hdf5tools.dockerfile</code> in the <code>workflows</code> directory of the <a href="https://github.com/stephenslab/gtexresults">git repository</a>. The Docker image used for our analyses is based on <a href="https://hub.docker.com/r/gaow/lab-base">gaow/lab-base</a>, a customized Docker image for development with R and Python.</p> <p>If you find a bug in any of these steps, please post an <a href="https://github.com/stephenslab/gtexresults/issues">issue</a>.</p> </div> <div id="download-and-install-docker" class="section level2"> <h2>Download and install Docker</h2> <p>Download <a href="https://docs.docker.com/install">Docker</a> (note that a free <a href="https://www.docker.com/community-edition">community edition</a> of Docker is available), and install it following the instructions provided on the Docker website. Once you have installed Docker, check that Docker is working correctly by following Part 1 of the <a href="https://docs.docker.com/get-started">“Getting Started” guide</a>. 
If you are new to Docker, we recommend reading the entire “Getting Started” guide.</p>
<p><strong>Note:</strong> Setting up Docker requires that you have administrator access to your computer. <a href="https://singularity.lbl.gov/docs-docker">Singularity</a> is an alternative that accepts Docker images and does not require administrator access.</p>
</div>
<div id="download-and-test-docker-image" class="section level2">
<h2>Download and test Docker image</h2>
<p>Run this <code>alias</code> command in the shell. It defines a shortcut, <code>fastqtl2mash-docker</code>, which is used below to run commands inside the Docker container:</p>
<pre class="bash"><code>alias fastqtl2mash-docker='docker run --security-opt label:disable -t '\
'-P -h MASH -w $PWD -v $HOME:/home/$USER -v /tmp:/tmp -v $PWD:$PWD '\
'-u $UID:${GROUPS[0]} -e HOME=/home/$USER -e USER=$USER gaow/hdf5tools'</code></pre>
<p>The <code>-v</code> flags in this command map directories on your computer to directories inside the Docker container. Since the analyses below will write files to these directories, it is important to ensure that:</p>
<ul>
<li><p>Environment variables <code>$HOME</code> and <code>$PWD</code> are set to valid and writeable directories (usually your home and current working directories, respectively).</p></li>
<li><p><code>/tmp</code> is also a valid and writeable directory.</p></li>
</ul>
<p>If either of these conditions does not hold, please adjust the <code>alias</code> accordingly.
The remaining options only affect operation of the container, and so should function the same regardless of your operating system.</p>
<p>Next, run a simple command in the Docker container to check that it has loaded successfully:</p>
<pre><code>fastqtl2mash-docker uname -sn</code></pre>
<p>This command will download the Docker image if it has not already been downloaded.</p>
<p>If the container ran successfully, you should see this information about the Docker container printed to the screen:</p>
<pre><code>Linux MASH</code></pre>
<p>You can also run these commands to show information about the image downloaded to your computer and the container that has run (and exited):</p>
<pre class="bash"><code>docker image list
docker container list --all</code></pre>
<p><em>Note:</em> If you get the error “Cannot connect to the Docker daemon. Is the docker daemon running on this host?” in Linux or macOS, see <a href="https://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo">here for Linux</a> or <a href="https://github.com/wodby/docker4drupal/issues/15">here for macOS</a> for suggestions on how to resolve this issue.</p>
</div>
<div id="clone-or-download-the-gtexresults-repository" class="section level2">
<h2>Clone or download the gtexresults repository</h2>
<p>Clone or download the <a href="https://github.com/stephenslab/gtexresults">gtexresults</a> repository to your computer, then change your working directory in the shell to the root of the repository, e.g.,</p>
<pre class="bash"><code>cd gtexresults</code></pre>
<p>All the commands below will be run from this directory.</p>
</div>
<div id="convert-eqtl-summary-statistics" class="section level2">
<h2>Convert eQTL summary statistics</h2>
<p>Next, we will use the <code>fastqtl_to_mash.ipynb</code> source code in the <code>workflows</code> directory to convert the toy data set in FastQTL format to the mash format.
The toy data are stored in the <code>data/fastqtl</code> subdirectory of the git repository.</p>
<p>Once you have followed the steps above to set up the Docker container on your computer, you can carry out the data conversion with the following command:</p>
<pre class="bash"><code>fastqtl2mash-docker sos run workflows/fastqtl_to_mash.ipynb \
  --data_list data/fastqtl/FastQTLSumStats.list \
  --gene_list data/fastqtl/GTEx_genes.txt</code></pre>
<p><em>Add information here about the expected outputs.</em></p>
</div>
<div id="more-usage-notes" class="section level2">
<h2>More usage notes</h2>
<ul>
<li><p>The conversion procedure has several options that were not illustrated in the example above. View the <code>fastqtl_to_mash.ipynb</code> file in Jupyter, or in your Web browser <a href="https://github.com/stephenslab/gtexresults/blob/master/workflows/fastqtl_to_mash.ipynb">here</a>, for more details about the available options and other usage information.</p></li>
<li><p>In practice, converting the full GTEx data is computationally intensive and is best done in a cluster environment configured to run the workflow across multiple compute nodes. See the <a href="https://vatlab.github.io/sos-docs/doc/documentation/Remote_Execution.html">SoS documentation on remote execution</a> for details.</p></li>
</ul>
</div>
<script type="text/x-mathjax-config"> MathJax.Hub.Config({ "HTML-CSS": { availableFonts: ["TeX"] } }); </script>
<!-- Adjust MathJax settings so that all math formulae are shown using TeX fonts only; see http://docs.mathjax.org/en/latest/configuration.html. This will make the presentation more consistent at the cost of the webpage sometimes taking slightly longer to load. Note that this only works because the footer is added to webpages before the MathJax javascript. 
-->
<hr>
<p> This reproducible <a href="http://rmarkdown.rstudio.com">R Markdown</a> analysis was created with <a href="https://github.com/jdblischak/workflowr">workflowr</a> 1.0.1.9000 </p>
<hr>
</div>
<script>
// add bootstrap table styles to pandoc tables
function bootstrapStylePandocTables() {
  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
}
$(document).ready(function () {
  bootstrapStylePandocTables();
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
</body>
</html>