<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta name="author" content="Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens" /> <title>Converting FastQTL results to mash format</title> <script src="site_libs/jquery-1.11.3/jquery.min.js"></script> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link href="site_libs/bootstrap-3.3.5/css/readable.min.css" rel="stylesheet" /> <script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script> <script src="site_libs/navigation-1.1/tabsets.js"></script> <link href="site_libs/highlightjs-9.12.0/textmate.css" rel="stylesheet" /> <script src="site_libs/highlightjs-9.12.0/highlight.js"></script> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> pre:not([class]) { background-color: white; } </style> <script type="text/javascript"> if (window.hljs) { hljs.configure({languages: []}); hljs.initHighlightingOnLoad(); if (document.readyState && document.readyState === "complete") { window.setTimeout(function() { hljs.initHighlighting(); }, 0); } } </script> <style type="text/css"> h1 { font-size: 34px; } h1.title { font-size: 38px; } h2 { font-size: 30px; } h3 { font-size: 24px; } h4 { font-size: 18px; } h5 { font-size: 16px; } h6 { font-size: 12px; } .table th:not([align]) { text-align: left; } </style> </head> <body> <style type = "text/css"> .main-container { max-width: 940px; margin-left: auto; margin-right: auto; } code { color: inherit; background-color: rgba(0, 0, 0, 0.04); } img { max-width:100%; height: auto; } .tabbed-pane { padding-top: 12px; } button.code-folding-btn:focus { outline: none; } </style> <style type="text/css"> /* padding for bootstrap navbar */ body { padding-top: 51px; padding-bottom: 40px; } /* offset scroll position for anchor links (for fixed navbar) */ .section h1 { padding-top: 56px; margin-top: -56px; } .section h2 { padding-top: 56px; margin-top: -56px; } .section h3 { padding-top: 56px; margin-top: -56px; } .section h4 { padding-top: 56px; margin-top: -56px; } .section h5 { padding-top: 56px; margin-top: -56px; } .section h6 { padding-top: 56px; margin-top: -56px; } </style> <script> // manage active state of menu based on current page $(document).ready(function () { // active menu anchor href = window.location.pathname href = href.substr(href.lastIndexOf('/') + 1) if (href === "") href = "index.html"; var menuAnchor = $('a[href="' + href + '"]'); // mark it active menuAnchor.parent().addClass('active'); // if it's got a parent navbar menu mark it active as well menuAnchor.closest('li.dropdown').addClass('active'); }); </script> <div class="container-fluid main-container"> <!-- tabsets --> <script> $(document).ready(function () { window.buildTabsets("TOC"); }); </script> <!-- code folding --> <div class="navbar navbar-default navbar-fixed-top" role="navigation"> <div class="container"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar"> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand" href="index.html">mash code resources</a> </div> <div id="navbar" class="navbar-collapse collapse"> <ul class="nav navbar-nav"> <li> <a href="index.html">Overview</a> </li> <li> <a href="https://github.com/stephenslab/mashr">mashr</a> </li> <li> <a href="fastqtl2mash.html">Fastqtl to mash</a> </li> <li> <a href="gtex.html">GTEx analysis</a> </li> </ul> <ul class="nav navbar-nav navbar-right"> <li> <a href="https://github.com/stephenslab/gtexresults">source</a> </li> </ul> </div><!--/.nav-collapse --> </div><!--/.container --> </div><!--/.navbar --> <!-- Add a small amount of space between sections. --> <style type="text/css"> div.section { padding-top: 12px; } </style> <!-- Add a small amount of space between sections. --> <style type="text/css"> div.section { padding-top: 12px; } </style> <div class="fluid-row" id="header"> <h1 class="title toc-ignore">Converting FastQTL results to mash format</h1> <h4 class="author"><em>Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens</em></h4> </div> <p><strong>Last updated:</strong> 2018-06-05</p> <strong>workflowr checks:</strong> <small>(Click a bullet for more information)</small> <ul> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>R Markdown file:</strong> up-to-date </summary></p> <p>Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.</p> </details> </li> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>Repository version:</strong> <a href="https://github.com/stephenslab/gtexresults/tree/62c92b0226dd0fcf7d73bba86f27270320f6cc22" target="_blank">62c92b0</a> </summary></p> Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated. <br><br> Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use <code>wflow_publish</code> or <code>wflow_git_commit</code>). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated: <pre><code> Ignored files: Ignored: .sos/ Ignored: data/.sos/ Ignored: output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3.lite.single.expanded.V1.loglik.rds Ignored: workflows/.ipynb_checkpoints/ Ignored: workflows/.sos/ Untracked files: Untracked: analysis/files.txt Untracked: fastqtl_to_mash_output/ Untracked: gtex6_workflow_output/ </code></pre> Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes. </details> </li> </ul> <details> <summary> <small><strong>Expand here to see past versions:</strong></small> </summary> <ul> <table style="border-collapse:separate; border-spacing:5px;"> <thead> <tr> <th style="text-align:left;"> File </th> <th style="text-align:left;"> Version </th> <th style="text-align:left;"> Author </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Message </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/3345ce39aef90224f44e26c9a4d1e028c8b37b2d/analysis/fastqtl2mash.Rmd" target="_blank">3345ce3</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-04 </td> <td style="text-align:left;"> A few minor revisions to instructions for running pipelines. </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/ec7e83eb62cf1bd422f38d79578fefe3b156c975/docs/fastqtl2mash.html" target="_blank">ec7e83e</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-04 </td> <td style="text-align:left;"> Re-built main webpages after several updates and improvements. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/340ad6fb2a23ef21a6758142b65977ba2d7277b6/analysis/fastqtl2mash.Rmd" target="_blank">340ad6f</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-04 </td> <td style="text-align:left;"> wflow_publish(c(“index.Rmd”, “gtex.Rmd”, “fastqtl2mash.Rmd”)) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/7ff4f673d585f44b34066c047e9e9a77cfac7408/analysis/fastqtl2mash.Rmd" target="_blank">7ff4f67</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”, view = FALSE) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/0bbaec2f516038c11376973d2d281cf241349b3f/docs/fastqtl2mash.html" target="_blank">0bbaec2</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> A few small revisions to fastqtl2mash demo. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/cb5e65cb3e8f61e7a5de757d25a7ee401c1932ec/analysis/fastqtl2mash.Rmd" target="_blank">cb5e65c</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”, view = FALSE) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/55397e1688f0751e90d3dd0605dc2a5ee7735a98/docs/fastqtl2mash.html" target="_blank">55397e1</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> Updates to fastqtl2mash demo. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/68995c41bdb9a7acb2b1d3dd445e5bc97512417a/analysis/fastqtl2mash.Rmd" target="_blank">68995c4</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”, view = FALSE) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/ab370ff229cf6853953f69f94cc19b866b31e181/analysis/fastqtl2mash.Rmd" target="_blank">ab370ff</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> Revising fastqtl2mash instructions. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/f59f05284cd8deb08aae34c10038e3392f59613d/analysis/fastqtl2mash.Rmd" target="_blank">f59f052</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/155cfc91e2df10bf09bb7e819cd164c53fe33b8d/analysis/fastqtl2mash.Rmd" target="_blank">155cfc9</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/930e0f63d6aaccb07131b6466a161697c912cb58/docs/fastqtl2mash.html" target="_blank">930e0f6</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> Build site. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/401fd65cb932baf8a0a6ec738732b1bbfcfbb07f/analysis/fastqtl2mash.Rmd" target="_blank">401fd65</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> wflow_publish(“fastqtl2mash.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/6a456e6971005dc03629876b99d3a3f181f06d43/analysis/fastqtl2mash.Rmd" target="_blank">6a456e6</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-01 </td> <td style="text-align:left;"> Moved some output files to data folder; removed some old files from </td> </tr> </tbody> </table> </ul> <p></details></p> <hr /> <div id="overview" class="section level2"> <h2>Overview</h2> <p>We provide code to convert association statistics in <a href="http://fastqtl.sourceforge.net">FastQTL</a> format, or a format similar to FastQTL, to a format that is more suited for analysis with mash. This code was used to generate <code>MatrixEQTLSumStats.Portable.Z.rds</code> in the <a href="https://github.com/stephenslab/gtexresults">git repository</a> from the SNP-gene association statistics included as part of Release 6 of the <a href="http://gtexportal.org">GTEx Project</a> (the source file was named <code>GTEx_Analysis_V6_all-snp-gene-associations.tar</code>).</p> <p>Here we give instructions for using this code, and demonstrate how to convert a toy FastQTL data set. This toy data set is included in the <a href="https://github.com/stephenslab/gtexresults">git repository</a>.</p> <p>To facilitate running our conversion procedure, we have also developed a <a href="https://hub.docker.com/r/gaow/hdf5tools">Docker container</a> that includes all the required software components, notably the HDF5 libraries used to create intermediate data files that can be efficiently queried. Docker can run on most popular operating systems (Mac, Windows and Linux) and cloud computing services such as Amazon Web Services and Microsoft Azure. If you have not used Docker before, you might want to read <a href="https://docs.docker.com/engine/docker-overview">this</a> to learn the basic concepts and understand the main benefits of Docker.</p> <p>For details on how the Docker image was configured, see <code>hdf5tools.dockerfile</code> in the <code>workflows</code> directory of the <a href="https://github.com/stephenslab/gtexresults">git repository</a>. The Docker image used for our analyses is based on <a href="https://hub.docker.com/r/gaow/lab-base">gaow/lab-base</a>, a customized Docker image for development with R and Python.</p> <p>If you find a bug in any of these steps, please post an <a href="https://github.com/stephenslab/gtexresults/issues">issue</a>.</p> </div> <div id="download-and-install-docker" class="section level2"> <h2>Download and install Docker</h2> <p>Download <a href="https://docs.docker.com/install">Docker</a> (note that a free <a href="https://www.docker.com/community-edition">community edition</a> of Docker is available), and install it following the instructions provided on the Docker website. Once you have installed Docker, check that Docker is working correctly by following Part 1 of the <a href="https://docs.docker.com/get-started">“Getting Started” guide</a>. If you are new to Docker, we recommend reading the entire “Getting Started” guide.</p> <p><strong>Note:</strong> Setting up Docker requires that you have administrator access to your computer. <a href="https://singularity.lbl.gov/docs-docker">Singularity</a> is an alternative that accepts Docker images and does not require administrator access.</p> </div> <div id="download-and-test-docker-image" class="section level2"> <h2>Download and test Docker image</h2> <p>Run this <code>alias</code> command in the shell, which will be used below to run commands inside the Docker container:</p> <pre class="bash"><code>alias fastqtl2mash-docker='docker run --security-opt label:disable -t '\ '-P -h MASH -w $PWD -v $HOME:/home/$USER -v /tmp:/tmp -v $PWD:$PWD '\ '-u $UID:${GROUPS[0]} -e HOME=/home/$USER -e USER=$USER gaow/hdf5tools'</code></pre> <p>The <code>-v</code> flags in this command map directories between the standard computing environment and the Docker container. Since the analyses below will write files to these directories, it is important to ensure that:</p> <ul> <li><p>Environment variables <code>$HOME</code> and <code>$PWD</code> are set to valid and writeable directories (usually your home and current working directories, respectively).</p></li> <li><p><code>/tmp</code> should also be a valid and writeable directory.</p></li> </ul> <p>If any of these statements are not true, please adjust the <code>alias</code> accordingly. The remaining options only affect operation of the container, and so should function the same regardless of your operating system.</p> <p>Next, run a simple command in the Docker container to check that has loaded successfully:</p> <pre><code>fastqtl2mash-docker uname -sn</code></pre> <p>This command will download the Docker image if it has not already been downloaded.</p> <p>If the container was successfully run, you should see this information about the Docker container outputted to the screen:</p> <pre><code>Linux MASH</code></pre> <p>You can also run these commands to show the information about the image downloaded to your computer and the container that has run (and exited):</p> <pre class="bash"><code>docker image list docker container list --all</code></pre> <p><em>Note:</em> If you get error “Cannot connect to the Docker daemon. Is the docker daemon running on this host?” in Linux or macOS, see <a href="https://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo">here for Linux</a> or <a href="https://github.com/wodby/docker4drupal/issues/15">here for Mac</a> for suggestions on how to resolve this issue.</p> </div> <div id="clone-or-download-the-gtexresults-repository" class="section level2"> <h2>Clone or download the gtexresults repository</h2> <p>Clone or download the <a href="https://github.com/stephenslab/gtexresults">gtexresults</a> repository to your computer, then change your working directory in the shell to the root of the repository, e.g.,</p> <pre class="bash"><code>cd gtexresults</code></pre> <p>All the commands below will be run from this directory.</p> </div> <div id="convert-eqtl-summary-statistics" class="section level2"> <h2>Convert eQTL summary statistics</h2> <p>Next, use the <code>fastqtl_to_mash.ipynb</code> code in the <code>workflows</code> directory to convert the toy data set in FastQTL format to the mash format. The toy data are stored in the <code>data/fastqtl</code> subdirectory of the git repository.</p> <p>Having followed the above steps to set up the Docker container on your computer, the data conversion can be carried out with the following command:</p> <pre class="bash"><code>fastqtl2mash-docker sos run workflows/fastqtl_to_mash.ipynb \ --data_list data/fastqtl/FastQTLSumStats.list \ --gene_list data/fastqtl/GTEx_genes.txt</code></pre> <p>If successful, this command will write several files to a newly created directory, <code>fastqtl_to_mash_output</code>. One file, <code>FastQTLSumStats.mash.rds</code>, contains the eQTL summary statistics in an RDS file, which is easily loaded into R; see <code>help(readRDS)</code> in R for detailsf. For more information about the contents of this file, and how they can be provided as input to the mash methods using the <code>set_mash_data</code> function, see the documentation inside the <a href="https://github.com/stephenslab/gtexresults/blob/master/workflows/fastqtl_to_mash.ipynb">fastqtl2mash notebook</a> and the vignettes in the <a href="https://github.com/stephenslab/mashr">mashr package</a>.</p> </div> <div id="additional-usage-notes" class="section level2"> <h2>Additional usage notes</h2> <ul> <li><p>All containers that have run and exited will still be retained in the Docker system. Run <code>docker container list --all</code> to list all previous run containers. To clear these previously run containers, run <code>docker container prune</code>. See <a href="https://stackoverflow.com/questions/17014263/should-i-be-concerned-about-excess-non-running-docker-containers">here</a> for more information.</p></li> <li><p>The conversion procedure has several options which were not illustrated in the example above. View the <code>fastqtl_to_mash.ipynb</code> file in Jupyter, or in your Web browser <a href="https://github.com/stephenslab/gtexresults/blob/master/workflows/fastqtl_to_mash.ipynb">here</a>, for more details about the available options, specifications of the input files, and other usage information.</p></li> <li><p>Converting the full GTEx data set is computationally intensive and is best done in high-performance computing environment with configurations to run the workflow across different compute nodes. See <a href="https://vatlab.github.io/sos-docs/doc/documentation/Remote_Execution.html">here</a> for details.</p></li> <li><p>Run the following command to update the Docker image:</p></li> </ul> <p><code>bash docker pull gaow/hdf5tools</code></p> </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ "HTML-CSS": { availableFonts: ["TeX"] } }); </script> <!-- Adjust MathJax settings so that all math formulae are shown using TeX fonts only; see http://docs.mathjax.org/en/latest/configuration.html. This will make the presentation more consistent at the cost of the webpage sometimes taking slightly longer to load. Note that this only works because the footer is added to webpages before the MathJax javascript. --> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ "HTML-CSS": { availableFonts: ["TeX"] } }); </script> <hr> <p> This reproducible <a href="http://rmarkdown.rstudio.com">R Markdown</a> analysis was created with <a href="https://github.com/jdblischak/workflowr">workflowr</a> 1.0.1.9000 </p> <hr> </div> <script> // add bootstrap table styles to pandoc tables function bootstrapStylePandocTables() { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); } $(document).ready(function () { bootstrapStylePandocTables(); }); </script> <!-- dynamically load mathjax for compatibility with self-contained --> <script> (function () { var script = document.createElement("script"); script.type = "text/javascript"; script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"; document.getElementsByTagName("head")[0].appendChild(script); })(); </script> </body> </html>