Last updated: 2018-06-04


Overview

To reproduce the results of Urbut, Wang & Stephens (2017), please follow these instructions. You are welcome to adapt these steps to your own study. Please also visit the mashr R package repository, which has a more user-friendly interface and tutorials on how to apply multivariate adaptive shrinkage (mash) to association analysis of gene expression (eQTL analysis).

The complete analyses of the GTEx data require installation of several programs and libraries, as well as large data sets that are specifically prepared for mash. To facilitate reproducing our results, we provide data that was pre-processed using the fastqtl2mash preprocessing pipeline. We have also developed a Docker container that includes all software components necessary to run the analyses. Docker can run on most popular operating systems (Mac, Windows and Linux) and cloud computing services such as Amazon Web Services and Microsoft Azure. If you have not used Docker before, you might want to read an introduction to Docker to learn the basic concepts and understand its main benefits.

For details on how the Docker image was configured, see mash.dockerfile in the workflows directory of the git repository. The Docker image used for our analyses is based on gaow/lab-base, a customized Docker image for development with R and Python.

If you find a bug in any of these steps, please post an issue.

Download and install Docker

Download Docker (note that a free community edition of Docker is available), and install it following the instructions provided on the Docker website. Once you have installed Docker, check that Docker is working correctly by following Part 1 of the “Getting Started” guide. If you are new to Docker, we recommend reading the entire “Getting Started” guide.

Note: Setting up Docker requires that you have administrator access to your computer. Singularity is an alternative that accepts Docker images and does not require administrator access.

Download and test Docker image

Run this alias command in the shell, which will be used below to run commands inside the Docker container:

alias mash-docker='docker run --security-opt label:disable -t '\
'-P -h MASH -w $PWD -v $HOME:/home/$USER -v /tmp:/tmp -v $PWD:$PWD '\
'-u $UID:${GROUPS[0]} -e HOME=/home/$USER -e USER=$USER gaow/mash-paper'

The -v flags in this command map directories between the standard computing environment and the Docker container. Since the analyses below will write files to these directories, it is important to ensure that:

If any of these statements are not true, please adjust the alias accordingly. The remaining options only affect operation of the container, and so should function the same regardless of your operating system.
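The alias above mounts $HOME, /tmp and $PWD into the container. As a rough sanity check before continuing, the sketch below tests whether those host directories exist and are writable by the current user. This is an assumption about what matters for the mounts, not a check provided by the authors, and check_mount_dirs is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the mash workflow): report whether each
# directory given as an argument exists and is writable by the current user.
check_mount_dirs() {
  rc=0
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "ok: $dir"
    else
      echo "missing or not writable: $dir"
      rc=1
    fi
  done
  return "$rc"
}

# The three host directories mounted by the -v flags in the alias.
check_mount_dirs "$HOME" "$PWD" /tmp || echo "Adjust the -v flags in the alias before continuing."
```

The function returns a non-zero status if any directory fails the check, so it can also be used in a script that stops before launching the container.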

Next, run a simple command in the Docker container to check that it has loaded successfully:

mash-docker uname -sn

This command will download the Docker image if it has not already been downloaded.

If the container ran successfully, you should see this information about the Docker container printed to the screen:

Linux MASH
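If you prefer to check this output in a script rather than by eye, a minimal sketch follows. The helper name is_expected_uname is hypothetical. It strips carriage returns first because the -t flag in the alias allocates a pseudo-terminal, which typically ends lines with CRLF. It is best run after the image has already been downloaded, so that no download messages are mixed into the output.

```shell
# Hypothetical helper: succeed only if the text passed as $1 is the expected
# "Linux MASH" line, ignoring any carriage returns added by the pseudo-terminal.
is_expected_uname() {
  cleaned=$(printf '%s' "$1" | tr -d '\r')
  [ "$cleaned" = "Linux MASH" ]
}

# Example usage from an interactive shell where the alias is defined:
#   is_expected_uname "$(mash-docker uname -sn)" && echo "container OK"
```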

You can also run these commands to show the information about the image downloaded to your computer and the container that has run (and exited):

docker image list
docker container list --all

Note: If you get the error “Cannot connect to the Docker daemon. Is the docker daemon running on this host?” on Linux or macOS, first check that the Docker daemon is running and that your user has permission to access it.

Clone or download the gtexresults repository

Clone or download the gtexresults repository to your computer, then change your working directory in the shell to the root of the repository, e.g.,

cd gtexresults

All the commands below will be run from this directory.

Fit mash model and compute posterior statistics

Assuming your working directory is the root of the git repository (you can check this by running pwd), run all the steps of the analysis with this command:

mash-docker sos run workflows/gtex6_mash_analysis.ipynb

This command will take several hours to run; see below for more information on the individual steps. All outputs generated by this command will be saved to the output folder inside the repository.
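Because the run takes several hours, it may be worth confirming that the shell really is in the repository root before launching it. The sketch below does this by looking for the workflow notebook named in the command above. The helper name in_repo_root is hypothetical.

```shell
# Hypothetical helper: succeed if directory $1 (default: current directory)
# contains the workflow notebook used by the sos run command.
in_repo_root() {
  [ -f "${1:-.}/workflows/gtex6_mash_analysis.ipynb" ]
}

if in_repo_root; then
  echo "In repository root: ready to run the workflow."
else
  echo "workflows/gtex6_mash_analysis.ipynb not found; cd to the gtexresults root first."
fi
```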

Note that you may recognize file gtex6_mash_analysis.ipynb as a Jupyter notebook. Indeed, you may open this notebook in Jupyter. However, you should not step through the code sequentially as you would in a typical Jupyter notebook; this is because the code in this notebook is meant to be run using the Script of Scripts (SoS) framework.

This command will execute the following steps of the analysis:

Note: All containers that have run and exited are retained in the Docker system. Run docker container list --all to list all previously run containers. To clear them, run docker container prune.

5. Add Step 5 title here

Install some packages from CRAN:

# Add commands here to install packages.

For convenience, the results needed to generate the figures and tables have been saved in the output folder.

FIXME: update figure plotting instructions

All input data needed to run this analysis are available under inputs. The analysis may take some time to run. We have provided the outputs of running mash in Data_vhat.

This repo is organized so that you can run mash using the GTEx data contained in Inputs to produce the parameters and posteriors from mashr.

The directory Plots_for_Paper_vmat contains .Rmd files to plot figures from the paper, using our results which are provided in Results_Data.

Figure 3: Summary of primary patterns identified by mash in GTEx data

Figure 4: Examples illustrating how mash uses patterns of sharing to inform effect estimates in the GTEx data

Figure 5: Histogram of sharing

Figure 6: Pairwise sharing by magnitude of eQTL among tissues

Supplementary Figure 1: Sample sizes and effective sample sizes from mash analysis across tissues

Supplementary Figure 2: There are 4 figures here:

Summary of covariance matrices Uk with largest estimated weight (> 1%) in GTEx data: Uk2

Summary of covariance matrices Uk with largest estimated weight (> 1%) in GTEx data: Uk4

Summary of covariance matrices Uk with largest estimated weight (> 1%) in GTEx data: Uk5

Summary of covariance matrices Uk with largest estimated weight (> 1%) in GTEx data: Uk8

Supplementary Figure 3: Illustration of how linkage disequilibrium can impact effect estimates (table and figure)

Supplementary Figure 4: Pairwise sharing by sign

Supplementary Figure 5: Number of “tissue-specific eQTLs” in each tissue

Supplementary Figure 6: Expression levels in genes with “tissue-specific eQTLs” are similar to those in other genes

Table 1: Heterogeneity analysis, simulation and data

Additional usage notes

Developer notes

Run the following command to update the Docker image:

docker pull gaow/mash-paper

This reproducible R Markdown analysis was created with workflowr 1.0.1.9000