Last updated: 2018-06-21


To apply multivariate adaptive shrinkage (mash) to data from the GTEx study, we created an R data set (a serialized R object) containing matrices of SNP-gene association statistics: effect estimates, their standard errors, and Z scores.

See here for the scripts used to generate these statistics from the SNP-gene data that were provided by the GTEx Project.

How to download the data

These are the recommended steps for retrieving the GTEx SNP-gene association statistics:

  1. Download or clone the gtexresults git repository.

  2. The association statistics are in the file MatrixEQTLSumStats.Portable.Z.rds, located in the data directory of the repository.

How to load the data into R

  1. Change the working directory in R (or RStudio) to the analysis directory of the gtexresults repository, e.g.,

```r
setwd("gtexresults/analysis")
```

  2. Read the data object into R:

```r
dat <- readRDS("../data/MatrixEQTLSumStats.Portable.Z.rds")
```

  3. Get an overview of the data by listing the components of the object:

```r
names(dat)
```
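Before passing anything to mash, it can help to confirm the shape of each component of `dat`. The sketch below assumes that `dat` is a named list of numeric matrices with tissues in the columns. The helper `summarize_sumstats` is hypothetical and is not part of the gtexresults repository. It is demonstrated on a small made-up list rather than on the real data.

```r
# Hypothetical helper (not part of gtexresults): for each component of a
# named list, report its dimensions (NA if it is not a matrix) and whether
# its column names match those of the first component.
summarize_sumstats <- function(x) {
  stopifnot(is.list(x), !is.null(names(x)))
  ref_cols <- colnames(x[[1]])
  data.frame(
    component = names(x),
    n_rows    = vapply(x, function(m) if (is.matrix(m)) nrow(m) else NA_integer_, integer(1)),
    n_cols    = vapply(x, function(m) if (is.matrix(m)) ncol(m) else NA_integer_, integer(1)),
    same_cols = vapply(x, function(m) identical(colnames(m), ref_cols), logical(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for `dat`: two 3 x 2 matrices with made-up values and names.
tissues <- c("tissueA", "tissueB")
toy <- list(
  b = matrix(c(0.5, -0.2, 0.1, 0.4, 0.0, -0.3), nrow = 3,
             dimnames = list(NULL, tissues)),
  s = matrix(0.1, nrow = 3, ncol = 2, dimnames = list(NULL, tissues))
)
summarize_sumstats(toy)
# With the real data loaded as above: summarize_sumstats(dat)
```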

Overview of SNP-gene association statistics

This file contains SNP-gene association statistics for 16,069 genes and 44 human tissues. These 16,069 genes were selected because each showed some indication of being expressed in all 44 tissues. Accordingly, the association statistics are stored as matrices with 16,069 rows and 44 columns, one column per tissue.

As input to mash, we use a matrix of expression quantitative trait locus (eQTL) effect estimates and a matrix of the corresponding standard errors. (We also provide Z scores.) See the manuscript for details on how these association statistics were obtained.
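The three kinds of statistics are linked. This sketch assumes the usual definition, in which each Z score is the effect estimate divided by its standard error. Under that definition, any one of the three matrices can be recovered from the other two.

The example uses small made-up matrices. The names `bhat` and `shat` and the tissue labels are placeholders, not the component names in the `.rds` file. The example also computes the corresponding two-sided normal p-values, and it assumes that no standard error is zero.

```r
# Toy effect estimates (bhat) and standard errors (shat): 2 tests x 3 tissues.
# Values and names are made up for illustration.
tissues <- c("tissueA", "tissueB", "tissueC")
bhat <- matrix(c(0.30, -0.10, 0.05, 0.40, -0.25, 0.00), nrow = 2,
               dimnames = list(c("test1", "test2"), tissues))
shat <- matrix(c(0.10, 0.05, 0.05, 0.20, 0.05, 0.10), nrow = 2,
               dimnames = dimnames(bhat))

# Z score: effect estimate divided by its standard error (element-wise).
z <- bhat / shat

# Standard errors can be recovered from estimates and Z scores,
# except where the Z score is exactly zero (set to NA there).
shat_back <- ifelse(z == 0, NA, bhat / z)

# Two-sided p-values under a standard normal reference distribution.
pval <- 2 * pnorm(-abs(z))
round(z, 2)
```

With real data, the effect-estimate and standard-error matrices are what mashr's `mash_set_data(Bhat, Shat)` takes as its first two arguments.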

These association statistics were subdivided into three subsets:

  1. Results from a subset of “strong” tests corresponding to stronger effects in your study. For example, these tests might have been identified by taking the “top” eQTL in each gene based on univariate test results, or by some other approach such as a simple meta-analysis.
  2. Results from a random subset of all tests. It is important that these be an unbiased representation of all the tests you are considering, including null and non-null tests. mashr uses these tests to learn about the amount of signal in the data and to “correct” estimates for the fact that many tests are null (analogous to a kind of multiple testing correction).
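The two selection strategies in the list above can be sketched in a few lines of base R. This is not the procedure used to build the GTEx file; the scripts mentioned above document that.

The sketch assumes a hypothetical matrix `z_all` of Z scores for all SNP-gene tests, with a parallel vector `gene` naming the gene for each row. It then builds two subsets:

- a "strong" subset, keeping for each gene the test whose largest |Z| across tissues is biggest;
- a random subset, taking a simple random sample of rows that ignores signal strength.

```r
set.seed(1)  # reproducible toy example

# Hypothetical input: Z scores for all tests (rows) across tissues (columns),
# plus the gene each test belongs to. Values are simulated noise.
n_tests <- 12
z_all <- matrix(rnorm(n_tests * 3), nrow = n_tests,
                dimnames = list(paste0("test", seq_len(n_tests)),
                                c("tissueA", "tissueB", "tissueC")))
gene <- rep(c("gene1", "gene2", "gene3"), each = 4)

# (1) "Strong" subset: within each gene, keep the test whose largest |Z|
# across tissues is biggest (the "top" eQTL per gene).
max_abs_z <- apply(abs(z_all), 1, max)
top_rows <- vapply(split(seq_len(n_tests), gene),
                   function(i) i[which.max(max_abs_z[i])], integer(1))
z_strong <- z_all[top_rows, , drop = FALSE]

# (2) Random subset: sample rows uniformly without replacement, ignoring
# signal strength, so null and non-null tests appear in proportion.
n_random <- 5
random_rows <- sample(seq_len(n_tests), n_random)
z_random <- z_all[random_rows, , drop = FALSE]
```

Sampling uniformly, rather than thresholding on |Z|, is what keeps the random subset an unbiased picture of all tests.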

This reproducible R Markdown analysis was created with workflowr.