Last updated: 2017-03-04
Code version: f7ef5aa
Now we move on to handling the \(t\) likelihood. Suppose we have observations \[ (\hat\beta_1, \hat s_1, \nu_1), \ldots, (\hat\beta_n, \hat s_n, \nu_n) \] where \((\hat\beta_i, \hat s_i)\) are the summary statistics and \(\nu_i\) is the degrees of freedom used in estimating \(\hat s_i\). We now regard both \(\hat\beta_j\) and \(\hat s_j\) as jointly generated by the following data-generating mechanism.
\[ \begin{array}{c} \hat\beta_j |\beta_j, s_j \sim N(\beta_j, s_j^2) \\ \hat s_j^2 / s_j^2 |s_j, \nu_j \sim \chi_{\nu_j}^2 / \nu_j\\ \hat\beta_j \perp \hat s_j | \beta_j, s_j, \nu_j \end{array} \Rightarrow \begin{array}{c} (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j}\\ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \end{array} \] where “\(\perp\)” means “conditionally independent of,” and \(t_{\nu}(\mu)\) is the noncentral \(t\) distribution with \(\nu\) degrees of freedom and noncentrality parameter \(\mu\); \(t_{\nu}(0) = t_\nu\) is the standard \(t\) distribution.
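As a quick sanity check, the noncentral-\(t\) fact above can be verified by simulation in R; the values of \(\beta_j\), \(s_j\), and \(\nu_j\) below are arbitrary illustrative choices.

set.seed(1)
n <- 1e5
beta <- 0.8; s <- 0.5; nu <- 4           # arbitrary true effect, true sd, and df
betahat <- rnorm(n, beta, s)             # betahat | beta, s ~ N(beta, s^2)
shat <- s * sqrt(rchisq(n, nu) / nu)     # shat^2 / s^2 | s, nu ~ chi^2_nu / nu
# betahat / shat should then follow a noncentral t with df = nu and ncp = beta / s
qqplot(betahat / shat, rt(n, df = nu, ncp = beta / s),
       xlab = "simulated betahat / shat", ylab = "noncentral t sample")
abline(0, 1)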
Since we are now taking into account the randomness of both \(\hat\beta_j\) and \(\hat s_j\), or at least the fact that \(\hat s_j\) is not a precise measurement of \(s_j\), the model becomes trickier. Ideally we would hope to use the distribution
\[ (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j} \]
but it’s not clear how to use it. We need a likelihood for the data, either for \(\hat\beta_j\) alone or for \((\hat\beta_j, \hat s_j)\) jointly, and the distribution above doesn’t give us such a likelihood directly, since it specifies neither \(p(\hat\beta_j | \beta_j, \hat s_j)\) nor \(p(\hat\beta_j, \hat s_j | \beta_j)\).
ashr
The current implementation in ashr uses a simplification
\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \] As Matthew noted, it’s different from \((\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j}\). For one thing, under this approximation
\[ \hat\beta_j / \hat s_j | \beta_j, \hat s_j \sim \beta_j / \hat s_j + t_{\nu_j} \]
should be unimodal and symmetric, whereas the “true” distribution
\[
\hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j)
\] is a noncentral \(t\), which is not symmetric. So this might cause problems when we move to truncash, where we need the probability that \(|\hat\beta_j / \hat s_j|\) is smaller than some threshold (more on that below). However, it works satisfactorily with ashr in practice, and there seem to be no obviously better alternatives, as detailed below.
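The asymmetry can be seen directly in R by plotting the two densities of \(\hat\beta_j / \hat s_j\) for an illustrative \(\beta_j / s_j\) and \(\nu_j\) (values chosen arbitrarily).

# compare the shifted central t (ashr's simplification) with the noncentral t
ncp <- 2; nu <- 4                                   # illustrative beta/s and df
x <- seq(-4, 10, length.out = 400)
plot(x, dt(x - ncp, df = nu), type = "l", ylab = "density")
lines(x, dt(x, df = nu, ncp = ncp), lty = 2)        # the noncentral t is skewed
legend("topright", legend = c("beta/s + central t", "noncentral t"), lty = 1:2)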
Using the fact that
\[ (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j} \] and assuming \(\beta_j\) comes from a mixture of uniforms
\[ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k] \] we can integrate out \(\beta_j\) by convolving \(t_{\nu_j}\) with each uniform component \([a_k, b_k]\)
\[ \begin{array}{rl} &\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j)\text{d}\beta_j \\ =&\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j|\beta_j\sim \sum_k\pi_k\text{Unif}[a_k, b_k])\text{d}\beta_j\\ =&\sum_k\pi_k\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j|\beta_j\sim \text{Unif}[a_k, b_k])\text{d}\beta_j\\ =&\sum_k\pi_k \begin{cases} \frac{\hat s_j}{b_k - a_k}(F_{t_{\nu_j}}((\hat\beta_j - a_k) / \hat s_j) - F_{t_{\nu_j}}((\hat\beta_j - b_k) / \hat s_j)), & a_k < b_k \\ f_{t_{\nu_j}}((\hat\beta_j - a_k) / \hat s_j), & a_k = b_k \end{cases} \end{array} \] This is mathematically feasible, yet I cannot quite see what this “expected” probability density \(\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j)\text{d}\beta_j\) means or how it should be used. It turns out to be related to the pivotal likelihood idea, yet as Matthew put it, “ultimately we did not find it satisfying. It is not a well established concept and it is not clear to me that it ends up being a good idea.”
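For concreteness, here is a minimal R sketch of that convolution (not taken from any package code), evaluating the mixture density above for a single observation; the component grid a, b and weights pi are made-up illustrative values.

# density of (betahat - beta)/shat, with beta integrated over sum_k pi_k Unif[a_k, b_k]
conv.t.unif <- function(betahat, shat, nu, pi, a, b) {
  comp <- ifelse(a == b,
                 dt((betahat - a) / shat, df = nu),
                 shat / (b - a) * (pt((betahat - a) / shat, df = nu) -
                                     pt((betahat - b) / shat, df = nu)))
  sum(pi * comp)
}
# illustrative values only
conv.t.unif(betahat = 1.2, shat = 0.6, nu = 5,
            pi = c(0.5, 0.3, 0.2), a = c(0, -1, -2), b = c(0, 1, 2))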
Taking advantage of the conditional independence of \(\hat\beta\) and \(\hat s\) given \(\beta\) and \(s\), we can write a joint model as
\[ \begin{array}{c} p(\hat\beta_j, \hat s_j|\beta_j, s_j, \nu_j) = p(\hat\beta_j|\beta_j, s_j)p(\hat s_j|s_j, \nu_j)\\ \hat\beta_j|\beta_j, s_j \sim N(\beta_j, s_j^2)\\ \hat s_j^2|s_j, \nu_j \sim s_j^2\chi_{\nu_j}^2 /\nu_j\\ \beta_j \sim \sum_k\pi_kg_k^\beta\\ s_j \sim \sum_l\rho_lg_l^s \end{array} \]
This line of work on “joint ash” is being pursued by Mengyin.
The “better” approach is the one that Mengyin now takes: first apply vash to shrink the \(\hat s\), and then apply ashr with its currently-implemented \(t\) likelihood (taking \(\hat s\) as given), using the moderated \(\hat s\) (and moderated df). As Matthew noted, this approach can be formally justified, although the justification is not obvious. The reason is probably that, since \(\hat\beta\) and \(\hat s\) are conditionally independent given \(\beta\) and \(s\), we can model them separately and sequentially.
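A rough sketch of that two-step pipeline is below. It assumes the vashr package provides vash(sehat, df) with the moderated standard errors stored in sd.post; those interface details should be checked against vashr’s documentation, and the data here are simulated purely for illustration.

library(vashr)
library(ashr)
set.seed(2)
n <- 1000; nu <- 5; s <- 1
beta <- ifelse(runif(n) < 0.6, 0, rnorm(n))      # illustrative effects, 60% nulls
betahat <- rnorm(n, beta, s)
sehat <- s * sqrt(rchisq(n, nu) / nu)            # noisy standard errors
fit.vash <- vash(sehat, df = nu)                 # shrink the sehat (assumed interface)
se.mod <- fit.vash$sd.post                       # moderated standard errors (assumed field name)
fit.ash <- ash(betahat, se.mod, df = nu)         # ash's t likelihood, taking moderated se as given
# ideally the moderated df returned by vash would replace nu here as well
get_pi0(fit.ash)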
So the bottom line is that for truncash I think it suffices to implement the same \(t\) approach as ashr, since then we can use the same trick as in 4. Of course, for testing the implementation you will want to simulate from the assumed model
\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \]
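A minimal simulation from this assumed model might look like the following; the true prior on \(\beta_j\) and the sample size are illustrative choices, not taken from the original code.

# simulate from the assumed model: betahat = beta + shat * t_nu, with shat taken as given
set.seed(777)
n <- 1000
nu <- rep(5, n)
shat <- rep(1, n)
pi0 <- 0.6                                       # illustrative proportion of nulls
beta <- ifelse(runif(n) < pi0, 0, runif(n, -3, 3))
betahat <- beta + shat * rt(n, df = nu)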
truncash
As in the normal likelihood case, suppose we have \(m + n\) observations of \((\hat\beta_j, \hat s_j, \nu_j)\) in two groups such that, with a pre-specified threshold \(t_j\) (related to \(\nu_j\)) for each observation,
\[ \text{Group 1: }(\hat\beta_1, \hat s_1), \ldots, (\hat\beta_m, \hat s_m), \text{with } |\hat\beta_j/\hat s_j| \leq t_j \]
\[ \text{Group 2: }(\hat\beta_{m+1}, \hat s_{m+1}), \ldots, (\hat\beta_{m+n}, \hat s_{m+n}), \text{with } |\hat\beta_j/\hat s_j| > t_j \]
For Group 1, we’ll only use the information that for each one, \(|\hat\beta_j/\hat s_j| \leq t_j\); that is, they are moderate observations. For Group 2, we’ll use the full observation \((\hat\beta_j, \hat s_j, \nu_j)\).
Now for Group 2, the extreme group where we observe everything, we have the usual ASH with an approximate \(t_{\nu_j}\) likelihood and uniform mixture priors
\[ \begin{array}{c} \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j}\\ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k] \end{array} \]
For Group 1, the moderate group where the only available information is \(|\hat \beta_j / \hat s_j| \leq t_j\), we still use the same uniform mixture priors
\[
\beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k]
\] but two possible likelihoods. One comes from the current ashr implementation
\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \Rightarrow \hat\beta_j / \hat s_j | \beta_j, \hat s_j \sim \beta_j / \hat s_j + t_{\nu_j} \] based on a standard \(t\). The other approach comes from the fact
\[ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \approx t_{\nu_j}(\beta_j / \hat s_j) \]
based on a noncentral \(t\). Both involve some simplification and approximation, and presumably the choice shouldn’t make much difference in practice.
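To make the two options concrete, here is a small R snippet (illustrative, not from the package code) computing \(\Pr(|\hat\beta_j/\hat s_j| \leq t_j \mid \beta_j)\) for a single observation under each approximation.

# probability that an observation falls in the moderate group, under the two approximations
beta <- 0.5; shat <- 1; nu <- 5; t <- qt(0.975, df = nu)   # illustrative values
# (1) current ashr-style likelihood: betahat/shat | beta, shat ~ beta/shat + t_nu
p.central <- pt(t - beta / shat, df = nu) - pt(-t - beta / shat, df = nu)
# (2) noncentral t: betahat/shat | beta, s ~ t_nu(beta/s), approximating s by shat
p.noncentral <- pt(t, df = nu, ncp = beta / shat) - pt(-t, df = nu, ncp = beta / shat)
c(central = p.central, noncentral = p.noncentral)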
Then we put both groups together and estimate \(\pi_k\) from the marginal probability of the data in both groups.
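The following sketch shows one way the two groups could be combined to estimate \(\pi\), using the standard-\(t\) (ashr-style) approximation for both groups and a few plain EM updates; the simulated data, the uniform grid, and the EM details are illustrative assumptions, not the actual truncash implementation.

# combine both groups and estimate pi by EM under the standard-t approximation
set.seed(1)
n <- 500; nu <- rep(5, n); shat <- rep(1, n)
beta <- ifelse(runif(n) < 0.6, 0, runif(n, -3, 3))
betahat <- beta + shat * rt(n, df = nu)
thresh <- qt(0.975, df = nu)
moderate <- abs(betahat / shat) <= thresh            # Group 1 indicator
a <- c(0, -1, -2, -4); b <- c(0, 1, 2, 4)            # illustrative grid; first component is a point mass at 0
K <- length(a)
lik <- matrix(NA, n, K)
for (k in 1:K) {
  if (a[k] == b[k]) {
    dens <- dt((betahat - a[k]) / shat, df = nu) / shat
    prob <- pt(thresh - a[k] / shat, df = nu) - pt(-thresh - a[k] / shat, df = nu)
  } else {
    dens <- (pt((betahat - a[k]) / shat, df = nu) -
               pt((betahat - b[k]) / shat, df = nu)) / (b[k] - a[k])
    # Group 1: average P(|betahat/shat| <= thresh | beta) over beta ~ Unif[a_k, b_k]
    prob <- sapply(1:n, function(j)
      integrate(function(x) pt(thresh[j] - x / shat[j], df = nu[j]) -
                  pt(-thresh[j] - x / shat[j], df = nu[j]),
                a[k], b[k])$value / (b[k] - a[k]))
  }
  lik[, k] <- ifelse(moderate, prob, dens)            # Group 1 uses probabilities, Group 2 densities
}
pi <- rep(1 / K, K)
for (it in 1:200) {                                   # plain EM updates for the mixture weights
  w <- t(t(lik) * pi)
  w <- w / rowSums(w)
  pi <- colMeans(w)
}
round(pi, 3)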
source("../code/truncash.t.R")
source("../code/truncash.R")
betahat = rt(100, df = 3)
sebetahat = rep(1, 100)
fit.normal.original = truncash(betahat, sebetahat, t = qnorm(0.975))
get_pi0(fit.normal)
fit.normal.t = truncash.t(betahat, sebetahat, pval.thresh = 0.05, df = rep(2, 100), method = "fdr", mixcompdist = "uniform")
ashr::ash.workhorse(betahat, sebetahat, fixg = TRUE, g = fit.normal.t)
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.3.2
[5] htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.9 stringi_1.1.2
[9] rmarkdown_1.3 knitr_1.15.1 git2r_0.18.0 stringr_1.1.0
[13] digest_0.6.11 workflowr_0.3.0 evaluate_0.10