Handling t likelihood

Last updated: 2017-03-04

Code version: be223d3

Introduction

Now we are moving to handle \(t\) likelihood. Suppose we have observations \[ (\hat\beta_1, \hat s_1, \nu_1), \ldots, (\hat\beta_n, \hat s_n, \nu_n) \] \((\hat\beta_i, \hat s_i)\) being the summary statistics, and \(\nu_i\) the degree of freedom for estimating \(\hat s_i\). Now we are considering both \(\hat\beta_j\) and \(\hat s_j\) are jointly generated by a data generating mechanism as follows.

\[ \begin{array}{c} \hat\beta_j |\beta_j, s_j \sim N(\beta_j, s_j^2) \\ \hat s_j^2 / s_j^2 |s_j, \nu_j \sim \chi_{\nu_j}^2 / \nu_j\\ \hat\beta_j \perp \hat s_j | \beta_j, s_j, \nu_j \end{array} \Rightarrow \begin{array}{c} (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j}\\ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \end{array} \] where “\(\perp\)” means “conditionally independent with,” and \(t_{\nu}(\mu)\) is the noncentral \(t\) distribution with \(\nu\) degrees of freedom and noncentrality parameter \(\mu\). \(t_{\nu}(0) = t_\nu\) is the standard \(t\) distribution.

Models for \(t\) likelihood

Since now we are taking into consideration the randomness of both \(\hat\beta_j\) and \(\hat s_j\), or at least the fact that \(\hat s_j\) is not a precise measure of \(s_j\), the model could become trickier. Idealy we would hope to use the distribution

\[ (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j} \]

but it’s not clear how to use it. We need a likelihood for the data, either \(\hat\beta_j\) itself or \((\hat\beta_j, \hat s_j)\) jointly, but this aforementioned distribution doesn’t give us such a likelihood directly as it does not specify either \(p(\hat\beta_j | \beta_j, \hat s_j)\) or \(p(\hat\beta_j, \hat s_j | \beta_j)\).

1. Current implementation in `ashr`

The current implementation in ashr uses a simplication

\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \] As Matthew noted, it’s different from \((\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j}\). For one thing, under this approximation

\[ \hat\beta_j / \hat s_j | \beta_j, \hat s_j \sim \beta_j / \hat s_j + t_{\nu_j} \]

should be unimodal and symmetric, whereas the “true” distribution

\[ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \] is a noncentral \(t\) which is not symmetric. So it might have some problem going to truncash when we need to consider the probability of \(|\hat\beta_j / \hat s_j|\) smaller than some threshold (more on that below). However, it works satisfactorily well with ashr in practice, and there seems no obviously better alternatives, detailed below.

2. Pivotal likelihood

Using the fact that

\[ (\hat\beta_j - \beta_j) / \hat s_j | \beta_j\sim t_{\nu_j}\\ \] and assuming \(\beta_j\) from a mixture of uniform

\[ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k] \] we can integrate out \(\beta_j\) using convolution of \(t_{\nu_j}\) and each component of the mixture uniform \([a_k, b_k]\)

\[ \begin{array}{rl} &\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j)\text{d}\beta_j \\ =&\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j|\beta_j\sim \sum_k\pi_k\text{Unif}[a_k, b_k])\text{d}\beta_j\\ =&\sum_k\pi_k\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j|\beta_j\sim \text{Unif}[a_k, b_k])\text{d}\beta_j\\ =&\sum_k\pi_k \begin{cases} \frac{\hat s_j}{b_k - a_k}(F_{t_{\nu_j}}((\hat\beta_j - b_k) / \hat s_j) - F_{t_{\nu_j}}((\hat\beta_j - a_k) / \hat s_j)), & a_k < b_k \\ f_{t_{\nu_j}}((\hat\beta_j - a_k) / \hat s_j), & a_k = b_k \end{cases} \end{array} \] It’s mathematically feasible, yet I cannot quite recognize the meaning of this “expected” probability density \(\int p((\hat\beta_j - \beta_j) / \hat s_j | \beta_j) p(\beta_j)\text{d}\beta_j\) and how to use it. It turns out to be related with the pivotal likelihood idea, yet as Matthew put it, “ultimately we did not find it satisfying. It is not a well established concept and it is not clear to me that it ends up being a good idea.”

3. Joint ash: Jointly modeling \(\hat\beta\) and \(\hat s\)

Taking advantage of the conditional indepedence of \(\hat\beta\) and \(\hat s\) given \(\beta\) and \(s\), we can write a model as

\[ \begin{array}{c} p(\hat\beta_j, \hat s_j|\beta_j, s_j, \nu_j) = p(\hat\beta_j|\beta_j, s_j)p(\hat s_j|s_j, \nu_j)\\ \hat\beta_j|\beta_j, s_j \sim N(\beta_j, s_j^2)\\ \hat s_j|s_j, \nu_j \sim s_j^2\chi_{\nu_j}^2 /\nu_j\\ \beta_j \sim \sum_k\pi_kg_k^\beta\\ s_j \sim \sum_l\rho_lg_l^s \end{array} \]

This line of “joint ash” is being done by Mengyin.

4. Sequential joint ash

The “better” approach is the one that Mengyin now takes. First appy vash to shrink the \(\hat s\), and then apply ashr with its currently-implemented \(t\) likelihood (taking \(\hat s\)) as given) using moderated \(\hat s\) (and moderated df). This approach can be formally justified, although not obvious, as Matthew noted. Probably the reason is that since \(\hat\beta\) and \(\hat s\) are conditionally independent given \(\beta\) and \(s\), we could model them separately and sequentially.

5. Matthew’s recommendation

So the bottom line is that for truncash I think it suffices to implement the same \(t\) approach as ashr, since then we can use the same trick as in 4. Of course, for testing the implementation you will want to simulate from the assumed model

\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \]

Moving to `truncash`

Problem setting

As in Normal likelihood case, suppose we have \(m + n\) observations of \((\hat\beta_j, \hat s_j, \nu_j)\) in two groups such that, with a pre-specified \(t_j\) (related to \(\nu_j\)) for each observation

\[ \text{Group 1: }(\hat\beta_1, \hat s_1), \ldots, (\hat\beta_m, \hat s_m), \text{with } |\hat\beta_j/\hat s_j| \leq t_j \]

\[ \text{Group 2: }(\hat\beta_{m+1}, \hat s_{m+1}), \ldots, (\hat\beta_{m+n}, \hat s_{m+n}), \text{with } |\hat\beta_j/\hat s_j| > t_j \]

For Group 1, we’ll only use the information that for each one, \(|\hat\beta_j/\hat s_j| \leq t_j\); that is, they are moderate observations. For Group 2, we’ll use the full observation \((\hat\beta_j, \hat s_j, \nu_j)\).

The extreme group: business as usual

Now for Group 2, the extreme group where we observe the whole thing, we have usual ASH with an approximate \(t_{\nu_j}\) likelihood and uniform mixture priors

\[ \begin{array}{c} \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j}\\ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k] \end{array} \]

The moderate group: two possible ways

For Group 1, the extreme group where the relevant information is \(|\hat \beta_j / \hat s_j| \leq t_j\), we still use the same uniform mixture priors

\[ \beta_j \sim \sum_k\pi_k \text{Unif}[a_k, b_k] \] yet two possible likelihood. One comes from the current ashr implementation

\[ \hat\beta_j | \beta_j, \hat s_j \sim \beta_j + \hat s_j t_{\nu_j} \Rightarrow \hat\beta_j / \hat s_j | \beta_j, \hat s_j \sim \beta_j / \hat s_j + t_{\nu_j} \] based on a standard \(t\). The other approach comes from the fact

\[ \hat\beta_j / \hat s_j | \beta_j, s_j \sim t_{\nu_j}(\beta_j / s_j) \approx t_{\nu_j}(\beta_j / \hat s_j) \]

based on a noncentral \(t\). Both use some simplification and approximation, and presumably it shouldn’t make much difference in practice.

The rest: back to business as usual

Then we put both groups together and estimate \(\pi_k\) from the marginal probability of the data in both groups.

Code

Simulation

source("../code/truncash.t.R")
source("../code/truncash.R")
betahat = rt(100, df = 3)
sebetahat = rep(1, 100)
fit.normal.original = truncash(betahat, sebetahat, t = qnorm(0.975))
get_pi0(fit.normal)
fit.normal.t = truncash.t(betahat, sebetahat, pval.thresh = 0.05, df = rep(2, 100), method = "fdr", mixcompdist = "uniform")
ashr::ash.workhorse(betahat, sebetahat, fixg = TRUE, g = fit.normal.t)

Session Information

sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
 [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.9     stringi_1.1.2  
 [9] rmarkdown_1.3   knitr_1.15.1    git2r_0.18.0    stringr_1.1.0  
[13] digest_0.6.11   workflowr_0.3.0 evaluate_0.10

This R Markdown site was created with workflowr

Handling \(t\) likelihood

Lei Sun

2017-03-01

Introduction

Models for \(t\) likelihood

1. Current implementation in `ashr`

2. Pivotal likelihood

3. Joint ash: Jointly modeling \(\hat\beta\) and \(\hat s\)

4. Sequential joint ash

5. Matthew’s recommendation

Moving to `truncash`

Problem setting

The extreme group: business as usual

The moderate group: two possible ways

The rest: back to business as usual

Code

Simulation

Session Information

Handling \(t\) likelihood

Lei Sun

2017-03-01

Introduction

Models for \(t\) likelihood

1. Current implementation in ashr

2. Pivotal likelihood

3. Joint ash: Jointly modeling \(\hat\beta\) and \(\hat s\)

4. Sequential joint ash

5. Matthew’s recommendation

Moving to truncash

Problem setting

The extreme group: business as usual

The moderate group: two possible ways

The rest: back to business as usual

Code

Simulation

Session Information

1. Current implementation in `ashr`

Moving to `truncash`