Last updated: 2017-11-07
Code version: 2c05d59
Up until now, truncash has only used a pre-specified threshold, that is, one chosen independently of the data. So a natural question is: what happens if we choose a data-driven threshold, such as the \(n^\text{th}\) most extreme observation or the top \(q\%\) quantile?
As a starting point, Matthew raised the question: what if the only things we know are the most extreme observation \((\hat\beta_{(n)}, \hat s_{(n)})\) and the total number of observations \(n\)? What does this single data point tell us?
Start with our usual ash
model.
\[ \begin{array}{c} \hat\beta_j | \hat s_j, \beta_j \sim N(\beta_j, \hat s_j^2)\\ \beta_j \sim \sum_k\pi_k N(0, \sigma_k^2) \end{array} \] Now we only observe \((\hat\beta_{(n)}, \hat s_{(n)})\), along with the information that \(|\hat\beta_{(n)}/\hat s_{(n)}| \geq |\hat\beta_{j}/\hat s_{j}|\), \(j = 1, \ldots, n\). This essentially separates the \(n\) observations into two groups.
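As a concrete illustration (not part of the original analysis), here is a minimal R sketch of this setup, with an illustrative two-component \(g\) and hypothetical standard errors; it simulates from the model and extracts the single data point we are allowed to see.

## minimal simulation sketch of the ash model above; all parameter values are
## illustrative, not taken from the original analysis
set.seed(777)
n <- 1e4
pi_k <- c(0.9, 0.1)                      # illustrative mixture weights
sigma_k <- c(0, 2)                       # point mass at 0 plus N(0, 2^2)
k <- sample(1:2, n, replace = TRUE, prob = pi_k)
beta <- rnorm(n, 0, sigma_k[k])          # beta_j ~ sum_k pi_k N(0, sigma_k^2)
sehat <- sqrt(rchisq(n, df = 5) / 5)     # hypothetical standard errors
betahat <- rnorm(n, beta, sehat)         # betahat_j | sehat_j, beta_j ~ N(beta_j, sehat_j^2)
z <- betahat / sehat
idx <- which.max(abs(z))                 # index of the most extreme observation
c(betahat = betahat[idx], sehat = sehat[idx], z = z[idx], n = n)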
\[
\text{Group 1: }(\hat\beta_{(1)}, \hat s_{(1)}), \ldots, (\hat\beta_{(n - 1)}, \hat s_{(n - 1)}), \text{ with } |\hat\beta_j/\hat s_j| \leq t = |\hat\beta_{(n)}/\hat s_{(n)}|
\] \[
\text{Group 2: }(\hat\beta_{(n)}, \hat s_{(n)}), \text{ with } |\hat\beta_{(n)}/\hat s_{(n)}| = t
\] In other words, it should be related to truncash with the threshold \(t = |\hat\beta_{(n)}/\hat s_{(n)}|\), at least from the point of view of the likelihood principle.
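Continuing the same illustrative sketch, the data-driven threshold and the two groups could be formed as follows (this only illustrates the grouping, not the actual truncash interface).

## split the simulated observations into the two groups implied by treating the
## most extreme |z| as the threshold t (illustrative sketch only)
t <- max(abs(z))
group2 <- which(abs(z) == t)              # the single most extreme observation
group1 <- setdiff(seq_len(n), group2)     # the remaining n - 1 moderate observations
stopifnot(all(abs(z[group1]) <= t))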
Suppose \(X_1 \sim F_1, X_2\sim F_2, \ldots, X_n \sim F_n\), with \(F_i\) being the cdf of the random variable \(X_i\), whose pdf is \(f_i\). In ash’s setting, we can think of \(X_i = |\hat\beta_i/ \hat s_i|\), so that \(f_i\) is the convolution of a common unimodal distribution \(g\) (to be estimated) and the idiosyncratic likelihood of \(|\hat\beta_i / \hat s_i|\) given \(\hat s_i\) (usually related to the normal or Student’s t, but it could be generalized to others). Let \(X_{(n)}:=\max\{X_1, X_2, \ldots, X_n\}\), the extreme value of these \(n\) random variables.
\[ \begin{array}{rl} & P(X_{(n)} \leq t) = \prod_{i = 1}^n F_i(t) \\ \Rightarrow & p_{X_{(n)}}(t) = dP(X_{(n)} \leq t)/dt = \sum_{j = 1}^n f_j(t)\prod_{i \neq j} F_i(t) \neq \prod_{i = 1}^{n-1} F_i(t)f_n(t) \end{array} \] where \(\{1, \ldots, n-1\}\) is the index set of the less extreme observations and \(n\) that of the most extreme one. So in general these two statements are not equivalent.
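A quick numeric check with three hypothetical independent normal variables of different scales (all values illustrative) makes the gap between the two expressions concrete.

## density of the maximum by the product rule vs. prod_{i<n} F_i(t) * f_n(t),
## for three independent N(0, sd_i^2) variables (hypothetical example)
sds <- c(1, 2, 3)
t0 <- 1.5
Fs <- sapply(sds, function(s) pnorm(t0, sd = s))   # F_i(t0)
fs <- sapply(sds, function(s) dnorm(t0, sd = s))   # f_i(t0)
density_max <- sum(sapply(seq_along(sds), function(j) fs[j] * prod(Fs[-j])))
naive_guess <- prod(Fs[-length(sds)]) * fs[length(sds)]
c(density_max = density_max, naive_guess = naive_guess)   # the two numbers differ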
If \(F_1 = F_2 = \cdots = F_n = F\), however, the two statements are indeed closely related, because \[
\begin{array}{rl}
& P(X_{(n)} \leq t) = (F(t))^n \\
\Rightarrow & p_{X_{(n)}}(t) = dP(X_{(n)} \leq t)/dt =
n(F(t))^{n-1}f(t) \\
\propto & (F(t))^{n-1}f(t)\\
\end{array}
\] In other words, we can regard “knowing only the largest observation” as equivalent to “using the largest observation as the threshold in truncash.”
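As a small Monte Carlo sanity check (a sketch, not from the original code), take iid \(N(0,1)\) z-scores so that \(X_i = |z_i|\), \(F(t) = 2\Phi(t) - 1\) and \(f(t) = 2\phi(t)\); the simulated density of the maximum matches \(n(F(t))^{n-1}f(t)\).

## Monte Carlo check of p_{X_(n)}(t) = n F(t)^{n-1} f(t) in the iid case,
## with X_i = |z_i| and z_i iid N(0,1) (illustrative sketch)
set.seed(1)
n_iid <- 20
m <- 1e4
maxima <- apply(matrix(abs(rnorm(n_iid * m)), m, n_iid), 1, max)
hist(maxima, breaks = 50, freq = FALSE, main = "largest |z| of 20 iid N(0,1) z-scores")
curve(n_iid * (2 * pnorm(x) - 1)^(n_iid - 1) * 2 * dnorm(x), add = TRUE, col = "red")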
In the current setting, \(F_1 = F_2 = \cdots = F_n\) means that \(\hat\beta_j / \hat s_j\) has the same marginal distribution for every observation. This is actually not a wild assumption. For example, we always have
\[ \hat\beta_j / \hat s_j | \beta_j, s_j, \nu_j \sim t_{\nu_j}(\beta_j / s_j) \] If we further assume
\[
\beta_j / s_j \sim g
\] then \(\hat\beta_j / \hat s_j\) indeed has the same marginal distribution for every observation. This assumption is essentially the gold standard that everybody implicitly makes, referred to as the \(\alpha = 1\) assumption in ash.
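A short sketch of this assumption in the normal-likelihood special case (taking \(\nu_j = \infty\) and \(s_j = \hat s_j\), with an illustrative \(g = N(0, 2^2)\)): however much the standard errors vary, the z-scores of the small-\(\hat s\) and large-\(\hat s\) observations share the same marginal distribution.

## sketch of the alpha = 1 assumption with a normal likelihood; g is illustrative
set.seed(2)
n2 <- 1e5
se2 <- exp(rnorm(n2))                  # hypothetical, widely varying standard errors
beta2 <- se2 * rnorm(n2, 0, 2)         # beta_j / s_j ~ g = N(0, 2^2), independent of s_j
betahat2 <- rnorm(n2, beta2, se2)      # betahat_j | sehat_j, beta_j ~ N(beta_j, sehat_j^2)
z2 <- betahat2 / se2
## two-sample test: z-scores of the small-se half vs. the large-se half
ks.test(z2[se2 < median(se2)], z2[se2 >= median(se2)])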
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 backports_1.1.1 magrittr_1.5 rprojroot_1.2
[5] tools_3.4.2 htmltools_0.3.6 yaml_2.1.14 Rcpp_0.12.13
[9] stringi_1.1.5 rmarkdown_1.6 knitr_1.17 git2r_0.19.0
[13] stringr_1.2.0 digest_0.6.12 workflowr_0.7.0 evaluate_0.10.1
This R Markdown site was created with workflowr