Last updated: 2017-03-04
Code version: be223d3
Up until now, truncash only uses a threshold that’s pre-specified, that is, independent with the data. So a natrual question is, what will happen if we choose a threshold that is data driven, such as the \(n^\text{th}\) most extreme observation or the top \(q\%\) quantile?
For a start, Matthew had an idea that what if the only thing we know is the most extreme observation \((\hat\beta_{(n)}, \hat s_{(n)})\), as well as the total number of observations \(n\). What does this single data point tell us?
Start with our usual ash model.
\[ \begin{array}{c} \hat\beta_j | \hat s_j, \beta_j \sim N(\beta_j, \hat s_j^2)\\ \beta_j \sim \sum_k\pi_k N(0, \sigma_k^2) \end{array} \] Now we only observe \((\hat\beta_{(n)}, \hat s_{(n)})\) with the information that \(|\hat\beta_{(n)}/\hat s_{(n)}| \geq |\hat\beta_{j}/\hat s_{j}|\), \(j = 1, \ldots, n\). This is essentially separating \(n\) observations into two groups.
\[
\text{Group 1: }(\hat\beta_{(1)}, \hat s_{(1)}), \ldots, (\hat\beta_{(n - 1)}, \hat s_{(n - 1)}), \text{ with } |\hat\beta_j/\hat s_j| \leq t = |\hat\beta_{(n)}/\hat s_{(n)}|
\] \[
\text{Group 2: }(\hat\beta_{n}, \hat s_{n}), \text{ with } |\hat\beta_{(n)}/\hat s_{(n)}| = t
\] Or in other words, it should be related to truncash using the threshold \(t = |\hat\beta_{(n)}/\hat s_{(n)}|\), at least from the likelihood principle point of view.
Suppose \(X_1 \sim F_1, X_2\sim F_2, \ldots, X_n \sim F_n\), with \(F_i\) being the cdf of the random variable \(X_i\), with a pdf \(f_i\). In ash’s setting, we can think of \(X_i = |\hat\beta_i/ \hat s_i|\), and \(f_i\) is the convolution of a common unimodel distribution \(g\) (to be estimated) and the idiosyncratic likelihood of \(|\hat\beta_j / \hat s_j|\) given \(\hat s_j\) (usually related to normal or Student’s t, but could be generalized to others). Let \(X_{(n)}:=\max\{X_1, X_2, \ldots, X_n\}\), the extreme value of these \(n\) random variables.
\[ \begin{array}{rl} & P(X_{(n)} \leq t) = \prod_{i = 1}^n F_i(t) \\ \Rightarrow & p_{X_{(n)}}(t) = dP(X_{(n)} \leq t)/dt \neq \prod_{i = 1}^{n-1} F_i(t)f_n(t) \end{array} \] where \(\{1, \ldots, n-1\}\) are the index set of less extreme observations and \(n\) of the most extreme one. So these two statements are not equivalent.
If we have \(F_1 = F_2 = \cdots = F_n\), the two statements are somehow indeed related because \[
\begin{array}{rl}
& P(X_{(n)} \leq t) = (F(t))^n \\
\Rightarrow & p_{X_{(n)}}(t) = dP(X_{(n)} \leq t)/dt =
n(F(t))^{n-1}f(t) \\
 \propto & (F(t))^{n-1}f(t)\\
\end{array}
\] In other words, we can regard “known the largest observation only” as equivalent to “using the largest observation as the threshold in truncash.”
\(F_1 = F_2 = \cdots = F_n\) in current setting implies that \(\hat\beta_j / \hat s_j\) has the same marginal distribution for every observation. Actually it’s not a wild assumption. For example, we always have
\[ \hat\beta_j / \hat s_j | \beta_j, s_j, \nu_j \sim t_{\nu_j}(\beta_j / s_j) \] If we further assume
\[
\beta_j / s_j \sim g
\] then we’ll arrive at the result that \(\hat\beta_j / \hat s_j\) has the same marginal distribution. This assumption is essentially the gold standard everybody implicitly makes, refered to as \(\alpha = 1\) assumption in ash.
sessionInfo()R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
loaded via a namespace (and not attached):
 [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
 [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.9     stringi_1.1.2  
 [9] rmarkdown_1.3   knitr_1.15.1    git2r_0.18.0    stringr_1.1.0  
[13] digest_0.6.11   workflowr_0.3.0 evaluate_0.10  This R Markdown site was created with workflowr