Last updated: 2017-04-19
Code version: 020da62
ASH fit less well

Diagnostic plots for ASH can be used to measure its goodness of fit. Breaking any of the assumptions behind ASH makes the fit worse, as shown by the non-uniformity of \(\left\{\hat F_j\right\}\); several such cases have been discussed previously.

A special case closely related to this project concerns correlation. As shown before, the presence of correlation can hurt the accuracy of estimating \(g\). Moreover, correlation inflates the empirical distribution of the correlated noise and makes it less regular, and therefore harder to capture with a mixture of normals or uniforms as in ASH.
BH-hostile data sets

Here we take \(1K\) simulated, correlated null data sets, each containing \(10K\) \(z\) scores and the \(p\) values associated with \(10K\) genes. We select the data sets most hostile to the BH procedure – those producing at least one false discovery under BH at \(\alpha = 0.05\) – and then feed \(\hat\beta_j = z_j\), \(\hat s_j \equiv 1\) to ASH, using a prior of normal mixtures and of uniform mixtures, respectively.
If these \(z\) scores were independent, ASH should be able to estimate \(\hat g = \delta_0\) reasonably well. With correlation, however, the estimated \(\hat g\) will differ from \(\delta_0\); moreover, even with this estimated \(\hat g\), \(\left\{\hat F_j\right\}\) might not behave like \(\text{Unif}\left[0, 1\right]\).
This is because the pseudo-effects created by correlation-induced inflation are irregular and unlikely to be captured by unimodal mixtures of normals or uniforms. For example, in Matthew’s initial observation that inspired this whole project in the first place, the correlation inflated only the moderate observations but not the extreme ones. A mixture of normals, which inflates the moderate and the extreme observations simultaneously, is therefore ill-suited to such a case.
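For concreteness, here is one plausible reading of the diagnostic quantity – this page does not define \(\hat F_j\) explicitly, so treat this as an assumed reconstruction rather than the exact definition used in the code. With a normal-mixture \(\hat g = \sum_k \hat\pi_k\, N(\mu_k, \sigma_k^2)\) and \(\hat s_j \equiv 1\), the fitted predictive CDF of each observation is

\[
\hat F_j = \sum_k \hat\pi_k \, \Phi\!\left(\frac{z_j - \mu_k}{\sqrt{\sigma_k^2 + 1}}\right),
\]

which should look approximately \(\text{Unif}\left[0, 1\right]\) across \(j\) when the model fits; the irregular, correlation-induced inflation described above shows up as systematic departures from uniformity.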
library(ashr)

# Simulated correlated null z scores and their associated p values
z = read.table("../output/z_null_liver_777.txt")
p = read.table("../output/p_null_liver_777.txt")
n = nrow(z)
m = ncol(z)
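Below is a minimal sketch, not necessarily the exact code that produced the results further down, of how the BH-hostile data sets could be selected and fed to ash. It assumes each row of z / p holds one simulated null data set; alpha, n.fd, hostile, z1, fit.normal, and fit.uniform are illustrative names introduced here.

alpha = 0.05
# Count BH discoveries per data set; since every data set is null,
# each discovery is a false discovery.
n.fd = sapply(seq_len(n), function(i) {
  sum(p.adjust(as.numeric(p[i, ]), method = "BH") <= alpha)
})
hostile = which(n.fd >= 1)
hostile = hostile[order(n.fd[hostile], decreasing = TRUE)]

# Fit ASH to one hostile data set with betahat = z and sebetahat = 1,
# once with a normal-mixture and once with a uniform-mixture prior.
z1 = as.numeric(z[hostile[1], ])
fit.normal = ash(z1, rep(1, length(z1)), mixcompdist = "normal")
fit.uniform = ash(z1, rep(1, length(z1)), mixcompdist = "uniform")
c(normal = get_pi0(fit.normal), uniform = get_pi0(fit.uniform))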
It’s worth noting that for the same data set, that is, for the same correlated \(z\) scores, the ASH diagnostic plots using a mixture-of-uniforms prior are usually conspicuously better than those using a mixture-of-normals prior. This may be because uniform mixtures are more flexible than normal mixtures at handling irregular distributions, such as the “moderates inflated yet extremes not inflated” one. For the same reason, mixcompdist = "halfuniform" is occasionally better than mixcompdist = "uniform".
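Continuing the sketch above (again only an illustration, assuming ashr's get_fitted_g returns a normal mixture with pi, mean, and sd components when mixcompdist = "normal"; Fhat and fit.halfuniform are names introduced here), the fitted predictive distribution of each \(z_j\) is itself a normal mixture, so \(\hat F_j\) can be computed in closed form and checked against \(\text{Unif}\left[0, 1\right]\):

g = get_fitted_g(fit.normal)   # normal-mixture ghat: components pi, mean, sd
Fhat = sapply(z1, function(zj)
  sum(g$pi * pnorm(zj, mean = g$mean, sd = sqrt(g$sd^2 + 1))))
hist(Fhat, breaks = 50, main = "Fhat under the normal-mixture prior")
ks.test(Fhat, "punif")         # rough check of departure from uniformity

# For comparison, the halfuniform prior discussed above
fit.halfuniform = ash(z1, rep(1, length(z1)), mixcompdist = "halfuniform")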
N0. 1 : Data Set 355 ; Number of False Discoveries: 639 ; pihat0 = 0.01079742
N0. 2 : Data Set 327 ; Number of False Discoveries: 489 ; pihat0 = 0.0477939
N0. 3 : Data Set 23 ; Number of False Discoveries: 408 ; pihat0 = 0.008419777
N0. 4 : Data Set 122 ; Number of False Discoveries: 331 ; pihat0 = 0.01544099
N0. 5 : Data Set 783 ; Number of False Discoveries: 170 ; pihat0 = 0.01516697
N0. 6 : Data Set 749 ; Number of False Discoveries: 114 ; pihat0 = 0.01482747
N0. 7 : Data Set 724 ; Number of False Discoveries: 79 ; pihat0 = 0.01451062
N0. 8 : Data Set 56 ; Number of False Discoveries: 35 ; pihat0 = 0.03811803
N0. 9 : Data Set 840 ; Number of False Discoveries: 28 ; pihat0 = 0.01433503
N0. 10 : Data Set 858 ; Number of False Discoveries: 16 ; pihat0 = 0.01340537
N0. 11 : Data Set 771 ; Number of False Discoveries: 12 ; pihat0 = 0.04069309
N0. 12 : Data Set 389 ; Number of False Discoveries: 11 ; pihat0 = 0.02694463
N0. 13 : Data Set 485 ; Number of False Discoveries: 9 ; pihat0 = 0.03190266
N0. 14 : Data Set 77 ; Number of False Discoveries: 7 ; pihat0 = 0.0240265
N0. 15 : Data Set 503 ; Number of False Discoveries: 7 ; pihat0 = 0.1290869
N0. 16 : Data Set 984 ; Number of False Discoveries: 7 ; pihat0 = 0.04373679
N0. 17 : Data Set 360 ; Number of False Discoveries: 6 ; pihat0 = 0.01242637
N0. 18 : Data Set 522 ; Number of False Discoveries: 4 ; pihat0 = 0.02361336
N0. 19 : Data Set 51 ; Number of False Discoveries: 3 ; pihat0 = 0.07300305
N0. 20 : Data Set 316 ; Number of False Discoveries: 3 ; pihat0 = 0.1078068
N0. 21 : Data Set 663 ; Number of False Discoveries: 3 ; pihat0 = 0.1696198
N0. 22 : Data Set 274 ; Number of False Discoveries: 2 ; pihat0 = 0.08298182
N0. 23 : Data Set 901 ; Number of False Discoveries: 2 ; pihat0 = 0.9997685
N0. 24 : Data Set 912 ; Number of False Discoveries: 2 ; pihat0 = 0.08821883
N0. 25 : Data Set 22 ; Number of False Discoveries: 1 ; pihat0 = 0.0328122
N0. 26 : Data Set 31 ; Number of False Discoveries: 1 ; pihat0 = 0.09577454
N0. 27 : Data Set 187 ; Number of False Discoveries: 1 ; pihat0 = 0.6662317
N0. 28 : Data Set 248 ; Number of False Discoveries: 1 ; pihat0 = 0.4583919
N0. 29 : Data Set 269 ; Number of False Discoveries: 1 ; pihat0 = 0.03060497
N0. 30 : Data Set 285 ; Number of False Discoveries: 1 ; pihat0 = 0.5178013
N0. 31 : Data Set 403 ; Number of False Discoveries: 1 ; pihat0 = 0.03166538
N0. 32 : Data Set 483 ; Number of False Discoveries: 1 ; pihat0 = 0.9998828
N0. 33 : Data Set 501 ; Number of False Discoveries: 1 ; pihat0 = 0.09362328
N0. 34 : Data Set 530 ; Number of False Discoveries: 1 ; pihat0 = 0.4320548
N0. 35 : Data Set 574 ; Number of False Discoveries: 1 ; pihat0 = 0.3723755
N0. 36 : Data Set 575 ; Number of False Discoveries: 1 ; pihat0 = 0.03307994
N0. 37 : Data Set 643 ; Number of False Discoveries: 1 ; pihat0 = 0.160838
N0. 38 : Data Set 769 ; Number of False Discoveries: 1 ; pihat0 = 0.1227421
N0. 39 : Data Set 778 ; Number of False Discoveries: 1 ; pihat0 = 0.05117821
N0. 40 : Data Set 817 ; Number of False Discoveries: 1 ; pihat0 = 1
N0. 41 : Data Set 837 ; Number of False Discoveries: 1 ; pihat0 = 0.828704
N0. 42 : Data Set 897 ; Number of False Discoveries: 1 ; pihat0 = 0.7555982
N0. 43 : Data Set 923 ; Number of False Discoveries: 1 ; pihat0 = 0.01590919
N0. 44 : Data Set 955 ; Number of False Discoveries: 1 ; pihat0 = 0.9998902
N0. 45 : Data Set 971 ; Number of False Discoveries: 1 ; pihat0 = 0.1368012
N0. 46 : Data Set 997 ; Number of False Discoveries: 1 ; pihat0 = 0.1018171
The simulation shows that, oftentimes, ASH – no matter what kind of mixture prior is used – produces a poor fit to the pseudo-effects created by correlation-induced inflation. Occasionally, however, when uniform or halfuniform is used as the mixture component of the prior, the model fits well, as indicated by the uniformity of \(\left\{\hat F_j\right\}\).
sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.3.3
[5] htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.10 stringi_1.1.2
[9] rmarkdown_1.3 knitr_1.15.1 git2r_0.18.0 stringr_1.2.0
[13] digest_0.6.11 evaluate_0.10
This R Markdown site was created with workflowr