Last updated: 2018-04-26
Loading required package: ashr
corrplot 0.84 loaded
The data contains 10 conditions with 10% non-null samples. For the non-null samples, it has equal effects in the first c conditions.
Let L be the contrast matrix that subtract mean from each sample.
\[\hat{\delta}_{j}|\delta_{j} \sim N(\delta_{j}, \frac{1}{2}LL')\] 90% of the true deviations are 0. 10% of the deviation \(\delta_{j}\) has correlation that the first c conditions are negatively correlated with the rest conditions.
We set \(c = 2\).
set.seed(1)
R = 10
C = 2
data = sim.mean.sig(nsamp=10000, ncond=C)
L = matrix(-1/R, R, R)
L[cbind(1:R,1:R)] = (R-1)/R
L.10 = L[1:(R-1),]
row.names(L.10) = seq(1,R-1)
mash_data = mash_set_data(Bhat=data$Chat, Shat=data$Shat)
mash_data_L.10 = mash_set_data_contrast(mash_data, L.10)
U.c = cov_canonical(mash_data_L.10)
# data driven
# select max
m.1by1 = mash_1by1(mash_data_L.10)
strong = get_significant_results(m.1by1,0.05)
# center Z
mash_data_L.center = mash_data_L.10
mash_data_L.center$Bhat = mash_data_L.10$Bhat/mash_data_L.10$Shat # obtain z
mash_data_L.center$Shat = matrix(1, nrow(mash_data_L.10$Bhat),ncol(mash_data_L.10$Bhat))
mash_data_L.center$Bhat = apply(mash_data_L.center$Bhat, 2, function(x) x - mean(x))
U.pca = cov_pca(mash_data_L.center,2,strong)
U.ed = cov_ed(mash_data_L.center, U.pca, strong)
mashcontrast.model.10 = mash(mash_data_L.10, c(U.c, U.ed), algorithm.version = 'R', verbose = FALSE)
Using mashcommonbaseline, there are 288 discoveries. The covariance structure found here is:
barplot(get_estimated_pi(mashcontrast.model.10),las = 2, cex.names = 0.7)
 The correlation for PCA1 is: 
The correlation identified here is correct.
If we subtract the mean from the data directly \[Var(\hat{c}_{j,r}-\bar{\hat{c}_{j}}) = \frac{1}{2} - \frac{1}{2R}\]
Indep.data.10 = mash_set_data(Bhat = mash_data_L.10$Bhat,
                           Shat = matrix(sqrt(0.5-1/(2*R)), nrow(data$Chat), R-1))
Indep.model.10 = mash(Indep.data.10, c(U.c, U.ed), algorithm.version = 'R', verbose = FALSE)
There are 336 discoveries, which is more than the mashcommonbaseline model. The covariance structure found here is:
barplot(get_estimated_pi(Indep.model.10),las = 2, cex.names = 0.7)
 The weights for covariances are very different.
The correlation for PCA2 and tPCA is: 
The RRMSE plot:
delta = data$C %*% t(L.10)
barplot(c(sqrt(mean((delta - mashcontrast.model.10$result$PosteriorMean)^2)/mean((delta - data$Chat%*%t(L.10))^2)), sqrt(mean((delta - Indep.model.10$result$PosteriorMean)^2)/mean((delta - data$Chat%*%t(L.10))^2))), ylim=c(0,0.2), names.arg = c('mashcommonbaseline', 'mash.indep'), ylab='RRMSE')

We check the False Positive Rate and True Positive Rate. \[FPR = \frac{|N\cap S|}{|N|} \quad TPR = \frac{|CS\cap S|}{|T|} \]
CS_S = function(model, thresh=0.05, data){
  sig.index = model$result$lfsr <= thresh
  sum(sig.index * model$result$PosteriorMean * data > 0)
}
N_S = function(model, thresh=0.05, data){
  N.index = data == 0
  sig.index = model$result$lfsr <= thresh
  sum(sig.index * N.index)
}
delta = data$C %*% t(L.10)
delta[abs(delta) <= 1e-14] = 0
N = sum(delta == 0)
Tr = nrow(delta) * ncol(delta) - N
thresh.seq = seq(0, 1, by=0.0005)[-1]
mashcontrast = matrix(0,length(thresh.seq), 2)
Indep = matrix(0,length(thresh.seq), 2)
colnames(mashcontrast) = c('TPR', 'FPR')
colnames(Indep) = c('TPR', 'FPR')
for(t in 1:length(thresh.seq)){
  mashcontrast[t,] = c(CS_S(mashcontrast.model.10, thresh.seq[t], delta)/Tr, N_S(mashcontrast.model.10, thresh.seq[t],delta)/N)
  Indep[t,] = c(CS_S(Indep.model.10, thresh.seq[t], delta)/Tr,  N_S(Indep.model.10, thresh.seq[t],delta)/N)
}

These methods are similar in terms of the number of false positives versus true positive. The mashcommonbaseline model is slightly better than mash.indep model.
The data was generated with signals in the first c conditions (\(c_{j,1}, \cdots, c_{j,c}\)). The contrast matrix L used here discards the last condition. The deviations are \(\hat{c}_{j,1} - \bar{\hat{c}_{j}}, \hat{c}_{j,2} - \bar{\hat{c}_{j}}, \cdots, \hat{c}_{j,R-1} - \bar{\hat{c}_{j}}\).
However, the contrast matrix L can discard any deviation from \(\hat{c}_{j,1} - \bar{\hat{c}_{j}}, \cdots, \hat{c}_{j,R} - \bar{\hat{c}_{j}}\). The choice of the discarded deviation could influence the reuslt.
We run the same model with L that discard the first deviation.
L.1 = L[2:R,]
row.names(L.1) = seq(2,R)
mash_data_L.1 = mash_set_data_contrast(mash_data, L.1)
U.c = cov_canonical(mash_data_L.1)
# data driven
# select max
m.1by1 = mash_1by1(mash_data_L.1)
strong = get_significant_results(m.1by1,0.05)
# center Z
mash_data_L.center = mash_data_L.1
mash_data_L.center$Bhat = mash_data_L.1$Bhat/mash_data_L.1$Shat # obtain z
mash_data_L.center$Shat = matrix(1, nrow(mash_data_L.1$Bhat),ncol(mash_data_L.1$Bhat))
mash_data_L.center$Bhat = apply(mash_data_L.center$Bhat, 2, function(x) x - mean(x))
U.pca = cov_pca(mash_data_L.center,2,strong)
U.ed = cov_ed(mash_data_L.center, U.pca, strong)
mashcontrast.model.1 = mash(mash_data_L.1, c(U.c, U.ed), algorithm.version = 'R', verbose = FALSE)
Using mashcommonbaseline model, there are 283 discoveries. The covariance structure found here is:
barplot(get_estimated_pi(mashcontrast.model.1),las = 2, cex.names = 0.7)
 The correlation PCA 1 is: 
Indep.data.1 = mash_set_data(Bhat = mash_data_L.1$Bhat,
                           Shat = matrix(sqrt(0.5-1/(R*2)), nrow(data$Chat), R-1))
Indep.model.1 = mash(Indep.data.1, c(U.c, U.ed), algorithm.version = 'R', verbose = FALSE)
For mashIndep model, there are 217 discoveries, which is less than the mashcommonbaseline model. The covariance structure found here is:
barplot(get_estimated_pi(Indep.model.1),las = 2, cex.names = 0.7)

The correlation for PCA2 and tPCA is: 
The RRMSE plot: 
We check the False Positive Rate and True Positive Rate. \[FPR = \frac{|N\cap S|}{|N|} \quad TPR = \frac{|CS\cap S|}{|T|} \]

Using this contrast L, the results from mashcommonbaseline is much better than mashIndep.
sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] corrplot_0.84 mashr_0.2-6   ashr_2.2-7   
loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16             knitr_1.20              
 [3] magrittr_1.5             REBayes_1.3             
 [5] MASS_7.3-49              doParallel_1.0.11       
 [7] pscl_1.5.2               SQUAREM_2017.10-1       
 [9] lattice_0.20-35          ExtremeDeconvolution_1.3
[11] foreach_1.4.4            plyr_1.8.4              
[13] stringr_1.3.0            tools_3.4.4             
[15] parallel_3.4.4           grid_3.4.4              
[17] rmeta_3.0                htmltools_0.3.6         
[19] iterators_1.0.9          assertthat_0.2.0        
[21] yaml_2.1.18              rprojroot_1.3-2         
[23] digest_0.6.15            Matrix_1.2-14           
[25] codetools_0.2-15         evaluate_0.10.1         
[27] rmarkdown_1.9            stringi_1.1.7           
[29] compiler_3.4.4           Rmosek_8.0.69           
[31] backports_1.1.2          mvtnorm_1.0-7           
[33] truncnorm_1.0-8         
This R Markdown site was created with workflowr