Last updated: 2018-08-20
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date 
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
 ✔ Environment: empty 
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
 ✔ Seed: 
set.seed(1) 
The command set.seed(1) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
 ✔ Session information: recorded 
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
 ✔ Repository version: b083054 
wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/figure/
    Ignored:    analysis/include/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    docs/.DS_Store
    Ignored:    output/.DS_Store
Untracked files:
    Untracked:  _workflowr.yml
    Untracked:  analysis/Classify.Rmd
    Untracked:  analysis/EstimateCorMaxEM.Rmd
    Untracked:  analysis/EstimateCorMaxEMGD.Rmd
    Untracked:  analysis/EstimateCorPrior.Rmd
    Untracked:  analysis/EstimateCorSol.Rmd
    Untracked:  analysis/HierarchicalFlashSim.Rmd
    Untracked:  analysis/MashLowSignal.Rmd
    Untracked:  analysis/Mash_GTEx.Rmd
    Untracked:  analysis/MeanAsh.Rmd
    Untracked:  analysis/OutlierDetection.Rmd
    Untracked:  analysis/OutlierDetection2.Rmd
    Untracked:  analysis/OutlierDetection3.Rmd
    Untracked:  analysis/OutlierDetection4.Rmd
    Untracked:  analysis/Test.Rmd
    Untracked:  analysis/mash_missing_row.Rmd
    Untracked:  code/MashClassify.R
    Untracked:  code/MashCorResult.R
    Untracked:  code/MashSource.R
    Untracked:  code/Weight_plot.R
    Untracked:  code/addemV.R
    Untracked:  code/estimate_cor.R
    Untracked:  code/generateDataV.R
    Untracked:  code/johnprocess.R
    Untracked:  code/sim_mean_sig.R
    Untracked:  code/summary.R
    Untracked:  data/Blischak_et_al_2015/
    Untracked:  data/scale_data.rds
    Untracked:  docs/figure/Classify.Rmd/
    Untracked:  docs/figure/OutlierDetection.Rmd/
    Untracked:  docs/figure/OutlierDetection2.Rmd/
    Untracked:  docs/figure/OutlierDetection3.Rmd/
    Untracked:  docs/figure/Test.Rmd/
    Untracked:  docs/figure/mash_missing_whole_row_5.Rmd/
    Untracked:  docs/include/
    Untracked:  output/AddEMV/
    Untracked:  output/CovED_UKBio_strong.rds
    Untracked:  output/CovED_UKBio_strong_Z.rds
    Untracked:  output/Flash_UKBio_strong.rds
    Untracked:  output/MASH.10.em2.result.rds
    Untracked:  output/MASH.10.mle.result.rds
    Untracked:  output/MASH.result.1.rds
    Untracked:  output/MASH.result.10.rds
    Untracked:  output/MASH.result.2.rds
    Untracked:  output/MASH.result.3.rds
    Untracked:  output/MASH.result.4.rds
    Untracked:  output/MASH.result.5.rds
    Untracked:  output/MASH.result.6.rds
    Untracked:  output/MASH.result.7.rds
    Untracked:  output/MASH.result.8.rds
    Untracked:  output/MASH.result.9.rds
    Untracked:  output/Mash_EE_Cov_0_plusR1.rds
    Untracked:  output/Trail 1/
    Untracked:  output/Trail 2/
    Untracked:  output/UKBio_mash_model.rds
Unstaged changes:
    Modified:   analysis/EstimateCorMaxEM2.Rmd
    Modified:   analysis/Mash_UKBio.Rmd
    Modified:   analysis/_site.yml
    Modified:   analysis/chunks.R
    Modified:   analysis/mash_missing_samplesize.Rmd
    Modified:   output/Flash_T2_0.rds
    Modified:   output/Flash_T2_0_mclust.rds
    Modified:   output/Mash_model_0_plusR1.rds
    Modified:   output/PresiAddVarCol.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes. 
| File | Version | Author | Date | Message | 
|---|---|---|---|---|
| Rmd | b083054 | zouyuxin | 2018-08-20 | wflow_publish(“analysis/EstimateCorMax.Rmd”) | 
| html | 6281062 | zouyuxin | 2018-08-15 | Build site. | 
| Rmd | 3e3e128 | zouyuxin | 2018-08-15 | wflow_publish(c(“analysis/EstimateCor.Rmd”, “analysis/EstimateCorMax.Rmd”, | 
| html | bf28e18 | zouyuxin | 2018-08-14 | Build site. | 
| Rmd | 15b85be | zouyuxin | 2018-08-14 | wflow_publish(c(“analysis/EstimateCorMax.Rmd”, “analysis/EstimateCorMaxEM2.Rmd”)) | 
| html | 568fbe6 | zouyuxin | 2018-08-13 | Build site. | 
| Rmd | 3ae3f08 | zouyuxin | 2018-08-13 | wflow_publish(c(“analysis/EstimateCor.Rmd”, | 
| html | 1ceff72 | zouyuxin | 2018-08-03 | Build site. | 
| Rmd | 9d2623c | zouyuxin | 2018-08-03 | wflow_publish(c(“analysis/EstimateCorMax.Rmd”)) | 
| html | faa5b1a | zouyuxin | 2018-07-26 | Build site. | 
| Rmd | 7ac3df0 | zouyuxin | 2018-07-26 | wflow_publish(“analysis/EstimateCorMax.Rmd”) | 
library(mashr)
Loading required package: ashr
source('../code/generateDataV.R')
source('../code/estimate_cor.R')
source('../code/summary.R')
library(kableExtra)
library(knitr)
We want to estimate \(\rho\) \[ \left(\begin{matrix} \hat{x} \\ \hat{y} \end{matrix} \right) | \left(\begin{matrix} x \\ y \end{matrix} \right) \sim N(\left(\begin{matrix} \hat{x} \\ \hat{y} \end{matrix} \right) ; \left(\begin{matrix} x \\ y \end{matrix} \right), \left( \begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix} \right)) \] \[ \left(\begin{matrix} x \\ y \end{matrix} \right) \sim \sum_{k=0}^{K} \pi_{k} N( \left(\begin{matrix} x \\ y \end{matrix} \right); 0, U_{k} ) \] \(\Rightarrow\) \[ \left(\begin{matrix} \hat{x} \\ \hat{y} \end{matrix} \right) \sim \sum_{k=0}^{K} \pi_{k} N( \left(\begin{matrix} \hat{x} \\ \hat{y} \end{matrix} \right); 0, \left( \begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix} \right) + U_{k} ) \] \[ \Sigma_{k} = \left( \begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix} \right) + U_{k} = \left( \begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix} \right) + \left( \begin{matrix} u_{k11} & u_{k12} \\ u_{k21} & u_{k22} \end{matrix} \right) = \left( \begin{matrix} 1+u_{k11} & \rho+u_{k12} \\ \rho+u_{k21} & 1+u_{k22} \end{matrix} \right) \] Let \(\sigma_{k11} = \sqrt{1+u_{k11}}\), \(\sigma_{k22} = \sqrt{1+u_{k22}}\), \(\phi_{k}=\frac{\rho+u_{k12}}{\sigma_{k11}\sigma_{k22}}\)
The loglikelihood is (with penalty) \[ l(\rho, \pi) = \sum_{i=1}^{n} \log \sum_{k=0}^{K} \pi_{k}N(x_{i}; 0, \Sigma_{k}) + \sum_{k=0}^{K} (\lambda_{k}-1) \log \pi_{k} \]
The penalty on \(\pi\) encourages over-estimation of \(\pi_{0}\), \(\lambda_{k}\geq 1\).
\[ l(\rho, \pi) = \sum_{i=1}^{n} \log \sum_{k=0}^{K} \pi_{k}\frac{1}{2\pi\sigma_{k11}\sigma_{k22}\sqrt{1-\phi_{k}^2}} \exp\left( -\frac{1}{2(1-\phi_{k}^2)}\left[ \frac{x_{i}^2}{\sigma_{k11}^2} + \frac{y_{i}^2}{\sigma_{k22}^2} - \frac{2\phi_{k}x_{i}y_{i}}{\sigma_{k11}\sigma_{k22}}\right] \right) + \sum_{k=0}^{K} (\lambda_{k}-1) \log \pi_{k} \]
Note: This probelm is convex with respect to \(\pi\). In terms of \(\rho\), the covenxity depends on the data.
Algorithm:
Input: X, init_pi, init_rho, Ulist
Compute loglikelihood
delta = 1
while delta > tol
  Given pi, estimate rho by max loglikelihood (optim function)
  Given rho, estimate pi by max loglikelihood (convex problem)
  Compute loglikelihood
  Update delta
\[ \hat{\beta}|\beta \sim N_{2}(\hat{\beta}; \beta, \left(\begin{matrix} 1 & 0.5 \\ 0.5 & 1 \end{matrix}\right)) \]
\[ \beta \sim \frac{1}{4}\delta_{0} + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 0 & 0 \\ 0 & 1 \end{matrix}\right)) + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}\right)) \]
n = 4000
set.seed(1)
n = 4000; p = 2
Sigma = matrix(c(1,0.5,0.5,1),p,p)
U0 = matrix(0,2,2)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[2,2] = 1
U3 = matrix(1,2,2)
Utrue = list(U0=U0, U1=U1, U2=U2, U3=U3)
data = generate_data(n, p, Sigma, Utrue)
m.data = mash_set_data(data$Bhat, data$Shat)
U.c = cov_canonical(m.data)
grid = mashr:::autoselect_grid(m.data, sqrt(2))
Ulist = mashr:::normalize_Ulist(U.c)
xUlist = mashr:::expand_cov(Ulist,grid,usepointmass =  TRUE)
result <- optimize_pi_rho_times(data$Bhat, xUlist, init_rho = c(-0.7,0,0.7))
Warning in REBayes::KWDual(A, rep(1, k), normalize(w), control = control): estimated mixing distribution has some negative values:
               consider reducing rtol
Warning in mixIP(matrix_lik = structure(c(0.0627889120852815,
0.0114523005348735, : Optimization step yields mixture weights that are
either too small, or negative; weights have been corrected and renormalized
after the optimization.
plot(result[[1]]$loglik)

| Version | Author | Date | 
|---|---|---|
| 568fbe6 | zouyuxin | 2018-08-13 | 
| faa5b1a | zouyuxin | 2018-07-26 | 
The estimated \(\rho\) is 0.6124039.
m.data.mle = mash_set_data(data$Bhat, data$Shat, V = matrix(c(1,result[[1]]$rho,result[[1]]$rho,1),2,2))
U.c = cov_canonical(m.data.mle)
m.mle = mash(m.data.mle, U.c, verbose= FALSE)
null.ind = which(apply(data$B,1,sum) == 0)
The log likelihood is -1.23017710^{4}. There are 55 significant samples, 2 false positives. The RRMSE is 0.5964282.
The estimated pi is
barplot(get_estimated_pi(m.mle), las=2, cex.names = 0.7, main='MLE', ylim=c(0,0.8))

| Version | Author | Date | 
|---|---|---|
| bf28e18 | zouyuxin | 2018-08-14 | 
| 568fbe6 | zouyuxin | 2018-08-13 | 
The ROC curve:
m.data.correct = mash_set_data(data$Bhat, data$Shat, V=Sigma)
m.correct = mash(m.data.correct, U.c, verbose = FALSE)
m.correct.seq = ROC.table(data$B, m.correct)
m.mle.seq = ROC.table(data$B, m.mle)

| Version | Author | Date | 
|---|---|---|
| 568fbe6 | zouyuxin | 2018-08-13 | 
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] knitr_1.20       kableExtra_0.9.0 mashr_0.2-11     ashr_2.2-10     
loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18      pillar_1.3.0      compiler_3.5.1   
 [4] git2r_0.23.0      plyr_1.8.4        workflowr_1.1.1  
 [7] R.methodsS3_1.7.1 R.utils_2.6.0     iterators_1.0.10 
[10] tools_3.5.1       digest_0.6.15     viridisLite_0.3.0
[13] tibble_1.4.2      evaluate_0.11     lattice_0.20-35  
[16] pkgconfig_2.0.1   rlang_0.2.1       Matrix_1.2-14    
[19] foreach_1.4.4     rstudioapi_0.7    yaml_2.2.0       
[22] parallel_3.5.1    mvtnorm_1.0-8     xml2_1.2.0       
[25] httr_1.3.1        stringr_1.3.1     REBayes_1.3      
[28] hms_0.4.2         rprojroot_1.3-2   grid_3.5.1       
[31] R6_2.2.2          rmarkdown_1.10    rmeta_3.0        
[34] readr_1.1.1       magrittr_1.5      whisker_0.3-2    
[37] scales_0.5.0      backports_1.1.2   codetools_0.2-15 
[40] htmltools_0.3.6   MASS_7.3-50       rvest_0.3.2      
[43] assertthat_0.2.0  colorspace_1.3-2  stringi_1.2.4    
[46] Rmosek_8.0.69     munsell_0.5.0     doParallel_1.0.11
[49] pscl_1.5.2        truncnorm_1.0-8   SQUAREM_2017.10-1
[52] crayon_1.3.4      R.oo_1.22.0      
This reproducible R Markdown analysis was created with workflowr 1.1.1