Last updated: 2019-01-17
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20190115)
The command set.seed(20190115) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 9bf71e0
wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .sos/
Ignored: data/.DS_Store
Untracked files:
Untracked: .gitattributes
Untracked: analysis/SusieZConverge.Rmd
Untracked: data/sim_gaussian_475.rds
Untracked: data/sim_gaussian_75.rds
Untracked: output/dscout_gaussian_init.rds
Untracked: output/dscout_gaussian_null.rds
Untracked: output/dscout_gaussian_z.rds
Unstaged changes:
Modified: analysis/dscquery.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 9bf71e0 | zouyuxin | 2019-01-17 | wflow_publish(“analysis/SusieZProblem.Rmd”) |
| html | 5028244 | zouyuxin | 2019-01-17 | Build site. |
| Rmd | 9ac42e0 | zouyuxin | 2019-01-17 | wflow_publish(“analysis/SusieZProblem.Rmd”) |
| html | 142fac0 | zouyuxin | 2019-01-17 | Build site. |
| Rmd | fd55060 | zouyuxin | 2019-01-17 | wflow_publish(“analysis/SusieZProblem.Rmd”) |
| html | a6cee3a | zouyuxin | 2019-01-17 | Build site. |
| Rmd | 29fc636 | zouyuxin | 2019-01-17 | wflow_publish(“analysis/SusieZProblem.Rmd”) |
Investigate the non-convergence problem in susie_z.
# devtools::install_github('zouyuxin/susieR') # susieR in my folk output the susie log BF matrix
library(susieR)
dscout_null = readRDS('output/dscout_gaussian_null.rds')
dscout_null = dscout_null[!is.na(dscout_null$sim_gaussian_null.output.file),]
dscout_null = dscout_null[!is.na(dscout_null$susie_z.output.file),]
The maximum number of iteration is 7. All models converge. There is no false discovery.
We simulate data with only one non-zero effect.
We simulate data under PVE = 0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95. There are 20 replicates.
dscout_gaussian_z = readRDS('output/dscout_gaussian_z.rds')
dscout_gaussian_z = dscout_gaussian_z[!is.na(dscout_gaussian_z$sim_gaussian.output.file),]
dscout_gaussian_z = dscout_gaussian_z[!is.na(dscout_gaussian_z$susie_z.output.file),]
dscout_gaussian_z$NotConverge = dscout_gaussian_z$susie_z.niter == 100
dscout_gaussian_z_1 = dscout_gaussian_z[dscout_gaussian_z$sim_gaussian.effect_num == '1', ]
dscout_gaussian_z_1 = dscout_gaussian_z_1[dscout_gaussian_z_1$susie_z.L == '5' , ]
When the PVE is greater than 0.5, some replicates fail to converge in 100 iterations.
converge.summary = aggregate(NotConverge ~ sim_gaussian.pve, dscout_gaussian_z_1, sum)
colnames(converge.summary) = c('pve', 'NotConverge')
converge.summary
pve NotConverge
1 0.01 0
2 0.05 0
3 0.10 0
4 0.20 0
5 0.50 4
6 0.80 9
7 0.95 6
Now we change the initial susie object to the truth:
dscout_gaussian_init = readRDS('output/dscout_gaussian_init.rds')
dscout_gaussian_init = dscout_gaussian_init[!is.na(dscout_gaussian_init$sim_gaussian.output.file),]
dscout_gaussian_init = dscout_gaussian_init[!is.na(dscout_gaussian_init$susie_z_init.output.file),]
dscout_gaussian_init$NotConverge = dscout_gaussian_init$susie_z_init.niter == 100
dscout_gaussian_init_1 = dscout_gaussian_init[dscout_gaussian_init$sim_gaussian.effect_num == '1', ]
converge.summary = aggregate(NotConverge ~ sim_gaussian.pve, dscout_gaussian_init_1, sum)
colnames(converge.summary) = c('pve', 'NotConverge')
converge.summary
pve NotConverge
1 0.01 0
2 0.05 0
3 0.10 0
4 0.20 0
5 0.50 0
6 0.80 0
7 0.95 0
All models converge.
We simulate data with 5 true effects. We simulate data under PVE = 0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.95. There are 20 replicates.
dscout_gaussian_z_5 = dscout_gaussian_z[dscout_gaussian_z$sim_gaussian.effect_num == '5', ]
dscout_gaussian_z_5 = dscout_gaussian_z_5[dscout_gaussian_z_5$susie_z.L == '5' , ]
When the PVE is greater than 0.5, some replicates fail to converge in 100 iterations.
converge.summary = aggregate(NotConverge ~ sim_gaussian.pve, dscout_gaussian_z_5, sum)
colnames(converge.summary) = c('pve', 'NotConverge')
converge.summary
pve NotConverge
1 0.01 0
2 0.05 0
3 0.10 0
4 0.20 0
5 0.50 1
6 0.80 5
7 0.95 5
Now we change the initial susie object to the truth,
dscout_gaussian_init_5 = dscout_gaussian_init[dscout_gaussian_init$sim_gaussian.effect_num == '5', ]
converge.summary = aggregate(NotConverge ~ sim_gaussian.pve, dscout_gaussian_init_5, sum)
colnames(converge.summary) = c('pve', 'NotConverge')
converge.summary
pve NotConverge
1 0.01 0
2 0.05 0
3 0.10 0
4 0.20 0
5 0.50 1
6 0.80 5
7 0.95 5
There are still non-convergence cases with the true initialization.
Let’s see an example data with one true effect. The PVE is 0.5, the residual variance is 0.47^2.
X = readRDS('data/susie_X.rds') # X is from susieR package, N3finemapping, X is coloumn mean centered.
R = readRDS('data/susie_R.rds')
data = readRDS('data/sim_gaussian_75.rds')
n = data$n
beta = numeric(data$p)
beta[data$beta_idx] = data$beta_val
z = data$ss$effect/data$ss$se
susie_plot(z, y = 'z', b=beta)

| Version | Author | Date |
|---|---|---|
| a6cee3a | zouyuxin | 2019-01-17 |
Fit susie model using z scores,
fit_z = susie_z(z, R=R, max_iter = 20, L=5, track_fit = T)
Warning in susie_ss(XtX = R, Xty = z, n = 2, var_y = 1, L = L,
estimate_prior_variance = TRUE, : IBSS algorithm did not converge in 20
iterations!
The objective value is
susie_get_objective(fit_z)
[1] 409.0831
susie_plot(fit_z, y = 'PIP', b = beta)

| Version | Author | Date |
|---|---|---|
| 5028244 | zouyuxin | 2019-01-17 |
Now we initialize at the truth (L=1), the model converges to a lower objective value.
s_init = susie_init_coef(data$beta_idx, data$beta_val, data$p)
fit_z_init = susie_z(z, R=R, s_init = s_init)
The objective value is
susie_get_objective(fit_z_init)
[1] 299.4515
susie_plot(fit_z_init, y = 'PIP', b = beta)

| Version | Author | Date |
|---|---|---|
| 5028244 | zouyuxin | 2019-01-17 |
| a6cee3a | zouyuxin | 2019-01-17 |
I mentioned in the write-up, we approximate the BF(\(z_j(\sigma)\)) (eqn 7.61) using BF(\(z_j(\hat{\sigma}_j)\)). When the data contain strong association signals (\(\hat{\sigma}_j > \sigma\)), the z score approximation under-estimates the Bayes Factor.
The 7.61 in the write-up: \[ \begin{align} BF(z_j(\sigma); w) &= \sqrt{\frac{1}{1+w^2}} \exp\left( \frac{1}{2} z_j(\sigma)^2 \frac{w^2}{1+w^2} \right) \quad \quad \text{where }w^2 = (n-1)\frac{\sigma_{0}^{2}}{\sigma^2} \end{align} \]
We check the under-estimate of the BF. We fit the susie model using \(z_j(\sigma)\) and \(z_j(\hat{\sigma}_j)\) separately, with L = 1 and iter = 1, fix prior variance.
fit_z_1 = susie_ss(XtX = R, Xty = z, n = 2, var_y = 1, L = 1,
estimate_prior_variance = FALSE,
estimate_residual_variance = FALSE, max_iter = 1)
X.s = apply(X, 2, function(x) x/(sd(x)*sqrt(n-1)))
z_true = (t(X.s) %*% data$sim_y) / as.numeric(data$sigma)
fit_z_1_true = susie_ss(XtX = R, Xty = z_true, n = 2, var_y = 1, L = 1,
estimate_prior_variance = FALSE,
estimate_residual_variance = FALSE, max_iter = 1)
par(mfrow=c(1,3))
{plot(fit_z_1$lbf_mtx, fit_z_1_true$lbf_mtx, xlab = expression(logBF(z[j](hat(sigma)[j]))), ylab = expression(logBF(z[j](sigma))), main='log BF')
abline(0,1)}
{plot(fit_z_1$alpha, fit_z_1_true$alpha, xlab = expression(alpha(hat(sigma)[j])), ylab = expression(alpha(sigma)), main=expression(alpha))
abline(0,1)}
{plot(abs(fit_z_1$mu), abs(fit_z_1_true$mu), xlab = expression(abs(mu(hat(sigma)[j]))), ylab = expression(abs(mu(sigma))), main=expression(abs(mu)))
abline(0,1)}

| Version | Author | Date |
|---|---|---|
| 5028244 | zouyuxin | 2019-01-17 |
| a6cee3a | zouyuxin | 2019-01-17 |
It is clear from the plots that the Bayes Factor is under-estimated. The estimated \(\alpha\), \(\mu\) are smaller.
Now, we fit susie model using \(z_j(\hat{\sigma}_j)\) and \(z_j(\sigma)\), with L = 5 and estimating the prior variance.
fit_z_10_prior = susie_z(z=z, R=R, L=5)
Warning in susie_ss(XtX = R, Xty = z, n = 2, var_y = 1, L = L,
estimate_prior_variance = TRUE, : IBSS algorithm did not converge in 100
iterations!
The estimated prior variances for model using \(z_j(\hat{\sigma}_j)\) are
susie_get_prior_variance(fit_z_10_prior)
[1] 24449.0562 4961.9053 2624.2558 2017.4113 163.6263
fit_z_10_true_prior = susie_z(z = z_true, R=R, L=5)
The estimated prior variances for model using \(z_j(\sigma)\) are
susie_get_prior_variance(fit_z_10_true_prior)
[1] 604.074353 3.315658 0.000000 0.000000 0.000000
The PIP for model using \(z_j(\sigma)\):
susie_plot(fit_z_10_true_prior, y='PIP', b=beta)

Using the true z, \(z_j(\sigma)\), the estimated prior variance for some L becomes zero. Using the estimated z, \(z_j(\hat{\sigma}_j)\), the estimated prior variance for all L are non-zero.
Let’s see an example data with 5 true effects. The PVE is 0.8, the residual variance is 0.74^2.
data = readRDS('data/sim_gaussian_475.rds')
n = data$n
beta = numeric(data$p)
beta[data$beta_idx] = data$beta_val
z = data$ss$effect/data$ss$se
susie_plot(z, y = "z", b=beta)

| Version | Author | Date |
|---|---|---|
| 5028244 | zouyuxin | 2019-01-17 |
Fit susie model using z scores,
fit_z = susie_z(z, R=R, max_iter = 20, L=5, track_fit = T)
Warning in susie_ss(XtX = R, Xty = z, n = 2, var_y = 1, L = L,
estimate_prior_variance = TRUE, : IBSS algorithm did not converge in 20
iterations!
The objective value is
susie_get_objective(fit_z)
[1] 525.9303
susie_plot(fit_z, y = 'PIP', b = beta)

Now we initialize at the truth, the model fails to converge.
s_init = susie_init_coef(data$beta_idx, data$beta_val, data$p)
fit_z_init = susie_z(z, R=R, s_init = s_init)
Warning in susie_ss(XtX = R, Xty = z, n = 2, var_y = 1, L = L,
estimate_prior_variance = TRUE, : IBSS algorithm did not converge in 100
iterations!
susie_get_objective(fit_z_init)
[1] 1089.854
susie_plot(fit_z_init, y = 'PIP', b = beta)

Changing the z scores to \(z_j(\sigma)\), the model converges.
z_true = (t(X.s) %*% data$sim_y) / as.numeric(data$sigma)
fit_z_5_true = susie_z(z = z_true, R=R, L=5)
The objective value is
susie_get_objective(fit_z_5_true)
[1] 1119.521
The estimated prior variances are
susie_get_prior_variance(fit_z_5_true)
[1] 1467.02402 640.34084 185.85349 53.12493 0.00000
susie_plot(fit_z_5_true, y = 'PIP', b = beta)

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] susieR_0.6.3
loaded via a namespace (and not attached):
[1] workflowr_1.1.1 Rcpp_1.0.0 lattice_0.20-38
[4] digest_0.6.18 rprojroot_1.3-2 R.methodsS3_1.7.1
[7] grid_3.5.1 backports_1.1.3 git2r_0.24.0
[10] magrittr_1.5 evaluate_0.12 stringi_1.2.4
[13] whisker_0.3-2 R.oo_1.22.0 R.utils_2.7.0
[16] Matrix_1.2-15 rmarkdown_1.11 tools_3.5.1
[19] stringr_1.3.1 yaml_2.2.0 compiler_3.5.1
[22] htmltools_0.3.6 knitr_1.20 expm_0.999-3
This reproducible R Markdown analysis was created with workflowr 1.1.1