BH robust to correlation?
Last updated: 2018-05-15
workflowr checks:
✔ R Markdown file: up-to-date
✔ Environment: empty
✔ Seed:
set.seed(12345)
✔ Session information: recorded
✔ Repository version: 388e65e
File | Version | Author | Date | Message |
---|---|---|---|---|
html | eaaa5f9 | LSun | 2018-05-14 | Build site. |
rmd | a9d12a8 | LSun | 2018-05-14 | wflow_publish(c("analysis/BH_robustness.rmd", "analysis/index.Rmd")) |
html | ddf9062 | LSun | 2018-05-12 | Update to 1.0 |
rmd | cc0ab83 | Lei Sun | 2018-05-11 | update |
html | adeab80 | LSun | 2018-05-06 | Build site. |
rmd | 0b0a394 | LSun | 2018-05-06 | wflow_publish(c("analysis/BH_robustness.rmd", "analysis/gd_w.rmd")) |
html | 9de2617 | LSun | 2018-04-14 | Build site. |
rmd | 76da3c2 | LSun | 2018-04-14 | wflow_publish("analysis/BH_robustness.rmd") |
rmd | ae83c56 | Lei Sun | 2018-04-14 | BH robustness |
We apply the Benjamini-Hochberg (BH) procedure to correlated z-scores, generated either by pure simulation or from real data, to get a sense of how robust BH is to correlation.
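For reference, the BH step-up adjustment can be written out by hand. The following is a minimal sketch rather than code used in this analysis: `bh_adjust` and `p.toy` are hypothetical names with made-up values, and the result is compared against base R's `p.adjust(method = "BH")`, which is what the analysis itself calls.

```r
# Hand-rolled BH adjustment (hypothetical helper, for illustration only)
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)  # largest p-value first
  ro <- order(o)                    # permutation that restores the input order
  # p_(k) * m / k, made monotone by a running minimum from the largest p down
  adj <- cummin(p[o] * m / (m:1))
  pmin(1, adj)[ro]
}

# Made-up p-values
p.toy <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)
q.toy <- bh_adjust(p.toy)

# Should agree with the built-in implementation
all.equal(q.toy, p.adjust(p.toy, method = "BH"))

# Indices of hypotheses rejected at nominal FDR 0.05
which(q.toy <= 0.05)
```

Thresholding the adjusted values at a nominal level, as in the last line, is how discoveries are counted in the rest of this analysis.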
source("../code/gdash_lik.R")
source("../code/gdfit.R")
source("../code/count_to_summary.R")
library(limma)
library(edgeR)
library(ashr)
library(plyr)
library(ggplot2)
library(reshape2)
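set.seed(777)
## Low-rank-plus-identity covariance: Sigma = B B^T + I, with d latent factors
d <- 10
n <- 1e4
B <- matrix(rnorm(n * d), n, d)
Sigma <- B %*% t(B) + diag(n)
## Marginal variances (not standard deviations)
sigma <- diag(Sigma)
## Pairwise correlations implied by Sigma
Rho <- cov2cor(Sigma)
par(mar = c(5.1, 4.1, 1, 2.1))
hist(Rho[lower.tri(Rho)], xlab = expression(rho[ij]), main = "")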
## rhobar[l]: average of the l-th power of the off-diagonal pairwise correlations
rhobar <- numeric(10)
for (l in 1 : 10) {
  rhobar[l] <- (sum(Rho^l) - n) / (n * (n - 1))
}
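The `rhobar` formula relies on the diagonal of `Rho` contributing exactly `n` to `sum(Rho^l)`. A small sketch, assuming a toy correlation matrix built the same way (all `*.toy` names are hypothetical), checks that it equals a direct average over the off-diagonal entries:

```r
set.seed(1)
n.toy <- 50
d.toy <- 3
B.toy <- matrix(rnorm(n.toy * d.toy), n.toy, d.toy)
Rho.toy <- cov2cor(B.toy %*% t(B.toy) + diag(n.toy))

l <- 2
# Formula used above: subtract the n diagonal ones, divide by the n(n-1) off-diagonal entries
via.sum <- (sum(Rho.toy^l) - n.toy) / (n.toy * (n.toy - 1))
# Direct average of the l-th power over the off-diagonal entries
offdiag <- Rho.toy[row(Rho.toy) != col(Rho.toy)]
via.mean <- mean(offdiag^l)

all.equal(via.sum, via.mean)
```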
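nsim <- 1e4
Z.list <- W <- list()
for (i in 1 : nsim) {
  ## Z = B z + e ~ N(0, Sigma), rescaled so that each Z[j] is marginally N(0, 1)
  z <- rnorm(d)
  Z <- B %*% z + rnorm(n)
  Z <- Z / sqrt(sigma)
  Z.list[[i]] <- Z
  ## Store the fitted weights w returned by gdfit.mom
  Z.GD <- gdfit.mom(Z, 100)
  W[[i]] <- Z.GD$w
}
Z.sim <- Z.list
W.sim <- W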
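The simulation draws `Z = B z + e` with independent standard normal `z` and `e`, so `Z` has covariance `B B^T + I`, and dividing by the square root of the diagonal gives unit marginal variances and correlation matrix `Rho`. A minimal sketch at a much smaller scale (the `*.toy` names and sizes are illustrative assumptions, not the values used above) compares the empirical correlation with the implied one:

```r
set.seed(1)
n.toy <- 5
d.toy <- 2
nrep <- 2e4
B.toy <- matrix(rnorm(n.toy * d.toy), n.toy, d.toy)
Sigma.toy <- B.toy %*% t(B.toy) + diag(n.toy)
Rho.toy <- cov2cor(Sigma.toy)

# Each column is one draw of the standardized vector Z
Z.toy <- replicate(nrep, {
  as.vector(B.toy %*% rnorm(d.toy) + rnorm(n.toy)) / sqrt(diag(Sigma.toy))
})

# Largest gap between the empirical and the implied correlation; should be small
max(abs(cor(t(Z.toy)) - Rho.toy))

# Marginal variances; should all be close to 1
apply(Z.toy, 1, var)
```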
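r <- readRDS("../data/liver.rds")
## Indices of the g rows (genes) of X with the largest row sums
top_genes_index = function (g, X) {
  return(order(rowSums(X), decreasing = TRUE)[1 : g])
}
## log2 counts per million, with offsets 0.5 (counts) and 1 (library sizes)
lcpm = function (r) {
  R = colSums(r)
  t(log2(((t(r) + 0.5) / (R + 1)) * 10^6))
}
nsamp <- 5
ngene <- n
Y = lcpm(r)
subset = top_genes_index(ngene, Y)
r = r[subset,]
nsim <- 1e4
Z.list <- W <- list()
for (i in 1 : nsim) {
  ## generate data: draw 2 * nsamp samples at random and split them into two arbitrary groups
  counts <- r[, sample(ncol(r), 2 * nsamp)]
  design <- model.matrix(~c(rep(0, nsamp), rep(1, nsamp)))
  ## summary statistics from count_to_summary (../code/count_to_summary.R); z holds the z-scores
  summary <- count_to_summary(counts, design)
  Z <- summary$z
  Z.list[[i]] <- Z
  Z.GD <- gdfit.mom(Z, 100)
  W[[i]] <- Z.GD$w
}
Z.gtex <- Z.list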
W.gtex <- W
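To make the gene-selection step concrete, here is a small sketch of the log2-counts-per-million transform and the top-gene ranking on a made-up count matrix. `lcpm.toy` mirrors the `lcpm` helper defined above; `counts.toy` and its values are hypothetical.

```r
# Same transform as lcpm above, with descriptive argument names
lcpm.toy <- function(counts) {
  lib.size <- colSums(counts)
  # Transpose so that each sample (row of t(counts)) is divided by its own library size
  t(log2(((t(counts) + 0.5) / (lib.size + 1)) * 10^6))
}

# Made-up counts: 3 genes (rows) by 2 samples (columns)
counts.toy <- matrix(c(10, 0, 90,
                       20, 5, 175),
                     nrow = 3,
                     dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
Y.toy <- lcpm.toy(counts.toy)
Y.toy

# Indices of the 2 genes with the largest total log2-CPM, as in top_genes_index
order(rowSums(Y.toy), decreasing = TRUE)[1:2]
```

Transposing before dividing is what lets R's recycling line up each sample with its own library size; dividing `counts` directly by `lib.size` would recycle along rows (genes) instead.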
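## Two-sided p-values and their BH adjustment for each simulated data set
p <- lapply(Z.sim, function(x) {pnorm(-abs(x)) * 2})
q <- lapply(p, p.adjust, method = "BH")
q.cutoff <- seq(0.01, 0.99, by = 0.01)
## All effects are null here, so every discovery is false: the FDP of a data set
## is 1 if BH makes any discovery and 0 otherwise
fd <- list()
for (i in seq(q.cutoff)) {
  fd[[i]] <- sapply(q, function(x) {sum(x <= q.cutoff[i])})
}
## Average FDP over data sets = fraction of data sets with at least one discovery
fdp <- sapply(fd, function(x) {mean(x != 0)})
plot(q.cutoff, fdp, xlab = "Nominal FDR", ylab = "Average FDP",
     xlim = range(q.cutoff, fdp), ylim = range(q.cutoff, fdp),
     type = "l")
abline(0, 1, col = "red", lty = 3)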
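As a point of comparison for the correlated case, under the global null with independent z-scores BH at level q makes at least one (necessarily false) discovery with probability q, so the corresponding curve would follow the diagonal. The sketch below illustrates this by simulation; the sizes `n.ind` and `nsim.ind` are deliberately smaller than in the analysis, and all `*.ind` names are hypothetical.

```r
set.seed(1)
n.ind <- 1000      # tests per data set (smaller than the 1e4 used above)
nsim.ind <- 2000   # number of simulated data sets
q.cutoff.ind <- c(0.05, 0.1, 0.2)

# For each data set, record whether BH makes any discovery at each cutoff
any.disc <- replicate(nsim.ind, {
  z.ind <- rnorm(n.ind)                                   # independent null z-scores
  q.ind <- p.adjust(2 * pnorm(-abs(z.ind)), method = "BH")
  sapply(q.cutoff.ind, function(cutoff) any(q.ind <= cutoff))
})

# Fraction of data sets with at least one discovery; should be close to q.cutoff.ind
fdr.ind <- rowMeans(any.disc)
rbind(nominal = q.cutoff.ind, empirical = fdr.ind)
```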
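## Same analysis for the z-scores derived from real data
p <- lapply(Z.gtex, function(x) {pnorm(-abs(x)) * 2})
q <- lapply(p, p.adjust, method = "BH")
q.cutoff <- seq(0.001, 0.200, by = 0.001)
## Number of discoveries per data set at each cutoff; all of them are false here
fd <- list()
for (i in seq(q.cutoff)) {
  fd[[i]] <- sapply(q, function(x) {sum(x <= q.cutoff[i])})
}
## Average FDP over data sets = fraction of data sets with at least one discovery
fdp <- sapply(fd, function(x) {mean(x != 0)})
plot(q.cutoff, fdp, xlab = "Nominal FDR", ylab = "Average FDP",
     xlim = range(q.cutoff, fdp), ylim = range(q.cutoff, fdp),
     type = "l")
abline(0, 1, col = "red", lty = 3)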
## Add signal to the correlated noise: in each data set, 500 of the 1e4 effects equal 3
theta <- list()
for (j in 1 : 1e4) {
  theta[[j]] <- sample(c(rep(0, 9.5e3), rep(3, 0.5e3)))
}
X.gtex <- list()
for (j in 1 : 1e4) {
  X.gtex[[j]] <- theta[[j]] + Z.gtex[[j]]
}
p <- lapply(X.gtex, function(x) {pnorm(-abs(x)) * 2})
q <- lapply(p, p.adjust, method = "BH")
q.cutoff <- seq(0.001, 0.200, by = 0.001)
fdp <- tdp <- list()
for (i in seq(q.cutoff)) {
  fdp.vec <- tdp.vec <- c()
  for (j in 1 : 1e4) {
    ## Effects declared significant at this cutoff
    discovered <- q[[j]] <= q.cutoff[i]
    ## FDP: false discoveries over the number of discoveries (0 if there are none)
    fdp.vec[j] <- sum(theta[[j]][discovered] == 0) / max(1, sum(discovered))
    ## TDP: true discoveries over the number of non-null effects (500 here)
    tdp.vec[j] <- sum(theta[[j]][discovered] != 0) / sum(theta[[j]] != 0)
  }
  fdp[[i]] <- fdp.vec
  tdp[[i]] <- tdp.vec
}
fdp.avg <- sapply(fdp, mean)
tdp.avg <- sapply(tdp, mean)
plot(q.cutoff, fdp.avg, type = "l", xlim = range(q.cutoff, fdp.avg), ylim = range(q.cutoff, fdp.avg), xlab = "Nominal FDR", ylab = "Average FDP")
abline(0, 1, col = "red")
plot(q.cutoff, tdp.avg, type = "l", xlab = "Nominal FDR", ylab = "TDP")
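The two per-data-set quantities averaged above can be worked through on a toy example. This is a minimal sketch with made-up values (`theta.ex`, `q.ex`, and `cutoff.ex` are hypothetical): FDP divides the false discoveries by the number of discoveries, and TDP divides the true discoveries by the number of non-null effects.

```r
# Made-up truth (0 = null, 3 = signal) and BH-adjusted p-values for 8 tests
theta.ex <- c(0, 0, 0, 3, 3, 0, 3, 0)
q.ex <- c(0.50, 0.04, 0.90, 0.01, 0.03, 0.70, 0.20, 0.02)
cutoff.ex <- 0.05

# Tests 2, 4, 5 and 8 are declared significant
discovered.ex <- q.ex <= cutoff.ex

# 2 of the 4 discoveries are null effects; max(1, .) guards against 0 / 0
fdp.ex <- sum(theta.ex[discovered.ex] == 0) / max(1, sum(discovered.ex))

# 2 of the 3 non-null effects are discovered
tdp.ex <- sum(theta.ex[discovered.ex] != 0) / sum(theta.ex != 0)

c(FDP = fdp.ex, TDP = tdp.ex)
```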
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] workflowr_1.0.1 Rcpp_0.12.16 digest_0.6.15
[4] rprojroot_1.3-2 R.methodsS3_1.7.1 backports_1.1.2
[7] git2r_0.21.0 magrittr_1.5 evaluate_0.10.1
[10] stringi_1.1.6 whisker_0.3-2 R.oo_1.21.0
[13] R.utils_2.6.0 rmarkdown_1.9 tools_3.4.3
[16] stringr_1.3.0 yaml_2.1.18 compiler_3.4.3
[19] htmltools_0.3.6 knitr_1.20
This reproducible R Markdown analysis was created with workflowr 1.0.1