Last updated: 2018-05-12


Introduction

The \(w\) optimization problem we hope to solve has two constraints. If we do not impose them, the optimization is likely to be unstable, especially when we do not know \(L\) for sure beforehand. However, these two constraints are hard to convert directly and strictly into convex, or even tractable, forms.

\[ \begin{array}{rl} \min\limits_{w} & -\sum\limits_{j = 1}^n\log\left(\sum\limits_{l=1}^Lw_l\left(\sum\limits_{k = 1}^K\hat\pi_k f_{jkl}\right) + \sum\limits_{k = 1}^K\hat\pi_kf_{jk0}\right) + \sum\limits_{l = 1}^L\lambda_l^w \phi \left(\left|w_l\right|\right) \\ \text{subject to} & \sum\limits_{l=1}^L w_l \varphi^{(l)}\left(z\right) + \varphi\left(z\right) \geq 0, \forall z\in \mathbb{R}\\ & w_l \text{ decay reasonably fast.} \end{array} \]
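Since \(\varphi\left(z\right) > 0\) for every \(z\), the first constraint can be rewritten by dividing through by \(\varphi\left(z\right)\) and using the identity \(\varphi^{(l)}\left(z\right) = \left(-1\right)^l h_l\left(z\right)\varphi\left(z\right)\), where \(h_l\) is the probabilist's Hermite polynomial:

\[ 1 + \sum\limits_{l = 1}^L w_l \left(-1\right)^l h_l\left(z\right) \geq 0, \forall z\in \mathbb{R} \ , \]

that is, an infinite family of polynomial inequalities in \(w\), which is one way to see why it is hard to impose directly.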

One observation is that, without the constraints, the optimization becomes unstable as the \(\hat w\) grow uncontrollably large. Therefore, a natural idea to replace these two constraints is to bound, penalize, or regularize \(w\), to prevent the coefficients from becoming too large.

On the other hand, as indicated by the theory and by the examples of good fits with Gaussian derivatives, \(w_l\) gets smaller as \(l\) increases. Moreover, very small \(w_l\) can still make a difference and thus be indispensable. Therefore, although we need to keep \(\hat w\) from getting too large, we certainly don't want to shrink them unnecessarily and unwarrantedly to zero.

Ideally, the goal of regularizing \(w\) with \(\sum\limits_{l = 1}^L\lambda_l^w \phi \left(\left|w_l\right|\right)\) should be to keep \(\hat w\) from growing uncontrollably large while leaving small but genuinely nonzero \(w_l\) essentially untouched.

Ideas on \(\lambda_l^w\) and \(\phi\)

Recall that, in theory, if we write the random empirical density of the correlated, marginally \(N\left(0, 1\right)\) random samples as

\[ f_0\left(z\right) = \varphi\left(z\right) + \sum\limits_{l = 1}^\infty W_l\varphi^{(l)}\left(z\right) \ , \] then

\[ \text{var}\left(W_l\right) = \frac{\alpha_l}{l!} \ , \] where

\[ \alpha_l = \frac{1}{\frac{n\left(n - 1\right)}{2}}\sum\limits_{i < j}\rho_{ij}^l := \bar{\rho_{ij}^l} \ . \]

Therefore, \(w_l\) should naturally decay roughly on the order of

\[ \left|w_l\right| \leadsto \sqrt{\text{var}\left(W_l\right)} = \frac{\sqrt{\bar{\rho_{ij}^l}}}{\sqrt{l!}} \ . \]

Normalization

The order of decay suggests that we should work with normalized coefficients, so that

\[ \left|w_l^n\right| = \sqrt{l!}\left|w_l\right| \leadsto \sqrt{\bar{\rho_{ij}^l}} \ . \] This information on the order of magnitude of the normalized \(w_l^n\) provides a hint for choosing \(\lambda_l^w\), as discussed below.
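For example, here is a small numeric illustration, assuming (purely hypothetically) a common pairwise correlation \(\rho\) so that \(\bar{\rho_{ij}^l} = \rho^l\), of how fast the target magnitudes decay with \(l\):

```r
## Illustration only: with a hypothetical common pairwise correlation rho,
## |w_l| should decay like sqrt(rho^l / l!) and the normalized |w_l^n| like sqrt(rho^l).
rho <- 0.7
l   <- 1:9
data.frame(l          = l,
           raw        = sqrt(rho^l / factorial(l)),
           normalized = sqrt(rho^l))
```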

Regularizing even orders only

Odd-order Gaussian derivatives are generally associated with the mean shift and skewness of \(f_0\), so their coefficients are usually very small; but when they are indeed not zero, they are important however small they are.

Meanwhile, when \(l\) is odd, \(\bar{\rho_{ij}^l}\) can be very small and not on the order of \(\rho^l\) for any \(\rho \in \left(0, 1\right)\), so it is very difficult to come up with a good \(\lambda_l^w\) when \(l\) is odd.

Therefore, it might be better to leave the odd orders alone and regularize the even orders only.
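As a quick sanity check of this point, here is a toy simulation (the \(\rho_{ij}\) values below are purely illustrative, not from any real data): when the pairwise correlations take both signs, their odd powers largely cancel in the average while the even powers do not.

```r
## Toy illustration: averages of odd powers of rho_ij can be near zero even
## when averages of even powers are not, so a rho^l-type calibration fails for odd l.
set.seed(1)
rho_ij <- runif(1e5, -0.8, 0.8)  # hypothetical pairwise correlations of mixed sign
sapply(2:5, function(l) mean(rho_ij^l))
```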

Penalty: \(l_1\) and \(l_2\)

Generally speaking, \(l_1\) imposes sparsity by shrinking small estimates exactly to zero, which is both good and bad for our case, whereas \(l_2\) penalizes large estimates severely but doesn’t force exact sparsity, which could also be good or bad.

We’ll implement both and see what happens.

\(\lambda_l^w\)

\(\lambda_l^w\) is chosen to help ensure

\[ \left|w_l^n\right| = \sqrt{l!}\left|w_l\right| \leadsto \sqrt{\bar{\rho_{ij}^l}} \ . \]

Given \(\rho \in \left(0, 1\right)\), for \(l_1\) regularization, \(\lambda_l^n \sim \lambda / \sqrt{\rho^l}\); for \(l_2\) regularization, \(\lambda_l^n \sim \lambda / \rho^l\).

\(L\)

So far, \(L = 9\) has been able to handle all the examples we've tried. Without constraints or regularization, \(L = 9\) is usually too large for cases where, say, \(L = 4\) is enough, and a larger-than-necessary \(L\) can lead to numerical instability.

Hence, we'll experiment with \(L = 12\), assuming it is large enough for all the correlation-induced distortions in practice, and assuming the regularization can keep the optimization from going crazy.

Summary

The \(w\) optimization problem becomes

\[ \begin{array}{rl} \min\limits_{w, g} & -\sum\limits_{j = 1}^n \log\left(g_j\right) + \text{Regularization} \\ \text{subject to} & Aw + a = g \\ & g \geq 0\ , \end{array} \] where \(A_{n \times L} = \left[a_1,\ldots, a_L\right]\) and \(a\) are computed with normalized Gaussian derivatives and Hermite polynomials.
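As a concrete sketch (an assumption about the construction, not code taken from this repository), the columns of \(A\) can be computed from the identity \(\varphi^{(l)}\left(z\right) = \left(-1\right)^l h_l\left(z\right)\varphi\left(z\right)\), with \(h_l\) the probabilist's Hermite polynomial, normalizing the \(l\)-th column by \(\sqrt{l!}\) to match the normalized coefficients:

```r
## Minimal sketch: build A (n x L) whose l-th column is the normalized l-th
## Gaussian derivative phi^(l)(z) / sqrt(l!) evaluated at the observed z-scores,
## and a = phi(z), so that g = A %*% w + a.
make_gd_matrix <- function(z, L) {
  n <- length(z)
  He <- matrix(0, n, L + 1)             # He[, l + 1] stores He_l(z)
  He[, 1] <- 1                          # He_0(z) = 1
  if (L >= 1) He[, 2] <- z              # He_1(z) = z
  if (L >= 2) for (l in 2:L) {
    He[, l + 1] <- z * He[, l] - (l - 1) * He[, l - 1]  # three-term recurrence
  }
  phi <- dnorm(z)
  A <- sapply(1:L, function(l) (-1)^l * He[, l + 1] * phi / sqrt(factorial(l)))
  list(A = A, a = phi)
}
```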

Let \(\lambda > 0\) and \(\rho \in \left(0, 1\right)\); the regularization term takes one of the two forms below.

\(l_1\) regularization

\[ \begin{array}{l} \sum\limits_{l = 1}^L\lambda_l^{w^s}\left|w_l^s\right| \ ,\\ \lambda_l^{w^s} = \begin{cases} 0, & l \text{ is odd;} \\ \lambda / \rho^{l/2}, & l \text{ is even.} \end{cases} \end{array} \]

\(l_2\) regularization

\[ \begin{array}{l} \sum\limits_{l = 1}^L \lambda_l^{w^s}{w_l^s}^2 \ ,\\ \lambda_l^{w^s} = \begin{cases} 0, & l \text{ is odd;} \\ \lambda / \rho^{l}, & l \text{ is even.} \end{cases} \end{array} \]
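A small helper (a hypothetical function, introduced here only for illustration) that assembles these penalty weights for a given \(L\), \(\lambda\), and \(\rho\):

```r
## Sketch: penalty weights lambda_l^{w^s}; odd orders are left unpenalized,
## even orders get lambda / rho^(l/2) for l_1 or lambda / rho^l for l_2.
make_lambda <- function(L, lambda, rho, penalty = c("l1", "l2")) {
  penalty <- match.arg(penalty)
  l <- seq_len(L)
  lam <- if (penalty == "l1") lambda / rho^(l / 2) else lambda / rho^l
  lam[l %% 2 == 1] <- 0  # do not regularize odd orders
  lam
}
```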

Dual problem for \(l_1\) regularization

The primal form is

\[ \begin{array}{rl} \min\limits_{w, g} & -\sum\limits_{j = 1}^n \log\left(g_j\right) + \sum\limits_{l = 1}^L\lambda_l^{w^s}\left|w_l^s\right| \\ \text{subject to} & Aw + a = g \\ & g \geq 0\ . \end{array} \] The dual form is

\[ \begin{array}{rl} \min\limits_{v} & -\sum\limits_{j = 1}^n \log\left(v_j\right) + a^Tv - n \\ \text{subject to} & -\lambda \leq A^Tv \leq \lambda \\ & v \geq 0\ . \end{array} \]
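As an illustrative sketch (not the implementation used here), the \(l_1\) dual above could be written with the CVXR package; `lam` is the bound on \(A^Tv\) (a single \(\lambda\) or a length-\(L\) vector of weights):

```r
## Sketch with CVXR (illustration only): the l_1 dual above.
## A is n x L, a is length n, lam is scalar or length-L.
library(CVXR)

solve_dual_l1 <- function(A, a, lam) {
  v <- Variable(nrow(A))
  obj <- Minimize(-sum(log(v)) + t(a) %*% v - nrow(A))
  constr <- list(t(A) %*% v <= lam,
                 t(A) %*% v >= -lam,
                 v >= 0)
  solve(Problem(obj, constr))
}
```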

Dual problem for \(l_2\) regularization

The primal form is

\[ \begin{array}{rl} \min\limits_{w, g} & -\sum\limits_{j = 1}^n \log\left(g_j\right) + \sum\limits_{l = 1}^L\lambda_l^{w^s}{w_l^s}^2 \\ \text{subject to} & Aw + a = g \\ & g \geq 0\ . \end{array} \] The dual form is

\[ \begin{array}{rl} \min\limits_{v} & -\sum\limits_{j = 1}^n \log\left(v_j\right) + a^Tv - n + \frac14 v^T \left(A \Lambda^{-1} A^T\right)v \\ \text{subject to} & a_j^Tv = 0 \ \text{ if }\lambda_j = 0 \ ,\\ & v \geq 0\ , \end{array} \] where \(\Lambda = \begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_L \end{bmatrix}\), and \({\Lambda^{-1}}_{jj} = 0\) when \(\lambda_j = 0\).
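A corresponding CVXR sketch for the \(l_2\) dual (again only an illustration): the quadratic term is written as \(\frac14\left\|\Lambda^{-1/2}A^Tv\right\|_2^2\) restricted to the columns with \(\lambda_l > 0\), while the columns with \(\lambda_l = 0\) enter as equality constraints.

```r
## Sketch with CVXR (illustration only): the l_2 dual above.
solve_dual_l2 <- function(A, a, lam) {
  v    <- Variable(nrow(A))
  pen  <- which(lam > 0)    # penalized (even) orders
  free <- which(lam == 0)   # unpenalized (odd) orders
  D <- diag(1 / sqrt(lam[pen]), nrow = length(pen))
  obj <- Minimize(-sum(log(v)) + t(a) %*% v - nrow(A) +
                  0.25 * sum_squares(D %*% t(A[, pen, drop = FALSE]) %*% v))
  constr <- list(v >= 0)
  if (length(free) > 0)
    constr <- c(constr, list(t(A[, free, drop = FALSE]) %*% v == 0))
  solve(Problem(obj, constr))
}
```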

Choosing \(\lambda\) and \(\rho\)

Let’s start with \(\rho = 0.9\) and \(\lambda = 0.1\).
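For instance, with the hypothetical helpers sketched above (make_gd_matrix, make_lambda, and the CVXR solvers), these starting values would be plugged in as:

```r
## Illustration of the suggested starting values with the sketched helpers.
rho    <- 0.9
lambda <- 0.1
L      <- 12
gd  <- make_gd_matrix(z, L)                        # z: the observed z-scores (assumed available)
lam <- make_lambda(L, lambda, rho, penalty = "l1")
fit <- solve_dual_l1(gd$A, gd$a, lam)
```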




This reproducible R Markdown analysis was created with workflowr 1.0.1