<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta name="author" content="Lei Sun" /> <meta name="date" content="2017-03-27" /> <title>Empirical Null with Gaussian Derivatives: Weight Constraints</title> <script src="site_libs/jquery-1.11.3/jquery.min.js"></script> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link href="site_libs/bootstrap-3.3.5/css/readable.min.css" rel="stylesheet" /> <script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script> <script src="site_libs/jqueryui-1.11.4/jquery-ui.min.js"></script> <link href="site_libs/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" /> <script src="site_libs/tocify-1.9.1/jquery.tocify.js"></script> <script src="site_libs/navigation-1.1/tabsets.js"></script> <link href="site_libs/highlightjs-9.12.0/textmate.css" rel="stylesheet" /> <script src="site_libs/highlightjs-9.12.0/highlight.js"></script> <link href="site_libs/font-awesome-4.5.0/css/font-awesome.min.css" rel="stylesheet" /> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> pre:not([class]) { background-color: white; } </style> <script type="text/javascript"> if (window.hljs) { hljs.configure({languages: []}); hljs.initHighlightingOnLoad(); if (document.readyState && document.readyState === "complete") { window.setTimeout(function() { hljs.initHighlighting(); }, 0); } } </script> <style type="text/css"> h1 { font-size: 34px; } h1.title { font-size: 38px; } h2 { font-size: 30px; } h3 { font-size: 24px; } h4 { font-size: 18px; } h5 { font-size: 16px; } h6 { font-size: 12px; } .table th:not([align]) { text-align: left; } </style> </head> <body> <style type = "text/css"> .main-container { max-width: 940px; margin-left: auto; margin-right: auto; } code { color: inherit; background-color: rgba(0, 0, 0, 0.04); } img { max-width:100%; height: auto; } .tabbed-pane { padding-top: 12px; } button.code-folding-btn:focus { outline: none; } </style> <style type="text/css"> /* padding for bootstrap navbar */ body { padding-top: 51px; padding-bottom: 40px; } /* offset scroll position for anchor links (for fixed navbar) */ .section h1 { padding-top: 56px; margin-top: -56px; } .section h2 { padding-top: 56px; margin-top: -56px; } .section h3 { padding-top: 56px; margin-top: -56px; } .section h4 { padding-top: 56px; margin-top: -56px; } .section h5 { padding-top: 56px; margin-top: -56px; } .section h6 { padding-top: 56px; margin-top: -56px; } </style> <script> // manage active state of menu based on current page $(document).ready(function () { // active menu anchor href = window.location.pathname href = href.substr(href.lastIndexOf('/') + 1) if (href === "") href = "index.html"; var menuAnchor = $('a[href="' + href + '"]'); // mark it active menuAnchor.parent().addClass('active'); // if it's got a parent navbar menu mark it active as well menuAnchor.closest('li.dropdown').addClass('active'); }); </script> <div class="container-fluid main-container"> <!-- tabsets --> <script> $(document).ready(function () { window.buildTabsets("TOC"); }); </script> <!-- code folding --> <script> $(document).ready(function () { // move toc-ignore selectors from section div to header $('div.section.toc-ignore') .removeClass('toc-ignore') .children('h1,h2,h3,h4,h5').addClass('toc-ignore'); // establish options var options = { selectors: "h1,h2,h3", theme: "bootstrap3", context: '.toc-content', hashGenerator: function (text) { return text.replace(/[.\\/?&!#<>]/g, '').replace(/\s/g, '_').toLowerCase(); }, ignoreSelector: ".toc-ignore", scrollTo: 0 }; options.showAndHide = true; options.smoothScroll = true; // tocify var toc = $("#TOC").tocify(options).data("toc-tocify"); }); </script> <style type="text/css"> #TOC { margin: 25px 0px 20px 0px; } @media (max-width: 768px) { #TOC { position: relative; width: 100%; } } .toc-content { padding-left: 30px; padding-right: 40px; } div.main-container { max-width: 1200px; } div.tocify { width: 20%; max-width: 260px; max-height: 85%; } @media (min-width: 768px) and (max-width: 991px) { div.tocify { width: 25%; } } @media (max-width: 767px) { div.tocify { width: 100%; max-width: none; } } .tocify ul, .tocify li { line-height: 20px; } .tocify-subheader .tocify-item { font-size: 0.90em; padding-left: 25px; text-indent: 0; } .tocify .list-group-item { border-radius: 0px; } </style> <!-- setup 3col/9col grid for toc_float and main content --> <div class="row-fluid"> <div class="col-xs-12 col-sm-4 col-md-3"> <div id="TOC" class="tocify"> </div> </div> <div class="toc-content col-xs-12 col-sm-8 col-md-9"> <div class="navbar navbar-default navbar-fixed-top" role="navigation"> <div class="container"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar"> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand" href="index.html">truncash</a> </div> <div id="navbar" class="navbar-collapse collapse"> <ul class="nav navbar-nav"> <li> <a href="about.html">About</a> </li> <li> <a href="license.html">License</a> </li> </ul> <ul class="nav navbar-nav navbar-right"> <li> <a href="https://github.com/LSun/truncash"> <span class="fa fa-github"></span> </a> </li> </ul> </div><!--/.nav-collapse --> </div><!--/.container --> </div><!--/.navbar --> <!-- Add a small amount of space between sections. --> <style type="text/css"> div.section { padding-top: 12px; } </style> <div class="fluid-row" id="header"> <h1 class="title toc-ignore">Empirical Null with Gaussian Derivatives: Weight Constraints</h1> <h4 class="author"><em>Lei Sun</em></h4> <h4 class="date"><em>2017-03-27</em></h4> </div> <p><strong>Last updated:</strong> 2018-05-15</p> <strong>workflowr checks:</strong> <small>(Click a bullet for more information)</small> <ul> <li> <details> <p><summary> <strong style="color:blue;">✔</strong> <strong>R Markdown file:</strong> up-to-date </summary></p> <p>Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.</p> </details> </li> <li> <details> <p><summary> <strong style="color:blue;">✔</strong> <strong>Environment:</strong> empty </summary></p> <p>Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.</p> </details> </li> <li> <details> <p><summary> <strong style="color:blue;">✔</strong> <strong>Seed:</strong> <code>set.seed(12345)</code> </summary></p> <p>The command <code>set.seed(12345)</code> was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.</p> </details> </li> <li> <details> <p><summary> <strong style="color:blue;">✔</strong> <strong>Session information:</strong> recorded </summary></p> <p>Great job! Recording the operating system, R version, and package versions is critical for reproducibility.</p> </details> </li> <li> <details> <p><summary> <strong style="color:blue;">✔</strong> <strong>Repository version:</strong> <a href="https://github.com/LSun/truncash/tree/388e65e06000e313c170a82f3ed57346f6024897" target="_blank">388e65e</a> </summary></p> Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated. <br><br> Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use <code>wflow_publish</code> or <code>wflow_git_commit</code>). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated: <pre><code> Ignored files: Ignored: .DS_Store Ignored: .Rhistory Ignored: .Rproj.user/ Ignored: analysis/.DS_Store Ignored: analysis/BH_robustness_cache/ Ignored: analysis/FDR_Null_cache/ Ignored: analysis/FDR_null_betahat_cache/ Ignored: analysis/Rmosek_cache/ Ignored: analysis/StepDown_cache/ Ignored: analysis/alternative2_cache/ Ignored: analysis/alternative_cache/ Ignored: analysis/ash_gd_cache/ Ignored: analysis/average_cor_gtex_2_cache/ Ignored: analysis/average_cor_gtex_cache/ Ignored: analysis/brca_cache/ Ignored: analysis/cash_deconv_cache/ Ignored: analysis/cash_fdr_1_cache/ Ignored: analysis/cash_fdr_2_cache/ Ignored: analysis/cash_fdr_3_cache/ Ignored: analysis/cash_fdr_4_cache/ Ignored: analysis/cash_fdr_5_cache/ Ignored: analysis/cash_fdr_6_cache/ Ignored: analysis/cash_plots_cache/ Ignored: analysis/cash_sim_1_cache/ Ignored: analysis/cash_sim_2_cache/ Ignored: analysis/cash_sim_3_cache/ Ignored: analysis/cash_sim_4_cache/ Ignored: analysis/cash_sim_5_cache/ Ignored: analysis/cash_sim_6_cache/ Ignored: analysis/cash_sim_7_cache/ Ignored: analysis/correlated_z_2_cache/ Ignored: analysis/correlated_z_3_cache/ Ignored: analysis/correlated_z_cache/ Ignored: analysis/create_null_cache/ Ignored: analysis/cutoff_null_cache/ Ignored: analysis/design_matrix_2_cache/ Ignored: analysis/design_matrix_cache/ Ignored: analysis/diagnostic_ash_cache/ Ignored: analysis/diagnostic_correlated_z_2_cache/ Ignored: analysis/diagnostic_correlated_z_3_cache/ Ignored: analysis/diagnostic_correlated_z_cache/ Ignored: analysis/diagnostic_plot_2_cache/ Ignored: analysis/diagnostic_plot_cache/ Ignored: analysis/efron_leukemia_cache/ Ignored: analysis/fitting_normal_cache/ Ignored: analysis/gaussian_derivatives_2_cache/ Ignored: analysis/gaussian_derivatives_3_cache/ Ignored: analysis/gaussian_derivatives_4_cache/ Ignored: analysis/gaussian_derivatives_5_cache/ Ignored: analysis/gaussian_derivatives_cache/ Ignored: analysis/gd-ash_cache/ Ignored: analysis/gd_delta_cache/ Ignored: analysis/gd_lik_2_cache/ Ignored: analysis/gd_lik_cache/ Ignored: analysis/gd_w_cache/ Ignored: analysis/knockoff_10_cache/ Ignored: analysis/knockoff_2_cache/ Ignored: analysis/knockoff_3_cache/ Ignored: analysis/knockoff_4_cache/ Ignored: analysis/knockoff_5_cache/ Ignored: analysis/knockoff_6_cache/ Ignored: analysis/knockoff_7_cache/ Ignored: analysis/knockoff_8_cache/ Ignored: analysis/knockoff_9_cache/ Ignored: analysis/knockoff_cache/ Ignored: analysis/knockoff_var_cache/ Ignored: analysis/marginal_z_alternative_cache/ Ignored: analysis/marginal_z_cache/ Ignored: analysis/mosek_reg_2_cache/ Ignored: analysis/mosek_reg_4_cache/ Ignored: analysis/mosek_reg_5_cache/ Ignored: analysis/mosek_reg_6_cache/ Ignored: analysis/mosek_reg_cache/ Ignored: analysis/pihat0_null_cache/ Ignored: analysis/plot_diagnostic_cache/ Ignored: analysis/poster_obayes17_cache/ Ignored: analysis/real_data_simulation_2_cache/ Ignored: analysis/real_data_simulation_3_cache/ Ignored: analysis/real_data_simulation_4_cache/ Ignored: analysis/real_data_simulation_5_cache/ Ignored: analysis/real_data_simulation_cache/ Ignored: analysis/rmosek_primal_dual_2_cache/ Ignored: analysis/rmosek_primal_dual_cache/ Ignored: analysis/seqgendiff_cache/ Ignored: analysis/simulated_correlated_null_2_cache/ Ignored: analysis/simulated_correlated_null_3_cache/ Ignored: analysis/simulated_correlated_null_cache/ Ignored: analysis/simulation_real_se_2_cache/ Ignored: analysis/simulation_real_se_cache/ Ignored: analysis/smemo_2_cache/ Ignored: data/LSI/ Ignored: docs/.DS_Store Ignored: docs/figure/.DS_Store Ignored: output/fig/ </code></pre> Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes. </details> </li> </ul> <details> <summary> <small><strong>Expand here to see past versions:</strong></small> </summary> <ul> <table style="border-collapse:separate; border-spacing:5px;"> <thead> <tr> <th style="text-align:left;"> File </th> <th style="text-align:left;"> Version </th> <th style="text-align:left;"> Author </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Message </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/LSun/truncash/e05bc836b3c74dc6ebca415afb5938675d6c5436/docs/gaussian_derivatives_5.html" target="_blank">e05bc83</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2018-05-12 </td> <td style="text-align:left;"> Update to 1.0 </td> </tr> <tr> <td style="text-align:left;"> rmd </td> <td style="text-align:left;"> <a href="https://github.com/LSun/truncash/blob/cc0ab8379469bc3726f1508cd81e4ecd6fef1b1a/analysis/gaussian_derivatives_5.rmd" target="_blank">cc0ab83</a> </td> <td style="text-align:left;"> Lei Sun </td> <td style="text-align:left;"> 2018-05-11 </td> <td style="text-align:left;"> update </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/LSun/truncash/0f36d998db26444c5dd01502ea1af7fbd1129b22/docs/gaussian_derivatives_5.html" target="_blank">0f36d99</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-12-21 </td> <td style="text-align:left;"> Build site. </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/LSun/truncash/6e42447d053260ae6798ad7a7f04532c41976c02/docs/gaussian_derivatives_5.html" target="_blank">6e42447</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-12-21 </td> <td style="text-align:left;"> Build site. </td> </tr> <tr> <td style="text-align:left;"> rmd </td> <td style="text-align:left;"> <a href="https://github.com/LSun/truncash/blob/fd352684e29c65e25eef3746c2564fb008a5b035/analysis/gaussian_derivatives_5.rmd" target="_blank">fd35268</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-12-21 </td> <td style="text-align:left;"> wflow_publish(“analysis/gaussian_derivatives_5.rmd”) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/LSun/truncash/853a484bfacf347e109f6c8fb3ffaab5f4d6cc02/docs/gaussian_derivatives_5.html" target="_blank">853a484</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-11-07 </td> <td style="text-align:left;"> Build site. </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/LSun/truncash/1ea081a9eeb7fd3101271eeefe10ef8be9993622/docs/gaussian_derivatives_5.html" target="_blank">1ea081a</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-07-03 </td> <td style="text-align:left;"> sites </td> </tr> <tr> <td style="text-align:left;"> rmd </td> <td style="text-align:left;"> <a href="https://github.com/LSun/truncash/blob/0f49e8acfa54a62535e4c5d346cd89cdc4826731/analysis/gaussian_derivatives_5.rmd" target="_blank">0f49e8a</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-03-31 </td> <td style="text-align:left;"> alternative simulation </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/LSun/truncash/0f49e8acfa54a62535e4c5d346cd89cdc4826731/docs/gaussian_derivatives_5.html" target="_blank">0f49e8a</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-03-31 </td> <td style="text-align:left;"> alternative simulation </td> </tr> <tr> <td style="text-align:left;"> rmd </td> <td style="text-align:left;"> <a href="https://github.com/LSun/truncash/blob/2a9b0b7220e4b3802de7bc196687fbd3b2b80baf/analysis/gaussian_derivatives_5.rmd" target="_blank">2a9b0b7</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-03-30 </td> <td style="text-align:left;"> weights </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/LSun/truncash/2a9b0b7220e4b3802de7bc196687fbd3b2b80baf/docs/gaussian_derivatives_5.html" target="_blank">2a9b0b7</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-03-30 </td> <td style="text-align:left;"> weights </td> </tr> <tr> <td style="text-align:left;"> rmd </td> <td style="text-align:left;"> <a href="https://github.com/LSun/truncash/blob/7344c2d6c85eeaa4a51e948f9177261ba4c502b5/analysis/gaussian_derivatives_5.rmd" target="_blank">7344c2d</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-03-30 </td> <td style="text-align:left;"> constraints </td> </tr> <tr> <td style="text-align:left;"> rmd </td> <td style="text-align:left;"> <a href="https://github.com/LSun/truncash/blob/5bd5d70c2467c0ebba78974f24b8603e18800886/analysis/gaussian_derivatives_5.rmd" target="_blank">5bd5d70</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-03-28 </td> <td style="text-align:left;"> revision </td> </tr> <tr> <td style="text-align:left;"> rmd </td> <td style="text-align:left;"> <a href="https://github.com/LSun/truncash/blob/c803b43a9255f39ecac06315f7e88db746dddfdf/analysis/gaussian_derivatives_5.rmd" target="_blank">c803b43</a> </td> <td style="text-align:left;"> LSun </td> <td style="text-align:left;"> 2017-03-27 </td> <td style="text-align:left;"> fitting and write-up </td> </tr> </tbody> </table> </ul> </details> <hr /> <div id="numerical-issues" class="section level2"> <h2>Numerical issues</h2> <p>When fitting Gaussian derivatives, <a href="gaussian_derivatives_3.rmd">overfitting</a> or <a href="gaussian_derivatives_4.html">heavy correlation</a> may cause numerical instability. Most of the time, the numerical instability leads to ridiculous weights <span class="math inline">\(w_k\)</span>, making the fitted density break <a href="gaussian_derivatives.html#assumption_2:_we_further_assume_that_the_number_of_observations_(n)_is_sufficiently_large,_so_that_we_can_use_the_(n)_observed_(z)_scores_in_place_of_the_intractable_all_(xinmathbb%7Br%7D)_in_the_second_constraint">the nonnegativity constraint</a> for <span class="math inline">\(x\neq z_i\)</span>’s.</p> <p>Let normalized weights <span class="math inline">\(W_k^s = W_k\sqrt{k!}\)</span>. As <a href="gaussian_derivatives.html">shown previously</a>, under correlated null, the variance <span class="math inline">\(\text{var}(W_k^s) = \alpha_k = \bar{\rho_{ij}^k}\)</span>. Thus, under correlated null, the Gaussian derivative decomposition of the empirical distribution should have “reasonable” weights of similar decaying patterns. In other words, <span class="math inline">\(W_k^s\)</span> with mean <span class="math inline">\(0\)</span> and variance <span class="math inline">\(\bar{\rho_{ij}^k}\)</span>, should be in the order of <span class="math inline">\(\sqrt{\rho^K}\)</span> with a <span class="math inline">\(\rho\in (0, 1)\)</span>.</p> </div> <div id="examples" class="section level2"> <h2>Examples</h2> <p>This example shows that numerical instability is reflected in the number of Gaussian derivatives fitted <span class="math inline">\(K\)</span> being too large, as well as in the normalized fitted weights of Gaussian derivatives <span class="math inline">\(\hat w_k\sqrt{k!}\)</span> being completely out of order <span class="math inline">\(\sqrt{\rho^K}\)</span>.</p> <pre class="r"><code>source("../code/ecdfz.R")</code></pre> <pre class="r"><code>z = read.table("../output/z_null_liver_777.txt") p = read.table("../output/p_null_liver_777.txt")</code></pre> <pre class="r"><code>library(ashr) DataSet = c(522, 724) res_DataSet = list() for (i in 1:length(DataSet)) { zscore = as.numeric(z[DataSet[i], ]) fit.ecdfz = ecdfz.optimal(zscore) fit.ash = ash(zscore, 1, method = "fdr") fit.ash.pi0 = get_pi0(fit.ash) pvalue = as.numeric(p[DataSet[i], ]) fd.bh = sum(p.adjust(pvalue, method = "BH") <= 0.05) res_DataSet[[i]] = list(DataSet = DataSet[i], fit.ecdfz = fit.ecdfz, fit.ash = fit.ash, fit.ash.pi0 = fit.ash.pi0, fd.bh = fd.bh, zscore = zscore, pvalue = pvalue) }</code></pre> <pre class="r"><code>library(EQL) x.pt = seq(-5, 5, 0.01) H.pt = sapply(1:15, EQL::hermite, x = x.pt)</code></pre> <pre><code>Data Set 724 : Number of BH's False Discoveries: 79 ; ASH's pihat0 = 0.01606004</code></pre> <pre><code>Normalized Weights for K = 8:</code></pre> <pre><code>1 : 0.0359300698579203 ; 2 : 1.08855871642562 ; 3 : 0.0779368130366434 ; 4 : 0.345861286563959 ; 5 : 0.00912636957148547 ; 6 : -0.291318682290939 ; 7 : -0.0336534605417156 ; 8 : -0.19310963474206 ;</code></pre> <pre><code>Normalized Weights for K = 14:</code></pre> <pre><code>1 : -0.676936454061805 ; 2 : -9.24968885347997 ; 3 : -8.3124954075297 ; 4 : -87.0040400264205 ; 5 : -38.5774993866502 ; 6 : -327.811119421261 ; 7 : -92.5614572826525 ; 8 : -679.045197952641 ; 9 : -123.743821656837 ; 10 : -812.599139016504 ; 11 : -88.0469137351964 ; 12 : -530.268674468285 ; 13 : -26.1051054428588 ; 14 : -146.98364434743 ;</code></pre> <p><img src="figure/gaussian_derivatives_5.rmd/unnamed-chunk-5-1.png" width="672" style="display: block; margin: auto;" /><img src="figure/gaussian_derivatives_5.rmd/unnamed-chunk-5-2.png" width="672" style="display: block; margin: auto;" /></p> </div> <div id="weight-constraints" class="section level2"> <h2>Weight constraints</h2> <p>Therefore, we can impose regularity in the fitted Gaussian derivatives by imposing constraints on <span class="math inline">\(w_k\)</span>. A good set of weights should have following properties.</p> <ol style="list-style-type: decimal"> <li>They should make <span class="math inline">\(\sum\limits_{k = 1}^\infty w_kh_k(x) + 1\)</span> non-negative for <span class="math inline">\(\forall x\in\mathbb{R}\)</span>. This constraint needs to be satisfied for any distribution.</li> <li>They should decay in roughly exponential order such that <span class="math inline">\(w_k^s = w_k\sqrt{k!} \sim \sqrt{\rho^K}\)</span>. This constraint needs to be satisfied particularly for empirical correlated null.</li> <li><span class="math inline">\(w_k\)</span> vanishes to essentially <span class="math inline">\(0\)</span> for sufficiently large <span class="math inline">\(k\)</span>. This constraint can be seen as coming from the last one, leading to the simplification that only first <span class="math inline">\(K\)</span> orders of Gaussian derivatives are enough.</li> </ol> <p>As <a href="https://galton.uchicago.edu/~pmcc/">Prof. Peter McCullagh</a> pointed out during a chat, there should be a rich literature discussing the non-negativity / positivity condition for Gaussian derivative decomposition, also known as <a href="https://en.wikipedia.org/wiki/Edgeworth_series#Disadvantages_of_the_Edgeworth_expansion">Edgeworth expansion</a>. <strong>This could potentially be a direction to look at.</strong></p> <p>An approximation to the non-negativity constraint may come from <a href="gaussian_derivatives.html#gaussian_derivatives_and_hermite_polynomials">the fact</a> that due to orthogonality of Hermite polynomials,</p> <p><span class="math display">\[ W_k = \frac{1}{k!}\int_{-\infty}^\infty f_0(x)h_k(x)dx \ . \]</span> Therefore, if <span class="math inline">\(f_0\)</span> is truly a proper density,</p> <p><span class="math display">\[ \begin{array}{rclcl} W_1 &=& \frac{1}{1!}\int_{-\infty}^\infty f_0(x)h_1(x)dx = \int_{-\infty}^\infty x f_0(x)dx &=& E[X]_{F_0} \\ W_2 &=& \frac{1}{2!}\int_{-\infty}^\infty f_0(x)h_1(x)dx = \frac12\int_{-\infty}^\infty (x^2-1) f_0(x)dx &=& \frac12E[X^2]_{F_0} -\frac12\\ W_3 &=& \frac{1}{3!}\int_{-\infty}^\infty f_0(x)h_1(x)dx = \frac16\int_{-\infty}^\infty (x^3-3x) f_0(x)dx &=& \frac16E[X^3]_{F_0} -\frac12E[X]_{F_0}\\ W_4 &=& \frac{1}{4!}\int_{-\infty}^\infty f_0(x)h_1(x)dx = \frac1{24}\int_{-\infty}^\infty (x^4-6x^2+3) f_0(x)dx &=& \frac1{24}E[X^4]_{F_0} -\frac14E[X^2]_{F_0} + \frac18\\ &\vdots& \end{array} \]</span> Note that <span class="math inline">\(F_0\)</span> is not the empirical cdf <span class="math inline">\(\hat F\)</span>, even if we’ve taken into consideration that <span class="math inline">\(N\)</span> is sufficiently large, and the correlation structure is solely determined by <span class="math inline">\(W_k\)</span>’s and Gaussian derivatives. <span class="math inline">\(F_0\)</span> is inherently continuous, whereas the empirical cdf <span class="math inline">\(\hat F\)</span> is inherently non-differentiable. This implies that the mean of <span class="math inline">\(F_0\)</span>, <span class="math inline">\(E[X]_{F_0} \neq \bar X\)</span>, the mean of the empirical cdf <span class="math inline">\(\hat F\)</span>. <span class="math inline">\(W_k\)</span>’s are still parameters of <span class="math inline">\(F_0\)</span> that are not readily determined even given the observations (hence given the empirical cdf).</p> <p>This relationship inspires the following two ways to constraint <span class="math inline">\(W_k\)</span>’s.</p> </div> <div id="method-of-moments-estimates" class="section level2"> <h2>Method of moments estimates</h2> <p>Instead of using maxmimum likelihood estimates of <span class="math inline">\(f_0\)</span>, that is, <span class="math inline">\(\max\sum\limits_{i = 1}^n\log\left(\sum\limits_{k = 1}^\infty w_kh_k(z_i) + 1\right)\)</span>, we may use method of moments estimates:</p> <p><span class="math display">\[ \begin{array}{rcl} w_1 &=& \hat E[X]_{F_0}\\ w_2 &=& \frac12\hat E[X^2]_{F_0} -\frac12\\ w_3 &=& \frac16\hat E[X^3]_{F_0} -\frac12\hat E[X]_{F_0}\\ w_4 &=& \frac1{24}\hat E[X^4]_{F_0} -\frac14\hat E[X^2]_{F_0} + \frac18\\ &\vdots&\\ w_k &=& \cdots \end{array} \]</span></p> </div> <div id="constraining-weights-by-moments" class="section level2"> <h2>Constraining weights by moments</h2> <p>Another way is to constraint the weights by the properties of the moments to prevent them going crazy, such that</p> <p><span class="math display">\[ W_2 = \frac12E[X^2]_{F_0} -\frac12 \Rightarrow w_2 \geq -\frac12 \ . \]</span></p> <p>We may also combine the moment estimates and constraints like</p> <p><span class="math display">\[ \begin{array}{rcl} w_1 &=& \hat E[X]_{F_0}\\ w_2 &\geq& -\frac12\\ &\vdots& \end{array} \]</span> ## Quantile Constraints</p> <p>It has been shown <a href="FDR_Null.html#result">here</a>, <a href="FDR_null_betahat.html#result">here</a>, <a href="correlated_z_3.html#result">here</a> that Benjamini-Hochberg (BH) is suprisingly robust under correlation. So Matthew has an idea to constrain the quantiles of the estimated empirical null. In his words,</p> <blockquote> <p>I wonder if you can constrain the quantiles of the estimated null distribution using something based on BH. Eg, the estimated quantiles should be no more than <span class="math inline">\(1/\alpha\)</span> times more extreme than expected under <span class="math inline">\(N(0,1)\)</span> where <span class="math inline">\(\alpha\)</span> is chosen to be the level you’d like to control FDR at under the global null.</p> </blockquote> <blockquote> <p>This is not a very specific idea, but maybe gives you an idea of the kind of constraint I mean… <strong>Something to stop that estimated null having too extreme tails.</strong></p> </blockquote> <p>The good thing is this idea of constraining quantiles is easy to implement.</p> <p><span class="math display">\[ \begin{array}{rcl} F_0(x) &=& \displaystyle\Phi(x) + \sum\limits_{k = 1}^\infty W_k(-1)^k \varphi^{(k - 1)}(x)\\ &=& \displaystyle\Phi(x) - \sum\limits_{k = 1}^\infty W_kh_{k-1}(x)\varphi(x) \end{array} \]</span> So the constraints that <span class="math inline">\(F_0(x) \leq g(\alpha)\)</span> for certain <span class="math inline">\(x, g, \alpha\)</span> are essentially linear constraints on <span class="math inline">\(W_k\)</span>. Thus the convexity of the problem is preserved.</p> </div> <div id="implementation" class="section level2"> <h2>Implementation</h2> <p>Right now using the current implementation the results are disappointing. Using the constraint on <span class="math inline">\(w_2\)</span> usually makes the optimization unstable. Using moment estimates on <span class="math inline">\(w_1\)</span> and <span class="math inline">\(w_2\)</span> performs better, but only sporadically.</p> <p>Here we are re-running the optimization on <a href="gaussian_derivatives_4.html#fitting_experiments_when_(rho_%7Bij%7D)_is_large">the previous <span class="math inline">\(\rho_{ij} \equiv 0.7\)</span> example</a> with <span class="math inline">\(\hat w_1\)</span> and <span class="math inline">\(\hat w_2\)</span> estimated by the method of moments. Without this tweak current implementation can only work up to <span class="math inline">\(K = 3\)</span>, which is obviously insufficient. Now it can go up to <span class="math inline">\(K = 6\)</span>. <span class="math inline">\(K = 7\)</span> doesn’t look good, although making some improvement compared with <span class="math inline">\(K = 6\)</span>.</p> <p>The results are not particularly encouraging. Right now we’ll leave it here.</p> <pre class="r"><code>rho = 0.7 set.seed(777) n = 1e4 z = rnorm(1) * sqrt(rho) + rnorm(n) * sqrt(1 - rho) w.3 = list() for (ord in 6:7) { H = sapply(1 : ord, EQL::hermite, x = z) w <- Variable(ord - 2) w1 = mean(z) w2 = mean(z^2) / 2 - .5 H2w = H[, 1:2] %*% c(w1, w2) objective <- Maximize(CVXR::sum_entries(CVXR::log1p(H[, 3:ord] %*% w + H2w))) prob <- Problem(objective) capture.output(result <- solve(prob), file = "/dev/null") w.3[[ord - 5]] = c(w1, w2, result$primal_values[[1]]) }</code></pre> <p><img src="figure/gaussian_derivatives_5.rmd/unnamed-chunk-8-1.png" width="672" style="display: block; margin: auto;" /></p> </div> <div id="conclusion" class="section level2"> <h2>Conclusion</h2> <p>Weights <span class="math inline">\(w_k\)</span> are of central importance in the algorithm. They contain the information regarding whether the composition of Gaussian derivatives is a proper density function, and if it is, whether it’s a correlated null. We need to find an ingenious way to impose appropriate constraints on <span class="math inline">\(w_k\)</span>.</p> </div> <div id="appendix-implementation-of-moments-constraints" class="section level2"> <h2>Appendix: implementation of moments constraints</h2> </div> <div id="session-information" class="section level2"> <h2>Session information</h2> <pre class="r"><code>sessionInfo()</code></pre> <pre><code>R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.4 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] CVXR_0.95 EQL_1.0-0 ttutils_1.0-1 loaded via a namespace (and not attached): [1] Rcpp_0.12.16 knitr_1.20 whisker_0.3-2 [4] magrittr_1.5 workflowr_1.0.1 bit_1.1-12 [7] lattice_0.20-35 R6_2.2.2 stringr_1.3.0 [10] tools_3.4.3 grid_3.4.3 R.oo_1.21.0 [13] git2r_0.21.0 scs_1.1-1 htmltools_0.3.6 [16] bit64_0.9-7 yaml_2.1.18 rprojroot_1.3-2 [19] digest_0.6.15 gmp_0.5-13.1 Matrix_1.2-12 [22] ECOSolveR_0.4 R.utils_2.6.0 evaluate_0.10.1 [25] rmarkdown_1.9 stringi_1.1.6 Rmpfr_0.6-1 [28] compiler_3.4.3 backports_1.1.2 R.methodsS3_1.7.1</code></pre> </div> <!-- Adjust MathJax settings so that all math formulae are shown using TeX fonts only; see http://docs.mathjax.org/en/latest/configuration.html. This will make the presentation more consistent at the cost of the webpage sometimes taking slightly longer to load. Note that this only works because the footer is added to webpages before the MathJax javascript. --> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ "HTML-CSS": { availableFonts: ["TeX"] } }); </script> <hr> <p> This reproducible <a href="http://rmarkdown.rstudio.com">R Markdown</a> analysis was created with <a href="https://github.com/jdblischak/workflowr">workflowr</a> 1.0.1 </p> <hr> </div> </div> </div> <script> // add bootstrap table styles to pandoc tables function bootstrapStylePandocTables() { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); } $(document).ready(function () { bootstrapStylePandocTables(); }); </script> <!-- dynamically load mathjax for compatibility with self-contained --> <script> (function () { var script = document.createElement("script"); script.type = "text/javascript"; script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"; document.getElementsByTagName("head")[0].appendChild(script); })(); </script> </body> </html>