<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta name="author" content="Lei Sun" /> <meta name="date" content="2017-02-27" /> <title>When only the most extreme observation is known</title> <script src="site_libs/jquery-1.11.3/jquery.min.js"></script> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link href="site_libs/bootstrap-3.3.5/css/cosmo.min.css" rel="stylesheet" /> <script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script> <script src="site_libs/jqueryui-1.11.4/jquery-ui.min.js"></script> <link href="site_libs/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" /> <script src="site_libs/tocify-1.9.1/jquery.tocify.js"></script> <script src="site_libs/navigation-1.1/tabsets.js"></script> <link href="site_libs/highlightjs-1.1/textmate.css" rel="stylesheet" /> <script src="site_libs/highlightjs-1.1/highlight.js"></script> <link href="site_libs/font-awesome-4.5.0/css/font-awesome.min.css" rel="stylesheet" /> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> pre:not([class]) { background-color: white; } </style> <script type="text/javascript"> if (window.hljs && document.readyState && document.readyState === "complete") { window.setTimeout(function() { hljs.initHighlighting(); }, 0); } </script> <style type="text/css"> h1 { font-size: 34px; } h1.title { font-size: 38px; } h2 { font-size: 30px; } h3 { font-size: 24px; } h4 { font-size: 18px; } h5 { font-size: 16px; } h6 { font-size: 12px; } .table th:not([align]) { text-align: left; } </style> </head> <body> <style type = "text/css"> .main-container { max-width: 940px; margin-left: auto; margin-right: auto; } code { color: inherit; background-color: rgba(0, 0, 0, 0.04); } img { max-width:100%; height: auto; } .tabbed-pane { padding-top: 12px; } button.code-folding-btn:focus { outline: none; } </style> <style type="text/css"> /* padding for bootstrap navbar */ body { padding-top: 51px; padding-bottom: 40px; } /* offset scroll position for anchor links (for fixed navbar) */ .section h1 { padding-top: 56px; margin-top: -56px; } .section h2 { padding-top: 56px; margin-top: -56px; } .section h3 { padding-top: 56px; margin-top: -56px; } .section h4 { padding-top: 56px; margin-top: -56px; } .section h5 { padding-top: 56px; margin-top: -56px; } .section h6 { padding-top: 56px; margin-top: -56px; } </style> <script> // manage active state of menu based on current page $(document).ready(function () { // active menu anchor href = window.location.pathname href = href.substr(href.lastIndexOf('/') + 1) if (href === "") href = "index.html"; var menuAnchor = $('a[href="' + href + '"]'); // mark it active menuAnchor.parent().addClass('active'); // if it's got a parent navbar menu mark it active as well menuAnchor.closest('li.dropdown').addClass('active'); }); </script> <div class="container-fluid main-container"> <!-- tabsets --> <script> $(document).ready(function () { window.buildTabsets("TOC"); }); </script> <!-- code folding --> <script> $(document).ready(function () { // move toc-ignore selectors from section div to header $('div.section.toc-ignore') .removeClass('toc-ignore') .children('h1,h2,h3,h4,h5').addClass('toc-ignore'); // establish options var options = { selectors: "h1,h2,h3", theme: "bootstrap3", context: '.toc-content', hashGenerator: function (text) { return text.replace(/[.\\/?&!#<>]/g, '').replace(/\s/g, '_').toLowerCase(); }, ignoreSelector: ".toc-ignore", scrollTo: 0 }; options.showAndHide = true; options.smoothScroll = true; // tocify var toc = $("#TOC").tocify(options).data("toc-tocify"); }); </script> <style type="text/css"> #TOC { margin: 25px 0px 20px 0px; } @media (max-width: 768px) { #TOC { position: relative; width: 100%; } } .toc-content { padding-left: 30px; padding-right: 40px; } div.main-container { max-width: 1200px; } div.tocify { width: 20%; max-width: 260px; max-height: 85%; } @media (min-width: 768px) and (max-width: 991px) { div.tocify { width: 25%; } } @media (max-width: 767px) { div.tocify { width: 100%; max-width: none; } } .tocify ul, .tocify li { line-height: 20px; } .tocify-subheader .tocify-item { font-size: 0.90em; padding-left: 25px; text-indent: 0; } .tocify .list-group-item { border-radius: 0px; } </style> <!-- setup 3col/9col grid for toc_float and main content --> <div class="row-fluid"> <div class="col-xs-12 col-sm-4 col-md-3"> <div id="TOC" class="tocify"> </div> </div> <div class="toc-content col-xs-12 col-sm-8 col-md-9"> <div class="navbar navbar-default navbar-fixed-top" role="navigation"> <div class="container"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar"> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand" href="index.html">truncash</a> </div> <div id="navbar" class="navbar-collapse collapse"> <ul class="nav navbar-nav"> <li> <a href="index.html">Home</a> </li> <li> <a href="about.html">About</a> </li> <li> <a href="license.html">License</a> </li> </ul> <ul class="nav navbar-nav navbar-right"> <li> <a href="https://github.com/LSun/truncash"> <span class="fa fa-github"></span> </a> </li> </ul> </div><!--/.nav-collapse --> </div><!--/.container --> </div><!--/.navbar --> <div class="fluid-row" id="header"> <h1 class="title toc-ignore">When only the most extreme observation is known</h1> <h4 class="author"><em>Lei Sun</em></h4> <h4 class="date"><em>2017-02-27</em></h4> </div> <p><strong>Last updated:</strong> 2017-03-04</p> <p><strong>Code version:</strong> be223d3</p> <div id="introducation" class="section level2"> <h2>Introducation</h2> <p>Up until now, <code>truncash</code> only uses a threshold that’s pre-specified, that is, independent with the data. So a natrual question is, what will happen if we choose a threshold that is data driven, such as the <span class="math inline">\(n^\text{th}\)</span> most extreme observation or the top <span class="math inline">\(q\%\)</span> quantile?</p> <p>For a start, Matthew had an idea that what if the only thing we know is the most extreme observation <span class="math inline">\((\hat\beta_{(n)}, \hat s_{(n)})\)</span>, as well as the total number of observations <span class="math inline">\(n\)</span>. What does this single data point tell us?</p> </div> <div id="model" class="section level2"> <h2>Model</h2> <p>Start with our usual <code>ash</code> model.</p> <p><span class="math display">\[ \begin{array}{c} \hat\beta_j | \hat s_j, \beta_j \sim N(\beta_j, \hat s_j^2)\\ \beta_j \sim \sum_k\pi_k N(0, \sigma_k^2) \end{array} \]</span> Now we only observe <span class="math inline">\((\hat\beta_{(n)}, \hat s_{(n)})\)</span> with the information that <span class="math inline">\(|\hat\beta_{(n)}/\hat s_{(n)}| \geq |\hat\beta_{j}/\hat s_{j}|\)</span>, <span class="math inline">\(j = 1, \ldots, n\)</span>. This is essentially separating <span class="math inline">\(n\)</span> observations into two groups.</p> <p><span class="math display">\[ \text{Group 1: }(\hat\beta_{(1)}, \hat s_{(1)}), \ldots, (\hat\beta_{(n - 1)}, \hat s_{(n - 1)}), \text{ with } |\hat\beta_j/\hat s_j| \leq t = |\hat\beta_{(n)}/\hat s_{(n)}| \]</span> <span class="math display">\[ \text{Group 2: }(\hat\beta_{n}, \hat s_{n}), \text{ with } |\hat\beta_{(n)}/\hat s_{(n)}| = t \]</span> Or in other words, it should be related to <code>truncash</code> using the threshold <span class="math inline">\(t = |\hat\beta_{(n)}/\hat s_{(n)}|\)</span>, at least from the likelihood principle point of view.</p> </div> <div id="back-of-the-envelope-calculation" class="section level2"> <h2>Back-of-the-envelope calculation</h2> <p>Suppose <span class="math inline">\(X_1 \sim F_1, X_2\sim F_2, \ldots, X_n \sim F_n\)</span>, with <span class="math inline">\(F_i\)</span> being the cdf of the random variable <span class="math inline">\(X_i\)</span>, with a pdf <span class="math inline">\(f_i\)</span>. In <code>ash</code>’s setting, we can think of <span class="math inline">\(X_i = |\hat\beta_i/ \hat s_i|\)</span>, and <span class="math inline">\(f_i\)</span> is the convolution of a common unimodel distribution <span class="math inline">\(g\)</span> (to be estimated) and the idiosyncratic likelihood of <span class="math inline">\(|\hat\beta_j / \hat s_j|\)</span> given <span class="math inline">\(\hat s_j\)</span> (usually related to normal or Student’s t, but could be generalized to others). Let <span class="math inline">\(X_{(n)}:=\max\{X_1, X_2, \ldots, X_n\}\)</span>, the extreme value of these <span class="math inline">\(n\)</span> random variables.</p> <p><span class="math display">\[ \begin{array}{rl} & P(X_{(n)} \leq t) = \prod_{i = 1}^n F_i(t) \\ \Rightarrow & p_{X_{(n)}}(t) = dP(X_{(n)} \leq t)/dt \neq \prod_{i = 1}^{n-1} F_i(t)f_n(t) \end{array} \]</span> where <span class="math inline">\(\{1, \ldots, n-1\}\)</span> are the index set of less extreme observations and <span class="math inline">\(n\)</span> of the most extreme one. So these two statements are not equivalent.</p> <ol style="list-style-type: decimal"> <li>The largest value in <span class="math inline">\(\{X_1, X_2, \ldots, X_n\}\)</span> is <span class="math inline">\(t\)</span>.</li> <li>We have <span class="math inline">\(n\)</span> random variables and we only observe one; all others are less than it.</li> </ol> </div> <div id="special-case" class="section level2"> <h2>Special case</h2> <p>If we have <span class="math inline">\(F_1 = F_2 = \cdots = F_n\)</span>, the two statements are somehow indeed related because <span class="math display">\[ \begin{array}{rl} & P(X_{(n)} \leq t) = (F(t))^n \\ \Rightarrow & p_{X_{(n)}}(t) = dP(X_{(n)} \leq t)/dt = n(F(t))^{n-1}f(t) \\ \propto & (F(t))^{n-1}f(t)\\ \end{array} \]</span> In other words, we can regard “known the largest observation only” as equivalent to “using the largest observation as the threshold in <code>truncash</code>.”</p> <p><span class="math inline">\(F_1 = F_2 = \cdots = F_n\)</span> in current setting implies that <span class="math inline">\(\hat\beta_j / \hat s_j\)</span> has the same marginal distribution for every observation. Actually it’s not a wild assumption. For example, <a href="t-likelihood.html">we always have</a></p> <p><span class="math display">\[ \hat\beta_j / \hat s_j | \beta_j, s_j, \nu_j \sim t_{\nu_j}(\beta_j / s_j) \]</span> If we further assume</p> <p><span class="math display">\[ \beta_j / s_j \sim g \]</span> then we’ll arrive at the result that <span class="math inline">\(\hat\beta_j / \hat s_j\)</span> has the same marginal distribution. This assumption is essentially the gold standard everybody implicitly makes, refered to as <span class="math inline">\(\alpha = 1\)</span> assumption in <a href="https://doi.org/10.1093/biostatistics/kxw041"><code>ash</code></a>.</p> </div> <div id="session-information" class="section level2"> <h2>Session Information</h2> <pre class="r"><code>sessionInfo()</code></pre> <pre><code>R version 3.3.2 (2016-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: macOS Sierra 10.12.3 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.3.2 [5] htmltools_0.3.5 yaml_2.1.14 Rcpp_0.12.9 stringi_1.1.2 [9] rmarkdown_1.3 knitr_1.15.1 git2r_0.18.0 stringr_1.1.0 [13] digest_0.6.11 workflowr_0.3.0 evaluate_0.10 </code></pre> </div> <hr> <p> This <a href="http://rmarkdown.rstudio.com">R Markdown</a> site was created with <a href="https://github.com/jdblischak/workflowr">workflowr</a> </p> <hr> <!-- To enable disqus, uncomment the section below and provide your disqus_shortname --> <!-- disqus <div id="disqus_thread"></div> <script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'rmarkdown'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript> <a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a> --> </div> </div> </div> <script> // add bootstrap table styles to pandoc tables function bootstrapStylePandocTables() { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); } $(document).ready(function () { bootstrapStylePandocTables(); }); </script> <!-- dynamically load mathjax for compatibility with self-contained --> <script> (function () { var script = document.createElement("script"); script.type = "text/javascript"; script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"; document.getElementsByTagName("head")[0].appendChild(script); })(); </script> </body> </html>