\documentclass[14pt]{extarticle}
\usepackage{hyperref}
\usepackage{wrapfig}
\usepackage{graphicx}
\usepackage{Sweave}
\begin{document}
\title{Reading group discussion of Wainwright's ``Sharp thresholds''}
\author{Dean P Foster}
\maketitle

\section{Review of regression}

Facts one should know about doing regression:
\begin{itemize}
\item Fact: adding $p \ge n$ variables to an $n$-observation regression will generate a perfect fit.
\item Heuristic: adding one (noise) variable improves $R^2$ by about $1/n$ on average.
\item Fact: $|\Phi^{-1}(1/p)| \approx \sqrt{2 \log p}$.
\item Fact: $\Delta R^2 = t^2/n$, where $t$ is the $t$-statistic of the added variable.
\item Heuristic: the best feature out of $p$ improves $R^2$ by about $2\log(p)/n$ (combine the two facts above: the best of $p$ null $t$-statistics is about $\sqrt{2\log p}$).
\item Fact: random vectors in $\Re^n$ are typically nearly orthogonal.
\item Heuristic: one can ``hope'' for approximate orthogonality for up to $p \approx 2^n$ vectors. Clearly exact orthogonality is only possible for at most $n$ of them.
\end{itemize}

\section{Implications}
\begin{itemize}
\item If the model we want has dimension $k$ bigger than $O(n/\log p)$, there will be a purely random model that fits better with fewer variables.
\item This is what is called a threshold.
\item Needing to search over models might make this worse.
\end{itemize}

\section{Now on to Wainwright's paper}

\section{Introduction: L0 vs L1}
\begin{itemize}
\item Goal: low-dimensional recovery.
\item Solution: a good fit with few variables.
\item Problem: NP-hard.
\item Approximation: relax $L_0$ to $L_1$.
\item This is equations (3) and (4).
\end{itemize}

\section{Previous work}
\begin{itemize}
\item Equation (5): amazingly, the regression heuristics above help here.
\item If you can get close using random vectors, you can make the fit exact by adding in $n$ more small vectors.
\end{itemize}

\section{Gaussian model}
\begin{itemize}
\item $X_{ij} \sim N(0,1)$.
\item A good way of generating almost orthogonal vectors.
\end{itemize}

\section{Our contribution}
\begin{itemize}
\item Equation (6): crude recovery of the $\beta$'s, not just the subspace.
\end{itemize}

\section{Figure 1}
\begin{itemize}
\item Note: the transition gets sharper and sharper on the rescaled axis. So only the ``mean'' (the location of the threshold) is right, not the ``SD'' (the width of the transition).
\end{itemize}

\section{Section II: Primal-dual witness construction}

It constructs properties that must hold around the optimum point. We have enough statistics to cover that I am going to skip the optimization details.

\section{Section III: Deterministic designs}

Equations (14a) and (15):
\begin{itemize}
\item Recall $\hat{\beta} = (X'X)^{-1} X'Y$ for least squares regression.
\item Equation (15) then says: regress each left-out variable on the variables in the model; the absolute values of the resulting coefficients must sum to less than 1.
\item Totally weird as far as $L_2$ matrix concepts go.
\item Problem: what if the incoherence parameter $\gamma < 0$?
\begin{itemize}
\item Then the shadow is larger than the object (think Star Wars Episode 1).
\item $X_{fake} = 2 X_{real}$ doesn't work, since the fake is now the same as the real variable.
\item $X_{fake} = 2 X_{real} + \mbox{noise}$: this shadow is dangerous. It gets you just as close at a lower cost in $\|\beta\|_1$.
\end{itemize}
\end{itemize}

Theorem 1:
\begin{itemize}
\item Equation (15) keeps the shadows away.
\item Equation (16) keeps the truth from being too collinear.
\item Equation (17) keeps the impostors out of the equation.
\end{itemize}
Then statement (a) says we don't overfit. Statement (b) requires a strong signal, and it then says we don't underfit.

\section{Necessary conditions}

Theorem 2:
\begin{itemize}
\item Equation (19) says there is a long shadow over a $\beta$ we care about.
\item Equation (20) then says the lasso will use that shadow variable instead.
\end{itemize}
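The ``shadow'' picture can be checked numerically. Below is a minimal sketch (my own illustration in Python, not code from the paper; the choices of $n$, $p$, and $k$ and all variable names are mine) of the quantity behind the incoherence condition of equation (15), as described above, for a Gaussian design: regress each left-out column on the true columns and look at the largest $L_1$ norm of the fitted coefficients. When $n$ is comfortably larger than $k \log p$, this ``worst shadow'' typically comes out below 1, i.e., $\gamma > 0$.

\begin{verbatim}
import numpy as np

# Sketch: check the incoherence condition of equation (15) on a random
# Gaussian design.  For every left-out column j, regress X_j on the "true"
# columns X_S and ask that the absolute values of its coefficients sum to < 1.
rng = np.random.default_rng(0)
n, p, k = 200, 500, 5             # observations, variables, true support size

X = rng.standard_normal((n, p))
X_S, X_rest = X[:, :k], X[:, k:]  # take the first k columns as the true support

# Columns of (X_S' X_S)^{-1} X_S' X_rest are the regression coefficients
# of each left-out variable on the in-model variables.
coefs = np.linalg.solve(X_S.T @ X_S, X_S.T @ X_rest)     # shape (k, p - k)

worst_shadow = np.abs(coefs).sum(axis=0).max()
print("largest shadow (L1 norm):", worst_shadow)
print("incoherence parameter gamma:", 1 - worst_shadow)  # want gamma > 0
\end{verbatim}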
\section{Skip to the bottom of page 2192}

Typical oracle results have a $\log(p)$ term in them. That holds here as well.

\section{Skip to Figure 2}

Same shape, same ideas. Notice that the curves get steeper as the sample size increases, so he is only capturing the crossing point, not the shape of the transition.
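That crossing point can be seen in a small simulation. The following is a minimal sketch (mine, not the paper's code; the constant in the choice of $\lambda$, the noise level, and the signal size are guesses) of the kind of experiment behind Figures 1 and 2, using scikit-learn's lasso: sweep the rescaled sample size $\theta = n / (2 k \log(p-k))$ and record how often the lasso recovers the exact signed support. The success rate should climb from near 0 to near 1 around $\theta \approx 1$, and do so more sharply as $p$ grows.

\begin{verbatim}
import numpy as np
from sklearn.linear_model import Lasso

def support_recovery_rate(p, k, theta, n_trials=50, sigma=0.5, beta_min=1.0):
    """Fraction of trials in which the lasso recovers the exact signed support."""
    rng = np.random.default_rng(0)
    n = max(int(theta * 2 * k * np.log(p - k)), k + 1)  # rescaled sample size
    lam = 2 * sigma * np.sqrt(np.log(p) / n)            # lambda ~ sqrt(log p / n)
    successes = 0
    for _ in range(n_trials):
        X = rng.standard_normal((n, p))                 # Gaussian design
        beta = np.zeros(p)
        beta[:k] = beta_min * rng.choice([-1.0, 1.0], size=k)
        y = X @ beta + sigma * rng.standard_normal(n)
        fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y)
        successes += np.array_equal(np.sign(fit.coef_), np.sign(beta))
    return successes / n_trials

for p in (128, 256):
    rates = [support_recovery_rate(p, k=4, theta=t) for t in (0.6, 1.0, 1.4, 1.8)]
    print(p, [round(r, 2) for r in rates])
\end{verbatim}

\end{document}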