Author: Barak Oshri

This week we discussed Stability Selection, a very general finite-sample error control technique for structure estimation.

• Paper by Nicolai Meinshausen & Peter Buehlmann

The main result we'll drive towards is an error control bound for the expected number of false discoveries.

## Introduction

In general, we have a task modeled by a $p$-dimensional vector $\beta$ that is sparse: only $s < p$ of its components are non-zero.

We want a procedure that separates the set of non-vanishing (signal) coefficients $S = \{k : \beta_k \neq 0\}$ from the set of vanishing (noise) coefficients $N = \{k : \beta_k = 0\}$. The goal of structure estimation is to infer the set $S$.

### Structure estimation: Regression

In regression, structure estimation amounts to variable selection: identifying the non-zero components of the coefficient vector $\beta$ in a linear model

$$Y = X\beta + \epsilon$$

where $Y \in \mathbb{R}^n$ is the vector of observations, $X$ is an $n \times p$ design matrix, and $\epsilon \in \mathbb{R}^n$ has iid noise components.

There is already a suite of well-studied methods for the variable selection problem above (for example, the lasso). The aim of stability selection is not to propose a new variable selection technique but to improve existing ones.
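As a concrete baseline, here is a minimal sketch of lasso-based variable selection on simulated sparse data. The lasso is implemented from scratch by coordinate descent to keep the example self-contained; the data dimensions, coefficient values, and penalty `lam` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator, the building block of lasso coordinate descent."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/2n)||y - X b||^2 + lam ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with coordinate j removed from the fit.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Illustrative sparse linear model: s = 3 signal variables out of p = 20.
rng = np.random.default_rng(0)
n, p, s = 100, 20, 3
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Structure estimate: the support of the lasso solution.
S_hat = set(np.flatnonzero(lasso_cd(X, y, lam=0.1)))
```

With a strong signal and light noise, `S_hat` recovers the true support; stability selection addresses the harder regimes where a single lasso fit does not.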

### Stability Paths

Generically, our task comes with a tuning parameter $\lambda \in \Lambda \subseteq \mathbb{R}^+$. For every $\lambda$, we get a structure estimate $\hat S^\lambda \subseteq \{1, ..., p\}$.
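To make the family $\{\hat S^\lambda : \lambda \in \Lambda\}$ concrete, the sketch below uses a simple correlation-screening estimator as a stand-in for any selection procedure: $\hat S^\lambda = \{k : |X_k^\top y|/n \geq \lambda\}$. The data and the grid of $\lambda$ values are illustrative assumptions.

```python
import numpy as np

# Illustrative sparse linear model with two signal variables.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 1]] = [2.0, -1.5]
y = X @ beta + rng.standard_normal(n)

def S_hat(lam):
    """Correlation-screening structure estimate: keep k with |<X_k, y>|/n >= lam."""
    return set(np.flatnonzero(np.abs(X.T @ y) / n >= lam))

# One structure estimate per tuning parameter on the grid.
grid = [0.1, 0.5, 1.0, 2.5]
estimates = {lam: S_hat(lam) for lam in grid}
```

For this estimator the sets are nested: raising $\lambda$ only removes variables, so tracing $\hat S^\lambda$ across the grid shows how the estimate sweeps from liberal to conservative.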