Statistical Inference

By John Hallman - April 6, 2021

This week we discuss Model-X Knockoffs by Emmanuel Candes and company, an extension of the earlier Fixed-X Knockoffs paper, which we covered a few months back.

**Materials**

- Model-X Knockoffs
- Original Fixed-X Knockoffs (for context)

**Why Model-X Knockoff filters?**

- Knockoffs is a family of new and exciting approaches for converting *any* variable selection procedure into one that controls the false discovery rate (FDR).
- The initial Fixed-X (FX) Knockoffs approach only dealt with linear systems and Gaussian noise, however.
- Model-X (MX) Knockoffs extends this to arbitrary joint distributions $P(X_1, \ldots, X_p, Y)$ with relatively little extra work, by modifying the assumptions and requirements placed on the knockoff variables and the variable scoring procedures.
- The above is crucial. FX alone doesn't cover Sisu's use cases since business metrics involve not just scalar values but also categorical values.
- The biggest limitation of this paper is that it doesn't cover *how* to generate good MX knockoffs. It merely states the requirements for MX to provide FDR control.
- That said, later papers address this limitation. Stay tuned for more in future reading groups...

**Nuggets**

- There are 3 components to MX Knockoffs, as with FX, although with slightly different assumptions and results: (1) the knockoff generation procedure, (2) the feature scoring procedure, and (3) the threshold selection procedure.
(1) Knockoff generation: for MX Knockoffs to work, the knockoff variables must be generated such that the *exchangeability property* and *independence* hold w.r.t. the original and the knockoff variables:

$[X, \tilde{X}] \overset{d}{=} [X, \tilde{X}]_{swap(S)}$

$\tilde{X} \perp y \: | \: X$

Here $swap(S)$ refers to switching columns between the original and the knockoff variables for each index $j \in S$, and the equality in distribution must hold for any subset $S \subseteq \{1, \ldots, p\}$. Note that the paper says little about how to generate an $\tilde{X}$ that satisfies both conditions, which is easier said than done.
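
To make the swap operation concrete, here is a minimal sketch in one special case where a closed-form construction is known: rows of $X$ drawn from a zero-mean Gaussian $N(0, \Sigma)$ with $\Sigma$ known. The function names `swap_columns` and `gaussian_knockoffs`, the equicorrelated $\Sigma$, and the choice $s_j = 0.5$ are all illustrative assumptions, not something taken from the post. The check at the end only compares second moments before and after a swap, so it is a sanity check rather than a proof of exchangeability.

```python
import numpy as np


def swap_columns(X, X_tilde, S):
    """Build [X, X_tilde]_{swap(S)}: exchange column j of X and X_tilde for each j in S."""
    idx = np.asarray(list(S), dtype=int)
    X_s, Xt_s = X.copy(), X_tilde.copy()
    X_s[:, idx] = X_tilde[:, idx]
    Xt_s[:, idx] = X[:, idx]
    return np.hstack([X_s, Xt_s])


def gaussian_knockoffs(X, Sigma, s, rng):
    """Sample X_tilde | X, assuming each row of X is N(0, Sigma) and D = diag(s).

    The target joint covariance of [X, X_tilde] is
        [[Sigma,     Sigma - D],
         [Sigma - D, Sigma    ]],
    which is unchanged when any coordinate pair (j, j + p) is swapped.
    y is never used, so X_tilde is independent of y given X by construction.
    """
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)   # Sigma^{-1} D
    cond_mean = X - X @ Sigma_inv_D           # E[X_tilde | X], one row per sample
    cond_cov = 2 * D - D @ Sigma_inv_D        # Cov[X_tilde | X]
    L = np.linalg.cholesky(cond_cov)          # requires cond_cov to be positive definite
    return cond_mean + rng.standard_normal(X.shape) @ L.T


# Illustrative setup (assumed values): equicorrelated features.
n, p, rho = 200_000, 4, 0.5
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
s = np.full(p, 0.5)  # must keep 2*diag(s) - diag(s) Sigma^{-1} diag(s) positive definite

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
X_tilde = gaussian_knockoffs(X, Sigma, s, rng)

# Swap a subset of columns and compare empirical covariances of the joint matrix.
S = [0, 2]
cov_joint = np.cov(np.hstack([X, X_tilde]), rowvar=False)
cov_swapped = np.cov(swap_columns(X, X_tilde, S), rowvar=False)
max_gap = np.abs(cov_joint - cov_swapped).max()
print("largest covariance gap after swap:", max_gap)
```

The gap should shrink toward zero as `n` grows. Outside the Gaussian case, matching covariances alone would not establish exchangeability, which is part of why knockoff generation is hard in general.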

(2) Feature scoring: for MX Knockoffs to work, the feature scoring procedures $w_j : (X, \tilde{X}, y) \rightarrow \mathbb{R}$