We introduce a novel method for sparse regression and variable
selection, which is inspired by modern ideas in multiple
testing. Imagine we have observations from the linear model $y = X\beta + z$, then we suggest estimating the regression coefficients
by means of a new estimator called SLOPE, which is the solution to
$$\min_{b \in \mathbb{R}^p} \; \tfrac{1}{2}\|y - Xb\|_{\ell_2}^2 + \lambda_1 |b|_{(1)} + \lambda_2 |b|_{(2)} + \cdots + \lambda_p |b|_{(p)};$$
here, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and $|b|_{(1)} \ge |b|_{(2)} \ge \cdots \ge |b|_{(p)}$ is the order
statistic of the magnitudes of $b$. In short, the regularizer is a
sorted $\ell_1$ norm which penalizes the regression coefficients
according to their rank: the higher the rank — the closer to the
top — the larger the penalty. This is similar to the famous
Benjamini-Hochberg procedure (BHq) [1], which compares the
value of a test statistic taken from a family to a critical threshold
that depends on its rank in the family. SLOPE is a
convex program and we demonstrate an efficient algorithm for computing
the solution. We prove that for orthogonal designs with $p$ variables,
taking $\lambda_i = F^{-1}(1 - q_i)$ ($F$ is the cumulative distribution
function of the errors), $q_i = iq/(2p)$, controls the false discovery
rate (FDR) for variable selection. This holds under the assumption
that the errors are i.i.d. symmetric and continuous random
variables. When the design matrix is nonorthogonal there are inherent
limitations on the FDR level and the power which can be obtained with
model selection methods based on $\ell_1$-like penalties. However,
whenever the columns of the design matrix are not strongly correlated,
we demonstrate empirically that it is possible to select the
parameters $\lambda_i$ so as to obtain FDR control at a reasonable level
as long as the number of nonzero coefficients is not too large. At the
same time, the procedure exhibits increased power over the lasso,
which treats all coefficients equally. The paper illustrates further
estimation properties of the new selection rule through comprehensive
simulation studies.
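
To make the regularizing sequence concrete, here is a minimal Python sketch (not the authors' reference implementation; it assumes numpy and scipy, and uses the standard normal quantile as $F^{-1}$, i.e. $N(0,1)$ errors) that builds the BH-type values $\lambda_i = F^{-1}(1 - iq/(2p))$ and evaluates the sorted $\ell_1$ penalty.

```python
import numpy as np
from scipy.stats import norm

def bh_lambdas(p, q=0.10):
    """BH-type sequence lambda_i = F^{-1}(1 - i*q/(2p)).

    F^{-1} is taken here to be the standard normal quantile (N(0, 1)
    errors); substitute another quantile function for other error laws.
    """
    i = np.arange(1, p + 1)
    return norm.ppf(1.0 - i * q / (2.0 * p))      # nonincreasing in i

def sorted_l1_norm(b, lam):
    """Sorted l1 norm: sum_i lam_i * |b|_(i), with |b|_(1) >= ... >= |b|_(p),
    so the largest coefficient in magnitude receives the largest penalty."""
    return float(np.sum(np.sort(np.abs(b))[::-1] * lam))
```

For instance, with $p = 5$ and $q = 0.10$ this yields $\lambda_1 = \Phi^{-1}(0.99) \approx 2.33$ down to $\lambda_5 = \Phi^{-1}(0.95) \approx 1.64$, the same critical values BHq would use for five two-sided tests.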
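
The abstract also states that SLOPE is a convex program with an efficient algorithm for its solution. The sketch below is one plausible way to realize this, not necessarily the paper's solver: it computes the proximal operator of the sorted $\ell_1$ norm with a stack-based pool-adjacent-violators pass and plugs it into a plain proximal-gradient loop. The names `prox_sorted_l1`, `slope`, and `n_iter` are hypothetical.

```python
import numpy as np

def prox_sorted_l1(y, lam):
    """Prox of the sorted l1 norm: argmin_b 0.5*||b - y||^2 + sum_i lam_i*|b|_(i),
    for a nonincreasing, nonnegative lam. Fit |y|_(i) - lam_i by a nonincreasing
    sequence (pool adjacent violators), clip at zero, undo the sort, restore signs."""
    sign = np.sign(y)
    order = np.argsort(np.abs(y))[::-1]            # indices of |y| in decreasing order
    w = np.abs(y)[order] - lam
    blocks = []                                    # [start, end, total, count], averages nonincreasing
    for i, wi in enumerate(w):
        blocks.append([i, i, wi, 1.0])
        # merge while the previous block's average does not exceed the current one's
        while len(blocks) > 1 and blocks[-2][2] / blocks[-2][3] <= blocks[-1][2] / blocks[-1][3]:
            _, e, t, c = blocks.pop()
            blocks[-1][1] = e
            blocks[-1][2] += t
            blocks[-1][3] += c
    x = np.zeros_like(w)
    for s, e, t, c in blocks:
        x[int(s):int(e) + 1] = max(t / c, 0.0)     # positive part of the isotonic fit
    b = np.zeros_like(y, dtype=float)
    b[order] = x                                   # undo the sort
    return sign * b

def slope(X, y, lam, n_iter=1000):
    """Plain proximal-gradient iteration for
    minimize_b 0.5*||y - X b||^2 + sum_i lam_i * |b|_(i)."""
    L = np.linalg.norm(X, 2) ** 2                  # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = prox_sorted_l1(b - grad / L, lam / L)  # scaled penalty inside the prox step
    return b
```

The design point worth noting is that the prox acts on the sorted magnitudes: after subtracting the $\lambda$'s, any locally increasing stretch is averaged out, the result is clipped at zero, and the original order and signs are restored, which is what makes a fast solver possible despite the non-separable penalty.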
[1] Y. Benjamini and Y. Hochberg. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300, 1995