
## Solving the constrained problem on a given active set

The selection property of the constraint is naturally the source of both the benefits and the difficulties of ℓ1 regularization. If this part of the problem is left aside, that is, if the active set of a LASSO solution (which we may refer to as an optimal active set) is assumed to be known, the remaining task reduces to the straightforward computation of minimizing a quadratic function under a linear constraint.

The algorithms presented in this chapter proceed by repeatedly performing this computation for successive candidate active sets. The sequences of active sets are constructed by successive inclusions and removals of one element at a time, following considerations of sufficiency and of necessity/feasibility, respectively.

In this section, we detail this computation of a linearly constrained least-squares problem, from which the notions of feasibility and optimality arise naturally. This step also involves a Lagrange multiplier that leads to the regularized version of the problem and illuminates the relation between these two formulations.
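For the LASSO with a fixed active set and fixed coefficient signs, this computation is an equality-constrained quadratic program that can be solved directly from its KKT system. The following sketch is illustrative only (the function name and interface are our own): it solves min ½‖y − Xw‖² subject to sᵀw = t, where s holds the signs of the active coefficients, and returns both the restricted coefficients and the Lagrange multiplier mentioned above:

```python
import numpy as np

def restricted_lasso_solve(X, y, t, signs):
    """Solve min 0.5 * ||y - X w||^2  subject to  signs^T w = t.

    X holds only the active features; `signs` holds the (fixed) signs of
    their coefficients. The KKT system of this equality-constrained QP is
        [ X^T X   s ] [ w   ]   [ X^T y ]
        [ s^T     0 ] [ lam ] = [   t   ]
    where lam is the Lagrange multiplier of the constraint.
    """
    G = X.T @ X
    K = np.block([[G, signs[:, None]],
                  [signs[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([X.T @ y, [t]])
    sol = np.linalg.solve(K, rhs)
    return sol[:-1], sol[-1]  # restricted coefficients, multiplier
```

The multiplier returned here is the quantity that connects the constrained formulation to the regularized one: at the solution, the gradient of the residual loss balances the multiplier times the sign vector.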

### Optimality of an active set

Let us decompose the optimization problem into the sub-problem of finding an optimal active set and that of computing its coefficient vector. In the following,

• we refer to a problem reduced to a given subset of features as a restricted problem (restricted least-squares (RLS), restricted LASSO), and to the original problem as the full one;

• a coefficient is qualified as feasible if it satisfies constraints that were ignored in its computation and tested afterwards.

#### The active set descent method

The simplex algorithm (Dantzig et al., 1954) for solving linear programs, despite a worst-case exponential complexity, has proven remarkably efficient in practice and easy to implement. The same principles have been applied to quadratic programming with similar results, in different settings and formulations. These algorithms are usually referred to as the family of active set methods. Their main common principle is to solve a minimization problem subject to constraints

by repeatedly solving sub-problems defined by a set of active constraints and updating this set. An exposition of the most general forms of these algorithms can be found in (Nocedal and Wright, 2000).
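This common principle can be sketched as a generic loop (our own skeleton, not code from any particular library; the three callbacks are placeholders for the problem-specific steps): a restricted solve alternates with a one-element update of the active set, activating a constraint to restore feasibility or inactivating one to allow further descent:

```python
def active_set_loop(solve_on, feasible_step, removable_constraint,
                    active, max_iter=100):
    """Generic primal active-set skeleton (a sketch, not a library API).

    solve_on(active)              -> minimizer of the sub-problem where the
                                     constraints in `active` hold with equality
    feasible_step(x, active)      -> index of a constraint violated at x
                                     (a blocking constraint), or None
    removable_constraint(x, active) -> index of an active constraint whose
                                     multiplier has the wrong sign, or None
    """
    for _ in range(max_iter):
        x = solve_on(active)
        blocking = feasible_step(x, active)
        if blocking is not None:
            active.add(blocking)          # activation: restore feasibility
            continue
        j = removable_constraint(x, active)
        if j is None:
            return x                      # feasible and no descent possible
        active.remove(j)                  # inactivation: allow descent
    raise RuntimeError("active set method did not converge")
```

Activation corresponds to the feasibility test and inactivation to the necessity (multiplier-sign) test discussed throughout this chapter.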

It should first be clarified that the term active set refers here to the more general concept of constraint activity, as opposed to variable/feature activity.

However, these concepts usually coincide in practice, as will be seen below. Regarding the plural form in methods, it usually refers, as in (Nocedal and Wright, 2000), to the derivation of the same method for different settings, such as convex or indefinite QP, or for different formulations (primal, dual, primal-dual).

It is also legitimate to consider the two other algorithms studied here (homotopy and cyclical coordinate descent) as active set methods, since they too update a set of active features/constraints; there is, at least, a possibility of confusion about which methods may be included in this category.

We focus here on the general primal active set method for convex programming. We start by giving its most simple and general form, that is, for convex but not necessarily quadratic programming. Surprisingly, no such exposition was found in the existing literature; although it may not be of direct practical use, it gives the core of the algorithm, which makes it easier to understand and derive. We then specify it for the case of convex quadratic programs, without covering all aspects in detail, and present its application to the more specific and simple case of the LASSO. In order to clarify the taxonomy of algorithms, we refer to this primal method for convex programming as the Active Set Descent (ASD) method.

**The active set descent method for convex programming**

The general idea of the method is to follow a path on the surface of the constraint polytope that corresponds to a sequence of adjacent active sets (a constraint is activated or inactivated at each step), and along which the objective function decreases monotonically to its minimum. The changes in the active set are determined by considerations of sufficiency and necessity of the active set.

Let us first define the problem using, for clarity, the classical notations of optimization, in which x, or its components xi, are the variables over which a function is optimized; they will correspond to the coefficients when applied to regression, and should not be confused with the predictors. Let f be a strictly convex function over Rm. We consider the problem of minimizing f under linear inequality and equality constraints.
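Written out explicitly (a generic statement; the index sets $\mathcal{I}$, $\mathcal{E}$ and the vectors $a_i, b_i, c_j, d_j$ are our notation for the constraints just described), the problem reads:

```latex
\min_{x \in \mathbb{R}^m} f(x)
\quad \text{subject to} \quad
a_i^\top x \le b_i \;\; (i \in \mathcal{I}),
\qquad
c_j^\top x = d_j \;\; (j \in \mathcal{E}),
```

with $f$ strictly convex. In this standard notation, the active set at a point $x$ is the set of inequality constraints satisfied with equality, $\{\, i \in \mathcal{I} : a_i^\top x = b_i \,\}$.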

**Case of a differentiable objective function**

A practical implementation of the method is made easier by the differentiability of the objective function f and by the reformulation of the problem in a canonical form.

The differentiability of f, or more precisely its sub-differentiability, makes it possible to determine whether inactivating a constraint can decrease f, and is of great help for the step of minimizing f under the active constraints.

The reformulation consists in associating an additional slack variable with each inequality constraint, so that each constraint is equivalent to its associated variable being positive, and then rewriting x as a (linear) function of these new variables, so that each variable is associated with a (positivity) constraint. Up to some maintenance of the rewriting of x, the problem is then equivalent to one in canonical form.
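A minimal numerical illustration of the slack-variable rewriting (our own example; the matrix A, vector b, and test points are arbitrary): each inequality $a_i^\top x \le b_i$ receives a slack $s_i = b_i - a_i^\top x$, and feasibility of x becomes positivity of s:

```python
import numpy as np

def slacks(A, b, x):
    """Slack variables s = b - A x; x is feasible iff all entries of s are >= 0."""
    return b - A @ np.asarray(x, dtype=float)

# Two inequality constraints: x1 + 2*x2 <= 4  and  -x1 + x2 <= 1.
A = np.array([[1.0, 2.0],
              [-1.0, 1.0]])
b = np.array([4.0, 1.0])
```

A constraint is active exactly when its slack variable is zero, which is why constraint activity and (slack-)variable activity coincide, as noted earlier in this section.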

**Table of contents :**

**1 The LASSO**

1.1 Regression

1.1.1 Estimating the conditional mean

1.1.2 Practical models

1.1.3 Linear regression

1.1.4 Loss functions

1.1.5 Regularization

1.2 The Least Absolute Shrinkage Operator

1.2.1 Algorithms for solving the LASSO

1.2.2 Extended definition

1.2.3 LASSO programming

**2 Active Set Algorithms**

2.1 Solving the constrained problem on a given active set

2.1.1 Optimality of an active set

2.1.2 The computation step

2.2 The active set descent method

2.2.1 The active set descent method for convex programming

2.2.2 Active set descent for the LASSO

2.2.3 Active set descent for the regularized LASSO

2.3 The homotopy method

2.3.1 Computing the first unfeasibility point

2.3.2 Computing the first insufficiency point

2.3.3 Jumping to the next break point

2.4 Complexity

2.4.1 Sequential least-squares

2.4.2 Cholesky decomposition

2.4.3 Cholesky rank one update

2.4.4 Feature addition

2.4.5 Feature subtraction

2.5 The coordinate descent method

2.5.1 The one-dimensional LASSO

2.6 Experiments

2.6.1 Illustration of LASSO estimators

2.6.2 Speed trials

2.6.3 LASSO path and descent paths

**3 Beyond the Simple LASSO**

3.1 Degeneracies

3.1.1 Multiple solutions

3.1.2 Simultaneous status changes

3.2 Very large feature sets

3.2.1 Non-countable sets

3.2.2 Practical optimization

3.3 Sequential Learning

3.3.1 Online Learning

3.3.2 Non-stationary problems

3.3.3 Reinforcement Learning

3.4 Alternative loss functions

3.4.1 Alternative penalties

3.4.2 Alternative residual losses

3.5 ECON Experiments

3.5.1 ECON with homotopy and adaptive penalization

3.5.2 ECON with ASD

**Bibliography**