A.1 Lagrangians and all that

Most kernel-based algorithms fall into two classes: they either use spectral techniques or convex optimization techniques to solve the problem. Here we will discuss convex optimization. A constrained optimization problem can be expressed as follows,

\[
\begin{aligned}
\text{minimize}_x \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0 \quad \forall i \\
& h_j(x) = 0 \quad \forall j
\end{aligned}
\tag{A.1}
\]

That is, we have both inequality constraints and equality constraints. We now write the primal Lagrangian of this problem, which will be helpful in the following development,

\[
L_P(x,\lambda,\nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_j \nu_j h_j(x)
\tag{A.2}
\]

where we will assume in the following that $\lambda_i \ge 0 \;\forall i$. From here we can define the dual Lagrangian by,

\[
L_D(\lambda,\nu) = \inf_x L_P(x,\lambda,\nu)
\tag{A.3}
\]

This objective can actually become $-\infty$ for certain values of its arguments. We will call parameters $\lambda \ge 0,\ \nu$ for which $L_D > -\infty$ dual feasible.
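As a toy illustration (not from the text; introduced here only to make these definitions concrete), consider the one-dimensional problem of minimizing $f_0(x) = x$ subject to the single inequality constraint $f_1(x) = 1 - x \le 0$, whose optimal value is $p^* = 1$ at $x^* = 1$. Its primal and dual Lagrangians are

\[
L_P(x,\lambda) = x + \lambda(1-x) = (1-\lambda)x + \lambda,
\qquad
L_D(\lambda) = \inf_x L_P(x,\lambda) =
\begin{cases}
1 & \lambda = 1,\\
-\infty & \lambda \ne 1,
\end{cases}
\]

so the only dual feasible point is $\lambda = 1$, where $L_D(1) = 1$. We will return to this example below.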


It is important to notice that the dual Lagrangian is a concave function of $\lambda,\nu$, because it is a pointwise infimum of a family of functions that are linear in $\lambda,\nu$.

Hence, even if the primal is not convex, the dual is certainly concave! It is not hard to show that

\[
L_D(\lambda,\nu) \le p^*
\tag{A.4}
\]

where $p^*$ is the primal optimal value. This simply follows because $\sum_i \lambda_i f_i(x^*) + \sum_j \nu_j h_j(x^*) \le 0$ for a primal feasible point $x^*$.
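Written out (a short derivation added for clarity, using only the definitions above), for any dual feasible $(\lambda,\nu)$:

\[
L_D(\lambda,\nu) = \inf_x L_P(x,\lambda,\nu) \;\le\; L_P(x^*,\lambda,\nu)
\;=\; f_0(x^*) + \sum_i \lambda_i f_i(x^*) + \sum_j \nu_j h_j(x^*) \;\le\; f_0(x^*) = p^* .
\]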

Thus, the dual problem always provides a lower bound to the primal problem. The optimal lower bound can be found by solving the dual problem,

\[
\begin{aligned}
\text{maximize}_{\lambda,\nu} \quad & L_D(\lambda,\nu) \\
\text{subject to} \quad & \lambda_i \ge 0 \quad \forall i
\end{aligned}
\tag{A.5}
\]

which is therefore a convex optimization problem. If we call $d^*$ the dual optimal value, we always have $d^* \le p^*$, which is called weak duality; $p^* - d^*$ is called the duality gap. Strong duality holds when $p^* = d^*$. Strong duality is very nice, in particular if we can express the primal solution $x^*$ in terms of the dual solution, because then we can simply solve the dual problem and convert the answer back to the primal domain, knowing that this solution must be optimal. Often the dual problem is easier to solve.
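As a numerical illustration (a minimal sketch, assuming the cvxpy package is available; it is not part of this text), one can solve the toy problem above, read off the optimal dual variable reported by the solver, and check that the duality gap is zero:

```python
# Minimal sketch assuming cvxpy is installed; not part of the original text.
import cvxpy as cp

# Toy convex primal: minimize x  subject to  f_1(x) = 1 - x <= 0  (i.e. x >= 1).
x = cp.Variable()
constraints = [1 - x <= 0]
prob = cp.Problem(cp.Minimize(x), constraints)
p_star = prob.solve()

# The solver also returns the optimal Lagrange multiplier of the constraint.
lam_star = constraints[0].dual_value

print("p* =", p_star)         # expected ~ 1.0
print("x* =", x.value)        # expected ~ 1.0
print("lambda* =", lam_star)  # expected ~ 1.0, so d* = L_D(1) = 1 = p*
```

Since strong duality holds for this problem, the multiplier reported by the solver coincides with the dual optimum $\lambda^* = 1$ found analytically above.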

So when does strong duality hold? Up to some mathematical details, the answer is: if the primal problem is convex and the equality constraints are linear.

This means that $f_0(x)$ and $\{f_i(x)\}$ are convex functions and $h_j(x) = Ax - b$.

The primal problem can be written as follows,

\[
p^* = \inf_x \sup_{\lambda \ge 0,\,\nu} L_P(x,\lambda,\nu)
\tag{A.6}
\]

This can be seen by noting that $\sup_{\lambda \ge 0,\nu} L_P(x,\lambda,\nu) = f_0(x)$ when $x$ is feasible, but $\infty$ otherwise. To see this, first check that by violating one of the constraints you can find a choice of $\lambda,\nu$ that makes the Lagrangian infinite. Also, when all the constraints are satisfied, the best we can do is to make the additional terms zero, which is always possible: for instance, we can simply set all $\lambda,\nu$ to zero, even though this is not necessary if the constraints themselves vanish.
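In symbols (a short worked step added for clarity), for a fixed $x$ each term can be maximized independently:

\[
\sup_{\lambda_i \ge 0} \lambda_i f_i(x) =
\begin{cases} 0 & f_i(x) \le 0,\\ \infty & f_i(x) > 0,\end{cases}
\qquad
\sup_{\nu_j} \nu_j h_j(x) =
\begin{cases} 0 & h_j(x) = 0,\\ \infty & h_j(x) \ne 0,\end{cases}
\]

so indeed $\sup_{\lambda \ge 0,\nu} L_P(x,\lambda,\nu) = f_0(x)$ for feasible $x$ and $\infty$ otherwise.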

The dual problem by definition is given by,

\[
d^* = \sup_{\lambda \ge 0,\,\nu} \inf_x L_P(x,\lambda,\nu)
\tag{A.7}
\]


Hence, the “sup” and “inf” can be interchanged if strong duality holds, so the optimal solution is a saddle point. It is important to realize that the order of maximization and minimization matters for arbitrary functions (but not for functions that are convex in the minimization variable and concave in the maximization variable). Try to imagine a “V”-shaped valley that runs diagonally across the coordinate system. If we first maximize over one direction, keeping the other direction fixed, and then minimize the result, we end up with the lowest point on the rim. If we reverse the order, we end up with the highest point in the valley.
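The diagonal-valley picture can be checked numerically (a small sketch in Python/NumPy, not from the text), using $f(x,y) = |x-y|$ on a grid:

```python
# Small numeric check (not from the text): min-max vs max-min for f(x,y) = |x - y|,
# a "V"-shaped valley running along the diagonal x = y.
import numpy as np

grid = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(grid, grid, indexing="ij")  # X varies along axis 0, Y along axis 1
F = np.abs(X - Y)

min_max = F.max(axis=1).min()  # inf_x sup_y f(x, y): lowest point on the rim
max_min = F.min(axis=0).max()  # sup_y inf_x f(x, y): highest point in the valley floor

print("inf_x sup_y =", min_max)  # 1.0
print("sup_y inf_x =", max_min)  # 0.0 <= 1.0, as the max-min inequality requires
```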

There are a number of important necessary conditions that hold for problems with zero duality gap. These Karush-Kuhn-Tucker (KKT) conditions turn out to be sufficient for convex optimization problems. They are given by,

\[
\begin{aligned}
\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_j \nu_j^* \nabla h_j(x^*) &= 0 \qquad \text{(A.8)}\\
f_i(x^*) &\le 0 \qquad \text{(A.9)}\\
h_j(x^*) &= 0 \qquad \text{(A.10)}\\
\lambda_i^* &\ge 0 \qquad \text{(A.11)}\\
\lambda_i^* f_i(x^*) &= 0 \qquad \text{(A.12)}
\end{aligned}
\]

The first equation is easily derived because we already saw that $x^*$ minimizes $L_P(x,\lambda^*,\nu^*)$ over $x$ (see also Eqn. A.13 below), and hence all the derivatives must vanish. This condition has a nice interpretation as a “balancing of forces”. Imagine a ball rolling down a surface defined by $f_0(x)$ (i.e. you are doing gradient descent to find the minimum). The ball gets blocked by a wall, which is the constraint. If the surface and the constraint are convex, then if the ball doesn’t move we have reached the optimal solution. At that point, the forces on the ball must balance. The first term represents the force of the ball against the wall due to gravity (the ball is still on a slope). The second term represents the reaction force of the wall in the opposite direction. The $\lambda$ represents the magnitude of the reaction force, which needs to be larger if the surface slopes more steeply. We say that this constraint is “active”. Other constraints which do not exert a force are “inactive” and have $\lambda = 0$. The latter statement can be read off from the last KKT condition, which we call “complementary slackness”. It says that either $f_i(x^*) = 0$ (the constraint is saturated and hence active), in which case $\lambda_i$ is free to take on a nonzero value, or the constraint is inactive, $f_i(x^*) < 0$, in which case $\lambda_i$ must vanish. As we will see soon, the active constraints will correspond to the support vectors in SVMs!
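For the toy example, the KKT conditions can be checked directly (a small sketch, using the analytic gradients of this particular problem and the candidate solution found above):

```python
# KKT check for the toy problem: minimize f_0(x) = x subject to f_1(x) = 1 - x <= 0.
# Candidate solution from the dual above: x* = 1, lambda* = 1.
x_star, lam_star = 1.0, 1.0

grad_f0 = 1.0        # d/dx of f_0(x) = x
grad_f1 = -1.0       # d/dx of f_1(x) = 1 - x
f1 = 1.0 - x_star    # constraint value at x*

stationarity   = grad_f0 + lam_star * grad_f1  # should be 0  (A.8)
primal_feas    = f1 <= 0                       # should hold  (A.9)
dual_feas      = lam_star >= 0                 # should hold  (A.11)
comp_slackness = lam_star * f1                 # should be 0  (A.12)

print(stationarity, primal_feas, dual_feas, comp_slackness)  # 0.0 True True 0.0
```

Here the single constraint is active ($f_1(x^*) = 0$), so its multiplier may be nonzero; this is the analogue of a support vector in the SVM setting.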


Complementary slackness is easily derived by,

\[
\begin{aligned}
f_0(x^*) = L_D(\lambda^*,\nu^*) &= \inf_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_j \nu_j^* h_j(x) \right) \\
&\le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_j \nu_j^* h_j(x^*) \qquad \text{(A.13)}\\
&\le f_0(x^*) \qquad \text{(A.14)}
\end{aligned}
\]

where the first line follows from strong duality (zero duality gap), the second because the infimum over $x$ is no larger than the value at $x = x^*$, and the last because $f_i(x^*) \le 0$, $\lambda_i^* \ge 0$ and $h_j(x^*) = 0$. Hence all inequalities are in fact equalities, and since each term $\lambda_i^* f_i(x^*)$ is non-positive, each term must vanish separately, i.e. $\lambda_i^* f_i(x^*) = 0$.
