### Lagrange multiplier approach with inequality constraints

We have previously explored the method of Lagrange multipliers for finding local minima or maxima of a function subject to equality constraints. The same strategy applies to inequality constraints as well. In this guide, you will learn how the method of Lagrange multipliers can identify the local minimum or maximum of a function subject to inequality constraints, optionally combined with equality constraints.

After completing this guide, you will know:

- How to identify points of local maximum or minimum of a function with inequality constraints
- Method of Lagrange multipliers with equality and inequality constraints

__Prerequisites__

For this guide, it is assumed that you already have knowledge of:

- Derivatives of functions
- Function of several variables, partial derivatives and gradient vectors
- A gentle intro to optimization
- Gradient descent

__Constrained Optimization and Lagrangians__

A constrained optimization problem is typically written as:

min f(X) subject to g(X) = 0, h(X) ≥ 0, k(X) ≤ 0

where X is a scalar or a vector of values. Here, g(X) = 0 is an equality constraint, while h(X) ≥ 0 and k(X) ≤ 0 are inequality constraints. Note that we always use ≥ and ≤ rather than > and < in optimization problems, because the former define a closed set in mathematics from which we look for the value of X. There can be several constraints of each kind in an optimization problem.

The equality constraints are simple to handle, but the inequality constraints are not. Therefore, one way to make them easier to tackle is to convert the inequalities into equalities by introducing slack variables:

h(X) − s² = 0 and k(X) + t² = 0

When something is negative, adding a certain positive quantity to it makes it equal to zero, and vice versa. That quantity is the slack variable; the s² and t² above are instances. We deliberately write the slack variables as s² and t² to emphasize that they cannot be negative. With the slack variables introduced, we can use the Lagrange multiplier approach to solve the problem, in which the Lagrangian is defined as:

L(X, λ, θ, φ) = f(X) + λg(X) + θ(h(X) − s²) + φ(k(X) + t²)

It is useful to know that, for the optimal solution X* to the problem, each inequality constraint either holds with equality (in which case its slack variable is zero) or it does not. Inequality constraints that hold with equality are called active constraints; otherwise, they are inactive constraints. In this sense, you can consider the equality constraints to be always active.
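To make the slack-variable trick concrete, here is a minimal sketch using SymPy on a hypothetical toy problem (minimize (x − 2)² subject to x ≤ 1, not taken from the text):

```python
import sympy as sp

# Toy problem (assumed for illustration): minimize f(x) = (x - 2)^2
# subject to x <= 1. Introduce a slack variable t so that the
# inequality becomes the equality x - 1 + t^2 = 0.
x, t, phi = sp.symbols('x t phi', real=True)

f = (x - 2)**2
L = f + phi * (x - 1 + t**2)  # Lagrangian with the slack variable

# Stationarity: every partial derivative of L must vanish
eqs = [sp.diff(L, v) for v in (x, t, phi)]
solutions = sp.solve(eqs, [x, t, phi], dict=True)

# The real solution has t = 0 (constraint active): x = 1, phi = 2
print(solutions)
```

Here the constraint comes out active (t = 0), matching the intuition that the unconstrained minimum x = 2 lies outside the feasible region.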

__The Complementary Slackness Condition__

The reason we need to know whether a constraint is active or not is the Karush-Kuhn-Tucker (KKT) conditions. Precisely, the KKT conditions describe what happens when X* is the optimal solution to a constrained optimization problem:

1. The gradient of the Lagrangian function is zero
2. All constraints are satisfied
3. The inequality constraints satisfy the complementary slackness condition

The most important of these is the complementary slackness condition. While we know that an optimization problem with equality constraints can be solved using Lagrange multipliers, with the gradient of the Lagrangian being zero at the optimal solution, the complementary slackness condition extends this to inequality constraints: at the optimal solution X*, either the Lagrange multiplier is zero or the corresponding inequality constraint is active.

The use of the complementary slackness condition is to help us explore the different cases in solving the optimization problem. It is best explained with an example.
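As a warm-up before the examples, here is a small sketch of that case analysis in Python, on a hypothetical toy problem (minimize x² + y² subject to x + y ≥ 1) chosen purely for illustration:

```python
# Toy problem (assumed for illustration): minimize x^2 + y^2
# subject to x + y >= 1. Complementary slackness says that at the
# optimum either the multiplier is zero or the constraint is active.

# Case 1: multiplier is zero -> solve the unconstrained problem.
x1, y1 = 0.0, 0.0                  # gradient of x^2 + y^2 vanishes at the origin
case1_feasible = (x1 + y1 >= 1)    # violates x + y >= 1, so reject this case

# Case 2: constraint is active -> minimize x^2 + y^2 on the line x + y = 1.
# Substituting y = 1 - x, the derivative 2x - 2(1 - x) = 0 gives x = 1/2.
x2, y2 = 0.5, 0.5

print(case1_feasible)  # False: only the active-constraint case survives
print((x2, y2))        # the constrained minimum
```

Each case yields a candidate; infeasible candidates are discarded, and the best feasible one is the answer. The examples below follow exactly this pattern.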

__Example 1: Mean-variance portfolio optimization__

This is an example from finance. Suppose we have one dollar to put into two different investments, whose returns are modeled as a bivariate Gaussian distribution. How much should we invest in each to minimize the overall variance of the return?

This optimization problem, also known as Markowitz mean-variance portfolio optimization, is formulated as:

min f(w1, w2) = w1²σ1² + w2²σ2² + 2w1w2σ12

subject to w1 + w2 = 1, 0 ≤ w1 ≤ 1, 0 ≤ w2 ≤ 1

where σ1² and σ2² are the variances of the two returns and σ12 is their covariance. The final two constraints bound the weight of each investment to between zero and one dollar. Assuming specific values for σ1², σ2², and σ12, the Lagrangian function is defined as:

And we have the gradients of the Lagrangian, each of which must vanish at the optimal solution.
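Since the text leaves the variance and covariance values unspecified, here is a numeric sketch of this problem using SciPy, with an assumed covariance matrix chosen purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed covariance of the two investment returns (hypothetical values)
cov = np.array([[0.25, 0.05],
                [0.05, 0.10]])

def variance(w):
    # Portfolio variance w^T C w for weights w = (w1, w2)
    return w @ cov @ w

# w1 + w2 = 1, with each weight bounded between 0 and 1
constraints = [{'type': 'eq', 'fun': lambda w: w.sum() - 1}]
bounds = [(0.0, 1.0), (0.0, 1.0)]

res = minimize(variance, x0=np.array([0.5, 0.5]),
               bounds=bounds, constraints=constraints)
print(res.x)  # for this covariance, roughly (0.2, 0.8)
```

For this particular covariance, the solver puts more weight on the lower-variance investment, as expected.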

__Example 2: Water-filling__

Assume we have a battery that can provide only 1 watt of power, and this power has to be distributed across k channels (with powers denoted p1, …, pk). Each channel may have a different attenuation, so the signal power is discounted by a gain gi on each channel. The maximum total capacity we can achieve using these k channels is then formulated as the optimization problem:

max f(p1, …, pk) = log2(1 + g1p1/n1) + … + log2(1 + gkpk/nk)

subject to p1 + … + pk = 1, with p1, …, pk ≥ 0

where ni is the noise level of channel i.

For convenience of differentiation, we note that log2 x = log x / log 2, and that maximizing log(1 + gipi/ni) is equivalent to maximizing log(ni + gipi), since the two differ only by the constant log ni.
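This equivalence is easy to check numerically; a quick sketch with hypothetical values for ni and gi:

```python
import math

# The two objectives differ only by the constant log(n_i), so they
# rank every power allocation identically (values here are hypothetical)
n_i, g_i = 0.9, 0.8
for p in (0.1, 0.4, 0.7):
    lhs = math.log(1 + g_i * p / n_i)
    rhs = math.log(n_i + g_i * p) - math.log(n_i)
    assert abs(lhs - rhs) < 1e-12
```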

Under the assumption that we have k = 3 channels, with noise levels 1.0, 0.9, 1.0 respectively and channel gains 0.9, 0.8, 0.7, the optimization problem is:

max f(p1, p2, p3) = log(1 + 0.9p1) + log(0.9 + 0.8p2) + log(1 + 0.7p3)

subject to p1 + p2 + p3 = 1

p1, p2, p3 ≥ 0

We have three inequality constraints here, plus one equality constraint. With slack variables s1, s2, s3, the Lagrangian function is defined as:

L(p1, p2, p3, λ, θ1, θ2, θ3) = log(1 + 0.9p1) + log(0.9 + 0.8p2) + log(1 + 0.7p3) − λ(p1 + p2 + p3 − 1) + θ1(p1 − s1²) + θ2(p2 − s2²) + θ3(p3 − s3²)

The gradient is thus:

∂L/∂p1 = 0.9/(1 + 0.9p1) − λ + θ1
∂L/∂p2 = 0.8/(0.9 + 0.8p2) − λ + θ2
∂L/∂p3 = 0.7/(1 + 0.7p3) − λ + θ3
∂L/∂λ = −(p1 + p2 + p3 − 1)
∂L/∂θi = pi − si²
∂L/∂si = −2θisi

each of which must vanish at the optimum.

However, now we have three slack variables, and we have to consider 2³ = 8 scenarios, depending on which of the slack variables s1, s2, s3 are zero:

In scenario 1, all of p1, p2, p3 are positive (no constraint is active), and solving the gradient equations gives p1 = 0.444, p2 = 0.430, p3 = 0.126, with the objective function f(p1, p2, p3) = 0.639. In scenario 2, p3 = 0, and we find the solution p1 = 0.507, p2 = 0.493, with the objective function f(p1, p2, p3) = 0.634.

Likewise, in scenario 3, p2 = 0, and we find the solution p1 = 0.659 and p3 = 0.341, with the objective function f(p1, p2, p3) = 0.574.

In scenario 4, we have p1 = 0, p2 = 0.652, p3 = 0.348, and the objective function f(p1, p2, p3) = 0.570.

In scenario 5, we have p2 = p3 = 0 and therefore p1 = 1. The objective function is then f(p1, p2, p3) = 0.536.

Likewise, in scenarios 6 and 7, we have p2 = 1 and p3 = 1 respectively, for which the objective function attains 0.531 and 0.425. Scenario 8, in which p1 = p2 = p3 = 0, violates the equality constraint and is therefore infeasible.

Comparing all these scenarios, we find that the maximum value of the objective function is attained in scenario 1. Therefore, the solution to this optimization problem is p1 = 0.444, p2 = 0.430, p3 = 0.126, with f(p1, p2, p3) = 0.639.
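As a sanity check, the same answer can be reached numerically; here is a sketch that hands the problem above to SciPy's general-purpose constrained solver:

```python
import numpy as np
from scipy.optimize import minimize

n = np.array([1.0, 0.9, 1.0])  # noise level of each channel
g = np.array([0.9, 0.8, 0.7])  # gain of each channel

def neg_objective(p):
    # Negate log(n1 + g1*p1) + ... so that minimizing it maximizes capacity
    return -np.sum(np.log(n + g * p))

res = minimize(neg_objective, x0=np.array([1/3, 1/3, 1/3]),
               bounds=[(0.0, 1.0)] * 3,
               constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 1}])
print(res.x, -res.fun)  # approximately (0.444, 0.430, 0.126) and 0.639
```

The solver lands on the interior solution of scenario 1 without the case-by-case analysis, which is exactly why the KKT machinery underlies such solvers.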

__Extensions and Further Reading__

While in the example above we introduced the slack variables into the Lagrangian function, some books may prefer not to include slack variables, and instead constrain the Lagrange multipliers of the inequality constraints to be non-negative. In that case, you may see the Lagrangian function written as:

L(X, λ, θ, φ) = f(X) − λg(X) − θh(X) + φk(X), with θ ≥ 0 and φ ≥ 0

(the exact sign convention varies between texts).

The Lagrangian function is also useful in primal-dual approaches for finding the maximum or minimum. This is particularly helpful when the objective or constraints are non-linear, in which case the solution may not be easy to find.

Literature that covers this subject:

- *Convex Optimization* by Stephen Boyd and Lieven Vandenberghe, 2004
- Chapter 4 of *Deep Learning* by Ian Goodfellow et al., 2016

__Conclusion__

In this guide, you discovered how the method of Lagrange multipliers can be applied to inequality constraints. Particularly, you learned:

- Lagrange multipliers and the Lagrangian function in the presence of inequality constraints
- How to use the KKT conditions to solve an optimization problem when inequality constraints are given