
LOCAL CONVERGENCE OF EXACT AND INEXACT AUGMENTED LAGRANGIAN METHODS UNDER THE SECOND-ORDER SUFFICIENT OPTIMALITY CONDITION∗

D. FERNÁNDEZ† AND M. V. SOLODOV‡

SIAM J. OPTIMIZATION, Vol. xx, No. x, pp. xx–xx, 2012. © 2012 Society for Industrial and Applied Mathematics

Abstract. We establish local convergence and rate of convergence of the classical augmented Lagrangian algorithm under the sole assumption that the dual starting point is close to a multiplier satisfying the second-order sufficient optimality condition. In particular, no constraint qualifications of any kind are needed. Previous literature on the subject required, in addition, the linear independence constraint qualification and either the strict complementarity assumption or a stronger version of the second-order sufficient condition. That said, the classical results allow the initial multiplier estimate to be far from the optimal one, at the expense of proportionally increasing the threshold value for the penalty parameters. Although our primary goal is to avoid constraint qualifications, if the stronger assumptions are introduced then starting points far from the optimal multiplier are allowed within our analysis as well. Using only the second-order sufficient optimality condition, for penalty parameters large enough we prove a primal-dual Q-linear convergence rate, which becomes superlinear if the parameters are allowed to go to infinity. Both exact and inexact solutions of subproblems are considered. In the exact case, we further show that the primal convergence rate is of the same Q-order as the primal-dual rate. Previous assertions for the primal sequence all had to do with the weaker R-rate of convergence and required the stronger assumptions cited above. Finally, we show that under our assumptions one of the popular rules of controlling the penalty parameters ensures their boundedness.

Key words. Augmented Lagrangian, method of multipliers, second-order sufficient optimality condition, linear convergence, superlinear convergence.

AMS subject classifications. 90C30, 90C33, 90C55, 65K05.

1. Introduction. Given twice continuously differentiable functions $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^m$, we consider the optimization problem

$$\min\ f(x) \quad \text{s.t.}\quad g_i(x) = 0,\ i = 1, \dots, l, \qquad g_i(x) \leq 0,\ i = l + 1, \dots, m, \tag{1.1}$$

where $0 \leq l \leq m$. One of the fundamental approaches to solving (1.1) is the augmented Lagrangian algorithm, known also as the method of multipliers. The method dates back to [19, 26, 27, 28], and has been further developed and studied from various angles in [33, 3, 9, 20, 13, 10, 32, 5, 1, 2, 6, 7, 23], among extensive other literature (see also the monographs [4, 25, 12, 31]). Successful software based on augmented Lagrangians includes LANCELOT [11] and ALGENCAN [34].

We next describe the basic iteration of the algorithm in question. The augmented Lagrangian function for (1.1) is defined by $\bar{L} : \mathbb{R}^n \times \mathbb{R}^m \times (0, \infty) \to \mathbb{R}$,

$$\bar{L}(x, \mu; \rho) = f(x) + \sum_{i=1}^{l} \left( \mu_i g_i(x) + \frac{\rho}{2}\, g_i(x)^2 \right) + \frac{1}{2\rho} \sum_{i=l+1}^{m} \left( \max\{0, \mu_i + \rho g_i(x)\}^2 - \mu_i^2 \right). \tag{1.2}$$

∗Received by the editors October 2010; revised July 2011.
†FaMAF, Universidad Nacional de Córdoba, Medina Allende s/n, 5000 Córdoba, Argentina ([email protected]).
‡IMPA – Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro, RJ 22460-320, Brazil ([email protected]). Research of this author is supported in part by CNPq Grants 300513/2008-9, by PRONEX–Optimization, and by FAPERJ.


Then, given the current multiplier estimate $\mu^k \in \mathbb{R}^m$ and the current penalty parameter $\rho_k > 0$, the (exact) augmented Lagrangian method generates the next iterate $(x^{k+1}, \mu^{k+1}) \in \mathbb{R}^n \times \mathbb{R}^m$ as follows:

$$x^{k+1} \text{ is a solution of } \min_{x \in \mathbb{R}^n} \bar{L}(x, \mu^k; \rho_k),$$
$$\mu_i^{k+1} = \mu_i^k + \rho_k g_i(x^{k+1}), \quad i = 1, \dots, l,$$
$$\mu_i^{k+1} = \max\{0,\ \mu_i^k + \rho_k g_i(x^{k+1})\}, \quad i = l + 1, \dots, m. \tag{1.3}$$

Before explaining our contributions to the analysis of this algorithm, we need to introduce some notation. Stationary points of problem (1.1) and the associated Lagrange multipliers are characterized by the Karush–Kuhn–Tucker (KKT) system

$$0 = \frac{\partial L}{\partial x}(x, \mu), \qquad 0 = g_i(x),\ i = 1, \dots, l, \qquad 0 \leq \mu_i,\ g_i(x) \leq 0,\ \mu_i g_i(x) = 0,\ i = l + 1, \dots, m, \tag{1.4}$$

where $L : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is the Lagrangian function of problem (1.1), i.e.,

$$L(x, \mu) = f(x) + \langle \mu, g(x) \rangle.$$

We denote by $\mathcal{M}(\bar{x})$ the set of Lagrange multipliers of problem (1.1) associated with a given stationary point $\bar{x}$, i.e., $\mu \in \mathcal{M}(\bar{x})$ if and only if $(\bar{x}, \mu)$ satisfies the KKT system (1.4). Let

$$\mathcal{I} = \mathcal{I}(\bar{x}) = \{ i \in \{1, \dots, m\} \mid g_i(\bar{x}) = 0 \}, \qquad \mathcal{J} = \mathcal{J}(\bar{x}) = \{1, \dots, m\} \setminus \mathcal{I},$$

be the sets of indices of active and inactive constraints at $\bar{x}$, respectively. For each $\mu \in \mathcal{M}(\bar{x})$, we introduce the following standard partition of the set $\mathcal{I}$:

$$\mathcal{I}_1(\mu) = \{1, \dots, l\} \cup \{ i \in \{l + 1, \dots, m\} \mid \mu_i > 0 \}, \qquad \mathcal{I}_0(\mu) = \mathcal{I} \setminus \mathcal{I}_1(\mu).$$

The critical cone of problem (1.1) at its stationary point $\bar{x}$ is given by

$$C(\bar{x}) = \left\{ u \in \mathbb{R}^n \;\middle|\; \begin{array}{l} \langle f'(\bar{x}), u \rangle = 0,\ \langle g_i'(\bar{x}), u \rangle = 0 \text{ for } i \in \{1, \dots, l\}, \\ \langle g_i'(\bar{x}), u \rangle \leq 0 \text{ for } i \in \{l + 1, \dots, m\} \text{ with } g_i(\bar{x}) = 0 \end{array} \right\} \tag{1.5}$$
$$= \left\{ u \in \mathbb{R}^n \mid g_{\mathcal{I}_1(\mu)}'(\bar{x}) u = 0,\ g_{\mathcal{I}_0(\mu)}'(\bar{x}) u \leq 0 \right\},$$

where the second equality is independent of the choice of $\mu \in \mathcal{M}(\bar{x})$. We say that the second-order sufficient optimality condition (SOSC) is satisfied at $(\bar{x}, \bar{\mu})$ with $\bar{\mu} \in \mathcal{M}(\bar{x})$ if

$$\left\langle \frac{\partial^2 L}{\partial x^2}(\bar{x}, \bar{\mu})\, u,\ u \right\rangle > 0 \quad \forall\, u \in C(\bar{x}) \setminus \{0\}. \tag{1.6}$$

The usual convergence rate statements for the augmented Lagrangian methods assume the linear independence constraint qualification (i.e., that the gradients $\{g_i'(\bar{x}),\ i \in \mathcal{I}\}$ are linearly independent, and hence the multiplier $\bar{\mu} \in \mathcal{M}(\bar{x})$ is unique), strict complementarity (i.e., $\bar{\mu}_i > 0$ for all $i \in \mathcal{I}$), and SOSC (1.6). The assertion is that for any initial multiplier estimate $\mu^0$, if $\rho_k \geq \bar{\rho} > 0$ for all $k$ (the farther $\mu^0$ is from the unique optimal multiplier $\bar{\mu}$, the larger should be the penalty parameter threshold $\bar{\rho}$), then the dual sequence generated by (1.3) converges to $\bar{\mu}$ with Q-linear rate and the primal sequence converges to $\bar{x}$ with R-linear rate. Convergence becomes superlinear if $\rho_k \to +\infty$. Various versions of such statements can be found, e.g., in [4, Prop. 3.2 and 2.7], [25, Thm. 17.6], [12], [31, Thm. 6.16]. Strict complementarity is not assumed in [13], but a stronger version of second-order sufficiency is employed. Strict complementarity is also not used in [20]. However, the linear independence constraint qualification is required in all the literature.

In this paper, we prove that if the initial multiplier estimate $\mu^0$ is close enough to $\bar{\mu}$ satisfying SOSC (1.6) and the penalty parameters are chosen large enough, then the primal-dual sequence converges Q-linearly to $(\bar{x}, \tilde{\mu})$ with some $\tilde{\mu} \in \mathcal{M}(\bar{x})$. In particular, no constraint qualifications of any kind are needed. The multiplier set can even be unbounded. Strict complementarity is also not assumed. As usual, convergence becomes superlinear if $\rho_k \to +\infty$. However, in most implementations it is accepted that the penalty parameters should stay bounded. In this respect, we show that they indeed stay bounded under our assumption of SOSC (1.6), if updated as proposed in [7]. This is also the first result to this effect not assuming any constraint qualifications. Both exact and inexact solutions of subproblems are considered. Furthermore, for the exact solutions of subproblems we show that the primal convergence rate is of the same Q-order as the primal-dual rate. To the best of our knowledge, this is the first result asserting a Q-rate of convergence (rather than the weaker R-rate) of the primal sequence generated by the augmented Lagrangian method, under any assumptions. We also show that if the linear independence constraint qualification and the strong second-order sufficient optimality condition are introduced, then initial values of multiplier estimates far from the optimal one can be used in our analysis as well, similar to the classical results.

It is worth commenting that when it comes to the need (or not) for constraint qualifications, it is reasonable to conjecture that in augmented Lagrangian methods they may not be required. Indeed, at least part of the role of constraint qualifications in convergence analyses of optimization methods is to ensure that subproblems are feasible; for example, when constraints are linearized as in standard sequential quadratic programming [8] or in the linearized (augmented) Lagrangian methods [24]. In such cases, some constraint qualification is clearly unavoidable. In the augmented Lagrangian methods, on the other hand, subproblems are unconstrained and there is no obvious reason why assumptions about the constraints should be required in the analysis. Thus removing them from consideration looks appealing and potentially possible. We note, in passing, that the situation is somewhat similar to the stabilized sequential quadratic programming method, where constraint qualifications are also not needed [17]. For this method feasibility of subproblems is also not an issue, although the reason is different: subproblems are always feasible thanks to a certain "elastic mode" feature.

Our approach to convergence of the augmented Lagrangian methods is in the spirit of some recent analyses of Newtonian frameworks for generalized equations in [18] and [21], and their applications to methods for constrained optimization in [17, 16, 22]. The framework of [18] allows nonisolated solutions (and thus nonunique multipliers), and has been employed in [17, 22] to prove convergence of the stabilized sequential quadratic programming method without assuming constraint qualifications. The framework of [21] requires solutions to be isolated (thus the strict Mangasarian–Fromovitz constraint qualification is needed), but reveals that Newtonian lines of analysis might be applicable to some algorithms that are not commonly recognized as being of Newton type, in the sense that their subproblems are not systems of equations or quadratic programs. For example, such is the case of the linearly constrained augmented Lagrangian method [24], for which improved local convergence results have been obtained in [21, 16] by relating the method in question to a certain perturbation of Newtonian iterates. The present paper develops a Newtonian analysis of the augmented Lagrangian method by combining the ideas of [18] in dealing with nonisolated solutions in Newton methods with the ideas of [21, 16] in dealing with methods that perhaps "do not look" like Newton methods, but can be re-interpreted in such terms a posteriori by introducing perturbations. The applicability of this line of analysis to the augmented Lagrangian method is actually somewhat surprising, considering that no linear or quadratic approximations appear in (1.3).

In practice, subproblems of the augmented Lagrangian method are often solved approximately, in the sense that the iterates of the unconstrained minimization method used to minimize the augmented Lagrangian are truncated. Specifically, instead of computing an "exact" minimizer in (1.3), a point $x^{k+1}$ satisfying

$$\left\| \frac{\partial \bar{L}}{\partial x}(x^{k+1}, \mu^k; \rho_k) \right\| \leq \varepsilon_k$$

is accepted, where $\varepsilon_k \geq 0$ is the current approximation tolerance. In theoretical analysis, $\{\varepsilon_k\}$ is often an exogenous sequence of scalars converging to zero. In what follows, we define $\varepsilon_k$ as a specific computable quantity that depends on the violation of the KKT conditions (1.4) for problem (1.1) by the point $(x^k, \mu^k)$. This technique is also related to some truncation conditions used in [1]. Thus we prove convergence and rate of convergence for both the inexact/truncated version and the exact method.

The rest of the paper is organized as follows. In Section 2 we state our algorithmic framework and interpret augmented Lagrangian iterates as perturbations of solutions of generalized equations associated to the KKT conditions of the problem. We also sketch the general lines of our convergence analysis. Details are worked out in Section 3. Section 4 presents some further related developments. In particular, in Section 4.1 we establish boundedness of the penalty parameters if the latter are generated by one of the popular update rules. Some relations with the classical results are discussed in Section 4.2, where it is shown that under the stronger assumptions initial multiplier estimates far from the optimal one can be used in our analysis as well. Section 4.3 explains that our results extend to the practically important case where linear constraints of the problem are passed as constraints to the subproblems, while the augmented Lagrangian involves general constraints only.

We finish this section with describing our notation. We use $\langle \cdot, \cdot \rangle$ to denote the Euclidean inner product, $\| \cdot \|$ the associated norm, $B$ the closed unit ball, and $I$ the identity matrix (the space is always clear from the context). For any matrix $M$, $M_{\mathcal{I}}$ denotes the submatrix of $M$ with rows indexed by the set $\mathcal{I}$. When in matrix notation, vectors are considered columns, and for a vector $x$ we denote by $x_{\mathcal{I}}$ the subvector of $x$ with coordinates indexed by $\mathcal{I}$. For a set $S \subset \mathbb{R}^q$ and a point $z \in \mathbb{R}^q$, the distance from $z$ to $S$ is defined as $\mathrm{dist}(z, S) = \inf_{s \in S} \| z - s \|$. Then $\Pi_S(z) = \{ s \in S \mid \mathrm{dist}(z, S) = \| z - s \| \}$ is the set of all points in $S$ that have minimal distance to $z$. For a cone $K \subset \mathbb{R}^q$, its polar (negative dual) is $K^\circ = \{ \xi \in \mathbb{R}^q \mid \langle z, \xi \rangle \leq 0\ \forall z \in K \}$. Recall that for any closed convex cone $K \subset \mathbb{R}^q$ and $z \in \mathbb{R}^q$ it holds that

$$\bar{z} = \Pi_K(z) \iff K \ni \bar{z} \perp z - \bar{z} \in K^\circ, \tag{1.7}$$

where the notation $u \perp v$ means that $\langle u, v \rangle = 0$. We use the notation $\psi(t) = o(t)$ for any function $\psi : \mathbb{R}_+ \to \mathbb{R}^q$ such that $\lim_{t \to 0^+} t^{-1} \psi(t) = 0$.


2. The algorithmic framework and preliminary considerations. In this section, we formally state the algorithm under consideration and discuss our interpretation of its iterates as solutions of perturbed generalized equations associated to the KKT system (1.4).

To simplify the notation, we define the closed convex cone

$$Q = \{ \nu \in \mathbb{R}^m \mid \nu_i \in \mathbb{R},\ i = 1, \dots, l;\ \nu_i \geq 0,\ i = l + 1, \dots, m \}.$$

Noting that $Q^\circ = \{ \xi \in \mathbb{R}^m \mid \xi_i = 0,\ i = 1, \dots, l;\ \xi_i \leq 0,\ i = l + 1, \dots, m \}$, our problem (1.1) takes the compact form

$$\min\ f(x) \quad \text{s.t.}\quad g(x) \in Q^\circ, \tag{2.1}$$

with its KKT system (1.4) given by

$$0 = \frac{\partial L}{\partial x}(x, \mu), \qquad Q \ni \mu \perp g(x) \in Q^\circ. \tag{2.2}$$

The violation of the KKT conditions (2.2) is measured by the natural residual $\sigma : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}_+$,

$$\sigma(x, \mu) = \left\| \begin{bmatrix} f'(x) + (g'(x))^\top \mu \\ \mu - \Pi_Q(\mu + g(x)) \end{bmatrix} \right\|. \tag{2.3}$$
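For illustration, the natural residual (2.3) is directly computable once $\Pi_Q$ is available: the projection leaves the first $l$ components unchanged and clips the remaining ones at zero. A minimal Python sketch follows; the callables f_grad, g, g_jac are assumptions of the illustration, not objects from the paper.

```python
import numpy as np

def proj_Q(v, l):
    """Projection onto Q: free in the first l components, nonnegative in the rest."""
    out = v.copy()
    out[l:] = np.maximum(0.0, out[l:])
    return out

def natural_residual(f_grad, g, g_jac, l, x, mu):
    """Natural residual sigma(x, mu) of the KKT system (2.2), formula (2.3)."""
    lagr_grad = f_grad(x) + g_jac(x).T @ mu      # f'(x) + g'(x)^T mu
    compl = mu - proj_Q(mu + g(x), l)            # mu - Pi_Q(mu + g(x))
    return np.linalg.norm(np.concatenate([lagr_grad, compl]))
```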

In particular, $\sigma(x, \mu) = 0$ if and only if $(x, \mu)$ solves (2.2). Define the function $G : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n \times \mathbb{R}^m$ and the set-valued mapping $\mathcal{N}$ from $\mathbb{R}^n \times \mathbb{R}^m$ to the subsets of $\mathbb{R}^n \times \mathbb{R}^m$ by

$$G(x, \mu) = \begin{bmatrix} \dfrac{\partial L}{\partial x}(x, \mu) \\[1ex] -g(x) \end{bmatrix}, \qquad \mathcal{N}(x, \mu) = \{0\} \times N_Q(\mu), \tag{2.4}$$

where

$$N_Q(\mu) = \begin{cases} \{ v \in \mathbb{R}^m \mid \langle v, \nu - \mu \rangle \leq 0\ \forall \nu \in Q \} & \text{if } \mu \in Q, \\ \emptyset & \text{otherwise}, \end{cases}$$

is the normal cone to $Q$ at $\mu \in \mathbb{R}^m$. Then the KKT system (1.4) (or (2.2)) is further equivalent to the generalized equation (GE)

$$0 \in G(w) + \mathcal{N}(w), \qquad w = (x, \mu) \in \mathbb{R}^n \times \mathbb{R}^m. \tag{2.5}$$

In the present notation, the augmented Lagrangian function (1.2) is

$$\bar{L}(x, \mu; \rho) = f(x) + \frac{1}{2\rho} \left( \| \Pi_Q(\mu + \rho g(x)) \|^2 - \| \mu \|^2 \right), \tag{2.6}$$

and its derivatives are given by

$$\frac{\partial \bar{L}}{\partial x}(x, \mu; \rho) = f'(x) + (g'(x))^\top \Pi_Q(\mu + \rho g(x)), \qquad \frac{\partial \bar{L}}{\partial \mu}(x, \mu; \rho) = \frac{1}{\rho} \left( \Pi_Q(\mu + \rho g(x)) - \mu \right).$$
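As a sanity check, the compact form (2.6) agrees with (1.2): for equality indices the projection leaves $\mu_i + \rho g_i(x)$ unchanged, and expanding the square recovers $\mu_i g_i(x) + (\rho/2) g_i(x)^2$. A minimal numerical verification in Python; the data below (f, g, the dimensions) are arbitrary stand-ins for the illustration, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, m, rho = 3, 2, 5, 10.0
x, mu = rng.normal(size=n), np.abs(rng.normal(size=m))

f = lambda y: 0.5 * y @ y            # arbitrary smooth objective
A = rng.normal(size=(m, n))
g = lambda y: A @ y - 1.0            # arbitrary smooth constraints

def proj_Q(v):
    out = v.copy()
    out[l:] = np.maximum(0.0, out[l:])
    return out

gx = g(x)
# form (1.2)
L1 = (f(x) + mu[:l] @ gx[:l] + 0.5 * rho * gx[:l] @ gx[:l]
      + np.sum(np.maximum(0.0, mu[l:] + rho * gx[l:]) ** 2
               - mu[l:] ** 2) / (2.0 * rho))
# compact form (2.6)
L2 = f(x) + (np.linalg.norm(proj_Q(mu + rho * gx)) ** 2
             - np.linalg.norm(mu) ** 2) / (2.0 * rho)

assert np.isclose(L1, L2)
```

Note also that, by the second formula above, the dual update (1.3) is exactly $\mu^{k+1} = \mu^k + \rho_k\, \partial\bar{L}/\partial\mu(x^{k+1}, \mu^k; \rho_k)$.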


For an arbitrary but fixed $c \in (0, +\infty)$, we consider the following iterative procedure. Given a current iterate $(x^k, \mu^k) \in \mathbb{R}^n \times Q$ with $\sigma(x^k, \mu^k) > 0$, a penalty parameter $\rho_k > 0$ and an approximation parameter $\varepsilon_k \geq 0$,

$$\text{find } x^{k+1} \text{ satisfying } \left\| \frac{\partial \bar{L}}{\partial x}(x^{k+1}, \mu^k; \rho_k) \right\| \leq \varepsilon_k, \tag{2.7}$$

$$\left\| \begin{bmatrix} x^{k+1} - x^k \\ \Pi_Q(\mu^k + \rho_k g(x^{k+1})) - \mu^k \end{bmatrix} \right\| \leq c\, \sigma(x^k, \mu^k); \tag{2.8}$$

$$\text{and set } \mu^{k+1} = \Pi_Q(\mu^k + \rho_k g(x^{k+1})). \tag{2.9}$$

Some comments are in order. The condition (2.8) is a localization-type condition. Some condition of this kind is unavoidable in local convergence analyses of most algorithms, and the augmented Lagrangian method is no exception [4, 25, 12, 31]. Without extremely strong assumptions, the augmented Lagrangian may have stationary points (as well as minimizers) arbitrarily far from the solution of interest (equivalently, from the current iterate $x^k$). Such points must be discarded in any meaningful local analysis. This is precisely the role of (2.8). In practical implementations, it is ignored, of course. Another point is that this condition is formally not well-defined for $k = 0$ (the algorithm starts with the initial dual estimate $\mu^0 \in Q$ and generates $x^1$, i.e., we do not have $x^0$). This detail should not introduce any confusion, however, as we can think of (2.7)–(2.9) as starting with the index $k = 1$. Alternatively, we could take $x^0$ arbitrary and fix $c$ a posteriori, after $x^1$ is obtained, choosing it large enough to satisfy (2.8) for $k = 0$. A runnable sketch of one such truncated step is given below.
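The following Python sketch assembles one inexact step (2.7)–(2.9). The truncation (2.7) is approximated through the inner solver's gradient tolerance (scipy's gtol option for BFGS, which uses the infinity norm, so this is only a stand-in for the Euclidean-norm test), and the localization condition (2.8) is merely monitored, as it would be in practice. All function handles and the constant c_loc are assumptions of the illustration.

```python
import numpy as np
from scipy.optimize import minimize

def proj_Q(v, l):
    out = v.copy()
    out[l:] = np.maximum(0.0, out[l:])
    return out

def inexact_al_step(f, f_grad, g, g_jac, l, x, mu, rho, eps, c_loc):
    """One inexact augmented Lagrangian step (2.7)-(2.9); assumes eps > 0."""
    def L_bar(y):   # compact form (2.6)
        return f(y) + (np.linalg.norm(proj_Q(mu + rho * g(y), l)) ** 2
                       - np.linalg.norm(mu) ** 2) / (2.0 * rho)

    def L_bar_grad(y):   # dL_bar/dx, formula from Section 2
        return f_grad(y) + g_jac(y).T @ proj_Q(mu + rho * g(y), l)

    # (2.7): truncate the inner minimization once the gradient is small
    x_new = minimize(L_bar, x, jac=L_bar_grad, method="BFGS",
                     options={"gtol": eps}).x
    mu_new = proj_Q(mu + rho * g(x_new), l)              # (2.9)

    # natural residual (2.3) at the current point, for the test (2.8)
    sigma = np.linalg.norm(np.concatenate(
        [f_grad(x) + g_jac(x).T @ mu, mu - proj_Q(mu + g(x), l)]))
    step = np.concatenate([x_new - x, mu_new - mu])
    localized = np.linalg.norm(step) <= c_loc * sigma    # (2.8), monitored only
    return x_new, mu_new, localized
```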

We next relate the iterative process (2.7)–(2.9) to the study of GE (2.5) under perturbations. By (2.7), there exists $\vartheta \in B$ such that

$$\frac{\partial L}{\partial x}(x^{k+1}, \mu^{k+1}) + \varepsilon_k \vartheta = 0,$$

where we used the fact that

$$\frac{\partial L}{\partial x}(x^{k+1}, \mu^{k+1}) = \frac{\partial \bar{L}}{\partial x}(x^{k+1}, \mu^k; \rho_k). \tag{2.10}$$

Also, by the definition of $\mu^{k+1}$, we have that

$$\mu^k + \rho_k g(x^{k+1}) - \mu^{k+1} \in N_Q(\mu^{k+1}). \tag{2.11}$$

Hence,

$$0 \in \begin{bmatrix} \dfrac{\partial L}{\partial x}(x^{k+1}, \mu^{k+1}) + \varepsilon_k \vartheta \\[1ex] -g(x^{k+1}) + \dfrac{1}{\rho_k}(\mu^{k+1} - \mu^k) \end{bmatrix} + \mathcal{N}(x^{k+1}, \mu^{k+1}).$$

Defining $w^{k+1} = (x^{k+1}, \mu^{k+1})$, we conclude that

$$0 \in G(w^{k+1}) + p^k + \mathcal{N}(w^{k+1}), \quad \text{where } p^k = \left( \varepsilon_k \vartheta,\ \frac{\mu^{k+1} - \mu^k}{\rho_k} \right),\ \vartheta \in B. \tag{2.12}$$


Thus, the behavior of the sequence generated by the augmented Lagrangian method is connected to the study of solutions of GE (2.5) under perturbations of the form in (2.12). There are some ingredients in our approach that are closely related to [18], where an analysis of a certain class of Newtonian methods for GEs is presented. However, the framework of [18] is not applicable to our case. To see this, it is enough to note that the penalty parameters $\rho_k$ in (2.12) are not functions of the problem variables, unlike the perturbation terms of the Newtonian iteration in [18].

Let $\Sigma^*$ be the solution set of the GE (2.5) (equivalently, of the KKT system (2.2)) and let $\Sigma(p)$ be the solution set of its right-hand side perturbation, i.e.,

$$\Sigma(p) = \{ w \in \mathbb{R}^{n+m} \mid 0 \in G(w) + p + \mathcal{N}(w) \}, \qquad p \in \mathbb{R}^{n+m}. \tag{2.13}$$

Let $\bar{w} \in \Sigma^*$ be a specific solution of the GE (2.5). The following upper-Lipschitzian property of $\Sigma(p)$ is one of the main ingredients in local analysis of a number of algorithms for solving GEs and their special cases: there exist $\varepsilon_0, \gamma_0, \tau_0 > 0$ such that

$$\Sigma(p) \cap (\bar{w} + \varepsilon_0 B) \subseteq \Sigma^* + \tau_0 \| p \| B \quad \forall\, p \in \gamma_0 B. \tag{2.14}$$

According to [18, Theorem 2], in the case of KKT systems the property (2.14) is further equivalent to the first inequality in the following error bound:

$$\beta_1\, \mathrm{dist}(w, \Sigma^*) \leq \sigma(w) \leq \beta_2\, \mathrm{dist}(w, \Sigma^*) \quad \forall\, w \in \bar{w} + \varepsilon_\sigma B, \tag{2.15}$$

where $\varepsilon_\sigma > 0$, $\beta_2 \geq \beta_1 > 0$ and $\sigma(\cdot)$ is the natural residual of the KKT conditions defined in (2.3). The second inequality in (2.15) holds by the Lipschitz-continuity of the natural residual. Furthermore, the property (2.14) is also equivalent to the assumption that the multiplier $\bar{\mu}$ in $\bar{w} = (\bar{x}, \bar{\mu})$ is noncritical as defined in [22]. We shall not introduce the latter notion here, as for the purposes of this paper it is enough to mention that SOSC (1.6) holding at $(\bar{x}, \bar{\mu})$ implies that $\bar{\mu}$ is noncritical and thus (2.15) holds [22]. Summarizing, under SOSC (1.6) we have the properties stated in (2.14) and (2.15). Note also that since SOSC implies that $\bar{x}$ is locally unique, shrinking $\varepsilon_\sigma$ if necessary, it holds that

$$\mathrm{dist}(w, \Sigma^*) = \left( \| x - \bar{x} \|^2 + \mathrm{dist}(\mu, \mathcal{M}(\bar{x}))^2 \right)^{1/2} \quad \forall\, w \in \bar{w} + \varepsilon_\sigma B. \tag{2.16}$$

We now sketch the general line of our convergence analysis. Observe that (2.12) means that $w^{k+1} \in \Sigma(p^k)$. Thus, if $w^{k+1} \in \bar{w} + \varepsilon_0 B$ and $p^k \in \gamma_0 B$, from (2.14) we obtain that

$$\mathrm{dist}(w^{k+1}, \Sigma^*) \leq \tau_0 \| p^k \| \leq \tau_0 \left( \varepsilon_k + \frac{1}{\rho_k} \| \mu^{k+1} - \mu^k \| \right). \tag{2.17}$$

It should be noted that relations like (2.17) are standard in the literature on augmented Lagrangian methods, but previously they have always been derived assuming the linear independence constraint qualification at $\bar{x}$, among other things. Our interpretation above of the augmented Lagrangian iterates as solutions of perturbed GEs (2.12) shows that (2.17) is a direct consequence of the upper-Lipschitzian property (2.14), for which SOSC (1.6) is sufficient.

Consider, for the moment, the case of exact iterations with $\varepsilon_k = 0$. If the condition (2.8) holds, then

$$\| w^{k+1} - w^k \| \leq c\, \sigma(w^k)$$

and, by (2.15),

$$\| \mu^{k+1} - \mu^k \| \leq c\beta_2\, \mathrm{dist}(w^k, \Sigma^*).$$

It then follows from (2.17) that

$$\mathrm{dist}(w^{k+1}, \Sigma^*) \leq \frac{c\tau_0\beta_2}{\rho_k}\, \mathrm{dist}(w^k, \Sigma^*).$$

Hence, convergence (and rate of convergence) essentially would follow if we show that the sequence generated by (2.7)–(2.9) is well-defined and, in particular, that (2.8) holds.

3. Local convergence analysis. The first stage of the analysis is to show that under SOSC (1.6) holding at $(\bar{x}, \bar{\mu})$, the augmented Lagrangian has (exact local) minimizers that satisfy the condition (2.8).

We start by establishing that the augmented Lagrangian has the quadratic growth property around $\bar{x}$, uniformly for all $\mu \in \mathcal{M}(\bar{x})$ close to $\bar{\mu}$. This is an extension of [29, Theorem 7.4 (c)], which states this property for fixed $\mu$.

Proposition 3.1. If $(\bar{x}, \bar{\mu})$ satisfies SOSC (1.6), then there exist constants $\delta_{\bar\mu}, \gamma_{\bar\mu}, \eta_{\bar\mu}, \rho_{\bar\mu} > 0$ such that

$$\bar{L}(x, \mu; \rho) \geq f(\bar{x}) + \gamma_{\bar\mu} \| x - \bar{x} \|^2, \tag{3.1}$$

for all $x \in \bar{x} + \delta_{\bar\mu} B$, all $\mu \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x})$, and all $\rho \geq \rho_{\bar\mu}$.

Proof. For all $\mu \in \mathcal{M}(\bar{x})$ and $\rho > 0$, it holds that $\mu = \Pi_Q(\mu + \rho g(\bar{x}))$. Hence, $\bar{L}(\bar{x}, \mu; \rho) = f(\bar{x})$ and $\frac{\partial \bar{L}}{\partial x}(\bar{x}, \mu; \rho) = \frac{\partial L}{\partial x}(\bar{x}, \mu) = 0$. Then the second-order expansion of the augmented Lagrangian in the primal variables takes the following form (see [29, Prop. 7.2, (7.3)]):

$$\bar{L}(\bar{x} + u, \mu; \rho) = f(\bar{x}) + \Phi_0(\mu, u) + \rho \Phi_1(\mu, u) + o(\| u \|^2), \tag{3.2}$$

where

$$\Phi_0(\mu, u) = \frac{1}{2} \left\langle \frac{\partial^2 L}{\partial x^2}(\bar{x}, \mu)\, u,\ u \right\rangle, \qquad \Phi_1(\mu, u) = \frac{1}{2} \left( \sum_{i \in \mathcal{I}_1(\mu)} (\langle g_i'(\bar{x}), u \rangle)^2 + \sum_{i \in \mathcal{I}_0(\mu)} (\max\{0, \langle g_i'(\bar{x}), u \rangle\})^2 \right).$$

Note that if $i \in \mathcal{I}_1(\bar{\mu})$, $i \geq l + 1$, we have that $\bar{\mu}_i > 0$. Then $\mathcal{I}_1(\bar{\mu}) \subseteq \mathcal{I}_1(\mu)$ for all $\mu \in \mathcal{M}(\bar{x})$ close enough to $\bar{\mu}$, by continuity. For the same (continuity) reason, $\Phi_0(\mu, u) > 0$ for all $u \in C(\bar{x}) \setminus \{0\}$, since this property holds at $\bar{\mu}$ by SOSC (1.6) and $C(\bar{x})$ is closed. Thus, there exists $\eta_{\bar\mu} > 0$ such that

$$\mathcal{I}_1(\bar{\mu}) \subseteq \mathcal{I}_1(\mu) \quad \text{and} \quad \Phi_0(\mu, u) > 0 \quad \forall\, u \in C(\bar{x}) \setminus \{0\},\ \forall\, \mu \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x}). \tag{3.3}$$

Define $\Psi_0, \Psi_1 : \mathbb{R}^n \to \mathbb{R}$ by

$$\Psi_0(u) = \min\{ \Phi_0(\mu, u) \mid \mu \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x}) \} \quad \text{and} \quad \Psi_1(u) = \Phi_1(\bar{\mu}, u).$$

By [30, Theorem 1.17 (c)], $\Psi_0$ is continuous on $\mathbb{R}^n$. Let $-t_0 \in \mathbb{R}$ and $t_1 \in \mathbb{R}$ be the minimal values of the continuous functions $\Psi_0$ and $\Psi_1$, respectively, on the compact set $D = \{ u \in S \mid \Psi_0(u) \leq 0 \}$, where $S = \{ u \in \mathbb{R}^n \mid \| u \| = 1 \}$. Obviously, $t_0 \geq 0$ by the definition of $D$. Note that by the compactness of the set in the definition of $\Psi_0$ and by (3.3), it holds that $\Psi_0(u) > 0$ for all $u \in C(\bar{x}) \setminus \{0\}$, and by the definition (1.5) we have that $C(\bar{x}) = \{ u \in \mathbb{R}^n \mid \Psi_1(u) \leq 0 \}$. Thus, $\Psi_1(u) > 0$ for all $u \in D$ (if $\Psi_1(u) \leq 0$ for $u \in S$ then $u \in C(\bar{x}) \setminus \{0\}$, so that $\Psi_0(u) > 0$ and $u \notin D$). Hence, $t_1 > 0$ by the compactness of $D$. Choose $\rho_{\bar\mu} > t_0/t_1 \geq 0$. As $\Psi_1(u) \geq 0$ for all $u \in \mathbb{R}^n$, by the definition of $D$ we have that $\Psi_0(u) + \rho_{\bar\mu} \Psi_1(u) \geq \Psi_0(u) > 0$ for all $u \in S \setminus D$. Also, $\Psi_0(u) + \rho_{\bar\mu} \Psi_1(u) > 0$ for all $u \in D$ by the definition of $\rho_{\bar\mu}$. Thus $\Psi_0(u) + \rho_{\bar\mu} \Psi_1(u) > 0$ for all $u \in S$. By the compactness of $S$, and by the continuity of $\Psi_0$ and $\Psi_1$, it follows that there exists $\varepsilon > 0$ such that

$$\Psi_0(u) + \rho_{\bar\mu} \Psi_1(u) \geq \varepsilon > 0 \quad \forall\, u \in S. \tag{3.4}$$

Since $\mathcal{I}_1(\mu) \cup \mathcal{I}_0(\mu) = \mathcal{I}$ for any $\mu \in \mathcal{M}(\bar{x})$, it holds that

$$(\mathcal{I}_1(\mu) \setminus \mathcal{I}_1(\bar{\mu})) \cup \mathcal{I}_0(\mu) = \mathcal{I}_0(\bar{\mu}).$$

Hence, for any $\mu \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x})$ we have

$$\begin{aligned} 2\Phi_1(\mu, u) &= \sum_{i \in \mathcal{I}_1(\bar{\mu})} (\langle g_i'(\bar{x}), u \rangle)^2 + \sum_{i \in \mathcal{I}_1(\mu) \setminus \mathcal{I}_1(\bar{\mu})} (\langle g_i'(\bar{x}), u \rangle)^2 + \sum_{i \in \mathcal{I}_0(\mu)} (\max\{0, \langle g_i'(\bar{x}), u \rangle\})^2 \\ &\geq \sum_{i \in \mathcal{I}_1(\bar{\mu})} (\langle g_i'(\bar{x}), u \rangle)^2 + \sum_{i \in \mathcal{I}_0(\bar{\mu})} (\max\{0, \langle g_i'(\bar{x}), u \rangle\})^2 \\ &= 2\Phi_1(\bar{\mu}, u) = 2\Psi_1(u), \end{aligned}$$

where we used the fact that $t^2 \geq \max\{0, t\}^2$. Combining the latter relation with (3.4), we conclude that for all $u \in S$ it holds that

$$\Phi_0(\mu, u) + \rho_{\bar\mu} \Phi_1(\mu, u) \geq \Psi_0(u) + \rho_{\bar\mu} \Psi_1(u) \geq \varepsilon.$$

For each $v \in \mathbb{R}^n \setminus \{0\}$, considering $u = v/\| v \| \in S$ and using the definitions of $\Phi_0$ and $\Phi_1$, the latter relation gives

$$\Phi_0(\mu, v) + \rho_{\bar\mu} \Phi_1(\mu, v) \geq \varepsilon \| v \|^2,$$

for all $\mu \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x})$ and all $v \in \mathbb{R}^n$. Using now (3.2), we guarantee the existence of $\gamma_{\bar\mu}, \delta_{\bar\mu} > 0$ such that

$$\bar{L}(\bar{x} + v, \mu; \rho_{\bar\mu}) \geq \min\{ \bar{L}(\bar{x} + v, \mu'; \rho_{\bar\mu}) \mid \mu' \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x}) \} \geq f(\bar{x}) + \varepsilon \| v \|^2 + o(\| v \|^2) \geq f(\bar{x}) + \gamma_{\bar\mu} \| v \|^2,$$

for all $v \in \delta_{\bar\mu} B$. The property (3.1) follows taking $x = \bar{x} + v$, $v \in \delta_{\bar\mu} B$, and using the fact that $\bar{L}(x, \mu; \cdot)$ is nondecreasing [29, Proposition 7.1].

The next result shows, among other things, that under the stated assumptions the augmented Lagrangians have local minimizers in the interior of some ball around the point $\bar{x}$. This conclusion is standard under SOSC for the case of equality-constrained problems, and in the general case if strict complementarity is assumed (e.g., combining SOSC with [4, Prop. 2.2]). Using the quadratic growth condition and exterior penalty arguments, it can be seen that the property holds also without strict complementarity (see, e.g., [6, Thm. 2]). We include here a different proof for completeness, and also because some specific estimates derived in the process will be needed later on.

Corollary 3.2. Let $(\bar{x}, \bar{\mu})$ satisfy SOSC (1.6), and let $\delta_{\bar\mu}, \gamma_{\bar\mu}, \eta_{\bar\mu}, \rho_{\bar\mu} > 0$ be the constants defined in Proposition 3.1.

If $\mu \in \mathbb{R}^m$, $x \in \bar{x} + \delta_{\bar\mu} B$ and $\rho \geq \rho_{\bar\mu}$ are such that $\bar{L}(x, \mu; \rho) \leq f(\bar{x})$, then for all $\hat{\mu} \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x})$ it holds that

$$\| x - \bar{x} \|^2 \leq \frac{1}{\rho \gamma_{\bar\mu}} \langle \Pi_Q(\mu + \rho g(x)) - \mu,\ \hat{\mu} - \mu \rangle. \tag{3.5}$$

Furthermore, shrinking $\delta_{\bar\mu}$, $\eta_{\bar\mu}$ and increasing $\rho_{\bar\mu}$ if necessary, it holds that for all $\mu \in (\bar{\mu} + \eta_{\bar\mu} B) \cap Q$ and all $\rho \geq \rho_{\bar\mu}$ the problem

$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\ \bar{L}(x, \mu; \rho)$$

has a local minimizer $\hat{x} \in \mathrm{int}(\bar{x} + \delta_{\bar\mu} B)$.

Proof. Using the concavity of $\bar{L}(x, \cdot; \rho)$ [29, Prop. 7.1], we have that

$$\begin{aligned} \bar{L}(x, \mu; \rho) &\geq \bar{L}(x, \hat{\mu}; \rho) - \left\langle \frac{\partial \bar{L}}{\partial \mu}(x, \mu; \rho),\ \hat{\mu} - \mu \right\rangle \\ &= \bar{L}(x, \hat{\mu}; \rho) - \frac{1}{\rho} \langle \Pi_Q(\mu + \rho g(x)) - \mu,\ \hat{\mu} - \mu \rangle \\ &\geq f(\bar{x}) + \gamma_{\bar\mu} \| x - \bar{x} \|^2 - \frac{1}{\rho} \langle \Pi_Q(\mu + \rho g(x)) - \mu,\ \hat{\mu} - \mu \rangle, \end{aligned} \tag{3.6}$$

where we used (3.1) for the second inequality. Then, if $\bar{L}(x, \mu; \rho) \leq f(\bar{x})$, the property (3.5) follows immediately.

We now prove the last assertion. Fix $\tau \in (0, 1)$. Define $\bar{c} > 0$ and $\ell > 0$ such that

$$\bar{c} = \begin{cases} \dfrac{1}{3} \min_{i \in \mathcal{J}} \{ -g_i(\bar{x}) \} & \text{if } \mathcal{J} \neq \emptyset, \\ 1 & \text{otherwise}, \end{cases}$$

and

$$\| g(x) - g(\bar{x}) \| \leq \ell \| x - \bar{x} \| \quad \forall\, x \in \bar{x} + \delta_{\bar\mu} B.$$

Decreasing $\delta_{\bar\mu}$, $\eta_{\bar\mu}$ and increasing $\rho_{\bar\mu}$, if necessary, we can guarantee that

$$0 < \delta_{\bar\mu} \leq \frac{\bar{c}}{\ell}, \qquad 0 < \eta_{\bar\mu} \leq \frac{\gamma_{\bar\mu} \tau^2 \delta_{\bar\mu}}{2\ell}, \qquad \rho_{\bar\mu} \geq \frac{\gamma_{\bar\mu} \tau^2}{2\ell^2}. \tag{3.7}$$

Take any $\mu \in (\bar{\mu} + \eta_{\bar\mu} B) \cap Q$, any $\rho \geq \rho_{\bar\mu}$, and let $\hat{x}$ be any (global) minimizer of $\bar{L}(\cdot, \mu; \rho)$ over the (compact) set $\bar{x} + \delta_{\bar\mu} B$.

Since $g(\bar{x}) \in Q^\circ$ and the projection operator is nonexpansive, it holds that

$$\| \Pi_Q(\mu + \rho g(\bar{x})) \| = \| \Pi_Q(\mu + \rho g(\bar{x})) - \Pi_Q(\rho g(\bar{x})) \| \leq \| \mu \|.$$

Hence,

$$\bar{L}(\hat{x}, \mu; \rho) \leq \bar{L}(\bar{x}, \mu; \rho) = f(\bar{x}) + \frac{1}{2\rho} \left( \| \Pi_Q(\mu + \rho g(\bar{x})) \|^2 - \| \mu \|^2 \right) \leq f(\bar{x}). \tag{3.8}$$

Condition (3.8) now implies that the bound (3.5) holds for $\hat{x}$ and $\mu$ (with $\hat{\mu} = \bar{\mu}$), i.e.,

$$\| \hat{x} - \bar{x} \| \leq \left( \frac{1}{\rho \gamma_{\bar\mu}} \langle \Pi_Q(\mu + \rho g(\hat{x})) - \mu,\ \bar{\mu} - \mu \rangle \right)^{1/2}. \tag{3.9}$$

To establish the claim, it remains to show that the right-hand side in (3.9) is strictly smaller than $\delta_{\bar\mu}$.

If $i \in \mathcal{J}$ then $\bar{\mu}_i = 0$, and we obtain that

$$\frac{1}{\rho}\mu_i + g_i(\hat{x}) \leq \frac{1}{\rho}|\mu_i - \bar{\mu}_i| + |g_i(\hat{x}) - g_i(\bar{x})| + g_i(\bar{x}) \leq \frac{1}{\rho_{\bar\mu}}\eta_{\bar\mu} + \ell\delta_{\bar\mu} - 3\bar{c} \leq 2\ell\delta_{\bar\mu} - 3\bar{c} \leq -\bar{c} < 0,$$

where the second inequality follows from the definition of $\bar{c}$, and for the other relations we use (3.7). Hence, $(\mu + \rho g(\hat{x}))_{\mathcal{J}} < 0$ and we have that

$$\| (\Pi_Q(\mu + \rho g(\hat{x})) - \mu)_{\mathcal{J}} \| = \| -\mu_{\mathcal{J}} \| = \| \mu_{\mathcal{J}} - \bar{\mu}_{\mathcal{J}} \|. \tag{3.10}$$

By the structure of $Q$ and using that $g_i(\bar{x}) = 0$ for $i \in \mathcal{I}$, we have

$$\begin{aligned} |(\Pi_Q(\mu + \rho g(\hat{x})) - \mu)_i| &= \rho |g_i(\hat{x}) - g_i(\bar{x})| \quad \text{for } i \in \{1, \dots, l\}, \\ |(\Pi_Q(\mu + \rho g(\hat{x})) - \mu)_i| &= |\max\{0, \mu_i + \rho g_i(\hat{x})\} - \mu_i| \leq \rho |g_i(\hat{x}) - g_i(\bar{x})| \quad \text{for } i \in \mathcal{I} \setminus \{1, \dots, l\}, \end{aligned}$$

where for the last inequality we use the fact that $\mu_i \geq 0$ (because $\mu \in Q$) and $|\max\{a, b\} - \max\{a, c\}| \leq |b - c|$ for any $a, b, c \in \mathbb{R}$. Combining the relations above, we obtain

$$\| (\Pi_Q(\mu + \rho g(\hat{x})) - \mu)_{\mathcal{I}} \| \leq \rho \| g_{\mathcal{I}}(\hat{x}) - g_{\mathcal{I}}(\bar{x}) \|. \tag{3.11}$$

Now, using (3.10) and (3.11), it follows that

$$\begin{aligned} \frac{1}{\rho} \langle \Pi_Q(\mu + \rho g(\hat{x})) - \mu,\ \bar{\mu} - \mu \rangle &\leq \frac{1}{\rho} \| (\Pi_Q(\mu + \rho g(\hat{x})) - \mu)_{\mathcal{J}} \| \| (\bar{\mu} - \mu)_{\mathcal{J}} \| + \frac{1}{\rho} \| (\Pi_Q(\mu + \rho g(\hat{x})) - \mu)_{\mathcal{I}} \| \| (\bar{\mu} - \mu)_{\mathcal{I}} \| \\ &\leq \| g_{\mathcal{I}}(\hat{x}) - g_{\mathcal{I}}(\bar{x}) \| \| \bar{\mu}_{\mathcal{I}} - \mu_{\mathcal{I}} \| + \frac{1}{\rho} \| \mu_{\mathcal{J}} - \bar{\mu}_{\mathcal{J}} \|^2 \\ &\leq \ell \delta_{\bar\mu} \eta_{\bar\mu} + \frac{1}{\rho_{\bar\mu}} \eta_{\bar\mu}^2 \\ &\leq \frac{1}{2}\gamma_{\bar\mu}\tau^2\delta_{\bar\mu}^2 + \frac{1}{2}\gamma_{\bar\mu}\tau^2\delta_{\bar\mu}^2 = \gamma_{\bar\mu}\tau^2\delta_{\bar\mu}^2, \end{aligned}$$

where for the last inequality we use (3.7). Combining this bound with (3.9), we conclude that

$$\| \hat{x} - \bar{x} \| \leq \tau \delta_{\bar\mu} < \delta_{\bar\mu},$$

i.e., $\hat{x} \in \mathrm{int}(\bar{x} + \delta_{\bar\mu} B)$. It follows that $\hat{x}$ is an unconstrained local minimizer of $\bar{L}(\cdot, \mu; \rho)$.

It is now clear that the (exact or inexact) stationarity condition (2.7) can always be achieved. The next step is to prove that the localization property (2.8) is satisfied as well. We first verify this fact for the case of exact solutions of subproblems.

Proposition 3.3. Let $(\bar{x}, \bar{\mu})$ satisfy SOSC (1.6), and let $\delta_{\bar\mu}, \gamma_{\bar\mu}, \eta_{\bar\mu}, \rho_{\bar\mu} > 0$ be the constants defined in Corollary 3.2.

Then there exist $\varepsilon_1, c_1 > 0$ such that for any $\rho \geq \rho_{\bar\mu}$ and $(x, \mu) \in ((\bar{x}, \bar{\mu}) + \varepsilon_1 B) \cap (\mathbb{R}^n \times Q)$ with $\sigma(x, \mu) > 0$, if $y$ is a solution of

$$\underset{x \in \bar{x} + \delta_{\bar\mu} B}{\text{minimize}}\ \bar{L}(x, \mu; \rho)$$

and $\nu = \Pi_Q(\mu + \rho g(y))$, then it holds that

$$\left\| \begin{bmatrix} y - x \\ \nu - \mu \end{bmatrix} \right\| \leq c_1\, \sigma(x, \mu).$$

Proof. Suppose the contrary, i.e., that there exists a sequence $\{(x^k, \mu^k, \rho_k)\} \subset \mathbb{R}^n \times Q \times (0, \infty)$ such that $(x^k, \mu^k) \to (\bar{x}, \bar{\mu})$, $\rho_k \geq \rho_{\bar\mu}$ and

$$\zeta_k = \| (y^k - x^k,\ \nu^k - \mu^k) \| > k\sigma_k, \qquad \sigma_k = \sigma(x^k, \mu^k) > 0,$$

where

$$y^k \in \mathrm{argmin}\{ \bar{L}(x, \mu^k; \rho_k) \mid x \in \bar{x} + \delta_{\bar\mu} B \}, \qquad \nu^k = \Pi_Q(\mu^k + \rho_k g(y^k)).$$

By the assumption above, we have that

$$\frac{\sigma_k}{\zeta_k} \to 0. \tag{3.12}$$

By the definition (2.3) of $\sigma(\cdot)$, this means that

$$f'(x^k) + (g'(x^k))^\top \mu^k = o(\zeta_k), \qquad \mu^k - \Pi_Q(\mu^k + g(x^k)) = o(\zeta_k). \tag{3.13}$$

Also, defining $\hat{\mu}^k = \Pi_{\mathcal{M}(\bar{x})}(\mu^k)$, we have that

$$x^k - \bar{x} = O(\sigma_k), \quad \mu^k - \hat{\mu}^k = O(\sigma_k), \quad x^k - \bar{x} = o(\zeta_k), \quad \mu^k - \hat{\mu}^k = o(\zeta_k), \tag{3.14}$$

where the first two relations follow from (2.15) and the last two from (3.12). Taking a subsequence if necessary, we can assume that

$$\frac{1}{\zeta_k} \begin{bmatrix} y^k - x^k \\ \nu^k - \mu^k \end{bmatrix} \to \begin{bmatrix} u \\ v \end{bmatrix} \neq 0. \tag{3.15}$$

For $k$ large enough we have that $y^k \in \bar{x} + \delta_{\bar\mu} B$, $\rho_k \geq \rho_{\bar\mu}$, $\hat{\mu}^k \in (\bar{\mu} + \eta_{\bar\mu} B) \cap \mathcal{M}(\bar{x})$ and $\bar{L}(y^k, \mu^k; \rho_k) \leq f(\bar{x})$ (as in (3.8)). Thus, by (3.5), it holds that

$$\| y^k - \bar{x} \|^2 \leq \frac{1}{\rho_k \gamma_{\bar\mu}} \langle \nu^k - \mu^k,\ \hat{\mu}^k - \mu^k \rangle. \tag{3.16}$$

Since $\mu^k \in Q$, we have that

$$\| \nu^k - \mu^k \| = \| \Pi_Q(\mu^k + \rho_k g(y^k)) - \Pi_Q(\mu^k) \| \leq \rho_k \| g(y^k) \|.$$

Then (3.16) implies

$$\limsup_{k \to \infty} \| y^k - \bar{x} \|^2 \leq \limsup_{k \to \infty} \frac{1}{\rho_k \gamma_{\bar\mu}} \| \nu^k - \mu^k \| \| \hat{\mu}^k - \mu^k \| \leq \limsup_{k \to \infty} \frac{1}{\gamma_{\bar\mu}} \| g(y^k) \| \| \hat{\mu}^k - \mu^k \| = 0, \tag{3.17}$$

where we used the fact that $\{y^k\}$ is bounded. From (3.16) we also obtain that

$$\limsup_{k \to \infty} \frac{\| y^k - \bar{x} \|^2}{\zeta_k^2} \leq \limsup_{k \to \infty} \frac{1}{\rho_k \gamma_{\bar\mu}}\, \frac{\| \nu^k - \mu^k \|}{\zeta_k}\, \frac{\| \hat{\mu}^k - \mu^k \|}{\zeta_k} = 0,$$

where we have used the last relation in (3.14), that $\rho_k \geq \rho_{\bar\mu}$, and that $\zeta_k^{-1}(\nu^k - \mu^k) \to v$ by (3.15). Using now the relation above and again (3.14), we conclude that

$$u = \lim_{k \to \infty} \frac{y^k - x^k}{\zeta_k} = \lim_{k \to \infty} \frac{y^k - \bar{x}}{\zeta_k} - \lim_{k \to \infty} \frac{x^k - \bar{x}}{\zeta_k} = 0. \tag{3.18}$$

From (3.16), using that $\zeta_k^{-1}(\nu^k - \mu^k) \to v$, we also obtain

$$\limsup_{k \to \infty} \frac{\rho_k}{\zeta_k} \| y^k - \bar{x} \|^2 \leq \limsup_{k \to \infty} \frac{1}{\gamma_{\bar\mu}}\, \frac{\| \nu^k - \mu^k \|}{\zeta_k}\, \| \hat{\mu}^k - \mu^k \| = 0. \tag{3.19}$$

Since $y^k \to \bar{x}$ (by (3.17)), we have that $y^k \in \mathrm{int}(\bar{x} + \delta_{\bar\mu} B)$, so that it is an unconstrained local minimizer of $\bar{L}(\cdot, \mu^k; \rho_k)$. We then have that

$$\begin{aligned} 0 &= \frac{\partial \bar{L}}{\partial x}(y^k, \mu^k; \rho_k) = f'(y^k) + (g'(y^k))^\top \nu^k \\ &= f'(y^k) + (g'(y^k))^\top \nu^k - f'(x^k) - (g'(x^k))^\top \mu^k + o(\zeta_k) \\ &= f'(y^k) - f'(x^k) + (g'(y^k) - g'(x^k))^\top \mu^k + (g'(y^k))^\top (\nu^k - \mu^k) + o(\zeta_k) \\ &= (g'(y^k))^\top (\nu^k - \mu^k) + o(\zeta_k), \end{aligned}$$

where the second equality is by (2.10), the third is by (3.13), and the last follows from $y^k - x^k = o(\zeta_k)$ (by (3.18)). Dividing both sides of the latter equation by $\zeta_k$, taking limits and using that $y^k \to \bar{x}$ (by (3.17)), we conclude that

$$v \in \ker\, (g'(\bar{x}))^\top. \tag{3.20}$$

If $i \in \mathcal{J}$ we have that $\bar{\mu}_i = 0$ and $g_i(\bar{x}) < 0$. Then, since $y^k \to \bar{x}$ (by (3.17)), $x^k \to \bar{x}$ and $\mu^k \to \bar{\mu}$, it holds that $\mu_i^k + g_i(x^k) < 0$ and $\mu_i^k + \rho_k g_i(y^k) < 0$ for $k$ large enough. Thus, by (3.13) we have that $\mu_{\mathcal{J}}^k = o(\zeta_k)$, and by the definition of $\nu^k$ that $\nu_{\mathcal{J}}^k = 0$. Hence,

$$v_{\mathcal{J}} = \lim_{k \to \infty} \frac{\nu_{\mathcal{J}}^k - \mu_{\mathcal{J}}^k}{\zeta_k} = 0. \tag{3.21}$$

Combining (3.21) with (3.20) we conclude that

$$v_{\mathcal{I}} \in K = \ker\, (g_{\mathcal{I}}'(\bar{x}))^\top.$$

Since $g_{\mathcal{I}}(y^k) = g_{\mathcal{I}}(\bar{x}) + g_{\mathcal{I}}'(\bar{x})(y^k - \bar{x}) + O(\| y^k - \bar{x} \|^2)$ and $g_{\mathcal{I}}(\bar{x}) = 0$, we have that

$$g_{\mathcal{I}}(y^k) + O(\| y^k - \bar{x} \|^2) = g_{\mathcal{I}}'(\bar{x})(y^k - \bar{x}) \in \mathrm{im}\; g_{\mathcal{I}}'(\bar{x}) = K^\perp.$$

Using now that $\Pi_K(\cdot)$ is a linear operator (since $K$ is a subspace), we obtain that

$$0 = \lim_{k \to \infty} \frac{\rho_k}{\zeta_k} \Pi_K\big(g_{\mathcal{I}}(y^k) + O(\| y^k - \bar{x} \|^2)\big) = \lim_{k \to \infty} \left( \frac{\rho_k}{\zeta_k} \Pi_K(g_{\mathcal{I}}(y^k)) + O\left( \frac{\rho_k \| y^k - \bar{x} \|^2}{\zeta_k} \right) \right) = \lim_{k \to \infty} \frac{\rho_k}{\zeta_k} \Pi_K(g_{\mathcal{I}}(y^k)), \tag{3.22}$$

where (3.19) was employed for the last equality. Since the index set $\{l + 1, \dots, m\}$ is finite, passing onto a subsequence if necessary, we can assume that there exists $\mathcal{I}_2 \subseteq \{l + 1, \dots, m\}$ such that $\mu_i^k + \rho_k g_i(y^k) < 0$ for $i \in \mathcal{I}_2$ and all $k$ large enough, while $\mu_i^k + \rho_k g_i(y^k) \geq 0$ for $i \in \{l + 1, \dots, m\} \setminus \mathcal{I}_2$. Then $\nu_i^k = 0$ for $i \in \mathcal{I}_2$ and $\nu_i^k = \mu_i^k + \rho_k g_i(y^k) \geq 0$ for $i \in \{l + 1, \dots, m\} \setminus \mathcal{I}_2$. Define the closed convex cone

$$Q_2 = \{ w_{\mathcal{I}} \in \mathbb{R}^{|\mathcal{I}|} \mid w_i \in \mathbb{R},\ i \in \mathcal{I} \setminus \mathcal{I}_2;\ w_i \geq 0,\ i \in \mathcal{I} \cap \mathcal{I}_2 \}.$$

Since for $i \in \mathcal{I}_2$ we have $\nu_i^k = 0$ while $\mu_i^k \geq 0$ (because $\mu^k \in Q$ and $\mathcal{I}_2 \subseteq \{l + 1, \dots, m\}$), it holds that $-(\nu_{\mathcal{I}}^k - \mu_{\mathcal{I}}^k) \in Q_2$. Dividing by $\zeta_k > 0$ and taking limits, we obtain that

$$-v_{\mathcal{I}} \in Q_2.$$

We also have that $\mu_i^k + \rho_k g_i(y^k) - \nu_i^k = 0$ for $i \in \{1, \dots, l\} \subset \mathcal{I}$ by the structure of $Q$, and for $i \in \{l + 1, \dots, m\} \setminus \mathcal{I}_2$ by the definition of $\mathcal{I}_2$. By the same definition, $\mu_i^k + \rho_k g_i(y^k) - \nu_i^k = \mu_i^k + \rho_k g_i(y^k) < 0$ for $i \in \mathcal{I}_2$. Hence, $\rho_k g_{\mathcal{I}}(y^k) - (\nu_{\mathcal{I}}^k - \mu_{\mathcal{I}}^k) \in Q_2^\circ$. Using the linearity of $\Pi_K(\cdot)$, we then obtain that

$$\Pi_K(Q_2^\circ) \ni \lim_{k \to \infty} \Pi_K\left( \frac{1}{\zeta_k}\big(\rho_k g_{\mathcal{I}}(y^k) - (\nu_{\mathcal{I}}^k - \mu_{\mathcal{I}}^k)\big) \right) = \lim_{k \to \infty} \left( \frac{\rho_k}{\zeta_k} \Pi_K(g_{\mathcal{I}}(y^k)) - \Pi_K\left( \frac{\nu_{\mathcal{I}}^k - \mu_{\mathcal{I}}^k}{\zeta_k} \right) \right) = -\Pi_K(v_{\mathcal{I}}) = -v_{\mathcal{I}},$$

where the second equality follows from (3.22) and the last holds since $v_{\mathcal{I}} \in K$. Hence, $-v_{\mathcal{I}} = \Pi_K(\xi_{\mathcal{I}})$ for some $\xi_{\mathcal{I}} \in Q_2^\circ$. Since $-v_{\mathcal{I}} \in Q_2 \cap K$ and $K$ is a subspace, we then obtain that

$$0 \geq \langle -v_{\mathcal{I}}, \xi_{\mathcal{I}} \rangle = \langle -v_{\mathcal{I}}, \Pi_K(\xi_{\mathcal{I}}) \rangle = \| v_{\mathcal{I}} \|^2,$$

so that $v_{\mathcal{I}} = 0$. Combining this with (3.18) and (3.21), we obtain that $(u, v) = 0$, in contradiction with (3.15). This completes the proof.

We have therefore established that the conditions in (2.7) and (2.8) hold if $x^{k+1}$ is the exact local minimizer $\hat{x}$ of $\bar{L}(\cdot, \mu^k; \rho_k)$ in $\bar{x} + \delta_{\bar\mu} B$, and if we define in (2.8) $c = c_1$ given by Proposition 3.3.

Now, taking $c > c_1$ in (2.8) and an appropriately small $\varepsilon_k > 0$ in (2.7), by the continuity of all the involved quantities it follows that (2.7) and (2.8) hold for any $x^{k+1}$ close enough to $\hat{x}$. We are now in a position to state our main convergence result, which also makes precise the choice of the truncation parameter $\varepsilon_k$. The proof of the first item in Theorem 3.4 below is an adaptation of the corresponding part in [18, Theorem 1].

Theorem 3.4. Let $(\bar{x}, \bar{\mu})$ satisfy SOSC (1.6), and let $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ be any function such that $\psi(t) = o(t)$.

Then there exist $\varepsilon, \bar\rho > 0$ such that if $(x^0, \mu^0) \in ((\bar{x}, \bar{\mu}) + \varepsilon B) \cap (\mathbb{R}^n \times Q)$ and the sequence $\{(x^k, \mu^k)\}$ is generated according to (2.7)–(2.9) with $\rho_k \geq \bar\rho$ for all $k$ and $\varepsilon_k = \psi(\sigma(x^k, \mu^k))$, the following assertions hold:

(i) The sequence $\{(x^k, \mu^k)\}$ is well-defined and converges to $(\bar{x}, \tilde{\mu})$, with some $\tilde{\mu} \in \mathcal{M}(\bar{x})$.
(ii) For any $q \in (0, 1)$ there exists $\rho_q$ such that if $\rho_k \geq \rho_q$ for all $k$ then the convergence rate of $\{(x^k, \mu^k)\}$ to $(\bar{x}, \tilde{\mu})$ is Q-linear with quotient $q$.
(iii) If $\rho_k \to +\infty$, the convergence rate is Q-superlinear.
(iv) If $\varepsilon_k \leq t_1 \sigma(x^k, \mu^k)^s$ and $\rho_k \geq t_2 / \sigma(x^k, \mu^k)^{s-1}$, where $t_1, t_2 > 0$ and $s > 1$, then the Q-order of superlinear convergence is at least $s$.

Proof. If $\sigma(x^k, \mu^k) = 0$ for some iteration index $k$, the point $(x^k, \mu^k)$ satisfies the KKT conditions (1.4) and the algorithm should, naturally, stop. We assume that this does not happen and $\sigma(x^k, \mu^k) > 0$ for all $k$.

Let $\varepsilon_0, \gamma_0, \tau_0$ satisfy (2.14), $\beta_2$ satisfy (2.15), $\varepsilon_1, c_1$ be given by Proposition 3.3, $\rho_{\bar\mu}$ be given by Proposition 3.1, and let $c$ in (2.8) satisfy $c > c_1$.

For any $\psi(t) = o(t)$, there exists $\varepsilon_2 > 0$ such that

$$\psi(\sigma(x, \mu)) \leq \frac{1}{4\beta_2\tau_0}\, \sigma(x, \mu) \quad \forall\, (x, \mu) \in (\bar{x}, \bar{\mu}) + \varepsilon_2 B. \tag{3.23}$$

Define

$$\bar\rho = \max\{ 4c\beta_2\tau_0,\ \rho_{\bar\mu} \}, \qquad \varepsilon' = \min\left\{ 2\tau_0\gamma_0,\ \frac{\varepsilon_0}{c\beta_2 + 1},\ \varepsilon_1,\ \varepsilon_2 \right\}, \qquad \varepsilon = \frac{\varepsilon'}{2c\beta_2 + 1}. \tag{3.24}$$

To shorten notation, let $\bar{w} = (\bar{x}, \bar{\mu})$, $w^k = (x^k, \mu^k)$ and $\sigma_k = \sigma(x^k, \mu^k)$.

We shall argue by induction. Suppose we have $w^j \in \mathbb{R}^n \times Q$, $j = 0, \dots, k$, satisfying

$$\| w^j - \bar{w} \| \leq \varepsilon', \tag{3.25}$$

$$\left\| \frac{\partial L}{\partial x}(w^{j+1}) \right\| \leq \varepsilon_j, \qquad \| w^{j+1} - w^j \| \leq c\sigma_j, \tag{3.26}$$

where we recall (2.10) for the first relation in (3.26). Note that if $w^0 \in (\bar{w} + \varepsilon B) \cap (\mathbb{R}^n \times Q)$ then (3.25) holds for $j = 0$ since $\varepsilon \leq \varepsilon' \leq \varepsilon_1$, and there exists $\hat{w} \in \mathbb{R}^n \times Q$ such that $\frac{\partial L}{\partial x}(\hat{w}) = 0$ by Corollary 3.2, and $\| \hat{w} - w^0 \| \leq c\sigma_0$ by Proposition 3.3 (by the remarks preceding the statement of Theorem 3.4, appropriate approximate solutions for the first iteration satisfy the required conditions as well). Hence, $w^1$ is well-defined, and the conditions (3.25), (3.26) hold for $j = 0$. We next show that if (3.25), (3.26) hold for $j = 0, \dots, k$, then $w^{k+2}$ is well-defined and the conditions in question hold for $j = k + 1$.

By (2.15),

$$\sigma_j \leq \beta_2 \| w^j - \bar{w} \|. \tag{3.27}$$

Thus, using (3.25) and (3.24), we have that

$$\| w^{j+1} - \bar{w} \| \leq \| w^{j+1} - w^j \| + \| w^j - \bar{w} \| \leq (c\beta_2 + 1)\varepsilon' \leq \varepsilon_0.$$

Also, for $p^j = (\varepsilon_j\vartheta,\ (\mu^{j+1} - \mu^j)/\rho_j)$ with $\vartheta \in B$ and $\varepsilon_j = \psi(\sigma_j)$, we obtain

$$\| p^j \| \leq \varepsilon_j + \frac{1}{\rho_j} \| w^{j+1} - w^j \| \leq \left( \frac{1}{4\beta_2\tau_0} + \frac{c}{\bar\rho} \right)\sigma_j \leq \left( \frac{1}{4\tau_0} + \frac{1}{4\tau_0} \right)\varepsilon' \leq \gamma_0,$$

where we use (3.23), (3.25) and (3.24). Hence, since $w^{j+1} \in \bar{w} + \varepsilon_0 B$, $p^j \in \gamma_0 B$ and (2.12) holds, from (2.14) we obtain (2.17) for $j = 0, \dots, k$. We then have that

$$\mathrm{dist}(w^{j+1}, \Sigma^*) \leq \tau_0 \left( \varepsilon_j + \frac{c}{\rho_j}\sigma_j \right) \leq \left( \frac{1}{4\beta_2} + \frac{1}{4\beta_2} \right)\sigma_j \leq \frac{1}{2}\, \mathrm{dist}(w^j, \Sigma^*),$$

where in the second inequality we use (3.23) and the fact that $\rho_j \geq \bar\rho \geq 4c\beta_2\tau_0$, and in the last inequality we use (2.15). It then follows that

$$\mathrm{dist}(w^{j+1}, \Sigma^*) \leq \frac{1}{2^{j+1}}\, \mathrm{dist}(w^0, \Sigma^*), \tag{3.28}$$

for $j = 0, \dots, k$. Combining this relation with (3.26) and (2.15), we obtain

$$\| w^{k+1} - w^0 \| \leq \sum_{i=0}^{k} \| w^{i+1} - w^i \| \leq c\sum_{i=0}^{k}\sigma_i \leq c\beta_2\sum_{i=0}^{k}\mathrm{dist}(w^i, \Sigma^*) \leq c\beta_2\sum_{i=0}^{k}\frac{1}{2^i}\,\mathrm{dist}(w^0, \Sigma^*) \leq 2c\beta_2\,\mathrm{dist}(w^0, \Sigma^*). \tag{3.29}$$

To show that (3.25) holds for $j = k + 1$, note that

$$\| w^{k+1} - \bar{w} \| \leq \| w^{k+1} - w^0 \| + \| w^0 - \bar{w} \| \leq (2c\beta_2 + 1)\| w^0 - \bar{w} \| \leq (2c\beta_2 + 1)\varepsilon = \varepsilon',$$

where we use that $\bar{w} \in \Sigma^*$ and (3.29) for the second inequality, and (3.24) for the last inequality. Now, since $w^{k+1} \in \mathbb{R}^n \times Q$ and $\varepsilon' \leq \varepsilon_1$, by Corollary 3.2 and Proposition 3.3 there exists $\hat{w} \in \mathbb{R}^n \times Q$ such that $\frac{\partial L}{\partial x}(\hat{w}) = 0$ and $\| \hat{w} - w^{k+1} \| \leq c\sigma_{k+1}$. Thus, $w^{k+2}$ in (2.7)–(2.9) is well-defined and (3.26) holds for $j = k + 1$.

The argument above shows that the sequence generated according to (2.7)–(2.9) is well-defined. We next prove that it converges. Note that (3.25) implies that $\{w^k\}$ is bounded. By (3.28), $\mathrm{dist}(w^k, \Sigma^*) \to 0$ and all accumulation points of $\{w^k\}$ belong to $\Sigma^*$. Suppose $\{w^k\}$ has two different accumulation points $\tilde{w}$ and $\hat{w}$. Using the same argument as that leading to (3.29), for any $i, k \in \mathbb{N}$ with $k < i$ it holds that

$$\| w^i - w^k \| \leq 2c\beta_2\, \mathrm{dist}(w^k, \Sigma^*). \tag{3.30}$$

Choosing the indices $k$ such that the corresponding subsequence of $\{w^k\}$ converges to $\tilde{w}$, and the indices $i$ such that the corresponding subsequence of $\{w^k\}$ converges to $\hat{w}$, the left-hand side in (3.30) stays bounded away from zero, while the right-hand side tends to zero. Hence, $\{w^k\}$ cannot have different accumulation points, i.e., it converges (to a point $\tilde{w} \in \Sigma^*$).

Next, fixing any $k$ and passing onto the limit as $i \to \infty$ in the relation $\| w^k - \tilde{w} \| \leq \| w^i - w^k \| + \| w^i - \tilde{w} \|$, using (3.30) we conclude that

$$\| w^k - \tilde{w} \| \leq 2c\beta_2\, \mathrm{dist}(w^k, \Sigma^*). \tag{3.31}$$

Hence, using also (3.26) and (2.17), we have that

$$\begin{aligned} \| w^{k+1} - \tilde{w} \| &\leq 2c\beta_2\, \mathrm{dist}(w^{k+1}, \Sigma^*) \leq 2c\beta_2\tau_0 \left( \varepsilon_k + \frac{1}{\rho_k}\| \mu^{k+1} - \mu^k \| \right) \\ &\leq 2c\beta_2\tau_0 \left( \frac{\varepsilon_k}{\sigma_k} + \frac{c}{\rho_k} \right)\sigma_k \leq 2c\beta_2^2\tau_0 \left( \frac{\varepsilon_k}{\sigma_k} + \frac{c}{\rho_k} \right)\mathrm{dist}(w^k, \Sigma^*) \\ &\leq 2c\beta_2^2\tau_0 \left( \frac{\varepsilon_k}{\sigma_k} + \frac{c}{\rho_k} \right)\| w^k - \tilde{w} \|. \end{aligned} \tag{3.32}$$

For item (ii) of the assertion, let $q \in (0, 1)$ and define $\rho_q = 2c^2\beta_2^2\tau_0/q$. Since $\varepsilon_k = o(\sigma_k)$ and $\rho_k \geq \rho_q$, from (3.32) we obtain that

$$\limsup_{k \to \infty} \frac{\| w^{k+1} - \tilde{w} \|}{\| w^k - \tilde{w} \|} \leq \limsup_{k \to \infty} 2c\beta_2^2\tau_0 \left( \frac{\varepsilon_k}{\sigma_k} + \frac{c}{\rho_q} \right) = \frac{1}{\rho_q}\, 2c^2\beta_2^2\tau_0 = q.$$

For item (iii), since $\varepsilon_k = o(\sigma_k)$ and $\rho_k \to +\infty$, from (3.32) we have that

$$\lim_{k \to \infty} \frac{\| w^{k+1} - \tilde{w} \|}{\| w^k - \tilde{w} \|} \leq \lim_{k \to \infty} 2c\beta_2^2\tau_0 \left( \frac{\varepsilon_k}{\sigma_k} + \frac{c}{\rho_k} \right) = 0.$$

Finally, for item (iv), using that $\varepsilon_k \leq t_1\sigma_k^s$ and $\rho_k \geq t_2/\sigma_k^{s-1}$, from (3.32) we obtain

$$\| w^{k+1} - \tilde{w} \| \leq 2c\beta_2^2\tau_0 \left( t_1\sigma_k^{s-1} + c\sigma_k^{s-1}/t_2 \right)\| w^k - \tilde{w} \| \leq 2c\beta_2^{s+1}\tau_0 \left( t_1 + c/t_2 \right)\| w^k - \tilde{w} \|^s,$$

i.e., the order of convergence is at least $s > 1$.

Our next result establishes that if the minimization in the subproblems is exact, then the primal sequence converges with the same Q-rate as the primal-dual sequence. This appears to be the first assertion of a primal Q-rate of convergence for augmented Lagrangian methods, as opposed to the usual (weaker) R-rate.

Corollary 3.5. If in Theorem 3.4 $\varepsilon_k = 0$ for all $k$, then the primal sequence $\{x^k\}$ converges to $\bar{x}$ with the same Q-order as that of the convergence of the primal-dual sequence $\{(x^k, \mu^k)\}$ to $(\bar{x}, \tilde{\mu})$.

Proof. Let $\hat{\mu}^k = \Pi_{\mathcal{M}(\bar{x})}(\mu^k)$. If $\varepsilon_k = 0$ then it holds that

$$0 = \frac{\partial \bar{L}}{\partial x}(x^k, \mu^{k-1}; \rho_{k-1}) = f'(x^k) + (g'(x^k))^\top \mu^k.$$

Hence,

$$f'(\bar{x}) + (g'(\bar{x}))^\top \mu^k = f'(\bar{x}) - f'(x^k) + (g'(\bar{x}) - g'(x^k))^\top \hat{\mu}^k + (g'(\bar{x}) - g'(x^k))^\top (\mu^k - \hat{\mu}^k). \tag{3.33}$$

Since $\{\rho_k\}$ is bounded away from zero and $(x^k, \mu^k) \to (\bar{x}, \tilde{\mu})$ with $\tilde{\mu}_i = 0$ and $g_i(\bar{x}) < 0$ for $i \in \mathcal{J}$, it holds that $\mu_i^{k-1}/\rho_{k-1} + g_i(x^k) < 0$ for all $i \in \mathcal{J}$ and $k$ large enough. Then $\mu_{\mathcal{J}}^k = 0$. Using also that $\mu^k \in Q$, by Hoffman's error bound for linear systems [15, Lemma 3.2.3] there exists $c_0 > 0$ such that

$$\begin{aligned} \| \hat{\mu}^k - \mu^k \| &\leq c_0 \| f'(\bar{x}) + (g'(\bar{x}))^\top \mu^k \| \\ &\leq c_0 \| f'(\bar{x}) - f'(x^k) + (g'(\bar{x}) - g'(x^k))^\top \hat{\mu}^k \| + c_0 \| (g'(\bar{x}) - g'(x^k))^\top (\mu^k - \hat{\mu}^k) \| \\ &\leq c_1 \| x^k - \bar{x} \| + c_2 \| x^k - \bar{x} \| \| \mu^k - \hat{\mu}^k \|, \end{aligned}$$

where we used (3.33) and the fact that $\{\hat{\mu}^k\}$ is bounded (because $\{\mu^k\}$ is convergent). Since $\| x^k - \bar{x} \| \to 0$, we have that $c_2 \| x^k - \bar{x} \| \leq 1/2$ for $k$ large enough, so that the last relation above implies

$$\| \hat{\mu}^k - \mu^k \| \leq 2c_1 \| x^k - \bar{x} \|.$$

Then,

$$\| w^k - \tilde{w} \| \leq 2c\beta_2\, \mathrm{dist}(w^k, \Sigma^*) = 2c\beta_2 \left( \| x^k - \bar{x} \|^2 + \| \mu^k - \hat{\mu}^k \|^2 \right)^{1/2} \leq 2c\beta_2\sqrt{1 + 4c_1^2}\, \| x^k - \bar{x} \|,$$

where the first inequality is by (3.31) and the equality is by (2.16). Hence,

$$\frac{\| x^{k+1} - \bar{x} \|}{\| x^k - \bar{x} \|} \leq \frac{\| w^{k+1} - \tilde{w} \|}{\| x^k - \bar{x} \|} = \frac{\| w^k - \tilde{w} \|}{\| x^k - \bar{x} \|} \cdot \frac{\| w^{k+1} - \tilde{w} \|}{\| w^k - \tilde{w} \|} \leq 2c\beta_2\sqrt{1 + 4c_1^2}\; \frac{\| w^{k+1} - \tilde{w} \|}{\| w^k - \tilde{w} \|}. \tag{3.34}$$

If convergence of $\{w^k\}$ to $\tilde{w}$ is superlinear, then (3.34) shows that the rate of convergence of $\{x^k\}$ to $\bar{x}$ is at least as fast.

Concerning the linear rate of convergence, the conclusion follows as in the proof of item (ii) of Theorem 3.4, re-defining $\rho_q = 4c^3\beta_2^3\sqrt{1 + 4c_1^2}\,\tau_0/q$, which makes the upper limit of the right-hand side in (3.34) no larger than $q \in (0, 1)$.


4. Some related developments.

4.1. Boundedness of the penalty parameters. In computational implementations, boundedness of the penalty parameters is important to avoid ill-conditioning in the subproblems of minimizing augmented Lagrangians (although there exist some strategies for dealing with ill-conditioning in this context, they do not appear to be quite competitive at this time).

Penalty parameter updates differ among the various sources. To fix the setting, we consider the proposal of [7], implemented within the latest version of ALGENCAN [34].

For $\alpha \in (0, 1)$ and $r > 1$ fixed, define

$$\rho_{k+1} = \begin{cases} \rho_k & \text{if } \sigma(x^{k+1}, \mu^{k+1}) \leq \alpha\, \sigma(x^k, \mu^k), \\ r\rho_k & \text{otherwise}, \end{cases} \tag{4.1}$$

where we recall that $\sigma(\cdot)$ stands for the natural residual of the KKT conditions (2.3). If solutions of subproblems are exact, then the iterations yield stationarity of the Lagrangian at every step (recall (2.10)). In that case, the corresponding part of $\sigma(\cdot)$ is zero at each iteration, and thus the test in (4.1) deciding on the increase of the penalty parameter measures the feasibility-complementarity progress. In this sense, this strategy is similar to what is commonly employed in other augmented Lagrangian methods; a compact sketch of the rule is given below.
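The rule (4.1) is a one-line test on successive residuals. In the Python sketch below, the values alpha=0.5 and r=10.0 are illustrative defaults of this example, not prescribed by the paper, which only requires $\alpha \in (0,1)$ and $r > 1$.

```python
def update_rho(rho, sigma_new, sigma_old, alpha=0.5, r=10.0):
    """Penalty update rule (4.1): keep rho if the KKT residual decreased
    by at least the factor alpha, otherwise increase rho by the factor r."""
    return rho if sigma_new <= alpha * sigma_old else r * rho
```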

Conditions that guarantee boundedness of the sequence of penalty parameters $\{\rho_k\}$ generated by the scheme (4.1) arise naturally from our analysis above. Since

$$g(x^{k+1}) - \frac{1}{\rho_k}(\mu^{k+1} - \mu^k) \in N_Q(\mu^{k+1}),$$

it holds that

$$\mu^{k+1} = \Pi_Q\left( \mu^{k+1} + g(x^{k+1}) - \frac{1}{\rho_k}(\mu^{k+1} - \mu^k) \right).$$

Now, using the nonexpansivity of the projection, we have

$$\| \mu^{k+1} - \Pi_Q(\mu^{k+1} + g(x^{k+1})) \| \leq \frac{1}{\rho_k} \| \mu^{k+1} - \mu^k \|.$$

Thus, recalling again the definition (2.3) of $\sigma(\cdot)$, for $(x^{k+1}, \mu^{k+1})$ satisfying (2.7)–(2.9) we obtain that

$$\sigma(x^{k+1}, \mu^{k+1}) \leq \left( \varepsilon_k^2 + \frac{1}{\rho_k^2} \| \mu^{k+1} - \mu^k \|^2 \right)^{1/2} \leq \varepsilon_k + \frac{c}{\rho_k}\, \sigma(x^k, \mu^k). \tag{4.2}$$

Hence, the boundedness of $\{\rho_k\}$ is ensured if $\rho_k > c/\alpha$ and $\varepsilon_k \leq (\alpha - c/\rho_k)\, \sigma(x^k, \mu^k)$. The desired result then follows from Theorem 3.4.

Corollary 4.1. If in the setting of Theorem 3.4 the sequence of penalty parameters $\{\rho_k\}$ is generated according to the rule (4.1) with $\rho_0 \geq \bar\rho$, then $\{\rho_k\}$ is bounded.

Proof. By Theorem 3.4 we have that $\varepsilon_k = o(\sigma(x^k, \mu^k))$ and the sequence $\{(x^k, \mu^k)\}$ converges to $(\bar{x}, \tilde{\mu})$ with $\sigma(\bar{x}, \tilde{\mu}) = 0$ (in particular, $\sigma(x^k, \mu^k) \to 0$).

Suppose that $\rho_k \to +\infty$. Using (4.2) we then obtain

$$\frac{\sigma(x^{k+1}, \mu^{k+1})}{\sigma(x^k, \mu^k)} \leq \frac{\varepsilon_k}{\sigma(x^k, \mu^k)} + \frac{c}{\rho_k} < \alpha,$$

for all $k$ large enough. But then (4.1) implies that $\rho_{k+1} = \rho_k$ for all $k$ large enough, in contradiction with the assumption $\rho_k \to +\infty$.


4.2. Relations with the classical results. In the analysis above, the initial multiplier approximation must be close enough to a Lagrange multiplier satisfying SOSC (1.6). The classical results (cited in the Introduction), while requiring significantly stronger assumptions, allow the initial approximation to be far from the (unique in that setting) optimal multiplier $\bar{\mu}$, at the price of increasing proportionally the threshold value for the penalty parameters. Keeping in mind the practical requirement of using moderate values of penalty parameters, this is perhaps not an important difference, since the penalty parameters would introduce a bound on how far the initial multiplier can be taken. Nevertheless, it would be nice to complete the picture by clarifying whether or not, by increasing the penalty threshold, one can also take far away multipliers assuming SOSC (1.6) only. At this time, this is an open question. Here, we show that a result similar to the classical one can be derived from our analysis under some stronger assumptions.

In the context of generalized equations, we replace the upper-Lipschitzian property (2.14) of $\Sigma(p)$ by the stronger Aubin property of $\Sigma$ at $(0, \bar{w})$: there exist constants $\varepsilon_0, \gamma_0, \tau_0 > 0$ such that

$$\Sigma(p) \cap (\bar{w} + \varepsilon_0 B) \subseteq \Sigma(q) + \tau_0 \| p - q \| B \quad \forall\, p, q \in \gamma_0 B. \tag{4.3}$$

This condition guarantees that $\Sigma$ is nonempty and single-valued near the origin. Moreover, it was shown in [14, Theorem 6] that in the context of the optimization problem (1.1) the Aubin property (4.3) for $\bar{w} = (\bar{x}, \bar{\mu})$ is equivalent to the combination of the linear independence constraint qualification (LICQ) at $\bar{x}$ (i.e., the set $\{g_i'(\bar{x}),\ i \in \mathcal{I}\}$ is linearly independent) with the strong second-order sufficient optimality condition (SSOSC) at $(\bar{x}, \bar{\mu})$, i.e.,

$$\left\langle \frac{\partial^2 L}{\partial x^2}(\bar{x}, \bar{\mu})\, u,\ u \right\rangle > 0 \quad \forall\, u \in C_+(\bar{x}) \setminus \{0\}, \tag{4.4}$$

where

$$C_+(\bar{x}) = \{ u \in \mathbb{R}^n \mid \langle g_i'(\bar{x}), u \rangle = 0 \text{ for } i \in \mathcal{I}_1(\bar{\mu}) \}.$$

Note that under the strict complementarity condition $\bar{\mu}_i > 0$ for all $i \in \mathcal{I}$, used in the classical results, it holds that $C_+(\bar{x}) = C(\bar{x})$, and thus SSOSC (4.4) and SOSC (1.6) are the same in that case.

We next show that under the assumptions of LICQ and SSOSC, and without strict complementarity, our analysis can handle initial multiplier estimates far from the optimal $\bar{\mu}$, similar to the classical results.

Proposition 4.2. Let $\bar{x}$ satisfy LICQ and $(\bar{x}, \bar{\mu})$ satisfy SSOSC (4.4). Then there exist $\eta, \bar\rho, \delta, \tau > 0$ such that for each $(\mu, \rho, \vartheta)$ in the set

$$D = \left\{ (\mu, \rho, \vartheta) \in \mathbb{R}^{m+1+n} \;\middle|\; \left( \| \vartheta \|^2 + \frac{1}{\rho^2} \| \mu - \bar{\mu} \|^2 \right)^{1/2} \leq \eta,\ \rho \geq \bar\rho \right\},$$

there exists a unique vector $\hat{x} \in \bar{x} + \delta B$ satisfying

$$\frac{\partial \bar{L}}{\partial x}(\hat{x}, \mu; \rho) = \vartheta.$$

Moreover, it holds that

$$\left\| \begin{bmatrix} \hat{x} - \bar{x} \\ \hat{\mu} - \bar{\mu} \end{bmatrix} \right\| \leq \tau \left( \| \vartheta \|^2 + \frac{1}{\rho^2} \| \mu - \bar{\mu} \|^2 \right)^{1/2},$$

where $\hat{\mu} = \Pi_Q(\mu + \rho g(\hat{x}))$.

Proof. By the assumptions, we have that $\Sigma$ satisfies the Aubin property (4.3).

Define

$$\tau = 2\tau_0, \quad \delta = \varepsilon_0, \quad \gamma = \min\{ \gamma_0,\ \varepsilon_0/\tau_0 \}, \quad \eta = \gamma/2, \quad \bar\rho = 2\varepsilon_0/\gamma, \tag{4.5}$$

and take any $(\mu, \rho, \vartheta) \in D$. Let $v = (\mu - \bar{\mu})/\rho$ and $p^0 = (-\vartheta, -v)$. Since $\| p^0 \| \leq \eta \leq \gamma_0$, by (4.3) there exists $(x^1, \mu^1) = w^1 \in \Sigma(p^0)$ such that $\| w^1 - \bar{w} \| \leq \tau_0 \| p^0 \| \leq \varepsilon_0$. Similarly, we can define the sequences $\{p^j\}$ and $\{w^j = (x^j, \mu^j)\}$ such that

$$p^j = \left( -\vartheta,\ \frac{1}{\rho}(\mu^j - \bar{\mu}) - v \right), \qquad w^j \in \Sigma(p^{j-1}) \quad \text{and} \quad \| w^j - \bar{w} \| \leq \varepsilon_0,$$

for all $j \geq 1$. This construction can be done because

$$\| p^j \|^2 = \| \vartheta \|^2 + \left\| \frac{1}{\rho}(\mu^j - \bar{\mu}) - v \right\|^2 \leq 2\left( \frac{1}{\rho^2}\| \mu^j - \bar{\mu} \|^2 + \| \vartheta \|^2 + \| v \|^2 \right) \leq 2\left( \frac{1}{\rho^2}\varepsilon_0^2 + \frac{1}{4}\gamma^2 \right) \leq \gamma^2,$$

where the last two relations come from (4.5) and the fact that $(\mu, \rho, \vartheta) \in D$. Now, using (4.3), for any $k > j$ we obtain

$$\begin{aligned} \| w^k - w^j \| &\leq \tau_0 \| p^{k-1} - p^{j-1} \| = \frac{\tau_0}{\rho}\| \mu^{k-1} - \mu^{j-1} \| \leq \frac{\tau_0}{\rho}\| w^{k-1} - w^{j-1} \| \\ &\leq \left( \frac{\tau_0}{\rho} \right)^{j-1}\| w^{k-j+1} - w^1 \| \leq \left( \frac{\tau_0}{\rho} \right)^{j-1}\tau_0\| p^{k-j} - p^0 \| \\ &= \left( \frac{\tau_0}{\rho} \right)^j\| \mu^{k-j} - \bar{\mu} \| \leq \frac{1}{2^j}\varepsilon_0, \end{aligned}$$

where for the last inequality we use the facts that $\| w^{k-j} - \bar{w} \| \leq \varepsilon_0$ and $\rho \geq \bar\rho \geq 2\tau_0$. Thus, since $\{w^j\}$ is a Cauchy sequence, there exists $\hat{w}$ such that $w^j \to \hat{w} = (\hat{x}, \hat{\mu})$. Hence, $p^j \to p = \left( -\vartheta,\ \frac{1}{\rho}(\hat{\mu} - \bar{\mu}) - v \right)$ and $\hat{w} \in \Sigma(p)$. Then,

$$\vartheta = f'(\hat{x}) + (g'(\hat{x}))^\top \hat{\mu}, \tag{4.6}$$

$$g(\hat{x}) + v - \frac{1}{\rho}(\hat{\mu} - \bar{\mu}) \in N_Q(\hat{\mu}). \tag{4.7}$$

By (4.7) and using that $v = (\mu - \bar{\mu})/\rho$, we have $\mu + \rho g(\hat{x}) - \hat{\mu} \in N_Q(\hat{\mu})$, which is equivalent to $\hat{\mu} = \Pi_Q(\mu + \rho g(\hat{x}))$. Thus, from (4.6) we obtain

$$\frac{\partial \bar{L}}{\partial x}(\hat{x}, \mu; \rho) = f'(\hat{x}) + (g'(\hat{x}))^\top \hat{\mu} = \vartheta.$$

The uniqueness of $\hat{x}$ comes from the fact that $\Sigma$ is single-valued.


To prove the last relation, by (4.3) and using that $\bar{w} \in \Sigma(0)$, we obtain

$$\| \hat{x} - \bar{x} \|^2 + \| \hat{\mu} - \bar{\mu} \|^2 = \| \hat{w} - \bar{w} \|^2 \leq \tau_0^2 \| p \|^2 \leq 2\tau_0^2 \left( \frac{1}{\rho^2}\| \hat{\mu} - \bar{\mu} \|^2 + \| \vartheta \|^2 + \| v \|^2 \right) \leq 2\tau_0^2 (\| \vartheta \|^2 + \| v \|^2) + \frac{1}{2}\| \hat{\mu} - \bar{\mu} \|^2,$$

where we used that $\rho \geq \bar\rho \geq 2\tau_0$ for the last inequality. Then,

$$\| \hat{x} - \bar{x} \|^2 + \| \hat{\mu} - \bar{\mu} \|^2 \leq 4\tau_0^2 (\| \vartheta \|^2 + \| v \|^2).$$

Thus, the desired result follows noticing that $\tau = 2\tau_0$ and $\| v \| = \| \mu - \bar{\mu} \|/\rho$.

Note that in the case of equality-constrained problems the above reproduces the classical result (see [4, Proposition 2.14]), while in the presence of inequality constraints there is an improvement, as the strict complementarity condition is not needed.

Thus, under the assumptions of LICQ and SSOSC, we can take an initial multiplier $\mu^0$ far from $\bar{\mu}$, at the expense of increasing the penalty parameter. Moreover, our Theorem 3.4 can be used from the next (i.e., second) iteration. To that end, considering $\eta \leq \varepsilon/\tau$ and taking the threshold $\bar\rho$ above no smaller than the one in Theorem 3.4, if we take $(\mu^0, \rho_0, \vartheta) \in D$ we obtain

$$(x^1, \mu^1) = (\hat{x},\ \Pi_Q(\mu^0 + \rho_0 g(\hat{x}))) \in ((\bar{x}, \bar{\mu}) + \varepsilon B) \cap (\mathbb{R}^n \times Q).$$

We would like to emphasize that the definition of the set $D$ depends on the unique multiplier $\bar{\mu}$. For the degenerate case, the definition of an appropriate set $D$ allowing initial multiplier estimates far from the multiplier set, together with the corresponding existence result, is an open question.

4.3. Extension to the case with upper and lower level constraints. When there are some simple (bound or, more generally, linear) constraints, these are often treated by augmented Lagrangian methods directly [10, 1]. That is, such constraints are not included in the augmented Lagrangian; instead, the partial augmented Lagrangian, composed of the general constraints only, is minimized subject to the simple constraints. To that end, consider the problem (1.1) with additional linear constraints:

\[
\begin{aligned}
\min \; & f(x) \\
\text{s.t.} \; & g_i(x) = 0, \quad i = 1, \dots, l, \\
& g_i(x) \le 0, \quad i = l+1, \dots, m, \\
& Ax \le a,
\end{aligned}
\]
where $A$ is a $q \times n$ matrix and $a \in {\rm I\!R}^q$. The Lagrangian $L : {\rm I\!R}^n \times {\rm I\!R}^m \times {\rm I\!R}^q \to {\rm I\!R}$ of this problem is given by $L(x, \mu, \lambda) = f(x) + \langle \mu, g(x) \rangle + \langle \lambda, Ax - a \rangle$.

The partial augmented Lagrangian with respect to the general constraints is then defined by (2.6) as before, and the subproblem consists of solving
\[
\min \; \bar L(x, \mu^k; \rho_k) \quad \text{s.t.} \quad Ax \le a. \tag{4.8}
\]

The multiplier update formula for $\mu^{k+1}$ is the usual (1.3), and the new multipliers $\lambda^{k+1}$ associated with the linear constraints are those resulting from solving the subproblem (4.8).
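As an illustration only, a subproblem of the form (4.8) could be handled by any solver for linearly constrained minimization that also reports constraint multipliers; the sketch below uses SciPy's trust-constr method and reads the multipliers for $Ax \le a$ from the res.v field of its result. The solver choice, the res.v extraction, and its sign convention are our assumptions about the tooling and should be verified; aug_lagrangian, g, and project_Q are as in the previous sketch.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def al_step_lower_level(x, mu, rho, A, a, aug_lagrangian, g, project_Q):
    """One iteration with the linear constraints kept at the lower level:
    minimize the partial augmented Lagrangian subject to A x <= a (subproblem (4.8)),
    update mu by the usual formula (1.3), and take lambda from the subproblem solver."""
    lin = LinearConstraint(A, -np.inf, a)  # encodes A x <= a
    res = minimize(lambda y: aug_lagrangian(y, mu, rho), x,
                   method='trust-constr', constraints=[lin])
    x_new = res.x
    mu_new = project_Q(mu + rho * g(x_new))  # multiplier update (1.3)
    lam_new = res.v[0]  # multipliers for A x <= a; verify the sign convention of your SciPy version
    return x_new, mu_new, lam_new
```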


Using considerations analogous to (2.10) and (2.11) before, the optimality conditions for (4.8) can be written as
\[
0 \in \begin{bmatrix}
\dfrac{\partial L}{\partial x}(x^{k+1}, \mu^{k+1}, \lambda^{k+1}) \\[1mm]
-g(x^{k+1}) + \dfrac{1}{\rho_k}(\mu^{k+1} - \mu^k) \\[1mm]
-(Ax^{k+1} - a)
\end{bmatrix}
+ \mathcal{N}(x^{k+1}, \mu^{k+1}, \lambda^{k+1}),
\]
where
\[
\mathcal{N}(x, \mu, \lambda) = \{0\} \times N_Q(\mu) \times N_{{\rm I\!R}^q_+}(\lambda).
\]

It is clear that the structure at hand is again a perturbed GE representing the KKT conditions of the problem under consideration, with the same type of perturbations as before for the general constraints and the associated multipliers, and with zero perturbations associated with the linear constraints. The preceding analysis easily extends.
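For numerical monitoring, each normal-cone inclusion in this GE is equivalent to a projection equation ($0 \in v + N_C(z)$ if and only if $z = \Pi_C(z - v)$), so approximate satisfaction can be measured by a projection residual. The sketch below is our own construction under the sign conventions stated above; grad_L, g, and project_Q are hypothetical callables supplied by the user.

```python
import numpy as np

def ge_residual(x, mu, lam, mu_prev, rho, grad_L, g, A, a, project_Q):
    """Projection-based residual of the perturbed GE above; it vanishes exactly
    at points satisfying the optimality conditions of subproblem (4.8)."""
    r1 = grad_L(x, mu, lam)                      # {0} block: must vanish itself
    v2 = -g(x) + (mu - mu_prev) / rho            # block paired with N_Q(mu)
    r2 = mu - project_Q(mu - v2)
    v3 = -(A @ x - a)                            # block paired with N_{IR^q_+}(lam)
    r3 = lam - np.maximum(lam - v3, 0.0)
    return np.sqrt(r1 @ r1 + r2 @ r2 + r3 @ r3)
```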

5. Concluding remarks. We have established convergence and the rate of convergence of the augmented Lagrangian method under the sole assumption that the dual starting point is close enough to a multiplier satisfying the usual second-order sufficient optimality condition. No constraint qualifications and no strict complementarity were assumed. Some possible directions of future research concern extending the analysis to various modifications of the classical algorithm. These include using different penalty parameters for different constraints, non-quadratic penalty terms, etc. Also, extensions of the method of multipliers to variational problems could be considered.

REFERENCES

[1] R. Andreani, E.G. Birgin, J.M. Martínez, and M.L. Schuverdt. On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim., 18:1286–1309, 2007.
[2] R. Andreani, E.G. Birgin, J.M. Martínez, and M.L. Schuverdt. Augmented Lagrangian methods under the constant positive linear dependence constraint qualification. Math. Program., 111:5–32, 2008.
[3] D.P. Bertsekas. Multiplier methods: A survey. Automatica, 12:133–145, 1976.
[4] D.P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.
[5] E.G. Birgin, R. Castillo, and J.M. Martínez. Numerical comparison of augmented Lagrangian algorithms for nonconvex problems. Comput. Optim. Appl., 31:31–56, 2005.
[6] E.G. Birgin, C.A. Floudas, and J.M. Martínez. Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program., 125:139–162, 2010.
[7] E.G. Birgin, D. Fernández, and J.M. Martínez. On the boundedness of penalty parameters in an augmented Lagrangian method with lower level constraints. Optim. Methods Softw., 2011. DOI: 10.1080/10556788.2011.556634.
[8] P.T. Boggs and J.W. Tolle. Sequential quadratic programming. Acta Numerica, 4:1–51, 1996.
[9] A.R. Conn, N.I.M. Gould, and Ph.L. Toint. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal., 28:545–572, 1991.
[10] A.R. Conn, N.I.M. Gould, A. Sartenaer, and Ph.L. Toint. Convergence properties of an augmented Lagrangian algorithm for optimization with a combination of general equality and linear constraints. SIAM J. Optim., 6:674–703, 1996.
[11] A.R. Conn, N.I.M. Gould, and Ph.L. Toint. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Number 17 in Springer Series in Computational Mathematics. Springer-Verlag, Berlin, Heidelberg, New York, 1992.
[12] A.R. Conn, N.I.M. Gould, and Ph.L. Toint. Trust-Region Methods. SIAM, Philadelphia, 2000.
[13] L. Contesse-Becker. Extended convergence results for the method of multipliers for non-strictly binding inequality constraints. J. Optim. Theory Appl., 79:273–310, 1993.
[14] A.L. Dontchev and R.T. Rockafellar. Characterizations of strong regularity for variational inequalities over polyhedral convex sets. SIAM J. Optim., 6:1087–1105, 1996.
[15] F. Facchinei and J.-S. Pang. Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, New York, 2003.
[16] D. Fernández, A.F. Izmailov, and M.V. Solodov. Sharp primal superlinear convergence results for some Newtonian methods for constrained optimization. SIAM J. Optim., 20:3312–3334, 2010.
[17] D. Fernández and M.V. Solodov. Stabilized sequential quadratic programming for optimization and a stabilized Newton-type method for variational problems. Math. Program., 125:47–73, 2010.
[18] A. Fischer. Local behavior of an iterative framework for generalized equations with nonisolated solutions. Math. Program., 94:91–124, 2002.
[19] M.R. Hestenes. Multiplier and gradient methods. J. Optim. Theory Appl., 4:303–320, 1969.
[20] K. Ito and K. Kunisch. The augmented Lagrangian method for equality and inequality constraints in Hilbert spaces. Math. Program., 46:341–360, 1990.
[21] A.F. Izmailov and M.V. Solodov. Inexact Josephy–Newton framework for generalized equations and its applications to local analysis of Newtonian methods for constrained optimization. Comput. Optim. Appl., 46:347–368, 2010.
[22] A.F. Izmailov and M.V. Solodov. Stabilized SQP revisited. Math. Program., 2010. DOI: 10.1007/s10107-010-0413-3.
[23] H.-Z. Luo, X.-L. Sun, and D. Li. On the convergence of augmented Lagrangian methods for constrained global optimization. SIAM J. Optim., 18:1209–1230, 2007.
[24] B.A. Murtagh and M.A. Saunders. A projected Lagrangian algorithm and its implementation for sparse nonlinear constraints. Math. Program. Study, 16:84–117, 1982.
[25] J. Nocedal and S.J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999.
[26] M.J.D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, London and New York, 1969.
[27] R.T. Rockafellar. A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Program., 5:354–373, 1973.
[28] R.T. Rockafellar. Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM J. Control Optim., 12:268–285, 1974.
[29] R.T. Rockafellar. Lagrange multipliers and optimality. SIAM Rev., 35:183–238, 1993.
[30] R.T. Rockafellar and R.J.-B. Wets. Variational Analysis. Springer-Verlag, Berlin, 1998.
[31] A. Ruszczyński. Nonlinear Optimization. Princeton University Press, Princeton, NJ, 2006.
[32] T. Pennanen. Local convergence of the proximal point algorithm and multiplier methods without monotonicity. Math. Oper. Res., 27:170–191, 2002.
[33] B.T. Polyak and N.V. Tret'yakov. The method of penalty estimates for conditional extremum problems. USSR Comput. Math. Math. Phys., 13:42–58, 1974.
[34] TANGO project (ALGENCAN), http://www.ime.usp.br/~egbirgin/tango/