A New Class of Scaling Matrices for Scaled Trust Region Algorithms

We consider a new class of affine scaling matrices for interior point Newton-type methods that solve nonlinear systems with simple bounds. We review the essential properties of a scaling matrix and examine several well-known scaling matrices proposed in the literature. We then define a new scaling matrix as a convex combination of these matrices. The proposed scaling matrix inherits the useful properties of the individual matrices and satisfies additional desired requirements. Numerical experiments indicate that the new scaling matrix outperforms the individual matrices on several important test problems.


I. INTRODUCTION
Diverse applications, including signal processing [36] and compressive sensing [3,4], involve many convex nonlinear optimization problems. Although specialized algorithms have been developed for some of these problems, interior-point methods are still the main tool for tackling them. These methods require a feasible initial point [19]. Note also that proper data collection plays an important role in assessing the results obtained [39]. In addition, a wide variety of soft-computing techniques, such as evolutionary computing methods [2,18,28,37,38] and neural networks [1,31], include optimization problems that are sensitive to the initial points. Like verifying solution uniqueness conditions, these tasks reduce to linear feasibility problems with strict inequalities or nonlinear feasibility problems with bound constraints [16,34]. The latter are often challenging, so the existing algorithms need to be improved both theoretically and computationally.
The nonlinear system with bound constraints considered here is
$F(x) = 0, \quad x \in \Omega = \{x \in \mathbb{R}^n : l \le x \le u\}.$ (1)
Efficient methods with good local convergence behavior have been proposed for the solution of this problem. The affine scaling trust region approach forms a practical framework for smooth and nonsmooth box constrained systems of nonlinear equations [6][7][8][9]. Methods of this kind use an ellipsoidal trust region defined by a diagonal scaling matrix [12]. The diagonal scaling handles the bounds, while at each iteration a quadratic model of the objective function $\frac{1}{2}\|F\|^2$ is minimized within a trust region around the current iterate.
The main motivation for the current work is a series of papers by Bellavia et al. [6][7][8][9][10][11]. The methods they introduced, STRN and CoDoSol, have very good numerical properties [6,7]. These methods are widely used in practice [14,35] and their efficiency has been demonstrated in several papers [7,23]. In [8] the authors studied global and fast convergence of an inexact dogleg method. They did not investigate the choice of a suitable scaling matrix and reported only preliminary results. Later, in [11], they focused on medium-scale problems and replaced the inexact Newton step by the exact solution of the Newton equation. They considered several diagonal scaling matrices and stated the assumptions required to ensure convergence. The method is called the Constrained Dogleg (CoDo) method and is freely accessible through the website http://codosol.de.unifi.it. The effectiveness of CoDo was verified by comparing it with STRSCNE [7] and IATR [9].
In these scaling-based algorithms, the performance is influenced by the choice of the scaling matrix. We introduce a new class of scaling matrices, obtained as convex combinations of known matrices. We analyze the numerical performance of CoDoSol (the Matlab solver for CoDo) for different convex combinations of the scaling matrices and compare the results using the performance profile approach. We also use a Projected Affine Scaling Interior Point algorithm to check the local convergence properties of the new scaling matrix.
In Section II we explain the role of the scaling matrices. In Section III we consider several scaling matrices and review the requirements and assumptions. In Section IV we introduce a new class of scaling matrices and prove that the new matrices satisfy the required assumptions. Finally, in Section V we report the computational results and our conclusions.

II. SCALED TRUST REGIONS
In this section we describe the idea behind using scaling matrices to solve problem (1). First note that (1) is closely related to the box constrained optimization problem (2) of minimizing $f(x) = \frac{1}{2}\|F(x)\|^2$ over $x \in \Omega$. Every solution of (1) is a global minimum of (2), and if $x^*$ is a minimum of (2) such that $f(x^*) = 0$, then $x^*$ is a solution of (1). The first order optimality conditions of (2) are equivalent to the nonlinear system of equations
$D(x) \nabla f(x) = 0, \quad x \in \Omega,$ (3)
where $D(x)$ is a suitable diagonal scaling matrix of order n. Coleman and Li [12] considered only one choice of the scaling matrix; Heinkenschloss et al. [21] then noted that the optimality conditions hold for a general class of scaling matrices satisfying the conditions (4) for all i=1,...,n and all $x \in \Omega$. In affine scaling methods the bounds are handled through the direction of the scaled gradient.
Given an iterate $x_k \in \mathrm{int}(\Omega)$ and the trust region size $\Delta_k > 0$, the trust region subproblem for (2) minimizes a model $m_k$, based on the norm of the linear model of F at $x_k$, over an elliptical trust region. Different approaches have been proposed to solve this subproblem, such as STRN [6], which combines ideas from the classical trust-region Newton method for unconstrained nonlinear equations and the interior affine scaling approach for constrained optimization problems, and CoDoSol [11], which is based on a dogleg procedure tailored for constrained problems.
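For orientation, condition (4), the scaled gradient and the trust region subproblem are commonly written as follows in the affine scaling literature cited here [11,12,21]. This is a sketch under stated assumptions: the symbols $d_i(x)$ (the i-th diagonal entry of $D(x)$), $d_k$, $F'$ (the Jacobian of F), $G_k$ and the exact form of the model $m_k$ follow the usual conventions of that literature and are not taken from this paper.

```latex
% Condition (4) on the diagonal entries d_i(x) of D(x), for i = 1,...,n and x in Omega
d_i(x)
\begin{cases}
  = 0    & \text{if } x_i = l_i \text{ and } [\nabla f(x)]_i > 0, \\
  = 0    & \text{if } x_i = u_i \text{ and } [\nabla f(x)]_i < 0, \\
  \ge 0  & \text{if } x_i \in \{l_i, u_i\} \text{ and } [\nabla f(x)]_i = 0, \\
  > 0    & \text{otherwise.}
\end{cases}

% Scaled gradient direction at the iterate x_k
d_k = -D(x_k)\, \nabla f(x_k), \qquad \nabla f(x_k) = F'(x_k)^{T} F(x_k).

% Trust region subproblem with an elliptical region (assumed scaling G_k = D(x_k)^{-1/2})
\min_{p} \; m_k(p) = \tfrac{1}{2}\, \| F(x_k) + F'(x_k)\, p \|^2
\quad \text{subject to} \quad \| G_k\, p \| \le \Delta_k .
```

With this form, the entries of $D(x)$ vanish exactly where a bound is active and the gradient points out of the box, which is why $D(x)\nabla f(x) = 0$ encodes the first order conditions of (2).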

III. SCALING MATRICES
We consider the following well-known matrices:
• $D_{CL}(x)$, given by Coleman and Li [12]. The diagonal elements are:
$a(x_k) = l_k$.
Bellavia et al. [11] introduced the following assumptions for a scaling matrix and verified that all of the above scaling matrices satisfy almost all of the requirements specified by the following assumption.
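To make the first matrix in the list concrete, the sketch below computes the diagonal of a Coleman-Li type scaling in the form commonly stated in the literature that builds on [12]: the distance to the bound the negative gradient points towards, and 1 when that bound is infinite. The function name `coleman_li_diagonal` and the treatment of a zero gradient component are assumptions for illustration, and conventions differ between papers (some use an inverse square root of these entries).

```python
import math


def coleman_li_diagonal(x, grad, lower, upper):
    """Diagonal of a Coleman-Li type scaling matrix (illustrative sketch).

    x, grad, lower, upper are sequences of equal length; infinite bounds
    are given as -math.inf / math.inf. Assumes lower <= x <= upper.
    """
    diag = []
    for xi, gi, li, ui in zip(x, grad, lower, upper):
        if gi < 0 and ui < math.inf:
            # Descent moves x_i towards a finite upper bound.
            diag.append(ui - xi)
        elif gi > 0 and li > -math.inf:
            # Descent moves x_i towards a finite lower bound.
            diag.append(xi - li)
        elif gi == 0 and (li > -math.inf or ui < math.inf):
            # Zero gradient component: distance to the nearest finite bound.
            diag.append(min(xi - li, ui - xi))
        else:
            # The relevant bound is infinite: no scaling.
            diag.append(1.0)
    return diag


if __name__ == "__main__":
    # Hypothetical point in the box [0, 1] x [0, 1] x [0, inf).
    d = coleman_li_diagonal([0.5, 0.2, 0.0], [-1.0, 2.0, 3.0],
                            [0.0, 0.0, 0.0], [1.0, 1.0, math.inf])
    print(d)
```

In the hypothetical call, the third variable sits on its lower bound with a positive gradient component, so its diagonal entry is zero, in line with the role the scaling plays in the optimality conditions.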

A. Assumption
We define a new class of scaling matrices as the convex combination of the above scaling matrices:
$D_{CON}(x) = a_1 D_{CL}(x) + a_2 D_{HUU}(x) + a_3 D_{KK}(x), \quad a_1, a_2, a_3 \ge 0, \quad a_1 + a_2 + a_3 = 1.$
This scaling matrix combines the advantages of all the scaling matrices involved in its definition. We first have to verify that it satisfies the four desired requirements:
(i) Clearly $D_{CON}$ satisfies (4), because each of the combined matrices does and the coefficients are nonnegative and sum to one.
(ii) As Bellavia et al. [11] showed, the above three matrices are bounded for $x \in \Omega$ and r > 0. This implies that their convex combination is also bounded.
(iii) It has been shown in [11] that all the matrices except $D_{HUU}$ have this property. Thus, if we assume $a_2 = 0$, then $D_{CON}$ satisfies this condition.
(iv) As before, since all the matrices satisfy condition (iv), their convex combination also has this property.
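The convex combination can be illustrated with a few lines of Python operating on the diagonals of the individual matrices. This is a minimal sketch, not the CoDoSol implementation: the function name `convex_combination`, the list-of-lists representation and the tolerance `tol` are assumptions for illustration.

```python
def convex_combination(diagonals, weights, tol=1e-12):
    """Combine diagonal scaling matrices entrywise with convex weights.

    diagonals: list of equal-length lists, one per scaling matrix.
    weights:   nonnegative coefficients that sum to one.
    """
    if len(diagonals) != len(weights):
        raise ValueError("need exactly one weight per scaling matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be nonnegative and sum to one")
    n = len(diagonals[0])
    if any(len(d) != n for d in diagonals):
        raise ValueError("all diagonals must have the same length")
    # Entry i of the result is the weighted average of the i-th entries.
    return [sum(w * d[i] for w, d in zip(weights, diagonals)) for i in range(n)]


if __name__ == "__main__":
    # Two hypothetical diagonals evaluated at the same point.
    d_first = [0.5, 0.0, 2.0]
    d_second = [1.0, 0.0, 4.0]
    print(convex_combination([d_first, d_second], [0.5, 0.5]))
```

Because each entry is a weighted average, it stays between the smallest and largest corresponding entries of the individual matrices. An entry that is zero in every matrix stays zero, and boundedness of the individual matrices carries over, which is the mechanism behind items (i) and (ii).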
We run the Constrained Dogleg method, implemented in the Matlab code CoDoSol, with an elliptical trust region of initial radius 1. The maximum number of iterations is set to 300 and the maximum number of F-evaluations is set to 1000. The experiments cover 15 problems with dimensions between n=2 and n=1000, listed in Table 1.
Table 1 contains different types of constrained systems: systems with solutions both within the feasible region and on the boundary, systems with only lower (upper) bounds, and systems whose variables are bounded from above and below.
The nonlinear constrained systems come from [13,15] (problems Pb1 to Pb6) and [40] (problems Pb10 to Pb13), together with the chemical equilibrium system given in [17,32,33,41] (Pb7) and the nonlinear complementarity problems (NCPs) given in [25,29] (Pb8, Pb9). Pb15 [20] comes from nonlinear BVPs. Dealing with large-dimensional problems is also critical: these problems need more CPU time and, in some cases, must be reformulated before being passed to the solver [5]. Problems Pb14 and Pb15 are examples of this kind [30]. The starting points for the problems with finite lower and upper bounds are selected uniformly between l and u. In CoDoSol the trust region size is updated as in [10,11]; the failure criteria and the parameter selection are the same as in [24], where the authors introduced an innovative and efficient method for parameter selection. We tested CoDoSol with the scaling matrices in (11) and report the numerical results. The efficiency of the scaling matrices is measured by It, the number of iterations, and Fe, the number of function evaluations needed for convergence (Tables 2, 3 and 4).
Bellavia et al. showed that CL is superior to KK and KK is superior to HUU on the test problems they studied. On our test problems KK is slightly better, while HUU shows the poorest performance. CL+HUU performs as well as or better than the individual matrices.
To compare these seven scaling matrices, we use performance profiles; for a more reliable comparison we use Nested Performance Profiles [22], which remove a negative side effect of standard performance profiles. In Fig. 1 and Fig. 3 the computational effort is measured in terms of mean It (the mean number of iterations over the three starting points) and mean Fe (the mean number of F-evaluations over the three starting points), respectively. In these figures we omit the convex coefficients from the labels. The black dotted lines correspond to the individual matrices and the red line corresponds to the convex combination of the scaling matrices.
The performance profiles for CL+HUU show that it is efficient on about 75% of the tests and solves about 95% of the tests within a factor 1 of the best solver. CL+HUU is the best scaling matrix for the studied problems.
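The way such percentages are read off a profile can be sketched as follows. The code computes the standard performance profile (for each solver, the fraction of problems whose cost is within a factor tau of the best cost on that problem), not the nested variant of [22]. The function name `performance_profile`, the solver labels and the cost values are hypothetical and serve only as an illustration; they are not results from this paper.

```python
import math


def performance_profile(costs, taus):
    """Standard performance profile.

    costs: dict mapping a solver label to a list of positive costs, one per
           problem (math.inf marks a failure). Each problem is assumed to be
           solved by at least one solver.
    taus:  factors at which the profile is evaluated.
    Returns a dict mapping each solver label to its profile values.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    # Best (smallest) cost achieved on each problem by any solver.
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        # Performance ratio of solver s on each problem; failures give inf.
        ratios = [costs[s][p] / best[p] for p in range(n_problems)]
        profile[s] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile


if __name__ == "__main__":
    # Hypothetical iteration counts for two solvers on four problems.
    counts = {"A": [10, 20, math.inf, 8], "B": [12, 10, 30, 8]}
    print(performance_profile(counts, taus=[1, 2]))
```

The value at tau = 1 is the fraction of problems on which a solver is the best (ties included), and the value for large tau approaches the fraction of problems the solver solves at all.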
This can be verified in Figure 2 and Figure 4. Figure 2 is based on It, while Figure 4 is based on Fe.
In this section, we illustrate the local behavior of the different scaling strategies using two standard test problems. We implemented the algorithm "Projected Affine-Scaling Interior-Point Newton Method" [29]. The first test example is the well-known Rosenbrock function:
$f(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2.$
This function has a unique global minimum at $x^* = (1, 1)$. The lower and upper bounds are l = (0,0) and u = (1,1).
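A short sketch of this test function and its gradient is given below, using the standard two-dimensional Rosenbrock function; the function names are chosen here for illustration. With the bounds above, the minimizer (1, 1) lies on the boundary of the box while the gradient vanishes there, which is the degenerate situation in which strict complementarity does not hold.

```python
def rosenbrock(x):
    """Standard two-dimensional Rosenbrock function."""
    x1, x2 = x
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2


def rosenbrock_gradient(x):
    """Gradient of the two-dimensional Rosenbrock function."""
    x1, x2 = x
    dx1 = -400.0 * x1 * (x2 - x1 ** 2) - 2.0 * (1.0 - x1)
    dx2 = 200.0 * (x2 - x1 ** 2)
    return (dx1, dx2)


if __name__ == "__main__":
    upper = (1.0, 1.0)
    x_star = (1.0, 1.0)
    grad = rosenbrock_gradient(x_star)
    # Both upper bounds are active and both gradient components are zero.
    active_with_zero_gradient = [
        xi == ui and gi == 0.0 for xi, ui, gi in zip(x_star, upper, grad)
    ]
    print(rosenbrock(x_star), grad, active_with_zero_gradient)
```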
Again, we see that the convergence of the Coleman-Li matrix is rather slow, although according to Bellavia et al. it is the fastest one for medium-scale problems. The convex combination of CL and KK mitigates the violation of the strict complementarity assumption, which slows down the convergence rate of the affine-scaling Newton method with the Coleman-Li scaling. Our extensive experiments show that the performance of the scaling matrices depends on the test problem. Each matrix has its own advantages and disadvantages. For a new problem, there is no way to select the best scaling matrix in advance; it has to be found by trial and error. When the dimension or the parameter values of a problem change, the previously best matrix will not necessarily remain the best one. The performance of the scaling matrices also depends strongly on the algorithm: a matrix may converge quickly with one algorithm and slowly with another. To overcome this problem, one can use a convex combination of the scaling matrices. This new class of scaling matrices has the advantages of the individual matrices and in some cases is the best option.