****optimize levenberg_marquardt#

Description#

This optimizer ([U14], [U15]) is a robust first-order method (it uses gradients of the objective function) designed specifically for least-squares minimization. It does not explicitly handle constraints.

The least-squares norm of the distance between simulation and experiment can be written as,

(509)#\[{\cal F}~=~\frac{1}{2} (f(x) - y)^T W (f(x) - y)~,\]

where \(f(x)\) is the \((N\times 1)\) vector of simulations, \(y\) is the \((N\times 1)\) vector of experiments, and \(W\) is the positive diagonal \((N \times N)\) weight matrix. The first and second derivatives of \({\cal F}(x)\) are,

(510)#\[\begin{split}\begin{aligned} {\tt grad~} {\cal F}(x) &=& J^T W (f(x)-y)\\ {\rm hessian~} {\cal F}(x) &=& J^T W J + \sum_{i=1}^N W_{ii} (f_i(x) - y_i) {\rm hessian~} f_i \\\end{aligned}\end{split}\]

where \(J\) is the \((N \times n)\) Jacobian matrix whose \(i\)-th row is \(({\tt grad~} f_i)^T\) (here \(n\) is the number of optimization variables). The first fundamental idea in Levenberg-Marquardt is that the last term of \({\rm hessian~} {\cal F}(x)\) can be neglected, so that \({\rm hessian~} {\cal F}(x) \approx J^T W J\). This holds in particular when the terms \((f_i(x) - y_i)\) are small in comparison to \({\rm hessian~} f_i\), that is, when the simulation is close to the experiments and the largest eigenvalue of each \({\rm hessian~} f_i\) is not too large (if this is not the case, another optimization method, sqp for example, should be used). The necessary conditions of optimality state that the gradient of the objective function must vanish at the optimum \(x^*\),

(511)#\[{\tt grad~} {\cal F}(x^*)~=~0~.\]
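As an illustration, the objective of Eq. (509) together with its gradient and the Gauss-Newton approximation of its Hessian from Eq. (510) can be sketched in Python. This is a minimal sketch, not part of the optimizer's input syntax; it uses the \((N \times n)\) Jacobian convention in which row \(i\) of \(J\) is \({\tt grad~} f_i\), and the function name is illustrative:

```python
import numpy as np

def objective_terms(J, f_x, y, W):
    """Eq. (509)-(510): value, gradient, and Gauss-Newton Hessian of F."""
    r = f_x - y                 # residual f(x) - y, shape (N,)
    F = 0.5 * r @ W @ r         # Eq. (509)
    grad = J.T @ W @ r          # first line of Eq. (510)
    H_gn = J.T @ W @ J          # Hessian with the last term of Eq. (510) neglected
    return F, grad, H_gn

# Check on a linear model f(x) = A @ x, whose Jacobian is exactly A:
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
x = np.array([1.0, 1.0])
y = A @ x                       # consistent data, so the residual is zero
F, g, H = objective_terms(A, A @ x, y, np.eye(3))
```

For consistent data the value and gradient vanish, while the Gauss-Newton Hessian \(A^T A\) remains positive definite, which is what Eq. (513) later exploits.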

Linearization of Equation (511) about the current point \(x_k\) yields,

(512)#\[{\tt grad~} {\cal F}(x_k) + {\rm hessian~} {\cal F}(x_k) (x_{k+1} - x_k) ~=~0~.\]

The second idea underlying Levenberg-Marquardt is to regularize the approximation of the Hessian of the objective function by adding positive terms on the diagonal, and use this in Equation (512). The regularized approximation to \({\rm hessian~} {\cal F}\) is then strictly positive definite and Equation (512) can always be solved. The fundamental equation used to update optimization variables is,

(513)#\[(J_k^T W J_k + \lambda_k I ) (x_{k+1} - x_k) ~=~-J_k^T W (f(x_k)-y)~.\]

where \(\lambda_k\) is a positive scalar parameter. The principle of the Levenberg-Marquardt algorithm can also be derived from the linearized original optimization problem with a bound on the design variables. This yields an interpretation of \(\lambda\) as the Lagrange multiplier associated with the move limit on the design variables. The value of \(\lambda\) can therefore be adjusted iteratively to guarantee that the linearization remains valid and that progress is made in terms of the objective function. The Levenberg-Marquardt algorithm proceeds as follows:

  1. \(k=0~,~\lambda=\lambda_0~,~x_k = x_0\).

  2. Solve (513) for \(x_{k+1}\).

  3. If \(||{\tt grad~} {\cal F}(x_{k+1})|| < gmin\), or \({\cal F}(x_{k+1}) < fmin\), or the number of \({\cal F}\) evaluations exceeds \(iter\), or \(||x_{k+1}-x_k|| < stepmini\), stop.

  4. If \({\cal F}(x_{k+1}) <{\cal F}(x_{k})\), \(\lambda_{k+1}=\lambda_k / nu\), \(k = k+1\), goto 2.

  5. If \({\cal F}(x_{k+1}) \ge {\cal F}(x_{k})\), \(\lambda_{k}=\lambda_k * nu\), goto 2.

Gradients are calculated by finite differences. Min and max bounds on the optimization variables are taken into account by projecting the iterates onto the feasible domain.
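The steps above, together with the finite-difference gradients and the projection onto the bounds, can be sketched as follows. This is a simplified illustration with unit weights, not the solver's actual implementation; all names and defaults are illustrative:

```python
import numpy as np

def fd_jacobian(f, x, h):
    """Forward finite-difference Jacobian: row i holds grad f_i transposed."""
    f0 = np.asarray(f(x))
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (np.asarray(f(xp)) - f0) / h
    return J

def levenberg_marquardt(f, y, x0, xmin, xmax, lam0=1.0, nu=10.0,
                        lam_max=1e7, iters=100, fmin=1e-4, gmin=1e-8,
                        stepmini=1e-5, perturb=5e-3):
    """Sketch of steps 1-5 with projection of iterates onto [xmin, xmax]."""
    W = np.eye(y.size)                          # unit weights for simplicity
    obj = lambda xx: 0.5 * (f(xx) - y) @ W @ (f(xx) - y)
    x, lam, nfev = x0.astype(float), lam0, 0    # step 1: initialization
    while nfev < iters and lam <= lam_max:
        r = f(x) - y
        J = fd_jacobian(f, x, perturb)
        g = J.T @ W @ r
        if np.linalg.norm(g) < gmin or obj(x) < fmin:
            break                               # step 3: converged
        # step 2: solve Eq. (513) for the trial point
        dx = np.linalg.solve(J.T @ W @ J + lam * np.eye(x.size), -g)
        x_new = np.clip(x + dx, xmin, xmax)     # project onto the bounds
        nfev += 1
        if np.linalg.norm(x_new - x) < stepmini:
            break                               # step 3: step too small
        if obj(x_new) < obj(x):
            x, lam = x_new, lam / nu            # step 4: accept, cut lambda
        else:
            lam *= nu                           # step 5: reject, amplify lambda
    return x

# Example: recover two parameters of a trivial linear "simulation" f(x) = x
target = np.array([2.0, 3.0])
x_opt = levenberg_marquardt(lambda v: v.copy(), target,
                            x0=np.array([0.5, 0.5]),
                            xmin=np.zeros(2), xmax=np.full(2, 10.0))
```

Note how \(\lambda\) acts as a move limit: a rejected step only amplifies \(\lambda\) and retries from the same point, so the trial step shrinks until the linearization is valid again.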

Syntax#

The following adjustable parameters are allowed in the ***convergence section.

perturb Magnitude of the perturbation, as a fraction of the normalized design variables, used to calculate gradients by finite differences. The default value is 0.005 (0.5 percent).

lambda0 Initial value of the \(\lambda\) parameter which controls the algorithm. Use smaller values when the starting point is close to the solution.

lambda_max Maximum value of lambda. Default is 1.e+07.

iter Maximum number of function \({\cal F}\) evaluations before terminating. Default is 100.

nu Cutting/amplification factor applied to the lambda parameter when the function value decreases or increases. The originally suggested value is 10 (the default); a factor of 4 may be more conservative.

fmin Minimum function value for convergence. Default is 0.0001.

gmin Minimum norm of the gradient of the objective function for convergence. Default is 1.e-8.

stepmini Minimum step at each iteration. Default is 0.00001. The optimizer may stop prematurely if lambda is taken very large and stepmini very small, because lambda acts as a move limit on the parameter change.

Example#

The following is a complete optimization input using this method:

****optimize levenberg_marquardt
 ***files plast.sim
 ***shell
  Zrun -S plast3 2>&1 > /dev/null
 ***values
  cl  1000.  min 200. max 15000.
  cnl 140000.  min 50000. max 200000.
  dnl 100.   min 50.  max 2000.
  r0  260.   min 50.  max 500.
  q   200.   min 20.  max 500.
  b   5.     min 2.   max 20.
 ***convergence
  perturb 0.005
  lambda0 5
  lambda_max 1.e+08
  iter 100
  nu  5
  fmin  0.00001
  gmin  0.00001
 ***compare
    t_file_file plast3.test 1 3 data_plast3 1 3 weight 1.
****return