Penalty Method

Next: Lagrange Multipliers Up: Constrained Minimization Previous: Constrained Minimization

8.3.1

Penalty Functions

Using penalty functions, we can force f to stay inside or close to the feasible region determined by while minimizing . A simply penalty function for can be taken as . Adding them to the original energy yields a new energy

where determines the amount of penalty. Minimizing the above w.r.t. f for a fixed is an unconstrained problem. The minimum in the limit ()

will be one satisfying the constraints . This suggests an algorithm which solves a sequence of unconstrained minimization problems:

Choose a fixed sequence with and .
For each , find a local minimum .
Terminate when is small enough.

In practice, step (2) is usually done numerically using an iterative method. Moreover,

obtained previously with

is used as the initial point for the next minimization with

Consider the problem of pose estimation from a set of corresponding points discussed in Section 5.3.1. Assume is the unconstrained solution space in which a pose f consists of an arbitrary rotation matrix and a translation vector in 3D. The energy without constraints measures squared errors. If poses are constrained to be Euclidean, as assumed, any rotation matrix must be orthogonal. The penalty function for orthogonality was given in (5.32) as the single-site potential function. A similar penalty function has been used by [Haralick et al. 1989].

As described in Section 2.3.2, [Geman et al. 1990] formulate boundary detection as constrained minimization; certain edge patterns, such as isolated edges, sharp turns, quadruple junctions and ``small'' structures, are forbidden in forming desirable boundaries. These are expressed as . In the penalty method, the penalty term is added to the energy , yielding . Sending to in the minimization will rule out occurrences of forbidden edge configurations.

To overcome the problem of local minima, an annealing process can be incorporated with the penalty method. In constrained simulated annealing [Geman et al. 1990], an annealing parameter T is introduced into E, yielding

Using a stochastic sampling scheme such as Metropolis algorithm or Gibbs sampler ( cf. Section 9.1), one can generate a Gibbs distribution f with energy functions . With an annealing scheme in which and at a suitably coupled rate, will converge to the global minmum of with probability approaching 1. Although this is computationally more demanding than ICM [Besag 1986] but invariably arrives at a better labeling.

The penalty method has a number of advantages : It is easy to use; it allows inexact constraints -- constraints are not necessary to fulfill exactly; it converges to a feasible solution when . However, allowing inexact constraints is disadvantageous for those problems in which the exact fulfillment of constraints is required. Moreover, when are very large, the system becomes stiff and the problem becomes ill-conditioned [Fletcher 1987]. The Lagrange multiplier method may be better in this regard.