Chapter 9
The minimal solution is usually defined as the global minimum, or one of them when there are multiple global minima. Finding a global minimum is non-trivial when the energy function contains many local minima. Whereas methods for local minimization are quite mature, with commercial software on the market, the study of global minimization is still young: there are no efficient algorithms that guarantee finding globally minimal solutions, as there are for local minimization.
In analytical studies, local and global minima may be studied via convexity analysis. Let $f(x)$ be a real function defined on a convex set $\mathbb{S} \subseteq \mathbb{R}^n$. $f$ is said to be convex in $\mathbb{S}$ if
$$f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$$
for any two points $x_1, x_2 \in \mathbb{S}$ and any real number $\lambda \in [0,1]$. It is strictly convex if the strict inequality ``$<$'' holds in the above for $x_1 \ne x_2$ and $0 < \lambda < 1$. When plotted, a convex $f$ always lies below the line segment connecting any two points on its graph. The neighboring points of a minimum are always above the tangent at the minimum. A local minimum of $f$ is also a global one if $f$ is convex on $\mathbb{S}$; it is the unique global minimum if $f$ is strictly convex. Therefore, when $f$ is convex, we can find a global minimum by finding a local minimum using the gradient-descent method.
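The argument above can be illustrated with a minimal sketch of gradient descent (the function, names, step size, and tolerance below are hypothetical choices for illustration, not from the text): on a strictly convex energy, following the negative gradient converges to the unique global minimum.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=10000):
    """Follow the negative gradient until it (nearly) vanishes.
    For a convex energy function, the local minimum found this way
    is also a global one."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = np.asarray(grad(x))
        if np.linalg.norm(g) < tol:   # gradient ~ 0: a local minimum
            break
        x = x - lr * g                # descend along -grad f
    return x

# Strictly convex example: f(x) = (x - 3)^2 + 1 has its unique
# global minimum at x = 3; its gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=[10.0])
```

For a non-convex energy, the same loop would stop at whichever local minimum lies downhill of the starting point, which is exactly the problem the rest of this section addresses.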
Global minimization requires: (1) finding all (a finite number of) local minima and (2) proving that there are no more. Without an efficient algorithm, this amounts to exhaustive search. In practice, one always faces one of two choices: (1) find the exact global minimum at possibly intolerable expense, or (2) find some approximation to it at much lower cost.
Two methods are often used to deal with the local minimum problem:
random search and annealing. In random search methods, such as the
Metropolis algorithm [Metropolis et al. 1953] and the Gibbs sampler
[Geman and Geman 1984], a new configuration does not always decrease
the energy; occasional energy increases are allowed. This strategy can
help the search escape a local minimum and reach the global minimum.
The chance of generating a configuration is proportional to $e^{-E/T}$,
where $E$ is the configuration's energy and $T$ is a control parameter.
Therefore, this method finds a lower-energy configuration with a larger
probability.
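The acceptance rule can be sketched as follows (a hypothetical toy example; the tilted double-well energy and all names are illustrative, not from the text). A downhill move is always accepted, while an uphill move of size $\Delta E$ is accepted with probability $e^{-\Delta E/T}$, which is what lets the chain climb out of a local minimum.

```python
import math
import random

def metropolis_step(energy, state, propose, T, rng=random):
    """One Metropolis update: always accept a downhill move; accept an
    uphill move (dE > 0) with probability exp(-dE / T)."""
    candidate = propose(state)
    dE = energy(candidate) - energy(state)
    if dE <= 0 or rng.random() < math.exp(-dE / T):
        return candidate  # accept
    return state          # reject: keep the current configuration

# Hypothetical tilted double-well: a shallow local minimum near x = +1
# and the global minimum near x = -1.04 (the +0.3x term tilts the wells).
def energy(x):
    return (x * x - 1) ** 2 + 0.3 * x

def propose(x):
    return x + random.uniform(-0.5, 0.5)

random.seed(0)
x = 1.0        # start in the basin of the shallower (local) minimum
best = x       # track the best configuration seen so far
for _ in range(5000):
    x = metropolis_step(energy, x, propose, T=1.0)
    if energy(x) < energy(best):
        best = x
```

At a fixed temperature the chain samples the Gibbs distribution rather than minimizing; the annealing schedule described next is what turns this sampler into a minimizer.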
Annealing is often incorporated into local search methods to overcome the problem of local minima. It is performed by decreasing a parameter, e.g. the temperature in the Gibbs distribution, from a high value to a low value during the iterative minimization. At one extreme, e.g. a high temperature, the energy landscape is convex and smooth, so the unique minimum can be located easily; the minimum is then tracked as the temperature is gradually decreased to a sufficiently low value. This is the so-called continuation method [Wasserstrom 1973]. Such a technique can significantly alleviate the local minimum problem and improve the quality of the solution. A disadvantage is that annealing algorithms take more time: the parameter must be decreased gradually over a range of values, and at each value some convergence has to be reached.
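The schedule described above can be sketched as a simple simulated-annealing loop (a hypothetical sketch with a geometric cooling schedule; the energy, parameter values, and names are illustrative assumptions, not from the text).

```python
import math
import random

def simulated_annealing(energy, propose, x0,
                        T0=5.0, T_min=1e-3, alpha=0.95, sweeps=100):
    """Decrease the temperature geometrically from T0 toward T_min,
    running several Metropolis sweeps at each temperature and keeping
    the best configuration seen so far."""
    x, T, best = x0, T0, x0
    while T > T_min:
        for _ in range(sweeps):       # converge (roughly) at this T
            cand = propose(x)
            dE = energy(cand) - energy(x)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x = cand
            if energy(x) < energy(best):
                best = x
        T *= alpha                    # cooling schedule: T_{k+1} = alpha * T_k
    return best

random.seed(1)
# Hypothetical tilted double-well; its global minimum is near x = -1.04.
e = lambda x: (x * x - 1) ** 2 + 0.3 * x
xb = simulated_annealing(e, lambda x: x + random.uniform(-0.5, 0.5), x0=1.0)
```

The nested loop makes the cost visible: the total work is (number of temperature values) times (sweeps per value), which is the extra expense the paragraph above mentions.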
There are two types of annealing: deterministic and stochastic. In MRF vision work, the stochastic simulated annealing algorithm [Geman and Geman 1984] and the deterministic graduated non-convexity (GNC) algorithm of [Blake and Zisserman 1987] enjoy wide popularity. Other deterministic algorithms include mean field annealing [Peterson and Soderberg 1989, Yuille 1990, Geiger and Girosi 1991] and the Hopfield network approach [Hopfield and Tank 1985, Koch et al. 1986, Yuille 1987].
While stochastic annealing such as simulated annealing is theoretically justified [Geman and Geman 1984], deterministic annealing remains heuristic: there is no guarantee that the minimum at a high temperature can always be tracked to the minimum at a low temperature.