9.3 Graduated Non-Convexity
Graduated non-convexity (GNC) [Blake and Zisserman 1987] is a deterministic annealing method for approximating the global solution of nonconvex minimization of unconstrained, continuous problems such as (8.5). It finds good solutions at much lower cost than stochastic simulated annealing. Energies of the following form will be considered in the subsequent discussion of GNC:
\[
E(f) = \sum_{i} \chi_i\,(f_i - d_i)^2 + \lambda \sum_{i} g(f_i - f_{i-1}) \tag{9.25}
\]
where $\chi_i = 1$ if the datum $d_i$ is available and $\chi_i = 0$ otherwise. When $g$ is a nonconvex function, a gradient-based method such as (8.7) finds only a local minimum.
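As a concrete reading of (9.25), the snippet below evaluates the energy for a 1-D signal; the potential function g is passed in as a callable and the indicator array chi marks which data points are available. The function and variable names are illustrative, not taken from the text.

```python
import numpy as np

def energy(f, d, chi, lam, g):
    """E(f) = sum_i chi_i*(f_i - d_i)^2 + lam * sum_i g(f_i - f_{i-1}), cf. (9.25)."""
    closeness = np.sum(chi * (f - d) ** 2)     # fidelity to the available data
    smoothness = np.sum(g(np.diff(f)))         # pairwise potentials on first-order differences
    return closeness + lam * smoothness

# Example with the truncated quadratic potential g_alpha(eta) = min(eta^2, alpha):
f = np.array([0.0, 0.1, 0.2, 1.2, 1.3])
d = np.array([0.0, 0.2, 0.1, 1.1, 1.4])
chi = np.array([1, 1, 0, 1, 1])                # the third datum is missing
print(energy(f, d, chi, lam=5.0, g=lambda eta: np.minimum(eta**2, 1.0)))
```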
The idea of GNC is the following: The energy is parameterized by a control parameter; write $E^{(t)}$ for the energy under the parameter value in effect at step $t$. Initially, the parameter is set to a sufficiently large value such that $E^{(0)}$ is strictly convex. It is then easy to find the unique minimum $f^{(0)}$ of $E^{(0)}$, regardless of the initial $f$, using the gradient-descent method. The minimum $f^{(t)}$ found under the current parameter value is used as the initial value for the next phase of minimization, under a lower parameter value, to obtain the next minimum $f^{(t+1)}$. As the parameter is lowered, $E^{(t)}$ becomes nonconvex and local minima appear. However, if we track the sequence of minima as the parameter decreases from the high initial value to the target value, we may approximate the global minimum under the target value, i.e. the global minimum $f^*$ of the target energy $E$.
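The continuation structure just described can be written compactly as a loop. The sketch below is illustrative only: the callable names energy_grad and minimize_from, and the multiplicative decrease schedule, are assumptions rather than anything prescribed in the text.

```python
def gnc(d, energy_grad, param_init, param_target, decrease, minimize_from):
    """Generic graduated non-convexity loop (illustrative sketch).

    energy_grad(f, param) -> gradient of the parameterized energy E^(param) at f
    minimize_from(f0, grad) -> local minimum found by a gradient-based method
    """
    param = param_init                                       # large enough that E^(param) is strictly convex
    f = minimize_from(d, lambda f: energy_grad(f, param))    # unique minimum f^(0)
    while param > param_target:
        param = max(param * decrease, param_target)          # lower the control parameter
        # the previous minimum seeds the next, now nonconvex, minimization
        f = minimize_from(f, lambda f: energy_grad(f, param))
    return f
```

Any gradient-based local minimizer can be plugged in as minimize_from; a concrete instance for the DA model is sketched after the algorithm description below.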
The first term on the RHS of (9.25) is quadratic and hence
strictly convex with respect to f. The second term may or may not be
convex, depending on f and g. Its non-convexity could be partly
compensated by the convexity of the closeness term. If $g''(\eta) > 0$ for all $\eta$, then the second term is strictly convex and so is $E(f)$. If $g''(\eta) < 0$ for some $\eta$ (cf. Eq.(3.10)), the second term is nonconvex and so, in general, is $E(f)$. However, if the function $g$, together with the involved parameters, is chosen to satisfy
\[
\lambda\, g''(\eta) > -\frac{1}{c} \qquad \forall \eta
\]
where $c > 0$ is some constant, then the convexity of $E(f)$ is guaranteed. The value of $c$ depends on the type of the model and on whether the data $d$ is complete. Provided $d_i$ are available for all $i$, the value of $c$ is a small model-dependent constant, different for the string, membrane, rod and plate models; see [Blake and Zisserman 1987]. If $d_i$ are missing altogether, it requires $g''(\eta) \geq 0$, so that the second term is convex by itself, to ensure the convexity of $E(f)$. The practical situation is generally better than this worst case because the data cannot be missing altogether.
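To see where a bound of this form comes from, consider the string model with complete data. The following is a sketch under the assumption that the smoothness term couples first-order differences only; the constant obtained is illustrative and may differ from the one used by Blake and Zisserman.
\[
\nabla^2 E(f) = 2I + \lambda\, D^{\mathsf T}\,\mathrm{diag}\!\big(g''(f_i - f_{i-1})\big)\, D ,
\]
where $D$ is the first-order difference operator. Since the eigenvalues of $D^{\mathsf T} D$ lie in $[0,4]$,
\[
\nabla^2 E(f) \succeq \Big(2 - 4\lambda \sup_{\eta}\big(-g''(\eta)\big)\Big) I ,
\]
which is positive definite whenever $\lambda\, g''(\eta) > -1/2$ for all $\eta$.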
Therefore, to construct an initially convex $E^{(0)}$, we can choose an initial parameter $\gamma^{(0)}$ for which $g''_{\gamma^{(0)}}(f_i - f_{i-1}) \geq 0$ for all $i$. This is equivalent to choosing a $\gamma^{(0)}$ such that $f_i - f_{i-1} \in B_{\gamma^{(0)}}$ for all $i$, where $B_{\gamma}$ is the band defined in (3.29). For APFs 1, 2 and 3, the $\gamma^{(0)}$ must be larger than $2v$, $3v$ and $v$, respectively, where $v = \max_i (f_i - f_{i-1})^2$.
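The choice of the initial parameter can be checked numerically. The snippet below assumes that the first APF has the exponential form $g_\gamma(\eta) = -\gamma e^{-\eta^2/\gamma}$ and uses the string energy (9.25) with complete data; the toy signal and the 5% safety margin are arbitrary.

```python
import numpy as np

def g2_deriv(eta, gamma):
    """Second derivative g''(eta) of the assumed exponential APF g = -gamma*exp(-eta^2/gamma)."""
    return 2.0 * np.exp(-eta**2 / gamma) * (1.0 - 2.0 * eta**2 / gamma)

rng = np.random.default_rng(0)
d = np.cumsum(rng.normal(0, 0.1, 64)) + (np.arange(64) > 32)  # toy 1-D signal with a step
lam = 10.0                                                    # smoothing weight lambda in (9.25)

f0 = d.copy()                          # initialize f with the data
v = np.max(np.diff(f0) ** 2)           # largest squared first-order difference
gamma0 = 2.0 * v * 1.05                # gamma^(0) > 2v puts every difference inside the band

# Hessian of (9.25) at f0 for the string with complete data: 2*I + lam * D^T diag(g'') D
n = len(f0)
D = np.diff(np.eye(n), axis=0)                    # first-order difference operator
H = 2.0 * np.eye(n) + lam * D.T @ np.diag(g2_deriv(np.diff(f0), gamma0)) @ D
print("E convex at f0:", np.all(np.linalg.eigvalsh(H) > 0))
```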
Figure 9.3: A GNC algorithm for finding the DA solution.
The GNC algorithm is outlined in Fig.9.3. Given $d$, $\lambda$ and a target value $\gamma^{\mathrm{target}}$ for $\gamma$, the algorithm aims to construct a sequence $f^{(t)}$ ($t = 0, 1, \ldots$) and thus to approach the global minimum $f^*$ for which $E(f^*) = \min_f E(f)$ under $\gamma^{\mathrm{target}}$. In the algorithm, $\varepsilon$ is a constant for judging the convergence, and $\kappa$ ($0 < \kappa < 1$) is a factor for decreasing $\gamma$ towards $\gamma^{\mathrm{target}}$, via $\gamma^{(t+1)} = \kappa\,\gamma^{(t)}$. The choice of $\kappa$ controls the balance between the quality of the solution and the computational time. In principle, $\gamma$ should vary continuously to keep a good track of the global minimum. In discrete computation, we choose a $\kappa$ close to one. A more rapid decrease of $\gamma$ (with a smaller $\kappa$) is likely to lead the system to an unfavorable local minimum. Witkin et al. (1987) presented a more sophisticated scheme for decreasing the continuation parameter, in which the step is related to the resulting change in energy and controlled by two constants. This seems reasonable.
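As a concrete illustration of the loop in Fig.9.3, the sketch below minimizes the 1-D energy (9.25) by plain gradient descent inside a $\gamma$-continuation, again assuming the exponential APF; the step size, the rule for $\gamma^{(0)}$, the default $\kappa$ and $\varepsilon$, and the missing-data interpolation are all assumptions, not values taken from the text.

```python
import numpy as np

def g_prime(eta, gamma):
    """First derivative of the assumed exponential APF: g(eta) = -gamma*exp(-eta^2/gamma)."""
    return 2.0 * eta * np.exp(-eta**2 / gamma)

def grad_E(f, d, chi, lam, gamma):
    """Gradient of E(f) = sum_i chi_i*(f_i - d_i)^2 + lam * sum_i g(f_i - f_{i-1})."""
    g = 2.0 * chi * (f - d)
    s = g_prime(np.diff(f), gamma)       # smoothness forces on each pair (i-1, i)
    g[1:] += lam * s
    g[:-1] -= lam * s
    return g

def gnc_da(d, chi, lam, gamma_target, kappa=0.95, eps=1e-6, step=1e-3):
    """GNC loop: decrease gamma by the factor kappa, minimizing E by gradient descent."""
    f = np.where(chi > 0, d, np.interp(np.arange(len(d)),
                                       np.flatnonzero(chi), d[chi > 0]))  # fill missing data
    gamma = max(2.0 * np.max(np.diff(f) ** 2) * 1.05, gamma_target)       # convex start
    while True:
        for _ in range(100000):                        # inner minimization under current gamma
            g = grad_E(f, d, chi, lam, gamma)
            f = f - step * g
            if np.max(np.abs(g)) < eps:                # epsilon judges the convergence
                break
        if gamma <= gamma_target:
            return f
        gamma = max(kappa * gamma, gamma_target)       # lower gamma towards the target
```

A smaller $\kappa$ makes the schedule faster but, as noted above, risks ending in a poorer local minimum.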
To approximate the truncated quadratic function $g_\alpha(\eta) = \min\{\eta^2, \alpha\}$, Blake and Zisserman (1987) construct the following function
\[
g^{(p)}_\alpha(\eta) =
\begin{cases}
\eta^2 & \text{if } |\eta| < q \\
\alpha - c\,(|\eta| - r)^2/2 & \text{if } q \le |\eta| < r \\
\alpha & \text{if } |\eta| \ge r
\end{cases}
\]
where $p$ is a continuation parameter and the constants $c$, $q$ and $r$ depend on $\alpha$, $\lambda$ and $p$, chosen so that $g^{(p)}_\alpha$ is continuous together with its first derivative. The corresponding interaction function is $h^{(p)}_\alpha(\eta) = g^{(p)\prime}_\alpha(\eta)/(2\eta)$. The continuation parameter $p$ is decreased from 1 to 0. When $p=1$, the corresponding energy $E^{(1)}$ is convex, where $E^{(p)}$ denotes the energy with $g$ replaced by $g^{(p)}_\alpha$. When $p=0$, $E^{(0)}$ equals the original energy. A performance comparison between GNC and SA for reconstruction is given in [Blake 1989].
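The following sketch spells out one concrete choice of the constants in $g^{(p)}_\alpha$: given $c$, the breakpoints $r$ and $q$ are set so that the three pieces join with matching value and slope. The particular dependence $c = c^*/p$, the default value of $c^*$, and the absorption of the $\lambda$ scaling into $c^*$ are assumptions made here for illustration.

```python
import numpy as np

def g_trunc(eta, alpha):
    """Truncated quadratic: g_alpha(eta) = min(eta^2, alpha)."""
    return np.minimum(eta**2, alpha)

def g_p(eta, alpha, p, c_star=0.5):
    """C^1 piecewise approximation of the truncated quadratic used in GNC.

    c = c_star / p controls the (negative) curvature of the middle piece
    (c_star is an assumed constant); r and q are chosen so that value and
    slope match at the breakpoints:
        r = sqrt(alpha * (2/c + 1)),  q = alpha / r.
    As p -> 0, c -> infinity, q, r -> sqrt(alpha) and g_p -> g_trunc.
    """
    c = c_star / p
    r = np.sqrt(alpha * (2.0 / c + 1.0))
    q = alpha / r
    a = np.abs(eta)
    return np.where(a < q, eta**2,
                    np.where(a < r, alpha - 0.5 * c * (a - r)**2, alpha))

# At p = 1 the approximation is smooth and gently curved; at small p it hugs g_trunc.
eta = np.linspace(-3, 3, 7)
print(g_p(eta, alpha=1.0, p=1.0))
print(g_p(eta, alpha=1.0, p=0.01))
print(g_trunc(eta, alpha=1.0))
```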
However, the complexity of the above convexity treatment can be avoided. Instead of modifying the definition of $g$, we can simply modify the parameter $\gamma$ in it. To achieve graduated non-convexity, we can use $\gamma$ as the control parameter and decrease it from a sufficiently high value, $\gamma^{(0)}$, towards the target value. Let $f$ be initialized so that $f_i - f_{i-1} \in B_{\gamma^{(0)}}$ for all $i$. Then the energy is convex at such an $f$. An assumption is that, if so initialized, $f^{(t)}_i - f^{(t)}_{i-1} \in B_{\gamma^{(t)}}$ always holds at any $t$ during the iteration. Parameter values for the convexity obtained in this way are related to the band $B$ defined in (3.29) for the DA (discontinuity-adaptive) model. In a similar way, parameter values for the convexity can be derived for the LP (line process) approximation models given by [Koch et al. 1986, Yuille 1987, Geiger and Girosi 1991].
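The working assumption that the differences stay inside the shrinking band can be monitored during the iteration. The check below assumes the exponential APF, whose band would then be $B_\gamma = \{\eta : \eta^2 \le \gamma/2\}$; the helper name is hypothetical.

```python
import numpy as np

def inside_band(f, gamma):
    """True if every first-order difference of f lies in the assumed band
    B_gamma = {eta : eta^2 <= gamma/2} of the exponential APF."""
    return bool(np.all(np.diff(f) ** 2 <= gamma / 2.0))

# Example: call inside_band(f, gamma) after each inner minimization in the GNC loop;
# a False return flags that the convexity-at-f argument no longer applies.
```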
Figure 9.4: Schematic diagram of the analog network circuit.
The computation can be performed using an analog network. Let $u_i$ be the potential of neural cell $i$. Let $C_i$ be the membrane capacitance and $R_i$ the membrane resistance. Let $T_{i,i'}$ be the conductance, or synaptic efficacy, between neurons $i$ and $i'$, where $i' \in \mathcal{N}_i$ and $T_{i,i'} = 2\lambda\, h_\gamma(u_i - u_{i'})$, with $h_\gamma$ the interaction function in (3.27). If the exponential interaction function $h_\gamma(\eta) = e^{-\eta^2/\gamma}$ is used, then
\[
T_{i,i'} = 2\lambda\, e^{-(u_i - u_{i'})^2/\gamma} .
\]
Let $I_i$ be the external current input to $i$, with $I_i = 0$ when $\chi_i = 0$. Now (9.4) can be written as
\[
C_i \frac{\mathrm{d}u_i}{\mathrm{d}t} = -\frac{u_i}{R_i} - \sum_{i' \in \mathcal{N}_i} T_{i,i'}\,(u_i - u_{i'}) + I_i . \tag{9.31}
\]
The above is the dynamic equation at neuron $i$ of the network. The diagram of the network circuit is shown in Fig.9.4. The synaptic current from $i$ to $i'$ is
\[
s_{i,i'} = T_{i,i'}\,(u_i - u_{i'}) .
\]
If the exponential is used, then
\[
s_{i,i'} = 2\lambda\,(u_i - u_{i'})\, e^{-(u_i - u_{i'})^2/\gamma} .
\]
A plot of this current versus the potential difference was shown earlier. The voltage-controlled nonlinear synaptic conductance $T_{i,i'}$, characterized by the $h$ function in (3.27), realizes the adaptive continuity control; the corresponding nonlinear current $s_{i,i'}$ realizes the adaptive smoothing. The current $s_{i,i'}$ diminishes asymptotically to zero as the potential difference between neurons $i$ and $i'$ reaches far beyond the band $B_\gamma$.
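The dynamic equation can be simulated by forward-Euler integration. The sketch below assumes the exponential conductance above, nearest-neighbor connections on a 1-D chain, and the correspondences $1/R_i = 2\chi_i$ and $I_i = 2\chi_i d_i$, so that the stationary points coincide with those of gradient descent on (9.25); these correspondences, like the time step and iteration count, are assumptions made for illustration.

```python
import numpy as np

def simulate_network(d, chi, lam, gamma, C=1.0, dt=1e-3, steps=20000):
    """Forward-Euler simulation of C du_i/dt = -u_i/R_i - sum_i' T_ii'(u_i - u_i') + I_i
    on a 1-D chain, with assumed 1/R_i = 2*chi_i, I_i = 2*chi_i*d_i and
    T_ii' = 2*lam*exp(-(u_i - u_i')^2/gamma)."""
    u = np.where(chi > 0, d, d.mean())       # crude initialization of the potentials
    inv_R = 2.0 * chi
    I = 2.0 * chi * d
    for _ in range(steps):
        diff = np.diff(u)                                   # u_{i+1} - u_i
        s = 2.0 * lam * diff * np.exp(-diff**2 / gamma)     # synaptic currents along the chain
        du = -inv_R * u + I
        du[:-1] += s                                        # current flowing into i from i+1
        du[1:] -= s                                         # and the opposite flow at i+1
        u = u + (dt / C) * du
    return u
```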
Figure 9.5: Stability of the DA solution under disturbances in
parameters.
Comparing the DA and the LP models, the former is more suitable for VLSI implementation than the latter. This is because the continuous shape of the adaptive conductance in the DA model is easier to implement using analog circuits than the piecewise-constant conductance in the LP model. This advantage of the DA model is also reflected in the work by Harris et al. [Harris et al. 1990] and [Lumsdaine et al. 1990]. There, the piecewise current-voltage characteristic is implemented by a resistive fuse. Interestingly, the resistive fuse is of finite gain and thus does not give rise to the sharp switch-off described by the piecewise characteristic. The actual current-voltage characteristic of the resistive fuse is more like that of the DA model than that of the LP model. It seems to satisfy all the requirements in (3.2) except for the continuity requirement. The definition of the DA model thus offers guidelines for the DA circuit design.
Fig.9.5 shows the behavior of the analog DA network under component defects such as manufacturing inadequacy, quality changes, etc. The defects are simulated by adding evenly distributed random noise to $R$, $C$ and $T$ in (9.31). The data $d$ contains missing points; the locations of the missing data, for which $\chi_i = 0$, are indicated by triangles at the bottom. The noise in the data is white Gaussian, with a different standard deviation in the left and right plots. The interaction function is chosen to be the exponential one. Solutions obtained with simulated component defects are shown as dashed lines, in comparison with those obtained without such noise, shown as thicker solid lines. The ideal signal is shown as thinner solid lines. As can be seen, there is only a little difference between the solutions obtained with and without such noise. This demonstrates not only the stability of the network circuit but also the error-tolerance property of the DA model.
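A defect experiment of this kind can be reproduced on the simulated network above by perturbing the nominal component values with uniformly distributed noise. The ±10% perturbation level below is an arbitrary assumption, as is the way the perturbed arrays would be wired into the hypothetical simulation routine sketched earlier.

```python
import numpy as np

def perturb(nominal, level, rng):
    """Multiply nominal component values by uniform noise in [1-level, 1+level],
    mimicking evenly distributed manufacturing defects in R, C and T."""
    return nominal * rng.uniform(1.0 - level, 1.0 + level, size=np.shape(nominal))

rng = np.random.default_rng(1)
n = 64
C = perturb(np.ones(n), 0.10, rng)            # +/-10% spread on the capacitances
inv_R = perturb(2.0 * np.ones(n), 0.10, rng)  # and on the leak conductances 1/R_i
T_scale = perturb(np.ones(n - 1), 0.10, rng)  # per-synapse scaling of T_ii'
# These arrays would replace the nominal C, 1/R_i and T_ii' inside the network
# simulation; comparing the resulting solution with the unperturbed one gives a
# stability check in the spirit of Fig.9.5.
```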