9.3 Graduated Non-Convexity
Graduated non-convexity (GNC) [Blake and Zisserman 1987] is a deterministic annealing method for approximating the global solution of nonconvex minimization of unconstrained, continuous problems such as (8.5). It finds good solutions at much lower cost than stochastic simulated annealing. Energies of the following form will be considered in the subsequent discussion of GNC:
\[
E(f) = \sum_{i} \chi_i\,(f_i - d_i)^2 + \lambda \sum_{i} g(f_i - f_{i-1}) \tag{9.25}
\]
where $\chi_i = 1$ if the datum $d_i$ is available and $\chi_i = 0$ otherwise. When $g$ is a nonconvex function, a gradient-based method such as (8.7) finds only a local minimum.
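As a concrete reading of (9.25), the snippet below evaluates the energy for a 1-D signal; the potential function g is passed in as a callable and the indicator array chi marks which data points are available. The function and variable names are illustrative, not taken from the text.

```python
import numpy as np

def energy(f, d, chi, lam, g):
    """E(f) = sum_i chi_i*(f_i - d_i)^2 + lam * sum_i g(f_i - f_{i-1}), cf. (9.25)."""
    closeness = np.sum(chi * (f - d) ** 2)     # fidelity to the available data
    smoothness = np.sum(g(np.diff(f)))         # pairwise potentials on first-order differences
    return closeness + lam * smoothness

# Example with the truncated quadratic potential g_alpha(eta) = min(eta^2, alpha):
f = np.array([0.0, 0.1, 0.2, 1.2, 1.3])
d = np.array([0.0, 0.2, 0.1, 1.1, 1.4])
chi = np.array([1, 1, 0, 1, 1])                # the third datum is missing
print(energy(f, d, chi, lam=5.0, g=lambda eta: np.minimum(eta**2, 1.0)))
```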
The idea of GNC is the following: The energy is parameterized by a control parameter; write $E^{(t)}$ for the energy under the parameter value in effect at step $t$. Initially, the parameter is set to a sufficiently large value such that $E^{(0)}$ is strictly convex. It is then easy to find the unique minimum $f^{(0)}$ of $E^{(0)}$, regardless of the initial $f$, using the gradient-descent method. The minimum $f^{(t)}$ found under the current parameter value is used as the initial value for the next phase of minimization, under a lower parameter value, to obtain the next minimum $f^{(t+1)}$. As the parameter is lowered, $E^{(t)}$ becomes nonconvex and local minima appear. However, if we track the sequence of minima as the parameter decreases from the high initial value to the target value, we may approximate the global minimum under the target value, i.e. the global minimum $f^*$ of the target energy $E$.
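The continuation structure just described can be written compactly as a loop. The sketch below is illustrative only: the callable names energy_grad and minimize_from, and the multiplicative decrease schedule, are assumptions rather than anything prescribed in the text.

```python
def gnc(d, energy_grad, param_init, param_target, decrease, minimize_from):
    """Generic graduated non-convexity loop (illustrative sketch).

    energy_grad(f, param) -> gradient of the parameterized energy E^(param) at f
    minimize_from(f0, grad) -> local minimum found by a gradient-based method
    """
    param = param_init                                       # large enough that E^(param) is strictly convex
    f = minimize_from(d, lambda f: energy_grad(f, param))    # unique minimum f^(0)
    while param > param_target:
        param = max(param * decrease, param_target)          # lower the control parameter
        # the previous minimum seeds the next, now nonconvex, minimization
        f = minimize_from(f, lambda f: energy_grad(f, param))
    return f
```

Any gradient-based local minimizer can be plugged in as minimize_from; a concrete instance for the DA model is sketched after the algorithm description below.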
The first term on the RHS of (9.25) is quadratic and hence
strictly convex with respect to f. The second term may or may not be
convex, depending on f and g. Its non-convexity could be partly
compensated by the convexity of the closeness term. If $g''(\eta) > 0$ for all $\eta$, then the second term is strictly convex and so is $E(f)$. If $g''(\eta) < 0$ for some $\eta$ (cf. Eq.(3.10)), the second term is nonconvex and so, in general, is $E(f)$. However, if the function $g$, together with the involved parameters, is chosen to satisfy
\[
\lambda\, g''(\eta) > -\frac{1}{c} \qquad \forall \eta
\]
where $c > 0$ is some constant, then the convexity of $E(f)$ is guaranteed. The value of $c$ depends on the type of the model and on whether the data $d$ is complete. Provided $d_i$ are available for all $i$, the value of $c$ is a small model-dependent constant, different for the string, membrane, rod and plate models; see [Blake and Zisserman 1987]. If $d_i$ are missing altogether, it requires $g''(\eta) \geq 0$, so that the second term is convex by itself, to ensure the convexity of $E(f)$. The practical situation is generally better than this worst case because the data cannot be missing altogether.
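To see where a bound of this form comes from, consider the string model with complete data. The following is a sketch under the assumption that the smoothness term couples first-order differences only; the constant obtained is illustrative and may differ from the one used by Blake and Zisserman.
\[
\nabla^2 E(f) = 2I + \lambda\, D^{\mathsf T}\,\mathrm{diag}\!\big(g''(f_i - f_{i-1})\big)\, D ,
\]
where $D$ is the first-order difference operator. Since the eigenvalues of $D^{\mathsf T} D$ lie in $[0,4]$,
\[
\nabla^2 E(f) \succeq \Big(2 - 4\lambda \sup_{\eta}\big(-g''(\eta)\big)\Big) I ,
\]
which is positive definite whenever $\lambda\, g''(\eta) > -1/2$ for all $\eta$.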
Therefore, to construct an initially convex $E^{(0)}$, we can choose an initial parameter $\gamma^{(0)}$ for which $g''_{\gamma^{(0)}}(f_i - f_{i-1}) \geq 0$ for all $i$. This is equivalent to choosing a $\gamma^{(0)}$ such that $f_i - f_{i-1} \in B_{\gamma^{(0)}}$ for all $i$, where $B_{\gamma}$ is the band defined in (3.29). For APFs 1, 2 and 3, the $\gamma^{(0)}$ must be larger than $2v$, $3v$ and $v$, respectively, where $v = \max_i (f_i - f_{i-1})^2$.
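The choice of the initial parameter can be checked numerically. The snippet below assumes that the first APF has the exponential form $g_\gamma(\eta) = -\gamma e^{-\eta^2/\gamma}$ and uses the string energy (9.25) with complete data; the toy signal and the 5% safety margin are arbitrary.

```python
import numpy as np

def g2_deriv(eta, gamma):
    """Second derivative g''(eta) of the assumed exponential APF g = -gamma*exp(-eta^2/gamma)."""
    return 2.0 * np.exp(-eta**2 / gamma) * (1.0 - 2.0 * eta**2 / gamma)

rng = np.random.default_rng(0)
d = np.cumsum(rng.normal(0, 0.1, 64)) + (np.arange(64) > 32)  # toy 1-D signal with a step
lam = 10.0                                                    # smoothing weight lambda in (9.25)

f0 = d.copy()                          # initialize f with the data
v = np.max(np.diff(f0) ** 2)           # largest squared first-order difference
gamma0 = 2.0 * v * 1.05                # gamma^(0) > 2v puts every difference inside the band

# Hessian of (9.25) at f0 for the string with complete data: 2*I + lam * D^T diag(g'') D
n = len(f0)
D = np.diff(np.eye(n), axis=0)                    # first-order difference operator
H = 2.0 * np.eye(n) + lam * D.T @ np.diag(g2_deriv(np.diff(f0), gamma0)) @ D
print("E convex at f0:", np.all(np.linalg.eigvalsh(H) > 0))
```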
Figure 9.3: A GNC algorithm for finding the DA solution.
The GNC algorithm is outlined in Fig.9.3. Given $d$, $\lambda$ and a target value $\gamma^{\mathrm{target}}$ for $\gamma$, the algorithm aims to construct a sequence $f^{(t)}$ ($t = 0, 1, \ldots$) and thus to approach the global minimum $f^*$ for which $E(f^*) = \min_f E(f)$ under $\gamma^{\mathrm{target}}$. In the algorithm, $\varepsilon$ is a constant for judging the convergence, and $\kappa$ ($0 < \kappa < 1$) is a factor for decreasing $\gamma$ towards $\gamma^{\mathrm{target}}$, via $\gamma^{(t+1)} = \kappa\,\gamma^{(t)}$. The choice of $\kappa$ controls the balance between the quality of the solution and the computational time. In principle, $\gamma$ should vary continuously to keep a good track of the global minimum. In discrete computation, we choose a $\kappa$ close to one. A more rapid decrease of $\gamma$ (with a smaller $\kappa$) is likely to lead the system to an unfavorable local minimum. Witkin et al. (1987) presented a more sophisticated scheme for decreasing the continuation parameter, in which the step is related to the resulting change in energy and controlled by two constants. This seems reasonable.
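As a concrete illustration of the loop in Fig.9.3, the sketch below minimizes the 1-D energy (9.25) by plain gradient descent inside a $\gamma$-continuation, again assuming the exponential APF; the step size, the rule for $\gamma^{(0)}$, the default $\kappa$ and $\varepsilon$, and the missing-data interpolation are all assumptions, not values taken from the text.

```python
import numpy as np

def g_prime(eta, gamma):
    """First derivative of the assumed exponential APF: g(eta) = -gamma*exp(-eta^2/gamma)."""
    return 2.0 * eta * np.exp(-eta**2 / gamma)

def grad_E(f, d, chi, lam, gamma):
    """Gradient of E(f) = sum_i chi_i*(f_i - d_i)^2 + lam * sum_i g(f_i - f_{i-1})."""
    g = 2.0 * chi * (f - d)
    s = g_prime(np.diff(f), gamma)       # smoothness forces on each pair (i-1, i)
    g[1:] += lam * s
    g[:-1] -= lam * s
    return g

def gnc_da(d, chi, lam, gamma_target, kappa=0.95, eps=1e-6, step=1e-3):
    """GNC loop: decrease gamma by the factor kappa, minimizing E by gradient descent."""
    f = np.where(chi > 0, d, np.interp(np.arange(len(d)),
                                       np.flatnonzero(chi), d[chi > 0]))  # fill missing data
    gamma = max(2.0 * np.max(np.diff(f) ** 2) * 1.05, gamma_target)       # convex start
    while True:
        for _ in range(100000):                        # inner minimization under current gamma
            g = grad_E(f, d, chi, lam, gamma)
            f = f - step * g
            if np.max(np.abs(g)) < eps:                # epsilon judges the convergence
                break
        if gamma <= gamma_target:
            return f
        gamma = max(kappa * gamma, gamma_target)       # lower gamma towards the target
```

A smaller $\kappa$ makes the schedule faster but, as noted above, risks ending in a poorer local minimum.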
To approximate the truncated quadratic function $g_\alpha(\eta) = \min\{\eta^2, \alpha\}$, Blake and Zisserman (1987) construct the following function
\[
g^{(p)}_\alpha(\eta) =
\begin{cases}
\eta^2 & \text{if } |\eta| < q \\
\alpha - c\,(|\eta| - r)^2/2 & \text{if } q \le |\eta| < r \\
\alpha & \text{if } |\eta| \ge r
\end{cases}
\]
where $p$ is a continuation parameter and the constants $c$, $q$ and $r$ depend on $\alpha$, $\lambda$ and $p$, chosen so that $g^{(p)}_\alpha$ is continuous together with its first derivative. The corresponding interaction function is $h^{(p)}_\alpha(\eta) = g^{(p)\prime}_\alpha(\eta)/(2\eta)$. The continuation parameter $p$ is decreased from 1 to 0. When $p=1$, the corresponding energy $E^{(1)}$ is convex, where $E^{(p)}$ denotes the energy with $g$ replaced by $g^{(p)}_\alpha$. When $p=0$, $E^{(0)}$ equals the original energy. A performance comparison between GNC and SA for reconstruction is given in [Blake 1989].
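The following sketch spells out one concrete choice of the constants in $g^{(p)}_\alpha$: given $c$, the breakpoints $r$ and $q$ are set so that the three pieces join with matching value and slope. The particular dependence $c = c^*/p$, the default value of $c^*$, and the absorption of the $\lambda$ scaling into $c^*$ are assumptions made here for illustration.

```python
import numpy as np

def g_trunc(eta, alpha):
    """Truncated quadratic: g_alpha(eta) = min(eta^2, alpha)."""
    return np.minimum(eta**2, alpha)

def g_p(eta, alpha, p, c_star=0.5):
    """C^1 piecewise approximation of the truncated quadratic used in GNC.

    c = c_star / p controls the (negative) curvature of the middle piece
    (c_star is an assumed constant); r and q are chosen so that value and
    slope match at the breakpoints:
        r = sqrt(alpha * (2/c + 1)),  q = alpha / r.
    As p -> 0, c -> infinity, q, r -> sqrt(alpha) and g_p -> g_trunc.
    """
    c = c_star / p
    r = np.sqrt(alpha * (2.0 / c + 1.0))
    q = alpha / r
    a = np.abs(eta)
    return np.where(a < q, eta**2,
                    np.where(a < r, alpha - 0.5 * c * (a - r)**2, alpha))

# At p = 1 the approximation is smooth and gently curved; at small p it hugs g_trunc.
eta = np.linspace(-3, 3, 7)
print(g_p(eta, alpha=1.0, p=1.0))
print(g_p(eta, alpha=1.0, p=0.01))
print(g_trunc(eta, alpha=1.0))
```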
However, the complexity of the above convexity treatment can be avoided. Instead of modifying the definition of $g$, we can simply modify the parameter $\gamma$ in it. To achieve graduated non-convexity, we can use $\gamma$ as the control parameter and decrease it from a sufficiently high value, $\gamma^{(0)}$, towards the target value. Let $f$ be initialized so that $f_i - f_{i-1} \in B_{\gamma^{(0)}}$ for all $i$. Then the energy is convex at such an $f$. An assumption is that, if so initialized, $f^{(t)}_i - f^{(t)}_{i-1} \in B_{\gamma^{(t)}}$ always holds at any $t$ during the iteration. Parameter values for the convexity obtained in this way are related to the band $B$ defined in (3.29) for the DA (discontinuity-adaptive) model. In a similar way, parameter values for the convexity can be derived for the LP (line process) approximation models given by [Koch et al. 1986, Yuille 1987, Geiger and Girosi 1991].
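The working assumption that the differences stay inside the shrinking band can be monitored during the iteration. The check below assumes the exponential APF, whose band would then be $B_\gamma = \{\eta : \eta^2 \le \gamma/2\}$; the helper name is hypothetical.

```python
import numpy as np

def inside_band(f, gamma):
    """True if every first-order difference of f lies in the assumed band
    B_gamma = {eta : eta^2 <= gamma/2} of the exponential APF."""
    return bool(np.all(np.diff(f) ** 2 <= gamma / 2.0))

# Example: call inside_band(f, gamma) after each inner minimization in the GNC loop;
# a False return flags that the convexity-at-f argument no longer applies.
```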
Figure 9.4: Schematic diagram of the analog network circuit.
The computation can be performed using an analog network. Let $u_i$ be the potential of neural cell $i$. Let $C_i$ be the membrane capacitance and $R_i$ the membrane resistance. Let $T_{i,i'}$ be the conductance, or synaptic efficacy, between neurons $i$ and $i'$, where $i' \in \mathcal{N}_i$ and $T_{i,i'} = 2\lambda\, h_\gamma(u_i - u_{i'})$, with $h_\gamma$ the interaction function in (3.27). If the exponential interaction function $h_\gamma(\eta) = e^{-\eta^2/\gamma}$ is used, then
\[
T_{i,i'} = 2\lambda\, e^{-(u_i - u_{i'})^2/\gamma} .
\]
Let $I_i$ be the external current input to $i$, with $I_i = 0$ when $\chi_i = 0$. Now (9.4) can be written as
\[
C_i \frac{\mathrm{d}u_i}{\mathrm{d}t} = -\frac{u_i}{R_i} - \sum_{i' \in \mathcal{N}_i} T_{i,i'}\,(u_i - u_{i'}) + I_i . \tag{9.31}
\]
The above is the dynamic equation at neuron $i$ of the network. The diagram of the network circuit is shown in Fig.9.4. The synaptic current from $i$ to $i'$ is
\[
s_{i,i'} = T_{i,i'}\,(u_i - u_{i'}) .
\]
If the exponential is used, then
\[
s_{i,i'} = 2\lambda\,(u_i - u_{i'})\, e^{-(u_i - u_{i'})^2/\gamma} .
\]
A plot of this current versus the potential difference was shown earlier. The voltage-controlled nonlinear synaptic conductance $T_{i,i'}$, characterized by the $h$ function in (3.27), realizes the adaptive continuity control; the corresponding nonlinear current $s_{i,i'}$ realizes the adaptive smoothing. The current $s_{i,i'}$ diminishes asymptotically to zero as the potential difference between neurons $i$ and $i'$ reaches far beyond the band $B_\gamma$.
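The dynamic equation can be simulated by forward-Euler integration. The sketch below assumes the exponential conductance above, nearest-neighbor connections on a 1-D chain, and the correspondences $1/R_i = 2\chi_i$ and $I_i = 2\chi_i d_i$, so that the stationary points coincide with those of gradient descent on (9.25); these correspondences, like the time step and iteration count, are assumptions made for illustration.

```python
import numpy as np

def simulate_network(d, chi, lam, gamma, C=1.0, dt=1e-3, steps=20000):
    """Forward-Euler simulation of C du_i/dt = -u_i/R_i - sum_i' T_ii'(u_i - u_i') + I_i
    on a 1-D chain, with assumed 1/R_i = 2*chi_i, I_i = 2*chi_i*d_i and
    T_ii' = 2*lam*exp(-(u_i - u_i')^2/gamma)."""
    u = np.where(chi > 0, d, d.mean())       # crude initialization of the potentials
    inv_R = 2.0 * chi
    I = 2.0 * chi * d
    for _ in range(steps):
        diff = np.diff(u)                                   # u_{i+1} - u_i
        s = 2.0 * lam * diff * np.exp(-diff**2 / gamma)     # synaptic currents along the chain
        du = -inv_R * u + I
        du[:-1] += s                                        # current flowing into i from i+1
        du[1:] -= s                                         # and the opposite flow at i+1
        u = u + (dt / C) * du
    return u
```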
Figure 9.5: Stability of the DA solution under disturbances in
parameters.
Comparing the DA and the LP models, the former is more suitable for VLSI implementation than the latter. This is because the continuous shape of the adaptive conductance in the DA model is easier to implement using analog circuits than the piecewise-constant conductance in the LP model. This advantage of the DA model is also reflected in the work by Harris et al. [Harris et al. 1990] and [Lumsdaine et al. 1990]. There, the piecewise current-voltage characteristic is implemented by a resistive fuse. Interestingly, the resistive fuse is of finite gain and thus does not give rise to the sharp switch-off described by the piecewise characteristic. The actual current-voltage characteristic of the resistive fuse is more like that of the DA model than that of the LP model. It seems to satisfy all the requirements in (3.2) except for the continuity requirement. The definition of the DA model thus offers guidelines for the DA circuit design.
Fig.9.5 shows the behavior of the analog DA network under component defects such as manufacturing inadequacy, quality changes, etc. The defects are simulated by adding evenly distributed random noise to $R$, $C$ and $T$ in (9.31). The data $d$ contains missing points; the locations of the missing data, for which $\chi_i = 0$, are indicated by triangles at the bottom. The noise in the data is white Gaussian, with a different standard deviation in the left and right plots. The interaction function is chosen to be the exponential one. Solutions obtained with simulated component defects are shown as dashed lines, in comparison with those obtained without such noise, shown as thicker solid lines. The ideal signal is shown as thinner solid lines. As can be seen, there is only a little difference between the solutions obtained with and without such noise. This demonstrates not only the stability of the network circuit but also the error-tolerance property of the DA model.
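A defect experiment of this kind can be reproduced on the simulated network above by perturbing the nominal component values with uniformly distributed noise. The ±10% perturbation level below is an arbitrary assumption, as is the way the perturbed arrays would be wired into the hypothetical simulation routine sketched earlier.

```python
import numpy as np

def perturb(nominal, level, rng):
    """Multiply nominal component values by uniform noise in [1-level, 1+level],
    mimicking evenly distributed manufacturing defects in R, C and T."""
    return nominal * rng.uniform(1.0 - level, 1.0 + level, size=np.shape(nominal))

rng = np.random.default_rng(1)
n = 64
C = perturb(np.ones(n), 0.10, rng)            # +/-10% spread on the capacitances
inv_R = perturb(2.0 * np.ones(n), 0.10, rng)  # and on the leak conductances 1/R_i
T_scale = perturb(np.ones(n - 1), 0.10, rng)  # per-synapse scaling of T_ii'
# These arrays would replace the nominal C, 1/R_i and T_ii' inside the network
# simulation; comparing the resulting solution with the unperturbed one gives a
# stability check in the spirit of Fig.9.5.
```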