4.1.4 The AM-Estimator
The annealing algorithm for the redefined M-estimator, called the AM-estimator [Li 1995e], is based on the idea of the GNC algorithm [Blake and Zisserman 1987]. It aims to overcome the local-minimum problem in M-estimation, i.e., to obtain a good estimate regardless of the initialization. It also makes the estimation free from parameters, or at least insensitive to their choices.
The annealing is performed by continuation in $\gamma$, and the AM-estimator is defined in the limit $\gamma \to 0^{+}$:

$$f^{*} = \lim_{\gamma \to 0^{+}} \arg\min_{f} E_{\gamma}(f)$$
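As a concrete instance, assume the exponential member of the APF family (other members of the family behave similarly):

$$h_{\gamma}(\eta) = -\gamma\, e^{-\eta^{2}/\gamma}$$

This function is strictly convex on the band $|\eta| < \sqrt{\gamma/2}$: for large $\gamma$ every residual lies inside the band and the weights $e^{-\eta^{2}/\gamma}$ are nearly constant, while as $\gamma \to 0^{+}$ the band shrinks and large residuals are increasingly ignored.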
An algorithm implementing the AM-estimator is given in Fig. 4.2. Initially, $\gamma$ is set to a value $\gamma^{(0)}$ high enough to guarantee that the corresponding APF $h_{\gamma}$ is convex in an interval. With such a $\gamma$, it is easy to find the unique minimum of the global error function $E_{\gamma}(f)$ using the gradient-descent method, regardless of the initial value for $f$. The minimum is then used as the initial value for the next phase of minimization under a lower $\gamma$ to obtain the next minimum. As $\gamma$ is lowered, $E_{\gamma}(f)$ may no longer be convex and local minima may appear. However, if we track the global minima for decreasing $\gamma$ values, we may hopefully approximate the global minimum under $\gamma \to 0^{+}$.
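A minimal sketch of this annealing loop in Python may help fix ideas. It is not the algorithm of Fig. 4.2 verbatim: it assumes location estimation with the exponential APF above, and it replaces gradient descent by the equivalent weighted-mean fixed-point update $f \leftarrow \sum_i w_i x_i / \sum_i w_i$ with weights $w_i = e^{-\eta_i^{2}/\gamma}$; the names am_estimate, kappa, gamma_min, and eps are ours, not from Fig. 4.2.

    import numpy as np

    def am_estimate(x, gamma0=None, kappa=0.9, gamma_min=1e-6, eps=1e-8):
        """Annealed M-estimate of the location of the samples x (a sketch)."""
        x = np.asarray(x, dtype=float)
        f = x.mean()  # the first minimum is the LS estimate (see below)
        if gamma0 is None:
            # Choose gamma^(0) so that every residual of the LS estimate
            # lies inside the convex band |eta| < sqrt(gamma/2).
            gamma0 = 2.0 * np.max((x - f) ** 2) + 1e-12
        gamma = gamma0
        while gamma > gamma_min:
            # Inner loop: minimize E_gamma(f) at the current gamma,
            # starting from the minimum found at the previous gamma.
            while True:
                w = np.exp(-((x - f) ** 2) / gamma)  # weights g_gamma(eta_i)
                f_new = np.sum(w * x) / np.sum(w)    # weighted-mean update
                done = abs(f_new - f) < eps
                f = f_new
                if done:
                    break
            gamma *= kappa  # lower(gamma): a geometric schedule
        return f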
Obviously, whatever the initialization is, the first iteration always gives a value equal to the LS estimate. This is because for $\gamma = \gamma^{(0)}$, which guarantees the strict convexity, all the weights take the same value. The initial value $\gamma^{(0)}$ is chosen to satisfy the following:

$$\max_{i} |\eta_{i}| < b_{H}(\gamma^{(0)})$$

where $b_{H}(\gamma)$ is the upper bound of the band in (3.29). This guarantees that every residual $\eta_{i}$ lies inside the band and hence the strict convexity of $E_{\gamma}(f)$.
The parameter $\gamma$ is lowered according to the schedule specified by the function $\mathrm{lower}(\gamma)$. The parameters $\varepsilon$ and $\gamma_{\min}$ in the convergence conditions are some small numbers. An alternative way is to decrease $\gamma$ according to a fixed schedule, regardless of whether or not $f$ has converged at the current $\gamma$; this is equivalent to setting a big value for $\varepsilon$. In this case, the algorithm freezes after dozens of iterations. This quick annealing is used in our experiments.
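In terms of the sketch above, the quick annealing amounts to passing a huge $\varepsilon$, so the inner loop exits after a single update and $\gamma$ simply follows the fixed geometric schedule (the data below are hypothetical):

    # Hypothetical data: a cluster near 1.0 plus one gross outlier.
    x = np.array([0.9, 1.1, 1.0, 0.95, 8.0])
    print(np.mean(x))                  # LS estimate, pulled toward the outlier
    print(am_estimate(x, eps=np.inf))  # quick annealing: fixed schedule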
The advantages of the AM-estimator are summarized below. Firstly, the use of annealing significantly improves the quality and stability of the estimate, making it independent of the initialization. Because the starting point for the minimization at the current $\gamma$ is the convergence point obtained with the previous $\gamma$ value, the divergence problem of the traditional M-estimator is minimized. Secondly, the definition of the AM-estimator effectively eliminates scale parameters in M-estimation because $\gamma$ is finally set to zero (or to a small number to whose value the final estimate is insensitive). This avoids the instability incurred by an inappropriate selection of the scale parameters. Furthermore, the AM-estimator needs no order statistics, such as the median, and hence no sorting; this improves the computational efficiency.