4.2.1
Figure 4.3: The AM estimate of location.
From (Li 1995a) with permission; © 1995 Elsevier.
Simulated 2D data points are generated. Each data set is a mixture of true data points and outliers. First, $m$ true data points are randomly generated around the true location $(10, 10)$: their $x$ and $y$ coordinates are independent and identically distributed Gaussian variables with a fixed mean of 10 and variance $V$. Then a percentage $\lambda$ of the $m$ data points are replaced by random outlier values, uniformly distributed in a square centered at $(b, b)$. Four parameters control the data generation: the number of points $m$, the outlier percentage $\lambda$, the noise variance $V$, and the outlier center $b$.
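The generation procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the experiments: the outlier square is assumed to be centered at $(b, b)$, its side length `side` is a hypothetical parameter (its value is not recoverable here), the outlier percentage is passed as a fraction `outlier_frac` in [0, 1], and the points to replace are chosen uniformly at random.

```python
import math
import random
from typing import List, Set, Tuple

Point = Tuple[float, float]


def generate_data(m: int, outlier_frac: float, V: float, b: float,
                  side: float, rng: random.Random) -> Tuple[List[Point], Set[int]]:
    """Return m points (true points plus outliers) and the indices of the outliers.

    Assumptions for illustration: the outlier square is centered at (b, b)
    and has the hypothetical side length `side`.
    """
    sigma = math.sqrt(V)  # random.gauss expects a standard deviation, not a variance
    # True points: x and y i.i.d. Gaussian with mean 10 and variance V.
    points = [(rng.gauss(10.0, sigma), rng.gauss(10.0, sigma)) for _ in range(m)]

    # Replace round(outlier_frac * m) randomly chosen points by uniform outliers.
    k = round(outlier_frac * m)
    chosen = rng.sample(range(m), k)
    half = side / 2.0
    for i in chosen:
        points[i] = (rng.uniform(b - half, b + half), rng.uniform(b - half, b + half))
    return points, set(chosen)
```

For instance, `generate_data(50, 0.36, 4.0, 50.0, 100.0, random.Random(0))` yields 32 true points and 18 outliers, the same split as the data sets in Fig. 4.3 (the other argument values are arbitrary).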
Fig. 4.3 shows two typical data distributions and the estimated locations. Each of the two data sets contains 32 Gaussian-distributed true data points and 18 uniformly distributed outliers. The two sets differ only in the arrangement of the outliers; the true data points are common to both. The algorithm takes about 50 iterations to converge on each data set. The estimated locations for the two data sets are marked in Fig. 4.3. The experiments show that the estimated location is very stable regardless of the initial estimate, even though the outlier arrangements of the two sets are quite different. Without AM, the estimated location would have depended strongly on the initialization.
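Why annealing removes the dependence on initialization can be illustrated with a generic annealed M-estimator of location. The Python sketch below is not necessarily the exact AM-estimator of (Li 1995a): the Gaussian-type weight $\exp(-r^2/\gamma)$, the geometric schedule for the scale $\gamma$, and all parameter values are assumptions chosen for illustration. Each step is an iteratively reweighted mean. With a large initial $\gamma$ every point gets nearly equal weight, so the early estimates are close to the ordinary mean whatever the starting point; lowering $\gamma$ gradually then lets the outliers lose their influence. The fixed-scale M-estimator starts directly at the small scale and can lock onto whichever cluster lies near its initial estimate.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def reweighted_mean_step(points: Sequence[Point], est: Point, gamma: float) -> Point:
    """One reweighting step: weighted mean with weights exp(-r^2 / gamma),
    where r is the distance of each point from the current estimate."""
    ex, ey = est
    sw = sx = sy = 0.0
    for x, y in points:
        w = math.exp(-((x - ex) ** 2 + (y - ey) ** 2) / gamma)
        sw += w
        sx += w * x
        sy += w * y
    if sw == 0.0:  # every weight underflowed to zero: keep the current estimate
        return est
    return (sx / sw, sy / sw)


def m_estimate(points: Sequence[Point], init: Point, gamma: float,
               iters: int = 100) -> Point:
    """Fixed-scale M-estimate: iterate the reweighting step from `init`."""
    est = init
    for _ in range(iters):
        est = reweighted_mean_step(points, est, gamma)
    return est


def annealed_m_estimate(points: Sequence[Point], init: Point,
                        gamma0: float = 1e4, gamma_end: float = 1.0,
                        decay: float = 0.9, iters_per_level: int = 20) -> Point:
    """Annealed M-estimate: lower gamma geometrically from gamma0 to gamma_end."""
    est = init
    gamma = gamma0
    while gamma > gamma_end:
        for _ in range(iters_per_level):
            est = reweighted_mean_step(points, est, gamma)
        gamma *= decay
    # Finish with a few iterations at the target scale.
    return m_estimate(points, est, gamma_end, iters_per_level)
```

With a tight cluster of 32 points around (10, 10) and 18 outliers around (40, 40), the annealed version started from (0, 0), (40, 40), or (100, -50) ends near (10, 10), whereas the fixed-scale version started at (40, 40) stays on the outlier cluster. The initial scale `gamma0` has to be large compared with the squared distances in the data for this to hold.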
In a quantitative comparison, two quantities are used as the performance
measures: (1) the mean error versus the percentage of outliers
(PO)
and (2) the mean error
versus the noise
variance (NV) $V$. Let the Euclidean error be $e = \|\hat{p} - p\|$, where $\hat{p}$ is the estimate and $p$ is the true location.
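Spelled out, the mean error plotted in the comparison is the Euclidean error averaged over the random tests. Writing $\hat{p}_t = (\hat{x}_t, \hat{y}_t)$ for the estimate in the $t$-th test, $p = (10, 10)$ for the true location, and $N$ for the number of tests (notation introduced here; $N = 1000$ in the experiments reported below), it reads:

$$
\bar{e} = \frac{1}{N} \sum_{t=1}^{N} \left\| \hat{p}_t - p \right\|
        = \frac{1}{N} \sum_{t=1}^{N} \sqrt{(\hat{x}_t - 10)^2 + (\hat{y}_t - 10)^2}
$$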
Figs. 4.4 and 4.5 show the mean error of the AM-estimator and the M-estimator, respectively.
Each statistic in the simulated experiment is computed from 1000 random tests, and the data sets are exactly the same for the two compared estimators. Outliers are uniformly distributed in a square centered at $(b, b)$; the right columns use $b = 50$, and the left columns use a different center. The plots show the mean error versus the percentage of outliers with $m = 50$ (row 1) and $m = 200$ (row 2), and the mean error versus the noise variance with $m = 50$ (row 3) and $m = 200$ (row 4). The AM-estimator behaves very stably, with its error changing smoothly as the percentage of outliers and the noise variance increase; in contrast, the M-estimator not only gives higher errors but also behaves unstably.
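The evaluation protocol (many random tests, every estimator run on exactly the same data sets, scored by the mean Euclidean error) can be sketched as a small Python harness. It is illustrative only: `compare_estimators`, its parameters, and the two placeholder baselines are hypothetical names. In the experiment described above, the estimators would be the AM-estimator and the M-estimator, and `make_data` would implement the data generation described earlier.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Estimator = Callable[[Sequence[Point]], Point]


def compare_estimators(estimators: Dict[str, Estimator],
                       make_data: Callable[[random.Random], List[Point]],
                       true_loc: Point, trials: int = 1000,
                       seed: int = 0) -> Dict[str, float]:
    """Mean Euclidean error of each estimator over `trials` random tests.

    Each test draws ONE data set and feeds that same set to every estimator,
    so the estimators are always compared on identical data.
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name in estimators}
    for _ in range(trials):
        data = make_data(rng)
        for name, estimate in estimators.items():
            x, y = estimate(data)
            totals[name] += math.hypot(x - true_loc[0], y - true_loc[1])
    return {name: total / trials for name, total in totals.items()}


# Placeholder baselines for trying out the harness (not the AM- or M-estimator).
def coordinate_mean(points: Sequence[Point]) -> Point:
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def coordinate_median(points: Sequence[Point]) -> Point:
    return (statistics.median(p[0] for p in points),
            statistics.median(p[1] for p in points))
```

Drawing the data once per test and looping over the estimators inside that test is what guarantees the paired comparison; drawing separate data for each estimator would add sampling noise to the difference between them.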