7.2.3
In this work, we are interested in cases where the
is linear in
. Assume that with a non-parametric or partially
parametric modeling method, the energy is derived to take the linear
form (7.2). With the linear form, the
energy change can be written as
where
is the potential changes. Denote the set of all potential changes by
Note that excludes
, the vector of zeros. The
set
will be used as the training data set. When there are
L>1 exemplar instances,
contains training data from all the instances. is the K+1
dimensional data space in which x's are points.
The correctness (7.7) can be written as
In pattern recognition, is called a linear
classification function when considered as a function of x;
is called a generalized linear classification function when
considered as a function of f. There exist useful theories and
algorithms for linear pattern classification functions
[Duda and Hart 1973].
Equation is a hyperplane in the space
, passing
through the origin
. With a correct
, the
hyperplane divides the space into two parts, with all
(
)
in one side, more exactly the ``positive'' side, of the hyperplane. The
Euclidean distance from x to the hyperplane is equal to
, a signed quantity. After the normalization
(7.4), the point-to-hyperplane distance is
just
.
The correctness can be judged by checking the minimal distance from the
point set to the hyperplane. We define the ``separability'' as
the smallest distance value
The above can also be considered as the stability of the system with a
given . The correctness is equivalent to the positivity of the
separability.
It is helpful to visually illustrate the optimality using .
With
, the minimal-instability solution is the same as the
following minimax solution
In this case, and the above is
maximal-separability.
Figure 7.1: Correct and incorrect parameter estimates. A case where the
-optimal hyperplane is parallel to
.
From (Li 1995c) with permission; © 1995 Kluwer.
Fig.7.1 qualitatively illustrates
correct/incorrect and the maximal- separability parameters. It is an
example in two dimensional space where an energy change takes the form
of . The point
coincides with the origin of the
--
space. Data
(
) are shown in filled dots.
The three lines
,
and
represent three
hyperplanes, corresponding to three different estimates of parameters
. Parameter estimates for
and
are correct
ones because they make all the data points in the positive side of the
hyperplane and thus satisfy Eq.(7.7). However,
is not a correct one. Of the two correct estimates in
Fig.7.1, the one corresponding to
is
better than the others in terms of
.
The separability determines the range of disturbances in x within
which the exemplary configuration remains minimal. Refer to
point
in the figure. Its distance to
is the
smallest. The point may easily deviate across
to the negative
half space due to some perturbation in the observation d. When this
happens,
, causing
to
be lower than
. This means that
is no
longer the energy minimum. If the parameters are chosen as those
corresponding to
, the separability is larger and the violation
of the correctness is less likely to happen.
The dashed polygon in the figure forms the convex hull (polytope) of the data set. Only those data points which form the hull affect the minimax solution whereas those inside the hull are ineffective. The ineffective data points inside the hull can be removed and only the effective points, the number of which may be small compared to the whole data set, need be considered in the solution finding process. This increases the efficiency.
Figure 7.2: A case where the optimal hyperplane is perpendicular to
.
From (Li 1995c) with permission; © 1995 Kluwer.
There are two possibilities for the orientation of the
maximal-separability hyperplane with respect to the polytope. The first
is that the optimal hyperplane is parallel to one of the sides (faces in
cases of 3D or higher dimension parameter space) of the polytope, which
is the case in Fig.7.1. The other is that the
optimal hyperplane is perpendicular to the line linking the origin
and the nearest data point, which is the case in
Fig.7.2. This is a property we may use to find
the minimax solution. When the points constituting the polytope are
identified, all the possible solutions can be enumerated; when there are
only a small number of them, we can find the minimax solution by an
exhaustive comparison. In Fig.7.2,
is parallel to
, and
is parallel to
but
is perpendicular to
. Let
,
and
be
the sets of parameters corresponding to
,
and
. Suppose in the figure, the following relations hold:
and
; then
is the best estimate because it maximizes the separability.
The minimax solution (7.23) with was used
in the above only for illustration. With
, a continuous
change in x may lead to a discontinuous change in the
-optimal solution. This is due to the decision-making nature
of the definition which may cause discontinuous changes in the minimum.
Other p-instability definitions, with p=1 or 2 for example, are used
in our practice because they give more stable estimates.