Linear Classification Function

Next: A Non-parametric Learning Algorithm Up: Theory of Parameter Estimation Previous: Optimality

7.2.3

Linear Classification Function

In this work, we are interested in cases where the is linear in . Assume that with a non-parametric or partially parametric modeling method, the energy is derived to take the linear form (7.2). With the linear form, the energy change can be written as

where

is the potential changes. Denote the set of all potential changes by

Note that excludes , the vector of zeros. The set will be used as the training data set. When there are L>1 exemplar instances,

contains training data from all the instances. is the K+1 dimensional data space in which x's are points.

The correctness (7.7) can be written as

In pattern recognition, is called a linear classification function when considered as a function of x; is called a generalized linear classification function when considered as a function of f. There exist useful theories and algorithms for linear pattern classification functions [Duda and Hart 1973].

Equation is a hyperplane in the space , passing through the origin . With a correct , the hyperplane divides the space into two parts, with all () in one side, more exactly the ``positive'' side, of the hyperplane. The Euclidean distance from x to the hyperplane is equal to , a signed quantity. After the normalization (7.4), the point-to-hyperplane distance is just .

The correctness can be judged by checking the minimal distance from the point set to the hyperplane. We define the ``separability'' as the smallest distance value

The above can also be considered as the stability of the system with a given . The correctness is equivalent to the positivity of the separability.

It is helpful to visually illustrate the optimality using . With , the minimal-instability solution is the same as the following minimax solution

In this case, and the above is maximal-separability.

Fig.7.1 qualitatively illustrates correct/incorrect and the maximal- separability parameters. It is an example in two dimensional space where an energy change takes the form of . The point coincides with the origin of the -- space. Data () are shown in filled dots. The three lines , and represent three hyperplanes, corresponding to three different estimates of parameters . Parameter estimates for and are correct ones because they make all the data points in the positive side of the hyperplane and thus satisfy Eq.(7.7). However, is not a correct one. Of the two correct estimates in Fig.7.1, the one corresponding to is better than the others in terms of .

The separability determines the range of disturbances in x within which the exemplary configuration remains minimal. Refer to point in the figure. Its distance to is the smallest. The point may easily deviate across to the negative half space due to some perturbation in the observation d. When this happens, , causing to be lower than . This means that is no longer the energy minimum. If the parameters are chosen as those corresponding to , the separability is larger and the violation of the correctness is less likely to happen.

The dashed polygon in the figure forms the convex hull (polytope) of the data set. Only those data points which form the hull affect the minimax solution whereas those inside the hull are ineffective. The ineffective data points inside the hull can be removed and only the effective points, the number of which may be small compared to the whole data set, need be considered in the solution finding process. This increases the efficiency.

There are two possibilities for the orientation of the maximal-separability hyperplane with respect to the polytope. The first is that the optimal hyperplane is parallel to one of the sides (faces in cases of 3D or higher dimension parameter space) of the polytope, which is the case in Fig.7.1. The other is that the optimal hyperplane is perpendicular to the line linking the origin and the nearest data point, which is the case in Fig.7.2. This is a property we may use to find the minimax solution. When the points constituting the polytope are identified, all the possible solutions can be enumerated; when there are only a small number of them, we can find the minimax solution by an exhaustive comparison. In Fig.7.2, is parallel to , and is parallel to but is perpendicular to . Let , and be the sets of parameters corresponding to , and . Suppose in the figure, the following relations hold: and ; then is the best estimate because it maximizes the separability.

The minimax solution (7.23) with was used in the above only for illustration. With , a continuous change in x may lead to a discontinuous change in the -optimal solution. This is due to the decision-making nature of the definition which may cause discontinuous changes in the minimum. Other p-instability definitions, with p=1 or 2 for example, are used in our practice because they give more stable estimates.

Next: A Non-parametric Learning Up: Theory of Parameter Previous: Optimality