
6.2.4 Cross Validation

Lastly in this section, we describe an analytical technique called ``cross validation'' for estimating the smoothing parameter $\lambda$ in the regularization model. Consider the energy of the form

$$E(f) = \sum_{i=1}^{m} [f(x_i) - d_i]^2 + \lambda \int_a^b [f''(x)]^2 \, dx \eqno(6.72)$$

(cf. (1.86)) under additive white noise

$$d_i = f(x_i) + e_i, \qquad i = 1, \ldots, m \eqno(6.73)$$

where $e_i \sim N(0, \sigma^2)$ with unknown variance $\sigma^2$. The method requires no a priori knowledge of $\lambda$ and is entirely data driven. It has been studied in the context of regression and smoothing splines (see Chapter 4 of [Wahba 1990]). See also [Geiger and Poggio 1987, Shahraray and Anderson 1989, Thompson et al. 1991, Galatsanos and Katsaggelos 1992, Reeves 1992] for its applications in image and vision processing.
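To make the computation concrete, the following is a minimal numerical sketch, under the assumption that the data sit on a regular grid and the smoothness integral in (6.72) is discretized as a squared second-difference penalty $\|Df\|^2$; the function name and this discretization are illustrative, not part of the original formulation. Because the discretized energy is quadratic in $f$, its minimizer is linear in the data.

    import numpy as np

    def smooth(d, lam):
        """Minimizer of the discretized energy ||f - d||^2 + lam * ||D f||^2,
        a finite-difference analogue of (6.72).  The minimizer is linear in
        the data: f_lambda = (I + lam * D^T D)^{-1} d."""
        m = len(d)
        D = np.diff(np.eye(m), n=2, axis=0)  # rows [1, -2, 1]: discrete f''
        return np.linalg.solve(np.eye(m) + lam * D.T @ D, d)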

The idea of cross validation for parameter estimation is the following: Divide the data into an estimation subset and a validation subset. The former is used to obtain a parameter estimate and the latter is used to validate the performance under that estimate. However, cross validation does not use one subset exclusively for one purpose (estimation or validation); it allows all the data to be used for both purposes. For instance, we can divide the data into $m$ subsets, compute an estimate from all the subsets but one, and validate the estimate on the left-out subset. Then we perform the estimation-validation with a different subset left out. We repeat this $m$ times, as sketched below.
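This rotation of estimation and validation subsets can be written generically as follows; the callables fit and score are hypothetical stand-ins for whatever estimator and prediction-error measure the problem at hand uses.

    import numpy as np

    def cross_validate(data, n_subsets, fit, score):
        """Average validation score over all choices of the left-out subset."""
        folds = np.array_split(np.arange(len(data)), n_subsets)
        total = 0.0
        for k, held_out in enumerate(folds):
            kept = np.concatenate([f for j, f in enumerate(folds) if j != k])
            estimate = fit(data[kept])                 # estimate from all subsets but one
            total += score(estimate, data[held_out])   # validate on the left-out subset
        return total / len(folds)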

Ordinary cross validation (OCV) uses the ``leave-one-out'' strategy. Each data point is left out in turn and its value is predicted from the rest of the data using the fitted function. The prediction errors are summed. The best $\lambda$ is the one that minimizes the summed error. This is expressed as follows.

Let $f_\lambda^{[k]}$ be the minimizer of the following energy

$$E^{[k]}(f) = \sum_{i=1,\, i \neq k}^{m} [f(x_i) - d_i]^2 + \lambda \int_a^b [f''(x)]^2 \, dx \eqno(6.74)$$

in which the $k$-th data point is left out. The OCV function is defined as

$$V_0(\lambda) = \frac{1}{m} \sum_{k=1}^{m} \left[ f_\lambda^{[k]}(x_k) - d_k \right]^2 \eqno(6.75)$$

In the above, $[f_\lambda^{[k]}(x_k) - d_k]^2$ measures the error incurred by using $f_\lambda^{[k]}(x_k)$ to predict $d_k$. Therefore, $V_0(\lambda)$ is the average prediction error. The OCV estimate of $\lambda$ is the minimizer of $V_0(\lambda)$.
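In the discretized sketch introduced earlier, $V_0(\lambda)$ can be computed directly from this definition: refit with the $k$-th fidelity term removed and record the squared prediction error at the left-out point. The names below are again illustrative.

    import numpy as np

    def ocv(d, lam):
        """V_0(lam) of (6.75): minimize the analogue of (6.74) with the k-th
        data term left out, then average the squared prediction errors."""
        m = len(d)
        D = np.diff(np.eye(m), n=2, axis=0)    # discrete second difference
        R = lam * D.T @ D                      # smoothness part of the energy
        total = 0.0
        for k in range(m):
            W = np.eye(m)
            W[k, k] = 0.0                      # drop (f_k - d_k)^2 from the fidelity sum
            f = np.linalg.solve(W + R, W @ d)  # minimizer f_lambda^{[k]} on the full grid
            total += (f[k] - d[k]) ** 2        # prediction error at the left-out point
        return total / m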

Let $f_\lambda$ be the minimizer of the complete energy (6.72) under the parameter value $\lambda$. The relationship between $f_\lambda$ and $f_\lambda^{[k]}$ can be derived. Define the $m \times m$ influence matrix $A(\lambda)$ as satisfying

$$f_\lambda = A(\lambda) \, d \eqno(6.76)$$

where $f_\lambda = [f_\lambda(x_1), \ldots, f_\lambda(x_m)]^T$ and $d = [d_1, \ldots, d_m]^T$ are vectors. The OCV function can be expressed in terms of $A(\lambda)$ as follows [Craven and Wahba 1979]

$$V_0(\lambda) = \frac{1}{m} \sum_{k=1}^{m} \frac{[f_\lambda(x_k) - d_k]^2}{[1 - a_{kk}(\lambda)]^2} \eqno(6.77)$$

where $a_{kk}(\lambda)$ is the $k$-th diagonal element of $A(\lambda)$. By comparing (6.75) with (6.77), one obtains

$$f_\lambda^{[k]}(x_k) - d_k = \frac{f_\lambda(x_k) - d_k}{1 - a_{kk}(\lambda)} \eqno(6.78)$$
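In the same discrete sketch the influence matrix is explicit, $A(\lambda) = (I + \lambda D^T D)^{-1}$, so (6.77) replaces the $m$ leave-one-out fits by a single fit plus the diagonal of $A(\lambda)$; numerically the result agrees with the brute-force loop above to floating-point accuracy, which serves as a check of (6.78).

    import numpy as np

    def ocv_via_influence(d, lam):
        """V_0(lam) via (6.77): one global fit plus the diagonal of the
        influence matrix, instead of m separate leave-one-out fits."""
        m = len(d)
        D = np.diff(np.eye(m), n=2, axis=0)
        A = np.linalg.inv(np.eye(m) + lam * D.T @ D)  # influence matrix A(lam)
        f = A @ d                                     # f_lambda = A(lam) d, cf. (6.76)
        a_kk = np.diag(A)
        return np.mean(((f - d) / (1.0 - a_kk)) ** 2)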

OCV is modified into generalized cross validation (GCV) to achieve certain desirable invariance properties that do not generally hold for OCV. Let $\Gamma$ be any $m \times m$ orthogonal matrix and consider a new vector of data $\tilde{d} = \Gamma d$. In general, the OCV estimate for $\tilde{d}$ can give a different value of $\lambda$ from that for $d$; in contrast, the GCV estimate is invariant under this (rotation) transformation [Golub, Heath and Wahba 1979].

The GCV function is a weighted version of $V_0(\lambda)$:

$$V(\lambda) = \frac{1}{m} \sum_{k=1}^{m} \left[ f_\lambda^{[k]}(x_k) - d_k \right]^2 w_k(\lambda) \eqno(6.79)$$

where the weights

$$w_k(\lambda) = \left[ \frac{1 - a_{kk}(\lambda)}{\frac{1}{m} \, {\rm tr}\,(I - A(\lambda))} \right]^2 \eqno(6.80)$$

give the relative effect of leaving out $d_k$. It is obtained by replacing $1 - a_{kk}(\lambda)$ in (6.77) by $\frac{1}{m} {\rm tr}\,(I - A(\lambda))$. The GCV estimate $\lambda^*$ is the minimizer of $V(\lambda)$.
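Applying this replacement to (6.77) gives the closed form $V(\lambda) = \frac{1}{m}\|(I - A(\lambda))d\|^2 / [\frac{1}{m}{\rm tr}(I - A(\lambda))]^2$, which the sketch below evaluates and minimizes over a log-spaced grid in the same assumed discrete setting; the test signal is made up purely for illustration.

    import numpy as np

    def gcv(d, lam):
        """V(lam) of (6.79)-(6.80), in the closed form
        (1/m) ||(I - A) d||^2 / [(1/m) tr(I - A)]^2."""
        m = len(d)
        D = np.diff(np.eye(m), n=2, axis=0)
        B = np.eye(m) - np.linalg.inv(np.eye(m) + lam * D.T @ D)  # I - A(lam)
        r = B @ d                                                 # d - f_lambda
        return (r @ r / m) / (np.trace(B) / m) ** 2

    # Illustrative use: pick lambda on a log-spaced grid for a noisy test signal.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 100)
    d = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
    lam_star = min(np.logspace(-6.0, 2.0, 81), key=lambda lam: gcv(d, lam))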