6.2.4 Cross Validation
Lastly in this section, we describe an analytical technique called
``cross validation'' for estimating the smoothing parameter $\lambda$
in the regularization model. Consider the energy of the form
\[
E(f) = \sum_{i=1}^{m} \left[ f(x_i) - d_i \right]^2
     + \lambda \int_a^b \left[ f''(x) \right]^2 dx
\tag{6.72}
\]
(cf. (1.86)) under additive white noise $d_i = f(x_i) + e_i$, where
$e_i \sim N(0, \sigma^2)$ with unknown variance $\sigma^2$. The method
requires no a priori knowledge of $\sigma^2$ and is entirely data
driven. It has been studied in the context of regression and smoothing
splines (see Chapter 4 of [Wahba 1990]). See also
[Geiger and Poggio 1987, Shahraray and Anderson 1989, Thompson et al. 1991,
Galatsanos and Katsaggelos 1992, Reeves 1992]
for its applications in image and vision processing.
The idea of cross validation for parameter estimation is the following. Divide the data into an estimation subset and a validation subset; the former is used to obtain a parameter estimate, and the latter is used to validate the performance under that estimate. However, cross validation does not reserve one subset exclusively for one purpose (estimation or validation); it allows all the data to be used for both purposes. For instance, we can divide the data into $m$ subsets, compute an estimate from all the subsets but one, and validate the estimate on the left-out subset. Then we perform the estimation-validation with a different subset left out, repeating this $m$ times, as sketched below.
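As an illustration, the following is a minimal sketch of this estimation-validation loop, assuming the data is a vector indexed by position; the helper callbacks \texttt{fit} and \texttt{predict}, standing in for any concrete estimator, are hypothetical and not part of the formulation above.
\begin{verbatim}
import numpy as np

def cross_validate(d, lambdas, fit, predict, m):
    # Return the lambda with the smallest average validation error
    # over m estimation-validation rounds.
    folds = np.array_split(np.arange(len(d)), m)
    scores = []
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            keep = np.setdiff1d(np.arange(len(d)), fold)
            model = fit(d, keep, lam)      # estimate from all but one subset
            pred = predict(model, fold)    # validate on the left-out subset
            err += np.sum((pred - d[fold]) ** 2)
        scores.append(err / len(d))
    return lambdas[int(np.argmin(scores))]
\end{verbatim}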
Ordinary cross validation (OCV) uses a ``leaving-out-one'' strategy. Each
data point is left out in turn, and its value is predicted from the rest of
the data using the fitted function. The prediction errors are
summed, and the best $\lambda$ is the one that minimizes the summed error.
This is expressed as follows.
Let $f_\lambda^{[k]}$ be the minimizer of the following energy
\[
E^{[k]}(f) = \sum_{i \neq k} \left[ f(x_i) - d_i \right]^2
           + \lambda \int_a^b \left[ f''(x) \right]^2 dx
\]
in which $d_k$ is left out. The OCV function is defined as
\[
V_0(\lambda) = \frac{1}{m} \sum_{k=1}^{m}
               \left[ f_\lambda^{[k]}(x_k) - d_k \right]^2
\tag{6.75}
\]
In the above, $[f_\lambda^{[k]}(x_k) - d_k]^2$ measures the error
incurred by using $f_\lambda^{[k]}$ to predict $d_k$. Therefore,
$V_0(\lambda)$ is the average prediction error. The OCV estimate of
$\lambda$ is the minimizer of $V_0(\lambda)$.
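A direct computation of $V_0(\lambda)$ refits the model $m$ times, once per left-out point. The sketch below does this for a discretized version of the problem, assuming (as an illustration, not from the text) that the curvature penalty in (6.72) is replaced by a squared second-difference penalty $\lambda \|Df\|^2$ on a regular grid, so that each leave-one-out fit is a linear solve.
\begin{verbatim}
import numpy as np

def second_difference(m):
    # (m-2) x m second-order difference matrix; a discrete stand-in
    # for the curvature penalty in (6.72).
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def ocv_naive(d, lam):
    # Naive OCV score V0(lambda): refit with each d_k left out.
    m = len(d)
    D = second_difference(m)
    err = 0.0
    for k in range(m):
        s = np.ones(m)
        s[k] = 0.0                          # drop the k-th data term
        S = np.diag(s)
        # minimizer of sum_{i != k} (f_i - d_i)^2 + lam * ||D f||^2
        f = np.linalg.solve(S + lam * D.T @ D, S @ d)
        err += (f[k] - d[k]) ** 2
    return err / m
\end{verbatim}
The OCV estimate is then the $\lambda$ on a candidate grid with the smallest \texttt{ocv\_naive} score.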
Let $f_\lambda$ be the minimizer of the complete energy
(6.72) under the parameter value $\lambda$. The
relationship between $f_\lambda^{[k]}$ and $f_\lambda$ can be derived.
Define the $m \times m$ influence matrix $A(\lambda)$ as satisfying
\[
f_\lambda = A(\lambda)\, d
\]
where $f_\lambda = [f_\lambda(x_1), \ldots, f_\lambda(x_m)]^T$ and
$d = [d_1, \ldots, d_m]^T$ are $m \times 1$ vectors. The OCV function can
be expressed in terms of $f_\lambda$ as follows [Craven and Wahba 1979]
\[
V_0(\lambda) = \frac{1}{m} \sum_{k=1}^{m}
               \frac{\left[ f_\lambda(x_k) - d_k \right]^2}
                    {\left[ 1 - a_{kk}(\lambda) \right]^2}
\tag{6.77}
\]
where $a_{kk}(\lambda)$ is the $k$-th diagonal element of $A(\lambda)$.
By comparing (6.75) with (6.77), one obtains
\[
f_\lambda^{[k]}(x_k) - d_k = \frac{f_\lambda(x_k) - d_k}{1 - a_{kk}(\lambda)}
\]
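Equation (6.77) avoids the $m$ refits: a single solve yields $f_\lambda$ and the diagonal of $A(\lambda)$. Continuing the discretized sketch above, where under the same assumed second-difference penalty the influence matrix is $A(\lambda) = (I + \lambda D^T D)^{-1}$ (reusing \texttt{second\_difference}):
\begin{verbatim}
def ocv_fast(d, lam):
    # OCV score via the influence matrix, eq. (6.77): a single fit
    # replaces the m leave-one-out refits.
    m = len(d)
    D = second_difference(m)
    A = np.linalg.inv(np.eye(m) + lam * D.T @ D)   # influence matrix A(lambda)
    f = A @ d                                      # minimizer of the full energy
    a = np.diag(A)
    return np.mean(((f - d) / (1.0 - a)) ** 2)
\end{verbatim}
For this quadratic problem the leave-one-out identity is exact, so \texttt{ocv\_fast} and \texttt{ocv\_naive} agree up to numerical error.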
OCV is modified into generalized cross validation (GCV) to achieve certain
desirable invariance properties that do not generally hold for OCV. Let
$\Gamma$ be any $m \times m$
orthogonal matrix and consider a new vector
of data $\tilde{d} = \Gamma d$. In general, the OCV estimate can give a
different value of $\lambda$ for the transformed data; in contrast, the GCV
estimate is invariant
under this (rotation) transformation [Golub et al. 1979].
The GCV function is a weighted version of $V_0(\lambda)$:
\[
V(\lambda) = \frac{1}{m} \sum_{k=1}^{m}
             \left[ f_\lambda^{[k]}(x_k) - d_k \right]^2 w_k(\lambda)
\]
where the weights
\[
w_k(\lambda) = \left[ \frac{1 - a_{kk}(\lambda)}
                           {\frac{1}{m}\operatorname{tr}\,(I - A(\lambda))}
              \right]^2
\]
give the relative effect of leaving out $d_k$. It is obtained by
replacing $1 - a_{kk}(\lambda)$
in (6.77) by
$\frac{1}{m}\operatorname{tr}\,(I - A(\lambda))$, which yields the explicit form
\[
V(\lambda) = \frac{\frac{1}{m} \left\| (I - A(\lambda))\, d \right\|^2}
                  {\left[ \frac{1}{m}\operatorname{tr}\,(I - A(\lambda)) \right]^2}
\]
The GCV estimate of $\lambda$ is the minimizer of $V(\lambda)$.
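Under the same discretized assumptions as the earlier sketches, the trace form of $V(\lambda)$ again costs one linear solve per candidate $\lambda$:
\begin{verbatim}
def gcv(d, lam):
    # GCV score V(lambda) in its trace form.
    m = len(d)
    D = second_difference(m)
    A = np.linalg.inv(np.eye(m) + lam * D.T @ D)
    resid = d - A @ d                      # (I - A(lambda)) d
    return (resid @ resid / m) / (np.trace(np.eye(m) - A) / m) ** 2

# Example: pick lambda over a logarithmic grid.
# lambdas = np.logspace(-3, 3, 25)
# lam_gcv = min(lambdas, key=lambda l: gcv(d, l))
\end{verbatim}
Note that $V(\lambda)$ depends on $A(\lambda)$ only through its trace and the norm of the residual vector, both of which are unchanged by an orthogonal transformation of the data; this is precisely the source of the invariance property above.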