Least Squares Fit

Next: Unsupervised Estimation with Unlabeled Data Up: Supervised Estimation with Labeled Data Previous: Mean Field Approximations

6.1.5

Least Squares Fit

MRF parameter estimation can also be done by using a least squares (LS) fit procedure [Derin and Elliot 1987]. The procedure consists of the following steps: (1) Find the relationship between the joint probability and the parameters ; (2) use histogram techniques to estimate all such probabilities; (3) construct an over-determined system of equations in terms of the probabilities and the parameters; and (4) solve it using the LS method.

Figure 6.3: Pixel i and its eight neighbors in the second order neighborhood system.

Consider the MLL model with the label set . For the 8-neighborhood system shown in Fig.6.3, there are ten type of clique potential parameters shown in Fig.1.4. Denote all the parameters as a vector

Let

be the sum of the potential functions over all the cliques containing i. Define indicator functions

for expressing (1.52) and

for (1.53). In terms of and 's, (6.32) can be written as

where

is a vector whose elements count the number of each type of cliques in the window. The RHS of the first row in the above equation correspond to the potentials weighted by , the second row by and , the third row by and , the fourth to seventh by , and the last two rows by (refer to Fig.1.4).

The conditional likelihood is related to the potential by

It can be expressed in terms of the joint probability of the labels on the window and the joint probability of the labels at the neighboring sites

The reason for this expression is that can be estimated using histogram techniques (see shortly). Comparing the above two equations yield the following equation

Note that the RHS of the above is independent of the value of and therefore, so is the LHS.

Figure 6.4: Realizations from a Gibbs distribution with specified (upper) and estimated (lower) parameters. See Table 6.1 for parameter values. From (Derin and Elliott 1987) with permission; © 1987 IEEE.

Consider the LHS of (6.39) for any two distinct values, I and , for . Because the LHS is independent of values, the following holds

Rearranging the above and using the relationship (6.35), we obtain

For any and , the vector can be determined using (6.36) and can be estimated using histogram techniques. Therefore, (6.41) represents a linear equation with unknown . One such equation can be obtained for each distinct combination of I, and . An over-determined linear system of equations is obtained from all possible combinations. It can be solved using standard LS method. This procedure is also used by [Nadabar and Jain 1992] for estimating line process parameters in edge detection.

Histogram techniques are proposed to estimate , for all combinations of , from one or several realizations (images). Assume that there are a total of N distinct blocks in the image lattice and that a particular configuration occurs times. Then . The relationship between the histogramming and the ML estimation is established in [Guerelli and Onural 1994].

A few remarks follow: Any combination for which cannot be used to obtain a linear equation because of the logarithm in (6.41). Derin and Elliott propose to use only those combinations for which are largest. This selection strategy is quite robust []. They also suggest to discard the three and four cliques (set their potentials to zero), according to some prior or empirical knowledge, to reduce the length of and to simplify the procedure. The normalized form of GRFs [Griffeath 1976] described in Section 1.2.5 provides a foundation for the reduction. This will be discussed in Section 6.3.2.

Fig.6.4 shows estimation results obtained by using the LS fitting method. In this example, the only nonzero parameters are . In the upper row are the texture images generated with specified parameter values. These values are estimated from the images. The estimated values are then used to generate textures in the lower row. Table 6.1 compares the specified values and the estimated values. Note that the estimate for the image in column 4 is numerically not so good; Derin and Elliott explanation for this is that the parameter representation for a realization is not unique.

The LS fitting procedure has the following advantages over the coding method and MPL estimation. It does not require numerical maximization, avoiding solving nonlinear equations. Nor is the final solution dependent on the initial estimate. Further, it is more efficient and faster. A disadvantage is that the number of equations (6.41) in the system increases exponentially as the sizes of and increase. Zero histogramming counts also cause problems with the RHS of (6.41).

Next: Unsupervised Estimation with Unlabeled Data Up: Supervised Estimation with Labeled Data Previous: Mean Field Approximations