MOMENTS IN IMAGE PROCESSING
Bob Bailey       Nov. 2002

 

There are two ways of viewing moments, one based on statistics and one based on arbitrary functions such as f(x) or f(x, y).  As a result moments can be defined in more than one way.

 

Statistical view

 

Definition: Moments are the statistical expectation of certain power functions of a random variable.  The most common moment is the mean which is just the expected value of a random variable:

where f(x) is the probability density function of continuous random variable X.

More generally, moments of order p = 0, 1, 2, … can be calculated as mp = E[Xp].

These are sometimes referred to as the raw moments.

There are other kinds of moments that are often useful.

One of these is the central moments mp = E[(X–m)p].

The best known central moment is the second, which is known as the variance

Two less common statistical measures, skewness and kurtosis, are based on the third and fourth central moments.

The use of expectation assumes that the pdf is known.

Moments are easily extended to two or more dimensions.  For example

where f(x, y) is the joint pdf.

 

Estimation

However, moments are easy to estimate from a set of measurements, xi.

The p-th moment is estimated as

(often 1/N is left out of the definition) and the p-th central moment is estimated as

where  is the average of the measurements, which is the usual estimate of the mean.

The second central moment gives the variance of a set of data s2 = m2.

For multidimensional distributions, the first and second order moments give estimates of the mean vector and covariance matrix.  The order of moments in two dimensions is given by p+q, so for moments above 0, there is more than one of a given order.  For example, m20, m11, and m02 are the three moments of order 2.

 

Uses

One property of moments of probability distributions is that distributions have unique sets of moments, much like the terms of a Fourier series.  But, also like the Fourier series, it is possible for two different distributions to have some of the same moments.

(Figure 11, p77 of MGB)

But, not all moments will be the same.  Still moments of lower orders are often useful to distinguish between different distributions, especially since they can be so easily estimated from samples from the distributions.  One just needs to make sure that appropriate moments are computed.

 

Non-statistical view

This view is not based on probability and expected values, but most of the same ideas still hold.  For any arbitrary function f(x), one may compute moments as

or for a 2-D function

Notice now that to find the mean value of f(x), one must use m1/m0, since f(x) is not normalized to area 1 like the pdf.  Likewise, for higher order moments it is common to normalize these moments by dividing by m0 (or m00).  This allows one to compute moments which depend only on the shape and not the magnitude of f(x).  The result of normalizing moments gives measures which contain information about the shape or distribution (not probability dist.) of f(x).

This is what makes moments useful for the analysis of shapes in image processing, for which f(x, y) is the image function.  It may be a gray-level function or binary (0 for background, 1 for foreground).  Most often moments are computed for connected components of binary images for the analysis of shape.  In this situation and depending on the application, moments may be computed for several orders, not just orders 2 to 4 used in statistics.  Also, since they are 2-D moments, quite a few will be needed.

The moments computed are usually used as features for shape recognition.  Such methods have been in use for at least 40 years.

 

Digital approximation

For digitized data (including images) we must replace the integral with a summation over the domain covered by the data.  Writing the 2-D approximation:

If f(x, y) is a binary image function of an object, the area is m00, the x and y centroids are  and .

Invariance

In many applications such as shape recognition, it is useful to generate shape features which are independent of parameters which cannot be controlled in an image.  Such features are called invariant features.  There are several types of invariance.  For example, if an object may occur in an arbitrary location in an image, then one needs the moments to be invariant to location.  For binary connected components, this can be achieved simply by using the central moments, mpq.

If an object is not at a fixed distance from a fixed focal length camera, then the sizes of objects will not be fixed.  In this case size invariance is needed.  This can be achieved by normalizing the moments,

The third common type of invariance is rotation invariance.  This is not always needed, for example if objects always have a known direction as in recognizing machine printed text in a document.  The direction can be established by locating lines of text.

(insert planes.gif planes_rot.gif)

M.K. Hu (1962) derived a transformation of the normalized central moments to make the resulting moments rotation invariant.  These moments continue to be published in books on image processing.

The first six of these are also invariant to reflection, while the last one changes sign.

Orthogonal Moments

Orthogonal functions have been around for a very long time.  The best known are the sine and cosine.  Two functions or vectors are orthogonal if their inner product (defined as the sum of the product of their corresponding elements) is zero.  For functions f(t) and g(t) and for vectors x and y, this means

where a and b are certain limits which depend on the functions.

An important class of orthogonal functions is orthogonal polynomials , which are orthogonal over various intervals of the real axis.   Important orthogonal polynomials include Legendre, Hermite, Chebyshev, etc.  While there are a number that are orthogonal on the real axis, there are not so many that are orthogonal in two dimensions.  Legendre polynomials, which are orthogonal over [-1, 1], can be taken as a product P(x)P(y), and the result is an orthogonal set of polynomials over a square.  Zernike working in optics in the 1930’s derived a set of polynomials that are orthogonal over a unit disk, i.e., r £ 1.

Orthogonal moments are computed similar to regular moments, except that the set of orthogonal polynomials replaces the xp or xpyq monomial in all the above equations.  That is,

where hpq(x, y) is the pq-th orthogonal polynomial, and R is the region over which the polynomials are defined.

for Legendre and Zernike polynomials, and where we have subtracted the centers, which can be determined from the centroid or some other means, so that the resulting moments are location invariant.

For shape recognition these moments can also be normalized as above so that the moments are also size invariant.

For rotation invariance, things are not quite as simple.  It is not possible to make the Legendre moments rotation invariant.  However, the Zernike moments can be made so.

 

Zernike Polynomials

Zernike polynomials are orthogonal over the unit disk and are specified in polar coordinates in terms of a real valued radial component Rnl(r) which is a polynomial of order n, and a complex exponential component:

Vnl(r,q) = Rnl(r) ejlq, where 0 £ r £ 1, 0 £ q < 2p, and n – |l| is even and |l| £ n.

l is called the “repetition” and represents the angle dependence.

l         n

0

1

2

3

4

0

1

 

2r2 – 1

 

6r4 – 6r2 + 1

1

 

r

 

3r3 – 2r

 

2

 

 

r2

 

4r4 – 3r2

3

 

 

 

r3

 

4

 

 

 

 

r4

 

The radial functions are


plot: ---

The Zernike polynomials are orthogonal, i.e.,

 

Zernike moments

In 1980 Teague published an important paper on the use of orthogonal moments which are derived from orthogonal polynomials for the recognition of shapes, but he did not address digital images.????

In 1988 Khotanzad and Hong introduced the use of Zernike moments for the recognition of shapes in digital binary image.  Since that time many papers have appeared on the use or computation of Zernike moments for image processing.

For an image (or other) function f(r,q), the Zernike moments are

where x = r cos(q) and y = r sin(q), and the function f must be rescaled so that it is contained in the unit circle.  There is more than one way to compute Zernike moments.  One can precompute and store in a table all the values of Vnl(xi, yj), but this requires a fixed number of samples in the image and the table may be quite large.  A more flexible way, which does not need to store the values in a table, is to first compute the centered ordinary (geometric) moments of f(xi, yj), then apply a transformation to the moments.

where xc and yc are the centers which may be obtained from the average (i.e., the centroid), median, or midrange of function f over its finite extent, whichever is the most robust for the application.  Since the Zernike moments are defined only within a unit circle, either f must be rescaled to fit inside the unit circle, or better, the geometric moments can be normalized by the radius a of a bounding circle with center at (xc, yc) which contains all of f:

Then the complex Zernike moments are

where 0 £ l £ n, q = (k – l)/2.

These may look complicated, but are actually quite efficient since the factorials and binomial coefficients can be precomputed (there are not that many. And, the results are as accurate as the computation of  the geometric moments.

Since the Zernike polynomials form a complete orthogonal basis for 2-D functions defined over a unit circle, the moments represent the magnitudes of projections of the function f onto the orthogonal axes represented by the Zernike polynomials, just like the coefficients of the Fourier series.  As a result, the moment features contain independent information about f, just like the coefficients of a Fourier series or transform, or the principal components transform of a covariance matrix.

And like the Fourier series coefficients, the first N Zernike moments can be used to reconstruct an approximation of the original function.

where R and I mean the real and imaginary parts, respectively.  fN is  the best least squares approximation of f for order N.

 

Rotation invariance

As computed above, Anl are location and size invariant, but not rotation invariant. 

Rotating function f CCW by angle f results in a phase shift of the Zernike moments: e–jlf Anl

A rotation followed by a reflection about the x-axis gives –e–jlf Anl* , which is just a change in the sign of the real part of the moments.

If one does not need rotation invariance, then computed Anl can be used directly as shape recognition features, with real and non-zero components used as separate features.  However, if one wants rotation invariant features, then it is necessary to combine Anl in such a way as to remove dependence on f.  The simplest solution is to take their magnitudes (they are complex valued) |Anl|.  Also, moments of different orders and repetitions can be combined in various ways with lp = kq:

 are invariant to proper and improper rotations.

 are invariant to proper, but not improper rotations.

Proper rotations are simple, arbitrary rotations of an object.  Improper rotations also involve a reflection.

 

Summary

Uses of Zernike moments: object recognition from shape, determination of angle of rotation of an object, searching imagery data bases.  Some of these applications are still being researched, as are efficient ways of computing them.

The major disadvantage of moments in general is that they are global features rather than local.  This makes them not suited for recognizing objects which are partially obstructed. Moments are inherently location dependent, so some means must be adopted to insure location invariance (like the centroid).

Zernike moments are far superior to the Hu moments for rotation invariance.