There are two ways of viewing moments, one based on statistics and one based on arbitrary functions such as f(x) or f(x, y). As a result moments can be defined in more than one way.
Definition: Moments are the statistical expectation of
certain power functions of a random variable.
The most common moment is the mean which is just the expected value of a
random variable:
where f(x) is the probability density function of continuous random variable X.
More generally, moments of order p = 0, 1, 2, can be calculated as mp = E[Xp].
These are sometimes referred to as the raw moments.
There are other kinds of moments that are often useful.
One of these is the central moments mp = E[(Xm)p].
The best known central moment is the second, which is known
as the variance
Two less common statistical measures, skewness and kurtosis, are based on the third and fourth central moments.
The use of expectation assumes that the pdf is known.
Moments are easily extended to two or more dimensions. For example
where f(x, y) is the joint pdf.
However, moments are easy to estimate from a set of measurements, xi.
The p-th moment is estimated as
(often 1/N is left out of the definition) and the p-th central moment is estimated as
where
The second central moment gives the variance of a set of data s2 = m2.
For multidimensional distributions, the first and second order moments give estimates of the mean vector and covariance matrix. The order of moments in two dimensions is given by p+q, so for moments above 0, there is more than one of a given order. For example, m20, m11, and m02 are the three moments of order 2.
One property of moments of probability distributions is that distributions have unique sets of moments, much like the terms of a Fourier series. But, also like the Fourier series, it is possible for two different distributions to have some of the same moments.
(Figure 11, p77 of MGB)
But, not all moments will be the same. Still moments of lower orders are often useful to distinguish between different distributions, especially since they can be so easily estimated from samples from the distributions. One just needs to make sure that appropriate moments are computed.
This view is not based on probability and expected values,
but most of the same ideas still hold.
For any arbitrary function f(x), one may compute moments as
or for a 2-D function
Notice now that to find the mean value of f(x), one must use m1/m0, since f(x) is not normalized to area 1 like the pdf. Likewise, for higher order moments it is common to normalize these moments by dividing by m0 (or m00). This allows one to compute moments which depend only on the shape and not the magnitude of f(x). The result of normalizing moments gives measures which contain information about the shape or distribution (not probability dist.) of f(x).
This is what makes moments useful for the analysis of shapes
in image processing, for which f(x, y)
is the image function. It may be a
gray-level function or binary (0 for background, 1 for foreground). Most often moments are computed for connected
components of binary images for the analysis of shape. In this situation and depending on the
application, moments may be computed for several orders, not just orders
The moments computed are usually used as features for shape recognition. Such methods have been in use for at least 40 years.
For digitized data (including images) we must replace the
integral with a summation over the domain covered by the data. Writing the 2-D approximation:
If f(x, y)
is a binary image function of an object, the area is m00, the x and
y centroids are
In many applications such as shape recognition, it is useful to generate shape features which are independent of parameters which cannot be controlled in an image. Such features are called invariant features. There are several types of invariance. For example, if an object may occur in an arbitrary location in an image, then one needs the moments to be invariant to location. For binary connected components, this can be achieved simply by using the central moments, mpq.
If an object is not at a fixed distance from a fixed focal
length camera, then the sizes of objects will not be fixed. In this case size invariance is needed. This can be achieved by normalizing the
moments,
The third common type of invariance is rotation invariance. This is not always needed, for example if objects always have a known direction as in recognizing machine printed text in a document. The direction can be established by locating lines of text.
(insert planes.gif planes_rot.gif)
M.K. Hu (1962) derived a transformation of the normalized central moments to make the resulting moments rotation invariant. These moments continue to be published in books on image processing.
The first six of these are also invariant to reflection, while the last one changes sign.
Orthogonal functions have been around for a very long
time. The best known are the sine and
cosine. Two functions or vectors are
orthogonal if their inner product (defined as the sum of the product of their
corresponding elements) is zero. For
functions f(t) and g(t) and for vectors x and y, this means
where a and b are certain limits which depend on the functions.
An important class of orthogonal functions is orthogonal polynomials
Orthogonal moments are computed similar to regular moments,
except that the set of orthogonal polynomials replaces the xp or xpyq
monomial in all the above equations.
That is,
where hpq(x, y)
is the pq-th
orthogonal polynomial, and R is the
region over which the polynomials are defined.
for Legendre and Zernike polynomials, and where we have subtracted the centers, which can be determined from the centroid or some other means, so that the resulting moments are location invariant.
For shape recognition these moments can also be normalized as above so that the moments are also size invariant.
For rotation invariance, things are not quite as simple. It is not possible to make the Legendre moments rotation invariant. However, the Zernike moments can be made so.
Zernike polynomials are orthogonal over the unit disk and are specified in polar coordinates in terms of a real valued radial component Rnl(r) which is a polynomial of order n, and a complex exponential component:
Vnl(r,q) = Rnl(r) ejlq, where 0 £ r £ 1, 0 £ q < 2p, and n |l| is even and |l| £ n.
l is called the repetition and represents the angle dependence.
|
l n |
0 |
1 |
2 |
3 |
4 |
|
0 |
1 |
|
2r2 1 |
|
6r4 6r2 + 1 |
|
1 |
|
r |
|
3r3 2r |
|
|
2 |
|
|
r2 |
|
4r4 3r2 |
|
3 |
|
|
|
r3 |
|
|
4 |
|
|
|
|
r4 |
The radial functions are
plot: ---
The Zernike polynomials are orthogonal, i.e.,
In 1980 Teague published an important paper on the use of orthogonal moments which are derived from orthogonal polynomials for the recognition of shapes, but he did not address digital images.????
In 1988 Khotanzad and Hong introduced the use of Zernike moments for the recognition of shapes in digital binary image. Since that time many papers have appeared on the use or computation of Zernike moments for image processing.
For an image (or other) function f(r,q),
the Zernike moments are
where x = r cos(q) and y = r sin(q),
and the function f must be rescaled
so that it is contained in the unit circle.
There is more than one way to compute Zernike moments. One can precompute
and store in a table all the values of Vnl(xi, yj),
but this requires a fixed number of samples in the image and the table may be quite
large. A more flexible way, which does
not need to store the values in a table, is to first compute the centered
ordinary (geometric) moments of f(xi, yj), then apply a
transformation to the moments.
where xc and yc are the centers
which may be obtained from the average (i.e., the centroid), median, or
midrange of function f over its
finite extent, whichever is the most robust for the application. Since the Zernike moments are defined only
within a unit circle, either f must
be rescaled to fit inside the unit circle, or better, the geometric moments can
be normalized by the radius a
of a bounding circle with center at (xc, yc)
which contains all of f:
Then the complex Zernike moments are
where 0 £ l £ n, q = (k l)/2.
These may look complicated, but are actually quite efficient since the factorials and binomial coefficients can be precomputed (there are not that many. And, the results are as accurate as the computation of the geometric moments.
Since the Zernike polynomials form a complete orthogonal basis for 2-D functions defined over a unit circle, the moments represent the magnitudes of projections of the function f onto the orthogonal axes represented by the Zernike polynomials, just like the coefficients of the Fourier series. As a result, the moment features contain independent information about f, just like the coefficients of a Fourier series or transform, or the principal components transform of a covariance matrix.
And like the Fourier series coefficients, the first N Zernike moments can be used to
reconstruct an approximation of the original function.
where R and I mean the real and imaginary parts, respectively. fN is the best least squares approximation of f for order N.
As computed above, Anl are location and size invariant, but not rotation invariant.
Rotating function f CCW by angle f results in a phase shift of the Zernike moments: ejlf Anl
A rotation followed by a reflection about the x-axis gives ejlf Anl* , which is just a change in the sign of the real part of the moments.
If one does not need rotation invariance, then computed Anl can be used directly as shape recognition features, with real and non-zero components used as separate features. However, if one wants rotation invariant features, then it is necessary to combine Anl in such a way as to remove dependence on f. The simplest solution is to take their magnitudes (they are complex valued) |Anl|. Also, moments of different orders and repetitions can be combined in various ways with lp = kq:
Proper rotations are simple, arbitrary rotations of an object. Improper rotations also involve a reflection.
Uses of Zernike moments: object recognition from shape, determination of angle of rotation of an object, searching imagery data bases. Some of these applications are still being researched, as are efficient ways of computing them.
The major disadvantage of moments in general is that they are global features rather than local. This makes them not suited for recognizing objects which are partially obstructed. Moments are inherently location dependent, so some means must be adopted to insure location invariance (like the centroid).
Zernike moments are far superior to the Hu moments for rotation invariance.