Principal Components Analysis


There is a tendency for multiband data sets/images to be somewhat redundant wherever bands are adjacent to each other in the (multi-)spectral range. Such bands are said to be correlated: for many features, the DNs differ relatively little from one band to the next. A statistically based procedure called Principal Components Analysis (PCA) decorrelates the data by transforming the DN distributions onto a new set of mutually perpendicular axes in multidimensional space. The underlying basis of PCA is described in a separate, linked summary. Color composites made from images of individual components often show information not evident in other enhancement products. Canonical Analysis and Decorrelation Stretching are also discussed briefly.



We are now ready to survey the last two types of image enhancement discussed in this Tutorial. Both are also suited to Information Extraction and Interpretation, but are treated separately from Classification (considered later in the Section). We begin with a quick run-through of images produced by Principal Components Analysis, or PCA (also accessible as Appendix C). (You can either read the linked summary of the theory of PCA first or go directly to the review and images on the rest of this page; the theory is “tough”, so look at the link and then decide.) PCA is a decorrelation procedure that statistically reorganizes the DN values from as many of the spectral bands as we choose to include in the analysis. To produce the components shown on this page, we used all seven TM bands and requested that all seven components be generated (PCA yields as many components as there are input bands).
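The statistical reorganization described above can be sketched in a few lines of NumPy: compute the band-to-band covariance matrix, find its eigenvectors, and project the mean-centred DNs onto them. This is a minimal illustration, not the IDRISI implementation used for the Morro Bay images; the helper name `principal_components` and the synthetic 7-band data are assumptions made for the example.

```python
import numpy as np

def principal_components(bands):
    """Minimal PCA of a multiband image (hypothetical helper, a sketch only).

    bands: array of shape (n_bands, rows, cols) holding DNs.
    Returns (pcs, eigenvalues, eigenvectors); pcs has the same shape as
    bands, with PC1 = pcs[0] carrying the largest variance.
    """
    n_bands, rows, cols = bands.shape
    x = bands.reshape(n_bands, -1).astype(float)      # one row of DNs per band
    mean = x.mean(axis=1, keepdims=True)
    cov = np.cov(x)                                    # n_bands x n_bands covariance matrix
    evals, evecs = np.linalg.eigh(cov)                 # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]                    # reorder so PC1 has the largest variance
    evals, evecs = evals[order], evecs[:, order]
    pcs = evecs.T @ (x - mean)                         # each PC is a linear combination of bands
    return pcs.reshape(n_bands, rows, cols), evals, evecs

# Hypothetical 7-band scene: every band is a scaled copy of one shared
# brightness pattern plus independent noise, so the bands are highly correlated.
rng = np.random.default_rng(1)
common = rng.normal(100.0, 20.0, size=(50, 60))
weights = (0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.7)
tm = np.stack([w * common + rng.normal(0.0, 4.0, size=common.shape) for w in weights])

pcs, evals, evecs = principal_components(tm)
band_corr = np.corrcoef(tm.reshape(7, -1))    # large off-diagonal values
pc_corr = np.corrcoef(pcs.reshape(7, -1))     # off-diagonal values are ~0 (decorrelated)
```

Comparing `band_corr` with `pc_corr` shows the point of the procedure: the input bands are strongly correlated, while the components are uncorrelated with one another.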

Next look at each of these components, keeping in mind that many of the tonal patterns in individual components do not seem to match specific features or classes identified in the TM bands spatially, because each component is a linear combination of the original band values. We comment only briefly on those patterns that lend themselves to some interpretation.
1-14: After reading through the special review of PCA accessed by the link, plus the above paragraph, see if you can come up with a single key word (or perhaps a key idea in several words) that describes the main benefit of using Principal Components Analysis. `ANSWER <Sect1_zanswer.html#1-14>`__
The first Principal Component explains the maximum amount of variation in the 7-dimensional space defined by the seven Thematic Mapper bands. The image produced from PC 1 data commonly resembles an actual aerial photograph.
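To make "maximum amount of variation" concrete: if λ_1 ≥ λ_2 ≥ ... ≥ λ_7 are the eigenvalues of the 7 × 7 band covariance matrix, component k carries the fraction λ_k / (λ_1 + ... + λ_7) of the total variance, because the total variance (the trace of the covariance matrix) equals the sum of its eigenvalues. The sketch below computes these fractions; the helper name and the synthetic stack are hypothetical, and the actual Morro Bay percentages are not given on this page.

```python
import numpy as np

def explained_variance_fraction(bands):
    """Share of the total variance carried by each PC, PC1 first (hypothetical helper)."""
    x = bands.reshape(bands.shape[0], -1).astype(float)
    evals = np.linalg.eigvalsh(np.cov(x))[::-1]   # eigvalsh is ascending; flip to descending
    return evals / evals.sum()                    # sum of eigenvalues = total variance

# Hypothetical correlated 7-band stack (synthetic, not TM data).
rng = np.random.default_rng(2)
common = rng.normal(100.0, 20.0, size=(40, 40))
stack = np.stack([w * common + rng.normal(0.0, 4.0, size=common.shape)
                  for w in (0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.7)])
frac = explained_variance_fraction(stack)
print(np.round(frac, 3))   # PC1 dominates when the bands are strongly correlated
```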

The first Principal Component (PC1) image derived from all seven TM bands covering Morro Bay.

In fact, this is the normal character of the first component: it broadly simulates standard black-and-white photography and contains most of the pertinent information in a scene. The hills appear more realistic because the sharp light-dark contrast seen in most TM bands is subdued. Note the internal structure of the waves and the absence of any indication of sediment load in the sea. The histogram of the first PC shows two peaks: the first, on the left, represents the ocean pixels and the second, on the right, the land pixels.

1-15: Describe this image relative to, say, the histogram-equalization stretched image seen on the previous page. `ANSWER <Sect1_zanswer.html#1-15>`__

When we look at the histogram of the second PC, we see that even though its total range (maximum value minus minimum value) is greater than that of the first PC, most of the pixels fall within a small range around the mean of 49. Thus the second PC has a smaller variance (variance is the standard deviation squared) than the first PC, as it must, since components are numbered in order of decreasing variance. Because the bulk of the pixels falls in such a narrow range, the image does not display well (below left). To make the image viewable (below right), we expand (numerically stretch) it and then apply a histogram equalization to the result. Histogram equalization produces a histogram in which the spacing between the most frequent values is increased, while the less frequent values are combined and compressed. Without this transformation, the image would appear tonally flat, with only two gray levels defining most of the land surfaces and one gray level defining the ocean. With it, distinctions that were previously small are magnified and easier to see on the computer display. The breaker waves are uniquely singled out as very bright.
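The expand-then-equalize procedure can be sketched as follows, assuming a linear min-max stretch to 8 bits followed by classic cumulative-histogram equalization. The function names and the synthetic "PC2-like" band (a narrow peak near 49 plus a few extreme values that widen the total range) are hypothetical, and the routines used to make the Morro Bay images may differ in detail.

```python
import numpy as np

def stretch_to_byte(img):
    """Linear min-max stretch of a floating-point image onto 0..255."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:                                  # constant image: nothing to stretch
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255.0 * (img - lo) / (hi - lo)).astype(np.uint8)

def histogram_equalize(img8):
    """Histogram equalization of a uint8 image using its cumulative histogram."""
    hist = np.bincount(img8.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[cdf > 0][0]                     # count in the lowest occupied gray level
    if cdf[-1] == cdf_min:                        # only one gray level present
        return img8.copy()
    # Frequent gray levels are pushed apart; sparse ones are squeezed together.
    lut = np.round(255.0 * (cdf - cdf_min) / (cdf[-1] - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img8]

# Hypothetical PC2-like band (synthetic data, not the Morro Bay PC2).
rng = np.random.default_rng(3)
pc2 = rng.normal(49.0, 2.0, size=(100, 100))
pc2[0, :10] = -60.0
pc2[1, :10] = 160.0

stretched = stretch_to_byte(pc2)           # bulk still crowded into a few gray levels
equalized = histogram_equalize(stretched)  # bulk now spread across the gray scale
```

The linear stretch alone leaves the main peak squeezed into roughly twenty gray levels because the outliers set the end points; equalization then redistributes those crowded levels across the full display range.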

Contrast-stretched (histogram equalization) TM Band 3 image of Morro Bay. The second Principal Component (PC2) image of the Morro Bay scene.

1-16: Make some general observations on how the tonal patterns in PC2 differ from patterns observed in, for instance, TM Band 3. `ANSWER <Sect1_zanswer.html#1-16>`__

Some of the gray patterns in the PC3 image below can be broadly correlated with two combined classes of vegetation:

The third Principal Component (PC3) image of the Morro Bay scene.

The brighter tones come from the fairways in the golf course and many of the agricultural fields. Moderately darker tones coincide with some of the grasslands, forest or tree areas, and coastal marshland. Note that both the beach and waves almost disappear as patterns.

The breakers completely disappear in the PC4 image below, while the rest of the scene is rather flat, with several patterns rendered in medium grays.

The fourth Principal Component (PC4) image of the Morro Bay scene.

1-17: Anything unusual about PC4 that might be meaningful? `ANSWER <Sect1_zanswer.html#1-17>`__

You may be wondering what the remaining PCs (through PC7) look like, and whether they show any useful information. Examining PC6, for example, shows that the familiar features do appear but probably offer little new for interpretation. Note that the waves in the image below are now black (interesting but perhaps meaningless); the golf course pattern is also black.

The sixth Principal Component (PC6) image of the Morro Bay scene, with a special stretch

Any three of the first four PC images can be combined into color composites with various assignments of blue, green, and red; in all, 24 different combinations are possible. Of those made experimentally for this review, the next image, composed of PC4 = blue, PC1 = green, and PC3 = red, proved the most interesting. In this rendition, the golf course has a singular color signature (orange-red) and a unique internal structure. Most other vegetation shows as red to purple-red tones, but the grasslands (**v**) have an unusual color, best described as greenish-orange. The brighter slopes of the hills and mountains appear medium green, while some areas in shadow are bluish. The urban areas have a deep blue color. The beach bar now appears turquoise and the adjacent breakers are olive-green.
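The count of 24 follows from assigning an ordered choice of three distinct PCs to the three color guns: 4 choices for blue, 3 for green, and 2 for red. A short enumeration with Python's standard `itertools.permutations` confirms it; the variable names are illustrative only.

```python
from itertools import permutations

first_four = ["PC1", "PC2", "PC3", "PC4"]
guns = ("blue", "green", "red")

# Each ordered choice of 3 distinct PCs fills the (blue, green, red) guns once:
# 4 x 3 x 2 = 24 possible composites.
assignments = [dict(zip(guns, combo)) for combo in permutations(first_four, 3)]
print(len(assignments))  # 24
```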

Color composite image of Morro Bay made using PC4 = blue; PC1 = green; PC3 = red

A very instructive example of a practical use of PCA is given in Section 5, page 3.

A variant of PCA is known as Canonical Analysis (CA). Whereas PCA uses all pixels regardless of identity or class to derive the components, in CA one limits the pixels involved to those associated with pre-identified features/classes. This requires that those features can be recognized (by photointerpretation) in an image display (single band or color composite) in one to several areas within the scene. These pixels are “blocked out” as training sites much as you will see done in the Classification discussion beginning on page 1-16. Their multiband values (within the site areas) are then processed in the manner of PCA. This selective approach is designed to optimize recognition and location of the same features elsewhere in the scene.
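Following the description above, the selective approach can be sketched as PCA whose statistics (mean and covariance) come only from the training-site pixels, with the resulting axes then applied to every pixel in the scene. This is an assumption-laden illustration: in the statistical literature and in many image-processing packages, canonical analysis usually means canonical discriminant analysis, which chooses axes that maximize between-class variance relative to within-class variance, and this sketch does not do that. The helper name, the synthetic scene, and the rectangular site are all hypothetical.

```python
import numpy as np

def components_from_training_sites(bands, train_mask):
    """PCA whose axes come only from training-site pixels (hypothetical helper).

    bands: (n_bands, rows, cols) DNs; train_mask: boolean (rows, cols),
    True for pixels inside the blocked-out training sites.
    """
    n_bands = bands.shape[0]
    x_all = bands.reshape(n_bands, -1).astype(float)
    x_train = x_all[:, train_mask.ravel()]           # statistics from the sites only
    mean = x_train.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(np.cov(x_train))
    order = np.argsort(evals)[::-1]                  # largest-variance axis first
    evals, evecs = evals[order], evecs[:, order]
    comps = evecs.T @ (x_all - mean)                 # apply the axes to the whole scene
    return comps.reshape(bands.shape), evals

# Synthetic 4-band scene with one rectangular training site (illustrative only).
rng = np.random.default_rng(4)
scene = rng.normal(80.0, 10.0, size=(4, 30, 30))
scene[1] += 0.8 * scene[0]                           # make bands 0 and 1 correlated
mask = np.zeros((30, 30), dtype=bool)
mask[5:15, 10:20] = True                             # hypothetical training site
comps, evals = components_from_training_sites(scene, mask)
```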

Another use of PCA, mainly as a means to improve image enhancement, is known as a Decorrelation Stretch (DS). The DS optimizes the assignment of colors so as to bring out subtle differences not readily distinguished in natural and false color composites. Reducing the interband correlations emphasizes small but often diagnostic reflectance or emittance variations owing to topography and temperature. The first step is to transform the band data into at least the first three PCs. Each component is then rescaled to normalize its variance, and each PC image is stretched, usually with a Gaussian stretch. Finally, the stretched PC data are projected back into the original channels, which are thereby enhanced to maximize spectral sensitivity.
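The DS steps just listed can be sketched with NumPy as a rotation into PC space, a rescaling that gives every PC the same variance, and a rotation back into the original band space. This is a minimal sketch under stated assumptions: a linear rescale stands in for the Gaussian stretch mentioned above, `target_sigma` and `target_mean` are hypothetical display parameters, and the helper divides by the square root of each eigenvalue, so it assumes no band is a near-exact copy of another.

```python
import numpy as np

def decorrelation_stretch(bands, target_sigma=50.0, target_mean=127.5):
    """Sketch of a decorrelation stretch (hypothetical helper, not IDRISI's).

    bands: (n_bands, rows, cols), e.g. the three bands of a color composite.
    target_sigma / target_mean: assumed display statistics for every output band.
    """
    n_bands = bands.shape[0]
    x = bands.reshape(n_bands, -1).astype(float)
    mean = x.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(np.cov(x))            # PC axes and PC variances
    # evecs.T rotates into PC space, rescale equalizes the PC variances,
    # evecs rotates back (linear rescale in place of a Gaussian stretch).
    rescale = np.diag(target_sigma / np.sqrt(evals))
    transform = evecs @ rescale @ evecs.T
    y = transform @ (x - mean) + target_mean
    return np.clip(y, 0.0, 255.0).reshape(bands.shape) # clip to the 8-bit display range

# Synthetic, strongly correlated 3-band composite (illustrative only).
rng = np.random.default_rng(5)
common = rng.normal(0.0, 1.0, size=(50, 50))
rgb = np.stack([100.0 + 30.0 * common + rng.normal(0.0, 3.0, common.shape),
                110.0 + 28.0 * common + rng.normal(0.0, 3.0, common.shape),
                 90.0 + 33.0 * common + rng.normal(0.0, 3.0, common.shape)])
ds = decorrelation_stretch(rgb, target_sigma=20.0)
```

Because the output bands are uncorrelated and share the same spread, small interband differences that were hidden under the common brightness signal now drive the colors of the composite.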

The IDRISI Windows version used to produce the various Morro Bay images does not include the last step in the DS process. However, here are two examples gleaned from the Internet. The first shows a Landsat subscene of an unidentified area: on the left is a standard false color composite; on the right is a DS image, which illustrates the ability to extract and emphasize tonal differences not apparent in the left image:

Decorrelation stretch of an arid area (left) in which differences in color do not stand out.

Users of ASTER data have found Decorrelation Stretching particularly effective for image display. The stretch works whether the bands used are in the visible, the SWIR, or the thermal IR interval. These three ASTER scenes (again of an unidentified area) show the effects of a DS; read the captions for the bands used.

ASTER image: left, false color composite using bands 321 as RGB; right, DS version.

Color composite made from SWIR bands 679 as RGB, with the DS version on the right.

ASTER thermal bands 13, 10, 12 as RGB, and the corresponding DS version on the right.

The difference between the PC color composites and the DS color composites is generally not large, but the extra statistical manipulation in the latter often leads to a better product.


Primary Author: Nicholas M. Short, Sr. email: nmshort@nationi.net