It's not just about squiggly lines anymore: extracting information from spectral mapping using principal components analysis

I describe here principal components analysis, a method for condensing the information present in images with many colors into fewer channels. This section is heavy on linear algebra—just a warning.

I am presently using this to try to map Titan into spectral classification units using the Visual and Infrared Mapping Spectrometer (VIMS). VIMS takes simultaneous 64×64 images in 256 different infrared channels at wavelengths between 0.9 and 5.2 microns. However, VIMS can only see through Titan's atmosphere and down to the surface in a handful of spectral windows, totalling maybe 20–30 channels. The remaining channels probe different levels in the atmospheric haze, but most are redundant.

I am using principal components analysis (PCA) to bring out subtle variations by reprojecting the VIMS 256-color maps into a different set of orthonormal basis vectors that span the same space, but have most of the data's information in only 9 or 10 channels instead of 256. Woah.

I learned about this technique from an OpSci class that I took here at Arizona, from Dr. Schowengerdt and based on his book on remote sensing. In any given scene, most of the power and information within the data is the overall albedo of the scene, as this is usually pretty constant as a function of wavelength. Hence the first principal component of a scene is generally a basis vector

x1 + x2 + ... + xn

in the same way that "value" in HSV space represents the overall brightness of each pixel in an RGB image.

Additional principal components then encode higher-order spectral variations in the image. Rather quickly, the magnitude of each channel decreases such that only noise remains.

VIMs original and PCA spectra

Note that the PCA plot is semi-logarithmic. These are plots from VIMS on Cassini's Tb flyby of Titan.

To get the principal components, I wrote my own software in C++ to find the components based on instructions embedded in the Wikipedia PCA article. The tricky parts are: (1) generating the covariance matrix and (2) finding that matrix's eigenvectors and eigenvalues. The first amounts to just a few lines of code:

// Step 4:  Find the empirical covariance d x d matrix C of S.
cube covariance(N(Z), N(Z), 1, 0.);
cout << " Calculating covariance:   00%";
for (int cx=0;cx < covariance.N(X);cx++) {
    printpercent(cx,covariance.N(X)-1);
    for (int cy=cx;cy < covariance.N(Y);cy++) {
        double thiscov=0.;
        for (int ix=0;ix < subbed.N(X);ix++)
            for (int iy=0;iy < subbed.N(Y);iy++)
                thiscov += subbed(ix,iy,cx)*subbed(ix,iy,cy);
        thiscov /= N(X)*N(Y)-1.;
        covariance(cy,cx,0) = covariance(cx,cy,0) = thiscov;
    }
}

To get eigenvalues and eigenvectors, I hooked in to a version of the Numerical Recipes function jacobi() that I modified. Do not attempt to write your own. Use the NR one, or anything else you can find that is pre-written.

Here's an example of the (very preliminary) results: on the left is a VIMS 3-color image of Titan using the 1.6 (blue), 2.0 (green), and 5.0 (red) micron spectral windows for color information. On the right is a 3-color image using 3 interesting-looking principal components projections.

VIMS true color and principal components

These images bring up one of the big problems with PCA: What do the colors mean in the right-hand image? It's not obvious. My intention is to use the PCA to help identify clusters that indicate spectral similarity, but then I'll have to figure out what that spectral similarity means by pulling out the individual originally-projected spectra. PCA is a tool, it doesn't just do the science for ya!