

2. Review of methods

Two-stage stratified random sampling and visual interpretation of Landsat prints was the approach in the global tropical surveys of FRA 1990 and FRA 2000 (FAO 1996, FAO 2001a, FAO 2001b). The advantage of visual interpretation is the possibility to utilise contextual information and expert knowledge in the analysis more easily, and sometimes more effectively, than in digital methods. Visual interpretation is especially effective for detecting changes in land use with images from different years. It is, however, laborious and sensitive to subjective factors. These issues become more critical in global surveys spanning varying vegetation zones. Visual interpretation may still have a role, depending on the approach, e.g., in deriving reference data from very high resolution remote sensing data for digital analysis with high and medium resolution data, and also in feature extraction. Some of the most commonly used digital analysis methods are recalled in this chapter.

The traditional method applied in remote sensing has been discriminant analysis and its different versions. This method is relevant when the goal is to estimate a limited number of classes, e.g., vegetation types or land cover classes (FL, OWL, other land). Regression analysis has been used for estimating quantitative variables, e.g., tree stem volume and biomass. Non-parametric methods, e.g., k nearest neighbour estimation (k-nn estimation) and artificial neural networks, have the advantage that they can be used for estimating all inventory variables simultaneously. In particular, the k-nn method is under intensive research in Europe and North America and has also been applied in operative inventories.

The availability of reference data for digital image analysis or visual image interpretation will be one key problem in a remote sensing aided global survey. In principle, a certain minimum number of field plots is needed for each image scene. This could partly be overcome by relative calibration of images, which makes it possible to utilise reference data from neighbouring image scenes.

Some details of the methods are summarised in the next sections.

2.1. Relative calibration of images

2.1.1. The physical model

Several mechanisms affect the electromagnetic signal carrying information from the targets on ground to the sensor above (on satellite or aircraft). The atmosphere attenuates the signal. The attenuation depends on the amount of air between the light source and the target, the amount of air between the target and the sensor, and the wavelength. The amount of air is dependent on the distance between the target and the sensor and this varies according to the location within the field of view of the instrument. Scattering of the light from the molecules of the different gases creates a background signal at each pixel in the image. This effect is also dependent on the wavelength and viewing geometry. In addition to this, the background level at a pixel is dependent on the neighbouring targets on ground. In total, any realistic model of the atmospheric effects on the received signal is complex when the interaction between the electromagnetic field and the air is noticeable. In the microwave region the interaction is weak.

The reflection functions of natural targets vary according to the illumination and viewing angle. Models exist for predicting this phenomenon but, in addition to the viewing and illumination geometry, knowledge of the target is necessary and often not available beforehand.

Time is an important parameter in explaining the differences between images. If the images are from different dates, the atmospheric conditions are different. To some extent, the atmospheric conditions can be retrieved from the archived meteorological data or estimated from image data. The parameters related to the illumination and viewing geometry are easily available. The changes in the targets during time are more difficult to model.

Time is an important variable in explaining the changes within images if imaging takes a long time. This is the case in aerial image mosaics and airborne scanner data.

When analysing remote sensing images, all of the effects mentioned above must somehow be taken into account. In visual analysis, the human brain does this. If the analysis uses only a small region within a single image, the changes are small and can be ignored. When using global methods within larger images or image sets, correction must be applied or integrated into the analysis method.

The correction methods can be divided into two classes: atmospheric correction methods and relative calibration methods. The division is not exclusive. A correction system can include methods from both classes or relative calibration can be applied after atmospheric correction.

2.1.2. Atmospheric correction

The atmospheric correction methods correct single images based on a physical model and atmospheric parameters. The model is then inverted so that the target reflectance can be estimated for each pixel's ground footprint from the observed image pixel values.

The atmospheric models available, e.g., MODTRAN or 6S (Berk et al. 1989, Vermote et al. 1997), are fairly good, but the problem is the availability of the atmospheric parameters. Good average values may be available, but the parameters vary within an image covering a large area. In some cases it is possible to estimate this variability from the image data. Calibration of the sensors may also affect the results, especially at low signal levels (e.g., in forested areas in the visible optical region); it is very difficult to measure the changes in the sensor properties after the satellite has been launched.

Implementation of a complete inversion of the atmospheric model at each pixel is not feasible, but good enough approximations can be used with current computers. The atmospheric correction can be expected to compensate for the atmospheric effects within images and between images to a large extent, but absolute calibration of images is not feasible with current technology.
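As an illustration of the simplest end of this spectrum, the following sketch applies dark-object subtraction, a first-order approximation that removes an additive path-radiance offset estimated from the darkest pixels of the scene. It is not one of the full model inversions discussed above, and the data are synthetic.

```python
import numpy as np

def dark_object_subtraction(band, percentile=0.1):
    """Estimate the additive haze signal as the scene's darkest
    observed value and subtract it from every pixel."""
    dark = np.percentile(band, percentile)
    return np.clip(band - dark, 0.0, None)

# Synthetic band whose values include an additive haze offset.
rng = np.random.default_rng(0)
band = rng.integers(20, 200, size=(100, 100)).astype(float)
corrected = dark_object_subtraction(band)
```

The percentile rather than the strict minimum is used so that a few anomalously dark pixels do not dominate the offset estimate.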

2.1.3. Relative calibration

The relative calibration methods try to make the images homogeneous and remove the differences between images. The physical mechanisms or other causes of the differences are not explicitly used, although methods may try to avoid removing changes related to the phenomena being examined (for instance, changes in forest between two time points).

The methods for relative calibration examine areas covered by two or more images or compute statistics for different parts of the images. These are then used to compute parameters for correction models.

The relative correction methods can be very efficient in removing visible differences between images. The methods do not depend on physical parameters that may not be available. The drawback is the possibility of also correcting differences due to variability in the target state.
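A minimal sketch of the idea, assuming two images with a common overlap area and a purely linear radiometric difference; all data are synthetic and the linear gain/offset model is only one of several possible correction models:

```python
import numpy as np

# Simulate pixel values from the overlap area of a "master" and a
# "slave" image with a linear radiometric difference plus noise.
rng = np.random.default_rng(1)
master = rng.uniform(30, 150, size=5000)
slave = 0.8 * master + 12 + rng.normal(0, 2, size=master.shape)

# Fit the correction model: master ~ gain * slave + offset.
gain, offset = np.polyfit(slave, master, 1)
calibrated = gain * slave + offset
```

After calibration the slave image can be analysed with reference data gathered on the master image, which is the use case mentioned in section 2 above.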

2.2. Methods for delineating analysis units

Sometimes the analysis of larger units than a single pixel may improve the accuracy of the image analysis based estimates. This is the case particularly with very high resolution images and with visual interpretation. The analysis unit can be, e.g., a patch with the minimum area to be classified as forest land, or a homogeneous forest area. Segmentation methods are applied in digital image analysis to delineate the area of interest into subsets. Segmentation can be done using multitemporal images, a difference image, or a single image.

Image segmentation means partitioning the image into homogeneous areas according to some criterion. The homogeneity criterion is selected to either result in areas meaningful for the application area (for instance, forest stands) or meaningful as an aid in analysis. The criterion is computed using the pixels belonging to each area. Segmentation is useful only if the resolution of images is high enough so that the meaningful entities in the image are larger than one pixel.

The segmentation methods compute a more or less optimal allocation of pixels into areas based on the selected homogeneity criterion. The algorithms can be classified into three classes: splitting methods, merging methods, and split/merge methods. The splitting methods start by allocating all pixels to a single area. This area is then recursively split until each of the resulting areas is homogeneous enough according to the selected criterion. The drawback of splitting methods is that they tend to result in an excessive number of areas. The merging methods start from elementary areas (for instance, pixels) and combine the areas as long as the combined areas are homogeneous enough. The drawback of these methods is that the resulting areas tend to be smaller than in the optimal partitioning. The split/merge methods alternate between the two basic approaches to arrive at the optimal partitioning.
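The merging strategy can be sketched as follows on a one-dimensional intensity profile, using the intensity range within a segment as a deliberately simple homogeneity criterion (practical methods use richer criteria and work in two dimensions):

```python
def merge_segments(values, max_range=10.0):
    """Greedy merging: each pixel starts as its own segment; adjacent
    segments are merged while the merged segment stays homogeneous
    (max - min intensity within the segment <= max_range)."""
    segments = [[v] for v in values]
    merged = True
    while merged:
        merged = False
        i = 0
        while i < len(segments) - 1:
            candidate = segments[i] + segments[i + 1]
            if max(candidate) - min(candidate) <= max_range:
                segments[i] = candidate      # accept the merge
                del segments[i + 1]
                merged = True
            else:
                i += 1                       # try the next pair
    return segments

# A 1-D intensity profile with three homogeneous regions.
profile = [50, 52, 51, 120, 118, 121, 80, 79]
sizes = [len(s) for s in merge_segments(profile)]  # → [3, 3, 2]
```

The example recovers the three regions because the intensity jumps between them exceed the homogeneity threshold.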

The practical segmentation methods can use quite complex homogeneity criteria. The criteria can combine spatial features within an area with edge features at the border of the area.

The value of segmentation can be twofold. In all cases it enables use of several pixels for computation of more reliable information for an entity than pure pixel wise methods. In some cases the segments can also be meaningful to the application. One example is automatic partitioning of the forest into stands.

2.3. Feature analysis

The purpose of feature analysis is to compute from the remotely sensed information a set of numbers that the analyser (human or machine) can most efficiently use. Feature analysis does not increase the available information but makes it accessible to the analyser and suppresses the information that is not interesting in the current analysis.

The feature analysis methods for remote sensing images can be separated into point wise methods and spatial methods. The point wise methods compute new values based on the measurements from the same pixel. Examples are spectral channel ratios and difference images. The spectral channel ratios suppress unknown, useless information present in both channels in the ratio. One example is the average illumination level. The difference between data from two time points can be used to emphasise changes between the images. More elaborate point wise features can be constructed based on either physical or heuristic criteria.
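A minimal illustration of such point wise features, with made-up reflectance values; the normalised difference shown is one common heuristic construction, and the band names are purely illustrative:

```python
import numpy as np

nir = np.array([[0.40, 0.42], [0.10, 0.12]])   # near-infrared reflectance
red = np.array([[0.08, 0.09], [0.09, 0.10]])   # red reflectance

ratio = nir / red                    # channel ratio cancels common multiplicative effects
ndvi = (nir - red) / (nir + red)     # normalised difference feature

date1 = np.array([[0.30, 0.31], [0.29, 0.30]])
date2 = np.array([[0.10, 0.30], [0.28, 0.31]])
change = date2 - date1               # difference image emphasising change
```

In this toy example the upper-left pixel shows both a high vegetation-like feature value at date 1 and a large negative change, the pattern a change analysis would flag.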

The spatial features are computed from several pixels in the neighbourhood of the pixel being analysed. These features provide information about the spatial characteristics or homogeneity of the image around the pixel being analysed. Examples of spatial features are texture measures and edge detectors. The texture measures characterise the local regular or random structure of the intensity variations. A simple texture measure is the standard deviation of the intensities of the pixels in a small neighbourhood. A large number of more complex texture measures have been proposed in the literature. The edge detectors measure the probability of the pixel being on the border between two areas and possibly also the direction of the border at that point. These detectors are based on differences between neighbouring pixels.
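The local standard deviation texture measure mentioned above can be sketched as a naive sliding-window computation on synthetic data (border pixels are simply ignored here):

```python
import numpy as np

def local_std(image, radius=1):
    """Standard deviation of intensities in a (2*radius+1)^2 window,
    a simple texture measure; image borders are skipped."""
    h, w = image.shape
    out = np.zeros((h - 2 * radius, w - 2 * radius))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = window.std()
    return out

# Flat region (no texture) next to a noisy region (high texture).
rng = np.random.default_rng(2)
img = np.zeros((10, 20))
img[:, 10:] = rng.normal(0, 5, size=(10, 10))
texture = local_std(img)
```

The flat half of the image produces zero texture while the noisy half produces clearly positive values, which is exactly the discrimination such a feature offers a classifier.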

2.4. Estimation using combined field measurements and remote sensing

Some commonly applied estimation and classification methods are recalled below. These methods are often employed when both field measurements and remote sensing data are available. The availability of other ancillary data, e.g., digital map data, reduces the errors of the estimates. The non-parametric k-nearest neighbour method is described in more detail as an example of how the estimation procedure could be carried out when remote sensing data and field data are available.

2.4.1. Discriminant analysis

Discriminant analysis is a widely used method for classifying a satellite image area into pre-defined classes when field observations are available. Let us suppose that n classes have been defined (e.g., forest land, other wooded land, other land; or broad-leaved forest, coniferous forest, etc.). The probabilities of an arbitrary pixel p belonging to class k are computed using discriminant functions based on the generalised quadratic distances:

d_k^2(y_p) = (y_p - m_k)' S_k^{-1} (y_p - m_k) + ln |S_k| - 2 ln q_k ,   (1)

where

y_p is the vector of intensity values at pixel p,

m_k is an estimate of the vector of the expected intensities in class k,

S_k is an estimate of the covariance matrix of the intensities in class k, and

q_k is the prior probability of class k.

The value of q_k can be the a priori probability of each class or the inverse of the number of classes.

The posterior probability of observation yp belonging to the class k is, under the multivariate normal assumption, obtained from equation (2):

P(k | y_p) = exp(-d_k^2(y_p)/2) / Σ_{j=1}^{n} exp(-d_j^2(y_p)/2) ,   (2)

where n is the number of classes.

The class with the highest posterior probability is assigned to the pixel p. Discriminant analysis or its variants might be relevant methods in the RSFS if the purpose is to estimate the areas of land cover classes. Cross-validation can be utilised in pixel level error estimation. Discriminant analysis is not necessarily the optimal method when several variables should be estimated simultaneously. Problems are the high number of classes and the preservation of the covariance structure of the variables.
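A sketch of equations (1) and (2) with numpy; the class statistics for the two classes in a two-band feature space are invented for the example:

```python
import numpy as np

def discriminant_scores(y, means, covs, priors):
    """Generalised squared distances of equation (1) and the
    posterior probabilities of equation (2)."""
    d2 = []
    for m, S, q in zip(means, covs, priors):
        diff = y - m
        d2.append(diff @ np.linalg.inv(S) @ diff
                  + np.log(np.linalg.det(S)) - 2.0 * np.log(q))
    d2 = np.array(d2)
    post = np.exp(-0.5 * d2)
    return d2, post / post.sum()

# Two illustrative classes (e.g., forest land vs. other land).
means = [np.array([40.0, 30.0]), np.array([80.0, 90.0])]
covs = [np.eye(2) * 25.0, np.eye(2) * 100.0]
priors = [0.5, 0.5]
d2, post = discriminant_scores(np.array([45.0, 33.0]), means, covs, priors)
label = post.argmax()   # pixel assigned to the closest class
```

The pixel (45, 33) lies near the first class mean, so the smallest generalised distance, and hence the highest posterior, belongs to class 0.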

2.4.2. k-nn estimation

The k nearest neighbour estimation method is under intensive research among forest inventory groups (Nilsson 1997, Franco-Lopez et al. 2001, McRoberts et al. 2002, Tokola et al. 1996, Tomppo 1990). Its popularity is based on the fact that it resembles conventional forest inventory estimation in the sense that certain weights for each field plot are employed in the estimation. It thus produces estimates for all inventory variables simultaneously and preserves the covariances between the variables. Let us first recall the k-nn estimation method. It uses a distance metric, d_{p_i,p}, defined in the image feature space and computed from the pixel p to be analysed to each pixel p_i whose ground truth is known (i.e., to each pixel with a sample plot i). Data from the k plots, i_1(p),…,i_k(p), with the shortest distances are utilised in the analysis of pixel p. A maximum distance in the geographical space (usually 50 to 100 km in the horizontal direction) is set from the pixel p to the sample plots applied, in order to avoid utilising sample plots from very different vegetation zones. A maximum distance is also set in the vertical direction (in south and central Finland, e.g., usually 50 to 200 m) in order to take into account the vegetation variation caused by elevation variation, provided that a digital terrain model is available. The feasible set of nearest neighbours for a pixel p is thus {p_i | p_i ∈ F, d^h_{p_i,p} ≤ d^h_max, d^v_{p_i,p} ≤ d^v_max}, where F is a possible stratum from which the neighbours are sought, d^h_{p_i,p} and d^v_{p_i,p} are the geographical horizontal and vertical distances from pixel p to pixel p_i, and d^h_max and d^v_max are their maximum allowed values.

The maximum distance restriction in the geographical space can be relaxed with a method presented by Tomppo and Halme (2002). In that method, variables describing the large scale variation of forest variables are used as additional variables in computing the distance metric.

The weight of the ground data vector of plot i to pixel p is then defined by

w_{i,p} = (1 / d^2_{p_i,p}) / Σ_{j ∈ {i_1(p),…,i_k(p)}} (1 / d^2_{p_j,p}) , if i ∈ {i_1(p),…,i_k(p)},   (3)

w_{i,p} = 0 , otherwise.

Volume and biomass estimates can be written in the form of a digital map

m̂_p = Σ_{j ∈ {i_1(p),…,i_k(p)}} w_{j,p} m_j ,   (4)

where

m̂_p = the multi-source estimate of the value of variable M at pixel p, and

m_j = the measured value of variable M at field plot j.
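The weighting of equations (3) and (4) can be sketched as follows; the feature space, the Euclidean distance metric, and all data are illustrative, and the geographical distance constraints are omitted for brevity:

```python
import numpy as np

def knn_estimate(pixel_features, plot_features, plot_values, k=5):
    """k-nn estimate: inverse squared feature-space distance weights
    (equation (3)) over the k nearest field plots, then the weighted
    mean of the plot values (equation (4))."""
    d = np.linalg.norm(plot_features - pixel_features, axis=1)
    nearest = np.argsort(d)[:k]          # indices i_1(p), ..., i_k(p)
    w = 1.0 / d[nearest] ** 2            # unnormalised weights
    w = w / w.sum()                      # normalise as in equation (3)
    return w @ plot_values[nearest]      # equation (4)

# Illustrative data: 3-band features of 200 field plots and a plot
# variable (e.g., volume) correlated with the first band.
rng = np.random.default_rng(3)
plot_features = rng.uniform(0, 100, size=(200, 3))
plot_values = plot_features[:, 0] * 2.0
estimate = knn_estimate(np.array([50.0, 50.0, 50.0]), plot_features, plot_values)
```

Because the estimate is a convex combination of observed plot values, it always stays within the range of the field data, one reason the method preserves realistic value combinations across variables.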

The land use classes outside forestry land are transferred directly from the digital map file (see Tomppo 1990, 1996).

The sums of the weights w_{i,p} are calculated by computation units (municipalities) in the estimation process. The weight of plot i to computation unit u is then:

c_{i,u} = Σ_{p ∈ u} w_{i,p} .   (5)

The inventory statistics by computation units can be obtained by means of digital boundary maps and the weight coefficients (4). If preliminary land cover maps are available, the areas of water and of the non-forestry land cover classes can be obtained by computation units from the digital maps by multiplying the number of pixels classified into the land cover class by the size of the pixel:

A_{c,u} = n_{c,u} · a ,   (6)

where

c = land cover class,

u = computation unit,

n_{c,u} = the number of pixels classified into class c in unit u, and

a = area of one pixel.

The estimates of the areas of the forestry land reference units, by computation units, can be obtained from the estimated plot weights employing the equation

A_{S,u} = a · Σ_{i ∈ I_S} c_{i,u} ,   (7)

where

S = forestry land reference unit (e.g., forest type),

I_S = the set of sample plots belonging to the reference unit, and

u = computation unit.

Reduced weight sums are obtained from formula (5) if clouds or their shadows cover a part of the area of the computation unit u, because pixels under the cloud mask are excluded. The real weight sum for plot i is then estimated by means of the formula

c*_{i,u} = c_{i,u} · A_{f,u} / A°_{f,u} ,   (8)

where

A_{f,u} = area of the forestry land of unit u, and

A°_{f,u} = area of the forestry land of unit u not covered by the cloud mask.

It is thus assumed that the forestry land covered by clouds in a computation unit is, on average, similar to the rest of the forestry land of the unit with respect to the forest variables. The proportion of non-forestry land covered by clouds is estimated in the same way.

The mean volume (and biomass) estimates by computation units can be obtained with the formula

v̄_{s,u} = ( Σ_{i ∈ I_s} c_{i,u} v_i ) / ( Σ_{i ∈ I_s} c_{i,u} ) ,   (9)

where

s = a computation stratum (e.g., forest land),

I_s = the set of the sample plots in the stratum,

u = a computation unit,

c_{i,u} = the weight of the sample plot i in the unit u, and

v_i = the volume (biomass) per hectare of the growing stock on sample plot i.

The total volume and biomass estimates are obtained by replacing the denominator in the Formula (9) by 1.
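A small worked example of the weight sums and the mean volume estimate (equations (5) and (9)), with invented weights for three plots over a four-pixel computation unit:

```python
import numpy as np

plot_volumes = np.array([120.0, 80.0, 200.0])   # v_i, m3/ha (illustrative)

# w[i, p]: weight of plot i at each of the 4 pixels of the unit;
# each pixel's weights (columns) sum to 1, as in equation (3).
w = np.array([[0.5, 0.2, 0.0, 0.1],
              [0.3, 0.5, 0.6, 0.4],
              [0.2, 0.3, 0.4, 0.5]])

c = w.sum(axis=1)                               # equation (5): c_{i,u}
mean_volume = (c @ plot_volumes) / c.sum()      # equation (9)
weighted_total = c @ plot_volumes               # denominator replaced by 1
```

Here c sums to the number of pixels in the unit, so the weighted total still needs to be scaled by the pixel area to give a volume rather than a per-hectare figure.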

Most applications of the k-nn method concern cases where a digital land use map has been available. Methods for correcting a possible bias caused by incomplete map data have been presented by Katila et al. (2000) and Katila and Tomppo (2002).

If digital land use maps are not available, the weights (8) can also be used for area estimation outside forest land (or forest and other wooded land), provided that all land use classes are represented among the field plots. McRoberts et al. (2002) used satellite images and field measurements together with k-nn estimation to reduce the RMSE of forest area estimates without employing ancillary digital map data. The use of Landsat TM data together with field data reduced the variances of the forest area estimates by a factor of up to 5 compared to estimates based on field data alone. One field plot, consisting of four sub-plots, represented an area of 12 300 ha and 15 800 ha in the two study areas. This sampling intensity and these study areas are relevant references for a possible remote sensing aided global survey.

The pixel level errors of the estimates with remote sensing based methods are often high, particularly when estimating volumes or other quantitative variables (Tokola et al. 1996, Nilsson 1997, Poso et al. 1999). The error, however, decreases rapidly when the area in question increases (Nilsson 1997, Tomppo et al. 2002). These studies are based on a relatively dense grid of field plots, and the remote sensing data are used to compute estimates for smaller areas than is possible using field data alone. In the RSFS, the field plot grid would be much sparser, and pilot studies are needed for error estimation.

2.4.3. Artificial neural networks

The artificial neural networks (ANN) are a group of information processing methods that are based on or inspired by principles from the biological neural systems. The networks consist of simple computational units with memory that are connected together using links. The connection weights are adjusted using a learning algorithm until the network performs the desired task. This task is basically a mapping from the set of input signals (for instance, reflectance of a target in different wavelength channels) to an output (for instance, classification of the pixel).

There are several different families of ANNs. Examples of the best known ANN types are Multilayer Perceptrons (MLP), Self-Organising Maps (SOM), Radial Basis Function (RBF) networks, and Adaptive Resonance Theory (ART) networks. They differ in the unit type, connection topology, and learning algorithm. Each of the families has advantages and drawbacks, and no one family is clearly superior to another. Selection of the best family depends on the application.

One of the strongest advantages of neural networks is that the user does not have to specify a functional dependency between the inputs and outputs beforehand: the network learns the dependency when a set of learning samples is presented to the network. This enables the network to automatically infer and utilise very complex relationships between the input signals and desired outputs. This is also a drawback, because the networks may learn the dependencies between the learning samples instead of the general dependencies. The methods use learning parameters to balance generality versus specificity, and also the speed of learning.
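As a toy illustration of this learning process, the following sketch trains a one-hidden-layer perceptron on the XOR mapping with plain gradient descent; the architecture, learning rate, and iteration count are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])      # XOR targets

W1 = rng.normal(0.0, 1.0, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(3000):
    h = sigmoid(X @ W1 + b1)                    # forward pass, hidden layer
    y = sigmoid(h @ W2 + b2)                    # forward pass, output
    losses.append(float(((y - t) ** 2).mean()))
    g_y = 2.0 * (y - t) * y * (1.0 - y)         # backpropagated error, output
    g_h = (g_y @ W2.T) * h * (1.0 - h)          # backpropagated error, hidden
    W2 -= 0.5 * (h.T @ g_y); b2 -= 0.5 * g_y.sum(axis=0)
    W1 -= 0.5 * (X.T @ g_h); b1 -= 0.5 * g_h.sum(axis=0)
```

The squared error over the learning samples shrinks as the connection weights are adjusted, which is the "learning the dependency" mentioned above; whether the learned mapping generalises is the separate question discussed in the paragraph.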

The ANNs are effective tools in classification and estimation, and they can sometimes give better results than classical statistical methods. ANNs have also been tested in many remote sensing tasks.

2.4.4. Regression analysis

A commonly used approach in regression analysis is to predict the variables of interest as a function of image features. A possible model is thus

A = f(x) + ε ,

where

A is the variable to be predicted,

x the vector of image features,

f a function whose parameters are to be estimated, and

ε a N(0, σ²)-distributed error term.

In local linear regression, a weighted model is fitted at each point x. The weight of an observation depends on its distance in the image feature space, such that observations with nearby image features get higher weight. Ridge regression can be used to reduce the effect of multi-collinearity. Regression analysis is suitable, for instance, for modelling and predicting volumes and biomass, but can be applied to forest cover percentage as well.
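A sketch of ridge regression under this model, with synthetic, nearly collinear image features standing in for two spectral bands; the penalty value is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
band1 = rng.normal(50, 10, size=n)
band2 = band1 + rng.normal(0, 0.5, size=n)       # nearly collinear with band1
X = np.column_stack([np.ones(n), band1, band2])  # intercept + features
volume = 3.0 * band1 + rng.normal(0, 5, size=n)  # A = f(x) + eps

lam = 10.0
penalty = lam * np.eye(X.shape[1])
penalty[0, 0] = 0.0                              # do not penalise the intercept
beta = np.linalg.solve(X.T @ X + penalty, X.T @ volume)
pred = X @ beta
```

Without the penalty the two collinear coefficients would be poorly determined individually; the ridge term stabilises them while their sum still tracks the true combined effect.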

