2. Aggregation


Aggregation is the combination of individual trees into groups based on some characteristic such as diameter, height, species, or functional group in order to simplify field data collection or to summarize information for decision-making. All trees may be combined to obtain aggregate level estimates of the characteristic of interest. Forest stands represent one such aggregation.


In conducting an inventory, it is important to measure all trees within a sampling unit that meet specified criteria (for more information, visit Sample designs). These criteria are usually based on demographic characteristics, for example a minimum size limit, often expressed as a minimum diameter such as 10 cm. Criteria may also limit measurements to certain species of interest. It is important to remember that if only a subset of species or trees is observed in the field, then it is not possible to estimate total volume or biomass for the forest of interest. Inventory estimates of population attributes are always limited by the individual tree selection criteria applied during field observation. The results apply only to population elements with a known positive probability of being included in the inventory sample.

Sample unit summaries do not usually report values for each tree measured. Instead, trees are aggregated into groups based on species (see section 2.1), demographics such as diameter, height, or social status (see section 2.2), hierarchy such as functional group (see section 2.3), or the total stand. Aggregation may be one-dimensional (e.g., total volume by species) or multi-dimensional (e.g., volume by diameter class by species) as appropriate.
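As an illustration of the difference between one-dimensional and multi-dimensional aggregation, the following minimal Python sketch groups a handful of tree records first by species and then by species and diameter class. The tree records, the 10-cm class width, and the function name diameter_class are assumptions made for the example, not part of any inventory standard.

```python
from collections import defaultdict

# Hypothetical tree records for one sampling unit:
# (species, diameter in cm, volume in m3).
trees = [
    ("oak", 12.4, 0.08),
    ("oak", 27.9, 0.55),
    ("pine", 14.1, 0.10),
    ("pine", 16.8, 0.16),
    ("pine", 31.2, 0.70),
]

CLASS_WIDTH_CM = 10  # assumed class width


def diameter_class(dbh_cm, width=CLASS_WIDTH_CM):
    """Return the lower bound (cm) of the diameter class containing dbh_cm."""
    return int(dbh_cm // width) * width


# One-dimensional aggregation: total volume by species.
volume_by_species = defaultdict(float)
# Two-dimensional aggregation: volume by (species, diameter class).
volume_by_species_and_class = defaultdict(float)

for species, dbh, volume in trees:
    volume_by_species[species] += volume
    volume_by_species_and_class[(species, diameter_class(dbh))] += volume

for key in sorted(volume_by_species_and_class):
    print(key, round(volume_by_species_and_class[key], 2))
```

Both tables sum to the same stand total; the two-dimensional table simply retains more of the structure present in the tree-level data.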

The resulting accuracy and precision of estimates are enhanced if aggregation occurs at the analytical phase after field data are collected. In this case, variables are measured and recorded for every tree in each sampling unit, and trees are grouped into classes during analysis. In reality, though, aggregation often occurs during field observation. Instead of measuring the diameter of each tree to the nearest 0.1 cm, for example, field crews often record counts of trees by 5- or even 10-cm diameter class. In this case, the class mid-points are usually used in the analytical phase. This introduces additional error and possibly bias into the resulting estimates if the derived attribute is a nonlinear function of the aggregated variables. Of course, aggregation greatly simplifies field procedures and may well result in greater overall precision if the resulting savings are used to collect more field data.
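The effect of using class mid-points for a nonlinear attribute can be seen in a small numerical sketch. The diameters below are hypothetical and were chosen so that their mean equals the class mid-point exactly; basal area is computed by treating the stem cross-section as a circle. Under these assumptions the mid-point reproduces a linear attribute (the sum of diameters) exactly but not a nonlinear one (basal area).

```python
import math


def basal_area_m2(dbh_cm):
    """Basal area (m2) of a stem of diameter dbh_cm, assuming a circular cross-section."""
    return math.pi / 4.0 * (dbh_cm / 100.0) ** 2


# Hypothetical diameters (cm) of five trees tallied in the 10-20 cm class.
# Their mean equals the class mid-point of 15 cm.
diameters = [11.0, 13.0, 15.0, 17.0, 19.0]
midpoint = 15.0

# Linear attribute: the mid-point reproduces the sum of diameters exactly.
sum_dbh_measured = sum(diameters)
sum_dbh_midpoint = midpoint * len(diameters)

# Nonlinear attribute: basal area depends on the square of diameter.
ba_measured = sum(basal_area_m2(d) for d in diameters)
ba_midpoint = len(diameters) * basal_area_m2(midpoint)

relative_error = (ba_midpoint - ba_measured) / ba_measured
print(f"sum of diameters: {sum_dbh_measured} vs {sum_dbh_midpoint}")
print(f"basal area relative error from mid-points: {relative_error:.2%}")
```

In this constructed case the mid-point basal area is roughly 3.4 percent below the value from measured diameters. With a different distribution of diameters within the class, the size and even the sign of the error would differ.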

The reasons for aggregation vary by inventory, but include the simplification of field data collection as well as information requirements to support decision-making.

2.1 Aggregation by Species

For total stand estimates, estimated values are aggregated over all species. It is important that all species be observed if estimates are to be developed for the total stand. Aggregation by species can occur during analysis, or it can occur earlier during field data collection.

It is common in temperate forests to develop estimates of stand characteristics by species. In tropical forests, this is difficult to do and probably even more difficult to interpret due to complex stand structures or the large number of species. In some instances, species are combined by ecological functional group (e.g., canopy dominants). This often results in a manageable number of species groups that are relatively easy to interpret in terms of forest structure. Commercial utility could also be considered in defining the species groups.

In national or continental scale summaries, species groups may be developed for reporting purposes because of the large number of potential species. A species group labeled "Pine", for example, may include all Pinus species occurring within the geographic area of interest.
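A reporting group such as "Pine" can be implemented as a simple lookup from genus to group label. The sketch below is only illustrative: the GENUS_TO_GROUP table, the "Other" default, and the example tally are assumptions, and a real inventory would define its own group list.

```python
from collections import Counter

# Hypothetical reporting groups keyed by genus.
GENUS_TO_GROUP = {"Pinus": "Pine", "Quercus": "Oak"}


def species_group(scientific_name, default="Other"):
    """Map a binomial such as 'Pinus taeda' to its reporting group via the genus."""
    genus = scientific_name.split()[0]
    return GENUS_TO_GROUP.get(genus, default)


# Illustrative field tally of species names.
tally = ["Pinus taeda", "Pinus echinata", "Quercus alba", "Acer rubrum", "Pinus taeda"]

counts_by_group = Counter(species_group(name) for name in tally)
print(dict(counts_by_group))
```

Keeping the species-level names in the stored data and applying the lookup only at reporting time leaves the option of regrouping later.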

2.2 Aggregation by Size

Trees are often aggregated by demographics, meaning that trees of similar sizes or social status are combined into groups for data summaries or field data collection. Aggregation by diameter (e.g., 5-cm diameter class) is probably the most common method of aggregation during field data collection. The number of trees in each diameter class is tallied in the field, and the diameter class midpoints are used in the subsequent analyses to estimate variables like volume or biomass. The minimum diameter of trees to be measured is usually specified in field measurement protocols. This is defined according to the purpose of the survey, but is usually the minimum commercial diameter or one to two diameter classes below that threshold.
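A field tally by diameter class with a minimum diameter limit might be sketched as follows. The 10-cm minimum, the 5-cm class width, the convention that class lower bounds are inclusive, and the example diameters are all assumptions for illustration; actual protocols specify their own limits and class boundaries.

```python
from collections import Counter

MIN_DBH_CM = 10.0      # assumed minimum diameter for measurement
CLASS_WIDTH_CM = 5.0   # assumed diameter class width


def class_midpoint(dbh_cm, width=CLASS_WIDTH_CM):
    """Return the midpoint (cm) of the class containing dbh_cm (lower bound inclusive)."""
    lower = (dbh_cm // width) * width
    return lower + width / 2.0


# Hypothetical diameters (cm) encountered on one sampling unit.
field_diameters = [8.2, 10.0, 12.7, 14.9, 15.0, 23.4, 9.9]

# Trees below the minimum diameter are not tallied at all.
tally = Counter(class_midpoint(d) for d in field_diameters if d >= MIN_DBH_CM)

for midpoint in sorted(tally):
    print(f"class midpoint {midpoint} cm: {tally[midpoint]} trees")
```

Only the counts per class survive this step; the individual diameters cannot be recovered from the tally.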

Trees may also be aggregated by height class, which may have a bearing on the products that can be derived from them. Height class may also be used to aggregate trees for analysis of grazing forage availability or wildlife habitat structure. Height aggregation is probably more common in ecological or grazing surveys than in surveys that focus on assessment of commercial fiber utilization opportunities.

Finally, trees may be grouped by their position in the forest canopy, such as canopy dominants. This type of classification is more useful in ecological surveys than in surveys attempting to inventory commercial material. Structural classification may have a great deal of importance in evaluating forage potential or other non-fiber commercial potential.

2.3 Aggregation by Hierarchy

Trees may be aggregated in various hierarchical systems for analysis and reporting. Trees may be aggregated in a biological hierarchy (individual trees --> species --> ecological functional group --> stand). Trees could also be aggregated in a utilization hierarchy (individual trees --> diameter class --> product class --> species --> stand). Reporting may include summaries for any or all levels of the hierarchy.

Aggregation also occurs at the stand level, with sampled stands being grouped into predetermined strata for summary purposes. Statistical analyses may be based on such stratification. In such cases, stands may be aggregated by forest type (e.g., moist tropical) or geographic region for analyses at regional or national levels. Geographic region may be based on political delineations (e.g., state or provincial boundaries) or ecological zones (e.g., Holdridge life zones). Aggregation at the stand level may occur after estimation of the variables of interest for each sampling unit, with the appropriate methodology for the aggregation being determined by the sampling design and survey objectives. Aggregation at the stand level may also be built into the sampling design through use of stratified sampling.
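As one example of combining stratum-level results, the sketch below applies the standard stratified estimator of the mean, in which each stratum mean is weighted by the stratum's share of the total area. It assumes simple random sampling within strata, known stratum areas, and no finite population correction; the strata, areas, and plot values are hypothetical. Other sampling designs call for other estimators.

```python
from statistics import mean, variance

# Hypothetical strata: area (ha) and per-hectare volume estimates (m3/ha)
# from the sampling units that fell in each stratum.
strata = {
    "moist tropical": {"area_ha": 6000.0, "plots": [210.0, 190.0, 230.0, 170.0]},
    "dry tropical": {"area_ha": 4000.0, "plots": [90.0, 110.0, 100.0]},
}

total_area = sum(s["area_ha"] for s in strata.values())

stratified_mean = 0.0
variance_of_mean = 0.0
for s in strata.values():
    weight = s["area_ha"] / total_area           # stratum share of total area
    n = len(s["plots"])
    stratified_mean += weight * mean(s["plots"])
    # Sample variance within the stratum, divided by the number of units.
    variance_of_mean += weight ** 2 * variance(s["plots"]) / n

standard_error = variance_of_mean ** 0.5
total_volume = stratified_mean * total_area

print(f"mean: {stratified_mean:.1f} m3/ha, SE: {standard_error:.2f} m3/ha")
print(f"estimated total volume: {total_volume:.0f} m3")
```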

2.4 Aggregated Class Estimation

There are two basic sampling units commonly used in forest inventory applications: 1) fixed area plots, and 2) variable radius plots. Transect or line plot samples are a special case of fixed area plots and can be treated similarly in many applications. In most cases, estimates of the variables of interest are obtained for each sample unit, and then combined to obtain estimates for the larger area of interest. The method of combining estimates from individual sampling units, and the methods of estimating associated precision of the estimates, are dependent on the selected sampling design.

For trees measured on fixed area plots, estimates of per hectare values are obtained by summing the respective individual tree characteristics and dividing by the size of the area sampled:

\[ Y_{ij} = \frac{1}{A} \sum_{k=1}^{n_{ij}} X_{ijk} \]

where:

Yij = Estimated per hectare quantity of the measured variable for the ith sampling unit and the jth aggregation class with nij observations of X.

Xijk = Value of the measured variable for the kth tree in the ith sampling unit and jth aggregation class.

A = Size in hectares of the individual sampling unit.
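The fixed area calculation described above (sum the tree values in an aggregation class, then divide by the plot area in hectares) can be sketched in a few lines of Python. The function name, the 0.05-ha plot size, and the tree volumes are hypothetical values chosen for illustration.

```python
def per_hectare_fixed_area(tree_values, plot_area_ha):
    """Per hectare estimate for one sampling unit and one aggregation class.

    tree_values: values X of the measured variable for each tree in the class.
    plot_area_ha: size A of the sampling unit in hectares.
    """
    if plot_area_ha <= 0:
        raise ValueError("plot area must be positive")
    return sum(tree_values) / plot_area_ha


# Hypothetical 0.05-ha plot with three trees in one aggregation class.
plot_area_ha = 0.05
volumes_m3 = [0.12, 0.30, 0.18]

volume_per_ha = per_hectare_fixed_area(volumes_m3, plot_area_ha)

# Setting X = 1 for every tree gives the number of trees per hectare.
trees_per_ha = per_hectare_fixed_area([1] * len(volumes_m3), plot_area_ha)

print(f"{volume_per_ha:.1f} m3/ha, {trees_per_ha:.0f} trees/ha")
```

On a plot of this size each tree represents 1 / 0.05 = 20 trees per hectare, regardless of its diameter.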



For trees measured on variable radius sampling units (visit Measurements for more information), estimates of per hectare values are obtained by dividing the respective individual tree characteristics by the basal area of the measured tree, summing over trees, and multiplying by the basal area factor:

\[ Y_{ij} = \mathrm{BAF} \sum_{k=1}^{n_{ij}} \frac{X_{ijk}}{B_{ijk}} \]

where:

Yij = Estimated per hectare quantity of the measured variable for the ith sampling unit and the jth aggregation class with nij observations of X.

BAF = Basal area factor, equivalent to the basal area per hectare represented by each measured tree.

Xijk = Value of the measured variable for the kth tree in the ith sampling unit and jth aggregation class.

Bijk = Basal area in m² of the kth measured tree in the ith sampling unit and jth aggregation class.
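The variable radius calculation described above (divide each tree's value by that tree's basal area, sum over trees, and multiply by the basal area factor) can be sketched as follows. The function names, the basal area factor of 2 m² per hectare, and the tree data are assumptions for illustration; basal area is computed by treating the stem cross-section as a circle.

```python
import math


def basal_area_m2(dbh_cm):
    """Basal area (m2) of a stem of diameter dbh_cm, assuming a circular cross-section."""
    return math.pi / 4.0 * (dbh_cm / 100.0) ** 2


def per_hectare_variable_radius(tree_values, tree_basal_areas_m2, baf):
    """Per hectare estimate for one sampling unit and one aggregation class.

    tree_values: values X of the measured variable for each tallied tree.
    tree_basal_areas_m2: basal area B (m2) of each tallied tree, same order.
    baf: basal area factor (m2 per hectare represented by each tallied tree).
    """
    return baf * sum(x / b for x, b in zip(tree_values, tree_basal_areas_m2))


# Hypothetical tally at one sample point.
baf = 2.0
diameters_cm = [20.0, 30.0, 40.0]
volumes_m3 = [0.25, 0.60, 1.20]
basal_areas = [basal_area_m2(d) for d in diameters_cm]

volume_per_ha = per_hectare_variable_radius(volumes_m3, basal_areas, baf)
# X = 1 for every tree gives trees per hectare.
trees_per_ha = per_hectare_variable_radius([1.0] * 3, basal_areas, baf)
# X = B for every tree gives basal area per hectare: BAF times the tree count.
basal_area_per_ha = per_hectare_variable_radius(basal_areas, basal_areas, baf)

print(f"{volume_per_ha:.1f} m3/ha, {trees_per_ha:.1f} trees/ha, {basal_area_per_ha:.1f} m2/ha")
```

Unlike the fixed area case, each tallied tree represents a different number of trees per hectare: smaller trees carry larger expansion factors because they have smaller basal areas.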

2.5 Implications of Aggregation in Estimation and Modeling

Aggregation at the analytical stage is usually designed to simplify reporting or to facilitate the use of survey information for decision-making. Aggregation at the field data collection phase very often simplifies field procedures and may improve the relative accuracy. The potential downside is the possible introduction of bias and an almost certain reduction in both the accuracy and precision of the resulting estimates due to the introduction of error. Combining all trees within a species class, for example, discards information on individual tree sizes, which may reduce the precision of the estimates. This may be offset, however, if simplified field procedures allow the collection of additional data.

The survey objectives, sampling design, cost of sample unit establishment, and resulting numbers of samples that can be collected must be considered in determining the appropriate level of aggregation during field data collection. Too much aggregation at the field sampling stage results in a loss of precision in the overall estimates and a possible failure to satisfy survey objectives. Too little aggregation at the field data collection stage may result in excessive cost and a failure to collect sufficient samples for adequate precision. Decisions about aggregation during the field data collection efforts can also affect future utility of the resulting data, particularly their use in addressing issues that arise after the sampling has occurred. It is not possible, for example, to explore many aspects of biodiversity if species have been aggregated during field data collection.

Disaggregation of aggregated data is possible only if one is willing to make assumptions about the frequency distribution of the data values that have been aggregated into a single value. Only rarely can such assumptions be justified.
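To show how much a disaggregation depends on the assumed distribution, the sketch below compares two treatments of the same class tally: every tree placed at the class midpoint, and diameters assumed to be uniformly distributed across the class. The uniform assumption is purely illustrative and is exactly the kind of assumption that, as noted above, can rarely be justified. The class limits, tree count, and function names are hypothetical.

```python
import math


def midpoint_basal_area_m2(lower_cm, upper_cm):
    """Basal area (m2) per tree if every tree sits at the class midpoint."""
    midpoint = (lower_cm + upper_cm) / 2.0
    return math.pi / 4.0 * (midpoint / 100.0) ** 2


def uniform_basal_area_m2(lower_cm, upper_cm):
    """Expected basal area (m2) per tree if diameters are uniform on [lower, upper].

    For a uniform distribution on [a, b], the mean of d squared is
    (a^2 + a*b + b^2) / 3.
    """
    mean_d_squared = (lower_cm ** 2 + lower_cm * upper_cm + upper_cm ** 2) / 3.0
    return math.pi / 4.0 * mean_d_squared / 10000.0


# Hypothetical tally: five trees in the 10-20 cm class.
count = 5
ba_midpoint = count * midpoint_basal_area_m2(10.0, 20.0)
ba_uniform = count * uniform_basal_area_m2(10.0, 20.0)

print(f"midpoint assumption: {ba_midpoint:.4f} m2")
print(f"uniform assumption:  {ba_uniform:.4f} m2")
print(f"ratio: {ba_uniform / ba_midpoint:.3f}")
```

The two assumptions give different totals from the same tally, and nothing in the aggregated data indicates which, if either, is closer to the truth.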