1. Introduction

The sample design that underpins the technical program of a National Forest Assessment (NFA) is a theoretical design that must be implemented on the ground (see Implementation of an NFA). Understanding the basic concepts of statistical design and estimation methods is one component of the overall process for Information management and Data registration for National Forest Assessments.


The goal is to estimate the condition of forests for an entire nation using data collected from a sample of field plots. The basic objectives of an NFA are assumed to be fourfold: (1) to obtain national estimates of the total area of forest, subdivided by major categories of different forest types and conditions; the numbers and distributions of trees by species and size categories; wood volume by different tree characteristics; non-wood forest products; estimates of change in these forest attributes; and indicators of biodiversity; (2) to obtain sufficiently precise estimates for selected geographic regions such as the nation, sub-national areas, provinces or states, and municipalities; (3) to collect sufficient kinds and amounts of information to satisfy international reporting requirements; and (4) to achieve an acceptable compromise between cost and the precision and geographic resolution of estimates. See Variables typically assessed in National Forest Inventories.

Assumptions and simplifying constraints

Several assumptions underlie the discussion that follows. First, we assume that expert statisticians who are experienced in designing natural resource inventories and analyzing the data are not available. Second, we assume that ancillary data in the form of maps depicting features such as ecological regions, land cover, soils, elevation, political and administrative boundaries, and transportation systems are available. Third, we assume models for predicting attributes such as individual tree volumes from basic tree measurements are available. Even with these assumptions, a full discussion of all sampling design possibilities for an NFA is beyond the scope of this section. Thus, we establish three constraints that further limit the discussion. First, we constrain the discussion to relatively simple, multipurpose designs that can be used reliably with only local expertise. Second, we constrain our discussion to designs that are flexible, yet reduce risks of bias and loss of credibility. Third, we constrain our discussion to designs that feature equal probability samples, or in the case of stratified designs, equal probability samples within strata.

Why use sampling?

The most precise description of a population comes from accurately measuring every member of the population, that is, a census. However, a census is typically impossible because of cost and logistical problems. Imagine trying to measure every tree in a 1-million-hectare forest. A sample measures only a portion of the population, and in forestry this is usually a very small portion. Estimates derived from the data collected on the measured sample are then extrapolated to the entire population, most of which has not been measured.

Think of this as "guessing" or "estimating" the condition of a population based on sampling a few of its members. If the sample is representative of the entire population, then the estimate will be accurate and unbiased. Otherwise, estimates will be inaccurate and misleading, the inaccuracy will not be apparent, and its magnitude cannot be known because the true condition of the whole population is unknown. The best that can be done is to increase the chance of measuring a representative sample. This is done by using scientifically defensible rules to select the sample, maximizing the number of sample units measured, and minimizing the errors in measuring each sample unit (see Data quality). It is not difficult to produce data. It is much more challenging to produce accurate data of known reliability that will be used to help make important decisions.
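
To illustrate what "known reliability" means in practice, the following Python sketch computes the per-hectare mean, the expanded population total, and their standard errors from a simple random sample of plots. It assumes equal-probability sampling, ignores the finite population correction (consistent with treating the population as an infinite set of points), and uses 1.96 as an approximate 95% multiplier; the function name and the plot values are hypothetical.

```python
import math
import statistics


def srs_estimate(plot_values, total_area_ha, t_value=1.96):
    """Estimate the per-hectare mean, the population total, and their
    standard errors from a simple random sample of plots.

    plot_values: per-hectare value observed on each plot (e.g. m3/ha).
    total_area_ha: area of the sampled population in hectares.
    t_value: multiplier for an approximate confidence interval
             (1.96 is an assumption suited to large samples).
    """
    n = len(plot_values)
    if n < 2:
        raise ValueError("need at least two plots to estimate the variance")
    mean = statistics.mean(plot_values)
    s = statistics.stdev(plot_values)      # sample standard deviation (n - 1 divisor)
    se_mean = s / math.sqrt(n)             # no finite population correction
    total = mean * total_area_ha           # expand the mean to the whole area
    se_total = se_mean * total_area_ha
    ci_mean = (mean - t_value * se_mean, mean + t_value * se_mean)
    return {"n": n, "mean": mean, "se_mean": se_mean,
            "total": total, "se_total": se_total, "ci_mean": ci_mean}


# Hypothetical volumes (m3/ha) on five plots in a 1-million-ha population.
result = srs_estimate([120.0, 95.0, 180.0, 60.0, 145.0], 1_000_000)
```

Here the mean is 120 m³/ha and the total is 120 million m³. The standard error describes sampling variability only; it says nothing about bias from a sample that was not selected by a defensible random rule.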

Defining the population

Scientifically defensible estimation of population attributes is based on a very formal body of mathematical theory, which must be respected if it is to be used to defend the accuracy of sample-based estimates. Selecting the sampling frame, plot configuration, and sample design are crucial steps in the process, and they cannot be done independently of each other because each decision affects the others. The mathematical theory begins with a precise definition of the population for which attributes will be estimated. For example, for a municipality of 5 million ha containing 1 million ha of forest, the statistical population could be described in several different but logical ways:

  1. Thousands of tree-stands and non-forest polygons
  2. Tens of millions of potential 0.1-ha sampling plots
  3. Ten million remotely sensed 30 m × 30 m pixels
  4. Billions of trees
  5. An infinite number of points

See the section on Observation units for more details.
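
The approximate counts in the list depend on whether the population is taken to be the 1 million ha of forest or all 5 million ha of the municipality. The arithmetic below is a sketch that ignores partial units along boundaries; the constant and function names are illustrative only.

```python
FOREST_HA = 1_000_000         # forest area in the example municipality
MUNICIPALITY_HA = 5_000_000   # total area of the example municipality

PLOT_HA = 0.1                 # candidate fixed-area plot size
PIXEL_HA = 30 * 30 / 10_000   # one 30 m x 30 m pixel = 0.09 ha


def n_units(area_ha, unit_ha):
    """Number of non-overlapping units of size unit_ha that tile area_ha."""
    return round(area_ha / unit_ha)


plots_forest = n_units(FOREST_HA, PLOT_HA)        # 0.1-ha plots in the forest
plots_total = n_units(MUNICIPALITY_HA, PLOT_HA)   # 0.1-ha plots in the municipality
pixels_forest = n_units(FOREST_HA, PIXEL_HA)      # 30 m pixels in the forest
pixels_total = n_units(MUNICIPALITY_HA, PIXEL_HA) # 30 m pixels in the municipality
```

A 0.1-ha tessellation gives 10 million candidate plots in the forest and 50 million over the whole municipality, while 30 m pixels give roughly 11 million and 56 million, respectively.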

There is no single best definition of a population for forest inventories. The key issue in basic applications of forest sampling is to define precisely the geographic boundaries of the target population, such as all lands, both forest and non-forest, within a nation that lie outside the geopolitical boundaries of urban areas. It is not uncommon to discover that portions of a target population cannot be sampled, for example areas that are remote and inaccessible or unsafe to access. These areas should be delineated precisely on maps, even though their true boundaries might not be obvious, and excluded from the sampled population. Scientifically defensible estimates apply only to the sampled population.
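
Once inaccessible or unsafe areas are mapped, each candidate sample point can be checked against those boundaries. The sketch below uses the standard ray-casting point-in-polygon test and assumes that the target boundary and exclusion zones are simple polygons in the same projected coordinate system; the function names and coordinates are hypothetical.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt = (x, y) lies inside the simple polygon
    poly, given as a list of (x, y) vertices. Points exactly on an edge may
    be classified either way."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that cross the horizontal line through pt matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_sampled_population(pt, target, excluded):
    """A point belongs to the sampled population if it is inside the target
    boundary and outside every mapped exclusion zone."""
    return point_in_polygon(pt, target) and not any(
        point_in_polygon(pt, zone) for zone in excluded)


# Hypothetical target boundary and one mapped inaccessible zone.
TARGET = [(0, 0), (10, 0), (10, 10), (0, 10)]
EXCLUDED = [[(6, 6), (10, 6), (10, 10), (6, 10)]]
```

A point at (2, 2) is in the sampled population; a point at (8, 8) falls in the excluded zone, and a point at (12, 5) lies outside the target boundary.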

Choosing a sampling frame

We distinguish among three terms: sampling frame, sample design, and plot configuration. Sampling frame refers to the set of all possible sample units; sample design refers to the selection of a subset of sample units to represent the population; and plot configuration refers to the size, shape, and components of the field plot. Some advantages are gained with a sampling frame that treats a forest as an infinite population of points. One approach to sampling with this frame is the popular Bitterlich plot, which is efficient for estimating variables correlated with tree size. Alternative point-based plot configurations measure a support region and impute its attributes to the point. Near a boundary or stand edge, a point is easily assigned to one side or the other, whereas area-based plots can straddle edges or boundaries. We recommend treating the forest population as an infinite set of points and using physical measurements in a support region to describe conditions at each sample point.
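
Measuring a support region and attributing its conditions to the sample point can be sketched for a circular fixed-area plot. The Python below assumes tree coordinates in metres relative to the same origin as the point and diameters at breast height in centimetres, and it applies no correction for plots that straddle a boundary; the function name is hypothetical.

```python
import math


def fixed_plot_per_ha(center, trees, radius_m):
    """Summarize trees inside a circular support region centred on a sample
    point and express the result per hectare.

    center: (x, y) of the sample point in metres.
    trees: iterable of (x, y, dbh_cm) tuples.
    radius_m: radius of the circular support region in metres.
    Returns (stems_per_ha, basal_area_m2_per_ha) attributed to the point.
    """
    plot_area_ha = math.pi * radius_m ** 2 / 10_000
    cx, cy = center
    stems = 0
    basal_area_m2 = 0.0
    for x, y, dbh_cm in trees:
        if math.hypot(x - cx, y - cy) <= radius_m:
            stems += 1
            # Basal area of one tree: pi * (dbh in metres)^2 / 4
            basal_area_m2 += math.pi * (dbh_cm / 100) ** 2 / 4
    return stems / plot_area_ha, basal_area_m2 / plot_area_ha


# Hypothetical stem map: two trees fall inside a 10 m radius, one does not.
stems_ha, ba_ha = fixed_plot_per_ha(
    (0.0, 0.0), [(1.0, 1.0, 20.0), (5.0, 0.0, 30.0), (30.0, 30.0, 40.0)], 10.0)
```

In this example the point is assigned 3.25 m²/ha of basal area. A radius of about 17.84 m would give a support region of 0.1 ha.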

Choosing a plot configuration

The plot configuration consists of the plot size and shape and determines what is measured at each sample plot location. Choices include variable-area plots, fixed-area plots, plots subdivided into subplots, and cluster plots, all of which require decisions about plot size and shape. Variable-area plots using Bitterlich sampling are particularly effective for obtaining precise estimates of forest attributes related to tree size. Fixed-area plots, while not necessarily optimal for any particular forest attribute, are an excellent compromise when sampling is intended to produce estimates of a wide variety of forest attributes, and they tend to be more compatible with ancillary data. Cluster sampling reduces travel between plots while still providing a sufficient number of plots. The optimal shape and size can be investigated using sampling simulation and prior information, although circular plots are commonly used in forest inventories.
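
The Bitterlich (horizontal point sampling) rule can be written in a few lines. With a basal area factor BAF in m²/ha, a tree of diameter d cm is tallied if its distance from the point is at most 0.5 × d / √BAF metres; each tallied tree then represents BAF m²/ha of basal area and BAF / g trees per hectare, where g is the tree's basal area in m². The sketch below assumes metric units and ignores slope and boundary corrections; the function name and example trees are hypothetical.

```python
import math


def bitterlich_tally(center, trees, baf=2.0):
    """Horizontal point sampling (Bitterlich) at one sample point.

    center: (x, y) of the sample point in metres.
    trees: iterable of (x, y, dbh_cm) tuples.
    baf: basal area factor in m2/ha; each tallied tree represents baf m2/ha.
    Returns (count, basal_area_m2_per_ha, stems_per_ha).
    """
    cx, cy = center
    count = 0
    stems_per_ha = 0.0
    for x, y, dbh_cm in trees:
        # Limiting distance grows with tree size, so larger trees are
        # tallied from farther away (variable-area plot).
        limit_m = dbh_cm * 0.5 / math.sqrt(baf)
        if math.hypot(x - cx, y - cy) <= limit_m:
            count += 1
            g = math.pi * (dbh_cm / 100) ** 2 / 4   # tree basal area, m2
            stems_per_ha += baf / g                 # trees/ha this tree represents
    return count, count * baf, stems_per_ha


# Hypothetical trees: the 20 cm tree at 12 m is beyond its 7.07 m limit.
count, ba_ha, stems_ha = bitterlich_tally(
    (0.0, 0.0), [(0.0, 5.0, 30.0), (0.0, 12.0, 20.0), (8.0, 0.0, 40.0)], baf=2.0)
```

Two trees are tallied, giving 4 m²/ha of basal area. Because selection probability is proportional to basal area, no tree diameters are needed to estimate basal area itself, which is why the method is efficient for size-related attributes.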

Issues related to the selection of a plot configuration are discussed in the sections on Observation units and Optimization of plot designs.

Measuring sample plots

The section on Observation and Measurement for National Forest Assessments summarizes the major considerations relevant to measuring sample plots. For more detailed information, see the online reference Statistical Techniques for Sampling and Monitoring Natural Resources (Schreuder et al., 2004). In this section, we note two aspects of this issue: the use of remotely sensed data for measuring plots, and temporary versus permanent plots.

First, remotely sensed data from medium-resolution satellites and high-altitude aerial photography (1:24,000 to 1:60,000 scales) provide cost-effective measurements of coarse indicators of forest conditions, mostly changes in forest area. However, most detailed forest conditions cannot be measured with these sensors (see Remote sensing for NFA). More detailed measurements of forest conditions may be obtained with low-altitude aerial photography and sensors such as lidar. These sensors are currently expensive, and their narrow field of view cannot yet provide border-to-border coverage of an entire nation. In principle, however, they could be used to measure a sample of locations in a national survey. For example, it may be cost-efficient to examine a plot first with data from a remote sensor to determine whether it has accessible forest land cover or forest land use. If not, a field crew visit to the location may not be warranted.
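
The screening step can be sketched as a simple routing rule. This Python sketch assumes a hypothetical interpretation label per plot ('forest' for accessible forest land cover or use, 'nonforest', or 'uncertain') and assumes that uncertain plots are visited to be safe. Plots screened out remain part of the sample and are recorded from the imagery; dropping them would bias area estimates.

```python
def plan_field_visits(rs_class_by_plot):
    """Route plots using a remote-sensing interpretation.

    rs_class_by_plot: dict mapping a plot id to one of the hypothetical
    labels 'forest', 'nonforest', or 'uncertain'.
    Returns (field_plots, office_plots): plots needing a field crew visit,
    and plots recorded from the imagery alone. Both stay in the sample.
    """
    field_plots, office_plots = [], []
    for plot_id, rs_class in sorted(rs_class_by_plot.items()):
        if rs_class == "nonforest":
            office_plots.append(plot_id)   # record as non-forest, no visit
        else:
            field_plots.append(plot_id)    # forest or uncertain: send a crew
    return field_plots, office_plots


field, office = plan_field_visits(
    {"P3": "uncertain", "P1": "forest", "P2": "nonforest"})
```

The savings depend on how often the interpretation is correct; misclassified forest plots that are never visited become measurement errors, so a check sample of screened-out plots may be worth visiting.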

Second, estimating changes and trends in a nation's forests is often an important part of an NFA. If the locations of sample plots are sufficiently documented, then the same plots can be remeasured over time to obtain more precise estimates of forest change, such as tree growth, mortality, harvesting, regeneration, and changes in the area of different forest conditions and land use categories (see the sections on Temporal vs. permanent observations, Observation and Measurement, and Change assessment for more details). Remeasurement of plots increases estimation efficiency and contributes to a better understanding of the components of change. However, if permanent plots are used, their locations must be documented very accurately. This can be done by driving a pin into the ground and carefully documenting how to find the pin from a convenient starting location, perhaps several kilometers away. The pin should be hidden from normal view to keep the plot truly representative of the thousands of hectares that will never be measured. A sample plot will not be representative if it receives special treatment, such as protection from harvesting or other disturbances, and an obvious pin in the ground could influence how other people treat the location.
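
Documenting the route to a hidden pin often amounts to recording a traverse of bearing and distance legs from a known starting point. The Python sketch below reconstructs the pin position from such a record; it assumes azimuths in degrees clockwise from grid north (already corrected for magnetic declination), horizontal distances in metres, and projected easting/northing coordinates. The function name and route are hypothetical.

```python
import math


def traverse(start, legs):
    """Follow a documented route of (azimuth_deg, distance_m) legs from a
    known starting point and return the resulting (easting, northing).

    Azimuths are clockwise from grid north; distances are horizontal.
    """
    easting, northing = start
    for azimuth_deg, distance_m in legs:
        a = math.radians(azimuth_deg)
        easting += distance_m * math.sin(a)    # east component of the leg
        northing += distance_m * math.cos(a)   # north component of the leg
    return easting, northing


# Hypothetical route: 500 m due east, then 300 m due north.
pin_e, pin_n = traverse((1000.0, 2000.0), [(90.0, 500.0), (0.0, 300.0)])
```

The computed position (1500, 2300) can be stored with the written description, so a crew can check a GPS reading against the traverse when relocating the plot.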

Although remeasuring the same trees produces the most precise estimates of change, this approach is more costly because the same plot centers and trees must be relocated at each measurement. Alternatives for estimating change from temporary plots include estimating tree growth from increment borings and obtaining gross estimates of change in forest area and volume by comparing independent estimates from different sets of temporary plots measured at different points in time. However, harvest, mortality, and regeneration are difficult to estimate using data from temporary plots. Thus, where possible, we recommend the use of permanent plots or a combination of permanent and temporary plots (e.g., Ranneby et al. 1987).
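
The efficiency gain from remeasurement comes from the covariance between the two occasions. For n permanent plots, the variance of the mean change is (s1^2 + s2^2 - 2*s12) / n, where s12 is the covariance between first and second measurements; for two independent sets of temporary plots the covariance term is absent. The Python sketch below compares the two standard errors using hypothetical, illustrative values (not data from any inventory), treating the two lists as if they came from separate samples in the independent case; the function names are illustrative.

```python
import math
import statistics


def se_change_paired(before, after):
    """SE of the mean change when the same plots are measured twice."""
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


def se_change_independent(before, after):
    """SE of the difference between two means from independent sets of
    temporary plots (no covariance term because the samples are independent)."""
    return math.sqrt(statistics.variance(before) / len(before)
                     + statistics.variance(after) / len(after))


# Hypothetical volumes (m3/ha) on five plots at two measurement occasions.
before = [100, 150, 80, 200, 120]
after = [110, 158, 85, 215, 126]
paired_se = se_change_paired(before, after)
independent_se = se_change_independent(before, after)
```

With these values the paired standard error is about 1.8 m³/ha, while the independent-samples standard error is about 31 m³/ha, because plot-to-plot differences in volume largely cancel when the same plots are remeasured.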