2.1 Stratified sampling

Stratified sampling entails first dividing the population into non-overlapping subpopulations, called strata, that together comprise the entire population, and then drawing an independent sample from each stratum. If the sample in each stratum is a simple random sample, the whole procedure is described as stratified random sampling. Numerous reasons may be given as justification for stratified sampling (Cochran 1977, Schreuder et al. 1993). First, stratification is used to increase the precision of population estimates. To understand the potential gain in precision that may be achieved with stratification, some notation and formulae are necessary. With simple random sampling (SRS), the estimate of the population mean is

\[ \bar{y}_{SRS} = \frac{1}{n} \sum_{i=1}^{n} y_i \]

and the estimate of the variance of the mean is

\[ \widehat{Var}(\bar{y}_{SRS}) = \frac{s^2}{n} \]

where \(n\) is the sample size, \(y_i\) is an observation, and

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( y_i - \bar{y}_{SRS} \right)^2 \]

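is the sample estimate of the population variance. Cochran (1977) provides basic formulae for stratified estimation. Ignoring finite population correction factors and estimation errors in stratum weights, unbiased estimators of the population mean and of the variance of the estimated mean are

\[ \bar{y}_{Str} = \sum_{h=1}^{L} W_h \bar{y}_h \]

and

\[ \widehat{Var}(\bar{y}_{Str}) = \sum_{h=1}^{L} W_h^2 \, \frac{s_h^2}{n_h} \]

where

\[ \bar{y}_h = \frac{1}{n_h} \sum_{j=1}^{n_h} y_{hj} \quad \text{and} \quad s_h^2 = \frac{1}{n_h - 1} \sum_{j=1}^{n_h} \left( y_{hj} - \bar{y}_h \right)^2 \]

are the within-stratum means and variances, respectively; \(h = 1, 2, \ldots, L\) denote strata; \(j\) denotes observations within strata; \(n_h\) denotes the number of sample observations within the \(h\)th stratum, with \(n_1 + n_2 + \cdots + n_L = n\); and \(W_h\) is the stratum weight representing the proportion of the population in the \(h\)th stratum. The effects of stratification and stratified estimation on precision are often assessed using relative efficiency, RE, defined as

\[ RE = \frac{\widehat{Var}(\bar{y}_{SRS})}{\widehat{Var}(\bar{y}_{Str})} \]
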
where RE > 1 indicates a beneficial effect. Relative efficiency may be interpreted as the factor by which the overall sample size would have to be increased to achieve the same precision using estimation based on simple random sampling as is achieved using stratification and stratified estimation. From a quantitative perspective, precision gains are realized when variances of estimated stratum means are substantially less than the variance of the overall estimated mean and/or when strata with large within-stratum variances, \(s_h^2\), represent small proportions of the population (i.e., when \(W_h\) is small). From a qualitative perspective, precision gains are realized when heterogeneous populations are divided into more homogeneous subpopulations. This typically means that the strata have substantially different means, variances, or both.
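
The estimators described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`srs_estimate`, `stratified_estimate`), the observations, and the stratum weights are all hypothetical, finite population corrections are ignored as in the text, and the pooled sample is treated as if it were a simple random sample for the purpose of the comparison.

```python
from statistics import mean, variance


def srs_estimate(y):
    """Return (mean, estimated variance of the mean) under simple random sampling."""
    n = len(y)
    # statistics.variance() uses the n - 1 divisor, matching s^2 in the text.
    return mean(y), variance(y) / n


def stratified_estimate(samples, weights):
    """Return (mean, estimated variance of the mean) under stratified estimation.

    samples: dict mapping stratum label -> list of observations (n_h >= 2)
    weights: dict mapping stratum label -> stratum weight W_h (weights sum to 1)
    """
    est_mean = 0.0
    est_var = 0.0
    for h, y_h in samples.items():
        w_h = weights[h]
        est_mean += w_h * mean(y_h)
        est_var += w_h ** 2 * variance(y_h) / len(y_h)
    return est_mean, est_var


# Hypothetical plot observations (e.g., volume per hectare) in two strata.
samples = {
    "forest": [90.0, 110.0, 100.0, 120.0, 80.0],
    "non-forest": [0.0, 10.0, 5.0, 0.0, 5.0],
}
# Hypothetical stratum weights; the sample is proportional to them.
weights = {"forest": 0.5, "non-forest": 0.5}

pooled = [y for y_h in samples.values() for y in y_h]
mean_srs, var_srs = srs_estimate(pooled)
mean_str, var_str = stratified_estimate(samples, weights)
relative_efficiency = var_srs / var_str

print(f"SRS:        mean={mean_srs:.2f}  var={var_srs:.3f}")
print(f"Stratified: mean={mean_str:.2f}  var={var_str:.3f}")
print(f"RE = {relative_efficiency:.2f}")
```

With these made-up numbers the two strata have very different means, so almost all of the variability lies between strata and the relative efficiency is about 20. Strata with similar means would give a value much closer to 1.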

A second reason for stratification is that it may contribute to avoiding estimation bias, depending on the estimator selected. For example, NFA field crews generally are granted access to plot locations on publicly owned lands. However, if permission of private land owners is required to measure sample plots on their lands, inevitably some private land owners will deny access. In extreme cases, the ratio of privately owned to publicly owned plots in the sample may be considerably less than the ratio of privately owned to publicly owned forest lands in the population. If the species compositions and/or management practices are substantially different on privately owned and publicly owned forest lands, estimation bias may occur. One solution is to stratify lands by ownership, thus leading to independent sample estimates for the two ownership strata (McRoberts 2003).
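
A small numerical sketch may make the ownership example concrete. All numbers below are hypothetical, and observations are held constant within each stratum purely to isolate the effect of the unbalanced sample. The sketch also assumes that the private plots that were measured are representative of the private plots where access was denied; stratification by ownership does not correct for differences between those two groups.

```python
from statistics import mean

# Hypothetical stratum weights: proportion of forest land in each ownership class.
weights = {"public": 0.4, "private": 0.6}

# Hypothetical observations. Private plots are under-represented in the
# sample because some landowners denied access.
samples = {
    "public": [100.0] * 40,
    "private": [50.0] * 20,
}

# Pooling all plots lets the over-sampled public stratum dominate.
pooled = [y for y_h in samples.values() for y in y_h]
naive_mean = mean(pooled)

# Stratified estimation re-weights each stratum mean by its population share.
stratified_mean = sum(weights[h] * mean(samples[h]) for h in samples)

print(f"pooled mean:     {naive_mean:.2f}")
print(f"stratified mean: {stratified_mean:.2f}")
```

Here the pooled mean is pulled toward the public-land value because public plots make up two thirds of the sample but only 40 percent of the population, whereas the stratified estimate uses the population proportions directly.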

A third reason for stratification is to accommodate different sampling protocols or different estimation procedures for different subpopulations. For example, a substantial portion of sampling costs may be attributed to travel to and from plot locations. If data from remote sensors can be used to determine that some plots are located on non-forest land, then travel costs may be substantially reduced by not sending field crews to these plot locations. Because these plots are observed with a different measurement technique, however, a different estimator may be required for the strata that contain them.

The greatest benefits of stratified estimation are realized when the population is stratified and stratum sample sizes are determined before sampling is conducted. The process of determining stratum sample sizes or, equivalently, allocating samples to strata, may be accomplished in several different ways and for several different purposes. Frequently, samples are allocated to strata in proportion to some attribute of the strata. An easily implemented approach is to allocate sample plots to strata in proportion to strata sizes. If simple random or systematic sampling is used within strata, then this approach gives every population unit approximately the same selection probability, regardless of stratum, which may simplify estimation. However, with this approach, the variances of estimated stratum means may differ greatly. If comparably precise estimates of stratum means are desired, then samples may be allocated to strata in proportion to stratum variances. A potential disadvantage of this approach is that good estimates of stratum variances are necessary before samples are allocated to strata. Finally, it may be that estimates of means for some strata are more important than others. In this case, samples may be allocated to strata in proportion to a subjective assessment of strata importance.
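
The allocation rules described above all share one step: distribute a fixed number of plots in proportion to some per-stratum score (size, variance, or importance). The sketch below shows that step under stated assumptions. The function name `allocate`, the stratum labels, the areas, and the prior variances are hypothetical, and largest-remainder rounding is just one convenient way to obtain integer sample sizes that sum to the total.

```python
def allocate(n_total, scores):
    """Allocate n_total plots to strata in proportion to scores.

    scores: dict mapping stratum label -> positive number (area, prior
            variance estimate, or subjective importance).
    Uses largest-remainder rounding so the allocations sum to n_total.
    """
    total = sum(scores.values())
    exact = {h: n_total * s / total for h, s in scores.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    leftover = n_total - sum(alloc.values())
    # Give the remaining plots to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_fraction[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical stratum areas (hectares) and prior variance estimates.
areas = {"A": 5200, "B": 3100, "C": 1700}
prior_variances = {"A": 10.0, "B": 40.0, "C": 50.0}

by_size = allocate(30, areas)                 # proportional to strata sizes
by_variance = allocate(30, prior_variances)   # proportional to stratum variances

print(by_size)
print(by_variance)
```

In this made-up case the largest stratum receives the most plots under size-proportional allocation but the fewest under variance-proportional allocation. Whatever rule is used, each stratum needs at least two plots for its within-stratum variance to be estimated.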

Often the sampling objectives preclude stratified random sampling. For example, a systematic sampling design may be used as a means of optimizing the precision of estimates for multiple variables simultaneously. Even though the greatest benefits of stratification may not be realized for any particular variable, the beneficial effects of increasing precision and precluding estimation bias may still warrant post-sampling stratification and stratified estimation. Thus, even if stratified sampling is not used, we recommend consideration of post-sampling stratified estimation because large increases in precision may often be realized with little additional cost or effort.

Almost any source of data can be used to create strata as long as two tasks can be accomplished in a consistent manner. First, stratum weights, calculated as the proportion of the population represented by each stratum, must be determined. Second, each plot must be assigned to one and only one stratum. The increasing availability of diverse thematic digital data layers opens vast possibilities for sources of data that can be used to create strata. In addition, the increasing availability of geographic information systems (GIS) greatly simplifies accomplishment of the two tasks. One popular choice of stratification data is land cover classifications from which aggregated forest and non-forest classes may be constructed and used as strata (McRoberts 2002). Using a GIS with such a layer greatly simplifies the two stratification tasks. Within the GIS, each mapping unit of the land cover classification is assigned to a stratum based on the class assigned to the mapping unit. Calculation of stratum weights is then simply a matter of using GIS functionality to determine the total area of all mapping units assigned to the same stratum and dividing by the total area of the sampled population. Plots are assigned to the stratum of the mapping unit containing them. Other choices of digital data layers that can be used to create strata include, but are not limited to, soil maps, climate division maps, ecological provinces, administrative boundaries, ownership maps, and land management units.
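
The two stratification tasks can be sketched without any GIS library once the map overlay has been done. The example below assumes, hypothetically, that the GIS has already exported a table of mapping units (land cover class and area) and a table recording which mapping unit contains each plot. All identifiers, classes, and areas are invented for illustration.

```python
from collections import defaultdict

# Hypothetical aggregation of land cover classes into two strata.
CLASS_TO_STRATUM = {
    "deciduous": "forest",
    "coniferous": "forest",
    "cropland": "non-forest",
    "urban": "non-forest",
    "water": "non-forest",
}

# Hypothetical mapping units: unit id -> (land cover class, area in hectares).
mapping_units = {
    "u1": ("deciduous", 300.0),
    "u2": ("coniferous", 100.0),
    "u3": ("cropland", 450.0),
    "u4": ("urban", 50.0),
    "u5": ("water", 100.0),
}

# Hypothetical overlay result: plot id -> id of the mapping unit containing it.
plot_units = {"p1": "u1", "p2": "u3", "p3": "u2", "p4": "u5"}


def stratum_weights(units):
    """Task 1: stratum weight = stratum area / total area of the population."""
    area = defaultdict(float)
    for cover, hectares in units.values():
        area[CLASS_TO_STRATUM[cover]] += hectares
    total = sum(area.values())
    return {h: a / total for h, a in area.items()}


def assign_plots(plots, units):
    """Task 2: each plot takes the stratum of the mapping unit containing it."""
    return {p: CLASS_TO_STRATUM[units[u][0]] for p, u in plots.items()}


weights = stratum_weights(mapping_units)
plot_strata = assign_plots(plot_units, mapping_units)

print(weights)
print(plot_strata)
```

Because every mapping unit maps to exactly one stratum and every plot lies in exactly one mapping unit, each plot is assigned to one and only one stratum, and the weights sum to 1 by construction.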

last updated:  Monday, November 29, 2004