

6. Two-stage sampling

6.1 INTRODUCTION

In the two-stage sampling design the population is partitioned into groups, as in cluster sampling. Unlike cluster sampling, however, not every element of a selected cluster is observed: a sub-sample of elements is taken from each cluster sampled. The clusters are the first-stage units to be sampled, called primary or first sampling units and denoted by SU1. The second-stage units are the elements of those clusters, called sub-units, secondary or second sampling units, and are denoted by SU2.

Two-stage sampling is used when the clusters are large, making it difficult or expensive to observe all the units inside them. This is the case, for example, when one wishes to estimate the total landings per trip of a fishery with many landing sites and a large number of vessels.

Sometimes, in order to reduce the sizes of the primary sampling units, the population can first be stratified and two-stage sampling then applied within each stratum.

The two-stage sampling design can be extended to three or more stages. A brief reference is made below to three-stage sampling, for a case in which the sampling errors are simple to estimate.

6.2 THE POPULATION

Most of the population parameters of interest to fisheries research in the two-stage sampling design are the same as in cluster sampling. These are summarised below.

Table 6.1
Main population parameters of interest to fisheries research in two-stage sampling design

Symbol | Description
$N$ | Number of clusters (SU1) in the population
$M_i$ | Number of elements (SU2) in cluster (SU1) $i$
$M_0 = \sum_{i=1}^{N} M_i$ | Total number of elements (SU2) in the population
$\bar{M} = M_0/N$ | Mean number of elements (SU2) per cluster (SU1). This is useful when a population has clusters (SU1) of unequal size.
$Y_{ij}$ | Value of the chosen characteristic of element (SU2) $j$ in cluster (SU1) $i$
$Y_i = \sum_{j=1}^{M_i} Y_{ij}$ | Total value of the chosen characteristic in cluster (SU1) $i$
$\bar{Y}_i = Y_i/M_i$ | Mean value of the characteristic Y in the elements (SU2) of cluster (SU1) $i$
$Y = \sum_{i=1}^{N} Y_i$ | Total value of the characteristic Y in the population
$\bar{Y}_c = Y/N$ | Mean value of the characteristic Y per cluster (SU1)
$\bar{Y} = Y/M_0$ | Mean value of the characteristic Y per element (SU2)
$\bar{Y} = Y/(NM)$ | Mean value of the characteristic Y per element (SU2) if $M_i$ = constant = $M$
Variances |
$S_1^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y}_c)^2$ | Variance between total values of the characteristic Y per cluster (SU1)
$S_1^{*2} = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i - \bar{Y})^2$ | Variance between mean values of the characteristic Y per cluster (SU1). The asterisk in the symbol $S_1^{*2}$ distinguishes the variance between mean values per cluster (SU1) from the variance between total values per cluster (SU1), $S_1^2$.
$S_{2i}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_i)^2$ | Variance between values of the characteristic Y in the elements (SU2) within cluster (SU1) $i$
$S_2^2 = \frac{1}{N}\sum_{i=1}^{N} S_{2i}^2$ | Variance between values of the characteristic Y in the elements (SU2) within all clusters (SU1)

6.3 THE SAMPLE

In this design, unlike cluster sampling, the numbers $m_i$ of elements sampled at the second stage are generally smaller than the sizes $M_i$ of the corresponding clusters. The sample statistics of this design that are most important to fisheries research are summarised below.

Table 6.2
Main sample statistics of interest to fisheries research in two-stage sampling design

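Symbol | Description
$n$ | Number of clusters (SU1) sampled
$m_i$ | Number of elements (SU2) sampled from cluster (SU1) $i$
$m_0 = \sum_{i=1}^{n} m_i$ | Total number of elements (SU2) sampled
$\bar{m} = m_0/n$ | Mean number of elements (SU2) sampled per cluster (SU1)
$y_{ij}$ | Value of the characteristic Y in element (SU2) $j$ of cluster (SU1) $i$
$y_i = \sum_{j=1}^{m_i} y_{ij}$ | Total value of the characteristic Y in the elements (SU2) sampled from cluster (SU1) $i$
$\bar{y}_i = y_i/m_i$ | Mean value of the characteristic Y in the elements (SU2) sampled from cluster (SU1) $i$
$y = \sum_{i=1}^{n} y_i$ | Total value of the characteristic Y in the clusters (SU1) sampled
$\bar{y}_c = y/n$ | Sample mean value of the characteristic Y per cluster (SU1)
$\bar{y} = y/m_0$ | Sample mean value of the characteristic Y per element (SU2)
$\bar{y} = y/(nm)$ | Sample mean value of the characteristic Y per element (SU2) if $m_i$ = constant = $m$
$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_c)^2$ | Sample variance between total values of the characteristic Y per cluster (SU1)
$s_1^{*2} = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{y})^2$ | Sample variance between mean values of the characteristic Y per cluster (SU1)
$s_{2i}^2 = \frac{1}{m_i-1}\sum_{j=1}^{m_i}(y_{ij} - \bar{y}_i)^2$ | Sample variance between values of the characteristic Y in the elements (SU2) sampled within cluster (SU1) $i$
$s_2^2 = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij} - \bar{y}_i)^2 = \frac{1}{n}\sum_{i=1}^{n}s_{2i}^2$ | Sample variance between values of the characteristic Y in the elements (SU2) sampled within all clusters (SU1), if $m_i$ = constant = $m$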

6.4 THE SAMPLING WORLD

In the two-stage sampling design, the expected value and the sampling variance of an estimator in the sampling world are calculated by taking both stages into account.

Let the subscripts 1 and 2 refer to the first and second sampling stages, respectively.

First sampling stage

$E_1$ refers to the expected value of the estimator over all possible first-stage samples that can be selected from the population.

$V_1$ refers to the sampling variance of the estimator over all possible first-stage samples that can be selected from the population.

Second sampling stage

$E_2$ refers to the expected value of the estimator over all possible second-stage samples that can be selected from the first-stage clusters already sampled, that is, conditional on the SU1 sampled.

$V_2$ refers to the sampling variance of the estimator over all possible second-stage samples that can be selected from the first-stage clusters already sampled, that is, conditional on the SU1 sampled.

Using these definitions it can be demonstrated that, if $\hat{\theta}$ is an estimator of the population parameter $\theta$, the expected value of the estimator is:

$$E[\hat{\theta}] = E_1[E_2(\hat{\theta})]$$

and its sampling variance is:

$$V[\hat{\theta}] = V_1[E_2(\hat{\theta})] + E_1[V_2(\hat{\theta})]$$

The first term relates to the sampling variance of the estimator between the clusters (SU1) and the second term relates to the sampling variance between the elements (SU2) within the clusters (SU1).

This basic theorem is valid for the sampling distribution of any estimator and it is valid for any two-stage sampling design. These results can also be extended to sampling designs with more stages.
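The theorem can be checked by brute force on a small example. The Python sketch below assumes a hypothetical population of three clusters with three elements each, sampled by simple random sampling without replacement at both stages (n = 2 SU1, then m = 2 SU2 from each); the population values and the helper names `estimate_total` and `mean_var` are illustrative only. It enumerates every first-stage sample and, for each one, every second-stage sample, then compares the total variance of the estimator of Y with the sum of the two terms of the theorem.

```python
from itertools import combinations, product
from math import isclose
from statistics import fmean

# Hypothetical toy population: N = 3 clusters (SU1), each with M_i = 3 elements (SU2).
POP = [[2.0, 4.0, 6.0], [1.0, 1.0, 4.0], [5.0, 7.0, 9.0]]
N, n, m = len(POP), 2, 2  # n SU1 drawn, then m SU2 per SU1, both by SRS without replacement


def estimate_total(clusters, subsamples):
    """Y_hat = (N/n) * sum of M_i * ybar_i over the sampled clusters."""
    return N / n * sum(len(POP[i]) * fmean(s) for i, s in zip(clusters, subsamples))


def mean_var(values):
    """Mean and variance of a list of equally likely outcomes (divisor = count)."""
    mu = fmean(values)
    return mu, fmean([(v - mu) ** 2 for v in values])


cond_means, cond_vars, cond_sq = [], [], []
for clusters in combinations(range(N), n):  # every possible first-stage sample
    # every possible second-stage sample, given the SU1 already drawn
    estimates = [estimate_total(clusters, subs)
                 for subs in product(*(combinations(POP[i], m) for i in clusters))]
    e2, v2 = mean_var(estimates)
    cond_means.append(e2)                              # E2(Y_hat)
    cond_vars.append(v2)                               # V2(Y_hat)
    cond_sq.append(fmean([e * e for e in estimates]))  # E2(Y_hat^2)

expected = fmean(cond_means)                # E1[E2(Y_hat)]
total_var = fmean(cond_sq) - expected ** 2  # V(Y_hat) = E(Y_hat^2) - E(Y_hat)^2
between = mean_var(cond_means)[1]           # V1[E2(Y_hat)]
within = fmean(cond_vars)                   # E1[V2(Y_hat)]

print(isclose(expected, sum(map(sum, POP))))  # unbiased: E(Y_hat) equals the true total Y
print(isclose(total_var, between + within))   # V = V1[E2] + E1[V2]
```

Both lines print `True`. Every first-stage sample is equally likely, and so is every second-stage sample given the first, which is why plain averages can be used at each level.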

There are different methods of selecting the sampling units in the two-stage sampling design. The units can be selected with simple random sampling, or with different probabilities in one or both stages. These choices will affect the sampling distribution of the estimators, and correspondingly the choice of estimators to use for any particular purpose.

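In this document the following methods are analysed:

- first-stage selection by simple random sampling without replacement, and second-stage selection by simple random sampling without replacement (section 6.4.1);
- first-stage selection with unequal probabilities, with replacement, and second-stage selection with equal probabilities, with replacement (section 6.4.2).

In two-stage sampling applied to fisheries science, the population total value, $Y$, and the mean per element, $\bar{Y}$, are often the parameters to be estimated.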

6.4.1 First selection by simple random sampling, without replacement, and second selection by simple random sampling, without replacement

In this two-stage sampling design, an unbiased estimator of the total value of the population is:

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n}\hat{Y}_i$$

where $\hat{Y}_i$ is an estimator of the total value of the characteristic in cluster (SU1) $i$. Since simple random sampling is adopted in the second sampling stage, the estimator $\hat{Y}_i$ is:

$$\hat{Y}_i = M_i\,\bar{y}_i \quad\text{or}\quad \hat{Y}_i = \frac{M_i}{m_i}\,y_i$$

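Applying the general theorem given at the beginning of section 6.4, it can be proven that this estimator is unbiased, with sampling variance equal to:

$$V(\hat{Y}) = N^2(1-f_1)\frac{S_1^2}{n} + \frac{N}{n}\sum_{i=1}^{N}M_i^2(1-f_{2i})\frac{S_{2i}^2}{m_i}$$

In this expression, the sampling fractions at the first and second stages are $f_1 = n/N$ and $f_{2i} = m_i/M_i$, respectively ($m_i$ being the number of elements that would be sub-sampled from cluster $i$ if it were selected), $S_1^2$ is the population variance between total values of the characteristic of the clusters (SU1), and $S_{2i}^2$ is the population variance between the values of the characteristic of the elements (SU2) within cluster (SU1) $i$.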
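When the whole population is known (for instance in a simulation study), this formula can be evaluated directly. The Python sketch below does so for a population given as a list of clusters; the function name `var_total_srs_srs` and the toy population are hypothetical. It relies on `statistics.variance`, whose divisor (number of values minus one) matches the definitions of $S_1^2$ and $S_{2i}^2$ in Table 6.1.

```python
from statistics import variance


def var_total_srs_srs(pop, n, m):
    """Theoretical V(Y_hat) under SRS without replacement at both stages.

    pop: list of clusters (SU1), each a list of element (SU2) values Y_ij
    n:   number of SU1 to sample
    m:   planned sub-sample size m_i for every SU1 (one entry per cluster)
    """
    N = len(pop)
    f1 = n / N
    s1_sq = variance([sum(c) for c in pop])  # S_1^2 between cluster totals (divisor N - 1)
    between = N ** 2 * (1 - f1) * s1_sq / n
    within = N / n * sum(
        len(c) ** 2 * (1 - mi / len(c)) * variance(c) / mi  # M_i^2 (1 - f_2i) S_2i^2 / m_i
        for c, mi in zip(pop, m)
    )
    return between + within


POP = [[2.0, 4.0, 6.0], [1.0, 1.0, 4.0], [5.0, 7.0, 9.0]]  # hypothetical toy population
print(var_total_srs_srs(POP, n=2, m=[2, 2, 2]))
```

For this toy population the result is about 110.25, the same value obtained by enumerating every possible two-stage sample with n = 2 and m_i = 2.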

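An estimate of the variance $V(\hat{Y})$ is obtained by replacing the population variance $S_1^2$ with the sample estimate

$$\hat{S}_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\hat{Y}_i - \frac{\hat{Y}}{N}\right)^2$$

and the population variances $S_{2i}^2$ with the sample variances $s_{2i}^2$:

$$\hat{V}(\hat{Y}) = N^2(1-f_1)\frac{\hat{S}_1^2}{n} + \frac{N}{n}\sum_{i=1}^{n}M_i^2(1-f_{2i})\frac{s_{2i}^2}{m_i}$$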
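The following Python sketch computes $\hat{Y}$ and $\hat{V}(\hat{Y})$ from sample data. The function name `estimate_total_srs_srs`, the input layout (one pair of $M_i$ and the sampled values per SU1) and the numbers in the usage example are hypothetical; each sampled cluster needs at least two sampled elements so that $s_{2i}^2$ can be computed.

```python
from statistics import fmean, variance


def estimate_total_srs_srs(N, sample):
    """Y_hat and its estimated variance under SRS without replacement at both stages.

    N:      number of clusters (SU1) in the population
    sample: one (M_i, [y_i1, ..., y_im_i]) pair per sampled SU1, each with m_i >= 2
    """
    n = len(sample)
    f1 = n / N
    y_hat_i = [M * fmean(ys) for M, ys in sample]  # Y_hat_i = M_i * ybar_i
    y_hat = N / n * sum(y_hat_i)
    s1_hat = variance(y_hat_i)  # S_hat_1^2: divisor n - 1, centred on Y_hat / N
    within = sum(M ** 2 * (1 - len(ys) / M) * variance(ys) / len(ys)  # M_i^2 (1 - f_2i) s_2i^2 / m_i
                 for M, ys in sample)
    v_hat = N ** 2 * (1 - f1) * s1_hat / n + N / n * within
    return y_hat, v_hat


# Hypothetical data: 3 of N = 10 landing sites sampled; M_i = vessels landing at site i,
# and the lists hold the values of Y recorded for the vessels sub-sampled there.
sample = [(20, [3.0, 5.0, 4.0]), (30, [6.0, 8.0]), (25, [2.0, 4.0, 6.0])]
y_hat, v_hat = estimate_total_srs_srs(10, sample)
print(y_hat, v_hat)
```

Here $\hat{Y}$ is about 1300 and $\hat{V}(\hat{Y})$ about 119956; almost all of the estimated variance comes from the first-stage term, because the three estimated cluster totals differ widely.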

Particular case of SU1s with equal sizes

When the SU1 sampling units have equal sizes and both selections are made with equal probabilities (simple random sampling at each stage), the two-stage sampling design reduces to a very simple particular case.

Let M be the constant number of second-stage sampling units, SU2, in any of the N clusters (SU1) of the population and m the constant number of SU2 sampled from any SU1 of the sample.

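Replacing $M_i$ with $M$ and $m_i$ with $m$ in the previous case, writing $f_2 = m/M$ and noting that $\sum_{i=1}^{N}S_{2i}^2 = N S_2^2$, the above sampling variance becomes:

$$V(\hat{Y}) = N^2(1-f_1)\frac{S_1^2}{n} + N^2M^2(1-f_2)\frac{S_2^2}{nm}$$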

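In this case, the estimator of the mean per element, $\bar{Y}$, is given by $\bar{y} = \hat{Y}/(NM) = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$. The sampling variance of this estimator is given by $V(\bar{y}) = V(\hat{Y})/(NM)^2$ and hence, from the previous expression of the sampling variance, this will be:

$$V(\bar{y}) = (1-f_1)\frac{S_1^2}{nM^2} + (1-f_2)\frac{S_2^2}{nm}$$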

It was seen earlier that, for clusters (SU1) of equal size and sub-samples of SU2 of equal size, the variance between mean values of the characteristic Y in the SU1s and the variance between values of the characteristic Y in the SU2s can be written as:

$$S_1^{*2} = \frac{S_1^2}{M^2} \quad\text{and}\quad S_2^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij} - \bar{Y}_i)^2$$

respectively.

Then the sampling variance of $\bar{y}$ takes the form:

$$V(\bar{y}) = (1-f_1)\frac{S_1^{*2}}{n} + (1-f_2)\frac{S_2^2}{nm}$$
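A short Python sketch of this expression, assuming a hypothetical toy population of three equal-sized clusters; the function name `var_mean_equal_sizes` is illustrative. It also prints both sides of the identity $S_1^{*2} = S_1^2/M^2$ for the same population.

```python
from statistics import fmean, variance


def var_mean_equal_sizes(pop, n, m):
    """V(ybar) = (1 - f1) S1*^2 / n + (1 - f2) S2^2 / (n m), for clusters of equal size M."""
    N, M = len(pop), len(pop[0])
    assert all(len(c) == M for c in pop), "clusters must have equal size M"
    f1, f2 = n / N, m / M
    s1_star = variance([fmean(c) for c in pop])  # S_1*^2: between cluster means
    s2 = fmean([variance(c) for c in pop])       # S_2^2: mean within-cluster variance
    return (1 - f1) * s1_star / n + (1 - f2) * s2 / (n * m)


POP = [[2.0, 4.0, 6.0], [1.0, 1.0, 4.0], [5.0, 7.0, 9.0]]  # hypothetical, N = 3, M = 3
print(var_mean_equal_sizes(POP, n=2, m=2))
# Both values below are S_1*^2 (the second via S_1^2 / M^2):
print(variance([fmean(c) for c in POP]), variance([sum(c) for c in POP]) / 3 ** 2)
```

For n = 2 and m = 2 the result is 49/36 (about 1.361), which equals $V(\hat{Y})/(NM)^2$ for the same population and design.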

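An estimate of the sampling variance, using the sample variances $s_1^{*2}$ and $s_2^2$ of Table 6.2, is:

$$\hat{V}(\bar{y}) = (1-f_1)\frac{s_1^{*2}}{n} + f_1(1-f_2)\frac{s_2^2}{nm}$$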

The second term of this variance is negligible when f1 is small.
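The estimate can be sketched in Python as follows; the function name `estimate_mean_equal_sizes` and the sample values are hypothetical. Each SU1 must contribute the same number m ≥ 2 of elements.

```python
from statistics import fmean, variance


def estimate_mean_equal_sizes(N, M, sample):
    """ybar and V_hat(ybar) for equal-sized SU1s with m SU2 drawn from each (m >= 2)."""
    n, m = len(sample), len(sample[0])
    assert all(len(ys) == m for ys in sample), "every SU1 must contribute m elements"
    f1, f2 = n / N, m / M
    means = [fmean(ys) for ys in sample]         # ybar_i
    ybar = fmean(means)
    s1_star = variance(means)                    # s_1*^2, divisor n - 1
    s2 = fmean([variance(ys) for ys in sample])  # s_2^2 = sum of squares / (n(m - 1))
    v_hat = (1 - f1) * s1_star / n + f1 * (1 - f2) * s2 / (n * m)
    return ybar, v_hat


# Hypothetical sample: 3 of N = 20 clusters, 3 of M = 10 elements from each
ybar, v_hat = estimate_mean_equal_sizes(20, 10, [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0], [2.0, 4.0, 6.0]])
print(ybar, v_hat)
```

In this example $\bar{y} = 5$; the first term of $\hat{V}(\bar{y})$ is 0.85 and the second only about 0.023, because it is multiplied by $f_1 = 0.15$.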

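In the case of three-stage (or more stages) sampling with sampling units of equal sizes, the extension of the expressions above is simple. Consider, for example, three-stage sampling in which each of the $N$ SU1s contains $M$ SU2s and each SU2 contains $K$ third-stage units (SU3), and in which $n$ SU1s, $m$ SU2s per selected SU1 and $k$ SU3s per selected SU2 are drawn by simple random sampling. The estimator of the population mean per element is:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$$

where

$$\bar{y}_i = \frac{1}{m}\sum_{j=1}^{m}\bar{y}_{ij}, \qquad \bar{y}_{ij} = \frac{1}{k}\sum_{u=1}^{k}y_{iju}$$

The sampling variance of this estimator will be:

$$V(\bar{y}) = (1-f_1)\frac{S_1^2}{n} + (1-f_2)\frac{S_2^2}{nm} + (1-f_3)\frac{S_3^2}{nmk}$$

where:

$$f_1 = \frac{n}{N}, \qquad f_2 = \frac{m}{M}, \qquad f_3 = \frac{k}{K}$$

with:

$$S_1^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i - \bar{Y})^2, \qquad S_2^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(\bar{Y}_{ij} - \bar{Y}_i)^2, \qquad S_3^2 = \frac{1}{NM(K-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{u=1}^{K}(Y_{iju} - \bar{Y}_{ij})^2$$

An estimator of the sampling variance can be obtained from:

$$\hat{V}(\bar{y}) = (1-f_1)\frac{s_1^2}{n} + f_1(1-f_2)\frac{s_2^2}{nm} + f_1f_2(1-f_3)\frac{s_3^2}{nmk}$$

where:

$$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i - \bar{y})^2, \qquad s_2^2 = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}(\bar{y}_{ij} - \bar{y}_i)^2$$

and

$$s_3^2 = \frac{1}{nm(k-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{u=1}^{k}(y_{iju} - \bar{y}_{ij})^2$$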
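The three-stage estimator and its variance estimate can be sketched in Python as below. The nested-list layout (`sample[i][j]` holding the k values drawn from SU2 j of sampled SU1 i), the function name and the numbers are hypothetical; m ≥ 2 and k ≥ 2 are needed for the sample variances.

```python
from statistics import fmean, variance


def estimate_mean_three_stage(N, M, K, sample):
    """ybar and V_hat(ybar); sample[i][j] holds the k SU3 values drawn from SU2 j of SU1 i."""
    n, m, k = len(sample), len(sample[0]), len(sample[0][0])
    f1, f2, f3 = n / N, m / M, k / K
    ybar_ij = [[fmean(su2) for su2 in su1] for su1 in sample]    # ybar_ij
    ybar_i = [fmean(row) for row in ybar_ij]                      # ybar_i
    ybar = fmean(ybar_i)
    s1 = variance(ybar_i)                                         # divisor n - 1
    s2 = fmean([variance(row) for row in ybar_ij])                # divisor n(m - 1)
    s3 = fmean([variance(su2) for su1 in sample for su2 in su1])  # divisor nm(k - 1)
    v_hat = ((1 - f1) * s1 / n
             + f1 * (1 - f2) * s2 / (n * m)
             + f1 * f2 * (1 - f3) * s3 / (n * m * k))
    return ybar, v_hat


# Hypothetical: n = 2 of N = 10 SU1, m = 2 of M = 5 SU2 each, k = 2 of K = 4 SU3 each
sample = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [4.0, 6.0]]]
print(estimate_mean_three_stage(10, 5, 4, sample))
```

Averaging the per-group sample variances gives exactly the divisors $n(m-1)$ and $nm(k-1)$ of the formulas, because every group has the same size.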

In the case of two-stage sampling with equal-sized SU1s, it is also simple to estimate the proportion of elements of the population belonging to one certain category.

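A proportion in a sample of size $m$ can be regarded as the mean of $m$ Bernoulli variables, each taking the value $y_{ij} = 1$ if element $j$ belongs to the category of interest and $y_{ij} = 0$ otherwise. Then the proportion $p_i$ in the $i$th sampled cluster (SU1) is:

$$p_i = \bar{y}_i = \frac{1}{m}\sum_{j=1}^{m}y_{ij}$$

Then the sample mean per element is the overall proportion:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i = \frac{1}{n}\sum_{i=1}^{n}p_i$$

Therefore, the estimator of the overall proportion of the elements belonging to the category of interest can be the average, $\bar{p}$, of the proportions $p_i$ of the clusters sampled:

$$\bar{p} = \frac{1}{n}\sum_{i=1}^{n}p_i$$

This estimator is unbiased, and an estimate of its sampling variance is given by:

$$\hat{V}(\bar{p}) = (1-f_1)\frac{s_1^{*2}}{n} + f_1(1-f_2)\frac{s_2^2}{nm}$$

where

$$s_1^{*2} = \frac{1}{n-1}\sum_{i=1}^{n}(p_i - \bar{p})^2$$

and

$$s_2^2 = \frac{m}{n(m-1)}\sum_{i=1}^{n}p_iq_i, \quad\text{with}\quad q_i = 1 - p_i$$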
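A Python sketch of the proportion estimator, assuming the same number m of elements is examined in every sampled cluster and that only the count of elements in the category is recorded per cluster. The function name and the example (boxes of fish, counts of fish above a size limit) are hypothetical.

```python
from statistics import fmean, variance


def estimate_proportion(N, M, m, counts):
    """p_bar and V_hat(p_bar); counts[i] = sampled elements of SU1 i in the category."""
    n = len(counts)
    f1, f2 = n / N, m / M
    p = [a / m for a in counts]                              # p_i
    p_bar = fmean(p)
    s1_star = variance(p)                                    # divisor n - 1
    s2 = m / (n * (m - 1)) * sum(pi * (1 - pi) for pi in p)  # uses q_i = 1 - p_i
    v_hat = (1 - f1) * s1_star / n + f1 * (1 - f2) * s2 / (n * m)
    return p_bar, v_hat


# Hypothetical: 4 of N = 50 boxes sampled, m = 10 of M = 40 fish examined per box;
# counts = fish per box found in the category of interest (e.g. above a size limit)
p_bar, v_hat = estimate_proportion(50, 40, 10, [3, 5, 4, 8])
print(p_bar, v_hat)
```

The expression for $s_2^2$ is simply the within-cluster sample variance of 0/1 data, since $\sum_j (y_{ij} - p_i)^2 = m\,p_iq_i$.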

6.4.2 First selection with unequal probabilities, with replacement, and second selection with equal probabilities, with replacement

To analyse this design, let $P_i$ be the known probability of selecting cluster (SU1) $i$ in a single draw, with $\sum_{i=1}^{N}P_i = 1$.

An unbiased estimator of the population total, $Y$, is:

$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{\hat{Y}_i}{P_i}$$

where $\hat{Y}_i = M_i\bar{y}_i$ is an estimator of the total value, $Y_i$, of the $i$th SU1.

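The sampling variance of this estimator depends on the sampling designs used at both stages. However, if independent unbiased estimates $\hat{Y}_i$ of the total values $Y_i$ are available for each draw, an unbiased estimator of the sampling variance, whatever the second-stage design, is:

$$\hat{V}(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{\hat{Y}_i}{P_i} - \hat{Y}\right)^2$$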
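A minimal Python sketch of this estimator and its variance estimate, assuming each draw is recorded as its selection probability $P_i$, cluster size $M_i$ and sub-sampled values; the function name and the numbers are hypothetical. Because the first stage is with replacement, a cluster drawn twice is entered twice, each time with its own independent sub-sample.

```python
from statistics import fmean


def estimate_total_pps_wr(sample):
    """Y_hat and V_hat(Y_hat) for SU1 drawn with unequal probabilities, with replacement.

    sample: one (P_i, M_i, [y_i1, ..., y_im_i]) triple per draw (n >= 2 draws)
    """
    n = len(sample)
    z = [M * fmean(ys) / P for P, M, ys in sample]  # Y_hat_i / P_i
    y_hat = fmean(z)                                # (1/n) * sum of Y_hat_i / P_i
    v_hat = sum((zi - y_hat) ** 2 for zi in z) / (n * (n - 1))
    return y_hat, v_hat


sample = [(0.1, 20, [3.0, 5.0, 4.0]), (0.25, 30, [6.0, 8.0]), (0.2, 25, [2.0, 4.0, 6.0])]
y_hat, v_hat = estimate_total_pps_wr(sample)
print(y_hat, v_hat)
```

Only the $n$ values $\hat{Y}_i/P_i$ enter the variance estimate, which is why no within-cluster variances are needed here.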

The estimators presented for multi-stage sampling are in general efficient, but they have the drawback of complicated expressions. This complication arises from the unequal selection probabilities, which require a weight to be calculated for each cluster. Under some conditions of sample allocation, however, the estimator can be self-weighting.

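In fact, the estimator of the total value in the population can be rewritten as:

$$\hat{Y} = \sum_{i=1}^{n}\sum_{j=1}^{m_i}w_i\,y_{ij}$$

with

$$w_i = \frac{M_i}{n\,P_i\,m_i}$$

If the sample allocation is such that $m_i = \lambda M_i/P_i$ for some constant $\lambda$, then $w_i = 1/(n\lambda) = w$ is the same for every cluster, and the estimator can be rewritten:

$$\hat{Y} = w\sum_{i=1}^{n}\sum_{j=1}^{m_i}y_{ij} = w\,y$$

showing that the sample is self-weighted. For example, when $P_i = M_i/M_0$ (selection with probability proportional to cluster size), this condition means sub-sampling the same number of elements from every cluster drawn.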

The importance of self-weighting sampling is to facilitate the calculations, since the constant weight factors simplify the calculations of the estimators and of the estimated sampling variances.

Under these conditions, an estimate of the sampling variance is:

$$\hat{V}(\hat{Y}) = \frac{n\,w^2}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_c)^2 = n\,w^2 s_1^2$$
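The self-weighting shortcut can be checked against the general formulas of this section. The Python sketch below assumes a hypothetical sample drawn with $P_i = M_i/M_0$ ($M_0 = 200$) and the same sub-sample size $m_i = 4$ in every cluster, so that $m_i = \lambda M_i/P_i$ with $\lambda = 4/200$; all numbers are illustrative.

```python
from math import isclose
from statistics import fmean, variance

# Hypothetical sample: M_0 = 200, P_i = M_i / M_0 and m_i = 4 for every SU1 drawn
n = 3
M = [20, 40, 10]
P = [0.1, 0.2, 0.05]
ys = [[1, 2, 3, 2], [4, 4, 5, 3], [0, 1, 1, 2]]

weights = [Mi / (n * Pi * len(y)) for Mi, Pi, y in zip(M, P, ys)]  # w_i
assert all(isclose(wi, weights[0]) for wi in weights)              # self-weighting holds
w = weights[0]

y_i = [sum(y) for y in ys]            # sub-sample totals y_i
y_hat_sw = w * sum(y_i)               # Y_hat = w * y
v_hat_sw = n * w ** 2 * variance(y_i) # n w^2 s_1^2

# The same quantities from the general formulas (unequal probabilities, with replacement)
z = [Mi * fmean(y) / Pi for Mi, Pi, y in zip(M, P, ys)]
y_hat = fmean(z)
v_hat = sum((zi - y_hat) ** 2 for zi in z) / (n * (n - 1))
print(isclose(y_hat_sw, y_hat), isclose(v_hat_sw, v_hat))
```

Both comparisons print `True`: with a constant weight, the totals and variances can be computed from the raw sub-sample totals $y_i$ alone.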

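The population mean per element, $\bar{Y}$, can be estimated using the estimator $\hat{\bar{Y}} = \hat{Y}/M_0$.

The sampling variance of this estimator is:

$$V(\hat{\bar{Y}}) = \frac{V(\hat{Y})}{M_0^2}$$

which can be estimated by $\hat{V}(\hat{Y})/M_0^2$.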

As in the previous cases, this estimator requires that M0, the total number of elements of the population, is known.
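As a small sketch of this last step, the hypothetical Python function below goes from sample data to $\hat{\bar{Y}}$ and its estimated variance under the unequal-probability, with-replacement design; $M_0$ must be supplied, and the sample values are illustrative only.

```python
from statistics import fmean, variance


def mean_per_element_pps(M0, sample):
    """Ybar_hat = Y_hat / M0 and its estimated variance V_hat(Y_hat) / M0^2.

    sample: one (P_i, M_i, [y_i1, ..., y_im_i]) triple per draw (n >= 2 draws)
    """
    n = len(sample)
    z = [M * fmean(ys) / P for P, M, ys in sample]  # Y_hat_i / P_i
    y_hat = fmean(z)
    v_hat = variance(z) / n  # equals sum((z_i - Y_hat)^2) / (n(n - 1))
    return y_hat / M0, v_hat / M0 ** 2


sample = [(0.1, 20, [1, 2, 3, 2]), (0.2, 40, [4, 4, 5, 3]), (0.05, 10, [0, 1, 1, 2])]
ybar_hat, v_ybar = mean_per_element_pps(200, sample)
print(ybar_hat, v_ybar)
```

With $P_i = M_i/M_0$, as in this example, $\hat{\bar{Y}}$ reduces to the plain average of the cluster sample means $\bar{y}_i$ (here 7/3).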

