In cluster sampling the population is partitioned into groups, called clusters. The clusters, which are composed of elements, are not necessarily of the same size. Each element should belong to one cluster only and none of the elements of the population should be left out.
The clusters, and not the elements, are the units to be sampled. Whenever a cluster is sampled, every element within it is observed.
In cluster sampling, only a few clusters are sampled. Hence, in order to increase the precision of the estimates, the population should be partitioned into clusters in such a way that the clusters will have similar mean values. As there is no subsampling inside the selected clusters (every element of a sampled cluster is observed), the variance within clusters does not contribute to the sampling variance of the estimators. Therefore, in order to decrease the sampling variance of the estimators, the variation within the clusters should be as large as possible, while the variation between clusters should be as small as possible.
It should be noted that the partitioning of a population into clusters follows criteria opposite to those used for partitioning a population into strata: heterogeneity within clusters as opposed to homogeneity within strata, and similarity of cluster means as opposed to differences between strata means.
Cluster sampling is often more cost-effective than other sampling designs, as one does not have to sample all the clusters. However, if a cluster is large it might not be possible to observe all its elements. The next chapter will show that there are ways to overcome these difficulties.
In fisheries, cluster sampling has been used to estimate landings per trip in artisanal fisheries with a small number of vessels landing at many sites (beaches). Consider, for instance, a fishery with 100 small beaches, where a few vessels land at each beach. One is interested in the total catch per day of the vessels landing at these beaches, but it is not possible to visit all of them. In this case each beach can be a cluster. If a beach is sampled, all its elements (vessel landings) should be observed.
Another example of cluster sampling in fisheries is the sampling of the length composition of an unsorted large catch of a species kept in fish boxes onboard a vessel. Let us assume that the catch in each box is as heterogeneous as possible. The fish boxes can then be looked upon as clusters, and when a box has been selected for sampling, all the elements (fish) inside the box have to be observed.
To be consistent with other methods, the symbol N is used to designate the total number of population sampling units, which in this case are the clusters and not the elements.
The number of elements in cluster i is denoted by Mi.

The total number of elements in the population is:

$$M_0 = \sum_{i=1}^{N} M_i$$

and the mean number of elements per cluster is:

$$\bar{M} = \frac{M_0}{N}$$

The value of the characteristic Y in element j from cluster i is Yij, and Yi is the total value of the characteristic in cluster i:

$$Y_i = \sum_{j=1}^{M_i} Y_{ij}$$

The mean value per element in cluster i is:

$$\bar{Y}_i = \frac{Y_i}{M_i}$$

The total value of the characteristic in all the elements of the population is:

$$Y = \sum_{i=1}^{N} Y_i = \sum_{i=1}^{N}\sum_{j=1}^{M_i} Y_{ij}$$

In cluster sampling, two means can be considered: the mean per cluster,

$$\bar{Y} = \frac{Y}{N}$$

and the mean per element,

$$\bar{\bar{Y}} = \frac{Y}{M_0}$$
In cluster sampling, two types of variances can be considered. The first is the variance between clusters, and the second is the variance within clusters, or between elements within clusters.
The variance between cluster totals is:

$$S_1^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2$$

The variance within a cluster i, denoted by $S_{2i}^2$, is:

$$S_{2i}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}\left(Y_{ij} - \bar{Y}_i\right)^2$$
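As a concrete illustration of these definitions, the population parameters above can be computed for a small artificial population. The data here are invented for illustration and are not taken from the text:

```python
# Artificial population divided into N = 3 clusters.
clusters = [
    [4.0, 6.0],          # cluster 1: M_1 = 2 elements
    [5.0, 7.0, 9.0],     # cluster 2: M_2 = 3 elements
    [3.0, 8.0],          # cluster 3: M_3 = 2 elements
]

N = len(clusters)                      # number of clusters
M = [len(c) for c in clusters]         # M_i, elements per cluster
M0 = sum(M)                            # total number of elements
Y_i = [sum(c) for c in clusters]       # cluster totals Y_i
Y = sum(Y_i)                           # population total
Y_bar = Y / N                          # mean per cluster
Y_bbar = Y / M0                        # mean per element

# Variance between cluster totals, S_1^2
S1_sq = sum((t - Y_bar) ** 2 for t in Y_i) / (N - 1)

# Variance within cluster i, S_2i^2
S2_sq = [sum((y - sum(c) / len(c)) ** 2 for y in c) / (len(c) - 1)
         for c in clusters]
```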
Table 5.1 presents a summary of the main parameters of a discrete population divided into clusters that are most used in fisheries research.
In cluster sampling, n is the number of clusters to be sampled and mi is the number of elements sampled from cluster i. Note that, in this case, mi = Mi, since all elements of every sampled cluster are observed. The total number of elements observed (i.e. the total number of elements in the sample) is then

$$m = \sum_{i=1}^{n} m_i$$

and the mean number of elements per cluster in the sample is

$$\bar{m} = \frac{m}{n}$$

For any sampled cluster i, the value of the chosen characteristic of element j is yij, and

$$y_i = \sum_{j=1}^{m_i} y_{ij}$$

is the total value of the characteristic in cluster i. Note that the sampled cluster total yi is equal to the population cluster total Yi, since there is no sampling inside the clusters (all elements of the sampled clusters are observed).
Table 5.1
Summary of population parameters of most interest in fisheries research
| $N$ | Number of clusters in the population |
| $M_i$ | Number of elements in cluster i |
| $M_0 = \sum_{i=1}^{N} M_i$ | Total number of elements in the population |
| $\bar{M} = M_0/N$ | Mean number of elements per cluster |
| $Y_{ij}$ | Value of characteristic Y in element j of cluster i |
| $Y_i = \sum_{j=1}^{M_i} Y_{ij}$ | Total value of the characteristic Y in cluster i |
| $\bar{Y}_i = Y_i/M_i$ | Mean value of the characteristic Y in the elements of cluster i |
| $Y = \sum_{i=1}^{N} Y_i$ | Total value of characteristic Y of all the elements in the population |
| $\bar{Y} = Y/N$ | Mean value of characteristic Y per cluster |
| $\bar{\bar{Y}} = Y/M_0$ | Mean value of characteristic Y per element |
| $S_1^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$ | Variance of characteristic Y between cluster totals |
| $S_{2i}^2 = \frac{1}{M_i-1}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_i)^2$ | Variance of characteristic Y within cluster i |
The mean value of the characteristic Y in all the elements of cluster i is:

$$\bar{y}_i = \frac{y_i}{m_i}$$

The total value of the characteristic Y in all the elements of all the clusters sampled is denoted by:

$$y = \sum_{i=1}^{n} y_i$$

The mean value of the characteristic Y per cluster is:

$$\bar{y} = \frac{y}{n}$$

The mean value of the characteristic Y per element is:

$$\bar{\bar{y}} = \frac{y}{m}$$

The variance between total values of the characteristic Y in the clusters sampled is:

$$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$

The variance of the values of the characteristic Y within the ith sampled cluster is:

$$s_{2i}^2 = \frac{1}{m_i-1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$$
Table 5.2 presents a summary of the most common sample statistics in cluster sampling applied to fisheries.
Table 5.2
Most common sample statistics in cluster sampling
| $n$ | Number of clusters sampled |
| $m_i$ | Number of elements in cluster i (note that $m_i = M_i$) |
| $m = \sum_{i=1}^{n} m_i$ | Total number of elements in the sample |
| $\bar{m} = m/n$ | Sample mean number of elements per cluster |
| $y_{ij}$ | Value of the characteristic Y in element j of cluster i |
| $y_i = \sum_{j=1}^{m_i} y_{ij}$ | Total value of the characteristic Y in the sampled cluster i |
| $\bar{y}_i = y_i/m_i$ | Mean value of the characteristic Y in the sampled cluster i |
| $y = \sum_{i=1}^{n} y_i$ | Total value of the characteristic Y in the sample |
| $\bar{y} = y/n$ | Mean value of the characteristic Y per cluster sampled |
| $\bar{\bar{y}} = y/m$ | Sample mean value of the characteristic Y per element |
| $s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ | Variance between total values of the characteristic Y in the clusters sampled |
| $s_{2i}^2 = \frac{1}{m_i-1}\sum_{j=1}^{m_i}(y_{ij} - \bar{y}_i)^2$ | Variance of the values of characteristic Y within the sampled cluster i |
As mentioned in the introduction to this chapter, in cluster sampling the sampling units are the clusters. The selection of the clusters can be made by random sampling with equal probabilities (simple random sampling) or with different probabilities. A particular case of random sampling with different probabilities is when the probabilities are proportional to the sizes of the clusters.
The most important estimators in cluster sampling are the estimators of the total value, of the mean per cluster and of the mean per element.
First, let us consider an example where the clusters are selected by simple random sampling without replacement. In this case, the probability of selecting any cluster i, in one extraction, is constant and equal to $1/N$.

An unbiased estimator of the population total value, Y, is:

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n} y_i \quad \text{or} \quad \hat{Y} = N\bar{y}$$

The factor $N/n$ is a raising factor, which raises the sample total, y, to the estimator of the population total, $\hat{Y}$.
The sampling distribution of this estimator is approximately normal with expected value E and variance V:

$$\hat{Y} \sim N(E, V)$$

where

$$E = E[\hat{Y}] = Y \quad (\hat{Y}\ \text{is an unbiased estimator of}\ Y)$$

and

$$V[\hat{Y}] = N^2\left(1 - \frac{n}{N}\right)\frac{S_1^2}{n}$$

An estimate of the sampling variance can be obtained by replacing the population variance $S_1^2$ with the sample variance $s_1^2$ in the expression of the sampling variance:

$$v[\hat{Y}] = N^2\left(1 - \frac{n}{N}\right)\frac{s_1^2}{n}$$
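A minimal sketch of these two formulas in Python. The function name and the data are illustrative, not from the text:

```python
def srs_cluster_total(sampled_totals, N):
    """Estimate the population total Y and its sampling variance from the
    cluster totals y_i of a simple random sample (without replacement)
    of n clusters out of N."""
    n = len(sampled_totals)
    y_bar = sum(sampled_totals) / n            # mean per cluster sampled
    Y_hat = N * y_bar                          # Y_hat = (N / n) * y = N * y_bar
    # sample variance between cluster totals, s_1^2
    s1_sq = sum((y - y_bar) ** 2 for y in sampled_totals) / (n - 1)
    # estimated sampling variance, with finite population correction
    v_Y_hat = N ** 2 * (1 - n / N) * s1_sq / n
    return Y_hat, v_Y_hat

# e.g. two beaches sampled out of N = 100, with daily catch totals 10 and 21:
est, var = srs_cluster_total([10.0, 21.0], N=100)
```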
Two cases of selecting the sample with unequal probabilities will be considered: selection with replacement and selection without replacement. In the former, the Hansen-Hurwitz estimator will be described. The particular case of selection with probabilities proportional to the sizes of the clusters will also be studied.
Another estimator, the Horvitz-Thompson estimator, is applicable in both cases, i.e., with or without replacement.
Hansen-Hurwitz estimator - selection with replacement

Let Pi be the probability of selecting cluster i, with replacement, in one extraction (note that $\sum_{i=1}^{N} P_i = 1$). In this case an unbiased estimator of the population total value is:

$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{P_i}$$
This estimator has an approximately normal distribution with expected value E and variance V:

$$\hat{Y} \sim N(E, V)$$

where

$$E = E[\hat{Y}] = Y \quad (\hat{Y}\ \text{is an unbiased estimator of}\ Y)$$

and

$$V[\hat{Y}] = \frac{1}{n}\sum_{i=1}^{N} P_i\left(\frac{Y_i}{P_i} - Y\right)^2$$

An estimate of the sampling variance would be:

$$v[\hat{Y}] = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{P_i} - \hat{Y}\right)^2$$
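The Hansen-Hurwitz formulas can be sketched as follows (illustrative code; the data in the example call are invented):

```python
def hansen_hurwitz(draws):
    """draws: one (y_i, P_i) pair per extraction, repeats allowed, where
    y_i is the sampled cluster total and P_i its selection probability
    in one extraction (with replacement)."""
    n = len(draws)
    # unbiased estimator of the total: mean of the y_i / P_i values
    Y_hat = sum(y / p for y, p in draws) / n
    # estimate of its sampling variance
    v = sum((y / p - Y_hat) ** 2 for y, p in draws) / (n * (n - 1))
    return Y_hat, v

Y_hat, v = hansen_hurwitz([(10.0, 0.1), (21.0, 0.2), (11.0, 0.2)])
```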
Selection with probabilities proportional to cluster sizes

Let us consider the special case where the selection probability is proportional to the size of the clusters, $P_i = M_i/M_0$. In this case, the estimator of the total value, $\hat{Y}$, its sampling variance and all the other expressions can be obtained by replacing Pi in the case previously described by $M_i/M_0$, i.e.:

$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{M_i/M_0} \quad \text{or} \quad \hat{Y} = \frac{M_0}{n}\sum_{i=1}^{n}\bar{y}_i$$

The sampling variance is:

$$V[\hat{Y}] = \frac{M_0}{n}\sum_{i=1}^{N} M_i\left(\bar{Y}_i - \bar{\bar{Y}}\right)^2$$

considering that:

$$Y_i = M_i\bar{Y}_i \quad \text{and} \quad Y = M_0\bar{\bar{Y}}$$

Another expression of this variance can also be obtained:

$$V[\hat{Y}] = \frac{M_0^2}{n}\left(\sum_{i=1}^{N}\frac{M_i}{M_0}\bar{Y}_i^2 - \bar{\bar{Y}}^2\right)$$

An estimate of this variance is:

$$v[\hat{Y}] = \frac{M_0^2}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i\right)^2 \quad \text{or} \quad v[\hat{Y}] = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(M_0\bar{y}_i - \hat{Y}\right)^2$$

Note that in order to use this estimate one needs to know the total number of elements in the population, M0.
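Under the stated assumptions (selection with replacement, Pi = Mi/M0, and M0 known), the estimator of the total and its variance estimate can be sketched as (illustrative code):

```python
def pps_total(sample, M0):
    """sample: (y_i, M_i) pairs for the n sampled clusters, selected with
    probability proportional to size; M0: total number of elements."""
    n = len(sample)
    means = [y / m for y, m in sample]        # cluster means y_bar_i
    mean_of_means = sum(means) / n
    Y_hat = M0 * mean_of_means                # (M0 / n) * sum of y_bar_i
    # variance estimate: spread of the cluster means, raised by M0^2
    v = M0 ** 2 * sum((yb - mean_of_means) ** 2
                      for yb in means) / (n * (n - 1))
    return Y_hat, v

Y_hat, v = pps_total([(10.0, 2), (21.0, 3)], M0=7)
```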
An estimator of the mean value per element is:

$$\hat{\bar{\bar{Y}}} = \frac{\hat{Y}}{M_0} \quad \text{or} \quad \hat{\bar{\bar{Y}}} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$$

which has a sampling variance given by:

$$V[\hat{\bar{\bar{Y}}}] = \frac{V[\hat{Y}]}{M_0^2}$$

An estimate of the sampling variance can be calculated by replacing the variance of the population total with its sample estimate:

$$v[\hat{\bar{\bar{Y}}}] = \frac{v[\hat{Y}]}{M_0^2}$$

resulting in:

$$v[\hat{\bar{\bar{Y}}}] = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \hat{\bar{\bar{Y}}}\right)^2$$

Note that, in the form $\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$, this estimator and its variance estimate do not require knowledge of the total number of elements in the population, M0, since M0 cancels out.
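A sketch of the mean-per-element estimator written as the mean of the sampled cluster means (illustrative code):

```python
def pps_mean_per_element(cluster_means):
    """Estimate the mean per element from the sampled cluster means
    y_bar_i, under selection with probability proportional to size
    (with replacement)."""
    n = len(cluster_means)
    est = sum(cluster_means) / n
    # variance estimate: spread of the cluster means around the estimate
    v = sum((yb - est) ** 2 for yb in cluster_means) / (n * (n - 1))
    return est, v

est, v = pps_mean_per_element([5.0, 7.0])
```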
The estimator of the mean value per cluster is:

$$\hat{\bar{Y}} = \frac{\hat{Y}}{N} = \frac{\bar{M}}{n}\sum_{i=1}^{n}\bar{y}_i$$

with a sampling variance:

$$V[\hat{\bar{Y}}] = \frac{V[\hat{Y}]}{N^2}$$

An estimate of this sampling variance can be obtained from $v[\hat{Y}]$ and is expressed as:

$$v[\hat{\bar{Y}}] = \frac{v[\hat{Y}]}{N^2} = \frac{\bar{M}^2}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i\right)^2$$

In order to use this estimator one needs to know M0, or at least the mean number of elements per cluster, $\bar{M} = M_0/N$.
Selecting clusters with probabilities proportional to their sizes is not always easy. A simple procedure for selecting n clusters with probabilities proportional to their sizes ($P_i = M_i/M_0$), that can be easily used in fisheries research, is given below:

1. List the clusters and their sizes, Mi.
2. Calculate the cumulative sums of the sizes; the cumulative sum for the last cluster is M0.
3. Assign to each cluster its "selection numbers", i.e. the integers between the cumulative sum of the preceding cluster plus one and its own cumulative sum.
4. Draw a random number between 1 and M0; the cluster whose selection numbers include the drawn number is selected. Repeat until n distinct clusters have been selected, ignoring draws that fall on clusters already chosen.

A simple example to illustrate the procedure:
Consider a situation where one wishes to select three out of five boats landing fish on a beach. The boats are considered as the clusters to be sampled. Each boat carries a different number of fish boxes to be landed. The percentages of the total number of fish boxes carried by each one of the five boats will be considered as the probabilities proportional to the sizes of the clusters. Table 5.3 shows the original data and how to calculate what can be designated as the “boat selection numbers”.
Table 5.3
Original data and
calculation of “boat selection numbers”
| Boat | Number of fish boxes | Cumulative numbers | Boat selection numbers | Selection Probability |
| 1 | 5 | 5 | 1–5 | 5/50=0.10 |
| 2 | 10 | 15 | 6–15 | 10/50=0.20 |
| 3 | 7 | 22 | 16–22 | 7/50=0.14 |
| 4 | 13 | 35 | 23–35 | 13/50=0.26 |
| 5 | 15 | 50 | 36–50 | 15/50=0.30 |
By repeating three times a simple random sampling of one number out of fifty, one obtains three boat selection numbers corresponding to the boats to be sampled (ignoring selected numbers corresponding to boats already chosen).
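The selection-number procedure of Table 5.3 can be sketched in code. The function name and the fixed random seed are illustrative choices:

```python
import bisect
import random

def select_pps(sizes, n, rng=None):
    """Select n distinct clusters (0-based indices) with probabilities
    proportional to their sizes, using cumulative 'selection numbers'."""
    rng = rng or random.Random()
    cum, total = [], 0
    for s in sizes:
        total += s
        cum.append(total)               # cumulative numbers; last one = M0
    chosen = []
    while len(chosen) < n:
        r = rng.randint(1, total)       # one number out of 1..M0
        i = bisect.bisect_left(cum, r)  # cluster whose range contains r
        if i not in chosen:             # ignore clusters already chosen
            chosen.append(i)
    return chosen

# the five boats of Table 5.3, with 5, 10, 7, 13 and 15 fish boxes:
boats = select_pps([5, 10, 7, 13, 15], n=3, rng=random.Random(1))
```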
Horvitz-Thompson estimator - selection with or without replacement
It is convenient, before describing this estimator and its sampling characteristics, to show how inclusion probabilities can be calculated.
Let πi denote the probability of including cluster i in a sample of size n. The inclusion probability, πi, is connected with the probability Pi of selecting cluster i in one single extraction.
To derive the relation between πi and Pi it is preferable to use complementary probabilities. In this way, the probability, 1 - πi, of not including cluster i in a sample of size n can be calculated as the probability of not selecting cluster i in any of the n extractions, that is, $1 - \pi_i = (1 - P_i)^n$. Therefore the inclusion probability πi will be:

$$\pi_i = 1 - (1 - P_i)^n$$
Let us now consider the probability, πij, that both cluster i and cluster j are included in a sample of size n. The probability of extracting either cluster i or cluster j, in one extraction, is Pi + Pj, and so the probability of extracting neither cluster i nor cluster j will be 1 - (Pi + Pj).

In n independent extractions the probability of extracting neither cluster i nor cluster j will be $[1 - (P_i + P_j)]^n$. Therefore the probability of extracting cluster i or cluster j (or both) in n extractions will be $1 - [1 - (P_i + P_j)]^n$.
Alternatively, the same probability - that either cluster i or j be included in the sample, could also be expressed as the probability of including cluster i plus the probability of including cluster j minus the probability of including both i and j, that is, (πi + πj) - πij.
The two last expressions are different ways of writing the same probability, thus:

$$\pi_i + \pi_j - \pi_{ij} = 1 - [1 - (P_i + P_j)]^n$$

Finally the inclusion probability, πij, can be calculated as:

$$\pi_{ij} = (\pi_i + \pi_j) - \{1 - [1 - (P_i + P_j)]^n\}$$
The calculations of the inclusion probabilities, for the example presented previously, are shown below.
| Boat, i | Prob. of selection, Pi | Prob. of inclusion, πi | Inclusion probabilities, πij | |||
| 1 | 2 | 3 | 4 | |||
| 1 | 5/50=0.10 | π1= 0.271 | ||||
| 2 | 10/50=0.20 | π2= 0.488 | 0.102 | |||
| 3 | 7/50=0.14 | π3= 0.364 | 0.074 | 0.139 | ||
| 4 | 13/50=0.26 | π4= 0.595 | 0.128 | 0.240 | 0.175 | |
| 5 | 15/50=0.30 | π5= 0.657 | 0.144 | 0.270 | 0.197 | 0.337 |
| Total | 1.00 | |||||
Calculations:
| π1 = 1 - (1 - 0.10)^3 = 0.271 | |
| π2 = 1 - (1 - 0.20)^3 = 0.488 | π12 = 0.271 + 0.488 - {1 - [1 - (0.10 + 0.20)]^3} = 0.102 |
| π3 = 1 - (1 - 0.14)^3 = 0.364 | π13 = 0.271 + 0.364 - {1 - [1 - (0.10 + 0.14)]^3} = 0.074 |
| | π23 = 0.488 + 0.364 - {1 - [1 - (0.20 + 0.14)]^3} = 0.139 |
| | etc. |
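The inclusion-probability formulas can be checked against the table above with a few lines of code (illustrative):

```python
def inclusion_probs(P, n):
    """Single and pairwise inclusion probabilities for n extractions
    with replacement, given one-draw selection probabilities P."""
    pi = [1 - (1 - p) ** n for p in P]
    pij = {(i, j): pi[i] + pi[j] - (1 - (1 - P[i] - P[j]) ** n)
           for i in range(len(P)) for j in range(i + 1, len(P))}
    return pi, pij

pi, pij = inclusion_probs([0.10, 0.20, 0.14, 0.26, 0.30], n=3)
# pi[0] is about 0.271 and pij[(0, 1)] about 0.102, as in the table
```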
The Horvitz-Thompson estimator of the total value of the population is:

$$\hat{Y} = \sum_{i=1}^{n^*}\frac{y_i}{\pi_i}$$

where yi is the total value of the variable in the distinct sampled cluster i; πi is the probability of inclusion of cluster i in the sample; and n* is the "effective" size of the sample, that is, the number of distinct clusters in a sample of size n. Note that clusters sampled repeatedly are eliminated from the calculations.

The estimator is unbiased and its sampling variance can be written as:

$$V[\hat{Y}] = \sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}\,Y_i^2 + \sum_{i=1}^{N}\sum_{j \neq i}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,Y_iY_j$$

An unbiased estimate of this variance is:

$$v[\hat{Y}] = \sum_{i=1}^{n^*}\frac{1-\pi_i}{\pi_i^2}\,y_i^2 + \sum_{i=1}^{n^*}\sum_{j \neq i}^{n^*}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}\,\frac{y_iy_j}{\pi_{ij}}$$

Note that all the inclusion probabilities should be different from zero.
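A sketch of the Horvitz-Thompson estimator and its variance estimate. The double sum over j ≠ i is written as twice the sum over i < j; names and data are illustrative:

```python
def horvitz_thompson(ys, pis, pij):
    """ys[i], pis[i]: total value and inclusion probability of the i-th
    distinct sampled cluster; pij[(i, j)]: joint inclusion probability
    of clusters i and j (i < j)."""
    n_star = len(ys)                      # effective sample size
    Y_hat = sum(y / p for y, p in zip(ys, pis))
    # first term of the variance estimate
    v = sum((1 - p) / p ** 2 * y ** 2 for y, p in zip(ys, pis))
    # cross terms: the sum over j != i equals twice the sum over i < j
    for i in range(n_star):
        for j in range(i + 1, n_star):
            pj = pij[(i, j)]
            v += 2 * (pj - pis[i] * pis[j]) / (pis[i] * pis[j]) \
                 * ys[i] * ys[j] / pj
    return Y_hat, v

Y_hat, v = horvitz_thompson([10.0, 20.0], [0.5, 0.8], {(0, 1): 0.4})
```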
These estimates of the variance can be negative. A way to avoid this inconvenience is as follows:

Calculate, for each of the n* distinct sampled clusters i, the following ti statistic:

$$t_i = n^*\frac{y_i}{\pi_i}$$

Each of the ti values calculated can be considered as an estimate of the total value, Y.

The mean of the ti for all clusters effectively sampled is a Horvitz-Thompson estimator of the total value Y:

$$\hat{Y} = \bar{t} = \frac{1}{n^*}\sum_{i=1}^{n^*} t_i$$

The estimate, v[Ŷ], of the sampling variance of this estimator is:

$$v[\hat{Y}] = \frac{s_t^2}{n^*} \quad \text{where} \quad s_t^2 = \frac{1}{n^*-1}\sum_{i=1}^{n^*}\left(t_i - \bar{t}\right)^2 \quad \text{and} \quad \bar{t} = \hat{Y}$$
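The ti procedure can be sketched as follows (illustrative code; this variance estimate is never negative):

```python
def ht_via_t(ys, pis):
    """Horvitz-Thompson-type estimate of the total and a non-negative
    variance estimate via the t_i = n* y_i / pi_i statistics."""
    n_star = len(ys)
    t = [n_star * y / p for y, p in zip(ys, pis)]   # each t_i estimates Y
    t_bar = sum(t) / n_star                         # estimator of Y
    s_t_sq = sum((ti - t_bar) ** 2 for ti in t) / (n_star - 1)
    v = s_t_sq / n_star                             # never negative
    return t_bar, v

Y_hat, v = ht_via_t([10.0, 20.0], [0.5, 0.8])
```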