When the population is heterogeneous, dividing it into sub-populations, called *strata*, can increase the
precision of the estimates. The *strata* must not overlap, every *stratum* must be sampled, and each *stratum* is sampled following some design.
The *strata* are sampled separately, and the estimates from each *stratum*
are combined into one estimate for the whole population.

The theory of stratified sampling deals with the properties of the sampling distribution of the estimators and with different types of allocation of the sample sizes to obtain the maximum precision.

The principle of stratification is the partition of the
population in such a way that the elements within a *stratum* are as
similar as possible and the means of the *strata* are as different as
possible.

The design is called stratified random sampling if simple
random sampling is applied to each *stratum*.

In stratified sampling the population of $N$ elements is
divided into $k$ *strata* of sizes:

$$N_1, N_2, \ldots, N_h, \ldots, N_k$$

Every element in the population belongs to exactly one *stratum*. Figure 4.1
shows a stratified sampling scheme for a shrimp fishing ground.

FIGURE 4.1

**A stratified sampling scheme for a shrimp fishing ground**

The population was divided
into 19 *strata*. As an illustration, *stratum* 17 shows the 18
trawling unit areas into which the *stratum* was divided. A similar
subdivision was used for each of the other *strata*.

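Let $N_h$ represent the size of *stratum h* and $Y_{hi}$ the value of characteristic Y in the $i$-th element of that *stratum*. The total value of the *stratum* is

$$Y_h = \sum_{i=1}^{N_h} Y_{hi}$$

and the mean value is:

$$\bar{Y}_h = \frac{Y_h}{N_h}$$

The modified population variance of *stratum h* is:

$$S_h^2 = \frac{SS_h}{N_h - 1} = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(Y_{hi} - \bar{Y}_h\right)^2$$

Note that the sum of squares of residuals, $SS_h$, is divided by $N_h - 1$ and not by $N_h$.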

The total value of the characteristic Y in the population is the sum of the total values $Y_h$ of all *strata*:

$$Y = \sum_{h=1}^{k} Y_h$$

and the mean value is a weighted average of the means $\bar{Y}_h$ of all *strata*,

$$\bar{Y} = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{Y}_h$$

where $N$ is the size of the population with $k$ *strata*, $N = N_1 + N_2 + \ldots + N_h + \ldots + N_k$, and $N_h$ is the size of *stratum h*.

In stratified sampling, a sample is selected from each *stratum*
by simple random sampling. Selections are made independently in each *stratum*.

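Consider a sample of size $n_h$ selected from *stratum h*, and let $y_{hi}$ be the value of characteristic Y observed in the $i$-th sampled element. The sample total and the sample mean of the *stratum* are

$$y_h = \sum_{i=1}^{n_h} y_{hi}, \qquad \bar{y}_h = \frac{y_h}{n_h}$$

The sample variance of characteristic Y in *stratum h* is:

$$s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}\left(y_{hi} - \bar{y}_h\right)^2$$

The sample standard deviation is $s_h = \sqrt{s_h^2}$.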

Given independent simple random
samples from each *stratum*, each of size $n_h$, the total sample
size is $n = \sum_{h=1}^{k} n_h$.

Under these conditions, the total
value of characteristic Y in the whole sample is the sum of the sample total
values in each *stratum*,

$$y = \sum_{h=1}^{k} y_h$$

The stratified sample mean, $\bar{y}_{st}$, is given by the
weighted average of the sample means of the characteristic of interest from
each *stratum*,

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{y}_h$$

and the stratified sample variance is
simply the sum of the variances within each *stratum*. This is achieved
because there is no sampling of *strata* (all are observed) and sampling
is carried out independently within each of them,

The stratified sample standard
deviation is $s_{st} = \sqrt{s_{st}^2}$,

and the coefficient of variation is $cv = s_{st} / \bar{y}_{st}$.

*Estimator of the mean value*

Within each *stratum*, simple random sampling is used.
So, the sampling distribution of the estimators of the population parameters of
each *stratum* is that given for simple random sampling.

An unbiased estimator of the mean of characteristic Y in *stratum h*, $\bar{Y}_h$, is the sample mean $\bar{y}_h$.

The sampling distribution of $\bar{y}_h$ is approximately
normal, $N(E, V)$, where $E$ is the expected value and $V$ is the
sampling variance of the estimator in *stratum h*:

$$E = E[\bar{y}_h] = \bar{Y}_h$$

and

$$V = V[\bar{y}_h] = (1 - f_h)\frac{S_h^2}{n_h}$$

where $f_h$ is the
sampling fraction defined by $f_h = n_h / N_h$.

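An unbiased estimator of $S_h^2$ is the sample variance $s_h^2$:

$$s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}\left(y_{hi} - \bar{y}_h\right)^2$$

An estimator of the sampling variance of the estimator can thus be obtained by replacing $S_h^2$ by $s_h^2$ in the corresponding expression:

$$\hat{V}[\bar{y}_h] = (1 - f_h)\frac{s_h^2}{n_h}$$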
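As a minimal sketch of the within-stratum calculation, the following Python function estimates the mean of one *stratum* and the sampling variance of that estimate, using the sample variance (divisor $n_h - 1$) and the finite population correction $1 - f_h$. The function name, the catch values and the stratum size of 18 units are hypothetical and serve only as an illustration.

```python
from statistics import mean, variance


def stratum_mean_estimate(sample, N_h):
    """Return (estimated mean, estimated sampling variance) for one stratum.

    sample : observed values y_hi from a simple random sample of the stratum
    N_h    : number of sampling units in the stratum
    """
    n_h = len(sample)
    y_bar_h = mean(sample)            # sample mean of the stratum
    s2_h = variance(sample)           # sample variance, divisor n_h - 1
    f_h = n_h / N_h                   # sampling fraction
    v_mean = (1 - f_h) * s2_h / n_h   # estimated sampling variance of the mean
    return y_bar_h, v_mean


# Hypothetical catches (kg per haul) from 4 of the 18 units of one stratum.
catches = [12.0, 15.0, 9.0, 14.0]
y_bar_h, v_mean = stratum_mean_estimate(catches, N_h=18)
```

With these illustrative values the sample mean is 12.5 and the sample variance is 7, so the estimated sampling variance of the mean is $(1 - 4/18) \times 7 / 4$.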

*Estimator of the total value*

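Let $\hat{Y}_h$ be an estimator of the total value $Y_h$ of *stratum h*:

$$\hat{Y}_h = N_h \bar{y}_h$$

where $\bar{y}_h$ is the sample mean of *stratum h*.

$\hat{Y}_h$ is an unbiased estimator of $Y_h$, with

$$E = E[\hat{Y}_h] = Y_h$$

and

$$V = V[\hat{Y}_h] = N_h^2 (1 - f_h)\frac{S_h^2}{n_h}$$

where $f_h$ is the sampling fraction of *stratum h*.

The square root of the sampling variance, $\sqrt{V}$, is the error of the estimator.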

An unbiased estimator of the
sampling variance $V$ is obtained by replacing the population variance
by the sample variance in the corresponding expression:

$$\hat{V}[\hat{Y}_h] = N_h^2 (1 - f_h)\frac{s_h^2}{n_h}$$

where $s_h^2$ is an estimate of $S_h^2$, given by the sample variance.
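The estimator of a *stratum* total can be sketched in the same way: the sample mean is raised by the stratum size $N_h$, and the estimated sampling variance is multiplied by $N_h^2$. The function name and the data below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def stratum_total_estimate(sample, N_h):
    """Return (estimated total, estimated sampling variance, error) for one stratum."""
    n_h = len(sample)
    f_h = n_h / N_h                                         # sampling fraction
    total_hat = N_h * mean(sample)                          # N_h times the sample mean
    v_total = N_h ** 2 * (1 - f_h) * variance(sample) / n_h
    return total_hat, v_total, sqrt(v_total)


# Hypothetical catches (kg per haul) from 4 of the 18 units of one stratum.
catches = [12.0, 15.0, 9.0, 14.0]
total_hat, v_total, error = stratum_total_estimate(catches, N_h=18)
```

For these illustrative values the estimated total is $18 \times 12.5 = 225$ and the error of the estimator is 21.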

*Estimator of the mean value*

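An unbiased estimator of the
population mean, for all *strata*, is given by the stratified sample mean,

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{y}_h$$

The sampling distribution of $\bar{y}_{st}$ is approximately
normal, $N(E, V)$, where $E$ is the expected value and $V$ is the
sampling variance of the estimator, given by:

$$E = E[\bar{y}_{st}] = \bar{Y}$$

and

$$V = V[\bar{y}_{st}] = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2 (1 - f_h)\frac{S_h^2}{n_h}$$

In these expressions, $f_h$
is the sampling fraction defined by $f_h = n_h / N_h$.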

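An unbiased estimator of the sampling variance of the estimator of the stratified mean value can be obtained by replacing $S_h^2$ by $s_h^2$ in the corresponding expression:

$$\hat{V}[\bar{y}_{st}] = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2 (1 - f_h)\frac{s_h^2}{n_h}$$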
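A short Python sketch of the stratified mean and its estimated sampling variance follows. It weights each stratum sample mean by $N_h / N$ and each within-stratum variance term by $(N_h / N)^2$. The function name, the two strata and their values are hypothetical.

```python
from statistics import mean, variance


def stratified_mean_estimate(samples, sizes):
    """Return (stratified sample mean, estimated sampling variance).

    samples : list of samples, one list of observed values per stratum
    sizes   : list of stratum sizes N_h, in the same order
    """
    N = sum(sizes)
    y_st = 0.0
    v_st = 0.0
    for sample, N_h in zip(samples, sizes):
        n_h = len(sample)
        f_h = n_h / N_h                       # sampling fraction of the stratum
        y_st += N_h * mean(sample) / N        # weighted sample mean
        v_st += (N_h / N) ** 2 * (1 - f_h) * variance(sample) / n_h
    return y_st, v_st


# Two hypothetical strata with 18 and 12 sampling units.
samples = [[12.0, 15.0, 9.0, 14.0], [30.0, 34.0, 32.0]]
sizes = [18, 12]
y_st, v_st = stratified_mean_estimate(samples, sizes)
```

With these values the stratified mean is $(18 \times 12.5 + 12 \times 32) / 30 = 20.3$ and the estimated sampling variance is 0.65.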

*Estimator of the total value*

As sample selections in different *strata* have
been made independently, an estimator of the total value of the population is:

$$\hat{Y} = N \bar{y}_{st}$$

where $\bar{y}_{st}$ is the stratified
sample mean, given by

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{y}_h$$

The estimator $\hat{Y}$ has an approximately
normal distribution, $N(E, V)$, where $E$ and
$V$ are the expected value and the sampling variance, respectively, of the
estimator, and are given by

$$E = E[\hat{Y}] = Y$$

and

$$V = V[\hat{Y}] = \sum_{h=1}^{k} N_h^2 (1 - f_h)\frac{S_h^2}{n_h}$$

As for the mean value, an unbiased estimator of the sampling variance of the estimator of the total value can be obtained by replacing the population variance $S_h^2$ by the corresponding sample variance $s_h^2$ in its expression:

$$\hat{V}[\hat{Y}] = \sum_{h=1}^{k} N_h^2 (1 - f_h)\frac{s_h^2}{n_h}$$
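The population total can be estimated by summing the estimated stratum totals $N_h \bar{y}_h$, which equals $N \bar{y}_{st}$, and the variance terms add up because the *strata* are sampled independently. The sketch below assumes simple random sampling within each *stratum*; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def stratified_total_estimate(samples, sizes):
    """Return (estimated population total, estimated sampling variance)."""
    total_hat = 0.0
    v_total = 0.0
    for sample, N_h in zip(samples, sizes):
        n_h = len(sample)
        f_h = n_h / N_h                          # sampling fraction of the stratum
        total_hat += N_h * mean(sample)          # estimated stratum total
        v_total += N_h ** 2 * (1 - f_h) * variance(sample) / n_h
    return total_hat, v_total


# Two hypothetical strata with 18 and 12 sampling units.
samples = [[12.0, 15.0, 9.0, 14.0], [30.0, 34.0, 32.0]]
sizes = [18, 12]
total_hat, v_total = stratified_total_estimate(samples, sizes)
error = sqrt(v_total)   # error of the estimator of the total
```

For these illustrative values the estimated total is $225 + 384 = 609$ and the estimated sampling variance is $441 + 144 = 585$.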

In stratified sampling, the size
of the sample from each *stratum* is chosen by the sampler. In other words, given a total sample size $n = n_1 + n_2 + \ldots + n_h + \ldots + n_k$, a choice can be made on how to allocate the sample among
the *strata*.

Let *n* be the total size of the sample to be taken.

If the *strata* sizes are
different, proportional allocation can be used to maintain a constant sampling
fraction throughout the population. The total sample size, $n$, should be
allocated to the *strata* proportionally to their sizes:

$$\frac{n_h}{N_h} = \frac{n}{N}$$

or

$$n_h = n\,\frac{N_h}{N}$$
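Proportional allocation, in which each *stratum* receives the share $N_h / N$ of the total sample size $n$, can be sketched as follows. The function name and the stratum sizes are hypothetical, and the results are left unrounded; in practice they would be rounded to whole sampling units.

```python
def proportional_allocation(n, sizes):
    """Allocate a total sample size n to strata in proportion to their sizes N_h."""
    N = sum(sizes)
    return [n * N_h / N for N_h in sizes]


# Hypothetical population of 1000 units in three strata, total sample of 30.
sizes = [500, 300, 200]
n_prop = proportional_allocation(30, sizes)
```

Here the sampling fraction is 30/1000 in every *stratum*, which gives sample sizes of 15, 9 and 6.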

Optimum
allocation takes into consideration both the sizes of the *strata* and the
variability inside the *strata*. In order to obtain the minimum sampling
variance the total sample size should be allocated to the *strata*
proportionally to their sizes and also to the standard deviation of their
values, i.e. to the square root of the variances.

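$$n_h = \text{constant} \times N_h S_h$$

Given that $n = \sum_{h=1}^{k} n_h$, in this case

$$n = \text{constant} \times \sum_{h=1}^{k} N_h S_h$$

so that

$$n_h = n\,\frac{N_h S_h}{\sum_{h=1}^{k} N_h S_h}$$

where $n$ is the total sample
size, $n_h$ is the sample size in *stratum h*, $N_h$ is the size of *stratum h* and $S_h$ is the standard deviation of *stratum h*.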
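Optimum allocation for a fixed total sample size, with $n_h$ proportional to the product of stratum size and stratum standard deviation, can be sketched as below. The function name and the numbers are hypothetical; in a real survey the standard deviations would have to come from earlier surveys or a pilot survey.

```python
def optimum_allocation(n, sizes, sds):
    """Allocate n to strata in proportion to N_h * S_h.

    sizes : stratum sizes N_h
    sds   : stratum standard deviations S_h (known or previously estimated)
    """
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total_weight = sum(weights)
    return [n * w / total_weight for w in weights]


# Hypothetical strata: the smallest stratum is the most variable.
sizes = [500, 300, 200]
sds = [2.0, 5.0, 10.0]
n_opt = optimum_allocation(45, sizes, sds)
```

The products $N_h S_h$ are 1000, 1500 and 2000, so a total sample of 45 is split into 10, 15 and 20: the smallest but most variable *stratum* receives the largest sample.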

In some sampling situations, the
cost of sampling in terms of time or money is composed of a fixed part and of a
variable part depending on the *stratum*.

The sampling cost function is thus of the form:

$$C = c_0 + \sum_{h=1}^{k} c_h n_h$$

where $C$ is the total cost of
the sampling, $c_0$ is an overhead cost and $c_h$
is the cost per sampling unit in *stratum h*.

The optimum allocation in this situation assigns to each *stratum* a sample size
proportional to its size and to its standard deviation, and inversely proportional to
the square root of the cost per sampling unit in that *stratum*. This gives the following sample
size for *stratum h*:

$$n_h = n\,\frac{N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} N_h S_h / \sqrt{c_h}}$$

Very often, it is the total cost of the sampling, rather than
the total sample size, that is fixed. This is usually the case with research
vessel surveys, in which the number of days is fixed beforehand. In this case,
the optimum allocation of sample size among *strata* is

$$n_h = (C - c_0)\,\frac{N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{k} N_h S_h \sqrt{c_h}}$$
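When the total cost is fixed, the allocation can be sketched in Python under the assumption of a linear cost function, $C = c_0 + \sum c_h n_h$, with $n_h$ proportional to $N_h S_h / \sqrt{c_h}$ and scaled so that the variable cost equals $C - c_0$. The function name, costs and other values are hypothetical.

```python
from math import sqrt


def cost_optimum_allocation(C, c0, sizes, sds, costs):
    """Allocate sample sizes for a fixed total cost C with overhead c0.

    sizes : stratum sizes N_h
    sds   : stratum standard deviations S_h
    costs : cost per sampling unit c_h in each stratum
    """
    numerators = [N_h * S_h / sqrt(c_h)
                  for N_h, S_h, c_h in zip(sizes, sds, costs)]
    denominator = sum(N_h * S_h * sqrt(c_h)
                      for N_h, S_h, c_h in zip(sizes, sds, costs))
    return [(C - c0) * a / denominator for a in numerators]


# Hypothetical survey: total budget 900, overhead 100, unit costs 1, 4 and 4.
sizes = [500, 300, 200]
sds = [2.0, 5.0, 10.0]
costs = [1.0, 4.0, 4.0]
n_cost = cost_optimum_allocation(900.0, 100.0, sizes, sds, costs)
spent = sum(c_h * n_h for c_h, n_h in zip(costs, n_cost))
```

With these values the sample sizes are 100, 75 and 100, and the variable cost $1 \times 100 + 4 \times 75 + 4 \times 100 = 800$ uses exactly the budget left after the overhead.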

To obtain the full benefits of the stratification technique,
the relative sizes of *strata* must be known.

Each *stratum* should
be internally homogeneous. If information about heterogeneity is not available
then consider all *strata* equally variable. A short stratified pilot
survey can sometimes provide useful information about internal dispersion
within *strata*.

A small sample can be taken from a *stratum* if
the variability among its units is small.

Compared with simple random sampling, stratification almost always results in a smaller sampling variance of the mean or total value estimators when:

- The *strata* are heterogeneous among themselves
- The variance within each *stratum* is small.

A larger sample from a *stratum* should be taken if:

- The *stratum* is larger
- The *stratum* is more heterogeneous
- The cost of sampling the *stratum* is low.