When the population is heterogeneous, dividing it into sub-populations, called strata, can increase the precision of the estimates. The strata should not overlap, and each stratum should be sampled following some design. All strata must be sampled. The strata are sampled separately, and the estimates from each stratum are combined into one estimate for the whole population.
The theory of stratified sampling deals with the properties of the sampling distribution of the estimators and with different types of allocation of the sample sizes to obtain the maximum precision.
The principle of stratification is the partition of the population in such a way that the elements within a stratum are as similar as possible and the means of the strata are as different as possible.
The design is called stratified random sampling if simple random sampling is applied to each stratum.
In stratified sampling the population of N elements is divided into k strata of sizes N1, N2, …, Nh, …, Nk elements, where
$$N = \sum_{h=1}^{k} N_h$$
Every element in the population belongs to at least one stratum, and no element of the population belongs to more than one stratum. Figure 4.1 shows a stratified sampling scheme for a shrimp fishing ground.
FIGURE 4.1
A stratified sampling scheme for a shrimp fishing ground

The population was divided into 19 strata. As an illustration, stratum 17 shows the 18 trawling unit areas into which the stratum was divided. A similar subdivision was used for each of the other strata.
Let $N_h$ represent the size of stratum h and $Y_{hi}$ the value of the characteristic Y in the ith element of stratum h. The total value of the characteristic Y in stratum h is:
$$Y_h = \sum_{i=1}^{N_h} Y_{hi}$$
and the mean value is:
$$\bar{Y}_h = \frac{Y_h}{N_h}$$
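The modified population variance of stratum h is:
$$S_h^2 = \frac{SS_h}{N_h - 1} = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} (Y_{hi} - \bar{Y}_h)^2$$
Note that the sum of squares of residuals, $SS_h$, is divided by $(N_h - 1)$ to obtain $S_h^2$, and not by $N_h$, which would give $\sigma_h^2$. The standard deviation is the square root of the variance, $S_h = \sqrt{S_h^2}$.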
The total value of the characteristic Y in the population is the sum of the total values of all strata:
$$Y = \sum_{h=1}^{k} Y_h$$
and the mean value is a weighted average of the means of all strata,
$$\bar{Y} = \frac{Y}{N} = \sum_{h=1}^{k} W_h \bar{Y}_h$$
where N is the size of the population with k strata, $N = N_1 + N_2 + \dots + N_h + \dots + N_k$, and $W_h = N_h / N$ is the size of stratum h relative to the total population size, used as the weighting factor.
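The population quantities defined above can be checked numerically. The following is a minimal sketch in Python; the three strata and their values are hypothetical and serve only to show that the weighted average of the stratum means equals the overall population mean.

```python
from statistics import mean, variance

# Hypothetical population values Y_hi for k = 3 strata (illustrative only).
strata = {
    "A": [2.0, 4.0, 6.0],
    "B": [10.0, 12.0, 14.0, 16.0],
    "C": [30.0, 34.0, 38.0],
}

N = sum(len(values) for values in strata.values())              # population size N
totals = {h: sum(values) for h, values in strata.items()}       # stratum totals Y_h
means = {h: mean(values) for h, values in strata.items()}       # stratum means
weights = {h: len(values) / N for h, values in strata.items()}  # W_h = N_h / N

# statistics.variance divides by (N_h - 1), which matches the modified variance S_h^2.
modified_variances = {h: variance(values) for h, values in strata.items()}

Y = sum(totals.values())                             # population total
Y_bar = sum(weights[h] * means[h] for h in strata)   # weighted average of stratum means

print(Y, Y_bar, Y / N)
```

The last two printed values agree up to floating-point rounding, as expected from the definition of the weights.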
In stratified sampling, a sample is selected from each stratum by simple random sampling. Independent selections are used in each stratum.
Consider a sample of size $n_h$ selected from stratum h by simple random sampling without replacement. The value of characteristic Y in the ith element of the sample from the stratum is denoted by $y_{hi}$. Then
$$y_h = \sum_{i=1}^{n_h} y_{hi}$$
is the sample total value and
$$\bar{y}_h = \frac{y_h}{n_h}$$
is the sample mean value of characteristic Y in the stratum.
The sample variance of characteristic Y in stratum h is:
$$s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h} (y_{hi} - \bar{y}_h)^2$$
The sample standard deviation, $s_h$, is the square root of the variance, $s_h = \sqrt{s_h^2}$, and the coefficient of variation will be $cv_h = s_h / \bar{y}_h$.
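As an illustration of the sample statistics within one stratum, here is a short Python sketch. The catch values are invented for the example, and the variable names are not part of the original text.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical catches (kg per haul) from a sample of n_h = 4 hauls in one stratum.
sample_values = [12.0, 15.0, 9.0, 12.0]

n_h = len(sample_values)
sample_total = sum(sample_values)       # sample total y_h
sample_mean = mean(sample_values)       # sample mean
sample_var = variance(sample_values)    # sample variance s_h^2, divisor (n_h - 1)
sample_sd = sqrt(sample_var)            # sample standard deviation s_h
sample_cv = sample_sd / sample_mean     # coefficient of variation

print(sample_total, sample_mean, sample_var, sample_sd, sample_cv)
```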
Given independent simple random samples from each stratum, each of size $n_h$, the total sample size is
$$n = \sum_{h=1}^{k} n_h$$
Under these conditions, the total value of characteristic Y in the whole sample is the sum of the sample total values in each stratum,
$$y = \sum_{h=1}^{k} y_h$$
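The stratified sample mean, $\bar{y}_{st}$, is given by the weighted average of the sample means of the characteristic of interest from each stratum,
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h = \frac{1}{N}\sum_{h=1}^{k} N_h \bar{y}_h$$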
and the stratified sample variance is simply the sum of the variances within each stratum. This is achieved because there is no sampling of strata (all are observed) and sampling is carried out independently within each of them,

The stratified sample standard deviation, $s_{st}$, is the square root of the variance, $s_{st} = \sqrt{s_{st}^2}$, and the coefficient of variation will be $cv_{st} = s_{st} / \bar{y}_{st}$.
Estimator of the mean value
Within each stratum, simple random sampling is used. So, the sampling distribution of the estimators of the population parameters of each stratum is that given for simple random sampling.
An unbiased estimator, $\hat{\bar{Y}}_h$, of the mean of characteristic Y in stratum h, $\bar{Y}_h$, is the sample mean $\bar{y}_h$.
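The sampling distribution of $\hat{\bar{Y}}_h$ is approximately normal, $\hat{\bar{Y}}_h \sim N(E, V)$, where E is the expected value and V is the sampling variance of the estimator in stratum h:
$$E = E[\hat{\bar{Y}}_h] = \bar{Y}_h$$
and
$$V = V[\hat{\bar{Y}}_h] = (1 - f_h)\frac{S_h^2}{n_h}$$
where $f_h$ is the sampling fraction defined by $f_h = n_h / N_h$, $n_h$ is the sample size in stratum h and $N_h$ is the size of stratum h.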
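An unbiased estimator of $S_h^2$ is the sample variance $s_h^2$:
$$\hat{S}_h^2 = s_h^2$$
An estimator of the sampling variance of the estimator can thus be obtained by replacing $S_h^2$ by $s_h^2$ in the corresponding expression:
$$\hat{V}[\hat{\bar{Y}}_h] = (1 - f_h)\frac{s_h^2}{n_h}$$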
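The estimator of a stratum mean and its estimated sampling variance, (1 - f_h) s_h^2 / n_h under simple random sampling without replacement, could be computed as in the sketch below. The function name `stratum_mean_estimate`, the sample values and the stratum size of 18 units are assumptions made for illustration.

```python
from statistics import mean, variance

def stratum_mean_estimate(sample, stratum_size):
    """Return (estimated stratum mean, estimated sampling variance of that estimate)."""
    n_h = len(sample)
    f_h = n_h / stratum_size           # sampling fraction f_h = n_h / N_h
    s2_h = variance(sample)            # sample variance, divisor (n_h - 1)
    return mean(sample), (1 - f_h) * s2_h / n_h

# Hypothetical sample of 4 units from a stratum of N_h = 18 units.
mean_hat, var_hat = stratum_mean_estimate([12.0, 15.0, 9.0, 12.0], stratum_size=18)
print(mean_hat, var_hat)
```

The factor (1 - f_h) shrinks the variance as the sample covers more of the stratum; it would reach zero if every unit were sampled.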
Estimator of the total value
Let Ŷh be an estimator of the total value Yh of stratum h, given by:
$$\hat{Y}_h = N_h \bar{y}_h$$
where $\bar{y}_h$ is the sample mean of stratum h.
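$\hat{Y}_h$ is an unbiased estimator of $Y_h$ with an approximately normal sampling distribution, $\hat{Y}_h \sim N[E, V]$, where:
$$E = E[\hat{Y}_h] = Y_h$$
and
$$V = V[\hat{Y}_h] = N_h^2 (1 - f_h)\frac{S_h^2}{n_h}$$
where $f_h$ is the sampling fraction of stratum h, $f_h = n_h / N_h$.
The square root of the sampling variance, $\sqrt{V}$, is the standard error of the estimator.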
An unbiased estimator of the sampling variance V is obtained by replacing the population variance by the sample variance in the corresponding expression:
$$\hat{V}[\hat{Y}_h] = N_h^2 (1 - f_h)\frac{s_h^2}{n_h}$$
where $s_h^2$ is an estimate of $S_h^2$, given by the sample variance.
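A stratum total can be estimated in the same way by scaling the sample mean by the stratum size. The sketch below is illustrative only: the function name `stratum_total_estimate` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def stratum_total_estimate(sample, stratum_size):
    """Return (estimated stratum total, estimated sampling variance, standard error)."""
    n_h = len(sample)
    f_h = n_h / stratum_size                           # sampling fraction
    total_hat = stratum_size * mean(sample)            # N_h times the sample mean
    var_hat = stratum_size**2 * (1 - f_h) * variance(sample) / n_h
    return total_hat, var_hat, sqrt(var_hat)

# Hypothetical sample of 4 units from a stratum of N_h = 18 units.
total_hat, var_hat, std_error = stratum_total_estimate([12.0, 15.0, 9.0, 12.0], 18)
print(total_hat, var_hat, std_error)
```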
Estimator of the mean value
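An unbiased estimator of the population mean, over all strata, is given by the stratified sample mean,
$$\hat{\bar{Y}}_{st} = \bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h$$
The sampling distribution of $\hat{\bar{Y}}_{st}$ is approximately normal, $\hat{\bar{Y}}_{st} \sim N(E, V)$, where E is the expected value and V is the sampling variance of the estimator, given by:
$$E = E[\hat{\bar{Y}}_{st}] = \bar{Y}$$
and
$$V = V[\hat{\bar{Y}}_{st}] = \sum_{h=1}^{k} W_h^2 (1 - f_h)\frac{S_h^2}{n_h}$$
In these expressions, $W_h = N_h / N$ is the stratum weight, $f_h$ is the sampling fraction defined by $f_h = n_h / N_h$, $n_h$ is the sample size in stratum h and $N_h$ is the size of stratum h.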
An unbiased estimator of the sampling variance of the estimator of the stratified mean value can be obtained by replacing $S_h^2$ by $s_h^2$ in the corresponding expression:
$$\hat{V}[\hat{\bar{Y}}_{st}] = \sum_{h=1}^{k} W_h^2 (1 - f_h)\frac{s_h^2}{n_h}$$
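Combining the strata, the stratified mean estimate and its estimated sampling variance could be computed as sketched below. The stratum sizes and samples are hypothetical, and the function name `stratified_mean_estimate` is introduced only for this example.

```python
from statistics import mean, variance

def stratified_mean_estimate(strata):
    """strata: list of (stratum_size, sample) pairs.

    Returns the stratified sample mean and its estimated sampling variance,
    the sum over strata of W_h^2 (1 - f_h) s_h^2 / n_h.
    """
    N = sum(stratum_size for stratum_size, _ in strata)
    mean_st = 0.0
    var_st = 0.0
    for stratum_size, sample in strata:
        n_h = len(sample)
        W_h = stratum_size / N          # stratum weight
        f_h = n_h / stratum_size        # sampling fraction
        mean_st += W_h * mean(sample)
        var_st += W_h**2 * (1 - f_h) * variance(sample) / n_h
    return mean_st, var_st

# Hypothetical survey: (N_h, sampled values) for three strata.
survey = [
    (20, [4.0, 6.0, 5.0, 5.0]),
    (50, [10.0, 14.0, 12.0, 12.0, 12.0]),
    (30, [20.0, 26.0, 23.0]),
]
mean_st, var_st = stratified_mean_estimate(survey)
print(mean_st, var_st)
```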
Estimator of the total value
As sample selections in different strata have been made independently, an estimator of the total value of the population is:
$$\hat{Y} = N \bar{y}_{st}$$
where $\bar{y}_{st}$ is the stratified sample mean, given by
$$\bar{y}_{st} = \sum_{h=1}^{k} W_h \bar{y}_h$$
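The estimator $\hat{Y}$ has an approximately normal distribution, $\hat{Y} \sim N[E, V]$, where E and V are the expected value and the sampling variance, respectively, of the estimator, and are given by
$$E = E[\hat{Y}] = Y$$
and
$$V = V[\hat{Y}] = N^2 \sum_{h=1}^{k} W_h^2 (1 - f_h)\frac{S_h^2}{n_h} = \sum_{h=1}^{k} N_h^2 (1 - f_h)\frac{S_h^2}{n_h}$$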
As for the mean value, an unbiased estimator of the sampling variance of the estimator of the total value can be obtained by replacing the population variance $S_h^2$ by the corresponding sample variance $s_h^2$ in its expression:
$$\hat{V}[\hat{Y}] = \sum_{h=1}^{k} N_h^2 (1 - f_h)\frac{s_h^2}{n_h}$$
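The population total estimate is the sum of the stratum total estimates, which is the same as N times the stratified mean. A minimal sketch follows, with hypothetical data and a hypothetical function name `stratified_total_estimate`.

```python
from statistics import mean, variance

def stratified_total_estimate(strata):
    """strata: list of (stratum_size, sample) pairs.

    Returns the estimated population total and its estimated sampling variance,
    the sum over strata of N_h^2 (1 - f_h) s_h^2 / n_h.
    """
    total_hat = 0.0
    var_hat = 0.0
    for stratum_size, sample in strata:
        n_h = len(sample)
        f_h = n_h / stratum_size                      # sampling fraction
        total_hat += stratum_size * mean(sample)      # stratum total estimate
        var_hat += stratum_size**2 * (1 - f_h) * variance(sample) / n_h
    return total_hat, var_hat

# Hypothetical survey: (N_h, sampled values) for three strata.
survey = [
    (20, [4.0, 6.0, 5.0, 5.0]),
    (50, [10.0, 14.0, 12.0, 12.0, 12.0]),
    (30, [20.0, 26.0, 23.0]),
]
total_hat, var_hat = stratified_total_estimate(survey)
print(total_hat, var_hat)
```

Because the strata are sampled independently, the variances of the stratum totals simply add.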
In stratified sampling, the size of the sample from each stratum is chosen by the sampler, or to put it another way, given a total sample size n = n1 + n2 + … + nh + … + nk, a choice can be made on how to allocate the sample among the k strata. There are rules governing how a sample from a given stratum should be taken. Sample size should be larger in strata that are larger, with greater variability and where sampling has lower cost. If the strata are of the same size and there is no information about the variability of the population, a reasonable choice would be to assign equal sample sizes to all strata.
Let n be the total size of the sample to be taken.
If the strata sizes are different, proportional allocation could be used to maintain a constant sampling fraction throughout the population. The total sample size, n, should be allocated to the strata proportionally to their sizes:
$$\frac{n_h}{n} = \frac{N_h}{N}$$
or
$$n_h = n\frac{N_h}{N}$$
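Proportional allocation is a one-line calculation. The sketch below uses hypothetical stratum sizes; in practice the resulting values usually need rounding to whole sampling units, which this example does not handle.

```python
def proportional_allocation(total_sample_size, stratum_sizes):
    """Allocate n to the strata in proportion to their sizes: n_h = n * N_h / N."""
    N = sum(stratum_sizes)
    return [total_sample_size * N_h / N for N_h in stratum_sizes]

# Hypothetical strata of 20, 50 and 30 units, with n = 20 units to allocate.
allocation = proportional_allocation(20, [20, 50, 30])
print(allocation)
```

Every stratum ends up with the same sampling fraction n / N, here 0.2.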
Optimum allocation takes into consideration both the sizes of the strata and the variability inside the strata. In order to obtain the minimum sampling variance the total sample size should be allocated to the strata proportionally to their sizes and also to the standard deviation of their values, i.e. to the square root of the variances.
nh = constant × Nh sh
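Given that $\sum_{h=1}^{k} n_h = n$, in this case
$$\text{constant} = \frac{n}{\sum_{h=1}^{k} N_h s_h}$$
so that
$$n_h = n\frac{N_h s_h}{\sum_{h=1}^{k} N_h s_h}$$
where n is the total sample size, $n_h$ is the sample size in stratum h, $N_h$ is the size of stratum h and $s_h$ is the square root of the variance in stratum h.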
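The optimum allocation described above could be sketched as follows. The stratum sizes and standard deviations are hypothetical; in a real survey the standard deviations would come from earlier data or a pilot survey.

```python
def optimum_allocation(total_sample_size, stratum_sizes, stratum_sds):
    """Allocate n in proportion to N_h * s_h (stratum size times standard deviation)."""
    products = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    denominator = sum(products)
    return [total_sample_size * p / denominator for p in products]

# Hypothetical strata: sizes 20, 50, 30 and standard deviations 1, 2, 3.
allocation = optimum_allocation(21, [20, 50, 30], [1.0, 2.0, 3.0])
print(allocation)
```

Compared with proportional allocation, the most variable stratum receives a larger share of the sample than its size alone would give it.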
In some sampling situations, the cost of sampling in terms of time or money is composed of a fixed part and of a variable part depending on the stratum.
The sampling cost function is thus of the form:
$$C = c_0 + \sum_{h=1}^{k} c_h n_h$$
where C is the total cost of the sampling, c0 is an overhead cost and ch is the cost per sampling unit in stratum h, which may vary from stratum to stratum.
The optimum allocation of the sample to the strata in this situation is to allocate sample size proportionally to the stratum size and to the standard deviation, and inversely proportionally to the square root of the cost per sampling unit in each stratum. This gives the following sample size for stratum h:
$$n_h = n\frac{N_h s_h / \sqrt{c_h}}{\sum_{h=1}^{k} N_h s_h / \sqrt{c_h}}$$
Very often, it is the total cost of the sampling, rather than the total sample size, that is fixed. This is usually the case with research vessel surveys, in which the number of days is fixed beforehand. In this case, the optimum allocation of sample size among strata is
$$n_h = (C - c_0)\frac{N_h s_h / \sqrt{c_h}}{\sum_{h=1}^{k} N_h s_h \sqrt{c_h}}$$
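Both cost-based allocations can be sketched in a few lines of Python. This example assumes the standard result that the optimum sample size in a stratum is proportional to N_h s_h divided by the square root of c_h. All stratum sizes, standard deviations and costs are hypothetical, as are the function names.

```python
from math import sqrt

def allocation_fixed_sample(total_sample_size, sizes, sds, unit_costs):
    """Fixed total sample size n: n_h proportional to N_h * s_h / sqrt(c_h)."""
    terms = [N_h * s_h / sqrt(c_h) for N_h, s_h, c_h in zip(sizes, sds, unit_costs)]
    denominator = sum(terms)
    return [total_sample_size * t / denominator for t in terms]

def allocation_fixed_cost(total_cost, overhead_cost, sizes, sds, unit_costs):
    """Fixed total cost C: spend the variable budget (C - c0) optimally."""
    terms = [N_h * s_h / sqrt(c_h) for N_h, s_h, c_h in zip(sizes, sds, unit_costs)]
    denominator = sum(N_h * s_h * sqrt(c_h)
                      for N_h, s_h, c_h in zip(sizes, sds, unit_costs))
    return [(total_cost - overhead_cost) * t / denominator for t in terms]

# Hypothetical strata: sizes, standard deviations and cost per sampling unit.
sizes, sds, unit_costs = [20, 50, 30], [1.0, 2.0, 3.0], [1.0, 4.0, 9.0]

by_sample = allocation_fixed_sample(20, sizes, sds, unit_costs)
by_cost = allocation_fixed_cost(500.0, 80.0, sizes, sds, unit_costs)
variable_cost = sum(c_h * n_h for c_h, n_h in zip(unit_costs, by_cost))
print(by_sample, by_cost, variable_cost)
```

The variable cost of the second allocation equals C - c0, so the whole budget is used. The resulting sample sizes are generally not integers and would need rounding in practice.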
To obtain the full benefits of the stratification technique, the relative sizes of strata must be known.
Each stratum should be internally homogeneous. If information about heterogeneity is not available then consider all strata equally variable. A short stratified pilot survey can sometimes provide useful information about internal dispersion within strata.
A small sample can be taken from a stratum if the variability among its units is small.
Compared with the simple random sample, stratification results almost always in a smaller sampling variance of the mean or total value estimators, when:
A larger sample should be taken from a stratum if the stratum is larger, if it is more variable internally, or if sampling in it is cheaper.