Simple random sampling is the simplest way to sample a population. Its simplicity arises from the way the sample is selected: all possible samples have the same probability of being chosen. Other sampling methods include this method in their procedures for selecting parts of a total sample. For this reason, when describing sampling methods it is convenient to start with simple random sampling.
In simple random sampling, the population to be sampled is considered as a simple collection of elements, where no subgroups are considered.
Let Y denote a characteristic of a population. Then Yi will be the value of the characteristic of the ith element. The population parameters most relevant to fisheries research are discussed in Chapter 2. A short summary is presented in Table 3.1.
Table 3.1
Main population parameters of interest to fisheries
research
| N | Population size |
| Yi | Value of the characteristic of the ith element (i=1, 2, ..., N) |
| $Y=\sum_{i=1}^{N}Y_i$ | Total value |
| $\bar{Y}=Y/N$ | Mean value, also denoted µ (the relation $Y=N\bar{Y}$ holds) |
| $\sigma^2=\frac{1}{N}\sum_{i=1}^{N}(Y_i-\bar{Y})^2$ and $S^2=\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2$ | Variance, σ2, and modified variance, S2 |
| $\sigma=\sqrt{\sigma^2}$ and $S=\sqrt{S^2}$ | Standard deviation, σ, and modified standard deviation, S |
| $CV=\sigma/\bar{Y}$ or $CV=S/\bar{Y}$ | Coefficient of variation, CV |
The sample statistics most relevant to fisheries research are discussed in Chapter 2. A short summary of the more common ones is presented in Table 3.2.
Table 3.2
Some sample statistics more frequently used in fisheries
research
| n | Sample size |
| yi | Value of the characteristic of the ith element (i=1, 2, ..., n) |
| $\sum_{i=1}^{n}y_i$ | Total value of the characteristic in the sample |
| $\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$ | Sample mean |
| $\sum_{i=1}^{n}(y_i-\bar{y})^2$ | Sum of squares of the deviations from the sample mean |
| $s^2=\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$ | Sample variance |
| $s=\sqrt{s^2}$ | Sample standard deviation |
| $cv=s/\bar{y}$ | Sample coefficient of variation |
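As an illustration, the sample statistics in Table 3.2 can be computed directly with Python's standard library; the length measurements below are hypothetical values invented for illustration:

```python
import statistics

# Hypothetical sample of fish lengths (cm); illustrative values only.
y = [23.0, 25.5, 21.0, 27.5, 24.0, 26.0, 22.5, 25.0]

n = len(y)                                 # sample size
total = sum(y)                             # total value in the sample
y_bar = statistics.mean(y)                 # sample mean
ss = sum((yi - y_bar) ** 2 for yi in y)    # sum of squared deviations
s2 = statistics.variance(y)                # sample variance (divisor n - 1)
s = statistics.stdev(y)                    # sample standard deviation
cv = s / y_bar                             # sample coefficient of variation

print(n, total, round(y_bar, 2), round(s2, 3), round(cv, 3))
```

Note that `statistics.variance` already uses the n − 1 divisor, matching the definition of s2 in Table 3.2.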
Simple random sampling is defined as any sampling system that ensures that all possible samples of a given size have the same probability of being selected. Equivalently, every element of the population has the same probability of being selected for inclusion in the sample in a single extraction.
In this sampling method, the probability, Pi, of selecting an element, i, from a finite population of size N, is:

$$P_i=\frac{1}{N}$$
There are two ways to take a simple random sample: either the elements are selected with replacement of the element into the population after each extraction, or without replacement.
When using sampling without replacement from a finite population, it is usual to define the sampling fraction, f, as $f=n/N$. In that case, $(1-f)$ is called the finite population correction factor. When the number of elements of the population is infinite or sampling is with replacement, the sampling fraction is zero, the correction factor is equal to 1 and, therefore, it is not considered.
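The two selection schemes and the sampling fraction can be sketched with Python's standard library; the population here is hypothetical, just labelled elements:

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical finite population of N = 500 labelled elements.
population = list(range(500))
N = len(population)
n = 50

# Without replacement: every subset of size n is equally likely.
sample_wor = random.sample(population, n)

# With replacement: each draw is an independent selection from all N elements.
sample_wr = random.choices(population, k=n)

f = n / N      # sampling fraction
fpc = 1 - f    # finite population correction factor

print(f, fpc)  # 0.1 0.9
```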
Several statistics (e.g. the median) can be used to estimate the mean of the population. However, the most frequently used estimator of the population mean is the sample mean:

$$\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$$
In simple random sampling, the sampling distribution of the sample mean has some important properties:
1. The expected value of the sampling distribution is equal to the population mean:

$$E[\bar{y}]=\bar{Y}=\mu$$
Note: the sample mean is an unbiased estimator of the population mean.
2. When the sample size n is large (for example, n > 100), the sampling distribution of the sample mean tends to a normal distribution. Using the notation presented in Chapter 2, we may write:

$$\bar{y}\sim N(E,V)$$

where

$$E=E[\bar{y}]=\bar{Y}=\mu$$

$$V=V[\bar{y}]=(1-f)\frac{S^2}{n}$$
When n is very small compared to N, the correction factor, (1-f), can be ignored and, in this case:

$$V[\bar{y}]=\frac{S^2}{n}$$
An estimate of the sampling variance of the estimator can be calculated by replacing the population parameter S, in the variance expressions, with the sample statistic s, that is:

$$v[\bar{y}]=(1-f)\frac{s^2}{n}$$

or $v[\bar{y}]=s^2/n$ when N is large or sampling is with replacement.
The error is given by:

$$\sigma_{\bar{y}}=\sqrt{(1-f)\frac{S^2}{n}}$$

and the estimate of the error is:

$$s_{\bar{y}}=\sqrt{(1-f)\frac{s^2}{n}}$$
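Putting these estimators together, a short sketch of the estimated sampling variance and error of the mean under sampling without replacement; the survey figures are invented for illustration:

```python
import statistics

# Hypothetical survey: N = 1200 vessels, simple random sample of n = 12 catches (t).
N = 1200
y = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 5.3, 3.6, 4.8, 4.0, 4.5]
n = len(y)

y_bar = statistics.mean(y)     # estimator of the population mean
s2 = statistics.variance(y)    # sample variance s^2 (divisor n - 1)
f = n / N                      # sampling fraction

v_mean = (1 - f) * s2 / n      # estimated sampling variance of the mean
error = v_mean ** 0.5          # estimated sampling error (standard error)

print(round(y_bar, 3), round(error, 4))
```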
The C confidence interval (l1, l2) for the estimator of the population mean can be calculated as:

$$l_{1,2}=\bar{y}\mp z\,\sigma_{\bar{y}}$$

If σ2 is known (or n is large), z can be calculated from the probability relation:

$$\text{Prob}\{Z \text{ outside the interval } (-z, z)\}=1-C$$
When σ2 is unknown but n is large, the standard error of the mean, $\sigma_{\bar{y}}$, can be replaced with the estimated standard error of the mean, $s_{\bar{y}}$.
If σ2 is unknown and n is small, the standard error of the mean, $\sigma_{\bar{y}}$, can still be replaced with the estimated standard error of the mean, $s_{\bar{y}}$. In these situations, however, the Z-distribution should be replaced with the Student's t-distribution. The confidence interval will then be:

$$l_{1,2}=\bar{y}\mp t_{n-1}\,s_{\bar{y}}$$

where $t_{n-1}$ follows the t-distribution with n - 1 degrees of freedom.
The value of tn-1 is given by the probability relation:
Prob {t outside the interval (-tn-1, tn-1)} = (1- C), where C is the desired confidence level.
This value tn-1 can be obtained from Student's t tables.
Note that the limits (l1, l2) of the confidence interval are derived from the following relation:

$$\frac{\bar{y}-\mu}{\sigma_{\bar{y}}}=Z \quad\text{with}\quad Z\sim N(0,1)$$

or, when the population variance is unknown and the sample size is small, are derived using the Student's t-distribution as follows:

$$\frac{\bar{y}-\mu}{s_{\bar{y}}}=t \quad\text{with}\quad t\sim t(n-1)$$
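For the case where the normal distribution applies (σ2 known or n large), the confidence limits can be computed with the standard library alone; `y_bar` and `se` below are hypothetical survey values used only for illustration:

```python
import statistics

# z-based confidence interval for the population mean (sigma^2 known or n large).
y_bar = 24.3   # sample mean (hypothetical)
se = 0.21      # estimated standard error of the mean, s_ybar (hypothetical)
C = 0.95       # desired confidence level

# Prob{Z outside (-z, z)} = 1 - C, so z is the (1 + C)/2 quantile of N(0, 1).
z = statistics.NormalDist().inv_cdf((1 + C) / 2)

l1, l2 = y_bar - z * se, y_bar + z * se
print(round(l1, 3), round(l2, 3))
```

For the small-sample case, z would be replaced by the tabulated value of $t_{n-1}$; the t quantile is not available in the standard library.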
An unbiased estimator of the total value of the population is:

$$\hat{Y}=N\bar{y}$$
The sampling distribution of this estimator is approximately normal:

$$\hat{Y}\sim N(E,V)$$

where

$$E=E[\hat{Y}]=Y \quad\text{and}\quad V=V[\hat{Y}]=N^2(1-f)\frac{S^2}{n}$$
The error of the estimator is the square root of the sampling variance:

$$\sigma_{\hat{Y}}=\sqrt{N^2(1-f)\frac{S^2}{n}}$$
Approximations to the sampling variance and to the error of the estimator can be obtained by replacing the population variance, S2, with the sample variance, s2, in the respective formulas, that is:

$$v[\hat{Y}]=N^2(1-f)\frac{s^2}{n} \quad\text{and}\quad s_{\hat{Y}}=\sqrt{N^2(1-f)\frac{s^2}{n}}$$
It should be noted that the estimator $\hat{Y}=N\bar{y}$ can be written as:

$$\hat{Y}=\frac{N}{n}\sum_{i=1}^{n}y_i$$
This last expression shows that the estimator of the total value of the population is obtained by raising (extrapolating, or amplifying) the total value of the sample by the raising factor N/n. This is the most common way of obtaining estimators of population totals in fisheries research. It is also common in fisheries research to use, instead of the quotient between the population size, N, and the sample size, n, the quotient of the corresponding total weights, W and w (this assumes that the mean weight is the same in the sample and in the population).
In these circumstances, the expression given above for the estimated variance can be written in terms of the quotient N/n or W/w, e.g.:

$$v[\hat{Y}]=\left(\frac{N}{n}\right)^2 (1-f)\,n\,s^2$$
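The raising-factor calculation can be sketched as follows; N and the sampled box weights are hypothetical values for illustration:

```python
# Estimating the population total by raising the sample total.
N = 800                                # population size (boxes landed), hypothetical
y = [12.5, 11.0, 13.2, 12.8, 11.9]     # sampled box weights (kg), n = 5, hypothetical

n = len(y)
sample_total = sum(y)
raising_factor = N / n                 # the raising factor N/n
Y_hat = raising_factor * sample_total  # estimator of the population total

print(Y_hat)  # 9824.0
```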
In the case of proportions, the quantities to be estimated are the proportion, P, of elements belonging to a category (the population mean) and, in the case of finite populations, the total number, NP, of elements belonging to the category (the population total of the variable Yi).
Estimator of the population proportion
In the general situation of simple random sampling, the sample mean, which in this case is the proportion, p, of sample elements belonging to the category of interest, is an estimator of the population mean, P.
The sampling distribution of this estimator has the following properties:
Expected value: $E[p]=P$

Sampling variance:

$$V[p]=\frac{N-n}{N-1}\cdot\frac{PQ}{n}$$
As previously mentioned, the sampling variance of a mean is $(1-f)\frac{S^2}{n}$, where S2 is the population variance. In this case S2 is equal to $\frac{N}{N-1}PQ$ and the sampling variance of p will be:

$$V[p]=(1-f)\cdot\frac{N}{N-1}\cdot\frac{PQ}{n}=\frac{N-n}{N-1}\cdot\frac{PQ}{n}$$
An unbiased estimate of V[p] can be obtained by replacing the population variance S2 with the sample variance s2 in the general expression of the sampling variance of a mean:

$$v[p]=(1-f)\frac{pq}{n-1}$$
The error, $S_p$, is the square root of the sampling variance of p:

$$S_p=\sqrt{\frac{N-n}{N-1}\cdot\frac{PQ}{n}}$$

An estimate, $s_p$, of the sampling error is given by:

$$s_p=\sqrt{(1-f)\frac{pq}{n-1}}$$
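A minimal sketch of the proportion estimator and its estimated error; the counts are invented for illustration:

```python
# Estimating a population proportion from a simple random sample.
N = 10_000   # population size (hypothetical)
n = 200      # sample size (hypothetical)
x = 68       # number of sampled elements in the category of interest (hypothetical)

p = x / n                        # estimator of the population proportion P
q = 1 - p
f = n / N                        # sampling fraction

v_p = (1 - f) * p * q / (n - 1)  # estimated sampling variance of p
s_p = v_p ** 0.5                 # estimated sampling error

print(p, round(s_p, 4))
```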
Estimator Ŷ of the total number of elements in the population, NP
In the case of finite populations we are often interested in estimating the total number of individuals belonging to the category. An estimator of this total can be obtained from the estimator of the mean, p, and so:

$$\hat{Y}=Np$$
The expected value of Np is the total value of the population NP.
E[Np] = N P
The sampling variance of Np is:

$$V[Np]=N^2\,\frac{N-n}{N-1}\cdot\frac{PQ}{n}$$
An estimate of the sampling variance v[Np] is:

$$v[Np]=N^2(1-f)\frac{pq}{n-1}$$
The error of Np will be:

$$\sigma_{Np}=\sqrt{N^2\,\frac{N-n}{N-1}\cdot\frac{PQ}{n}}$$
and an estimate of the sampling error of Np is:

$$s_{Np}=\sqrt{N^2(1-f)\frac{pq}{n-1}}$$
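The corresponding calculation for the total number in the category can be sketched with the same kind of hypothetical figures:

```python
# Estimating the total number of elements in the category, N * P.
N = 10_000    # population size (hypothetical)
n = 200       # sample size (hypothetical)
p = 0.34      # sample proportion (hypothetical)

q = 1 - p
f = n / N

Np_hat = N * p                              # estimator of NP
v_Np = N ** 2 * (1 - f) * p * q / (n - 1)   # estimated sampling variance of Np
error_Np = v_Np ** 0.5                      # estimated sampling error of Np

print(Np_hat, round(error_Np, 1))
```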
Comments
The expected value, the variance and the error of the sampling distributions of p or Np have been presented, but sometimes it is helpful to know other aspects of the sampling distributions.
The sampling distributions of p and of Np are derived from the binomial distribution.
The variable Yi is a Bernoulli variable with probability P of being equal to 1 and probability Q = 1 - P of being equal to 0. The binomial variable is then the sum of n independent Bernoulli variables with a common, constant parameter P.
This distribution is usually denoted as $Y\sim b(n,P)$. The parameters are n, the sample size, and P, the proportion of elements belonging to the category.
When P is close to 0.5 and n is large, the binomial distribution approximates the normal distribution. Thus the mean will follow approximately the normal distribution:

$$p\sim N[E,V]$$

with E and V as previously indicated.
The total value Np is also approximately normally distributed:

$$Np\sim N[E,V]$$

with E and V as previously indicated.
Some characteristics can be classified into more than two categories. For instance, maturity can be classified into stage I, stage II, stage III, etc.
Consider a population divided into k categories or classes and let h designate one of these classes. To estimate the proportion of elements belonging to class h, the population can be thought of as divided into only two classes: class h and another class covering all the remaining categories. In this way the previous conclusions about populations divided into two classes can be applied, with $p_h=n_h/n$ as the estimator of the proportion of elements belonging to class h, where n is the size of a simple random sample and $n_h$ the number of sampled elements in class h. The expected value and the sampling variance can be derived for class h as mentioned for the binomial case.
As an example, the expected value and an estimate of the sampling variance of ph is:
$$E[p_h]=P_h \quad\text{and}\quad v[p_h]=(1-f)\frac{p_h q_h}{n-1}$$

where $q_h=1-p_h$.
The sampling distribution of the sample proportions, when the population is divided into k classes with different proportions of elements in each class, can be considered an extension of the binomial distribution: it is the joint distribution of the k class counts obtained from n independent trials, each trial falling in class h with probability Ph, h = 1, 2, ..., k. This probability distribution is called the multinomial distribution.
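For a characteristic with several classes, each class proportion is estimated exactly as in the two-class case. A sketch with hypothetical maturity-stage records (all values invented for illustration):

```python
from collections import Counter

# Hypothetical maturity stages recorded for a simple random sample of n fish.
stages = ["I", "II", "II", "III", "I", "II", "IV", "III", "II", "I",
          "II", "III", "I", "II", "II", "III", "IV", "II", "I", "II"]
n = len(stages)
N = 5000          # hypothetical population size
f = n / N         # sampling fraction

counts = Counter(stages)
for h, n_h in sorted(counts.items()):
    p_h = n_h / n                                  # estimator of P_h
    v_h = (1 - f) * p_h * (1 - p_h) / (n - 1)      # estimated sampling variance of p_h
    print(h, p_h, round(v_h, 5))
```

Each class h is treated against the pooled remainder, so the two-class formulas apply class by class; the estimated proportions sum to 1 across the k classes.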