

3. Simple random sampling

3.1 INTRODUCTION

Simple random sampling is the simplest way to sample a population. Its simplicity arises from the way the sample is selected: in this design, all possible samples have the same probability of being chosen.

Other sampling methods include this method in their procedures for selecting parts of a total sample. For this reason, when describing sampling methods it is convenient to start with simple random sampling.

3.2 THE POPULATION WORLD

In simple random sampling, the population to be sampled is considered as a simple collection of elements, where no subgroups are considered.

Let Y denote a characteristic of a population. Then Yi will be the value of the characteristic of the ith element. The population parameters most relevant to fisheries research are discussed in Chapter 2. A short summary is presented in Table 3.1.

Table 3.1
Main population parameters of interest to fisheries research

N: Population size
Yi: Value of the characteristic of the ith element (i = 1, 2, ..., N)
Y = ΣYi: Total value
Ȳ = Y/N (also denoted µ): Mean value; the relation Y = N·Ȳ holds
σ² = Σ(Yi − Ȳ)²/N and S² = Σ(Yi − Ȳ)²/(N − 1): Variance and modified variance
σ = √σ² and S = √S²: Standard deviation and modified standard deviation
CV = σ/Ȳ: Coefficient of variation

3.3 THE SAMPLE WORLD

The sample statistics most relevant to fisheries research are discussed in Chapter 2. A short summary of the more common ones is presented in Table 3.2.

Table 3.2
Some sample statistics most frequently used in fisheries research

n: Sample size
yi: Value of the characteristic of the ith element (i = 1, 2, ..., n)
Σyi: Total value of the characteristic in the sample
ȳ = Σyi/n: Sample mean
Σ(yi − ȳ)²: Sum of squares of the deviations from the sample mean
s² = Σ(yi − ȳ)²/(n − 1): Sample variance
s = √s²: Sample standard deviation
cv = s/ȳ: Sample coefficient of variation
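The statistics in Table 3.2 can be computed directly. The following is a minimal Python sketch, using a small hypothetical sample of fish lengths (the data values are illustrative only, not from the text):

```python
# Computing the Table 3.2 statistics for a hypothetical sample of
# fish lengths (cm); the data values are illustrative only.
import math

y = [23.1, 25.4, 22.8, 27.0, 24.5, 26.2]    # hypothetical sample

n = len(y)                                  # sample size
total = sum(y)                              # total value in the sample
y_bar = total / n                           # sample mean
ss = sum((yi - y_bar) ** 2 for yi in y)     # sum of squares of the deviations
s2 = ss / (n - 1)                           # sample variance
s = math.sqrt(s2)                           # sample standard deviation
cv = s / y_bar                              # sample coefficient of variation

print(n, round(y_bar, 2), round(s2, 3), round(cv, 3))
```

Note that the divisor in the sample variance is n − 1, matching the (n − 1) divisor of the modified variance S² in Table 3.1.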

3.4 THE SAMPLING WORLD

3.4.1 Selection of the sample

Simple random sampling is defined as any sampling scheme that ensures that all possible samples of a given size have the same probability of being selected. Equivalently, in a single extraction, every element of the population has the same probability of being selected for inclusion in the sample.

In this sampling method, the probability, Pi, of selecting an element, i, from a finite population of size N, is:

Pi = 1/N

There are two ways to take a simple random sample: either the elements are selected with replacement of the element into the population after each extraction, or without replacement.

When sampling without replacement from a finite population, it is usual to define the sampling fraction, f, as f = n/N. In that case, (1 − f) is called the finite correction factor. When the number of elements of the population is infinite or sampling is with replacement, the sampling fraction is zero, the correction factor is equal to 1 and, therefore, it is not considered.
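As a minimal sketch (standard library only, with a hypothetical population of N = 500 numbered elements), the two selection schemes and the sampling fraction can be illustrated as follows:

```python
# Drawing a simple random sample of n elements from a population of
# size N, with and without replacement; N and n are hypothetical.
import random

random.seed(1)                        # fixed seed so the draw is reproducible
N = 500
population = list(range(1, N + 1))    # element labels 1..N
n = 25

without = random.sample(population, n)                      # no repeats possible
with_repl = [random.choice(population) for _ in range(n)]   # repeats possible

f = n / N            # sampling fraction (without replacement)
fpc = 1 - f          # finite correction factor
print(f, fpc, len(set(without)) == n)
```

With replacement, the same element can enter the sample more than once, which is why the sampling fraction and the correction factor are not considered in that case.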

3.4.2 Estimator, ȳ, of the population mean

Several statistics (e.g. the median) can be used to estimate the mean of the population. However, the most frequently used estimator of the population mean is the sample mean:

ȳ = Σyi/n

In simple random sampling, the sampling distribution of the sample mean has some important properties:

1. The expected value of the sampling distribution is equal to the population mean:

E[ȳ] = μ

Note: the sample mean is an unbiased estimator of the population mean.

2. When the sample size n is large (for example, n > 100), the sampling distribution of the sample mean tends to a normal distribution. Using the notation presented in Chapter 2, we may write:

ȳ ~ N(E, V)

where

E = E[ȳ] = Ȳ = μ

V = V[ȳ] = (S²/n)·(1 − f)

When n is very small compared to N, the correction factor, (1 − f), can be ignored and, in this case:

V[ȳ] = S²/n

An estimate of the sampling variance of the estimator can be calculated by replacing the population parameter S, in the variance expressions, with the sample statistic s, that is:

v[ȳ] = (s²/n)·(1 − f)

or

v[ȳ] = s²/n

when N is large or sampling is with replacement.

The error is given by:

σȳ = √V[ȳ] = √[(S²/n)·(1 − f)]

and the estimate of the error is:

sȳ = √v[ȳ] = √[(s²/n)·(1 − f)]
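Continuing in Python, the estimate of the mean, its estimated sampling variance and the estimated error can be sketched as follows (the population size N and the sample values are hypothetical):

```python
# Estimating the population mean from a simple random sample drawn
# without replacement; N and the sample values are hypothetical.
import math

N = 1200
y = [14.2, 15.8, 13.9, 16.4, 15.1, 14.7, 15.5, 16.0]

n = len(y)
f = n / N                                           # sampling fraction
y_bar = sum(y) / n                                  # estimate of the population mean
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)   # sample variance
v_ybar = (s2 / n) * (1 - f)                         # estimated sampling variance of the mean
s_ybar = math.sqrt(v_ybar)                          # estimated error of the mean

print(round(y_bar, 2), round(s_ybar, 4))
```

Here f = 8/1200 is so small that dropping the correction factor (1 − f) would change the error only marginally.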

The confidence interval (l1, l2) for the estimator of the population mean, at confidence level C, can be calculated as:

l1 = ȳ − z·σȳ and l2 = ȳ + z·σȳ

If σ² is known (or n is large), z can be calculated from the probability relation:

Prob {Z outside the interval (−z, z)} = (1 − C)

When σ² is unknown but n is large, the standard error of the mean, σȳ, can be replaced with the estimated standard error of the mean, sȳ.

If σ² is unknown and n is small, the standard error of the mean, σȳ, can still be replaced with the estimated standard error of the mean, sȳ. In this situation, however, the Z-distribution should be replaced with the Student's t-distribution. The confidence interval will then be:

l1 = ȳ − tn-1·sȳ and l2 = ȳ + tn-1·sȳ

where tn-1 follows the t-distribution with n − 1 degrees of freedom.

The value of tn-1 is given by the probability relation:

Prob {t outside the interval (-tn-1, tn-1)} = (1- C), where C is the desired confidence level.

This value of tn-1 can be obtained from Student's t tables.

Note that the limits (l1, l2) of the confidence interval are derived from the following relation:

Prob {−z ≤ (ȳ − μ)/σȳ ≤ z} = C, with Z ~ N(0, 1)

or, when the population variance is unknown and the sample size is small, are derived using the Student's t-distribution as follows:

Prob {−tn-1 ≤ (ȳ − μ)/sȳ ≤ tn-1} = C, with t ~ t(n − 1)
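Both intervals can be sketched in Python, using the standard library's NormalDist for z and a tabulated Student's t value (the sample values are hypothetical, and the finite correction factor is ignored here):

```python
# 95% confidence intervals for the population mean: large-sample (z)
# and small-sample (Student's t); the data are hypothetical.
import math
from statistics import NormalDist

y = [14.2, 15.8, 13.9, 16.4, 15.1, 14.7, 15.5, 16.0]
n = len(y)
y_bar = sum(y) / n
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
s_ybar = math.sqrt(s2 / n)             # estimated standard error (f ignored)

C = 0.95
z = NormalDist().inv_cdf((1 + C) / 2)  # z with Prob{Z outside (-z, z)} = 1 - C
l1_z, l2_z = y_bar - z * s_ybar, y_bar + z * s_ybar

# For small n, z is replaced with the Student's t quantile; for
# n - 1 = 7 degrees of freedom and C = 0.95, tables give t = 2.365.
t7 = 2.365
l1_t, l2_t = y_bar - t7 * s_ybar, y_bar + t7 * s_ybar

print(round(l1_z, 2), round(l2_z, 2), round(l1_t, 2), round(l2_t, 2))
```

The t-interval is wider than the z-interval, reflecting the extra uncertainty introduced by estimating σ² from a small sample.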

3.4.3 Estimator of the total value of the population

An unbiased estimator of the total value of the population is:

Ŷ = N·ȳ

The sampling distribution of this estimator is approximately normal:

Ŷ ~ N(E, V)

where

E = E[Ŷ] = Y and V = V[Ŷ] = N²·(S²/n)·(1 − f)

The error of the estimator is the square root of the sampling variance:

σŶ = N·√[(S²/n)·(1 − f)]

Approximations to the sampling variance and to the error of the estimator can be obtained by replacing the population variance, S², with the sample variance, s², in the respective formulas, that is:

v[Ŷ] = N²·(s²/n)·(1 − f)

and

sŶ = N·√[(s²/n)·(1 − f)]

It should be noted that the estimator Ŷ = N·ȳ can be written as:

Ŷ = (N/n)·Σyi

This last expression shows that the estimator of the total value of the population is obtained by raising (extrapolating, or amplifying) the total value of the sample by the raising factor N/n. This is the most common way of obtaining estimates of population totals in fisheries research. It is also common in fisheries research to use, instead of the quotient of the population size, N, and the sample size, n, the quotient of the corresponding total weights, W and w (this assumes that the mean weight is the same in the sample and in the population).

In these circumstances, the expression given above for the estimated variance can be written in terms of the quotient N/n or W/w, e.g.:

v[Ŷ] = (N/n)²·n·s²·(1 − f) or v[Ŷ] = (W/w)²·n·s²·(1 − f)
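The raising-factor calculation can be sketched with hypothetical numbers (N boxes landed in total, of which n were sampled and their fish counted):

```python
# Estimating the population total with Y_hat = N * y_bar = (N/n) * sum(y);
# N and the sampled counts are hypothetical.
import math

N = 300                        # boxes landed (population size)
y = [52, 48, 55, 50, 47, 53]   # fish counted in 6 sampled boxes

n = len(y)
f = n / N
y_bar = sum(y) / n
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

Y_hat = (N / n) * sum(y)                 # raising factor N/n applied to the sample total
v_Yhat = N ** 2 * (s2 / n) * (1 - f)     # estimated sampling variance of the total
s_Yhat = math.sqrt(v_Yhat)               # estimated error of the total

print(round(Y_hat), round(s_Yhat, 1))
```

Here the raising factor is N/n = 50: every sampled box stands for 50 boxes in the landing.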

3.4.4 Estimators of proportions

In the case of proportions, the quantities to be estimated are the proportion, P, of the elements belonging to a category (which is the population mean of an indicator variable Yi equal to 1 for elements in the category and 0 otherwise) and, in the case of finite populations, the total number, NP, of elements belonging to the category (the population total of the variable Yi).

Estimator of the population proportion

In the general situation of simple random sampling, the sample mean, which in this case is the proportion, p, of the sample elements belonging to the category of interest, is an estimator of the population mean, P.

The sampling distribution of this estimator has the following properties:

Expected value: E[p] = P

Sampling variance: V[p] = (PQ/n)·(N − n)/(N − 1), where Q = 1 − P

As previously mentioned, the sampling variance of a mean is (S²/n)·(1 − f), where S² is the population variance. In this case S² is equal to N·P·Q/(N − 1), and the sampling variance of p will be:

V[p] = (PQ/n)·(N − n)/(N − 1)

An unbiased estimate of V[p] can be obtained by replacing the population variance S² with the sample variance s² in the general expression of the sampling variance of a mean:

v[p] = (1 − f)·pq/(n − 1), where q = 1 − p

The error Sp is the square root of the sampling variance of p:

Sp = √[(PQ/n)·(N − n)/(N − 1)]

An estimate, sp, of the sampling error is given by:

sp = √[(1 − f)·pq/(n − 1)]
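These formulas can be sketched as follows (N, n and the category count are hypothetical):

```python
# Estimating a population proportion from a simple random sample;
# N, n and the category count a are hypothetical.
import math

N = 5000       # population size
n = 200        # sample size
a = 58         # sampled elements in the category of interest

p = a / n                            # estimated proportion
q = 1 - p
f = n / N
v_p = (1 - f) * p * q / (n - 1)      # estimated sampling variance of p
s_p = math.sqrt(v_p)                 # estimated error

print(p, round(s_p, 4))
```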

Estimator Ŷ of the total number of elements in the population, NP

In the case of finite populations we are often interested in estimating the total number of individuals belonging to the category. An estimator of the total value can be obtained by multiplying the estimator of the mean by N, and so: Ŷ = Np

The expected value of Np is the total value of the population NP.

E[Np] = N P

The sampling variance of Np is:

V[Np] = N²·(PQ/n)·(N − n)/(N − 1)

An estimate of the sampling variance, v[Np], is:

v[Np] = N²·(1 − f)·pq/(n − 1)

The error of Np will be:

SNp = N·√[(PQ/n)·(N − n)/(N − 1)]

and an estimate of the sampling error of Np is:

sNp = N·√[(1 − f)·pq/(n − 1)]
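A short sketch with hypothetical values shows that the estimate of Np and its error are simply the proportion estimates raised by N:

```python
# Estimating the total number of elements in a category, Np, and its
# error; N, n and the category count are hypothetical.
import math

N = 5000
n = 200
a = 58                                  # sampled elements in the category

p = a / n
f = n / N
Np_hat = N * p                          # estimated number in the category
v_Np = N ** 2 * (1 - f) * p * (1 - p) / (n - 1)   # estimated sampling variance
s_Np = math.sqrt(v_Np)                  # estimated error

print(round(Np_hat), round(s_Np, 1))
```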

Comments

The expected value, the variance and the error of the sampling distributions of p or Np have been presented, but sometimes it is helpful to know other aspects of the sampling distributions.

The sampling distributions of p and of Np are derived from the binomial distribution.

The variable Yi is a Bernoulli variable, equal to 1 with probability P and equal to 0 with probability Q = 1 − P. The binomial variable is therefore the sum of n independent Bernoulli variables with a common, constant parameter P.

This distribution is usually denoted as Y ~ b(n, P). The parameters are n, the sample size, and P, the proportion of elements belonging to the category.

When P is close to 0.5 and n is large, the binomial distribution approximates the normal distribution. Thus the mean will follow approximately the normal distribution:

p ~ N[E, V], with E and V as previously indicated.

The total value Np is also approximately normally distributed:

Np ~ N[E, V], with E and V as previously indicated.

3.4.5 Estimator of several proportions of the population

Some characteristics can be classified into more than two categories. For instance, maturity can be classified into stage I, stage II, stage III, etc.

Consider a population divided into K categories or classes and let h designate one of these classes. To estimate the proportion of elements belonging to class h, the population can be thought of as divided into only two classes: class h and another class covering all the remaining categories. In this way the previous conclusions about populations divided into two classes can be applied, with ph = nh/n as the estimator of the proportion of elements belonging to class h, where n is the size of a simple random sample. The expected value and the sampling variance can be derived for class h as in the binomial case.

As an example, the expected value of ph and an estimate of its sampling variance are:

E[ph] = Ph and v[ph] = (1 − f)·ph·(1 − ph)/(n − 1)

The joint sampling distribution of the sample proportions, when the population is divided into k classes with different proportions of elements in each class, can be considered as an extension of the binomial distribution: it describes the numbers of sample elements, nh, falling into each class, with parameters n and Ph, h = 1, 2, …, k. This probability distribution is called the multinomial distribution.
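As a sketch, the per-class estimates ph and their estimated errors can be computed in one pass (the maturity-stage counts are hypothetical, and the population is assumed large enough for the correction factor to be ignored):

```python
# Estimating several proportions (maturity stages) from one simple
# random sample; the counts are hypothetical and (1 - f) is taken as 1.
import math

counts = {"I": 40, "II": 35, "III": 15, "IV": 10}   # nh per stage
n = sum(counts.values())                            # sample size

for h, n_h in counts.items():
    p_h = n_h / n                                   # estimated proportion of class h
    v_ph = p_h * (1 - p_h) / (n - 1)                # estimated sampling variance
    print(h, p_h, round(math.sqrt(v_ph), 4))
```

Each class is treated as "class h versus all the rest", exactly as described above, so the binomial-case formulas apply class by class.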

