|1.||Population, frame, sampling units, survey units|
|2.||Method of selection|
|2.1||Simple random sample (SRS)|
|3.||Estimation of population mean from a sample and precision of estimate|
|3.1||Estimation of population total and precision|
|4.||Estimation of proportions and their uses|
|5.1||Sample size in different strata|
|7.||Unequal probability sampling|
|7.1||Method of selection|
|7.2||Method of estimation|
|8.||Two stage sampling|
|8.1||Selection of first stage units at random|
|8.2||Selection of first-stage units with probability proportional to size (pps)|
1. Population, Sampling Frame, Sampling Units, Survey Units
Whenever a survey is contemplated, it is first necessary to specify the units to be included in the survey and their geographical context. All rigorous sampling demands a subdivision of the material to be sampled into units, termed “sampling units”, which form the basis of the actual sampling process. Clear and unambiguous definition demands the existence or construction of a list of the sampling units (the sampling frame). In the case of a Catch Assessment Survey (traditional and artisanal fisheries) the following hierarchy of sampling units can be introduced:
- Primary sampling units (PSUs): landing places
- Secondary sampling units (SSUs): fishing economic units
Items of information on the survey characteristics are collected from the above SSUs, which are also called “survey units”.
For data collection one of the following two survey methods can be used: (a) The census method. This implies complete enumeration of the survey population; in a census method information is obtained from all the survey units in the population, and (b) The sampling method, where information is obtained from a properly selected fraction of units of the survey population. In large-scale surveys, the sample selection is from the existing sampling frame.
2. Selection of Sample Units
If there are N sampling units in the population and we want to draw a simple random sample¹ of size n, we can work out all possible samples of size n and select one of them at random. The number of all possible distinct samples of size n which can be selected from a population of N units is given by:
$${}^{N}C_{n} = \frac{N!}{n!\,(N-n)!}$$
where ! stands for factorial, e.g., 3! = 1 × 2 × 3. For example, if N = 4 and n = 2, the number of distinct samples which can be selected is given by:
$${}^{4}C_{2} = \frac{4!}{2!\,2!} = 6$$
In practice, when N is large, it is not possible to enumerate all possible distinct samples and then select one of them. Normally, a simple random sample is drawn unit by unit. The units in the population are marked serially from 1 to N. We then refer to a table of random numbers (see Appendix Table 1) and draw from this table a series of n numbers lying between 1 and N, taking care to reject numbers above N and not allowing the same numbers to appear in the series more than once. The units in the population marked as per the number selected in the series constitute our sample of n selected units. It has been proved that this method produces simple random samples.
There are N=28 landing sites in a district. We want a simple random sample of n=5 landing sites.
Since N=28 is a two-digit number, we refer to any row of two-digit numbers in the Random Number Table. Referring to the first row of two-digit numbers, we find the consecutive numbers are: 23, 5, 14, 38, 97, 11, 43, 93, 49, 36, 7, etc.
Now select those that lie between 1 and 28, until we have selected a series of 5 numbers. The selected series is: 23, 5, 14, 11 and 7.
The landing sites marked with these numbers in the population constitute our sample.
¹ This means that every unit in the population has an equal and non-zero probability of being selected in the sample.
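The unit-by-unit procedure can be sketched in Python as follows. This is a minimal illustration, not part of the original method description: the function name `draw_srs` is hypothetical, and the list `row` simply replays the random-number row quoted in the example.

```python
def draw_srs(random_numbers, N, n):
    """Return the first n distinct numbers in 1..N from a stream of random numbers."""
    selected = []
    for r in random_numbers:
        # Reject numbers outside 1..N and numbers already drawn.
        if 1 <= r <= N and r not in selected:
            selected.append(r)
            if len(selected) == n:
                return selected
    raise ValueError("random number stream exhausted before n units were selected")


# First row of two-digit numbers quoted in the example (N = 28, n = 5).
row = [23, 5, 14, 38, 97, 11, 43, 93, 49, 36, 7]
sample = draw_srs(row, N=28, n=5)
print(sample)  # [23, 5, 14, 11, 7]
```

When a computer is available, `random.sample(range(1, N + 1), n)` from the Python standard library draws a sample without replacement directly, without a printed table.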
3. Estimation of Population Mean from a Sample and Precision of Estimate
If there are N units in the population and we measure a desired characteristic (y) of all units in the population, then we have:
Population total: $Y = \sum_{i=1}^{N} y_i$
Population mean: $\bar{Y} = Y/N$
The variability in the measured characteristic among the population units is given by $S_y^2$:
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i-\bar{Y})^2$$
Now, if we draw a sample of n units from the N units in the population, we can define the sample mean:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
and the variance per unit in the sample is given by:
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$$
If the same method of measurement of the desired characteristic is employed both for the population units and the sample units, the absolute value of the precision of the sample mean is given by:
$$|\bar{y}-\bar{Y}|$$
Generally, the population mean $\bar{Y}$ is not known and the main purpose of sampling is to get an estimate of $\bar{Y}$ from the sample and also to have a measure of precision of that estimate. Now we know that in SRS we can produce ${}^{N}C_{n}$ samples (of n units) from a population of N units, and so we can have a series of ${}^{N}C_{n}$ sample means $\bar{y}$. The expected value $E(\bar{y})$ is equal to $\bar{Y}$, and thus $\bar{y}$ is an unbiased estimate of $\bar{Y}$. It has also been proved that in the case of SRS selection, the variance of $\bar{y}$ is given by:
$$V(\bar{y}) = \frac{N-n}{N}\cdot\frac{S_y^2}{n}$$
The standard error of the sample mean is given by:
$$S_{\bar{y}} = \sqrt{\frac{N-n}{N}}\cdot\frac{S_y}{\sqrt{n}}$$
$S_{\bar{y}}$ measures the degree of scatter of possible sample means around $\bar{Y}$. The smaller it is, the smaller the probability of a large deviation of $\bar{y}$ from $\bar{Y}$. For n > 30, it has been shown that at the 95% probability level the population mean will lie in the interval
$$\bar{y} \pm 1.96\,S_{\bar{y}}$$
Thus we see that $S_{\bar{y}}$ provides a measure of precision of the sample estimate.
We generally do not know $S_y$ and so cannot calculate $S_{\bar{y}}$. In SRS, an unbiased estimate of $S_y^2$ is provided by $s_y^2$.
3.1 Estimation of Population Total and Precision
In a landing site, 30 boats land their catch on a particular day, and the catches (yi) of 10 boats selected at random are examined. Estimate the total catch of the day and its standard error and coefficient of variation: N = 30; n = 10.
|Sample boat||Catch (kg)|
The various estimates are:
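The calculation for this kind of example can be sketched as below, assuming the usual SRS formulas Ŷ = N·ȳ and v(Ŷ) = N(N − n)·s²/n. The ten catch values are hypothetical placeholders chosen for round results, not the figures of the example, and `estimate_total` is an illustrative name.

```python
import math
import statistics


def estimate_total(sample, N):
    """Estimate a population total from a simple random sample.

    Returns (total, standard error, coefficient of variation).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)      # sample variance, divisor n - 1
    total = N * mean
    var_total = N * (N - n) * s2 / n      # equals N^2 * (1 - n/N) * s^2 / n
    se = math.sqrt(var_total)
    return total, se, se / total


# Hypothetical catches (kg) of 10 boats sampled out of N = 30.
catches = [50, 60, 40, 70, 55, 45, 65, 35, 75, 55]
total, se, cv = estimate_total(catches, N=30)
print(f"total = {total:.0f} kg, se = {se:.0f} kg, cv = {100 * cv:.1f}%")
```

With these placeholder values the sample mean is 55 kg, so the estimated total is 1 650 kg with a standard error of about 100 kg (cv about 6%).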
3.2 Sample Size
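In Section 3 we have seen:
$$v(\bar{y}) = \frac{N-n}{N}\cdot\frac{s_y^2}{n}$$
When N is large,
$$v(\bar{y}) \approx \frac{s_y^2}{n}, \qquad s_{\bar{y}} = \frac{s_y}{\sqrt{n}}$$
Now, for large N, at the 95% probability level, the population mean will lie within the interval $\bar{y} \pm 1.96\,s_{\bar{y}}$, or roughly within $\bar{y} \pm 2\,s_{\bar{y}}$. Therefore, $\frac{2\,s_{\bar{y}}}{\bar{y}} \times 100$ represents the percentage accuracy of the mean at the 5% significance level.
Thus, the sample size n required for an a% accuracy of the mean at the 5% significance level is given by:
$$n = \left(\frac{200\,s_y}{a\,\bar{y}}\right)^2$$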
In a survey, a sample of n = 18 gave a mean of $\bar{y}$ = 589.44 kg and $s_y$ = 531.79 kg. How many units would be needed to estimate the mean, at the 5% significance level, (a) within 10%, (b) within 5%, and (c) within 1% of the population mean?
(a) Number of units required for getting $\bar{y}$ with an accuracy of 10% is,
(b) For an accuracy of 5%,
(c) For an accuracy of 1%,
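The three cases can be checked with a short sketch. It assumes the large-N approximation in which twice the standard error of the mean, 2·s/√n, must not exceed a% of the mean, so that n = (200·s / (a·ȳ))²; the function name `sample_size` and the rounding up to the next whole unit are choices made for this illustration.

```python
import math


def sample_size(mean, sd, accuracy_pct, t=2.0):
    """Units needed so that t * sd / sqrt(n) is at most accuracy_pct % of the mean.

    Assumes N is large enough for the finite population correction to be ignored.
    """
    return math.ceil((t * 100.0 * sd / (accuracy_pct * mean)) ** 2)


for a in (10, 5, 1):
    print(a, sample_size(589.44, 531.79, a))
```

Halving the tolerated error roughly quadruples the required sample size, which is why the 1% target is so much more demanding than the 10% target.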
In Example 3.1a, if we had wished to derive an estimate of the total catch with a cv of 5%, what size of sample would be needed?
4. Estimation of Proportions and Their Uses
Let there be N units in the population, of which $N_i$ belong to class i, so that the proportion belonging to class i is $P_i = N_i/N$. We want to estimate $N_i$ and $P_i$ from a simple random sample of n units, of which $n_i$ are in class i, so that $p_i = n_i/n$.
It has been shown that an unbiased estimate $\hat{P}_i$ of $P_i$ is given by $p_i$, so that $\hat{P}_i = p_i = n_i/n$, and an unbiased estimate of $N_i$ (the number in class i in the population) is given by $\hat{N}_i = N\,p_i$.
An unbiased estimate of the variance of $p_i$ is given by:
$$v(p_i) = \frac{N-n}{N}\cdot\frac{p_i(1-p_i)}{n-1}$$
When n/N is small, i.e., n is small compared to N, or N is very large,
$$v(p_i) \approx \frac{p_i(1-p_i)}{n-1}$$
An unbiased estimate of the variance of $\hat{N}_i$ is given by:
$$v(\hat{N}_i) = N^2\,v(p_i) = N(N-n)\,\frac{p_i(1-p_i)}{n-1}$$
If the magnitude of N is itself an estimate, the estimated variance of Ni is given by:
A random sample of 82 boats was taken out of 820 boats. It was found that 32 were using lines. Estimate the proportion and number of boats using lines.
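A sketch of this calculation follows, assuming the standard SRS variance estimate v(p) = ((N − n)/N)·p(1 − p)/(n − 1) and v(N̂) = N²·v(p). The function name `estimate_proportion` is illustrative only.

```python
import math


def estimate_proportion(n_i, n, N):
    """Estimate a class proportion and class total from a simple random sample.

    Returns (p, N_hat, var_p, var_N_hat).
    """
    p = n_i / n
    var_p = (N - n) / N * p * (1 - p) / (n - 1)
    N_hat = N * p
    var_N_hat = N ** 2 * var_p
    return p, N_hat, var_p, var_N_hat


# 32 line-fishing boats in a sample of 82 boats drawn from 820.
p, N_hat, var_p, var_N_hat = estimate_proportion(32, 82, 820)
print(f"p = {p:.3f}, N_hat = {N_hat:.0f}, se(N_hat) = {math.sqrt(var_N_hat):.1f}")
```

Under these assumptions about 39% of the boats, i.e., roughly 320 boats, are estimated to use lines, with a standard error of about 42 boats.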
The number of cod landed was 2 000. A sample of 100 cod was taken and their ages determined; the age distribution was as follows:
Estimate the number of cod in each age group in the total landings and the variance of these estimates.
Here we have: N = 2 000; n = n1 + n2 + n3 + n4 + n5 = 100
5. Stratified Sampling
It has been seen that in simple random sampling the variance of the mean, $v(\bar{y})$, depends, apart from the sample size n, on the variability of the characteristic in the population, i.e., on $S_y^2$. If the population is heterogeneous, i.e., measurements vary considerably from one unit to another, then by using auxiliary information it may be possible to divide it into sub-populations (or strata), each of which is internally homogeneous.
Let us suppose that there are N units in the population and these are stratified into k strata with Ni units in the ith stratum. Let a sample of n units be drawn, of which ni are from the ith stratum. Let yij be the measurement of the jth unit in the ith stratum.
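Then we have the following for the ith stratum:
$$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \qquad s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2, \qquad \hat{Y}_i = N_i\,\bar{y}_i$$
We also have, for the whole population,
$$\hat{Y} = \sum_{i=1}^{k} N_i\,\bar{y}_i, \qquad \bar{y}_{st} = \frac{\hat{Y}}{N}$$
The unbiased estimates of variances are:
$$v(\hat{Y}) = \sum_{i=1}^{k} N_i^2\,\frac{N_i-n_i}{N_i}\cdot\frac{s_i^2}{n_i}, \qquad v(\bar{y}_{st}) = \frac{v(\hat{Y})}{N^2}$$
If the sampling fraction $n_i/N_i$ is negligible for all strata, then we have:
$$v(\hat{Y}) = \sum_{i=1}^{k} \frac{N_i^2\,s_i^2}{n_i}, \qquad v(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} \frac{N_i^2\,s_i^2}{n_i}$$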
Out of 200 boats in a district, 70 were engaged in line fishing, 120 in gillnet fishing, and 10 in beach-seine fishing. For the purpose of estimating catch, 5 line fishing boats, 7 gillnet boats, and 3 beach-seine boats were selected, and their catches in tons for the month of January were noted as follows:
What was the estimated total catch in the district in January and the variance of the estimates? What is the mean catch per boat and its variance?
Note: If there was no stratification, and we had chosen a simple random selection of 15 units, and their catches were as in Example 5, we would have:
Ŷ = 10.06 × 200 = 2 012 t
Clearly, by stratification, we have obtained an estimate with lower cv(Ŷ) than in the case of a simple random selection.
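The stratified calculation can be sketched as follows, assuming the standard estimators Ŷ = Σ Nᵢ·ȳᵢ and v(Ŷ) = Σ Nᵢ(Nᵢ − nᵢ)·sᵢ²/nᵢ. The stratum sizes (70, 120, 10) and sample sizes (5, 7, 3) follow the example; the catch values themselves are hypothetical placeholders, and `stratified_total` is an illustrative name.

```python
import statistics


def stratified_total(strata):
    """strata: list of (N_i, sample catches). Returns (total, variance of total)."""
    total = 0.0
    var_total = 0.0
    for N_i, sample in strata:
        n_i = len(sample)
        total += N_i * statistics.mean(sample)
        # Finite population correction included: N_i^2 * (1 - n_i/N_i) * s_i^2 / n_i
        var_total += N_i * (N_i - n_i) * statistics.variance(sample) / n_i
    return total, var_total


# Hypothetical January catches (t) per sampled boat.
strata = [
    (70, [4, 6, 5, 7, 3]),              # line fishing
    (120, [10, 12, 8, 11, 9, 13, 7]),   # gillnet
    (10, [28, 30, 32]),                 # beach seine
]
total, var_total = stratified_total(strata)
N = sum(N_i for N_i, _ in strata)
print(total, var_total, total / N, var_total / N ** 2)
```

The last two printed values are the mean catch per boat and its variance, obtained by dividing the total by N and its variance by N².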
5.1 Sample Size in Different Strata
In Example 5, we selected a sample of 15 units, and the allocation of number of units in the different strata was done arbitrarily.
Now, when the sampling fraction is negligible, we know from equation (5.5) that the variance of the estimated population total is given by:
$$v(\hat{Y}) = \sum_{i=1}^{k} \frac{N_i^2\,s_i^2}{n_i}$$
This equation suggests two methods of allocation of n among the different strata:
(a) Proportional allocation:
In this method, ni is proportional to Ni. If the within-stratum variances are equal, the method gives the smallest sampling variance, i.e., the most efficient estimates. Generally, proportional allocation is used when information on strata variances is not available.
(b) Optimum allocation:
When the within-strata variances differ greatly from stratum to stratum, proportional allocation no longer provides the best estimates. In such cases, it is better to take the sampling fraction in each stratum proportional to the stratum standard deviation.
For further details, the reader is referred to books on sampling design (e.g., Yates, Bazigos, 1974).
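The two allocation rules can be compared with a small sketch. It assumes proportional allocation nᵢ = n·Nᵢ/N and optimum allocation nᵢ = n·NᵢSᵢ/Σ NⱼSⱼ (sampling fraction proportional to the stratum standard deviation). The standard deviations used below are hypothetical, and both function names are illustrative.

```python
def proportional_allocation(N_strata, n):
    """Allocate n in proportion to the stratum sizes N_i."""
    N = sum(N_strata)
    return [n * N_i / N for N_i in N_strata]


def optimum_allocation(N_strata, S_strata, n):
    """Allocate n in proportion to N_i * S_i (stratum size times standard deviation)."""
    weights = [N_i * S_i for N_i, S_i in zip(N_strata, S_strata)]
    total = sum(weights)
    return [n * w / total for w in weights]


N_strata = [70, 120, 10]        # stratum sizes from Example 5
S_strata = [2.0, 2.0, 8.0]      # hypothetical stratum standard deviations
print(proportional_allocation(N_strata, 15))
print(optimum_allocation(N_strata, S_strata, 15))
```

The results are fractional and have to be rounded in practice, keeping at least two units in every stratum so that a within-stratum variance can still be estimated.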
The following catches (kg) were obtained in 18 hauls of a trawl survey:
Hauls 1–10: 200, 440, 600, 640, 700, 800, 900, 1 020, 1 600, 1 920
Hauls 11–15: 20, 10, 340, 400, 720
Hauls 16–18: 40, 100, 160
(a) If the trawl net covered 40 ha per haul and if 50% of all fish in its path was caught and the total survey area was 6 × 10⁶ ha, estimate the total abundance of fish.
(b) If the first 10 hauls were taken in depths 0–20 m, the next 5 in depths 20–40 m, and the last three in depths over 40 m and the areas of the depth zones are 1 × 10⁶, estimate of abundance?
(c) Find the variances of the above two estimates.
(a) Unstratified Sample
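Let $\bar{y}$ be the mean catch per haul. If a is the area swept by each haul, the catch per hectare is $\bar{y}/a$. Since the net catches only 50% of the fish in its path, i.e., the catchability coefficient q is 1/2, the density of stock per hectare is $\bar{y}/(aq)$.
Therefore, the estimated abundance for the survey area A and its variance are:
$$\hat{Y} = \frac{A\,\bar{y}}{a\,q}, \qquad v(\hat{Y}) = \left(\frac{A}{a\,q}\right)^2\frac{s_y^2}{n}$$
where n is the number of sample hauls.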
Now we have,
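The unstratified calculation can be sketched with the 18 haul catches listed in the example. The sketch assumes abundance = A·ȳ/(a·q) and ignores the finite population correction, since the survey area contains far more potential hauls than were made; variable names are illustrative.

```python
import math
import statistics

# Catches (kg) in the 18 hauls of the trawl survey.
catches_kg = [200, 440, 600, 640, 700, 800, 900, 1020, 1600, 1920,
              20, 10, 340, 400, 720,
              40, 100, 160]

A = 6e6     # total survey area (ha)
a = 40.0    # area swept per haul (ha)
q = 0.5     # catchability coefficient (50% of fish in the path are caught)

n = len(catches_kg)
mean = statistics.mean(catches_kg)     # mean catch per haul (kg)
s = statistics.stdev(catches_kg)       # standard deviation, divisor n - 1

raising = A / (a * q)                  # converts mean catch per haul to abundance
abundance = raising * mean             # kg
se = raising * s / math.sqrt(n)        # standard error of the abundance (kg)

print(f"mean = {mean:.2f} kg, s = {s:.2f} kg")
print(f"abundance = {abundance / 1e6:.1f} thousand t, se = {se / 1e6:.1f} thousand t")
```

The mean and standard deviation computed here (about 589.44 kg and 531.79 kg) are the same figures used in the sample size example of Section 3.2.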
(b) Stratified Sample
In this case,
The numerical calculations may be done conveniently in a tabular fashion:
6. Ratio Estimation
This is another method in which use is made of auxiliary information to increase the precision. Let us suppose we have selected at random n units out of N units in the population and for each of these selected units we have measured (x, y), where y is the survey variate and x is another, correlated variate. The population total of the x-variate is known to be:
$$X = \sum_{i=1}^{N} x_i$$
but y may not be known for the units of the population other than those in the sample. In this case, an estimate of the population total Y of the survey variate is given by $\hat{Y}_{rat} = \hat{R}\,X$, where the estimate $\hat{R}$ is obtained from the sample as:
$$\hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}$$
The variance of the ratio estimate $\hat{Y}_{rat}$ is given by:
$$v(\hat{Y}_{rat}) = \frac{N(N-n)}{n}\left(s_y^2 + \hat{R}^2 s_x^2 - 2\,\hat{R}\,r\,s_x s_y\right)$$
where r is the estimated coefficient of correlation between x and y.
There are 50 landing centres in a country where shrimp trawlers land. The shrimp trawlers are registered and their total number is known from the Registration Record to be 280. Now, 5 landing centres are selected at random and the catch (y) and the number of trawlers (x) at each of the 5 landing centres are obtained. Make a ratio estimate $\hat{Y}_{rat}$ of the total landings by the shrimp trawlers in the country.
Landing centres: total N = 50; sample n = 5
Trawlers: total X = 280
|Landing centre||Trawlers (x)||Catch (y, t)||x²||y²||xy|
|Total:||30||295||226||21 331||2 191|
Here $\hat{R}$ = 295/30 = 9.83. Therefore, Ŷrat = R̂ X = 9.83 × 280 = 2 752.40 t
and from equation (6.1),
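The ratio estimate and its variance can be computed from the column totals of the example alone (Σx = 30, Σy = 295, Σx² = 226, Σy² = 21 331, Σxy = 2 191). The sketch assumes the usual variance estimate v(Ŷrat) = N(N − n)/n · (s_y² + R̂²·s_x² − 2·R̂·s_xy), where the sample covariance s_xy equals r·s_x·s_y; the function name `ratio_estimate` is illustrative.

```python
import math


def ratio_estimate(n, N, X, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Ratio estimate of a population total from sample sums.

    Returns (R, total, variance of total).
    """
    R = sum_y / sum_x
    s2x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    s2y = (sum_y2 - sum_y ** 2 / n) / (n - 1)
    sxy = (sum_xy - sum_x * sum_y / n) / (n - 1)   # equals r * s_x * s_y
    total = R * X
    var_total = N * (N - n) / n * (s2y + R ** 2 * s2x - 2 * R * sxy)
    return R, total, var_total


R, total, var_total = ratio_estimate(
    n=5, N=50, X=280,
    sum_x=30, sum_y=295, sum_x2=226, sum_y2=21331, sum_xy=2191,
)
print(f"R = {R:.4f}, total = {total:.2f} t, se = {math.sqrt(var_total):.1f} t")
```

Because the code keeps R̂ unrounded (9.8333 rather than 9.83), the total comes out at about 2 753.3 t instead of the 2 752.40 t obtained with the rounded ratio.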
7. Unequal Probability Sampling
We have seen that by stratification and ratio estimation we can increase the precision of an estimate. Another technique used for this purpose is pps sampling, i.e., sampling in which the units are selected with probabilities proportional to their sizes. This is widely used where sampling of clusters is preferred to direct sampling of individual units, the reasons being that it is more economical to sample a fixed number of individual units when they are in clusters, and that a reliable frame of individual units is sometimes not available.
7.1 Method of Selection
Suppose there are 10 landing sites, with the number of boats at each landing site shown in Col. 2. We want to select 3 sites with pps.
|Landing site||No. of boats||Cumulative total||Allotted numbers||Selected random no. or fishing site|
Random no. 011: fishing site 01
Random no. 027: fishing site 03
Random no. 064: fishing site 05
Column 3 is the cumulative total. Now each landing site is given a number proportional to its size. Thus the landing site 1 gets 12 numbers, 001–012, allotted to it, the landing centre 5 gets 30 numbers from 040–069 allotted to it, and so on. Then we use the random number table and select 3 numbers between 1 and 120. These selected numbers are: 011, 027 and 064. The corresponding fishing sites selected are: 01, 03 and 05.
It may be noted that in this method of selection a unit with a larger size has a higher chance of selection than a unit of a smaller size.
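The cumulative-total method can be sketched as follows. Only three facts are taken from the example: site 1 has 12 boats, site 5 has 30 boats occupying the numbers 040–069, and the grand total is 120. The remaining boat counts are hypothetical values chosen to be consistent with those facts, and `pps_select` is an illustrative name.

```python
from bisect import bisect_left
from itertools import accumulate


def pps_select(sizes, random_numbers):
    """Map random numbers in 1..sum(sizes) to unit numbers (1-based).

    Unit k is allotted the numbers from (cumulative total of unit k-1) + 1
    up to its own cumulative total, so larger units get more numbers.
    """
    cumulative = list(accumulate(sizes))
    total = cumulative[-1]
    selected = []
    for r in random_numbers:
        if not 1 <= r <= total:
            continue                      # reject numbers outside 1..total
        selected.append(bisect_left(cumulative, r) + 1)
    return selected


# Boats per landing site; values for sites 2-4 and 6-10 are hypothetical.
boats = [12, 8, 10, 9, 30, 15, 6, 12, 8, 10]
print(pps_select(boats, [11, 27, 64]))  # [1, 3, 5]
```

As written, a site can be selected more than once (selection with replacement), which is the situation the estimation formulas of the next section are usually derived for.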
7.2 Method of Estimation
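Let there be N primary sampling units (fishing sites) and let $x_i$ be the number of secondary units (boats) in the ith landing site. If n primary units are selected with pps, then the probability of selecting the ith unit in the sample is $p_i = x_i/\sum x_i$.
The estimate of the population total Y is given by:
$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$
where $y_i$ is the measurement of the ith unit in the sample; and the estimated variance of $\hat{Y}$ is given by:
$$v(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-\hat{Y}\right)^2 = \frac{\sum_{i=1}^{n}(y_i/p_i)^2 - n\,\hat{Y}^2}{n(n-1)}$$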
There are 20 fishing sites in a district. The number of boats at each centre is known, i.e., xi = number of boats at the ith centre is known, and therefore X = ∑xi is known to be 496. Four fishing sites are selected out of 20 fishing sites with pps. In the table below, Col. 1 gives the 4 fishing sites selected in the sample, Col. 2 gives the number of boats (x) in these sites, and Col. 3 gives the landings at these sites during a month. Estimate the total monthly landings Ŷ and v(Ŷ).
|Site||No. of boats (x)||Landings (y)||p = x/X||y/p||(y/p)²|
|1||22||81||0.0443||1 828||3 341 584|
|2||30||118||0.0605||1 950||3 802 500|
|3||30||118||0.0605||1 950||3 802 500|
|4||42||170||0.0847||2 007||4 028 049|
|Total:||7 735||14 974 633|
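The estimate can be reproduced with a short sketch, assuming the with-replacement pps estimator Ŷ = (1/n)·Σ yᵢ/pᵢ and v(Ŷ) = Σ(yᵢ/pᵢ − Ŷ)² / (n(n − 1)), with pᵢ = xᵢ/X. The function name `pps_total` is illustrative.

```python
import math


def pps_total(x, y, X):
    """pps estimate of a population total and its estimated variance.

    x: sizes of the sampled units, y: their measurements, X: total size of all units.
    """
    n = len(y)
    z = [y_i * X / x_i for x_i, y_i in zip(x, y)]   # y_i / p_i with p_i = x_i / X
    total = sum(z) / n
    var_total = sum((z_i - total) ** 2 for z_i in z) / (n * (n - 1))
    return total, var_total


boats = [22, 30, 30, 42]        # Col. 2 of the table
landings = [81, 118, 118, 170]  # Col. 3 of the table
total, var_total = pps_total(boats, landings, X=496)
print(f"total = {total:.1f}, se = {math.sqrt(var_total):.1f}")
```

The code uses unrounded probabilities, so its result (about 1 934) differs slightly from the value obtained from the table totals, where p is rounded to four decimals: Ŷ = 7 735/4 = 1 933.75 and v(Ŷ) = (14 974 633 − 4 × 1 933.75²)/12 ≈ 1 423.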
8. Two-Stage Sampling
In two-stage sampling, a sample of first-stage units is chosen first, and in each of the selected first-stage units a further sample of survey units is chosen. The first-stage units may be selected by simple random sampling, or with probability proportional to their sizes.
8.1 Selection of First-Stage Units at Random (SRS)
Let us have:
N = Number of first-stage units
n = Number of first-stage sample units
Mi = Number of survey units in the ith first-stage unit
mi = Number of survey units selected in the ith first-stage unit
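The unbiased estimate of the population total of the survey characteristic (y) is given by:
$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij} = \frac{N}{n}\sum_{i=1}^{n} M_i\,\bar{y}_i$$
where $y_{ij}$ is the measurement of the jth selected survey unit in the ith selected first-stage unit and $\bar{y}_i$ is the mean of these measurements.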
Let there be 8 fishing sites (N=8). We first select n=3 fishing sites at random and for each fishing site we select 3 traps and measure their catch. The number of traps existing at each selected fishing site and the catches of each selected trap are shown below. Calculate the estimated total catch of trap fisheries and its variance.
|No. of traps at each site (Mi)||No. of traps|
Estimated total landings,
It may be noted that the contribution 1 473.3 to v(Ŷ) is due to difference in the obtained catches between the fishing sites and this is much greater than 673.3 which is due to difference among second-stage units within the first-stage units.
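The split of the variance into a between-site and a within-site contribution can be illustrated with the sketch below. It assumes the usual two-stage SRS formulas: Ŷ = (N/n)·Σ Mᵢ·ȳᵢ, a between-site term N(N − n)/n · s_b² (s_b² being the variance of the estimated site totals Mᵢ·ȳᵢ), and a within-site term (N/n)·Σ Mᵢ(Mᵢ − mᵢ)·sᵢ²/mᵢ. The trap counts and catches are hypothetical placeholders, not the data of the example, and `two_stage_srs` is an illustrative name.

```python
import statistics


def two_stage_srs(N, M, samples):
    """Two-stage estimate with SRS at both stages.

    N: number of first-stage units, M: survey units in each sampled first-stage
    unit, samples: measurements taken in each sampled first-stage unit.
    Returns (total, between-unit variance term, within-unit variance term).
    """
    n = len(samples)
    site_totals = [M_i * statistics.mean(y) for M_i, y in zip(M, samples)]
    total = N / n * sum(site_totals)
    between = N * (N - n) / n * statistics.variance(site_totals)
    within = N / n * sum(
        M_i * (M_i - len(y)) * statistics.variance(y) / len(y)
        for M_i, y in zip(M, samples)
    )
    return total, between, within


# Hypothetical data: 3 of N = 8 sites sampled, 3 traps measured at each site.
M = [10, 12, 8]
catches = [[4, 5, 6], [7, 8, 9], [2, 3, 4]]
total, between, within = two_stage_srs(8, M, catches)
print(total, between, within, between + within)
```

With these placeholder values the between-site term is far larger than the within-site term, the same pattern noted for the example.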
8.2 Selection of First-Stage Units with PPS
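The estimated catch in the ith fishing site is given by:
$$\hat{Y}_i = M_i\,\bar{y}_i = \frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij}$$
The unbiased estimate of the population total is given by:
$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{\hat{Y}_i}{p_i}$$
where $p_i$ is the probability of selecting the ith first-stage unit. The variance of $\hat{Y}$ is given by:
$$v(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{\hat{Y}_i}{p_i}-\hat{Y}\right)^2$$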
Three fishing sites were chosen with pps, and within each sample fishing site a simple random sample of boats was selected. In the table below we give the catches (in kg) of the selected sample. Calculate Ŷ and cv(Ŷ).
Tables of Random Numbers (from Bazigos, 1974)