3. PROGRESSIVE SAMPLING APPROACHES

3.1 Definition of a Progressive Sampling Approach (PSA)

Given a population of size N and mean µ (recall that all populations are considered to be normalized), we define a Progressive Sampling Approach (PSA) as a sequence of N independent random samples of size n, where n takes the values 1,2,...,N. The mean of each sample of size n estimates the population mean at an accuracy level A_n defined by expression (2.6). A sample of size n=N always has an accuracy A_N equal to 1, since the sample mean coincides with the population mean. For smaller samples the accuracy can take values anywhere between 0 and 1 although, depending on the distribution of the population, there exist lower limits for A.
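As a rough illustration of this definition, the Python sketch below builds one PSA for a synthetic population. The function `accuracy` is only a hypothetical stand-in for expression (2.6), which is not reproduced in this section: it assumes values normalized to [0, 1] and scales the absolute error by the largest error possible for the given mean. The names `progressive_sampling` and `population` are likewise illustrative. Samples are drawn without replacement, in line with the remark that A_N equals 1.

```python
import random


def accuracy(sample_mean, pop_mean, lo=0.0, hi=1.0):
    """ASSUMED stand-in for expression (2.6), not the paper's formula:
    1 minus the absolute error, scaled by the largest possible error."""
    max_err = max(pop_mean - lo, hi - pop_mean)
    if max_err == 0:
        return 1.0
    return 1.0 - abs(sample_mean - pop_mean) / max_err


def progressive_sampling(population, rng):
    """Return [A_1, ..., A_N], one independent random sample per size n."""
    N = len(population)
    mu = sum(population) / N
    curve = []
    for n in range(1, N + 1):
        sample = rng.sample(population, n)  # without replacement
        curve.append(accuracy(sum(sample) / n, mu))
    return curve


rng = random.Random(0)
# Synthetic skewed population with values in [0, 1] (illustrative only).
population = [rng.random() ** 2 for _ in range(50)]
psa = progressive_sampling(population, rng)
```

Each call to `progressive_sampling` yields a different sequence of accuracies for the same population, since every sample is drawn afresh.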

3.2 Accuracy curves

In order to examine the way the accuracy A varies throughout a PSA we first consider the independent variable:

$$x = \frac{n}{N} \qquad (3.1)$$

representing the proportion of sample size to population size. Since x varies between 1/N and 1, and A between 0 and 1, all plots of A against x (referred to as A-curves) will always be contained in the standard rectangle formed by the coordinates (1/N, 0) and (1, 1). Figure 3.1 illustrates such an A-curve for a skewed population of size N=50. The small graph inside the plot describes the distribution of the population.

The curve formed by the accuracy A has a "hyperbolic-type" shape (it is not a true hyperbola as it intersects the axes rather than having them as asymptotes), with two distinct and clearly visible patterns of growth. The first pattern shows that A is expected to be low when the sampling proportion x is near its origin, and that it increases sharply as x starts increasing. The second pattern shows that at some critical value of x (that will be later discussed), the accuracy A reaches a breakpoint after which its growth becomes much steadier and slower until it finally reaches its maximum value 1.

3.3 Overall mean accuracy of a PSA

When plotting the accuracy A throughout a PSA, the area under the A-curve provides a measure of the overall mean accuracy of the PSA in question. This is because the difference between two successive values of the variable x is always 1/N, so that the area under the accuracy curve can be expressed as:

$$\text{Area} = \sum_{n=1}^{N} A_n \cdot \frac{1}{N} = \frac{1}{N} \sum_{n=1}^{N} A_n \qquad (3.2)$$
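The equivalence between the area under the A-curve and the mean of the N accuracy values can be checked numerically. The short Python sketch below sums N rectangles of width 1/N; the accuracy values are made up for illustration and do not come from any population discussed here, and the function name `mean_accuracy` is hypothetical.

```python
def mean_accuracy(a_values):
    """Area under the A-curve as a sum of N rectangles of width 1/N."""
    N = len(a_values)
    dx = 1.0 / N  # spacing between successive values of x = n/N
    return sum(a * dx for a in a_values)


# Made-up accuracies A_1..A_5 for a PSA with N = 5 (illustrative only).
a_values = [0.40, 0.70, 0.85, 0.95, 1.00]
area = mean_accuracy(a_values)
```

Because every rectangle has the same width, the result coincides (up to rounding) with the arithmetic mean of the accuracy values.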

Fig. 3.1. Illustration of a typical plot of accuracy A against the sample proportion x=n/N

In general, accuracy growth throughout a PSA follows the "hyperbolic-type" pattern illustrated in Figure 3.1. However, there exist other possible paths, albeit less frequent, that do not necessarily have that shape. Our aim now is to show that all possible A-curves of a normalized population of size N have a geometrical boundary specific to that population.

3.4 Definition of the "Worst" PSA

To any given population there corresponds a "Worst Possible PSA" which is uniquely defined. This PSA consists of the "worst" possible (or most "unlucky") samples of size 1,2,...,N. To construct it we first assume that the elements of a normalized population of size N and mean µ have been ranked in ascending order, from the smallest element to the largest.

It is evident that the worst sample of size 1 will be either the smallest or the largest element and, using the expression (2.6) for accuracy, the worst A for the sample proportion x=1/N will be the minimum of the accuracies of these two elements. Likewise, for the sample proportion x=2/N the worst accuracy will correspond either to the subset of the two smallest elements or to that of the two largest. Generally, for any sample proportion x=n/N the corresponding worst accuracy, now denoted by W(x), will be formed either by the first n or the last n ranked elements, since the mean of any n elements lies between the mean of the n smallest and the mean of the n largest.

It can thus be concluded that for any given population of size N, all A-curves will lie on or above the curve formed by the worst accuracy W(x) determined by the approach described above.
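The construction of W(x) can be sketched in a few lines of Python. As before, `accuracy` is an assumed stand-in for expression (2.6), which is not reproduced in this section; the construction itself only requires that accuracy decreases as the absolute error grows. The function name `worst_curve` and the five-element population are hypothetical.

```python
def accuracy(sample_mean, pop_mean, lo=0.0, hi=1.0):
    """ASSUMED stand-in for expression (2.6), not the paper's formula:
    1 minus the absolute error, scaled by the largest possible error."""
    max_err = max(pop_mean - lo, hi - pop_mean)
    if max_err == 0:
        return 1.0
    return 1.0 - abs(sample_mean - pop_mean) / max_err


def worst_curve(population):
    """Return [W(1/N), ..., W(1)] from the ranked population."""
    ranked = sorted(population)  # ascending order
    N = len(ranked)
    mu = sum(ranked) / N
    curve = []
    for n in range(1, N + 1):
        low_mean = sum(ranked[:n]) / n    # the n smallest elements
        high_mean = sum(ranked[-n:]) / n  # the n largest elements
        curve.append(min(accuracy(low_mean, mu), accuracy(high_mean, mu)))
    return curve


# Small illustrative population normalized to [0, 1] (not from the paper).
population = [0.0, 0.1, 0.2, 0.4, 1.0]
w = worst_curve(population)
```

Sorting once and comparing only the two extreme subsets keeps the computation cheap; no enumeration of all possible samples of size n is needed.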

Figures 3.2-3.5 illustrate examples of A(x) and W(x) curves for four different populations of size N=200. The plots are accompanied by smaller diagrams describing the distribution of the populations. In each case the region of all feasible PSAs is to be found above the worst accuracy curve W(x).

The analytical model describing "hyperbolic-type" accuracy curves is briefly discussed in Section 6.1.

Fig. 3.2. Example of A- and W-curves for a flat population

Fig. 3.3. Example of A- and W-curves for a convex and symmetrical population

Fig. 3.4. Example of A- and W-curves for a convex and skewed population

Fig. 3.5. Example of A- and W-curves for a concave population