Appendix 2 Calculation of sample size

Chapter 5 introduced the issue of calculating the sample size needed to estimate the population mean with a reasonable level of confidence.

The optimum sample size is formally calculated from the following equation (Proctor and Meullenet, 1998):

$$ t = \frac{\bar{x} - \mu}{SD / \sqrt{n}} $$

where

x̄ = sample mean

μ = population mean

SD = standard deviation of the sample

n = sample size

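Setting the acceptable difference between the sample mean and the population mean equal to accuracy × mean, the equation can be rearranged as follows:

$$ \text{Sample size} \geq \frac{(t_{\alpha,\,n-1})^2 \, SD^2}{(\text{accuracy} \times \text{mean})^2} $$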
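
As a minimal sketch of the rearranged equation, the Python function below computes the smallest whole-number sample size that satisfies it. The function name `required_sample_size`, its parameter names and the pilot values in the usage lines are illustrative assumptions, not part of the source; the t value must still be read from a Student's t table.

```python
import math


def required_sample_size(t, sd, mean, accuracy):
    """Smallest integer n with n >= t^2 * SD^2 / (accuracy * mean)^2."""
    if mean == 0 or accuracy <= 0:
        raise ValueError("mean must be non-zero and accuracy positive")
    ratio = (t ** 2) * (sd ** 2) / (accuracy * mean) ** 2
    return math.ceil(ratio)


# Hypothetical pilot values (not from the source): mean 50, SD 10,
# accuracy 0.1, and t = 2.262 (alpha = 0.05, df = 9).
n_required = required_sample_size(t=2.262, sd=10.0, mean=50.0, accuracy=0.1)
print(n_required)  # prints 21
```

Rounding up rather than to the nearest integer keeps the result on the safe side of the inequality.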

Application of this equation requires knowledge of some parameters that will be only available if the analyst has some preliminary information about the food. This ideally should come from pilot analytical studies to determine the mean and standard deviation, from data in the literature or, if such data are not available, from intuitive guesses.

The value of α defines the confidence limits required. If a 95 percent confidence interval is required, α equals 5 percent, i.e. 0.05. The degrees of freedom (df) are defined as n – 1. Thus, for a sample size of 10, df = 10 – 1 = 9.

The value for t is taken from standard statistical tables (Student's t table, two-tailed), using the required value of α and a preliminary estimate of the sample size, which fixes the degrees of freedom.

Accuracy is the required closeness of the estimated value to the (unknown) true value. A sample mean within 10 percent of the population mean would represent an accuracy of 0.1. In other words, the required confidence interval is x̄ ± 0.1x̄.

Examples of values for t:

For a sample size of 10, α = 0.05, df = 9, t = 2.262. Thus t² = 5.1166.

For a sample size of 20, α = 0.05, df = 19, t = 2.093. Thus t² = 4.3806.
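
The squared values above can be checked with a few lines of Python. The dictionary holds only two-tailed critical values (α = 0.05) that appear in this appendix, including the one used for the 24-sample data set; the name `T_CRITICAL_05` is an illustrative assumption, and other degrees of freedom would have to be added from a t table.

```python
# Two-tailed critical values of Student's t at alpha = 0.05,
# keyed by degrees of freedom (df = n - 1), as quoted in this appendix.
T_CRITICAL_05 = {9: 2.262, 19: 2.093, 23: 2.069}

# Square each critical value and round to four decimal places.
t_squared = {df: round(t * t, 4) for df, t in T_CRITICAL_05.items()}

for df, t2 in t_squared.items():
    print(f"n = {df + 1}, df = {df}, t = {T_CRITICAL_05[df]}, t^2 = {t2}")
```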

Examples of sample sizes calculated from literature values:

The examples below use the data reported by Greenfield, Makinson and Wills (1984) for moisture, fat and cholesterol in 24 samples of retail French fries. These data illustrate the fact that different nutrients, because they show different variances, need different sample sizes to achieve the same level of confidence.

Table A2.1 Calculation of sample numbers
Parameter	Moisture (g/100 g)	Fat (g/100 g)	Cholesterol (mg/100 g)
Actual sample size	24	24	24
Actual mean	49.9	13.4	16
Actual standard deviation (SD)	8.5	3.9	6.7
SD²	72.25	15.21	44.89
t (α = 0.05, df = 23)	2.069	2.069	2.069
t²	4.2808	4.2808	4.2808
t² × SD²	309.285	65.11	192.165
Accuracy set at	0.1 (0.05)	0.1 (0.05)	0.1 (0.05)
Accuracy × mean	4.99 (2.495)	1.34 (0.67)	1.6 (0.8)
(Accuracy × mean)²	24.9 (6.225)	1.7956 (0.4489)	2.56 (0.64)
Sample size required for accuracy = 0.1	309.285/24.9 = 13	65.11/1.7956 = 37	192.165/2.56 = 76
Sample size required for accuracy = 0.05	309.285/6.225 = 50	65.11/0.4489 = 146	192.165/0.64 = 301

Table A2.1 summarizes the relevant data and calculations.
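
The calculations in Table A2.1 can be reproduced with the short sketch below. The means, standard deviations and t value are taken from the table; the variable and function names are illustrative assumptions, and each result is rounded up to the next whole sample.

```python
import math

T_VALUE = 2.069  # t for alpha = 0.05 and df = 23 (24 samples), from Table A2.1

# (mean, SD) per nutrient, as reported in Table A2.1
NUTRIENTS = {
    "moisture": (49.9, 8.5),
    "fat": (13.4, 3.9),
    "cholesterol": (16.0, 6.7),
}


def sample_size(mean, sd, accuracy, t=T_VALUE):
    """Round t^2 * SD^2 / (accuracy * mean)^2 up to a whole number."""
    return math.ceil((t * sd) ** 2 / (accuracy * mean) ** 2)


# Required sample size for each nutrient at both accuracy levels.
results = {
    name: {acc: sample_size(mean, sd, acc) for acc in (0.1, 0.05)}
    for name, (mean, sd) in NUTRIENTS.items()
}

for name, sizes in results.items():
    print(f"{name}: n = {sizes[0.1]} (accuracy 0.1), n = {sizes[0.05]} (accuracy 0.05)")
```

Run as written, this gives 13, 37 and 76 samples at an accuracy of 0.1, and 50, 146 and 301 at 0.05, matching the table.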

This shows that, for an accuracy of 0.1, ten samples (a commonly used sample size) would be inadequate to estimate the mean with the required confidence in any of the three cases. A sample size of 13 would be adequate for moisture and 37 for fat; a sample size of 76 would be required for cholesterol, which showed the greatest variability relative to its mean. The high variability of cholesterol can be explained by the fact that some of the French fries had been fried in vegetable oils containing virtually no cholesterol.

If the calculation is repeated for an accuracy of 0.05, a sample size of 50 would be needed for moisture, 146 for fat and 301 for cholesterol.
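
The effect of tightening the accuracy follows directly from the sample size equation, because accuracy appears squared in the denominator. As a short worked step, assuming the same t value is used at both accuracy levels and taking the unrounded moisture ratios from Table A2.1 as the example:

$$ \frac{n_{0.05}}{n_{0.1}} = \left(\frac{0.1}{0.05}\right)^2 = 4, \qquad \frac{309.285}{6.225} \approx 49.7 \approx 4 \times \frac{309.285}{24.9} \approx 4 \times 12.4 $$

Halving the accuracy therefore roughly quadruples the required sample size; the tabulated figures depart slightly from an exact factor of four only because each is rounded up to a whole number.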

The examples show that nutrients with greater variability relative to their mean require larger sample sizes than less variable nutrients. In practice, most designers of sampling protocols have to make intuitive judgements in calculating the sample size to be collected.