# Calculation of sample size

Chapter 5 introduced the issue of calculating the sample size needed to estimate the population mean with a reasonable level of confidence.

The optimum sample size is calculated formally from the following equation (Proctor and Meullenet, 1998):

$$t = \frac{\bar{x} - \mu}{SD/\sqrt{n}}$$

where

$\bar{x}$ = sample mean

$\mu$ = population mean

SD = standard deviation of the sample

$n$ = sample size

The equation can be rearranged as follows:

$$\text{Sample size} \geq \frac{(t_{\alpha,\,n-1})^2 \, SD^2}{(\text{accuracy} \times \text{mean})^2}$$
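As a minimal sketch of the rearranged equation, the Python function below evaluates the right-hand side and rounds up to the next whole sample. The function name `required_sample_size` and its parameter names are illustrative assumptions, not part of the source, and the t value must still be read from a Student's t table. The inputs in the last line are made-up numbers chosen only to exercise the function.

```python
import math


def required_sample_size(t, sd, mean, accuracy):
    """Smallest whole n with n >= t^2 * SD^2 / (accuracy * mean)^2.

    Hypothetical helper for illustration; t is taken from a Student's t table.
    """
    if sd < 0 or mean == 0 or accuracy <= 0:
        raise ValueError("need sd >= 0, mean != 0 and accuracy > 0")
    ratio = (t ** 2) * (sd ** 2) / (accuracy * mean) ** 2
    return math.ceil(ratio)


# Illustrative inputs only: mean 50, SD 10, accuracy 0.1, t = 2.262 (df = 9).
print(required_sample_size(t=2.262, sd=10.0, mean=50.0, accuracy=0.1))
```

Rounding up rather than to the nearest integer reflects the fact that the computed value is a lower bound on the sample size.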

Application of this equation requires knowledge of some parameters that will only be available if the analyst has some preliminary information about the food. Ideally this should come from pilot analytical studies to determine the mean and standard deviation, from data in the literature or, if such data are not available, from intuitive guesses.

The value of α defines the confidence limits required. If a 95 percent confidence interval is required, α equals 5 percent, i.e. 0.05. The degrees of freedom (df) are defined as n – 1. Thus, for a sample size of 10, df = 10 – 1 = 9.

The value for t is taken from standard statistical tables (Student's t table), using the required value of α and a preliminary estimate of the sample size.

Accuracy is the required closeness of the estimated value to the true (unknown) value. A sample mean within 10 percent of the population mean would represent an accuracy of 0.1. In other words, the required confidence interval is $\bar{x} \pm 0.1\bar{x}$, where $\bar{x}$ is the sample mean.
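The link between this definition of accuracy and the sample size equation can be made explicit. The following is a sketch that assumes the equation referred to above is the usual one-sample t statistic, $t = (\bar{x} - \mu)/(SD/\sqrt{n})$, and that the largest tolerated difference between the sample mean and the population mean is accuracy × mean:

```latex
|\bar{x} - \mu| = \frac{t \, SD}{\sqrt{n}} \leq \text{accuracy} \times \bar{x}
\quad\Longrightarrow\quad
n \geq \frac{t^2 \, SD^2}{(\text{accuracy} \times \bar{x})^2}
```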

Examples of values for t:

For a sample size of 10, α = 0.05, df = 9, t = 2.262. Thus t² = 5.1166.

For a sample size of 20, α = 0.05, df = 19, t = 2.093. Thus t² = 4.3806.
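The two examples can be checked with a few lines of Python. This is only a sketch: the dictionary `T_VALUES_ALPHA_05` and the function `t_squared` are hypothetical names, and the lookup holds just the two t values quoted in the text, so any other sample size still requires a full Student's t table.

```python
# t values quoted in the text for alpha = 0.05, keyed by degrees of freedom.
T_VALUES_ALPHA_05 = {9: 2.262, 19: 2.093}


def t_squared(sample_size):
    """Return t^2 for the guessed sample size, using df = n - 1."""
    df = sample_size - 1
    if df not in T_VALUES_ALPHA_05:
        raise KeyError(f"no tabulated t value for df = {df}; consult a t table")
    return T_VALUES_ALPHA_05[df] ** 2


for n in (10, 20):
    print(n, round(t_squared(n), 4))
```

The value of t², and hence the calculated sample size, falls as the guessed sample size rises, which is why the preliminary estimate of n matters.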

Examples of sample sizes calculated from literature values:

The examples below use the data reported by Greenfield, Makinson and Wills (1984) for moisture, fat and cholesterol in 24 samples of retail French fries. These data illustrate the fact that different nutrients, because they show different variances, need different sample sizes to achieve the same level of confidence.

Table A2.1 Calculation of sample numbers

| Parameter | Moisture (g/100 g) | Fat (g/100 g) | Cholesterol (mg/100 g) |
|---|---|---|---|
| Actual sample size | 24 | 24 | 24 |
| Actual mean | 49.9 | 13.4 | 16 |
| Actual standard deviation (SD) | 8.5 | 3.9 | 6.7 |
| SD² | 72.25 | 15.21 | 44.89 |
| t (α = 0.05) | 2.069 | 2.069 | 2.069 |
| t² | 4.2808 | 4.2808 | 4.2808 |
| t² × SD² | 309.285 | 65.11 | 192.165 |
| Accuracy set at | 0.1 (0.05) | 0.1 (0.05) | 0.1 (0.05) |
| Accuracy × mean | 4.99 (2.495) | 1.34 (0.67) | 1.6 (0.8) |
| (Accuracy × mean)² | 24.9 (6.225) | 1.7956 (0.4489) | 2.56 (0.64) |
| Sample size required for accuracy = 0.1 | 309.285/24.9 = 13 | 65.11/1.7956 = 37 | 192.165/2.56 = 76 |
| Sample size required for accuracy = 0.05 | 309.285/6.225 = 50 | 65.11/0.4489 = 146 | 192.165/0.64 = 301 |

Values in parentheses refer to an accuracy of 0.05. Required sample sizes are rounded up to the next whole number.

Table A2.1 summarizes the relevant data and calculations.
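The calculations in Table A2.1 can be reproduced with a short script. The sketch below uses the means, standard deviations and t value given in the table; the names `NUTRIENTS`, `sample_size` and `table_rows` are hypothetical, and the script does not round intermediate values as the table does, although the rounded-up sample sizes come out the same.

```python
import math

T = 2.069  # Student's t for alpha = 0.05, as given in Table A2.1

# (mean, SD) per nutrient, from Greenfield, Makinson and Wills (1984)
NUTRIENTS = {
    "moisture": (49.9, 8.5),
    "fat": (13.4, 3.9),
    "cholesterol": (16.0, 6.7),
}


def sample_size(mean, sd, accuracy, t=T):
    """Round t^2 * SD^2 / (accuracy * mean)^2 up to a whole sample."""
    return math.ceil((t * sd) ** 2 / (accuracy * mean) ** 2)


def table_rows(accuracies=(0.1, 0.05)):
    """Return {nutrient: {accuracy: required sample size}}."""
    return {
        name: {acc: sample_size(mean, sd, acc) for acc in accuracies}
        for name, (mean, sd) in NUTRIENTS.items()
    }


for name, sizes in table_rows().items():
    print(name, sizes)
```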

This shows that, for an accuracy of 0.1, ten samples (a commonly used sample size) would be inadequate to achieve a mean with the required confidence in any of the three cases. A sample size of around 13 would be adequate for moisture and 37 for fat; a sample size of 76 would be required for cholesterol, which showed the greatest variability. The high variability of cholesterol can be explained by the fact that some of the French fries had been fried in vegetable oils containing virtually no cholesterol.

If the calculation is repeated for an accuracy of 0.05, a sample size of 50 would be needed for moisture, 146 for fat and over 300 for cholesterol.

The examples show that the sample size will be larger for nutrients that show greater variability than for less variable nutrients. In practice most designers of sampling protocols have to make intuitive judgements in calculating the sample size to be collected.
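The dependence on variability can be seen directly by rewriting the sample size equation in terms of the coefficient of variation, CV = SD/mean (a symbol introduced here for illustration, not used in the source). For the French fries data the CVs are roughly 0.17 for moisture, 0.29 for fat and 0.42 for cholesterol:

```latex
n \geq \frac{t^2 \, SD^2}{(\text{accuracy} \times \text{mean})^2}
  = t^2 \left( \frac{CV}{\text{accuracy}} \right)^2,
\qquad CV = \frac{SD}{\text{mean}}
```

In this form the required sample size grows with the square of the CV, and halving the accuracy value (for example from 0.1 to 0.05) roughly quadruples it for a fixed t.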