Chapter 11 Guidelines for the use of food composition data

There are two schools of thought about food tables. One tends to regard the figures in them as having the accuracy of atomic weight determinations; the other dismisses them as valueless on the ground that a foodstuff may be so modified by the soil, the season or its rate of growth that no figure can be a reliable guide to its composition. The truth, of course, lies somewhere between these two points of view.

(Widdowson and McCance, 1943)

A food composition database or table is a scientific tool and must be treated as such. Even the best food composition database or table is of little value if it is used incorrectly. The compilers are responsible for ensuring that the database meets users' requirements and they must also define for the user the limitations of the database, so that the data are not used inappropriately. However, correct use is the responsibility of those who train the users, and of the users themselves.

Effective use requires training and expertise, the level of which depends on the sophistication of the database or tables concerned (see Chapter 1 for a discussion of levels of data management). Even simplified food tables designed for lay use require some background knowledge of weights and measures, and of terms such as “kilojoules” and “energy”. More sophisticated databases require an understanding of modes of expression, food descriptors and concepts such as edible portion. A professional nutritionist or dietitian must become familiar with the principles of sampling, analytical methodology and data management, and be aware of common mistakes that can arise in database usage. The professional user also requires training in database evaluation for specialized uses (e.g. a research project). A training programme covering all of these areas should probably form a unit in any tertiary or professional course specializing in nutrition. Wageningen Agricultural University and UNU/FAO/INFOODS have run specialized short training courses on the production, management and use of food composition data in centres around the world since 1992, and information about forthcoming courses can be found on the INFOODS Web site (INFOODS, 2003). Overall, considerable responsibility rests with those who train users of food composition databases (Greenfield, 1991b).

Ultimately, it is the users, particularly the professional users, who bear responsibility for using the database correctly, and this applies above all to those who are responsible for updating and supplementing an existing database for their own organization. They must familiarize themselves with all aspects of the database or tables: coverage, methods of analysis, method of compilation, sources of values, differing types of values, coding, food nomenclature and modes of expression. They must understand the use of factors in calculating derived values (such as protein, energy value and vitamin equivalents) and the different levels of reliability attached to values for different nutrients. Arithmetical checks should be run to ascertain the accuracy of calculated values (e.g. fatty acid levels in a food, calculated from the food's fat content and the fatty acid composition [see Appendix 5]). Any computer program developed for use with the database should be carefully checked for accuracy. Finally, the user must ensure that any research report based on a database or set of tables fully documents the database or tables used, together with any supplemental food values used (Perloff, 1983). Several journals (Journal of Food Composition and Analysis, 2003a; Journal of the American Dietetic Association, 2003; and Nutrition and Dietetics, 2003) now require the identification of nutrient databases and software in all published articles, with the following standard presentation suggested by the Citation Task Force aligned with the United States' National Nutrient Databank Conference:

Cite software developers parenthetically in the text after the first mention of a software package. Software citations should include the name, version number, and release date of the software as well as the name and headquarters location (city and state) of the software developer. If software incorporates a nutrient database, provide information in the text about the database. This should include the release date for the database, a description of substantial modifications made to the database, and an explanation of how missing nutrient data for foods were handled (i.e., indicate whether values were extrapolated and evaluate the effect of any missing values on dietary totals for the nutrients of interest).

This practice could usefully be adopted by all journals dealing with dietary studies of humans. Failure to give such information means that a study as published can never be independently replicated.
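Where reports are prepared routinely, the elements listed in the recommendation above can be captured as a structured record so that none is forgotten. The following Python sketch is illustrative only: the class name, its field names, and the example package, developer and dates are all hypothetical, and the wording of the parenthetical citation is an assumption rather than a prescribed journal style.

```python
from dataclasses import dataclass


@dataclass
class NutrientSoftwareCitation:
    """Hypothetical record of the details a dietary-analysis report should give."""
    software_name: str
    version: str
    software_release_date: str
    developer: str
    developer_city: str
    developer_state: str
    database_release_date: str
    database_modifications: str
    missing_data_handling: str

    def parenthetical(self) -> str:
        """Text for the parentheses after the first mention of the software."""
        return (f"{self.software_name}, version {self.version}, released "
                f"{self.software_release_date}; {self.developer}, "
                f"{self.developer_city}, {self.developer_state}")

    def missing_fields(self) -> list:
        """Names of fields left blank, so that an incomplete citation is caught early."""
        return [name for name, value in vars(self).items() if not value.strip()]


# Hypothetical example: every name and date below is invented for illustration.
citation = NutrientSoftwareCitation(
    software_name="ExampleDiet",
    version="2.1",
    software_release_date="March 2003",
    developer="Example Software Co.",
    developer_city="Springfield",
    developer_state="IL",
    database_release_date="January 2003",
    database_modifications="120 local foods added",
    missing_data_handling="",
)
print(citation.parenthetical())
print(citation.missing_fields())  # the statement on missing data has still to be written
```

The point of the design is the check for blank fields before submission; the output wording would need to be adapted to each journal's instructions.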

The quality of future databases will improve only if all users are well trained and vigilant.

Limitations of the use of food composition databases

Several studies have compared values obtained from the chemical analysis of composite diets with values computed by use of food composition tables or databases, with greatly varying findings (Stock and Wheeler, 1972; Acheson et al., 1980; Wolf, 1981; Stockley et al., 1985; McCullough et al., 1999). Arab (1985) demonstrated the difficulties of making international comparisons, owing to variations in both nomenclature and composition of foods. Limitations in the use of food composition databases can be summarized as:

variability in the composition of foods;
partial or limited coverage of food items;
partial or limited coverage of nutrients;
inappropriate database or food composition values;
errors arising in database use;
incompatibility of databases;
differences in software packages;
limitations of methods for measuring food intake.

Variability in the composition of foods

Foods as biological materials exhibit natural variations in the amounts of nutrients they contain. This variability is increased by different methods of plant and animal husbandry, storage, transport and marketing. Processed foods, despite being subject to quality control during production, also vary, partly because of variations in the composition of ingredients but also because of changes in formulation and production. Some composite foods, such as margarines, are routinely reformulated by least-cost procedures that maintain the technological qualities of the product within a defined price range but may alter its nutrient content.

For many foods the limits of natural nutrient variation are not defined. Similarly, variations introduced as the food moves from production through retail sale to consumption are not known for many nutrients, because of the low priority (and hence lack of resources) devoted to food composition research. However, sufficient information exists to support some general statements about the major sources of variation in the nutritional composition of foods.

Meats. The major sources of variation in animal products are the proportion of lean to fat tissue and the proportion of edible to inedible materials (bone, gristle). The distinction between edible and inedible is subject to cultural and personal idiosyncrasies. Variations in the lean–fat ratio affect levels of most other nutrients, which are distributed differently in the two fractions.

Fruits and vegetables. In plant foods, genetics, husbandry and storage are major sources of variation. Water content is particularly affected by storage conditions, and changes in water content are associated with changes in all other constituents, primarily as a result of changes in nutrient density. Husbandry conditions, geochemistry (soil composition) and fertilizer use alter vitamin and mineral contents, especially trace element contents; levels of illumination affect the contents of sugars, organic acids, carotenoids and vitamin C. The level of phytochemicals in plant foods varies even more than nutrient levels because it is heavily dependent on factors such as pests and pesticides (Eldridge and Kwolek, 1983).

Cereals. Flours and grains vary less than fruits and vegetables do, because they can be stored only if their water content lies within a narrow range. However, their protein content can vary by a factor of two, depending on variety and fertilizer usage. Fertilizer and soil type will also produce some variation in mineral content. Cereal enrichment/fortification practices in some countries markedly affect contents of B vitamins (including folate), iron and calcium.

Milk. The major variation is in fat content and fat-soluble vitamins. Most industrialized countries have rigid standards for fat content, and the collection of milk from large herds minimizes differences due to stage of lactation. Considerable variation can occur in the composition of milk from small herds, which make up the majority of herds in developing countries. Levels of carotenes in milk may vary considerably, depending on the time of year and whether the herds are fed concentrates or are at pasture. In some countries, milk is fortified, e.g. with vitamins A and D.

Processed foods. Variations in ingredients and formulation are common, although most manufacturers have rigid specifications for ingredients and use quality control procedures that sometimes pertain to nutrient levels. Often, however, the requirement is only to maintain specified minimum levels of nutrients, and most nutrient additions therefore include “overages” to allow for losses during handling and storage. Despite quality control, many processed foods exhibit the same variations seen in “natural” foods.

Composite dishes. Human diets include a wide range of composite dishes, prepared either by a food service (such as a restaurant or workplace canteen) or in the home. Composite dishes show the greatest variations in composition and therefore represent the least reliable data in a food database. Nonetheless, if a database is to be used in nutritional studies of individuals as members of groups, then data on these foods will be required. Recipe formulation and actual cooking method are the major sources of variation.

Calculated compositional data. The results of calculations will incorporate variations such as those listed above in the analytical data for the ingredients used, as well as variability in yield and retention factors.

The variations summarized above are a major constraint on the usage of food composition databases. A database is unlikely to predict within narrow limits the composition of a particular sample of food, because the limits will vary according to the food item and to the nutrient. Furthermore, the limits can be defined only if the value for each nutrient is accompanied by some measure of variation within that food. Beaton (1987) carried out simulation computations with United States food composition data (for which standard error data are published) using model diets. Variability appeared to produce a smaller bias in nutrient intakes computed for diets composed of many as opposed to few foods. This work also indicated the need to analyse or replicate analyses of foods that are major suppliers of dietary nutrients.
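Why random variation matters less when many foods contribute to an intake can be illustrated with a short calculation. The sketch below is not Beaton's simulation; it assumes that deviations of actual from tabulated composition are independent between foods, so that their variances add, and the numbers are hypothetical. Under that assumption the relative standard deviation of the total falls roughly as the square root of the number of equally contributing foods. Systematic bias, such as a tabulated value that is wrong for the local food supply, does not cancel out in this way.

```python
import math


def total_and_cv(portions):
    """Mean total intake and its coefficient of variation (CV).

    portions: list of (amount_g, mean_per_100g, sd_per_100g).
    Assumes deviations from the tabulated means are independent between foods,
    so the variances of the individual contributions simply add.
    """
    mean_total = sum(amount * m / 100 for amount, m, _ in portions)
    variance = sum((amount * sd / 100) ** 2 for amount, _, sd in portions)
    return mean_total, math.sqrt(variance) / mean_total


# Hypothetical: a nutrient supplied by one food, or by ten foods contributing equally,
# each food having a CV of 20 percent.
one_food = [(100, 10.0, 2.0)]
ten_foods = [(10, 10.0, 2.0)] * 10
print(total_and_cv(one_food))   # total 10.0, CV 0.2
print(total_and_cv(ten_foods))  # total about 10.0, CV about 0.063
```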

Ideally, all food composition databases should contain estimates of variability. Thus, the ideal composition database would have to be derived from sufficient numbers of analytical values to permit definition of the natural limits of variation and the distribution of the variance. Databases are in development that may meet these statistical requirements (ILSI, 2003). However, even such an ideal database would only predict the expected range of composition for any individual food.

Thus, natural variations in foods need to be understood by all users in all sectors, as they limit the predictive accuracy of nutrient intake calculations. Additionally, when using a compositional database for statutory purposes or to define standards against which to compare an individual food sample, this natural variation must be taken into consideration.
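When a single sample is compared against a database value, the comparison should be made against a range rather than a point. A minimal sketch, assuming that the database supplies a mean and a standard deviation for the food and that values are roughly normally distributed (so that the mean ± 2 SD covers about 95 percent of samples); the function name and figures are hypothetical.

```python
def within_expected_range(sample_value, mean, sd, k=2.0):
    """Check a single analysed sample against a tabulated mean and standard deviation.

    Returns (is_within, (lower, upper)). With k=2 and roughly normal data,
    about 95 percent of samples of the food are expected to fall in the range.
    """
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    lower, upper = mean - k * sd, mean + k * sd
    return lower <= sample_value <= upper, (lower, upper)


# Hypothetical fat values (g per 100 g) for a meat product: mean 15, SD 3.
print(within_expected_range(19.0, 15.0, 3.0))  # (True, (9.0, 21.0))
print(within_expected_range(23.0, 15.0, 3.0))  # (False, (9.0, 21.0))
```

A value outside the range is not in itself proof that a product fails a standard; it indicates that the sample warrants further investigation.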

For some nutrients, a database is, at best, an approximate quantitative guide. Examples are vitamin C and folates, and also sodium (and chloride), whose levels depend on the widespread use of salt as an additive. In many cases, trace element contents can be predicted only semi-quantitatively.

Limited coverage of food items

In industrialized countries the number of branded processed foods available is of the order of 10 000; furthermore, “new” products are being introduced continuously. The total number of foods consumed, if composite dishes are included, is probably of the order of 100 000. It is therefore unlikely that a database can be truly comprehensive for more than a short time. Clearly, priorities must be assessed when foods are selected for inclusion. Nevertheless, users require an increasing amount of brand name data in food composition databases because many manufactured foods are unique in their composition and/or have no generic equivalent (McDowell, 1993).

If the criteria discussed in Chapter 3 are applied to the selection, the database will include data for generic foods or major types of product. Thus biscuits (cookies) can be identified by brand name and type (sweet, semi-sweet, etc.), and a biscuit can be assigned to a type if the specific brand is not included. In most nutritional studies, the error produced by this approach is acceptable. For a computerized database application, software can probably be designed that will guide the user to the most appropriate alternative item. A cumulative record of items for which alternatives were sought would aid the assessment of priorities for items to be inserted in the database.
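The substitution approach described above, together with the cumulative record of substitutions, might be sketched as follows. The item codes, the miniature database and its energy values are hypothetical placeholders, and a real application would need a much richer matching scheme based on food descriptors.

```python
from collections import Counter

# Hypothetical miniature database; codes and energy values are placeholders.
DATABASE = {
    "biscuit_brandX_semisweet": {"energy_kj_per_100g": 1900},
    "biscuit_generic_sweet": {"energy_kj_per_100g": 2000},
    "biscuit_generic_semisweet": {"energy_kj_per_100g": 1850},
}

# Cumulative record of brand items that had to be replaced by a generic type.
substitution_log = Counter()


def lookup(brand_code, generic_code):
    """Return (code actually used, record), falling back to the generic type."""
    if brand_code in DATABASE:
        return brand_code, DATABASE[brand_code]
    substitution_log[brand_code] += 1
    return generic_code, DATABASE[generic_code]


lookup("biscuit_brandX_semisweet", "biscuit_generic_semisweet")  # found directly
lookup("biscuit_brandY_sweet", "biscuit_generic_sweet")          # substituted
lookup("biscuit_brandY_sweet", "biscuit_generic_sweet")          # substituted again
print(substitution_log.most_common())  # [('biscuit_brandY_sweet', 2)]
```

Items at the head of the `most_common()` list are the obvious candidates for analysis and inclusion in the next release of the database.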

Coverage of nutrients

The assignment of priorities to specific nutrients for inclusion in a database is discussed in Chapter 4. Complete coverage of all nutrients requires high levels of laboratory instrumentation, and many nutrients remain problematical from the analytical viewpoint. Complete coverage of all the nutrients in well-documented samples is therefore uncommon. Furthermore, nutritional interests change with time; for example, in 1967–68 most dietitians in the United Kingdom did not require values for “unavailable carbohydrate” (dietary fibre), whereas by 1974 all were seeking such data avidly. Some nutritional interests have paralleled developments in analytical methodology: the advent of gas chromatography permitted detailed characterization of fatty acid composition, automatic liquid chromatography heightened interest in amino acids, and high-pressure liquid chromatography heightened interest in the analysis of free sugars. Improvements in inorganic analysis using atomic absorption spectroscopy have increased interest in trace elements.

If the first priority is given to proximates and major nutrients (as suggested in Chapter 4), new databases will lack certain data for some years. Even if a massive, comprehensive analytical programme is attempted, priorities must still be set according to the importance of a food in the provision of a nutrient. Assessment on the grounds of probable concentration alone is inadequate; low levels of a nutrient in a food that is regularly consumed are more important than high levels in a rarely consumed food such as a luxury item. Both frequency of consumption and nutrient concentration must be judged against the normal range of total intake of the nutrient in question. This assessment often shows that a certain food makes a virtually negligible contribution to total consumption of the nutrient in question, and consequently, analytical work on the food for that nutrient is difficult to justify.
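Judging a food's importance as a source of a nutrient combines the amount consumed with the concentration. A minimal Python sketch, using hypothetical mean daily intakes and nutrient concentrations for three foods, ranks foods by their share of total intake; it shows how a staple with a low concentration can outrank a rarely consumed item with a high one.

```python
def rank_by_contribution(foods):
    """Rank foods by their share of the total intake of one nutrient.

    foods: dict mapping food name -> (mean intake g/day, nutrient per 100 g).
    """
    contributions = {name: grams * conc / 100 for name, (grams, conc) in foods.items()}
    total = sum(contributions.values())
    shares = [(name, amount / total) for name, amount in contributions.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical figures (mg per 100 g) for a staple, a common food and a luxury item.
foods = {
    "bread": (200, 1.0),
    "milk": (300, 0.5),
    "caviar": (0.1, 50.0),
}
for name, share in rank_by_contribution(foods):
    print(f"{name}: {share:.1%}")  # bread: 56.3%, milk: 42.3%, caviar: 1.4%
```

In this example the luxury item, despite a concentration 50 times that of bread, supplies under 2 percent of intake, so analytical work on it for this nutrient would be hard to justify.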

Missing values can be a source of grave error, however. Stockley (1988) reviewed studies of errors associated with missing values in databases, citing underestimates of B vitamin intake ranging from 1.5 percent to 14.3 percent. Further, calculation from tables accounted for only 69 percent of the total polyunsaturated fatty acids analysed in duplicate diets, improving to 89 percent when missing values in the tables used were filled in. Cowin and Emmett (1999) compared nutrient intakes from a food intake study in the United Kingdom calculated from the fifth edition of the United Kingdom tables (Holland et al., 1991) with those calculated from the same database with missing values filled with “guesstimates”. They found that of the 1 027 foods recorded in the dietary survey, 540 had missing data for one or more nutrients. The nutrient intakes of over 90 percent of the subjects were altered by the use of the guesstimate-filled database. Underestimates using the uncorrected database ranged from 0.04 percent to 14.7 percent, the effect of missing data being proportionately greater at the lower end of the nutrient intake distribution. Further, in the European Prospective Investigation into Cancer and Nutrition (EPIC) project (Riboli et al., 2002), differences of up to 25 percent were found for dietary fibre intakes when missing values were treated as zero (Charrondiere, Vignat and Riboli, 2002). This kind of discrepancy will cause misranking of subjects within a nutrient intake distribution.

Clearly, then, zero must not be used for missing values in computations. If the database compilers have not supplied guesstimates, a practical alternative is for the user to assign estimated values to fill these gaps, or to substitute averages derived from known values for foods of the same type. Estimates prepared by careful interpretation of data on related foods are acceptable in nutritional studies, provided that their use is clearly noted. If computations of intake have to be made using zeros for missing values, the summation should be marked with a “not less than” sign (≥), and the program must be written accordingly.
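The two practices recommended above (substituting an average of known values for foods of the same type, and flagging any total that still contains zeros as a minimum) can be sketched in Python as below. The record layout, food codes and fibre values are hypothetical. The sketch modifies the food records in place and returns the codes of the estimated values so that their use can be documented.

```python
from statistics import mean


def fill_missing_by_type(foods, nutrient):
    """Replace missing (None) values with the mean of known values for the same food type.

    Modifies the records in place; returns the codes of foods whose values were estimated.
    """
    known = {}
    for food in foods:
        if food[nutrient] is not None:
            known.setdefault(food["type"], []).append(food[nutrient])
    estimated = []
    for food in foods:
        if food[nutrient] is None and food["type"] in known:
            food[nutrient] = mean(known[food["type"]])
            estimated.append(food["code"])
    return estimated


def total_intake(portions, nutrient):
    """portions: list of (grams eaten, food record). Returns (total, is_lower_bound)."""
    total, lower_bound = 0.0, False
    for grams, food in portions:
        value = food[nutrient]
        if value is None:
            lower_bound = True  # counted as zero, so the total is only a minimum
            continue
        total += grams * value / 100
    return total, lower_bound


def report(total, lower_bound):
    """Prefix the total with the 'not less than' sign when values were missing."""
    return f"{'≥ ' if lower_bound else ''}{total:.1f}"


foods = [
    {"code": "bread_white", "type": "bread", "fibre_g": 2.0},
    {"code": "bread_wholemeal", "type": "bread", "fibre_g": 6.0},
    {"code": "bread_rye", "type": "bread", "fibre_g": None},
    {"code": "cake_x", "type": "cake", "fibre_g": None},  # no related value available
]
print(fill_missing_by_type(foods, "fibre_g"))  # ['bread_rye']
print(report(*total_intake([(100, foods[2]), (50, foods[3])], "fibre_g")))  # ≥ 4.0
```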

Slimani, Riboli and Greenfield (1995) have pointed out the need for tailored databases for nutritional epidemiology studies; examples of developing such a database include those of Hankin et al. (1995) for the Pacific Islands (using borrowed, calculated and commissioned analytical data), Salvini et al. (1996) for an Italian study, and Schakel (2001). A useful paper by Buzzard, Schakel and Ditter-Johnson (1995) describes procedures for quality control in database maintenance and use.

Inappropriate database or food composition values

An inappropriate database may be used as a result of lack of insight or lack of a purpose-designed database. The United States and United Kingdom food composition tables are probably the most common “default” databases used around the world, because of their ready availability in computerized form and their comprehensive coverage of foods and nutrients.

An opportunity to test the databases arose in Australia, where the first all-Australian database of original analytical data for Australian foods analysed in Australian laboratories was produced in the mid-1980s; before that time, United Kingdom or United States data had been used. When the 1990–91 food supply data were evaluated using the new Australian tables (Department of Community Services and Health, 1989–91) and the results compared with those obtained using the United Kingdom and United States tables, the latter were found to overestimate fat from meats by 60 percent and total fat by 15–22 percent. They also overestimated the iron, zinc, retinol activity, vitamin C and magnesium in the Australian food supply, while calcium was 35 percent higher using United Kingdom data and thiamin 59 percent higher using United States data (Cashel and Greenfield, 1995). The disparity arose from differences in the gross composition of foods as well as in their nutrient composition.

Another problem is the use of out-of-date food composition data. An interesting study by Hulshof et al. (1996) investigated the reasons for the dietary change observed between the first Dutch National Food Consumption Survey (DNFCS), carried out in 1987–88, and the second, carried out in 1992. The apparent decrease of 13 g per person per day in fat intake over the period was reduced to 11 g when artefactual changes in the food composition database were identified. Of this remaining reduction, about half was due to true changes in food choices and the other half to true changes in the composition of foods. All food composition databases tend to be “out-of-date” because of the inevitable delays between collecting foods for analysis and entering validated nutrient data into the database management system; this study highlighted the need for careful preparation and updating of a database before it is used as a national reference for dietary studies. It also illustrated the usefulness of having a data audit trail: a system that records changes in the data and the reasons for them.
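A data audit trail of the kind described above need not be elaborate. The Python sketch below, with hypothetical class names, food code and values, records every change with its reason and date, and can reconstruct the value that was current on a given date. That allows intakes from two surveys to be recalculated with a single version of the database, separating artefactual changes from real ones. It assumes that changes are recorded in chronological order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Change:
    food_code: str
    nutrient: str
    old_value: Optional[float]
    new_value: float
    reason: str
    changed_on: date


class AuditedDatabase:
    """Nutrient values plus an append-only log of every change and its reason."""

    def __init__(self):
        self._values = {}
        self.log = []

    def set_value(self, food_code, nutrient, value, reason, changed_on):
        old = self._values.get((food_code, nutrient))
        self._values[(food_code, nutrient)] = value
        self.log.append(Change(food_code, nutrient, old, value, reason, changed_on))

    def current(self, food_code, nutrient):
        return self._values.get((food_code, nutrient))

    def value_as_of(self, food_code, nutrient, when):
        """Replay the log (assumed to be in date order) up to and including a given date."""
        value = None
        for change in self.log:
            if change.changed_on > when:
                break
            if (change.food_code, change.nutrient) == (food_code, nutrient):
                value = change.new_value
        return value


db = AuditedDatabase()
db.set_value("sausage_pork", "fat_g", 30.0, "original analysis", date(1987, 6, 1))
db.set_value("sausage_pork", "fat_g", 24.0, "product reformulated; reanalysed", date(1991, 3, 1))
print(db.value_as_of("sausage_pork", "fat_g", date(1988, 1, 1)))  # 30.0
print(db.current("sausage_pork", "fat_g"))                        # 24.0
```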

Errors arising in database use

Studies reported by Danford (1981) and Hoover (1983a) found considerable differences between results for a single day's nutrient consumption when processed by several different food composition databases, even though all the databases were founded on the USDA handbook of food composition values. These problems have recurred in more recent studies, the situation being complicated by the proliferation of calculation software packages in the United States, each with different modifications of the nutrient database (Lee, Nieman and Rainwater, 1995; McCullough et al., 1999). Thus, software differences have to be added to the list originally identified by Hoover (1983a) as sources of error in database use: differences in conversion of household measures to standard weights, miscoding of food items and problems in identifying the food items exactly. Similar studies in France (Herbeth et al., 1991) identified differences in databases available in the country as the main source of error.

Hoover and Perloff (1983, 1984) have developed a series of procedures for testing the accuracy of use of a food composition database: procedures for updating the database, for calculating nutrients for a simple recipe, for reporting baseline data, for reporting nutrients for various portion sizes and for executing the computation of a dietary intake record. This quality control tool can be adapted for different kinds of nutrient database. It is also a useful model for a teaching tool.

Use of these standardized procedures revealed that inclusion of abundant descriptive detail of the foods reduced mismatching of foods with database food items (Hoover and Perloff, 1983). This indication that confusion of food nomenclature is a major source of error in database use highlights the need for improved methods for food nomenclature.

Errors arising in the use of composition data include the following:

failure to record sufficient details regarding the food (e.g. cooking or processing method);
failure to note whether the total food or edible portion only was weighed;
use of nutrient data for raw instead of cooked foods;
errors in calculating fatty acid intakes arising from the use of fatty acids per 100 g of total fatty acids instead of per 100 g food or the use of an incorrect conversion factor;
failure to adjust for water, vitamin and mineral losses when calculating nutrient intake from a recipe;
failure to note the identity of fats and oils used in recipe foods or foods cooked in fat;
failure to include provitamin A compounds when calculating vitamin A intakes;
failure to recognize differences in values arising from nutrient definitions, e.g. available as opposed to total carbohydrate;
errors in matching nutritionally different foods when substituting for missing foods in the tables/database;
mistakes in conversions (volume to weight, portion description to weight).
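Two of the errors listed above are arithmetic and can be guarded against in software. The following Python sketch converts a fatty acid value expressed per 100 g total fatty acids to a value per 100 g food, using the food's fat content and a conversion factor (the proportion of total fat that is fatty acids, which varies between foods). It also calculates retinol equivalents so that provitamin A carotenoids are not omitted, using the convention of 6 µg β-carotene, or 12 µg of other provitamin A carotenoids, per µg retinol. Other conventions (e.g. retinol activity equivalents) use different divisors, and the conversion factor and food values in the example are hypothetical.

```python
def fatty_acid_per_100g_food(fat_g_per_100g_food, fa_g_per_100g_total_fa, conversion_factor):
    """Convert a fatty acid value from g/100 g total fatty acids to g/100 g food.

    conversion_factor: g total fatty acids per g total fat. It is below 1 because
    fat also includes non-fatty-acid material such as glycerol (and, in some foods,
    phospholipid and sterols); the appropriate factor depends on the food.
    """
    return fat_g_per_100g_food * conversion_factor * fa_g_per_100g_total_fa / 100


def retinol_equivalents_ug(retinol_ug, beta_carotene_ug, other_provitamin_a_ug=0.0):
    """Vitamin A as retinol equivalents: retinol + beta-carotene/6 + other carotenoids/12."""
    return retinol_ug + beta_carotene_ug / 6 + other_provitamin_a_ug / 12


# Hypothetical food: 10 g fat/100 g, linoleic acid 50 g/100 g fatty acids, factor 0.9.
print(round(fatty_acid_per_100g_food(10.0, 50.0, 0.9), 2))  # 4.5, not 50
# Hypothetical vitamin A components (micrograms per 100 g).
print(retinol_equivalents_ug(100.0, 600.0, 120.0))  # 210.0
```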

Incompatibility of databases

Epidemiologists are often concerned with comparisons of diet among countries, or among populations. The incompatibility of databases often limits conclusions that can be drawn from such comparisons. Deharveng et al. (1999) compared the food composition tables of the nine European countries participating in EPIC in terms of availability, definition, analytical methods and mode of expression of the nutrients of interest for this epidemiological study. Although most nutrients in the tables had been analysed and expressed in a compatible way, some nutrients were not comparable (e.g. folate, dietary fibre, carbohydrates, carotenes). Other problems identified included out-of-date methods of analysis and the inclusion of data for foods collected over 20 years earlier. The authors concluded that purpose-built food composition tables were needed to analyse the large amount of dietary data being reported in EPIC.

Differences in software packages

Nowadays, most users outside the major research centres that can afford to develop their own calculation programs will use the nutrient database integrated into the software package they purchase. This highlights the need to identify both the package and the database separately in publications. Software producers often incorporate additional foods or components into databases or may select certain nutrient data (e.g. niacin only, instead of niacin equivalents, when calculating dietary niacin intake). This means that users must be trained to evaluate software packages before purchase, especially when purchasing a package for use by a large number of users (e.g. throughout a health care system, such as a group of hospitals, or for allied health use throughout an entire province or state).
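The consequence of reporting niacin rather than niacin equivalents can be seen from the usual definition, in which 60 mg tryptophan is taken as equivalent to 1 mg niacin. The sketch below uses hypothetical daily intakes; in this example a package reporting preformed niacin alone would show only half of the niacin equivalents.

```python
def niacin_equivalents_mg(preformed_niacin_mg, tryptophan_mg):
    """Niacin equivalents, taking 60 mg tryptophan as equivalent to 1 mg niacin."""
    return preformed_niacin_mg + tryptophan_mg / 60


# Hypothetical daily intakes for one subject.
niacin_mg, tryptophan_mg = 15.0, 900.0
print(niacin_mg)                                        # 15.0  (niacin only)
print(niacin_equivalents_mg(niacin_mg, tryptophan_mg))  # 30.0  (niacin equivalents)
```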

The range of functions currently needed in dietary analysis tools is huge and is discussed in detail by Weiss (2001) and Stumbo (2001). They include: entering client records; facilities for updating the food composition databases; searching and displaying foods for nutrient composition by 100 g and by common serving size; ranking foods in terms of provision of nutrients; calculating the nutrient content of recipes, meals, diets, food intakes (from dietary records or food frequency questionnaires) and menus; multiplying or dividing food and nutrient intakes by factors such as days, meals or other variables of interest; comparing nutrient intakes with dietary recommendations; performing computations such as averaging, or dividing group intake data for foods and nutrients into deciles; printing or displaying results as tables, lists or graphs; storing calculated records or exporting them for further statistical analysis; calculating and printing product labels for nutrients, ingredients and comparisons with dietary references; costing products, meals and diets; printing labels for meals and clients; developing research, therapeutic or hospital diets, menus and food purchase lists according to different costs; adjusting menus to meet nutritional goals.
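As an example of one of these functions, dividing subjects into deciles of intake can be sketched with the Python standard library. The function name and the intake values are hypothetical; `statistics.quantiles` with its default ("exclusive") method is only one of several reasonable ways of defining cut points, and different packages may assign subjects near a boundary differently.

```python
from bisect import bisect_right
from statistics import quantiles


def assign_deciles(intakes):
    """Return the decile group (1 = lowest, 10 = highest) of each subject's intake."""
    cut_points = quantiles(intakes, n=10)  # nine cut points, default 'exclusive' method
    return [bisect_right(cut_points, x) + 1 for x in intakes]


# Hypothetical daily intakes for 100 subjects.
intakes = list(range(1, 101))
groups = assign_deciles(intakes)
print(groups[:12])  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```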

Limitations of methods for measuring food intake

The most accurate way to assess the nutrient intake of a person is to analyse an exact duplicate of the foods eaten over the survey period. This approach is seldom used because of obvious practical problems, in addition to the costs and the time involved in the analyses. Estimation of nutrient intakes by the application of food consumption data to food composition data is the method of choice. Indeed, computations of this sort probably constitute the major use of food composition databases at present.
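The basic computation multiplies the weight of each food eaten by its nutrient content per 100 g and sums the results. The sketch below also applies an edible-portion factor when a food was weighed as purchased, guarding against one of the common errors noted above. The record layout, food codes, edible fractions and nutrient values are hypothetical.

```python
# Hypothetical database: edible fraction of the food as purchased, and nutrient
# content per 100 g edible portion. Values are placeholders, not real data.
DATABASE = {
    "banana_raw": {"edible_fraction": 0.66, "nutrient_per_100g_edible": 10.0},
    "rice_boiled": {"edible_fraction": 1.0, "nutrient_per_100g_edible": 2.0},
}


def nutrient_intake(records, database):
    """Sum one nutrient over a food record.

    records: list of (food_code, weight_g, weighed_as), where weighed_as is
    "as_purchased" (including inedible parts such as skin) or "edible".
    """
    total = 0.0
    for code, weight_g, weighed_as in records:
        food = database[code]
        if weighed_as == "as_purchased":
            edible_g = weight_g * food["edible_fraction"]
        elif weighed_as == "edible":
            edible_g = weight_g
        else:
            raise ValueError(f"unknown weighing basis: {weighed_as!r}")
        total += edible_g * food["nutrient_per_100g_edible"] / 100
    return total


day = [("banana_raw", 150, "as_purchased"), ("rice_boiled", 200, "edible")]
print(round(nutrient_intake(day, DATABASE), 2))  # 13.9
```

Requiring the weighing basis to be stated explicitly, and rejecting anything else, makes the omission visible at data entry rather than leaving it as a silent overestimate.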

All ways of estimating the amounts of foods consumed are associated with some degree of error. A full discussion of this topic is beyond the scope of this book, but readers are referred to several publications (Bingham, 1987, 1991; Gibson, 1990; Margetts and Nelson, 1997; Willett, 1998). A prominent problem with all dietary methods is the high prevalence of underreporting, estimated by Macdiarmid and Blundell (1998) to range up to 70 percent in certain groups.

Clearly, errors in the measurement of food intake add to errors arising from differences between the composition of the food consumed and the values recorded in the database. At the same time, the accuracy of nutrient intakes calculated from food composition data cannot be improved by attention to the database alone. The quality of the results depends on the quality of the database, the accuracy with which foods can be identified, the quality of the food consumption data, and the accuracy with which the food composition database and the programs (or calculations) are used (Figure 11.1).

Figure 11.1: Factors influencing the accuracy of nutrient intake estimation

Food Composition Data

Evaluation of a database, tables or software

One task that invariably falls to the professional nutritionist, particularly the nutritionist involved in a research project, is the choice of a database. Because of the many commercial diet analysis programs that are now available for the calculation of nutrient intakes, nutritionists require training in the evaluation and selection of databases; indeed, such training should form part of any professional or degree course in nutrition. In general, the options available for the nutritionist are (adapted from the suggestions of Perloff [1983]):

to computerize a set of tables or to make up a computerized database from several sets of tables that are available (in this case, criteria for the selection of values must be provided, and programs for calculating nutrient intakes will have to be written);
to link up to an existing computerized database via a modem;
to purchase a computerized database on disk, CD or online and prepare computer programs to calculate nutrient intakes from the base plus consumption data;
to purchase a database plus programs;
to contract to provide consumption data to a database user who will compute nutrient intake data for a fee.

In considering these options, the primary concerns of the user should be to choose a database that is appropriate, that contains reliable data for foods closely matching those consumed, and that has accurate programs.

The suitability of the database can be determined by putting it through standardized tasks based on the functions discussed above (Hoover and Perloff, 1983, 1984). Other considerations will include the cost, speed, ease and convenience of use, the degree of training required of the operator, and hardware requirements.