Section III

Quality Control of Food Composition Data and Databases

This session was chaired by Dr Dorothy Mackerras of the Department of Public Health, Sydney University. The keynote address, Food Classification and Terminology Systems, was given by J.A.T. Pennington. It was followed by four papers: Nutritional Metrology: the Role of Reference Materials in Improving Quality of Analytical Measurement and Data on Food Components by J.T. Tanner, W.R. Wolf and W. Horwitz; Strategies for Sampling: the Assurance of Representative Values by J. Holden and C.S. Davis; Assuring Regional Data Quality in the Food Composition Program in China by G. Wang and X. Li; and Quality Control for Food Composition Data in Journals — a Primer, jointly presented by K.K. Stewart and M.R. Stewart. These papers are published on the following pages.

The paper by B. Perloff and S. Gebhardt, Building Data Quality in the Data Base Management Process, is not included. The authors can be contacted at the US Department of Agriculture, 4700 River Road, Riverdale, MD 20737, USA.

Posters presented after Session III were:

Criteria Used for Analytical Data Evaluation, Buick, D., Mottershead, R., & Scheelings, P., Australian Government Analytical Laboratory, Seaton, SA, Australia.
Evaluation of Foods as Analytical Control Samples, Buick, D., Pant, I., Trenerry, C., & Scheelings, P., Australian Government Analytical Laboratory, Seaton, SA, Australia.
Development of an In-house Nutrition and Food Science Bibliographic Database Using Micro CDS/ISIS, Chia, W.Y., & Greenfield, H., Department of Food Science and Technology, University of New South Wales, Sydney NSW, Australia.
APINMAP - an Integrated Database of Medicinal and Aromatic Plants, Henninger, M., School of Information, Library and Archive Studies, University of New South Wales, Sydney, NSW, Australia.
Food Analysis Reference Materials for the Asia-Pacific, James, K.W., DSTO, Materials Research Laboratory, Scottsdale, TAS, Australia.
International Survey on Dietary Fiber Definition, Analysis and Reference Materials, Lee, S.C., & Prosky, L., Kellogg Company, Battle Creek, MI 49016, and US FDA, Washington, DC 20204, USA.
Desktop Publishing of Food Tables, Mikkelsen, B.E., Danish Catering Centre, Institute of Food Chemistry and Nutrition, National Food Agency, Søborg, Denmark.
Information Sources in Nutrition and Food Science and Technology, Mobbs, S.L., & Siu, C.S., Biomedical Library, University of New South Wales, Sydney NSW, Australia.
Interface Standard for Food Databases, Pennington, J.A.T., Hendricks, T.C., Douglass, J., Peterson, B., & Kidwell, J., Center for Food Safety and Applied Nutrition, US FDA, Washington, DC 20204, USA.
Development of ASEANFOODS Reference Materials, Pustawien, P., & Sungpuag, P., Institute of Nutrition, Mahidol University, PO Box 31, Talingchan, Bangkok 10170, Thailand.

Food Classification and Terminology Systems

Jean A.T. Pennington

Food and Drug Administration, 200 C Street, S.W., Washington, DC 20204, USA

Food classification systems organize foods in databases among groups and subgroups based on food type (e.g., grain products, fruits) and/or food use (e.g., beverages, main dishes). The food groups and subgroups vary among databases according to the number and types of foods in the database, the cultural uses of the foods, and specific decisions made by the database compiler. Terminology systems are structured methods of applying descriptive terms (e.g., terms relating to packaging, processing, color, maturity) to foods. Faceted terminology systems assign descriptive terms for specific characteristics of foods, allowing these characteristics to be considered independently. Eurocode is a food classification, coding, and terminology system. Langual/Interface Standard is a faceted food description system with standardized vocabulary, and the INFOODS system is a free-text faceted food description system.

The words used to classify, name, and describe foods are a mixture of traditional, fanciful, technical, and sensory terms. Sometimes these terms convey a clear picture of what a food is, especially if one is already familiar with it. For a food that is not familiar, the mental image conveyed by the terms is important to understanding the food and the data associated with it. Food names and the terms associated with them are key to the use of information in food-related databases. There should be sufficient descriptive information about the food to clearly understand what the data represent.

Table I. Food classification systems (number of groups in each database)

References		(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)
Food Type Classifications
Milk and eggs							1
Milk and milk products		1	1	1	1	1		1	1	1
Eggs		1	1	1	1	1		1	1	1
Meat, poultry, fish								1	1
Meat and poultry		1	1	1	1	1				1
	Meat						3
	Poultry						1
	Luncheon meat & sausages						1
Fish and shellfish		1	1	1	1	1	1			1
Fats and oils		1	1	1	1	1	1	1		1
Grain products		1	1	1	1	1	3	1	1	1
Fruits and vegetables									1
Fruits		1		1	1	1	1	1		1
Fruits and nuts			1
Fruit juices/nectars			1
Legumes, nuts, seeds		1						1
Nuts and seeds					1	1	1		1	1
Vegetables		1	1	1	1	1	1	1		1
	Legumes						1			1
	Potatoes and roots			1						1
Food Use Classifications
Beverages		1	3		1	1	1	1	1	1
Alcoholic beverages			1			1
Sugars/syrups/sweets		1	2		1	1		1	1	1
Special nutritional use		1
Herbs/spices/flavourings			1			1	1		1
Snacks							1		1
Soups/sauces/gravies/dressing						1	1		1
Fast foods							1
Baby food							1
Prepared products									1
Miscellaneous/other		1	3	1	1				7	1
Number of major groups		13	19	10	12	14	21	10	29	14

References
(1) Eurocode 2
(2) Germany
(3) Sweden
(4) Australia
(5) Britain
(6) USDA Agriculture Handbooks
(7) USDA Nationwide Food Consumption Survey
(8) Langual
(9) Near East

• Classification Systems

Classification systems refer to the groupings and subgroupings of foods in databases, based on food type (e.g., vegetables, dairy products) and/or food use (e.g., beverages, fast foods, snacks) (Table I). Most food composition databases, except those arranged alphabetically, are organized by such groupings. These groupings assist database users in locating foods and comparing the nutrient content of similar products. They also reduce the repetition of group and subgroup headings. The number of major food groups found in nine selected databases (1–9) ranges from 10 to 29 (Table I).

Foods in the major groups are usually subgrouped by more precise food names or by descriptive terms, creating hierarchies within each major group. For example, meats may be subgrouped by beef, lamb, and pork, and desserts may be subgrouped by cakes, cookies, and pies. Beef may be further subgrouped by specific cuts (brisket, loin, steak), grade (choice, good, prime), and/or fat trim (0", ¼", ½"). Cookies may be further subgrouped by flavor ingredients (chocolate chip, oatmeal, peanut butter, sugar) or source (commercial, homemade).

As Table I shows, food type classifications vary somewhat among countries. For example, some databases group all vegetables together. Others have separate groupings for legumes and root vegetables; some group legumes and nuts together. However, there is probably better international agreement for food type than for food use groupings because the use of foods in daily diets varies among ethnic and cultural groups. Food use categories are particularly useful to group together products with common dietary use that could be “lost” among food type classifications. For example, under “snack foods” in the USDA Agriculture Handbook No. 8–19 (6), one finds vegetable-based products (potato chips/crisps), grain-based products (corn chips, tortilla chips, popcorn), and fruit or nut-based products (trail mix, banana chips).

Food Grouping Problems

For databases with both food type and food use classifications, there may be some difficulty in placing foods that fit under two or more groups. For example, “French fries” (an American food equivalent to British “chips”) could be classified under “vegetables” or “fast foods”; “cookies” (American food equivalent to British “biscuits”) could be classified under “grain products” or “desserts”; “bouillon” (American term equivalent to British “beef tea”) could be classified under “soups” or “beverages.” Such products may be forced into one group or listed in several. The latter solution would result in foods with the same name being in different groups, e.g., some French fries under the fast food group and some under the vegetable group. This makes it difficult for users to locate similar foods and compare their nutrient content.
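One way to picture the alternative to forcing a food into a single group is to keep one record per food and attach every applicable group to it, then derive the group listings from those records. The following is a minimal sketch under that assumption; the dictionary contents and the function name `build_group_index` are illustrative only and are not taken from any of the databases discussed here.

```python
from collections import defaultdict

# Hypothetical records: each food appears once, with all groups that apply.
FOOD_GROUPS = {
    "French fries": ["vegetables", "fast foods"],
    "cookies": ["grain products", "desserts"],
    "bouillon": ["soups", "beverages"],
}


def build_group_index(food_groups):
    """Invert the food -> groups mapping into a group -> foods index."""
    index = defaultdict(list)
    for food, groups in food_groups.items():
        for group in groups:
            index[group].append(food)
    return dict(index)


group_index = build_group_index(FOOD_GROUPS)
print(group_index["fast foods"])
print(group_index["vegetables"])
```

Because the food exists only once, a user who looks under either group reaches the same entry, which avoids splitting foods of the same name between groups.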

Within each major food group, similar decisions about how to place foods among subgroups must be made, especially if there are rigid hierarchies. Some foods clearly fit two or more subgroups. For example, “Irish coffee” (American name for coffee with whiskey) is clearly a “beverage,” but is both an “alcoholic beverage” and a “coffee beverage”. Other foods seem to be transitions between food groups or subgroups. For example, a broth with chunks of meat and vegetables may be a transition between a soup and a stew.

In hard-copy databases with space constraints, subgroups are usually formed by identifying common descriptive terms and using them as subgroup headings. This may lead to some inconsistencies in a database as to how subgroups are formed. For example, pancakes and waffles could be subgrouped under grain products first by type (frozen, frozen batter, home-made, liquid batter) and then by flavor (blueberry, cinnamon, plain, old fashioned, strawberry, whole grain) or the other way around. A fast-food fish sandwich might be subgrouped by entrée type “fish,” by entrée type “sandwiches,” or by restaurant name (e.g., “McDonald's”).

Decisions about groups and subgroupings are usually made by the database compiler after the data are collected and sorted. At that point, the number of repetitive terms can be determined and minimized. The provision of an index assists users in locating foods that might be placed in multiple groups or subgroups or that have inconsistent subgroup structure from group to group.

In computerized databases, one might view only one food name (and its descriptors) at a time without benefit of seeing the other foods and descriptors in the classification hierarchy. It is necessary to repeat descriptive terms in this case. For example, the terms “breakfast cereal,” “cookie,” and “frozen dinner” would need to be repeated with each listing for which they are appropriate.
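The repetition of terms described above can be sketched as flattening a hierarchy so that every entry carries its full chain of headings. The nested dictionary below is a hypothetical fragment built from the examples in this paper, and `flatten` is an illustrative helper, not part of any existing database software.

```python
# Hypothetical nested hierarchy, loosely following the examples in the text.
HIERARCHY = {
    "Meats": {
        "Beef": {"Brisket": {}, "Loin": {}},
        "Lamb": {},
    },
    "Desserts": {
        "Cookies": {"Chocolate chip": {}, "Oatmeal": {}},
    },
}


def flatten(tree, path=()):
    """Yield one self-describing name per leaf, repeating the parent terms."""
    for name, children in tree.items():
        full_path = path + (name,)
        if children:
            yield from flatten(children, full_path)
        else:
            yield ", ".join(full_path)


full_names = list(flatten(HIERARCHY))
for entry in full_names:
    print(entry)
```

Each printed entry can be read on its own, at the cost of repeating the group and subgroup terms that a printed table would show only once as headings.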

• Terminology Systems

Terminology systems refer to the systematic methods of applying descriptive terms to foods. These terms, which provide information about color, flavor, maturity, preparation, preservation, brand names, etc., are important because the nutrient contents of foods vary according to such terms. For example, the USDA database for the 1987–88 NFCS (7) lists 18 entries for string beans, each of which has different nutrient values based on descriptive terms for color, preservation, cooking, and/or added ingredients. Descriptive terms also provide insights about food safety (storage, preservation) and nutritional quality (fortification, processing).

The simplest type of terminology system is one which orders the descriptive terms (as appropriate) around the food name (linear descriptors). Descriptors for most food names could be ordered in several ways. Database compilers generally try to use consistent terms and ordering of linear descriptors to facilitate the use of the database.

Faceted terminology systems assign descriptive terms for each food for specific characteristics (facets). The terms are not necessarily a part of the food name, but are linked to the food name in a manual or computerized system. Faceted systems allow for different characteristics of food to be considered independently. To develop such a system, one must identify the facets, collect the descriptive terms belonging to each facet, and define the terms.
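The independence of facets can be illustrated with a small sketch in which each food carries a set of facet-to-term pairs and any combination of facets can be used for selection. The facet names (`color`, `preservation`), the sample foods, and the helper `select` are assumptions made for illustration; they do not reproduce the vocabulary of any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class FoodRecord:
    name: str
    # Facet name -> descriptive term; linked to the food, not part of its name.
    facets: dict = field(default_factory=dict)


FOODS = [
    FoodRecord("String beans, green, canned",
               {"color": "green", "preservation": "canned"}),
    FoodRecord("String beans, yellow, frozen",
               {"color": "yellow", "preservation": "frozen"}),
    FoodRecord("Peas, green, frozen",
               {"color": "green", "preservation": "frozen"}),
]


def select(foods, **criteria):
    """Return names of foods whose facets match every given criterion."""
    return [
        food.name
        for food in foods
        if all(food.facets.get(facet) == term for facet, term in criteria.items())
    ]


frozen = select(FOODS, preservation="frozen")
green_frozen = select(FOODS, color="green", preservation="frozen")
print(frozen)
print(green_frozen)
```

A query on one facet ignores all others, so foods from different groups can be retrieved together when they share a descriptive term.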

Faceted systems for foods are based largely on the faceted system developed in 1971 by the International Network of Feed Information Centers (INFIC) for international exchange and dissemination of information about feeds (10). Approximately 21,000 feeds have been described according to the facets: origin, part, process, growth stage, cut, and grade. The descriptive information (in English, French, and German) and numerical data associated with various feeds can be stored, summarized, retrieved, and printed in various formats.

Three unique terminology systems are briefly discussed: Eurocode 2 (1), a food classification/coding/terminology system; Langual/Interface Standard (8, 11, 12), a faceted description system with standardized vocabulary; and the INFOODS system (13), a free-text faceted description system.

Eurocode 2

Eurocode was originally developed in the early 1980s as a common European system for coding foods consumed by participants in dietary surveys (14, 15). In this case, “coding” refers to the assignment of alphanumeric codes to foods in databases. The codes link the food name to the data (e.g., composition or consumption data) associated with it and allow for computer manipulation of the data. The Eurocode 2 manual (1) provides rules for coding single foods, mixed foods, and foods as recipes. The codes (as described in the manual) may be applied to foods in manual or computerized databases.

Table II. The Eurocode system^a

Eurocode Fields with Examples
Field 1. Field 2. Field 3. Field 4
Main group. Subgroup. Food name. Recipe (optional)

Meat and meat products (3)
	Mutton (3.4)
		Mutton, carcass meat (3.4.1)
			Mutton recipe prepared in Ireland (3IE.4.1.2)^b

Grains and grain products (6)
	Wheat breads (6X.1)^c
		Rusks (6X.1.8)

Vegetables and products (8)
	Cabbages (8.2)
		Kohlrabi (8.2.6)

Eurocode Descriptors	Examples
T Thermal treatment at consumption	T7 (deep fried)
N Non-thermal treatment	N4 (mashed)
P Preservation method/packing medium	P19 (frozen)
A Component added	A10 (fiber added)
R Component removed	R4 (skin removed)

^a Information adapted from Poortvliet and Kohlmeier (1)

^b IE indicates a recipe prepared in Ireland. The “2” in the fourth field indicates a specific recipe for a dish based on mutton, e.g., Irish stew

^c The X in the first field indicates that this food has been coded as a mixed food

The food codes have four fields (Table II). The first field identifies one of 13 main food groups, the second field identifies the food subgroup, the third field identifies the food item, and the fourth field, which is optional, provides reference to a recipe. For example, in the code for rusks, 6X.1.8, 6 represents grain products and 6X.1 represents wheat breads. The “X” in the first field indicates that a food is coded as a mixed food (i.e., a multi-ingredient food). A two-character country code replaces the “X” to identify the country for a national recipe. For example, 3IE.4.1.2 is the code for a mutton recipe prepared in Ireland (3 is for meat and meat products, IE is for Ireland, 3.4 is for mutton, 3.4.1 is for mutton carcass meat, and the 2 at the end refers to the recipe).
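The field structure just described can be sketched as a small parser. This is an illustration based only on the examples given in this paper, not on the Eurocode 2 manual itself: the class `Eurocode`, the function `parse_eurocode`, and the decision to treat a country-coded entry as a mixed food are all assumptions.

```python
import re
from typing import NamedTuple, Optional


class Eurocode(NamedTuple):
    main_group: int
    mixed: bool                # True when the first field carries "X" or a country code
    country: Optional[str]     # two-letter code for a national recipe, else None
    subgroup: Optional[int]
    item: Optional[int]
    recipe: Optional[int]


# First field: main group number, optionally followed by "X" or two letters.
_FIRST_FIELD = re.compile(r"(\d{1,2})(X|[A-Z]{2})?")


def parse_eurocode(code: str) -> Eurocode:
    """Split a dotted code such as '6X.1.8' into its (up to four) fields."""
    fields = code.strip().split(".")
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"expected 1 to 4 fields, got {len(fields)}: {code!r}")
    match = _FIRST_FIELD.fullmatch(fields[0])
    if match is None:
        raise ValueError(f"unrecognized first field: {fields[0]!r}")
    marker = match.group(2)
    # Pad the remaining fields with None so subgroup, item, recipe always exist.
    rest = [int(f) for f in fields[1:]] + [None] * (4 - len(fields))
    return Eurocode(
        main_group=int(match.group(1)),
        mixed=marker is not None,
        country=marker if marker not in (None, "X") else None,
        subgroup=rest[0],
        item=rest[1],
        recipe=rest[2],
    )


print(parse_eurocode("6X.1.8"))
print(parse_eurocode("3IE.4.1.2"))
print(parse_eurocode("8.2.6"))
```

Keeping the country marker separate from the main group number means that 3.4.1 and 3IE.4.1.2 can still be recognized as belonging to the same group, subgroup, and item.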

Eurocode 2 provides an optional terminology system with descriptive terms for five facets: thermal treatment, nonthermal treatment, preservation and packing, components added, and components removed. Descriptors are identified with alphanumeric codes (e.g., T7 for the thermal treatment “deep fried”), and definitions are provided for consistent coding. The authors of the Eurocode 2 manual indicate that the descriptors are designed for dietary surveys and do not attempt to satisfy the degree of technical detail used in food technology (1).

Table III. Langual factors and examples of factor terms

Langual Factors			Examples of Factor Terms
1.		Product type	Breakfast cereal
2.		Food source (plant or animal)	Leafy vegetable
3.		Part of plant or animal	Organ meat
4.		Physical state, shape or form	Semisolid
5.		Extent of heat treatment	Partially heat-treated
6.		Cooking method	Cooked by dry heat
7.		Treatment applied	Hydrogenated
8.		Preservation method	Pasteurized by heat
9.		Packing medium	Packed in gelatin
10.		Container or wrapping	Paperboard container
11.		Food contact surface	Plastic
12.		Consumer group/dietary use	Human food, low calorie
13.		Geographic places and regions
	a.	Area of origin (grown/produced)	Zimbabwe
	b.	Area of processing	Italy
	c.	Area of consumption	Tennessee
14.		Cuisine	Chinese
15.		Adjunct characteristics of food (examples)
		Color of poultry meat	Dark meat
		Grade of meat, US	Choice grade
		Plant maturity	Ripe or mature
		Location of preparation	Restaurant/fast food prepared

Langual

Langual is a faceted food description language that has been under development by the US Food and Drug Administration (FDA) since the early 1970s (8). It is a software system that may be applied to food-related databases such as those of food composition and food consumption. Each food is assigned a set of descriptors, using standardized language, from the following facets: product type; food source; part of plant or animal; physical state, shape, or form; extent of heat treatment; cooking method; treatment applied; preservation method; packing medium; container or wrapping; food contact surface; consumer group/dietary use; geographical places and regions; cuisine; and adjunct characteristics (Table III). If the factor term for a food is not known or does not apply for a food, the terms “unknown” and “not applicable” may be used. For internal storage and processing, factor terms are assigned alphanumeric codes. Langual is currently used on a mainframe computer at FDA, but has been adapted for personal computers in other locations.

Foods in various databases may be searched or retrieved by one or more of the Langual descriptive terms. The more accurate the descriptions of the foods, the more informative are the searches and retrievals. To facilitate retrieval and aggregation, the descriptors within each facet are arrayed in a hierarchy from broader to narrower terms. The vocabulary includes definitions for the terms and explains when and in what contexts they should be used. The Langual thesaurus includes cross references for synonyms and Latin names, and for preferred, broader, narrower, and related terms.
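Retrieval through a broader-to-narrower hierarchy can be sketched as follows. The terms and their arrangement are a hypothetical fragment chosen for illustration and are not the actual Langual vocabulary or thesaurus structure; `is_a` and `retrieve` are illustrative helper names.

```python
# Hypothetical fragment of one facet: narrower term -> its broader term.
BROADER = {
    "leafy vegetable": "vegetable",
    "root vegetable": "vegetable",
    "vegetable": "plant used as food",
    "fruit": "plant used as food",
}

# Hypothetical foods, each indexed with one term from that facet.
FOODS = {
    "spinach": "leafy vegetable",
    "carrot": "root vegetable",
    "apple": "fruit",
}


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = BROADER.get(term)
    return False


def retrieve(ancestor):
    """Names of all foods indexed at or below the given term."""
    return sorted(food for food, term in FOODS.items() if is_a(term, ancestor))


print(retrieve("vegetable"))
print(retrieve("plant used as food"))
```

Searching on a broad term aggregates every food coded with a narrower term beneath it, so foods described at different levels of detail can still be retrieved together.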

A European Langual Working Group was established in the early 1990s to be the focal point for Langual use in Europe and to communicate needs to the US Langual Committee. In May 1992, Langual was evaluated for use in European databases. Several European dietitians/nutritionists were trained in Langual and were asked to code a number of foods to determine the applicability of Langual to European foods. The results of this test indicated that Langual is an appropriate terminology system for European foods (16).

The concept of an interface standard (a common communication link based on the food name and descriptors) to allow international exchange of food-related data arose at a meeting of the Committee on Data for Science and Technology (CODATA) in March 1990 in Maryland, USA. Criteria for an international interface standard were drafted at this meeting, and FDA used those draft ideas to formulate an interface using Langual (11). The interface was further refined under an FDA contractual effort (Figure 1) (12), and the computer software for the interface standard is expected to be completed in April 1995.

The aspects of the interface standard, which are linked to the food names, include food name synonyms, Langual factor terms, other food descriptors (agricultural and storage variables), other descriptive coding systems, ingredients and recipes, food standards, and reference files. The reference files allow for the identification of substances administered or applied during production and storage, the organization that produced or prepared the food, and the source of the data.

As much information as possible is provided about the food without making questionable assumptions. Only those descriptors that pertain to a food need to be used. Once the foods in a database are described according to the interface, databases may be queried and information may be retrieved. The system will also allow for matching (or finding the closest matches) of foods in different databases.
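Finding the closest match between foods in different databases could, for instance, be approached by comparing their sets of descriptors. The sketch below uses the Jaccard index (shared descriptors divided by all descriptors) as the similarity measure. This measure is an assumption chosen for illustration and is not stated to be the method used by the interface standard; the databases and descriptors are likewise hypothetical.

```python
def jaccard(a, b):
    """Share of descriptors common to both sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical entries from two databases, described with the same vocabulary.
DB_A = {"French fries, frozen": {"potato", "deep fried", "frozen"}}
DB_B = {
    "Chips, fast food": {"potato", "deep fried", "restaurant prepared"},
    "Potato, boiled": {"potato", "boiled"},
}


def closest_match(descriptors, candidates):
    """Name of the candidate whose descriptor set is most similar."""
    return max(candidates, key=lambda name: jaccard(descriptors, candidates[name]))


best = closest_match(DB_A["French fries, frozen"], DB_B)
print(best)
```

Such matching depends on both databases using the same standardized terms; with free-text descriptors the sets would rarely overlap exactly.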

INFOODS

The International Network for Food Data Systems (INFOODS) was organized to improve the quality and accessibility of food composition databases. It was funded by US government agencies from 1984 to 1987 with headquarters at the Massachusetts Institute of Technology. The Food Nomenclature and Terminology Committee (one of the three INFOODS committees) was charged with developing a proposal to standardize the nomenclature and description of foods to allow for useful exchange of food composition data among countries (13).

The Committee met at several international meetings and worked via mail to develop and refine a system for describing foods. The report that resulted from this work (13) provides for free text descriptors for specific characteristics of foods. The system, which was not specifically designed for computer implementation, includes six major facets (Table IV): source of food name and descriptive terms; name and identification of the food; description of “single” foods; description of “mixed” foods; customary uses of food (optional), and sampling and laboratory handling of food. The INFOODS system was not intended to supersede or replace systems currently in use, but to support and be compatible with them (13).

Figure 1. International interface standard for food databases

Table IV. Major facets of the INFOODS system for describing foods¹

A. Source of food name (5) and descriptive terms
B. Name and identification of the food
1.	Name in national language
2.	Local name
3.	Nearest equivalent name in English, French, or Spanish
4.	Country/area where obtained
5.	Food group and code in national database
6.	Food group and code in regional database
7.	Codex Alimentarius indexing group
C. Description of “single” foods
1.	(a) Food source
	(b) Scientific name (Latin)
	(c) Variety, breed, strain
2.	Part of plant or animal
3.	Country/area of origin
4.	Manufacturer's name and address (batch or lot number)
5.	Other ingredients
6.	Food processing and/or preparation
7.	Preservation method
8.	Degree of cooking
9.	Agricultural production conditions
10.	Maturity or ripeness
11.	Storage conditions
12.	Grade
13.	Container and food contact surface
14.	Physical state, shape, or form
15.	Color
16.	Other descriptors
17.	Availability and location of photograph/drawing of food
D. Description of “mixed” foods
1.	Ingredients and quantities
2.	Recipe procedure
3.	Place where prepared
4.	Availability and location of photograph/picture
5.	Manufacturer's name and address
6.	Container and food contact surface
7.	Preservation method
8.	Storage conditions
9.	Final preparation
E. Customary uses of food (optional)
1.	Typical portion weight and measure
2.	Availability (frequency and season of consumption)
3.	Role of food in the diet
4.	Food users
5.	Specific purposes of the food; special claims
F. Sampling and laboratory handling of food
1.	Date of collection
2.	Weight(s) of sample(s)
3.	Percentage edible portion; nature of edible portion
4.	Percentage of refuse; nature of refuse
5.	Place of collection
6.	Handling between supplier and laboratory
7.	Handling on arrival at laboratory
8.	Laboratory storage and subsequent handling
9.	Strategy for analyses
10.	Reasons for doing analyses

¹ Adapted from Truswell et al. (13).

• Importance of Terminology Systems

Terminology systems allow descriptive information about foods to be recorded in a consistent, standardized way that extends beyond the food name. Many food names are not sufficient by themselves to identify foods. Descriptors are especially useful for implicit food names; different foods that have the same name; foods that have different names; and vague, generic names. Terminology systems can address these problematic food names through descriptive terms relating to food source, food group, Latin name, language of food name, maturity, geographic region, cuisine, synonyms, preferred terms, and/or other facets.

Implicit Food Names

There are several types of implicit food names. Some convey no meaning without prior familiarity and do not translate meaningfully to other languages. Examples include bubble and squeak (British), kaerlinghedskranse (Danish “love rings”), hete bliksem (Dutch “hot lightning”), himmel und erde (German “heaven and earth”), scottadito (Italian “burning fingers”), himmelsk lapskaus (Norwegian “heavenly potpourri”), brazo de gitano (Spanish “gypsy's arm”), putt i panna (Swedish “tidbits in a pan”), and the American foods baked Alaska, red flannel hash, pigs-in-a-blanket, and succotash. Most of the commercial names for alcoholic mixed drinks (Bloody Mary, Rusty Nail, Screwdriver), ready-to-eat breakfast cereals (Frankenberries, Froot Loops, Pebbles), and candies (Baby Ruth, M&Ms, Now'n' Later, Payday) are fanciful, implicit names.

Some food names are implicit misnomers, i.e., the literal translation may lead to the wrong food. If one is not familiar with these food names, the wrong conclusions may be drawn. Examples of American food names that are implicit misnomers are corn dogs, grasshopper, hush puppies, rocky mountain oysters, and sweetbreads. Examples of implicit misnomers from the UK are Scotch woodcock, spotted dog, toad-in-the-hole, and Yorkshire pudding.

Some implicit geographic food names imply an area of origin (Brussels sprouts, Danish (pastry), English muffins, Lima beans, and London broil), but have little to do with the identified areas.

Same Name, Different Foods

Some foods share the same (or nearly the same) name, but are different foods. “Tuna” in American English is a fish; in Mexican Spanish, the term refers to a prickly pear. “Rape” is a plant oil used in Mid-Eastern cookery, a Spanish fish, or a French cheese. In England and France, “flan” is an open fruit tart in sponge cake or pastry crust; in Mexico or Spain, it is a baked caramel cream custard. A terminology system which defines the language of the food name and the cuisine is useful for distinguishing the correct usage of a food name.

There are many examples of the “same name, different food” problem among American and British names for foods. “Half-and-half” in the UK is a beverage of half porter and half pale ale; in the US, the food name refers to a mixture of cream and milk. “Mince” could be chopped ground beef or chopped fruit in the UK, but is chopped, dried fruit (mainly raisins) in the US. “Silverside” is a beef cut in the UK and a fish in the US. A cordial is a soft drink in the UK, but is a concentrated alcoholic beverage in the US. A terminology system which identifies the language of the food name and specifically distinguishes between different forms of the same language (i.e., English in the UK, the US, Canada, and Australia) would assist the data user.
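A terminology system that records the language variant alongside the food name can be sketched as a lookup keyed on both. The meanings are taken from the examples above; the use of language tags such as `en-GB` and `es-MX`, and the helper names `resolve` and `senses`, are assumptions made for illustration.

```python
# (food name, language variant) -> meaning, from the examples in the text.
HOMONYMS = {
    ("tuna", "en-US"): "fish",
    ("tuna", "es-MX"): "prickly pear",
    ("silverside", "en-GB"): "beef cut",
    ("silverside", "en-US"): "fish",
    ("cordial", "en-GB"): "soft drink",
    ("cordial", "en-US"): "concentrated alcoholic beverage",
}


def resolve(name, language):
    """Meaning of a food name in one language variant, or None if unknown."""
    return HOMONYMS.get((name.lower(), language))


def senses(name):
    """All recorded meanings of a food name, keyed by language variant."""
    key = name.lower()
    return {lang: meaning for (n, lang), meaning in HOMONYMS.items() if n == key}


print(resolve("Tuna", "es-MX"))
print(senses("cordial"))
```

Listing every recorded sense of a name also gives a data compiler a simple warning that the name alone is ambiguous.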

Common usage of food names (usually a tendency to shorten the name) may result in names that refer to several different foods. For example, the term “chili/chile” may refer to a chili pepper (vegetable or spice), to chili beans (beans with a chili pepper sauce), or to a mixed dish made with beef, beans, and a chili pepper sauce. The term “curry” may refer to the spice or to a rice dish made with the spice. The term “dressing” refers to salad dressing as well as to poultry stuffing (breading). A terminology system can help clarify these many uses of a food name through food groups, food source, and homonym definitions.

Some foods share the same name, but are prepared with different ingredients and are not really the same foods. For example, cocoa (hot chocolate) is usually made with milk, but some of the instant, dry cocoa products are reconstituted with water and contain little or no milk. Similarly, “lemonade” may be made from lemons or with artificial flavoring. The nutrient data associated with various cocoas and lemonades show clear differences in these products. Main dishes, soups, salads, and desserts may share the same food name (e.g., lasagne, gazpacho, carrot cake), but have different recipes. A terminology system should allow for information on ingredients and recipes (how the ingredients are put together) and information on place of procurement (e.g., restaurant, homemade, grocery store).

Some foods have the same commercial product name (Kellogg's Corn Flakes, McDonald's Big Mac), but are made from different ingredients in different countries. Different formulations may be due to different food standards, different nutrient fortification levels, the local availability of ingredients, or local taste preferences. A terminology system may help by providing information on ingredients and geographic descriptors.

Food standards (e.g., the definitions for what constitute milk, butter, margarine, beer, wine, ice cream), nutrient fortification levels, and nutrient claims (e.g., low fat) are established by government regulations and vary among countries. A terminology system should allow for descriptive information relating to food standards, nutrient fortification levels, and claims and identify the country associated with these legal terms.

Same Food, Different Names

The “same food, different name” problem can be handled by synonyms in a terminology system. In many cases, the preferred food name varies by geographical location or culture. There are different names for the same food within a country, e.g., ocean perch is known regionally in the US as rosefish, redfish, snapper, sea perch, and redbeam (17). There are different names for the same food in the same language among countries, e.g., American molasses and British treacle; American oatmeal and British porridge; American raisin bread and British currant loaf; American gelatin dessert and British jelly, and American jelly and British jam.
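Synonym handling can be sketched as a mapping from each alternative name to a single preferred term. The names are those given above; treating the American name as the preferred term is an arbitrary choice made for this illustration, and `preferred_name` is a hypothetical helper.

```python
# Preferred term -> synonyms, using the examples in the text.
SYNONYMS = {
    "ocean perch": ["rosefish", "redfish", "snapper", "sea perch", "redbeam"],
    "molasses": ["treacle"],
    "oatmeal": ["porridge"],
}

# Inverted lookup: every synonym points back to its preferred term.
TO_PREFERRED = {
    alias: preferred
    for preferred, aliases in SYNONYMS.items()
    for alias in aliases
}


def preferred_name(name):
    """Map a food name to its preferred term; unknown names pass through."""
    key = name.strip().lower()
    return TO_PREFERRED.get(key, key)


print(preferred_name("Treacle"))
print(preferred_name("redfish"))
```

A database in another country could invert the choice of preferred term without changing the structure.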

Vague, Generic Names

Food descriptions in databases are often lacking for basic, traditional foods such as fruits, vegetables, animal flesh, and grain products (breads, etc.). For example “oranges” and “white bread” could be described by year of production and/or market share of cultivars and brands, respectively. Such descriptors are especially important when database compilers are aggregating data from various sources and filling in missing values by matching food names. A terminology system could allow for these types of descriptions through agricultural variables and information on sampling designs.

Database users need to know if “generic” foods are market basket samples and/or mixtures of cultivars or maturity levels. If generic foods are not adequately described, inappropriate or misleading conclusions may be drawn about the data associated with them. For example, the vitamin A content of ½ grapefruit (120 g) is 318 IU for pink and red and 12 IU for white (6). The weighted vitamin A value of 149 IU for the US market share product (6) does not reflect either the pink or white product.
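The grapefruit figures can be used to show what a market-weighted value hides. Assuming, for illustration only, that the weighted value is a simple two-component average of the pink/red and white values, the implied share of pink and red fruit can be recovered from the three numbers quoted above. The function names are hypothetical.

```python
# Vitamin A values per 1/2 grapefruit (120 g), as quoted in the text.
PINK_RED_IU = 318.0
WHITE_IU = 12.0
MARKET_WEIGHTED_IU = 149.0


def implied_share(high, low, weighted):
    """Share of the high-value product implied by a two-component weighted mean."""
    return (weighted - low) / (high - low)


def weighted_mean(high, low, share_high):
    """Two-component weighted mean for a given share of the high-value product."""
    return share_high * high + (1.0 - share_high) * low


pink_red_share = implied_share(PINK_RED_IU, WHITE_IU, MARKET_WEIGHTED_IU)
print(f"Implied pink/red share: {pink_red_share:.1%}")
print(f"Recomputed weighted value: {weighted_mean(PINK_RED_IU, WHITE_IU, pink_red_share):.0f} IU")
```

Under this assumption the weighted value corresponds to a market in which a little under half of the grapefruit is pink or red, a fact that a user cannot infer from the name “grapefruit” alone.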

• Current Status and Future Goals

Food classification systems are developed by database compilers according to the number and types of foods in the database, cultural uses of foods, and/or intended users of the database. Thus, each food composition database tends to have its own system. The importance of food group classifications in databases depends on the types of terminology systems that are present, i.e., what other sorting or retrieval mechanisms are available. If there are no other mechanisms to describe or locate foods in a database, then classifications are very important, and must be carefully structured to place foods logically and consistently in the hierarchy. If there are other means by which to describe (and hence retrieve) foods, then the classification system is of less importance.

Because food classification systems are culture dependent, they are probably best designed to assist immediate (local) users of the database. A universal classification system is not necessary for the exchange and sharing of information in food-related databases.

Faceted terminology systems, especially those with standardized vocabulary, have specific advantages for use with food composition databases. These advantages include consistency in the use of defined terms; access to a hierarchy of terms with information on narrower, broader, and preferred terms and synonyms; retrieval of food names based on descriptive terms across food groups; and ability to match foods in various databases based on identical or similar descriptive terms.

Terminology systems must keep up with foods available in the marketplace which are changing to meet consumer preferences for convenience, appreciation of ethnic foods, and increased interest and knowledge of nutrition. Several types of foods in the marketplace are presenting challenges for food description systems. They include products from newer or changing plant cultivars and animal breeds; foods previously used only by select population groups that are becoming available in different geographic areas (e.g., ugli fruit, jicama); synthetic foods made of mixtures of refined ingredients (formula-type meal replacements, medical foods); meat analogues; traditional foods made with fat and sugar substitutes; and traditional foods that have been reformulated to meet special dietary claims.

The type and level of descriptive information needed about foods vary among database users (i.e., researchers, epidemiologists, government agencies, educators). However, it is possible that a terminology system can serve multiple needs. It is important to note that current systems are not incompatible and that much knowledge and experience have been gained by the development of several different systems.

Foods in databases must be clearly and accurately described so that we can better use the data associated with them. Descriptive information associated with foods prior to laboratory analysis (e.g., information about sampling, preparation, and cooking methods and information from labels), should be recorded and carried with the food composition data to the database. Countries need to work together toward flexible and compatible food description systems for databases to increase the capability to capture, exchange, share, and retrieve information about foods.

• References

(1) Poortvliet, E.J., & Kohlmeier, L. (1993) Manual for Using the Eurocode 2 Food Coding System, Federal Health Office, Institute for Social Medicine and Epidemiology, Berlin

(2) Souci, S.W., Fachmann, W., & Kraut, H. (1989) Food Composition and Nutrition Tables 1989–90, Wissenschaftliche Verlagsgesellschaft mbH, Stuttgart

(3) Fettsyratabeller for Livsmedel och Matratter (1989) Statens Livsmedelsverk, Produktion Informako AB, Stockholm

(4) English, R., & Lewis, J. (1992) Nutritional Values of Australian Foods, Australian Government Publishing Service, Canberra

(5) Holland, B., Welch, A.A., Unwin, I.D., Buss, D.H., Paul, A.A., & Southgate, D.A.T. (1991) McCance and Widdowson's The Composition of Foods, 5th Ed., Royal Society of Chemistry, Cambridge

(6) US Department of Agriculture (1976-) Composition of Foods: Raw, Processed, Prepared, Agric. Handbook No. 8 series, USDA, Washington, DC

(7) US Department of Agriculture (1993) USDA Nutrient Data Base for Individual Food Intake Surveys, Release 6, National Technical Information Service, Springfield, VA

(8) McCann, A., Pennington, J.A.T., Smith, E.C., Holden, J.M., Soergel, D., & Wiley, R.C. (1988) J. Am. Diet. Assoc. 88, 336–341

(9) Food and Agriculture Organization (1982) Food Composition Tables for the Near East, Rome

(10) Haendler, H., Neese, U., Jager, F., & Harris, L.E. (1980) in International Network of Food Information Centers, Pub. 2, L.E. Harris, H. Haendler, R. Riviere, & L. Rechaussat (Eds.), International Feed Databank System, Utah State University, Logan, UT

(11) Pennington, J.A.T., & Hendricks, T.C. (1992) Food Add. Contam. 9, 265–275

(12) Pennington, J.A.T., Hendricks, T.C., Douglass, J.S., Petersen, B., & Kidwell, J. Food Add. Contam. (in press)

(13) Truswell, A.S., Bateson, D.J., Madafiglio, K.C., Pennington, J.A.T., Rand, W.M., & Klensin, J.C. (1991) J. Food Comp. Anal. 4, 18–38

(14) Arab, L., Wittler, M., & Schettler, G. (Eds.) (1987) in European Food Composition Tables in Translation, Springer-Verlag, Berlin, pp. 132–154

(15) Kohlmeier, L. (1992) Eur. J. Clin. Nutr. 46 (Suppl. 5), S25–S34

(16) Deary, J. (1993) Langual Coding Experiment, MAFF, London

(17) FDA (1988) The Fish List. FDA Guide to Acceptable Market Names for Food Fish Sold in Interstate Commerce, US Government Printing Office, Washington, DC

Nutritional Metrology: The Role of Reference Materials in Improving Quality of Analytical Measurement and Data on Food Components

James T. Tanner

Center for Food Safety and Applied Nutrition, Food and Drug Administration, Washington DC 20204, USA

Wayne R. Wolf

Food Composition Laboratory, Beltsville, Human Nutrition Research Center, ARS, US Department of Agriculture, Beltsville MD 20705, USA

William Horwitz

Center for Food Safety and Applied Nutrition, Food and Drug Administration, Washington DC 20204, USA

This paper discusses the role of reference materials (RMs) in improving analytical results in order to complement existing quality control procedures focused on processes such as standard methods and collaborative trials. Activities to improve the range of RMs available, and their incorporation into standard methods are also discussed.

Analytical measurements of the content of food components are the foundation of nutritional science. Knowledge and application of the principles of metrology (the science of measurement) are essential to improve and assure the quality of the data generated by these measurements. In the past, analytical methodology for nutrient measurements had focused primarily on the process of these analytical measurements, i.e. the emphasis on use of Official Methods of Analysis which have been collaboratively studied and evaluated through procedures established by AOAC INTERNATIONAL (formerly the Association of Official Analytical Chemists). These collaborative studies show the capability to achieve agreement of results among analysts using specifically defined analytical procedures.

More recently metrology in general has focused on the result of the analytical measurement process, i.e. the accuracy of the data generated by the specific application of the procedure. There is a well recognized need to build a foundation for data validation through establishment of accuracy based measurement systems (1). In these systems “routine” or “field” methodologies are linked and traceable through Reference Materials (RMs), Reference Methods, Certified Reference Materials and Definitive Methods to the basic measurement systems of national and international bodies. The use of RMs in conjunction with Official Methods is necessary to build this foundation, not only for establishment of an accurate database of food composition data, but also for the monitoring of appropriate regulations dealing with these types of data.

This concept of “traceability” is especially important in nutritional science because many of our essential nutrients are not single chemical entities, but are families of related components. Chemical families ordinarily cannot be analyzed by methods designed for specific analytes. They require tailor-made methods that try to include only components of nutritional interest. Therefore, many nutrient measurements are method specific, requiring that the procedures be followed in exact detail to obtain repeatable answers. Such methods are even more dependent on reference materials than are methods based upon chemical stoichiometry. The assignment of reference values by a certifying organization, based upon validation by experienced laboratories faithfully following the details of the same method, produces the value which is to be reproduced by laboratories supplying analytical values to nutritional science. Only if a reference value can be duplicated by an analytical laboratory can any degree of confidence be ascribed to values produced by that laboratory for the same nutrient in other foods.

Indeed the foundation of the U.S. Food and Drug Administration (FDA) regulatory process is a tested, reliable method combined with a reference material to validate accuracy of the resulting analytical data. This is a basic requirement of Good Laboratory Practices (GLP) and the cornerstone of good science. In its regulatory programs the FDA requires use of the analytical methods of AOAC INTERNATIONAL, which have been validated through interlaboratory methods performance studies to ensure that they are capable of providing acceptable accuracy and precision. This requirement does not eliminate use of other analytical methods which have been evaluated through similar studies. Indeed the Code of Federal Regulations (2), which specifies that AOAC methods will be used for regulatory purposes, requires that: “…if no AOAC method is available, by reliable and appropriate analytical procedures.” Other methods developed by such organizations as the American Association of Cereal Chemists, the American Oil Chemists' Society, the International Standards Organization (ISO), or other organizations may in some cases also be useful for regulatory purposes.

However, all of these methods provide only half of the requirement. In addition to a well-studied method, some means of determining that the method was performed correctly is also necessary. Obtaining acceptable results with validated methodology for a reference material that has a known concentration of the analyte and is similar in composition to the material being analyzed is presumptive evidence that the method was performed correctly and that the results obtained for the test materials are correct. RMs, for which the true values are known, are important for this validation. From a regulator's point of view, the use of appropriate RMs is desirable for determining compliance with existing regulations.

Unfortunately, RMs are not available for many products and analytes. Dating back over 80 years, standard reference materials (SRMs) have been developed by such organizations as the National Institute of Standards and Technology (NIST, formerly National Bureau of Standards, NBS) for products such as steel, in which the content of trace elements is very important. Building on this expertise, RMs have been developed within the past 20 years for biological products such as flour, spinach, oysters and other food products for which the main focus has been the major and trace elements rather than the various organic compounds comprising the major components of food. One reason for this focus has been that some organic components may change with time and are not shelf stable; therefore, the exact “true” concentration at the time of use cannot be assigned. Another reason is that analytical expertise for organic components has not progressed at the same pace as for inorganic components.

Reference materials are also necessary to determine the systematic error of new methods. Previously, some AOAC methods had used standard additions to check for method bias when a reference material of known concentration was not available. Although this technique is useful under some conditions, it really only measures the analyst's ability to recover analyte added at the measurement stage and not the ability to determine the analyte that was endogenous to the matrix. For this reason, the technique of standard additions sometimes gives unreliable information. The determination of precision or reproducibility is frequently used as a measure of the success of a method because of the ability of a laboratory to obtain the same values as well as to replicate the results of other laboratories. This is an important part of method evaluation but does not address the accuracy question. The International Standards Organization (ISO) has now broken down the concept of “error” as deviation from the true value into three parts: 1) “Accuracy” is the deviation of a single value; 2) “Trueness” is the deviation of the average of a set of values; and 3) “Bias (or systematic error)” is the deviation of the long-term average (3).
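The distinction between precision and deviation from a reference value can be illustrated with replicate results on a reference material. The following is a minimal sketch only; the replicate values, the reference value of 10.0, and the function name are hypothetical, and the calculation is a simplification rather than the procedure laid down in the ISO standard.

```python
from statistics import mean, stdev


def rm_check(results, reference_value):
    """Summarize replicate results obtained on a reference material.

    Returns the mean, its deviation from the reference value (an estimate
    of systematic error) and the standard deviation (a precision measure).
    """
    avg = mean(results)
    deviation = avg - reference_value
    return {
        "mean": avg,
        "deviation": deviation,
        "relative_deviation_pct": 100.0 * deviation / reference_value,
        "sd": stdev(results),
    }


# Hypothetical replicate results (arbitrary units) on an RM whose
# assigned value is assumed to be 10.0 in the same units.
replicates = [9.8, 10.1, 9.9, 10.0, 9.7]
summary = rm_check(replicates, reference_value=10.0)

for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

A laboratory could show a small standard deviation and still have a large deviation from the assigned value; only the comparison with the reference material reveals the latter.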

AOAC INTERNATIONAL recently formed a task force to address the problem of the methods available to enforce regulations stemming from the Nutrition Labeling and Education Act of 1990 (NLEA) (4), which made nutrition labeling mandatory as of March 1994 for retail foods distributed in the United States. The purpose of the task force was to determine what methodology was available and whether existing methods were adequate for the purpose of nutrition labeling (5). In addition to methods questions, the task force also examined the availability of RMs. It found a serious deficiency in the availability of RMs for organic nutrients and recommended that action be taken to improve that situation (6).

Several problems must be addressed before reference materials for organic nutrient content can be made available. The first is the selection of matrix materials to represent many different kinds of foods; the second is the packaging and storage of these materials to provide a useful shelf life. Third is the characterization or assignment of the “correct” or “best estimate” of the value for the components of interest.

The AOAC task force addressed the question of matrix materials for different foods in a creative way (7). Food is composed of the basic components: protein, carbohydrate, fat, water and ash. Frequently, analysis of a food is not successful because of interference or interaction from one or more of these components with the analyte of interest. In any analytical procedure, water can usually be added or subtracted to suit the requirements of the method. Ash, in general, does not have a great impact on the performance of analytical methods for organic material in foods. Thus, the behavior of a given food in an analytical method is primarily determined by the relative proportions of protein, fat, and carbohydrate.

A scheme has been proposed to represent foods by first normalizing content of these three components to 100 per cent of their sum (7). This normalized food composition can then be plotted within a triangle with 100 per cent fat, 100 per cent protein, and 100 per cent carbohydrate at the respective vertices with the concentration of each component decreasing to zero approaching the opposite side. This schema can then be divided into nine different sectors, each encompassing a range of concentrations of the three components (protein, carbohydrate, and fat) (Figure 1). If a method of analysis were successful for foods falling in each of the nine different sectors, then it should be applicable to all types of food. Such an approach would also be useful to AOAC Associate Referees and AOAC Official Methods committees in minimizing the effort required for collaborative studies while maximizing the value of the resulting data to AOAC Official Methods users. For example, the prospect of coordinating a collaborative study involving 40 or more different foods may discourage many researchers from fully exploring the scope of applicability of a particular method. As a result, researchers may limit the scope of their study to a few food groups to reduce the analytical burden on the participating laboratories. However, as demonstrated by the triangle, many of the 40 or more foods selected to represent foods for a collaborative study may be very similar to one another on a dry basis, and may behave chemically, and, thus, analytically, in a very similar way.
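The normalization step described above can be sketched as follows. This is an illustrative example only: the function name and the example composition are hypothetical, and the sector boundaries of the triangle are not reproduced because the text does not give them.

```python
def normalize_macros(protein_g, fat_g, carbohydrate_g):
    """Express protein, fat and carbohydrate as per cent of their sum.

    Moisture and ash are ignored, as in the scheme described in the text.
    """
    total = protein_g + fat_g + carbohydrate_g
    if total <= 0:
        raise ValueError("protein + fat + carbohydrate must be positive")
    return tuple(100.0 * x / total for x in (protein_g, fat_g, carbohydrate_g))


# Hypothetical composition per 100 g of food (the remainder is water and ash).
protein_pct, fat_pct, carbohydrate_pct = normalize_macros(
    protein_g=8.0, fat_g=4.0, carbohydrate_g=48.0
)

print(f"protein {protein_pct:.1f}%, fat {fat_pct:.1f}%, "
      f"carbohydrate {carbohydrate_pct:.1f}%")
```

The three normalized values sum to 100 and locate the food as a single point inside the triangle; two foods with very different moisture contents can fall on the same point.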

If a diagram such as Figure 1 were to be used to select samples for a collaborative study, two samples from a sector could be selected to account for variation in the type of protein, fat, or carbohydrate that may have an impact on the performance of the method. Examples of these variations within carbohydrates are high fiber foods versus high sugar foods. Other variations include fats containing significant amounts of short chain fatty acids versus those containing predominantly long chain fatty acids, or foods containing more hydrophilic proteins as opposed to those containing predominantly hydrophobic proteins. In addition, two foods may be selected within a sector that vary according to the extent of processing each has undergone.

The logical extension of this same approach would be to provide appropriate reference materials for a food type or category representing each of the nine sectors. By using the different types as part of a method-performance study and having a reference material for each type, all foods would have a method and a reference material, similar to the actual food, that could be used for regulatory purposes. These RMs could be produced and made available through an organization such as NIST. The first priority would be to produce reference materials for the nine food sectors named above and for products in areas where a critical need exists for reliable analyses, such as medical foods.

Figure 1. Schematic layout of food matrices by which all foods can be organized according to their relative proportions of protein, fat, and carbohydrate; the points of the triangle represent 100 per cent of the normalized content of these three major classes of food components (moisture and ash are excluded).

A reference analytical method of known reliability together with a stable RM to monitor analytical performance is the most important requirement for a regulatory agency. With results produced by using this combination, the agency can proceed with appropriate regulatory action that is based on sound analytical science.

This type of verification is part of the infant formula program. Methods for the analysis of infant formula have been developed and collaboratively studied because infant formula is the most highly regulated food in the United States today. It represents the sole source of nutrition for a large segment of the population, namely, infants. As part of the Infant Formula Act of 1980, companies are required to manufacture formula within specified limits, and FDA is required to monitor the formulas to ensure that they are within those limits. Because of differences in methodology, many questions have arisen as to the “true” concentrations of some analytes in the products. Currently, there are analytical methods for infant formula that both industry and FDA have agreed are to be used for regulatory analyses. These methods are now part of AOAC's Official Methods of Analysis (8) and have been collaboratively studied by FDA, infant formula manufacturers, and several commercial laboratories. However, no reference material is currently available for validating method performance in each laboratory. One on-going NIST project is the development of a spray-dried Infant Formula material (SRM-1846) which is being characterized for organic nutrient content. SRM-1846 will serve as a reference material for Infant Formula, and will also provide at least one reference material for validating measurements that determine conformity with the requirements of NLEA. An infant formula has been prepared, spray-dried, and packaged under nitrogen in individual packets weighing approximately 30 g each. These packets have been stored for about two years and analyzed at specific intervals. They appear to have been shelf stable for that time period. Further testing is still under way. If successful, this method of packaging could be applied to other potential RMs to ensure that the nutrient content is stable for a reasonable time.

An AOAC INTERNATIONAL Technical Division on Reference Materials has been established in order to facilitate availability and use of RMs in the validation, implementation and use of AOAC Official Methods of Analysis. In addition, this Technical Division will coordinate activities to assist in characterizing RMs and will conduct the International Symposia Series on Biological and Environmental Reference Materials (BERM) (9).

• References

(1) Uriano, G., & Cali, J.P. (1977) in Validation of the Measurement Process, ACS Symposium Series No. 63, J.R. Devoe (Ed.), ACS, Washington DC, pp. 114–139

(2) Code of Federal Regulations, 21 CFR 101.9(e)(2)

(3) International Standards Organization (1994) ISO Standard 5725

(4) Ellefson, W. (1993) in Methods of Analysis for Nutrition Labeling, D.M. Sullivan & D.E. Carpenter (Eds.), AOAC INTERNATIONAL, Arlington, VA, pp. 3–26

(5) Sullivan, D.M., & D.E. Carpenter (Eds.) (1993) Methods of Analysis for Nutrition Labeling, AOAC INTERNATIONAL, Arlington, VA

(6) Wolf, W.R. (1993) in Methods of Analysis for Nutrition Labeling, AOAC INTERNATIONAL, Arlington, VA, pp. 111–122

(7) Ikins, W., DeVries, J., Wolf, W.R., Oles, P., Carpenter, D., Fraley, N., & Ngeh-Ngwainbi, J. (1993) The Referee 17, 1, 6–7

(8) Official Methods of Analysis (1995) 16th Ed., AOAC INTERNATIONAL, Arlington, VA

(9) Heavner, G. Fres. J. Anal. Chem. (in press)

Strategies for Sampling: The Assurance of Representative Values

Joanne M. Holden, Carol S. Davis

Food Composition Laboratory, Beltsville Human Nutrition Research Center, ARS/USDA, BARC-East, Beltsville, MD 20705, USA

Current interest in the relationship of diet to the maintenance of health has stimulated the demand for representative food composition data. Values for nutrients and other food components are required to calculate dietary intakes, to determine food policy, to monitor food safety, to formulate new products, and to facilitate trade. A specific estimate must be statistically representative of the population of all values for a component in the food product of interest. Serious bias in the estimate can lead to erroneous conclusions about diet-related issues. The Food Composition Laboratory has conducted research to develop statistically based strategies for sampling the US food supply to determine estimates for components in many foods. To determine a strategy for food sampling it is necessary to define project objectives and to determine analytical priorities for foods and components. Foods to be sampled should be described in terms of the product type, ingredients, preservation state, source, cultivar, and other factors which may influence component levels. Demographic and marketing data can be used to identify parameters which are potential sources of variability. In addition, protocols for sample handling and chemical analyses should be standardized to minimize the impact of errors which may arise during the measurement process. Results of sampling research for selenium, total fat, and cholesterol in several foods are presented and the impact of sampling results on the calculation of national estimates is discussed.

Since 1960 the assessment of food consumption patterns and their impact on health status has evolved, requiring food composition data for more foods and components (1). The recognition of food intake as one factor in the longitudinal development of complex, multifactorial diseases has occurred more recently (2, 3, 4). Not only are food composition data used to identify and monitor dietary trends but they are also used for hypothesis testing (5). Other uses of food composition data are equally important (e.g. trade, food safety, food manufacturing) (6). This increased interest in food composition data has stimulated the demand for improved data, including an indication of the number of analyses, the sampling plan, and the magnitude and sources of variability, as well as descriptive and quantitative information about the analytical method and quality control (7). The lack of data for foods and ingredients impedes the assessment of diet-health relationships and impacts on the production, regulation, and use of foods. Increased demand for more data can be attributed, in part, to the development of sophisticated instrumentation which permits the measurement of minute quantities of components in foods and in biological matrices more rapidly than ever before. Similarly, the development and accessibility of computers for data processing has improved the ability to manipulate large data files to investigate new hypotheses. In view of the importance of foods as vehicles for nutrients and other components, the generation of food composition data is not an isolated exercise but, rather, an integral part of the assessment of human health status and dietary effects.

Possible specific objectives for generating food composition data include:

development of a national food composition database
determination of aflatoxin levels in a rail container of grain
determination of pesticide levels in a food product
quality control of food manufacturing
determination of significant differences in the vitamin content of different animal muscles
brand to brand (or region to region) comparisons of component levels.

The generation of these data should be based on a statistical sampling plan specific to the objective which will indicate what to sample, where to sample, and how many units to select to represent the food of interest. The definition of the objective provides the focus for the study and helps to determine the most appropriate sampling strategy. According to Horwitz, a statistically based sampling plan should guide the selection of representative units from the population to provide component estimates “within a specified degree of variability with a stated degree of confidence” (8). The objective of this paper is to discuss the development of sampling strategies to provide estimates of central tendency and variability for component levels in foods to be used in food composition databases and national dietary assessment projects.

The average daily diet may contain 20–25 different items. It has been estimated that 4,000 different generic products (e.g. beef, white bread, pizza) can be found in the American marketplace. Since a nation's food supply is a complex mixture of processed and non-processed products, each food item represents many brands, formulations or styles, and geographical sources. There may be as many as 50,000 products if one considers different brand names. For example, in the US there are hundreds of brands of white bread (9). Similarly, the diversity of the population, personal preferences for foods, and the availability of sophisticated manufacturing and marketing schemes stimulate the nationwide distribution of new and unusual products. Due to the complexity of a national food supply, the generation of accurate food composition data is a difficult and expensive task.

Figure 1. Population and sample: the definition of representativeness

Figure 1 illustrates the statistical concept of the sample and its relationship to the population of all forms, brands, and units of a food (10). The term population describes the collection of relevant objects from which a subset is chosen for analysis. Generally, the population of interest is very large and can be considered infinite relative to the size of the subset which is to be selected. For example, in Figure 1, the population consists of all forms of carrots usually consumed by individuals. In this same example experimental cultivars of carrots lie outside of the population but are part of the larger universe of all carrots. When estimating levels of a nutrient contained in carrots consumed in the US one would probably not sample such cultivars since they are not widely consumed. While it is not possible or desirable to analyze every package, unit, or lot of a food, the analysis of a subset of carefully selected units will provide the required data to draw inferences about the population of all available units (10) (Figure 1).

Using traditional survey sampling theory, the term sample refers to that subset or group of items or units which are selected from the population of interest to represent that population (10) (Figure 1). If the objective is to develop a nationally representative database of food composition values, then the sampling strategy must be carefully planned to construct a sample to include typical items or units in shares proportional to the sales volume or consumption properties of the population of those foods. The sample will include units of predominant brands, manufacturing locations, cultivars, etc. relevant to the specific food as consumed by the individuals of interest.
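One simple way to construct a sample “in shares proportional to the sales volume” is proportional allocation of the available units across brands. The following is a minimal sketch under that assumption; the brand names, market shares, and total of 12 units are hypothetical, and the largest-remainder rounding rule is an illustrative choice, not a method taken from the source.

```python
def allocate_units(market_shares, total_units):
    """Allocate whole sampling units in proportion to market share.

    market_shares: mapping of brand -> integer share (e.g. per cent).
    Uses largest-remainder rounding so the allocations sum to total_units.
    """
    total_share = sum(market_shares.values())
    floors = {}
    remainders = {}
    for brand, share in market_shares.items():
        # Integer quotient and remainder of the exact proportional quota.
        floors[brand], remainders[brand] = divmod(total_units * share, total_share)
    leftover = total_units - sum(floors.values())
    # Give the remaining units to the brands with the largest remainders.
    for brand in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        floors[brand] += 1
    return floors


# Hypothetical market shares (per cent) for one generic food.
shares = {"brand_a": 45, "brand_b": 30, "brand_c": 15, "other": 10}
plan = allocate_units(shares, total_units=12)
print(plan)
```

In practice the allocation would also have to respect other factors named in the text, such as manufacturing locations and cultivars, which this sketch ignores.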

If one were to analyze all containers or units for all available brands or cultivars defined as the population for a food, e.g. carrots, then one could construct a frequency distribution of all analytical values. The distribution may or may not be Gaussian or normal. Since all units of a food which constitute the population cannot be analyzed without destroying that population, the concept of sampling, i.e., selecting a representative subset of the population based on the probabilities of various types, has developed (10). The analysis of all units in the subset will yield a collection of values which can be used to construct a frequency distribution for that subset. If the sample is representative of the population, then that distribution will be similar in its statistical characteristics and subsequent “shape” to the distribution for the population. If one were to take multiple samples, i.e., multiple subsets of units, of the same size from the same large population, one could expect that the statistical characteristics of each sample would be similar to those for the population. However, they will not be identical since the collection of mean values for all samples taken from the population will form a frequency distribution themselves. The degree of similarity of the statistical characteristics between the population and the sample defines, in part, the degree of representativeness of that sample for the population. While, in most cases, the true statistical characteristics of the population can never be known, statistical sampling theory can be applied to the generation of food composition data to yield estimates of population parameters. Although the discussion of mathematical sampling theory is beyond the scope of this paper, it provides the framework and point of reference for comments about the selection of foods, the number of units, variability, etc.

It is important to note that the usual statistical techniques which are used to evaluate the statistical characteristics of the sample subset and to provide estimates of statistical parameters for that subset assume normality of the distribution of all possible analytical values in the subset. In some disciplines various mathematical transformations of the data are possible to permit the evaluation of scientific hypotheses. However, it is difficult if not impossible to use transformation techniques to estimate such parameters as the mean and variance. More research is needed for many components and foods to determine the statistical distributions for food composition data and to evaluate the robustness of statistical techniques as applied to such data.
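The idea that the means of repeated samples form a frequency distribution of their own can be illustrated with a small simulation. This is a sketch under stated assumptions: the "population" is a hypothetical, right-skewed set of simulated values (not food composition data), and the sample size of 12 and the 200 repeated samples are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical, right-skewed "population" of component values (arbitrary units).
population = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]


def draw_sample_means(pop, sample_size, n_samples):
    """Mean of each of n_samples simple random samples of sample_size units."""
    return [mean(rng.sample(pop, sample_size)) for _ in range(n_samples)]


sample_means = draw_sample_means(population, sample_size=12, n_samples=200)

pop_mean, pop_sd = mean(population), stdev(population)
print(f"population:   mean {pop_mean:.3f}, SD {pop_sd:.3f}")
print(f"sample means: mean {mean(sample_means):.3f}, SD {stdev(sample_means):.3f}")
```

The sample means cluster around the population mean but are not identical to it or to each other, and their spread is narrower than the spread of individual values, which is the behavior described above.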

In general, most values for components and foods in a database are calculated means of two or more individual values. For analytical sources or files the data may have been generated in a single laboratory or in several laboratories. Individual values may be the product of the analysis of an aliquot of a single unit or of a composite of several units. Each mean value in a database is a point in the distribution of sample means mentioned above, and yet each mean also represents a distribution of individual values or points for the sample subset from which it was derived. Since a mean database value represents a sample subset selected from the population, a new analytical value for another individual unit chosen at random from the food supply may not fall within the confidence limits of the database value. However, the probability of any new value falling within limits defined by representative sampling and analysis will be high (10). Thus, it is important to estimate the mean composition and some parameter of variability for the most important food/component combinations in a database.

Recently, Greenfield and Southgate have published a discussion of the importance of sampling, including important definitions and approaches for obtaining the representative sample set (11). Analyses may or may not include aliquots of all brands, types, or cultivars present in the population. In keeping with fiscal and physical constraints, it may be necessary to take a subset of the brands or types available. One should seek statistical advice during the development phase of the sampling plan. Aliquots of single units (primary samples) may be analyzed. Conversely, units can be combined or composited by brand name, geographic location, cultivar, etc., as appropriate, before aliquots are taken to minimize the number of analytical measurements and yet represent the contribution of that unit to the estimate of central tendency. The formulation of composites should be based on the statistical data about the collection of units representing brand names, geographic locations, etc. which have been obtained from a pilot study or previous independent investigations. The impact of compositing on the magnitude of variability should not be overlooked. The number of analyses to be conducted will be determined by the desired statistical power of the estimate, the observed variability in pilot tests, and such practical considerations as physical and fiscal resources. More detail will be provided later in the text.
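
As a rough illustration of how pilot variability drives the number of analyses, the sketch below uses a common normal-approximation formula for estimating a mean to within a chosen margin. It is a simplification based on precision rather than a formal power calculation, it is not necessarily the approach used by the authors, and the function name `analyses_needed` and the pilot figures are assumptions made for this example.

```python
import math
from statistics import NormalDist


def analyses_needed(pilot_sd, margin, confidence=0.95):
    """Approximate number of analyses so the mean is within +/- margin.

    Uses n = (z * SD / margin)^2 with the SD taken from a pilot study.
    """
    if pilot_sd <= 0 or margin <= 0:
        raise ValueError("pilot_sd and margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * pilot_sd / margin) ** 2)


# Hypothetical pilot result: SD of 12 units; desired margin of 5 units.
print(analyses_needed(pilot_sd=12.0, margin=5.0))
# Halving the margin roughly quadruples the number of analyses.
print(analyses_needed(pilot_sd=12.0, margin=2.5))
```

Because the required number grows with the square of the variability and shrinks with the square of the margin, a modest pilot study can prevent both under-sampling and unnecessary analytical cost.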

If the objective is to develop a national food composition database, then two major questions need to be answered: “What nutrient(s)/component(s) should be determined?” and “What foods should be selected for analysis?” Food analysis projects can be driven by the need to estimate levels of a single component (e.g. selenium, β-carotene, total fat) in foods consumed by a population of individuals. Conversely, the focus may be on a single food (e.g. beef, milk, carrot) and its major components.

• What Components Should Be Analyzed?

The components of interest may be nutrients (e.g. protein, vitamin A, iron), additives, biological agents, or contaminants. Each component or class of components represents a unique sampling challenge. However, the choice of components should be guided by the particular priorities or emphasis of the project or agency. In general, three factors determine the selection of components:

the component should rank highly relative to actual or suspected public health effects;
available analytical methods for the component(s) of interest should be robust, valid, capable of producing accurate data, and economically feasible;
in view of fiscal and personnel limitations, analytical priorities should include those components for which available data are unacceptable or previously unavailable (12, 13).

As an example, the scientific community has become interested in the possible effects of carotenoid intake on health, specifically the development of certain cancers (14). Since the 1930s, several carotenoids (α- and β-carotene, and β-cryptoxanthin) have been known to have significant vitamin A activity (15). While vitamin A deficiency is still prevalent in many areas of the world, the broader role of carotenoids in human metabolism has become the object of interest. However, until recently, no comprehensive assessment of carotenoid data for foods had been conducted (16, 17). In fact, for many foods carotenoid data are lacking. Analytical methods for measuring additional individual carotenoids in simple forms of foods have been developed in recent years (18). While more work needs to be done in this area to release a robust field method, some centers are using liquid chromatography (LC) while other centers are using valid open column chromatography (OCC) methods (19). Finally, carotenoids are good candidates for further analyses because sufficient high-quality data are lacking (16, 20). As other less familiar components (e.g. isoflavonoids, flavonoids) become the objects of research, and as robust methods become available, their determination in foods will become important. Furthermore, as improved methods for recognized important components (e.g. folates) are developed, new analyses will be needed. As new, more specific analytical methods become available, it is necessary to generate new data to replace the older, outdated values. Data for the fiber content of foods are an example: crude fiber analysis has been replaced by other methods, including total dietary fiber (21). Today, carbohydrate values calculated by “difference” have been replaced by analyses of specific fractions, since carbohydrates, as a class, contain diverse forms with different molecular weights and chemical structures and, therefore, different metabolic effects.
Frequently, newer methods make it possible to determine some components for the first time. A large nationwide study of fast-food chicken included the determination of levels of starch contained in the seasoned flour coating (22).

• What Foods Should Be Sampled?

The selection of foods is equally important. Stewart et al. (11) and Beecher and Matthews (12) have stated that priorities for analyses should be based on three considerations: First, although many foods may contain the component of interest, the foods selected should be the major contributors of that component to the diet. Frequently, a limited number of foods (5–100) contribute 50–90 per cent of a single component to the diet of the population of interest (23, 24). Existing data and/or pilot studies can provide preliminary estimates of the levels of components in foods. Food consumption survey data and/or data from food balance sheets can be combined with preliminary food composition data to provide a ranked list of the major contributors of specific components (20, 24).
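
The ranking of major contributors described above can be sketched by multiplying consumption by preliminary composition values and accumulating the shares. The code is only an illustration: the function name `rank_contributors`, the food list, and all intake and concentration figures are invented for the example and are not survey data.

```python
def rank_contributors(foods, coverage=0.8):
    """Rank foods by their contribution of one component to the diet.

    foods maps a food name to (intake in g/day, component level per 100 g).
    Returns (name, contribution, percent, cumulative percent) tuples for the
    top foods needed to reach the requested share of total intake.
    """
    contributions = {
        name: intake * level / 100.0
        for name, (intake, level) in foods.items()
    }
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")

    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    cumulative = 0.0
    for name, amount in ranked:
        cumulative += amount
        result.append(
            (name, amount, 100.0 * amount / total, 100.0 * cumulative / total)
        )
        if cumulative / total >= coverage:
            break
    return result


# Hypothetical intakes (g/day) and preliminary component levels (per 100 g).
foods = {
    "carrots": (15.0, 8000.0),
    "sweet potato": (10.0, 6000.0),
    "tomato": (40.0, 400.0),
    "lettuce": (10.0, 300.0),
    "apple": (50.0, 20.0),
}

for name, amount, pct, cum_pct in rank_contributors(foods):
    print(f"{name:<14}{amount:>8.0f}{pct:>7.1f}%{cum_pct:>7.1f}%")
```

In this invented example two of the five foods account for most of the component, which mirrors the observation that a limited number of foods usually supply the bulk of a single component.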

Second, foods for which data are unacceptable or unavailable should be selected. As an example, tomato products are the most important source of lycopene, an abundant carotenoid in the US diet. However, after the assessment of carotenoid data quality for multi-component foods by Chug-Ahuja et al. (20), the authors determined that analytical carotenoid data for popular commercial tomato-based soups, sauces, and spaghetti sauce were nonexistent. A nationwide sampling plan for three cities was developed to select samples of these products to be analyzed for five of the most important dietary carotenoids (25). New forms of foods are appearing in the markets of many countries and are gaining in popularity. Food composition data for many of these foods are nonexistent. For example, the influx of many previously unknown fruits and vegetables into the US food supply requires that these foods be sampled and analyzed to determine their composition. Initially, new foods may be imported from other countries, prior to commencement of their local production (e.g. kiwi fruit, Granny Smith apples). Since climate, soil conditions and geography affect levels of some components, geographical source and variety/cultivar may be relevant to the sampling plan. Therefore, it would be necessary to compare data for imported fruits with data for fruit from domestic sources; values would be revised, if necessary. For a recent study of human carotenoid metabolism, a single production lot of frozen broccoli was needed to assure uniformity of the product for all subjects over the entire course of the study. When a small regional company was contacted to procure the broccoli, it was found that the product was grown and processed in Guatemala. Analyses of carotenoid levels in the frozen broccoli revealed that values were significantly different from those for fresh broccoli procured in the retail market (26).
This revelation emphasized the importance of using analytical values for critical components in single lot foods used in human metabolic studies.

A third consideration is the need to analyze foods as eaten. As new forms of important foods become popular, they should be analyzed to generate up-to-date data more appropriate to eating habits (12, 13). In many countries the use of fully prepared commercial foods instead of home-prepared commodities has increased rapidly. Estimates for those prepared foods are more representative of what some segments of a population are eating than estimates for foods prepared from the basic ingredients. Formulated foods may contain different levels of fat, sodium, or other components than domestic recipes. A recipe calculation technique can be used for some components. However, formulations for commercial products are generally unavailable and are frequently different from those of home-prepared products. Therefore, the composition of important commercially prepared foods will need to be determined by analysis. New ingredients such as fat substitutes, gums and sweeteners alter the formulations of familiar foods, necessitating new analyses. Finally, advances in animal and plant breeding will require new analyses to estimate changes in targeted components. For example, in the US, recent advances in breeding and marketing practices have dramatically reduced the separable fat trim on beef and pork. Nationwide retail studies were planned and conducted in collaboration with meat science departments at Texas Agricultural and Mechanical (A & M) University and the University of Wisconsin to assess the impact of these changes on the composition of beef and pork (27, 28).
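
The recipe calculation technique mentioned above can be sketched in its simplest form: sum each ingredient's contribution and express the total per 100 g of finished dish. This is a simplified illustration that adjusts only for the change in total weight and ignores nutrient retention factors; the function name `recipe_composition`, the ingredient list, and all values are hypothetical.

```python
def recipe_composition(ingredients, cooked_weight_g=None):
    """Estimate component levels per 100 g of a prepared dish.

    ingredients is a list of (grams, {component: amount per 100 g}) pairs.
    cooked_weight_g is the weight of the finished dish; if omitted, the
    summed raw ingredient weight is used (no gain or loss on preparation).
    """
    raw_weight = sum(grams for grams, _ in ingredients)
    final_weight = raw_weight if cooked_weight_g is None else cooked_weight_g
    if final_weight <= 0:
        raise ValueError("dish weight must be positive")

    totals = {}
    for grams, per_100g in ingredients:
        for component, level in per_100g.items():
            totals[component] = totals.get(component, 0.0) + grams * level / 100.0

    return {
        component: 100.0 * amount / final_weight
        for component, amount in totals.items()
    }


# Hypothetical home recipe; composition values are invented for illustration.
ingredients = [
    (200.0, {"fat_g": 1.0, "sodium_mg": 2.0}),     # flour
    (100.0, {"fat_g": 80.0, "sodium_mg": 600.0}),  # salted butter
    (100.0, {}),                                   # water
]

print(recipe_composition(ingredients))
print(recipe_composition(ingredients, cooked_weight_g=350.0))
```

A calculation of this kind depends on knowing the formulation, which is why it cannot substitute for analysis when commercial formulations are unavailable.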
