Notes
When the Arabic and Chinese terms appear to differ in the sources, they are listed separately.
The Arabic and Chinese terms mostly gave error messages when copied so they are all replaced by "xxx", but one can still see which sources have these terms.
| watershed | [FAOGloss, FAOTerm, AGROVOC +s] |
| | |
| DF | The area which supplies water by surface and subsurface flow from rain to a given point in the drainage system (ISSS (1996) in Choudhury K. and L.J.M. Jansen (1999): Terminology for Integrated Resources Planning and Management. FAO, Rome, Italy: 69 pages) [FAOGloss] [FAOTerm, source: Terminology for integrated resource planning and management, 1999 - X2079E] |
| | |
| UF | Catchment areas [AGROVOC] |
| | |
| BT | Physiographic features [AGROVOC] |
| | |
| RT | River basin [FAOGloss] [UF in AGROVOC] |
| (Arabic) | xxx [AGROVOC] |
| (Chinese) | xxx [FAOTerm] |
| (Czech) | povodí [AGROVOC] |
| (French) | bassin versant [FAOTerm] [AGROVOC] |
| (Spanish) | cuenca hidrográfica [FAOTerm, AGROVOC +s] |
| (Portuguese) | Bacia hidrográfica [AGROVOC] |
| trawl | [FAOGloss, FAOTerm] [RT in AGROVOC +ers, +ing] |
| DF | A cone or funnel-shaped net that is towed through the water by one or more vessels [FAOGloss] |
| | |
| RT | Pair trawling [FAOGloss] |
| (Arabic) | xxx [FAOTerm] |
| (Chinese) | xxx [FAOTerm] |
| (French) | chalut [FAOTerm] |
| (Spanish) | red barredera [FAOTerm] |
| feed | [FAOGloss, FAOTerm, AGROVOC +s] |
| DF | Any non-injurious edible material having nutrient value. May be harvest forage, range or artificial pasture forage, grain, or other processed feed for livestock or game animals. (Am. Soc. of Range Management (1964) in Choudhury K. and L.J.M. Jansen (1999): Terminology for Integrated Resources Planning and Management. FAO, Rome, Italy: 69 pages) [FAOGloss] [FAOTerm, source: Terminology for integrated resource planning and management, 1999 - X2079E] |
| | |
| DF | Any non-injurious edible material having nutrient value. May be harvest forage, range or artificial pasture forage, grain, or other processed feed for livestock or game animals. (Terminology for integrated resource planning and management, 1999 - X2079E) [FAOTerm] |
| | |
| DF | In aquaculture, residues from agriculture and food producing industries as well as fishmeal are important sources of feeds. [FAOGloss] |
| | |
| UF | Animal feeding stuffs [AGROVOC] |
| | |
| NT | Clover [AGROVOC] |
| | |
| RT | Beet pulp [AGROVOC] |
| (Arabic) | xxx [AGROVOC] |
| (Chinese) | xxx [AGROVOC] |
| (Czech) | krmiva [AGROVOC] |
| (French) | aliment du bétail [FAOTerm] |
| (Spanish) | pienso [FAOTerm, AGROVOC +s] |
| (Portuguese) | Alimento para animais [AGROVOC] |
| harvest index | [FAOTerm, AGROVOC] |
| DF | The ratio of economic yield of a crop and total dry matter at harvest. (Terminology for integrated resource planning and management, 1999 - X2079E) [FAOTerm] |
| | |
| BT | Yield components [AGROVOC] |
| (Arabic) | xxx [AGROVOC] |
| (Chinese) | xxx [AGROVOC] |
| (Czech) | sklizňový index [AGROVOC] |
| (French) | Indice de récolte [FAOTerm, AGROVOC] |
| (Spanish) | Índice de cosecha [FAOTerm, AGROVOC] |
| (Portuguese) | Relação grão palha [AGROVOC] |
| irradiation | [FAOGloss, FAOTerm, AGROVOC] |
| DF | Illumination with electromagnetic radiation, typically of sufficiently high energy (low-wavelength UV or gamma, etc.) to disrupt biological macromolecules and hence induce mutations. [FAOGloss] |
| | |
| UF | exposure [FAOTerm] |
| | |
| NT | Gamma irradiation [AGROVOC] |
| | |
| RT | asymmetric hybrid [FAOGloss] |
| | Immunosuppression [AGROVOC] |
| | Ionization [AGROVOC] |
| | mutagen [FAOGloss] |
| | mutation [FAOGloss] |
| | Processing [AGROVOC] |
| | radiation hybrid cell panel [FAOGloss] |
| | Radiation damage [AGROVOC] |
| | Radiosensitivity [AGROVOC] |
| | Sterilizing [AGROVOC] |
| (Arabic) | xxx [AGROVOC] |
| (Chinese) | xxx [AGROVOC] |
| (Czech) | ozařování [AGROVOC] |
| (French) | radioexposition [FAOTerm] |
| | irradiation [FAOTerm, AGROVOC] |
| (Spanish) | radioexposición [FAOTerm] |
| | irradiación [FAOTerm, AGROVOC] |
| (Portuguese) | Irradiação [AGROVOC] |
| |
| nutrient deficiency | [FAOGloss, FAOTerm, AGROVOC +ies] |
| DF | Absence or insufficiency of an essential factor for normal growth and development [FAOGloss] |
| | |
| NT | Kwashiorkor [AGROVOC] |
| | |
| RT | Deficiency diseases [AGROVOC] |
| (Arabic) | xxx [AGROVOC] |
| (Chinese) | xxx [FAOTerm, AGROVOC] |
| (Czech) | deficience zivin [AGROVOC] |
| (French) | carence (alimentaire) [FAOTerm] |
| | Carence en substance nutritive [AGROVOC] |
| (Spanish) | carencia de nutrientes [FAOTerm] |
| | Deficiencias nutritivas [AGROVOC] |
| (Portuguese) | Carência em substâncias nutritivas [AGROVOC] |
| food security | [FAOGloss, FAOTerm, AGROVOC] |
| DF | Freedom from hunger. The capability to produce an adequate amount of food for all consumers at affordable prices. [FAOGloss] |
| | |
| DF | Food security exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food to meet their dietary needs and food preferences for an active and healthy life (WFS, 1996). [FAOTerm] |
| | |
| DF | (Spanish) Se dice que existe seguridad alimentaria cuando todas las personas tienen en todo momento acceso físico y económico a suficientes alimentos inocuos y nutritivos para satisfacer sus necesidades alimentarias y sus preferencias en cuanto a los alimentos, a fin de llevar una vida activa y sana (WFS, 1996). [FAOTerm] |
| | |
| DF | (French) La sécurité alimentaire existe lorsque tous les êtres humains ont, à tout moment un accès physique et économique à une nourriture suffisante, saine et nutritive leur permettant de satisfaire leurs besoins et leurs préférences alimentaires pour mener une vie saine et active (WFS, 1996). [FAOTerm] |
| | |
| RT | Food aid [AGROVOC] |
| (Arabic) | xxx [FAOTerm] |
| (Chinese) | xxx [FAOTerm] |
| (Czech) | potravinová bezpečnost [AGROVOC] |
| (French) | sécurité alimentaire [FAOTerm, AGROVOC] |
| (Spanish) | seguridad alimentaria [FAOTerm, AGROVOC] |
| (Portuguese) | Segurança alimentar [AGROVOC] |
This text is repeated from Part 2 J. for convenience.
Appendix 2 lists a number of sources, divided into general coverage sources and specialized sources.
The sources are labeled as follows:
G/D = Glossary, dictionary
T = Thesaurus, classification
N = nomenclature
DB = Database
O = Other (handbook etc.)
+ KOS maintained, sponsored, or used by FAO
* Otherwise consider as a priority source of terms for AGROVOC
# Site to link to in an Agricultural Ontology Server for more detailed information
The sources marked by * were selected based on germaneness to FAO's work, authority of the originating organization, richness of information, and, where appropriate, size. FAO personnel are more knowledgeable about the areas in which AGROVOC is weak and are therefore in a better position to assign source priority based on that criterion.
These sources can be harvested for additional concepts, terms in multiple languages, definitions, and relationships. This requires
Clearance of copyrights, where applicable.
Scripts to transform the sources into the input format of the KOS management software used.
Adding to the FAO KOS database via KOS management software (see Section E above)
Editing terms, definitions, relationships that are new to integrate the source into the FAO KOS database.
This appendix gives a few examples of KOS use cases and KOS projects analyzed according to the template given in Part 1, 1.4 and 1.5.
Framework for KOS projects
Number and title AGROVOC
Related to thesaurus use cases
Scope and size
Regular work flow just being established, update size not available
update cycle 3 months
Unit and person responsible Gudrun Johansen + countries
Support countries with tool
Collaboration / coordination (actual and possible)
Development versus maintenance
Any gaps in domain coverage
Fisheries and forestry (both complained). Sustainable development: the environment and natural resources service needs technology terms (remote sensing, GIS, biotechnology, biosafety). Need list of FAO priorities and new initiatives.
Development person hours / maintenance person hours per month
Time frame
Data model (entity and relationship types) (existing and needed)
Mapping to categorization schemes instead of having built-in hierarchy
094: TermCode indicator
0 English (EN)
1 French (FR)
2 Spanish (ES)
3 Arabic
4 Chinese
Note: AGROVOC also contains some German translations. Also, AGROVOC translations done by third parties are available in a number of languages, including Thai, Croatian,...
x01: Descriptor
x02: Scope Note
x03: Use
x04: Use for
x05: Broader Term
x06: Narrower Term
x07: Related Term
x08: Non Descriptor
| LanguageCode | Name | LngGroupID | OriginalName | CreateDate |
| DE | German | 100 | Deutsch | 4/27/1999 |
| EN | English | 10 | English | 4/27/1999 |
| ES | Spanish | 10 | | 4/27/1999 |
| FR | French | 10 | | 4/27/1999 |
| IT | Italian | 100 | | 1/20/1997 |
| PT | Portuguese | 100 | | 7/3/1996 |
| RU | Russian | 100 | | 4/27/1999 |
Scope
| ScopeID | ScopeDesc |
| GC | Geographic Term (country level) |
| GG | Geographic Term (above country level) |
| GL | Geographic Term (below country level) |
| TA | Taxonomic Term (animals) |
| TB | Taxonomic Term (bacteria) |
| TF | Taxonomic Term (fungi) |
| TP | Taxonomic Term (plants) |
| TV | Taxonomic Term (viruses) |
| TagTypeID | TagDesc | LanguageCode |
| 10 | Scope Note | EN |
| 20 | History Note | EN |
| 30 | Definition | EN |
| 40 | Comments | EN |
| LinkTypeID | LanguageCode | LinkDesc | LinkAbr | CreateDate | RLinkCode |
| 5 | EN | Scope Note Reference | SNR | 4/27/1999 | 10 |
| 10 | EN | Is Referenced in Scope Note | SNX | 4/27/1999 | 5 |
| 20 | EN | Used For | UF | 4/27/1999 | 70 |
| 30 | EN | Used For+ | UF+ | 4/27/1999 | 70 |
| 40 | EN | Seen For | SF | 4/27/1999 | 80 |
| 50 | EN | Broader Term | BT | 4/27/1999 | 60 |
| 60 | EN | Narrower Term | NT | 4/27/1999 | 50 |
| 70 | EN | Use | USE | 4/27/1999 | 20 |
| 80 | EN | See | SEE | 4/27/1999 | 40 |
| 90 | EN | Related Term | RT | 4/27/1999 | 90 |
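The RLinkCode column pairs each link type with the type of its reverse link. As a minimal sketch (the dictionary and helper names are illustrative and not part of the AGROVOC software), the table can be read as a lookup that yields the reverse link whenever a link is recorded:

```python
# LinkTypeID -> (LinkAbr, RLinkCode), copied from the link type table above.
LINK_TYPES = {
    5: ("SNR", 10),
    10: ("SNX", 5),
    20: ("UF", 70),
    30: ("UF+", 70),
    40: ("SF", 80),
    50: ("BT", 60),
    60: ("NT", 50),
    70: ("USE", 20),
    80: ("SEE", 40),
    90: ("RT", 90),
}


def abbreviation(link_type_id):
    """Return the abbreviation (LinkAbr) for a link type."""
    return LINK_TYPES[link_type_id][0]


def reciprocal(source, link_type_id, target):
    """Return the reverse link implied by (source, link type, target)."""
    reverse_id = LINK_TYPES[link_type_id][1]
    return (target, reverse_id, source)


# Example taken from the watershed entry: BT Physiographic features.
forward = ("watershed", 50, "Physiographic features")
backward = reciprocal(*forward)
print(abbreviation(backward[1]), backward)
```

Note that the mapping is not strictly symmetric: UF+ (30) points to USE (70), while USE points back to UF (20), so a round trip through `reciprocal` does not always return the original link type.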
Software used, file structure
Internal software
MySQL database is the main DB, exported to Access for distribution, to Oracle for Web
Sources
Suggestions from users/ info system designers internal to FAO
EIMS WAICENT
ASFA problem
Framework for KOS projects
Number and title FAO Term
Related to thesaurus use cases
Translators, terminology info on FAO activities, org. units and project names 8,000 that FAO deals with for anybody in the organization
No automated translation, except using exact same sentence with fuzzy matches. Internet use Claudia
Could support definitions in flux
Interpreters, create specific glossaries for meetings, interpreters also need general language info
Scope and size
100 terms changed or added, 30% change, 70% new
Unit and person responsible Alexis Crespel, Ingrid Alldritt-Ferrarro 3 full + Alexis.5, + other support
But also support meeting preparation
Collaboration / coordination (actual and possible)
Development versus maintenance
Any gaps in domain coverage
No field completely missing. Codex Alimentarius, environment, fisheries, and forestry are difficult to keep up with due to size.
Development person hours / maintenance person hours per month
Time frame
Data model (entity and relationship types) (existing and needed)
Terms
5 official languages
See separate document
Software used, file structure
Trados MultiTerm connected to translation memory system
Work on Web interface for direct input to Web database
MultiTerm to Oracle every month.
Other
Web site feedback form
Sources for concepts and terms
translators
INRA
query
term extraction using Trados, statistical approach, lots of noise, not unique terms
Definition extractor but political
avg 100 per month
rely on expertise of others for definition
Meeting document glossaries
Reference section alerts to docs with definitions
From other institutions try to integrate
Integrated within a service
Forms of publications
Framework for KOS projects
Number and title FAO Glossary
Related to thesaurus use cases
Scope and size 4,700 terms
Unit and person responsible Fisheries; Sustainable Development
Glossaries defined by technical departments
Aquaculture did their own thing using their own system
Fisheries dept. wants tool for other institutions to download subsets
Collaboration / coordination (actual and possible)
Development versus maintenance
Time spent in units, updated in intervals, not daily
Any gaps in domain coverage
Development person hours / maintenance person hours per month
Time frame
Data model (entity and relationship types) (existing and needed)
Terms (starting with capitals) and English scope notes (using <br> for line breaks)
Owner for each term and scope note
Examples
Link to image
RT
Software used, file structure
Updated by Word document without sources because all definitions come from an FAO publication (biotech)
Fisheries uses Web interface for small updates, files for larger updates
Some
Oracle tables
Web interface for input
Sources
International organization literature, often
Framework for KOS projects
Number and title NAL Thesaurus
Related to thesaurus use cases
Scope and size
Unit and person responsible
Collaboration / coordination (actual and possible)
Development versus maintenance
Any gaps in domain coverage
Development person hours / maintenance person hours per month
Time frame
Data model (entity and relationship types) (existing and needed)
UF
USE
BT
RT
SN
SC subject category
TNR Indicator for facet heading
Software used, file structure
| Functions of a thesaurus / classification / ontological knowledge base: Overview. Provide a semantic road map to individual fields and the relationships among fields. Improve communication generally. Support learning and assimilating information.
Provide the conceptual basis for the design of good research and implementation.
Provide classification for action. Classification for social and political purposes.
Support information retrieval and analysis. Organizing and keeping track of goods and services for commerce (esp. ecommerce) and inventory.
Support meaningful, well-structured display of information. Ontology for data element definition. Data element dictionary. Conceptual basis for knowledge-based systems. Do all this across multiple languages. Mono-, bi-, or multilingual dictionary for human use. |
| The underlying function of a knowledge base on concepts and terminology: Provide a semantic road map to individual fields and the relationships among and across fields. Map out a concept space, relate concepts to terms, and provide definitions, thus providing orientation and serving as a reference tool. Provide a semantic road map and common language for an individual field and, perhaps more importantly, map the relationships among fields. Clarify concepts by putting them in the context of a classification / typology and by providing a system of definitions. Relate concepts and terms across disciplines, languages, and cultures. Many specific functions build on this foundation. |
| Improve communication generally. Support learning about any topic by providing the learner/reader with a coherent, age-appropriate conceptual framework. Conceptual frameworks help the learner ask the right questions; learning as information retrieval. Support the development of instructional materials by providing a conceptual framework to the instructional developer / writer and by suggesting didactically useful arrangements of topics. Assist readers in understanding text; help them ascertain the proper meaning of a term and place it in context. Assist writers in producing understandable text by helping them to conceptualize the topic and suggesting from a semantic field the term that best conveys the intended meaning and connotation. Support foreign language learning. |
| Provide the conceptual basis for the design of good research and implementation. Assist researchers and practitioners with problem clarification. This includes help with exploring the conceptual context of a research or practical problem (a study, policy, plan, or implementation project), with structuring the problem, and with providing a conceptual framework for asking the right questions and devising good query formulations for retrieval. Examples of specific functions: Present the issues in a field or application area in a coherent framework. Assist in problem-solving: Assist in the exploration of the dimensions of a problem and aspects to be considered in its solution; provide a classification of approaches to solving a specific problem (for example, a classification of approaches to drug abuse prevention as a help in designing drug abuse prevention projects). Provide classification and consistent definition of variables for research / of evaluation criteria for practical problems, thus enhancing the comparability of research and evaluation results and making research more cumulative. |
| Support the compilation and use of statistics This is a very important function. The Census Bureau, the Bureau of Labor Statistics, and other statistical agencies are heavily involved in developing classifications and defining concepts.
|
| Provide classification for action This list addresses the functions of formal classifications. In a broader perspective, classification is the basis for much of everyday action, where we put people, things, and events in certain categories and, based on these categories, predict the behavior of persons and things and the course and effects of events, determine our attitudes towards them, and plan action accordingly. For example,
|
| Classification for social and political purposes. Socially charged classification For example
|
| Support information retrieval 1: A tool for searching, particularly knowledge-based support for end-user searching. Support
Elicitation of user needs through a series of menus based on a search tree, or through guidance in the conceptual analysis of a search topic (questions based on a facet structure, presentation of a segment of the concept hierarchy for each applicable facet). Browsing the classification structure to identify useful concepts for a search at the level of specificity desired. (The user may not have command of the vocabulary needed.) Browsing a collection (as on the shelves or in a subject directory). Mapping from the user's query terms to descriptors used in a database or to the multiple natural language expressions to be used for free-text searching. Inclusive (hierarchically expanded) searching. Enhanced ranking algorithms that use concept and term relationships. Searching multiple databases by mapping the user's query terms to the descriptors used in each of the databases, or mapping the descriptors from one database to another database (switching); common search language. |
| Support information retrieval 2: Provide a tool for indexing. Vocabulary control. User-centered (request-oriented, problem-oriented) indexing. Indexing several databases in a field with a common index language and sharing the results of indexing to reduce overall indexing effort. Mapping indexing descriptors from one system to another. |
| Support information retrieval 3: Facilitate the combination of multiple databases or unified access to multiple databases through
|
| Support information retrieval 4: Document processing after retrieval Sample functions that require knowledge-based support:
|
| Support meaningful, well-structured display of information
|
| Organizing and keeping track of goods and services for commerce (esp. ecommerce) and inventory The functions detailed for information retrieval apply to this special case
These functions apply both to business-to-consumer and to business-to-business commerce. Classification by function or purpose is especially important here. |
| Ontology for data element definition.
|
| Conceptual basis for knowledge-based systems. |
| Do all this across multiple languages |
| Mono-, bi-, or multilingual dictionary for human use.
Dictionary/knowledge base for automated language processing
|
| Functions of an ontological knowledge base in software development: Assist in the design and implementation of the user interface, esp. choice of terms and icons. Terms and icons must be chosen with the sometimes conflicting goals of communicating to the intended user group and of adhering to standards. Assist in the organization and formulation of help messages and of documentation and third-party software books. Serve as the lexicon for machine translation of interfaces and software-related documents. Assist the user in understanding interfaces and documentation, esp. in a foreign language. Support retrieval of software for the end user or for software reuse. Data element definition and standardization and organization of CASE tool databases. All this functionality must be provided in multiple languages (for example, software localization for end users, CASE tool databases for multinational development teams). |
January 10, 1992
Dagobert Soergel
See Soergel et al. A language for the description of foods in databases for background (included in file Foodlanguage2b.pdf).
At the core of a thesaurus for an IFD system is a list of authorized entity values arranged by entity type. Each entity value is represented by an internal identifier, usually a short code, and an external preferred term. Additional synonyms may be given with reference to the preferred name or code.
Various relationships can be established among these entity values independently of the description of individual food products. These relationships constitute data, and they are stored in the same entity-relationship structure as any other data in the IFD system; the thesaurus is an integral part of the database.
The relationships comprise is-a relationships between entity values of the same type, such as Potassium sorbate is-a ***, and relationships between entity values of different types, such as Potassium sorbate is-used-for Preservation. Both types of relationships can be used for the classification of entity values within one entity type. The is-a relationships define the basic classification, for example of substances by their chemical nature. The other relationships can be used to define auxiliary classifications, for example a classification of substances by use, which would bring together all preservatives.
For many applications an auxiliary classification may be more helpful than the basic classification. Fortunately, in an on-line thesaurus there is no need to choose between the two; the user can have the entity values displayed in any way that is supported by the data in the system. Even in a printed edition one may include several arrangements of the entity values within a type, but size constraints may force choices.
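The difference between the basic and an auxiliary classification can be sketched as two groupings over the same stored statements. This is only an illustration: apart from "Potassium sorbate is-used-for Preservation", which comes from the text, all values and the `classify` helper are assumptions made for the example.

```python
from collections import defaultdict

# Each statement is (entity value, relationship, entity value).
STATEMENTS = [
    ("Potassium sorbate", "is-used-for", "Preservation"),  # from the text
    ("Sodium benzoate", "is-used-for", "Preservation"),    # illustrative
    ("Sucrose", "is-used-for", "Sweetening"),              # illustrative
    ("Fructose", "is-used-for", "Sweetening"),             # illustrative
    ("Sucrose", "is-a", "Sugar"),                          # illustrative
    ("Fructose", "is-a", "Sugar"),                         # illustrative
    ("Sodium benzoate", "is-a", "Benzoate salt"),          # illustrative
]


def classify(statements, relationship):
    """Group entity values by the target of the chosen relationship."""
    groups = defaultdict(list)
    for subject, rel, obj in statements:
        if rel == relationship:
            groups[obj].append(subject)
    return {group: sorted(members) for group, members in groups.items()}


basic = classify(STATEMENTS, "is-a")          # by chemical nature
by_use = classify(STATEMENTS, "is-used-for")  # auxiliary: by use

print(basic)
print(by_use)
```

Because both displays are derived from the same statements, an on-line thesaurus can offer either arrangement on request without storing two separate hierarchies.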
The following pages show the entity types and relationship types for food descriptions that are suggested for a first implementation of the IFD system and the relationship types that should be considered in building the thesaurus. The remainder of this section will discuss each entity type in turn.
Entity types
| Food product, recipe, or standard | Three values are at the top of the hierarchy: food product, food recipe, food standard of identity. A substance value can be used whenever food product is indicated in a relationship. |
| Organism or inorganic food source | |
| Age | |
| Maturity of plant or animal part | |
| Sex | |
| Environment | Values can be used for recording more detail of growing conditions. It presently includes values that distinguish between farm- or garden-produced plants or animals and wild plants or animals. |
| Anatomical part | |
| Substance, material (energy or matter) | |
| Physical state | Values: solid, semisolid, semiliquid, liquid, or more precise viscosity numbers. |
| Physical form | |
| Process | Values cover many different types of processes, e.g. cooking method, incl. storage and handling. |
| Sequence number of process | |
| Place/stage of processing or point in distribution chain | Values include farm/garden, manufacturing plant, retail store, restaurant, home. |
| Consumer group | Characterization by age, sex, ethnic origin, and other relevant characteristics. |
| Purpose or effect, use (including diet by intake) | Sample values are nutrition, preservation, texture, packing. Includes meal type (breakfast, lunch, snack, etc.) and use for special holiday or religious occasions. Also includes the distinction between luxury and everyday foods. When a purpose or effect is required, the value of a process can be given: the purpose is to facilitate the process. |
| Condition | |
| Amount | |
| Property | |
| Geographical area | Includes countries, subdivisions of countries, regions (such as tropics), bodies of water. |
| Organization | Includes food-producing companies, government organizations (as producers of standards), and cookbooks (as "producers" of recipes). |
| Date | |
| Money (for indicating price if desired) | |
| Currency | |
| Container, equipment | Types of containers or equipment, such as aluminum can or kettle made of copper, or specific models. Values are divided into container and equipment and within each into primary components (for example, bottle, mixing bowl) and secondary components (for example, lid, mixing arm). |
Relationship types for food description
| food product | isa | [food product 2] |
| food product | is_one_of | [food product list] |
| food product | is_one_or_more_of | [food product list] |
Labels often include ingredient listings such as
Vegetable oil (one or more of the following: soybean, almond, coconut, or cottonseed oil)
This and the preceding relationship allow for the definition of a corresponding "food product", which can then be used in a has_ingredient statement.
| food product | is_substitute_for | [food product 2] |
| food product | comes_from_source | [food source, age, sex, environment] |
| food product | comes_from_part | [[principal part present, [subparts absent], [subparts unknown], maturity, [[country, grade]]]] |
Often the anatomical part from which a food product is made is most easily described as a principal part from which some subpart was removed, such as
Fruit, seed removed
Fruit, seed removed, skin removed
To make matters worse, often it is not known whether a subpart was removed or is still present. Rather than including all the possible combinations in the thesaurus, this relationship provides a frame for constructing the appropriate combination from three pieces of information as required: The principal part, a list of zero or more subparts known to be absent, and a list of zero or more subparts for which it is not known whether they are present or not. Examples:
Fruit, seed removed, skin unknown; ripe
fp636 comes_from_part [[fruit, [seed], [skin], ripe]]
Fruit, seed removed, skin removed; no statement on maturity
fp737 comes_from_part [[fruit, [seed, skin]]]
The format of the relationship allows for listing several principal parts with their modifiers; this may be more natural than constructing an intermediate food product with two parts as ingredients.
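The frame can be sketched as a small constructor that assembles one list element from the three pieces of information plus maturity. The function names are hypothetical, Python's `None` stands in for an unspecified value, and the `[[country, grade]]` element of the full relationship is left out for brevity.

```python
def part_entry(principal, absent=(), unknown=(), maturity=None):
    """Build one element of a comes_from_part argument list.

    Unspecified elements at the end are dropped, as in the fp737 example.
    """
    entry = [principal, list(absent), list(unknown), maturity]
    while len(entry) > 1 and (entry[-1] is None or entry[-1] == []):
        entry.pop()
    return entry


def describe(entry):
    """Render an entry in the verbal form used in the examples."""
    defaults = [None, [], [], None]
    principal, absent, unknown, maturity = entry + defaults[len(entry):]
    parts = [principal.capitalize()]
    parts += [f"{subpart} removed" for subpart in absent]
    parts += [f"{subpart} unknown" for subpart in unknown]
    text = ", ".join(parts)
    if maturity:
        text += f"; {maturity}"
    return text


fp636 = [part_entry("fruit", absent=["seed"], unknown=["skin"], maturity="ripe")]
fp737 = [part_entry("fruit", absent=["seed", "skin"])]
print(fp636, "->", describe(fp636[0]))
print(fp737, "->", describe(fp737[0]))
```

An entry with no known-absent subparts but an unknown one keeps the empty list in second position (`["fruit", [], ["skin"]]`), so that each element stays at its expected position.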
| food product | is_made_from | food product 2 |
is_made_from is used for foods from a single source where has_ingredient would seem unnatural, such as protein extracted from soybeans or chocolate nibs made from cacao beans by roasting and cracking. In many searches it makes sense to search for
is_made_from OR has_ingredient
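A combined search of this kind might look as follows. The sketch simplifies has_ingredient to a single food product argument, and the "soy burger" record and the `products_from` helper are invented for illustration.

```python
# (food product, relationship, food product 2)
STATEMENTS = [
    ("soy protein", "is_made_from", "soybeans"),
    ("chocolate nibs", "is_made_from", "cacao beans"),
    ("soy burger", "has_ingredient", "soybeans"),  # hypothetical record
]


def products_from(source, statements,
                  relationships=("is_made_from", "has_ingredient")):
    """Find products linked to `source` by any of the given relationships."""
    return sorted(
        product
        for product, relationship, origin in statements
        if relationship in relationships and origin == source
    )


print(products_from("soybeans", STATEMENTS))
```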
| food product | contains | [substance, amount in total, amount in solids, part of definition (yes/no), label claim (yes/no)] |
The relationship contains refers, strictly speaking, to analytical rather than descriptive data. However, many foods are defined in terms of the nutrients or other substances they contain, for example, low-fat milk. Analytical data thus become, by necessity, part of the description of the food. This relationship can be used to integrate analytical data into an IFD database, if desired.
| food product | is_isolated_substance | [isolated substance, degree of concentration] |
The process by which the isolation was accomplished should be indexed separately, using the relationship underwent_process.
| food product | had_removed_substance | [removed substance, degree of removal] |
The process by which the removal was accomplished should be indexed separately, using the relationship underwent_process.
| food product | has_ingredient | [food product, predominance rank, total ingredient amount in total food product, ingredient solid amount in food product solids, [purpose list]] |
Where amount or predominance rank appear in relationships, a more complex structure will be needed to reflect the variability of foods, especially in the description of generic foods. The suggested structure replaces a single value by a list as follows:
[most common value, ["RANGE", upper bound, lower bound], ["LIST", value-1, value-2,..., value-n]]
For example, in the description of the generic food product milk (cow's milk), the amount specification in the contains statement for fat would be
[3.35, ["LIST", 3.35, 2.0, 1.0, 0.2]]
This indicates that the most common fat content is 3.35 g per 100 g (whole milk) and that the permissible values are those given after LIST. This information is presented to the indexer who entered milk as the product to be indexed; the indexer must then choose the correct value for the product at hand or, if that information is not available, accept the imprecise specification given.
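The indexer's choice can be sketched as a check of a proposed value against the specification. The function names are hypothetical, and the handling of RANGE assumes the order given in the structure above (upper bound first, then lower bound).

```python
def parse_amount_spec(spec):
    """Split a specification into most common value, range, and value list."""
    most_common, *constraints = spec
    allowed_range = None
    allowed_values = None
    for tag, *args in constraints:
        if tag == "RANGE":
            upper, lower = args  # order as in the suggested structure
            allowed_range = (lower, upper)
        elif tag == "LIST":
            allowed_values = list(args)
        else:
            raise ValueError(f"unknown constraint: {tag}")
    return most_common, allowed_range, allowed_values


def is_permissible(value, spec):
    """Check a value against the RANGE and LIST constraints, if present."""
    _, allowed_range, allowed_values = parse_amount_spec(spec)
    if allowed_values is not None and value not in allowed_values:
        return False
    if allowed_range is not None:
        lower, upper = allowed_range
        if not lower <= value <= upper:
            return False
    return True


def resolve(value, spec):
    """Return the indexer's value, or the imprecise spec if none is known."""
    if value is None:
        return spec
    if not is_permissible(value, spec):
        raise ValueError(f"{value} is not permitted by {spec}")
    return value


MILK_FAT = [3.35, ["LIST", 3.35, 2.0, 1.0, 0.2]]
print(resolve(2.0, MILK_FAT))
print(resolve(None, MILK_FAT))
```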
If the point in the process and/or the manner of addition of the ingredient are important, use a separate underwent_process statement with addition or a narrower term as the process and the ingredient as the agent.
| food product | may_have_ingredient | (as has ingredient) |
| food product | contains_dish | food product 2 |
This relationship serves to describe TV dinners and such.
| food product | underwent_process | [process, [agent list], equipment, place/stage, sequence no., [purpose list]] |
An agent of a process may be an organism, a substance, or a food product. If the agent is also an ingredient (if the agent stays in the food), use a separate has_ingredient statement.
A product that underwent several processes can be indexed in two ways.
(1) Index as one food product, giving for it all the processes with their sequence number.
(2) Define several intermediate food products and index each separately; each product becomes the ingredient of the product next in the processing stages.
Option 2 should be used when the intermediate stages are products in their own right.
The presence of both options in a database must be considered in searching.
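One way to account for both options when searching is to collect the processes of a product together with those of its ingredients, recursively. The records below are invented for illustration (the same chocolate nibs indexed once under each option), and has_ingredient and underwent_process are reduced to their first arguments.

```python
# product -> list of (sequence no., process)
UNDERWENT_PROCESS = {
    "nibs (option 1)": [(1, "roasting"), (2, "cracking")],
    "roasted cacao beans": [(1, "roasting")],  # intermediate product
    "nibs (option 2)": [(1, "cracking")],
}

# product -> list of ingredient products
HAS_INGREDIENT = {
    "nibs (option 2)": ["roasted cacao beans"],
}


def all_processes(product, seen=None):
    """Collect processes of a product and of all its (nested) ingredients."""
    seen = set() if seen is None else seen
    if product in seen:  # guard against circular ingredient chains
        return set()
    seen.add(product)
    found = {process for _, process in UNDERWENT_PROCESS.get(product, [])}
    for ingredient in HAS_INGREDIENT.get(product, []):
        found |= all_processes(ingredient, seen)
    return found


print(all_processes("nibs (option 1)"))
print(all_processes("nibs (option 2)"))
```

With this expansion a search for roasted products retrieves the nibs regardless of which indexing option was used.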
| food product | has_state | [physical state, temperature] |
In this relationship a missing temperature value is interpreted as the default value room temperature (20 degrees Celsius). There can be several has_state statements for a food product; at least one must be for room temperature.
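The default and the room-temperature requirement can be sketched as a normalization step. The function name and the sample states are assumptions; only the 20 degrees Celsius default and the "at least one" rule come from the text.

```python
ROOM_TEMPERATURE_C = 20


def normalize_states(statements):
    """Fill in the default temperature and enforce the room-temperature rule.

    Each statement is [physical state] or [physical state, temperature].
    """
    normalized = []
    for statement in statements:
        state = statement[0]
        temperature = statement[1] if len(statement) > 1 else None
        if temperature is None:
            temperature = ROOM_TEMPERATURE_C
        normalized.append((state, temperature))
    if not any(t == ROOM_TEMPERATURE_C for _, t in normalized):
        raise ValueError(
            "at least one has_state statement must be for room temperature"
        )
    return normalized


# Hypothetical product that is solid at room temperature and liquid at 40 °C.
print(normalize_states([["solid"], ["liquid", 40]]))
```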
| food product | has_form | [physical form, temperature] |
| food product | has_property | property |
| food product | is_for_special_use | [purpose, label claim (yes/no)] |
| food product | made_for | consumer group |
| food product | is_used_for | [purpose, priority, season or day of year, [country list]] |
| food product | has_price | [money, currency] |
| food product | produced_in | [geographical area, date] |
If the statement is about a food standard, geographical area is the country or other area in which the standard holds, and date refers to the date it was promulgated.
| food product | produced_by | organization |
If the statement is about a food standard, organization is the organization that developed the standard.
| food product | packed_in | container |
Relationship types for container and equipment description
| container or equipment | consists-of | [[container or equipment component, structural strength material, inside coating, outside coating]] |
The container description is a list. The elements of the list are in turn lists with four elements each, as indicated. The first component given is the primary component, such as bottle, box, or tray. The remaining components (if any) are secondary components, such as lid, ends, window, liner, top cover. The primary component implies a form.
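As a rough sketch, the list of four-element lists could be represented as follows. The class, the helper, and the sample materials are hypothetical; only the four positions and the primary/secondary distinction come from the text.

```python
from typing import NamedTuple, Optional


class Component(NamedTuple):
    component: str
    strength_material: Optional[str] = None
    inside_coating: Optional[str] = None
    outside_coating: Optional[str] = None


def split_components(description):
    """Return (primary component, list of secondary components)."""
    if not description:
        raise ValueError("a container description needs a primary component")
    primary, *secondary = description
    return primary, secondary


# Hypothetical description: a glass bottle with a lacquered steel lid.
BOTTLE = [
    Component("bottle", "glass"),
    Component("lid", "steel", "lacquer"),
]

primary, secondary = split_components(BOTTLE)
print(primary.component, [c.component for c in secondary])
```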
Relationship types to be used in the thesaurus
| entity | isa | entity |
| entity | is_part_of | entity |
| entity | overlaps_with | entity |
| [organism, part] | is_used_for | [purpose, priority, [country list]] |
| organism | lives_in | environment |
| substance | is_measured_in | [unit 1, conversion factor, unit 2] |
Unit 1 is the unit of measurement used, generally kg, g, mg, or µg. The conversion factor serves, for example, to convert international vitamin units to mg or µg.
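A minimal C sketch of how the conversion factor might be applied. It assumes that an amount in unit 1 is multiplied by the factor to obtain the amount in unit 2; the text does not state the direction, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for:
   substance is_measured_in [unit 1, conversion factor, unit 2]. */
typedef struct {
    const char *unit1;   /* unit in which amounts are recorded */
    double factor;       /* assumed: amount in unit 2 = amount in unit 1 * factor */
    const char *unit2;   /* unit to convert to */
} MeasuredIn;

/* Converts an amount recorded in unit 1 to unit 2. */
double to_unit2(const MeasuredIn *m, double amount_in_unit1)
{
    assert(m != NULL);
    return amount_in_unit1 * m->factor;
}
```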
| substance | has_daily_requirement | [user group, minimum amount, recommended amount, maximum amount, [country list]] |
| substance | is_used_for | [purpose, priority, food product, [country list]] |
| [substance, consumer group, quantity] | is_harmful_for | [harmful effect, strength, food product, [country list]] |
| process | is_used_for | [purpose/effect, priority, food product, [country list]] |
| process | is_harmful_for | [harmful effect, strength, food product, [country list]] |
| substance | is_soluble_in | [substance] |
Notes on the conceptual schema
1. Many food descriptions will not include the level of detail that is possible with the schema. The level of detail is determined by the amount of information that is available. The schema defines an upper limit in the sense that one should not aim at descriptions that are more detailed than the schema provides.
2. Most relationships have one element to the left and an argument list at the right; the argument list may contain lists as elements. The meaning of an element in a list is normally specified by its position. Elements at the end of a list may be omitted if they are not specified. However, if one element remains unspecified and an element further to the right in the list is specified, the unspecified element must be represented by a place holder to maintain the proper position for the specified element.
3. The structure of the relationships is determined by ease of internal processing. In the finished system the user will input the information needed on formatted screens and the system will put together the relationship.
4. Any relationship can be used to express the absence or negation of something, for example,
| fp383 | has_ingredient | ["NOT", [salt]] |
In general: The right-hand list has two arguments, "NOT" and a list which has the same structure as the argument list of the unnegated relation.
5. The source of data, the date of analysis, the date of data entry, the person responsible for entering the data, and comments can all be handled through auxiliary relationships. For this purpose the left-hand side of a relationship is expanded to a pair of the form [entity value, statement number] (for example, [food product, statement number] or [substance, statement number]).
| [entity, stno] | has_entry_info | [entry date, entry person] |
| [entity, stno] | has_source_info | [[source, date of analysis]] |
| [entity, stno, specific element] | has_comment | text |
Here, specific element names the element of a statement to which the comment refers; this may not be needed.
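Notes 2 and 4 can be illustrated with a short C sketch. It assumes a NULL pointer as the place holder for an unspecified element and a flag for the "NOT" wrapper; these representations and all names are choices made for the example, not part of the schema.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical place holder for an unspecified element that keeps its position. */
#define UNSPECIFIED NULL

/* Number of elements to write out: trailing unspecified elements are
   omitted, inner ones remain as place holders so positions stay correct. */
size_t written_length(const char *const args[], size_t n)
{
    assert(args != NULL || n == 0);
    while (n > 0 && args[n - 1] == UNSPECIFIED)
        n--;
    return n;
}

/* A statement whose right-hand side may be wrapped as ["NOT", [args...]]. */
typedef struct {
    const char *subject;        /* e.g. "fp383" */
    const char *relationship;   /* e.g. "has_ingredient" */
    int negated;                /* 1 if the argument list is wrapped in "NOT" */
    const char *const *args;    /* same structure as in the unnegated relation */
    size_t n_args;
} Statement;

/* Returns 1 if the statement asserts the absence of value (first argument). */
int states_absence_of(const Statement *s, const char *value)
{
    assert(s != NULL && value != NULL);
    return s->negated && s->n_args > 0 && s->args[0] != UNSPECIFIED
           && strcmp(s->args[0], value) == 0;
}
```

For an is_used_for list [purpose, unspecified, season], written_length returns 3 because the inner place holder must be kept, while a trailing unspecified country list is simply dropped.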
Further elaboration of the entity types
Food product, recipe, or standard
Food product is the central entity type of an IFD database. Food products are described by indicating their relationships to other food products and to entities from other entity types. Every food product description starts with a statement of the form
Food product 1 is a Food product 2.
Further statements then record the characteristics that are added to the description of food product 2 to result in the more specific description of food product 1.
The thesaurus contains a number of very generic food products called product types. The product types serve as a starting point in the description of more specific foods. They also provide a useful overview of the major types of food (food groups). Since different classifications of food groups are used for different purposes and by different organizations, the thesaurus may contain several such classifications with one designated as the major one for purposes of this system.
The thesaurus entry for a product type is a food product description following the same rules as the descriptions of specific foods in the database. The only difference is that the names of product types appear in the thesaurus listing whereas the names of more specific food products do not. All food products, whether product types or more specific products, are accessible online and can be displayed in any selection and arrangement desired (within the capabilities of the system).
Development of this part of the thesaurus should start from the FFV factor Product type. Since many product types are defined based on usage, many values of the entity types Purpose or effect and Meal type must be introduced to allow for proper description of all product types.
Organisms and inorganic sources
The scope of this entity type comprises food-relevant organisms, that is organisms used as a source for food, used for the treatment of food, or having a harmful effect on food. It also includes the values Mining (for products such as salt and water) and Chemical synthesis. This is somewhat broader than the definition of the FFV factor Food source.
The basic classification of organisms should follow taxonomic arrangement. Standard taxonomy provides a common frame of reference, which is of particular importance in an international system.
Application-oriented classifications are also very important, perhaps more important than the taxonomic classification. They are accomplished through relationships of the type
(organism, part) is-used-for (purpose, priority, country-list)
where the entity type purpose includes such values as fruit, berry, vegetable, and grain. The purpose slot can also be filled by pairs of the form (extraction, substance). Priority gives some indication of what is the main use and what are minor uses of an organism. The optional country-list allows for differences from country to country in these data. If it is used consistently, each country can have its own application-oriented classification of organisms. There must be at least one relationship of the type is-used-for for every organism. This takes care of the classification of Plant used as food source in FFV, which is very helpful.
A further relationship is useful:
organism lives-in environment,
where values for environment are initially limited to
Land (soil), Water, Sea water, Fresh water.
The relationship Organism lives-in Land is the default. Explicit statements need be included only for organisms that live in water, where the specific terms Sea water and Fresh water are used whenever possible. This relationship makes it possible to search for all seafood.
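The default rule for the environment can be sketched in C as follows. The enum, the struct, and the function names are hypothetical; the four environment values are those listed above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ENV_LAND, ENV_WATER, ENV_SEA_WATER, ENV_FRESH_WATER } Environment;

typedef struct {
    const char *name;
    int has_lives_in;    /* 0: no explicit lives-in statement recorded */
    Environment env;     /* meaningful only if has_lives_in is 1 */
} Organism;

/* Environment of an organism, with Land as the default. */
Environment environment_of(const Organism *o)
{
    assert(o != NULL);
    return o->has_lives_in ? o->env : ENV_LAND;
}

/* Counts organisms living in Water or one of its narrower terms. */
size_t count_water_organisms(const Organism *list, size_t n)
{
    assert(list != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (environment_of(&list[i]) != ENV_LAND)
            count++;
    return count;
}
```

A seafood search would combine such a test with the is-used-for relationship, so that only water organisms used as a food source are returned.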
Development of this part of the thesaurus starts from the FFV factor Food source. The Latin names, and thus the taxonomic location, for each organism must be determined (use Hortus 3 as authority for plant names). In some cases expansion into subspecies, varieties, and cultivars may be indicated. Additional organisms that are used as food source will be identified as the database grows. The FFV classification supplies most of the information needed for the application-oriented classification from a US point of view. These data can be augmented as the system develops.
Food-relevant organisms that are not food sources must be identified and included with appropriate is-used-for relationships. Such organisms can be identified from the CFR (for example, bacteria and molds used in making cheese) and from textbooks and handbooks in food science and technology.
Ancillary entity types Age, maturity (values to be determined) and Sex.
Environment
At this point Environment is recommended only for thesaural relationships making general statements about the organisms included in the thesaurus as discussed above and not for making statements that specify the conditions (e.g. soil type) under which the specific plants or animals used for the food product being described were raised. Therefore, the only values to include here for now are
Land (soil), Water, Sea water, Fresh water
Anatomical part
This section contains an anatomic classification of plant parts and of animal parts at a level of detail appropriate for food use. Skeletal meat parts may be further subdivided into cuts. Since different countries use different meat cuts, several parallel classifications may be needed. As an aid to the user one may develop a very fine-grained classification of base cuts such that any conventional cut used in any country can be represented as a group of base cuts.
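One possible way to represent conventional cuts as groups of base cuts is a bit set with one bit per base cut. The following C sketch assumes that the number of base cuts fits into one unsigned long; the type and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* One bit per base cut; which base cuts exist is left open here. */
typedef unsigned long CutSet;

/* The set containing only the base cut with the given index. */
CutSet base_cut(unsigned index)
{
    assert(index < sizeof(CutSet) * CHAR_BIT);
    return 1UL << index;
}

/* A conventional cut is the union of the base cuts it covers. */
CutSet cut_union(CutSet a, CutSet b)
{
    return a | b;
}

/* 1 if every base cut of inner also belongs to outer. */
int cut_contains(CutSet outer, CutSet inner)
{
    return (outer & inner) == inner;
}

/* 1 if two conventional cuts share at least one base cut. */
int cuts_overlap(CutSet a, CutSet b)
{
    return (a & b) != 0;
}
```

With this representation, cuts from two national classifications can be compared directly, for example to find the cuts of one country that overlap with a given cut of another.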
In FFV there are two ancillary entity types, Presence of bone or shell (values With bone and Boneless) and Meat shade (values White, Light, and Dark). The first can be handled by the conceptual schema presented here; the second is a genuine entity type.
The classification can be taken from the FFV factor B2, the classifications for the ancillary entity types from factor Z. The Extract, concentrate, or isolate terms from B2 need not be included; this is handled through the relationship is extracted substance.
Substance/material
The IFD thesaurus must contain a comprehensive list of food-relevant substances that
- naturally occur in foods,
- occur in foods through migration in the food chain,
- are added to foods for whatever purpose,
- are used to treat foods,
- are used in food-processing equipment, or
- are used in any component of packaging.
The basic classification of substances should reflect their chemical nature, choosing from the many ways in which substances can be classified chemically the one that is most useful in the context of foods. Beyond that, the system should rely on a chemical substructure search system, such as the CAS Registry, with an option to limit output of search results to substances in the IFD thesaurus.
An application-oriented classification is very important here. While the data for the basic classification are by and large the same as those found in a source like CAS, the application-oriented classification is the real contribution of the IFD thesaurus. It should be based on the relationship
substance is used for (purpose, priority, food product)
and on the relationship
substance is harmful for (harmful effect, strength).
For each substance there may be, and usually are, multiple is-used-for and/or is-harmful-for statements. The purpose slot can accept a combination of the form (aiding-in, process) or (counter-acting, process). The food product is generally specified at a very high level (food group).
Substance per se is not a factor in the FFV. A good starting point for developing this section of the thesaurus is FDA/CFSAN's chemical dictionary, which gives verified CAS numbers and names for almost any food-relevant substance. Data for the relationship is-used-for can be obtained from the FFV by virtue of a substance being listed in a factor, and from the CFR. Substances occur in the following FFV factors: B1 Food source (in connection with plants used for producing certain substances), B2 Part of plant or animal (in connection with extracts), D3 Treatment applied (primarily as Ingredient added, but also as substances used in processing), D4 Preservation (preservatives), E2 Container (they are organized first by material), and E3 Food contact surface. All substances listed in one of these two sources should be included in the IFD thesaurus.
Physical state
The following classification, taken essentially from the FFV factor C Physical state, shape, or form, should be satisfactory.
Liquid
    Liquid, low viscosity
        Liquid, low viscosity, no visible particles
        Liquid, low viscosity, with very small particles
    Liquid, high viscosity
        Liquid, high viscosity, no visible particles
        Liquid, high viscosity, with very small particles
Semiliquid
    Semiliquid with smooth consistency
    Semiliquid with very small particles
Semisolid
    Semisolid with smooth consistency
    Semisolid with very small particles
Solid
    Soft
    Hard
The internal structure seems less important for solid products. However, if it is deemed important, appropriate values can be added.
The FFV term Liquid, low viscosity, with small pieces and the other corresponding terms are not needed here. The product would be indexed as consisting of two ingredients, and the physical state (and the physical form) would be indexed separately in the description of each ingredient. (There is still a problem here that needs to be resolved: The ingredient physical state and form should be indexed at the point of entry into the food being described, while for the purposes here one would need the physical state and form at the end of the processing, in the finished product.)
Physical form
In the long run, an integrated list of physical form terms applicable to food products as well as to containers should be developed. For now it is easiest to have two subdivisions:
Physical form of food product
Physical form of container
The first subdivision can be taken from the subdivision of Solid in the FFV. Values for the second subdivision can be derived from the container terms in the FFV factor E2 Container or wrapping by subtracting the material component.
Process
This is a very important entity type. It covers all processes used in the manufacture/preparation, preservation, and handling of foods. The base classification should reflect the intrinsic nature of the processes. One possible broad subdivision is
- Physical process (with heat process as a major narrower term)
- Chemical process
- Biochemical process
Application-oriented classifications can be based on the following two relationships
process is-used-for (purpose, priority, food product)
and
process is-harmful-for (harmful-effect, strength, food product).
Finding all these relationships while simultaneously adding values to the list for the entity type purpose/harmful effect will require considerable effort.
The FFV does not have a process factor per se. The factors in group D (D1 Degree of preparation, D2 Cooking method, D3 Treatment applied, and D4 Preservation) all include many processing terms. For the beginning, the terms found there might suffice.
Consumer group
The values in this entity type are in turn composites of three factors: organism, age, and sex. Organism in this context distinguishes between human and animal. Animal species are included in the entity type organism anyhow. The most logical course of action is to include Human there as well (possibly subdivided by race if such subdivision is food-relevant). Note that Human is needed anyhow as the source of human milk. The most important combinations should be included in the thesaurus so that they can be used as if they were elemental values. The less frequent combinations must be built during indexing as they are needed.
Purpose/harmful effect
For economy of entity types, these two can be combined into one type. This entity type comprises all the purposes or effects to be achieved in the production, processing, and consumption of foods, and all the harmful effects that can be caused by food-relevant substances.
At a minimum, this section of the thesaurus must contain the purposes/harmful effects needed for the application-oriented classification of organisms, substances, and product types.
The basic classification should first make the division between purpose/useful effect and harmful effect, and then group by area, such as nutrition, preservation, appearance, etc. An alternate classification might group, for example, all physiological effects, whether useful or harmful.
This may be the most difficult part of the thesaurus to develop since there is no explicit list to start from. Purposes are implied by the classification in the FFV factors Product type and Food source, by the factors Preservation, Packing medium, Container, and Food contact surface, and by the classification of food additives and other mention of substances in the CFR. Explicit purpose terms must be extracted from these sources. It is recommended that purpose terms be accumulated as is-used-for statements about organisms, substances, and product types are entered.
This entity type includes
Use/diet
This entity type is concerned with special uses of foods, especially conditions, such as diseases, for which the food is helpful or at least can be tolerated. It covers some of the values under Consumer group in FFV. However, the descriptors such as Low sodium are handled through the relationship
Food product contains (Substance,...)
with the value of Label claim being yes.
Amount
This entity type serves to record quantitative ingredient or nutrient information where available. Amount is recorded as a number per 100 g. The unit of measurement is linked once and for all to the substance for which the amount is recorded; therefore, there is no need to give the unit of measurement for each individual amount recorded.
Sometimes amounts are given imprecisely, such as "low in sodium". The following special values of the entity type amount are introduced to make recording of this type of information possible.
None
Very low
Low
Medium low
Normal
Medium high
High
For substances for which standards exist, these terms can be linked with a range of values through a relationship to be defined for this purpose.
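The link between the special values and a range of values might look as follows in C. This is a sketch under assumptions: the relationship is still to be defined, so the range table LevelRanges and the function level_for are hypothetical, and any bounds stored in the table would have to come from the applicable standard.

```c
#include <assert.h>
#include <stddef.h>

/* The special values of the entity type Amount, in ascending order. */
typedef enum {
    AMT_NONE, AMT_VERY_LOW, AMT_LOW, AMT_MEDIUM_LOW,
    AMT_NORMAL, AMT_MEDIUM_HIGH, AMT_HIGH
} AmountLevel;

/* Hypothetical range table for one substance: upper[l] is the largest
   amount per 100 g that still counts as level l, for None..Medium high.
   Everything above upper[AMT_MEDIUM_HIGH] counts as High. */
typedef struct {
    double upper[AMT_HIGH];
} LevelRanges;

/* Maps a numeric amount per 100 g to the corresponding special value. */
AmountLevel level_for(const LevelRanges *r, double per_100g)
{
    assert(r != NULL && per_100g >= 0.0);
    for (int level = AMT_NONE; level < AMT_HIGH; level++)
        if (per_100g <= r->upper[level])
            return (AmountLevel)level;
    return AMT_HIGH;
}
```

Such a mapping would allow a search for foods "low in sodium" to retrieve both products indexed with the value Low and products indexed with a numeric amount in the corresponding range.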
Property
This entity type is introduced to capture any properties not taken care of otherwise. Its use and the values needed must be developed over time.
Geographical area
Countries could be represented by their three-digit telephone code. Each country would define a hierarchy of regions (possibly two or three ways of subdividing a country, each adapted for a different type of food, such as wine and cheese in France). The codes for the within-country regions are appended to the country code. A set of special codes for large regions, such as the tropics, is to be developed.
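Because region codes are appended to the country code, containment can be tested by comparing code prefixes. The C sketch below assumes that codes are stored as digit strings and that all codes at the same level of a hierarchy have the same width; the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if area is ancestor itself or a region inside it, judged
   by whether the code of ancestor is a prefix of the code of area. */
int area_is_within(const char *area, const char *ancestor)
{
    assert(area != NULL && ancestor != NULL);
    return strncmp(area, ancestor, strlen(ancestor)) == 0;
}
```

A search limited to one country then retrieves all statements whose geographical area code starts with that country's code.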
Money and Currency
Money is represented as numbers with two places after the decimal point, and currency by the country code (with a table giving the proper designation).
Container
The situation here is similar to that for Food product. The system provides the elements for describing the containers by giving the structural strength material(s), the coating material(s), and the container form. An indexer can make up such a description on the spot and then reference it in a food description. However, the thesaurus should include a list of containers with ready-made descriptions corresponding to the FFV factor E2.
The discussion of the entity types Substance/material and Physical form already mentioned that the list of entity values included must consider the needs of container description.
The text of this appendix is found in the file
SoftcritEnglish.pdf
Two files constitute this appendix:
FAOSimplifiedDataSchemas.doc
FAODataSchema3Full.xls
The simplified schemas are stripped down to bare essentials, omitting whole tables and data fields of a mainly administrative nature, primary key declarations, and index declarations, so that it is easier to see the structure.
The full schema comparison is given in a spreadsheet. It is organized along the MARTIF categories.
As can be seen, FAO Term and FAO Glossary have very similar data structures. AGROVOC agrees in many aspects, with one major difference:
In FAO Term and FAO Glossary, there is one record for a term with a separate data field for the term in each of the six languages included. The same is true for definitions. In AGROVOC, on the other hand, the record for a term has a field for the term (in whatever language) and a field for the language; each language version of the term has its own record, tied together by the term number. AGROVOC's solution is more elegant in this case; it has simpler table definitions and it can add new languages without having to redefine tables.
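The difference between the two record layouts can be sketched with C structures. The field names, the language codes, and the term number used in any example data are assumptions made for this illustration; they are not taken from the actual schemas.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_LANGUAGES 6

/* FAO Term / FAO Glossary style: one record per term with one field
   per language; adding a language means changing this definition. */
typedef struct {
    long term_no;
    const char *term[N_LANGUAGES];
} WideTermRecord;

/* AGROVOC style: one record per language version of a term; the
   versions are tied together by the term number. */
typedef struct {
    long term_no;
    const char *language;
    const char *term;
} LongTermRecord;

/* Finds the term for a term number in a given language, or NULL. */
const char *find_term(const LongTermRecord *recs, size_t n,
                      long term_no, const char *language)
{
    assert(recs != NULL || n == 0);
    assert(language != NULL);
    for (size_t i = 0; i < n; i++)
        if (recs[i].term_no == term_no
            && strcmp(recs[i].language, language) == 0)
            return recs[i].term;
    return NULL;
}
```

In the second layout a new language is added simply by storing more records; find_term returns NULL for a language that has no record yet.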
/* Capital conversion. */
/* Principle: Capital letters are converted to lower case with the following exceptions:
   - If any letter but the first in a word is a capital, no character in the word is changed.
   - If a capital letter at the beginning of a word is preceded by '^^', it is not changed.
   - If the capital letter is a word by itself, it is not changed.
   - If the capital letter is followed by any punctuation character or digit, it is not changed.
   - If a single word is in the database starting with a capital, the capital is preserved. */

/* Determine capitalization of a word within the whole term. (Capitalization of the individual word for database lookup and storage is determined later.) Compute gToLower based on many conditions that protect upper case. */
/* gToLower  0  capital left as is
             1  capital converted to lower case for word as such (sSingular), but not for word in term (apsTerm[n][i])
             2  capital converted to lower case in both places. */
/* If the word starts with lower case and/or falls under one of the protection conditions, gToLower is 0. Otherwise, gToLower may be 1 or 2, depending on other conditions. */
if (strlen(apsTerm[n][i]) == 1
    || islower(apsTerm[n][i][0])
    || gNoMod
    || strlen(asWordCaret[i]) >= 2
       /* A single caret maintains the initial capital of the word as part of the term, but not of the word stored as such. */
    || ispunct(apsTerm[n][i][1])
    || isdigit(apsTerm[n][i][1])               /* Strings like A1, A-1. */
    || strpbrk(apsTerm[n][i] + 1, sCapital))   /* Inner capital. */
{
    gToLower = 0;
}
else
/* The word starts with a capital and none of the protection conditions apply. */
{
    switch (cCapAlgorithm)
    {
    case 'n':
    case 'N':
        gToLower = 1;
        /* Capitalized word, preceded by at most one caret; capitalized as part of term, lower case as individual word. */
        break;
    case 'p':
    case 'P':
        /* In partial capital conversion only the first word is converted, but not if all other words (excluding stopwords) in a multi-word term are capitalized. */
        if (i == 1)
        {
            /* Determine whether all other non-stop words start with a capital; if so, keep capital on first word. */
            gAllCap = 1;
            for (k = 2; k <= iNumberOfWords; k++)
            {
                if (islower(apsTerm[n][k][0]) && !stopword(apsTerm[n][k]))
                /* The word starts with lower case and is not a stopword. */
                {
                    gAllCap = 0;
                    break;
                }
            } /* End for. */

            if (iNumberOfWords == 1)
                gAllCap = 0;
            if (!gAllCap)
                if (strlen(asWordCaret[i]) == 0)
                    gToLower = 2;
                else
                    gToLower = 1;
        } /* End if (i == 1). */
        else
            gToLower = 1; /* Unprotected individual word l.c. */
        break;
    case 'f':
    case 'F':
        if (strlen(asWordCaret[i]) == 0)
            gToLower = 2;
        else
            gToLower = 1;
        break;
    } /* End switch (cCapAlgorithm). */
} /* End else (unprotected initial capital). */

if (gToLower == 2)
    apsTerm[n][i][0] = tolower(apsTerm[n][i][0]);
/* End of capital conversion for the word in the whole term. May still need to be revised as part of database lookup. Still need to derive the word as it is to be stored individually. */

zeroset(sWord);
strncpy(sWord, apsTerm[n][i], sizeof(sWord) - 1);
/* apsTerm[n][i] remains the word as it appeared in the term, except that capitalization has been adjusted when necessary, including words that need to be always cap. */
if (gToLower == 1)
    sWord[0] = tolower(sWord[0]);
/* If gToLower is 2, apsTerm[n][i] is already converted to lower case. This must be done here, because this is the form to be looked up in the database. */

/* Now look up individual word in database; if gPost, create new term record if needed.
   Check in database whether term needs to be always cap. May need to reverse a conversion to lower; if original lower (as seen from gToLower) needs to be cap, need a message. */

if (stopword(sWord))
    /* Note: stopword is case-sensitive. */
    return -1;
/* Word is not a stopword. Search for word itself. */
/* Note: to_singular checks for occurrence in database, but not with the examination of capitalization as done here. */
if (btrieve_term_term(iGreaterEq, sWord) >= 0)
    if (strcmp(sWord, rTermRec.sTerm) == 0)
        /* Exact same term with same capitalization found. */
        gMatch = 2;
    else /* Not an exact match. */
        if (strcmpi(sWord, rTermRec.sTerm) == 0
            && strcmp(sWord + 1, rTermRec.sTerm + 1) == 0)
            /* Same term with case-insensitive comparison, exact same except for first character. */
            gMatch = 1;
        else
            gMatch = 0;
else
    gMatch = 0;
/* If there is a match, must still take care of words where singular is stem + suffix, esp. final y, such as psychology. */
if (gMatch)
{
    zeroset(sSuffix);
    strncpy(sSuffix, rTermRec.sTerm + rTermRec.iStemLength, iSuffixLength);
    zeroset(sSingular);
    strncpy(sSingular, rTermRec.sTerm, iTermLengthMax);
    iStemLength = rTermRec.iStemLength;
}
else
/* If there is no match, find the singular and try it, if it is different. */
{
    if (gNoMod || asWordBackSlash[i][0])
    /* No modification. */
    {
        zeroset(sSingular);
        strncpy(sSingular, sWord, sizeof(sSingular) - 1);
        sSuffix[0] = 0;
        iStemLength = strlen(sSingular);
    }
    else
        /* Note: need to do this even if no final s to get final y as sSuffix. */
        iStemLength = to_singular(sWord, sSingular, sSuffix);
    if (strcmp(sWord, sSingular) != 0)
        /* No use repeating the look-up if word was singular. */
        if (btrieve_term_term(iGreaterEq, sSingular) >= 0)
            if (strcmp(sSingular, rTermRec.sTerm) == 0)
                /* Exact same term found. */
                gMatch = 2;
            else /* Not an exact match. */
                if (strcmpi(sSingular, rTermRec.sTerm) == 0
                    && strcmp(sSingular + 1, rTermRec.sTerm + 1) == 0)
                    gMatch = 1;
                else
                    gMatch = 0;
        else
            gMatch = 0;
}
iStemDiff = strlen(sSingular) - iStemLength;
switch (gMatch)
{
case 2: /* Exact match in database. */
    lFoundNo = rTermRec.lTermNo;
    if (islower(apsTerm[n][i][0]))
        /* No action necessary. */;
    else /* Initial cap. */
    {
        if (rTermRec.cAlwaysCap)
        /* Make sure initial cap is protected. */
        {
            if (strlen(asWordCaret[i]) < 2)
            {
                zeroset(asWordCaret[i]);
                strncpy(asWordCaret[i], "^^", sizeof(asWordCaret[i]) - 1);
            }
        }
        else
        {
            if (strlen(asWordCaret[i]) == 3 && gPost)
            /* Set rTermRec.cAlwaysCap to 1 in current term record. */
            {
                rTermRec.cAlwaysCap = 1;
                btrieve_term_term(iUpdate, "");
            }
        }
    }
    break;
case 1:
    /* Partial match in database, match in everything except capitalization of first character. */
    /* This can only happen if the first character of the word is lower case and the first character of the term in the database is upper case, and if the database does not contain the term with a starting lower case. Reasons: A lower-case word would match a lower-case term in the database exactly. If there is no lower-case matching term, next in sequence is the matching upper-case term (if in the database), otherwise a completely different term. If the word starts with upper case, an upper-case matching term would match exactly. A lower-case matching term would be earlier in the sequence and thus not be found by GreaterEq; thus, if there is no exact match for the upper-case word, the term found by GreaterEq is a completely different term, leading to gMatch = 0. */

    if (rTermRec.cAlwaysCap)
    /* Word must be first cap, unless input with lower case, protected. */
    {
        if (gToLower)
        /* Word was capitalized in input, must restore capital. */
        {
            lFoundNo = rTermRec.lTermNo;

            if (gToLower == 2)
            {
                apsTerm[n][i][0] = toupper(apsTerm[n][i][0]);
            }
            sSingular[0] = toupper(sSingular[0]);
            zeroset(asWordCaret[i]);
            strncpy(asWordCaret[i], "^^", sizeof(asWordCaret[i]) - 1);
        }
        else
        /* Word was not capitalized in input, must capitalize unless preceded by ^^. */
        {
            if (strlen(asWordCaret[i]) >= 2)
            /* Set rTermRec.cAlwaysCap to 0 in the term in rTermRec. Leave word being processed alone; a term record must be posted to the database. */
            {
                if (gPost)
                {
                    rTermRec.cAlwaysCap = 0;
                    btrieve_term_term(iUpdate, "");
                }
            }
            else /* Must capitalize word and print message. */
            {
                lFoundNo = rTermRec.lTermNo;
                apsTerm[n][i][0] = toupper(apsTerm[n][i][0]);
                sSingular[0] = toupper(sSingular[0]);
                fprintf(fpLoadReport1,
                        "%s was not capitalized in line\n%s\n",
                        apsTerm[n][i], sLineOrig);
                fprintf(fpLoadReport1,
                        "Program changed first letter to capital.\n\n");
                fprintf(fpLoadReportCum,
                        "%s was not capitalized in line\n%s\n",
                        apsTerm[n][i], sLineOrig);
                fprintf(fpLoadReportCum,
                        "Program changed first letter to capital.\n\n");
            }
        } /* End if (gToLower). */
    } /* End positive if (rTermRec.cAlwaysCap). */
    break;
} /* End switch (gMatch). */
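The protection conditions at the start of the listing can be isolated in a small self-contained function, shown here as a simplified sketch rather than as part of the original program. The function name and the caret_count parameter are hypothetical; unlike the listing, the sketch tests for inner capitals with isupper instead of strpbrk against sCapital, treats any word that does not start with a capital as having nothing to convert, and ignores gNoMod.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the initial letter of word must be left unchanged,
   0 if it is an unprotected capital that may be converted to lower
   case. caret_count is the number of '^' characters that preceded
   the word in the input. */
int capital_is_protected(const char *word, int caret_count)
{
    assert(word != NULL);
    size_t len = strlen(word);

    if (len <= 1)
        return 1;                       /* capital is a word by itself */
    if (!isupper((unsigned char)word[0]))
        return 1;                       /* no initial capital to convert */
    if (caret_count >= 2)
        return 1;                       /* preceded by '^^' */
    if (ispunct((unsigned char)word[1]) || isdigit((unsigned char)word[1]))
        return 1;                       /* strings like A1, A-1 */
    for (size_t i = 1; i < len; i++)
        if (isupper((unsigned char)word[i]))
            return 1;                   /* inner capital */
    return 0;
}
```

The casts to unsigned char keep the character classification calls well defined for characters outside the ASCII range, which matters for terms in languages other than English.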
Files Mockup1.pdf, Mockup2.pdf, Mockup3.pdf
Note: Internal to Harvard Business School. For internal use of the FAO thesaurus and ontology group only.
The mockups show a series of screens that demonstrate a type of thesaurus display that always gives the user an overall view of the thesaurus structure. They show a progression from a hierarchical outline to the "quick hierarchy" display (just descriptors, no annotations) and finally to the "annotated hierarchy" (descriptors with all annotations: definitions, synonyms, and conceptual relationships). Clicking on a descriptor in one display brings up the next more detailed display centered around the descriptor.
Information about the descriptor and relationships in other KOS can be shown in an expanded annotated hierarchy.
A similar thesaurus display can be seen in operation on the Web site for the
Alcohol and Other Drug Thesaurus, http://etoh.niaaa.nih.gov/AODVol1/aodthome.htm
Most KOS interfaces on the Web show the user only a small window on the scheme, which makes it difficult for the user to get a sense of the overall structure and locate her topic within that structure. A sense of the overall structure often assists users in the very process of formulating their topics.
This appendix is given in file thes_schema.1.0.doc
Note: Internal to Harvard Business School. For internal use of the FAO thesaurus and ontology group only.
This schema is based on the concept-term-string model described in the JoDI paper http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/. It is a very flexible schema that allows any type of relationship to be entered. It is based on the principle that most information about concepts, terms, and strings can be expressed through relationships. The schema allows for indicating sources and for a detailed audit trail of changes and responsibilities.
FAO and Harvard Business School might want to collaborate on the further development of this schema.
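The principle that most information about concepts, terms, and strings can be expressed through relationships can be illustrated with a small data structure. This is a minimal sketch under stated assumptions, not the schema in thes_schema.1.0.doc: all type names, field names, and the function `FindRelated` are hypothetical, and the audit trail is reduced to two text fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three entity kinds of the concept-term-string model. */
typedef enum { ENTITY_CONCEPT, ENTITY_TERM, ENTITY_STRING } EntityKind;

/* Reference to one entity: its kind plus an identifier (hypothetical). */
typedef struct {
    EntityKind kind;
    long       id;
} EntityRef;

/* One relationship statement: subject --relType--> object, together
   with its source and a minimal audit trail (all fields illustrative). */
typedef struct {
    EntityRef   subject;
    const char *relType;   /* open-ended, e.g. "BT" or "hasTerm" */
    EntityRef   object;
    const char *source;    /* e.g. "AGROVOC" */
    const char *changedBy; /* who entered or last changed the statement */
    const char *changedOn; /* date of that change, kept as text */
} Relationship;

/* Collect the objects of all relationships of type relType that start
   at subject. Writes at most maxOut results and returns the number written. */
size_t FindRelated(const Relationship *rels, size_t n,
                   EntityRef subject, const char *relType,
                   EntityRef *out, size_t maxOut)
{
    size_t found = 0;
    size_t i;

    assert(rels != NULL || n == 0);
    assert(relType != NULL);

    for (i = 0; i < n && found < maxOut; i++) {
        if (rels[i].subject.kind == subject.kind &&
            rels[i].subject.id == subject.id &&
            strcmp(rels[i].relType, relType) == 0) {
            out[found++] = rels[i].object;
        }
    }
    return found;
}
```

Keeping the relationship type as free text rather than a fixed enumeration mirrors the stated goal that any type of relationship can be entered. A production schema would more likely reference a table of relationship types.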
Dagobert Soergel
This appendix is given in two files,
semwebprop.pdf (short version)
semwebfl.pdf (full version)
This paper gives the design of exactly the type of system that FAO envisions for its Agricultural Ontology Service (AOS). It also contains a list of data fields for terminological records that should be consulted for the final version of the FAO KOS database schema.
Abstract
We propose to develop a system, dubbed SemWeb, that would revolutionize the way people - from experts to students - interact with conceptual structures and terminology and the way they share such knowledge. We aim at the synergistic exploitation of existing lexical and ontological knowledge bases (ontologies/classifications, thesauri, dictionaries) and their vast intellectual capital through integrated access, allowing a user to consult multiple sources with one search that returns one integrated answer that visualizes concept relationships for ease of understanding. SemWeb is intended for a wide variety of users and uses - including education, information retrieval, knowledge-based systems and natural language processing - and is meant to bridge disciplines, languages, and cultures. The same environment will support collaborative development and maintenance of ontologies and lexica.
We will do research on difficult issues that need to be addressed in the system. For example, we will study how ontological and lexical knowledge is used in different disciplines, and we will work on defining measures and methods for the evaluation of ontologies, lexica, and their representations and for correlating and integrating ontologies. We will also study the use and impact of the prototype through pilot applications and user studies, particularly the impact on learning by students.
Dagobert Soergel; Boris Lauser; Anita Liang; Frehiwot Fisseha; Johannes Keizer; Stephen Katz
Reengineering thesauri for new applications. The AGROVOC example
Journal of Digital Information, Volume 4 Issue 4, Article No. 257, 2004-03-17 http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/
This paper presents a number of results from the study undertaken to prepare this report. In particular, Section 4 contains an analysis of relationship types in AGROVOC that also sheds light on structural problems in AGROVOC.
Abstract and table of contents are given below.
Abstract
Empowering end users in searching collections of ever increasing magnitudes with performance far exceeding plain free-text searching (as used in many Web search engines) and developing systems that not only find but also process information for action require far more powerful and complex knowledge organization systems (KOS) than the existing classification schemes and thesauri that are lacking in well-defined semantics and structural consistency. In this paper, we present a conceptual structure and transition procedure to support the shift from a traditional KOS towards a full-fledged and semantically rich KOS. The proposed structure also complies with other interoperability approaches like RDFS and XML in the web environment. AGROVOC, a traditional thesaurus developed and maintained by the Food and Agriculture Organization (FAO) of the United Nations serves as a case study for exploring the reengineering of a traditional thesaurus into a full-fledged ontology. We start the process of developing an inventory of specific relationship types with well-defined semantics for the agricultural domain and explore the rules-as-you-go approach to streamlining the reengineering process.
Contents
1 From thesauri to rich ontologies
1.1 The problem
1.2 The relationship of traditional KOS to ontologies
1.3 Potential benefits of future generation KOSs
1.4 The process of reengineering: The rules-as-you-go approach
2 AGROVOC: A multilingual agricultural thesaurus
2.1 Background
2.2 Applications and related terminologies
2.3 Conceptual structure of AGROVOC
2.4 Semantic problems of AGROVOC
2.5 The need for reengineering AGROVOC into an ontology
3 Conceptual model: combining thesauri and ontologies
3.1 The basic model
3.2 Model extensions
3.3 Limitations
3.4 Implementation
3.5 Related approaches
4 The AGROVOC case: exploring conceptual relationships for the agricultural domain
4.1 The logical generic relationship
4.2 The part-whole family of relationships
4.3 Other relationships
5 Exploring the rules-as-you-go approach for the case of AGROVOC
6 Implications and further work
Dagobert Soergel
FAO Agricultural Ontology Server Workshop
Beijing, April 27 - 29, 2004
Overview
AI and Semantic Web applications need full-fledged ontologies that support reasoning
Constructing such ontologies is expensive
While existing KOS do not provide the full set of precise concept relationships needed for reasoning, existing KOS, both large and small, represent much intellectual capital (KOS = Knowledge Organization System)
How can this intellectual capital be put to use in constructing full-fledged ontologies?
Specifically: From AGROVOC to a full-fledged Food and Agriculture Ontology
FAO Subject Tree V. 2
The following are the new FAO Subject Tree categories and the AGROVOC descriptors that can be searched within each category in the EIMS databases.
An asterisk (*) denotes a term that will appear in the new version of AGROVOC, and in the Subject Tree once there are records that have been indexed with it.
Animal Production & Health
Animal breeding | Animal Diseases | Animal Health | Animal Husbandry | Animal Physiology | Animal Production | Animal products | Disease Control | Feeding | Feeds | Livestock | Statistical Data | Veterinary Medicine
Economics & Policy
Agricultural Products | Agricultural policies | Commodity Markets | Development aid | Development policies | Economic Development | Economic Policies | Food Security | Investment | Land economics | Marketing | Natural Resources | Policies | Prices | Production | Statistical Data | Supply Balance | Trade | Trade Policies
Education & Extension
Education | Educational Policies | Extension Activities | Information and Communication Technologies (ICTs) | Journalism | Public Relations | Training
Farming Practices & Systems
Cropping Systems | Farm Management | Farming Systems | Organic Agriculture | Sustainability | Urban Agriculture
Fisheries & Aquaculture
Aquaculture | Climatic Change | Cooperation | Ecosystems | Development policies | Fishery data | Fish processing | Fisheries | Fisheries development | Fishery policies | Fishes | Fishery management | Fishery production | Fishery products | Fishery resources | Governance* | Legislation | Marketing | Quality | Research | Safety | Statistical data | Trade | Trends | Technology | Trade agreements | Utilization*
Food Security
Agricultural development | Agricultural policies | Agricultural situation | Development projects | Emergency relief | Ethics | Famine | Food aid | Food policies | Food production | Food resources | Food stocks | Food supply | International cooperation | Malnutrition | Poverty
Forestry
Biodiversity | Climate change | Community forestry | Education | Environment | Forest land | Forest management | Forestry development | Forestry policies | Forest products | Forest protection | Forest resources | Forests | Legislation | Statistical Data
Geographical and Regional Information
Regions (Drop down): Africa | Latin America and the Caribbean | North America | Asia | Europe | Oceania
Country (Drop down)
Government, Administration & Legislation
Administration | Agricultural and rural legislation | Environmental legislation | Food legislation | International agreements | Labour legislation | Law | Legislation | Management | Planning | Public health legislation | Regulations | Standards | Water rights
Human Nutrition & Food Safety
Consumer Protection | Diet | Food Additives | Food Composition | Food Legislation | Food Safety | Food Technology | Foods | Health | Human Nutrition | Malnutrition | Nutrition Education* | Nutrition Policies | Nutritional Requirements | Public Health | Quality controls | Risk Assessment* | Risk Communication* | Statistical data
Natural Resources & Environment
Biodiversity | Climate | Desertification | Drainage | Ecology | Ecosystems | Environmental Conventions | Environmental Protection | Forestry Resources | Genetic Resources | Irrigation | Land economics | Land resources | Land Use | Natural Resources | Pollution | Resource Management | Soil Resources | Soil Sciences | Statistical data | Water Resources | Water use
Plant Production & Protection
Breeding Methods | Crop Management | Crops | Fertilizers | Harvesting | Integrated Pest Management | Irrigation | Pest Control | Phytosanitation[1] | Plant genetic resources | Plant health[2] | Plant protection | Pesticides | Seed production | Weeds
Rural, Social & Agricultural Development
Agricultural Development | Agricultural policies | Community forestry | Community Involvement | Development projects | Households | Indigenous Knowledge | Participation | Poverty | Rural Communities | Rural Development | Rural Finance | Social Policies | Socioeconomic Development | Sustainable Livelihoods | Gender | Women in Development
Engineering, Technology & Research
Appropriate technology | Biotechnology | Databases | Engineering | Equipment | Farm equipment | Methods | Research | Statistical Data | Statistical methods | Surveys | Technology
| AGRIS Categories | |
| A | Agriculture |
| A01 | Agriculture - General aspects |
| A50 | Agricultural research |
| B | Geography and history |
| B10 | Geography |
| B50 | History |
| C | Education, extension, and advisory work |
| C10 | Education |
| C20 | Extension |
| C30 | Documentation and information |
| D | Administration and legislation |
| D10 | Public administration |
| D50 | Legislation |
| E | Economics, development, and rural sociology |
| E10 | Agricultural economics and policies |
| E11 | Land economics and policies |
| E12 | Labour and employment |
| E13 | Investment, finance and credit |
| E14 | Development economics and policies |
| E16 | Production economics |
| E20 | Organization, administration and management of agricultural enterprises or farms |
| E21 | Agro-industry |
| E40 | Cooperatives |
| E50 | Rural sociology |
| E51 | Rural population |
| E70 | Trade, marketing and distribution |
| E71 | International trade |
| E72 | Domestic trade |
| E73 | Consumer economics |
| E80 | Home economics, industries and crafts |
| E90 | Agrarian structure |
| F | Plant production |
| F01 | Crop husbandry |
| F02 | Plant propagation |
| F03 | Seed production |
| F04 | Fertilizing |
| F06 | Irrigation |
| F07 | Soil cultivation |
| F08 | Cropping patterns and systems |
| F30 | Plant genetics and breeding |
| F40 | Plant ecology |
| F50 | Plant structure |
| F60 | Plant physiology and biochemistry |
| F61 | Plant physiology - Nutrition |
| F62 | Plant physiology - Growth and development |
| F63 | Plant physiology - Reproduction |
| F70 | Plant taxonomy and geography |
| H | Protection of plants and stored products |
| H01 | Protection of plants - General aspects |
| H10 | Pests of plants |
| H20 | Plant diseases |
| H50 | Miscellaneous plant disorders |
| H60 | Weeds |
| J | Handling, transport, storage and protection of agricultural products |
| J10 | Handling, transport, storage and protection of agricultural products |
| J11 | Handling, transport, storage and protection of plant products |
| J12 | Handling, transport, storage and protection of forest products |
| J13 | Handling, transport, storage and protection of animal products |
| J14 | Handling, transport, storage and protection of fisheries and aquacultural products |
| J15 | Handling, transport, storage and protection of non-food or non-feed agricultural products |
| K | Forestry |
| K01 | Forestry - General aspects |
| K10 | Forestry production |
| K11 | Forest engineering |
| K50 | Processing of forest products |
| K70 | Forest injuries and protection |
| L | Animal production |
| L01 | Animal husbandry |
| L02 | Animal feeding |
| L10 | Animal genetics and breeding |
| L20 | Animal ecology |
| L40 | Animal structure |
| L50 | Animal physiology and biochemistry |
| L51 | Animal physiology - Nutrition |
| L52 | Animal physiology - Growth and development |
| L53 | Animal physiology - Reproduction |
| L60 | Animal taxonomy and geography |
| L70 | Veterinary science and hygiene |
| L72 | Pests of animals |
| L73 | Animal diseases |
| L74 | Miscellaneous animal disorders |
| M | Aquatic sciences and fisheries |
| M01 | Fisheries and aquaculture - General aspects |
| M11 | Fisheries production |
| M12 | Aquaculture production and management |
| M40 | Aquatic ecology |
| N | Machinery and buildings |
| N01 | Agricultural engineering |
| N02 | Farm layout |
| N10 | Agricultural structures |
| N20 | Agricultural machinery and equipment |
| P | Natural resources |
| P01 | Nature conservation and land resources |
| P05 | Energy resources and management |
| P06 | Renewable energy resources |
| P07 | Non-renewable energy resources |
| P10 | Water resources and management |
| P11 | Drainage |
| P30 | Soil science and management |
| P31 | Soil surveys and mapping |
| P32 | Soil classification and genesis |
| P33 | Soil chemistry and physics |
| P34 | Soil biology |
| P35 | Soil fertility |
| P36 | Soil erosion, conservation and reclamation |
| P40 | Meteorology and climatology |
| Q | Food science |
| Q01 | Food science and technology |
| Q02 | Food processing and preservation |
| Q03 | Food contamination and toxicology |
| Q04 | Food composition |
| Q05 | Food additives |
| Q51 | Feed technology |
| Q52 | Feed processing and preservation |
| Q53 | Feed contamination and toxicology |
| Q54 | Feed composition |
| Q55 | Feed additives |
| Q60 | Processing of non-food or non-feed agricultural products |
| Q70 | Processing of agricultural wastes |
| Q80 | Packaging |
| S | Human nutrition |
| S01 | Human nutrition - General aspects |
| S20 | Physiology of human nutrition |
| S30 | Diet and diet-related diseases |
| S40 | Nutrition programmes |
| T | Pollution |
| T01 | Pollution |
| T10 | Occupational diseases and hazards |
| U | Auxiliary disciplines |
| U10 | Mathematical and statistical methods |
| U30 | Research methods |
| U40 | Surveying methods |
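The codes in the table above follow a visible pattern: a single capital letter for a main category, and that letter followed by two digits for a subcategory. The sketch below checks a code against that pattern and returns its main-category letter. It is an illustration inferred from the table only. The function name `AgrisMainCategory` is hypothetical, and the check does not verify that a code is actually listed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the main-category letter of a code shaped like "E" or "E14",
   or '\0' if the string does not match the letter or letter-plus-two-digits
   pattern seen in the table. Does not check that the code is listed. */
char AgrisMainCategory(const char *code)
{
    size_t len;

    if (code == NULL)
        return '\0';

    len = strlen(code);
    if (len != 1 && len != 3)
        return '\0';

    /* Cast to unsigned char before calling the <ctype.h> functions. */
    if (!isupper((unsigned char)code[0]))
        return '\0';

    if (len == 3 &&
        (!isdigit((unsigned char)code[1]) || !isdigit((unsigned char)code[2])))
        return '\0';

    return code[0];
}
```

With such a helper, records indexed with subcategories such as E14 or E71 could be grouped under main category E without a separate lookup table.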
In file Terminology workflow-update1.doc
This analysis was done by a student. It is quite well done and might be useful when preparing for the mapping to the CABI Thesaurus.
[1] Actual term has yet to be confirmed. If Phytosanitary regulations/standards is also added, it will be linked to Phytosanitation and appear as a sub-category.
[2] Has not yet been added to AGROVOC and thus will not immediately be added to the tree.