

Appendices


Appendix 1. Sample entries integrating information from AGROVOC, FAO Term and FAO Glossary

Notes

When the Arabic and Chinese terms appear to differ in the sources, they are listed separately.

The Arabic and Chinese terms mostly gave error messages when copied so they are all replaced by "xxx", but one can still see which sources have these terms.
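The bracketed source tags used in the sample entries of this appendix can be produced mechanically when terms from several sources are merged. The following is a minimal sketch, not the procedure actually used to compile the entries; the function names `merge_terms` and `format_entry` are hypothetical, and the data are the French equivalents of "watershed" from the first sample entry.

```python
from collections import defaultdict

def merge_terms(source_terms):
    """Merge term lists from several sources and record where each term came from.

    source_terms maps a source label (e.g. "AGROVOC") to the terms it supplies.
    Returns a dict mapping each term to the sorted list of its source labels.
    """
    merged = defaultdict(set)
    for source, terms in source_terms.items():
        for term in terms:
            merged[term].add(source)
    return {term: sorted(sources) for term, sources in merged.items()}

def format_entry(term, sources):
    """Render one line in the bracketed style of the sample entries."""
    return f"{term} [{', '.join(sources)}]"

# French equivalents of "watershed" as listed in the sample entry
french = {
    "FAOTerm": ["bassin versant", "bassin hydrographique"],
    "AGROVOC": ["bassin versant"],
}

merged = merge_terms(french)
lines = [format_entry(term, sources) for term, sources in merged.items()]
```

A term supplied by more than one source appears once, with all of its sources listed, which is the effect the sample entries aim for.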

watershed

[FAOGloss, FAOTerm, AGROVOC +s]



DF

The area which supplies water by surface and subsurface flow from rain to a given point in the drainage system (ISSS (1996) in Choudhury K. and L.J.M. Jansen (1999): Terminology for Integrated Resources Planning and Management. FAO, Rome, Italy: 69 pages) [FAOGloss] [FAOTerm, source: Terminology for integrated resource planning and management, 1999 - X2079E]



UF

Catchment areas [AGROVOC]
Catchment basins [AGROVOC]
Groundwater basin [AGROVOC]
River basins [AGROVOC] [RT in FAOGloss]



BT

Physiographic features [AGROVOC]



RT

River basin [FAOGloss] [UF in AGROVOC]
Drainage [FAOGloss]
Watershed management [AGROVOC]

(Arabic)

xxx [AGROVOC]

(Chinese)

xxx [FAOTerm]
xxx [AGROVOC]

(Czech)

povodí [AGROVOC]

(French)

bassin versant [FAOTerm] [AGROVOC]
bassin hydrographique [FAOTerm]

(Spanish)

cuenca hidrográfica [FAOTerm, AGROVOC +s]
cuenca colectora [FAOTerm]

(Portuguese)

Bacia hidrográfica [AGROVOC]

trawl

[FAOGloss, FAOTerm] [RT in AGROVOC +ers, +ing]

DF

A cone or funnel-shaped net that is towed through the water by one or more vessels [FAOGloss]



RT

Pair trawling [FAOGloss]

(Arabic)

xxx [FAOTerm]
xxx [FAOTerm]
xxx [FAOTerm]

(Chinese)

xxx [FAOTerm]
xxx [FAOTerm]

(French)

chalut [FAOTerm]

(Spanish)

red barredera [FAOTerm]
red de arrastre [FAOTerm]

feed

[FAOGloss, FAOTerm, AGROVOC +s]

DF

Any non-injurious edible material having nutrient value. May be harvest forage, range or artificial pasture forage, grain, or other processed feed for livestock or game animals. (Am. Soc. of Range Management (1964) in Choudhury K. and L.J.M. Jansen (1999): Terminology for Integrated Resources Planning and Management. FAO, Rome, Italy: 69 pages) [FAOGloss] [FAOTerm, source: Terminology for integrated resource planning and management, 1999 - X2079E]



DF

Any non-injurious edible material having nutrient value. May be harvest forage, range or artificial pasture forage, grain, or other processed feed for livestock or game animals. (Terminology for integrated resource planning and management, 1999 - X2079E) [FAOTerm]



DF

In aquaculture, residues from agriculture and food producing industries as well as fishmeal are important sources of feeds. [FAOGloss]



UF

Animal feeding stuffs [AGROVOC]
Dehydrated feeds [AGROVOC]
+ Feed formulations [AGROVOC]
+ Feed quality [AGROVOC]
feeding stuff [FAOTerm, AGROVOC +s]
feedstuff [FAOTerm, AGROVOC +s]
Foods for animals [AGROVOC]
Liquid feeds [AGROVOC]
Livestock feed [AGROVOC]
+ Pelleted feeds [AGROVOC]



NT

Clover [AGROVOC]
Complete feeds [AGROVOC]
Compound feeds [AGROVOC]
Concentrates [AGROVOC]
Corn cob mix [AGROVOC]
Desiccated fodders [AGROVOC]
Feed cereals [AGROVOC]
Feed crucifers [AGROVOC]
Feed grasses [AGROVOC]
Feed legumes [AGROVOC]
Feed meals [AGROVOC]
Feed roots [AGROVOC]
Feeds of animal origin [AGROVOC]
Grain feed [AGROVOC]
Green feed [AGROVOC]
Hay [AGROVOC]
Haylage [AGROVOC]
Leaf meal [AGROVOC]
Lucerne [AGROVOC]
Lucerne meal [AGROVOC]
Mangolds [AGROVOC]
Medicated feeds [AGROVOC]
Milk replacers [AGROVOC]
Pet foods [AGROVOC]
Roughage [AGROVOC]
Silage [AGROVOC]
Weaning feeds [AGROVOC]
Whole crop silage [AGROVOC]



RT

Beet pulp [AGROVOC]
Brewery byproducts [AGROVOC]
Browse plants [AGROVOC]
Cereal byproducts [AGROVOC]
Faeces [AGROVOC]
Farm inputs [AGROVOC]
Feed crops [AGROVOC]
Forage [AGROVOC]
Oilseed cakes [AGROVOC]
Rice husks [AGROVOC]
Rice polishings [AGROVOC]
Vegetables [AGROVOC]

(Arabic)

xxx [AGROVOC]

(Chinese)

xxx [AGROVOC]

(Czech)

krmiva [AGROVOC]

(French)

aliment du bétail [FAOTerm]
aliment pour animaux [FAOTerm, AGROVOC]

(Spanish)

pienso [FAOTerm, AGROVOC +s]
alimento de los animales [FAOTerm]

(Portuguese)

Alimento para animais [AGROVOC]

harvest index


[FAOTerm, AGROVOC]

DF


The ratio of economic yield of a crop and total dry matter at harvest. (Terminology for integrated resource planning and management, 1999 - X2079E) [FAOTerm]




BT


Yield components [AGROVOC]
Yields [AGROVOC]

(Arabic)

xxx [AGROVOC]

(Chinese)

xxx [AGROVOC]

(Czech)

sklizňový index [AGROVOC]

(French)

Indice de récolte [FAOTerm, AGROVOC]

(Spanish)

Índice de cosecha [FAOTerm, AGROVOC]
relación grano-paja [AGROVOC]

(Portuguese)

Relação grão palha [AGROVOC]

irradiation

[FAOGloss, FAOTerm, AGROVOC]

DF

Illumination with electromagnetic radiation, typically of sufficiently high energy (low-wavelength UV or gamma, etc.) to disrupt biological macromolecules and hence induce mutations. [FAOGloss]



UF

exposure [FAOTerm]
+ Ionization treatment [AGROVOC]
+ Radurization [AGROVOC]
+ Ultrasonic treatment [AGROVOC]



NT

Gamma irradiation [AGROVOC]
Ultraviolet irradiation [AGROVOC]
X ray irradiation [AGROVOC]



RT

asymmetric hybrid [FAOGloss]
Immunosuppression [AGROVOC]
Ionization [AGROVOC]
mutagen [FAOGloss]
mutation [FAOGloss]
Processing [AGROVOC]
radiation hybrid cell panel [FAOGloss]
Radiation damage [AGROVOC]
Radiosensitivity [AGROVOC]
Sterilizing [AGROVOC]

(Arabic)

xxx [AGROVOC]

(Chinese)

xxx [AGROVOC]

(Czech)

ozařování [AGROVOC]

(French)

radioexposition [FAOTerm]
irradiation [FAOTerm, AGROVOC]

(Spanish)

radioexposición [FAOTerm]
irradiación [FAOTerm, AGROVOC]

(Portuguese)

Irradiação [AGROVOC]


nutrient deficiency

[FAOGloss, FAOTerm, AGROVOC +ies]

DF

Absence or insufficiency of an essential factor for normal growth and development [FAOGloss]



NT

Kwashiorkor [AGROVOC]
Marasmus [AGROVOC]
Protein deficiencies [AGROVOC]
Vitamin deficiencies [AGROVOC]



RT

Deficiency diseases [AGROVOC]
Diet [AGROVOC]
Mineral deficiencies [AGROVOC]
Nutrients [AGROVOC]
Nutritional disorders [AGROVOC]
Nutritional status [AGROVOC]

(Arabic)

xxx [AGROVOC]

(Chinese)

xxx [FAOTerm, AGROVOC]

(Czech)

deficience živin [AGROVOC]

(French)

carence (alimentaire) [FAOTerm]


Carence en substance nutritive [AGROVOC]

(Spanish)

carencia de nutrientes [FAOTerm]


Deficiencias nutritivas [AGROVOC]

(Portuguese)

Carência em substâncias nutritivas [AGROVOC]

food security

[FAOGloss, FAOTerm, AGROVOC]

DF

Freedom from hunger. The capability to produce an adequate amount of food for all consumers at affordable prices. [FAOGloss]



DF

Food security exists when all people, at all times, have physical and economic access to sufficient, safe and nutritious food to meet their dietary needs and food preferences for an active and healthy life (WFS, 1996). [FAOTerm]



DF

(Spanish) Se dice que existe seguridad alimentaria cuando todas las personas tienen en todo momento acceso físico y económico a suficientes alimentos inocuos y nutritivos para satisfacer sus necesidades alimentarias y sus preferencias en cuanto a los alimentos, a fin de llevar una vida activa y sana (WFS, 1996). [FAOTerm]



DF

(French) La sécurité alimentaire existe lorsque tous les êtres humains ont, à tout moment un accès physique et économique à une nourriture suffisante, saine et nutritive leur permettant de satisfaire leurs besoins et leurs préférences alimentaires pour mener une vie saine et active (WFS, 1996). [FAOTerm]



RT

Food aid [AGROVOC]
Food consumption [AGROVOC]
Food policies [AGROVOC]
Food production [AGROVOC]
Food resources [AGROVOC]
Food stocks [AGROVOC]
Food supply [AGROVOC]
Right to food [AGROVOC]
Self sufficiency [AGROVOC]

(Arabic)

xxx [FAOTerm]
xxx [AGROVOC]

(Chinese)

xxx [FAOTerm]
xxx [AGROVOC]

(Czech)

potravinová bezpečnost [AGROVOC]

(French)

sécurité alimentaire [FAOTerm, AGROVOC]

(Spanish)

seguridad alimentaria [FAOTerm, AGROVOC]

(Portuguese)

Segurança alimentar [AGROVOC]

Appendix 2. Listing of general and specialized KOS in the food and agriculture domain identified on the Web

This text is repeated from Part 2 J. for convenience.

Appendix 2 lists a number of sources, divided into general coverage sources and specialized sources.

The sources are labeled as follows:

G/D = Glossary, dictionary
T = Thesaurus, classification
N = Nomenclature
DB = Database
O = Other (handbook etc.)

+ KOS maintained, sponsored, or used by FAO
* Otherwise consider as a priority source of terms for AGROVOC
# Site to link to in an Agricultural Ontology Server for more detailed information

The sources marked by * were selected based on germaneness to FAO's work, authority of the originating organization, richness of information, and, where appropriate, size. FAO personnel are more knowledgeable about the areas in which AGROVOC is weak and are therefore in a better position to assign source priority based on that criterion.

These sources can be harvested for additional concepts, terms in multiple languages, definitions, and relationships. This requires

Appendix 3. KOS use cases and KOS projects inventory

This appendix gives a few examples of KOS use cases and KOS projects analyzed according to the template given in Part 1, 1.4 and 1.5.

Framework for KOS projects

Number and title AGROVOC

Related to thesaurus use cases

Scope and size
Regular work flow just being established, update size not available
update cycle 3 months

Unit and person responsible Gudrun Johansen + countries
Support countries with tool

Collaboration / coordination (actual and possible)

Development versus maintenance

Any gaps in domain coverage

Fisheries, forestry (both complained). Sustainable development: the Environment and Natural Resources Service needs technology terms (remote sensing, GIS, biotechnology, biosafety). Need list of FAO priorities and new initiatives.

Development person hours / maintenance person hours per month

Time frame

Data model (entity and relationship types) (existing and needed)

Mapping to categorization schemes instead of having built-in hierarchy

094: TermCode indicator

0 English (EN)
1 French (FR)
2 Spanish (ES)
3 Arabic
4 Chinese

Note: AGROVOC also contains some German translations. Also, AGROVOC translations done by third parties are available in a number of languages, including Thai, Croatian,...

x01: Descriptor
x02: Scope Note
x03: Use
x04: Use for
x05: Broader Term
x06: Narrower Term
x07: Related Term
x08: Non Descriptor

LanguageCode  Name        LngGroupID  OriginalName  CreateDate
DE            German      100         Deutsch       4/27/1999 7:07:00 PM
EN            English     10          English       4/27/1999 7:07:00 PM
ES            Spanish     10                        4/27/1999 7:07:00 PM
FR            French      10                        4/27/1999 7:07:00 PM
IT            Italian     100                       1/20/1997
PT            Portuguese  100                       7/3/1996
RU            Russian     100                       4/27/1999 7:07:00 PM

Scope

ScopeID  ScopeDesc
GC       Geographic Term (country level)
GG       Geographic Term (above country level)
GL       Geographic Term (below country level)
TA       Taxonomic Term (animals)
TB       Taxonomic Term (bacteria)
TF       Taxonomic Term (fungi)
TP       Taxonomic Term (plants)
TV       Taxonomic Term (viruses)


TagTypeID  TagDesc       LanguageCode
10         Scope Note    EN
20         History Note  EN
30         Definition    EN
40         Comments      EN


LinkTypeID  LanguageCode  LinkDesc                     LinkAbr  CreateDate            RLinkCode
5           EN            Scope Note Reference         SNR      4/27/1999 7:07:00 PM  10
10          EN            Is Referenced in Scope Note  SNX      4/27/1999 7:07:00 PM  5
20          EN            Used For                     UF       4/27/1999 7:07:00 PM  70
30          EN            Used For+                    UF+      4/27/1999 7:07:00 PM  70
40          EN            Seen For                     SF       4/27/1999 7:07:00 PM  80
50          EN            Broader Term                 BT       4/27/1999 7:07:00 PM  60
60          EN            Narrower Term                NT       4/27/1999 7:07:00 PM  50
70          EN            Use                          USE      4/27/1999 7:07:00 PM  20
80          EN            See                          SEE      4/27/1999 7:07:00 PM  40
90          EN            Related Term                 RT       4/27/1999 7:07:00 PM  90
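In the link type table above, the RLinkCode column appears to give, for each link type, the type of the reciprocal link (BT 50 pairs with NT 60, RT 90 with itself). Assuming that reading, the following sketch shows how reciprocal links could be generated automatically. The function name `add_reciprocals` and the triple representation are hypothetical and do not describe the actual AGROVOC software.

```python
# (LinkTypeID, LinkAbr, RLinkCode) rows copied from the link type table.
# Reading RLinkCode as "type of the reciprocal link" is an assumption.
LINK_TYPES = [
    (5, "SNR", 10), (10, "SNX", 5),
    (20, "UF", 70), (30, "UF+", 70), (40, "SF", 80),
    (50, "BT", 60), (60, "NT", 50),
    (70, "USE", 20), (80, "SEE", 40), (90, "RT", 90),
]
ABBREVIATION = {link_id: abbr for link_id, abbr, _ in LINK_TYPES}
RECIPROCAL = {link_id: rcode for link_id, _, rcode in LINK_TYPES}

def add_reciprocals(links):
    """Return the given links plus the reciprocal of each one.

    Each link is a (source term, LinkTypeID, target term) triple.
    """
    result = set(links)
    for source, link_id, target in links:
        result.add((target, RECIPROCAL[link_id], source))
    return result

# Two links taken from the "watershed" sample entry in Appendix 1
links = {
    ("watershed", 50, "Physiographic features"),   # BT
    ("watershed", 90, "Watershed management"),     # RT
}
full = add_reciprocals(links)
```

Storing the reciprocal type in the table, rather than in program logic, lets new link types be added without changing code.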

Software used, file structure

Internal software

MySQL database is the main DB, exported to Access for distribution, to Oracle for Web

Sources

Suggestions from users/ info system designers internal to FAO

EIMS WAICENT

ASFA problem

Framework for KOS projects

Number and title FAO Term

Related to thesaurus use cases

Translators, terminology info on FAO activities, org. units and project names 8,000 that FAO deals with for anybody in the organization

No automated translation, except using exact same sentence with fuzzy matches. Internet use Claudia

Could support definitions in flux

Interpreters, create specific glossaries for meetings, interpreters also need general language info

Scope and size

100 terms changed or added, 30% change, 70% new

Unit and person responsible Alexis Crespel, Ingrid Alldritt-Ferrarro 3 full + Alexis.5, + other support

But also support meeting preparation

Collaboration / coordination (actual and possible)

Development versus maintenance

Any gaps in domain coverage

No field completely missing. Codex Alimentarius, environment, fisheries and forestry are difficult to keep up with due to size

Development person hours / maintenance person hours per month

Time frame

Data model (entity and relationship types) (existing and needed)

Terms
5 official languages
See separate document

Software used, file structure

Trados MultiTerm connected to translation memory system
Work on Web interface for direct input to Web database
MultiTerm to Oracle every month.

Other

Web site feedback form

Sources for concepts and terms

translators
INRA
query
term extraction using Trados, statistical approach, lots of noise, not unique terms
Definition extractor but political
avg 100 per month
rely on expertise of others for definition
Meeting document glossaries
Reference section alerts to docs with definitions
From other institutions try to integrate

Integrated within a service

Forms of publications

Framework for KOS projects

Number and title FAO Glossary

Related to thesaurus use cases

Scope and size 4,700 terms

Unit and person responsible Fisheries; Sustainable Development
Glossaries defined by technical departments
Aquaculture did their own thing using their own system
Fisheries dept. wants tool for other institutions to download subsets

Collaboration / coordination (actual and possible)

Development versus maintenance
Time spent in units, updated in intervals, not daily

Any gaps in domain coverage

Development person hours / maintenance person hours per month

Time frame

Data model (entity and relationship types) (existing and needed)
Terms (starting with capitals) and English scope notes (using <br> for line breaks)
Owner for each term and scope note
Examples
Link to image
RT

Software used, file structure

Updated by Word document without sources because all definitions come from an FAO publication (biotech)
Fisheries uses Web interface for small updates, files for larger updates
Some
Oracle tables
Web interface for input

Sources

International organization literature, often

Framework for KOS projects

Number and title NAL Thesaurus

Related to thesaurus use cases

Scope and size

Unit and person responsible

Collaboration / coordination (actual and possible)

Development versus maintenance

Any gaps in domain coverage

Development person hours / maintenance person hours per month

Time frame

Data model (entity and relationship types) (existing and needed)

UF
USE
BT
RT
SN
SC subject category
TNR Indicator for facet heading

Software used, file structure

Appendix 4. Thesaurus/ontology functions - Reference list

Functions of a thesaurus / classification / ontological knowledge base: Overview

Provide a semantic road map to individual fields and the relationships among fields.
Map out a concept space, relate concepts to terms, and provide definitions, thus providing orientation and serving as a reference tool.

Improve communication generally. Support learning and assimilating information.

Support learning through conceptual frameworks. Conceptual framework to help the learner ask the right questions.

Support the development of instructional materials through conceptual frameworks.

Assist readers in understanding text by giving the meaning of terms.

Assist writers in producing understandable text by suggesting good terms.

Support foreign language learning.

Provide the conceptual basis for the design of good research and implementation.

Assist researchers and practitioners with problem clarification.

Consistent data collection, compilation of statistics (related to information analysis)

Provide classification for action. Classification for social and political purposes

a classification of diseases for diagnosis,

of medical procedures for insurance billing,

of commodities for customs.

Support information retrieval and analysis. Organizing and keeping track of goods and services for commerce (esp. ecommerce) and inventory

Provide a tool for searching, particularly knowledge-based support for end-user searching, including hierarchically expanded searching.

Provide a tool for indexing.

Facilitate the combination of or unified access to multiple databases

Support document processing after retrieval.

Support meaningful, well-structured display of information.

Ontology for data element definition. Data element dictionary.

Conceptual basis for knowledge-based systems.

Do all this across multiple languages

Mono-, bi-, or multilingual dictionary for human use.
Dictionary/knowledge base for automated language processing


The underlying function of a knowledge base on concepts and terminology:

Provide a semantic road map to individual fields and the relationships among and across fields.

Map out a concept space, relate concepts to terms, and provide definitions, thus providing orientation and serving as a reference tool.

Provide a semantic road map and common language for an individual field and, perhaps more importantly, map the relationships among fields.

Clarify concepts by putting them in the context of a classification / typology and provide a system of definitions.

Relate concepts and terms across disciplines, languages, and cultures.

Many specific functions build on this foundation.


Improve communication generally.
Support learning and assimilating information

Support learning about any topic by providing the learner/reader with a coherent, age-appropriate conceptual framework. Conceptual frameworks help the learner ask the right questions; learning as information retrieval.

Support the development of instructional materials by providing a conceptual framework to the instructional developer / writer and by suggesting didactically useful arrangements of topics.

Assist readers in understanding text; help them ascertain the proper meaning of a term and place it in context.

Assist writers in producing understandable text by helping them to conceptualize the topic and suggesting from a semantic field the term that best conveys the intended meaning and connotation.

Support foreign language learning


Provide the conceptual basis for the design of good research and implementation.

Assist researchers and practitioners with problem clarification

Includes help with

exploring the conceptual context of a research or practical problem - a study, policy, plan, or implementation project

and with

structuring the problem and providing a conceptual framework for asking the right questions and devising good query formulations for retrieval.

Examples of specific functions:

Present the issues in a field or application area in a coherent framework.

Assist in problem-solving: Assist in the exploration of the dimensions of a problem and aspects to be considered in its solution; provide a classification of approaches to solving a specific problem (for example, a classification of approaches to drug abuse prevention as a help in designing drug abuse prevention projects).

Provide classification and consistent definition of variables for research / of evaluation criteria for practical problems, thus enhancing the comparability of research and evaluation results and making research more cumulative.


Support the compilation and use of statistics

This is a very important function. The Census Bureau, the Bureau of Labor Statistics, and other statistical agencies are heavily involved in developing classifications and defining concepts.

Support data collection

The concepts in a classification used for statistics not only make the collected data retrievable, they define the very nature of the data.

Support data aggregation

For example, get the value of all electronic goods imported into the US in the year 2000, or the tonnage of green leafy vegetables produced in a given year in the US.
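Aggregation of this kind depends on the classification hierarchy: a figure recorded for a specific category must also count toward every broader category. The following is a minimal sketch of that roll-up. The hierarchy, the function names, and all quantities are invented for illustration and are not real statistics.

```python
# Hypothetical child -> parent classification; all figures below are invented.
PARENT = {
    "spinach": "green leafy vegetables",
    "lettuce": "green leafy vegetables",
    "green leafy vegetables": "vegetables",
    "carrots": "vegetables",
}

def self_and_ancestors(category):
    """Return the category followed by all of its broader categories."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def aggregate(records):
    """Sum amounts so that each category total includes its narrower categories."""
    totals = {}
    for category, amount in records:
        for level in self_and_ancestors(category):
            totals[level] = totals.get(level, 0) + amount
    return totals

# (category, tonnage) pairs; invented numbers
records = [("spinach", 120), ("lettuce", 300), ("carrots", 500)]
totals = aggregate(records)
```

With these invented records, the total for "green leafy vegetables" combines the spinach and lettuce figures, and the total for "vegetables" combines all three.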

Support retrieval of specific numbers (also part of information retrieval)
Support data tabulation and analysis (Need to have proper variables available)


Provide classification for action

This list addresses the functions of formal classifications. In a broader perspective, classification is the basis for much of everyday action, where we put people, things, and events in certain categories and, based on these categories, predict the behavior of persons and things and the course and effects of events, determine our attitudes towards them, and plan action accordingly.

For example,

a classification of diseases for diagnosis,

a classification of medical procedures for insurance billing,

a classification of medical outcomes to assist with treatment evaluation,

a classification of commodities for customs,

a classification of educational objectives for instructional development,

a classification of occupations for matching job applicants with job openings and for pay scale;

a classification of skills for employee task assignments.

a classification of crimes for determining sentences

a classification of types of expenses for tax purposes


Classification for social and political purposes. Socially charged classification

For example

Establishing that a profession has its own knowledge base, thereby enhancing the recognition of the profession (for example, the Nursing Intervention Classification)

Establishing a person's condition or behavior as normal, or as a disease, or as a moral failing or otherwise deviant. Different groups may want the same condition or behavior classified in different ways to further their agenda.

Examples:

Should homosexuality be classified as a disease?

Is alcoholism or other drug abuse a disease or a moral failing?

Is mental illness a disease on a par with physical illness, and thus covered by health insurance the same way?

Is some levy to be classified as a tax or as a user fee?


Support information retrieval 1:

A tool for searching, particularly knowledge-based support for end-user searching. Support

searching in any kind of database - bibliographic, full-text and hypermedia, directory, numeric, etc.;

searching in any kind of medium - printed indexes, CD-ROM systems, online systems, and the Internet;

searching in multiple natural languages independent of the language used in each database;

free-text searching;

searching multiple databases using different index languages.

Elicitation of user needs through a series of menus based on a search tree, or through guidance in the conceptual analysis of a search topic (questions based on a facet structure, presentation of a segment of the concept hierarchy for each applicable facet).

Browsing the classification structure to identify useful concepts for a search at the level of specificity desired. (The user may not have command of the vocabulary needed.) Browsing a collection (as on the shelves or in a subject directory)

Mapping from the user's query terms to descriptors used in a database or to the multiple natural language expressions to be used for free-text searching.

Inclusive (hierarchically expanded) searching.

Enhanced ranking algorithms that use concept and term relationships.

Searching multiple databases by mapping the user's query terms to the descriptors used in each of the databases, or mapping the descriptors from one database to another database (switching); common search language.
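Inclusive (hierarchically expanded) searching, listed above, can be sketched as follows: the query descriptor is expanded to include all of its narrower terms, at every level, before the index is consulted. The narrower terms of "Irradiation" are taken from the sample entry in Appendix 1; the extra level under "Gamma irradiation", the document index, and the function names are hypothetical.

```python
NARROWER = {
    "Irradiation": ["Gamma irradiation", "Ultraviolet irradiation", "X ray irradiation"],
    "Gamma irradiation": ["Cobalt-60 irradiation"],  # hypothetical extra level
}

def expand(descriptor, narrower=NARROWER):
    """Return the descriptor together with all of its narrower terms, at any depth."""
    expanded = set()
    stack = [descriptor]
    while stack:
        term = stack.pop()
        if term in expanded:
            continue  # guards against cycles in the hierarchy
        expanded.add(term)
        stack.extend(narrower.get(term, []))
    return expanded

def inclusive_search(descriptor, index):
    """Return the documents indexed with the descriptor or any narrower term."""
    wanted = expand(descriptor)
    return sorted(doc for doc, descriptors in index.items() if wanted & set(descriptors))

# Hypothetical document index: document id -> assigned descriptors
index = {
    "doc1": ["Gamma irradiation"],
    "doc2": ["Sterilizing"],
    "doc3": ["Cobalt-60 irradiation", "Processing"],
}
hits = inclusive_search("Irradiation", index)
```

A search on the exact descriptor "Irradiation" would find none of these documents; the expanded search finds the two indexed under narrower terms.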


Support information retrieval 2: Provide a tool for indexing.

Vocabulary control.

User-centered (request-oriented, problem-oriented) indexing.

Indexing several databases in a field with a common index language and sharing the results of indexing to reduce overall indexing effort.

Mapping indexing descriptors from one system to another.


Support information retrieval 3:

Facilitate the combination of multiple databases or unified access to multiple databases through

mapping the user's query terms to the descriptors used in each of the databases;

mapping the query descriptors from one database to another (switching);

providing a common search language from which to map to multiple databases;

providing a common index language for a number of databases in a field;

mapping indexing descriptors from one database to another.
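The first of these mappings, from a user's query term to the descriptor used in each database, can be sketched with one entry vocabulary per database. The tables below are simplified illustrations based on the "watershed" sample entry in Appendix 1, where AGROVOC treats "River basins" as a non-descriptor (UF) while FAO Glossary keeps "River basin" as a separate term. The function names and the crude plural-stripping normalization are assumptions made for this sketch.

```python
def normalize(term):
    """Crude normalization for illustration: lower-case and drop a final 's'."""
    text = term.strip().lower()
    return text[:-1] if text.endswith("s") else text

# Per-database entry vocabulary: normalized lead-in term -> descriptor.
# Simplified from the "watershed" sample entry; not a complete mapping.
ENTRY_VOCABULARY = {
    "AGROVOC": {
        "watershed": "Watersheds",
        "catchment area": "Watersheds",
        "river basin": "Watersheds",
    },
    "FAOGloss": {
        "watershed": "watershed",
        "river basin": "River basin",
    },
}

def map_query(term, vocabulary=ENTRY_VOCABULARY):
    """Map a user's term to the descriptor of every database that knows it."""
    key = normalize(term)
    return {db: table[key] for db, table in vocabulary.items() if key in table}

river = map_query("River basins")
catchment = map_query("Catchment areas")
```

The same query term leads to different descriptors in different databases, which is why the mapping has to be kept per database rather than as a single synonym list.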


Support information retrieval 4: Document processing after retrieval

Sample functions that require knowledge-based support:

Meaningful arrangement of search results (see next box)

Highlight descriptors responsible for retrieval, using colors to show facets.

Highlight terms belonging to a given category, for example, personal names, again using different colors for different categories.

Prepare document summaries, possibly in a different language, taking into account the query topic.

Translate full documents.

Extract substantive data from text. Compile and arrange data extracted from several texts.


Support meaningful, well-structured display of information

Meaningful arrangement of units (document records, paragraphs, property data on a given substance assembled from several databases), including knowledge-based clustering of records retrieved. This includes meaningful structure for Web sites and subject directories

This supports exploration of large retrieved sets and, by extension, exploration of the content of an entire collection or subcollection.

Meaningful arrangement of information within a unit (for example meaningful ordering of descriptors within a bibliographic record).


Organizing and keeping track of goods and services for commerce (esp. ecommerce) and inventory

The functions detailed for information retrieval apply to this special case

Organize a store, an inventory, an online merchandise catalog, a yellow page directory so items can be found

Display the inventory in a meaningful arrangement so users can find things (as in a store)

Keep track of inventory

These functions apply both to business-to-consumer and to business-to-business commerce. Classification by function or purpose is especially important here.


Ontology for data element definition.

Data element dictionary.
Consider data processing systems in a multinational corporation


Conceptual basis for knowledge-based systems.


Do all this across multiple languages


Mono-, bi-, or multilingual dictionary for human use.

Printed or machine-readable, such as dictionary on CD-ROM or a thesaurus used in conjunction with a word processor

Dictionary/knowledge base for automated language processing

Machine translation and natural language understanding (data extraction, automatic abstracting/indexing). (It should be noted that parsing natural language requires not only morphological information and information about the possible syntactic roles of a term but also a great deal of semantic information.)

Spell check dictionary

Knowledge base for grammar checking.


Functions of an ontological knowledge base in software development

Assist in the design and implementation of the user interface, esp. choice of terms and icons.

Terms and icons must be chosen with the sometimes conflicting goals of communicating to the intended user group and of adhering to standards.

Assist in the organization and formulation of help messages and of documentation and third-party software books.

Serve as the lexicon for machine translation of interfaces and software-related documents

Assist the user in understanding interfaces and documentation, esp. in a foreign language.

Support retrieval of software for the end user or for software reuse.

Data element definition and standardization and organization of CASE tool databases.

All this functionality must be provided in multiple languages (for example, software localization for end users, CASE tool databases for multinational development teams)

Appendix 5. Entity-Relationship schema for a moderately detailed Interlinked Food Description (IFD) database with emphasis on thesaurus structure

January 10, 1992

Dagobert Soergel

See Soergel et al. A language for the description of foods in databases for background (included in file Foodlanguage2b.pdf).

At the core of a thesaurus for an IFD system is a list of authorized entity values arranged by entity type. Each entity value is represented by an internal identifier, usually a short code, and an external preferred term. Additional synonyms may be given with reference to the preferred name or code.

Various relationships can be established among these entity values independently of the description of individual food products. These relationships constitute data, and they are stored in the same entity-relationship structure as any other data in the IFD system; the thesaurus is an integral part of the database.

The relationships comprise is-a relationships between entity values of the same type, such as Potassium sorbate is-a ***, and relationships between entity values of different types, such as Potassium sorbate is-used-for Preservation. Both types of relationships can be used for the classification of entity values within one entity type. The is-a relationships define the basic classification, for example of substances by their chemical nature. The other relationships can be used to define auxiliary classifications, for example a classification of substances by use, which would bring together all preservatives.

For many applications an auxiliary classification may be more helpful than the basic classification. Fortunately, in an on-line thesaurus there is no need to choose between the two; the user can have the entity values displayed in any way that is supported by the data in the system. Even in a printed edition one may include several arrangements of the entity values within a type, but size constraints may force choices.
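The distinction between the basic classification (from is-a relationships) and an auxiliary classification (from other relationships) can be illustrated with a small set of relationship triples. In this sketch only the statement "Potassium sorbate is-used-for Preservation" comes from the text; the class names, the other substances, and the function name `classify` are hypothetical placeholders.

```python
from collections import defaultdict

# (entity value, relationship, entity value) triples
TRIPLES = [
    ("Potassium sorbate", "is-used-for", "Preservation"),  # from the text
    ("Potassium sorbate", "is-a", "Organic salt"),          # hypothetical class
    ("Sodium benzoate", "is-used-for", "Preservation"),     # hypothetical
    ("Sodium benzoate", "is-a", "Organic salt"),            # hypothetical
    ("Sucrose", "is-used-for", "Sweetening"),               # hypothetical
    ("Sucrose", "is-a", "Sugar"),                           # hypothetical
]

def classify(triples, relationship):
    """Group entity values by the target of the chosen relationship type."""
    groups = defaultdict(list)
    for subject, rel, target in triples:
        if rel == relationship:
            groups[target].append(subject)
    return dict(groups)

basic = classify(TRIPLES, "is-a")           # classification by chemical nature
by_use = classify(TRIPLES, "is-used-for")   # auxiliary classification by use
```

Because both arrangements are derived from the same stored relationships, an on-line thesaurus can offer either display on request without duplicating data.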

The following pages show the entity types and relationship types for food descriptions that are suggested for a first implementation of the IFD system and the relationship types that should be considered in building the thesaurus. The remainder of this section will discuss each entity type in turn.

Entity types

Food product, recipe, or standard

Three values are at the top of the hierarchy

Values: food product, food recipe, food standard of identity

A substance value can be used whenever food product is indicated in a relationship.

Organism or inorganic food source

Age

Maturity of plant or animal part

Sex

Environment

Values can be used for recording more detail of growing conditions. It presently includes values that distinguish between farm- or garden-produced plants or animals and wild plants or animals.

Anatomical part

Substance, material (energy or matter)

Physical state

Values: solid, semisolid, semiliquid, liquid or more precise viscosity numbers.

Physical form

Process

Values cover many different types of processes, e.g. cooking method, incl. storage and handling.

Sequence number of process

Place/stage of processing or point in distribution chain

Values include farm/garden, manufacturing plant, retail store, restaurant, home.

Consumer group

Characterization by age, sex, ethnic origin and other relevant characteristics

Purpose or effect, use (including diet by intake)

Sample values are nutrition, preservation, texture, packing. Includes meal type (breakfast, lunch, snack, etc.) and use for special holiday or religious occasions. Also includes the distinction between luxury and everyday foods. When a purpose or effect is required, the value of a process can be given: the purpose is to facilitate the process.

Condition

Amount

Property

Geographical area

Includes countries, subdivisions of countries, regions (such as tropics), bodies of water.

Organization

Includes food-producing companies, government organizations (as producers of standards), and cookbooks (as "producers" of recipes).

Date

Money (for indicating price if desired)

Currency

Container, equipment

Types of containers or equipment, such as aluminum can or kettle made of copper, or specific models. Values are divided into container and equipment and within each into primary components (for example, bottle, mixing bowl) and secondary components (for example, lid, mixing arm).

Relationship types for food description

food product

isa

[food product 2]

food product

is_one_of

[food product list]

food product

is_one_or_more_of

[food product list]

Labels often include ingredient listings such as

Vegetable oil (one or more of the following: soybean, almond, coconut, or cottonseed oil)

This and the preceding relationship allow for the definition of a corresponding "food product", which can then be used in a has_ingredient statement.

food product

is_substitute_for

[food product 2]

food product

comes_from_source

[food source, age, sex, environment]

food product

comes_from_part

[[principal part present, [subparts absent], [subparts unknown], maturity, [[country, grade]]]]

Often the anatomical part from which a food product is made is most easily described as a principal part from which some subpart was removed, such as

Fruit, seed removed
Fruit, seed removed, skin removed

To make matters worse, it is often not known whether a subpart was removed or is still present. Rather than including all the possible combinations in the thesaurus, this relationship provides a frame for constructing the appropriate combination from three pieces of information as required: the principal part, a list of zero or more subparts known to be absent, and a list of zero or more subparts for which it is not known whether they are present or not. Examples:

Fruit, seed removed, skin unknown; ripe
fp636 comes_from_part [[fruit, [seed], [skin], ripe]]

Fruit, seed removed, skin removed; no statement on maturity
fp737 comes_from_part [[fruit, [seed, skin]]]

The format of the relationship allows for listing several principal parts with their modifiers; this may be more natural than constructing an intermediate food product with two parts as ingredients.
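The frame described above can be sketched as a small C structure. This is only an illustration under assumptions: the names part_frame and part_frame_describe, the fixed limit of eight subparts, and the exact wording of the rendered phrase are not part of the IFD proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBPARTS 8   /* arbitrary limit chosen for this sketch */

/* One principal part with its modifiers (hypothetical layout). */
struct part_frame {
    const char *principal;              /* principal part present */
    const char *absent[MAX_SUBPARTS];   /* subparts known to be removed */
    int n_absent;
    const char *unknown[MAX_SUBPARTS];  /* subparts whose presence is unknown */
    int n_unknown;
    const char *maturity;               /* NULL = no statement on maturity */
};

/* Append s to buf without ever writing past size bytes. */
static void append(char *buf, size_t size, const char *s)
{
    size_t used = strlen(buf);
    if (used + 1 < size)
        strncat(buf, s, size - used - 1);
}

/* Build the descriptive phrase from the three pieces of information. */
void part_frame_describe(const struct part_frame *f, char *buf, size_t size)
{
    int i;
    assert(f != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    append(buf, size, f->principal);
    for (i = 0; i < f->n_absent; i++) {
        append(buf, size, ", ");
        append(buf, size, f->absent[i]);
        append(buf, size, " removed");
    }
    for (i = 0; i < f->n_unknown; i++) {
        append(buf, size, ", ");
        append(buf, size, f->unknown[i]);
        append(buf, size, " unknown");
    }
    if (f->maturity != NULL) {
        append(buf, size, "; ");
        append(buf, size, f->maturity);
    }
}
```

With the values of the fp636 example (principal part fruit, seed absent, skin unknown, maturity ripe) the function produces the phrase "fruit, seed removed, skin unknown; ripe", so no combination needs to be stored in the thesaurus in advance.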

food product

is_made_from

food product 2

is_made_from is used for foods from a single source where has_ingredient would seem unnatural, such as protein extracted from soybeans or chocolate nibs made from cacao beans by roasting and cracking. In many searches it makes sense to search for

is_made_from OR has_ingredient

food product

contains

[substance, amount in total, amount in solids, part of definition (yes/no), label claim (yes/no)]

The relationship contains refers, strictly speaking, to analytical rather than descriptive data. However, many foods are defined in terms of the nutrients or other substances they contain, for example, low-fat milk. Analytical data thus become, by necessity, part of the description of the food. This relationship can be used to integrate analytical data into an IFD database, if desired.

food product

is_isolated_substance

[isolated substance, degree of concentration]

The process by which the isolation was accomplished should be indexed separately, using the relationship underwent_process.

food product

had_removed_substance

[removed substance, degree of removal]

The process by which the removal was accomplished should be indexed separately, using the relationship underwent_process.

food product

has_ingredient

[food product, predominance rank, total ingredient amount in total food product, ingredient solid amount in food product solids, [purpose list]]

Where amount or predominance rank appear in relationships, a more complex structure will be needed to reflect the variability of foods, especially in the description of generic foods. The suggested structure replaces a single value by a list as follows:

[most common value, ["RANGE", upper bound, lower bound], ["LIST", value-1, value-2,..., value-n]]

For example, in the description of the generic food product milk (cow's milk), the amount specification in the contains statement for fat would be

[3.35, ["LIST", 3.35, 2.0, 1.0, 0.2]]

This indicates that the most common fat content is 3.35 g per 100 g (whole milk) and that the permissible values are those given after LIST. This information is presented to the indexer who entered milk as the product to be indexed; the indexer must then choose the correct value for the product at hand or, if that information is not available, accept the imprecise specification given.
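A minimal C sketch of this value structure follows. The names amount_spec, amount_is_permitted, and amount_default are hypothetical, and the acceptance rule (a value passes if it is in LIST or inside RANGE, and anything passes when neither is present) is an assumption, since the text does not say how RANGE and LIST combine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rendering of
   [most common value, ["RANGE", upper bound, lower bound], ["LIST", v1, ..., vn]]. */
struct amount_spec {
    double most_common;   /* value offered to the indexer by default */
    int has_range;        /* nonzero if a RANGE element is present */
    double upper, lower;  /* bounds, in the order the schema lists them */
    const double *list;   /* LIST values, or NULL if absent */
    size_t n_list;
};

/* Compare two amounts with a small tolerance. */
static int nearly_equal(double a, double b)
{
    double d = a > b ? a - b : b - a;
    return d < 1e-9;
}

/* Assumption: a value is permitted if it appears in LIST or lies within RANGE;
   with neither element present, no constraint is recorded and any value passes. */
int amount_is_permitted(const struct amount_spec *s, double value)
{
    size_t i;
    assert(s != NULL);
    if (!s->has_range && s->n_list == 0)
        return 1;
    for (i = 0; i < s->n_list; i++)
        if (nearly_equal(s->list[i], value))
            return 1;
    return s->has_range && value >= s->lower && value <= s->upper;
}

/* Fallback when the indexer has no precise figure: the most common value. */
double amount_default(const struct amount_spec *s)
{
    assert(s != NULL);
    return s->most_common;
}
```

For the milk example, a specification holding 3.35 as the most common value and the list 3.35, 2.0, 1.0, 0.2 would accept 2.0 and reject 1.5; an indexer without label information would fall back on 3.35.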

If the point in the process and/or the manner of addition of the ingredient are important, use a separate underwent_process statement with addition or a narrower term as the process and the ingredient as the agent.

food product

may_have_ingredient

(as has_ingredient)

food product

contains_dish

food product 2

This relationship serves to describe TV dinners and such.

food product

underwent_process

[process, [agent list], equipment, place/stage, sequence no., [purpose list]]

An agent of a process may be an organism, a substance, or a food product. If the agent is also an ingredient (if the agent stays in the food), use a separate has_ingredient statement.

A product that underwent several processes can be indexed in two ways.

(1) Index as one food product, giving for it all the processes with their sequence number.

(2) Define several intermediate food products and index each separately; each product becomes the ingredient of the product next in the processing stages.

Option 2 should be used when the intermediate stages are products in their own right.
The presence of both options in a database must be considered in searching.

food product

has_state

[physical state, temperature]

In this relationship a missing temperature value is interpreted as the default value room temperature (20 degrees Celsius). There can be several has_state statements for a food product; at least one must be for room temperature.
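The default rule can be sketched in C as follows. The structure has_state and both function names are illustrative assumptions; only the default of 20 degrees Celsius and the requirement of at least one room-temperature statement come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define ROOM_TEMPERATURE_C 20.0   /* default stated in the schema */

/* One has_state statement: [physical state, temperature] (hypothetical layout). */
struct has_state {
    const char *physical_state;
    int temperature_given;   /* 0 = temperature element omitted */
    double temperature_c;    /* meaningful only if temperature_given is nonzero */
};

/* A missing temperature is read as room temperature. */
double state_temperature(const struct has_state *s)
{
    assert(s != NULL);
    return s->temperature_given ? s->temperature_c : ROOM_TEMPERATURE_C;
}

/* Check the rule that at least one statement must be for room temperature. */
int covers_room_temperature(const struct has_state *states, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (state_temperature(&states[i]) == ROOM_TEMPERATURE_C)
            return 1;
    return 0;
}
```

A data-entry screen could call covers_room_temperature before accepting a food product description and ask the indexer for the missing statement if the check fails.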

food product

has_form

[physical form, temperature]

food product

has_property

property

food product

is_for_special_use

[purpose, label claim (yes/no)]

food product

made_for

consumer group

food product

is_used_for

[purpose, priority, season or day of year, [country list]]

food product

has_price

[money, currency]

food product

produced_in

[geographical area, date]

If the statement is about a food standard, geographical area is the country or other area in which the standard holds, and date refers to the date it was promulgated.

food product

produced_by

organization

If the statement is about a food standard, organization is the organization that developed the standard.

food product

packed_in

container

Relationship types for container and equipment description

container or equipment

consists_of

[[container or equipment component, structural strength material, inside coating, outside coating]]

The container description is a list. The elements of the list are in turn lists with four elements each, as indicated. The first component given is the primary component, such as bottle, box, or tray. The remaining components (if any) are secondary components, such as lid, ends, window, liner, top cover. The primary component implies a form.

Relationship types to be used in the thesaurus

entity

isa

entity

entity

is_part_of

entity

entity

overlaps_with

entity

[organism, part]

is_used_for

[purpose, priority, [country list]]

organism

lives_in

environment

substance

is_measured_in

[unit 1, conversion factor, unit 2]

Unit 1 is the unit of measurement used, generally kg, g, mg, or µg. The conversion factor serves, for example, to convert international vitamin units to mg or µg.

substance

has_daily_requirement

[user group, minimum amount, recommended amount, maximum amount, [country list]]

substance

is_used_for

[purpose, priority, food product, [country list]]

[substance, consumer group, quantity]


is_harmful_for

[harmful effect, strength, food product, [country list]]

process

is_used_for

[purpose/effect, priority, food product, [country list]]

process

is_harmful_for

[harmful effect, strength, food product, [country list]]

substance

is_soluble_in

[substance]

Notes on the conceptual schema

1. Many food descriptions will not include the level of detail that is possible with the schema. The level of detail is determined by the amount of information that is available. The schema defines an upper limit in the sense that one should not aim at descriptions that are more detailed than the schema provides.

2. Most relationships have one element to the left and an argument list at the right; the argument list may contain lists as elements. The meaning of an element in a list is normally specified by its position. Elements at the end of a list may be omitted if they are not specified. However, if one element remains unspecified and an element further to the right in the list is specified, the unspecified element must be represented by a place holder to maintain the proper position for the specified element.

3. The structure of the relationships is determined by ease of internal processing. In the finished system the user will input the information needed on formatted screens and the system will put together the relationship.

4. Any relationship can be used to express the absence or negation of something, for example,

fp383

has_ingredient

["NOT", [salt]]

In general: The right-hand list has two arguments, "NOT" and a list which has the same structure as the argument list of the unnegated relation.

5. The source of data, the date of analysis, the date of data entry, the person responsible for entering the data, and comments can all be handled through auxiliary relationships. The left-hand side of a relationship is expanded to be a pair of the form [entity value, statement number] (for example, [food product, statement number] or [substance, statement number]).

[entity, stno]

has_entry_info

[entry date, entry person]

[entity, stno]

has_source_info

[[source, date of analysis]]

[entity, stno, specific element]

has_comment

text

specific element would name a specific element in a statement to which the comment refers; this may not be needed.
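The positional rule of note 2 (trailing unspecified elements are dropped, while an unspecified element followed by a specified one needs a place holder) can be sketched in C. The function name format_args, the use of NULL for an unspecified element, and the printed place holder "_" are assumptions made for this illustration; the schema does not prescribe a place-holder symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Render a positional argument list. NULL marks an unspecified element;
   "_" as the printed place holder is an assumption of this sketch. */
void format_args(const char *const *args, size_t n, char *buf, size_t size)
{
    size_t i, used;
    int w;
    assert(args != NULL && buf != NULL && size > 0);

    /* Unspecified elements at the end of the list may simply be omitted. */
    while (n > 0 && args[n - 1] == NULL)
        n--;

    w = snprintf(buf, size, "[");
    used = (size_t)w;
    for (i = 0; i < n && used < size; i++) {
        /* An unspecified element before a specified one keeps its position. */
        w = snprintf(buf + used, size - used, "%s%s",
                     i ? ", " : "", args[i] ? args[i] : "_");
        if (w < 0)
            return;
        used += (size_t)w;
    }
    if (used < size)
        snprintf(buf + used, size - used, "]");
}
```

For an underwent_process statement in which only the process (roasting) and the place/stage (manufacturing plant) are known, the six-element list is rendered as [roasting, _, _, manufacturing plant]: two place holders keep the fourth position, and the two unspecified trailing elements disappear.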

Further elaboration of the entity types

Food product, recipe, or standard

Food product is the central entity type of an IFD database. Food products are described by indicating their relationships to other food products and to entities from other entity types. Every food product description starts with a statement of the form

Food product 1 is a Food product 2.

Further statements then record the characteristics that are added to the description of food product 2 to result in the more specific description of food product 1.

The thesaurus contains a number of very generic food products called product types. The product types serve as a starting point in the description of more specific foods. They also provide a useful overview of the major types of food (food groups). Since different classifications of food groups are used for different purposes and by different organizations, the thesaurus may contain several such classifications with one designated as the major one for purposes of this system.

The thesaurus entry for a product type is a food product description following the same rules as the descriptions of specific foods in the database. The only difference is that the names of product types appear in the thesaurus listing whereas the names of more specific food products do not. All food products, whether product types or more specific products, are accessible online and can be displayed in any selection and arrangement desired (within the capabilities of the system).

Development of this part of the thesaurus should start from the FFV factor Product type. Since many product types are defined based on usage, many values of the entity types Purpose or effect and Meal type must be introduced to allow for proper description of all product types.

Organisms and inorganic sources

The scope of this entity type comprises food-relevant organisms, that is, organisms used as a source for food, used for the treatment of food, or having a harmful effect on food. It also includes the values Mining (for products such as salt and water) and Chemical synthesis. This is somewhat broader than the definition of the FFV factor Food source.

The basic classification of organisms should follow taxonomic arrangement. Standard taxonomy provides a common frame of reference, which is of particular importance in an international system.

Application-oriented classifications are also very important, perhaps more important than the taxonomic classification. They are accomplished through relationships of the type

(organism, part) is-used-for (purpose, priority, country-list)

where the entity type purpose includes such values as fruit, berry, vegetable, and grain. The purpose slot can also be filled by pairs of the form (extraction, substance). Priority gives some indication of what is the main use and what are minor uses of an organism. The optional country-list allows for differences from country to country in these data. If it is used consistently, each country can have its own application-oriented classification of organisms. There must be at least one relationship of the type is-used-for for every organism. This takes care of the classification of Plant used as food source in FFV, which is very helpful.

A further relationship is useful:

organism lives-in environment,

where values for environment are initially limited to

Land (soil), Water, Sea water, Fresh water.

The relationship Organism lives-in Land is the default. Explicit statements need be included only for organisms that live in water, where the specific terms Sea water and Fresh water are used whenever possible. This relationship makes it possible to search for all seafood.
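The default can be sketched in C as shown here. The structure organism and the function names are hypothetical, and treating Sea water and Fresh water as narrower terms of Water is an assumption drawn from the value list above rather than something the text states outright.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An organism with an optional explicit environment statement (hypothetical layout). */
struct organism {
    const char *name;
    const char *environment;   /* NULL = no explicit statement recorded */
};

/* Land is the default when no statement was recorded. */
const char *organism_environment(const struct organism *o)
{
    assert(o != NULL);
    return o->environment != NULL ? o->environment : "Land";
}

/* Assumption: Sea water and Fresh water are narrower terms of Water. */
int lives_in_water(const struct organism *o)
{
    const char *env = organism_environment(o);
    return strcmp(env, "Water") == 0
        || strcmp(env, "Sea water") == 0
        || strcmp(env, "Fresh water") == 0;
}

/* Count the organisms a search on Water would retrieve. */
size_t count_aquatic(const struct organism *list, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        if (lives_in_water(&list[i]))
            count++;
    return count;
}
```

Because the default is applied at retrieval time, only the aquatic organisms need an explicit statement; a search restricted to Sea water would follow the same pattern with a single comparison.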

Development of this part of the thesaurus starts from the FFV factor Food source. The Latin names, and thus the taxonomic location, for each organism must be determined (use Hortus 3 as authority for plant names). In some cases expansion into subspecies, varieties, and cultivars may be indicated. Additional organisms that are used as food source will be identified as the database grows. The FFV classification supplies most of the information needed for the application-oriented classification from a US point of view. These data can be augmented as the system develops.

Food-relevant organisms that are not food sources must be identified and included with appropriate is-used-for relationships. Such organisms can be identified from the CFR (for example, bacteria and molds used in making cheese) and from textbooks and handbooks in food science and technology.

Ancillary entity types: Age, Maturity (values to be determined), and Sex.

Environment

At this point Environment is recommended only for thesaural relationships making general statements about the organisms included in the thesaurus as discussed above and not for making statements that specify the conditions (e.g. soil type) under which the specific plants or animals used for the food product being described were raised. Therefore, the only values to include here for now are

Land (soil), Water, Sea water, Fresh water

Anatomical part

This section contains an anatomic classification of plant parts and of animal parts at a level of detail appropriate for food use. Skeletal meat parts may be further subdivided into cuts. Since different countries use different meat cuts, several parallel classifications may be needed. As an aid to the user one may develop a very fine-grained classification of base cuts such that any conventional cut used in any country can be represented as a group of base cuts.

In FFV there are two ancillary entity types, Presence of bone or shell (values With bone and Boneless) and Meat shade (values White, Light, and Dark). The first can be handled by the conceptual schema presented here; the second is a genuine entity type.

The classification can be taken from the FFV factor B2, the classifications for the ancillary entity types from factor Z. The Extract, concentrate, or isolate terms from B2 need not be included; this is handled through the relationship is_isolated_substance.

Substance/material

The IFD thesaurus must contain a comprehensive list of food-relevant substances that

- naturally occur in foods,
- occur in foods through migration in the food chain,
- are added to foods for whatever purpose,
- are used to treat foods,
- are used in food-processing equipment, or
- are used in any component of packaging.

The basic classification of substances should reflect their chemical nature, choosing from the many ways in which substances can be classified chemically the one that is most useful in the context of foods. Beyond that, the system should rely on a chemical substructure search system, such as the CAS Registry, with an option to limit output of search results to substances in the IFD thesaurus.

An application-oriented classification is very important here. While the data for the basic classification are by and large the same as those found in a source like CAS, the application-oriented classification is the real contribution of the IFD thesaurus. It should be based on the relationship

substance is used for (purpose, priority, food product)

and on the relationship

substance is harmful for (harmful effect, strength).

For each substance there may be, and usually are, multiple is-used-for and/or is-harmful-for statements. The purpose slot can accept a combination of the form (aiding-in, process) or (counter-acting, process). The food product is generally specified at a very high level (food group).

Substance per se is not a factor in the FFV. A good starting point for developing this section of the thesaurus is FDA/CFSAN's chemical dictionary, which gives verified CAS numbers and names for almost any food-relevant substance. Data for the relationship is-used-for can be obtained from the FFV by virtue of a substance being listed in a factor, and from the CFR. Substances occur in the following FFV factors: B1 Food source (in connection with plants used for producing certain substances), B2 Part of plant or animal (in connection with extracts), D3 Treatment applied (primarily as Ingredient added, but also as substances used in processing), D4 Preservation (preservatives), E2 Container (containers are organized first by material), and E3 Food contact surface. All substances listed in one of these two sources should be included in the IFD thesaurus.

Physical state

The following classification, taken essentially from the FFV factor C Physical state, shape, or form, should be satisfactory.

Liquid

Liquid, low viscosity

Liquid, low viscosity, no visible particles
Liquid, low viscosity, with very small particles

Liquid, high viscosity

Liquid, high viscosity, no visible particles
Liquid, high viscosity, with very small particles

Semiliquid

Semiliquid with smooth consistency
Semiliquid with very small particles

Semisolid

Semisolid with smooth consistency
Semisolid with very small particles

Solid

Soft
Hard

The internal structure seems less important for solid products. However, if it is deemed important, appropriate values can be added.

The FFV term Liquid, low viscosity, with small pieces and the other corresponding terms are not needed here. The product would be indexed as consisting of two ingredients, and the physical state (and the physical form) would be indexed separately in the description of each ingredient. (There is still a problem here that needs to be resolved: The ingredient physical state and form should be indexed at the point of entry into the food being described, while for the purposes here one would need the physical state and form at the end of the processing, in the finished product.)

Physical form

In the long run, an integrated list of physical form terms applicable to food products as well as to containers should be developed. For now it is easiest to have two subdivisions:

Physical form of food product
Physical form of container

The first subdivision can be taken from the subdivision of Solid in the FFV. Values for the second subdivision can be derived from the container terms in the FFV factor E2 Container or wrapping by subtracting the material component.

Process

This is a very important entity type. It covers all processes used in the manufacture/preparation, preservation, and handling of foods. The base classification should reflect the intrinsic nature of the processes. One possible broad subdivision is

- Physical process (with heat process as a major narrower term)
- Chemical process
- Biochemical process

Application-oriented classifications can be based on the following two relationships

process is-used-for (purpose, priority, food product)

and

process is-harmful-for (harmful-effect, strength, food product).

Finding all these relationships while simultaneously adding values to the list for the entity type purpose/harmful effect will require considerable effort.

The FFV does not have a process factor per se. The factors in group D (D1 Degree of preparation, D2 Cooking method, D3 Treatment applied, and D4 Preservation) all include many processing terms. To begin with, the terms found there might suffice.

Consumer group

The values in this entity type are in turn composites of three factors: organism, age, and sex. Organism in this context distinguishes between human and animal. Animal species are included in the entity type organism anyhow. The most logical course of action is to include Human there as well (possibly subdivided by race if such subdivision is food-relevant). Note that Human is needed anyhow as the source of human milk. The most important combinations should be included in the thesaurus so that they can be used as if they were elemental values. The less frequent combinations must be built during indexing as they are needed.

Purpose/harmful effect

For economy of entity types, these two can be combined into one type. This entity type comprises all the purposes or effects to be achieved in the production, processing, and consumption of foods, and all the harmful effects that can be caused by food-relevant substances.

At a minimum, this section of the thesaurus must contain the purposes/harmful effects needed for the application-oriented classification of organisms, substances, and product types.

The basic classification should first make the division between purpose/useful effect and harmful effect, and then group by area, such as nutrition, preservation, appearance, etc. An alternate classification might group, for example, all physiological effects, whether useful or harmful.

This may be the most difficult part of the thesaurus to develop since there is no explicit list to start from. Purposes are implied by the classification in the FFV factors Product type and Food source, by the factors Preservation, Packing medium, Container, and Food contact surface, and by the classification of food additives and other mention of substances in the CFR. Explicit purpose terms must be extracted from these sources. It is recommended that purpose terms be accumulated as is-used-for statements about organisms, substances, and product types are entered.

This entity type includes

Use/diet

This entity type is concerned with special uses of foods, especially conditions, such as diseases, for which the food is helpful or at least can be tolerated. It covers some of the values under Consumer group in FFV. However, the descriptors such as Low sodium are handled through the relationship

Food product contains (Substance,...)

with the value of Label claim being yes.

Amount

This entity type serves to record quantitative ingredient or nutrient information where available. Amount is recorded as a number per 100 g. The unit of measurement is linked once and for all to the substance for which the amount is recorded; therefore, there is no need to give the unit of measurement for each individual amount recorded.

Sometimes amounts are given imprecisely, such as "low in sodium". The following special values of the entity type amount are introduced to make recording of this type of information possible.

None
Very low
Low
Medium low
Normal
Medium high
High

For substances for which standards exist, these terms can be linked with a range of values through a relationship to be defined for this purpose.

Property

This entity type is introduced to capture any properties not taken care of otherwise. Its use and the values needed must be developed over time.

Geographical area

Countries could be represented by their three-digit telephone code. Each country would define a hierarchy of regions (possibly two or three ways of subdividing a country, each adapted for a different type of food, such as wine and cheese in France). The codes for the within-country regions are appended to the country code. A set of special codes for large regions, such as tropics, remains to be developed.

Money and Currency

Money is represented as numbers with two places after the decimal point, and currency by the country code (with a table giving the proper designation).

Container

The situation here is similar to that for Food product. The system provides the elements for describing the containers by giving the structural strength material(s), the coating material(s), and the container form. An indexer can make up such a description on the spot and then reference it in a food description. However, the thesaurus should include a list of containers with ready-made descriptions corresponding to the FFV factor E2.

The discussion of the entity types Substance/material and Physical form already mentioned that the list of entity values included must consider the needs of container description.

Appendix 6. Detailed list of requirements for KOS management software

The text of this appendix is found in the file

SoftcritEnglish.pdf

Appendix 7. List of data fields in AGROVOC, FAO Term, and FAO Glossary

Two files constitute this appendix:

FAOSimplifiedDataSchemas.doc
FAODataSchema3Full.xls

The simplified schemas are stripped down to bare essentials, omitting whole tables and data fields of a mainly administrative nature, primary key declarations, and index declarations, so that it is easier to see the structure.

The full schema comparison is given in a spreadsheet. It is organized along the MARTIF categories.

As can be seen, FAO Term and FAO Glossary have very similar data structures. AGROVOC agrees in many aspects, with one major difference:

In FAO Term and FAO Glossary, there is one record for a term with a separate data field for the term in each of the six languages included. The same is true for definitions. In AGROVOC, on the other hand, the record for a term has a field for the term (in whatever language) and a field for the language; each language version of the term has its own record, tied together by the term number. AGROVOC's solution is more elegant in this case; it has simpler table definitions and it can add new languages without having to redefine tables.
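The difference between the two layouts can be sketched with two C structures. The structure and field names are hypothetical and do not reproduce the actual table definitions; the term number 1 is a made-up key, and the sample terms are taken from the watershed entry in Appendix 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_FIXED_LANGUAGES 6

/* FAO Term / FAO Glossary style: one record per term, one field per language.
   A further language means redefining the structure. */
struct wide_record {
    long term_no;
    const char *term[N_FIXED_LANGUAGES];
};

/* AGROVOC style: one record per language version, tied together by term number.
   A further language is just another record. */
struct long_record {
    long term_no;
    const char *language;
    const char *term;
};

/* Find the term for a given term number and language; NULL if not recorded. */
const char *lookup_term(const struct long_record *recs, size_t n,
                        long term_no, const char *language)
{
    size_t i;
    assert(recs != NULL && language != NULL);
    for (i = 0; i < n; i++)
        if (recs[i].term_no == term_no
            && strcmp(recs[i].language, language) == 0)
            return recs[i].term;
    return NULL;
}
```

In the second layout the Czech term povodí is added as one more record with the same term number, with no change to the structure definition, which is the advantage noted above.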

Appendix 8. Algorithm to convert terms to within-sentence case

/* Capital conversion. */

/* Principle: Capital letters are converted to lower case with the following exceptions:

- If any letter but the first in a word is a capital, no character in the word is changed.

- If a capital letter at the beginning of a word is preceded by '^^', it is not changed.

- If the capital letter is a word by itself, it is not changed.

- If the capital letter is followed by any punctuation character or digit, it is not changed.

- If a single word is in the database starting with a capital, the capital is preserved. */

/* Determine capitalization of a word within the whole term. (capitalization of the individual word for database lookup and storage is determined later.) Compute gToLower based on many conditions that protect upper case. */

/* gToLower
   0   capital left as is
   1   capital converted to lower case for word as such (sSingular), but not for word in term (apsTerm[n][i])
   2   capital converted to lower case in both places. */

/* If the word starts with lower case and/or falls under one of the protection conditions, gToLower is 0. Otherwise, gToLower may be 1 or 2, depending on other conditions. */

if (strlen(apsTerm[n][i]) == 1
    || islower(apsTerm[n][i][0])
    || gNoMod
    || strlen(asWordCaret[i]) >= 2
    /* A single caret maintains the initial capital of the word as part of the term, but not of the word stored as such. */
    || ispunct(apsTerm[n][i][1])
    || isdigit(apsTerm[n][i][1]) /* Strings like A1, A-1. */
    || strpbrk(apsTerm[n][i] + 1, sCapital)) /* Inner capital. */
{
    gToLower = 0;
}

else
/* The word starts with a capital and none of the protection conditions apply. */
{
    switch (cCapAlgorithm)
    {
    case 'n':
    case 'N':
        gToLower = 1;
        /* Capitalized words, preceded by at most one caret; capitalized as part of term, lower case as individual word. */
        break;

    case 'p':
    case 'P':
        /* In partial capital conversion only the first word is converted, but not if all other words (excluding stopwords) in a multi-word term are capitalized. */
        if (i == 1)
        {
            /* Determine whether all other non-stop words start with a capital; if so, keep capital on first word. */
            gAllCap = 1;
            for (k = 2; k <= iNumberOfWords; k++)
            {
                if (islower(apsTerm[n][k][0]) && !stopword(apsTerm[n][k]))
                /* The word starts with lower case and is not a stopword. */
                {
                    gAllCap = 0;
                    break;
                }
            } /* End for. */
            if (iNumberOfWords == 1)
                gAllCap = 0;
            if (!gAllCap)
                if (strlen(asWordCaret[i]) == 0)
                    gToLower = 2;
                else
                    gToLower = 1;
        } /* End if (i == 1). */
        else
            gToLower = 1; /* Unprotected individual word l.c. */
        break;

    case 'f':
    case 'F':
        if (strlen(asWordCaret[i]) == 0)
            gToLower = 2;
        else
            gToLower = 1;
        break;
    } /* End switch (cCapAlgorithm). */
} /* End if (isupper...). */

if (gToLower == 2)
    apsTerm[n][i][0] = tolower(apsTerm[n][i][0]);

/* End of capital conversion for the word in the whole term. May still need to be revised as part of database lookup. Still need to determine the word as it is to be stored individually. */

zeroset(sWord);
strncpy(sWord, apsTerm[n][i], sizeof(sWord) - 1);

/* apsTerm[n][i] remains the word as it appeared in the term, except that capitalization has been adjusted when necessary, including words that need to be always cap. */

if (gToLower == 1)
    sWord[0] = tolower(sWord[0]);

/* If gToLower is 2, apsTerm[n][i] is already converted to lower case. This must be done here, because this is the form to be looked up in the database. */

/* Now look up individual word in database, if gPost create new term record if needed.

Check in database whether term needs to be always cap. May need to reverse a conversion to lower; if original lower (as seen from gToLower) needs to be cap, need a message. */

if (stopword(sWord))
    /* Note: stopword is case-sensitive. */
    return -1;

/* Word is not a stopword. Search for word itself. */

/* Note: to_singular checks for occurrence in database, but not with the examination of capitalization as done here. */

if (btrieve_term_term(iGreaterEq, sWord) >= 0)
    if (strcmp(sWord, rTermRec.sTerm) == 0)
        /* Exact same term with same capitalization found. */
        gMatch = 2;
    else /* Not an exact match. */
        if (strcmpi(sWord, rTermRec.sTerm) == 0
            && strcmp(sWord + 1, rTermRec.sTerm + 1) == 0)
            /* Same term with case-insensitive comparison, exact same except for first character. */
            gMatch = 1;
        else
            gMatch = 0;
else
    gMatch = 0;

/* If there is a match, must still take care of words wher singular is stem + suffix, esp. final y, such as psychology. */


if (gMatch)
{
    zeroset(sSuffix);
    strncpy(sSuffix, rTermRec.sTerm + rTermRec.iStemLength, iSuffixLength);
    zeroset(sSingular);
    strncpy(sSingular, rTermRec.sTerm, iTermLengthMax);
    iStemLength = rTermRec.iStemLength;
}
else
/* If there is no match, find the singular and try it, if it is different. */
{
    if (gNoMod || asWordBackSlash[i][0])
    /* No modification. */
    {
        zeroset(sSingular);
        strncpy(sSingular, sWord, sizeof(sSingular) - 1);
        sSuffix[0] = 0;
        iStemLength = strlen(sSingular);
    }
    else
        /* Note: need to do this even if no final s to get final y as sSuffix. */
        iStemLength = to_singular(sWord, sSingular, sSuffix);

    if (strcmp(sWord, sSingular) != 0)
    {
        /* No use repeating the look-up if word was singular. */
        if (btrieve_term_term(iGreaterEq, sSingular) >= 0)
        {
            if (strcmp(sSingular, rTermRec.sTerm) == 0)
                /* Exact same term found. */
                gMatch = 2;
            else /* Not an exact match. */
                if (strcmpi(sSingular, rTermRec.sTerm) == 0
                    && strcmp(sSingular + 1, rTermRec.sTerm + 1) == 0)
                    gMatch = 1;
                else
                    gMatch = 0;
        }
        else
            gMatch = 0;
    }
}

iStemDiff = strlen(sSingular) - iStemLength;

switch (gMatch)
{
case 2: /* Exact match in database. */
    lFoundNo = rTermRec.lTermNo;
    if (islower(apsTerm[n][i][0]))
        /* No action necessary. */;
    else /* Initial cap. */
    {
        if (rTermRec.cAlwaysCap)
        /* Make sure initial cap is protected. */
        {
            if (strlen(asWordCaret[i]) < 2)
            {
                zeroset(asWordCaret[i]);
                strncpy(asWordCaret[i], "^^", sizeof(asWordCaret[i]) - 1);
            }
        }
        else
        {
            if (strlen(asWordCaret[i]) == 3 && gPost)
            /* Set rTermRec.cAlwaysCap to 1 in current term record. */
            {
                rTermRec.cAlwaysCap = 1;
                btrieve_term_term(iUpdate, "");
            }
        }
    }
    break;

case 1:
    /* Partial match in database, match in everything except capitalization of first character. */

    /* This can only happen if the first character of the word is lower case and the first character of the term in the database is upper case, and if the database does not contain the term with a starting lower case. Reasons: A lower-case word would match a lower-case term in the database exactly. If there is no lower-case matching term, next in sequence is the matching upper-case term (if in the database), otherwise a completely different term. If the word starts with upper case, an upper-case matching term would match exactly. A lower-case matching term would be earlier in the sequence and thus not be found by GreaterEq; thus, if there is no exact match for the upper-case word, the term found by GreaterEq is a completely different term, leading to gMatch = 0. */

    if (rTermRec.cAlwaysCap)
    /* Word must be first cap, unless input with lower case, protected. */
    {
        if (gToLower)
        /* Word was capitalized in input, must restore capital. */
        {
            lFoundNo = rTermRec.lTermNo;
            if (gToLower == 2)
            {
                apsTerm[n][i][0] = toupper(apsTerm[n][i][0]);
            }
            sSingular[0] = toupper(sSingular[0]);
            zeroset(asWordCaret[i]);
            strncpy(asWordCaret[i], "^^", sizeof(asWordCaret[i]) - 1);
        }
        else
        /* Word was not capitalized in input, must capitalize unless preceded by ^^. */
        {
            if (strlen(asWordCaret[i]) >= 2)
            /* Set rTermRec.cAlwaysCap to 0 in the term in rTermRec. Leave word being processed alone; a term record must be posted to the database. */
            {
                if (gPost)
                {
                    rTermRec.cAlwaysCap = 0;
                    btrieve_term_term(iUpdate, "");
                }
            }
            else /* Must capitalize word and print message. */
            {
                lFoundNo = rTermRec.lTermNo;
                apsTerm[n][i][0] = toupper(apsTerm[n][i][0]);
                sSingular[0] = toupper(sSingular[0]);
                fprintf(fpLoadReport1,
                        "%s was not capitalized in line\n%s\n",
                        apsTerm[n][i], sLineOrig);
                fprintf(fpLoadReport1,
                        "Program changed first letter to capital.\n\n");
                fprintf(fpLoadReportCum,
                        "%s was not capitalized in line\n%s\n",
                        apsTerm[n][i], sLineOrig);
                fprintf(fpLoadReportCum,
                        "Program changed first letter to capital.\n\n");
            }
        } /* End if (gToLower). */
    } /* End positive if (rTermRec.cAlwaysCap). */
    break;
} /* End switch (gMatch). */
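
The two-step comparison in the listing above (exact match, then a match that ignores only the case of the first character) can be tried out in isolation. The following is a minimal sketch under stated assumptions, not the program's actual code: the in-memory table `kTerms` and the functions `collate`, `find_greater_eq`, and `classify_match` are hypothetical stand-ins for the term file and for `btrieve_term_term(iGreaterEq, ...)`. It assumes, as the comment under case 1 implies, a collating sequence in which terms that differ only in case are adjacent, with lower case sorting first. The non-standard `strcmpi` is replaced by a `tolower` comparison of the first character, which gives the same result when combined with the exact comparison of the remaining characters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the term file, kept in collate() order. */
static const char *const kTerms[] = {
    "Africa", "Asia", "feed", "trawl", "watershed"
};
static const size_t kTermCount = sizeof kTerms / sizeof kTerms[0];

/* Assumed collating sequence: compare ignoring case first; if the
   strings are equal ignoring case, lower case sorts before upper case. */
static int collate(const char *a, const char *b)
{
    size_t i;

    for (i = 0; a[i] != '\0' && b[i] != '\0'; i++) {
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (a[i] != '\0')
        return 1;               /* a is longer, so it sorts later */
    if (b[i] != '\0')
        return -1;              /* b is longer */
    for (i = 0; a[i] != '\0'; i++)
        if (a[i] != b[i])       /* same letter, different case */
            return islower((unsigned char)a[i]) ? -1 : 1;
    return 0;
}

/* First stored term that is >= key in collate() order, or NULL. */
static const char *find_greater_eq(const char *key)
{
    size_t lo = 0, hi = kTermCount;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (collate(kTerms[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < kTermCount ? kTerms[lo] : NULL;
}

/* 2 = exact match, 1 = differs only in case of first character, 0 = no match. */
static int classify_match(const char *word)
{
    const char *found = find_greater_eq(word);

    if (found == NULL || word[0] == '\0')
        return 0;
    if (strcmp(word, found) == 0)
        return 2;
    if (tolower((unsigned char)word[0]) == tolower((unsigned char)found[0])
        && strcmp(word + 1, found + 1) == 0)
        return 1;
    return 0;
}
```

Under these assumptions, `classify_match("feed")` yields 2, `classify_match("africa")` yields 1 (only `Africa` is stored), and `classify_match("Feed")` yields 0, because the stored entry `feed` sorts before `Feed` and the greater-or-equal lookup moves on to a different term. This is the asymmetry the comment under case 1 describes.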

Appendix 9. Mock-ups for an online thesaurus display

Files Mockup1.pdf, Mockup2.pdf, Mockup3.pdf

Note: Internal to Harvard Business School. For internal use of the FAO thesaurus and ontology group only.

The mockups show a series of screens that demonstrate a type of thesaurus display that always gives the user an overall view of the thesaurus structure. They show a progression from a hierarchical outline to the "quick hierarchy" display (just descriptors, no annotations) and finally to the "annotated hierarchy" (descriptors with all annotations: definitions, synonyms, and conceptual relationships). Clicking on a descriptor in one display brings up the next, more detailed display centered on that descriptor.

Information about the descriptor and relationships in other KOS can be shown in an expanded annotated hierarchy.

A similar thesaurus display can be seen in operation on the Web site for the Alcohol and Other Drug Thesaurus, http://etoh.niaaa.nih.gov/AODVol1/aodthome.htm

Most KOS interfaces on the Web show the user only a small window on the scheme, which makes it difficult for the user to get a sense of the overall structure and locate her topic within that structure. A sense of the overall structure often assists users in the very process of formulating their topics.

Appendix 10. A SQL data schema for a thesaurus and ontology system under Oracle

This appendix is given in file thes_schema.1.0.doc

Note: Internal to Harvard Business School. For internal use of the FAO thesaurus and ontology group only.

This schema is based on the concept-term-string model described in the JoDI paper http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/. It is a very flexible schema that allows any type of relationship to be entered. It is based on the principle that most information about concepts, terms, and strings can be expressed through relationships. The schema allows for indicating sources and for a detailed audit trail of changes and responsibilities.
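
To make the relationship-centered principle concrete, the following is a small illustrative sketch in C. It is not taken from thes_schema.1.0.doc: the type names `EntityKind`, `EntityRef`, and `Relationship`, the field names, the numeric identifiers, and the function `count_relationships` are all hypothetical. The sample rows reuse the BT and RT entries and source labels of the "watershed" entry in Appendix 1; the `changed_by` values are invented placeholders for the audit trail.

```c
#include <stddef.h>
#include <string.h>

/* The three kinds of entity in the concept-term-string model. */
typedef enum { KIND_CONCEPT, KIND_TERM, KIND_STRING } EntityKind;

/* Reference to one entity; the ids used below are hypothetical. */
typedef struct {
    EntityKind kind;
    long id;
} EntityRef;

/* One relationship record: most information is stored as rows like this. */
typedef struct {
    EntityRef subject;
    const char *type;        /* open-ended relationship type, e.g. "BT", "RT" */
    EntityRef object;
    const char *source;      /* where the statement comes from */
    const char *changed_by;  /* audit trail: who entered or changed it */
} Relationship;

/* Sample rows for the concept "watershed" (id 1, hypothetical). */
static const Relationship kSample[] = {
    /* watershed BT Physiographic features [AGROVOC] */
    { { KIND_CONCEPT, 1 }, "BT", { KIND_CONCEPT, 2 }, "AGROVOC",  "editor-a" },
    /* watershed RT Drainage [FAOGloss] */
    { { KIND_CONCEPT, 1 }, "RT", { KIND_CONCEPT, 3 }, "FAOGloss", "editor-a" },
    /* watershed RT Watershed management [AGROVOC] */
    { { KIND_CONCEPT, 1 }, "RT", { KIND_CONCEPT, 4 }, "AGROVOC",  "editor-b" }
};
static const size_t kSampleCount = sizeof kSample / sizeof kSample[0];

/* Count the relationships of a given type that start at a given entity. */
static size_t count_relationships(const Relationship *rels, size_t n,
                                  EntityRef subject, const char *type)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < n; i++) {
        if (rels[i].subject.kind == subject.kind
            && rels[i].subject.id == subject.id
            && strcmp(rels[i].type, type) == 0)
            count++;
    }
    return count;
}
```

Because the relationship type is an open string rather than a fixed set of columns, a new type of relationship can be recorded without changing the record layout, which is the flexibility the paragraph above describes.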

FAO and Harvard Business School might want to collaborate on the further development of this schema.

Appendix 11. SemWeb: integrated access to distributed ontological resources

Dagobert Soergel

This appendix is given in two files,

semwebprop.pdf (short version)
semwebfl.pdf (full version)

This paper gives the design of exactly the type of system that FAO envisions for its Agricultural Ontology Service (AOS). It also contains a list of data fields for terminological records that should be consulted for the final version of the FAO KOS database schema.

Abstract

We propose to develop a system, dubbed SemWeb, that would revolutionize the way people - from experts to students - interact with conceptual structures and terminology and the way they share such knowledge. We aim at the synergistic exploitation of existing lexical and ontological knowledge bases (ontologies/classifications, thesauri, dictionaries) and their vast intellectual capital through integrated access, allowing a user to consult multiple sources with one search that returns one integrated answer that visualizes concept relationships for ease of understanding. SemWeb is intended for a wide variety of users and uses - including education, information retrieval, knowledge-based systems and natural language processing - and will bridge disciplines, languages, and cultures. The same environment will support collaborative development and maintenance of ontologies and lexica.

We will do research on difficult issues that need to be addressed in the system. For example, we will study how ontological and lexical knowledge is used in different disciplines, and we will work on defining measures and methods for the evaluation of ontologies, lexica, and their representations and for correlating and integrating ontologies. We will also study the use and impact of the prototype through pilot application and user studies, particularly the impact on learning by students.

Appendix 12. The JoDI paper

Dagobert Soergel; Boris Lauser; Anita Liang; Frehiwot Fisseha; Johannes Keizer; Stephen Katz
Reengineering thesauri for new applications. The AGROVOC example
Journal of Digital Information, Volume 4 Issue 4, Article No. 257, 2004-03-17 http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/

This paper presents a number of results from the study undertaken to prepare this report. In particular, Section 4 contains an analysis of relationship types in AGROVOC that also sheds light on structural problems in AGROVOC.

Abstract and table of contents are given below.

Abstract

Empowering end users in searching collections of ever increasing magnitudes with performance far exceeding plain free-text searching (as used in many Web search engines) and developing systems that not only find but also process information for action require far more powerful and complex knowledge organization systems (KOS) than the existing classification schemes and thesauri that are lacking in well-defined semantics and structural consistency. In this paper, we present a conceptual structure and transition procedure to support the shift from a traditional KOS towards a full-fledged and semantically rich KOS. The proposed structure also complies with other interoperability approaches like RDFS and XML in the web environment. AGROVOC, a traditional thesaurus developed and maintained by the Food and Agriculture Organization (FAO) of the United Nations serves as a case study for exploring the reengineering of a traditional thesaurus into a full-fledged ontology. We start the process of developing an inventory of specific relationship types with well-defined semantics for the agricultural domain and explore the rules-as-you-go approach to streamlining the reengineering process.

Contents

1 From thesauri to rich ontologies

1.1 The problem
1.2 The relationship of traditional KOS to ontologies
1.3 Potential benefits of future generation KOSs
1.4 The process of reengineering: The rules-as-you-go approach

2 AGROVOC: A multilingual agricultural thesaurus

2.1 Background
2.2 Applications and related terminologies
2.3 Conceptual structure of AGROVOC
2.4 Semantic problems of AGROVOC
2.5 The need for reengineering AGROVOC into an ontology

3 Conceptual model: combining thesauri and ontologies

3.1 The basic model
3.2 Model extensions
3.3 Limitations
3.4 Implementation
3.5 Related approaches

4 The AGROVOC case: exploring conceptual relationships for the agricultural domain

4.1 The logical generic relationship
4.2 The part-whole family of relationships
4.3 Other relationships

5 Exploring the rules-as-you-go approach for the case of AGROVOC

6 Implications and further work

Appendix 13. Building a rich ontology from AGROVOC

Dagobert Soergel

FAO Agricultural Ontology Server Workshop
Beijing, April 27 - 29, 2004

Overview

Appendix 14. The FAO Subject Tree and the AGRIS Category Scheme

FAO Subject Tree V. 2

The following are the new FAO Subject Tree categories and the AGROVOC descriptors that can be searched within each category in the EIMS databases.

An asterisk (*) denotes a term that will appear in the new version of AGROVOC, and in the Subject Tree once there are records that have been indexed with it.

Animal Production & Health

Animal breeding | Animal Diseases | Animal Health | Animal Husbandry | Animal Physiology | Animal Production | Animal products | Disease Control | Feeding | Feeds | Livestock | Statistical Data | Veterinary Medicine

Economics & Policy

Agricultural Products | Agricultural policies | Commodity Markets | Development aid | Development policies | Economic Development | Economic Policies | Food Security | Investment | Land economics | Marketing | Natural Resources | Policies | Prices | Production | Statistical Data | Supply Balance | Trade | Trade Policies

Education & Extension

Education | Educational Policies | Extension Activities | Information and Communication Technologies (ICTs) | Journalism | Public Relations | Training

Farming Practices & Systems

Cropping Systems | Farm Management | Farming Systems | Organic Agriculture | Sustainability | Urban Agriculture

Fisheries & Aquaculture

Aquaculture | Climatic Change | Cooperation | Ecosystems | Development policies | Fishery data | Fish processing | Fisheries | Fisheries development | Fishery policies | Fishes | Fishery management | Fishery production | Fishery products | Fishery resources | Governance* | Legislation | Marketing | Quality | Research | Safety | Statistical data | Trade | Trends | Technology | Trade agreements | Utilization*

Food Security

Agricultural development | Agricultural policies | Agricultural situation | Development projects | Emergency relief | Ethics | Famine | Food aid | Food policies | Food production | Food resources | Food stocks | Food supply | International cooperation | Malnutrition | Poverty

Forestry

Biodiversity | Climate change | Community forestry | Education | Environment | Forest land | Forest management | Forestry development | Forestry policies | Forest products | Forest protection | Forest resources | Forests | Legislation | Statistical Data

Geographical and Regional Information

Regions (Drop down): Africa | Latin America and the Caribbean | North America | Asia | Europe | Oceania
Country (Drop down)

Government, Administration & Legislation

Administration | Agricultural and rural legislation | Environmental legislation | Food legislation | International agreements | Labour legislation | Law | Legislation | Management | Planning | Public health legislation | Regulations | Standards | Water rights

Human Nutrition & Food Safety

Consumer Protection | Diet | Food Additives | Food Composition | Food Legislation | Food Safety | Food Technology | Foods | Health | Human Nutrition | Malnutrition | Nutrition Education* | Nutrition Policies | Nutritional Requirements | Public Health | Quality controls | Risk Assessment* | Risk Communication* | Statistical data

Natural Resources & Environment

Biodiversity | Climate | Desertification | Drainage | Ecology | Ecosystems | Environmental Conventions | Environmental Protection | Forestry Resources | Genetic Resources | Irrigation | Land economics | Land resources | Land Use | Natural Resources | Pollution | Resource Management | Soil Resources | Soil Sciences | Statistical data | Water Resources | Water use

Plant Production & Protection

Breeding Methods | Crop Management | Crops | Fertilizers | Harvesting | Integrated Pest Management | Irrigation | Pest Control | Phytosanitation[1] | Plant genetic resources | Plant health[2] | Plant protection | Pesticides | Seed production | Weeds

Rural, Social & Agricultural Development

Agricultural Development | Agricultural policies | Community forestry | Community Involvement | Development projects | Households | Indigenous Knowledge | Participation | Poverty | Rural Communities | Rural Development | Rural Finance | Social Policies | Socioeconomic Development | Sustainable Livelihoods | Gender | Women in Development

Engineering, Technology & Research

Appropriate technology | Biotechnology | Databases | Engineering | Equipment | Farm equipment | Methods | Research | Statistical Data | Statistical methods | Surveys | Technology

AGRIS Categories

A    Agriculture
A01  Agriculture - General aspects
A50  Agricultural research

B    Geography and history
B10  Geography
B50  History

C    Education, extension, and advisory work
C10  Education
C20  Extension
C30  Documentation and information

D    Administration and legislation
D10  Public administration
D50  Legislation

E    Economics, development, and rural sociology
E10  Agricultural economics and policies
E11  Land economics and policies
E12  Labour and employment
E13  Investment, finance and credit
E14  Development economics and policies
E16  Production economics
E20  Organization, administration and management of agricultural enterprises or farms
E21  Agro-industry
E40  Cooperatives
E50  Rural sociology
E51  Rural population
E70  Trade, marketing and distribution
E71  International trade
E72  Domestic trade
E73  Consumer economics
E80  Home economics, industries and crafts
E90  Agrarian structure

F    Plant production
F01  Crop husbandry
F02  Plant propagation
F03  Seed production
F04  Fertilizing
F06  Irrigation
F07  Soil cultivation
F08  Cropping patterns and systems
F30  Plant genetics and breeding
F40  Plant ecology
F50  Plant structure
F60  Plant physiology and biochemistry
F61  Plant physiology - Nutrition
F62  Plant physiology - Growth and development
F63  Plant physiology - Reproduction
F70  Plant taxonomy and geography

H    Protection of plants and stored products
H01  Protection of plants - General aspects
H10  Pests of plants
H20  Plant diseases
H50  Miscellaneous plant disorders
H60  Weeds

J    Handling, transport, storage and protection of agricultural products
J10  Handling, transport, storage and protection of agricultural products
J11  Handling, transport, storage and protection of plant products
J12  Handling, transport, storage and protection of forest products
J13  Handling, transport, storage and protection of animal products
J14  Handling, transport, storage and protection of fisheries and aquacultural products
J15  Handling, transport, storage and protection of non-food or non-feed agricultural products

K    Forestry
K01  Forestry - General aspects
K10  Forestry production
K11  Forest engineering
K50  Processing of forest products
K70  Forest injuries and protection

L    Animal production
L01  Animal husbandry
L02  Animal feeding
L10  Animal genetics and breeding
L20  Animal ecology
L40  Animal structure
L50  Animal physiology and biochemistry
L51  Animal physiology - Nutrition
L52  Animal physiology - Growth and development
L53  Animal physiology - Reproduction
L60  Animal taxonomy and geography
L70  Veterinary science and hygiene
L72  Pests of animals
L73  Animal diseases
L74  Miscellaneous animal disorders

M    Aquatic sciences and fisheries
M01  Fisheries and aquaculture - General aspects
M11  Fisheries production
M12  Aquaculture production and management
M40  Aquatic ecology

N    Machinery and buildings
N01  Agricultural engineering
N02  Farm layout
N10  Agricultural structures
N20  Agricultural machinery and equipment

P    Natural resources
P01  Nature conservation and land resources
P05  Energy resources and management
P06  Renewable energy resources
P07  Non-renewable energy resources
P10  Water resources and management
P11  Drainage
P30  Soil science and management
P31  Soil surveys and mapping
P32  Soil classification and genesis
P33  Soil chemistry and physics
P34  Soil biology
P35  Soil fertility
P36  Soil erosion, conservation and reclamation
P40  Meteorology and climatology

Q    Food science
Q01  Food science and technology
Q02  Food processing and preservation
Q03  Food contamination and toxicology
Q04  Food composition
Q05  Food additives
Q51  Feed technology
Q52  Feed processing and preservation
Q53  Feed contamination and toxicology
Q54  Feed composition
Q55  Feed additives
Q60  Processing of non-food or non-feed agricultural products
Q70  Processing of agricultural wastes
Q80  Packaging

S    Human nutrition
S01  Human nutrition - General aspects
S20  Physiology of human nutrition
S30  Diet and diet-related diseases
S40  Nutrition programmes

T    Pollution
T01  Pollution
T10  Occupational diseases and hazards

U    Auxiliary disciplines
U10  Mathematical and statistical methods
U30  Research methods
U40  Surveying methods

Appendix 15. FAO Term Work Flow

In file Terminology workflow-update1.doc

Appendix 16. Analysis and Critique of the CABI Thesaurus

This analysis was done by a student. It is quite well done and might be useful when preparing for mapping to the CABI Thesaurus.


[1] Actual term has yet to be confirmed. If Phytosanitary regulations/standards is also added it will be linked to Phytosanitation and appear as a sub-category.
[2] Has not yet been added to Agrovoc and thus will not immediately be added to the tree.
