

2 AGROVOC: A multilingual agricultural thesaurus


This section describes the AGROVOC Thesaurus and further illustrates the problem of semantic under-specification.

2.1 Background

AGROVOC is a multilingual, structured, controlled vocabulary/thesaurus designed to cover concepts and terminology in agriculture, forestry, fisheries, food and related domains (e.g. environment). It was developed by the Food and Agriculture Organization (FAO) of the United Nations and the Commission of the European Communities in the early 1980s to describe documents and other information resources in a controlled language for indexing and searching. It contains approximately 16,500 descriptors and 10,000 non-descriptors.

AGROVOC is available online and for download in the five FAO official languages (English, French, Spanish, Chinese and Arabic). It has also been translated into other national languages such as Czech, Danish, German, Italian, Polish, Portuguese, Slovak and Thai.

2.2 Applications and related terminologies

AGROVOC is used for controlled-vocabulary indexing and searching globally and in various systems throughout the FAO. Systems where AGROVOC is used include:

Within FAO:

AGROVOC is one of numerous knowledge organization systems (KOS) in the agricultural domain. Among the most important ones at the FAO are:

There are a number of other thesauri in the food and agricultural sector, developed by other institutions, such as

These thesauri all follow essentially the same conceptual structure, which is discussed in the following section. Nevertheless, as we will see, the information they contain can differ substantially.

2.3 Conceptual structure of AGROVOC

AGROVOC follows a traditional thesaurus approach. It is a collection of terms, definitions, and term relationships. As is the case with most thesauri, a small, standard, non-adaptable set of relationship types is applied to interlink terms.

2.3.1 Equivalence relationships

USE: Since thesauri have been primarily developed for the purpose of indexing and retrieval, this relationship indicates that any term preceding the USE relation should be replaced, for the purposes of indexing documents and formulating queries, by the term following the USE relation. The relationship usually (but not always) expresses synonymy between two terms.

USED FOR (UF): This is the inverse of USE. If term B USE term A, then A UF B, indicating that term A is used in place of term B for indexing purposes.
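To make the direction of USE and UF concrete, the following minimal Python sketch assumes the thesaurus is held as a plain dictionary mapping each non-descriptor to its descriptor. The names USE, used_for and index_term are hypothetical and say nothing about how AGROVOC is actually stored; the term pairs are the examples discussed in section 2.4.1.

```python
from typing import Dict, Set

# Hypothetical storage: non-descriptor -> descriptor (the USE relation).
# Sample pairs are the AGROVOC examples from section 2.4.1.
USE: Dict[str, str] = {
    "hunger": "famine",
    "hydrophobicity": "hydrophilicity",
    "interspecific competition": "biological competition",
    "intraspecific competition": "biological competition",
}

def used_for(use: Dict[str, str]) -> Dict[str, Set[str]]:
    """Derive UF as the inverse of USE: descriptor -> set of non-descriptors."""
    uf: Dict[str, Set[str]] = {}
    for non_descriptor, descriptor in use.items():
        uf.setdefault(descriptor, set()).add(non_descriptor)
    return uf

def index_term(term: str, use: Dict[str, str]) -> str:
    """Replace a non-descriptor by its descriptor; descriptors pass through unchanged."""
    return use.get(term, term)

UF = used_for(USE)
print(index_term("hunger", USE))             # famine
print(sorted(UF["biological competition"]))  # ['interspecific competition', 'intraspecific competition']
```

Only USE is stored here because UF carries no extra information: it can always be recomputed by inverting the mapping.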

2.3.2 Hierarchical relationships

Narrower Term (NT): if X is a NT of Y, then X is narrower in some sense than Y. For example, milk NT cow milk, grain NT rice.

Broader Term (BT): if Y is a BT of X, then Y is broader than X; for example, cow milk BT milk, rice BT grain. BT is the inverse of NT.

Given these rather unspecific definitions, BT and NT relationships can be applied to express generic relations, meronymic relations, instantiations, and many others (see section 4).
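Because BT is defined as the inverse of NT, only one direction needs to be stored. The sketch below (hypothetical function names, toy data taken from the NT/BT definitions above) derives BT from NT and shows the kind of downward expansion a retrieval system might perform. Note that the expansion simply follows every NT link; it has no way of knowing which kind of "narrower" each link expresses.

```python
from typing import Dict, List, Set

# Hypothetical storage: descriptor -> its narrower terms (NT).
NT: Dict[str, List[str]] = {
    "milk": ["cow milk"],
    "grain": ["rice"],
}

def broader_terms(nt: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Derive BT as the inverse of NT: if Y NT X, then X BT Y."""
    bt: Dict[str, Set[str]] = {}
    for broader, narrower_list in nt.items():
        for narrower in narrower_list:
            bt.setdefault(narrower, set()).add(broader)
    return bt

def expand_down(term: str, nt: Dict[str, List[str]]) -> Set[str]:
    """Collect a term and all its transitive narrower terms (e.g. for query expansion)."""
    result: Set[str] = set()
    stack: List[str] = [term]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(nt.get(current, []))
    return result

BT = broader_terms(NT)
print(BT["cow milk"])                   # {'milk'}
print(sorted(expand_down("milk", NT)))  # ['cow milk', 'milk']
```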

2.3.3 Associative relationships

Related Term (RT): the thesaurus conceptual model uses the RT relationship to express any associative relationship between two terms that is not hierarchical. RT is therefore highly ambiguous, because it serves as the default for all remaining relationships.

Hierarchical (NT, BT) and associative (RT) relationships are relationships between concepts. In the thesaurus, these exist only between descriptors. Following a traditional thesaurus approach, AGROVOC distinguishes between descriptors and non-descriptors (often referred to imprecisely as preferred terms and non-preferred terms). The rationale behind this is that only a descriptor should be used when referring to the concept (for example, for indexing and retrieval); each descriptor uniquely and unambiguously designates a concept. A non-descriptor must not be used for indexing or retrieval; it is linked through a USE cross-reference to the corresponding descriptor that must be used instead. There are no relationships from one non-descriptor to another.
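The descriptor/non-descriptor rules described above lend themselves to automatic checking. The following is a minimal sketch under the assumption that a thesaurus is loaded into the hypothetical Thesaurus structure below; the class, field and function names are illustrative only. Storing USE as a dictionary already guarantees that each non-descriptor points to exactly one descriptor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Thesaurus:
    descriptors: Set[str]
    use: Dict[str, str]  # non-descriptor -> descriptor (one target per non-descriptor)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, "BT"/"NT"/"RT", target)

def validate(t: Thesaurus) -> List[str]:
    """Return violations of the descriptor/non-descriptor rules."""
    errors: List[str] = []
    for nd, target in t.use.items():
        if nd in t.descriptors:
            errors.append(f"{nd!r} is a descriptor but has a USE reference")
        if target not in t.descriptors:
            # Also catches USE links from one non-descriptor to another.
            errors.append(f"{nd!r} USE {target!r}: target is not a descriptor")
    for source, rel, target in t.relations:
        # BT, NT and RT may only connect descriptors.
        for term in (source, target):
            if term not in t.descriptors:
                errors.append(f"{source!r} {rel} {target!r}: {term!r} is not a descriptor")
    return errors

t = Thesaurus(
    descriptors={"famine", "blood", "blood cells"},
    use={"hunger": "famine"},
    relations=[("blood", "NT", "blood cells"), ("hunger", "RT", "famine")],
)
print(validate(t))  # ["'hunger' RT 'famine': 'hunger' is not a descriptor"]
```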

2.3.4 Scope notes

Many descriptors in AGROVOC have a scope note, which can be a definition of a term, a history note, instructions to the indexer or searcher, or simply a comment. The purpose is to provide the user with more detail about the term and its usage.

2.3.5 Top level structure

Currently AGROVOC has more than 1500 top-level terms, i.e. descriptors which do not have a broader term, making it cumbersome to access the thesaurus from a top-level approach and browse through the hierarchy. Superimposed on AGROVOC is the AGRIS categorization scheme; it has more than 100 top-level categories, ordered in a shallow two-level hierarchy. AGROVOC descriptors are mapped to the second level of AGRIS categories. For example, the AGROVOC descriptor fish farms is mapped to the AGRIS category aquaculture production which is a subcategory of fishery and aquaculture. Thus the AGRIS categorization scheme provides high-level organization for information that has been tagged with AGROVOC descriptors.
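Both observations in this paragraph can be expressed in a few lines: top-level terms are simply descriptors without a BT, and the AGRIS scheme adds a two-step path above each descriptor. The sketch below is hypothetical; it hard-codes only the fish farms example from the text and assumes both mappings are available as dictionaries.

```python
from typing import Dict, List, Set

def top_level_terms(descriptors: Set[str], bt: Dict[str, Set[str]]) -> Set[str]:
    """Top-level terms are descriptors that have no broader term (BT)."""
    return {d for d in descriptors if not bt.get(d)}

# Hypothetical fragment of the AGRIS scheme: second-level -> first-level category.
AGRIS_PARENT: Dict[str, str] = {"aquaculture production": "fishery and aquaculture"}

# Hypothetical descriptor -> second-level AGRIS category mapping.
DESCRIPTOR_TO_AGRIS: Dict[str, str] = {"fish farms": "aquaculture production"}

def agris_path(descriptor: str) -> List[str]:
    """Return [first-level, second-level] AGRIS categories for a descriptor."""
    second_level = DESCRIPTOR_TO_AGRIS[descriptor]
    return [AGRIS_PARENT[second_level], second_level]

descriptors = {"milk", "cow milk", "fish farms"}
bt = {"cow milk": {"milk"}}
print(sorted(top_level_terms(descriptors, bt)))  # ['fish farms', 'milk']
print(agris_path("fish farms"))  # ['fishery and aquaculture', 'aquaculture production']
```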

2.4 Semantic problems of AGROVOC

Given its minimalist conceptual structure, AGROVOC (like other thesauri) has a number of semantic flaws. In the following, we use examples to point out the major drawbacks of the current system and to develop the rationale for a shift towards a more powerful, expressive, and unambiguous conceptual model.

2.4.1 Ambiguous descriptor to non-descriptor relationship

In AGROVOC, as indicated above, USE/UF covers synonyms and formal variants. It also links quasi-synonyms and very specific narrower terms, which AGROVOC defines as any of the following:

1. "two concepts considered sufficiently alike to be identified by one descriptor"
2. "a concept and its opposite"
3. "more specific concepts encompassed by one descriptor"

Definition 1 deals with semantically very closely related, yet separate concepts (so that the terms designating such concepts would not be true synonyms), such as

famine

UF hunger

Definition 2 involves concepts at opposite ends of a scale or otherwise in opposition to each other (with a few exceptions, the terms designating such concepts are antonyms). Example:

hydrophilicity

UF hydrophobicity

Definition 3 indicates that USE/UF can also express a hierarchical relationship, for example:

biological competition

UF interspecific competition
UF intraspecific competition

where the fine distinction between interspecific competition and intraspecific competition is deemed unnecessary for retrieval and therefore abandoned in favor of the more general category.
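The practical consequence of this overloading can be illustrated by tagging each UF link with the case it actually represents. The EquivalenceType values and the resolve function below are a hypothetical refinement, not part of AGROVOC; the links are the three examples above. With untyped USE, a query for hydrophobicity is silently rewritten to its opposite, whereas a typed model lets an application decide which substitutions it accepts.

```python
from enum import Enum
from typing import List, Optional, Set, Tuple

class EquivalenceType(Enum):
    """Hypothetical refinement of the single USE/UF link."""
    SYNONYM = "synonym"              # true synonyms and formal variants
    QUASI_SYNONYM = "quasi-synonym"  # definition 1: sufficiently alike
    ANTONYM = "antonym"              # definition 2: a concept and its opposite
    NARROWER = "narrower"            # definition 3: more specific concept

# (descriptor, non-descriptor, refined type) for the examples above.
UF_LINKS: List[Tuple[str, str, EquivalenceType]] = [
    ("famine", "hunger", EquivalenceType.QUASI_SYNONYM),
    ("hydrophilicity", "hydrophobicity", EquivalenceType.ANTONYM),
    ("biological competition", "interspecific competition", EquivalenceType.NARROWER),
    ("biological competition", "intraspecific competition", EquivalenceType.NARROWER),
]

def resolve(term: str, allowed: Set[EquivalenceType]) -> Optional[str]:
    """Map a non-descriptor to its descriptor, but only via an allowed link type."""
    for descriptor, non_descriptor, kind in UF_LINKS:
        if non_descriptor == term and kind in allowed:
            return descriptor
    return None

ALL_TYPES = set(EquivalenceType)  # behaves like today's untyped USE
SYNONYM_LIKE = {EquivalenceType.SYNONYM, EquivalenceType.QUASI_SYNONYM}

print(resolve("hydrophobicity", ALL_TYPES))     # hydrophilicity
print(resolve("hydrophobicity", SYNONYM_LIKE))  # None
```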

2.4.2 Ambiguous hierarchical definitions

The BT/NT relationship used to build up the hierarchy is very ambiguous; it lumps together several different types of relationships as the following examples show:

2.4.2.1 <includesSpecific> relationship (erythrocytes are a specific kind of blood cell):

blood cells

NT erythrocytes
NT leukocytes

2.4.2.2 <hasComponent> relationship (blood contains as a component blood cells):

blood

NT blood cells

2.4.2.3 The following example clearly shows the discrepancies between different thesauri that apply these ambiguously defined modeling principles:

AGROVOC and CABI:

water

NT ice
NT water vapor

...

NT fresh water
NT drinking water

But ASFA:

water

RT ice
RT water vapor

...

NT fresh water

Water vapor and ice are phases of water, while fresh water and drinking water are kinds of water; in AGROVOC and CABI, the hierarchical relationship therefore lumps together different semantic relationships. For retrieval this is generally useful (a search for water should generally find documents on ice as well), but for more differentiated retrieval a user may want to ask either for water in all its phases or for all kinds of water. Many other kinds of semantic processing also need more differentiated relationships. ASFA treats the phase relationship as an RT, which shows how grouping distinct relationships under one label can lead to inconsistency. Note, incidentally, that none of these thesauri includes the concept liquid water, which is logically necessary if water means water in any phase.
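One way to see why typed relationships resolve this discrepancy is to store the water fragment once with explicit relationship types and derive each thesaurus view from it. In the sketch below, includesSpecific follows section 2.4.2.1, while hasPhase, the function names and the two projection tables are hypothetical. The ASFA-style projection only approximates the fragment shown above, which lists fresh water but not drinking water.

```python
from typing import Dict, Set, Tuple

# Typed version of the AGROVOC/CABI "water" fragment above.
# "includesSpecific" follows section 2.4.2.1; "hasPhase" is a hypothetical name.
TYPED: Set[Tuple[str, str, str]] = {
    ("water", "hasPhase", "ice"),
    ("water", "hasPhase", "water vapor"),
    ("water", "includesSpecific", "fresh water"),
    ("water", "includesSpecific", "drinking water"),
}

def related(term: str, relation: str) -> Set[str]:
    """Targets reached from `term` via one specific typed relationship."""
    return {o for s, r, o in TYPED if s == term and r == relation}

def project(term: str, mapping: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collapse typed links into thesaurus relations (NT, RT, ...) according to `mapping`."""
    view: Dict[str, Set[str]] = {}
    for s, r, o in TYPED:
        if s == term:
            view.setdefault(mapping[r], set()).add(o)
    return view

# Hypothetical projections imitating the two fragments shown above.
AGROVOC_STYLE = {"hasPhase": "NT", "includesSpecific": "NT"}
ASFA_STYLE = {"hasPhase": "RT", "includesSpecific": "NT"}

print(sorted(related("water", "hasPhase")))          # ['ice', 'water vapor']
print(sorted(related("water", "includesSpecific")))  # ['drinking water', 'fresh water']
```

With this design, "water in all phases" and "all kinds of water" become separate queries, and project("water", AGROVOC_STYLE) or project("water", ASFA_STYLE) regenerates either legacy view from the same data, so the two views can no longer drift apart.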

AGROVOC contains many more cases in which the BT/NT relationship is used to express different relationships. The most obvious ones have been identified and are used in our proposal in section 4.

2.4.3 Ambiguous associative relationships

Like the BT/NT relationships, the associative RT relationships can be refined into more specific relationships. Some examples are given below.

2.4.3.1 <hasMember> relationship (Anglophone Africa <hasMember> Botswana)

Anglophone Africa
RT Botswana
RT Gambia
RT Ghana
RT Kenya
RT Lesotho
...

2.4.3.2 <causes> relationship (bleaching <causes> discoloration):

bleaching

RT discoloration
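Unlike RT, which thesauri conventionally treat as reciprocal, refined relationships such as hasMember and causes have a direction, so each needs an inverse when read from the other end. The sketch below stores the examples above as typed links (the member list is truncated, as in the excerpt), uses hypothetical inverse names, and shows that the plain RT view can still be derived for backward compatibility.

```python
from typing import List, Set, Tuple

# Refined associative links for the examples above; in AGROVOC each is a plain RT.
# The member list is truncated, as in the source excerpt.
ASSOCIATIVE: List[Tuple[str, str, str]] = [
    ("Anglophone Africa", "hasMember", "Botswana"),
    ("Anglophone Africa", "hasMember", "Gambia"),
    ("Anglophone Africa", "hasMember", "Ghana"),
    ("Anglophone Africa", "hasMember", "Kenya"),
    ("Anglophone Africa", "hasMember", "Lesotho"),
    ("bleaching", "causes", "discoloration"),
]

# Hypothetical inverse names for the directional relationships.
INVERSE = {"hasMember": "isMemberOf", "causes": "isCausedBy"}

def typed_view(term: str) -> Set[Tuple[str, str]]:
    """(relation, other term) pairs for `term`, reading each link in both directions."""
    view: Set[Tuple[str, str]] = set()
    for s, r, o in ASSOCIATIVE:
        if s == term:
            view.add((r, o))
        if o == term:
            view.add((INVERSE[r], s))
    return view

def rt_view(term: str) -> Set[str]:
    """Collapse the typed links back into plain reciprocal RT links."""
    return {other for _, other in typed_view(term)}

print(rt_view("discoloration"))     # {'bleaching'}
print(typed_view("discoloration"))  # {('isCausedBy', 'bleaching')}
```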

2.5 The need for reengineering AGROVOC into an ontology

The examples above clearly show the ambiguous nature of the relationships in AGROVOC. Future information retrieval and intelligent processing will require combining different KOS to give access to different information systems, and this makes a more rigid structure necessary. Reassessing AGROVOC (as well as other thesauri) to transform its UF, NT, BT, and RT relationships into unambiguously defined relationships and a clear hierarchical order will provide the first step towards solving the problem of ambiguity and inconsistency in information description and retrieval.

