

Part 1. Major recommendations


Major recommendations. Overview

1 Recommendation 1. Build an inventory of KOS uses and KOS, now and future

For FAO, for organizations with which FAO collaborates or which FAO wishes to support, and for the food and agriculture domain in general, assemble and maintain an inventory (registry) (1) of KOS use cases and of functions that should be served by information on concepts and terms and (2) of KOS and KOS efforts. To realize the full benefit KOS can provide for the organization and to allow for a complete cost-benefit analysis of KOS activities, this inventory should cover a wide spectrum of existing and imaginative new KOS uses (Appendix 4). Developing this inventory requires thorough knowledge of the organization as well as imagination and vision.

2 Recommendation 2. Integrate information management for all FAO KOS

Integrate information management for all FAO KOS, beginning with AGROVOC, FAO Term, and FAO Glossary, into one FAO KOS database, called the FAO KOS Distribution System (KDS), to be used by all groups that create and maintain KOS. The FAO KDS should also provide the environment for collaborative development and refinement of KOS as needed by semantic Web and other artificial intelligence applications and for a crosswalk between major agricultural KOS (Recommendations 3 and 4, Section H).

3 Recommendation 3. Incrementally build a rich ontology of the FAO domain

This recommendation responds to the need for rich ontologies for improved information access and intelligent information processing in the food and agriculture domain. Starting with the concepts in AGROVOC, FAO Term, and FAO Glossary (and possibly the NAL, CABI, and CAAS thesauri), develop a well-structured, meaningfully arranged classification of the FAO domain (using facets where appropriate) in collaboration with groups both inside and outside FAO. Apply the rules-as-you-go approach to refine relationships, starting now with a few rules that apply to many relationships so that collaborating groups can focus on the semantically more difficult relationships.

4 Recommendation 4. Create a crosswalk between major KOS in the FAO domain

Within financial constraints, collaborate with other institutions, in particular NAL, CABI, and CAAS, to create a crosswalk between major KOS in the FAO domain, evolving incrementally into a system modeled in functionality, but not in implementation, after the Unified Medical Language System of the US National Library of Medicine as the basis for the Agricultural Ontology Service. This should be linked seamlessly to a database of the taxonomy of living things and to a geographic name server.

5 Recommendation 5. Use powerful KOS management software

Use KOS management software (KMS) that supports many KOS, can handle complex concept and term relationships, and, through full exploitation of the knowledge available in existing KOS and through intelligent processing, makes the process of creating and maintaining KOS as efficient as possible.

Major recommendations. Detail

1 Recommendation 1. Build an inventory of KOS uses and KOS, now and future

For FAO, for organizations with which FAO collaborates or which FAO wishes to support, and for the food and agriculture domain in general, assemble and maintain an inventory (registry) (1) of KOS use cases and of functions that should be served by information on concepts and terms and (2) of KOS and KOS efforts. To realize the full benefit KOS can provide for the organization and to allow for a complete cost-benefit analysis of KOS activities, this inventory should cover a wide spectrum of existing and imaginative new KOS uses (Appendix 4). Developing this inventory requires thorough knowledge of the organization as well as imagination and vision.

KOS uses include existing uses as well as new uses or applications for KOS (existing functions not now supported but that would benefit from using a KOS and future functions that would benefit). KOS include existing KOS and their maintenance, KOS in the process of development, and new KOS planned or suggested.

1.1 Rationale

(1) Even within FAO itself (not to mention other national and international organizations) there are many isolated efforts in developing and maintaining KOS, often by staff who are not experts in KOS development. This leads to duplication of effort, inefficiency, redundancy, and inconsistency. An inventory of such efforts, and of implemented and potential use cases for such efforts, is needed for better planning.

(2) An inventory of KOS and KOS development efforts is needed so that all available resources can be used to answer questions, be it through a distributed system or an integrated database.

(3) An inventory of use cases provides the data needed to set priorities in KOS development efforts and supports full exploitation of KOS, thus increasing return on investment.

(4) An inventory of use cases provides the basis for at least a "seat of the pants" estimate of the return on investment and thereby provides a basis for a complete cost-benefit analysis of KOS activities and decisions on resource allocation.

(5) This will provide a starting point for a KOS inventory needed as the backbone of the Agricultural Ontology Service.

1.2 Implementation ideas

1.3 Template for KOS use cases


Number and title

Relationship to agency mission: priority supported

Activity supported / savings or benefit offered by the KOS

Internal activity (e.g., routing of drug applications) versus
External activity (e.g., public information about drugs)
Both linked to beneficiary group

Beneficiary/user group

How many people? How many potential instances per person per month?

How many instances of the activity per month

Now
Planned (may increase due to ease of use through thesaurus support, marketing, etc.)

Benefits per instance

Savings in time and/or money
Quality improvement (how much?)

System supporting this activity

KOS requirements: Subject domains, specificity, languages, types of relationships

KOS that are or can be used as is, need to be adapted, or need to be developed

Other functional requirements: What needs to happen to make the potential benefits real. Costs and responsibility for each

System-side. For example

Install automatic query term expansion for free-text searching on a Web site
Use KOS for indexing with a controlled vocabulary: human, computer-assisted, automatic

User side. Train users in applying the KOS
Reengineering work processes

Estimated cost for KOS application

Estimated benefits

Time frame

Comments
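
The template above can also be held as a structured record so that the rough return-on-investment estimate mentioned in the rationale can be computed consistently across use cases. The following is a minimal sketch under stated assumptions, not a prescribed format: the class name, the field names, and all example figures are hypothetical and cover only a subset of the template.

```python
from dataclasses import dataclass


@dataclass
class KosUseCase:
    """One inventory record; field names are illustrative, not prescribed."""
    number: int
    title: str
    beneficiary_group: str
    instances_per_month_now: int
    instances_per_month_planned: int
    hours_saved_per_instance: float
    estimated_cost: float  # one-time cost of the KOS application

    def monthly_hours_saved(self, planned: bool = False) -> float:
        """Hours saved per month at the current or the planned volume."""
        if planned:
            instances = self.instances_per_month_planned
        else:
            instances = self.instances_per_month_now
        return instances * self.hours_saved_per_instance

    def months_to_break_even(self, hourly_rate: float, planned: bool = False) -> float:
        """Months until time savings, valued at hourly_rate, cover the cost."""
        monthly_value = self.monthly_hours_saved(planned) * hourly_rate
        if monthly_value <= 0:
            return float("inf")
        return self.estimated_cost / monthly_value


# Made-up figures, for illustration only.
example = KosUseCase(
    number=1,
    title="Query term expansion for Web site search",
    beneficiary_group="External searchers",
    instances_per_month_now=1000,
    instances_per_month_planned=2000,
    hours_saved_per_instance=0.25,
    estimated_cost=5000.0,
)
```

With these invented figures, `example.months_to_break_even(hourly_rate=20.0)` relates the estimated cost to the monthly value of time saved; quality improvements, which the template also asks for, are not captured by this simple measure.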

1.4 Template for KOS projects


Number and title

Relationship to agency mission: priority supported

Related KOS use cases

Scope and size

Unit and person responsible

Collaboration / coordination (actual and possible)

Development versus maintenance

Any gaps in domain coverage

Data model (entity and relationship types) (existing and needed)

Software used, file structure

Publication data / Location / URL

Development person hours / maintenance person hours per month

Estimated cost for development and for maintenance

Time frame

Comments

2 Recommendation 2. Integrate information management for all FAO KOS

Integrate information management for all FAO KOS, beginning with AGROVOC, FAO Term, and FAO Glossary, into one FAO KOS database, called the FAO KOS Distribution System (KDS), to be used by all groups that create and maintain KOS. The FAO KDS should also provide the environment for collaborative development and refinement of KOS as needed by semantic Web and other artificial intelligence applications and for a crosswalk between major agricultural KOS (Recommendations 3 and 4, Section H).

2.1 General rationale (also for Recommendation 3)

While these objectives can be achieved, to some extent, by a federated solution, such a solution will have higher costs and will likely not achieve the objectives as well.

The data in this database are produced by many units throughout FAO, for example the various groups responsible for maintaining specialized glossaries. Each unit should retain ownership over and control of its data, and data should not be changed without agreement of the unit.

2.2 Present situation

Presently, the three major KOS are maintained as follows:

AGROVOC is maintained in a MySQL database with a simple interface for adding and editing concepts and terms. This database is ported to ORACLE for Web access.

FAO Term is maintained in TRADOS MultiTerm, which tightly integrates with the TRADOS translation environment used by translators. The MultiTerm database is ported to ORACLE for Web access. (See Appendix 15 for a description of the workflow.)

FAO Glossary is maintained as an ORACLE database with a user-friendly interface for adding and editing terms. However, many of the glossaries are maintained by their owners as word processing documents, which are then parsed (requiring sophisticated procedures) to read the data into ORACLE. This is done only at long intervals that follow the publication cycle of updated versions of the printed glossaries (for example, every two years). The data structure and the interface to this database allow for many types of information, many of which are not populated at present.

Within the scope of this report it was not possible to examine all the KOS maintained in FAO. Recommendation 1 addresses the issue of creating an inventory of these KOS.

2.3 Detailed consideration of alternatives

It is assumed that two criteria must be met by any solution to be implemented:

(a) Users should have one-stop access to all information about a concept or term.

(b) The present owners of a KOS need to retain control over the content of that KOS.

There are four overall solutions that can be considered (each with many ways of implementation):

(S1) Maintain the status quo of multiple systems with independent access.

(S2) Develop an interface that accesses different systems and integrates information from several KOS on the fly.

(S3) Develop a unified system that provides a joint home for different lexical knowledge bases, leaving control with the owners, and that provides users with integrated access to information from all KOS within FAO.

(S4) Implement a unified system under central control.

Solution (S1) fails to meet the necessary criterion (a), and solution (S4) fails to meet the necessary criterion (b). This leaves solutions (S2) and (S3) for closer examination. The criteria set forth below are suggested for this examination.

First this report will elaborate on the solutions.

Solution (S2) can be characterized and implemented as follows:

Existing systems operate as they do now. Have an overall access format that is simply a list of all the types of data available from any of the underlying systems, with duplicates removed. This tells users what data are available; internally the format indicates which type of information is available where.

Data can be communicated from one system to another through intermediation of this format but the content may be based on different definitions for a type of data, for example, different definitions of relationships between concepts.

If a user requests certain types of data, the access system obtains these data from the appropriate underlying systems and combines them into one display without resolving semantic ambiguities and inconsistencies that might exist.

It is possible within this solution to start the semantic integration of different KOS and reflect the results in each KOS; the editors of each KOS have access to all the data in the other KOS to facilitate this process. That will make the on-the-fly integration of data for presentation to the user easier.

This solution has two variants:

(S2.1) Each system has its own set of KOS data entry and editing screens; data from other KOS are accessed through the integrated end-user interface.

(S2.2) Data entry and editing screens are coordinated, with built-in access to all KOS (a step towards (S3)).
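
The on-the-fly integration described for Solution (S2) can be illustrated with a short sketch. It assumes, purely for illustration, that each underlying system can be queried through a function that returns the data types it holds for a term; the function names and the toy records are hypothetical and do not reflect the actual FAO systems. As in the description above, values from different systems are shown side by side and semantic inconsistencies are not resolved.

```python
from typing import Callable, Dict, List, Tuple

# A system is modelled as a function: term -> {data type: value}.
Lookup = Callable[[str], Dict[str, str]]


def federated_lookup(term: str, systems: Dict[str, Lookup]) -> Dict[str, List[Tuple[str, str]]]:
    """Collect all data for a term, grouped by data type.

    Each value is kept together with the name of the system it came from.
    Conflicting values are listed next to each other, not reconciled.
    """
    combined: Dict[str, List[Tuple[str, str]]] = {}
    for system_name, lookup in systems.items():
        for data_type, value in lookup(term).items():
            combined.setdefault(data_type, []).append((system_name, value))
    return combined


# Toy stand-ins for two systems (invented content).
toy_systems: Dict[str, Lookup] = {
    "System A": lambda term: {"definition": "def. from A", "broader term": "cereals"},
    "System B": lambda term: {"definition": "def. from B", "French": "maïs"},
}

report = federated_lookup("maize", toy_systems)
```

In this sketch `report["definition"]` holds two unreconciled definitions, which is exactly the limitation that the comparison of (S2) and (S3) turns on.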

Solution (S3) can be characterized and implemented as follows. (See also the detailed suggestions below)

A common semantic model that provides standard types of data (entity types and relationship types, data elements) with definitions that are used by all contributors/editors of KOS data.

All contributors/editors have access to all data so that duplication of effort in creating and maintaining data is avoided. The system supports online communication among contributors/editors to enable continuous thoughtful integration of data (see below).

The overall system has non-redundant or at least consistent data storage.

Responsibility for cleaning up existing data and adding new data and editorial control is distributed across multiple contributors/editors according to one or more of the following criteria: subject domain, user group, and type of data. Each unit maintains complete control of its data.

Data on concepts and terms are thoughtfully integrated across all points of origination. This increases data quality and interoperability, for example by clarifying a concept and its definition from multiple perspectives.

If a user requests data about a concept or term, she receives a unified and consistent report.

Criteria for evaluating alternatives for dealing with FAO KOS

(c1) User access to the KOS

(c1.1) Ease of access

(c1.2) Quality of information integration

(c1.3) Response time

(c2) Interoperability of KOS as they are applied in information systems

(c3) Maintenance of the KOS

(c3.1) Enable use of the information in all KOS while editing any one KOS

Good performance on (c3.1) promotes (c2) Interoperability

(c3.2) Using suggestions (for new concepts, terms, definitions, etc.) from users and indexers of one KOS for other KOS. Suggestions made by users of one system will often be useful for the maintenance of other KOS

(c3.3) Transferability of KOS editing skills from one KOS to another

(c4) Implementation and maintenance of the software system

(c4.1) First implementation

(c4.2) Maintenance

(c4.3) Storage space

(c5) Integration with applications

(c6) Supporting new KOS or automating existing KOS that are manually maintained. Adding these KOS to the system

The following table compares solutions (S2) and (S3) using these criteria. For each criterion, the entry for Solution (S2), integrated access to separate systems, is given first, followed by the entry for Solution (S3), a unified system for decentralized but coordinated KOS maintenance.

(c1) User access to the KOS

(c1.1) Ease of access

Both: No difference between solutions.

(c1.2) Quality of information integration

(S2): This is likely to be limited since integration on the fly is difficult. This will work as well as Solution (S3) only if the information in the different KOS has been edited to avoid all unnecessary differences.

(S3): Good because integration is done beforehand in the system.

(c1.3) Response time

(S2): May be slow because of accessing multiple systems and then processing the information gathered. Likely to deteriorate as new systems are added.

(S3): Good because only a single database with pre-integrated information is accessed. Proper database design will maintain performance even in a large database.

(c2) Interoperability of KOS

(S2): If unnecessary differences are edited out and common standards are followed, (S2) will work adequately. However, there is the danger that the systems will diverge again unless maintenance is tightly coupled, as described in (S3). (S2.2) would be better here than (S2.1).

(S3): Built in to the extent that the owners of the different systems can agree on common concepts and terms. Remaining differences should be clearly articulated, which is somewhat easier in a unified framework.

(c3) Maintenance of the KOS

(c3.1) Enable use of the information in all KOS while editing any one KOS

(S2): Cumbersome in (S2.1), involving copying from the end-user interface to use data from a KOS other than that being edited. Easier in (S2.2).

(S3): Built in.

(c3.2) Using suggestions made for any KOS for maintenance of all KOS

(S2): Requires continuous exchange of data.

(S3): Unified suggestion list built into the system.

(c3.3) Transferability of KOS editing skills from one KOS to another

(S2): Requires learning content rules specific to the other KOS. For (S2.1): Requires learning a new interface. For (S2.2): Interface always the same.

(S3): Requires learning content rules specific to the other KOS. Interface always the same.

(c4) Implementation and maintenance of the software system

(c4.1) First implementation

Both: Very similar. Both need to deal with idiosyncrasies of individual systems. Both can build on the existing code base.

(S2): Needs to deal with access to multiple databases.

(S3): Needs to access just one database, which is easier to implement.

(c4.2) Maintenance

(S2): Any changes in participating systems require work. Different systems may perform the same functions differently.

(S3): All changes and improvements made benefit all KOS.

(c4.3) Storage space

(S2): Much redundant storage.

(S3): Each piece of information stored once.

(c5) Integration with applications

(S2): Difficult since either all applications must deal with accessing multiple systems or a common API that provides integrated access to all systems must be developed. While developing such an API can build on the work done for the integrated end-user interface, it is still a major piece of work.

(S3): Easy to access information from all KOS at once.

If data are needed in a format specific to the application, a data export module must be created.

(c6) Supporting new KOS

(S2): Need to develop a whole new application (unless the new KOS is created and maintained within one of the existing KOS systems, which amounts to Solution (S3) on a smaller scale). All mechanisms for common access must be updated.

(S3): May need to add some new functionality, but can simply add new data in most cases.

Table 1. Comparison of KOS implementation solutions

This analysis shows that both Solution (S2) and Solution (S3) can be designed to meet the requirements set forth, but (S3) does so more elegantly, at lower cost, and most likely with higher performance. This is why (S3) is recommended.

2.4 Notes on implementation

2.4.1 Integration of ORACLE databases

Since all these KOS exist as ORACLE databases, storing all their data in one database (without intellectual integration) should not be hard. The new database consists of a set of tables drawn from the existing databases. Some of these tables will consist of the union of the data fields (columns) from two or more existing tables that are very similar. Other tables will simply be taken as is from an existing database. The comparative table of data fields from the three existing databases in Appendix 7 should assist in this schema integration. There may also be some entirely new tables that deal with data ownership and display destination.
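
The idea of forming one table as the union of the data fields of several similar tables can be sketched as follows. This is an illustration under stated assumptions only: tables are represented as lists of dictionaries, the field names are invented (the actual fields are compared in Appendix 7), and the added `owner` column is a hypothetical stand-in for the data ownership information mentioned above.

```python
def merge_tables(tables):
    """Merge similar tables into one whose columns are the union of all fields.

    tables: dict mapping a source name to a list of row dicts.
    Returns (columns, rows); fields a source lacks are filled with None,
    and each row records its source in a hypothetical "owner" column.
    """
    columns = []
    for rows in tables.values():
        for row in rows:
            for field in row:
                if field not in columns:
                    columns.append(field)

    merged = []
    for source, rows in tables.items():
        for row in rows:
            record = {column: row.get(column) for column in columns}
            record["owner"] = source
            merged.append(record)
    return columns + ["owner"], merged


# Invented sample rows; not the real schemas.
sample = {
    "AGROVOC": [{"term": "maize", "scope_note": "note text"}],
    "FAO Term": [{"term": "maize", "definition": "definition text"}],
}
columns, rows = merge_tables(sample)
```

No intellectual integration takes place here: the two rows for the same term remain separate records, each attributed to its owner, which matches the first, purely technical step described above.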

The AGROVOC interface can be ported from MySQL to ORACLE and combined with the functionality of the FAO Glossary interface (which should be easily adapted to the new database structure). This functionality can then be used for editing FAO Term, with the possible addition of some features. Finally, software must be written to port all FAO Term data (all the data in the combined database that are potentially useful to translators) to MultiTerm so that these data are available in the TRADOS translation environment.

Another feature to be added is a notification system: Whenever one team adds a term, the other teams must be notified so they can consider adding the term as well. In the case of the FAO Glossary, there should be an editor that can forward the notification to the appropriate unit(s) (possibly none).
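
A minimal sketch of such a notification mechanism follows, assuming an in-memory inbox per team. The class name, the function name, and the example term are hypothetical; a real implementation would sit on top of the shared database and a messaging channel.

```python
class TermNotifier:
    """Keeps one inbox per team; team names are supplied by the caller."""

    def __init__(self, teams):
        self.inbox = {team: [] for team in teams}

    def term_added(self, term, by_team):
        """Notify every team except the one that added the term."""
        for team, messages in self.inbox.items():
            if team != by_team:
                messages.append({"term": term, "added_by": by_team})


def forward(notification, units):
    """Editor step for the FAO Glossary: pass a notification on to the
    units the editor selects, which may be none at all."""
    return {unit: notification for unit in units}


notifier = TermNotifier(["AGROVOC", "FAO Term", "FAO Glossary"])
notifier.term_added("agroforestry", by_team="AGROVOC")
```

The separate `forward` step reflects the editorial filter suggested for the FAO Glossary: the editor, not the system, decides which unit, if any, should see the new term.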

The database should be set up in such a way that each concept can be accessed through a URI.
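
One possible way to give each concept a URI is to combine a fixed base address with the name of the KOS and the concept identifier. The sketch below assumes this pattern; the base address `http://example.org/kos/` and the identifier `c_0001` are placeholders, not actual FAO addresses or identifiers.

```python
from urllib.parse import quote, unquote

BASE = "http://example.org/kos/"  # placeholder base address (assumption)


def concept_uri(kos, concept_id):
    """Build a URI of the form BASE/<kos>/<concept id>, percent-encoding both parts."""
    return BASE + quote(kos, safe="") + "/" + quote(str(concept_id), safe="")


def parse_concept_uri(uri):
    """Recover (kos, concept_id) from a URI produced by concept_uri."""
    if not uri.startswith(BASE):
        raise ValueError("not a concept URI under the expected base: " + uri)
    kos, concept_id = uri[len(BASE):].split("/", 1)
    return unquote(kos), unquote(concept_id)


uri = concept_uri("FAO Term", "c_0001")
```

Keeping the KOS name in the URI preserves the link between a concept record and the unit that owns it, in line with the ownership requirement stated earlier.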

Note: The Harvard Business School Thesaurus Project has developed an ORACLE database schema (Appendix 10) and is developing data entry and user interface screens based on that schema. They would be amenable to talks about collaboration in developing this application. This schema would have to be augmented to take account of multiple languages.

2.4.2 Harmonization of term formats

All systems should predominantly follow standard dictionary practice for the form of terms. This means:

2.4.3 Arrangements for Web access

For now, Web access should probably be arranged as follows:

Each unit that produces a glossary has on its Web site a link to its own glossary
FAO Term and the overall FAO Glossary are combined
AGROVOC retains its present access.

The existing interfaces for Web access should be easily modified to work with the new database.

2.5 Further development

The KDS common database suggested here will provide immediate access to all data through a common mechanism. By putting the knowledge available now into a format that can be easily augmented and refined, KDS also provides the framework for

3 Recommendation 3. Incrementally build a rich ontology of the FAO domain

This recommendation responds to the need for rich ontologies for improved information access and intelligent information processing in the food and agriculture domain. Starting with the concepts in AGROVOC, FAO Term, and FAO Glossary (and possibly the NAL, CABI, and CAAS thesauri), develop a well-structured, meaningfully arranged classification of the FAO domain (using facets where appropriate) in collaboration with groups both inside and outside FAO. Apply the rules-as-you-go approach to refine relationships, starting now with a few rules that apply to many relationships so that collaborating groups can focus on the semantically more difficult relationships.

A well-structured hierarchy is essential to support good indexing and query formulation and other user interactions with the KOS. It is equally important to support reasoning through hierarchical inheritance in artificial intelligence and Semantic Web applications. Both applications also require a rich set of differentiated relationships as detailed in the JoDI paper (Appendix 12). Such relationships are also needed for extracting specialized KOS. For example, to extract a KOS on rice, we need relationships from rice to organisms that are pests of rice to be sure to include such organisms.
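
The rice example can be made concrete with a small sketch that extracts a specialized KOS by following selected relationship types from a starting concept. Relationships are assumed to be stored as (source, relationship type, target) triples; the relationship names `has_narrower` and `has_pest` and the sample concepts are invented for illustration.

```python
from collections import deque


def extract_sub_kos(relations, start, follow):
    """Return all concepts reachable from start via the relationship types in follow.

    relations: iterable of (source, relationship_type, target) triples.
    """
    relations = list(relations)
    selected = {start}
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        for source, rel_type, target in relations:
            if source == concept and rel_type in follow and target not in selected:
                selected.add(target)
                queue.append(target)
    return selected


# Invented sample data.
sample_relations = [
    ("rice", "has_narrower", "upland rice"),
    ("rice", "has_pest", "rice stem borer"),
    ("wheat", "has_pest", "wheat aphid"),
]

hierarchy_only = extract_sub_kos(sample_relations, "rice", {"has_narrower"})
with_pests = extract_sub_kos(sample_relations, "rice", {"has_narrower", "has_pest"})
```

Following the hierarchy alone misses the pest organism; only the differentiated `has_pest` relationship brings it into the extracted KOS, which is the point made above.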

Constructing such a hierarchy requires standard procedures of thesaurus development: semantic factoring and facet analysis followed by hierarchy building in each facet. Much of this process can be supported by computer. This should be organized as a collaboration of many groups inside and outside FAO supported by the FAO KDS described in Section 2 with Web data access and Web data entry. For example, the Thai AGROVOC group is eager to develop a rich ontology of Thai food, horticultural, and forestry plants. There is also a group in the US working on expert systems for processing documents in the domain of food and agriculture, and they are working on ontologies to support their AI programs.

Procedural details are discussed in Section E.

4 Recommendation 4. Create a crosswalk between major KOS in the FAO domain

Within financial constraints, collaborate with other institutions, in particular NAL, CABI, and CAAS, to create a crosswalk between major KOS in the FAO domain. This might evolve incrementally into a system modeled in functionality, but not in implementation, after the Unified Medical Language System of the US National Library of Medicine as the basis for the Agricultural Ontology Service. This should be linked seamlessly to an existing database of the taxonomy of living things.
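
In its simplest form, a crosswalk groups the terms that different KOS use for the same concept under one shared concept identifier. The following sketch assumes that structure; the class, the identifier `C0001`, and the sample term assignments are illustrative and are not taken from the actual thesauri or from the Unified Medical Language System implementation.

```python
from collections import defaultdict


class Crosswalk:
    """Maps (KOS, term) pairs to shared concept identifiers and back."""

    def __init__(self):
        self._concept_of = {}              # (kos, term) -> concept id
        self._members = defaultdict(set)   # concept id -> {(kos, term), ...}

    def link(self, concept_id, kos, term):
        """Record that a term in a given KOS expresses the shared concept."""
        self._concept_of[(kos, term)] = concept_id
        self._members[concept_id].add((kos, term))

    def translate(self, kos, term, target_kos):
        """Return the target KOS terms for the same concept, sorted; [] if unmapped."""
        concept_id = self._concept_of.get((kos, term))
        if concept_id is None:
            return []
        return sorted(t for k, t in self._members[concept_id] if k == target_kos)


crosswalk = Crosswalk()
crosswalk.link("C0001", "AGROVOC", "maize")  # illustrative assignments
crosswalk.link("C0001", "NAL", "corn")
```

A call such as `crosswalk.translate("AGROVOC", "maize", "NAL")` then yields the corresponding terms in the other KOS, and new KOS can be added incrementally by linking their terms to existing concept identifiers.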

5 Recommendation 5. Use powerful KOS management software

Use KOS management software (KMS) that supports many KOS, can handle complex concept and term relationships, and, through full exploitation of the knowledge available in existing KOS and through intelligent processing, makes the process of creating and maintaining KOS as efficient as possible.

There are many requirements KOS management software (KMS) should meet; they are detailed in Appendix 6. A few requirements that are essential for FAO but absent from most KMS are highlighted here:

