COAIM 1/4

April 2000





1ST CONSULTATION ON AGRICULTURAL INFORMATION MANAGEMENT

WORKING DOCUMENT

Rome, 5-7 June 2000

Standards and Guidelines for Agricultural Information Management


I. INTRODUCTION

1. The adoption of standards is fundamental to ensure quality, accessibility, usefulness and timeliness of agricultural information.

2. One of FAO's main tasks in its normative role is to facilitate the development and adoption of standards and guidelines to improve the effectiveness of programmes in agricultural development and food security world-wide. One example of FAO's efforts to facilitate the adoption of standards among partners in the international community is the Global Information and Early Warning System on Food and Agriculture (GIEWS) (http://www.fao.org/giews).

3. The success of FAO's early warning systems programme can be largely attributed to the development of standards and procedures for collecting, analysing and disseminating information relevant to early warning systems in the world. GIEWS has had an important impact on minimizing the risk of famine by providing an international framework that identifies in advance, with sufficient lead time for response, locations where populations are at risk of suffering from food shortages. The standards developed by the GIEWS programme have thus contributed in an important way to saving people's lives in many parts of the world over the last 25 years.

4. The Internet offers great opportunities for addressing the information needs of agricultural development and food security in the world. However, it has become evident that there are serious limitations for finding and retrieving relevant information using existing Internet tools and technologies. The example of the GIEWS programme shows how standards and guidelines can provide the framework for collecting, managing and delivering the relevant information in a timely and effective manner.

5. Numerous standards already exist in the many domains of agricultural development and food security, and within FAO's programme on information management it would be neither effective nor practical to propose any single one of them to meet all user needs. The approach proposed by FAO in this paper for the consideration of the Consultation is to establish a framework for standards on agricultural information management that is based on internationally accepted and existing industry standards, and that links together established international thesauri [1] in the different technical areas of agricultural development and food security.

6. However, adoption of standards, categorization schemes, and agreed terminology is not sufficient to meet all the information needs for agricultural development and food security, which are interdisciplinary by nature. Meeting these needs requires information systems that can link related topics in a coherent way, both between the different disciplines within agriculture and between agriculture and other sectors, such as health, trade, and education. Achieving all of these goals requires active collaboration between all stakeholders and partners at the national and international levels. The related paper on Strengthening information and knowledge management capacities through international cooperation addresses this important issue in more detail.

7. This paper will initially focus on the "information revolution" that has taken place with the advent of the Internet. After this, it provides an overview of key issues regarding standards in agricultural information management and of FAO's own role in this area, including the maintenance and further enhancement of the multilingual agricultural thesaurus AGROVOC. Specific priorities are then suggested for applying standards in support of agricultural development and food security, with an underlying emphasis on FAO's normative role. The Consultation is then invited to consider a proposal for the way forward with a programme on standards for agricultural information management.

II. INTERNET - A VEHICLE FOR GLOBAL INFORMATION DISSEMINATION AND DATA EXCHANGE?

A. USE OF THE INTERNET

8. As noted in the plenary paper Access to Information and Communication for Sustainable Development and Food Security, the statistics concerning the number of web sites and international users illustrate the impressive scale and reach of the Internet. However, the sheer quantity of information posted on the Internet has now outgrown our ability to manage it.

B. THE INHERENT PROBLEMS OF THE INTERNET

9. The Internet is not really designed to manage large quantities of information, let alone an "Information revolution". As the statistics provided in the related plenary paper mentioned above illustrate, the amount of information and data available on the Internet is simply spiralling out of control, and there are few tools at the present time to manage it in an effective way. By the very nature of the Internet's architecture, information on similar subjects is scattered across many different servers around the world, yet there are few tools available which are able to integrate related information from different sources. As a result, it is very difficult to find things on the Internet.

10. Existing search engines cover less than 15% of all the information available on the Internet, and return too many irrelevant results. For example, with 1.6 billion web-pages around the world, searching for the term "Buffalo Management" in one of the many available search engines returns results that are very difficult to deal with. Not only are more than 93,000 web-sites presented that meet these search criteria, but there is also a significant problem related to semantics. The search engine in fact returns a large number of results concerning a city in the United States called Buffalo, including one entry providing detailed information about taking courses at the University at Buffalo, School of Management. Perhaps this is interesting to some people, but it is completely irrelevant to managing buffalo herds.

11. The fact is that information is not well-organised on the Internet, and there are very few search engines that rely on human beings to create metadata [2] that better categorizes the information that is available. Moreover, there is extremely limited use of agreed vocabularies, such as the multilingual agricultural thesaurus AGROVOC, which at least would guarantee the relevance of keywords used by both the supplier and the recipient of information.
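
The semantic problem described in paragraphs 10 and 11 can be illustrated with a minimal Python sketch. The page records, titles and keywords below are invented for illustration and are not taken from any real search engine or from AGROVOC; the sketch only contrasts matching on words in a title with matching on keywords assigned by a person from an agreed vocabulary.

```python
# Hypothetical page records; titles and keywords are invented for illustration.
PAGES = [
    {"title": "University at Buffalo, School of Management: courses",
     "keywords": {"universities", "business education"}},
    {"title": "Herd management for water buffalo",
     "keywords": {"water buffaloes", "animal husbandry"}},
]


def free_text_search(query, pages):
    """Return titles that contain every query word, regardless of meaning."""
    words = query.lower().split()
    return [p["title"] for p in pages
            if all(w in p["title"].lower() for w in words)]


def keyword_search(term, pages):
    """Return titles whose human-assigned keywords include the agreed term."""
    return [p["title"] for p in pages if term in p["keywords"]]


# Word matching returns both pages, including the irrelevant university page.
free_hits = free_text_search("buffalo management", PAGES)

# Matching on an agreed keyword returns only the page about the animal.
keyword_hits = keyword_search("water buffaloes", PAGES)
```

In this toy data set the free-text query cannot distinguish the city from the animal, whereas the agreed keyword can, because a person decided what each page is about.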

12. The large search engines do in fact index the Internet to facilitate retrieval, but it is done with automatic routines without value-added categorization by human beings, and without agreed vocabularies and standards. This may make the search engines fast in responding to a keyword search, but in addition to supplying too many and irrelevant results, they also frequently provide outdated information.

13. Even when the search engines do finally present results that seem current and relevant, the subsequent action of linking to the source information often fails. Typically this is either because the host server is temporarily unavailable, or even worse because the information has disappeared (i.e. the dreaded error 404, "Object not Found"). This unfriendly feature of search engines is the result of a lack of synchronization between the automatically generated metadata and subsequent changes that may take place in the location of the full information object. This leads to one of the most significant flaws of the Internet: that a unit of information is defined by its physical location (i.e. its URL) rather than a unique identifier that would follow it wherever the information moves.
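
The idea of a unique identifier that follows a unit of information wherever it moves can be sketched as a small lookup table. This is a minimal illustration under stated assumptions, not a description of any existing identifier scheme: the `Resolver` class, the identifier `doc:0001` and the example URLs are all hypothetical.

```python
class Resolver:
    """Maps a stable identifier to the current location of a document."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        """Record where a newly published document can be found."""
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        """Update the location; the identifier itself never changes."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        """Return the current location for a stable identifier."""
        return self._locations[identifier]


resolver = Resolver()
resolver.register("doc:0001", "http://old.example.org/reports/a.htm")

# A search index or a citation stores the identifier, not the URL.
citation = "doc:0001"

# The document is later moved to another server.
resolver.move("doc:0001", "http://new.example.org/archive/a.htm")

# The stored citation still leads to the document.
current = resolver.resolve(citation)
```

Because the index stores only the identifier, moving the document requires one update in the resolver rather than a correction in every place that refers to it.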

14. All of these problems are primarily due to the fact that the Internet was not initially envisioned as a tool for global access to information, and that the underlying standards for information management are not entirely adequate. Even the agreed international standard for publishing documents on the Internet, the HyperText Mark-up Language (HTML), is not entirely appropriate. HTML is the language that browsers such as Internet Explorer and Netscape interpret to display a document on the Internet in a consistent and standard way, regardless of the type of computer equipment and operating system software being used to surf the net. However, HTML is an inflexible standard, with predefined mark-up tags and a peculiar mixture of presentation mark-up and data, which is sometimes further enriched with procedural code such as ASP and Java. An HTML document is also not protected from the test of time. Should HTML evolve into something new, many organizations will face a very serious legacy problem in securing their intellectual capital. Logical separation between application code, presentation mark-up, and content is the only way to avoid this problem.

This is the underlying concept of a new standard, called the Extensible Mark-up Language (XML) [3], and we will come back several times to this important new international standard later in the paper.
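
A minimal Python sketch of this separation is given below. The element names (`report`, `title`, `summary`) and the two rendering functions are assumptions made for illustration; they do not represent an FAO document structure. The content is stored once as XML, and presentation is applied separately by whichever renderer is needed.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content only: no fonts, colours or layout (element names are hypothetical).
CONTENT = """<report>
  <title>Buffalo herd management</title>
  <summary>Notes on feeding and housing.</summary>
</report>"""


def to_html(doc):
    """One possible presentation: a fragment for a web browser."""
    title = escape(doc.findtext("title"))
    summary = escape(doc.findtext("summary"))
    return f"<h1>{title}</h1><p>{summary}</p>"


def to_text(doc):
    """Another presentation of the same content: plain text."""
    return f"{doc.findtext('title').upper()}\n{doc.findtext('summary')}"


doc = ET.fromstring(CONTENT)
html_view = to_html(doc)
text_view = to_text(doc)
```

If the presentation format changes in the future, only a rendering function needs to be replaced; the stored content is untouched.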

C. ADDRESSING THE INHERENT PROBLEMS OF THE INTERNET

15. We therefore arrive at the great paradox of the Internet: it has fantastic and unprecedented potential for global dissemination, but also for global information management problems! These problems can only be solved if action is taken to:

III. ISSUES AND CONSIDERATIONS FOR FAO MEMBERS REGARDING AGRICULTURAL INFORMATION MANAGEMENT ON THE INTERNET

16. One important characteristic that separates the agricultural sector from many others is that agricultural scientific and research documents do not become obsolete very quickly. Many agricultural technologies, techniques, methodologies and research findings that were relevant decades ago, are still applicable. This means that agricultural documentation must be stored in formats that will pass the test of time.

17. It has already been mentioned that HTML is not a particularly good standard for archiving documents because content, formatting, and procedural aspects are not logically separated. The only current international standards that allow for this kind of separation are the Standard Generalized Mark-up Language (SGML) and its derivative for the Internet, XML.

18. FAO is using SGML and XML standards, on one hand to preserve the organization's institutional memory, but also to facilitate dissemination through the Internet from the corporate document repository, which can be found at http://www.fao.org/docrep.

19. The FAO Secretariat invites the Consultation to consider adopting SGML/XML as standards for document management.

20. However, content management standards such as SGML/XML by themselves are not enough to facilitate data access and exchange on the Internet. It is necessary to establish agreed international classification schemes, such as the AGRIS/CARIS subject categories, and internationally agreed vocabularies, such as the multilingual agricultural thesaurus, AGROVOC. A thesaurus tool such as AGROVOC, as well as other international thesauri such as the one produced by CAB International, allows for consistent and precise use of agricultural terminology.

21. When publishing information on the Internet, value-added metadata should be associated with the information published, providing precise keywords and classification categories that will later facilitate data retrieval. This is the concept behind FAO's Information Finder (http://www.fao.org/waicent/search/default.asp), where for each web-page published on the FAO web-site, there is an associated record in the Finder database, using AGRIS Subject Categories and AGROVOC terms. The end-user can then use these same categories and AGROVOC terms to retrieve data from the FAO web-site, following the same precise terminology that was in the mind of the information producer. The Information Finder provides both the platform and the standards to find information in an effective manner. If either the information producer or the end-user is unsure of what precise vocabulary to use, he or she can consult the AGROVOC Thesaurus, which is readily available on-line at http://www.fao.org/agrovoc.
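
The retrieval approach described in paragraph 21 can be sketched in Python as follows. This is an illustration only and does not describe how the Information Finder is implemented: the record layout, the category labels, the terms in `USE_FOR` and the example URLs are all hypothetical placeholders rather than actual AGRIS categories or AGROVOC entries.

```python
# Hypothetical thesaurus entries: each term points to one preferred descriptor.
USE_FOR = {
    "buffalo": "water buffaloes",
    "carabao": "water buffaloes",
    "water buffaloes": "water buffaloes",
}

# Hypothetical metadata records, one per published web-page.
RECORDS = [
    {"url": "http://example.org/herds.htm", "category": "animal husbandry",
     "descriptors": {"water buffaloes", "herd management"}},
    {"url": "http://example.org/rice.htm", "category": "crop production",
     "descriptors": {"rice"}},
]


def preferred_term(user_term):
    """Translate a user's word into the agreed descriptor, if one exists."""
    return USE_FOR.get(user_term.strip().lower())


def find(user_term, category=None):
    """Return URLs indexed under the descriptor, optionally within a category."""
    term = preferred_term(user_term)
    if term is None:
        return []
    return [r["url"] for r in RECORDS
            if term in r["descriptors"]
            and (category is None or r["category"] == category)]


hits = find("Carabao", category="animal husbandry")
no_hits = find("tractor")
```

The producer and the end-user meet at the same descriptor even when they start from different everyday words, which is the purpose of consulting the thesaurus before indexing or searching.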

22. The FAO Secretariat invites the Consultation to consider adopting and promoting the use of agreed vocabularies and classification schemes (such as AGROVOC and AGRIS/CARIS subject categories) for better management of agricultural information.

23. If a large number of agricultural web-sites follow the same or related standards for categorization and indexing, data retrieval and data exchange will be greatly facilitated. The more sites using the same overall approach to information management, the easier it will be to link together information from different sources. To make linkages even easier, international agreements could be reached to define key data elements for different areas in agriculture (for example, to facilitate the exchange of information on plant genetic resources, or on forest products). These standards can also be applied to more general information types, for example to exchange data on agricultural projects, experts, and institutions. The XML Document Type Definition (DTD) and the XML Schema are among the standards that should be explored in this respect.
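
As a minimal sketch of what agreed data elements make possible, the Python example below checks whether an exchanged record about an institution contains every agreed element. The element names (`institution`, `name`, `country`, `subject`) are assumptions for illustration, not an agreed standard, and the check is a simplified stand-in for validation against a DTD or XML Schema, which the Python standard library used here does not perform.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of agreed data elements for an "institution" record.
REQUIRED = ("name", "country", "subject")


def missing_elements(xml_text):
    """Return the agreed elements that are absent or empty in a record."""
    record = ET.fromstring(xml_text)
    return [name for name in REQUIRED
            if not (record.findtext(name) or "").strip()]


complete = ("<institution><name>Example Institute</name>"
            "<country>Italy</country><subject>forestry</subject></institution>")
partial = "<institution><name>Example Institute</name></institution>"

complete_gaps = missing_elements(complete)
partial_gaps = missing_elements(partial)
```

A receiving system can run such a check before importing records from a partner, so that incomplete records are identified at the point of exchange rather than after they have been merged.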

24. The FAO Secretariat invites the Consultation to recognise the need to establish common data elements to allow for effective use and exchange of agricultural information.

25. Through adoption of international classification schemes, agreed vocabularies, open standards [4], and common data elements such as those described above, it would be possible to overcome the information management problems of the Internet. However, these kinds of standards, as well as the procedures and guidelines which govern them, still need to be defined for agricultural information on the Internet. The Consultation on Agricultural Information Management provides a unique opportunity for Member States to consider this as a matter of high priority.

IV. FAO'S ROLE AS A FACILITATOR FOR STANDARDS IN AGRICULTURAL INFORMATION MANAGEMENT

26. Codex Alimentarius is a very good example of how FAO is effectively serving the inter-governmental process in developing international norms and standards for food. Codex adopts international recommended standards, guidelines and codes of practice after thorough consideration by all Codex member countries. The Codex system was set up to protect the health of consumers, ensure fair practices in international food trade and to coordinate all international food standards work. Universally uniform food standards have the advantage that they protect consumers from unsafe food and allow food producers, processors and traders to have access to markets by breaking down artificial non-tariff barriers to trade. Codex standards are accepted as the benchmarks against which national food measures and regulations are evaluated within the Uruguay Round Trade Agreements.

27. The World Agricultural Information Centre not only provides the information systems platform to access FAO's information, statistics, knowledge, and expertise in support of Member Nations, but it also can serve the inter-governmental process as a normative mechanism to resolve information management problems in general, and to assist in the establishment of National Agricultural Information Systems.

28. In particular, FAO, in collaboration with international partners, can provide to Member Nations:

29. In addition, with the assistance of its partners FAO can coordinate and facilitate activities at the international level to:

30. Below are some practical examples of how FAO is already exercising its normative role in information management in support of Member Nations.

V. THE WAY FORWARD FOR FAO AND ITS MEMBERS

31. The Internet provides an unprecedented opportunity for dissemination, but at the same time presents serious information management problems with no international agreements on how to deal with them. There is an urgent need to establish standards, procedures and guidelines for better information management and data exchange on the Internet.

32. Standards in general provide a common ground on which to build information systems. They facilitate understanding and communication between people from different backgrounds and with different types of expertise. They facilitate the sharing of data and information within and across organizations, and across hardware and software environments. Establishing standards requires cooperation between data producers and data consumers, which is not an easy thing to achieve, especially on the international level. In the course of negotiating agreements on international standards for agricultural information management, Member Nations may well wish to ask FAO to serve as the neutral forum to discuss the underlying issues, and to draw on the normative services of FAO through WAICENT to expedite the process.

33. In particular, FAO, in close collaboration with other stakeholders in the agricultural sector, can assist Member Nations in adopting international standards for content management, such as SGML and XML, and in reaching international agreements on classification schemes, such as the AGRIS/CARIS Subject Categories, and agreed vocabularies, such as the multilingual AGROVOC Thesaurus.

34. Member Nations may also wish to examine possible enhancements and extensions to AGROVOC, so that it can be adapted to serve as a precise and effective tool even for researchers and scientists. AGROVOC provides an excellent basis for better information management on the Internet, but with a rapidly changing information environment, and a tendency for greater decentralization in information production, the current structure and functionality should be carefully evaluated. In particular, a major improvement in functionality would be achieved if formal linkages were established with other widely-used vocabularies, such as the CAB Thesaurus. Furthermore, there may be a need to establish more specialised and more precise vocabularies for certain subject areas, beyond what AGROVOC is currently able to offer and beyond what would be appropriate in a global thesaurus. Linkages from AGROVOC to such specialised vocabularies would nevertheless be useful, and should be implemented. With such value-added extensions, AGROVOC (or what at FAO we refer to as AGROVOX), could emerge as the tool for agricultural information management for the current Millennium!
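
The kind of linkage between a general thesaurus and more specialised vocabularies suggested in paragraph 34 can be sketched as a simple mapping. The vocabulary names and terms below are hypothetical and do not reproduce entries from AGROVOC or the CAB Thesaurus; the sketch only shows how a query on a general term could be expanded to linked specialised terms.

```python
# Hypothetical links from a general vocabulary to a specialised one.
LINKS = {
    ("general", "buffaloes"): {("specialised", "swamp buffalo"),
                               ("specialised", "river buffalo")},
}


def expand(vocabulary, term):
    """Return the term itself followed by any linked terms, in sorted order."""
    linked = LINKS.get((vocabulary, term), set())
    return [(vocabulary, term)] + sorted(linked)


expanded = expand("general", "buffaloes")
unlinked = expand("general", "rice")
```

A term with no links is returned unchanged, so the general thesaurus remains usable on its own while specialised vocabularies are added area by area.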

35. The Consultation on Agricultural Information Management may wish to acknowledge that some of the key problems affecting access to agricultural information and data exchange on the Internet can be addressed by reaching international agreements on content management standards, classification schemes, and agreed vocabularies. In this respect, the Consultation may wish to consider the specific proposals and recommendations made in this paper.

36. The Consultation may wish to invite FAO, in collaboration with other stakeholders in the agricultural sector, to coordinate with Member Countries and to facilitate the promotion and adoption of such standards.

37. The Consultation may wish to suggest that all international agreements reached, procedures adopted and tools developed, be published on the FAO Web-site as a clearing-house for information management standards in the agricultural sector.


[1] A thesaurus consists of a controlled set of agreed terminology linked together by hierarchical or associative relations, where each term has a unique and unambiguous meaning. Where relevant, terms may also identify relations of equivalence (i.e. synonyms) with other words from colloquial language. Thesauri usually focus on a particular area of knowledge, and range from a few hundred to 20,000 terms each. A controlled vocabulary, on the other hand, is simply a list of agreed terminology, with no relations defined between terms.

[2] The term "metadata" refers to data that describes a piece of information in order to facilitate its retrieval or to explain its content. An example of metadata is a library catalogue, which provides value-added information about the content and location of books.

[3] XML is a textual encoding system for creating structured documents that can be easily understood by browser software on the Internet. Both XML and HTML are based on the use of "tags" wrapped around logical units of text. However, while HTML is limited to presenting documents on the Web, XML provides a framework structure to manage, retrieve and exchange textual information, similar to the functionality provided by relational databases for managing data. XML allows each specific user group to develop its own set of tags to meet its unique needs without restricting presentation to a single style on a single medium. More detailed information on XML can be found at http://www.w3.org/XML.

[4] Industry standards which are non-proprietary in addition to being platform- and application-independent.