AGROVOC is a comprehensive multilingual agricultural thesaurus that was developed with the cooperation of FAO member countries. It is used for indexing data in agricultural information systems and is continually being improved and updated. The first version of AGROVOC was produced in 1982 and distributed to all AGRIS centres.
Vocabulary is updated by FAO in collaboration with national AGRIS centres. Staff at the centres propose new terms to FAO subject specialists for consideration, and the terms the experts select are added to AGROVOC. In the past, an AGROVOC supplement was then published and sent to the centres; now the updated AGROVOC is available online. New terms and corrections can also be proposed through the FAO/AGROVOC web site.
Initially AGROVOC was available in English, French and Spanish but has been expanded to four more languages: Arabic, Portuguese, Czech and Chinese. The Thai AGRIS centre is developing the Thai AGROVOC by using the English AGROVOC thesaurus as a prototype.
The following describes the motivations, work plan and problems in developing the Thai AGROVOC and provides suggestions for its future.
Limitations of translating AGROVOC to the Thai Language
Data and information produced locally are typically presented in the native language, so to be useful they also have to be recorded and indexed in that language. However, languages differ in their scripts and structure. Some are easy to process by computer, while others are impractical or nearly impossible to process.
For the latter, basing an index on uncontrolled vocabulary and computer processing is extremely difficult. Thai is such a case: written Thai has no spaces between words, so it is very difficult for a word-spacing (word-segmentation) program to split text into the correct individual words. In addition, many Thai words have the same form but different meanings, or different forms with the same meaning.
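The ambiguity caused by the lack of spaces between words can be illustrated with a minimal sketch. It assumes a tiny hypothetical dictionary of four Thai words and a hypothetical helper, `segmentations`; the example string ตากลม is a commonly cited ambiguous case and is not taken from the Thai AGROVOC. The function enumerates every way to split an unspaced string into dictionary words.

```python
from typing import List, Set

# Hypothetical four-word dictionary, for illustration only.
DICTIONARY: Set[str] = {"ตา", "กลม", "ตาก", "ลม"}


def segmentations(text: str, dictionary: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into words found in `dictionary`."""
    if not text:
        return [[]]  # exactly one way to split the empty string: no words
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in dictionary:
            # Keep this prefix and segment whatever follows it.
            for rest in segmentations(text[end:], dictionary):
                results.append([prefix] + rest)
    return results


if __name__ == "__main__":
    for words in segmentations("ตากลม", DICTIONARY):
        print(" | ".join(words))
```

Both splits, ตา | กลม (roughly "round eyes") and ตาก | ลม (roughly "exposed to the wind"), consist only of dictionary words, so a program cannot choose between them without context. This is the kind of error that makes an automatically spaced index unreliable.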
Because it is difficult to use a word-spacing program to build an uncontrolled vocabulary index, and because an index produced by automatic spacing gives low retrieval efficiency, a controlled vocabulary becomes critical. These attributes of the Thai language limit natural-language indexing by computer and make data retrieval and use inefficient. For these reasons, the Thai AGRIS staff opted in 2001 to develop a Thai AGROVOC. As mentioned previously, financial support from Kasetsart University’s Research and Development Institute and technical support from FAO have facilitated the project.
The project implementation has been divided into two phases: the first focused on developing the Thai AGROVOC (and the electronic agriculture thesaurus), and the second on adding words in the local language and other maintenance. The electronic agriculture thesaurus is the first Thai database that stores and displays agricultural vocabulary in two languages, Thai and English.
Table 1: Objectives of the Thai AGROVOC
|No.||Objective||Problem to solve|
|1.||Create standardized terms for indexing in Thai.||• Thai words can have the same form with different meanings, or the same meaning in different forms; there is no standard word to use for indexing.|
|2.||Provide Thai descriptors for inputting data in the AGRIS system.||• Indexing in English is inefficient and error-prone, depending on the English literacy of the indexer.|
|3.||Increase data retrieval efficiency.||• A Thai word can have several meanings and several forms.|
| || ||• Automatic Thai word separation is very difficult and severely constrains searching.|
| || ||• Existing search does not allow efficient free-text searching in Thai.|
|4.||Link Thai vocabulary with words in other languages in AGROVOC.||• Local-language technical terms and named entities are not linked to other languages.|
|5.||Make searching the index convenient and possible for users at every level.||• Users who are not proficient in English or in agricultural science cannot identify agricultural vocabulary or search for data.|
|6.||Create a stock of agricultural vocabulary in both Thai and English.||• There has never been an electronic agricultural thesaurus in Thai.|
Developing the Thai AGROVOC
Developing the Thai AGROVOC has been a kind of research and development process for the centre, with considerable exploration into integrating ICT and knowledge management to enhance the efficiency of information service. The project has involved many resource persons and information sources, as the following outlines:
37 experts in 31 fields of study with 12 lecturers and researchers working as translators;
FAO AGROVOC thesaurus;
Specialized dictionaries related to agriculture (see appendix 2 for the listing);
Textbooks and documents on agriculture; and
Vocabulary and indexes in articles and research papers from the past 20 years.
The five-year project evolved in two stages:
Stage 1 – Building vocabulary collection (2001–2003)
- System analysis
- Collecting vocabulary from technical dictionary and research papers
- Translating terms from English to Thai
- Building Thai agriculture vocabulary database
- Installing system and providing Thai AGROVOC service through the Internet
- Promoting public awareness and feedback
Stage 2 – Adding new vocabulary and maintenance (2004–2005)
- Collecting comments for each term
- Editing vocabulary
- Adding local vocabulary into the system
- Filling in new vocabulary
- Updating vocabulary collection
Figure 3: The Thai agricultural thesaurus development process
Detailed achievements of the five-year project
1.1. Built the vocabulary collection using the AGROVOC vocabulary database (Agrovoc.mdb), version of 9 November 2001, as a prototype. The vocabulary in the database is displayed in three languages; we used English for the prototype, totalling 28,577 terms.
Term number and term type in AGROVOC database (version 9 November 2001)
|English descriptor||Non-descriptors||Deleted terms||Scope note|
1.2. Categorized the vocabulary, descriptors and non-descriptors (excluding scope notes), totalling 27,313 terms, into 47 groups in 31 subjects, based on the AGRIS/CARIS subject categorization schemes and the experts’ fields of specialization.
1.3. Created term lists for specialists to translate from English into Thai. Verified them with a technical dictionary.
1.4. Sent the translated vocabulary to the respective experts for confirmation, editing and modification, and to translate any terms that the specialists had been unable to translate. The experts could also recommend other descriptors or synonyms.
1.5. Collected the translated and confirmed vocabulary from the experts.
1.6. Added more vocabulary by selecting terms from the titles and indexes of research papers and articles in the Thai Agricultural Database (dating back 20 years and yielding approximately 92,605 words). Individual words were extracted manually because there was no reliable word-extraction tool: the Thai language has no spaces between words, so it is next to impossible for a word-extraction program to extract words without errors. A phrase that should be split one way can easily come out split another way, giving words that do not have the same meaning.
1.7. Vocabulary from the previous step was treated as natural-language terms. The terms were ranked and their frequency checked. They were then split into two groups: terms synonymous with AGROVOC terms, and terms defined as new.
1.8. Selected the most suitable term as the descriptor; the rest were kept as non-descriptors (synonyms). The principal criterion was to use words that are defined and recognized by the Royal Institute. Appearance in a technical dictionary or textbook was the secondary criterion, and frequency of use in the literature was the third.
1.9. Entered the Thai vocabulary (descriptors) into the Thai AGROVOC database under the same identification as the English terms by adding a new field for Thai terms. Synonyms were recorded separately in another table to be used later as non-descriptors.
1.10. Checked word redundancy and relationship and corrected errors.
1.11. Submitted the vocabulary to experts for re-verification and re-editing.
1.12. Modified them according to the experts’ recommendation and corrected errors.
1.13. Processed a preliminary Thai AGROVOC.
1.14. Installed the system to provide service and get feedback via the Internet.
1.15. Evaluated the system, verified terms and their relationships.
1.16. Edited data and processed Thai AGROVOC first edition.
1.17. Installed a public feedback system with Internet access.
1.18. Launched the service, promoted it through public relations campaigns and disseminated it to interested groups of people for comments.
1.19. Submitted the Thai AGROVOC to FAO.
2.1. Collected comments and opinions from the public; summarized and edited the data.
2.2. Assembled proposed terms obtained from the public; as in step 1.7, accepted terms were treated either as new words or as synonyms.
2.3. Confirmed those new words with experts, created relationships with existing words and then translated the new words from Thai to English.
2.4. Added the terms and their relationships to the Thai AGROVOC database by creating new term identifiers.
2.5. Recorded vocabularies, checked redundant terms and corrected errors.
2.6. Submitted the Thai vocabulary to the Royal Institute for them to promulgate it as an approved Thai agriculture vocabulary.
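Steps 1.7 and 1.8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the centre's actual tooling: the function names, the `Candidate` structure, the source labels and the placeholder terms are all hypothetical. It ranks extracted terms by frequency, separates synonyms of existing AGROVOC concepts from new terms, and picks a descriptor by applying the three criteria as a strict priority order, which is one possible reading of step 1.8.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Source priority from step 1.8; lower wins. The labels are hypothetical.
SOURCE_PRIORITY = {"royal_institute": 0, "technical_dictionary": 1, "literature": 2}


@dataclass(frozen=True)
class Candidate:
    term: str       # candidate term (placeholders are used below)
    source: str     # a key of SOURCE_PRIORITY
    frequency: int  # occurrences in titles and indexes


def rank_by_frequency(extracted: Iterable[str]) -> List[Tuple[str, int]]:
    """Step 1.7: count extracted terms and rank them, most frequent first."""
    return Counter(extracted).most_common()


def split_terms(
    terms: Iterable[str], concept_of: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Step 1.7: separate synonyms of existing concepts from new terms."""
    synonyms: Dict[str, str] = {}
    new_terms: List[str] = []
    for term in terms:
        if term in concept_of:
            synonyms[term] = concept_of[term]
        else:
            new_terms.append(term)
    return synonyms, new_terms


def choose_descriptor(candidates: List[Candidate]) -> Tuple[str, List[str]]:
    """Step 1.8: pick one descriptor; the rest become non-descriptors."""
    ordered = sorted(
        candidates, key=lambda c: (SOURCE_PRIORITY[c.source], -c.frequency)
    )
    return ordered[0].term, [c.term for c in ordered[1:]]


if __name__ == "__main__":
    extracted = ["term_a", "term_b", "term_a", "term_c", "term_a", "term_b"]
    ranked = rank_by_frequency(extracted)
    counts = dict(ranked)
    # Hypothetical lookup: which existing concept each known term belongs to.
    known = {"term_a": "concept_1", "term_b": "concept_1"}
    synonyms, new_terms = split_terms(counts, known)
    candidates = [Candidate(t, "literature", counts[t]) for t in synonyms]
    print(choose_descriptor(candidates), new_terms)
```

Sorting on the pair (source priority, negative frequency) means frequency only breaks ties between terms from the same kind of source. A weighted score would be an alternative if the criteria were meant to be balanced rather than ranked.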
Existing words are regularly modified and new words are added to keep AGROVOC up to date. New words can be added when users suggest them, and by developing thesaurus maintenance tools that automatically add words and create relationships.
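The storage layout described in steps 1.9 and 2.4, together with a simple maintenance helper, might look like the following sketch. It uses an in-memory SQLite database purely for illustration: the table names, column names, identifier values and placeholder strings are hypothetical and do not reflect the actual Agrovoc.mdb schema. The relationship codes BT, NT and RT are the conventional thesaurus abbreviations for broader, narrower and related terms.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables for terms, Thai synonyms and relationships."""
    conn.executescript(
        """
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,  -- shared by English and Thai (step 1.9)
            english TEXT,
            thai    TEXT                  -- field added for the Thai descriptor
        );
        CREATE TABLE thai_synonym (       -- non-descriptors, kept separately
            term_id INTEGER NOT NULL REFERENCES term(term_id),
            synonym TEXT NOT NULL,
            UNIQUE (term_id, synonym)
        );
        CREATE TABLE relation (
            from_id INTEGER NOT NULL REFERENCES term(term_id),
            to_id   INTEGER NOT NULL REFERENCES term(term_id),
            kind    TEXT NOT NULL         -- 'BT', 'NT' or 'RT'
        );
        """
    )


def set_thai_descriptor(conn: sqlite3.Connection, term_id: int, thai: str) -> None:
    """Step 1.9: store the Thai descriptor under the existing English term id."""
    conn.execute("UPDATE term SET thai = ? WHERE term_id = ?", (thai, term_id))


def add_local_term(
    conn: sqlite3.Connection, thai: str, english: str, related_to: int, kind: str = "RT"
) -> int:
    """Step 2.4: add a new local term with a new id and one relationship."""
    cur = conn.execute(
        "INSERT INTO term (english, thai) VALUES (?, ?)", (english, thai)
    )
    new_id = cur.lastrowid
    conn.execute(
        "INSERT INTO relation (from_id, to_id, kind) VALUES (?, ?, ?)",
        (new_id, related_to, kind),
    )
    return new_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Placeholder rows; real identifiers and terms would come from AGROVOC.
    conn.execute("INSERT INTO term (term_id, english) VALUES (100, 'example term')")
    set_thai_descriptor(conn, 100, "thai descriptor (placeholder)")
    conn.execute("INSERT INTO thai_synonym VALUES (100, 'thai synonym (placeholder)')")
    new_id = add_local_term(conn, "local thai term (placeholder)", "local term", 100)
    print(new_id, conn.execute("SELECT * FROM term").fetchall())
```

Keeping synonyms in their own table means one identifier can carry any number of Thai non-descriptors without changing the layout of the main term table.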
Figure 4: Process of extracting Thai agriculture terms
Critical factors for success of the Thai AGROVOC project
Several elements worked together toward the success of the Thai vocabulary project:
Body of knowledge from a variety of sources;
Contributions from experts and subject specialists;
Intelligence tools from system developers;
Collaboration from agencies in the network and their recommendations;
Feedback from users;
Managerial capability of system administrator; and
Constraints of the Thai AGROVOC development project
Several difficulties were encountered in the development of the project:
Incompatibility between local information and vocabulary. Many descriptors in the AGROVOC are not available in the local scope of knowledge or are incomprehensible to local people. On the other hand, there are also countless local terms, especially the names of plants, animals or other local beings, that do not appear in AGROVOC.
Difficulty of defining new terms in Thai. Many terms in the AGROVOC have never appeared in Thai so there is a necessity to define them. This endeavour is really burdensome. Defining new terms needs expertise and mutual collaboration from experts in that very specific field. Finally, each term has to be officially accepted and registered as new.
Inconsistent language structure. Thai has no singular or plural forms. Therefore a word with “s” and the same word without “s” cannot be distinguished in Thai as they are in the English terms of AGROVOC. For example, the terms “seed” and “seeds” have different meanings in AGROVOC.
Incompatibility of meaning and synonymous words. Some concepts have more than one term in English but only one in Thai. For example, “corn” and “maize” are separate terms in English, but in Thai they mean the same thing and share a single term. On the other hand, there are Thai terms with different meanings that correspond to a single English term, such as the Thai words that in English are both rendered as “buffaloes”.
Very time consuming. The endeavour has to utilize countless experts in many fields in addition to a lot of personnel for word extraction and data processing.
Maintenance. Adding words and changing relationships are extremely difficult due to the lack of an efficient tool.
Shortfall. Some original terms in AGROVOC, which served as the prototype, are not up to date; their relationships are also incomplete, unclear and inadequate.
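The mismatches described above for “seed”/“seeds” and “corn”/“maize” can be made concrete with a short sketch. It assumes a hypothetical mapping from English descriptors to Thai terms; the Thai strings are illustrative and are not taken from the Thai AGROVOC. The helper `shared_thai_terms`, also hypothetical, lists the Thai terms that several English descriptors map to, which are the cases that need an editorial decision.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical English-to-Thai mapping; the Thai strings are illustrative only.
ENGLISH_TO_THAI: Dict[str, str] = {
    "corn": "ข้าวโพด",
    "maize": "ข้าวโพด",
    "seed": "เมล็ด",
    "seeds": "เมล็ด",
    "rice": "ข้าว",
}


def shared_thai_terms(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Return Thai terms that more than one English descriptor maps to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for english, thai in mapping.items():
        groups[thai].append(english)
    # Keep only the Thai terms shared by two or more English descriptors.
    return {thai: sorted(words) for thai, words in groups.items() if len(words) > 1}


if __name__ == "__main__":
    for thai, english_terms in shared_thai_terms(ENGLISH_TO_THAI).items():
        print(thai, "<-", ", ".join(english_terms))
```

Because the Thai field is keyed by the English term identifier, each collision found this way means the same Thai string is stored under several identifiers, and editors have to decide how to tell the entries apart.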
FAO should update the AGROVOC by restructuring, improving and editing the vocabulary.
FAO should design a clear procedure for identifying identifiers and the data structure to facilitate the entering of additional local vocabulary into the system.
It would be very helpful if FAO developed efficient tools for AGROVOC maintenance. This would allow member countries to regularly maintain the vocabulary system on their own.
As such, completing the development of a national AGROVOC is critical for local users to effectively access the widest global knowledge base. Efficient tools for maintaining and automatically updating thesaurus terms are necessary and have yet to be developed.
Thai AGROVOC public hearing
The Thai AGRIS Centre has developed an Internet public feedback system that allows users, specifically experts, academics and agricultural scientists, to voice their opinions or suggestions, confirm existing terms and propose new terms (http://pikul.lib.ku.ac.th/). This process is expected to continuously improve the quality of the Thai AGROVOC, especially in terms of comprehensiveness, accuracy and relevance to the Thai agricultural body of knowledge.
However, the development of the Thai AGROVOC is still incomplete. Further integration of technology with expertise in agricultural sciences and information science is needed to enhance the efficiency of AGROVOC for knowledge discovery.