Overview of Approach, Methodologies, Standards, and Tools for Ontologies

 

* * * DRAFT * * *



 

                  

Howard Beck

Agricultural and Biological Engineering

University of Florida

hwb@agen.ufl.edu

 

Helena Sofia Pinto

Department of Information Systems and Computer Science

Universidade Técnica de Lisboa

sofia@gia.ist.utl.pt

 

The Agricultural Ontology Service

UN FAO

© 2001, 2002, Helena Sofia Andrade Nunes Pereira Pinto (Sections 2, 3, and 5)
© 2002 University of Florida (Sections 1, 4, and 6).

 

1. The Problem to be Solved

 

In this section we give a justification and suggest ways that ontologies can be used to better organize information resources and assist users in retrieving relevant information.  Ontologies are contrasted with conventional search methods based on fulltext search, and the relationship between ontologies and thesauri is introduced.  Applications of ontology in databases and natural language processing are also introduced.

 

The Searching Problem

 

Everyone knows the Internet has exploded with useful information, but nobody can find what they want, or so goes the claim that has led to the current interest in the Semantic Web [W3C 2001] and, in particular, the use of ontologies for organizing large collections of knowledge.  The agricultural domain is no different from any other in that large repositories of knowledge are being created by thousands of individuals building Web sites around the world.  Finding information within a single site can be difficult, whereas searching for information across multiple sites at different institutions, which are probably written in different languages, can be an overwhelming task.  Of course, the agricultural domain contains concepts which are unique to agriculture. Ontologies attempt to exploit domain-specific information by representing the meaning of terms within a domain, and using these meaning representations to organize the collection and make search more accurate.  Exactly how meaning is represented, how to organize collections around this representational framework, and how search and other inference operations are implemented are all technical issues related to ontology construction.  This paper illustrates how problems of information organization and search in the agricultural domain can be addressed by using ontologies, and presents a brief overview of the methodologies, standards, and tools available for this task.

 

To be sure, conventional information retrieval technologies, consisting mainly of variations on fulltext search, which is the basis of well-known search engines such as Google [2002], Lycos [2002], and AltaVista [2002], have performed remarkably well and are currently the method of choice for searching the Web. In classic information retrieval [Salton and McGill 1983], statistical techniques are used based on the frequency of key words appearing in a document.  In its simplest form, a fulltext search engine contains an index of every word occurring in every document within the collection to be searched.  For example, the word “crop” would appear in this index, and would point to every document containing the word “crop”.  A user searching for “crop” is shown a list of all these documents, along with a ranking based on the number of times the word “crop” appeared in each document (presumably the more times the word appears in a document, the more relevant that document is to the user doing the search).
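The simplest form of fulltext index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three documents and the names build_index and search are hypothetical, and real search engines use far more elaborate tokenization and ranking.

```python
from collections import Counter, defaultdict

def build_index(documents):
    """Map each word to a Counter of {document id: number of occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip surrounding punctuation so "crop." and "crop" match.
            index[word.strip('.,;:!?"()')][doc_id] += 1
    return index

def search(index, word):
    """Return (document id, count) pairs, most frequent first."""
    return index.get(word.lower(), Counter()).most_common()

# Hypothetical three-document collection (not taken from any real site).
documents = {
    "doc1": "Crop rotation improves soil. Each crop benefits.",
    "doc2": "The farmer planted a new crop.",
    "doc3": "Rainfall totals for the region.",
}

index = build_index(documents)
results = search(index, "crop")
print(results)
```

Note that a search for “grower” would return nothing here even though doc2 mentions a farmer, since the index records only strings and nothing about their meaning.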

 

Because there are so many documents on the Web, the chances of finding something containing a particular word or phrase are quite good.  They are so good, in fact, that conventional search engines typically identify hundreds of thousands of documents matching a user’s search terms.  And that is the main problem with conventional search: the overwhelming majority of these documents are not relevant to the user’s interest.  Perhaps more dangerous is the possibility that documents that are very important to the user do exist but were not identified, usually because the words used in the document did not match the search terms directly.  These two types of errors are formally measured as precision and recall:

Precision = (number of relevant documents retrieved) / (total number of documents retrieved)

Recall = (number of relevant documents retrieved) / (total number of relevant documents in the collection)

 

 

 

A search engine with perfect precision and recall would find all and only the documents relevant to the user’s interest.  Precision is a measure of how many of the documents identified as a result of a search are relevant to the user. Typically with fulltext search most are not, and precision can be as low as 5 percent.  Recall is a measure of how well the search engine did in locating relevant documents (did it find all the relevant documents in the collection?).  Again, recall statistics can be quite low for even the best search engines.
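As a small worked example of these two measures, the following Python sketch computes precision and recall for a single search. The document identifiers are hypothetical, and the function name precision_recall is introduced here only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one search result.

    retrieved: documents returned by the search engine
    relevant:  documents in the collection that actually interest the user
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: four documents returned, two relevant ones exist,
# and only one of the relevant documents was found.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.25 0.5
```

Here one of four returned documents is relevant (precision 0.25), and one of the two relevant documents was found (recall 0.5).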

 

Furthermore, fulltext search is not capable of processing structured queries, such as “list the countries that produce cassava”, since it would simply produce a list of documents containing these terms.  Preexisting text publications would not be structured in a way that can answer this question.  Processing this query correctly would require a database that explicitly represents facts about agricultural production practices by country. 

 

Fulltext search based only on statistical methods makes no attempt to understand the meaning of the terms being used.  This is the main cause of poor performance.  Words can have many senses (food bank versus river bank), synonymous terms are used in different situations (farmer versus grower), words have a wide variety of different associations and interrelationships (peanut is a kind of crop, leaf is part of a plant), and terms appear in different languages (peanut in English, cacahuate in Spanish).  A big improvement in search precision and recall could be obtained if these relationships could be taken into consideration, and that is what ontologies attempt to do.

 

Thesaurus

 

For well over a century, librarians have made use of thesauri for building subject classifications and cataloging documents within subject headings.  The relevance of the thesaurus to the electronic information age is now being realized. The thesaurus can be considered an early, although simple, kind of ontology.  A thesaurus attempts to categorize subject terms using a variety of simple abstractions, the main ones being:

 

Broader Term (BT) – A particular term is more general than another term (“crop” is broader than “soybeans”)

Narrower Term (NT) – A particular term is more specific than another (“soybeans” is narrower than “crop”)

Related Term (RT) – Two terms are associated (“leaf” is related to “plant”)

Use For (UF) – A particular term is the preferred term among a set of synonymous terms (use “grower” for “farmer”)

 

Thus, a thesaurus entry for the term “soybean” might look like:

 

Soybean

            BT: legume

            NT: Bragg, Cobb, …

            RT: pod, leaf

            UF: soy

 

The BT/NT relationships give rise to taxonomies, which organize terms in hierarchical categories.  Taxonomies are an important component of both thesauri and ontologies.
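The thesaurus entry and the taxonomy it implies can be sketched as a simple data structure. The Python below is a minimal illustration, not a description of how any existing thesaurus is stored. The “soybean” entry follows the example above; the “legume” and “crop” entries, and the assumption that “legume” has “crop” as its broader term, are added here only so that the hierarchy has more than one level.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    bt: list = field(default_factory=list)  # broader terms
    nt: list = field(default_factory=list)  # narrower terms
    rt: list = field(default_factory=list)  # related terms
    uf: list = field(default_factory=list)  # non-preferred synonyms

thesaurus = {
    "soybean": Entry(bt=["legume"], nt=["Bragg", "Cobb"],
                     rt=["pod", "leaf"], uf=["soy"]),
    # Assumed entries, for illustration only.
    "legume": Entry(bt=["crop"], nt=["soybean"]),
    "crop": Entry(nt=["legume"]),
}

def broader_chain(thesaurus, term):
    """Follow the first BT link upward to build the taxonomy path."""
    chain, seen = [], {term}
    while term in thesaurus and thesaurus[term].bt:
        term = thesaurus[term].bt[0]
        if term in seen:  # guard against circular BT links
            break
        seen.add(term)
        chain.append(term)
    return chain

def preferred_term(thesaurus, term):
    """Resolve a synonym to its preferred term via the UF lists."""
    for name, entry in thesaurus.items():
        if term == name or term in entry.uf:
            return name
    return None

print(broader_chain(thesaurus, "soybean"))  # ['legume', 'crop']
print(preferred_term(thesaurus, "soy"))     # soybean
```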

 

The thesaurus provides a structured representation among terms in a domain, hence it is a kind of meaning representation.  The advantages gained by this approach are that users can search directly for information that has been manually cataloged within these subject headings, and that associations to related but different terms can be used to navigate within a neighborhood of relevant topics.  Thus thesaurus-based search can have a high precision rate, and recall is generally improved over fulltext search. Several well-known agricultural thesauri [AGROVOC 2002, CABI 2002, NAL 2002] have been constructed, some of which have been in use for many decades. 

 

While similar in general structure, ontologies attempt to do an even better job than thesauri.  The main way they do this is by providing a more detailed, formal knowledge representation language that does a more thorough job of representing word meaning.  While BT/NT/RT/UF relationships are simple and useful, they can be vague, and certainly don’t cover all the rich ways in which words can be interrelated. 

 

Ontologies

 

Here is a simple language for building an ontology that improves slightly on the basic thesaurus abstractions[1]:

 

            Class: A generic concept

            Object: A particular occurrence of a generic concept

            Subclass: A class that is more specific than a particular class

            Superclass: A class that is more general than a particular class

            PartOf: An object that is part of a particular object

            Association: Two objects are related in general (other than one of the above relationships)

 

           

 

Figure 1.  Sample ontology for crop-pest management.

 

A sample ontology of the crop-pest domain is shown in Figure 1, and illustrates some of these simple relationships.  “Beet Armyworm” (note that this would be an actual occurrence of beet armyworm) is an object within the “insect pest” class.  There are several taxonomic superclass-class-subclass relationships, for example “crop” – “agronomic crop” – “soybeans”.  “Beet Armyworm” is associated with “Soybeans”; specifically, it is an “Insect Pest” of soybeans. 
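The relationships just described can be captured in a small data structure. The Python sketch below encodes only the classes, object, and association named in the text; the class name Ontology and its methods are hypothetical and do not correspond to any particular ontology tool.

```python
class Ontology:
    """A toy container for classes, objects, and labeled associations."""

    def __init__(self):
        self.superclass = {}    # class name -> direct superclass (or None)
        self.instance_of = {}   # object name -> class name
        self.associations = []  # (subject, label, target) triples

    def add_class(self, name, superclass=None):
        self.superclass[name] = superclass

    def add_object(self, name, cls):
        self.instance_of[name] = cls

    def associate(self, subject, label, target):
        self.associations.append((subject, label, target))

    def ancestors(self, cls):
        """All superclasses of cls, nearest first."""
        chain = []
        parent = self.superclass.get(cls)
        while parent is not None:
            chain.append(parent)
            parent = self.superclass.get(parent)
        return chain

    def related_to(self, target):
        """(subject, label) pairs for associations pointing at target."""
        return [(s, l) for s, l, t in self.associations if t == target]

onto = Ontology()
onto.add_class("crop")
onto.add_class("agronomic crop", superclass="crop")
onto.add_class("soybeans", superclass="agronomic crop")
onto.add_class("insect pest")
onto.add_object("Beet Armyworm", "insect pest")
onto.associate("Beet Armyworm", "insect pest of", "soybeans")

print(onto.ancestors("soybeans"))   # ['agronomic crop', 'crop']
print(onto.related_to("soybeans"))  # [('Beet Armyworm', 'insect pest of')]
```

Unlike the RT link of a thesaurus, the association here carries a label, so a program can tell that the beet armyworm is a pest of soybeans rather than merely “related” to them.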

 

How does this improve search?

 

Equipped with such a representation containing interconnections between related terms, the search engine now has information about the meaning of terms that can be useful.  Instead of treating the term “Soybean” as a string and counting the frequency with which this string occurs within a document, the search engine can exploit the term relationships to get directly to information about soybeans.  The user enters the term “Soybeans” and jumps directly to the node for this concept shown in Figure 1.  All information directly related to this concept is accessible via the term relationships. Moreover, the search engine can use these relations to examine similar or related concepts.  In this way, the user gets access to all and only the information available on a particular concept.
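One way a search engine might exploit these relationships is to expand a query concept to include its subclasses before looking up cataloged documents. The following Python sketch is one possible illustration of that idea; the document names in the catalog are hypothetical, and the functions expand and lookup are introduced here only for the example.

```python
# Subclass links following the "crop" - "agronomic crop" - "soybeans" example.
subclasses = {
    "crop": ["agronomic crop"],
    "agronomic crop": ["soybeans"],
    "soybeans": [],
}

# Hypothetical catalog: documents attached to concepts, not to strings.
catalog = {
    "agronomic crop": ["agronomic-crops-overview"],
    "soybeans": ["soybean-planting-guide"],
    "insect pest": ["pest-scouting-notes"],
}

def expand(concept):
    """Return the concept followed by all of its direct and indirect subclasses."""
    found = [concept]
    for child in subclasses.get(concept, []):
        found.extend(expand(child))
    return found

def lookup(concept):
    """Documents cataloged under the concept or any of its subclasses."""
    return [doc for c in expand(concept) for doc in catalog.get(c, [])]

print(lookup("soybeans"))  # ['soybean-planting-guide']
print(lookup("crop"))      # ['agronomic-crops-overview', 'soybean-planting-guide']
```

A query for “crop” reaches the soybean document even though that document need not contain the word “crop”, while the unrelated pest document is left out.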

 

Issues in Database Management and Natural Language Processing

 

The ontology acts as a framework for organizing the concepts within a domain.  In addition, information resources such as documents can be attached to concepts, a process known as cataloging.  Traditionally cataloging is a labor-intensive manual process, requiring special training.  Tools for automating or semi-automating this process are much in demand, but existing tools do not perform as well as human catalogers.

 

Attaching information resources to ontologies creates a complete database, the result being that users can perform queries to retrieve specific information.  Ontologies can attach to existing database management systems, or even ad hoc files containing documents, photographs, video, or other media.   Typically the attached information resources are considered to be outside the ontology.  Alternatively, some knowledge representation languages used to build ontologies have advanced to the point where they can act as complete data modeling languages, and databases can be constructed directly within the ontology, as in CLASSIC [Borgida  et al. 1998] and ConceptBase [Jeusfeld et al. 1998].  In such systems the distinction between ontology and database is blurred. 

 

Ideally the user could express queries in natural language, such as “List all insects that damage soybean leaves”, or “What are the vegetative stages of soybean development”, and instead of getting a list of documents which the user must read to find the answer, a direct reply would be generated by the system.  The steps involved in natural language query processing include finding each word in the query in a dictionary or lexicon, analyzing the grammatical patterns within the stated phrase or sentence (syntactic analysis), mapping the grammatical structure onto objects in the ontology (semantic analysis), and then drawing inferences on the relationships between the query and other objects in the database (query processing), the result being a precise retrieval of objects matching the user’s interest.

 

Other uses for natural language processing (NLP) include information extraction and machine translation.  In information extraction, NLP techniques are used to automatically extract facts from plain text.  Machine translation has obvious uses in converting a document written in one language, such as Spanish, into another language, such as English.  Natural language processing is an extremely difficult area, yet the ontology promises to provide an important facility necessary for the construction of natural language systems by providing a representation for the meaning of concepts in a domain.

 

Section 2 gives a more formal definition of ontologies, what they look like, and levels of representation running from abstract to specific within the same ontology.  Methodologies for construction of ontologies, which is a labor-intensive task, are addressed in Section 3, including issues such as merging existing ontologies and version control.  Section 4 gives an overview of formal languages used for representing ontologies.  This includes a historical view of semantic networks, and recent developments to define languages for constructing ontologies within the new XML (Extensible Markup Language) standards.  Section 5 gives a brief overview of available commercial and public domain tools that assist in ontology construction and use. Section 6 discusses advanced topics, including database management and natural language processing, in greater detail.

 

2. What are Ontologies?

 

In this section we discuss what ontologies are. First, we give a general idea of the notion. Next, we discuss the differences between ontologies and knowledge bases and between ontologies and thesauri. We then present and discuss the different definitions of “ontology” and the different forms that ontologies may take. Finally, we analyze the different types of ontologies, and indicate ways to address multilingual issues.

 

General Idea

 

Ontologies have been proposed to solve the problems that arise from using different terminology to refer to the same concept or using the same term to refer to different concepts. The term “ontology” has been borrowed from Philosophy. In Knowledge Sharing [Neches et al. 1991, Patil et al. 1992, Swartout 1994] the meaning of this word is different from its meaning in Philosophy. Gruber [1993] introduced the term ontology to mean an “explicit specification of a conceptualization”, while in Philosophy Ontology means “a systematic account of Existence”.[2] To distinguish between the two meanings, [Guarino and Giaretta 1995] proposed that Ontology (upper-case “o”) should refer to the Philosophy meaning and ontology (lower-case “o”) to the AI meaning.

An ontology specifies a common vocabulary between different systems. It tries to identify and overcome the barriers to sharing and reuse of knowledge represented by AI programs that are due to a lack of consensus regarding the vocabulary used and the different semantic interpretations in domain models. Informally, an ontology consists of a set of terms and a set of constraints imposed on the way those terms can be combined. The latter set constrains the semantics of a term since it restricts the number of possible interpretations of the term. Terms in an ontology are a representation of concepts. We should stress that in an ontology concepts are represented, not words. Concepts, in general, are not specific to a given natural language [Mars 1995].

Ontologies are closely related to knowledge bases. The distinction between ontologies and knowledge bases lies in the different role played by the represented knowledge. Ontologies tend to represent knowledge that is more or less consensual within a community of people, whereas knowledge bases represent knowledge that is specific to the particular problem that the knowledge based system solves. Ontologies are concerned with static domain knowledge. A knowledge base usually includes knowledge that changes with inferences. Knowledge represented in ontologies does not change with inference. For instance, while an ontology on enterprise modeling contains concepts such as activity, process, and resource, a knowledge base would represent the particular activities that are performed by a particular enterprise, the particular processes that take place in that enterprise, the actual processes, activities, costs, and resources that were used to build or produce a particular product, and an estimate of the resources that were inferred to be needed to satisfy a new order that has just arrived. Therefore, knowledge in ontologies is more appropriate to be reused and shared across applications.

Although ontologies aim at capturing static domain knowledge, it is generally acknowledged that an ontology depends on the application that powered its construction. If two applications are dealing with the same domain but the tasks they have to perform are different, then it is natural that the ontologies they need about that domain are slightly different. Although most of the concepts are usually common, they may be defined in different ways: in different forms (as a class, a relation, etc.), capturing different points of view or features of the same concept (a structural point of view, a functional point of view, etc.), or with different levels of granularity. Different points of view may also imply that the same concepts are represented using different terminology. It should be stressed that there is no single way of organizing concepts. There are different genuine alternatives. Therefore, one commits oneself when an alternative is chosen.

Ontologies play an important part in communication between intelligent systems. Suppose that one application asks another to perform a task. While transmitting information about the particular problem that it wants to see solved, it must transmit that information in such a way that the other application can understand. Therefore, it may be important to translate information between different ontologies about the same domain.

Not only is compatibility among ontologies important in communication between different applications, but it is also important in building large systems from smaller ones. Any knowledge based system has an ontology as one of its components, even if only implicitly. A knowledge base can only be assembled from smaller ones if their underlying ontologies are compatible and consistent with one another. If the ontologies underlying the different knowledge bases are not the same, it means that either the domain is represented using different terminology or the same terminology is used with different meaning, that is, either the terms in each ontology are different or the axioms representing the constraints imposed on those terms are not equivalent. Only if the underlying ontologies of the knowledge bases are the same can they be assembled together. Therefore, one cannot build a large system by means of reuse if there is no agreement on the vocabulary that is used, or if there is agreement on the vocabulary but the axioms don't represent the same statements or are in contradiction with one another.

What Is the Difference Between an Ontology and a Thesaurus?

 

Given the general definition of ontology stated above, it is important to ask, what is the difference between an ontology and a thesaurus?  Many potential ontology users and partners in the library sciences are very familiar with the thesaurus, and make use of a thesaurus in cataloging information.  They will want to know, what is the advantage in moving from a thesaurus to an ontology?

 

Because the above definition of ontology is very general, it might be argued that a thesaurus is an ontology.  There are features in a thesaurus that are common to ontological theories, but others that aren't.  The common features include organization of terminology and hierarchical structure.  Both an ontology and a thesaurus are concerned with covering a broad range of terminology used in a particular domain, and with understanding the relationships among these terms.  Both utilize a hierarchical organization to group terms into categories and subcategories.  Both can be applied to cataloging and organizing information.   Important differences include the informality and ambiguity of relations in a thesaurus.  The relationships (BT/NT/RT/UF, etc.) available for organizing the terms in a thesaurus are not only relatively few in number, but are not formally defined and thus subject to ambiguous use.  For example, the BT/NT (broader term/narrower term) relationship can be used ambiguously to indicate either that a particular concept is a special case of another, or that a concept is part of another.  The RT (related term) relationship covers all other relationships, lumping together associations, arbitrary properties, and other vague relationships.  A good ontology can introduce a host of structural and conceptual relationships including superclass/subclass/instance relationships, property values, time relationships, and others depending on the representation language used.  In general an ontology contains far more relationships, which are formally defined and unambiguous, compared to a thesaurus.

 

Another way to look at this is to compare the goals of a thesaurus with the goals of an ontology.  A thesaurus attempts to show the relationships between terms, whereas an ontology attempts to define concepts and show the relationships between concepts.  In pure form, an ontology is not about terms at all, only about concepts which ideally are represented in a form independent of terms in any natural language.  A thesaurus makes no attempt to define, let alone formally represent, the meaning of concepts; it is concerned only with relationships among terms in a particular natural language (or multiple natural languages if the thesaurus is multilingual).  Thus the machinery for representing concepts in an ontology must be much stronger.  Nevertheless, to talk about an ontology we must use terms in some natural language, so the ontology must include a mapping from terms to concepts.  No such mapping is formally recognized in a thesaurus.

 

In practical applications, this distinction implies that an ontology will perform better than a thesaurus when it comes to searching.  Because the ontology contains machine interpretable definitions of concepts, it is able to support terminological reasoning.  This means that a user’s question can be understood through analysis of the meaning of the user’s terms appearing in the question, and mapped more precisely to information resources.  The ontology can reason about the meaning of concepts by comparing logical concept structures.  A simple example (see section) is subsumption.  An ontology can reason that one concept is a special case of another because the logical definitions of each concept can be compared.  If concept B satisfies the requirements for being a case of concept A, then B can automatically be classified below A.  This gives rise to query processing and searching which is not possible with a thesaurus.
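A highly simplified sketch of subsumption can make this concrete. In the Python below, each concept is defined by a set of required property-value pairs, and concept A subsumes concept B when every requirement of A is also a requirement of B. This is an assumption-laden toy: the concept definitions are hypothetical, and real terminological reasoners compare much richer logical structures than flat sets.

```python
# Hypothetical concept definitions: each concept is a set of requirements.
concepts = {
    "plant": frozenset({("kind", "plant")}),
    "crop": frozenset({("kind", "plant"), ("cultivated", True)}),
    "legume crop": frozenset({("kind", "plant"), ("cultivated", True),
                              ("family", "legume")}),
}

def subsumes(a, b):
    """True if every requirement of definition a is also required by b."""
    return a <= b

def parents(name):
    """Most specific concepts under which `name` can be classified."""
    target = concepts[name]
    subsumers = [n for n, d in concepts.items()
                 if n != name and d != target and subsumes(d, target)]
    # Drop any subsumer that is more general than another subsumer.
    return [n for n in subsumers
            if not any(m != n and subsumes(concepts[n], concepts[m])
                       for m in subsumers)]

print(parents("legume crop"))  # ['crop']
print(parents("crop"))         # ['plant']
```

Nothing in the data states that “legume crop” belongs below “crop”; the placement is derived by comparing the two definitions, which is the step a thesaurus cannot perform.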

Ontology definitions

The term “ontology” has been used in AI with several meanings. A discussion of some of the meanings in Philosophy, in AI, and in the Knowledge Sharing area can be found in [Guarino and Giaretta 1995]. The initial definition proposed by Gruber [1993] was slightly modified in [Borst 1997]. A merge of both definitions can be phrased as:[3]

“an explicit formal specification of a shared conceptualization”

 

As discussed in [Studer et al. 1998]:

  • “explicit” means that “the type of concepts used and the constraints on their use are explicitly defined”;
  • “formal” refers to the fact that “it should be machine readable”;
  • “shared” reflects the notion that the knowledge represented in an ontology “captures consensual knowledge, that is, it is not private to some individual, but accepted by a group”;
  • “conceptualization” refers to “an abstract model of some phenomenon in the world by having identified the relevant concepts of that phenomenon”.[4]

At the other end of possible ontology definitions, the broadest, most informal definition of ontology is [Uschold et al. 1996]:

“a vocabulary of terms and some specification of their meaning”

The main differences between the two definitions are the formality requirement and the consensual nature of the knowledge represented in an ontology. It is important that the knowledge represented in the ontology has a consensual nature, at least among a given group, so that it can be reused in several knowledge based systems. After all, this is the main reason why ontologies are built. The formality requirement, by contrast, is not universally accepted. There are ontologies expressed in a restricted and structured form of natural language that are nonetheless considered ontologies, for instance, the text version of an ontology about activities, processes, organization, and strategy [Uschold et al. 1998]. There are even ontologies which are loosely expressed in natural language [Uschold et al. 1996].

What do they look like?

An ontology usually takes the form of a hierarchy of symbols. The symbols represent the concepts of a particular domain. Sometimes the hierarchy is referred to as a taxonomy and symbols are referred to as concepts, vocabulary or terms. However, this is not enough since these constituents could be interpreted differently by different systems. To restrict the possible interpretations of its symbols, an ontology includes a set of axioms. These axioms express the constraints that the symbols involved in those axioms must comply with. These axioms relate one symbol[5] with the other symbols of the ontology. They restrict the possible interpretations for that symbol. Therefore, the most important part of an ontology is the semantics associated with its symbols, usually referred to as the content of the ontology.[6] The content of an ontology is constrained through its set of axioms. Therefore, the basic unit of meaning is not a symbol but the theory, that is, the set of axioms that is associated with the several symbols in the hierarchy.

To give an idea of what an ontology looks like, we present the definitions of the same few concepts from an existing ontology in both an informal and a formal way. In Figure 2 we show the text definition of an activity and a doer in the ENTERPRISE ontology [Uschold et al. 1998]. An activity is characterized by the interval during which the activity takes place, its pre-conditions (what must be true for the activity to be performed), and its effects (what is true once the activity is completed). There are also other attributes that characterize it, such as its doer and the sub-activities into which it can be decomposed. A doer is an actor that performs an activity. All concepts in upper case are also defined in the ontology. The definitions are expressed in a restricted and structured form of natural language.

ACTIVITY: something done over a particular TIME INTERVAL. The following may pertain to an ACTIVITY:

  • has PRE-CONDITION(S);
  • has EFFECT(S);
  • is performed by one or more DOERS;
  • is decomposed into more detailed SUB-ACTIVITIES
  • entails use and/or consumption of RESOURCES
  • has AUTHORITY requirements
  • is associated with an [ACTIVITY] OWNER
  • has a measured efficiency

DOER: The Role of an Actor in a Relationship with an ACTIVITY whereby the Actor performs (all or part of) the ACTIVITY.

Figure 2: Informal definitions in the ENTERPRISE ontology

In Figure 3 we present the definitions of those concepts implemented in Ontolingua and kept in the Ontolingua Server[7] library as Enterprise-Ontology. Activity and actual-doer (which corresponds to the concept of doer) were represented as classes.[8] As can be seen, there may be some differences between the implemented and the textual versions of an ontology. Activity is characterized by the interval of time during which it takes place, its pre-conditions, its effects and its status (done, to be done, etc.). Activity and actual-doer are related by the actually-execute relation. The actual-doer is an actor for which there is an activity to which that actor is related by the actually-execute relation. In this case the relationships used to establish the hierarchy of concepts are class-superclass and instance-class; for instance, actual-doer is a subclass of actor.


(Define-Frame Activity 
:Own-Slots 
((Documentation "Something done over a particular Time-Range.
The following may pertain to an Activity:
* is performed by one or more Actual-Doer(s);
* is decomposed into more detailed Sub-Activity(s);
* Can-Use-Resource(s);
* An Actor may Hold-Authority to perform it;
* there may be an Activity-Owner;
* has a measured efficiency.")
 
 (Instance-Of Class) 
 (Subclass-Of Activity-Or-Spec)) 
:Template-Slots ((Actual-Activity-Interval (Minimum-Cardinality 0) 
                                           (Cardinality 1) 
                                           (Value-Type Time-Range)) 
                 (Actual-Pre-Condition (Minimum-Cardinality 1) 
                                       (Value-Type Pre-Condition)) 
                 (Actual-Effect (Minimum-Cardinality 1) 
                                (Value-Type Effect)) 
                 (Activity-Status (Minimum-Cardinality 1) 
                                  (Value-Type Activity-State)))) 
 
(Define-Frame Actual-Doer 
:Own-Slots 
((Documentation "The Actor in the Actually-Execute relationship.") 
 (Instance-Of Class) 
 (Subclass-Of Actor)) 
:Axioms (<=> (Actual-Doer ?Actor) 
             (And (Actor ?Actor) 
                  (Exists (?Activity) 
                          (Actually-Execute ?Actor ?Activity))))) 
 
(Define-Relation Actually-Execute (?Actor ?Activity) 
"A relationship between an Actor and an Activity 
whereby the Actor has performed the Activity." 
:Def (And (Potential-Actor ?Actor) (Activity ?Activity))) 
 

Figure 3: Formal definitions in the ENTERPRISE ontology

Types of Ontologies

Ontologies can be classified, according to the issue of the conceptualization (usually referred to as the type), into [van Heijst et al. 1997, Guarino 1998]:

  • representation ontologies or meta-ontologies, which capture the representation primitives used to formalize knowledge in a given knowledge representation family or system. An example is the Frame ontology [Gruber 1993], which defines the terms that capture conventions used in object-centered knowledge representation systems (frames, description logics, etc.). This ontology defines concepts such as class, relation, function, named-axiom, arity, exact-domain, exact-range, unary-relation, and binary-relation. In this ontology, relations are sets of tuples (named by predicates), functions are a special case of relations, classes are unary relations (there is no special syntax for types), there is no special treatment of slots (since they can be represented as unary functions or binary relations) and classes are defined extensionally as sets (not descriptions);
  • general or upper-level ontologies,[9] which classify the different categories of entities existing in the world. Very general notions which are independent of a particular problem or domain are represented in these ontologies. Knowledge defined in this kind of ontology is applicable across domains and includes vocabulary related to things, events, time, space, and mereology.[10]

An ontology of time, Simple-Time, can be found in the Ontolingua Server library. In Figure 4 we present part of its structure and in Figure 5 we present some of the definitions of the concepts represented in it. There are three main concepts: time point, time range and duration. A time point defines a single point in time that cannot be further decomposed. A time range is characterized by two time points, its starting and ending time points, and a duration. The ending time of a time range is equal to the sum of its starting time and its duration. A duration denotes a period of time and is characterized by a value and a measure. One can define an equality relation between two time points (if they represent the same time point) or two time ranges (if they have the same starting points and the same ending points). One can define several relations between time ranges, such as before, after, meets, overlaps.

Other examples of upper-level ontologies can be found in [Sowa 2000]. Some of the abstract upper-level ontologies presented in it were proposed by philosophers (for instance, Aristotle's ontology) and others by knowledge engineers (for instance, Cyc's [Lenat and Guha 1990] upper-level);

 

Figure 4: Part of the structure of Simple-Time ontology

(Define-Frame Time-Point
:Own-Slots
((Arity 1)
 (Documentation "A time-point is a point in real, historical 
time (on earth).  It is independent of observer and context....  
The time-points at which events occur can be known with various
degrees of precision and approximation, but conceptually time-points 
are point-like and not interval-like....")
(Domain-Of Day-of Minutes-of Hour-of Month-of Seconds-of Year-of)
(Instance-Of Class) 
(Subclass-Of individual)))
 
(Define-Class Time-Range (?time-range) 
"Time-Range denotes a certain period of time. It consists of a
start time, an end time. A start time must precede an end time.
Relations between Time-Ranges are defined after James Allen's interval
relations."
:def (individual ?time-range)
:constraints (Equals  (+ (Start-Time-of ?time-range)
                         (Duration-of ?time-range))
                      (End-Time-of ?time-range)))
 
(Define-Function Start-Time-of (?time-range) :-> ?time-point
"(Start-Time-of 'tr) denotes a start time of a time range tr."
:def (and (Time-Range ?time-range)
          (Time-Point ?time-point)))
 
(Define-Class Duration (?duration)
"Duration denotes a period of time. It consists of a value and a 
 measure"
:def (individual ?duration))
 
(Define-Relation Equals (?t1 ?t2)
" a time point ?t1 is equal to a time point ?t2.
  a time range ?t1 is identical to a time range ?t2."
:axiom-def
((=> (and (Time-Point ?t1) (Time-Point ?t2))
     (<=> (Equals ?t1 ?t2)
          (and (= (Year-of ?t1) (Year-of ?t2))
               (= (Month-of ?t1) (Month-of ?t2))
               (= (Day-of ?t1) (Day-of ?t2))
               (= (Hour-of ?t1) (Hour-of ?t2))
               (= (Minute-of ?t1) (Minute-of ?t2))
               (= (Second-of ?t1) (Second-of ?t2)))))
 (=> (and (Time-Range ?t1) (Time-Range ?t2))
     (<=> (Equals ?t1 ?t2)
          (and (Equals (Start-Time-of ?t1)
                       (Start-Time-of ?t2))
               (Equals (End-Time-of ?t1)
                       (End-Time-of ?t2)))))))
 
(Define-Relation After (?time-range-1 ?time-range-2)
"a time range ?time-range-1 succeeds a time range ?time-range-2."
:iff-def (< (End-Time-of ?time-range-2)
            (Start-Time-of ?time-range-1))
:equivalent (Before ?time-range-2 ?time-range-1))
 

Figure 5: Some definitions from the Simple-Time ontology
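
The constraints in Figure 5 can also be read operationally. The following Python sketch is only an illustration and is not part of the Ontolingua library: it assumes that time points and durations can be reduced to plain numbers (seconds), and the names TimeRange, after and before are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    """A time range given by a start point and a duration (both in seconds).

    Assumption: time points are simplified to numbers; Simple-Time itself
    describes them by year, month, day, hour, minute and second.
    """
    start: float
    duration: float

    @property
    def end(self) -> float:
        # Constraint from Figure 5: end time = start time + duration.
        return self.start + self.duration


def after(r1: TimeRange, r2: TimeRange) -> bool:
    """r1 is after r2 iff r2 ends before r1 starts (the :iff-def of After)."""
    return r2.end < r1.start


def before(r1: TimeRange, r2: TimeRange) -> bool:
    """Before is the converse of After (the :equivalent clause)."""
    return after(r2, r1)


morning = TimeRange(start=0, duration=3600)
evening = TimeRange(start=7200, duration=1800)
print(after(evening, morning), before(morning, evening))
```

Because the end time is computed from the start time and the duration, the constraint on Time-Range cannot be violated in this sketch; in the ontology it is stated as an axiom instead.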

  • domain ontologies are more specific ontologies. Knowledge represented in this kind of ontology is specific to a particular domain. These ontologies describe vocabulary related to a generic domain, such as airplanes, chemical elements, etc. They provide vocabularies about concepts in a domain and their relationships or about the theories governing the domain. For instance, the Plinius ontology [van der Vet et al. 1995, Speel 1995] is about the chemical composition of ceramic materials and Chemical-Elements [Mariano 1996] is an ontology about the chemical elements.

In Chemical-Elements, the most general concept, Elements, is characterized by its symbol, atomic number, chemical group, chemical period, atomic weight, boiling point, melting point, crystal structure, density at 20 degrees centigrade, electronegativity, etc. Ontologies are usually hierarchically organized. In Figure 6 we present part of the hierarchy of Chemical-Elements. Dashed lines represent instance-class relations whereas solid lines represent class-superclass relations. Elements are divided into non-reactive and reactive. Helium, neon, argon, etc. are instances of non-reactive elements. They are all gases at normal pressure and temperature conditions. Reactive elements are further divided into metals, semi-metals and non-metals. There are several non-metal[11] elements, such as carbon and oxygen. A few of them are grouped in the halogens subclass.

 

Figure 6: Part of Chemical-Elements hierarchy
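
The two kinds of links in the hierarchy can be sketched as two lookup tables. This is a minimal illustration in Python, not the representation used in Chemical-Elements; it contains only the classes and instances named in the text above, and the function name classes_of is our own.

```python
from typing import List

# Class-superclass links (solid lines in Figure 6).
SUPERCLASS = {
    "non-reactive elements": "elements",
    "reactive elements": "elements",
    "metals": "reactive elements",
    "semi-metals": "reactive elements",
    "non-metals": "reactive elements",
    "halogens": "non-metals",
}

# Instance-class links (dashed lines in Figure 6).
INSTANCE_OF = {
    "helium": "non-reactive elements",
    "neon": "non-reactive elements",
    "argon": "non-reactive elements",
    "carbon": "non-metals",
    "oxygen": "non-metals",
}


def classes_of(instance: str) -> List[str]:
    """Return the direct class of an instance followed by all its superclasses."""
    chain = []
    current = INSTANCE_OF.get(instance)
    while current is not None:
        chain.append(current)
        current = SUPERCLASS.get(current)
    return chain


print(classes_of("oxygen"))
```

Following the instance link once and then the superclass links upward yields every class an element belongs to, which is the inference the hierarchy is meant to support.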

  • application ontologies describe pieces of knowledge that depend on both a particular domain and a particular task. Therefore, they are related to problem solving methods, which are outside the scope of this work. An application ontology relates concepts that describe the domain with concepts that are part of the description of problem solving methods. They make explicit the role played by concepts of the domain in a given problem solving method.

In Figure 7 we present part of an example described in [Van Heist et al. 1997]. CASNET [Weiss et al. 1984] allows the representation of causal links that describe the processes associated with diseases and the development of diagnosis applications. Simple boxes represent domain knowledge and labeled boxes represent knowledge associated with problem solving methods. In this example, the CASNET ontology defines pathophysiological states and observations. Dashed arrows represent class-superclass relations and labeled arrows represent the relation named by their label. In this example there are two inference methods: casnet abduction and casnet ranking. The application ontology relates concepts from the domain with concepts related to problem solving methods. Observations provide evidence for states, which are associated with abduction confidence measures. One of the procedures involved in abduction uses the evidence links between observations and their associated confidence measures to compute the confidence measure of the state. One of the procedures involved in ranking uses the strengths of the causal relations between the states. The total weight of a state is the maximum of the forward and the inverse weights. The forward weight of a state summarizes the weight of the evidence coming from the causes of that state. The inverse weight summarizes the weight of the evidence coming from the effects of the state. The procedure that ranks states (hypotheses) uses the ratio of the weight of the hypothesis to the cost of testing that hypothesis.

Figure 7: Part of an application ontology
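
The ranking computation described above can be sketched as follows. This is a simplified Python illustration, not the CASNET implementation: the state names and numbers are hypothetical, the forward and inverse weights are taken as given rather than computed from causal links, and we assume that a higher weight-to-cost ratio ranks first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    name: str
    forward_weight: float  # evidence coming from the causes of the state
    inverse_weight: float  # evidence coming from the effects of the state
    test_cost: float       # cost of testing this hypothesis (assumed > 0)

    @property
    def total_weight(self) -> float:
        # Total weight is the maximum of the forward and inverse weights.
        return max(self.forward_weight, self.inverse_weight)

    @property
    def score(self) -> float:
        # Ratio of the weight of the hypothesis to the cost of testing it.
        return self.total_weight / self.test_cost


def rank(states: List[State]) -> List[State]:
    """Order hypotheses by score, highest first (ordering is an assumption)."""
    return sorted(states, key=lambda s: s.score, reverse=True)


# Hypothetical states and numbers, for illustration only.
states = [
    State("state-a", forward_weight=0.6, inverse_weight=0.3, test_cost=2.0),
    State("state-b", forward_weight=0.2, inverse_weight=0.5, test_cost=1.0),
]
print([s.name for s in rank(states)])
```

Here state-a has the larger total weight, but state-b ranks first because it is cheaper to test.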

Upper-level ontologies and domain ontologies are somewhat related since their aim is to represent a conceptualization of reality. General ontologies usually represent knowledge that is related to or borrowed from Ontologies.[12] There is a large range of domain ontologies: general domain ontologies, such as those about airplanes or chemical substances, and more domain-specific ontologies, such as those about turbojet engines used in aircraft propulsion or about chemical elements. The distinction between domain ontologies and upper-level ontologies is not clear-cut. In between the extremes one can see a middle ground. This middle ground specializes the upper-level and provides the upper-structure for domain ontologies. One can see this middle ground as representing general domain knowledge (general knowledge in one field, for instance medicine, physical engineering systems). Representation and application ontologies are orthogonal to domain and upper-level ontologies. Representation ontologies try to conceptualize the meta-level categories used to model the world (concept, property, etc.). Application ontologies connect ontologies with problem solving methods in a particular knowledge based system.

Although the previous classification is the most widely accepted in the area, there are other proposals. For instance, [Mizoguchi et al. 1995] proposes a classification according to ontology content. In this case, only three kinds of ontologies are considered: domain ontologies, general ontologies and task ontologies. While the first two were already described, task ontologies provide the terms used to solve problems associated with a particular task. Therefore, they are related to problem solving methods, which are outside the scope of this work. For instance, in diagnosis the terms defined in the task ontology would include “observation”, “hypothesis” and “goal”.

Strategies for Creating Multilingual Ontologies

 

A strategy is needed for developing an ontology in such a way that it can be used by people having different native (natural) languages, such as English, French, Japanese, and potentially many other languages.  Furthermore, even within a general language there can be local dialects or even local communities of users who use terms in a way that differs from more general usage.  We use the term “multilingual” to refer to the ability to support multiple, different natural languages (and to distinguish this from the separate issue of using different formal knowledge representation languages for representing ontologies).  Although an ontology represents concepts in a language-independent fashion, mappings from terms in natural language to concepts in the ontology enable people to access and interpret the concepts.  The terms are expressed in the person’s natural language, and in a multilingual ontology, terms from many different natural languages are available.  In addition, the multilingual aspect of an ontology can be beneficial in applications such as machine translation.

 

Two strategies for multilingual support in existing thesauri are the single concept approach and the interlingua approach.  In the single concept approach, there is a single representation for a concept, and this representation includes a translation of the concept into terms in several different natural languages.  In the interlingua approach, there is a multi-level representation involving a language-neutral (interlingua) concept representation, along with representations for each term in each supported natural language.  Each natural language term is mapped to a single interlingua concept (in general each sense of the term is mapped to a single interlingua concept).  Terms in different natural languages may map to the same interlingua concept, but it is not necessary that a given interlingua concept is mapped to by terms in all the languages.  This is because a given concept cannot necessarily be translated into a single term for all languages.  Examples of these two approaches are presented below.

 

1. Single Concept Approach

 

AGROVOC

 

AGROVOC is a multilingual agricultural thesaurus developed by the Food and Agriculture Organization of the United Nations (FAO).  A central AGROVOC server [AGROVOC, 2002] is available in English, French, Spanish and Portuguese, and versions in other languages are available or being prepared by national centers in other countries.  The AGROVOC thesaurus contains the basic thesaurus relationships (BT/NT/RT/UF).  Each entry in the thesaurus contains terms for each language.  Thus, the AGROVOC entry for “beef” contains:

 

 

BT meat

   BT animal products

RT veal

 

English: beef

Spanish: carne de res

French: viande bovine

Portuguese: carne de bovino

 

A person making a new entry into AGROVOC is required to specify terms for all the supported languages.  While this somewhat simplifies creation of multilingual terms in AGROVOC, the difficulty is that not all concepts have simple translations into a term in all the supported languages.  This problem is solved, albeit at greater effort and complexity, in the interlingua approach.
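
The single concept approach can be sketched as one record per concept that carries a term for every supported language. The Python below is an illustrative data structure only, not the AGROVOC storage format; the class name ThesaurusEntry, the language codes and the missing_languages check are our own assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Languages of the central server, written as assumed two-letter codes.
SUPPORTED_LANGUAGES = ("en", "es", "fr", "pt")


@dataclass
class ThesaurusEntry:
    labels: Dict[str, str]                              # language code -> term
    broader: List[str] = field(default_factory=list)    # BT
    related: List[str] = field(default_factory=list)    # RT

    def missing_languages(self) -> List[str]:
        """Languages for which the entry still lacks a term."""
        return [lang for lang in SUPPORTED_LANGUAGES if lang not in self.labels]


beef = ThesaurusEntry(
    labels={
        "en": "beef",
        "es": "carne de res",
        "fr": "viande bovine",
        "pt": "carne de bovino",
    },
    broader=["meat"],
    related=["veal"],
)
print(beef.missing_languages())
```

An entry for a concept that has no simple term in one of the languages would report that language as missing, which is exactly the case the single concept approach handles poorly.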

 

2. Interlingua Approach

 

EuroWordNet

 

EuroWordNet [Vossen 2001] is a multilingual ontology based on WordNet [Fellbaum, 1998], an English ontology developed at Princeton University.  WordNet was developed first; EuroWordNet was developed later and uses the same knowledge representation language, extended to provide multilingual support.  The terms and concepts in WordNet are a subset of EuroWordNet (the English part).

 

WordNet provides a mechanism for dealing with synonyms and homonyms; different terms can have the same meaning, and the same term can have different meanings.  A given term in WordNet is first broken into its different senses, which are also categorized by part-of-speech (noun, verb, adjective and adverb).  Each sense of a term is mapped to a synset (synonym set), which is WordNet’s representation for a concept.  Different terms sharing a common sense are mapped to the same synset.  The synset includes a gloss (a short natural language definition of the concept, in English), and other conceptual information, such as pointers to antonym synsets.

 

In EuroWordNet, terms and synsets are created semi-independently for each natural language.  It is not required that the concepts in one language are the same as those of another, though of course only concepts in the same domain can lead to a multilingual approach.  Thus English can have its own terms and synsets and French can have its own terms and synsets, which allows for greater flexibility in describing language-specific concepts.  Multilingual relationships are made at the level of synsets.  An Inter-Lingual-Index (ILI) provides a mapping between synsets for the same concept in different languages.  Thus if a synset in English represents the same concept as a synset in French, the English and French synsets would be linked by the ILI.  The ILI is a way of indicating equivalence among synsets from different languages.  Notice that the actual terms are not involved directly in this association.  Also, an ILI does not necessarily link to a synset in all of the supported languages (some languages may not have terms for a particular concept).

 

The ILI is a kind of interlingua; however, it has no formal representational structure and is merely a mechanism for linking equivalent synsets.  Thus as an interlingua, the ILI provides comparatively weak ontological functions.  Also, the ILI set in EuroWordNet was initialized from the synsets of WordNet.  Although the ILI set can be extended with non-English concepts, historically it is somewhat predisposed towards English.
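
The linking role of the ILI can be sketched as a table of records, each pointing to at most one synset per language. This Python fragment is only an illustration: the record and synset identifiers are hypothetical and do not follow the real EuroWordNet format, and the function name equivalent_synset is our own.

```python
from typing import Dict, Optional

# Each language keeps its own synsets; an ILI record links equivalent ones.
# All identifiers below are hypothetical.
ILI: Dict[str, Dict[str, str]] = {
    "ili-beef": {"en": "en-synset-beef", "fr": "fr-synset-viande-bovine"},
    "ili-veal": {"en": "en-synset-veal"},  # no French synset linked
}


def equivalent_synset(synset: str, source: str, target: str) -> Optional[str]:
    """Find the target-language synset linked to `synset` through the ILI."""
    for links in ILI.values():
        if links.get(source) == synset:
            # None when the target language has no synset for this concept.
            return links.get(target)
    return None


print(equivalent_synset("en-synset-beef", "en", "fr"))
print(equivalent_synset("en-synset-veal", "en", "fr"))
```

Note that the lookup works on synsets only; terms never appear in the ILI table, and a missing link is a normal outcome rather than an error.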

 

EDR

 

To quote from the English introduction on EDR:

 

"The EDR Electronic Dictionary was developed for advanced processing of natural language by computers, and is composed of eleven sub-dictionaries. Sub-dictionaries include a concept dictionary, word dictionaries, bilingual dictionaries, etc. The EDR Electronic Dictionary is the result of a nine-year project (from fiscal 1986 to fiscal 1994) aimed at establishing an infrastructure for knowledge information processing. The project was funded by the Japan Key Technology Center and eight computer manufacturers (Fujitsu, Ltd., NEC Corporation, Hitachi, Ltd., Sharp Corporation, Toshiba Corporation, Oki Electric Industry Co., Ltd., Mitsubishi Electric Corporation, and Matsushita Electric Industrial Co., Ltd)" [http://www.iijnet.or.jp/edr]

 

"The EDR Electronic Dictionary (Figure 8) is a machine-tractable dictionary that catalogues the lexical knowledge of Japanese and English and has unified thesaurus-like concept classifications with corpus databases. The Concept Classification Dictionary, a sub-dictionary of the Concept Dictionary, describes the similarity relation among concepts listed in the Word Dictionary. The EDR Corpus is the source for the information described in each of the sub-dictionaries. The basic approach taken during the development of the dictionary was to avoid a particular linguistic theory and to allow for adoptability to various applications."  [Japan Electronic Dictionary Research Institute, Ltd, 1996]

 

EDR is bilingual (Japanese/English). Each concept has a hexadecimal identifier, and is primarily defined by its relationships to other concepts (these relationships are stored in the Concept Classification Dictionary), and by its links to one Japanese and/or one English word in the Word Dictionaries (see below). Each concept has a brief explanation in Japanese and/or English, intended to help those editing and maintaining the dictionary.  The concept dictionary provides the interlingua, and concepts are modeled using a rich set of ontological relationships including agent, object, cause, material, occurrence, and time.

 

 

 

 

Figure 8. EDR structure

EDR is a fairly balanced bilingual dictionary, providing similar representation of both English and Japanese. The use of a language-neutral concept identifier is important in this regard. EDR attempts to handle the lack of corresponding words between languages using several kinds of translation. In the case of Japanese to English, the options are paraphrases (a phrase in English with equivalent meaning and implications), word-for-word translation, conversion of the Japanese word (in pictorial characters) into a Romanized form or explanation in English.

 

Note that the two languages have different requirements in terms of their “dictionary entries”. For example, inflection of Japanese verbs follows a few well-defined rules, so it is not necessary to record it, whereas English verb inflection is a nightmare (walk, walked; speak, spoke; think, thought; go, went…).  Conversely, the pictorial characters used to represent Japanese usually have at least two pronunciations, and sometimes many, so guidance on the pronunciation is helpful. Note also that an EDR concept may have multiple “parents”; the structure is not a tree. There are also relationships other than BT/NT, which can be used to represent facts or occurrences.

 

Character Sets

 

The Unicode standard [Unicode Consortium, 2000] can represent most of the characters used in Asian languages. Although country-specific encodings are more commonly used on the Internet and in documents (e.g., SJIS and EUC in Japan), Unicode allows characters from many languages to be stored together. There is no obvious alternative, and most countries have mechanisms for converting from their own national encoding systems into Unicode. We therefore propose that AOS use Unicode internally, with client applications converting to and from country-specific encodings as required.
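
The proposed division of labor (Unicode inside, country-specific encodings at the boundary) can be sketched with the codecs built into Python. This is a minimal illustration, not part of any AOS implementation; the helper names to_unicode and from_unicode are our own, while "shift_jis" and "euc_jp" are standard Python codec names for SJIS and EUC.

```python
def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode bytes in a country-specific encoding into a Unicode string."""
    return raw.decode(encoding)


def from_unicode(text: str, encoding: str) -> bytes:
    """Encode a Unicode string into a country-specific encoding."""
    return text.encode(encoding)


# "\u725b\u8089" is the Japanese word for beef, written with escapes so the
# example does not depend on the encoding of this file.
term = "\u725b\u8089"

sjis_bytes = from_unicode(term, "shift_jis")  # as a SJIS client would send it
euc_bytes = from_unicode(term, "euc_jp")      # as an EUC client would send it

# The byte sequences differ, but both decode to the same internal string.
print(sjis_bytes != euc_bytes)
print(to_unicode(sjis_bytes, "shift_jis") == to_unicode(euc_bytes, "euc_jp") == term)
```

Storing only the decoded string means that a term entered by a SJIS client and the same term entered by an EUC client compare equal inside the system.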

 

3. How can ontologies be built?

 

In this section we give an overview of the way ontologies are built. First, we characterize the ontology building process and its life cycle. Then, we present an overview of the most representative methodologies for building ontologies from scratch. Next, we describe the processes of building ontologies by means of reuse. Finally, we describe the problems of ontology versioning.

The ontology development life cycle

Ontology building is a process. A process is composed of a series of activities that are performed in order to achieve something. The usually accepted stages through which an ontology is built are:[13] specification, conceptualization, formalization, implementation, and maintenance. At each stage there are activities to be performed:

  • specification, where one identifies the purpose and scope of the ontology. Purpose answers the question “why is the ontology being built?” and scope answers the question “what are its intended uses and end-users?”;
  • conceptualization, where one describes, in a conceptual model, the ontology that should be built so that it meets the specification found in the previous step;
  • formalization, where one transforms the conceptual description into a formal model, that is, the description found in the previous step is written in a more formal form, although not yet in its final form;
  • implementation, where one implements the formalized ontology in a formal knowledge representation language;
  • maintenance, where one updates and corrects the implemented ontology.

Besides the activities mentioned above that should be performed at each homonymous stage, there are other activities that can be performed during the whole life cycle, such as:

  • knowledge acquisition, where one acquires knowledge about the domain either by using elicitation techniques on domain experts or by referring to relevant bibliography;
  • documentation, in which one reports, in a document and alongside the implementation, what was done, how it was done and why it was done;
  • evaluation, in which one technically judges the ontology.

Besides these activities there is also an activity that depends on the methodology:

  • reuse, where one reuses other ontologies as much as possible. Most methodologies name this activity “integration”.

In Figure 9, we present a scheme of the activities involved in the ontology development life cycle.

 

Figure 9: Activities in the ontology development life cycle

Although these activities are influenced by software engineering activities [IEEE-Std-1074-1995. 1996], they are different. The main differences are:

  • in software engineering, after requirement specification activities, design activities are not divided into conceptualization and formalization as most ontology methodologies propose,[14]
  • knowledge acquisition activities (which take place during the whole life cycle) do not exist in software engineering processes and are an essential part of any ontology building methodology.

There are some activities that all of the most representative methodologies consider, namely, specification, conceptualization, implementation, evaluation and reuse.

In Figure 10 we compare waterfall [Royce 1970], iterative [Basili and Turner 1995] and evolving prototyping life cycle models,[15] highlighting how activities are scheduled along the life cycle and how the final product is developed.

 

Figure 10: Comparison of life cycles

It is generally agreed in the field that the development of an ontology follows an evolving prototyping life cycle rather than a waterfall or an iterative one. In an evolving prototyping life cycle, one can go back from any stage to any other stage of the development process. As long as the ontology does not satisfy the evaluation criteria or does not meet all requirements found during specification, the prototype is improved. One should note that evaluation should be performed throughout the whole life cycle, although some methodologies have a specific evaluation stage after implementation. Usually, one does not start a new prototype in each “iteration”; one only improves the existing one.[16] Any part of the ontology that was identified as lacking quality or not meeting the desired requirements is improved. This kind of life cycle is different from a waterfall one, since it may go back to a previous stage of the life cycle. It is different from iterative life cycles since there is no a priori planning of the several prototypes of the ontology that are going to be developed in each iteration of the ontology building process.[17]

Most representative methodologies to build from scratch

Although several ontology building methodologies have been proposed, none is widely accepted. Ontology building is still more of a craft than an engineering task. The main methodologies for building ontologies from scratch are:

  • TOVE methodology, which was used to build the TOVE ontology (which is about enterprise modeling processes) [Fox 1995, Gruninger et al. 1995, Gruninger 1996],
  • ENTERPRISE methodology, which was used to build the ENTERPRISE ontology (which is also about enterprise modeling processes) [Uschold et al. 1995, Uschold 1996, Uschold et al. 1996],
  • METHONTOLOGY, which was used to build, among others, the Chemicals ontology (which is about the chemical elements of the periodic table) [Fernández et al. 1997, Fernández et al. 1999].

A comparative study of methodologies for building ontologies from scratch can be found in [Fernández 1999]. In the remainder of this section, we describe the most representative ontology building methodologies that address the problem of building ontologies from scratch. In the next section, we describe the methodologies that deal with reuse issues.

The TOVE methodology proposes the following stages:

  • capture motivating scenarios, which are stories or examples that describe the motivation for the proposed ontology in terms of its intended applications,
  • formulate informal competency questions based on the motivating scenarios. These are the questions that the ontology must be able to answer (these questions can be stratified, that is, the answer to a question can be used to answer more general questions),
  • specify the terminology of the ontology within a formal language (it uses classical first-order logic, FOL),[18]
  • formulate formal competency questions in FOL using the terminology defined in the previous stage,
  • specify axioms and definitions for the terms in the ontology within the formal language,
  • evaluate the ontology by demonstrating the competency of the ontology with respect to the set of questions that arise from the applications that use the ontology,
  • define the conditions under which solutions to the questions are complete.
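
The role of competency questions in evaluation can be sketched as a small set of queries that the ontology must answer as expected. The Python below is only an illustration under strong simplifications: TOVE states its terminology and questions in first-order logic, whereas here the ontology is a set of triples, and all facts, relation names and the question itself are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy terminology and facts (hypothetical, loosely in an enterprise setting).
FACTS: Set[Triple] = {
    ("assemble-widget", "is-a", "activity"),
    ("assemble-widget", "uses", "drill"),
    ("drill", "is-a", "resource"),
}


def resources_used_by(activity: str) -> Set[str]:
    """Formal counterpart of 'Which resources does an activity use?'"""
    return {
        o for (s, p, o) in FACTS
        if s == activity and p == "uses" and (o, "is-a", "resource") in FACTS
    }


# Each competency question pairs a query with the answer the scenario expects.
COMPETENCY_QUESTIONS = [
    (
        "Which resources does assemble-widget use?",
        lambda: resources_used_by("assemble-widget"),
        {"drill"},
    ),
]


def evaluate() -> List[str]:
    """Return the questions the ontology fails to answer as expected."""
    return [q for (q, query, expected) in COMPETENCY_QUESTIONS if query() != expected]


print(evaluate())
```

An empty result means the ontology demonstrates its competency with respect to the listed questions; any question returned points to missing terminology or axioms.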

The ENTERPRISE methodology proposes the following stages:

  • identify the purpose and scope of the ontology,
  • build the ontology by capturing knowledge,[19] coding knowledge[20] and reusing appropriate knowledge from existing ontologies,
  • evaluate the ontology,
  • document the ontology.

This methodology proposes a set of techniques, methods and guidelines for each stage. For instance, brainstorming and meetings with domain experts are suggested for knowledge acquisition. It proposes the use of a middle-out approach to produce the conceptual model of the ontology, instead of bottom-up or top-down approaches. In a middle-out approach one begins by conceptualizing and defining the concepts that are more highly connected to other concepts, since these are the most difficult to define correctly and accurately.
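
One way to read "more highly connected" is as the number of relations a concept takes part in. The following Python sketch orders concepts on that basis; it is our own interpretation for illustration, not a procedure prescribed by the ENTERPRISE methodology, and the concepts and relations are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical relations between concepts gathered during knowledge acquisition.
RELATIONS = [
    ("order", "customer"),
    ("order", "product"),
    ("order", "invoice"),
    ("product", "supplier"),
]


def middle_out_order(relations: Iterable[Tuple[str, str]]) -> List[str]:
    """Order concepts by how many relations they take part in, most first."""
    degree = Counter()
    for a, b in relations:
        degree[a] += 1
        degree[b] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(degree, key=lambda concept: (-degree[concept], concept))


print(middle_out_order(RELATIONS))
```

In this toy example "order" would be defined first, and the more general and more specific concepts around it would follow.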

METHONTOLOGY proposes an evolving prototyping life cycle composed of:

  • a series of development oriented activities that correspond to homonymous stages, such as:
    • requirement specification,
    • conceptualization of domain knowledge,
    • formalization of the conceptual model in a formal language,
    • implementation of the formal model,
    • maintenance of implemented ontologies;
  • a series of support activities that are performed along the whole ontology building process, such as:
    • knowledge acquisition,
    • documentation,
    • evaluation,
    • integration of other ontologies;[21] [22]
  • a series of project management activities, such as:
    • planning,
    • control.

Since we have used METHONTOLOGY terminology to describe the activities that compose the ontology building life cycle, the activities that compose METHONTOLOGY have already been described.

This methodology proposes a set of intermediate representations to help conceptualize knowledge [Gómez-Pérez et al. 1996] and a series of criteria to perform evaluation [Gómez-Pérez et al. 1995].

So far, none of these methodologies is mature enough or has a significant user community. Therefore, none has been accepted as a standard. However, these methodologies are the most cited in the literature of the area. References to other ontology building methodologies can be found at http://babage.dia.fi.upm.es/ontoweb/wp1/OntoRoadMap/index.html.

Ontology reuse

Although methodologies for building ontologies from scratch recognize reuse as part of the development process, none really addresses this issue. It is only recognized as a difficult problem to be solved.

There are two kinds of ontology reuse processes: merge and integration.

Merge is the process of building an ontology in one subject reusing two or more different ontologies on that subject [Pinto et al. 1999]. In a merge process source ontologies are unified into a single one, so it usually is difficult to identify regions in the resulting ontology that were taken from the merged ontologies and that were left more or less unchanged.[23] It should be stressed that in a merge process source ontologies are truly different ontologies and not simple revisions, improvements or variations of the same ontology.

Integration is the process of building an ontology in one subject reusing one or more ontologies in different subjects[24] [Pinto et al. 1999]. In an integration process source ontologies are aggregated, combined, and assembled together to form the resulting ontology, possibly after the reused ontologies have undergone some changes, such as extension, specialization or adaptation. In an integration process one can identify in the resulting ontology regions that were taken from the integrated ontologies. Knowledge in those regions was left more or less unchanged.

In Figure 11 we illustrate the differences between the two ontology reuse processes.