
3 The Mapping Schema

As mentioned in the previous section, the source vocabulary is CAT and the target vocabulary is AGROVOC. Mapping means linking an entry in the source vocabulary to an entry in the target vocabulary. An entry in CAT consists of a Chinese term and any English translation(s), along with its relations to other entries. An entry in AGROVOC consists of at least one English or Chinese term and any translations, along with its relations to other entries. We use the terms ‘concept’ and ‘entry’ interchangeably. A term is a lexical representation of a concept. Note that an entry does not necessarily have lexicalizations in both Chinese and English.

Therefore, when mapping between two concepts, both the Chinese and the English lexicalizations must be considered whenever they occur in a given entry. The following example demonstrates why: CAT ‘’/‘Oryza sativa’ was originally mapped to AGROVOC ‘Oryza sativa’. However, upon closer examination, the Chinese lexicalization of ‘Oryza sativa’ in AGROVOC, which is ‘’, appears to be the broader term of the CAT Chinese term. Moreover, a search in AGROVOC for the CAT Chinese term ‘’ shows the English translation as ‘Paddy’. These discrepancies show the weakness of matching on the lexicalization in one language alone and the necessity of cross-checking all lexicalizations in both languages.

The relationships are drawn from the SKOS Mapping Vocabulary Specification[4] (version 2004), which defines the characteristics of each of the following properties:

and the following classes:

We tried to apply the rules to sample data according to the SKOS specifications, but found this difficult for several reasons. First, the SKOS rules assume that the thesauri to be mapped have both been used to index the same set of resources, which is not the case in the current project. Second, the SKOS descriptions of the mapping properties are not well defined, especially for the problems that arise between thesauri in unrelated languages, such as Chinese and English. Third, the heterogeneity of the terminologies themselves complicates the work. We therefore modify the rules so that the mappings between the thesauri can be performed given the particular linguistic and conceptual issues that characterize these terminologies. We also assume that our modifications are applicable to other projects involving the mapping of multilingual thesauri.

3.1 Procedure

Prior to applying the mapping rules, a set of guidelines should be kept in mind:

1. Entries should be mapped irrespective of their status as descriptors or non-descriptors;

2. Mappings should be between entries, not terms;

3. Many to one: several CAT entries may be mapped to the same entry in the target vocabulary;

4. One to many: an entry in CAT could be mapped to one or more entries in the target vocabulary.

Then, the overall mapping process will be performed according to the following sequence:

1. find the exact match at the highest possible level;

2. if there is no exact match, find an approximate equivalent and apply the broad/narrow match;

3. map the corresponding children of each mapped concept;

4. check inheritance of all the non mapped children.
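The sequence above can be sketched as a top-down traversal of the source vocabulary. This is only an illustration under stated assumptions: the source hierarchy is taken to be a tree, the function `find_match` is a hypothetical stand-in for the analyst's judgement in steps 1 and 2 (it returns an exact, broad or narrow match, or nothing), and all concept identifiers are invented for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Match = Tuple[str, str]  # (relation, target concept id); both hypothetical


def map_vocabulary(
    roots: List[str],
    children_of: Dict[str, List[str]],
    find_match: Callable[[str], Optional[Match]],
) -> Tuple[Dict[str, Match], List[str]]:
    """Walk the source tree top-down (assumes a tree, no cycles).

    Returns the explicit mappings and the list of concepts that are
    covered only by inheritance from a mapped ancestor.
    """
    explicit: Dict[str, Match] = {}
    inherited: List[str] = []
    # Each stack item: (concept, whether some ancestor is already mapped).
    stack = [(root, False) for root in reversed(roots)]
    while stack:
        concept, has_mapped_ancestor = stack.pop()
        match = find_match(concept)  # steps 1-2: exact, else broad/narrow
        if match is not None:
            explicit[concept] = match
        elif has_mapped_ancestor:
            inherited.append(concept)  # step 4: no explicit entry needed
        covered = has_mapped_ancestor or match is not None
        # Step 3: continue with the children of the concept.
        for child in reversed(children_of.get(concept, [])):
            stack.append((child, covered))
    return explicit, inherited


# Hypothetical sample data, not taken from CAT or AGROVOC.
children_of = {"crops": ["cereals", "legumes"], "cereals": ["rice"]}
decisions = {
    "crops": ("exactMatch", "T-crops"),
    "rice": ("broadMatch", "T-cereals"),
}
explicit, inherited = map_vocabulary(["crops"], children_of, decisions.get)
```

In this sketch, concepts that have neither a match nor a mapped ancestor are simply left out of both results; they correspond to gaps in the target vocabulary.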

3.2 Inheritance

In cases where a concept in the source vocabulary is mapped to a concept in the target vocabulary and has descendants for which there are no corresponding children in the target vocabulary, we do not map each individual descendant to the target concept via a broad match. Instead, we assume that those descendants are implicitly mapped by inheritance.

Fig. 2: The inheritance mechanism.

For example, CAT ‘’/‘Mathematics’, corresponds to AGROVOC ‘’/‘Mathematics’. Furthermore, the CAT concept has over 200 descendants whereas the AGROVOC concept includes none of them. Instead of mapping each individual descendant to the AGROVOC concept using a broad match mapping rule, inheritance is applied. This means that the mapping file will not include any reference to the descendants of the mapped concept unless those children have corresponding equivalences in the target vocabulary.
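A consumer of the mapping file could recover such implicit mappings by walking up the source hierarchy until an explicitly mapped ancestor is found. The following is a minimal sketch, assuming each source concept has at most one broader concept; the `Mapping` record, the `resolve` function and the concept identifiers are hypothetical and are not part of the mapping file format.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    source: str
    relation: str
    target: str


def resolve(
    concept: str,
    parent_of: Dict[str, str],
    explicit: Dict[str, Mapping],
) -> Optional[Tuple[Mapping, bool]]:
    """Return (mapping, inherited) for a source concept, or None.

    `inherited` is True when the mapping comes from an ancestor rather
    than from an explicit entry for the concept itself.
    """
    current: Optional[str] = concept
    seen = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in seen:
        seen.add(current)
        if current in explicit:
            return explicit[current], current != concept
        current = parent_of.get(current)
    return None


# Hypothetical identifiers standing in for the Mathematics example.
parent_of = {
    "CAT-algebra": "CAT-mathematics",
    "CAT-linear-algebra": "CAT-algebra",
}
explicit = {
    "CAT-mathematics": Mapping(
        "CAT-mathematics", "exactMatch", "AGROVOC-mathematics"
    ),
}
result = resolve("CAT-linear-algebra", parent_of, explicit)
```

Here `result` carries the mapping of the ancestor together with a flag marking it as inherited, so the mapping file itself needs only the single explicit entry.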

3.3 ExactMatch

We consider concepts to be the same if:

even if they have different BT/NT/RT, e.g.

123: CAT ‘’/‘Cereal crops’.
2551: AGROVOC ‘Cereal crops’/‘’.

Therefore the mapping would be: CAT-123 ExactMatch AGROVOC-2551

When a gap occurs in either vocabulary because the corresponding term is missing, the term should be added to the appropriate vocabulary.

3.4 broadMatch and narrowMatch

When a gap occurs in the target vocabulary because the concept does not exist, but there is a concept that is closely related, the broadMatch or narrowMatch property should be used. If the target concept is more general, then broadMatch should be used. If it is more specific, then narrowMatch should be used. See Figure 3.

Fig. 3: The broadMatch and narrowMatch relationships, including inheritance (see below).

By inheritance, c_B is implicitly mapped to a_A as a narrow term, and vice versa.

3.5 partialMatch

SKOS suggests using majorMatch or minorMatch to indicate the mapping relationship between two concepts between which there is some degree of semantic overlap. However, the definitions of major and minor match are imprecise, and in practice, they are difficult to use as specified. So, we redefine one of the rules outlined in the SKOS specifications. SKOS defines a mapping rule called partialMatch but because it is supposed to represent a subsumption relation, we find this to be a misnomer; therefore, we redefine this term to mean the link between any two concepts that have some degree of overlap in their meaning excluding subsumption, which is already covered by the broad and narrow match.

For example, the CAT concept ‘’/‘Economic power’ has been analyzed to be a partial match with the AGROVOC ‘Developed countries’/‘’ because not all developed countries are sub-concepts of ‘Economic power’.

3.6 AND, OR and NOT classes

Because SKOS recognizes that one-to-one correspondences do not always exist between languages and/or terminologies, it introduces the AND, OR, and NOT classes for combining or excluding concepts. ‘AND’ is used to identify a concept formed from the intersection of two or more concepts. ‘OR’ represents the union of the semantics of two or more concepts. ‘NOT’ can be used to create a mapping target from which one or more elements of meaning are excluded.

For example, CAT ‘’/‘Fire control organization’ corresponds to AGROVOC ‘Fire control’/‘’ AND ‘Organization’ AND ‘Public services’/‘’. Note that, as in the case of AGROVOC ‘Organization’, a target concept need not have a Chinese lexicalization in order to be mapped to.

In Chinese, different terms (concepts) are used to lexicalize the grain and the plant. The term ‘’ is used for rice as a grain which can be eaten, whereas ‘’ is used for the plant ‘Oryza sativa’. Normally, however, the distinction between the crop plant and the grain is not made. For instance, the CAT ‘’ is used for both the plant and the grain. In such cases, we use the OR rule to map these terms. Thus, in the case of barley, the CAT term ‘’ is an exact match with AGROVOC ‘Hordeum vulgare’/‘’ OR ‘Barley’/‘’[1].

CAT ‘’/‘Continent or Mainland’ is a geographic term indicating the part of China excluding Hongkong and Macau. We can therefore use the NOT rule to map this term: ‘’ is an exact match with ‘China’ NOT (‘Hongkong’ AND ‘Macau’).
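To make the three combinators concrete, the composite targets in this section can be written as small expression trees and tested against the set of target descriptors assigned to a resource. This is a hedged sketch, not part of SKOS or of the mapping file format: the class names, the `satisfied` function and the reading of NOT as "base but not excluded" are assumptions made for illustration, and only the English labels from the examples are used.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Concept:
    label: str


@dataclass(frozen=True)
class And:
    parts: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple["Expr", ...]


@dataclass(frozen=True)
class Not:
    base: "Expr"      # what is kept
    excluded: "Expr"  # what is removed from it


Expr = Union[Concept, And, Or, Not]


def satisfied(expr: Expr, descriptors: Set[str]) -> bool:
    """Check a composite mapping target against a set of descriptors."""
    if isinstance(expr, Concept):
        return expr.label in descriptors
    if isinstance(expr, And):  # intersection: every part must hold
        return all(satisfied(p, descriptors) for p in expr.parts)
    if isinstance(expr, Or):   # union: any part may hold
        return any(satisfied(p, descriptors) for p in expr.parts)
    if isinstance(expr, Not):  # base holds and the excluded part does not
        return satisfied(expr.base, descriptors) and not satisfied(
            expr.excluded, descriptors
        )
    raise TypeError(f"unsupported expression: {expr!r}")


# The composite targets from the examples in this section.
fire_control_org = And(
    (Concept("Fire control"), Concept("Organization"), Concept("Public services"))
)
barley = Or((Concept("Hordeum vulgare"), Concept("Barley")))
mainland = Not(Concept("China"), And((Concept("Hongkong"), Concept("Macau"))))

print(satisfied(fire_control_org, {"Fire control", "Organization"}))
print(satisfied(barley, {"Barley"}))
```

Keeping the expressions as immutable data rather than as strings means the same structure can be stored in a mapping record, printed, or evaluated.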

3.7 Special cases

During our preliminary test phase we encountered the following issues:

1) CAT concept ‘’/‘Abroad’ - ‘External’ - ‘Overseas’ does not exist in AGROVOC. Therefore it is suggested that it be added to the target vocabulary.

2) Some special cases that may occur within CAT or other multilingual mapping projects are currently unresolved by the described formalism. For example, the French concept ‘tutoiement’ is difficult to represent in a language that does not distinguish between formal and informal uses of the pronoun ‘you’. Feedback or comments on this situation are welcome.

[1] Due to the way that AGROVOC is structured, the distinction between the term and the concept is only approximate. The lexicalizations of barley should all be part of the same entry. Thus, there should be an exact match with a single barley concept that subsumes both the grain and the plant notions.
