2. Some definitions

2.1 XML

XML, the eXtensible Markup Language, is the universal format for structured documents and data on the Web. It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification. It is called extensible because it is not a fixed format like HTML (a single, predefined markup language). Instead, XML is a ‘metalanguage’ (a language for describing other languages), which lets you design your own customized markup languages for many different types of documents. These features make it an attractive standard for exchanging data.

An XML document is a collection of data. In many ways, this makes it no different from any other file. As a "database" format, XML has some advantages. For example, it is self-describing (the mark-up describes the structure and type names of the data, although not the semantics), it is portable (Unicode), and it can describe data as tree or graph structures.
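As an illustration of the self-describing, tree-structured nature of XML, the following minimal sketch parses a small record and prints its element tree. The record and its element names are hypothetical and are not taken from any AGRIS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are illustrative only.
SAMPLE = """<record>
  <title>Rice cultivation in Asia</title>
  <creator>
    <name>Example Author</name>
  </creator>
  <year>2004</year>
</record>"""


def walk(element, depth=0):
    """Return one (depth, tag, text) tuple per element, in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE)
for depth, tag, text in walk(root):
    # The tag names alone reveal the structure; no external schema is needed.
    print("  " * depth + tag + (": " + text if text else ""))
```

The mark-up tells a reader that a record has a title, a creator and a year, but not what those names mean; that interpretation still has to be agreed outside the document.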

Except for unparsed entities, all data in an AGRIS AP XML document is PCDATA (for elements) or CDATA (for attributes) text, even if it represents another data type, such as a date or an integer. Generally, the data transfer software will convert data from text (in the XML document) to other types (in the database) and vice versa.
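The conversion between text and typed values might look like the sketch below. The element name dateIssued and the attribute pages are assumptions made for illustration, as is the ISO 8601 date format.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical fragment: one attribute value and one element value.
SAMPLE = '<record pages="12"><dateIssued>2004-03-15</dateIssued></record>'

root = ET.fromstring(SAMPLE)

# Both values arrive as plain strings, whatever they represent.
raw_pages = root.get("pages")
raw_date = root.findtext("dateIssued")

# XML -> database direction: convert text to typed values.
pages = int(raw_pages)
issued = date.fromisoformat(raw_date)

# Database -> XML direction: convert typed values back to text.
back = ET.Element("record", pages=str(pages))
ET.SubElement(back, "dateIssued").text = issued.isoformat()
```

Conversions of this kind can fail on malformed input, so real transfer software would normally wrap them in error handling.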

XML is a content mark-up meta-language designed to store and display documents on the World Wide Web. By separating content from presentation, XML enables us to create information that can be more easily integrated with other Web resources.

2.2 The Document Type Definition (DTD)

The purpose of a DTD, or document type definition, is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD has several advantages: each of your XML files can carry a description of its own format with it; independent groups of people can agree to use a common DTD for interchanging data; your application can use a standard DTD to verify that the data you receive from the outside world is valid; and you can also use a DTD to verify your own data.
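To make the idea of "a list of legal elements" concrete, the sketch below embeds a small DTD in a document and lists the element types it declares, using the expat parser from the Python standard library. The DTD is hypothetical and is not the AGRIS AP DTD. Note that expat reads the declarations but does not validate the document against them; full validation needs a validating parser.

```python
import xml.parsers.expat

# Hypothetical document that carries its own DTD in an internal subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE record [
  <!ELEMENT record (title, creator+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT creator (#PCDATA)>
]>
<record><title>Example title</title><creator>Example Author</creator></record>"""


def declared_elements(document):
    """Return the element type names declared in the document's DTD."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_element_decl(name, model):
        # Called once per <!ELEMENT ...> declaration; model describes the
        # allowed content and is ignored in this sketch.
        names.append(name)

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return names


legal_elements = declared_elements(DOC)
```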

It is essential that the structure of the XML output documents exactly match the structure expected by the DTD. Mapping the local database schema to an XML DTD schema is the most important exercise that is undertaken in this context.
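A mapping from local database columns to DTD element names can be kept as a simple ordered table, as in the sketch below. The column names, the element names and their order are all hypothetical; in practice they would be taken from the local schema and from the content model of the target DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: (local column name, element name in the DTD).
# The list order is assumed to follow the order the DTD requires.
FIELD_MAP = [
    ("ttl", "title"),
    ("auth", "creator"),
    ("yr", "dateIssued"),
]


def row_to_element(row):
    """Build a <record> element from a dict of local column values."""
    record = ET.Element("record")
    for column, element_name in FIELD_MAP:
        value = row.get(column)
        if value is None:
            continue  # whether an element may be omitted depends on the DTD
        ET.SubElement(record, element_name).text = str(value)
    return record


def follows_expected_order(record):
    """Check that the children appear in the order given by FIELD_MAP."""
    expected = iter(name for _, name in FIELD_MAP)
    # Each tag must be found in the remaining part of the expected sequence.
    return all(child.tag in expected for child in record)
```

Driving the output from the mapping table, rather than from the order of the columns in the database, keeps the element order independent of how the local schema happens to be laid out. The order check is only a rough stand-in for validation against the DTD itself.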

2.3 Namespaces

The W3C XML community defines a mechanism called XML namespaces, which allows a single XML document to contain elements and attributes that are defined for and used by multiple software components. Sharing definitions across software in this way promotes reuse and discourages reinvention. Their definition:

An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. XML namespaces differ from the "namespaces" conventionally used in computing disciplines in that the XML version has an internal structure and is not, mathematically speaking, a set.

In this context, all the newly defined elements in the Agricultural Metadata Element Set (AgMES)³ constitute a namespace. AgMES defines the elements needed to accurately describe various types of information resources in the domain of agriculture. The element set is maintained at a stable location, which serves as a reference point where the elements are defined and maintained for use by different applications.
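The sketch below shows how namespaces keep two elements with the same local name apart. Both namespace URIs and the element name subject are placeholders chosen for illustration; in particular, the second URI is not the actual AgMES namespace URI.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs, assumed for this example only.
LIB_NS = "http://example.org/library/"
AGS_NS = "http://example.org/agmes/"

DOC = f"""<record xmlns:lib="{LIB_NS}" xmlns:ags="{AGS_NS}">
  <lib:subject>Local subject heading</lib:subject>
  <ags:subject>Shared subject term</ags:subject>
</record>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace URI}local-name",
# so the two <subject> elements remain distinct.
tags = [child.tag for child in root]

# Look up one of them through a prefix-to-URI map.
prefixes = {"lib": LIB_NS, "ags": AGS_NS}
shared = root.findtext("ags:subject", namespaces=prefixes)
```

The prefix is only a local abbreviation; it is the URI that identifies the namespace, so two documents may use different prefixes for the same element set.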

2.4 XML and Databases

Today most bibliographic data is stored in relational databases, such as Oracle and SQL Server 2000, and in other database systems that support XML using different approaches. These products allow easy publishing, managing, and sharing of content on corporate intranets and the Web. An important characteristic in this respect is that they are bidirectional: they can be used to transfer data both from XML documents to the database and from the database to XML documents.
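The two directions of transfer can be sketched as follows. SQLite (from the Python standard library) stands in here for the database products named above, and the table and element names are hypothetical; the XML features built into those products work differently and are not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming document.
INCOMING = (
    "<records>"
    "<record><title>First</title></record>"
    "<record><title>Second</title></record>"
    "</records>"
)


def xml_to_db(conn, document):
    """XML -> database: insert one row per <record> element."""
    rows = [(r.findtext("title"),) for r in ET.fromstring(document).iter("record")]
    conn.executemany("INSERT INTO records (title) VALUES (?)", rows)


def db_to_xml(conn):
    """Database -> XML: emit one <record> element per row."""
    root = ET.Element("records")
    for (title,) in conn.execute("SELECT title FROM records ORDER BY id"):
        record = ET.SubElement(root, "record")
        ET.SubElement(record, "title").text = title
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
xml_to_db(conn, INCOMING)
outgoing = db_to_xml(conn)
```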

In this document we focus on XML-enabled databases, that is, on systems that allow data to be exported to XML. Most modern ILMS vendors offer some form of XML functionality in their products. However, it is important to note that a multi-step process makes it possible to extract, convert and generate XML from nearly any type of DBMS, provided that an additional layer is developed to process the relevant subset of records after extraction.
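The extract, convert and generate steps might be separated as in the sketch below. It assumes a database reachable through a Python DB-API style connection (SQLite is used as a stand-in), and the table, columns and selection criterion are hypothetical. The convert function plays the role of the additional layer and would hold whatever clean-up and mapping rules the local data requires.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract(conn, query, params=()):
    """Step 1: pull the relevant subset of records as a list of dicts."""
    cursor = conn.execute(query, params)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def convert(row):
    """Step 2 (additional layer): drop empty values, normalise to text."""
    cleaned = {}
    for name, value in row.items():
        text = "" if value is None else str(value).strip()
        if text:
            cleaned[name] = text
    return cleaned


def generate(rows, root_tag="records", record_tag="record"):
    """Step 3: build the XML document from the converted rows."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, record_tag)
        for name, value in row.items():
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biblio (title TEXT, year INTEGER, notes TEXT)")
conn.executemany(
    "INSERT INTO biblio VALUES (?, ?, ?)",
    [("  Older title ", 1998, None), ("Newer title", 2004, "")],
)

subset = extract(
    conn, "SELECT title, year, notes FROM biblio WHERE year >= ?", (2000,)
)
document = generate([convert(row) for row in subset])
```

Because only the extract step touches the database, the same convert and generate steps could be reused with a different DBMS by changing the connection and the query.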

³ http://www.fao.org/agris/agmes