Caliper - Statistical Classifications in a Linked Open World

Can't find your question? Please let us know!
General questions

Caliper is FAO's platform for the dissemination of statistical classifications. Caliper is a website and a suite of tools, and, perhaps most importantly, a methodology and an approach to the management, dissemination and use of statistical classifications.

One of Caliper's main pillars is that statistical classifications are standards of public use and interest, and should be treated as such. In particular, given the importance of data (we are in a "data age"!), it is crucial that what gives meaning to a piece of data (usually referred to as "dimensions", "classifications" or "codes") is clearly understandable to users, and easily reusable in information systems (interoperability is key).

In December 2024, Caliper migrated to the UN Global Platform. See: https://www.fao.org/statistics/caliper/resources/news/detail/caliper-migrates-to-the-un-global-platform/en


The UN Global Platform (UNGP) is a cloud-service ecosystem to support international collaboration in the development of Official Statistics.  It is an initiative of the UN Committee of Experts on Big Data and Data Science for Official Statistics (UN-CEBD). UNGP is organized into 4 "regional hubs" and 2 "sector hubs". Caliper has joined the "ARIES for SEEA Sector Hub" hosted at the Basque Center for Climate Change (BC3). 

 

Data, and in particular statistical data, informs virtually all aspects of our lives. Data is used as a basis to elaborate and promote public policy, and to develop new or enhanced products and services for the marketplace. It is critical, therefore, that the meaning of the data, as provided by statistical classification systems, is easily accessible, explorable and understandable to a very wide range of human actors, and that it can be queried seamlessly by computer-based applications without risk of corruption or information loss. Caliper addresses these needs.

The migration of Caliper to the UN Global Platform, and the publication of the International Family of Statistical Classifications as Linked Open Data, represent one more important step towards the modernization of official statistics.

 

 

 

 

FAO, UNSD and BC3 have long collaborated towards the modernization of official statistics and, ultimately, towards producing high-quality data that is widely usable and reusable by different users and information systems. The migration of Caliper to the UN Global Platform is one step towards achieving this common goal.

The partnership - initially featuring FAO, UNSD and BC3 - is open to other custodians of international statistical classifications. Get in touch if you wish to participate and would like to discuss your needs and expectations!

 

If you are a data user, you are probably interested in checking out the contents of classifications (e.g., codes or definitions) or their correspondences using our browser, ShowVoc.

If you are a developer, you may be interested in checking out the online query facilities offered by our SPARQL endpoint. We have developed a number of sample queries for your tests.

No. The classifications published in Caliper are maintained and published by dedicated institutions, sometimes in collaboration with FAO. Go to the Classifications section of this website to see all classifications currently included in Caliper, and their custodians.

Because Caliper is part of the FAO web content, the FAO Terms and Conditions apply to it.

The statistical classifications disseminated through this website and services are all in the "public domain" (i.e., materials that are not protected by any intellectual property rights such as copyright, trademark or patent laws). To our knowledge, no specific license is defined or adopted for statistical classifications.

FAO has formalized its policies regarding the licensing and the terms of use of the statistical databases it produces; see the policy document on Open Data Licensing for Statistical Databases and the page on Statistical Database Terms of Use.

Yes. Check out our Documentation page.

Open Data

According to the Open Definition, “Open data and content can be freely used, modified, and shared by anyone for any purpose”. The idea of open data has gained momentum with the rise of the Internet and, more strongly, with open data initiatives promoted by governments and other large institutions. The general understanding is that open data is distributed with an open license, such as the Creative Commons licenses, and expressed in standardized (as opposed to proprietary) machine-readable formats. Moreover, it should be registered in appropriate catalogs so as to facilitate its discovery.

Linked Data

Linked Data is structured data that is interlinked with other data, so as to become more useful. Linked data aims at making data more easily consumable by machines, for example by means of semantic queries. It relies on existing web standards such as the HTTP protocol, the RDF data model, and the notion of global identifiers over the web (URIs). 

All. Caliper supports editing, display of and search for information in all languages, including those with non-Latin script (such as the two FAO official languages Russian and Chinese) and with right-to-left orientation (such as Arabic, another FAO official language). 

No. All services and functionalities offered by Caliper are free of charge to users.

Questions on Classifications and Data Model

The basic data model is RDF

At the most basic level, the data model adopted in Caliper is RDF, a language for the web that expresses single pieces of data as triples, i.e., statements of the form subject - predicate - object. A set of triples forms a graph.

Whereas the relational data model (e.g., that of a database or even of a spreadsheet) organizes data in tables consisting of rows and columns, a graph data structure consists of vertices (aka nodes, or points) connected by edges (aka links or lines). The basic unit of a graph is a node-edge-node triple; the basic units of a table are rows and columns.

To compare tables and graphs, consider the table below, consisting of only 1 line, describing the record with ID = A25. 

ID  | Name | Surname | Age
A25 | Mary | Jones   | 89

That very same line can be expressed in the following 3 triples:

  • A25 "has name" Mary
  • A25 "has surname" Jones
  • A25 "has age" 89

In the RDF framework, "A25" is a subject,  "has name" ("has surname," "has age") is a predicate, and "Mary" ("Jones," "89") is an object.
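As a minimal sketch of the comparison above, the following Python snippet turns the one-row table into subject - predicate - object triples. The helper name `row_to_triples` and the "has ..." wording of the predicates are illustrative assumptions, not part of Caliper or of RDF itself.

```python
# Illustrative sketch: convert one table row into (subject, predicate, object) triples.
# `row_to_triples` is a hypothetical helper, not a Caliper function.

def row_to_triples(row, id_column="ID"):
    """Return one triple per non-ID column, using the ID value as the subject."""
    subject = row[id_column]
    return [
        (subject, f"has {column.lower()}", value)
        for column, value in row.items()
        if column != id_column
    ]

row = {"ID": "A25", "Name": "Mary", "Surname": "Jones", "Age": 89}
triples = row_to_triples(row)

for triple in triples:
    print(triple)
```

Each column other than the ID becomes one predicate, so a row with three such columns yields three triples about the same subject.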

The two things to highlight here are:

1) Because RDF is a language for the web, it requires that the subject be a web resource, i.e., anything that may be unambiguously identified on the web. Hence the subject needs a global identifier, i.e., one that is unique over the web, as opposed to unique within a specific database.

2) The predicate may be defined once and for all in a public vocabulary, expressed in ways that are understandable by machine and people alike, and therefore reusable over the web.  For this purpose, a number of public vocabularies are defined for different domains and applications. In Caliper, we largely use SKOS and XKOS.
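The two points above can be sketched in a few lines of Python that write statements in the N-Triples syntax, one of the standard text formats for RDF. The SKOS namespace is the published one; the `BASE` address and the helper functions are made-up placeholders for illustration and do not reflect the identifiers actually used in Caliper.

```python
# Sketch: statements whose subject and predicate are web-wide identifiers (URIs).
# The SKOS namespace below is the published one; BASE is a made-up placeholder.

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/classification/"  # hypothetical, for illustration only

def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"

def literal(text, lang=None):
    """Format a plain string as an N-Triples literal, optionally language-tagged."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'

def n_triple(subject, predicate, obj):
    """Join the three parts into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."

statements = [
    n_triple(uri(BASE + "0111"), uri(SKOS + "notation"), literal("0111")),
    n_triple(uri(BASE + "0111"), uri(SKOS + "prefLabel"), literal("Wheat", lang="en")),
]
print("\n".join(statements))
```

The subject is a full URI rather than a local ID, and the predicates (`skos:notation`, `skos:prefLabel`) come from a public vocabulary, so any application that knows SKOS can interpret them.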

SKOS provides the construct to express the basic elements of a classification

SKOS stands for Simple Knowledge Organization System. SKOS is a vocabulary for RDF defining the constructs for expressing classification schemes, subject headings, thesauri, taxonomies and the like. We use SKOS to express: hierarchies, classification entries, labels, explanatory notes, definitions, and correspondences. SKOS is a W3C specification.

XKOS provides constructs to express information specific to statistical classifications

XKOS (the Extended Knowledge Organization System) is another public vocabulary for RDF. It defines constructs specific to statistical classifications (such as correspondences, classification levels, etc.).

Other standard vocabularies

Other vocabularies are available for unambiguously expressing pieces of information over the web.

A full account of the modelling adopted in Caliper can be found in the Documentation section of this website.

RDF is a standard model for Web data interchange, originally developed to describe metadata elements. This is very handy to express statistical classifications, and in the following we explain why.

First of all, we need to make clear that the main difference between RDF and the common ways to store data, for example in Excel sheets, CSV files or databases, is the data model. The fundamental RDF data model is based on the notion of triples organized into graphs, while Excel sheets, CSV files and relational databases all adopt a data model based on tables.

By triple, we mean a structure of the type “subject predicate object”, which can express statements like “Classification A <is maintained by> Custodian X” or “Classification A <has title> (its title)”. Any piece of information may be expressed as a triple, and different triples about the same subject form a graph. Instead, when using a tabular model, the same information would be expressed by a row with three fields, one for the classification name, one for the custodian, and one for its title. Many rows of this type form a table.

The first advantage of the triple/graph model over the tabular one concerns the possibility of expressing hierarchies in standard ways. Consider for example the following hierarchy fragment from CPC 2.1:

  • “011 Cereals”
    • “0111 Wheat”
      • “01111 Wheat, seed”

When dealing with tables, that very same piece of hierarchy may be expressed in different ways, as shown in the picture below:

    [Figure: alternative tabular representations of the hierarchy fragment]

    In the RDF world, the same hierarchy would be expressed as follows:

    [Figure: graph of the hierarchy fragment, with arcs labelled by SKOS properties]

    where the labels on the arcs come from SKOS (Simple Knowledge Organization System), a vocabulary for RDF designed to provide the constructs to express hierarchical (or flat) structures like thesauri, glossaries and the like. The advantage of this approach is that data expressed in RDF can be stored in dedicated databases, called triplestores, whose internal organization is transparent to the user. In other words, the user may concentrate on the content of the data, instead of spending time on designing the database schema.
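To make the graph form of the hierarchy concrete, here is a minimal Python sketch that writes the CPC 2.1 fragment as triples using the SKOS property `skos:broader`, and then walks up from a code to its ancestors. It assumes that the group code for Cereals is 011, as in CPC 2.1; the short codes stand in for full URIs, and the `ancestors` helper is a hypothetical illustration, not a Caliper function.

```python
# Sketch: the CPC 2.1 fragment as triples using the SKOS property "broader".
# Codes stand in for full URIs to keep the example short.

BROADER = "skos:broader"
PREF_LABEL = "skos:prefLabel"

triples = [
    ("011", PREF_LABEL, "Cereals"),
    ("0111", PREF_LABEL, "Wheat"),
    ("01111", PREF_LABEL, "Wheat, seed"),
    ("0111", BROADER, "011"),
    ("01111", BROADER, "0111"),
]

def ancestors(code, graph):
    """Follow skos:broader links upwards and return the chain of parent codes."""
    parents = {s: o for s, p, o in graph if p == BROADER}
    chain = []
    while code in parents:
        code = parents[code]
        chain.append(code)
    return chain

print(ancestors("01111", triples))
```

Whatever the depth of the hierarchy, the parent-child relation is always stated with the same property, so the same traversal works for any classification expressed this way.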

    It is important to note that the graph shown above is the actual data model of RDF, not just a convenient picture of it. What can vary is the way the graph is written down: alternative "serializations" of the same graph are possible.

    Another advantage of this approach is that adding new pieces of information to a graph boils down to adding new arcs and nodes to the graph, which can be done without disrupting the entire data repository, whereas adding new properties or fields to a database implies a change to the database schema.
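The point about extending a graph can be sketched with a plain Python set of triples. This is only an illustration of the idea, not of how a triplestore works internally; the note text is a placeholder, not content from any classification.

```python
# Sketch: extending a graph means adding triples; existing ones are untouched.

graph = {
    ("0111", "skos:prefLabel", "Wheat"),
    ("0111", "skos:broader", "011"),
}
before = set(graph)

# A new kind of statement needs no schema change: just add another triple.
graph.add(("0111", "skos:note", "placeholder note text"))

# Every earlier statement is still present, unchanged.
print(before <= graph, len(graph))
```

In a table, the equivalent change would mean adding a new column to the table definition, which affects every row.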


       

      SKOS

      SKOS stands for Simple Knowledge Organization System. "It is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data." (From Wikipedia).
      SKOS defines classes and properties for representing:
      • a "concept scheme" (a set of terms: a classification, a code list, a list of subject headings...)
      • its terms / concepts (labels in different languages, definition, notation, editorial notes...),
      • the relationships between concepts (generic or hierarchical)
      • subsets of concepts (collections)
      It is therefore suitable for representing classifications in a semantic, machine readable way.

      SPARQL

      SPARQL is the query language for RDF. 
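As a hedged illustration of what querying RDF looks like, the sketch below prepares (but does not send) a SPARQL query over HTTP using only the Python standard library. The `ENDPOINT` address is a placeholder, not the address of the Caliper endpoint, and `build_sparql_request` is a hypothetical helper. The query itself assumes the data carries English `skos:prefLabel` values.

```python
# Sketch: prepare (but do not send) a SPARQL query request with the standard library.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # hypothetical; replace with a real endpoint

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""

def build_sparql_request(endpoint, query):
    """Build a GET request that asks for results in the SPARQL JSON format."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:40])
```

Passing the prepared request to `urllib.request.urlopen` against a real endpoint would return the matching concepts and labels; that step is left out here because the address above is only a placeholder.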

      Vocabulary

      In everyday language, a vocabulary is a set of words, possibly used by a group, individual, or work, or in a field of knowledge (see the definitions given by the Merriam-Webster dictionary). Vocabularies are therefore fundamental to shaping people's universe of discourse, and have a special role in the field of information management, especially in the form of controlled vocabularies, i.e., selected lists of words used as "tags" or "classifiers" of information units (numeric or textual data). Because of their role in defining the entities to measure and in codifying data, statistical classifications can be considered special types of vocabularies.
      Vocabularies also play a very important role in information management and in the semantic web. The World Wide Web Consortium (W3C) defines vocabularies in this broad sense: "On the Semantic Web, vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and represent an area of concern. Vocabularies are used to classify the terms that can be used in a particular application, characterize possible relationships, and define possible constraints on using those terms. In practice, vocabularies can be very complex (with several thousands of terms) or very simple (describing one or two concepts only)."
      Moreover, the W3C usefully distinguishes two types of vocabularies:
      1. value vocabularies or sets of controlled values used to categorize and classify things. These are also known as Knowledge Organization Systems (KOS) and include classifications, code lists, thesauri, even certain types of ISO standards that prescribe controlled lists of values;
      2. metadata element sets that prescribe what features or properties should be used to describe things. They are also called schemas, or description vocabularies. Examples are XML Schema and RDF Schema, formal languages to describe entities in XML and RDF respectively. Other examples include ontologies, application profiles, and UML models.
      The statistical classifications that are the focus of Caliper fall under the first type. SKOS, the formal language we use to express statistical classifications in a machine-readable format, is an example of the second type. Specifically, SKOS is a vocabulary for RDF, tailored to express thesauri on the web.

      When I open it with Excel, leading zeroes are missing

      You probably opened the CSV file directly in Excel, which interprets the codes as numbers and drops their leading zeroes. If you opened the file with a text editor, you would see all leading zeroes correctly in place. To be able to see them in Excel too, follow the instructions given by the Office Support website.
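If you process the file with a script rather than with Excel, the same caution applies: keep codes as text. The sketch below uses Python's standard `csv` module on a small in-memory sample; the column names `code` and `label` are assumptions for illustration and may differ from those of an actual download.

```python
# Sketch: reading codes from CSV text while keeping leading zeroes.
import csv
import io

csv_text = 'code,label\n0111,Wheat\n01111,"Wheat, seed"\n'

# The csv module returns every field as a string, so "0111" stays "0111".
rows = list(csv.DictReader(io.StringIO(csv_text)))
codes = [row["code"] for row in rows]
print(codes)

# Converting a code to a number is what drops the zeroes.
print(int(codes[0]))
```

Classification codes are identifiers, not quantities, so they are best treated as strings throughout.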

      When I open it with Excel I see weird characters instead of an Arabic/Chinese/Russian name

      This usually happens because Excel does not automatically detect the UTF-8 encoding of a CSV file that is opened directly.
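One common workaround, if you generate or re-save the file yourself, is to write it as UTF-8 with a byte order mark (BOM), which many versions of Excel use as a hint for the encoding. The sketch below shows this with Python's standard `utf-8-sig` codec; `to_csv_bytes` is a hypothetical helper and the Russian label is an illustrative value, not taken from a Caliper file.

```python
# Sketch: write CSV bytes with a UTF-8 byte order mark (BOM).
import csv
import io

def to_csv_bytes(rows):
    """Serialize rows to CSV, encoded as UTF-8 with a leading BOM."""
    text = io.StringIO(newline="")
    csv.writer(text).writerows(rows)
    return text.getvalue().encode("utf-8-sig")

# The label below is an illustrative value only.
data = to_csv_bytes([["code", "label_ru"], ["0111", "Пшеница"]])
print(data[:3])
```

The first three bytes of the result are the BOM; the rest is ordinary UTF-8, so other tools that expect UTF-8 can still read the file by decoding with `utf-8-sig`.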

      Caliper works well with all formats commonly used to store or pass classifications around, such as CSV, XLS, DB dump, or JSON.

      Yes. The editing tool used in Caliper is VocBench. It is a powerful tool, able to support the editing of both classifications (as RDF vocabularies) and, where applicable, the OWL models (ontologies) they use. VocBench also fully supports editorial workflows, so that some users will only be able to add translations, for example, while others are allowed to approve changes and perform more complicated operations. Therefore, the level of knowledge of RDF required for the maintenance of classifications in VocBench very much depends on your role in the project.