Data is an important intangible good, but official data is not always available. Its production is often constrained by low statistical capacity, insufficient funding for data and statistics, a weak culture of data dissemination and use, and new competitors on the market, all of which create data gaps. These gaps widen in emergency contexts, when access to timely information is critical.
To tackle the crisis of traditional data collection systems, national and international actors need to engage with new data sources and methods, and to find innovative ways of generating information that is relevant for food security, nutrition, and food systems transformation.
In this light, FAO created the “Data Lab for statistical innovation” in 2019 to fill such gaps. The Data Lab aims to improve the timeliness and granularity of data collection and to increase the use of methods and technologies that extract data from unstructured sources, so that more timely information is available to support decision-making processes.
The internet offers a wide range of facts and data sources, but much of this data is heterogeneous and poorly organized. Web scraping involves fetching and extracting data from web pages and turning it into properly organized information. The Data Lab has developed different procedures to gather structured information either from web pages containing statistical data or from social media (Twitter) and media aggregators (Google News).
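As a rough illustration of the extraction step, the sketch below turns an HTML table into a list of records using only the Python standard library. It is not the Data Lab's own procedure: the class TableParser, the function scrape_table and the sample table (commodity names and prices) are hypothetical, and the page content is supplied as an inline string rather than fetched from a real site.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Illustrative page fragment, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>commodity</th><th>price</th></tr>
  <tr><td>maize</td><td>210.5</td></tr>
  <tr><td>rice</td><td>455.0</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
print(records)
```

In practice the HTML would come from an HTTP request, and the cell values would still need type conversion and cleaning before use.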
Text analytics (or text mining) is the process of automatically extracting information from written resources. With the aim of turning text into data for analysis through natural language processing (NLP), it involves lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, sentiment analysis, information extraction, data mining techniques (including link and association analysis), visualization, and predictive analytics. Using free and open source tools, the Data Lab implements the NLP steps that suit the resources used and the specific objectives of the analysis.
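The lexical analysis step mentioned above can be sketched as a simple word frequency count. This is a minimal example under stated assumptions: the tokenizer, the small stopword list and the two sample sentences are all hypothetical, and a real pipeline would typically rely on a dedicated NLP library rather than a regular expression.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "are", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(documents, stopwords=STOPWORDS):
    """Count how often each non-stopword token appears across documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in stopwords)
    return counts


# Made-up sentences standing in for scraped news items.
DOCS = [
    "Maize prices are rising in the region.",
    "Drought is reducing maize yields and rice yields.",
]

freq = word_frequencies(DOCS)
print(freq.most_common(3))
```

The resulting counts are the kind of structured output that later steps, such as tagging or sentiment analysis, can build on.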
Data validation is the process of verifying the quality of the scraped data. It requires a strategy that checks the correctness and meaningfulness of the resulting information against other sources. The Data Lab has access to all FAO data systems and can check the resulting values against the most up-to-date "official sources".
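One simple way to picture such a check is to compare each scraped value with a reference value and flag large relative deviations. The sketch below assumes a 5% tolerance and uses invented figures; the function validate, the threshold and the data are illustrative only and do not describe the Data Lab's actual validation strategy.

```python
def validate(scraped, reference, rel_tol=0.05):
    """Label each scraped value as 'ok', 'mismatch' or 'no_reference'.

    A value is 'ok' when its relative deviation from the reference
    is at most rel_tol (an assumed threshold).
    """
    report = {}
    for key, value in scraped.items():
        ref = reference.get(key)
        if ref is None:
            report[key] = "no_reference"
        elif ref == 0:
            # Relative deviation is undefined for a zero reference.
            report[key] = "ok" if value == 0 else "mismatch"
        elif abs(value - ref) / abs(ref) <= rel_tol:
            report[key] = "ok"
        else:
            report[key] = "mismatch"
    return report


# Invented values for illustration.
scraped = {"maize": 210.5, "rice": 455.0, "wheat": 300.0}
reference = {"maize": 208.0, "rice": 520.0}

report = validate(scraped, reference)
print(report)
```

Values labelled as mismatches or lacking a reference would then be reviewed manually or checked against a further source.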
Formalizing a phenomenon mathematically makes it possible to identify how its different components determine its values. The resulting model can then be used to estimate values of the phenomenon when no observations are available. The Data Lab develops models that, starting from the scraped data and drawing on other sources, produce descriptive statistics and indicators that support the achievement of FAO's main objectives.
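To make the idea of estimating unobserved values concrete, the sketch below fits a straight line by ordinary least squares and uses it to predict a value at a point with no observation. The choice of a one-variable linear model, the function fit_line and the numbers are assumptions made for illustration; the source does not say which models the Data Lab uses.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented observations that follow y = 1 + 2x exactly.
observed_x = [1.0, 2.0, 3.0, 4.0]
observed_y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(observed_x, observed_y)

# Estimate the value at x = 5, where no observation is available.
estimate = intercept + slope * 5.0
print(intercept, slope, estimate)
```

Real indicators would normally involve several explanatory variables and an assessment of the uncertainty around each estimate.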