PC 86/INF/3
Eighty-sixth Session
Rome, 17-21 September 2001
Modernization of FAOSTAT |
The Committee at its last session was informed of the growing difficulties in managing the FAOSTAT working system and also in maintaining the quality of FAO's data collections. The Chairman has requested that further information be made available to the Committee at its current session.
The attached Discussion Paper entitled "Modernization of FAOSTAT" is the basis of a document which will shortly be submitted to a newly established Committee on WAICENT, which will consider FAOSTAT's needs in the overall context of FAO's strategic objective E1: "An integrated information resource base, with current, reliable statistics, information and knowledge made accessible to all FAO clients". It is likely that the next steps will be to go into further depth on the requirements and conceptual design of the proposed replacement system. This would allow detailed estimates of costs to be prepared.
Following completion of this work, the Committee on WAICENT will review the resulting proposal and advise the Director-General on possible courses of action.
The Programme Committee's views on the issues which are raised in the document are invited as input to the development of an appropriate response to the problems raised.
Discussion Paper
August 2001
+ + + + +
Summary
This discussion paper deals with the urgent need to replace the FAOSTAT working system, which is outdated and vulnerable to frequent breakdowns. Modernization is required in order to ensure continued availability of the Organization's statistical data through WAICENT. Moreover, a key function of the FAOSTAT working system is to enable consistency checks and estimation of missing data warranted by the continued quality deficiencies and gaps in data provided by FAO member countries. While addressing the modernization of FAOSTAT, the opportunity would also be taken to enhance the functionality of the working system. This would, in turn, enable data providers within FAO to improve the quality, coverage and consistency of data posted into WAICENT. A full requirement analysis is foreseen as the first stage in this modernization.
+ + + + +
1. The FAOSTAT working system is one of the most important contributing systems to the World Agricultural Information Centre (WAICENT) through which access is given to FAO's basic collection of statistical time series on agriculture, fisheries and forestry.
2. FAOSTAT today: The rapid development of the Internet during the 1990s has brought with it an increased awareness of FAO's wealth of world agricultural statistical data. The visits to the FAO web site, http://www.fao.org, have grown from 219,639 (5,435,963 hits) in August 1999 to 652,935 (17,734,772 hits) in July 2001. Similarly, the visits to the FAOSTAT Dissemination System web page for statistical data, http://apps.fao.org, have followed the same pattern, reaching 162,281 database accesses (hits) and 19,256,541 records downloaded in the month of July, 2001 alone. This demonstrates the progressive increase in demand for comparable inter-country statistical information.
3. It is estimated that about a quarter of all visits to the FAO web pages (covering both "hits" and "downloads") are for retrieving statistical data. In this context, the constant increase in "consultations of the FAOSTAT Dissemination System" over the years, doubling about every 10-15 months, provides a clear sign of the significant global impact made by two of FAO's key normative activities: maintaining statistical databases, and collecting and disseminating information.
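As a rough cross-check of the doubling estimate, the growth in visits to the FAO web site quoted in paragraph 2 can be converted into an implied doubling time. The sketch below assumes steady exponential growth between August 1999 and July 2001 (taken here as 23 months). That assumption and the function name are illustrative only and are not part of the original analysis. Note also that the figures are for the FAO web site as a whole, not for the FAOSTAT Dissemination System itself.

```python
import math


def doubling_time_months(start_value, end_value, months_elapsed):
    """Implied doubling time, assuming constant exponential growth."""
    growth_factor = end_value / start_value
    return months_elapsed * math.log(2) / math.log(growth_factor)


# Visits to http://www.fao.org as quoted in paragraph 2.
visits_aug_1999 = 219_639
visits_jul_2001 = 652_935
months = 23  # August 1999 to July 2001 (assumed interval)

implied = doubling_time_months(visits_aug_1999, visits_jul_2001, months)
print(f"Implied doubling time: {implied:.1f} months")
```

Under these assumptions the implied doubling time is a little under 15 months, which sits at the upper end of the 10-15 month range quoted above.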
4. Furthermore, FAO's statistical products are now well known, both in the scientific and in the communications/media world, and are highly valued. Around 140 subscriptions to "Online FAOSTAT" have been sold (this allows the user bulk data delivery or downloads) and additionally 350-400 FAOSTAT CDs are sold in each release. (In addition, there are the FAO Yearbooks, distributed and sold, which are drawn directly from the database.) This demonstrates the fundamental importance of statistical data to the Organization and its constituents.
5. The origin of FAOSTAT: In the late 1980s, FAO decided, as a part of the development of WAICENT, to merge and modernize several of its systems to improve the quality of the statistical data collected by FAO. These systems had all been performing the same tasks of compiling, processing and validating time series data, used to generate other data and as input for advanced economic analysis. This new umbrella system was called FAOSTAT and is referred to as the Corporate Database for Substantive Statistical Data.
6. The system was implemented between 1990 and 1992. Its initial release, in 1992, addressed the data processing needs of the Statistics Division as well as the Fisheries and Forestry Departments but, over time, the system has been expanded to incorporate other smaller working systems of these and other Departments as well.
7. The underlying system - the "working system" - was designed, following extensive user consultation, to cover both the functionality of the systems then in use, and additional functionality for a new merged system. It was developed using tools and "off-the-shelf" packages that were industry standards at the time (i.e., the early nineties), using a client-server architecture to share the processing between the database server and the user's desktop personal computer. Initially the system was developed using the Ingres database; later it was moved to an Oracle database (which is now the Organization's standard for corporate databases). The software was developed in the C++ programming language, and used package components for the user interface and underlying system that allowed for portability across different operating systems.
8. FAOSTAT has been in use for about a decade. Three major technical problems now affect the working system. The situation has become progressively more critical over the past year.
9. Inherent Stability: When the working system was being designed, users were concerned about the low level of stability of the FAO internal local area network. Consequently, the FAOSTAT working system was designed so as to rely more on the memory and storage capacity of the user's local PC, and less on obtaining data and carrying out processing over the network. However, it is now recognized that the PC environment has certain inherent limits, and these limits affect both the speed and the robustness of the application. This was, and is, a constant frustration to those using the system frequently. System crashes, which occur frequently in the FAOSTAT working system, interrupt a user's work, and result in lost productivity and increased likelihood of errors.
10. The decision to design a system with heavy reliance on the local PC may have inadvertently introduced some of the instability which it was intended to avoid. It is certainly no longer a necessary design feature as the Organization's internal network has become progressively more stable and reliable.
11. Although the FAOSTAT development team has attempted to address stability problems as reported by users over the years, not all of the corrections have been possible given resource constraints. While some of these reported difficulties could be considered enhancements to the initial design, others amount to outright system errors, which the users need to have repaired. Over time, the users have found ways to work around these problem areas. But this ad hoc approach to addressing stability problems within the system is clearly not an efficient use of the Organization's resources for the longer term.
12. Operating systems: The FAOSTAT working system is based on old versions of packaged components and compilers which may no longer be compatible with the latest versions of the Organization's standard operating systems (Windows 2000 and Windows XP). This could complicate the computer environment at FAO resulting in higher costs.
13. System framework and unsupported software components: The original system framework for FAOSTAT was designed based on the requirements defined in 1990. Some of the new features, requested by users, simply do not fit within the FAOSTAT working system's original framework, and would require significant restructuring of the application in order to accommodate them. At the same time, the core off-the-shelf packages used by FAOSTAT are extremely out of date. In fact, the suppliers, who developed these packages, no longer support the versions used by FAOSTAT.
14. These packages have not been upgraded for "cost-benefit" reasons - the benefits of implementing the new versions did not outweigh the time and effort required to perform the upgrades. It is not a simple matter of replacing the older version with the newer version. Rather, significant programming modifications would be required, in all cases, to accommodate the new versions. This is also true of the C++ compiler used by FAOSTAT, which is very out of date; this has an impact on the stability of the system and on the kinds of function which it can perform.
15. It is estimated that updating the existing versions of software and compilers, if possible at all, could cost more than half as much as developing a new working system, without bringing the benefits of modernizing the FAOSTAT framework and incorporating the new user requirements.
16. Even without the technical necessity to review FAOSTAT, as outlined above, there are a number of issues regarding data quality which, alone, justify a review of the FAOSTAT working system at this time.
17. For example, an in-depth analysis reveals that there are some 30 countries worldwide where relevant statistics are missing for five and even for ten years. In value terms, slightly over one-half of world crop production (about 55 percent) is based on official data. When looked at in terms of data cells, a little less than half of the cells are official (around 48 percent) (Graph 1).
18. As regards agricultural trade, while over 60 percent of trade, in value terms, is official, still just about 60 percent of the data cells contain official data (Graph 2). Therefore, a vast amount of data has to be estimated by the working system, some of it by quite sophisticated (trade) matrix computations.
19. The analysis suggests that there is even a negative trend in the reporting of commodity statistics by individual countries; this is shown in Graphs 3 and 4, both for the number of "non-reported cells" and for the "non-reported volume" represented by these cells. Graph 3 shows this for crops, while Graph 4 overlays the livestock data and shows the percentage of official data for livestock numbers and livestock products.
20. In conclusion, important statistical information is frequently incomplete. This is particularly true for livestock production, production of some staple food commodities (roots and tubers in African countries), agricultural inputs and prices. Furthermore, there exist gaps in certain key areas on which data are not collected through questionnaires, e.g. agricultural investment.
21. In the absence of official or semi-official data, gaps have to be filled by FAO estimates in order to provide global/regional assessments. The Organization's FAOSTAT working database system is therefore used to estimate, interpolate and extrapolate, using the full array of algorithms available to it. The calculations made in this way have to be checked for consistency and plausibility using various editing routines, which were written in programming languages that are now outmoded.
22. Even if efforts to enhance the quality of member countries' data supplies (which are urgent) are successful, gaps are expected to continue to exist for quite some time. FAO will therefore need to continue its efforts to fill them. To address these needs, a robust and functionally sophisticated working system database is required.
23. A Working Group on Agricultural Data Series and Derived Products, including members from ES, AG, SD, FI and FO as well as GIL and AFI, has reviewed its members' requirements and considers the existing system obsolete and inadequate.
24. The following are the main areas where corporate users consider that system functionality should be enhanced. These divide into (a) system functions and (b) data-related issues.
25. New analysis tools: the availability of newer hardware to users has given them access to more powerful econometric tools and related software packages to meet their forecasting and modelling needs. Furthermore, demands for trade matrices (trade flows by origin and destination), as well as member countries' data delivery needs (to FAO) have grown.
26. Publication functionality: the dissemination component is also compromised by the structure of the current working system. With a new working system, it would be possible to address the demand for higher quality outputs from the database including XML formats for publication and (drill down) national meta-data.
27. Interpolation/extrapolation: interpolation/extrapolation is another essential feature missing from the current working system. Many countries cannot make available full time series, while there is a slight downward trend overall in the reporting of official statistics. In these circumstances, functionality to in-fill data, i.e. to estimate missing points using appropriate algorithms, is extremely important. Flexibility is also needed in order to accommodate extremely useful data from major surveys that are undertaken less frequently than once a year (this applies particularly to water statistics held in the AQUASTAT database).
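To illustrate what "in-filling" a time series can mean in its simplest form, the sketch below fills interior gaps by linear interpolation between the nearest reported values. This is only one of many possible algorithms and is not a description of the routines used in FAOSTAT; the function name and the sample figures are hypothetical. Gaps at the start or end of the series are deliberately left empty, since filling them would require extrapolation and a separate method.

```python
def fill_gaps_linear(series):
    """Fill interior gaps (None) by linear interpolation.

    Leading and trailing gaps are left as None, because filling them
    would be extrapolation rather than interpolation.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    # Walk over each pair of consecutive reported points.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            # Weighted average of the two surrounding reported values.
            filled[i] = (
                series[left] * (right - i) + series[right] * (i - left)
            ) / span
    return filled


# Hypothetical annual figures (thousand tonnes); None means "not reported".
reported = [None, 120.0, None, None, 150.0, 160.0, None]
estimated = fill_gaps_linear(reported)
print(estimated)
```

In practice, each in-filled value would also need to be flagged as an estimate rather than an official figure, so that users can distinguish the two.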
28. Quality assurance: the FAOSTAT working system needs improved functionality, particularly in the areas of editing and consistency checking - such facilities are commonplace features in current state-of-the-art database management systems. Furthermore, increasing amounts of data are being received in electronic form and hence users need an easy, flexible way of loading those data into the working system, which allows them to edit and verify the values. Equally, real-time editing programmes and consistency checks are needed to increase the efficiency of data entry and improve data quality. Users have many suggestions as to how the system could be extended to better help them meet their goal of improved data quality if the necessary software technology were available. These suggestions would all require modifications to be made to the system's data entry functionality and validation programs, which would expand the flexibility of the working system.
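As an illustration of the kind of consistency check that could run at data entry, the sketch below compares reported production with the product of reported area and reported yield, and flags records where the two disagree by more than a chosen tolerance. The field names, the 5 percent tolerance and the sample records are all assumptions made for this example; they do not describe the checks actually implemented in FAOSTAT.

```python
def check_yield_consistency(records, tolerance=0.05):
    """Return (year, message) pairs for records that fail the check.

    A record is consistent when production is within `tolerance`
    (relative) of area multiplied by yield.
    """
    issues = []
    for rec in records:
        area = rec["area_ha"]
        production = rec["production_t"]
        reported_yield = rec["yield_t_per_ha"]
        if min(area, production, reported_yield) < 0:
            issues.append((rec["year"], "negative value"))
            continue
        expected_production = area * reported_yield
        gap = abs(production - expected_production)
        if gap > tolerance * max(production, expected_production):
            issues.append(
                (rec["year"],
                 f"production {production} vs area x yield {expected_production}")
            )
    return issues


# Hypothetical records for one country and commodity.
records = [
    {"year": 1998, "area_ha": 1000.0, "production_t": 2500.0, "yield_t_per_ha": 2.5},
    {"year": 1999, "area_ha": 1000.0, "production_t": 2500.0, "yield_t_per_ha": 25.0},
    {"year": 2000, "area_ha": 0.0, "production_t": 300.0, "yield_t_per_ha": 2.5},
]
issues = check_yield_consistency(records)
for year, message in issues:
    print(year, message)
```

Here the 1999 record is flagged (a likely misplaced decimal point in the yield) and so is the 2000 record (production reported with no harvested area). Running such checks as the data are entered lets the user correct a value while the source document is still at hand.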
29. Meta data: Users of the working system also make extensive use of information about the data item concerned (metadata) as well as the history of each value that is entered. This information helps them decide on the validity of a figure. Therefore, they need fast, easy access to these historical details for which up-to-date software is required.
30. Devolved data entry: currently, data provided by member countries cannot be loaded directly into the working system. Data quality would be greatly improved if countries could be encouraged to compile and process data online themselves.
31. In the past, this has been attempted through the use of questionnaires, covering a subset of the country's data present in the FAO system. While this has had some success, and has been improved upon with the new Virtual Questionnaire application (running over the web), it has always been felt that a better approach would be to supply the countries with their own downloadable version of the FAOSTAT working system. This would allow the countries to do, not just the data compilation work, but also to take on some of the responsibility for validation and processing of those data as well.
32. Clearly throughout this work, the opportunity exists to rationalize some data questionnaires, to streamline and reduce the cost or administrative burden imposed on data providers.
33. Extending data sets: Re-designing the working system also offers the opportunity to extend the data sets held by FAOSTAT. Many data sets, which currently sit outside FAOSTAT, could be brought into a new revised data model if they were of corporate interest. Illustrative examples could include the following:
Nutrition data: ESN has data sets on nutrients (vitamin A, iron, calcium, etc.) which could be integrated into a new working system. These include data sets on food composition, food contaminants and food additives. If this information were combined with related data sets, FAO would have a powerful resource for undertaking dietary and risk assessments. Dietary intake assessment instruments (e.g. quantitative or semi-quantitative food frequency methodology), using commodities as the defined food list, could be constructed, and used in countries to collect dietary intake data, with associated demographic details.
"Data about commodities" (not just "commodity data") and food composition and nutrient factors could be brought together with existing data and new links made between data sets, e.g., with the Nutrition Division, ESN.
Forestry resources data could be reviewed for incorporation in a new rebuilt working system within FAOSTAT (FONS: Forestry Planning & Statistics Branch).
Agro-meteorological data, which supply GIEWS (Global Information and Early Warning System) with some valuable information, are potential new sources of data sets to be merged into the corporate database. (SDRN: Environment and Natural Resources Service).
ESC's Global Information and Early Warning System maintains useful data on basic foodstuffs which are currently outside the FAOSTAT working system and reside in Excel files. These could be merged with other databases to provide global indicators of food security.
Some of the sets of Fisheries data could be reviewed for inclusion in a new database. (FIDI: Fisheries Information, Data and Statistics Unit).
AQUASTAT, a database developed by AGL (Land and Water Development Division), could be considered for a new working system database structure.
Other groups in the Agriculture Department, who maintain data sets, such as crop nutrient responses, livestock production, farm economics, etc., would need to review their requirements vis-à-vis a modernized FAOSTAT.
34. Clearly, there are many data domains that would gain from a modernised FAOSTAT. Others, beyond the examples given above, could also benefit from incorporation. At this stage, it is essential to produce a full list of candidates in consultation with technical divisions, and then develop a set of criteria for their inclusion. This would form the basis for the design of the new data model.
35. Conversion factors: Beyond expanding the scope of the data model, improvements could also be made in the holding of technical conversion factors used for the processing of the basic production and trade data. These require systematic updating and there is a need to make these more transparent. A new FAOSTAT should be able to capture, store and link this kind of information for users.
36. New parameters: An improved system would increase the flexibility to generate new parameters, e.g. in addition to the "traditional" yield per hectare, food production per cubic metre of fresh water or environmental crop parameters (such as emissions) could be generated.
37. Another related issue is the need to establish a set of unified concepts and definitions for country and commodity groupings across the House. Over time, and with the development of new datasets around the Organization, some of the standardization of groups has slipped. There is a need once again to revisit the groupings, as well as similar (meta) data, so that unnecessary and confusing multiple concepts do not flourish. Unfortunately, there are no quick fixes for these methodological issues (commodity coverage - scope/range and methodologies), which need to be addressed thoroughly as part of the proposed redesign.
38. The discussion paper raises issues in two critical categories: working system stability and working system functionality. The former justifies an overhaul of the FAOSTAT working system; the latter highlights the opportunities for improvement, which such an overhaul also presents.
39. Working System Stability: The working system is based on a number of outdated components and has become less stable over the past 18 months. System crashes are not uncommon. The working system is vulnerable because the programming is no longer well understood by the staff required to maintain it. Since FAOSTAT was introduced, software standards have changed, and key personnel involved in the system's design have left the Organization (see section 2 above).
40. Also, the current system architecture seems to have reached a limit, where even minor changes to the application can result in programmes which are unable to run. Diagnosing problems and correcting them greatly increases the time required to undertake vital maintenance activities (adding reference data, fixing known problems to avoid the need for workarounds, etc.).
41. Another serious difficulty is that the system design itself is a patchwork, the system having "absorbed" functions that are written in older programming languages (e.g. Fortran). These functions are frequently very complex (such as the matrix-solving data evaluation program which is the core of the standardization process of the supply/utilization accounts into food balance sheets) and cannot easily be modified.
42. The result is that the working system has become difficult to maintain, and there is some risk that the overall system could collapse, which would significantly compromise FAO's ability to collect and disseminate statistical data - and to undertake analytical work based on these data.
43. System Functionality: Both the quality and coverage of agricultural statistics received at Headquarters remain less than satisfactory and the evidence is that this situation will continue for some time to come. To address this, new system functionality will help in the collection of data, and in making the most of the data which are submitted (see section 3).
44. Furthermore, requirements are emerging (see section 4 above) for a new data model for FAOSTAT and for a new set of system functionality, more closely resembling current state-of-the-art support for statistical analysis.
45. Conclusion: The FAOSTAT working system was designed at the end of the 1980s, and has been operational throughout the nineties. Over that time period computer applications have been revolutionized by the advent of the Internet and by the development of Java, the primary programming language for modern applications. Java provides all the object-oriented benefits of C++, while resolving most of its pitfalls. In a very short period of time, both of these technologies have moved to the forefront of software development. Furthermore, the increasing use of the very new XML standard for publication and meta data development has important developmental implications for this decade and for databases like FAOSTAT.
46. All users have concerns about FAOSTAT's ability to meet their processing needs. Those responsible for maintaining FAOSTAT have a different set of concerns, relating to the difficulty of maintaining the system and ensuring that user demands are satisfied. These concerns, together with the significant changes in the area of PC applications, suggest that now is the moment to review FAOSTAT, and at the very least to compare the feasibility of patching up the shortcomings outlined above against a complete rewrite of the application.
47. It is intended to initiate work this year to produce the requirement specification and conceptual design of the Working System replacement. This work could be completed by the end of the year and provide detailed estimates of hardware, software and development resources required for the new system according to the agreed requirements and scope of the new working system.
48. Although it is not possible to give concrete estimates, a "ball-park" figure, including both human and equipment costs, can be estimated on the basis of the development of the previous system, basic knowledge of user problems and requirements, and experience with the technology being considered for the new system.
49. It is envisaged that the preparatory work and the new core system could be completed by the end of 2002, with the remaining functionality completed by mid-2003.