Knowledge reference for national forest assessments - information management and data registration
2. A basic Forest Resource Assessment scenario
Production of information involves data collection, processing and reporting (FAO 1999b). Inventory data can be analyzed together with other information in order to investigate relationships between volume and other spatial and statistical variables such as soils, slopes, bio-climatic indices, population density, etc. This approach can be used for filling data gaps for forest types for which no information is available (FAO 1998b).
2.1.1 Data models
A data model helps one to perceive, organize and describe data in a conceptual schema that includes both the data and the operations for manipulating the data set (Tokola et al. 1997). Data models for traditional measurement-based inventory data are well established (visit Modeling for estimation and monitoring), whereas studies of data models for the socio-economic aspects of forests, which may involve interview-based data, are much less available and may include fields such as question categories and interdependencies (Thomson 2000).
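As a minimal sketch of such a conceptual schema, the data (plots and trees) can be paired with an operation on the data set. The field names and the basal-area operation are illustrative assumptions, not a standard inventory data model:

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tree:
    species: str
    dbh_cm: float   # diameter at breast height, in centimetres
    height_m: float

@dataclass
class SamplePlot:
    plot_id: str
    area_ha: float
    trees: List[Tree] = field(default_factory=list)

    def basal_area_per_ha(self) -> float:
        """An operation belonging to the data model: basal area in m2/ha."""
        ba_m2 = sum(math.pi * (t.dbh_cm / 200.0) ** 2 for t in self.trees)
        return ba_m2 / self.area_ha

plot = SamplePlot("P001", 0.05, [Tree("Pinus sylvestris", 30.0, 22.0),
                                 Tree("Picea abies", 25.0, 19.0)])
print(round(plot.basal_area_per_ha(), 2))
```

A comparable schema for interview-based data would instead carry fields such as question category and interdependency links, as noted above.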
2.1.2 Data input
The use of data loggers has greatly enhanced the input and quality control of tree measurements, especially if linked to a central database via mobile communication and Internet access (Kleinn 2002). The database can be permanently updated and checking procedures adjusted immediately and uniformly for all field crews, which improves data quality (see 2.1.4 below). However, the most significant gain is from immediate digital storage of data in the field with periodic data transfer. Input of geospatial data by digitizing or image analysis is covered elsewhere in the knowledge reference.
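The checking procedures pushed uniformly to field data loggers can be as simple as range checks applied at entry time. The following sketch is hypothetical; the variable names and permitted ranges are invented for illustration:

```python
# Hypothetical range checks that a central database might distribute
# identically to all field crews' data loggers.
CHECKS = {
    "dbh_cm": (1.0, 300.0),     # illustrative plausible diameter range
    "height_m": (1.3, 120.0),   # illustrative plausible height range
}

def validate_record(record: dict) -> list:
    """Return a list of warnings; an empty list means the record passes."""
    warnings = []
    for field_name, (lo, hi) in CHECKS.items():
        value = record.get(field_name)
        if value is None:
            warnings.append(f"{field_name}: missing")
        elif not lo <= value <= hi:
            warnings.append(f"{field_name}: {value} outside [{lo}, {hi}]")
    return warnings

print(validate_record({"dbh_cm": 35.0, "height_m": 250.0}))
```

Because the checks live in one table, adjusting them immediately changes behaviour for every crew, which is the quality-control advantage described above.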
Much of the information used in a FRA may reside in existing printed reports. Use of scanners and optical character recognition can facilitate the transfer of such reports into digital media. Audio recording followed by transcription into a word processor has traditionally been used to record interviews, but advances in speech technology now permit direct speech input to the word processor.
2.1.3 Computer programs for data and information management
Relational databases and Geographic Information Systems (GIS) are the primary software products used in information management (Tokola et al. 1997). However, data may be moved between these programs and other systems such as spreadsheets, statistical programs and models during the analysis and reporting phases of a FRA, and word processors may also play a significant role, especially in relation to interview data as described above. Word processors can also include dynamic links to spreadsheets and databases to automate reporting. Data mining tools, facilitated by standardization, may also be used (Miles 2001). Customized user interfaces may be developed to enhance the operations of these programs. Varjo et al. (2000) describe the interaction of a database and spreadsheet program for graphics, and models and GIS are described elsewhere in the knowledge reference. Predefined report layouts (visit Reporting and communication) are used with such programs (Tokola et al. 1997).
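A common instance of such data movement is exporting a relational table to a delimited file that a spreadsheet or statistical package can open. This sketch uses an in-memory SQLite table with invented example values:

```python
import csv
import io
import sqlite3

# Illustrative relational table of forest areas (values are invented).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE forest (region TEXT, area_kha REAL)")
con.executemany("INSERT INTO forest VALUES (?, ?)",
                [("North", 120.5), ("South", 87.2)])

# Export the query result as CSV, the typical exchange format between
# databases, spreadsheets and statistical programs.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["region", "area_kha"])
writer.writerows(con.execute("SELECT region, area_kha FROM forest"))
print(buf.getvalue().strip())
```

In practice the same round trip runs in the other direction as well, with analysis results returning to the database before report layouts are applied.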
2.1.4 Standards, metadata and data quality
Many of the difficulties described in Current status of information management in national FRAs can be categorised as issues of poor data quality. For "traditional" types of data such as area, growing stock, increment and fellings, quality is often satisfactory, while for some of the more recently introduced parameters such as forest condition, protection status and provision of non-wood goods and services, problems with quality are more likely (ECE 2000).
Quality and currency can be achieved through data registration, with the responsibility for keeping data up-to-date assigned, as far as possible, to those with corresponding responsibilities in the field (Swedish National Road Administration 2003), although lack of local expertise may be an obstacle to this process (EC/FAO Partnership Programme 2000). Data quality has a human element: errors at entry, which can be minimised through use of technologies such as data loggers (see 2.1.2 above); errors in reporting as a result of overwork (Michalek 2002: 30); and biases or misinterpretation of questions (ECE 2000: 227). However, this section will focus on computational issues in which quality assurance can be achieved through methods such as use of appropriate standards, metadata, validation and verification, backups and archiving. These methods not only improve data quality, but also facilitate comparability among assessments.
Standards can be employed in three main arenas: content, classification and technology. Standards referring to structure, transmission and meaning of information are known as content standards, and define how to store and share information unambiguously (Richards and Reynolds 1999). Standards apply not only to data input: appraisal of data sources helps identify information quality for inclusion in assessments. Standardized criteria for objective evaluations of information source reliability are required, but are not easily developed or carried out (FAO 1998b). Quality assurance of data supply can be conducted within the framework of a standard such as ISO 9000.
The systems of nomenclature applied in National FRAs are characterized by tradition and by national information needs and are not standardized internationally. Even identically named attributes may mask different concepts and definitions (ECE 2000). Differences in definitions and measurement rules can be made compatible in two different ways: standardization and harmonization. Harmonization is covered in section 2.2.2 below (Information aggregation and integration), but the distinction will be clarified here: "Harmonization relates to attributes that are already defined in different ways at the national level. The harmonization process seeks for a common agreement on how data can be converted to meet a harmonized definition, which is often the union of similarities of existing definitions, and does not necessarily eliminate all inconsistencies. Standardization introduces a new, common definition or standard that is applied in all national programmes. The standard eliminates all inconsistencies but can be quite different from individual, national approaches. Standardization can be seen as the process necessary for definitions of attributes that are not yet assessed but have to be introduced in national programmes. Harmonization is related to using already existing national systems of definitions but endeavouring to bring the definitions into alignment through incorporating 'adjustments' for the known differences." (ECE 2000: 27).
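The harmonization process described in the quotation amounts to maintaining agreed conversion tables from each national classification to the common definition. The country names and class labels below are invented purely for illustration:

```python
# Hypothetical conversion tables mapping two national forest-cover
# classifications to a common harmonized class set. All names are
# invented; real tables would come from the agreed harmonization.
NATIONAL_TO_HARMONIZED = {
    "country_a": {
        "high forest": "forest",
        "coppice": "forest",
        "scrub": "other wooded land",
    },
    "country_b": {
        "productive forest": "forest",
        "unproductive forest": "other wooded land",
    },
}

def harmonize(country: str, national_class: str) -> str:
    """Convert a national class to the harmonized definition, flagging
    classes the conversion table does not cover (residual inconsistency)."""
    return NATIONAL_TO_HARMONIZED[country].get(national_class, "unclassified")

print(harmonize("country_a", "coppice"))
print(harmonize("country_b", "wetland"))
```

The "unclassified" fall-through reflects the point above that harmonization does not necessarily eliminate all inconsistencies, whereas standardization would replace the national classes outright.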
Standards can include Dictionaries, Definitions (Lund 1998), Ontologies and semantic webs, Nomenclature, Thesauri and Gazetteers. Agreement on common classifications and definitions involves compromises (FAO 1998a): for example, the threshold of 40% crown cover to distinguish closed from open forest is frequently debated. During Kotka III, one non-governmental organization recommended a threshold of 70% crown cover for defining closed forests. There is no single classification system that will serve and satisfy all needs. What is essential is that the classification criteria are clear and can be applied objectively.
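The essential requirement stated above, that classification criteria be explicit and objectively applicable, can be illustrated with the crown-cover threshold. The function below is a sketch; only the 40% threshold (and the debated 70% alternative) comes from the text:

```python
def classify_cover(crown_cover_pct: float, closed_threshold: float = 40.0) -> str:
    """Classify forest as closed or open using an explicit crown-cover
    threshold. The 40% default follows the text; the threshold is a
    parameter because, e.g., 70% has also been proposed."""
    return "closed forest" if crown_cover_pct >= closed_threshold else "open forest"

print(classify_cover(55.0))                       # under the 40% criterion
print(classify_cover(55.0, closed_threshold=70))  # under the stricter criterion
```

The same stand can fall in different classes under different thresholds, which is exactly why the chosen criterion must be stated explicitly alongside the results.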
Metadata and Metainformation:
A metadata standard is a common set of terms and definitions that describe data, outlining the characteristic properties to be recorded as well as the values the properties should have. It provides a way for data users to know: what data are available; whether the data meet their specific needs; where to find the data; and how to access the data (CGDI 2001a). Metadata covers the who, what, where, when, why and how of every facet of the data or service being documented, such as details regarding the data's ownership, quality, time of collection, update or transformation, and attribute information. It can include details of accuracy, reliability, precision and significance of the data, as discussed elsewhere in the knowledge reference. Metadata standards are especially well developed for geospatial data. Metainformation, on the other hand, is data about any form of information resource, including organizations, people, documents and services as well as datasets (Richards and Reynolds 1999).
The Global Forest Information System (GFIS) developed by the International Union of Forest Research Organizations (IUFRO) is a distributed network of metadatabases that catalogue the information resources of contributing GFIS partners. GFIS functions by providing a standardised core of metadata (catalogue) fields, a standardised set of key words on which to search and a standardised interface between web sites and the databases, enabling functioning in an interoperable environment (Päivinen et al. 2000). Distributed networks and interoperability are discussed in section 3.1.1 below. The Dublin Core, used by the European Forest Information System (Schuck 2002), consists of 15 basic elements: title, creator, subject and key words, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage and rights. Dublin Core metadata is about semantics: what one is trying to say about resources. Notes and comments without which forest resource information could not be properly interpreted (Varjo et al. 2000) may be regarded as an informal form of metadata.
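A Dublin Core record can be sketched as a simple mapping over the 15 elements listed above. The element names follow the text; every value (titles, identifiers, dates) is a hypothetical example, not a real record:

```python
# Sketch of a Dublin Core metadata record for a hypothetical forest
# inventory report. Keys are the 15 basic elements; values are invented.
record = {
    "title": "National Forest Inventory Report",
    "creator": "National Forest Service",
    "subject": "forest resources; growing stock",
    "description": "Summary statistics from a national field inventory.",
    "publisher": "National Forest Service",
    "contributor": "Field inventory crews",
    "date": "2000",
    "type": "Text",
    "format": "application/pdf",
    "identifier": "nfi-2000-report",          # hypothetical identifier
    "source": "national field inventory database",
    "language": "en",
    "relation": "nfi-1995-report",            # hypothetical earlier report
    "coverage": "national",
    "rights": "public",
}
print(len(record))
```

Because the element set is fixed and shared, catalogues built on it can be searched and exchanged across systems, which is the interoperability GFIS and similar networks rely on.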
Verification and validation:
In the FAO FRA 2000 process, one principle was that information should be verified, meaning that all data items should have a source reference, and the data processing should be transparent; countries then validate and approve information concerning their country before publication (FAO 1999b). It requires much manual work to maintain and update data (Varjo et al. 2000), and the process of gathering, verifying and cross-checking information competes for the time of agencies that are also involved in other key processes (Bureau of Rural Sciences 1998).
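The verification principle, that every data item carries a source reference, can be checked mechanically before a dataset is passed on for national validation. The field names and values in this sketch are illustrative assumptions:

```python
# Minimal sketch of a verification check: flag data items lacking a
# source reference. Parameter names and values are invented examples.
items = [
    {"parameter": "forest area", "value": 120.5, "source": "national inventory report"},
    {"parameter": "growing stock", "value": 88.0, "source": None},
]

unverified = [item["parameter"] for item in items if not item.get("source")]
print(unverified)
```

Automating this first pass does not remove the manual work of cross-checking noted above, but it makes the transparency requirement easy to enforce uniformly.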
Backups and archiving:
The organization of existing information is a task that may be tedious and lack glamour, but can often save much time and money (Janz and Persson 2002). This organization is facilitated by good archiving and retrieval systems, and may be hindered by lack of safe backup procedures.