6. How does the NFA verify data quality?

6.1 Approaches to quality control

Forest use can be socially very complex, especially when it involves multiple users with variable characteristics. The challenge for the design of the NFA interview component is to gather quality data that capture this variability at the national level. This section provides advice on how FAO can assess the quality of the results from the interview component of the NFA. To control the quality of NFA data, each NFA team should, during the fieldwork phase, test its data for the degree of (1) representativity, (2) reliability, and (3) validity.

6.1.1 Test of representativity

The technical team should periodically evaluate the degree of representativity of the sample of interviewees at the strata level. Such an evaluation could include a series of simple statistical analyses. Moving averages and F-tests can be used to assess whether the complexities of the country's forest use are captured by the measurements in the sampled sites. To be meaningful, such tests would have to be carried out continuously by the technical unit. As consultants enter information from interviews into the database, the technical unit should test how the new data affect the aggregate variance and moving averages of key variables at the different regional strata levels. Such a processing method would enable the teams to correct for possible under-sampling or biased sampling in the earlier sites of the project by increasing the sampling intensity of interviewees in later sites. Such tests provide a quality control that is independent of the tests that the field teams should perform when they apply the adaptive sampling techniques described above.
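As a minimal sketch of how such continuous monitoring might be implemented, the fragment below computes a moving average of a key variable and applies an F-test for a shift in variance between earlier and later sites. The variable values, window size, and significance level are hypothetical choices for illustration, not part of the NFA specification.

```python
import numpy as np
from scipy import stats

def moving_average(values, window=10):
    # Moving average over the most recent `window` observations.
    values = np.asarray(values, dtype=float)
    return values[-window:].mean()

def variance_f_test(earlier, later):
    # Two-sided F-test for equality of variances between an earlier
    # and a later batch of observations from the same stratum.
    earlier = np.asarray(earlier, dtype=float)
    later = np.asarray(later, dtype=float)
    f = earlier.var(ddof=1) / later.var(ddof=1)
    dfn, dfd = len(earlier) - 1, len(later) - 1
    p = 2 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd))
    return f, p

# Hypothetical values of one key variable, in the order sites were sampled.
earlier_sites = [12.1, 9.8, 14.3, 11.0, 10.5, 13.2]
later_sites = [22.7, 8.1, 19.4, 5.3, 25.0, 9.9]

print(f"moving average (last 5): {moving_average(later_sites, window=5):.2f}")
f, p = variance_f_test(earlier_sites, later_sites)
if p < 0.05:  # 5% significance level, an illustrative choice
    print(f"variance shift detected (F={f:.2f}, p={p:.3f}); "
          "consider adjusting sampling intensity in remaining sites")
```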

6.1.2 Reliability tests

Reliability means that applying the same measurement procedure in the same way will always give the same result (King et al, 1993: p. 25). Zeller and Carmines (1980) provide the following explanation of the concept: "If a well-anchored rifle is fired but the shots are widely scattered about a target, the rifle is unreliable" (idem, p. 48). More formally, reliability is concerned with the degree to which measurements are repeatable and consistent (Nunnally, 1965).

There is no such thing as a perfectly reliable measurement in the social sciences. Because one seldom deals with direct measurements but rather with indirect estimates of variables, there is always some degree of human-induced error in measurements of social phenomena. These measurement errors can be either random or systematic in character. Because data that are subject to different kinds of measurement problems can be difficult to interpret, it is all the more important to verify the reliability of the measurements (Gujarati, 1995). To this end, social science research has produced a variety of testing procedures.

One test of reliability that surveyors can use is to try to reproduce the same measures, either with the same researcher using a different set of measurement methods or with a different researcher using the same methods. A more formal test of reliability can be carried out by examining the two main characteristics of the reliability concept: stability (whether a measurement is repeatable) and equivalence (whether a measurement is consistent).

Stability is assessed by analyzing the same measure for the same population at more than one point in time. This test is often referred to as the test-retest correlation test. In such an analysis one correlates the same measurements at different points in time. If the second measurement yields exactly the same results as the first, there is a perfect correlation of 1.00. Since the correlation score is likely to be less than a perfect 1.00, most social science researchers consider a measure reliable if the test-retest correlation score is greater than r=0.70 (Litwin, 1995).
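As an illustration of this test, the sketch below correlates two hypothetical rounds of answers from the same respondents and applies the r > 0.70 rule of thumb; the response values are invented for the example.

```python
# Illustrative test-retest reliability check: the same respondents answered
# the same question at two points in time (paired observations).
from scipy.stats import pearsonr

test = [3, 5, 4, 2, 5, 1, 4, 3, 2, 5]    # first round
retest = [3, 4, 4, 2, 5, 2, 4, 3, 1, 5]  # second round, same respondents

r, _ = pearsonr(test, retest)
# Following Litwin (1995), treat r > 0.70 as acceptable reliability.
verdict = "reliable" if r > 0.70 else "review the measure"
print(f"test-retest r = {r:.2f}: {verdict}")
```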

Equivalence is tested by measuring multiple indicators of the target concept at a single point in time, using the split-half method. This test is done by first dividing the interview variables (that measure the same concept) into two halves and then correlating the values of the two halves. The higher the correlation score, the higher the reliability (Stanley, 1971).1 Of course, to use the split-half method, one must have an instrument that takes multiple measures of the same concept.
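A minimal sketch of the split-half procedure follows, using invented item scores. The odd/even split and the Spearman-Brown adjustment (a standard correction that scales the half-test correlation up to full-test length) are common choices, not NFA requirements.

```python
# Illustrative split-half reliability check for a set of interview items
# that are all meant to measure the same concept. Scores are hypothetical.
import numpy as np
from scipy.stats import pearsonr

# rows = respondents, columns = items measuring one concept
items = np.array([
    [4, 5, 4, 5, 3, 4],
    [2, 1, 2, 2, 1, 2],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 2],
    [1, 2, 1, 1, 2, 1],
])

# Split the items into two halves (here: odd vs. even columns), sum each
# half per respondent, then correlate the two half-scores.
half_a = items[:, 0::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)
r_half, _ = pearsonr(half_a, half_b)

# Spearman-Brown formula: adjusts the half-test correlation up to the
# reliability of the full-length instrument.
r_full = 2 * r_half / (1 + r_half)
print(f"split-half r = {r_half:.2f}, Spearman-Brown adjusted = {r_full:.2f}")
```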

Just as the test of representativity should be carried out continuously throughout the information-gathering phase, the technical unit should also run periodic reliability tests as field consultants add new data. Such a practice will work as an early warning system should there be detectable problems with the reliability of reported information.

6.1.3 Validity tests

Using the same rifle analogy that was used to explain reliability, Zeller and Carmines (1980) offer the following illustration of validity: "If the shots from a well-anchored rifle hit exactly the same location but not the proper target, the targeting of the rifle is consistent (and hence reliable) but it did not hit the location it was supposed to (and hence it is not valid)" (Zeller and Carmines, 1980: p. 77). The illustration shows how it is possible to have a set of indicators that are perfectly reliable, but because they are plagued by a systematic error or bias, they do not represent the concept that we want to measure.

The interview component of the NFA is concerned with measuring a series of abstract and complex concepts. Since these concepts are not directly observable, proxy variables must be used. Precisely because proxy measures are imperfect, it is important to assess how well they reflect the concepts that the interviews seek to measure.

In contrast to the testing of reliability, there are no mechanical blueprint tests that can be applied to test validity (Litwin, 1995; Zeller and Carmines, 1980). Despite these difficulties, social science research views validity tests as a crucial step. It has become a norm in social science research to document and describe the validation process in the presentation of the results (King et al, 1993; Fink, 1995; Litwin, 1995).

Bohrnstedt (1970) suggests that "because of the fallibility of any single set of measures, we need to validate our measure of X by several independent measures, all of which supposedly measure X" (1970: p. 95). Depending on the most likely source of non-random error, the approach and evaluation criteria chosen are likely to vary with the particular objective of the study.

For the NFA it is recommended that validation be carried out by an independent team of experts contracted to compare the NFA survey results with their own in-depth measurements. Such an expert team would carry out its measurements in a selected number of sites, as it would not be necessary to revisit all sites to get an idea of the validity of the original measurements (Litwin, 1995; Fink, 1995). This is probably the most rigorous kind of validity test, as it (a) assesses the level of uncertainty for all variables measured; (b) complements the interview data with in-depth information on issues of central importance for policy, e.g. the role of forestry in alleviating rural poverty, the importance of forestry in efforts to improve food security, etc.; and (c) strengthens the accountability mechanism between the technical unit and the consultants.

The case study protocol developed by the International Forest Resources and Institutions (IFRI) Research Program2 provides an excellent guide for an independent validation of the NFA interview component (Ostrom and Wertime, 1993; Gibson et al, 2000). Following the guidelines of the IFRI program, the selected case study sites could be established as permanent reference sites to be revisited every 5-10 years, to study the effects of changing policy and market conditions on local patterns of tree and forest use. Having these sites as reference cases will help the NFA personnel to interpret the results of the study, put them into concrete contexts, provide reliable data on public policy impacts in specific locations, and reveal the practical implications of the results for particular forest user groups in society.

The validity of the interviews may be further strengthened by carrying out a survey pretest (Fowler, 1988). As a complement to the above test, it is advisable to pretest the survey that is to be applied. The pretest gives the surveyor a chance to identify questions that are not effective and that seem to generate a large proportion of no-opinion responses.3 By analyzing the responses to the pretest, one can identify the potentially problematic questions and proceed to modify lengthy, emotionally loaded, confusing, and suggestive wordings (Parten, 1950; Wilkin et al, 1992; Patton, 2001).
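As a simple sketch of such a pretest screen, the fragment below flags questions whose share of no-opinion answers exceeds a cut-off; the question labels, answers, and the 20% threshold are invented for illustration.

```python
# Illustrative pretest screening: flag questions where the share of
# "no opinion" answers exceeds a chosen threshold.
responses = {
    "Q1: main forest products collected": ["timber", "fuelwood", "no opinion"],
    "Q2: perceived change in forest cover": ["no opinion", "no opinion", "less"],
    "Q3: household use of non-wood products": ["daily", "weekly", "monthly"],
}

THRESHOLD = 0.20  # hypothetical cut-off for "a large proportion"
for question, answers in responses.items():
    share = answers.count("no opinion") / len(answers)
    if share > THRESHOLD:
        print(f"Review wording of {question!r}: "
              f"{share:.0%} no-opinion responses")
```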

The validation tests may very well show that the methods used in the NFA are both reliable and valid, but until such tests are actually carried out, the quality of NFA information will remain unknown. The three types of tests mentioned above will make it possible to define the level of uncertainty that is associated with the NFA findings. Presenting the results of such a testing procedure is likely to improve the NFA information users' perception of the quality of the NFA products.


1The split-half method has been criticized because the results of the test can be manipulated by the analyst through the way in which the indicators are divided. As a response to this critique, several other techniques have been developed to test the consistency aspects of reliability of measurements, such as Cronbach's alpha, principal component and factor analyses (e.g. see Armor, 1974; Novick and Lewis, 1967; Cronbach, 1951).
2 Visit IFRI's home page for more information on the program and its methods: www.indiana.edu/~ifri
3 If a very large proportion of responses are no-opinion responses, one should suspect the validity of the survey instruments and/or the methods used.