
4. APPROACH AND METHOD ASSESSMENT

Forest use can be socially complex, especially when it involves multiple users with variable characteristics. These users may vary with regard to (i) levels of dependence on the resource; (ii) legal user rights; (iii) commercial interests; (iv) interests in conserving the resource; (v) perceptions of costs and benefits associated with the future resource supply; (vi) non-consumptive preferences and existence values; (vii) knowledge about processing and product transformations; (viii) access to physical capital; and (ix) access to markets, etc. These attributes of forest use tend to vary according to the locality's bio-physical characteristics,5 the social, cultural and historical attributes of the community of users,6 and the rules and norms observed by each user.7

The challenge in designing the interview component of the FAO-NFA is to create an information-collection approach that can accurately capture this variability at the national level. This document provides advice on how FAO can assess the quality of the results from the interview component of the NFA. I suggest using three different criteria when assessing the quality of results: reliability, validity, and uncertainty estimates. This section defines these concepts and discusses their relevance for assessing the quality of NFA results.

4.1. RELIABILITY

Reliability means that applying the same measurement procedure in the same way will always yield the same result (King et al, 1993: p. 25). As Zeller and Carmines (1980) put it: "If a well-anchored rifle is fired but the shots are widely scattered about a target, the rifle is unreliable" (p. 48). In more formal terms, reliability is concerned with the degree to which measurements are repeatable and consistent (Nunnally, 1965).

There is no such thing as a perfectly reliable measurement in the social sciences. Because social scientists seldom deal with direct measurements but rather with indirect estimates of variables, some degree of human-induced error is always involved in measurements of social phenomena. These measurement errors can be either random or systematic in character. Because data liable to different kinds of measurement problems can be difficult to interpret, it is all the more important to be able to verify the reliability of the measurements. To this end, social science research has produced a variety of reliability-testing procedures.

One test of reliability that can be used by surveyors is to try to reproduce the same measures, either by the same researcher using a different set of measurement methods or by a different researcher using the same methods. The latter requires the people responsible for the original study to document their methods of inquiry in sufficient detail that other analysts can duplicate the data and trace the logic by which the original study reached its conclusions (King et al, 1993). A more formal test of reliability examines the two main characteristics of the reliability concept: stability (whether a measurement is repeatable) and equivalence (whether a measurement is consistent).

Stability is assessed by analyzing the same measure for the same population at more than one point in time, a procedure often referred to as the test-retest correlation test. In such an analysis, one correlates the same measurements taken at different points in time. If the second measurement yields exactly the same results as the first, there is a perfect correlation of 1.00. Generally, however, the correlation will be less than perfect because of the instability of measurements taken at different points in time. For example, a person may respond differently to a set of questions about his use of different forest products from one time to another because he is temporarily distracted, misunderstands the meaning of an item, feels uncomfortable (possibly because someone else is present), or simply because he has recently been visited twice by the same surveyor (Bohrstedt, 1970; Parten, 1950).

If the correlation score is likely to be less than a perfect 1.00, what is the threshold for an acceptable level of reliability? Drawing on a review of survey research in the health sector in the United States, Mark Litwin finds that most researchers consider a measure reliable if its test-retest correlation score is equal to or greater than r = 0.70 (Litwin, 1995). This threshold value is nonetheless arbitrary and should be justified according to the specific characteristics of the data.
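To make the test-retest idea concrete, the correlation can be computed directly from two waves of responses. The sketch below uses entirely hypothetical data (invented respondent values and a hand-rolled Pearson formula, not part of any NFA instrument) to illustrate the r = 0.70 rule of thumb:

```python
# Illustrative test-retest reliability check with hypothetical data.
# Each list holds the same respondents' answers at two points in time,
# e.g. "days per month spent collecting forest products".

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

wave_1 = [12, 5, 20, 8, 15, 3, 18, 10]   # first visit
wave_2 = [11, 6, 19, 9, 13, 4, 17, 12]   # repeat visit some weeks later

r = pearson_r(wave_1, wave_2)
print(f"test-retest r = {r:.2f}")
print("reliable by the r >= 0.70 rule of thumb" if r >= 0.70 else "below threshold")
```

With responses this consistent across visits the correlation lands well above the 0.70 threshold; real field data, subject to the distractions described above, would typically score lower.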

Equivalence, the second key attribute of reliable measurements, is tested by measuring multiple indicators of the target concept at a single point in time. Each indicator is considered a separate but equivalent measure of the underlying concept (Zeller and Carmines, 1980). For example, if we are interested in a reliable measure of the degree of livelihood dependence on certain forest products among a group of people in a certain locality, we can construct several indicators of forest-related livelihood dependency, such as the amount of time family members spend in the forest, the proportion of food that originates from the forest, the proportion of family income related to forest activities, and the quantity and variety of forest products harvested regularly by the family. Comparing these different measures of livelihood dependency tells us to what extent the different indicators yield consistent estimates of dependency. To test equivalence more formally, social science researchers often use the split-half method: all the indicator measures are divided into two halves, and the values of the two halves are then correlated. The higher the correlation score, the higher the reliability (Stanley, 1971).8 Naturally, to be able to use the split-half method, one must have an instrument that takes multiple measures of the same concept.
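The split-half procedure can likewise be sketched in a few lines. The indicator scores below are invented for illustration, and the final Spearman-Brown step, which projects the half-test correlation up to a full-test reliability estimate, is a standard companion to the split-half method rather than anything prescribed by the NFA:

```python
# Illustrative split-half reliability test with hypothetical data.
# Rows: respondents; columns: four indicators of forest-livelihood
# dependence, each rescaled to a 0-10 score.
scores = [
    [8, 7, 9, 8],
    [3, 4, 2, 3],
    [6, 5, 7, 6],
    [9, 9, 8, 10],
    [2, 3, 1, 2],
    [5, 6, 5, 4],
]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Split the instrument: indicators 1-2 vs. indicators 3-4, summed per respondent.
first_half = [row[0] + row[1] for row in scores]
second_half = [row[2] + row[3] for row in scores]

r = pearson_r(first_half, second_half)
# Spearman-Brown correction: estimated reliability of the full instrument.
reliability = 2 * r / (1 + r)
print(f"split-half r = {r:.2f}, Spearman-Brown reliability = {reliability:.2f}")
```

How the indicators are split between the two halves can change the result, which is the critique of this method noted in footnote 8.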

From a statistical standpoint, the focus of attention in reliability assessments is on random error. If no random error is involved in the measurement of a variable, there is perfect reliability (Blalock, 1972). Consequently, as we increase the sample size, reliability also increases, because the random errors of each measurement cancel each other out over repeated samples. However, one negative consequence of extensive random measurement error in a study, even with a large sample, is that it widens the uncertainty band of the study's results. This is because the variance of variables is overestimated when random errors are frequent. What about systematic, non-random measurement errors: do they affect the reliability of a measurement?

Formally speaking, a systematic, or non-random, error does not detract from a measure's reliability, but it does affect its validity. A highly reliable measure is not necessarily a good measure, because it may measure something completely irrelevant to the question asked. When a researcher measures an irrelevant variable, there is a problem of validity. The next section deals with the challenge of assessing the validity of measurements in the social sciences.

4.2. VALIDITY

Using the same rifle analogy that was used to explain reliability, Zeller and Carmines (1980) offer the following illustration of validity: "If the shots from a well-anchored rifle hit exactly the same location but not the proper target, the targeting of the rifle is consistent (and hence reliable) but it did not hit the location it was supposed to (and hence it is not valid)" (Zeller and Carmines, 1980: p. 77). The illustration shows how a set of indicators can be perfectly reliable and yet, because they are plagued by a systematic error or bias, fail to represent the concept that we want to measure. This means that to ensure high quality in a national forest assessment, it is not sufficient to assess whether measurements are repeatable and consistent; perhaps more importantly, we need to consider whether our measurements correspond to the concepts we want to examine within the different NFAs. The following example is a case in point.

A researcher interested in measuring the performance of a government agency in a country's forestry sector is looking for plausible indicators. He decides to use deforestation rates as an impact indicator of the agency's performance. Although the deforestation rates per se may be reliable, the validity of applying deforestation as an indicator of government performance would be questionable. It may be that the government agency does not even have a mandate to address the causes of deforestation. If this is the case, deforestation rate is not a valid indicator of the agency's performance.

Is it possible to test the degree of validity of a given measurement? Unlike the tests for reliability described above, validity does not lend itself as well to statistical testing. Lacking quantitative tools for testing validity, many social scientists instead explain the underlying logic of the measurement strategy and discuss potential sources of systematic error. While it may often be legitimate for the natural sciences to assume that measurement errors are mostly random in nature, this is not an appropriate assumption in social science studies, where both systematic and random errors are usually embedded in the research. Systematic error is a prominent factor because social science studies seldom involve direct measurements of the concepts under study. The objects of such studies are often invisible things, such as livelihood dependence, motivation, incentives, utility, or welfare. The indicators chosen to approximate these concepts are bound to vary in how accurately they reflect the sought-after concept; the measurements are therefore more or less valid representations of the concepts.

The interview component of the NFA is concerned with measuring a series of both abstract and complex concepts, such as the intensity of forest use by different users; the type of products harvested; whether the purpose of the harvesting activities is commercial, subsistence or both; the tendencies of market demand and supply for harvested products; and the degree of awareness among users regarding official government laws and policy, among others. Since these are not directly observable variables, proxy variables are applied. The validity of these proxy measures depends to some extent on who the consultant asks, what questions are asked, and how interviewees are approached during the interviews. If there is bias in the selection of interviewees, in the questions actually asked, or in the interview methods chosen, the validity of the measures may suffer as a consequence. Precisely because proxy measures are imperfect by nature, it is important to assess their validity through a validation process so that one can better judge how well they reflect the target concepts.

One of the major challenges in dealing with validity is to come up with an appropriate way of testing it. There are no mechanical blueprint tests that can be applied, as there are for testing reliability (Litwin, 1995; Zeller and Carmines, 1980). And even if we can devise a good way of testing validity for a particular measurement, it can be quite difficult to interpret the results of the test. Given that hardly any validity tests give a straightforward, quantitative answer, the viability of the test depends largely on the researcher's ability to interpret the test results (Fink, 1995; Parten, 1950). Despite these difficulties, it is seen as a crucial step in good social science research not only to carry out the validity tests, but also to document and describe this validation process in the presentation of the results (King et al, 1993; Fink, 1995; Litwin, 1995). Bohrstedt (1970) suggests that "because [of] the fallibility of any single set of measures, we need to validate our measure of X by several independent measures, all of which supposedly measure X" (1970: p. 95). Depending on the most likely source of non-random errors, the approach and evaluation criteria chosen are likely to vary with the particular objective of the study. Survey researchers often use one of the following validation tools:

1. Compare the survey results with the results of an expert team conducting in-depth measurements. Such an expert team would carry out its measurements in a selected number of sites, as it would not be necessary to revisit all sites to get an idea of the validity of the original measurements (Litwin, 1995; Fink, 1995). This is probably the most rigorous test of validity, as it not only tests the validity of the questions asked but also gives the surveyors an idea of possible selection bias in choosing interview subjects, which is particularly relevant in qualitative studies.

2. Carry out a survey pretest. As a complement to the above test, it is advisable to carry out a pretest of the survey that is to be applied. The pretest gives the surveyor a chance to identify the questions that are not effective and that seem to generate a large proportion of no-opinion responses.9 By analyzing the responses to the pretest, one can identify the potentially problematic questions and proceed to modify lengthy, emotionally loaded, confusing, or suggestive wordings (Wilkin et al, 1992; Parten, 1950).

3. Compare survey responses to a known situation. Checking survey responses against questions for which the answers are already known gives the surveyors an idea of the validity of survey responses (Rossi et al, 1983; Fowler, 1988).

4. Compare survey results with results obtained using other survey techniques. The results of the field interviews may be compared with data gathered by using alternative measurement methods, for example through group discussions or participatory rural appraisal techniques in a number of selected sites (Tulsky, 1990; Parten, 1950).

The field interviews that form part of the FAO-NFA are subject to several of these potential problems of acquiring valid measurements and would benefit from several of these tests, as discussed in the sections below. The tests suggested for assessing both reliability and validity would need to be tailored to the particular purpose, needs, and available resources of the individual FAO-NFA program. Such tests should constitute a critical part of quality-control management; without them, it is difficult to appreciate the degree of uncertainty of the NFA's results.

4.3. UNCERTAINTY ESTIMATE

Perhaps the most serious problem in social science research is the common failure to report estimates of the uncertainty in the analyst's results (King, 1990; King et al, 1993). Even if the measurement is qualitative in nature, such as interpreted information from interviews with key informants, it is extremely important for the credibility of the analysis that the degree of uncertainty is addressed explicitly for each of the conclusions drawn on the basis of the measurements. Without a good description of the degree of uncertainty, a particular finding becomes virtually uninterpretable (King et al, 1993).

The degree of uncertainty of a study's results is directly related to the reliability and validity of the measurements on which the results are based. Estimating the degree of uncertainty can be done by considering how observed limitations in reliability and validity may influence the study's findings. If this influence remains unknown, the study's results are easily contested by opponents of the study's findings. On the other hand, if the presentation of the results is accompanied by an assessment of the study's limitations with regard to the reliability and validity of its analytical methods, the credibility of the findings is likely to be strengthened. Defining the degree of uncertainty associated with a study's particular findings is a crucial step of the analysis because, done well, it puts the findings into perspective, adds nuance to the results, and justifies the particular conclusions drawn.

For example, if a national survey on forest use revealed that about 10 percent of the respondents used the forest as their principal source of medicinal plants, it would be important to know, before drawing any wider conclusions from this result, who the respondents were, whether indigenous groups and women were equally represented in the sample, how interviews were conducted, and what the survey considered to be medicinal plants. If these groups were not equally represented in the sample, or if the field personnel did not speak their languages well, the overall results are likely to be biased. Bringing such potential data limitations, which ultimately affect the uncertainty of the results, to the reader's attention produces more nuanced and credible results.
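Assuming, purely for illustration, a sample of 400 respondents (the original figure is not given), the sampling component of the uncertainty around the 10 percent result can be quantified with a standard error and an approximate confidence interval:

```python
import math

# Hypothetical figures: 10% of 400 respondents named the forest as
# their principal source of medicinal plants.
n, p = 400, 0.10

se = math.sqrt(p * (1 - p) / n)            # standard error of a proportion
low, high = p - 1.96 * se, p + 1.96 * se   # approximate 95% confidence band
print(f"estimate: {p:.0%}, 95% CI: {low:.1%} to {high:.1%}")
```

Such an interval captures sampling error only. The selection and language biases discussed above are systematic errors that would shift the estimate itself, in ways no confidence interval can reflect, which is why they must be reported alongside it.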

Even though there is no such thing as absolute certainty (or even exact estimates of the degree of uncertainty) in the social sciences, there are things we can do to improve the reliability, validity, and uncertainty estimates of our studies' conclusions. This report is an effort to help FAO make this happen in the different member countries' national forest assessments.

The next section's quality assessment of FAO's approach to NFAs brings both good and bad news to the FAO-FRA team. The bad news is that many of the potential problems related to the reliability, validity and uncertainty estimates do exist in the FRA and many of these are practically impossible to eliminate entirely. The good news is that this is quite normal for social science measurements and there are ways to come to grips with these problems.



5 Such as the site's infrastructure, the availability and particular composition of forest resources, the harvesting technology available, etc.

6 Such as how the users in a site relate to each other, what they believe, who they trust, etc.

7 Essentially, this means the institutional arrangements that constrain and reward user behaviors.

8 The split-half method has been criticized because the results of the test can be manipulated by the analyst through the way in which the indicators are divided. As a response to this critique, several other techniques have been developed to test the consistency aspects of reliability of measurements, such as Cronbach's alpha, principal component and factor analyses (e.g. see Armor, 1974; Novick and Lewis, 1967; Cronbach, 1951).

9 If a very large proportion of responses are no-opinion responses, one should suspect the validity of the survey instruments and/or the methods used.
