CHAPTER 16. QUALITY CHECKS AND POST-ENUMERATION SURVEYS

No matter how well a census or a survey is organized, it is difficult to ensure that the data collected are of good quality. It is therefore very important to arrange various data checks before data are disseminated to the public. This chapter describes various methods of checking data quality, the most important of which is a post-enumeration survey. Such quality checks are often not organized because they are among the last census operations, by which time funds may no longer be available. Statisticians responsible for the organization of the censuses in many countries have not historically disseminated this kind of information.

This chapter deals with non-sampling errors while sampling errors are described in Chapter 7. Non-sampling errors creep into data generally because of mistakes committed at different phases of the census: preparatory activities, data collection, data processing and data tabulation. These errors usually refer to coverage errors (missing holdings, duplicates, etc.) and response errors. Even when systematic errors are detected, the correction of such data is difficult and is not recommended.

Introduction

16.1 In censuses and surveys it should be a practice to analyze the accuracy of data collected. Considering the large number of enumerators and supervisors employed, the number of steps involved in organization, difficulties in controlling operations, particularly in remote areas, it is also necessary to check the quality of data disseminated. The organizers should be aware of the quality of the data before they are released for public use; and data users should be aware of data limitations in order to avoid mistakes in decision making.

16.2 There are two types of errors in census and survey work:

Sampling errors occur when sampling is used. They refer to the discrepancies between the sample estimates and the population values that would be obtained by enumerating all units in the population. Sampling errors can be estimated and controlled in the sense that they can be reduced by enlarging the sample size.
Non-sampling errors appear in all censuses and surveys. These errors refer to the discrepancies between data collected and their true value. They are due primarily to the variable performance of human beings and their lack of precise knowledge of the data requested. Strictly speaking, they are the result of mistakes committed in various phases of the census and survey work.
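
The statement that sampling errors can be reduced by enlarging the sample can be illustrated with the textbook standard error of a mean under simple random sampling without replacement. The sketch below is only an illustration: the function name and the population figures are hypothetical and are not taken from any census.

```python
import math


def standard_error_of_mean(std_dev, n, population_size):
    """Standard error of a sample mean under simple random sampling
    without replacement, including the finite population correction."""
    if not 0 < n <= population_size:
        raise ValueError("n must be between 1 and population_size")
    fpc = 1 - n / population_size  # finite population correction
    return math.sqrt(fpc * std_dev ** 2 / n)


# Hypothetical population: 10,000 holdings, standard deviation of area 2.0 ha.
errors = {
    n: standard_error_of_mean(2.0, n, 10_000)
    for n in (100, 400, 1600, 10_000)
}
for n, se in errors.items():
    print(f"n = {n:>6}: standard error = {se:.4f} ha")
```

The standard error falls as the sample grows and vanishes when every unit is enumerated. Non-sampling errors are not captured by this formula and remain whatever the sample size.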

16.3 There are various methods of detecting and controlling data errors. Methods such as: (i) data evaluation as a part of supervision of field enumeration, and (ii) checking census tables against administrative or other available data may be called quality checks. A comprehensive check on a sample of raw data is recommended by FAO, and consists of a separate Post-Enumeration Survey (PES).

16.4 The purpose of this chapter is to discuss the various sources of non-sampling errors in agricultural censuses and surveys, and to describe methods for controlling such errors.

Sources of non-sampling errors

16.5 The non-sampling errors in census and survey data may be classified into three broad groups:

Errors resulting from preparatory activities.
Errors committed in the data collection stage.
Processing and tabulation errors.

16.6 Errors committed in the preparatory stage. These errors are the direct responsibility of the census organizers, and are primarily due to insufficient pre-testing of various operations. They may be classified as biased tool and biased procedure errors.

16.7 Biased tools refer to means used for data collection such as: questionnaires, instruction manuals, tables of random numbers for selection of sample holdings, etc. Some definitions may not be adequate, or some concepts may be defined in a misleading way, so that the enumerator does not apply them correctly in the field work. The wording of some questions in the questionnaire may also be misleading, and instruction manuals may not be well drafted and, therefore, not clear to enumerators.

16.8 Biased procedures refer to measurement, sample selection and estimation procedures, etc. Concerning measurement procedures, it has been demonstrated that the objective measurement of yield does not always provide reliable data because of border and other biases. The sample selection procedure, when sample enumeration is applied, is a delicate operation which, particularly if entrusted to the field enumerators, may lead to considerable errors. Sample estimation procedures, if prepared by laymen, may also result in major errors in the census results.

16.9 Data collection errors are the responsibility of enumerators and respondents. They can be broken down into coverage errors, response errors, missing data, etc.

16.10 Coverage errors. There may be errors in the listing of units which create errors in coverage. The omission of some units in the listing will lead to an underestimation of the totals for all characteristics, while duplication of holdings will lead to overestimation. Omissions are more common and, therefore, it is generally accepted that census estimates for most characteristics are biased downwards.

16.11 Coverage errors are very common whether the census is based on a sample or on a complete enumeration. They may arise from difficulties connected with various characteristics of the enumeration areas. If the enumeration areas are large in terms of area or of the number of potential units (units from which data are to be collected), some units can easily be either omitted or listed several times. On the other hand, if they are small, it is difficult to define their borders. In the latter case, the enumerator may not know whether a particular potential unit belongs to his/her segment. Such misjudgment naturally leads to errors.

16.12 Accuracy of coverage depends on the distribution of units over the area of the enumerator's segment. Congestion of units often causes trouble. In a situation of housing shortages, a number of holders might be found living in the same house and sharing many of the common amenities, but operating land separately. For example, brothers living in the same house and sharing common facilities often operate land separately. The enumerator may, however, list them as operating only one agricultural holding. In such a situation data on the number of holdings are affected, and omission of data is very likely.

16.13 Coverage errors also arise when segments are delineated, because of the poor quality of the mapping materials used. If the borders are not well defined and if there are no distinct identifiable landmarks on the ground separating two adjacent enumeration areas, the enumerator may find it difficult to determine whether a particular household should be interviewed or listed in this enumeration area.

16.14 When cadastral surveys have been conducted and maps prepared indicating the boundaries of the various areas such as villages, it is not difficult to demarcate the boundary between two adjacent villages, provided the enumerator is trained in reading cadastral maps. However, where the land has not been cadastrally surveyed and cadastral maps are not maintained, and well-defined boundaries identifiable on the ground are lacking, there is a great danger of making errors of omission or duplication of border units.

16.15 The enumerators themselves represent another source of coverage errors. Some enumerators are careless and attempt to complete the work in haste. Some are not properly trained and do not know how to use existing facilities to prepare accurate lists. Some others may not be sufficiently interested in the work and do not clarify more complicated situations.

16.16 Response error is another serious error. Response errors are common in countries where holders do not keep records of their agricultural operations and do not have clear concepts of area measurement. Sometimes under-reporting is due to fear of taxation or of the imposition of land tenure changes. The nature of the inquiry may itself be a cause of under-reporting. If holders do not keep records, it is difficult to get information on the number of trees in orchards. Under-reporting is also very common in reporting livestock numbers.

16.17 In many developing countries, the quality of census data suffers because of the prevalence of a large number of different units of measurement for area and weight. Occasionally, standard units of measure are non-existent. In such cases the enumerator cannot easily convert local units into standard units.

16.18 Holders operating large holdings often forget to report all parcels of land operated. They generally operate land in several areas or villages and, when reporting, forget to give information on the parcels operated in villages other than the village in which they reside. When objective measurement of areas is applied, it is necessary that both enumerator and holder visit each parcel. In that case, distant parcels may be intentionally left unreported in order to avoid a long trip.

16.19 Missing data is a special kind of response error. This refers to a variety of situations, such as when the holder is not at home during the period of enumeration. In case of crop-cutting, it may happen that the enumerator arrives after the field is already harvested. Refusals to report also represent missing data.

16.20 Data processing errors include those errors committed at the stage of data entry from questionnaire to computer media, either because of illegible handwriting or for other reasons. Such errors are normally discovered by data entry verification or by computer checking for data consistency. As described in Chapter 17, the errors detected by the computer can be corrected either manually after comparison with the census questionnaire, or automatically by computer. The latter, however, is a very delicate procedure. Mistakes also occur in handling questionnaires and/or corresponding computer records. Questionnaires may be misplaced, while computer records may have mistakes in identification codes. Errors in the data processing stage are easier to control than errors committed in the field, and can be avoided in a good organization. Nevertheless, routine controls, such as checking for duplicate records, always discover unexpected mistakes.
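
A routine control of the kind mentioned above, checking for duplicate records, might be sketched as follows. The record layout and the field name `holding_id` are assumptions made for illustration; an actual census file will have its own identification scheme.

```python
from collections import Counter


def find_duplicate_ids(records, id_field="holding_id"):
    """Return the identification codes that occur in more than one record."""
    counts = Counter(record[id_field] for record in records)
    return sorted(code for code, count in counts.items() if count > 1)


# Hypothetical data-entry records; field names and values are illustrative.
records = [
    {"holding_id": "V01-001", "area_ha": 1.2},
    {"holding_id": "V01-002", "area_ha": 0.8},
    {"holding_id": "V01-001", "area_ha": 1.2},  # same questionnaire keyed twice
    {"holding_id": "V02-001", "area_ha": 4.5},
]
duplicates = find_duplicate_ids(records)
print(duplicates)
```

Codes flagged in this way still need to be compared with the questionnaires, since a repeated code may be either a record entered twice or a mistake in the identification code of a different holding.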

Checking census tables against other data

16.21 The most common technique for checking the quality of data is the comparison of census totals with information available on the same item from independent sources. For example, data on crop acreage as collected in current surveys can often be successfully compared with the corresponding information available from the census. In India, crop and land utilization statistics are collected annually by complete enumeration employing the land record and revenue staff. These statistics can be compared with the corresponding statistics obtained through the census. This is possible only if the data collected in both cases use the same definitions and concepts.

16.22 In many cases, there may not be comparable totals although there might be some possibility for evaluating sub-totals. In some countries a list of holdings of a certain size is maintained for the purpose of land taxation. If census data are tabulated in such a way that the holdings of such categories are separated, it might be possible to compare the administrative data with the results obtained in the census. Such comparisons are primarily used to study the effects of coverage errors and possibly other characteristics available in the record. Unless the group of holdings covers an important part of agriculture, or each of the sub-totals can be checked separately for accuracy, such checks have limited value from the point of view of the population as a whole.

16.23 A similar technique which may be found useful is evaluating the consistency of data. The evaluation procedure aims at examining the consistency of data with other available knowledge which is generally accepted. For example, total area operated obtained in an agricultural census should always be less than the total geographical area. Similarly, the number of agricultural labourers obtained from the agricultural census should be less than the total rural labour force obtained through the population census. Such comparisons can only be very general, but they provide an opportunity to judge the quality of data. However, if such a comparison indicates that the data from the two sources differ significantly, these comparisons do not provide the tools needed to correct the census levels.
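
The two consistency checks given as examples above can be written as simple comparisons of census totals with upper bounds from independent sources. The sketch below assumes hypothetical dictionary keys and invented figures; it only flags inconsistencies and, in line with the text, does not attempt any correction.

```python
def consistency_checks(census, external):
    """Compare census totals with upper bounds from independent sources.

    Returns one message for each census total that is not below its bound.
    """
    checks = [
        ("area_operated_ha", "geographical_area_ha"),
        ("agricultural_labourers", "rural_labour_force"),
    ]
    failures = []
    for census_key, bound_key in checks:
        if census[census_key] >= external[bound_key]:
            failures.append(
                f"{census_key} ({census[census_key]}) is not below "
                f"{bound_key} ({external[bound_key]})"
            )
    return failures


# Invented totals for illustration only.
census_totals = {"area_operated_ha": 410_000, "agricultural_labourers": 95_000}
external_totals = {"geographical_area_ha": 650_000, "rural_labour_force": 90_000}
failures = consistency_checks(census_totals, external_totals)
for message in failures:
    print(message)
```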

Supervision and post-enumeration check

16.24 The content of the PES should not normally be used as part of the supervision programme of the work and the supervisory staff should not be used to provide the necessary data needed for the PES. The existence of sample checks by the supervisor on enumerators' work is very useful as it represents some pressure on the census enumerator to do the work correctly. The procedure corresponds to application of statistical methods for controlling the quality of industrial production. The mere introduction of the quality control brings about a noticeable improvement in the quality. Supervision of enumerators should include as many unscheduled visits as possible. Supervision may be intensified in those areas where the work is found to be lacking or unsatisfactory. The census enumerators have different backgrounds and levels of training and experience. Some may be honest and intelligent while others may be indifferent, careless and dishonest. The supervision of the work of the latter category of enumerators should be more intensive. There may be some items in the census for which there are serious recording problems. Such items should be reviewed by supervisors very carefully. Consequently, census supervisors should not play the same role as the PES investigators. Therefore, the objective of supervision and the PES should not be confused. The purpose of supervision is to improve the quality of the enumerators' work, while the purpose of the PES is to make an independent assessment of the quality of data collected.

Purpose of the post-enumeration survey

16.25 The organization of PES on a sample basis is a common practice for evaluating the accuracy of data collected. In most countries, whether the census is based on a sample or on a complete enumeration, a PES is planned and conducted. The PES should be qualitatively better than the census, and its cost and size would be relatively small. Questions arise on what the content of the PES should be, what the sample size should be, who should do the field work, and when the field work should be organized.

16.26 The objective of the PES should be clearly outlined. The purpose will be to determine a quality measurement of the census data and to provide such information to users of the data. The PES data should not be used to adjust the census results. The data collected in the PES are from a small sample and cannot be used for such adjustments. Census results are presented for small administrative and geographical areas, while PES results are not. Any adjustment based on PES data will introduce serious limitations in the use of census results, as correction factors will be subject to large sampling errors. Such adjustments will also introduce internal inconsistency in the results. It has been found that if there is a serious error in the irrigated areas obtained from the census, any adjustment made in the irrigated area on the basis of the PES may introduce serious inconsistency in respect of total cropland. There may, however, be some situations where common errors can be determined on the basis of the PES. For example, when area data have been reported in a particular local unit, physical measurement of area in the PES may provide a correction factor for adjusting the census results.
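
One way to obtain the correction factor for a local unit of area mentioned above is as the ratio of the total area physically measured in the PES to the total area reported in local units for the same parcels. This is a sketch under that assumption; the function name and the figures are hypothetical, and the resulting factor is itself subject to sampling error.

```python
def local_unit_factor(pairs):
    """Hectares per local unit, estimated as a ratio of totals.

    pairs holds (area reported in local units, area measured in hectares)
    for the parcels measured in the PES.
    """
    reported = sum(local_units for local_units, _ in pairs)
    measured = sum(hectares for _, hectares in pairs)
    if reported <= 0:
        raise ValueError("total reported area must be positive")
    return measured / reported


# Hypothetical PES parcels: (reported local units, measured hectares).
pairs = [(2, 0.9), (4, 1.7), (10, 4.4)]
factor = local_unit_factor(pairs)
print(f"1 local unit is about {factor:.4f} ha")
```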

16.27 The cost involved in organizing the PES discourages many developing countries from undertaking such a survey. The utility of the PES for checking the quality of census data is of even more value in countries that are at the initial stages of statistical activities. In such countries there may not be check data to evaluate the consistency of census results.

16.28 Experience gained through a number of field investigations is needed to clarify the problems of conducting a census and to test the efficiency of the methods that might be used. Systematic records kept on the origin of errors are an extremely valuable tool for planning future surveys. Statistically developed countries have a census methodology that has evolved through many surveys and censuses. Developing countries will have to develop, through experience, census methodology suiting their own local socio-economic conditions. The organization of a PES is one of the important steps in that direction.

16.29 The use of the PES for checking quality may create pressure on respondents and enumerators to supply more accurate data. They will both be alert and conscious that data inaccuracies could be detected at a later time.

Design of the post-enumeration survey

16.30 The size of the sample and its distribution will depend on available resources for this purpose. A frame of enumeration blocks, which has been prepared for the main census, is a convenient frame for a PES. However, considering the importance of coverage errors and the fact that the data on agricultural areas tend to be underestimated, the area frame for the PES should be independent of the frame used in the census, in order to better evaluate the census frame itself. A design which is likely to prove most useful for the PES should be based on area sampling, whenever possible, and particularly if the census was organized on the basis of a list frame (e.g., list of villages). Agricultural situations and levels of farming vary considerably from province to province, depending on the agro-climatic and socio-economic conditions. Errors in data collection are, to a large extent, influenced by the socio-economic situation of the holder. It is therefore advisable to adopt agro-climatically and socio-economically homogeneous zones as strata, with area segments or villages as the first-stage sampling units and agricultural holdings as the second-stage sampling units. The technique of two-phase sampling can easily be adopted to collect data on some items from a larger sample of holdings and, at the same time, collect data on some other items from a smaller sub-sample. For example, the listing of agricultural holdings can be checked for the entire sample selected for the PES, while the information on selected census data, such as crop area, agricultural inputs, livestock numbers, etc., can be obtained from a sub-sample of holdings.
In its simplest form the PES would involve (i) selecting a sample of area units, such as enumeration blocks or villages; (ii) preparing a new list of agricultural holdings in each selected area unit; (iii) collecting relevant data on selected items incorporated into the census programme from a sub-sample of holdings; and (iv) estimating separately the coverage bias and the response error.
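
Steps (i) and (iii) of the simplest form described above can be sketched as two random selections. The sketch assumes simple random sampling without replacement within each stratum and a fixed sub-sampling fraction. These are illustrative choices, not prescriptions, and all names and identifiers are hypothetical. Step (ii), the relisting, is field work and is represented here only by a ready-made list.

```python
import random


def select_area_units(units_by_stratum, units_per_stratum, seed=0):
    """Step (i): simple random sample of area units within each stratum."""
    rng = random.Random(seed)
    return {
        stratum: sorted(rng.sample(units, min(units_per_stratum, len(units))))
        for stratum, units in units_by_stratum.items()
    }


def subsample_holdings(relisted_holdings, fraction, seed=0):
    """Step (iii): sub-sample of the holdings relisted in step (ii)."""
    if not relisted_holdings:
        return []
    rng = random.Random(seed)
    size = max(1, round(len(relisted_holdings) * fraction))
    return sorted(rng.sample(relisted_holdings, size))


# Hypothetical strata with their enumeration blocks.
units_by_stratum = {
    "lowland": [f"L{i:02d}" for i in range(1, 21)],
    "upland": [f"U{i:02d}" for i in range(1, 11)],
}
sample = select_area_units(units_by_stratum, units_per_stratum=3)

# Hypothetical new list of holdings prepared in one selected block.
relisted = [f"H{i:03d}" for i in range(1, 13)]
detailed = subsample_holdings(relisted, fraction=0.25)
print(sample)
print(detailed)
```

The full relisted sample serves the coverage check, while only the sub-sample receives the detailed questionnaire.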

Method of data collection in the post-enumeration survey

16.31 The PES should normally be a small sample survey carried out soon after the census enumeration is completed, utilizing qualified and trained enumerators. The questionnaires to be used by the PES enumerators should deal with only a few key items from the census. The preparation of a new list of units in the sample areas should be an integral part of the PES. As far as possible, the methods used for data collection should be more objective and reliable than those used by census enumerators. By repeating the same questions and by following the same method of collecting data, there is hardly any possibility of discovering the errors in census data. If the census data have been obtained by the interview method, it would be best to check their quality by adopting some method of physical measurement. For a good quality check on census data, it is necessary to have the best enumerators and to adopt a very controlled technique. The use of physical measurement of area, and actual count of livestock and trees should be attempted.

16.32 The field operations of the PES should start as soon as the census enumeration is completed. Re-enumeration during the census is possible and may be more economical but is not recommended because it is not possible to select the most qualified and experienced staff for this work. Enumerators working on the PES should never be assigned the same area they worked during the census. Conducting the PES soon after the census takes advantage of the atmosphere created for the census in securing people's willing cooperation. If it is conducted late, there is a danger of respondents forgetting many things. The longer the lapse between the census and the PES the more likely problems will arise.

Presentation of errors detected in the post-enumeration survey

16.33 Suggested methods of presentation of errors in census data will be described in this section, while the possible content of the report on the PES is given in Chapter 18.

16.34 Errors detected through the PES may be classified into two categories: (i) coverage errors and (ii) response errors. It should be emphasized at this stage that coverage errors in surveys and censuses affect the results more than any other factor. Bias due to either omission or duplication of units introduces errors in estimates of all characteristics. Of course, the magnitude of this bias will depend on the distribution of coverage errors. It may be large whether the errors are omissions or duplications. However, in either case, it will be small if the units affected contribute little to the total for the characteristics concerned. For example, a considerable omission of smallholdings may only slightly affect the magnitude of totals for most characteristics in the programme of the agricultural census. However, it is often found that smallholders concentrate on livestock and poultry for improving their income. In such situations, if there is a large-scale omission of smallholdings, livestock and poultry numbers may be underestimated.

16.35 There are many different ways to present coverage errors. The suggested presentation is shown in Table 16.1. The absolute numbers in this table refer to estimates for the population based on the sample expansion, while percentages are expressed relative to the corresponding PES figures, which are taken as the true values (100 percent). All information is classified by broad categories of size of holdings as it is considered important to analyze coverage errors in relation to size of holdings. Obviously, the countries in which small or large holdings prevail may choose different size categories, or countries may relate coverage errors to other holding characteristics such as land tenure.

16.36 The information shown in the stub of the table refers to the following:

1. Holdings listed in the census.
2. Holdings listed in the PES.
3. Holdings listed in both the census and the PES (agreements).
4. Holdings listed in the census and not in the PES (erroneously included). The totals in lines 3 and 4 should be equal to the figure in line 1.
5. Holdings listed in the PES and not in the census (erroneously excluded or omissions). The totals in lines 3 and 5 should be equal to the figure in line 2.
6. Bias, which is equal to the difference between lines 1 and 2, or between lines 4 and 5.

16.37 For other important data, such as area of holdings, area under major crops, number of livestock, etc., the effect of coverage error can be presented in a similar table.

Table 16.1. Effect of coverage errors on the number of holdings

		Size classification of holdings (in ha)
		Less than 1 ha	1 ha and less than 5	5 ha and less than 10	10 ha and over	Total
1. Census	Number	1100	500	100	50	1750
1. Census	Percent	110	111	111	104	110
2. Post enumeration survey	Number	1000	450	90	48	1588
2. Post enumeration survey	Percent	100	100	100	100	100
3. Agreement	Number	900	420	85	48	1453
3. Agreement	Percent	90	93	94	100	91
4. Erroneously included	Number	200	80	15	2	297
4. Erroneously included	Percent	20	18	17	4	19
5. Erroneously excluded	Number	100	30	5	0	135
5. Erroneously excluded	Percent	10	7	6	0	9
6. Bias (difference)	Number	100	50	10	2	162
6. Bias (difference)	Percent	10	11	11	4	10
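
The relationships between the lines of Table 16.1 can be checked with a short calculation. The sketch below rebuilds the "Less than 1 ha" and "Total" columns from lines 3, 4 and 5, taking the PES figure as 100 percent. The function and key names are hypothetical.

```python
def coverage_rows(agreement, erroneously_included, erroneously_excluded):
    """Build the six stub lines of a coverage table for one size class."""
    census = agreement + erroneously_included  # line 1 = lines 3 + 4
    pes = agreement + erroneously_excluded     # line 2 = lines 3 + 5
    if pes <= 0:
        raise ValueError("the PES estimate must be positive")
    numbers = {
        "census": census,
        "pes": pes,
        "agreement": agreement,
        "erroneously_included": erroneously_included,
        "erroneously_excluded": erroneously_excluded,
        "bias": census - pes,                  # line 6 = line 1 - line 2
    }
    # Percentages take the PES figure as 100 percent.
    percents = {name: round(100 * value / pes) for name, value in numbers.items()}
    return numbers, percents


# Lines 3, 4 and 5 of Table 16.1 for two columns.
numbers, percents = coverage_rows(900, 200, 100)              # less than 1 ha
total_numbers, total_percents = coverage_rows(1453, 297, 135)  # total

print(numbers["census"], numbers["pes"], numbers["bias"])
print(total_numbers["census"], total_numbers["pes"], total_numbers["bias"])
print(total_percents)
```

Because line 1 minus line 2 equals line 4 minus line 5, the bias can be read from either pair of lines, as noted in the description of the stub.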

16.38 The other major source of error in statistical surveys is response error. This kind of error can be presented independently from coverage error for various characteristics in a form similar to Table 16.2 with the stub as follows: 1. Census, 2. Post-enumeration survey and 3. Bias. This kind of error can be presented only for holdings covered by both the census and the PES (named "agreement" in Table 16.1).

16.39 The total error, i.e., total of coverage and response errors, can also be presented for various census data.

Table 16.2

		Size classification of holdings (in ha)
		Less than 1 ha	1 ha and less than 5	5 ha and less than 10	10 ha and over	Total
1. Census	Area
1. Census	Percent
2. Post-enumeration survey	Area
2. Post-enumeration survey	Percent
3. Bias (difference)	Area
3. Bias (difference)	Percent
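
The entries of a table laid out like Table 16.2 could be computed from the holdings found in both the census and the PES, for example as follows. The sketch assumes unweighted totals over matched holdings and uses invented areas. In practice the sample expansion factors would be applied before the totals are formed.

```python
def response_bias(matched_pairs):
    """Response bias for holdings covered by both the census and the PES.

    matched_pairs holds (census value, PES value) for each matched holding;
    the PES value is taken as the true value.
    """
    census_total = sum(census for census, _ in matched_pairs)
    pes_total = sum(pes for _, pes in matched_pairs)
    if pes_total <= 0:
        raise ValueError("the PES total must be positive")
    bias = census_total - pes_total
    return {
        "census": census_total,
        "pes": pes_total,
        "bias": bias,
        "bias_percent": 100 * bias / pes_total,
    }


# Invented areas in hectares for four matched holdings: (census, PES).
matched_pairs = [(1.0, 1.2), (2.5, 2.5), (0.8, 1.0), (4.0, 4.3)]
result = response_bias(matched_pairs)
print(f"bias = {result['bias']:.2f} ha ({result['bias_percent']:.1f} percent)")
```

A negative bias, as in this invented example, corresponds to under-reporting in the census relative to the PES.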

16.40 A critical study of the various tables will give an excellent insight into the quality of data collected in the census. Data for the various size classes show which categories of holdings are most affected by errors and where attention should be directed in future surveys and censuses to ensure higher quality. Each statistician responsible for planning the agricultural census should know to what extent the coverage and measurement errors for various characteristics can be assessed, and whether these errors can be overlooked without running the risk of major biases. All such questions can be answered by analyzing the data in the tables described above.

Suggested reading
UN (1982). Non-sampling errors in household surveys: Sources, assessment and control. NHSCP technical study.
UN (1984). Handbook for household surveys (revised edition). Studies in Methods, Series F, No. 31.
UN (1992). Handbook of population and housing censuses: Part I, Planning, organization and administration of population and housing censuses. Studies in methods, Series F, No. 54.
US Bureau of the Census (1974). Standards for discussion and presentation of error in data. Technical paper No. 32. Washington DC.