QUALITY CHECKS AND POST-ENUMERATION SURVEYS
No matter how well a census or a survey is organized,
it is difficult to ensure that good-quality data are collected.
It is very important to arrange various data checks before
data are disseminated to the public. This chapter describes
various methods of checking data quality, the most important
of which is a post-enumeration survey. Often, such quality
checks are not organized because they are among the last census
operations, by which time funds are no longer available. Statisticians responsible
for the organization of the censuses in many countries have
not historically disseminated this kind of information.
This chapter deals with non-sampling errors while sampling
errors are described in Chapter 7.
Non-sampling errors creep into data generally because of mistakes
committed at different phases of the census: preparatory activities,
data collection, data processing and data tabulation. These
errors usually refer to coverage errors (missing holdings,
duplicates, etc.) and response errors. Even when systematic
errors are detected, the correction of such data is difficult
and is not recommended.
Introduction
16.1 In censuses and surveys it should be a practice to analyze
the accuracy of data collected. Considering the large number
of enumerators and supervisors employed, the number of steps
involved in organization, difficulties in controlling operations,
particularly in remote areas, it is also necessary to check
the quality of data disseminated. The organizers should be
aware of the quality of the data before they are released
for public use; and data users should be aware of data limitations
in order to avoid mistakes in decision making.
16.2 There are two types of errors in census and survey work:
- Sampling errors occur when sampling is used.
They refer to the discrepancies between the sample estimates
and the population values that would be obtained by enumerating
all units in the population. Sampling errors can be estimated
and controlled in the sense that they can be reduced by
enlarging the sample size.
- Non-sampling errors appear in all censuses and
surveys. These errors refer to the discrepancies between
data collected and their true value. They are due primarily
to the variable performance of human beings and their lack
of precise knowledge of the data requested. Strictly speaking,
they are the result of mistakes committed in various phases
of the census and survey work.
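The statement that sampling errors shrink as the sample grows can be illustrated with the textbook standard error of a mean under simple random sampling, SE = s / sqrt(n). The sketch below is only an illustration: the standard deviation and the sample sizes are hypothetical values, and the finite population correction is ignored.

```python
import math

def standard_error(std_dev: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling
    (finite population correction ignored)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return std_dev / math.sqrt(n)

# Hypothetical standard deviation of holding area (ha) in the population.
STD_DEV_HA = 4.0

for n in (100, 400, 1600):
    # Each fourfold increase in n halves the standard error.
    print(n, round(standard_error(STD_DEV_HA, n), 3))
```

Under these assumptions, quadrupling the sample size halves the standard error.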
16.3 There are various methods of detecting and controlling
data errors. Methods such as (i) data evaluation as a part
of supervision of field enumeration, and (ii) checking census
tables against administrative or other available data may
be called quality checks. A more comprehensive check, recommended
by FAO, is a separate Post-Enumeration Survey (PES) carried out
on a sample basis.
16.4 The purpose of this chapter is to discuss the various
sources of non-sampling errors in agricultural censuses and
surveys, and to describe methods for controlling such errors.
Sources of non-sampling errors
16.5 The non-sampling errors in census and survey data may
be classified into three broad groups:
- Errors resulting from preparatory activities.
- Errors committed in the data collection stage.
- Processing and tabulation errors.
16.6 Errors committed in the preparatory stage. These
errors are the direct responsibility of the census organizers,
and are primarily due to insufficient pre-testing of various
operations. They may be classified as biased tool and biased
procedure errors.
16.7 Biased tools refer to means used for data collection
such as: questionnaires, instruction manuals, tables of random
numbers for selection of sample holdings, etc. Some definitions
may not be adequate, or some concepts may be defined in a
misleading way, so that the enumerator does not apply them
correctly in the field work. The wording of some questions
in the questionnaire may also be misleading, and instruction
manuals may not be well drafted and, therefore, not clear
to enumerators.
16.8 Biased procedures refer to measurement, sample selection
and estimation procedures, etc. Concerning measurement procedures,
it has been demonstrated that the objective measurement of
yield does not always provide reliable data because of border
and other biases. The sample selection procedure, when sample
enumeration is applied, is a delicate operation which, particularly
if entrusted to the field enumerators, may lead to considerable
errors. Sample estimation procedures, if prepared by laymen,
may also result in major errors in the census results.
16.9 Data collection errors are the responsibility
of enumerators and respondents. They can be broken down into
coverage errors, response errors, missing data, etc.
16.10 Coverage errors. There may be errors in the listing
of units which create errors in coverage. The omission of
some units in the listing will lead to an underestimation
of the totals for all characteristics, while duplication of
holdings will lead to overestimation. Omissions are more common
and, therefore, it is generally accepted that census estimates
for most characteristics are biased downwards.
16.11 Coverage errors are very common whether the census
is based on a sample or on a complete enumeration. They might
appear because of difficulties connected with various characteristics
of the enumeration area. If these are large in terms of area
or the number of potential units (units from which data are
to be collected), some units can easily be either omitted
or listed several times. On the other hand, if they are small
there is difficulty defining their borders. In the latter
case, the enumerator may not know whether a particular potential
unit belongs to his/her segment. Such misjudgment naturally
leads to errors.
16.12 Accuracy of coverage depends on the distribution of
units over the area of the enumerator's segment. Congestion
of units often causes trouble. In a situation of housing shortages,
a number of holders might be found living in the same house
sharing many of the common amenities, but operating land separately.
Brothers living in the same house and sharing common facilities
often operate land separately. The enumerator may, however,
list them as operating only one agricultural holding. In such
a situation data on the number of holdings is affected, and
omission of data is very likely.
16.13 Coverage errors are also created when segments are
prepared for identification because of the quality of related
mapping materials. If the borders are not well defined and
if there are no distinct identifiable landmarks on the ground
separating two consecutive enumeration areas, the enumerator
may find it difficult to determine whether a particular household
should be interviewed or listed in this enumeration area.
16.14 When cadastral surveys have been conducted and maps
prepared indicating the boundaries of the various areas such
as villages, it is not difficult to demarcate the boundary
of the two consecutive villages provided the enumerator is
trained in reading cadastral maps. However, where the land
has not been cadastrally surveyed and cadastral maps are not
maintained, if well-defined boundaries identified on the ground
are lacking, there is a great danger of making errors of omission
or duplication of border units.
16.15 The enumerators themselves represent another source
of coverage errors. Some enumerators are careless and attempt
to complete the work in haste. Some are not properly trained
and do not know how to use existing facilities to prepare
accurate lists. Some others may not be sufficiently interested
in the work and do not clarify more complicated situations.
16.16 Response error is another serious error. Response errors
are common in countries where holders do not keep records
of their agricultural operations and do not have clear concepts
of area measurement. Sometimes under-reporting is due to fear
of taxation or imposition of land tenure changes. The nature
of the inquiry may be cause for under-reporting. If holders
do not keep records, it is difficult to get information on
the number of trees in orchards. Under-reporting is also very
common in reporting livestock numbers.
16.17 In many developing countries, the quality of census
data suffers because a large number of different units of
measurement for area and weight are in use, and occasionally
standard units of measure do not exist at all. In such cases
the enumerator cannot easily convert local units into standard
units.
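One way to reduce conversion mistakes is to give enumerators or data processors a fixed lookup table instead of ad hoc arithmetic. The sketch below is a minimal illustration: the district names, the unit name and every factor are hypothetical, and it assumes, for illustration only, that the same local unit may have a different size in different districts.

```python
# Hypothetical conversion factors (hectares per local unit). Real factors
# would have to be established by measurement in each area.
HECTARES_PER_UNIT = {
    ("district_1", "local_unit_a"): 0.25,
    ("district_2", "local_unit_a"): 0.30,
    ("district_1", "hectare"): 1.0,
    ("district_2", "hectare"): 1.0,
}

def to_hectares(district: str, unit: str, quantity: float) -> float:
    """Convert a reported area to hectares, failing loudly on unknown units."""
    try:
        factor = HECTARES_PER_UNIT[(district, unit)]
    except KeyError:
        raise ValueError(
            f"no conversion factor for {unit!r} in {district!r}"
        ) from None
    return quantity * factor

print(to_hectares("district_1", "local_unit_a", 8))  # 2.0
```

Raising an error for an unknown unit, rather than guessing, keeps unconverted values from slipping silently into the totals.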
16.18 The holders operating large holdings often forget to
report all parcels of land operated. They generally operate
land in several areas or villages and when reporting forget
to give information on the parcels operated in villages other
than the village in which they reside. When objective measurement
of areas is applied, it is necessary that both enumerator
and holder visit each parcel. Holders may intentionally fail
to report distant parcels in order to avoid such a distant
visit.
16.19 Missing data is a special kind of response error. This
refers to a variety of situations, such as when the holder
is not at home during the period of enumeration. In case of
crop-cutting, it may happen that the enumerator arrives after
the field is already harvested. Refusals to report also represent
missing data.
16.20 Data processing errors include those errors
committed at the stage of data entry from questionnaire to
computer media, either because of illegible handwriting or
other reasons. Such errors are normally discovered by data
entry verification or by computer checking for data consistency.
As described in Chapter 17, the errors
detected by the computer can be corrected either manually
after comparison with the census questionnaire, or automatically
by computer. The latter, however, is a very delicate procedure.
Mistakes occur in handling questionnaires and/or corresponding
computer records. Questionnaires may be misplaced, while computer
records may have mistakes in identification codes. Errors
in the data processing stage are easier to control than errors
committed in the field, and can be avoided in a good organization.
Nevertheless, routine controls, such as checking for duplicate
records, always discover unexpected mistakes.
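A routine control of the kind mentioned above, checking for duplicate records, can be sketched in a few lines. The record layout and the field name `holding_id` are assumptions made for this illustration, not a prescribed format.

```python
from collections import Counter

def duplicate_ids(records: list[dict]) -> list[str]:
    """Return identification codes that occur on more than one record."""
    counts = Counter(rec["holding_id"] for rec in records)
    return sorted(code for code, n in counts.items() if n > 1)

# Hypothetical data-entry output; the third record repeats the first code.
records = [
    {"holding_id": "EA01-001", "total_area_ha": 2.5},
    {"holding_id": "EA01-002", "total_area_ha": 0.8},
    {"holding_id": "EA01-001", "total_area_ha": 2.5},
]

print(duplicate_ids(records))  # ['EA01-001']
```

Each flagged code would then be compared with the questionnaires to decide whether a record was keyed twice or an identification code was mistyped.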
Checking census tables against other data
16.21 The most common and often practised technique for checking
the quality of data is the comparison of census totals with
information available on the same item from independent sources.
For example, data on crop acreage as collected in current
surveys can often be successfully compared with the corresponding
information available from the census. In India, crop and
land utilization statistics are collected annually by complete
enumeration employing the land record and revenue staff. These
statistics can be compared with the corresponding statistics
obtained through the census. This is possible only if the
data collected in both cases uses the same definitions and
concepts.
16.22 In many cases, there may not be comparable totals although
there might be some possibility for evaluating sub-totals.
In some countries a list of holdings of a certain size is
maintained for the purpose of land taxation. If census data
are tabulated in such a way that the holdings of such categories
are separated, it might be possible to compare the administrative
data with the results obtained in the census. Such comparisons
are primarily used to study the effects of coverage errors
and possibly other characteristics available in the record.
Unless the group of holdings covers an important part of agriculture,
or each of the sub-totals can be checked separately for accuracy,
such checks have limited value from the point of view of the
population as a whole.
16.23 A similar technique which may be found useful is evaluating
the consistency of data. The evaluation procedure aims at
examining the consistency of data with other available knowledge
which is generally accepted. For example, total area operated
obtained in an agricultural census should always be less than
the total geographical area. Similarly, the number of agricultural
labourers obtained from the agricultural census should be
less than the total rural labour force obtained through the
population census. Such comparisons can only be very general,
but they provide an opportunity to judge the quality of data.
However, if such a comparison indicates that the data from
the two sources differ significantly, these comparisons do
not provide the tools needed to correct the census levels.
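The two consistency rules given as examples above can be written as simple checks on published totals. This is a minimal sketch; the function name, argument names and figures are hypothetical, and a flag only signals that the data need review, not how to correct them.

```python
def consistency_flags(area_operated_ha, geographical_area_ha,
                      agricultural_labourers, rural_labour_force):
    """Return messages for totals that contradict generally accepted knowledge."""
    flags = []
    if area_operated_ha >= geographical_area_ha:
        flags.append("area operated is not less than geographical area")
    if agricultural_labourers >= rural_labour_force:
        flags.append("agricultural labourers not fewer than rural labour force")
    return flags

# Hypothetical district totals: the area check fails, the labour check passes.
print(consistency_flags(52_000, 50_000, 18_000, 30_000))
```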
Supervision and post-enumeration check
16.24 The content of the PES should not normally be used
as part of the supervision programme of the work and the supervisory
staff should not be used to provide the necessary data needed
for the PES. The existence of sample checks by the supervisor
on enumerators' work is very useful as it represents some
pressure on the census enumerator to do the work correctly.
The procedure corresponds to application of statistical methods
for controlling the quality of industrial production. The
mere introduction of quality control brings about a noticeable
improvement in the quality. Supervision of enumerators should
include as many unscheduled visits as possible. Supervision
may be intensified in those areas where the work is found
to be lacking or unsatisfactory. The census enumerators have
different backgrounds and levels of training and experience.
Some may be honest and intelligent while others may be indifferent,
careless and dishonest. The supervision of the work of the
latter category of enumerators should be more intensive. There
may be some items in the census for which there are serious
recording problems. Such items should be reviewed by supervisors
very carefully. Census supervisors should therefore not
play the same role as the PES investigators, and the
objectives of supervision and the PES should not be confused.
The purpose of supervision is to improve the quality of the
enumerators' work, while the purpose of the PES is to make
an independent assessment of the quality of data collected.
Purpose of the post-enumeration survey
16.25 The organization of PES on a sample basis is a common
practice for evaluating the accuracy of data collected. In
most countries, whether the census is based on a sample or
on a complete enumeration, a PES is planned and conducted.
The PES should be qualitatively better than the census, and
its cost and size would be relatively small. Questions arise
on what the content of the PES should be, what the sample
size should be, who should do the field work, and when the
field work should be organized.
16.26 The objective of the PES should be clearly outlined.
The purpose will be to determine a quality measurement of
the census data and to provide such information to users of
the data. The PES data should not be used to adjust the census
results. The data collected in the PES are from a small sample
and cannot be used for such adjustments. Census results are
presented for small administrative and geographical areas,
while PES is not. Any adjustment based on PES data will introduce
serious limitations in the use of census results as correction
factors will be subject to large sampling errors. Such adjustments
will also introduce internal inconsistency in the results.
It has been found that if there is a serious error in the
irrigated areas obtained from the census, any adjustment made
in the irrigated area on the basis of the PES may introduce
serious inconsistency in respect of total cropland. There
may be some situations where common errors can be determined
on the basis of the PES. For example, when area data have
been reported in a particular local unit, physical measurement
of area in the PES may provide a correction factor for adjusting
the census results.
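The local-unit case mentioned above can be sketched as a ratio estimate: total hectares measured in the PES divided by total local units reported for the same parcels. The parcel figures below are hypothetical, and the function is an illustration of the idea rather than a recommended estimator.

```python
def hectares_per_local_unit(parcels):
    """Ratio estimate of hectares per local unit.

    parcels: iterable of (reported_local_units, measured_hectares) pairs
    for parcels that were physically measured in the PES.
    """
    parcels = list(parcels)
    total_reported = sum(reported for reported, _ in parcels)
    total_measured = sum(measured for _, measured in parcels)
    if total_reported == 0:
        raise ValueError("no reported area to compare against")
    return total_measured / total_reported

# Hypothetical PES measurements: (area reported in local units, hectares measured).
pes_parcels = [(4, 1.0), (2, 0.5), (10, 2.5)]

print(hectares_per_local_unit(pes_parcels))  # 0.25
```

Such a factor is itself a sample estimate and carries sampling error, which is part of the reason for the caution expressed above about adjusting census results.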
16.27 The cost involved in organizing the PES discourages
many developing countries from undertaking such a survey.
The utility of the PES for checking the quality of census
data is of even more value in countries that are at the initial
stages of statistical activities. In such countries there
may not be check data to evaluate the consistency of census
results.
16.28 Experience gained through a number of field investigations
is needed to clarify the problems of conducting a census and to test
how efficient the proposed methods are. Systematic
records kept on the origin of errors are an extremely valuable
tool for planning future surveys. Statistically developed
countries have a census methodology that has evolved through
many surveys and censuses. Developing countries will have
to develop, through experience, census methodology suiting
their own local socio-economic conditions. The organization
of a PES is one of the important steps in that direction.
16.29 The use of the PES for checking quality may create
pressure on respondents and enumerators to supply more accurate
data. They will both be alert and conscious that data inaccuracies
could be detected at a later time.
Design of the post-enumeration survey
16.30 The size of the sample and its distribution will depend
on available resources for this purpose. A frame of enumeration
blocks, which has been prepared for the main census, is a
convenient frame for a PES. However, considering the importance of
coverage errors and the fact that the data on agricultural
areas tend to be underestimated, the area frame for the PES
should be independent of the frame used in the census, in
order to better evaluate the census frame itself. A design
which is likely to prove most useful for the PES should be
based on area sampling, whenever possible, and particularly
if the census was organized on the basis of a list frame (e.g.,
list of villages). Agricultural situations and levels of farming
vary considerably from province to province, depending on
the agro-climatic and socio-economic conditions. Errors in
data collection, to a large extent, are impacted by the socio-economic
situation of the holder. It is advisable to adopt agro-climatically
and socio-economically homogeneous zones as strata, with area
segments or the village as the first stage sampling unit and
an agricultural holding as the second stage sampling unit.
The technique of "two stage sampling" can easily
be adopted to collect data on some items from a larger sample
of holdings and, at the same time, collect data on some other
items from a smaller sample. For example, the listing of agricultural
holdings can be checked for the entire sample selected for
the PES, while the information on selected census data, such
as crop area, agricultural inputs, livestock numbers, etc.,
can be obtained from a sub-sample of holdings. In its simplest
form the PES would involve (i) selecting a sample of area
units, such as enumeration blocks or villages; (ii) preparing
a new list of agricultural holdings in the selected enumeration
unit; (iii) collecting relevant data on selected items incorporated
into the census programme from a sub-sample of holdings; and
(iv) estimating separately the coverage bias and the response
error.
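The simplest form of the design can be sketched as a stratified two-stage selection: villages are drawn within each stratum, then holdings are sub-sampled from the list prepared in each selected village. The sketch assumes equal-probability simple random sampling at both stages, which is only one of several possible schemes; the frame, the stratum names and the sample sizes are hypothetical.

```python
import random

def select_pes_sample(frame, villages_per_stratum, holdings_per_village, seed=0):
    """Select villages within strata, then sub-sample holdings within villages.

    frame: {stratum: {village: [holding ids from the new PES listing]}}
    Returns {stratum: {village: [sub-sampled holding ids]}}.
    """
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    sample = {}
    for stratum, villages in frame.items():
        n_villages = min(villages_per_stratum, len(villages))
        chosen = rng.sample(sorted(villages), n_villages)
        sample[stratum] = {
            v: rng.sample(villages[v], min(holdings_per_village, len(villages[v])))
            for v in chosen
        }
    return sample

# Hypothetical frame with two strata.
frame = {
    "zone_dry": {"V1": ["h1", "h2", "h3"], "V2": ["h4", "h5"], "V3": ["h6"]},
    "zone_wet": {"V4": ["h7", "h8", "h9", "h10"], "V5": ["h11", "h12"]},
}

print(select_pes_sample(frame, villages_per_stratum=2, holdings_per_village=2))
```

The full listing in every selected village serves the coverage check, while the sub-sampled holdings are the ones re-interviewed or measured for the response check.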
Method of data collection in the post-enumeration survey
16.31 The PES should normally be a small sample survey carried
out soon after the census enumeration is completed, utilizing
qualified and trained enumerators. The questionnaires to be
used by the PES enumerators should deal with only a few key
items from the census. The preparation of a new list of units
in the sample areas should be an integral part of the PES.
As far as possible, the methods used for data collection should
be more objective and reliable than those used by census enumerators.
By repeating the same questions and by following the same
method of collecting data, there is hardly any possibility
of discovering the errors in census data. If the census data
have been obtained by the interview method, it would be best
to check their quality by adopting some method of physical
measurement. For a good quality check on census data, it is
necessary to have the best enumerators and to adopt a very
controlled technique. The use of physical measurement of area,
and actual count of livestock and trees should be attempted.
16.32 The field operations of the PES should start as soon
as the census enumeration is completed. Re-enumeration during
the census is possible and may be more economical but is not
recommended because it is not possible to select the most
qualified and experienced staff for this work. Enumerators
working on the PES should never be assigned the same area
they worked during the census. Conducting the PES soon after
the census takes advantage of the atmosphere created for the
census in securing people's willing cooperation. If it is
conducted late, there is a danger of respondents forgetting
many things. The longer the lapse between the census and the
PES the more likely problems will arise.
Presentation of errors detected in the post-enumeration survey
16.33 Suggested methods of presentation of errors in census
data will be described in this section, while the possible
content of the report on the PES is given in Chapter
18.
16.34 PES errors may be classified into two categories: (i)
coverage errors and (ii) response errors. It may be emphasized
at this stage that coverage errors in surveys and censuses
affect the results more than any other factor. Bias due to
either omission or duplication of units introduces errors
in estimates of all characteristics. Of course, the magnitude
of this bias will depend on the distribution of coverage errors.
It may be large irrespective of whether omissions or duplications
exist. However, in either case, it will be small if units
affected contribute little to the total for the characteristics
concerned. For example, a considerable omission of smallholdings
may only slightly affect the magnitude of totals for most characteristics
in the programme of the agricultural census. However, it is
often found that smallholders concentrate on livestock and
poultry for improving their income. In such situations, if
there is a large-scale omission of smallholdings, there is
a possibility of under-reporting livestock and poultry numbers.
16.35 There are many different ways to present coverage
errors. The suggested presentation is shown in Table 16.1.
The absolute numbers in this table refer to estimates for
the population based on the sample expansion, while percentages
express each figure relative to the corresponding PES estimate,
which is taken as the true value (100 percent). All information is classified
by broad categories of size of holdings as it is considered
important to analyze coverage errors in relation to size of
holdings. Obviously, the countries in which small or large
holdings prevail may choose different size categories, or
countries may relate coverage errors to other holding characteristics
such as land tenure.
16.36 The information shown in the stub of the table refers
to the following:
- Holdings listed in the census.
- Holdings listed in the PES.
- Holdings listed in both the census and the PES (agreements).
- Holdings listed in the census and not in the PES (erroneously
included). The totals in lines 3 and 4 should be equal to
the figure in line 1.
- Holdings listed in the PES and not in the census (erroneously
excluded or omissions). The totals in lines 3 and 5 should
be equal to the figure in line 2.
- Bias which is equal to the difference between lines 1
and 2, or 4 and 5.
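The six lines of the stub follow directly from matching the two lists of holdings. The sketch below works on unweighted counts in the matched sample areas; expanding them to population estimates would require sampling weights, which are left out. The holding identifiers are hypothetical.

```python
def coverage_summary(census_ids: set, pes_ids: set) -> dict:
    """Classify holdings by presence in the census list and the PES list."""
    agreement = census_ids & pes_ids   # line 3: listed in both
    included = census_ids - pes_ids    # line 4: census only
    excluded = pes_ids - census_ids    # line 5: PES only (omissions)
    return {
        "census": len(census_ids),                 # line 1 = line 3 + line 4
        "pes": len(pes_ids),                       # line 2 = line 3 + line 5
        "agreement": len(agreement),
        "erroneously_included": len(included),
        "erroneously_excluded": len(excluded),
        "bias": len(census_ids) - len(pes_ids),    # line 6 = line 4 - line 5
    }

# Hypothetical lists for one enumeration area.
census_ids = {"A", "B", "C", "D"}
pes_ids = {"B", "C", "D", "E", "F"}

print(coverage_summary(census_ids, pes_ids))
```

In this example the bias is negative, meaning the census listed fewer holdings than the PES.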
16.37 For other important data, such as area of holdings,
area under major crops, number of livestock, etc., the effect
of coverage error can be presented in a similar table.
Table 16.1. Effect of coverage errors on the number of holdings (example)
| Size classification of holdings (in ha) |         | Less than 1 ha | 1 ha and less than 5 | 5 ha and less than 10 | 10 ha and over | Total |
| 1. Census                               | Number  | 1100           | 500                  | 100                   | 50             | 1750  |
|                                         | Percent | 110            | 111                  | 111                   | 104            | 110   |
| 2. Post enumeration survey              | Number  | 1000           | 450                  | 90                    | 48             | 1588  |
|                                         | Percent | 100            | 100                  | 100                   | 100            | 100   |
| 3. Agreement                            | Number  | 900            | 420                  | 85                    | 48             | 1453  |
|                                         | Percent | 90             | 93                   | 94                    | 100            | 91    |
| 4. Erroneously included                 | Number  | 200            | 80                   | 15                    | 2              | 297   |
|                                         | Percent | 20             | 18                   | 17                    | 4              | 19    |
| 5. Erroneously excluded                 | Number  | 100            | 30                   | 5                     | 0              | 135   |
|                                         | Percent | 10             | 7                    | 5                     | 0              | 9     |
| 6. Bias (difference)                    | Number  | 100            | 50                   | 10                    | 2              | 162   |
|                                         | Percent | 10             | 11                   | 11                    | 4              | 10    |
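The internal relationships of Table 16.1 can be verified mechanically before a table of this kind is published. The sketch below uses the "Number" rows of the example table and checks the three identities stated for the stub lines; the variable names are chosen for this illustration only.

```python
# "Number" rows of Table 16.1: four size classes followed by the total.
census    = [1100, 500, 100, 50, 1750]
pes       = [1000, 450, 90, 48, 1588]
agreement = [900, 420, 85, 48, 1453]
included  = [200, 80, 15, 2, 297]
excluded  = [100, 30, 5, 0, 135]

for c, p, a, i, e in zip(census, pes, agreement, included, excluded):
    assert a + i == c        # lines 3 + 4 equal line 1
    assert a + e == p        # lines 3 + 5 equal line 2
    assert c - p == i - e    # bias: line 1 - line 2 equals line 4 - line 5

# Net bias as a percentage of the PES figure (taken as 100 percent).
bias_percent = [round(100 * (c - p) / p) for c, p in zip(census, pes)]
print(bias_percent)  # [10, 11, 11, 4, 10]
```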
16.38 The other major source of error in statistical surveys
is response error. This kind of error can be presented independently
from coverage error for various characteristics in a form
similar to Table 16.2 with the stub as follows: 1. Census,
2. Post-enumeration survey and 3. Bias. This kind of error
can be presented only for holdings covered by both the census
and the PES (named "agreement" in Table 16.1).
16.39 The total error, i.e., total of coverage and response
errors, can also be presented for various census data.
Table 16.2. Effect of response errors on the total area of holdings
| Size classification of holdings (in ha) |         | Less than 1 ha | 1 ha and less than 5 | 5 ha and less than 10 | 10 ha and over | Total |
| 1. Census                               | Area    |                |                      |                       |                |       |
|                                         | Percent |                |                      |                       |                |       |
| 2. Post-enumeration survey              | Area    |                |                      |                       |                |       |
|                                         | Percent |                |                      |                       |                |       |
| 3. Bias (difference)                    | Area    |                |                      |                       |                |       |
|                                         | Percent |                |                      |                       |                |       |
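The entries of a table like Table 16.2 could be derived as sketched below, restricting the comparison to holdings found in both the census and the PES. All areas are hypothetical, the counts are unweighted, and the breakdown by size class is omitted for brevity.

```python
def response_bias(matched: dict) -> dict:
    """Compare census and PES areas for holdings covered by both.

    matched: {holding_id: (census_area_ha, pes_area_ha)}
    """
    census_total = sum(c for c, _ in matched.values())
    pes_total = sum(p for _, p in matched.values())
    bias = census_total - pes_total
    return {
        "census": census_total,
        "pes": pes_total,
        "bias": bias,
        # PES value taken as the true value (100 percent).
        "bias_percent": 100 * bias / pes_total,
    }

# Hypothetical matched holdings: H1 under-reported its area in the census.
matched = {"H1": (2.0, 2.5), "H2": (0.5, 0.5), "H3": (7.0, 7.0)}

print(response_bias(matched))
```

Because only matched holdings enter the calculation, the result reflects response error alone and is not mixed with coverage error.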
16.40 A critical study of various tables will give an excellent
insight into the quality of data collected in the census.
Data for various size classes point out what categories of
holdings are most affected by data quality and where attention
should be directed in future surveys and censuses to ensure
higher quality. Each statistician responsible for planning
the agricultural census should know to what extent the coverage
and measurement errors for various characteristics can be
assessed, and whether these errors can be overlooked without
running the risk of major biases, etc. All such questions
can be answered by analyzing the data in the tables described
above.