
PROCESSING AND ANALYSIS OF FARMER INCOME DATA

(Item 5 of the Agenda)

48. Once the farmer income data is collected, processing and analysis do not easily follow. A tedious procedure of evaluation and validation is carried out before the data becomes available for the intended use. Four papers from the USA, Korea and FAO on methodologies for processing and strategies for imputation and analysis were presented in this agenda item.

Methodologies for Processing and Analysis

Processing and Analysis of USDA's ARMS Survey

49. Mr David Banker, Agricultural Economist, Economic Research Service (ERS) of the US Department of Agriculture (USDA) presented, in STAT-INCOME-11, a summary of current methods used in the processing and analysis of farm business and farm operator household data for US farm operations collected in the Agricultural Resource Management Survey (ARMS). He described ARMS as an annual survey collecting data from farm operators on the farm business, the farm operation, commodity production practices, and characteristics of the farm operator and the operator's household. The survey is conducted in three phases: Phase I is a screening survey used to identify farms that are in scope; Phase II collects data on production practices and costs for targeted crops; Phase III obtains information on the farm business, the operator's household and production practices and costs for targeted livestock operations.

50. He said that while the Phase III survey used both list and area frames, the list frame was predominant, accounting for nearly all samples in recent years. The target population was all farms (excluding institutional farms) in the 48 contiguous states (Alaska and Hawaii excluded), defined as those that sold or normally would have sold at least US$1 000 of agricultural production in the survey year. Samples are selected to provide estimates at the national and regional levels, and at the state level for 15 core states (those with the highest agricultural cash receipts). Within each state, farms in the list frame are stratified by size and type, while area frame samples within each state (which are segments of land) are stratified by land use characteristics. Reporting units in the area frame are farm operations with farming activity within the selected land segments.

51. Mr Banker explained that ARMS data was collected by the National Agricultural Statistics Service (NASS) and subject to extensive editing and analysis by both NASS and ERS personnel. Postprocessing by NASS includes survey weight adjustments for outliers, unit non-response, coverage of production levels of major commodities and farm numbers, as well as item imputation for non-response. ERS provides additional data editing, analysis, item imputation, and variable creation.

He said that at NASS, editing was first done manually on paper questionnaires and then electronically on individual reports as well as at the macro level. SAS computing procedures checked for errors in coding, physical relationships (such as yield limits), and simple economic relationships between interrelated questionnaire cells.

52. Mr Banker noted that the NASS imputation procedure involved the identification of "donors" (records with non-zero data), which were placed in imputation groups based on locality, farm type and value of sales. After excluding extreme values, unweighted means were computed for each group to replace missing item values. He explained that after receipt of the raw survey file from NASS, ERS further reviewed and edited the data before creating a research database. He also noted that ERS added several hundred variables to the research database that were typically calculated from combinations of various survey items.
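As an illustration only, the group-mean imputation described above can be sketched in Python. The field names (`region`, `farm_type`, `sales_class`, `fuel_expense`), the `is_extreme` rule and the sample values are hypothetical; the report does not specify how NASS defines imputation groups or extreme values in detail.

```python
from collections import defaultdict
from statistics import mean


def impute_group_means(records, item, group_keys, is_extreme=lambda v: False):
    """Fill missing `item` values with the unweighted mean of donor values
    from the same imputation group. Returns new records; input is not modified."""
    donors = defaultdict(list)
    for rec in records:
        value = rec.get(item)
        # Donors are records with non-zero reported data that are not extreme.
        if value is not None and value != 0 and not is_extreme(value):
            donors[tuple(rec[k] for k in group_keys)].append(value)

    group_means = {group: mean(values) for group, values in donors.items()}

    imputed = []
    for rec in records:
        rec = dict(rec)  # copy so the raw record is left untouched
        if rec.get(item) is None:
            group = tuple(rec[k] for k in group_keys)
            if group in group_means:  # leave the gap if the group has no donors
                rec[item] = group_means[group]
                rec["imputed"] = True
        imputed.append(rec)
    return imputed


# Hypothetical records: one extreme report, one missing value, one group without donors.
farms = [
    {"region": "A", "farm_type": "grain", "sales_class": 1, "fuel_expense": 1000.0},
    {"region": "A", "farm_type": "grain", "sales_class": 1, "fuel_expense": 3000.0},
    {"region": "A", "farm_type": "grain", "sales_class": 1, "fuel_expense": 900000.0},
    {"region": "A", "farm_type": "grain", "sales_class": 1, "fuel_expense": None},
    {"region": "B", "farm_type": "dairy", "sales_class": 2, "fuel_expense": None},
]

result = impute_group_means(
    farms,
    item="fuel_expense",
    group_keys=("region", "farm_type", "sales_class"),
    is_extreme=lambda v: v > 100000,  # assumed cut-off, for illustration only
)
```

In this sketch a record whose group has no donors is left missing rather than filled from a wider group; a production system would presumably need a fallback rule.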

53. During the subsequent discussion, the Experts praised the systematic approach for data review, imputation and analysis of the USDA. Mr Banker explained to the Experts that for ARMS Phase III, a survey report outlier was identified by its weighted total expenses relative to total weighted expenses at the national level and/or regional level. Following identification, outliers were reviewed for potential adjustment by an official USDA board comprised of NASS and ERS personnel. For targeted crops (selected on rotating basis), field level crop production practice and cost information were obtained in Phase II. Field to farm expansion factors (weights) then provided crop production practice and cost information at the farm level. The same farms were then contacted again in Phase III to obtain farm business and operator household information. For targeted livestock commodities, all production practice and whole farm/farm household data were obtained in Phase III.
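The outlier screen described for ARMS Phase III compares a report's weighted total expenses with total weighted expenses at the national or regional level. A minimal sketch follows, assuming that a report is flagged when its share of the weighted total exceeds a cut-off; the cut-off `max_share`, the field names and the data are hypothetical and are not taken from the USDA procedure.

```python
def flag_expense_outliers(reports, max_share):
    """Return ids of reports whose weighted expenses exceed `max_share`
    of the total weighted expenses of all reports passed in."""
    total = sum(r["weight"] * r["expenses"] for r in reports)
    if total == 0:
        return []
    return [
        r["id"]
        for r in reports
        if r["weight"] * r["expenses"] / total > max_share
    ]


# Hypothetical reports for one region.
reports = [
    {"id": 1, "weight": 100.0, "expenses": 50.0},
    {"id": 2, "weight": 10.0, "expenses": 60.0},
    {"id": 3, "weight": 200.0, "expenses": 400.0},
]

flagged = flag_expense_outliers(reports, max_share=0.5)
```

Flagged reports would then go to review rather than being adjusted automatically, in line with the board review mentioned above.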

Strategies for Overcoming Data Limitations

Optimal Strategies to Improve Collection and Analysis of Farmers' Income Data

54. In STAT-INCOME-12, Mr Kyeong-Duk Kim, Chief of International Rural Development, Korea Rural Economic Institute, presented statistical data collection, analysis and dissemination in the agriculture sector with information technology (IT). He explained that there were two censuses conducted every five years: one on population and housing, and another on agriculture. The censuses served as the frame for the Survey of Integrated Farm Household Economy conducted every year, which covered about 33 000 households (4%). The survey panel was partially replaced every year, mainly due to dropouts. Every five years, new samples were drawn. Provincial (state) level data on production and cost by commodity, and on the supply and demand situation, were collected. Mr Kim said that farm income accounted for about one-third of the total farm household income, which averaged just above US$30 000 a year. He added that the average farm size in Korea was relatively small at 1.4 ha.

55. When asked for further details on income data collection, Mr Kim explained that income data was collected/generated in two stages. First, the questionnaire was provided ahead of time so that the farmer could become familiar with the kind of data to be collected. When the face-to-face interview was done, it was shorter and mainly devoted to minimal data probing and to educating farmers on proper bookkeeping techniques. Handheld computers were also used in data collection. The second stage consisted of data input into an internet-based system already containing information on costs and prices. Mr Kim said that statistics on income and other household data were disseminated online. There, farmers had access to information on prices, production and weather, both current and forecast, among others, to facilitate their decision-making processes. He added that this was an effective way to encourage farmers to provide reliable information. In return for cooperating in data collection, the farmers benefited from information and from government protection in the form of tariffs levied on imported agricultural commodities. He said that the National Statistics Office was responsible for data collection while other agencies such as the Agricultural Outlook Center were in charge of data utilization and dissemination.

56. The Experts asked about internet coverage in Korea. Mr Kim replied that internet coverage in the country was very advanced, with ADSL internet connection available even in remote areas either at individual farm or community level. He said that extension services were provided to educate farmers in internet usage but acknowledged problems with older farmers unwilling to learn the technology. When asked about its possible applicability in other Asian countries, Mr Kim said it was plausible since the size of Korean farms was also very small. Although the initial implementation cost could be high, he said that in countries like Thailand, rice farmers could be persuaded to contribute financially since the information would help them to plan their marketing strategies. Mr Kim also stressed the use of increasingly inexpensive satellite technology. The Experts agreed that the use of information technology could also contribute to the efficient generation of farmer income data.

Generation of Farmers' Income Data

57. Mr Erniel Barrios from the School of Statistics, University of the Philippines and FAO Consultant, introduced, in STAT-INCOME-13, three methods that could be used in generating farmers' income from existing data. The methods were proposed to fill in data gaps in years when surveys to collect farmer income data were not undertaken.

58. The first method integrates data from multi-purpose household surveys such as the Living Standards Measurement Study (LSMS) with data from production surveys. In years when the LSMS is conducted (the frequency of data collection varies from 2 to 5 years across developing countries), farmers' income can be estimated over sub-domains. For non-LSMS years, a linear regression model can be estimated with panel data, with income data from the LSMS as the dependent variable and yield/production, area harvested, irrigated area, etc., from the production survey as the independent variables (see below). Farmers' income for non-LSMS years can then be predicted from the model.

$$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + \varepsilon_{it}$$

where $y_{it}$ = income for domain/group $i$ at time $t$
      $x_{it}$ = auxiliary variable for domain/group $i$ at time $t$
      $u_i$ = random effect for domain/group $i$
      $\varepsilon_{it}$ = random error for domain/group $i$ at time $t$
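To make the prediction step concrete, the following Python sketch fits the model with a single auxiliary variable and predicts income for a year without survey data. It is a simplification and not the estimator used in the paper: the domain effect is treated as a fixed domain intercept (the "within" estimator), so each fitted intercept stands for the sum of the constant and the domain effect. A true random-effects fit would need generalized least squares or a mixed-model library. The domain names and figures are invented.

```python
from statistics import mean


def fit_within(panel):
    """Fit y = a_i + b * x by the within (fixed-effects) estimator.

    panel maps each domain to a list of (x, y) pairs from survey years.
    Returns the common slope b and a dict of domain intercepts a_i.
    """
    numerator = 0.0
    denominator = 0.0
    centres = {}
    for domain, observations in panel.items():
        x_bar = mean(x for x, _ in observations)
        y_bar = mean(y for _, y in observations)
        centres[domain] = (x_bar, y_bar)
        for x, y in observations:
            # Deviations from domain means remove the domain effect.
            numerator += (x - x_bar) * (y - y_bar)
            denominator += (x - x_bar) ** 2
    slope = numerator / denominator
    intercepts = {
        domain: y_bar - slope * x_bar
        for domain, (x_bar, y_bar) in centres.items()
    }
    return slope, intercepts


def predict_income(slope, intercepts, domain, x):
    """Predict income for a domain in a non-survey year from its auxiliary value."""
    return intercepts[domain] + slope * x


# Hypothetical panel: (yield, income) pairs for survey years only.
panel = {
    "region_1": [(3.0, 110.0), (4.0, 130.0), (5.0, 150.0)],
    "region_2": [(2.0, 60.0), (3.0, 80.0)],
}

slope, intercepts = fit_within(panel)
predicted = predict_income(slope, intercepts, "region_2", 4.0)
```

The auxiliary value for the non-survey year would come from the production survey, which is assumed to run every year.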

59. The second method is based on a quasi-experimental design usually adopted in impact evaluation surveys. The survey usually considers the whole area where the project was implemented as the domain. Samples are drawn in two stages. In the first stage, sample areas/villages are selected, while in the second stage sample farming households (about 10-20) are drawn from each sample area. The respondents are selected so that they provide the indicators, or at least some proxy variables, of the project impact.

60. The third method collects community-level data that are needed to monitor progress in rural programmes. Data collection is a combination of administrative reports, focus group discussions and key informant interviews. The data are used to identify the kind of development intervention package needed to foster development in the communities.

61. Mr Barrios illustrated the three methods using data from the Philippines. He clarified that the methods were applied in different instances and to different data sets, so comparisons among them were unnecessary. Income estimates from the quasi-experimental design and rapid assessment methods were comparable to those generated from a probability sample (i.e., LSMS). For the linear model, production, harvest area and yield of different crops/livestock as well as growth in regional GDP were considered as independent variables. However, only rice and corn yields (the two most important agricultural commodities in the country) were significant. The adjusted coefficient of determination was a reasonable 63 percent while the mean absolute prediction error (MAPE) was only 7 percent.
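The report does not state how the prediction error was computed. Since it is quoted in percent, one plausible reading is the mean of absolute prediction errors relative to the observed values. The sketch below follows that assumed definition, with invented figures.

```python
def mean_absolute_prediction_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|, expressed in percent.

    Assumes every actual value is non-zero.
    """
    relative_errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical observed and model-predicted incomes for three domains.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

error_percent = mean_absolute_prediction_error(actual, predicted)  # about 5 percent
```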

62. He said that by using these methods, the generation of income data could be inexpensive while producing reasonable estimates at a regular frequency. He added that in the absence of a data collection activity aimed at estimating farmer income, existing data from different sources could be combined to come up with reasonable estimates. He pointed out that if the goal was to focus on a specific segment of farmers, the sampling design might deviate from the usual probability sampling and consider purposive sampling or even a rapid assessment strategy that uses a combination of different data collection strategies.

63. The Experts praised the presentation and agreed on the need for suitable methods for generating farm income data in years where there were no farmer income surveys due to budgetary or other constraints. However, they questioned the fact that the income function shown in the first method excluded prices. Mr Barrios indicated that price and other variables were accounted for by the inclusion of a random component in the model, which was estimated a priori. Some Experts suggested the inclusion of non-farm variables as regressors of income. It was clarified that LSMS samples used sampling rates ranging from 1 to 5 percent among developing countries.

Appropriate Strategies for Imputation and Analysis

Rural Income Generating Activities (RIGA) Study: Income Aggregate Methodology, Issues and Considerations

64. In STAT-INCOME-14, Ms Katia Covarrubias, Economist/Consultant, Agricultural Development Service, FAO, presented the Rural Income Generating Activities (RIGA) project implemented by FAO. She indicated that the RIGA project aimed at measuring and characterizing rural income generating activities in developing countries. The project has worked with selected surveys from Africa (Ghana, Madagascar, Malawi and Nigeria), Asia (Bangladesh, Indonesia, Nepal, Pakistan, Thailand and Viet Nam), Latin America (Ecuador, Guatemala, Nicaragua and Panama) and Eastern Europe (Albania, Bosnia-Herzegovina and Bulgaria). It also helps to fill research gaps, to build a platform or protocol for future data collection, construction and analysis, and to contribute to rural development policy.

65. Ms Covarrubias pointed out that the processing of cross-country data varied according to methodology (although many used the LSMS framework), reference period, concepts and definitions, which caused problems in the comparison of income statistics across countries and over time. In order to achieve consistency and comparability, some standard definitions were adopted by the RIGA project. With regard to imputation, she indicated that the presence of outliers in cross-country data was common. The project defined an outlier as a value lying more than 3 standard deviations above or below the median value of a relevant population subgroup (e.g., crop type if checking crop sales income). She said that the project was exploring alternative approaches to deal with other extreme values.
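The outlier rule can be sketched in Python as follows. The report does not say whether the standard deviation is computed with or without the suspect values, nor how very small subgroups are handled; this sketch uses the sample standard deviation of all values in the subgroup and skips subgroups with fewer than two observations. Field names and data are hypothetical.

```python
from collections import defaultdict
from statistics import median, stdev


def flag_outliers(rows, value_key, group_key, k=3.0):
    """Return rows whose value lies more than k standard deviations
    from the median of their subgroup."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    bounds = {}
    for group, values in groups.items():
        if len(values) < 2:
            continue  # a standard deviation needs at least two observations
        centre = median(values)
        spread = stdev(values)
        bounds[group] = (centre - k * spread, centre + k * spread)

    flagged = []
    for row in rows:
        if row[group_key] not in bounds:
            continue
        low, high = bounds[row[group_key]]
        if not low <= row[value_key] <= high:
            flagged.append(row)
    return flagged


# Hypothetical crop sales income, grouped by crop type.
rice_sales = [100, 105, 95, 110, 90, 102, 98, 101, 99, 104, 96, 1000]
maize_sales = [50, 55, 60]
rows = [{"crop": "rice", "sales": s} for s in rice_sales]
rows += [{"crop": "maize", "sales": s} for s in maize_sales]

outliers = flag_outliers(rows, value_key="sales", group_key="crop")
```

Because an extreme value inflates the standard deviation of its own subgroup, the rule may fail to flag it when the subgroup is small, which is one possible reason for exploring other approaches.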

66. In aggregating cross-country data on shares of various income sources in total income, Ms Covarrubias mentioned that the project encountered the problem of whether to use the mean of shares or the share of means. The mean of shares reflected more accurately the household-level diversification strategy, regardless of the magnitude of income, while the share of means reflected the importance of a given income source in the aggregate income of rural households in general or of any given group of households. If the distribution of the shares of a given source of income was constant over the income distribution, the two measures gave similar results. If, however, those households with the highest share of crop income were also the households with the highest quantity of crop income, then the share of agricultural income in total income (over a given group of households) using the share of means would be greater than the value using the mean of shares.
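A small numerical sketch in Python shows how the two measures diverge. The household figures are invented, and every household is assumed to have positive total income.

```python
def mean_of_shares(households, source):
    """Average over households of each household's own income share from `source`."""
    shares = [h[source] / sum(h.values()) for h in households]
    return sum(shares) / len(shares)


def share_of_means(households, source):
    """Share of `source` in the group's aggregate income.

    Equal to mean income from the source divided by mean total income.
    """
    source_total = sum(h[source] for h in households)
    income_total = sum(sum(h.values()) for h in households)
    return source_total / income_total


# Hypothetical households: the one with the highest crop share
# also has by far the largest crop income.
households = [
    {"crop": 900.0, "wage": 100.0},  # crop share 0.9 of 1 000
    {"crop": 20.0, "wage": 80.0},    # crop share 0.2 of 100
    {"crop": 10.0, "wage": 90.0},    # crop share 0.1 of 100
]

household_view = mean_of_shares(households, "crop")   # about 0.40
aggregate_view = share_of_means(households, "crop")   # about 0.775
```

Here the share of means is pulled up by the one large crop-dependent household, which is the situation described in the paragraph above.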

67. The Experts praised the efforts made by FAO in measuring and characterizing rural income in developing countries. They recognized the need for more consistency in the collection of farm income and other socioeconomic data across Asia-Pacific countries. The Experts felt that in survey design, the following issues should be properly planned: reference periods and survey frequency; units of measurement and equivalence tables; data validation (consistency in reporting across data modules); geographic referencing information (to possibly link the survey data to census data); and consistency across surveys and over time (with consideration of the local context). With regard to imputation, it was suggested that bootstrap methods could be considered in dealing with extreme values.

68. In the ensuing discussion, the Experts recognized that collecting data on farm income was a complex process requiring large resources. Initiatives to develop an optimal sampling design were therefore required, as such a design provided a framework for balancing cost against efficiency. List frames commonly obtained from censuses could be augmented with area frames. The choice and application of stratification variables (e.g., farm size, access, etc.) could enhance the efficiency of farmer income estimates. Rotation of samples and the use of model-based methods could also contribute to enhancing both efficiency and data quality. Spatial-temporal dimensions in survey designs might also be considered. The Experts pointed out that data collection methods could be a mixture of different strategies (e.g., face-to-face interview, telephone interview, mail, etc.), with the choice dependent on the complexity of the information needed and the level of comprehension of the respondents. The use of technology was envisioned to facilitate data collection as well. The choice of a reference period could affect problems of memory recall.
