Impact Evaluation

What is Impact Evaluation?

Critical observations are made on a day-to-day basis for most projects. However, these are not the same as evaluations, which examine results in a structured manner based on evidence and which use and test hypotheses about the interventions. Impact evaluation has been defined in different ways in the international development field. Some organizations and experts define it as an exercise that seeks to assess, to the extent possible, the attribution of changes in the lives of low-income beneficiaries (e.g. in incomes, production, empowerment, etc.) or in country policies or systems to the project in question. Such exercises, in addition to evaluating the results for the beneficiaries at project-end, also involve examining the counterfactual¹ cases or scenarios both with and without the project intervention(s). This is to assess to what degree the actual project or intervention, as opposed to other external factors, may have contributed to change: the project attribution. Other evaluators and organizations view impacts as merely deeper and long-term changes in livelihoods, policies, institutions and practices, beyond the project’s more immediate outcomes. Their evaluations look only at the “before” and “after” situations. In this Note, the first definition of impact evaluation is adopted. Because it includes analysis of attribution, it is more rigorous in policy terms, while cost, time and other factors are taken into account so that the means of assessment remain feasible. With larger-scale and longer-term investments, rigorous impact evaluation also becomes more essential in terms of accountability in the use of public resources.

Why do you need to know about this?

All projects are encouraged to include an impact evaluation. Impact evaluations are usually done to evaluate results against objectives and indicators formulated at the beginning of a project (or programme) (see RBM). They are most useful when project heads or policy-makers are driving the discussion about what should be evaluated to inform policy decisions. They always have the potential to provide stakeholders with important lessons on which interventions have been most successful and under which conditions, and on how upscaling and replication may take place (see Scaling Up). Typically, for projects aimed at goals such as contributing to national food security and nutrition, decreasing child malnutrition rates and increasing household living standards, the following questions are good starting points:

What changes have occurred in the participating population since the beginning of the project?
What is the magnitude of any given change?
To what extent are these changes attributable to the project?
What different factors hindered or enabled the achievement of the impact(s)?

The exact methodology and the amount of resources allocated to a specific impact evaluation can differ depending on e.g. the size of the project and potential lessons to be learned. Though impact evaluation surveys that involve control groups can be more expensive, they are important as such comparison groups help to strengthen the analysis of attribution. Integrating complementary information from both impact evaluation and ongoing efforts in the project’s own monitoring and evaluation (M&E) system, as well as ensuring management's use of this information, are critical aspects of results-based management (see RBM) and crucial for future interventions. This Note will describe characteristics of good impact evaluations and suggest ways to address challenges.

To allow for the best possible impact evaluation results, it is necessary to begin considering the assessment approach and establishing concrete monitoring indicators during project formulation, which the impact evaluation will also use. A baseline study, which should precede implementation or be one of the first project activities, should be considered an essential part of the impact evaluation in addition to serving project monitoring purposes. The baseline study can be compared with one or more follow-up assessments during and towards the end of the project to understand the outcomes, and to help understand the impacts of the project on the target population. Depending on the type of intervention, it may be acceptable to carry out an impact evaluation directly at the end of the project. There are, however, various interpretations with regard to the time when impact is measurable – immediately after the project or a couple of years later.

Note: Even if you do not want to conduct a full impact evaluation with a counterfactual (e.g. because of limited funds), you should still conduct a baseline study because it provides the foundation for properly assessing outcomes.

What is a baseline study?

A baseline study is an analysis of the conditions of the beneficiaries or targeted areas (e.g. households, government trainees, policy environment) conducted BEFORE any project intervention starts. The baseline study should take into consideration how the project will be implemented, and the sampling strategy, survey instruments, etc. should be prepared accordingly. Well-designed baseline studies help managers and implementers think through indicators, implementation strategies and detailed targets at the beginning of the project. Baseline studies, when done well, can be expensive, so many projects do not end up doing them. Studies that focus specifically on the variables that can and should be affected by the project can help control costs. It should be noted that situational analyses carried out during the design stage, which may include various preparatory studies and surveys, are often called baseline studies. This Note focuses on studies and surveys that produce sufficiently sound and detailed data to ensure a strong end-of-project impact evaluation.

Considerations and challenges in conducting a baseline study

Developing a logically sound counterfactual. A quality impact evaluation must present a plausible argument that observed changes in outcome indicators after the project intervention are in fact due to the project (attributable) and not to other unrelated factors, such as improvements in the local economy or projects organized by other agencies.
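
One way to make this kind of argument is the double difference (difference-in-differences) comparison named in footnote 5: the change observed in the project group is compared with the change observed in a comparison group over the same period. The sketch below is illustrative only. The figures are invented, the function name is hypothetical, and the approach rests on the assumption that both groups would have followed the same trend without the project.

```python
from statistics import mean

# Hypothetical household income figures, invented for illustration only.
project_before = [100, 110, 90, 105]
project_after = [130, 140, 120, 135]
comparison_before = [95, 105, 100, 100]
comparison_after = [105, 115, 110, 110]


def double_difference(p_before, p_after, c_before, c_after):
    """Change in the project group minus change in the comparison group."""
    project_change = mean(p_after) - mean(p_before)
    comparison_change = mean(c_after) - mean(c_before)
    return project_change - comparison_change


estimate = double_difference(
    project_before, project_after, comparison_before, comparison_after
)
print(f"Estimated change attributable to the project: {estimate}")
```

In this invented example, incomes rose in both groups, so a simple before/after comparison of the project group alone would overstate the project's contribution.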

Cost considerations. Concerns have been raised that a lot of money is spent on baseline study surveys, which might not be justified in all cases. This concern often reflects negatively on the impact evaluation. The response to this argument must clarify that baseline studies are not ends in themselves. They should be designed with a longer-term perspective in mind, as “rigorous and systemic evaluations have the potential to leverage the impact of international organizations well beyond simply their ability to finance programs.” (Source: see box)

Experience also shows that a thorough quantitative impact evaluation often cannot rely on existing official data because of the specificities of the interventions. Relevant data might not be readily available at the disaggregated level. Sampling issues also occur, as characteristics of specific project beneficiaries may be very different from the averages reported in government data. Conducting formal sample surveys is the only reasonable solution for assessing effects of project outcomes on the economic or social welfare of beneficiaries.

Depending on the dimension of the project, the argument can be made that knowledge about good or poor functioning of certain projects extends well beyond the organization or the country implementing the programme and can therefore be considered an “international public good”.

In this respect, the development and application of different empirical strategies and methods is only one cost element². Transportation and labour costs related to administering questionnaires and analysing data should also be taken into consideration. Efficient oversight and supervision of the whole process is another important factor. Taking into consideration all the different elements, new data collection through household or other types of surveys can be costly, accounting in certain situations for over 60 percent of the necessary impact evaluation expenses. While usually desirable, the need for and affordability of statistically robust – and often expensive – surveys must therefore be carefully judged on a case-by-case basis.

Where possible, it is sensible to add project-specific surveys to existing national/area surveys, joining forces with other stakeholders rather than creating a new data collection facility, or to combine them with methods of participatory rural appraisal or rapid rural appraisal.

Examples of indicators:

Infrastructure: likelihood of sustainability of groups managing infrastructure formed/strengthened
Production: improved performance of service providers
Financial services: improved access to financial services for the poor
Enterprise development: employment opportunities created
Forestry: percentage increase in household income through sales of wood and non-wood products

Implementation arrangements. While monitoring and ongoing evaluation should normally be the responsibility of project managers, impact evaluations may often require the expertise and capacity of external specialists that is not available in government or implementing institutions. This may also help bring an external independent review and analysis. The necessary expertise for an impact evaluation should not be underestimated. Availability of sustained and competent technical support in such areas as statistics and econometrics is perhaps the most important contributor to the success or failure of the impact evaluation method. One additional advantage to contracting an external company is that it will offer continuity to the project’s evaluation “history”. However, to ensure continuity it is necessary to contract the same external support over a number of years. Despite this, project implementers should be cognisant of the fact that outsourcing can also create problems, for example if:

Terms of reference (ToRs) are written by people who have limited knowledge about impact evaluation. Specifying the approach for the evaluation can be the most challenging part of developing the ToR³;
There is insufficient capacity to supervise the contractor. One option for the project implementers to ensure minimum control is to make use of performance-based contracts with the partner institution.

How do you conduct effective impact evaluations?

Within the capacities and limitations of each project, three decisions must be taken prior to any evaluation⁴:

Data sources. Will you use existing secondary data or collect primary data through your own quantitative baseline survey?

Primary data collection is essential for the impact evaluation baseline survey as this is the only way a project will be able to show in a robust manner any lasting changes initiated through or caused by subsequent activities. Secondary data especially about the project area and target group can complement such findings. At this stage it is important to reiterate that projects should always conduct a separate baseline study at inception, to form part of the situational analysis. This can of course include field work, yet at a much smaller scale compared to the impact evaluation baseline survey.

Assessment approach. How will you combine quantitative and qualitative methods? What level of participation do you want?

Experimental vs. non-experimental

Approaches to an impact evaluation can be broadly divided between experimental and non-experimental approaches. Experiments refer to projects in which the treated and control groups are randomly assigned to ensure comparability. Non-experimental approaches are defined as those in which it cannot be reasonably assumed that the non-treated group is a perfect counterfactual for the treatment group⁵. However, there could be valid reasons for adopting a non-experimental or non-random selection of groups, based on the objectives of the evaluation and what it seeks to investigate.
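
As a minimal sketch of what random assignment means in practice, the example below splits a list of eligible units into treatment and control groups by shuffling. The village names, the function name and the seed are hypothetical. Real designs often stratify or assign at cluster level, which this sketch does not attempt.

```python
import random


def assign_groups(unit_ids, seed=2024):
    """Randomly split units into two groups of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced and audited
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical list of 20 eligible villages.
villages = [f"village_{i:02d}" for i in range(1, 21)]
groups = assign_groups(villages)
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides which group a unit joins, the two groups should on average be comparable at baseline, which is what allows the control group to serve as the counterfactual.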

Quantitative vs. qualitative

Both qualitative and quantitative approaches have their place in the project’s M&E system and it would be unwise to rely solely on one or the other. Qualitative approaches can be more informal and participatory, but should be conducted, and the results analyzed, following a structured approach (e.g. interviews should be structured or semi-structured to obtain comparable results). They should ideally be used in conjunction with quantitative methods. The choice of methods will depend on the questions of the evaluation. The predominant evaluation strategy adopted so far is grounded largely in quasi-experimental designs with control groups, using before- and after-project intervention data from large quantitative surveys. However, this strategy has proved difficult to implement in several projects, weakening the utility and reliability of the information gathered. In this situation, mixed methods can be useful because qualitative data is more flexible; it can address questions of ‘why,’ identify possible unexpected results, etc. However, in practical terms, the qualitative findings from mixed methods usually cannot be assessed for statistical significance because sample sizes are very small.

Assessment design. How can you design the assessment to make it as rigorous as possible?

Random vs. not random samples

For quantitative impact evaluations, a high degree of statistical rigor is generally essential; otherwise the attribution of impact could be compromised. The random assignment procedure allows for creating an equivalent control group and thereby avoiding selection bias. With regard to random sample precision, one can say that the larger the sample, the higher the level of precision⁶. The higher the ratio of the sample size to the population size, the more precision (once the ratio exceeds 1:10). The less variance in variables of interest (for example, the share of undernourished children in a village), the more precision.

In practice, difficulties in impact evaluations may arise from the lack of suitable comparison groups, poorly identified or too small samples, and failure to include comparison groups in either baseline or impact assessment surveys. Some loss of statistical rigor is therefore often accepted as being outweighed by the gain in understanding of how projects work, which parts work best and why they work in a given context.

For projects that feel confident calculating their own sample size, see related links.
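
As a rough illustration of how the considerations listed in footnote 6 interact, the sketch below applies a widely used textbook approximation for estimating a proportion: n = design effect × z² × p(1 − p) / e². The function name and the input values are hypothetical. The formula sizes a sample for estimating a single prevalence with a given margin of error; it is not a power calculation for detecting a difference between treatment and control groups, which generally needs a statistician's input.

```python
import math
from statistics import NormalDist


def required_sample_size(prevalence, margin_of_error, confidence=0.95, design_effect=1.0):
    """Approximate sample size needed to estimate a proportion."""
    # z-score for a two-sided confidence interval (about 1.96 at 95 percent)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_simple = (z ** 2) * prevalence * (1 - prevalence) / margin_of_error ** 2
    # The design effect inflates the sample to account for clustered sampling.
    return math.ceil(design_effect * n_simple)


# Hypothetical inputs: 30 percent expected prevalence, margin of error of 5 percentage points.
simple_n = required_sample_size(0.30, 0.05)
clustered_n = required_sample_size(0.30, 0.05, design_effect=2.0)
print(simple_n, clustered_n)
```

Tightening the margin of error or raising the confidence level increases the required sample, which is the cost trade-off discussed above.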

Using impact evaluations at the national investment plan level

Country evaluations (in the form of country sector assessments, sector impact evaluations, or strategic evaluations) are focused on the combined performance of a group of related projects or other activities that are aimed at the same development objective within a developing country. In other words, they assess the effectiveness of a set of related activities in terms of achieving specific country-level development results, usually of a selected sector or sub-sector. Such evaluations may attempt to compare and assess the relative effectiveness of the different project intervention strategies aimed at the same objective, including their synergies and potential conflicts or tradeoffs. The main difficulty they face is to identify a counterfactual to start with.

In the case of the Comprehensive Africa Agriculture Development Programme (CAADP), there have been various initiatives to quantify the impact of CAADP on e.g. agricultural expenditure, agricultural value-added, land and labour productivity, income, and nutrition. This is always done using country-level data (International Food Policy Research Institute discussion paper). Another example is the impact evaluation of small and medium enterprise programmes in Mexico. Impact evaluations at this level mostly refer to sector-wide changes and are mirrored in secondary data such as the Living Standards Measurement Study or an agricultural census survey.

Using impact evaluations at the subproject level

When a project setting involves a national investment facility or fund with established criteria for how sub-projects can apply for this funding (e.g. through matching grants) it is slightly more complex to assess the impact at project level due to the heterogeneity of the activities assessed. As a general rule, it is easier to conduct a quantitative randomized control trial if the group of beneficiaries is fairly homogenous. In the case of an investment fund involving various agribusiness enterprises it should be feasible to review the profit levels of the individual companies. The added difficulty in this case is to select the counterfactual. One option could be to review the scoring of all applicants and include in the counterfactual those that barely missed the necessary score to receive funding. Heterogeneity always adds complication and diverse methods should be applied.
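
The option of using applicants that barely missed the funding score can be sketched as follows. The applicant records, the cutoff of 70 and the bandwidth of 5 points are all hypothetical, as is the function name. The sketch only selects the two groups; it does not estimate impact, and how wide the band below the cutoff should be is a judgment call for the evaluator.

```python
def split_by_score(applicants, cutoff, bandwidth):
    """Return (funded IDs, near-miss IDs) based on application scores."""
    funded = [a["id"] for a in applicants if a["score"] >= cutoff]
    # Near misses: unfunded applicants scoring within `bandwidth` points below the cutoff.
    near_miss = [
        a["id"] for a in applicants if cutoff - bandwidth <= a["score"] < cutoff
    ]
    return funded, near_miss


# Hypothetical applicants to an investment fund.
applicants = [
    {"id": "A", "score": 82},
    {"id": "B", "score": 71},
    {"id": "C", "score": 69},
    {"id": "D", "score": 66},
    {"id": "E", "score": 50},
]

funded, comparison = split_by_score(applicants, cutoff=70, bandwidth=5)
print("Funded:", funded)
print("Comparison group:", comparison)
```

Applicants far below the cutoff are left out because they are likely to differ from funded enterprises in ways that would undermine the comparison.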

How should results of the analysis of an impact evaluation be used?

Impact evaluations can add great value to the project learning process if decision makers at various levels make use of the information generated⁷. In that case they can be a good source to inform future adaptation of implementation approaches (see Scaling Up). However, it is also important to note that projects will often struggle with impact evaluations, because the project managers who control them are also stakeholders who want to show positive results. Hence development success depends very much on fostering a results-oriented culture among key stakeholders from the very beginning of any operation.

Footnotes

¹The pre-test/post-test control group comparison represents the counterfactual – what would have happened to the project population if the project had not taken place.

² Criteria for selecting a data collection method and source can be found in www.oecd.org/development/evaluation/1886527.pdf, p.41.

³For more information on ToRs see:
siteresources.worldbank.org/EXTEVACAPDEV/Resources/ecd_writing_TORs.pdf
http://www.managingforimpact.org/resource/baseline-study-guidelines

⁴This text does not explain all the different data collection and impact evaluation methods in detail. For further reference see useful resources and external links or e.g. www.adb.org/documents/impact-evaluation-methodological-and-operational-issues.

⁵In reality, experimental designs are difficult, and a review of World Bank impact evaluations of agricultural projects shows that only 6% used experimentally designed approaches whereas the remainder used non-experimental approaches. In these cases, empirical methods allow for a control group to be created that represents a reasonable counterfactual. Some common non-experimental approaches include difference-in-differences (or double difference), propensity score matching, regression discontinuity and instrumental variable estimation.

⁶Sample size selection requires such considerations as: (i) the desired level of significance or degree of confidence; (ii) the estimated prevalence of indicator of interest (e.g. chronic malnutrition); (iii) the amount of precision required or acceptable margin of error; (iv) the estimated size of the design effect.

Key Resources

Overview of methods for baseline assessments (FAO M&E Technical Advisory Notes Series)	Covers quantitative, qualitative and mixed methods as well as tools and participatory approaches that can be used for a baseline (or follow up) assessment, taking into account the stakeholders’ information needs and the overall programme context.
Designing Impact Evaluations for Agricultural Projects (IADB, 2010)	Provides suggestions on designing impact evaluations for agricultural projects, particularly projects that directly target farmers, and seek to improve agricultural production productivity and profitability.
Conducting quality impact evaluations under budget, time and data constraints (World Bank, 2006)	Provides advice to those planning an impact evaluation to select the most rigorous methods available and clarifies the nature of trade-offs between evaluation rigor and the budget, time and data available for an evaluation.
Using Mixed Methods in M&E: Experiences from International Development (World Bank, 2010)	Reviews the main challenges and opportunities for incorporating mixed method approaches into research and evaluation on the effectiveness and impacts of international development.
E-learning course on "Qualitative Methods for Assessing the Impact of Development Programmes on Food Security" (FAO, 2013)	Provides guidance and assists managers and monitoring and evaluation officers in how to use qualitative methods in conducting the assessment of food security and nutrition impact of development programmes.
E-learning course on "Assessing Impact of Development Programmes on Food Security" (FAO, 2010)	Presents latest information on impact assessment within the context of development programmes that address food insecurity.
Module 4: impact evaluation at household level (FAO/ Land Administration Projects Platform)	Provides a conceptual framework, practical guidance and factsheets to carry out impact evaluation at household level regarding improvement in land tenure security.
Combining quantitative (formal) and qualitative (informal) survey methods (DFID, 2001)	Offers practical assistance for field staff and project managers in selecting the most appropriate data collection and analysis methods.

*These documents are Unit chapters from a postgraduate distance learning module – P534 Project Planning and Management - produced by the Centre for Development, Environment and Policy of SOAS, University of London. The whole module, including study of the role of projects in development and financial and economic cost-benefit analysis, is available for study as an Individual Professional Award for professional update, or as an elective in postgraduate degree programmes in the fields of Agricultural Economics, Poverty Reduction and Sustainable Development, offered by the University of London. For more information see: http://www.soas.ac.uk/cedep/
These documents are made available under a Creative Commons ‘Attribution - Non Commercial – No Derivatives 4.0 International licence’ (CC BY-NC-ND 4.0).