Study of WaPOR data use
Yield analysis of the Mwea Irrigation scheme
Figure 1: Mwea irrigation scheme in the WaPOR portal with layers that were used in this study (NPP and AETI)
Introduction by the Kenyan National Irrigation Authority
"Having reviewed the analysis report presented, I would like to begin by commending [the WaPOR team] for the quality work delivered. The findings accurately reflect the rice crop phenology and productivity patterns across the five scheme sections, consistently aligning with the cropping calendar and historical production data. The scatter plot presented in the report reinforces this observation, as most data points cluster closely around the 1:1 line (line of equality), indicating strong agreement between the two datasets.
Since its establishment in 1954, the Mwea Irrigation Scheme has undergone progressive expansion, and it currently comprises approximately 12,300 hectares under active irrigation. The National Irrigation Authority (NIA), mandated under the Irrigation Act of 2019, continues to oversee the management of public irrigation schemes nationwide. This includes capacity building of stakeholders, and the planning and execution of operation and maintenance activities to ensure efficient water delivery, infrastructure functionality, and sustainable agricultural production.
The open-source datasets available through WaPOR version 3 have been valuable data information to the scheme’s water management department, particularly in irrigation scheduling and supporting crop yield estimation. The reliability of these datasets, enhanced by the WaPORIPA tool, has made it possible to access to NPP, AETI, and transpiration data, to enable accurate crop-water productivity assessments for routine decision-making. It is important to note that, since 2020, the scheme has been undergoing robust expansion works. As a result, several spatial datasets and shape files are still being updated to accurately reflect the expanded command area. The updated boundaries are essential for ensuring that remotely sensed analyses capture (crop and water) productivity for not only the original scheme boundaries but also across the entire developed areas. The expansion of the scheme created additional pressure on water distribution, making it difficult to meet the irrigation needs of all farmers concurrently. In response, the scheme through raft of capacity building and stakeholder engagement through the Capacity Development Project for Enhancement of Rice Production in Irrigation Schemes introduced Alternate Wetting and Drying to optimize water use measures. Clustering units and sections during the planting season (dividing the command area into four groups) was necessary due to infrastructural limitations in handling large water volumes conveyance while implementing the cropping program. Consequently, the Start of Season (SOS) and End of Season (EOS) for different units and sections may have mismatched, as noted by the author.
In conclusion, the analysis is the true reflection of expected outcomes especially on the larger scale (scheme section). We will endeavour to utilise the open source data to assess performance of irrigation schemes with cost effective and reliable tools. Your support and partnership are greatly appreciated."
Eng Jairus Serede, Director Irrigation Management Services at National Irrigation Authority
Table of contents
Introduction by NIA
Background and study objectives
Study area
Abbreviations
Methodology
- Data
- Study period
- Analysis
Results
- Is WaPOR data able to capture the crop production dynamics over the course of a season?
- How does yield derived from WaPOR data perform against local yield data?
Background and study objectives
The Mwea Irrigation Scheme (MIS) is a public scheme that is located in the agrarian county of Kirinyaga in Kenya. It is the country's largest and is central to its rice production. The Mwea Irrigation Development Project (MIDP) lies within the MIS. It is a national flagship project for its contribution to poverty reduction and the improvement of food security. The component of the project that is centered on the scheme aims to enhance the production of rice through water usage optimization by way of improvements to the irrigation and drainage systems using water from the nearby Thiba dam, as well as through improvement in scheme operation and management.
According to the National Irrigation Authority (NIA), 30 600 acres are currently under paddy rice production in the MIS. The NIA recognizes that remote sensing can play a crucial role in monitoring such a vast area: ground-based crop monitoring across large irrigation schemes is labor-intensive and costly, whereas validated remote sensing approaches can be scaled rapidly to monitor agricultural productivity over large extents. Kenya is a WaPOR partner country, and the MIS is one of the pilot locations selected by the NIA, in conjunction with other Kenyan partners, to benefit from 10 m resolution WaPOR data to facilitate this monitoring.
The aim of this study was to use WaPOR v3 data to calculate the yield for the four main growing seasons between 2019 and 2022 in the MIS, in order to answer the following questions:
- Is WaPOR data able to capture the crop production dynamics over the course of a season?
- How does yield derived from WaPOR data perform against measured field data?
- What are other paths to further establish the connection between WaPOR data and what is happening on the ground?
Study area
The MIDP has a gazetted area of 30 350 acres, of which 26 000 acres are dedicated to paddy rice cultivation. The main scheme in the gazetted area is divided into five main sections (hydraulic and administrative units): Karaba, Mwea, Tebere, Thiba, and Wamumu. These sections are further divided into 47 blocks in total. The scheme is located along the drainage basins of the Nyamindi and Thiba rivers, which also supply the irrigation water to the scheme (figure 1).
Kenya experiences two rainy seasons that are linked to the kusi monsoon wind patterns:
- the long rains from March to May; and
- the short rains from October to December.
The kaskazi winds bring drier conditions. The Thiba Dam was built to provide a year-round water supply for the MIS, expanding irrigation and allowing for double cropping.
Abbreviations
| Abbreviation | Meaning |
| --- | --- |
| AETI | Actual Evapotranspiration and Interception |
| AOI | Area of Interest |
| AOT | Above-ground Over Total biomass production (ratio) |
| EOS / SOS | End of Season / Start of Season |
| FAO | Food and Agriculture Organization (of the United Nations) |
| fc | Light use efficiency correction factor |
| HI | Harvest Index |
| MAE | Mean Absolute Error |
| Mc | Moisture content factor |
| MIDP | Mwea Irrigation Development Project |
| MIS | Mwea Irrigation Scheme |
| NIA | National Irrigation Authority |
| NPP | Net Primary Production |
| nRMSE | Normalized Root Mean Square Error |
| PBIAS | Percent Bias |
| QGIS | Quantum Geographic Information System |
| TBP | Total Biomass Production |
| WaPOR | Water Productivity through Open access of Remotely sensed derived data |
| Y | Yield (harvestable crop amount) |
Methodology
Figure 2: Methodology workflow
Data
WaPOR data: version 3 of the dataset (available in the portal since the beginning of 2024, with data going as far back as 2018) was acquired from the WaPOR portal to carry out the analysis. The layers used were:
- AETI (actual evapotranspiration and interception): the combined water loss from a surface through evaporation from soil and water bodies, transpiration from plants, and interception of water by vegetation canopies, expressed in millimeters of water per unit of time (mm/month*); and
- NPP (net primary production): the amount of carbon that vegetation captures through photosynthesis after subtracting the carbon used for plant respiration, expressed as grams of carbon per square meter per unit of time (gC/m²/dekad*). In essence, NPP refers to how much the plants grow and accumulate biomass over the observed time frame.
* The data was downloaded at the monthly and dekadal (10 days) time-steps, then aggregated to the period of study: from May to December.
Local data:
- yield data by section, which was used for the validation. The field data used in this analysis was collected through a multi-stage stratified random sampling approach: sections served as strata, from which blocks/units were randomly selected, followed by a random selection of farms. The exact methodology for aggregating sample plots to section-level yields is not clearly documented in the public sources we consulted.
- boundaries of the irrigation scheme (shapefile).
This data was provided by the NIA, which collaborated on this study and greatly facilitated its execution.
Study period
The MIS practices 2 to 2.5 rice growing seasons annually. The main season runs from July (sometimes earlier) to December. A secondary, shorter season occurs from March to July, and a ratoon crop can also be grown between December and February. Based on reports by the NIA for the period of interest, the growing period considered spanned from 01 May to 15 December.
The multi-year approach adopted in this study compares these main growing seasons across four years: from 2019 to 2022. This period was chosen due to the availability of field data from the NIA.
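The season window described above (01 May to 15 December) does not end on a dekad boundary, so some rule is needed when dekadal values are summed to a seasonal total. The following is a minimal sketch of one possible rule, not the procedure used in the study: it assumes each value is a per-dekad total, that the season falls within a single calendar year, and that a partially covered dekad contributes in proportion to the days inside the window. The function names `dekads` and `seasonal_total` are hypothetical.

```python
import calendar
from datetime import date


def dekads(year):
    """Yield (start, end) dates for the 36 dekads of a calendar year."""
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, 10)
        yield date(year, month, 11), date(year, month, 20)
        yield date(year, month, 21), date(year, month, last_day)


def seasonal_total(value_by_dekad_start, season_start, season_end):
    """Sum per-dekad totals over a season within one calendar year.

    Assumption (not stated in the study): a dekad only partly inside
    the season is weighted by the share of its days inside the window.
    """
    total = 0.0
    for start, end in dekads(season_start.year):
        overlap_start = max(start, season_start)
        overlap_end = min(end, season_end)
        days_inside = (overlap_end - overlap_start).days + 1
        if days_inside <= 0:
            continue  # dekad lies entirely outside the season
        dekad_length = (end - start).days + 1
        total += value_by_dekad_start[start] * days_inside / dekad_length
    return total


# Illustrative input only: a constant 10 units per dekad in 2022.
example_values = {start: 10.0 for start, _ in dekads(2022)}
example_total = seasonal_total(
    example_values, date(2022, 5, 1), date(2022, 12, 15)
)
```

With this rule the window covers 22 full dekads plus half of the second dekad of December.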
Equations 1 and 2: formulas for calculating total biomass production and yield
Analysis
The analyses were conducted using Python and QGIS. The work done in Python relied on the freely available WaPORIPA 3 repository developed by IHE Delft, a partner of the WaPOR project. QGIS was used for the data preparation.
The formulas used to calculate TBP (total biomass production) and Y (yield) are given in equations 1 and 2.
Terms of the equations:
- TBP, in tons/ha, expresses the total amount of dry matter produced over the season;
- NPP, in gC/m²/season, expresses how much carbon crops actually keep and store as dry matter after using some of their energy for basic survival functions such as respiration. It is the net growth they achieve;
- 45 is a conversion factor that transforms NPP into TBP by accounting for the fact that carbon typically makes up about 45% of plant dry matter;
- 1000 converts the units from gC/m2/season to tons/ha;
- HI is the harvest index; it is unitless. It is the ratio that shows what fraction of the above-ground biomass ends up as harvestable crop;
- AOT is the ratio of above-ground over total biomass production, also unitless. It gives the share of plant matter that grows above the soil surface (stems, leaves, branches, and the harvestable parts like grains or fruits) and excludes what is underground (the roots);
- fc is the light use efficiency correction factor, which adjusts the biomass calculation to account for local conditions that might affect how efficiently plants convert sunlight into biomass. This factor was obtained from the literature;
- Mc is the moisture content factor, which converts dry matter biomass to fresh weight biomass by accounting for the water content in the harvested crop. The Mc is crop-specific and was taken from the literature;
- Y, in tons/ha/season, expresses the amount of harvestable crop: it is the useful part of the biomass that can be sold or consumed.
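The terms listed above can be combined as in the sketch below. Because the equations appear only as an image, this is an assumed form that follows the term definitions and the structure commonly seen in WaPOR water-productivity scripts; it is not a transcription of the study's equations. It assumes that Mc is the water fraction of the fresh harvested product, so that dry matter is converted to fresh weight by dividing by (1 - Mc). The function names and all parameter values in the example are hypothetical placeholders, not the values used for rice in the MIS.

```python
def total_biomass_production(npp_gc_m2):
    """Convert seasonal NPP (gC/m2/season) to TBP (tons dry matter/ha/season)."""
    carbon_fraction = 0.45   # carbon is roughly 45% of plant dry matter
    kg_ha_per_g_m2 = 10.0    # 1 g/m2 equals 10 kg/ha
    kg_per_ton = 1000.0
    return npp_gc_m2 * kg_ha_per_g_m2 / carbon_fraction / kg_per_ton


def crop_yield(tbp_t_ha, hi, aot, fc, mc):
    """Estimate fresh-weight yield (tons/ha/season) from TBP.

    Assumed form: Y = HI * AOT * fc * TBP / (1 - Mc), where Mc is the
    water fraction of the fresh harvested product (assumption).
    """
    if not 0.0 <= mc < 1.0:
        raise ValueError("mc must be in the range [0, 1)")
    return hi * aot * fc * tbp_t_ha / (1.0 - mc)


# Placeholder inputs for illustration only (not the study's parameters).
example_tbp = total_biomass_production(450.0)
example_yield = crop_yield(example_tbp, hi=0.5, aot=0.8, fc=1.0, mc=0.2)
```

Keeping the two steps separate makes it easy to test how sensitive the final yield is to each literature-based parameter.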
Results
Is WaPOR data able to capture the crop production dynamics over the course of a season?
Figure 3a: AETI time series (view 2: inter-year trends)
To answer this question, the AETI (total water lost from crops and soil through evaporation, transpiration, and rainfall intercepted by the canopy) of the irrigated area within the area of interest (AOI) was examined for each month of the growing seasons over the 4-year study period.
The AETI time series for the main growing seasons in 2019, 2020, 2021 and 2022 (figure 3) demonstrates a consistent seasonal pattern that closely aligns with the rice crop calendar.
- 1️⃣: AETI values are typically low during land preparation and early planting phase (May–June),
- 2️⃣: rise sharply during the vegetative and reproductive phases (July–October),
- 3️⃣: and gradually decline during crop maturity and harvest (November–December).
This trend shows that the phenological stages of rice growth are captured well, and it is observed consistently across all four years. All five sections within the irrigation scheme (Karaba, Mwea, Tebere, Thiba, and Wamumu) follow this general trend, with variations in magnitude and in the timing of the stages (1️⃣, 2️⃣ and 3️⃣).
Notably, Wamumu and Karaba seem to have consistently higher AETI values (4️⃣) despite their smaller land area. This is potentially due to better irrigation infrastructure, more intensive cultivation, or a higher proportion of irrigated rice fields.
For the 2022 and 2023 seasons, however, a slight increase in AETI is observed toward the end of each main season (October–December) (5️⃣). This could be attributed to the onset of the short rains in Kenya, residual soil moisture, or continued irrigation, including land preparation for the subsequent season. These factors contribute to sustained evapotranspiration even after the crop reaches physiological maturity.
The high AETI values observed at the start of the 2022 and 2023 seasons can be explained by several potential factors. The start and end of season dates (SOS and EOS) used in this study were taken from NIA reports, and may correspond to planned dates rather than actual dates, which typically vary from year to year. This could mean that the seasons in these two years started somewhat later (6️⃣) on the ground. The implication is that the AETI and NPP values (from which yield is derived) may include a short period that belongs to the previous season, which has the potential to bias the subsequent yield analysis for those two years.
Since double cropping is practiced in the MIS, the initial decline in AETI followed by an increase likely reflects a chosen SOS that includes the late crop stages of the preceding cropping cycle.
However, due to the limited spatial granularity of the validation data, these interpretations are based primarily on temporal trends observed in the AETI datasets. Furthermore, satellite-derived AETI estimates may also capture evapotranspiration from non-crop vegetation, standing water, or fallow fields with residual moisture, potentially leading to overestimation during the initial phases of the crop growth cycle.
Overall, AETI time series provide valuable insights into crop phenology.
Figure 3b: AETI time series (view 1: individual years)
How does yield derived from WaPOR data perform against local yield data?
Figure 5a: bar chart of WaPOR yields against field yields by year
To answer this question, the section-level yield data provided by the NIA was compared with the yield estimated from WaPOR data for each main growing season over the 4-year study period. The WaPOR-based seasonal yield was obtained by summing the dekadal data over the season.
Several metrics were calculated to assess the relationship between estimated and observed yields over the 4-year period:
- Percent bias (PBIAS = -0.45%): the PBIAS measures the average tendency of predictions to be larger or smaller than observations. The value of -0.45% is close to zero, indicating that WaPOR estimates are, on average, virtually unbiased compared to field yields, with a very slight tendency towards underestimation.
- Mean error (ME = -0.03 t/ha): the mean error is the average difference between predictions and observations (without taking absolute values). At -0.03 t/ha, it confirms minimal systematic bias: overestimations and underestimations largely cancel out.
- Mean absolute error (MAE = 0.795 t/ha): the MAE expresses the average magnitude of prediction errors in the same units as yield. With typical yields around 6 t/ha (as shown in Figure 5), an MAE of 0.795 t/ha represents an error of approximately 13%, meaning that individual predictions deviate from observed values by about 0.8 t/ha on average.
- Normalized root mean square error (nRMSE = 15.30%): the nRMSE quantifies error magnitude relative to the mean observed yield and, unlike the MAE, penalizes larger errors more strongly. At 15.3%, it indicates moderate predictive accuracy: reasonable for yield estimation models, but with a non-negligible error that should be considered when using these estimates for decision-making. For average yields of roughly 6 t/ha, this corresponds to an RMSE of about 0.92 t/ha. If the errors are approximately normally distributed, about 68% of estimates fall within ±0.92 t/ha of the true yield (roughly 5.1 - 6.9 t/ha) and 95% fall within ±1.8 t/ha (roughly 4.2 - 7.8 t/ha).
- Standard deviation of yields (WaPOR = 0.456 t/ha vs. observed = 0.490 t/ha): the similarity in standard deviations indicates that WaPOR reproduces not only the mean yields but also the spread of yields across sections and seasons, which is important for understanding local and temporal yield dynamics.
Overall, the model demonstrates minimal systematic bias with moderate prediction errors of approximately 15% relative error, which represents typical performance for remote sensing-based yield estimation.
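For readers who want to reproduce this kind of comparison, the metrics above can be computed as in the following minimal sketch. The exact formulas used in the study are not given, so the conventions here are assumptions: errors are taken as estimated minus observed (so that a negative PBIAS means underestimation, consistent with the interpretation above), and the RMSE is normalized by the mean observed yield. The function name `validation_metrics` and the example numbers are hypothetical.

```python
import math


def validation_metrics(estimated, observed):
    """Return PBIAS (%), ME, MAE and nRMSE (%) for paired yield values.

    Assumed conventions: error = estimated - observed; nRMSE is the
    RMSE divided by the mean of the observed values.
    """
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [e - o for e, o in zip(estimated, observed)]
    mean_observed = sum(observed) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    return {
        "PBIAS_%": 100.0 * sum(errors) / sum(observed),
        "ME": sum(errors) / n,
        "MAE": sum(abs(d) for d in errors) / n,
        "nRMSE_%": 100.0 * rmse / mean_observed,
    }


# Made-up yields in t/ha, for illustration only (not the study data).
example_metrics = validation_metrics(
    estimated=[5.0, 6.0, 7.0], observed=[6.0, 6.0, 6.0]
)
```

In this toy example the over- and underestimations cancel, so PBIAS and ME are zero while MAE and nRMSE are not, which is the same pattern the reported metrics show.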
Figure 4: scatter plot of WaPOR-based yield estimate compared to the field measured yields
Concluding remarks
Diagnosing the source of the differences in yield estimates between field data and WaPOR data is not an easy task, as the sources of uncertainty are varied.
One possible source of uncertainty is the use of literature-based parameters, such as the HI, the AOT, the fc, and the Mc (all variables that are necessary for the conversion of NPP into TBP and then into yield, as described in the "Analysis" subsection of the "Methodology"). The literature-based values may not fully reflect the local conditions in the MIS.
It is noteworthy that this analysis was performed using external data provided by the NIA. This means that we do not have a full understanding of the local data collection procedure and of the potential biases that might be embedded in it. Having validation data from the field is extremely useful for understanding how remote-sensing data performs against real-world observations. A systematic study of this relationship is key to increasing the maturity of a dataset, which is characterized, among other things, by well-known and documented limitations and uncertainties. A mature dataset is one where users have enough information about its performance against field conditions to clearly determine its usability in different contexts, particularly their own.
Yet validation data is not immune to uncertainties. Field data collection methods can introduce sampling biases, measurement errors, or temporal mismatches with satellite observations, while the representativeness of ground measurements may be limited by the specific locations, timing, or methodologies used. Without detailed metadata about field data collection protocols, measurement uncertainties, and quality control procedures, it becomes difficult to distinguish limitations in the remote sensing methodology from issues with the validation data itself. The quality and reliability of validation data directly influence how effectively remote sensing products can be assessed and improved: inconsistent or poorly documented field measurements can mask the true performance of satellite-derived estimates, potentially leading to either overconfidence or unwarranted skepticism about the dataset's capabilities.
These compounding uncertainties in both remote sensing and validation data create significant challenges for operational applications, particularly in contexts where decision-makers require clear confidence intervals and well-characterized error bounds for agricultural monitoring and yield forecasting. This underscores the importance of transparent data sharing practices and comprehensive documentation throughout the validation process.
The apparent mismatch between the start of season and end of season (SOS and EOS) dates and the phenological progression of the AETI, pointed out in the first subsection of the "Results" section, can also be a significant source of error in the WaPOR-based estimates. Since yield calculations depend on cumulative biomass production over the growing season, even small temporal shifts in the integration window can compound into significant errors in the final estimates. Early-season misalignment may incorrectly attribute land preparation activities to crop growth, while late-season errors might include post-harvest field conditions or the establishment of the subsequent crop in the calculations. At section level, given that MIS section areas vary between ~1,325 ha (Karaba) and ~2,479 ha (Tebere), these shifts can have a considerable impact on the final results as the uncertainty compounds. Shifting from scheduled start and end of season dates to dates that are informed by the data itself might be a step towards reducing that uncertainty. Moreover, as demonstrated in this study, WaPOR data successfully mirrors the phenological changes of the rice crops. It could therefore be used to cross-check the scheduled SOS and EOS dates reported.
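As an illustration of what data-informed season dates could look like, the sketch below applies a simple amplitude-threshold rule to a single-season time series. This is a generic approach chosen here for illustration, not a method used or validated in this study: the function name, the 30% threshold and the input values are all hypothetical, and the series is assumed to contain one clear seasonal peak.

```python
def threshold_season_bounds(values, fraction=0.3):
    """Return (sos_index, eos_index) for a single-peak time series.

    The season is taken as the contiguous run around the peak in which
    values stay at or above: minimum + fraction * (maximum - minimum).
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    threshold = low + fraction * (high - low)
    peak = values.index(high)

    # Walk backwards from the peak while values stay above the threshold.
    sos = peak
    while sos > 0 and values[sos - 1] >= threshold:
        sos -= 1

    # Walk forwards from the peak while values stay above the threshold.
    eos = peak
    while eos < len(values) - 1 and values[eos + 1] >= threshold:
        eos += 1
    return sos, eos


# Made-up AETI-like values per time step, for illustration only.
example_series = [40, 35, 50, 90, 120, 130, 110, 70, 45]
example_bounds = threshold_season_bounds(example_series, fraction=0.3)
```

The returned indices can be mapped back to the dates of the time steps and compared with the scheduled SOS and EOS. With double cropping, the series would first need to be split so that each window holds a single season.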
In addition, working at a temporal scale finer than monthly, which WaPOR affords through its dekadal data (every 10 days), should be considered as a way to add detail on crop growth dynamics, even if the aggregated seasonal results would not be affected by this approach. This could provide:
- a more nuanced understanding of critical phenological transitions;
- a clearer picture of short-term water stress events, irrigation cycles, or weather impacts;
- a more precise alignment between satellite observations and actual farm management practices;
- a better foundation for identifying anomalous periods or sudden changes; and
- a more robust basis for calibrating and refining the algorithms used in biomass and yield calculations, particularly for identifying the integration periods that best capture actual crop growth phases rather than relying on generalized seasonal windows.
References
FAO WaPOR. Accessed May 10, 2024. https://data.apps.fao.org/wapor/?lang=en
WaPORIPA: Standardized protocol for irrigation performance assessment using WaPOR data. Updated WaPORWP for WaPOR version 3 and pyWaPOR outputs. Published 2024. Accessed October 8, 2024. https://github.com/wateraccounting/waporipa
WaPOR V3. WaPOR data component and methodology: Net Primary Production (NPP). Published October 12, 2022. Accessed September 8, 2024. https://bitbucket.org/cioapps/wapor-et-look/wiki/WaPOR_data_components_and_methodology/NPP