3. Multi-environment yield trials

3.1 Types of trials and requirements of the generated information

Regional yield trials are networks of experiments in which a set of cultivars is assessed, usually to make genotype recommendations. In this context, trials are typically research-managed, comprise 6 to 15 genotypes, are conducted in 5 to 10 locations, and are laid out in a randomized complete block design with 2 to 4 replicates, although more complex designs are sometimes adopted (Shaner et al., 1982; Hildebrand and Poey, 1985).

These features, well suited to trials carried out at research stations or on experimental farms, may change when experiments are performed on farmers’ or village land, whether as research-managed or farmer-managed trials. In this increasingly common situation (Ashby et al., 1995), the number of test locations may increase, whereas the experiments may have fewer or no replicates in order to reduce the number of plots per site. However, unreplicated experiments carried out at the same time by different farmers in the same location may be treated as the blocks of a randomized complete block experiment on that site, so that the data can be analysed as trials replicated within each location.
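The approach just described can be sketched for a single location, assuming that every farmer grows each genotype once, so that each farmer's field acts as one complete block. The yields are hypothetical, and `rcb_anova` is an illustrative helper written for this example, not a function from any package.

```python
def rcb_anova(yields):
    """One-location randomized complete block ANOVA.

    yields[g][b] is the yield of genotype g in block b (here: farmer b's field).
    Returns degrees of freedom, mean squares and the F ratio for genotypes.
    """
    g = len(yields)
    b = len(yields[0])
    grand = sum(sum(row) for row in yields) / (g * b)
    geno_means = [sum(row) / b for row in yields]
    block_means = [sum(yields[i][j] for i in range(g)) / g for j in range(b)]

    ss_total = sum((yields[i][j] - grand) ** 2 for i in range(g) for j in range(b))
    ss_geno = b * sum((m - grand) ** 2 for m in geno_means)
    ss_block = g * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_geno - ss_block  # genotype x farmer residual

    df_geno, df_block = g - 1, b - 1
    df_error = df_geno * df_block
    ms_geno = ss_geno / df_geno
    ms_error = ss_error / df_error
    return {
        "df": (df_geno, df_block, df_error),
        "ms_genotype": ms_geno,
        "ms_block": ss_block / df_block,
        "ms_error": ms_error,
        "F_genotype": ms_geno / ms_error,
    }

# Hypothetical yields (t/ha): three genotypes (rows) grown by four farmers (columns).
field_yields = [
    [3.1, 2.8, 3.6, 3.0],
    [3.5, 3.2, 3.9, 3.4],
    [2.6, 2.5, 3.1, 2.7],
]
result = rcb_anova(field_yields)
print(result["df"], round(result["F_genotype"], 1))
```

The farmer effect is removed as a block effect, and the genotype × farmer residual serves as the experimental error for that site.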

Multi-environment yield trials may also be performed in the final stage(s) of genotype selection in a breeding programme. In this case, the data set may include a relatively large number of breeding lines together with a few control cultivars. Here, too, the experiments may be hosted by research institutions or by farmers, with farmer-managed trials being the usual choice in participatory plant breeding schemes (Witcombe et al., 1996; McGuire et al., 1999).

Certain requirements should be fulfilled by the set of yield trials to permit reliable inference from the data in relation to the specific aim (i.e. genotype selection or targeting). When the data are also used to support decisions on the strategy of the breeding programme, the requirements become more stringent, since they must cope with a larger inference space. These requirements are discussed below.

Test locations

The population of sites within the target region should be adequately represented (Cooper and Hammer, 1996b). Locations are rarely chosen at random, as trials tend to be performed where collaborating institutions or interested villages and farmers are present. Including test sites where selection work can easily be carried out, such as breeding stations or experimental farms, is useful for defining optimal selection environments. The sample of sites, however, should encompass the major cropping areas and farming practices in the target region, so as to reflect the variation in climatic, soil, biotic and crop management factors. For example, sample sites could represent different areas and cropping systems according to a stratified sampling strategy. A proportional allocation criterion, under which the areas and farming practices of greater importance for the crop are better represented, is more appropriate for supporting decisions on the breeding strategy. The total number of test sites may vary with the extent of the region and the variation in environmental factors, but it should probably not fall below 6 or 7. As a general rule, trials at research stations should be carried out under management conditions as similar as possible to those of the farmers’ environments they are meant to represent.
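Proportional allocation within a stratified sample of sites can be sketched as follows. The stratum names and crop areas are hypothetical, and the largest-remainder rounding used to turn fractional quotas into whole sites is one assumed choice among several.

```python
import math

def allocate_sites(total_sites, stratum_area):
    """Allocate test sites to strata in proportion to crop area.

    stratum_area maps stratum name -> crop area (any consistent unit).
    Fractional quotas are rounded with the largest-remainder rule.
    """
    total_area = sum(stratum_area.values())
    quotas = {s: total_sites * a / total_area for s, a in stratum_area.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    # Give the sites left over after flooring to the largest fractional parts.
    remaining = total_sites - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:remaining]:
        alloc[s] += 1
    return alloc

# Hypothetical strata of a target region, with crop area in thousand ha.
strata = {"irrigated_plain": 120, "rainfed_plain": 60, "hills": 45, "coast": 25}
allocation = allocate_sites(8, strata)
print(allocation)
```

Under strict proportional allocation a very small stratum may receive no site; if every stratum must be represented, a minimum of one site per stratum can be imposed before allocating the rest.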

Repetition in time

For annual crops, trials should be conducted for two or, preferably, three years to distinguish repeatable from non-repeatable GL interaction effects. Indeed, a multi-year data set seems necessary for reliable estimation of genotype mean yields across locations. Results for two independent data sets of bread wheat in Italy show that, for values averaged over a fixed set of 14 or 15 test locations, even two years of data cannot accurately predict the genotype mean values recorded over four other years (Table 3.1).

Yield data for perennials are conveniently analysed as totals over the crop cycle. Such trials are rarely repeated, so data are seldom available for more than one crop cycle at each location. Research on lucerne in Italy suggests that the variation in environmental factors encountered by genotypes across a three-year crop cycle is wide enough to act as a buffer against the occurrence of non-repeatable GL interaction effects. This conclusion is supported by the lack of sizeable genotype × crop cycle interaction within locations (Annicchiarico, 1992) and by the consistency of GL interaction patterns across independent data sets (Annicchiarico, 2002). These findings support the view that repetition in time is not strictly needed for trials on perennials.

TABLE 3.1 - Mean yield (t/ha) of bread wheat varieties over two years and over four other years in trials repeated across a fixed set of locations, for two independent data sets

Data set 1^a			Data set 2^b
Variety	Two years	Four years	Variety	Two years	Four years
Centauro	7.34*	6.81*	Genio	6.28*	6.70*
Oderzo	6.84	6.71*	Pascal	6.42*	6.40
Pandas	6.95	6.67*	Centauro	6.08	6.31
Chiarano	7.04	6.51	Mieti	5.91	6.22
Brasilia	6.50	6.37	Golia	5.75	6.22
Mec	6.39	6.01	Pandas	5.87	6.15

^a Data set 1 = 15 locations across seasons 1985/86 to 1990/91 in the Italian network of bread wheat variety trials.

^b Data set 2 = 14 locations across seasons 1992/93 to 1997/98 in the Italian network of bread wheat variety trials.

Note: Varieties marked * do not differ from the top-ranking one according to Dunnett’s one-tailed test (P < 0.05), using GE interaction as the error term.
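One way to quantify the agreement in Table 3.1 is a rank correlation between the two-year and four-year means. This is a descriptive sketch using the table values only; Spearman's coefficient is not part of the original analysis (which relies on Dunnett's test), and with six varieties per data set it is indicative at best.

```python
# Two-year vs four-year mean yields (t/ha) from Table 3.1.
data_sets = {
    "Data set 1": {
        "Centauro": (7.34, 6.81), "Oderzo": (6.84, 6.71), "Pandas": (6.95, 6.67),
        "Chiarano": (7.04, 6.51), "Brasilia": (6.50, 6.37), "Mec": (6.39, 6.01),
    },
    "Data set 2": {
        "Genio": (6.28, 6.70), "Pascal": (6.42, 6.40), "Centauro": (6.08, 6.31),
        "Mieti": (5.91, 6.22), "Golia": (5.75, 6.22), "Pandas": (5.87, 6.15),
    },
}

def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

results = {}
for name, table in data_sets.items():
    varieties = list(table)
    two = [table[v][0] for v in varieties]
    four = [table[v][1] for v in varieties]
    top_two = max(varieties, key=lambda v: table[v][0])
    top_four = max(varieties, key=lambda v: table[v][1])
    results[name] = (spearman(two, four), top_two, top_four)
    print(f"{name}: rank correlation {results[name][0]:.3f}; "
          f"top over two years: {top_two}; top over four years: {top_four}")
```

In data set 2 the top-ranking variety over two years (Pascal) is not the top-ranking one over the other four years (Genio), in line with the point made in the text.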

Repetition in time may involve the same test years at all locations or different test years at different locations. Analytical models are also available for the latter case (see Section 4.1).

Germplasm pool

The set of evaluated genotypes should adequately represent the germplasm pool available for the breeding programme (Cooper and Hammer, 1996b). Genotypes are hardly ever chosen at random from the relevant population, but this need not be a limitation (Abou-El-Fittouh et al., 1969). It is argued that a group of carefully chosen elite varieties or breeding lines may represent the genetic base of interest to breeders better than a random sample of entries. However, conclusions on the adaptation strategy may be influenced by the germplasm sample (Ceccarelli and Grando, 1991), which should therefore adequately represent all germplasm types and provenances of major interest for local breeding. For example, a useful data set of variety trials may include recent improved varieties produced by national breeding programmes, exotic varieties bred in other countries or at CGIAR’s international centres, and traditional cultivars or landraces, with the relative size of each group roughly corresponding to its potential interest as a source of parent material. Likewise, breeding material evaluated in selection trials should derive from introgression and recombination of genetic variation involving the major germplasm groups. Adequate sampling is also required for each of the variety types that may be present, especially when the analysis aims to provide indications on the adaptation or yield stability patterns of each type. The requirement concerning the germplasm is of paramount importance when the objectives include the identification of useful genetic resources or of adaptive traits usable as indirect selection criteria (Whan et al., 1991; Jackson et al., 1996). In general, investigating these issues is fully justified only when the data set includes a relatively large number of entries.

Replicated experiments allow better control of experimental error and make it possible to estimate it. This estimate, in turn, allows:

verification of the quality of each trial (with the possible elimination of trials with extremely high experimental error or coefficient of variation);
performance of certain statistical tests in the ANOVAs; and
estimation of heritability values.
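The first and third uses listed above can be sketched for single randomized complete block trials, given each trial's mean, error mean square and genotype mean square. The CV formula and the entry-mean heritability (MS_G - MS_E) / MS_G are standard for this design; the trial values and the 20% CV cut-off are hypothetical.

```python
# Hypothetical single-trial summaries: (trial mean in t/ha, error MS, genotype MS).
trials = {
    "Site A": (5.2, 0.16, 1.10),
    "Site B": (3.8, 0.25, 0.90),
    "Site C": (2.4, 0.36, 0.40),
}
CV_LIMIT = 20.0  # hypothetical cut-off (%) for an "extremely high" CV

def trial_cv(trial_mean, ms_error):
    """Coefficient of variation (%) = 100 * sqrt(error mean square) / trial mean."""
    return 100.0 * ms_error ** 0.5 / trial_mean

def entry_mean_heritability(ms_genotype, ms_error):
    """Broad-sense heritability on an entry-mean basis in an RCB trial.

    With r replicates, var_G = (MS_G - MS_E) / r and the variance of an entry
    mean is var_G + MS_E / r = MS_G / r, so the ratio is (MS_G - MS_E) / MS_G.
    Negative estimates are truncated to zero.
    """
    return max(0.0, (ms_genotype - ms_error) / ms_genotype)

flagged = []
for site, (mean_yield, ms_e, ms_g) in trials.items():
    cv = trial_cv(mean_yield, ms_e)
    h2 = entry_mean_heritability(ms_g, ms_e)
    print(f"{site}: CV = {cv:.1f}%, H2 = {h2:.2f}")
    if cv > CV_LIMIT:
        flagged.append(site)
print("Candidate trials for elimination:", flagged)
```

The flagged trial also shows a low heritability, since its error mean square is close to the genotype mean square.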

However, increasing the spatial and temporal repetition of the trials (thereby obtaining a better sample of environmental variation) is more worthwhile than increasing the number of replicates in each trial (Bradley et al., 1988). Using 2 or 3 replicates, possibly combined with more efficient experimental designs based on incomplete blocks or on a row-column arrangement (Cochran and Cox, 1957; Basford et al., 1996), may prove adequate in most situations. Should an unreplicated scheme be adopted at each site (e.g. to limit costs), experimental error may still be estimated from the variation of some genotypes that are replicated and randomly assigned to plots, either in a completely randomized layout or according to a specific design for unreplicated trials (e.g. a modified augmented design, as described by Lin and Poushinsky [1983]).
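Why extra sites tend to pay off more than extra replicates can be seen from the usual expression for the variance of a genotype mean over a balanced set of l locations, y years and r replicates, assuming random environments: Var = s2_GL/l + s2_GY/y + s2_GLY/(l*y) + s2_e/(l*y*r). The variance components below are hypothetical values chosen for illustration.

```python
# Hypothetical variance components, in (t/ha)^2.
VAR_GL = 0.10   # genotype x location
VAR_GY = 0.05   # genotype x year
VAR_GLY = 0.20  # genotype x location x year
VAR_E = 0.30    # plot-level experimental error

def var_genotype_mean(locations, years, reps):
    """Variance of a genotype mean over balanced locations, years and replicates."""
    return (VAR_GL / locations
            + VAR_GY / years
            + VAR_GLY / (locations * years)
            + VAR_E / (locations * years * reps))

# Two allocations with the same number of plots per genotype (48).
more_reps = var_genotype_mean(locations=6, years=2, reps=4)
more_sites = var_genotype_mean(locations=12, years=2, reps=2)
print(f"6 locations x 4 replicates : {more_reps:.4f}")
print(f"12 locations x 2 replicates: {more_sites:.4f}")
```

Replicates divide only the experimental-error term, whereas additional locations also shrink the genotype × location and genotype × location × year terms; the genotype × year term falls only with more test years.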

Unbalanced data sets do not necessarily pose a major problem for analysis (see Sections 4.1 and 5.7). With a multilocation, multiyear data set that includes plot values of replicated trials, imbalance (due to missing plots, a varying number of replicates per trial or different test years per location) can be handled with appropriate statistical procedures, although the lack of suitable software may limit the use of some of them. Otherwise, it may be necessary to remove genotypes or locations with missing values from the data set; alternatively, specific analytical techniques may be adopted when data are missing for some genotype-location combinations (see Section 5.5).
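Removing genotypes or locations until a complete genotype × location table remains can be sketched with a simple greedy rule: repeatedly drop the row or column with the highest proportion of missing cells. This heuristic is a hypothetical illustration, not a procedure from the source, and it does not guarantee that the largest possible complete table is kept. The yield values are invented.

```python
def complete_subtable(table):
    """Drop genotypes (rows) or locations (columns) until no cell is missing.

    table maps genotype -> {location: mean yield}; a cell is missing if the
    location key is absent or its value is None. At each step the row or
    column with the highest fraction of missing cells is removed (rows win ties).
    """
    rows = sorted(table)
    cols = sorted({loc for cells in table.values() for loc in cells})

    def row_frac(g):
        return sum(table[g].get(c) is None for c in cols) / len(cols)

    def col_frac(c):
        return sum(table[g].get(c) is None for g in rows) / len(rows)

    while rows and cols:
        worst_row = max(rows, key=row_frac)
        worst_col = max(cols, key=col_frac)
        if row_frac(worst_row) == 0:
            break  # every remaining genotype-location cell is present
        if row_frac(worst_row) >= col_frac(worst_col):
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols

# Hypothetical genotype x location mean yields (t/ha) with gaps.
yields = {
    "G1": {"L1": 5.1, "L2": 4.8, "L3": 5.5, "L4": 4.9},
    "G2": {"L1": 5.4, "L2": 5.0, "L3": None, "L4": 5.2},
    "G3": {"L1": 4.7, "L2": 4.6, "L3": 5.0, "L4": 4.5},
    "G4": {"L4": 5.3},
}
kept_genotypes, kept_locations = complete_subtable(yields)
print(kept_genotypes, kept_locations)
```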

The extent to which these requirements are fulfilled determines the reliability of the information generated for supporting breeding strategy decisions. Ideal situations are rare. Some data sets may even be discarded a priori, whereas others may be analysed to obtain provisional information, to be verified once comprehensive data sets become available. When possible, analysing different data sets and pooling their indications produces more reliable information.

3.2 Additional information on environmental factors and genotypic traits

Additional information on climatic, soil, biotic or crop management factors of test locations and morphophysiological traits of genotypes can prove extremely valuable for:

providing reasons for the occurrence of GE interactions;
providing a means for characterizing the subregions and extending the results to new sites;
enlarging the range of models that can be adopted for the analysis of adaptation; and
identifying adaptive traits and assessing their potential as indirect selection criteria for breeding.

Furthermore, comparing climatic data for the test years with long-term site data may help to verify whether any test year had very unusual features.
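A simple way to make this comparison, assuming a reasonably long series of past values for the site, is a standardized anomaly (z-score) for each test year. The rainfall figures and the two-standard-deviation threshold are hypothetical choices for illustration.

```python
from statistics import mean, stdev

def unusual_years(test_years, long_term, threshold=2.0):
    """Return {year: z} for test years whose value deviates from the long-term
    site mean by more than `threshold` sample standard deviations."""
    mu, sd = mean(long_term), stdev(long_term)
    flagged = {}
    for year, value in test_years.items():
        z = (value - mu) / sd
        if abs(z) > threshold:
            flagged[year] = round(z, 2)
    return flagged

# Hypothetical seasonal rainfall (mm) at one test site.
long_term_rain = [410, 455, 390, 520, 470, 430, 495, 405, 460, 445]
test_year_rain = {"1995/96": 470, "1996/97": 290}
unusual = unusual_years(test_year_rain, long_term_rain)
print(unusual)
```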

Unfortunately, extensive data collection may raise the cost of trials to unsustainable levels in many situations. Adequate sampling of the relevant environmental and genotypic variation (at the cost of relatively limited characterization of sites and genotypes) is often preferred to thorough characterization of just a few sites and genotypes. For morphophysiological traits in particular, observation is usually limited (even in favourable cases) to a few easily recorded characters assessed in research-managed trials. Climatic data may be easily accessible only for on-station trials; for other experiments, their availability may depend on how close the test site is to a meteorological station.

For some test sites, accurate values of the climatic variables may not be available for the test years. However, long-term values for these sites, estimated from nearby meteorological stations, may help to characterize subregions, particularly for the variables found to be associated with GL interaction on the basis of test-year data from other locations (see Section 5.8). Similarly, soil variables for a test site may be unknown, but an estimate of their mean value in the area may be exploited at a later stage of the analysis. Where available, a Geographic Information System (GIS) can greatly facilitate the characterization of subregions as well as the scaling-up of results: both spatially, to non-test locations, and temporally, to average values of relevant climatic variables for individual sites.
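Estimating a long-term value for a test site from nearby meteorological stations can be sketched with inverse distance weighting. This is one simple interpolation method chosen for illustration, not one prescribed by the source; the planar coordinates (km), station values and weighting power are hypothetical.

```python
import math

def idw_estimate(site_xy, stations, power=2.0):
    """Inverse-distance-weighted estimate of a variable at a test site.

    site_xy: (x, y) of the test site in km on a local planar grid.
    stations: list of ((x, y), long_term_value) for nearby stations.
    """
    num = den = 0.0
    for (x, y), value in stations:
        d = math.hypot(x - site_xy[0], y - site_xy[1])
        if d == 0:
            return value  # the test site coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical long-term annual rainfall (mm) at three nearby stations.
stations = [((10, 0), 600), ((0, 20), 500), ((-30, 0), 450)]
estimate = idw_estimate((0, 0), stations)
print(f"Estimated long-term rainfall at the test site: {estimate:.0f} mm")
```

Closer stations dominate the estimate; a GIS would typically offer this and more refined interpolation methods directly.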

In conclusion, it is assumed that any additional data (especially on morphophysiological traits) are available only for a subset of test locations, which affects the choice of analytical methods considered of primary interest to potential users. Although not exhaustive, and possibly requiring further verification through research, this information may provide useful indications concerning the possible causal factors of GE interactions and relevant adaptive traits (see Chapter 5 and Section 6.3).