7. Measures of yield stability and reliability

7.1 Yield stability

Yield stability targets for breeding programmes can be defined from yields of regional trials (possibly submitted to data transformation) through estimation of variance components for the target region (wide adaptation prospect) or individual subregions (specific prospect) (see Section 2.6 and Fig. 2.3). The present chapter focuses on the measures of yield stability and yield reliability that may be adopted to make variety recommendations according to the flow chart in Figure 2.4, or to select breeding material for stable yield. The main concepts of yield stability are introduced in Section 2.5.

Measures of stability

There are two major stability measures that can be ascribed to the static, Type 1 stability concept (Lin et al., 1986; Becker and Léon, 1988).

i) The environmental variance (S²), i.e. the variance of genotype yields recorded across test or selection environments (i.e. individual trials). For the genotype i:
S_i² = ∑ (R_ij - m_i)²/(e - 1)
[7.1]
where R_ij = observed genotype yield response in the environment j (the m_ij notation may also be appropriate, since values are averaged across experiment replicates), m_i = genotype mean yield across environments, and e = number of environments. Greatest stability is S² = 0. Derived stability measures include the square root value (S) and its coefficient of variation.
ii) The regression coefficient of genotype yield in individual environments as a function of the environment mean yield (m_j), adopting Finlay and Wilkinson's (1963) b coefficient. The modelled genotype response:
R_ij = a_i + b_i m_j
where a_i = intercept value, is analogous to equation [5.1] reported for joint regression analysis of adaptation, but genotype responses to environments (rather than to locations) are of concern here. Greatest stability is b = 0.
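
The two Type 1 measures above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the yield values are hypothetical, and the function names `environmental_variance` and `regression_coefficient` are introduced here only for the example. It assumes genotype yields already averaged across replicates, and environment mean yields computed from the tested genotypes.

```python
from statistics import mean, variance

# Hypothetical yields (t/ha); each list holds one genotype's yields in
# environments 1..e (values already averaged across replicates).
yields = {
    "G1": [3.0, 4.0, 5.0, 6.0, 7.0],
    "G2": [4.5, 5.0, 5.0, 5.5, 5.0],
    "G3": [2.0, 4.0, 5.5, 7.0, 9.0],
}

def environmental_variance(r_i):
    """S_i^2 of formula [7.1]: squared deviations from m_i over (e - 1)."""
    return variance(r_i)

def regression_coefficient(r_i, env_means):
    """Least-squares slope b_i of genotype yield on environment mean yield m_j."""
    m_i, m = mean(r_i), mean(env_means)
    sxy = sum((x - m) * (y - m_i) for x, y in zip(env_means, r_i))
    sxx = sum((x - m) ** 2 for x in env_means)
    return sxy / sxx

n_env = len(next(iter(yields.values())))
env_means = [mean(g[j] for g in yields.values()) for j in range(n_env)]

s2 = {name: environmental_variance(r) for name, r in yields.items()}
b = {name: regression_coefficient(r, env_means) for name, r in yields.items()}

for name in yields:
    print(f"{name}: S2 = {s2[name]:.3f}, b = {b[name]:.3f}")
```

In this toy data set the genotype G2 has both the lowest S² and the b value closest to zero, i.e. it is the most stable in the static sense. Because the environment means are computed from the same genotypes, the b values average exactly 1 across genotypes.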

The following measures are probably the most popular in the context of the dynamic, Type 2 stability concept:

i) Two measures, namely, Shukla's (1972a) stability variance and Wricke's (1962) ecovalence, which give the same results for ranking genotypes (Becker and Léon, 1988). Their formulation is explicit in the work by Calinski (1960). Wricke's ecovalence, which is simpler to calculate, is for the genotype i:
W_i² = ∑ (R_ij - m_i - m_j + m)²
where R_ij is the observed yield response (averaged across experiment replicates), m_i and m_j correspond to previous notations, and m is the grand mean. Greatest stability is W² = 0.
ii) Finlay and Wilkinson's regression coefficient across environments (as above), assuming greatest stability for b = 1. Therefore, instability can be evaluated as the distance in absolute value from the unity coefficient, i.e. | b_i - 1 |.
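
Wricke's ecovalence can be computed directly from a genotype × environment table of mean yields. The sketch below uses hypothetical data constructed so that G1 responds exactly in parallel to the environment mean yield; the function name `ecovalence` is an assumption made for the example.

```python
from statistics import mean

# Hypothetical yields (t/ha), averaged across replicates.
# G1 is built to run parallel to the environment mean yield.
yields = {
    "G1": [3.0, 4.0, 5.0, 6.0, 7.0],
    "G2": [4.5, 4.5, 6.0, 7.5, 7.5],
    "G3": [1.5, 3.5, 4.0, 4.5, 6.5],
}

def ecovalence(table):
    """W_i^2 = sum over j of (R_ij - m_i - m_j + m)^2 for each genotype i."""
    names = list(table)
    n_env = len(table[names[0]])
    m_i = {g: mean(table[g]) for g in names}                       # genotype means
    m_j = [mean(table[g][j] for g in names) for j in range(n_env)]  # environment means
    m = mean(m_j)                                                  # grand mean
    return {
        g: sum((table[g][j] - m_i[g] - m_j[j] + m) ** 2 for j in range(n_env))
        for g in names
    }

w2 = ecovalence(yields)
for name, value in w2.items():
    print(f"{name}: W2 = {value:.3f}")
```

Here G1 obtains W² = 0 (perfect Type 2 stability) although its yield varies widely across environments, which illustrates the contrast with the static concept.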

Figure 7.1 - Yield responses across five environments of hypothetical stable-yielding genotypes according to two concepts of stability, using non-regression (left) or regression (right) stability measures

Figure 7.1 reports yield responses across five environments of hypothetical stable-yielding material according to Type 1 and Type 2 stability concepts for non-regression (left) and regression (right) stability measures. Plotted diamonds represent observed genotype yields in the former case, and expected genotype yields at extreme values of environment mean yield in the latter case (in which the five environments are ordered by increasing mean yield). Type 1 stability requires (for a perfectly stable genotype) a constant yield value across environments based on observed (environmental variance) or expected (b value) data. In fact, this response implies large GE interaction in the presence of large variation in environment mean yield, with relatively better response in unfavourable environments and relatively worse response in favourable environments (Fig. 7.1). Conversely, perfect stability according to the Type 2 concept implies an observed or modelled yield response that is always parallel to the environment mean yield (i.e. zero GE interaction). As suggested in the figure, ranking of genotypes for yield stability may reveal major differences depending on the stability concept. For consideration as stability parameters, regression coefficients require that heterogeneity of genotype regressions account for a relatively high portion of the GE interaction SS (e.g. >35%, as assessed by the procedure described in Section 5.2 applied to a GE interaction rather than a GL interaction matrix) - a condition that cannot be satisfied in most instances (Annicchiarico, 1997a; Brancourt-Hulmel et al., 1997).

Genotype yields are sometimes expressed in each environment as relative yields (mostly as percentage values of the environment mean yield) for better appreciation of genotype differences between environments. The environmental variance of genotype relative yields has been proposed independently by Annicchiarico (1992) and by Yau and Hamblin (1994) as an easy-to-calculate stability measure, which may be considered Type 2 because of the elimination of the environment main effect. The effect of the relative yield transformation on genotype merit is similar to that of the logarithmic transformation (T. Calinski, personal communication, 1991) - see also the genotype mean values across sites for original yields and their transformations in Table 5.2. It assigns greater weight to data of low-yielding environments for assessing genotype mean yield and yield stability. In the presence of proportionality between environment (or location) mean yield and within-environment (or within-location) phenotypic standard deviation of genotype values - likely to occur for site mean yields that differ widely and include very low values - this effect can be positive because it counterbalances the greater influence of high-yielding environments on the assessment (see Table 5.2 for the effect on genotype mean yield). In these conditions, selection for wide adaptation based on relative yields may prove more effective than that based on original yields (Brennan and Byth, 1979). Original yields and relative yields can be expected to rank genotypes likewise when test environments have similar yields. In other cases (i.e. moderate to large variation in environment mean yield, and no proportionality between environment mean yield and within-environment variation of genotype values), the relative yield transformation is not recommended (Piepho, 1994b).
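
As an illustration of the relative yield transformation, the sketch below expresses hypothetical yields as percentages of the environment mean yield and then computes their environmental variance. The data are invented and deliberately limited to three environments for readability; G1 is constructed to yield exactly the environment mean everywhere.

```python
from statistics import mean, variance

# Hypothetical yields (t/ha) in three environments of contrasting yield level.
yields = {
    "G1": [2.0, 4.0, 8.0],
    "G2": [1.0, 3.0, 3.0],
    "G3": [3.0, 5.0, 13.0],
}

n_env = len(next(iter(yields.values())))
env_means = [mean(g[j] for g in yields.values()) for j in range(n_env)]

# Relative yield: percentage of the environment mean yield.
relative = {
    name: [100.0 * y / m for y, m in zip(r, env_means)]
    for name, r in yields.items()
}

# Environmental variance of original and of relative yields.
var_original = {name: variance(r) for name, r in yields.items()}
var_relative = {name: variance(r) for name, r in relative.items()}

for name in yields:
    print(f"{name}: S2(original) = {var_original[name]:.2f}, "
          f"S2(relative) = {var_relative[name]:.2f}")
```

G1 has a large variance of original yields but zero variance of relative yields, because the environment main effect is removed by the transformation.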

A few Type 2 non-parametric measures that are widely applicable are based on the variance of the genotype rank across environments (Hühn, 1990). Since Type 2 stability implies low GE interaction, another stability measure could be provided by the Euclidean distance of the genotype from the origin of significant GE interaction PC axes, following the AMMI analysis of the GE interaction matrix (Annicchiarico, 1997a). Zero distance corresponds to perfect stability. For example, genotypes 5 and 16 in Figure 5.3 could be rated as almost perfectly stable-yielding, had the analysis referred to GE rather than GL interaction effects. Other AMMI analysis-derived stability measures have also been proposed (Sneller et al., 1997). On the one hand, this analytical approach allows for the estimation of yield stability on the "pattern" portion of the GE effects (retained in the significant PC axes). On the other, the estimation is much more time-demanding than that for other Type 2 stability measures.

Eberhart and Russell (1966) proposed the estimated variance of genotype deviations from regressions (s_d²) as a further stability measure for consideration in conjunction with the b parameter. Lin et al. (1986), while ascribing this measure to a Type 3 stability concept, interpreted it as an indicator of the goodness of fit of the regression model for describing the stability response. They argued that poor fit (i.e. large s_d² values) simply points towards the adoption of other Type 2 measures (such as Wricke's or Shukla's) rather than bothering with two stability parameters (b and s_d²), whereas good fit implies no practical usefulness of s_d² estimates.

Finally, Lin and Binns' (1988) Type 4 stability concept relates to stability only in time (i.e. across test years or crop cycles), averaged across test locations, rather than stability also in space (as implied by stability analysis across environments). The stability measure can be derived from an ANOVA that is limited to data of the genotype under assessment. The ANOVA can be performed on yield values averaged across experiment replicates, including just two factors, i.e. location, and year within locations. The stability measure is represented by the ANOVA MS for the latter factor (M_y_(l)). High stability is indicated by low M_y_(l) value, i.e. low temporal variation of genotype yield values (hence, the similarity with the Type 1, homeostatic concept of stability). In fact, the estimate of this variation as provided by M_y_(l) is inflated by the experimental error variance. The actual variance of this effect (S_y_(l)²) could be estimated as:

S_y_(l)² = M_y_(l) - (M_err/r)

[7.2]

where M_err = pooled error (i.e. average experimental error for the genotypes) in the combined ANOVA (the one performed at the beginning of data analysis according to Fig. 2.4), and r = number of experiment replicates. While S_y_(l)² and M_y_(l) values are equivalent for ranking genotypes, the former are more appropriate for adoption in yield reliability indexes. S_y_(l)² values could also be estimated through a hierarchical ANOVA performed on plot values of each genotype, which includes, in addition, the MS for the replicate within years source of variation (M_r_(y)). In this case:

S_y_(l)² = (M_y_(l) - M_r_(y))/r
[7.3]

The current estimate of S_y_(l)² values may differ slightly from the estimate obtained with formula [7.2].
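
The Type 4 calculation according to formula [7.2] can be sketched as follows for one genotype. The cell means, the pooled error M_err and the number of replicates r are hypothetical values, and the function name is introduced only for this example. Truncating a negative estimate at zero is an assumption of the sketch, not a prescription of the text.

```python
from statistics import mean

def ms_years_within_locations(cell_means):
    """M_y(l) for one genotype: pooled squared deviations of yearly values
    from their location mean, divided by DF = l (y - 1)."""
    ss, df = 0.0, 0
    for yearly_values in cell_means.values():
        loc_mean = mean(yearly_values)
        ss += sum((x - loc_mean) ** 2 for x in yearly_values)
        df += len(yearly_values) - 1
    return ss / df, df

# Hypothetical yields of one genotype (t/ha), averaged across replicates:
# three locations, three years each.
cell_means = {
    "Loc1": [4.0, 5.0, 6.0],
    "Loc2": [2.0, 2.5, 3.0],
    "Loc3": [6.0, 8.0, 7.0],
}

m_err = 0.60   # hypothetical pooled error MS from the combined ANOVA
r = 4          # hypothetical number of experiment replicates

m_yl, df = ms_years_within_locations(cell_means)
# Formula [7.2]; a negative estimate is set to zero (assumption of this sketch).
s2_yl = max(0.0, m_yl - m_err / r)

print(f"M_y(l) = {m_yl:.3f} (DF = {df}), S_y(l)2 = {s2_yl:.3f}")
```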

Choice of stability measure

Yield stability as estimated by a given stability measure should be repeatable in time (across future environments or years) to be of practical interest for genotype selection and recommendation. Indeed, high repeatability (estimated as the correlation of genotype values or ranks across independent data sets) indicates a high degree of genetic determination of the stability trait. Several studies have investigated this aspect (or the broad sense heritability) for different measures and the underlying concepts of stability. Their results are summarized below.

i) Type 1 and Type 4 measures are moderately repeatable in most instances, and tend to have higher repeatability/heritability than Type 2 measures; Type 3 stability has low or negligible repeatability (Léon and Becker, 1988; Lin and Binns, 1991; Eskridge and Mumm, 1992; Zavala-Garcia et al., 1992b; Jalaluddin and Harrison, 1993; Helms, 1993; Sneller et al., 1997; Schut and Dourleijn, 2000).
ii) The repeatability values can vary largely depending on the crop and the data set (Jalaluddin and Harrison, 1993; Annicchiarico, 1997a), but they remain distinctly lower than those for genotype mean yield across environments (Becker, 1987; Pham and Kang, 1988; Eskridge and Mumm, 1992; Jalaluddin and Harrison, 1993; Annicchiarico, 1997a).
iii) AMMI analysis-derived stability measures usually allow for only a slight increase of repeatability compared with other Type 2 measures (Annicchiarico, 1997a; Sneller et al., 1997) and, therefore, are hardly worth the calculation required.
iv) The repeatability of stability measures increases together with the temporal scale of the assessment (Becker, 1987; Sneller et al., 1997).

Besides tending towards higher repeatability, the static (Type 1 or Type 4) stability measures also offer theoretical advantages. They are estimated independently from the set of tested genotypes and allow, therefore, for a broader generalization (Lin et al., 1986). Their agronomic interpretation is less ambiguous than for Type 2 measures, where unstable yield may derive from genetic characteristics that logically confer higher stability (e.g. one cultivar resistant to a major biotic or abiotic stress occurring in some environments, tested with several cultivars susceptible to the same stress, would show high GE interaction effects and, hence, low Type 2 stability). Finally, Type 1 and Type 4 measures offer the chance to exploit positive GE effects leading to relatively better yield in unfavourable environments or years, thus increasing the security of food production or agricultural income at national and household level (Simmonds, 1991). This makes the static concept of stability more attractive in a wide range of situations, particularly for public institutions responsible for breeding or variety recommendation (while private breeders, especially in favourable cropping regions, may prefer the dynamic stability concept). The above reasons justify the greater emphasis placed below on Type 1 and Type 4 stability and the derived measures of yield reliability. Within Type 1 measures, greater attention will be paid to the environmental variance, which is generally applicable and easier to calculate compared with b values. This measure is widely adopted in the analysis of economic risk in agriculture (Hazell, 1982; Roberts and Swinton, 1996). As anticipated (Section 2.5), about eight trials or more are necessary for a reasonably reliable assessment of yield stability.

For breeding purposes, selection for increased yield stability can be performed through a yield reliability measure based on the environmental variance of genotype yields across selection environments of the region (wide adaptation prospect) or the subregion (specific prospect), thereby taking account of possible GL effects that cannot be eliminated through specific breeding. For genotype recommendation based on yield reliability directed to the whole target region, the same approach (integrating yield stability as environmental variance across all test environments) is generally appropriate. Conversely, yield reliability for recommendation directed to individual subregions can conveniently integrate yield stability of Type 4, because GL effects within subregions are minimized by the zoning procedure. To decide whether the yield reliability assessment is required (paths 1 vs. 2, or 3 vs. 4, in Fig. 2.4), the occurrence of significant variation in the combined ANOVA for relevant terms (i.e. GY and GLY interaction, or GY interaction within locations), particularly at relatively high Type 1 error rates (e.g. P > 0.001), does not necessarily imply the presence of genotype variation in yield stability (indeed, various situations are conceivable in which GE interactions are associated with genotypes possessing the same stability of Types 1, 2 or 4). Moreover, lower yield stability may not concern the subset of best-performing genotypes (e.g. those not different from the top-ranking one for mean yield according to Dunnett's one-tailed test): in that case, cultivar recommendation could simply be based on mean yield (Fig. 2.4). Therefore, the comparison among genotypes for yield stability may prove useful in various instances.

Comparison of stability values

Ordinary tests for comparison of variances - namely, Fisher's bilateral for two variances, and Hartley's, Cochran's or Bartlett's for several variances - are not recommended for environmental variances because the samples (i.e. the environments) are not independent (they are paired for individual comparison). Most of the appropriate tests are rather complex (Piepho, 1997). Ekbohm (1981) proposed a simple test that modifies Fisher's criterion by adjusting its DF to take account of the possible correlation (r) between the yield values of the two genotypes across the test environments. The observed F value (ratio of higher to lower estimated variance) is compared with a critical value of which the adjusted DF value (for the numerator and denominator of the test) is:

DF = (e - 1 - 2 r²)/(1 - r²)

where e = no. of environments, instead of (e - 1) according to Fisher's test. The higher the correlation, the greater the increase of DF relative to Fisher's test (which is, therefore, biased towards too few significant results in the presence of correlation). Ekbohm's test, valid under the assumption of normal distribution of genotype values across environments, can also be used for comparing the variance of relative yield. In order to limit the number of comparisons, one may choose the genotype belonging to the best-yielding subset showing the lowest stability (i.e. the highest variance) and compare it with each of a few genotypes characterized by highest stability. Only the presence of some significant difference justifies the assessment in terms of yield reliability. Material showing very low mean yield, which can hardly be compensated for by higher stability, may be excluded from the comparison.
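
The quantities needed for Ekbohm's test can be obtained as in the sketch below, which applies the DF formula exactly as given above. The yield series are hypothetical and the function names are assumptions of the example. The critical F value for the (generally non-integer) adjusted DF is not computed here; it would have to be taken from F tables or statistical software.

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(x, y):
    """Correlation between the yields of two genotypes across environments."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ekbohm_statistics(x, y):
    """Return the observed F ratio, the correlation r and the adjusted DF."""
    e = len(x)
    vx, vy = variance(x), variance(y)
    f_obs = max(vx, vy) / min(vx, vy)          # higher over lower variance
    r = pearson_r(x, y)
    df = (e - 1 - 2 * r ** 2) / (1 - r ** 2)   # undefined when |r| = 1
    return f_obs, r, df

# Hypothetical yields (t/ha) of two genotypes in the same eight environments.
stable = [4.6, 5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2]
unstable = [3.1, 5.0, 4.2, 6.9, 5.1, 6.4, 3.8, 6.0]

f_obs, r, df = ekbohm_statistics(stable, unstable)
print(f"F = {f_obs:.2f}, r = {r:.3f}, adjusted DF = {df:.1f} (Fisher: {len(stable) - 1})")
```

With uncorrelated series the adjusted DF reduces to e - 1, i.e. to Fisher's test.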

The main determinant of the correlation coefficient in Ekbohm's test is probably the tendency of genotype yields to covary as a consequence of the environment mean value. Ordinary tests for variance comparison may be applied with some caution to estimates of Type 4 stability (preferably S_y_(l)²), from which the effect of environment mean yield has been removed, also in view of the difficulty in applying more appropriate tests in this situation. In particular, differences at P < 0.10 may already be considered as indicative, owing to the bias towards an insufficient number of significant results. The null hypothesis of equality of all stability estimates could be verified by a quick and easy test, such as Hartley's (1950; see also Dagnelie, 1975a^[32]). For this test, the critical values reported in Table 7.1 for a range of DF values of major interest are only available for P < 0.05 and P < 0.01 levels. For each estimated S_y_(l)² (or M_y_(l)) parameter, DF = l (y - 1), where l and y are the number of test locations and years, respectively. There is significant variation in yield stability among genotypes when the observed ratio between the highest and the lowest variance exceeds the critical value of the ratio relative to the P level, the DF and the number of compared stability measures. Alternatively, top-yielding genotypes characterized by high S_y_(l)² values could be compared with the most stable material by means of Fisher's bilateral test.

TABLE 7.1 - Critical values of Hartley's (1950) test for comparison of several variances, applicable to comparison of yield stability measures of Type 4

		Number of compared variances
DF	P level <	2	3	4	5	6	7	8	9	10	11	12
5	0.05	7.15	10.8	13.7	16.3	18.7	20.8	22.9	24.7	26.5	28.2	29.9
5	0.01	14.9	22	28	33	38	42	46	50	54	57	60
6	0.05	5.82	8.38	10.4	12.1	13.7	15.0	16.3	17.5	18.6	19.7	20.7
6	0.01	11.1	15.5	19.1	22	25	27	30	32	34	36	37
7	0.05	4.99	6.94	8.44	9.70	10.8	11.8	12.7	13.5	14.3	15.1	15.8
7	0.01	8.89	12.1	14.5	16.5	18.4	20	22	23	24	26	27
8	0.05	4.43	6.00	7.18	8.12	9.03	9.78	10.5	11.1	11.7	12.2	12.7
8	0.01	7.50	9.9	11.7	13.2	14.5	15.8	16.9	17.9	18.9	19.8	21
9	0.05	4.03	5.34	6.31	7.11	7.80	8.41	8.95	9.45	9.91	10.3	10.7
9	0.01	6.54	8.5	9.9	11.1	12.1	13.1	13.9	14.7	15.3	16.0	16.6
10	0.05	3.72	4.85	5.67	6.34	6.92	7.42	7.87	8.28	8.66	9.01	9.34
10	0.01	5.85	7.4	8.6	9.6	10.4	11.1	11.8	12.4	12.9	13.4	13.9
12	0.05	3.28	4.16	4.79	5.30	5.72	6.09	6.42	6.72	7.00	7.25	7.48
12	0.01	4.91	6.1	6.9	7.6	8.2	8.7	9.1	9.5	9.9	10.2	10.6
15	0.05	2.86	3.54	4.01	4.37	4.68	4.95	5.19	5.40	5.59	5.77	5.93
15	0.01	4.07	4.9	5.5	6.0	6.4	6.7	7.1	7.3	7.5	7.8	8.0
20	0.05	2.46	2.95	3.29	3.54	3.76	3.94	4.10	4.24	4.37	4.49	4.59
20	0.01	3.32	3.8	4.3	4.6	4.9	5.1	5.3	5.5	5.6	5.8	5.9
30	0.05	2.07	2.40	2.61	2.78	2.91	3.02	3.12	3.21	3.29	3.36	3.39
30	0.01	2.63	3.0	3.3	3.4	3.6	3.7	3.8	3.9	4.0	4.1	4.2
60	0.05	1.67	1.85	1.96	2.04	2.11	2.17	2.22	2.26	2.30	2.33	2.36
60	0.01	1.96	2.2	2.3	2.4	2.4	2.5	2.5	2.6	2.6	2.7	2.7

Source: Dagnelie, 1975a (p. 418).
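
A lookup against Table 7.1 can be sketched as follows. Only the two rows for DF = 6 (up to six compared variances) are transcribed, and the Type 4 stability values of the four genotypes are hypothetical; the names `CRITICAL` and `hartley_test` are assumptions of the example.

```python
# Excerpt of Table 7.1, rows for DF = 6 only. Keys are (DF, P level);
# values map the number of compared variances to the critical ratio.
CRITICAL = {
    (6, 0.05): {2: 5.82, 3: 8.38, 4: 10.4, 5: 12.1, 6: 13.7},
    (6, 0.01): {2: 11.1, 3: 15.5, 4: 19.1, 5: 22.0, 6: 25.0},
}

def hartley_test(variances, df, p_level):
    """Compare the ratio of highest to lowest variance with the critical value."""
    ratio = max(variances) / min(variances)
    critical = CRITICAL[(df, p_level)][len(variances)]
    return ratio, critical, ratio > critical

# Three locations and three years give DF = l (y - 1) = 6 for each estimate.
n_locations, n_years = 3, 3
df = n_locations * (n_years - 1)

# Hypothetical Type 4 stability values of four genotypes.
s2_yl = {"G1": 0.20, "G2": 0.45, "G3": 0.90, "G4": 2.60}

for p_level in (0.05, 0.01):
    ratio, critical, significant = hartley_test(list(s2_yl.values()), df, p_level)
    print(f"P < {p_level}: ratio = {ratio:.1f}, critical = {critical}, "
          f"significant = {significant}")
```

In this example the ratio (13.0) exceeds the critical value at P < 0.05 (10.4) but not at P < 0.01 (19.1).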

Regression coefficients of pairs of genotypes can be compared by t test (Steel and Torrie, 1960; Dagnelie, 1975a^[33]). For other Type 2 stability statistics, one test proposed by Shukla (1972b) can be used for comparing Shukla's or Wricke's measures between pairs of genotypes (incidentally, testing individual Shukla's stability variance values for difference to zero, as often reported, does not imply the occurrence of variation among the stability values), whereas a global comparison among non-parametric stability measures can be performed according to Nassar and Hühn (1987). Piepho (1996) provides a thorough discussion of testing procedures for Type 2 stability statistics.

7.2 Yield reliability

High Type 1 or Type 2 stability in Figure 7.1 may be associated with high or low mean yield. Likewise, high Type 4 stability may characterize high- or low-yielding material. Therefore, both mean value and consistency of performance should be considered for genotype selection or recommendation (in the presence of variation in yield stability).

Measures of reliability over test environments

For reasons discussed in the preceding section, the integration of mean yield with static (Type 1 or Type 4) stability measures is considered of special interest for yield reliability assessment. One possible approach is represented by the "mean-standard deviation analysis" (Barah et al., 1981; Witcombe, 1988), in which genotype values of mean yield and square root of the environmental variance are reported on abscissa and ordinate axes, respectively, of a plot that allows for a visual identification of risk-efficient material. A second approach, proposed by Kataoka (1963) for economic analysis, has the advantage of integrating the two characteristics into a single index of reliability of yield (or some other economic variable, e.g. gross benefit or net income). The relative importance attributed to yield stability in the index depends on the average level of risk aversion of farmers in the target region or subregion. In particular, Kataoka's index can be used for estimating, on the basis of the distribution of yield values observed across test environments (cultivar recommendation) or selection environments (breeding), the lowest yield expected for a given genotype and a specified probability of negative event (Eskridge, 1990). Consider, for example, two genotypes having the same mean yield but different yield stability as indicated by the environmental variance measure (Fig. 7.2). For each genotype, the mean value represents the yield that can be expected in 50 percent of cases (i.e. P = 0.50). Basing decisions on this value indicates complete indifference to risk. Conversely, decisions based on the lowest yield expected in a very large number of instances (e.g. 95%, i.e. P = 0.95) are driven by concern for disastrous events (i.e. marked risk aversion), with little consideration for average yield response. P values may vary from 0.95 (for subsistence agriculture in unfavourable cropping regions) to 0.70 (for modern agriculture in the most favourable regions).
In general, the index value (I) for the genotype i is:

I_i = m_i - Z_(P) S_i

[7.4]

where m_i = mean yield, S_i = square root of the environmental variance, and Z_(P) = percentile from the standard normal distribution for which the cumulative distribution function reaches the value P. Z_(P) can assume the following values depending on the chosen P level: 0.675 for P = 0.75; 0.840 for P = 0.80; 1.040 for P = 0.85; 1.280 for P = 0.90; and 1.645 for P = 0.95.
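
Formula [7.4] can be illustrated with the short sketch below, which mirrors the situation of two genotypes having the same mean yield and contrasting environmental variance. The genotype values are hypothetical. Z_(P) is obtained here from the standard normal distribution in the Python standard library, so it differs slightly from the rounded values listed above (e.g. 0.6745 rather than 0.675 for P = 0.75).

```python
from statistics import NormalDist

def kataoka_index(mean_yield, env_variance, p):
    """I_i = m_i - Z_(P) S_i, with S_i the square root of the environmental variance."""
    z_p = NormalDist().inv_cdf(p)   # standard normal percentile for probability P
    return mean_yield - z_p * env_variance ** 0.5

# Hypothetical genotypes: same mean yield (t/ha), contrasting stability.
genotypes = {
    "stable":   {"mean": 5.0, "s2": 0.25},
    "unstable": {"mean": 5.0, "s2": 2.25},
}

for p in (0.50, 0.75, 0.95):
    values = {
        name: kataoka_index(g["mean"], g["s2"], p) for name, g in genotypes.items()
    }
    print(f"P = {p:.2f}: " + ", ".join(f"{n} = {v:.2f}" for n, v in values.items()))
```

At P = 0.50 the index equals the mean yield for both genotypes; the advantage of the stable genotype grows as P increases.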

Figure 7.2 - Frequency of yield (or relative yield) values across environments of two genotypes having the same mean yield (m_i) and contrasting stability as measured by the variance (S_i²) of yield values, and estimation of a yield reliability index (I_i) equal to the lowest yield that is expected in 75% of cases

In the example (Fig. 7.2), the lower confidence bound for yield values estimated by the index (I) is distinctly different for the two genotypes already at P = 0.75 (lowest yield in 75% of cases). The difference in genotype merit would further increase together with the P value (i.e. the level of risk aversion) and, therefore, with the importance of yield stability. At relatively high P levels, in particular, genotype ranks for mean yield may largely differ from those for yield reliability integrating Type 1 stability (Eskridge, 1990). Kataoka's approach may also be used to assess yield reliability in terms of relative yield, thereby considering yield stability of Type 2 (Annicchiarico, 1992). Notations m_i and S_i in formula [7.4] indicate in this case the mean value and the square root of the variance of relative yield, respectively, for the genotype i. Differences in genotype merit, expressed in percentage values (Fig. 7.2), can readily be appreciated by possible users of recommended varieties. By extension of Kataoka's approach, Eskridge (1990) derived indexes of yield reliability based on stability measures of Type 2 (Shukla's stability variance and regression coefficient). The calculation of these indexes is not as simple as with previous measures of yield reliability. With reference to the same stability measures, Eskridge and Mumm (1992) proposed an alternative definition of yield reliability in terms of probability of a given entry to outperform a reference cultivar. Finally, Kang (1988) proposed a simple index of reliability (termed "Yield-Stability statistic") that integrates mean yield and Shukla's stability in a non-parametric fashion - i.e. it is not based on probability distributions. Genotypes are ranked for mean yield on the one hand, and for Shukla's stability variance on the other. Ranks are summed up: the lower the rank-sum, the higher the yield reliability.
The yield stability characteristic may be assigned a higher weight, in order to account for relatively greater risk aversion (Kang and Pham, 1991).
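
The rank-sum idea can be sketched as follows. The mean yields and stability variances are hypothetical, ties are not handled, and the `stability_weight` multiplier is only one simple way of giving more weight to stability; it is an assumption of this sketch and not necessarily the weighting scheme of Kang and Pham (1991).

```python
def rank_sum(mean_yield, stability_variance, stability_weight=1):
    """Sum of the rank for mean yield (highest yield = rank 1) and the rank for
    stability variance (lowest variance = rank 1); lower sums are better."""
    by_yield = sorted(mean_yield, key=mean_yield.get, reverse=True)
    by_stability = sorted(stability_variance, key=stability_variance.get)
    yield_rank = {g: k + 1 for k, g in enumerate(by_yield)}
    stability_rank = {g: k + 1 for k, g in enumerate(by_stability)}
    return {g: yield_rank[g] + stability_weight * stability_rank[g] for g in mean_yield}

# Hypothetical mean yields (t/ha) and Shukla's stability variance values.
mean_yield = {"A": 5.0, "B": 4.5, "C": 4.0}
stability_variance = {"A": 0.9, "B": 0.2, "C": 0.5}

sums = rank_sum(mean_yield, stability_variance)
weighted = rank_sum(mean_yield, stability_variance, stability_weight=2)
best = min(sums, key=sums.get)

print("rank-sums:", sums)
print("rank-sums with double weight on stability:", weighted)
print("most reliable:", best)
```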

Also Type 4 stability, particularly as S_y_(l)² value, may be considered through Kataoka's index. It is sufficient to substitute its square root for the square root of the environmental variance in expression [7.4]. This measure may be preferred for genotype selection, or for making general recommendations over a region, when GL interaction effects are not significant. However, the main interest of Type 4 stability is that it allows the consideration, when convenient, of yield stability in time when investigating adaptive responses for specific recommendation of cultivars (path 4 in Fig. 2.4).

Modelling responses to locations for reliability

As modelled by analysis of adaptation (e.g. Fig. 5.1 or Fig. 5.4 [A]), these responses are estimates of the mean value of yield or nominal yield expected for a given genotype on each site. This value is affected by year-to-year variation proportional to the level of instability in time of the genotype, just as the genotype value in individual environments of Figure 7.2 is affected by year-to-year and site-to-site variation proportional to the degree of genotype instability across environments. Imposing a yield reliability assessment on genotype adaptive responses basically requires the estimation of a lower confidence bound for each response that depends on its variation in time. These bounds can be used, in particular, for making specific recommendations on the basis of the most reliable genotype(s) in each location (for a specified level of risk aversion).

Actually, an estimation of lower bounds for joint linear regression or one-covariate factorial regression could be obtained by regressing genotype values on each site in individual years (rather than averaged across years) as a function of the site covariate (thereby considering explicitly the within-site temporal variation in the analysis of adaptation model), and then applying the conventional formula for prediction of the lowest value of an individual observation for a specified P level and covariate value (Snedecor and Cochran, 1967; Dagnelie, 1975a^[34]). The adopted P value (ranging between 0.75 and 0.95, and relative to the proportion of yield values accommodated above the confidence bound) would be assigned to Student's t in the formula. Besides its limited application (which excludes multidimensional AMMI and regression models), this procedure may also suffer a drawback because the width of confidence limits depends not only on the extent of the within-site temporal variation (i.e. stability in time) but also on the lack of fit of the analysis of adaptation model (as the difference between observed and estimated genotype mean values on the site). A stable-yielding genotype that fits the model only moderately could be severely penalized, especially at relatively high P levels.

The following procedure, although providing an approximate solution, is applicable to any analysis of adaptation model and relates specifically to yield stability in time, while requiring a modest level of calculation. It may apply to any formula for calculating nominal yields from estimated values of genotype mean yield across locations (m_i) and GL interaction effects (e.g. formulae [5.2], [5.4], [5.5] and [5.7], or analogous ones for AMMI models including three or more PC axes). The formulae can integrate the average yield stability in time as estimated, for each genotype, by the S_y_(l)² value, to predict the lowest nominal yield response expected for a specified P value (i.e. proportion of cases). For the genotype i, it is sufficient to substitute the following expression:

m_i - Z_(P) S_y_(l)i

where Z_(P) corresponds to previous notations and S_y_(l)i is the square root of the Type 4 stability measure for the genotype. This expression takes the place of the m_i value in the formula, while the part of the formula relative to the estimation of GL effects is unaltered (thus, the information produced by analysis of adaptation remains entirely valuable). For example, formula [5.4] relative to AMMI-1 models becomes:

N_ij' = m_i - (1.28 S_y_(l)i) + (u_i1' v_j1')

where N_ij' represents the nominal yield reliability of the genotype i at the location j, if the lowest response expected in 90 percent of cases (i.e. P = 0.90) is of interest. Compared with nominal yields relative to mean responses (i.e. P = 0.50), represented for each genotype as a straight line in Figure 5.4 (A), the responses for nominal yield reliability would be parallel but lowered to an extent that varies depending on the Type 4 stability of the genotype (see Fig. 8.2 in Section 8.3). The lowering is mild for stable and severe for unstable material, thereby modifying the crossover points that, for best-ranking genotypes, determine the subregions for specific recommendation. Lower confidence bounds for genotype responses obtained by the same procedure would relate to straight lines also for joint regression (Fig. 5.1 [A]) and one-covariate factorial regression (Fig. 5.1 [B]), as well as to planes for AMMI-2 (Fig. 5.5 [A]) and two-covariate factorial regression. Information on the most reliable genotypes on each site can be extended to new sites or scaled up temporally, as described in Section 5.8, the only difference concerning the introduction of the term (- Z_(P) S_y_(l)i) in the formula adopted for the calculation of nominal yields.
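
The effect of the term (- Z_(P) S_y_(l)i) on specific recommendations can be sketched for an AMMI-1 model as follows. All genotype means, Type 4 stability values and scaled PC 1 scores (u for genotypes, v for locations) are hypothetical, and the function name is introduced only for this example; Z_(P) = 1.28 corresponds to P = 0.90 as in the formula above.

```python
def nominal_yield_reliability(m_i, s_yl, u_i, v_j, z_p):
    """N_ij' = m_i - Z_(P) S_y(l)i + u_i1' v_j1' (AMMI-1 case).
    With z_p = 0 the ordinary nominal yield (mean response) is obtained."""
    return m_i - z_p * s_yl + u_i * v_j

# Hypothetical genotypes: mean yield (t/ha), square root of the Type 4
# stability measure, and scaled genotype score on PC 1.
genotypes = {
    "G1": {"m": 5.0, "s_yl": 1.0, "u": 0.8},
    "G2": {"m": 4.8, "s_yl": 0.2, "u": -0.4},
}
# Hypothetical scaled location scores on PC 1.
locations = {"Loc1": -1.0, "Loc2": 0.0, "Loc3": 1.0}

def best_by_location(z_p):
    """Top-ranking genotype at each location for the given Z_(P)."""
    best = {}
    for loc, v in locations.items():
        values = {
            g: nominal_yield_reliability(p["m"], p["s_yl"], p["u"], v, z_p)
            for g, p in genotypes.items()
        }
        best[loc] = max(values, key=values.get)
    return best

best_mean = best_by_location(0.0)       # mean responses (P = 0.50)
best_reliable = best_by_location(1.28)  # lowest response expected in 90% of cases

print("best for mean nominal yield:  ", best_mean)
print("best for nominal reliability: ", best_reliable)
```

In this example the unstable genotype G1 is preferred at Loc2 on the basis of mean response, but the stable genotype G2 takes its place once the lowering for temporal instability is applied, i.e. the crossover point shifts.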

The proposed approach for modelling yield reliability responses assumes that the year-to-year variation for genotype yield on each site is not related, in general, to site ordination for mean yield (joint regression), PC scores (AMMI) or covariate values (factorial regression) - thereby justifying, for the genotype, the constant lowering across sites of yield reliability responses relative to mean adaptive responses. This assumption, which can be verified by the lack of relationship between average temporal variation of the genotypes on each site (e.g. as average within-site phenotypic variance of annual yield values of genotype, s_y²) and values of site ordination for the adopted analytical model, is confirmed by results in Table 4.7 - even for a situation of particular concern, namely, site ordination for mean yield (m_loc) in the Algerian data set. No relationship between s_y² and m_loc takes place, despite the large variation in site mean yield that is also inclusive of very low values. This result is explained by the well-known, large temporal variation in genotype yields in unfavourable locations, caused by the wide year-to-year variation in the extent and the timing of major climatic stresses (e.g. Ceccarelli et al., 1991). The possible logarithmic transformation of this data set to better compare genotype values on specific sites (see Section 5.6) would not be recommended for modelling adaptive responses in terms of yield reliability, because the temporal variation of genotype values would be larger on lower-yielding than higher-yielding sites after transforming data. On the whole, the results in Table 4.7 suggest that the proposed procedure for integration of genotype responses for adaptation and yield stability can be applied in a large range of situations following the analysis of adaptation of original yield values. 
If any effect of site mean yield on the average extent of temporal variation of the genotypes on the sites were found, it could be removed by an appropriate data transformation based on results of the regression of s_y² values as a function of m_loc on a logarithmic scale, using the same criterion already discussed in relation to heterogeneity of genotypic variance among sites (see Section 5.6).
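A rough check of this assumption can be sketched in Python as the regression of s_y² on m_loc on a logarithmic scale. The sketch below is only illustrative: the function name and the site values are hypothetical, no significance test is included, and the choice of transformation from the estimated slope should follow the criterion of Section 5.6 rather than this code.

```python
import math

def loglog_regression(m_loc, s2_y):
    """Slope and correlation of log(s_y^2) regressed on log(m_loc).

    m_loc: site mean yields; s2_y: average within-site temporal variances.
    A slope near zero is consistent with the assumed lack of relationship.
    """
    x = [math.log(v) for v in m_loc]
    y = [math.log(v) for v in s2_y]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r

# Hypothetical sites (t/ha) with no trend of temporal variance on site mean
site_means = [1.0, 2.0, 4.0, 8.0]
site_variances = [0.30, 0.20, 0.35, 0.25]
slope, r = loglog_regression(site_means, site_variances)
```

A slope that differs clearly from zero would instead indicate that temporal variation depends on site mean yield, which is the situation addressed by the data transformation mentioned above.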

The statistical comparison among estimates of yield reliability of the best-ranking genotypes seems difficult in most instances. For measures based on Type 2 stability assessed across a set of environments, confidence intervals could be calculated for the probability that the top-ranking genotype outperforms each of the remaining entries (Eskridge and Mumm, 1992). Dunnett's critical difference values, as calculated for genotype comparison in terms of mean yield across environments or nominal yield on specific sites, may be used for comparing values of yield reliability or nominal yield reliability based on Type 1 or Type 4 stability, thereby obtaining a very rough but simple indication of which entries are less reliable than the top-ranking one.

For extension of results targeted to variety choice, the yield reliability estimates may be expressed as percentage values relative to the most reliable genotype or a control cultivar. Information may be provided on more than one level of risk aversion (e.g. P = 0.75 and P = 0.90), if the target population of farmers varies widely in attitude towards risk (as a consequence of farmers' resources, goals, farming system, etc.).
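The conversion to percentage values can be sketched as follows in Python. The function name, genotype labels and reliability values are hypothetical; the reference is the most reliable genotype unless a control cultivar is named.

```python
def relative_reliability(values, control=None):
    """Express yield reliability as a percentage of a reference entry.

    values: dict mapping genotype name to yield reliability estimate.
    control: name of the control cultivar; if None, the most reliable
             genotype is taken as the reference (100 percent).
    """
    reference = max(values.values()) if control is None else values[control]
    return {name: 100.0 * v / reference for name, v in values.items()}

# Hypothetical reliability estimates (t/ha) at P = 0.90
estimates = {"A": 2.0, "B": 4.0, "C": 3.0}
vs_best = relative_reliability(estimates)
vs_control = relative_reliability(estimates, control="A")
```

The same function can be applied separately to the estimates obtained for each level of risk aversion, e.g. once for P = 0.75 and once for P = 0.90.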

7.3 Computer software

The calculation of the measures of yield stability and reliability herein considered of greatest interest is relatively simple and does not require specific software. In particular, values of environmental variance (for original or relative yields) and the derived reliability indexes can easily be calculated through a worksheet (or its mode as available in IRRISTAT). The comparison of environmental variance values, requiring also correlation analysis, and the calculation of Type 4 stability measures, requiring the execution of simple one-way ANOVAs, can be performed by IRRISTAT or any ordinary statistical software. In particular, the ANOVA for each genotype performed on plot yields for estimation of S_y_(l)² values according to formula [7.3] can easily be carried out through IRRISTAT by indicating the Location and Location × Year effects in the ANOVA module, and specifying Genotype as the classification criterion in the Data selection option for the analysis of different data subsets. The utilization of Type 4 statistics for modelling yield reliability responses is a more complex task that can be accomplished using worksheets - as envisaged earlier for the estimation of nominal yields according to the analysis of adaptation model (exploiting the previous calculation of GL interaction effects, which is the most demanding issue).
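As an alternative to a worksheet, the environmental variance of formula [7.1] and its derived measures can be computed with a few lines of Python. This is a minimal sketch: the function name and the yield values are hypothetical, and the coefficient of variation is assumed to be S expressed as a percentage of the genotype mean yield.

```python
from statistics import mean, variance

def environmental_variance(yields):
    """Type 1 stability of one genotype from its yields across environments.

    Returns S^2 as in formula [7.1] (divisor e - 1), its square root S,
    and the coefficient of variation in percent (assumed as 100 * S / mean).
    """
    s2 = variance(yields)  # sample variance, divisor e - 1
    s = s2 ** 0.5
    cv = 100.0 * s / mean(yields)
    return s2, s, cv

# Hypothetical genotype yields (t/ha) averaged across replicates in each environment
yields_by_genotype = {"G1": [2.0, 4.0, 6.0], "G2": [3.9, 4.0, 4.1]}
stability = {g: environmental_variance(y) for g, y in yields_by_genotype.items()}
```

Both hypothetical genotypes have the same mean yield, but G2 has the smaller S² and is therefore the more stable entry in the Type 1 sense.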

Other stability measures relative to regression coefficients and PC scores of genotypes may be calculated and, possibly, tested through a joint regression and an AMMI analysis, respectively, performed on the GE interaction effects, using IRRISTAT or other software (see Section 5.9). Freely available programs have been developed by Lin et al. (1992) to calculate regression coefficients and Wricke's and Shukla's statistics, and by Kang and Magari (1995) to estimate and test Shukla's parameter and calculate Kang's index of yield reliability. For SAS users, a version of the latter program is also available that allows for the analysis of unbalanced data sets (Magari and Kang, 1997). Again for SAS users, programs have been developed by Hussein et al. (2000) to assess a large number of stability measures and Kang's index of reliability, and by Annicchiarico (1997c) to estimate and test a smaller number of stability measures, in balanced data sets. Instructions for several methods of stability analysis that are also applicable to unbalanced data sets have been reported by Piepho (1999).
