4. Analysis of variance (ANOVA) and estimation of variance components

4.1 Models of ANOVA

Preliminary ANOVAs can be carried out for individual experiments to assess variation among environments for experimental error and, possibly, genotypic variance (see Sections 4.3 and 4.4). Combined ANOVAs for a complete set of experiments or its subsets can be performed with different objectives, such as:

verification of the occurrence (i.e. significance) of different effects;
estimation and comparison of mean values for levels of fixed factors (in particular, genotype mean values across the region or within subregions); and
estimation of the size of genotypic and genotype-environmental variance components (possibly as a step towards estimation of genetic parameters).

The ANOVA may also represent one step in the analysis of adaptation or in the assessment of yield stability measures.

Use of plot or cell mean values

A combined ANOVA can be performed using either plot values or data of genotypes in individual environments that have been averaged across experiment replicates (i.e. genotype-environment cell means). The use of plot values is considered in greater detail below, with special reference to experiments laid out in a randomized complete block design. The use of genotype-environment cell means may be preferable in some cases, for example:

in trials laid out in lattices or other incomplete block designs (allowing for the use of data adjusted according to the design);
where trials differ for experimental design; or
when there is a varying number of replicates.

To convert the results of an ANOVA performed on a cell mean basis into results on a plot basis, the sum of squares (SS) and mean square (MS) of effects must be multiplied by a common factor (r) equal to the harmonic mean of the number of experiment replicates (Cochran and Cox, 1957). For p trials,

r = p/Σ(1/r_j)

where r_j = number of replicates of trial j. When all trials have the same number of replicates, r coincides with the arithmetic mean (i.e. with that common number). The pooled error MS (M_e) to be inserted in the ANOVA can then be estimated from the experimental errors of the individual trials. For trials with the same number of replicates, M_e is equal to the arithmetic mean of the experimental errors. For trials with a variable number of replicates, a valid option is to weight the experimental error (M_e(j)) of each trial by its number of replicates (r_j) (Cochran and Cox, 1957):

M_e=Σ(r_j M_e(j))/Σr_j
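
The two formulae above can be sketched in Python as follows. This is only a minimal illustration: the function names and the replicate numbers and error values used in the example are hypothetical and do not come from any data set discussed here.

```python
def harmonic_mean_replicates(reps):
    """Common factor r = p / sum(1/r_j) for p trials with r_j replicates each."""
    p = len(reps)
    return p / sum(1.0 / r_j for r_j in reps)


def pooled_error_ms(error_ms, reps):
    """Pooled error M_e = sum(r_j * M_e(j)) / sum(r_j)."""
    if len(error_ms) != len(reps):
        raise ValueError("one error MS and one replicate number per trial")
    return sum(r_j * m for r_j, m in zip(reps, error_ms)) / sum(reps)


# Hypothetical example: three trials with 2, 3 and 4 replicates.
reps = [2, 3, 4]
error_ms = [0.30, 0.20, 0.25]

r = harmonic_mean_replicates(reps)      # 3 / (1/2 + 1/3 + 1/4) = 36/13
m_e = pooled_error_ms(error_ms, reps)   # (0.60 + 0.60 + 1.00) / 9

print(f"r = {r:.3f}, M_e = {m_e:.4f}")
```

With equal replicate numbers both functions reduce to the simple cases stated in the text: r equals the common number of replicates and M_e equals the arithmetic mean of the trial errors.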

Genotype, location and time factors

Besides the genotype factor and the possible block factor, the combined ANOVA may include only the location factor, or (for trials repeated in time) also a time factor, such as year (annual crops) or crop cycle (perennials). Alternatively, there might be an environment factor, for which the levels are represented by individual trials. Some ANOVA models may also include a subregion factor (following the definition of mega-environments) and/or a germplasm group factor (for which the groups may coincide with distinct gene pools, variety types or material with contrasting adaptation patterns).

The ANOVA models may differ in terms of number and type of factors as well as relationship between factors. In particular, the year factor may be crossed with, or nested within, the location factor. A further element of distinction between models arises from the definition of each factor as random or fixed. In general, randomness implies that factor levels are randomly chosen from a target population, to which conclusions for fixed factors are extended and in which the extent of variation (rather than the difference between specific factor levels, e.g. pairs of genotypes) is of primary concern. This definition may be applied to the time factor. The genotype factor may be defined as random or fixed, depending on the objective of the analysis. It is random when the aim is to support decisions regarding elements of a breeding strategy by estimating: variance components, genetic parameters, genetic gains expected from different adaptation strategies or selection procedures etc. In this case, the genotypes should be representative of the relevant genetic base. Conversely, genotype is a fixed factor when the emphasis is on the comparison of tested material for selection or recommendation.

The location factor is definitely random when the main interest of the analysis lies in the estimation of variance components (Wricke and Weber, 1986) for sites that are representative of the relevant population within the target region. The choice between random and fixed can be problematic for sites that are closely investigated for similarity of GL interaction effects and the possible identification of relatively uniform subregions for breeding or recommendation purposes. Random is usually the more appropriate choice. Fixed may be envisaged only if each location represents a well-defined area with its own crop management, so that results for a given site may be extended to the area that it represents. The environment factor is usually random. Finally, both the subregion and the germplasm group factors (if present) are fixed.

Three major groups of partially hierarchical ANOVA models are herein considered for the combined analysis of sets of experiments laid out in a randomized complete block design. The first group includes models with three factors:

genotype;
location or environment; and
block within locations or environments.

The observed yield response R_ijr of the genotype i in the location j and block r is:

R_ijr = m + G_i + L_j + B_r (L_j) + GL_ij + e_ijr

where m = grand mean; G = genotype, L = location and B = block effects; and e_ijr = random error. This group of models is useful for analysis of adaptation based on trials that are not repeated in time, as is frequently the case for perennials.

The second group of ANOVA models comprises four factors:

genotype;
location;
year (or other time factor), crossed with the location factor; and
block within locations and years.

The yield response R_ijkr of the genotype i in the location j, year k and block r is:

R_ijkr = m + G_i + L_j + Y_k + B_r (L_j Y_k) + GL_ij + GY_ik + LY_jk + GLY_ijk + e_ijkr

where Y = year effects.

The third group of models includes the same factors as the second group, but the time factor is nested into location. The yield response R_ijkr is:

R_ijkr = m + G_i + L_j + Y_k (L_j) + B_r (L_j Y_k) + GL_ij + GY_ik (L_j) + e_ijkr

This ANOVA layout is particularly useful when locations differ for test years, although it may also be used as an alternative to the preceding layout, i.e. for same test years across locations. In fact, a fourth group of ANOVA models may also be considered where the location factor is nested into year (i.e. test locations change across years). Difficulty in assessing the genotype responses to locations makes these ANOVA models less useful for analysis of adaptation.

Test of effects

Within each group, four different models are available, depending on whether the genotype, location and environment factors are defined as random or fixed. This definition has implications for the expectations of MS values with the possible modification of the error term to be adopted in the F tests. The ratios of mean squares for the F tests are reported for the models of each group in Tables 4.1 to 4.3. They are based on the expected MS values of general cases reported by Dagnelie (1975a^[6]).

TABLE 4.1 - ANOVA models including the factors G = genotype and L = location or environment, and estimation of variance components, for trials in a randomized complete block design

Source of variation	DF^e	MS	Model 1^a		Model 2^b		Model 3^c		Model 4^d
Source of variation	DF^e	MS	F test	Variance component	F test	Variance component	F test	Variance component	F test
G	g-1	M₁	M₁/M₄	s_g² = (M₁-M₄)/rl	M₁/M₄	-	M₁/M₅	s_g² = (M₁-M₅)/rl	M₁/M₅
L	l-1	M₂	M₂/M₄	-	M₂/M₅	-	M₂/M₄	-	M₂/M₅
Block (L)	(r-1) l	M₃	-	-	-	-	-	-	-
G × L	(g-1) (l-1)	M₄	M₄/M₅	s_gl² = (M₄-M₅)/r	M₄/M₅	s_gl² = (M₄-M₅)/r	M₄/M₅	s_gl² = (M₄-M₅)/r	M₄/M₅
Pooled error	(r-1) (g-1) l	M₅	-	s_e²= M₅	-	s_e²= M₅	-	s_e²= M₅	-

^a Model 1 = G and L random factors;
^b Model 2 = G fixed, L random;
^c Model 3 = L fixed, G random;
^d Model 4 = G and L fixed factors.
^e g = no. genotypes; l = no. locations; r = no. blocks.

TABLE 4.2 - ANOVA models including the factors G = genotype, L = location and Y = year, and estimation of variance components, for trials in a randomized complete block design repeated in same years (i.e. L and Y crossed factors)

Source of variation	DF^e	MS	Model 1^a		Model 2^b		Model 3^c		Model 4^d
Source of variation	DF^e	MS	F test^f	Variance component	F test^f	Variance component	F test^f	Variance component	F test
G	g-1	M₁	?	s_g² = (M₁-M₅-M₆ + M₈) /ryl	?	-	M₁/M₆	s_g² = (M₁-M₆)/ryl	M₁/M₆
L	l-1	M₂	?	-	M₂/M₇	-	?	-	M₂/M₇
Y	y-1	M₃	?	-	M₃/M₇	-	M₃/M₆	-	M₃/M₉
Block (L Y)	(r-1) ly	M₄	-	-	-	-	-	-	-
G × L	(g-1) (l-1)	M₅	M₅/M₈	s_gl² = (M₅-M₈)/ry	M₅/M₈	s_gl² = (M₅-M₈)/ry	M₅/M₈	s_gl² = (M₅-M₈)/ry	M₅/M₈
G × Y	(g-1) (y-1)	M₆	M₆/M₈	s_gy² = (M₆-M₈)/rl	M₆/M₈	s_gy² = (M₆-M₈)/rl	M₆/M₉	s_gy² = (M₆-M₉)/rl	M₆/M₉
L × Y	(l-1) (y-1)	M₇	M₇/M₈	-	M₇/M₉	-	M₇/M₈	-	M₇/M₉
G × L × Y	(g-1)(l-1)(y-1)	M₈	M₈/M₉	s_gly² = (M₈-M₉)/r	M₈/M₉	s_gly² = (M₈-M₉)/r	M₈/M₉	s_gly² = (M₈-M₉)/r	M₈/M₉
Pooled error	(r-1) (g-1) ly	M₉	-	s_e²= M₉	-	s_e²= M₉	-	s_e²= M₉	-

^a Model 1 = G, L and Y random factors;

^b Model 2 = G fixed, L and Y random;

^c Model 3 = L fixed, G and Y random;

^d Model 4 = G and L fixed, Y random.

^e g = no. genotypes; l = no. locations; y = no. years; r = no. blocks.

^f Tests marked with a question mark are feasible, but not straightforward; for approximate tests, see text.

TABLE 4.3 - ANOVA models including the factors G = genotype, L = location and Y = year, and estimation of variance components, for trials in a randomized complete block design repeated in different years in each location (i.e. Y factor nested into L)

Source of variation	DF^e	MS	Model 1^a		Model 2^b		Model 3^c		Model 4^d
Source of variation	DF^e	MS	F test^f	Variance component	F test	Variance component	F test^f	Variance component	F test
G	g-1	M₁	M₁/M₅	s_g² = (M₁-M₅)/ryl	M₁/M₅	-	M₁/M₆	s_g² = (M₁-M₆)/ryl	M₁/M₆
L	l-1	M₂	?	-	M₂/M₃	-	?	-	M₂/M₃
Y (L)	(y-1) l	M₃	M₃/M₆	-	M₃/M₇	-	M₃/M₆	-	M₃/M₇
Block (L Y)	(r-1) ly	M₄	-	-	-	-	-	-	-
G × L	(g-1) (l-1)	M₅	M₅/M₆	s_gl² = (M₅-M₆)/ry	M₅/M₆	s_gl² = (M₅-M₆)/ry	M₅/M₆	s_gl² = (M₅-M₆)/ry	M₅/M₆
G × Y (L)	(g-1) (y-1) l	M₆	M₆/M₇	s_gy(l)² = (M₆-M₇)/r	M₆/M₇	s_gy(l)² = (M₆-M₇)/r	M₆/M₇	s_gy(l)² = (M₆-M₇)/r	M₆/M₇
Pooled error	(r-1) (g-1) ly	M₇	-	s_e²= M₇	-	s_e²= M₇	-	s_e²= M₇	-

^a Model 1 = G, L and Y random factors;
^b Model 2 = G fixed, L and Y random;
^c Model 3 = L fixed, G and Y random;
^d Model 4 = G and L fixed, Y random.
^e g = no. genotypes; l = no. locations; y = no. years; r = no. blocks.
^f Test marked with a question mark is feasible, but not straightforward.

Other effects may also contribute to the definition of the error term for an F test, so that the test is sometimes not as straightforward as the ratio between two mean squares. Considering the models in Table 4.2, there is uncertainty in the F test for the genotype factor, for which the null hypothesis is: i) the lack of purely genetic variation within the relevant genetic base (Model 1), or ii) the lack of differences in mean value between the tested genotypes (Model 2), across the target populations of years and locations. The reason is that various random effects contribute to the error term. The expectations of MS values suggest that the following combination of mean squares (using notations of Table 4.2) can provide an appropriate error term (Kempthorne, 1952):

M_err = M₅ + M₆ - M₈

for which the associated degrees of freedom (DF) are:

DF = (M₅ + M₆ - M₈)²/{ M₅²/[(g-1)(l-1)] + M₆²/[(g-1)(y-1)] + M₈²/[(g-1)(l-1)(y-1)] }

This F test is preferable to that reported by Cochran and Cox (1957), because the error term can be easily adopted in formulae for the calculation of standard errors and comparison of genotype means. For standard errors, the results coincide with those from other formulae (Bowman, 1989; Patterson, 1997), which require, however, the estimation of all genotype-environment components of variance. Testing of the genotype factor is much easier with ANOVA models 1 and 2 in Table 4.3 or with models 1 and 2 in Table 4.1, where the environment factor summarizes locations and years. In the latter case, however, the error term (i.e. GE interaction) tends to be smaller than the appropriate error based on the individual genotype-environment sources of variation (Patterson, 1997), resulting in loss of protection for the test.
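
The composite error term and its approximate degrees of freedom can be computed as in the sketch below, which takes the DF of each mean square from Table 4.2. The function name is hypothetical. The input values are the rounded mean squares of Table 4.4, used here only to illustrate the arithmetic, so the output is approximate.

```python
def composite_genotype_error(m_gl, m_gy, m_gly, g, l, y):
    """Return (M_err, DF) with M_err = M5 + M6 - M8 (notation of Table 4.2).

    Each mean square is divided by its own degrees of freedom:
    GL: (g-1)(l-1); GY: (g-1)(y-1); GLY: (g-1)(l-1)(y-1).
    """
    df_gl = (g - 1) * (l - 1)
    df_gy = (g - 1) * (y - 1)
    df_gly = (g - 1) * (l - 1) * (y - 1)
    m_err = m_gl + m_gy - m_gly
    df = m_err ** 2 / (m_gl ** 2 / df_gl + m_gy ** 2 / df_gy + m_gly ** 2 / df_gly)
    return m_err, df


# Rounded mean squares from Table 4.4 (18 genotypes, 31 locations, 3 years).
m_err, df = composite_genotype_error(m_gl=1.15, m_gy=2.94, m_gly=0.78,
                                     g=18, l=31, y=3)
f_genotype = 36.00 / m_err   # genotype MS divided by the composite error term

print(f"M_err = {m_err:.2f}, DF = {df:.1f}, F = {f_genotype:.2f}")
```

Because the GY mean square dominates the denominator, the approximate DF is much closer to that of the GY interaction (34) than to those of the other two terms.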

An approximate F test for the location factor in Models 1 and 3 in Table 4.2 can be performed using location × year interaction as the error term, provided that (as is often the case) its MS is much greater (e.g. > 30 times) than that for GL interaction. Similarly, the same error term can be used for testing the year factor in Model 1 in Table 4.2, if its MS is much greater than that for GY interaction.

With year as a random factor, the appropriate error term for testing the GL interaction in multiyear data sets is always (across Models 1 to 4): GLY interaction, when the year factor is crossed with location (Table 4.2); and average GY interaction within locations, when the year factor is nested within location (Table 4.3). The major difference between the models in Table 4.2 and those in Table 4.3 lies in the inability of the latter to separate the variation due to the year effect across all locations. Therefore, the GY interaction within locations for the Table 4.3 models sums up the variation of the GY and GLY interaction terms for the Table 4.2 models. On the other hand, the variation of years within locations for the models in Table 4.3 pools the variation of year and LY interaction for the models in Table 4.2. Pooling GY and GLY interactions tends to increase slightly the error term for GL interaction, but in compensation it facilitates some aspects of the analysis. Failure to distinguish between GY and GLY interaction effects is not particularly detrimental, as they both contribute to decisions on yield stability for breeding (Fig. 2.3) or variety recommendation (Fig. 2.4).

TABLE 4.4 - Analysis of variance for 18 bread wheat varieties grown for three years in 31 Italian locations, with partitioning of GL interaction by: 1) joint regression analysis; 2) AMMI analysis; 3) definition of four subregions

Source of variation^a	DF	SS	MS^b	Variance component (t/ha)²
G	17	612.8	36.00 **	0.117
L	30	8 775.1	292.50 **	-
- SR	3	4 090.0	1 363.35 *	-
- L (SR)	27	4 684.1	173.56 **	-
Y	2	665.5	332.74 *	-
Block (L Y)	186	242.2	1.30	-
G × L	510	587.9	1.15 **	0.043
1. - G regressions	17	61.1	3.59 **	-
- Deviations	493	526.8	1.07 **	-
2. - PC 1	46	129.3	2.81 **	-
- PC 2	44	94.1	2.14 **	-
- Residual	420	364.5	0.87 ns	-
3. - G × SR	51	188.1	3.69 **	-
- G × L (SR)	459	399.8	0.87 ns	-
G × Y	34	100.0	2.94 **	0.024
L × Y	60	4 767.2	79.45 **	-
G × L × Y	1 020	794.6	0.78 **	0.179
Pooled error	3 162	791.2	0.25	0.250

^a G = genotype; L = location; SR = subregion; Y = year factors (trials in a randomized block design with three replicates).

^b ns = not significant; * = significant at P < 0.05; ** = significant at P < 0.01 (F_R test for PC axes).

Source: Annicchiarico and Perenzin, 1994 (adapted from data).

Additional factors

An example of ANOVA including a subregion factor is given in Table 4.4. The variation in location (random factor) and its interaction of greatest interest (i.e. GL) is divided into two components relative to the differences:

between subregions (previously identified through analysis of adaptation); and
between locations within subregions.

The GL interaction variation is also partitioned by two models for analysis of adaptation (see Chapter 5). About 32 percent of the GL interaction variation is accounted for by genotype × subregion interaction. This effect is significant, whereas average GL interaction within a subregion is not, following F tests in which average GL interaction acts as the error term of genotype × subregion interaction and holds the GLY interaction as its error term.

The inclusion in ANOVA models of a germplasm group factor would split the genotype variation (for main effect and interactions) into two components relative to the differences:

between groups; and
between genotypes within groups,

in order to verify whether genotypic and genotype-environment differences are mainly accounted for by distinct gene pools, variety types or other features that define the groups. Incidentally, a “genotypic value” fixed factor, with the levels defined by genetic marker information, may also be included in the combined ANOVA to assess the proportion of genetic and genotype-environmental variation explained by the genetic factor (Walsh, 2002).

Unbalanced data sets

When analysing unbalanced data sets, the arithmetically simpler, sequential (Type I) SS can be biased for some effects (with bias proportional to the degree of imbalance). Corrected SS (usually Type III) should be adopted for this situation (Milliken and Johnson, 1984), whereas the elements of the analysed genotype by location data matrix should derive from least squares estimation, i.e. genotype means adjusted for the lack of orthogonality in the data (Searle, 1987). Solutions capable of eliminating imbalance are often sought (e.g. the estimation of missing plot values based on experimental design; the performance of ANOVA and following analyses on genotype by environment cell means), in order to simplify the analysis. Similarly, some genotypes and/or locations may be eliminated from ANOVA and analysis of adaptation in order to obtain a complete matrix of genotype by location (trials not repeated in time) or genotype by location by year (trials repeated in time) yield values. Information from these locations may be utilized at a later stage to define subregions and assess genotype values (see Section 5.7).

For ANOVAs including subregion or germplasm group factors, the imbalance may arise from a variable number of sites or environments per subregion or genotypes per group. The use of Type I SS (if unavoidable) should be limited to the estimation of effects related to these factors (e.g. subregion, location within subregion, genotype × subregion interaction, and genotype × location within subregion interaction in Table 4.4); the remaining effects may then be estimated using an ANOVA excluding these factors.

4.2 Estimation of individual effects and comparison of means

The estimation of genotype (G_i) and location (L_j) main effects and GL interaction effects (GL_ij) is illustrated in Table 4.5 for a balanced data set of three hypothetical genotypes grown in three locations.

TABLE 4.5 - Calculation of genotype (G_i) and location (L_j) main effects, and GL interaction effects (GL_ij), from mean values of genotypes at each location (m_ij)

Genotype	m_ij values			Genotype mean (m_i)	G_i values	GL_ij values
Genotype	Loc. 1	Loc. 2	Loc. 3	Genotype mean (m_i)	G_i values	Loc. 1	Loc. 2	Loc. 3
1	2	6	7	5	0	-1	1	0
2	3	5	4	4	-1	1	1	-2
3	4	4	10	6	1	0	-2	2
Loc. mean (m_j)	3	5	7
L_j values	-2	0	2

Note:

Grand mean (m) = 5

G_i = m_i - m
L_j = m_j - m
GL_ij = m_ij - m - G_i - L_j = m_ij - m_i - m_j + m

[4.1]

where m represents the grand mean, m_i the mean values of genotype i, m_j the mean values of location j, and m_ij the observed yield response of the genotype i in location j (averaged across replicates). In order to calculate the environment main effects and GE interaction effects, the location factor in the formulae should be substituted with the environment factor.
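
The calculations of formula [4.1] can be reproduced for the hypothetical data of Table 4.5 with the short sketch below. The function name is an assumption made for illustration, and the code is valid only for a complete (balanced) genotype by location matrix of cell means.

```python
def main_and_interaction_effects(cell_means):
    """Compute m, G_i, L_j and GL_ij from a complete matrix of cell means m_ij.

    cell_means[i][j] is the mean of genotype i in location j.
    """
    g = len(cell_means)
    l = len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (g * l)
    geno_means = [sum(row) / l for row in cell_means]
    loc_means = [sum(row[j] for row in cell_means) / g for j in range(l)]
    g_eff = [m_i - grand for m_i in geno_means]          # G_i = m_i - m
    l_eff = [m_j - grand for m_j in loc_means]           # L_j = m_j - m
    gl_eff = [[cell_means[i][j] - geno_means[i] - loc_means[j] + grand
               for j in range(l)] for i in range(g)]     # GL_ij
    return grand, g_eff, l_eff, gl_eff


# m_ij values of Table 4.5 (rows = genotypes, columns = locations).
m_ij = [[2, 6, 7],
        [3, 5, 4],
        [4, 4, 10]]

grand, g_eff, l_eff, gl_eff = main_and_interaction_effects(m_ij)
print(grand)    # 5.0
print(g_eff)    # [0.0, -1.0, 1.0]
print(l_eff)    # [-2.0, 0.0, 2.0]
print(gl_eff)   # [[-1.0, 1.0, 0.0], [1.0, 1.0, -2.0], [0.0, -2.0, 2.0]]
```

In a balanced matrix the interaction effects sum to zero within each row and each column, which provides a quick check on the calculations.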

The occurrence of GL interaction of the crossover type between pairs of genotypes may be verified - across locations represented by single environments - through the test proposed by Gail and Simon (1985) and described by Baker (1988). However, this procedure is limited by:

the potentially large number of paired comparisons in data sets with numerous genotypes;
the greater complexity of its application for trials repeated in time; and
the opportunity of considering crossover interactions for genotype responses that have conveniently been modelled through an analysis of adaptation (see Chapter 5).

Genotype values over a given region or subregion may be compared for selection or recommendation purposes. For the identification of entries that are no worse (statistically) than the best-ranking, Dunnett’s one-tailed test, as devised by Gupta (1965), is recommended for multiple comparisons (Dagnelie, 1975a^[7]; Lentner and Bishop, 1986). The test involves the calculation of a critical (or least significant) difference (d) valid for comparison of the top-ranking genotype (a kind of control cultivar) with the other entries:

d = t’ √ (2 M_err/N)

[4.2]

where M_err is the appropriate error term for the genotype factor in the ANOVA; N is the total number of observations for each genotype (e.g. N = no. sites × no. years × no. replicates, in trials repeated in time); and t’ values for P < 0.05 or P < 0.01 levels (which are different from t values used for calculating the ordinary least significant difference) may be found in statistical textbooks (e.g. Steel and Torrie, 1960).
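
Formula [4.2] and its use for identifying entries that are not significantly worse than the top-ranking one can be sketched as follows. The function names and the genotype means are hypothetical. The error term (M_err = 3.31) and N = 31 × 3 × 3 = 279 are derived from Table 4.4, and t' = 1.97 is the Table 4.6 value for 17 comparisons at P < 0.20 and DF = 50. This combination is only an illustration of the arithmetic, not a result reported in the source.

```python
import math


def critical_difference(t_prime, m_err, n_obs):
    """d = t' * sqrt(2 * M_err / N), as in formula [4.2]."""
    return t_prime * math.sqrt(2.0 * m_err / n_obs)


def not_worse_than_best(means, d):
    """Return the entries whose mean is less than d below the top-ranking mean.

    means: dict mapping genotype name -> mean yield over the region.
    """
    best = max(means.values())
    return [name for name, value in means.items() if best - value < d]


d = critical_difference(t_prime=1.97, m_err=3.31, n_obs=279)
print(f"d = {d:.2f} t/ha")

# Hypothetical genotype means (t/ha), for illustration only.
means = {"A": 6.0, "B": 5.8, "C": 5.5}
print(not_worse_than_best(means, d))
```

The critical difference applies only to comparisons with the top-ranking entry; it is not meant for comparing arbitrary pairs of genotypes.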

TABLE 4.6 - Values of t’ for calculation of Dunnett’s one-tailed (or Gupta’s) multiple comparisons of the top-ranking genotype with the remaining entries

	DF = 25		DF = 50		DF = 100
p	P < 0.10	P < 0.20	P < 0.10	P < 0.20	P < 0.10	P < 0.20
4	1.91	1.49	1.87	1.47	1.85	1.46
5	1.99	1.57	1.95	1.55	1.93	1.54
6	2.06	1.64	2.02	1.62	2.00	1.61
7	2.12	1.70	2.07	1.68	2.05	1.67
8	2.17	1.75	2.12	1.72	2.10	1.71
9	2.20	1.79	2.15	1.76	2.13	1.75
10	2.24	1.83	2.19	1.80	2.17	1.79
11	2.27	1.86	2.22	1.83	2.20	1.82
12	2.30	1.89	2.25	1.86	2.22	1.85
13	2.32	1.91	2.27	1.88	2.24	1.87
14	2.34	1.93	2.29	1.91	2.27	1.90
15	2.37	1.96	2.32	1.93	2.29	1.92
16	2.40	1.99	2.34	1.95	2.31	1.94
17	2.42	2.01	2.35	1.97	2.33	1.96
18	2.45	2.03	2.37	1.98	2.34	1.97
19	2.47	2.05	2.39	2.01	2.36	1.99
20	2.48	2.07	2.40	2.02	2.37	2.00
21	2.50	2.08	2.42	2.04	2.39	2.02
22	2.51	2.09	2.44	2.05	2.40	2.03
23	2.52	2.10	2.45	2.07	2.42	2.05
24	2.53	2.11	2.46	2.08	2.43	2.06

Note: DF = degrees of freedom of the error term; p = number of comparisons.

The ordinary level of statistical significance adopted in relation to Type 1 error rates (i.e. P < 0.05) generally implies severe Type 2 error rates in regional variety trials (Carmer and Walker, 1988; Kang, 1998). The failure to recognize and then select or recommend superior germplasm has a negative impact on breeding progress and farmers’ and regional yields. The widespread adoption of less critical probability levels, such as P < 0.10 or (preferably) P < 0.20, is recommended for achieving a better balance between Type 1 and Type 2 error rates. In order to facilitate the execution of Dunnett’s one-tailed tests at these P levels, appropriate t’ values are reported in Table 4.6 for between 4 and 24 multiple comparisons with the top-ranking entry (i.e. 5-25 tested genotypes) and three reference values of error DF (25, 50 and 100). Error DF beyond 100, or more than 24 comparisons, change the t’ values only negligibly.

Arithmetic means across observations are biased when data sets are unbalanced. Least squares means are preferable in this case (Searle, 1987). In the presence of missing values for some genotype-environment combinations, and under the usual hypothesis of environment random factor, genotype means can best be estimated through a Best Linear Unbiased Prediction (BLUP) procedure, usually based on a Restricted Maximum Likelihood (REML) method. The BLUP theory is described in Henderson (1975) and Searle et al. (1992); its application to the current context is discussed by DeLacy et al. (1996a) and Lynch and Walsh (1998), while pertinent examples are provided by Hill and Rosemberger (1985), Piepho (1994a) and Patterson (1997). BLUP-based means are shrunk towards the grand mean in comparison with least squares means, as estimation error due to the random effect of the environment is taken into account. The shrinkage is greater for means estimated across a lower number of environments. The ranking of genotypes for least squares means and BLUP-based means may differ only if there are missing values for a genotype-environment combination. It is primarily in such a situation that BLUP-based means (which are more demanding to calculate) may be applied.

4.3 Estimation of variance components

The most important variance components for defining adaptation strategy and yield stability targets are those relating to genotypic and genotype-environment effects. Genotype-environment effects may concern:

the two determinants of the GE interaction variance represented by heterogeneity of genotypic variance and lack of genetic correlation among environments; and
genotype interactions with location and time factors.

Genotype (s_g²) and GE interaction (s_ge²) variance components for balanced data sets can be estimated as described for Model 1 in Table 4.1. The heterogeneity of genotypic variance and the lack of genetic correlation variance components can be estimated through formulae provided by Dickerson (1962) and reported by Cooper et al. (1996a). An estimate of the former is obtained from the variance of the genotypic standard deviation values estimated for individual environments through separate ANOVAs. For the environment j, the genotypic variance (s_g(j)²) can be estimated from the MS of the genotype (M_g(j)) and the experimental error (M_e(j)) terms as:

s_g(j)² = (M_g(j) - M_e(j))/r

where r is the number of experiment replicates. The variance of the square root values V(s_g(j)) provides the estimation. The proportion of the GE interaction variance accounted for by this component is:

(V(s_g(j))/s_ge²) × 100

The size of the lack of genetic correlation among environments can easily be estimated as the difference between s_ge² and V(s_g(j)). The proportion of the GE interaction variation accounted for by this component is:

[(s_ge² - V(s_g(j)))/s_ge²] × 100

The pooled genetic correlation (r_g) among environments can be estimated as:

r_g = s_g²/(s_g² + s_ge² - V(s_g(j)))

This value indicates the relative size of GE interaction effects that are not due to heterogeneity of genotypic variance among environments and are, therefore, of practical interest for breeding programmes. Values close to unity and those close to zero reveal, respectively, substantially consistent and largely inconsistent response of genotypes across environments. Further information on the r_g concept and its estimation in other situations is provided in Section 5.7.
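
The partition of the GE interaction variance described above can be sketched as follows. The function names and all input values are hypothetical. Two choices in the code are assumptions that the text does not specify: V(s_g(j)) is computed as a sample variance (divisor n - 1), and negative estimates of s_g(j)² are set to zero before taking the square root.

```python
import math
import statistics


def genotypic_sd_by_environment(ms_genotype, ms_error, r):
    """Return s_g(j) for each environment from s_g(j)^2 = (M_g(j) - M_e(j)) / r."""
    sds = []
    for m_g, m_e in zip(ms_genotype, ms_error):
        s2 = (m_g - m_e) / r
        # Assumption: a negative variance estimate is truncated at zero.
        sds.append(math.sqrt(max(s2, 0.0)))
    return sds


def partition_ge_variance(s_g2, s_ge2, sds):
    """Split s_ge^2 into heterogeneity and lack-of-correlation parts; return r_g."""
    v_sg = statistics.variance(sds)      # assumption: sample variance (n - 1)
    lack_of_correlation = s_ge2 - v_sg
    return {
        "V_sg": v_sg,
        "pct_heterogeneity": 100.0 * v_sg / s_ge2,
        "pct_lack_of_correlation": 100.0 * lack_of_correlation / s_ge2,
        "r_g": s_g2 / (s_g2 + lack_of_correlation),
    }


# Hypothetical values: four environments, three replicates each.
sds = genotypic_sd_by_environment(ms_genotype=[1.2, 2.0, 0.9, 1.6],
                                  ms_error=[0.3, 0.5, 0.3, 0.4], r=3)
result = partition_ge_variance(s_g2=0.25, s_ge2=0.10, sds=sds)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The two percentages always sum to 100, since the lack of genetic correlation is obtained by difference.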

Tables 4.1, 4.2 and 4.3 report formulae for estimating genotypic and genotype-environmental variance components for different models of ANOVA and balanced data sets. The F test result indicates for each effect whether its variance differs significantly from zero. Results of ANOVAs performed on a cell mean basis need to be converted into results on a plot basis before using the formulae. A maximum likelihood method, especially REML, is preferable for the estimation of variance components in unbalanced data sets (Patterson, 1997; Lynch and Walsh, 1998). However, the formulae may still be applied (with caution) when the imbalance is due to a few missing plot values or, in less favourable situations, when MS values derive from Type III SS (adopting average values for coefficients of variance components).

The ANOVA in Table 4.4 includes variance component estimation according to the formulae for Model 1 in Table 4.2. The GL interaction variance - not too low relative to the genotypic variance (≈ 36%) and higher than the GY interaction variance - does not prevent verification of the potential of breeding for specific adaptation (Fig. 2.3). Should wide, rather than specific, adaptation be preferred, selection on contrasting sites is recommended; the variance summed up by the GY and the GLY interaction components of variance relative to the genotypic variance (167%) may be considered just large enough to support yield stability as a useful target (see Section 2.6).
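
As a check on the arithmetic, the Model 1 formulae of Table 4.2 can be applied to the mean squares of Table 4.4, as in the sketch below. The function name and dictionary keys are hypothetical. Because the published mean squares are rounded to two decimals, the recomputed components can differ slightly from those in the table, presumably as a result of that rounding.

```python
def variance_components_crossed(ms, g, l, y, r):
    """Model 1 of Table 4.2 (G, L and Y random; L and Y crossed).

    ms: dict with mean squares for 'G', 'GL', 'GY', 'GLY' and 'error'.
    g is not needed by these formulae; it is kept for a uniform signature.
    """
    return {
        "s_g2": (ms["G"] - ms["GL"] - ms["GY"] + ms["GLY"]) / (r * y * l),
        "s_gl2": (ms["GL"] - ms["GLY"]) / (r * y),
        "s_gy2": (ms["GY"] - ms["GLY"]) / (r * l),
        "s_gly2": (ms["GLY"] - ms["error"]) / r,
        "s_e2": ms["error"],
    }


# Rounded mean squares from Table 4.4: 18 genotypes, 31 locations,
# 3 years, 3 replicates.
ms_table_4_4 = {"G": 36.00, "GL": 1.15, "GY": 2.94, "GLY": 0.78, "error": 0.25}
components = variance_components_crossed(ms_table_4_4, g=18, l=31, y=3, r=3)

for name, value in components.items():
    print(f"{name} = {value:.3f}")
```

A negative result for any component would normally be interpreted as an estimate of zero; the sketch leaves such values untouched so that they remain visible.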

4.4 Data transformation

Data transformation is somewhat controversial in the context of combined ANOVAs, in particular, analysis of adaptation. It may be adopted where there is:

heterogeneity of experimental errors among test environments, violating the assumption of homogeneity of variances required for execution of some F tests;
heterogeneity of genotypic variance among environments or locations, with implications for the presence and size of GE interactions and the assessment of site similarity for GL interaction effects; or
heterogeneity of GY interaction among locations, with implications for the comparison of genotype values on specific sites in the analysis of adaptation.

The latter two situations are discussed in full in Section 5.6. In general, transforming data complicates to some extent the interpretation and exploitation of results, as well as their assessment in economic terms (since the monetary value is proportional to the untransformed yield). For example, a logarithmic transformation affects the measures of genotype merit based on mean yield or yield reliability, because it gives greater weight to data from low-yielding environments (see Section 7.1). When the change of data scale is not beneficial, data transformation (for genotype assessment) is only justified by situations of serious concern. Data transformation also complicates the investigation of adaptive traits (see Section 6.3).

Experimental errors are rarely homogeneous in regional yield trials. Their values are influenced by specific circumstances and tend to be lower in low-yielding environments (Bowman and Watson, 1997). This trend can be expected with variation in plot values bounded by zero at its lower end. The occurrence of heterogeneity of error MS can be verified by any of the available tests for variance comparison (e.g. Cochran’s, Hartley’s or Bartlett’s). In particular, Hartley’s (1950) test simply requires (for MS values with the same DF) the calculation of the ratio between the highest and the lowest MS and its comparison with a critical value (also given, for some reference values of DF, in Table 7.1). The main consequence of heterogeneous experimental errors is loss of sensitivity of the F tests in which the pooled error MS is the error term (Cochran and Cox, 1957). This makes a significant result fully reliable, and justifies some uncertainty when the F value of the test is not far below the critical value. The bias is reduced for balanced data sets. It should be noted that the bias does not concern the test of the effects of greater practical importance (genotype main effect and GL interaction) for the ANOVA models reported in Tables 4.2 and 4.3. Serious concern is justified only for largely heterogeneous errors in ANOVA models of Table 4.1.
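
The ratio used in Hartley's test is simple to compute, as the sketch below shows. The function name and the error MS values are hypothetical, and the critical value is not computed: it has to be read from a table for the relevant number of variances and DF.

```python
def hartley_ratio(error_ms):
    """Ratio between the highest and the lowest error MS (same DF assumed)."""
    lowest = min(error_ms)
    if lowest <= 0:
        raise ValueError("error mean squares must be positive")
    return max(error_ms) / lowest


# Hypothetical error MS values from five trials with the same error DF.
trial_errors = [0.20, 0.35, 0.25, 0.50, 0.30]
ratio = hartley_ratio(trial_errors)
print(f"F_max = {ratio:.2f}")   # compare with the tabulated critical value
```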

Cochran and Cox (1957) describe a procedure that assigns a lower weight to data of environments with higher error variance to perform a correct F test. A similar approach is represented by multiplying these data by (M_e²/M_e(j)²), where M_e is the pooled error MS and M_e(j) is the error MS in environment j (McLaren, 1996). The transformation affects the results of subsequent analysis of adaptation (e.g. Virk et al., 1991). There is concern, however, that genotype responses may be distorted by the influence of environments with less precise trials (Crossa, 1990). Furthermore, the estimation of M_e(j) values may be inaccurate, suggesting more complex approaches based on their prediction as a function of external variables (Frensham et al., 1998). Weighted genotype means seem unnecessary in a wide range of situations involving heterogeneous error variances (Hühn, 1997). For situations of major concern, a data transformation could be devised that eliminates the systematic effect of environment mean yield on experimental errors without introducing unwanted patterns in the data. The regression of error MS (s_e²) on mean yield (m_env) of the trials, with both terms expressed on a logarithmic scale, can reveal whether transformation is required (Dagnelie, 1975a^[8]). Regression slope:

b ≈ 2 (implying the relationship s_e ≈ k m_env) suggests a logarithmic transformation of the complete data set;
b ≈ 1 (implying the relationship s_e² ≈ k m_env) suggests a square root transformation; and
b ≈ 0 (implying no relationship of s_e² with m_env) discourages any data transformation.

Were s_e² and m_env not expressed on a logarithmic scale, the square root transformation would be suggested by a straight line through the origin and the logarithmic transformation by a quadratic curve. Other power transformations could be defined for intermediate situations (Dagnelie, 1975a^[9]), but their complexity makes adoption difficult. To avoid negative log₁₀-transformed values, original yield data that include values below 1 may be expressed in a convenient unit (e.g. kg/ha instead of t/ha) prior to transformation. Decisions on data transformation in relation to heterogeneity of genotypic variance take priority over those in the current context (i.e. heterogeneity of experimental errors) when the analysis of adaptation is aimed at the definition of adaptation strategies.
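The slope-based rule described above can be sketched as follows. This is only an illustration: the function names and the data are hypothetical, the slope is obtained by ordinary least squares on log₁₀-transformed values, and the choice of the nearest reference slope (0, 1 or 2) is an assumption of this sketch, since the text gives no cut-off values. The significance of the slope is not tested here and should be checked before any transformation is adopted.

```python
import math


def loglog_slope(mean_yields, error_ms):
    """Least-squares slope of log10(error MS) on log10(environment mean yield)."""
    if len(mean_yields) != len(error_ms) or len(mean_yields) < 3:
        raise ValueError("need at least three paired observations")
    x = [math.log10(m) for m in mean_yields]
    y = [math.log10(s) for s in error_ms]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def suggest_transformation(slope):
    """Map a slope to the transformation of the nearest reference value.

    Reference slopes: 0 (none), 1 (square root), 2 (logarithmic).
    Picking the nearest one is an assumption of this sketch.
    """
    options = {0: "none", 1: "square root", 2: "logarithmic"}
    nearest = min(options, key=lambda ref: abs(slope - ref))
    return options[nearest]


# Hypothetical trials: mean yield (t/ha) and error MS of each environment.
m_env = [0.8, 1.5, 2.4, 3.6]
s_e2 = [0.02, 0.05, 0.09, 0.16]
b = loglog_slope(m_env, s_e2)
print(f"b = {b:.2f}; suggested transformation: {suggest_transformation(b)}")
```

Under this rule, a slope such as b = 1.37 lies closer to 1 than to 2 and therefore points to the square root transformation.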

Table 4.7 provides information on four data sets characterized by significant heterogeneity of experimental errors (P < 0.01). A significant relationship between error MS and mean yield of experiments, suggesting a square root transformation (b = 1.37), is present only in the Algerian data set, where the variation for environment mean yield is the largest in relative values (eight-fold between highest and lowest value) and includes very low yields. For this data set, preference would be given to a logarithmic transformation to compensate for heterogeneity of genotypic variance among sites (see Section 5.6). Results for the other data sets suggest that the trend towards covariation of experimental error and mean yield may be negligible in many situations.

TABLE 4.7 - Relationships of environment mean yield (m_env) with experimental error (s_e²), and of location mean yield (m_loc) with within-location phenotypic variance (s_p²) or standard deviation (s_p) of genotype values, within-location GY interaction mean square (M_gy), and average within-location phenotypic variance of annual yield values for individual genotypes (s_y²), in different data sets

Item	Data set ^a
	Bread wheat (Italy)	Durum wheat (Algeria)	Lucerne (Italy)	Maize (Italy)
No. locations	20	14	12	11
No. genotypes	10	24	11	13
Repetition in time	4 years	2 years	No	3 years
Extreme m_env values (t/ha)	2.38 - 9.07	0.46 - 3.59	16.59 - 57.88	8.43 - 15.84
Regression s_e² - m_env ^bc	0.44 ns	1.37 **	-0.17 ns	0.61 ns
Extreme m_loc values (t/ha)	3.61 - 8.10	0.80 - 3.44	16.59 - 57.88	10.82 - 15.43
Correlation s_p - m_loc ^c	0.01 ns	0.78 **	0.53 ns	0.41 ns
Regression s_p² - m_loc ^c	0.24 ns	1.92 **	0.80 ns	0.89 ns
Regression M_gy - m_loc ^c	0.23 ns	1.67 **	-	0.84 ns
Regression s_y² - m_loc ^c	-1.10 ns	0.39 ns	-	-1.26 ns

^a Bread wheat: analysis of data from the Italian network of variety trials (Anon., 1993, 1994, 1995, 1996; consistent locations across cropping years). Durum wheat: case study in Chapter 8. Lucerne: data from Annicchiarico (1992). Maize: data from Annicchiarico et al. (1995) for FAO class 700 material.

^b Environments always different at P < 0.01 for s_e² values according to Cochran’s test.

^c ns = correlation or regression coefficient not different from zero (P > 0.05); ** = correlation or regression coefficient different from zero at P < 0.01; regressions on log₁₀-transformed values of both variables.

4.5 Computer software

With powerful software (e.g. SAS, S-PLUS, GENSTAT) all the statistical techniques considered, including the most complex and advanced (REML-based estimation of variance components and genotype means), may be applied; the relatively high cost of such software, however, hinders its widespread adoption. Cheaper applications (e.g. STAT-ITCF, MINITAB) permit the use of a more limited set of techniques; however, if used in combination with a degree of manual or worksheet calculation, most types of analysis may still be performed, particularly for balanced data sets.

The IRRISTAT software (Version 4.3), developed and made freely available by the International Rice Research Institute (IRRI),^[10] is currently considered the reference software for ANOVAs and other types of analysis. In particular, the analysis for different ANOVA models (Tables 4.1 - 4.3) can be performed through the ANOVA module, indicating effects in the model and error terms as appropriate. Specific terminology is used to indicate nested effects in the program. For example, for models in Table 4.3, the ANOVA Model Specification List should be as follows (with factors named as in the table): G, L, L × Y, L × Y × Block, G × L, G × L × Y (where L × Y = year within locations, L × Y × Block = block within years and locations, and G × L × Y = GY interaction within locations). The list of effects for ANOVA models in Table 4.2 comprises: G, L, Y, L × Y × Block, G × L, G × Y, L × Y, G × L × Y. Up to 10 missing plot values can be estimated through the procedure for balanced ANOVA (the observations with missing values should always be present in the data file). The Effect option allows for:

indication of error terms for each effect;
storage in an output data file of the mean values for genotypes, genotype-location combinations (which can be used for subsequent analysis of adaptation) or other combinations of factor levels; and
assessment of the heterogeneity of experiment errors by Bartlett’s test, and plotting the error variance as a function of the experiment mean yield.

ANOVAs for one or more subsets of data can be obtained by specifying, through the Data selection option, the name and level(s) of a classification criterion (e.g. location, environment) that defines the set (Single selection) or sets (Multiple selection) of observations included in (or excluded from) each analysis.

ANOVAs for individual trials (i.e. environments, termed “sites” in the program) can be performed through IRRISTAT’s Single Site Analysis module. This module may also be used to adjust genotype means according to different lattice designs, or to estimate them as least squares means when there are missing plot values. An ANOVA for each location-year combination can be obtained by defining an environment classification criterion and reporting its levels in the Multiple data selection option. The genotype by environment cell means can be stored in an output data file for further analysis.

IRRISTAT’s ANOVA module also comprises a procedure for unbalanced ANOVA which calculates Type III SS and least squares means. There is a limit to its adoption for a combined ANOVA because only a certain number of treatments may be analysed. However, least squares means of genotype-location combinations may also derive from ANOVAs for single trials when there are large numbers of missing plot values, or a combined ANOVA of treatment data averaged across experiment replicates when sites are characterized by a variable number of test years. No procedure is available for the estimation of variance components; previously reported formulae must therefore be applied.

IRRISTAT also provides a Summary module for calculating statistics across all observations of a data set, and a Regression module for simple or multiple regression analysis and correlation analysis of data sets or their subsets (through the Data selection option). Two methods of variable selection are available for multiple regression: forward selection and backward elimination (Dagnelie, 1975b^[11]; Draper and Smith, 1981); the stepwise method proper is not available. The chosen method needs to be specified for each independent variable in the Selection option of the Regression model window. The P level for variable selection, and the maximum number of steps in the analysis, can also be specified (the default values are P = 0.05 and twice the possible number of independent variables, respectively). Expected (fitted) values of regressions and residuals can be stored in the input data file, provided that these variables are defined in the file. Correlation is requested by an option in the Regression module. The correlation between characters recorded in a single experiment can be calculated on genotype mean values stored in a data file output by a previous ANOVA performed on plot values.

Exponential notation is common in IRRISTAT output (e.g. “.9221E-02” stands for 0.009221; “.4241E+03” stands for 424.1). Besides allowing for easy import of data and export of data to other programs (e.g. Excel), IRRISTAT provides worksheet facilities for data entry, editing, management and transformation (the latter is also possible for subsets of observations through the use of logical operators). An additional module can be used for randomization and layout of experiments. Details of the different modules for data management and analysis can be found in the tutorial manual available with the software.
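When values in exponential notation are copied from an output file into other calculations, they can be converted directly. The short sketch below assumes that the values follow the standard E notation shown in the two examples above; the function name is hypothetical.

```python
def parse_exponential(text):
    """Convert a value such as '.9221E-02' to an ordinary floating-point number."""
    return float(text.strip())


# The two examples quoted in the text.
for raw in (".9221E-02", ".4241E+03"):
    print(raw, "->", parse_exponential(raw))
```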

IRRISTAT does not allow for REML-based estimation of variance components or genotype means. The software ASREML, developed by the New South Wales Department of Agriculture (Gilmour et al., 1999), may be used for this purpose.

^[6] Ibid., p. 237.
^[7] Ibid., p. 253.
^[8] Ibid., p. 367.
^[9] Ibid., p. 368.
^[10] The software may be downloaded, together with all relevant documentation, from IRRI’s web page, www.irri.org/ in the Software Downloads window, or requested from IRRI’s Biometrics Unit (DAPO Box 7777, Metro Manila, Philippines) for the cost of materials and shipping.
^[11] Ibid., p. 98.