Chapter four
MOLECULAR MARKERS

The characterisation of cellular molecules and their variants, as an alternative to assessment of morphological and quantitative traits, has its attraction in the fact that such molecular markers are less subject to the influences of environmental factors and developmental stage.

METHODS

Strauss, Bousquet et al. (1992) distinguished between two classes of molecular marker - molecular genetic markers (those derived from direct analysis of polymorphism in DNA sequences), and biochemical markers (those derived from study of the chemical products of gene expression). A number of major types of molecular marker warrant discussion:

Isozymes

The principles, techniques and applications in plant breeding of isozymes, available since the mid-1950s, have been the subject of many reviews (e.g. Brown and Moran 1981; Weeden 1989; Liengsiri et al.). In the broad sense, the term isozyme refers to detectably different enzymes which catalyze the same reaction. The term allozyme is specifically applied to situations where the different forms are the result of allelic polymorphism.

In isozyme analyses, the particular enzyme is extracted from the plant tissue, and the different forms are separated by gel electrophoresis, on the basis of molecular size, shape and electrical charge. In the simplest protocols, starch gels are used. More sophisticated technologies, e.g. use of polyacrylamide gels and isoelectric focussing, give better resolution but are more expensive. A number of enzyme systems can be examined in this way. Both alleles at an allozyme locus in a heterozygous individual can be detected. Allozymes are thus co-dominant markers. They are also multiallelic and, compared to the RFLPs and RAPDs discussed below, fast and inexpensive to analyse. The number of markers is limited, however, by the number of enzymes available for analysis. Other limiting factors are discussed below.

Restriction Fragment Length Polymorphisms (RFLPs)

Used widely in research programmes since the early 1980s, RFLPs have been well described in a number of recent reviews (e.g. Landry & Michelmore 1987, Tanksley et al. 1989, Nance & Nelson 1989, Neale et al. 1989). The basic technique involves the extraction and then digestion of DNA with a restriction enzyme, which cuts the DNA at occurrences of a particular recognition sequence (usually 4 to 8 bases in length) throughout the strand. Several hundred of these enzymes, each with its own specificity in terms of recognition sequence, are known. The number and lengths of the resulting fragments therefore depend on the number and distribution of recognition sites. Common sources of polymorphisms are (Nance & Nelson 1989):

Two DNA molecules may differ in the number of restriction sites for a particular enzyme, e.g. as a result of point mutations that create or destroy a restriction site.
Two DNA molecules may differ in the length of the sequence separating common restriction sites.
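
The two sources of polymorphism listed above can be illustrated with a minimal sketch. The sequences are invented toy examples, not real genomic DNA, and the helper name digest is hypothetical. The recognition sequence used is that of EcoRI (GAATTC, cut after the first base); any other enzyme could be substituted.

```python
# Toy illustration only: sequences are invented, and real digests involve
# far longer molecules followed by electrophoresis and probing.
SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts between G and AATTC on the strand shown


def digest(seq, site=SITE, cut_offset=CUT_OFFSET):
    """Return the fragment lengths produced by cutting seq at every site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


reference = "ATATAT" + "GAATTC" + "CCCCGGGG" + "GAATTC" + "TTTTAAAA"
# Source 1: a point mutation (A -> G) destroys the second restriction site.
site_loss = "ATATAT" + "GAATTC" + "CCCCGGGG" + "GAGTTC" + "TTTTAAAA"
# Source 2: an insertion lengthens the sequence between the two common sites.
insertion = "ATATAT" + "GAATTC" + "CCCCGGGG" + "ACACAC" + "GAATTC" + "TTTTAAAA"

for name, seq in [("reference", reference),
                  ("site loss", site_loss),
                  ("insertion", insertion)]:
    print(name, digest(seq))
```

In this sketch the loss of a site merges two fragments into one longer fragment, while the insertion leaves the number of fragments unchanged but lengthens the middle one. Both changes would alter the banding pattern seen after electrophoresis.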

Following digestion, the fragments which have been generated are separated by gel electrophoresis. The generally large genomes of higher plants and animals produce too many fragments for clear resolution, and the technique of “Southern blotting” is therefore applied to the fragment array. This involves the transfer of the fragment pattern resulting from electrophoresis to a solid support, and then visualization of only those fragments which hybridize to a labelled probe. To be useful, probes in the “probe library” must be homologous to single-copy, low-copy, or tandemly repeated sequences in the target DNA. Probes are prepared from the genome under study or from a closely related one. Two common sources of probes are (Nance & Nelson 1989):

Randomly fragmented DNA, cloned in a plasmid and then screened to eliminate those fragments with highly repetitive sequences.
Complementary DNA (cDNA) molecules produced from messenger RNA (extracted from the target) and then cloned.

The latter are homologous to functional regions of the genome and are thus potentially more useful, though less numerous, than random probes. Restriction enzyme/probe combinations are then screened for those giving rise to polymorphisms. This is facilitated by the fact that the Southern blot can be probed in succession with several different probes.

Compared to isozymes, RFLPs are much more numerous, but also much more expensive, and require the use of radioactive material. Like isozymes, they are codominant and multiallelic markers.

Randomly Amplified Polymorphic DNAs (RAPDs)

RAPDs were first described only in 1990 (Williams et al. 1991). Good reviews of the methodology and application have been presented by Williams et al. (1991) and Rafalski et al. (1991). These publications include detailed descriptions supported by diagrammatic representations. Briefly, the technique involves the use of a short oligonucleotide of arbitrary sequence to prime the amplification of DNA fragments by the Polymerase Chain Reaction (PCR). An oligonucleotide will prime amplification from a genomic template if binding sites on opposite strands of the template exist within a distance which can be traversed by the DNA polymerase (up to several thousand nucleotides). Genomic polymorphisms at one or both priming sites result in the non-amplification of a band. RAPDs are thus dominant markers - appearance of a band implies homology with the primer used. All other alleles at the priming site will be represented by absence of the band. Codominant RAPD markers, resulting from insertions or deletions between priming sites and observed as different sized fragments amplified from the same locus, are detected only rarely (Williams et al. 1991). A primer usually amplifies several bands, each originating from a different genomic location. The nature of the fragments amplified is influenced dramatically by the sequences of both primer and template. Fragments are separated on agarose gels and stained with ethidium bromide. Primers most commonly used are 10 nucleotides in length with at least 50% Guanine-Cytosine content. Primers as short as 5 nucleotides give more complex banding patterns requiring more sophisticated electrophoretic and staining procedures (acrylamide gels and silver staining). Each different primer used will result in a different banding pattern. As for RFLPs, the RAPD technique leads to the availability of a virtually unlimited number of markers. 
Because they do not require Southern blotting, development of species-specific probes and radioactive labelling, RAPD analyses can be conducted much more quickly (and with fewer laboratory restrictions) than those involving RFLPs. As a result of amplification by the Polymerase Chain Reaction, quantities of DNA required for analysis are 100 times lower (only 15–20 ng) than for RFLPs. The techniques are, however, considerably more expensive than those involved in isozyme analyses, and the dominant nature of the marker, resulting in an inability to distinguish homozygotes and heterozygotes, is a limitation for some applications. Use of RAPD markers may permit mapping in areas of the genome not accessible to RFLP analysis due to the presence of repetitive DNA sequences. As with other genetic markers, some polymorphisms are easy to score, while others are ambiguous and not useful as markers. Ambiguous polymorphisms may result from poor discrimination by a primer between alternative priming sites of slightly different nucleotide sequences.
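
The priming requirement and the dominant nature of RAPD markers can be sketched as follows. This is a simplified illustration under stated assumptions: exact matching stands in for primer annealing, the primer and template sequences are invented, and the function names (rapd_bands, band_present) are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq, motif):
    """Start positions of every occurrence of motif in seq."""
    hits, pos = [], seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def rapd_bands(template, primer, max_len=3000):
    """Lengths of fragments that a single arbitrary primer could amplify.

    A band requires the primer sequence on the strand shown and, downstream
    and within max_len, its reverse complement (a site on the opposite strand).
    """
    bands = []
    for f in find_all(template, primer):
        for r in find_all(template, revcomp(primer)):
            length = r + len(primer) - f
            if r >= f + len(primer) and length <= max_len:
                bands.append(length)
    return sorted(bands)


PRIMER = "AGGCTCGATC"  # invented 10-mer with 60% G+C content
band_allele = "TTTT" + PRIMER + "A" * 30 + revcomp(PRIMER) + "TTTT"
# A point mutation in one priming site abolishes amplification.
null_allele = band_allele.replace(PRIMER, "AGGCTAGATC")


def band_present(genotype):
    """A diploid shows the band if either allele can be amplified."""
    return any(rapd_bands(allele, PRIMER) for allele in genotype)


for label, genotype in [("band/band", (band_allele, band_allele)),
                        ("band/null", (band_allele, null_allele)),
                        ("null/null", (null_allele, null_allele))]:
    print(label, band_present(genotype))
```

The band/band homozygote and the band/null heterozygote both show the band, which is why the two cannot be told apart on the gel.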

Microsatellites

Microsatellites are DNA sequences composed of a tandem repetition of a simple short sequence, occurring in the genome of many higher organisms (Rafalski et al. 1993). The most common are dinucleotide repeats. They are very common, and very polymorphic (there are many variants). Providing the sequence of the DNA surrounding a microsatellite is known and suitable PCR primers can be designed, the segment of DNA incorporating the microsatellite can be amplified and its length determined by electrophoresis. Multiple allelic length variants can be identified at most microsatellite loci. Advantages of microsatellites are their abundance, high degree of polymorphism, multi-allelic and co-dominant nature, basis on PCR reactions (requiring only small amounts of DNA and no radioactive labels), and that they can be shared among laboratories by exchanging primer DNA sequences. Disadvantages are the requirement for cloning and sequencing of microsatellite loci, the need for high resolution gels, and the difficulty of plus/minus assays (Rafalski et al. 1993).
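
As a minimal sketch of why microsatellite alleles can be scored by length, the example below builds two invented amplicons that differ only in the number of CA repeats between fixed flanking sequences. The flanking sequences, repeat numbers and the function name longest_repeat are assumptions made for illustration.

```python
import re


def longest_repeat(seq, unit="CA"):
    """Number of tandem copies in the longest uninterrupted run of unit."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Invented flanking sequences standing in for the regions matched by PCR primers.
LEFT, RIGHT = "GATTGCCTAG", "TTGACGGATC"
allele_a = LEFT + "CA" * 12 + RIGHT
allele_b = LEFT + "CA" * 15 + RIGHT

for name, allele in [("allele a", allele_a), ("allele b", allele_b)]:
    print(name, "repeats:", longest_repeat(allele), "amplicon length:", len(allele))
```

A heterozygote carrying both alleles would give two amplification products differing by six base pairs. Resolving such small differences is the reason high resolution gels are needed.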

Other molecular markers have been used. The use of seed storage proteins was reviewed by Gepts (1989). Polymorphisms for seed storage proteins have been identified in several species, e.g. beans, pea, maize, barley. Post-translational processes, e.g. glycosylating reactions, contribute to molecular weight heterogeneity of these. Levels of diversity tend to be higher than for isozymes and the markers, like isozymes, are codominant and cheaply analysed by electrophoresis. Disadvantages are the limited coverage of the genome and their absence in many tissues. Terpenoids are useful biosystematic markers, but only genetic changes that substantially change gene expression or biosynthetic enzyme activity will be detected through their use (Strauss, Bousquet et al. 1992). They are inexpensive to analyse, and perhaps more numerous than isozymes, but the number of statistically independent dimensions provided is reduced by strong correlations among terpene concentrations (Hanover 1992).

PRINCIPLES AND ACHIEVEMENTS

Molecular markers have been applied in three areas of plant improvement, specifically: quantification of genetic diversity, genotype verification and delineation, and marker assisted selection.

Quantification of Genetic Diversity

The use of molecular markers for the rapid determination of the extent of genetic variation within and among populations is of value in guiding:

Gene conservation activities, which are aimed at maintaining genetic diversity with respect to traits of both known and unknown importance.
Development of breeding populations, which is aimed at defining populations with appropriately large genetic variation around high means with respect to traits of known importance.

In both cases, the use of markers would circumvent the obstacles to measuring variation in terms of quantitative traits: environmental effects, difficulties in calculating genetic diversity parameters, and requirement for several years growth before many traits can be measured (Hamrick 1992). Marker genes are considered to be random samples of a large class of functional genes (Conkle 1992). Genetic variation is measured by the percentage of loci which are polymorphic, and the expected heterozygosity under Hardy-Weinberg equilibrium (Loveless 1992).
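
The two measures named above can be computed from allele frequencies as in the following sketch. The locus names and frequencies are hypothetical, and the polymorphism criterion used here (most common allele at a frequency of 0.95 or less) is one common convention assumed for illustration, not one stated in the sources cited.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus under Hardy-Weinberg equilibrium."""
    return 1.0 - sum(p * p for p in freqs)


def diversity_summary(loci, threshold=0.95):
    """Return (percentage of polymorphic loci, mean expected heterozygosity).

    loci maps a locus name to its list of allele frequencies (summing to 1).
    A locus counts as polymorphic if its commonest allele is <= threshold.
    """
    n_polymorphic = sum(1 for f in loci.values() if max(f) <= threshold)
    mean_he = sum(expected_heterozygosity(f) for f in loci.values()) / len(loci)
    return 100.0 * n_polymorphic / len(loci), mean_he


# Hypothetical allele frequencies at four isozyme loci in one population.
population = {
    "locus_1": [0.5, 0.5],
    "locus_2": [1.0],
    "locus_3": [0.7, 0.2, 0.1],
    "locus_4": [0.98, 0.02],
}

percent_polymorphic, mean_he = diversity_summary(population)
print(percent_polymorphic, round(mean_he, 4))
```

Monomorphic loci are included in the mean, so the estimate depends on which loci are sampled.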

For agricultural crop species, isozymes have been used to demonstrate reduction of diversity due to domestication for rice, tomato, watermelon and beans (Gepts 1989). No such reduction was evident for Douglas fir (Neale, Devey et al. 1992). For forest tree species, isozymes have been widely used for the assessment of among and within population variation, e.g. for Abies alba, Larix decidua, Picea abies, Pinus cembra, P. halepensis, P. leucodermis, P. nigra, P. pumila, P. sibirica, P. sylvestris, Castanea sativa, Fagus sylvatica, Ficus carica, Quercus ilex, Quercus petraea, Quercus robur (Muller-Starck et al. 1992); Acacia mangium, A. melanoxylon (Moran 1992a); Alseis blackiana, Picea glauca, Robinia pseudoacacia, Pinus resinosa, P. torreyana, Populus balsamea, Acer saccharum (Hamrick et al. 1992); and Pinus banksiana, P. contorta, P. monticola, Picea mariana, P. glauca, P. sitchensis, Pseudotsuga menziesii and Thuja plicata (Neale & Williams 1991). While many of these species have demonstrated substantial variation, a few, e.g. Pinus torreyana, P. resinosa, Populus balsamea, Ficus carica, Thuja plicata and Acacia mangium, are characterized by low levels of isozyme variation. In most studies, intra-population isozyme variation has been large, and inter-population variation very low. Exceptions are species with regional distributions but small disjunct populations. Population levels of isozyme variation have not been found to bear any obvious relationship to environmental factors such as latitude, altitude and soil type in eucalypts (Moran 1992a). These studies have suggested that effective population sizes for tropical trees may be quite large, perhaps 25–50 hectares (Loveless 1992).

The results of many isozyme studies lead to the conclusion that interpopulation variation is of little significance, in comparison with intrapopulation variation, and therefore need not be considered in gene conservation and breeding programmes. Fundamental to this argument is the extent to which isozyme variation reflects:

overall genetic variation, and
variation with respect to particular qualitative and quantitative traits of interest.

Only 0.5% of the eukaryotic genome codes for all of the proteins in an organism. Only 20% of the variation in this 0.5% can be detected by electrophoresis, and isozyme techniques are thus able to detect only 0.1% of nucleotide substitutions in the total genome (El-Kassaby 1991). In practice, the proportion of the variation assessed is even smaller, as only a few of a plant's proteins are examined. Furthermore, the choice of enzymes has been shown to influence the diversity determined in some studies: peroxidases, esterases, phosphatases and aminopeptidases tend to be polymorphic, while aminotransferases, dehydrogenases and isomerases are typically less so (Muller-Starck et al. 1992).

It could be inferred then that estimates of isozyme variation may be imprecise, except where a large number of loci, say 100, are included (Yeh 1989), and furthermore may provide biased estimates of overall variation due to non-randomness of the genes sampled (Brown & Moran 1981, Yeh 1989). Estimates of overall genetic diversity determined from isozyme analyses should therefore be used with caution. Their use to predict variation with respect to a different, relatively small subset of genes - those underlying the selection traits for a breeding program - is logically even more subject to imprecision. Published results bear this out. Estimated population differentiation at isozyme loci has generally been much lower than that with respect to adaptive traits such as growth, survival and form for Pinus sylvestris (Savolainen & Karkkainen 1992), Pseudotsuga menziesii (Merkle & Adams 1987), and some other conifers (Yeh 1989). Morphological variation in domesticated crops such as barley and bean is also reported to be greater than isozyme diversity (Gepts 1989). As pointed out by Muona (1990), variability with respect to adaptive traits is likely to be more strongly influenced by natural selection than is variability with respect to markers.

In theory, DNA markers are preferable for the assessment of genetic variability, as they permit investigation of both coding and non-coding variation (Yeh 1989, Wagner 1992). RFLPs are likely to be better than RAPDs because multiple alleles at a locus can be detected (Neale, Devey et al. 1992). The recognition sites of some restriction endonucleases may not be distributed randomly, however, and RFLP data should be interpreted with caution (Landry & Michelmore 1987). RFLPs have been used to estimate diversity in maize (Hoisington 1992). For forest tree species, RFLPs have demonstrated interpopulation differentiation in Gliricidia sepium (Lavin et al. 1991), Pinus attenuata, P. muricata and P. radiata (Strauss, Hong & Hipkins 1992), and Quercus robur and Q. petraea (Kremer et al. 1991). Variation in Gliricidia sepium and G. maculata has also been partitioned using RAPD markers (Waugh & Powell 1992). Organellar RFLPs yielded a much higher estimate of among population diversity than isozymes for the Quercus and Pinus species. This is due to the conservative nature of organellar genomes, such that their sensitivity for detection of differentiation is larger with more distant relationships (Szmidt 1991).

These studies indicate that molecular markers cannot be expected to provide either identical or precise estimates of overall genetic variability, and are likely to be even less reliable for predicting variability with respect to traits of practical importance.

Genotype Verification and Delineation

The most widespread recent use of molecular markers has been for the identification of genotypes, for applications such as: taxonomic studies and studies of phylogenetic relationships, biological studies, and genetic fingerprinting.

Taxonomic and Phylogenetic Studies

Reliable taxonomic systems are fundamental to a plant breeder's testing and hybridization programmes, and the availability of molecular markers has provided the opportunity to align taxonomy more closely with basic genetic relationships. Molecular methods have largely replaced morphological approaches in phylogenetic studies.

Nuclear RFLPs have been used to distinguish Populus tremuloides and P. grandidentata, and chloroplast DNA variants have provided markers of hybridization in Picea and Pinus (Wagner 1992). Species-specific RAPD markers have been sought in Quercus robur and Q. petraea, with a view to detecting natural hybrids between the species (Moreau et al. 1992). Terpenes have been used to distinguish natural hybrids between Picea pungens and P. engelmannii (Hanover 1992). Species-specific RFLP and/or RAPD markers have been used to detect introgression and the presence of natural hybrids in Louisiana Iris species (Arnold et al. 1991), Gossypium barbadense and G. hirsutum (Paterson 1993), coastal (Picea sitchensis) and interior (P. glauca and P. engelmannii) spruces (Sutton et al. 1992), and Populus fremontii and P. angustifolia (Cooper 1992). In the Iris example, the hybrid origin of one of the putative species was demonstrated. Allozymes have been effective for defining zones of natural hybridization and introgression in Picea sitchensis and P. glauca, Abies balsamea and A. fraseri, Pinus brutia and P. halepensis, P. contorta and P. banksiana, and P. radiata and P. attenuata (Strauss, Bousquet et al. 1992).

Within species, allozyme data have been considered to be a useful component of information on the basis of which breeding zones are defined in Pinus lambertiana, P.ponderosa and Pseudotsuga menziesii (Westfall & Conkle 1992).

Biological Studies

The use of molecular markers has revolutionized studies of mating systems, pollen movement, seed dispersal and genetic processes. Results of such studies are of considerable practical significance in relation to population sampling, seed orchard design and management, controlled pollination methods, and clonal forestry programmes.

Mating system studies using isozymes have generally substantiated the expected high levels of outcrossing in forest tree species (Yeh 1989). Of 16 tropical tree species for which isozyme studies were carried out, 14 were shown to be largely outcrossing (Loveless 1992). Progeny of four Acacia species were nearly entirely the result of outcrossing (Moran 1992b). The arid zone species A.holosericea and A.cowleana, on the other hand, showed evidence of unusual breeding systems - perhaps apomixis and/or selfing (Moran 1992b). Isozyme studies with 12 eucalypt species showed that these have mixed mating systems with predominant outcrossing but a significant fraction of inbreeding (Moran 1992a).

Adams (1992) discusses statistical methods for measuring gene movement based on the use of large numbers of isozyme loci - these procedures include various forms of parentage analysis and the fitting of mating models to genotypic arrays of offspring from individual maternal plants. A knowledge of gene movement permits, for example, a more practical estimate of the minimum distance between candidate trees when sampling in natural populations - Yeh (1989) cites estimates of 65 m for Picea glauca, but less for P.mariana. Isozyme analyses revealed pollen movement over a three year study with Platypodium elegans to average 374 m, 420 m, and 369 m (Loveless 1992). Isozyme studies suggest that animal dispersal may be moving seeds over relatively large distances in tropical forests, thus reducing population genetic differentiation (Loveless 1992). RFLP chloroplast analysis suggested introgression in Louisiana Irises has resulted largely from a pollen rather than a seed mediated flow (Arnold et al. 1991).

The assessment of pollen movement is of major significance in relation to the management of seed orchards, e.g. for estimation of pollen contamination, mating patterns among clones, inbreeding, and the success of supplemental mass pollination. Using isozymes, the genotype of the pollen gamete is inferred from the genotypes of the seed megagametophyte and embryo and then compared to male genotypes within and outside the orchard to infer parentage (Neale, Devey et al. 1992). Approaches to the detection of contamination have been discussed in more detail by Wheeler and Jech (1992). The simplest is the unique marker method, dependent on the presence of a unique marker, usually a single locus, in populations outside the orchard but not in the orchard itself. In situations where gene frequencies in orchards and surrounding stands are quite distinct, multilocus procedures based on gene frequency differences among orchard parents and orchard seed pools can be used to estimate immigration of contaminant pollen. Paternity exclusion techniques are also very effective: potential pollen gamete genotypes that can be produced by orchard parents are determined on the basis of clonal genotypes. These are checked against the paternal genotypes of outcrossed seed from the orchard crop, and seed which could not have been sired by orchard pollen is listed as unambiguous contaminants. The power of genetic discrimination among putative males is a function of the number of genetic markers, the amount of allelic variability at individual loci, the frequencies of alleles in the population, and the overall number of males in the population (Neale, Devey et al. 1992). Complete genetic discrimination among males using isozymes is never possible in operational seed orchard situations, and statistical inference must be used. 
DNA markers could potentially allow enhanced genetic discrimination, but neither RFLPs nor RAPDs are well suited for inferring the genotype of the male gamete by examination of the embryo - RAPDs because of dominance, and RFLPs because of the requirement for large quantities of DNA. A codominant PCR based marker would be extremely useful for this purpose (Neale, Devey et al. 1992). Statistically adjusted pollen contamination in a Scots pine seed orchard, estimated on the basis of isozyme data, was over 50% (Wang et al. 1991); estimates for Douglas fir range from 21% to 89% (Wheeler & Jech 1992), and serious erosion of genetic gain through contamination in wind-pollinated seed orchards may be widespread (Muona 1990, Ellstrand 1992). The use of markers, as described above, can at least allow appropriate monitoring.
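
The paternity exclusion logic described above can be sketched as follows, assuming codominant markers, a haploid megagametophyte carrying the maternal contribution, and invented clone and seed genotypes. All names (pollen_allele, compatible_clones, the clone and locus labels) are hypothetical.

```python
def pollen_allele(embryo, megagametophyte):
    """Infer the paternal allele at one locus.

    embryo is a pair of alleles; megagametophyte is the single maternal allele.
    Removing the maternal allele from the embryo genotype leaves the pollen allele.
    """
    alleles = list(embryo)
    alleles.remove(megagametophyte)  # raises ValueError if the data are inconsistent
    return alleles[0]


def pollen_haplotype(embryo_gt, mega_gt):
    """Multilocus genotype of the pollen gamete that sired one seed."""
    return {locus: pollen_allele(embryo_gt[locus], mega_gt[locus])
            for locus in embryo_gt}


def compatible_clones(haplotype, clones):
    """Orchard clones able to produce the given pollen gamete."""
    return [name for name, gt in clones.items()
            if all(haplotype[locus] in gt[locus] for locus in haplotype)]


# Hypothetical orchard clones scored at three codominant loci.
clones = {
    "clone_1": {"A": (1, 1), "B": (1, 2), "C": (2, 2)},
    "clone_2": {"A": (1, 2), "B": (2, 2), "C": (1, 2)},
}

seed_1 = pollen_haplotype({"A": (1, 2), "B": (2, 2), "C": (2, 1)},
                          {"A": 1, "B": 2, "C": 2})
seed_2 = pollen_haplotype({"A": (1, 3), "B": (1, 1), "C": (2, 2)},
                          {"A": 1, "B": 1, "C": 2})

print(compatible_clones(seed_1, clones))  # possible orchard fathers
print(compatible_clones(seed_2, clones))  # empty list: unambiguous contaminant
```

A seed with no compatible orchard clone is an unambiguous contaminant. A seed with one or more compatible clones may still have been sired by outside pollen of the same gametic type, so a count of unambiguous contaminants is only a minimum.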

Use of molecular markers has been proposed also for the assessment of gene flow in the reverse direction - “genetic pollution” of wild Populus populations by commercial hybrid clones (Cooper 1992).

Contaminants in controlled crosses of Betula alleghaniensis were detected using RAPDs (Roy et al. 1992). Isozymes have been used to show that 30% or more of Douglas fir and loblolly pine controlled crosses were not correct (Neale, Devey et al. 1992). These determinations were possible with as few as 6–10 isozyme loci and five seeds per cross, and it has been suggested therefore that the greater power of DNA markers would offer no advantage for these species. RFLPs would be useful for species with low isozyme variation, although it would be necessary to raise progeny to a size sufficient to yield the quantities of DNA required for analysis (Neale, Devey et al. 1992). The effectiveness of supplemental mass pollination in a loblolly pine orchard has been assessed using isozymes (Wheeler & Jech 1992).

Molecular markers have been used to assess the extent of inbreeding and preferential mating systems in seed orchards. Isozyme work reviewed by Yeh (1989) showed that Pseudotsuga menziesii families in a seedling seed orchard varied greatly in outcrossing rate. Selfing rates of 16% were reported in a Pinus sylvestris orchard, while isozyme studies in a clonal Picea glauca orchard showed that as few as two of the 33 clones contributed about 50% of the pollen in two consecutive years (Yeh 1989). Mating patterns in a Scots pine seed orchard were examined using two isozyme loci (Gregorius 1991).

Molecular markers have been used in many studies of fundamental genetic processes with forest tree species. Linkage disequilibrium and its causes have been studied with RFLPs in several genetic systems (Landry & Michelmore 1987). RAPD analyses in a hybrid between Pinus elliottii and P. caribaea revealed segregation distortion believed due to selection against incompatible gene combinations which are lethal to megagametophyte development (Dale et al. 1992, Dale, Teasdale et al. 1993). In this same taxon, RAPD analysis has been used to quantify recombination rate, which was determined to be quite low, with chiasmata occurring at a rate of one per chromosome per generation (Dale, Gates et al. 1993). As pointed out in this article, a low rate of recombination per generation has significant implications for the use of RAPD maps for marker assisted selection. RFLP analysis in the chloroplast DNA of six interpopulation crosses of Eucalyptus nitens showed the chloroplast genome to be maternally inherited (Byrne & Moran 1992). RAPDs have been used to demonstrate the absence of somaclonal variation in plants regenerated from hypocotyl sections of E. grandis (Haque et al. 1992).

Genetic Fingerprinting

The problem of verification of identity of seedlot (provenance, orchard batch, family lot) or clone differs from the identification problem discussed above in that exclusive identification, not just delineation from one or a few known alternatives, is frequently desired.

Specialized biochemical markers such as seed storage proteins (Gepts 1989) and terpenes (Hanover 1992) have occasionally been used in cultivar identification. Isozyme phenotypes have helped distinguish cultivars of many crops, but have some general limitations, including poor application to crops with low levels of isozyme variation (e.g. tomato, pepper, chickpea, cucumber), and limited ability to detect mutants involving only minor genetic changes (Weeden 1989). For forest tree species, analytical procedures have been used to discriminate among provenances and seed orchards, based on gene frequency data, but discriminatory powers with these isozyme methods are generally inadequate for this purpose (Wheeler & Jech 1992). Allozymes were found to be unsuitable for the certification of Douglas fir seed origin (Merkle & Adams 1987). In summary, the power to genetically discriminate between individuals or groups is a function of the number of markers and the amount of genetic variability they display. The number of available allozyme loci is frequently limiting (Neale, Devey et al. 1992).

The use of DNA markers overcomes the above limitation. RFLPs have been used for fingerprinting, and calculations reported by Landry & Michelmore (1987) indicate their effectiveness. For an inbreeding crop with only two alleles at each RFLP locus, each allele occurring with a frequency of 0.5 among 20 inbred lines, the probability of distinguishing each of the 20 lines is 0.99 if 20 probes are used. Observations of frequent RFLPs in maize suggest that even fewer polymorphic probes would be necessary in practice. For highly heterozygous, asexually propagated plants, only a few markers would be sufficient (Landry & Michelmore 1987). Combined RFLP banding patterns of six probes could be used to individually identify each of 39 cultivars of peach examined (Ballard et al. 1992).
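
The way discriminating power rises with the number of probes can be sketched with a simple model. It assumes fully homozygous lines, independent loci and equally frequent alleles, so that every multilocus genotype is equally likely. This may not be the exact calculation made by Landry & Michelmore (1987), and the function name prob_all_distinct is hypothetical.

```python
def prob_all_distinct(n_lines, n_loci, n_alleles=2):
    """Probability that every line has a different multilocus genotype.

    Assumes homozygous lines, independent loci and equally frequent alleles,
    so each of the n_alleles ** n_loci genotypes is equally likely.
    """
    n_genotypes = n_alleles ** n_loci
    p = 1.0
    for i in range(n_lines):
        p *= 1.0 - i / n_genotypes
    return p


for n_loci in (5, 10, 15, 20):
    print(n_loci, "loci:", round(prob_all_distinct(20, n_loci), 4))
```

Under these assumptions the probability of distinguishing all 20 lines exceeds 0.99 with 20 loci, in line with the figure quoted above, but falls away quickly as the number of loci is reduced.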

Work involving the use of RFLPs of highly repeated sequences, possibly nuclear in origin, to differentiate half-sib cell lines of Liriodendron tulipifera was reviewed by Wagner (1992), who concluded also that DNA fingerprinting is now possible in many woody taxa. Chloroplast RFLPs unambiguously distinguished many individuals in populations of P.banksiana and P.contorta (Govindaraju et al. 1989), but may be generally insensitive for the detection of differentiation at the intraspecific level due to the conservative nature of the chloroplast genome (Szmidt 1991). RAPDs have been successfully used to fingerprint individual accessions of Theobroma cacao, using only a small selection of primers (Waugh & Powell 1992). RAPDs are particularly advantageous in this species due to difficulties in extracting sufficient DNA for an RFLP analysis. RAPD markers have also been used for fingerprinting of individuals within full-sib families of Picea glauca (Hong et al. 1992). The dominant nature of RAPD markers, however, will place some limitations on their use for genetic fingerprinting.

Where sufficient DNA can be extracted, and where sufficient polymorphism is displayed, RFLPs are thus preferred to isozymes and RAPDs for genetic fingerprinting work. Microsatellites, however, codominant and capable of PCR amplification, will be superior (Neale, Devey et al. 1992).

Marker Assisted Selection

Marker-assisted selection is based on the principle of genetic linkage - that recombination occurs infrequently between loci which are very close together on the chromosome. Selection is made on the basis of one or more easily or reliably assessed markers which are tightly linked to a character which is of practical importance but not as easily assessed. Although the principle is far from recent, commercial application has been uncommon. One reason is that it is rare to find morphological markers (e.g. an unusual leaf shape) which are tightly linked to traits of interest. Another reason is that the morphological markers themselves sometimes have a negative effect on phenotype. The availability of molecular markers, more numerous, generally without effect on phenotype, and with a heritability of one, has added a new dimension to the possibilities of marker-assisted selection. In application, markers linked to the trait of interest are established for the particular genetic material used, and then the presence of the marker allele is used to screen for the trait in the segregating F₂ or backcrossed populations. Screening can thus be done without field testing, without the requirement to inoculate plants with a pathogen in the case of a disease resistance trait, and, potentially, for resistance to a range of pathogens on the same plants. This offers the possibility of cheaper and more rapid screening. It has been estimated, for example, that reconstruction of the recurrent parent genotype in a backcrossing program could be accomplished in only three generations, contrasting with at least six in traditional programmes (Tanksley et al. 1989). Marker assisted backcrossing should be particularly effective for “pyramiding” several dominant genes controlling a “quasiquantitative” trait (e.g. some types of disease resistance) into a single line (Stuber 1989). 
Markers offer a particular advantage in disease resistance programmes in which inoculations are difficult to control, or where a pathogen is not available due to import restrictions (Vallejos 1992).
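
For context on the backcrossing estimate quoted above, the following sketch gives the standard expectation for recovery of the recurrent parent genome when no selection is applied for the genetic background. It is a textbook approximation that ignores linkage drag around the target gene, and it does not reproduce the calculation of Tanksley et al. (1989). The function name is hypothetical.

```python
def recurrent_parent_fraction(n_backcrosses):
    """Expected fraction of recurrent parent genome after n backcrosses.

    The F1 carries one half; each backcross halves the remaining donor
    fraction. No selection on the genetic background is assumed.
    """
    return 1.0 - 0.5 ** (n_backcrosses + 1)


for n in range(1, 7):
    print(f"BC{n}: {100 * recurrent_parent_fraction(n):.2f}%")
```

Without background selection, about six backcross generations are needed before the expected recovery passes 99%. Choosing, in each generation, the individuals whose marker genotypes most resemble the recurrent parent is how markers could shorten the process.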

Essential to the application of marker-assisted selection is the identification of markers tightly linked to the gene of interest. Near-isogenic lines (lines genetically almost identical except for the target gene) have been used to identify such markers. In tomato, for example, 7 of 144 RAPD primers screened yielded amplification products which appeared in one line but not the other. Subsequent segregation analysis confirmed linkage of some of these to the bacterial resistance gene for which the lines differed (Martin et al. 1991). An alternative approach is bulked segregant analysis (Michelmore et al. 1992). This comprises the preparation of two bulked DNA samples (differing for the trait of interest) from a segregating population. Each bulk is made up of individuals which are identical for a particular trait but arbitrary at all unlinked regions of the genome. This has been used to identify RAPD markers linked to disease resistance genes in lettuce. In peach, ten RAPD markers linked to genes at the red leaf locus, and three to a malate dehydrogenase locus, have been identified using the bulked segregant approach, confirmed by cosegregation analyses (Chaparro et al. 1992). Attractions of the bulked segregant approach are the rapid construction of bulks, and the non-identification of markers in unselected regions (Michelmore et al. 1992).

Tight linkage, less than 5 cM (centiMorgans: units of recombination distance on the genome), is required for tagging or manipulating desired loci with single markers (Stuber 1989). The level of recombination with looser linkage is so great that the benefits of using the marker are largely negated. Tagging with flanking markers (one on each side of the target locus), permitting more confident gene manipulation through simultaneous selection for both markers, offers the possibility of using more widely spaced markers - say at 20 cM. It has been calculated that, for a genome size of 1 500 cM, 286 random RFLP markers would be necessary to cover the genome every 20 cM (Landry & Michelmore 1987). Saturation levels approaching this have been attained in crops such as maize and tomato (Stuber 1989). Segregation analysis and construction of a linkage map are stepwise processes which can be facilitated using computer programs. Recombination frequencies for all pairwise comparisons between loci are first estimated using the maximum likelihood method. Map units (cM) are then calculated from these frequencies using a mapping function, and the linkage map is deduced by the best fit to these values (Landry & Michelmore 1987).
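The conversion of recombination frequencies to map units, and the advantage of flanking markers, can be sketched as follows. The source does not say which mapping function was used, so the two common choices (Haldane and Kosambi) are shown. The function names are hypothetical, and the flanking-marker calculation assumes no crossover interference and a target locus whose position between the markers is known.

```python
import math


def haldane_cm(r: float) -> float:
    """Map distance (cM) from recombination fraction, Haldane (no interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Map distance (cM) from recombination fraction, Kosambi (some interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def haldane_r(d_cm: float) -> float:
    """Inverse Haldane function: recombination fraction from distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def single_marker_error(d_cm: float) -> float:
    """Chance per meiosis that a gamete carrying the marker allele lacks the target allele."""
    return haldane_r(d_cm)


def flanking_marker_error(d_left_cm: float, d_right_cm: float) -> float:
    """Chance that a gamete carrying both flanking marker alleles lacks the target allele.

    This requires a crossover on each side of the target (assumed independent).
    """
    r1, r2 = haldane_r(d_left_cm), haldane_r(d_right_cm)
    double = r1 * r2
    return double / ((1.0 - r1) * (1.0 - r2) + double)


# One marker 5 cM from the target versus two markers 20 cM apart
# with the target midway between them.
print(f"single marker at 5 cM:      {single_marker_error(5.0):.4f}")
print(f"flanking markers, 10+10 cM: {flanking_marker_error(10.0, 10.0):.4f}")
```

Under these assumptions, selecting on both flanking markers misclassifies fewer gametes than a single marker at 5 cM, which is the reasoning behind the wider spacing tolerated for flanking markers.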

Linkages have been established between seed storage protein loci and powdery mildew resistance in barley, and wrinkledness in pea (Gepts 1989). Weeden (1989) presented a list of reported linkages between isozyme loci and commercially important traits in crop plants. Included are nematode resistance, male sterility and self-incompatibility in tomato; resistance to bean yellow mosaic virus, to Fusarium, and to pea enation mosaic virus in pea; and self-incompatibility in Nicotiana and rye. These markers, however, are not sufficiently numerous to provide many useful linkages, and the opportunity for saturated genome mapping (markers spaced at very frequent intervals throughout the genome) appeared only with the advent of RFLPs. RFLP maps have been constructed for many crop species in recent years, including rice, maize, sorghum, wheat, tomato, potato, brassica, lettuce and soybean (Young 1992). Mapping projects in species of particular interest to the tropics and sub-tropics are also underway. These include plantain, common bean, groundnut, pigeonpea, mung bean, and cowpea (Young 1992).

Many important resistance genes have now been mapped in relation to tightly linked RFLPs, e.g. genes that provide resistance to downy mildew (Bremia lactucae) of lettuce; maize dwarf mosaic virus of maize; powdery mildew (Erysiphe graminis) of barley; Phytophthora rot of soybean; leaf blast (Magnaporthe grisea) of rice; powdery mildew (Erysiphe polygoni) of mung bean; tomato mosaic virus, Fusarium oxysporum, root-knot nematode (Meloidogyne incognita), grey leaf spot (Stemphylium), and bacterial blight (Pseudomonas syringae) of tomato; and potato virus X of potato (Tanksley et al. 1989, Young 1992, Harms 1992).

The linkage between the Aps-1 isozyme locus and nematode resistance has been widely utilized in tomato breeding programmes in North America and Europe - the linkage is very tight and direct screening for nematode resistance using traditional methods is difficult (Weeden 1989, Stuber 1989). Work is now underway to map additional tomato resistance genes and to use RFLPs to assist in incorporating them all into a single “super” tomato variety (Young 1992).

Many characters of agronomic importance are controlled by genes at several unlinked loci (Quantitative Trait Loci, or QTLs). The basic idea of mapping QTLs through cosegregation analysis has been available since 1923 (Lander & Botstein 1989). Statistical methods based on the normal distribution and three point mapping can locate genome regions contributing to a quantitative trait. In this way, morphological markers (seed colour, leaf morphology) have been used for limited mapping of QTLs. Isozyme polymorphisms have also been used to map some QTLs in tomato (leaf ratio, stigma exertion, fruit weight, and seed weight) and maize (ear number and grain yield) (Landry & Michelmore 1987). The number of morphological and isozyme markers that can segregate in a single cross, however, is too low to obtain the close linkage required to identify the majority of loci contributing to a quantitative trait. The availability of RFLPs has provided the power required for these analyses. RFLPs throughout the genome are individually tested for the likelihood that they are linked to a QTL. For each RFLP marker, the population is split into groups according to the RFLP phenotype at that locus (Young 1992). The maximum likelihood of the data under the hypothesis of a linked QTL is compared to that obtained under the assumption that no QTL is linked, the comparison being expressed as a LOD score (the base-10 logarithm of the likelihood ratio). If the LOD score exceeds a predetermined threshold, a QTL is declared to be present (Lander & Botstein 1989). A QTL likelihood map, displaying variation in LOD score throughout the genome, is derived in this way.
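The marker-by-marker test described above can be sketched in a simplified form. The example below is not the interval mapping procedure of Lander & Botstein (1989); it is a single-marker approximation that assumes normally distributed trait values with a common variance, for which the LOD score reduces to (n/2) log10(RSS_null / RSS_qtl). The function name, the trait data and the threshold of 3.0 are all hypothetical.

```python
import math


def single_marker_lod(groups: dict) -> float:
    """LOD score for a trait difference among marker genotype classes.

    groups maps a marker genotype label to a list of trait values.
    Null model: one common mean. QTL model: one mean per genotype class.
    Both models assume normal errors with a common variance.
    """
    values = [v for vals in groups.values() for v in vals]
    n = len(values)
    grand_mean = sum(values) / n
    rss_null = sum((v - grand_mean) ** 2 for v in values)

    rss_qtl = 0.0
    for vals in groups.values():
        if not vals:
            continue  # skip empty genotype classes
        class_mean = sum(vals) / len(vals)
        rss_qtl += sum((v - class_mean) ** 2 for v in vals)
    if rss_qtl == 0.0:
        raise ValueError("no within-class variation; LOD is undefined")

    return (n / 2.0) * math.log10(rss_null / rss_qtl)


# Hypothetical fruit-weight data for the three genotype classes at one RFLP locus.
fruit_weight = {
    "AA": [52.1, 49.8, 51.5, 50.7, 53.0],
    "Aa": [55.2, 56.9, 54.4, 57.1, 55.8],
    "aa": [60.3, 61.7, 59.5, 62.0, 60.9],
}
LOD_THRESHOLD = 3.0  # illustrative value only; thresholds depend on genome size and marker density

lod = single_marker_lod(fruit_weight)
print(f"LOD = {lod:.2f}, QTL declared: {lod > LOD_THRESHOLD}")
```

Repeating the calculation at every marker and plotting the result against map position gives a crude version of the QTL likelihood map.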

Putative QTLs underlying several agronomic traits have recently been identified using RFLPs. Examples (mostly from the review by Young 1992) are:

Six QTLs controlling 58% of the variation in tomato fruit size.
Four QTLs controlling 44% of the variation in soluble solids in tomato fruit.
Five QTLs controlling 48% of the variation in pH in tomato.
Three major QTLs controlling water use efficiency in tomato.
At least three QTLs controlling the principal toxic factor for insect resistance in tomato (Landry & Michelmore 1987).
Several QTLs contributing to a hypersensitive response to Xanthomonas campestris in tomato (Vallejos 1992).
Six QTLs controlling 53% of variation in cellular membrane stability (related to heat tolerance) in maize.
Seven to ten QTLs involved in resistance to Southwestern corn borer in maize (Hoisington 1992).
Five QTLs associated with hard seededness in soybean, accounting for 71% of the variation.
A single QTL accounting for 35% of the variation in seed size and 37% of the variation in pod length in cowpea.
A QTL in the homologous region of the mung bean genome accounting for 46% of the variation in seed size and 28% of the variation in pod length.

Some of the tomato QTLs have been shown to have an effect in all environments, while others are significant in only one. Other research in progress is directed at the identification of QTLs contributing to drought tolerance in maize (Hoisington 1992) and sorghum (Mullet et al. 1992).

Traits of major importance in forestry are mainly quantitative, but work directed at the identification of QTLs in forest tree species has not progressed far (O'Malley 1992). Particular hindrances are the large size of the genome, scarcity of multigenerational pedigrees, and long generation times (Tulsieram et al. 1992). Nevertheless, many mapping projects are in progress - in conifers stimulated partly by the ready availability of haploid megagametophyte tissue. Partial linkage maps including isozymes have been developed for Picea glauca, P. nigra, Pinus rigida and P. albicaulis (Weeden 1989). Maps involving up to 300 RFLP markers are under construction for Pinus taeda (Neale et al. 1989, Devey et al. 1992, O'Malley et al. 1992). One of these studies is directed at the mapping of QTLs for specific gravity, using a three generation pedigree with grandparents selected as being phenotypically extreme for the trait (Neale et al. 1992). At the time of reporting, 91 RFLP loci had been mapped. A similar QTL mapping project is also underway for Douglas fir, involving another three generation pedigree, with grandparents selected to be extreme for early and late flushing (Neale et al. 1992). An RFLP map is also being prepared for Eucalyptus nitens (Byrne et al. 1992).

Many mapping projects are in progress using RAPDs, particularly for conifers, where the use of haploid tissue circumvents the dominance problem. Sixty-one RAPD markers have been mapped for a single Picea glauca tree (Tulsieram et al. 1992). For the hybrid between Pinus elliottii and P. caribaea, a genetic map with an average resolution of 10 cM was constructed in just five weeks (Dale, Gates et al. 1993). In this taxon, RAPD markers linked to six putative genes controlling bark thickness have been identified (Dale and Teasdale 1993). Wilcox et al. (1992) described a method for mapping QTLs in individual conifer clones, without the requirement for multigenerational crosses, using half-sib progenies. Progeny are scored for traits of interest, and then separated into two groups according to the maternal allele each has received at a RAPD marker locus. A t-test is then used to compare the two groups to find associations between traits and markers. It has been calculated that family sizes of 1000 will be adequate to detect QTLs. The method is being used to examine cyclic height growth in loblolly pine.
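The half-sib comparison described by Wilcox et al. (1992) can be sketched as a two-sample t-test on progeny grouped by the maternal RAPD allele (band present or absent). The source does not specify the form of the test, so a pooled-variance Student's t is assumed here. The function names and the height data are hypothetical, and a real analysis would use family sizes far larger than this toy example.

```python
import math
from statistics import mean, variance


def two_sample_t(a, b):
    """Pooled-variance Student's t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


def marker_trait_t(progeny):
    """Split progeny by maternal marker allele and compare trait means.

    progeny is a list of (band_present, trait_value) pairs, where band_present
    records which maternal RAPD allele the seedling inherited.
    """
    with_band = [trait for band, trait in progeny if band]
    without_band = [trait for band, trait in progeny if not band]
    return two_sample_t(with_band, without_band)


# Hypothetical half-sib family: (maternal RAPD band present?, height in m).
family = [
    (True, 4.1), (True, 4.4), (True, 3.9), (True, 4.6), (True, 4.3),
    (False, 3.6), (False, 3.8), (False, 3.5), (False, 3.9), (False, 3.4),
]
t_stat, dof = marker_trait_t(family)
print(f"t = {t_stat:.2f} on {dof} degrees of freedom")
```

The t statistic would then be compared with the critical value for the chosen significance level, bearing in mind that testing many markers calls for a stricter threshold than a single test.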

Several factors associated with the linkage relationships between marker loci and adjacent QTLs influence the precision with which they can be identified and mapped (Stuber 1989). These include:

The number of segregating marker loci available in the population (more giving greater precision).
The distribution of marker loci (an even distribution being preferable).
The level of linkage disequilibrium between marker loci and QTLs (some disequilibrium being essential).

To summarize comparatively the three major types of molecular marker applied to marker-assisted selection: isozymes are fast, cheap, codominant and multiallelic, and in most respects the perfect marker, but will have limited application due to their limited number and relatively low level of polymorphism. RFLPs are multiallelic, codominant, and virtually unlimited in number, but are expensive and slow to analyse. RAPDs are numerous and fast (although expensive), but dominant, and thus require special strategies for their use.

APPLICATION TO FOREST TREE IMPROVEMENT

Assessment of Genetic Diversity

Gene Conservation

Patterns of variation in most of the major industrial species are reasonably well defined and, although some provenances may be under threat locally, the major gene pools of the species are conserved in in situ and ex situ stands. No major application of the use of markers to gene conservation activities is thus evident for this group. A partial exception may be species with a long history of vegetative propagation, e.g. poplars, where demarcation of natural seedling stands and planted clonal material is unclear. In these cases, the use of markers could aid in the identification of previously unrecognized centres of diversity.

Natural stands of many tropical hardwoods and non-industrial species are under serious threat. Conservation programmes are expensive, and funding sparse, and methods to maximize efficiency are urgently required (Strauss, Bousquet et al. 1992). For these species, markers could assist in defining centres of diversity and establishing conservation priorities. However, they must be used with great caution due to the limitations discussed above.

Assembly of Breeding Populations

This is of most relevance to species where breeding programmes are in the early stages of development, i.e. mainly species within the tropical hardwood and non-industrial groups. Limited resources frequently dictate that only a proportion of provenances can be field tested properly, and the use of any method which might assist in directing sampling is very appealing. Marker diversity, however, generally will not be an accurate guide to diversity with respect to one or two economic traits, and therefore must be used very conservatively, e.g. to exclude populations which might have very low variability due to a long period of traditional use.

Genotype Verification and Delineation

In recent years, there has been a massive resurgence of interest in population dynamics and gene flow. This has occurred largely because of the availability of molecular markers.

Taxonomy and Phylogenetic Studies

Sound knowledge of the taxonomic boundaries of a species, and of its relationships to other taxa, is fundamental to tree improvement. In addition, accurate taxonomy will be required to define species for legal purposes associated with conservation programmes. The taxonomic relationships of most established plantation species have been reasonably well defined by traditional means, although markers are now being used for refinement. It is in the tropical hardwood and non-industrial groups, however, that many taxonomic relationships remain poorly defined, and molecular markers will be a valuable supplement to morphological approaches. Both isozyme and DNA markers will be useful.

Biological Studies (Pollen Contamination, Mating Systems etc)

As discussed above, evidence suggests that pollen contamination in seed orchards and crossing programmes is widespread, and is likely to be substantially eroding the genetic gains expected from breeding programmes based on open pollination (most species) or even controlled pollination. Similarly, inbreeding or distorted mating patterns could be resulting in loss of genetic gain in open pollinated orchards. A need exists, therefore, for monitoring procedures as a routine part of tree improvement programmes with industrial species. For many tropical hardwoods and non-industrial species, species biology, in particular mating systems, is poorly known. Molecular markers are a very powerful tool for these studies. Isozymes are suitable for most of these purposes. Studies aimed at understanding gene expression and the nature of phenomena such as heterosis, perhaps the most important area of strategic research related to tree improvement, have been revolutionized by the availability of DNA markers.

Genetic Fingerprinting

Misidentification of clones in seed orchard establishment or crossing programmes is probably very common. Incorrectly identified ramets in seed orchards have been estimated to comprise 2–13% in Douglas fir, 7–10% in Scots pine, and 10% in loblolly pine (Wheeler and Jech 1992), no doubt resulting in some loss of genetic gain. The economic implications are likely to be even greater if these errors occur in clonal forestry programmes. Molecular markers will therefore have an important role in quality control in advanced breeding programmes. Although isozymes will be satisfactory for some verification problems, DNA markers are much more powerful for genotype identification, and will be necessary where identification is to be more or less exclusive. Such identification would be required for Plant Variety Rights (PVR) registration. It seems unlikely though that PVR would be of widespread importance in forestry - registration of all of the clones of interest to a program would be expensive, and Genotype × Environment interaction (G×E) is likely to limit the extent to which clones could confidently be transferred from one program to another without prior testing.

Marker Assisted Selection

The concept of marker-assisted selection is particularly appealing in forest tree breeding programmes due to the nature of the traits of common interest. Frequently, these traits:

are of low heritability
are difficult and costly to assess using traditional methods, e.g. height, wood quality, disease resistance
involve a delay of many years before assessment can be made

The last of these is particularly significant. Early selection, long the hope of most tree breeders, could be utilized in two ways:

Early selection of clones to be propagated for large-scale use, avoiding the delay and expense involved in clonal testing.
Reduction of the time to selection in the breeding population, leading to a reduction in the generation interval, provided that early flowering can also be achieved. Long generation intervals are the main constraint to rapid genetic improvement in most forest tree species. An interesting example is that of white birch (Betula), which flowers at a very early age (one or two years), but for which selections for wood quality cannot be made until age 20 (Lapinjoki et al. 1992).

Rapid progress and positive results with marker-assisted early selection in crop species have led to hopes that the “holy grail” of forest tree breeders may be close at hand. There are a number of factors, however, which act to severely limit the potential of marker-assisted selection in forest tree breeding (Neale & Williams 1991, Strauss et al. 1992):

Applications and promising results with agricultural crop species almost exclusively concern simply inherited traits such as disease resistance, and these are the traits of greatest importance to breeders of most agricultural crops. Although there are a few simple genes of potential commercial importance in forest tree species, e.g. alleles conferring resistance to Cronartium ribicola in Pinus lambertiana, and to C. quercuum in P. elliottii (Nance and Nelson 1989), most traits are polygenic. As indicated above, work with QTLs is in its infancy, particularly for forest tree species. Associations between markers and QTLs will be very difficult to establish where a trait is under the control of a large number of genes. There is growing evidence, however, that for many quantitative traits, the major part of the genetic variation can be explained by a handful of the total number of QTLs.
Substantial Genotype × Environment interactions, common in forest tree species, would necessitate the development of different sets of markers for different sites. Of 29 QTLs identified in tomato hybrids grown in three environments, for example, only four were common to all three, and 15 were unique to single environments (Strauss et al. 1992).
Marker-assisted selection is dependent on the ability to define, in the breeding material, a consistent linkage between marker alleles and alleles of economic interest. In other words, linkage disequilibrium in the breeding line is essential. Linkage disequilibrium is predictable in crop breeding programmes based on genetically narrow or discrete lines, e.g. the backcrossing programmes in tomato and other crops, or the inbreeding and hybridization programmes in maize. The traditional large outcrossed breeding populations used with forest tree species, however, will tend to be in linkage equilibrium (Hastings 1989, Neale & Williams 1991, Strauss et al. 1992). Relationships between marker and QTL alleles will thus vary among families. Linkage disequilibrium will be present, however, in interspecific hybrid breeding programmes, and the potential use of RAPD markers for selecting for species-specific characteristics such as bark thickness in F₂ populations of the Queensland Pinus elliottii × P. caribaea hybrid breeding program has been discussed (Dale, Gates et al. 1993, Dale and Teasdale 1993). For pure species, linkage disequilibrium can be maintained and utilized by dividing the breeding population into small sublines (four to six founding clones per subline), and for each subline, establishing marker/QTL associations which could be used for several generations of breeding (O'Malley 1992). The strategy for the third cycle of the North Carolina Co-operative P. taeda breeding program is based on four-tree sublines (McKeand and Bridgwater 1992). Poplar breeding programmes, based on clonal testing within progeny of crosses (often quite wide) involving known commercial clones, may also involve populations in linkage disequilibrium.
Current costs of RFLP and RAPD analyses are very high, such that application to selection in populations of the size generally used in tree improvement programmes would be impractical. Furthermore, costs of establishing the facilities required to conduct such analyses are high. It has been estimated, for example, that the cost of setting up a RFLP laboratory to work in conjunction with an operational crop breeding program would roughly equal the cost of setting up another breeding station (Strauss et al. 1992).
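The linkage disequilibrium argument in the list above can be illustrated with the standard population-genetic result that, under random mating, disequilibrium between two loci is reduced by a factor (1 - r) each generation, where r is the recombination fraction between them. The sketch below is a simplification offered only to show why marker/QTL associations persist for a few generations when linkage is tight and are expected to be weak in long-established outcrossed populations. It assumes a large random-mating population and ignores drift and selection, both of which matter in small sublines. The function name is hypothetical.

```python
def ld_remaining(r: float, generations: int) -> float:
    """Fraction of the initial linkage disequilibrium left after random mating.

    Uses D_t = D_0 * (1 - r) ** t for two loci with recombination fraction r.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - r) ** generations


# Tight linkage (r = 0.05) versus unlinked loci (r = 0.5).
for t in (1, 3, 10, 50):
    print(f"generation {t:2d}: tight {ld_remaining(0.05, t):.3f}, "
          f"unlinked {ld_remaining(0.5, t):.3f}")
```

With tight linkage most of the association survives several generations of breeding, whereas over the many generations separating trees in a natural outcrossed population it largely disappears except for very closely linked loci.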

In general, widespread application of marker-assisted selection in forestry is going to require the availability of cheaper markers. Application to reduction of the generation interval in the breeding population will furthermore require adoption of program structures radically different from those commonly used at present (and probably considerably more expensive to maintain), with the exception of interspecific hybrids, and perhaps also the poplar hybridisation and clonal selection programmes. For industrial species with relatively small planting programmes, and for most non-industrial species, the sophistication required in breeding may be difficult to justify economically.

As noted above, early selection would be of value also in the identification of clones for commercial use. Marker-assisted selection could conceivably be used for this purpose without restructuring of the breeding population: marker-QTL associations would be established for particular families, and these relationships used to select the desired genotypes within family. Apart from the cost question, this application is dependent on the existence of an operational clonal forestry system, and a breeding program which is generating genotypes worthy of selection. Once again, this implies a level of sophistication and advancement evident in only a very small proportion of tree improvement programmes. Nevertheless, it should be pointed out that there are a few programmes which are at or near this level of advancement, and for which marker-assisted selection can be contemplated in the medium term.

CONCLUSIONS

Markers have important immediate applications in supportive research for advanced breeding programmes with industrial species - mainly in relation to quality control, e.g. checking of clonal identification, orchard contamination and within-orchard mating patterns. Isozymes will be satisfactory for many, but not all, of these purposes.

Markers also have important immediate application in supportive research for tropical hardwoods and non-industrial species, in particular for essential studies of mating systems. Isozymes will be satisfactory for this. Markers will also be useful for the quantification of genetic variation, although they must be used conservatively.

Realistically, application of marker-assisted selection in the short to medium term is likely to be very limited. Cheaper markers would be required and, even if these were available, the technology would apply only to advanced and sophisticated breeding programmes - those in which creation and maintenance of the appropriate population structures could be afforded, and where clonal forestry is achievable. A small number of programmes fall into this category and, for these, some effort aimed at developing marker-assisted selection can be justified. For most species, current resources would be better directed towards moving breeding programmes to this stage of advancement, rather than to the development of marker-assisted selection. Marker-assisted selection is unlikely to have much application to non-industrial species, although, by virtue of a very short generation time, taxa such as Gliricidia may be useful model species for experimentation.

The major application of markers lies in strategic research - in the great contributions which marker studies are making to rapid advances in the understanding of basic genetic mechanisms and genome organization at the molecular level. For forest tree species, the study of quantitative traits will be an important focus of this work in coming years. This work will be most efficiently concentrated on a few model species, e.g. loblolly pine.