

2. HOW FOREST GENETIC CONSERVATION CAN BENEFIT FROM NEW ACHIEVEMENTS IN GENOMICS


2.1. Introduction to genomics
2.2. DNA sequencing of entire genomes
2.3. Gene discovery and expressed sequence tag polymorphisms (ESTPs)
2.4. Physical and genetic mapping of the whole genome using numerous genetic markers
2.5. Analysis of genetic control of complex adaptive traits via quantitative trait loci (QTL) mapping
2.6. Candidate gene mapping of adaptive genes
2.7. Comparative mapping of adaptive genes

2.1. Introduction to genomics


2.1.1. Structural genomics
2.1.2. Functional genomics
2.1.3. Comparative genomics
2.1.4. Associative genomics
2.1.5. Statistical genomics

Genomics has arisen as a new science that studies the whole genome by integrating traditional genetic disciplines such as population, quantitative and molecular genetics with new technologies in molecular biology, DNA analysis, bioinformatics and automated robotic systems (Figure 1).

Figure 1: Genomics is a broad discipline that integrates traditional areas of genetics (adapted from Figures 1.1 and 1.2 in Liu 1998).

A number of subdisciplines of genomics can be combined to provide a powerful approach to studying adaptive genetic variation: structural, functional, comparative, statistical and associative genomics. A brief description of these subdisciplines might be useful in helping those new to the field to understand how modern genomics can affect gene conservation.

2.1.1. Structural genomics

Structural genomics attempts to identify all the genes in the genome (a process sometimes called gene discovery) and to determine their locations on the chromosomes. This goal is achieved by sequencing individual genes, gene segments or entire genomes. Individual genes are identified in the DNA sequence using sophisticated computer algorithms, and the biochemical function of a gene is deduced by comparing its DNA sequence with the sequences of genes of known function in databases. When the complete sequence of an entire genome is not available, the locations of genes can be determined either by direct physical mapping or by genetic mapping of the entire genome using numerous genetic markers. One of the most prominent applications of structural genomics to the study of adaptive genetic variation is quantitative trait loci (QTL) analysis via genome mapping. Unlike functional genomics, however, this approach explains genome structure and gene interaction at the level of the genome rather than at the level of gene function.

2.1.2. Functional genomics

Functional genomics seeks to understand the function of genes and how they determine phenotypes. One of the major advances in functional genomics is the use of DNA microarrays (also known as “DNA chips”) to measure the expression of thousands of genes simultaneously. A DNA microarray consists of thousands of DNA samples or oligonucleotide sequences, representing thousands of genes in the genome, that are printed or synthesized onto a nylon membrane filter or a microscope glass slide in a precise, known pattern. Each DNA spot represents a unique gene and is used to measure the expression of that gene's messenger RNA (mRNA) quantitatively by hybridization with fluorescently labelled copies of the mRNA (Figure 2). Adaptive responses to different environmental stresses and treatments can thus be studied for many genes in parallel by analysing the differential responses of thousands of genes on DNA microarrays.

Figure 2: The use of DNA microarrays in differential gene expression analysis (adapted from Albelda and Sheppard 2000). A comparative hybridization experiment involves isolating messenger RNA (mRNA) from two separate samples (A). The mRNA from each sample is treated with reverse transcriptase (B) and labelled with a distinct fluorescent tag (C). The two pools of labelled cDNA are mixed, hybridised to a DNA microarray containing a full set of thousands or tens of thousands of DNA sequences based on genomic or complementary DNA (cDNA) sequences, and washed (D). The microarray is scanned using a specialised fluorimager, and the colour of each spot is determined (E). In this example, genes expressed only in Sample A appear red, genes expressed only in Sample B appear green, and genes expressed equally in both samples appear yellow. This allows researchers to identify genes that are specifically expressed in response to a particular treatment or disease, or tissue-specific genes that are expressed in one tissue but not in another.
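The colour of each spot in Figure 2 reflects the ratio of the two fluorescent signals. A minimal sketch of how such ratios might be turned into calls is shown below, assuming Sample A is read in the red channel and Sample B in the green channel. The 2-fold cut-off, the intensity floor and the median centring (a crude correction for dye bias) are illustrative assumptions rather than part of the protocol described above; real analyses rely on replicate arrays, proper normalisation and statistical testing.

```python
import math
from statistics import median

def log2_ratios(spots, floor=1.0):
    """Return {gene: log2(red / green)} for each spot.

    Intensities below `floor` are raised to it so that a gene detected in
    only one channel does not produce log(0) or a division by zero.
    """
    return {
        gene: math.log2(max(red, floor) / max(green, floor))
        for gene, (red, green) in spots.items()
    }

def call_spots(spots, fold=2.0, center=True):
    """Classify each spot as 'A' (red), 'B' (green) or 'both' (yellow)."""
    ratios = log2_ratios(spots)
    # Median centring assumes that most genes are not differentially expressed.
    shift = median(ratios.values()) if center else 0.0
    cutoff = math.log2(fold)
    calls = {}
    for gene, value in ratios.items():
        value -= shift
        if value >= cutoff:
            calls[gene] = "A"       # red: higher in Sample A
        elif value <= -cutoff:
            calls[gene] = "B"       # green: higher in Sample B
        else:
            calls[gene] = "both"    # yellow: similar in both samples
    return calls

# Hypothetical (red, green) intensities for three spots.
spots = {"gene1": (5200, 480), "gene2": (300, 4100), "gene3": (2500, 2300)}
print(call_spots(spots))  # {'gene1': 'A', 'gene2': 'B', 'gene3': 'both'}
```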

2.1.3. Comparative genomics

Comparative genomics uses information from different species to help understand gene organization, gene expression and evolutionary differences. It takes advantage of the high level of gene conservatism1 in structure and function (i.e., little variation across diverse taxa) and applies this principle across species in the search for functional genes and their genomic organization. Comparative genomic studies are also enhanced by examining a diversity of model organisms in which physiological, developmental or biochemical traits are readily studied.

1 Unchanged gene location in chromosomes among related species
DNA sequence comparison and comparative genetic mapping are the most frequently used methods in comparative genomics. The study of adaptation in forest tree species is likely to benefit greatly from comparative genomic studies using different model organisms as well as other well-studied species. In particular, genomic studies in Arabidopsis thaliana, a small flowering plant of the mustard family (Brassicaceae) widely used as a model species, have already yielded complete genome sequence data. A complete genome sequence of poplar should also soon be available for comparative genomic analysis.

2.1.4. Associative genomics

Associative genomics searches for mutations in populations via linkage disequilibrium analysis and via direct assessment of association between alleles and phenotypes. This approach can be used effectively in the search for adaptive mutations affecting traits such as disease resistance, drought tolerance and cold hardiness. DNA variants or mutations (inherited differences in DNA sequence) can either contribute directly to phenotypic variation, influencing an individual's phenotypic characteristics (e.g., risk of disease and response to the environment), or be tightly linked to the genes causing this variation. In the latter case, the variant alleles serve as markers of the genes under selection and can be in linkage disequilibrium with alleles of these genes because of limited population size, recent origin, low recombination rate and/or strong selection acting on alleles of the linked gene. Once candidate alleles responsible for adaptive traits have been detected via QTL, candidate gene and comparative mapping, it will be possible to perform association studies to estimate the effects of alleles or haplotypes2 on phenotypes. It should be practical to define common haplotypes using a dense set of polymorphic markers and to evaluate each haplotype for association with disease or any particular adaptive trait. Single nucleotide polymorphisms (SNPs) are the most appropriate markers for characterising haplotypes and achieving the required marker density. Most sequence variation is attributable to SNPs, with the rest attributable to insertions or deletions of one or more bases, repeat length polymorphisms and rearrangements. When two individual sequences are compared, SNPs occur on average every 1,000-2,000 bases, and are thus present at sufficient density for comprehensive haplotype analysis. SNPs are usually biallelic (binary), and are thus well suited to automated, efficient and fast genotyping.
It is likely that a SNP map of sufficient density will soon be created for forest tree species and used in associative genomic studies of adaptive variation. Such studies should help to find haplotypes and genetic variants that are either directly involved in the genetic control of adaptive traits or non-randomly associated with these traits because of tight linkage and linkage disequilibrium.

2 A particular combination of alleles or sequence variations that are closely linked - that is, are likely to be inherited together - on the same chromosome.
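Linkage disequilibrium between two biallelic loci (for example, two SNPs) is usually summarised by the coefficient D and its standardised forms D' and r². The sketch below assumes that haplotypes have already been phased and counted, and the counts are hypothetical; estimating haplotype frequencies from unphased genotypes, the usual situation in practice, needs an additional step (such as the EM algorithm) that is not shown.

```python
def ld_stats(counts):
    """Return (D, D_prime, r2) for two biallelic loci.

    `counts` maps the four two-locus haplotypes "AB", "Ab", "aB", "ab" to
    the number of times each was observed in a sample of phased haplotypes.
    """
    n = sum(counts.values())
    p_AB = counts["AB"] / n
    p_A = (counts["AB"] + counts["Ab"]) / n   # frequency of allele A at locus 1
    p_B = (counts["AB"] + counts["aB"]) / n   # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B                      # departure from random association
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max                       # Lewontin's standardised D (signed)
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2

# Hypothetical sample of 100 phased haplotypes.
print(ld_stats({"AB": 40, "Ab": 10, "aB": 10, "ab": 40}))  # approx. (0.15, 0.6, 0.36)
```

Under linkage equilibrium (all four haplotypes at frequency 0.25) the same function returns D = 0 and r² = 0.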

2.1.5. Statistical genomics

Statistical genomics is an integrative subdiscipline that serves all other areas of genomics. It provides statistical tools for genome and QTL mapping in structural genomics, bioinformatics tools for gene search, comparison and annotation in functional genomics, and statistical population genetic methods in associative genomics. Statistical genomics is also very important in developing comprehensive, interactive, computerised biological databases. New computer tools are required to integrate genetic data at all levels of biological organization, from gene to population, species and ecosystem, for multiple purposes, including gene conservation.

Certainly, the division of genomics into these subdisciplines is rather arbitrary. The distinctions are often vague or overlapping, but they may still help those new to the field to understand modern genomics. In fact, genomics is a synthetic discipline that combines many methods and approaches from molecular biology, population and evolutionary genetics and bioinformatics (Figure 1). Its purpose is to study the structure, function and evolution of the genome as a whole through complete genome sequencing, the construction of functional genetic maps for entire genomes, and the simultaneous analysis of differential expression patterns of all, or thousands of, the genes in a genome across different cells and tissues and/or different treatments and conditions. It facilitates understanding of genomes at both the molecular and the phenotypic level. It is likely that we will soon have a catalogue of all or most of the genes expressed in plant and animal genomes, including those that play essential roles in species- and population-level adaptation. By identifying these genes and understanding their function, we can associate genetic variation with phenotypes and study adaptive genetic variation in different populations.

2.2. DNA sequencing of entire genomes

Complete sequencing of the genomes of several important and model species is a significant achievement of genomics and provides the basis for comparative and functional analysis. Answers can now be obtained to questions such as (1) the number, location and distribution of genes in the genome; (2) how genes are organized and what they do; (3) which genes are the same or highly conserved across different species; and (4) which genes are responsible for species adaptation and evolution. Complete genome sequences have been determined for the yeast Saccharomyces cerevisiae (May 1997), the nematode Caenorhabditis elegans (December 1998), the fruit fly Drosophila melanogaster (March 2000), the annual plant Arabidopsis thaliana (December 2000) and the human (February 2001), and will also soon become available for the mouse, rice and maize. It now seems experimentally feasible to determine the complete sequence of a forest tree genome such as that of Populus (500 million bp3) or Eucalyptus (340-580 million bp).

3 Nucleotide base pairs
The number of genes in a genome is limited and has turned out to be lower than earlier expected (for instance, only about 26,000 in plants and animals such as Arabidopsis and humans vs. about 6,000 in baker's or budding yeast; Table 2). Moreover, many genes are common to different species and have remained practically unchanged since the distant evolutionary past. For instance, only 94 of 1,278 protein families in the human genome appear to be specific to vertebrates. The most elementary cellular functions (basic metabolism, transcription of DNA into RNA, translation of RNA into protein, DNA replication and the like) evolved just once and have remained almost unchanged since the evolution of single-celled organisms such as yeast and bacteria.

Table 2: Genome size of several species for comparison

Taxonomic rank                Latin name                 Common name               Chromosomes (n)  Base pairs (x 10^6)  Genes (x 10^3)
Prokaryotae
Archaea                       12 species¹                archaeal microorganisms   -                1.6-3.0              1.5-2.7
Bacteria                      40 species¹                bacterial microorganisms  -                0.6-7.0              0.5-6.6
Bacteria                      Escherichia coli²          no common name            -                4.6                  4.3
Eukaryotae
Yeast                         Saccharomyces cerevisiae²  baker's or budding yeast  16               12                   6
Worm                          Caenorhabditis elegans²    nematode                  5/6              97                   19
Insect                        Drosophila melanogaster²   fruit fly                 4                180                  13.6
Annual plant / Angiosperm     Arabidopsis thaliana²      arabidopsis               5                125                  25.5
Annual plant / Angiosperm     Oryza sativa²              rice                      12               400                  ?
Annual plant / Angiosperm     Zea mays²                  maize                     10               2,400-3,200          ?
Perennial plant / Angiosperm  Lycopersicon esculentum    tomato                    12               900                  ?
Forest tree / Angiosperm      Eucalyptus³                eucalypts                 11               340-580              ?
Forest tree / Angiosperm      Populus³                   poplars                   19               500                  ?
Forest tree / Gymnosperm      Pinus³                     pines                     12               20,000-30,000        ?
Mammals / Rodent              Mus musculus²              mouse                     20               3,500                21-30
Mammals / Primate             Homo sapiens²              human                     23               3,400                26-31

1 Number of species with completely sequenced genomes.
2 Species with completely or almost completely sequenced genome.
3 Data are based on several species.
Comparative genomics explores such gene conservatism, which makes it possible to infer the function of a particular gene from data obtained for similar, homologous genes studied in other organisms. Much about the functions of forest tree genes can be learned from data obtained in other organisms, such as Arabidopsis. Complete genome sequences are not yet available for any forest tree species, although advances in sequencing technology should make this possible in the near future. Among the challenges are the complexity and large size of tree genomes (Table 2). The pine genome (20,000-30,000 million bp), for example, is about 6 to 9 times larger than the human genome (3,400 million bp) and 160 to 240 times larger than the genome of Arabidopsis (125 million bp). Even the relatively small Populus genome (500 million bp), which is about 40 times smaller than that of the best-studied conifer, Pinus taeda, and will therefore likely be the first forest tree genome to be entirely sequenced, is still about 4 times as large as the Arabidopsis genome (although it is similar in size to the rice genome and about 5 to 6 times smaller than the maize genome, both of which have been almost completely sequenced).
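The fold differences in genome size quoted above follow directly from Table 2. A short sketch of the arithmetic, treating each size range as an interval (smallest possible ratio = low/high, largest = high/low), is:

```python
# Genome sizes in million base pairs as (low, high), taken from Table 2.
GENOME_MBP = {
    "pine": (20_000, 30_000),
    "human": (3_400, 3_400),
    "arabidopsis": (125, 125),
    "poplar": (500, 500),
    "rice": (400, 400),
    "maize": (2_400, 3_200),
}

def fold_range(larger, smaller):
    """Smallest and largest possible ratio of two genome sizes, rounded to 0.1."""
    lo_a, hi_a = GENOME_MBP[larger]
    lo_b, hi_b = GENOME_MBP[smaller]
    return round(lo_a / hi_b, 1), round(hi_a / lo_b, 1)

for a, b in [("pine", "human"), ("pine", "arabidopsis"), ("poplar", "arabidopsis"),
             ("pine", "poplar"), ("maize", "poplar")]:
    low, high = fold_range(a, b)
    print(f"{a} / {b}: {low}-{high} x")
```

With the Table 2 values, pine comes out at roughly 5.9-8.8 times the human genome and 160-240 times the Arabidopsis genome, while maize is 4.8-6.4 times the size of poplar.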

2.3. Gene discovery and expressed sequence tag polymorphisms (ESTPs)

An alternative approach to gene discovery, which avoids complete genome sequencing, is being used in trees and many other organisms. It is based on identifying only the DNA that codes for genes expressed in the genome. These sequences are called expressed sequence tags (ESTs). They are partial or complete sequences of complementary DNA (cDNA) obtained from mRNA isolated from different tissues, and therefore represent genes expressed in these tissues, often with known or suggested functions (Figure 3). EST sequences are compared with all other sequences in gene databases to identify matches that likely represent highly homologous genes. If an EST is highly similar (homologous) to another gene sequence whose identity has been determined, then the identity of the EST can be inferred immediately. ESTs can be used as a source of candidate genes for QTLs involved in the genetic control of adaptive traits. Large libraries of partial or complete sequences of thousands of expressed genes have already been obtained, and large databases of EST sequences are available for many animal and plant species, including several forest tree species such as Monterey or radiata pine4 (Pinus radiata), loblolly pine (P. taeda), Norway spruce (Picea abies), Eucalyptus and Populus.

4 See http://www.cbc.med.umn.edu/ResearchProjects/Pine/
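In practice, ESTs are compared with database sequences using alignment tools such as BLAST. As a much simpler stand-in that still shows the idea of annotating an EST by its closest database match, the sketch below scores similarity as the Jaccard index of shared k-mers (short substrings of length k). The sequences, gene names, k = 8 and the 0.3 similarity threshold are all hypothetical choices for illustration.

```python
def kmers(seq, k=8):
    """Set of all overlapping substrings of length k in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_match(est, database, k=8, min_similarity=0.3):
    """Return (name, similarity) of the most similar database entry, or None.

    Similarity is |shared k-mers| / |all k-mers| (Jaccard index); an EST
    whose best score is below `min_similarity` is left unannotated.
    """
    query = kmers(est, k)
    best = None
    for name, seq in database.items():
        target = kmers(seq, k)
        union = query | target
        score = len(query & target) / len(union) if union else 0.0
        if best is None or score > best[1]:
            best = (name, score)
    if best is None or best[1] < min_similarity:
        return None
    return best

# Hypothetical reference sequences with "known" identities.
database = {
    "geneX": "ATGGCTAGCTAGGCTTACGATCGATCGGATCCTAG",
    "geneY": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA",
}
est = "GCTAGCTAGGCTTACGATCGATCGG"   # a fragment of geneX
print(best_match(est, database))
```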
Expressed sequence tag polymorphisms (ESTPs) are derived from ESTs. Polymerase chain reaction (PCR) primers designed from EST sequences are used to amplify the corresponding genes from the genomic DNA of individual trees (Harry et al. 1998). Allelic polymorphism in the amplification products (ESTPs) can then be revealed using various modern methods for detecting and visualising DNA alterations (Kristensen et al. 2001).
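The core idea of ESTP primer design, placing a forward primer at one end of the EST and a reverse primer (the reverse complement of the sequence) at the other, can be sketched as below. This is only an illustration under simplifying assumptions: primer length is fixed at 20 bases, the melting temperature uses the rough Wallace rule Tm = 2(A+T) + 4(G+C), and the acceptance window (40-60% GC, 52-64 °C) is a common rule of thumb rather than the procedure of Harry et al. (1998). Dedicated tools such as Primer3 also check secondary structure and primer dimers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_content(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Approximate melting temperature in deg C for short oligonucleotides."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))

def primer_pair(est, length=20):
    """Forward primer from the 5' end of the EST, reverse primer from the 3' end."""
    if len(est) < 2 * length:
        raise ValueError("EST too short for non-overlapping primers")
    forward = est[:length].upper()
    reverse = reverse_complement(est[-length:])
    return forward, reverse

def acceptable(primer, gc=(0.40, 0.60), tm=(52, 64)):
    """Rule-of-thumb filter on GC content and Wallace Tm (illustrative limits)."""
    return gc[0] <= gc_content(primer) <= gc[1] and tm[0] <= wallace_tm(primer) <= tm[1]

# Hypothetical EST fragment (60 bases).
est = "ATGGCTAGCTAGGCTTACGATCGATCGGATCCTAGGTACCGATGCATGCAAGCTTGCACG"
fwd, rev = primer_pair(est)
print(fwd, rev, acceptable(fwd), acceptable(rev))
# ATGGCTAGCTAGGCTTACGA CGTGCAAGCTTGCATGCATC True True
```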

Figure 3: The development of expressed sequence tag (EST) markers in forest trees that can be used in comparative and candidate gene mapping. EST markers are derived from partial or complete sequences from complementary DNA (cDNA) libraries obtained from messenger RNA (mRNA) isolated from different tissues (for instance, xylem). EST sequences are submitted to gene databases and compared with all other sequences in the databases to identify matches that likely represent highly homologous genes. Polymerase chain reaction (PCR) primers based on the EST sequences are designed to amplify these genes. If these genes are polymorphic and segregate in the experimental population or progeny, they can be used in genome and quantitative trait loci (QTL) mapping, and they are good candidate genes for QTLs.

ESTPs reveal genetic variation mostly within genes, in both their coding and non-coding regions. ESTPs are therefore the most informative of the recently developed marker types in terms of gene function, and are the first genetic markers that offer real potential for detecting adaptive genetic diversity broadly.

2.4. Physical and genetic mapping of the whole genome using numerous genetic markers

Genome mapping is central to genomics. It allows genes and genetic markers to be positioned on specific chromosomes. There are two kinds of maps: physical and genetic. Physical maps provide the exact locations of genes or genetic markers on chromosomes. These maps are either assembled from complete genome sequences or BAC5 contigs6, or based on in situ hybridization or other methods. However, as long as complete genome sequences of forest tree species are not available, the alternative approach is to develop genetic linkage maps by segregation and linkage analysis. Genetic maps establish the order of markers and the distances between them based on the number of recombination events between them. Genetic maps have already been constructed for many different forest tree species using a variety of genetic marker types (see Table 1; see Neale and Sederoff 1996, Krutovskii et al. 1998 and Cervera et al. 2000 for review). A complete sequence alone is not sufficient to understand the genetic control of adaptive traits. These traits are usually very complex and quantitatively inherited, and are controlled by many genes, each with a relatively small effect; these genes are called quantitative trait loci (QTLs). Genetic maps can be used to study the number, location and distribution of QTLs in a genome via their genetic linkage with molecular markers. This approach has led to the relatively recent development of a genomic technique called QTL mapping.

5 Bacterial artificial chromosome (BAC): A chromosome-like structure, constructed by genetic engineering, that carries a large segment of DNA (100,000 to 200,000 bases) from another species cloned into bacteria. Once the foreign DNA has been cloned into the host bacteria, many copies of it can be made.

6 A group of clones representing overlapping regions of a genome.
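The conversion from recombination counts to genetic map distances can be illustrated for the simplest case, a cross in which recombinant progeny can be counted directly (for example, a backcross or a test cross; scoring recombinants in outbred forest tree pedigrees is more involved). The sketch estimates the recombination fraction r, converts it to centimorgans (cM) with the Haldane (no crossover interference) and Kosambi (some interference) mapping functions, and computes the LOD score for linkage against free recombination (r = 0.5). The counts are hypothetical.

```python
import math

def haldane_cM(r):
    """Haldane map distance in cM; assumes no crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)

def kosambi_cM(r):
    """Kosambi map distance in cM; allows for some crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def linkage_lod(recombinants, total):
    """log10 likelihood ratio of linkage at the estimated r versus r = 0.5."""
    nonrecombinants = total - recombinants
    r = recombinants / total

    def log10_likelihood(p):
        value = 0.0
        if recombinants:
            value += recombinants * math.log10(p)
        if nonrecombinants:
            value += nonrecombinants * math.log10(1 - p)
        return value

    return log10_likelihood(r) - log10_likelihood(0.5)

# Hypothetical data: 10 recombinants among 100 progeny.
r = 10 / 100
print(round(haldane_cM(r), 2), round(kosambi_cM(r), 2), round(linkage_lod(10, 100), 2))
# 11.16 10.14 15.98
```

A LOD score of 3 or more (odds of at least 1,000:1 in favour of linkage) is a commonly used threshold for declaring two loci linked.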

2.5. Analysis of genetic control of complex adaptive traits via quantitative trait loci (QTL) mapping

The method for finding and locating QTLs is called QTL mapping. Its conceptual basis is comparatively simple, but it requires relatively dense genetic maps with evenly distributed markers covering the entire genome, appropriate statistical tools, and an experimental population of sufficient size segregating for both the genetic markers and the phenotypic traits (e.g., Paterson 1998). First, multilocus genotypes (molecular markers) and phenotypes (quantitative traits) are measured on all individuals of the segregating population. Then, phenotypic values are statistically associated with genotypic values, usually using multiple regression or maximum likelihood methods, to identify markers that have a strong association (joint segregation) with the quantitative trait. Such an association can result either from tight linkage between the genetic marker and a QTL (i.e., because they reside in the same region of the chromosome) or from direct involvement of the marker(s) in the genetic control of the trait. QTLs have already been detected and mapped in forest trees for such adaptive traits as growth rhythm, phenology, form, wood quality, disease resistance, cold hardiness, drought tolerance and others (see Neale 1998, Sewell and Neale 2000 and Neale et al. 2002 for review). Once a QTL controlling an adaptive trait has been precisely mapped, it may become possible to clone the gene underlying the QTL based solely on knowledge of its genetic map position, without knowing its function or DNA sequence. This is known as positional or map-based cloning.
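The statistical step described above can be sketched in its simplest form, single-marker regression: for each marker, the phenotype is regressed on the marker genotype and the fit is compared with a model containing only the overall mean. The sketch assumes a population in which each marker has two genotype classes coded 0 and 1 (as in a backcross); the data and marker names are hypothetical. Published forest tree QTL studies generally use more powerful methods, such as interval mapping, which also estimate QTL positions between markers.

```python
import math

def single_marker_lod(genotypes, phenotypes):
    """LOD score for regression of phenotype on one marker's genotype codes.

    Compares the residual sum of squares of a mean-only model (RSS0) with
    that of a simple linear regression on the genotype code (RSS1):
    LOD = (n / 2) * log10(RSS0 / RSS1).
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    mean_x = sum(genotypes) / n
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0 or rss0 == 0:
        return 0.0                         # monomorphic marker or no trait variation
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    rss1 = rss0 - sxy ** 2 / sxx           # residual SS after fitting the marker
    if rss1 <= 0:
        return math.inf                    # marker explains all variation
    return (n / 2) * math.log10(rss0 / rss1)

def marker_scan(markers, phenotypes):
    """LOD score for every marker; markers = {name: [genotype code per individual]}."""
    return {name: single_marker_lod(g, phenotypes) for name, g in markers.items()}

# Hypothetical progeny: height growth (arbitrary units) and two markers.
height = [10.1, 9.8, 10.3, 9.9, 12.2, 11.8, 12.1, 12.4]
markers = {"M1": [0, 0, 0, 0, 1, 1, 1, 1], "M2": [0, 1, 0, 1, 0, 1, 0, 1]}
for name, lod in marker_scan(markers, height).items():
    print(name, round(lod, 2))
```

Here marker M1, which separates short from tall progeny, receives a high LOD score, while M2 receives a score close to zero.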

Numerous recently developed PCR-based markers (e.g., RAPD, AFLP, SSR and STS) are used in QTL mapping (see Sewell and Neale 2000 and Neale et al. 2002 for review). However, many of these markers are either dominant or anonymous, and their functions are unknown. Three important aspects should be considered when choosing a genetic marker system for QTL mapping: (1) the outbred nature of forest tree pedigrees, (2) the potential for comparative mapping and (3) the potential for candidate gene mapping. First, each parent of an outbred pedigree is typically a different, highly heterozygous individual, so the transmission of up to four different alleles must be followed from the parents to the progeny. Multiallelic codominant markers are therefore best suited to detect the maximum number of polymorphisms found in the heterozygous parents. Second, comparative mapping, both within and among species, is an important tool for relating the results of different mapping experiments; a subset of the markers used in a mapping experiment should therefore be orthologous7 across pedigrees and species. Third, to identify the actual genes controlling a quantitative trait, genes with known or suggested functions should be used in QTL mapping. Complete or partial cDNA sequences (ESTs) now allow researchers to design ESTP markers that take all these aspects into account and that can be used both for genetic mapping of the entire genome and for measuring adaptive genetic diversity via QTL mapping (e.g., Harry et al. 1998; Temesgen et al. 2001; Neale et al. 2002). These are the most informative markers for candidate gene mapping of adaptive traits, an approach now used in animal and plant species, mostly livestock and crops, to identify genes for various yield and quality traits, including adaptive traits such as biomass, growth rate, fecundity and other reproductive traits, and disease resistance.

7 Loci in two species that have arisen from the same locus of their common ancestor.
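The advantage of multiallelic codominant markers in outbred pedigrees can be illustrated by enumerating the expected progeny genotypes of a cross between two heterozygous parents. In the sketch below (a hypothetical marker with alleles labelled a to d, Mendelian segregation assumed), a cross ab × cd yields four equally frequent, distinguishable progeny classes, so the allele transmitted by each parent can be identified in every offspring. A dominant marker scoring only the presence of allele a separates the same progeny into just two classes and carries no information about segregation in the second parent.

```python
from collections import Counter
from itertools import product

def progeny_classes(mother, father):
    """Expected genotype frequencies among full-sib progeny of two diploid parents.

    Each parent is a pair of allele labels, e.g. ("a", "b"); under Mendelian
    segregation each parental allele is transmitted with probability 1/2.
    """
    counts = Counter()
    for maternal, paternal in product(mother, father):
        counts[tuple(sorted((maternal, paternal)))] += 1
    return {genotype: n / 4 for genotype, n in counts.items()}

def dominant_view(classes, allele):
    """Collapse genotype classes to what a dominant marker for `allele` can score."""
    view = Counter()
    for genotype, freq in classes.items():
        view["present" if allele in genotype else "absent"] += freq
    return dict(view)

codominant = progeny_classes(("a", "b"), ("c", "d"))
print(codominant)
# {('a', 'c'): 0.25, ('a', 'd'): 0.25, ('b', 'c'): 0.25, ('b', 'd'): 0.25}
print(dominant_view(codominant, "a"))   # {'present': 0.5, 'absent': 0.5}
```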

2.6. Candidate gene mapping of adaptive genes

Candidate gene mapping is based on the assumption that a gene with known or assumed function that may affect the genetic control of a trait can be considered a ‘candidate gene’ for this trait (e.g., Gion et al. 2000; Neale et al. 2002). It is further assumed that if this gene maps to the same genomic region as a QTL for the trait, then it is very likely to be the gene underlying the QTL and to control the trait directly, although how likely depends on marker density, the precision of the QTL map and genome size.
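The co-location test at the heart of candidate gene mapping amounts to asking whether a candidate gene's map position falls inside the confidence interval of a QTL on the same linkage group. A minimal sketch, with hypothetical trait names, linkage groups, intervals and gene positions, is shown below; the width of these intervals, which depends on marker density and the precision of the QTL map, determines how convincing a co-location is.

```python
from dataclasses import dataclass

@dataclass
class QTL:
    trait: str
    linkage_group: str
    start_cM: float   # lower bound of the QTL confidence interval
    end_cM: float     # upper bound of the QTL confidence interval

def colocated_candidates(qtls, genes):
    """Map each trait to the candidate genes lying inside one of its QTL intervals.

    genes: {gene_name: (linkage_group, position_cM)}
    """
    hits = {}
    for q in qtls:
        inside = [
            name for name, (group, pos) in genes.items()
            if group == q.linkage_group and q.start_cM <= pos <= q.end_cM
        ]
        hits.setdefault(q.trait, []).extend(inside)
    return {trait: sorted(set(names)) for trait, names in hits.items()}

qtls = [QTL("cold hardiness", "LG6", 40.0, 55.0), QTL("wood density", "LG2", 10.0, 18.0)]
genes = {"estA": ("LG6", 47.5), "estB": ("LG6", 80.0), "estC": ("LG2", 12.0)}
print(colocated_candidates(qtls, genes))
# {'cold hardiness': ['estA'], 'wood density': ['estC']}
```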

Large forest tree EST projects will identify and provide DNA sequences that give researchers enough material to develop genetic markers for a virtually unlimited number of genes, which can serve as a source of possible candidate genes targeting particular adaptive traits (Temesgen et al. 2001; Neale et al. 2002). Different subsets of specific EST markers can be used in mapping different adaptive genes. For instance, EST markers derived from genes thought to be involved in cell defence mechanisms can be used to map QTLs controlling disease resistance, EST markers derived from genes involved in wood formation can be used efficiently in QTL mapping of wood-related traits, and so on. Even if the functions of the genes used to derive ESTs are unknown, ESTs representing cDNA isolated from a specific tissue or from cells that have undergone a specific treatment can still be used as candidate genes in QTL mapping. For instance, heat shock genes expressed during experimental heat stress can be used to map genes related to drought resistance. Using such meaningful markers as ESTs directly in genetic mapping makes the analysis of adaptive variation more efficient and focused. In addition, highly efficient and sensitive methods for detecting allelic differences in these genes that can be used for mapping (e.g., SNP detection) are now being developed.

Identifying candidate genes for QTLs controlling adaptive traits in trees would ultimately provide diagnostic tools for screening large amounts of wild germplasm for individuals carrying alleles worth conserving. The challenge is to identify DNA polymorphisms within candidate genes that distinguish alleles, and then to associate these alleles with phenotypic differences. This can be accomplished through SNP discovery and association studies. The approach is to identify SNPs within regions of candidate genes involved in the control of a trait, to genotype a large number of individuals from a natural population at these SNPs, and to test for associations between SNPs and phenotypes. Given the intensity and progress of current research and development, this approach should soon be available for application in forest tree conservation programs.
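For a trait scored as two classes (for example, resistant versus susceptible trees), the simplest SNP association test compares allele counts between the classes in a 2 × 2 table. The sketch below assumes genotypes coded as the number (0, 1 or 2) of copies of one SNP allele and uses a chi-square test with one degree of freedom; the data are hypothetical. It ignores complications that matter in real natural-population studies, such as population structure (which can create spurious associations) and the need to correct for testing many SNPs; a quantitative trait would instead be analysed by regression on genotype.

```python
import math

def allelic_association(group1, group2):
    """Allelic chi-square test (1 df) comparing two groups of individuals.

    Genotypes are coded as the number (0, 1 or 2) of copies of one SNP allele.
    Returns (chi_square, p_value).
    """
    def allele_counts(genotypes):
        copies = sum(genotypes)
        return copies, 2 * len(genotypes) - copies   # (allele 1, allele 2)

    table = [allele_counts(group1), allele_counts(group2)]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:                          # skip an empty (monomorphic) column
                chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))          # upper tail of chi-square, 1 df
    return chi2, p_value

resistant = [2, 2, 1, 2, 1, 2, 2, 1, 2, 2]
susceptible = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
chi2, p = allelic_association(resistant, susceptible)
print(round(chi2, 2), p)
```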

2.7. Comparative mapping of adaptive genes

Comparative mapping is one aspect of comparative genomics and another very promising genomic approach for discovering adaptive genes. It takes advantage of the high similarity of gene locations on the chromosomes of closely related species and applies it across species in the search for functional genes and their genomic organization. Comparative mapping in various species has shown that gene content and gene order are conserved over long chromosomal regions among related species of animals and of angiosperm plants. These results strongly suggest that similar studies can be carried out effectively in forest trees. The genetic maps of closely related species can be compared directly because of synteny (i.e., co-occurrence of two or more genes on the same chromosome) among their genomes. Indeed, the high levels of co-linearity among pine species, for instance (e.g., Brown et al. 2001), mean that genetic information from one species can be applied to others (Figure 4).

Figure 4: Comparative genetic linkage maps of linkage group 6 of loblolly, slash, Monterey and maritime pines aligned using common expressed sequence tag (EST) markers and illustrating the potential utility of loblolly pine ESTs as anchored reference loci. Loci connected by a dotted line were detected by the same EST marker (Brown et al. 2001).
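Map comparisons such as the one in Figure 4 rest on markers shared between maps. One simple way to quantify co-linearity, sketched below as an illustration rather than the method of Brown et al. (2001), is Spearman's rank correlation between the positions of the shared markers on two linkage groups: values near +1 indicate the same marker order, and values near -1 the same order in reversed orientation. The marker names and positions are hypothetical, shared markers are assumed to be orthologous, and tied positions are not handled.

```python
def ranks(values):
    """Rank positions 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def collinearity(map_a, map_b):
    """Spearman rank correlation of shared-marker positions on two linkage groups.

    map_a, map_b: {marker_name: position_cM}
    """
    shared = [m for m in map_a if m in map_b]
    n = len(shared)
    if n < 3:
        raise ValueError("need at least three shared markers")
    ra = ranks([map_a[m] for m in shared])
    rb = ranks([map_b[m] for m in shared])
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

loblolly = {"estA": 0.0, "estB": 12.5, "estC": 30.1, "estD": 41.0, "ssr1": 50.0}
slash = {"estA": 2.0, "estB": 15.0, "estC": 33.0, "estD": 39.5}
print(collinearity(loblolly, slash))   # 1.0 (same order of shared markers)
```

The anonymous marker "ssr1" is simply ignored because it has no counterpart on the second map, which mirrors why orthologous markers such as ESTPs are needed for comparative mapping.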

The most valuable alleles of adaptive genes can be identified from the pool of all species and possibly incorporated into breeding and conservation strategies. Furthermore, the controls and interactions affecting adaptive trait expression can be studied. Further studies should show whether comparative mapping between distantly related forest trees, for example between Populus or Eucalyptus and Arabidopsis, is also possible.

The development of genetic resources for comparative genomic analysis in forest trees would have significant impacts in many areas of forest gene conservation research. Comparative mapping would facilitate (1) verification of QTLs controlling adaptive traits, (2) identification of candidate genes and (3) understanding of evolutionary relationships. The emphasis in forest gene conservation is not on a single species but on many, each with its own regional economic and ecological distinctions. Comparative genetic mapping in pines and other conifers follows this paradigm, focusing not only on the creation of individual species maps but also on consensus maps that identify the genomic locations of genes affecting quantitatively inherited adaptive phenotypes, resistance to pathogens, and other biological and physiological characteristics.

Comparative mapping is possible if orthologous8 genetic markers have been mapped to each of the species maps to be compared. Orthologs are genes that have descended from a common ancestral locus, whereas paralogs are loci that have originated by gene duplications within an individual species.

8 Similarity in DNA or protein sequences between different species due to common ancestry. Describes the evolutionary origin of a locus. Loci in two species are said to be orthologous when they have arisen from the same locus of their common ancestor.
Most anonymous markers (e.g., RAPD, AFLP and SSR) cannot be used for comparative mapping because they are not orthologous among species. Genetic markers based on genic DNA sequences, such as RFLPs and ESTPs, are better suited to comparative mapping. For example, RFLP loci from both Pinus taeda and P. radiata have been used to construct comparative maps between these species (Devey et al. 1999). However, because RFLP markers do not easily distinguish between orthologs and paralogs and are difficult to apply, they are unlikely to be widely used for comparative mapping. ESTPs are the most useful markers for comparative mapping and have already been used in genetic mapping in conifers (Tsumura et al. 1997; Perry and Bousquet 1998; Cato et al. 2001; Temesgen et al. 2001). ESTPs reveal orthologs among species and only occasionally paralogs. ESTP markers from Pinus taeda have been used to construct comparative maps for this species and slash pine, Pinus elliottii (Brown et al. 2001).

