Christian Bachem

Unravelling the potato genome

Christian Bachem, of the Department of Plant Sciences, Wageningen UR, in the Netherlands, is coordinator of the Potato Genome Sequencing Consortium (PGSC), an international research network that aims to determine the potato's complete DNA sequence by the end of 2010.

Why is it so important to sequence the potato genome?

"Cultivated potato is what we call a highly heterozygous, self-incompatible outbreeder – in practice that makes it impossible to produce true breeding lines and so genetic improvement is a complex and lengthy process. We estimate that well over 40 000 genes are encoded by the genome. The problem is these genes are not conveniently located in clusters. By unravelling the complete DNA sequence, we aim at localizing and identifying the genes coding for important traits such as disease resistance but also for nutritional attributes, such as starch quality, protein and vitamin content. Genomic sequencing will deliver molecular markers that breeders can use to increase the efficiency and rapidity of their breeding programmes. In the longer term, the full genome sequence will form the basis for understanding the biological processes underlying complex traits such as yield and quality."

What do we already know about the potato genome?

"Potato has 12 chromosomes, each one about 70 million base pairs long, which makes it about a quarter the size of the human genome. We estimate the size of the complete sequence at 840 Mbp [Mega base-pairs], which means 840 million nucleotides that line up in a particular order to form the potato's chromosomes."

How is the Potato Genome Sequencing Consortium organized?

"The consortium consists of nationally supported scientific research institutes in Argentina, Brazil, China, Chile, India, Ireland, Netherlands, New Zealand, Poland, Peru, Russia, the UK and the United States. Each national partner will sequence at least a third of a chromosome, and each chromosome has been assigned to one or more countries."

What is your approach to sequencing the potato genome?

"Mapping the DNA code of over 800 million base pairs is a huge technical and bio-informatic challenge. At Wageningen's Laboratory of Plant Breeding we are using a novel approach for mapping and aligning a library of large chunks of potato genomic DNA called 'bacterial artificial chromosomes', or BACs, which are small, manageable parts of the entire genome averaging 120 000 nucleotides. The technique involves first creating an ultra-high density genetic map of the potato genome using molecular DNA markers. The DNA markers with a known genetic location can then be used to identify groups of overlapping BACs to form a physical map."

What is the current status of the PGSC project?

"We are currently assembling the Potato BAC library into a physically and genetically anchored map, which will allow the sequencing of relevant chromosome sections by consortium partners. Most of the partners have been able to raise funding for sequencing the chromosomes assigned to them and, in most cases, have established sequencing facilities. One important initiative the PGSC is pursuing is a collaborative training scheme with countries that have identified specific gaps in their know-how. Through this collaboration, junior scientists will visit our facilities for training, for example in bioinformatics. These arrangements have been made with China and Brazil and discussions are underway with other consortium members."

How much will the entire project cost?

"Sequencing of the human genome was achieved in 2003 at a total cost of about $800 million. Since then, the cost of sequencing has been very much reduced. The total cost of sequencing the potato genome would be, we estimate, around €25 million. An equal amount is probably needed for closing gaps and for the bioinformatics needed for assembly and annotation. A worldwide effort of around €50 million is therefore likely to be needed."

What is the PGSC policy on sharing genome data?

"We have an open information policy. All data is intended to be freely shared between the consortium partners and the scientific community at large. The data of the potato genome sequence is shared within the consortium for six months for quality control, after which it is being released as nucleotide flat files into the public domain."