Much public interest was generated by the publication of two draft sequences of the rice genome in the 5 April 2002 issue of the journal Science. Two separate research groups, one from China and the other from a private company, Syngenta, published the first draft sequences of the rice subspecies Oryza sativa indica and Oryza sativa japonica, respectively. While these two reports mark a milestone in the study of the rice genome, they are by no means complete and they contain errors. Nevertheless, they do represent the first detailed genetic blueprint of a food crop. The first plant genome to be completely sequenced was that of Arabidopsis, a weedy plant belonging to the Cruciferae family. China has made its sequence data publicly available; the Syngenta data, on the other hand, are not freely available and may only be obtained through a signed agreement.
The Chinese team comprising scientists from 11 Chinese institutions was led by Jun Yu of the Beijing Genomics Institute (BGI), a publicly funded institute established in 1999. It worked in collaboration with the University of Washington Genome Center to unveil the sequence of the subspecies indica. The rice variety chosen was one of the parental lines of a Chinese hybrid variety that has revolutionized rice production in China. The second parent of the hybrid is being sequenced.
The Syngenta team, which included scientists from Myriad Genetics Inc. (another private biotech company), was led by Stephen A. Goff. It unveiled the sequences of the subspecies japonica, which is popular in Japan. The variety chosen by Syngenta - i.e. Nipponbare - is the same as that being sequenced by a publicly funded international consortium of ten countries: Japan, USA, China, Taiwan, France, India, Korea, Thailand, UK and Brazil. In January 2001, Syngenta announced that it had completed the sequencing of the rice genome, but it did not publish the data.
With the public release of the indica rice sequence data by China, the International Rice Genome Sequencing Project (IRGSP) is set to complete the entire japonica sequence by December 2002 with an accuracy of 99.99 percent - genetically aligned as well as correctly oriented. The final finished sequence with only a small number of gaps or errors is expected to be completed by 2005, three years ahead of the original schedule.
IRGSP chose to sequence the japonica variety Nipponbare, the same variety sequenced by Syngenta. The clone-by-clone technique it adopted is slower and about ten times more expensive than the alternative, but much more accurate and complete. BGI and Syngenta, on the other hand, adopted the quicker whole genome shotgun approach. The latter method was pioneered for sequencing the fruit-fly Drosophila melanogaster, and adopted in 2001 by Celera Genomics for sequencing the human genome. The publicly funded international human genome project, like IRGSP, adopted the clone-by-clone sequencing method.
The clone-by-clone approach involves creating physical and genetic maps of the genome and developing a library of clones (relatively short stretches of DNA), each anchored to a specific location identified from the map; these clones are then sequenced. In the shotgun approach, the entire genome is randomly broken into pieces and sequenced. Large coherent units of sequence are then created from the pieces with the aid of high-speed computers and software which look for overlapping DNA sequences and identify contiguous genomic regions. This method is prone to errors, and gaps remain, because the DNA of higher organisms contains many repeated sequences that prevent unambiguous assembly of sequence reads on the genome scale.
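The overlap-based assembly step described above can be illustrated with a toy sketch. This is a deliberately simplified greedy merger written for illustration only; it is not the software used by BGI or Syngenta, and the function names, the `min_len` threshold and the example reads are all assumptions. It repeatedly joins the two fragments with the longest suffix-prefix overlap until no overlaps remain; whatever is left are the contigs.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Toy greedy assembler (hypothetical helper, for illustration only).

    Merges the pair of fragments with the longest overlap until none is left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing overlaps any more: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads collapse into a single contig.
single = greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])

# Reads with no shared sequence stay as separate contigs, i.e. a gap remains.
gapped = greedy_assemble(["ATGCGTAC", "GGGTTTCC"])
```

In this sketch a repeated sequence would make more than one merge look equally good, which is the ambiguity that leads to misassemblies and gaps in real shotgun data.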
The claim that Syngenta and BGI have both completed sequencing of the rice genome should be interpreted with care: as of 2001, only 28 percent of the genome had been fully sequenced, 66 percent was sequenced with gaps and 6 percent was unknown. In the case of the rice genome, the sequences released by the two groups cover only 93 and 92 percent, respectively, of the total genome. IRGSP has released its more accurate sequences in the public database, GenBank, which to date covers 68 percent of the genome. Syngenta sequenced 5.5 million random clones and eventually assembled the sequenced snippets into 42 109 contigs; similarly, BGI lined up the large number of sequenced snippets into 127 550 contigs. These contig counts make it evident that both draft sequences still contain a large number of gaps. The IRGSP-released data contain only 2 000 contigs; in the final finished sequence stage - Phase 3 - this number should come down to 12.
Both the Syngenta and BGI sequences are less accurate than the IRGSP sequences, for which an error rate of one in 10 000 base pairs (bp) is envisaged, i.e. 99.99 percent accuracy. This is the level of accuracy required by the Bermuda Declaration, adopted for the publicly funded human genome project and also followed by IRGSP.
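The relationship between "one error in 10 000 bp" and "99.99 percent accuracy" is simple arithmetic, sketched below. The helper names are hypothetical, and the 400 Mbp genome size is the approximate figure quoted elsewhere in this article, used here only to give a sense of scale.

```python
def accuracy_percent(bp_per_error: int) -> float:
    """Accuracy in percent for a sequence with one error every `bp_per_error` bases."""
    return 100.0 * (1 - 1 / bp_per_error)


def expected_errors(genome_bp: int, bp_per_error: int) -> int:
    """Errors still expected across a genome sequenced at the given error rate."""
    return genome_bp // bp_per_error


RICE_GENOME_BP = 400_000_000  # assumption: approximate size, about 400 Mbp

accuracy = accuracy_percent(10_000)
remaining = expected_errors(RICE_GENOME_BP, 10_000)
print(f"accuracy: {accuracy:.2f} percent, expected errors: {remaining}")
```

Even at this standard, a genome of roughly 400 Mbp would still be expected to carry on the order of 40 000 base errors.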
The sequence already submitted to IRGSP by its partners is 340.2 mega bp. The next 2 to 3 years will see the highly accurate completion of the sequence of all 12 rice chromosomes in a single contig from telomere to telomere. The highly accurate rice genome sequence needs to be obtained for the following reasons:
- The ability to determine gene function is highly dependent on accurate sequences.
- As rice is a model plant, a complete rice sequence will directly affect what can be accomplished with other cereals.
- Agronomic traits of economic importance require precise map-based genomic sequence.
Knowledge of the complete genetic code of rice will help breeders develop strains of the crop with specific characteristics (e.g. stress tolerance, disease resistance or high yield) much more quickly than through traditional methods, which may require years of crossing to achieve the desired property. Sequencing all the rice genes does not, on its own, provide sufficient information on which to base crop improvements: sequence information linked to complete physical and genetic maps of the genome is required to exploit the full potential of the sequences. Herein lies the importance of the consortium effort. The IRGSP sequencing will not only be complete and very accurate but will also be genetically anchored: on the basis of physical and genetic maps, the precise ordering of the short stretches of sequence and their precise location along the DNA can be identified.
The draft sequence of part of the rice genome achieved using the clone-by-clone approach was made available by Monsanto to IRGSP in early 2000. Since Monsanto had also sequenced the Nipponbare strain, the sequenced clones (though of low - fivefold - accuracy) could be used by institutions participating in IRGSP, on condition that they not be commercially exploited. This enabled late entrants, such as France and Brazil, to progress rapidly in sequencing the chromosome share allocated to them. China used the Monsanto data for its full sequencing of chromosome 4, an experience useful for China's parallel effort at sequencing indica.
While the BGI group has submitted its data to GenBank, Syngenta has not done so. Instead, it has entered into an agreement with the journal Science to allow only limited access to its raw data for noncommercial purposes (similar to what Celera did with its human genome sequence data). The conditions limit searches to fewer than 15 kbp of the sequence at a time, and no more than 100 kbp of sequence may be downloaded per week. An academic requiring more than 100 kbp a week must obtain the company's prior written approval. More sequence may be freely downloaded, provided the researcher's institution enters into an agreement with Syngenta stating that the data will not be used for commercial ends.
As in the past, Syngenta's commercial hold over data has sparked a controversy. Since Nipponbare was used by both Syngenta and IRGSP, the data may be useful to the latter. Syngenta agreed to make its data available to IRGSP, but has not done so to date. According to IRGSP, Monsanto clones and sequences underlie 30 percent of the sequence in public databases. Given that the Syngenta data are probably of better quality than the Monsanto data, they would be able to accelerate IRGSP's sequencing efforts. The two public groups (IRGSP and BGI) and the two private groups (Syngenta and Monsanto) could ideally come together and pool their efforts in a single joint project. Meanwhile, IRGSP has stated that negotiations are underway with Syngenta to gain access to its data on terms similar to those with Monsanto.
Knowledge of the complete rice genome is important because rice is a model experimental plant. All flowering plants are divided into two classes: monocots and dicots. In November 2000, the genome sequence of Arabidopsis thaliana, a weed belonging to the cabbage and mustard family and widely used in research, provided the first complete view of the genetic code of a dicot. Rice, on the other hand, is a monocot and belongs to the grass family, which includes other important cereals, such as maize, wheat, barley and sorghum. Its genomic sequence now enables comparison between the genome of a monocot and that of a dicot, as well as between their genomes and those of other organisms. Unlike other cereals, rice has a relatively small genome (about 400 million base pairs [Mbp]). The maize genome - whose sequencing the United States is launching under its National Genome Initiative - is 2 500 Mbp, barley is 4 900 Mbp and wheat is 16 000 Mbp. For the sake of comparison, the size of the human genome is approximately 3 000 Mbp.
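To put these genome sizes side by side, the short sketch below expresses each one as a multiple of the rice genome. The figures are the approximate values quoted above; the dictionary and variable names are illustrative only.

```python
# Approximate genome sizes in Mbp, as quoted in the text above.
GENOME_SIZE_MBP = {
    "rice": 400,
    "maize": 2_500,
    "barley": 4_900,
    "wheat": 16_000,
    "human": 3_000,
}

rice_mbp = GENOME_SIZE_MBP["rice"]

# Size of each genome relative to rice.
relative_to_rice = {
    name: size / rice_mbp for name, size in GENOME_SIZE_MBP.items()
}

for name, ratio in relative_to_rice.items():
    print(f"{name}: {ratio:g}x the rice genome")
```

On these figures the wheat genome is about 40 times the size of the rice genome, which helps explain why rice was the practical choice as the first cereal to sequence.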
While Syngenta researchers estimate the size of the rice genome to be 420 Mbp, the BGI group gives a substantially higher estimate of 466 Mbp. IRGSP's initial estimate was 400 Mbp and its more recent estimate - based on the recently published detailed physical and genetic map - is 403 Mbp. More interestingly, the two draft blueprints reveal that a rice plant contains more genes than a human being. While the number of human genes is estimated to be around 35 000, indica rice contains between 45 000 and 56 000; and the number of genes in japonica may be between 42 000 and 63 000. It is thought that rice has more genes than humans because plants rely on gene duplication for protein diversity. Protein diversity in humans, on the other hand, is believed to be achieved largely by a process of alternative splicing: a single gene can give rise to several proteins, because its transcript can be cut and spliced together in different ways, each yielding a product with a different sequence and function.
The genome sequence provides a handle for discovering all the rice genes and establishing their functionality. Gene identification and functional validation are what functional genomics is all about and where the future focus of research lies. Using the sequence data, it is possible to begin developing appropriate markers to breed rice varieties, isolate genes and study how they can be over-expressed or under-expressed. Furthermore, with genes identified for desirable traits and functions, it would also be possible to create transgenics. One way to determine the functions of genes is to create deletion mutants, i.e. rice varieties with specific genes deleted from their DNA. IRRI is expected to have over 40 000 deletion mutant varieties by the end of this year; discussion is underway with Indian scientists to develop appropriate mutant lines for the International Rice Functional Genomics Programme that IRRI has established and for which sequence data would be useful.
With knowledge of the sequence of specific genes, it will be possible to tap into the natural genetic variation of the rice species - of which IRRI has about 100 000 accessions. In India, the Central Rice Research Institute has 42 000 accessions in its germplasm collection. The rice seeds in a germplasm collection serve as a pool of natural variants whose advantageous traits have not been fully tapped owing to the lack of a genetic handle on them. Now, however, once the gene associated with a particular trait is known from sequence data, allelic variants of this gene can be examined from the germplasm collection for their relative usefulness, and the accurate gold standard sequence data being generated by the IRGSP will thus become invaluable.
Assigning functions to each of the approximately 50 000 predicted genes in the rice genome is vital, and it requires much wider participation of rice breeders, biotechnologists and computer scientists.
The Rice-Functional Genomics Workshop was organized in India on 20-21 May 2002, with the participation of distinguished national and IRRI scientists. After in-depth deliberations, the following recommendations emerged:
1. A National Consortium for Functional Genomics of Rice (NCFGR) must be established to carry out and coordinate functional genomics research of rice.
2. A database should be developed with the help of rice breeders, listing donor rice varieties with important traits of agronomic importance. Similarly, a database of all the segregating populations/lines available should be developed and made available on the Web.
3. Project proposals for rice functional genomics research are to be developed under five categories of identified traits, each with its own coordinator:
- Abiotic stress: salinity and drought.
- Biotic stress: bacterial leaf blight, blast resistance, sheath blight, yellow stem borer, brown planthopper and gall midge.
- Yield components: grain number, grain weight, plant type, male sterility-fertility restoration and photosynthetic efficiency.
- Micronutrients: Fe, Zn and vitamin A.
- Gene discovery, expression and bioinformatics.
 All genetic code is made
up of different sequences of four fundamental chemicals called