Imagine trying to understand a country and its culture without knowing its language. Only a comprehensive knowledge of the language would give a newcomer the tools to begin to explore and understand the country. Publication of the human genome sequence in February this year (see box) was a little like equipping scientists with the language of the human body.
Sequencing, genetics and medicine
A genome is made up essentially of four main types of molecules, or bases (adenine, thymine, guanine and cytosine), arranged in pairs in a double helical structure. The human genome contains about 3 billion base pairs, and their order carries the instructions to make a human being. Of the entire human genome sequence, only 1.1-1.4% codes for proteins.
Two sequences of the human genome were published almost simultaneously in February (see main text). Both are roughly 92-94% complete. They suggest that the human genome contains about 31 000 genes, far fewer than originally estimated. By comparison, plants have about 26 000 genes, worms 18 000, flies 13 000 and yeast 6000. One sequence was the work of the publicly funded International Human Genome Sequencing Consortium and was published in Nature (15 February 2001). The consortium has made its data freely available to the public on the Internet, releasing them daily. Its work was undertaken by about a thousand scientists in six countries, including one developing country, China.
The other sequence and its analysis were published by the US commercial company Celera Genomics in Science (16 February 2001). Access to Celera's sequence data is more restricted, and there has been much controversy and rivalry between the public and private ventures. The issues are complex, but one thing is clear: Celera's entry into large-scale sequencing spurred the public effort to complete its task earlier than it otherwise would have.
"Making the data publicly available," says Dr Virander Chauhan, director of the International Centre for Genetic Engineering and Biotechnology in New Delhi, India, "has levelled the playing field, so that for the first time a university in New Delhi can compete directly with a university such as Harvard in the States."
The Human Genome Project was conceived in 1985 and formally launched in 1990, but scientists had been trying to identify traits passed down through the generations since the beginning of the 20th century. Then, with the advent of molecular biology tools, individual genes were isolated and sequenced. In the mid-1980s, biologists, mainly in the USA, began to consider sequencing the whole genome, and sequencing began in the late 1980s. About a decade later, the project got under way in earnest. It moved away from earlier concerns about the function of genes and concentrated on the sequence itself.
To transform sequence data into diagnostic tests, vaccines and therapies, scientists still have important questions to answer. Although the location of most of the genes is now known, scientists need to know which gene makes which protein, in which cell and at what stage of life. Then they need to know each protein's specific tasks and how different proteins interact with one another. Equally importantly, researchers want to know how environmental factors influence gene expression.
Now that the human genome sequence is known, the focus is firmly back on gene function. This time, however, researchers will be learning and exploring with an entire genetic language, not just the few words interpreted from isolated observations.
The scientific community's reaction has been positive, but tempered by uncertainty over the time it will take for practical results to emerge. "Now," says Dr Virander Chauhan, director of the International Centre for Genetic Engineering and Biotechnology in New Delhi, India, "we can truly start to turn the genetic sequences into information important for medicine." But, cautions Dr Barry Bloom, dean of the Harvard School of Public Health in the USA, "there will be a long haul before the human genome is fully exploited -- even in the West. …