This issue...

  News in Brief

  View from the Inside

  Human Genome Sequencing

  Magnetic Moment

  Scintimammography

  People

  About

  Subscribe Free

Human Genome Sequencing: Nearing the End of the Beginning

by Sallie J. Ortiz

The long-awaited first analyses of working draft sequences of the human genome were published in special issues of Science (Feb. 16) and Nature (Feb. 15). Nature published the sequence generated by the publicly sponsored U.S. Human Genome Project (HGP), while Science published the draft sequence reported by the private company Celera Genomics.

Ari Patrinos, head of the DOE's Human Genome Program, led a series of meetings in 2000 between leaders of the public and private sectors of the human genome sequencing project that resulted in their historic agreement to announce the completion of a draft genome in June 2000, and a promise to later publish their analyses concurrently. They kept that promise in February 2001 in special issues of the Science and Nature journals.

Venter, Patrinos, and Collins at the White House
Craig Venter, Celera Genomics; Ari Patrinos, DOE's Human Genome Program and Office of Biological and Environmental Research; and Francis Collins, NIH's National Human Genome Research Institute.

This achievement provides scientists worldwide with a virtual road map to an estimated 95% of all genes. The HGP has made a commitment to filling the remaining gaps and resolving all ambiguities in the sequence by 2003. In spite of the few gaps, scientists are already getting a good sense of what the genome landscape looks like and the surprising stories it has to tell. The following are highlights of HGP's findings published in Nature:

  • The distribution of genes on mammalian chromosomes is striking. It turns out that our chromosomes have crowded urban centers with many genes in close proximity to one another and also vast expanses of unpopulated desert where only non-coding "junk" DNA can be found. This distribution of genes is in marked contrast to the genomes of many other organisms, such as the mustard weed, the worm, and the fly. Their genomes more closely resemble uniform, sprawling suburbs, with genes relatively evenly spaced throughout.


  • Though a definitive count of human genes must await further experimental and computational analysis, scientists now estimate that humans have some 30,000 to 35,000 genes in their genomes. This new estimate indicates that humans have only about twice as many genes as the worm or the fly. How can human complexity be explained by a genome with such a paucity of genes? It turns out that humans are very thrifty with their genes, able to do more with what they have than other species. Instead of producing only one protein per gene, the average human gene produces three different proteins.


  • The full set of proteins (the proteome) encoded by the human genome is more complex than those of invertebrates because humans and other vertebrates have rearranged old protein domains into a rich collection of new architectures. In other words, humans have for the most part achieved innovations by rearranging and expanding tried-and-true strategies from other species, rather than by developing novel strategies of their own.


  • Scientists have identified more than 200 genes in the human genome whose closest relatives are in bacteria. Analogous genes are not found in the worm, the fly, or yeast. This suggests that these genes were acquired in the more recent evolutionary past, perhaps after the emergence of vertebrates. Scientists didn't find any single bacterial source for the transferred genes, indicating that several independent gene transfers from different bacteria occurred.


  • Our junk DNA, characterized by long stretches of repeating sequences, represents a rich fossil record of clues to our evolutionary past. It is possible to date groups of so-called "repeats" to when in the evolutionary process they were "born" and to follow their fates in different regions of the genome and in different species. The HGP scientists used 3 million such repeating elements as dating tools. Based on such "DNA dating," scientists can build family trees of the repeats, showing exactly where they came from and when. These repeats have reshaped the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.


  • We have a greater percentage of junk DNA in our genomes—50 percent—than the mustard weed (11 percent), the worm (7 percent), or the fly (3 percent). Also, shockingly, there seems to have been a dramatic decrease in the activity of repeats in the human genome over the past 50 million years—as if the human species decided 50 million years ago to stop collecting junk. In contrast, there seems to be no such decline in repeats in rodents.


  • Ordinarily, repeat elements land in inhospitable regions of the genome, regions that are AT rich and GC poor. But mysteriously, one type of repeat, called "SINE elements," has found a way to take up residence in the GC-rich neighborhoods of the genome. Over the years, SINE elements have acquired a bad reputation among scientists for what looked like parasitic behavior. But this bad reputation is unjustified. We now see that SINE elements may be helpful symbionts that earn their keep in the genome.


  • By dating the 3 million repeat elements and examining the pattern of interspersed repeats on the Y chromosome, scientists estimated the relative mutation rates in the X and the Y chromosome and in the male and female germ lines. They found that the ratio of mutations in males versus females is 2:1. Scientists point to several possible reasons for the higher mutation rate in the male germ line, including the fact that there are a greater number of cell divisions involved in the formation of sperm than in the formation of eggs.


  • Scientists have created a catalogue of 1.4 million single-letter differences, or single nucleotide polymorphisms (SNPs), and specified their exact locations in the human genome. This SNP map, the world's largest publicly available catalogue of SNPs, promises to revolutionize both the mapping of diseases and the tracing of human history.
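
Two of the figures in the highlights above lend themselves to a quick back-of-the-envelope check. The Python sketch below is only an illustration and is not part of the HGP analysis: the function names and the short example sequence are made up for this purpose. It multiplies the estimated gene count (30,000 to 35,000) by the reported average of three proteins per gene, and it shows what "GC rich" means by computing the fraction of G and C bases in a DNA string.

```python
def estimated_protein_range(min_genes, max_genes, proteins_per_gene):
    """Return (low, high) protein-count estimates from a gene-count range.

    Hypothetical helper: it simply scales both ends of the range by the
    average number of proteins produced per gene.
    """
    return min_genes * proteins_per_gene, max_genes * proteins_per_gene


def gc_fraction(sequence):
    """Return the fraction of bases in `sequence` that are G or C.

    The comparison is case-insensitive. A region with a high value is
    "GC rich"; a region with a low value is "AT rich".
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


# Gene-count range and proteins-per-gene average as quoted in the article.
low, high = estimated_protein_range(30_000, 35_000, 3)
print(low, high)  # 90000 105000

# A made-up 10-base sequence, used only to demonstrate the calculation.
print(gc_fraction("ATGCGCGCAT"))  # 0.6
```

By this rough arithmetic, 30,000 to 35,000 genes at three proteins each would correspond to roughly 90,000 to 105,000 distinct proteins, which is one way to see how a modest gene count can still support considerable complexity.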

The sequence information from the consortium has been immediately and freely released to the world, with no restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as by commercial database companies, providing key information services to biotechnologists. Already, many tens of thousands of genes have been identified from the genome sequence, including more than 30 that play a direct role in human disease.

Speaking of the value of genome data and technologies, Patrinos said, "We are eager to offer a future to our children and grandchildren in which cancer will be only a constellation in the sky."

Francis Collins, head of the NIH genome program, said, "Researchers in a few years will have trouble imagining how we studied human biology without genome sequence in front of us."

More than $3 billion has been spent worldwide on the public-side Human Genome Project since its formal inception in 1990. Although 16 institutions participated in the HGP, most sequencing takes place at 5 locations: the DOE Joint Genome Institute, Washington University (St. Louis), Sanger Centre (U.K.), Baylor College of Medicine, and Whitehead Institute. Bioinformatics teams at the Ensembl database project and the University of California, Santa Cruz, generated an ordered view of the 400,000 sequenced DNA fragments in the working draft.

In July 2000, the Wellcome Trust (U.K.) announced a 5-year investment in Ensembl of more than $14 million (£8.8 million) for automatic annotation of human genome data, including identification of genes and other biologically important sequence features.

HGP Resources: Lowering Public, Private Costs

The HGP's early phase was characterized by efforts to generate the biological, instrumentational, and computational resources necessary for efficient production-scale DNA sequencing. Pilot studies on large-scale sequencing began in 1996, and successes led to a ramp-up in 1998.

In 1999, international HGP leaders set the accelerated goal of completing a rough draft of all 24 human chromosomes a year ahead of schedule. Resources pioneered in DOE-sponsored HGP projects (e.g., a new generation of automated capillary DNA sequencing machines and DNA fragments called BACs, or bacterial artificial chromosomes) have facilitated this increased pace. Researchers in both the public and private sectors use BACs to speed their sequencing procedures.

The extraordinary achievements of the HGP stand as a testimony to the successful collaborations among scientists intent on overcoming massive technological challenges to move toward the common goal of understanding life at its most basic level. The DNA sequences give scientists the foundations to begin this work.

The status of human genome research today is well represented by the words of Winston Churchill in 1942, who said, after 3 years of war, "Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."

What's Next?

The HGP plans to sequence the genomes of many other species, because comparing genomes across species will provide researchers key tools for understanding the essential elements that evolution has designated as important to survival. This information will in turn translate into practical knowledge toward developing better therapies in the future.

Comparative genomics will offer scientists insights into important regions in the sequence that perform regulatory functions. Also among the future plans for HGP scientists is the sequencing of other large genomes, such as those of primates. Scientists also plan to complete the catalogue of human variations in the population and identify the genes that predispose individuals to risk for common diseases.

The sequence will serve as a foundation for a broad range of functional genomic tools to help biologists to probe the function of the genes in a more systematic manner. Development of such post-genomic tools will be one of the major thrusts for biologists in the next decade.

The Department of Energy is poised to move forward in addressing some of these post-sequencing research challenges through a proposed new program called Genomes to Life.

Goals of the Genomes to Life program are to
(1) identify life's molecular machines, the multiprotein complexes that carry out the functions of living systems;
(2) characterize the gene-regulatory networks and processes that control life's molecular machines;
(3) characterize the functional repertoire of complex microbial communities in their natural environments; and
(4) develop computers and other computational capabilities needed to create models that describe the complexity of biological systems to enable prediction of their behavior and productive use of their functions.

There will be a pressing need for improved methods to analyze the abundance of information being generated. And genetics will become an increasingly important part of the medical mainstream. The pressure will grow to encourage educated use of genetic information and to set thoughtful limits on its use.


Related Web Links:

"On the Shoulders of Giants: Private Sector Leverages HGP Successes", Human Genome News

Human Genome Project and the Private Sector: A Working Partnership

"Sharing the Glory, Not the Credit," by Eliot Marshall, Science, Volume 291, Number 5507, 16 Feb 2001, pp. 1189-1193.

Webcast of the "First Analysis of Complete Human Genome Press Conference," February 12, 2001. (Click on Past Events, and then Conferences.)


Energy Science News
www.pnl.gov/energyscience/