ENCODE confirms our DNA isn’t mostly junk
September 5, 2012
Contrary to popular perception, most of our DNA isn’t junk. That’s the big news this week out of the ENCODE project.
The Human Genome Project gave us the genomic blueprint more than a decade ago. Now ENCODE gives us our first big overview of how the blueprint of nearly 3 billion bases is executed so that the human genome can do its complex job in making us functional beings. The project has revealed that more than 80 percent of the human genome sequence participates in at least one biochemical event in at least one cell type.
Forty years ago, the thinking was that noncoding DNA in our genome didn’t do much, and the term “junk DNA” has been tossed around ever since.
ENCODE, short for the Encyclopedia of DNA Elements, is an enormous project funded by the National Institutes of Health. It has involved 440 researchers in 32 labs around the world working for more than five years. The researchers have been mapping regions of transcription, transcription factor association, chromatin structure and histone modification. To date, they have looked at 147 cell types and produced 1,640 genomewide data sets.
And this week, across several journals, including the Journal of Biological Chemistry, about 30 research and review papers have appeared to present the data from ENCODE. The papers discuss how most of the human genome has at least one biochemical activity assigned to it.
Ninety-five percent of the genome is situated within 8 kilobases of a DNA–protein interaction, as seen by a bound transcription factor motif or DNase I footprint; 99 percent is positioned within 1.7 kb of at least one biochemical event. Furthermore, most of the genome sits close to a regulatory event. In a nutshell, regulation of gene expression is complex and involves most of the genome! Therefore, the so-called junk DNA actually has a purpose as gene-controlling switches, making sure the correct genes are turned on at the correct time in the correct place.
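To make the "within 8 kilobases" kind of statistic concrete, here is a minimal sketch of how such a coverage fraction could be computed on a toy genome. It is not ENCODE's actual pipeline: the function name `fraction_within`, its parameters and the example numbers are all made up for illustration, and each "event" is simplified to a single base position.

```python
def fraction_within(genome_length, sites, window):
    """Fraction of positions 0..genome_length-1 lying within `window`
    bases of at least one site. Illustrative toy, not ENCODE's method."""
    # Expand each site into a half-open interval [start, end) of nearby
    # positions, clipped to the ends of the genome, then sort by start.
    intervals = sorted(
        (max(0, s - window), min(genome_length, s + window + 1)) for s in sites
    )
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the previous merged run.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Hypothetical toy genome of 100 bases with events at positions 10 and 50.
print(fraction_within(100, [10, 50], 5))
```

Overlapping windows are merged before counting, so a base near two events is only counted once. That is why the fraction climbs quickly toward 100 percent as events become denser or the window widens.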
“We’ve come a long way,” Ewan Birney of the European Bioinformatics Institute in the U.K. said in a statement from the National Human Genome Research Institute. Birney is the lead analysis coordinator of the ENCODE data. He added, “We have learned an incredible amount by integrating the different types of data that ENCODE produced, which was done at a scale never before achieved in biology. This data integration was one of the keys to the success of the project.”
JBC has published six articles, shepherded by guest editor Peggy Farnham at the University of Southern California, as a thematic minireview series focused on the identification and analysis of some of the different types of regulatory regions studied by ENCODE (the reviews are available online). For example, two of the minireviews focus on the identification and characterization of enhancers. Nature has set up a website dedicated to the papers from ENCODE.
The ENCODE data are already becoming a fundamental resource for researchers to understand human biology and disease. For example, researchers are using them to study disease-associated variants that map to both protein-coding and noncoding regions of the genome, the latter formerly dismissed as junk DNA. The hope is to understand how the variants contribute to disease through gene regulation and expression.
As variants in the gene-controlling switches will differ between people, the data from ENCODE will help to figure out if people with particular variants will be, for example, at risk for a disease or responsive to a specific drug. “We believe the ENCODE project will have a profound effect on personalized medicine,” said Michael Snyder of Stanford University at a press conference.
Identifying regulatory regions also will help researchers understand why different cell types differ in their properties. Just think of a muscle cell that contracts and contrast it with a neuron that releases neurotransmitters. ENCODE will help these studies by identifying the cell type-specific control elements.
However, ENCODE’s work is still unfinished. For that reason, the project is about to be renewed for an additional four years. In the next phase, ENCODE will add more data about the types of functional elements and cell types studied. It also will develop new tools for more sophisticated analyses of data.