About “junk DNA”…
Theofilos (Hatzimihail) (1873 – 1934), Saint Anna of Chalkida, 1927
Breakthrough study overturns theory of ‘junk DNA’ in genome
The international Encode project has found that about a fifth of the human genome regulates the 2% that makes proteins
Alok Jha, science correspondent
guardian.co.uk, Wednesday 5 September 2012 20.03 BST
Long stretches of DNA previously dismissed as "junk" are in fact crucial to the way our genome works, an international team of scientists said on Wednesday.
It is the most significant shift in scientists’ understanding of the way our DNA operates since the sequencing of the human genome in 2000, when it was discovered that our bodies are built and controlled by far fewer genes than scientists had expected. Now the next generation of geneticists has updated that picture.
The results of the international Encode project will have a huge impact for geneticists trying to work out how genes operate. The findings will also provide new leads for scientists looking for treatments for conditions such as heart disease, diabetes and Crohn’s disease that have their roots partly in glitches in the DNA. Until now, the focus had largely been on looking for errors within genes themselves, but the Encode research will help guide the hunt for problem areas that lie elsewhere in our DNA sequence.
Dr Ewan Birney, of the European Bioinformatics Institute near Cambridge, one of the principal investigators in the Encode project, said: "In 2000, we published the draft human genome and, in 2003, we published the finished human genome and we always knew that was going to be a starting point. We always knew that protein-coding genes were not the whole story."
For years, the vast stretches of DNA between our 20,000 or so protein-coding genes – more than 98% of the genetic sequence inside each of our cells – were written off as "junk" DNA. Already falling out of favour in recent years, this concept will now, with Encode’s work, be consigned to the history books.
Encode is the largest single update to the data from the human genome since its final draft was published in 2003 and the first systematic attempt to work out what the DNA outside protein-coding genes does. The researchers found that it is far from useless: within these regions they have identified more than 10,000 new "genes" that code for components that control how the more familiar protein-coding genes work. Up to 18% of our DNA sequence is involved in regulating the less than 2% of the DNA that codes for proteins. In total, Encode scientists say, about 80% of the DNA sequence can be assigned some sort of biochemical function.
Scientists know that while most cells in our body contain our entire genetic code, not all of the protein-coding genes are active. A liver cell contains enzymes used to metabolise alcohol and other toxins, whereas hair cells make the protein keratin. Through some mechanism that regulates its genes, the hair cell knows it should make keratin rather than liver enzymes, and the liver cell knows it should make the liver enzymes and not the hair proteins.
"That control must have been somewhere in the genome, and we always knew that – for some individual genes – it was an element sometimes quite far away from the gene," said Birney. "But we didn’t have a genome-wide view to this. So we set about working out how we could discover those elements."
The results of the five-year Encode project are published on Wednesday across 30 papers in the journals Nature, Science, Genome Biology and Genome Research. The researchers have mapped 4m switches in what was once thought to be junk DNA, many of which will help them better understand a range of common human diseases, from diabetes to heart disease, that depend on the complex interaction of hundreds of genes and their associated regulatory elements.
"Regulatory elements are the things that turn genes on and off," says Professor Mike Snyder of Stanford University, who was a principal investigator in the Encode consortium. "Much of the difference between people is due to the differences in the efficiency of these regulatory elements. There are more variants, we think, in the regulatory elements than in the genes themselves."
Genes cannot function without these regulatory elements. If regulation goes wrong, malfunctioning genes can cause diseases including cancer, atherosclerosis, type 2 diabetes, psoriasis and Crohn’s disease. Errors in the regulation of a gene known as Sonic Hedgehog, for example, are thought to underlie some cases of human polydactyly in which individuals have extra toes or fingers.
Prof Anne Ferguson-Smith, of Cambridge University, said: "They also have important implications for the growth and development of embryos and foetuses during pregnancy. These are the kinds of elements that make your tissues and organs grow properly, at the right time and place, and containing the right kinds of cells."
Encode scientists found that 9% of human DNA is involved in coding for the regulatory switches, although Birney thinks the true figure may turn out to be about 20%. "One of the big surprises is that we see way more [regulatory] elements than I was expecting," he said.
The project has identified about 10,000 stretches of DNA, which the Encode scientists have called non-coding genes, that do not make proteins but, instead, a type of RNA – the single-stranded equivalent of DNA. There are many types of RNA molecule in cells, each with a specific role such as carrying messages or transcribing the DNA code in the first step of making a protein. However, the 10,000 non-coding genes carry instructions to build the large and small RNA molecules required to regulate the actions of the 20,000 protein-coding genes.
The results have already shed light on previous, massive studies of genetic data. In recent years, scientists have compared the genetic code of thousands of people with a specific disease (such as diabetes, bipolar disorder, Crohn’s disease or heart disease) with the DNA code of thousands of healthy people, in an attempt to locate mutations that could account for some of the risk of developing that disease. These so-called genome-wide association studies (GWAS) have identified scores of locations in the DNA that seem to raise a person’s risk of developing a disease – but the vast majority are nowhere near protein-coding genes. That makes sense if regions previously thought of as "junk" are actually vital for controlling the expression of protein-encoding genes.
Indeed, there is a big overlap between the locations identified by GWAS and the regulation switches identified in Encode. "When I first saw that result I thought it was too good to be true – we’ve done the analysis five different ways now and it still holds up," says Birney.
Understanding some of these regulatory elements could help explain some of the environmental triggers for different diseases.
Crohn’s disease, for example, is a long-term condition that causes inflammation of the lining of the digestive system and affects up to 60,000 people in the UK. Scientists cannot fully explain why some people suffer from it and others do not, even when they all have the genetic mutations associated with an elevated risk. One hypothesis is that the disease could be triggered by a bacterial infection. "Maybe there’s a place in the middle of nowhere [in the DNA], not close to a protein-coding gene, that if you have one variant you’re more sensitive to this bacterium, if you have another variant you’re less sensitive," says Birney. "So you get Crohn’s disease probably because you have the more sensitive type and that particular bacterial infection occurred at a time when you were vulnerable."
The Encode consortium’s 442 researchers, situated in 32 institutes around the world, used 300 years of computer time and five years in the lab to get their results. They examined a total of 147 types of tissue – including cancer cells, liver extracts, endothelial cells from umbilical cords, and stem cells derived from embryos – and subjected them to around a hundred different experiments, recording which parts of the DNA code were activated in which cells at which times.
The current and future phases of Encode will prove useful not only for scientists, but also for those who want a more personalised approach to medicine in the decades to come. "We’re in an era where people are starting to get their genomes sequenced – with Encode data we could start mapping regulatory information," says Snyder.
This means that the individual differences in people’s diseases can be more effectively targeted for treatment. "Diseases have been defined by the medical profession observing symptoms," says Dr Tim Hubbard of the Wellcome Trust Sanger Institute in Cambridge. "[But] we know, for example, that breast cancer is not one disease but there’s multiple types of breast cancer with all sorts of different mechanistic processes going wrong.
"A given drug only works in about a third of the people you give it to, but you don’t know which third. A lot of that is related to genomics, so if you knew the relationship between a person’s genome and which drugs work for them and which ones they shouldn’t take because it gives them side effects, that would improve medicine."
Understanding exactly how each type of cell in the body works – in other words which genes are switched on or off at different stages of its function – will also be useful in future stem cell therapies. If doctors want to grow replacement liver tissue, for example, they will be able to check that it is safe by comparing the DNA functions of their manufactured cells with data from normal liver cells.
Birney says that the decade since the publication of the first draft of the human genome has shown that genetics is much more complex than anyone could have predicted. "We felt that maybe life was easier beforehand and more comfortable because we were just more ignorant. The major thing that’s happening is that we’re losing some of our ignorance and, indeed, it’s very complicated," he says. "You’ve got to remember that these genomes make one of the most complicated things we know, ourselves. The idea that the recipe book would be easy to understand is kind of hubris. I still think we’re at the start of this journey, we’re still in the warm-up, the first couple of miles of this marathon."
Glossary
DNA: Deoxyribonucleic acid is the chemical that stores genetic information in our cells. Shaped like a double helix, DNA passes down from one generation to the next.
RNA: Ribonucleic acid is a type of molecule used in making proteins in the body.
Genome: The complete genetic makeup of an organism, which contains all the biological information needed to build it and keep it alive.
Gene: A stretch of DNA that tells a cell how to make specific proteins or RNA molecules.
Enzyme: A molecule, usually a protein, that speeds up a chemical reaction inside a living organism.
Stem cell: A biological master cell that can multiply and become many different types of tissue. Stem cells can also self-replicate to make more stem cells.
Flora Malein
Bits of Mystery DNA, Far From ‘Junk,’ Play Crucial Role
By GINA KOLATA, The New York Times
Rethinking ‘Junk’ DNA
A large group of scientists has found that so-called junk DNA, which makes up most of the human genome, does much more than previously thought.
Among the many mysteries of human biology is why complex diseases like diabetes, high blood pressure and psychiatric disorders are so difficult to predict and, often, to treat. An equally perplexing puzzle is why one individual gets a disease like cancer or depression, while an identical twin remains perfectly healthy.
Now scientists have discovered a vital clue to unraveling these riddles. The human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave. The discovery, considered a major medical and scientific breakthrough, has enormous implications for human health because many complex diseases appear to be caused by tiny changes in hundreds of gene switches.
The findings, which are the fruit of an immense federal project involving 440 scientists from 32 laboratories around the world, will have immediate applications for understanding how alterations in the non-gene parts of DNA contribute to human diseases, which may in turn lead to new drugs. They can also help explain how the environment can affect disease risk. In the case of identical twins, small changes in environmental exposure can slightly alter gene switches, with the result that one twin gets a disease and the other does not.
As scientists delved into the “junk” — parts of the DNA that are not actual genes containing instructions for proteins — they discovered a complex system that controls genes. At least 80 percent of this DNA is active and needed. The result of the work is an annotated road map of much of this DNA, noting what it is doing and how. It includes the system of switches that, acting like dimmer switches for lights, control which genes are used in a cell and when they are used, and determine, for instance, whether a cell becomes a liver cell or a neuron.
“It’s Google Maps,” said Eric Lander, president of the Broad Institute, a joint research endeavor of Harvard and the Massachusetts Institute of Technology. In contrast, the project’s predecessor, the Human Genome Project, which determined the entire sequence of human DNA, “was like getting a picture of Earth from space,” he said. “It doesn’t tell you where the roads are, it doesn’t tell you what traffic is like at what time of the day, it doesn’t tell you where the good restaurants are, or the hospitals or the cities or the rivers.”
The new result “is a stunning resource,” said Dr. Lander, who was not involved in the research that produced it but was a leader in the Human Genome Project. “My head explodes at the amount of data.”
The discoveries were published on Wednesday in six papers in the journal Nature and in 24 papers in Genome Research and Genome Biology. In addition, The Journal of Biological Chemistry is publishing six review articles, and Science is publishing yet another article.
Human DNA is “a lot more active than we expected, and there are a lot more things happening than we expected,” said Ewan Birney of the European Molecular Biology Laboratory-European Bioinformatics Institute, a lead researcher on the project.
In one of the Nature papers, researchers link the gene switches to a range of human diseases — multiple sclerosis, lupus, rheumatoid arthritis, Crohn’s disease, celiac disease — and even to traits like height. In large studies over the past decade, scientists found that minor changes in human DNA sequences increase the risk that a person will get those diseases. But those changes were in the junk, now often referred to as the dark matter — they were not changes in genes — and their significance was not clear. The new analysis reveals that a great many of those changes alter gene switches and are highly significant.
“Most of the changes that affect disease don’t lie in the genes themselves; they lie in the switches,” said Michael Snyder, a Stanford University researcher for the project, called Encode, for Encyclopedia of DNA Elements.
And that, said Dr. Bradley Bernstein, an Encode researcher at Massachusetts General Hospital, “is a really big deal.” He added, “I don’t think anyone predicted that would be the case.”
The discoveries also can reveal which genetic changes are important in cancer, and why. As they began determining the DNA sequences of cancer cells, researchers realized that most of the thousands of DNA changes in cancer cells were not in genes; they were in the dark matter. The challenge is to figure out which of those changes are driving the cancer’s growth.
“These papers are very significant,” said Dr. Mark A. Rubin, a prostate cancer genomics researcher at Weill Cornell Medical College. Dr. Rubin, who was not part of the Encode project, added, “They will definitely have an impact on our medical research on cancer.”
In prostate cancer, for example, his group found mutations in important genes that are not readily attacked by drugs. But Encode, by showing which regions of the dark matter control those genes, gives another way to attack them: target those controlling switches.
Dr. Rubin, who also used the Google Maps analogy, explained: “Now you can follow the roads and see the traffic circulation. That’s exactly the same way we will use these data in cancer research.” Encode provides a road map with traffic patterns for alternate ways to go after cancer genes, he said.
Dr. Bernstein said, “This is a resource, like the human genome, that will drive science forward.”
The system, though, is stunningly complex, with many redundancies. Just the idea of so many switches was almost incomprehensible, Dr. Bernstein said.
There also is a sort of DNA wiring system that is almost inconceivably intricate.
“It is like opening a wiring closet and seeing a hairball of wires,” said Mark Gerstein, an Encode researcher from Yale. “We tried to unravel this hairball and make it interpretable.”
There is another sort of hairball as well: the complex three-dimensional structure of DNA. Human DNA is such a long strand — about 10 feet of DNA stuffed into a microscopic nucleus of a cell — that it fits only because it is tightly wound and coiled around itself. When they looked at the three-dimensional structure — the hairball — Encode researchers discovered that small segments of dark-matter DNA are often quite close to genes they control. In the past, when they analyzed only the uncoiled length of DNA, those controlling regions appeared to be far from the genes they affect.
The project began in 2003, as researchers began to appreciate how little they knew about human DNA. In recent years, some began to find switches in the 99 percent of human DNA that is not genes, but they could not fully characterize or explain what a vast majority of it was doing.
The thought before the start of the project, said Thomas Gingeras, an Encode researcher from Cold Spring Harbor Laboratory, was that only 5 to 10 percent of the DNA in a human being was actually being used.
The big surprise was not only that almost all of the DNA is used but also that a large proportion of it is gene switches. Before Encode, said Dr. John Stamatoyannopoulos, a University of Washington scientist who was part of the project, “if you had said half of the genome and probably more has instructions for turning genes on and off, I don’t think people would have believed you.”
By the time the National Human Genome Research Institute, part of the National Institutes of Health, embarked on Encode, major advances in DNA sequencing and computational biology had made it conceivable to try to understand the dark matter of human DNA. Even so, the analysis was daunting — the researchers generated 15 trillion bytes of raw data. Analyzing the data required the equivalent of more than 300 years of computer time.
Just organizing the researchers and coordinating the work was a huge undertaking. Dr. Gerstein, one of the project’s leaders, has produced a diagram of the authors with their connections to one another. It looks nearly as complicated as the wiring diagram for the human DNA switches. Now that part of the work is done, and the hundreds of authors have written their papers.
“There is literally a flotilla of papers,” Dr. Gerstein said. But, he added, more work has yet to be done — there are still parts of the genome that have not been figured out.
That, though, is for the next stage of Encode.
Kornilios Grammenos (1959– ), Untitled, 1984