A group of researchers at The Scripps Research Institute and the Scripps Translational Science Institute has published a paper reviewing new strategies for identifying collections of rare genetic variants that could reveal whether people are predisposed to common conditions such as diabetes and cancer.
In our modern genetic age, the complete DNA sequences, or “genomes,” of humans and thousands of other animals, plants, and microbial life forms have been sequenced and made publicly available to scientists worldwide. One hope now that these data are available is that scientists will be able to find genetic markers of disease: particular stretches of DNA that would identify someone as being at risk for a particular disease.
Knowing that a person has such a genetic predisposition could be a powerful tool for preventive medicine. Depending on the disease, doctors might prescribe specific drugs or lifestyle changes, such as diet or exercise, early on to prevent the disease or significantly lessen its impact later in life.
Finding these genetic markers has proven difficult, however. Although the human genome sequence has been available to researchers for years, scientists have identified the genetic determinants of only about 5 to 10 percent of the heritable component of most common human diseases.
“There’s a long way to go,” said Nicholas J. Schork, Ph.D., a professor at Scripps Research and director of biostatistics and bioinformatics at the Scripps Translational Science Institute.
In the November 2010 issue of Nature Reviews Genetics, Schork and his colleagues outline new statistical strategies that may help to close the gap in the coming years.
Part of the problem, Schork says, is that most studies to date have focused on identifying common genetic markers of disease: DNA variants that are linked to a disease because they turn up more often among large groups of people who have it.
Such investigations, typically referred to as “genome-wide association studies,” use statistical algorithms to sift through DNA samples from many people and pull out common variants that show signs of association with a condition. While powerful, these methods may not shed light on many diseases, says Schork, because not all diseases have such definitive DNA signatures. Many of the most common diseases are more complex, associated with multiple genes and multiple environmental factors.
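To make the idea concrete, here is a greatly simplified sketch of the single-variant testing at the heart of a genome-wide association study: each variant is tested on its own by comparing how often cases (people with the disease) and controls (people without it) carry it. This is an illustrative assumption, not the method of any particular study; real studies test millions of variants, adjust for factors such as ancestry, and use very stringent significance thresholds. The function names and toy data are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    a = cases carrying the variant,    b = cases without it
    c = controls carrying the variant, d = controls without it
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin means there is nothing to compare
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def single_variant_scan(cases, controls):
    """Test every variant column separately, one 2x2 table per variant.

    cases, controls: lists of per-person 0/1 lists (1 = carries that variant).
    Returns a list of (statistic, p_value) pairs, one per variant.
    """
    results = []
    for j in range(len(cases[0])):
        a = sum(person[j] for person in cases)
        c = sum(person[j] for person in controls)
        results.append(chi2_2x2(a, len(cases) - a, c, len(controls) - c))
    return results


if __name__ == "__main__":
    # Hypothetical toy data: 10 cases, 10 controls, two variants (columns).
    # Variant 0 is common and enriched in cases; variant 1 is rare.
    cases = [[1, 0]] * 7 + [[0, 0]] * 2 + [[0, 1]]
    controls = [[1, 0]] * 2 + [[0, 0]] * 8
    for j, (stat, p) in enumerate(single_variant_scan(cases, controls)):
        print(f"variant {j}: chi2={stat:.2f}, p={p:.3f}")
```

In this toy example the common variant, carried by 7 of 10 cases but only 2 of 10 controls, falls below p = 0.05, while the rare variant seen in a single case is nowhere near significant: a test that examines one variant at a time has little power for variants this rare.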
According to Schork, the key to identifying the genetic components of these complex diseases is not to search for single common genetic signatures that many people share, but rather to identify whole collections of rare genetic signatures, any one of which may indicate a predisposition toward a disease.
The situation is analogous to asking how someone could get to Times Square in Manhattan. There is no single answer, because people can set out from many places (New Jersey, Brooklyn, Wall Street, or the Bronx) and travel by many means (plane, bus, train, taxi, ferry, bridge, tunnel, subway, or sidewalk).
Yet regardless of where they start or how they get there, many people wind up at exactly the same spot, and Schork says the same is true of many human diseases. Rather than one single genetic marker, a disease may involve many different markers spread across any number of genes, so that even people who share the same disease may carry different predisposing variants.
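The idea of pooling many rare variants can be illustrated with a simple collapsing (or "burden") test, one family of methods used in rare-variant association studies. This is a minimal sketch under stated assumptions, not the specific strategy proposed by Schork and colleagues: each person is scored as a carrier if they carry any variant in a chosen set (for example, the rare variants within one gene), and carrier counts in cases and controls are then compared with a single 2x2 test. The function names and toy data are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    a/b = carriers/non-carriers among cases, c/d = same among controls.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin means there is nothing to compare
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def collapsing_test(cases, controls, variant_set):
    """Collapse a set of variants into one carrier indicator per person.

    A person counts as a carrier if they carry ANY variant in variant_set,
    reflecting the idea that different rare variants can lead to the same
    disease. Passing a one-element set reduces this to a single-variant test.
    """
    def carriers(group):
        return sum(1 for person in group if any(person[j] for j in variant_set))

    a = carriers(cases)
    c = carriers(controls)
    return chi2_2x2(a, len(cases) - a, c, len(controls) - c)


if __name__ == "__main__":
    # Hypothetical toy data: 20 cases, 20 controls, 5 rare variants (columns).
    n_variants = 5
    cases = [[0] * n_variants for _ in range(20)]
    for k in range(6):  # six cases, each carrying one rare variant
        cases[k][k % n_variants] = 1
    controls = [[0] * n_variants for _ in range(20)]
    controls[0][2] = 1  # one control also carries variant 2
    for j in range(n_variants):
        _, p = collapsing_test(cases, controls, {j})
        print(f"variant {j} alone: p={p:.3f}")
    _, p = collapsing_test(cases, controls, set(range(n_variants)))
    print(f"all five collapsed: p={p:.3f}")
```

In this toy example no variant on its own reaches p < 0.05, because each is carried by at most two people, but the collapsed carrier indicator does: different routes, same destination. With counts this small, a real analysis would use Fisher's exact test or a permutation procedure rather than the chi-square approximation, which is used here only to keep the sketch short and dependency-free.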
Finding these rare signatures requires a great deal more scientific sleuthing, Schork said, and in their Nature Reviews Genetics article he and his colleagues suggest a new approach for uncovering these many possible combinations.
This approach will require collaboration between mathematicians and computer scientists, who have the skills needed to tease out these elusive genetic markers, and biologists, who can shed light on what the underlying genes do.
“Mathematics, statistics, and fancy computers alone won’t do it,” Schork said. “A much more integrative approach has to occur in order to make sense of DNA sequence data.”