Twenty years ago this week, the Human Genome Project (HGP) was declared complete, marking the end of a 13-year, nearly $3-billion endeavor. Hailed by many as one of the greatest achievements in modern science and medicine, the project has still had its share of critics, many of whom bemoan its failure to pinpoint the causes of complex (i.e., virtually all) diseases and to inaugurate an era of personalized therapies. But in looking back, these pie-in-the-sky predictions about the HGP’s potential immediate benefits reveal just how little we truly knew about our own genetics and biology before this project began – and how much we gained as a result of its completion. While it may not have given us all of the answers we sought, it did teach us that we’d been asking the wrong questions, and it provided the necessary foundation for investigating the right ones.
Discussion of the full impact of this immense project could fill a library, but for now, I’d like to highlight just a few macro points that, in retrospect, limited the clinical benefit we could derive from the Human Genome Project itself but have shaped progress in genetics over the last 20 years and continue to pave our way into the future.
DNA is far more than genes
The “central dogma” of biology states that genes “work” by being expressed – that is, they are transcribed into RNA, which in turn is translated into proteins, which then carry out countless cellular functions. Prior to the HGP, it had been thought that the complexity of an organism would therefore correlate positively with its number of genes: more genes = more proteins = more functions.
Given this logic, one of the most shocking results of the HGP was that the human genome contained far fewer genes than previously thought – only around 20,000, or roughly the same number as are found in a sea sponge (or ~one-third of the number found in soybeans). Further, it was discovered that this “functional” DNA accounted for only a tiny minority of the total human genome. Approximately 98% of our DNA does not code for proteins at all, and scientists were at a loss for how to interpret this genomic “dark matter.” The research also revealed that our genetic makeup is remarkably similar to that of other species. For instance, mainstream media has popularized the idea that humans are 50% genetically identical to bananas, while the fact that we share 99% of our DNA with chimpanzees illustrates just how little of our genetic material defines all of our many uniquely human qualities. Together, these surprises signaled that our understanding of how genetics influences our biology was woefully simplistic.
An open book, but we’re still just learning to read
The Human Genome Project mapped out the sequence of letters that make up the human genetic code, but the task of deriving meaning from that sequence was not as trivial as many had hoped. The relatively small number of genes implied (correctly) that they – and their eventual protein products – could be modified or regulated in myriad ways which would alter their expression and function. To some, this meant that the project was a failure and that research ought to shift away from genetics and toward environmental influences on disease. One particularly pessimistic journalist went so far as to call the HGP a “map to nowhere” and expressed the belief that deciphering such a complex system would never be possible.
This view is astoundingly short-sighted, as research over the last 20 years has shown. In addition to facilitating further study and elucidation of the functions of genes themselves, the HGP led to the 2003 launch of the Encyclopedia of DNA Elements (ENCODE) Project, aimed at determining the function of all genomic material, whether protein-coding or not. ENCODE has since reported that over 80% of the genome demonstrates some form of functionality, with much of the “dark matter” DNA involved in regulation of gene expression in various cell types – a critical link between the genetic code and biological relevance. However, the task of deciphering which genes are regulated by which non-coding elements and under which circumstances is ongoing.
The need for greater diversity and inclusion
Another undertaking that arose in the wake of the HGP was the 1000 Genomes Project, an international collaboration to catalog common genetic variations across humanity. This project highlights another important limitation of the HGP: a lack of diversity.
In order to use genetic information to gain insights into personal health risks, we must learn how genetic sequences vary in certain locations across individuals, and how each of those different variations correlates with disease. (For example, I’ve frequently discussed how the APOE4 variant of the APOE gene is associated with elevated risk of Alzheimer’s disease.) But we can only study risk for the variants we know exist, and we only know they exist if they’ve been documented before.
Analysis has revealed that around 70% of the original HGP genome derived from a single individual, providing a reference for comparison but virtually no information on variation. Since then, the cost of human sequencing has plummeted, facilitating expansion of sampling for the purpose of identifying and studying variants. But this process has been heavily biased in its representation: a 2019 study found that around 80% of all participants in genetic studies have been of European descent, a group which represents around 16% of the world’s population. This disparity is a problem for two reasons.
First, it means that underrepresented groups are far less likely to benefit from genetic risk assessments, as they are more likely to possess variants for which we do not have enough data on possible associations with disease. This in turn is likely to heighten racial and geographical inequalities in quality of medical care.
Second, it deprives us of a source of knowledge which might benefit all populations. Certain variants are only present in specific populations and are therefore more likely to be discovered by sampling from a broad spectrum of humanity. But the insights these variants provide on health may be applicable to many. An excellent example of this scenario is the identification of a rare variant, present primarily in African Americans, that results in loss of function of the PCSK9 protein. As I explained in a previous newsletter, individuals who carry this variant have low cholesterol levels and are largely protected from atherosclerotic cardiovascular disease – a discovery which inspired the development of a class of lipid-lowering PCSK9 inhibitor drugs, which have proven highly effective in reducing cardiovascular risk across broad populations.
A giant leap forward, but the journey is long
This discussion barely scratches the surface of the implications and impact of the Human Genome Project, let alone of the future of genetics and its applications to health and medicine. (I’d like to explore this latter topic in a bit more detail in a future “Ask Me Anything” episode of the podcast.) Still, in honor of the 20th anniversary of the project’s completion, I hope I’ve provided a small taste of the many ways in which this achievement has revolutionized our understanding of our own biology and – far from being a “map to nowhere” – has illuminated the path forward.
To call it a failure because it did not instantly answer all questions regarding the genetic bases of diseases would be a bit like saying the entire history of science has been a failure because we’ve never determined how to turn iron into gold. The project’s completion was accurately described at the time as “the end of the beginning,” and over the last twenty years, we’ve made steady progress in tackling the complexities of gene expression, function, and variation. As we march through this new phase of the journey toward understanding how our DNA affects our health, new questions, mysteries, and setbacks will certainly arise. But at least we’re on our way.