An expansive new study offers clear evidence that sequencing the genomes of diverse populations can yield fresh insights into how our DNA shapes our health.
A team of researchers involved in an ongoing project spearheaded by the U.S. Department of Veterans Affairs analyzed data from nearly 636,000 veterans to look for genetic variants associated with more than 2,000 traits, such as height, blood glucose levels, and whether subjects had certain cancers. Scientists found around 26,000 associations between variants and traits, most of which were detectable regardless of participants’ ancestry.
But nearly 3,500 associations were only apparent when people with non-European ancestry were included in the analysis. The research team also found that including non-European populations increased their ability to pinpoint which genetic variants actually cause someone to have increased risk for a trait.
“On one hand, we show empirically that genetics is mostly the same. And on the other hand, we show that you learn a lot when you include individuals of non-European populations, and you get a lot of population-specific knowledge,” said Scott Damrauer, one of the study’s senior authors and a vascular surgeon affiliated with the Corporal Michael J. Crescenz VA Medical Center and the University of Pennsylvania in Philadelphia. “The only way to do genetics equitably is to have participants of diverse populations.”
The findings, published in the journal Science on Thursday, come out of the VA Million Veteran Program, a national research project launched in 2011 to examine how genes, lifestyle, and other factors influence the health of veterans — with the hope that its discoveries might apply to the general population, too. Data from the effort have already been used in more than 350 scientific publications, including studies of cancer, diabetes, and post-traumatic stress disorder.
In the new study, researchers looked for associations between genes and traits across a range of populations. To do so, they compared participants’ genetics to sequences in the 1000 Genomes Project, a global reference for human genetic variation, and categorized subjects into one of four broad groups: African, East Asian, European, and Admixed American, a group that includes several Hispanic populations. These categories don’t capture the full variation of human genetics, Damrauer acknowledged, but gave researchers some way to compare populations based on genetic similarity rather than self-reported race or ethnicity.
The researchers then ran more than 4,000 genome-wide association studies (GWAS) to identify variants linked to specific health traits in these four populations. Doing this analysis for 44 million DNA variants would have taken a computer about 250 years with traditional methods, but the Department of Energy offered both supercomputers and expertise that allowed the team to crunch the data in weeks.
Just 8% of participants were female, and the average age of study subjects was 62, roughly in line with the demographics of the current veteran population. About 29% of participants in this study had non-European ancestry; that rate has on average been 13% in other genome-wide association studies. That’s one reason Anurag Verma, the study’s first author, had expected to see more differences in variant-trait relationships across populations than the team ultimately found.
“That was surprising,” said Verma, a VA staff scientist and assistant professor at the University of Pennsylvania who studies the genetics of complex traits. “But when you think from the biology point of view, and when you actually think of the pathophysiology of diseases, that makes sense.”
Yet the research team also saw clear evidence of the value of diversity in genetic research. In some cases, they were able to see links between variants and traits in one population that weren’t visible in another, often because a variant or trait was too rare in a population for scientists to spot a clear association. For instance, researchers found a variant associated with an increased risk of prostate cancer, and another linked to keloid scarring, an injury response that causes skin to grow back as a bumpy, raised scar. Both findings were driven by the inclusion of individuals with African ancestry.
In total, the authors found nearly 3,500 variant-trait associations they would have missed had they only included individuals whose DNA resembled European reference sequences.
“This collaborative study could speed up scientific discoveries for hundreds of health conditions and diseases, helping to resolve a lack of inclusion in health research worldwide,” said Carolyn Clancy, the VA’s assistant under secretary for Health for Discovery, Education and Affiliate Networks, in a statement to STAT.
The scientists also leveraged diversity to unravel which variants mattered most. Genetic association studies often find that a whole set of variants that sit next to one another in the genome are associated with a certain trait. But many of these variants have no influence on the trait; they simply happen to be next to a variant that does. Having a more diverse study population can help scientists pinpoint the key variants, and the researchers identified 6,318 variants that likely play a causal role in influencing 613 traits.
“This work sets a new bar for multi-ancestry genomic analysis and represents a major step forward towards diverse representation in the genomics literature,” said Euan Ashley, a Stanford cardiologist who was not involved in the study and whose team is using whole-genome sequencing to rapidly diagnose patients.
“A more diverse cohort affords much greater power of discovery — as evidenced by many previously undiscovered associations being uncovered in this study,” he wrote in an email to STAT. “However, this cohort also allows that effect to be quantified, perhaps for the first time, and certainly for the first time at this scale and resolution.”
As of last year, a million veterans had volunteered to participate in the Million Veteran Program, and Damrauer said that researchers will continue to study variant-trait relationships as they get access to more data. The effort is one of a growing number of large-scale studies seeking to collect and analyze reams of genetic and health data from volunteers, with parallel efforts including the U.K. Biobank and the National Institutes of Health’s “All of Us” project.
Findings from the study could one day be used to create risk scores that more accurately sum up the incremental contributions of genetic variants to help doctors assess a patient’s risk of a particular disease. Earlier this year, researchers used All of Us data to show that these polygenic risk scores currently aren’t very accurate for non-European populations, but Damrauer is hopeful that using more diverse populations to find disease-associated variants could change that.
The most impactful applications of the recent study may not come from any of its authors. That’s because the research team plans to make the main findings of its genetic association studies publicly available on dbGaP, an NIH repository of human genomic data, so that other researchers can learn from and build on their results.
“To me, among the most interesting next steps are going to be things that other people think of using the data,” Damrauer said. “I’m hopeful that this can really move the overall endeavor of understanding human genetics forward.”