New insights into human disease that could propel drug development

The largest open-access proteogenomic dataset to date lays the groundwork for the future discovery of novel drug targets and biomarkers

29 May 2024

A first-of-its-kind population-scale proteogenomic database has revealed new links between genetics, health, and disease. © Nikolai Lenets @ 123rf.com

Scientists from the UK Biobank Pharma Proteomics Project have uncovered a slew of new links between genetics, protein expression and human disease in two landmark studies published in Nature this month.¹,² The research, which analyzed proteomic data from over 54,000 UK Biobank participants, demonstrates how population-scale proteogenomic studies can advance our understanding of disease biology and offers insights that could aid the discovery of new drug targets and biomarkers for a wide range of health conditions. Data generated by the project is set to become accessible to the wider scientific community in the coming weeks.

Proteomics bridges the gap between the human genome and disease

Almost all human diseases are influenced to some degree by genetic variation, but the links between DNA variants and disease can often be elusive. While genome-wide association studies (GWAS) have been successful in identifying many genomic variants associated with various diseases, they often identify variants without clear causal genes mediating their impact. Proteomic studies, which provide a complementary layer of information that reflects the functional consequences of genetic variants, can help bridge this gap.

Proteins serve critical roles in biological processes and their expression levels can provide a unique snapshot of the current state of health or disease. Therefore, by identifying genomic variants that are associated with circulating protein levels – known as protein quantitative trait loci (pQTL) mapping – scientists can infer relationships between genomic variants and proteins that are indicative of disease. Integrating genomics and proteomics in this way can thus provide a more comprehensive understanding of underlying disease mechanisms, which can in turn guide the development of precision medicine and targeted treatments.
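To make the idea of pQTL mapping concrete, the following is a minimal sketch of a single variant-protein association test on simulated data. It is not the UKB-PPP analysis pipeline: the function names, the allele frequency, the effect size, and the simple unadjusted linear regression are all illustrative assumptions, and real studies adjust for covariates such as age, sex, and ancestry and correct for testing many variants and proteins.

```python
import random
import statistics


def simulate_cohort(n, allele_freq, effect, seed=0):
    """Simulate genotype dosages (0, 1 or 2 copies of an allele) and protein levels.

    Hypothetical toy model: protein level = effect * dosage + Gaussian noise.
    """
    rng = random.Random(seed)
    dosages, levels = [], []
    for _ in range(n):
        # Each person carries two alleles, each drawn independently.
        dosage = sum(rng.random() < allele_freq for _ in range(2))
        dosages.append(dosage)
        levels.append(effect * dosage + rng.gauss(0.0, 1.0))
    return dosages, levels


def pqtl_association(dosages, levels):
    """Regress protein level on genotype dosage by ordinary least squares.

    Returns (beta, standard_error, t_statistic) for the per-allele effect.
    """
    n = len(dosages)
    mean_g = statistics.fmean(dosages)
    mean_p = statistics.fmean(levels)
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant shows no variation in this sample")
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, levels))
    beta = sxy / sxx
    intercept = mean_p - beta * mean_g
    # Residual variance gives the standard error of the slope.
    rss = sum((p - intercept - beta * g) ** 2 for g, p in zip(dosages, levels))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, se, beta / se
```

Calling `pqtl_association(*simulate_cohort(5000, 0.3, 0.5))` returns an estimated per-allele effect close to the simulated value of 0.5 with a large t-statistic, whereas a simulated effect of 0 yields an estimate near zero. A pQTL study repeats this kind of test across millions of variants and thousands of proteins.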

“The scientific community has invested substantially in genomics for the advancement of precision medicine,” said Dr. Chris Whelan, a director in Johnson & Johnson’s data science & digital health division, and leader of the UK Biobank Pharma Proteomics Project.³ “However, to identify the right drug for the right patient at the right time, we must move beyond genomics alone.”

Initial findings from the UK Biobank Pharma Proteomics Project

To address this need, the UK Biobank Pharma Proteomics Project (UKB-PPP), a consortium of thirteen leading pharmaceutical companies, was set up in 2020 to characterize the proteomic profiles of blood plasma samples from the UK Biobank – building upon its rich dataset of genetic and health data. Using Olink® Proximity Extension Assay (PEA) technology, in particular the Olink® Explore 3072 platform, the consortium measured the abundance of nearly 3,000 proteins in plasma samples from over 54,000 UK Biobank participants.

A flagship article by Sun et al.¹ provides the first detailed summary of the data obtained from the project, accompanied by downstream GWAS-based proteogenomic analysis and pQTL mapping. The authors identified over 14,000 genetic associations with protein expression levels in the blood, 81% of which were previously unknown.

The UKB-PPP consortium selected Olink’s PEA™ technology for high throughput protein biomarker discovery as its preferred proteomics platform and employed the Olink® Explore 3072 to measure almost 3,000 proteins.

“The study constructs an updated genetic atlas of the plasma proteome, reveals novel biological insights into prevalent illnesses, and provides the scientific community with an open-access, population-scale proteomics resource,” Sun et al. stated in the article.

Additionally, the study demonstrated the utility of this data in drug target discovery. Notably, the authors provided evidence for an interaction between blood group and FUT2 secretor status that affects gastrointestinal (GI) protein expression and may underlie susceptibility to certain GI conditions. Moreover, by analyzing COVID-19 susceptibility loci, the study sought to untangle shared and distinct protein pathways associated with the disease.

Dr. Whelan added, “This dataset will help paint a much more nuanced and detailed picture of how the human genome and proteins circulating in the blood influence human health and disease – enabling biomedical researchers to identify new biological associations, find new drug targets, and build blood-based diagnostics.”³

The effect of rare protein-coding variants

A paper in the same issue of Nature by Dhindsa et al.², predominantly authored by consortium members from AstraZeneca, leveraged the dataset generated by the UKB-PPP to study genetic associations of rare protein-coding variants with protein expression.

Compared with common genetic variants, rare variants offer more directly interpretable insights into the relationships between genes, proteins, and their roles in disease. However, prior to this study, attempts to determine the impact of rare protein-coding variants on plasma proteins had only been carried out on a small scale. Analyzing over 50,000 human exomes using an exome-wide association study (ExWAS) approach, the study revealed over 4,400 significant pQTLs – more than three-quarters of which were not detected in the GWAS analysis described by Sun et al.¹

The utility of rare variant analysis was demonstrated by the identification of both known and unknown protein-protein interactions and novel biomarker discovery, including six proteins associated with a rare variant in the HSD17B13 gene, known to protect against chronic liver diseases.

“Beyond biomarker discovery, this pQTL atlas may facilitate other components of drug development, including identifying novel genetic targets, safety profiling, and drug repositioning opportunities,” Dhindsa et al. stated in the article. “To make these data broadly accessible, we provide the pQTL summary statistics in our PheWAS browser.”

Using this browser, researchers can search by gene, variant, or phenotype to find genetically anchored disease-protein associations and gain a deeper understanding of the biology of common diseases.

Scientists worldwide will also be able to access the proteomic data generated by the UKB-PPP in the coming weeks via the UK Biobank. “All of these data will soon be available to bona fide researchers across the globe, alongside the existing genomic, lifestyle, and health data that UK Biobank holds for its 500,000 volunteers,” said Professor Naomi Allen, Chief Scientist of UK Biobank. “I am excited for researchers to use these data to identify patterns that could transform our understanding of how diseases develop, and to identify potential new treatment pathways.”