High-dimensional clinical data (HDCD) refers to healthcare datasets in which the number of variables (or features) is significantly larger than the number of patients (or observations). As the number of variables increases, the data space grows exponentially, and processing and analyzing the data demands substantial computational resources. Models built on high-dimensional data can also be difficult to interpret, which hinders clinical decision-making. In genomic studies, the effective use of HDCD is further restricted by the difficulty of obtaining large datasets with comprehensive disease labels and by the limited ability of standard disease labels to reflect complex biological traits.
Researchers at Google address the challenge of harnessing high-dimensional clinical data (HDCD), such as spirograms, photoplethysmograms (PPGs), and imaging data, for genetic discovery and disease prediction. Current methods in genomic studies typically run genome-wide association studies (GWAS) either on expert-defined features extracted from HDCD or directly on the individual coordinates of the high-dimensional data. However, these approaches are computationally expensive, carry a high multiple-testing burden, and have limited ability to uncover complex genetic associations.
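To make the multiple-testing burden concrete, here is a minimal sketch of how a per-test significance threshold shrinks as more traits are tested. It assumes the conventional genome-wide threshold of 5e-8 for a single trait and a simple Bonferroni correction across traits; the function name and the trait counts are hypothetical and chosen only for illustration. Because neighboring coordinates of a clinical curve are correlated, Bonferroni is a conservative upper bound on the penalty, not a description of any particular study.

```python
# Conventional genome-wide significance threshold for a single trait.
GENOME_WIDE_ALPHA = 5e-8


def bonferroni_threshold(num_traits: int, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Per-test p-value threshold when `num_traits` separate GWAS are run."""
    if num_traits < 1:
        raise ValueError("num_traits must be at least 1")
    return alpha / num_traits


# Hypothetical sizes, for illustration only.
raw_coordinates = 1000  # one GWAS per sampled point of a raw measurement curve
embedding_dims = 5      # one GWAS per coordinate of a low-dimensional embedding

print("raw coordinates:", bonferroni_threshold(raw_coordinates))
print("embedding dims: ", bonferroni_threshold(embedding_dims))
```

Under these assumed sizes, the threshold for the raw coordinates is 200 times stricter than for the embedding, which is one reason a compact representation can retain more statistical power.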
Google’s novel approach, REpresentation Learning for Genetic discovery on Low-dimensional Embeddings (REGLE), is designed to address these limitations. REGLE uses unsupervised representation learning to transform HDCD into lower-dimensional embeddings without the need for disease labels. The method can integrate expert-defined features (EDFs) where they are available, and it enables more efficient and comprehensive genetic analysis.
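REGLE uses a variational autoencoder (VAE) for this label-free step. The sketch below shows why no disease label is needed: a standard VAE objective depends only on the measurement, its reconstruction, and the latent distribution. This is the textbook objective, not the authors' implementation. The encoder and decoder networks are omitted, so `mu`, `log_var`, and `x_hat` are hypothetical stand-ins for their outputs, and the `beta` weight is an illustrative assumption.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def squared_error(x, x_hat):
    """Reconstruction term (Gaussian likelihood up to constants)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Negative ELBO up to constants: reconstruction + beta * KL.

    Note that no disease label appears anywhere in the objective.
    """
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
x = [0.2, 0.5, 0.9]                     # toy measurement (e.g. points on a curve)
mu, log_var = [0.1, -0.3], [0.0, -1.0]  # would come from an encoder network
z = sample_latent(mu, log_var, rng)     # 2-dimensional embedding sample
x_hat = [0.25, 0.45, 0.8]               # would come from a decoder applied to z
print("embedding sample:", z)
print("loss:", vae_loss(x, x_hat, mu, log_var))
```

After training, the encoder mean `mu` is a natural choice for the embedding that downstream analyses consume.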
REGLE employs a variational autoencoder (VAE) to learn non-linear, low-dimensional, disentangled representations of HDCD. The process involves three main steps: learning embeddings of HDCD with a VAE, performing GWAS on these embeddings to identify genetic associations, and building polygenic risk scores (PRSs) from the embeddings to predict specific diseases or traits, potentially using a small number of disease labels.
The method was validated on two types of HDCD, spirograms and PPGs. REGLE detected novel genetic loci associated with lung and cardiovascular function that were not identified through traditional methods. For instance, it found 45% more significant loci for PPG data and improved risk prediction for diseases such as COPD and asthma compared with methods based on EDFs or principal component analysis (PCA). The embeddings also yielded interpretable results, capturing features such as airway obstruction that standard EDFs do not represent well.
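The second and third steps (GWAS on an embedding coordinate, then a polygenic score) can be sketched on simulated data. This is a toy illustration under stated assumptions, not the published pipeline: the genotypes and the embedding are simulated, each variant is tested with a plain single-variant linear regression without covariates, p-values use a normal approximation, and the score simply sums the estimated effects of significant variants. All function names and sizes are hypothetical.

```python
import math
import random


def simple_regression(g, y):
    """Regress y on g with an intercept; return (slope, two-sided p-value).

    The p-value uses a normal approximation to the t statistic (large n).
    """
    n = len(g)
    g_mean, y_mean = sum(g) / n, sum(y) / n
    sxx = sum((a - g_mean) ** 2 for a in g)
    if sxx == 0.0:  # monomorphic variant carries no information
        return 0.0, 1.0
    sxy = sum((a - g_mean) * (b - y_mean) for a, b in zip(g, y))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(g, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit
        return slope, 0.0
    z = slope / se
    return slope, math.erfc(abs(z) / math.sqrt(2.0))


def gwas(genotypes, phenotype):
    """Test every variant; `genotypes[j][i]` is person i's dosage at variant j."""
    return [simple_regression(g, phenotype) for g in genotypes]


def polygenic_score(genotypes, effects, person):
    """Weighted sum of one person's dosages."""
    return sum(beta * g[person] for g, beta in zip(genotypes, effects))


# Simulated cohort: dosages in {0, 1, 2} with allele frequency 0.3.
rng = random.Random(0)
n_people, n_variants, causal = 500, 20, 3
genotypes = [
    [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_people)]
    for _ in range(n_variants)
]
# Stand-in for one embedding coordinate: driven by a single causal variant.
embedding = [0.8 * genotypes[causal][i] + rng.gauss(0.0, 1.0) for i in range(n_people)]

results = gwas(genotypes, embedding)
threshold = 0.05 / n_variants  # Bonferroni over the toy variant set
hits = [j for j, (_, p) in enumerate(results) if p < threshold]
effects = [beta if p < threshold else 0.0 for beta, p in results]
scores = [polygenic_score(genotypes, effects, i) for i in range(n_people)]
print("significant variants:", hits)
```

In practice the effect sizes would be estimated in one cohort and the scores evaluated in held-out individuals, and a real GWAS would adjust for covariates such as age, sex, and genetic ancestry. Mapping embedding-based scores to a specific disease, the step for which the article notes a few disease labels may be used, is not shown here.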
In conclusion, REGLE provides a robust approach to genetic analysis of high-dimensional clinical data by using unsupervised learning to uncover hidden genetic signals and improve disease prediction. By removing the need for extensive disease labels while still incorporating expert features, it addresses key limitations of traditional methods. The demonstrated improvements in novel locus discovery and risk prediction underscore REGLE’s potential to advance genomic research and enhance personalized medicine through a more comprehensive analysis of HDCD.
Check out the Paper. All credit for this research goes to the researchers of this project.
The post Google Research Presents a Novel AI Method for Genetic Discovery that can Harness Hidden Information in High-Dimensional Clinical Data appeared first on MarkTechPost.
[Source: AI Techpark]