Integrating next-generation sequencing and artificial intelligence for the identification and validation of pathogenic variants in colorectal cancer

Juliana Rodriguez-Salamanca, Mariana Angulo-Aguado, Sarah Orjuela-Amarillo, Catalina Duque, Diana Carolina Sierra-Díaz, Nora Contreras Bravo, Carlos Figueroa, Carlos M. Restrepo, Andrés López-Cortés, Rodrigo Cabrera, Adrien Morel, Dora Janeth Fonseca-Mendoza

Research output: Contribution to journal › Research Article › peer-review

Abstract

Background: Colorectal cancer (CRC) is recognized as a multifactorial disease in which both genetic and environmental factors play critical roles in its development and progression. The identification of pathogenic germline variants has proven to be a valuable tool for early diagnosis, the implementation of surveillance strategies, and the identification of individuals at increased cancer risk. Next-generation sequencing (NGS) has facilitated comprehensive multigene analysis in both hereditary and sporadic cases of CRC.

Patients and methods: In this study, we analyzed 100 unselected Colombian patients with CRC to identify pathogenic (P) and likely pathogenic (LP) germline variants, classified according to the guidelines of the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP). We used the BoostDM artificial intelligence method to identify oncodriver germline variants with potential implications for disease progression, and we assessed BoostDM's accuracy in predicting germline variants by comparing its results with those of the AlphaMissense pathogenicity prediction model. Additionally, a minigene assay was used for the functional validation of intronic variants.

Results: Our findings revealed that 12 of the 100 patients (12%) carried P/LP variants according to ACMG/AMP criteria, whereas BoostDM identified oncodriver variants in 65% of the cases. These results highlight the value of expanded multigene analysis and of integrating artificial intelligence to detect germline variants associated with CRC. The average overall AUC values for the comparison between BoostDM and AlphaMissense were 0.788 for the entire BoostDM dataset and 0.803 for the genes within our panel, with individual gene AUC values ranging from 0.606 to 0.983. Functional validation through the minigene assay revealed aberrant transcripts, potentially linked to the molecular etiology of the disease.

Conclusion: Using NGS, our study provides insight into the prevalence of P/LP germline variants in unselected Colombian CRC patients. Integrating advanced genomic analysis with artificial intelligence enhanced variant detection beyond conventional methods, and our functional validation results shed light on the potential pathogenicity of intronic variants. These findings underscore the need for a multifaceted approach to unravel the complex genetic landscape of CRC.
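The abstract reports AUC values for the BoostDM versus AlphaMissense comparison, both overall and per gene, but does not spell out how they were computed. The sketch below illustrates one plausible setup and is not the authors' pipeline: it assumes BoostDM's binary driver/passenger call (1/0) serves as the reference label and an AlphaMissense-style continuous pathogenicity score serves as the predictor. The gene names (`GENE_A`, `GENE_B`), the records, and the helper names `roc_auc` and `per_gene_auc` are hypothetical. AUC is computed with the Mann-Whitney formulation, and both a pooled AUC and a mean of per-gene AUCs are shown, since "average overall AUC" could mean either.

```python
from collections import defaultdict
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive (label 1) scores higher than a randomly chosen negative
    (label 0), with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_gene_auc(records):
    """Group (gene, label, score) records by gene and compute one AUC per gene.

    Genes whose variants all share a single label are skipped, because AUC is
    undefined without both positive and negative examples.
    """
    grouped = defaultdict(lambda: ([], []))
    for gene, label, score in records:
        labels, scores = grouped[gene]
        labels.append(label)
        scores.append(score)
    return {
        gene: roc_auc(labels, scores)
        for gene, (labels, scores) in grouped.items()
        if set(labels) == {0, 1}
    }


# Hypothetical toy records (NOT study data):
# (gene, assumed BoostDM driver label 1/0, assumed AlphaMissense-style score in [0, 1])
TOY_RECORDS = [
    ("GENE_A", 1, 0.91), ("GENE_A", 1, 0.40), ("GENE_A", 0, 0.35), ("GENE_A", 0, 0.10),
    ("GENE_B", 1, 0.80), ("GENE_B", 0, 0.85), ("GENE_B", 0, 0.20),
]

gene_aucs = per_gene_auc(TOY_RECORDS)
pooled_auc = roc_auc([r[1] for r in TOY_RECORDS], [r[2] for r in TOY_RECORDS])
mean_gene_auc = mean(gene_aucs.values())

print(gene_aucs, round(pooled_auc, 3), mean_gene_auc)
# prints: {'GENE_A': 1.0, 'GENE_B': 0.5} 0.833 0.75
```

The pooled AUC and the mean of per-gene AUCs generally differ, as they do here (0.833 vs 0.75), so a report of "overall" AUC is easiest to interpret when it states which aggregation was used.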

Original language: English (US)
Article number: 1568205
Journal: Frontiers in Oncology
Volume: 15
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Oncology
  • Cancer Research
