Special Session 115: Topology and Dynamics in Data

Topological signatures of admixture in genomic data
Maria Vivien Visaya
University of Johannesburg
South Africa
Co-Author(s): Al Bien Aculan, Rachelle Sambayan, Victoria Mendoza, Ricardo del Rosario
Abstract:
We investigate topological summaries of high-dimensional genomic data to detect and characterise population admixture, without relying on parametric population genetic models. Using persistent homology, we analyse haplotype data from 26 human populations (3202 individuals) from the 1000 Genomes Project. Admixed populations exhibit distinct and reproducible topological signatures, particularly in the distribution and persistence of one-dimensional homology classes. Unlike non-admixed populations, where short-lived cycles emerge only at large filtration scales, admixed populations display cycles distributed across a broad range of scales, reflecting heterogeneous ancestry structure. We formalise this observation through a non-admixture score (NAS) derived from persistence barcode statistics, which robustly separates admixed from non-admixed populations across genome-wide and per-chromosome analyses. Further, by equipping persistence diagrams with the Wasserstein metric, we demonstrate that hierarchical clustering recovers groups of populations with shared admixture signatures, revealing structure not captured by classical measures such as FST. Our results suggest that persistent homology provides an orthogonal, model-free framework for population genetic inference, capturing geometric and topological aspects of genetic variation that complement existing statistical approaches. This positions topological data analysis as a promising tool for studying complex evolutionary processes in large-scale genomic data.
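As a rough illustration of what "persistence barcode statistics" can look like in practice, the sketch below summarises a one-dimensional barcode by the number of cycles, their mean lifetime, and how widely their birth scales are spread. The abstract does not define the NAS, so this is not the authors' score. The function name `barcode_summary`, the chosen statistics, and the toy intervals are all assumptions made for illustration, and the barcodes are made-up numbers, not results from the 1000 Genomes data.

```python
from statistics import mean, pstdev


def barcode_summary(bars):
    """Summarise (birth, death) intervals of one homology dimension.

    Hypothetical helper for illustration only; it is NOT the NAS
    described in the abstract, whose definition is not given there.
    """
    # Ignore essential classes that never die (death = infinity).
    finite = [(b, d) for b, d in bars if d != float("inf")]
    if not finite:
        return {"n_bars": 0, "mean_persistence": 0.0, "birth_spread": 0.0}

    lifetimes = [d - b for b, d in finite]  # persistence of each cycle
    births = [b for b, _ in finite]         # filtration scale where each cycle appears

    return {
        "n_bars": len(finite),
        "mean_persistence": mean(lifetimes),
        # Population standard deviation of birth scales: small when cycles
        # appear in a narrow band, larger when they are spread across scales.
        "birth_spread": pstdev(births),
    }


# Toy barcodes (made-up values), mimicking the two qualitative patterns
# described above: cycles born only at large scales vs. across many scales.
late_only = [(0.80, 0.85), (0.82, 0.86), (0.90, 0.93)]
broad = [(0.20, 0.45), (0.50, 0.62), (0.85, 0.90)]

summary_late = barcode_summary(late_only)
summary_broad = barcode_summary(broad)
print(summary_late)
print(summary_broad)
```

In this toy setting the second barcode has the larger `birth_spread`, which is the kind of contrast a barcode-derived score could exploit. In a real analysis the intervals would come from a persistent homology computation on the haplotype data rather than being typed in by hand.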