Special Session 63: Analysis and Optimization of Biological and Medical Systems

When Can We Cluster Data? Insights from Convex Optimization and Semidefinite Programming

Brendan P Ames
University of Alabama
USA
Co-Author(s):    
Abstract:
Recent years have seen an incredible increase in the use of artificial intelligence and machine learning (AI/ML) methods in essentially all areas of science, technology, engineering, humanities, and education. However, despite this widespread adoption of AI/ML models as analytical tools, relatively very little is known about the theoretical properties of these models. In particular, there is minimal theory justifying their use, despite a large and growing body of empirical evidence of their efficacy. In this talk, I will attempt to partially bridge this gap between theory and performance for a particular machine learning problem: clustering. Clustering is a classical machine learning task where one seeks to partition a given data into subgroups of similar items called clusters. I will argue that the clustering task can be modeled as a family of combinatorial optimization problems, and illustrate how to use methods from semidefinite programming to design efficient and accurate numerical methods for clustering. In particular, I will propose several models for well-behaved or clusterable data, where we can expect these clustering heuristics to correctly identify the hidden cluster structure.