Xinyuan (Youki) Cao

Hi! I am Xinyuan Cao (曹馨元), a fourth-year PhD student in Machine Learning at the Georgia Institute of Technology. I am fortunate to be advised by Prof. Santosh Vempala. Before joining Georgia Tech, I received my Master's degree in Data Science from Columbia University, supervised by Prof. John Wright. I obtained my Bachelor's degree in Mathematics from Fudan University.

I have broad interests in machine learning, optimization, and graphs. My research mainly focuses on developing efficient, provable machine learning algorithms and building theory that can guide machine learning practitioners!

Email  /  Github  /  Google Scholar

Talks
Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time
  • Nov 2023 - Georgia Tech ACO Student Seminar (Atlanta, GA)
  • Oct 2023 - LeT-All Mentorship Workshop (Virtual)
  • Sep 2023 - Georgia Tech ARC (Atlanta, GA)
  • Aug 2023 - SUFE TCS Seminar (Shanghai, China)
Provable Lifelong Learning of Representations
  • Mar 2023 - CMU Theory Lunch (Pittsburgh, PA)
  • Mar 2023 - Continual AI (CLAI) Seminar (Virtual)
  • Mar 2023 - AISTATS 2023 (Virtual)
  • Jan 2023 - Georgia Tech ACO Student Seminar (Atlanta, GA)
Publications
Machine Learning Theory
Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time
Xinyuan Cao, Santosh Vempala

NeurIPS 2023

We propose a provable and efficient unsupervised learning algorithm that learns the max-margin classifier from linearly separable unlabeled data through a contrastive approach. The algorithm uses re-weighted first and second moments to compute the direction of the max-margin classifier.
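
To give a flavor of the moment-based idea, here is a toy NumPy sketch (illustrative only, not the algorithm or guarantees from the paper); the Gaussian reweighting and the function name estimate_halfspace_direction are assumptions made for this example.

import numpy as np

def estimate_halfspace_direction(X, sigma=1.0):
    """Candidate normal direction from unlabeled data X (n x d) via
    re-weighted first and second moments (reweighting choice is illustrative)."""
    # Gaussian reweighting of each sample by its norm.
    w = np.exp(-np.linalg.norm(X, axis=1) ** 2 / (2 * sigma ** 2))
    w = w / w.sum()

    # Re-weighted first moment: use it directly if it is informative.
    mu = (w[:, None] * X).sum(axis=0)
    if np.linalg.norm(mu) > 1e-6:
        return mu / np.linalg.norm(mu)

    # Otherwise compare the re-weighted second moment with the unweighted one
    # and take the direction of largest deviation.
    M = (w[:, None] * X).T @ X
    C = X.T @ X / X.shape[0]
    vals, vecs = np.linalg.eigh(M - C)
    v = vecs[:, np.argmax(np.abs(vals))]
    return v / np.linalg.norm(v)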

Provable Lifelong Learning of Representations
Xinyuan Cao, Weiyang Liu, Santosh Vempala

AISTATS 2022

We propose a lifelong learning algorithm that maintains and refines the internal feature representation and prove nearly matching upper and lower bounds on the total sample complexity. We complement our analysis with an empirical study, where our method performs favorably on challenging, realistic image datasets compared to state-of-the-art continual learning methods.

Graph Representation Learning
StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning
Shengzhong Zhang, Wenjie Yang, Xinyuan Cao, Hongwei Zhang, Zengfeng Huang

ICLR 2024

We propose a simple yet effective training framework called Structural Compression (StructComp) for graph contrastive learning. Inspired by a sparse low-rank approximation of the diffusion matrix, StructComp trains the encoder on compressed nodes, so the encoder performs no message passing during training and the number of sample pairs in the contrastive loss is significantly reduced.
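
A rough sketch of the compression idea, assuming a precomputed graph partition (the random assignment vector, MLP encoder, and dropout-based views below are illustrative stand-ins, not the exact StructComp pipeline):

import torch
import torch.nn.functional as F

def compress(X, assign, k):
    """Mean-pool node features within each cluster (assign: node -> cluster id)."""
    P = F.one_hot(assign, num_classes=k).float()          # n x k assignment
    P = P / P.sum(dim=0, keepdim=True).clamp(min=1.0)     # column-normalize
    return P.T @ X                                        # k x d compressed features

# Illustrative setup: random features, a random partition, a simple MLP encoder.
n, d, k = 1000, 64, 50
X, assign = torch.randn(n, d), torch.randint(0, k, (n,))
encoder = torch.nn.Sequential(torch.nn.Linear(d, 128), torch.nn.ReLU(),
                              torch.nn.Linear(128, 32))

Xc = compress(X, assign, k)                   # training sees only k compressed nodes
z1 = encoder(F.dropout(Xc, 0.2))              # two stochastic "views" of the
z2 = encoder(F.dropout(Xc, 0.2))              # compressed features

# InfoNCE-style contrastive loss over compressed nodes (k pairs instead of n).
z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
logits = z1 @ z2.T / 0.5
loss = F.cross_entropy(logits, torch.arange(k))
loss.backward()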

Graph-Level Embedding for Time-Evolving Graphs
Lili Wang, Chenghan Huang, Weicheng Ma, Xinyuan Cao, Soroush Vosoughi

WWW 2023

We present a novel method for temporal graph-level embedding that involves constructing a multilayer graph and using a modified random walk with temporal backtracking to generate temporal contexts for the graph’s nodes.
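
A loose sketch of a random walk with temporal backtracking over graph snapshots (the backtracking rule and the probability p_back are guesses made for illustration, not the paper's exact procedure):

import random

def temporal_walk(snapshots, start_node, start_t, length, p_back=0.2):
    """Walk over a list of graph snapshots (each a dict: node -> neighbor list),
    occasionally backtracking to the same node in an earlier snapshot."""
    node, t = start_node, start_t
    context = [(node, t)]
    for _ in range(length):
        if t > 0 and random.random() < p_back:
            t -= 1                                # temporal backtracking step
        else:
            nbrs = snapshots[t].get(node, [])
            if not nbrs:
                break
            node = random.choice(nbrs)            # ordinary within-snapshot step
        context.append((node, t))
    return context

# Hypothetical usage: two snapshots of a tiny graph.
snaps = [{0: [1], 1: [0, 2], 2: [1]}, {0: [1, 2], 1: [0], 2: [0]}]
print(temporal_walk(snaps, start_node=0, start_t=1, length=5))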

Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization
Lili Wang, Chenghan Huang, Weicheng Ma, Xinyuan Cao, Soroush Vosoughi

CIKM 2021

We propose a novel unsupervised whole-graph embedding method. Our method uses spectral graph wavelets to capture topological similarities between nodes over their k-hop sub-graphs and uses these similarities to learn embeddings for the whole graph.
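
A minimal sketch in the spirit of this approach, using heat-kernel wavelets summarized by sampled characteristic functions and mean-pooled into one graph-level vector (the specific kernel, sample points, and pooling are illustrative choices, not the paper's exact construction):

import numpy as np
import networkx as nx

def graph_embedding(G, s=1.0, ts=np.linspace(0.0, 10.0, 25)):
    """Heat-kernel wavelet coefficients per node, summarized by sampled
    characteristic functions and mean-pooled into one graph-level vector."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    lam, U = np.linalg.eigh(L)
    Psi = U @ np.diag(np.exp(-s * lam)) @ U.T          # wavelet coefficients (n x n)
    # Sample the characteristic function of each node's coefficient distribution.
    phi = np.stack([np.exp(1j * t * Psi).mean(axis=1) for t in ts], axis=1)
    node_sig = np.concatenate([phi.real, phi.imag], axis=1)
    return node_sig.mean(axis=0)                        # whole-graph embedding

print(graph_embedding(nx.karate_club_graph()).shape)    # (50,)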

Preprints
Learning-Augmented B-Trees
Xinyuan Cao, Jingbang Chen, Li Chen, Chris Lambert, Richard Peng, Daniel Sleator

We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps with composite priorities. Our construction gives the first B-Tree data structure that can provably take advantage of locality in the access sequence via online self-reorganization. The data structure is robust to prediction errors and handles insertions, deletions, and prediction updates.
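
A toy treap sketch showing how a composite priority could fold a predicted access frequency into the usual random priority (the priority formula and the preds dictionary are hypothetical; the paper's construction and its B-Tree extension are not reproduced here):

import math
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    prio: float
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def composite_priority(predicted_freq):
    # Illustrative composite priority: predicted log-frequency plus a small
    # random jitter, so keys predicted to be hot end up near the root.
    return math.log(max(predicted_freq, 1e-9)) + 1e-3 * random.random()

def insert(root, key, prio):
    """Standard treap insert: BST order on keys, max-heap order on priorities."""
    if root is None:
        return Node(key, prio)
    if key < root.key:
        root.left = insert(root.left, key, prio)
        if root.left.prio > root.prio:            # right rotation
            l = root.left
            root.left, l.right = l.right, root
            root = l
    else:
        root.right = insert(root.right, key, prio)
        if root.right.prio > root.prio:           # left rotation
            r = root.right
            root.right, r.left = r.left, root
            root = r
    return root

# Hypothetical predictions: key -> predicted access frequency.
preds = {5: 100.0, 2: 3.0, 8: 0.5, 1: 40.0}
root = None
for key, freq in preds.items():
    root = insert(root, key, composite_priority(freq))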

Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay
Leyan Pan, Xinyuan Cao

We investigate the interrelationships between batch normalization (BN), weight decay, and proximity to the Neural Collapse (NC) structure. Experimental evidence substantiates our theoretical findings, revealing a pronounced occurrence of NC in models incorporating BN and appropriate weight-decay values.
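
For readers unfamiliar with Neural Collapse metrics, here is a small NumPy sketch of one standard way to measure within-class variability collapse from penultimate-layer features (the training setup with BN and weight decay is not shown, and the random inputs below are only a usage placeholder):

import numpy as np

def nc1_metric(features, labels):
    """Within-class variability collapse: trace(Sigma_W @ pinv(Sigma_B)).
    Values near 0 indicate features of each class collapsing to the class mean."""
    classes = np.unique(labels)
    mu_g = features.mean(axis=0)
    d = features.shape[1]
    Sigma_W, Sigma_B = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Fc = features[labels == c]
        mu_c = Fc.mean(axis=0)
        Sigma_W += (Fc - mu_c).T @ (Fc - mu_c) / len(features)
        Sigma_B += np.outer(mu_c - mu_g, mu_c - mu_g) / len(classes)
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B))

# Hypothetical usage on penultimate-layer activations of a trained network:
feats, labs = np.random.randn(200, 16), np.random.randint(0, 4, 200)
print(nc1_metric(feats, labs))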