Xiang Li

I am a postdoctoral researcher in Statistics at the University of Pennsylvania, working with Prof. Qi Long and Prof. Weijie Su. I received my Ph.D. in Statistics from the School of Mathematical Sciences, Peking University in 2023, advised by Prof. Zhihua Zhang. Before that, I earned double bachelor’s degrees in Statistics and Economics at Peking University in 2018.

My research interests lie broadly at the intersection of statistics, optimization, and machine learning, with applications spanning data science and artificial intelligence. My current research focuses on the statistical and algorithmic foundations of reliable AI, with emphasis on large language models (LLMs). I investigate statistical watermarking to ensure the provenance and robustness of AI-generated content and develop tools to evaluate how LLMs encode and use knowledge.

Earlier, during my Ph.D., I designed methods for learning with heterogeneous and online data, addressing challenges such as communication efficiency in federated learning, robustness under data heterogeneity, and uncertainty quantification in streaming and decision-making problems. These experiences continue to shape my perspective on building scalable and trustworthy data-driven systems.

I am currently on the academic job market for the 2025–2026 cycle, seeking faculty positions in data science, statistics, mathematics, machine learning, and related fields. I am open to discussions about potential opportunities and collaborations.

Contact Info: lx10077 at upenn dot edu
Curriculum Vitae: CV

News

Dec 2, 2025	I’ll be attending NeurIPS 2025. Excited to connect, so feel free to reach out!
Nov 5, 2025	I gave an SDLS webinar on LLM API usage and watermarking. The slides are available here.
Oct 25, 2025	I’ll attend the 2025 INFORMS Annual Meeting and would be happy to discuss faculty opportunities.
Sep 18, 2025	Two papers accepted to NeurIPS 2025 as spotlights: one on the empirical evaluation of goodness-of-fit tests for watermark detection, and the other on mitigating the privacy–utility trade-off in decentralized federated learning.
Aug 4, 2025	Excited to present my recent work on estimating watermark proportion at JSM 2025.
Jun 15, 2025	Attend 2025 ICSA, where I’ll give a talk on robust watermark detection and present a short course on LLM watermarking. Lecture slides are here.
Apr 1, 2025	Excited to receive the IMS New Researcher Travel Award.
Nov 5, 2024	Attend 2024 SLDS. My first time visiting California.
Aug 15, 2024	Chair a session on Federated Learning at 2024 MOPTA.
Jul 12, 2024	Attend 2024 JCSDS. Great to catch up with old friends and meet new ones!
Jul 6, 2024	Attend IMS-China Meeting.
May 29, 2024	Attend 2024 IMS-NUS workshop and present our recent work on watermarking.
Dec 9, 2023	Attend 2023 NeurIPS. Great to visit New Orleans again!
Sep 26, 2023	Attend 2023 Allerton conference at the University of Illinois.
Jul 4, 2023	Graduate from Peking University, completing my nine years of study there.
Feb 11, 2023	Finish my visit at UoT.
Jan 20, 2023	Two papers accepted to AISTATS 2023.

Selected Publications

A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules

[arXiv][HTML][Code][Poster][Slides]

Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J. Su

The Annals of Statistics, 2025, 🏛️ Invited talk at AoS invited paper session, JSM 2025

Robust detection of watermarks in large language models under human edits

[arXiv][Code][Poster][Slides]

Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J. Su

Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2025, 🏛️ Invited talk at the JRSSb Editors Invited Session, RSS 2026, 🏆 IMS New Researcher Travel Award

Evaluating the unseen capabilities: How many theorems do LLMs know?

[arXiv][Code][Poster][Slides]

Xiang Li, Jiayi Xin, Qi Long, and Weijie J. Su

arXiv preprint arXiv:2506.02058, 2025

On the convergence of FedAvg on non-iid data

[arXiv][HTML][Code][Poster][Slides]

Xiang Li*, Kaixuan Huang*, Wenhao Yang*, Shusen Wang, and Zhihua Zhang

In International Conference on Learning Representations, 2020, 🎤 Oral presentation

Statistical estimation and online inference via Local SGD

[arXiv][HTML][Slides]

Xiang Li, Jiadong Liang, Xiangyu Chang, and Zhihua Zhang

In Conference on Learning Theory, 2022