Xiang Li


I am a postdoctoral researcher in Statistics at the University of Pennsylvania, working with Prof. Qi Long and Prof. Weijie Su. I received my Ph.D. in Statistics from the School of Mathematical Sciences, Peking University in 2023, advised by Prof. Zhihua Zhang. Before that, I earned dual bachelor’s degrees in Statistics and Economics at Peking University in 2018.

My research interests lie broadly at the intersection of statistics, optimization, and machine learning, with applications spanning data science and artificial intelligence. My current research focuses on the statistical and algorithmic foundations of reliable AI, with emphasis on large language models (LLMs). I investigate statistical watermarking to ensure the provenance and robustness of AI-generated content and develop tools to evaluate how LLMs encode and use knowledge.

Earlier, during my Ph.D., I designed methods for learning with heterogeneous and online data, addressing challenges such as communication efficiency in federated learning, robustness under data heterogeneity, and uncertainty quantification in streaming and decision-making problems. These experiences continue to shape my perspective on building scalable and trustworthy data-driven systems.

I am currently on the academic job market for the 2025–2026 cycle, and I am also open to scientific research opportunities in industry.

Contact Info: lx10077 at upenn dot cn

News

Sep 18, 2025 Two papers have been accepted to NeurIPS 2025: one on the empirical effectiveness of goodness-of-fit tests for watermark detection, and another on mitigating the privacy–utility trade-off in decentralized federated learning via f-differential privacy.
Aug 4, 2025 Excited to present my recent work on estimating watermark proportion at JSM 2025.
Jun 15, 2025 Attending ICSA 2025, where I’ll be giving a talk on robust watermark detection and presenting a short course on LLM watermarking. Lecture slides are here.
Apr 1, 2025 Excited to receive the IMS New Researcher Travel Award.
Nov 5, 2024 Attending SLDS 2024. My first time visiting California.

Selected Publications

  1. A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules
    Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J. Su
    The Annals of Statistics, 2025, 🏛️ Invited talk at AoS invited paper session, JSM 2025
  2. Evaluating the unseen capabilities: How many theorems do LLMs know?
    Xiang Li, Jiayi Xin, Qi Long, and Weijie J. Su
    arXiv preprint arXiv:2506.02058, 2025
  3. On the convergence of FedAvg on non-iid data
    Xiang Li*, Kaixuan Huang*, Wenhao Yang*, Shusen Wang, and Zhihua Zhang
    In International Conference on Learning Representations, 2020, 🎤 Oral presentation
  4. Statistical estimation and online inference via Local SGD
    In Conference on Learning Theory, 2022
  5. Variance-aware decision making with linear function approximation with heavy-tailed rewards
    Xiang Li and Qiang Sun
    Transactions on Machine Learning Research, 2024, 🏛️ Invited to present at ICLR 2025