Research

* denotes equal contribution and ** denotes alphabetical order. An up-to-date list is available on Google Scholar.

2025

  1. Evaluating the unseen capabilities: How many theorems do LLMs know?
    Xiang Li, Jiayi Xin, Qi Long, and Weijie J. Su
    arXiv preprint arXiv:2506.02058, 2025
  2. Optimal estimation of watermark proportions in hybrid AI-human texts
    Xiang Li, Garrett G. Wen, Weiqing He, Jiayuan Wu, Qi Long, and Weijie J. Su
    arXiv preprint arXiv:2506.22343, 2025
  3. On the empirical power of goodness-of-fit tests in watermark detection
    Weiqing He*, Xiang Li*, Tianqi Shang, Li Shen, Weijie J. Su, and Qi Long
    In Advances in Neural Information Processing Systems, 2025, 🌟 Spotlight
  4. Mitigating the privacy–utility trade-off in decentralized federated learning via f-differential privacy
Xiang Li, Chendi Wang, Buxin Su, Qi Long, and Weijie J. Su
    In Advances in Neural Information Processing Systems, 2025, 🌟 Spotlight
  5. Corruption-robust variance-aware algorithms for generalized linear bandits under heavy-tailed rewards
    Qingyuan Yu, Euijin Baek, Xiang Li, and Qiang Sun
    In Conference on Uncertainty in Artificial Intelligence, 2025

2024

  1. A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules
Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J. Su
    The Annals of Statistics, 2025, 🏛️ Invited talk at AoS invited paper session, JSM 2025
  2. Robust detection of watermarks in large language models under human edits
Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J. Su
    Journal of the Royal Statistical Society: Series B, 2025, 🏆 IMS New Researcher Travel Award
  3. Debiasing watermarks for large language models via maximal coupling
Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, and Ruixun Zhang
    Journal of the American Statistical Association, 2025
  4. Decoupled functional central limit theorems for two-time-scale stochastic approximation
Yuze Han, Xiang Li, Jiadong Liang, and Zhihua Zhang
    arXiv preprint arXiv:2412.17070, 2024
  5. Finite-time decoupled convergence in nonlinear two-time-scale stochastic approximation
    Yuze Han, Xiang Li, and Zhihua Zhang
    arXiv preprint arXiv:2401.03893, 2024
  6. Uncertainty quantification of data shapley via statistical inference
    Mengmeng Wu, Zhihong Liu, Xiang Li, Ruoxi Jia, and Xiangyu Chang
    arXiv preprint arXiv:2407.19373, 2024

2023

Variance-aware decision making with linear function approximation under heavy-tailed rewards
    Xiang Li and Qiang Sun
Transactions on Machine Learning Research, 2024, 🏛️ Invited to present at ICLR 2025
  2. Online statistical inference for nonlinear stochastic approximation with Markovian data
Xiang Li, Jiadong Liang, and Zhihua Zhang
    arXiv preprint arXiv:2302.07690, 2023
  3. Convergence and inference of Stream SGD, with applications to queueing systems and inventory control
Xiang Li*, Jiadong Liang*, Xinyun Chen, and Zhihua Zhang
    arXiv preprint arXiv:2309.09545, 2023
  4. Asymptotic behaviors and phase transitions in projected stochastic approximation: A jump diffusion approach
    Jiadong Liang, Yuze Han, Xiang Li, and Zhihua Zhang
arXiv preprint arXiv:2304.12953, 2023

2022

  1. A random projection approach to personalized federated learning: Enhancing communication efficiency, robustness, and fairness
    Yuze Han**, Shiyun Lin**, Xiang Li**, and Zhihua Zhang**
    Journal of Machine Learning Research, 2024, Extended version of the conference paper: Personalized federated learning towards communication efficiency, robustness and fairness
  2. Asymptotic behaviors of projected stochastic approximation: A jump diffusion perspective
    Jiadong Liang, Yuze Han, Xiang Li, and Zhihua Zhang
    In Advances in Neural Information Processing Systems, 2022, 🌟 Spotlight
  3. Personalized federated learning towards communication efficiency, robustness and fairness
    Shiyun Lin*, Yuze Han*, Xiang Li, and Zhihua Zhang
In Advances in Neural Information Processing Systems, 2022
  4. Statistical analysis of Karcher means for random restricted PSD matrices
    Hengchao Chen, Xiang Li, and Qiang Sun
    In International Conference on Artificial Intelligence and Statistics, 2023

2021

  1. Statistical estimation and online inference via Local SGD
    In Conference on Learning Theory, 2022
FedPower: Privacy-preserving distributed eigenspace estimation
Xiao Guo, Xiang Li, Xiangyu Chang, Shusen Wang, and Zhihua Zhang
    Machine Learning, 2024, Extended version of the conference paper: Communication-efficient distributed SVD via local power iterations
  3. A statistical analysis of Polyak-Ruppert averaged Q-learning
    In International Conference on Artificial Intelligence and Statistics, 2023

2020

  1. Communication-efficient distributed SVD via local power iterations
Xiang Li, Shusen Wang, Kun Chen, and Zhihua Zhang
    In International Conference on Machine Learning, 2021
  2. Finding near optimal policies via reducive regularization in Markov decision processes
Wenhao Yang, Xiang Li, Guangzeng Xie, and Zhihua Zhang
    In Workshop on Reinforcement Learning Theory, ICML, 2021

2019

  1. On the convergence of FedAvg on non-iid data
Xiang Li*, Kaixuan Huang*, Wenhao Yang*, Shusen Wang, and Zhihua Zhang
    In International Conference on Learning Representations, 2020, 🎤 Oral presentation
  2. A regularized approach to sparse optimal policy in reinforcement learning
Wenhao Yang*, Xiang Li*, and Zhihua Zhang
    In Advances in Neural Information Processing Systems, 2019
  3. Do subsampled Newton methods work for high-dimensional data?
Xiang Li, Shusen Wang, and Zhihua Zhang
    In AAAI Conference on Artificial Intelligence, 2020
  4. Communication efficient decentralized training with multiple local updates
Xiang Li, Wenhao Yang, Shusen Wang, and Zhihua Zhang
arXiv preprint arXiv:1910.09126, 2019