Ph.D. Student · Carnegie Mellon

Zhuoming Chen 陈卓明

Making generative AI faster, cheaper, and smarter

I am a Ph.D. student at the School of Computer Science, Carnegie Mellon University (2023–), advised by Prof. Beidi Chen and Prof. Zhihao Jia. My research builds machine learning systems that accelerate large language and generative models — through speculative decoding, sparse and randomized algorithms, and test-time scaling.

I received my B.Eng. in Automation from Tsinghua University, where I worked with Prof. Jidong Zhai. I have also worked with Prof. Xuehai Qian (Purdue).


Research

I work at the intersection of machine learning systems, generative AI, and randomized algorithms. My goal is to push the efficiency frontier of large models so that powerful generation and reasoning become practical everywhere — from datacenter clusters to consumer devices. A recurring theme in my work is exploiting structure (sparsity, hierarchy, and randomness) to break long-standing latency, memory, and throughput trade-offs without sacrificing quality.

Efficient LLM Inference

Speculative decoding that accelerates generation losslessly, including for long contexts, and contextual sparsity with correction for efficient LLMs: Sequoia, SpecInfer, SpecExec, TriForce, MagicDec, and Sirius.
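Speculative decoding stays lossless because every drafted token is checked against the target model. The sketch below shows the standard single-token acceptance rule from speculative sampling: accept a token x drawn from the draft distribution q with probability min(1, p(x)/q(x)), otherwise resample from the renormalized residual max(0, p − q). It is a generic illustration, not the tree-based verification used in Sequoia or SpecInfer; the helper names `sample` and `verify_token` and the toy dictionary distributions are hypothetical.

```python
import random
from typing import Dict, Tuple

Dist = Dict[str, float]  # hypothetical toy vocabulary: token -> probability


def sample(dist: Dist, rng: random.Random) -> str:
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def verify_token(draft_token: str, p: Dist, q: Dist, rng: random.Random) -> Tuple[str, bool]:
    """Accept or replace one drafted token.

    p is the target model's distribution and q the draft model's distribution
    at the same position; draft_token was sampled from q, so q[draft_token] > 0.
    Returns (emitted_token, accepted).
    """
    p_x = p.get(draft_token, 0.0)
    q_x = q[draft_token]
    # Accept with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_x / q_x):
        return draft_token, True
    # On rejection, resample from the residual max(0, p - q), renormalized.
    residual = {t: max(0.0, p_t - q.get(t, 0.0)) for t, p_t in p.items()}
    total = sum(residual.values())
    return sample({t: w / total for t, w in residual.items()}, rng), False


if __name__ == "__main__":
    rng = random.Random(0)
    p = {"a": 0.6, "b": 0.3, "c": 0.1}
    q = {"a": 0.2, "b": 0.5, "c": 0.3}
    n = 20000
    counts = {t: 0 for t in p}
    for _ in range(n):
        token, _accepted = verify_token(sample(q, rng), p, q, rng)
        counts[token] += 1
    # Empirical frequencies should be close to p, not q.
    print({t: round(c / n, 2) for t, c in counts.items()})
```

Drafting from q and passing each token through `verify_token` yields tokens distributed according to p, which is why the output matches the target model; the speedup comes from verifying many drafted tokens in a single target-model forward pass.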


Randomized Algorithms

LSH sampling, rejection sampling, and Monarch-structured attention that turn randomness and structure into scalable systems: MagicPiG, Jackpot, and MonarchRT.
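LSH sampling replaces exhaustive attention over all keys with hashing. As a rough sketch of the idea, assuming SimHash (random-hyperplane) signatures with `bits` hyperplanes per table and `tables` independent tables, a key is retrieved when it collides with the query in at least one table, and each retrieved key is reweighted by its inclusion probability to form an importance-sampling estimate of softmax attention. This is not MagicPiG's implementation; all function names and parameters are hypothetical, and keys and the query are assumed to be nonzero vectors.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def simhash(v: Sequence[float], planes: List[Vec]) -> Tuple[bool, ...]:
    # One bit per random hyperplane: which side of the plane v lies on.
    return tuple(dot(v, h) >= 0.0 for h in planes)


def collision_prob(q: Sequence[float], k: Sequence[float], bits: int, tables: int) -> float:
    # For SimHash, a single bit agrees with probability 1 - angle(q, k) / pi.
    cos = dot(q, k) / math.sqrt(dot(q, q) * dot(k, k))
    angle = math.acos(max(-1.0, min(1.0, cos)))
    p_bit = 1.0 - angle / math.pi
    # A key is retrieved if all `bits` agree in at least one of `tables` tables.
    return 1.0 - (1.0 - p_bit ** bits) ** tables


def lsh_attention(q: Vec, keys: List[Vec], values: List[Vec],
                  bits: int = 4, tables: int = 8, seed: int = 0) -> Optional[Vec]:
    rng = random.Random(seed)
    dim = len(q)
    # Hypothetical setup: `tables` independent tables of `bits` Gaussian hyperplanes.
    all_planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
                  for _ in range(tables)]
    sampled = set()
    for planes in all_planes:
        q_sig = simhash(q, planes)
        sampled.update(i for i, k in enumerate(keys) if simhash(k, planes) == q_sig)
    if not sampled:
        return None  # no collisions; a real system would need a fallback
    num = [0.0] * len(values[0])
    den = 0.0
    for i in sampled:
        # Importance weight: unnormalized attention score / inclusion probability.
        w = math.exp(dot(q, keys[i])) / collision_prob(q, keys[i], bits, tables)
        den += w
        for j, v in enumerate(values[i]):
            num[j] += w * v
    # Self-normalized estimate of the softmax-weighted sum of values.
    return [x / den for x in num]
```

Dividing each score by the key's collision probability compensates for hashing favoring keys aligned with the query; keeping only the highest-scoring keys has no such correction.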


Scaling & Training Systems

Rethinking test-time scaling laws and memory-efficient training for long sequences and graphs — Kinetics, Mini-Sequence Transformer, and GNNPipe.

News

Jan 2026
Jackpot, an optimal budgeted rejection sampling method for RL under extreme actor-policy mismatch, is accepted to ICLR 2026.
Dec 2025
Co-presenting the NeurIPS 2025 tutorial “Scale Test-Time Compute on Modern Hardware” with Beidi Chen and Azalia Mirhoseini.
Sep 2025
Kinetics: Rethinking Test-Time Scaling Laws is accepted to NeurIPS 2025, following the TTODLER'25 Best Paper Award and oral presentations at LCFM'25 and KDD GenAI'25.
May 2025
Started as a Research Scientist Intern at Meta FAIR in New York, mentored by Director Léon Bottou.
Jan 2025
MagicPiG (Spotlight) and MagicDec are accepted to ICLR 2025.
Sep 2024
Sequoia (Spotlight), SpecExec, Sirius, and Mini-Sequence Transformer are accepted to NeurIPS 2024.

Selected Publications

Bold indicates my name; * denotes equal contribution. See Google Scholar for the full list.


MonarchRT: Efficient Attention for Real-Time Video Generation

K. Agarwal, Z. Chen, C. Luo, Y. Chen, H. Zheng, X. Huang, …, B. Chen

arXiv 2026 · arXiv · Code · Project

Kinetics: Rethinking Test-Time Scaling Laws

R. Sadhukhan*, Z. Chen*, H. Zheng, Y. Zhou, E. Strubell, B. Chen

NeurIPS 2025 · 🏆 TTODLER'25 Best Paper · ★ LCFM'25 Oral · ★ KDD GenAI'25 Oral · arXiv · Code · Project

MagicPiG: LSH Sampling for Efficient LLM Generation

Z. Chen, R. Sadhukhan, Z. Ye, Y. Zhou, J. Zhang, N. Nolte, …, B. Chen

ICLR 2025 · ★ Spotlight · arXiv · Code

Sequoia: Scalable, Robust, and Hardware-Aware Speculative Decoding

Z. Chen*, A. May*, R. Svirschevski*, Y. Huang, M. Ryabinin, Z. Jia, B. Chen

NeurIPS 2024 · ★ Spotlight · arXiv · Code

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

R. Svirschevski*, A. May*, Z. Chen*, B. Chen, Z. Jia, M. Ryabinin

NeurIPS 2024 · arXiv · Code

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

J. Chen, V. Tiwari, R. Sadhukhan, Z. Chen, J. Shi, I. E. H. Yen, B. Chen

ICLR 2025 · arXiv · Code · Project

Sirius: Contextual Sparsity with Correction for Efficient LLMs

Y. Zhou, Z. Chen, Z. Xu, V. Lin, B. Chen

NeurIPS 2024 · arXiv · Code · Project

Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training

C. Luo, J. Zhao, Z. Chen, B. Chen, A. Anandkumar

NeurIPS 2024 · arXiv · Code

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

X. Miao, G. Oliaro, Z. Zhang, X. Cheng, Z. Wang, R. Y. Y. Wong, Z. Chen, D. Arfeen, R. Abhyankar, Z. Jia

ASPLOS 2024 · arXiv · Code

Quantized Training of Gradient Boosting Decision Trees

Y. Shi, G. Ke, Z. Chen, S. Zheng, T.-Y. Liu

NeurIPS 2022 · arXiv · Code

Experience & Education

Research Scientist Intern
FAIR, Meta · New York — Mentor: Léon Bottou
May 2025 – Dec 2025
Ph.D. Student, Computer Science
Carnegie Mellon University — Advised by Beidi Chen & Zhihao Jia
Aug 2023 – Present
Research Intern
Machine Learning Group, Microsoft Research Asia · Beijing — Mentor: Yu Shi
Nov 2021 – May 2022
B.Eng. in Automation
Tsinghua University · Beijing — GPA 3.86 / 4.0
Aug 2019 – Jul 2023