Making generative AI faster, cheaper, and smarter
I am a Ph.D. student at the School of Computer Science, Carnegie Mellon University (2023–), advised by Prof. Beidi Chen and Prof. Zhihao Jia. My research builds machine learning systems that accelerate large language and generative models — through speculative decoding, sparse and randomized algorithms, and test-time scaling.
I received my B.Eng. in Automation from Tsinghua University, where I worked with Prof. Jidong Zhai. I have also worked with Prof. Xuehai Qian (Purdue).
I work at the intersection of machine learning systems, generative AI, and randomized algorithms. My goal is to push the efficiency frontier of large models so that powerful generation and reasoning become practical everywhere — from datacenter clusters to consumer devices. A recurring theme in my work is exploiting structure (sparsity, hierarchy, and randomness) to break long-standing latency, memory, and throughput trade-offs without sacrificing quality.
Speculative decoding and contextual sparsity that accelerate long-context generation losslessly — Sequoia, SpecInfer, SpecExec, TriForce, MagicDec, and Sirius.
LSH sampling, rejection sampling, and Monarch-structured attention that turn randomness into scalable systems — MagicPiG, Jackpot, and MonarchRT.
Rethinking test-time scaling laws and memory-efficient training for long sequences and graphs — Kinetics, Mini-Sequence Transformer, and GNNPipe.
Bold indicates my name; * denotes equal contribution. See Google Scholar for the full list.