(at Lake Louise, Alberta, Canada)

Hanshi Sun   孙寒石

I am currently a Research Scientist at ByteDance Seed (Seattle), where I work on Machine Learning Systems. Previously, I earned my M.S. in Electrical and Computer Engineering from Carnegie Mellon University (CMU), and my bachelor's degree from Southeast University (SEU).


At CMU, I worked with Prof. Beidi Chen in the InfiniAI Lab and collaborated with Prof. Andrea Zanette. I also had a wonderful experience working with Prof. Xingyu Li at the University of Alberta and Prof. Yi Zhou in the PALM Lab.


preminstrel [at] gmail [dot] com; hanshi.s [at] bytedance.com

@preminstrel   /   GitHub  /   Google Scholar   /   LinkedIn   /   Blog

News 📢

Publications

Selected papers are highlighted.

   
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bao Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, and Anima Anandkumar

ArXiv, 2025


arXiv / bibtex


Fine-grained, Head-wise Offloading Strategy

   
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, and Beidi Chen

ArXiv, 2024


arXiv / website / code / bibtex


High-Throughput Long-Context LLM Inference System

   
Fast Best-of-N Decoding via Speculative Rejection

Hanshi Sun*, Momin Haider*, Ruiqi Zhang*, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, and Andrea Zanette* (* for core authors)

Conference on Neural Information Processing Systems (NeurIPS), 2024


arXiv / website / code / bibtex


Fast Inference-time Alignment Algorithm

   
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, and Beidi Chen

Conference on Language Modeling (COLM), 2024


arXiv / website / code / demo / bibtex


Training-free Lossless Long Sequence Generation Acceleration

Services


© Hanshi Sun 2025