(at Lake Louise, Alberta, Canada)
Hanshi Sun   孙寒石
I am currently a Research Scientist at ByteDance on the Seed-Foundation-MLSys team.
At CMU, I worked with Prof. Beidi Chen.
[2025/03/03] Joined the ByteDance Seed-Foundation-MLSys team as a Research Scientist.
[2024/10/01] Our Speculative Rejection paper has been accepted at NeurIPS 2024! See you in Vancouver!
[2024/07/09] Our TriForce paper has been accepted at 🦙 COLM 2024! See you in Philadelphia!
[2024/06/03] Started as an MLSys Research Intern on the Seed-Foundation team at ByteDance.
[2023/11/06] Joined Prof. Beidi Chen's group at CMU.
[2023/06/20] Graduated from Southeast University with a bachelor's degree.
[2022/11/15] Started at Apple as a Software Engineer Intern on the iPad System team.
[2022/07/06] Started as a Research Intern in Prof. Xingyu Li's group at the University of Alberta.
Selected papers are highlighted.
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bao Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, and Anima Anandkumar, arXiv, 2025
Fine-grained, Head-wise Offloading Strategy
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, and Beidi Chen, arXiv, 2024
arXiv / website / code High-Throughput Long-Context LLM Inference System
Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun*, Momin Haider*, Ruiqi Zhang*, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, and Andrea Zanette* (* for core authors), Conference on Neural Information Processing Systems (NeurIPS), 2024
arXiv / website / code Fast Inference-time Alignment Algorithm
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, and Beidi Chen, Conference on Language Modeling (COLM), 2024
arXiv / website / code Training-free Lossless Long Sequence Generation Acceleration
© Hanshi Sun 2025