Photo: under cherry blossoms with a dog at Lake Louise (📷 by Zihan)

Hanshi Sun 孙寒石

I am currently a Research Scientist at ByteDance Seed (Seattle), where I work on Machine Learning Systems. Previously, I earned my M.S. in Electrical and Computer Engineering from Carnegie Mellon University (CMU) and my bachelor's degree from Southeast University (SEU).

At CMU, I worked with Prof. Beidi Chen in the InfiniAI Lab and collaborated with Prof. Andrea Zanette. I also had a wonderful experience working with Prof. Xingyu Li at the University of Alberta and Prof. Yi Zhou in the PALM Lab.

News

Publications

Representative papers are highlighted.

Ditron: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs
Size Zheng, Xuegui Zheng, Hanshi Sun, Qi Hou, Wenlei Bao, Shiyu Li, Haojie Duanmu, Jin Fang, Chenli Xue, Chenhui Huang, Yuanqiang Liu, Renze Chen, Ningxin Zheng, Dongyang Wang, Li-Wen Chang, Liqiang Lu, Yun Liang, Jidong Zhai, Xin Liu
ICML 2026 (International Conference on Machine Learning)
Distributed multi-level tiling compiler for parallel tensor programs.
Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference
Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-Wen Chang
MLSys 2026 (Conference on Machine Learning and Systems)
Unified, fine-grained simulator for LLM training and inference.
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
Zefan Cai, Wen Xiao, Hanshi Sun, Cheng Luo, Yikai Zhang, Ke Wan, Yucheng Li, Yeyang Zhou, Li-Wen Chang, Jiuxiang Gu, Zhen Dong, Anima Anandkumar, Abedelkadir Asi, Junjie Hu
NeurIPS 2025 (Conference on Neural Information Processing Systems)
Shrink the cache, keep the brains.
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
ICML 2025 Workshop (International Conference on Machine Learning)
Fine-grained, head-wise offloading strategy.
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ICML 2025 Spotlight (International Conference on Machine Learning)
High-throughput long-context LLM inference system.
Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun*, Momin Haider*, Ruiqi Zhang*, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, and Andrea Zanette* (* denotes core authors)
NeurIPS 2024 (Conference on Neural Information Processing Systems)
Fast inference-time alignment algorithm.
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
COLM 2024 (Conference on Language Modeling)
Training-free lossless long-sequence generation acceleration.
BMAD: Benchmarks for Medical Anomaly Detection
Jinan Bao, Hanshi Sun, Hanqiu Deng, Zhaoxiang Zhang, and Xingyu Li
CVPR 2024 Workshop (Computer Vision and Pattern Recognition)
Six datasets across five medical domains, three evaluation metrics, and fourteen state-of-the-art AD algorithms.
Combating Medical Noisy Labels by Disentangled Distribution Learning and Consistency Regularization
Yi Zhou, Lei Huang, Tao Zhou, and Hanshi Sun
FGCS 2023 (Future Generation Computer Systems)
Disentangled distribution learning reduces the effect of label uncertainty and ambiguity.
Arrhythmia Classifier Using Convolutional Neural Network with Adaptive Loss-aware Multi-bit Networks Quantization
Hanshi Sun, Ao Wang, Ninghao Pu, Zhiqing Li, Junguang Huang, Hao Liu, and Zhi Qi
ICAICE 2021
1-D adaptive loss-aware quantization with 23.36× memory reduction.

Services