Photo: under cherry blossoms with a dog at Lake Louise (📷 by Zihan)

Hanshi Sun 孙寒石

I am currently a Research Scientist at ByteDance Seed (Seattle), where I work on Machine Learning Systems. Previously, I earned my M.S. in Electrical and Computer Engineering from Carnegie Mellon University (CMU) and my bachelor's degree from Southeast University (SEU).

At CMU, I worked with Prof. Beidi Chen in the InfiniAI Lab and collaborated with Prof. Andrea Zanette. I also had a wonderful experience working with Prof. Xingyu Li at the University of Alberta and Prof. Yi Zhou in the PALM Lab.

News

Publications

Representative papers are highlighted.

Ditron: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs
Size Zheng, Xuegui Zheng, Hanshi Sun, Qi Hou, Wenlei Bao, Shiyu Li, Haojie Duanmu, Jin Fang, Chenli Xue, Chenhui Huang, Yuanqiang Liu, Renze Chen, Ningxin Zheng, Dongyang Wang, Li-Wen Chang, Liqiang Lu, Yun Liang, Jidong Zhai, Xin Liu
ICML 2026 (International Conference on Machine Learning)
Distributed multi-level tiling compiler for parallel tensor programs.
Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference
Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-Wen Chang
MLSys 2026 (Conference on Machine Learning and Systems)
Unified, fine-grained simulator for LLM training and inference.
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
Zefan Cai, Wen Xiao, Hanshi Sun, Cheng Luo, Yikai Zhang, Ke Wan, Yucheng Li, Yeyang Zhou, Li-Wen Chang, Jiuxiang Gu, Zhen Dong, Anima Anandkumar, Abedelkadir Asi, Junjie Hu
NeurIPS 2025 (Conference on Neural Information Processing Systems)
Shrink the cache, keep the brains.
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
ICML 2025 Workshop (International Conference on Machine Learning)
Fine-grained, head-wise offloading strategy.
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ICML 2025 Spotlight (International Conference on Machine Learning)
High-throughput long-context LLM inference system.
Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun*, Momin Haider*, Ruiqi Zhang*, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, and Andrea Zanette* (* denotes core authors)
NeurIPS 2024 (Conference on Neural Information Processing Systems)
Fast inference-time alignment algorithm.
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
COLM 2024 (Conference on Language Modeling)
Training-free lossless long-sequence generation acceleration.
BMAD: Benchmarks for Medical Anomaly Detection
Jinan Bao, Hanshi Sun, Hanqiu Deng, Zhaoxiang Zhang, and Xingyu Li
CVPR 2024 Workshop (Computer Vision and Pattern Recognition)
Six datasets across five medical domains, three evaluation metrics, and fourteen state-of-the-art AD algorithms.
Combating Medical Noisy Labels by Disentangled Distribution Learning and Consistency Regularization
Yi Zhou, Lei Huang, Tao Zhou, and Hanshi Sun
FGCS 2023 (Future Generation Computer Systems)
Disentangled distribution learning reduces the effect of label uncertainty and ambiguity.
Arrhythmia Classifier Using Convolutional Neural Network with Adaptive Loss-aware Multi-bit Networks Quantization
Hanshi Sun, Ao Wang, Ninghao Pu, Zhiqing Li, Junguang Huang, Hao Liu, and Zhi Qi
ICAICE 2021
1-D adaptive loss-aware quantization with 23.36× memory reduction.

Services