NanoFlow: Towards Optimal Large Language Model Serving Throughput
Published on arXiv, 2024
Throughput-oriented LLM serving system
Authors: Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci
