Paged Attention

Slides
Video Lecture

References

  1. Efficient Memory Management for Large Language Model Serving with PagedAttention. Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, et al. 2023.
  2. PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling. Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, et al. 2024.
  3. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai. 2023.
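As a companion to reference 1, the sketch below illustrates the core idea of PagedAttention: the KV cache is stored in fixed-size physical blocks, and a per-sequence block table maps the logically contiguous token sequence onto scattered blocks, much like virtual-memory paging. This is a hypothetical toy model (the `PagedKVCache` class, `BLOCK_SIZE`, and method names are assumptions for illustration), not vLLM's actual implementation.

```python
# Toy sketch of PagedAttention's block-table indirection for the KV cache.
# All names here are illustrative assumptions, not vLLM's API.

BLOCK_SIZE = 4  # tokens per physical block (assumed for illustration)

class PagedKVCache:
    def __init__(self):
        # Each physical block holds up to BLOCK_SIZE (key, value) pairs.
        self.physical_blocks = []
        # Per-sequence block table: sequence id -> list of physical block indices.
        self.block_tables = {}

    def append(self, seq_id, kv):
        """Append one token's (key, value) pair to a sequence's cache."""
        table = self.block_tables.setdefault(seq_id, [])
        # Allocate a fresh physical block when none exists or the last is full;
        # blocks need not be contiguous in physical memory.
        if not table or len(self.physical_blocks[table[-1]]) == BLOCK_SIZE:
            self.physical_blocks.append([])
            table.append(len(self.physical_blocks) - 1)
        self.physical_blocks[table[-1]].append(kv)

    def gather(self, seq_id):
        """Reassemble the logically contiguous KV sequence from its blocks."""
        return [kv
                for block_idx in self.block_tables.get(seq_id, [])
                for kv in self.physical_blocks[block_idx]]

cache = PagedKVCache()
for t in range(6):
    cache.append("seq0", (f"k{t}", f"v{t}"))
# 6 tokens with BLOCK_SIZE = 4 occupy two physical blocks; the second block
# has two free slots, so fragmentation is bounded by one block per sequence.
```

Because allocation happens one block at a time on demand, memory waste is limited to the unfilled tail of each sequence's last block, which is the property the PagedAttention paper exploits to raise serving throughput.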