FlashAttention

References

  1. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré. 2022.
  2. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. Tri Dao. 2023.
  3. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao. 2024.