FlashAttention
Slides | Video Lecture
References
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré. 2022.
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. Tri Dao. 2023.
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao. 2024.