FlashAttention
Slides · Video Lecture
References
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness - Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré - 2022
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning - Tri Dao - 2023
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision - Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao - 2024
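The papers above compute exact attention block by block with an online softmax so the full attention matrix is never materialized in slow memory. The NumPy sketch below illustrates only that rescaling arithmetic (the function name `tiled_attention` and the `block_size` parameter are illustrative choices, not from the papers); the actual speedups come from fusing these steps into a single GPU kernel that keeps each block in on-chip SRAM, which a NumPy loop does not capture.

```python
# Illustrative sketch of the tiling + online-softmax idea behind FlashAttention:
# keys/values are processed in blocks and the softmax normalization is
# corrected on the fly, so no N x N score matrix is ever formed.
import numpy as np

def tiled_attention(Q, K, V, block_size=128):
    """Exact softmax(Q K^T / sqrt(d)) V, computed one K/V block at a time."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)                 # running (unnormalized) output
    row_max = np.full(N, -np.inf)          # running max score per query row
    row_sum = np.zeros(N)                  # running softmax denominator per row

    for start in range(0, K.shape[0], block_size):
        Kb = K[start:start + block_size]   # (B, d) block of keys
        Vb = V[start:start + block_size]   # (B, d) block of values
        S = (Q @ Kb.T) * scale             # (N, B) scores for this block only

        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)      # rescale old accumulators
        P = np.exp(S - new_max[:, None])            # block's unnormalized probs
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vb
        row_max = new_max

    return out / row_sum[:, None]

# Quick check against the naive O(N^2)-memory implementation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
S = Q @ K.T / np.sqrt(64)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)
```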