FlashAttention
Slides | Video Lecture
References
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré. 2022.
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. Tri Dao. 2023.
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao. 2024.