Advanced Training
References
- Mixed Precision Training - Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, et al. - 2017
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism - Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, et al. - 2018
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding - Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, et al. - 2020
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models - Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He - 2019 
- LoRA: Low-Rank Adaptation of Large Language Models - Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, et al. - 2021
- 8-Bit Approximations for Parallelism in Deep Learning - Tim Dettmers - 2015 
- 8-bit Optimizers via Block-wise Quantization - Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer - 2021 
- The case for 4-bit precision: k-bit Inference Scaling Laws - Tim Dettmers, Luke Zettlemoyer - 2022 
- QLoRA: Efficient Finetuning of Quantized LLMs - Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer - 2023 
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection - Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian - 2024 
- Training Deep Nets with Sublinear Memory Cost - Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin - 2016 
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness - Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré - 2022 
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning - Tri Dao - 2023 
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision - Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao - 2024 
- https://github.com/ray-project/ray
- https://github.com/Lightning-AI/pytorch-lightning