Distributed Training
References
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism - Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, et al. - 2018
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding - Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, et al. - 2020