Distributed Training
Slides / Video Lecture
References
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, et al. 2018.
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, et al. 2020.