Distributed Training

References

  1. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, et al. 2018.
  2. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, et al. 2020.