Speculative Decoding

Slides
Video Lecture

References

  1. Fast Inference from Transformers via Speculative Decoding. Yaniv Leviathan, Matan Kalman, Yossi Matias. 2022.
  2. Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao. 2024.