References
- Fast Inference from Transformers via Speculative Decoding. Yaniv Leviathan, Matan Kalman, Yossi Matias. 2022.
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao. 2024.