Bonus - Reinforcement Learning and LLMs

Slides
Video Lecture

References

  1. Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs. Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, et al. 2024.
  2. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, et al. 2024.
  3. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, et al. 2025.
  4. Reinforcement Learning for Long-Horizon Interactive LLM Agents. Kevin Chen, Marco Cusumano-Towner, Brody Huval, Aleksei Petrenko, Jackson Hamburger, et al. 2025.
  5. Buy 4 REINFORCE Samples, Get a Baseline for Free! Wouter Kool, Herke van Hoof, Max Welling. 2019.