Architectures

Slides
Video Lecture

References

  1. PaLM: Scaling Language Modeling with Pathways. Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, et al. 2022.
  2. Gemini: A Family of Highly Capable Multimodal Models. Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. 2023.
  3. Mistral 7B. Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, et al. 2023.
  4. Mixtral of Experts. Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, et al. 2024.
  5. Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. 2018.
  6. Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al. 2017.
  7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 2018.
  8. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. Zeyuan Allen-Zhu, Yuanzhi Li. 2023.
  9. Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. 2019.
  10. Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, et al. 2020.
  11. https://commoncrawl.org/
  12. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, et al. 2020.
  13. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Albert Gu, Tri Dao. 2023.
  14. Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, Christopher Ré. 2021.