References
- PaLM: Scaling Language Modeling with Pathways. Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, et al. 2022.
- Gemini: A Family of Highly Capable Multimodal Models. Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. 2023.
- Mistral 7B. Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, et al. 2023.
- Mixtral of Experts. Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, et al. 2024.
- Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. 2018.
- Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al. 2017.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 2018.
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. Zeyuan Allen-Zhu, Yuanzhi Li. 2023.
- Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. 2019.
- Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, et al. 2020.
- Common Crawl. https://commoncrawl.org/
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, et al. 2020.
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Albert Gu, Tri Dao. 2023.
- Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, Christopher Ré. 2021.