Large Language Models
References
- PaLM: Scaling Language Modeling with Pathways. Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, et al. 2022.
- Gemini: A Family of Highly Capable Multimodal Models. Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. 2023.
- Mistral 7B. Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, et al. 2023.
- Mixtral of Experts. Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, et al. 2024.
- Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. 2018.
- Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al. 2017.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 2018.
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. Zeyuan Allen-Zhu, Yuanzhi Li. 2023.
- Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. 2019.
- Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, et al. 2020.
- https://commoncrawl.org/
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, et al. 2020.
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Albert Gu, Tri Dao. 2023.
- Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, Christopher Ré. 2021.
- The Curious Case of Neural Text Degeneration. Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi. 2019.
- Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs. Minh Nguyen, Andrew Baker, Clement Neo, Allen Roush, Andreas Kirsch, Ravid Shwartz-Ziv. 2024.
- Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, et al. 2022.
- Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs. Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, et al. 2024.
- Simple statistical gradient-following algorithms for connectionist reinforcement learning. Ronald J. Williams. 1992.
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn. 2023.
- DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs. Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner. 2019.
- PIQA: Reasoning about Physical Commonsense in Natural Language. Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, Yejin Choi. 2019.
- Measuring Massive Multitask Language Understanding. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. 2020.
- Training Verifiers to Solve Math Word Problems. Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, et al. 2021.
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale. Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi. 2019.
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, et al. 2022.
- AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models. Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, et al. 2023.
- Evaluating Large Language Models Trained on Code. Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, et al. 2021.
- Program Synthesis with Large Language Models. Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, et al. 2021.
- Ring Attention with Blockwise Transformers for Near-Infinite Context. Hao Liu, Matei Zaharia, Pieter Abbeel. 2023.
- Sequence Parallelism: Long Sequence Training from System Perspective. Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You. 2021.
- Reducing Activation Recomputation in Large Transformer Models. Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, et al. 2022.
- DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training. Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang. 2023.
- Efficient Memory Management for Large Language Model Serving with PagedAttention. Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, et al. 2023.
- PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling. Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, et al. 2024.
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai. 2023.
- Fast Inference from Transformers via Speculative Decoding. Yaniv Leviathan, Matan Kalman, Yossi Matias. 2022.
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao. 2024.
- https://github.com/pytorch/torchtune
- https://github.com/vllm-project/vllm
- https://huggingface.co/models
- https://lmsys.org/
- https://ollama.com/
- https://github.com/ggerganov/llama.cpp
- Toolformer: Language Models Can Teach Themselves to Use Tools. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, et al. 2023.
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls. Yu Du, Fangyun Wei, Hongyang Zhang. 2024.
- The Llama 3 Herd of Models. Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, et al. 2024.
- Synchromesh: Reliable code generation from pre-trained language models. Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, et al. 2022.
- Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation. Luca Beurer-Kellner, Marc Fischer, Martin Vechev. 2024.
- Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search. Chris Hokamp, Qun Liu. 2017.
- Long Context Compression with Activation Beacon. Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou. 2024.
- RoFormer: Enhanced Transformer with Rotary Position Embedding. Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu. 2021.
- Extending Context Window of Large Language Models via Positional Interpolation. Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian. 2023.
- Reading Wikipedia to Answer Open-Domain Questions. Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes. 2017.
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, et al. 2020.
- REALM: Retrieval-Augmented Language Model Pre-Training. Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang. 2020.
- Improving language models by retrieving from trillions of tokens. Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, et al. 2021.
- In-Context Retrieval-Augmented Language Models. Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham. 2023.
- Vision Transformers Need Registers. Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski. 2023.
- Massive Activations in Large Language Models. Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu. 2024.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, et al. 2022.
- Self-Consistency Improves Chain of Thought Reasoning in Language Models. Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, et al. 2022.
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan. 2023.
- ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao. 2022.
- Reflexion: Language Agents with Verbal Reinforcement Learning. Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao. 2023.
- Generative Verifiers: Reward Modeling as Next-Token Prediction. Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal. 2024.
- ChatGPT is bullshit. Michael Townsen Hicks, James Humphries, Joe Slater. 2024.
- Large Language Models Cannot Self-Correct Reasoning Yet. Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou. 2023.
- Dissociating language and thought in large language models. Kyle Mahowald, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, et al. 2023.
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures. Zeyuan Allen-Zhu, Yuanzhi Li. 2023.