References
- Learning Transferable Visual Models From Natural Language Supervision. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, et al. 2021.
- Reproducible scaling laws for contrastive language-image learning. Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, et al. 2022.
- DataComp: In search of the next generation of multimodal datasets. Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, et al. 2023.
- Image Captioners Are Scalable Vision Learners Too. Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer. 2023.
- LocCa: Visual Pretraining with Location-aware Captioners. Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, et al. 2024.