Vision Language Models - Image Captioning

Slides
Video Lecture

References

  1. Learning Transferable Visual Models From Natural Language Supervision. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, et al. 2021.
  2. Reproducible scaling laws for contrastive language-image learning. Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, et al. 2022.
  3. DataComp: In search of the next generation of multimodal datasets. Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, et al. 2023.
  4. Image Captioners Are Scalable Vision Learners Too. Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer. 2023.
  5. LocCa: Visual Pretraining with Location-aware Captioners. Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, et al. 2024.