Your Daily AI Research tl;dr — 2022–09–15 🧠
StoryDALL-E, CLIP-ViP and the 2022 Noonies…
Welcome to your official daily AI research tl;dr (often with code and news) for AI professionals, where I share the most exciting papers I find each day, along with a one-line summary to help you quickly decide whether the article (and code) is worth investigating.
1️⃣ StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation
“We first propose the task of story continuation, where the generated visual story is conditioned on a source image, allowing for better generalization to narratives with new characters. […] Our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation.”
Link to the paper: https://arxiv.org/pdf/2209.06192.pdf
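The task the paper proposes can be sketched as a simple interface: given a source image and one caption per frame, generate the remaining frames of the story, each conditioned on that source image. A minimal illustration with hypothetical names (this is a sketch of the task setup, not the authors' code; a real generator would be a retro-fitted pretrained text-to-image transformer):

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in for an image tensor -- hypothetical, for illustration only.
Image = List[float]

@dataclass
class StoryContinuationExample:
    source_image: Image   # the conditioning frame with the story's characters
    captions: List[str]   # one caption per frame still to be generated

def continue_story(
    generate_frame: Callable[[Image, str], Image],
    example: StoryContinuationExample,
) -> List[Image]:
    """Generate one frame per caption, each conditioned on the source image."""
    return [generate_frame(example.source_image, c) for c in example.captions]

# Dummy generator so the sketch runs end to end: it just echoes the source.
dummy = lambda img, caption: img
ex = StoryContinuationExample(source_image=[0.0, 1.0],
                              captions=["a", "b", "c"])
frames = continue_story(dummy, ex)
print(len(frames))  # one generated frame per caption
```

The point of the conditioning frame is that new characters can be introduced at inference time through the source image rather than having to appear in the training stories.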
2️⃣ CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
“We propose an Omnisource Cross-modal Learning method equipped with a Video Proxy mechanism on the basis of CLIP, namely CLIP-ViP […] improving the performance of CLIP on video-text retrieval by a large margin, also achieving SOTA…”
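For context, the common zero-shot baseline such methods improve on treats a video as a bag of frames: encode each frame with CLIP's image encoder, mean-pool into a single video embedding, and rank videos by cosine similarity against text embeddings. A minimal sketch of that baseline, with random vectors standing in for CLIP encoder outputs (not the paper's Video Proxy mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for CLIP outputs: 4 videos x 8 frames x 512-d frame embeddings,
# and 4 query text embeddings. Random here; a real pipeline would use
# CLIP's image and text encoders.
frame_emb = rng.normal(size=(4, 8, 512))
text_emb = rng.normal(size=(4, 512))

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Mean-pool normalized frame embeddings into one vector per video,
# then re-normalize so dot products are cosine similarities.
video_emb = l2_normalize(l2_normalize(frame_emb).mean(axis=1))
text_emb = l2_normalize(text_emb)

# Similarity matrix: rows = texts, cols = videos.
sim = text_emb @ video_emb.T
retrieved = sim.argmax(axis=1)  # best-matching video index per text query
print(sim.shape, retrieved.shape)
```

Mean-pooling ignores temporal order entirely, which is one motivation for learning video-specific tokens on top of a frozen or fine-tuned CLIP backbone instead.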