Your Daily AI Research tl;dr — 2022–09–15 🧠

StoryDALL-E, CLIP-ViP and the 2022 Noonies…

2 min read · Sep 15, 2022

Welcome to your official daily AI research tl;dr (often with code and news) for AI professionals. Each day I share the most exciting papers I find, along with a one-liner summary to help you quickly decide whether the paper (and its code) is worth investigating.

1️⃣ StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation

“We first propose the task of story continuation, where the generated visual story is conditioned on a source image, allowing for better generalization to narratives with new characters. […] Our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation.”

Link to the paper: https://arxiv.org/pdf/2209.06192.pdf

2️⃣ CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

“We propose a Omnisource Cross-modal Learning method equipped with a Video Proxy mechanism on the basis of CLIP, namely CLIP-ViP […] improving the performance of CLIP on video-text retrieval by a large margin, also achieving SOTA…

Written by Louis-François Bouchard