Your Daily AI Research tl;dr: 2022-10-13 🧠
The largest public dataset of real-world 3D spaces with dense semantic annotations, a benchmark for multimodal models, and more!
Welcome to your official daily AI research tl;dr (often with code and news) for AI professionals. Each day I share the most exciting papers I find, along with a one-line summary to help you quickly decide whether the paper (and its code) is worth investigating.
1️⃣ f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
The authors extend diffusion models to incorporate a set of (hand-designed or learned) transformations, where the transformed input is the mean of each diffusion step, allowing the model to learn more abstract representations.
Link to the paper: https://arxiv.org/pdf/2210.04955.pdf
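To make the one-liner above a bit more concrete, here is a minimal NumPy sketch of what "the transformed input is the mean of each diffusion step" could look like. It is a simplified illustration, not the paper's exact formulation: the function names, the average-pooling transformation, and the alpha/sigma values are all assumptions chosen for the example.

```python
import numpy as np

def downsample(x, factor):
    """Hypothetical hand-designed transformation: average-pool a 2D array."""
    if factor == 1:
        return x
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def noisy_sample(x0, alpha, sigma, factor, rng):
    """Draw x_t ~ N(alpha * f(x0), sigma^2 I), with f the transformation above.

    A standard diffusion step would use alpha * x0 as the mean; here the
    mean is built from the transformed input instead.
    """
    mean = alpha * downsample(x0, factor)
    return mean + sigma * rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))  # stand-in for an image
x_t = noisy_sample(x0, alpha=0.8, sigma=0.6, factor=2, rng=rng)
print(x_t.shape)  # (4, 4)
```

With factor=1 this reduces to the usual Gaussian corruption of x0, which is one way to see the transformation as an extension of a standard diffusion step.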
2️⃣ [DeepMind] Perception Test: A Diagnostic Benchmark for Multimodal Models
A novel multimodal benchmark, the Perception Test, that aims to extensively evaluate the perception and reasoning skills of multimodal models. It uses real-world videos designed to show perceptually interesting situations and defines multiple tasks that…