Masked Autoencoders that Listen, Diffusion Model for Text-to-Speech, and Andrej Karpathy leaves Tesla

Your Daily AI Research tl;dr: 2022-07-16 🧠

Jul 16, 2022

Welcome to your official daily AI research tl;dr (often with code and news) for AI enthusiasts, where I share the most exciting papers I find each day along with a one-line summary to help you quickly decide whether the article (and code) is worth investigating. I will also use this opportunity to share exciting news from the field.

Let’s get started with this iteration!

1️⃣ Masked Autoencoders that Listen

“Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens [incorporating local window attention], in order to reconstruct the input spectrogram.”

Link to the paper: https://arxiv.org/pdf/2207.06405.pdf

Code: https://github.com/facebookresearch/AudioMAE
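To make the quoted pipeline concrete, here is a minimal NumPy sketch of the data flow: patchify the spectrogram, randomly mask a high ratio of patches, encode only the visible ones, pad with mask tokens, restore the original order, and apply local window attention before reconstructing. This is an illustration under stated assumptions, not the AudioMAE repository's code. The encoder and decoder are stand-in random linear projections rather than Transformer stacks, positional embeddings are omitted, the mask token is a fixed zero vector rather than a learned one, and the 0.8 masking ratio, 16x16 patches and 2x2 window are illustrative defaults. The loss on masked patches only follows the original image MAE recipe and is an assumption here. All function names are hypothetical.

```python
import numpy as np

def patchify(spec, patch):
    """Split a (freq, time) spectrogram into flattened, non-overlapping patch x patch tiles."""
    f, t = spec.shape
    assert f % patch == 0 and t % patch == 0, "pad the spectrogram to a multiple of the patch size"
    gf, gt = f // patch, t // patch
    tiles = spec.reshape(gf, patch, gt, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(gf * gt, patch * patch), (gf, gt)

def random_masking(tokens, mask_ratio, rng):
    """Keep a random subset of tokens; also return the inverse permutation for the decoder."""
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    ids_shuffle = np.argsort(rng.random(n))   # random permutation of patch indices
    ids_restore = np.argsort(ids_shuffle)     # inverse permutation
    ids_keep = ids_shuffle[:n_keep]
    mask = np.ones(n, dtype=bool)             # True = patch hidden from the encoder
    mask[ids_keep] = False
    return tokens[ids_keep], ids_restore, mask

def window_attention_mask(grid, window):
    """Boolean (n, n) matrix: patch p may attend to q only if both lie in the same local window."""
    gf, gt = grid
    wf, wt = window
    rows, cols = np.divmod(np.arange(gf * gt), gt)
    same_f = (rows // wf)[:, None] == (rows // wf)[None, :]
    same_t = (cols // wt)[:, None] == (cols // wt)[None, :]
    return same_f & same_t

def masked_attention(x, allowed):
    """Single-head self-attention (no learned projections) restricted by a boolean mask."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # diagonal is always allowed, so max is finite
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def audio_mae_forward(spec, patch=16, mask_ratio=0.8, dim=32, window=(2, 2), seed=0):
    rng = np.random.default_rng(seed)
    tokens, grid = patchify(spec, patch)
    n, patch_dim = tokens.shape
    visible, ids_restore, mask = random_masking(tokens, mask_ratio, rng)

    # Stand-in "encoder" (hypothetical): a random linear map applied ONLY to visible tokens.
    w_enc = rng.normal(size=(patch_dim, dim)) / np.sqrt(patch_dim)
    encoded = visible @ w_enc

    # Pad with a shared mask token, then undo the shuffle so each token returns
    # to its original time-frequency position.
    mask_token = np.zeros(dim)
    padded = np.concatenate([encoded, np.tile(mask_token, (n - len(encoded), 1))])
    dec_in = padded[ids_restore]

    # Local window attention over the restored patch grid, then project back to patch values.
    dec_out = masked_attention(dec_in, window_attention_mask(grid, window))
    w_dec = rng.normal(size=(dim, patch_dim)) / np.sqrt(dim)
    recon = dec_out @ w_dec

    # Assumed MAE-style objective: mean squared error on the masked patches only.
    loss = np.mean((recon[mask] - tokens[mask]) ** 2)
    return recon, mask, loss

spec = np.random.default_rng(1).normal(size=(128, 256))  # hypothetical 128 bins x 256 frames
recon, mask, loss = audio_mae_forward(spec)
print(recon.shape, int(mask.sum()))  # recon: (128, 256); 103 of 128 patches masked
```

The design point worth noticing is the argsort trick: shuffling with the argsort of random noise and keeping the inverse permutation (`ids_restore`) lets the encoder process only the roughly 20% of visible patches, while the decoder can still place every token back on the spectrogram grid, which is what makes local (windowed) attention in the decoder meaningful.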

2️⃣ ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech

Written by Louis-François Bouchard