Masked Autoencoders that Listen, Diffusion Model for Text-to-Speech, and Andrej Karpathy leaves Tesla
Your Daily AI Research tl;dr — 2022–07–16 🧠
Welcome to your official daily AI research tl;dr (often with code and news) for AI enthusiasts. I share the most exciting papers I find each day, along with a one-liner summary to help you quickly decide whether the paper (and its code) is worth investigating. I also take this opportunity to share exciting news from the field.
Let’s get started with this iteration!
1️⃣ Masked Autoencoders that Listen
“Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens [incorporating local window attention], in order to reconstruct the input spectrogram.”
Link to the paper: https://arxiv.org/pdf/2207.06405.pdf
Code: https://github.com/facebookresearch/AudioMAE
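To make the quote above more concrete, here is a minimal toy sketch of the mask-then-restore flow it describes. It is not the code from the AudioMAE repository. The function names `mask_patches` and `restore_sequence`, the 0.8 masking ratio, and the string stand-ins for patches and encoder outputs are all assumptions for illustration. The quote only says the masking ratio is "high", and the local window attention in the decoder is left out entirely.

```python
import random

def mask_patches(num_patches, mask_ratio, seed=0):
    """Randomly split patch indices into visible (kept) and masked sets."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = round(num_patches * mask_ratio)
    num_keep = num_patches - num_masked
    keep_idx = sorted(order[:num_keep])
    mask_idx = sorted(order[num_keep:])
    return keep_idx, mask_idx

def restore_sequence(encoded_visible, keep_idx, num_patches, mask_token):
    """Put encoded tokens back at their original positions; pad the rest with mask tokens."""
    full = [mask_token] * num_patches
    for token, pos in zip(encoded_visible, keep_idx):
        full[pos] = token
    return full

# Toy "spectrogram" already split into 10 patches (labels stand in for real patch data).
patches = [f"p{i}" for i in range(10)]

# Assumed masking ratio of 0.8, chosen only for illustration.
keep_idx, mask_idx = mask_patches(len(patches), mask_ratio=0.8)

# Only the non-masked patches go through the (stand-in) encoder.
visible = [patches[i] for i in keep_idx]
encoded = [f"enc({p})" for p in visible]

# The decoder input is the re-ordered sequence, padded with mask tokens.
decoder_input = restore_sequence(encoded, keep_idx, len(patches), "[MASK]")
print(decoder_input)
```

The point of the design is that the encoder only ever sees the small visible subset, which keeps encoding cheap, while the decoder receives the full-length sequence and has to reconstruct the masked positions.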