Deep Learning
GANsformers: Generate complex scenes using GANs and Transformers
They basically leverage transformers’ attention mechanism in the powerful StyleGAN2 architecture to make it even more powerful!
Watch the video:
Last week we looked at DALL-E, OpenAI’s most recent paper.
It uses an architecture similar to GPT-3, involving transformers, to generate an image from text. This is a super interesting and complex task called text-to-image translation. As you can see in the video below, the results were surprisingly good compared to previous state-of-the-art techniques, mainly due to the use of transformers and a large amount of data.
This week we will look at a very similar task called visual generative modelling, where the goal is to generate a complete scene in high resolution, such as a road or a room, rather than a single face or a specific object. This is different from DALL-E since we are not generating the scene from text, but from a model trained on a specific style of scenes, which is bedrooms in this case.

Rather, it works just like StyleGAN, which can generate unique, non-existent human faces after being trained on datasets of real faces.
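To give a feel for the attention mechanism the GANsformer borrows from transformers, here is a minimal NumPy sketch of scaled dot-product attention. This is an illustrative toy, not the paper's actual implementation; the array sizes and variable names are assumptions chosen for the example.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each query produces a weighted
    sum of the values, with weights given by query-key similarity.
    This is the core mechanism transformers (and the GANsformer) use
    to let latent variables attend over image features."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # similarity between queries and keys
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # weighted sum of values

# Toy example: 4 latent "queries" attending over 6 image-feature "keys/values"
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
out = attention(q, k, v)
print(out.shape)  # (4, 8): one attended feature vector per query
```

In the GANsformer, attention like this flows in both directions between the latent variables and the image features, so different latents can specialize in different regions of the scene.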

