Deep Learning
GANsformers: Generate complex scenes using GANs and Transformers
They basically leverage transformers’ attention mechanism in the powerful StyleGAN2 architecture to make it even more powerful!
Watch the video:
Last week we looked at DALL-E, OpenAI’s most recent paper.
It uses an architecture similar to GPT-3, involving transformers, to generate an image from text. This is a super interesting and complex task called text-to-image translation. As you can see in the video below, the results were surprisingly good compared to previous state-of-the-art techniques, mainly due to the use of transformers and a large amount of data.
This week we will look at a very similar task called visual generative modelling, where the goal is to generate a complete scene in high resolution, such as a road or a room, rather than a single face or a specific object. This is different from DALL-E since we are not generating the scene from text, but from a model trained on a specific style of scenes, which is bedrooms in this case.

Rather, it works just like StyleGAN, which can generate unique, non-existent human faces after being trained on datasets of real faces.
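To give a feel for the attention mechanism the GANsformer borrows from transformers, here is a minimal NumPy sketch of scaled dot-product attention. This is an illustrative toy, not the paper's actual implementation; the array sizes and variable names are assumptions chosen for the example.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each query produces a weighted
    sum of the values, with weights given by query-key similarity.
    This is the core mechanism transformers (and the GANsformer) use
    to let latent variables attend over image features."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # similarity between queries and keys
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # weighted sum of values

# Toy example: 4 latent "queries" attending over 6 image-feature "keys/values"
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
out = attention(q, k, v)
print(out.shape)  # (4, 8): one attended feature vector per query
```

In the GANsformer, attention like this flows in both directions between the latent variables and the image features, so different latents can specialize in different regions of the scene.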

