Even with everything that happened in the world this year, we still had the chance to see a lot of amazing research come out, especially in the field of artificial intelligence. Moreover, many important aspects were highlighted this year, like the ethical questions, important biases, and much more. Artificial intelligence, and our understanding of the human brain and its link to AI, is constantly evolving, showing promising applications in the near future.
Neural scene representation from a single image is a really complex problem. The "end goal" is to be able to take a picture of a real-life object and translate that picture into a 3D scene. This requires the model to understand a whole three-dimensional, real-life scene using the information from a single picture. This can be hard even for humans, when the colors or shadows in the image trick our eyes.
Odei Garcia-Garin et al. from the University of Barcelona have developed a deep learning-based algorithm able to detect and quantify floating garbage from aerial images. They also made a web-oriented application that allows users to identify this garbage, called floating marine macro-litter, or FMML, within images of the sea surface. Floating marine macro-litter is any persistent, manufactured, or processed solid material lost or abandoned in a marine compartment. As you most certainly know, this plastic waste is dangerous for fish, turtles, and marine mammals, which can either ingest it or get entangled and hurt.
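To make the detect-and-quantify idea concrete, here is a minimal sketch (not the authors' code): split an aerial image into tiles, score each tile, and report a litter count and density. The `litter_score` function is a hypothetical stand-in for the paper's trained convolutional network.

```python
import numpy as np

def litter_score(tile):
    """Hypothetical stand-in for a trained CNN: here, bright tiles
    count as floating litter against a dark sea surface."""
    return float(tile.mean() > 200)

def quantify_fmml(image, tile_size=32):
    """Slide a non-overlapping tile grid over the image, score each
    tile, and return the litter count and density per tile."""
    h, w = image.shape
    hits, tiles = 0.0, 0
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            hits += litter_score(image[y:y + tile_size, x:x + tile_size])
            tiles += 1
    return hits, hits / tiles

sea = np.zeros((64, 64))    # dark sea surface
sea[0:32, 32:64] = 255      # one bright patch standing in for litter
count, density = quantify_fmml(sea)
print(count, density)       # one of four tiles is flagged
```

A real system would replace `litter_score` with a network trained on labeled aerial imagery, but the tiling-and-aggregation step is the same.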
Here are the 3 most interesting research papers of the month, in case you missed any of them. It is a curated list of the latest breakthroughs in AI and Data Science by release date with a clear video explanation, link to a more in-depth article, and code (if applicable). Enjoy the read, and let me know if I missed any important papers in the comments, or by contacting me directly on LinkedIn!
Follow me on Medium to see this AI top 3 every month!
OpenAI successfully trained a network able to generate images from text captions. …
If the title and subtitle sound like another language to you, this article was made for you!
You’ve probably heard of iGPT, or Image-GPT, recently published by OpenAI, which I covered on my channel. It is the state-of-the-art generative transformer model. OpenAI used the transformer architecture on a pixel representation of images to perform image synthesis. In short, they feed a transformer half the pixels of an image as input and have it generate the other half. As you can see here, it is extremely powerful.
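The half-image conditioning idea can be sketched as follows; this is a toy illustration, not OpenAI's implementation. The image is flattened into a 1D sequence of pixels, and each missing pixel is predicted from all the pixels before it. `toy_model` is a hypothetical stand-in for the trained autoregressive transformer.

```python
import numpy as np

def toy_model(context):
    """Hypothetical stand-in for a trained autoregressive model:
    simply repeats the last seen pixel value."""
    return context[-1]

def complete_image(image, n_known_rows):
    """Condition on the first n_known_rows of the image and generate
    the remaining pixels one at a time, autoregressively."""
    h, w = image.shape
    seq = image.flatten().astype(float)
    n_known = n_known_rows * w          # pixels given as conditioning
    for i in range(n_known, h * w):     # generate the rest, one by one
        seq[i] = toy_model(seq[:i])
    return seq.reshape(h, w)

img = np.arange(16).reshape(4, 4)       # tiny 4x4 "image"
out = complete_image(img, n_known_rows=2)
print(out)  # the bottom half is filled in from the top half
```

In the real iGPT, the model outputs a learned distribution over the next pixel's value rather than copying the previous one, but the sequential completion loop is the same.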
A team of researchers from Google, MIT, and the University of Washington recently published a paper called “VOGUE: Try-On by StyleGAN Interpolation Optimization”. They use a GAN architecture to create an online fitting room, where you can automatically try on any pants or shirt you want using only an image of yourself. Also called garment transfer, the goal is to take the clothes from a person in one picture and transfer them onto someone else while preserving the correct body shape, hair, and skin color. …
I will let Francesca Rossi introduce this article with her great remark made at the AI Debate 2 organized by Montreal AI:
These are the reasons why Francesca Rossi and her team at IBM published this paper proposing a research direction to advance AI, drawing inspiration from cognitive theories of human decision-making. …
DALL-E is a new neural network developed by OpenAI based on GPT-3.
In fact, it’s a smaller version of GPT-3, using 12 billion parameters instead of 175 billion. But it has been specifically trained to generate images from text descriptions, using a dataset of text-image pairs instead of a very broad dataset like GPT-3’s. It can create images from text captions written in natural language, just like GPT-3 creates websites and stories.
It’s a continuation of Image GPT and GPT-3, both of which I covered in previous videos, if you haven’t watched them yet.
DALL-E is very similar to GPT-3…
NeRV, or Neural Reflectance and Visibility Fields for Relighting and View Synthesis, is a method that produces a 3D representation of a scene that can be rendered under arbitrary lighting. It only needs a set of images of the scene as input to generate novel viewpoints under any chosen lighting conditions!