/r/DeepGenerative
This is a place to share and discuss deep generative models such as Variational Auto-Encoders (VAEs) and Generative Adversarial Networks (GANs).
Deep generative modeling is a new sub-field of machine learning that uses deep neural networks to generate convincing "samples" that resemble existing data.
This sub aims to be an intersection of industry, researcher, and hobbyist interests in this burgeoning field. Posts lacking effort will be deleted, and simple questions should be asked in the weekly simple questions thread. This is not a place for anger or off-topic discussions. That being said, hopefully we can end up with a tight-knit community here on reddit.
We’ll try to prepare a very popular attack, the Fast Gradient Sign Method, to demonstrate the security vulnerabilities of neural networks.
We cover all three steps:
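As a rough illustration of the Fast Gradient Sign Method itself (a minimal sketch, not the code from the linked article): FGSM nudges each input feature by a small step eps in the direction of the sign of the loss gradient with respect to the input. The logistic-regression "network", its weights, the input and eps below are made-up toy values, chosen so the example runs without a deep learning library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Probability of class 1 under a toy logistic-regression model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce_loss(w, b, x, y):
    # Binary cross-entropy for a single example with label y in {0, 1}.
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    # For logistic regression with cross-entropy, d(loss)/dx = (p - y) * w.
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # Step every feature by eps in the direction that increases the loss.
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (assumptions for illustration only).
w, b = [2.0, -3.0, 1.0], 0.5
x, y = [0.5, -0.2, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.5)

print("clean prediction:      ", round(predict(w, b, x), 3))
print("adversarial prediction:", round(predict(w, b, x_adv), 3))
```

In this toy setting the perturbation is bounded by eps per feature, yet it is enough to push the prediction across the 0.5 decision boundary. With a real network the input gradient would come from backpropagation rather than a closed-form expression.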
It’s safe to assume a topic can be considered mainstream when it is the basis for an opinion piece in the Guardian. What is unusual is when that topic is a fairly niche area that involves applying Deep Learning techniques to develop natural language models. What is even more unusual is when one of those models (GPT-3) wrote the article itself!
Understandably, this caused a flurry of apocalyptic, Terminator-esque social media buzz (and some criticism of the Guardian for being misleading about GPT-3’s ability).
Nevertheless, the rapid progress made in recent years in this field has resulted in Language Models (LMs) like GPT-3. Many claim that these LMs understand language due to their ability to write Guardian opinion pieces, generate React code, or perform a series of other impressive tasks.
To understand NLP, we need to look at three aspects of these Language Models:
So how good are these models?
Can Deep Learning Models Like BERT Ever Understand Language?
If you don't mind, I would like to show you what we recently prepared.
The generic GAN setup is widely known: G and D play a min-max game in which each tries to outsmart the other.
That would be fine if things were that simple when you actually implement them. One common problem is an overly simplistic loss function.
Here, we analyse this problem by examining different variations of the GAN loss functions to get a better insight into how they actually work. We look at many loss function formulations and analyse issues like mode collapse, vanishing gradients and convergence.
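To make the vanishing-gradient issue concrete, here is a minimal sketch (not taken from the article) comparing two standard generator losses: the original minimax form log(1 - D(G(z))) and the non-saturating form -log D(G(z)). It assumes the discriminator output is D = sigmoid(a) for a logit a, and looks at the gradient with respect to a; the function names are illustrative only.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_g_loss(a):
    # Minimax generator loss: log(1 - sigmoid(a)) = -log(1 + exp(a)).
    return -math.log1p(math.exp(a))

def non_saturating_g_loss(a):
    # Non-saturating generator loss: -log(sigmoid(a)) = log(1 + exp(-a)).
    return math.log1p(math.exp(-a))

def saturating_grad(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def non_saturating_grad(a):
    # d/da -log(sigmoid(a)) = -(1 - sigmoid(a))
    return -(1.0 - sigmoid(a))

# a << 0 means the discriminator confidently rejects the generated sample,
# which is typical early in training.
for a in (-8.0, 0.0, 4.0):
    print(f"logit={a:+.1f}  "
          f"saturating grad={saturating_grad(a):+.4f}  "
          f"non-saturating grad={non_saturating_grad(a):+.4f}")
```

When the discriminator easily rejects fakes (large negative logit), the minimax gradient shrinks towards zero while the non-saturating gradient stays close to -1, which is one reason the non-saturating variant is commonly preferred in practice.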
We've attempted to give that insight in the article, so hopefully you find it useful.
Hello geeks,
I am new to deep generative models. I have a problem where I want to generate text describing trends in tabular data. Any ideas on how this can be achieved?
Hi guys,
We got our start making deepfakes on Reddit, and now we've launched our new mobile app that lets everyone make deepfakes. We're live on Product Hunt today. Check it out. We'd love your feedback: https://www.impressions.app/
A curated, quasi-exhaustive list of state-of-the-art publications and resources (sorted by citations/stars) about GANs & their applications.
I have recently started focusing on text-to-image synthesis with GANs on complex datasets (like MSCOCO).
After searching, some relevant works are StackGAN, Hong et al. and AttnGAN.
There seem to be two main approaches to synthesis: either generating from scratch, going from low resolution to a realistic high-resolution image, or generating from bounding boxes to shapes (masks) and finally to the image.
Here are some of my questions about the current state of text-to-image synthesis research:
I want to synthesise high-res images by concatenating two latent vectors (that is, not from a random sample). Does it make sense to train the AE with a GAN loss, or is it better to first train the AE and then, as a second step, improve the decoder with further training using a GAN loss? Does any of this make sense?
Hi everyone, Here is my implementation of the Progressive Growing of GANs from Nvidia Research: https://github.com/Latope2-150/Progressive_Growing_of_GANs-PyTorch
The original paper is this one: Progressive Growing of GANs for Improved Quality, Stability, and Variation
For now, there is only an MNIST example, but it is not very complicated to adapt it to other datasets. I haven't had the time to train it on large datasets, but I have tested it on 320x320 images, so I know it works for higher resolutions.
This implementation is as close as possible to the original one in its default configuration but can easily be modified. I trained it on a single Nvidia Tesla P100 and I still need to add [efficient] multi-GPU training.
Future work includes testing GroupNorm as normalization, making it conditional, changing the loss function (WGAN-GP for now), etc.
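For readers unfamiliar with the WGAN-GP loss mentioned above, here is a minimal sketch of its gradient penalty term, lam * (||grad D(x_hat)||_2 - 1)^2, evaluated at a random interpolation x_hat between a real and a generated sample. This is not code from the linked repository: the critic is a hypothetical toy function, and the gradient is estimated with finite differences rather than autograd so that the example needs only the standard library.

```python
import math
import random

def grad_norm(critic, x, h=1e-5):
    # L2 norm of the critic's gradient at x, via central finite differences.
    sq = 0.0
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        g = (critic(up) - critic(down)) / (2 * h)
        sq += g * g
    return math.sqrt(sq)

def gradient_penalty(critic, real, fake, lam=10.0, rng=random):
    # Penalise deviation of the gradient norm from 1 at a random point
    # on the straight line between the real and the generated sample.
    eps = rng.random()
    x_hat = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
    return lam * (grad_norm(critic, x_hat) - 1.0) ** 2

def critic_loss(critic, real, fake, lam=10.0, rng=random):
    # WGAN-GP critic objective for a single (real, fake) pair.
    return critic(fake) - critic(real) + gradient_penalty(critic, real, fake, lam, rng)

# Hypothetical toy critics (assumptions for illustration only).
def unit_critic(x):
    return 0.6 * x[0] + 0.8 * x[1]   # gradient norm 1, so no penalty

def steep_critic(x):
    return 3.0 * x[0] + 4.0 * x[1]   # gradient norm 5, so heavily penalised

real, fake = [1.0, 2.0], [-0.5, 0.3]
print(gradient_penalty(unit_critic, real, fake))
print(gradient_penalty(steep_critic, real, fake))
```

In an actual PyTorch implementation the gradient would normally be obtained through autograd on a batch of interpolated samples, and the penalty averaged over the batch.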
If you have any questions, feel free to ask!
I was just wondering what is the standard resource that people refer to when learning about GANs?
Thanks!
I am looking to build a model that implements a version of text-guided image translation.
For example, an image of a man + "walking" --> Image of man walking. Or something even simpler, but you get the basic idea. I am unable to find any existing research for this. Any suggestions/ new ideas will be very helpful :)
This is a project I played around with using affinelayer's pix2pix implementation. The goal was to generate baseball player headshots with an eye towards using them in the Out of the Park computer games for fictional players. I didn't quite get that far into it, but I did get some interesting results. You can see a sample of the system running on held-out test data here.
In most cases, pix2pix is able to correctly impute a variety of features of the original image from only a rough black-and-white sketch. It colors old-timey pictures black and white, it usually (not always) correctly colorizes hats based on team logos, and can often make a reasonable guess of a player's skin color. There are a handful of failure cases in the bunch, although some of them are failure cases of the process I used to generate the outlines.
The data set I used is a compilation of thousands of photos of almost everyone who's ever played Major League Baseball, available here. Photos of modern players are very consistently framed, but as you go back in time, you get more and more variety. Some players from the 1800s are merely sketches or extremely grainy, low-resolution blurs. I generated the training outlines using imagemagick's edge detector, although I think I need to tune the settings a bit to get a more consistent output, since a few players came out almost completely blank.
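To show the kind of photo-to-outline step described above, here is a minimal sketch using a plain Sobel filter with a threshold on a tiny grayscale grid. This is a stand-in, not the ImageMagick operator or settings actually used for the project; the image, function names and threshold values are made up for illustration.

```python
def sobel_edges(img):
    # img is a list of rows of grayscale values; border pixels are left at 0.
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(kx[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def to_outline(img, threshold):
    # Binary outline: 1 where the edge strength reaches the threshold.
    return [[1 if v >= threshold else 0 for v in row]
            for row in sobel_edges(img)]

# Toy 5x6 "photo": dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]

for row in to_outline(image, threshold=500):
    print(row)
```

The threshold matters: set it too high for a low-contrast photo and the outline comes out empty, which may be similar to the blank outlines mentioned above.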
For reference, the original pix2pix paper is here