1) Overview:
Supervised vs Unsupervised Learning
Taxonomy of Generative Models
Fully Visible Belief Network (FVBN)
Variational Autoencoders (VAE)
2) Details:
Supervised Learning
Data: (x, y) - x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Unsupervised Learning
Data: x - Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Generative Modeling
Given training data, generate new samples from same distribution.
Formulate as density estimation problems:
Explicit density estimation: explicitly define and solve for pmodel(x).
Implicit density estimation: learn a model that can sample from pmodel(x) without explicitly defining it.
Taxonomy of Generative Models
Fully Visible Belief Network (FVBN)
An explicit density model.
Use the chain rule to decompose the likelihood of an image x into a product of 1-d distributions.
Then, maximize the likelihood of the training data.
The distribution over pixel values is complex ==> express it using a neural network.
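Concretely, for pixels ordered x_1, ..., x_n (e.g., in raster-scan order), the chain rule gives:
p(x) = p(x_1) * p(x_2 | x_1) * ... * p(x_n | x_1, ..., x_{n-1})
Each factor is a 1-d distribution over a single pixel value, and the neural network parameterizes these complex conditionals.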
Some explanations of FVBNs: [Link]
What distinguishes FVBNs from other explicit density models is that they have a tractable density: that is, you can exactly calculate the probability of samples from your dataset (or at least, that is the assumption made by the model). This is what is meant by "fully visible".
Pixel RNN vs Pixel CNN
Two different approaches to generative modeling of images at the pixel level; both aim to generate realistic images pixel by pixel.
Pixel RNN uses recurrent connections and generates pixels sequentially.
Pixel CNN uses masked convolutional layers to model the same dependencies; training is parallelizable (all conditionals are computed from a training image in one pass), but generation is still sequential. (See the masked-convolution sketch after the pros and cons below.)
Pros:
Can explicitly compute likelihood p(x).
Easy to optimize.
Good samples.
Con:
Sequential generation => slow.
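A minimal sketch of the masked convolution that enforces this pixel ordering in a PixelCNN-style model, assuming PyTorch (the class name and sizes are illustrative, not from the original notes):

import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    # Convolution whose kernel is masked so each output position depends
    # only on pixels above it and to its left (raster-scan order).
    # Mask type 'A' (first layer) also hides the current pixel; 'B' allows it.
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, h, w = self.weight.shape
        self.mask[:, :, h // 2, w // 2 + (mask_type == 'B'):] = 0
        self.mask[:, :, h // 2 + 1:, :] = 0

    def forward(self, x):
        self.weight.data *= self.mask  # zero out connections to "future" pixels
        return super().forward(x)

# Example: first layer of a tiny PixelCNN over grayscale images.
# All positions are computed in one pass at training time; at sampling
# time the network must still be run once per generated pixel.
layer = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
out = layer(torch.rand(8, 1, 28, 28))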
Improving PixelCNN performance (Salimans et al. 2017 - PixelCNN++)
Gated convolutional layers
Short-cut connections
Discretized logistic loss (sketched after this list)
Multi-scale
Training tricks
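A simplified sketch of the discretized logistic idea (single component only; PixelCNN++ itself uses a mixture of logistics plus edge-case handling at intensities 0 and 255, and the function name here is ours):

import torch

def discretized_logistic_logprob(x, mu, log_scale):
    # Log-probability of a pixel x (rescaled to [-1, 1], so adjacent
    # intensity levels are 2/255 apart) under a logistic distribution
    # discretized to 256 bins: P(x) = CDF(x + 1/255) - CDF(x - 1/255).
    inv_s = torch.exp(-log_scale)
    cdf_plus = torch.sigmoid(inv_s * (x + 1.0 / 255 - mu))
    cdf_minus = torch.sigmoid(inv_s * (x - 1.0 / 255 - mu))
    return torch.log(torch.clamp(cdf_plus - cdf_minus, min=1e-12))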
Variational Autoencoders (VAE)
Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.
z is usually smaller than x (dimensionality reduction); we want the features to capture meaningful factors of variation in the data.
How do we learn z? Train so that the features can be used to reconstruct the original data ("autoencoding": encoding the input itself).
After training, throw away the decoder and keep the encoder as a feature extractor.
This transfers from a large, unlabeled dataset to a small, labeled dataset (e.g., as initialization for a supervised classifier). A minimal training sketch follows.
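A minimal sketch of this training setup, assuming PyTorch (layer sizes and the 784-dim flattened input are illustrative, not from the original notes):

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)            # a batch of flattened images (placeholder data)
z = encoder(x)                     # z is lower-dimensional than x
x_hat = decoder(z)
loss = ((x - x_hat) ** 2).mean()   # L2 reconstruction loss: no labels needed
opt.zero_grad()
loss.backward()
opt.step()
# After training: discard the decoder and use encoder(x) as features.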
How do we make autoencoder a generative model?
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
z is a vector of latent factors used to generate x: attributes, orientation, etc.
We want to estimate the true parameters θ* of this generative model given training data x.
How should we represent the model?
Choose the prior p(z) to be simple, e.g. Gaussian. Reasonable for latent attributes, e.g. pose, how much smile.
The conditional p(x|z) is complex (it generates an image) => represent it with a neural network.
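A sketch of the generation process under these choices (PyTorch assumed; the single linear layer stands in for the real decoder network):

import torch
import torch.nn as nn

decoder = nn.Linear(32, 784)      # stand-in for the p(x|z) network
z = torch.randn(16, 32)           # sample z from the simple prior p(z) = N(0, I)
x = torch.sigmoid(decoder(z))     # decoder maps latent factors to image pixels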
How to train the model
Variational Autoencoders (VAEs) define an intractable density function with latent z:
pθ(x) = ∫ pθ(z) pθ(x|z) dz
Intractability: this integral over all z cannot be computed directly, and the posterior pθ(z|x) = pθ(x|z) pθ(z) / pθ(x) is intractable as well. The fix (Kingma and Welling, 2014) is to introduce an encoder network qφ(z|x) that approximates the posterior, and to maximize a tractable variational lower bound (the ELBO) on the data likelihood.
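A sketch of the resulting training step, using the standard reparameterized ELBO from Kingma and Welling (2014); the network names and sizes here are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 2 * 32)   # outputs mean and log-variance of q(z|x)
dec = nn.Linear(32, 784)       # stand-in for the p(x|z) decoder network

x = torch.rand(64, 784)                                  # placeholder batch
mu, logvar = enc(x).chunk(2, dim=1)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
x_hat = torch.sigmoid(dec(z))

recon = F.binary_cross_entropy(x_hat, x, reduction='sum')   # E[log p(x|z)] term
kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp())  # KL(q(z|x) || p(z))
loss = recon + kl    # negative ELBO: minimizing it maximizes the bound on log p(x)
loss.backward()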
References:
van den Oord et al., "Conditional Image Generation with PixelCNN Decoders", NIPS 2016.
Salimans et al., "PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications", ICLR 2017.
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014.