# Variational AutoEncoders (VAEs)

For Complicated Distributions

## AutoEncoder

An autoencoder compresses high-dimensional input data into a lower-dimensional representation by passing it through a neural network.

## AutoEncoder Types

- Denoising Autoencoder
- Sparse Autoencoder
- Deep Autoencoder
- Contractive Autoencoder
- Undercomplete Autoencoder
- Convolutional Autoencoder
- Variational Autoencoder

There are many techniques in machine learning for compressing data into a smaller space by reducing the number of features that describe it (**Dimensionality Reduction**), and the VAE is one of the most popular. It is generally considered an **Unsupervised Learning** technique, because it does not require labeled inputs to learn.

## Neural Network

The Neural Network of Variational Autoencoder consists of an encoder, a decoder, and a loss function.

The **Encoder** produces "new features" from the "old features" by reducing the dimensionality of the input data; this is easy to picture if you are familiar with Convolutional Neural Networks (CNNs). The **Decoder** then reconstructs the input data from the latent representation in the bottleneck layer, with some loss of information (called the **reconstruction loss**). The quality of the reconstruction depends on the number of dimensions used in the latent representation.
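As a minimal sketch of this encode/decode pipeline, the snippet below uses randomly initialized linear maps and toy dimensions (8 inputs, 2 latent features, all chosen purely for illustration); a real autoencoder would learn these weights by minimizing the reconstruction loss with gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only): an 8-D input squeezed into a 2-D bottleneck.
input_dim, latent_dim = 8, 2

# Randomly initialized linear encoder/decoder weights. In a trained
# autoencoder these would be learned, and nonlinearities would be added.
W_enc = rng.normal(size=(latent_dim, input_dim))
W_dec = rng.normal(size=(input_dim, latent_dim))

def encode(x):
    # "New features" from "old features": project into the bottleneck.
    return W_enc @ x

def decode(z):
    # Reconstruct the input from the latent vector.
    return W_dec @ z

x = rng.normal(size=input_dim)
z = encode(x)          # the latent vector we actually care about
x_hat = decode(z)      # lossy reconstruction of the input

# Mean squared error is a common choice of reconstruction loss.
reconstruction_loss = np.mean((x - x_hat) ** 2)
```

Note that `z` lives in only 2 dimensions, so the reconstruction is necessarily lossy: the fewer latent dimensions, the more information is discarded.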

Actually, we don't care about the output itself but rather the vector constructed in the middle. We can feed that vector to complex architectures to solve complicated problems.

The key difference between an Autoencoder and a Variational Autoencoder is this: an Autoencoder maps each input to a single point in the latent space, while a Variational Autoencoder maps each input to a probability distribution over the latent space (typically a Gaussian parameterized by a mean and a variance), from which the latent vector is sampled.
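The sampling step is usually written via the reparameterization trick, z = mu + sigma * eps with eps drawn from a standard normal, which keeps the path from the encoder's outputs to z differentiable. A small sketch, where the mean and log-variance values are hypothetical stand-ins for what a trained encoder network would output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one input (in a real VAE these come
# from the encoder network): a mean and a log-variance per latent dimension.
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, -2.0])

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
# Randomness is isolated in eps, so gradients can flow through mu and log_var.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps
```

Working with log-variance rather than variance is a common design choice: it lets the network output any real number while guaranteeing the variance stays positive.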

# Math Behind It

Assume our data is represented by a variable x generated from a hidden (latent) variable z, which is sampled from a prior distribution p(z). The probabilistic encoder and decoder are then defined by p(z|x) and p(x|z), respectively.

Expanding p(z|x) using Bayes' theorem gives p(z|x) = p(x|z) p(z) / p(x). The denominator p(x) is intractable to compute directly, which is why we need an approximation.

We therefore approximate p(z|x) with a Gaussian distribution q(z|x). Recall that the KL (Kullback-Leibler) divergence measures the difference between two probability distributions, here p(z|x) and q(z|x). Our goal is to minimize the KL divergence between these two distributions.
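When q(z|x) is a diagonal Gaussian and the prior p(z) is a standard normal, the KL term has a well-known closed form, KL = -0.5 * sum(1 + log_var - mu^2 - exp(log_var)). A small sketch with illustrative inputs:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian.

    This is the regularization term a VAE uses to keep the approximate
    posterior q(z|x) close to the prior p(z) = N(0, I).
    """
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

# When q(z|x) already equals the prior, the divergence is zero.
print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))  # → 0.0

# Moving the mean away from zero makes the divergence strictly positive.
print(kl_to_standard_normal(np.array([1.0, -1.0]), np.zeros(2)))  # → 1.0
```

The divergence is zero only when the two distributions match, and grows as the approximate posterior drifts away from the prior.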

Finally, after the derivation (which maximizes the Evidence Lower Bound, or ELBO), the loss function of the Variational Autoencoder is the sum of the reconstruction loss and a KL-divergence regularization term that pulls q(z|x) toward the prior p(z).
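In a standard notation (with φ denoting the encoder parameters and θ the decoder parameters, following the distributions defined above), this objective can be written as:

```latex
\mathcal{L}(\theta, \phi; x)
  = -\,\mathbb{E}_{q_\phi(z|x)}\!\big[\log p_\theta(x|z)\big]
  \;+\; \mathrm{KL}\!\big(q_\phi(z|x) \,\|\, p(z)\big)
```

The first term is the reconstruction loss (how well the decoder rebuilds x from sampled z), and the second is the KL divergence that regularizes the latent distribution.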