
# Deep Generative Model and Lossless Compression:

Latest reply: Mar 13, 2022 16:34:12

## Shannon's source coding theorem:

The source coding theorem states that, in the limit as the length of a stream of independent and identically distributed (i.i.d.) data approaches infinity, it is impossible to compress the data so that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source without losing information. With a negligible probability of loss, however, the code rate can be made arbitrarily close to the Shannon entropy.

Source coding is a mapping from (a sequence of) symbols from an information source to a sequence of alphabet symbols (usually bits) such that the source symbols can be exactly recovered from the binary bits (lossless source coding) or recovered within some distortion (lossy source coding). This is the concept behind data compression.

In information theory, the source coding theorem informally states that N i.i.d. random variables, each with entropy H(X), can be compressed into no more than N·H(X) bits (plus a vanishing overhead) with negligible risk of information loss as N → ∞; conversely, if they are compressed into fewer than N·H(X) bits, it is virtually certain that information will be lost.
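The bound is easy to see numerically for a simple i.i.d. source. The sketch below computes H(X) for a biased coin (the probabilities are arbitrary toy values) and compares it with a naive one-bit-per-symbol code:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A biased coin: P(heads) = 0.9, P(tails) = 0.1 (toy values).
H = entropy([0.9, 0.1])
print(f"H(X) = {H:.4f} bits/symbol")  # ≈ 0.4690
# A naive 1-bit-per-flip code therefore wastes about 0.53 bits per symbol;
# no lossless code can average fewer than H bits per symbol as N → ∞.
```

The closer a coder's average rate gets to H(X), the closer it is to the theoretical optimum, which is why compression quality is ultimately a modeling problem.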

## Generative Model:

To begin, we must understand what a generative model is. To train a generative model, we first collect a vast amount of data in some domain (for example, millions of photos, sentences, or sounds) and then train a model to generate data that looks like it.


The basis of AI data compression, according to Shannon's source coding theorem, is to build an explicit generative model that describes the likelihood of the data, and then design an entropy coder matched to that model's probability estimates.

Currently, explicit generative models are classified into three types:

- Autoregressive models
- (Variational) auto-encoder models
- Flow models

Each of the above can be paired with a specialized compression scheme.

## Autoregressive Data Model:

Autoregressive models train a network that models the conditional distribution of each individual pixel given the previous pixels (those to the left and above). This is similar to plugging the image's pixels into a char-rnn, but instead of a 1D sequence of characters, the RNNs run horizontally and vertically over the image.

Because the probability distribution of each symbol in an autoregressive model can be determined from the previously decoded symbols, the current symbol can be compressed or decompressed with entropy coding.
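This can be sketched with a toy order-1 autoregressive model over bits. The conditional probabilities below are made-up illustrative values, and an ideal entropy coder is assumed to spend exactly the model's negative log-probability on each symbol:

```python
import math

# Toy order-1 autoregressive model over bits: P(next | previous).
# These conditional probabilities are invented for illustration.
cond = {
    0: {0: 0.8, 1: 0.2},   # after a 0, another 0 is likely
    1: {0: 0.3, 1: 0.7},   # after a 1, another 1 is likely
}

def ideal_code_length(bits, p_first=0.5):
    """Bits an ideal entropy coder needs when driven by this model."""
    total = -math.log2(p_first)               # first symbol uses the marginal
    for prev, cur in zip(bits, bits[1:]):
        total += -math.log2(cond[prev][cur])  # -log2 P(current | previous)
    return total

msg = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
print(f"{ideal_code_length(msg):.2f} bits vs {len(msg)} raw bits")  # ≈ 7.70 vs 10
```

The better the model predicts each symbol from its context, the fewer bits the coder spends; a real system would plug these conditionals into an arithmetic or range coder.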

## Auto-encoder models:

An auto-encoder model consists of an encoder and a decoder. The compression procedure stores the latent variables as well as the residual between the decoder's reconstruction and the actual data. Using the stored latent variables and residuals, the decoder can reconstruct the original data exactly.

VAEs allow us to formalize this problem in the framework of probabilistic graphical models, where the goal is to maximize a lower bound (the ELBO) on the log-likelihood of the data.

## Flow models:

In a flow model, a neural network constructs a bijection between the input data and the latent variables. The compression method transforms the input data into latent variables, which are then compressed under the prior distribution. Decoding reverses the process. The following diagram summarizes the three types of explicit generative models and their compression algorithms.

A flow model is made up of several flow layers, each of which defines an invertible transformation from input to output. By stacking flow layers, a complex transformation between the input data and the latent variables can be realized.
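A single affine layer already shows the mechanics: the mapping is exactly invertible, and the change-of-variables formula gives the data likelihood that an entropy coder would use. This is a minimal sketch with hand-picked parameters; real flows learn many such layers:

```python
import math

# One affine flow layer: z = (x - b) / a, exactly invertible for a != 0.
# Change of variables: log p(x) = log p(z) + log|dz/dx| = log p(z) - log|a|.
a, b = 2.0, 0.5   # toy layer parameters (in practice these are learned)

def forward(x):    # data -> latent
    return (x - b) / a

def inverse(z):    # latent -> data
    return a * z + b

def log_prior(z):  # standard normal prior on the latent
    return -0.5 * (math.log(2 * math.pi) + z * z)

def log_px(x):
    z = forward(x)
    return log_prior(z) - math.log(abs(a))

x = 1.7
assert abs(inverse(forward(x)) - x) < 1e-12   # lossless round trip
print(f"log p(x) = {log_px(x):.4f}")
```

Because the transformation is bijective, no information is lost in either direction, which is exactly what lossless compression requires; the latent is entropy-coded under the prior at a cost of about -log2 p(x) bits.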

## Final thoughts:

All of these approaches have advantages and disadvantages. Variational Autoencoders let us perform both learning and efficient Bayesian inference in sophisticated probabilistic graphical models with latent variables, but their generated samples tend to be blurry. GANs currently produce the sharpest images, but they are harder to optimize because of their unstable training dynamics. Autoregressive models, for their part, are stable to train but inefficient during sampling and do not readily provide simple low-dimensional codes for images. All of these models remain active areas of research.
