Denoising Diffusion

Diffusion

A diffusion probabilistic model is a parameterized Markov chain trained using variational inference to produce samples that match the data after a finite number of steps. Transitions of this chain are learned to reverse a diffusion process: a Markov chain that runs in the opposite direction of sampling and gradually adds noise to the data until the signal is destroyed.

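These are latent variable models of the form $p_\theta(x_0) := \int p_\theta(x_{0:T})\,dx_{1:T}$, where the latents $x_1, \dots, x_T$ have the same dimensionality as the data $x_0 \sim q(x_0)$. The joint distribution $p_\theta(x_{0:T})$ is called the reverse process. It is defined as a Markov chain with learned Gaussian transitions, starting at $p(x_T) = \mathcal{N}(x_T; \mathbf{0}, \mathbf{I})$: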

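$$p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big)$$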
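Sampling from the reverse process is ancestral: draw $x_T$ from the prior, then apply the learned Gaussian transition $T$ times. The sketch below assumes $p(x_T) = \mathcal{N}(\mathbf{0}, \mathbf{I})$ and a diagonal $\Sigma_\theta$; `mu_theta` and `var_theta` are hypothetical stand-ins for trained networks. For the usage example, the toy functions `toy_mu` and `toy_var` (also hypothetical) are the exact reverse transitions when the data are themselves $\mathcal{N}(\mathbf{0}, \mathbf{I})$, so the generated $x_0$ should look standard normal.

```python
import math
import random


def sample_reverse(mu_theta, var_theta, T, dim, rng):
    """Ancestral sampling from p_theta(x_{0:T}) with diagonal Gaussian transitions.

    mu_theta(x_t, t) and var_theta(x_t, t) are hypothetical stand-ins for the
    learned networks; each returns a list of length `dim`.
    """
    # Start from the prior, assumed here to be p(x_T) = N(0, I)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(T, 0, -1):
        mean = mu_theta(x, t)
        var = var_theta(x, t)
        # x_{t-1} ~ N(mu_theta(x_t, t), diag(var_theta(x_t, t)))
        x = [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]
    return x  # a sample of x_0, same dimensionality as x_T


# Usage with a hypothetical toy "model" (no training involved).
T = 100
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # beta_1..beta_T


def toy_mu(x, t):
    # Exact reverse mean when the data are N(0, I) and q uses `betas`
    return [math.sqrt(1.0 - betas[t - 1]) * xi for xi in x]


def toy_var(x, t):
    # Exact reverse variance under the same toy assumption
    return [betas[t - 1]] * len(x)


rng = random.Random(0)
samples = [sample_reverse(toy_mu, toy_var, T, dim=2, rng=rng) for _ in range(2000)]
first = [s[0] for s in samples]
mean0 = sum(first) / len(first)
var0 = sum((v - mean0) ** 2 for v in first) / len(first)
```

In a real model the two functions would be neural networks evaluated at $(x_t, t)$; the loop itself does not change.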

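What distinguishes diffusion models from other latent variable models is the forward process (also called the diffusion process): the approximate posterior $q(x_{1:T} \mid x_0)$ is a Markov chain that gradually adds Gaussian noise to the data according to a variance schedule $\beta_1, \dots, \beta_T$: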

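$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) := \mathcal{N}\big(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t \mathbf{I}\big)$$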
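A minimal sketch of the forward process, assuming a linear schedule from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$ over $T = 1000$ steps (a common choice, not something this text specifies). It runs the chain step by step and compares the result with the standard closed-form marginal of this Gaussian chain, $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\, x_0, (1 - \bar\alpha_t)\mathbf{I})$ with $\bar\alpha_t := \prod_{s=1}^{t} (1 - \beta_s)$. The helper names are illustrative only.

```python
import math
import random


def linear_betas(T, beta_1=1e-4, beta_T=0.02):
    """Assumed linear schedule; element i holds beta_{i+1}."""
    return [beta_1 + (beta_T - beta_1) * i / (T - 1) for i in range(T)]


def forward_chain(x0, betas, rng):
    """Apply q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I) for t = 1..T."""
    x = list(x0)
    for beta in betas:
        scale, std = math.sqrt(1.0 - beta), math.sqrt(beta)
        x = [scale * xi + std * rng.gauss(0.0, 1.0) for xi in x]
    return x  # one sample of x_T


def alpha_bar(betas, t):
    """alpha_bar_t = prod_{s=1}^{t} (1 - beta_s), with alpha_bar_0 = 1."""
    out = 1.0
    for beta in betas[:t]:
        out *= 1.0 - beta
    return out


def sample_xt_direct(x0, betas, t, rng):
    """Closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps."""
    ab = alpha_bar(betas, t)
    return [math.sqrt(ab) * xi + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for xi in x0]


def mean_var(values):
    """Sample mean and (biased) sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


T = 1000
betas = linear_betas(T)
rng = random.Random(0)
x0 = [3.0]  # a single one-dimensional data point
chain_stats = mean_var([forward_chain(x0, betas, rng)[0] for _ in range(500)])
direct_stats = mean_var([sample_xt_direct(x0, betas, T, rng)[0] for _ in range(500)])
print(alpha_bar(betas, T), chain_stats, direct_stats)
```

Under this schedule $\bar\alpha_T$ is about $4 \times 10^{-5}$, so $x_T$ retains almost nothing of $x_0$ and both estimates have mean near 0 and variance near 1.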

Training is performed by optimizing the variational bound on the negative log likelihood:

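$$\mathbb{E}\big[-\log p_\theta(x_0)\big] \le \mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] = \mathbb{E}_q\left[-\log p(x_T) - \sum_{t \ge 1} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right] =: L$$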
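The right-hand form of the bound can be estimated with one forward trajectory: sample $x_{1:T}$ from $q$, then sum the log ratios. The sketch below is one-dimensional and assumes $p(x_T) = \mathcal{N}(0, 1)$; `rev_mean` and `rev_var` are hypothetical stand-ins for $\mu_\theta$ and $\Sigma_\theta$. As a sanity check it uses a hypothetical "oracle" reverse process for data distributed as $\mathcal{N}(0, 1)$, where $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(\sqrt{1 - \beta_t}\, x_t, \beta_t)$ makes $p_\theta(x_{0:T})$ equal to $q(x_0)\, q(x_{1:T} \mid x_0)$, so the bound is tight for every trajectory.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bound_estimate(x0, betas, rev_mean, rev_var, rng):
    """Single-sample estimate of
    E_q[-log p(x_T) - sum_t log p_theta(x_{t-1} | x_t) / q(x_t | x_{t-1})].

    rev_mean / rev_var are hypothetical stand-ins for mu_theta / Sigma_theta.
    """
    # Draw a trajectory x_0 -> x_1 -> ... -> x_T from the forward process q
    xs = [x0]
    for beta in betas:
        xs.append(math.sqrt(1.0 - beta) * xs[-1] + math.sqrt(beta) * rng.gauss(0.0, 1.0))
    T = len(betas)
    loss = -log_normal(xs[T], 0.0, 1.0)  # -log p(x_T), assuming p(x_T) = N(0, 1)
    for t in range(1, T + 1):
        beta = betas[t - 1]
        log_p = log_normal(xs[t - 1], rev_mean(xs[t], t), rev_var(xs[t], t))
        log_q = log_normal(xs[t], math.sqrt(1.0 - beta) * xs[t - 1], beta)
        loss -= log_p - log_q
    return loss


T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]


def oracle_mean(x_t, t):
    # Hypothetical oracle: exact reverse mean when q(x_0) = N(0, 1)
    return math.sqrt(1.0 - betas[t - 1]) * x_t


def oracle_var(x_t, t):
    # Hypothetical oracle: exact reverse variance when q(x_0) = N(0, 1)
    return betas[t - 1]


est = bound_estimate(0.7, betas, oracle_mean, oracle_var, random.Random(0))
exact_nll = -log_normal(0.7, 0.0, 1.0)
```

With the oracle, `est` equals `exact_nll` up to floating-point error regardless of the random trajectory. With an imperfect model the single-sample estimate is noisy and only upper-bounds $-\log p_\theta(x_0)$ in expectation.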

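The forward process variances $\beta_t$ can be learned by reparameterization or held constant as hyperparameters. A useful property of the forward process is that $x_t$ can be sampled at an arbitrary timestep in closed form: with $\alpha_t := 1 - \beta_t$ and $\bar\alpha_t := \prod_{s=1}^{t} \alpha_s$, we have $q(x_t \mid x_0) = \mathcal{N}\big(x_t; \sqrt{\bar\alpha_t}\, x_0, (1 - \bar\alpha_t)\mathbf{I}\big)$. Efficient training is therefore possible by optimizing random terms of $L$ with stochastic gradient descent. Variance is reduced further by rewriting $L$ as: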

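$$\mathbb{E}_q\Big[\underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0) \,\|\, p(x_T)\big)}_{L_T} + \sum_{t > 1} \underbrace{D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}} \underbrace{- \log p_\theta(x_0 \mid x_1)}_{L_0}\Big]$$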
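Because every distribution in $L_T$ and $L_{t-1}$ is Gaussian, these KL terms have closed forms instead of requiring Monte Carlo estimates. The sketch below is one-dimensional and relies on two standard results not derived in this text: the marginal $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\, x_0, 1 - \bar\alpha_t)$ and the posterior $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(\tilde\mu_t, \tilde\beta_t)$ with $\tilde\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1 - \bar\alpha_t} x_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar\alpha_{t-1})}{1 - \bar\alpha_t} x_t$ and $\tilde\beta_t = \frac{1 - \bar\alpha_{t-1}}{1 - \bar\alpha_t} \beta_t$. It also assumes $p(x_T) = \mathcal{N}(0, 1)$ and a linear schedule; `toy_mu` and `toy_var` are hypothetical placeholders for the model.

```python
import math
import random


def gaussian_kl(mean_q, var_q, mean_p, var_p):
    """KL( N(mean_q, var_q) || N(mean_p, var_p) ) for univariate Gaussians."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (mean_q - mean_p) ** 2) / var_p - 1.0)


def alpha_bars(betas):
    """[alpha_bar_0, ..., alpha_bar_T] with alpha_bar_0 = 1."""
    out = [1.0]
    for beta in betas:
        out.append(out[-1] * (1.0 - beta))
    return out


def q_posterior(x0, x_t, t, betas, abar):
    """Mean and variance of q(x_{t-1} | x_t, x_0), used here for t >= 2."""
    beta = betas[t - 1]
    mean = (math.sqrt(abar[t - 1]) * beta / (1.0 - abar[t]) * x0
            + math.sqrt(1.0 - beta) * (1.0 - abar[t - 1]) / (1.0 - abar[t]) * x_t)
    var = (1.0 - abar[t - 1]) / (1.0 - abar[t]) * beta
    return mean, var


def loss_T(x0, abar):
    """L_T = KL( q(x_T | x_0) || p(x_T) ), assuming p(x_T) = N(0, 1)."""
    return gaussian_kl(math.sqrt(abar[-1]) * x0, 1.0 - abar[-1], 0.0, 1.0)


def loss_t_minus_1(x0, x_t, t, betas, abar, mu_theta, var_theta):
    """L_{t-1} = KL( q(x_{t-1} | x_t, x_0) || p_theta(x_{t-1} | x_t) ) for one t >= 2."""
    m_q, v_q = q_posterior(x0, x_t, t, betas, abar)
    return gaussian_kl(m_q, v_q, mu_theta(x_t, t), var_theta(x_t, t))


T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
abar = alpha_bars(betas)


def toy_mu(x_t, t):
    # Hypothetical model mean (no training involved)
    return math.sqrt(1.0 - betas[t - 1]) * x_t


def toy_var(x_t, t):
    # Hypothetical model variance: fixed to beta_t
    return betas[t - 1]


# One random term: pick t uniformly in {2, ..., T}, sample x_t in closed form
rng = random.Random(0)
x0 = 1.0
t = rng.randint(2, T)
x_t = math.sqrt(abar[t]) * x0 + math.sqrt(1.0 - abar[t]) * rng.gauss(0.0, 1.0)
term = loss_t_minus_1(x0, x_t, t, betas, abar, toy_mu, toy_var)
```

Drawing a single $t$ per example and evaluating one $L_{t-1}$ is the kind of random-term optimization the text describes; $L_T$ has no learnable parameters when the $\beta_t$ are fixed, and for this schedule it is tiny.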