Natural Evolution Strategies

NES

A family of numerical optimization algorithms for black-box problems which iteratively update a search distribution by using an estimated gradient on its distribution parameters.

The core idea is to use search gradients to update parameters where the search gradient is the sampled gradient of expected fitness. Let $\theta$ be the parameters of the search distribution $\pi(x|\theta)$ and $f(x)$ the fitness function evaluated at $x$. The objective is to maximize the expected fitness under the distribution.

$$ J(\theta) = \mathbb{E}_{\theta}[f(x)] = \int f(x) \ \pi(x | \theta) \ dx $$

The gradient can be written as:

$$ \nabla_{\theta} J(\theta) = \mathbb{E}_{\theta}[f(z) \ \nabla_{\theta} \ log \ \pi(z|\theta)] $$

The Monte Carlo estimate from samples $z_1, ..., z_{\lambda}$ as:

$$ \nabla_{\theta} J(\theta) \approx \frac{1}{\lambda} \sum_{k=1}^{\lambda} f(z_k) \ \nabla_{\theta} \ log \ \pi(z_k | \theta) $$

The parameters are then updated with a standard ascent scheme:

$$ \theta \leftarrow \theta + \eta \nabla_{\theta} J(\theta) $$

The natural gradient accounts for uncertainty in the second order updates by removing the dependence on parameterization by instead relying on the KL divergence between probability distributions where the Fisher information matrix defines the local curvature in distribution space.

\begin{align} F &= \int \pi(z|\theta) \ \nabla_{\theta} \ log \ \pi(z|\theta) \ \nabla_{\theta} \ log \ \pi(z|\theta)^{\top} dz \\ &\approx \frac{1}{\lambda} \sum_{k=1}^{\lambda} \nabla_{\theta} \ log \ \pi(z_k|\theta) \ \nabla_{\theta} \ log \ \pi(z_k|\theta)^{\top} \end{align}

The ascent scheme then becomes:

$$ \theta \leftarrow \theta + \eta F^{-1} \nabla_{\theta} J(\theta) $$