# Stochastic Meta-Descent

As ever, $\odot$ is the elementwise product.

$w_{t+1}=w_t-p_t \odot g_t$

$p_t = p_{t-1}\odot \exp(\mu v_t \odot g_t)$

$v_{t+1}=\lambda v_t + p_t \odot(g_t - \lambda H_t v_t)$

Here $g_t$ is the stochastic gradient, $p_t$ is the vector of per-parameter gains, $H_t$ is the Hessian, $\mu$ is a meta-learning rate, and $\lambda \in [0,1]$ is a decay factor on the trace $v_t$.

The Hessian-vector product can be approximated efficiently using only gradient evaluations, e.g. by finite differences: $H_t v_t \approx (g(w_t + \epsilon v_t) - g(w_t))/\epsilon$ for small $\epsilon$.
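The three updates above can be sketched in a few lines of NumPy. This is a minimal illustration on a toy quadratic loss, not a tuned implementation: the loss, the hyperparameter values, and the finite-difference Hessian-vector approximation are all assumptions chosen for clarity.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * w' A w, so the gradient is A @ w.
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w

mu, lam, eps = 0.05, 0.99, 1e-6    # meta-rate, decay, finite-diff step (illustrative values)
w = np.array([1.0, 1.0])
p = np.full_like(w, 0.1)           # per-parameter gains p_t
v = np.zeros_like(w)               # gradient trace v_t

for _ in range(200):
    g = grad(w)
    # Hessian-vector product via finite differences of the gradient.
    Hv = (grad(w + eps * v) - g) / eps
    p = p * np.exp(mu * v * g)     # gain update (all products elementwise)
    v = lam * v + p * (g - lam * Hv)
    w = w - p * g                  # parameter update
```

Note the ordering: $g_t$ and $H_t v_t$ are evaluated at $w_t$ before the parameter step, matching the time indices in the equations.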

### One Response to Stochastic Meta-Descent

1. twolfe18 says:

Schraudolph says in a lecture that you usually want to approximate exp(x) with max(1/2, 1 + x) when updating p. This function 1) is much faster on the CPU (important since it sits in an inner loop), 2) is a good approximation when $\mu$ is small, since exp(x) ≈ 1 + x near zero, and 3) is robust when x happens to get really large in magnitude, since the gains grow only linearly and cannot be driven below half their previous value. See http://videolectures.net/mlss06au_schraudolph_aml/ (part 2).
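The clipped linearization in the comment above is a one-liner; the values below are made up to show the three regimes (small positive x, small negative x, and a large negative x that hits the clip).

```python
import numpy as np

mu = 0.05
v = np.array([0.2, -0.1, 3.0])
g = np.array([1.0, 1.0, -5.0])
p = np.array([0.1, 0.1, 0.1])

x = mu * v * g                        # elementwise, as in the p update
# Fast, robust stand-in for exp(x): first-order Taylor expansion,
# clipped below at 1/2 so one large negative x cannot crush the gain.
p_new = p * np.maximum(0.5, 1.0 + x)
```

In the third component x = -0.75, so the clip fires and the gain is halved rather than multiplied by exp(-0.75) ≈ 0.47.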