# Stochastic Meta-Descent

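As ever, $\odot$ is the elementwise product. Here $w_t$ are the parameters, $g_t$ is the gradient at $w_t$, $p_t$ is the vector of per-parameter step sizes, $v_t$ is an auxiliary vector that tracks how the parameters depend on the log step sizes, $\mu$ is the meta step size, $0 \le \lambda \le 1$ is a decay factor, and $H_t$ is the Hessian at $w_t$.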

$w_{t+1}=w_t-p_t \odot g_t$

$p_t = p_{t-1}\odot \exp(\mu v_t \odot g_t)$

$v_{t+1}=\lambda v_t + p_t \odot(g_t - \lambda H_t v_t)$
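The three updates can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the names `smd_step` and `run_demo`, the toy quadratic, the starting values $p_0 = 0.01$ and $v_0 = 0$, and the defaults `mu=0.05`, `lam=0.99` are all assumptions chosen for the example. The Hessian-vector product is passed in as a callable `hvp(w, v)`.

```python
import numpy as np


def smd_step(w, p, v, grad, hvp, mu=0.05, lam=0.99):
    """Apply the three SMD updates once and return (w_next, p_t, v_next)."""
    g = grad(w)                                   # g_t
    p = p * np.exp(mu * v * g)                    # p_t: per-parameter step sizes
    w_next = w - p * g                            # w_{t+1}
    v_next = lam * v + p * (g - lam * hvp(w, v))  # v_{t+1}
    return w_next, p, v_next


def run_demo(steps=200):
    """Minimise a toy quadratic f(w) = 0.5 * w @ A @ w; returns (initial, final) loss."""
    A = np.diag([1.0, 10.0])          # assumed test problem; its Hessian is A
    loss = lambda w: 0.5 * w @ A @ w
    grad = lambda w: A @ w
    hvp = lambda w, v: A @ v          # exact for a quadratic
    w = np.array([1.0, 1.0])
    p = np.full(2, 0.01)              # assumed initial step sizes p_0
    v = np.zeros(2)                   # assumed v_0 = 0
    initial = loss(w)
    for _ in range(steps):
        w, p, v = smd_step(w, p, v, grad, hvp)
    return initial, loss(w)


if __name__ == "__main__":
    print(run_demo())
```

Note the ordering inside `smd_step`: $p_t$ is updated first, because both $w_{t+1}$ and $v_{t+1}$ use the new step sizes.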

The Hessian-vector product $H_t v_t$ can be approximated efficiently using only gradient evaluations, so the Hessian $H_t$ itself is never formed or stored.
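One simple way to get such an approximation is a forward finite difference of the gradient, $H v \approx (\nabla f(w + \epsilon v) - \nabla f(w)) / \epsilon$. The sketch below assumes this particular scheme; the function name `hvp_finite_difference`, the step `eps`, and the test function are illustrative choices, and automatic differentiation is a common alternative that avoids the choice of $\epsilon$.

```python
import numpy as np


def hvp_finite_difference(grad, w, v, eps=1e-6):
    """Approximate H(w) @ v from two gradient evaluations (forward difference)."""
    return (grad(w + eps * v) - grad(w)) / eps


# Assumed test function: f(w) = sum(w**4) / 4 + 0.5 * w @ A @ w
A = np.array([[2.0, 1.0], [1.0, 3.0]])
grad = lambda w: w**3 + A @ w                   # gradient of f
hessian = lambda w: np.diag(3.0 * w**2) + A     # exact Hessian, for comparison only

w = np.array([1.0, -2.0])
v = np.array([0.5, 1.0])

approx = hvp_finite_difference(grad, w, v)
exact = hessian(w) @ v
max_abs_error = np.max(np.abs(approx - exact))
```

Inside an SMD loop, $g_t = \nabla f(w_t)$ is already available, so this costs only one extra gradient evaluation per step. The accuracy depends on `eps`: too large gives truncation error, too small amplifies floating-point round-off.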