As ever, is the elementwise product.

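The Hessian-vector product can be approximated efficiently using only gradient evaluations, for example with a finite difference of two gradients, so the Hessian itself never has to be formed.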
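Below is a minimal sketch of one such approximation. It assumes a central finite difference, Hv ≈ (∇f(w + εv) − ∇f(w − εv)) / (2ε). The post does not say which scheme it has in mind. The helper `hvp_finite_difference`, the step `eps`, and the toy objective f(w) = w0²·w1 + w1³ are illustrative choices, not part of the original.

```python
from typing import Callable, List

Vector = List[float]


def hvp_finite_difference(
    grad: Callable[[Vector], Vector], w: Vector, v: Vector, eps: float = 1e-5
) -> Vector:
    """Approximate H(w) v with two gradient calls (central difference).

    Hypothetical helper: no Hessian matrix is ever built.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad(w_plus), grad(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def grad_example(w: Vector) -> Vector:
    """Gradient of the toy objective f(w) = w0**2 * w1 + w1**3."""
    w0, w1 = w
    return [2.0 * w0 * w1, w0 ** 2 + 3.0 * w1 ** 2]


if __name__ == "__main__":
    # Exact Hessian at w = (1, 2) is [[4, 2], [2, 12]], so H v = (2, -10) for v = (1, -1).
    print(hvp_finite_difference(grad_example, [1.0, 2.0], [1.0, -1.0]))
```

For this toy f the gradient is quadratic in w, so the central difference is exact up to rounding. For a general f the error is O(ε²), and the choice of ε trades truncation error against floating-point cancellation.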

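Schraudolph says in a lecture that when updating p you usually want to approximate exp(x) with max(1/2, 1 + x). This function 1) is much cheaper to compute on the CPU than exp (important, since it sits in an inner loop), 2) is a reasonable estimate when μ is small, because x scales with μ and 1 + x is the first-order Taylor expansion of exp(x) around 0, and 3) is robust when |x| gets large: it grows only linearly for large positive x, and the floor of 1/2 stops p from shrinking by more than half in a single step for large negative x. See http://videolectures.net/mlss06au_schraudolph_aml/ (part 2).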
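To make these properties concrete, here is a small sketch comparing exp(x) with the cheap factor max(1/2, 1 + x). It assumes the gains p are updated multiplicatively and elementwise, p ← p · max(1/2, 1 + x), with x already scaled by μ. The names `gain_factor` and `update_gains` are hypothetical, and the exact expression for x in the full method is not spelled out here.

```python
import math
from typing import List


def gain_factor(x: float) -> float:
    """Cheap stand-in for exp(x): 1 + x, but never below 1/2."""
    return max(0.5, 1.0 + x)


def update_gains(p: List[float], x: List[float]) -> List[float]:
    """Hypothetical elementwise gain update p <- p * max(1/2, 1 + x)."""
    return [pi * gain_factor(xi) for pi, xi in zip(p, x)]


if __name__ == "__main__":
    # Compare the exact exponential with the clipped linear approximation.
    for x in (-5.0, -0.5, -0.01, 0.0, 0.01, 5.0):
        print(f"x={x:6.2f}  exp(x)={math.exp(x):10.4f}  max(1/2, 1+x)={gain_factor(x):8.4f}")
```

Near x = 0 the two agree closely. For x = 5, exp(x) is about 148.4 while the factor is 6, and for x = -5, exp(x) is about 0.0067 while the factor stays at 0.5, so a single outlier step can neither blow up nor collapse a gain.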