As ever, is the elementwise product.
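The Hessian-vector product can be approximated efficiently using only gradient evaluations (for example, by finite-differencing the gradient along the vector), so the Hessian itself never has to be formed.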
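As a rough illustration, here is a minimal numpy sketch of the finite-difference approach. It assumes you have a function `grad` that returns the gradient at a parameter vector. The helper name `hvp_fd` and the step size `eps` are hypothetical choices made for this sketch. It uses a central difference, which costs two gradient evaluations.

```python
import numpy as np

def hvp_fd(grad, theta, v, eps=1e-5):
    """Approximate H(theta) @ v from two gradient evaluations.

    Central difference along v: (g(theta + eps*v) - g(theta - eps*v)) / (2*eps).
    The error is O(eps^2) for smooth f, and the result is exact (up to rounding)
    for quadratics. The Hessian is never formed explicitly.
    """
    return (grad(theta + eps * v) - grad(theta - eps * v)) / (2.0 * eps)

# Usage: f(theta) = sum(exp(theta)), so grad = exp(theta) and H = diag(exp(theta)).
theta = np.array([0.0, 1.0, -1.0])
v = np.array([1.0, 2.0, 3.0])
approx = hvp_fd(np.exp, theta, v)
exact = np.exp(theta) * v
print(approx, exact)
```

Memory use is O(n), just like a single gradient. If you need the exact product rather than an approximation, Pearlmutter's R-operator (forward-over-reverse automatic differentiation) computes it at comparable cost.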
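In a lecture, Schraudolph says that when updating p you usually want to approximate exp(x) with max(1/2, 1 + x). This approximation has three advantages. First, it is much cheaper to compute on the CPU than exp, which matters because it sits in an inner loop. Second, it is a reasonable estimate when μ is small, because x then stays near zero, where exp(x) ≈ 1 + x. Third, it is robust when x happens to get very large, since it grows only linearly instead of exponentially. The floor of 1/2 also prevents p from shrinking by more than half, or turning negative, in a single step. See http://videolectures.net/mlss06au_schraudolph_aml/ (part 2).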
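Here is a minimal numpy sketch of the substitution. It assumes that p is a gain vector updated multiplicatively, elementwise, as p ← p ⊙ exp(x), and that the exponent argument x is computed elsewhere in the update. The names `approx_exp` and `update_gains` are hypothetical.

```python
import numpy as np

def approx_exp(x):
    """Piecewise-linear stand-in for exp(x): max(1/2, 1 + x), elementwise.

    Matches exp to first order near x = 0, grows only linearly for large x,
    and never drops below 1/2.
    """
    return np.maximum(0.5, 1.0 + x)

def update_gains(p, x):
    """Multiplicative gain update p <- p * max(1/2, 1 + x) (assumed form)."""
    return p * approx_exp(x)

# Usage: compare against exp for small, very negative, and very large x.
x = np.array([-5.0, -0.01, 0.0, 0.01, 50.0])
print(approx_exp(x))
print(np.exp(x))
p = update_gains(np.ones_like(x), x)
```

With a large x such as 1000, `np.exp` overflows to `inf`, while `approx_exp` returns 1001. With a very negative x, the gain is only halved instead of collapsing toward zero.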