Log Gradient Descent
This is a very simple trick I have had to rederive many times. Often you need to optimize over some parameters that need to be constrained to be positive. One solution is to reparameterize with some parameters
, where the correspondence is
.
An unconstrained optimization over is equivalent to a constrained optimization over
. Now, if the function we are optimizing is
, we have a trivial correspondence between gradients:
(Notation: is the elementwise product, and
is a vector of the derivatives
.)
If we were to do gradient descent on , the updates would look like
.
But, why not forget about , and do this same update directly on
?
So the final update formula to remember is