This is a very simple trick I have had to rederive many times. Often you need to optimize over some parameters $w$ that need to be constrained to be positive. One solution is to reparameterize with unconstrained parameters $v$, where the correspondence is

$$w = e^v \qquad (\text{elementwise: } w_i = e^{v_i}).$$
An unconstrained optimization over $v$ is equivalent to a constrained optimization over $w > 0$. Now, if the function we are optimizing is $f(w)$, the chain rule (using $\partial w_i / \partial v_i = e^{v_i} = w_i$) gives a trivial correspondence between gradients:

$$\nabla_v f = w \odot \nabla_w f.$$
(Notation: $\odot$ is the elementwise product, and $\nabla_w f$ is the vector of the derivatives $\partial f / \partial w_i$.)
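As a quick sanity check of this correspondence, here is a small numerical verification. The objective $f(w) = a \cdot w - \sum_i \log w_i$, the coefficients, and the perturbation size are all just illustrative choices, not something from the derivation itself:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, size=5)   # fixed positive coefficients (illustrative)
v = rng.normal(size=5)              # unconstrained parameters
w = np.exp(v)                       # positive parameters, w = e^v

def f_of_w(w):
    # toy objective; it only makes sense for w > 0
    return a @ w - np.sum(np.log(w))

grad_w = a - 1.0 / w                # analytic gradient in w-space

# central finite-difference gradient of g(v) = f(e^v) in v-space
eps = 1e-6
grad_v_fd = np.array([
    (f_of_w(np.exp(v + eps * np.eye(5)[i])) -
     f_of_w(np.exp(v - eps * np.eye(5)[i]))) / (2 * eps)
    for i in range(5)
])

# claimed correspondence: grad_v = w ⊙ grad_w
print(np.max(np.abs(grad_v_fd - w * grad_w)))   # prints a tiny number
```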
If we were to do gradient descent on $v$ with step size $\eta$, the updates would look like

$$v \leftarrow v - \eta \, \nabla_v f = v - \eta \, w \odot \nabla_w f.$$
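A minimal sketch of this $v$-space loop, reusing the same toy objective $f(w) = a \cdot w - \sum_i \log w_i$ as in the check above (its minimizer is $w = 1/a$; the step size and iteration count are arbitrary):

```python
import numpy as np

a = np.array([0.5, 1.0, 2.0])
eta = 0.1
v = np.zeros(3)                     # start at w = exp(v) = 1

for _ in range(200):
    w = np.exp(v)                   # map to the positive parameters
    grad_w = a - 1.0 / w            # ∇_w f
    v = v - eta * w * grad_w        # v ← v − η (w ⊙ ∇_w f)

print(np.exp(v), 1.0 / a)           # both ≈ [2., 1., 0.5]
```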
But why not forget about $v$, and do this same update directly on $w$? Exponentiating both sides, the updated value of $w = e^v$ is

$$e^{\,v - \eta \, w \odot \nabla_w f} = w \odot \exp\!\big(-\eta \, w \odot \nabla_w f\big).$$
So the final update formula to remember is

$$w \leftarrow w \odot \exp\!\big(-\eta \, w \odot \nabla_w f\big).$$
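Here is the same toy optimization with $v$ eliminated, applying the multiplicative update directly to $w$. With the same starting point and step size it produces exactly the same iterates as the $v$-space loop above (again, the objective and constants are only illustrative):

```python
import numpy as np

a = np.array([0.5, 1.0, 2.0])
eta = 0.1
w = np.ones(3)                      # same start as v = 0 above

for _ in range(200):
    grad_w = a - 1.0 / w            # ∇_w f
    w = w * np.exp(-eta * w * grad_w)   # w ← w ⊙ exp(−η w ⊙ ∇_w f)

print(w, 1.0 / a)                   # both ≈ [2., 1., 0.5]
```

Since the multiplicative factor $\exp(-\eta \, w \odot \nabla_w f)$ is always positive, $w$ stays positive automatically, with no projection or clipping needed.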