Log Gradient Descent

This is a very simple trick I have had to rederive many times. Often you need to optimize over some parameters \bf x that need to be constrained to be positive. One solution is to reparameterize with some parameters \boldsymbol \theta, where the correspondence is

{\bf x} = \exp({\boldsymbol \theta}).

An unconstrained optimization over \boldsymbol \theta is equivalent to a constrained optimization over \bf x. Now, if the function we are optimizing is f, we have a trivial correspondence between gradients:

\frac{df}{d{\boldsymbol \theta}}=\frac{df}{d{\bf x}}\odot\frac{d\bf x}{d\boldsymbol\theta}

(Notation: \odot is the elementwise product, and \frac{d\bf x}{d\boldsymbol\theta} is a vector of the derivatives dx_i/d\theta_i.)

If we were to do gradient descent on \boldsymbol \theta, the updates would look like

{\boldsymbol \theta}' = {\boldsymbol \theta} - \lambda \frac{df}{d{\boldsymbol \theta}}.

But, why not forget about {\boldsymbol \theta}, and do this same update directly on \bf x?

{\bf x}' = \exp({\boldsymbol \theta}') = \exp({\boldsymbol \theta}) \odot \exp(- \lambda \frac{df}{d{\boldsymbol \theta}})

So the final update formula to remember is
\boxed{{\bf x}'= {\bf x} \odot \exp(- \lambda {\bf x} \odot \frac{df}{d\bf x})}

Leave a Reply