Implicit Differentiation

Regular single-variate implicit differentiation is taught in calc 101.

Single variate

Consider a relationship between x and y implicitly defined by

f(x,y)=0
Now, consider some (infinitesimally) small perturbation \delta x to x, inducing a corresponding perturbation \delta y in y. Then, expanding to first order,

f(x + \delta x, y + \delta y)=0

f(x,y)+\frac{\partial f(x, y)}{\partial x} \delta x + \frac{\partial f(x, y)}{\partial y} \delta y = 0

(Here I am using the convention that if the arguments to f are unspecified, it is to be evaluated at (x,y).) Recalling that f(x,y)=0, we can solve for \delta y to get

\boxed{ \delta y = - \dfrac{\frac{\partial f}{\partial x}}{\frac{\partial f}{\partial y}} \delta x }

Dividing by \delta x and taking the limit \delta x \to 0 gives the familiar formula

\boxed{\frac{d y}{d x}=- \dfrac{\frac{\partial f}{\partial x}}{\frac{\partial f}{\partial y}}}
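As a quick sanity check of this formula, here is a minimal sketch using the unit circle f(x,y)=x^2+y^2-1=0 (an example of my choosing, not from the text above), where the implicit derivative can be compared against the explicit solution y=\sqrt{1-x^2} on the upper branch:

```python
import math

# Unit circle: f(x, y) = x^2 + y^2 - 1 = 0 implicitly defines y(x).
def f_x(x, y):  # partial f / partial x
    return 2.0 * x

def f_y(x, y):  # partial f / partial y
    return 2.0 * y

x = 0.6
y = math.sqrt(1.0 - x**2)  # explicit upper-branch solution, y = 0.8

# Implicit formula: dy/dx = -(df/dx) / (df/dy)
dy_dx_implicit = -f_x(x, y) / f_y(x, y)

# Explicit check: y = sqrt(1 - x^2)  =>  dy/dx = -x / sqrt(1 - x^2)
dy_dx_explicit = -x / math.sqrt(1.0 - x**2)

print(dy_dx_implicit)                           # approximately -0.75
print(abs(dy_dx_implicit - dy_dx_explicit))     # essentially zero
```

Both routes give dy/dx = -x/y at the chosen point, as they should.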

Multivariate

We again have an implicit relationship, but now everything is a vector.

{\bf f}({\bf x},{\bf y})={\bf 0}

Consider perturbations {\bf \delta x} and {\bf \delta y}.

{\bf f}({\bf x}+{\bf \delta x},{\bf y}+{\bf \delta y})={\bf 0}

{\bf f}({\bf x},{\bf y}) + \frac{\partial{\bf f}({\bf x},{\bf y})}{\partial {\bf x}^T}{\bf \delta x}+ \frac{\partial{\bf f}({\bf x},{\bf y})}{\partial {\bf y}^T}{\bf \delta y}={\bf 0}

Again {\bf f}({\bf x},{\bf y})={\bf 0}, so solving for {\bf \delta y} (assuming the Jacobian \frac{\partial{\bf f}}{\partial {\bf y}^T} is invertible) gives

\boxed{ {\bf \delta y} = -\left(\frac{\partial{\bf f}}{\partial {\bf y}^T}\right)^{-1} \frac{\partial{\bf f}}{\partial {\bf x}^T}{\bf \delta x}}


\boxed{ \frac{d \bf y}{{d \bf x}^T} = -(\frac{\partial{\bf f}}{\partial {\bf y}^T})^{-1} \frac{\partial{\bf f}}{\partial {\bf x}^T}}
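To make the matrix version concrete, here is a hedged sketch with a toy linear relation of my own choosing, {\bf f}({\bf x},{\bf y}) = A{\bf y} + B{\bf x} = {\bf 0}, so that \partial{\bf f}/\partial{\bf y}^T = A and \partial{\bf f}/\partial{\bf x}^T = B, and the formula can be checked against finite differences on the explicit solution:

```python
import numpy as np

# Toy linear implicit relation (an assumed example): f(x, y) = A @ y + B @ x = 0,
# so df/dy^T = A and df/dx^T = B.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + np.eye(3)          # symmetric positive definite => invertible
B = rng.standard_normal((3, 2))

# Boxed formula: dy/dx^T = -(df/dy^T)^{-1} df/dx^T
dy_dx = -np.linalg.solve(A, B)   # prefer solve() over forming the inverse

# Check against the explicit solution y(x) = -A^{-1} B x by central differences.
def y_of_x(x):
    return np.linalg.solve(A, -B @ x)

x0 = rng.standard_normal(2)
eps = 1e-6
fd = np.column_stack([
    (y_of_x(x0 + eps * e) - y_of_x(x0 - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print(np.max(np.abs(dy_dx - fd)))  # small
```

Note the use of `np.linalg.solve` rather than an explicit matrix inverse; in practice one rarely forms \left(\frac{\partial{\bf f}}{\partial {\bf y}^T}\right)^{-1} directly.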

Use in optimization

Suppose the variables \bf y are implicitly defined as the minimizer of some function g that also depends on \bf x.

{\bf y} = \arg\min_{\bf y} g({\bf x},{\bf y})

Suppose furthermore, that there is some criterion L({\bf y}) that we would like to minimize.

\text{minimize } L({\bf y}).

So we want to set \bf x so that the resulting \bf y gives the best value for L.

Since \bf y minimizes g, the gradient of g with respect to \bf y must vanish at the minimizer. (We assume here that g is convex in \bf y for all \bf x, so this first-order condition characterizes the minimum.) That is,

\frac{ \partial g({\bf x},{\bf y})}{\partial {\bf y}}={\bf 0}

Making the substitution f \rightarrow \frac{\partial g}{\partial \bf y} in the last equation from the previous section, we get

\frac{d \bf y}{{d \bf x}^T} = -(\frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T})^{-1} \frac{\partial^2 g}{\partial {\bf y} \partial {\bf x}^T}

Now, by putting the pieces together, we can see how a small change in \bf x affects the objective function.

\frac{d L}{d {\bf x}^T}=\frac{\partial L}{\partial {\bf y}^T}\frac{d \bf y}{{d \bf x}^T}

\frac{d L}{d {\bf x}^T}=-\frac{\partial L}{\partial {\bf y}^T}\left(\frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T}\right)^{-1} \frac{\partial^2 g}{\partial {\bf y} \partial {\bf x}^T}

Transposing (and using the symmetry of the Hessian \frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T}), we can write this as a column vector:

\boxed{ \frac{d L}{d {\bf x}}=-\frac{\partial^2 g}{\partial {\bf x} \partial{\bf y}^T}\left(\frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T}\right)^{-1}\frac{\partial L}{\partial {\bf y}}}
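The whole pipeline can be sketched on a quadratic toy problem of my own choosing: take g({\bf x},{\bf y}) = \frac{1}{2}{\bf y}^T Q {\bf y} - {\bf y}^T{\bf x} (so the inner argmin is {\bf y} = Q^{-1}{\bf x}) and L({\bf y}) = \frac{1}{2}\|{\bf y}-{\bf t}\|^2, then compare the boxed gradient to finite differences through the full {\bf x} \to {\bf y} \to L map:

```python
import numpy as np

# Assumed toy problem: g(x, y) = 0.5 y^T Q y - y^T x, so argmin_y g = Q^{-1} x,
# and L(y) = 0.5 ||y - t||^2 is the outer criterion. Q and t are made up here.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
Q = M @ M.T + np.eye(3)          # symmetric positive definite => g convex in y
t = rng.standard_normal(3)
x = rng.standard_normal(3)

def y_star(x):
    # Inner minimization: solve dg/dy = Q y - x = 0.
    return np.linalg.solve(Q, x)

def L(y):
    return 0.5 * np.sum((y - t) ** 2)

y = y_star(x)
# Boxed formula: dL/dx = -(d^2g/dx dy^T)(d^2g/dy dy^T)^{-1} dL/dy.
# Here d^2g/dx dy^T = -I and d^2g/dy dy^T = Q, so dL/dx = Q^{-1} (y - t).
dL_dx = np.linalg.solve(Q, y - t)

# Finite-difference check of the full pipeline x -> y -> L.
eps = 1e-6
fd = np.array([
    (L(y_star(x + eps * e)) - L(y_star(x - eps * e))) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(dL_dx - fd)))  # small
```

The key practical point is that the gradient with respect to \bf x is obtained with one linear solve against the Hessian of g, without ever differentiating through the iterations of the inner minimization.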

I’ve used results like this in this paper and there is a bunch of work using this kind of thing for training neural networks.  See this entry.

