# Implicit Differentiation

Regular single-variate implicit differentiation is taught in calc 101.

# Single variate

Consider a relationship between $x$ and $y$ implicitly defined by

$f(x,y)=0.$

Now, consider some (infinitesimally) small perturbation $\delta x$ to $x$, inducing a corresponding perturbation $\delta y$ in $y$.

$f(x + \delta x, y + \delta y)=0$

Expanding to first order in the perturbations,

$f(x,y)+\frac{\partial f(x, y)}{\partial x} \delta x + \frac{\partial f(x, y)}{\partial y} \delta y = 0$

(Here I am using the convention that when the arguments to $f$ are unspecified, it is evaluated at $(x,y)$.) Recalling that $f(x,y)=0$, the first term vanishes; solving for $\delta y$ gives

$\boxed{ \delta y = - \dfrac{\frac{\partial f}{\partial x}}{\frac{\partial f}{\partial y}} \delta x }$

or

$\boxed{\frac{d y}{d x}=- \dfrac{\frac{\partial f}{\partial x}}{\frac{\partial f}{\partial y}}}$
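
As a quick numerical check (my own sketch, not part of the original derivation), the boxed formula can be applied to the unit circle $x^2 + y^2 - 1 = 0$, where the slope of the upper branch is known in closed form. The example uses JAX for the partial derivatives.

```python
import jax
import jax.numpy as jnp

# Implicit relation: the unit circle, f(x, y) = x^2 + y^2 - 1 = 0.
def f(x, y):
    return x**2 + y**2 - 1.0

# dy/dx = -(df/dx) / (df/dy), evaluated at a point on the curve.
def dy_dx(x, y):
    fx = jax.grad(f, argnums=0)(x, y)
    fy = jax.grad(f, argnums=1)(x, y)
    return -fx / fy

x0 = 0.6
y0 = jnp.sqrt(1.0 - x0**2)  # upper branch: y = sqrt(1 - x^2), so y0 = 0.8

print(dy_dx(x0, y0))                                 # -0.75
print(jax.grad(lambda x: jnp.sqrt(1.0 - x**2))(x0))  # -0.75, explicit check
```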

# Multiple variate

We again have an implicit relationship, but now everything is a vector.

${\bf f}({\bf x},{\bf y})={\bf 0}$

Consider a small perturbation ${\bf \delta x}$ and the induced perturbation ${\bf \delta y}$.

${\bf f}({\bf x}+{\bf \delta x},{\bf y}+{\bf \delta y})={\bf 0}$

${\bf f}({\bf x},{\bf y}) + \frac{\partial{\bf f}({\bf x},{\bf y})}{\partial {\bf x}^T}{\bf \delta x}+ \frac{\partial{\bf f}({\bf x},{\bf y})}{\partial {\bf y}^T}{\bf \delta y}={\bf 0}$

Again the first term vanishes because ${\bf f}({\bf x},{\bf y})={\bf 0}$, so, assuming the Jacobian $\frac{\partial{\bf f}}{\partial {\bf y}^T}$ is invertible,

$\boxed{ {\bf \delta y} = -(\frac{\partial{\bf f}}{\partial {\bf y}^T})^{-1} \frac{\partial{\bf f}}{\partial {\bf x}^T}{\bf \delta x}}$

or

$\boxed{ \frac{d \bf y}{{d \bf x}^T} = -(\frac{\partial{\bf f}}{\partial {\bf y}^T})^{-1} \frac{\partial{\bf f}}{\partial {\bf x}^T}}$
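
To make the vector version concrete, here is a small JAX sketch (a toy example of my own, with arbitrary illustrative matrices $A$, $B$, $c$): a linear relation ${\bf f}({\bf x},{\bf y}) = A{\bf y} + B{\bf x} + c = {\bf 0}$, for which the Jacobian $\frac{d\bf y}{d{\bf x}^T} = -A^{-1}B$ is known in closed form.

```python
import jax
import jax.numpy as jnp

# Toy vector-valued implicit relation with a known solution:
# f(x, y) = A @ y + B @ x + c = 0   =>   y(x) = -A^{-1} (B @ x + c).
A = jnp.array([[3.0, 1.0], [0.0, 2.0]])
B = jnp.array([[1.0, -1.0], [2.0, 0.5]])
c = jnp.array([0.5, -1.0])

def f(x, y):
    return A @ y + B @ x + c

x0 = jnp.array([0.3, -0.7])
y0 = -jnp.linalg.solve(A, B @ x0 + c)     # a point satisfying f(x0, y0) = 0

# dy/dx^T = -(df/dy^T)^{-1} (df/dx^T)
df_dy = jax.jacobian(f, argnums=1)(x0, y0)
df_dx = jax.jacobian(f, argnums=0)(x0, y0)
dy_dx = -jnp.linalg.solve(df_dy, df_dx)

# Explicit check: y(x) is linear here, with Jacobian -A^{-1} B.
print(jnp.allclose(dy_dx, -jnp.linalg.solve(A, B)))   # True
```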

# Use in optimization

Suppose some variables $\bf y$ are implicitly defined as the minimizer of a function that also depends on $\bf x$:

${\bf y} = \arg\min_{\bf y} g({\bf x},{\bf y})$

Suppose, furthermore, that there is some criterion $L({\bf y})$ that we would like to minimize:

$\text{minimize } L({\bf y})$.

So we want to set $\bf x$ so that the resulting $\bf y$ gives the best value for $L$.

Since $\bf y$ is a minimizer of $g$, the gradient of $g$ with respect to $\bf y$ vanishes there. (We assume here that $g$ is convex in $\bf y$ for all $\bf x$, so this first-order condition characterizes the minimizer.)

$\frac{ \partial g({\bf x},{\bf y})}{\partial {\bf y}}={\bf 0}$

Making the substitution ${\bf f} \rightarrow \frac{\partial g}{\partial \bf y}$ in the last boxed equation from the previous section, we get

$\frac{d \bf y}{{d \bf x}^T} = -(\frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T})^{-1} \frac{\partial^2 g}{\partial {\bf y} \partial {\bf x}^T}$

Now, by putting the pieces together, we can see how a small change in $\bf x$ affects the objective function.

$\frac{d L}{d {\bf x}^T}=\frac{\partial L}{\partial {\bf y}^T}\frac{d \bf y}{{d \bf x}^T}$

$\frac{d L}{d {\bf x}^T}=-\frac{\partial L}{\partial {\bf y}^T}(\frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T})^{-1} \frac{\partial^2 g}{\partial {\bf y} \partial {\bf x}^T}$

Transposing, and using the symmetry of the Hessian $\frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T}$, this can be written as a column gradient:

$\boxed{ \frac{d L}{d {\bf x}}=-\frac{\partial^2 g}{\partial {\bf x} \partial{\bf y}^T}(\frac{\partial^2 g}{\partial {\bf y} \partial {\bf y}^T})^{-1}\frac{\partial L}{\partial {\bf y}}}$
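
Here is a hedged JAX sketch of the final formula (again a toy example of my own, not drawn from any particular paper): a quadratic inner problem $g({\bf x},{\bf y}) = \frac12 {\bf y}^T Q {\bf y} - {\bf y}^T {\bf x}$ whose minimizer ${\bf y}({\bf x}) = Q^{-1}{\bf x}$ is available in closed form, so the implicit gradient of $L$ can be checked against direct differentiation.

```python
import jax
import jax.numpy as jnp

# Inner problem with a closed-form minimizer:
#   g(x, y) = 0.5 * y^T Q y - y^T x   =>   y(x) = Q^{-1} x
# Outer criterion: L(y) = 0.5 * ||y - t||^2.
Q = jnp.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
t = jnp.array([1.0, -1.0])

def g(x, y):
    return 0.5 * y @ Q @ y - y @ x

def L(y):
    return 0.5 * jnp.sum((y - t) ** 2)

x0 = jnp.array([0.3, 0.7])
y0 = jnp.linalg.solve(Q, x0)              # argmin_y g(x0, y)

g_y   = jax.grad(g, argnums=1)                   # dg/dy
g_yy  = jax.jacobian(g_y, argnums=1)(x0, y0)     # d^2 g / dy dy^T
g_yx  = jax.jacobian(g_y, argnums=0)(x0, y0)     # d^2 g / dy dx^T
dL_dy = jax.grad(L)(y0)

# dL/dx = -(d^2 g / dx dy^T) (d^2 g / dy dy^T)^{-1} dL/dy
dL_dx = -g_yx.T @ jnp.linalg.solve(g_yy, dL_dy)

# Check by differentiating through the closed-form inner solution.
print(jnp.allclose(dL_dx, jax.grad(lambda x: L(jnp.linalg.solve(Q, x)))(x0)))  # True
```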

I’ve used results like this in this paper, and there is a bunch of work using this kind of technique for training neural networks; see this entry.