A neural network can be described as a series of alternating linear transformations and elementwise nonlinearities.

{\bf s}_n=W_n{\bf x}_{n-1}

{\bf x}_n=\sigma({\bf s}_n)

\sigma is a “sigmoid function”, usually 1/(1+\exp(-s)) or \tanh(s).
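A minimal sketch of this forward pass in NumPy, assuming the logistic sigmoid (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(W, x0):
    """Forward pass through the network.

    W is a list of weight matrices [W_1, ..., W_N]; returns the cached
    activations xs = [x_0, ..., x_N] and pre-activations ss = [s_1, ..., s_N].
    """
    xs, ss = [x0], []
    for Wn in W:
        ss.append(Wn @ xs[-1])      # s_n = W_n x_{n-1}
        xs.append(sigmoid(ss[-1]))  # x_n = sigma(s_n)
    return xs, ss
```

Caching the intermediate x_n and s_n here is what makes the backward pass below cheap to compute.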

You present an input {\bf x}_0 and get an output {\bf x}_N.  You have some loss function L that measures how good that particular output is for that input.  Backpropagation is an algorithm for calculating the derivatives of L with respect to all the weight matrices W_n.  Backprop is a special case of reverse-mode automatic differentiation, but it is simple enough to implement manually that doing so is often worth it for efficiency.

The loss function will directly give the derivatives with respect to the output.  This will be either (depending on the application)

\displaystyle{\frac{dL}{d{\bf x}_N}} or \displaystyle{\frac{dL}{d{\bf s}_N}}.
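For concreteness, one common (hypothetical here) choice is the squared-error loss L = \frac{1}{2}\|{\bf x}_N - {\bf y}\|^2, whose gradient with respect to the output is simply {\bf x}_N - {\bf y}:

```python
import numpy as np

# Illustrative example: squared-error loss L = 1/2 ||x_N - y||^2.
# Its gradient with respect to the output x_N is just x_N - y.
def squared_error(xN, y):
    diff = xN - y
    return 0.5 * np.dot(diff, diff), diff  # returns (L, dL/dx_N)
```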

The algorithm follows easily from applying the following three rules (here, \odot denotes the elementwise product):

\displaystyle{\frac{dL}{d{\bf s}_n} = \frac{dL}{d{\bf x}_n} \odot \sigma'({\bf s}_n)}

\displaystyle{\frac{dL}{dW_n} = \frac{dL}{d{\bf s}_n}{{\bf x}_{n-1}}^T}

\displaystyle{\frac{dL}{d{\bf x}_{n-1}} = {W_n}^T\frac{dL}{d{\bf s}_n}}

Notice that applying these rules in reverse will have the same complexity as the original “forward propagation”.
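The three rules above can be sketched as follows, assuming the logistic sigmoid (so \sigma'(s) = \sigma(s)(1-\sigma(s))) and a forward pass that cached each {\bf x}_n and {\bf s}_n; the names here are illustrative:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(W, x0):
    """Forward pass, caching each x_n and s_n for the backward pass."""
    xs, ss = [x0], []
    for Wn in W:
        ss.append(Wn @ xs[-1])      # s_n = W_n x_{n-1}
        xs.append(sigmoid(ss[-1]))  # x_n = sigma(s_n)
    return xs, ss

def backward(W, xs, ss, dL_dxN):
    """Apply the three rules in reverse; returns dL/dW_n for every layer."""
    grads = [None] * len(W)
    dL_dx = dL_dxN
    for n in reversed(range(len(W))):
        sig = sigmoid(ss[n])
        dL_ds = dL_dx * sig * (1.0 - sig)  # dL/ds_n = dL/dx_n ⊙ sigma'(s_n)
        grads[n] = np.outer(dL_ds, xs[n])  # dL/dW_n = dL/ds_n x_{n-1}^T
        dL_dx = W[n].T @ dL_ds             # dL/dx_{n-1} = W_n^T dL/ds_n
    return grads
```

Each layer's backward step is one elementwise product, one outer product, and one matrix-vector multiply, which is why the backward pass costs about the same as the forward pass.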

