What Gauss-Seidel is Really Doing

I’ve been reading Alan Sokal’s lecture notes “Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms” today. Once I learned to take the word “Hamiltonian” and mentally substitute “function to be minimized”, they are very clearly written.

Anyway, the notes give an explanation of the Gauss-Seidel iterative method for solving linear systems that is so clear, I feel a little cheated that it was never explained to me before. Let’s start with the typical (confusing!) explanation of Gauss-Seidel (as in, say, Wikipedia). You want to solve some linear system

A{\bf x}={\bf b},

for \bf x. What you do is decompose A into a diagonal matrix D, a strictly lower triangular matrix L, and a strictly upper triangular matrix U, like


To solve this, you iterate like so:

{\bf x} \leftarrow (D+L)^{-1}({\bf b}-U{\bf x}).

This works when A is symmetric positive definite.

Now, what the hell is going on here? The first observation is that instead of inverting A, you can think of minimizing the function

\frac{1}{2} {\bf x}^T A {\bf x} - {\bf x}^T{\bf b}.

(It is easy to see that the global minimum of this is found when A{\bf x}={\bf b}.) Now, how to minimize this function? One natural way to approach things would be to just iterate through the coordinates of {\bf x}, minimizing the function over each one. Say we want to minimize with respect to x_i. So, we are going to make the change x_i \leftarrow x_i + d. Thus, we want to minimize (with respect to d), this thing:

\frac{1}{2} ({\bf x}+d {\bf e_i})^T A ({\bf x}+d {\bf e_i}) - ({\bf x}+d {\bf e_i})^T {\bf b}

Expanding this, taking the derivative w.r.t. d, and solving gives us (where {\bf e_i} is the unit vector in the ith direction)

d = \frac{{\bf e_i}^T({\bf b}-A{\bf x})}{{\bf e_i}^TA{\bf e_i}},

or, equivalently,

d = \frac{1}{A_{ii}}(b_i-A_{i*}{\bf x}).

So, taking the solution at one timestep, {\bf x}^t, and solving for the solution {\bf x}^{t+1} at the next timestep, we will have, for the first index,

x^{t+1}_1 = x^t_1 + \frac{1}{A_{11}}(b_1-A_{1*}x^t).

Meanwhile, for the second index,

x^{t+1}_2 = x^t_2 + \frac{1}{A_{22}}(b_2-A_{2*}(x^t-{\bf e_1} x^t_1 +{\bf e_1} x^{t+1}_1 )).

And, more generally

x^{t+1}_n = x^t_n + \frac{1}{A_{nn}}(b_n-A_{n*}(\sum_{i<n}{\bf e_i} x^{t+1}_i +\sum_{i\geq n}{\bf e_i} x^t_i )).

Now, all is a matter of cranking through the equations.

x^{t+1}_n = x^t_n + \frac{1}{A_{nn}}(b_n-\sum_{i<n}A_{n i} x^{t+1}_i - \sum_{i\geq n}A_{n i} x^t_i ))

D {\bf x}^{t+1} = D {\bf x}^t + {\bf b}- L x^{t+1} - (D+U) {\bf x}^t

(D+L) {\bf x}^{t+1} = {\bf b}-U {\bf x}^t

Which is exactly the iteration we started with.

The fact that this would work when A is symmetric positive definite is now obvious– that is when the objective function we started with is bounded below.

The understanding that Gauss-Seidel is just a convenient way to implement coordinate descent also helps to get intuition for the numerical properties of Gauss-Seidel (e.g. quickly removing "high frequency error", and very slowly removing "low frequency error").

One thought on “What Gauss-Seidel is Really Doing

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s