I’ve been reading Alan Sokal’s lecture notes “Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms” today. Once I learned to take the word “Hamiltonian” and mentally substitute “function to be minimized”, they are very clearly written.

Anyway, the notes give an explanation of the Gauss-Seidel iterative method for solving linear systems that is so clear, I feel a little cheated that it was never explained to me before. Let’s start with the typical (confusing!) explanation of Gauss-Seidel (as in, say, Wikipedia). You want to solve some linear system

,

for . What you do is decompose into a diagonal matrix , a strictly lower triangular matrix , and a strictly upper triangular matrix , like

.

To solve this, you iterate like so:

.

This works when is symmetric positive definite.

Now, what the hell is going on here? The first observation is that instead of inverting , you can think of minimizing the function

.

(It is easy to see that the global minimum of this is found when .) Now, how to minimize this function? One natural way to approach things would be to just iterate through the coordinates of , minimizing the function over each one. Say we want to minimize with respect to . So, we are going to make the change . Thus, we want to minimize (with respect to d), this thing:

Expanding this, taking the derivative w.r.t. , and solving gives us (where is the unit vector in the th direction)

,

or, equivalently,

.

So, taking the solution at one timestep, , and solving for the solution at the next timestep, we will have, for the first index,

.

Meanwhile, for the second index,

.

And, more generally

.

Now, all is a matter of cranking through the equations.

Which is exactly the iteration we started with.

The fact that this would work when is symmetric positive definite is now obvious– that is when the objective function we started with is bounded below.

The understanding that Gauss-Seidel is just a convenient way to implement coordinate descent also helps to get intuition for the numerical properties of Gauss-Seidel (e.g. quickly removing "high frequency error", and very slowly removing "low frequency error").

till how much the iteration should be continued?