In calc 101, the derivative is derived as . So, if you want to estimate a derivative, an easy way to do so would be to just pick some small and estimate:

This can work OK. Let’s look at an example of trying to calculate the derivative of , using a range of different

What’s happening? Well, the result is true mathematically in the limit that is small, so it’s natural to get errors for large . However, with very small you run into trouble because floating-point arithmetic can only represent finite precision. Let’s try again with a smaller value of

That’s somewhat concerning. We can still get a nearly correct value, but we have a limited range of steps that achieve it. This is very well-known in numerical analysis, and a common solution is to use two-sided differences, i.e. to estimate

If we try that, we indeed get better results:

What’s happening here? Basically, we are using more information about the function to make a higher-order approximation, so we can mathematically get away with using a larger value of . This in turn isis helpful to avoid the numerical precision demons.

Great! But let’s go deeper. If we use , we get:

Huh. 1-sided differences totally fall apart, but we seem to be running into more trouble. But never fear, you can use **“four-sided differences”**!

Then, we get what you might expect:

But what if we go deeper, with ?

Or maybe we should go even deeper, with ?

Even the four-sided differences have failed us. Now, you might take a lesson here that you shouldn’t be using numerical differences (and I sort of agree). Those of certain temperament, on the other hand, would say instead that what we need is power. OK then, how do **six-sided differences** sound to you? Sound good?

This is still tough, but there is at least some range of epsilon where you can calculate a reasonable derivative. There is, of course, a never-ended sequence of these higher-order derivative approximations. There’s even a calculator, for all your bespoke finite-difference stencil needs.

Now, you might think that the real solution here is to use automatic differentiation, and you’re mostly right. It takes more computation to sample the function at more points, and no number of samples will fundamentally stop the numerical demons from destroying your result.

However, there still remain cases where it’s worthwhile to manually compute a derivative. More importantly perhaps, when you’re implementing an automatic differentiation tool, you still need to *test that it is actually correct*! Here, numerically differences will probably forever remain useful. So, certainly for the cases of building a test suite for autodiff code, it certainly makes sense to use these higher-order derivatives.

(I originally intended to leave this as a comment on Tim Viera’s post on testing gradient implementations, but this kinda got out of control.)