In calc 101, the derivative is defined as

$$f'(x) = \lim_{\epsilon \to 0} \frac{f(x+\epsilon) - f(x)}{\epsilon}.$$

So, if you want to estimate a derivative, an easy way to do so would be to just pick some small $\epsilon$ and estimate:

$$f'(x) \approx \frac{f(x+\epsilon) - f(x)}{\epsilon}.$$
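As a quick sketch, the one-sided estimate is a one-liner in Python (the test function $\sin$ and the point $x = 1$ here are just illustrative choices, not the example from the plots):

```python
import math

def forward_diff(f, x, eps):
    # One-sided (forward) difference: f'(x) ~ (f(x + eps) - f(x)) / eps.
    return (f(x + eps) - f(x)) / eps

# Illustrative example: the derivative of sin is cos, so at x = 1.0
# the answer should be cos(1.0) ~ 0.5403.
x = 1.0
print(forward_diff(math.sin, x, 1e-6))   # accurate to several digits
print(forward_diff(math.sin, x, 1e-16))  # returns 0.0: 1 + 1e-16 rounds back to 1.0 in double precision
```

The second call shows the failure mode discussed below: once $\epsilon$ is smaller than the floating-point spacing around $x$, the numerator is computed from identical values.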

This can work OK. Let’s look at an example of trying to calculate the derivative of a test function, using a range of different values of $\epsilon$:

What’s happening? Well, the result is true mathematically in the limit that $\epsilon$ is small, so it’s natural to get errors for large $\epsilon$. However, with very small $\epsilon$ you run into trouble because floating-point arithmetic only has finite precision. Let’s try again with a smaller value of $\epsilon$:

That’s somewhat concerning. We can still get a nearly correct value, but only over a limited range of step sizes. This problem is very well known in numerical analysis, and a common solution is to use two-sided differences, i.e. to estimate

$$f'(x) \approx \frac{f(x+\epsilon) - f(x-\epsilon)}{2\epsilon}.$$
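A sketch of the two-sided version in Python, again with $\sin$ as a stand-in test function:

```python
import math

def forward_diff(f, x, eps):
    # One-sided difference, for comparison: O(eps) truncation error.
    return (f(x + eps) - f(x)) / eps

def central_diff(f, x, eps):
    # Two-sided (central) difference: the O(eps) error terms cancel,
    # leaving an O(eps^2) truncation error.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# At a moderate step size, the central difference is far more accurate.
x, eps = 1.0, 1e-4
print(abs(forward_diff(math.sin, x, eps) - math.cos(x)))  # roughly eps/2 * |sin(x)|
print(abs(central_diff(math.sin, x, eps) - math.cos(x)))  # roughly eps^2/6 * |cos(x)|
```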

If we try that, we indeed get better results:

What’s happening here? Basically, we are using more information about the function to make a higher-order approximation, so we can mathematically get away with using a larger value of $\epsilon$. This in turn is helpful for avoiding the numerical precision demons.

Great! But let’s go deeper. If we try a harder function, we get:

Huh. The 1-sided differences totally fall apart, and even the 2-sided differences run into trouble. But never fear: you can use **“four-sided differences”**, i.e. estimate

$$f'(x) \approx \frac{-f(x+2\epsilon) + 8f(x+\epsilon) - 8f(x-\epsilon) + f(x-2\epsilon)}{12\epsilon}.$$
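A sketch of the four-point stencil in Python, with the standard central-difference coefficients $(-1, 8, -8, 1)/12$:

```python
import math

def central_diff4(f, x, eps):
    # Four-point central difference: truncation error is O(eps^4),
    # so a comparatively large eps still gives good accuracy.
    return (-f(x + 2*eps) + 8*f(x + eps)
            - 8*f(x - eps) + f(x - 2*eps)) / (12 * eps)

# With a fairly large step, the estimate of d/dx sin(x) at x = 1 is still very accurate.
print(abs(central_diff4(math.sin, 1.0, 1e-2) - math.cos(1.0)))
```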

Then, we get what you might expect:

But what if we go deeper, with an even harder function?

Or maybe we should go even deeper still?

Even the four-sided differences have failed us. Now, you might take the lesson here that you shouldn’t be using numerical differences at all (and I sort of agree). Those of a certain temperament, on the other hand, would say instead that what we need is power. OK then, how do **six-sided differences** sound to you?

$$f'(x) \approx \frac{f(x+3\epsilon) - 9f(x+2\epsilon) + 45f(x+\epsilon) - 45f(x-\epsilon) + 9f(x-2\epsilon) - f(x-3\epsilon)}{60\epsilon}$$

Sound good?
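As a sketch, the corresponding six-point stencil, with coefficients $(1, -9, 45, -45, 9, -1)/60$ and $O(\epsilon^6)$ truncation error:

```python
import math

def central_diff6(f, x, eps):
    # Six-point central difference: O(eps^6) truncation error.
    return (f(x + 3*eps) - 9*f(x + 2*eps) + 45*f(x + eps)
            - 45*f(x - eps) + 9*f(x - 2*eps) - f(x - 3*eps)) / (60 * eps)

# Even with a coarse step of 0.05, the estimate of d/dx sin(x) at x = 1 is excellent.
print(abs(central_diff6(math.sin, 1.0, 0.05) - math.cos(1.0)))
```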

This is still tough, but there is at least some range of $\epsilon$ where you can calculate a reasonable derivative. There is, of course, a never-ending sequence of these higher-order derivative approximations. There’s even a calculator, for all your bespoke finite-difference stencil needs.

Now, you might think that the real solution here is to use automatic differentiation, and you’re mostly right: numerical differences take more computation to sample the function at more points, and no number of samples will fundamentally stop the numerical demons from destroying your result.

However, there still remain cases where it’s worthwhile to manually compute a derivative. Perhaps more importantly, when you’re implementing an automatic differentiation tool, you still need to *test that it is actually correct*! Here, numerical differences will probably forever remain useful. So, at least for building a test suite for autodiff code, it certainly makes sense to use these higher-order differences.
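For instance, a minimal gradient check might look like the following sketch (the function names, point, and tolerance are hypothetical choices, not from any particular autodiff library):

```python
import math

def check_grad(f, grad_f, x, eps=1e-5, tol=1e-6):
    # Compare a claimed (e.g. autodiff) derivative against a
    # two-sided finite difference at a single point.
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    return abs(numeric - grad_f(x)) < tol

# exp is its own derivative, so this check should pass...
print(check_grad(math.exp, math.exp, 0.5))  # True
# ...while a wrong "gradient" should fail.
print(check_grad(math.exp, math.cos, 0.5))  # False
```

In practice you would run such a check at many random points, and a higher-order stencil widens the range of usable tolerances.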

(I originally intended to leave this as a comment on Tim Vieira’s post on testing gradient implementations, but it kinda got out of control.)

Pingback: The Secret Guild of Silicon Valley · Two Sided Finite Differences

Very nice post. Reminds me of using Ridders’ method for iteratively refining the estimate, starting with a large step size.

Thanks for that. Do you have a reference for Ridders’ method? I’m familiar with its use for root finding (which is clearly related), but I can’t seem to find a reference directly for estimating derivatives.

This is super awesome! I hate numerical demons! How did you measure the numerical accuracy to make these plots? Also, have you heard of Herbie (https://herbie.uwplse.org/)?

For Ridders’ method you can look at section 5.7 of the Numerical Recipes book. It is also implemented in Ceres Solver (http://ceres-solver.org/numerical_derivatives.html#ridders-method), and they have done a good job of explaining the method.