## Jan 18 Benford’s Law

If you were to look at the populations of all of the countries in the world, what first digit would you expect these numbers to begin with most often? How about if you were to measure the lengths of all of the rivers in the world (in whatever unit you like, but let’s say metres)? What would the leading digit most likely be? At first it seems like all of the digits from 1 to 9 should turn up equally often, but that is surprisingly far from the truth. About 30.1% of these first digits should be 1, with only 4.6% being 9. This is called Benford’s law.

The mistake that we all make here is that we are used to picking random numbers between 1 and 10 or 1 and 100 etc. Within these ranges the first digits are more or less equally likely. However, if you are picking your numbers from actual observed data then the numbers will lie between two limits that are themselves fairly arbitrary. If, for example, the upper limit is 38,291 and the lower limit is 5,304, then most of the leading digits will be either 1 or 2, because over half of our range is in either the 10,000s or the 20,000s. This is just one example, but on average it holds true for most sets.
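To check the example above, here is a rough Python sketch that counts the leading digits of every whole number between the two limits. It assumes that each integer in the range is equally likely, which is a simplification of my own rather than anything real data promises, and the function names are made up for illustration.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_shares(low: int, high: int) -> dict[int, float]:
    """Share of each leading digit among the integers low..high inclusive.

    Assumption (not from the text): every integer in the range is equally likely.
    """
    counts = Counter(leading_digit(n) for n in range(low, high + 1))
    total = high - low + 1
    return {d: counts[d] / total for d in range(1, 10)}


shares = leading_digit_shares(5_304, 38_291)
for digit, share in shares.items():
    print(f"{digit}: {share:.1%}")
print(f"1 or 2: {shares[1] + shares[2]:.1%}")
```

Under that assumption the digits 1 and 2 together cover roughly 60% of the range, while no number in it starts with 4 at all.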

One requirement for the data is that it spreads over a few orders of magnitude. For example, if you were to measure the heights of humans then you would get mostly 1 as the first digit in metres and mostly 5 or 6 in feet. This data is just too concentrated for Benford’s law to apply. But there are lots of data sets where this is not the case: pick anything number-based from the front pages of an atlas and see for yourself. To take something a bit more mathematical, if you look at the leading digits of the powers of 2: 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1…, you'll find that in the long run they fit the law exactly.
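If you would rather test the powers of 2 than take my word for it, a short Python sketch along these lines should do. The cut-off of 1000 powers is an arbitrary choice of mine, and the helper names are just for illustration.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def power_of_two_leading_digits(count: int) -> list[int]:
    """Leading digits of 2**1, 2**2, ..., 2**count."""
    return [leading_digit(2 ** k) for k in range(1, count + 1)]


# 1000 powers is an arbitrary sample size chosen for this sketch.
digits = power_of_two_leading_digits(1000)
counts = Counter(digits)

for d in range(1, 10):
    observed = counts[d] / len(digits)
    expected = math.log10(d + 1) - math.log10(d)
    print(f"{d}: observed {observed:.3f}, Benford {expected:.3f}")
```

The observed shares are not identical to the Benford values for any finite sample, but they should sit close to them and get closer as you include more powers.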

How do we work out these numbers? The key to this is to look at the logarithmic scale:

When you pick a random number you can think of it as picking a point on the log scale. The regions of the scale that are biggest are those where the leading digit is 1: it lies between 1 and 2, so this happens log 2 − log 1 = 0.301 of the time (all logs here are base 10). The digit 2 lies between 2 and 3, so its share is log 3 − log 2 = 0.176. The digit 3 happens with probability log 4 − log 3 = 0.125, and so on: in general a leading digit d has probability log(d+1) − log d.
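The same sums can be run for all nine digits at once. This is a minimal Python sketch of the log-scale calculation above; `benford_probability` is a name I have made up for it.

```python
import math


def benford_probability(digit: int) -> float:
    """Width of the slice of the base-10 log scale where `digit` leads."""
    if not 1 <= digit <= 9:
        raise ValueError("digit must be between 1 and 9")
    return math.log10(digit + 1) - math.log10(digit)


probabilities = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probabilities.items():
    print(f"{d}: {p:.3f}")
```

The nine slices tile the whole scale from 1 to 10, so the probabilities add up to 1, and the last one, for the digit 9, is the 4.6% quoted at the start.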

The law was actually first noticed by Simon Newcomb in 1881, when he saw that the first pages of his book of log tables were far more beaten up than the rest. After he realised that these pages held the logs of numbers with a leading digit of 1, he came up with an early version of the law, which was eventually named after a later discoverer, Frank Benford. This isn't the Newcomb of Newcomb’s Paradox; William Newcomb was his great great nephew. While Simon Newcomb made contributions to Mathematics, Physics and Astronomy, he is infamous for predicting that humankind would never be able to build a flying machine, or that it would require drastic innovation such as new lighter materials to be invented. This was in 1903, the same year that the Wright Brothers made their first powered flight, using materials available to Newcomb.

If you want to work out the distribution in other bases you can just use logs in the matching base. In binary Benford’s law is trivially true because the only number that doesn't have 1 as the leading digit is 0. Moving on to ternary, 1 happens log₃ 2 = 0.631 of the time and 2 the remaining 0.369 of the time. This can be extended to as large a base as you like.
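For anyone who wants to try other bases, here is a hedged Python sketch that swaps in a log of the matching base. The function name and the choice of bases to print are my own.

```python
import math


def leading_digit_probability(digit: int, base: int) -> float:
    """Probability that `digit` is the leading digit when writing in `base`."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if not 1 <= digit < base:
        raise ValueError("digit must be between 1 and base - 1")
    # Same log-scale argument as in base 10, with logs taken in the given base.
    return math.log(digit + 1, base) - math.log(digit, base)


# Bases picked for illustration: binary, ternary and decimal.
for base in (2, 3, 10):
    row = ", ".join(
        f"{d}: {leading_digit_probability(d, base):.3f}" for d in range(1, base)
    )
    print(f"base {base} -> {row}")
```

In binary the single possible leading digit takes the whole scale, and in ternary the split comes out at roughly 0.631 to 0.369.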