Statistics

Crash courses and sparks for data scientists, analysts, and students on statistical mathematics and related data science domains such as machine learning, with easy-to-follow lessons and insights.

In many real-world datasets, especially those whose values span several orders of magnitude, the first digit of a value is far more likely to be 1 than any other digit, and the probability falls steadily to its lowest for the digit 9. This pattern is known as Benford’s law, which gives the probability of a leading digit d (from 1 to 9) as:

$$P(d)=\log_{10}(d+1)-\log_{10}(d)$$
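To make the formula concrete, here is a minimal Python sketch that evaluates it for each leading digit. The function name `benford_probability` is only illustrative. The result is roughly 30.1% for the digit 1 and 4.6% for the digit 9, and the nine probabilities sum to 1.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading digit is d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(d + 1) - math.log10(d)


# Probability for every possible leading digit.
benford = {d: benford_probability(d) for d in range(1, 10)}

for digit, p in benford.items():
    print(f"{digit}: {p:.3f}")
```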

The intuition behind this observation is that a value growing at a steady percentage rate spends a long time with 1 as its first digit, and less time with each successive digit, with 9 lasting the shortest. For example, to grow from 100 to 200, a value has to double, which is 100% growth. But once it reaches 900, the transition to 1000 is quick, as it takes only about 11% growth.

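In other words, the growth rate itself does not change. What changes is how much growth each leading digit requires: a value lingers longest while its first digit is 1 and passes through each later digit more quickly, with 9 being the briefest.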
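The intuition can be checked with a small simulation. The sketch below assumes a value that starts at 1 and grows by a fixed 1% per step (both numbers are arbitrary choices for illustration), and counts how many steps it spends with each leading digit. It tracks the base-10 logarithm of the value rather than the value itself, so long runs do not overflow.

```python
import math
from collections import Counter


def leading_digit_shares(rate: float = 0.01, steps: int = 100_000) -> dict:
    """Share of steps spent at each leading digit under constant growth."""
    log_step = math.log10(1 + rate)  # growth per step, in log10 units
    log_value = 0.0                  # log10 of the starting value, 1
    counts = Counter()
    for _ in range(steps):
        # The fractional part of log10(value) determines the leading digit.
        mantissa = 10 ** (log_value % 1.0)  # lies in [1, 10)
        digit = min(int(mantissa), 9)       # guard against rounding up to 10
        counts[digit] += 1
        log_value += log_step
    return {d: counts[d] / steps for d in range(1, 10)}


shares = leading_digit_shares()
for d in range(1, 10):
    expected = math.log10(d + 1) - math.log10(d)
    print(f"{d}: simulated {shares[d]:.3f}, Benford {expected:.3f}")
```

Under these assumptions the simulated shares should land close to the values from the formula, since the time spent at digit d is proportional to the logarithmic distance from d to d+1.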