Crash courses and sparks for data scientists, analysts, and students on statistical mathematics and related data science domains such as machine learning, with easy-to-follow lessons and insights.


In many real-world datasets, especially those whose values span several orders of magnitude, the first digit of a value is 1 far more often than any other digit (about 30.1% of the time), while 9 is the least likely first digit (about 4.6%). This pattern is known as Benford’s law: the probability that the first digit is $d$, for $d = 1, \dots, 9$, is

$$ P(d)=\log_{10}(d+1)-\log_{10}(d) $$
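As a quick check of the formula, here is a minimal Python sketch that evaluates it for each first digit. The function name `benford_probability` is just an illustrative choice, not part of any library.

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the first digit equals d (1..9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(d + 1) - math.log10(d)

# Probability of each possible first digit.
probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities add up to 1, with the digit 1 at roughly 0.301 and the digit 9 at roughly 0.046.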

The intuition behind this observation is that a growing value spends a long time with 1 as its first digit, and progressively less time with each subsequent digit, with 9 being the shortest. For example, to grow from 100 to 200, a value has to double, which is 100% growth. But once it reaches 900, the transition to 1000 is quick, because it needs only about 11% growth.

In other words, at a steady percentage growth rate, a value passes slowly through the first digit 1 and more quickly through each subsequent digit, fastest of all through 9.
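The growth argument can be tried out with a small simulation. The sketch below assumes a value that grows by a fixed percentage at every step (a hypothetical 1% here), counts how many steps it spends on each first digit, and compares those shares with the formula. The names `leading_digit` and `simulate_growth`, and the chosen rate and range, are illustrative assumptions rather than a standard method.

```python
import math

def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])

def simulate_growth(rate: float = 0.01, start: float = 1.0, decades: int = 6) -> dict:
    """Share of steps spent on each first digit while growing by `rate` per step."""
    counts = {d: 0 for d in range(1, 10)}
    value = start
    limit = start * 10 ** decades  # stop after a whole number of powers of ten
    while value < limit:
        counts[leading_digit(value)] += 1
        value *= 1 + rate
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}

shares = simulate_growth()

for d in range(1, 10):
    expected = math.log10(d + 1) - math.log10(d)
    print(f"{d}: simulated {shares[d]:.3f}, Benford {expected:.3f}")
```

Under these assumptions the simulated shares should land close to the Benford probabilities, since the time spent on each digit is proportional to the logarithm of the growth needed to leave it.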