Benford’s Law


Definition: Benford’s Law



Full Definition of Benford’s Law


Benford’s law, also called the first-digit law, states that in lists of numbers from many real-life sources of data, the leading digit is 1 almost one-third of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point that 9 is the first digit less than one time in twenty. This is based on the observation that real-world measurements are generally distributed logarithmically, so the logarithm of a set of real-world measurements is generally distributed uniformly.

This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). The result holds regardless of the base in which the numbers are expressed, although the exact proportions of course change.

It is named after physicist Frank Benford, who stated it in 1938, although it had been previously stated by Simon Newcomb in 1881 in his paper “Note on the Frequency of Use of the Different Digits in Natural Numbers”. The first rigorous formulation and proof appears to be due to Theodore P. Hill in 1988.

Mathematical Statement

More precisely, Benford’s law states that the leading digit d (d ∈ {1, …, b − 1}) in base b (b ≥ 2) occurs with probability log_b(d + 1) − log_b(d) = log_b((d + 1)/d). This quantity is exactly the width of the interval between d and d + 1 on a logarithmic scale, and the probabilities sum to log_b(b) = 1.

In base 10, the leading digits have the following distribution by Benford’s law, where d is the leading digit and p the probability:

d p
1 30.1%
2 17.6%
3 12.5%
4 9.7%
5 7.9%
6 6.7%
7 5.8%
8 5.1%
9 4.6%
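
The table can be reproduced directly from the formula. The following is a minimal Python sketch; the function name `benford_probability` is chosen here for illustration and does not come from any library.

```python
import math

def benford_probability(d, base=10):
    """Benford probability that the leading digit in the given base equals d."""
    if not 1 <= d <= base - 1:
        raise ValueError("d must be a nonzero digit in the given base")
    return math.log(1 + 1 / d, base)

# Print the base-10 table as percentages, one digit per line.
for d in range(1, 10):
    print(d, f"{100 * benford_probability(d):.1f}%")
```

Passing a different `base` gives the corresponding proportions for other number systems.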

One can also formulate a law for the first two digits: the probability that the first two-digit block is equal to n (n = 10, …, 99) is log10(n + 1) − log10(n), and similarly for three-digit blocks without leading zeros and for longer blocks. In general, Benford’s law for the first p digits in base b has the same form as the single-digit law, with n ranging over b^(p−1), …, b^p − 1.
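
As a quick check of the two-digit version, the sketch below uses base-10 logarithms, log10(n + 1) − log10(n), so that the 90 probabilities sum to 1. The function name `first_two_digits_probability` is an illustrative choice, not an established API.

```python
import math

def first_two_digits_probability(n):
    """Benford probability that the first two decimal digits form the number n."""
    if not 10 <= n <= 99:
        raise ValueError("n must be between 10 and 99")
    return math.log10(1 + 1 / n)

# The 90 two-digit probabilities should sum to 1.
total = sum(first_two_digits_probability(n) for n in range(10, 100))

# Summing over 10..19 should recover the single-digit probability of a leading 1.
leading_one = sum(first_two_digits_probability(n) for n in range(10, 20))

print(round(total, 6), round(leading_one, 4))
```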

The law describes a distribution on the mantissas of the data: any positive number can be written as a power of 10 multiplied by a mantissa m, with 1 ≤ m < 10. The mantissas must follow a distribution with density proportional to 1/x to yield Benford’s law. This is sometimes misread as requiring the underlying data themselves to follow a 1/x distribution (which is not normalizable), but the result only requires the mantissa distribution, on the bounded interval from 1 to 10, to do so.

That this distribution is unsurprising can be seen by taking logarithms of the data: reducing the data to their mantissas corresponds to looking at the fractional part of the base-10 logarithm of the data, which is a distribution on the interval from 0 to 1. Reducing a broad distribution to its fractional part tends to yield a roughly uniform distribution, since the tails of the distribution are wrapped onto the unit interval, where the opposite slopes of the left and right tails tend to cancel one another out. A roughly uniform distribution on the fractional part of the logarithm corresponds directly to a roughly 1/x distribution on the mantissas of the original data. This holds regardless of whether the data are as likely to lie between 1 and 10 as to lie between 100 and 1000; all that matters is that the data are spread broadly enough that there is some value for which the proportion of the data less than that value, and the proportion of the data more than 10 times that value, are both significant.

Explanation

The law can be explained by the fact that, if the first digits do have a particular distribution, that distribution must be independent of the measuring units used. For example, if one converts from feet to yards (multiplication by a constant), the distribution must be unchanged: it is scale-invariant, and the only distribution that fits this is one whose logarithm is uniformly distributed.

To make this concrete, the first (non-zero) digit of the lengths or distances of objects should have the same distribution whether the unit of measurement is feet, yards, or anything else. There are three feet in a yard, so the probability that the first digit of a length in yards is 1 must be the same as the probability that the first digit of the same length in feet is 3, 4, or 5. Applying this to all possible measurement scales gives a logarithmic distribution, and combining it with the fact that log10(1) = 0 and log10(10) = 1 gives Benford’s law. In other words, the only distribution of first digits that stays the same under every change of units is the one given by Benford’s law.

As a concrete example, let X be a random variable whose probability of being equal to any positive integer x is proportional to x^(−s), where s > 1. That is,

P(X=x)\propto x^{-s}\qquad s>1.

The constant of proportionality must then be 1/ζ(s), where ζ is the Riemann zeta function (see zeta distribution). The probability that the first digit of X is n approaches log10(n + 1) − log10(n) as s approaches 1.
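
Returning to the feet-and-yards argument above, the claim can be checked against the law itself. A length whose value in yards begins with 1 lies between 1 and 2 yards (times a power of 10), that is, between 3 and 6 feet, so its value in feet begins with 3, 4, or 5. Benford’s law assigns both events the same probability:

```latex
\begin{align*}
P(\text{first digit in yards} = 1) &= \log_{10} 2 \approx 0.301, \\
P(\text{first digit in feet} \in \{3, 4, 5\})
  &= \log_{10}\tfrac{4}{3} + \log_{10}\tfrac{5}{4} + \log_{10}\tfrac{6}{5}
   = \log_{10}\tfrac{6}{3} = \log_{10} 2 .
\end{align*}
```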

The precise form of Benford’s law can be explained if one assumes that the logarithms of the numbers are uniformly distributed; this means that a number is for instance just as likely to be between 100 and 1000 (logarithm between 2 and 3) as it is between 10,000 and 100,000 (logarithm between 4 and 5). For many sets of numbers, especially ones that grow exponentially such as incomes and stock prices, this is a reasonable assumption.
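
The uniform-logarithm assumption can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size, the range of six decades, the fixed seed, and the helper names `leading_digit` and `simulate_log_uniform` are all choices made here for illustration.

```python
import math
import random
from collections import Counter

def leading_digit(x):
    """First nonzero digit of a positive float, read from its repr."""
    return next(int(ch) for ch in repr(x) if ch in "123456789")

def simulate_log_uniform(n_samples=100_000, decades=6, seed=0):
    """Sample numbers whose base-10 logarithm is uniform over `decades` decades."""
    rng = random.Random(seed)
    counts = Counter(
        leading_digit(10 ** rng.uniform(0, decades)) for _ in range(n_samples)
    )
    return {d: counts[d] / n_samples for d in range(1, 10)}

# Compare observed first-digit frequencies with the Benford proportions.
observed = simulate_log_uniform()
for d in range(1, 10):
    print(d, f"{observed[d]:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```

The observed frequencies should land close to the Benford proportions, with small differences due to sampling noise.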

A simple example may help clarify how this works. To say that a quantity is “growing exponentially” is just another way of saying that its doubling time is constant. If the quantity takes a year to double, then after one more year, it has doubled again. Thus it will be four times its original value at the end of the second year, eight times its original value at the end of the third year, and so on. Suppose we start the timer when a quantity that is doubling every year has reached the value of 100. Its value will have a leading digit of 1 for the entire first year. During the second year, its value will have a leading digit of 2 for a little over seven months, and 3 for the remaining five. During the third year, the leading digit will pass through 4, 5, 6, and 7, spending less and less time with each succeeding digit. Fairly early in the fourth year, the leading digits will pass through 8 and 9. Then the quantity’s value will have reached 1000, and the process starts again. From this example, it’s easy to see that if you sampled the quantity’s value at random times throughout those years, you’re most likely to have measured it when the value of its leading digit was 1, and successively less likely to have measured it when the value was moving through increasingly higher leading digits.
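
The timings in this example can be worked out explicitly. With a doubling time of one year, growing from d × 100 to (d + 1) × 100 takes log2((d + 1)/d) years. The sketch below (the names are illustrative) computes the time spent at each leading digit and the share of the full 100-to-1000 cycle it represents.

```python
import math

DOUBLING_TIME_MONTHS = 12  # the quantity doubles once a year, as in the example

def months_with_leading_digit(d):
    """Months spent between d*100 and (d+1)*100 while growing from 100 to 1000."""
    return DOUBLING_TIME_MONTHS * math.log2((d + 1) / d)

# Total time for one full cycle from 100 to 1000.
total_months = sum(months_with_leading_digit(d) for d in range(1, 10))

for d in range(1, 10):
    months = months_with_leading_digit(d)
    print(d, f"{months:.1f} months", f"{months / total_months:.1%} of the cycle")
```

The share of the cycle spent at digit d works out to log10((d + 1)/d), which is the Benford proportion, whatever the doubling time.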

This example makes it plausible that data tables that involve measurements of exponentially growing quantities will agree with Benford’s Law. But the Law also appears to hold for many cases where an exponential growth pattern is not obvious.

Note that for numbers drawn from many distributions, for example IQ scores, human heights or other variables following normal distributions, the law is not valid. However, if one “mixes” numbers from those distributions, for example by taking numbers from newspaper articles, Benford’s law reappears. This can be proven mathematically: if one repeatedly “randomly” chooses a probability distribution and then randomly chooses a number according to that distribution, the resulting list of numbers will obey Benford’s law (Hill, 1998).

Applications And Limitations

In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford’s law ought to show up any anomalous results.[2] Following this idea, Nigrini showed that Benford’s law could be used as an indicator of accounting and expenses fraud.
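
One simple way to carry out such a comparison is a Pearson chi-square statistic on the first-digit counts. The sketch below is an illustration, not the specific procedure used by Varian or Nigrini; the function names and both data sets are hypothetical, and the threshold 15.507 is the standard 5% critical value for 8 degrees of freedom.

```python
import math
from collections import Counter

def leading_digit(n):
    """First digit of a nonzero integer."""
    return int(str(abs(n))[0])

def benford_chi_square(values):
    """Pearson chi-square statistic of observed first digits against Benford's law."""
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

CRITICAL_VALUE_5_PERCENT = 15.507  # chi-square, 8 degrees of freedom

# Hypothetical data for illustration only.
growth_like = [2 ** k for k in range(1, 501)]  # exponentially growing values
evenly_spread = list(range(100, 1000))         # every first digit equally often

for name, data in [("growth_like", growth_like), ("evenly_spread", evenly_spread)]:
    statistic = benford_chi_square(data)
    print(name, round(statistic, 2), statistic > CRITICAL_VALUE_5_PERCENT)
```

A statistic above the threshold only flags a data set for closer inspection; it is an indicator, not proof that figures were made up.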

Limitations

Care must be taken with these applications, however. A set of real-life data may not obey the law, depending on the extent to which the distribution of the numbers it contains is skewed by the category of data.

For instance, one might expect a list of numbers representing ‘populations of UK villages beginning with A’ or ‘small insurance claims’ to obey Benford’s law. But if it turns out that the definition of a ‘village’ is ‘settlement with population between 300 and 999’, or that the definition of a ‘small insurance claim’ is ‘claim between $50 and $100’, then Benford’s law would not apply because certain numbers have been excluded by the definition.

History

The discovery of this fact goes back to 1881, when the American astronomer Simon Newcomb noticed that in logarithm books (used at that time to perform calculations), the earlier pages (which contained numbers that started with 1) were much more worn than the other pages. It has been argued that any book that is used from the beginning would show more wear and tear on the earlier pages, but also that Newcomb would have been referring to dirt on the pages themselves (rather than the edges) where people ran their fingers down the lists of digits to find the closest number to the one they required.

However, logarithm books contained more than one list, with both logarithms and antilogarithms present, and sometimes many other tables as well, including exponentials, roots, sines, cosines, tangents, secants and cosecants, so this story may be apocryphal. Even so, Newcomb’s published result is the first known instance of this observation, and it also includes a distribution for the second digit. Newcomb proposed that the probability of a digit N being the first digit of a number is log10(N + 1) − log10(N).

The phenomenon was rediscovered in 1938 by the physicist Frank Benford, who checked it on a wide variety of data sets and was credited for it. In 1996, Ted Hill proved the result about mixed distributions mentioned above.

Popular Culture

Benford’s law was used as a plot device on CBS’s TV series NUMB3RS in the episode “The Running Man”.


Cite Term


To help you cite our definitions in your bibliography, here is the proper citation layout for the three major formatting styles, with all of the relevant information filled in.

Page URL
https://payrollheaven.com/define/benfords-law/
Modern Language Association (MLA):
Benford’s Law. PayrollHeaven.com. Payroll & Accounting Heaven Ltd. December 04, 2020 https://payrollheaven.com/define/benfords-law/.
Chicago Manual of Style (CMS):
Benford’s Law. PayrollHeaven.com. Payroll & Accounting Heaven Ltd. https://payrollheaven.com/define/benfords-law/ (accessed: December 04, 2020).
American Psychological Association (APA):
Benford’s Law. PayrollHeaven.com. Retrieved December 04, 2020, from PayrollHeaven.com website: https://payrollheaven.com/define/benfords-law/

Definition Sources


Definitions for Benford’s Law are sourced/syndicated and enhanced from:

  • A Dictionary of Economics (Oxford Quick Reference)
  • Oxford Dictionary Of Accounting
  • Oxford Dictionary Of Business & Management

This glossary post was last updated: 18th April, 2020.