Benford's Law Course Share 3 (2022-04-16)

This is a case study from the second lecture of the general elective course "Calculation Methods in Economic Research".

Benford's law is an unusual statistical law with a long history. Although it has not been proved in full generality, it has important applications, the most direct being to help detect data fraud in many fields.

(1) Benford's Law

Benford's law is also known as the first-digit law. It is a regularity of numerical statistics that applies to many naturally occurring variables: as long as the sample is large enough, the proportion of samples whose first digit is each of 1 to 9 is stable (see figure). Samples starting with 1 make up about 0.3 of the sample space, samples starting with 2 about 0.17 to 0.19, and samples starting with 8 or 9 only about 0.05 each.

Intuitively, among the countless numbers in the world, a number should be about equally likely to start with any digit from 1 to 9. But if you count enough data, you will be surprised to find that numbers starting with 1 are the most common.

In 1935, the American engineer Frank Benford (1883–1948) noticed that the first pages of a book of logarithm tables were dirtier than the last pages, which suggested that people consulted the first pages more often. Further study showed that, given enough data, numbers beginning with 1 make up not 1/9 of the values but 30.1%. Numbers beginning with 2 make up 17.6%, and the share keeps falling for each larger digit, down to only 4.6% for 9.
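The percentages above can be reproduced with the standard first-digit formula $P(d) = \log_{10}(1 + 1/d)$. The sketch below simply evaluates it for each digit; `benford_probability` is a hypothetical helper name, not something from the course.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of values whose leading digit is d (d = 1..9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

if __name__ == "__main__":
    for d in range(1, 10):
        print(f"{d}: {benford_probability(d):.1%}")
```

It prints 30.1% for 1, 17.6% for 2 and 4.6% for 9, matching the figures quoted above.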

Benford then examined other numbers and found that many completely different kinds of data follow this rule. For example, about one-third of street addresses begin with 1. The same holds in many unrelated areas: the historical values of the Dow Jones index, the sizes of files stored on a personal computer, the lengths of the world's major rivers, the numbers printed on the front pages of newspapers, and many other things all fit the pattern.

In 1961, an American scientist proposed that Benford's law is a consequence of the way numbers accumulate, and that it holds regardless of the unit of measurement. For example, if a stock index starts at 1000 and rises 10% a year, it takes more than 7 years to rise from 1000 to 2000, but only a little over 4 years to rise from 2000 to 3000; rising from 10000 to 20000 again takes more than 7 years. So over time the index spends much longer at values beginning with 1 than at values beginning with any other digit.
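The index arithmetic can be checked directly: under compound growth at rate r, going from a to b takes ln(b/a) / ln(1 + r) years. This sketch assumes exactly 10% annual growth and allows fractional years, a simplification; `years_to_grow` is a hypothetical helper.

```python
import math

def years_to_grow(start: float, end: float, annual_rate: float) -> float:
    """Years needed to go from start to end under compound growth at annual_rate."""
    return math.log(end / start) / math.log(1 + annual_rate)

if __name__ == "__main__":
    for start, end in [(1000, 2000), (2000, 3000), (10000, 20000)]:
        print(f"{start} -> {end}: {years_to_grow(start, end, 0.10):.2f} years")
    # Fraction of one full decade (1000 -> 10000) spent at values starting with 1.
    decade = years_to_grow(1000, 10000, 0.10)
    print(f"share of the decade starting with 1: {years_to_grow(1000, 2000, 0.10) / decade:.3f}")
```

This gives 7.27, 4.25 and 7.27 years, and the share of time spent at values beginning with 1 is 0.301, which is exactly lg 2, the Benford probability for a leading 1.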

(2) Benford

Benford was an American electrical engineer and physicist who worked in the General Electric laboratory for many years until he retired. In his fifties he became fascinated by a question about numbers, and the result of that work is what we now call "Benford's law".

In fact, Benford was not the first to discover Benford's law; that was the American astronomer Simon Newcomb (1835–1909). In 1877 Newcomb became head of the US Nautical Almanac Office and organized his colleagues to recompute all the major astronomical constants. Logarithm tables were used constantly in these lengthy astronomical calculations, and since there was no Internet or Alibaba Cloud at the time, the tables were printed as books and kept in libraries. The observant Newcomb noticed something strange: the pages for numbers beginning with 1 were far more worn than the other pages, which seemed to show that numbers with a leading 1 came up more often in calculations. He published an article in 1881 describing and analyzing the phenomenon, but it went largely unnoticed for decades.

Strangely enough, scientific laws are sometimes discovered from trivial, barely noticeable phenomena, such as Benford's observation that many numbers begin with 1. Was this a law? He found that the pattern appeared not only in logarithm tables but also in other kinds of data, and he gathered a large amount of data to confirm it.

Benford looked into the question more deeply than Newcomb. He examined other numbers and found the "first-digit law" in completely different kinds of data, such as populations, death rates, physical and chemical constants, baseball statistics, half-lives of radioactive isotopes, the answers in physics textbooks, prime numbers and Fibonacci numbers. In other words, data produced by a system of measurement tend to follow the law. By contrast, data that are arbitrarily assigned or restricted usually do not follow it: lottery numbers, telephone numbers, gasoline prices, dates, and the weights or heights of a group of people are all fairly random or arbitrarily specified rather than produced by a measurement system.
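Fibonacci numbers, one of the examples above, make an easy check because Python integers are exact at any size. The sketch below counts the leading digits of the first 1000 Fibonacci numbers and prints them next to the Benford values; `fibonacci` and `leading_digit` are hypothetical helper names, and the sample size of 1000 is an arbitrary choice.

```python
import math
from collections import Counter

def fibonacci(n: int) -> list[int]:
    """First n Fibonacci numbers: 1, 1, 2, 3, 5, ..."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

def leading_digit(n: int) -> int:
    """Leading decimal digit of a positive integer, read exactly from its string form."""
    return int(str(n)[0])

if __name__ == "__main__":
    n = 1000
    counts = Counter(leading_digit(f) for f in fibonacci(n))
    for d in range(1, 10):
        observed = counts[d] / n
        expected = math.log10(1 + 1 / d)
        print(f"{d}: observed {observed:.3f}, Benford {expected:.3f}")
```

The observed shares should land close to the Benford column, with 1 leading roughly 30% of the time.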

Newcomb discovered the rule more than 50 years before Benford, but Benford was clearly the more attentive of the two; otherwise it would be called Newcomb's law.

(3) Is Benford's Law Reliable?

The first-digit law gives the frequency of each leading digit $d = 1, \dots, 9$ as $P(d) = \log_{10}(1 + 1/d)$. Analysis shows that naturally accumulated data produced by a system of measurement units follow the law, while arbitrarily assigned or restricted data usually do not. But people's heights and weights do not fit the law either; how can that be explained? Although the law has been applied in many areas, the phenomenon is still not fully understood.
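As a quick sanity check on the formula, the nine probabilities add up to 1: writing each term as a difference of logarithms, $\lg(d+1) - \lg d$, makes the sum telescope.

```latex
\sum_{d=1}^{9} \log_{10}\!\left(1 + \frac{1}{d}\right)
  = \sum_{d=1}^{9} \left[\log_{10}(d+1) - \log_{10} d\right]
  = \log_{10} 10 - \log_{10} 1
  = 1
```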

Then there is the question of how to prove the law mathematically, and so far there is no satisfactory answer. This is the biggest open problem, and it is also why the famous Benford's law, the first-digit law, has not yet made its way into mathematics or statistics textbooks.

There is more than one proof of the law, but none of them is fully general. The one below is rigorous, but it clearly depends on extra assumptions.

The proof is as follows. Suppose we have a large sample space of random variables $x_1, x_2, \dots, x_n$, where $n$ is large enough, and suppose the evolution of $x_1, x_2, \dots, x_n$ over time can be modeled by the exponential equation $x(t) = x(0)\,e^{kt}$.

Taking base-10 logarithms of both sides of this exponential solution gives $\lg x(t) = \lg x(0) + (k \lg e)\,t$, so $\lg x(t)$ grows linearly with time $t$.

To find the probability that the variable $x$ lies between 80 and 90, we only need to solve $x(t_1) = 80$ for $t_1$ and $x(t_2) = 90$ for $t_2$. The ratio $(t_2 - t_1)/T$ of this interval to the total time $T$ is then the probability that $x$ is between 80 and 90, taking $x$ to be observed at a uniformly random moment within $T$.
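A worked version of this ratio, assuming the exponential model $x(t) = x(0)\,e^{kt}$ (the solution of the differential equation described in the note at the end of this proof) and, as an illustrative assumption not in the original, an observation window in which $x$ runs over exactly one decade, from $x(0) = 10$ to 100:

```latex
t(x) = \frac{1}{k}\ln\frac{x}{10}, \qquad
t_2 - t_1 = \frac{1}{k}\ln\frac{90}{80}, \qquad
T = \frac{1}{k}\ln\frac{100}{10}
\;\Longrightarrow\;
\frac{t_2 - t_1}{T} = \frac{\ln(9/8)}{\ln 10} = \lg 9 - \lg 8 \approx 0.051
```

The growth rate $k$ cancels, and the result equals $\lg(1 + 1/8)$, the first-digit probability for 8 given by the formula in section (3).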

What about the probability that the first digit is 8? Thanks to an idea from duanx and zhuww, we only need to look at how long the fractional part of $\lg x$ stays between $\lg 8$ and $\lg 9$.

This is because the integer part of the base-10 logarithm $\lg x$ determines how many digits $x$ has (an integer part of 1 means two digits; an integer part of 2 means three digits), while the fractional part of $\lg x$ determines what the digits of $x$ are.
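The role of the two parts of $\lg x$ can be shown in a few lines. This is a minimal sketch for $x \ge 1$; `digits_from_log` is a hypothetical name, and floating-point rounding can misreport values extremely close to a power of 10.

```python
import math

def digits_from_log(x: float) -> tuple[int, int]:
    """Return (number of digits in the integer part, leading digit) of x >= 1, read off lg x."""
    if x < 1:
        raise ValueError("this sketch assumes x >= 1")
    lg = math.log10(x)
    integer_part = math.floor(lg)
    fractional_part = lg - integer_part
    num_digits = integer_part + 1          # integer part 1 -> 2 digits, 2 -> 3 digits
    leading = int(10 ** fractional_part)   # 10 ** fractional_part lies in [1, 10)
    return num_digits, leading
```

For example, `digits_from_log(8345)` returns `(4, 8)`: $\lg 8345 \approx 3.921$, the integer part 3 gives four digits, and $10^{0.921} \approx 8.345$ gives the leading 8.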

If you plot the fractional part of $\lg x$ against time $t$, it amounts to folding the graph of $\lg x$ into the interval $[\lg 1, \lg 10) = [0, 1)$. We then no longer need to care how large $t$ is, because the time axis is folded as well. Since $\lg x$ increases linearly in $t$, the fractional part spends equal time in subintervals of equal length, so the probability that the first digit is $d$ is $P(d) = \frac{\lg(d+1) - \lg d}{\lg 10 - \lg 1} = \lg(d+1) - \lg d$.
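The folding argument can also be checked numerically. This sketch samples $x(t) = x_0 e^{kt}$ at uniformly random times spanning exactly 50 decades of growth, reads the leading digit off the fractional part of $\lg x$, and compares the shares with $\lg(d+1) - \lg d$. The growth rate $k = 0.1$, starting value $x_0 = 1$, sample count and seed are arbitrary illustrative choices, not from the article, and `simulate_first_digits` is a hypothetical helper. Working with $\lg x$ directly avoids overflowing `exp` for large $t$.

```python
import math
import random
from collections import Counter

def simulate_first_digits(k: float, x0: float, total_time: float,
                          samples: int, seed: int = 0) -> Counter:
    """Sample x(t) = x0 * exp(k t) at uniform random times in [0, total_time]
    and count leading digits using the fractional part of lg x."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        t = rng.uniform(0, total_time)
        lg_x = math.log10(x0) + k * t / math.log(10)   # lg x(t) = lg x0 + (k lg e) t
        frac = lg_x - math.floor(lg_x)                 # "fold" lg x into [0, 1)
        counts[min(int(10 ** frac), 9)] += 1           # guard against rounding up to 10
    return counts

if __name__ == "__main__":
    n, k = 100_000, 0.1
    counts = simulate_first_digits(k=k, x0=1.0, total_time=50 * math.log(10) / k, samples=n)
    for d in range(1, 10):
        print(f"{d}: simulated {counts[d] / n:.3f}, "
              f"lg(d+1) - lg d = {math.log10(d + 1) - math.log10(d):.3f}")
```

Choosing a whole number of decades keeps the folded time uniform on $[0, 1)$; over a very long window the effect of a partial last decade becomes negligible anyway.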

Note: the exponential equation above is the solution of the differential equation $\frac{dx}{dt} = k\,x(t)$. Its physical meaning is that the change in $x(t)$ per unit time is proportional to the value of $x(t)$ at time $t$, with a constant proportionality coefficient $k$.

In the real world, many evolutionary processes can be approximated by this equation, especially in the early stage before the process reaches saturation. Wikipedia lists many such examples, such as exponential decay, exponential growth, and special cases of rate equations in chemistry.

(4) Applications of Benford's Law

However Benford's law is interpreted, it is an objective and useful regularity. Because most financial data satisfy Benford's law, it can be used to check financial data for fraud.

The largest investment fraud case in Washington State at the time was uncovered this way. Kevin Lawrence, the mastermind, and his associates raised large sums from more than 5,000 investors in the name of building a chain of high-tech fitness clubs. They then diverted the money for their own enjoyment, buying luxury houses, luxury cars and jewelry. To cover up their illegal activities, they frequently moved funds between offshore companies and banks and fabricated accounts to give investors the illusion that the business was booming. Fortunately, an accountant, Darrell Dorrell, sensed that something was wrong. He collected more than 70,000 figures from checks and wire transfers, compared the frequencies of their first digits with Benford's law, and found that the data failed the first-digit test. After three years of investigation, the scam was finally exposed, and in 2002 Lawrence was sentenced to 20 years in prison.
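The article does not say exactly how Dorrell compared the frequencies. A common choice for this kind of first-digit test is Pearson's chi-square goodness-of-fit statistic, and the sketch below assumes that approach; `first_digit`, `benford_chi_square` and the sample datasets are hypothetical. With 8 degrees of freedom, the 5% critical value is about 15.51, so a statistic well above it suggests the first digits depart from Benford's law.

```python
import math
from collections import Counter
from decimal import Decimal

BENFORD_CRITICAL_5PCT = 15.507  # chi-square critical value, 8 degrees of freedom, 5% level

def first_digit(x) -> int:
    """Leading nonzero digit of a nonzero number, e.g. 0.0457 -> 4, 1234.5 -> 1."""
    # Decimal(x) holds the exact value of an int or float, so no rounding can shift the digit.
    return Decimal(abs(x)).as_tuple().digits[0]

def benford_chi_square(values) -> float:
    """Pearson chi-square statistic comparing observed first digits with Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    benford_like = [2 ** i for i in range(1, 1001)]   # powers of 2 follow Benford's law
    uniform_like = range(1, 10000)                    # every first digit appears 1111 times
    for name, data in [("powers of 2", benford_like), ("1..9999", uniform_like)]:
        stat = benford_chi_square(data)
        verdict = "fails" if stat > BENFORD_CRITICAL_5PCT else "passes"
        print(f"{name}: chi-square = {stat:.1f} ({verdict} at the 5% level)")
```

The powers of 2 pass easily, while the integers 1 to 9999, whose first digits are evenly spread, fail by a wide margin. Real check and remittance amounts would be fed in the same way.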

In 2001, Enron, then the largest energy trader in the United States, declared bankruptcy amid rumors that its senior managers had falsified the accounts. Enron executives are said to have altered the company's financial data, so that its published earnings per share for 2001–2002 do not conform to Benford's law. In 2001 the company, ranked seventh on the Fortune 500, admitted accounting fraud to the US Securities and Exchange Commission. The Enron affair raised public concern about accounting fraud and led directly to the Sarbanes-Oxley Act of July 2002.

The IRS also uses Benford's law to check tax returns for evasion. Someone reportedly used it to check ten years of former US President Bill Clinton's tax returns and found no anomalies.

Benford's law has also been used to analyze the stock market and to detect fraud in elections.

Clearly, Benford's law is a powerful weapon against data fraud. Still, its conditions of use must be kept in mind:

1. The data must not be periodic or systematically ordered;

2. The data must not be assigned manually;

3. The dataset must be large enough. Some say more than 3,000 values are needed, though I am not sure what this is based on;

4. It does not always hold, which remains an unsolved puzzle;

5. Accuracy is also a matter of choosing a standard, because the method is closer in spirit to a Monte Carlo algorithm.