Since the mid-1960s, CPU performance has roughly doubled every 18 months. Since 1965 it has doubled about 34 times, which means operation speed has increased by roughly 16 billion times. If this progress is decomposed, it falls into two dimensions: one is the increase in clock frequency, and the other is the increase in the density of integrated-circuit chips (known as the level of integration).
After the clock frequency was raised, a processor that originally ran about 1 million cycles per second can now run about 3 billion cycles per second, thousands of times faster. After the density was increased, a calculation that originally took several cycles to complete can now be pipelined, with the work divided among cooperating units, so that several calculations finish in a single cycle. The two speedups compound, which is why computers are so fast today.
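To see how these two effects compound, here is a rough back-of-the-envelope calculation in Python; the start year, doubling period, and clock frequencies are illustrative assumptions rather than exact historical figures.

```python
# Rough, illustrative arithmetic for the compounding of Moore's law.
# The concrete figures (1965 start, 18-month doubling, ~1 MHz -> ~3 GHz)
# are assumptions for the sake of the example, not precise history.

years = 2016 - 1965
doublings = years / 1.5                 # one doubling every 18 months
total_speedup = 2 ** doublings          # ~2^34, on the order of ten billion times

clock_speedup = 3_000_000_000 / 1_000_000       # ~1 MHz -> ~3 GHz: thousands of times
per_cycle_speedup = total_speedup / clock_speedup  # the rest comes from density/parallelism

print(f"doublings: {doublings:.0f}")
print(f"total speedup: {total_speedup:.2e}")
print(f"from clock frequency: {clock_speedup:.0f}x")
print(f"from work done per cycle: {per_cycle_speedup:.2e}x")
```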
But the clock frequency of a CPU cannot be raised indefinitely, because the speed of light is the absolute bottleneck. Today, the speed at which signals travel inside a CPU is already close to the limit set by the speed of light, and there is almost no room for further improvement. In fact, this path was blocked about ten years ago.
So what should be done? Intel's approach is to further increase the level of integration of the CPU. The earliest integrated circuits held only a few thousand transistors; today a chip holds as many as 6 billion, so calculations can be carried out in parallel. This is why our computer and mobile-phone CPUs come in so-called quad-core and octa-core designs.
So why not make 16, 32, or even 100 cores? Because with current technology such a multi-core processor would be enormous, and the heat-dissipation problem alone could not be solved. What to do about this? Intel's general attitude is that it cannot be solved: buy more processors from us, build more servers, and make your computing center bigger. Leaving aside that this approach is inefficient, on many occasions it simply isn't an option. In a driverless car, for example, you cannot take a server cabinet with you on the road.
Enterprising people always try to find better solutions, and Huang Renxun (Jensen Huang), the founder of NVIDIA, is such a person. Huang and his colleagues at NVIDIA felt the CPU is not fast enough because it is designed to handle every kind of computation, so many of its transistors are spent building control circuitry.
In addition, because it must accommodate such varied and complicated computation, the design of the processor itself becomes very complicated. Within a computer, there is one relatively simple kind of computation, the graphics computation that drives the display, so NVIDIA designed a processor specifically for it, called the GPU, that is, the graphics processing unit.
Of course, before NVIDIA, Sun and SGI, which made graphics workstations, had designed similar products, but those never became widespread. The GPU has two advantages over the CPU:
First, the control circuitry is simple, so more of the transistors go to computation rather than control. With the same transistor budget that would otherwise yield four cores, there is hope of making eight, sixteen, or even more.
Second, individual calculations are turned into batch calculations. In ordinary computing, most operations involve two numbers at a time: for example, A + B = C is one operation instruction ("+") applied to two numbers ("A" and "B"), and the next instruction ("-") is applied to two other numbers ("X" and "Y"). A conventional processor is therefore designed around one instruction stream paired with one data stream (known as SISD, single instruction, single data). To make a rough analogy, this kind of computation is like squatting on the ground and picking up beans one by one.
Graphics computation, by contrast, processes a whole row of numbers (called a vector in computer science) in one go, in SIMD fashion (single instruction, multiple data): A1 + B1 = C1, A2 + B2 = C2, ..., all the way to A1000 + B1000 = C1000.
This is like vacuuming the beans off the ground: one pass along a row sucks up a whole batch, and the efficiency is much higher. On this basis, NVIDIA proposed the Compute Unified Device Architecture (CUDA), the idea that many cores all do the same thing, and designed its GPUs around it; a rough sketch of the contrast follows.
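The difference between picking up beans one by one and vacuuming a whole row can be sketched in a few lines of Python. NumPy's vectorized addition stands in for SIMD-style hardware here; this is only an illustrative analogy, not how CUDA itself is programmed.

```python
import numpy as np

a = np.random.rand(1000)
b = np.random.rand(1000)

# SISD style: one instruction operates on one pair of numbers at a time,
# like picking up beans one by one.
c_scalar = np.empty_like(a)
for i in range(len(a)):
    c_scalar[i] = a[i] + b[i]

# SIMD style: one instruction (conceptually) operates on the whole vector,
# A1+B1=C1, A2+B2=C2, ..., A1000+B1000=C1000 in one sweep.
c_vector = a + b

assert np.allclose(c_scalar, c_vector)
```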
With the GPU, many repeated, uniform calculations can run in parallel. The GPU was originally designed for graphics, but NVIDIA later found that machine-learning algorithms could be run the same way, so in 2016 NVIDIA designed GPUs tailored to the characteristics of machine learning; its latest P40 processor has as many as 3,000 of these CUDA cores.
Although each core individually is weaker than a core in Intel's quad-core processors, a GPU like the P40 has so many of them that artificial-intelligence computation runs very fast. Today, a single processor of this kind can handle all of Tesla's driver-assistance workload. In last year's match between AlphaGo and Li Shishi (Lee Sedol), 176 NVIDIA GPUs carried the main computing load.
But the vector calculations in machine learning are, after all, different from vector calculations in general. Could the computing cores be made even more specialized, so that they perform only the vector calculations required by one very specific machine-learning algorithm, namely Google's artificial neural network algorithm?
This is where Google introduced the idea of tensor computing. A tensor is originally a mathematical concept describing relationships among vectors or values. For example, two of your photos are two different vectors, and a certain similarity between them is a tensor. The algorithms of artificial neural networks can be viewed as tensor computations; as for why, there is no need to dig into it here, just remember the conclusion.
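To make that conclusion a bit more concrete, here is a minimal sketch (an assumed illustration, not Google's actual code) that treats one layer of an artificial neural network as tensor operations: the inputs, weights, and outputs are all arrays, and the layer amounts to a matrix multiplication plus a simple nonlinearity.

```python
import numpy as np

# A batch of 2 input "photos", each flattened into a vector of 4 numbers.
x = np.array([[0.1, 0.5, 0.2, 0.9],
              [0.3, 0.8, 0.4, 0.1]])   # shape (2, 4): a 2nd-order tensor

# Weights relating the 4 inputs to 3 neurons: also a tensor.
w = np.random.rand(4, 3)
b = np.zeros(3)

def layer(x, w, b):
    # One neural-network layer = one tensor operation:
    # matrix multiply, add bias, apply ReLU.
    return np.maximum(x @ w + b, 0.0)

y = layer(x, w, b)   # shape (2, 3): the layer's output tensor
print(y)
```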
Next, building on the GPUs of NVIDIA and other companies, Google narrowed the computation further and designed a processor called the TPU, used only for this specific tensor computation, where the T stands for tensor. Google claims that on a task like AlphaGo, one TPU is as efficient as 15 to 30 NVIDIA GPUs, which is why Google said this time that the new version of AlphaGo has slimmed down in hardware.
The AlphaGo that defeated Li Shishi last year consumed 300 times as much power as a human brain. Today's AlphaGo uses far fewer machines, at least an order of magnitude fewer, which means the power consumption has dropped from 300 times that of a human brain to less than 30 times. That progress is still astonishing.
NVIDIA, of course, was unconvinced, saying that Google was comparing apples and oranges and that its P40 was much faster than Google's TPU. In truth, whether the TPU or the GPU is better depends entirely on what it is being used for.
From CPU to GPU to TPU, the fundamental reason for the gain in efficiency comes down to one word: focus. By comparison, the CPUs in our phones and computers are "unfocused."
In social and economic life, the situation is very similar to that of computer processors. After the Industrial Revolution began, British factory owners divided labor very finely, and efficiency rose greatly. Adam Smith wrote in The Wealth of Nations that even in so simple a task as making sewing needles, once the division of labor is fine enough, a worker can produce thousands of needles a day, whereas a worker who performs every step himself may not finish ten needles a day. That is why, after the Industrial Revolution, British manufacturing crushed the handicraft trades of continental Europe. This is much like the relationship between the TPU and the CPU.
However, there is a precondition for using a TPU: there must be a market for at least millions of chips, otherwise it is not worth doing, because the one-off design and manufacturing costs run into the millions, even tens of millions. If the demand is only tens of thousands of chips, it is better to throw a pile of CPUs at the job. It is the same with sewing needles: Europe alone probably needed hundreds of millions of needles a year to justify that social division of labor. If only thirty or fifty needles were needed, you might as well let a few workers grind them out slowly by hand. So the precondition for division of labor and specialization is a market of sufficient scale.
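A back-of-the-envelope amortization shows why that threshold matters; the cost figure and volumes below are assumptions made up purely for illustration, not actual chip-industry numbers.

```python
# Illustrative amortization of a one-off chip design cost.
# The 30-million figure and the volumes are assumptions for the example only.
design_cost = 30_000_000   # assumed one-off design and manufacturing setup cost

for volume in (10_000, 1_000_000, 10_000_000):
    per_chip = design_cost / volume
    print(f"{volume:>10,} chips -> {per_chip:,.0f} of design cost per chip")
```

At tens of thousands of chips, each one carries thousands in design cost, which no buyer would accept; at millions of chips, the same cost shrinks to a few units per chip, and specialization pays off.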
Finally, a word about personal skills: when should one specialize, and when should one stay broad? There is no hard rule, but a good criterion is whether the market is large enough to reward deep specialization and refinement.