A Unicode code point corresponds to a character in the Unicode character table. In other words, a code point is the number assigned to a character in that table.
A Unicode code point is a number, which a computer stores in binary.
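As a small illustration of "a character's number in the table", the sketch below uses Python's built-in `ord` and `chr`. The character A is only an example choice, not something the text prescribes.

```python
# ord() returns the code point of a character as an integer.
code_point = ord("A")

print(code_point)       # 65
print(hex(code_point))  # 0x41
print(bin(code_point))  # 0b1000001

# chr() goes the other way: from the number back to the character.
print(chr(0x41))        # A
```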
To cover all the characters in the world, Unicode uses numbers that may need more than one byte (a byte is eight bits). One bit has two states, 0 and 1; one byte has 2^8 = 256 states; and n bytes have 256^n states. Each state corresponds to one binary number, so using several bytes allows more characters to be represented, which makes a larger character table possible.
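The growth in the number of states can be checked with a few lines of Python. The helper name `state_count` is hypothetical and exists only for this sketch.

```python
def state_count(num_bytes: int) -> int:
    """Number of distinct values that num_bytes bytes can hold."""
    return 256 ** num_bytes  # same as 2 ** (8 * num_bytes)

for n in (1, 2, 3, 4):
    print(n, state_count(n))
# 1 256
# 2 65536
# 3 16777216
# 4 4294967296
```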
UTF-8 is a character encoding scheme for Unicode: it takes a code point (a number) and maps it to a sequence of bytes.
Why re-encode Unicode code points? Because storing every code point directly at a fixed length (four bytes each, as UTF-32 does) causes the following problem:
Suppose the binary value 00000001 is the code of the character A. On its own, it could be stored in a single byte. However, if the length is fixed at four bytes, the code of A becomes 00000000 00000000 00000000 00000001 and takes four bytes to store, which wastes storage space.
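The storage cost can be observed directly in Python. This sketch assumes the `utf-32-be` codec as a stand-in for the fixed four-byte form described above; the text itself does not name a specific encoding. Note that the real code point of A is U+0041, not the simplified value used in the example above.

```python
text = "A"

fixed = text.encode("utf-32-be")  # four bytes per code point, no byte order mark
variable = text.encode("utf-8")   # one byte for this character

print(fixed.hex())     # 00000041
print(variable.hex())  # 41
print(len(fixed), len(variable))  # 4 1

# For plain ASCII text the fixed-length form is four times larger.
word = "Hello"
print(len(word.encode("utf-32-be")), len(word.encode("utf-8")))  # 20 5
```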
Therefore, in order to use Unicode's large character table and still save storage space, code points need to be re-encoded. UTF-8 is a variable-length character encoding scheme that is built on Unicode code points.
The UTF-8 character encoding scheme determines how Unicode code points are stored in the computer.
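To make "how code points are stored" concrete, here is a minimal hand-written sketch of the UTF-8 bit layout, compared against Python's built-in encoder. The function name `utf8_encode` and the sample characters are hypothetical choices for illustration. In practice `str.encode("utf-8")` should be used instead.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single code point following the UTF-8 bit layout."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if code_point < 0x80:      # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# Sample characters: U+0041, U+00E9, U+4E2D, U+1F600 (1 to 4 bytes each).
samples = "A\u00e9\u4e2d\U0001F600"
for ch in samples:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with the built-in codec
    print(f"U+{ord(ch):04X}", encoded.hex())
# U+0041 41
# U+00E9 c3a9
# U+4E2D e4b8ad
# U+1F600 f09f9880
```

Small code points take one byte and larger ones take up to four, which is what makes the scheme variable-length.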
The UTF-8 encoded form of a code point can also be regarded as a new binary number produced by the UTF-8 character encoding scheme. This new binary number is usually written with hexadecimal digits; the relationship is simply that the hexadecimal notation and the binary number denote the same value.
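The equivalence between the hexadecimal notation and the underlying binary number can be checked as follows. The character U+4E2D is an arbitrary example, and reading the bytes as one big-endian integer is just a convenient way to view them as a single number.

```python
encoded = "\u4e2d".encode("utf-8")       # UTF-8 bytes for U+4E2D

as_hex = encoded.hex()                   # hexadecimal notation of the bytes
as_int = int.from_bytes(encoded, "big")  # the same bytes read as one number

print(as_hex)       # e4b8ad
print(bin(as_int))  # 0b111001001011100010101101

# Both notations describe the same value.
assert int(as_hex, 16) == as_int
```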