Unicode and ASCII are character encoding standards. Standards like these enable people across the world to use their own languages on phones and computers. This article explains both standards in simple terms, without going too deep into technical details.
A computer only understands the binary number system, which consists of just two digits, 0 and 1. Everything you feed into a computer is therefore converted into binary code. Converting numbers into binary is easy. For example, the binary equivalent of the decimal number 25 is 11001.
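The conversion above can be checked in Python, the language used later in this article. The following is a minimal sketch using the built-in format() and int() functions; the variable names are chosen only for illustration.

```python
# Convert the decimal number 25 to its binary digits and back again.
number = 25
binary_text = format(number, "b")   # binary digits without the "0b" prefix
print(binary_text)                  # 11001

# int() with base 2 parses the binary digits back into a decimal number.
round_trip = int(binary_text, 2)
print(round_trip)                   # 25
```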
But what about letters and other symbols (such as emojis)? A character has no numeric value of its own, so it cannot be converted to binary directly.
To solve this problem, character encoding standards such as ASCII and Unicode are used. Each one assigns a unique number to every character it covers.
For example, the ASCII value of lowercase ‘a’ is 97. Converting ‘a’ into binary directly is not possible, but converting 97 into binary is easy: it is 1100001.
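The two-step idea (character to number, then number to binary) might be sketched in Python as follows. Padding the result to 8 digits is an assumption made here so that it lines up with one byte; it is not required by ASCII itself.

```python
# Two-step conversion: character -> number -> binary digits.
character = "a"
code = ord(character)            # the number assigned to 'a'
bits = format(code, "08b")       # padded to 8 digits, i.e. one byte
print(character, code, bits)     # a 97 01100001
```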
ASCII
ASCII covers the characters used in English: the letters a-z and A-Z, the digits 0-9, and symbols such as ?, & and !. It also includes a small set of non-printable control characters.
ASCII is a 7-bit code, so it defines 128 characters numbered 0 to 127. Each character is usually stored in an 8-bit byte, with the eighth bit left unused.
ASCII covers only the English alphabet, but many languages use different alphabets. What about them?
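As a rough check of the 7-bit claim, the sketch below looks at the printable ASCII characters listed in Python's standard string module and at the bytes produced by the "ascii" codec. The sample text "Hi!" is an arbitrary choice.

```python
import string

# Every printable ASCII character has a code below 128, so 7 bits are enough.
codes = [ord(ch) for ch in string.printable]
largest = max(codes)
print(largest, largest.bit_length())   # 126 7

# Stored as a byte, the eighth (highest) bit of an ASCII character is always 0.
encoded = "Hi!".encode("ascii")
byte_bits = [format(byte, "08b") for byte in encoded]
print(byte_bits)   # ['01001000', '01101001', '00100001']
```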
The answer is Unicode.
Unicode
Unicode is a superset of ASCII. It is a universal character encoding standard that aims to assign a unique number, called a code point, to every character and symbol in every language in the world. The first 128 code points are the same as the ASCII codes.
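One way to see that Unicode is a superset of ASCII is to compare the two numberings directly. This is a small sketch, not a formal proof: it decodes all 128 ASCII codes and checks that each character's Unicode number is the same, then shows that a non-English character has a Unicode number but no ASCII code.

```python
# For every ASCII code 0-127, the Unicode number of the character is identical.
ascii_bytes = bytes(range(128))
text = ascii_bytes.decode("ascii")
matches = all(ord(ch) == byte for ch, byte in zip(text, ascii_bytes))
print(matches)    # True

# A character outside ASCII has a Unicode number but cannot be encoded as ASCII.
try:
    "आ".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)   # False
```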
Unicode in Python
In Python, use the ord() function to find the Unicode code point of a character. ord() takes a single character and returns its code point as an integer.
    # Find the Unicode code point of a character.
    # a_unicode, b_unicode and c_unicode are variables
    # storing the code points of different characters.
    a_unicode = ord('आ')
    print('unicode of आ=', a_unicode)
    b_unicode = ord('a')
    print('unicode of a=', b_unicode)
    c_unicode = ord('A')
    print('unicode of A=', c_unicode)
When you run the program, the output will be:
    unicode of आ= 2310
    unicode of a= 97
    unicode of A= 65
Use the chr() function to do the reverse: it takes an integer code point and returns the corresponding character.
    # Find the character for a Unicode code point.
    # a_char, b_char and c_char store the resulting characters.
    a_char = chr(2310)
    print('character value of unicode 2310=', a_char)
    b_char = chr(97)
    print('character value of unicode 97=', b_char)
    c_char = chr(2500)
    print('character value of unicode 2500=', c_char)
When you run the program, the output will be:
    character value of unicode 2310= आ
    character value of unicode 97= a
    character value of unicode 2500= ৄ
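Since chr() reverses ord(), the two can be combined in a quick round-trip check. The sketch below also prints each code point in the U+ hexadecimal notation commonly used for Unicode and looks up the character's official name with the standard unicodedata module; both are additions for illustration and are not covered elsewhere in this article.

```python
import unicodedata

rows = []
for character in ["A", "a", "आ"]:
    code = ord(character)
    # chr() reverses ord(), so the round trip gives back the same character.
    assert chr(code) == character
    rows.append((character, code, f"U+{code:04X}", unicodedata.name(character)))

for row in rows:
    print(*row)
# A 65 U+0041 LATIN CAPITAL LETTER A
# a 97 U+0061 LATIN SMALL LETTER A
# आ 2310 U+0906 DEVANAGARI LETTER AA
```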