What is UTF?
UTF is an abbreviation for UCS Transformation Format, and UCS in turn stands for Universal Character Set. The Universal Character Set is kept synchronized with the Unicode Standard. There are three commonly used UTF encodings: UTF-8, UTF-16 and UTF-32. UTF-8 encodes each Unicode character as a sequence of one to four 8-bit values known as code units. Similarly, UTF-16 and UTF-32 use 16-bit and 32-bit code units, respectively. Version 13.0 of the Unicode Standard (the current version at the time of this writing) includes over 143,000 characters. The valid range of code points for Unicode characters is 0 to 10FFFF (hex). Within this range, the values from D800 to DFFF are reserved for forming surrogate pairs and are not assigned to any abstract characters. The range D800–DBFF (hex) is for High Surrogate ...
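As a rough illustration of the code unit sizes and the code point ranges described above, here is a minimal Python sketch. The helper names (`code_unit_counts`, `is_surrogate`, `is_valid_scalar`) are made up for this example; only the built-in `str.encode` and `ord` are relied on. The big-endian codec variants are used so that no byte order mark is included in the counts.

```python
# Illustrative constants taken from the ranges quoted in the text.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def code_unit_counts(ch: str) -> dict:
    """Return how many code units one character needs in each encoding."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }


def is_surrogate(code_point: int) -> bool:
    """True if the code point lies in the reserved range D800..DFFF."""
    return SURROGATE_FIRST <= code_point <= SURROGATE_LAST


def is_valid_scalar(code_point: int) -> bool:
    """True if the code point is in 0..10FFFF and is not a surrogate."""
    return 0 <= code_point <= MAX_CODE_POINT and not is_surrogate(code_point)


if __name__ == "__main__":
    # U+0041, U+00E9, U+20AC and U+1F600, written as escapes.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X}", code_unit_counts(ch))
```

A character outside the Basic Multilingual Plane, such as U+1F600, needs four code units in UTF-8, two in UTF-16 (a surrogate pair) and one in UTF-32, whereas `A` needs a single code unit in all three.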