What is Unicode SMS?
What is Unicode?
Unicode is a worldwide character-encoding standard that supports characters from scripts beyond those covered by ASCII. The Unicode Consortium, a non-profit organization, maintains, develops, and promotes the Unicode Standard.
ASCII is based on the English alphabet and defines only 128 characters, while Unicode has room for more than 1 million code points (1,114,112 in total), of which well over 100,000 are assigned to characters so far.
Unicode covers the writing systems of languages around the globe. An ASCII character fits in 7 bits, whereas the size of a Unicode character depends on the encoding form used. The two most common encoding forms are UTF-8 and UTF-16. With UTF-8, the number of bits used changes depending on the character: one to four bytes. With UTF-16, a character takes either two or four bytes.
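To make the variable-width point concrete, here is a minimal Python sketch that prints how many bytes a few sample characters occupy in UTF-8 and UTF-16. The helper name encoded_sizes and the sample characters are illustrative choices, not part of any SMS API.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given string."""
    # "utf-16-be" is used so Python does not prepend a 2-byte byte-order mark.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


samples = [
    "A",           # Latin capital A (also an ASCII character)
    "\u00e9",      # e with acute accent
    "\u0928",      # Devanagari letter NA
    "\U0001F600",  # grinning face emoji
]

for ch in samples:
    utf8_bytes, utf16_bytes = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")
```

The ASCII letter needs a single byte in UTF-8, while the emoji needs four bytes in both encodings.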
Unicode SMS Messaging
Unicode SMS refers to messages encoded using the Unicode standard. A message is sent as Unicode when it contains at least one character that is not in the default GSM character set. The GSM character set has 128 characters, including the English letters A-Z and a-z, the digits 0-9, and symbols such as !, @, and &.
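As a rough illustration of how a sender could check this before submitting a message, the sketch below lists the characters in a text that fall outside the GSM 7-bit alphabet. The character tables follow the GSM 03.38 default alphabet and its extension table; the function names non_gsm_characters and needs_unicode are hypothetical, and a real gateway may apply its own rules (for example, national language tables).

```python
# GSM 03.38 default alphabet: 128 characters, listed in code order.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension table: still GSM, but each of these is sent as two 7-bit values.
GSM_EXTENDED = "^{}\\[~]|€\f"

_GSM_ALLOWED = set(GSM_BASIC + GSM_EXTENDED)


def non_gsm_characters(text: str) -> list[str]:
    """Return the characters in text that the GSM 7-bit alphabet cannot represent."""
    return [ch for ch in text if ch not in _GSM_ALLOWED]


def needs_unicode(text: str) -> bool:
    """True if at least one character forces the message into Unicode."""
    return bool(non_gsm_characters(text))


print(needs_unicode("Hello, world!"))           # plain GSM text
print(non_gsm_characters("It\u2019s on its way"))  # contains a curly apostrophe
```

Note that some accented letters, such as é and ñ, are part of the GSM alphabet and do not trigger Unicode, while a curly apostrophe does.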
The standard SMS character limit is 160 (without concatenation). A standard GSM character takes 7 bits, so 160 of them fill the 140-byte payload of a single message (160 × 7 = 1,120 bits). A Unicode character is sent using 16 bits, more than twice as much space, so a Unicode message can contain only up to 70 characters (70 × 16 = 1,120 bits). Messages that exceed the character limit are segmented into multiple parts. Please note that inadvertently including even one Unicode character, such as a curly quote pasted from a word processor, switches the whole message to Unicode and can result in a multi-part message being sent.
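The segmentation arithmetic can be sketched in a few lines of Python. The single-message limits (160 and 70) come from the text above; the per-part limits of 153 and 67 are the commonly used figures once each part of a concatenated message carries a concatenation header, and they are an addition here rather than something stated in this article. The function names are hypothetical, and GSM extension characters (which count double) are left out for brevity.

```python
import math

# (limit for a single message, limit per part when concatenated)
GSM_LIMITS = (160, 153)
UNICODE_LIMITS = (70, 67)


def ucs2_units(text: str) -> int:
    """Count 16-bit units; characters outside the BMP (most emoji) count as two."""
    return len(text.encode("utf-16-be")) // 2


def segments_needed(units: int, unicode: bool) -> int:
    """Return how many SMS parts a message of the given length occupies."""
    single, per_part = UNICODE_LIMITS if unicode else GSM_LIMITS
    if units <= single:
        return 1
    return math.ceil(units / per_part)


plain = "Your order has shipped. " * 4   # 96 GSM characters
with_quote = plain + "\u2019"            # one curly apostrophe added

print(segments_needed(len(plain), unicode=False))
print(segments_needed(ucs2_units(with_quote), unicode=True))
```

In this example the 96-character GSM message fits in one part, while adding a single curly apostrophe pushes the same text to two parts.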
If you want to send SMS globally, Kaleyra’s API supports automatic detection of Unicode SMS. The system automatically detects the language of the message and charges you accordingly.