Showing posts from July, 2020

String Encodings? Unicode? ASCII? UTF-8? ãéîöû?

Four years into my career as a software engineer, I had managed to put off the topic of character encoding and still sleep peacefully every night, until last night. A piece of Python code I had written broke, and the error was UnicodeEncodeError. My reaction was similar to the title of this post. Last night's sleeplessness was not because my code was broken, but because I had put off something I needed every time I worked with strings.

ASCII: The time when strings used to be simple!

Just like aliens were only sighted in America, computers too were sighted only in America (or at least only in English-speaking countries). To deal with strings, only English characters were standardised as ASCII, and each one was mapped to a number between 32 and 127. The first 32 numbers (0 to 31) were mapped to control characters (number 7 made your computer beep). Since computers worked with an 8-bit word size then, characters could be represented eas