Well, until recently I had quite a hard time digesting all the details of Unicode, UCS, UTF and the other dangerous foreign words ;)
Basically, I knew that Unicode is mostly 16-bit (except when it isn't ;), but what confused me was that almost 100k characters have been registered to date... and that was just one of the things that weren't clear to me.
But the day before yesterday I read a longish blog post (really a short article) by the famous Tim Bray (XML, W3C, Textuality, ...), and everything became clear to me ;)
Jokes aside, here it is, judge for yourselves:
Quote:
Characters vs. Bytes
This is the first of a three-part essay on modern character string processing for computer programmers. Here I explain and illustrate the methods for storing Unicode characters in byte sequences in computers, and discuss their advantages and disadvantages. These methods have well-known names like UTF-8 and UTF-16 ...
http://tbray.org/ongoing/When/200x/2003/04/26/UTF
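To make the "storing Unicode characters in byte sequences" idea concrete, here is a minimal sketch in Python (my own illustration, not code from Bray's essay). The helper names `utf8_encode` and `utf16_units` are hypothetical; the sketch encodes a single code point by hand and cross-checks the result against Python's built-in codecs. It also shows the "except when it isn't" part: a character above U+FFFF takes 4 bytes in UTF-8 and a surrogate pair (two 16-bit units) in UTF-16.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    # Reject values outside the code space and the surrogate range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:  # 7 bits: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:  # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point."""
    if cp < 0x10000:  # Basic Multilingual Plane: a single 16-bit unit
        return [cp]
    v = cp - 0x10000  # remaining 20 bits are split across a surrogate pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


if __name__ == "__main__":
    # 'A', Serbian 's with caron', the euro sign, and an emoji outside the BMP.
    for ch in ["A", "\u0161", "\u20ac", "\U0001F600"]:
        cp = ord(ch)
        u8 = utf8_encode(cp)
        u16 = utf16_units(cp)
        # Cross-check against Python's built-in codecs.
        assert u8 == ch.encode("utf-8")
        assert b"".join(u.to_bytes(2, "big") for u in u16) == ch.encode("utf-16-be")
        units = " ".join(f"{u:04X}" for u in u16)
        print(f"U+{cp:04X}  UTF-8: {u8.hex(' ')}  UTF-16: {units}")
```

Running it prints one line per character; the last one (U+1F600) needs four UTF-8 bytes and two UTF-16 units, which is why "Unicode is 16-bit" only holds for characters in the first 65,536 code points.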