character encoding



(character)
(Or "character encoding scheme") A mapping of binary values to code positions and back; generally a 1:1 (bijective) mapping.

In the case of ASCII, this is generally an identity mapping, f(x) = x: code point 65 maps to the byte value 65, and vice versa. This is possible because ASCII uses only code positions representable as single bytes, i.e., values between 0 and 255 at most. (US-ASCII in fact uses only values 0 to 127.)
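The identity mapping can be checked directly. The following is a minimal Python sketch, not part of the original entry; the variable names are illustrative, and it relies only on the built-in "ascii" codec.

```python
# Sketch: for ASCII, each code point equals the byte value that encodes it.
text = "A\r"  # 'A' is code point 65, carriage return is code point 13

encoded = text.encode("ascii")          # characters -> bytes
code_points = [ord(ch) for ch in text]  # numeric code position of each character
byte_values = list(encoded)             # numeric value of each encoded byte

# The mapping is f(x) = x, so both lists are identical.
identity_holds = code_points == byte_values

# The mapping is reversible: decoding returns the original text.
decoded = encoded.decode("ascii")

# Characters outside the range 0 to 127 have no ASCII byte value.
try:
    "\u00e9".encode("ascii")  # e with acute accent, code point 233
    non_ascii_rejected = False
except UnicodeEncodeError:
    non_ascii_rejected = True
```

Under these assumptions, `identity_holds` and `non_ascii_rejected` are both true, and `decoded` equals `text`.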

Unicode and many CJK coded character sets use far more than 256 positions, requiring more complex mappings: sometimes the characters are mapped onto pairs of bytes (see DBCS). In many cases, this breaks programs that assume a one-to-one mapping of bytes to characters and so, for example, treat any occurrence of the byte value 13 as a carriage return. To avoid this problem, character encodings such as UTF-8 were devised, in which byte values in the ASCII range (0 to 127) never occur as part of a multi-byte character.
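The byte-13 problem can be reproduced with a single character. The sketch below is an illustration rather than part of the original entry: it assumes UTF-16 (big-endian) as the example of an encoding that maps characters onto pairs of bytes, and the helper `contains_cr_byte` is a hypothetical stand-in for a program that scans raw bytes for carriage returns.

```python
# Sketch: a byte-oriented scan for carriage return (byte value 13)
# misfires on a two-byte encoding but not on UTF-8.
CR = 13

def contains_cr_byte(data: bytes) -> bool:
    """Naive check that assumes one byte per character (hypothetical helper)."""
    return CR in data

ch = "\u010d"  # LATIN SMALL LETTER C WITH CARON, code point 0x010D

utf16 = ch.encode("utf-16-be")  # two bytes: 0x01 0x0D
utf8 = ch.encode("utf-8")       # two bytes: 0xC4 0x8D

false_positive = contains_cr_byte(utf16)  # second byte happens to be 13
safe_in_utf8 = not contains_cr_byte(utf8)

# Every byte of a multi-byte UTF-8 sequence is 128 or above,
# so it cannot collide with an ASCII control character.
all_high = all(b >= 0x80 for b in utf8)
```

The text contains no carriage return, yet the naive scan reports one in the UTF-16 bytes; the UTF-8 bytes pass the same scan unchanged.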