character encoding


(character)
(Or "character encoding scheme") A mapping of binary values to code positions and back; generally a 1:1 (bijective) mapping.

In the case of ASCII, this is generally an f(x) = x mapping: code point 65 maps to the byte value 65, and vice versa. This is possible because ASCII uses only code positions representable as single bytes, i.e., values between 0 and 255 at most. (US-ASCII, in fact, uses only values 0 to 127.)
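
The identity mapping can be checked directly. The following is a minimal Python sketch, not part of the original entry; the sample string and variable names are chosen only for illustration.

```python
# ASCII stores each code point as the single byte with the same value.
text = "A\r"                     # code points 65 and 13 (carriage return)
encoded = text.encode("ascii")   # a bytes object, one byte per character

code_points = [ord(ch) for ch in text]   # numeric code positions
byte_values = list(encoded)              # numeric byte values

# Decoding applies the same identity mapping in the other direction.
round_trip = encoded.decode("ascii")

print(code_points, byte_values, round_trip == text)
```

Both lists hold the same numbers, and decoding returns the original string, which is what a 1:1 mapping implies.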

Unicode and many CJK coded character sets use many more than 256 positions, requiring more complex mappings: sometimes the characters are mapped onto pairs of bytes (see DBCS). In many cases, this breaks programs that assume a one-to-one mapping of bytes to characters, and so, for example, treat any occurrence of the byte value 13 as a carriage return. To avoid this problem, character encodings such as UTF-8 were devised, in which the byte values 0 to 127 only ever stand for the corresponding ASCII characters and never appear as part of a multi-byte sequence.
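
The byte-13 problem can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than part of the original entry: UTF-16 (big-endian) stands in for a two-byte mapping, the character U+010D is picked because its low byte happens to be 13, and contains_cr_byte is a hypothetical helper that mimics a byte-oriented program.

```python
# U+010D is LATIN SMALL LETTER C WITH CARON (code point 269).
ch = "\u010d"

utf16 = ch.encode("utf-16-be")   # two bytes: 0x01 0x0D
utf8 = ch.encode("utf-8")        # two bytes: 0xC4 0x8D

def contains_cr_byte(data: bytes) -> bool:
    """Hypothetical naive check: treat any byte 13 as a carriage return."""
    return 13 in data

# The naive check misfires on the two-byte form, although the text
# contains no carriage return at all.
naive_utf16 = contains_cr_byte(utf16)

# In UTF-8 every byte of a multi-byte character is 128 or above,
# so the same check does not misfire.
naive_utf8 = contains_cr_byte(utf8)

print(list(utf16), naive_utf16)
print(list(utf8), naive_utf8)
```

The same input yields a false carriage return in the two-byte form and none in UTF-8, which is why byte-oriented programs tend to keep working on UTF-8 data.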