character encoding




(character)
(Or "character encoding scheme") A mapping of binary values to code positions and back; generally a 1:1 (bijective) mapping.

In the case of ASCII, this is generally an f(x) = x mapping: code point 65 maps to the byte value 65, and vice versa. This is possible because ASCII uses only code positions representable as single bytes, i.e., values between 0 and 255 at most. (US-ASCII in fact uses only values 0 to 127.)
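
The identity mapping can be checked directly. The following is a minimal sketch in Python, assuming the built-in "ascii" codec; the variable names are illustrative only.

```python
# Code point 65 ("A") maps to the single byte value 65 under ASCII.
text = "A"
code_point = ord(text)
encoded = text.encode("ascii")
assert encoded[0] == code_point

# Every US-ASCII code position (0 to 127) maps to the byte with the same value.
identity_holds = all(
    chr(cp).encode("ascii") == bytes([cp]) for cp in range(128)
)

# Decoding is the inverse mapping: byte value 65 gives back "A".
decoded = bytes([65]).decode("ascii")
```

Positions above 127 fall outside US-ASCII, so the same codec rejects them instead of mapping them.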

Unicode and many CJK coded character sets use far more than 256 positions, requiring more complex mappings: sometimes the characters are mapped onto pairs of bytes (see DBCS). In many cases, this breaks programs that assume a one-to-one mapping of bytes to characters and so, for example, treat any occurrence of the byte value 13 as a carriage return. To avoid this problem, character encodings such as UTF-8 were devised, in which byte values 0 to 127 only ever represent the corresponding ASCII characters.
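
The byte-value-13 problem can be reproduced in a few lines. This is only a sketch: the choice of UTF-16 (little-endian) as the two-byte mapping, the character U+010D, and the helper `contains_cr` are assumptions made for illustration, not part of the definition above.

```python
CR = 13  # byte value of the ASCII carriage return

# U+010D (LATIN SMALL LETTER C WITH CARON) has nothing to do with CR,
# but its code point contains the value 0x0D.
ch = "\u010d"

two_byte = ch.encode("utf-16-le")  # pair of bytes: 0x0D 0x01
utf8 = ch.encode("utf-8")          # pair of bytes: 0xC4 0x8D


def contains_cr(data: bytes) -> bool:
    """Naive scan that assumes one byte equals one character."""
    return CR in data


false_positive = contains_cr(two_byte)  # the scan wrongly reports a CR
safe = contains_cr(utf8)                # no byte below 0x80, so no false hit

# A genuine carriage return is still found in UTF-8 text.
real_cr = contains_cr("line\r".encode("utf-8"))
```

Because every byte of a multi-byte UTF-8 sequence is 128 or greater, a byte-level search for an ASCII value such as 13 can only match a real ASCII character.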