[′kar·ik·tər ‚rek·ig′nish·ən]
(computer science)
The technology of using a machine to sense characters originally written or printed by human beings and encode them into a machine language.

Character recognition

The process of converting scanned images of machine-printed or handwritten text (numerals, letters, and symbols) into a computer-processable format; also known as optical character recognition (OCR). A typical OCR system contains three logical components: an image scanner, OCR software and hardware, and an output interface. The image scanner optically captures the text images to be recognized. These images are then processed by the OCR software and hardware in three operations: document analysis (extracting individual character images), recognition (classifying these images by shape), and contextual processing (either to correct misclassifications made by the recognition algorithm or to limit recognition choices). The output interface communicates the results of the OCR system to the outside world.
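A minimal sketch of the recognition and contextual-processing steps, assuming character images have already been extracted and binarized. It uses template matching by pixel (Hamming) distance, one simple shape-based method chosen here for illustration rather than the method of any particular OCR system; the tiny 5x3 templates and the names `TEMPLATES`, `hamming`, and `recognize` are hypothetical. The optional `allowed` argument stands in for contextual processing that limits recognition choices.

```python
# Hypothetical 5x3 glyph templates ('#' = ink, '.' = background), toy data only.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
    "O": [".#.", "#.#", "#.#", "#.#", ".#."],
    "I": ["###", ".#.", ".#.", ".#.", "###"],
}

def hamming(a, b):
    """Count pixels that differ between two equally sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(glyph, allowed=None):
    """Return the label of the template whose shape is closest to `glyph`.

    `allowed` optionally restricts the candidate labels, a simple form of
    contextual processing (for example, a field known to hold only digits).
    """
    labels = allowed if allowed is not None else TEMPLATES.keys()
    return min(labels, key=lambda label: hamming(glyph, TEMPLATES[label]))

# A slightly noisy glyph: closest in shape to the letter "O".
glyph = [".#.", "#.#", "#.#", "#.#", "##."]
print(recognize(glyph))                           # unconstrained: "O"
print(recognize(glyph, allowed=["0", "1", "7"]))  # numeric field: "0"
```

Restricting the candidate set changes the answer for the ambiguous glyph, which is how context can resolve confusions such as "O" versus "0" that shape alone cannot.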

Commercial OCR systems can largely be grouped into two categories: task-specific readers and general-purpose page readers. A task-specific reader handles only specific document types. Some of the most common task-specific readers read bank checks, letter mail, or credit-card slips. These readers usually use custom-made image-lift hardware that captures only a few predefined document regions. For example, a bank-check reader may scan just the courtesy-amount field (where the amount of the check is written numerically), and a postal OCR system may scan just the address block on a mail piece. Such systems emphasize high throughput rates and low error rates. Applications such as letter-mail reading have throughput rates of 12 letters per second with error rates of less than 2%. The character recognizer in many task-specific readers is able to recognize both handwritten and machine-printed text.

General-purpose page readers are designed to handle a broader range of documents such as business letters, technical writings, and newspapers. These systems capture an image of a document page and separate the page into text regions and nontext regions. Nontext regions such as graphics and line drawings are often saved separately from the text and its associated recognition results. Text regions are segmented into lines, words, and characters, and the characters are passed to the recognizer. Recognition results are output in a format that can be postprocessed by application software. Most of these page readers can read machine-printed text, but only a few can read hand-printed alphanumerics. See Computer
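The segmentation of a text region into lines and characters can be sketched with projection profiles, one common textbook technique assumed here for illustration. The sketch assumes a clean, unskewed binary image in which blank rows separate lines and blank columns separate characters; grouping characters into words by gap width is omitted. The names `runs` and `segment` are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive nonzero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # a blank entry ends the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment(page):
    """Split a binary page (equal-length '#'/'.' strings) into text lines,
    then split each line into character images."""
    # Horizontal projection: ink pixels per row; blank rows separate lines.
    row_ink = [row.count("#") for row in page]
    lines = []
    for top, bottom in runs(row_ink):
        band = page[top:bottom]
        # Vertical projection within the line: blank columns separate characters.
        col_ink = [sum(r[c] == "#" for r in band) for c in range(len(band[0]))]
        chars = [[r[left:right] for r in band] for left, right in runs(col_ink)]
        lines.append(chars)
    return lines

page = [
    "#.#..##",
    "#.#..#.",
    ".......",
    "###....",
]
for n, line in enumerate(segment(page), 1):
    print(f"line {n}: {len(line)} character image(s)")
```

Real page readers must also cope with skew, touching or broken characters, and proportional spacing, which simple blank-gap splitting does not handle.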

character recognition

The ability of a machine to recognize printed text. See OCR and MICR.
