Optical Character Recognizer.

Given an image of a paper document, this software produces an electronic version (often text only).

The output is better presented when OCR is combined with Document Analysis techniques.

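Many commercial OCR systems have a character error rate of less than 1/100. At a rate close to that bound, this means nearly one error per line of text, since a printed line typically holds somewhat fewer than 100 characters.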
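The arithmetic behind that estimate can be sketched in a few lines. This is only an illustration: the line length of 80 characters and the assumption that errors occur independently are mine, not figures from any OCR vendor, and the function names are hypothetical.

```python
def expected_errors_per_line(char_error_rate: float, chars_per_line: int) -> float:
    """Expected number of misrecognized characters in one line of text."""
    return char_error_rate * chars_per_line


def prob_line_has_error(char_error_rate: float, chars_per_line: int) -> float:
    """Probability that a line contains at least one error.

    Assumes each character is misrecognized independently (a simplification).
    """
    return 1.0 - (1.0 - char_error_rate) ** chars_per_line


rate = 1 / 100      # upper bound quoted in the text
line_length = 80    # assumed number of characters per line

print(expected_errors_per_line(rate, line_length))
print(round(prob_line_has_error(rate, line_length), 2))
```

With these assumed values the expected count comes to about 0.8 errors per line, which is where "nearly one error per text line" comes from.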

OCR errors can be:

  1. confusion: one character (or rather, one glyph) is recognized as another
  2. insertion: a glyph is added where there should be none
  3. deletion: a glyph is not recognized at all.
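The three error types above map onto the substitution, insertion and deletion operations of an edit-distance alignment. The sketch below counts them by aligning OCR output against a known correct transcription using Levenshtein distance. It is one possible way to measure these errors, not the method of any particular OCR product; the function name `classify_errors` is hypothetical.

```python
def classify_errors(truth: str, ocr: str) -> dict:
    """Count confusions, insertions and deletions in `ocr` relative to `truth`
    along one minimum edit-distance (Levenshtein) alignment."""
    n, m = len(truth), len(ocr)
    # dist[i][j] = edit distance between truth[:i] and ocr[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if truth[i - 1] == ocr[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or confusion
                dist[i][j - 1] + 1,         # insertion: extra glyph in the OCR output
                dist[i - 1][j] + 1,         # deletion: glyph missing from the OCR output
            )

    # Walk back from the end of both strings, labelling each step.
    counts = {"confusion": 0, "insertion": 0, "deletion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if truth[i - 1] == ocr[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                counts["confusion"] += cost
                i, j = i - 1, j - 1
                continue
        if j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            counts["insertion"] += 1
            j -= 1
        else:
            counts["deletion"] += 1
            i -= 1
    return counts


# "m" read as "rn" shows up as one confusion plus one insertion.
print(classify_errors("modern", "rnodern"))
# {'confusion': 1, 'insertion': 1, 'deletion': 0}
```

Note that a glyph-level mistake such as "m" read as "rn" is counted here as two character-level errors, which is one reason the text distinguishes glyphs from characters.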

Typical errors include the replacement of "m" by "rn" and the confusion of the lowercase letter "l" with the digit "1".

The most common OCR products are FineReader, TextBridge and OmniPage. They all also segment the scanned images given to them into blocks of different media types (at least text, table and image).

An exam board in Britain that sets GCSE and A-level papers. The initials stand for Oxford, Cambridge and RSA Examinations.
