Text does not get recognized properly

Try these solutions if any part of the original document is not converted to text properly during OCR:

  • Look at the page image and ensure that all text areas are enclosed by text zones. If an area is not enclosed by a zone, it is generally ignored during OCR.

  • Make sure text zones are identified correctly. Reidentify zone types and contents, if necessary, and perform OCR on the document again. See Zone types and contents for more information.

  • Be sure you have not loaded an unsuitable template by mistake. If its zone borders cut through text, recognition is impaired.

  • Adjust the brightness and contrast sliders in the Scanner panel of the Options dialog box. You may need to experiment with different combinations of settings to get the desired results.

  • Enhance images for OCR purposes using the SET tools.

  • Check the resolution of the original image: hover the cursor over a page thumbnail to see it in a popup. If the resolution is significantly above or below 300 dpi, recognition is likely to suffer.

  • Make sure the correct document languages are selected in the OCR panel of the Options dialog box. Only languages included in the document should be selected. In particular, setting an Asian language for non-Asian texts (and vice versa) is likely to produce unusable results.

  • Recognition results in Japanese, Korean and Chinese can be viewed and saved only if your system has East Asian language support.

  • Turn IntelliTrain on and make some proofing corrections. This is most likely to help with stylized fonts or uniformly degraded documents. If IntelliTrain was running, try turning it off – on some types of degraded documents it may not be able to help.

  • Do some manual training, or edit existing training to remove entries that did not improve recognition.

  • If you use True Page as the Text Editor formatting level or for export, recognized text is put into text boxes or frames. Some text may be hidden if a text box is too small. To view the text, place the cursor in the text box and use the arrow keys on your keyboard to scroll to the top, bottom, left, or right of the box.

  • Check the glass, mirrors, and lenses on your scanner for dust, smudges, or scratches. Clean if necessary.

  • OmniPage recognizes only machine-printed characters, such as typewritten or laser-printed text. It can handle dot-matrix characters, though accuracy may be lower on draft-quality text. It cannot read handprint or handwriting, but it can retain signatures or other handwritten text as a graphic.
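
The resolution tip in the list above comes down to simple arithmetic: the effective resolution along one axis is the pixel count divided by the physical length in inches, so a US Letter page (8.5 x 11 in) scanned at 2550 x 3300 pixels is exactly 300 dpi. The Python sketch below illustrates this. It is a hypothetical helper, not part of OmniPage, and the 25% tolerance used to decide what counts as "significantly" above or below 300 dpi is an assumption, since the article does not define a threshold.

```python
from typing import Optional

TARGET_DPI = 300.0
# Assumption: "significantly" is taken to mean more than 25% away from 300 dpi.
TOLERANCE = 0.25


def effective_dpi(pixels: int, inches: float) -> float:
    """Dots per inch along one axis: pixel count divided by physical length."""
    if inches <= 0:
        raise ValueError("physical length must be positive")
    return pixels / inches


def resolution_warning(dpi: float) -> Optional[str]:
    """Return a warning if dpi falls outside the assumed tolerance band, else None."""
    low = TARGET_DPI * (1 - TOLERANCE)   # 225 dpi
    high = TARGET_DPI * (1 + TOLERANCE)  # 375 dpi
    if dpi < low:
        return f"{dpi:.0f} dpi is well below {TARGET_DPI:.0f} dpi; recognition may suffer"
    if dpi > high:
        return f"{dpi:.0f} dpi is well above {TARGET_DPI:.0f} dpi; recognition may suffer"
    return None


if __name__ == "__main__":
    # US Letter width (8.5 in) scanned at 2550 px -> 300 dpi, no warning.
    print(resolution_warning(effective_dpi(2550, 8.5)))  # None
    # Same page scanned at 1275 px wide -> 150 dpi, flagged as too low.
    print(resolution_warning(effective_dpi(1275, 8.5)))
```

Checking each axis separately catches scans where the horizontal and vertical resolutions differ.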

Recent Comments

By: rjs Posted on 06-06-2009 11:24 AM

You should always check ALL of the text in the Text Editor view of your recognized document before you change the FORMAT of the text. OmniPage may incorrectly and randomly detect some of your text as BULLETED TEXT, and when you change the format of your page, the parts of your text that were recognized as BULLETS will disappear. Bullets can be upper- and lowercase letters, Roman numerals, or anything else OmniPage might interpret as a bullet. This happens frequently when recognizing tables of contents, indexes, and pages with names and initials.

There is no way to disable this feature of OmniPage.