These hints are designed to increase OCR accuracy in OmniPage.
Select settings that improve accuracy in the Options dialog box.
Choose Options in the Tools menu or click the Options button in the
Standard toolbar. Then click the tab in the Options dialog box for the
settings you want to change:
Select Accuracy under Optimize the OCR process for... in the OCR panel.
Adjust the Brightness and Contrast sliders in the Scanner panel.
Enhance
images for OCR purposes using the SET
tools.
If your only criterion is OCR
accuracy, prefer black-and-white scanning for good quality documents with
crisp black text on a white background. Choose grayscale scanning if you
are scanning pages with text on colored or shaded backgrounds, or for
degraded documents with low or varied contrast.
Select Training
File in the Proofing panel
to use a character training file to help recognize special or stylized
characters during OCR. See Training files
for more information. This does not apply to Asian languages.
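The scanning-mode guidance above (black-and-white versus grayscale) can be expressed as a simple decision rule. The following is only an illustrative sketch: the function name and its two parameters are hypothetical and are not part of OmniPage; they just restate the rule of thumb for a workflow where OCR accuracy is the only criterion.

```python
def choose_scan_mode(colored_or_shaded_background: bool,
                     low_or_varied_contrast: bool) -> str:
    """Suggest a scanning mode when OCR accuracy is the only criterion.

    Hypothetical helper: both flags are judgments made by the person
    scanning, not values reported by any scanner or by OmniPage.
    """
    # Text on colored or shaded backgrounds, or degraded pages with low
    # or varied contrast, are the cases where grayscale is recommended.
    if colored_or_shaded_background or low_or_varied_contrast:
        return "grayscale"
    # Good-quality pages with crisp black text on a white background.
    return "black-and-white"


if __name__ == "__main__":
    print(choose_scan_mode(False, False))
    print(choose_scan_mode(True, False))
```

The rule deliberately ignores color scanning, because the guidance above only compares black-and-white and grayscale for accuracy.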
Use Suitable Recognition Aids
If you have a long document and no suitable training file, do some
training on a few typical pages. Turn on IntelliTrain in the Proofing
panel of the Options dialog box, then recognize three or four pages and
proofread the text. Inspect the quality of the training in the Edit
Training dialog box, then save it to a file.
If you are getting poor results
with a training file loaded, check its contents in the Edit
Training dialog box. Make sure it is appropriate for the current document.
If it is not, either unload it or edit its contents to remove training
from poorly formed character shapes. Unsuitable training can yield worse
results than no training at all.
If proofing is skipping too many incorrect words and you have a user
dictionary loaded, check its contents in the Edit User Dictionary dialog
box. Delete any entries added in error, especially misspelt words.
Identify Zones Correctly
When processing pages manually,
make sure zones are identified correctly before OCR.
When processing automatically,
be sure your original layout setting is the best one for the document.
Inspect the recognition results. If there are defects due to poor zoning
on some pages, change the zone properties and/or locations and re-recognize
those pages.
Make sure you do not have a zone template file loaded that is unsuitable
for your current pages.
To retain handwritten text, such
as a signature, identify it as a graphic zone.
Use High-Quality Images
In general, try to use original
pages when you are scanning documents. Typeset, high-quality printed page
images yield the best OCR accuracy. OCR accuracy may not be as good with
lesser-quality pages.
With low-quality originals, sometimes
a good-quality photocopy can yield better OCR results. This may be true
on documents with low contrast or printed on thin paper. On the other
hand, poor-quality photocopies with stripes, blotches or uneven brightness
will usually give worse results.
Ask senders to select Fine
or Best Mode when they send you
a fax.
Page images should be free of
notes, lines, or doodles. Anything that is not a printed character slows
recognition, and any character distorted by a mark may be unrecognizable.
Try not to include such marks in zones, or enclose them in an ignore zone.
Text in page images should be
reasonably clean and crisp. Characters should be separated from each other
and not blotched together or overlapping.
If you have influence over the styling used in documents you want to
recognize, avoid underlining. Underlined text is difficult to recognize
because the underline changes the shape of descenders on the letters
q, g, y, p, and j.
If you are getting poor results
from image files, check their quality and resolution by hovering the cursor
over the thumbnails. The ideal resolution for OCR is 300 dpi. Images below
200 dpi or above 400 dpi are likely to yield far lower accuracy.
If you have the documents on paper, scan them again with better settings.
If not, ask the people who supply your images to use 300 dpi.
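The resolution thresholds above can be turned into a quick pre-check for a batch of image files. This is a minimal sketch, assuming you already know each image's resolution in dpi (for example, from the thumbnail tooltip); the function and constant names are hypothetical and do not correspond to any OmniPage feature.

```python
IDEAL_DPI = 300  # ideal resolution for OCR, as stated above
MIN_DPI = 200    # below this, accuracy is liable to drop sharply
MAX_DPI = 400    # above this, accuracy is liable to drop sharply


def assess_resolution(dpi: int) -> str:
    """Classify an image resolution against the 200/300/400 dpi guidance."""
    if dpi <= 0:
        raise ValueError("dpi must be a positive number")
    if dpi < MIN_DPI:
        return "too low: rescan or request 300 dpi"
    if dpi > MAX_DPI:
        return "too high: rescan or request 300 dpi"
    if dpi == IDEAL_DPI:
        return "ideal"
    return "acceptable"


if __name__ == "__main__":
    for dpi in (150, 200, 300, 400, 600):
        print(dpi, "->", assess_resolution(dpi))
```

Values of exactly 200 or 400 dpi are treated as acceptable here, since the guidance only warns about images below 200 or above 400 dpi.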