You can read the fine print
Today's OCR technology can read fonts as small as 8 pt, and even 6 pt, very accurately. It used to be that anything smaller than a 12 pt font stood little chance. Thanks to higher-quality scans and more advanced OCR engines, small fonts need not be a problem if the right approaches are used.
Small fonts are more sensitive to image quality and to degradation of the document. For this reason, source images scanned at 300 DPI or higher are necessary. For normal fonts there is seldom a reason to scan above 300 DPI, but for small fonts the goal is to make them appear more or less the same size as regular fonts, so scanning them at 400 to 600 DPI is useful. It is also very important that documents are “clean”. A smudge or spill affects a smaller font many times more than a larger font because the lines are closer together. Once you have good image quality, you can start the conversion.
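The arithmetic behind the 400 to 600 DPI range can be sketched in a few lines. This is only a rough illustration: it assumes "regular" text means 12 pt scanned at 300 DPI (a reference point chosen for this example), and the function names are hypothetical. The one fixed fact it relies on is that a typographic point is 1/72 of an inch.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def glyph_height_px(font_pt: float, dpi: float) -> float:
    """Approximate height in pixels of a font of the given point size."""
    return font_pt * dpi / POINTS_PER_INCH


def dpi_to_match(font_pt: float,
                 reference_pt: float = 12.0,    # assumed "regular" font size
                 reference_dpi: float = 300.0   # assumed baseline scan setting
                 ) -> float:
    """Scan resolution at which font_pt text is as tall, in pixels,
    as reference_pt text scanned at reference_dpi."""
    return reference_dpi * reference_pt / font_pt


if __name__ == "__main__":
    for pt in (12, 8, 6):
        dpi = dpi_to_match(pt)
        print(f"{pt} pt: scan at {dpi:.0f} DPI "
              f"for about {glyph_height_px(pt, dpi):.0f} px tall text")
```

Under those assumptions, 8 pt text needs about 450 DPI and 6 pt text needs about 600 DPI to look like 12 pt text at 300 DPI, which lines up with the range suggested above.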
The next biggest benefit for small fonts comes from zoning them separately. Zoning is the process of rubber-banding the region where the text exists. When small fonts are grouped in the same zone as normal-sized fonts, the OCR software assumes they should all be the same size, and both confidence and accuracy go down. If you zone the small fonts separately, you increase the OCR engine's ability to apply experts dedicated to small fonts, which increases accuracy on them.
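Zoning is usually done by hand or by the OCR package itself, but the idea can be sketched in code. The example below is a minimal illustration, not any particular engine's API: it assumes you already have a bounding box for each text line (in pixels), treats the box height as a rough stand-in for font size, and uses a 9 pt cutoff that is purely an assumption. The `Box` class and `zone_by_font_size` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel rectangle around a line of text or a zone (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def height(self) -> int:
        return self.bottom - self.top


def union(boxes: list[Box]) -> Box:
    """Smallest rectangle that contains every box in the list."""
    return Box(min(b.left for b in boxes), min(b.top for b in boxes),
               max(b.right for b in boxes), max(b.bottom for b in boxes))


def zone_by_font_size(line_boxes: list[Box], dpi: float,
                      small_pt_threshold: float = 9.0) -> dict[str, Box]:
    """Split text lines into a 'small' zone and a 'normal' zone.

    Line height is converted to points (1 pt = 1/72 inch) and compared
    with small_pt_threshold, which is an assumed cutoff for this sketch.
    """
    small, normal = [], []
    for box in line_boxes:
        approx_pt = box.height * 72 / dpi
        (small if approx_pt < small_pt_threshold else normal).append(box)

    zones = {}
    if small:
        zones["small"] = union(small)
    if normal:
        zones["normal"] = union(normal)
    return zones


if __name__ == "__main__":
    # Two body lines (about 12 pt) and two fine-print lines (about 6 pt)
    # on a page scanned at 300 DPI.
    lines = [Box(100, 100, 900, 150), Box(100, 170, 900, 220),
             Box(100, 1000, 900, 1025), Box(100, 1035, 700, 1060)]
    for name, zone in zone_by_font_size(lines, dpi=300).items():
        print(name, zone)
```

Each resulting zone would then be cropped and sent to the OCR engine on its own. The sketch assumes the fine print sits together in one block, such as a footer; if small and normal lines are interleaved, the two rectangles would overlap and the lines would need to be grouped into several zones instead.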
Next time someone tells you to read the small print, tell them you won't read it; you will scan and OCR it.
Labels: best practices, book OCR, Check Scanning, full-page ocr, small print
