The Magic of 300 DPI
You will often hear that 300 DPI is the best resolution for scanning an image for OCR. But why? 300 DPI is the magic number where you gain the most accuracy without sacrificing speed or file size. If you lay the resolutions out on a line starting at 96 DPI and run tests of OCR accuracy, scanning speed, OCR speed, and file size, you will notice something very interesting: the improvement from a 200 DPI scan to a 300 DPI scan is at least twice the improvement of any other step between resolutions. Between 300 DPI and 400 DPI, the improvement is nearly absent, but still there. This simple study is the reason 300 DPI is the ideal resolution for OCR scanning. Now let's look at why.
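To make the file-size side of that trade-off concrete, here is a minimal sketch of how the pixel count and uncompressed size of a page grow with resolution. It assumes a US Letter page (8.5 x 11 inches) and 8-bit grayscale, neither of which comes from the study above, and the function names are made up for illustration. Real files will be smaller once compressed.

```python
# Assumptions (not from the article): US Letter page, 8-bit grayscale, no compression.

def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Return (width, height) in pixels for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(dpi, bits_per_pixel=8, width_in=8.5, height_in=11.0):
    """Return the uncompressed image size in bytes at the given DPI."""
    width_px, height_px = page_pixels(dpi, width_in, height_in)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    for dpi in (96, 200, 300, 400, 600):
        width_px, height_px = page_pixels(dpi)
        size_mb = raw_size_bytes(dpi) / 1_000_000
        print(f"{dpi:>3} DPI: {width_px} x {height_px} px, about {size_mb:.1f} MB raw")
```

Pixel count grows with the square of the resolution, so going from 200 to 300 DPI multiplies the data by 2.25, and going from 300 to 400 DPI multiplies it by about 1.78 again.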
Reasonable scan speed and reasonable file size are part of why 300 DPI is optimal, but the biggest reason is that the OCR engine cores were all initially trained at this resolution. Some engines, no matter what resolution you give them, will actually sample the image up or down to get to 300 DPI. The image pre-processing and cleanup engines are set up similarly.
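The resampling step can be pictured with a little arithmetic. This is only a sketch of the idea, not the behavior or API of any particular engine: the function name and the 300 DPI target constant are assumptions chosen to match the text.

```python
# Hypothetical helper: shows the scale factor an engine would need
# to bring an image of any source resolution to 300 DPI.

TARGET_DPI = 300  # assumed target, taken from the article's description


def resample_plan(width_px, height_px, source_dpi, target_dpi=TARGET_DPI):
    """Return (scale, new_width, new_height) for resampling to target_dpi."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    scale = target_dpi / source_dpi
    return scale, round(width_px * scale), round(height_px * scale)


if __name__ == "__main__":
    # A Letter page scanned at 600 DPI is sampled down by half.
    print(resample_plan(5100, 6600, 600))
    # The same page scanned at 200 DPI is sampled up by 1.5x.
    print(resample_plan(1700, 2200, 200))
```

Both inputs end up at the same pixel dimensions, but sampling up only interpolates between existing pixels and cannot restore detail the scanner never captured.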
There are always exceptions, and they are usually hand-printed forms (ICR) or documents with small print.
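One way to see why small print is an exception is to work out how many pixels a character actually gets. The sketch below uses the standard definition of a point as 1/72 inch; the function name is made up, and the result is the full em height of the font, so individual lowercase letters will be noticeably shorter than this.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def em_height_px(point_size, dpi):
    """Return the em height of a font, in pixels, at the given scan resolution."""
    return point_size * dpi / POINTS_PER_INCH


if __name__ == "__main__":
    for point_size in (6, 8, 10, 12):
        for dpi in (200, 300, 400):
            height = em_height_px(point_size, dpi)
            print(f"{point_size:>2} pt at {dpi} DPI: {height:.1f} px")
```

A 6 pt font at 300 DPI gets half as many pixels per character as 12 pt text at the same resolution, which is why fine print can benefit from a higher setting.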
The beauty of the 300 DPI best practice is that it is one of the few things in OCR and data capture that stays consistent across document types. You have been told to use 300 DPI, and now you know the reason behind it.
Labels: best practices, icr, OCR, Scanning, Settings
