Tax Return OCR
Checkmarks: Tax returns have two types of checkmarks. The first is the standard kind printed in the body of the document, which can be handled like any other common checkmark type. The second is unique to tax forms and typically sits on the right side of the document: a box that can be filled with either a character or a checkmark symbol. For these checkmarks, the best approach is to create a field covering the entire area where the mark can be printed and set the checkmark type to “white field”. The software then expects only white space in that area and treats the field as checked when it finds enough black pixels.
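To make the “white field” idea concrete, here is a minimal sketch of the underlying test. It is not any particular OCR product's implementation: the function names, the grayscale list-of-rows image format, and the two thresholds (`dark_threshold`, `min_dark_ratio`) are all assumptions chosen for illustration, and real values would need tuning against your own forms.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def dark_pixel_ratio(image: Sequence[Sequence[int]], box: Box,
                     dark_threshold: int = 128) -> float:
    """Return the fraction of pixels inside the box darker than the threshold.

    The image is assumed to be grayscale rows of 0 (black) to 255 (white),
    and the box is assumed to lie fully inside the image.
    """
    left, top, right, bottom = box
    total = (right - left) * (bottom - top)
    if total <= 0:
        raise ValueError("box must have positive area")
    dark = sum(
        1
        for row in image[top:bottom]
        for value in row[left:right]
        if value < dark_threshold
    )
    return dark / total


def is_white_field_checked(image: Sequence[Sequence[int]], box: Box,
                           dark_threshold: int = 128,
                           min_dark_ratio: float = 0.05) -> bool:
    """Treat the field as checked when enough of it is no longer white."""
    return dark_pixel_ratio(image, box, dark_threshold) >= min_dark_ratio


if __name__ == "__main__":
    blank = [[255] * 10 for _ in range(10)]
    marked = [row[:] for row in blank]
    for i in range(2, 8):          # draw a short diagonal stroke
        marked[i][i] = 0
    print(is_white_field_checked(blank, (0, 0, 10, 10)))
    print(is_white_field_checked(marked, (0, 0, 10, 10)))
```

Because the test only counts dark pixels, it does not care whether the box holds an X, a tick, or a typed character, which is why the field should cover the whole area where a mark can appear.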
Tabular Data: Much of the data in a tax form is presented as a table. When capturing data from a table, organizations have to decide whether to capture each cell as its own field or to capture the whole table as a single table field that must be parsed later. This choice can dramatically affect the exported results, so it is important to make it beforehand.
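The difference between the two export shapes can be sketched as follows. This assumes, purely for illustration, that a table field is exported as one text blob with one row per line and cells separated by `|`; the export format, the function names, and the `prefix_rN_column` field naming are hypothetical and will differ between capture products.

```python
from typing import Dict, List


def parse_table_field(raw: str, columns: List[str],
                      delimiter: str = "|") -> List[Dict[str, str]]:
    """Parse a single exported table field into a list of row dictionaries."""
    rows = []
    for line_number, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines in the export
        cells = [cell.strip() for cell in line.split(delimiter)]
        if len(cells) != len(columns):
            raise ValueError(
                f"line {line_number}: expected {len(columns)} cells, "
                f"got {len(cells)}"
            )
        rows.append(dict(zip(columns, cells)))
    return rows


def flatten_to_cell_fields(rows: List[Dict[str, str]],
                           prefix: str) -> Dict[str, str]:
    """Turn parsed rows into one named field per cell."""
    fields = {}
    for row_index, row in enumerate(rows, start=1):
        for column, value in row.items():
            fields[f"{prefix}_r{row_index}_{column}"] = value
    return fields


if __name__ == "__main__":
    raw = "Wages | 52,000\nInterest | 310\n"
    rows = parse_table_field(raw, ["description", "amount"])
    print(rows)
    print(flatten_to_cell_fields(rows, "income"))
```

Capturing per cell gives you the flat shape directly with no parsing step, while a table field keeps setup simpler but pushes work like this onto whatever consumes the export.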
Delivery Type: Tax forms usually arrive either as an eFile, a pixel-perfect document that is never printed or scanned, or as a scanned document that was received on paper first. For the most part the eFile version will yield more accurate results, but it has non-traditional checkmarks that can cause problems. Organizations need to decide whether to process all delivery types together as a single type or to separate them. Each choice has its advantage: combining them reduces integration time, while separating them yields higher accuracy.
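One way to picture the combined-versus-separate decision is as a choice between one shared processing profile and one profile per delivery type. The sketch below is hypothetical throughout: the `Profile` fields, the profile names, and every setting value are assumptions for illustration, not recommendations. Here `min_dark_ratio` stands for the fraction of dark pixels needed to treat a white-field box as checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical bundle of settings for one processing configuration."""
    name: str
    deskew: bool           # straighten the page before recognition
    min_dark_ratio: float  # dark-pixel fraction that counts as "checked"


# Separate profiles: each delivery type gets settings tuned to it.
# The numbers are placeholders, not measured values.
SEPARATE_PROFILES = {
    "efile": Profile("efile", deskew=False, min_dark_ratio=0.02),
    "scanned": Profile("scanned", deskew=True, min_dark_ratio=0.08),
}

# Combined profile: one compromise configuration for everything.
COMBINED_PROFILE = Profile("combined", deskew=True, min_dark_ratio=0.05)


def select_profile(delivery_type: str, separate: bool = True) -> Profile:
    """Pick the profile for a document given the chosen strategy."""
    if not separate:
        return COMBINED_PROFILE
    try:
        return SEPARATE_PROFILES[delivery_type]
    except KeyError:
        raise ValueError(f"unknown delivery type: {delivery_type!r}") from None


if __name__ == "__main__":
    print(select_profile("efile"))
    print(select_profile("scanned"))
    print(select_profile("efile", separate=False))
```

The combined route means building and maintaining a single configuration, while the separate route means a classification step up front plus one configuration per type, which is where the extra integration time goes.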
I would much rather OCR a tax return than file one. Because of this, my skills in processing tax returns are better than my skills in preparing them, and I hope I have imparted some of that knowledge today.
Labels: best practices, Data Capture, OCR, tax return
