Mysterious tables
Most tables follow the typical structure: a header with column names, one or more rows of data below that align to those column names, and a footer that may contain summation data. This structure is ideal. The first added element of complexity occurs when column names do not align with the data. This can happen intentionally or because of shifts in scanning. If this happens always, or often enough, the data capture setup needs to ignore table headers completely. The next level of complexity is multi-level headers. Tables with multi-level headers amount to tables within tables. There are two or more levels of headers: the first is the parent, and the subsequent levels provide additional detail, usually with a lesser number of items. The levels are usually indicated by deeper indentation at each level. This is most commonly found in EOBs (Explanation of Benefits documents), and it is what makes EOBs so complex. In this case, you have to capture multiple copies of the same table over and over rather than attempt to collect all the details as one table. In the most complex documents with this structure, the table data capture element is not used at all; a basic field-by-field approach is used instead.
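To make the tables-within-tables idea concrete, here is a minimal Python sketch that splits indented rows into one small group per parent row, which is roughly what capturing "multiple copies of the same table" amounts to. It assumes the OCR text keeps leading spaces as the indent signal; the function name group_by_indent and the sample rows are made up for illustration and do not come from any particular data capture product.

```python
def group_by_indent(rows):
    """Split text rows into groups: one parent row plus its indented detail rows."""
    groups = []
    for row in rows:
        if not row.strip():
            continue  # skip blank rows
        indent = len(row) - len(row.lstrip(" "))
        if indent == 0 or not groups:
            # A row with no indent starts a new parent-level group.
            groups.append({"parent": row.strip(), "children": []})
        else:
            # An indented row belongs to the most recent parent.
            groups[-1]["children"].append(row.strip())
    return groups


# Hypothetical EOB-style rows: a claim line followed by its service lines.
sample = [
    "Claim 1001   Smith, J",
    "    99213    120.00",
    "    85025     40.00",
    "Claim 1002   Jones, A",
    "    99214    150.00",
]

groups = group_by_indent(sample)
for group in groups:
    print(group["parent"], "->", len(group["children"]), "detail rows")
```

Each group can then be handled as its own small table, or field by field, instead of forcing the whole page into one table.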
One of the biggest mistakes integrators make is assuming that a certain data capture table approach will work for all their tables on all documents. The only way to know for sure is testing. The ability of data capture software to find table structures is based on a process called Document Analysis. Document Analysis tells the data capture software where all the tables on the document are located, allowing it to choose the best one. In the case of tables within tables, this very often results in a single table that cuts data cells in half. Document Analysis is built on probability, so if the cell borders for one column fall at a given location often enough, then that border is selected, right or wrong. The more data in a table, the greater the chance that this choice is wrong.
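The probability point can be illustrated with a toy Python sketch. It is not how any specific Document Analysis engine works; it only assumes that each row proposes an x position for a column border and that the most common position wins. The function name pick_border, the bucket_width parameter, and the sample positions are all hypothetical.

```python
from collections import Counter


def pick_border(positions, bucket_width=5):
    """Pick the border position most rows agree on, and list the rows that disagree."""
    # Round each position down to a bucket so near-identical positions vote together.
    buckets = [(x // bucket_width) * bucket_width for x in positions]
    winner, votes = Counter(buckets).most_common(1)[0]
    # Rows whose own border falls in another bucket are the ones likely to get cut.
    cut_rows = [i for i, bucket in enumerate(buckets) if bucket != winner]
    return winner, votes / len(positions), cut_rows


# Hypothetical border positions: five parent-level rows and two detail rows
# whose column border sits further to the right.
positions = [100, 101, 100, 102, 140, 141, 100]

border, share, cut_rows = pick_border(positions)
print("chosen border:", border)
print("share of rows agreeing:", round(share, 2))
print("rows likely to be cut:", cut_rows)
```

The majority position is chosen even though it is wrong for the two detail rows, which is the cells-cut-in-half effect described above.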
It's best to use tables on concrete document types, i.e. a single variation of a vendor invoice, or a class of vendor invoices that all share the same table type. If you prepare, you will not be let down by unrealistic expectations; instead, you will be impressed with your table extraction.
