Monday, November 30, 2009

The Clock is ticking

When considering the ROI on a data capture integration, setup time is one of the most important and most often miscalculated factors. This means not just the setup time for the initial integration, but also the time spent on fine-tuning and optimization, sometimes after the system is in production.

The difference in setup time between a fixed data capture environment, where coordinate-based fields are used, and a rules-based semi-structured environment is substantial. It's usually not the fixed data capture environments that pose the biggest challenge in calculating or predicting ROI. It takes an administrator on average between 15 and 45 seconds to create and fine-tune a fixed form field. In semi-structured processing, field setup time can range from 60 seconds to hours, depending on the complexity of the document and the logic being deployed. It's this large gap that throws a wrench in some ROI calculations.
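To see how wide that gap gets, here is a rough sketch in Python that turns the per-field figures above into a best-case and worst-case total. The function name is made up for illustration, and the 2-hour ceiling for a semi-structured field is my assumption, since "hours" has no fixed upper limit.

```python
# Per-field setup time ranges in seconds, taken from the figures above.
# ASSUMPTION: "hours" is capped at 2 hours here purely for illustration.
FIXED_RANGE = (15, 45)
SEMI_STRUCTURED_RANGE = (60, 2 * 3600)


def setup_time_range(fixed_fields, semi_structured_fields):
    """Return (best_case, worst_case) total field setup time in seconds."""
    low = (fixed_fields * FIXED_RANGE[0]
           + semi_structured_fields * SEMI_STRUCTURED_RANGE[0])
    high = (fixed_fields * FIXED_RANGE[1]
            + semi_structured_fields * SEMI_STRUCTURED_RANGE[1])
    return low, high


# A hypothetical project with 20 fields of each kind.
low, high = setup_time_range(fixed_fields=20, semi_structured_fields=20)
print(f"best case: {low / 60:.1f} min, worst case: {high / 3600:.1f} h")
```

Even with a modest field count, the worst case is dominated almost entirely by the semi-structured fields, which is why a single number for "setup time per field" tends to mislead.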

For experienced integrators, putting a document and its associated fields into complexity classes is usually pretty easy. Once that is done, gauging the average amount of time to set up each field, and thus all fields, should be accurate. There is always a field or two that requires extra fine-tuning. The key is a complete understanding of the document. Sometimes document variations are obvious; other times they sneak up on you, and you have no idea a variation exists until you start working with it. Knowing all variations is the easiest way to understand the additional time any field will take to set up. Variants are the biggest contributor of time in semi-structured data capture setup. Second are odd field types, such as fields that take up one to many lines, fields that continue across two separate lines, and tables. The third and final major contributor to setup time is poor document quality. This means the administrator has to be more general when creating fields and likely has to deploy multiple pieces of logic per field to locate information in several possible ways.

When calculating the ROI on your data capture project, be aware of these sometimes sneaky factors that can eat at integration time. Bottom line: know your documents, and know the technology before any work is done. If you are unsure, seek professional assistance.

posted by Chris Riley at 0 Comments

Tuesday, November 3, 2009

Fixed, Semi-structured, UNSTRUCTURED!?

I find myself educating even industry peers on the topic of document structure more and more recently. Often the conversation starts with one of them telling me that unstructured document processing exists, or that a particular form is fixed when it is not. Understanding what is meant when talking about document structure is very important.

First, let's start by defining a document. A document is a collection of one or many pages that has a business process associated with it. Documents of a single type can vary in length, but the content contained within them, or the possibility of it existing, is constrained. When data capture technology works, it works on pages, so each page of a document is processed as a separate entity. This, it seems, is the meat of the confusion.

Often someone will say a document is unstructured when what they mean is that the order of its pages is unstructured. This is more or less accurate; however, the pages within this unstructured document are either fixed or semi-structured. The only truly unstructured documents that exist are contracts and agreements. The test is this: if at any moment you can pull a page from the document and state what that page is and what information it would contain, then it IS NOT unstructured.

The ability to process agreements and contracts is limited to very concrete scenarios where the contract variants are known, which essentially means they are no longer unstructured. In general, the ability to process unstructured documents does not exist. Now let's explore the difference between semi-structured and fixed.

It's actually very easy: 80% of the documents that exist are semi-structured. The fact that a field appears in the same general location on every page of a particular type does not make it fixed. For example, a tax form always has the same general location to print the company name. The printer has to print within a specified range, but it can print more to the left or more to the top, and the length will vary with every input name. This makes it semi-structured. Additionally, when this document is scanned it will shift left, right, up, and down by small amounts. A document is ONLY truly a fixed form when it has registration marks and fields of fixed location and length. Registration marks are how the software matches every image to the same set of coordinates, making it more or less identical to the template.
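To make the role of registration marks concrete, here is a simplified sketch in Python. It assumes the marks have already been detected on the scan and only corrects a plain shift; actual products presumably also handle skew and scale. All function names and coordinates are hypothetical.

```python
def mean_offset(template_marks, scanned_marks):
    """Average (dx, dy) shift between template and scanned registration marks."""
    n = len(template_marks)
    dx = sum(s[0] - t[0] for t, s in zip(template_marks, scanned_marks)) / n
    dy = sum(s[1] - t[1] for t, s in zip(template_marks, scanned_marks)) / n
    return dx, dy


def locate_field(template_box, offset):
    """Shift a template field box (x, y, width, height) by the scan offset."""
    x, y, width, height = template_box
    dx, dy = offset
    return (x + dx, y + dy, width, height)


# Hypothetical mark positions in mm: the scan is shifted 2 right and 3 down.
template_marks = [(10, 10), (200, 10), (10, 280), (200, 280)]
scanned_marks = [(12, 13), (202, 13), (12, 283), (202, 283)]

offset = mean_offset(template_marks, scanned_marks)
print(locate_field((50, 40, 80, 5), offset))
```

Without the marks there is nothing reliable to measure the shift against, which is why a form without them cannot be treated as fixed no matter how regular it looks.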

There, again, the confusion is exposed. When having conversations about data capture, it's very important to understand the true definitions of the lingo being used. I task you: if you catch someone using the lingo incorrectly, correct it. It will help both you and them.

posted by Chris Riley at 0 Comments

Friday, September 18, 2009

When you got it design it – Form Design

Companies using Data Capture technology do not often have the chance to change their form designs or even create new ones. If you have this ability, USE IT! A properly designed form is the first step to success in automating that form. There are many things you can do to make sure your form is as machine readable as possible. Typically the forms we are talking about are hand-written, but occasionally they are also machine filled. I will highlight the major points.

1. Corner stones. Make sure your form has a corner stone in each corner of the page. The corner stones should be at 90 degree angles to their neighbors, and the ideal type is a black 5 mm square.

2. Form title. Use a clear title in 24 pt or larger print, with no stylized font.

3. Completion guide. This is optional, but it is sometimes useful to print a guide at the top of the form on how best to fill in the type of fields you use.

4. Mono-spaced fields. For the fields to be completed, it's best to use field types with character-by-character separation. Each character block should be 4 mm x 5 mm, and blocks should be separated by 2 mm or more. The best types of fields to use, in order, are letters separated by a dotted frame, letters separated by a drop-out color frame, and letters separated by complete square frames.

5. Segmented fields by data type. For certain fields it is important to segment the field into portions to enhance ICR accuracy. The best example is a date: instead of having one field for the complete date, split it into 3 separate parts, the first being a month field, the next a day field, and finally a year field. The same is often done for numbers, codes, and phone numbers.

6. Separate fields. Separate each field by 3 mm or more.

7. Consistent fields. Make sure the form consistently uses the field types described in point 4.

8. Form breaks. It's OK to break the form up into sections and separate those sections with solid lines. This often helps template matching.

9. Placement of field text. For the text that indicates what a field is, such as “first name” or “last name”, it is best to left-justify it to the left of the field at a distance of 5 mm or more. DO NOT put the field descriptor in drop-out in the field itself.

10. Barcode. Barcode form identifiers are useful in form identification. Use a unique ID per form page and place the barcode at the bottom of the page at least 10 mm from any field.
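If you keep a form's layout measurements somewhere, the numeric points above are easy to check automatically. The following is only a sketch in Python: the thresholds come from the list, but the dictionary keys and the `check_form` function are hypothetical names, not part of any data capture product.

```python
def check_form(spec):
    """Return a list of guideline violations for a form layout.

    `spec` uses hypothetical keys; distances are in mm, the title in points.
    """
    problems = []
    if spec["corner_stones"] < 4:
        problems.append("add a corner stone to each corner")
    if spec["title_pt"] < 24:
        problems.append("title should be 24 pt or larger")
    if spec["char_gap_mm"] < 2:
        problems.append("separate character blocks by 2 mm or more")
    if spec["field_gap_mm"] < 3:
        problems.append("separate fields by 3 mm or more")
    if spec["label_gap_mm"] < 5:
        problems.append("place field labels 5 mm or more from the field")
    if spec["barcode_gap_mm"] < 10:
        problems.append("keep the barcode at least 10 mm from any field")
    return problems


# A made-up draft layout with a small title and fields that are too close.
draft = {
    "corner_stones": 4,
    "title_pt": 18,
    "char_gap_mm": 2,
    "field_gap_mm": 2.5,
    "label_gap_mm": 5,
    "barcode_gap_mm": 12,
}
for problem in check_form(draft):
    print(problem)
```

A check like this only covers the measurable points; items such as the completion guide or consistent field types still need a human eye.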

posted by Chris Riley at 0 Comments

Wednesday, September 16, 2009

If it's not semi-structured why fix it – know your form's class?

There are two major classes of Data Capture technology: fixed and semi-structured. When processing a form, it's critical that the right class is chosen. To complicate things, there is a population of forms out there that can be automated with either, but there is always a definite benefit of one over the other. In my experience, organizations have a very hard time figuring out whether their form is fixed or not. The most common misdiagnosis comes from forms where the fields are in the same location and each has an allotted white space for data to be entered. To most people this seems fixed, but in actuality it's not. Text in these boxes can move around substantially. Additionally, the boxes themselves, while in the same location relative to each other, can move because of copying, variations in printing, etc. There are two very easy steps to determine if your form is fixed or not.

1.) Does your form have corner stones? Corner stones, sometimes referred to as registration marks (registration marks have been known to replace corner stones when they are very clearly defined), are printed objects, usually squares, in each corner of the form. They must all be at 90 degree angles from their neighbors. Corner stones allow the software to match the scanned or input document to the original template, theoretically lining up all fields and all static elements on the form and removing any shifts, skews, etc.

2.) Does your form have pre-defined fields? A pre-defined field is more than a location on the form: it has a set width, height, location, and, most importantly, a set number of characters. You most commonly know these fields from filling out a form that has a box for each letter. There are variations in how the characters are separated, but they all share these attributes. This is called mono-spaced text.

If your form does not have both of the above items, it is not a fixed form. This would indicate that a semi-structured forms processing technology is the best fit. For those forms that are commonly confused for fixed, there are ways to make them process well with a fixed form solution by isolating the input type (fax, email, scan) and using the proper arrangement of registration marks.
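The two-step test boils down to a simple rule, sketched here in Python. The `FormTraits` class and `form_class` function are hypothetical names used only to illustrate the decision; answering the two questions for a real form still takes a look at the form itself.

```python
from dataclasses import dataclass


@dataclass
class FormTraits:
    has_corner_stones: bool      # step 1: marks in each corner at 90 degree angles
    has_predefined_fields: bool  # step 2: set width, height, location, character count


def form_class(traits):
    """Apply the two-step test: a form is fixed only if both answers are yes."""
    if traits.has_corner_stones and traits.has_predefined_fields:
        return "fixed"
    return "semi-structured"


# A form with boxes in consistent places but no corner stones and no
# per-character fields: commonly mistaken for fixed.
lookalike = FormTraits(has_corner_stones=False, has_predefined_fields=False)
print(form_class(lookalike))
```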

posted by Chris Riley at 0 Comments