Tuesday, March 2, 2010

Document Conversion and Law

Both CVISION Technologies and I had the pleasure of attending LegalTech 2010 this year in New York. I was quite impressed with the show, and especially with how engaged the attendees were. Where do document conversion and compression technologies fit in the legal space? Here is a brief review of how the technologies are used in this vertical market.

File security:

Let's start with the most popular buzzword: PDF. PDF is the most popular file format in legal because it can be secured and, with the right compression tools, made very small. The value of security is fairly obvious, but the value of compression is less so. Because many legal case management platforms, eDiscovery engines, and content management systems bill by the megabyte of storage, keeping files small but usable is critical. The trend is for fewer of these applications to be installed products and for most to be hosted. The hosted products usually have a monthly service fee and charge by the amount of storage used. Keeping files small without losing the value of their content then becomes a real concern, especially when dealing with the hundreds of thousands of pages a case might contain.
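
To make the storage argument concrete, here is a rough back-of-the-envelope sketch in Python. The page count, the per-page sizes, and the per-megabyte rate are all hypothetical numbers I picked for illustration; they are not figures from any vendor or product.

```python
def monthly_storage_cost(pages, kb_per_page, dollars_per_mb):
    """Estimate the monthly hosting charge for a case billed per megabyte."""
    total_mb = pages * kb_per_page / 1024
    return total_mb * dollars_per_mb

# Hypothetical inputs: 200,000 scanned pages, 50 KB per page before
# compression, 10 KB per page after, billed at $0.02 per MB per month.
PAGES = 200_000
RATE = 0.02

before = monthly_storage_cost(PAGES, kb_per_page=50, dollars_per_mb=RATE)
after = monthly_storage_cost(PAGES, kb_per_page=10, dollars_per_mb=RATE)

print(f"before compression: ${before:,.2f} per month")
print(f"after compression:  ${after:,.2f} per month")
```

Whatever the real rates are, the charge scales linearly with file size, so any reduction in bytes per page carries straight through to the monthly bill.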

Search-ability:

Lawyers work with a lot of paper, and getting at the right information is tough. That is why, before a document can be loaded into any case management or eDiscovery system, it must be OCRed and made searchable. Good OCR is essential, as is the ability to get the documents converted quickly. Without OCR, eDiscovery simply cannot work on paper. Surprisingly, OCR was commonly an afterthought, yet poor OCR was a large complaint about products. Some organizations simply put the paper or image files aside, risking the loss of valuable information. Others did not concern themselves with OCR accuracy and just assumed it was good enough. Both are mistakes, and I hope a dying trend in this market, as these organizations are only hurting themselves. Garbage in, garbage out.
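
One way to stop assuming the OCR is "good enough" is to measure it. The sketch below computes a character error rate by comparing OCR output against a hand-typed reference for a sample page. It is a minimal illustration in plain Python, not the method any particular OCR product uses, and the sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def character_error_rate(ocr_text, reference_text):
    """Edits needed per reference character; 0.0 means a perfect match."""
    if not reference_text:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference_text) / len(reference_text)

# Made-up sample: the OCR confused a lowercase l with I and with the digit 1.
reference = "privileged and confidential"
ocr_output = "priviIeged and confidentia1"

rate = character_error_rate(ocr_output, reference)
print(f"character error rate: {rate:.3f}")
```

Even two wrong characters are enough to make a keyword search for "privileged" miss this page, which is exactly the garbage in, garbage out problem.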

Translation:

The number of translation companies at the show was large. Why? Because lawsuits very often involve a large collection of documents written in several languages. For eDiscovery to work well, the data must be normalized, i.e. translated. The first challenge is to find the languages. It is a tremendous effort to go through a large collection of documents and identify each page on which a particular language occurs. The second challenge, for paper documents, is getting the data into a digital format so that manual or software-based translation can occur. OCR can help with both: it converts paper to digital text, and language detection happens internally in the OCR engine during recognition. Again, as with search-ability, the quality of the OCR is imperative, so law firms have every right to be concerned about which OCR engine they or their translation company use.
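
To illustrate the "find the languages" step, here is a deliberately naive sketch that groups pages of OCR text by language using short stopword lists. It is not how an OCR engine detects language internally; the word lists, function names, and sample pages are all assumptions made for this example.

```python
import re

# Tiny illustrative stopword lists (assumptions for this sketch only).
STOPWORDS = {
    "english": {"the", "and", "of", "to", "is", "in", "that", "for"},
    "spanish": {"el", "la", "de", "que", "y", "en", "los", "por"},
    "french": {"le", "la", "de", "et", "les", "des", "est", "pour"},
}

def guess_language(text):
    """Return the language whose stopwords appear most often, or None."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in stops for w in words)
              for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def pages_by_language(pages):
    """Group 1-based page numbers by guessed language."""
    groups = {}
    for number, text in enumerate(pages, start=1):
        groups.setdefault(guess_language(text), []).append(number)
    return groups

# Made-up OCR text for three pages of a collection.
pages = [
    "The parties agree to the terms of the contract.",
    "El contrato es válido en los términos de la ley.",
    "Le contrat est signé par les parties et le témoin.",
]
print(pages_by_language(pages))
```

The point is only that once pages are digital text, sorting them by language becomes a cheap batch job instead of a manual page-by-page review. If the OCR text is poor, the word counts are poor too, and pages get routed to the wrong translator.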

If you did not attend, I recommend you keep LegalTech on your radar for next year, or for the West Coast version. While document conversion is not the favorite topic in legal, it finds its way into each step of case management, eDiscovery, and billing.

posted by Chris Riley

Tuesday, December 29, 2009

Rich Media OCR

I often speak of unique uses of OCR, and here is yet another: OCRing video files! But why? Part of managing rich media assets is indexing the files. Technologies such as speech recognition and optical character recognition give rich media greater index and search value.

OCR technology can find and extract text from video frames so that it can be stored as metadata. In the simplest scenario, this is a text file that accompanies the video file. More complex environments will even tell you the minute and second at which the text occurs. Because this is not a traditional use of the technology, some special considerations apply.
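
As a rough sketch of the metadata idea, the Python below turns per-frame OCR text into a small sidecar listing with the minute and second of each hit. The frame rate, the `ocr_results` mapping, and the function names are hypothetical; a real system would get the text from whatever OCR engine it uses.

```python
import json

def frame_timestamp(frame_index, fps):
    """Convert a 0-based frame index to a (minutes, seconds) pair."""
    total_seconds = int(frame_index / fps)
    return divmod(total_seconds, 60)

def build_sidecar(frame_texts, fps):
    """Build metadata entries for the frames where OCR found any text.

    frame_texts maps frame index -> recognized text (hypothetical OCR output).
    """
    entries = []
    for index in sorted(frame_texts):
        text = frame_texts[index].strip()
        if not text:
            continue  # nothing recognized on this frame
        minutes, seconds = frame_timestamp(index, fps)
        entries.append({"time": f"{minutes:02d}:{seconds:02d}", "text": text})
    return entries

# Hypothetical OCR results for a 30 frames-per-second video.
ocr_results = {0: "", 450: "BREAKING NEWS", 2250: "Closing arguments begin"}
sidecar = build_sidecar(ocr_results, fps=30)
print(json.dumps(sidecar, indent=2))
```

Writing that JSON next to the video file gives a search tool something to index, and the timestamps let a reviewer jump straight to the frame in question.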

The first challenge is converting and separating the frames into individual image files. For OCR to be effective, it needs to work on a series of images. Although a video is just a sequence of images displayed at a high rate of speed, it is still somewhat of a challenge to convert video files such as MPEG into a series of images. On top of that, motion blur in some frames will also be a problem.

The second challenge is dealing with repeated frames. Because so many consecutive frames differ only slightly from each other, the text across a series of frames might not change. Better OCR results will account for this and report the text once rather than repeating it for every frame.
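
Collapsing repeated text can be sketched in a few lines of Python. This version merges consecutive frames whose text matches after normalizing case and whitespace. It assumes the OCR output is already available as (frame index, text) pairs, and the names are my own for illustration.

```python
def normalize(text):
    """Collapse case and whitespace so trivial OCR jitter compares equal."""
    return " ".join(text.lower().split())

def collapse_repeats(frame_texts):
    """Merge runs of consecutive frames that carry the same text.

    frame_texts is a list of (frame_index, text) pairs in playback order.
    Returns one (first_frame, last_frame, text) tuple per distinct run.
    """
    runs = []
    current_key = None
    for index, text in frame_texts:
        key = normalize(text)
        if key and key == current_key:
            first, _, shown = runs[-1]
            runs[-1] = (first, index, shown)  # extend the current run
        elif key:
            runs.append((index, index, text.strip()))  # start a new run
        current_key = key  # an empty frame ends the current run
    return runs

# Made-up OCR output for six consecutive frames.
frames = [
    (0, "EXHIBIT A"), (1, "EXHIBIT A"), (2, "Exhibit  A"),
    (3, ""), (4, "EXHIBIT B"), (5, "EXHIBIT B"),
]
print(collapse_repeats(frames))
```

Exact matching after normalization will still split a run when the engine misreads a single character on one frame, so a fuzzier comparison may be needed in practice.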

The final challenge is dealing with the variety of fonts and the often small text sizes. This requires an OCR engine with settings for specialized OCR, and one that is very accurate on complex, low-quality documents.

I expect that in the future, this technique in conjunction with speech recognition will be used in eDiscovery, content management, and robust search of rich media files.

posted by Chris Riley

Tuesday, November 24, 2009

Hidden value propositions

Document automation has its obvious value: decreasing the cost of running a paper-based business. But there are other areas in which to gain value from document automation technology, not always monetary, that some companies are finding to be even more important.

Improved Employee Efficiency:
Very often organizations have salaried employees whose job description is not data entry but who are doing data-entry work anyway. It is often overlooked because the data-entry portion of their job is ad hoc and not high in volume. When the data-entry task is taken off these employees' plates, they can dedicate more time to their actual job responsibilities and become more efficient. This is most often seen with accounts payable clerks. The value is that the salary paid to these employees now goes toward tasks that require more critical thinking.

eDiscovery Ready:
No one wants to be involved in a lawsuit, but they happen. When they do, being ready is critical. Courts will expect you to produce data that is accurate, and to produce it quickly. You will want to be able to produce that data at a low cost, and to produce no more information than you have to. Document automation and OCR technologies are a critical part of this. Paper in file cabinets is very costly to review and collect for a case, but OCRed documents that are properly filed are easy to search and retrieve. This makes you eDiscovery ready at a lower cost and with greater efficiency. The value is reduced cost and risk if and when a lawsuit happens.

Compliance:
Regulation can come from government or industry. Companies that are not compliant risk penalties or worse. Document automation technologies help companies become compliant faster and more accurately. Instead of keeping a large staff to manage compliance, they can dedicate computer time to the data-entry work on documents that carry compliance risk, and keep a small staff to oversee the process. Similar to eDiscovery, this preparedness mitigates the potential risk and cost of not being ready.

Reduced Workers' Comp Claims:
A less commonly considered value of document automation is a reduction in workers' compensation claims associated with document handling and data entry. Companies with large data-entry staffs can dramatically reduce claims for conditions associated with data entry, such as carpal tunnel, back pain, and eye strain. The staff's duties will shift to more body-friendly activities that are less repetitive. Every year companies spend a lot of money on workers' compensation claims and their administration. Document automation reduces that cost and risk.

As you can see, there are many areas where document automation can help companies, and there are even a few more that are industry specific. Often companies find that ROI is secondary to one of the above benefits of document automation.

posted by Chris Riley