Tuesday, March 2, 2010

Document Conversion and Law

Both CVISION Technologies and I had the pleasure of attending LegalTech 2010 in New York this year. I was quite impressed with the show, and especially with how engaged the attendees were. Where do document conversion and compression technologies fit in the legal space? Here is a brief review of how these technologies are used in this vertical market.

File security:

Let's start with the most popular buzzword: PDF. PDF is the most popular file format in legal because it can be secured and, with the right compression tools, made very small. The security benefit is fairly obvious; the compression benefit less so. Because many legal case management platforms, eDiscovery engines, and plain content management systems are billed by the megabyte, keeping files small but usable is critical. These applications are trending away from installed products toward hosted ones, which usually carry a monthly service fee and charge by the amount of storage used. Preserving the value of the content while keeping it small then becomes a real concern, especially when a case can run to hundreds of thousands of pages.
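
To put the per-megabyte billing in perspective, here is a back-of-the-envelope sketch in Python. The page count, page sizes, and price are hypothetical placeholders chosen for illustration, not figures from any vendor or from the show, and `monthly_storage_cost` is a made-up helper name. The point is only that the bill scales linearly with the average size of a page.

```python
# Back-of-the-envelope estimate of per-megabyte hosting charges.
# All figures below are hypothetical placeholders, not real vendor prices.

KB_PER_MB = 1024


def monthly_storage_cost(pages: int, kb_per_page: float, price_per_mb: float) -> float:
    """Return the monthly charge for storing `pages` pages at `kb_per_page` KB each."""
    total_mb = pages * kb_per_page / KB_PER_MB
    return total_mb * price_per_mb


if __name__ == "__main__":
    pages = 300_000   # hypothetical case size ("hundreds of thousands of pages")
    price = 0.05      # hypothetical price per MB per month
    for label, kb in [("lightly compressed PDF", 100), ("well-compressed PDF", 20)]:
        cost = monthly_storage_cost(pages, kb, price)
        print(f"{label}: {pages * kb / KB_PER_MB:,.0f} MB -> ${cost:,.2f}/month")
```

Because the cost is linear in page size, cutting the average page from 100 KB to 20 KB cuts the storage portion of the bill by the same factor of five, whatever the actual price per megabyte turns out to be.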

Searchability:

Lawyers work with a lot of paper, and getting at the right information is tough. That is why a document must be OCRed and made searchable before it can be loaded into any case management or eDiscovery system. Good OCR is essential, as is the ability to convert documents quickly. Without OCR, eDiscovery simply cannot work on paper. Surprisingly, OCR was commonly an afterthought, yet poor OCR was a major complaint about products. Some organizations simply set paper or image files aside, risking the loss of valuable information. Others did not concern themselves with OCR accuracy and just assumed it was good enough. Both are mistakes, and I hope they are a dying trend in this market, because these organizations are only hurting themselves. Garbage in, garbage out.

Translation:

The number of translation companies at the show was large. Why? Because lawsuits very often involve a large collection of documents written in several languages. For eDiscovery to work well, the data must be normalized, i.e. translated. The first challenge is finding the languages: it is a tremendous effort to go through a large collection of documents and identify every page on which a particular language occurs. The second is getting the data from paper documents into a digital format so that manual or software-based translation can occur. OCR can facilitate both: it converts paper to digital, and language detection happens internally in the OCR engine during recognition. Again, as above, the quality of the OCR is imperative, so law firms have every right to be concerned about which OCR engine they or their translation company uses.
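
To make the page-finding problem concrete, here is a minimal sketch of a first step a tool might take once OCR has produced text for each page: tag every page with its dominant writing system and group the page numbers by it. It uses only Python's standard `unicodedata` module, and the function names are hypothetical. Note that it detects scripts (Latin, Cyrillic, CJK), not languages, so English and Spanish pages both come back as LATIN; real language identification, such as the detection built into OCR engines, goes much further.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most common script among the letters in `text`, e.g. 'LATIN'.

    The script is taken from the first word of each letter's Unicode name
    ('CYRILLIC SMALL LETTER DE' -> 'CYRILLIC'); this is a script check, not
    language identification.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def pages_by_script(pages: list[str]) -> dict[str, list[int]]:
    """Map each script to the 1-based page numbers on which it dominates."""
    groups: dict[str, list[int]] = {}
    for number, text in enumerate(pages, start=1):
        groups.setdefault(dominant_script(text), []).append(number)
    return groups
```

Fed the OCR text of a mixed production, this gives a rough routing list (which pages go to which translator) that a person or a proper language-ID step can then refine.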

If you did not attend, I recommend keeping it on your radar for next year, or for the West Coast version. While document conversion is not the favorite topic in legal, it finds its way into every step of case management, eDiscovery, and billing.

posted by Chris Riley

Tuesday, February 23, 2010

Translating images

Text translation services come in a variety of forms, from individuals who make a good living translating documents from one language to another, to large firms that employ many translators or rely purely on software. Whatever the form, they all face a challenge when the text they need to translate is on physical paper or in an image file.

Today, translation is facilitated by word processing systems. Word processors make the translator more efficient and let them manage the translation process over many sessions. But to use a word processor's capabilities, the text must first be in digital form. That is where Optical Character Recognition (OCR) comes in. OCR is one of the greatest tools in a translator's bag of tricks: it converts image files and physical paper to digital text that can be consumed and translated.

The great thing about modern OCR is the sheer number of languages it supports. Not only can OCR convert a document in one language to digital text, but even when a document contains multiple languages, it is smart enough to know where one language ends and the next begins. If you imagine the risk to a translator who receives text full of OCR errors, you will see why making sure documents are scanned at optimum quality is an important consideration. Modern OCR engines tell the operator exactly where any confusion might have occurred and give them the opportunity to correct it. Documents scanned in black and white at 300 DPI and saved as TIFF Group 4 will excel.
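
The 300 DPI, black-and-white, Group 4 guideline is easy to check before files go to the OCR engine or the translator. Below is a minimal sketch, using only Python's standard library, that reads the first image directory of a classic TIFF file and tests the Compression (259), BitsPerSample (258), SamplesPerPixel (277), XResolution/YResolution (282/283), and ResolutionUnit (296) tags defined in the TIFF 6.0 specification. The function names are hypothetical, it decodes only single-valued tags, and treating "at least 300 DPI" as a pass is my assumption; it is not how any particular OCR product validates its input.

```python
import struct

# Tag numbers and values from the TIFF 6.0 specification.
BITS_PER_SAMPLE = 258    # 1 = bilevel (black and white); default 1 if absent
COMPRESSION = 259        # 4 = CCITT T.6, i.e. "Group 4"
SAMPLES_PER_PIXEL = 277  # 1 for black-and-white scans; default 1 if absent
X_RESOLUTION = 282       # RATIONAL: pixels per ResolutionUnit
Y_RESOLUTION = 283       # RATIONAL: pixels per ResolutionUnit
RESOLUTION_UNIT = 296    # 1 = none, 2 = inch (default), 3 = centimeter

SHORT, LONG, RATIONAL = 3, 4, 5  # the TIFF field types this sketch decodes


def read_first_ifd(data: bytes) -> dict:
    """Decode the single-valued tags of the first image in a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file (bad byte-order mark)")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, ftype, n, raw = struct.unpack(order + "HHI4s", data[start:start + 12])
        if n != 1:
            continue  # multi-valued tags are skipped in this sketch
        if ftype == SHORT:
            tags[tag] = struct.unpack(order + "H", raw[:2])[0]  # left-justified
        elif ftype == LONG:
            tags[tag] = struct.unpack(order + "I", raw)[0]
        elif ftype == RATIONAL:
            (offset,) = struct.unpack(order + "I", raw)  # value stored elsewhere
            num, den = struct.unpack(order + "II", data[offset:offset + 8])
            tags[tag] = num / den if den else 0.0
    return tags


def meets_scan_guideline(data: bytes, min_dpi: float = 300) -> bool:
    """True if the first page is black and white, Group 4, and >= min_dpi."""
    tags = read_first_ifd(data)
    if tags.get(COMPRESSION) != 4:
        return False
    if tags.get(BITS_PER_SAMPLE, 1) != 1 or tags.get(SAMPLES_PER_PIXEL, 1) != 1:
        return False
    unit = tags.get(RESOLUTION_UNIT, 2)
    if unit == 2:
        scale = 1.0    # already pixels per inch
    elif unit == 3:
        scale = 2.54   # pixels per centimeter -> pixels per inch
    else:
        return False   # no absolute resolution recorded
    x_dpi = tags.get(X_RESOLUTION, 0) * scale
    y_dpi = tags.get(Y_RESOLUTION, 0) * scale
    return x_dpi >= min_dpi and y_dpi >= min_dpi
```

In practice you would read each file's bytes (for a hypothetical `scan.tif`, `open("scan.tif", "rb").read()`) and set aside anything that fails for rescanning before it ever reaches the OCR engine.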

Without OCR, a translator's job becomes more of a data-entry task than what they are truly skilled at, which is translation.

posted by Chris Riley

Thursday, February 4, 2010

Document Conversion and Law

I had the pleasure of attending LegalTech 2010 in New York this year. I was quite impressed with the show, and especially with how engaged the attendees were. Where do document conversion and compression technologies fit in the legal space? Here is a brief review of how these technologies are used in this vertical market.

File security:

Let's start with the most popular buzzword: PDF. PDF is the most popular file format in legal because it can be secured and, with the right compression tools, made very small. The security benefit is fairly obvious; the compression benefit less so. Because many legal case management platforms, eDiscovery engines, and plain content management systems are billed by the megabyte, keeping files small but usable is critical. These applications are trending away from installed products toward hosted ones, which usually carry a monthly service fee and charge by the amount of storage used. Preserving the value of the content while keeping it small then becomes a real concern, especially when a case can run to hundreds of thousands of pages.

Searchability:

Lawyers work with a lot of paper, and getting at the right information is tough. That is why a document must be OCRed and made searchable before it can be loaded into any case management or eDiscovery system. Good OCR is essential, as is the ability to convert documents quickly. Without OCR, eDiscovery simply cannot work on paper. Surprisingly, OCR was commonly an afterthought, yet poor OCR was a major complaint about products. Some organizations simply set paper or image files aside, risking the loss of valuable information. Others did not concern themselves with OCR accuracy and just assumed it was good enough. Both are mistakes, and I hope they are a dying trend in this market, because these organizations are only hurting themselves. Garbage in, garbage out.

Translation:

The number of translation companies at the show was large. Why? Because lawsuits very often involve a large collection of documents written in several languages. For eDiscovery to work well, the data must be normalized, i.e. translated. The first challenge is finding the languages: it is a tremendous effort to go through a large collection of documents and identify every page on which a particular language occurs. The second is getting the data from paper documents into a digital format so that manual or software-based translation can occur. OCR can facilitate both: it converts paper to digital, and language detection happens internally in the OCR engine during recognition. Again, as above, the quality of the OCR is imperative, so law firms have every right to be concerned about which OCR engine they or their translation company uses.

If you did not attend, I recommend keeping it on your radar for next year, or for the West Coast version. While document conversion is not the favorite topic in legal, it finds its way into every step of case management, eDiscovery, and billing.

posted by Chris Riley