Thursday, February 4, 2010

Document Conversion and Law

I had the pleasure of attending LegalTech 2010 this year in New York. I was quite impressed with the show, and especially with how engaged the attendees were. Where do document conversion and compression technologies fit in the legal space? Here is a brief review of how these technologies are used in this vertical market.

File security:

Let's start with the most popular buzzword: PDF. PDF is the most popular file format in legal because it can be secured and, with the right compression tools, kept very small. The value of security is fairly obvious; the value of compression less so. Because many legal case management platforms, eDiscovery engines, and content management systems bill by the megabyte of storage, keeping files small but usable is critical. The trend is toward fewer installed products and more hosted ones. Hosted products usually carry a monthly service fee plus a charge based on the amount of storage used. Keeping content valuable but small therefore becomes a real concern, especially given the hundreds of thousands of pages a case might contain.
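To make the per-megabyte billing point concrete, here is a rough sketch of the arithmetic. Every figure in it (page count, kilobytes per page, price per megabyte) is a made-up assumption for illustration, not a quote from any vendor.

```python
def monthly_storage_cost(pages, kb_per_page, price_per_mb):
    """Estimate the monthly storage charge for a case billed per megabyte."""
    total_mb = pages * kb_per_page / 1024
    return total_mb * price_per_mb


# Hypothetical figures, for illustration only.
pages = 200_000
uncompressed = monthly_storage_cost(pages, kb_per_page=500, price_per_mb=0.01)
compressed = monthly_storage_cost(pages, kb_per_page=50, price_per_mb=0.01)

print(f"uncompressed: ${uncompressed:,.2f} per month")
print(f"compressed:   ${compressed:,.2f} per month")
```

Under these assumptions, a tenfold reduction in page size is a tenfold reduction in the storage bill, every month the case stays hosted.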

Search-ability:

Lawyers work with a lot of paper, and getting at the right information is tough. That is why, before a paper document can be loaded into any case management or eDiscovery system, it must be OCRed and made searchable. Good OCR is essential, as is the ability to convert documents quickly. Without OCR, eDiscovery simply cannot work on paper. Surprisingly, OCR was a common afterthought, yet poor OCR was a major complaint about products. Some organizations simply put paper or image files aside, risking the loss of valuable information. Others did not concern themselves with OCR accuracy and just assumed it was good enough. Both are mistakes, and I hope a dying trend in this market, because these organizations are only hurting themselves. Garbage in, garbage out.

Translation:

The number of translation companies at the show was large. Why? Because lawsuits very often involve a large collection of documents written in several languages. For eDiscovery to work well, the data must be normalized, i.e. translated. The first challenge is finding the languages: it is a tremendous effort to go through a large collection of documents and identify each page on which a particular language occurs. The second, for paper documents, is getting the data into a digital format so that manual or software-based translation can occur. OCR can help with both. It converts paper to digital text, and language detection happens internally in the OCR engine during recognition. Again, as with search, the quality of the OCR is imperative, so law firms have every right to be concerned about which OCR engine they or their translation company use.
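As a small sketch of the "find the languages" step: assume the OCR engine reports, for each page, the languages it detected. The `(page_number, languages)` tuples below are a hypothetical output shape, not the format of any particular engine. Building an index of which pages contain which language is then straightforward.

```python
from collections import defaultdict


def pages_by_language(ocr_pages):
    """Map each detected language to the list of pages it appears on."""
    index = defaultdict(list)
    for page_number, languages in ocr_pages:
        for language in languages:
            index[language].append(page_number)
    return dict(index)


# Hypothetical per-page OCR results: (page number, languages detected).
ocr_pages = [
    (1, ["English"]),
    (2, ["English", "German"]),
    (3, ["Japanese"]),
]

index = pages_by_language(ocr_pages)
for language, page_numbers in index.items():
    print(language, page_numbers)
```

An index like this is what lets a translation team pull only the pages that need work in a given language.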

If you did not attend, I recommend you keep it on your radar for next year, or look at the West Coast version. While document conversion is not the favorite topic in legal, it finds its way into each step of case management, eDiscovery, and billing.

posted by Chris Riley at 0 Comments

Wednesday, January 27, 2010

Thoughts on the iPad. Why it's more than a tablet

Even when faced with one of the most blatant rumor leaks ever, Apple still managed to impress the world today with its announcement of the iPad. Minutes after the announcement, both Mashable and Engadget slowed to a crawl under excessive traffic. I too was overtaken by my familiar Apple-announcement excitement, continually refreshing every source I could cling to. But getting past the hype, what does it mean? How will the iPad affect the digital world? Here are my thoughts:

We very often focus on the device that Apple releases: its specs, its artistic shell. What is overlooked is that Apple historically wins favor based on quality and user experience more than on sex appeal. People switched to the Mac not only because it was sexy and new, but because it was easy to use and did not have the problems of a common PC. From the very beginning, Steve focused on tightly controlling the experience of each and every user. The years of Steve's absence were the first time I ever felt frustrated with an Apple machine. Magically, Steve and the Apple team have tightly controlled the user experience of each and every device they have released, and the iPad will be no different.

Tablets existed before the iPad, many of them more powerful and functional. So what will make the iPad win, assuming it does? The iPad extends a concept that started with the iPhone and is now propagating further: the App Store. I don't know about you, but it's hard for me to avoid conversations about new apps in the App Store, the best apps, and, jokingly, apps that can do everything, even brush my teeth. Now, with the iPad, we won't just have fun, cute, and sometimes functional mobile apps; we will have more robust productivity apps.

This software distribution method is truly innovative, though I would say Apple was not the first to use it. My first flirtation with this method of purchasing, distributing, and maintaining software was Steam, created, I believe, by Valve in the video game industry. Steam was a way to manage all the games you purchased from Valve, their updates, and the games of Valve's partners. It was awesome: no CDs, no worrying about updates or about where and on which machine games were installed. You received news on new games and on games you would likely enjoy. The iPad, I believe, is going to expand this method beyond entertainment into business and productivity. I've already planned a way to use the iPad to manage profiles and inventory for my recently started boutique winery representation business.

The App Store is just one example of how it's not just the device, it's the experience. One of the lesser-known and lesser-used products from Apple, and arguably one of the most impressive, is the Apple TV. This product too wins on usability: plug it in, enter your WEP password if you have one, and enjoy. Undying pride and perseverance have forced Steve Jobs's style and vision into each and every device.

Some forecasting: as I said, I believe the iPad will find a place in business and productivity. I hope so, at least, but I hoped the same for the Xserve and the partnerships with Oracle. I expect more flavors to surface. I believe the iPad will start to become an eReader device of choice. Expect some lawsuits (Fujitsu?). I believe the ability to drive external devices will become a common request, and Apple will need to make it possible to install device drivers (we experienced this with www.LivingSCAN.com, and it is the reason we did not go with the iPhone). Finally, I expect the launch of the next iPhone to be the biggest yet. Can't wait.

posted by Chris Riley at 2 Comments

Tuesday, January 26, 2010

Attachment Emailing Master

Very often in business, email correspondence is accompanied by a file attachment. While it's possible to attach any file format to an email (some are not preferred by email clients), the most common type is a document, and the most common formats are Word and PDF. This post contains some advice on the best way to deliver documents via email.

When emailing documents, you have to be concerned about size, readability, and security. If the attachment is too large, you may not be able to email it at all. If the document is not readable, there is no point in sending it anyway. Finally, if it's not secure, it might be re-purposed or stolen. When your document starts out in paper form, the challenges increase.

There is an ideal format, and there are ideal conversion settings, for sending documents via email. Ideally, you would scan your document in color for visual readability. That is not the only type of readability; you also want to make sure the document remains accessible for a long time. You would use optical character recognition (OCR) so that the document can be indexed by a search utility. You would use a compression tool to convert the initially large color image into one that is manageable without degrading its quality. Finally, you would use the PDF format to get whatever level of security you choose.

The combination of a searchable, compressed, color PDF is the ideal method for emailing documents as attachments and ensuring their effectiveness and long-term usage.

posted by Chris Riley at 0 Comments

Monday, January 18, 2010

Integrate-ability

There are so many really cool technologies out there, technologies that blow you away with the promise they offer when you see them demoed. Sometimes this awe leads companies and individuals astray: the goal is not only to find a technology but to find one that will deliver in an already established environment. What I'm talking about is the lack of focus organizations put on the way new technology integrates.

Poor integration is the killer of ROI when purchasing enterprise applications. We sometimes forget that new technology must play nicely with existing technology and must be given the chance to reach its demoed potential. It seems obvious, but organizations so often skip planning steps and testing, and end up buying expensive technology that conflicts with the existing environment or is set up incorrectly.

This problem is aggravated in large part by the vendors. Most vendors are not too concerned with smooth integration of their technology; many even want it to be difficult to integrate, because their business thrives on professional services income. Recently I was a panelist for leading enterprise software vendors, where the attendees were mostly marketing staff. It was clear they did not see the importance of integration. Often all it takes for vendors to provide better integration support is more documentation and more samples. This was my first and easiest-to-execute recommendation. They see this as a cost generator, or they simply don't know who in their organization would own the responsibility. The unfortunate aspect of this belief is that they are only hurting themselves. Adoption of technology depends on a good experience beyond the sale. Companies have tightened their budgets all around, and to win deals these technologies need to promote both features and integration.

Because the vendors are a little lost, it's up to the IT departments and knowledge workers who will use the technology to plan the integration properly. Seeing demos is great, but that is exactly the point at which organizations should slow down. Document the current environment. Establish how the new technology will fit within it. Know the application's success factors and the path to reach them. And finally, TRIAL. If a vendor is afraid to give you a trial, that should instantly be a red flag. Either they are not established enough to offer trials, or they are trying to lock you into a professional services, ROI-squashing machine.

Often technology is blamed incorrectly when what really should be blamed is poor integration and planning.

posted by Chris Riley at 3 Comments

Thursday, January 14, 2010

Replacement for fax right under our noses

How does a technology first invented in 1843 and executed in 1924 still exist as a primary function in our working lives? I'm talking about fax. Fax technology is old and outdated. I personally avoid fax simply on principle. But my principle alone will not make big changes in adoption. What people don't understand is that we have a fax replacement right under our noses, one that is green and just as easy to use.

The combination of a document scanner, imaging software, and email software is a complete fax replacement. Instead of typing in phone numbers, users type in email addresses. With fax you double the amount of paper that exists: paper in, paper out. With the document scanning approach you reduce paper consumption: paper in, email out. Most document scanners today even ship with a pre-configured “Scan to Email” option. On a production level, systems can be set up in offices, your local Kinko's, wherever, to allow multiple users to share the same document scanner and scan to any email address with a basic step-by-step wizard.
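For the curious, the "paper in, email out" step is not much code. The sketch below uses Python's standard `email` library to wrap a scanned PDF in a message. The addresses, subject line, and placeholder PDF bytes are assumptions for illustration, and the actual sending step (for example via `smtplib`) is left out because it depends on your mail server.

```python
from email.message import EmailMessage


def build_scan_email(sender, recipient, pdf_bytes, filename="scan.pdf"):
    """Wrap a scanned PDF in an email message addressed to the recipient."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Scanned document"
    msg.set_content("The scanned document is attached.")
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


# Placeholder bytes stand in for the scanner's real PDF output.
msg = build_scan_email(
    "scanner@example.com", "recipient@example.com", b"%PDF-1.4 placeholder"
)
attachment = next(msg.iter_attachments())
print(msg["To"], attachment.get_filename(), attachment.get_content_type())
```

The email address plays the role the phone number plays in fax; everything else is handled by the scanner and the mail system.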

Not only does moving from fax to email save trees, it also increases efficiency and, when combined with workflow, document imaging, OCR, and data capture, adds much greater value to that single piece of paper.

These systems do in fact exist in small corners of the world, and I have participated in developing and setting them up. Adoption is still very low. What it comes down to is fear of change. People understand paper to paper. Many users of fax don't even know what email is. There are two ways this gets solved: time and forced adoption. While I would hope for the second, a campaign to replace all fax machines with scanners, it's very unlikely and would require unity among multiple competing entities.

No, I do not like fax, but I understand it. And I hope that sooner rather than later people see that a fax replacement that saves trees and increases efficiency has existed for many years.

posted by Chris Riley at 1 Comments

Tuesday, January 12, 2010

Dual Stream Scanning – Have your cake and eat it too

The benefit of drop-out forms is that they are very accurate in data capture. The downside is that after they are scanned they aren't much to look at. Companies want the best of both drop-out and black and white forms. They go about this in various ways, the most common being to just live with the images they have. Some scan each document twice, which is very time consuming. Others use an overlay utility that stamps the original form fields and labels back onto an already processed drop-out image. These utilities are accurate, but the result is not as faithful as the original and often has lines stamped over text. The best way to scan a form efficiently so that it is optimal for both data capture and viewing is dual stream scanning.

Dual stream scanning is usually a feature of higher-end scanners, though the technology is slowly moving down to workgroup and desktop scanners. The feature allows a single scan to produce both a drop-out image and a black and white image. The scan speed is the same as if you were scanning in color. Once configured, the drop-out image goes down one path and the black and white image down another. A company can then use the drop-out image only for data capture, while the black and white image is married to the data capture results in the database or file system.
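The two-path routing can be sketched in a few lines. This is only an illustration of the idea, not how any particular scanner or capture product works: the `DualStreamScan` record, the file names, and the `fake_capture` stand-in for a real data capture engine are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DualStreamScan:
    """One pass through the scanner yields two images of the same page."""
    doc_id: str
    dropout_image: str  # path to the drop-out image, used for data capture
    bw_image: str       # path to the black and white image, kept for viewing


def route(scans, capture):
    """Run capture on the drop-out stream and pair results with the viewing image."""
    records = []
    for scan in scans:
        fields = capture(scan.dropout_image)
        records.append(
            {"doc_id": scan.doc_id, "fields": fields, "view_image": scan.bw_image}
        )
    return records


def fake_capture(image_path):
    """Hypothetical stand-in for a data capture engine."""
    return {"source": image_path}


scans = [DualStreamScan("0001", "0001_dropout.tif", "0001_bw.tif")]
records = route(scans, fake_capture)
print(records[0])
```

The shared document ID is what lets the viewable image and the captured data meet again in the database or file system.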

Data capture on a drop-out form is on average 15% more accurate than on a black and white scanned form, and often the gain is much higher. The reason is that OCR in data capture is not disrupted by form lines printed on or too close to text. Additionally, the logic to locate fields can be simplified, as field labels are often in a small font and hard to detect.

It's simple and has the greatest accuracy of any solution. Dual stream is a great tool.

posted by Chris Riley at 0 Comments

Thursday, January 7, 2010

Print to OCR?

When I talk to people about the unusual technique of printing text documents to image just to run optical character recognition (OCR) or data capture on them, they are rightfully confused and think I'm a little nuts.

Why would you ever convert an already digital document back to an image? I promise it's not because I'm so fond of OCR; it actually has its purpose.

Language Detection: By converting a document to image for OCR, I can check the language of each word in the document. While I would much prefer to use a language detection tool on the digital file, no robust tool exists to do this at volume. The unique aspect of OCR engines is that they contain morphology and dictionaries; this is where OCR has improved its accuracy in the past 5 years. OCR engines attempt to identify the language of text in order to read the document better. Because this mechanism is already built into the engines, if I convert a digital file to image and OCR it, I can tell you what languages exist in that document. Additionally, while font is a clear indicator of language, a font that is not accompanied by the proper language encoding will not tell a digital process what the language is. In OCR there is no need for such an encoding.

Normalization of digital formats: While a PDF created in Acrobat and a PDF created in a third-party tool look identical to the viewer, internally these PDF files are very different. To parse a PDF file digitally with accuracy, you need a standard format. Without one, you are dealing with variations in the document both visually and structurally, and this becomes an overwhelming number of variations. For example, a collection of invoices has as many variations as the number of invoice variants multiplied by the number of PDF-generating applications. However, if you OCR the PDF and parse the result instead of parsing it digitally, then you are dealing only with the variants that exist in the invoices themselves.
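The multiplication is easy to see with numbers. The counts below (40 invoice variants, 12 PDF-generating applications) are invented for illustration only.

```python
def variants_to_handle(invoice_variants, pdf_generators, via_ocr):
    """Count the distinct cases a parser must cope with.

    Parsing the PDF internals multiplies invoice variants by generator quirks;
    printing to image and running OCR leaves only the invoice variants.
    """
    if via_ocr:
        return invoice_variants
    return invoice_variants * pdf_generators


# Hypothetical counts, for illustration only.
digital = variants_to_handle(40, 12, via_ocr=False)
ocr = variants_to_handle(40, 12, via_ocr=True)
print(f"digital parsing: {digital} cases, OCR parsing: {ocr} cases")
```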

However crazy it sounds, the two scenarios above are real, and there are many more. I doubt these problems will always exist, but it makes you think twice about dismissing crazy statements such as printing a digital document to image just so you can OCR it.

posted by Chris Riley at 0 Comments