Wednesday, December 9, 2009

OCR makes old systems new

One of the biggest challenges in the IT space is migrating from legacy systems, often mainframes, to modern operating systems and applications. Legacy systems still exist today in the form of classic green-screen UNIX systems, and their life has been extended due to the critical nature of the data they contain. Modern standards have been put in place in the hope of avoiding this problem in the future. However, the applications that most need to conform to standards, such as hospital medical records systems, airline systems, and government systems, still do not conform to any. The vendors who make these systems have every intention of making them very hard to migrate away from. But there is a way, and it works very well: OCR.

You may have seen a previous post where I alluded to the possibility of using OCR to scrape screenshots. Migration is one of the best real-world examples of why the technology is so useful. When you don't have XML, ODBC, or any of the other great standards that allow data to be exchanged from one system to another, you always have what you can see, and if you can see it, you can OCR it. If you can view the data on the screen, you can move it to a new system.

Using OCR, either programmatically or manually, to read the portions of a screen where the legacy system displays data, copy that data to memory, and paste it into the new system is one of the most ingenious ways to ensure the neutrality of your data. Vendor lock-in attempts and old technology should not prevent you from getting to what you own: the information.
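The programmatic route can be sketched roughly as follows. This is a minimal sketch that assumes the OCR step has already turned the screenshot into lines of text. Green-screen terminals draw on a fixed character grid, so each field can be sliced out by row and column. The `Field` layout, the field names, and the sample screen are all hypothetical, made up for illustration rather than taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Position of one value on a fixed-width terminal screen (hypothetical layout)."""
    row: int    # 0-based line number on the screen
    start: int  # first column of the value
    end: int    # column just past the value


# Hypothetical layout for a patient-record screen; a real one would be
# measured from the legacy application's actual display.
LAYOUT = {
    "patient_id": Field(row=1, start=12, end=20),
    "last_name": Field(row=2, start=12, end=32),
    "first_name": Field(row=3, start=12, end=32),
}

# Stand-in for text returned by an OCR engine from a screenshot.
SAMPLE_SCREEN = "\n".join([
    "PATIENT INQUIRY",
    "PATIENT ID: 00012345",
    "LAST NAME : DOE",
    "FIRST NAME: JANE",
])


def extract_fields(ocr_text: str, layout: dict) -> dict:
    """Slice each field out of the OCR'd screen text and strip the padding."""
    lines = ocr_text.splitlines()
    record = {}
    for name, f in layout.items():
        line = lines[f.row] if f.row < len(lines) else ""
        # Short or missing lines yield an empty string instead of an error.
        record[name] = line[f.start:f.end].strip()
    return record


if __name__ == "__main__":
    print(extract_fields(SAMPLE_SCREEN, LAYOUT))
```

The resulting dictionary is what would then be pasted, or fed through whatever input the new system accepts, one record per captured screen.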

Whether it's a manual process or a programmatic one, the ability to OCR screenshots to migrate data is the hidden key that cracks any proprietary software safe.

posted by Chris Riley

Thursday, October 8, 2009

Not even your monitor is safe from OCR

I've talked about various unconventional uses of OCR: anti-virus, CAPTCHA (though this does not work), and now it's time for a new one: screen scraping. OCR technology is not widely used to extract text from users' active screens, and its predominant use so far has been of the sneaky kind. However, I suspect that screen scraping will become more popular for data validation, user identification similar to CAPTCHA, user automation, and even extreme content management. I myself have used screen scraping to convert an online address book from one email account into an importable format for another email account, because the original account had no export option!
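The address-book conversion mentioned above boils down to picking contacts out of OCR'd text and writing them in a format the other account can import. The sketch below assumes each contact appears on its own line with a name and an email address, and that the target account accepts a CSV with `Name` and `E-mail Address` columns; both are assumptions, so check what your target account actually expects. The regular expression is a loose pattern for spotting addresses, not a full validator.

```python
import csv
import io
import re

# Loose e-mail pattern, good enough to spot addresses in OCR'd text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contacts_from_ocr(ocr_text: str) -> list[tuple[str, str]]:
    """Turn OCR'd address-book lines into (name, email) pairs."""
    contacts = []
    for line in ocr_text.splitlines():
        match = EMAIL_RE.search(line)
        if not match:
            continue  # headers, buttons, and other screen clutter
        # Whatever surrounds the address is treated as the contact's name.
        name = (line[:match.start()] + line[match.end():]).strip(" \t<>,;")
        contacts.append((name, match.group()))
    return contacts


def to_csv(contacts) -> str:
    """Write contacts as CSV text; the column names are an assumption."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Name", "E-mail Address"])
    writer.writerows(contacts)
    return buf.getvalue()


if __name__ == "__main__":
    screen = "Contacts\nJane Doe <jane.doe@example.com>\nNew contact"
    print(to_csv(contacts_from_ocr(screen)))
```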

Essentially, screen scraping takes a screenshot of the active window, or of the entire current session, and reads the text in it with OCR. Although screenshot resolution is very low (typically 96 dpi), the text in it is what is called “pixel perfect” and does not suffer from the distortions, dithering, and splotches that can appear in scans. This makes reading the text itself relatively easy; the hard part is getting to the text.
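To see why pixel-perfect text is easy to read, consider that the same character is drawn with exactly the same pixels every time, so even exact template matching works. The toy sketch below is plain template matching, not how a general OCR engine works. The 3x3 "font", the `#`/`.` bitmap encoding, and the fixed character pitch are all made-up assumptions to keep it short.

```python
# Hypothetical 3x3 bitmap font: '#' is a lit pixel, '.' is background.
FONT = {
    "H": ("#.#", "###", "#.#"),
    "I": ("###", ".#.", "###"),
    "L": ("#..", "#..", "###"),
    " ": ("...", "...", "..."),
}
GLYPH_WIDTH = 3
GAP = 1  # blank column between characters, like a monospaced terminal font


def render(text, font=FONT):
    """Draw text as bitmap rows, standing in for a real screenshot."""
    height = len(next(iter(font.values())))
    sep = "." * GAP
    return [sep.join(font[ch][row] for ch in text) for row in range(height)]


def read_bitmap(rows, font=FONT):
    """Recognize characters by exact match against the font templates.

    This only works because screen text is pixel perfect: a single flipped
    pixel, as scanner noise would cause, makes the lookup fail ('?').
    """
    templates = {glyph: ch for ch, glyph in font.items()}
    pitch = GLYPH_WIDTH + GAP
    chars = []
    for x in range(0, len(rows[0]) - GLYPH_WIDTH + 1, pitch):
        cell = tuple(row[x:x + GLYPH_WIDTH] for row in rows)
        chars.append(templates.get(cell, "?"))
    return "".join(chars)
```

Round-tripping `read_bitmap(render("HI LIL"))` gives back the original string, while flipping one pixel of the "H" turns it into `?`. A scan would need the far more tolerant matching that real OCR engines do.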

Look at your screen now. It's probably filled with graphics and text everywhere. For screen scraping you cannot rely on traditional document analysis to discern where the text is and which text is valuable; that has to happen after the fact. The most successful screen scraping focuses on one particular portion of the screen. The next biggest challenge, when screen scraping is continuous, is the rate at which a screen changes. For example, if you are typing a document, as I am now, you may scroll up and down very rapidly at times. Deciding when and where to capture data on an active screen can be tricky.
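Both problems, where to look and when to capture, can be handled with a simple heuristic: crop to the region of interest and only hand it to OCR once it has stopped changing. The sketch below is one such heuristic under stated assumptions. Frames are represented as lists of strings (one character per pixel) as stand-ins for real screenshots, and the `box` coordinates and the `settle` threshold are hypothetical tuning values.

```python
import hashlib


def crop(frame, left, top, right, bottom):
    """Keep only the region of interest; frame is a list of pixel rows."""
    return [row[left:right] for row in frame[top:bottom]]


class StableRegionWatcher:
    """Signal when a screen region has stopped changing and is worth OCR'ing.

    A region counts as stable once it has looked identical for `settle`
    consecutive frames. Each stable state is reported once, so a static
    screen is not OCR'd over and over.
    """

    def __init__(self, box, settle=3):
        self.box = box  # (left, top, right, bottom), hypothetical coordinates
        self.settle = settle
        self._last_digest = None
        self._same_count = 0
        self._reported_digest = None

    def update(self, frame):
        """Feed one captured frame; return the cropped region when it is ready."""
        region = crop(frame, *self.box)
        digest = hashlib.sha256("\n".join(region).encode()).hexdigest()
        if digest == self._last_digest:
            self._same_count += 1
        else:  # region changed (e.g. the user scrolled): start counting again
            self._last_digest = digest
            self._same_count = 1
        if self._same_count >= self.settle and digest != self._reported_digest:
            self._reported_digest = digest
            return region  # ready to pass to OCR
        return None
```

Changes outside `box` are ignored entirely, which is the point of focusing on one portion of the screen. Fast scrolling simply keeps resetting the counter until the view settles.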

It may be hard for you to imagine why screen scraping is useful, especially for the techies among you who realize that the text on the screen already exists in digital form somewhere. Screen scraping is extremely valuable when your application has to obtain data from another application. Developing connectors between applications can be very time-consuming, and often a major waste of time. You have to learn the other product's API, and if they come out with a new version, you now have to support it. But with screen scraping you can write one method to get data off the screen of ANY active application window, search for the relevant content, and presto, you never have to do it again. In enterprise content management, and in conversion from a legacy system to a new one, screen scraping using OCR can be the most amazing tool.
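The "search for the relevant content" step can be as simple as looking for labels in the OCR'd text and taking what follows them on the same line. The sketch below assumes the screen shows values as `Label: value` (or `Label value`) on one line. The labels and sample text are hypothetical, and real OCR output may need extra cleanup for misread characters.

```python
import re


def find_labeled_values(ocr_text, labels):
    """Find 'Label: value' pairs anywhere in OCR'd screen text.

    Matching is case-insensitive, stays on one line, and requires a word
    boundary before the label so 'Total' does not match inside 'Subtotal'.
    """
    found = {}
    for label in labels:
        pattern = re.compile(
            rf"\b{re.escape(label)}[ \t]*[:#]?[ \t]*(\S.*?)[ \t]*$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(ocr_text)
        if match:
            found[label] = match.group(1)
    return found


if __name__ == "__main__":
    screen = "Order Summary\nInvoice No: A-1009\nTotal:   $45.20"
    print(find_labeled_values(screen, ["Invoice No", "Total"]))
```

Because it only needs the text that was on screen, the same function works regardless of which application drew the window, which is the connector-free property described above.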

posted by Chris Riley