OCR online
October 5th, 2009 by admin
I first came across optical character recognition around 1997, when I bought my first scanner, which back then was still a hand-held, black-and-white Genius ScanMate 256 (incidentally, it still works). The scanner came bundled with Direct OCR on a 3.5-inch floppy disk (funny how all these names surface from somewhere in the subconscious), a program that tried hard to prove that text from a book could be entered into a computer quickly and almost without errors. Well, the proof was not very convincing. FineReader, which I met later, did much better. The subject of recognition intrigued me, and I spent quite a lot of time reading popular-science articles about OCR technology.
In 2001 I was preparing my thesis on web technologies and spent a long time thinking about where to apply that knowledge. Since I was interested in OCR, I decided to combine the web with text recognition. For the recognition itself I enlisted FineReader. With friends, we “took apart” FineReader into its separate DLLs and figured out how to call individual functions of these libraries, passing them binary image data and getting the recognized text back. On top of all this we had to build a simple web interface for uploading images, running recognition and retrieving the results.
The first obstacle at the time was the ridiculously low bandwidth of the Internet. An A4 page scanned at 200 dpi and saved as TIFF (the only format FineReader accepted) could take up to several megabytes in grayscale, and if someone scanned it in color by mistake or out of ignorance, the size grew three to four times. Files this large for those days were hard to send and process even over a local network; over the public Internet it was practically out of reach.
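To put those sizes in perspective, here is a back-of-the-envelope estimate of the raw raster size of a scanned A4 page and how long it would take to send over dial-up. It is only a sketch under stated assumptions: A4 is 210 × 297 mm, grayscale uses 8 bits per pixel and color 24 bits, TIFF compression and headers are ignored, and the 56 kbit/s modem speed is an assumed typical dial-up rate, not a figure from the original project.

```python
"""Back-of-the-envelope size of a scanned A4 page (uncompressed, no file headers)."""

A4_MM = (210, 297)  # A4 paper size in millimetres (width, height)
MM_PER_INCH = 25.4


def scan_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Raw raster size in bytes of an A4 page scanned at `dpi`."""
    width_px = round(A4_MM[0] * dpi / MM_PER_INCH)
    height_px = round(A4_MM[1] * dpi / MM_PER_INCH)
    return width_px * height_px * bits_per_pixel // 8


def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Ideal transfer time, ignoring protocol overhead and line noise."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    MODEM_BPS = 56_000  # assumed dial-up speed, not taken from the original post
    for label, bpp in (("grayscale, 8 bpp", 8), ("color, 24 bpp", 24)):
        size = scan_bytes(200, bpp)
        minutes = transfer_seconds(size, MODEM_BPS) / 60
        print(f"{label}: {size / 2**20:.1f} MiB, ~{minutes:.0f} min at 56 kbit/s")
```

With these assumptions the grayscale page comes to about 3.7 MiB and roughly 9 minutes of transfer at 56 kbit/s, and the color page is exactly three times larger, in line with the “three to four times” growth mentioned above.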
The second factor was cost. At those speeds, sending scanned pages made every page expensive. We also took into account that people commonly used cracked copies of recognition software, which cost nothing or next to nothing.
The third factor was demand. For people to start using an online text-recognition service, three conditions had to hold at once: they needed a scanner, they needed Internet access, and they had to lack a standalone recognition program. It was hard to imagine many such “clumsy” and “clueless” users.
The project was implemented, but then “shelved” as unpromising.
Two years ago I suggested to my colleagues at work that we revisit the project. The situation had changed: the Internet had become faster (mp3 files have long been larger than scanned pages in JPG), scanners are almost everywhere (and you can even simply photograph a text), and users no longer want to clutter their heads with all sorts of programs and prefer online services instead. FineReader now has an API, and Flash makes it possible to build a fairly convenient web interface for managing uploads and recognition. But we never reached a consensus and, one could say, missed the chance to build a popular and useful service that could have been sold profitably to ABBYY or Google.
ABBYY has now launched its own online version of FineReader for text recognition. It supports 6 languages, including Russian; handles documents written in several languages; accepts TIFF (including multipage files), JPEG, BMP, PNG, PCX, GIF and DjVu as input; and outputs Microsoft® Word, Excel®, Rich Text Format, TXT and searchable PDF.
The other day the well-known Google Docs service made it possible, through its API, to try the same thing on its demo page. Google lets you upload high-resolution images (up to 10 megabytes) in JPG, PNG or GIF format. Recognition takes about two minutes. So far only the Latin alphabet is supported.
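The limits Google states (JPG, PNG or GIF, at most 10 megabytes) are easy to check on the client before spending minutes on an upload. The sketch below is hypothetical and not part of any Google API: `check_upload` and `detect_format` are illustrative names, the format is identified by its standard file signature rather than by the file extension, and the limit is assumed to mean 10 × 2^20 bytes, since the post does not say which “megabyte” is meant.

```python
"""Hypothetical pre-upload check for an OCR demo that accepts JPG, PNG or GIF up to 10 MB."""
from typing import Optional

# Assumption: "10 megabytes" read as 10 * 2**20 bytes; the exact limit is not stated.
MAX_UPLOAD_BYTES = 10 * 2**20

# Standard file signatures ("magic numbers") of the accepted formats.
SIGNATURES = (
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
)


def detect_format(data: bytes) -> Optional[str]:
    """Return the image format named by the leading bytes, or None if unrecognized."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def check_upload(data: bytes) -> str:
    """Return the detected format, or raise ValueError if the file would be rejected."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {len(data)} bytes; limit is {MAX_UPLOAD_BYTES}")
    fmt = detect_format(data)
    if fmt is None:
        raise ValueError("unsupported format: expected JPEG, PNG or GIF")
    return fmt


if __name__ == "__main__":
    png_header = b"\x89PNG\r\n\x1a\n"
    tiff_header = b"II*\x00"  # little-endian TIFF: accepted by FineReader, rejected here
    print(check_upload(png_header + b"\x00" * 16))
    try:
        check_upload(tiff_header + b"\x00" * 16)
    except ValueError as err:
        print("rejected:", err)
```

Checking the signature instead of the extension also catches a TIFF that was simply renamed to `.jpg`, which the TIFF-only workflow of the early 2000s would have produced often.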



