Easy OCR on GNU/Linux with gImageReader

Posted by samtuke on Monday, October 3rd, 2011 0

UPDATE: I can now confirm that gImageReader also works well on Windows.

Today I discovered gImageReader – really easy OCR software for GNU/Linux. It uses Tesseract as its back-end, and the interface is very intuitive, with straightforward instructions at the bottom of the window letting you know what to do next at each stage of the OCR process.

I haven’t tried complicated structures (like tables or indenting), but for uncomplicated blocks of printed text it worked perfectly, correctly identifying each word.

Tesseract has multiple international dictionaries available, and gImageReader allows you to choose which one to use before you scan. Tesseract is under active development, with the upcoming 3.0 release scheduled to include page layout analysis; automatic page orientation and script detection capability; special modes for single column, line, word and even character; and many more languages, including Chinese.

gImageReader isn’t in my repositories (Fedora 14), but packages are available as .deb and .rpm.