Tesseract is very restrictive about the images it will process.
It only accepts uncompressed TIFF files with the extension "tif".
Unfortunately, TesseractGUI is not very forthcoming about why it rejects a file (it just says "Error reading tesseract output").
Here's how you can identify the error and fix it.
Run tesseract from the command line to find out more about the cause of the rejection:
cristi:~ diciu$ export TESSDATA_PREFIX=/Applications/TesseractGUI.app/Contents/Resources/
cristi:~ diciu$ /Applications/TesseractGUI.app/Contents/Resources/tesseract ~/Desktop/tiffs/page1.tif /tmp/ocrtest.txt
Tesseract Open Source OCR Engine
read_tif_image:Error:Illegal image format:Compression
/Applications/TesseractGUI.app/Contents/Resources/tesseract:Error:Read of file failed:/Users/diciu/Desktop/tiffs/page1.tif
Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3
The first error line (Illegal image format:Compression) shows that the problem with this particular TIFF file is its compression.
Fixing the problem
Step 1/ Download ImageMagick
Step 2/ Inspect the TIFF file we want to use:
cd /Users/diciu/Downloads/ImageMagick-6.5.8/bin
export DYLD_FALLBACK_LIBRARY_PATH=/Users/diciu/Downloads/ImageMagick-6.5.8/lib
export MAGICK_HOME=/Users/diciu/Downloads/ImageMagick-6.5.8/
cristi:bin diciu$ tiffutil -info ~/Desktop/tiffs/page1.tif
Directory at 0x837f8
Subfile Type: (0 = 0x0)
Image Width: 1200 Image Length: 2088
Resolution: 200, 200
Resolution Unit: pixels/inch
Bits/Sample: 8
Compression Scheme: Lempel-Ziv & Welch encoding
Photometric Interpretation: palette color (RGB from colormap)
Predictor: none
Samples/Pixel: 1
Rows/Strip: 10
Number of Strips: 209
Planar Configuration: Not planar
Color Map: (present)
Note the compression scheme (LZW).
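To pull just that one line out of the listing, a small filter can help. This is only a sketch: compression_scheme is a hypothetical helper name (not part of tiffutil or ImageMagick), and it assumes the listing uses the "Compression Scheme:" label seen in the output above.

```shell
# Hypothetical helper: read a `tiffutil -info` listing on stdin and
# print only the value of its "Compression Scheme:" line.
# Assumption: the label is spelled exactly as in the listing above.
compression_scheme() {
  sed -n 's/^[[:space:]]*Compression Scheme:[[:space:]]*//p'
}

# Example (not run here); 2>&1 is used so the listing is captured
# whichever stream tiffutil writes it to:
#   tiffutil -info ~/Desktop/tiffs/page1.tif 2>&1 | compression_scheme
```

For the file above this should print the "Lempel-Ziv & Welch encoding" value, which is the sign that the file needs to be uncompressed before tesseract will read it.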
Step 3/ Uncompress the TIFF file:
tiffutil -none ~/Desktop/tiffs/page1.tif -out ~/Desktop/tiffs/page1_uncompressed.tif
Now use page1_uncompressed.tif with tesseract.
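If a whole directory of scans has the same problem, Step 3 can be repeated in a loop. The following is a minimal sketch, not a tested workflow: uncompressed_name and uncompress_all are hypothetical helper names, and the only tiffutil invocation used is the -none ... -out form from Step 3.

```shell
# Hypothetical helper: derive the output name used in Step 3,
# e.g. page1.tif -> page1_uncompressed.tif
uncompressed_name() {
  printf '%s\n' "${1%.tif}_uncompressed.tif"
}

# Hypothetical helper: write an uncompressed copy of every .tif file
# in the given directory, using the same tiffutil call as Step 3.
uncompress_all() {
  dir=$1
  for f in "$dir"/*.tif; do
    [ -e "$f" ] || continue                          # no .tif files found
    case $f in *_uncompressed.tif) continue ;; esac  # skip earlier results
    tiffutil -none "$f" -out "$(uncompressed_name "$f")"
  done
}

# Example (not run here):
#   uncompress_all ~/Desktop/tiffs
```

The originals are left untouched; each result is written next to its source file with the _uncompressed suffix, so it can be passed to tesseract the same way as page1_uncompressed.tif.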