I am looking for an optical character recognition solution. I've checked out OCRopus, but it is still in early alpha, which makes it very hard to compile. OCRopus lists
Tesseract as a dependency, so I've compiled Tesseract and run it on a couple of scanned pages.
The results are impressive (see below for the output from running it on a page from a Cisco manual).
|
Chapter 24 • Mixed-Media Bridging ending delimiter, which follows the data field) are treated differently depende ing on the bridge manufacturer Some bridge manufacturers simply ignore the bits. Others have the bridge set the C bit (to indicate that the frame has been copied) but not the A bit (which indicates that the destination station recog- nizes die address). Ln the former case, a Token Ring source node determines whether the frame it sent has become lost. Proponents of this approach sug~ gest that reliability mechanisms, such as the tracking of lost frames [..]
|
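For anyone who wants to repeat this on a batch of scans, here is a minimal sketch of how a run like the one above could be scripted. It assumes the `tesseract` binary is on the PATH and uses the basic `tesseract <image> <outputbase>` invocation; the file names and the helper names `build_command` and `ocr_page` are hypothetical, not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(image_path, output_base):
    """Return the argument list for a basic run: tesseract <image> <outputbase>."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_page(image_path, output_base):
    """Run tesseract on one scanned page and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error
    subprocess.run(build_command(image_path, output_base), check=True)
    # tesseract writes its plain-text result to <outputbase>.txt
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file name; replace with a real scanned page
    print(ocr_page("scan_page1.tif", "scan_page1"))
```

Keeping the command construction in its own function makes it easy to add options later without touching the code that runs the process.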