November 1, 2011

How To: Compile and Use Tesseract (3.01) on iOS (SDK 5)

I never thought my last post would reach such a large audience. Among other things, it earned me 3 direct job interview offers (one of them from Google itself, the maintainer of Tesseract), an invitation to write articles for an IT digital e-magazine and a few digital friends, but that’s something to discuss in other posts. Thank you!

Getting back to what really matters: the last post focused on cross-compiling (potentially) any library for iOS (armv6/armv7/i386), and as an example I chose Tesseract, the library I was using on a work project. The response was so strong, and both Tesseract and iOS have since received new versions, that I decided to write this post specifically about compiling Tesseract and using it in your iOS project.

As stated earlier, Tesseract has officially been released at version 3.01 (which now uses an autogen.sh setup script and an improved configure script) and iOS has received a major upgrade, version 5.0. As you may guess, these changes broke my script!


So let’s restart this party! (or: Compiling Tesseract 3.01 for iOS SDK 5.0)

The basics of the script were explained in the last post, so I’ll cover only the changes and how to use it.

As noted by Rafael, the default C/C++/Objective-C compilers for iOS 5 (bundled with Xcode 4.2) have changed: the toolchain is now LLVM-based (Clang, plus the llvm-gcc front ends used here), so the CPP, CXX, CXXPP, and CC definitions (inside setenv_all()) have changed to:

export CXX="$DEVROOT/usr/bin/llvm-g++"
export CC="$DEVROOT/usr/bin/llvm-gcc"
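Several readers hit “fails sanity check” errors from configure when a compiler path did not exist on their machine. A minimal sketch of a defensive variant follows; the function name set_ios_compilers is hypothetical and not part of the build script, and it assumes DEVROOT points at the platform’s Developer directory, as it does inside setenv_all():

```shell
# Hypothetical helper (not in build_dependencies.sh): export CC/CXX only
# if both compilers really exist under the given DEVROOT.
set_ios_compilers() {
    devroot="$1"
    for tool in llvm-gcc llvm-g++; do
        if [ ! -x "$devroot/usr/bin/$tool" ]; then
            echo "missing compiler: $devroot/usr/bin/$tool" >&2
            return 1
        fi
    done
    export CC="$devroot/usr/bin/llvm-gcc"
    export CXX="$devroot/usr/bin/llvm-g++"
}
```

It could be called as set_ios_compilers "$DEVROOT" || exit 1, so that a wrong Xcode path stops the build before configure runs.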

Additionally, as Tesseract now has an autogen.sh script to run before configuring, we run it before each configure call:

bash autogen.sh

And because Tesseract’s configure script now accepts a path to the Leptonica headers, no hacks are needed; passing one extra parameter is enough:

./configure --enable-shared=no LIBLEPT_HEADERSDIR=$GLOBAL_OUTDIR/include/

To build the libraries, create a directory; I’ll refer to it as “./build/”. Inside it, create the following structure:

  • ./build/
    • dependencies/ – which will receive the .h and compiled lib*.a files
    • leptonica-1.68/ – directory with the Leptonica 1.68 source files
    • tesseract-3.01/ – directory with the Tesseract 3.01 source files
    • build_dependencies.sh – our build script (link at the end of the post)
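The layout above can be created and checked from the shell before running anything. This is only a sketch: the directory names are the ones listed above, while the function name prepare_build_dir is hypothetical.

```shell
# Sketch: create dependencies/ and verify that the sources and the build
# script are where the layout above expects them. The function name is
# hypothetical; the directory names come from the list above.
prepare_build_dir() {
    root="$1"
    mkdir -p "$root/dependencies" || return 1
    for item in leptonica-1.68 tesseract-3.01; do
        if [ ! -d "$root/$item" ]; then
            echo "missing source directory: $root/$item" >&2
            return 1
        fi
    done
    if [ ! -f "$root/build_dependencies.sh" ]; then
        echo "missing build script: $root/build_dependencies.sh" >&2
        return 1
    fi
    echo "layout OK: $root"
}
```

Running prepare_build_dir ./build reports the first missing piece, which is cheaper than discovering it halfway through a long build.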

Open Terminal, enter our “./build/” directory, cross your fingers (one very important step pointed out by Venusbai) and run:

bash build_dependencies.sh

Well, if you’re lucky enough and deserve the holy right to use Tesseract in mobile apps, check the contents of the dependencies folder: there you’ll have all the header and library files needed to play with OCR on your iPhone (I don’t have one, I personally prefer Android, but you get it…).
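To see what the build actually produced, something like the following may help. It is a sketch under assumptions: the function name check_dependencies is hypothetical, and it simply searches below the dependencies folder, since the exact subfolders depend on the build script. lipo ships with Xcode, and its -info option prints the architectures contained in an archive.

```shell
# Hypothetical helper: list the libraries and count the headers found
# below the dependencies folder. Paths containing spaces are not handled.
check_dependencies() {
    deps="$1"
    libs=$(find "$deps" -name 'lib*.a' 2>/dev/null | sort)
    headers=$(find "$deps" -name '*.h' 2>/dev/null | wc -l | tr -d ' ')
    if [ -z "$libs" ]; then
        echo "no lib*.a files found in $deps" >&2
        return 1
    fi
    echo "headers found: $headers"
    for lib in $libs; do
        echo "library: $lib"
        # Only call lipo where it exists (macOS with Xcode installed).
        if command -v lipo >/dev/null 2>&1; then
            lipo -info "$lib" || echo "lipo could not read $lib" >&2
        fi
    done
}
```

If the lipo output does not list the architectures you built for, the build for one of them most likely failed earlier in the log.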

Great!!! Now what?! (or: Using Tesseract on your iOS project)

  1. Create a new iOS project in Xcode (or just open your existing one)
  2. Add the generated ./build/dependencies/ folder to your project. It contains the needed .h header and lib*.a library files
  3. Add the tessdata folder, containing, well, the tessdata files you need, to your project. If you don’t know what the “tessdata” folder is: it contains preprocessed data for a given language so Tesseract can recognize that language. Download language data from: http://code.google.com/p/tesseract-ocr/downloads/list. Follow the sub-steps below to add it the right way (not the default Xcode way…)
    1. Right-click your project/group in Xcode
    2. Choose “Add files to your project”
    3. Select the “tessdata” folder
    4. In the same window, check “Create folder references for any added folders”. This is the most important step, as it instructs Xcode to add your “tessdata” folder as a regular folder (and as a resource), not as an Xcode project group.
  4. Create your TessBaseAPI object with the code below to start playing with it!
  5. Make sure that every source file that includes/imports Tesseract header files, directly or through another header, has the .mm extension instead of the regular .m. This makes the compiler treat the file as Objective-C++, so the Tesseract C++ headers can be parsed.

// Set up the tessdata path. This is included in the application bundle
// but is copied to the Documents directory on the first run.
NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentPath = ([documentPaths count] > 0) ? [documentPaths objectAtIndex:0] : nil;

NSString *dataPath = [documentPath stringByAppendingPathComponent:@"tessdata"];
NSFileManager *fileManager = [NSFileManager defaultManager];
// If the expected store doesn't exist, copy the default store.
if (![fileManager fileExistsAtPath:dataPath]) {
    // Get the path to the tessdata folder reference inside the app bundle.
    NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
    NSString *tessdataPath = [bundlePath stringByAppendingPathComponent:@"tessdata"];
    if (tessdataPath) {
        // Report a failed copy instead of silently ignoring it.
        NSError *copyError = nil;
        if (![fileManager copyItemAtPath:tessdataPath toPath:dataPath error:&copyError]) {
            NSLog(@"Could not copy tessdata: %@", copyError);
        }
    }
}

setenv("TESSDATA_PREFIX", [[documentPath stringByAppendingString:@"/"] UTF8String], 1);

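// Init the Tesseract engine. `tesseract` is assumed to be an instance
// variable declared as: tesseract::TessBaseAPI *tesseract;
tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");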
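Note that the snippet above copies tessdata to the Documents directory only when that folder does not exist there yet, so a language file added to the bundle later will not reach Documents until the old copy is removed (for example by reinstalling the app). If Init cannot find a language, it may also be worth confirming that the file is in the built app bundle at all. A sketch of that check, where check_tessdata is a hypothetical helper and the bundle path is whatever .app folder Xcode produced on your machine:

```shell
# Hypothetical helper: confirm that <lang>.traineddata sits in the
# tessdata folder of a built .app bundle. lang is the language code
# passed to Init(), e.g. "eng".
check_tessdata() {
    app_bundle="$1"
    lang="$2"
    data_file="$app_bundle/tessdata/$lang.traineddata"
    if [ ! -d "$app_bundle/tessdata" ]; then
        echo "no tessdata folder in $app_bundle (was it added as a folder reference?)" >&2
        return 1
    fi
    if [ ! -f "$data_file" ]; then
        echo "missing $data_file" >&2
        return 1
    fi
    echo "found $data_file"
}
```

A missing tessdata folder usually means the folder was added as an Xcode group rather than as a folder reference.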

Well, that’s it! I hope you can reproduce this. I also provide an Xcode 4.2 / iOS SDK 5 project for download, with Tesseract configured and already recognizing a sample image; check it out if you have any trouble following this how-to.

Files for Download

Final Considerations

I really hope you guys have enjoyed it. If you have any opinion, compliment, suggestion or just want to say something, feel free to comment; I’ll try to approve it ASAP.


50 Responses to “How To: Compile and Use Tesseract (3.01) on iOS (SDK 5)”


  1. November 1, 2011 at 7:06 pm

    Great stuff Suzuki .. I’ll try it later today :-)

    Thank you very much for the reply, and a big hug!

  2. November 4, 2011 at 2:21 pm

    Hey thanks a bunch for this, the same thing just happened to me on tesseract3.0 + iOS 5 upgrade :(

    I was also wondering if you had any luck training Tesseract; for some reason I am having a really tough time training it for a select few fonts.

    Thanks for this, you seem to be interested in the same type of projects as I am possibly? :)

    • November 4, 2011 at 3:14 pm

      I haven’t tried to train tesseract since I found it to be really painful and got regular to very good results with the supplied language files…

      If you’re interested in mobile applications that try to empower the user (in the “be able to do more” sense), then yes, we may have the same tastes. :D

  3. November 4, 2011 at 4:11 pm

    Yeah, the base eng.traineddata is good; however, for, say, new pictures of text the image quality changes dramatically, and even with image processing and cleanup the results are far from usable as reliable information, especially for targeted document types. I only want to train 2 fonts after my image processing to test accuracy on image picker results. My older 3.0 version, without a lot of work, was around 80% accurate on non-trained fonts, though on Times New Roman it was 100% from a picture. It’s a shame the training process is so painful, especially when in theory it should not be that difficult lol. If you’re interested in training at all, or need some pre-processing tips, keep me in the loop; I’ll let you know how it all works out shortly. Hoping to finish this and release in 2 weeks. But thanks again for this, I was dreading doing a new compile for iOS!

    • November 4, 2011 at 4:34 pm

      Nice!

      Let me know if you get to train Tesseract and if that actually helps with accuracy; I haven’t seen anyone report success with that. I guess it could be used as material for a new post.

      Good luck!

  4. November 4, 2011 at 5:01 pm

    Sorry to blow up your thread on possible off topics however.. I have tried converting this to run off the camera roll or image picker and noticed that leptonica pix.h methods are completely new and restricting my ability to run any size image I want. Are there any restrictions to sizing or ways to overcome this instead of using the older image sizing methods aside from leptonica?

  5. November 9, 2011 at 7:23 am

    Just a small thanks, you saved me a lot of time.

  6. 9 Matt H
    November 9, 2011 at 11:20 pm

    Excellent tutorial, worked perfectly!

  7. 10 Lucio
    November 11, 2011 at 2:07 pm

    Great tutorial…. thanks

  8. 11 Abdulla
    November 19, 2011 at 5:09 am

    Great tutorial.. Was fully lost with several kinds of errors trying to build tesseract :)
    It would be good if you let us know how you used OpenCV with tesseract

    Thanks a lot!!!!!

  9. 12 abdulla
    November 19, 2011 at 10:01 am

    Hi,
    I tried your steps to build 4.3, but I don’t think I was able to build it properly. So many errors pop up when trying to link to an Xcode project. Moreover, there are fewer header files in dependencies/include/tesseract than you have. I’ve posted the logs of the build script. Can you help, please? I’ve been trying to build and use this library for the last 8 hours straight and badly need your help. No idea where I am going wrong.

    The following I output to a txt file. I’ve also pasted the error messages coming from the terminal.
    *****************************************
    Edit: WAY TOO BIG!!!

    configure: error: C preprocessor "/Developer/Platforms/iPhoneSimulator.platform/Developer/usr/bin/cpp-4.2" fails sanity check
    See `config.log' for more details.
    make: *** No targets specified and no makefile found. Stop.
    ar: creating archive libtesseract_all.a
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(bmpiostub.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(gifio.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(jpegio.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(leptwin.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(pdfiostub.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(pngio.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(pnmiostub.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(psio1stub.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(psio2stub.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(tiffio.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(webpio.o) has no symbols
    /Developer/usr/bin/ranlib: file: libtesseract_all.a(zlibmemstub.o) has no symbols
    lipo: specifed architecture type (armv6) for file (./outdir/arm6/liblept.a) does not match its cputype (16777223) and cpusubtype (3) (should be cputype (12) and cpusubtype (6))
    lipo: specifed architecture type (armv6) for file (./outdir/arm6/libtesseract_all.a) does not match its cputype (16777223) and cpusubtype (3) (should be cputype (12) and cpusubtype (6))
    cp: ./outdir/lib*.a: No such file or directory

    • 13 Vedika
      February 17, 2012 at 9:32 pm

      Hi abdullah
      Can you tell me how you solved this problem? I am getting the same error..
      /Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/ranlib: file: .libs/liblept.a(leptwin.o) has no symbols

      Thanks,

  10. 14 abdulla
    November 19, 2011 at 11:33 am

    Hi,
    Pls disregard my last message. I’ve built it fine. Too many hours of work made me a little impatient with Tesseract :(

    Now trying to integrate with Xcode project – View controller based. Getting this error :

    Command /Developer/usr/bin/lex failed with exit code 1

    Trying to fix it . Thanks

  11. November 20, 2011 at 6:29 am

    If you build the library this way using the iOS 5 SDK, can an app using the library run on iOS 4.3 devices?

  12. 18 Andrea
    November 23, 2011 at 6:57 pm

    Hi, thank you for the tutorial!
    I have a problem: I copied the “dependencies” and “tessdata” folders into my Xcode project from your Xcode example (which works for me), but I get this error in this part

    namespace tesseract {
    class TessBaseAPI;
    };

    -> expected ; after top level declarator

    Can you help me?

    • November 23, 2011 at 9:21 pm

      You can try changing your .m files that are including Tesseract stuff to .mm extension. Having .m files including C/C++ files causes problem because .m files are compiled as strict C and Objective-C source files.

      If that won’t work, you can also try to change the project I’ve provided and turn it into yours…

      Regards,
      Suzuki

      • 20 Andrea
        November 24, 2011 at 5:10 am

        I have already changed the .m file to .mm and it does not work (it compiles, but when I #import the .h file in another class I get the previous error).
        I have solved in this way:

        Change “namespace tesseract { …. };” in:

        #ifdef __cplusplus
        #include "baseapi.h"
        using namespace tesseract;
        #else
        @class TessBaseAPI;
        #endif

        and change tesseract::TessBaseAPI *tesseract;
        in : TessBaseAPI *tesseract;

        This way my project works. Is this correct?

      • November 24, 2011 at 5:33 am

        I guess it is correct and a very nice way to solve the problem. I’ll use it myself.

        Thanks and I’m glad you worked it out!

      • 22 Andrea
        November 24, 2011 at 5:37 am

        Thank you for this tutorial! =)

  13. 23 Fred
    November 25, 2011 at 6:19 pm

    I’ve downloaded the sample Xcode project, but it didn’t work for me. When I try to compile the project, more than 50 errors are shown, specifically on this code: @class MBProgressHUD;

    namespace tesseract {
    class TessBaseAPI;
    };

    Can u help me to just compile this project ??

    tks.

  14. December 6, 2011 at 9:11 am

    Do you have a PayPal account? I want to send you some money!
    You’re a lifesaver!!!

  15. 27 Pich
    December 12, 2011 at 5:03 am

    hi Suzuki
    firstly i’d like to apologize for my poor english if you can’t read my question smoothly.

    I’ve read your instructions for cross-compiling Tesseract, but it didn’t work well
    (of course, I wasn’t good enough at this),
    so I finally decided to download your sample project and fix it up a bit.
    Now I can get Tesseract running on iOS,
    but the problem is that I want Tesseract to recognize handwriting,
    so I made my own hand.traineddata,
    but I don’t know how to make Xcode use my hand.traineddata – -”

    I changed this line:
    “tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "hand");”
    and already added hand.traineddata to the “tessdata” folder,

    and Xcode still shows me an “error”.
    I know that I have to set the PATH for Xcode to read my hand.traineddata, but I don’t know how to set it – -”

    very thanks

    • December 19, 2011 at 3:01 pm

      OK, so the provided sample works fine for you and you’re trying to add your own traineddata to it, right?

      As far as I know, the described procedure should have worked fine and your traineddata file should have been loaded. You can try to see if some of the traineddata supplied by tesseract other than the english one works (http://code.google.com/p/tesseract-ocr/downloads/list), but I can’t help you with that since I haven’t tried to load different traineddata files.

      Besides that, I’d take the statement from Ray Smith (Tesseract developer) as strong advice, with the emphasis on “limited”:
      “Tesseract was never designed for handwriting, but people have been successful to a limited extent in retraining it for handwriting.”

      Regards

      • 29 Pich
        December 20, 2011 at 5:04 am

        I finally solved this problem.
        I printed the error and saw the path that your provided sample uses,
        then just copied the .traineddata file to that folder,
        so the problem is fixed,
        and it also works when deployed to my iPad.

        I know this may be a very simple solution,
        but I’m posting it for others
        who hit the same problem as me :D

        Regards
        pich

  16. 30 swathi
    December 13, 2011 at 3:08 am

    Hi if possible can u provide me a video regarding this tutorial

  17. 32 sudheer
    December 13, 2011 at 8:18 am

    hi

    I’ve downloaded the sample Xcode project, but it didn’t work for me. When I try to compile the project, more than 50 errors are shown, specifically on this code: @class MBProgressHUD;

    namespace tesseract {
    class TessBaseAPI;
    };

    Can u help me to just compile this project ??

    tks.

  18. December 13, 2011 at 6:52 pm

    Great tutorial, very helpful, thanks a lot!

  19. December 19, 2011 at 5:01 am

    Huge help, you’re the man Suzuki!

  20. January 4, 2012 at 11:55 am

    Thanks a lot & happy new year !!
    I had to do a really quick proof-of-concept and was given a link to RCarlsen’s Pocket OCR sample and after reading through several blogs (including yours) I was really happy to see that your sample project includes pre-compiled libraries – WHAT A TIME SAVER !!

    I’ll probably be back one day and will have to compile some library on my own, but importing your dependencies folder into Pocket OCR will enable me to use the picture roll directly for making some custom tests without having to bother to compile libraries on my own, so your sample project makes my day !!

    Keep up the good work !!

  21. January 12, 2012 at 6:42 am

    Amazing tutorial complete with explanations and a sample project. Thank you for everything and keep up the great work!

  22. 38 Crapulax
    January 20, 2012 at 7:10 am

    Very interesting post !

    I am trying to build vlc for iOS (vlc is an open source multimedia player) http://wiki.videolan.org/MobileVLC

    The provided build scripts are outdated, so I am trying to upgrade them to match the new iOS version

    During the configure step, I got the following error:
    "configure: error: C preprocessor "/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/llvm-cpp-4.2 " fails sanity check"

    export CPP="${DEVROOT}/usr/bin/llvm-cpp-4.2"

    would you have any guess on this pb ?

  23. 40 Tom
    January 26, 2012 at 6:00 am

    Yeaahh.. amazing post! Thanks!
    Is it possible to compile Tesseract with your script without Leptonica? I am using OpenCv in my project, so there is no need of Leptonica…

    • January 26, 2012 at 9:25 am

      I haven’t looked into how to compile Tesseract w/o Leptonica, but if it is possible, there must be some parameter to pass to Tesseract’s configure call. Take a look at it and modify the way it gets called by the build script.

      Good Luck!

    • 42 sts2k
      January 26, 2012 at 11:37 am

      The readme states : “Leptonica is required and provides image I/O and processing”… so doubt it.

      btw thanks Suzuki! Spent 5 hrs trying to fix the build script myself and then I came across this page :)
      Are you with Google now, and if so you think they’ll incl. support for osx/ios in the future releases?

      • January 26, 2012 at 12:34 pm

        Well, I’m not with Google, though I’d be more than happy to contribute with OSX/iOS support or so. And, for future releases, note that tesseract project only provides pre-compiled binaries for Windows, so I bet it will stay like that: pre-compiled/easy installer for Windows, compatible code for compiling for other platforms.

  24. 44 Mark
    February 9, 2012 at 5:37 pm

    Using Thai (tha) and I notice simplified Chinese (chi_sim) the tesseract code seg faults while processing the image. Someone created an issue: http://code.google.com/p/tesseract-ocr/issues/detail?id=502 though in my testing it seg faults in a different function. Anyone have any pointers for debugging? I’d love to be able to step through the tesseract library code in Xcode.

    • 45 Mark
      February 10, 2012 at 2:06 pm

      Actually english doesn’t work 100% for me either. With certain images I get the same crash which occurs in tesseract::Classify::ComputeIntCharNormArray. Is anyone actually using this with iOS 5 successfully beyond a few test images?

  25. 46 Ruben
    February 11, 2012 at 11:03 pm

    Thanks for this info. One question: in your sample project you included the dependencies group with the header files and static libs of leptonica and tesseract. Are the static libs in there fat libs? In other words can I use those for both the iPhone simulator and a real iPhone? Or do I still have to build my own following your tutorial?

    P.S. sorry I first posted this in the wrong topic. :)

