I never thought that my last post would reach such a large audience. Among other things, it earned me 3 direct job interview offers (1 of ‘em from Google itself, maintainer of Tesseract), an invitation to write articles for an IT digital e-magazine and a few digital friends, but that’s something to discuss in other posts. Thank you!
Getting back to what really matters: the last post focused on cross-compiling (potentially) any library for iOS (armv6/armv7/i386), and as an example I chose Tesseract, the library I was using on a work project. But the response was so great, and both Tesseract and iOS have since received newer versions, that I’ve decided to write this post specifically about getting Tesseract compiled and using it in your iOS project.
As stated earlier, Tesseract has been officially released at version 3.01 (which now uses an autogen.sh setup script and an improved configure script) and iOS has received a major upgrade, version 5.0. As you may guess, these changes broke my script!
So let’s restart this party! (or: Compiling Tesseract 3.01 for iOS SDK 5.0)
The basics of the script were explained in the last post, so I’ll just cover the changes and how to use it.
As noted by Rafael, the default C/C++/Objective-C compilers for iOS 5 (bundled with Xcode 4.2) have changed: the script now points at the LLVM-based front ends instead of the GCC 4.2 binaries, so the compiler definitions (CPP, CXX, CXXCPP and CC, inside setenv_all()) had to change. CXX and CC now read:
export CXX="$DEVROOT/usr/bin/llvm-g++"
export CC="$DEVROOT/usr/bin/llvm-gcc"
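Because these paths depend on where Xcode is installed, it can help to confirm that the tools exist before configuring anything. A minimal sketch, assuming CXX and CC are already exported as above; the helper name require_tools is hypothetical and is not part of build_dependencies.sh:

```shell
# Hypothetical helper (not part of build_dependencies.sh): succeed only if
# every path given as an argument is an existing, executable file.
require_tools() {
    missing=0
    for tool in "$@"; do
        if [ ! -f "$tool" ] || [ ! -x "$tool" ]; then
            echo "missing or not executable: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use right after the exports inside setenv_all():
#   require_tools "$CXX" "$CC" || exit 1
```

Failing early this way gives a clearer message than letting configure stop later on a missing compiler.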
Additionally, as Tesseract now has an autogen.sh script to run before configuring, we run it before each configure call:
bash autogen.sh
And because Tesseract’s configure script now accepts the path to Leptonica as a parameter, no more hacks are needed; passing one extra parameter is enough:
./configure --enable-shared=no LIBLEPT_HEADERSDIR=$GLOBAL_OUTDIR/include/
To build your desired library, create a directory; I’ll refer to it as “./build/”. Inside it, create the following structure:
- ./build/
  - dependencies/ – which will receive the .h and compiled lib*.a files
  - leptonica-1.68/ – directory with the Leptonica 1.68 source files
  - tesseract-3.01/ – directory with the Tesseract 3.01 source files
  - build_dependencies.sh – our build script (link at the end of the post)
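The layout above can be created and checked from the shell. This is only a sketch: it creates the dependencies/ folder and reports which of the other entries are still missing (you still have to unpack the sources and copy the script yourself). The function name check_build_layout is hypothetical:

```shell
# Hypothetical helper: create <build_dir>/dependencies and verify that the
# source directories and the build script are in place.
check_build_layout() {
    build_dir="${1:-./build}"
    mkdir -p "$build_dir/dependencies" || return 1
    status=0
    for dir in leptonica-1.68 tesseract-3.01; do
        if [ ! -d "$build_dir/$dir" ]; then
            echo "missing source directory: $build_dir/$dir" >&2
            status=1
        fi
    done
    if [ ! -f "$build_dir/build_dependencies.sh" ]; then
        echo "missing build script: $build_dir/build_dependencies.sh" >&2
        status=1
    fi
    return "$status"
}

# Example: check_build_layout ./build && echo "layout looks complete"
```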
Open Terminal, enter our “./build/” directory, cross your fingers (one very important step pointed out by Venusbai) and run:
bash build_dependencies.sh
Well, if you’re lucky enough and deserve the holy right to use Tesseract on mobile Apps, check the dependencies folder content and there you’ll have all the needed header and library files to play with OCR on your iPhone (I don’t have one, personally prefer Android, but you got it….).
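A quick way to check the result from Terminal is to count what landed in the dependencies folder. This sketch assumes the script places headers under include/ and libraries under lib/ inside ./build/dependencies/; adjust the paths if your copy of the script lays things out differently. The function name check_dependencies is hypothetical:

```shell
# Hypothetical helper: count the static libraries and headers produced by
# the build. The include/ and lib/ subfolders are an assumption.
check_dependencies() {
    deps_dir="${1:-./build/dependencies}"
    libs=$(( $(find "$deps_dir/lib" -name 'lib*.a' 2>/dev/null | wc -l) ))
    headers=$(( $(find "$deps_dir/include" -name '*.h' 2>/dev/null | wc -l) ))
    echo "static libraries: $libs, headers: $headers"
    # Succeed only if both kinds of files were found.
    [ "$libs" -gt 0 ] && [ "$headers" -gt 0 ]
}

# Example: check_dependencies ./build/dependencies || echo "build incomplete"
```

On a Mac, running lipo -info on each lib*.a file additionally reports which architectures the archive contains.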
Great!!! Now what?! (or: Using Tesseract on your iOS project)
- Create a new iOS project in Xcode (or just open your existing one)
- Add the generated ./build/dependencies/ folder to your project. It contains the needed .h header and lib*.a library files
- Add the tessdata folder, containing, well, erhm, hum, the tessdata files you need, to your project. If you don’t know what the “tessdata” folder is: it contains preprocessed data for a certain language so Tesseract can recognize that language. Download language data from: http://code.google.com/p/tesseract-ocr/downloads/list. Check the sub-instructions below to add it the right way (not the default Xcode way…)
  - Right-click your project/group in Xcode
  - Choose “Add files to your project”
  - Select the “tessdata” folder
  - In the same window, check “Create folder references for any added folders”. This is the most important step, as it instructs Xcode to add your “tessdata” folder as a regular folder (a resource, as well), not as an Xcode project group.
- Create your TessBaseAPI object with the code below to start playing with it!
- Make sure that every source file that includes/imports Tesseract header files, directly or indirectly (by including/importing a file that includes/imports them), has the .mm extension instead of the regular .m. This allows the compiler to interpret the Tesseract headers as C++ headers.
// Set up the tessdata path. This is included in the application bundle
// but is copied to the Documents directory on the first run.
NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentPath = ([documentPaths count] > 0) ? [documentPaths objectAtIndex:0] : nil;
NSString *dataPath = [documentPath stringByAppendingPathComponent:@"tessdata"];
NSFileManager *fileManager = [NSFileManager defaultManager];
// If the expected store doesn't exist, copy the default store.
if (![fileManager fileExistsAtPath:dataPath]) {
    // get the path to the app bundle (with the tessdata dir)
    NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
    NSString *tessdataPath = [bundlePath stringByAppendingPathComponent:@"tessdata"];
    if (tessdataPath) {
        [fileManager copyItemAtPath:tessdataPath toPath:dataPath error:NULL];
    }
}
setenv("TESSDATA_PREFIX", [[documentPath stringByAppendingString:@"/"] UTF8String], 1);
// init the tesseract engine.
tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
Well, that’s it! I hope you can reproduce this. I also provide for download an Xcode 4.2 / iOS SDK 5 project with Tesseract configured and already recognizing a sample image; check it out if you have any trouble following this how-to.
Files for Download
- build_dependencies.sh
- Sample Xcode project
- Xcode 4.2
- iOS SDK 5
- Leptonica 1.68
- Tesseract v3.01
Final Considerations
I really hope you guys have enjoyed it and if you have any opinion, compliment, suggestion or just wanna state something, feel free to comment, I’ll try to approve it ASAP.



Great stuff Suzuki .. I’ll try it later today
Thank you very much for the reply, and a big hug!
Hey thanks a bunch for this, the same thing just happened to me on tesseract3.0 + iOS 5 upgrade
I was also wondering if you had any luck training Tesseract. For some reason I am having a really tough time training it for a select few fonts.
Thanks for this, you seem to be interested in the same type of projects as I am possibly?
I haven’t tried to train tesseract since I found it to be really painful and got regular to very good results with the supplied language files…
If you’re interested in mobile applications that try to empower the user (in the “be able to do more” sense), then yes, we may have the same tastes.
Please, I’m integrating Tesseract with OpenCV. I wonder how to make an object from Tesseract in order to use its methods in another class, so the question is: how do I make an object from Tesseract?
It’s as simple as encapsulating it inside a custom class and creating functions that call the Tesseract C-functions to do the job. Did I understand your question?
When I look at this link, http://code.google.com/p/tesseract-ocr/wiki/ReleaseNotes, for version 3.01, I find that this version has many features that I want to benefit from. The question: is the sample Xcode project that you attached to this post already configured to allow all these features, or does it only recognize an image? If yes, how do I use them in my project? If no, how do I configure these features on iOS (SDK 5)?
thanks
yes you did it, thanks
Yeah, the base eng.traineddata is good; however, for new pictures of text the image quality changes dramatically, and even with image processing and cleaning up, the results are far from usable as reliable information, especially for targeted document types. I only want to train 2 fonts after my image processing to test accuracy on image picker results. My older 3.0 version, without a lot of work, was around 80% accurate on non-trained fonts, but on Times New Roman it was 100% from a picture. It’s a shame the training process is so painful, especially when in theory it should not be that difficult lol. If you’re interested in training at all, or need some pre-processing tips, keep me in the loop; I’ll let you know how it all works out shortly. Hoping to finish this and release in 2 weeks. But thanks again for this, I was dreading doing a new compile for iOS!
Nice!
Let me know if you get to train Tesseract and if that actually helps with accuracy; I haven’t seen anyone report success with that. Guess that could be used as material for a new post.
Good luck!
Sorry to blow up your thread on possible off topics however.. I have tried converting this to run off the camera roll or image picker and noticed that leptonica pix.h methods are completely new and restricting my ability to run any size image I want. Are there any restrictions to sizing or ways to overcome this instead of using the older image sizing methods aside from leptonica?
You don’t really need to use Pix at all. You can work with UIImage, scale or modify it the way you want and then convert the UIImage bitmap data to Pix in order to use it with Tesseract.
I’ve based my code for converting from/to IplImage (OpenCV equivalent of Leptonica’s Pix) on:
- http://stackoverflow.com/questions/4545237/creating-uiimage-from-raw-rgba-data
- http://stackoverflow.com/questions/1298867/convert-image-to-grayscale
Guess you can work on that to create your UIImage to Pix conversion.
Regards
Just a small thanks, you saved me a lot of time.
Excellent tutorial, worked perfectly!
Hey, I’ve updated your script to also include libtiff. Libtiff is useful because it lets you convert Pix* structures to UIImages using this code I found on Stack Overflow: http://stackoverflow.com/questions/9013475/create-uiimage-from-leptonicas-pix-structure
Here’s the updated build_dependencies.sh
http://pastebin.com/gE59pVgk
Great tutorial…. thanks
Great tutorial.. Was fully lost with several kinds of errors trying to build tesseract
It would be good if you let us know how you used OpenCV with tesseract
Thanks a lot!!!!!
Hi,
I tried your steps to build 4.3, but I don’t think I’m able to build it properly. So many errors pop up when trying to link to an Xcode project. Moreover, there are fewer header files in dependencies/include/tesseract than you have. I’ve posted the logs of the build script. Can you help, please? I’ve been trying to build this library and use it for the last 8 hours straight and badly need your help. No idea where I am going wrong.
The following I output to a txt file. I’ve also pasted the error messages coming from the terminal.
*****************************************
Edit: WAY TOO BIG!!!
configure: error: C preprocessor "/Developer/Platforms/iPhoneSimulator.platform/Developer/usr/bin/cpp-4.2" fails sanity check
See `config.log' for more details.
make: *** No targets specified and no makefile found. Stop.
ar: creating archive libtesseract_all.a
/Developer/usr/bin/ranlib: file: libtesseract_all.a(bmpiostub.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(gifio.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(jpegio.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(leptwin.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(pdfiostub.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(pngio.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(pnmiostub.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(psio1stub.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(psio2stub.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(tiffio.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(webpio.o) has no symbols
/Developer/usr/bin/ranlib: file: libtesseract_all.a(zlibmemstub.o) has no symbols
lipo: specifed architecture type (armv6) for file (./outdir/arm6/liblept.a) does not match its cputype (16777223) and cpusubtype (3) (should be cputype (12) and cpusubtype (6))
lipo: specifed architecture type (armv6) for file (./outdir/arm6/libtesseract_all.a) does not match its cputype (16777223) and cpusubtype (3) (should be cputype (12) and cpusubtype (6))
cp: ./outdir/lib*.a: No such file or directory
Hi abdullah
Can you tell me how you solved this problem? I am getting the same error..
/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/ranlib: file: .libs/liblept.a(leptwin.o) has no symbols
Thanks,
Hi,
Please disregard my last message. I’ve built it fine. Too many hours of work made me a little impatient with Tesseract.
Now trying to integrate with Xcode project – View controller based. Getting this error :
Command /Developer/usr/bin/lex failed with exit code 1
Trying to fix it . Thanks
Do you have any other output about this error on Xcode? Even the command that is generating this would help.
Regards
If you build the library this way using the iOS 5 SDK, can an app using the library run on iOS 4.3 devices?
Your question has a broader question/answer:
Can iOS 5 SDK Applications run on iOS 4 devices?
Definitely yes. If you use Base SDK iOS 5 (latest one) for compiling, your App/library can run on lower iOS devices (4.*, ie) if the iOS Deployment Target is <= than 4.* (this allows the App to be installed on such device) AND you don't call iOS 5 APIs (this would cause crashes).
Tesseract and Leptonica use no iOS functions AT ALL. So you're safe with compiling them against iOS 5 SDK.
To better understand "Base SDK" and "iPhone OS Deployment Target", take a look at:
http://iphonedevelopertips.com/xcode/base-sdk-and-iphone-os-deployment-target-developing-apps-with-the-4-x-sdk-deploying-to-3-x-devices.html
Regards
Hi, thank you for the tutorial!
I Have a problem, i have copied folder “dependencies” and “tessdata” in my Xcode project from your Xcode example (that work for me), but I have this error in this part
namespace tesseract {
class TessBaseAPI;
};
-> expected ; after top level declarator
Can you help me?
You can try changing your .m files that are including Tesseract stuff to .mm extension. Having .m files including C/C++ files causes problem because .m files are compiled as strict C and Objective-C source files.
If that won’t work, you can also try to change the project I’ve provided and turn it into yours…
Regards,
Suzuki
I have already changed .m file in .mm and not work (compile but when I #import the .h file in another class give me the previous error)
I have solved in this way:
Change “namespace tesseract { …. };” in:
#ifdef __cplusplus
#include "baseapi.h"
using namespace tesseract;
#else
@class TessBaseAPI;
#endif
and change tesseract::TessBaseAPI *tesseract;
in : TessBaseAPI *tesseract;
In this way my project works. Is correct?
I guess it is correct and a very nice way to solve the problem. I’ll use it myself.
Thanks and I’m glad you worked it out!
Thank you for this tutorial! =)
I’ve downloaded the Sample Xcode project, but it didn’t work for me. When I try to compile the project, more than 50 errors are shown, specifically on this code: @class MBProgressHUD;
namespace tesseract {
class TessBaseAPI;
};
Can u help me to just compile this project ??
tks.
Have you tried using Andrea fix at the comment above?
http://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/#comment-161
Give it a try and let me know if it works for you.
Regards
Do you have a PayPal account? I want to send you some money!
You’re a life savior!!!
Hi Deobrat. PayPal? For sure! Just donate to tinsukesuzuki@gmail.com from:
https://www.paypal.com/br/cgi-bin/webscr?cmd=_send-money
hi Suzuki
firstly I’d like to apologize for my poor English if you can’t read my question smoothly.
I’ve read your instructions to cross-compile Tesseract but it didn’t work well
(of course, I wasn’t good enough at this)
but finally I decided to download your sample project and fix it up a bit
so now I can get going with Tesseract on iOS
but the problem is I want to enable Tesseract to recognize handwriting
so I made my own hand.traineddata
but I don’t know how to make Xcode use my hand.traineddata
I changed this once:
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "hand");
and already added hand.traineddata to the “tessdata” folder
and Xcode still shows me an “error”
I know that I have to set the PATH for Xcode to read my hand.traineddata but I don’t know how to set it
thanks very much
OK, so the provided sample works fine for you and you’re trying to add your own traineddata to it, right?
As far as I know, the described procedure should have worked fine and your traineddata file should have been loaded. You can try to see if some of the traineddata supplied by tesseract other than the english one works (http://code.google.com/p/tesseract-ocr/downloads/list), but I can’t help you with that since I haven’t tried to load different traineddata files.
Besides that, I’d take the statement by Ray Smith (Tesseract developer) as strong advice, in the “limited” sense:
“Tesseract was never designed for handwriting, but people have been successful to a limited extent in retraining it for handwriting.”
Regards
finally I could solve this problem
I printed the error and saw the path that your provided sample uses
and just copied the .traineddata to that folder
so the problem is successfully fixed
and it also works when I deploy it to my iPad
I know this may be a very simple solution
but I’m just posting it for the others
who hit the same problem as me
Regards
pich
Hi if possible can u provide me a video regarding this tutorial
I don’t think that recording this tutorial would be useful. Are you having problems with any specific point?
Great tutorial, very helpful, thanks a lot!
Huge help, you’re the man Suzuki!
Thanks a lot & happy new year !!
I had to do a really quick proof-of-concept and was given a link to RCarlsen’s Pocket OCR sample and after reading through several blogs (including yours) I was really happy to see that your sample project includes pre-compiled libraries – WHAT A TIME SAVER !!
I’ll probably be back one day and will have to compile some library on my own, but importing your dependencies folder into Pocket OCR will enable me to use the picture roll directly for making some custom tests without having to bother to compile libraries on my own, so your sample project makes my day !!
Keep up the good work !!
Amazing tutorial complete with explanations and a sample project. Thank you for everything and keep up the great work!
Very interesting post !
I am trying to build vlc for iOS (vlc is an open source multimedia player) http://wiki.videolan.org/MobileVLC
The provided build script are outdated so I am trying to upgrade them to match new ios version
During the configure step , I got the following error :
configure: error: C preprocessor "/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/llvm-cpp-4.2" fails sanity check
export CPP="${DEVROOT}/usr/bin/llvm-cpp-4.2"
Would you have any guess about this problem?
Try changing from llvm-cpp-4.2 to llvm-gcc-4.2, if that works the explanation would be that llvm-cpp isn’t a C compiler…
I have the same issue here (I am using Xcode 4.2/iOS 5), also trying to compile Tesseract, and the problem is with leptonlib:
configure: error: C preprocessor "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer//usr/bin/llvm-gcc" fails sanity check
See `config.log' for more details.
Checking config.log, this is what I find:
configure:5841: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer//usr/bin/llvm-gcc -arch armv7 -pipe -no-cpp-precomp -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer//SDKs/iPhoneOS5.1.sdk -miphoneos-version-min=3.2 -I/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer//SDKs/iPhoneOS5.1.sdk/usr/include/ -I/Users/miguel/Uni-Local/Praktikum/build/dependencies/include -L/Users/miguel/Uni-Local/Praktikum/build/dependencies/lib conftest.c
conftest.c:14: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘error’
configure:5841: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "leptonica"
| #define PACKAGE_TARNAME "leptonica"
| #define PACKAGE_VERSION "1.68"
| #define PACKAGE_STRING "leptonica 1.68"
| #define PACKAGE_BUGREPORT "dan.bloomberg@gmail.com"
| #define PACKAGE_URL ""
| /* end confdefs.h. */
| #ifdef __STDC__
| # include <limits.h>
| #else
| # include <assert.h>
| #endif
| Syntax error
configure:5871: error: in `/Users/miguel/Uni-Local/Praktikum/build/leptonlib-1.67':
configure:5874: error: C preprocessor "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer//usr/bin/llvm-gcc" fails sanity check
See `config.log' for more details.
Environment:
setenv_all()
{
    # Add internal libs
    export CFLAGS="$CFLAGS -I$GLOBAL_OUTDIR/include -L$GLOBAL_OUTDIR/lib"
    export CPP="$DEVROOT/usr/bin/llvm-gcc"
    #export CXX="$DEVROOT/usr/bin/g++-4.2"
    export CXX="$DEVROOT/usr/bin/llvm-g++"
    export CC="$DEVROOT/usr/bin/llvm-gcc"
    export CXXCPP="$DEVROOT/usr/bin/llvm-g++"
    #export CC="$DEVROOT/usr/bin/gcc-4.2"
    export LD=$DEVROOT/usr/bin/ld
    export AR=$DEVROOT/usr/bin/ar
    export AS=$DEVROOT/usr/bin/as
    export NM=$DEVROOT/usr/bin/nm
    export RANLIB=$DEVROOT/usr/bin/ranlib
    export LDFLAGS="-L$SDKROOT/usr/lib/"
    export CPPFLAGS=$CFLAGS
    export CXXFLAGS=$CFLAGS
}
setenv_arm7()
{
    unset DEVROOT SDKROOT CFLAGS CC LD CPP CXX AR AS NM CXXCPP RANLIB LDFLAGS CPPFLAGS CXXFLAGS
    export DEVROOT=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer
    # export DEVROOT=/Developer/Platforms/iPhoneOS.platform/Developer
    export SDKROOT=$DEVROOT/SDKs/iPhoneOS$IOS_BASE_SDK.sdk
    export CFLAGS="-arch armv7 -pipe -no-cpp-precomp -isysroot $SDKROOT -miphoneos-version-min=$IOS_DEPLOY_TGT -I$SDKROOT/usr/include/"
    setenv_all
}
Any help would be appreciated.
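A note for readers hitting this “fails sanity check” error: in the config.log above, the compiler rejects the literal text Syntax error from the test file (conftest.c:14), which suggests that CPP is being run as a compiler rather than as a preprocessor. One possible fix, not verified on this toolchain, is to make CPP a preprocess-only command, for example CPP="$CC -E". The sketch below reproduces the idea of the check so a candidate CPP value can be tried by hand; the function name cpp_sanity_check is hypothetical and this is a simplification of what configure really does:

```shell
# Hypothetical helper: run a candidate preprocessor command on a test file
# like the one shown in config.log. A real preprocessor passes the stray
# "Syntax error" tokens through; a compiler stops on them.
cpp_sanity_check() {
    cpp_cmd="$1"
    tmp="$(mktemp -d)" || return 2
    cat > "$tmp/conftest.c" <<'EOF'
#ifdef __STDC__
# include <limits.h>
#else
# include <assert.h>
#endif
Syntax error
EOF
    # $cpp_cmd is left unquoted on purpose so "gcc -E" splits into words.
    if $cpp_cmd "$tmp/conftest.c" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmp"
    return "$status"
}

# Example: cpp_sanity_check "$CC -E" && echo "usable as CPP"
```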
I am having the same error with Xcode 4.2/iOS 5.1, also trying to compile Tesseract (it fails while building leptonlib).
I have changed CXX/CPP many times but none of them works… same problem :S!
Any help appreciated.
Yeaahh.. amazing post! Thanks!
Is it possible to compile Tesseract with your script without Leptonica? I am using OpenCv in my project, so there is no need of Leptonica…
I haven’t searched for how to compile Tesseract w/o Leptonica, but if it is possible, there must be some parameter to be passed to Tesseract’s configure call. Take a look at it and modify the way it gets called by the build script.
Good Luck!
The readme states : “Leptonica is required and provides image I/O and processing”… so doubt it.
btw thanks Suzuki! Spent 5 hrs trying to fix the build script myself and then I came across this page
Are you with Google now, and if so you think they’ll incl. support for osx/ios in the future releases?
Well, I’m not with Google, though I’d be more than happy to contribute with OSX/iOS support or so. And, for future releases, note that tesseract project only provides pre-compiled binaries for Windows, so I bet it will stay like that: pre-compiled/easy installer for Windows, compatible code for compiling for other platforms.
Using Thai (tha) and I notice simplified Chinese (chi_sim) the tesseract code seg faults while processing the image. Someone created an issue: http://code.google.com/p/tesseract-ocr/issues/detail?id=502 though in my testing it seg faults in a different function. Anyone have any pointers for debugging? I’d love to be able to step through the tesseract library code in Xcode.
Actually english doesn’t work 100% for me either. With certain images I get the same crash which occurs in tesseract::Classify::ComputeIntCharNormArray. Is anyone actually using this with iOS 5 successfully beyond a few test images?
Thanks for this info. One question: in your sample project you included the dependencies group with the header files and static libs of leptonica and tesseract. Are the static libs in there fat libs? In other words can I use those for both the iPhone simulator and a real iPhone? Or do I still have to build my own following your tutorial?
P.S. sorry I first posted this in the wrong topic.
Yes, they are fat files, there is no need to recompile them if you’re not willing to change compilation directives (for size or performance optimizations) or some other compilation-time option.
Okay thanks!
Awesome tutorial! I downloaded your sample and it works for me, but when i trying to include in my own project some problem occurs. There are no error while compiling but it crash my app. The image below is the screenshot of the error.
http://dl.dropbox.com/u/2305062/tesseract%20error.png
Any idea with this?
Are you sure that the required image (Lorem_Ipsum_Univers.png) is also in your project’s resources? Or, if you’re using your own images, place a breakpoint at the setTesseractImage: function and check if your (UIImage *)image is valid. As this code is only for demonstration purposes, I haven’t placed error checks.
Yes it is UIImage. The image is displayed in the Center image view. But when OCR take place the error just pops up. I’ll try debug it later if it still can’t maybe I juz use ur sample file and move all my project into it. I would like to ask what process do I need to include to OCR normal capture image. I added camera and camera roll to your sample and tried a image from camera the OCR output is quite messy. What I know is I have to do some preprocessing to the image. I would like to ask is there anyway to decrease its sensitivity?
That (image preprocessing) is, actually, the real challenge behind getting Tesseract to play well with mobile camera pictures. I have no great advice to give, but I know this is totally possible because apps like ScanBizCards (http://www.scanbizcards.com/) can do it pretty well. If you find any hint on how to do this, please share it with us.
Good Luck!
make sure that you checked the “Create folder references for any added folders” checkbox when you add the tessdata folder to your project,
GoodLuck.
awesome Suzuki. Adigato! You saved my lazy ass a lot of work
Hi, I have a weird problem and I’m looking for help. I used AV Foundation to capture a image of text, just few words, to UIImage (JPG). If I save that photo to iPhoto Library and load it back then use Tesseract, it works no problem. However, if I pass the UIImage directly from AV Foundation output, it didn’t work, the resulting text from image is just random characters.
Thank you very much!
I’d bet that the problem is in your “passing UIImage to Tesseract” code, because Tesseract understands binary (or RAW) image representation in a variety of formats, but you gotta correctly specify it. This AV Foundation image of yours may not be on the format that your specific UIImage->Tesseract code is expecting, resulting in erroneous image interpretation.
On the other hand, when you save your AV Foundation image to JPG and load back to UIImage, it may have been converted on the save or the load step to the format you’re expecting on your UIImage->Tesseract code.
Pay attention to your images pixel data format.
Best regards!
Thanks for your response. I finally figured it out, the image rotation is the cause.
Have a nice day!
THANK YOU!!
Finally a script that works. I have it up and running now. For anyone having problems with the /Developer build path, Apple moved the entire folder into /Applications/Xcode. Here is an updated script
http://pastebin.com/xyUt3c84
HI I am trying to build your sample project by setting the target compiler to LLVM GCC 4.2 and run into a lot of compile time errors. Is this possible? Thanks
Can you please help me out with exact description about building simple OCR android application. its for my project submission.
Thanks in advance !!
Sorry pal, can’t do that. I can just give you the overall idea of what I’d try to do: compile Tesseract for the ARM architecture of your target device and try to use it from Java using JNI.
Hi Suzuki,
I would like to get the debug information of the OCR process, for example, the bounding box around the recognized words. Could you please show me how to get that info?
Thank you
hungtrv, here’s what I do:
- (UIImage*)image:(UIImage*)image withBoxa:(Boxa*)boxa
{
    UIGraphicsBeginImageContext(image.size);
    [image drawAtPoint:CGPointZero];
    CGContextRef ctx = UIGraphicsGetCurrentContext();
    [[UIColor blueColor] setStroke];
    // boxa->n is the number of boxes stored in the Boxa
    for (int i = 0; i < boxa->n; i++)
    {
        Box *b = boxa->box[i];
        CGRect asRect = CGRectMake(b->x, b->y, b->w, b->h);
        CGContextStrokeRectWithWidth(ctx, asRect, 2.0);
    }
    UIImage *newImg = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    return newImg;
}
then, with the original image passed to tesseract, do something like:
Pixa *textImages = 0;
Boxa *textLines = tesseractPtr->GetTextlines(&textImages, NULL);
originalWithBoxes = [self image:originalUIImage withBoxa:textLines];
pixaDestroy(&textImages);
boxaDestroy(&textLines);
Hope that helps..
It works perfectly, thank you very much matt h.
What’s the variable n in the for loop? Does this method return only the text lines in the image (i.e. neglecting any graphics content in the image)?
Thanks,
It is a great tutorial, but while running build_dependencies.sh I got a few errors:
configure: WARNING: If you wanted to set the --build type, don't use --host.
If a cross compiler is detected then cross compile mode will be used.
checking build system type... i386-apple-darwin11.3.0
checking host system type... arm-apple-darwin6
checking for arm-apple-darwin6-gcc... /Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/llvm-gcc
checking whether the C compiler works... no
configure: error: in `/Users/jjaideep2000/project/tesseract/build/leptonica-1.68':
configure: error: C compiler cannot create executables
See `config.log' for more details.
test.sh: line 103: make: command not found
cp: src/.libs/lib*.a: No such file or directory
...
That’s a generic error the script hits when it uses the wrong C compiler:
checking whether the C compiler works... no
I’m gonna need more info, like the Xcode and iOS SDK versions you’re using, to reproduce that.
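While gathering that info, the real reason is usually spelled out in config.log right after the failed check. A small sketch for pulling that part out (the function name and the default of 15 context lines are arbitrary choices of mine, not something from the build script):

```shell
#!/bin/bash
# Print the "checking whether the C compiler works" entry of an autoconf
# config.log together with the lines that follow it, which normally hold the
# exact compiler command that was tried and its error output.
show_compiler_check() {
    local log="${1:-config.log}"
    local context="${2:-15}"
    if [ ! -f "$log" ]; then
        echo "no such file: $log" >&2
        return 1
    fi
    grep -n -A "$context" 'checking whether the C compiler works' "$log"
}
```

Run it from the folder named in the error message, e.g. show_compiler_check config.log 20 inside build/leptonica-1.68. The "make: command not found" line in the log above is a separate issue: it means make itself is not on the PATH.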
How do I use DetectOS (automatic page orientation) and page segmentation in Tesseract 3.01 under iOS 5?
This is amazing. Thanks a lot man, you were a big help for me. Thanks for the effort.
@Maciej Swic
comment:
(http://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/#comment-227)
Thank You! I knew someone would have already gone to the effort of adjusting the script for the XCode 4.3 changeover.
I think the lines for setting SDKROOT in setenv_arm6(), setenv_arm7(), setenv_i386() should remain:
export SDKROOT=$DEVROOT/SDKs/iPhoneOS$IOS_BASE_SDK.sdk
and then just update IOS_BASE_SDK to:
line 9: IOS_BASE_SDK="5.1"
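A sketch of how that SDKROOT line could be guarded, so a wrong IOS_BASE_SDK fails right away instead of much later inside configure. The helper names are hypothetical; the platform name is a parameter only because I assume the simulator setup points at a differently named SDK folder.

```shell
#!/bin/bash
# Compose the SDK path the same way the script does:
#   $DEVROOT/SDKs/<platform><version>.sdk
sdk_root_for() {
    local devroot="$1"
    local platform="$2"
    local version="$3"
    printf '%s/SDKs/%s%s.sdk\n' "$devroot" "$platform" "$version"
}

# Same as above, but complain and return 1 when that SDK is not installed.
require_sdk_root() {
    local root
    root="$(sdk_root_for "$@")"
    if [ ! -d "$root" ]; then
        echo "SDK not found: $root (check IOS_BASE_SDK)" >&2
        return 1
    fi
    printf '%s\n' "$root"
}
```

Inside setenv_arm7(), for example, this would read: SDKROOT="$(require_sdk_root "$DEVROOT" iPhoneOS "$IOS_BASE_SDK")" || exit 1, followed by export SDKROOT.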
@Suzuki
Epic praise for your work! Your cross-compile and FAT linking tips are going to make bringing just about any OpenSource code I find into my iOS project 1000x easier.
You’re most welcome Josh. I’m really glad people are getting to use OpenSource libraries easier with this tool, not just Tesseract.
Thanks so much for the awesome instructions, AND the binaries! One snag that I ran into when trying to make use of the library was that I got a “Member access into incomplete type…” error when I tried to use BaseAPI::ResultIterator. This is a really useful tool to be able to access standard OCR metadata like character-level confidences. I figured out that just adding the header files ResultIterator.h and PageIterator.h from the Tesseract source made it work, so those might be handy to include too…
Note taken! I just didn’t want to include every Tesseract Header file in order to avoid confusing people, and, as I ain’t a Tesseract expert, I just added the ones needed to make plain OCR work.
Thanks a lot. You just saved my life.
How do I remove non-text areas from a scanned image using Tesseract 3.01? I see the two methods (segmentPage) in tesseractClass.h and (remove_non_text_area) in osdetect.h, but I don’t know how to use them. Please help me with that.
Hi Eslam,
Tesseract provides ways to get the OCR result as a list of bounding boxes with the recognized strings for each box. You could create a new image with just the region determined by those boxes.
Take a look at TessBaseAPI::GetRegions(Pixa** pixa).
And as a general tip, read the library API a little bit more before asking for directions.
Good luck,
Suzuki
You mean that Tesseract already ignores non-text areas? And I have to take the bounding boxes (containing the resulting text), put them in another image, and then apply Tesseract again on the new image?
The method GetRegions(Pixa** pixa) returns a struct of type Boxa; the question is how to put the value returned from this method into a UIImage?
If you encounter problems using this script and you’re using the latest Xcode from the App Store:
The developer folder is now in /Applications/Xcode.app/Contents/Developer/Platforms; adjust the script accordingly.
Do not forget to update the SDK version to 5.1 or whatever new version you have when you read this.
Another problem I encountered was that automake was missing (aclocal, to be precise). If you’re in that situation:
curl -O http://mirrors.kernel.org/gnu/automake/automake-1.11.tar.gz
tar xzvf automake-1.11.tar.gz
cd automake-1.11
./configure --prefix=/usr/local
make
sudo make install
You may need to install libtool the same way.
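To find out up front which of these tools are missing, a quick check like the following sketch can be run before the build script. The function name is made up, and the tool list in the usage line is only my guess at what autogen.sh and the script need on a given machine.

```shell
#!/bin/bash
# Print every named tool that cannot be found on the PATH.
# Returns 1 if at least one is missing, 0 otherwise.
missing_tools() {
    local tool
    local rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: missing_tools aclocal automake autoconf libtoolize make. Anything it prints has to be installed (or added to the PATH) before autogen.sh can run.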
Hi Suzuki,
I’m trying to get the Apache Portable Runtime library on iOS. Do you have any experience with that, or do you think it will work with your script?
Hi Frank,
Well, if the library contains a makefile compatible with iOS (or at least not platform-tied), the script (with some adjustments) should work fine.
Please let us know if it works with the camera and, if so, how, and what the limits are.
Thanks for your great work.
Sir, please let us know why it works like a charm on the iPhone simulator but not on the iPhone. Do we need to do anything else to make it run on the iPhone?
What do you mean by not working on iPhone? Could you be more specific? Build/compile problems or not recognizing at runtime?
Regards,
Suzuki
Hi, I made these changes:
I changed “namespace tesseract { …. };” to:
#ifdef __cplusplus
#include "baseapi.h"
using namespace tesseract;
#else
@class TessBaseAPI;
#endif
and changed tesseract::TessBaseAPI *tesseract;
to: TessBaseAPI *tesseract;
But now when I try to compile lines like:
tesseract->SetImage(…);
or
tesseract->Recognize();
the compiler gives me these errors:
‘TessBaseAPI’ does not have a member named ‘SetImage’
‘TessBaseAPI’ does not have a member named ‘Recognize’
What did I do wrong?
thx a lot lenny
Have you tried changing “@class TessBaseAPI;” to “class TessBaseAPI”? Because the one with a @ is Objective-C specific and TessBaseAPI is a C/C++ class…
Good luck,
Suzuki
OK, the problem was that I had to rename the .m file that includes the Tesseract header to .mm.
My fault!
Thanks
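For anyone hitting the same errors: a .m file is compiled as plain Objective-C, so it cannot use a C++ header like baseapi.h, while a .mm file is compiled as Objective-C++. A rough sketch for finding and renaming such files from the command line follows. The function names are hypothetical, it only catches files that include the header directly, and renaming inside Xcode is probably the safer route because the project file references the old name.

```shell
#!/bin/bash
# List the .m files under a directory that directly #include or #import
# the given header.
objc_files_including() {
    local dir="$1"
    local header="$2"
    grep -rlE --include='*.m' "#(include|import).*${header}" "$dir"
}

# Read file names on stdin and rename each foo.m to foo.mm.
rename_to_mm() {
    local file
    while IFS= read -r file; do
        mv "$file" "${file%.m}.mm"
    done
}
```

Usage, assuming the sources live in a folder called Classes (adjust to your project): objc_files_including Classes baseapi.h | rename_to_mm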
@Suzuki, it shows 90% accuracy for my data when I pick an image from my phone’s library, but if I take the same image with the camera and use it, it shows jumbled letters. Kindly advise if something can be done to make it work with camera images as it does with gallery images. Thanks
Hello
I followed all the steps and the script ran successfully; however, the lib folder under dependencies is empty.
I would need more details (like the portion of the script output with a warning or failure message) to determine what your problem was.
Hi, Suzuki
The URL for the script and the sample is invalid.
Could you share it again?
thanks.
Well man, I have just tested it and I was able to download the files through the post links…
Give it another try and let me know if it doesn’t work for you.
Regards,
Suzuki
Thanks for your reply.
I just checked and found it was a network problem.
Hi, Suzuki
I downloaded the project and it runs well.
I also tried creating a new project using the sources you built, and it runs well too.
However, when I add the sources I built myself, I get this error:
‘pageiterator.h’ file not found
in dependencies/include/tesseract/baseapi.h.
I compared my baseapi.h with yours and found that mine has more code.
For example, its includes are:
#include "platform.h"
#include "apitypes.h"
#include "thresholder.h"
#include "unichar.h"
#include "tesscallback.h"
#include "publictypes.h"
#include "pageiterator.h"
#include "resultiterator.h"
Moreover, my lib folder also has more files than yours: 18 files.
I don’t know where to look to solve this problem.
Could you help me with some suggestions?
BTW, my Xcode is 4.3.2.
thanks.
Oh, sorry.
I just found out that I was using the latest source code, which is 3.02.
I’ll try 3.01 later.
Sorry for my mistake.
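Whichever Tesseract version you build, a "file not found" error like the one above means a header that baseapi.h includes was not copied into the include folder. Here is a minimal sketch that lists such headers. The function name is made up, and it only looks at quoted includes that are expected to sit next to the header itself.

```shell
#!/bin/bash
# Print every quoted #include of a header that has no matching file in the
# header's own directory. Returns 1 if at least one is missing.
missing_local_includes() {
    local header="$1"
    local dir name names
    local rc=0
    dir="$(dirname "$header")"
    # Extract the file name from each line of the form: #include "name.h"
    names="$(sed -n 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*"\([^"]*\)".*/\1/p' "$header")"
    for name in $names; do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: missing_local_includes dependencies/include/tesseract/baseapi.h. Each name it prints has to be copied over from the Tesseract source tree, as was already suggested above for ResultIterator.h and PageIterator.h.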