2014/10/05

Build Tesseract-OCR 3.02.02 with MinGW and CMake

The steps in this article were carried out:
  • under Windows 7 Home Premium,
  • with CMake 2.8,
  • with Qt 5.1 and its bundled MinGW 4.8,
  • with a standalone MinGW (without Qt).


Tesseract OCR source code

Download tesseract-ocr-3.02.02.tar.gz and extract it.

Leptonica library

From the Leptonica web site:
Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.
Leptonica is quite tedious to build because of all its dependencies. Fortunately, someone did this work for us.

Here is the link to his repository: https://github.com/zdenop/tesseract-mingw

Many thanks to zdenop for saving us time!

Download the following libraries from the bin folder:
  • libgif-4.dll
  • libjbig-1.dll
  • libjpeg-8.dll
  • liblept-3.dll: the Leptonica library.
  • libpng15-15.dll
  • libtiff-3.dll
  • libtiffxx-3.dll
  • libwebp-2.dll
  • zlib1.dll
You may have noticed that a libtesseract-3.dll is also available. I tried to use it in my projects, but it didn't work; that's why I decided to build Tesseract myself.

You also need the Leptonica source code, for its header files. I didn't use the header files from zdenop's repository, but you could try them; I used the original headers from Leptonica 1.69.

Extract the Leptonica archive, create a bin directory inside the extracted folder, then copy all the libraries listed above into it.
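Before going further, it can be worth checking that the bin directory is complete. Here is a small C++17 sketch that returns the DLLs from the list above that are still missing. The function name missing_dlls is a hypothetical choice, and std::filesystem needs a C++17 compiler, so a more recent one than the MinGW 4.8 used in this article.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of the original setup): returns the DLLs
// from the list above that are not present as regular files in bin_dir.
std::vector<std::string> missing_dlls(const fs::path& bin_dir) {
    static const std::vector<std::string> required = {
        "libgif-4.dll",    "libjbig-1.dll",   "libjpeg-8.dll",
        "liblept-3.dll",   "libpng15-15.dll", "libtiff-3.dll",
        "libtiffxx-3.dll", "libwebp-2.dll",   "zlib1.dll"};
    std::vector<std::string> missing;
    for (const auto& name : required) {
        // The error_code overload never throws: a missing directory
        // simply reports every DLL as missing.
        std::error_code ec;
        if (!fs::is_regular_file(bin_dir / name, ec)) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

For example, missing_dlls("D:/prog/ocr/leptonica-1.69/bin") should return an empty vector once the copy is done; that path simply matches the LEPTONICA_DIR shown in the CMakeLists.txt section and is only an example.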

CMake

I use CMake version 2.8.

MinGW

Installing MinGW is beyond the scope of this article; there are many tutorials on the subject.

I use MinGW 4.8, as supplied with Qt 5.1. All the necessary tools come already installed.

If you don't have Qt installed and don't need it, you'll have to download the MinGW C/C++ development packages to build the project.

Environment batch file

We'll name it env.bat. It prepends the MinGW bin directory to the PATH environment variable, then opens a new command prompt that inherits this PATH.
@ECHO off

SET PATH=c:\your\path\to\mingw\bin;%PATH%

START %SYSTEMROOT%\system32\cmd
Example:
SET PATH=c:\mingw\bin;%PATH%

Another example, for Qt users:
SET PATH=D:\Programs\Qt\Qt5.1.0\Tools\mingw48_32\bin;%PATH%


CMake batch files

If you use the MinGW bundled with Qt: cmake.bat.
@ECHO OFF

rmdir /s /q CMakeFiles
del /f /q CMakeCache.txt

REM A trailing ^ glues the next line on directly, so continuation lines start with a space.
cmake^
 -G "Unix Makefiles"^
 .
If you use a standalone MinGW (without Qt): cmake_noqt.bat.
@ECHO OFF

rmdir /s /q CMakeFiles
del /f /q CMakeCache.txt

cmake^
 -G"Unix Makefiles"^
 -D"CMAKE_MAKE_PROGRAM:PATH=C:/MinGW/bin/mingw32-make.exe"^
 .
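Both scripts first delete CMakeFiles and CMakeCache.txt, so that CMake configures the project from scratch. They also assume that the CMake bin directory (containing cmake.exe) is in your PATH. If it isn't, add a line to your env.bat.

For example:
SET "PATH=C:\CMake 2.8\bin;%PATH%"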
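If cmake or mingw32-make still isn't found, it helps to remember how a command is looked up: PATH entries are separated by ';' and tried from left to right, and the first match wins, which is why env.bat prepends its directory. Here is a small C++17 sketch of that lookup, handy for checking a PATH value by hand. The function find_on_path and its injectable exists check are hypothetical, and the model is simplified: cmd.exe also looks in the current directory first and tries the PATHEXT extensions, which the sketch leaves out.

```cpp
#include <filesystem>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

// Hypothetical helper: returns the first "<dir>\<exe>" for which exists()
// is true, scanning the ';'-separated entries of path_value left to right.
std::optional<std::string> find_on_path(
    const std::string& path_value, const std::string& exe,
    const std::function<bool(const std::string&)>& exists) {
    std::istringstream entries(path_value);
    std::string dir;
    while (std::getline(entries, dir, ';')) {
        // This sketch strips double quotes around an entry.
        if (dir.size() >= 2 && dir.front() == '"' && dir.back() == '"') {
            dir = dir.substr(1, dir.size() - 2);
        }
        if (dir.empty()) continue;  // skip empty entries such as ";;"
        std::string candidate = dir;
        if (candidate.back() != '\\' && candidate.back() != '/') {
            candidate += '\\';
        }
        candidate += exe;
        if (exists(candidate)) return candidate;
    }
    return std::nullopt;
}

// Convenience overload that checks the real file system.
std::optional<std::string> find_on_path(const std::string& path_value,
                                        const std::string& exe) {
    return find_on_path(path_value, exe, [](const std::string& p) {
        std::error_code ec;
        return std::filesystem::is_regular_file(p, ec);
    });
}
```

Called with the PATH value of the prompt opened by env.bat, find_on_path(path, "mingw32-make.exe") should point into the MinGW bin directory you added. Injecting exists keeps the search order testable without touching the disk.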


CMakeLists.txt file

If you are not familiar with CMake, just think of CMakeLists.txt as a project file.

In this section, we won't go through the whole file, only the lines you need to understand (and adapt to your setup).
#_-_-_-_-_-_SOME DIRECTORIES_-_-_-_-_-_
set(OCR_DIR D:/prog/ocr)
set(MINGW_DIR D:/Programs/Qt/Qt5.1.0/Tools/mingw48_32/i686-w64-mingw32)
set(MINGW_LIB_DIR ${MINGW_DIR}/lib)
set(LEPTONICA_DIR ${OCR_DIR}/leptonica-1.69)
  • OCR_DIR: base directory for my OCR tools.
  • MINGW_DIR: parent of the MinGW lib directory; C:/MinGW if you don't use Qt.
  • MINGW_LIB_DIR: needed to link against the winsock2 library.
  • LEPTONICA_DIR: the directory where Leptonica was extracted.
set(CMAKE_BINARY_DIR ../${PROJECT_NAME}_output)
The build output directory.
set(WINDLL_NAME \"lib${TARGET_LIB_TESSERACT}.dll\")
add_definitions(-D_tagBLOB_DEFINED
                -D__BLOB_T_DEFINED
                -DUSE_STD_NAMESPACE
                -DWINDLLNAME=${WINDLL_NAME})
Here, we add preprocessor definitions.
  • _tagBLOB_DEFINED: avoids conflicting declarations between wtypes.h (MinGW) and platform.h (Tesseract) if you work with Qt's MinGW.
  • __BLOB_T_DEFINED: same purpose as above, for a MinGW installation that is not part of Qt.
  • WINDLLNAME: the DLL file name (a string literal, thanks to the escaped quotes), used by the ccutil files.
  • USE_STD_NAMESPACE: I haven't looked into its exact purpose, but it must be defined.
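The way _tagBLOB_DEFINED and WINDLLNAME work can be illustrated with a self-contained C++ sketch. The "system header" part below is a simplified mock written for illustration, not the real wtypes.h, and the project-side BLOB struct only stands in for the conflicting declaration; the fallback DLL name is hypothetical and assumes the escaped quotes from CMake reach the compiler intact.

```cpp
#include <string>

// Plays the role of the -D_tagBLOB_DEFINED flag, which the compiler
// applies before reading any header.
#define _tagBLOB_DEFINED

// --- simplified mock of a system header (NOT the real wtypes.h) ---
#ifndef _tagBLOB_DEFINED
#define _tagBLOB_DEFINED
typedef struct tagBLOB {
    unsigned long cbSize;
    unsigned char* pBlobData;
} BLOB;
#endif
// --- end of mock system header ---

// The guarded typedef was skipped, so this project-side BLOB no longer
// conflicts with it.
struct BLOB {
    int id;
};

// CMake passes -DWINDLLNAME=\"lib<target>.dll\": the escaped quotes make the
// macro expand to a string literal. This fallback value is only here so the
// sketch compiles on its own.
#ifndef WINDLLNAME
#define WINDLLNAME "libtesseract3.02.02.dll"
#endif

std::string dll_name() { return WINDLLNAME; }
```

Removing the first #define makes the compiler see two declarations of BLOB, which is the kind of "conflicting declaration" error the definition is meant to prevent.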
#_-_-_-_-_-_LINKING_-_-_-_-_-_
set(CMAKE_FIND_LIBRARY_SUFFIXES .a ${CMAKE_FIND_LIBRARY_SUFFIXES})
This makes find_library look for static (.a) libraries first, because we want to link against a static library.
find_library(LEPTONICA_LIB NAMES lept
                                 lept-3
                                 liblept
                                 liblept-3
                                 PATHS ${LEPTONICA_DIR}/bin)
Finds the Leptonica library in the bin directory created earlier, so that we can link against it.
find_library(WS2_32_LIB NAMES libws2_32.a
                              PATHS ${MINGW_LIB_DIR}
                              NO_DEFAULT_PATH
                              NO_SYSTEM_ENVIRONMENT_PATH)
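Finds the static winsock2 library (libws2_32.a) in the MinGW lib directory only: NO_DEFAULT_PATH and NO_SYSTEM_ENVIRONMENT_PATH stop CMake from searching anywhere else.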

Final steps

  • Copy our CMakeLists.txt into the tesseract-ocr source directory, the one containing configure, eurotext.tif, etc.
  • Copy env.bat and cmake.bat (or cmake_noqt.bat) into the parent directory of tesseract-ocr.
  • Run env.bat; it opens a new command prompt with the updated PATH.
  • Go to the tesseract-ocr directory:
    cd tesseract-ocr
  • Launch CMake:
    ..\cmake.bat
    or
    ..\cmake_noqt.bat
  • Build:
    mingw32-make
  • Wait a few minutes...

You should end up with a tesseract_output directory containing:
  • libtesseract3.02.02.dll
  • svpaint.exe
  • tesseract.exe

Batch files and CMakeLists.txt can be downloaded from my repository:

https://github.com/broija/tesseract_ocr_mingw

6 comments:

  1. Thanks for your tutorial, but I have some problems when I compile it:
    1- C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\wtypesbase.h:385: error: conflicting declaration 'typedef struct tagBLOB BLOB'
    } BLOB;
    ^
    2-C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\wtypesbase.h:386: error: conflicting declaration 'typedef struct tagBLOB* LPBLOB'
    typedef struct tagBLOB *LPBLOB;

    and a lot of warnings:
    C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\combaseapi.h:153: In file included from C:/Qt/Qt5.4.2/Tools/mingw491_32/i686-w64-mingw32/include/combaseapi.h:153:0,
    C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\objbase.h:14: from C:/Qt/Qt5.4.2/Tools/mingw491_32/i686-w64-mingw32/include/objbase.h:14,
    .......

    1. Hi,

      Sorry for the late answer. This tutorial was written using Qt 5.1 and MinGW 4.8.

      The "_tagBLOB_DEFINED" definition was intended to avoid this problem: "to avoid conflicting declarations between wtypes.h (MinGW) and platform.h (tesseract) if you work with Qt."

      I can't remember whether I got any warnings during compilation. Since I've changed computers, I can't rebuild the old environment.

      Regards,

  2. I need help compiling Tesseract on Windows 7 with the MinGW compiler, using the CMake GUI.

    1. Hi,

      This post is 20 months old now. If you're trying to use the very same versions as in this post, I'd be glad to help you; otherwise, I'm sorry to say that I don't have enough time to dig into different versions.

      Could you please be more specific?

  3. Hi,
    I followed your commands, and everything seems to go smoothly, without errors or warnings. But after running mingw32-make, there is no output.
    In CMakeLists.txt, I tried to fix the binary output directory, but there is still no output.
    Could you please help me?
    Thanks.

    JackNguyen

    1. Hello,

      If it is not too late, could you please detail exactly what you've used (Windows version, CMake version, etc.)? I'm not sure the versions used in this post are still easy to find.

      I'm still using Win7, by the way, so I won't be of much help if you're working on a more recent version.

      Regards,
