From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
Date: Sun, 19 Mar 2017 14:54:55 +0100
Message-ID: <20170319145455.27ceaa84@free-electrons.com>
In-Reply-To: <1489910873-8450-3-git-send-email-gilles.talis@gmail.com>
Hello,
On Sun, 19 Mar 2017 09:07:53 +0100, Gilles Talis wrote:
> diff --git a/package/tesseract-ocr/Config.in b/package/tesseract-ocr/Config.in
> new file mode 100644
> index 0000000..4fd0668
> --- /dev/null
> +++ b/package/tesseract-ocr/Config.in
> @@ -0,0 +1,44 @@
> +comment "tesseract-ocr needs a toolchain w/ threads, C++, gcc >= 4.8 & dynamic library"
> + depends on BR2_USE_MMU
> + depends on !BR2_INSTALL_LIBSTDCPP || !BR2_TOOLCHAIN_HAS_THREADS || \
> + !BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 || BR2_STATIC_LIBS
Indentation of this last line should have been two tabs.
> +menuconfig BR2_PACKAGE_TESSERACT_OCR
> + bool "tesseract-ocr"
> + depends on BR2_INSTALL_LIBSTDCPP
> + depends on BR2_TOOLCHAIN_HAS_THREADS
> + depends on BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 # C++11
> + depends on BR2_USE_MMU # fork()
> + depends on !BR2_STATIC_LIBS
> + select BR2_PACKAGE_JPEG
> + select BR2_PACKAGE_LEPTONICA
> + select BR2_PACKAGE_LIBPNG
> + select BR2_PACKAGE_TIFF
I don't see where jpeg, libpng and tiff are mandatory. In fact, I don't
see them being used by tesseract-ocr, so I've dropped those
dependencies for now.
> +TESSERACT_OCR_VERSION = 3.05.00
> +TESSERACT_OCR_DATA_VERSION = 3.04.00
> +TESSERACT_OCR_SITE = $(call github,tesseract-ocr,tesseract,$(TESSERACT_OCR_VERSION))
> +TESSERACT_OCR_LICENSE = Apache-2.0
> +TESSERACT_OCR_LICENSE_FILES = COPYING
> +
> +# Source from github, no configure script provided
> +TESSERACT_OCR_AUTORECONF = YES
> +
> +TESSERACT_OCR_DEPENDENCIES += leptonica jpeg libpng tiff
I've dropped jpeg, libpng and tiff. Instead, I've added host-pkgconf
which is really needed since configure.ac uses PKG_CHECK_MODULES().
I've also passed --disable-opencl since your package hasn't added
explicit support for OpenCL.
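As a rough sketch of the changes described above (the exact lines in the
committed tesseract-ocr.mk may differ), the dependency and configure
settings would look something like this, using the standard
autotools-package variables:

```make
# Sketch only: the committed tesseract-ocr.mk may word these lines differently.
# leptonica is the one library dependency kept; host-pkgconf is needed
# because configure.ac calls PKG_CHECK_MODULES().
TESSERACT_OCR_DEPENDENCIES = leptonica host-pkgconf

# The package does not handle OpenCL support, so disable it explicitly.
TESSERACT_OCR_CONF_OPTS = --disable-opencl
```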
> +# Language data files download
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_ENG),y)
> +TESSERACT_OCR_DATA_FILES += eng.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_FRA),y)
> +TESSERACT_OCR_DATA_FILES += fra.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_DEU),y)
> +TESSERACT_OCR_DATA_FILES += deu.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_SPA),y)
> +TESSERACT_OCR_DATA_FILES += spa.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM),y)
> +TESSERACT_OCR_DATA_FILES += chi_sim.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA),y)
> +TESSERACT_OCR_DATA_FILES += chi_tra.traineddata
> +endif
Regarding the language files, I'm not entirely happy with the current
solution, but I couldn't come up with something better. I looked at the
two following options:
* Creating a separate package for the tessdata repository
  https://github.com/tesseract-ocr/tessdata/, but this repository is
  3.4GB in size, which is admittedly a bit annoying to download when
  you just want a single language.
* Since the list of languages is quite long, having an explicit option
  for each of them is a bit annoying. So I looked into turning your
  one-option-per-language idea into a single option with a
  space-separated list of languages. Except that we would still need
  a hash for each language file in tesseract-ocr.hash.
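For illustration, the single-option variant could look roughly like the
sketch below. BR2_PACKAGE_TESSERACT_OCR_LANGS is a hypothetical string
option, not something in the submitted patch; qstrip is Buildroot's
helper for removing the quotes around Kconfig string values.

```make
# Hypothetical: BR2_PACKAGE_TESSERACT_OCR_LANGS would be a Kconfig string
# option such as "eng fra deu". It does not exist in the submitted patch.
TESSERACT_OCR_LANGS = $(call qstrip,$(BR2_PACKAGE_TESSERACT_OCR_LANGS))

# Turn each language code into its data file name,
# e.g. "eng fra" becomes "eng.traineddata fra.traineddata".
TESSERACT_OCR_DATA_FILES = $(addsuffix .traineddata,$(TESSERACT_OCR_LANGS))
```

Each resulting file would still need its own line in tesseract-ocr.hash,
so this form does not remove the per-language bookkeeping.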
So in the end, I kept it as-is. We'll see if other folks have a better
idea.
So in the meantime, I've applied with the fixes described above.
Thanks!
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com