From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
Date: Sun, 19 Mar 2017 14:54:55 +0100
Message-ID: <20170319145455.27ceaa84@free-electrons.com>
In-Reply-To: <1489910873-8450-3-git-send-email-gilles.talis@gmail.com>
Hello,
On Sun, 19 Mar 2017 09:07:53 +0100, Gilles Talis wrote:
> diff --git a/package/tesseract-ocr/Config.in b/package/tesseract-ocr/Config.in
> new file mode 100644
> index 0000000..4fd0668
> --- /dev/null
> +++ b/package/tesseract-ocr/Config.in
> @@ -0,0 +1,44 @@
> +comment "tesseract-ocr needs a toolchain w/ threads, C++, gcc >= 4.8 & dynamic library"
> + depends on BR2_USE_MMU
> + depends on !BR2_INSTALL_LIBSTDCPP || !BR2_TOOLCHAIN_HAS_THREADS || \
> + !BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 || BR2_STATIC_LIBS
Indentation of this last line should have been two tabs.
> +menuconfig BR2_PACKAGE_TESSERACT_OCR
> + bool "tesseract-ocr"
> + depends on BR2_INSTALL_LIBSTDCPP
> + depends on BR2_TOOLCHAIN_HAS_THREADS
> + depends on BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 # C++11
> + depends on BR2_USE_MMU # fork()
> + depends on !BR2_STATIC_LIBS
> + select BR2_PACKAGE_JPEG
> + select BR2_PACKAGE_LEPTONICA
> + select BR2_PACKAGE_LIBPNG
> + select BR2_PACKAGE_TIFF
I don't see where jpeg, libpng and tiff are mandatory. In fact, I don't
see them being used by tesseract-ocr, so I've dropped those
dependencies for now.
> +TESSERACT_OCR_VERSION = 3.05.00
> +TESSERACT_OCR_DATA_VERSION = 3.04.00
> +TESSERACT_OCR_SITE = $(call github,tesseract-ocr,tesseract,$(TESSERACT_OCR_VERSION))
> +TESSERACT_OCR_LICENSE = Apache-2.0
> +TESSERACT_OCR_LICENSE_FILES = COPYING
> +
> +# Source from github, no configure script provided
> +TESSERACT_OCR_AUTORECONF = YES
> +
> +TESSERACT_OCR_DEPENDENCIES += leptonica jpeg libpng tiff
I've dropped jpeg, libpng and tiff. Instead, I've added host-pkgconf
which is really needed since configure.ac uses PKG_CHECK_MODULES().
I've also passed --disable-opencl since your package hasn't added
explicit support for OpenCL.
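As a rough sketch of those changes (not the exact applied commit), the
relevant lines of the package's .mk file could read as follows.
TESSERACT_OCR_CONF_OPTS is the usual autotools-package variable for
passing configure options; the exact layout here is an assumption.

```makefile
# host-pkgconf is needed because configure.ac uses PKG_CHECK_MODULES();
# jpeg, libpng and tiff are dropped as described above.
TESSERACT_OCR_DEPENDENCIES = leptonica host-pkgconf

# The package has no explicit OpenCL support, so disable it explicitly.
TESSERACT_OCR_CONF_OPTS = --disable-opencl
```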
> +# Language data files download
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_ENG),y)
> +TESSERACT_OCR_DATA_FILES += eng.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_FRA),y)
> +TESSERACT_OCR_DATA_FILES += fra.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_DEU),y)
> +TESSERACT_OCR_DATA_FILES += deu.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_SPA),y)
> +TESSERACT_OCR_DATA_FILES += spa.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM),y)
> +TESSERACT_OCR_DATA_FILES += chi_sim.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA),y)
> +TESSERACT_OCR_DATA_FILES += chi_tra.traineddata
> +endif
Regarding the language files, I'm not entirely happy with the current
solution, but I couldn't come up with something better. I looked at the
two following options:
* Creating a separate package for the tessdata repository
  https://github.com/tesseract-ocr/tessdata/, but this repository is
  3.4GB in size, which is admittedly a bit annoying to download when
  you just want a single language.
* Since the list of languages is quite long, having an explicit option
  for each of them is a bit annoying. So I looked into turning your
  one-option-per-language idea into a single option with a
  space-separated list of languages. However, we would still need a
  hash for each language file in tesseract-ocr.hash anyway.
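For illustration only, the second option might look roughly like the
sketch below. BR2_PACKAGE_TESSERACT_OCR_LANGS is a hypothetical option
name that does not exist in the patch; qstrip is Buildroot's helper for
stripping the quotes from a Kconfig string value.

```makefile
# Hypothetical: BR2_PACKAGE_TESSERACT_OCR_LANGS would be a Kconfig string
# option holding a space-separated list such as "eng fra deu".
TESSERACT_OCR_LANGS = $(call qstrip,$(BR2_PACKAGE_TESSERACT_OCR_LANGS))

# One <lang>.traineddata file per requested language.
TESSERACT_OCR_DATA_FILES = $(addsuffix .traineddata,$(TESSERACT_OCR_LANGS))
```

With the value "eng fra", TESSERACT_OCR_DATA_FILES would expand to
eng.traineddata fra.traineddata. The hash file would still have to list
every language file a user might select.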
So in the end, I kept it as-is. We'll see if other folks have a better
idea.
In the meantime, I've applied the patch with the fixes described above.
Thanks!
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com