Buildroot Archive on lore.kernel.org
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
Date: Sun, 19 Mar 2017 14:54:55 +0100
Message-ID: <20170319145455.27ceaa84@free-electrons.com>
In-Reply-To: <1489910873-8450-3-git-send-email-gilles.talis@gmail.com>

Hello,

On Sun, 19 Mar 2017 09:07:53 +0100, Gilles Talis wrote:
> diff --git a/package/tesseract-ocr/Config.in b/package/tesseract-ocr/Config.in
> new file mode 100644
> index 0000000..4fd0668
> --- /dev/null
> +++ b/package/tesseract-ocr/Config.in
> @@ -0,0 +1,44 @@
> +comment "tesseract-ocr needs a toolchain w/ threads, C++, gcc >= 4.8 & dynamic library"
> +	depends on BR2_USE_MMU
> +	depends on !BR2_INSTALL_LIBSTDCPP || !BR2_TOOLCHAIN_HAS_THREADS || \
> +        !BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 || BR2_STATIC_LIBS

Indentation of this last line should have been two tabs.

> +menuconfig BR2_PACKAGE_TESSERACT_OCR
> +	bool "tesseract-ocr"
> +	depends on BR2_INSTALL_LIBSTDCPP
> +	depends on BR2_TOOLCHAIN_HAS_THREADS
> +	depends on BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 # C++11
> +	depends on BR2_USE_MMU # fork()
> +	depends on !BR2_STATIC_LIBS
> +	select BR2_PACKAGE_JPEG
> +	select BR2_PACKAGE_LEPTONICA
> +	select BR2_PACKAGE_LIBPNG
> +	select BR2_PACKAGE_TIFF

I don't see where jpeg, libpng and tiff are mandatory. In fact, I don't
see them being used by tesseract-ocr at all, so I've dropped those
dependencies for now.


> +TESSERACT_OCR_VERSION = 3.05.00
> +TESSERACT_OCR_DATA_VERSION = 3.04.00
> +TESSERACT_OCR_SITE = $(call github,tesseract-ocr,tesseract,$(TESSERACT_OCR_VERSION))
> +TESSERACT_OCR_LICENSE = Apache-2.0
> +TESSERACT_OCR_LICENSE_FILES = COPYING
> +
> +# Source from github, no configure script provided
> +TESSERACT_OCR_AUTORECONF = YES
> +
> +TESSERACT_OCR_DEPENDENCIES += leptonica jpeg libpng tiff

I've dropped jpeg, libpng and tiff. Instead, I've added host-pkgconf
which is really needed since configure.ac uses PKG_CHECK_MODULES().

I've also passed --disable-opencl since your package hasn't added
explicit support for OpenCL.
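
As a rough sketch, assuming the usual autotools-package variables, the
affected lines of the .mk file end up looking something like this (the
committed version may differ in detail):

```make
# Sketch only: leptonica stays, jpeg/libpng/tiff are gone, and
# host-pkgconf is added because configure.ac uses PKG_CHECK_MODULES().
TESSERACT_OCR_DEPENDENCIES = leptonica host-pkgconf

# OpenCL is not handled by the package, so disable it explicitly.
TESSERACT_OCR_CONF_OPTS = --disable-opencl
```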

> +# Language data files download
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_ENG),y)
> +TESSERACT_OCR_DATA_FILES += eng.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_FRA),y)
> +TESSERACT_OCR_DATA_FILES += fra.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_DEU),y)
> +TESSERACT_OCR_DATA_FILES += deu.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_SPA),y)
> +TESSERACT_OCR_DATA_FILES += spa.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM),y)
> +TESSERACT_OCR_DATA_FILES += chi_sim.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA),y)
> +TESSERACT_OCR_DATA_FILES += chi_tra.traineddata
> +endif

Regarding the language files, I'm not entirely happy with the current
solution, but I couldn't come up with something better. I looked at the
two following options:

 * Creating a separate package for the tessdata repository
   https://github.com/tesseract-ocr/tessdata/, but this repository is
   3.4GB in size, which is admittedly a bit annoying to download when
   you just want a single language.

 * Since the list of languages is quite long, having an explicit option
   for each of them is a bit annoying. So I looked into turning your
   one-option-per-language idea into a single option with a
   space-separated list of languages. Except that we would still need
   a hash entry for each language file in tesseract-ocr.hash, so the
   per-language bookkeeping doesn't go away.
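
For reference, the single-option variant could be sketched roughly as
below. BR2_PACKAGE_TESSERACT_OCR_LANGS is a hypothetical string option
that does not exist in the patch, and each resulting file would still
need its own entry in tesseract-ocr.hash:

```make
# Hypothetical option, e.g. BR2_PACKAGE_TESSERACT_OCR_LANGS="eng fra"
# qstrip removes the surrounding quotes from the Kconfig string value.
TESSERACT_OCR_LANGS = $(call qstrip,$(BR2_PACKAGE_TESSERACT_OCR_LANGS))

# "eng fra" becomes "eng.traineddata fra.traineddata"
TESSERACT_OCR_DATA_FILES = $(addsuffix .traineddata,$(TESSERACT_OCR_LANGS))
```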

So in the end, I kept it as-is. We'll see if other folks have a better
idea.

In the meantime, I've applied the patch with the fixes described above.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
