From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
Date: Sun, 19 Mar 2017 14:54:55 +0100
Message-ID: <20170319145455.27ceaa84@free-electrons.com>
In-Reply-To: <1489910873-8450-3-git-send-email-gilles.talis@gmail.com>

Hello,

On Sun, 19 Mar 2017 09:07:53 +0100, Gilles Talis wrote:
> diff --git a/package/tesseract-ocr/Config.in b/package/tesseract-ocr/Config.in
> new file mode 100644
> index 0000000..4fd0668
> --- /dev/null
> +++ b/package/tesseract-ocr/Config.in
> @@ -0,0 +1,44 @@
> +comment "tesseract-ocr needs a toolchain w/ threads, C++, gcc >= 4.8 & dynamic library"
> +	depends on BR2_USE_MMU
> +	depends on !BR2_INSTALL_LIBSTDCPP || !BR2_TOOLCHAIN_HAS_THREADS || \
> +        !BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 || BR2_STATIC_LIBS

Indentation of this last line should have been two tabs.

> +menuconfig BR2_PACKAGE_TESSERACT_OCR
> +	bool "tesseract-ocr"
> +	depends on BR2_INSTALL_LIBSTDCPP
> +	depends on BR2_TOOLCHAIN_HAS_THREADS
> +	depends on BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 # C++11
> +	depends on BR2_USE_MMU # fork()
> +	depends on !BR2_STATIC_LIBS
> +	select BR2_PACKAGE_JPEG
> +	select BR2_PACKAGE_LEPTONICA
> +	select BR2_PACKAGE_LIBPNG
> +	select BR2_PACKAGE_TIFF

I don't see where jpeg, libpng and tiff are mandatory. In fact, I don't
see them being used by tesseract-ocr, so I've dropped those
dependencies for now.


> +TESSERACT_OCR_VERSION = 3.05.00
> +TESSERACT_OCR_DATA_VERSION = 3.04.00
> +TESSERACT_OCR_SITE = $(call github,tesseract-ocr,tesseract,$(TESSERACT_OCR_VERSION))
> +TESSERACT_OCR_LICENSE = Apache-2.0
> +TESSERACT_OCR_LICENSE_FILES = COPYING
> +
> +# Source from github, no configure script provided
> +TESSERACT_OCR_AUTORECONF = YES
> +
> +TESSERACT_OCR_DEPENDENCIES += leptonica jpeg libpng tiff

I've dropped jpeg, libpng and tiff. Instead, I've added host-pkgconf
which is really needed since configure.ac uses PKG_CHECK_MODULES().

I've also passed --disable-opencl since your package hasn't added
explicit support for OpenCL.
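
In tesseract-ocr.mk, these two changes would look roughly like the
sketch below. It is an illustration based on the description above and
on the usual Buildroot autotools-package variable names, not
necessarily the exact lines that were committed.

```make
# Sketch only; not necessarily the exact text that was applied.
# leptonica is the only library dependency kept. host-pkgconf is needed
# because configure.ac uses PKG_CHECK_MODULES().
TESSERACT_OCR_DEPENDENCIES = leptonica host-pkgconf

# The package does not handle OpenCL, so disable it explicitly instead
# of letting configure auto-detect it.
TESSERACT_OCR_CONF_OPTS = --disable-opencl
```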

> +# Language data files download
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_ENG),y)
> +TESSERACT_OCR_DATA_FILES += eng.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_FRA),y)
> +TESSERACT_OCR_DATA_FILES += fra.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_DEU),y)
> +TESSERACT_OCR_DATA_FILES += deu.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_SPA),y)
> +TESSERACT_OCR_DATA_FILES += spa.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM),y)
> +TESSERACT_OCR_DATA_FILES += chi_sim.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA),y)
> +TESSERACT_OCR_DATA_FILES += chi_tra.traineddata
> +endif

Regarding the language files, I'm not entirely happy with the current
solution, but I couldn't come up with something better. I looked at the
two following options:

 * Creating a separate package for the tessdata repository
   https://github.com/tesseract-ocr/tessdata/, but this repository is
   3.4GB in size, which is admittedly a bit annoying to download when
   you just want a single language.

 * Since the list of languages is quite long, having an explicit option
   for each of them is a bit annoying. So I looked into turning your
   one-option-per-language idea into a single option with a space
   separated list of languages. Except that we would still need a
   hash for each language file in tesseract-ocr.hash.
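
The single-option variant from the second point could be sketched as
follows. BR2_PACKAGE_TESSERACT_OCR_LANGS is a hypothetical Kconfig
string option that does not exist in the submitted patch; qstrip is the
Buildroot helper that strips the quotes from Kconfig string values.

```make
# Hypothetical: BR2_PACKAGE_TESSERACT_OCR_LANGS would be a Kconfig string
# option holding a space-separated list such as "eng fra deu".
TESSERACT_OCR_LANGS = $(call qstrip,$(BR2_PACKAGE_TESSERACT_OCR_LANGS))

# Map each language code to its data file, e.g. eng -> eng.traineddata.
TESSERACT_OCR_DATA_FILES = $(addsuffix .traineddata,$(TESSERACT_OCR_LANGS))
```

Every file that can end up in that list would still need its own entry
in tesseract-ocr.hash, so the per-language bookkeeping does not go away.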

So in the end, I kept it as-is. We'll see if other folks have a better
idea.

In the meantime, I've applied the patch with the fixes described above.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
