* [Buildroot] [PATCH v2 0/2] Introducing tesseract OCR engine
@ 2017-03-19  8:07 Gilles Talis
  2017-03-19  8:07 ` [Buildroot] [PATCH v2 1/2] leptonica: new package Gilles Talis
  2017-03-19  8:07 ` [Buildroot] [PATCH v2 2/2] tesseract-ocr: " Gilles Talis
  0 siblings, 2 replies; 10+ messages in thread
From: Gilles Talis @ 2017-03-19  8:07 UTC
To: buildroot

Version 2 of the patch series that introduces tesseract OCR engine, its
data and leptonica, the library it mainly depends on.

Gilles Talis (2):
  leptonica: new package
  tesseract-ocr: new package

 DEVELOPERS                               |  2 +
 package/Config.in                        |  2 +
 package/leptonica/Config.in              |  8 ++++
 package/leptonica/leptonica.hash         |  2 +
 package/leptonica/leptonica.mk           | 64 +++++++++++++++++++++++++++++
 package/tesseract-ocr/Config.in          | 44 ++++++++++++++++++++
 package/tesseract-ocr/tesseract-ocr.hash |  8 ++++
 package/tesseract-ocr/tesseract-ocr.mk   | 69 ++++++++++++++++++++++++++++++++
 8 files changed, 199 insertions(+)
 create mode 100644 package/leptonica/Config.in
 create mode 100644 package/leptonica/leptonica.hash
 create mode 100644 package/leptonica/leptonica.mk
 create mode 100644 package/tesseract-ocr/Config.in
 create mode 100644 package/tesseract-ocr/tesseract-ocr.hash
 create mode 100644 package/tesseract-ocr/tesseract-ocr.mk

-- 
2.5.0

* [Buildroot] [PATCH v2 1/2] leptonica: new package
  2017-03-19  8:07 [Buildroot] [PATCH v2 0/2] Introducing tesseract OCR engine Gilles Talis
@ 2017-03-19  8:07 ` Gilles Talis
  2017-03-19 13:51   ` Thomas Petazzoni
  2017-03-20 15:05   ` Peter Korsgaard
  2017-03-19  8:07 ` [Buildroot] [PATCH v2 2/2] tesseract-ocr: " Gilles Talis
  1 sibling, 2 replies; 10+ messages in thread
From: Gilles Talis @ 2017-03-19  8:07 UTC
To: buildroot

Signed-off-by: Gilles Talis <gilles.talis@gmail.com>
---
Changes v2 (following review by Thomas P.)
 - Fixed newline at end of hash file
 - Explicitly set autotools options using --with-XXX
 - Added change in DEVELOPERS file
---
 DEVELOPERS                       |  1 +
 package/Config.in                |  1 +
 package/leptonica/Config.in      |  8 +++++
 package/leptonica/leptonica.hash |  2 ++
 package/leptonica/leptonica.mk   | 64 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 76 insertions(+)
 create mode 100644 package/leptonica/Config.in
 create mode 100644 package/leptonica/leptonica.hash
 create mode 100644 package/leptonica/leptonica.mk

diff --git a/DEVELOPERS b/DEVELOPERS
index 746a547..8802fc7 100644
--- a/DEVELOPERS
+++ b/DEVELOPERS
@@ -587,6 +587,7 @@ N:	Gilles Talis <gilles.talis@gmail.com>
 F:	package/fdk-aac/
 F:	package/httping/
 F:	package/iozone/
+F:	package/leptonica/
 F:	package/ocrad/
 F:	package/webp/
 
diff --git a/package/Config.in b/package/Config.in
index 52b4f36..ed48058 100644
--- a/package/Config.in
+++ b/package/Config.in
@@ -1010,6 +1010,7 @@ menu "Graphics"
 	source "package/jpeg/Config.in"
 	source "package/kmsxx/Config.in"
 	source "package/lcms2/Config.in"
+	source "package/leptonica/Config.in"
 	source "package/lesstif/Config.in"
 	source "package/libart/Config.in"
 	source "package/libdmtx/Config.in"
diff --git a/package/leptonica/Config.in b/package/leptonica/Config.in
new file mode 100644
index 0000000..a4053a3
--- /dev/null
+++ b/package/leptonica/Config.in
@@ -0,0 +1,8 @@
+config BR2_PACKAGE_LEPTONICA
+	bool "leptonica"
+	help
+	  Leptonica is a pedagogically-oriented open source site containing
+	  software that is broadly useful for image processing and image
+	  analysis applications.
+
+	  http://www.leptonica.org/
diff --git a/package/leptonica/leptonica.hash b/package/leptonica/leptonica.hash
new file mode 100644
index 0000000..48da06b
--- /dev/null
+++ b/package/leptonica/leptonica.hash
@@ -0,0 +1,2 @@
+# locally computed hash
+sha256 746a517a47a3bd2a90bc8d581ca6464c10f30e91a60209735efe45b3778bec62  leptonica-1.74.1.tar.gz
diff --git a/package/leptonica/leptonica.mk b/package/leptonica/leptonica.mk
new file mode 100644
index 0000000..37e5219
--- /dev/null
+++ b/package/leptonica/leptonica.mk
@@ -0,0 +1,64 @@
+################################################################################
+#
+# leptonica
+#
+################################################################################
+
+LEPTONICA_VERSION = 1.74.1
+LEPTONICA_SITE = http://www.leptonica.org/source
+LEPTONICA_LICENSE = BSD-2c
+LEPTONICA_LICENSE_FILES = leptonica-license.txt
+LEPTONICA_INSTALL_STAGING = YES
+
+LEPTONICA_CONF_OPTS += --disable-programs
+
+ifeq ($(BR2_PACKAGE_GIFLIB),y)
+LEPTONICA_DEPENDENCIES += giflib
+LEPTONICA_CONF_OPTS += --with-giflib
+else
+LEPTONICA_CONF_OPTS += --without-giflib
+endif
+
+ifeq ($(BR2_PACKAGE_JPEG),y)
+LEPTONICA_DEPENDENCIES += jpeg
+LEPTONICA_CONF_OPTS += --with-jpeg
+else
+LEPTONICA_CONF_OPTS += --without-jpeg
+endif
+
+ifeq ($(BR2_PACKAGE_LIBPNG),y)
+LEPTONICA_DEPENDENCIES += libpng
+LEPTONICA_CONF_OPTS += --with-libpng
+else
+LEPTONICA_CONF_OPTS += --without-libpng
+endif
+
+ifeq ($(BR2_PACKAGE_OPENJPEG),y)
+LEPTONICA_DEPENDENCIES += openjpeg
+LEPTONICA_CONF_OPTS += --with-libopenjpeg
+else
+LEPTONICA_CONF_OPTS += --without-libopenjpeg
+endif
+
+ifeq ($(BR2_PACKAGE_TIFF),y)
+LEPTONICA_DEPENDENCIES += tiff
+LEPTONICA_CONF_OPTS += --with-libtiff
+else
+LEPTONICA_CONF_OPTS += --without-libtiff
+endif
+
+ifeq ($(BR2_PACKAGE_WEBP),y)
+LEPTONICA_DEPENDENCIES += webp
+LEPTONICA_CONF_OPTS += --with-libwebp
+else
+LEPTONICA_CONF_OPTS += --without-libwebp
+endif
+
+ifeq ($(BR2_PACKAGE_ZLIB),y)
+LEPTONICA_DEPENDENCIES += zlib
+LEPTONICA_CONF_OPTS += --with-zlib
+else
+LEPTONICA_CONF_OPTS += --without-zlib
+endif
+
+$(eval $(autotools-package))
-- 
2.5.0

* [Buildroot] [PATCH v2 1/2] leptonica: new package
  2017-03-19  8:07 ` [Buildroot] [PATCH v2 1/2] leptonica: new package Gilles Talis
@ 2017-03-19 13:51   ` Thomas Petazzoni
  2017-03-20 15:05   ` Peter Korsgaard
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Petazzoni @ 2017-03-19 13:51 UTC
To: buildroot

Hello,

On Sun, 19 Mar 2017 09:07:52 +0100, Gilles Talis wrote:

> Signed-off-by: Gilles Talis <gilles.talis@gmail.com>
> ---
> Changes v2 (following review by Thomas P.)
>  - Fixed newline at end of hash file
>  - Explicitly set autotools options using --with-XXX
>  - Added change in DEVELOPERS file
> ---
>  DEVELOPERS                       |  1 +
>  package/Config.in                |  1 +
>  package/leptonica/Config.in      |  8 +++++
>  package/leptonica/leptonica.hash |  2 ++
>  package/leptonica/leptonica.mk   | 64 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 76 insertions(+)
>  create mode 100644 package/leptonica/Config.in
>  create mode 100644 package/leptonica/leptonica.hash
>  create mode 100644 package/leptonica/leptonica.mk

Applied to master, thanks.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

* [Buildroot] [PATCH v2 1/2] leptonica: new package
  2017-03-19  8:07 ` [Buildroot] [PATCH v2 1/2] leptonica: new package Gilles Talis
  2017-03-19 13:51   ` Thomas Petazzoni
@ 2017-03-20 15:05   ` Peter Korsgaard
  2017-03-21  7:37     ` Gilles Talis
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Korsgaard @ 2017-03-20 15:05 UTC
To: buildroot

>>>>> "Gilles" == Gilles Talis <gilles.talis@gmail.com> writes:

 > Signed-off-by: Gilles Talis <gilles.talis@gmail.com>

 > +config BR2_PACKAGE_LEPTONICA
 > +	bool "leptonica"
 > +	help
 > +	  Leptonica is a pedagogically-oriented open source site containing
 > +	  software that is broadly useful for image processing and image
 > +	  analysis applications.

I know this description is taken verbatim from the upstream site, but
IMHO it isn't very helpful. This is not a (web)site. What is it really?
An image processing library and some applications?

-- 
Bye, Peter Korsgaard

* [Buildroot] [PATCH v2 1/2] leptonica: new package
  2017-03-20 15:05   ` Peter Korsgaard
@ 2017-03-21  7:37     ` Gilles Talis
  0 siblings, 0 replies; 10+ messages in thread
From: Gilles Talis @ 2017-03-21  7:37 UTC
To: buildroot

Hi Peter,

2017-03-20 16:05 GMT+01:00 Peter Korsgaard <peter@korsgaard.com>:
>>>>>> "Gilles" == Gilles Talis <gilles.talis@gmail.com> writes:
>
>  > Signed-off-by: Gilles Talis <gilles.talis@gmail.com>
>
>  > +config BR2_PACKAGE_LEPTONICA
>  > +	bool "leptonica"
>  > +	help
>  > +	  Leptonica is a pedagogically-oriented open source site containing
>  > +	  software that is broadly useful for image processing and image
>  > +	  analysis applications.
>
> I know this description is taken verbatim from the upstream site, but
> IMHO it isn't very helpful. This is not a (web)site. What is it really?
> An image processing library and some applications?

Yes, leptonica is essentially an image processing library.
Do you want me to update the description?

Thanks
Gilles

* [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
  2017-03-19  8:07 [Buildroot] [PATCH v2 0/2] Introducing tesseract OCR engine Gilles Talis
  2017-03-19  8:07 ` [Buildroot] [PATCH v2 1/2] leptonica: new package Gilles Talis
@ 2017-03-19  8:07 ` Gilles Talis
  2017-03-19 13:54   ` Thomas Petazzoni
  1 sibling, 1 reply; 10+ messages in thread
From: Gilles Talis @ 2017-03-19  8:07 UTC
To: buildroot

Signed-off-by: Gilles Talis <gilles.talis@gmail.com>
---
Changes v2 (following review by Thomas P.)
 - Added language data files support inside main package
   instead of specific package for each of them
 - Explicitly selected PNG, JPEG and TIFF libraries as dependencies
 - Added DEVELOPERS file change
 - Fixed indentation issues
 - Added extra comments
 - Added limitations found using test-pkg script
---
 DEVELOPERS                               |  1 +
 package/Config.in                        |  1 +
 package/tesseract-ocr/Config.in          | 44 ++++++++++++++++++++
 package/tesseract-ocr/tesseract-ocr.hash |  8 ++++
 package/tesseract-ocr/tesseract-ocr.mk   | 69 ++++++++++++++++++++++++++++++++
 5 files changed, 123 insertions(+)
 create mode 100644 package/tesseract-ocr/Config.in
 create mode 100644 package/tesseract-ocr/tesseract-ocr.hash
 create mode 100644 package/tesseract-ocr/tesseract-ocr.mk

diff --git a/DEVELOPERS b/DEVELOPERS
index 8802fc7..bdc93d9 100644
--- a/DEVELOPERS
+++ b/DEVELOPERS
@@ -589,6 +589,7 @@ F:	package/httping/
 F:	package/iozone/
 F:	package/leptonica/
 F:	package/ocrad/
+F:	package/tesseract-ocr/
 F:	package/webp/
 
 N:	Gregory Dymarek <gregd72002@gmail.com>
diff --git a/package/Config.in b/package/Config.in
index ed48058..66c87d5 100644
--- a/package/Config.in
+++ b/package/Config.in
@@ -244,6 +244,7 @@ comment "Graphic applications"
 	source "package/mesa3d-demos/Config.in"
 	source "package/qt5cinex/Config.in"
 	source "package/rrdtool/Config.in"
+	source "package/tesseract-ocr/Config.in"
 
 comment "Graphic libraries"
 	source "package/cegui06/Config.in"
diff --git a/package/tesseract-ocr/Config.in b/package/tesseract-ocr/Config.in
new file mode 100644
index 0000000..4fd0668
--- /dev/null
+++ b/package/tesseract-ocr/Config.in
@@ -0,0 +1,44 @@
+comment "tesseract-ocr needs a toolchain w/ threads, C++, gcc >= 4.8 & dynamic library"
+	depends on BR2_USE_MMU
+	depends on !BR2_INSTALL_LIBSTDCPP || !BR2_TOOLCHAIN_HAS_THREADS || \
+	!BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 || BR2_STATIC_LIBS
+
+menuconfig BR2_PACKAGE_TESSERACT_OCR
+	bool "tesseract-ocr"
+	depends on BR2_INSTALL_LIBSTDCPP
+	depends on BR2_TOOLCHAIN_HAS_THREADS
+	depends on BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 # C++11
+	depends on BR2_USE_MMU # fork()
+	depends on !BR2_STATIC_LIBS
+	select BR2_PACKAGE_JPEG
+	select BR2_PACKAGE_LEPTONICA
+	select BR2_PACKAGE_LIBPNG
+	select BR2_PACKAGE_TIFF
+	help
+	  Tesseract is an OCR (Optical Character Recognition) engine.
+	  It can be used directly, or (for programmers) using an API.
+	  It supports a wide variety of languages.
+
+	  https://github.com/tesseract-ocr/tesseract
+
+if BR2_PACKAGE_TESSERACT_OCR
+comment "tesseract-ocr languages support"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_ENG
+	bool "English"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_FRA
+	bool "French"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_DEU
+	bool "German"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_SPA
+	bool "Spanish"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM
+	bool "Simplified Chinese"
+
+config BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA
+	bool "Traditional Chinese"
+endif
diff --git a/package/tesseract-ocr/tesseract-ocr.hash b/package/tesseract-ocr/tesseract-ocr.hash
new file mode 100644
index 0000000..9bb5b52
--- /dev/null
+++ b/package/tesseract-ocr/tesseract-ocr.hash
@@ -0,0 +1,8 @@
+# locally computed
+sha256 3fe83e06d0f73b39f6e92ed9fc7ccba3ef734877b76aa5ddaaa778fac095d996  tesseract-ocr-3.05.00.tar.gz
+sha256 c0515c9f1e0c79e1069fcc05c2b2f6a6841fb5e1082d695db160333c1154f06d  eng.traineddata
+sha256 86afb23ad146467f263e8ade56fd3951b1cc28f8c4eebc34f993d3c02d88a7ab  fra.traineddata
+sha256 cb7eb42a7e972cec7ef904fe81825d7b547c46df684c814fdb11a930b13bca3a  deu.traineddata
+sha256 f23985996bbcfe2b57864ccb082783c1c74c87429f04411a04a6ba4d3da2efda  spa.traineddata
+sha256 323ae74d4a2ff49e932dbb4d6282fe0e67ddfafda075ec85803ecd077207454c  chi_sim.traineddata
+sha256 774d566bd0b36e4b6c07415dfa5b6b57feb2575b1f5f231d7fe01a52dac5dd0e  chi_tra.traineddata
diff --git a/package/tesseract-ocr/tesseract-ocr.mk b/package/tesseract-ocr/tesseract-ocr.mk
new file mode 100644
index 0000000..5ddacda
--- /dev/null
+++ b/package/tesseract-ocr/tesseract-ocr.mk
@@ -0,0 +1,69 @@
+################################################################################
+#
+# tesseract-ocr
+#
+################################################################################
+
+TESSERACT_OCR_VERSION = 3.05.00
+TESSERACT_OCR_DATA_VERSION = 3.04.00
+TESSERACT_OCR_SITE = $(call github,tesseract-ocr,tesseract,$(TESSERACT_OCR_VERSION))
+TESSERACT_OCR_LICENSE = Apache-2.0
+TESSERACT_OCR_LICENSE_FILES = COPYING
+
+# Source from github, no configure script provided
+TESSERACT_OCR_AUTORECONF = YES
+
+TESSERACT_OCR_DEPENDENCIES += leptonica jpeg libpng tiff
+
+TESSERACT_OCR_INSTALL_STAGING = YES
+
+TESSERACT_OCR_CONF_ENV += \
+	LIBLEPT_HEADERSDIR=$(STAGING_DIR)/usr/include/leptonica
+
+# Language data files download
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_ENG),y)
+TESSERACT_OCR_DATA_FILES += eng.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_FRA),y)
+TESSERACT_OCR_DATA_FILES += fra.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_DEU),y)
+TESSERACT_OCR_DATA_FILES += deu.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_SPA),y)
+TESSERACT_OCR_DATA_FILES += spa.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM),y)
+TESSERACT_OCR_DATA_FILES += chi_sim.traineddata
+endif
+
+ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA),y)
+TESSERACT_OCR_DATA_FILES += chi_tra.traineddata
+endif
+
+TESSERACT_OCR_EXTRA_DOWNLOADS = \
+	$(addprefix https://github.com/tesseract-ocr/tessdata/raw/$(TESSERACT_OCR_DATA_VERSION)/,\
+	$(TESSERACT_OCR_DATA_FILES))
+
+define TESSERACT_OCR_PRECONFIGURE
+	# Autoreconf step fails due to missing m4 directory
+	mkdir -p $(@D)/m4
+endef
+
+TESSERACT_OCR_PRE_CONFIGURE_HOOKS += TESSERACT_OCR_PRECONFIGURE
+
+# Language data files installation
+define TESSERACT_OCR_INSTALL_LANG_DATA
+	$(foreach langfile,$(TESSERACT_OCR_DATA_FILES), \
+		$(INSTALL) -D -m 0644 $(DL_DIR)/$(langfile) \
+			$(TARGET_DIR)/usr/share/tessdata/$(langfile)
+	)
+endef
+
+TESSERACT_OCR_POST_INSTALL_TARGET_HOOKS += TESSERACT_OCR_INSTALL_LANG_DATA
+
+$(eval $(autotools-package))
-- 
2.5.0

* [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
  2017-03-19  8:07 ` [Buildroot] [PATCH v2 2/2] tesseract-ocr: " Gilles Talis
@ 2017-03-19 13:54   ` Thomas Petazzoni
  2017-03-19 23:00     ` Arnout Vandecappelle
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Petazzoni @ 2017-03-19 13:54 UTC
To: buildroot

Hello,

On Sun, 19 Mar 2017 09:07:53 +0100, Gilles Talis wrote:

> diff --git a/package/tesseract-ocr/Config.in b/package/tesseract-ocr/Config.in
> new file mode 100644
> index 0000000..4fd0668
> --- /dev/null
> +++ b/package/tesseract-ocr/Config.in
> @@ -0,0 +1,44 @@
> +comment "tesseract-ocr needs a toolchain w/ threads, C++, gcc >= 4.8 & dynamic library"
> +	depends on BR2_USE_MMU
> +	depends on !BR2_INSTALL_LIBSTDCPP || !BR2_TOOLCHAIN_HAS_THREADS || \
> +	!BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 || BR2_STATIC_LIBS

Indentation of this last line should have been two tabs.

> +menuconfig BR2_PACKAGE_TESSERACT_OCR
> +	bool "tesseract-ocr"
> +	depends on BR2_INSTALL_LIBSTDCPP
> +	depends on BR2_TOOLCHAIN_HAS_THREADS
> +	depends on BR2_TOOLCHAIN_GCC_AT_LEAST_4_8 # C++11
> +	depends on BR2_USE_MMU # fork()
> +	depends on !BR2_STATIC_LIBS
> +	select BR2_PACKAGE_JPEG
> +	select BR2_PACKAGE_LEPTONICA
> +	select BR2_PACKAGE_LIBPNG
> +	select BR2_PACKAGE_TIFF

I don't see where jpeg, libpng and tiff are mandatory. In fact, I don't
see them being used by tesseract-ocr, so I've dropped those
dependencies for now.

> +TESSERACT_OCR_VERSION = 3.05.00
> +TESSERACT_OCR_DATA_VERSION = 3.04.00
> +TESSERACT_OCR_SITE = $(call github,tesseract-ocr,tesseract,$(TESSERACT_OCR_VERSION))
> +TESSERACT_OCR_LICENSE = Apache-2.0
> +TESSERACT_OCR_LICENSE_FILES = COPYING
> +
> +# Source from github, no configure script provided
> +TESSERACT_OCR_AUTORECONF = YES
> +
> +TESSERACT_OCR_DEPENDENCIES += leptonica jpeg libpng tiff

I've dropped jpeg, libpng and tiff. Instead, I've added host-pkgconf,
which is really needed since configure.ac uses PKG_CHECK_MODULES().

I've also passed --disable-opencl since your package hasn't added
explicit support for OpenCL.

> +# Language data files download
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_ENG),y)
> +TESSERACT_OCR_DATA_FILES += eng.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_FRA),y)
> +TESSERACT_OCR_DATA_FILES += fra.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_DEU),y)
> +TESSERACT_OCR_DATA_FILES += deu.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_SPA),y)
> +TESSERACT_OCR_DATA_FILES += spa.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_SIM),y)
> +TESSERACT_OCR_DATA_FILES += chi_sim.traineddata
> +endif
> +
> +ifeq ($(BR2_PACKAGE_TESSERACT_OCR_LANG_CHI_TRA),y)
> +TESSERACT_OCR_DATA_FILES += chi_tra.traineddata
> +endif

Regarding the language files, I'm not entirely happy with the current
solution, but I couldn't come up with something better. I looked at the
two following options:

 * Creating a separate package for the tessdata repository
   https://github.com/tesseract-ocr/tessdata/, but this repository is
   3.4GB in size, which is admittedly a bit annoying to download when
   you just want a single language.

 * Since the list of languages is quite long, having an explicit option
   for each of them is a bit annoying. So I looked into turning your
   one-option-per-language idea into a single option with a
   space-separated list of languages. Except that we need a hash entry
   for each language file in tesseract-ocr.hash anyway.

So in the end, I kept it as-is. We'll see if other folks have a better
idea.

So in the meantime, I've applied with the fixes described above.
Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

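For readers following along, the fixes described in this review might look roughly like the fragment below in tesseract-ocr.mk. This is only a sketch reconstructed from the wording of the review, not the committed file: the variable names come from the posted patch, the package and option names come from the review text, and the use of plain `=` assignments is an assumption.

```make
# Sketch of the review fixes (assumed form, not the committed file verbatim).

# jpeg, libpng and tiff are dropped; host-pkgconf is added because
# configure.ac uses PKG_CHECK_MODULES().
TESSERACT_OCR_DEPENDENCIES = leptonica host-pkgconf

# The package has no explicit OpenCL handling, so turn it off at
# configure time.
TESSERACT_OCR_CONF_OPTS = --disable-opencl
```

The matching `select BR2_PACKAGE_JPEG`, `select BR2_PACKAGE_LIBPNG` and `select BR2_PACKAGE_TIFF` lines in Config.in would be dropped at the same time, so that the Kconfig selections and the .mk dependencies stay in sync.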
* [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
  2017-03-19 13:54   ` Thomas Petazzoni
@ 2017-03-19 23:00     ` Arnout Vandecappelle
  2017-03-19 23:03       ` Thomas Petazzoni
  0 siblings, 1 reply; 10+ messages in thread
From: Arnout Vandecappelle @ 2017-03-19 23:00 UTC
To: buildroot

On 19-03-17 14:54, Thomas Petazzoni wrote:
> Regarding the language files, I'm not entirely happy with the current
> solution, but I couldn't come up with something better. I looked at the
> two following options:
>
>  * Creating a separate package for the tessdata repository
>    https://github.com/tesseract-ocr/tessdata/, but this repository is
>    3.4GB in size, which is admittedly a bit annoying to download when
>    you just want a single language.
>
>  * Since the list of languages is quite long, having an explicit option
>    for each of them is a bit annoying. So I looked into turning your
>    one-option-per-language idea into a single option with a
>    space-separated list of languages. Except that we need a hash entry
>    for each language file in tesseract-ocr.hash anyway.

 That's why we have BR_NO_CHECK_HASH_FOR, no?

 Regards,
 Arnout

>
> So in the end, I kept it as-is. We'll see if other folks have a better
> idea.

-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF

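To make the suggestion concrete, a single-option variant combined with BR_NO_CHECK_HASH_FOR could be sketched as follows. BR2_PACKAGE_TESSERACT_OCR_LANGUAGES is a hypothetical string option that does not exist in the posted patch; `qstrip` and BR_NO_CHECK_HASH_FOR are existing Buildroot facilities, and the remaining variables are taken from the patch. Treat it as an illustration of the idea rather than a proposed implementation.

```make
# Hypothetical Config.in counterpart (not in the posted patch):
#   config BR2_PACKAGE_TESSERACT_OCR_LANGUAGES
#       string "languages"    e.g. "eng fra deu"

# String options arrive quoted from Kconfig; qstrip removes the quotes.
TESSERACT_OCR_LANGUAGES = $(call qstrip,$(BR2_PACKAGE_TESSERACT_OCR_LANGUAGES))

# One <lang>.traineddata file per requested language code.
TESSERACT_OCR_DATA_FILES = $(addsuffix .traineddata,$(TESSERACT_OCR_LANGUAGES))

TESSERACT_OCR_EXTRA_DOWNLOADS = \
	$(addprefix https://github.com/tesseract-ocr/tessdata/raw/$(TESSERACT_OCR_DATA_VERSION)/,\
		$(TESSERACT_OCR_DATA_FILES))

# The language set is no longer known in advance, so the .hash file
# cannot list every file; skip the hash check for the data files.
BR_NO_CHECK_HASH_FOR += $(TESSERACT_OCR_DATA_FILES)
```

The cost of this variant is that the language files are downloaded without any integrity verification, whereas the one-option-per-language form keeps a sha256 entry for each file.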
* [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
  2017-03-19 23:00     ` Arnout Vandecappelle
@ 2017-03-19 23:03       ` Thomas Petazzoni
  2017-03-20  8:10         ` Gilles Talis
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Petazzoni @ 2017-03-19 23:03 UTC
To: buildroot

Hello,

On Mon, 20 Mar 2017 00:00:27 +0100, Arnout Vandecappelle wrote:

> >  * Since the list of languages is quite long, having an explicit option
> >    for each of them is a bit annoying. So I looked into turning your
> >    one-option-per-language idea into a single option with a space
> >    separated list of languages. Except that we anyway need to have the
> >    hash file for each language in tesseract-ocr.hash.
>
>  That's why we have BR_NO_CHECK_HASH_FOR, no?

True. But then we don't check hashes for stuff downloaded through
Github, which potentially could change (hence the reason why I'm also
suggesting to have a package that downloads all of the tessdata
package, but it's huge).

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

* [Buildroot] [PATCH v2 2/2] tesseract-ocr: new package
  2017-03-19 23:03       ` Thomas Petazzoni
@ 2017-03-20  8:10         ` Gilles Talis
  0 siblings, 0 replies; 10+ messages in thread
From: Gilles Talis @ 2017-03-20  8:10 UTC
To: buildroot

Hi Thomas, Arnout,

>>  That's why we have BR_NO_CHECK_HASH_FOR, no?
>
> True. But then we don't check hashes for stuff downloaded through
> Github, which potentially could change (hence the reason why I'm also
> suggesting to have a package that downloads all of the tessdata
> package, but it's huge).

First of all, thanks a lot for the corrections to my patch, and for
committing it.

Regarding the language pack, my first intention was to create a package
for the entire tessdata. But just like you, I found out this was not a
viable option.

I am open to all suggestions to make this support better, though.

Thanks again,
Gilles

end of thread, other threads:[~2017-03-21  7:37 UTC | newest]

Thread overview: 10+ messages
2017-03-19  8:07 [Buildroot] [PATCH v2 0/2] Introducing tesseract OCR engine Gilles Talis
2017-03-19  8:07 ` [Buildroot] [PATCH v2 1/2] leptonica: new package Gilles Talis
2017-03-19 13:51   ` Thomas Petazzoni
2017-03-20 15:05   ` Peter Korsgaard
2017-03-21  7:37     ` Gilles Talis
2017-03-19  8:07 ` [Buildroot] [PATCH v2 2/2] tesseract-ocr: " Gilles Talis
2017-03-19 13:54   ` Thomas Petazzoni
2017-03-19 23:00     ` Arnout Vandecappelle
2017-03-19 23:03       ` Thomas Petazzoni
2017-03-20  8:10         ` Gilles Talis