From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Fredrik Kuivinen" <frekui@gmail.com>,
	"Brandon Williams" <bmwill@google.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH/RFC 1/6] Makefile & compat/pcre2: add ability to build an embedded PCRE
Date: Thu, 11 May 2017 17:51:10 +0000
Message-ID: <20170511175115.648-2-avarab@gmail.com>
In-Reply-To: <20170511175115.648-1-avarab@gmail.com>

Add a USE_LIBPCRE2_BUNDLED=YesIHaveNoPackagedVersion flag to the
Makefile which'll use the PCRE v2 shipped in compat/pcre2 instead of
trying to find it via -lpcre2-8 on the installed system.
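
The way the two flags combine can be sketched in shell. This is only an
illustration that mirrors the Makefile conditionals in the diff below;
the function name pcre2_build_mode is hypothetical and is not part of
the patch, and an empty argument stands for "not defined":

```shell
# Hypothetical helper mirroring the Makefile conditionals in this patch.
# $1 = value of USE_LIBPCRE2, $2 = value of USE_LIBPCRE2_BUNDLED
# (an empty string stands for "not defined").
pcre2_build_mode () {
	use_pcre2=$1
	use_bundled=$2
	if test -n "$use_bundled" && test -z "$use_pcre2"
	then
		# Corresponds to the $(error ...) guard in the Makefile.
		echo "error: set USE_LIBPCRE2=YesPlease too" >&2
		return 1
	fi
	if test -z "$use_pcre2"
	then
		echo "none"      # PCRE v2 is not used at all
	elif test -n "$use_bundled"
	then
		echo "bundled"   # build compat/pcre2/src/*.o, no -lpcre2-8
	else
		echo "system"    # link with -lpcre2-8
	fi
}

pcre2_build_mode YesPlease YesIHaveNoPackagedVersion
```

Setting only USE_LIBPCRE2_BUNDLED is rejected, which matches the
$(error ...) check added to the Makefile.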

As covered in a previous commit ("grep: add support for PCRE v2",
2017-04-08) there are major benefits to using a bleeding edge PCRE v2,
but more importantly I'd like to experiment with making PCRE a
mandatory dependency to power various internal features of grep/log
without the user being aware that they're using the library under the
hood, similar to how we use kwset now for fixed-string searches.

Imposing that hard dependency on everyone using git would bother a lot
of people, whereas if git itself ships PCRE it's no more bothersome
than the code using kwset, i.e. it can be invisible to the builder &
user, and allow git to target newer PCRE APIs without worrying about
versioning.

See [1] for a mostly one-sided pcre-dev mailing list thread discussing
how to embed the library.

Implementation details:

 * I configured PCRE v2 with ./configure --enable-jit --enable-utf

 * It sets a lot of -DHAVE_* flags, but only the ones defined here are
   used by the subset of the files I copied; the rest are either
   unused or only used by pcre2test.c, which isn't brought in by the
   script.

 * These -DHAVE_* flags describe things we already have by default &
   assume in other git.git code, so it should be fine to define them.

 * -DPCRE2_CODE_UNIT_WIDTH=8 compiles only the functions that linking
   to -lpcre2-8 would have gotten us.

 * All the limits / sizes are the PCRE defaults. The
   MATCH_LIMIT_RECURSION define is a synonym for MATCH_LIMIT_DEPTH in
   older versions; defining both allows building against older
   (currently released) versions of the library.

 * -DNEWLINE_DEFAULT=2 means only \n is recognized as a newline. This
    corresponds to the --enable-newline-is-lf option. It's also
    possible to have CR, CRLF, any of CR, LF, or CRLF, or any
    Unicode newline sequence recognized as a newline instead.

    This *might* have to be customized on Windows. I think the grep
    machinery always splits on newlines for us already, so this
    probably works on Windows as-is, but it needs testing.
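
As an illustration of the NEWLINE_DEFAULT values mentioned above, here
is a small shell sketch. The numeric mapping follows the
PCRE2_NEWLINE_* constants in pcre2.h as far as I understand them; the
helper name pcre2_newline_name is hypothetical and is not part of the
patch:

```shell
# Hypothetical helper: name the newline convention selected by a
# -DNEWLINE_DEFAULT=<n> value, following the PCRE2_NEWLINE_* constants
# in pcre2.h (assumed mapping, see the note above).
pcre2_newline_name () {
	case "$1" in
	1) echo "CR" ;;       # only \r
	2) echo "LF" ;;       # only \n, the value used in this patch
	3) echo "CRLF" ;;     # only \r\n
	4) echo "ANY" ;;      # any Unicode newline sequence
	5) echo "ANYCRLF" ;;  # any of CR, LF, or CRLF
	*) echo "unknown"; return 1 ;;
	esac
}

pcre2_newline_name 2
```

A Windows-specific build that needed a different default would change
only the -DNEWLINE_DEFAULT value in COMPAT_CFLAGS.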

1. https://lists.exim.org/lurker/thread/20170507.223619.fbee8f00.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile                  | 52 ++++++++++++++++++++++++++++++++++++
 compat/pcre2/get-pcre2.sh | 67 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+)
 create mode 100755 compat/pcre2/get-pcre2.sh

diff --git a/Makefile b/Makefile
index d77ca4c1a5..b18867196e 100644
--- a/Makefile
+++ b/Makefile
@@ -34,6 +34,11 @@ all::
 # library. The USE_LIBPCRE flag will likely be changed to mean v2 by
 # default in future releases.
 #
+# Define USE_LIBPCRE2_BUNDLED=YesIHaveNoPackagedVersion in addition to
+# USE_LIBPCRE2=YesPlease if you'd like to use a copy of PCRE version 2
+# bundled with Git. This is for setups where getting a hold of a
+# packaged PCRE is inconvenient.
+#
 # Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
@@ -1105,8 +1110,10 @@ endif
 
 ifdef USE_LIBPCRE2
 	BASIC_CFLAGS += -DUSE_LIBPCRE2
+ifndef USE_LIBPCRE2_BUNDLED
 	EXTLIBS += -lpcre2-8
 endif
+endif
 
 ifdef LIBPCREDIR
 	BASIC_CFLAGS += -I$(LIBPCREDIR)/include
@@ -1505,6 +1512,50 @@ ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
 endif
+ifdef USE_LIBPCRE2_BUNDLED
+ifndef USE_LIBPCRE2
+$(error please set USE_LIBPCRE2=YesPlease when setting \
+USE_LIBPCRE2_BUNDLED=$(USE_LIBPCRE2_BUNDLED))
+endif
+	COMPAT_CFLAGS += \
+		-Icompat/pcre2/src \
+		-DHAVE_BCOPY=1 \
+		-DHAVE_INTTYPES_H=1 \
+		-DHAVE_MEMMOVE=1 \
+		-DHAVE_STDINT_H=1 \
+		-DPCRE2_CODE_UNIT_WIDTH=8 \
+		-DLINK_SIZE=2 \
+		-DHEAP_LIMIT=20000000 \
+		-DMATCH_LIMIT=10000000 \
+		-DMATCH_LIMIT_DEPTH=10000000 \
+		-DMATCH_LIMIT_RECURSION=10000000 \
+		-DMAX_NAME_COUNT=10000 \
+		-DMAX_NAME_SIZE=32 \
+		-DPARENS_NEST_LIMIT=250 \
+		-DNEWLINE_DEFAULT=2 \
+		-DSUPPORT_JIT \
+		-DSUPPORT_UNICODE
+	COMPAT_OBJS += \
+		compat/pcre2/src/pcre2_auto_possess.o \
+		compat/pcre2/src/pcre2_chartables.o \
+		compat/pcre2/src/pcre2_compile.o \
+		compat/pcre2/src/pcre2_config.o \
+		compat/pcre2/src/pcre2_context.o \
+		compat/pcre2/src/pcre2_error.o \
+		compat/pcre2/src/pcre2_find_bracket.o \
+		compat/pcre2/src/pcre2_jit_compile.o \
+		compat/pcre2/src/pcre2_maketables.o \
+		compat/pcre2/src/pcre2_match.o \
+		compat/pcre2/src/pcre2_match_data.o \
+		compat/pcre2/src/pcre2_newline.o \
+		compat/pcre2/src/pcre2_ord2utf.o \
+		compat/pcre2/src/pcre2_string_utils.o \
+		compat/pcre2/src/pcre2_study.o \
+		compat/pcre2/src/pcre2_tables.o \
+		compat/pcre2/src/pcre2_ucd.o \
+		compat/pcre2/src/pcre2_valid_utf.o \
+		compat/pcre2/src/pcre2_xclass.o
+endif
 ifdef NATIVE_CRLF
 	BASIC_CFLAGS += -DNATIVE_CRLF
 endif
@@ -2259,6 +2310,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
 	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
 	@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
+	@echo USE_LIBPCRE2_BUNDLED=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2_BUNDLED)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
diff --git a/compat/pcre2/get-pcre2.sh b/compat/pcre2/get-pcre2.sh
new file mode 100755
index 0000000000..f1796cb518
--- /dev/null
+++ b/compat/pcre2/get-pcre2.sh
@@ -0,0 +1,67 @@
+#!/bin/sh -e
+
+# Usage:
+# ./get-pcre2.sh '' 'trunk'
+# ./get-pcre2.sh '' 'tags/pcre2-10.23'
+# ./get-pcre2.sh ~/g/pcre2 ''
+
+srcdir=$1
+version=$2
+if test -z "$version"
+then
+	version="tags/pcre2-10.23"
+fi
+
+echo Getting PCRE v2 version $version
+rm -rfv src
+mkdir src src/sljit
+
+for srcfile in \
+	pcre2.h \
+	pcre2_internal.h \
+	pcre2_intmodedep.h \
+	pcre2_ucp.h \
+	pcre2_auto_possess.c \
+	pcre2_chartables.c.dist \
+	pcre2_compile.c \
+	pcre2_config.c \
+	pcre2_context.c \
+	pcre2_error.c \
+	pcre2_find_bracket.c \
+	pcre2_jit_compile.c \
+	pcre2_jit_match.c \
+	pcre2_jit_misc.c \
+	pcre2_maketables.c \
+	pcre2_match.c \
+	pcre2_match_data.c \
+	pcre2_newline.c \
+	pcre2_ord2utf.c \
+	pcre2_string_utils.c \
+	pcre2_study.c \
+	pcre2_tables.c \
+	pcre2_ucd.c \
+	pcre2_valid_utf.c \
+	pcre2_xclass.c
+do
+	if test -z "$srcdir"
+	then
+		svn cat svn://vcs.exim.org/pcre2/code/$version/src/$srcfile >src/$srcfile
+	else
+		cp "$srcdir/src/$srcfile" src/$srcfile
+	fi
+	wc -l src/$srcfile
+done
+
+(cd src && ln -sf pcre2_chartables.c.dist pcre2_chartables.c)
+
+if test -z "$srcdir"
+then
+	for srcfile in $(svn ls svn://vcs.exim.org/pcre2/code/$version/src/sljit)
+	do
+		svn cat svn://vcs.exim.org/pcre2/code/$version/src/sljit/$srcfile >src/sljit/$srcfile
+		wc -l src/sljit/$srcfile
+	done
+else
+	cp -R "$srcdir/src/sljit" src/
+	wc -l src/sljit/*
+fi
-- 
2.11.0

