From: Junio C Hamano <gitster@pobox.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
"Jeffrey Walton" <noloader@gmail.com>,
"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
"J Smith" <dark.panda@gmail.com>,
"Victor Leschuk" <vleschuk@gmail.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Fredrik Kuivinen" <frekui@gmail.com>,
"Brandon Williams" <bmwill@google.com>,
"Stefan Beller" <sbeller@google.com>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>,
"Simon Ruderich" <simon@ruderich.org>
Subject: Re: [PATCH v2 7/7] grep: add support for PCRE v2
Date: Wed, 24 May 2017 15:23:38 +0900 [thread overview]
Message-ID: <xmqqa862hixh.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170523192453.14172-8-avarab@gmail.com> ("Ævar Arnfjörð Bjarmason"'s message of "Tue, 23 May 2017 19:24:53 +0000")
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Add support for v2 of the PCRE API. This is a new major version of
> PCRE that came out in early 2015[1].
>
> The regular expression syntax is the same, but while the API is
> similar, pretty much every function is either renamed or takes
> different arguments. Thus using it via entirely new functions makes
> sense, as opposed to trying to e.g. have one compile_pcre_pattern()
> that would call either PCRE v1 or v2 functions.
>
> Git can now be compiled with either USE_LIBPCRE1=YesPlease or
> USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
> synonym for the former. Providing both is a compile-time error.
>
> With earlier patches to enable JIT for PCRE v1 the performance of the
> release versions of both libraries is almost exactly the same, with
> PCRE v2 being around 1% slower.
>
> However after I reported this to the pcre-dev mailing list[2] I got a
> lot of help with the API use from Zoltán Herczeg, he subsequently
> optimized some of the JIT functionality in v2 of the library.
>
> Running the p7820-grep-engines.sh performance test against the latest
> Subversion trunk of both, with both them and git compiled as -O3, and
> the test run against linux.git, gives the following results. Just the
> /perl/ tests shown:
>
> $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh
> [...]
> Test HEAD~2 HEAD~ HEAD
> ----------------------------------------------------------------------------------------------------------------
> 7820.3: perl grep 'how.to' 0.22(0.40+0.48) 0.22(0.31+0.58) +0.0% 0.22(0.26+0.59) +0.0%
> 7820.7: perl grep '^how to' 0.27(0.62+0.50) 0.28(0.60+0.50) +3.7% 0.22(0.25+0.60) -18.5%
> 7820.11: perl grep '[how] to' 0.33(0.92+0.47) 0.33(0.94+0.45) +0.0% 0.25(0.42+0.51) -24.2%
> 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.35(1.08+0.46) 0.35(1.12+0.41) +0.0% 0.25(0.52+0.50) -28.6%
> 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.30(0.78+0.51) 0.30(0.86+0.42) +0.0% 0.25(0.29+0.54) -16.7%
>
> See commit ("perf: add a comparison test of grep regex engines",
> 2017-04-19) for details on the machine the above test run was executed
> on.
>
> Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
> JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
> mentioning p7820-grep-engines.sh for more details on the test setup.
>
> For ease of readability, a different run just of HEAD~ (PCRE v1 with
> JIT v.s. PCRE v2), again with just the /perl/ tests shown:
>
> Test HEAD~ HEAD
> ---------------------------------------------------------------------------------------
> 7820.3: perl grep 'how.to' 0.23(0.41+0.47) 0.23(0.26+0.59) +0.0%
> 7820.7: perl grep '^how to' 0.27(0.64+0.47) 0.23(0.28+0.56) -14.8%
> 7820.11: perl grep '[how] to' 0.34(0.95+0.44) 0.25(0.38+0.56) -26.5%
> 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.34(1.07+0.46) 0.24(0.52+0.49) -29.4%
> 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.30(0.81+0.46) 0.22(0.33+0.54) -26.7%
>
> I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
> when it does it's around 20% faster.
>
> A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
> the compiled pattern can be shared between threads, but not some of
> the JIT context, however the grep threading support does all pattern &
> JIT compilation in separate threads, so this code doesn't need to
> concern itself with thread safety.
Nicely explained.
> -# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
> +# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
> +# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
> +# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
> +# default in future releases.
> +#
> +# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
> # /foo/bar/include and /foo/bar/lib directories.
As there is no way to use both, having a single LIBPCREDIR is not a
hurting limitation, which makes sense.
> @@ -2241,6 +2258,7 @@ GIT-BUILD-OPTIONS: FORCE
> @echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
> @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
> @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
Shouldn't the line above record $(USE_LIBPCRE1) instead of the
generic fallback?
> + @echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
> @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
> @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
> @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
> diff --git a/grep.c b/grep.c
> index 3c0c30f033..569cf9e290 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -179,22 +179,36 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
> case GREP_PATTERN_TYPE_BRE:
> opt->fixed = 0;
> opt->pcre1 = 0;
> + opt->pcre2 = 0;
> break;
>
> case GREP_PATTERN_TYPE_ERE:
> opt->fixed = 0;
> opt->pcre1 = 0;
> + opt->pcre2 = 0;
> opt->regflags |= REG_EXTENDED;
> break;
>
> case GREP_PATTERN_TYPE_FIXED:
> opt->fixed = 1;
> opt->pcre1 = 0;
> + opt->pcre2 = 0;
> break;
>
> case GREP_PATTERN_TYPE_PCRE:
> opt->fixed = 0;
> +#ifdef USE_LIBPCRE2
> + opt->pcre1 = 0;
> + opt->pcre2 = 1;
> +#else
> + /* It's important that pcre1 always be assigned to
> + * even when there's no USE_LIBPCRE* defined. We still
> + * call the PCRE stub function, it just dies with
> + * "cannot use Perl-compatible regexes[...]".
> + */
> opt->pcre1 = 1;
Very well thought-out comment. Our style wants you to have
slash-aster that opens a multi-line comment on its own line, though.
> + opt->pcre2 = 0;
> +#endif
> break;
> }
> }
> @@ -446,6 +460,126 @@ static void free_pcre1_regexp(struct grep_pat *p)
> }
> #endif /* !USE_LIBPCRE1 */
>
> +#ifdef USE_LIBPCRE2
> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
> +{
> +...
> + p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
> + p->patternlen, options, &error, &erroffset,
> + p->pcre2_compile_context);
Are all die("BUG:...") in this function actual bugs, or just
"die()"? Just like the comment on an earlier patch, things like
running out of memory that you as a Git programmer cannot fix by
correcting this code are not die("BUG:"), but normal runtime errors.
> +
> + if (p->pcre2_pattern) {
> + p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
> + if (!p->pcre2_match_data)
> + die("BUG: Couldn't allocate PCRE2 match data");
> + } else {
> + pcre2_get_error_message(error, errbuf, sizeof(errbuf));
> + compile_regexp_failed(p, (const char *)&errbuf);
> + }
> +
> + pcre2_config(PCRE2_CONFIG_JIT, &canjit);
> + if (canjit == 1) {
> + jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
> + if (!jitret)
> + p->pcre2_jit_on = 1;
I think the same "would it be better to do this without canjit?"
comment applies here.
> +#else /* !USE_LIBPCRE2 */
> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
> +{
> + /* Unreachable until USE_LIBPCRE2 becomes synonymous with
> + * USE_LIBPCRE. See the sibling comment in
> + * grep_set_pattern_type_option().
> + */
> + die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
> +}
Wow. If I were doing this, I wouldn't have been this cautious, but
I have no complaints ;-).
next prev parent reply other threads:[~2017-05-24 6:23 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 01/30] Makefile & configure: reword inaccurate comment about PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 02/30] grep & rev-list doc: stop promising libpcre for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 03/30] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 04/30] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-20 23:50 ` Junio C Hamano
2017-05-21 6:58 ` Ævar Arnfjörð Bjarmason
2017-05-22 0:17 ` Junio C Hamano
2017-06-28 21:58 ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
2017-06-29 22:22 ` [PATCH v2 0/6] " Ævar Arnfjörð Bjarmason
2017-06-29 22:22 ` [PATCH v2 1/6] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 22:22 ` [PATCH v2 2/6] grep: adjust a redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
2017-06-29 22:22 ` [PATCH v2 3/6] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 22:22 ` [PATCH v2 4/6] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
2017-06-29 22:22 ` [PATCH v2 5/6] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
2017-06-30 17:20 ` Junio C Hamano
2017-06-29 22:22 ` [PATCH v2 6/6] grep: remove redundant REG_NEWLINE when compiling fixed regex Ævar Arnfjörð Bjarmason
2017-06-28 21:58 ` [PATCH 1/5] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-28 21:58 ` [PATCH 2/5] grep: remove redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
2017-06-29 17:03 ` Stefan Beller
2017-06-29 17:50 ` Ævar Arnfjörð Bjarmason
2017-06-28 21:58 ` [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 17:10 ` Stefan Beller
2017-06-28 21:58 ` [PATCH 4/5] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
2017-06-28 21:58 ` [PATCH 5/5] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
2017-06-29 17:43 ` Stefan Beller
2017-06-29 18:16 ` Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 06/30] grep: add a test asserting that --perl-regexp dies when !PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 07/30] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 08/30] grep: change non-ASCII -i test to stop using --debug Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 09/30] grep: add tests for --threads=N and grep.threads Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 10/30] grep: amend submodule recursion test for regex engine testing Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 11/30] grep: add tests for grep pattern types being passed to submodules Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 12/30] grep: add a test helper function for less verbose -f \0 tests Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 13/30] grep: prepare for testing binary regexes containing rx metacharacters Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 14/30] grep: add tests to fix blind spots with \0 patterns Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Ævar Arnfjörð Bjarmason
2017-05-20 23:50 ` Junio C Hamano
2017-05-21 6:23 ` Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 16/30] perf: emit progress output when unpacking & building Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 17/30] perf: add a comparison test of grep regex engines Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 18/30] perf: add a comparison test of grep regex engines with -F Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 19/30] perf: add a comparison test of log --grep regex engines Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 20/30] grep: catch a missing enum in switch statement Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 21/30] grep: remove redundant regflags assignments Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function Ævar Arnfjörð Bjarmason
2017-05-23 21:17 ` Brandon Williams
2017-05-20 21:42 ` [PATCH v3 23/30] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 24/30] grep: change internal *pcre* variable & function names to be *pcre1* Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 25/30] grep: move is_fixed() earlier to avoid forward declaration Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 26/30] test-lib: add a PTHREADS prerequisite Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 27/30] pack-objects & index-pack: add test for --threads warning Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 28/30] pack-objects: fix buggy warning about threads Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 29/30] grep: given --threads with NO_PTHREADS=YesPlease, warn Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 30/30] grep: assert that threading is enabled when calling grep_{lock,unlock} Ævar Arnfjörð Bjarmason
2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
2017-05-23 19:24 ` [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
2017-05-23 19:24 ` [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
2017-05-24 4:42 ` Junio C Hamano
2017-05-25 10:33 ` Ævar Arnfjörð Bjarmason
2017-05-26 0:58 ` Junio C Hamano
2017-05-26 8:06 ` Ævar Arnfjörð Bjarmason
2017-05-26 9:49 ` Junio C Hamano
2017-05-23 19:24 ` [PATCH v2 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
2017-05-24 4:45 ` Junio C Hamano
2017-05-23 19:24 ` [PATCH v2 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-23 19:24 ` [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
2017-05-24 5:17 ` Junio C Hamano
2017-05-24 7:37 ` Ævar Arnfjörð Bjarmason
2017-05-23 19:24 ` [PATCH v2 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
2017-05-24 6:00 ` Junio C Hamano
2017-05-24 6:38 ` Junio C Hamano
2017-05-23 19:24 ` [PATCH v2 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
2017-05-23 19:24 ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
2017-05-24 6:23 ` Junio C Hamano [this message]
2017-05-25 9:49 ` Ævar Arnfjörð Bjarmason
-- strict thread matches above, loose matches on Subject: below --
2017-05-13 23:45 [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
2017-05-13 23:45 ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqa862hixh.fsf@gitster.mtv.corp.google.com \
--to=gitster@pobox.com \
--cc=avarab@gmail.com \
--cc=bmwill@google.com \
--cc=dark.panda@gmail.com \
--cc=frekui@gmail.com \
--cc=git@vger.kernel.org \
--cc=johannes.schindelin@gmx.de \
--cc=michal.kiedrowicz@gmail.com \
--cc=noloader@gmail.com \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=sbeller@google.com \
--cc=simon@ruderich.org \
--cc=vleschuk@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.