All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Simon Ruderich" <simon@ruderich.org>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v2 2/2] grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT)
Date: Thu, 23 Nov 2017 14:16:58 +0000	[thread overview]
Message-ID: <20171123141658.13010-2-avarab@gmail.com> (raw)
In-Reply-To: <20171123141658.13010-1-avarab@gmail.com>
In-Reply-To: <20171122133630.18931-2-avarab@gmail.com>

Fix a bug in the compilation of PCRE2 patterns under JIT (the most
common runtime configuration). Any pattern with a (*NO_JIT) verb would
segfault in any currently released PCRE2 version:

    $ git grep -P '(*NO_JIT)hi.*there'
    Segmentation fault

That this segfaulted was a bug in PCRE2 itself, after reporting it[1]
on pcre-dev it's been fixed in a yet-to-be-released version of
PCRE (presumably released first as 10.31). Now it'll die with:

    $ git grep -P '(*NO_JIT)hi.*there'
    fatal: pcre2_jit_match failed with error code -45: bad JIT option

But the cause of the bug is in our own code dating back to my
94da9193a6 ("grep: add support for PCRE v2", 2017-06-01).

As explained at more length in the comment being added here, it isn't
sufficient to just check pcre2_config() to see whether the JIT should
be used, pcre2_pattern_info() also has to be asked.

This is something I discovered myself when fiddling around with PCRE2
verbs in patterns passed to git. I don't expect that any user of git
has encountered this given the obscurity of passing PCRE2 verbs
through to the library, along with the relative obscurity of (*NO_JIT)
itself.

1. "How am I supposed to use PCRE2 JIT in the face of (*NO_JIT) ?"
   (<CACBZZX5mMqDuWuFmi7sRBp3wH6CFyd-ghACukd=v0NN=rBMnJg@mail.gmail.com> &
    https://lists.exim.org/lurker/thread/20171123.101502.7f0d38ca.en.html)
   on the pcre-dev mailing list

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

Incorporates feedback from Eric & Simon. Thanks both. I also amended
the commit message / comment to note that this was also a bug in PCRE2
upstream, which has been fixed after I reported it.

 grep.c          | 26 ++++++++++++++++++++++++++
 t/t7810-grep.sh |  6 ++++++
 2 files changed, 32 insertions(+)

diff --git a/grep.c b/grep.c
index d0b9b6cdfa..e8ae0b5d8f 100644
--- a/grep.c
+++ b/grep.c
@@ -477,6 +477,8 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	int options = PCRE2_MULTILINE;
 	const uint8_t *character_tables = NULL;
 	int jitret;
+	int patinforet;
+	size_t jitsizearg;
 
 	assert(opt->pcre2);
 
@@ -511,6 +513,30 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
 		if (jitret)
 			die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
+
+		/*
+		 * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just
+		 * tells us whether the library itself supports JIT,
+		 * but to see whether we're going to be actually using
+		 * JIT we need to extract PCRE2_INFO_JITSIZE from the
+		 * pattern *after* we do pcre2_jit_compile() above.
+		 *
+		 * This is because if the pattern contains the
+		 * (*NO_JIT) verb (see pcre2syntax(3))
+		 * pcre2_jit_compile() will exit early with 0. If we
+		 * then proceed to call pcre2_jit_match() further down
+		 * the line instead of pcre2_match() we'll either
+		 * segfault (pre PCRE 10.31) or run into a fatal error
+		 * (post PCRE2 10.31)
+		 */
+		patinforet = pcre2_pattern_info(p->pcre2_pattern, PCRE2_INFO_JITSIZE, &jitsizearg);
+		if (patinforet)
+			BUG("pcre2_pattern_info() failed: %d", patinforet);
+		if (jitsizearg == 0) {
+			p->pcre2_jit_on = 0;
+			return;
+		}
+
 		p->pcre2_jit_stack = pcre2_jit_stack_create(1, 1024 * 1024, NULL);
 		if (!p->pcre2_jit_stack)
 			die("Couldn't allocate PCRE2 JIT stack");
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 2a6679c2f5..c8ff50cc30 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1110,6 +1110,12 @@ test_expect_success PCRE 'grep -P pattern' '
 	test_cmp expected actual
 '
 
+test_expect_success LIBPCRE2 "grep -P with (*NO_JIT) doesn't error out" '
+	git grep -P "(*NO_JIT)\p{Ps}.*?\p{Pe}" hello.c >actual &&
+	test_cmp expected actual
+
+'
+
 test_expect_success !PCRE 'grep -P pattern errors without PCRE' '
 	test_must_fail git grep -P "foo.*bar"
 '
-- 
2.15.0.403.gc27cc4dac6


  parent reply	other threads:[~2017-11-23 14:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-22 13:36 [PATCH 1/2] test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites Ævar Arnfjörð Bjarmason
2017-11-22 13:36 ` [PATCH 2/2] grep: fix segfault under -P + PCRE2 + (*NO_JIT) Ævar Arnfjörð Bjarmason
2017-11-22 18:18   ` Eric Sunshine
2017-11-23  9:15     ` Ævar Arnfjörð Bjarmason
2017-11-23  9:10   ` Simon Ruderich
2017-11-23 14:16   ` [PATCH v2 1/2] test-lib: add LIBPCRE1 & LIBPCRE2 prerequisites Ævar Arnfjörð Bjarmason
2017-11-23 14:16   ` Ævar Arnfjörð Bjarmason [this message]
2017-11-22 20:22 ` [PATCH " Jonathan Nieder
2017-11-23  9:11   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171123141658.13010-2-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=simon@ruderich.org \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.