git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test failures
@ 2011-05-03 17:49 Ramsay Jones
  2011-05-03 21:07 ` Jonathan Nieder
  0 siblings, 1 reply; 3+ messages in thread
From: Ramsay Jones @ 2011-05-03 17:49 UTC (permalink / raw)
  To: GIT Mailing-list; +Cc: Jonathan Nieder, trast, Junio C Hamano


In particular, this bug affects the word-diff regex for 'bibtex' and
'html', leading to the test failures in t4034-diff-words.sh. The bug
is described here:

    http://sourceware.org/bugzilla/show_bug.cgi?id=3957

and was fixed on 12-07-2007. In summary, when the REG_NEWLINE flag is
passed to regcomp(), a non-matching list ([^...]) not containing a
newline should not match a newline. However, in some old versions of
the glibc regex library, the newline character was indeed matched.

In order to fix the problem, we add an explicit '\n' to the list in
the non-matching list expression.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
---

Junio,
    I recently mentioned that a couple of tests in t4034-*.sh were
failing for me on Linux. I have now looked into it, and the problem
turned out to be an old bug in the glibc regex routines. :-(

This is an RFC because:
    - A simple fix would be for me to put NO_REGEX=1 in my config.mak,
      since the compat/regex routines don't suffer this problem.
    - I suspect this bug is old enough that it will not affect many users.
    - I have not audited the other non-matching list expressions in
      userdiff.c
    - blame, grep and pickaxe all call regcomp() with the REG_NEWLINE
      flag, but get the regex from the user (eg from command line).

ATB,
Ramsay Jones

 userdiff.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index 1ff4797..2f9ba37 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -28,7 +28,7 @@ IPATTERN("fortran",
 	 "|[-+]?[0-9.]+([AaIiDdEeFfLlTtXx][Ss]?[-+]?[0-9.]*)?(_[a-zA-Z0-9][a-zA-Z0-9_]*)?"
 	 "|//|\\*\\*|::|[/<>=]="),
 PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
-	 "[^<>= \t]+"),
+	 "[^<>= \t\n]+"),
 PATTERNS("java",
 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
 	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
@@ -94,7 +94,7 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
 	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
-	 "[={}\"]|[^={}\" \t]+"),
+	 "[={}\"]|[^={}\" \t\n]+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
 	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
 PATTERNS("cpp",
-- 
1.7.5

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test failures
  2011-05-03 17:49 [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test failures Ramsay Jones
@ 2011-05-03 21:07 ` Jonathan Nieder
  2011-05-07 17:54   ` Ramsay Jones
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Nieder @ 2011-05-03 21:07 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: GIT Mailing-list, trast, Junio C Hamano

Hi,

Ramsay Jones wrote:

> This is an RFC because:
>     - A simple fix would be for me to put NO_REGEX=1 in my config.mak,
>       since the compat/regex routines don't suffer this problem.
>     - I suspect this bug is old enough that it will not affect many users.
>     - I have not audited the other non-matching list expressions in
>       userdiff.c
>     - blame, grep and pickaxe all call regcomp() with the REG_NEWLINE
>       flag, but get the regex from the user (eg from command line).

I think excluding \n along with ' ' and \t in similar places makes
sense, but that meanwhile we should add tests to the testsuite for
blame/grep/pickaxe to catch such implementations failing when
NO_REGEX=1 is not set.

Does that make sense?  Thanks for tracking this down.

Jonathan

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test failures
  2011-05-03 21:07 ` Jonathan Nieder
@ 2011-05-07 17:54   ` Ramsay Jones
  0 siblings, 0 replies; 3+ messages in thread
From: Ramsay Jones @ 2011-05-07 17:54 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: GIT Mailing-list, trast, Junio C Hamano

Jonathan Nieder wrote:
> Ramsay Jones wrote:
>> This is an RFC because:
>>     - A simple fix would be for me to put NO_REGEX=1 in my config.mak,
>>       since the compat/regex routines don't suffer this problem.
>>     - I suspect this bug is old enough that it will not affect many users.
>>     - I have not audited the other non-matching list expressions in
>>       userdiff.c
>>     - blame, grep and pickaxe all call regcomp() with the REG_NEWLINE
>>       flag, but get the regex from the user (eg from command line).
> 
> I think excluding \n along with ' ' and \t in similar places makes
> sense, ...

I briefly considered replacing ' \t' with [:space:], but that would
also add \v, \f and \r in addition to \n (and, maybe, some locale
specific chars too), so I decided not to go that route ...

>        ... but that meanwhile we should add tests to the testsuite for
> blame/grep/pickaxe to catch such implementations failing when
> NO_REGEX=1 is not set.

Hmm, well I would certainly never complain about additional tests, but
who would these tests benefit? I suspect that the number of people
building git from source on an old (pre about 2008 say) glibc system
with this bug is somewhat small. (Of course I have no numbers to back
up that gut feeling! :-))

Having said that, I have attached a patch (below), that adds a test
(to t0070-fundamental.sh) using a stand-alone test-regex program, in
order to simplify the detection of this bug.

What do you think? (would this exchange in the list-archive suffice?)

> Does that make sense?  Thanks for tracking this down.

No problem.

ATB,
Ramsay Jones

-- >8 --
Subject: [PATCH] test-regex: Add a test to check for a bug in the regex routines


Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
---
 .gitignore             |    1 +
 Makefile               |    1 +
 t/t0070-fundamental.sh |    5 +++++
 test-regex.c           |   35 +++++++++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+), 0 deletions(-)
 create mode 100644 test-regex.c

diff --git a/.gitignore b/.gitignore
index 711fce7..70af8b1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -174,6 +174,7 @@
 /test-obj-pool
 /test-parse-options
 /test-path-utils
+/test-regex
 /test-run-command
 /test-sha1
 /test-sigchain
diff --git a/Makefile b/Makefile
index 3a1fe20..482b9f9 100644
--- a/Makefile
+++ b/Makefile
@@ -428,6 +428,7 @@ TEST_PROGRAMS_NEED_X += test-match-trees
 TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-regex
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 9bee8bf..da2c504 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -25,4 +25,9 @@ test_expect_success POSIXPERM 'mktemp to unwritable directory prints filename' '
 	grep "cannotwrite/test" err
 '
 
+test_expect_success 'check for a bug in the regex routines' '
+	# if this test fails, re-build git with NO_REGEX=1
+	test-regex
+'
+
 test_done
diff --git a/test-regex.c b/test-regex.c
new file mode 100644
index 0000000..9259985
--- /dev/null
+++ b/test-regex.c
@@ -0,0 +1,35 @@
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <sys/types.h>
+#include <regex.h>
+
+static void die(const char *fmt, ...)
+{
+	va_list p;
+
+	va_start(p, fmt);
+	vfprintf(stderr, fmt, p);
+	va_end(p);
+	fputc('\n', stderr);
+	exit(128);
+}
+
+int main(int argc, char **argv)
+{
+	char *pat = "[^={} \t]+";
+	char *str = "={}\nfred";
+	regex_t r;
+	regmatch_t m[1];
+
+	if (regcomp(&r, pat, REG_EXTENDED | REG_NEWLINE))
+		die("failed regcomp() for pattern '%s'", pat);
+	if (regexec(&r, str, 1, m, 0))
+		die("no match of pattern '%s' to string '%s'", pat, str);
+
+	/* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
+	if (m[0].rm_so == 3) /* matches '\n' when it should not */
+		exit(1);
+
+	exit(0);
+}
-- 
1.7.5

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-05-07 17:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-03 17:49 [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test failures Ramsay Jones
2011-05-03 21:07 ` Jonathan Nieder
2011-05-07 17:54   ` Ramsay Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).