git.vger.kernel.org archive mirror
* [RFC/PATCH] userdiff.c: Avoid old glibc regex bug causing t4034-*.sh test failures
@ 2011-05-03 17:49 Ramsay Jones
  2011-05-03 21:07 ` Jonathan Nieder
From: Ramsay Jones @ 2011-05-03 17:49 UTC
  To: GIT Mailing-list; +Cc: Jonathan Nieder, trast, Junio C Hamano


In particular, this bug affects the word-diff regex for 'bibtex' and
'html', leading to the test failures in t4034-diff-words.sh. The bug
is described here:

    http://sourceware.org/bugzilla/show_bug.cgi?id=3957

and was fixed on 12-07-2007. In summary, when the REG_NEWLINE flag is
passed to regcomp(), a non-matching list ([^...]) not containing a
newline should not match a newline. However, in some old versions of
the glibc regex library, the newline character was indeed matched.

In order to work around the problem, we add an explicit '\n' to the
affected non-matching list expressions.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
---

Junio,
    I recently mentioned that a couple of tests in t4034-*.sh were
failing for me on Linux. I have now looked into it, and the problem
turned out to be an old bug in the glibc regex routines. :-(

This is an RFC because:
    - A simple fix would be for me to put NO_REGEX=1 in my config.mak,
      since the compat/regex routines don't suffer this problem.
    - I suspect this bug is old enough that it will not affect many users.
    - I have not audited the other non-matching list expressions in
      userdiff.c
    - blame, grep and pickaxe all call regcomp() with the REG_NEWLINE
      flag, but get the regex from the user (e.g. from the command line).

ATB,
Ramsay Jones

 userdiff.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index 1ff4797..2f9ba37 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -28,7 +28,7 @@ IPATTERN("fortran",
 	 "|[-+]?[0-9.]+([AaIiDdEeFfLlTtXx][Ss]?[-+]?[0-9.]*)?(_[a-zA-Z0-9][a-zA-Z0-9_]*)?"
 	 "|//|\\*\\*|::|[/<>=]="),
 PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
-	 "[^<>= \t]+"),
+	 "[^<>= \t\n]+"),
 PATTERNS("java",
 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
 	 "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$",
@@ -94,7 +94,7 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
 	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
 	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
 PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
-	 "[={}\"]|[^={}\" \t]+"),
+	 "[={}\"]|[^={}\" \t\n]+"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
 	 "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
 PATTERNS("cpp",
-- 
1.7.5
