* git diff looping? @ 2009-06-16 1:37 John Bito 2009-06-16 2:44 ` Jeff Epler 2009-06-16 11:47 ` Jeff King 0 siblings, 2 replies; 37+ messages in thread From: John Bito @ 2009-06-16 1:37 UTC (permalink / raw) To: git Running Git 1.6.1 on Solaris 10, git diff seems to go into a loop - consuming CPU and producing no output after a little bit. While the repository isn't small, it's not huge (it's http://repo.or.cz/w/egit.git). I've tried the following: $ git diff v0.4.0 > ~/t.diff $ git diff v0.4.0 HEAD > ~/t.diff Both sit indefinitely eating most of 1 CPU $ git fsck Exits quickly with no output. I don't mind going to a newer version of Git, but I'd like to have some idea of the source of the problem before I start trying things. Thanks! John ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 1:37 git diff looping? John Bito @ 2009-06-16 2:44 ` Jeff Epler 2009-06-16 2:53 ` John Bito 2009-06-16 11:47 ` Jeff King 1 sibling, 1 reply; 37+ messages in thread From: Jeff Epler @ 2009-06-16 2:44 UTC (permalink / raw) To: John Bito; +Cc: git It's apples and oranges, but I had no problem on $ git --version git version 1.5.4.3 $ uname -a Linux platte 2.6.24-24-generic #1 SMP Wed Apr 15 15:11:35 UTC 2009 x86_64 GNU/Linux or $ /usr/src/git/git --version git version 1.6.3.2.31.g7af0 $ git clone git://repo.or.cz/egit.git $ cd egit $ git diff v0.4.0 | wc -l 47943 $ /usr/src/git/git diff v0.4.0 | wc -l 47943 Jeff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 2:44 ` Jeff Epler @ 2009-06-16 2:53 ` John Bito 0 siblings, 0 replies; 37+ messages in thread From: John Bito @ 2009-06-16 2:53 UTC (permalink / raw) To: Jeff Epler; +Cc: git Thanks, Jeff! I didn't mean to suggest there was a problem in the (origin) repository. I was wondering if anyone could recommend an approach to finding out what went wrong. I can certainly clone a fresh repo and apply my changes there. I'd just like to have some idea of what went wrong. ~John On Mon, Jun 15, 2009 at 7:44 PM, Jeff Epler<jepler@unpythonic.net> wrote: > It's apples and oranges, but I had no problem on > $ git --version > git version 1.5.4.3 > $ uname -a > Linux platte 2.6.24-24-generic #1 SMP Wed Apr 15 15:11:35 UTC 2009 x86_64 GNU/Linux > or > $ /usr/src/git/git --version > git version 1.6.3.2.31.g7af0 > > $ git clone git://repo.or.cz/egit.git > $ cd egit > $ git diff v0.4.0 | wc -l > 47943 > > $ /usr/src/git/git diff v0.4.0 | wc -l > 47943 > > Jeff > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 1:37 git diff looping? John Bito 2009-06-16 2:44 ` Jeff Epler @ 2009-06-16 11:47 ` Jeff King 2009-06-16 12:07 ` Jeff King ` (2 more replies) 1 sibling, 3 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 11:47 UTC (permalink / raw) To: John Bito; +Cc: git On Mon, Jun 15, 2009 at 06:37:21PM -0700, John Bito wrote: > Running Git 1.6.1 on Solaris 10, git diff seems to go into a loop - > consuming CPU and producing no output after a little bit. While the > repository isn't small, it's not huge (it's > http://repo.or.cz/w/egit.git). I've tried the following: I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to be caused by a horribly slow system regex implementation; it really chokes on the regex we use to find the "funcname" line for java files. I tried running "git diff v0.4.0" and it still hadn't finished after 90 seconds. Then I did: git config diff.java.xfuncname foo ;# some garbage regex git diff v0.4.0 and it completed in about 2.5 seconds. Can you try that and see if it works around the problem for you? If anybody wants to look further into the problem, I think it is specifically triggered by this file (and the built-in xfuncname for java files): $ git clone git://repo.or.cz/egit.git $ git diff v0.4.0 -- \ org.spearce.egit.core.test/src/org/spearce/egit/core/op/T0001_ConnectProviderOperationTest.java which isn't even all that big a file, but it is either causing some horrible algorithmic behavior in the regex library, or is outright sending it into an infinite loop. I tried building against the code in compat/regex; it completes in a reasonable amount of time, though it is still noticeably slow. With system regex, the diff given above doesn't complete in less than 90 seconds (at which I get bored and kill it). With compat/regex, it completes in about 2.2 seconds. Disabling the xfuncname, it completes in 0.14 seconds. So I think it is a viable solution to recommend building against compat/regex on Solaris, but I think there is still room for improvement in what we ship in compat/. -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 11:47 ` Jeff King @ 2009-06-16 12:07 ` Jeff King 2009-06-16 12:11 ` [PATCH 1/2] Makefile: refactor regex compat support Jeff King 2009-06-16 12:14 ` [PATCH " Jeff King 2009-06-16 15:48 ` git diff looping? John Bito 2009-06-16 16:51 ` Junio C Hamano 2 siblings, 2 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 12:07 UTC (permalink / raw) To: John Bito; +Cc: Junio C Hamano, git On Tue, Jun 16, 2009 at 07:47:26AM -0400, Jeff King wrote: > $ git clone git://repo.or.cz/egit.git > $ git diff v0.4.0 -- \ > org.spearce.egit.core.test/src/org/spearce/egit/core/op/T0001_ConnectProviderOperationTest.java > > which isn't even all that big a file, but it is either causing some > horrible algorithmic behavior in the regex library, or is outright > sending it into an infinite loop. > > I tried building against the code in compat/regex; it completes in a > reasonable amount of time, though it is still noticeably slow. With > system regex, the diff given above doesn't complete in less than 90 > seconds (at which I get bored and kill it). With compat/regex, it > completes in about 2.2 seconds. Disabling the xfuncname, it completes in > 0.14 seconds. And here is a patch series to use compat/regex on Solaris. I think the first one should be non-controversial, as it just makes the knob more convenient to turn. The second one is up for debate. 1/2: Makefile: refactor regex compat support 2/2: Makefile: use compat regex on Solaris -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/2] Makefile: refactor regex compat support 2009-06-16 12:07 ` Jeff King @ 2009-06-16 12:11 ` Jeff King 2009-06-16 18:47 ` Johannes Sixt 2009-06-16 12:14 ` [PATCH " Jeff King 1 sibling, 1 reply; 37+ messages in thread From: Jeff King @ 2009-06-16 12:11 UTC (permalink / raw) To: John Bito; +Cc: Junio C Hamano, Johannes Sixt, git There was no tweakable knob to use the regex compat code; it was embedded in the mingw build. Since other platforms may want to use it, let's factor it out in the usual way for build configuration knobs. Signed-off-by: Jeff King <peff@peff.net> --- This should behave the same as before on all platforms. The only one I might have screwed up by doing it wrong is mingw, but I have no machine to test on. Johannes, can you confirm that it is right? Makefile | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 04bf8b1..a1e4e45 100644 --- a/Makefile +++ b/Makefile @@ -182,6 +182,8 @@ all:: # # Define NO_CROSS_DIRECTORY_HARDLINKS if you plan to distribute the installed # programs as a tar, where bin/ and libexec/ might be on different file systems. +# +# Define NO_REGEX if you have no or inferior regex support in your C library. GIT-VERSION-FILE: .FORCE-GIT-VERSION-FILE @$(SHELL_PATH) ./GIT-VERSION-GEN @@ -863,10 +865,11 @@ ifneq (,$(findstring MINGW,$(uname_S))) USE_WIN32_MMAP = YesPlease UNRELIABLE_FSTAT = UnfortunatelyYes OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo - COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/regex -Icompat/fnmatch + NO_REGEX = YesPlease + COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/fnmatch COMPAT_CFLAGS += -DSNPRINTF_SIZE_CORR=1 COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\" - COMPAT_OBJS += compat/mingw.o compat/fnmatch/fnmatch.o compat/regex/regex.o compat/winansi.o + COMPAT_OBJS += compat/mingw.o compat/fnmatch/fnmatch.o compat/winansi.o EXTLIBS += -lws2_32 X = .exe endif @@ -1157,6 +1160,10 @@ endif ifdef UNRELIABLE_FSTAT BASIC_CFLAGS += -DUNRELIABLE_FSTAT endif +ifdef NO_REGEX + COMPAT_CFLAGS += -Icompat/regex + COMPAT_OBJS += compat/regex/regex.o +endif ifeq ($(TCLTK_PATH),) NO_TCLTK=NoThanks -- 1.6.3.2.225.gb8364.dirty ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] Makefile: refactor regex compat support 2009-06-16 12:11 ` [PATCH 1/2] Makefile: refactor regex compat support Jeff King @ 2009-06-16 18:47 ` Johannes Sixt 2009-06-16 19:05 ` Jeff King 0 siblings, 1 reply; 37+ messages in thread From: Johannes Sixt @ 2009-06-16 18:47 UTC (permalink / raw) To: Jeff King; +Cc: John Bito, Junio C Hamano, git On Dienstag, 16. Juni 2009, Jeff King wrote: > There was no tweakable knob to use the regex compat code; it > was embedded in the mingw build. Since other platforms may > want to use it, let's factor it out in the usual way for > build configuration knobs. > > Signed-off-by: Jeff King <peff@peff.net> > --- > This should behave the same as before on all platforms. The only one I > might have screwed up by doing it wrong is mingw, but I have no machine > to test on. Johannes, can you confirm that it is right? It compiles and passes t/t40* (diff stuff) on Windows. But the patch conflicts with recent master, though nothing worrisome. -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/2] Makefile: refactor regex compat support 2009-06-16 18:47 ` Johannes Sixt @ 2009-06-16 19:05 ` Jeff King 2009-06-16 19:07 ` [PATCH v2 " Jeff King 2009-06-16 19:08 ` [PATCH v2 2/2] Makefile: use compat regex on Solaris Jeff King 0 siblings, 2 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 19:05 UTC (permalink / raw) To: Johannes Sixt; +Cc: John Bito, Junio C Hamano, git On Tue, Jun 16, 2009 at 08:47:28PM +0200, Johannes Sixt wrote: > It compiles and passes t/t40* (diff stuff) on Windows. But the patch > conflicts with recent master, though nothing worrisome. Thanks for checking. Yeah, it looks like several changes got merged to master right after my branch point. All the conflicts are purely textual. For convenience, I'll repost a rebased version. -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v2 1/2] Makefile: refactor regex compat support 2009-06-16 19:05 ` Jeff King @ 2009-06-16 19:07 ` Jeff King 2009-06-16 19:08 ` [PATCH v2 2/2] Makefile: use compat regex on Solaris Jeff King 1 sibling, 0 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 19:07 UTC (permalink / raw) To: Johannes Sixt; +Cc: John Bito, Junio C Hamano, git There was no tweakable knob to use the regex compat code; it was embedded in the mingw build. Since other platforms may want to use it, let's factor it out in the usual way for build configuration knobs. Signed-off-by: Jeff King <peff@peff.net> --- Rebased on today's master to resolve textual conflicts. Makefile | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 41ab8e9..0cb21da 100644 --- a/Makefile +++ b/Makefile @@ -194,6 +194,8 @@ all:: # # Define USE_NED_ALLOCATOR if you want to replace the platforms default # memory allocators with the nedmalloc allocator written by Niall Douglas. +# +# Define NO_REGEX if you have no or inferior regex support in your C library. GIT-VERSION-FILE: .FORCE-GIT-VERSION-FILE @$(SHELL_PATH) ./GIT-VERSION-GEN @@ -884,9 +886,10 @@ ifneq (,$(findstring MINGW,$(uname_S))) USE_NED_ALLOCATOR = YesPlease UNRELIABLE_FSTAT = UnfortunatelyYes OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo - COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/regex -Icompat/fnmatch + NO_REGEX = YesPlease + COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/fnmatch COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\" - COMPAT_OBJS += compat/mingw.o compat/fnmatch/fnmatch.o compat/regex/regex.o compat/winansi.o + COMPAT_OBJS += compat/mingw.o compat/fnmatch/fnmatch.o compat/winansi.o EXTLIBS += -lws2_32 X = .exe ifneq (,$(wildcard ../THIS_IS_MSYSGIT)) @@ -1200,6 +1203,10 @@ endif ifdef UNRELIABLE_FSTAT BASIC_CFLAGS += -DUNRELIABLE_FSTAT endif +ifdef NO_REGEX + COMPAT_CFLAGS += -Icompat/regex + COMPAT_OBJS += compat/regex/regex.o +endif ifdef USE_NED_ALLOCATOR COMPAT_CFLAGS += -DUSE_NED_ALLOCATOR -DOVERRIDE_STRDUP -DNDEBUG -DREPLACE_SYSTEM_ALLOCATOR -Icompat/nedmalloc -- 1.6.3.2.411.gffb5 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 2/2] Makefile: use compat regex on Solaris 2009-06-16 19:05 ` Jeff King 2009-06-16 19:07 ` [PATCH v2 " Jeff King @ 2009-06-16 19:08 ` Jeff King 2009-06-16 20:07 ` Brandon Casey 2009-06-17 13:15 ` Mike Ralphson 1 sibling, 2 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 19:08 UTC (permalink / raw) To: Johannes Sixt; +Cc: John Bito, Junio C Hamano, git The system regex is either slow or buggy for complex patterns, like the built-in xfuncname pattern for java files. Signed-off-by: Jeff King <peff@peff.net> --- Rebased to today's master to resolve textual conflicts. Makefile | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/Makefile b/Makefile index 0cb21da..3bd0c08 100644 --- a/Makefile +++ b/Makefile @@ -725,6 +725,7 @@ ifeq ($(uname_S),SunOS) NO_MEMMEM = YesPlease NO_MKDTEMP = YesPlease NO_MKSTEMPS = YesPlease + NO_REGEX = YesPlease ifeq ($(uname_R),5.7) NEEDS_RESOLV = YesPlease NO_IPV6 = YesPlease -- 1.6.3.2.411.gffb5 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v2 2/2] Makefile: use compat regex on Solaris 2009-06-16 19:08 ` [PATCH v2 2/2] Makefile: use compat regex on Solaris Jeff King @ 2009-06-16 20:07 ` Brandon Casey 2009-06-17 13:15 ` Mike Ralphson 1 sibling, 0 replies; 37+ messages in thread From: Brandon Casey @ 2009-06-16 20:07 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, John Bito, Junio C Hamano, git Jeff King wrote: > The system regex is either slow or buggy for complex > patterns, like the built-in xfuncname pattern for java > files. > > Signed-off-by: Jeff King <peff@peff.net> > --- > Rebased to today's master to resolve textual conflicts. > > Makefile | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/Makefile b/Makefile > index 0cb21da..3bd0c08 100644 > --- a/Makefile > +++ b/Makefile > @@ -725,6 +725,7 @@ ifeq ($(uname_S),SunOS) > NO_MEMMEM = YesPlease > NO_MKDTEMP = YesPlease > NO_MKSTEMPS = YesPlease > + NO_REGEX = YesPlease > ifeq ($(uname_R),5.7) > NEEDS_RESOLV = YesPlease > NO_IPV6 = YesPlease You need to add -DHAVE_ALLOCA_H to the BASIC_CFLAGS statement in the SunOS section of the Makefile so that alloca.h will be included in compat/regex/regex.c. This is necessary for the SUNWspro compiler. It takes me a long time to compile, so I haven't checked yet whether this causes any problems for the GNU compiler. -brandon ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 2/2] Makefile: use compat regex on Solaris 2009-06-16 19:08 ` [PATCH v2 2/2] Makefile: use compat regex on Solaris Jeff King 2009-06-16 20:07 ` Brandon Casey @ 2009-06-17 13:15 ` Mike Ralphson 2009-06-17 13:55 ` Mike Ralphson 1 sibling, 1 reply; 37+ messages in thread From: Mike Ralphson @ 2009-06-17 13:15 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: Johannes Sixt, John Bito, git 2009/6/16 Jeff King <peff@peff.net> > > The system regex is either slow or buggy for complex > patterns, like the built-in xfuncname pattern for java > files. Also required on AIX (5.3). I thought this was a major performance regression somewhere else, but my 'old stable' version dates from a time when we were briefly using compat/regex on this platform before we reverted that change and switched over to extended regexes. With the above: 3m33.399s Without: ... manually aborted after 58m ! git 1.6.0.2.229.g1293c 3m21.645s Could you squash + NO_REGEX = YesPlease in ifeq ($(uname_S),AIX) please? Shout if follow-up patch preferred. Signed-off-by: Mike Ralphson <mike@abacus.co.uk> Mike PS Pound to a penny INTERNAL_QSORT would be a win on Solaris too... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 2/2] Makefile: use compat regex on Solaris 2009-06-17 13:15 ` Mike Ralphson @ 2009-06-17 13:55 ` Mike Ralphson 0 siblings, 0 replies; 37+ messages in thread From: Mike Ralphson @ 2009-06-17 13:55 UTC (permalink / raw) To: Jeff King, Junio C Hamano, Paolo Bonzini; +Cc: Johannes Sixt, John Bito, git 2009/6/17 Mike Ralphson <mike.ralphson@gmail.com>: > Also required on AIX (5.3). Scratch that, sorry - I hadn't seen the related thread (git diff looping?) Paolo's fix is the only one required for AIX. Sorry for the noise. Mike ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 2/2] Makefile: use compat regex on Solaris 2009-06-16 12:07 ` Jeff King 2009-06-16 12:11 ` [PATCH 1/2] Makefile: refactor regex compat support Jeff King @ 2009-06-16 12:14 ` Jeff King 1 sibling, 0 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 12:14 UTC (permalink / raw) To: John Bito; +Cc: Junio C Hamano, Brandon Casey, git The system regex is either slow or buggy for complex patterns, like the built-in xfuncname pattern for java files. Signed-off-by: Jeff King <peff@peff.net> --- If people want to see the speed difference, try: $ git clone git://repo.or.cz/egit.git $ cd egit $ time git diff v0.4.0 >/dev/null before and after. Makefile | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/Makefile b/Makefile index a1e4e45..0e09ec8 100644 --- a/Makefile +++ b/Makefile @@ -713,6 +713,7 @@ ifeq ($(uname_S),SunOS) NO_HSTRERROR = YesPlease NO_MKDTEMP = YesPlease NO_MKSTEMPS = YesPlease + NO_REGEX = YesPlease ifneq ($(uname_R),5.11) OLD_ICONV = UnfortunatelyYes endif -- 1.6.3.2.225.gb8364.dirty ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 11:47 ` Jeff King 2009-06-16 12:07 ` Jeff King @ 2009-06-16 15:48 ` John Bito 2009-06-16 16:51 ` Junio C Hamano 2 siblings, 0 replies; 37+ messages in thread From: John Bito @ 2009-06-16 15:48 UTC (permalink / raw) To: Jeff King; +Cc: git Thank you, Jeff! Configuring the dummy regex allows the diff process to complete on Solaris 10, as well. ~John On Tue, Jun 16, 2009 at 4:47 AM, Jeff King<peff@peff.net> wrote: > On Mon, Jun 15, 2009 at 06:37:21PM -0700, John Bito wrote: > >> Running Git 1.6.1 on Solaris 10, git diff seems to go into a loop - >> consuming CPU and producing no output after a little bit. While the >> repository isn't small, it's not huge (it's >> http://repo.or.cz/w/egit.git). I've tried the following: > > I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to > be caused by a horribly slow system regex implementation; it really > chokes on the regex we use to find the "funcname" line for java files. I > tried running "git diff v0.4.0" and it still hadn't finished after 90 > seconds. Then I did: > > git config diff.java.xfuncname foo ;# some garbage regex > git diff v0.4.0 > > and it completed in about 2.5 seconds. > > Can you try that and see if it works around the problem for you? > > If anybody wants to look further into the problem, I think it is > specifically triggered by this file (and the built-in xfuncname for java > files): > > $ git clone git://repo.or.cz/egit.git > $ git diff v0.4.0 -- \ > org.spearce.egit.core.test/src/org/spearce/egit/core/op/T0001_ConnectProviderOperationTest.java > > which isn't even all that big a file, but it is either causing some > horrible algorithmic behavior in the regex library, or is outright > sending it into an infinite loop. > > I tried building against the code in compat/regex; it completes in a > reasonable amount of time, though it is still noticeably slow. With > system regex, the diff given above doesn't complete in less than 90 > seconds (at which I get bored and kill it). With compat/regex, it > completes in about 2.2 seconds. Disabling the xfuncname, it completes in > 0.14 seconds. > > So I think it is a viable solution to recommend building against > compat/regex on Solaris, but I think there is still room for improvement > in what we ship in compat/. > > -Peff > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 11:47 ` Jeff King 2009-06-16 12:07 ` Jeff King 2009-06-16 15:48 ` git diff looping? John Bito @ 2009-06-16 16:51 ` Junio C Hamano 2009-06-16 17:15 ` Jeff King 2009-06-16 17:16 ` git diff looping? John Bito 2 siblings, 2 replies; 37+ messages in thread From: Junio C Hamano @ 2009-06-16 16:51 UTC (permalink / raw) To: Jeff King; +Cc: John Bito, git Jeff King <peff@peff.net> writes: > I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to > be caused by a horribly slow system regex implementation; it really > chokes on the regex we use to find the "funcname" line for java files. Hmm. Is running under LC_ALL=C LANG=C _with_ the slow system regex help? > I tried building against the code in compat/regex; it completes in a > reasonable amount of time, though it is still noticeably slow. With > system regex, the diff given above doesn't complete in less than 90 > seconds (at which I get bored and kill it). With compat/regex, it > completes in about 2.2 seconds. Disabling the xfuncname, it completes in > 0.14 seconds. In this particular case it is clear that a good way to fix the problem is to replace Solaris's dumb regex implemention with what comes in compat/, but I at the same time have to wonder if that funcname pattern for java can somehow be simplified, so that it does not to require so sophisticated implementation of regexp? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 16:51 ` Junio C Hamano @ 2009-06-16 17:15 ` Jeff King 2009-06-16 17:35 ` Brandon Casey 2009-06-17 8:46 ` Paolo Bonzini 2009-06-16 17:16 ` git diff looping? John Bito 1 sibling, 2 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 17:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: John Bito, git On Tue, Jun 16, 2009 at 09:51:24AM -0700, Junio C Hamano wrote: > > I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to > > be caused by a horribly slow system regex implementation; it really > > chokes on the regex we use to find the "funcname" line for java files. > > Hmm. Is running under LC_ALL=C LANG=C _with_ the slow system regex help? No, it remains extremely slow (it is possible that it _is_ faster, though, but I never managed to run either case to completion; they are both clearly orders of magnitude off of acceptable). > In this particular case it is clear that a good way to fix the problem is > to replace Solaris's dumb regex implemention with what comes in compat/, > but I at the same time have to wonder if that funcname pattern for java > can somehow be simplified, so that it does not to require so sophisticated > implementation of regexp? That may be a possibility. The default pattern is actually two regexes (one is a "do not match this" and the other is "match this"). The problematic one seems to be (and that is a space and a tab between the brackets): ^[ ]*(([ ]*[A-Za-z_][A-Za-z_0-9]*){2,}[ ]*\([^;]*)$ which I determined by setting diff.java.xfuncname just to that (and it remains slow). Whereas setting it to: ^[ ]*(catch|do|for|if|instanceof|new|return|switch|throw|while) completes in about 5 seconds of CPU time (in the actual pattern it is negated, but that shouldn't matter as we do the negation ourselves). Now that being said, 5 seconds is still embarrassingly bad. Watch this (with the solaris system regex): $ git config diff.java.xfuncname '^[ ]*(catch|do|for|if|instanceof|new|return|switch|throw|while)' $ time git diff v0.4.0 >/dev/null real 0m5.869s user 0m4.720s sys 0m0.200s $ git config diff.java.xfuncname foo $ time git diff v0.4.0 >/dev/null real 0m1.895s user 0m0.980s sys 0m0.210s So besides learning that this machine is horribly slow, we can see that running that relatively simple regex takes almost 4 seconds, compared to a little over 1 second to do the entire rest of the diff. I am inclined to say that regex performance like that is so bad that we shouldn't care about optimizing for it, and just use something else. Bear in mind that the same engine will be used for "grep", too. So you aren't really doing "git grep" users any favors by linking against such an awful library. Really, that performance is so bad that I'm beginning to wonder if I am somehow measuring something wrong. How could they ship something so crappy through so many versions? -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 17:15 ` Jeff King @ 2009-06-16 17:35 ` Brandon Casey 2009-06-16 17:39 ` John Bito 2009-06-16 20:22 ` Brandon Casey 2009-06-17 8:46 ` Paolo Bonzini 1 sibling, 2 replies; 37+ messages in thread From: Brandon Casey @ 2009-06-16 17:35 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, John Bito, git Jeff King wrote: > On Tue, Jun 16, 2009 at 09:51:24AM -0700, Junio C Hamano wrote: > >>> I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to >>> be caused by a horribly slow system regex implementation; it really >>> chokes on the regex we use to find the "funcname" line for java files. >> Hmm. Is running under LC_ALL=C LANG=C _with_ the slow system regex help? > > No, it remains extremely slow (it is possible that it _is_ faster, > though, but I never managed to run either case to completion; they are > both clearly orders of magnitude off of acceptable). I haven't tried setting LC_ALL, LANG, but this Solaris regex is MANY orders of magnitude slower. I've been running your example diff on the egit repository for 2 hours and it still hasn't finished. The compat/regex version finished in 3 seconds. Solaris 10 x86. -brandon ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 17:35 ` Brandon Casey @ 2009-06-16 17:39 ` John Bito 2009-06-16 17:41 ` Jeff King 2009-06-16 20:22 ` Brandon Casey 1 sibling, 1 reply; 37+ messages in thread From: John Bito @ 2009-06-16 17:39 UTC (permalink / raw) To: Brandon Casey; +Cc: Jeff King, Junio C Hamano, git I believe the issue is that Solaris implements 'extended' regular expressions only in regcomp/regexec. The implementation of regcmp/regex seems to be from SysV and supports only 'basic' regular expressions. On Tue, Jun 16, 2009 at 10:35 AM, Brandon Casey<casey@nrlssc.navy.mil> wrote: > Jeff King wrote: >> On Tue, Jun 16, 2009 at 09:51:24AM -0700, Junio C Hamano wrote: >> >>>> I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to >>>> be caused by a horribly slow system regex implementation; it really >>>> chokes on the regex we use to find the "funcname" line for java files. >>> Hmm. Is running under LC_ALL=C LANG=C _with_ the slow system regex help? >> >> No, it remains extremely slow (it is possible that it _is_ faster, >> though, but I never managed to run either case to completion; they are >> both clearly orders of magnitude off of acceptable). > > I haven't tried setting LC_ALL, LANG, but this Solaris regex is MANY orders > of magnitude slower. I've been running your example diff on the egit > repository for 2 hours and it still hasn't finished. The compat/regex > version finished in 3 seconds. Solaris 10 x86. > > -brandon > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 17:39 ` John Bito @ 2009-06-16 17:41 ` Jeff King 0 siblings, 0 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 17:41 UTC (permalink / raw) To: John Bito; +Cc: Brandon Casey, Junio C Hamano, git On Tue, Jun 16, 2009 at 10:39:39AM -0700, John Bito wrote: > I believe the issue is that Solaris implements 'extended' regular > expressions only in regcomp/regexec. The implementation of > regcmp/regex seems to be from SysV and supports only 'basic' regular > expressions. The regexps in question end up being compiled by regcomp (see xdiff-interface.c:xdiff_set_find_func), so I don't think that is the issue. -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 17:35 ` Brandon Casey 2009-06-16 17:39 ` John Bito @ 2009-06-16 20:22 ` Brandon Casey 1 sibling, 0 replies; 37+ messages in thread From: Brandon Casey @ 2009-06-16 20:22 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, John Bito, git Brandon Casey wrote: > Jeff King wrote: >> On Tue, Jun 16, 2009 at 09:51:24AM -0700, Junio C Hamano wrote: >> >>>> I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to >>>> be caused by a horribly slow system regex implementation; it really >>>> chokes on the regex we use to find the "funcname" line for java files. >>> Hmm. Is running under LC_ALL=C LANG=C _with_ the slow system regex help? >> No, it remains extremely slow (it is possible that it _is_ faster, >> though, but I never managed to run either case to completion; they are >> both clearly orders of magnitude off of acceptable). > > I haven't tried setting LC_ALL, LANG, but this Solaris regex is MANY orders > of magnitude slower. I've been running your example diff on the egit > repository for 2 hours and it still hasn't finished. The compat/regex > version finished in 3 seconds. Solaris 10 x86. Ok, I don't think this call is going to finish. 'git diff v0.4.0' on Solaris 10 x86 using the native regex library. It has been running now for over 4.5 hours. If you're interested in a data point from another non-gnu regex library, I ran the same test on a mips IRIX6.5. It took 19.5 secs, and this is not a young machine. It takes 4 secs when diff.java.xfuncname is set to 'foo'. -brandon ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 17:15 ` Jeff King 2009-06-16 17:35 ` Brandon Casey @ 2009-06-17 8:46 ` Paolo Bonzini 2009-06-17 10:23 ` Jeff King 1 sibling, 1 reply; 37+ messages in thread From: Paolo Bonzini @ 2009-06-17 8:46 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, John Bito, git > Really, that performance is so bad that I'm beginning to wonder if I am > somehow measuring something wrong. How could they ship something so > crappy through so many versions? Because without some care in the matcher, the regex can be exponential. This happens because you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_] and ironically it also causes the regex not to work as intended; for example "catch(" can match the complex part of the regex (e.g. the first repetition can be "c" and the second can be "atch". We can make it faster and more correct at the expense of additional complication. Starting from: ^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$ we have to: 1) move [ \t] at the end of the repeated subexpression so that it removes the need for the [ \t] after ^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]*){2,}\([^;]*)$ 2) make sure that at least one space/tab is eaten on all but the last occurrence of the repeated subexpression. To this end the LHS of {2,} is duplicated, once with [ \t]+ and once with [ \t]*. The repetition itself becomes a + since the last occurrence is now separately handled: ^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]* [ \t]*\([^;]*)$ Paolo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-17 8:46 ` Paolo Bonzini @ 2009-06-17 10:23 ` Jeff King 2009-06-17 11:02 ` Paolo Bonzini ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Jeff King @ 2009-06-17 10:23 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Junio C Hamano, John Bito, git On Wed, Jun 17, 2009 at 10:46:21AM +0200, Paolo Bonzini wrote: > 2) make sure that at least one space/tab is eaten on all but the last > occurrence of the repeated subexpression. To this end the LHS of {2,} is > duplicated, once with [ \t]+ and once with [ \t]*. The repetition itself > becomes a + since the last occurrence is now separately handled: > > ^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]* > [ \t]*\([^;]*)$ Thanks, I can confirm that this is _much_ faster. Here are some timings from my Solaris 8 box for the "git diff v0.4.0" case using the system and compat engines, and using three regexes: the original that git is using now, an updated one with your regex above[1] replacing the second line of the stock pattern, and a baseline regex of "." which should take virtually no time at all. system, orig: infinite system, paolo: 2.5s system, ".": 0.6s compat, orig: 288.0s compat, paolo: 1.5s compat, ".": 0.6s So it goes from infinite to 2.5s. Which still spends 3 times as long matching funcname regexes as it does actually calculating the diff. The compat library is a little better, but still chokes pretty badly on the original regex. Let's compare compat to the glibc implementation on my Debian box: system, orig: 0.22s system, paolo: 0.22s system, ".": 0.15s compat, orig: 150.88s compat, paolo: 0.43s compat, ".": 0.15s Besides the exponential behavior on the original regex, it is still about twice as slow as the system one. So I think there are three possible optimizations worth considering: 1. Replace the builtin diff.java.xfuncname pattern with what Paolo suggested (though I haven't verified its correctness beyond a cursory look at the results). This is easy to do, and will help people with crappy system regex libraries and people on compat/regex/ (right now just mingw) a _lot_. The downside is that it's a little harder to read the regex, but not terribly so. 2. Recommend NO_REGEX for people with slow system regex libraries. This is also easy to do, and will help people even if we do (1) for two reasons: a. we process user-defined regexes through diff.*.xfuncname patterns, as well as through "git grep"; so we are protecting against poor performance when they give us a complex regex b. even on more reasonable regexps like Paolo's, we seem to get a 2:1 speedup over the Solaris system library 3. Replace compat/regex with something faster. It still produces exponential behavior in complex cases where glibc does not, and it seems to be about 1/3 as fast on Paolo's regex. I haven't looked at how large or how portable the glibc implementation is. Another alternative is that we could provide a simple compat/ as now, and have better support for linking against an external library like pcre, if it is available. -Peff [1] Note if you are cutting and pasting Paolo's regex into the C code, the "\(" needs to be "\\(", which I screwed up in my initial timings. :) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-17 10:23 ` Jeff King @ 2009-06-17 11:02 ` Paolo Bonzini 2009-06-17 11:31 ` Andreas Ericsson 2009-06-17 14:26 ` [PATCH] avoid exponential regex match for java and objc function names Paolo Bonzini 2 siblings, 0 replies; 37+ messages in thread From: Paolo Bonzini @ 2009-06-17 11:02 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, John Bito, git > system, orig: 0.22s > system, paolo: 0.22s > system, ".": 0.15s > compat, orig: 150.88s > compat, paolo: 0.43s > compat, ".": 0.15s > > Besides the exponential behavior on the original regex, it is still > about twice as slow as the system one. The reason is that the glibc regex is a DFA-based matcher. It is much slower on regexes with backreferences, but otherwise it is faster. > 1. Replace the builtin diff.java.xfuncname pattern with what Paolo > suggested (though I haven't verified its correctness beyond a > cursory look at the results). I checked it a bit harder, but still it is not easy to check because of the false positives in the original regex. I'm pretty sure it's correct though; I find it even easier to read (though longer) than the original one. > I haven't looked at how large or how portable the glibc > implementation is. Decently portable, but I don't think it's worth it. Users that write regexes so complex should know of the exponential behavior, I think. Paolo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-17 10:23 ` Jeff King 2009-06-17 11:02 ` Paolo Bonzini @ 2009-06-17 11:31 ` Andreas Ericsson 2009-06-17 13:08 ` Paolo Bonzini 2009-06-17 14:26 ` [PATCH] avoid exponential regex match for java and objc function names Paolo Bonzini 2 siblings, 1 reply; 37+ messages in thread From: Andreas Ericsson @ 2009-06-17 11:31 UTC (permalink / raw) To: Jeff King; +Cc: Paolo Bonzini, Junio C Hamano, John Bito, git Jeff King wrote: > > 3. Replace compat/regex with something faster. It still produces > exponential behavior in complex cases where glibc does not, and it > seems to be about 1/3 as fast on Paolo's regex. > > I haven't looked at how large or how portable the glibc > implementation is. Another alternative is that we could provide a > simple compat/ as now, and have better support for linking against > an external library like pcre, if it is available. > The glibc implementation is quite large. Cutting the library-specific cruft it still sits at about 10k LOC. Using PCRE is a no-go, as it uses perl-compatible regexes even for the posix-compatible API, as per pcreposix(3): When PCRE is called via these functions, it is only the API that is POSIX-like in style. The syntax and semantics of the regular expres- sions themselves are still those of Perl, subject to the setting of various PCRE options, as described below. "POSIX-like in style" means that the API approximates to the POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding domains it is probably even less compatible. This would probably surprise some "git grep" users quite a lot, I think. I like your other two suggestions though. The stuff already in compat/ seems to work well enough, so with Paolo's improved pattern it should be fine. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-17 11:31 ` Andreas Ericsson @ 2009-06-17 13:08 ` Paolo Bonzini 2009-06-17 13:16 ` Andreas Ericsson 0 siblings, 1 reply; 37+ messages in thread From: Paolo Bonzini @ 2009-06-17 13:08 UTC (permalink / raw) To: git; +Cc: Jeff King, Junio C Hamano, John Bito, git > The glibc implementation is quite large. Cutting the library-specific > cruft it still sits at about 10k LOC. > > Using PCRE is a no-go, as it uses perl-compatible regexes even for the > posix-compatible API, as per pcreposix(3): I have a PCRE fork that has POSIX semantics (except the braindead leftmost-longest *sub*expressions). It weighs 8kLOC, you can find it in branch ssed of GNU sed's git repository. Paolo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-17 13:08 ` Paolo Bonzini @ 2009-06-17 13:16 ` Andreas Ericsson 2009-06-17 13:58 ` Paolo Bonzini 0 siblings, 1 reply; 37+ messages in thread From: Andreas Ericsson @ 2009-06-17 13:16 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Jeff King, Junio C Hamano, John Bito, git Paolo Bonzini wrote: > >> The glibc implementation is quite large. Cutting the library-specific >> cruft it still sits at about 10k LOC. >> >> Using PCRE is a no-go, as it uses perl-compatible regexes even for the >> posix-compatible API, as per pcreposix(3): > > I have a PCRE fork that has POSIX semantics (except the braindead > leftmost-longest *sub*expressions). It weighs 8kLOC, you can find it in > branch ssed of GNU sed's git repository. > Sounds neat. Do you by any chance have some performance measurements for it? If the work's already done and it provides a significant improvement I'm all for it ;-) -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-17 13:16 ` Andreas Ericsson @ 2009-06-17 13:58 ` Paolo Bonzini 0 siblings, 0 replies; 37+ messages in thread From: Paolo Bonzini @ 2009-06-17 13:58 UTC (permalink / raw) To: Andreas Ericsson; +Cc: Jeff King, Junio C Hamano, John Bito, git > Sounds neat. Do you by any chance have some performance measurements > for it? If the work's already done and it provides a significant > improvement I'm all for it ;-) It's very very fast, but only as fast as a backtracking matcher can be. I think it would trounce glibc on my regex but probably not on the buggy one. Paolo ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH] avoid exponential regex match for java and objc function names 2009-06-17 10:23 ` Jeff King 2009-06-17 11:02 ` Paolo Bonzini 2009-06-17 11:31 ` Andreas Ericsson @ 2009-06-17 14:26 ` Paolo Bonzini 2009-06-17 15:46 ` demerphq 2009-06-17 16:42 ` Junio C Hamano 2 siblings, 2 replies; 37+ messages in thread From: Paolo Bonzini @ 2009-06-17 14:26 UTC (permalink / raw) To: git; +Cc: peff, gitster In the old regex ^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_], thus causing an exponential number of backtracks. Ironically it also causes the regex not to work as intended; for example "catch" can match the underlined part of the regex, the first repetition matching "c" and the second matching "atch". The replacement regex avoids this problem, because it makes sure that at least a space/tab is eaten on each repetition. In other words, a suffix of a repetition can never be a prefix of the next repetition. Signed-off-by: Paolo Bonzini <bonzini@gnu.org> --- userdiff.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/userdiff.c b/userdiff.c index d556da9..57529ae 100644 --- a/userdiff.c +++ b/userdiff.c @@ -13,7 +13,8 @@ PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$", "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"), PATTERNS("java", "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n" - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$", + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$", + /* -- */ "[a-zA-Z_][a-zA-Z0-9_]*" "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" "|[-+*/<>%&^|=!]=" @@ -25,7 +26,7 @@ PATTERNS("objc", /* Objective-C methods */ "^[ \t]*([-+][ \t]*\\([ \t]*[A-Za-z_][A-Za-z_0-9* \t]*\\)[ \t]*[A-Za-z_].*)$\n" /* C functions */ - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$\n" + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$\n" /* Objective-C class/protocol definitions */ "^(@(implementation|interface|protocol)[ \t].*)$", /* -- */ -- 1.6.0.3 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH] avoid exponential regex match for java and objc function names 2009-06-17 14:26 ` [PATCH] avoid exponential regex match for java and objc function names Paolo Bonzini @ 2009-06-17 15:46 ` demerphq 2009-06-17 15:56 ` Jeff King 2009-06-17 16:42 ` Junio C Hamano 1 sibling, 1 reply; 37+ messages in thread From: demerphq @ 2009-06-17 15:46 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, peff, gitster Just a note, but If the Java regex library you are using supports the PCRE compatible (?>...) atomic matching construct (or their equivalent *+ and ++) then these patterns can be significantly improved beyond their current state. 2009/6/17 Paolo Bonzini <bonzini@gnu.org>: > In the old regex > > ^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$ > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_], thus > causing an exponential number of backtracks. Ironically it also causes > the regex not to work as intended; for example "catch" can match the > underlined part of the regex, the first repetition matching "c" and > the second matching "atch". > > The replacement regex avoids this problem, because it makes sure that > at least a space/tab is eaten on each repetition. In other words, > a suffix of a repetition can never be a prefix of the next repetition. > > Signed-off-by: Paolo Bonzini <bonzini@gnu.org> > --- > userdiff.c | 5 +++-- > 1 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/userdiff.c b/userdiff.c > index d556da9..57529ae 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -13,7 +13,8 @@ PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$", > "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"), > PATTERNS("java", > "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n" > - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$", > + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$", > + /* -- */ > "[a-zA-Z_][a-zA-Z0-9_]*" > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" > "|[-+*/<>%&^|=!]=" > @@ -25,7 +26,7 @@ PATTERNS("objc", > /* Objective-C methods */ > "^[ \t]*([-+][ \t]*\\([ \t]*[A-Za-z_][A-Za-z_0-9* \t]*\\)[ \t]*[A-Za-z_].*)$\n" > /* C functions */ > - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$\n" > + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$\n" > /* Objective-C class/protocol definitions */ > "^(@(implementation|interface|protocol)[ \t].*)$", > /* -- */ > -- > 1.6.0.3 > > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- perl -Mre=debug -e "/just|another|perl|hacker/" ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] avoid exponential regex match for java and objc function names 2009-06-17 15:46 ` demerphq @ 2009-06-17 15:56 ` Jeff King 2009-06-17 16:00 ` demerphq 0 siblings, 1 reply; 37+ messages in thread From: Jeff King @ 2009-06-17 15:56 UTC (permalink / raw) To: demerphq; +Cc: Paolo Bonzini, git, gitster On Wed, Jun 17, 2009 at 05:46:54PM +0200, demerphq wrote: > Just a note, but If the Java regex library you are using supports > the PCRE compatible (?>...) atomic matching construct (or their > equivalent *+ and ++) then these patterns can be significantly > improved beyond their current state. To clarify, this isn't a java regex library, but rather regexps used to match function names inside java language files when generating diffs. The regex library itself is the POSIX regex routines provided by libc. PCRE syntax is nice, but we don't want to require it for every build, and it's important to have the same syntax everywhere (so that, e.g., your config from one build works on a different build). -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] avoid exponential regex match for java and objc function names 2009-06-17 15:56 ` Jeff King @ 2009-06-17 16:00 ` demerphq 2009-06-17 16:04 ` Paolo Bonzini 0 siblings, 1 reply; 37+ messages in thread From: demerphq @ 2009-06-17 16:00 UTC (permalink / raw) To: Jeff King; +Cc: Paolo Bonzini, git, gitster 2009/6/17 Jeff King <peff@peff.net>: > On Wed, Jun 17, 2009 at 05:46:54PM +0200, demerphq wrote: > >> Just a note, but If the Java regex library you are using supports >> the PCRE compatible (?>...) atomic matching construct (or their >> equivalent *+ and ++) then these patterns can be significantly >> improved beyond their current state. > > To clarify, this isn't a java regex library, but rather regexps used to > match function names inside java language files when generating diffs. > The regex library itself is the POSIX regex routines provided by libc. > > PCRE syntax is nice, but we don't want to require it for every build, > and it's important to have the same syntax everywhere (so that, e.g., > your config from one build works on a different build). Ah ok. Im not familiar with the finer points of the POSIX engine, but PCRE and Perl's engine, and most similar engines are not true regular expression engines and thus benefit *greatly* from atomic matching if it is available. Like the difference between heat-death performance (or stack overflow), and running instantly. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] avoid exponential regex match for java and objc function names 2009-06-17 16:00 ` demerphq @ 2009-06-17 16:04 ` Paolo Bonzini 0 siblings, 0 replies; 37+ messages in thread From: Paolo Bonzini @ 2009-06-17 16:04 UTC (permalink / raw) To: demerphq; +Cc: Jeff King, Paolo Bonzini, git, gitster > Ah ok. Im not familiar with the finer points of the POSIX engine, but > PCRE and Perl's engine, and most similar engines are not true regular > expression engines and thus benefit *greatly* from atomic matching if > it is available. > > Like the difference between heat-death performance (or stack > overflow), and running instantly. You can almost always fix the regex to avoid this, by ensuring that whenever you have (...)+ (or *) a suffix of the subexpression cannot be a prefix of the subexpression too. This is what my patch did -- changing a bad regex to a nicely behaving one. Paolo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] avoid exponential regex match for java and objc function names 2009-06-17 14:26 ` [PATCH] avoid exponential regex match for java and objc function names Paolo Bonzini 2009-06-17 15:46 ` demerphq @ 2009-06-17 16:42 ` Junio C Hamano 2009-06-18 6:45 ` Paolo Bonzini 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2009-06-17 16:42 UTC (permalink / raw) To: Paolo Bonzini; +Cc: git, peff Paolo Bonzini <bonzini@gnu.org> writes: > In the old regex > > ^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\([^;]*)$ > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > you can backtrack arbitrarily from [A-Za-z_0-9]* into [A-Za-z_], thus > causing an exponential number of backtracks. Ironically it also causes > the regex not to work as intended; for example "catch" can match the > underlined part of the regex, the first repetition matching "c" and > the second matching "atch". > > The replacement regex avoids this problem, because it makes sure that > at least a space/tab is eaten on each repetition. In other words, > a suffix of a repetition can never be a prefix of the next repetition. Thanks; nicely done. Should I remove the "/* -- */" or is it for better readability I should keep? > Signed-off-by: Paolo Bonzini <bonzini@gnu.org> > --- > userdiff.c | 5 +++-- > 1 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/userdiff.c b/userdiff.c > index d556da9..57529ae 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -13,7 +13,8 @@ PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$", > "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"), > PATTERNS("java", > "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n" > - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$", > + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$", > + /* -- */ > "[a-zA-Z_][a-zA-Z0-9_]*" > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" > "|[-+*/<>%&^|=!]=" > @@ -25,7 +26,7 @@ PATTERNS("objc", > /* Objective-C methods */ > "^[ \t]*([-+][ \t]*\\([ \t]*[A-Za-z_][A-Za-z_0-9* \t]*\\)[ \t]*[A-Za-z_].*)$\n" > /* C functions */ > - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$\n" > + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$\n" > /* Objective-C class/protocol definitions */ > "^(@(implementation|interface|protocol)[ \t].*)$", > /* -- */ > -- > 1.6.0.3 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] avoid exponential regex match for java and objc function names 2009-06-17 16:42 ` Junio C Hamano @ 2009-06-18 6:45 ` Paolo Bonzini 0 siblings, 0 replies; 37+ messages in thread From: Paolo Bonzini @ 2009-06-18 6:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: Paolo Bonzini, git, peff > Should I remove the "/* -- */" or is it for better readability I should > keep? It helps detecting the separation between the function regex and the word regex: >> - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$", >> + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$", >> + /* -- */ I stole the idea from the Objective-C part: >> /* C functions */ >> - "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$\n" >> + "^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$\n" >> /* Objective-C class/protocol definitions */ >> "^(@(implementation|interface|protocol)[ \t].*)$", >> /* -- */ Paolo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 16:51 ` Junio C Hamano 2009-06-16 17:15 ` Jeff King @ 2009-06-16 17:16 ` John Bito 2009-06-16 17:24 ` Jeff King 1 sibling, 1 reply; 37+ messages in thread From: John Bito @ 2009-06-16 17:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git The Solaris 10 server here isn't set up to build git. git/Makefile isn't compatible with /usr/ccs/bin/make. Is it desired to have a Makefile that's portable to the Sun tools? I was going to test Jeff's patch, but I probably won't install GNU make on this machine unless I find I more compelling need to build git on Solaris. If it would help folks out, I'd be willing to try to create a Makefile patch that works with the Sun tools, but I don't currently have a Linux machine that I can easily use to verify compatibility. On Tue, Jun 16, 2009 at 9:51 AM, Junio C Hamano<gitster@pobox.com> wrote: > Jeff King <peff@peff.net> writes: > >> I can reproduce the problem on Solaris 8 using git v1.6.3. It seems to >> be caused by a horribly slow system regex implementation; it really >> chokes on the regex we use to find the "funcname" line for java files. > > Hmm. Is running under LC_ALL=C LANG=C _with_ the slow system regex help? > >> I tried building against the code in compat/regex; it completes in a >> reasonable amount of time, though it is still noticeably slow. With >> system regex, the diff given above doesn't complete in less than 90 >> seconds (at which I get bored and kill it). With compat/regex, it >> completes in about 2.2 seconds. Disabling the xfuncname, it completes in >> 0.14 seconds. > > In this particular case it is clear that a good way to fix the problem is > to replace Solaris's dumb regex implemention with what comes in compat/, > but I at the same time have to wonder if that funcname pattern for java > can somehow be simplified, so that it does not to require so sophisticated > implementation of regexp? > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: git diff looping? 2009-06-16 17:16 ` git diff looping? John Bito @ 2009-06-16 17:24 ` Jeff King 0 siblings, 0 replies; 37+ messages in thread From: Jeff King @ 2009-06-16 17:24 UTC (permalink / raw) To: John Bito; +Cc: Junio C Hamano, git On Tue, Jun 16, 2009 at 10:16:39AM -0700, John Bito wrote: > The Solaris 10 server here isn't set up to build git. git/Makefile > isn't compatible with /usr/ccs/bin/make. Is it desired to have a > Makefile that's portable to the Sun tools? No, the Makefile is hopelessly GNU, and that is intentional: the subset of make that is portable means there are a lot of things you just can't do. I think it was decided long ago that it wasn't worth trying to support non-gmake. -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2009-06-18 6:46 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-06-16 1:37 git diff looping? John Bito 2009-06-16 2:44 ` Jeff Epler 2009-06-16 2:53 ` John Bito 2009-06-16 11:47 ` Jeff King 2009-06-16 12:07 ` Jeff King 2009-06-16 12:11 ` [PATCH 1/2] Makefile: refactor regex compat support Jeff King 2009-06-16 18:47 ` Johannes Sixt 2009-06-16 19:05 ` Jeff King 2009-06-16 19:07 ` [PATCH v2 " Jeff King 2009-06-16 19:08 ` [PATCH v2 2/2] Makefile: use compat regex on Solaris Jeff King 2009-06-16 20:07 ` Brandon Casey 2009-06-17 13:15 ` Mike Ralphson 2009-06-17 13:55 ` Mike Ralphson 2009-06-16 12:14 ` [PATCH " Jeff King 2009-06-16 15:48 ` git diff looping? John Bito 2009-06-16 16:51 ` Junio C Hamano 2009-06-16 17:15 ` Jeff King 2009-06-16 17:35 ` Brandon Casey 2009-06-16 17:39 ` John Bito 2009-06-16 17:41 ` Jeff King 2009-06-16 20:22 ` Brandon Casey 2009-06-17 8:46 ` Paolo Bonzini 2009-06-17 10:23 ` Jeff King 2009-06-17 11:02 ` Paolo Bonzini 2009-06-17 11:31 ` Andreas Ericsson 2009-06-17 13:08 ` Paolo Bonzini 2009-06-17 13:16 ` Andreas Ericsson 2009-06-17 13:58 ` Paolo Bonzini 2009-06-17 14:26 ` [PATCH] avoid exponential regex match for java and objc function names Paolo Bonzini 2009-06-17 15:46 ` demerphq 2009-06-17 15:56 ` Jeff King 2009-06-17 16:00 ` demerphq 2009-06-17 16:04 ` Paolo Bonzini 2009-06-17 16:42 ` Junio C Hamano 2009-06-18 6:45 ` Paolo Bonzini 2009-06-16 17:16 ` git diff looping? John Bito 2009-06-16 17:24 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).