* git grep performance regression @ 2013-01-14 22:38 Ross Lagerwall 2013-01-15 2:46 ` Duy Nguyen 2013-01-15 2:48 ` Junio C Hamano 0 siblings, 2 replies; 5+ messages in thread From: Ross Lagerwall @ 2013-01-14 22:38 UTC (permalink / raw) To: git; +Cc: Jean-Noël AVILA Hi, I have noticed a performance regression in git grep between v1.8.1 and v1.8.1.1: On the kernel tree: For git 1.8.1: $ time git grep foodsgsg real 0m0.158s user 0m0.290s sys 0m0.207s For git 1.8.1.1: $ time /tmp/g/bin/git grep foodsgsg real 0m0.501s user 0m0.707s sys 0m0.493s A bisect seems to indicate that it was introduced by 94bc67: commit 94bc671a1f2e8610de475c2494d2763355a99f65 Author: Jean-Noël AVILA <avila.jn@gmail.com> Date: Sat Dec 8 21:04:39 2012 +0100 Add directory pattern matching to attributes The manpage of gitattributes says: "The rules how the pattern matches paths are the same as in .gitignore files" and the gitignore pattern matching has a pattern ending with / for directory matching. This rule is specifically relevant for the 'export-ignore' rule used for git archive. Signed-off-by: Jean-Noel Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com> Regards -- Ross Lagerwall ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: git grep performance regression 2013-01-14 22:38 git grep performance regression Ross Lagerwall @ 2013-01-15 2:46 ` Duy Nguyen 2013-01-15 4:38 ` Duy Nguyen 2013-01-15 2:48 ` Junio C Hamano 1 sibling, 1 reply; 5+ messages in thread From: Duy Nguyen @ 2013-01-15 2:46 UTC (permalink / raw) To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA On Tue, Jan 15, 2013 at 5:38 AM, Ross Lagerwall <rosslagerwall@gmail.com> wrote: > Hi, > > I have noticed a performance regression in git grep between v1.8.1 and > v1.8.1.1: > > On the kernel tree: > For git 1.8.1: > $ time git grep foodsgsg > > real 0m0.158s > user 0m0.290s > sys 0m0.207s > > For git 1.8.1.1: > $ time /tmp/g/bin/git grep foodsgsg > > real 0m0.501s > user 0m0.707s > sys 0m0.493s Interesting. I see the regression on linux-2.6 too (0.6s real vs 0.9s). > A bisect seems to indicate that it was introduced by 94bc67: > commit 94bc671a1f2e8610de475c2494d2763355a99f65 > Author: Jean-Noël AVILA <avila.jn@gmail.com> > Date: Sat Dec 8 21:04:39 2012 +0100 > > Add directory pattern matching to attributes > > The manpage of gitattributes says: "The rules how the pattern > matches paths are the same as in .gitignore files" and the gitignore > pattern matching has a pattern ending with / for directory matching. > > This rule is specifically relevant for the 'export-ignore' rule used > for git archive. Not the real contributor to the regression, but it should be noted that glibc's strrchr does not employ a typical loop. Instead it advances with strchr with a note that strchr is much faster. It looks like strchr compares a word at a time instead of a byte. We might want to do the same. I don't have time to look into details now, but by enabling DEBUG_ATTR, it looks like this commit makes it push and pop patterns a lot more than without the commit. -- Duy ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: git grep performance regression 2013-01-15 2:46 ` Duy Nguyen @ 2013-01-15 4:38 ` Duy Nguyen 2013-01-15 7:36 ` Ross Lagerwall 0 siblings, 1 reply; 5+ messages in thread From: Duy Nguyen @ 2013-01-15 4:38 UTC (permalink / raw) To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA, Junio C Hamano On Tue, Jan 15, 2013 at 9:46 AM, Duy Nguyen <pclouds@gmail.com> wrote: > I don't have time to look into details now, but by enabling > DEBUG_ATTR, it looks like this commit makes it push and pop patterns a > lot more than without the commit. I think the culprit is at this chunk: static void prepare_attr_stack(const char *path) { struct attr_stack *elem, *info; int dirlen, len; const char *cp; - cp = strrchr(path, '/'); - if (!cp) - dirlen = 0; - else - dirlen = cp - path; + dirlen = find_basename(path) - path; dirlen is not expected to include the trailing slash, but find_basename() does that. It messes up with the path filters for push/pop in the next code. This brings grep performance closely back to before for me. Ross, can you check (patch could be whitespace damaged by gmail)? -- 8< -- diff --git a/attr.c b/attr.c index b05110d..1e96e26 100644 --- a/attr.c +++ b/attr.c @@ -583,6 +583,9 @@ static void prepare_attr_stack(const char *path) dirlen = find_basename(path) - path; + if (dirlen) + dirlen--; + /* * At the bottom of the attribute stack is the built-in * set of attribute definitions, followed by the contents -- 8< -- ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: git grep performance regression 2013-01-15 4:38 ` Duy Nguyen @ 2013-01-15 7:36 ` Ross Lagerwall 0 siblings, 0 replies; 5+ messages in thread From: Ross Lagerwall @ 2013-01-15 7:36 UTC (permalink / raw) To: Duy Nguyen; +Cc: git, Jean-Noël AVILA, Junio C Hamano On Tue, Jan 15, 2013 at 11:38:32AM +0700, Duy Nguyen wrote: > dirlen is not expected to include the trailing slash, but > find_basename() does that. It messes up with the path filters for > push/pop in the next code. This brings grep performance closely back > to before for me. Ross, can you check (patch could be whitespace > damaged by gmail)? > Yes, that did seem to bring back the performance to the old level. I noticed that before the patch, the strace output was something like this: open("tools/vm/.gitattributes", O_RDONLY) = -1 ENOENT open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT open("usr/.gitattributes", O_RDONLY) = -1 ENOENT open("usr//.gitattributes", O_RDONLY) = -1 ENOENT open("usr//.gitattributes", O_RDONLY) = -1 ENOENT open("usr//.gitattributes", O_RDONLY) = -1 ENOENT open("usr//.gitattributes", O_RDONLY) = -1 ENOENT open("usr//.gitattributes", O_RDONLY) = -1 ENOENT (and yes, the whitespace was damaged by Gmail!) Regards -- Ross Lagerwall ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: git grep performance regression 2013-01-14 22:38 git grep performance regression Ross Lagerwall 2013-01-15 2:46 ` Duy Nguyen @ 2013-01-15 2:48 ` Junio C Hamano 1 sibling, 0 replies; 5+ messages in thread From: Junio C Hamano @ 2013-01-15 2:48 UTC (permalink / raw) To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA Ross Lagerwall <rosslagerwall@gmail.com> writes: > I have noticed a performance regression in git grep between v1.8.1 and > v1.8.1.1: > > On the kernel tree: > For git 1.8.1: > $ time git grep foodsgsg > > real 0m0.158s > user 0m0.290s > sys 0m0.207s > > For git 1.8.1.1: > $ time /tmp/g/bin/git grep foodsgsg > > real 0m0.501s > user 0m0.707s > sys 0m0.493s > > > A bisect seems to indicate that it was introduced by 94bc67: > commit 94bc671a1f2e8610de475c2494d2763355a99f65 > Author: Jean-Noël AVILA <avila.jn@gmail.com> > Date: Sat Dec 8 21:04:39 2012 +0100 > > Add directory pattern matching to attributes > > The manpage of gitattributes says: "The rules how the pattern > matches paths are the same as in .gitignore files" and the gitignore > pattern matching has a pattern ending with / for directory matching. > > This rule is specifically relevant for the 'export-ignore' rule used > for git archive. > > Signed-off-by: Jean-Noel Avila <jn.avila@free.fr> > Signed-off-by: Junio C Hamano <gitster@pobox.com> Hmph, that looks really bad, especially given that in the normal codepath like "git grep", we would never care about directories (the attributes are normally applied to real paths with contents, and the use by archive is an anomaly) and the implementation should be done in a way not to impose such excess and useless overhead. We may end up reverting that patch for the time being X-<. Jean-Noël, ideas? ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-01-15 7:36 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-01-14 22:38 git grep performance regression Ross Lagerwall 2013-01-15 2:46 ` Duy Nguyen 2013-01-15 4:38 ` Duy Nguyen 2013-01-15 7:36 ` Ross Lagerwall 2013-01-15 2:48 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).