* git grep performance regression
@ 2013-01-14 22:38 Ross Lagerwall
2013-01-15 2:46 ` Duy Nguyen
2013-01-15 2:48 ` Junio C Hamano
0 siblings, 2 replies; 5+ messages in thread
From: Ross Lagerwall @ 2013-01-14 22:38 UTC (permalink / raw)
To: git; +Cc: Jean-Noël AVILA
Hi,
I have noticed a performance regression in git grep between v1.8.1 and
v1.8.1.1:
On the kernel tree:
For git 1.8.1:
$ time git grep foodsgsg
real 0m0.158s
user 0m0.290s
sys 0m0.207s
For git 1.8.1.1:
$ time /tmp/g/bin/git grep foodsgsg
real 0m0.501s
user 0m0.707s
sys 0m0.493s
A bisect seems to indicate that it was introduced by 94bc67:
commit 94bc671a1f2e8610de475c2494d2763355a99f65
Author: Jean-Noël AVILA <avila.jn@gmail.com>
Date: Sat Dec 8 21:04:39 2012 +0100
Add directory pattern matching to attributes
The manpage of gitattributes says: "The rules how the pattern
matches paths are the same as in .gitignore files" and the gitignore
pattern matching has a pattern ending with / for directory matching.
This rule is specifically relevant for the 'export-ignore' rule used
for git archive.
Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regards
--
Ross Lagerwall
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: git grep performance regression
2013-01-14 22:38 git grep performance regression Ross Lagerwall
@ 2013-01-15 2:46 ` Duy Nguyen
2013-01-15 4:38 ` Duy Nguyen
2013-01-15 2:48 ` Junio C Hamano
1 sibling, 1 reply; 5+ messages in thread
From: Duy Nguyen @ 2013-01-15 2:46 UTC (permalink / raw)
To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA
On Tue, Jan 15, 2013 at 5:38 AM, Ross Lagerwall <rosslagerwall@gmail.com> wrote:
> Hi,
>
> I have noticed a performance regression in git grep between v1.8.1 and
> v1.8.1.1:
>
> On the kernel tree:
> For git 1.8.1:
> $ time git grep foodsgsg
>
> real 0m0.158s
> user 0m0.290s
> sys 0m0.207s
>
> For git 1.8.1.1:
> $ time /tmp/g/bin/git grep foodsgsg
>
> real 0m0.501s
> user 0m0.707s
> sys 0m0.493s
Interesting. I see the regression on linux-2.6 too (0.6s real vs 0.9s).
> A bisect seems to indicate that it was introduced by 94bc67:
> commit 94bc671a1f2e8610de475c2494d2763355a99f65
> Author: Jean-Noël AVILA <avila.jn@gmail.com>
> Date: Sat Dec 8 21:04:39 2012 +0100
>
> Add directory pattern matching to attributes
>
> The manpage of gitattributes says: "The rules how the pattern
> matches paths are the same as in .gitignore files" and the gitignore
> pattern matching has a pattern ending with / for directory matching.
>
> This rule is specifically relevant for the 'export-ignore' rule used
> for git archive.
Not the real contributor to the regression, but it should be noted
that glibc's strrchr does not employ a typical loop. Instead it
advances with strchr with a note that strchr is much faster. It looks
like strchr compares a word at a time instead of a byte. We might want
to do the same.
I don't have time to look into details now, but by enabling
DEBUG_ATTR, it looks like this commit makes it push and pop patterns a
lot more than without the commit.
--
Duy
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: git grep performance regression
2013-01-14 22:38 git grep performance regression Ross Lagerwall
2013-01-15 2:46 ` Duy Nguyen
@ 2013-01-15 2:48 ` Junio C Hamano
1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2013-01-15 2:48 UTC (permalink / raw)
To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA
Ross Lagerwall <rosslagerwall@gmail.com> writes:
> I have noticed a performance regression in git grep between v1.8.1 and
> v1.8.1.1:
>
> On the kernel tree:
> For git 1.8.1:
> $ time git grep foodsgsg
>
> real 0m0.158s
> user 0m0.290s
> sys 0m0.207s
>
> For git 1.8.1.1:
> $ time /tmp/g/bin/git grep foodsgsg
>
> real 0m0.501s
> user 0m0.707s
> sys 0m0.493s
>
>
> A bisect seems to indicate that it was introduced by 94bc67:
> commit 94bc671a1f2e8610de475c2494d2763355a99f65
> Author: Jean-Noël AVILA <avila.jn@gmail.com>
> Date: Sat Dec 8 21:04:39 2012 +0100
>
> Add directory pattern matching to attributes
>
> The manpage of gitattributes says: "The rules how the pattern
> matches paths are the same as in .gitignore files" and the gitignore
> pattern matching has a pattern ending with / for directory matching.
>
> This rule is specifically relevant for the 'export-ignore' rule used
> for git archive.
>
> Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hmph, that looks really bad, especially given that in the normal
codepath like "git grep", we would never care about directories (the
attributes are normally applied to real paths with contents, and the
use by archive is an anomaly) and the implementation should be done
in a way not to impose such excess and useless overhead.
We may end up reverting that patch for the time being X-<.
Jean-Noël, ideas?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: git grep performance regression
2013-01-15 2:46 ` Duy Nguyen
@ 2013-01-15 4:38 ` Duy Nguyen
2013-01-15 7:36 ` Ross Lagerwall
0 siblings, 1 reply; 5+ messages in thread
From: Duy Nguyen @ 2013-01-15 4:38 UTC (permalink / raw)
To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA, Junio C Hamano
On Tue, Jan 15, 2013 at 9:46 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> I don't have time to look into details now, but by enabling
> DEBUG_ATTR, it looks like this commit makes it push and pop patterns a
> lot more than without the commit.
I think the culprit is at this chunk:
static void prepare_attr_stack(const char *path)
{
struct attr_stack *elem, *info;
int dirlen, len;
const char *cp;
- cp = strrchr(path, '/');
- if (!cp)
- dirlen = 0;
- else
- dirlen = cp - path;
+ dirlen = find_basename(path) - path;
dirlen is not expected to include the trailing slash, but
find_basename() does that. It messes up with the path filters for
push/pop in the next code. This brings grep performance closely back
to before for me. Ross, can you check (patch could be whitespace
damaged by gmail)?
-- 8< --
diff --git a/attr.c b/attr.c
index b05110d..1e96e26 100644
--- a/attr.c
+++ b/attr.c
@@ -583,6 +583,9 @@ static void prepare_attr_stack(const char *path)
dirlen = find_basename(path) - path;
+ if (dirlen)
+ dirlen--;
+
/*
* At the bottom of the attribute stack is the built-in
* set of attribute definitions, followed by the contents
-- 8< --
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: git grep performance regression
2013-01-15 4:38 ` Duy Nguyen
@ 2013-01-15 7:36 ` Ross Lagerwall
0 siblings, 0 replies; 5+ messages in thread
From: Ross Lagerwall @ 2013-01-15 7:36 UTC (permalink / raw)
To: Duy Nguyen; +Cc: git, Jean-Noël AVILA, Junio C Hamano
On Tue, Jan 15, 2013 at 11:38:32AM +0700, Duy Nguyen wrote:
> dirlen is not expected to include the trailing slash, but
> find_basename() does that. It messes up with the path filters for
> push/pop in the next code. This brings grep performance closely back
> to before for me. Ross, can you check (patch could be whitespace
> damaged by gmail)?
>
Yes, that did seem to bring back the performance to the old level.
I noticed that before the patch, the strace output was something like
this:
open("tools/vm/.gitattributes", O_RDONLY) = -1 ENOENT
open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT
open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT
open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT
open("usr/.gitattributes", O_RDONLY) = -1 ENOENT
open("usr//.gitattributes", O_RDONLY) = -1 ENOENT
open("usr//.gitattributes", O_RDONLY) = -1 ENOENT
open("usr//.gitattributes", O_RDONLY) = -1 ENOENT
open("usr//.gitattributes", O_RDONLY) = -1 ENOENT
open("usr//.gitattributes", O_RDONLY) = -1 ENOENT
(and yes, the whitespace was damaged by Gmail!)
Regards
--
Ross Lagerwall
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-01-15 7:36 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-14 22:38 git grep performance regression Ross Lagerwall
2013-01-15 2:46 ` Duy Nguyen
2013-01-15 4:38 ` Duy Nguyen
2013-01-15 7:36 ` Ross Lagerwall
2013-01-15 2:48 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).