git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git grep performance regression
@ 2013-01-14 22:38 Ross Lagerwall
  2013-01-15  2:46 ` Duy Nguyen
  2013-01-15  2:48 ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Ross Lagerwall @ 2013-01-14 22:38 UTC (permalink / raw)
  To: git; +Cc: Jean-Noël AVILA

Hi,

I have noticed a performance regression in git grep between v1.8.1 and
v1.8.1.1:

On the kernel tree:
For git 1.8.1:
$ time git grep foodsgsg

real   0m0.158s
user   0m0.290s
sys    0m0.207s

For git 1.8.1.1:
$ time /tmp/g/bin/git grep foodsgsg

real   0m0.501s
user   0m0.707s
sys    0m0.493s


A bisect seems to indicate that it was introduced by 94bc67:
commit 94bc671a1f2e8610de475c2494d2763355a99f65
Author: Jean-Noël AVILA <avila.jn@gmail.com>
Date:   Sat Dec 8 21:04:39 2012 +0100

    Add directory pattern matching to attributes
    
    The manpage of gitattributes says: "The rules how the pattern
    matches paths are the same as in .gitignore files" and the gitignore
    pattern matching has a pattern ending with / for directory matching.
    
    This rule is specifically relevant for the 'export-ignore' rule used
    for git archive.
    
    Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

Regards
-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git grep performance regression
  2013-01-14 22:38 git grep performance regression Ross Lagerwall
@ 2013-01-15  2:46 ` Duy Nguyen
  2013-01-15  4:38   ` Duy Nguyen
  2013-01-15  2:48 ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Duy Nguyen @ 2013-01-15  2:46 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA

On Tue, Jan 15, 2013 at 5:38 AM, Ross Lagerwall <rosslagerwall@gmail.com> wrote:
> Hi,
>
> I have noticed a performance regression in git grep between v1.8.1 and
> v1.8.1.1:
>
> On the kernel tree:
> For git 1.8.1:
> $ time git grep foodsgsg
>
> real   0m0.158s
> user   0m0.290s
> sys    0m0.207s
>
> For git 1.8.1.1:
> $ time /tmp/g/bin/git grep foodsgsg
>
> real   0m0.501s
> user   0m0.707s
> sys    0m0.493s

Interesting. I see the regression on linux-2.6 too (0.6s real vs 0.9s).

> A bisect seems to indicate that it was introduced by 94bc67:
> commit 94bc671a1f2e8610de475c2494d2763355a99f65
> Author: Jean-Noël AVILA <avila.jn@gmail.com>
> Date:   Sat Dec 8 21:04:39 2012 +0100
>
>     Add directory pattern matching to attributes
>
>     The manpage of gitattributes says: "The rules how the pattern
>     matches paths are the same as in .gitignore files" and the gitignore
>     pattern matching has a pattern ending with / for directory matching.
>
>     This rule is specifically relevant for the 'export-ignore' rule used
>     for git archive.

Not the real contributor to the regression, but it should be noted
that glibc's strrchr does not employ a typical loop. Instead it
advances with strchr with a note that strchr is much faster. It looks
like strchr compares a word at a time instead of a byte. We might want
to do the same.

I don't have time to look into details now, but by enabling
DEBUG_ATTR, it looks like this commit makes it push and pop patterns a
lot more than without the commit.
-- 
Duy

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git grep performance regression
  2013-01-14 22:38 git grep performance regression Ross Lagerwall
  2013-01-15  2:46 ` Duy Nguyen
@ 2013-01-15  2:48 ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2013-01-15  2:48 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA

Ross Lagerwall <rosslagerwall@gmail.com> writes:

> I have noticed a performance regression in git grep between v1.8.1 and
> v1.8.1.1:
>
> On the kernel tree:
> For git 1.8.1:
> $ time git grep foodsgsg
>
> real   0m0.158s
> user   0m0.290s
> sys    0m0.207s
>
> For git 1.8.1.1:
> $ time /tmp/g/bin/git grep foodsgsg
>
> real   0m0.501s
> user   0m0.707s
> sys    0m0.493s
>
>
> A bisect seems to indicate that it was introduced by 94bc67:
> commit 94bc671a1f2e8610de475c2494d2763355a99f65
> Author: Jean-Noël AVILA <avila.jn@gmail.com>
> Date:   Sat Dec 8 21:04:39 2012 +0100
>
>     Add directory pattern matching to attributes
>     
>     The manpage of gitattributes says: "The rules how the pattern
>     matches paths are the same as in .gitignore files" and the gitignore
>     pattern matching has a pattern ending with / for directory matching.
>     
>     This rule is specifically relevant for the 'export-ignore' rule used
>     for git archive.
>     
>     Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
>     Signed-off-by: Junio C Hamano <gitster@pobox.com>

Hmph, that looks really bad, especially given that in the normal
codepath like "git grep", we would never care about directories (the
attributes are normally applied to real paths with contents, and the
use by archive is an anomaly) and the implementation should be done
in a way not to impose such excess and useless overhead.

We may end up reverting that patch for the time being X-<.

Jean-Noël, ideas?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git grep performance regression
  2013-01-15  2:46 ` Duy Nguyen
@ 2013-01-15  4:38   ` Duy Nguyen
  2013-01-15  7:36     ` Ross Lagerwall
  0 siblings, 1 reply; 5+ messages in thread
From: Duy Nguyen @ 2013-01-15  4:38 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: git, Jean-Noël AVILA, Junio C Hamano

On Tue, Jan 15, 2013 at 9:46 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> I don't have time to look into details now, but by enabling
> DEBUG_ATTR, it looks like this commit makes it push and pop patterns a
> lot more than without the commit.

I think the culprit is at this chunk:

 static void prepare_attr_stack(const char *path)
 {
        struct attr_stack *elem, *info;
        int dirlen, len;
        const char *cp;

-       cp = strrchr(path, '/');
-       if (!cp)
-               dirlen = 0;
-       else
-               dirlen = cp - path;
+       dirlen = find_basename(path) - path;

dirlen is not expected to include the trailing slash, but
find_basename() does that. It messes up with the path filters for
push/pop in the next code. This brings grep performance closely back
to before for me. Ross, can you check (patch could be whitespace
damaged by gmail)?

-- 8< --
diff --git a/attr.c b/attr.c
index b05110d..1e96e26 100644
--- a/attr.c
+++ b/attr.c
@@ -583,6 +583,9 @@ static void prepare_attr_stack(const char *path)

        dirlen = find_basename(path) - path;

+       if (dirlen)
+               dirlen--;
+
        /*
         * At the bottom of the attribute stack is the built-in
         * set of attribute definitions, followed by the contents
-- 8< --

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: git grep performance regression
  2013-01-15  4:38   ` Duy Nguyen
@ 2013-01-15  7:36     ` Ross Lagerwall
  0 siblings, 0 replies; 5+ messages in thread
From: Ross Lagerwall @ 2013-01-15  7:36 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git, Jean-Noël AVILA, Junio C Hamano

On Tue, Jan 15, 2013 at 11:38:32AM +0700, Duy Nguyen wrote:
> dirlen is not expected to include the trailing slash, but
> find_basename() does that. It messes up with the path filters for
> push/pop in the next code. This brings grep performance closely back
> to before for me. Ross, can you check (patch could be whitespace
> damaged by gmail)?
> 

Yes, that did seem to bring back the performance to the old level.

I noticed that before the patch, the strace output was something like
this:
open("tools/vm/.gitattributes", O_RDONLY) = -1 ENOENT
open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT
open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT
open("tools/vm//.gitattributes", O_RDONLY) = -1 ENOENT
open("usr/.gitattributes", O_RDONLY)    = -1 ENOENT
open("usr//.gitattributes", O_RDONLY)   = -1 ENOENT
open("usr//.gitattributes", O_RDONLY)   = -1 ENOENT
open("usr//.gitattributes", O_RDONLY)   = -1 ENOENT
open("usr//.gitattributes", O_RDONLY)   = -1 ENOENT
open("usr//.gitattributes", O_RDONLY)   = -1 ENOENT

(and yes, the whitespace was damaged by Gmail!)

Regards
-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-01-15  7:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-14 22:38 git grep performance regression Ross Lagerwall
2013-01-15  2:46 ` Duy Nguyen
2013-01-15  4:38   ` Duy Nguyen
2013-01-15  7:36     ` Ross Lagerwall
2013-01-15  2:48 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).