From: Joe Perches <joe@perches.com>
To: Stefan Beller <sbeller@google.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, git <git@vger.kernel.org>
Subject: Re: grep vs git grep performance?
Date: Fri, 27 Oct 2017 10:22:22 -0700
Message-ID: <1509124942.1914.9.camel@perches.com>
In-Reply-To: <CAGZ79kYWPunzZ2u=MtCoCadxXu_4etEK5DYnhYXo+CgeHrXQwQ@mail.gmail.com>

On Thu, 2017-10-26 at 10:45 -0700, Stefan Beller wrote:
> On Thu, Oct 26, 2017 at 10:41 AM, Joe Perches <joe@perches.com> wrote:
> > On Thu, 2017-10-26 at 09:58 -0700, Stefan Beller wrote:
> > > + Avar who knows a thing about pcre (I assume the regex compilation
> > > has impact on grep speed)
> > > 
> > > On Thu, Oct 26, 2017 at 8:02 AM, Joe Perches <joe@perches.com> wrote:
> > > > Comparing a cache warm git grep vs command line grep
> > > > shows significant differences in cpu & wall clock.
> > > > 
> > > > Any ideas how to improve this?
> > > > 
> > > > $ time git grep "\bseq_.*%p\W" | wc -l
> > > > 112
> > > > 
> > > > real    0m4.271s
> > > > user    0m15.520s
> > > > sys     0m0.395s
> > > > 
> > > > $ time grep -r --include=*.[ch] "\bseq_.*%p\W" * | wc -l
> > > > 112
> > > > 
> > > > real    0m1.164s
> > > > user    0m0.847s
> > > > sys     0m0.314s
> > > > 
> > > 
> > > I wonder how much is algorithmic advantage vs coding/micro
> > > optimization that we can do.
> > 
> > As do I.  I presume this is libpcre related.
> > 
> > For instance, git grep performance is better than grep for:
> > 
> > $ time git grep -w "seq_printf" -- "*.[ch]" | wc -l
> > 8609
> > 
> > real    0m0.301s
> > user    0m0.548s
> > sys     0m0.372s
> > 
> > $ time grep -w -r --include=*.[ch] "seq_printf" * | wc -l
> > 8609
> > 
> > real    0m0.706s
> > user    0m0.396s
> > sys     0m0.309s
> > 
> 
> One important piece of information is what version of Git you are running,
> 
> 
> $ git tag --contains origin/ab/pcre-v2
> v2.14.0

v2.10

> ...
> 
> (and the version of pcre, see the numbers)
> https://git.kernel.org/pub/scm/git/git.git/commit/?id=94da9193a6eb8f1085d611c04ff8bbb4f5ae1e0a

I definitely didn't have that one.

I recompiled the latest git (with USE_LIBPCRE2) and reran the test.

Here are the results:

$ git --version
git version 2.15.0.rc2.48.g4e40fb3

$ time git grep -P "\bseq_.*%p\W" -- "*.[ch]" | wc -l
112

real	0m0.437s
user	0m1.008s
sys	0m0.381s

So, git grep performance has already been
quite successfully improved.
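
For a rough sense of scale, the timings above can be turned into
ratios. This is only a sketch: the helper name "speedup" is made up
for illustration, and the two runs are not identical (the v2.10 run
had neither -P nor the "*.[ch]" pathspec), so treat the figures as
approximate rather than as a controlled benchmark.

```shell
# speedup OLD NEW: print OLD/NEW rounded to one decimal place.
# "speedup" is a hypothetical helper, not part of git or grep.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}

# Wall-clock time: git v2.10 run vs 2.15.0.rc2 built with USE_LIBPCRE2
speedup 4.271 0.437

# User CPU time for the same two runs
speedup 15.520 1.008
```

That works out to roughly a 10x improvement in wall-clock time and
about 15x in user CPU time.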

Thanks.

