From: Joe Perches <joe@perches.com>
To: Stefan Beller <sbeller@google.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>, git <git@vger.kernel.org>
Subject: Re: grep vs git grep performance?
Date: Fri, 27 Oct 2017 10:22:22 -0700
Message-ID: <1509124942.1914.9.camel@perches.com>
In-Reply-To: <CAGZ79kYWPunzZ2u=MtCoCadxXu_4etEK5DYnhYXo+CgeHrXQwQ@mail.gmail.com>
On Thu, 2017-10-26 at 10:45 -0700, Stefan Beller wrote:
> On Thu, Oct 26, 2017 at 10:41 AM, Joe Perches <joe@perches.com> wrote:
> > On Thu, 2017-10-26 at 09:58 -0700, Stefan Beller wrote:
> > > + Avar who knows a thing about pcre (I assume the regex compilation
> > > has impact on grep speed)
> > >
> > > On Thu, Oct 26, 2017 at 8:02 AM, Joe Perches <joe@perches.com> wrote:
> > > > Comparing a cache warm git grep vs command line grep
> > > > shows significant differences in cpu & wall clock.
> > > >
> > > > Any ideas how to improve this?
> > > >
> > > > $ time git grep "\bseq_.*%p\W" | wc -l
> > > > 112
> > > >
> > > > real 0m4.271s
> > > > user 0m15.520s
> > > > sys 0m0.395s
> > > >
> > > > $ time grep -r --include=*.[ch] "\bseq_.*%p\W" * | wc -l
> > > > 112
> > > >
> > > > real 0m1.164s
> > > > user 0m0.847s
> > > > sys 0m0.314s
> > > >
> > >
> > > I wonder how much is algorithmic advantage vs coding/micro
> > > optimization that we can do.
> >
> > As do I. I presume this is libpcre related.
> >
> > For instance, git grep performance is better than grep for:
> >
> > $ time git grep -w "seq_printf" -- "*.[ch]" | wc -l
> > 8609
> >
> > real 0m0.301s
> > user 0m0.548s
> > sys 0m0.372s
> >
> > $ time grep -w -r --include=*.[ch] "seq_printf" * | wc -l
> > 8609
> >
> > real 0m0.706s
> > user 0m0.396s
> > sys 0m0.309s
> >
>
> One important piece of information is what version of Git you are running,
>
>
> $ git tag --contains origin/ab/pcre-v2
> v2.14.0
> ...
>
> (and the version of pcre, see the numbers)
> https://git.kernel.org/pub/scm/git/git.git/commit/?id=94da9193a6eb8f1085d611c04ff8bbb4f5ae1e0a

I definitely didn't have that one.

I recompiled the latest git (with USE_LIBPCRE2) and reran.

Here are the results:

$ git --version
git version 2.15.0.rc2.48.g4e40fb3

$ time git grep -P "\bseq_.*%p\W" -- "*.[ch]" | wc -l
112

real 0m0.437s
user 0m1.008s
sys 0m0.381s

So, git grep performance has already been
quite successfully improved.

Thanks.
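
For a rough sense of scale, the ratios between the timings quoted in this
message can be worked out as below. This is only a sketch: speedup() is a
hypothetical helper, not part of git or grep, and the comparison is not
strictly like-for-like, since the first git grep run was given neither -P
nor the "*.[ch]" pathspec.

```shell
#!/bin/sh
# speedup() is a hypothetical helper for illustration only.
# $1 = earlier time in seconds, $2 = later time in seconds.
speedup() {
    # LC_ALL=C keeps the decimal separator a "." regardless of locale.
    LC_ALL=C awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}

# Times are copied from the runs quoted in this message.
echo "git grep, real: $(speedup 4.271 0.437)x faster"
echo "git grep, user: $(speedup 15.520 1.008)x less cpu"
echo "grep -r vs new git grep, real: $(speedup 1.164 0.437)x"
```

By these numbers the rebuilt git grep is roughly 9.8x faster in wall clock
than the earlier run, and about 2.7x faster than grep -r for this pattern.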

Thread overview: 12+ messages
2017-10-26 15:02 grep vs git grep performance? Joe Perches
2017-10-26 15:11 ` Han-Wen Nienhuys
2017-10-26 15:55 ` Joe Perches
2017-10-26 16:13 ` SZEDER Gábor
2017-10-26 16:20 ` Joe Perches
2017-10-26 16:58 ` Stefan Beller
2017-10-26 17:41 ` Joe Perches
2017-10-26 17:45 ` Stefan Beller
2017-10-27 17:22 ` Joe Perches [this message]
2017-10-27 22:11 ` Ævar Arnfjörð Bjarmason
2017-10-27 23:22 ` Joe Perches
2017-10-28 7:45 ` Ævar Arnfjörð Bjarmason