From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Joe Perches <joe@perches.com>
Cc: Stefan Beller <sbeller@google.com>, git <git@vger.kernel.org>
Subject: Re: grep vs git grep performance?
Date: Sat, 28 Oct 2017 00:11:44 +0200
Message-ID: <877evgxmu7.fsf@evledraar.booking.com>
In-Reply-To: <1509124942.1914.9.camel@perches.com>

On Fri, Oct 27 2017, Joe Perches jotted:
> On Thu, 2017-10-26 at 10:45 -0700, Stefan Beller wrote:
>> On Thu, Oct 26, 2017 at 10:41 AM, Joe Perches <joe@perches.com> wrote:
>> > On Thu, 2017-10-26 at 09:58 -0700, Stefan Beller wrote:
>> > > + Avar who knows a thing about pcre (I assume the regex compilation
>> > > has impact on grep speed)
>> > >
>> > > On Thu, Oct 26, 2017 at 8:02 AM, Joe Perches <joe@perches.com> wrote:
>> > > > Comparing a cache warm git grep vs command line grep
>> > > > shows significant differences in cpu & wall clock.
>> > > >
>> > > > Any ideas how to improve this?
>> > > >
>> > > > $ time git grep "\bseq_.*%p\W" | wc -l
>> > > > 112
>> > > >
>> > > > real 0m4.271s
>> > > > user 0m15.520s
>> > > > sys 0m0.395s
>> > > >
>> > > > $ time grep -r --include=*.[ch] "\bseq_.*%p\W" * | wc -l
>> > > > 112
>> > > >
>> > > > real 0m1.164s
>> > > > user 0m0.847s
>> > > > sys 0m0.314s
>> > > >
>> > >
>> > > I wonder how much is algorithmic advantage vs coding/micro
>> > > optimization that we can do.
>> >
>> > As do I. I presume this is libpcre related.
>> >
>> > For instance, git grep performance is better than grep for:
>> >
>> > $ time git grep -w "seq_printf" -- "*.[ch]" | wc -l
>> > 8609
>> >
>> > real 0m0.301s
>> > user 0m0.548s
>> > sys 0m0.372s
>> >
>> > $ time grep -w -r --include=*.[ch] "seq_printf" * | wc -l
>> > 8609
>> >
>> > real 0m0.706s
>> > user 0m0.396s
>> > sys 0m0.309s
>> >
>>
>> One important piece of information is what version of Git you are running,
>>
>>
>> $ git tag --contains origin/ab/pcre-v2
>> v2.14.0
>
> v2.10
>
>> ...
>>
>> (and the version of pcre, see the numbers)
>> https://git.kernel.org/pub/scm/git/git.git/commit/?id=94da9193a6eb8f1085d611c04ff8bbb4f5ae1e0a
>
> I definitely didn't have that one.
>
> I recompiled git latest (with USE_LIBPCRE2) and reran.
>
> Here are the results
>
> $ git --version
> git version 2.15.0.rc2.48.g4e40fb3
>
> $ time git grep -P "\bseq_.*%p\W" -- "*.[ch]" | wc -l
> 112
>
> real 0m0.437s
> user 0m1.008s
> sys 0m0.381s
>
> So, git grep performance has already been
> quite successfully improved.

...and I have WIP patches to use the PCRE engine for patterns without -P
which I intend to start sending soon after the next release.
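For anyone who wants to reproduce the comparison above on their own machine, here is a minimal sketch. It checks on a throwaway repository that Joe's pattern finds the same lines with git grep's default engine, with -P, and with GNU grep, and it shows how to make PCRE the default via the grep.patternType config so that -P need not be passed every time. It assumes a git built with PCRE support (e.g. with USE_LIBPCRE2, as Joe did) and GNU grep; the file sample.c and its contents are hypothetical.

```shell
#!/bin/sh
# Sketch: compare match counts for the pattern from this thread across
# regex engines in a throwaway repository. Assumes git was built with
# PCRE support (otherwise `git grep -P` fails) and GNU grep.
set -e

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical sample file: only the first line should match.
cat > sample.c <<'EOF'
seq_printf(m, "%p\n", ptr);
seq_puts(m, "no pointer here");
EOF
git add sample.c   # git grep only searches tracked files by default

pattern='\bseq_.*%p\W'

# Default engine: basic POSIX regex (\b and \W are GNU extensions).
basic=$(git grep -e "$pattern" -- '*.[ch]' | wc -l | tr -d ' ')
# PCRE engine, as in the rerun with USE_LIBPCRE2.
perl=$(git grep -P -e "$pattern" -- '*.[ch]' | wc -l | tr -d ' ')
# GNU grep over the same files.
plain=$(grep -r --include='*.[ch]' -e "$pattern" . | wc -l | tr -d ' ')

# Make PCRE the default pattern type for this repository, so a plain
# `git grep` behaves like `git grep -P`.
git config grep.patternType perl
config_default=$(git grep -e "$pattern" -- '*.[ch]' | wc -l | tr -d ' ')

echo "basic=$basic perl=$perl grep=$plain config=$config_default"
```

On a real tree such as the kernel, prefixing each command with `time`, as in the thread, shows the speed difference between the engines; the counts should agree either way.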