git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/4] diffcore-pickaxe: further refactor count_match()
Date: Sat, 28 Feb 2009 14:10:16 +0100	[thread overview]
Message-ID: <49A937B8.1030205@lsrfire.ath.cx> (raw)
In-Reply-To: <7v8wnrgkjw.fsf@gitster.siamese.dyndns.org>

Junio C Hamano schrieb:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>> ...
>>> In any case, there is also memmem(), which uses the same fast algorithm
>>> as strstr() in recent glibc versions.  Like this?
>> Thanks; it would be nice to bench this change.
> 
> With memmem() patch applied on top of [1-4/4], the same test as described
> in the commit log message of [4/4] which was:
> 
>     $ STRING='Ensure that the real time constraints are schedulable.'
>     $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null
> 
>     (Before the patch, best of 5 runs)
>     5.59user 0.15system 0:05.74elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
>     0inputs+0outputs (0major+39956minor)pagefaults 0swaps
> 
>     (After the patch, best of 5 runs)
>     3.04user 0.17system 0:03.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
>     0inputs+0outputs (0major+49697minor)pagefaults 0swaps
> 
>     The file "kernel/sched.c" has roughly 900 changes applied to it, and over
>     its lifetime, it has grown from 5kB to 9kB in size.  I'd expect a larger
>     file may see a bigger performance boost.
> 
> (With memmem() patch, best of 5 runs)
> 2.46user 0.15system 0:02.62elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+49698minor)pagefaults 0swaps

I get this (Ubuntu 8.10 x64, Fedora 10 x64 using the same Linux repo,
Windows Vista x64 using a different Linux repo with the same HEAD on
NTFS and msysgit, numbers are the elapsed time in seconds, best of five
runs):

                           Ubuntu  Fedora  Windows
   v1.6.2-rc2                8.14    8.16    9.236
   v1.6.2-rc2+[1-4]          2.43    2.45    2.995
   v1.6.2-rc2+[1-4]+memmem   1.31    1.25    2.917
   v1.6.2-rc2+[1-3]+memmem   1.51    1.16    8.455

Ubuntu has glibc 2.8, while Fedora 10 has glibc 2.9, with a new and more
efficient memmem() implementation.  On Windows, we use our own naive
memmem() implementation.

So using memmem() is worthwhile.  And providing a better fall-back
version in compat/ can speed up this particular case to the point where
the fourth patch becomes moot.

Hmm, gnulib (http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=summary)
contains all parts ready for copy & paste, licensed under the GPL 2 or
up.  That won't cause problems with the libgit2 relicensing effort, as
memmem()  won't end up in there, right?


For the record, here the raw timings used to make the table above (best
of five):

Fedora 10:
8.10user 0.05system 0:08.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+31943minor)pagefaults 0swaps

2.38user 0.06system 0:02.45elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41294minor)pagefaults 0swaps

1.19user 0.05system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+39274minor)pagefaults 0swaps

1.10user 0.05system 0:01.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+31796minor)pagefaults 0swaps

Ubuntu 8.10:
8.08user 0.05system 0:08.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+29579minor)pagefaults 0swaps

2.35user 0.07system 0:02.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+40437minor)pagefaults 0swaps

1.23user 0.08system 0:01.31elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41097minor)pagefaults 0swaps

1.46user 0.05system 0:01.51elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+30533minor)pagefaults 0swaps

Windows:
real    0m9.236s
user    0m0.000s
sys     0m0.000s

real    0m2.995s
user    0m0.000s
sys     0m0.000s

real    0m2.917s
user    0m0.000s
sys     0m0.015s

real    0m8.455s
user    0m0.000s
sys     0m0.000s

  reply	other threads:[~2009-02-28 13:11 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-26  6:52 [PATCH 0/4] Pickaxe search clean-up and optimization Junio C Hamano
2009-02-26  6:52 ` [PATCH 1/4] diffcore-pickaxe: refactor diffcore_pickaxe() Junio C Hamano
2009-02-27 23:58   ` René Scharfe
2009-02-26  6:52 ` [PATCH 2/4] diffcore-pickaxe: micro-optimize has_match() function Junio C Hamano
2009-02-26  6:52 ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-02-26  7:23   ` Kjetil Barvik
2009-02-28  1:13   ` René Scharfe
2009-02-28  1:25     ` Junio C Hamano
2009-02-28  6:08       ` Junio C Hamano
2009-02-28 13:10         ` René Scharfe [this message]
2009-02-28 17:40           ` Junio C Hamano
2009-02-28 18:15             ` René Scharfe
2009-02-28 19:16             ` [PATCH] import memmem() with linear complexity from Gnulib René Scharfe
2009-02-28 22:44               ` Mike Hommey
2009-03-01  3:41                 ` Jeff King
2009-03-01 11:15                   ` René Scharfe
2009-03-01 18:55                     ` René Scharfe
2009-03-01  7:31           ` [PATCH 3/4] diffcore-pickaxe: further refactor count_match() Junio C Hamano
2009-03-01 10:53             ` René Scharfe
2009-02-26  6:52 ` [PATCH 4/4] diffcore-pickaxe: optimize by trimming common initial and trailing parts Junio C Hamano
2009-02-26  9:05   ` Junio C Hamano
2009-03-02 23:00 ` [PATCH 1/2] diffcore-pickaxe: use memmem() René Scharfe
2009-03-02 23:19   ` [PATCH 2/2] optimize compat/ memmem() René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49A937B8.1030205@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).