From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Ingo Molnar <mingo@elte.hu>, Jonathan Nieder <jrnieder@gmail.com>,
	Duy Nguyen <pclouds@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Make GIT_USE_LOOKUP default?
Date: Mon, 18 Mar 2013 03:32:29 -0400
Message-ID: <20130318073229.GA5551@sigill.intra.peff.net>
In-Reply-To: <7vd2uxrdh7.fsf@alter.siamese.dyndns.org>

[+cc Ingo and Jonathan, as this revisits the "open-code hashcmp" thread
     referenced below]

On Sun, Mar 17, 2013 at 01:13:56PM -0700, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > This env comes from jc/sha1-lookup in 2008 (merge commit e9f9d4f), 5
> > years ago. I wonder if it's good enough to turn on by default and keep
> > improving from there, or is it still experimental?
> 
> The algorithm has been used in production in other codepaths like
> patch-ids and replace-object, so correctness-wise it should be fine
> to turn it on.  I think nobody has bothered to benchmark with and
> without the environment to see if it is really worth the complexity.
> 
> It may be a good idea to try doing so, now you have noticed it ;-).

The only benchmarking I could find in the list archive (besides the ones
in the commit itself, showing little change, but fewer page faults) is:

  http://article.gmane.org/gmane.comp.version-control.git/123832

which actually indicates that GIT_USE_LOOKUP is slower (despite having
fewer page faults).

By the way, looking at that made me think for a few minutes about
hashcmp, and I was surprised to find that we use an open-coded
comparison loop. That dates back to this thread by Ingo:

  http://article.gmane.org/gmane.comp.version-control.git/172286

I could not replicate his benchmarks at all. In fact, my measurements
showed a slight slowdown with 1a812f3 (hashcmp(): inline memcmp() by
hand to optimize, 2011-04-28).

Here are my best-of-five numbers for running "git rev-list --objects
--all >/dev/null" on linux-2.6.git:

  [current master, compiled with -O2]
  real    0m45.612s
  user    0m45.140s
  sys     0m0.300s

  [current master, compiled with -O3 for comparison]
  real    0m45.588s
  user    0m45.088s
  sys     0m0.312s

  [revert 1a812f3 (i.e., go back to memcmp), -O2]
  real    0m44.358s
  user    0m43.876s
  sys     0m0.316s

  [open-code first byte, fall back to memcmp, -O2]
  real    0m43.963s
  user    0m43.568s
  sys     0m0.284s
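
As a rough illustration of what was measured above, the three variants
could be sketched like this. The function names are invented for this
example (in git each one would simply be hashcmp), and the bodies are an
approximation under that assumption, not the exact code from the commits:

```c
#include <string.h>

/* Open-coded byte loop, in the spirit of 1a812f3 (name is hypothetical). */
static inline int hashcmp_loop(const unsigned char *sha1,
			       const unsigned char *sha2)
{
	int i;

	for (i = 0; i < 20; i++, sha1++, sha2++) {
		if (*sha1 != *sha2)
			return *sha1 - *sha2;
	}
	return 0;
}

/* Plain memcmp over the 20-byte SHA-1, i.e. 1a812f3 reverted. */
static inline int hashcmp_memcmp(const unsigned char *sha1,
				 const unsigned char *sha2)
{
	return memcmp(sha1, sha2, 20);
}

/*
 * Hybrid: compare the first byte by hand, since two different hashes
 * almost always differ there, and only call memcmp for the remaining
 * 19 bytes when the first byte matches.
 */
static inline int hashcmp_first_byte(const unsigned char *sha1,
				     const unsigned char *sha2)
{
	if (*sha1 != *sha2)
		return *sha1 - *sha2;
	return memcmp(sha1 + 1, sha2 + 1, 19);
}
```

All three agree on the sign of the result; only the magnitude of a
non-zero return may differ, since memcmp only promises the sign.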

I wonder why we get such different numbers. Ingo said his tests are on a
Nehalem CPU, as are mine (mine is an i7-840QM). I wonder if we should be
wrapping the optimization in an #ifdef, but I'm not sure which flag we
should be checking.
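
To make the #ifdef idea concrete, something like the following is what I
have in mind. HASHCMP_OPEN_CODED is a made-up macro used only for this
sketch; it is not an existing build flag, and which condition should
actually define it is exactly the open question:

```c
#include <string.h>

/*
 * HASHCMP_OPEN_CODED is hypothetical: define it at build time to get
 * the hand-rolled loop, leave it undefined to get memcmp.
 */
static inline int hashcmp_sketch(const unsigned char *sha1,
				 const unsigned char *sha2)
{
#ifdef HASHCMP_OPEN_CODED
	int i;

	for (i = 0; i < 20; i++, sha1++, sha2++) {
		if (*sha1 != *sha2)
			return *sha1 - *sha2;
	}
	return 0;
#else
	return memcmp(sha1, sha2, 20);
#endif
}
```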

Note that, unlike Ingo, I ran most of my measurements with rev-list
rather than "git gc", which I think conflates a lot of unrelated
performance issues (like writing out a packfile). The interesting bits
for hashcmp in "gc" are
the "Counting objects" phase of pack-objects, and "git prune"
determining reachability. Those are both basically the same as "rev-list
--objects --all".

I did do a quick check of "git gc", though, and it showed results that
matched my rev-lists above (i.e., a very slight speedup by going back to
memcmp).

-Peff
