From: Jeff King <peff@peff.net>
To: Thomas Rast <trast@inf.ethz.ch>
Cc: Stefan Beller <stefanbeller@googlemail.com>, git@vger.kernel.org
Subject: Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
Date: Fri, 16 Aug 2013 05:57:22 -0400 [thread overview]
Message-ID: <20130816095722.GA11201@sigill.intra.peff.net> (raw)
In-Reply-To: <8761v6dm0r.fsf@linux-k42r.v.cablecom.net>
On Fri, Aug 16, 2013 at 11:26:28AM +0200, Thomas Rast wrote:
> > I trust the laptop numbers less because it has far more thermal (and
> > thus throttling) issues, but the runs do show a significant difference,
> > though less than you claimed.
>
> Well, as I feared... another run on the same laptop:
>
> Test HEAD next
> ------------------------------------------------------------------------------
> 0001.1: rev-list --all 6.41(6.14+0.24) 6.36(6.10+0.23) -0.9%*
> 0001.2: rev-list --all --objects 54.60(53.84+0.55) 54.23(53.50+0.53) -0.7%
> ------------------------------------------------------------------------------
> Significance hints: '.' 0.1 '*' 0.05 '**' 0.01 '***' 0.001
Yeah, I get similar results on my i7-840QM. The new code is sometimes
faster and sometimes slower, never wins by more than 1%, and is well
within the run-to-run noise (which seems to vary by up to 5%).
I think the reason is that having a "2-deep" LRU cache helps less than
one might hope. There are a lot of objects in the hash table, but only a
small fraction are "hot" at any time; specifically, those objects that
are in the currently-examined tree. When we hit the "X" object from
Stefan's diagrams, we know that it is hot. But the "B" object may or may
not be hot. If it is, Stefan's optimization helps; if not, it does
nothing.
But statistically it probably isn't hot. There are ~3 million objects in
the kernel repo, but only ~40,000 tree entries. So we would naively
expect it to have an effect in only ~1% of cases (I am not sure if that
is accurate, though, as "hot" items within the same collision sequence
would tend to float to the front of the chain). I suspect any savings
in those 1% are eaten up by the extra swapping, as well as the fact that
the "hot" B actually moves to the middle.
Another way to think about it is that if B is hot, it will then get
looked up again and get shifted to the front of the chain, pushing X
back. So this is really only a win when two hot entries, X and B,
collide and trade the front position back and forth.
In that case, it seems like we would want to move B to the second
position. This lets the 2-hot case just keep swapping the hot items back
and forth as quickly as possible. To the detriment of C, D, etc, which
never get promoted. But the likelihood of having _3_ hot items in a
collision chain is even less than 2.
That's all vague and hand-wavy intuition of course, and might be
completely wrong. But it's at least a working theory that explains the
experimental results.
-Peff
next prev parent reply other threads:[~2013-08-16 9:57 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-15 19:35 [PATCH] lookup_object: split up displacement penalty for hash collisions Stefan Beller
2013-08-16 8:28 ` Thomas Rast
2013-08-16 9:26 ` Thomas Rast
2013-08-16 9:41 ` Stefan Beller
2013-08-16 9:57 ` Jeff King [this message]
2013-08-16 13:49 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130816095722.GA11201@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=stefanbeller@googlemail.com \
--cc=trast@inf.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).