From: Thomas Rast <trast@inf.ethz.ch>
To: Stefan Beller <stefanbeller@googlemail.com>
Cc: <git@vger.kernel.org>, <peff@peff.net>
Subject: Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
Date: Fri, 16 Aug 2013 10:28:40 +0200 [thread overview]
Message-ID: <87haeqdop3.fsf@linux-k42r.v.cablecom.net> (raw)
In-Reply-To: <1376595306-6335-1-git-send-email-stefanbeller@googlemail.com> (Stefan Beller's message of "Thu, 15 Aug 2013 21:35:06 +0200")
Stefan Beller <stefanbeller@googlemail.com> writes:
> A little background on hash tables first:
> Consider you want to have the object X, which you'd expect at position
> i, but because that place was already taken by B, it is not found at
> position i, you start looking right of position i to find X until you
> find it.
>
> index | i-1 | i | i+1 | i+2 | i+3 | i+4 |
> --------------------------------------------
> entry ... | A | B | C | D | E | X | ...
> --------------------------------------------
>
> Once you have found X at i+4, the commit 9a414486d9f0 (2013-05-01,
> lookup_object: prioritize recently found objects) did an optimization
> and swapped the object B with X, so the placement looks like
>
> index | i-1 | i | i+1 | i+2 | i+3 | i+4 |
> --------------------------------------------
> entry ... | A | X | C | D | E | B | ...
> --------------------------------------------
[...]
> If we now want to find X and X is expected at i, we put X to
> the position i and B to the middle position between B and X at D
> and D will go to the old position of X:
>
> index | i-1 | i | i+1 | i+2 | i+3 | i+4 |
> --------------------------------------------
> entry ... | A | X | C | B | E | D | ...
> --------------------------------------------
>
> So let's test how it works out:
> # running the current git.git master (c1ebd90c832e), repeat 25 times:
> perf stat -r 25 -- ./git-rev-list --all --objects >/dev/null
> ...
> 5.294512141 seconds time elapsed ( +- 7.88% )
> # Now running with this patch applied:
> 5.111576725 seconds time elapsed ( +- 8.17% )
[...]
> However please do check if this patch brings the promised performance
> on your own, as you're likely using different hardware and another
> software setup. Feel free to share your performance differences.
I get this on an i7-M620 laptop from t/perf/p0001-rev-list.sh:
Test HEAD next
-------------------------------------------------------------------------------
0001.1: rev-list --all 6.29(6.03+0.22) 6.33(6.06+0.24) +0.6%
0001.2: rev-list --all --objects 53.22(52.48+0.54) 54.90(54.15+0.55) +3.2%*
-------------------------------------------------------------------------------
Significance hints: '.' 0.1 '*' 0.05 '**' 0.01 '***' 0.001
And this on a newer i7-3930K workstation:
Test HEAD next
-------------------------------------------------------------------------------
0001.1: rev-list --all 3.86(3.71+0.14) 3.89(3.74+0.14) +0.7%*
0001.2: rev-list --all --objects 30.23(29.91+0.29) 30.35(30.04+0.29) +0.4%.
-------------------------------------------------------------------------------
Significance hints: '.' 0.1 '*' 0.05 '**' 0.01 '***' 0.001
That's using linux.git as a test repository, I figured the numbers were
too small with git.git.
I trust the laptop numbers less because it has far more thermal (and
thus throttling) issues, but the runs do show a significant difference,
though less than you claimed.
> @@ -90,9 +90,27 @@ struct object *lookup_object(const unsigned char *sha1)
> * Move object to where we started to look for it so
> * that we do not need to walk the hash table the next
> * time we look for it.
> + * However, we don't want to penalize the the object being
> + * moved from the first position, so divide the penalty to
> + * two different objects.
> */
> +
> + /*
> + * First make sure i > first, so the middle is really
> + * in between the i and first object
> + */
> + if (i < first)
> + i += obj_hash_size;
> +
> + middle = (i + first) / 2;
> + if (i >= obj_hash_size)
> + i -= obj_hash_size;
> + if (middle >= obj_hash_size)
> + middle -= obj_hash_size;
> +
> struct object *tmp = obj_hash[i];
Adding all the code above means that this declaration is now in the
middle of things, which is not valid C90. Please move the declaration
to the beginning of the block, and compile with
-Wdeclaration-after-statement to notice this issue.
> - obj_hash[i] = obj_hash[first];
> + obj_hash[i] = obj_hash[middle];
> + obj_hash[middle] = obj_hash[first];
> obj_hash[first] = tmp;
> }
> return obj;
--
Thomas Rast
trast@{inf,student}.ethz.ch
next prev parent reply other threads:[~2013-08-16 8:28 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-15 19:35 [PATCH] lookup_object: split up displacement penalty for hash collisions Stefan Beller
2013-08-16 8:28 ` Thomas Rast [this message]
2013-08-16 9:26 ` Thomas Rast
2013-08-16 9:41 ` Stefan Beller
2013-08-16 9:57 ` Jeff King
2013-08-16 13:49 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87haeqdop3.fsf@linux-k42r.v.cablecom.net \
--to=trast@inf.ethz.ch \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=stefanbeller@googlemail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.