git.vger.kernel.org archive mirror
* [PATCH] lookup_object: split up displacement penalty for hash collisions
@ 2013-08-15 19:35 Stefan Beller
  2013-08-16  8:28 ` Thomas Rast
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Beller @ 2013-08-15 19:35 UTC (permalink / raw)
  To: git, peff; +Cc: Stefan Beller

A little background on hash tables first:
Suppose you want to look up the object X, which you'd expect at position
i. Because that slot is already taken by B, X is not found at position
i, so you keep looking to the right of position i until you find it.

    index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
           --------------------------------------------
    entry   ... |  A  |  B  |  C  |  D  |  E  |  X  | ...
           --------------------------------------------
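
In code, such a probing lookup boils down to roughly the following
sketch (simplified; "struct entry" and hash20() are stand-ins, not
git's actual names):

#include <string.h>

struct entry {
	unsigned char sha1[20];
};

/* hash20() stands in for whatever maps a sha1 to its preferred slot */
unsigned int hash20(const unsigned char *sha1);

static struct entry *probe_lookup(struct entry **table, unsigned int size,
				  const unsigned char *sha1)
{
	unsigned int i = hash20(sha1) % size;	/* expected position i */

	while (table[i]) {
		if (!memcmp(table[i]->sha1, sha1, 20))
			return table[i];	/* found, possibly right of i */
		i = (i + 1) % size;		/* slot taken: look further right */
	}
	return NULL;				/* empty slot: not in the table */
}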

Once you have found X at i+4, commit 9a414486d9f0 (2013-05-01,
lookup_object: prioritize recently found objects) optimizes the next
lookup by swapping the object B with X, so the placement looks like

    index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
           --------------------------------------------
    entry   ... |  A  |  X  |  C  |  D  |  E  |  B  | ...
           --------------------------------------------

This improves the lookup times, because you'd likely be looking up X
again soon. However, we are putting a heavy penalty on B for the
collision: suppose we now look up B, which is expected at i or even left
of there (i-n); it is a long way to go until B is found. And if B were
expected at the same slot as X, we'd end up just as before the swap,
now penalizing the lookup of X again.
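
The swap from that commit is essentially a three-line exchange of the
slot where probing started with the slot where the object was found,
roughly:

struct object;	/* opaque here */

/* Exchange the slot where probing started ("first") with the slot where
 * the object was found ("i"); roughly what 9a414486d9f0 does. */
static void promote_found(struct object **obj_hash,
			  unsigned int first, unsigned int i)
{
	struct object *tmp = obj_hash[i];

	obj_hash[i] = obj_hash[first];	/* B ends up at X's old slot (i+4) */
	obj_hash[first] = tmp;		/* X ends up at position i */
}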

Jeff stated in the referenced commit that another solution, an LRU
scheme, would be to shift all of the objects: when looking for X in
the first diagram, once we have found X, we shift all the objects in
between one slot to the right and put X at position i:

    index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
           --------------------------------------------
    entry   ... |  A  |  X  |  B  |  C  |  D  |  E  | ...
           --------------------------------------------

But that strategy would need to touch every position in between, hence
taking even longer.
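
For illustration, such a full shift would look roughly like this (a
sketch, not actual git code):

struct object;	/* opaque here */

/* Shift every entry between "first" and "i" one slot to the right and put
 * the found object at "first"; touches the whole chain, hence the cost. */
static void promote_shift_all(struct object **obj_hash,
			      unsigned int obj_hash_size,
			      unsigned int first, unsigned int i)
{
	struct object *found = obj_hash[i];

	while (i != first) {
		unsigned int prev = (i + obj_hash_size - 1) % obj_hash_size;

		obj_hash[i] = obj_hash[prev];	/* B..E each move right by one */
		i = prev;
	}
	obj_hash[first] = found;		/* X lands at position i from the diagram */
}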


This patch proposes another strategy: we split the penalty into parts
and spread it across different objects. To keep the overhead small, the
penalty is split into only two portions, assigned to the object B and to
an arbitrary object in the range between B and X.

The reasoning for moving that arbitrary in-between object is analogous
to the reasoning for moving the first object; see 9a414486d9f0
(2013-05-01, lookup_object: prioritize recently found objects) for a
detailed explanation.

But which of the objects in between should we choose? At first I
naively chose the exact midpoint between those two objects, so here is
the diagram from the beginning again:

    index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
           --------------------------------------------
    entry   ... |  A  |  B  |  C  |  D  |  E  |  X  | ...
           --------------------------------------------

If we now look up X, which is expected at position i, we put X at
position i, move B to the middle position between B and X (currently
held by D), and D goes to X's old position:

    index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
           --------------------------------------------
    entry   ... |  A  |  X  |  C  |  B  |  E  |  D  | ...
           --------------------------------------------

So let's test how it works out:
	# running the current git.git master (c1ebd90c832e), repeat 25 times:
	perf stat -r 25 -- ./git-rev-list --all --objects >/dev/null
	...
	5.294512141 seconds time elapsed    ( +-  7.88% )
	# Now running with this patch applied:
	5.111576725 seconds time elapsed    ( +-  8.17% )

This is an average 5 percent performance gain, though the measured
times themselves vary by about that amount (7.88 and 8.17 percent).

This approach is faster than before because it still doesn't touch too
many objects when reordering, yet it also doesn't put the whole penalty
on one outlier: the lookup penalty is split between two objects, which
probably have nothing to do with each other.

But is the mid object really the right choice? We could split the
lookup penalty not into two halves, but into a large and a small part.
So instead of the mid object, we are free to choose any other arbitrary
object between B and X. Let's try the nearest object (C) and the
farthest object (E), because there we may get better cache effects
(those slots are near the objects we're accessing anyway) and the
calculation is easier:

If we use C as the mid object, we end up with

    index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
           --------------------------------------------
    entry   ... |  A  |  X  |  B  |  D  |  E  |  C  | ...
           --------------------------------------------

or, if we take E as the mid object, we end up with

    index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
           --------------------------------------------
    entry   ... |  A  |  X  |  C  |  D  |  B  |  E  | ...
           --------------------------------------------
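
In terms of index arithmetic, the three candidates are roughly the
following (an illustrative helper sketch; it assumes i has already been
unwrapped so that first <= i, as the patch below does):

/* The three "middle" choices compared above. */
enum mid_strategy { MID_EXACT, MID_NEAREST, MID_FARTHEST };

static unsigned int pick_middle(enum mid_strategy s, unsigned int first,
				unsigned int i, unsigned int hash_size)
{
	unsigned int middle;

	switch (s) {
	case MID_EXACT:
		middle = (first + i) / 2;	/* exact midpoint: D */
		break;
	case MID_NEAREST:
		middle = first + 1;		/* right next to the first slot: C */
		break;
	default:
		middle = i - 1;			/* right before the found slot: E */
		break;
	}
	return middle % hash_size;		/* undo the unwrapping */
}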

	# Test with C as mid object (25 repetitions as well):
	# 5.35 seconds time elapsed  ( +-  7.09% )
	# Test with E as mid object (25 repetitions as well):
	# 5.42 seconds time elapsed  ( +-  6.50% )

So that was not successful, hence this patch keeps the initial idea of
taking the mid object. A second run comparing the original state and
this patch yielded a 5 percent performance gain again, so I am now quite
confident that this patch is good.

However, please do check whether this patch brings the promised
performance on your own setup, as you're likely using different hardware
and software. Feel free to share your performance numbers.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
---
 object.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index d8a4b1f..014290d 100644
--- a/object.c
+++ b/object.c
@@ -71,7 +71,7 @@ static unsigned int hashtable_index(const unsigned char *sha1)
 
 struct object *lookup_object(const unsigned char *sha1)
 {
-	unsigned int i, first;
+	unsigned int i, first, middle;
 	struct object *obj;
 
 	if (!obj_hash)
@@ -90,9 +90,27 @@ struct object *lookup_object(const unsigned char *sha1)
 		 * Move object to where we started to look for it so
 		 * that we do not need to walk the hash table the next
 		 * time we look for it.
+		 * However, we don't want to penalize the object being
+		 * moved from the first position, so divide the penalty to
+		 * two different objects.
 		 */
+
+		/*
+		 * First make sure i > first, so the middle is really
+		 * in between the i and first object
+		 */
+		if (i < first)
+			i += obj_hash_size;
+
+		middle = (i + first) / 2;
+		if (i >= obj_hash_size)
+			i -= obj_hash_size;
+		if (middle >= obj_hash_size)
+			middle -= obj_hash_size;
+
 		struct object *tmp = obj_hash[i];
-		obj_hash[i] = obj_hash[first];
+		obj_hash[i] = obj_hash[middle];
+		obj_hash[middle] = obj_hash[first];
 		obj_hash[first] = tmp;
 	}
 	return obj;
-- 
1.8.4.rc3.498.g5af1768

* Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
  2013-08-15 19:35 [PATCH] lookup_object: split up displacement penalty for hash collisions Stefan Beller
@ 2013-08-16  8:28 ` Thomas Rast
  2013-08-16  9:26   ` Thomas Rast
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Rast @ 2013-08-16  8:28 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Stefan Beller <stefanbeller@googlemail.com> writes:

> A little background on hash tables first:
> Suppose you want to look up the object X, which you'd expect at position
> i. Because that slot is already taken by B, X is not found at position
> i, so you keep looking to the right of position i until you find it.
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  B  |  C  |  D  |  E  |  X  | ...
>            --------------------------------------------
>
> Once you have found X at i+4, commit 9a414486d9f0 (2013-05-01,
> lookup_object: prioritize recently found objects) optimizes the next
> lookup by swapping the object B with X, so the placement looks like
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  X  |  C  |  D  |  E  |  B  | ...
>            --------------------------------------------
[...]
> If we now look up X, which is expected at position i, we put X at
> position i, move B to the middle position between B and X (currently
> held by D), and D goes to X's old position:
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  X  |  C  |  B  |  E  |  D  | ...
>            --------------------------------------------
>
> So let's test how it works out:
> 	# running the current git.git master (c1ebd90c832e), repeat 25 times:
> 	perf stat -r 25 -- ./git-rev-list --all --objects >/dev/null
> 	...
> 	5.294512141 seconds time elapsed    ( +-  7.88% )
> 	# Now running with this patch applied:
> 	5.111576725 seconds time elapsed    ( +-  8.17% )
[...]
> However, please do check whether this patch brings the promised
> performance on your own setup, as you're likely using different hardware
> and software. Feel free to share your performance numbers.

I get this on an i7-M620 laptop from t/perf/p0001-rev-list.sh:

  Test                               HEAD                next                    
  -------------------------------------------------------------------------------
  0001.1: rev-list --all             6.29(6.03+0.22)     6.33(6.06+0.24) +0.6%   
  0001.2: rev-list --all --objects   53.22(52.48+0.54)   54.90(54.15+0.55) +3.2%*
  -------------------------------------------------------------------------------
  Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

And this on a newer i7-3930K workstation:

  Test                               HEAD                next                    
  -------------------------------------------------------------------------------
  0001.1: rev-list --all             3.86(3.71+0.14)     3.89(3.74+0.14) +0.7%*  
  0001.2: rev-list --all --objects   30.23(29.91+0.29)   30.35(30.04+0.29) +0.4%.
  -------------------------------------------------------------------------------
  Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

That's using linux.git as a test repository; I figured the numbers were
too small with git.git.

I trust the laptop numbers less because it has far more thermal (and
thus throttling) issues, but the runs do show a significant difference,
though less than you claimed.

> @@ -90,9 +90,27 @@ struct object *lookup_object(const unsigned char *sha1)
>  		 * Move object to where we started to look for it so
>  		 * that we do not need to walk the hash table the next
>  		 * time we look for it.
> +		 * However, we don't want to penalize the object being
> +		 * moved from the first position, so divide the penalty to
> +		 * two different objects.
>  		 */
> +
> +		/*
> +		 * First make sure i > first, so the middle is really
> +		 * in between the i and first object
> +		 */
> +		if (i < first)
> +			i += obj_hash_size;
> +
> +		middle = (i + first) / 2;
> +		if (i >= obj_hash_size)
> +			i -= obj_hash_size;
> +		if (middle >= obj_hash_size)
> +			middle -= obj_hash_size;
> +
>  		struct object *tmp = obj_hash[i];

Adding all the code above means that this declaration is now in the
middle of things, which is not valid C90.  Please move the declaration
to the beginning of the block, and compile with
-Wdeclaration-after-statement to notice this issue.
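
(Roughly like this, i.e. hoist the declaration and keep the assignment
where it was; untested sketch of the same hunk:)

		struct object *tmp;

		/*
		 * First make sure i > first, so the middle is really
		 * in between the i and first object
		 */
		if (i < first)
			i += obj_hash_size;

		middle = (i + first) / 2;
		if (i >= obj_hash_size)
			i -= obj_hash_size;
		if (middle >= obj_hash_size)
			middle -= obj_hash_size;

		tmp = obj_hash[i];
		obj_hash[i] = obj_hash[middle];
		obj_hash[middle] = obj_hash[first];
		obj_hash[first] = tmp;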

> -		obj_hash[i] = obj_hash[first];
> +		obj_hash[i] = obj_hash[middle];
> +		obj_hash[middle] = obj_hash[first];
>  		obj_hash[first] = tmp;
>  	}
>  	return obj;

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
  2013-08-16  8:28 ` Thomas Rast
@ 2013-08-16  9:26   ` Thomas Rast
  2013-08-16  9:41     ` Stefan Beller
  2013-08-16  9:57     ` Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Rast @ 2013-08-16  9:26 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff

Thomas Rast <trast@inf.ethz.ch> writes:

> Stefan Beller <stefanbeller@googlemail.com> writes:
>
>> However, please do check whether this patch brings the promised
>> performance on your own setup, as you're likely using different hardware
>> and software. Feel free to share your performance numbers.
>
> I get this on an i7-M620 laptop from t/perf/p0001-rev-list.sh:
>
>   Test                               HEAD                next                    
>   -------------------------------------------------------------------------------
>   0001.1: rev-list --all             6.29(6.03+0.22)     6.33(6.06+0.24) +0.6%   
>   0001.2: rev-list --all --objects   53.22(52.48+0.54)   54.90(54.15+0.55) +3.2%*
>   -------------------------------------------------------------------------------
>   Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001
[...]
> I trust the laptop numbers less because it has far more thermal (and
> thus throttling) issues, but the runs do show a significant difference,
> though less than you claimed.

Well, as I feared... another run on the same laptop:

Test                               HEAD                next                                            
------------------------------------------------------------------------------
0001.1: rev-list --all             6.41(6.14+0.24)     6.36(6.10+0.23) -0.9%* 
0001.2: rev-list --all --objects   54.60(53.84+0.55)   54.23(53.50+0.53) -0.7%
------------------------------------------------------------------------------
Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
  2013-08-16  9:26   ` Thomas Rast
@ 2013-08-16  9:41     ` Stefan Beller
  2013-08-16  9:57     ` Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Beller @ 2013-08-16  9:41 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, peff

On 08/16/2013 11:26 AM, Thomas Rast wrote:
> Thomas Rast <trast@inf.ethz.ch> writes:
> 
>> Stefan Beller <stefanbeller@googlemail.com> writes:
>>
>>> However, please do check whether this patch brings the promised
>>> performance on your own setup, as you're likely using different hardware
>>> and software. Feel free to share your performance numbers.
>>
>> I get this on an i7-M620 laptop from t/perf/p0001-rev-list.sh:
>>
>>   Test                               HEAD                next                    
>>   -------------------------------------------------------------------------------
>>   0001.1: rev-list --all             6.29(6.03+0.22)     6.33(6.06+0.24) +0.6%   
>>   0001.2: rev-list --all --objects   53.22(52.48+0.54)   54.90(54.15+0.55) +3.2%*
>>   -------------------------------------------------------------------------------
>>   Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001
> [...]
>> I trust the laptop numbers less because it has far more thermal (and
>> thus throttling) issues, but the runs do show a significant difference,
>> though less than you claimed.
> 
> Well, as I feared... another run on the same laptop:
> 
> Test                               HEAD                next                                            
> ------------------------------------------------------------------------------
> 0001.1: rev-list --all             6.41(6.14+0.24)     6.36(6.10+0.23) -0.9%* 
> 0001.2: rev-list --all --objects   54.60(53.84+0.55)   54.23(53.50+0.53) -0.7%
> ------------------------------------------------------------------------------
> Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001
> 

I did some more tests as well, and I seem to have just been lucky with
the results I initially posted. Now I also got a negative impact on one
test, so the patch is not worth including.

Thanks,
Stefan


* Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
  2013-08-16  9:26   ` Thomas Rast
  2013-08-16  9:41     ` Stefan Beller
@ 2013-08-16  9:57     ` Jeff King
  2013-08-16 13:49       ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff King @ 2013-08-16  9:57 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Stefan Beller, git

On Fri, Aug 16, 2013 at 11:26:28AM +0200, Thomas Rast wrote:

> > I trust the laptop numbers less because it has far more thermal (and
> > thus throttling) issues, but the runs do show a significant difference,
> > though less than you claimed.
> 
> Well, as I feared... another run on the same laptop:
> 
> Test                               HEAD                next                                            
> ------------------------------------------------------------------------------
> 0001.1: rev-list --all             6.41(6.14+0.24)     6.36(6.10+0.23) -0.9%* 
> 0001.2: rev-list --all --objects   54.60(53.84+0.55)   54.23(53.50+0.53) -0.7%
> ------------------------------------------------------------------------------
> Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

Yeah, I get similar results on my i7-840QM. The new code is sometimes
faster and sometimes slower, never wins by more than 1%, and is well
within the run-to-run noise (which seems to vary by up to 5%).

I think the reason is that having a "2-deep" LRU cache helps less than
one might hope. There are a lot of objects in the hash table, but only a
small fraction are "hot" at any time; specifically, those objects that
are in the currently-examined tree. When we hit the "X" object from
Stefan's diagrams, we know that it is hot. But the "B" object may or may
not be hot. If it is, Stefan's optimization helps; if not, it does
nothing.

But statistically it probably isn't hot. There are ~3 million objects in
the kernel repo, but only ~40,000 tree entries. So we would naively
expect it to have an effect in only ~1% of cases (I am not sure if that
is accurate, though, as "hot" items within the same collision sequence
would tend to float to the front of the chain).  I suspect any savings
in those 1% are eaten up by the extra swapping, as well as the fact that
the "hot" B actually moves to the middle.

Another way to think about it is that if B is hot, it will then get
looked up again and get shifted to the front of the chain, pushing X
back. So this is really only a win when two hot entries, X and B,
collide and trade the front position back and forth.

In that case, it seems like we would want to move B to the second
position. This lets the 2-hot case just keep swapping the hot items back
and forth as quickly as possible. To the detriment of C, D, etc, which
never get promoted. But the likelihood of having _3_ hot items in a
collision chain is even less than 2.

That's all vague and hand-wavy intuition of course, and might be
completely wrong. But it's at least a working theory that explains the
experimental results.

-Peff

* Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
  2013-08-16  9:57     ` Jeff King
@ 2013-08-16 13:49       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2013-08-16 13:49 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Stefan Beller, git

On Fri, Aug 16, 2013 at 05:57:22AM -0400, Jeff King wrote:

> In that case, it seems like we would want to move B to the second
> position. This lets the 2-hot case just keep swapping the hot items back
> and forth as quickly as possible. To the detriment of C, D, etc, which
> never get promoted. But the likelihood of having _3_ hot items in a
> collision chain is even less than 2.

That was one of the cases Stefan considered in the original patch, but
didn't show code for. Here's my makeshift code to try it; one could also
parameterize it to shift down at most N items and then just bump the
(N+1)th item to the end (so the existing behavior is N=0, my patch is
N=1, etc).

However, I measured no speedup at all. Oh well. It was worth a shot.
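
For reference, a parameterized version of that idea might look roughly
like this (hypothetical and untested; PROMOTE_DEPTH stands in for N):

#define PROMOTE_DEPTH 1	/* N: 0 gives the existing behavior, 1 the patch below */

struct object;	/* opaque here */

/* Promote the entry found at "i" to "first", shifting at most PROMOTE_DEPTH
 * entries down by one slot and bumping the entry after them to slot "i". */
static void promote_n(struct object **obj_hash, unsigned int obj_hash_size,
		      unsigned int first, unsigned int i)
{
	struct object *tmp = obj_hash[i];
	unsigned int n, from = first;

	for (n = 0; n < PROMOTE_DEPTH; n++) {
		unsigned int next = (from + 1) % obj_hash_size;
		if (next == i)
			break;			/* chain shorter than N+1 */
		from = next;
	}
	obj_hash[i] = obj_hash[from];		/* bump the (N+1)th entry to the end */
	while (from != first) {
		unsigned int prev = (from + obj_hash_size - 1) % obj_hash_size;
		obj_hash[from] = obj_hash[prev];	/* shift the first N down by one */
		from = prev;
	}
	obj_hash[first] = tmp;			/* found object goes to the front */
}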

---
 object.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index d8a4b1f..f7ca969 100644
--- a/object.c
+++ b/object.c
@@ -71,7 +71,7 @@ struct object *lookup_object(const unsigned char *sha1)
 
 struct object *lookup_object(const unsigned char *sha1)
 {
-	unsigned int i, first;
+	unsigned int i, first, middle;
 	struct object *obj;
 
 	if (!obj_hash)
@@ -90,9 +90,22 @@ struct object *lookup_object(const unsigned char *sha1)
 		 * Move object to where we started to look for it so
 		 * that we do not need to walk the hash table the next
 		 * time we look for it.
+		 *
+		 * However, we don't want to penalize the object being
+		 * moved from the first position, so shift it down only one
+		 * slot, and bump the next object to the end. This is faster
+		 * than shifting the whole chain down, and covers the common
+		 * case of a few "hot" items near the front of the chain.
 		 */
+		int second = (first + 1) % obj_hash_size;
 		struct object *tmp = obj_hash[i];
-		obj_hash[i] = obj_hash[first];
+
+		if (i != second) {
+			obj_hash[i] = obj_hash[second];
+			obj_hash[second] = obj_hash[first];
+		} else
+			obj_hash[i] = obj_hash[first];
+
 		obj_hash[first] = tmp;
 	}
 	return obj;
-- 
1.8.4.rc2.28.g6bb5f3f
