All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Rast <trast@inf.ethz.ch>
To: Stefan Beller <stefanbeller@googlemail.com>
Cc: <git@vger.kernel.org>, <peff@peff.net>
Subject: Re: [PATCH] lookup_object: split up displacement penalty for hash collisions
Date: Fri, 16 Aug 2013 10:28:40 +0200	[thread overview]
Message-ID: <87haeqdop3.fsf@linux-k42r.v.cablecom.net> (raw)
In-Reply-To: <1376595306-6335-1-git-send-email-stefanbeller@googlemail.com> (Stefan Beller's message of "Thu, 15 Aug 2013 21:35:06 +0200")

Stefan Beller <stefanbeller@googlemail.com> writes:

> A little background on hash tables first:
> Consider you want to have the object X, which you'd expect at position
> i, but because that place was already taken by B, it is not found at
> position i, you start looking right of position i to find X until you
> find it.
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  B  |  C  |  D  |  E  |  X  | ...
>            --------------------------------------------
>
> Once you have found X at i+4, the commit 9a414486d9f0 (2013-05-01,
> lookup_object: prioritize recently found objects) did an optimization
> and swapped the object B with X, so the placement looks like
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  X  |  C  |  D  |  E  |  B  | ...
>            --------------------------------------------
[...]
> If we now want to find X and X is expected at i, we put X to
> the position i and B to the middle position between B and X at D
> and D will go to the old position of X:
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  X  |  C  |  B  |  E  |  D  | ...
>            --------------------------------------------
>
> So let's test how it works out:
> 	# running the current git.git master (c1ebd90c832e), repeat 25 times:
> 	perf stat -r 25 -- ./git-rev-list --all --objects >/dev/null
> 	...
> 	5.294512141 seconds time elapsed    ( +-  7.88% )
> 	# Now running with this patch applied:
> 	5.111576725 seconds time elapsed    ( +-  8.17% )
[...]
> However please do check if this patch brings the promised performance
> on your own, as you're likely using different hardware and another
> software setup. Feel free to share your performance differences.

I get this on an i7-M620 laptop from t/perf/p0001-rev-list.sh:

  Test                               HEAD                next                    
  -------------------------------------------------------------------------------
  0001.1: rev-list --all             6.29(6.03+0.22)     6.33(6.06+0.24) +0.6%   
  0001.2: rev-list --all --objects   53.22(52.48+0.54)   54.90(54.15+0.55) +3.2%*
  -------------------------------------------------------------------------------
  Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

And this on a newer i7-3930K workstation:

  Test                               HEAD                next                    
  -------------------------------------------------------------------------------
  0001.1: rev-list --all             3.86(3.71+0.14)     3.89(3.74+0.14) +0.7%*  
  0001.2: rev-list --all --objects   30.23(29.91+0.29)   30.35(30.04+0.29) +0.4%.
  -------------------------------------------------------------------------------
  Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

That's using linux.git as a test repository, I figured the numbers were
too small with git.git.

I trust the laptop numbers less because it has far more thermal (and
thus throttling) issues, but the runs do show a significant difference,
though less than you claimed.

> @@ -90,9 +90,27 @@ struct object *lookup_object(const unsigned char *sha1)
>  		 * Move object to where we started to look for it so
>  		 * that we do not need to walk the hash table the next
>  		 * time we look for it.
> +		 * However, we don't want to penalize the the object being
> +		 * moved from the first position, so divide the penalty to
> +		 * two different objects.
>  		 */
> +
> +		/*
> +		 * First make sure i > first, so the middle is really
> +		 * in between the i and first object
> +		 */
> +		if (i < first)
> +			i += obj_hash_size;
> +
> +		middle = (i + first) / 2;
> +		if (i >= obj_hash_size)
> +			i -= obj_hash_size;
> +		if (middle >= obj_hash_size)
> +			middle -= obj_hash_size;
> +
>  		struct object *tmp = obj_hash[i];

Adding all the code above means that this declaration is now in the
middle of things, which is not valid C90.  Please move the declaration
to the beginning of the block, and compile with
-Wdeclaration-after-statement to notice this issue.

> -		obj_hash[i] = obj_hash[first];
> +		obj_hash[i] = obj_hash[middle];
> +		obj_hash[middle] = obj_hash[first];
>  		obj_hash[first] = tmp;
>  	}
>  	return obj;

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

  reply	other threads:[~2013-08-16  8:28 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-15 19:35 [PATCH] lookup_object: split up displacement penalty for hash collisions Stefan Beller
2013-08-16  8:28 ` Thomas Rast [this message]
2013-08-16  9:26   ` Thomas Rast
2013-08-16  9:41     ` Stefan Beller
2013-08-16  9:57     ` Jeff King
2013-08-16 13:49       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87haeqdop3.fsf@linux-k42r.v.cablecom.net \
    --to=trast@inf.ethz.ch \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=stefanbeller@googlemail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.