git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Stefan Beller <sbeller@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] object: measure time needed for resolving hash collisions
Date: Wed, 14 Sep 2016 23:47:02 -0700	[thread overview]
Message-ID: <20160915064701.c4ishixuynbzpgwx@sigill.intra.peff.net> (raw)
In-Reply-To: <20160915020141.32000-1-sbeller@google.com>

On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:

>  According to Jeff, sending patches that don't get accepted is the new hype!

It is what all the cool kids are doing. Unfortunately, it does not save
you from nitpicky reviews...;)

>  	first = i = hash_obj(sha1, obj_hash_size);
> +	clock_gettime(CLOCK_MONOTONIC, &time1);
>  	while ((obj = obj_hash[i]) != NULL) {
>  		if (!hashcmp(sha1, obj->oid.hash))
>  			break;
> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>  		if (i == obj_hash_size)
>  			i = 0;
>  	}
> +	clock_gettime(CLOCK_MONOTONIC, &time2);
> +	diff(&time1, &time2, &t_diff);
> +	add_time_to(&caching, &t_diff);
>  	if (obj && i != first) {

I don't think this is actually measuring the time spent on collisions.
It's measuring the time we spend in hashcmp(), but that includes the
non-collision case where we find it in the first hashcmp.

Measuring _just_ the collisions is more like the patch below. In my
measurements it's more like 30ms, compared to 10s for all of the
hashcmps.

So we really aren't dealing with collisions, but rather just verifying
that our hash landed at the right spot. And _any_ data structure is
going to have to do that. If you want to make it faster, I'd try
optimizing hashcmp (and you can see why the critbit tree was slower; if
we spend so much time just on hashcmp() to make sure we're at the right
key, then making that slower with a bunch of branching is not going to
help).

I notice we still open-code hashcmp. I get a slight speedup by switching
it to memcmp(). About 2.5%, which is similar to what I showed in

  http://public-inbox.org/git/20130318073229.GA5551@sigill.intra.peff.net/

a few years ago (though it's more pronounced as a portion of the whole
now, because we've optimized some of the other bits).

The main driver there was memcmp() improvements that went into glibc
2.13 several years ago. It might be time to start assuming that memcmp()
beats our open-coded loop.

It may also be possible to really micro-optimize it on some platforms,
because we know the size in advance (I'd kind of expect the compiler to
do that, but if we're ending up in glibc memcmp then it sounds like it
is not the case).

-Peff

  reply	other threads:[~2016-09-15  6:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-15  2:01 [PATCH] object: measure time needed for resolving hash collisions Stefan Beller
2016-09-15  6:47 ` Jeff King [this message]
2016-09-15  7:01   ` Jeff King
2016-09-15 16:26   ` Stefan Beller
2016-09-15 18:49     ` Jeff King
2016-09-15 17:45   ` Junio C Hamano
2016-09-15 18:47     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160915064701.c4ishixuynbzpgwx@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).