git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Karsten Blees <karsten.blees@gmail.com>
To: "Vicent Martí" <tanoku@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Duy Nguyen <pclouds@gmail.com>, Karsten Blees <blees@dcon.de>,
	Git Mailing List <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
Date: Thu, 14 Nov 2013 20:38:01 +0100	[thread overview]
Message-ID: <52852699.3000408@gmail.com> (raw)
In-Reply-To: <CAH7EuMHgH6Oe_SvjyutBaakRfyZGHpp_iimaAzpV09AnHTYntw@mail.gmail.com>

Am 29.10.2013 10:09, schrieb Karsten Blees:
> On Mon, Oct 28, 2013 at 10:04 PM, Vicent Martí <tanoku@gmail.com> wrote:
>>
>> On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees <karsten.blees@gmail.com> wrote:
>>
>>> Regarding performance, khash uses open addressing, which requires more key comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). However, any measurable differences will most likely be dwarfed by IO costs in this particular use case.
>>
>> I don't think this is true. If you actually run a couple insertion and
>> lookup benchmarks comparing the two implementations, you'll find khash
>> to be around ~30% faster for most workloads (venturing a guess from
>> past experience). I am obviously not the author of khash, but I've
...

Just out of curiosity, I added performance test code for khash to the test in my current hashmap patch series [1]. It turns out that khash is by far the slowest of the bunch, especially with many collisions.

Again, I don't think that performance matters all that much (or in other words: _any_ hash table implementation will probably be fast enough compared to the rest that's going on). Its more a question of whether we really need two different hash table implementations (and a queasy feeling about the macro kludge in khash.h...).


Khash doesn't store the hash codes along with the entries (as both hash.[ch] and hashmap.[ch] do), so it needs to re-calculate hash codes on every resize. For a fair comparison, the "khash" test uses keys with pre-calculated hash codes in the key structure. This should be similar to a hash function that just copies 4 bytes from a sha1 key. Khash maps with more complex hash functions will be slower (see khstr).

The "khstr" test uses khash's predefined string map and khash's X31 hash function for strings (therefore no separate values for different hash functions here).

The table is similar to what I posted for hashmap-v2 [2] (i.e. real time in seconds for 1,000 rounds á 100,000 entries). I just turned it around a bit to make room for khash columns.

test | hash_fn | hashmap |  hash   |  khash  | khstr  |
-----+---------+---------+---------+---------+--------+
     | FNV     |   2.429 |  14.366 |  11.780 | 18.677 |
     | FNV  x2 |   2.946 |  14.558 |  10.922 |        |
 add | i       |   1.708 |   7.419 |   4.132 |        |
     | i    x2 |   1.791 |   8.565 |   4.502 |        |
     | i/10    |   1.555 |   1.805 | 344.691 |        |
     | i/10 x2 |   1.543 |   1.808 | 319.559 |        |
-----+---------+---------+---------+---------+--------+
     | FNV     |   1.822 |   3.452 |   4.922 |  8.309 |
get  | FNV  x2 |   2.298 |   3.194 |   4.473 |        |
100% | i       |   1.252 |   1.344 |   0.944 |        |
hits | i    x2 |   1.286 |   1.434 |   1.220 |        |
     | i/10    |   6.720 |   5.138 | 281.815 |        |
     | i/10 x2 |   6.297 |   5.188 | 257.021 |        |
-----+---------+---------+---------+---------+--------+
     | FNV     |   1.023 |   3.949 |   4.115 |  4.878 |
get  | FNV  x2 |   1.538 |   3.915 |   4.571 |        |
10%  | i       |   0.654 | 397.457 |  38.125 |        |
hits | i    x2 |   0.718 |   0.722 |   9.111 |        |
     | i/10    |   1.128 |  30.235 |  60.376 |        |
     | i/10 x2 |   1.260 |   1.082 |  43.354 |        |
-----+---------+---------+---------+---------+--------+

[1] https://github.com/kblees/git/commits/kb/hashmap-v5-khash
[2] http://article.gmane.org/gmane.comp.version-control.git/235290

      reply	other threads:[~2013-11-14 19:38 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-25 23:23 What's cooking in git.git (Oct 2013, #06; Fri, 25) Junio C Hamano
2013-10-26 10:30 ` Duy Nguyen
2013-10-28 15:48   ` Junio C Hamano
2013-10-28 16:16     ` Vicent Martí
2013-10-28 16:41       ` Junio C Hamano
2013-10-28 19:45       ` Karsten Blees
2013-10-28 21:04         ` Vicent Martí
2013-10-29  9:09           ` Karsten Blees
2013-11-14 19:38             ` Karsten Blees [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52852699.3000408@gmail.com \
    --to=karsten.blees@gmail.com \
    --cc=blees@dcon.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=tanoku@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).