From: Karsten Blees <karsten.blees@gmail.com>
To: Tay Ray Chuan <rctay89@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH v2 0/5] New hash table implementation
Date: Thu, 26 Sep 2013 16:38:12 +0200 [thread overview]
Message-ID: <524446D4.3010006@gmail.com> (raw)
In-Reply-To: <CALUzUxqX=zgkQg84jYQABKa=Lq=7BUee6824H+Xfye4XBnUZqA@mail.gmail.com>
Am 24.09.2013 13:16, schrieb Tay Ray Chuan:
> Hi Karsten,
>
> On Tue, Sep 24, 2013 at 5:50 PM, Karsten Blees <karsten.blees@gmail.com> wrote:
>>
>> | add | get 100% hits | get 10% hits
>> | hash | hashmap | hash | hashmap | hash | hashmap
>> --------+--------+---------+-------+---------+---------+--------
>> FNV | 14.815 | 2.345 | 3.059 | 1.642 | 4.085 | 0.976
>> FNV x2 | 14.409 | 2.706 | 2.888 | 1.959 | 3.905 | 1.393
>> i | 7.432 | 1.593 | 1.364 | 1.142 | 413.023 | 0.589
>> i x2 | 9.169 | 1.866 | 1.427 | 1.163 | 0.757 | 0.670
>> i/10 | 1.800 | 1.555 | 5.365 | 6.465 | 32.918 | 1.052
>> i/10 x2 | 1.892 | 1.555 | 5.386 | 6.474 | 1.123 | 1.206
>>
>> Tests can be reproduced with 'time echo "perfhash[map] <method> 1000" | ./test-hashmap', see test-hashmap.c for definition of method flags.
>
> I'm not sure if I'm reading the numbers right, but they look impressive!
>
The numbers are for 100 million additions / lookups (1,000 rounds á 100,000 entries). Considering everything else that happens in git, the hash table performance should be insignificant, though.
> If it's not too much trouble, could you put together an API document,
> along the lines of Documentation/technical/api-hash.txt?
Yes, I had already planned to port the documentation to asciidoc. Although in my experience, API documentation in the header file tends to better stay in sync with code changes (but this only makes real sense with extraction tools such as doxygen).
> I could give
> a stab at replacing patience and histogram diff's hash implementation
> with yours.
>
Open addressing (i.e. distributing conflicting entries to other buckes) *may* be faster *if* all data fits into the table (i.e. no pointers to the data are used). Scanning such a table (without following pointers) has very high locality and thus may benefit from accessing fewer CPU cache lines. The patience implementation seems to fall into this category (although the entry struct is fairly large, and it also uses the *2 trick to defeat bad hash codes (which wouldn't be necessary with chaining)).
Both patience and histogram use preallocated, fixed-size hash tables, and thus won't benefit from faster inserts (the 'add' performance numbers are for dynamically resized hash tables).
So, converting patience/histogram is probably not worth the trouble for performance reasons alone. If it also simplifies the algorithms and/or reduces memory usage - fine.
Ciao,
Karsten
next prev parent reply other threads:[~2013-09-26 14:38 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-10 23:27 [PATCH/RFC 0/5] New hash table implementation Karsten Blees
2013-09-10 23:28 ` [PATCH/RFC 1/5] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-09-11 23:56 ` Junio C Hamano
2013-09-23 9:16 ` Karsten Blees
2013-09-12 4:10 ` Junio C Hamano
2013-09-23 9:21 ` Karsten Blees
2013-09-10 23:28 ` [PATCH/RFC 2/5] buitin/describe.c: use new hash map implementation Karsten Blees
2013-09-10 23:29 ` [PATCH/RFC 3/5] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-09-10 23:30 ` [PATCH/RFC 4/5] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-09-10 23:30 ` [PATCH/RFC 5/5] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-09-24 9:50 ` [PATCH v2 0/5] New hash table implementation Karsten Blees
2013-09-24 9:51 ` [PATCH v2 1/5] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-09-24 9:52 ` [PATCH v2 2/5] buitin/describe.c: use new hash map implementation Karsten Blees
2013-09-24 9:52 ` [PATCH v2 3/5] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-09-24 9:53 ` [PATCH v2 4/5] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-09-24 9:54 ` [PATCH v2 5/5] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-09-24 10:18 ` [PATCH v2 0/5] New hash table implementation Fredrik Gustafsson
2013-09-24 11:16 ` Tay Ray Chuan
2013-09-26 14:38 ` Karsten Blees [this message]
2013-09-26 10:16 ` Fredrik Gustafsson
2013-09-26 10:26 ` Duy Nguyen
2013-09-26 11:08 ` Fredrik Gustafsson
2013-09-26 11:14 ` Duy Nguyen
2013-09-26 13:55 ` Karsten Blees
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=524446D4.3010006@gmail.com \
--to=karsten.blees@gmail.com \
--cc=git@vger.kernel.org \
--cc=rctay89@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).