git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Fredrik Gustafsson <iveqy@iveqy.com>
To: Karsten Blees <karsten.blees@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH v2 0/5] New hash table implementation
Date: Tue, 24 Sep 2013 12:18:32 +0200	[thread overview]
Message-ID: <20130924101832.GC25070@paksenarrion.iveqy.com> (raw)
In-Reply-To: <52416058.90008@gmail.com>

On Tue, Sep 24, 2013 at 11:50:16AM +0200, Karsten Blees wrote:
> Regarding performance, I have to admit that the difference between the two implementations is far greater than I had anticipated. The following times (in seconds) are from Linux x64 (Debian Sarge) on a Core i7 860 @2.8GHz. All tests have been run with 1,000 rounds of 100,000 entries each.
> 
> The 'get 10% hits' test does 100,000 lookups on a table with 10,000 entries (i.e. 90% unsuccessful lookups).
> 
> The rows denote different hash functions with different qualities:
> - FNV: FNV-1 hash on stringified loop counter (i.e. fnv1(itoa(i))), as
>   an example of a high quality / low collision hash
> - i: just the loop counter (i.e. 0, 1, 2,...), guaranteed collision free
> - i/10: every 10 entries share the same hash code, lots of collisions
> 
> The i and i/10 tests show that open addressing suffers badly from clustering, i.e. with adjacent hash codes, it degrades to linear search. The *2 versions provide for some space between used buckets to better compare it to the chaining version.
> 
> 
>         |       add        |  get 100% hits  |    get 10% hits
>         |  hash  | hashmap | hash  | hashmap |  hash   | hashmap
> --------+--------+---------+-------+---------+---------+--------
> FNV     | 14.815 |   2.345 | 3.059 |   1.642 |   4.085 |   0.976
> FNV  x2 | 14.409 |   2.706 | 2.888 |   1.959 |   3.905 |   1.393
> i       |  7.432 |   1.593 | 1.364 |   1.142 | 413.023 |   0.589
> i    x2 |  9.169 |   1.866 | 1.427 |   1.163 |   0.757 |   0.670
> i/10    |  1.800 |   1.555 | 5.365 |   6.465 |  32.918 |   1.052
> i/10 x2 |  1.892 |   1.555 | 5.386 |   6.474 |   1.123 |   1.206
> 
> Tests can be reproduced with 'time echo "perfhash[map] <method> 1000" | ./test-hashmap', see test-hashmap.c for definition of method flags.
> 

So I did this improved hash implementation a few months back. Although I
could do a test like this and see an improvement, I failed to see an
improvement in actual git usage.

Hopefully it was just me doing something wrong, but I abandonned the
idea of a better hashmap since I couldn't see any major performance
boost using git and the current implementation is really simple and easy
to maintain.

So my question to you is, does your hashmap speed up git? And does it
speed it up enough to justify that your implementation is the double
amount of code than the current?
-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iveqy@iveqy.com

  parent reply	other threads:[~2013-09-24 10:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-10 23:27 [PATCH/RFC 0/5] New hash table implementation Karsten Blees
2013-09-10 23:28 ` [PATCH/RFC 1/5] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-09-11 23:56   ` Junio C Hamano
2013-09-23  9:16     ` Karsten Blees
2013-09-12  4:10   ` Junio C Hamano
2013-09-23  9:21     ` Karsten Blees
2013-09-10 23:28 ` [PATCH/RFC 2/5] buitin/describe.c: use new hash map implementation Karsten Blees
2013-09-10 23:29 ` [PATCH/RFC 3/5] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-09-10 23:30 ` [PATCH/RFC 4/5] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-09-10 23:30 ` [PATCH/RFC 5/5] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-09-24  9:50 ` [PATCH v2 0/5] New hash table implementation Karsten Blees
2013-09-24  9:51   ` [PATCH v2 1/5] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-09-24  9:52   ` [PATCH v2 2/5] buitin/describe.c: use new hash map implementation Karsten Blees
2013-09-24  9:52   ` [PATCH v2 3/5] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-09-24  9:53   ` [PATCH v2 4/5] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-09-24  9:54   ` [PATCH v2 5/5] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-09-24 10:18   ` Fredrik Gustafsson [this message]
2013-09-24 11:16   ` [PATCH v2 0/5] New hash table implementation Tay Ray Chuan
2013-09-26 14:38     ` Karsten Blees
2013-09-26 10:16   ` Fredrik Gustafsson
2013-09-26 10:26     ` Duy Nguyen
2013-09-26 11:08       ` Fredrik Gustafsson
2013-09-26 11:14         ` Duy Nguyen
2013-09-26 13:55     ` Karsten Blees

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130924101832.GC25070@paksenarrion.iveqy.com \
    --to=iveqy@iveqy.com \
    --cc=git@vger.kernel.org \
    --cc=karsten.blees@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).