From: Andreas Ericsson <ae@op5.se>
To: Luke Lu <git@vicaya.com>
Cc: Marko Kreen <markokr@gmail.com>,
Dmitry Potapov <dpotapov@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: I'm a total push-over..
Date: Thu, 24 Jan 2008 11:24:53 +0100
Message-ID: <47986775.7010603@op5.se>
In-Reply-To: <F23CA352-416C-49EC-8132-688784CF3C18@vicaya.com>
Luke Lu wrote:
> On Jan 23, 2008, at 6:39 AM, Andreas Ericsson wrote:
>> Marko Kreen wrote:
>>> On 1/23/08, Andreas Ericsson <ae@op5.se> wrote:
>>>> Dmitry Potapov wrote:
>>>>> On Wed, Jan 23, 2008 at 09:32:54AM +0100, Andreas Ericsson wrote:
>>>>>> The FNV hash would be better (pasted below), but I doubt
>>>>>> anyone will ever care, and there will be larger differences
>>>>>> between architectures with this one than the lt_git hash (well,
>>>>>> a function's gotta have a name).
>>>>> Actually, Bob Jenkins' lookup3 hash is twice as fast as FNV in my
>>>>> tests, and it is also much less likely to have collisions.
>>>>>
>>>> From http://burtleburtle.net/bob/hash/doobs.html
>>>> ---
>>>> FNV Hash
>>>>
>>>> I need to fill this in. Search the web for FNV hash. It's faster
>>>> than my hash on Intel (because Intel has fast multiplication), but
>>>> slower on most other platforms. Preliminary tests suggested it has
>>>> decent distributions.
>>> I suspect that this paragraph was about comparison with lookup2
>>
>>
>> It might be. It's from the link Dmitry posted in his reply to my original
>> message. (something/something/doobs.html).
>>
>>> (not lookup3) because lookup3 beat easily all the "simple" hashes
>>
>> By how much? FNV beat Linus' hash by 0.01 microseconds / insertion,
>> and 0.1 microseconds / lookup. We're talking about a case here where
>> there will never be more lookups than insertions (unless I'm much
>> mistaken).
>>
>>> If you don't mind a few percent speed penalty compared to Jenkins'
>>> own optimized version, you can use my simplified version:
>>>
>>> http://repo.or.cz/w/pgbouncer.git?a=blob;f=src/hash.c;h=5c9a73639ad098c296c0be562c34573189f3e083;hb=HEAD
>>>
>>
>> I don't, but I don't care that deeply either. On the one hand,
>> it would be nifty to have an excellent hash-function in git.
>> On the other hand, it would look stupid with something that's
>> quite clearly over-kill.
>>
>>> It always works with "native" endianness, unlike Jenkins' fixed-endian
>>> hashlittle() / hashbig(). It may or may not matter if you plan
>>> to write values on disk.
>>> Speed-wise it may be 10-30% slower in the worst case (in my case
>>> sparc-classic with unaligned data), but on x86, with a lucky gcc version
>>> and maybe also the memcpy() hack seen in system.h, it tends to be ~10%
>>> faster, especially as it always does 4-byte reads in the main loop.
>>
>> It would have to be a significant improvement in wall-clock time
>> on a test-case of hashing 30k strings to warrant going from 6 to 80
>> lines of code, imo. I still believe the original dumb hash Linus
>> wrote is "good enough".
>>
>> On a side-note, it was very interesting reading, and I shall have
>> to add jenkins3_mkreen() to my test-suite (although the "keep
>> copyright note" license thing bugs me a bit).
>
> Would you, for completeness' sake, please add Tcl and STL hashes to your
> test suite?
I could do that. Or I could just publish the entire ugly thing and let
someone else add them ;-)
> The numbers are quite interesting. Is your test suite
> available somewhere, so we can test with our own data and hardware as
> well?
Not yet, no. I usually munge it up quite a lot when I want to test hashes
for a specific input, so it's not what anyone would call "pretty".
> Both the Tcl and STL string hashes (the STL one from the SGI, probably
> HP, days, and still the current default with g++) are extremely simple (excluding
> the loop constructs):
>
> Tcl: h += (h<<3) + c; // essentially *9+c (but works better on non-late Intels)
> STL: h = h * 5 + c; // worse than above for most of my data
>
They sure do look simple enough. As for loop constructs, I've tried to
use the same looping mechanics for everything, so as to let the algorithm
be the only difference. Otherwise it gets tricky to do comparisons. The
exceptions are of course hashes relying on Duff's device or similar
alignment trickery.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231