git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Ericsson <ae@op5.se>
To: Marko Kreen <markokr@gmail.com>
Cc: Dmitry Potapov <dpotapov@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: I'm a total push-over..
Date: Wed, 23 Jan 2008 15:39:06 +0100	[thread overview]
Message-ID: <4797518A.3040704@op5.se> (raw)
In-Reply-To: <e51f66da0801230601n6edd2639lff70415afa9f9026@mail.gmail.com>

Marko Kreen wrote:
> On 1/23/08, Andreas Ericsson <ae@op5.se> wrote:
>> Dmitry Potapov wrote:
>>> On Wed, Jan 23, 2008 at 09:32:54AM +0100, Andreas Ericsson wrote:
>>>> The FNV hash would be better (pasted below), but I doubt
>>>> anyone will ever care, and there will be larger differences
>>>> between architectures with this one than the lt_git hash (well,
>>>> a function's gotta have a name).
>>> Actually, Bob Jenkins' lookup3 hash is twice faster in my tests
>>> than FNV, and also it is much less likely to have any collision.
>>>
>> >From http://burtleburtle.net/bob/hash/doobs.html
>> ---
>> FNV Hash
>>
>> I need to fill this in. Search the web for FNV hash. It's faster than my hash on Intel (because Intel has fast multiplication), but slower on most other platforms. Preliminary tests suggested it has decent distributions.
> 
> I suspect that this paragraph was about comparison with lookup2


It might be. It's from the link Dmitry posted in his reply to my original
message. (something/something/doobs.html).

> (not lookup3) because lookup3 beat easily all the "simple" hashes

By how much? FNV beat Linus' hash by 0.01 microseconds / insertion,
and 0.1 microsecons / lookup. We're talking about a case here where
there will never be more lookups than insertions (unless I'm much
mistaken).

> 
> If you don't mind few percent speed penalty compared to Jenkings
> own optimized version, you can use my simplified version:
> 
>   http://repo.or.cz/w/pgbouncer.git?a=blob;f=src/hash.c;h=5c9a73639ad098c296c0be562c34573189f3e083;hb=HEAD
> 

I don't, but I don't care that deeply either. On the one hand,
it would be nifty to have an excellent hash-function in git.
On the other hand, it would look stupid with something that's
quite clearly over-kill.

> It works always with "native" endianess, unlike Jenkins fixed-endian
> hashlittle() / hashbig().  It may or may not matter if you plan
> to write values on disk.
> 
> Speed-wise it may be 10-30% slower worst case (in my case sparc-classic
> with unaligned data), but on x86, lucky gcc version and maybe
> also memcpy() hack seen in system.h, it tends to be ~10% faster,
> especially as it does always 4byte read in main loop.
> 

It would have to be a significant improvement in wall-clock time
on a test-case of hashing 30k strings to warrant going from 6 to 80
lines of code, imo. I still believe the original dumb hash Linus
wrote is "good enough".

On a side-note, it was very interesting reading, and I shall have
to add jenkins3_mkreen() to my test-suite (although the "keep
copyright note" license thing bugs me a bit).

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

  reply	other threads:[~2008-01-24  6:18 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-22 23:37 I'm a total push-over Linus Torvalds
2008-01-23  1:35 ` Kevin Ballard
2008-01-23  2:23 ` Junio C Hamano
2008-01-23  2:36   ` Junio C Hamano
2008-01-23 12:24     ` Johannes Schindelin
2008-01-23 12:28       ` David Kastrup
2008-01-23 12:56         ` Theodore Tso
2008-01-23  2:58   ` Linus Torvalds
2008-01-23  3:19     ` Linus Torvalds
2008-01-25  6:50       ` Junio C Hamano
2008-01-25 16:24         ` Linus Torvalds
2008-01-23  7:23     ` Junio C Hamano
2008-01-23 12:25       ` Johannes Schindelin
2008-01-23 16:25         ` Linus Torvalds
2008-01-23 16:34           ` Johannes Schindelin
2008-01-23 17:09             ` Linus Torvalds
2008-01-23 17:29               ` Linus Torvalds
2008-01-25  5:21                 ` Jeremy Maitin-Shepard
2008-01-25 12:51                   ` Johannes Schindelin
2008-01-25 18:19                     ` Jeremy Maitin-Shepard
2008-01-25 18:24                       ` Johannes Schindelin
2008-01-25 19:07                       ` Junio C Hamano
2008-01-23  8:32 ` Andreas Ericsson
2008-01-23  9:15   ` Dmitry Potapov
2008-01-23  9:31     ` Andreas Ericsson
2008-01-23 14:01       ` Marko Kreen
2008-01-23 14:39         ` Andreas Ericsson [this message]
2008-01-24  6:51           ` Luke Lu
2008-01-24 10:24             ` Andreas Ericsson
2008-01-24 13:19           ` Marko Kreen
2008-01-24 16:00             ` Andreas Ericsson
2008-01-24 16:13               ` Marko Kreen
2008-01-24 16:28               ` Dmitry Potapov
2008-01-24 17:15                 ` Linus Torvalds
2008-01-24 18:45                   ` Dmitry Potapov
2008-01-24 19:08                     ` Linus Torvalds
2008-01-25 20:52                   ` Marko Kreen
2008-01-25 22:16                     ` Linus Torvalds
2008-01-25 22:35                       ` Linus Torvalds
2008-01-26 12:16                       ` Marko Kreen
2008-01-27  6:51                         ` Linus Torvalds
2008-01-27  8:21                           ` Dmitry Potapov
2008-01-27 14:07                             ` Johannes Schindelin
2008-01-27 14:48                               ` Dmitry Potapov
2008-01-27  9:45                           ` Marko Kreen
2008-01-27 15:06                             ` Dmitry Potapov
2008-01-26 12:37                       ` Marko Kreen
2008-01-25 20:08               ` Marko Kreen
2008-01-23 17:10       ` Dmitry Potapov
2008-01-24 10:39         ` Andreas Ericsson
2008-01-23 16:06   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4797518A.3040704@op5.se \
    --to=ae@op5.se \
    --cc=dpotapov@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=markokr@gmail.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).