git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Marko Kreen" <markokr@gmail.com>
To: "Andreas Ericsson" <ae@op5.se>
Cc: "Dmitry Potapov" <dpotapov@gmail.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: I'm a total push-over..
Date: Thu, 24 Jan 2008 18:13:02 +0200	[thread overview]
Message-ID: <e51f66da0801240813v61d3bc74x2863eb37678439db@mail.gmail.com> (raw)
In-Reply-To: <4798B633.8040606@op5.se>

On 1/24/08, Andreas Ericsson <ae@op5.se> wrote:
> Marko Kreen wrote:
> > On 1/23/08, Andreas Ericsson <ae@op5.se> wrote:
> >> Marko Kreen wrote:
> >>> (not lookup3) because lookup3 beat easily all the "simple" hashes
> >> By how much? FNV beat Linus' hash by 0.01 microseconds / insertion,
> >> and 0.1 microsecons / lookup. We're talking about a case here where
> >> there will never be more lookups than insertions (unless I'm much
> >> mistaken).
> >
> > FNV is around 40% slower than lookup3 on my Intel Core CPU, on 4byte aligned
> > input. See below for more detailed info.
> >
>
> But the tests surely need to check for unaligned cases, as that's what
> we're likely to hash, no?



> >> It would have to be a significant improvement in wall-clock time
> >> on a test-case of hashing 30k strings to warrant going from 6 to 80
> >> lines of code, imo. I still believe the original dumb hash Linus
> >> wrote is "good enough".
> >
> > Well, ad-hoc dumb hashes may have horrible worst-cases that you cant
> > see with light testing.  Therefore I'd still suggest some better
> > researched dumb hash (eg. FNV or OAT).
> >
>
> True. FNV is used in both MySQL and PostgreSQL. I'd say it's safe to
> assume it's fairly well tested.

PostgreSQL uses lookup2...

> > Here is my raw-speed test of different hashes.  Input is 4-byte
> > aligned which should be common case for malloc()-ed strings.
>
> Unless arena allocated, like we do in git.
>
> I'm not surprised that this test favours Jenkin's and Hsie's.
> That's to be expected as those benefit far more than simpler
> hashing algorithms for long strings. The overhead when trying
> shorter strings (say, between 3 and 15 chars, and not necessarily
> 4-byte aligned) sometimes make them quite a lot slower though.

Ok, here is 0..15 chars, random alignment:

Testing: seed=34 align=0 minlen=0 maxlen=15 trycnt=2 duration=10

lookup3 : try=0: ...  69.8092 MB/s
lookup3 : try=1: ...  69.8146 MB/s
own_memcpy: try=0: ...  66.7808 MB/s
own_memcpy: try=1: ...  66.7814 MB/s
memcpy_hack: try=0: ...  74.0635 MB/s
memcpy_hack: try=1: ...  74.0518 MB/s
lookup2 : try=0: ...  68.6582 MB/s
lookup2 : try=1: ...  68.6634 MB/s
fnv     : try=0: ...  74.5098 MB/s
fnv     : try=1: ...  74.5283 MB/s
hsieh   : try=0: ...  71.6708 MB/s
hsieh   : try=1: ...  71.6814 MB/s
oat     : try=0: ...  74.7828 MB/s
oat     : try=1: ...  74.7716 MB/s
elf     : try=0: ...  65.2077 MB/s
elf     : try=1: ...  65.2128 MB/s

Results compared to reference:

lookup3         : 100.000 %
own_memcpy      :  95.659 %
memcpy_hack     : 106.082 %
lookup2         :  98.351 %
fnv             : 106.743 %
hsieh           : 102.670 %
oat             : 107.112 %
elf             :  93.409 %

> > This also is best case for original lookup3(), on unaligned
> > input the memcpy variants beat it easily.  Input string
> > length varies randomly in range 0..64.
> >
>
> Well, memcpy() isn't very interesting to compare against
> hashes, as they test vastly different parts of the hardware's
> parts' performance. memcpy() should also perform exactly the
> same no matter what the test-data, which isn't always true for
> hashes.

Sorry, I meant my "simple-memcpy-based-lookup3".

> What *would* be interesting would be something along the lines
> of "duff_cpy()": ie, an unrolled loop that aligns itself and
> copies each byte to the same address each time.

How the hash fetched data from mempry is _very_ relevant.

> Interesting output, but not very surprising. Do you have the code
> available somewhere?

I can put it out.

-- 
marko

  reply	other threads:[~2008-01-24 16:13 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-22 23:37 I'm a total push-over Linus Torvalds
2008-01-23  1:35 ` Kevin Ballard
2008-01-23  2:23 ` Junio C Hamano
2008-01-23  2:36   ` Junio C Hamano
2008-01-23 12:24     ` Johannes Schindelin
2008-01-23 12:28       ` David Kastrup
2008-01-23 12:56         ` Theodore Tso
2008-01-23  2:58   ` Linus Torvalds
2008-01-23  3:19     ` Linus Torvalds
2008-01-25  6:50       ` Junio C Hamano
2008-01-25 16:24         ` Linus Torvalds
2008-01-23  7:23     ` Junio C Hamano
2008-01-23 12:25       ` Johannes Schindelin
2008-01-23 16:25         ` Linus Torvalds
2008-01-23 16:34           ` Johannes Schindelin
2008-01-23 17:09             ` Linus Torvalds
2008-01-23 17:29               ` Linus Torvalds
2008-01-25  5:21                 ` Jeremy Maitin-Shepard
2008-01-25 12:51                   ` Johannes Schindelin
2008-01-25 18:19                     ` Jeremy Maitin-Shepard
2008-01-25 18:24                       ` Johannes Schindelin
2008-01-25 19:07                       ` Junio C Hamano
2008-01-23  8:32 ` Andreas Ericsson
2008-01-23  9:15   ` Dmitry Potapov
2008-01-23  9:31     ` Andreas Ericsson
2008-01-23 14:01       ` Marko Kreen
2008-01-23 14:39         ` Andreas Ericsson
2008-01-24  6:51           ` Luke Lu
2008-01-24 10:24             ` Andreas Ericsson
2008-01-24 13:19           ` Marko Kreen
2008-01-24 16:00             ` Andreas Ericsson
2008-01-24 16:13               ` Marko Kreen [this message]
2008-01-24 16:28               ` Dmitry Potapov
2008-01-24 17:15                 ` Linus Torvalds
2008-01-24 18:45                   ` Dmitry Potapov
2008-01-24 19:08                     ` Linus Torvalds
2008-01-25 20:52                   ` Marko Kreen
2008-01-25 22:16                     ` Linus Torvalds
2008-01-25 22:35                       ` Linus Torvalds
2008-01-26 12:16                       ` Marko Kreen
2008-01-27  6:51                         ` Linus Torvalds
2008-01-27  8:21                           ` Dmitry Potapov
2008-01-27 14:07                             ` Johannes Schindelin
2008-01-27 14:48                               ` Dmitry Potapov
2008-01-27  9:45                           ` Marko Kreen
2008-01-27 15:06                             ` Dmitry Potapov
2008-01-26 12:37                       ` Marko Kreen
2008-01-25 20:08               ` Marko Kreen
2008-01-23 17:10       ` Dmitry Potapov
2008-01-24 10:39         ` Andreas Ericsson
2008-01-23 16:06   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e51f66da0801240813v61d3bc74x2863eb37678439db@mail.gmail.com \
    --to=markokr@gmail.com \
    --cc=ae@op5.se \
    --cc=dpotapov@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).