Re: XDL_FAST_HASH can be very slow

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Thomas Rast <tr@thomasrast.ch>
To: Patrick Reynolds <piki@github.com>
Cc: "git\@vger.kernel.org" <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: XDL_FAST_HASH can be very slow
Date: Mon, 22 Dec 2014 11:48:31 +0100	[thread overview]
Message-ID: <87r3vsrmdc.fsf@thomasrast.ch> (raw)
In-Reply-To: <CAJrMUs_fM8+=2j1e5hYiaRjQq1QF87X6qOLN847q-B7Nu-wniw@mail.gmail.com> (Patrick Reynolds's message of "Mon, 22 Dec 2014 03:08:12 -0600")

Patrick Reynolds <piki@github.com> writes:

> The original xdl_hash_record is essentially DJB hash, which does a
> multiplication, load, and xor for each byte of the input.  Commit
> 6942efc introduces an "XDL_FAST_HASH" version of the same function
> that is clearly inspired by the DJB hash, but it does only one
> multiplication, load, and xor for each 8 bytes of input -- i.e., fewer
> loads, but also a lot less bit mixing.  Less mixing means far more
> collisions, leading to the performance problems with evil-icons.  It's
> not clear to me if the XDL_FAST_HASH version intended to match the
> output of the DJB hash function, but it doesn't at all.

Note that XDL_FAST_HASH is just a ripoff of the hashing scheme that
Linus socially engineered on G+ around that time.  I didn't do any of
the hash genealogy that you did here, and it now shows.  The orginal
patches are linked from 6942efc (xdiff: load full words in the inner loop of
xdl_hash_record, 2012-04-06):

  https://lkml.org/lkml/2012/3/2/452
  https://lkml.org/lkml/2012/3/5/6

The code still exists:

  https://github.com/torvalds/linux/blob/master/fs/namei.c#L1678

> I have implemented two simpler possibilities, both of which fix the
> problems diffing the evil-icons repository:

I think it would be best to separate three goals here:

1. hash function throughput
2. quality of the hash values
3. avoiding collision attacks

XDL_FAST_HASH was strictly an attempt to improve throughput, and fairly
successful at that (6942efc (xdiff: load full words in the inner loop of
xdl_hash_record, 2012-04-06) quotes an 8% improvement on 'git log -p').

You are now addressing quality.

I have no idea how you ran into this, but if you are reworking things
already, I think it would be good to also randomize whatever hash you
put in so as to give some measure of protection against collision
attacks.

> 1. An XDL_FAST_HASH implementation that matches the output of the DJB
> hash exactly.  Its performance is basically the same as DJB, because
> the only thing is does differently is load 8 bytes at a time instead
> of 1.  It does all the same ALU operations as DJB.

I don't think there's a point in having such a function, since it would
mean a lot of code for no throughput gain.  Let's just remove
XDL_FAST_HASH and the original hashing scheme in favor of a better hash
function.

-- 
Thomas Rast
tr@thomasrast.ch

next prev parent reply	other threads:[~2014-12-22 10:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-22  4:19 XDL_FAST_HASH can be very slow Jeff King
2014-12-22  9:08 ` Patrick Reynolds
2014-12-22 10:48   ` Thomas Rast [this message]
2014-12-23  2:51     ` demerphq

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r3vsrmdc.fsf@thomasrast.ch \
    --to=tr@thomasrast.ch \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=piki@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).