All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Rast <tr@thomasrast.ch>
To: Patrick Reynolds <piki@github.com>
Cc: "git\@vger.kernel.org" <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: XDL_FAST_HASH can be very slow
Date: Mon, 22 Dec 2014 11:48:31 +0100	[thread overview]
Message-ID: <87r3vsrmdc.fsf@thomasrast.ch> (raw)
In-Reply-To: <CAJrMUs_fM8+=2j1e5hYiaRjQq1QF87X6qOLN847q-B7Nu-wniw@mail.gmail.com> (Patrick Reynolds's message of "Mon, 22 Dec 2014 03:08:12 -0600")

Patrick Reynolds <piki@github.com> writes:

> The original xdl_hash_record is essentially DJB hash, which does a
> multiplication, load, and xor for each byte of the input.  Commit
> 6942efc introduces an "XDL_FAST_HASH" version of the same function
> that is clearly inspired by the DJB hash, but it does only one
> multiplication, load, and xor for each 8 bytes of input -- i.e., fewer
> loads, but also a lot less bit mixing.  Less mixing means far more
> collisions, leading to the performance problems with evil-icons.  It's
> not clear to me if the XDL_FAST_HASH version intended to match the
> output of the DJB hash function, but it doesn't at all.

Note that XDL_FAST_HASH is just a ripoff of the hashing scheme that
Linus socially engineered on G+ around that time.  I didn't do any of
the hash genealogy that you did here, and it now shows.  The orginal
patches are linked from 6942efc (xdiff: load full words in the inner loop of
xdl_hash_record, 2012-04-06):

  https://lkml.org/lkml/2012/3/2/452
  https://lkml.org/lkml/2012/3/5/6

The code still exists:

  https://github.com/torvalds/linux/blob/master/fs/namei.c#L1678

> I have implemented two simpler possibilities, both of which fix the
> problems diffing the evil-icons repository:

I think it would be best to separate three goals here:

1. hash function throughput
2. quality of the hash values
3. avoiding collision attacks

XDL_FAST_HASH was strictly an attempt to improve throughput, and fairly
successful at that (6942efc (xdiff: load full words in the inner loop of
xdl_hash_record, 2012-04-06) quotes an 8% improvement on 'git log -p').

You are now addressing quality.

I have no idea how you ran into this, but if you are reworking things
already, I think it would be good to also randomize whatever hash you
put in so as to give some measure of protection against collision
attacks.

> 1. An XDL_FAST_HASH implementation that matches the output of the DJB
> hash exactly.  Its performance is basically the same as DJB, because
> the only thing is does differently is load 8 bytes at a time instead
> of 1.  It does all the same ALU operations as DJB.

I don't think there's a point in having such a function, since it would
mean a lot of code for no throughput gain.  Let's just remove
XDL_FAST_HASH and the original hashing scheme in favor of a better hash
function.

-- 
Thomas Rast
tr@thomasrast.ch

  reply	other threads:[~2014-12-22 10:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-22  4:19 XDL_FAST_HASH can be very slow Jeff King
2014-12-22  9:08 ` Patrick Reynolds
2014-12-22 10:48   ` Thomas Rast [this message]
2014-12-23  2:51     ` demerphq

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r3vsrmdc.fsf@thomasrast.ch \
    --to=tr@thomasrast.ch \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=piki@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.