From: Thomas Rast <tr@thomasrast.ch>
To: Patrick Reynolds <piki@github.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: XDL_FAST_HASH can be very slow
Date: Mon, 22 Dec 2014 11:48:31 +0100
Message-ID: <87r3vsrmdc.fsf@thomasrast.ch>
In-Reply-To: <CAJrMUs_fM8+=2j1e5hYiaRjQq1QF87X6qOLN847q-B7Nu-wniw@mail.gmail.com> (Patrick Reynolds's message of "Mon, 22 Dec 2014 03:08:12 -0600")
Patrick Reynolds <piki@github.com> writes:
> The original xdl_hash_record is essentially DJB hash, which does a
> multiplication, load, and xor for each byte of the input. Commit
> 6942efc introduces an "XDL_FAST_HASH" version of the same function
> that is clearly inspired by the DJB hash, but it does only one
> multiplication, load, and xor for each 8 bytes of input -- i.e., fewer
> loads, but also a lot less bit mixing. Less mixing means far more
> collisions, leading to the performance problems with evil-icons. It's
> not clear to me if the XDL_FAST_HASH version intended to match the
> output of the DJB hash function, but it doesn't at all.
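For reference, the contrast Patrick describes can be sketched roughly like this (hypothetical C, not the actual xdiff code; the function names are mine):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time DJB-style hash: one multiply and xor per input
 * byte, in the spirit of the original xdl_hash_record.
 * Illustrative sketch, not the exact xdiff code. */
static uint64_t djb_hash(const unsigned char *p, size_t len)
{
	uint64_t h = 5381;
	size_t i;
	for (i = 0; i < len; i++)
		h = ((h << 5) + h) ^ p[i];	/* h * 33 ^ byte */
	return h;
}

/* Word-at-a-time variant in the spirit of XDL_FAST_HASH: one
 * multiply and xor per 8 bytes of input.  Far fewer ALU
 * operations, but each byte gets much less mixing, hence more
 * collisions on low-entropy input.  memcpy keeps the loads
 * alignment-safe; the result is endianness-dependent, which is
 * fine for a sketch. */
static uint64_t word_hash(const unsigned char *p, size_t len)
{
	uint64_t h = 5381, w;
	while (len >= 8) {
		memcpy(&w, p, 8);
		h = ((h << 5) + h) ^ w;
		p += 8;
		len -= 8;
	}
	if (len) {
		w = 0;
		memcpy(&w, p, len);	/* tail bytes, zero-padded */
		h = ((h << 5) + h) ^ w;
	}
	return h;
}
```

As the quoted text notes, the two do not produce matching output even on short inputs; the word-at-a-time version simply mixes less.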
Note that XDL_FAST_HASH is just a ripoff of the hashing scheme that
Linus socially engineered on G+ around that time. I didn't do any of
the hash genealogy that you did here, and it now shows. The original
patches are linked from 6942efc (xdiff: load full words in the inner loop of
xdl_hash_record, 2012-04-06):
https://lkml.org/lkml/2012/3/2/452
https://lkml.org/lkml/2012/3/5/6
The code still exists:
https://github.com/torvalds/linux/blob/master/fs/namei.c#L1678
> I have implemented two simpler possibilities, both of which fix the
> problems diffing the evil-icons repository:
I think it would be best to separate three goals here:
1. hash function throughput
2. quality of the hash values
3. avoiding collision attacks
XDL_FAST_HASH was strictly an attempt to improve throughput, and fairly
successful at that (6942efc (xdiff: load full words in the inner loop of
xdl_hash_record, 2012-04-06) quotes an 8% improvement on 'git log -p').
You are now addressing quality.
I have no idea how you ran into this, but if you are reworking things
already, I think it would be good to also randomize whatever hash you
put in so as to give some measure of protection against collision
attacks.
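A randomized variant could be as simple as folding a per-process random seed into the initial state, so an attacker cannot precompute colliding inputs offline (illustrative sketch only; the seed source and mixing are assumptions, not a concrete proposal for git):

```c
#include <stdint.h>
#include <stddef.h>

/* Seeded DJB-style hash: the seed would be drawn once per process
 * from a random source (e.g. /dev/urandom) and xored into the
 * initial state.  Hypothetical sketch, not git's actual code. */
static uint64_t seeded_hash(const unsigned char *p, size_t len,
			    uint64_t seed)
{
	uint64_t h = 5381 ^ seed;
	size_t i;
	for (i = 0; i < len; i++)
		h = ((h << 5) + h) ^ p[i];
	return h;
}
```

With seed 0 this degenerates to the unseeded hash, so it costs essentially nothing on the fast path.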
> 1. An XDL_FAST_HASH implementation that matches the output of the DJB
> hash exactly. Its performance is basically the same as DJB, because
> the only thing it does differently is load 8 bytes at a time instead
> of 1. It does all the same ALU operations as DJB.
I don't think there's a point in having such a function, since it would
mean a lot of code for no throughput gain. Let's just remove
XDL_FAST_HASH and the original hashing scheme in favor of a better hash
function.
--
Thomas Rast
tr@thomasrast.ch
Thread overview: 4+ messages
2014-12-22 4:19 XDL_FAST_HASH can be very slow Jeff King
2014-12-22 9:08 ` Patrick Reynolds
2014-12-22 10:48 ` Thomas Rast [this message]
2014-12-23 2:51 ` demerphq