From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thomas Rast Subject: Re: XDL_FAST_HASH can be very slow Date: Mon, 22 Dec 2014 11:48:31 +0100 Message-ID: <87r3vsrmdc.fsf@thomasrast.ch> References: <20141222041944.GA441@peff.net> Mime-Version: 1.0 Content-Type: text/plain Cc: "git\@vger.kernel.org" , Jeff King To: Patrick Reynolds X-From: git-owner@vger.kernel.org Mon Dec 22 11:58:22 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Y30gu-00085N-Kh for gcvg-git-2@plane.gmane.org; Mon, 22 Dec 2014 11:58:21 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754133AbaLVK6P (ORCPT ); Mon, 22 Dec 2014 05:58:15 -0500 Received: from ip1.thgersdorf.net ([148.251.9.194]:36139 "EHLO mail.psioc.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754051AbaLVK6P (ORCPT ); Mon, 22 Dec 2014 05:58:15 -0500 X-Greylist: delayed 569 seconds by postgrey-1.27 at vger.kernel.org; Mon, 22 Dec 2014 05:58:14 EST Received: from localhost (localhost [127.0.0.1]) by localhost.psioc.net (Postfix) with ESMTP id C91454D6009; Mon, 22 Dec 2014 11:48:40 +0100 (CET) X-Virus-Scanned: amavisd-new at psioc.net Received: from mail.psioc.net ([127.0.0.1]) by localhost (mail.psioc.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id mi4m1Gtg-jnw; Mon, 22 Dec 2014 11:48:30 +0100 (CET) Received: from hexa.thomasrast.ch (46-126-8-85.dynamic.hispeed.ch [46.126.8.85]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mail.psioc.net (Postfix) with ESMTPSA id CB3BD4D6001; Mon, 22 Dec 2014 11:48:29 +0100 (CET) In-Reply-To: (Patrick Reynolds's message of "Mon, 22 Dec 2014 03:08:12 -0600") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Patrick Reynolds writes: > The original xdl_hash_record is essentially DJB hash, which does a > multiplication, load, and xor for each byte of the input. Commit > 6942efc introduces an "XDL_FAST_HASH" version of the same function > that is clearly inspired by the DJB hash, but it does only one > multiplication, load, and xor for each 8 bytes of input -- i.e., fewer > loads, but also a lot less bit mixing. Less mixing means far more > collisions, leading to the performance problems with evil-icons. It's > not clear to me if the XDL_FAST_HASH version intended to match the > output of the DJB hash function, but it doesn't at all. Note that XDL_FAST_HASH is just a ripoff of the hashing scheme that Linus socially engineered on G+ around that time. I didn't do any of the hash genealogy that you did here, and it now shows. The orginal patches are linked from 6942efc (xdiff: load full words in the inner loop of xdl_hash_record, 2012-04-06): https://lkml.org/lkml/2012/3/2/452 https://lkml.org/lkml/2012/3/5/6 The code still exists: https://github.com/torvalds/linux/blob/master/fs/namei.c#L1678 > I have implemented two simpler possibilities, both of which fix the > problems diffing the evil-icons repository: I think it would be best to separate three goals here: 1. hash function throughput 2. quality of the hash values 3. avoiding collision attacks XDL_FAST_HASH was strictly an attempt to improve throughput, and fairly successful at that (6942efc (xdiff: load full words in the inner loop of xdl_hash_record, 2012-04-06) quotes an 8% improvement on 'git log -p'). You are now addressing quality. I have no idea how you ran into this, but if you are reworking things already, I think it would be good to also randomize whatever hash you put in so as to give some measure of protection against collision attacks. > 1. An XDL_FAST_HASH implementation that matches the output of the DJB > hash exactly. Its performance is basically the same as DJB, because > the only thing is does differently is load 8 bytes at a time instead > of 1. It does all the same ALU operations as DJB. I don't think there's a point in having such a function, since it would mean a lot of code for no throughput gain. Let's just remove XDL_FAST_HASH and the original hashing scheme in favor of a better hash function. -- Thomas Rast tr@thomasrast.ch