From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753447AbcEBJj4 (ORCPT ); Mon, 2 May 2016 05:39:56 -0400 Received: from mx1.redhat.com ([209.132.183.28]:33190 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752976AbcEBJji (ORCPT ); Mon, 2 May 2016 05:39:38 -0400 Message-ID: <1462181968.3869.670.camel@localhost.localdomain> Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism From: Torvald Riegel To: Linus Torvalds Cc: Thomas Gleixner , Rik van Riel , LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Sebastian Andrzej Siewior , Darren Hart , Michael Kerrisk , Davidlohr Bueso , Chris Mason , "Carlos O'Donell" , Eric Dumazet Date: Mon, 02 May 2016 11:39:28 +0200 In-Reply-To: References: <20160428161742.363543816@linutronix.de> <20160428163525.495514517@linutronix.de> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 2016-04-29 at 16:51 -0700, Linus Torvalds wrote: > On Fri, Apr 29, 2016 at 2:10 PM, Linus Torvalds > wrote: > > On Thu, Apr 28, 2016 at 11:32 AM, Linus Torvalds > > wrote: > >> > >> hash_long/ptr really shouldn't care about the low bits being zero at > >> all, because it should mix in all the bits (using a prime multiplier > >> and taking the high bits). > > > > Looking at this assertion, it doesn't actually pan out very well at all. > > > > The 32-bit hash looks to be ok. Certainly not perfect, but not horrible. > > > > The 64-bit hash seems to be quite horribly bad with lots of values. > > Ok, I have tried to research this a bit more. There's a lot of > confusion here that causes the fact that hash_64() sucks donkey balls. > > The basic method for the hashing method by multiplication is fairly > sane. It's well-documented, and attributed to Knuth as the comment > above it says. Section 2.2 of this article might also be of interest: M. Dietzfelbinger, T. Hagerup, J. Katajainen, and M. Penttonen. A Re- liable Randomized Algorithm for the Closest-Pair Problem. Journal of Algorithms, 25(1):19 – 51, 1997. I don't know much about hash functions, but my understanding of this is that one can do good hashing of words by multiplying with a random number and using the most-significant bits of the lower-significant word of the result. Different random numbers may work better than others for the input data (and some might be really awful), but the paper seems to claim that one *can* find a good random number for all input data. In practice, this might mean that re-hashing with a different random number after seeing too many conflicts in a hash table should eventually lead to a good hashing (ie, because the *class* of such hash functions is universal).