From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:54731)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1anxxN-0005TA-J0
    for qemu-devel@nongnu.org; Wed, 06 Apr 2016 20:37:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1anxxK-0000ON-CE
    for qemu-devel@nongnu.org; Wed, 06 Apr 2016 20:37:57 -0400
Received: from out5-smtp.messagingengine.com ([66.111.4.29]:45450)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1anxxK-0000OJ-6f
    for qemu-devel@nongnu.org; Wed, 06 Apr 2016 20:37:54 -0400
Received: from compute3.internal (compute3.nyi.internal [10.202.2.43])
    by mailout.nyi.internal (Postfix) with ESMTP id 2306920DEC
    for ; Wed, 6 Apr 2016 20:37:52 -0400 (EDT)
Date: Wed, 6 Apr 2016 20:37:51 -0400
From: "Emilio G. Cota"
Message-ID: <20160407003751.GA4459@flamenco>
References: <1459834253-8291-8-git-send-email-cota@braap.org>
    <5703DCB7.50302@twiddle.net> <5703DE37.3080306@redhat.com>
    <5703E2DD.3020103@twiddle.net> <20160405194028.GA6671@flamenco>
    <5704293D.1070105@twiddle.net> <20160406005239.GA25081@flamenco>
    <5704F875.90509@redhat.com> <20160406174439.GB27512@flamenco>
    <5705542E.60708@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5705542E.60708@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 07/10] tb hash: hash phys_pc, pc, and flags with xxhash
To: Paolo Bonzini
Cc: MTTCG Devel, Peter Maydell, Peter Crosthwaite, QEMU Developers,
    Sergey Fedorov, Alex Bennée, Richard Henderson

On Wed, Apr 06, 2016 at 20:23:42 +0200, Paolo Bonzini wrote:
> On 06/04/2016 19:44, Emilio G. Cota wrote:
> > I like this idea, because the ugliness of the sizeof checks is significant.
> > However, the quality of the resulting hash is not as good when always using func5.
> > For instance, when we'd otherwise use func3, two fifths of every input contain
> > exactly the same bits: all 0's. This inevitably leads to more collisions.

I take this back. I don't know anymore what I measured earlier today--it's
been a long day and I was juggling quite a few things.

When running arm-softmmu, I see essentially the same chain lengths (within
0.2%) for either function, i.e. func3, or func5 with the padded 0's. So this
is good news :>

> Perhaps better is to always use a three-word xxhash, but pick the 64-bit
> version if any of phys_pc and pc are 64-bits. The unrolling would be
> very effective, and the performance penalty not too important (64-bit on
> 32-bit is very slow anyway).

By "the 64-bit version", do you mean what I called func5? That is:

    if (sizeof(phys_pc) == sizeof(uint64_t) || sizeof(pc) == sizeof(uint64_t))
        return tb_hash_func5();
    return tb_hash_func3();

Or do you mean xxhash64 (which I did not include in my patchset)?

My tests with xxhash64 suggest that the quality of the results does not
improve over xxhash32, and the computation takes longer (it's more
instructions); not much, but measurable.

So we should probably just go with func5 always, as you suggested
initially. If so, I'm ready to send a v2.

Thanks,

Emilio
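
For readers following the func3 vs. func5 discussion, here is a minimal
sketch of the two input layouts and of one way to measure the longest bucket
chain. It is not the code from the patchset: the mixer is a word-wise
FNV-1a-style stand-in rather than xxhash, and sketch_hash3, sketch_hash5,
max_chain, the bucket count, and the guest addresses are all hypothetical
names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mixer (word-wise FNV-1a style). NOT xxhash, NOT QEMU code. */
static uint32_t mix_words(const uint32_t *w, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= w[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical three-word layout: 32-bit phys_pc, 32-bit pc, flags. */
static uint32_t sketch_hash3(uint32_t phys_pc, uint32_t pc, uint32_t flags)
{
    const uint32_t w[3] = { phys_pc, pc, flags };
    return mix_words(w, 3);
}

/*
 * Hypothetical five-word layout: phys_pc and pc widened to 64 bits and
 * split into halves. For 32-bit inputs, words 1 and 3 are always zero,
 * which is the "two fifths of every input" mentioned in the thread.
 */
static uint32_t sketch_hash5(uint64_t phys_pc, uint64_t pc, uint32_t flags)
{
    const uint32_t w[5] = {
        (uint32_t)phys_pc, (uint32_t)(phys_pc >> 32),
        (uint32_t)pc, (uint32_t)(pc >> 32),
        flags,
    };
    return mix_words(w, 5);
}

#define SKETCH_MAX_BITS 16

/*
 * Hash `count` made-up, 4-byte-strided addresses into 2^bits buckets and
 * return the longest chain. Uses a static table, so it is not reentrant.
 */
static unsigned max_chain(int use5, unsigned bits, uint32_t count)
{
    static unsigned buckets[1u << SKETCH_MAX_BITS];
    unsigned longest = 0;

    assert(bits >= 1 && bits <= SKETCH_MAX_BITS);
    const uint32_t mask = (1u << bits) - 1;
    for (uint32_t i = 0; i <= mask; i++) {
        buckets[i] = 0;
    }

    for (uint32_t i = 0; i < count; i++) {
        uint32_t pc = 0x10000u + 4u * i;          /* assumed addresses */
        uint32_t phys_pc = 0x40000000u + 4u * i;  /* assumed addresses */
        uint32_t h = use5 ? sketch_hash5(phys_pc, pc, 0)
                          : sketch_hash3(phys_pc, pc, 0);
        unsigned len = ++buckets[h & mask];
        if (len > longest) {
            longest = len;
        }
    }
    return longest;
}
```

Comparing max_chain(0, bits, n) with max_chain(1, bits, n) gives a rough
feel for whether the zero padding matters for a given mixer; the result
depends on the stand-in hash and the synthetic addresses, so it says nothing
about the numbers reported above for arm-softmmu.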