From: Sebastian Andrzej Siewior
Subject: Re: [PATCH] futex: eliminate cache miss from futex_hash()
To: Ingo Molnar, Davidlohr Bueso
Cc: Rasmus Villemoes, Thomas Gleixner, kbuild test robot, Peter Zijlstra, linux-kernel@vger.kernel.org
Date: Mon, 26 Oct 2015 16:22:27 +0100
Message-ID: <562E4533.2060907@linutronix.de>
In-Reply-To: <20150912095936.GA15348@gmail.com>
References: <1441834601-13633-1-git-send-email-linux@rasmusvillemoes.dk> <20150910102220.GB19736@linux-q0g1.site> <20150912095936.GA15348@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/12/2015 11:59 AM, Ingo Molnar wrote:
>
> * Davidlohr Bueso wrote:
>
>> I think we should leave it as is.
>
> But ... given that these are shared-cached values (cached on all CPUs), this
> change would only be measurable in such a benchmark if the cache footprint of the
> test is just about to overflow the size of the CPU cache and the one extra cache
> line would cause cache thrashing. That is very unlikely.
>
> So such a change seems to make sense unless you can argue that it's _bad_ to move
> them closer to each other.
hash_futex(), ARM, gcc-5.2.1:
- three fewer opcodes
- no register is pushed to / popped from the stack anymore

--- futex_old.o_f.S
+++ futex_new.o_f.S
@@ -1,26 +1,23 @@
 00000000 :
-push {lr} ; (str lr, [sp, #-4]!)
-movw r3, #48887 ; 0xbef7
 ldr r1, [r0, #8]
-movt r3, #57005 ; 0xdead
+movw r3, #48887 ; 0xbef7
 ldr r2, [r0, #4]
-movw ip, #0
+movt r3, #57005 ; 0xdead
 add r3, r1, r3
 ldr r0, [r0]
 add r2, r3, r2
-movt ip, #0
+movw ip, #0
 eor r1, r3, r2
 add r3, r3, r0
 sub r1, r1, r2, ror #18
-ldr ip, [ip]
+movt ip, #0
 eor r3, r3, r1
-movw lr, #0
+ldr r0, [ip, #4]
 sub r3, r3, r1, ror #21
-sub ip, ip, #1
+ldr ip, [ip]
 eor r2, r2, r3
-movt lr, #0
+sub r0, r0, #1
 sub r2, r2, r3, ror #7
-ldr r0, [lr]
 eor r1, r1, r2
 sub r1, r1, r2, ror #16
 eor r3, r3, r1
@@ -29,6 +26,6 @@
 sub r3, r2, r3, ror #18
 eor r1, r1, r3
 sub r3, r1, r3, ror #8
-and r3, r3, ip
-add r0, r0, r3, lsl #6
-pop {pc} ; (ldr pc, [sp], #4)
+and r0, r0, r3
+add r0, ip, r0, lsl #6
+bx lr

I guess that not executing three opcodes is a good thing :)

> Thanks,
>
> Ingo
>

Sebastian