* [PATCH] futex: eliminate cache miss from futex_hash()
From: Rasmus Villemoes @ 2015-09-09 21:36 UTC
To: Thomas Gleixner, Davidlohr Bueso, kbuild test robot, Sebastian Andrzej Siewior
Cc: Peter Zijlstra, Ingo Molnar, Rasmus Villemoes, linux-kernel

futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.

We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other (and even if they do, they
may still end up in different cache lines). So put the two variables
in a small singleton struct with sufficient alignment and mark that as
__read_mostly.

A diff of the disassembly shows what I'd expect:

 : 31 d1                 xor    %edx,%ecx
 : c1 ca 12              ror    $0x12,%edx
 : 29 d1                 sub    %edx,%ecx
-: 48 8b 15 25 c8 e5 00  mov    0xe5c825(%rip),%rdx  # ffffffff81f149c8 <futex_hashsize>
+: 48 8b 15 35 c8 e5 00  mov    0xe5c835(%rip),%rdx  # ffffffff81f149d8 <__futex_data+0x8>
 : 31 c8                 xor    %ecx,%eax
 : c1 c9 08              ror    $0x8,%ecx
 : 29 c8                 sub    %ecx,%eax
 : 48 83 ea 01           sub    $0x1,%rdx
 : 48 21 d0              and    %rdx,%rax
 : 48 c1 e0 06           shl    $0x6,%rax
-: 48 03 05 e4 5e 02 01  add    0x1025ee4(%rip),%rax # ffffffff820de0a0 <futex_queues>
+: 48 03 05 14 c8 e5 00  add    0xe5c814(%rip),%rax  # ffffffff81f149d0 <__futex_data>
 : c3                    retq
 : 0f 1f 00              nopl   (%rax)

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
Resending since this was never picked up - and I assume it's actually
ok. Also, this time the alignment is spelled 2*sizeof(long) to avoid
wasting 8 bytes on 32bit.

 kernel/futex.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 6e443efc65f4..dfc86e93c31d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -255,9 +255,18 @@ struct futex_hash_bucket {
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
 
-static unsigned long __read_mostly futex_hashsize;
+/*
+ * The base of the bucket array and its size are always used together
+ * (after initialization only in hash_futex()), so ensure that they
+ * reside in the same cacheline.
+ */
+static struct {
+	struct futex_hash_bucket *queues;
+	unsigned long hashsize;
+} __futex_data __read_mostly __aligned(2*sizeof(long));
+#define futex_queues (__futex_data.queues)
+#define futex_hashsize (__futex_data.hashsize)
 
-static struct futex_hash_bucket *futex_queues;
 
 /*
  * Fault injections for futexes.
-- 
2.1.3
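The layout argument in the patch can be checked in userspace with a small sketch. This is not kernel code: the names sketch_bucket, sketch_futex_data, sketch_data, sketch_same_line and the 64-byte SKETCH_CACHE_LINE are assumptions made for illustration, and the GCC/Clang aligned attribute stands in for the kernel's __aligned(). The idea is that an object whose size equals its power-of-two alignment, with that alignment dividing the line size, cannot straddle two cache lines.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, offsetof */
#include <stdint.h>   /* uintptr_t */

#define SKETCH_CACHE_LINE 64u   /* assumed line size, not taken from the patch */

struct sketch_bucket { int placeholder; };   /* hypothetical stand-in bucket */

/* Same shape as __futex_data: base pointer plus size, aligned to the pair. */
struct sketch_futex_data {
	struct sketch_bucket *queues;
	unsigned long hashsize;
} __attribute__((aligned(2 * sizeof(long))));

struct sketch_futex_data sketch_data;

/*
 * Assumes an ABI where pointers and longs have the same size (ILP32, LP64),
 * so the struct is exactly as large as its alignment.
 */
static_assert(sizeof(struct sketch_futex_data) ==
	      _Alignof(struct sketch_futex_data),
	      "struct must be exactly as large as its alignment");
static_assert(SKETCH_CACHE_LINE % _Alignof(struct sketch_futex_data) == 0,
	      "alignment must divide the cache line size");

/* Return 1 if the bytes [addr, addr + len) fall within one cache line. */
int sketch_same_line(const void *addr, size_t len)
{
	uintptr_t first = (uintptr_t)addr / SKETCH_CACHE_LINE;
	uintptr_t last = ((uintptr_t)addr + len - 1) / SKETCH_CACHE_LINE;

	return first == last;
}
```

Calling sketch_same_line(&sketch_data, sizeof(sketch_data)) should return 1 under the stated assumptions, whereas two unrelated globals carry no such guarantee.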
* Re: [PATCH] futex: eliminate cache miss from futex_hash()
From: Davidlohr Bueso @ 2015-09-10 10:22 UTC
To: Rasmus Villemoes
Cc: Thomas Gleixner, kbuild test robot, Sebastian Andrzej Siewior, Peter Zijlstra, Ingo Molnar, linux-kernel

On Wed, 09 Sep 2015, Rasmus Villemoes wrote:

>futex_hash() references two global variables: the base pointer
>futex_queues and the size of the array futex_hashsize. The latter is
>marked __read_mostly, while the former is not, so they are likely to
>end up very far from each other. This means that futex_hash() is
>likely to encounter two cache misses.
>
>We could mark futex_queues as __read_mostly as well, but that doesn't
>guarantee they'll end up next to each other (and even if they do, they
>may still end up in different cache lines). So put the two variables
>in a small singleton struct with sufficient alignment and mark that as
>__read_mostly.

This really doesn't have much practical effect -- not even on larger
boxes, where such things matter. For instance, I ran the patch on a
60-core IvyBridge with 'perf-bench futex', for which futex-hash
particularly benefits in good data layout (ie our current smp alignment).

  http://linux-scalability.org/futex-__futex_data/

I think we should leave it as is.

Thanks,
Davidlohr
* Re: [PATCH] futex: eliminate cache miss from futex_hash()
From: Ingo Molnar @ 2015-09-12 9:59 UTC
To: Davidlohr Bueso
Cc: Rasmus Villemoes, Thomas Gleixner, kbuild test robot, Sebastian Andrzej Siewior, Peter Zijlstra, linux-kernel

* Davidlohr Bueso <dave@stgolabs.net> wrote:

> On Wed, 09 Sep 2015, Rasmus Villemoes wrote:
>
> >futex_hash() references two global variables: the base pointer
> >futex_queues and the size of the array futex_hashsize. The latter is
> >marked __read_mostly, while the former is not, so they are likely to
> >end up very far from each other. This means that futex_hash() is
> >likely to encounter two cache misses.
> >
> >We could mark futex_queues as __read_mostly as well, but that doesn't
> >guarantee they'll end up next to each other (and even if they do, they
> >may still end up in different cache lines). So put the two variables
> >in a small singleton struct with sufficient alignment and mark that as
> >__read_mostly.
>
> This really doesn't have much practical effect -- not even on larger
> boxes, where such things matter. For instance, I ran the patch on a
> 60-core IvyBridge with 'perf-bench futex', for which futex-hash
> particularly benefits in good data layout (ie our current smp alignment).
>
>   http://linux-scalability.org/futex-__futex_data/
>
> I think we should leave it as is.

But ... given that these are shared-cached values (cached on all CPUs), this
change would only be measurable in such a benchmark if the cache footprint of the
test is just about to overflow the size of the CPU cache and the one extra cache
line would cause cache thrashing. That is very unlikely.

So such a change seems to make sense unless you can argue that it's _bad_ to move
them closer to each other.

Thanks,

	Ingo
* Re: [PATCH] futex: eliminate cache miss from futex_hash()
From: Sebastian Andrzej Siewior @ 2015-10-26 15:22 UTC
To: Ingo Molnar, Davidlohr Bueso
Cc: Rasmus Villemoes, Thomas Gleixner, kbuild test robot, Peter Zijlstra, linux-kernel

On 09/12/2015 11:59 AM, Ingo Molnar wrote:
>
> * Davidlohr Bueso <dave@stgolabs.net> wrote:
>
>> I think we should leave it as is.
>
> But ... given that these are shared-cached values (cached on all CPUs), this
> change would only be measurable in such a benchmark if the cache footprint of the
> test is just about to overflow the size of the CPU cache and the one extra cache
> line would cause cache thrashing. That is very unlikely.
>
> So such a change seems to make sense unless you can argue that it's _bad_ to move
> them closer to each other.

hash_futex(), ARM, gcc-5.2.1:
- three fewer opcodes
- we don't push / pop a register to the stack

--- futex_old.o_f.S
+++ futex_new.o_f.S
@@ -1,26 +1,23 @@
 00000000 <hash_futex>:
-push {lr} ; (str lr, [sp, #-4]!)
-movw r3, #48887 ; 0xbef7
 ldr r1, [r0, #8]
-movt r3, #57005 ; 0xdead
+movw r3, #48887 ; 0xbef7
 ldr r2, [r0, #4]
-movw ip, #0
+movt r3, #57005 ; 0xdead
 add r3, r1, r3
 ldr r0, [r0]
 add r2, r3, r2
-movt ip, #0
+movw ip, #0
 eor r1, r3, r2
 add r3, r3, r0
 sub r1, r1, r2, ror #18
-ldr ip, [ip]
+movt ip, #0
 eor r3, r3, r1
-movw lr, #0
+ldr r0, [ip, #4]
 sub r3, r3, r1, ror #21
-sub ip, ip, #1
+ldr ip, [ip]
 eor r2, r2, r3
-movt lr, #0
+sub r0, r0, #1
 sub r2, r2, r3, ror #7
-ldr r0, [lr]
 eor r1, r1, r2
 sub r1, r1, r2, ror #16
 eor r3, r3, r1
@@ -29,6 +26,6 @@
 sub r3, r2, r3, ror #18
 eor r1, r1, r3
 sub r3, r1, r3, ror #8
-and r3, r3, ip
-add r0, r0, r3, lsl #6
-pop {pc} ; (ldr pc, [sp], #4)
+and r0, r0, r3
+add r0, ip, r0, lsl #6
+bx lr

I guess that not invoking three opcodes is a good thing :)

> Thanks,
>
> 	Ingo
>
Sebastian
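For readers following the disassembly, the tail of the function (subtract one, mask, scale by 64, add the base) corresponds to indexing a power-of-two sized bucket array. The sketch below shows that step in isolation. It is an illustration, not the kernel source: demo_bucket, demo_table and demo_lookup are hypothetical names, the 64-byte bucket size is inferred from the shl $0x6 / lsl #6 in the listings, and the hash is taken as an already-computed input. With both fields in one struct, a single base address reaches both loads, which is consistent with the new ARM code using [ip] and [ip, #4].

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* uint32_t */

/* Hypothetical 64-byte bucket, matching the "<< 6" scaling in the listings. */
struct demo_bucket {
	unsigned char pad[64];
};
static_assert(sizeof(struct demo_bucket) == 64, "bucket must be 64 bytes");

/* Base pointer and size kept side by side, as in __futex_data. */
struct demo_table {
	struct demo_bucket *queues;   /* base of the bucket array */
	unsigned long hashsize;       /* number of buckets; assumed power of two */
};

/* Map an already-computed hash value to its bucket. */
struct demo_bucket *demo_lookup(const struct demo_table *t, uint32_t hash)
{
	/* hashsize - 1 is a valid bit mask only when hashsize is a power of two. */
	return &t->queues[hash & (t->hashsize - 1)];
}
```

With an eight-entry table, a hash of 13 selects bucket 5, since 13 & 7 is 5.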
* [tip:locking/core] futex: Force hot variables into a single cache line
From: tip-bot for Rasmus Villemoes @ 2015-09-22 14:27 UTC
To: linux-tip-commits
Cc: mingo, tglx, fengguang.wu, dave, linux-kernel, bigeasy, peterz, linux, hpa

Commit-ID:  ac742d37180bee83bc433be087b66a17af2883b9
Gitweb:     http://git.kernel.org/tip/ac742d37180bee83bc433be087b66a17af2883b9
Author:     Rasmus Villemoes <linux@rasmusvillemoes.dk>
AuthorDate: Wed, 9 Sep 2015 23:36:40 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 22 Sep 2015 16:23:15 +0200

futex: Force hot variables into a single cache line

futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.

We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other (and even if they do, they
may still end up in different cache lines). So put the two variables
in a small singleton struct with sufficient alignment and mark that as
__read_mostly.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1441834601-13633-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 6e443ef..dfc86e9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -255,9 +255,18 @@ struct futex_hash_bucket {
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
 
-static unsigned long __read_mostly futex_hashsize;
+/*
+ * The base of the bucket array and its size are always used together
+ * (after initialization only in hash_futex()), so ensure that they
+ * reside in the same cacheline.
+ */
+static struct {
+	struct futex_hash_bucket *queues;
+	unsigned long hashsize;
+} __futex_data __read_mostly __aligned(2*sizeof(long));
+#define futex_queues (__futex_data.queues)
+#define futex_hashsize (__futex_data.hashsize)
 
-static struct futex_hash_bucket *futex_queues;
 
 /*
  * Fault injections for futexes.