* [PATCH] futex: eliminate cache miss from futex_hash()
@ 2015-09-09 21:36 Rasmus Villemoes
2015-09-10 10:22 ` Davidlohr Bueso
2015-09-22 14:27 ` [tip:locking/core] futex: Force hot variables into a single cache line tip-bot for Rasmus Villemoes
0 siblings, 2 replies; 5+ messages in thread
From: Rasmus Villemoes @ 2015-09-09 21:36 UTC
To: Thomas Gleixner, Davidlohr Bueso, kbuild test robot,
Sebastian Andrzej Siewior
Cc: Peter Zijlstra, Ingo Molnar, Rasmus Villemoes, linux-kernel
futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.

We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other (and even if they do, they
may still end up in different cache lines). So put the two variables
in a small singleton struct with sufficient alignment and mark that as
__read_mostly.
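
A minimal userspace sketch of the layout argument above, assuming a
64-byte cache line, a GCC-compatible compiler, and pointers the same
size as long. futex_data_sketch, ASSUMED_CACHELINE and same_cacheline()
are hypothetical names for illustration, not kernel identifiers:

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>   /* uintptr_t */

/* Opaque stand-in for the kernel's bucket type; only a pointer is stored. */
struct futex_hash_bucket;

/* Assumption for this sketch: 64-byte cache lines. Real hardware varies. */
#define ASSUMED_CACHELINE 64

/* Assumption: pointers and longs have the same size (ILP32 or LP64). */
static_assert(sizeof(void *) == sizeof(long), "sketch assumes ILP32 or LP64");

/*
 * Size and alignment are both 2*sizeof(long), a power of two that divides
 * the assumed cache line size, so the object cannot straddle a boundary.
 */
static struct {
	struct futex_hash_bucket *queues;
	unsigned long hashsize;
} futex_data_sketch __attribute__((aligned(2 * sizeof(long))));

/* Returns 1 if both members start in the same (assumed) cache line. */
int same_cacheline(void)
{
	uintptr_t q = (uintptr_t)&futex_data_sketch.queues;
	uintptr_t h = (uintptr_t)&futex_data_sketch.hashsize;

	return q / ASSUMED_CACHELINE == h / ASSUMED_CACHELINE;
}
```

With two separate globals there is no such guarantee: the linker may
place them in different sections, or adjacent but on either side of a
cache line boundary.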
A diff of the disassembly shows what I'd expect:
: 31 d1 xor %edx,%ecx
: c1 ca 12 ror $0x12,%edx
: 29 d1 sub %edx,%ecx
-: 48 8b 15 25 c8 e5 00 mov 0xe5c825(%rip),%rdx # ffffffff81f149c8 <futex_hashsize>
+: 48 8b 15 35 c8 e5 00 mov 0xe5c835(%rip),%rdx # ffffffff81f149d8 <__futex_data+0x8>
: 31 c8 xor %ecx,%eax
: c1 c9 08 ror $0x8,%ecx
: 29 c8 sub %ecx,%eax
: 48 83 ea 01 sub $0x1,%rdx
: 48 21 d0 and %rdx,%rax
: 48 c1 e0 06 shl $0x6,%rax
-: 48 03 05 e4 5e 02 01 add 0x1025ee4(%rip),%rax # ffffffff820de0a0 <futex_queues>
+: 48 03 05 14 c8 e5 00 add 0xe5c814(%rip),%rax # ffffffff81f149d0 <__futex_data>
: c3 retq
: 0f 1f 00 nopl (%rax)
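
For readers less used to x86 listings, the tail of the function above
(sub $0x1, and, shl $0x6, add) corresponds roughly to the following C.
This is only a sketch: bucket_sketch, table_sketch and bucket_for() are
hypothetical names, and the 64-byte bucket size is inferred from the
shift by 6 in the listing rather than taken from the kernel source:

```c
#include <assert.h>   /* assert, static_assert */

/* Hypothetical 64-byte bucket, matching the shl $0x6 scaling. */
struct bucket_sketch {
	char pad[64];
};
static_assert(sizeof(struct bucket_sketch) == 64, "bucket must be 64 bytes");

/* The two values the patch groups together: array base and array size. */
struct table_sketch {
	struct bucket_sketch *queues;
	unsigned long hashsize;   /* number of buckets, a power of two */
};

/* Map a hash value to its bucket: mask with (size - 1), then index. */
struct bucket_sketch *bucket_for(const struct table_sketch *t,
				 unsigned long hash)
{
	/* Masking only works as a modulo when the size is a power of two. */
	assert(t->hashsize && (t->hashsize & (t->hashsize - 1)) == 0);

	return &t->queues[hash & (t->hashsize - 1)];
}
```

Both loads (hashsize for the mask, queues for the base) happen on every
call, which is why their placement relative to each other matters.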
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
Resending since this was never picked up - and I assume it's actually
ok. Also, this time the alignment is spelled 2*sizeof(long) to avoid
wasting 8 bytes on 32bit.
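
To illustrate the padding remark, a small sketch comparing the two
spellings. The fixed 16-byte alignment is an assumed alternative for
comparison (the note above does not say what the earlier version used),
and all names below are hypothetical. It assumes a GCC-compatible
compiler and pointers the same size as long:

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* size_t */

/* Opaque stand-in for the kernel's bucket type. */
struct futex_hash_bucket;

/* Assumption: pointers and longs have the same size (ILP32 or LP64). */
static_assert(sizeof(void *) == sizeof(long), "sketch assumes ILP32 or LP64");

/* Alignment tied to the word size: the struct is exactly two words. */
struct data_word_aligned {
	struct futex_hash_bucket *queues;
	unsigned long hashsize;
} __attribute__((aligned(2 * sizeof(long))));

/* Fixed alignment: sizeof is rounded up to a multiple of 16. */
struct data_fixed16 {
	struct futex_hash_bucket *queues;
	unsigned long hashsize;
} __attribute__((aligned(16)));

/* Bytes actually occupied by the two members. */
#define PAYLOAD (sizeof(void *) + sizeof(unsigned long))

/* Tail padding added by each alignment choice. */
size_t padding_word_aligned(void)
{
	return sizeof(struct data_word_aligned) - PAYLOAD;
}

size_t padding_fixed16(void)
{
	return sizeof(struct data_fixed16) - PAYLOAD;
}
```

On an LP64 build both functions should return 0. On an ILP32 build the
payload is 8 bytes, so the fixed-16 variant should carry 8 bytes of
padding while the 2*sizeof(long) variant carries none.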
kernel/futex.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 6e443efc65f4..dfc86e93c31d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -255,9 +255,18 @@ struct futex_hash_bucket {
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
 
-static unsigned long __read_mostly futex_hashsize;
+/*
+ * The base of the bucket array and its size are always used together
+ * (after initialization only in hash_futex()), so ensure that they
+ * reside in the same cacheline.
+ */
+static struct {
+	struct futex_hash_bucket *queues;
+	unsigned long hashsize;
+} __futex_data __read_mostly __aligned(2*sizeof(long));
+#define futex_queues (__futex_data.queues)
+#define futex_hashsize (__futex_data.hashsize)
 
-static struct futex_hash_bucket *futex_queues;
 
 /*
  * Fault injections for futexes.
--
2.1.3
* Re: [PATCH] futex: eliminate cache miss from futex_hash()
From: Davidlohr Bueso @ 2015-09-10 10:22 UTC
To: Rasmus Villemoes
Cc: Thomas Gleixner, kbuild test robot, Sebastian Andrzej Siewior,
Peter Zijlstra, Ingo Molnar, linux-kernel
On Wed, 09 Sep 2015, Rasmus Villemoes wrote:
>futex_hash() references two global variables: the base pointer
>futex_queues and the size of the array futex_hashsize. The latter is
>marked __read_mostly, while the former is not, so they are likely to
>end up very far from each other. This means that futex_hash() is
>likely to encounter two cache misses.
>
>We could mark futex_queues as __read_mostly as well, but that doesn't
>guarantee they'll end up next to each other (and even if they do, they
>may still end up in different cache lines). So put the two variables
>in a small singleton struct with sufficient alignment and mark that as
>__read_mostly.
This really doesn't have much practical effect -- not even on larger
boxes, where such things matter. For instance, I ran the patch on a
60-core IvyBridge with 'perf-bench futex', where futex-hash in
particular benefits from good data layout (i.e. our current SMP alignment).
http://linux-scalability.org/futex-__futex_data/
I think we should leave it as is.
Thanks,
Davidlohr
* Re: [PATCH] futex: eliminate cache miss from futex_hash()
From: Ingo Molnar @ 2015-09-12 9:59 UTC
To: Davidlohr Bueso
Cc: Rasmus Villemoes, Thomas Gleixner, kbuild test robot,
Sebastian Andrzej Siewior, Peter Zijlstra, linux-kernel
* Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Wed, 09 Sep 2015, Rasmus Villemoes wrote:
>
> >futex_hash() references two global variables: the base pointer
> >futex_queues and the size of the array futex_hashsize. The latter is
> >marked __read_mostly, while the former is not, so they are likely to
> >end up very far from each other. This means that futex_hash() is
> >likely to encounter two cache misses.
> >
> >We could mark futex_queues as __read_mostly as well, but that doesn't
> >guarantee they'll end up next to each other (and even if they do, they
> >may still end up in different cache lines). So put the two variables
> >in a small singleton struct with sufficient alignment and mark that as
> >__read_mostly.
>
> This really doesn't have much practical effect -- not even on larger
> boxes, where such things matter. For instance, I ran the patch on a
> 60-core IvyBridge with 'perf-bench futex', where futex-hash in
> particular benefits from good data layout (i.e. our current SMP alignment).
>
> http://linux-scalability.org/futex-__futex_data/
>
> I think we should leave it as is.
But ... given that these are shared-cached values (cached on all CPUs), this
change would only be measurable in such a benchmark if the cache footprint of the
test is just about to overflow the size of the CPU cache and the one extra cache
line would cause cache thrashing. That is very unlikely.
So such a change seems to make sense unless you can argue that it's _bad_ to move
them closer to each other.
Thanks,
Ingo
* [tip:locking/core] futex: Force hot variables into a single cache line
From: tip-bot for Rasmus Villemoes @ 2015-09-22 14:27 UTC
To: linux-tip-commits
Cc: mingo, tglx, fengguang.wu, dave, linux-kernel, bigeasy, peterz,
linux, hpa
Commit-ID: ac742d37180bee83bc433be087b66a17af2883b9
Gitweb: http://git.kernel.org/tip/ac742d37180bee83bc433be087b66a17af2883b9
Author: Rasmus Villemoes <linux@rasmusvillemoes.dk>
AuthorDate: Wed, 9 Sep 2015 23:36:40 +0200
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 22 Sep 2015 16:23:15 +0200
futex: Force hot variables into a single cache line
futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.

We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other (and even if they do, they
may still end up in different cache lines). So put the two variables
in a small singleton struct with sufficient alignment and mark that as
__read_mostly.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1441834601-13633-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/futex.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 6e443ef..dfc86e9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -255,9 +255,18 @@ struct futex_hash_bucket {
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
 
-static unsigned long __read_mostly futex_hashsize;
+/*
+ * The base of the bucket array and its size are always used together
+ * (after initialization only in hash_futex()), so ensure that they
+ * reside in the same cacheline.
+ */
+static struct {
+	struct futex_hash_bucket *queues;
+	unsigned long hashsize;
+} __futex_data __read_mostly __aligned(2*sizeof(long));
+#define futex_queues (__futex_data.queues)
+#define futex_hashsize (__futex_data.hashsize)
 
-static struct futex_hash_bucket *futex_queues;
 
 /*
  * Fault injections for futexes.
* Re: [PATCH] futex: eliminate cache miss from futex_hash()
From: Sebastian Andrzej Siewior @ 2015-10-26 15:22 UTC
To: Ingo Molnar, Davidlohr Bueso
Cc: Rasmus Villemoes, Thomas Gleixner, kbuild test robot,
Peter Zijlstra, linux-kernel
On 09/12/2015 11:59 AM, Ingo Molnar wrote:
>
> * Davidlohr Bueso <dave@stgolabs.net> wrote:
>
>> I think we should leave it as is.
>
> But ... given that these are shared-cached values (cached on all CPUs), this
> change would only be measurable in such a benchmark if the cache footprint of the
> test is just about to overflow the size of the CPU cache and the one extra cache
> line would cause cache thrashing. That is very unlikely.
>
> So such a change seems to make sense unless you can argue that it's _bad_ to move
> them closer to each other.
hash_futex(), ARM, gcc-5.2.1:
- three fewer opcodes
- we don't push / pop a register to the stack
--- futex_old.o_f.S
+++ futex_new.o_f.S
@@ -1,26 +1,23 @@
 00000000 <hash_futex>:
-push {lr} ; (str lr, [sp, #-4]!)
-movw r3, #48887 ; 0xbef7
 ldr r1, [r0, #8]
-movt r3, #57005 ; 0xdead
+movw r3, #48887 ; 0xbef7
 ldr r2, [r0, #4]
-movw ip, #0
+movt r3, #57005 ; 0xdead
 add r3, r1, r3
 ldr r0, [r0]
 add r2, r3, r2
-movt ip, #0
+movw ip, #0
 eor r1, r3, r2
 add r3, r3, r0
 sub r1, r1, r2, ror #18
-ldr ip, [ip]
+movt ip, #0
 eor r3, r3, r1
-movw lr, #0
+ldr r0, [ip, #4]
 sub r3, r3, r1, ror #21
-sub ip, ip, #1
+ldr ip, [ip]
 eor r2, r2, r3
-movt lr, #0
+sub r0, r0, #1
 sub r2, r2, r3, ror #7
-ldr r0, [lr]
 eor r1, r1, r2
 sub r1, r1, r2, ror #16
 eor r3, r3, r1
@@ -29,6 +26,6 @@
 sub r3, r2, r3, ror #18
 eor r1, r1, r3
 sub r3, r1, r3, ror #8
-and r3, r3, ip
-add r0, r0, r3, lsl #6
-pop {pc} ; (ldr pc, [sp], #4)
+and r0, r0, r3
+add r0, ip, r0, lsl #6
+bx lr
I guess that not invoking three opcodes is a good thing :)
> Thanks,
>
> Ingo
>
Sebastian