From: Davidlohr Bueso <davidlohr@hp.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, dvhart@linux.intel.com, peterz@infradead.org,
tglx@linutronix.de, efault@gmx.de, jeffm@suse.com,
torvalds@linux-foundation.org, scott.norton@hp.com,
tom.vaden@hp.com, aswin@hp.com, Waiman.Long@hp.com,
jason.low2@hp.com, davidlohr@hp.com
Subject: [PATCH 0/5] futex: Wakeup optimizations
Date: Fri, 22 Nov 2013 16:56:32 -0800
Message-ID: <1385168197-8612-1-git-send-email-davidlohr@hp.com>
We have been dealing with a customer database workload on a large
12 TB, 240-core, 16-socket NUMA system that exhibits high amounts
of contention on some of the locks that serialize internal futex
data structures. This workload especially suffers in the wakeup
paths, where waiting on the corresponding hb->lock can account for
up to ~60% of the time. The results of such calls can mostly be
classified as (i) nothing to wake up and (ii) waking up a large number
of tasks.
Before these patches are applied, we can see this pathological behavior:
37.12% 826174 xxx [kernel.kallsyms] [k] _raw_spin_lock
--- _raw_spin_lock
|
|--97.14%-- futex_wake
| do_futex
| sys_futex
| system_call_fastpath
| |
| |--99.70%-- 0x7f383fbdea1f
| | yyy
43.71% 762296 xxx [kernel.kallsyms] [k] _raw_spin_lock
--- _raw_spin_lock
|
|--53.74%-- futex_wake
| do_futex
| sys_futex
| system_call_fastpath
| |
| |--99.40%-- 0x7fe7d44a4c05
| | zzz
|--45.90%-- futex_wait_setup
| futex_wait
| do_futex
| sys_futex
| system_call_fastpath
| 0x7fe7ba315789
| syscall
With these patches, contention is practically nonexistent:
0.10% 49 xxx [kernel.kallsyms] [k] _raw_spin_lock
--- _raw_spin_lock
|
|--76.06%-- futex_wait_setup
| futex_wait
| do_futex
| sys_futex
| system_call_fastpath
| |
| |--99.90%-- 0x7f3165e63789
| | syscall|
...
|--6.27%-- futex_wake
| do_futex
| sys_futex
| system_call_fastpath
| |
| |--54.56%-- 0x7f317fff2c05
...
Patches 1 & 2 are cleanups and micro optimizations.
Patch 3 addresses the well-known issue of the global hash table.
By creating a larger and NUMA-aware table, we can reduce false
sharing and collisions, thus reducing the chance of different futexes
sharing the same hb->lock.
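To make the sizing argument concrete, here is a minimal userspace sketch in C. It assumes, purely for illustration, a scale of 256 buckets per possible CPU rounded up to a power of two; kernels of this era used a fixed table of 1 << FUTEX_HASHBITS buckets (256 on most configs) regardless of machine size. BUCKETS_PER_CPU, futex_hash_size() and bucket_index() are hypothetical names, and the NUMA-aware placement of the table is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale factor (assumption): hash buckets per possible CPU. */
#define BUCKETS_PER_CPU 256UL

/* Smallest power of two >= n (n > 0). */
static unsigned long roundup_pow_of_two_ul(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Table size grows with the machine instead of staying fixed. */
static unsigned long futex_hash_size(unsigned long ncpus)
{
	return roundup_pow_of_two_ul(BUCKETS_PER_CPU * ncpus);
}

/* A power-of-two size lets the bucket index be a mask rather than a modulo. */
static unsigned long bucket_index(uint32_t hash, unsigned long size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return hash & (size - 1);
}
```

Under these assumptions the 240-core box above would get 65536 buckets instead of 256, so two unrelated futexes are far less likely to hash to the same hb->lock.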
Patch 4 reduces contention on the corresponding hb->lock by not trying to
acquire it if there are no blocked tasks in the waitqueue.
This particularly deals with point (i) above, where we see that it is not
uncommon for up to 90% of wakeup calls to end up returning 0, indicating that no
tasks were woken.
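The idea can be sketched in userspace C as follows. This is a simplified model, not the patch itself: it assumes a per-bucket atomic waiter count (the hypothetical waiters field) that a waiter bumps before queueing and a waker checks before taking the lock, with a pthread mutex standing in for hb->lock and a plain counter standing in for the wait queue. The hard part in the kernel, ordering this counter against the check of the user-space futex value, is deliberately not modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct bucket {
	pthread_mutex_t lock;   /* stands in for hb->lock */
	atomic_int waiters;     /* hypothetical: waiters queued or about to queue */
	int queued;             /* toy wait queue length, protected by lock */
	int lock_acquisitions;  /* instrumentation for this example only */
};

static void bucket_init(struct bucket *b)
{
	pthread_mutex_init(&b->lock, NULL);
	atomic_init(&b->waiters, 0);
	b->queued = 0;
	b->lock_acquisitions = 0;
}

/* Waiter side: announce ourselves before taking the lock and queueing. */
static void waiter_enqueue(struct bucket *b)
{
	atomic_fetch_add(&b->waiters, 1);
	pthread_mutex_lock(&b->lock);
	b->queued++;
	pthread_mutex_unlock(&b->lock);
}

/* Waker side: returns the number of waiters woken, like a futex wake call. */
static int wake(struct bucket *b, int nr)
{
	int woken = 0;

	/* Fast path for case (i): nobody is waiting, so never touch the lock. */
	if (atomic_load(&b->waiters) == 0)
		return 0;

	pthread_mutex_lock(&b->lock);
	b->lock_acquisitions++;
	while (b->queued > 0 && woken < nr) {
		b->queued--;
		atomic_fetch_sub(&b->waiters, 1);
		woken++;
	}
	assert(b->queued >= 0);
	pthread_mutex_unlock(&b->lock);
	return woken;
}
```

In this model only the calls that find the counter non-zero pay for the lock, which is exactly the population that case (i) removes from the contended path.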
Patch 5 resurrects a two-year-old idea from Peter Zijlstra to delay
the waking of the blocked tasks so that it is done without holding the hb->lock:
https://lkml.org/lkml/2011/9/14/118
This is useful for locking primitives that can effect multiple wakeups
per operation and want to avoid contention on the futex's internal spinlock by
delaying the wakeups until we've released the hb->lock.
This particularly deals with point (ii) above, where we can observe that
on occasion the wake calls end up waking 125 to 200 waiters in what we believe
are RW locks in the application.
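A minimal sketch of the delayed-wakeup pattern, again in userspace C under stated assumptions: waiters sit on a toy FIFO protected by a pthread mutex standing in for hb->lock, and do_wake() is a hypothetical stand-in for actually waking a task (it only sets a flag, after checking that the lock is no longer held). The waker detaches up to nr waiters onto a private list while holding the lock, drops the lock, and only then walks the private list and wakes each one. All names here are illustrative and are not the interface this series adds to include/linux/sched.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct waiter {
	struct waiter *next;
	int woken;              /* stands in for the blocked task becoming runnable */
};

struct bucket {
	pthread_mutex_t lock;   /* stands in for hb->lock */
	struct waiter *head;    /* toy FIFO wait queue, protected by lock */
	struct waiter *tail;
};

static void bucket_init(struct bucket *b)
{
	pthread_mutex_init(&b->lock, NULL);
	b->head = b->tail = NULL;
}

static void enqueue(struct bucket *b, struct waiter *w)
{
	w->next = NULL;
	w->woken = 0;
	pthread_mutex_lock(&b->lock);
	if (b->tail)
		b->tail->next = w;
	else
		b->head = w;
	b->tail = w;
	pthread_mutex_unlock(&b->lock);
}

/* Hypothetical wake step; the bucket lock must already be released here. */
static void do_wake(struct bucket *b, struct waiter *w)
{
	int rc = pthread_mutex_trylock(&b->lock);

	assert(rc == 0 && "bucket lock must not be held while waking");
	if (rc == 0)
		pthread_mutex_unlock(&b->lock);
	w->woken = 1;
}

static int wake_delayed(struct bucket *b, int nr)
{
	struct waiter *list = NULL, **tailp = &list;
	int n = 0;

	/* Phase 1: detach up to nr waiters onto a private list under the lock. */
	pthread_mutex_lock(&b->lock);
	while (b->head && n < nr) {
		struct waiter *w = b->head;

		b->head = w->next;
		if (!b->head)
			b->tail = NULL;
		w->next = NULL;
		*tailp = w;
		tailp = &w->next;
		n++;
	}
	pthread_mutex_unlock(&b->lock);

	/* Phase 2: wake them with the lock dropped. */
	while (list) {
		struct waiter *w = list;

		/* Read next first: a real woken waiter may return and its node go away. */
		list = w->next;
		do_wake(b, w);
	}
	return n;
}
```

The design choice is simply to shorten the critical section: waking a task is comparatively expensive, so doing 125 to 200 of those wakeups outside hb->lock keeps the lock hold time bounded by the cheap list manipulation.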
This patchset has also been tested on smaller systems for a variety of
benchmarks, including Java workloads, kernel builds and custom bang-the-hell-out-of-
hb-locks programs. So far, no functional or performance regressions have been seen.
Furthermore, no issues were found when running the different tests in the futextest
suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/
This patchset applies on top of Linus' tree as of v3.13-rc1.
Special thanks to Scott Norton, Tom Vaden and Mark Ray for helping present,
debug and analyze the data.
futex: Misc cleanups
futex: Check for pi futex_q only once
futex: Larger hash table
futex: Avoid taking hb lock if nothing to wakeup
sched,futex: Provide delayed wakeup list
include/linux/sched.h | 41 ++++++++++++++++++
kernel/futex.c | 113 +++++++++++++++++++++++++++-----------------------
kernel/sched/core.c | 19 +++++++++
3 files changed, 122 insertions(+), 51 deletions(-)
--
1.8.1.4