public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] futex: Wakeup optimizations
@ 2013-12-19 18:45 Davidlohr Bueso
  2013-12-19 18:45 ` [PATCH v3 1/4] futex: Misc cleanups Davidlohr Bueso
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Davidlohr Bueso @ 2013-12-19 18:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	davidlohr

Changes from v2 [http://lwn.net/Articles/575449/]:
 - Reordered SOB tags to reflect me as primary author.

 - Improved ordering guarantee comments for patch 4.

 - Rebased patch 4 against Linus' tree (this patch didn't
   apply after the recent futex changes/fixes).

Changes from v1 [https://lkml.org/lkml/2013/11/22/525]:
 - Removed patch "futex: Check for pi futex_q only once".

 - Cleaned up ifdefs for larger hash table.

 - Added a doc patch from tglx that describes the futex 
   ordering guarantees.

 - Improved the lockless plist check for the wake calls.
   Based on the community feedback, the necessary abstractions
   and barriers are added to maintain ordering guarantees.
   Code documentation is also updated.

 - Removed patch "sched,futex: Provide delayed wakeup list".
   Based on feedback from PeterZ, I will look into this as
   a separate issue once the other patches are settled.
 

We have been dealing with a customer database workload on large
12Tb, 240 core 16 socket NUMA system that exhibits high amounts 
of contention on some of the locks that serialize internal futex 
data structures. This workload specially suffers in the wakeup 
paths, where waiting on the corresponding hb->lock can account for 
up to ~60% of the time. The result of such calls can mostly be 
classified as (i) nothing to wake up and (ii) wakeup large amount 
of tasks.

Before these patches are applied, we can see this pathological behavior:

 37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
            --- _raw_spin_lock
             |
             |--97.14%-- futex_wake
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          |
             |          |--99.70%-- 0x7f383fbdea1f
             |          |           yyy

 43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
            --- _raw_spin_lock
             |
             |--53.74%-- futex_wake
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          |
             |          |--99.40%-- 0x7fe7d44a4c05
             |          |           zzz
             |--45.90%-- futex_wait_setup
             |          futex_wait
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          0x7fe7ba315789
             |          syscall


With these patches, contention is practically non existent:

 0.10%     49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
               --- _raw_spin_lock
                |
                |--76.06%-- futex_wait_setup
                |          futex_wait
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          |
                |          |--99.90%-- 0x7f3165e63789
                |          |          syscall|
                           ...
                |--6.27%-- futex_wake
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          |
                |          |--54.56%-- 0x7f317fff2c05
                ...

Patch 1 is a cleanup.

Patch 2 addresses the well known issue of the global hash table.
By creating a larger and NUMA aware table, we can reduce the false
sharing and collisions, thus reducing the chance of different futexes 
using hb->lock.

Patch 3 documents the futex ordering guarantees.

Patch 4 reduces contention on the corresponding hb->lock by not trying to
acquire it if there are no blocked tasks in the waitqueue.
This particularly deals with point (i) above, where we see that it is not
uncommon for up to 90% of wakeup calls end up returning 0, indicating that no
tasks were woken.

This patchset has also been tested on smaller systems for a variety of
benchmarks, including java workloads, kernel builds and custom bang-the-hell-out-of
hb locks programs. So far, no functional or performance regressions have been seen.
Furthermore, no issues were found when running the different tests in the futextest 
suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/

This patchset applies on top of Linus' tree as of v3.13-rc4 (f7556698).

Special thanks to Scott Norton, Tom Vanden, Mark Ray and Aswin Chandramouleeswaran
for help presenting, debugging and analyzing the data.

  futex: Misc cleanups
  futex: Larger hash table
  futex: Document ordering guarantees
  futex: Avoid taking hb lock if nothing to wakeup

 kernel/futex.c | 236 +++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 196 insertions(+), 40 deletions(-)

-- 
1.8.1.4


^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH v2 0/4] futex: Wakeup optimizations
@ 2013-12-03  9:45 Davidlohr Bueso
  2013-12-03  9:45 ` [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup Davidlohr Bueso
  0 siblings, 1 reply; 30+ messages in thread
From: Davidlohr Bueso @ 2013-12-03  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	scott.norton, tom.vaden, aswin, Waiman.Long, jason.low2,
	davidlohr

Changes from v1 [https://lkml.org/lkml/2013/11/22/525]:
 - Removed patch "futex: Check for pi futex_q only once".

 - Cleaned up ifdefs for larger hash table.

 - Added a doc patch from tglx that describes the futex 
   ordering guarantees.

 - Improved the lockless plist check for the wake calls.
   Based on the community feedback, the necessary abstractions
   and barriers are added to maintain ordering guarantees.
   Code documentation is also updated.

 - Removed patch "sched,futex: Provide delayed wakeup list".
   Based on feedback from PeterZ, I will look into this as
   a separate issue once the other patches are settled.
 

We have been dealing with a customer database workload on large
12Tb, 240 core 16 socket NUMA system that exhibits high amounts 
of contention on some of the locks that serialize internal futex 
data structures. This workload specially suffers in the wakeup 
paths, where waiting on the corresponding hb->lock can account for 
up to ~60% of the time. The result of such calls can mostly be 
classified as (i) nothing to wake up and (ii) wakeup large amount 
of tasks.

Before these patches are applied, we can see this pathological behavior:

 37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
            --- _raw_spin_lock
             |
             |--97.14%-- futex_wake
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          |
             |          |--99.70%-- 0x7f383fbdea1f
             |          |           yyy

 43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
            --- _raw_spin_lock
             |
             |--53.74%-- futex_wake
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          |
             |          |--99.40%-- 0x7fe7d44a4c05
             |          |           zzz
             |--45.90%-- futex_wait_setup
             |          futex_wait
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          0x7fe7ba315789
             |          syscall


With these patches, contention is practically non existent:

 0.10%     49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
               --- _raw_spin_lock
                |
                |--76.06%-- futex_wait_setup
                |          futex_wait
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          |
                |          |--99.90%-- 0x7f3165e63789
                |          |          syscall|
                           ...
                |--6.27%-- futex_wake
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          |
                |          |--54.56%-- 0x7f317fff2c05
                ...

Patch 1 is a cleanup.

Patch 2 addresses the well known issue of the global hash table.
By creating a larger and NUMA aware table, we can reduce the false
sharing and collisions, thus reducing the chance of different futexes 
using hb->lock.

Patch 3 documents the futex ordering guarantees.

Patch 4 reduces contention on the corresponding hb->lock by not trying to
acquire it if there are no blocked tasks in the waitqueue.
This particularly deals with point (i) above, where we see that it is not
uncommon for up to 90% of wakeup calls end up returning 0, indicating that no
tasks were woken.

This patchset has also been tested on smaller systems for a variety of
benchmarks, including java workloads, kernel builds and custom bang-the-hell-out-of
hb locks programs. So far, no functional or performance regressions have been seen.
Furthermore, no issues were found when running the different tests in the futextest 
suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/

This patchset applies on top of Linus' tree as of v3.13-rc2 (2e7babfa).

Special thanks to Scott Norton, Tom Vanden, Mark Ray and Aswin Chandramouleeswaran
for help presenting, debugging and analyzing the data.

  futex: Misc cleanups
  futex: Larger hash table
  futex: Document ordering guarantees
  futex: Avoid taking hb lock if nothing to wakeup

 kernel/futex.c | 230 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 194 insertions(+), 36 deletions(-)

-- 
1.8.1.4


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-01-02 11:37 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-19 18:45 [PATCH v3 0/4] futex: Wakeup optimizations Davidlohr Bueso
2013-12-19 18:45 ` [PATCH v3 1/4] futex: Misc cleanups Davidlohr Bueso
2013-12-19 18:45 ` [PATCH v3 2/4] futex: Larger hash table Davidlohr Bueso
2013-12-19 18:45 ` [PATCH v3 3/4] futex: Document ordering guarantees Davidlohr Bueso
2013-12-19 18:45 ` [PATCH v3 4/4] futex: Avoid taking hb lock if nothing to wakeup Davidlohr Bueso
2013-12-19 19:25   ` Ingo Molnar
2013-12-19 19:29     ` Davidlohr Bueso
2013-12-19 19:42       ` Ingo Molnar
2013-12-19 22:35         ` Joe Perches
2013-12-20  9:11           ` Peter Zijlstra
2013-12-20 10:27             ` Ingo Molnar
2013-12-19 23:14   ` Linus Torvalds
2013-12-19 23:42     ` Davidlohr Bueso
2013-12-19 23:53       ` Linus Torvalds
2013-12-20  0:04         ` Linus Torvalds
2013-12-20  0:26           ` Darren Hart
2013-12-20  0:12         ` Linus Torvalds
2013-12-20  2:22         ` Davidlohr Bueso
2013-12-20  3:13           ` Linus Torvalds
2013-12-20  6:13             ` Davidlohr Bueso
2013-12-20  8:54             ` Peter Zijlstra
2013-12-20 19:30     ` Davidlohr Bueso
2013-12-20 19:54       ` Linus Torvalds
2013-12-21  5:57         ` Davidlohr Bueso
2013-12-21  1:36     ` Davidlohr Bueso
2013-12-21  2:34       ` Darren Hart
2013-12-21  3:08         ` Davidlohr Bueso
2013-12-25  3:13       ` Waiman Long
2014-01-02 11:37         ` Davidlohr Bueso
  -- strict thread matches above, loose matches on Subject: below --
2013-12-03  9:45 [PATCH v2 0/4] futex: Wakeup optimizations Davidlohr Bueso
2013-12-03  9:45 ` [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup Davidlohr Bueso
2013-12-10 18:10   ` [PATCH v3 " Davidlohr Bueso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox