* [RFC PATCH 0/3 v2] futex: introduce FUTEX_SWAP operation
From: Peter Oskolkov @ 2020-06-16 17:22 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Darren Hart, Vincent Guittot
  Cc: Peter Oskolkov, avagin, pjt@google.com, Ben Segall

From 7b091e46de4f9227b5a943e6d78283564e8c1c72 Mon Sep 17 00:00:00 2001
From: Peter Oskolkov <posk@google.com>
Date: Tue, 16 Jun 2020 10:13:58 -0700
Subject: [RFC PATCH 0/3 v2] futex: introduce FUTEX_SWAP operation

This is an RFC!

As Paul Turner presented at LPC in 2013 ...
- pdf: http://pdxplumbers.osuosl.org/2013/ocw//system/presentations/1653/original/LPC%20-%20User%20Threading.pdf
- video: https://www.youtube.com/watch?v=KXuZi9aeGTw

... Google has developed an M:N userspace threading subsystem backed
by Google-private SwitchTo Linux Kernel API (page 17 in the pdf referenced
above). This subsystem provides latency-sensitive services at Google with
fine-grained user-space control/scheduling over what is running when,
and this subsystem is used widely internally (called schedulers or fibers).

This RFC patchset is the first step to open-source this work. As explained
in the linked pdf and video, SwitchTo API has three core operations: wait,
resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
on top of which user-space threading libraries can be built.

Another common use case for FUTEX_SWAP is message passing a-la RPC
between tasks: task/thread T1 prepares a message,
wakes T2 to work on it, and waits for the results; when T2 is done, it
wakes T1 and waits for more work to arrive. Currently the simplest
way to implement this is

a. T1: futex-wake T2, futex-wait
b. T2: wakes, does what it has been woken to do
c. T2: futex-wake T1, futex-wait

With FUTEX_SWAP, steps a and c above can be reduced to one futex operation
that runs 5-10 times faster.
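
For illustration, here is a minimal userspace sketch of steps a-c using
only the existing FUTEX_WAIT and FUTEX_WAKE operations. It is not taken
from this patchset: the helper names (futex_wait_flag, futex_wake_flag,
run_ping_pong), the struct layout and the "double the request" payload
are all hypothetical, and the flag protocol is just one simple way to
pair a wake with a wait. FUTEX_SWAP is meant to collapse each wake/wait
pair below into a single call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The futex syscall operates on a plain 32-bit word. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t), "futex word size");

/* glibc has no futex() wrapper, so go through syscall(2). */
static long futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, (uint32_t *)uaddr, op, val, NULL, NULL, 0);
}

/* Sleep until the flag is non-zero, then consume it. */
static void futex_wait_flag(_Atomic uint32_t *flag)
{
	/* FUTEX_WAIT returns immediately if *flag is no longer 0. */
	while (atomic_load(flag) == 0)
		futex(flag, FUTEX_WAIT_PRIVATE, 0);
	atomic_store(flag, 0);
}

/* Set the flag and wake one waiter sleeping on it. */
static void futex_wake_flag(_Atomic uint32_t *flag)
{
	atomic_store(flag, 1);
	futex(flag, FUTEX_WAKE_PRIVATE, 1);
}

/* Hypothetical shared state between T1 and T2. */
struct channel {
	_Atomic uint32_t t1_flag;	/* T1 sleeps on this */
	_Atomic uint32_t t2_flag;	/* T2 sleeps on this */
	int request, reply, rounds;
};

static void *t2_worker(void *arg)
{
	struct channel *ch = arg;

	for (int i = 0; i < ch->rounds; i++) {
		futex_wait_flag(&ch->t2_flag);	/* wait for work */
		ch->reply = ch->request * 2;	/* step b: do the work */
		futex_wake_flag(&ch->t1_flag);	/* step c: wake T1 ... */
	}					/* ... then wait again */
	return NULL;
}

/* T1 side: send `rounds` requests, return the sum of the replies. */
static int run_ping_pong(int rounds)
{
	struct channel ch = { .rounds = rounds };
	pthread_t t2;
	int sum = 0;

	if (pthread_create(&t2, NULL, t2_worker, &ch) != 0)
		return -1;
	for (int i = 1; i <= rounds; i++) {
		ch.request = i;
		futex_wake_flag(&ch.t2_flag);	/* step a: wake T2 ... */
		futex_wait_flag(&ch.t1_flag);	/* ... and wait for the reply */
		sum += ch.reply;
	}
	pthread_join(t2, NULL);
	return sum;
}
```

In this sketch every round trip costs two wake calls and at least two
wait calls; run_ping_pong(3) sums the replies 2 + 4 + 6 and returns 12.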

Patches in this patchset:

Patch 1: introduce FUTEX_SWAP futex operation that,
         internally, does wake + wait. The purpose of this patch is
         to work out the API.
Patch 2: a first rough attempt to make FUTEX_SWAP faster than
         what wake + wait can do.
Patch 3: a selftest that can also be used to benchmark FUTEX_SWAP vs
         FUTEX_WAKE + FUTEX_WAIT.

v2: fix an undefined symbol error when CONFIG_SMP is not defined.

Peter Oskolkov (3):
  futex: introduce FUTEX_SWAP operation
  futex, sched: add wake_up_swap, use in FUTEX_SWAP
  selftests/futex: add futex_swap selftest

 include/linux/sched.h                         |   1 +
 include/uapi/linux/futex.h                    |   2 +
 kernel/futex.c                                |  96 ++++++--
 kernel/sched/core.c                           |   5 +
 kernel/sched/fair.c                           |   3 +
 kernel/sched/sched.h                          |   3 +-
 .../selftests/futex/functional/.gitignore     |   1 +
 .../selftests/futex/functional/Makefile       |   1 +
 .../selftests/futex/functional/futex_swap.c   | 209 ++++++++++++++++++
 .../selftests/futex/include/futextest.h       |  19 ++
 10 files changed, 323 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/futex/functional/futex_swap.c

--
2.25.1




* Re: [RFC PATCH 0/3 v2] futex: introduce FUTEX_SWAP operation
From: Aaron Lu @ 2020-06-22 13:31 UTC (permalink / raw)
  To: Peter Oskolkov
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Darren Hart, Vincent Guittot, Peter Oskolkov,
	avagin, pjt@google.com, Ben Segall

On Tue, Jun 16, 2020 at 10:22:11AM -0700, Peter Oskolkov wrote:
> From 7b091e46de4f9227b5a943e6d78283564e8c1c72 Mon Sep 17 00:00:00 2001
> From: Peter Oskolkov <posk@google.com>
> Date: Tue, 16 Jun 2020 10:13:58 -0700
> Subject: [RFC PATCH 0/3 v2] futex: introduce FUTEX_SWAP operation
> 
> This is an RFC!
> 
> As Paul Turner presented at LPC in 2013 ...
> - pdf: http://pdxplumbers.osuosl.org/2013/ocw//system/presentations/1653/original/LPC%20-%20User%20Threading.pdf
> - video: https://www.youtube.com/watch?v=KXuZi9aeGTw
> 
> ... Google has developed an M:N userspace threading subsystem backed
> by Google-private SwitchTo Linux Kernel API (page 17 in the pdf referenced
> above). This subsystem provides latency-sensitive services at Google with
> fine-grained user-space control/scheduling over what is running when,
> and this subsystem is used widely internally (called schedulers or fibers).
> 
> This RFC patchset is the first step to open-source this work. As explained
> in the linked pdf and video, SwitchTo API has three core operations: wait,
> resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
> that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
> on top of which user-space threading libraries can be built.
> 
> Another common use case for FUTEX_SWAP is message passing a-la RPC
> between tasks: task/thread T1 prepares a message,
> wakes T2 to work on it, and waits for the results; when T2 is done, it
> wakes T1 and waits for more work to arrive. Currently the simplest
> way to implement this is
> 
> a. T1: futex-wake T2, futex-wait
> b. T2: wakes, does what it has been woken to do
> c. T2: futex-wake T1, futex-wait
> 
> With FUTEX_SWAP, steps a and c above can be reduced to one futex operation
> that runs 5-10 times faster.

schbench uses futex wait/wake to sleep and wake up between the message
thread and the worker threads, and when there is one worker thread per
message thread, the two also interact in flipcall style.

So I modified schbench to use futex_swap and ran a comparison. In the
non-overloaded case, both versions perform roughly the same, with
futex_swap slightly better. In the overloaded case, futex_swap does
better than futex wait/wake at every reported percentile (the maximum
is about the same), with the 90th percentile showing the largest
difference: 2556us vs 6us.

I guess a larger latency gain can be expected once the scheduler change
is in place.

Here are the logs of the schbench runs (on a 16-core/32-CPU x86_64 machine):

overloaded case

original schbench (aka futex wait/wake)
$./schbench -m 64 -t 1 -r 30

Latency percentiles (usec)
        50.0000th: 7
        75.0000th: 9
        90.0000th: 2556
        95.0000th: 7112
        *99.0000th: 14160
        99.5000th: 17504
        99.9000th: 22688
        min=0, max=30351

with futex swap
$./schbench -m 64 -t 1 -r 30

Latency percentiles (usec)
        50.0th: 4
        75.0th: 5
        90.0th: 6
        95.0th: 4568
        *99.0th: 12912
        99.5th: 15152
        99.9th: 20384
        min=0, max=30723


not overloaded case

original schbench (aka futex wait/wake)

$./schbench -m 32 -t 1 -r 30
Latency percentiles (usec)
        50.0000th: 6
        75.0000th: 7
        90.0000th: 8
        95.0000th: 9
        *99.0000th: 10
        99.5000th: 12
        99.9000th: 18
        min=0, max=398


with futex swap

$./schbench -m 32 -t 1 -r 30
Latency percentiles (usec)
        50.0th: 4
        75.0th: 5
        90.0th: 5
        95.0th: 6
        *99.0th: 8
        99.5th: 9
        99.9th: 12
        min=0, max=245
