From: Joel Fernandes <joel@joelfernandes.org>
To: Connor O'Brien <connoro@google.com>
Cc: linux-kernel@vger.kernel.org, kernel-team@android.com,
John Stultz <jstultz@google.com>,
Qais Yousef <qais.yousef@arm.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [RFC PATCH 00/11] Reviving the Proxy Execution Series
Date: Mon, 17 Oct 2022 02:23:05 +0000
Message-ID: <Y0y8iURTSAv7ZspC@google.com>
In-Reply-To: <20221003214501.2050087-1-connoro@google.com>
On Mon, Oct 03, 2022 at 09:44:50PM +0000, Connor O'Brien wrote:
> Proxy execution is an approach to implementing priority inheritance
> based on distinguishing between a task's scheduler context (information
> required in order to make scheduling decisions about when the task gets
> to run, such as its scheduler class and priority) and its execution
> context (information required to actually run the task, such as CPU
> affinity). With proxy execution enabled, a task p1 that blocks on a
> mutex remains on the runqueue, but its "blocked" status and the mutex on
> which it blocks are recorded. If p1 is selected to run while still
> blocked, the lock owner p2 can run "on its behalf", inheriting p1's
> scheduler context. Execution context is not inherited, meaning that
> e.g. the CPUs where p2 can run are still determined by its own affinity
> and not p1's.
>
> In practice a number of more complicated situations can arise: the mutex
> owner might itself be blocked on another mutex, or it could be sleeping,
> running on a different CPU, in the process of migrating between CPUs,
> etc. Details on handling these various cases can be found in patch 7/11
> ("sched: Add proxy execution"), particularly in the implementation of
> proxy() and accompanying comments.
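
To make the chained-owner case concrete, here is a toy user-space model of
resolving a blocked pick to the task that actually runs. This is only a
sketch and not the proxy() code from patch 7/11: the Task and Mutex classes
and the resolve_proxy() name are hypothetical, and it ignores sleeping
owners, owners running on another CPU, and migration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mutex:
    name: str
    owner: Optional["Task"] = None  # task currently holding the mutex


@dataclass
class Task:
    name: str
    blocked_on: Optional[Mutex] = None  # mutex this task waits for, if any


def resolve_proxy(selected: Task) -> Task:
    """Follow blocked_on -> owner links from the selected task until a
    task that is not blocked is found. That task runs on behalf of the
    selected one. Hypothetical helper, not the kernel's proxy()."""
    seen = set()
    task = selected
    while task.blocked_on is not None:
        if id(task) in seen:
            raise RuntimeError("cycle in blocked_on chain (deadlock)")
        seen.add(id(task))
        owner = task.blocked_on.owner
        if owner is None:
            # Mutex has no owner any more; the waiter itself can retry.
            return task
        task = owner
    return task


# p1 waits on m1 held by p2, which itself waits on m2 held by p3.
p3 = Task("p3")
m2 = Mutex("m2", owner=p3)
p2 = Task("p2", blocked_on=m2)
m1 = Mutex("m1", owner=p2)
p1 = Task("p1", blocked_on=m1)

# p3 is the task that would run, using p1's scheduler context.
print(resolve_proxy(p1).name)
```

In this model the pick is still made on p1's scheduler context; only the
task that ends up executing changes.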
>
> Past discussions of proxy execution have often focused on the benefits
> for deadline scheduling. Current interest for Android is based more on
> desire for a broad solution to priority inversion on kernel mutexes,
> including among CFS tasks. One notable scenario arises when cpu cgroups
> are used to throttle less important background tasks. Priority inversion
> can occur when an "important" unthrottled task blocks on a mutex held by
> an "unimportant" task whose CPU time is constrained using cpu
> shares. The result is higher worst case latencies for the unthrottled
> task.[0] Testing by John Stultz with a simple reproducer [1] showed
> promising results for this case, with proxy execution appearing to
> eliminate the large latency spikes associated with priority
> inversion.[2]
>
> Proxy execution has been discussed over the past few years at several
> conferences[3][4][5], but (as far as I'm aware) patches implementing the
> concept have not been discussed on the list since Juri Lelli's RFC in
> 2018.[6] This series is an updated version of that patchset, seeking to
> incorporate subsequent work by Juri[7], Valentin Schneider[8] and Peter
> Zijlstra along with some new fixes.
>
> Testing so far has focused on stability, mostly via mutex locktorture
> with some tweaks to more quickly trigger proxy execution bugs. These
> locktorture changes are included at the end of the series for
> reference. The current series survives runs of >72 hours on QEMU without
> crashes, deadlocks, etc. Testing on Pixel 6 with the android-mainline
> kernel [9] yields similar results. In both cases, testing used >2 CPUs
> and CONFIG_FAIR_GROUP_SCHED=y, a configuration Valentin Schneider
> reported[10] showed stability problems with earlier versions of the
> series.
>
> That said, these are definitely still a work in progress, with some
> known remaining issues (e.g. warnings while booting on Pixel 6,
> suspicious looking min/max vruntime numbers) and likely others I haven't
> found yet. I've done my best to eliminate checks and code paths made
> redundant by new fixes but some probably remain. There's no attempt yet
> to handle core scheduling. Performance testing so far has been limited
> to the aforementioned priority inversion reproducer. The hope in sharing
> now is to revive the discussion on proxy execution and get some early
> input for continuing to revise & refine the patches.

I ran a test to check CFS time sharing. The accounting shown by top is
confusing, but ftrace confirms that the proxying is happening.
Task A - pid 122
Task B - pid 123
Task C - pid 121
Task D - pid 124
Here B and D just spin all the time. C is the lock owner (of an in-kernel
mutex) and also spins all the time, while A blocks on the same mutex and
remains blocked.
Then I ran "top -H" while the test was running, which gives the output below.
The first column is the PID and the ninth column is the CPU percentage.
Without PE:
  121 root  20  0  99496  4  0  R  33.6  0.0  0:02.76  t  (task C)
  123 root  20  0  99496  4  0  R  33.2  0.0  0:02.75  t  (task B)
  124 root  20  0  99496  4  0  R  33.2  0.0  0:02.75  t  (task D)

With PE:
  122 root  20  0  99496  4  0  D  25.3  0.0  0:22.21  t  (task A)
  121 root  20  0  99496  4  0  R  25.0  0.0  0:22.20  t  (task C)
  123 root  20  0  99496  4  0  R  25.0  0.0  0:22.20  t  (task B)
  124 root  20  0  99496  4  0  R  25.0  0.0  0:22.20  t  (task D)
With PE, I was expecting 2 threads with 25% and 1 thread with 50%. Instead I
get 4 threads with 25% each in top. Ftrace confirms that the D-state task is
in fact not running and is proxying to the owner task, so everything seems to
be working correctly. But the accounting is confusing: it is odd to see the
D-state task taking 25% CPU when it is obviously "sleeping".

Yeah, yeah, I know the D-state task (A) is proxying for C (while being in the
uninterruptible sleep state), so maybe it is OK then, but I did want to bring
this up :-)
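
For reference, the arithmetic behind the 25%/25%/50% expectation can be
sketched as below. It assumes a single CPU and equal CFS weights, which is
not stated explicitly above but looks consistent with the top output (all
threads at nice 0, percentages summing to about 100). The function names
are made up for illustration.

```python
def accounted_share(runnable):
    """Equal-weight fair share: each entity on the runqueue is charged 1/N."""
    return {name: 100.0 / len(runnable) for name in runnable}


def executed_share(runnable, proxies):
    """CPU time each task actually executes. `proxies` maps a blocked
    waiter to the lock owner that runs in the waiter's slot."""
    executed = {name: 0.0 for name in runnable if name not in proxies}
    for name, share in accounted_share(runnable).items():
        # A proxied waiter's share is spent running the lock owner.
        executed[proxies.get(name, name)] += share
    return executed


# Without PE: A is blocked and off the runqueue.
print(accounted_share(["B", "C", "D"]))

# With PE: A stays on the runqueue and donates its slot to owner C.
print(accounted_share(["A", "B", "C", "D"]))
print(executed_share(["A", "B", "C", "D"], {"A": "C"}))
```

The second call corresponds to what top reports (time charged to the
scheduler context, so A shows 25%), while the third corresponds to what I
was expecting to see (time charged to whoever executes, so C gets its own
slot plus A's).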
thanks,
- Joel
> [0] https://raw.githubusercontent.com/johnstultz-work/priority-inversion-demo/main/results/charts/6.0-rc7-throttling-starvation.png
> [1] https://github.com/johnstultz-work/priority-inversion-demo
> [2] https://raw.githubusercontent.com/johnstultz-work/priority-inversion-demo/main/results/charts/6.0-rc7-vanilla-vs-proxy.png
> [3] https://lpc.events/event/2/contributions/62/
> [4] https://lwn.net/Articles/793502/
> [5] https://lwn.net/Articles/820575/
> [6] https://lore.kernel.org/lkml/20181009092434.26221-1-juri.lelli@redhat.com/
> [7] https://github.com/jlelli/linux/tree/experimental/deadline/proxy-rfc-v2
> [8] https://gitlab.arm.com/linux-arm/linux-vs/-/tree/mainline/sched/proxy-rfc-v3/
> [9] https://source.android.com/docs/core/architecture/kernel/android-common
> [10] https://lpc.events/event/7/contributions/758/attachments/585/1036/lpc20-proxy.pdf#page=4
>
> Connor O'Brien (2):
>   torture: support randomized shuffling for proxy exec testing
>   locktorture: support nested mutexes
>
> Juri Lelli (3):
>   locking/mutex: make mutex::wait_lock irq safe
>   kernel/locking: Expose mutex_owner()
>   sched: Fixup task CPUs for potential proxies.
>
> Peter Zijlstra (4):
>   locking/ww_mutex: Remove wakeups from under mutex::wait_lock
>   locking/mutex: Rework task_struct::blocked_on
>   sched: Split scheduler execution context
>   sched: Add proxy execution
>
> Valentin Schneider (2):
>   kernel/locking: Add p->blocked_on wrapper
>   sched/rt: Fix proxy/current (push,pull)ability
>
>  include/linux/mutex.h        |   2 +
>  include/linux/sched.h        |  15 +-
>  include/linux/ww_mutex.h     |   3 +
>  init/Kconfig                 |   7 +
>  init/init_task.c             |   1 +
>  kernel/Kconfig.locks         |   2 +-
>  kernel/fork.c                |   6 +-
>  kernel/locking/locktorture.c |  20 +-
>  kernel/locking/mutex-debug.c |   9 +-
>  kernel/locking/mutex.c       | 109 +++++-
>  kernel/locking/ww_mutex.h    |  31 +-
>  kernel/sched/core.c          | 679 +++++++++++++++++++++++++++++++++--
>  kernel/sched/deadline.c      |  37 +-
>  kernel/sched/fair.c          |  33 +-
>  kernel/sched/rt.c            |  63 ++--
>  kernel/sched/sched.h         |  42 ++-
>  kernel/torture.c             |  10 +-
>  17 files changed, 955 insertions(+), 114 deletions(-)
>
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>