From: John Stultz <jstultz@google.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Valentin Schneider <valentin.schneider@arm.com>,
"Connor O'Brien" <connoro@google.com>,
John Stultz <jstultz@google.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: [PATCH v22 5/6] sched: Add blocked_donor link to task for smarter mutex handoffs
Date: Fri, 26 Sep 2025 03:29:13 +0000 [thread overview]
Message-ID: <20250926032931.27663-6-jstultz@google.com> (raw)
In-Reply-To: <20250926032931.27663-1-jstultz@google.com>
From: Peter Zijlstra <peterz@infradead.org>
Add link to the task this task is proxying for, and use it so
the mutex owner can do an intelligent hand-off of the mutex to
the task that the owner is running on behalf.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: This patch was split out from larger proxy patch]
Signed-off-by: John Stultz <jstultz@google.com>
---
v5:
* Split out from larger proxy patch
v6:
* Moved proxied value from earlier patch to this one where it
is actually used
* Rework logic to check sched_proxy_exec() instead of using ifdefs
* Moved comment change to this patch where it makes sense
v7:
* Use more descriptive term then "us" in comments, as suggested
by Metin Kaya.
* Minor typo fixup from Metin Kaya
* Reworked proxied variable to prev_not_proxied to simplify usage
v8:
* Use helper for donor blocked_on_state transition
v9:
* Re-add mutex lock handoff in the unlock path, but only when we
have a blocked donor
* Slight reword of commit message suggested by Metin
v18:
* Add task_init initialization for blocked_donor, suggested by
Suleiman
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
---
include/linux/sched.h | 1 +
init/init_task.c | 1 +
kernel/fork.c | 2 +-
kernel/locking/mutex.c | 41 ++++++++++++++++++++++++++++++++++++++---
kernel/sched/core.c | 18 ++++++++++++++++--
5 files changed, 57 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8245940783c77..5ca495d5d0a2d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1239,6 +1239,7 @@ struct task_struct {
#endif
struct mutex *blocked_on; /* lock we're blocked on */
+ struct task_struct *blocked_donor; /* task that is boosting this task */
raw_spinlock_t blocked_lock;
#ifdef CONFIG_SCHED_PROXY_EXEC
enum blocked_on_state blocked_on_state;
diff --git a/init/init_task.c b/init/init_task.c
index 63b66b4aa585a..4fb95ab1810a3 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -177,6 +177,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
#ifdef CONFIG_SCHED_PROXY_EXEC
.blocked_on_state = BO_RUNNABLE,
#endif
+ .blocked_donor = NULL,
#ifdef CONFIG_RT_MUTEXES
.pi_waiters = RB_ROOT_CACHED,
.pi_top_task = NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index d8eb66e5be918..651ebe85e1521 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2130,10 +2130,10 @@ __latent_entropy struct task_struct *copy_process(
#endif
p->blocked_on = NULL; /* not blocked yet */
+ p->blocked_donor = NULL; /* nobody is boosting p yet */
#ifdef CONFIG_SCHED_PROXY_EXEC
p->blocked_on_state = BO_RUNNABLE;
#endif
-
#ifdef CONFIG_BCACHE
p->sequential_io = 0;
p->sequential_io_avg = 0;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index d8cf2e9a22a65..fca2ee0756b1f 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -924,7 +924,7 @@ EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible);
*/
static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
{
- struct task_struct *next = NULL;
+ struct task_struct *donor, *next = NULL;
DEFINE_WAKE_Q(wake_q);
unsigned long owner;
unsigned long flags;
@@ -943,6 +943,12 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
MUTEX_WARN_ON(__owner_task(owner) != current);
MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP);
+ if (sched_proxy_exec() && current->blocked_donor) {
+ /* force handoff if we have a blocked_donor */
+ owner = MUTEX_FLAG_HANDOFF;
+ break;
+ }
+
if (owner & MUTEX_FLAG_HANDOFF)
break;
@@ -956,7 +962,34 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
raw_spin_lock_irqsave(&lock->wait_lock, flags);
debug_mutex_unlock(lock);
- if (!list_empty(&lock->wait_list)) {
+
+ if (sched_proxy_exec()) {
+ raw_spin_lock(¤t->blocked_lock);
+ /*
+ * If we have a task boosting current, and that task was boosting
+ * current through this lock, hand the lock to that task, as that
+ * is the highest waiter, as selected by the scheduling function.
+ */
+ donor = current->blocked_donor;
+ if (donor) {
+ struct mutex *next_lock;
+
+ raw_spin_lock_nested(&donor->blocked_lock, SINGLE_DEPTH_NESTING);
+ next_lock = __get_task_blocked_on(donor);
+ if (next_lock == lock) {
+ next = donor;
+ __set_blocked_on_waking(donor);
+ wake_q_add(&wake_q, donor);
+ current->blocked_donor = NULL;
+ }
+ raw_spin_unlock(&donor->blocked_lock);
+ }
+ }
+
+ /*
+ * Failing that, pick any on the wait list.
+ */
+ if (!next && !list_empty(&lock->wait_list)) {
/* get the first entry from the wait-list: */
struct mutex_waiter *waiter =
list_first_entry(&lock->wait_list,
@@ -964,7 +997,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
next = waiter->task;
- raw_spin_lock(&next->blocked_lock);
+ raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING);
debug_mutex_wake_waiter(lock, waiter);
WARN_ON_ONCE(__get_task_blocked_on(next) != lock);
__set_blocked_on_waking(next);
@@ -975,6 +1008,8 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
if (owner & MUTEX_FLAG_HANDOFF)
__mutex_handoff(lock, next);
+ if (sched_proxy_exec())
+ raw_spin_unlock(¤t->blocked_lock);
raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d063d2c9bd5aa..bccaa4bf41b7d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6831,7 +6831,17 @@ static inline bool proxy_can_run_here(struct rq *rq, struct task_struct *p)
* Find runnable lock owner to proxy for mutex blocked donor
*
* Follow the blocked-on relation:
- * task->blocked_on -> mutex->owner -> task...
+ *
+ * ,-> task
+ * | | blocked-on
+ * | v
+ * blocked_donor | mutex
+ * | | owner
+ * | v
+ * `-- task
+ *
+ * and set the blocked_donor relation, this latter is used by the mutex
+ * code to find which (blocked) task to hand-off to.
*
* Lock order:
*
@@ -6997,6 +7007,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
* rq, therefore holding @rq->lock is sufficient to
* guarantee its existence, as per ttwu_remote().
*/
+ owner->blocked_donor = p;
}
/* Handle actions we need to do outside of the guard() scope */
@@ -7097,6 +7108,7 @@ static void __sched notrace __schedule(int sched_mode)
unsigned long prev_state;
struct rq_flags rf;
struct rq *rq;
+ bool prev_not_proxied;
int cpu;
/* Trace preemptions consistently with task switches */
@@ -7169,9 +7181,11 @@ static void __sched notrace __schedule(int sched_mode)
switch_count = &prev->nvcsw;
}
+ prev_not_proxied = !prev->blocked_donor;
pick_again:
next = pick_next_task(rq, rq->donor, &rf);
rq_set_donor(rq, next);
+ next->blocked_donor = NULL;
if (unlikely(task_is_blocked(next))) {
next = find_proxy_task(rq, next, &rf);
if (!next) {
@@ -7237,7 +7251,7 @@ static void __sched notrace __schedule(int sched_mode)
rq = context_switch(rq, prev, next, &rf);
} else {
/* In case next was already curr but just got blocked_donor */
- if (!task_current_donor(rq, next))
+ if (prev_not_proxied && next->blocked_donor)
proxy_tag_curr(rq, next);
rq_unpin_lock(rq, &rf);
--
2.51.0.536.g15c5d4f767-goog
next prev parent reply other threads:[~2025-09-26 3:29 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 3:29 [PATCH v22 0/6] Donor Migration for Proxy Execution (v22) John Stultz
2025-09-26 3:29 ` [PATCH v22 1/6] locking: Add task::blocked_lock to serialize blocked_on state John Stultz
2025-10-08 10:27 ` Peter Zijlstra
2025-09-26 3:29 ` [PATCH v22 2/6] sched/locking: Add blocked_on_state to provide necessary tri-state for proxy return-migration John Stultz
2025-10-08 11:26 ` Peter Zijlstra
2025-10-09 0:07 ` John Stultz
2025-10-09 11:43 ` Peter Zijlstra
2025-10-09 11:45 ` Peter Zijlstra
2025-10-14 2:43 ` John Stultz
2025-10-16 22:23 ` John Stultz
2025-09-26 3:29 ` [PATCH v22 3/6] sched: Add logic to zap balance callbacks if we pick again John Stultz
2025-10-08 11:37 ` Peter Zijlstra
2025-09-26 3:29 ` [PATCH v22 4/6] sched: Handle blocked-waiter migration (and return migration) John Stultz
2025-10-08 13:32 ` Peter Zijlstra
2025-10-16 0:15 ` John Stultz
2025-09-26 3:29 ` John Stultz [this message]
2025-09-26 3:29 ` [PATCH v22 6/6] sched: Migrate whole chain in proxy_migrate_task() John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250926032931.27663-6-jstultz@google.com \
--to=jstultz@google.com \
--cc=Metin.Kaya@arm.com \
--cc=boqun.feng@gmail.com \
--cc=bsegall@google.com \
--cc=connoro@google.com \
--cc=daniel.lezcano@linaro.org \
--cc=dietmar.eggemann@arm.com \
--cc=hupu.gm@gmail.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=kprateek.nayak@amd.com \
--cc=kuyo.chang@mediatek.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=qyousef@layalina.io \
--cc=rostedt@goodmis.org \
--cc=suleiman@google.com \
--cc=tglx@linutronix.de \
--cc=valentin.schneider@arm.com \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=will@kernel.org \
--cc=xuewen.yan94@gmail.com \
--cc=zezeozue@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.