From: John Stultz <jstultz@google.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <jstultz@google.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: [PATCH v27 06/10] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case
Date: Sat, 4 Apr 2026 05:36:23 +0000 [thread overview]
Message-ID: <20260404053632.1729280-7-jstultz@google.com> (raw)
In-Reply-To: <20260404053632.1729280-1-jstultz@google.com>
This patch adds logic so try_to_wake_up() will notice if we are
waking a task where blocked_on == PROXY_WAKING, and if necessary
dequeue the task so the wakeup will naturally return-migrate the
donor task back to a cpu it can run on.
This helps performance as we do the dequeue and wakeup under the
locks normally taken in the try_to_wake_up() and avoids having
to do proxy_force_return() from __schedule(), which has to
re-take similar locks and then force a pick again loop.
This was split out from the larger proxy patch, and
significantly reworked.
Credits for the original patch go to:
Peter Zijlstra (Intel) <peterz@infradead.org>
Juri Lelli <juri.lelli@redhat.com>
Valentin Schneider <valentin.schneider@arm.com>
Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
---
v24:
* Reworked proxy_needs_return() so its less nested as suggested
by K Prateek
* Switch to using block_task with DEQUEUE_SPECIAL as suggested
by K Prateek
* Fix edge case to reset wake_cpu if select_task_rq() chooses
the current rq and we skip set_task_cpu()
v26:
* Handle both blocked and PROXY_WAKING tasks in
proxy_needs_return(), as suggested by K Prateek
* Try to handle signal edge case in ttwu that K Prateek pointed
out
v27:
* Integrate simplifications to proxy_needs_return() suggested
by K Prateek
* Rework ttwu_runnable() to align with
ACQUIRE(__task_rq_lock, guard)(p) usage as suggested by Peter
* Major rework suggested by Peter to get rid of
proxy_force_return() completely, using proxy_deactivate() and
allow ttwu to handle all the return migration. Lots of helpful
improvements suggested by K Prateek included as well here.
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
---
include/linux/sched.h | 2 +-
kernel/sched/core.c | 194 +++++++++++++++++++++---------------------
2 files changed, 96 insertions(+), 100 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8ec3b6d7d718b..3ae1330801157 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -161,7 +161,7 @@ struct user_event_mm;
*/
#define is_special_task_state(state) \
((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | \
- TASK_DEAD | TASK_FROZEN))
+ TASK_DEAD | TASK_WAKING | TASK_FROZEN))
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define debug_normal_state_change(state_value) \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8f1b14a830851..2b5f9f905afe1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3659,6 +3659,44 @@ void update_rq_avg_idle(struct rq *rq)
rq->idle_stamp = 0;
}
+#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline struct task_struct *proxy_resched_idle(struct rq *rq);
+
+/*
+ * Checks to see if task p has been proxy-migrated to another rq
+ * and needs to be returned. If so, we deactivate the task here
+ * so that it can be properly woken up on the p->wake_cpu
+ * (or whichever cpu select_task_rq() picks at the bottom of
+ * try_to_wake_up()
+ */
+static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
+{
+ if (!task_is_blocked(p))
+ return false;
+
+ guard(raw_spinlock)(&p->blocked_lock);
+
+ /* Task is waking up; clear any blocked_on relationship */
+ __clear_task_blocked_on(p, NULL);
+
+ /* If already current, don't need to return migrate */
+ if (task_current(rq, p))
+ return false;
+
+ /* If we're return migrating the rq->donor, switch it out for idle */
+ if (task_current_donor(rq, p))
+ proxy_resched_idle(rq);
+
+ block_task(rq, p, TASK_WAKING);
+ return true;
+}
+#else /* !CONFIG_SCHED_PROXY_EXEC */
+static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
+{
+ return false;
+}
+#endif /* CONFIG_SCHED_PROXY_EXEC */
+
static void
ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
struct rq_flags *rf)
@@ -3723,28 +3761,26 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
*/
static int ttwu_runnable(struct task_struct *p, int wake_flags)
{
- struct rq_flags rf;
- struct rq *rq;
- int ret = 0;
+ ACQUIRE(__task_rq_lock, guard)(p);
+ struct rq *rq = guard.rq;
- rq = __task_rq_lock(p, &rf);
- if (task_on_rq_queued(p)) {
- update_rq_clock(rq);
- if (p->se.sched_delayed)
- enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
- if (!task_on_cpu(rq, p)) {
- /*
- * When on_rq && !on_cpu the task is preempted, see if
- * it should preempt the task that is current now.
- */
- wakeup_preempt(rq, p, wake_flags);
- }
- ttwu_do_wakeup(p);
- ret = 1;
- }
- __task_rq_unlock(rq, p, &rf);
+ if (!task_on_rq_queued(p))
+ return 0;
- return ret;
+ update_rq_clock(rq);
+ if (p->se.sched_delayed)
+ enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+ if (proxy_needs_return(rq, p))
+ return 0;
+ if (!task_on_cpu(rq, p)) {
+ /*
+ * When on_rq && !on_cpu the task is preempted, see if
+ * it should preempt the task that is current now.
+ */
+ wakeup_preempt(rq, p, wake_flags);
+ }
+ ttwu_do_wakeup(p);
+ return 1;
}
void sched_ttwu_pending(void *arg)
@@ -4131,6 +4167,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* it disabling IRQs (this allows not taking ->pi_lock).
*/
WARN_ON_ONCE(p->se.sched_delayed);
+ /* If p is current, we know we can run here, so clear blocked_on */
+ clear_task_blocked_on(p, NULL);
if (!ttwu_state_match(p, state, &success))
goto out;
@@ -4147,6 +4185,15 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
*/
scoped_guard (raw_spinlock_irqsave, &p->pi_lock) {
smp_mb__after_spinlock();
+
+ /*
+ * We could get a wakeup from a signal which wouldn't
+ * mark the blocked_on state as PROXY_WAKING. So
+ * set the woken task as PROXY_WAKING here so we are
+ * sure the task will wake and run.
+ */
+ set_task_blocked_on_waking(p, NULL);
+
if (!ttwu_state_match(p, state, &success))
break;
@@ -4211,6 +4258,14 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
*/
WRITE_ONCE(p->__state, TASK_WAKING);
+ /*
+ * We never clear the blocked_on relation on proxy_deactivate.
+ * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
+ * when waking up. Since this is a fully blocked, off CPU task
+ * waking up, it should be safe to clear the blocked_on relation.
+ */
+ if (task_is_blocked(p))
+ clear_task_blocked_on(p, NULL);
/*
* If the owning (remote) CPU is still in the middle of schedule() with
* this task as prev, considering queueing p on the remote CPUs wake_list
@@ -4255,6 +4310,16 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
wake_flags |= WF_MIGRATED;
psi_ttwu_dequeue(p);
set_task_cpu(p, cpu);
+ } else if (cpu != p->wake_cpu) {
+ /*
+ * If we were proxy-migrated to cpu, then
+ * select_task_rq() picks cpu instead of wake_cpu
+ * to return to, we won't call set_task_cpu(),
+ * leaving a stale wake_cpu pointing to where we
+ * proxy-migrated from. So just fixup wake_cpu here
+ * if its not correct
+ */
+ p->wake_cpu = cpu;
}
ttwu_queue(p, cpu, wake_flags);
@@ -6542,7 +6607,7 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
if (signal_pending_state(task_state, p)) {
WRITE_ONCE(p->__state, TASK_RUNNING);
*task_state_p = TASK_RUNNING;
- set_task_blocked_on_waking(p, NULL);
+ clear_task_blocked_on(p, NULL);
return false;
}
@@ -6585,13 +6650,11 @@ static inline struct task_struct *proxy_resched_idle(struct rq *rq)
return rq->idle;
}
-static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
+static void proxy_deactivate(struct rq *rq, struct task_struct *donor)
{
unsigned long state = READ_ONCE(donor->__state);
- /* Don't deactivate if the state has been changed to TASK_RUNNING */
- if (state == TASK_RUNNING)
- return false;
+ WARN_ON_ONCE(state == TASK_RUNNING);
/*
* Because we got donor from pick_next_task(), it is *crucial*
* that we call proxy_resched_idle() before we deactivate it.
@@ -6602,7 +6665,7 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
* need to be changed from next *before* we deactivate.
*/
proxy_resched_idle(rq);
- return try_to_block_task(rq, donor, &state, true);
+ block_task(rq, donor, state);
}
static inline void proxy_release_rq_lock(struct rq *rq, struct rq_flags *rf)
@@ -6676,71 +6739,6 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
proxy_reacquire_rq_lock(rq, rf);
}
-static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
- struct task_struct *p)
- __must_hold(__rq_lockp(rq))
-{
- struct rq *task_rq, *target_rq = NULL;
- int cpu, wake_flag = WF_TTWU;
-
- lockdep_assert_rq_held(rq);
- WARN_ON(p == rq->curr);
-
- if (p == rq->donor)
- proxy_resched_idle(rq);
-
- proxy_release_rq_lock(rq, rf);
- /*
- * We drop the rq lock, and re-grab task_rq_lock to get
- * the pi_lock (needed for select_task_rq) as well.
- */
- scoped_guard (task_rq_lock, p) {
- task_rq = scope.rq;
-
- /*
- * Since we let go of the rq lock, the task may have been
- * woken or migrated to another rq before we got the
- * task_rq_lock. So re-check we're on the same RQ. If
- * not, the task has already been migrated and that CPU
- * will handle any futher migrations.
- */
- if (task_rq != rq)
- break;
-
- /*
- * Similarly, if we've been dequeued, someone else will
- * wake us
- */
- if (!task_on_rq_queued(p))
- break;
-
- /*
- * Since we should only be calling here from __schedule()
- * -> find_proxy_task(), no one else should have
- * assigned current out from under us. But check and warn
- * if we see this, then bail.
- */
- if (task_current(task_rq, p) || task_on_cpu(task_rq, p)) {
- WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n",
- __func__, cpu_of(task_rq),
- p->comm, p->pid, p->on_cpu);
- break;
- }
-
- update_rq_clock(task_rq);
- deactivate_task(task_rq, p, DEQUEUE_NOCLOCK);
- cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
- set_task_cpu(p, cpu);
- target_rq = cpu_rq(cpu);
- clear_task_blocked_on(p, NULL);
- }
-
- if (target_rq)
- attach_one_task(target_rq, p);
-
- proxy_reacquire_rq_lock(rq, rf);
-}
-
/*
* Find runnable lock owner to proxy for mutex blocked donor
*
@@ -6776,7 +6774,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
clear_task_blocked_on(p, PROXY_WAKING);
return p;
}
- goto force_return;
+ goto deactivate;
}
/*
@@ -6811,7 +6809,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
__clear_task_blocked_on(p, NULL);
return p;
}
- goto force_return;
+ goto deactivate;
}
if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
@@ -6890,12 +6888,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
return owner;
deactivate:
- if (proxy_deactivate(rq, donor))
- return NULL;
- /* If deactivate fails, force return */
- p = donor;
-force_return:
- proxy_force_return(rq, rf, p);
+ proxy_deactivate(rq, p);
return NULL;
migrate_task:
proxy_migrate_task(rq, rf, p, owner_cpu);
@@ -7043,6 +7036,9 @@ static void __sched notrace __schedule(int sched_mode)
if (sched_proxy_exec()) {
struct task_struct *prev_donor = rq->donor;
+ if (!prev_state && prev->blocked_on)
+ clear_task_blocked_on(prev, NULL);
+
rq_set_donor(rq, next);
if (unlikely(next->blocked_on)) {
next = find_proxy_task(rq, next, &rf);
--
2.53.0.1213.gd9a14994de-goog
next prev parent reply other threads:[~2026-04-04 5:36 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-04 5:36 [PATCH v27 00/10] Optimized Donor Migration for Proxy Execution John Stultz
2026-04-04 5:36 ` [PATCH v27 01/10] sched: Rework pick_next_task() and prev_balance() to avoid stale prev references John Stultz
2026-04-04 5:36 ` [PATCH v27 02/10] sched: Avoid donor->sched_class->yield_task() null traversal John Stultz
2026-04-04 5:57 ` K Prateek Nayak
2026-04-04 6:09 ` John Stultz
2026-04-04 5:36 ` [PATCH v27 03/10] sched: deadline: Add some helper variables to cleanup deadline logic John Stultz
2026-04-04 5:36 ` [PATCH v27 04/10] sched: deadline: Add dl_rq->curr pointer to address issues with Proxy Exec John Stultz
2026-04-04 5:36 ` [PATCH v27 05/10] sched: Rework block_task so it can be directly called John Stultz
2026-04-04 5:36 ` John Stultz [this message]
2026-04-04 5:36 ` [PATCH v27 07/10] sched/core: Reset the donor to current task when donor is woken John Stultz
2026-04-04 5:36 ` [PATCH v27 08/10] sched: Add blocked_donor link to task for smarter mutex handoffs John Stultz
2026-04-04 5:36 ` [PATCH v27 09/10] sched: Break out core of attach_tasks() helper into sched.h John Stultz
2026-04-04 5:36 ` [PATCH v27 10/10] sched: Migrate whole chain in proxy_migrate_task() John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260404053632.1729280-7-jstultz@google.com \
--to=jstultz@google.com \
--cc=Metin.Kaya@arm.com \
--cc=boqun.feng@gmail.com \
--cc=bsegall@google.com \
--cc=daniel.lezcano@linaro.org \
--cc=dietmar.eggemann@arm.com \
--cc=hupu.gm@gmail.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=kprateek.nayak@amd.com \
--cc=kuyo.chang@mediatek.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=qyousef@layalina.io \
--cc=rostedt@goodmis.org \
--cc=suleiman@google.com \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=will@kernel.org \
--cc=xuewen.yan94@gmail.com \
--cc=zezeozue@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.