From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: John Stultz <jstultz@google.com>,
LKML <linux-kernel@vger.kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
Date: Thu, 2 Apr 2026 17:50:55 +0200
Message-ID: <20260402155055.GV3738010@noisy.programming.kicks-ass.net>
In-Reply-To: <1515d405-62fc-4952-842f-b69e2bf192c0@amd.com>
On Fri, Mar 27, 2026 at 10:27:08PM +0530, K Prateek Nayak wrote:
> So taking a step back, this is what we have today (at least the
> common scenario):
>
>   CPU0 (donor - A)                          CPU1 (owner - B)
>   ================                          ================
>
>   mutex_lock()
>     __set_current_state(TASK_INTERRUPTIBLE)
>     __set_task_blocked_on(M)
>     schedule()
>       /* Retained for proxy */
>       proxy_migrate_task()
>       ==================================>   /* Migrates to CPU1 */
>   ...
>   send_sig(B)
>     signal_wake_up_state()
>       wake_up_state()
>         try_to_wake_up()
>           ttwu_runnable()
>             ttwu_do_wakeup() =============> /* A->__state = TASK_RUNNING */
>
>                                             /*
>                                              * After this point ttwu_state_match()
>                                              * will fail for A so a mutex_unlock()
>                                              * will have to go through __schedule()
>                                              * for return migration.
>                                              */
>
>                                             __schedule()
>                                               find_proxy_task()
>
>                                                 /* Scenario 1 - B sleeps */
>                                                 __clear_task_blocked_on()
>                                                 proxy_deactivate(A)
>                                                   /* A->__state == TASK_RUNNING */
>                                                   /* fallthrough */
>
>                                                 /* Scenario 2 - return migration after unlock() */
>                                                 __clear_task_blocked_on()
>                                                 /*
>                                                  * At this point proxy stops.
>                                                  * Much later after signal.
>                                                  */
>                                                 proxy_force_return()
>       schedule() <==================================
>       signal_pending_state()
>
>     clear_task_blocked_on()
>     __set_current_state(TASK_RUNNING)
>
>   ... /* return with -EINTR */
>
>
> Basically, a blocked donor has to wait for a mutex_unlock() before it
> can go process the signal and bail out of mutex_lock_interruptible(),
> which seems counterproductive - but it is still okay from a correctness
> perspective.
>
> >
> > One thing you *can* do is frob ttwu_runnable() to 'refuse' to wake the
> > task, and then it goes into the normal path and will do the migration.
> > I've done things like that before.
> >
> > Does that fix all the return-migration cases?
>
> Yes it does! If we handle the return via ttwu_runnable(), which is what
> proxy_needs_return() in the next chunk of changes aims to do, we can
> build the invariant that TASK_RUNNING + task_is_blocked() is an illegal
> state outside of __schedule(), which works well with ttwu_state_match().
>
> >
> >> 2. Why does proxy_needs_return() (this comes later in John's tree but I
> >> moved it up ahead) need the proxy_task_runnable_but_waking() override
> >> of the ttwu_state_match() machinery?
> >> (https://github.com/johnstultz-work/linux-dev/commit/28ad4d3fa847b90713ca18a623d1ee7f73b648d9)
> >
> > Since it comes later, I've not seen it and not given it thought ;-)
> >
> > (I mean, I've probably seen it at some point, but being the gold-fish
> > that I am, I have no recollection, so I might as well not have seen it).
> >
> > A brief look now makes me confused. The comment fails to describe how
> > that situation could ever come to pass.
>
> That is a signal delivery happening before the unlock, which forces
> TASK_RUNNING; but since we are still waiting on an unlock, the wakeup
> from the unlock will see TASK_RUNNING + PROXY_WAKING.
>
> We then later force it on the ttwu path to do the return via
> ttwu_runnable().
So, I've not gone through all the cases yet, and it is *COMPLETELY*
untested, but something like the below perhaps?
---
include/linux/sched.h | 2
kernel/sched/core.c | 173 ++++++++++++++++----------------------------------
2 files changed, 58 insertions(+), 117 deletions(-)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -161,7 +161,7 @@ struct user_event_mm;
*/
#define is_special_task_state(state) \
((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | \
- TASK_DEAD | TASK_FROZEN))
+ TASK_DEAD | TASK_WAKING | TASK_FROZEN))
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define debug_normal_state_change(state_value) \
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2160,8 +2160,29 @@ void deactivate_task(struct rq *rq, stru
dequeue_task(rq, p, flags);
}
-static void block_task(struct rq *rq, struct task_struct *p, int flags)
+static void block_task(struct rq *rq, struct task_struct *p, unsigned long task_state)
{
+ int flags = DEQUEUE_NOCLOCK;
+
+ p->sched_contributes_to_load =
+ (task_state & TASK_UNINTERRUPTIBLE) &&
+ !(task_state & TASK_NOLOAD) &&
+ !(task_state & TASK_FROZEN);
+
+ if (unlikely(is_special_task_state(task_state)))
+ flags |= DEQUEUE_SPECIAL;
+
+ /*
+ * __schedule() ttwu()
+ * prev_state = prev->state; if (p->on_rq && ...)
+ * if (prev_state) goto out;
+ * p->on_rq = 0; smp_acquire__after_ctrl_dep();
+ * p->state = TASK_WAKING
+ *
+ * Where __schedule() and ttwu() have matching control dependencies.
+ *
+ * After this, schedule() must not care about p->state any more.
+ */
if (dequeue_task(rq, p, DEQUEUE_SLEEP | flags))
__block_task(rq, p);
}
@@ -3702,28 +3723,39 @@ ttwu_do_activate(struct rq *rq, struct t
*/
static int ttwu_runnable(struct task_struct *p, int wake_flags)
{
- struct rq_flags rf;
- struct rq *rq;
- int ret = 0;
+ ACQUIRE(__task_rq_lock, guard)(p);
+ struct rq *rq = guard.rq;
- rq = __task_rq_lock(p, &rf);
- if (task_on_rq_queued(p)) {
- update_rq_clock(rq);
- if (p->se.sched_delayed)
- enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
- if (!task_on_cpu(rq, p)) {
+ if (!task_on_rq_queued(p))
+ return 0;
+
+ if (sched_proxy_exec() && p->blocked_on) {
+ guard(raw_spinlock)(&p->blocked_lock);
+ struct mutex *lock = p->blocked_on;
+ if (lock) {
/*
- * When on_rq && !on_cpu the task is preempted, see if
- * it should preempt the task that is current now.
+ * TASK_WAKING is a special state and results in
+ * DEQUEUE_SPECIAL such that the task will actually be
+ * forced from the runqueue.
*/
- wakeup_preempt(rq, p, wake_flags);
+ block_task(rq, p, TASK_WAKING);
+ p->blocked_on = NULL;
+ return 0;
}
- ttwu_do_wakeup(p);
- ret = 1;
}
- __task_rq_unlock(rq, p, &rf);
- return ret;
+ update_rq_clock(rq);
+ if (p->se.sched_delayed)
+ enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+ if (!task_on_cpu(rq, p)) {
+ /*
+ * When on_rq && !on_cpu the task is preempted, see if
+ * it should preempt the task that is current now.
+ */
+ wakeup_preempt(rq, p, wake_flags);
+ }
+ ttwu_do_wakeup(p);
+ return 1;
}
void sched_ttwu_pending(void *arg)
@@ -6519,7 +6551,6 @@ static bool try_to_block_task(struct rq
unsigned long *task_state_p, bool should_block)
{
unsigned long task_state = *task_state_p;
- int flags = DEQUEUE_NOCLOCK;
if (signal_pending_state(task_state, p)) {
WRITE_ONCE(p->__state, TASK_RUNNING);
@@ -6539,26 +6570,7 @@ static bool try_to_block_task(struct rq
if (!should_block)
return false;
- p->sched_contributes_to_load =
- (task_state & TASK_UNINTERRUPTIBLE) &&
- !(task_state & TASK_NOLOAD) &&
- !(task_state & TASK_FROZEN);
-
- if (unlikely(is_special_task_state(task_state)))
- flags |= DEQUEUE_SPECIAL;
-
- /*
- * __schedule() ttwu()
- * prev_state = prev->state; if (p->on_rq && ...)
- * if (prev_state) goto out;
- * p->on_rq = 0; smp_acquire__after_ctrl_dep();
- * p->state = TASK_WAKING
- *
- * Where __schedule() and ttwu() have matching control dependencies.
- *
- * After this, schedule() must not care about p->state any more.
- */
- block_task(rq, p, flags);
+ block_task(rq, p, task_state);
return true;
}
@@ -6586,13 +6598,12 @@ static inline struct task_struct *proxy_
return rq->idle;
}
-static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
+static void proxy_deactivate(struct rq *rq, struct task_struct *donor)
{
unsigned long state = READ_ONCE(donor->__state);
- /* Don't deactivate if the state has been changed to TASK_RUNNING */
- if (state == TASK_RUNNING)
- return false;
+ WARN_ON_ONCE(state == TASK_RUNNING);
+
/*
* Because we got donor from pick_next_task(), it is *crucial*
* that we call proxy_resched_idle() before we deactivate it.
@@ -6603,7 +6614,7 @@ static bool proxy_deactivate(struct rq *
* need to be changed from next *before* we deactivate.
*/
proxy_resched_idle(rq);
- return try_to_block_task(rq, donor, &state, true);
+ block_task(rq, donor, state);
}
static inline void proxy_release_rq_lock(struct rq *rq, struct rq_flags *rf)
@@ -6677,71 +6688,6 @@ static void proxy_migrate_task(struct rq
proxy_reacquire_rq_lock(rq, rf);
}
-static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
- struct task_struct *p)
- __must_hold(__rq_lockp(rq))
-{
- struct rq *task_rq, *target_rq = NULL;
- int cpu, wake_flag = WF_TTWU;
-
- lockdep_assert_rq_held(rq);
- WARN_ON(p == rq->curr);
-
- if (p == rq->donor)
- proxy_resched_idle(rq);
-
- proxy_release_rq_lock(rq, rf);
- /*
- * We drop the rq lock, and re-grab task_rq_lock to get
- * the pi_lock (needed for select_task_rq) as well.
- */
- scoped_guard (task_rq_lock, p) {
- task_rq = scope.rq;
-
- /*
- * Since we let go of the rq lock, the task may have been
- * woken or migrated to another rq before we got the
- * task_rq_lock. So re-check we're on the same RQ. If
- * not, the task has already been migrated and that CPU
- * will handle any futher migrations.
- */
- if (task_rq != rq)
- break;
-
- /*
- * Similarly, if we've been dequeued, someone else will
- * wake us
- */
- if (!task_on_rq_queued(p))
- break;
-
- /*
- * Since we should only be calling here from __schedule()
- * -> find_proxy_task(), no one else should have
- * assigned current out from under us. But check and warn
- * if we see this, then bail.
- */
- if (task_current(task_rq, p) || task_on_cpu(task_rq, p)) {
- WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n",
- __func__, cpu_of(task_rq),
- p->comm, p->pid, p->on_cpu);
- break;
- }
-
- update_rq_clock(task_rq);
- deactivate_task(task_rq, p, DEQUEUE_NOCLOCK);
- cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
- set_task_cpu(p, cpu);
- target_rq = cpu_rq(cpu);
- clear_task_blocked_on(p, NULL);
- }
-
- if (target_rq)
- attach_one_task(target_rq, p);
-
- proxy_reacquire_rq_lock(rq, rf);
-}
-
/*
* Find runnable lock owner to proxy for mutex blocked donor
*
@@ -6777,7 +6723,7 @@ find_proxy_task(struct rq *rq, struct ta
clear_task_blocked_on(p, PROXY_WAKING);
return p;
}
- goto force_return;
+ goto deactivate;
}
/*
@@ -6812,7 +6758,7 @@ find_proxy_task(struct rq *rq, struct ta
__clear_task_blocked_on(p, NULL);
return p;
}
- goto force_return;
+ goto deactivate;
}
if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
@@ -6891,12 +6837,7 @@ find_proxy_task(struct rq *rq, struct ta
return owner;
deactivate:
- if (proxy_deactivate(rq, donor))
- return NULL;
- /* If deactivate fails, force return */
- p = donor;
-force_return:
- proxy_force_return(rq, rf, p);
+ proxy_deactivate(rq, donor);
return NULL;
migrate_task:
proxy_migrate_task(rq, rf, p, owner_cpu);
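
For readers following along, the control flow that the diff above gives ttwu_runnable() can be modelled in user space. The sketch below is not kernel code: struct toy_task, the toy_* functions and the three-value state enum are hypothetical stand-ins, locking and the runqueue are left out entirely, and the "full wakeup path" is reduced to a plain CPU assignment. It only illustrates the idea of refusing the in-place wakeup for a queued task that is still blocked on a mutex, so that the normal path ends up doing the return migration.

```c
/* Toy user-space model of the idea; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_RUNNING, TOY_INTERRUPTIBLE, TOY_WAKING };

struct toy_task {
	enum toy_state state;
	bool on_rq;             /* still queued on some runqueue */
	int cpu;                /* CPU whose runqueue holds the task */
	const void *blocked_on; /* mutex being waited on, or NULL */
};

/*
 * Fast path: return 1 if the task was made runnable in place, or 0 if
 * the caller has to fall back to the full wakeup path.
 */
static int toy_ttwu_runnable(struct toy_task *p)
{
	if (!p->on_rq)
		return 0;

	if (p->blocked_on) {
		/*
		 * 'Refuse' the wakeup: take the task off the runqueue and
		 * drop the blocked_on relation. The state is left alone,
		 * so the task is never RUNNING while still blocked.
		 */
		p->on_rq = false;
		p->blocked_on = NULL;
		return 0;
	}

	p->state = TOY_RUNNING;
	return 1;
}

/*
 * Simplified wakeup: target_cpu stands in for whatever the real CPU
 * selection would pick. State matching and locking are omitted.
 */
static void toy_try_to_wake_up(struct toy_task *p, int target_cpu)
{
	if (toy_ttwu_runnable(p))
		return;

	p->state = TOY_WAKING;
	p->cpu = target_cpu;    /* this is where return migration happens */
	p->on_rq = true;
	p->state = TOY_RUNNING;
}

/* The invariant discussed in the thread: RUNNING + blocked is illegal. */
static bool toy_invariant_holds(const struct toy_task *p)
{
	return !(p->state == TOY_RUNNING && p->blocked_on != NULL);
}
```

In this model a donor that was proxy-migrated to CPU1 and then receives a wakeup is dequeued by the fast path and re-placed by the slow path, which is the behaviour the thread is after; whether the real patch gets all the corner cases right is a separate question.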