From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: John Stultz <jstultz@google.com>,
LKML <linux-kernel@vger.kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
Date: Fri, 3 Apr 2026 11:52:25 +0200 [thread overview]
Message-ID: <20260403095225.GY3738010@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <e6dfce6a-ee6f-441b-9f21-e2036d0a3e5d@amd.com>
On Fri, Apr 03, 2026 at 11:39:25AM +0530, K Prateek Nayak wrote:
> >> @@ -3702,28 +3723,39 @@ ttwu_do_activate(struct rq *rq, struct t
> >> */
> >> static int ttwu_runnable(struct task_struct *p, int wake_flags)
> >> {
> >> - struct rq_flags rf;
> >> - struct rq *rq;
> >> - int ret = 0;
> >> + ACQUIRE(__task_rq_lock, guard)(p);
> >> + struct rq *rq = guard.rq;
> >>
> >> - rq = __task_rq_lock(p, &rf);
> >> - if (task_on_rq_queued(p)) {
> >> - update_rq_clock(rq);
> >> - if (p->se.sched_delayed)
> >> - enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> >> - if (!task_on_cpu(rq, p)) {
> >> + if (!task_on_rq_queued(p))
> >> + return 0;
> >> +
> >> + if (sched_proxy_exec() && p->blocked_on) {
> >> + guard(raw_spinlock)(&p->blocked_lock);
> >> + struct mutex *lock = p->blocked_on;
> >> + if (lock) {
> >> /*
> >> - * When on_rq && !on_cpu the task is preempted, see if
> >> - * it should preempt the task that is current now.
> >> + * TASK_WAKING is a special state and results in
> >> + * DEQUEUE_SPECIAL such that the task will actually be
> >> + * forced from the runqueue.
> >> */
> >> - wakeup_preempt(rq, p, wake_flags);
> >> + block_task(rq, p, TASK_WAKING);
>
> This needs to reset the rq->donor if the task getting woken up is the
> current donor.
*groan*, that is a fun case. I'll ponder that.
> >> + p->blocked_on = NULL;
> >> + return 0;
> >> }
> >> - ttwu_do_wakeup(p);
> >> - ret = 1;
> >> }
> >> - __task_rq_unlock(rq, p, &rf);
> >>
> >> - return ret;
> >> + update_rq_clock(rq);
>
> nit. Since block_task() adds a DEQUEUE_NOCLOCK now we need to move that
> clock update before the block.
D'oh :-)
> >> + if (p->se.sched_delayed)
> >> + enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> >
> > I can't precisely remember the details now, but I believe we need to
> > handle enqueueing sched_delayed tasks before handling blocked_on
> > tasks.
>
> So proxy_deactivate() can still delay the task, leading to
> task_on_rq_queued() and the wakeup coming to ttwu_runnable(), so either
> we can dequeue it fully in proxy_deactivate() or we need to teach
> block_task() to add a DEQUEUE_DELAYED flag when task_is_blocked().
>
> I think the former is cleaner but we don't decay lag for fair tasks :-(
>
> We can't simply re-enqueue it either since proxy migration might have
> put it on a CPU outside its affinity mask so we need to take a full
> dequeue + wakeup in ttwu_runnable().
Right, sanest option is to have ttwu_runnable() deal with this.
> >> -static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
> >> - struct task_struct *p)
> >> - __must_hold(__rq_lockp(rq))
> >> -{
> >> -}
> >> -
>
> Went a little heavy on the delete there, did you? :-)
Well, I thought that was the whole idea, have ttwu() handle this :-)
> >> /*
> >> * Find runnable lock owner to proxy for mutex blocked donor
> >> *
> >> @@ -6777,7 +6723,7 @@ find_proxy_task(struct rq *rq, struct ta
> >> clear_task_blocked_on(p, PROXY_WAKING);
> >> return p;
> >> }
> >> - goto force_return;
> >> + goto deactivate;
> >> }
>
> This makes sense if we preserve the !TASK_RUNNING + p->blocked_on
> invariant since we'll definitely get a wakeup here.
Right, so TASK_RUNNING must imply !->blocked_on.
> >>
> >> /*
> >> @@ -6812,7 +6758,7 @@ find_proxy_task(struct rq *rq, struct ta
> >> __clear_task_blocked_on(p, NULL);
> >> return p;
> >> }
> >> - goto force_return;
> >> + goto deactivate;
>
> This too makes sense considering !owner implies some task will be woken
> up but ... if we take this task off and another task steals the mutex,
> this task will no longer be able to proxy it since it is completely
> blocked now.
>
> Probably not desired. We should at least let it run and see if it can
> get the mutex and evaluate the "p->blocked_on" again since !owner is
> a limbo state.
I need to go re-read the mutex side of things, but doesn't that do
hand-off way more aggressively?
Anyway, one thing that is completely missing is a fast path for when the
task is still inside its valid mask. I suspect adding that back will
cure some of these issues.
> So I added the following on top of Peter's diff on top of
> queue:sched/core and it hasn't crashed and burnt yet when running a
> handful of instances of sched-messaging with a mix of fair and SCHED_RR
> priorities:
>
> (Includes John's findings from the parallel thread)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5b2b2451720a..e845e3a8ae65 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2160,7 +2160,7 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
> dequeue_task(rq, p, flags);
> }
>
> -static bool block_task(struct rq *rq, struct task_struct *p, unsigned long task_state)
> +static void block_task(struct rq *rq, struct task_struct *p, unsigned long task_state)
> {
> int flags = DEQUEUE_NOCLOCK;
>
> @@ -3696,6 +3696,20 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
> }
> }
>
> +static void zap_balance_callbacks(struct rq *rq);
> +
> +static inline void proxy_reset_donor(struct rq *rq)
> +{
> +#ifdef CONFIG_SCHED_PROXY_EXEC
> + WARN_ON_ONCE(rq->curr == rq->donor);
> +
> + put_prev_set_next_task(rq, rq->donor, rq->curr);
> + rq_set_donor(rq, rq->curr);
> + zap_balance_callbacks(rq);
> + resched_curr(rq);
> +#endif
> +}
This one hurts my brain :-)
> /*
> * Consider @p being inside a wait loop:
> *
> @@ -3730,6 +3744,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
> return 0;
>
> update_rq_clock(rq);
> + if (p->se.sched_delayed)
> + enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
Right, this works but seems wasteful; it might be better to add
DEQUEUE_DELAYED in the blocked_on case.
> if (sched_proxy_exec() && p->blocked_on) {
So I had doubts about this lockless test of ->blocked_on; I still cannot
convince myself it is correct.
> guard(raw_spinlock)(&p->blocked_lock);
> struct mutex *lock = p->blocked_on;
> @@ -3738,15 +3754,20 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
> * TASK_WAKING is a special state and results in
> * DEQUEUE_SPECIAL such that the task will actually be
> * forced from the runqueue.
> + *
> + * XXX: All of this is now equivalent of
> + * proxy_needs_return() from John's series :-)
> */
> - block_task(rq, p, TASK_WAKING);
> p->blocked_on = NULL;
> + if (task_current(rq, p))
> + goto out;
Right, fair enough :-) This could also be done when rq->cpu is inside
p->cpus_ptr mask, because in that case we don't strictly need a
migration. Thinking about that was on the todo list.
> + if (task_current_donor(rq, p))
> + proxy_reset_donor(rq);
Fun fun fun :-)
> + block_task(rq, p, TASK_WAKING);
> return 0;
> }
> }
> -
> - if (p->se.sched_delayed)
> - enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> +out:
> if (!task_on_cpu(rq, p)) {
> /*
> * When on_rq && !on_cpu the task is preempted, see if
> @@ -4256,6 +4277,15 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> */
> smp_cond_load_acquire(&p->on_cpu, !VAL);
>
> + /*
> + * We never clear the blocked_on relation on proxy_deactivate.
> + * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
> + * when waking up. Since this is a fully blocked, off CPU task
> + * waking up, it should be safe to clear the blocked_on relation.
> + */
> + if (task_is_blocked(p))
> + clear_task_blocked_on(p, NULL);
> +
Aah, yes! This is when find_proxy_task() hits deactivate() for us and we
skip ttwu_runnable(). We still need to clear ->blocked_on.
I am once again not sure about the lockless nature of accessing
->blocked_on.
> cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
> if (task_cpu(p) != cpu) {
> if (p->in_iowait) {
> @@ -6977,6 +7007,10 @@ static void __sched notrace __schedule(int sched_mode)
> switch_count = &prev->nvcsw;
> }
>
> + /* See: https://github.com/kudureranganath/linux/commit/0d6a01bb19db39f045d6f0f5fb4d196500091637 */
> + if (!prev_state && task_is_blocked(prev))
> + clear_task_blocked_on(prev, NULL);
> +
This one confuses me; ttwu() should never result in ->blocked_on being
set.
> pick_again:
> assert_balance_callbacks_empty(rq);
> next = pick_next_task(rq, rq->donor, &rf);