From: K Prateek Nayak <kprateek.nayak@amd.com>
To: <soolaugust@gmail.com>, <linux-kernel@vger.kernel.org>,
John Stultz <jstultz@google.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
zhidao su <suzhidao@xiaomi.com>
Subject: Re: [PATCH] sched/proxy_exec: Handle sched_delayed owner in find_proxy_task()
Date: Tue, 3 Mar 2026 11:29:39 +0530
Message-ID: <fabb2e9a-5141-4c30-baf2-05aae40d4ee9@amd.com>
In-Reply-To: <20260302101235.3988601-1-soolaugust@gmail.com>
(+ John)
Hello Zhidao,
On 3/2/2026 3:42 PM, soolaugust@gmail.com wrote:
> From: zhidao su <suzhidao@xiaomi.com>
>
> The blocked-owner check at the top of the inner loop unconditionally
> lumps two distinct states into one:
>
> 1. !on_rq -- the owner has fully left the runqueue; PE cannot
> proceed and proxy_deactivate() is the right action.
> 2. sched_delayed -- EEVDF deferred-dequeue: the owner called schedule()
> but was kept physically in the RB-tree because its
> lag was still positive (entity_eligible() == true).
>
> Case 2 is transient. The owner will resolve to one of two outcomes:
>
> * A wakeup arrives --> sched_delayed cleared, on_rq stays 1,
> owner eligible for PE on the next cycle.
> * Dequeue completes --> on_rq drops to 0, caught by case 1 above.
>
> Calling proxy_deactivate() in case 2 is unnecessarily aggressive: it
> removes the high-priority donor from the runqueue and clears its
> blocked_on, discarding valid PE state for a single missed cycle.
These bits are still in development and will get sorted later in the
series with the blocked owner handling. See
https://github.com/johnstultz-work/linux-dev/commit/e39257424cf2edb17b6be9be3cd50796d6650b1b
>
> A task that enters the mutex slowpath sets blocked_on before calling
> schedule(), and try_to_block_task() is only reached via the explicit
> DEQUEUE_DELAYED path -- not the sched_delayed shortcut. Therefore a
> sched_delayed owner never has blocked_on set and the chain cannot be
> followed further regardless.
>
> Split the check: keep proxy_deactivate() for !on_rq, and switch to
> proxy_resched_idle() for sched_delayed. This mirrors the existing
> handling of task_on_rq_migrating() owners (see proxy_resched_idle()
> call below), which also uses a yield-to-idle to handle a transient
> per-owner condition without disturbing the donor.
Just switching to idle will not alter the EEVDF state, and the pick
will still converge on the same task, whose owner will still be
delayed.

Until a wakeup or a full dequeue happens (and the owner could also be
on a remote CPU at this point), wouldn't the pick just keep spinning in
__schedule(), continuously calling proxy_resched_idle(), since nothing
has changed in the wait chain of this CPU?

    next = pick_next_task(); /* Gets a blocked donor */
    if (task_is_blocked(next))
      find_proxy_task(rq, next)
        ...
        if (owner->se.sched_delayed) /* Finds a delayed owner. */
          next = proxy_resched_idle(rq)

    /*
     * Switched to rq->idle with NEED_RESCHED set.
     * Comes back into __schedule().
     */

    next = pick_next_task();
    if (task_is_blocked(next)) /* Same blocked task! */
      find_proxy_task(rq, next)
        ...
        if (owner->se.sched_delayed) /* Owner still delayed */
          next = proxy_resched_idle(rq) /* Again switched to idle. */

    ... And the cycle repeats with preemption disabled !!!

This is terrible since a blocked owner can be delayed on a busy runqueue
for more than a few ticks. Sure, it is a transient state, but it can last
for a while depending on the state of the cfs_rq where it is delayed:
up to a few tens of milliseconds in a practical worst-case scenario.
>
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>
> ---
> kernel/sched/core.c | 25 +++++++++++++++++++++++--
> 1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6..dc9f17b35e4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6625,10 +6625,31 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> return p;
> }
>
> - if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
> - /* XXX Don't handle blocked owners/delayed dequeue yet */
> + if (!READ_ONCE(owner->on_rq)) {
> + /*
> + * Owner is off the runqueue; proxy execution cannot
> + * proceed through it. Deactivate the donor so it will
> + * be properly re-enqueued when the owner eventually
> + * wakes and releases the mutex.
> + */
> return proxy_deactivate(rq, donor);
> }
> + if (owner->se.sched_delayed) {
> + /*
> + * The owner is in EEVDF's deferred-dequeue state: it
> + * called schedule() but the scheduler kept it physically
> + * on the runqueue because its lag was still positive.
> + * This is a transient condition -- the owner will either
> + * be woken (clearing sched_delayed) or fully dequeued
> + * (clearing on_rq) very shortly.
> + *
> + * Unlike the !on_rq case the donor is still valid; do
> + * not deactivate it. Yield to idle so the owner can
> + * complete its state transition, then retry PE on the
> + * next scheduling cycle.
> + */
> + return proxy_resched_idle(rq);
proxy_deactivate() is the correct thing to do for now, until we get to
the blocked owner handling.
> + }
>
> if (task_cpu(owner) != this_cpu) {
> /* XXX Don't handle migrations yet */
--
Thanks and Regards,
Prateek