From: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <jstultz@google.com>,
soolaugust@gmail.com, mingo@redhat.com,
linux-kernel@vger.kernel.org, zhidao su <suzhidao@xiaomi.com>,
Andrea Righi <arighi@nvidia.com>
Subject: Re: [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
Date: Tue, 7 Apr 2026 14:22:58 +0200
Message-ID: <adT3IndtrOTxESDF@jlelli-thinkpadt14gen4.remote.csb>
In-Reply-To: <20260404102244.GB22575@noisy.programming.kicks-ass.net>

On 04/04/26 12:22, Peter Zijlstra wrote:
> On Sat, Apr 04, 2026 at 12:46:10AM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 03, 2026 at 12:31:19PM -0700, John Stultz wrote:
> >
> > > Using an 8 cpu VM with CONFIG_SCHED_PROXY_EXEC disabled:
> > >
> > > With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
> > > reverted, I see the (expected, maybe) behavior where the starvation
> > > lasts ~1 second, then dl_server allows all the threads to spawn right
> > > away, and then the test runs for 10 seconds.
> > >
> > > See perfetto chart:
> > > https://ui.perfetto.dev/#!/?s=a729fd2dd4b224d6335c5b2e727dc1a1c302c11a
> > > (click the Kernel-threads track and scroll down to see the test
> > > threads named referee/defense/offense/crazy-fan)
> > >
> > > With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
> > > applied, it seems the dl_server boosting the kthreadd spawning is much
> > > more staggered. Again we spin up NR_CPU low priority threads, and
> > > there's ~1 second of starvation, then we spawn one of the mid threads,
> > > and another second delay, then there's a two second delay before we
> > > get the third running, then we get a small burst of 5 threads at once,
> > > then it falls back to 1 second or more per thread as it spawns off the
> > > rest. All in all it takes ~44 seconds just to spawn the threads before
> > > running the test.
> > >
> > > Perfetto chart:
> > > https://ui.perfetto.dev/#!/?s=ab8e487375d0c82ceea478ee4534a7189269c0d4
> > >
> > > With higher cpu counts (64), the test effectively prevents the system
> > > from booting (trips the hung task watchdog).
> > >
> > > I haven't really diagnosed the issue, but it feels a little like the
> > > dl_server is boosting until the fair rq is empty but then giving up
> > > the rest of its time, so if a fair task runs repeatedly but for a very
> > > short period of time, it won't get to run again until the next
> > > dl_server period? Causing this rate-limiting one-task-per-second
> > > effect for thread spawning? I still need to stare at the dl_server
> > > logic some more.
> >
> > I'm getting a sense of deja-vu here. Didn't we cure this once before?
> >
> > I'll go stare at this somewhere next week I suppose -- we have a long
> > weekend here.
>
> Random brain wave...
>
> Since the dl_server is LLF (deferred), it will pretty much always trip
> the dl_entity_overflow() when interrupted, right? Does it make sense to
> use the revised wake-up rule for it, when appropriate?
>
> ---
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index d08b00429323..674de6a48551 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1027,7 +1027,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>  	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>  	    dl_entity_overflow(dl_se, rq_clock(rq))) {
>
> -		if (unlikely(!dl_is_implicit(dl_se) &&
> +		if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
>  			     !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>  			     !is_dl_boosted(dl_se))) {
>  			update_dl_revised_wakeup(dl_se, rq);
>

So the idea is to keep boosting, with the runtime reduced appropriately,
until the end of the current dl-server period. Makes sense to me.

Thanks!
Juri
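
For readers following the thread, the overflow check and the revised
wake-up rule discussed above can be sketched as a small user-space model.
This is only an illustration under stated assumptions, not the kernel
code: struct dl_model, model_overflow(), model_revised_wakeup() and
MODEL_BW_SHIFT are hypothetical names, times are in microseconds, and the
scaling the kernel applies to avoid 64-bit overflow with nanosecond
values is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BW_SHIFT 20 /* fixed-point shift for the runtime/deadline ratio */

/* Hypothetical user-space model of a deadline entity; times in microseconds. */
struct dl_model {
	uint64_t dl_runtime;  /* configured budget per period */
	uint64_t dl_deadline; /* configured relative deadline */
	uint64_t runtime;     /* budget left in the current period */
	uint64_t deadline;    /* current absolute deadline */
};

/*
 * True if spending the leftover runtime before the current deadline would
 * exceed the configured density:
 *   runtime / (deadline - now) > dl_runtime / dl_deadline
 * The comparison is cross-multiplied to avoid a division.
 * The caller must ensure deadline >= now.
 */
static bool model_overflow(const struct dl_model *se, uint64_t now)
{
	uint64_t left = se->dl_deadline * se->runtime;
	uint64_t right = (se->deadline - now) * se->dl_runtime;

	return right < left;
}

/*
 * Revised wake-up rule: keep the current absolute deadline and shrink the
 * remaining runtime to density * laxity, instead of starting a new period.
 */
static void model_revised_wakeup(struct dl_model *se, uint64_t now)
{
	uint64_t density, laxity;

	assert(se->deadline >= now);
	density = (se->dl_runtime << MODEL_BW_SHIFT) / se->dl_deadline;
	laxity = se->deadline - now;
	se->runtime = (density * laxity) >> MODEL_BW_SHIFT;
}
```

As an illustrative case (values chosen for the example, not taken from
the traces above): with 50 ms of runtime per 1 s period, a server that
wakes 30 ms before its deadline with 40 ms of runtime left trips
model_overflow(). model_revised_wakeup() then keeps the deadline and
trims the runtime to roughly 1.5 ms, so the entity can keep running in
the current period rather than waiting for a fresh one.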