From: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <jstultz@google.com>,
soolaugust@gmail.com, mingo@redhat.com,
linux-kernel@vger.kernel.org, zhidao su <suzhidao@xiaomi.com>,
Andrea Righi <arighi@nvidia.com>
Subject: Re: [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
Date: Tue, 7 Apr 2026 14:22:58 +0200
Message-ID: <adT3IndtrOTxESDF@jlelli-thinkpadt14gen4.remote.csb>
In-Reply-To: <20260404102244.GB22575@noisy.programming.kicks-ass.net>
On 04/04/26 12:22, Peter Zijlstra wrote:
> On Sat, Apr 04, 2026 at 12:46:10AM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 03, 2026 at 12:31:19PM -0700, John Stultz wrote:
> >
> > > Using an 8 cpu VM with CONFIG_SCHED_PROXY_EXEC disabled:
> > >
> > > With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
> > > reverted, I see the (expected, maybe) behavior where the starvation
> > > lasts ~1 second, then dl_server allows all the threads to spawn right
> > > away, and then the test runs for 10 seconds.
> > >
> > > See perfetto chart:
> > > https://ui.perfetto.dev/#!/?s=a729fd2dd4b224d6335c5b2e727dc1a1c302c11a
> > > (click the Kernel-threads track and scroll down to see the test
> > > threads named referee/defense/offense/crazy-fan)
> > >
> > > With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
> > > applied, it seems the dl_server boosting the kthreadd spawning is much
> > > more staggered. Again we spin up NR_CPU low priority threads, and
> > > there's ~1 second of starvation, then we spawn one of the mid threads,
> > > and another second delay, then there's a two second delay before we
> > > get the third running, then we get a small burst of 5 threads at once,
> > > then it falls back to 1 second or more per thread as it spawns off the
> > > rest. All in all it takes ~44 seconds just to spawn the threads before
> > > running the test.
> > >
> > > Perfetto chart:
> > > https://ui.perfetto.dev/#!/?s=ab8e487375d0c82ceea478ee4534a7189269c0d4
> > >
> > > With higher cpu counts (64), the test effectively prevents the system
> > > from booting (trips the hung task watchdog).
> > >
> > > I haven't really diagnosed the issue, but it feels a little like the
> > > dl_server is boosting until the fair rq is empty but then giving up
> > > the rest of its time, so if a fair task runs repeatedly but for a very
> > > short period of time, it won't get to run again until the next
> > > dl_server period? Causing this rate-limiting one-task-per-second
> > > effect for thread spawning? I still need to stare at the dl_server
> > > logic some more.
> >
> > I'm getting a sense of deja-vu here. Didn't we cure this once before?
> >
> > I'll go stare at this somewhere next week I suppose -- we have a long
> > weekend here.
>
> Random brain wave...
>
> Since the dl_server is LLF (deferred), it will pretty much always trip
> the dl_entity_overflow() when interrupted, right? Does it make sense to
> use the revised wake-up rule for it, when appropriate?
>
> ---
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index d08b00429323..674de6a48551 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1027,7 +1027,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>  	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>  	    dl_entity_overflow(dl_se, rq_clock(rq))) {
>  
> -		if (unlikely(!dl_is_implicit(dl_se) &&
> +		if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
>  			     !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>  			     !is_dl_boosted(dl_se))) {
>  			update_dl_revised_wakeup(dl_se, rq);
>
So the idea is to keep boosting, with runtime reduced appropriately, until
the end of the current dl-server period. Makes sense to me.
Thanks!
Juri
Thread overview: 17+ messages
2026-04-02 13:30 [PATCH] sched/deadline: Fix stale dl_defer_running in dl_server else-branch soolaugust
2026-04-03 0:05 ` John Stultz
2026-04-03 1:30 ` John Stultz
2026-04-03 8:12 ` [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch soolaugust
2026-04-03 13:42 ` Peter Zijlstra
2026-04-03 13:58 ` Andrea Righi
2026-04-03 19:31 ` John Stultz
2026-04-03 22:46 ` Peter Zijlstra
2026-04-03 22:51 ` John Stultz
2026-04-03 22:54 ` John Stultz
2026-04-04 10:22 ` Peter Zijlstra
2026-04-05 8:37 ` zhidao su
2026-04-06 20:01 ` John Stultz
2026-04-06 20:03 ` John Stultz
2026-04-07 12:22 ` Juri Lelli [this message]
2026-04-07 15:00 ` Peter Zijlstra
2026-04-08 11:20 ` [tip: sched/urgent] sched/deadline: Use revised wakeup rule for dl_server tip-bot2 for Peter Zijlstra