public inbox for linux-kernel@vger.kernel.org
From: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <jstultz@google.com>,
	soolaugust@gmail.com, mingo@redhat.com,
	linux-kernel@vger.kernel.org, zhidao su <suzhidao@xiaomi.com>,
	Andrea Righi <arighi@nvidia.com>
Subject: Re: [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
Date: Tue, 7 Apr 2026 14:22:58 +0200
Message-ID: <adT3IndtrOTxESDF@jlelli-thinkpadt14gen4.remote.csb>
In-Reply-To: <20260404102244.GB22575@noisy.programming.kicks-ass.net>

On 04/04/26 12:22, Peter Zijlstra wrote:
> On Sat, Apr 04, 2026 at 12:46:10AM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 03, 2026 at 12:31:19PM -0700, John Stultz wrote:
> > 
> > > Using an 8 cpu VM with CONFIG_SCHED_PROXY_EXEC disabled:
> > > 
> > > With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
> > > reverted, I see the (expected, maybe) behavior where the starvation
> > > lasts ~1 second, then dl_server allows all the threads to spawn right
> > > away, and then the test runs for 10 seconds.
> > > 
> > > See perfetto chart:
> > >   https://ui.perfetto.dev/#!/?s=a729fd2dd4b224d6335c5b2e727dc1a1c302c11a
> > > (click the Kernel-threads track and scroll down to see the test
> > > threads named referee/defense/offense/crazy-fan)
> > > 
> > > With commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server")
> > > applied, it seems the dl_server boosting the kthreadd spawning is much
> > > more staggered. Again we spin up NR_CPU low priority threads, and
> > > there's ~1 second of starvation, then we spawn one of the mid threads,
> > > and another second delay, then there's a two second delay before we
> > > get the third running, then we get a small burst of 5 threads at once,
> > > then it falls back to 1 second or more per thread as it spawns off the
> > > rest. All in all it takes ~44 seconds just to spawn the threads before
> > > running the test.
> > > 
> > > Perfetto chart:
> > >   https://ui.perfetto.dev/#!/?s=ab8e487375d0c82ceea478ee4534a7189269c0d4
> > > 
> > > With higher cpu counts (64), the test effectively prevents the system
> > > from booting (trips the hung task watchdog).
> > > 
> > > I haven't really diagnosed the issue, but it feels a little like the
> > > dl_server is boosting until the fair rq is empty but then giving up
> > > the rest of its time, so if a fair task runs repeatedly but for a very
> > > short period of time, it won't get to run again until the next
> > > dl_server period? Causing this rate-limiting one-task-per-second
> > > effect for thread spawning? I still need to stare at the dl_server
> > > logic some more.
> > 
> > I'm getting a sense of deja-vu here. Didn't we cure this once before?
> > 
> > I'll go stare at this somewhere next week I suppose -- we have a long
> > weekend here.
> 
> Random brain wave...
> 
> Since the dl_server is LLF (deferred), it will pretty much always trip
> the dl_entity_overflow() when interrupted, right? Does it make sense to
> use the revised wake-up rule for it, when appropriate?
> 
> ---
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index d08b00429323..674de6a48551 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1027,7 +1027,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>  	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>  	    dl_entity_overflow(dl_se, rq_clock(rq))) {
>  
> -		if (unlikely(!dl_is_implicit(dl_se) &&
> +		if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
>  			     !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>  			     !is_dl_boosted(dl_se))) {
>  			update_dl_revised_wakeup(dl_se, rq);
> 

So the idea is to keep boosting, with the runtime reduced appropriately,
until the end of the current dl-server period. Makes sense to me.
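
For anyone following along, here is a rough user-space sketch of the
difference between a full replenish and the revised wake-up rule when the
overflow check trips. It is not the kernel code: struct dl_sketch and the
sketch_*() helpers are made-up names for illustration, the kernel's
scaling against multiplication overflow is left out, and the 50ms/1s
numbers in the note below are only example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT 20	/* fixed-point shift used for the density ratio */

/* Hypothetical, simplified stand-in for a deadline entity (times in ns). */
struct dl_sketch {
	uint64_t dl_runtime;	/* configured runtime per period */
	uint64_t dl_deadline;	/* configured relative deadline */
	uint64_t runtime;	/* remaining runtime */
	uint64_t deadline;	/* current absolute deadline */
};

/*
 * Overflow check: true when
 *   runtime / (deadline - now) > dl_runtime / dl_deadline
 * written as a cross-multiplication to avoid division.
 * Assumes deadline >= now and values small enough not to overflow u64.
 */
static bool sketch_overflow(const struct dl_sketch *se, uint64_t now)
{
	uint64_t left = se->dl_deadline * se->runtime;
	uint64_t right = (se->deadline - now) * se->dl_runtime;

	return right < left;
}

/* Full replenish: new deadline, full runtime. */
static void sketch_replenish(struct dl_sketch *se, uint64_t now)
{
	se->deadline = now + se->dl_deadline;
	se->runtime = se->dl_runtime;
}

/*
 * Revised wake-up: keep the current deadline and trim the runtime to
 * what the density allows for the time left (the laxity).
 */
static void sketch_revised_wakeup(struct dl_sketch *se, uint64_t now)
{
	uint64_t density, laxity;

	assert(se->deadline >= now);
	density = (se->dl_runtime << BW_SHIFT) / se->dl_deadline;
	laxity = se->deadline - now;
	se->runtime = (density * laxity) >> BW_SHIFT;
}

/* Wake-up path: use_revised selects the revised rule when it applies. */
static void sketch_update(struct dl_sketch *se, uint64_t now, bool use_revised)
{
	if (se->deadline < now || sketch_overflow(se, now)) {
		if (use_revised && se->deadline >= now) {
			sketch_revised_wakeup(se, now);
			return;
		}
		sketch_replenish(se, now);
	}
}
```

With dl_runtime = 50ms, dl_deadline = 1s, 40ms of runtime left and 400ms
to go until the deadline, the overflow check trips. The full replenish
pushes the deadline a whole period out from now, while the revised rule
keeps the deadline and trims the runtime to roughly 20ms, so the entity
can keep running within the current period.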

Thanks!
Juri

