From: Peter Zijlstra <peterz@infradead.org>
To: soolaugust@gmail.com
Cc: jstultz@google.com, juri.lelli@redhat.com, mingo@redhat.com,
linux-kernel@vger.kernel.org, zhidao su <suzhidao@xiaomi.com>,
Andrea Righi <arighi@nvidia.com>
Subject: Re: [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
Date: Fri, 3 Apr 2026 15:42:56 +0200
Message-ID: <20260403134256.GH3558198@noisy.programming.kicks-ass.net>
In-Reply-To: <20260403081215.3942454-1-soolaugust@gmail.com>
On Fri, Apr 03, 2026 at 04:12:15PM +0800, soolaugust@gmail.com wrote:
> From: zhidao su <suzhidao@xiaomi.com>
>
> commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server") added a
> dl_defer_running = 0 reset in the if-branch of update_dl_entity() to
> handle the case where [4] D->A is followed by [1] A->B (lapsed
> deadline). The intent was to ensure the server re-enters the zero-laxity
> wait when restarted after the deadline has passed.
>
> With Proxy Execution (PE), RT tasks proxied through the scheduler appear
> to trigger frequent dl_server_start() calls with expired deadlines. When
> this happens with dl_defer_running=1 (from a prior starvation episode),
> Peter's fix forces the fair_server back through the ~950ms zero-laxity
> wait each time.
>
> In our testing (virtme-ng, 4 CPUs, 4G RAM, ksched_football):
> With this fix: ~1s for all players to check in
> Without this fix: ~28s for all players to check in
>
> The issue appears to be that the clearing in update_dl_entity()'s
> if-branch is too aggressive for the PE use case.
> replenish_dl_new_period() already handles this via its internal guard:
>
> 	if (dl_se->dl_defer && !dl_se->dl_defer_running) {
> 		dl_se->dl_throttled = 1;
> 		dl_se->dl_defer_armed = 1;
> 	}
>
> When dl_defer_running=1 (starvation previously confirmed by the
> zero-laxity timer), replenish_dl_new_period() skips arming the
> zero-laxity timer, allowing the server to run directly. This seems
> correct: once starvation has been confirmed, subsequent start/stop
> cycles triggered by PE should not re-introduce the deferral delay.
>
> Note: this is the same change as the HACK revert in John's PE series
> (679ede58445 "HACK: Revert 'sched/deadline: Fix stuck dl_server'"),
> but with the rationale documented.
>
> The state machine comment is updated to reflect the actual behavior of
> replenish_dl_new_period() when dl_defer_running=1.
>
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>
> ---
>  kernel/sched/deadline.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 01754d699f0..30b03021fce 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1034,12 +1034,6 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>  			return;
>  		}
>
> -		/*
> -		 * When [4] D->A is followed by [1] A->B, dl_defer_running
> -		 * needs to be cleared, otherwise it will fail to properly
> -		 * start the zero-laxity timer.
> -		 */
> -		dl_se->dl_defer_running = 0;
>  		replenish_dl_new_period(dl_se, rq);
>  	} else if (dl_server(dl_se) && dl_se->dl_defer) {
>  		/*
This cannot be right; it will insta break Andrea's test case again.
And I cannot make sense of your explanation; how does PE cause what to
happen? You mention PROXY_WAKING, this then means proxy_force_return().
I suspect whatever it is you're seeing will go away once we delete that
thing, see this discussion:
https://lkml.kernel.org/r/20260402155055.GV3738010@noisy.programming.kicks-ass.net
Thread overview: 16+ messages
2026-04-02 13:30 [PATCH] sched/deadline: Fix stale dl_defer_running in dl_server else-branch soolaugust
2026-04-03 0:05 ` John Stultz
2026-04-03 1:30 ` John Stultz
2026-04-03 8:12 ` [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch soolaugust
2026-04-03 13:42 ` Peter Zijlstra [this message]
2026-04-03 13:58 ` Andrea Righi
2026-04-03 19:31 ` John Stultz
2026-04-03 22:46 ` Peter Zijlstra
2026-04-03 22:51 ` John Stultz
2026-04-03 22:54 ` John Stultz
2026-04-04 10:22 ` Peter Zijlstra
2026-04-05 8:37 ` zhidao su
2026-04-06 20:01 ` John Stultz
2026-04-06 20:03 ` John Stultz
2026-04-07 12:22 ` Juri Lelli
2026-04-07 15:00 ` Peter Zijlstra