public inbox for linux-kernel@vger.kernel.org
From: soolaugust@gmail.com
To: jstultz@google.com, juri.lelli@redhat.com
Cc: peterz@infradead.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, zhidao su <suzhidao@xiaomi.com>
Subject: [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
Date: Fri,  3 Apr 2026 16:12:15 +0800
Message-ID: <20260403081215.3942454-1-soolaugust@gmail.com>
In-Reply-To: <CANDhNCoHSfNtQaUGz1h1FXFHZ1Kavtit2+F701kw7imztfiLRQ@mail.gmail.com>

From: zhidao su <suzhidao@xiaomi.com>

Commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server") added a
dl_defer_running = 0 reset in the if-branch of update_dl_entity() to
handle the case where [4] D->A is followed by [1] A->B (lapsed
deadline). The intent was to ensure the server re-enters the zero-laxity
wait when restarted after the deadline has passed.

With Proxy Execution (PE), RT tasks proxied through the scheduler appear
to trigger frequent dl_server_start() calls with expired deadlines. When
this happens with dl_defer_running=1 (from a prior starvation episode),
Peter's fix forces the fair_server back through the ~950ms zero-laxity
wait each time.
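
As a rough illustration of where a figure like ~950ms can come from, the
sketch below computes the zero-laxity delay as period minus runtime. It
assumes the fair_server runs with a 50ms runtime per 1000ms period and
that the deferral timer fires at deadline minus runtime; neither value is
stated in this message, and the toy_* helpers are hypothetical names, not
kernel functions.

```c
#include <stdint.h>

#define TOY_NSEC_PER_MSEC 1000000ULL

/*
 * Delay from the start of a fresh period to the zero-laxity point, i.e.
 * the latest instant at which the full runtime still fits before the
 * deadline. Assumes deadline == period (an assumption of this sketch).
 */
static uint64_t toy_zero_laxity_delay_ns(uint64_t runtime_ns, uint64_t period_ns)
{
	if (runtime_ns >= period_ns)
		return 0;	/* no slack at all: nothing to wait for */
	return period_ns - runtime_ns;
}

/*
 * Cumulative deferral if each of 'restarts' restarts has to wait out the
 * full zero-laxity delay before the server may run.
 */
static uint64_t toy_total_deferral_ns(unsigned int restarts,
				      uint64_t runtime_ns, uint64_t period_ns)
{
	return (uint64_t)restarts *
	       toy_zero_laxity_delay_ns(runtime_ns, period_ns);
}
```

With the assumed 50ms/1000ms parameters the first helper yields 950ms, so
the second shows how a handful of forced restarts could add up to several
seconds of deferral.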

In our testing (virtme-ng, 4 CPUs, 4G RAM, ksched_football):
  With this patch:    ~1s for all players to check in
  Without this patch: ~28s for all players to check in

The issue appears to be that unconditionally clearing dl_defer_running
in update_dl_entity()'s if-branch is too aggressive for the PE use case.
replenish_dl_new_period() already decides whether to defer via its
internal guard:

  if (dl_se->dl_defer && !dl_se->dl_defer_running) {
      dl_se->dl_throttled = 1;
      dl_se->dl_defer_armed = 1;
  }

When dl_defer_running=1 (starvation previously confirmed by the
zero-laxity timer), replenish_dl_new_period() skips arming the
zero-laxity timer, allowing the server to run directly. This seems
correct: once starvation has been confirmed, subsequent start/stop
cycles triggered by PE should not re-introduce the deferral delay.
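
The effect of the guard quoted above can be sketched with a small
userspace model. This is not kernel code: struct toy_dl_se and the toy_*
functions are hypothetical stand-ins that keep only the four flags
discussed here, and the clear_running parameter is an illustration device
for comparing the behaviour before and after this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the deferral bits of struct sched_dl_entity. */
struct toy_dl_se {
	bool dl_defer;
	bool dl_defer_running;
	bool dl_throttled;
	bool dl_defer_armed;
};

/*
 * Models only the guard from replenish_dl_new_period(); the real function
 * also forwards the deadline and refills the runtime.
 */
static void toy_replenish_new_period(struct toy_dl_se *se)
{
	if (se->dl_defer && !se->dl_defer_running) {
		se->dl_throttled = true;
		se->dl_defer_armed = true;
	}
}

/*
 * Models a restart with a lapsed deadline (the if-branch of
 * update_dl_entity()). clear_running == true corresponds to the behaviour
 * before this patch, false to the behaviour after it.
 * Returns true if the zero-laxity timer would be armed.
 */
static bool toy_restart_lapsed(struct toy_dl_se *se, bool clear_running)
{
	assert(se);

	se->dl_throttled = false;
	se->dl_defer_armed = false;
	if (clear_running)
		se->dl_defer_running = false;
	toy_replenish_new_period(se);
	return se->dl_defer_armed;
}
```

Starting from dl_defer=1 and dl_defer_running=1, calling
toy_restart_lapsed() with clear_running set arms the timer and throttles
the server, while calling it with clear_running unset leaves the server
free to run immediately.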

Note: this is the same change as the HACK revert in John's PE series
(679ede58445 "HACK: Revert 'sched/deadline: Fix stuck dl_server'"),
but with the rationale documented.

The state machine comment is updated to reflect the actual behavior of
replenish_dl_new_period() when dl_defer_running=1.

Signed-off-by: zhidao su <suzhidao@xiaomi.com>
---
 kernel/sched/deadline.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 01754d699f0..30b03021fce 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1034,12 +1034,6 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
 			return;
 		}
 
-		/*
-		 * When [4] D->A is followed by [1] A->B, dl_defer_running
-		 * needs to be cleared, otherwise it will fail to properly
-		 * start the zero-laxity timer.
-		 */
-		dl_se->dl_defer_running = 0;
 		replenish_dl_new_period(dl_se, rq);
 	} else if (dl_server(dl_se) && dl_se->dl_defer) {
 		/*
@@ -1662,11 +1656,11 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
  *   enqueue_dl_entity()
  *     update_dl_entity(WAKEUP)
  *       if (dl_time_before() || dl_entity_overflow())
- *         dl_defer_running = 0;
  *         replenish_dl_new_period();
  *           // fwd period
- *           dl_throttled = 1;
- *           dl_defer_armed = 1;
+ *           if (!dl_defer_running)
+ *             dl_throttled = 1;
+ *             dl_defer_armed = 1;
  *       if (!dl_defer_running)
  *         dl_defer_armed = 1;
  *         dl_throttled = 1;
-- 
2.43.0


Thread overview: 17+ messages
2026-04-02 13:30 [PATCH] sched/deadline: Fix stale dl_defer_running in dl_server else-branch soolaugust
2026-04-03  0:05 ` John Stultz
2026-04-03  1:30   ` John Stultz
2026-04-03  8:12     ` soolaugust [this message]
2026-04-03 13:42       ` [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch Peter Zijlstra
2026-04-03 13:58         ` Andrea Righi
2026-04-03 19:31         ` John Stultz
2026-04-03 22:46           ` Peter Zijlstra
2026-04-03 22:51             ` John Stultz
2026-04-03 22:54               ` John Stultz
2026-04-04 10:22             ` Peter Zijlstra
2026-04-05  8:37               ` zhidao su
2026-04-06 20:01               ` John Stultz
2026-04-06 20:03                 ` John Stultz
2026-04-07 12:22               ` Juri Lelli
2026-04-07 15:00                 ` Peter Zijlstra
2026-04-08 11:20               ` [tip: sched/urgent] sched/deadline: Use revised wakeup rule for dl_server tip-bot2 for Peter Zijlstra
