From: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <jstultz@google.com>,
LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Xuewen Yan <xuewen.yan94@gmail.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Suleiman Souhlal <suleiman@google.com>,
Qais Yousef <qyousef@layalina.io>,
Joel Fernandes <joelagnelf@nvidia.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck, allowing cpu starvation
Date: Tue, 16 Sep 2025 14:52:44 +0200
Message-ID: <aMldnFrGfcMECbmK@jlelli-thinkpadt14gen4.remote.csb>
In-Reply-To: <20250916110155.GH3245006@noisy.programming.kicks-ass.net>
On 16/09/25 13:01, Peter Zijlstra wrote:
> On Tue, Sep 16, 2025 at 10:51:34AM +0200, Juri Lelli wrote:
>
> > > @@ -1173,7 +1171,7 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_
> > >
> > > if (!dl_se->server_has_tasks(dl_se)) {
> > > replenish_dl_entity(dl_se);
> > > - dl_server_stopped(dl_se);
> > > + dl_server_stop(dl_se);
> > > return HRTIMER_NORESTART;
> > > }
> >
> > It looks OK from the quick testing I've done. It also seems to make sense
> > to me. The defer timer has fired (we are executing the callback). If the
> > server has no tasks to serve, we can just stop it (clearing the
> > flags) and wait for the next fair enqueue to start it again, still in
> > defer mode. hrtimer_try_to_cancel() is redundant (but harmless); the
> > dequeue_dl_entity() call, I believe, we do need, to deal with
> > task_non_contending().
> >
> > Peter, what do you think?
>
> Well, the problem was that we were starting/stopping the thing too
> often, and the general idea of that commit:
>
> cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
>
> was to not stop the server unless it hasn't seen fair tasks for a whole
> period.
>
> Now, the case John trips over seems to be that there were tasks, we ran
> them until the budget was exhausted, dequeued the server and did start_dl_timer().
>
> Then the bandwidth timer fires at a point where there are no more fair
> tasks, replenish_dl_entity() gets called, which *should* set the
> 0-laxity timer, but doesn't -- because !server_has_tasks() -- and then
> nothing.
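A minimal toy model of the sequence described above, in C. Everything in it is hypothetical (struct toy_server, toy_bw_timer_fires(), toy_enqueue_fair(), toy_is_stuck() are illustrative names, not kernel code), and it rests on two simplifying assumptions: the timer callback does not stop the server on this pass, and starting an already-active server is a no-op. Under those assumptions, gating the 0-laxity timer on server_has_tasks() leaves the server active with nothing pending, so a later fair enqueue cannot get it running again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a deferred server; not kernel structures. */
struct toy_server {
	bool active;       /* server is still considered started */
	bool timer_armed;  /* some pending timer will run it again */
};

/*
 * Bandwidth (replenishment) timer expiry.  When @gate_on_tasks is true,
 * the 0-laxity (defer) timer is armed only if fair tasks exist, mirroring
 * the server_has_tasks() check in replenish_dl_entity().  Assumption: the
 * server is NOT stopped on this pass.
 */
static void toy_bw_timer_fires(struct toy_server *s, bool has_tasks,
			       bool gate_on_tasks)
{
	s->timer_armed = false;			/* the timer that just fired */
	if (!gate_on_tasks || has_tasks)
		s->timer_armed = true;		/* arm the 0-laxity timer */
}

/* Assumption: starting an already-active server does nothing. */
static void toy_enqueue_fair(struct toy_server *s)
{
	if (s->active)
		return;
	s->active = true;
	s->timer_armed = true;
}

/* Stuck: marked active, but nothing pending will ever run it again. */
static bool toy_is_stuck(const struct toy_server *s)
{
	return s->active && !s->timer_armed;
}
```

With gate_on_tasks set to false, which corresponds to dropping the server_has_tasks() check from replenish_dl_entity() in the hunk below, the model arms the 0-laxity timer regardless and the server is not left stranded.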
>
> So perhaps we should do something like the below. Simply continue
> as normal, until we do a whole cycle without having seen a task.
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 5b64bc621993..269ca2eb5ba9 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -875,7 +875,7 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
> */
> if (dl_se->dl_defer && !dl_se->dl_defer_running &&
> dl_time_before(rq_clock(dl_se->rq), dl_se->deadline - dl_se->runtime)) {
> - if (!is_dl_boosted(dl_se) && dl_se->server_has_tasks(dl_se)) {
> + if (!is_dl_boosted(dl_se)) {
>
> /*
> * Set dl_se->dl_defer_armed and dl_throttled variables to
> @@ -1171,12 +1171,6 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_
> if (!dl_se->dl_runtime)
> return HRTIMER_NORESTART;
>
> - if (!dl_se->server_has_tasks(dl_se)) {
> - replenish_dl_entity(dl_se);
> - dl_server_stopped(dl_se);
> - return HRTIMER_NORESTART;
> - }
> -
> if (dl_se->dl_defer_armed) {
> /*
> * First check if the server could consume runtime in background.
>
>
> Notably, this removes all ->server_has_tasks() users, so if this works
> and is correct, we can completely remove that callback and simplify
> more.
>
> Hmm?
But then what stops the server when the 0-laxity (defer) timer fires
again a period down the line?
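For reference, one possible shape of the "stop only after a whole period without fair tasks" rule, written as a hypothetical toy model (struct toy_srv, toy_saw_task() and toy_defer_timer_fires() are illustrative names, not kernel functions). An idle flag is set at each defer-timer expiry and cleared whenever a fair task is seen, so the server stops at the first expiry that finds the flag still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; names do not match kernel code. */
struct toy_srv {
	bool active;
	bool idle;	/* no fair task seen since the last expiry */
};

/* A fair task was seen during the current period. */
static void toy_saw_task(struct toy_srv *s)
{
	s->idle = false;
}

/*
 * 0-laxity (defer) timer expiry.  Returns true if the timer should be
 * re-armed for another period, false if the server was stopped because
 * a whole period elapsed without a fair task.
 */
static bool toy_defer_timer_fires(struct toy_srv *s)
{
	if (s->idle) {
		s->active = false;	/* full idle period: stop the server */
		return false;
	}
	s->idle = true;			/* begin a new observation period */
	return true;
}
```

In this model an idle server keeps its timer for one extra period before stopping, which is the trade-off accepted to avoid starting and stopping the server too often.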
Thread overview: 101+ messages
2025-07-02 11:49 [PATCH v2 00/12] sched: Address schbench regression Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 01/12] sched/psi: Optimize psi_group_change() cpu_clock() usage Peter Zijlstra
2025-07-15 19:11 ` Chris Mason
2025-07-16 6:06 ` K Prateek Nayak
2025-07-16 6:53 ` Beata Michalska
2025-07-16 10:40 ` Peter Zijlstra
2025-07-16 14:54 ` Johannes Weiner
2025-07-16 16:27 ` Chris Mason
2025-07-23 4:16 ` Aithal, Srikanth
2025-07-25 5:13 ` K Prateek Nayak
2025-07-02 11:49 ` [PATCH v2 02/12] sched/deadline: Less agressive dl_server handling Peter Zijlstra
2025-07-02 16:12 ` Juri Lelli
2025-07-10 12:46 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2025-07-14 22:56 ` [PATCH v2 02/12] " Mel Gorman
2025-07-15 14:55 ` Chris Mason
2025-07-16 18:19 ` Mel Gorman
2025-07-30 9:34 ` Geert Uytterhoeven
2025-07-30 9:46 ` Juri Lelli
2025-07-30 10:05 ` Geert Uytterhoeven
2025-08-05 22:03 ` Chris Bainbridge
2025-08-05 23:04 ` Chris Bainbridge
2025-09-15 22:29 ` John Stultz
2025-09-16 4:18 ` John Stultz
2025-09-16 5:28 ` [RFC][PATCH] sched/deadline: Fix dl_server getting stuck, allowing cpu starvation John Stultz
2025-09-16 8:51 ` Juri Lelli
2025-09-16 11:01 ` Peter Zijlstra
2025-09-16 12:52 ` Juri Lelli [this message]
2025-09-16 14:30 ` Peter Zijlstra
2025-09-16 17:35 ` John Stultz
2025-09-16 21:30 ` Peter Zijlstra
2025-09-17 3:29 ` John Stultz
2025-09-17 9:34 ` Peter Zijlstra
2025-09-17 12:26 ` Peter Zijlstra
2025-09-17 13:56 ` Juri Lelli
2025-09-17 17:30 ` Peter Zijlstra
2025-09-18 8:37 ` Juri Lelli
2025-09-18 9:04 ` Peter Zijlstra
2025-09-18 9:42 ` Juri Lelli
2025-09-17 19:29 ` John Stultz
2025-09-18 6:56 ` [tip: sched/urgent] sched/deadline: Fix dl_server behaviour tip-bot2 for Peter Zijlstra
2025-09-25 7:55 ` tip-bot2 for Peter Zijlstra
2025-09-18 6:56 ` [tip: sched/urgent] sched/deadline: Fix dl_server getting stuck tip-bot2 for Peter Zijlstra
2025-09-18 14:46 ` Dietmar Eggemann
2025-09-22 21:57 ` Marek Szyprowski
2025-09-22 23:46 ` John Stultz
2025-09-23 6:31 ` Marek Szyprowski
2025-09-23 7:25 ` Peter Zijlstra
2025-09-23 7:52 ` Marek Szyprowski
2025-09-23 22:02 ` Peter Zijlstra
2025-09-29 15:19 ` Marek Szyprowski
[not found] ` <eae77bd0-d874-4ddf-88d7-c1ab75358f91@samsung.com>
2025-10-09 8:35 ` Krzysztof Kozlowski
2025-10-09 9:26 ` Peter Zijlstra
2025-10-09 11:42 ` Marek Szyprowski
2025-09-25 7:55 ` tip-bot2 for Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 03/12] sched: Optimize ttwu() / select_task_rq() Peter Zijlstra
2025-07-10 16:47 ` Vincent Guittot
2025-07-14 22:59 ` Mel Gorman
2025-07-02 11:49 ` [PATCH v2 04/12] sched: Use lock guard in ttwu_runnable() Peter Zijlstra
2025-07-10 16:48 ` Vincent Guittot
2025-07-14 23:00 ` Mel Gorman
2025-07-02 11:49 ` [PATCH v2 05/12] sched: Add ttwu_queue controls Peter Zijlstra
2025-07-10 16:51 ` Vincent Guittot
2025-07-14 23:14 ` Mel Gorman
2025-07-02 11:49 ` [PATCH v2 06/12] sched: Introduce ttwu_do_migrate() Peter Zijlstra
2025-07-10 16:51 ` Vincent Guittot
2025-07-02 11:49 ` [PATCH v2 07/12] psi: Split psi_ttwu_dequeue() Peter Zijlstra
2025-07-17 23:59 ` Chris Mason
2025-07-18 18:02 ` Steven Rostedt
2025-07-02 11:49 ` [PATCH v2 08/12] sched: Re-arrange __ttwu_queue_wakelist() Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 09/12] sched: Clean up ttwu comments Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 10/12] sched: Use lock guard in sched_ttwu_pending() Peter Zijlstra
2025-07-10 16:51 ` Vincent Guittot
2025-07-02 11:49 ` [PATCH v2 11/12] sched: Change ttwu_runnable() vs sched_delayed Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 12/12] sched: Add ttwu_queue support for delayed tasks Peter Zijlstra
2025-07-03 16:00 ` Phil Auld
2025-07-03 16:47 ` Peter Zijlstra
2025-07-03 17:11 ` Phil Auld
2025-07-14 13:57 ` Phil Auld
2025-07-04 6:13 ` K Prateek Nayak
2025-07-04 7:59 ` Peter Zijlstra
2025-07-08 12:44 ` Dietmar Eggemann
2025-07-08 18:57 ` Peter Zijlstra
2025-07-08 21:02 ` Peter Zijlstra
2025-07-23 5:42 ` Shrikanth Hegde
2025-07-02 15:27 ` [PATCH v2 00/12] sched: Address schbench regression Chris Mason
2025-07-07 9:05 ` Shrikanth Hegde
2025-07-07 9:11 ` Peter Zijlstra
2025-07-07 9:38 ` Shrikanth Hegde
2025-07-16 13:46 ` Phil Auld
2025-07-17 17:25 ` Phil Auld
2025-07-07 18:19 ` Shrikanth Hegde
2025-07-08 19:02 ` Peter Zijlstra
2025-07-09 16:46 ` Shrikanth Hegde
2025-07-14 17:54 ` Shrikanth Hegde
2025-07-21 19:37 ` Shrikanth Hegde
2025-07-22 20:20 ` Chris Mason
2025-07-24 18:23 ` Chris Mason
2025-07-08 15:09 ` Chris Mason
2025-07-08 17:29 ` Shrikanth Hegde
2025-07-17 13:04 ` Beata Michalska
2025-07-17 16:57 ` Beata Michalska