From: Peter Zijlstra <peterz@infradead.org>
To: John Stultz <jstultz@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Xuewen Yan <xuewen.yan94@gmail.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Suleiman Souhlal <suleiman@google.com>,
Qais Yousef <qyousef@layalina.io>,
Joel Fernandes <joelagnelf@nvidia.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck, allowing cpu starvation
Date: Wed, 17 Sep 2025 11:34:41 +0200 [thread overview]
Message-ID: <20250917093441.GU3419281@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <CANDhNCqK3VBAxxWMsDez8xkX0vcTStWjRMR95pksUM6Q26Ctyw@mail.gmail.com>
On Tue, Sep 16, 2025 at 08:29:01PM -0700, John Stultz wrote:
> The problem is that previously the tasks were spawned without much
> delay, but after your change, we see each RT tasks after the first
> NR_CPU take ~1 second to spawn. On systems with large cpu counts
> taking NR_CPU*player-types seconds to start can be long enough (easily
> minutes) that we hit hung task errors.
>
> As I've been digging, part of the issue seems to be kthreadd is not an
> RT task, thus after we create NR_CPU spinner threads, the following
> threads don't actually start right away. They have to wait until
> rt-throttling or the dl_server runs, allowing kthreadd to get cpu time
> and start the thread. But then we have to wait until throttling stops
> so that the RT prio Ref thread can run to set the new thread as RT
> prio and then spawn the next, which then won't start until we do
> throttling again, etc.
>
> So it makes sense if the dl_server is "refreshed" once a second, we
> might see this one thread per second interlock happening. So this
> seems likely just an issue with my test.
>
> However, prior to your change, it seemed like the the RT tasks would
> be throttled, kthreadd would run and the thread would start, then with
> no SCHED_NORMAL tasks to run, the dl_server would stop and we'd run
> the Ref, which would immediately eneueued kthreadd, which started the
> dl_server and since the dl_server hadn't done much, I'm guessing it
> still had some runtime/bandwidth and would run kthreadd then stop
> right after.
>
> So now my guess (and my apologies, as I really am not familiar enough
> the the DL details) is that since we're not stopping the dl_server, it
> seems like the dl_server runtime gets used up each cycle, even though
> it didn't do much boosting. So we have to wait the full cycle for a
> refresh.
Yes. This makes sense.
The old code would disable the dl_server when fair tasks drops to 0
so even though we had that yield in __pick_task_dl(), we'd never hit it.
So the moment another fair task shows up (0->1) we re-enqueue the
dl_server (using update_dl_entity() / CBS wakeup rules) and continue
consuming bandwidth.
However, since we're now not stopping the thing, we hit that yield,
getting this pretty terrible behaviour where we will only run fair tasks
until there are none and then yield our entire period, forcing another
task to wait until the next cycle.
Let me go have a play, surely we can do better.
next prev parent reply other threads:[~2025-09-17 9:34 UTC|newest]
Thread overview: 101+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-02 11:49 [PATCH v2 00/12] sched: Address schbench regression Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 01/12] sched/psi: Optimize psi_group_change() cpu_clock() usage Peter Zijlstra
2025-07-15 19:11 ` Chris Mason
2025-07-16 6:06 ` K Prateek Nayak
2025-07-16 6:53 ` Beata Michalska
2025-07-16 10:40 ` Peter Zijlstra
2025-07-16 14:54 ` Johannes Weiner
2025-07-16 16:27 ` Chris Mason
2025-07-23 4:16 ` Aithal, Srikanth
2025-07-25 5:13 ` K Prateek Nayak
2025-07-02 11:49 ` [PATCH v2 02/12] sched/deadline: Less agressive dl_server handling Peter Zijlstra
2025-07-02 16:12 ` Juri Lelli
2025-07-10 12:46 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2025-07-14 22:56 ` [PATCH v2 02/12] " Mel Gorman
2025-07-15 14:55 ` Chris Mason
2025-07-16 18:19 ` Mel Gorman
2025-07-30 9:34 ` Geert Uytterhoeven
2025-07-30 9:46 ` Juri Lelli
2025-07-30 10:05 ` Geert Uytterhoeven
2025-08-05 22:03 ` Chris Bainbridge
2025-08-05 23:04 ` Chris Bainbridge
2025-09-15 22:29 ` John Stultz
2025-09-16 4:18 ` John Stultz
2025-09-16 5:28 ` [RFC][PATCH] sched/deadline: Fix dl_server getting stuck, allowing cpu starvation John Stultz
2025-09-16 8:51 ` Juri Lelli
2025-09-16 11:01 ` Peter Zijlstra
2025-09-16 12:52 ` Juri Lelli
2025-09-16 14:30 ` Peter Zijlstra
2025-09-16 17:35 ` John Stultz
2025-09-16 21:30 ` Peter Zijlstra
2025-09-17 3:29 ` John Stultz
2025-09-17 9:34 ` Peter Zijlstra [this message]
2025-09-17 12:26 ` Peter Zijlstra
2025-09-17 13:56 ` Juri Lelli
2025-09-17 17:30 ` Peter Zijlstra
2025-09-18 8:37 ` Juri Lelli
2025-09-18 9:04 ` Peter Zijlstra
2025-09-18 9:42 ` Juri Lelli
2025-09-17 19:29 ` John Stultz
2025-09-18 6:56 ` [tip: sched/urgent] sched/deadline: Fix dl_server behaviour tip-bot2 for Peter Zijlstra
2025-09-25 7:55 ` tip-bot2 for Peter Zijlstra
2025-09-18 6:56 ` [tip: sched/urgent] sched/deadline: Fix dl_server getting stuck tip-bot2 for Peter Zijlstra
2025-09-18 14:46 ` Dietmar Eggemann
2025-09-22 21:57 ` Marek Szyprowski
2025-09-22 23:46 ` John Stultz
2025-09-23 6:31 ` Marek Szyprowski
2025-09-23 7:25 ` Peter Zijlstra
2025-09-23 7:52 ` Marek Szyprowski
2025-09-23 22:02 ` Peter Zijlstra
2025-09-29 15:19 ` Marek Szyprowski
[not found] ` <eae77bd0-d874-4ddf-88d7-c1ab75358f91@samsung.com>
2025-10-09 8:35 ` Krzysztof Kozlowski
2025-10-09 9:26 ` Peter Zijlstra
2025-10-09 11:42 ` Marek Szyprowski
2025-09-25 7:55 ` tip-bot2 for Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 03/12] sched: Optimize ttwu() / select_task_rq() Peter Zijlstra
2025-07-10 16:47 ` Vincent Guittot
2025-07-14 22:59 ` Mel Gorman
2025-07-02 11:49 ` [PATCH v2 04/12] sched: Use lock guard in ttwu_runnable() Peter Zijlstra
2025-07-10 16:48 ` Vincent Guittot
2025-07-14 23:00 ` Mel Gorman
2025-07-02 11:49 ` [PATCH v2 05/12] sched: Add ttwu_queue controls Peter Zijlstra
2025-07-10 16:51 ` Vincent Guittot
2025-07-14 23:14 ` Mel Gorman
2025-07-02 11:49 ` [PATCH v2 06/12] sched: Introduce ttwu_do_migrate() Peter Zijlstra
2025-07-10 16:51 ` Vincent Guittot
2025-07-02 11:49 ` [PATCH v2 07/12] psi: Split psi_ttwu_dequeue() Peter Zijlstra
2025-07-17 23:59 ` Chris Mason
2025-07-18 18:02 ` Steven Rostedt
2025-07-02 11:49 ` [PATCH v2 08/12] sched: Re-arrange __ttwu_queue_wakelist() Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 09/12] sched: Clean up ttwu comments Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 10/12] sched: Use lock guard in sched_ttwu_pending() Peter Zijlstra
2025-07-10 16:51 ` Vincent Guittot
2025-07-02 11:49 ` [PATCH v2 11/12] sched: Change ttwu_runnable() vs sched_delayed Peter Zijlstra
2025-07-02 11:49 ` [PATCH v2 12/12] sched: Add ttwu_queue support for delayed tasks Peter Zijlstra
2025-07-03 16:00 ` Phil Auld
2025-07-03 16:47 ` Peter Zijlstra
2025-07-03 17:11 ` Phil Auld
2025-07-14 13:57 ` Phil Auld
2025-07-04 6:13 ` K Prateek Nayak
2025-07-04 7:59 ` Peter Zijlstra
2025-07-08 12:44 ` Dietmar Eggemann
2025-07-08 18:57 ` Peter Zijlstra
2025-07-08 21:02 ` Peter Zijlstra
2025-07-23 5:42 ` Shrikanth Hegde
2025-07-02 15:27 ` [PATCH v2 00/12] sched: Address schbench regression Chris Mason
2025-07-07 9:05 ` Shrikanth Hegde
2025-07-07 9:11 ` Peter Zijlstra
2025-07-07 9:38 ` Shrikanth Hegde
2025-07-16 13:46 ` Phil Auld
2025-07-17 17:25 ` Phil Auld
2025-07-07 18:19 ` Shrikanth Hegde
2025-07-08 19:02 ` Peter Zijlstra
2025-07-09 16:46 ` Shrikanth Hegde
2025-07-14 17:54 ` Shrikanth Hegde
2025-07-21 19:37 ` Shrikanth Hegde
2025-07-22 20:20 ` Chris Mason
2025-07-24 18:23 ` Chris Mason
2025-07-08 15:09 ` Chris Mason
2025-07-08 17:29 ` Shrikanth Hegde
2025-07-17 13:04 ` Beata Michalska
2025-07-17 16:57 ` Beata Michalska
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250917093441.GU3419281@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=hupu.gm@gmail.com \
--cc=joelagnelf@nvidia.com \
--cc=jstultz@google.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=kprateek.nayak@amd.com \
--cc=kuyo.chang@mediatek.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=qyousef@layalina.io \
--cc=rostedt@goodmis.org \
--cc=suleiman@google.com \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=xuewen.yan94@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.