From: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <jstultz@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Xuewen Yan <xuewen.yan94@gmail.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Qais Yousef <qyousef@layalina.io>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
	kernel-team@android.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck, allowing cpu starvation
Date: Wed, 17 Sep 2025 15:56:20 +0200
Message-ID: <aMq-BKLSIG9JrRb7@jlelli-thinkpadt14gen4.remote.csb>
In-Reply-To: <20250917122616.GG1386988@noisy.programming.kicks-ass.net>

On 17/09/25 14:26, Peter Zijlstra wrote:
> On Wed, Sep 17, 2025 at 11:34:42AM +0200, Peter Zijlstra wrote:
> 
> > Yes. This makes sense.
> > 
> > The old code would disable the dl_server when the number of fair tasks drops to 0
> > so even though we had that yield in __pick_task_dl(), we'd never hit it.
> > So the moment another fair task shows up (0->1) we re-enqueue the
> > dl_server (using update_dl_entity() / CBS wakeup rules) and continue
> > consuming bandwidth.
> > 
> > However, since we're now not stopping the thing, we hit that yield,
> > getting this pretty terrible behaviour where we will only run fair tasks
> > until there are none and then yield our entire period, forcing another
> > task to wait until the next cycle.
> > 
> > Let me go have a play, surely we can do better.
> 
> Can you please try:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent
> 
> That's yesterday's patch and the below. It's compile tested only, but
> with a bit of luck it'll actually work ;-)
> 
> ---
> Subject: sched/deadline: Fix dl_server behaviour
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Sep 17 12:03:20 CEST 2025
> 
> John reported undesirable behaviour with the dl_server since commit:
> cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling").
> 
> When starving fair tasks on purpose (starting spinning FIFO tasks),
> his fair workload, which often goes (briefly) idle, would delay fair
> invocations for a second; running one invocation per second was both
> unexpected and terribly slow.
> 
> The reason this happens is that when dl_se->server_pick_task() returns
> NULL, indicating no runnable tasks, it would yield, pushing any later
> jobs out a whole period (1 second).
> 
> Instead simply stop the server. This should restore behaviour in that
> a later wakeup (which restarts the server) will be able to continue
> running (subject to the CBS wakeup rules).
> 
> Notably, this does not re-introduce the behaviour cccb45d7c4295 set
> out to solve; any start/stop cycle is naturally throttled by the timer
> period (no active cancel).

Neat!

> Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
> Reported-by: John Stultz <jstultz@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

...

> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -371,10 +371,39 @@ extern s64 dl_scaled_delta_exec(struct r
>   *   dl_server_update() -- called from update_curr_common(), propagates runtime
>   *                         to the server.
>   *
> - *   dl_server_start()
> - *   dl_server_stop()  -- start/stop the server when it has (no) tasks.
> + *   dl_server_start() -- start the server when it has tasks; it will stop
> + *			  automatically when there are no more tasks, per
> + *			  dl_se::server_pick() returning NULL.
> + *
> + *   dl_server_stop() -- (force) stop the server; use when updating
> + *                       parameters.
>   *
>   *   dl_server_init() -- initializes the server.
> + *
> + * When started the dl_server will (per dl_defer) schedule a timer for its
> + * zero-laxity point -- that is, unlike regular EDF tasks which run ASAP, a
> + * server will run at the very end of its period.
> + *
> + * This is done such that any runtime from the target class can be accounted
> + * against the server -- through dl_server_update() above -- such that when it
> + * becomes time to run, it might already be out of runtime and get deferred
> + * until the next period. In this case dl_server_timer() will alternate
> + * between defer and replenish but never actually enqueue the server.
> + *
> + * Only when the target class does not manage to exhaust the server's runtime
> + * (there's actually starvation in the given period), will the dl_server get on
> + * the runqueue. Once queued it will pick tasks from the target class and run
> + * them until either its runtime is exhausted, at which point it's back to
> + * dl_server_timer, or until there are no more tasks to run, at which point
> + * the dl_server stops itself.
> + *
> + * By stopping at this point the dl_server retains bandwidth, which, if a new
> + * task wakes up imminently (starting the server again), can be used --
> + * subject to CBS wakeup rules -- without having to wait for the next period.

In both cases we still defer until either the new period or the current
0-laxity, right?

The stop clears all the flags, so a subsequent start calls
enqueue(ENQUEUE_WAKEUP) -> update_dl_entity(), which sets dl_throttled
and dl_defer_armed in both cases; we then start_dl_timer() (the defer
timer) after it, without enqueueing right away.

Or maybe I am still a bit lost. :)
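
To make the flow described above concrete, here is a rough userspace
model of it in C. It is only a sketch and not the kernel code:
struct server_model, cbs_overflow(), model_stop() and model_start() are
made-up names, times are in milliseconds, and dl_defer_running and the
rest of update_dl_entity() are ignored. It assumes that a stop keeps
runtime/deadline but clears the flags, that the CBS wakeup rule either
keeps the leftover (runtime, deadline) pair or starts a new period, and
that the defer timer is armed for the zero-laxity point, i.e.
deadline - runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the server state (times in ms). */
struct server_model {
	uint64_t dl_runtime;	/* budget granted per period */
	uint64_t dl_period;	/* period, also used as relative deadline */
	uint64_t runtime;	/* remaining budget */
	uint64_t deadline;	/* current absolute deadline */
	bool throttled;
	bool defer_armed;
};

/*
 * CBS wakeup test: would reusing the leftover (runtime, deadline) pair
 * exceed the reserved bandwidth?  This is
 *   runtime / (deadline - now) > dl_runtime / dl_period
 * cross-multiplied to stay in integers.  Caller ensures deadline > now.
 */
static bool cbs_overflow(const struct server_model *s, uint64_t now)
{
	return s->runtime * s->dl_period > (s->deadline - now) * s->dl_runtime;
}

/* Model of a stop: flags are cleared, runtime and deadline are retained. */
void model_stop(struct server_model *s)
{
	s->throttled = false;
	s->defer_armed = false;
}

/*
 * Model of a start.  Returns the absolute time the defer timer would be
 * armed for (the zero-laxity point).  In both branches the server ends up
 * throttled with the defer timer armed, and is not enqueued right away.
 */
uint64_t model_start(struct server_model *s, uint64_t now)
{
	assert(s->dl_period > 0 && s->dl_runtime <= s->dl_period);

	if (s->deadline <= now || cbs_overflow(s, now)) {
		/* Leftover is unusable: start a new period with a full budget. */
		s->deadline = now + s->dl_period;
		s->runtime = s->dl_runtime;
	}
	/* Otherwise keep the current runtime and deadline. */

	s->throttled = true;
	s->defer_armed = true;
	return s->deadline - s->runtime;
}
```

With dl_runtime = 50 and dl_period = 1000, a first start at t = 0 arms
the timer for 950. A restart at t = 970 with 1 ms of budget left against
deadline 1000 passes the CBS test, keeps the old deadline and defers to
999. A restart at t = 965 with 30 ms left fails the test, so it gets a
new period (deadline 1965) and defers to 1915.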

> + * Additionally, because of the dl_defer behaviour the start/stop behaviour is
> + * naturally throttled to once per period, avoiding high context switch
> + * workloads from spamming the hrtimer program/cancel paths.

Right. Also nice cleanup of a flag and a method.

