From: Andrea Righi <arighi@nvidia.com>
To: Joel Fernandes <joelagnelf@nvidia.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>
Subject: Re: [PATCH v6 05/14] sched/deadline: Return EBUSY if dl_bw_cpus is zero
Date: Thu, 17 Jul 2025 17:51:29 +0200
Message-ID: <aHkcAYN43ImECWQ2@gpd4>
In-Reply-To: <20250702232944.3221001-6-joelagnelf@nvidia.com>

Hi Joel,

On Wed, Jul 02, 2025 at 07:29:30PM -0400, Joel Fernandes wrote:
> Hotplugged CPUs coming online do an enqueue but are not a part of any
> root domain containing cpu_active() CPUs. So in this case, don't mess
> with accounting and we can retry later. Without this patch, we see
> crashes with sched_ext selftest's hotplug test due to divide by zero.
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
> kernel/sched/deadline.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 7129b61d548b..0e73577257ad 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1725,7 +1725,12 @@ int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 perio
>  	cpus = dl_bw_cpus(cpu);
>  	cap = dl_bw_capacity(cpu);
>
> -	if (__dl_overflow(dl_b, cap, old_bw, new_bw))
> +	/*
> +	 * Hotplugged CPUs coming online do an enqueue but are not a part of any
> +	 * root domain containing cpu_active() CPUs. So in this case, don't mess
> +	 * with accounting and we can retry later.
> +	 */
> +	if (!cpus || __dl_overflow(dl_b, cap, old_bw, new_bw))
>  		return -EBUSY;

I can trigger the following with the hotplug sched_ext selftest:

[ 61.731069] ------------[ cut here ]------------
[ 61.731241] WARNING: CPU: 4 PID: 2191 at kernel/sched/deadline.c:1591 dl_server_start+0x9c/0xb0
...
[ 61.731552] Sched_ext: hotplug_cbs (enabled+all), task: runnable_at=-1ms
[ 61.731555] RIP: 0010:dl_server_start+0x9c/0xb0
...
[ 61.732216] Call Trace:
[ 61.732239] <TASK>
[ 61.732262] enqueue_task_scx+0x2bb/0x350
[ 61.732298] enqueue_task+0x2e/0xd0
[ 61.732333] ttwu_do_activate+0xa4/0x2d0
[ 61.732368] try_to_wake_up+0x2a2/0x8e0
[ 61.732406] cpuhp_bringup_ap+0x72/0x250
[ 61.732441] ? __pfx_cpuhp_bringup_ap+0x10/0x10
[ 61.732478] cpuhp_invoke_callback+0x1b0/0x650
[ 61.732523] __cpuhp_invoke_callback_range+0x7e/0xf0
[ 61.732566] _cpu_up+0xea/0x1e0
[ 61.732601] cpu_up+0xc3/0xd0
[ 61.732635] cpu_subsys_online+0x4b/0xd0
[ 61.732678] device_online+0x49/0x80
[ 61.732714] online_store+0x9a/0xd0
[ 61.732744] ? sysfs_kf_write+0x2b/0x70
[ 61.732780] kernfs_fop_write_iter+0x137/0x200
[ 61.732822] vfs_write+0x264/0x5e0
[ 61.732867] ksys_write+0x79/0xf0
[ 61.732901] do_syscall_64+0xbb/0x370
[ 61.732939] entry_SYSCALL_64_after_hwframe+0x77/0x7f
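
The bottom of the trace (ksys_write -> kernfs_fop_write_iter -> online_store -> cpu_up) shows the CPU being brought online through a sysfs write. A minimal user-space sketch of that kind of write is below. The helper names cpu_online_path() and write_cpu_online() are hypothetical and not taken from the selftest; the sysfs location is the standard per-CPU online control file, and writing to it needs root.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: build the sysfs path of a CPU's online control file. */
static int cpu_online_path(char *buf, size_t len, int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);

	/* snprintf() reports the length it needed; reject truncated paths. */
	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Hypothetical helper: write "1" (online) or "0" (offline) to @path.
 * Returns 0 on success or a negative errno value.
 */
static int write_cpu_online(const char *path, int online)
{
	ssize_t n;
	int fd, ret = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, online ? "1" : "0", 1);
	if (n < 0)
		ret = -errno;
	else if (n != 1)
		ret = -EIO;

	close(fd);
	return ret;
}
```

Calling write_cpu_online() with 0 and then 1 on the path built for a given CPU, while a sched_ext scheduler is loaded, should exercise the same cpu_up() path as in the trace, though whether the warning fires depends on timing.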

With this patch dl_server_apply_params() returns -EBUSY when !cpus, and
that trips the WARN_ON_ONCE() in dl_server_start().

I think we can simply ignore the !cpus case: it's safe to skip hotplugged
CPUs that perform an enqueue while coming online without triggering a
warning, since we can always retry applying the server params later.

Maybe return -EAGAIN when !cpus and handle that gracefully in
dl_server_start()?

Something along these lines (on top of this patch). What do you think?

Thanks,
-Andrea

 kernel/sched/deadline.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5d5819d445fb9..78d2a3b0a4cde 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1587,9 +1587,14 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	if (!dl_server(dl_se)) {
 		u64 runtime = 50 * NSEC_PER_MSEC;
 		u64 period = 1000 * NSEC_PER_MSEC;
+		int err;

-		if (WARN_ON_ONCE(dl_server_apply_params(dl_se, runtime, period, 1)))
+		err = dl_server_apply_params(dl_se, runtime, period, 1);
+		if (err == -EAGAIN)
 			return;
+
+		WARN_ON_ONCE(err);
+
 		dl_se->dl_server = 1;
 		dl_se->dl_defer = 1;
 		setup_new_dl_entity(dl_se);
@@ -1662,7 +1667,10 @@ int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 perio
 	 * root domain containing cpu_active() CPUs. So in this case, don't mess
 	 * with accounting and we can retry later.
 	 */
-	if (!cpus || __dl_overflow(dl_b, cap, old_bw, new_bw))
+	if (!cpus)
+		return -EAGAIN;
+
+	if (__dl_overflow(dl_b, cap, old_bw, new_bw))
 		return -EBUSY;

 	if (init) {
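
To make the intended control flow easier to check outside the kernel, here is a small user-space model of the two error paths. It is only a sketch: every model_* name, the percent-based bandwidth units and the simplified overflow test are assumptions made for illustration, not the kernel's actual definitions. It mirrors the diff above in that -EAGAIN is skipped silently, while any other error is counted as a warning and then falls through.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the root domain's bandwidth accounting. */
struct model_dl_bw {
	uint64_t bw;		/* assumed per-CPU limit, in percent */
	uint64_t total_bw;	/* bandwidth already reserved, in percent */
};

/* Hypothetical stand-in for the server entity. */
struct model_server {
	bool started;
	uint64_t dl_bw;		/* bandwidth currently held by the server */
};

/* Simplified admission test; the real check also scales by capacity. */
static bool model_overflow(const struct model_dl_bw *b, int cpus,
			   uint64_t old_bw, uint64_t new_bw)
{
	return b->bw * (uint64_t)cpus < b->total_bw - old_bw + new_bw;
}

static int model_apply_params(struct model_server *s, struct model_dl_bw *b,
			      int cpus, uint64_t new_bw)
{
	/* No active CPUs in the root domain yet: transient, retry later. */
	if (!cpus)
		return -EAGAIN;

	if (model_overflow(b, cpus, s->dl_bw, new_bw))
		return -EBUSY;

	b->total_bw = b->total_bw - s->dl_bw + new_bw;
	s->dl_bw = new_bw;
	return 0;
}

/* Returns the apply_params result so that a caller can inspect it. */
static int model_server_start(struct model_server *s, struct model_dl_bw *b,
			      int cpus, uint64_t new_bw, int *warnings)
{
	int err;

	if (s->started)
		return 0;

	err = model_apply_params(s, b, cpus, new_bw);
	if (err == -EAGAIN)
		return err;	/* skipped quietly, no warning */

	if (err)
		(*warnings)++;	/* models WARN_ON_ONCE(err) */

	/* As in the diff above, other errors warn and then fall through. */
	s->started = true;
	return err;
}
```

In this model a start attempt with cpus == 0 leaves the server unstarted and the accounting untouched, so a later attempt with a non-zero CPU count can succeed.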