* [PATCH] sched/deadline: Reject debugfs dl_server writes for offline CPUs
@ 2026-05-26 10:05 Andrea Righi
  2026-05-26 12:07 ` Juri Lelli
  2026-05-29 10:45 ` [tip: sched/core] " tip-bot2 for Andrea Righi
  0 siblings, 2 replies; 7+ messages in thread
From: Andrea Righi @ 2026-05-26 10:05 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, linux-kernel, Sashiko

Writing runtime or period via the per-CPU dl_server debugfs files
(/sys/kernel/debug/sched/{fair,ext}_server/cpu*/{runtime,period}) on an
offline CPU can trigger two distinct kernel issues:

1) Divide-by-zero in dl_server_apply_params():

  Oops: divide error: 0000 [#1] SMP NOPTI
  RIP: 0010:dl_server_apply_params+0x239/0x3a0
  Call Trace:
   sched_server_write_common.isra.0+0x21a/0x3c0
   full_proxy_write+0x78/0xd0
   vfs_write+0xe7/0x6e0

  Both __dl_sub() and __dl_add() divide by cpus internally, and cpus can
  be 0 once the CPU has been removed from any active root-domain span
  (this has been latent since the debugfs interface was introduced).

2) WARN_ON_ONCE in dl_server_start():

  WARNING: kernel/sched/deadline.c:1805 at dl_server_start+0x232/0x270

  Commit ee6e44dfe6e5 ("sched/deadline: Stop dl_server before CPU goes
  offline") added this check to catch enqueueing the server on an
  offline rq.
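
As a rough illustration of the two failure modes above, the following
user-space sketch models the write path with the online check placed ahead
of any bandwidth arithmetic. It is not kernel code: struct model_rq,
model_server_write() and all of their fields are hypothetical stand-ins, and
the server is restarted unconditionally here, which may differ from the real
restart condition.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a runqueue and its root domain. */
struct model_rq {
	bool online;        /* models cpu_online(cpu_of(rq)) */
	int span_cpus;      /* active CPUs in the root-domain span; 0 when offline */
	int64_t total_bw;   /* bandwidth admitted in the root domain */
	int64_t extra_bw;   /* per-CPU share adjusted on every change */
	int64_t server_bw;  /* bandwidth currently held by the server */
	bool server_active; /* models the server being started */
};

/*
 * Model of a runtime/period write: stop the server, swap its bandwidth,
 * start it again. Both swap steps divide by span_cpus, so an offline rq
 * (span_cpus == 0) has to be rejected before reaching them.
 */
static int model_server_write(struct model_rq *rq, int64_t new_bw)
{
	if (!rq->online)
		return -EBUSY;

	rq->server_active = false;

	/* Drop the old bandwidth (the __dl_sub()-like step). */
	rq->total_bw -= rq->server_bw;
	rq->extra_bw += rq->server_bw / rq->span_cpus;

	/* Account the new bandwidth (the __dl_add()-like step). */
	rq->total_bw += new_bw;
	rq->extra_bw -= new_bw / rq->span_cpus;
	rq->server_bw = new_bw;

	/* Only reached for an online rq, so no offline start can occur. */
	rq->server_active = true;
	return 0;
}
```

With the early return removed, a call on an rq whose span_cpus is 0 would
hit the integer division and fault, which is the user-space analogue of the
divide error shown in the Oops.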

There are no meaningful semantics for reconfiguring the per-CPU dl_server
bandwidth while the CPU is offline, so reject the write with -EBUSY to
give userspace a clear error.
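
From the userspace side, a tool that tunes these files could treat -EBUSY as
"CPU offline" rather than a generic failure. The sketch below is only an
illustration under that assumption: server_param_path(),
write_server_param() and describe_result() are hypothetical helper names,
and the path layout simply follows the pattern quoted at the top of this
message.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the debugfs path for one parameter; returns 0 or -ENAMETOOLONG. */
static int server_param_path(char *buf, size_t len, const char *server,
			     int cpu, const char *param)
{
	int n = snprintf(buf, len, "/sys/kernel/debug/sched/%s_server/cpu%d/%s",
			 server, cpu, param);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Write a decimal value to path; returns 0 on success or a negative errno. */
static int write_server_param(const char *path, unsigned long long value)
{
	char text[32];
	int fd, err = 0;
	int n = snprintf(text, sizeof(text), "%llu\n", value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	if (write(fd, text, (size_t)n) < 0)
		err = -errno;	/* -EBUSY here would indicate an offline CPU */
	close(fd);
	return err;
}

/* Map a result code to a short message a tool could print. */
static const char *describe_result(int err)
{
	if (err == 0)
		return "ok";
	if (err == -EBUSY)
		return "CPU offline, retry after onlining it";
	return strerror(-err);
}
```

A caller would build the path for, say, the fair server on CPU 3, pass it to
write_server_param(), and print describe_result() on the return value;
running it requires root and a mounted debugfs.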

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/all/20260526092228.3B6891F00A3A@smtp.kernel.org/
Fixes: d741f297bcea ("sched/fair: Fair server interface")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/debug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index ed3a0d65da0ca..e57ad8c78a60e 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -415,6 +415,9 @@ static ssize_t sched_server_write_common(struct file *filp, const char __user *u
 			return  -EINVAL;
 		}
 
+		if (!cpu_online(cpu_of(rq)))
+			return -EBUSY;
+
 		update_rq_clock(rq);
 		dl_server_stop(dl_se);
 		retval = dl_server_apply_params(dl_se, runtime, period, 0);

base-commit: 7b197f597bc895b01204d8389a4cf3b00780bd21
-- 
2.54.0



