Sched_ext development
* [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix scx_flush_disable_work() UAF race
@ 2026-04-28  4:54 Cheng-Yang Chou
  2026-04-28 17:03 ` Tejun Heo
  0 siblings, 1 reply; 2+ messages in thread
From: Cheng-Yang Chou @ 2026-04-28  4:54 UTC (permalink / raw)
  To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
  Cc: Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911

scx_flush_disable_work() calls irq_work_sync() then kthread_flush_work()
to ensure the disable kthread work has fully completed before
bpf_scx_unreg() frees the SCX scheduler.

However, a concurrent scx_vexit() (e.g., triggered by watchdog stall)
creates a race window between scx_claim_exit() and irq_work_queue():

  CPU A (scx_vexit (watchdog))        CPU B (bpf_scx_unreg)
  ----                                ----
  scx_claim_exit()
    atomic_try_cmpxchg(NONE->kind)
  stack_trace_save()
  vscnprintf()
                                      scx_disable()
                                        scx_claim_exit() -> FAIL
                                      scx_flush_disable_work()
                                        irq_work_sync()      // no-op: not queued yet
                                        kthread_flush_work() // no-op: not queued yet
                                      kobject_put(&sch->kobj) -> free %sch
  irq_work_queue() -> UAF on %sch
  scx_disable_irq_workfn()
    kthread_queue_work() -> UAF

The root cause: CPU B's scx_flush_disable_work() returns after syncing
an irq_work that was never queued, while CPU A is still between
scx_claim_exit() and irq_work_queue().

Fix this by looping until exit_kind reaches SCX_EXIT_DONE, draining
disable_irq_work and disable_work on each pass to catch any work queued
after the previous check. Return early if exit_kind is SCX_EXIT_NONE --
callers where no disable was ever triggered (e.g., the
scx_sub_enable_workfn() abort path) have nothing to wait for.

Fixes: 510a27055446 ("sched_ext: sync disable_irq_work in bpf_scx_unreg()")
Reported-by: https://sashiko.dev/#/patchset/20260424100221.32407-1-icheng%40nvidia.com
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
The race can be confirmed by widening the race window with udelay(5000)
after scx_claim_exit() in scx_vexit() [1].
[1]: https://gist.github.com/EricccTaiwan/3b6e6c3519747b057138613af9be3465

 kernel/sched/ext.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index f7b1b16e81a5..4f4726ee235a 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6086,8 +6086,15 @@ static void scx_disable(struct scx_sched *sch, enum scx_exit_kind kind)
  */
 static void scx_flush_disable_work(struct scx_sched *sch)
 {
-	irq_work_sync(&sch->disable_irq_work);
-	kthread_flush_work(&sch->disable_work);
+	if (atomic_read(&sch->exit_kind) == SCX_EXIT_NONE)
+		return;
+	while (true) {
+		irq_work_sync(&sch->disable_irq_work);
+		kthread_flush_work(&sch->disable_work);
+		if (atomic_read(&sch->exit_kind) == SCX_EXIT_DONE)
+			break;
+		cpu_relax();
+	}
 }
 
 static void dump_newline(struct seq_buf *s)
-- 
2.48.1

