* [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix scx_flush_disable_work() UAF race
@ 2026-04-28 17:36 Cheng-Yang Chou
From: Cheng-Yang Chou @ 2026-04-28 17:36 UTC (permalink / raw)
To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
Cc: Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911
scx_flush_disable_work() calls irq_work_sync() followed by
kthread_flush_work() to ensure that the disable kthread work has
fully completed before bpf_scx_unreg() frees the SCX scheduler.
However, a concurrent scx_vexit() (e.g., triggered by a watchdog stall)
creates a race window between scx_claim_exit() and irq_work_queue():

  CPU A (scx_vexit (watchdog))            CPU B (bpf_scx_unreg)
  ----                                    ----
  scx_claim_exit()
    atomic_try_cmpxchg(NONE->kind)
  stack_trace_save()
  vscnprintf()
                                          scx_disable()
                                            scx_claim_exit() -> FAIL
                                          scx_flush_disable_work()
                                            irq_work_sync()      // no-op: not queued yet
                                            kthread_flush_work() // no-op: not queued yet
                                          kobject_put(&sch->kobj) -> free %sch
  irq_work_queue() -> UAF on %sch
  scx_disable_irq_workfn()
    kthread_queue_work() -> UAF

The root cause is that CPU B's scx_flush_disable_work() returns after
syncing an irq_work that has not yet been queued, while CPU A is still
executing the code between scx_claim_exit() and irq_work_queue().

Fix this by looping until exit_kind reaches SCX_EXIT_DONE or
SCX_EXIT_NONE, draining disable_irq_work and disable_work on each pass.
This ensures that any work queued after the previous check is caught,
while also correctly handling cases where no disable was triggered
(e.g., the scx_sub_enable_workfn() abort path).
Fixes: 510a27055446 ("sched_ext: sync disable_irq_work in bpf_scx_unreg()")
Reported-by: https://sashiko.dev/#/patchset/20260424100221.32407-1-icheng%40nvidia.com
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
Changes in v2:
- Fold early-return logic into do-while loop (Tejun Heo)
- Link to v1:
https://lore.kernel.org/r/20260428045529.1670916-1-yphbchou0911@gmail.com/
Thanks for the feedback,
Cheng-Yang
kernel/sched/ext.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index f7b1b16e81a5..80d34bafe59c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6086,8 +6086,13 @@ static void scx_disable(struct scx_sched *sch, enum scx_exit_kind kind)
*/
static void scx_flush_disable_work(struct scx_sched *sch)
{
- irq_work_sync(&sch->disable_irq_work);
- kthread_flush_work(&sch->disable_work);
+ int kind;
+
+ do {
+ irq_work_sync(&sch->disable_irq_work);
+ kthread_flush_work(&sch->disable_work);
+ kind = atomic_read(&sch->exit_kind);
+ } while (kind != SCX_EXIT_NONE && kind != SCX_EXIT_DONE);
}
static void dump_newline(struct seq_buf *s)
--
2.48.1