* [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
@ 2026-03-09 16:30 Cheng-Yang Chou
2026-03-09 16:30 ` [PATCH 1/1] " Cheng-Yang Chou
2026-03-10 1:18 ` [PATCH 0/1] " Tejun Heo
0 siblings, 2 replies; 3+ messages in thread
From: Cheng-Yang Chou @ 2026-03-09 16:30 UTC (permalink / raw)
To: sched-ext; +Cc: tj, void, arighi, changwoo, jserv, yphbchou0911
While testing scx_rustland under vng, the locking dependency checker
reported a circular locking dependency:
[ 31.801757] ======================================================
[ 31.801786] WARNING: possible circular locking dependency detected
[ 31.801812] 7.0.0-rc2+ #31 Tainted: G E
[ 31.801835] ------------------------------------------------------
[ 31.801860] swapper/7/0 is trying to acquire lock:
[ 31.801884] ffffffffa4ac1638 (scx_sched_lock){-...}-{2:2}, at: scx_claim_exit+0x7a/0x180
[ 31.801923]
[ 31.801923] but task is already holding lock:
[ 31.801951] ffff8ce0bcbc5ca0 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x28/0xc0
[ 31.801990]
[ 31.801990] which lock already depends on the new lock.
[ 31.801990]
[ 31.802021]
[ 31.802021] the existing dependency chain (in reverse order) is:
[ 31.802050]
[ 31.802050] -> #1 (&rq->__lock){-.-.}-{2:2}:
[ 31.802079] _raw_spin_lock_nested+0x2d/0x50
[ 31.802103] raw_spin_rq_lock_nested+0x28/0xc0
[ 31.802128] scx_bypass+0x14e/0x4e0
[ 31.802147] scx_root_enable_workfn+0x2ce/0xa10
[ 31.802171] kthread_worker_fn+0xbf/0x3c0
[ 31.802199] kthread+0x109/0x140
[ 31.802218] ret_from_fork+0x3fd/0x490
[ 31.802242] ret_from_fork_asm+0x1a/0x30
[ 31.802267]
[ 31.802267] -> #0 (scx_sched_lock){-...}-{2:2}:
[ 31.802296] __lock_acquire+0x172e/0x2830
[ 31.802320] lock_acquire+0xd5/0x330
[ 31.802339] _raw_spin_lock_irqsave+0x49/0x80
[ 31.802363] scx_claim_exit+0x7a/0x180
[ 31.802387] scx_vexit+0x3a/0xd0
[ 31.802406] scx_exit+0x50/0x80
[ 31.802425] scx_tick+0x114/0x120
[ 31.802445] sched_tick+0x12e/0x3a0
[ 31.802464] update_process_times+0x90/0xf0
[ 31.802488] tick_nohz_handler+0x97/0x1b0
[ 31.802512] __hrtimer_run_queues+0xac/0x3a0
[ 31.802539] hrtimer_interrupt+0x116/0x280
[ 31.802564] __sysvec_apic_timer_interrupt+0x6b/0x1e0
[ 31.802589] sysvec_apic_timer_interrupt+0x9b/0xc0
[ 31.802613] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 31.802642] pv_native_safe_halt+0xb/0x10
[ 31.802669] arch_cpu_idle+0x9/0x10
[ 31.802687] default_idle_call+0x7c/0x220
[ 31.802713] do_idle+0x211/0x260
[ 31.802732] cpu_startup_entry+0x29/0x30
[ 31.802756] start_secondary+0x12d/0x170
[ 31.802780] common_startup_64+0x13e/0x141
[ 31.802804]
[ 31.802804] other info that might help us debug this:
[ 31.802804]
[ 31.802834] Possible unsafe locking scenario:
[ 31.802834]
[ 31.802859]        CPU0                    CPU1
[ 31.802879]        ----                    ----
[ 31.802899]   lock(&rq->__lock);
[ 31.802918]                                lock(scx_sched_lock);
[ 31.802949]                                lock(&rq->__lock);
[ 31.802978]   lock(scx_sched_lock);
[ 31.802997]
[ 31.802997] *** DEADLOCK ***
Link to full log:
https://gist.github.com/EricccTaiwan/bc7d8eac7a9a31af36a2e9f0a295da7c
Thanks,
Cheng-Yang
---
Cheng-Yang Chou (1):
sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant
propagation
kernel/sched/ext.c | 43 ++++++++++++++++++++++++++-----------------
1 file changed, 26 insertions(+), 17 deletions(-)
--
2.48.1
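
For readers less familiar with lockdep, the kind of check behind the report
above can be approximated in user space. The following is only a minimal
sketch, not how lockdep is implemented: the enum, the dep[] matrix, and the
helpers reachable() and record_acquire() are all hypothetical names made up
for illustration. It records "B was taken while holding A" edges and refuses
an edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for the two locks in the report. */
enum lock_class { SCX_SCHED_LOCK, RQ_LOCK, NR_LOCK_CLASSES };

/* dep[a][b] is true once some path has taken b while holding a. */
static bool dep[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Depth-first search: can `to` be reached from `from` via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire `next` while holding `held`". Returns false, without
 * recording the edge, if `held` is already reachable from `next`, i.e. if
 * the new edge would close a cycle in the dependency graph.
 */
static bool record_acquire(enum lock_class held, enum lock_class next)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	assert(held < NR_LOCK_CLASSES && next < NR_LOCK_CLASSES);
	if (reachable(next, held, seen))
		return false;
	dep[held][next] = true;
	return true;
}
```

In this model, record_acquire(SCX_SCHED_LOCK, RQ_LOCK) plays the role of the
scx_bypass() path (chain entry #1) and succeeds, after which
record_acquire(RQ_LOCK, SCX_SCHED_LOCK), the tick path (chain entry #0), is
rejected because it would complete the cycle.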
* [PATCH 1/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
From: Cheng-Yang Chou @ 2026-03-09 16:30 UTC (permalink / raw)
To: sched-ext; +Cc: tj, void, arighi, changwoo, jserv, yphbchou0911
scx_claim_exit() acquires scx_sched_lock to propagate exits to
descendant schedulers, but it can be reached from the timer tick path
with the rq lock already held:

  scx_tick() -> scx_exit() -> scx_vexit() -> scx_claim_exit()

scx_bypass() establishes scx_sched_lock -> rq lock ordering, creating a
circular dependency:

       CPU0                    CPU1
       ----                    ----
  lock(&rq->__lock);
                               lock(scx_sched_lock);
                               lock(&rq->__lock);
  lock(scx_sched_lock);

Fix this by moving descendant propagation to scx_disable_workfn(), which
runs in kthread context without any rq lock held. Forward progress is
guaranteed by sch->aborting being set in scx_claim_exit() before
returning. No recursion is introduced since SCX_EXIT_PARENT exits skip
propagation.

Additionally, switch from raw_spinlock_irqsave to raw_spinlock_irq in
the workfn, as it runs in kthread context with IRQs enabled and the IRQ
flags need not be saved.

Finally, add a blank line after the declaration of pos to avoid
checkpatch failures.

Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
kernel/sched/ext.c | 43 ++++++++++++++++++++++++++-----------------
1 file changed, 26 insertions(+), 17 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index d6d807337013..e767b45a8ab5 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5616,23 +5616,6 @@ static bool scx_claim_exit(struct scx_sched *sch, enum scx_exit_kind kind)
 	 */
 	WRITE_ONCE(sch->aborting, true);

-	/*
-	 * Propagate exits to descendants immediately. Each has a dedicated
-	 * helper kthread and can run in parallel. While most of disabling is
-	 * serialized, running them in separate threads allows parallelizing
-	 * ops.exit(), which can take arbitrarily long prolonging bypass mode.
-	 *
-	 * This doesn't cause recursions as propagation only takes place for
-	 * non-propagation exits.
-	 */
-	if (kind != SCX_EXIT_PARENT) {
-		scoped_guard (raw_spinlock_irqsave, &scx_sched_lock) {
-			struct scx_sched *pos;
-			scx_for_each_descendant_pre(pos, sch)
-				scx_disable(pos, SCX_EXIT_PARENT);
-		}
-	}
-
 	return true;
 }

@@ -5650,6 +5633,32 @@ static void scx_disable_workfn(struct kthread_work *work)
 		if (atomic_try_cmpxchg(&sch->exit_kind, &kind, SCX_EXIT_DONE))
 			break;
 	}
+
+	/*
+	 * Propagate exits to descendants. Each has a dedicated helper kthread
+	 * and can run in parallel. While most of disabling is serialized,
+	 * running them in separate threads allows parallelizing ops.exit(),
+	 * which can take arbitrarily long prolonging bypass mode.
+	 *
+	 * This is done here rather than in scx_claim_exit() to avoid taking
+	 * scx_sched_lock while an rq lock may be held: scx_claim_exit() can
+	 * be reached from the timer tick path with the rq lock already held,
+	 * but scx_bypass() establishes scx_sched_lock -> rq lock ordering,
+	 * which would create a circular dependency. This workfn runs in
+	 * kthread context without any rq lock held, so it is safe here.
+	 *
+	 * This doesn't cause recursion as scx_disable(pos, SCX_EXIT_PARENT)
+	 * calls scx_claim_exit(pos, SCX_EXIT_PARENT), which skips this block.
+	 */
+	if (kind != SCX_EXIT_PARENT) {
+		scoped_guard(raw_spinlock_irq, &scx_sched_lock) {
+			struct scx_sched *pos;
+
+			scx_for_each_descendant_pre(pos, sch)
+				scx_disable(pos, SCX_EXIT_PARENT);
+		}
+	}
+
 	ei->kind = kind;
 	ei->reason = scx_exit_reason(ei->kind);

--
2.48.1
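
The control flow proposed by the patch above can be modeled in user space as
follows. This is a single-threaded sketch under loud assumptions, not kernel
code: struct sched, the exit_kind values, and the helpers claim_exit(),
propagate_exit(), and disable_workfn() are hypothetical stand-ins, and no
real locking or kthread queueing is modeled. It only shows that the claim
step flags the scheduler without touching descendants, and that the deferred
step propagates SCX_EXIT_PARENT-style exits exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's exit kinds; values are illustrative. */
enum exit_kind { EXIT_NONE, EXIT_ERROR, EXIT_PARENT, EXIT_DONE };

#define MAX_CHILDREN 4

struct sched {
	enum exit_kind exit_kind;
	bool aborting;
	bool disabled;
	struct sched *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Models the tick-path half: only flags the exit and takes no lock. */
static bool claim_exit(struct sched *sch, enum exit_kind kind)
{
	if (sch->exit_kind != EXIT_NONE)
		return false;	/* someone else already claimed the exit */
	sch->exit_kind = kind;
	sch->aborting = true;
	return true;
}

/* Pre-order walk over all descendants, flagging each with EXIT_PARENT. */
static void propagate_exit(struct sched *sch)
{
	assert(sch->nr_children <= MAX_CHILDREN);
	for (size_t i = 0; i < sch->nr_children; i++) {
		claim_exit(sch->children[i], EXIT_PARENT);
		propagate_exit(sch->children[i]);
	}
}

/* Models the deferred half, which the patch runs from kthread context. */
static void disable_workfn(struct sched *sch)
{
	enum exit_kind kind = sch->exit_kind;

	if (kind == EXIT_NONE || kind == EXIT_DONE)
		return;
	sch->exit_kind = EXIT_DONE;

	/* In the patch, this is where scx_sched_lock would be taken. */
	if (kind != EXIT_PARENT)
		propagate_exit(sch);
	sch->disabled = true;
}
```

One property visible in this model is that, between claim_exit() on the
parent and the parent's disable_workfn(), the descendants are not yet
flagged; they only learn of the exit once the deferred step runs.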
* Re: [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
From: Tejun Heo @ 2026-03-10 1:18 UTC (permalink / raw)
To: Cheng-Yang Chou
Cc: sched-ext, void, arighi, changwoo, jserv, Emil Tsalapatis,
linux-kernel
Hello,
Thanks for the report. I posted a fix series which takes a different approach
as deferring descendant propagation breaks the forward progress guarantee:
http://lkml.kernel.org/r/20260310011653.2993712-1-tj@kernel.org
Thanks.
--
tejun