* [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
@ 2026-03-09 16:30 Cheng-Yang Chou
2026-03-09 16:30 ` [PATCH 1/1] " Cheng-Yang Chou
2026-03-10 1:18 ` [PATCH 0/1] " Tejun Heo
0 siblings, 2 replies; 3+ messages in thread
From: Cheng-Yang Chou @ 2026-03-09 16:30 UTC (permalink / raw)
To: sched-ext; +Cc: tj, void, arighi, changwoo, jserv, yphbchou0911
While testing scx_rustland under vng, the locking dependency checker
reported a circular locking dependency:
[ 31.801757] ======================================================
[ 31.801786] WARNING: possible circular locking dependency detected
[ 31.801812] 7.0.0-rc2+ #31 Tainted: G E
[ 31.801835] ------------------------------------------------------
[ 31.801860] swapper/7/0 is trying to acquire lock:
[ 31.801884] ffffffffa4ac1638 (scx_sched_lock){-...}-{2:2}, at: scx_claim_exit+0x7a/0x180
[ 31.801923]
[ 31.801923] but task is already holding lock:
[ 31.801951] ffff8ce0bcbc5ca0 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x28/0xc0
[ 31.801990]
[ 31.801990] which lock already depends on the new lock.
[ 31.801990]
[ 31.802021]
[ 31.802021] the existing dependency chain (in reverse order) is:
[ 31.802050]
[ 31.802050] -> #1 (&rq->__lock){-.-.}-{2:2}:
[ 31.802079] _raw_spin_lock_nested+0x2d/0x50
[ 31.802103] raw_spin_rq_lock_nested+0x28/0xc0
[ 31.802128] scx_bypass+0x14e/0x4e0
[ 31.802147] scx_root_enable_workfn+0x2ce/0xa10
[ 31.802171] kthread_worker_fn+0xbf/0x3c0
[ 31.802199] kthread+0x109/0x140
[ 31.802218] ret_from_fork+0x3fd/0x490
[ 31.802242] ret_from_fork_asm+0x1a/0x30
[ 31.802267]
[ 31.802267] -> #0 (scx_sched_lock){-...}-{2:2}:
[ 31.802296] __lock_acquire+0x172e/0x2830
[ 31.802320] lock_acquire+0xd5/0x330
[ 31.802339] _raw_spin_lock_irqsave+0x49/0x80
[ 31.802363] scx_claim_exit+0x7a/0x180
[ 31.802387] scx_vexit+0x3a/0xd0
[ 31.802406] scx_exit+0x50/0x80
[ 31.802425] scx_tick+0x114/0x120
[ 31.802445] sched_tick+0x12e/0x3a0
[ 31.802464] update_process_times+0x90/0xf0
[ 31.802488] tick_nohz_handler+0x97/0x1b0
[ 31.802512] __hrtimer_run_queues+0xac/0x3a0
[ 31.802539] hrtimer_interrupt+0x116/0x280
[ 31.802564] __sysvec_apic_timer_interrupt+0x6b/0x1e0
[ 31.802589] sysvec_apic_timer_interrupt+0x9b/0xc0
[ 31.802613] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 31.802642] pv_native_safe_halt+0xb/0x10
[ 31.802669] arch_cpu_idle+0x9/0x10
[ 31.802687] default_idle_call+0x7c/0x220
[ 31.802713] do_idle+0x211/0x260
[ 31.802732] cpu_startup_entry+0x29/0x30
[ 31.802756] start_secondary+0x12d/0x170
[ 31.802780] common_startup_64+0x13e/0x141
[ 31.802804]
[ 31.802804] other info that might help us debug this:
[ 31.802804]
[ 31.802834] Possible unsafe locking scenario:
[ 31.802834]
[ 31.802859]        CPU0                    CPU1
[ 31.802879]        ----                    ----
[ 31.802899]   lock(&rq->__lock);
[ 31.802918]                                lock(scx_sched_lock);
[ 31.802949]                                lock(&rq->__lock);
[ 31.802978]   lock(scx_sched_lock);
[ 31.802997]
[ 31.802997] *** DEADLOCK ***
Link to full log:
https://gist.github.com/EricccTaiwan/bc7d8eac7a9a31af36a2e9f0a295da7c
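
The report above can be read as a cycle in a lock-dependency graph: one
path took the rq lock while holding scx_sched_lock, and the tick path
then tried the opposite order. The following is a minimal userspace
sketch of that idea, not the kernel's lockdep implementation. The
lock_graph type and the graph_init(), reachable() and record_acquire()
helpers are hypothetical names chosen for illustration; only the two
lock names come from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes; the names mirror the two locks in the report. */
enum lock_class { RQ_LOCK, SCX_SCHED_LOCK, NR_CLASSES };

/* dep[a][b] is true once some path acquired b while already holding a. */
struct lock_graph {
	bool dep[NR_CLASSES][NR_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'want' while holding 'held'". Returns false, without
 * recording the edge, when 'held' is already reachable from 'want',
 * because adding held -> want would then close a cycle.
 */
static bool record_acquire(struct lock_graph *g, enum lock_class held,
			   enum lock_class want)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && want < NR_CLASSES);
	if (reachable(g, (int)want, (int)held, seen))
		return false;
	g->dep[held][want] = true;
	return true;
}
```

Feeding it the two orders from the log, first (SCX_SCHED_LOCK, RQ_LOCK)
for the scx_bypass() path and then (RQ_LOCK, SCX_SCHED_LOCK) for the
scx_tick() path, accepts the first edge and rejects the second.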
Thanks,
Cheng-Yang
---
Cheng-Yang Chou (1):
sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant
propagation
kernel/sched/ext.c | 43 ++++++++++++++++++++++++++-----------------
1 file changed, 26 insertions(+), 17 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
2026-03-09 16:30 [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation Cheng-Yang Chou
@ 2026-03-09 16:30 ` Cheng-Yang Chou
2026-03-10 1:18 ` [PATCH 0/1] " Tejun Heo
1 sibling, 0 replies; 3+ messages in thread
From: Cheng-Yang Chou @ 2026-03-09 16:30 UTC (permalink / raw)
To: sched-ext; +Cc: tj, void, arighi, changwoo, jserv, yphbchou0911

scx_claim_exit() acquired scx_sched_lock to propagate exits to descendant
schedulers, but it can be reached from the timer tick path with the rq
lock already held:

  scx_tick() -> scx_exit() -> scx_vexit() -> scx_claim_exit()

scx_bypass() establishes scx_sched_lock -> rq lock ordering, creating a
circular dependency:

       CPU0                    CPU1
       ----                    ----
  lock(&rq->__lock);
                               lock(scx_sched_lock);
                               lock(&rq->__lock);
  lock(scx_sched_lock);

Fix this by moving descendant propagation to scx_disable_workfn(), which
runs in kthread context without any rq lock held. Forward progress is
guaranteed by sch->aborting being set in scx_claim_exit() before
returning. No recursion is introduced since SCX_EXIT_PARENT exits skip
propagation.

Additionally, switch from raw_spinlock_irqsave to raw_spinlock_irq in the
workfn as IRQ flags need not be saved in kthread context.

Finally, add a blank line to avoid checkpatch failures.

Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
 kernel/sched/ext.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index d6d807337013..e767b45a8ab5 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5616,23 +5616,6 @@ static bool scx_claim_exit(struct scx_sched *sch, enum scx_exit_kind kind)
 	 */
 	WRITE_ONCE(sch->aborting, true);
 
-	/*
-	 * Propagate exits to descendants immediately. Each has a dedicated
-	 * helper kthread and can run in parallel. While most of disabling is
-	 * serialized, running them in separate threads allows parallelizing
-	 * ops.exit(), which can take arbitrarily long prolonging bypass mode.
-	 *
-	 * This doesn't cause recursions as propagation only takes place for
-	 * non-propagation exits.
-	 */
-	if (kind != SCX_EXIT_PARENT) {
-		scoped_guard (raw_spinlock_irqsave, &scx_sched_lock) {
-			struct scx_sched *pos;
-			scx_for_each_descendant_pre(pos, sch)
-				scx_disable(pos, SCX_EXIT_PARENT);
-		}
-	}
-
 	return true;
 }
 
@@ -5650,6 +5633,32 @@ static void scx_disable_workfn(struct kthread_work *work)
 		if (atomic_try_cmpxchg(&sch->exit_kind, &kind, SCX_EXIT_DONE))
 			break;
 	}
+
+	/*
+	 * Propagate exits to descendants. Each has a dedicated helper kthread
+	 * and can run in parallel. While most of disabling is serialized,
+	 * running them in separate threads allows parallelizing ops.exit(),
+	 * which can take arbitrarily long prolonging bypass mode.
+	 *
+	 * This is done here rather than in scx_claim_exit() to avoid taking
+	 * scx_sched_lock while an rq lock may be held: scx_claim_exit() can
+	 * be reached from the timer tick path with the rq lock already held,
+	 * but scx_bypass() establishes scx_sched_lock -> rq lock ordering,
+	 * which would create a circular dependency. This workfn runs in
+	 * kthread context without any rq lock held, so it is safe here.
+	 *
+	 * This doesn't cause recursion as scx_disable(pos, SCX_EXIT_PARENT)
+	 * calls scx_claim_exit(pos, SCX_EXIT_PARENT), which skips this block.
+	 */
+	if (kind != SCX_EXIT_PARENT) {
+		scoped_guard(raw_spinlock_irq, &scx_sched_lock) {
+			struct scx_sched *pos;
+
+			scx_for_each_descendant_pre(pos, sch)
+				scx_disable(pos, SCX_EXIT_PARENT);
+		}
+	}
+
 	ei->kind = kind;
 	ei->reason = scx_exit_reason(ei->kind);
 
-- 
2.48.1

^ permalink raw reply related [flat|nested] 3+ messages in thread
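
The shape of the change in this patch (claim the exit without taking the
tree lock, propagate to descendants later from the work function) can be
sketched in a single-threaded userspace model. This is an illustration
under stated assumptions, not the kernel code: struct sched, claim_exit(),
propagate() and disable_workfn() are hypothetical stand-ins, the tree lock
is omitted entirely, and the kernel's exit kinds are reduced to three
values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exit kinds; EXIT_PARENT plays the role of SCX_EXIT_PARENT. */
enum exit_kind { EXIT_NONE, EXIT_ERROR, EXIT_PARENT };

#define MAX_CHILDREN 4

/* Hypothetical scheduler node in a parent/child tree. */
struct sched {
	atomic_int exit_kind;	/* EXIT_NONE until an exit is claimed */
	atomic_bool aborting;
	struct sched *child[MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Claim the exit with a single compare-and-swap. No lock is taken, so in
 * this model it is callable from a context that already holds another
 * lock. Only the first caller wins.
 */
static bool claim_exit(struct sched *sch, enum exit_kind kind)
{
	int none = EXIT_NONE;

	if (!atomic_compare_exchange_strong(&sch->exit_kind, &none, kind))
		return false;
	atomic_store(&sch->aborting, true);
	return true;
}

/* Pre-order walk over all descendants, claiming each with EXIT_PARENT. */
static void propagate(struct sched *sch)
{
	for (size_t i = 0; i < sch->nr_children; i++) {
		assert(sch->child[i] != NULL);
		claim_exit(sch->child[i], EXIT_PARENT);
		propagate(sch->child[i]);
	}
}

/*
 * Deferred half: stands in for the work function, where taking the tree
 * lock would be safe. Exits that were themselves propagated do not
 * propagate again, which is what prevents recursion.
 */
static void disable_workfn(struct sched *sch)
{
	if (atomic_load(&sch->exit_kind) != EXIT_PARENT)
		propagate(sch);
}
```

In this model, descendants observe nothing between the parent's
claim_exit() and the parent's disable_workfn() running; they are claimed
only once the deferred half executes.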
* Re: [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
2026-03-09 16:30 [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation Cheng-Yang Chou
2026-03-09 16:30 ` [PATCH 1/1] " Cheng-Yang Chou
@ 2026-03-10 1:18 ` Tejun Heo
1 sibling, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2026-03-10 1:18 UTC (permalink / raw)
To: Cheng-Yang Chou
Cc: sched-ext, void, arighi, changwoo, jserv, Emil Tsalapatis, linux-kernel

Hello,

Thanks for the report. I posted a fix series which takes a different
approach as deferring descendant propagation breaks the forward progress
guarantee:

http://lkml.kernel.org/r/20260310011653.2993712-1-tj@kernel.org

Thanks.

-- 
tejun

^ permalink raw reply [flat|nested] 3+ messages in thread