public inbox for sched-ext@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
@ 2026-03-09 16:30 Cheng-Yang Chou
  2026-03-09 16:30 ` [PATCH 1/1] " Cheng-Yang Chou
  2026-03-10  1:18 ` [PATCH 0/1] " Tejun Heo
  0 siblings, 2 replies; 3+ messages in thread
From: Cheng-Yang Chou @ 2026-03-09 16:30 UTC (permalink / raw)
  To: sched-ext; +Cc: tj, void, arighi, changwoo, jserv, yphbchou0911

While testing scx_rustland under vng, lockdep (the kernel's locking
dependency checker) reported a possible circular locking dependency:

[   31.801757] ======================================================
[   31.801786] WARNING: possible circular locking dependency detected
[   31.801812] 7.0.0-rc2+ #31 Tainted: G            E
[   31.801835] ------------------------------------------------------
[   31.801860] swapper/7/0 is trying to acquire lock:
[   31.801884] ffffffffa4ac1638 (scx_sched_lock){-...}-{2:2}, at: scx_claim_exit+0x7a/0x180
[   31.801923]
[   31.801923] but task is already holding lock:
[   31.801951] ffff8ce0bcbc5ca0 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x28/0xc0
[   31.801990]
[   31.801990] which lock already depends on the new lock.
[   31.801990]
[   31.802021]
[   31.802021] the existing dependency chain (in reverse order) is:
[   31.802050]
[   31.802050] -> #1 (&rq->__lock){-.-.}-{2:2}:
[   31.802079]        _raw_spin_lock_nested+0x2d/0x50
[   31.802103]        raw_spin_rq_lock_nested+0x28/0xc0
[   31.802128]        scx_bypass+0x14e/0x4e0
[   31.802147]        scx_root_enable_workfn+0x2ce/0xa10
[   31.802171]        kthread_worker_fn+0xbf/0x3c0
[   31.802199]        kthread+0x109/0x140
[   31.802218]        ret_from_fork+0x3fd/0x490
[   31.802242]        ret_from_fork_asm+0x1a/0x30
[   31.802267]
[   31.802267] -> #0 (scx_sched_lock){-...}-{2:2}:
[   31.802296]        __lock_acquire+0x172e/0x2830
[   31.802320]        lock_acquire+0xd5/0x330
[   31.802339]        _raw_spin_lock_irqsave+0x49/0x80
[   31.802363]        scx_claim_exit+0x7a/0x180
[   31.802387]        scx_vexit+0x3a/0xd0
[   31.802406]        scx_exit+0x50/0x80
[   31.802425]        scx_tick+0x114/0x120
[   31.802445]        sched_tick+0x12e/0x3a0
[   31.802464]        update_process_times+0x90/0xf0
[   31.802488]        tick_nohz_handler+0x97/0x1b0
[   31.802512]        __hrtimer_run_queues+0xac/0x3a0
[   31.802539]        hrtimer_interrupt+0x116/0x280
[   31.802564]        __sysvec_apic_timer_interrupt+0x6b/0x1e0
[   31.802589]        sysvec_apic_timer_interrupt+0x9b/0xc0
[   31.802613]        asm_sysvec_apic_timer_interrupt+0x1b/0x20
[   31.802642]        pv_native_safe_halt+0xb/0x10
[   31.802669]        arch_cpu_idle+0x9/0x10
[   31.802687]        default_idle_call+0x7c/0x220
[   31.802713]        do_idle+0x211/0x260
[   31.802732]        cpu_startup_entry+0x29/0x30
[   31.802756]        start_secondary+0x12d/0x170
[   31.802780]        common_startup_64+0x13e/0x141
[   31.802804]
[   31.802804] other info that might help us debug this:
[   31.802804]
[   31.802834]  Possible unsafe locking scenario:
[   31.802834]
[   31.802859]        CPU0                    CPU1
[   31.802879]        ----                    ----
[   31.802899]   lock(&rq->__lock);
[   31.802918]                                lock(scx_sched_lock);
[   31.802949]                                lock(&rq->__lock);
[   31.802978]   lock(scx_sched_lock);
[   31.802997]
[   31.802997]  *** DEADLOCK ***

Link to full log:
        https://gist.github.com/EricccTaiwan/bc7d8eac7a9a31af36a2e9f0a295da7c

Thanks,
Cheng-Yang

---

Cheng-Yang Chou (1):
  sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant
    propagation

 kernel/sched/ext.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
  2026-03-09 16:30 [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation Cheng-Yang Chou
@ 2026-03-09 16:30 ` Cheng-Yang Chou
  2026-03-10  1:18 ` [PATCH 0/1] " Tejun Heo
  1 sibling, 0 replies; 3+ messages in thread
From: Cheng-Yang Chou @ 2026-03-09 16:30 UTC (permalink / raw)
  To: sched-ext; +Cc: tj, void, arighi, changwoo, jserv, yphbchou0911

scx_claim_exit() acquires scx_sched_lock to propagate exits to
descendant schedulers, but it can be reached from the timer tick path
with the rq lock already held:

	scx_tick() -> scx_exit() -> scx_vexit() -> scx_claim_exit()

scx_bypass() establishes scx_sched_lock -> rq lock ordering, creating a
circular dependency:

        CPU0                    CPU1
        ----                    ----
   lock(&rq->__lock);
                                lock(scx_sched_lock);
                                lock(&rq->__lock);
   lock(scx_sched_lock);

Fix this by moving descendant propagation to scx_disable_workfn(), which
runs in kthread context without any rq lock held. Forward progress is
guaranteed by sch->aborting being set in scx_claim_exit() before
returning. No recursion is introduced since SCX_EXIT_PARENT exits skip
propagation.

Additionally, switch the workfn from the raw_spinlock_irqsave guard to
raw_spinlock_irq: IRQs are enabled in kthread context, so there is no
need to save and restore the IRQ flags.

Finally, add a blank line after the variable declaration to avoid a
checkpatch warning.

Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
 kernel/sched/ext.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index d6d807337013..e767b45a8ab5 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5616,23 +5616,6 @@ static bool scx_claim_exit(struct scx_sched *sch, enum scx_exit_kind kind)
 	 */
 	WRITE_ONCE(sch->aborting, true);
 
-	/*
-	 * Propagate exits to descendants immediately. Each has a dedicated
-	 * helper kthread and can run in parallel. While most of disabling is
-	 * serialized, running them in separate threads allows parallelizing
-	 * ops.exit(), which can take arbitrarily long prolonging bypass mode.
-	 *
-	 * This doesn't cause recursions as propagation only takes place for
-	 * non-propagation exits.
-	 */
-	if (kind != SCX_EXIT_PARENT) {
-		scoped_guard (raw_spinlock_irqsave, &scx_sched_lock) {
-			struct scx_sched *pos;
-			scx_for_each_descendant_pre(pos, sch)
-				scx_disable(pos, SCX_EXIT_PARENT);
-		}
-	}
-
 	return true;
 }
 
@@ -5650,6 +5633,32 @@ static void scx_disable_workfn(struct kthread_work *work)
 		if (atomic_try_cmpxchg(&sch->exit_kind, &kind, SCX_EXIT_DONE))
 			break;
 	}
+
+	/*
+	 * Propagate exits to descendants. Each has a dedicated helper kthread
+	 * and can run in parallel. While most of disabling is serialized,
+	 * running them in separate threads allows parallelizing ops.exit(),
+	 * which can take arbitrarily long prolonging bypass mode.
+	 *
+	 * This is done here rather than in scx_claim_exit() to avoid taking
+	 * scx_sched_lock while an rq lock may be held: scx_claim_exit() can
+	 * be reached from the timer tick path with the rq lock already held,
+	 * but scx_bypass() establishes scx_sched_lock -> rq lock ordering,
+	 * which would create a circular dependency. This workfn runs in
+	 * kthread context without any rq lock held, so it is safe here.
+	 *
+	 * This doesn't cause recursion as scx_disable(pos, SCX_EXIT_PARENT)
+	 * calls scx_claim_exit(pos, SCX_EXIT_PARENT), which skips this block.
+	 */
+	if (kind != SCX_EXIT_PARENT) {
+		scoped_guard(raw_spinlock_irq, &scx_sched_lock) {
+			struct scx_sched *pos;
+
+			scx_for_each_descendant_pre(pos, sch)
+				scx_disable(pos, SCX_EXIT_PARENT);
+		}
+	}
+
 	ei->kind = kind;
 	ei->reason = scx_exit_reason(ei->kind);
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
  2026-03-09 16:30 [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation Cheng-Yang Chou
  2026-03-09 16:30 ` [PATCH 1/1] " Cheng-Yang Chou
@ 2026-03-10  1:18 ` Tejun Heo
  1 sibling, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2026-03-10  1:18 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, void, arighi, changwoo, jserv, Emil Tsalapatis,
	linux-kernel

Hello,

Thanks for the report. I posted a fix series that takes a different approach,
as deferring descendant propagation breaks the forward-progress guarantee:

  http://lkml.kernel.org/r/20260310011653.2993712-1-tj@kernel.org

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-03-10  1:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-09 16:30 [PATCH 0/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation Cheng-Yang Chou
2026-03-09 16:30 ` [PATCH 1/1] " Cheng-Yang Chou
2026-03-10  1:18 ` [PATCH 0/1] " Tejun Heo
