From: Cheng-Yang Chou <yphbchou0911@gmail.com>
To: sched-ext@lists.linux.dev
Cc: tj@kernel.org, void@manifault.com, arighi@nvidia.com,
	changwoo@igalia.com, jserv@ccns.ncku.edu.tw,
	yphbchou0911@gmail.com
Subject: [PATCH 1/1] sched_ext: Fix deadlock in scx_claim_exit() by deferring descendant propagation
Date: Tue, 10 Mar 2026 00:30:25 +0800
Message-ID: <20260309163025.2240221-2-yphbchou0911@gmail.com>
In-Reply-To: <20260309163025.2240221-1-yphbchou0911@gmail.com>

scx_claim_exit() acquires scx_sched_lock to propagate exits to
descendant schedulers, but it can be reached from the timer tick path
with the rq lock already held:

	scx_tick() -> scx_exit() -> scx_vexit() -> scx_claim_exit()

scx_bypass() establishes scx_sched_lock -> rq lock ordering, so taking
scx_sched_lock under the rq lock in the tick path creates a circular
dependency:

        CPU0                    CPU1
        ----                    ----
   lock(&rq->__lock);
                                lock(scx_sched_lock);
                                lock(&rq->__lock);
   lock(scx_sched_lock);

Fix this by moving descendant propagation to scx_disable_workfn(), which
runs in kthread context without any rq lock held. Forward progress is
guaranteed by sch->aborting being set in scx_claim_exit() before
returning. No recursion is introduced since SCX_EXIT_PARENT exits skip
propagation.

Additionally, switch from raw_spinlock_irqsave to raw_spinlock_irq in
the workfn, as IRQ flags need not be saved in kthread context.

Finally, add a blank line after the declaration of pos to avoid
checkpatch failures.

Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
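Note for reviewers: the inversion described in the changelog can be
reproduced in miniature with a userspace sketch. This is not kernel code
and not how lockdep is implemented. The lock classes and the helpers
record_order() and reset_orders() are hypothetical names used only for
illustration, and the check covers only a direct two-lock (ABBA)
inversion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for the two kernel locks. */
enum lock_class { RQ_LOCK, SCX_SCHED_LOCK, NR_LOCK_CLASSES };

/* order[a][b] is true once some path acquired b while holding a. */
static bool order[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Forget all recorded orderings. */
static void reset_orders(void)
{
	for (int a = 0; a < NR_LOCK_CLASSES; a++)
		for (int b = 0; b < NR_LOCK_CLASSES; b++)
			order[a][b] = false;
}

/*
 * Record "inner acquired while outer is held". Returns false if the
 * reverse ordering was recorded earlier, i.e. two paths take the same
 * pair of locks in opposite order and can deadlock.
 */
static bool record_order(enum lock_class outer, enum lock_class inner)
{
	assert(outer < NR_LOCK_CLASSES && inner < NR_LOCK_CLASSES);

	if (order[inner][outer])
		return false;

	order[outer][inner] = true;
	return true;
}
```

Recording (SCX_SCHED_LOCK, RQ_LOCK) models the scx_bypass() ordering. A
later record_order(RQ_LOCK, SCX_SCHED_LOCK), which models the old tick
path, returns false. With propagation moved to the workfn that second
ordering is never recorded.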
 kernel/sched/ext.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)
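
The deferral and the no-recursion argument can also be sketched as a
single-threaded userspace model. It is a simplification under stated
assumptions: struct sched, claim_exit(), disable() and disable_workfn()
are hypothetical stand-ins for the kernel functions, locking and the
kthread are omitted, and work_queued only marks that the work function
would run later.

```c
#include <assert.h>
#include <stdbool.h>

enum exit_kind { EXIT_NONE, EXIT_ERROR, EXIT_PARENT, EXIT_DONE };

#define MAX_CHILDREN 4

/* Hypothetical, heavily reduced stand-in for struct scx_sched. */
struct sched {
	enum exit_kind exit_kind;
	bool aborting;
	bool work_queued;
	int nr_children;
	struct sched *children[MAX_CHILDREN];
};

/* Models scx_claim_exit(): first claim wins, no descendant walk here. */
static bool claim_exit(struct sched *sch, enum exit_kind kind)
{
	if (sch->exit_kind != EXIT_NONE)
		return false;

	sch->exit_kind = kind;
	sch->aborting = true;
	return true;
}

/* Models scx_disable(): claim the exit, then schedule the work function. */
static void disable(struct sched *sch, enum exit_kind kind)
{
	if (claim_exit(sch, kind))
		sch->work_queued = true;
}

/* Pre-order walk: disable each descendant with EXIT_PARENT. */
static void disable_descendants(struct sched *sch)
{
	for (int i = 0; i < sch->nr_children; i++) {
		disable(sch->children[i], EXIT_PARENT);
		disable_descendants(sch->children[i]);
	}
}

/*
 * Models the propagation step in the work function. Returns 1 if this
 * invocation propagated to descendants, 0 if it skipped propagation.
 */
static int disable_workfn(struct sched *sch)
{
	enum exit_kind kind = sch->exit_kind;
	int propagated = 0;

	sch->work_queued = false;
	sch->exit_kind = EXIT_DONE;

	/* EXIT_PARENT exits were themselves propagated, so they stop here. */
	if (kind != EXIT_PARENT) {
		disable_descendants(sch);
		propagated = 1;
	}
	return propagated;
}
```

In this model disable() on a root touches only the root. The first
disable_workfn() call on the root marks every descendant EXIT_PARENT,
and the descendants' own work functions then skip propagation.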

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index d6d807337013..e767b45a8ab5 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5616,23 +5616,6 @@ static bool scx_claim_exit(struct scx_sched *sch, enum scx_exit_kind kind)
 	 */
 	WRITE_ONCE(sch->aborting, true);
 
-	/*
-	 * Propagate exits to descendants immediately. Each has a dedicated
-	 * helper kthread and can run in parallel. While most of disabling is
-	 * serialized, running them in separate threads allows parallelizing
-	 * ops.exit(), which can take arbitrarily long prolonging bypass mode.
-	 *
-	 * This doesn't cause recursions as propagation only takes place for
-	 * non-propagation exits.
-	 */
-	if (kind != SCX_EXIT_PARENT) {
-		scoped_guard (raw_spinlock_irqsave, &scx_sched_lock) {
-			struct scx_sched *pos;
-			scx_for_each_descendant_pre(pos, sch)
-				scx_disable(pos, SCX_EXIT_PARENT);
-		}
-	}
-
 	return true;
 }
 
@@ -5650,6 +5633,32 @@ static void scx_disable_workfn(struct kthread_work *work)
 		if (atomic_try_cmpxchg(&sch->exit_kind, &kind, SCX_EXIT_DONE))
 			break;
 	}
+
+	/*
+	 * Propagate exits to descendants. Each has a dedicated helper kthread
+	 * and can run in parallel. While most of disabling is serialized,
+	 * running them in separate threads allows parallelizing ops.exit(),
+	 * which can take arbitrarily long prolonging bypass mode.
+	 *
+	 * This is done here rather than in scx_claim_exit() to avoid taking
+	 * scx_sched_lock while an rq lock may be held: scx_claim_exit() can
+	 * be reached from the timer tick path with the rq lock already held,
+	 * but scx_bypass() establishes scx_sched_lock -> rq lock ordering,
+	 * which would create a circular dependency. This workfn runs in
+	 * kthread context without any rq lock held, so it is safe here.
+	 *
+	 * This doesn't cause recursion as scx_disable(pos, SCX_EXIT_PARENT)
+	 * calls scx_claim_exit(pos, SCX_EXIT_PARENT), which skips this block.
+	 */
+	if (kind != SCX_EXIT_PARENT) {
+		scoped_guard(raw_spinlock_irq, &scx_sched_lock) {
+			struct scx_sched *pos;
+
+			scx_for_each_descendant_pre(pos, sch)
+				scx_disable(pos, SCX_EXIT_PARENT);
+		}
+	}
+
 	ei->kind = kind;
 	ei->reason = scx_exit_reason(ei->kind);
 
-- 
2.48.1