linux-rt-devel.lists.linux.dev archive mirror
* [PATCH AUTOSEL 6.17] sched_ext: Fix possible deadlock in the deferred_irq_workfn()
       [not found] <20251124080644.3871678-1-sashal@kernel.org>
@ 2025-11-24  8:06 ` Sasha Levin
  2025-11-24  8:06 ` [PATCH AUTOSEL 6.17] sched_ext: Use IRQ_WORK_INIT_HARD() to initialize rq->scx.kick_cpus_irq_work Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-11-24  8:06 UTC
  To: patches, stable
  Cc: Zqiang, Tejun Heo, Sasha Levin, mingo, peterz, juri.lelli,
	vincent.guittot, bigeasy, clrkwllms, rostedt, sched-ext,
	linux-kernel, linux-rt-devel

From: Zqiang <qiang.zhang@linux.dev>

[ Upstream commit a257e974210320ede524f340ffe16bf4bf0dda1e ]

For PREEMPT_RT=y kernels, deferred_irq_workfn() is executed in the
per-CPU irq_work/* task context with interrupts enabled. If the rq
returned by container_of() is the current CPU's rq, the following
scenario may occur:

lock(&rq->__lock);
<Interrupt>
  lock(&rq->__lock);

This commit uses IRQ_WORK_INIT_HARD() instead of init_irq_work() to
initialize rq->scx.deferred_irq_work, so that deferred_irq_workfn()
is always invoked in hard-irq context.

Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Comprehensive Analysis

### 1. Commit Message and Problem Analysis
The commit addresses a **critical deadlock** scenario in the `sched_ext`
(BPF extensible scheduler) subsystem.
- **The Issue:** On `CONFIG_PREEMPT_RT` (Real-Time) kernels, `irq_work`
  items initialized with `init_irq_work()` run by default in the
  per-CPU `irq_work/*` thread with interrupts enabled. The work function
  `deferred_irq_workfn()` acquires the runqueue lock
  (`raw_spin_rq_lock(rq)`). If an interrupt arrives on the same CPU
  while this lock is held and its handler also tries to acquire the same
  `rq->__lock` (a standard scheduler pattern), the CPU spins on a lock
  it already holds (an AA deadlock).
- **The Fix:** The commit changes the initialization of
  `deferred_irq_work` to use `IRQ_WORK_INIT_HARD()`. This forces the
  work function to execute in **hard interrupt context** (with
  interrupts disabled), preventing the nested interrupt scenario that
  causes the deadlock.
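
The lock sequence from the commit message can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
`model_lock`, `model_lock_acquire()`, `model_lock_release()` and
`model_run_work()` are hypothetical names, not kernel APIs, and the
"interrupt" is simulated by a nested acquire on the same CPU.

```c
#include <stdbool.h>

/* Hypothetical stand-in for rq->__lock: a non-recursive lock. */
struct model_lock {
	bool held;
};

/*
 * Try to take the lock. Returns false where a real raw spinlock would
 * spin forever because the current CPU already holds it.
 */
static bool model_lock_acquire(struct model_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void model_lock_release(struct model_lock *lock)
{
	lock->held = false;
}

/*
 * Model one run of a work function that holds the lock while an
 * interrupt arrives on the same CPU and its handler wants the same
 * lock. Returns true if the run completes, false on self-deadlock.
 */
static bool model_run_work(bool irqs_enabled)
{
	struct model_lock rq_lock = { .held = false };
	bool completed = true;

	model_lock_acquire(&rq_lock);	/* work function: lock(&rq->__lock) */

	if (irqs_enabled) {
		/* <Interrupt> preempts the work function on this CPU. */
		if (model_lock_acquire(&rq_lock))
			model_lock_release(&rq_lock);
		else
			completed = false;	/* nested lock(&rq->__lock) */
	}
	/* With interrupts disabled, the handler can only run after the unlock. */

	model_lock_release(&rq_lock);
	return completed;
}
```

In this model, `model_run_work(true)` corresponds to the threaded
PREEMPT_RT case and reports the self-deadlock, while
`model_run_work(false)` corresponds to hard-irq execution and completes.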

### 2. Deep Code Research & Verification
- **Subsystem Context:** `sched_ext` was merged in Linux v6.12. The
  buggy code exists in all stable kernels starting from v6.12.y up to
  the current v6.17.y. Older LTS kernels (6.6.y, 6.1.y) do not contain
  `sched_ext` and are unaffected.
- **Code Mechanics:**
  - **Buggy Code:** `init_irq_work(&rq->scx.deferred_irq_work,
    deferred_irq_workfn);` relies on defaults which are unsafe for this
    locking pattern on PREEMPT_RT.
  - **Corrected Code:** `rq->scx.deferred_irq_work =
    IRQ_WORK_INIT_HARD(deferred_irq_workfn);` explicitly sets the
    `IRQ_WORK_HARD_IRQ` flag.
  - **Precedent:** This pattern is well-established in the scheduler
    core (e.g., `rto_push_work` in `kernel/sched/topology.c` uses
    `IRQ_WORK_INIT_HARD` for the exact same reason).
- **Correctness:** `deferred_irq_workfn` takes the runqueue lock with
  `raw_spin_rq_lock` around its call to `run_deferred`. Raw spinlocks
  stay spinning locks on PREEMPT_RT, so they are safe to take in
  hard-irq context. The fix is technically sound and adheres to locking
  rules.
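
The difference between the two initializers can be sketched with a
simplified userspace model. Everything prefixed `model_`/`MODEL_` is
hypothetical: the struct layout, the flag value, and the context
decision are simplifications of the real `irq_work` code, not copies of
it. The model only assumes what the text above states, namely that on
PREEMPT_RT a work item without the hard-irq flag is handed to the
per-CPU `irq_work/*` thread.

```c
#include <stdbool.h>

/* Illustrative flag value; the kernel defines its own IRQ_WORK_HARD_IRQ. */
#define MODEL_IRQ_WORK_HARD_IRQ 0x08u

struct model_irq_work {
	unsigned int flags;
	void (*func)(struct model_irq_work *work);
};

/* Compound-literal initializers, in the style of IRQ_WORK_INIT*(). */
#define MODEL_IRQ_WORK_INIT(_func) \
	((struct model_irq_work){ .flags = 0, .func = (_func) })
#define MODEL_IRQ_WORK_INIT_HARD(_func) \
	((struct model_irq_work){ .flags = MODEL_IRQ_WORK_HARD_IRQ, .func = (_func) })

/* Counterpart of init_irq_work(): initializes the item with no flags. */
static void model_init_irq_work(struct model_irq_work *work,
				void (*func)(struct model_irq_work *work))
{
	*work = MODEL_IRQ_WORK_INIT(func);
}

enum model_ctx {
	MODEL_CTX_HARDIRQ,		/* runs with interrupts disabled */
	MODEL_CTX_IRQ_WORK_THREAD,	/* runs in the per-CPU irq_work/* thread */
};

/* Decide where a queued work item would execute in this model. */
static enum model_ctx model_exec_context(const struct model_irq_work *work,
					 bool preempt_rt)
{
	if (preempt_rt && !(work->flags & MODEL_IRQ_WORK_HARD_IRQ))
		return MODEL_CTX_IRQ_WORK_THREAD;
	return MODEL_CTX_HARDIRQ;
}

/* Placeholder work function standing in for deferred_irq_workfn(). */
static void model_deferred_irq_workfn(struct model_irq_work *work)
{
	(void)work;
}
```

Under these assumptions, an item set up with `model_init_irq_work()`
lands in the thread context only when `preempt_rt` is true, while an
item built with `MODEL_IRQ_WORK_INIT_HARD()` stays in hard-irq context
in both configurations.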

### 3. Stable Kernel Rules Evaluation
- **Fixes a real bug?** **Yes.** It fixes a possible deadlock on
  PREEMPT_RT kernels that can hang the system.
- **Important issue?** **Yes.** Deadlocks are critical failures,
  especially on Real-Time systems where reliability is paramount.
- **Obviously correct?** **Yes.** The fix is a one-line change using a
  standard kernel macro specifically designed for this purpose.
- **Small and contained?** **Yes.** One line changed, no external
  dependencies.
- **No new features?** **Yes.** This is a pure bug fix for existing
  functionality.

### 4. Risk Assessment
- **Regression Risk:** **Very Low.** The change only affects the
  execution context of the work item. On non-RT kernels, `irq_work`
  often runs in hard-irq context anyway, so the behavior change is
  minimal. On RT kernels, this restores the required behavior to prevent
  deadlocks.
- **User Impact:** Users running `sched_ext` on Real-Time kernels are at
  risk of random system freezes without this fix.

### Conclusion
This commit is a textbook example of stable material. It fixes a severe
bug (deadlock) in a supported feature (`sched_ext`) using a minimal,
well-understood solution. While it lacks a "Cc: stable" tag, the nature
of the bug (deadlock) and the surgical nature of the fix make it a
mandatory backport for all stable trees containing `sched_ext` (v6.12+).

**YES**

 kernel/sched/ext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index e1b502ef1243c..fa64fdb6e9796 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5280,7 +5280,7 @@ void __init init_sched_ext_class(void)
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_kick_if_idle, GFP_KERNEL, n));
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_preempt, GFP_KERNEL, n));
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_wait, GFP_KERNEL, n));
-		init_irq_work(&rq->scx.deferred_irq_work, deferred_irq_workfn);
+		rq->scx.deferred_irq_work = IRQ_WORK_INIT_HARD(deferred_irq_workfn);
 		init_irq_work(&rq->scx.kick_cpus_irq_work, kick_cpus_irq_workfn);
 
 		if (cpu_online(cpu))
-- 
2.51.0



* [PATCH AUTOSEL 6.17] sched_ext: Use IRQ_WORK_INIT_HARD() to initialize rq->scx.kick_cpus_irq_work
       [not found] <20251124080644.3871678-1-sashal@kernel.org>
  2025-11-24  8:06 ` [PATCH AUTOSEL 6.17] sched_ext: Fix possible deadlock in the deferred_irq_workfn() Sasha Levin
@ 2025-11-24  8:06 ` Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-11-24  8:06 UTC
  To: patches, stable
  Cc: Zqiang, Tejun Heo, Sasha Levin, mingo, peterz, juri.lelli,
	vincent.guittot, bigeasy, clrkwllms, rostedt, sched-ext,
	linux-kernel, linux-rt-devel

From: Zqiang <qiang.zhang@linux.dev>

[ Upstream commit 36c6f3c03d104faf1aa90922f2310549c175420f ]

For PREEMPT_RT kernels, kick_cpus_irq_workfn() is invoked in the
per-CPU irq_work/* task context, where no RCU read-side critical
section protects it. This commit therefore uses IRQ_WORK_INIT_HARD() to
initialize the per-CPU rq->scx.kick_cpus_irq_work in
init_sched_ext_class().

Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

1. **Commit Message Analysis**
    - **Problem:** On `PREEMPT_RT` kernels, `irq_work` initialized with
      `init_irq_work()` executes in a threaded context where RCU
      read-side critical sections are not implicit. The function
      `kick_cpus_irq_workfn` accesses per-CPU runqueues (`rq->scx`),
      which requires RCU protection or preemption disabling to be safe.
    - **Solution:** The commit changes the initialization to
      `IRQ_WORK_INIT_HARD()`. This macro sets the `IRQ_WORK_HARD_IRQ`
      flag, forcing the work item to execute in hard interrupt context
      even on `PREEMPT_RT` kernels.
    - **Keywords:** "PREEMPT_RT", "RCU-read critical section",
      "initialize".
    - **Tags:** No explicit `Cc: stable` or `Fixes` tag in the provided
      text, but the nature of the fix (correctness on RT) makes it a
      strong candidate for stable.

2. **Deep Code Research**
    - **Code Context:** The affected file `kernel/sched/ext.c` belongs
      to the `sched_ext` (Extensible Scheduler Class) subsystem.
    - **Technical Mechanism:** In standard kernels, `irq_work` usually
      runs in contexts where RCU is safe. On `PREEMPT_RT`, work items
      without the `IRQ_WORK_HARD_IRQ` flag run in the preemptible
      per-CPU `irq_work/*` thread to reduce latency, which removes the
      implicit RCU protection. Accessing scheduler runqueues (`rq`)
      without this protection may lead to Use-After-Free (UAF) bugs or
      data corruption if a CPU goes offline or the task structure
      changes.
    - **The Fix:** `IRQ_WORK_INIT_HARD` is the standard mechanism to
      opt-out of threaded execution for specific work items that require
      hard IRQ semantics (atomic execution, implicit RCU protection).
      This is a well-understood pattern in the kernel.
    - **Subsystem Status:** `sched_ext` was merged in v6.12. Therefore,
      this fix is applicable to stable kernels v6.12 and newer.

3. **Stable Kernel Rules Evaluation**
    - **Fixes a real bug?** Yes. It fixes a race condition/correctness
      issue on `PREEMPT_RT` kernels which could lead to crashes.
    - **Obviously correct?** Yes. The fix uses standard kernel
      primitives to enforce the required execution context.
    - **Small and contained?** Yes. It is a one-line change to an
      initialization function.
    - **No new features?** Yes. It only corrects the behavior of
      existing code.
    - **Regression Risk:** Low. It forces hard IRQ context, which is
      generally safe for `irq_work` provided the work function is fast
      (which `kick_cpus_irq_workfn` typically is).

4. **Conclusion**
  This commit is a textbook example of a stable backport candidate. It
  addresses a correctness issue in the interaction between a specific
  subsystem (`sched_ext`) and the `PREEMPT_RT` configuration. The fix is
  minimal, surgical, and necessary to prevent potential crashes. While
  it applies only to kernels containing `sched_ext` (6.12+), it is
  critical for users running that configuration.
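
  The RCU argument above can be sketched as a tiny userspace model. It
  is a simplification under stated assumptions, not kernel code: the
  `model_` names are hypothetical, and the model assumes that a region
  running with interrupts disabled counts as an RCU read-side critical
  section, while a preemptible thread is only protected by an explicit
  `rcu_read_lock()`. The explicit-lock path is shown for comparison
  only; it is not what this patch does.

```c
#include <stdbool.h>

/* Hypothetical description of the context a work function runs in. */
struct model_reader {
	bool irqs_disabled;	/* hard-irq context, cannot be preempted */
	int rcu_read_depth;	/* nesting of explicit read-side sections */
};

/* Model of entering an explicit read-side critical section. */
static void model_rcu_read_lock(struct model_reader *reader)
{
	reader->rcu_read_depth++;
}

/* Model of leaving an explicit read-side critical section. */
static void model_rcu_read_unlock(struct model_reader *reader)
{
	reader->rcu_read_depth--;
}

/*
 * In this model a grace period has to wait for the reader if the
 * reader runs with interrupts disabled or sits inside an explicit
 * read-side critical section. Otherwise RCU-protected data may be
 * freed while the reader still uses it.
 */
static bool model_blocks_grace_period(const struct model_reader *reader)
{
	return reader->irqs_disabled || reader->rcu_read_depth > 0;
}
```

  A reader with `irqs_disabled` set to false and no explicit lock models
  the threaded `irq_work/*` case before the fix. Setting `irqs_disabled`
  to true models execution after switching to `IRQ_WORK_INIT_HARD()`.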

**YES**

 kernel/sched/ext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index fa64fdb6e9796..6f8ef62c8216c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5281,7 +5281,7 @@ void __init init_sched_ext_class(void)
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_preempt, GFP_KERNEL, n));
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_wait, GFP_KERNEL, n));
 		rq->scx.deferred_irq_work = IRQ_WORK_INIT_HARD(deferred_irq_workfn);
-		init_irq_work(&rq->scx.kick_cpus_irq_work, kick_cpus_irq_workfn);
+		rq->scx.kick_cpus_irq_work = IRQ_WORK_INIT_HARD(kick_cpus_irq_workfn);
 
 		if (cpu_online(cpu))
 			cpu_rq(cpu)->scx.flags |= SCX_RQ_ONLINE;
-- 
2.51.0


