public inbox for stable@vger.kernel.org
* [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler
@ 2026-02-15 17:41 Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] mailbox: pcc: Remove spurious IRQF_ONESHOT usage Sasha Levin
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Sebastian Andrzej Siewior, Jassi Brar, Sasha Levin, clrkwllms,
	rostedt, linux-kernel, linux-rt-devel

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

[ Upstream commit fa84883d44422208b45869a67c0265234fdce1f0 ]

request_threaded_irq() is invoked with a primary and a secondary handler
and no flags are passed. The primary handler is the same as
irq_default_primary_handler() so there is no need to have an identical
copy.
The lack of the IRQF_ONESHOT can be dangerous because the interrupt
source is not masked while the threaded handler is active. This means,
especially on LEVEL typed interrupt lines, the interrupt can fire again
before the threaded handler had a chance to run.

Use the default primary interrupt handler by specifying NULL and set
IRQF_ONESHOT so the interrupt source is masked until the secondary
handler is done.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

The same patch exists as two commits: fa84883d44422 in the mailbox tree
(signed off by Jassi Brar), which is the one being evaluated here, and
03843d95a4a4e in the tip tree.

## Summary of Analysis

### What the commit fixes

The commit fixes a missing `IRQF_ONESHOT` flag in
`request_threaded_irq()` for the BCM FlexRM mailbox driver. The existing
code had a custom primary handler (`flexrm_irq_event`) that was
functionally identical to the kernel's `irq_default_primary_handler()`:
it simply returned `IRQ_WAKE_THREAD`. However, because the code didn't
set `IRQF_ONESHOT`, the interrupt line was not masked while the threaded
handler ran. On level-triggered interrupt lines, this can cause an
interrupt storm (the interrupt fires repeatedly before the threaded
handler gets to run).

### Why it matters

1. **Correctness**: Without `IRQF_ONESHOT`, if the interrupt is level-
   triggered (or chained through a level-triggered parent), the
   interrupt will continuously fire between the primary handler
   returning and the threaded handler completing. This is a potential
   interrupt storm / soft lockup.

2. **Author credibility**: Sebastian Siewior is the PREEMPT_RT
   maintainer and a leading expert on interrupt handling in Linux.
   Thomas Gleixner, the IRQ subsystem maintainer, also signed off on
   this patch via the tip tree.

3. **Pattern**: This is part of a systematic series fixing the same
   pattern across multiple drivers (efct, btintel_pcie, fsl-mc, etc.),
   all from the same author.

### Stable kernel criteria

- **Obviously correct**: Yes - removes an exact duplicate of
  `irq_default_primary_handler()` and adds the missing `IRQF_ONESHOT`
  flag. This is the canonical pattern for threaded IRQs.
- **Fixes a real bug**: Yes - missing `IRQF_ONESHOT` can cause interrupt
  storms on level-triggered lines.
- **Small and contained**: Yes - 1 file changed, 2 insertions, 12
  deletions. The change is strictly self-contained.
- **No new features**: Correct - this is purely a bug fix.
- **Risk**: Very low. The behavior is identical except that the
  interrupt line is now properly masked during threaded handler
  execution, which prevents the storm.

### Nuance: MSI vs Level-Triggered

The driver uses platform MSI interrupts for completions (the comment
says "We only have MSI for completions"). MSI interrupts are typically
edge-triggered and may not suffer from the re-assertion problem.
However:
1. Platform MSIs may be chained through parent controllers that ARE
   level-triggered.
2. The underlying IRQ chip may not have `IRQCHIP_ONESHOT_SAFE` set (only
   PCI MSI explicitly sets this).
3. Even if the risk is lower for MSI, adding `IRQF_ONESHOT` is the
   correct defensive practice and has zero downside.
4. The kernel's check in `__setup_irq()` rejects a request that passes
   handler=NULL without `IRQF_ONESHOT` when the chip lacks
   `IRQCHIP_ONESHOT_SAFE`, so the new code must set `IRQF_ONESHOT` to
   work at all.
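
The acceptance rule in point 4 can be sketched in userspace C. This is
only a model under stated assumptions: the `model_*` / `MODEL_*` names
and flag values are hypothetical stand-ins, not the kernel's
definitions, and the real check lives in `__setup_irq()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; values are local to this sketch. */
#define MODEL_IRQF_ONESHOT		0x1u
#define MODEL_IRQCHIP_ONESHOT_SAFE	0x1u

typedef int (*model_handler_t)(int irq, void *dev_id);

/*
 * Model of the rule: a request without its own primary handler (NULL,
 * meaning the default one) is accepted only if the caller passed the
 * one-shot flag or the chip is marked one-shot safe.
 */
static int model_check_threaded_request(model_handler_t primary,
					unsigned int irqflags,
					unsigned int chip_flags)
{
	bool uses_default_primary = (primary == NULL);

	if (irqflags & MODEL_IRQF_ONESHOT)
		return 0;
	if (uses_default_primary &&
	    !(chip_flags & MODEL_IRQCHIP_ONESHOT_SAFE))
		return -EINVAL;
	return 0;
}

/* A driver-private primary handler, as in the pre-patch driver. */
static int model_custom_primary(int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	return 0;
}
```

In this model the pre-patch call (custom primary handler, flags 0) is
accepted, which is why the missing flag went unnoticed; switching to
NULL is what makes the one-shot flag mandatory.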

### Verification

- **Verified** `flexrm_irq_event()` returns only `IRQ_WAKE_THREAD` at
  line 1176-1182 of bcm-flexrm-mailbox.c, identical to
  `irq_default_primary_handler()` at kernel/irq/manage.c:976-979.
- **Verified** `request_threaded_irq()` with handler=NULL uses
  `irq_default_primary_handler` (kernel/irq/manage.c:2128-2132).
- **Verified** `__setup_irq()` requires IRQF_ONESHOT when
  handler==irq_default_primary_handler and chip lacks
  IRQCHIP_ONESHOT_SAFE (kernel/irq/manage.c:1664-1685).
- **Verified** platform MSI irq chips do NOT automatically set
  `IRQCHIP_ONESHOT_SAFE` - only PCI MSI does
  (drivers/pci/msi/irqdomain.c:133,187). No irqchip drivers set it
  either.
- **Verified** the driver has existed since v4.12 (commit
  dbc049eee7300), present in all active stable trees.
- **Verified** the fix is self-contained: only touches 1 file, no
  dependencies on other commits.
- **Verified** author Sebastian Siewior is the PREEMPT_RT maintainer
  with deep IRQ expertise, and this is part of a systematic series.
- **Verified** the commit also went through Thomas Gleixner's tip tree
  (commit 03843d95a4a4e) with his Signed-off-by, confirming IRQ
  subsystem maintainer review.

This is a small, safe, well-understood bug fix from a domain expert that
prevents a potential interrupt storm. It meets all stable kernel
criteria.

**YES**

 drivers/mailbox/bcm-flexrm-mailbox.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
index 41f79e51d9e5a..4255fefc3a5a0 100644
--- a/drivers/mailbox/bcm-flexrm-mailbox.c
+++ b/drivers/mailbox/bcm-flexrm-mailbox.c
@@ -1173,14 +1173,6 @@ static int flexrm_debugfs_stats_show(struct seq_file *file, void *offset)
 
 /* ====== FlexRM interrupt handler ===== */
 
-static irqreturn_t flexrm_irq_event(int irq, void *dev_id)
-{
-	/* We only have MSI for completions so just wakeup IRQ thread */
-	/* Ring related errors will be informed via completion descriptors */
-
-	return IRQ_WAKE_THREAD;
-}
-
 static irqreturn_t flexrm_irq_thread(int irq, void *dev_id)
 {
 	flexrm_process_completions(dev_id);
@@ -1271,10 +1263,8 @@ static int flexrm_startup(struct mbox_chan *chan)
 		ret = -ENODEV;
 		goto fail_free_cmpl_memory;
 	}
-	ret = request_threaded_irq(ring->irq,
-				   flexrm_irq_event,
-				   flexrm_irq_thread,
-				   0, dev_name(ring->mbox->dev), ring);
+	ret = request_threaded_irq(ring->irq, NULL, flexrm_irq_thread,
+				   IRQF_ONESHOT, dev_name(ring->mbox->dev), ring);
 	if (ret) {
 		dev_err(ring->mbox->dev,
 			"failed to request ring%d IRQ\n", ring->num);
-- 
2.51.0



* [PATCH AUTOSEL 6.19-6.1] mailbox: pcc: Remove spurious IRQF_ONESHOT usage
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] remoteproc: imx_dsp_rproc: Skip RP_MBOX_SUSPEND_SYSTEM when mailbox TX channel is uninitialized Sasha Levin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Mark Brown, Aishwarya TCV, Sudeep Holla, Jassi Brar, Sasha Levin,
	sudeep.holla, bigeasy, clrkwllms, rostedt, linux-acpi,
	linux-kernel, linux-rt-devel

From: Mark Brown <broonie@kernel.org>

[ Upstream commit 673327028cd61db68a1e0c708be2e302c082adf9 ]

The PCC code currently specifies IRQF_ONESHOT if the interrupt could
potentially be shared but doesn't actually use request_threaded_irq() and
the interrupt handler does not use IRQ_WAKE_THREAD so IRQF_ONESHOT is
never relevant. Since commit aef30c8d569c ("genirq: Warn about using
IRQF_ONESHOT without a threaded handler") specifying it has resulted in a
WARN_ON(), fix this by removing IRQF_ONESHOT.

Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Summary of Analysis

### What the commit fixes

The PCC mailbox driver (`drivers/mailbox/pcc.c`) incorrectly specifies
`IRQF_ONESHOT` when requesting a shared interrupt. `IRQF_ONESHOT` is
only meaningful for threaded interrupt handlers (registered via
`request_threaded_irq()`), but this driver uses `devm_request_irq()`
with a hardcoded primary handler (`pcc_mbox_irq`) and no thread
function.

Since commit `aef30c8d569c` (merged 2026-01-13 into v6.19 cycle), the
generic IRQ code now emits a `WARN_ON_ONCE()` when `IRQF_ONESHOT` is
specified without a threaded handler. This means on any system with a
PCC shared interrupt, the kernel produces a warning splat at
boot/runtime — a visible regression for users.
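
A minimal userspace sketch of the condition described above, assuming
hypothetical `MODEL_*` flag values and `model_*` helper names (none of
them are kernel symbols). It shows why the old flag combination trips
the warning and the new one does not.

```c
#include <stdbool.h>

/* Hypothetical flag values, local to this sketch. */
#define MODEL_IRQF_SHARED	0x1u
#define MODEL_IRQF_ONESHOT	0x2u

/* True when one-shot is requested but no thread function exists. */
static bool model_oneshot_without_thread(unsigned int flags,
					 bool has_thread_fn)
{
	return (flags & MODEL_IRQF_ONESHOT) && !has_thread_fn;
}

/* Flags the driver would pass before and after the patch. */
static unsigned int model_pcc_irqflags(bool can_be_shared, bool patched)
{
	if (!can_be_shared)
		return 0;
	if (patched)
		return MODEL_IRQF_SHARED;
	return MODEL_IRQF_SHARED | MODEL_IRQF_ONESHOT;
}
```

Only the shared-interrupt case is affected, since the non-shared case
passes no flags either way.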

### Stable kernel criteria assessment

1. **Obviously correct and tested**: Yes — the change removes a flag
   that has no effect (since the driver doesn't use threaded IRQs).
   Reviewed by the SCMI/PCC co-maintainer Sudeep Holla.

2. **Fixes a real bug**: Yes — it fixes a `WARN_ON()` that fires at
   runtime. WARN_ONs are treated as bugs by many distributions and can
   be flagged by automated testing.

3. **Important issue**: Moderate — a WARN_ON at boot is visible to users
   and breaks CI/testing systems. On PREEMPT_RT systems, the underlying
   issue (IRQF_ONESHOT without threading) could also be problematic
   since the handler is exempt from forced-threading.

4. **Small and contained**: Yes — single line change, single file, no
   side effects.

5. **No new features**: Correct — this purely removes a spurious flag.

### Dependencies

This fix only makes sense if commit `aef30c8d569c` ("genirq: Warn about
using IRQF_ONESHOT without a threaded handler") is also present in the
stable tree. That commit was merged in the v6.19 cycle (January 2026).
If `aef30c8d569c` is NOT backported to stable, then the `IRQF_ONESHOT`
flag is harmless (no WARN_ON fires, and the flag is simply ignored).
However, the `IRQF_ONESHOT` was always incorrect — removing it is safe
regardless.

The PCC shared interrupt support was added by commit `3db174e478cb0b` in
September 2023 (v6.7 cycle), so this code exists in stable trees 6.6.y
and later.

### Risk assessment

**Risk: Extremely low.** Removing an unused flag from an IRQ
registration call cannot break anything. The flag was never functional
since the driver doesn't use threaded IRQs.

**Benefit: Eliminates the WARN_ON splat** on systems with shared PCC
interrupts, which affects ACPI-based ARM64 and x86 platforms using PCC
for CPPC/SCMI.

### Verification

- **Verified**: Commit `aef30c8d569c` exists in tree and adds
  `WARN_ON_ONCE(new->flags & IRQF_ONESHOT && !new->thread_fn)` in
  `kernel/irq/manage.c` (dated 2026-01-13).
- **Verified**: `drivers/mailbox/pcc.c` uses `devm_request_irq()` (line
  556), NOT `request_threaded_irq()`.
- **Verified**: No `IRQ_WAKE_THREAD` or `request_threaded_irq` usage in
  `pcc.c` (grep returned no matches).
- **Verified**: The `IRQF_ONESHOT` was introduced by commit
  `3db174e478cb0b` ("mailbox: pcc: Support shared interrupt for multiple
  subspaces", 2023-09-11), which is in the v6.7 cycle, making it present
  in 6.6.y stable and later.
- **Verified**: The change is a single flag removal (`IRQF_SHARED |
  IRQF_ONESHOT` → `IRQF_SHARED`), one line modified.
- **Verified**: Reviewed by Sudeep Holla (PCC/SCMI co-maintainer),
  reported by an ARM engineer.

### Conclusion

This is a straightforward fix for a WARN_ON triggered at runtime. The
change is minimal (removing one unused flag), obviously correct
(verified by code inspection that no threaded handler exists), reviewed
by a domain expert, and has zero risk of regression. It meets all stable
kernel criteria.

The only nuance is that the WARN_ON itself comes from a very recent
commit (`aef30c8d569c`), which may or may not be present in all stable
trees. However, even without that commit, removing the spurious
`IRQF_ONESHOT` is correct — the flag was never needed and was always a
no-op in this context. Removing incorrect flags proactively prevents
future issues and is good practice.

**YES**

 drivers/mailbox/pcc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
index ff292b9e0be9e..060489e5ae6de 100644
--- a/drivers/mailbox/pcc.c
+++ b/drivers/mailbox/pcc.c
@@ -552,7 +552,7 @@ static int pcc_startup(struct mbox_chan *chan)
 
 	if (pchan->plat_irq > 0) {
 		irqflags = pcc_chan_plat_irq_can_be_shared(pchan) ?
-						IRQF_SHARED | IRQF_ONESHOT : 0;
+						IRQF_SHARED : 0;
 		rc = devm_request_irq(chan->mbox->dev, pchan->plat_irq, pcc_mbox_irq,
 				      irqflags, MBOX_IRQ_NAME, chan);
 		if (unlikely(rc)) {
-- 
2.51.0



* [PATCH AUTOSEL 6.19-6.1] remoteproc: imx_dsp_rproc: Skip RP_MBOX_SUSPEND_SYSTEM when mailbox TX channel is uninitialized
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] mailbox: pcc: Remove spurious IRQF_ONESHOT usage Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] mailbox: imx: Skip the suspend flag for i.MX7ULP Sasha Levin
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Iuliana Prodan, Mathieu Poirier, Sasha Levin, andersson, Frank.Li,
	linux-remoteproc, imx, linux-arm-kernel

From: Iuliana Prodan <iuliana.prodan@nxp.com>

[ Upstream commit d62e0e92e589c53c4320ed5914af5fe103f5ce7e ]

Firmwares that do not use mailbox communication (e.g., the hello_world
sample) leave priv->tx_ch as NULL. The current suspend logic
unconditionally sends RP_MBOX_SUSPEND_SYSTEM, which is invalid without
an initialized TX channel.

Detect the no_mailboxes case early and skip sending the suspend
message. Instead, proceed directly to the runtime PM suspend path,
which is the correct behavior for firmwares that cannot respond to
mailbox requests.

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Link: https://lore.kernel.org/r/20251204122825.756106-1-iuliana.prodan@oss.nxp.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

`no_mailboxes` was introduced in v6.4, so the bug only affects kernels
v6.4 and later.

The actual severity of the bug, from the code analysis:

1. When `no_mailboxes=1` is set, `imx_dsp_rproc_mbox_no_alloc()` is
   used, which is a no-op — `priv->tx_ch` stays NULL.
2. During suspend, `imx_dsp_suspend()` calls
   `mbox_send_message(priv->tx_ch, ...)` where `priv->tx_ch` is NULL.
3. `mbox_send_message()` checks `if (!chan || !chan->cl)` and returns
   `-EINVAL` — so it **does NOT crash**.
4. However, the return value is `-EINVAL` (negative), so the error path
   is taken: `dev_err()` is printed and the function returns the error
   code.
5. This causes **suspend to fail** with an error, which means the system
   **cannot suspend** when using `no_mailboxes` mode.

This is a real functional bug — suspend is broken when the
`no_mailboxes` module parameter is used. It won't crash, but it prevents
the system from suspending, which is a significant issue for
embedded/power-managed systems.

## Summary of Analysis

### What the commit fixes
When the `no_mailboxes` module parameter is set (firmware doesn't use
mailbox communication), `priv->tx_ch` remains NULL. The suspend function
`imx_dsp_suspend()` unconditionally tries to send a message via the
mailbox TX channel, which fails with `-EINVAL` when the channel is NULL.
This causes **system suspend to fail** for all users of the
`no_mailboxes` configuration.

The fix adds a NULL check for `priv->tx_ch` before the
`mbox_send_message()` call, skipping the mailbox communication and going
directly to the PM runtime suspend path — which is the correct behavior
for firmwares that don't use mailbox.
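
The failure and the fix can be modelled in userspace C. This is a
simplified sketch, not the driver code: `model_chan`, `model_send()` and
`model_suspend()` are hypothetical names, the send stand-in only
reproduces the NULL-channel rejection, and the runtime-PM step is
reduced to returning 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_chan {
	int unused;
};

/* Stand-in for the mailbox send call: a NULL channel is rejected. */
static int model_send(struct model_chan *chan)
{
	if (!chan)
		return -EINVAL;
	return 0;
}

/*
 * Model of the suspend path. With the guard enabled, a missing TX
 * channel skips the message and goes straight to the runtime-PM step
 * (modelled here as returning 0).
 */
static int model_suspend(struct model_chan *tx_ch, bool with_guard)
{
	int ret;

	if (with_guard && !tx_ch)
		return 0;

	ret = model_send(tx_ch);
	if (ret < 0)
		return ret;	/* error propagates, suspend fails */

	return 0;
}
```

With a valid channel both variants behave the same, which matches the
claim that the normal mailbox path is unaffected.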

### Stable kernel criteria assessment
1. **Obviously correct and tested**: Yes — a simple NULL check that
   mirrors the existing pattern in `imx_dsp_rproc_kick()` (line 764).
   Signed off by the NXP developer and the remoteproc maintainer.
2. **Fixes a real bug**: Yes — suspend is broken when `no_mailboxes=1`.
3. **Important issue**: Moderate — it prevents system suspend for
   affected configurations (embedded i.MX DSP users with no-mailbox
   firmwares). On embedded/battery-powered devices, inability to suspend
   is a significant problem.
4. **Small and contained**: Yes — 9 lines added (NULL check + comment +
   debug message), single file, single function.
5. **No new features**: Correct — this is purely a bug fix.

### Risk assessment
- **Very low risk**: The added NULL check only triggers when
  `priv->tx_ch` is NULL (the no-mailbox case). It has zero impact on the
  normal mailbox path.
- The pattern is already established in the same driver
  (`imx_dsp_rproc_kick()` at line 764).
- The fix is self-contained with no dependencies on other patches.

### Affected versions
The `no_mailboxes` module parameter was introduced in commit
`11bb42a94648a` which landed in v6.4. The bug exists from v6.4 onwards.
Currently maintained stable trees v6.6.y and v6.12.y would both be
affected.

### Verification
- **Verified**: `mbox_send_message()` in `drivers/mailbox/mailbox.c:249`
  returns `-EINVAL` when `chan` is NULL (not a crash, but causes suspend
  failure since the error is propagated).
- **Verified**: `priv->tx_ch` starts as NULL (zeroed by
  `devm_rproc_alloc`) and remains NULL when
  `imx_dsp_rproc_mbox_no_alloc()` is used (it's a no-op, confirmed at
  line 635-638).
- **Verified**: `no_mailboxes` module parameter was introduced in commit
  `11bb42a94648a` (v6.4 cycle), confirmed via `git log -S
  "no_mailboxes"`.
- **Verified**: The existing `imx_dsp_rproc_kick()` function already has
  an identical defensive NULL check for `priv->tx_ch` at line 764,
  confirming the fix pattern is established.
- **Verified**: The `imx_dsp_suspend()` function returns the error from
  `mbox_send_message()` (line 1251), so a `-EINVAL` return causes
  suspend to fail entirely.
- **Verified**: The driver was introduced in `ec0e5549f3586d` (Oct 2021,
  v5.16 cycle), so it exists in multiple stable trees.
- **Verified**: The patch was reviewed and merged by remoteproc
  maintainer Mathieu Poirier, with a Link to the mailing list
  discussion.

This is a clean, small, obviously correct bug fix for a real problem
(broken suspend) in a specific but important use case (no-mailbox
firmware on i.MX DSP). It follows an established pattern in the same
driver and has very low regression risk.

**YES**

 drivers/remoteproc/imx_dsp_rproc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/remoteproc/imx_dsp_rproc.c b/drivers/remoteproc/imx_dsp_rproc.c
index 5130a35214c92..f51deaacc7008 100644
--- a/drivers/remoteproc/imx_dsp_rproc.c
+++ b/drivers/remoteproc/imx_dsp_rproc.c
@@ -1242,6 +1242,15 @@ static int imx_dsp_suspend(struct device *dev)
 	if (rproc->state != RPROC_RUNNING)
 		goto out;
 
+	/*
+	 * No channel available for sending messages;
+	 * indicates no mailboxes present, so trigger PM runtime suspend
+	 */
+	if (!priv->tx_ch) {
+		dev_dbg(dev, "No initialized mbox tx channel, suspend directly.\n");
+		goto out;
+	}
+
 	reinit_completion(&priv->pm_comp);
 
 	/* Tell DSP that suspend is happening */
-- 
2.51.0



* [PATCH AUTOSEL 6.19-6.1] mailbox: imx: Skip the suspend flag for i.MX7ULP
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] mailbox: pcc: Remove spurious IRQF_ONESHOT usage Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] remoteproc: imx_dsp_rproc: Skip RP_MBOX_SUSPEND_SYSTEM when mailbox TX channel is uninitialized Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix uninitialized symbol and other smatch warnings Sasha Levin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Jacky Bai, Peng Fan, Jassi Brar, Sasha Levin, Frank.Li,
	linux-kernel, imx, linux-arm-kernel

From: Jacky Bai <ping.bai@nxp.com>

[ Upstream commit 673b570825ace0dcb2ac0c676080559d505c6f40 ]

In current imx-mailbox driver, the MU IRQ is configured with
'IRQF_NO_SUSPEND' flag set. So during linux suspend/resume flow,
the MU IRQ is always enabled. With commit 892cb524ae8a ("mailbox: imx:
fix wakeup failure from freeze mode"), if the MU IRQ is triggered after
the priv->suspended flag has been set, the system suspend will be
aborted.

On i.MX7ULP platform, certain drivers that depend on rpmsg may need
to send rpmsg request and receive an acknowledgment from the remote
core during the late_suspend stage. Early suspend abort is not
expected, and the i.MX7ULP already has additional hardware and
software to make sure the system can be wakeup from freeze mode
correctly when MU IRQ is trigger.

Skip the 'suspend' flag handling logic on i.MX7ULP to avoid the
early abort when doing suspend.

Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

How the failure arises and what the fix does:

1. Commit 892cb524ae8a added `priv->suspend` flag and
   `pm_system_wakeup()` call in the ISR to handle freeze mode wakeup.
2. `pm_system_wakeup()` aborts the ongoing suspend process.
3. On i.MX7ULP, during `late_suspend`, drivers send rpmsg to the remote
   core and expect a response. This triggers the MU IRQ, which in turn
   calls `pm_system_wakeup()` and aborts suspend — an undesirable
   behavior on this platform.
4. The fix adds a per-platform flag `skip_suspend_flag` to skip the
   suspend flag logic for i.MX7ULP, which has its own hardware
   mechanisms for handling wakeup.
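
The four steps above can be sketched as a small userspace model. The
struct and function names are hypothetical, and the wakeup call is
replaced by a counter so the effect is observable; only the
`skip_suspend_flag` field name is taken from the patch.

```c
#include <stdbool.h>

/* Per-platform configuration, reduced to the one field of interest. */
struct model_cfg {
	bool skip_suspend_flag;
};

struct model_priv {
	const struct model_cfg *dcfg;
	bool suspend;
	int wakeups;	/* counts modelled wakeup (suspend abort) calls */
};

/* Late suspend hook: only set the flag if the platform wants it. */
static void model_suspend_noirq(struct model_priv *priv)
{
	if (!priv->dcfg->skip_suspend_flag)
		priv->suspend = true;
}

/* Interrupt handler: an IRQ while the flag is set aborts suspend. */
static void model_isr(struct model_priv *priv)
{
	if (priv->suspend)
		priv->wakeups++;
}
```

A platform whose configuration leaves the field zero-initialized keeps
the old behavior, so the change stays confined to the one config that
opts in.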

### Stable Kernel Assessment

**Bug severity**: This is a **suspend failure** on i.MX7ULP — the system
cannot properly enter suspend/freeze mode. This is a significant
functional regression for i.MX7ULP users caused by commit 892cb524ae8a.

**Fix quality**:
- Small, contained (adds 1 bool field + 1 config line + 2 conditional
  checks)
- Platform-specific — only affects i.MX7ULP; all other platforms are
  unchanged
- Low regression risk — it simply disables a feature that was causing
  problems on this specific platform
- Reviewed by Peng Fan (NXP maintainer)

**Fits stable criteria**:
- Fixes a real bug (suspend failure)
- Small and contained
- Obviously correct
- Platform-specific quirk/workaround — this falls squarely in the
  "hardware quirk" category

**Dependencies**: Requires commit 892cb524ae8a ("mailbox: imx: fix
wakeup failure from freeze mode") to be present. That commit is in
v5.18+, so applicable stable trees (6.1.y, 6.6.y, 6.12.y) all have it.

**Risk**: Very low. The change only affects the i.MX7ULP platform and
simply skips suspend flag logic that was causing problems. If the bool
field is false (default for all other platforms), behavior is completely
unchanged.

### Verification

- **git show 892cb524ae8a**: Verified that commit introduced the
  `priv->suspend` flag and `pm_system_wakeup()` call in `imx_mu_isr()`,
  confirming the mechanism described in the commit message.
- **git tag --contains 892cb524ae8a**: Confirmed the prerequisite commit
  is in v5.18+, so all active LTS trees (6.1.y, 6.6.y, 6.12.y) contain
  it.
- **git tag --contains 673b570825ace**: Confirmed the commit under
  review is post-v6.19 (in development tree).
- **Grep for `priv->suspend`**: Verified the ISR check at line 564 —
  when suspend flag is set and MU IRQ fires, `pm_system_wakeup()` is
  called, which aborts suspend.
- **Code review of diff**: Verified only i.MX7ULP config gets
  `skip_suspend_flag = true`; all other platform configs default to
  false (zero-initialized), so behavior is unchanged for non-i.MX7ULP
  platforms.
- **Unverified**: Cannot directly verify user reports of suspend failure
  on i.MX7ULP, but the commit message clearly describes the mechanism
  and the author is from NXP (the SoC vendor).

### Conclusion

This is a platform-specific hardware workaround that fixes suspend
failure on i.MX7ULP. It's small, well-scoped, reviewed by the subsystem
maintainer, and has essentially zero regression risk for other
platforms. It fits the "hardware quirk" exception category for stable
backports. The prerequisite commit exists in all active LTS trees.

**YES**

 drivers/mailbox/imx-mailbox.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
index 6778afc64a048..003f9236c35e0 100644
--- a/drivers/mailbox/imx-mailbox.c
+++ b/drivers/mailbox/imx-mailbox.c
@@ -122,6 +122,7 @@ struct imx_mu_dcfg {
 	u32	xRR;		/* Receive Register0 */
 	u32	xSR[IMX_MU_xSR_MAX];	/* Status Registers */
 	u32	xCR[IMX_MU_xCR_MAX];	/* Control Registers */
+	bool	skip_suspend_flag;
 };
 
 #define IMX_MU_xSR_GIPn(type, x) (type & IMX_MU_V2 ? BIT(x) : BIT(28 + (3 - (x))))
@@ -988,6 +989,7 @@ static const struct imx_mu_dcfg imx_mu_cfg_imx7ulp = {
 	.xRR	= 0x40,
 	.xSR	= {0x60, 0x60, 0x60, 0x60},
 	.xCR	= {0x64, 0x64, 0x64, 0x64, 0x64},
+	.skip_suspend_flag = true,
 };
 
 static const struct imx_mu_dcfg imx_mu_cfg_imx8ulp = {
@@ -1071,7 +1073,8 @@ static int __maybe_unused imx_mu_suspend_noirq(struct device *dev)
 			priv->xcr[i] = imx_mu_read(priv, priv->dcfg->xCR[i]);
 	}
 
-	priv->suspend = true;
+	if (!priv->dcfg->skip_suspend_flag)
+		priv->suspend = true;
 
 	return 0;
 }
@@ -1094,7 +1097,8 @@ static int __maybe_unused imx_mu_resume_noirq(struct device *dev)
 			imx_mu_write(priv, priv->xcr[i], priv->dcfg->xCR[i]);
 	}
 
-	priv->suspend = false;
+	if (!priv->dcfg->skip_suspend_flag)
+		priv->suspend = false;
 
 	return 0;
 }
-- 
2.51.0



* [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix uninitialized symbol and other smatch warnings
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
                   ` (2 preceding siblings ...)
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] mailbox: imx: Skip the suspend flag for i.MX7ULP Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.15] mailbox: sprd: mask interrupts that are not handled Sasha Levin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Valentina Fernandez, kernel test robot, Dan Carpenter, Jassi Brar,
	Sasha Levin, linux-kernel

From: Valentina Fernandez <valentina.fernandezalanis@microchip.com>

[ Upstream commit bc4d17e495cd3b02bcb2e10f575763a5ff31f80b ]

Fix uninitialized symbol 'hartid' warning in mchp_ipc_cluster_aggr_isr()
by introducing a 'found' flag to track whether the IRQ matches any
online hart. If no match is found, return IRQ_NONE.

Also fix other smatch warnings by removing dead code in
mchp_ipc_startup() and by returning -ENODEV in dev_err_probe() if the
Microchip SBI extension is not found.

Fixes below smatch warnings:
drivers/mailbox/mailbox-mchp-ipc-sbi.c:187 mchp_ipc_cluster_aggr_isr() error: uninitialized symbol 'hartid'.
drivers/mailbox/mailbox-mchp-ipc-sbi.c:324 mchp_ipc_startup() warn: ignoring unreachable code.
drivers/mailbox/mailbox-mchp-ipc-sbi.c:422 mchp_ipc_probe() warn: passing zero to 'dev_err_probe'

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202512171533.CDLdScMY-lkp@intel.com/
Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

v6.14 is no longer a maintained stable tree (stopped at v6.14.11).
The relevant current stable tree would be v6.18.y.

### User Impact

This driver is for RISC-V Microchip Inter-processor Communication. While
it's a niche driver:
1. The uninitialized variable bug in the ISR is a real correctness issue
2. The `dev_err_probe` returning 0 on failure could cause unexpected
   behavior
3. All three fixes are straightforward and low-risk

### Verification

- **git log** confirmed the file was introduced in commit
  `e4b1d67e71419` (v6.14-rc1)
- **git tag --contains** confirmed the file first appeared in v6.14
- **git show f7c330a8c83c9** confirmed a dependency on a prior fix that
  changed indexing from hartid to cpuid
- **git tag** confirmed v6.14 is no longer actively maintained (stopped
  at v6.14.11); v6.18.y is the relevant current stable tree
- The uninitialized variable bug is verified by reading the original
  code: if `for_each_online_cpu` loop doesn't find a matching IRQ,
  `hartid` is used uninitialized at line 187
- The `dev_err_probe` bug is verified: when `sbi_probe_extension`
  returns 0, passing 0 to `dev_err_probe` returns 0 (success), causing
  probe to incorrectly succeed
- The dead code is verified: the switch statement either returns 0 or
  gotos to error cleanup, making code after it unreachable
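
The search-with-flag pattern that replaces the uninitialized `hartid`
can be sketched in plain C. `model_find_cluster()` and its array
argument are hypothetical; the real handler iterates over online CPUs
and compares against `ipc->cluster_cfg[i].irq`.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true and store the index of the first entry equal to irq.
 * Return false and leave *index untouched when nothing matches, so the
 * caller never consumes a value that was not produced by a match.
 */
static bool model_find_cluster(const int *cluster_irqs, size_t n,
			       int irq, size_t *index)
{
	for (size_t i = 0; i < n; i++) {
		if (cluster_irqs[i] == irq) {
			*index = i;
			return true;
		}
	}
	return false;
}
```

A caller would bail out on a `false` result, which corresponds to the
patched handler returning `IRQ_NONE`.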

### Assessment

This commit fixes real bugs:
1. An uninitialized variable in an interrupt handler (potential
   undefined behavior / crash)
2. An incorrect probe success path when hardware support is missing
3. Dead code removal (minor cleanup)
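
Item 2 can be illustrated with a short model. The names are
hypothetical, and `model_err_probe()` only mirrors the one property that
matters here: it hands back the error code it was given.

```c
#include <errno.h>

/* Stand-in for the logging helper: returns the error it was passed. */
static int model_err_probe(int err)
{
	return err;
}

/* Pre-patch shape: a 0 from the extension probe leaks out as success. */
static int model_probe_old(int ext_ret)
{
	if (ext_ret <= 0)
		return model_err_probe(ext_ret);
	return 0;
}

/* Patched shape: any non-positive result becomes -ENODEV. */
static int model_probe_new(int ext_ret)
{
	if (ext_ret <= 0)
		return model_err_probe(-ENODEV);
	return 0;
}
```

The difference shows up only when the extension probe returns 0, which
is exactly the "extension not found" case.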

The fixes are small, well-contained, and low-risk. They fix genuine bugs
in a driver that exists only in v6.14+. The commit has a dependency on
`f7c330a8c83c9` which would also need to be backported.

While the bugs are real, this is a very niche RISC-V mailbox driver
(Microchip IPC over SBI). The fixes are appropriate for stable if the
prerequisite commit is also included. The uninitialized variable and
incorrect probe return value are both correctness bugs worth fixing.

**YES**

 drivers/mailbox/mailbox-mchp-ipc-sbi.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/mailbox/mailbox-mchp-ipc-sbi.c b/drivers/mailbox/mailbox-mchp-ipc-sbi.c
index d444491a584e8..b87bf2fb4b9b9 100644
--- a/drivers/mailbox/mailbox-mchp-ipc-sbi.c
+++ b/drivers/mailbox/mailbox-mchp-ipc-sbi.c
@@ -174,17 +174,21 @@ static irqreturn_t mchp_ipc_cluster_aggr_isr(int irq, void *data)
 	struct mchp_ipc_msg ipc_msg;
 	struct mchp_ipc_status status_msg;
 	int ret;
-	unsigned long hartid;
 	u32 i, chan_index, chan_id;
+	bool found = false;
 
 	/* Find out the hart that originated the irq */
 	for_each_online_cpu(i) {
-		hartid = cpuid_to_hartid_map(i);
-		if (irq == ipc->cluster_cfg[i].irq)
+		if (irq == ipc->cluster_cfg[i].irq) {
+			found = true;
 			break;
+		}
 	}
 
-	status_msg.cluster = hartid;
+	if (unlikely(!found))
+		return IRQ_NONE;
+
+	status_msg.cluster = cpuid_to_hartid_map(i);
 	memcpy(ipc->cluster_cfg[i].buf_base, &status_msg, sizeof(struct mchp_ipc_status));
 
 	ret = mchp_ipc_sbi_send(SBI_EXT_IPC_STATUS, ipc->cluster_cfg[i].buf_base_addr);
@@ -321,13 +325,6 @@ static int mchp_ipc_startup(struct mbox_chan *chan)
 		goto fail_free_buf_msg_rx;
 	}
 
-	if (ret) {
-		dev_err(ipc->dev, "failed to register interrupt(s)\n");
-		goto fail_free_buf_msg_rx;
-	}
-
-	return ret;
-
 fail_free_buf_msg_rx:
 	kfree(chan_info->msg_buf_rx);
 fail_free_buf_msg_tx:
@@ -419,7 +416,7 @@ static int mchp_ipc_probe(struct platform_device *pdev)
 
 	ret = sbi_probe_extension(SBI_EXT_MICROCHIP_TECHNOLOGY);
 	if (ret <= 0)
-		return dev_err_probe(dev, ret, "Microchip SBI extension not detected\n");
+		return dev_err_probe(dev, -ENODEV, "Microchip SBI extension not detected\n");
 
 	ipc = devm_kzalloc(dev, sizeof(*ipc), GFP_KERNEL);
 	if (!ipc)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH AUTOSEL 6.19-5.15] mailbox: sprd: mask interrupts that are not handled
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
                   ` (3 preceding siblings ...)
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix uninitialized symbol and other smatch warnings Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix out-of-bounds access in mchp_ipc_get_cluster_aggr_irq() Sasha Levin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Otto Pflüger, Jassi Brar, Sasha Levin, orsonzhai,
	linux-kernel

From: Otto Pflüger <otto.pflueger@abscue.de>

[ Upstream commit 75df94d05fc03fd9d861eaf79ce10fbb7a548bd8 ]

To reduce the amount of spurious interrupts, disable the interrupts that
are not handled in this driver.

Signed-off-by: Otto Pflüger <otto.pflueger@abscue.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Comprehensive Analysis

### 1. What the Commit Does

This commit changes the interrupt mask initialization in
`sprd_mbox_startup()` for the Spreadtrum mailbox driver. The two key
changes are:

**a) Deterministic initialization instead of read-modify-write:**
- **Before:** Read the hardware's current IRQ mask register state, then
  clear specific bits to unmask desired interrupts. This left other bits
  in whatever state the hardware/bootloader had them.
- **After:** Start from the full mask constant (all interrupts masked),
  then unmask only the ones the driver actually handles. This is fully
  deterministic.

**b) Stop enabling the inbox overflow interrupt:**
- **Before:** Both `SPRD_INBOX_FIFO_OVERFLOW_IRQ` (BIT(1)) and
  `SPRD_INBOX_FIFO_DELIVER_IRQ` (BIT(2)) were unmasked. The driver only
  handles delivery interrupts.
- **After:** Only `SPRD_INBOX_FIFO_DELIVER_IRQ` is unmasked.

### 2. Bug Being Fixed

The `sprd_mbox_inbox_isr()` only checks the delivery status bits
(`SPRD_INBOX_FIFO_DELIVER_MASK`). If no delivery is pending, it returns
`IRQ_NONE` with a "spurious inbox interrupt" warning. This means:

- If the inbox overflow interrupt fires without a concurrent delivery
  event, the ISR returns `IRQ_NONE`
- Repeated `IRQ_NONE` returns trigger the kernel's spurious interrupt
  detection in `note_interrupt()` (`kernel/irq/spurious.c`), which can
  eventually **disable the entire IRQ line** with a "nobody cared"
  message
- If the inbox delivery interrupt shares the same IRQ line, disabling
  the IRQ would **break all mailbox communication**

Similarly, the old code left outbox bits 1-4 and inbox bit 0 in an
indeterminate state dependent on hardware power-on defaults, potentially
enabling additional interrupts the driver doesn't handle.
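
The difference between the two initialization styles can be sketched
for the inbox register as follows. This is a userspace illustration
under stated assumptions: the bit values are the ones quoted in this
analysis, `BIT()` and `GENMASK()` are local 32-bit re-definitions, and
`old_inbox_mask()` / `new_inbox_mask()` are invented helper names. A set
bit means the interrupt is masked.

```c
#include <stdint.h>

#define BIT(n)		(1u << (n))
#define GENMASK(h, l)	((~0u << (l)) & (~0u >> (31 - (h))))

/* Values as quoted in the analysis of this patch. */
#define SPRD_INBOX_FIFO_OVERFLOW_IRQ	BIT(1)
#define SPRD_INBOX_FIFO_DELIVER_IRQ	BIT(2)
#define SPRD_INBOX_FIFO_IRQ_MASK	GENMASK(2, 0)

/* Old style: result depends on whatever the register held before. */
static uint32_t old_inbox_mask(uint32_t hw_val)
{
	return hw_val & ~(SPRD_INBOX_FIFO_OVERFLOW_IRQ |
			  SPRD_INBOX_FIFO_DELIVER_IRQ);
}

/* New style: start fully masked, unmask only the delivery interrupt. */
static uint32_t new_inbox_mask(void)
{
	return SPRD_INBOX_FIFO_IRQ_MASK & ~SPRD_INBOX_FIFO_DELIVER_IRQ;
}
```

If the register happened to hold 0 at startup, the old computation
leaves every inbox interrupt unmasked, whereas the new one yields the
same value regardless of prior hardware state.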

### 3. Stable Kernel Criteria Assessment

- **Obviously correct:** Yes. The change is straightforward - mask
  everything, then unmask only what the driver handles. This matches the
  `sprd_mbox_shutdown()` function which already uses
  `SPRD_INBOX_FIFO_IRQ_MASK` and `SPRD_OUTBOX_FIFO_IRQ_MASK` to mask all
  interrupts.
- **Fixes a real bug:** Yes. Spurious interrupts can lead to the kernel
  disabling the IRQ line ("nobody cared"), which would break mailbox
  functionality entirely.
- **Small and contained:** Yes. 4 insertions, 6 deletions in a single
  file, affecting one function.
- **No new features:** Correct. This only changes interrupt masking
  behavior.
- **Self-contained:** Yes. No dependency on the revision 2 commit or the
  delivery flag commit. The startup function code is unchanged between
  v6.19 and the state before this patch.

### 4. Risk Assessment

- **Risk:** Very low. The change makes the interrupt state deterministic
  and only enables interrupts the driver actually handles. The shutdown
  function already uses the same mask constants.
- **Regression potential:** Minimal. The only functional change is that
  the inbox overflow interrupt is no longer enabled - but since the ISR
  never handled it, enabling it was always a bug.
- **Scope:** Single driver (sprd-mailbox), single function, 10 lines
  changed.

### 5. Backport Applicability

The driver has existed since v5.8. The buggy code in `sprd_mbox_startup()`
has been present since the driver was added (`ca27fc26cd221`). The patch
should apply cleanly to all stable trees that include this driver
(5.10.y, 5.15.y, 6.1.y, 6.6.y, etc.) as the startup function has never
been modified between the initial driver addition and this fix.

### Verification

- **Verified** that `sprd_mbox_inbox_isr()` does NOT handle overflow
  interrupts - it only checks `SPRD_INBOX_FIFO_DELIVER_MASK` and returns
  `IRQ_NONE` if no delivery is pending (lines 153-193 of the v6.19
  source).
- **Verified** that `sprd_mbox_shutdown()` already uses
  `SPRD_INBOX_FIFO_IRQ_MASK` and `SPRD_OUTBOX_FIFO_IRQ_MASK` to disable
  all interrupts (lines 269-284), confirming the mask constants are the
  correct "all masked" values.
- **Verified** the startup function is unchanged between v6.19 (HEAD)
  and the state just before this patch on master (identical code at
  lines 236-268).
- **Verified** the revision 2 commit (`c6ff944003cf1`) does NOT modify
  `sprd_mbox_startup()`, confirming this patch has no dependency on it.
- **Verified** the delivery flag commit (`c77661d60d422`) modifies
  `sprd_mbox_inbox_isr()` only, not `sprd_mbox_startup()`, confirming
  independence.
- **Verified** the driver was added in v5.8-rc1 (`ca27fc26cd221`), so it
  exists in all currently maintained stable trees (5.10+).
- **Verified** via lore.kernel.org that no reviewer concerns or NAKs
  were raised about this patch.
- **Verified** that `SPRD_INBOX_FIFO_IRQ_MASK = GENMASK(2,0) = 0x7` and
  `SPRD_OUTBOX_FIFO_IRQ_MASK = GENMASK(4,0) = 0x1F` from the source
  definitions (lines 44-52).
- **Could NOT verify** specific user reports of the "nobody cared" IRQ
  being triggered in practice (no Reported-by tag, no bug report link).
  However, the code analysis clearly shows this is possible if the
  overflow interrupt fires without a concurrent delivery.

### Summary

This is a small, self-contained bug fix that prevents spurious
interrupts from potentially disabling the entire IRQ line for the
Spreadtrum mailbox driver. The fix is obviously correct (mask what you
don't handle), has minimal risk, applies cleanly to all stable trees,
and prevents a real failure mode (kernel disabling the shared IRQ). It
meets all stable kernel criteria.

**YES**

 drivers/mailbox/sprd-mailbox.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/mailbox/sprd-mailbox.c b/drivers/mailbox/sprd-mailbox.c
index ee8539dfcef54..c1a5fe6cc8771 100644
--- a/drivers/mailbox/sprd-mailbox.c
+++ b/drivers/mailbox/sprd-mailbox.c
@@ -243,21 +243,19 @@ static int sprd_mbox_startup(struct mbox_chan *chan)
 		/* Select outbox FIFO mode and reset the outbox FIFO status */
 		writel(0x0, priv->outbox_base + SPRD_MBOX_FIFO_RST);
 
-		/* Enable inbox FIFO overflow and delivery interrupt */
-		val = readl(priv->inbox_base + SPRD_MBOX_IRQ_MSK);
-		val &= ~(SPRD_INBOX_FIFO_OVERFLOW_IRQ | SPRD_INBOX_FIFO_DELIVER_IRQ);
+		/* Enable inbox FIFO delivery interrupt */
+		val = SPRD_INBOX_FIFO_IRQ_MASK;
+		val &= ~SPRD_INBOX_FIFO_DELIVER_IRQ;
 		writel(val, priv->inbox_base + SPRD_MBOX_IRQ_MSK);
 
 		/* Enable outbox FIFO not empty interrupt */
-		val = readl(priv->outbox_base + SPRD_MBOX_IRQ_MSK);
+		val = SPRD_OUTBOX_FIFO_IRQ_MASK;
 		val &= ~SPRD_OUTBOX_FIFO_NOT_EMPTY_IRQ;
 		writel(val, priv->outbox_base + SPRD_MBOX_IRQ_MSK);
 
 		/* Enable supplementary outbox as the fundamental one */
 		if (priv->supp_base) {
 			writel(0x0, priv->supp_base + SPRD_MBOX_FIFO_RST);
-			val = readl(priv->supp_base + SPRD_MBOX_IRQ_MSK);
-			val &= ~SPRD_OUTBOX_FIFO_NOT_EMPTY_IRQ;
 			writel(val, priv->supp_base + SPRD_MBOX_IRQ_MSK);
 		}
 	}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix out-of-bounds access in mchp_ipc_get_cluster_aggr_irq()
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
                   ` (4 preceding siblings ...)
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.15] mailbox: sprd: mask interrupts that are not handled Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.10] mailbox: sprd: clear delivery flag before handling TX done Sasha Levin
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Valentina Fernandez, Conor Dooley, Jassi Brar, Sasha Levin,
	linux-kernel

From: Valentina Fernandez <valentina.fernandezalanis@microchip.com>

[ Upstream commit f7c330a8c83c9b0332fd524097eaf3e69148164d ]

The cluster_cfg array is dynamically allocated to hold per-CPU
configuration structures, with its size based on the number of online
CPUs. Previously, this array was indexed using hartid, which may be
non-contiguous or exceed the bounds of the array, leading to
out-of-bounds access.
Switch to using cpuid as the index, as it is guaranteed to be within
the valid range provided by for_each_online_cpu().

Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

The driver was introduced in v6.14-rc1. This means it's only present in
very recent kernel trees — 6.14 and later. It would only need
backporting to 6.14.y and potentially 6.15.y if those exist as stable
branches.

Now let me also check the ISR fix more carefully. I notice the current
(unfixed) code at line 183 reads:

```c
if (irq == ipc->cluster_cfg[hartid].irq)
```

But the diff shows the fix changes this to `ipc->cluster_cfg[i].irq`
where `i` is the `for_each_online_cpu` iterator. This is correct — `i`
will always be within `[0, num_online_cpus())`.

## Summary of Findings

### What the commit fixes
An **out-of-bounds array access** in the Microchip IPC SBI mailbox
driver. The `cluster_cfg` array is allocated with `num_online_cpus()`
elements but was indexed using RISC-V `hartid`, which can exceed array
bounds when hart IDs are non-contiguous. This affects both the setup
path (`mchp_ipc_get_cluster_aggr_irq`) and the interrupt handler
(`mchp_ipc_cluster_aggr_isr`).

### Bug severity
- **Out-of-bounds write** in probe path: memory corruption
- **Out-of-bounds read** in IRQ context: potential crash/panic
- This is a real, triggerable bug on any RISC-V system with non-
  contiguous hart IDs using Microchip IPC hardware
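
A small userspace sketch of the indexing problem follows. The hart ID
values are hypothetical, `count_oob_by_hartid()` is an invented helper,
and the model assumes logical CPU IDs run contiguously from 0 to
`ncpus - 1`, so indexing by CPU ID stays inside an array of `ncpus`
elements.

```c
#include <stddef.h>

/*
 * Count how many CPUs would index past the end of an array sized
 * ncpus if the hart ID, rather than the logical CPU ID, were used as
 * the index. map[cpu] plays the role of cpuid_to_hartid_map(cpu).
 */
static size_t count_oob_by_hartid(const unsigned long *map, size_t ncpus)
{
	size_t cpu, oob = 0;

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (map[cpu] >= ncpus)
			oob++;
	}

	return oob;
}
```

For example, with four CPUs mapped to the hypothetical hart IDs
`{ 1, 2, 3, 4 }`, one access lands out of bounds; with `{ 0, 4, 8, 12 }`
three do.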

### Stable kernel criteria
1. **Obviously correct and tested**: Yes — the fix simply changes array
   indexing from hartid to cpuid, which is guaranteed in-bounds.
   Reviewed by Conor Dooley (Microchip).
2. **Fixes a real bug**: Yes — out-of-bounds memory access
3. **Important issue**: Yes — memory corruption and potential kernel
   crash
4. **Small and contained**: Yes — only changes one file, replacing
   `hartid` with `cpuid`/`i` in array accesses
5. **No new features**: Correct, pure bug fix
6. **Applies cleanly**: Straightforward change to a single file

### Risk assessment
- **Very low risk** — the change is mechanical substitution of the array
  index from hartid to cpuid
- The fix is purely within one driver file
- No behavioral change other than fixing the out-of-bounds access

### Concerns
- The driver was only introduced in v6.14-rc1, so it's only relevant for
  6.14.y stable trees and newer
- No dependency on other commits; self-contained fix

## Verification

- **Read the full driver source**
  (`drivers/mailbox/mailbox-mchp-ipc-sbi.c`): Confirmed the allocation
  at line 465 uses `num_online_cpus()` and the buggy accesses use
  `hartid` as index
- **Verified driver introduction**: `git log --diff-filter=A` shows the
  driver was added in commit e4b1d67e71419, which is `v6.14-rc1~97^2~10`
  — only in 6.14+
- **Confirmed the bug mechanism**: RISC-V hartids can be non-contiguous
  (this is well-known in the RISC-V kernel community), so `hartid >=
  num_online_cpus()` is a realistic scenario
- **Confirmed the ISR is also affected**: Line 183 in the unfixed code
  accesses `cluster_cfg[hartid]` in interrupt context
- **Confirmed the fix is correct**: `cpuid` from `for_each_online_cpu()`
  is always in range `[0, num_online_cpus())`
- **Verified the patch is reviewed**: "Reviewed-by: Conor Dooley"
  (Microchip kernel developer)

This is a clear, small, correct fix for a real out-of-bounds memory
access bug that can cause memory corruption or kernel crashes. It meets
all stable kernel criteria.

**YES**

 drivers/mailbox/mailbox-mchp-ipc-sbi.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/mailbox/mailbox-mchp-ipc-sbi.c b/drivers/mailbox/mailbox-mchp-ipc-sbi.c
index a6e52009a4245..d444491a584e8 100644
--- a/drivers/mailbox/mailbox-mchp-ipc-sbi.c
+++ b/drivers/mailbox/mailbox-mchp-ipc-sbi.c
@@ -180,20 +180,20 @@ static irqreturn_t mchp_ipc_cluster_aggr_isr(int irq, void *data)
 	/* Find out the hart that originated the irq */
 	for_each_online_cpu(i) {
 		hartid = cpuid_to_hartid_map(i);
-		if (irq == ipc->cluster_cfg[hartid].irq)
+		if (irq == ipc->cluster_cfg[i].irq)
 			break;
 	}
 
 	status_msg.cluster = hartid;
-	memcpy(ipc->cluster_cfg[hartid].buf_base, &status_msg, sizeof(struct mchp_ipc_status));
+	memcpy(ipc->cluster_cfg[i].buf_base, &status_msg, sizeof(struct mchp_ipc_status));
 
-	ret = mchp_ipc_sbi_send(SBI_EXT_IPC_STATUS, ipc->cluster_cfg[hartid].buf_base_addr);
+	ret = mchp_ipc_sbi_send(SBI_EXT_IPC_STATUS, ipc->cluster_cfg[i].buf_base_addr);
 	if (ret < 0) {
 		dev_err_ratelimited(ipc->dev, "could not get IHC irq status ret=%d\n", ret);
 		return IRQ_HANDLED;
 	}
 
-	memcpy(&status_msg, ipc->cluster_cfg[hartid].buf_base, sizeof(struct mchp_ipc_status));
+	memcpy(&status_msg, ipc->cluster_cfg[i].buf_base, sizeof(struct mchp_ipc_status));
 
 	/*
 	 * Iterate over each bit set in the IHC interrupt status register (IRQ_STATUS) to identify
@@ -385,21 +385,21 @@ static int mchp_ipc_get_cluster_aggr_irq(struct mchp_ipc_sbi_mbox *ipc)
 		if (ret <= 0)
 			continue;
 
-		ipc->cluster_cfg[hartid].irq = ret;
-		ret = devm_request_irq(ipc->dev, ipc->cluster_cfg[hartid].irq,
+		ipc->cluster_cfg[cpuid].irq = ret;
+		ret = devm_request_irq(ipc->dev, ipc->cluster_cfg[cpuid].irq,
 				       mchp_ipc_cluster_aggr_isr, IRQF_SHARED,
 				       "miv-ihc-irq", ipc);
 		if (ret)
 			return ret;
 
-		ipc->cluster_cfg[hartid].buf_base = devm_kmalloc(ipc->dev,
-								 sizeof(struct mchp_ipc_status),
-								 GFP_KERNEL);
+		ipc->cluster_cfg[cpuid].buf_base = devm_kmalloc(ipc->dev,
+								sizeof(struct mchp_ipc_status),
+								GFP_KERNEL);
 
-		if (!ipc->cluster_cfg[hartid].buf_base)
+		if (!ipc->cluster_cfg[cpuid].buf_base)
 			return -ENOMEM;
 
-		ipc->cluster_cfg[hartid].buf_base_addr = __pa(ipc->cluster_cfg[hartid].buf_base);
+		ipc->cluster_cfg[cpuid].buf_base_addr = __pa(ipc->cluster_cfg[cpuid].buf_base);
 
 		irq_found = true;
 	}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH AUTOSEL 6.19-5.10] mailbox: sprd: clear delivery flag before handling TX done
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
                   ` (5 preceding siblings ...)
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix out-of-bounds access in mchp_ipc_get_cluster_aggr_irq() Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] f2fs: fix to do sanity check on node footer in __write_node_folio() Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.15] remoteproc: mediatek: Break lock dependency to `prepare_lock` Sasha Levin
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Otto Pflüger, Jassi Brar, Sasha Levin, orsonzhai,
	linux-kernel

From: Otto Pflüger <otto.pflueger@abscue.de>

[ Upstream commit c77661d60d4223bf2ff10d409beb0c3b2021183b ]

If there are any pending messages in the mailbox queue, they are sent
as soon as a TX done event arrives from the driver. This may trigger a
new delivery interrupt while the previous one is still being handled.
If the delivery status is cleared after this, the interrupt is lost.
To prevent this from happening, clear the delivery status immediately
after checking it and before any new messages are sent.

Signed-off-by: Otto Pflüger <otto.pflueger@abscue.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

The function is essentially identical across all versions (only a typo
"traget" → "target" changed in a separate commit). The patch will apply
cleanly to all stable trees.

## 3. Classification

**Bug type**: Race condition / lost interrupt in interrupt handler
**Category**: Driver bug fix - interrupt handling race
**Severity**: Medium-High - lost interrupts can cause mailbox
communication stalls/hangs

## 4. Scope and Risk Assessment

- **Lines changed**: 5 insertions, 5 deletions (10 line diff, but
  effectively just moving 5 lines)
- **Files touched**: 1 file only (`drivers/mailbox/sprd-mailbox.c`)
- **Risk**: Extremely low - no logic change, just reordering operations
- **Correctness**: The fix is clearly correct - the status register
  value is already captured in `fifo_sts` before clearing, so all
  subsequent reads use the captured value
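
The ordering issue can be modelled with a toy state machine. This is
not the driver's code: `struct toy_mbox` and all `toy_*` functions are
hypothetical, and the model assumes the new delivery event is raised
immediately when TX done handling submits the next queued message.

```c
#include <stdbool.h>

struct toy_mbox {
	bool deliver_sts;	/* models the delivery status flag */
	int queued;		/* messages still waiting to be sent */
	int completed;		/* TX done events handled so far */
};

/* TX done: submit the next queued message, which raises a new event. */
static void toy_txdone(struct toy_mbox *m)
{
	m->completed++;
	if (m->queued > 0) {
		m->queued--;
		m->deliver_sts = true;
	}
}

/* Old ordering: the late clear also wipes the freshly raised event. */
static void toy_isr_clear_after(struct toy_mbox *m)
{
	if (!m->deliver_sts)
		return;
	toy_txdone(m);
	m->deliver_sts = false;
}

/* Patched ordering: clear first, so a new event survives the handler. */
static void toy_isr_clear_first(struct toy_mbox *m)
{
	if (!m->deliver_sts)
		return;
	m->deliver_sts = false;
	toy_txdone(m);
}

/* Invoke the ISR for as long as the status flag is set. */
static int toy_run(struct toy_mbox *m, void (*isr)(struct toy_mbox *))
{
	while (m->deliver_sts)
		isr(m);
	return m->completed;
}
```

Starting with one delivered message and two queued ones, the old
ordering stalls after a single completion with a message still queued,
while the patched ordering drains the queue.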

## 5. User Impact

This affects users of Spreadtrum/Unisoc SoCs (primarily mobile devices
and some embedded platforms, including some Android devices). Lost
mailbox interrupts can cause:
- Communication stalls between CPU cores
- Potential system hangs if mailbox is used for critical inter-processor
  communication
- Message loss and protocol failures

## 6. Stability Indicators

- Authored by someone who clearly understands the hardware and the bug
  mechanism
- Accepted by the subsystem maintainer (Jassi Brar)
- The fix pattern (clear status before processing, not after) is a well-
  known best practice for interrupt handlers

## 7. Dependency Check

The fix has no dependencies on other commits. It modifies the same
`sprd_mbox_inbox_isr` function that has existed since the driver's
introduction in v5.8, with minimal changes in between. The patch should
apply cleanly to all stable trees containing this driver.

## Verification

- **git log** showed `sprd-mailbox.c` was introduced in commit
  `ca27fc26cd221` (v5.8)
- **git show** of the original commit confirmed the buggy pattern
  (clear-after-process) has been present since the driver's initial
  version
- **git tag --contains** confirmed the driver exists in stable trees
  (5.10.y, 5.15.y, 6.1.y, etc.)
- **Code trace**: Verified `mbox_chan_txdone()` → `tx_tick()` →
  `msg_submit()` → `send_data()` call chain in
  `drivers/mailbox/mailbox.c` confirms that `mbox_chan_txdone()` can
  trigger new message transmission (line 70:
  `chan->mbox->ops->send_data(chan, data)`)
- **Code comparison**: Verified the inbox ISR function is essentially
  unchanged between initial commit and current mainline, confirming
  clean backport
- **File read**: Confirmed current tree still has the buggy ordering
  (lines 184-187 have the `writel` after the while loop)
- The fix commit is `c77661d60d422`, 5 lines added / 5 deleted, single
  file

## Summary

This is a clear, well-understood race condition fix in an interrupt
handler. The bug causes lost mailbox interrupts when a TX completion
triggers immediate re-sending of a new message. The fix is minimal (just
reordering existing code), obviously correct, affects a single driver
file, and applies cleanly to all stable trees. It fixes a real bug that
can cause communication failures on Unisoc/Spreadtrum SoC platforms.
This is a textbook example of a good stable backport candidate.

**YES**

 drivers/mailbox/sprd-mailbox.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/mailbox/sprd-mailbox.c b/drivers/mailbox/sprd-mailbox.c
index c1a5fe6cc8771..46d0c34177ab9 100644
--- a/drivers/mailbox/sprd-mailbox.c
+++ b/drivers/mailbox/sprd-mailbox.c
@@ -166,6 +166,11 @@ static irqreturn_t sprd_mbox_inbox_isr(int irq, void *data)
 		return IRQ_NONE;
 	}
 
+	/* Clear FIFO delivery and overflow status first */
+	writel(fifo_sts &
+	       (SPRD_INBOX_FIFO_DELIVER_MASK | SPRD_INBOX_FIFO_OVERLOW_MASK),
+	       priv->inbox_base + SPRD_MBOX_FIFO_RST);
+
 	while (send_sts) {
 		id = __ffs(send_sts);
 		send_sts &= (send_sts - 1);
@@ -181,11 +186,6 @@ static irqreturn_t sprd_mbox_inbox_isr(int irq, void *data)
 			mbox_chan_txdone(chan, 0);
 	}
 
-	/* Clear FIFO delivery and overflow status */
-	writel(fifo_sts &
-	       (SPRD_INBOX_FIFO_DELIVER_MASK | SPRD_INBOX_FIFO_OVERLOW_MASK),
-	       priv->inbox_base + SPRD_MBOX_FIFO_RST);
-
 	/* Clear irq status */
 	writel(SPRD_MBOX_IRQ_CLR, priv->inbox_base + SPRD_MBOX_IRQ_STS);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH AUTOSEL 6.19-6.18] f2fs: fix to do sanity check on node footer in __write_node_folio()
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
                   ` (6 preceding siblings ...)
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.10] mailbox: sprd: clear delivery flag before handling TX done Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.15] remoteproc: mediatek: Break lock dependency to `prepare_lock` Sasha Levin
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable; +Cc: Chao Yu, Jaegeuk Kim, Sasha Levin, linux-f2fs-devel

From: Chao Yu <chao@kernel.org>

[ Upstream commit 0a736109c9d29de0c26567e42cb99b27861aa8ba ]

Add node footer sanity check during node folio's writeback, if sanity
check fails, let's shutdown filesystem to avoid looping to redirty
and writeback in .writepages.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

So `sanity_check_node_footer` doesn't exist in 6.12.y. The fix would
need more substantial adaptation for older trees, but the core issue
(looping redirty on corrupted footer) still exists there - the old code
uses `f2fs_bug_on` which only triggers a WARN_ON (unless
CONFIG_F2FS_CHECK_FS is set) and then continues processing the corrupted
node.

## Analysis Summary

### What the commit does

The commit replaces `f2fs_bug_on(sbi, folio->index != nid)` in
`__write_node_folio()` with a proper call to
`sanity_check_node_footer()` + `f2fs_handle_critical_error()`.

### The bug

When a node folio has a corrupted footer (nid mismatch or other
inconsistency):

1. **Old behavior**: `f2fs_bug_on()` triggers either a `BUG_ON()`
   (kernel crash with `CONFIG_F2FS_CHECK_FS`) or just `WARN_ON` + sets
   `SBI_NEED_FSCK` and **continues execution**. In the WARN_ON case, the
   corrupted node gets processed and written. But more critically, if
   the node gets redirected to `redirty_out` or encounters another issue
   later, it enters an **infinite loop** of being redirtied and re-
   attempted for writeback in `.writepages`, since nothing stops the
   cycle.

2. **New behavior**: `sanity_check_node_footer()` detects the corruption
   more thoroughly (checking multiple footer fields, not just nid), and
   `f2fs_handle_critical_error()` shuts down the filesystem to **break
   the infinite writeback loop**.
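
The role of the shutdown in breaking the retry cycle can be shown with
a deliberately simplified model. It does not reproduce f2fs's actual
control flow: `struct toy_sb`, `toy_write_node()` and `toy_writepages()`
are hypothetical, and the model simply assumes that a node with a bad
footer is redirtied on every attempt.

```c
#include <stdbool.h>

struct toy_sb {
	bool shutdown;	/* models the filesystem having been shut down */
};

/* One writeback attempt; returns true if the node stays dirty. */
static bool toy_write_node(struct toy_sb *sb, bool footer_ok,
			   bool shutdown_on_error)
{
	if (footer_ok)
		return false;		/* written out, no longer dirty */
	if (shutdown_on_error)
		sb->shutdown = true;	/* models the critical error path */
	return true;			/* redirtied for a later pass */
}

/*
 * Retry until the node is clean, the filesystem is shut down, or the
 * safety cap max_passes is hit. Returns the number of passes made.
 */
static int toy_writepages(struct toy_sb *sb, bool footer_ok,
			  bool shutdown_on_error, int max_passes)
{
	bool dirty = true;
	int passes = 0;

	while (dirty && !sb->shutdown && passes < max_passes) {
		dirty = toy_write_node(sb, footer_ok, shutdown_on_error);
		passes++;
	}

	return passes;
}
```

Without the shutdown the loop only ends because of the artificial
`max_passes` cap; with it, the first failed check stops further
attempts.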

### Why it matters

- **Infinite loop / soft lockup**: Without this fix, a corrupted node
  footer causes the kernel to loop endlessly trying to write the page,
  consuming CPU and potentially hanging the system.
- **Filesystem corruption defense**: On corrupted or fuzzed images, the
  old code would continue operating on inconsistent data.
- **Small, surgical fix**: 5 insertions and 1 deletion in a single
  file, replacing an assertion with a proper error handling path.

### Dependencies

The fix calls `sanity_check_node_footer()` (introduced in v6.15-rc1 by
commit 1cf6b5670af1f) and `f2fs_handle_critical_error()` +
`STOP_CP_REASON_CORRUPTED_NID`.

Crucially, `sanity_check_node_footer()` was **already backported** to
6.17.y and 6.18.y as part of commit c18ecd99e0c70's backport. Both
`f2fs_handle_critical_error` and `STOP_CP_REASON_CORRUPTED_NID` also
exist in these stable trees.

### Risk assessment

- **Very low risk**: The change replaces a debug assertion with proper
  error handling + filesystem shutdown. The error path (`goto
  redirty_out`) already exists and is well-tested.
- **Applies cleanly** to 6.17.y and 6.18.y (minor context difference
  with `f2fs_down_read` vs `f2fs_down_read_trace` but that's outside the
  hunk).
- For older trees (6.12.y and below), `sanity_check_node_footer()`
  doesn't exist, so a more complex backport would be needed.

## Verification

- **git show 0a736109c9d29**: Confirmed the actual commit is +5/-1
  lines, single file change in fs/f2fs/node.c
- **git show 1cf6b5670af1f**: Confirmed this is the prerequisite that
  introduced `sanity_check_node_footer()`, first appeared in v6.15-rc1
- **git tag --contains 1cf6b5670af1f**: Confirmed first release is
  v6.15-rc1
- **git show v6.17.13:fs/f2fs/node.c**: Confirmed
  `sanity_check_node_footer` exists (line 1503), and `f2fs_bug_on(sbi,
  folio->index != nid)` still present (the buggy line) at line 1754
- **git show v6.18.10:fs/f2fs/node.c**: Same -
  `sanity_check_node_footer` exists and the buggy `f2fs_bug_on` line is
  present
- **git show v6.17.13:fs/f2fs/f2fs.h**: Confirmed
  `f2fs_handle_critical_error` is declared
- **git show v6.17.13:include/linux/f2fs_fs.h**: Confirmed
  `STOP_CP_REASON_CORRUPTED_NID` exists
- **git show v6.18.10**: Same confirmations for both dependencies
- **git show v6.12.71:fs/f2fs/node.c**: Confirmed
  `sanity_check_node_footer` does NOT exist in 6.12.y (would need more
  work)
- **f2fs_bug_on definition** (fs/f2fs/f2fs.h:34-40): Confirmed that
  without CONFIG_F2FS_CHECK_FS, it's just WARN_ON + SBI_NEED_FSCK,
  allowing execution to continue
- **git show 186098f34b8a5**: Confirmed that c18ecd99e0c70 was
  backported to 6.17.y (combined with prerequisites), establishing
  `sanity_check_node_footer` in stable

This is a small, surgical fix that prevents an infinite writeback loop
on corrupted f2fs node footers. It replaces a weak assertion with proper
error handling and filesystem shutdown. All dependencies exist in 6.17.y
and 6.18.y stable trees. The fix is clearly correct, fixes a real bug
(system hang), and is appropriately scoped.

**YES**

 fs/f2fs/node.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 482a362f26254..a963c4165bc4b 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1751,7 +1751,11 @@ static bool __write_node_folio(struct folio *folio, bool atomic, bool *submitted
 
 	/* get old block addr of this node page */
 	nid = nid_of_node(folio);
-	f2fs_bug_on(sbi, folio->index != nid);
+
+	if (sanity_check_node_footer(sbi, folio, nid, NODE_TYPE_REGULAR)) {
+		f2fs_handle_critical_error(sbi, STOP_CP_REASON_CORRUPTED_NID);
+		goto redirty_out;
+	}
 
 	if (f2fs_get_node_info(sbi, nid, &ni, !do_balance))
 		goto redirty_out;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH AUTOSEL 6.19-5.15] remoteproc: mediatek: Break lock dependency to `prepare_lock`
  2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
                   ` (7 preceding siblings ...)
  2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] f2fs: fix to do sanity check on node footer in __write_node_folio() Sasha Levin
@ 2026-02-15 17:41 ` Sasha Levin
  8 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-02-15 17:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Tzung-Bi Shih, Chen-Yu Tsai, Mathieu Poirier, Sasha Levin,
	andersson, matthias.bgg, angelogioacchino.delregno,
	linux-remoteproc, linux-kernel, linux-arm-kernel, linux-mediatek

From: Tzung-Bi Shih <tzungbi@kernel.org>

[ Upstream commit d935187cfb27fc4168f78f3959aef4eafaae76bb ]

A potential circular locking dependency (ABBA deadlock) exists between
`ec_dev->lock` and the clock framework's `prepare_lock`.

The first order (A -> B) occurs when scp_ipi_send() is called while
`ec_dev->lock` is held (e.g., within cros_ec_cmd_xfer()):
1. cros_ec_cmd_xfer() acquires `ec_dev->lock` and calls scp_ipi_send().
2. scp_ipi_send() calls clk_prepare_enable(), which acquires
   `prepare_lock`.
See #0 in the following example calling trace.
(Lock Order: `ec_dev->lock` -> `prepare_lock`)

The reverse order (B -> A) is more complex and has been observed
(learned) by lockdep.  It involves the clock prepare operation
triggering power domain changes, which then propagates through sysfs
and power supply uevents, eventually calling back into the ChromeOS EC
driver and attempting to acquire `ec_dev->lock`:
1. Something calls clk_prepare(), which acquires `prepare_lock`.  It
   then triggers genpd operations like genpd_runtime_resume(), which
   takes `&genpd->mlock`.
2. Power domain changes can trigger regulator changes; regulator
   changes can then trigger device link changes; device link changes
   can then trigger sysfs changes.  Eventually, power_supply_uevent()
   is called.
3. This leads to calls like cros_usbpd_charger_get_prop(), which calls
   cros_ec_cmd_xfer_status(), which then attempts to acquire
   `ec_dev->lock`.
See #1 ~ #6 in the following example calling trace.
(Lock Order: `prepare_lock` -> `&genpd->mlock` -> ... -> `&ec_dev->lock`)

Move the clk_prepare()/clk_unprepare() operations for `scp->clk` to the
remoteproc prepare()/unprepare() callbacks.  This ensures `prepare_lock`
is only acquired in prepare()/unprepare() callbacks.  Since
`ec_dev->lock` is not involved in the callbacks, the dependency loop is
broken.

This means the clock is always "prepared" when the SCP is running.  The
prolonged "prepared time" for the clock should be acceptable as SCP is
designed to be a very power efficient processor.  The power consumption
impact can be negligible.

A simplified calling trace reported by lockdep:
> -> #6 (&ec_dev->lock)
>        cros_ec_cmd_xfer
>        cros_ec_cmd_xfer_status
>        cros_usbpd_charger_get_port_status
>        cros_usbpd_charger_get_prop
>        power_supply_get_property
>        power_supply_show_property
>        power_supply_uevent
>        dev_uevent
>        uevent_show
>        dev_attr_show
>        sysfs_kf_seq_show
>        kernfs_seq_show
> -> #5 (kn->active#2)
>        kernfs_drain
>        __kernfs_remove
>        kernfs_remove_by_name_ns
>        sysfs_remove_file_ns
>        device_del
>        __device_link_del
>        device_links_driver_bound
> -> #4 (device_links_lock)
>        device_link_remove
>        _regulator_put
>        regulator_put
> -> #3 (regulator_list_mutex)
>        regulator_lock_dependent
>        regulator_disable
>        scpsys_power_off
>        _genpd_power_off
>        genpd_power_off
> -> #2 (&genpd->mlock/1)
>        genpd_add_subdomain
>        pm_genpd_add_subdomain
>        scpsys_add_subdomain
>        scpsys_probe
> -> #1 (&genpd->mlock)
>        genpd_runtime_resume
>        __rpm_callback
>        rpm_callback
>        rpm_resume
>        __pm_runtime_resume
>        clk_core_prepare
>        clk_prepare
> -> #0 (prepare_lock)
>        clk_prepare
>        scp_ipi_send
>        scp_send_ipi
>        mtk_rpmsg_send
>        rpmsg_send
>        cros_ec_pkt_xfer_rpmsg

Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://lore.kernel.org/r/20260112110755.2435899-1-tzungbi@kernel.org
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Good - the `prepare`/`unprepare` ops were added in 2020, well before any
current stable tree branch point. They exist in 5.10, 5.15, 6.1, and all
newer stable trees.

### 5. User Impact Assessment

This affects **MediaTek SCP on ChromeOS devices** (Chromebooks with
MediaTek SoCs). The deadlock scenario involves:
- ChromeOS EC communication via rpmsg (SCP IPI)
- Clock framework interactions
- Power domain/regulator/device link chains

This is a real-world deadlock that lockdep detected. ChromeOS devices
are widely deployed, and a deadlock in the EC communication path could
cause a complete system hang.
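
The loop can be pictured as a cycle in a lock-order graph. The following is a minimal user-space sketch, not kernel code: the enum, `struct lock_graph`, and all `graph_*` helpers are hypothetical names introduced for illustration, and `&genpd->mlock` and `&genpd->mlock/1` are collapsed into a single node. It records "held -> taken" edges in the order given in the commit message and checks whether they form a cycle.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical node names, one per lock class from the lockdep trace. */
enum model_lock {
	EC_DEV_LOCK,
	PREPARE_LOCK,
	GENPD_MLOCK,
	REGULATOR_LIST_MUTEX,
	DEVICE_LINKS_LOCK,
	KN_ACTIVE,
	NR_LOCKS
};

/* edge[a][b] is true when lock b was taken while lock a was held. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void graph_add(struct lock_graph *g, enum model_lock held,
		      enum model_lock taken)
{
	g->edge[held][taken] = true;
}

/* Depth-first search: can 'target' be reached from 'from'? */
static bool graph_reaches(const struct lock_graph *g, int from, int target,
			  bool *seen)
{
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (graph_reaches(g, next, target, seen))
				return true;
		}
	}
	return false;
}

/* A cycle exists if any lock can reach itself through recorded edges. */
static bool graph_has_cycle(const struct lock_graph *g)
{
	for (int l = 0; l < NR_LOCKS; l++) {
		bool seen[NR_LOCKS] = { false };

		if (graph_reaches(g, l, l, seen))
			return true;
	}
	return false;
}

/* The reverse (B -> A) chain: prepare_lock -> ... -> ec_dev->lock. */
static void graph_init_reverse_chain(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
	graph_add(g, PREPARE_LOCK, GENPD_MLOCK);
	graph_add(g, GENPD_MLOCK, REGULATOR_LIST_MUTEX);
	graph_add(g, REGULATOR_LIST_MUTEX, DEVICE_LINKS_LOCK);
	graph_add(g, DEVICE_LINKS_LOCK, KN_ACTIVE);
	graph_add(g, KN_ACTIVE, EC_DEV_LOCK);
}
```

In this model the reverse chain alone is acyclic. Adding the `EC_DEV_LOCK` -> `PREPARE_LOCK` edge, which corresponds to `scp_ipi_send()` calling `clk_prepare_enable()` under `ec_dev->lock`, closes the loop. The patch keeps that edge from being created.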

### 6. Risk Assessment

**Changes are mechanical and low-risk:**
- All `clk_prepare_enable()` → `clk_enable()` (6 call sites)
- All `clk_disable_unprepare()` → `clk_disable()` (7 call sites)
- New `scp_prepare()`/`scp_unprepare()` callbacks added to `scp_ops`
  (simple wrappers)

**The pattern is well-established** - `clk_prepare()` may sleep and must
be called from sleepable context, while `clk_enable()` does not sleep
and can be called from atomic context, including IRQ handlers.
Separating them is a standard clock framework pattern.
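
The split relies on the clock already being prepared whenever it is enabled. The following is a minimal user-space sketch of that invariant, assuming a simplified two-counter model: `struct model_clk` and the `model_clk_*()` helpers are hypothetical names, not the kernel API, and the `-ESHUTDOWN` return mirrors how the common clock framework rejects enabling an unprepared clock.

```c
#include <errno.h>

/* Hypothetical user-space model of a clock's two reference counts. */
struct model_clk {
	unsigned int prepare_count;	/* changed only from sleepable context */
	unsigned int enable_count;	/* may change from atomic context */
};

/* Models clk_prepare(): the step that takes the global prepare_lock. */
static int model_clk_prepare(struct model_clk *clk)
{
	clk->prepare_count++;
	return 0;
}

/* Models clk_unprepare(): drops one prepare reference. */
static void model_clk_unprepare(struct model_clk *clk)
{
	if (clk->prepare_count)
		clk->prepare_count--;
}

/* Models clk_enable(): fails when the clock has not been prepared. */
static int model_clk_enable(struct model_clk *clk)
{
	if (clk->prepare_count == 0)
		return -ESHUTDOWN;
	clk->enable_count++;
	return 0;
}

/* Models clk_disable(): drops one enable reference. */
static void model_clk_disable(struct model_clk *clk)
{
	if (clk->enable_count)
		clk->enable_count--;
}
```

In the patch, the prepare step runs once in `scp_prepare()` and is undone in `scp_unprepare()`, so every `clk_enable()` in between operates on an already prepared clock and never needs `prepare_lock`.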

**Total: 30 insertions, 13 deletions** across 2 files - small and
contained.

### 7. Dependency Check

The commit is self-contained:
- No subsequent commits in these files depend on it
- The `rproc_ops.prepare`/`.unprepare` callbacks exist since 2020
  (present in all stable trees: 5.10+)
- No other infrastructure changes needed

### Verification

- **git log master** confirmed commit d935187cfb27f is the exact commit
  being analyzed
- **git log --oneline d935187cfb27f..master** confirmed no subsequent
  commits modify the same files (no dependencies)
- **git log/show 33467ac3c8dc8** confirmed `prepare`/`unprepare` ops
  were added to `rproc_ops` in April 2020, present in stable trees 5.10+
- **Code review of remoteproc_internal.h** confirmed
  `rproc_prepare_device()`/`rproc_unprepare_device()` wrappers exist and
  are called in `rproc_fw_boot()` and `rproc_shutdown()` paths
- **Reviewed-by and Tested-by** from Chen-Yu Tsai (wenst@chromium.org)
  confirms the fix was tested on real hardware
- **Lockdep call trace** in commit message confirms this is a real
  detected lock ordering issue, not theoretical
- The commit is a standalone single patch (not part of a series) -
  confirmed by lore.kernel.org link format (single patch, not X/Y)

### Summary

This commit fixes a **lockdep-detected ABBA deadlock** in the MediaTek
SCP remoteproc driver. The deadlock involves the clock framework's
`prepare_lock` and the ChromeOS EC's `ec_dev->lock`. The fix is
mechanically simple (splitting `clk_prepare_enable` into separate
`clk_prepare` + `clk_enable` calls), uses existing remoteproc framework
infrastructure (`prepare`/`unprepare` callbacks available since 2020),
and has been reviewed and tested. The scope is limited to 2 files with
30 insertions/13 deletions. The risk of regression is very low as it
uses a well-established clock framework pattern, and the benefit is
preventing a real deadlock on ChromeOS devices with MediaTek SoCs.

**YES**

 drivers/remoteproc/mtk_scp.c     | 39 +++++++++++++++++++++++---------
 drivers/remoteproc/mtk_scp_ipi.c |  4 ++--
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
index db8fd045468d9..98d00bd5200cc 100644
--- a/drivers/remoteproc/mtk_scp.c
+++ b/drivers/remoteproc/mtk_scp.c
@@ -283,7 +283,7 @@ static irqreturn_t scp_irq_handler(int irq, void *priv)
 	struct mtk_scp *scp = priv;
 	int ret;
 
-	ret = clk_prepare_enable(scp->clk);
+	ret = clk_enable(scp->clk);
 	if (ret) {
 		dev_err(scp->dev, "failed to enable clocks\n");
 		return IRQ_NONE;
@@ -291,7 +291,7 @@ static irqreturn_t scp_irq_handler(int irq, void *priv)
 
 	scp->data->scp_irq_handler(scp);
 
-	clk_disable_unprepare(scp->clk);
+	clk_disable(scp->clk);
 
 	return IRQ_HANDLED;
 }
@@ -665,7 +665,7 @@ static int scp_load(struct rproc *rproc, const struct firmware *fw)
 	struct device *dev = scp->dev;
 	int ret;
 
-	ret = clk_prepare_enable(scp->clk);
+	ret = clk_enable(scp->clk);
 	if (ret) {
 		dev_err(dev, "failed to enable clocks\n");
 		return ret;
@@ -680,7 +680,7 @@ static int scp_load(struct rproc *rproc, const struct firmware *fw)
 
 	ret = scp_elf_load_segments(rproc, fw);
 leave:
-	clk_disable_unprepare(scp->clk);
+	clk_disable(scp->clk);
 
 	return ret;
 }
@@ -691,14 +691,14 @@ static int scp_parse_fw(struct rproc *rproc, const struct firmware *fw)
 	struct device *dev = scp->dev;
 	int ret;
 
-	ret = clk_prepare_enable(scp->clk);
+	ret = clk_enable(scp->clk);
 	if (ret) {
 		dev_err(dev, "failed to enable clocks\n");
 		return ret;
 	}
 
 	ret = scp_ipi_init(scp, fw);
-	clk_disable_unprepare(scp->clk);
+	clk_disable(scp->clk);
 	return ret;
 }
 
@@ -709,7 +709,7 @@ static int scp_start(struct rproc *rproc)
 	struct scp_run *run = &scp->run;
 	int ret;
 
-	ret = clk_prepare_enable(scp->clk);
+	ret = clk_enable(scp->clk);
 	if (ret) {
 		dev_err(dev, "failed to enable clocks\n");
 		return ret;
@@ -734,14 +734,14 @@ static int scp_start(struct rproc *rproc)
 		goto stop;
 	}
 
-	clk_disable_unprepare(scp->clk);
+	clk_disable(scp->clk);
 	dev_info(dev, "SCP is ready. FW version %s\n", run->fw_ver);
 
 	return 0;
 
 stop:
 	scp->data->scp_reset_assert(scp);
-	clk_disable_unprepare(scp->clk);
+	clk_disable(scp->clk);
 	return ret;
 }
 
@@ -909,7 +909,7 @@ static int scp_stop(struct rproc *rproc)
 	struct mtk_scp *scp = rproc->priv;
 	int ret;
 
-	ret = clk_prepare_enable(scp->clk);
+	ret = clk_enable(scp->clk);
 	if (ret) {
 		dev_err(scp->dev, "failed to enable clocks\n");
 		return ret;
@@ -917,12 +917,29 @@ static int scp_stop(struct rproc *rproc)
 
 	scp->data->scp_reset_assert(scp);
 	scp->data->scp_stop(scp);
-	clk_disable_unprepare(scp->clk);
+	clk_disable(scp->clk);
 
 	return 0;
 }
 
+static int scp_prepare(struct rproc *rproc)
+{
+	struct mtk_scp *scp = rproc->priv;
+
+	return clk_prepare(scp->clk);
+}
+
+static int scp_unprepare(struct rproc *rproc)
+{
+	struct mtk_scp *scp = rproc->priv;
+
+	clk_unprepare(scp->clk);
+	return 0;
+}
+
 static const struct rproc_ops scp_ops = {
+	.prepare	= scp_prepare,
+	.unprepare	= scp_unprepare,
 	.start		= scp_start,
 	.stop		= scp_stop,
 	.load		= scp_load,
diff --git a/drivers/remoteproc/mtk_scp_ipi.c b/drivers/remoteproc/mtk_scp_ipi.c
index c068227e251e7..7a37e273b3af8 100644
--- a/drivers/remoteproc/mtk_scp_ipi.c
+++ b/drivers/remoteproc/mtk_scp_ipi.c
@@ -171,7 +171,7 @@ int scp_ipi_send(struct mtk_scp *scp, u32 id, void *buf, unsigned int len,
 	    WARN_ON(len > scp_sizes->ipi_share_buffer_size) || WARN_ON(!buf))
 		return -EINVAL;
 
-	ret = clk_prepare_enable(scp->clk);
+	ret = clk_enable(scp->clk);
 	if (ret) {
 		dev_err(scp->dev, "failed to enable clock\n");
 		return ret;
@@ -211,7 +211,7 @@ int scp_ipi_send(struct mtk_scp *scp, u32 id, void *buf, unsigned int len,
 
 unlock_mutex:
 	mutex_unlock(&scp->send_lock);
-	clk_disable_unprepare(scp->clk);
+	clk_disable(scp->clk);
 
 	return ret;
 }
-- 
2.51.0


Thread overview: 10+ messages
2026-02-15 17:41 [PATCH AUTOSEL 6.19-5.10] mailbox: bcm-ferxrm-mailbox: Use default primary handler Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] mailbox: pcc: Remove spurious IRQF_ONESHOT usage Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] remoteproc: imx_dsp_rproc: Skip RP_MBOX_SUSPEND_SYSTEM when mailbox TX channel is uninitialized Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.1] mailbox: imx: Skip the suspend flag for i.MX7ULP Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix uninitialized symbol and other smatch warnings Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.15] mailbox: sprd: mask interrupts that are not handled Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] mailbox: mchp-ipc-sbi: fix out-of-bounds access in mchp_ipc_get_cluster_aggr_irq() Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.10] mailbox: sprd: clear delivery flag before handling TX done Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-6.18] f2fs: fix to do sanity check on node footer in __write_node_folio() Sasha Levin
2026-02-15 17:41 ` [PATCH AUTOSEL 6.19-5.15] remoteproc: mediatek: Break lock dependency to `prepare_lock` Sasha Levin
