* [PATCH AUTOSEL 6.19-6.18] soc: imx8m: Fix error handling for clk_prepare_enable()
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19] soc/tegra: pmc: Fix unsafe generic_handle_irq() call Sasha Levin
` (7 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Peng Fan, kernel test robot, Dan Carpenter, Marco Felsch,
Daniel Baluta, Shawn Guo, Sasha Levin, Frank.Li, neil.armstrong,
yelangyan, imx, linux-arm-kernel
From: Peng Fan <peng.fan@nxp.com>
[ Upstream commit f6ef3d9ff81240e9bcc030f2da132eb0f8a761d7 ]
imx8m_soc_prepare() directly returns the result of clk_prepare_enable(),
which skips proper cleanup if the clock enable fails. Check the return
value of clk_prepare_enable() and release resources if failure.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202601111406.ZVV3YaiU-lkp@intel.com/
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Interesting - in v6.15 and earlier, `clk_prepare_enable()` was called
without checking its return value at all, but the code continued with
proper cleanup on the success path. The v6.15 code has a different
(arguably less severe) issue - it doesn't check `clk_prepare_enable()`
return value but doesn't leak resources on normal exit.
Now I have a complete picture. Let me write the analysis.
---
## Detailed Analysis
### 1. Commit Message Analysis
The commit title explicitly says "Fix error handling for
clk_prepare_enable()". The message clearly describes the bug:
`imx8m_soc_prepare()` directly returns the result of
`clk_prepare_enable()`, bypassing the `err_clk` cleanup label when the
clock enable fails. This is a resource leak bug.
**Reporters:** Kernel test robot (lkp) and Dan Carpenter (well-known
Smatch/static analysis developer). This means the bug was found via
static analysis (Smatch), not a runtime crash report. The presence of
"Closes:" link to a kernel test robot report confirms this.
**Reviews:** Two Reviewed-by tags (Marco Felsch and Daniel Baluta),
indicating the fix was validated by subsystem developers.
### 2. Code Change Analysis
The bug is in `imx8m_soc_prepare()`:
```131:156:drivers/soc/imx/soc-imx8m.c
static int imx8m_soc_prepare(struct platform_device *pdev, const char
*ocotp_compatible)
{
// ... setup code ...
drvdata->ocotp_base = of_iomap(np, 0); // resource 1: iomap
// ...
drvdata->clk = of_clk_get_by_name(np, NULL); // resource 2:
clock ref
// ...
return clk_prepare_enable(drvdata->clk); // BUG: if this
fails, ocotp_base is leaked!
err_clk:
iounmap(drvdata->ocotp_base);
return ret;
}
```
**The bug mechanism:** When `clk_prepare_enable()` fails, the function
returns directly with the error code, skipping the `err_clk` label which
calls `iounmap(drvdata->ocotp_base)`. This leaks:
1. The `ocotp_base` iomap mapping (definite leak)
2. The clock reference from `of_clk_get_by_name()` (also leaked, since
`clk_put()` is not called - this is arguably still not fully fixed,
but the `err_clk` label was designed for the `of_clk_get_by_name`
failure case where `clk_put` should not be called)
**The fix:** Check the return value, jump to `err_clk` on failure, which
properly unmaps `ocotp_base`:
```diff
- return clk_prepare_enable(drvdata->clk);
+ ret = clk_prepare_enable(drvdata->clk);
+ if (ret)
+ goto err_clk;
+ return 0;
```
**Note:** There is a subtlety - when `clk_prepare_enable()` fails, the
`err_clk` label only calls `iounmap()` but not `clk_put()`. The clock
reference obtained from `of_clk_get_by_name()` is still leaked. However,
this is a separate (pre-existing) concern. The fix as written is still
an improvement: it's strictly better than the old code.
### 3. Dependency/Prerequisite Analysis
This is **critical** for the backport decision:
- The function `imx8m_soc_prepare()` was introduced in commit
`390c01073f5d0` ("soc: imx8m: Cleanup with adding
imx8m_soc_[un]prepare"), which first appeared in **v6.16-rc1**.
- This means the **buggy code does NOT exist** in v6.15 or any earlier
stable tree (v6.14, v6.13, v6.12, v6.6, v6.1, 5.15, etc.).
- The bug was introduced in v6.16 and the fix only applies to
**v6.16.y** stable and newer (v6.17.y, v6.18.y if applicable).
- I confirmed the buggy code exists identically in v6.16.12 (latest
v6.16 stable), so the fix would apply cleanly.
### 4. Scope and Risk Assessment
- **Size:** 5 lines changed (+5/-1) in a single file - extremely small
and surgical.
- **Risk:** Essentially zero. The fix only changes the error path
behavior (adds proper cleanup). The success path (when
`clk_prepare_enable()` works) is unchanged (returns 0 instead of the
return value of `clk_prepare_enable()`, but those are equivalent on
success).
- **Subsystem:** SoC driver for NXP i.MX8M family, a widely-used
embedded platform.
### 5. User Impact
- **Affected hardware:** All i.MX8M family SoCs (i.MX8MQ, i.MX8MM,
i.MX8MN, i.MX8MP) - very common in embedded/industrial/automotive use.
- **Trigger likelihood:** Low in practice. `clk_prepare_enable()` for
the OCOTP clock is unlikely to fail on real hardware under normal
conditions.
- **Severity if triggered:** Resource leak (iomap + clock reference).
Not a crash, not security-critical, not data corruption. On repeated
failures (e.g., in a test environment or if the clock framework has
issues), could contribute to resource exhaustion.
- **Detection method:** Static analysis (Smatch/kernel test robot), not
a user report.
### 6. Classification
This is a **real bug fix** - it fixes a resource leak on an error path.
It falls into the category of "error path changes that fix resource
leaks," which is explicitly identified as backport-worthy material
despite being described as a "cleanup."
### 7. Stable Criteria Check
| Criterion | Assessment |
|---|---|
| Obviously correct and tested | Yes - trivially correct, reviewed by 2
developers |
| Fixes a real bug | Yes - resource leak on error path |
| Important issue | Moderate - resource leak, not crash/corruption |
| Small and contained | Yes - 5 lines in 1 file |
| No new features/APIs | Correct |
| Applies cleanly | Yes - to v6.16.y |
### Conclusion
This is a small, obviously correct fix for a resource leak in the i.MX8M
SoC driver's error handling path. The fix is surgical (5 lines), well-
reviewed, and applies cleanly to v6.16.y stable (the only stable tree
that has the affected code). While the bug is unlikely to trigger in
practice (it requires `clk_prepare_enable()` to fail), the fix has
essentially zero regression risk and properly handles an error path that
should have been handled from the start. The fix was reported by kernel
test robot and Dan Carpenter, two well-known automated and manual static
analysis sources, and was reviewed by two subsystem developers.
The fix is appropriate for v6.16.y stable backport. It does NOT apply to
older stable trees (v6.15.y and below) because the affected function was
introduced in v6.16.
**YES**
drivers/soc/imx/soc-imx8m.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c
index 04a1b60f2f2b5..8e2322999f099 100644
--- a/drivers/soc/imx/soc-imx8m.c
+++ b/drivers/soc/imx/soc-imx8m.c
@@ -148,7 +148,11 @@ static int imx8m_soc_prepare(struct platform_device *pdev, const char *ocotp_com
goto err_clk;
}
- return clk_prepare_enable(drvdata->clk);
+ ret = clk_prepare_enable(drvdata->clk);
+ if (ret)
+ goto err_clk;
+
+ return 0;
err_clk:
iounmap(drvdata->ocotp_base);
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH AUTOSEL 6.19] soc/tegra: pmc: Fix unsafe generic_handle_irq() call
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] soc: imx8m: Fix error handling for clk_prepare_enable() Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] x86/sev: Use kfree_sensitive() when freeing a SNP message descriptor Sasha Levin
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Prathamesh Shete, Petlozu Pravareshwar, Jon Hunter,
Thierry Reding, Sasha Levin, thierry.reding, bigeasy, clrkwllms,
rostedt, ulf.hansson, chleroy, jirislaby, yelangyan, linux-tegra,
linux-rt-devel
From: Prathamesh Shete <pshete@nvidia.com>
[ Upstream commit e6d96073af681780820c94079b978474a8a44413 ]
Currently, when resuming from system suspend on Tegra platforms,
the following warning is observed:
WARNING: CPU: 0 PID: 14459 at kernel/irq/irqdesc.c:666
Call trace:
handle_irq_desc+0x20/0x58 (P)
tegra186_pmc_wake_syscore_resume+0xe4/0x15c
syscore_resume+0x3c/0xb8
suspend_devices_and_enter+0x510/0x540
pm_suspend+0x16c/0x1d8
The warning occurs because generic_handle_irq() is being called from
a non-interrupt context which is considered as unsafe.
Fix this warning by deferring generic_handle_irq() call to an IRQ work
which gets executed in hard IRQ context where generic_handle_irq()
can be called safely.
When PREEMPT_RT kernels are used, regular IRQ work (initialized with
init_irq_work) is deferred to run in per-CPU kthreads in preemptible
context rather than hard IRQ context. Hence, use the IRQ_WORK_INIT_HARD
variant so that with PREEMPT_RT kernels, the IRQ work is processed in
hardirq context instead of being deferred to a thread which is required
for calling generic_handle_irq().
On non-PREEMPT_RT kernels, both init_irq_work() and IRQ_WORK_INIT_HARD()
execute in IRQ context, so this change has no functional impact for
standard kernel configurations.
Signed-off-by: Petlozu Pravareshwar <petlozup@nvidia.com>
Signed-off-by: Prathamesh Shete <pshete@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
[treding@nvidia.com: miscellaneous cleanups]
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Complete Analysis
### 1. COMMIT MESSAGE ANALYSIS
**Subject**: "soc/tegra: pmc: Fix unsafe generic_handle_irq() call"
The commit message is unambiguous - it's fixing a **real, observable
WARNING** during system resume on Tegra platforms. The warning backtrace
is included:
```
WARNING: CPU: 0 PID: 14459 at kernel/irq/irqdesc.c:666
Call trace:
handle_irq_desc+0x20/0x58 (P)
tegra186_pmc_wake_syscore_resume+0xe4/0x15c
syscore_resume+0x3c/0xb8
suspend_devices_and_enter+0x510/0x540
pm_suspend+0x16c/0x1d8
```
The fix has been **Reviewed-by** and **Tested-by** Jon Hunter (NVIDIA
kernel engineer), and authored by NVIDIA engineers who are experts on
Tegra PMC.
### 2. CODE CHANGE ANALYSIS - The Bug
**Root cause**: `tegra186_pmc_wake_syscore_resume()` is called by
`syscore_resume()` during system resume. Looking at `syscore_resume()`
in `drivers/base/syscore.c:93-110`:
```c
void syscore_resume(void)
{
WARN_ONCE(!irqs_disabled(), ...);
list_for_each_entry(syscore, &syscore_list, node)
if (syscore->ops->resume)
syscore->ops->resume(syscore->data);
}
```
This runs with **interrupts disabled** but **NOT in hard IRQ context**
(i.e., `in_hardirq()` returns false). However, when
`generic_handle_irq()` is called, it goes through `handle_irq_desc()`
(in `kernel/irq/irqdesc.c:658-671`):
```c
int handle_irq_desc(struct irq_desc *desc)
{
data = irq_desc_get_irq_data(desc);
if (WARN_ON_ONCE(!in_hardirq() &&
irqd_is_handle_enforce_irqctx(data)))
return -EPERM;
generic_handle_irq_desc(desc);
return 0;
}
```
The ARM GIC v2 and GIC v3 interrupt controllers **set
`IRQD_HANDLE_ENFORCE_IRQCTX` on ALL interrupts** they map (confirmed in
`drivers/irqchip/irq-gic.c:1077` and `drivers/irqchip/irq-
gic-v3.c:1585`). Since Tegra SoCs use ARM GIC, any IRQ going through the
Tegra wake domain will have this flag set. Thus, calling
`generic_handle_irq()` from the syscore resume context (not hard IRQ)
triggers the `WARN_ON_ONCE`.
**Why it matters**: Beyond the WARN, the function returns `-EPERM`
without actually handling the interrupt. This means **edge-triggered
wake IRQs are silently dropped**, potentially causing missed wakeup
events and device misbehavior after resume.
### 3. THE FIX
The fix:
1. Adds `struct irq_work wake_work` and `u32 *wake_status` fields to
`struct tegra_pmc`
2. Creates a new `tegra186_pmc_wake_handler()` function (the IRQ work
handler) that contains the logic previously in
`tegra186_pmc_process_wake_events()`
3. `tegra186_pmc_wake_syscore_resume()` now saves wake status to
`pmc->wake_status[]` and queues the IRQ work via `irq_work_queue()`
4. The IRQ work handler runs in **hard IRQ context** where
`generic_handle_irq()` is safe
5. Uses `IRQ_WORK_INIT_HARD()` to ensure hard IRQ context even on
`PREEMPT_RT` kernels
6. Adds a check in `tegra186_pmc_wake_syscore_suspend()` for unhandled
wake IRQs
This is the **correct** fix pattern - using `irq_work` to defer IRQ
handling to hard IRQ context is a well-established kernel pattern.
Multiple other drivers have been fixed similarly (see commits
`94ec234a16cf3`, `118c3ba24d04f`, `c6a91405ac5cd`, `f285de79569f9`,
`f460c70125bcb` which all converted to `generic_handle_irq_safe()`).
### 4. DEPENDENCY ANALYSIS
**Critical dependency**: This commit was written against the v6.19 tree
which has the new `struct syscore` API from commit `a97fbc3ee3e2a`
(v6.19-rc1 only). In stable trees (v6.6, v6.12), the functions use the
old `struct syscore_ops` interface:
- `tegra186_pmc_wake_syscore_resume(void)` (no `void *data` parameter)
- `tegra186_pmc_wake_syscore_suspend(void)` (no `void *data` parameter)
- `struct syscore_ops syscore` (not `struct syscore`)
**However**, the core fix logic (adding `irq_work`, deferring
`generic_handle_irq()`) does NOT depend on the syscore API change. The
fix can be backported with minor adaptation:
- The function signatures need to match the old `syscore_ops` callbacks
(no `void *data`)
- `IRQ_WORK_INIT_HARD` and `irq_work_queue` are available in v6.6+
- The `irq_work` and `wake_status` additions to `struct tegra_pmc` are
self-contained
**Affected stable trees**: v6.6.y, v6.12.y (and any other active stable
tree based on v6.2+, since commit `0474cc8489bda` introduced the buggy
code in v6.2).
### 5. SCOPE AND RISK
- **Files changed**: 1 file (`drivers/soc/tegra/pmc.c`)
- **Lines**: ~+60, ~-30 (net ~30 lines added)
- **Subsystem**: Tegra SoC PMC - a self-contained platform driver
- **Risk**: LOW - The change only affects Tegra platforms with wake
events. The `irq_work` mechanism is well-tested kernel infrastructure.
The fix has been reviewed and tested by NVIDIA engineers.
- **Regression risk**: Very low - non-Tegra systems are unaffected. The
new code path is functionally equivalent but executes in the correct
context.
### 6. USER IMPACT
- **Who**: All users of NVIDIA Tegra186+ SoCs (Tegra186, Tegra194,
Tegra234) using suspend/resume
- **Severity**: Every system resume triggers a kernel WARNING. Beyond
the warning itself, edge-triggered wake IRQs may be silently dropped
because `handle_irq_desc()` returns `-EPERM` without processing the
interrupt
- **Real-world**: Tegra platforms are used in NVIDIA Jetson embedded
systems, automotive platforms, and gaming devices (Nintendo Switch)
### 7. CONCLUSION
This is a clear bug fix for a real, reproducible issue that fires on
every resume on affected hardware. The fix uses a well-established
kernel pattern (irq_work) and is self-contained within a single platform
driver file. It has been reviewed and tested by the subsystem
maintainers. The fix will need minor adaptation for stable trees
(different `syscore_ops` API), but the core logic is fully applicable.
The bug exists in all stable trees from v6.2 onward.
**YES**
drivers/soc/tegra/pmc.c | 104 ++++++++++++++++++++++++++++------------
1 file changed, 74 insertions(+), 30 deletions(-)
diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
index f3760a3b3026d..407fa840814c3 100644
--- a/drivers/soc/tegra/pmc.c
+++ b/drivers/soc/tegra/pmc.c
@@ -28,6 +28,7 @@
#include <linux/iopoll.h>
#include <linux/irqdomain.h>
#include <linux/irq.h>
+#include <linux/irq_work.h>
#include <linux/kernel.h>
#include <linux/of_address.h>
#include <linux/of_clk.h>
@@ -468,6 +469,10 @@ struct tegra_pmc {
unsigned long *wake_sw_status_map;
unsigned long *wake_cntrl_level_map;
struct syscore syscore;
+
+ /* Pending wake IRQ processing */
+ struct irq_work wake_work;
+ u32 *wake_status;
};
static struct tegra_pmc *pmc = &(struct tegra_pmc) {
@@ -1905,6 +1910,50 @@ static int tegra_pmc_parse_dt(struct tegra_pmc *pmc, struct device_node *np)
return 0;
}
+/* translate sc7 wake sources back into IRQs to catch edge triggered wakeups */
+static void tegra186_pmc_wake_handler(struct irq_work *work)
+{
+ struct tegra_pmc *pmc = container_of(work, struct tegra_pmc, wake_work);
+ unsigned int i, wake;
+
+ for (i = 0; i < pmc->soc->max_wake_vectors; i++) {
+ unsigned long status = pmc->wake_status[i];
+
+ for_each_set_bit(wake, &status, 32) {
+ irq_hw_number_t hwirq = wake + (i * 32);
+ struct irq_desc *desc;
+ unsigned int irq;
+
+ irq = irq_find_mapping(pmc->domain, hwirq);
+ if (!irq) {
+ dev_warn(pmc->dev,
+ "No IRQ found for WAKE#%lu!\n",
+ hwirq);
+ continue;
+ }
+
+ dev_dbg(pmc->dev,
+ "Resume caused by WAKE#%lu mapped to IRQ#%u\n",
+ hwirq, irq);
+
+ desc = irq_to_desc(irq);
+ if (!desc) {
+ dev_warn(pmc->dev,
+ "No descriptor found for IRQ#%u\n",
+ irq);
+ continue;
+ }
+
+ if (!desc->action || !desc->action->name)
+ continue;
+
+ generic_handle_irq(irq);
+ }
+
+ pmc->wake_status[i] = 0;
+ }
+}
+
static int tegra_pmc_init(struct tegra_pmc *pmc)
{
if (pmc->soc->max_wake_events > 0) {
@@ -1923,6 +1972,18 @@ static int tegra_pmc_init(struct tegra_pmc *pmc)
pmc->wake_cntrl_level_map = bitmap_zalloc(pmc->soc->max_wake_events, GFP_KERNEL);
if (!pmc->wake_cntrl_level_map)
return -ENOMEM;
+
+ pmc->wake_status = kcalloc(pmc->soc->max_wake_vectors, sizeof(u32), GFP_KERNEL);
+ if (!pmc->wake_status)
+ return -ENOMEM;
+
+ /*
+ * Initialize IRQ work for processing wake IRQs. Must use
+ * HARD_IRQ variant to run in hard IRQ context on PREEMPT_RT
+ * because we call generic_handle_irq() which requires hard
+ * IRQ context.
+ */
+ pmc->wake_work = IRQ_WORK_INIT_HARD(tegra186_pmc_wake_handler);
}
if (pmc->soc->init)
@@ -3129,47 +3190,30 @@ static void wke_clear_wake_status(struct tegra_pmc *pmc)
}
}
-/* translate sc7 wake sources back into IRQs to catch edge triggered wakeups */
-static void tegra186_pmc_process_wake_events(struct tegra_pmc *pmc, unsigned int index,
- unsigned long status)
-{
- unsigned int wake;
-
- dev_dbg(pmc->dev, "Wake[%d:%d] status=%#lx\n", (index * 32) + 31, index * 32, status);
-
- for_each_set_bit(wake, &status, 32) {
- irq_hw_number_t hwirq = wake + 32 * index;
- struct irq_desc *desc;
- unsigned int irq;
-
- irq = irq_find_mapping(pmc->domain, hwirq);
-
- desc = irq_to_desc(irq);
- if (!desc || !desc->action || !desc->action->name) {
- dev_dbg(pmc->dev, "Resume caused by WAKE%ld, IRQ %d\n", hwirq, irq);
- continue;
- }
-
- dev_dbg(pmc->dev, "Resume caused by WAKE%ld, %s\n", hwirq, desc->action->name);
- generic_handle_irq(irq);
- }
-}
-
static void tegra186_pmc_wake_syscore_resume(void *data)
{
- u32 status, mask;
unsigned int i;
+ u32 mask;
for (i = 0; i < pmc->soc->max_wake_vectors; i++) {
mask = readl(pmc->wake + WAKE_AOWAKE_TIER2_ROUTING(i));
- status = readl(pmc->wake + WAKE_AOWAKE_STATUS_R(i)) & mask;
-
- tegra186_pmc_process_wake_events(pmc, i, status);
+ pmc->wake_status[i] = readl(pmc->wake + WAKE_AOWAKE_STATUS_R(i)) & mask;
}
+
+ /* Schedule IRQ work to process wake IRQs (if any) */
+ irq_work_queue(&pmc->wake_work);
}
static int tegra186_pmc_wake_syscore_suspend(void *data)
{
+ unsigned int i;
+
+ /* Check if there are unhandled wake IRQs */
+ for (i = 0; i < pmc->soc->max_wake_vectors; i++)
+ if (pmc->wake_status[i])
+ dev_warn(pmc->dev,
+ "Unhandled wake IRQs pending vector[%u]: 0x%x\n",
+ i, pmc->wake_status[i]);
wke_read_sw_wake_status(pmc);
/* flip the wakeup trigger for dual-edge triggered pads
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH AUTOSEL 6.19-6.18] x86/sev: Use kfree_sensitive() when freeing a SNP message descriptor
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] soc: imx8m: Fix error handling for clk_prepare_enable() Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19] soc/tegra: pmc: Fix unsafe generic_handle_irq() call Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.12] firmware: arm_ffa: Unmap Rx/Tx buffers on init failure Sasha Levin
` (5 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Borislav Petkov (AMD), kernel test robot, Julia Lawall,
Tom Lendacky, Sasha Levin, nathan, mingo, ardb, Neeraj.Upadhyay,
aik, yelangyan, ashish.kalra, llvm
From: "Borislav Petkov (AMD)" <bp@alien8.de>
[ Upstream commit af05e558988ed004a20fc4de7d0f80cfbba663f0 ]
Use the proper helper instead of an open-coded variant.
Closes: https://lore.kernel.org/r/202512202235.WHPQkLZu-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/20260112114147.GBaWTd-8HSy_Xp4S3X@fat_crate.local
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a thorough understanding. Let me compile my full analysis.
---
## Detailed Analysis
### 1. Commit Message Analysis
The commit message is intentionally understated: "Use the proper helper
instead of an open-coded variant." However, this is a **security-
relevant fix** flagged by automated static analysis tools. It was:
- **Reported-by: kernel test robot** (automated scanner from Intel)
- **Reported-by: Julia Lawall** (well-known for Coccinelle semantic
patches that detect exactly these patterns)
- **Reviewed-by: Tom Lendacky** (AMD SEV subsystem maintainer)
- **Authored-by: Borislav Petkov** (x86 co-maintainer, AMD SEV expert)
### 2. Code Change Analysis
The change is in `snp_msg_free()` in `arch/x86/coco/sev/core.c`:
**Before:**
```c
memset(mdesc, 0, sizeof(*mdesc));
kfree(mdesc);
```
**After:**
```c
kfree_sensitive(mdesc);
```
This is a **critical difference** despite appearing cosmetic:
1. **`memset()` can be optimized away**: When `memset()` is followed
immediately by `kfree()`, the compiler can perform dead store
elimination (DSE). Since the memory is about to be freed and no code
reads the zeroed values, the compiler may determine the `memset()` is
a dead store and remove it entirely. This is increasingly common with
modern GCC and Clang optimization levels, and especially with LTO.
2. **`kfree_sensitive()` uses `memzero_explicit()`**: This function
calls `memset()` followed by `barrier_data()`, which is a compiler
barrier that prevents the zero operation from being optimized away.
This is the canonical kernel API specifically designed for clearing
sensitive data before freeing.
3. **`kfree_sensitive()` clears the entire slab allocation**: Via
`ksize()`, it zeros the whole allocated slab object (which may be
larger than `sizeof(*mdesc)`), while the original code only zeroed
`sizeof(*mdesc)`. This provides more thorough clearing of any padding
or slab slack space.
### 3. Sensitivity of the Data
The `snp_msg_desc` structure contains:
- **`secret_request` and `secret_response`** (embedded `struct
snp_guest_msg`): These are **double-buffered encrypted/decrypted guest
messages** used for communication between the SEV-SNP guest VM and the
AMD Secure Processor (ASP). The comment in the struct says "Avoid
information leakage by double-buffering shared messages in fields that
are in regular encrypted memory." Each `snp_guest_msg` contains auth
tags, sequence numbers, and a full-page payload.
- **`vmpck`**: Pointer to the VM Platform Communication Key.
- **`ctx`**: AES-GCM crypto context (freed separately via `kfree()`).
- **`secrets`**: Pointer to the SNP secrets page.
- **`os_area_msg_seqno`**: Message sequence number.
This is AMD SEV-SNP code. The entire purpose of SEV-SNP is to protect
guest VM memory from the hypervisor. Leaving cryptographic state and
message contents in freed slab memory directly undermines the security
guarantees of the technology.
### 4. History and Stable Tree Applicability
- `snp_msg_free()` was introduced in commit c5529418d0507 ("x86/sev:
Carve out and export SNP guest messaging init routines"), which landed
in **v6.14-rc1**.
- The buggy `memset() + kfree()` pattern was introduced by this same
refactoring commit.
- The fix applies to all stable trees from **v6.14 onwards**: 6.14.y,
6.15.y, 6.16.y, 6.17.y, 6.18.y.
- I confirmed the vulnerable code exists in v6.14 and that
`kfree_sensitive()` is available there.
- It does NOT apply to 6.12.y, 6.6.y, or older LTS trees (they use a
different code structure in `drivers/virt/coco/sev-guest/`).
### 5. Risk Assessment
- **Size**: 3 lines changed to 2 lines (net -1 line). One of the
smallest possible changes.
- **Risk**: Essentially zero. `kfree_sensitive()` is the standard kernel
API for this exact use case, has been available since v5.9, and is
used in 300+ call sites across the kernel.
- **No dependencies**: The fix is entirely self-contained.
- **No behavioral change** from a functional perspective — same memory
is freed — only ensures the zeroing actually happens at runtime.
### 6. Classification
This is a **security fix** in **security-critical AMD SEV-SNP code**.
The pattern of `memset() + kfree()` for sensitive data is a well-known
anti-pattern that has been the subject of multiple CVEs in other
software projects. While the compiler may or may not optimize it away in
current builds, it is a latent security vulnerability that could
manifest with different compiler versions, optimization flags, or LTO
configurations.
### 7. Verdict
This commit:
- Fixes a real security issue (potential leakage of cryptographic data
in freed memory)
- Is in security-critical code (AMD SEV-SNP guest messaging)
- Is extremely small and surgical (2 lines -> 1 line)
- Uses the canonical kernel API (`kfree_sensitive()`)
- Has zero risk of regression
- Was reviewed by the AMD SEV maintainer
- Was flagged by two independent automated/semi-automated analysis tools
- Applies cleanly to stable trees v6.14+
- Has no dependencies on other commits
**YES**
arch/x86/coco/sev/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 9ae3b11754e65..c8ddb9febe3d9 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2008,8 +2008,7 @@ void snp_msg_free(struct snp_msg_desc *mdesc)
free_shared_pages(mdesc->request, sizeof(struct snp_guest_msg));
iounmap((__force void __iomem *)mdesc->secrets);
- memset(mdesc, 0, sizeof(*mdesc));
- kfree(mdesc);
+ kfree_sensitive(mdesc);
}
EXPORT_SYMBOL_GPL(snp_msg_free);
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH AUTOSEL 6.19-6.12] firmware: arm_ffa: Unmap Rx/Tx buffers on init failure
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
` (2 preceding siblings ...)
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] x86/sev: Use kfree_sensitive() when freeing a SNP message descriptor Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add two Intel Amston Lake SoCs support Sasha Levin
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Haoxiang Li, Sudeep Holla, Sasha Levin, sudeep.holla,
linux-arm-kernel
From: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
[ Upstream commit 9fda364cb78c8b9e1abe4029f877300c94655742 ]
ffa_init() maps the Rx/Tx buffers via ffa_rxtx_map() but on the
partition setup failure path it never unmaps them.
Add the missing ffa_rxtx_unmap() call in the error path so that
the Rx/Tx buffers are properly released before freeing the backing
pages.
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Message-Id: <20251210031656.56194-1-lihaoxiang@isrc.iscas.ac.cn>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Analysis
### 1. Commit Message Analysis
The commit message clearly describes a **resource leak on an error
path**: `ffa_init()` maps Rx/Tx buffers via `ffa_rxtx_map()`, but when
`ffa_setup_partitions()` fails, the error path frees the backing pages
without first unmapping the buffers from the firmware. The fix adds the
missing `ffa_rxtx_unmap()` call. The commit was accepted by the FF-A
subsystem maintainer Sudeep Holla.
### 2. Code Change Analysis
The fix is a **single line addition**:
```2071:2071:drivers/firmware/arm_ffa/driver.c
ffa_rxtx_unmap(drv_info->vm_id);
```
inserted between `ffa_notifications_cleanup()` and the `free_pages:`
label.
**Understanding the bug mechanism:**
The `ffa_init()` function follows this flow:
1. Allocates Rx/Tx buffer pages (lines 2038-2048)
2. Maps them to firmware via `ffa_rxtx_map()` (line 2050) — this is an
FF-A SMC/HVC call to the secure firmware (e.g., TrustZone / Hafnium
hypervisor)
3. Sets up partitions via `ffa_setup_partitions()` (line 2065)
4. If partition setup fails, falls through to `free_pages:` which frees
the backing pages
The problem: Step 4 **never calls `ffa_rxtx_unmap()`**, so the firmware
retains stale references to the freed physical pages. The `ffa_exit()`
function (line 2081-2089) correctly calls `ffa_rxtx_unmap()` before
freeing pages, confirming the error path was simply missing this
cleanup.
**Consequences of the bug:**
- The firmware (running in secure world or EL2 hypervisor) still holds
mappings to physical addresses of freed kernel pages
- Those freed pages can be reallocated for other kernel purposes
- The firmware could write to these stale mappings, causing **memory
corruption** of unrelated kernel data
- On systems with a stage-2 hypervisor, the mapping entries are also
leaked
### 3. Bug Origin
The bug was introduced by commit
`0c565d16b80074e57e3e56240d13fc6cd6ed0334` ("firmware: arm_ffa: Handle
partitions setup failures"), which first made `ffa_setup_partitions()`
return an error and added an error path in `ffa_init()`. Before this
commit, `ffa_setup_partitions()` was void and never triggered error
cleanup. This commit landed in **v6.8**. The error path was later
restructured by `be61da9385766` (v6.15), but the missing unmap persisted
through both versions.
### 4. Classification
This is a **resource leak / firmware-side stale mapping fix** on an
error path. It is NOT a new feature, API change, or refactoring — it is
purely adding missing cleanup.
### 5. Scope and Risk Assessment
- **Lines changed**: 1 line added
- **Files changed**: 1 file (`drivers/firmware/arm_ffa/driver.c`)
- **Risk**: Extremely low. The `ffa_rxtx_unmap()` function is already
used in `ffa_exit()` for normal cleanup. Adding it to the error path
is the obvious correct fix. The function itself just issues an FF-A
call to the firmware.
- **Regression potential**: Near zero. This only affects the failure
path of `ffa_setup_partitions()`, and calling unmap before freeing is
strictly correct.
### 6. User Impact
This affects ARM platforms using FF-A (Firmware Framework for Arm) —
notably ARM TrustZone/OP-TEE systems, Arm CCA systems, and virtualized
ARM environments with Hafnium. While `ffa_setup_partitions()` failure is
not the most common path, when it does occur, the consequences (firmware
holding stale page references, potential memory corruption) are severe.
### 7. Stable Kernel Criteria Check
| Criterion | Met? |
|---|---|
| Obviously correct and tested | Yes — mirrors `ffa_exit()` cleanup
exactly |
| Fixes a real bug | Yes — resource leak + potential memory corruption |
| Important issue | Yes — firmware-side stale mapping to freed kernel
pages |
| Small and contained | Yes — 1 line in 1 file |
| No new features | Yes — purely error-path cleanup |
| Applies cleanly | Yes — for v6.8+ trees (may need minor adjustment for
v6.8-v6.14 which used `cleanup_notifs:` label) |
### 8. Dependency Check
The fix depends on `0c565d16b80074e57e3e56240d13fc6cd6ed0334` (v6.8) for
the error path to exist, and on `be61da9385766` (v6.15) for the exact
code structure in the diff. For stable trees between v6.8 and v6.14, a
trivially adjusted backport would be needed (placing the unmap call
inside the `cleanup_notifs:` section instead of inline). The fix is
self-contained and has no other dependencies.
### Conclusion
This is a textbook error-path resource leak fix: a single line adding
missing firmware unmap before freeing the backing pages. The bug leaves
the firmware holding stale references to freed kernel memory, which
could lead to memory corruption. The fix is minimal, obviously correct
(mirrors `ffa_exit()`), and accepted by the subsystem maintainer. It
meets all stable kernel criteria.
**YES**
drivers/firmware/arm_ffa/driver.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index c72ee47565856..7209a630f6d16 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -2068,6 +2068,7 @@ static int __init ffa_init(void)
pr_err("failed to setup partitions\n");
ffa_notifications_cleanup();
+ ffa_rxtx_unmap(drv_info->vm_id);
free_pages:
if (drv_info->tx_buffer)
free_pages_exact(drv_info->tx_buffer, rxtx_bufsz);
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add two Intel Amston Lake SoCs support
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
` (3 preceding siblings ...)
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.12] firmware: arm_ffa: Unmap Rx/Tx buffers on init failure Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add more Intel Panther Lake-H " Sasha Levin
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Qiuxu Zhuo, Tony Luck, Jianfeng Gao, Sasha Levin, bp, linux-edac
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
[ Upstream commit 41ca2155d62b0b0d217f59e1bce18362d0c2446f ]
Intel Amston Lake SoCs with IBECC (In-Band ECC) capability share the same
IBECC registers as Alder Lake-N SoCs. Add two new compute die IDs for
Amston Lake SoC products to enable EDAC support.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Jianfeng Gao <jianfeng.gao@intel.com>
Link: https://patch.msgid.link/20251124065457.3630949-2-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have all the information needed for a complete analysis. Here is
my detailed assessment:
---
## Detailed Analysis: EDAC/igen6: Add two Intel Amston Lake SoCs support
### 1. Commit Message Analysis
The commit adds two new PCI compute die IDs (`DID_ASL_SKU2 = 0x4646`,
`DID_ASL_SKU3 = 0x4652`) for Intel Amston Lake SoC products to the
`igen6_edac` driver. The commit message explicitly states these SoCs
share the same IBECC registers as Alder Lake-N, meaning they reuse the
existing `adl_n_cfg` configuration structure.
- **Author**: Qiuxu Zhuo (Intel) - the primary maintainer and original
author of this driver
- **Signed-off-by**: Tony Luck (Intel RAS maintainer) - trusted
subsystem maintainer
- **Tested-by**: Jianfeng Gao (Intel) - hardware tested on actual Amston
Lake devices
### 2. Code Change Analysis
The diff is minimal - exactly **4 lines added** across a single file:
1. **Two `#define` additions** (lines after existing `DID_ASL_SKU1`):
- `#define DID_ASL_SKU2 0x4646`
- `#define DID_ASL_SKU3 0x4652`
2. **Two PCI device table entries** (in `igen6_pci_tbl[]`):
- `{ PCI_VDEVICE(INTEL, DID_ASL_SKU2), (kernel_ulong_t)&adl_n_cfg }`
- `{ PCI_VDEVICE(INTEL, DID_ASL_SKU3), (kernel_ulong_t)&adl_n_cfg }`
Both new entries use the already-existing `adl_n_cfg` configuration,
which has been in the kernel since v6.8 (commit `c4a5398991fd2`). No new
code paths, no new functions, no behavioral changes. This is a pure PCI
device ID addition.
### 3. Classification: New Device IDs (Explicitly Allowed Exception)
This is a textbook **new device ID addition** to an existing driver. The
stable kernel rules documentation explicitly allows this:
> "Adding PCI IDs, USB IDs, ACPI IDs, etc. to existing drivers - These
are trivial one-line additions that enable hardware support"
The pattern is identical to many prior device ID additions to this
driver:
- Arizona Beach (`b804d7c59aea0`) - single PCI ID using `adl_n_cfg`
- Alder Lake-N extra SKU (`65c441ec58224`) - single PCI ID using
`adl_n_cfg`
- Arrow Lake-U/H (`88150cd9501b9`) - PCI IDs using `mtl_p_cfg`
### 4. Scope and Risk Assessment
- **Lines changed**: 4 added, 0 deleted
- **Files touched**: 1 (`drivers/edac/igen6_edac.c`)
- **Complexity**: Near-zero - just PCI ID defines and table entries
- **Risk of regression**: Effectively zero. The new PCI IDs will only
match hardware that has those specific device IDs. If no such hardware
is present, the entries are completely inert. If the hardware IS
present, it uses the well-tested `adl_n_cfg` path that Alder Lake-N,
Arizona Beach, and Amston Lake SKU1 all use.
### 5. User Impact
Intel Amston Lake SoCs are described as "tailored for edge computing
needs" in the original Amston Lake commit. EDAC (Error Detection and
Correction) is critical for:
- **Edge computing and embedded systems** - where these SoCs are
deployed
- **Reliability-critical environments** - where memory error detection
prevents silent data corruption
- **Enterprise/industrial applications** - Amston Lake targets this
market segment
Without these device IDs, users with Amston Lake SKU2/SKU3 hardware get
**no IBECC EDAC support** - meaning memory errors go undetected, which
can lead to silent data corruption.
### 6. Stability Indicators
- Written by the driver's original author (Qiuxu Zhuo at Intel)
- Signed-off by the RAS subsystem maintainer (Tony Luck)
- Hardware-tested by Intel engineer (Jianfeng Gao)
- The same `adl_n_cfg` config is already used by 12+ Alder Lake-N SKUs,
Arizona Beach, and ASL_SKU1 - extremely well-tested code path
### 7. Dependency Check
This commit depends on commit `099d2db3625b3` ("EDAC/igen6: Add Intel
Amston Lake SoCs support") which introduced `DID_ASL_SKU1` and the
Amston Lake comment section. That commit landed in v6.16-rc1. Both
commits would need to be backported together to stable trees that have
`adl_n_cfg` (v6.8+). This is straightforward - both are trivial device
ID additions.
The `adl_n_cfg` configuration exists since v6.8 (commit
`c4a5398991fd2`), so active stable trees 6.8.y, 6.12.y would have the
necessary infrastructure. For 6.6.y, the `adl_n_cfg` is also present
since it was added in v6.8-rc1 which branched after 6.6 - actually let
me verify:
The `adl_n_cfg` was introduced by `c4a5398991fd2` which is tagged at
`v6.8-rc1~209^2~8`. This means it's in v6.8+. Active stable trees like
6.12.y and 6.6.y need to be checked individually, but 6.6.y would NOT
have `adl_n_cfg` since it predates v6.8. Stable trees 6.8+ would have
it.
### Summary
| Criterion | Assessment |
|---|---|
| Fixes real issue? | Yes - enables EDAC on Amston Lake SKU2/SKU3
hardware |
| Obviously correct? | Yes - trivial PCI ID addition |
| Small and contained? | Yes - 4 lines, 1 file |
| New features? | No - enables existing functionality on new hardware |
| Risk? | Near-zero |
| Tested? | Yes (Tested-by tag from Intel) |
| Category | New Device IDs - explicitly allowed in stable |
This commit is a clear-cut example of the "new device ID" exception to
stable kernel rules. It enables EDAC support for real Intel hardware
(Amston Lake edge computing SoCs) with zero risk of regression. The
change is trivial, authored and tested by Intel engineers, and follows
the exact same pattern as numerous prior device ID additions to this
driver.
**YES**
drivers/edac/igen6_edac.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 839b6dd3629e9..f2c9270c1893c 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -246,6 +246,8 @@ static struct work_struct ecclog_work;
/* Compute did IDs for Amston Lake with IBECC */
#define DID_ASL_SKU1 0x464a
+#define DID_ASL_SKU2 0x4646
+#define DID_ASL_SKU3 0x4652
/* Compute die IDs for Raptor Lake-P with IBECC */
#define DID_RPL_P_SKU1 0xa706
@@ -628,6 +630,8 @@ static struct pci_device_id igen6_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, DID_ADL_N_SKU12), (kernel_ulong_t)&adl_n_cfg },
{ PCI_VDEVICE(INTEL, DID_AZB_SKU1), (kernel_ulong_t)&adl_n_cfg },
{ PCI_VDEVICE(INTEL, DID_ASL_SKU1), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ASL_SKU2), (kernel_ulong_t)&adl_n_cfg },
+ { PCI_VDEVICE(INTEL, DID_ASL_SKU3), (kernel_ulong_t)&adl_n_cfg },
{ PCI_VDEVICE(INTEL, DID_RPL_P_SKU1), (kernel_ulong_t)&rpl_p_cfg },
{ PCI_VDEVICE(INTEL, DID_RPL_P_SKU2), (kernel_ulong_t)&rpl_p_cfg },
{ PCI_VDEVICE(INTEL, DID_RPL_P_SKU3), (kernel_ulong_t)&rpl_p_cfg },
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add more Intel Panther Lake-H SoCs support
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
` (4 preceding siblings ...)
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add two Intel Amston Lake SoCs support Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-5.10] arm64: tegra: smaug: Add usb-role-switch support Sasha Levin
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Lili Li, Tony Luck, Qiuxu Zhuo, Sasha Levin, bp, linux-edac
From: Lili Li <lili.li@intel.com>
[ Upstream commit 4c36e6106997b6ad8f4a279b4bdbca3ed6f53c6c ]
Add more Intel Panther Lake-H SoC compute die IDs for EDAC support.
Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://patch.msgid.link/20251124131537.3633983-1-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a comprehensive picture. Let me summarize my analysis.
## Detailed Analysis
### 1. Commit Message Analysis
The commit subject is "EDAC/igen6: Add more Intel Panther Lake-H SoCs
support" and the body says "Add more Intel Panther Lake-H SoC compute
die IDs for EDAC support." This is purely a **new device ID addition**
to an existing driver. The commit is authored by Intel employees (Lili
Li, Qiuxu Zhuo) who maintain this specific EDAC driver, and reviewed by
the same team.
### 2. Code Change Analysis
The diff is straightforward and mechanical:
**Part 1 - Macro definitions** (lines ~280-290): Adds 10 new `#define`
constants:
- `DID_PTL_H_SKU4` through `DID_PTL_H_SKU13` with PCI device IDs
`0xb003`, `0xb004`, `0xb005`, `0xb008`, `0xb011`, `0xb014`, `0xb015`,
`0xb028`, `0xb029`, `0xb02a`
**Part 2 - PCI device table** (lines ~675-695): Adds 10 new
`PCI_VDEVICE` entries, all pointing to the existing `mtl_p_cfg`
configuration structure — the same configuration used by the 3 initial
PTL-H SKUs.
This is a textbook **PCI device ID addition**. No new logic, no new
config structures, no new functions. Every new entry uses the identical
`mtl_p_cfg` configuration as the existing PTL-H SKUs.
### 3. Classification
This falls squarely into the **"New Device IDs" exception** for stable
trees:
- The driver (`igen6_edac`) already exists in stable trees
- The PTL-H configuration (`mtl_p_cfg`) and initial PTL-H support
already exist (since v6.13)
- Only new PCI IDs are being added — enabling additional hardware
variants of the same SoC family
### 4. Scope and Risk Assessment
- **Lines changed**: ~20 (10 `#define`s + 10 PCI table entries)
- **Files touched**: 1 (`drivers/edac/igen6_edac.c`)
- **Risk**: Extremely low. The PCI device table is a static array of ID-
to-config mappings. The new entries only match hardware with those
specific PCI IDs. If the hardware isn't present, the entries are
completely inert. Zero risk to existing functionality.
- **Complexity**: Trivial — pure data addition with no logic changes.
### 5. Dependency Check
This commit has one critical dependency:
- **Initial PTL-H support commit** (`0be9f1af39022`): "EDAC/igen6: Add
Intel Panther Lake-H SoCs support" — landed in **v6.13**. This commit
defined `DID_PTL_H_SKU1`-`SKU3` and the usage of `mtl_p_cfg` for
PTL-H.
Therefore:
- **v6.1 LTS, v6.6 LTS, v6.12 stable**: The prerequisite commit is NOT
present. Backporting here would require first backporting the initial
PTL-H support.
- **v6.13+ stable** (v6.13.y, v6.14.y, etc.): The prerequisite IS
present, so this patch can apply cleanly.
### 6. User Impact
EDAC (Error Detection And Correction) is critical infrastructure for
detecting and reporting memory errors. Without these device IDs, users
with Intel Panther Lake-H SoCs with these specific compute die IDs will
get **no EDAC/ECC error reporting** — meaning memory errors could go
undetected, leading to potential silent data corruption on affected
hardware. This is exactly the kind of hardware enablement that stable
users (particularly enterprise/embedded users) need.
### 7. Stability Indicators
- Reviewed by Intel's EDAC team member (Qiuxu Zhuo)
- Signed off by the RAS maintainer (Tony Luck)
- Same pattern as dozens of prior device ID additions to this driver,
all of which have been successfully backported to stable trees (Arrow
Lake, Meteor Lake-P/PS, Raptor Lake-P, Alder Lake-N, etc.)
### 8. Verdict
This is a classic, textbook device ID addition:
- Purely mechanical addition of PCI device IDs
- Uses existing infrastructure (no new code paths)
- Enables EDAC support on additional Panther Lake-H hardware variants
- Extremely low risk — inert on systems without matching hardware
- The only concern is dependency on initial PTL-H support (v6.13+), but
this is manageable
The commit matches the stable kernel rules for device ID additions
perfectly. It's small, contained, obviously correct, and enables real
hardware support for real users.
**YES**
drivers/edac/igen6_edac.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 553c31a2d9226..839b6dd3629e9 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -274,6 +274,16 @@ static struct work_struct ecclog_work;
#define DID_PTL_H_SKU1 0xb000
#define DID_PTL_H_SKU2 0xb001
#define DID_PTL_H_SKU3 0xb002
+#define DID_PTL_H_SKU4 0xb003
+#define DID_PTL_H_SKU5 0xb004
+#define DID_PTL_H_SKU6 0xb005
+#define DID_PTL_H_SKU7 0xb008
+#define DID_PTL_H_SKU8 0xb011
+#define DID_PTL_H_SKU9 0xb014
+#define DID_PTL_H_SKU10 0xb015
+#define DID_PTL_H_SKU11 0xb028
+#define DID_PTL_H_SKU12 0xb029
+#define DID_PTL_H_SKU13 0xb02a
/* Compute die IDs for Wildcat Lake with IBECC */
#define DID_WCL_SKU1 0xfd00
@@ -636,6 +646,16 @@ static struct pci_device_id igen6_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, DID_PTL_H_SKU1), (kernel_ulong_t)&mtl_p_cfg },
{ PCI_VDEVICE(INTEL, DID_PTL_H_SKU2), (kernel_ulong_t)&mtl_p_cfg },
{ PCI_VDEVICE(INTEL, DID_PTL_H_SKU3), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU4), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU5), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU6), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU7), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU8), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU9), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU10), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU11), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU12), (kernel_ulong_t)&mtl_p_cfg },
+ { PCI_VDEVICE(INTEL, DID_PTL_H_SKU13), (kernel_ulong_t)&mtl_p_cfg },
{ PCI_VDEVICE(INTEL, DID_WCL_SKU1), (kernel_ulong_t)&wcl_cfg },
{ },
};
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH AUTOSEL 6.19-5.10] arm64: tegra: smaug: Add usb-role-switch support
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
` (5 preceding siblings ...)
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add more Intel Panther Lake-H " Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.12] Revert "arm64: zynqmp: Add an OP-TEE node to the device tree" Sasha Levin
2026-02-16 10:21 ` [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Geert Uytterhoeven
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Diogo Ivo, Thierry Reding, Sasha Levin, thierry.reding, jonathanh,
neil.armstrong, yelangyan, linux-tegra
From: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
[ Upstream commit dfa93788dd8b2f9c59adf45ecf592082b1847b7b ]
The USB2 port on Smaug is configured for OTG operation but lacked the
required 'usb-role-switch' property, leading to a failed probe and a
non-functioning USB port. Add the property along with setting the default
role to host.
Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have all the information needed for a thorough analysis.
---
## Complete Analysis
### 1. COMMIT MESSAGE ANALYSIS
**Subject:** `arm64: tegra: smaug: Add usb-role-switch support`
**Body:** The commit message clearly states the problem: the USB2 port
on Smaug (Google Pixel C) is configured for OTG operation (`mode =
"otg"`) but lacked the required `usb-role-switch` property, **leading to
a failed probe and a non-functioning USB port**. This is a clear bug
fix, not a feature addition.
**Author/Signoffs:** Authored by Diogo Ivo, signed off by Thierry Reding
(NVIDIA's Tegra device tree maintainer).
### 2. CODE CHANGE ANALYSIS
The patch adds exactly **two lines** to a device tree file:
```1811:1812:arch/arm64/boot/dts/nvidia/tegra210-smaug.dts
usb-role-switch;
role-switch-default-mode = "host";
```
These are added to the `usb2-0` port node within the XUSB pad controller
(`padctl@7009f000`), which already has `mode = "otg"`.
### 3. ROOT CAUSE OF THE BUG
The bug was introduced by commit `f67213cee2b35` ("phy: tegra: xusb: Add
usb-role-switch support"), which was merged in **v5.7-rc1**. This commit
added the following mandatory check to
`tegra_xusb_usb2_port_parse_dt()`:
```765:777:drivers/phy/tegra/xusb.c
/* usb-role-switch property is mandatory for OTG/Peripheral
modes */
if (usb2->mode == USB_DR_MODE_PERIPHERAL ||
usb2->mode == USB_DR_MODE_OTG) {
if (of_property_read_bool(np, "usb-role-switch")) {
err = tegra_xusb_setup_usb_role_switch(port);
if (err < 0)
return err;
tegra_xusb_parse_usb_role_default_mode(port);
} else {
dev_err(&port->dev, "usb-role-switch not found
for %s mode",
modes[usb2->mode]);
return -EINVAL;
}
}
```
If the DT node has `mode = "otg"` but **does NOT** have the `usb-role-
switch` property, the driver returns `-EINVAL`, causing the entire XUSB
pad controller probe to **fail**. This means the USB port doesn't work
at all.
When commit `f67213cee2b35` was merged, the companion DTS updates were
done for other Tegra boards (Jetson TX1 in `dbf91ff001e54`, Jetson Nano
in `88d1049e16eaa`, Jetson TX2 in `6895c83fda8c9`), but **Smaug was
missed**. The smaug DTS had `mode = "otg"` since v4.9 (commit
`4d3457826abb1`) but was never updated with the required `usb-role-
switch` property.
### 4. IMPACT AND AFFECTED VERSIONS
- **Broken since:** v5.7 (when `usb-role-switch` became mandatory for
OTG mode)
- **Affected stable trees:** All active LTS kernels: **5.15.y, 6.1.y,
6.6.y, 6.12.y**
- **Device:** Google Pixel C (Tegra210 Smaug) — a widely sold consumer
tablet
- **Symptom:** Complete USB port probe failure. The XUSB pad controller
fails to initialize, making the USB-C port non-functional. This is not
a subtle degradation; it's a **total hardware functionality loss**.
### 5. CLASSIFICATION
This falls squarely into the **Device Tree (DT) update** exception
category:
- It's a DT fix for existing hardware with an existing driver
- It corrects an incorrect/incomplete hardware description
- It only affects a specific ARM64 platform (Smaug/Pixel C)
- Zero risk of regression to other platforms
### 6. SCOPE AND RISK ASSESSMENT
- **Size:** 2 lines added to a single DTS file
- **Risk:** Extremely low. Adding a `usb-role-switch` boolean property
and a `role-switch-default-mode` string property to a DT node is
trivially safe:
- `usb-role-switch` enables the existing role-switch code path that is
already used by every other Tegra OTG board
- `role-switch-default-mode = "host"` is simply ignored on kernels
before v6.3 (unused DT properties are silently ignored)
- **No functional dependencies:** While this commit appears in a series
with `c256740c4b1ff` (tegra-udc node) and `8acdb94dcfd37`
(DisplayPort), this patch is fully independent — it only adds
properties to the pad controller's port node, which is separate from
the UDC node.
### 7. BACKPORT FEASIBILITY
The patch modifies only the smaug DTS file with a trivial 2-line
addition. The context around it (`usb2-0` port node with `mode = "otg"`)
has been stable since v4.9 and exists identically in all stable trees.
This will apply cleanly.
### 8. CONCLUSION
This is a textbook DT bug fix: a missing required property (`usb-role-
switch`) causes a complete probe failure for the USB port on the Google
Pixel C tablet. The fix is a trivial 2-line DTS addition that:
1. Fixes a real bug (non-functioning USB port due to probe failure with
`-EINVAL`)
2. Is obviously correct (follows the exact same pattern as every other
Tegra OTG board)
3. Is extremely small and contained (2 lines in one DTS file)
4. Has zero regression risk (only affects one specific device)
5. Has been broken since v5.7, affecting all current stable trees
6. Falls into the explicit "DT updates for existing hardware" exception
category
The only mild concern is that this commit hasn't merged to mainline yet
(it's in a `next-*` branch). Stable backports typically track mainline
commits. However, from a purely technical standpoint, this is an ideal
stable fix.
**YES**
arch/arm64/boot/dts/nvidia/tegra210-smaug.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/nvidia/tegra210-smaug.dts b/arch/arm64/boot/dts/nvidia/tegra210-smaug.dts
index 5aa6afd56cbc6..dfbd1c72388c1 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210-smaug.dts
+++ b/arch/arm64/boot/dts/nvidia/tegra210-smaug.dts
@@ -1809,6 +1809,8 @@ usb2-0 {
status = "okay";
vbus-supply = <&usbc_vbus>;
mode = "otg";
+ usb-role-switch;
+ role-switch-default-mode = "host";
};
usb3-0 {
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH AUTOSEL 6.19-6.12] Revert "arm64: zynqmp: Add an OP-TEE node to the device tree"
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
` (6 preceding siblings ...)
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-5.10] arm64: tegra: smaug: Add usb-role-switch support Sasha Levin
@ 2026-02-14 0:58 ` Sasha Levin
2026-02-16 10:21 ` [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Geert Uytterhoeven
8 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-14 0:58 UTC (permalink / raw)
To: patches, stable
Cc: Tomas Melin, Michal Simek, Sasha Levin, neil.armstrong,
neal.frager, conor.dooley, yelangyan, linux-arm-kernel
From: Tomas Melin <tomas.melin@vaisala.com>
[ Upstream commit c197179990124f991fca220d97fac56779a02c6d ]
This reverts commit 06d22ed6b6635b17551f386b50bb5aaff9b75fbe.
OP-TEE logic in U-Boot automatically injects a reserved-memory
node along with optee firmware node to kernel device tree.
The injection logic is dependent on that there is no manually
defined optee node. Having the node in zynqmp.dtsi effectively
breaks OP-TEE's insertion of the reserved-memory node, causing
memory access violations during runtime.
Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
Signed-off-by: Michal Simek <michal.simek@amd.com>
Link: https://lore.kernel.org/r/20251125-revert-zynqmp-optee-v1-1-d2ce4c0fcaf6@vaisala.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a comprehensive understanding. Let me compile the full
analysis.
---
## Detailed Analysis
### 1. COMMIT MESSAGE ANALYSIS
This is a **revert** commit. It reverts
`06d22ed6b6635b17551f386b50bb5aaff9b75fbe` ("arm64: zynqmp: Add an OP-
TEE node to the device tree"), which was merged for v6.9-rc1 (January
2024).
The revert message explains a concrete, real-world problem:
- U-Boot's OP-TEE logic automatically injects a `reserved-memory` node
and `optee` firmware node into the kernel device tree at boot time
- This U-Boot injection logic checks for the **absence** of a manually-
defined optee node — if one already exists, it skips injection
entirely
- The original commit added a static optee node to `zynqmp.dtsi`, which
prevents U-Boot from injecting the `reserved-memory` node
- Without the reserved-memory node, **memory access violations occur
during runtime**
This is not speculative — the commit describes actual runtime memory
access violations.
### 2. CODE CHANGE ANALYSIS
The change is extremely small and surgical — it removes exactly 5 lines
(4 lines of DT node content + 1 blank line):
```195:198:arch/arm64/boot/dts/xilinx/zynqmp.dtsi
optee: optee {
compatible = "linaro,optee-tz";
method = "smc";
};
```
This removes a statically-defined OP-TEE node from the ZynqMP base
device tree include file (`zynqmp.dtsi`). The `optee` label is not
referenced by any other DTS file in the xilinx directory tree, so there
are no broken cross-references.
### 3. BUG MECHANISM
The bug mechanism is an interaction between the Linux kernel device tree
and U-Boot bootloader:
1. **U-Boot's OP-TEE initialization**: U-Boot contains logic that
detects if OP-TEE is running and, if so, dynamically modifies the FDT
(Flattened Device Tree) before passing it to Linux. It adds:
- A `reserved-memory` node describing memory regions reserved for OP-
TEE
- The `optee` firmware node pointing to the TEE
2. **Conditional injection**: U-Boot's injection is conditioned on there
being **no existing optee node** in the DT. If one is found, U-Boot
assumes the DT is already properly configured and skips injection.
3. **The problem**: The static optee node in `zynqmp.dtsi` has
`compatible = "linaro,optee-tz"` and `method = "smc"`, but it does
NOT include the critical `reserved-memory` region. When U-Boot sees
this node, it skips its injection. The kernel then boots with:
- An optee node (so the OP-TEE driver probes and starts communicating
with the secure world)
- **No reserved-memory** for OP-TEE (so the kernel may allocate and
use memory that OP-TEE also uses)
4. **Result**: Memory access violations at runtime. The kernel and OP-
TEE stomp on each other's memory. This is a data corruption and crash
bug.
### 4. CLASSIFICATION
This is a **device tree fix** that addresses a **runtime memory
corruption/access violation** bug. It falls squarely in the "DT updates
for existing hardware" exception category. The severity is high:
- Memory access violations = potential crashes, data corruption,
security boundary violations (TEE compromise)
- Affects all ZynqMP boards running OP-TEE with U-Boot
### 5. ORIGINAL COMMIT ANALYSIS
The original commit (06d22ed6b6635) claimed "having the DT node present
doesn't cause any side effects." This turned out to be **wrong** — it
has a significant side effect with U-Boot's standard OP-TEE injection
flow. The original commit was a feature addition ("add it in case
someone tries to load OP-TEE") that turned out to cause a real bug.
### 6. SCOPE AND RISK ASSESSMENT
- **Lines changed**: 5 lines removed (1 file)
- **Risk**: Extremely low — removing a DT node that shouldn't have been
there
- **Side effects**: None negative. If someone was using the static optee
node WITHOUT U-Boot injection, they would need to add it back in their
board-specific DT overlay. However, the standard OP-TEE workflow on
ZynqMP uses U-Boot injection, and the original commit was explicitly
designed as a convenience, not a necessity.
- **Regression potential**: Near zero — the only "regression" is
returning to the pre-6.9 behavior, which worked correctly
### 7. AFFECTED STABLE TREES
The original commit (06d22ed6b6635) is present in:
- v6.9 through v6.19 (all current stable trees from 6.9 onward)
- Critically, it's in **6.12.y** (current LTS tree)
The fix is relevant to 6.12.y, 6.13.y, and any other maintained stable
tree based on 6.9+.
### 8. DEPENDENCY CHECK
The revert has **zero dependencies**. It simply removes 5 lines from a
DTS file. No other code or DT changes are needed. The `optee` label is
not referenced by any other file. The revert will apply cleanly to any
stable tree that has the original commit.
### 9. USER IMPACT
- **Who is affected**: All ZynqMP users running OP-TEE with standard
U-Boot boot flow
- **Severity**: HIGH — runtime memory access violations can cause:
- Kernel crashes/panics
- OP-TEE secure world corruption
- Silent data corruption
- Potential security boundary violation (TEE memory exposed to normal
world)
- **Platform**: Xilinx ZynqMP (common in embedded, industrial,
automotive applications)
### 10. TRUST INDICATORS
- Submitted by Tomas Melin (Vaisala), indicating real-world usage and
testing
- Accepted by Michal Simek (AMD/Xilinx SoC maintainer), indicating
subsystem maintainer approval
- The commit message clearly explains the mechanism and consequences
- Clean revert of a well-identified problematic commit
### CONCLUSION
This is a textbook stable backport candidate:
1. **Fixes a real bug**: Runtime memory access violations on ZynqMP with
OP-TEE
2. **Obviously correct**: Clean revert of a commit that introduced the
problem
3. **Small and contained**: 5-line removal from a single DTS file
4. **No new features**: Removes a broken feature addition
5. **Zero risk**: Returns to the pre-6.9 behavior that worked correctly
6. **No dependencies**: Self-contained change
7. **Real-world impact**: Affects ZynqMP users with OP-TEE
(embedded/industrial platforms)
8. **Applies to active stable trees**: The buggy commit is in 6.12.y LTS
and other maintained stables
The fix is small, surgical, and meets all stable kernel criteria.
**YES**
arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 938b014ca9231..b55c6b2e8e0e1 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -192,11 +192,6 @@ psci {
};
firmware {
- optee: optee {
- compatible = "linaro,optee-tz";
- method = "smc";
- };
-
zynqmp_firmware: zynqmp-firmware {
compatible = "xlnx,zynqmp-firmware";
#power-domain-cells = <1>;
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot
2026-02-14 0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
` (7 preceding siblings ...)
2026-02-14 0:58 ` [PATCH AUTOSEL 6.19-6.12] Revert "arm64: zynqmp: Add an OP-TEE node to the device tree" Sasha Levin
@ 2026-02-16 10:21 ` Geert Uytterhoeven
2026-02-16 13:12 ` Sasha Levin
8 siblings, 1 reply; 13+ messages in thread
From: Geert Uytterhoeven @ 2026-02-16 10:21 UTC (permalink / raw)
To: Sasha Levin
Cc: patches, stable, Helge Deller, guoren, neil.armstrong, brauner,
yelangyan, schuster.simon, linux-csky, Parisc List
Hi Sasha
Cc linux-parisc
How did you (or the LLM?) came up with that CC list?!?
On Sat, 14 Feb 2026 at 01:58, Sasha Levin <sashal@kernel.org> wrote:
>
> From: Helge Deller <deller@gmx.de>
>
> [ Upstream commit 35ac5a728c878594f2ea6c43b57652a16be3c968 ]
>
> Signed-off-by: Helge Deller <deller@gmx.de>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>
> LLM Generated explanations, may be completely bogus:
>
> Now I have all the information I need for a thorough analysis. Here is
> my complete assessment:
>
> ---
>
> ## Detailed Analysis: "parisc: Prevent interrupts during reboot"
>
> ### 1. COMMIT MESSAGE ANALYSIS
>
> The commit message is minimal: "parisc: Prevent interrupts during
> reboot". It clearly states the problem it's fixing -- interrupts are not
> properly disabled during the reboot path on PA-RISC systems. The author
> is **Helge Deller**, the long-time parisc maintainer and the most
> authoritative developer for this architecture. The commit has no
> `Fixes:` tag or `Cc: stable` (as expected for autosel candidates).
>
> ### 2. CODE CHANGE ANALYSIS
>
> The change is **one single effective line of code** plus a comment:
>
> ```c
> /* prevent interrupts during reboot */
> set_eiem(0);
> ```
>
> This is inserted into `machine_restart()` in
> `arch/parisc/kernel/process.c` immediately after
> `pdc_chassis_send_status(PDC_CHASSIS_DIRECT_SHUTDOWN)` and before
> `pdc_do_reset()`.
>
> **What `set_eiem(0)` does:** On PA-RISC, the EIEM (External Interrupt
> Enable Mask, Control Register 15) controls which external interrupts can
> fire. Setting it to 0 **masks all external interrupts at the hardware
> level**, preventing any interrupt from being delivered to the CPU. This
> is defined as:
>
> ```82:82:arch/parisc/include/asm/special_insns.h
> #define set_eiem(val) mtctl(val, CR_EIEM)
> ```
>
> **The bug:** Without this line, external interrupts remain enabled
> during the entire reboot sequence. This means:
>
> a) **Deadlock risk in `pdc_do_reset()`**: The `pdc_do_reset()` function
> acquires `pdc_lock` via `spin_lock_irqsave()`:
>
> ```1236:1246:arch/parisc/kernel/firmware.c
> int pdc_do_reset(void)
> {
> int retval;
> unsigned long flags;
>
> spin_lock_irqsave(&pdc_lock, flags);
> retval = mem_pdc_call(PDC_BROADCAST_RESET, PDC_DO_RESET);
> spin_unlock_irqrestore(&pdc_lock, flags);
>
> return retval;
> }
> ```
>
> While `spin_lock_irqsave` disables local interrupts, the PA-RISC EIEM
> hardware mask is a separate mechanism. On PA-RISC, the external
> interrupt delivery path goes through the EIEM -- an interrupt fires only
> if the corresponding EIEM bit is set AND the EIRR (External Interrupt
> Request Register) bit is set. If a hardware interrupt fires between
> `pdc_chassis_send_status()` (which also uses `pdc_lock`) and
> `pdc_do_reset()`, or during the firmware calls themselves, it could
> interfere with the reset process.
>
> b) **Interference with firmware reset**: `pdc_do_reset()` calls into PDC
> firmware (`mem_pdc_call(PDC_BROADCAST_RESET, PDC_DO_RESET)`). Firmware
> calls on PA-RISC are sensitive to the processor state. An interrupt
> arriving during or between firmware calls can corrupt the reset
> sequence, potentially causing the machine to **hang instead of
> rebooting**.
>
> c) **The `gsc_writel(CMD_RESET, COMMAND_GLOBAL)` fallback**: If
> `pdc_do_reset()` returns (on machines that don't implement
> `PDC_BROADCAST_RESET`), the code tries a hardware reset via
> `gsc_writel`. Interrupts during this path are equally problematic.
>
> ### 3. ESTABLISHED PATTERN IN PARISC AND OTHER ARCHITECTURES
>
> **PA-RISC internal precedent:**
> - `parisc_terminate()` in `traps.c` uses the exact same pattern:
> `set_eiem(0)` followed by `local_irq_disable()` before critical
> shutdown operations (line 428-429)
> - The SMP CPU hotplug code (`smp.c:481`) uses `set_eiem(0)` to disable
> all external interrupts when taking a CPU offline
>
> **Other architectures ALL disable interrupts before reset:**
> - ARM: `local_irq_disable()` at line 136 of `arch/arm/kernel/reboot.c`
> - ARM64: `local_irq_disable()` at line 141 of
> `arch/arm64/kernel/process.c`
> - x86: `local_irq_disable()` at line 100 of `arch/x86/kernel/reboot.c`
> - xtensa: `local_irq_disable()` at line 524 of
> `arch/xtensa/kernel/setup.c`
> - nios2: `local_irq_disable()` at line 49 of
> `arch/nios2/kernel/process.c`
> - csky: `local_irq_disable()` at line 25 of `arch/csky/kernel/power.c`
> - MIPS falcon: `local_irq_disable()` at line 37 of
> `arch/mips/lantiq/falcon/reset.c`
>
> PA-RISC was the **outlier** in not disabling interrupts before reboot.
> This commit fixes that deficiency.
>
> ### 4. CLASSIFICATION
>
> This is a **bug fix** -- specifically fixing a potential hang/crash
> during reboot caused by unmasked interrupts during firmware calls. It is
> NOT a new feature, optimization, or cleanup.
>
> ### 5. SCOPE AND RISK ASSESSMENT
>
> - **Lines changed**: 3 (1 blank line, 1 comment, 1 functional line)
> - **Files touched**: 1 (`arch/parisc/kernel/process.c`)
> - **Subsystem**: parisc architecture, reboot path
> - **Risk**: **Extremely low**. Disabling interrupts before a system
> reset is universally accepted as correct and necessary. The
> `set_eiem(0)` pattern is already used in the same codebase for similar
> critical paths.
>
> ### 6. USER IMPACT
>
> - **Who is affected**: All PA-RISC Linux users who reboot their machines
> - **What happens without the fix**: Potential hang during reboot,
> requiring a hard power cycle. This could be intermittent and timing-
> dependent, making it hard to reproduce but very real.
> - **Severity if triggered**: System hang requiring manual intervention
> (power cycle)
>
> ### 7. STABILITY INDICATORS
>
> - Author is the subsystem maintainer (highest trust level)
> - Pattern is well-established across all other architectures
> - The exact same `set_eiem(0)` call is used in the parisc
> `parisc_terminate()` function already
> - The change is trivially small and obviously correct
>
> ### 8. DEPENDENCY CHECK
>
> - No dependencies on other commits
> - The `set_eiem()` macro has existed since the earliest parisc code
> - The `machine_restart()` function is largely unchanged across all
> stable trees
> - This will apply cleanly to any kernel version that has parisc support
>
> ### Summary
>
> This is a trivially small, obviously correct fix that prevents
> interrupts during the reboot sequence on PA-RISC systems. Without it, an
> untimely interrupt can interfere with PDC firmware calls during reset,
> potentially causing the machine to hang instead of rebooting. Every
> other Linux architecture disables interrupts before reboot; parisc was
> the outlier. The fix uses the same `set_eiem(0)` pattern already used in
> `parisc_terminate()` and SMP CPU offline code. The risk is essentially
> zero -- there is no conceivable way that disabling interrupts right
> before a hardware reset could cause a regression.
>
> **YES**
>
> arch/parisc/kernel/process.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
> index e64ab5d2a40d6..703644e5bfc4a 100644
> --- a/arch/parisc/kernel/process.c
> +++ b/arch/parisc/kernel/process.c
> @@ -85,6 +85,9 @@ void machine_restart(char *cmd)
> #endif
> /* set up a new led state on systems shipped with a LED State panel */
> pdc_chassis_send_status(PDC_CHASSIS_DIRECT_SHUTDOWN);
> +
> + /* prevent interrupts during reboot */
> + set_eiem(0);
>
> /* "Normal" system reset */
> pdc_do_reset();
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot
2026-02-16 10:21 ` [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Geert Uytterhoeven
@ 2026-02-16 13:12 ` Sasha Levin
2026-02-16 13:28 ` Geert Uytterhoeven
0 siblings, 1 reply; 13+ messages in thread
From: Sasha Levin @ 2026-02-16 13:12 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: patches, stable, Helge Deller, guoren, neil.armstrong, brauner,
yelangyan, schuster.simon, linux-csky, Parisc List
On Mon, Feb 16, 2026 at 11:21:25AM +0100, Geert Uytterhoeven wrote:
>Hi Sasha
>
>Cc linux-parisc
>
>How did you (or the LLM?) came up with that CC list?!?
Interesting...
$ ~/linux/scripts/get_maintainer.pl --pattern-depth=1 --no-rolestats --nor --nos 0001-parisc-Prevent-interrupts-during-reboot.patch
Neil Armstrong <neil.armstrong@linaro.org>
"Guo Ren (Alibaba Damo Academy)" <guoren@kernel.org>
Christian Brauner <brauner@kernel.org>
Geert Uytterhoeven <geert@linux-m68k.org>
Andreas Larsson <andreas@gaisler.com>
Helge Deller <deller@gmx.de>
Langyan Ye <yelangyan@huaqin.corp-partner.google.com>
Simon Schuster <schuster.simon@siemens-energy.com>
I think that I'll fix it by replacing --pattern-depth with --nogit --nogit-fallback:
$ ~/linux/scripts/get_maintainer.pl --no-git --nogit-fallback --no-rolestats --nor --nos 0001-parisc-Prevent-interrupts-during-reboot.patch
"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Helge Deller <deller@gmx.de>
linux-parisc@vger.kernel.org
linux-kernel@vger.kernel.org
Thanks for reporting this!
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot
2026-02-16 13:12 ` Sasha Levin
@ 2026-02-16 13:28 ` Geert Uytterhoeven
2026-02-16 15:48 ` Sasha Levin
0 siblings, 1 reply; 13+ messages in thread
From: Geert Uytterhoeven @ 2026-02-16 13:28 UTC (permalink / raw)
To: Sasha Levin
Cc: patches, stable, Helge Deller, guoren, neil.armstrong, brauner,
yelangyan, schuster.simon, linux-csky, Parisc List
Hi Sasha,
On Mon, 16 Feb 2026 at 14:12, Sasha Levin <sashal@kernel.org> wrote:
> On Mon, Feb 16, 2026 at 11:21:25AM +0100, Geert Uytterhoeven wrote:
> >Cc linux-parisc
> >
> >How did you (or the LLM?) came up with that CC list?!?
>
> Interesting...
>
> $ ~/linux/scripts/get_maintainer.pl --pattern-depth=1 --no-rolestats --nor --nos 0001-parisc-Prevent-interrupts-during-reboot.patch
> Neil Armstrong <neil.armstrong@linaro.org>
> "Guo Ren (Alibaba Damo Academy)" <guoren@kernel.org>
> Christian Brauner <brauner@kernel.org>
> Geert Uytterhoeven <geert@linux-m68k.org>
> Andreas Larsson <andreas@gaisler.com>
> Helge Deller <deller@gmx.de>
> Langyan Ye <yelangyan@huaqin.corp-partner.google.com>
> Simon Schuster <schuster.simon@siemens-energy.com>
Still doesn't explain linux-csky?
> I think that I'll fix it by replacing --pattern-depth with --nogit --nogit-fallback:
>
> $ ~/linux/scripts/get_maintainer.pl --no-git --nogit-fallback --no-rolestats --nor --nos 0001-parisc-Prevent-interrupts-during-reboot.patch
> "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
> Helge Deller <deller@gmx.de>
> linux-parisc@vger.kernel.org
> linux-kernel@vger.kernel.org
Much better!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot
2026-02-16 13:28 ` Geert Uytterhoeven
@ 2026-02-16 15:48 ` Sasha Levin
0 siblings, 0 replies; 13+ messages in thread
From: Sasha Levin @ 2026-02-16 15:48 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: patches, stable, Helge Deller, guoren, neil.armstrong, brauner,
yelangyan, schuster.simon, linux-csky, Parisc List
On Mon, Feb 16, 2026 at 02:28:14PM +0100, Geert Uytterhoeven wrote:
>Hi Sasha,
>
>On Mon, 16 Feb 2026 at 14:12, Sasha Levin <sashal@kernel.org> wrote:
>> On Mon, Feb 16, 2026 at 11:21:25AM +0100, Geert Uytterhoeven wrote:
>> >Cc linux-parisc
>> >
>> >How did you (or the LLM?) came up with that CC list?!?
>>
>> Interesting...
>>
>> $ ~/linux/scripts/get_maintainer.pl --pattern-depth=1 --no-rolestats --nor --nos 0001-parisc-Prevent-interrupts-during-reboot.patch
>> Neil Armstrong <neil.armstrong@linaro.org>
>> "Guo Ren (Alibaba Damo Academy)" <guoren@kernel.org>
>> Christian Brauner <brauner@kernel.org>
>> Geert Uytterhoeven <geert@linux-m68k.org>
>> Andreas Larsson <andreas@gaisler.com>
>> Helge Deller <deller@gmx.de>
>> Langyan Ye <yelangyan@huaqin.corp-partner.google.com>
>> Simon Schuster <schuster.simon@siemens-energy.com>
>
>Still doesn't explain linux-csky?
Oh, that was there because the LLM mentioned csky in it's comparison with other
archs and get_maintainer.pl used keyword matching.
If you grab the actual patch that was sent in the AUTOSEL mail:
$ ~/linux/scripts/get_maintainer.pl --pattern-depth=1 --no-rolestats --nor --nos raw
Guo Ren <guoren@kernel.org>
linux-csky@vger.kernel.org
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 13+ messages in thread