All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Prathamesh Shete <pshete@nvidia.com>,
	Petlozu Pravareshwar <petlozup@nvidia.com>,
	Jon Hunter <jonathanh@nvidia.com>,
	Thierry Reding <treding@nvidia.com>,
	Sasha Levin <sashal@kernel.org>,
	thierry.reding@gmail.com, bigeasy@linutronix.de,
	clrkwllms@kernel.org, rostedt@goodmis.org,
	ulf.hansson@linaro.org, chleroy@kernel.org, jirislaby@kernel.org,
	yelangyan@huaqin.corp-partner.google.com,
	linux-tegra@vger.kernel.org, linux-rt-devel@lists.linux.dev
Subject: [PATCH AUTOSEL 6.19] soc/tegra: pmc: Fix unsafe generic_handle_irq() call
Date: Fri, 13 Feb 2026 19:58:09 -0500	[thread overview]
Message-ID: <20260214005825.3665084-3-sashal@kernel.org> (raw)
In-Reply-To: <20260214005825.3665084-1-sashal@kernel.org>

From: Prathamesh Shete <pshete@nvidia.com>

[ Upstream commit e6d96073af681780820c94079b978474a8a44413 ]

Currently, when resuming from system suspend on Tegra platforms,
the following warning is observed:

WARNING: CPU: 0 PID: 14459 at kernel/irq/irqdesc.c:666
Call trace:
 handle_irq_desc+0x20/0x58 (P)
 tegra186_pmc_wake_syscore_resume+0xe4/0x15c
 syscore_resume+0x3c/0xb8
 suspend_devices_and_enter+0x510/0x540
 pm_suspend+0x16c/0x1d8

The warning occurs because generic_handle_irq() is being called from
a non-interrupt context which is considered as unsafe.

Fix this warning by deferring generic_handle_irq() call to an IRQ work
which gets executed in hard IRQ context where generic_handle_irq()
can be called safely.

When PREEMPT_RT kernels are used, regular IRQ work (initialized with
init_irq_work) is deferred to run in per-CPU kthreads in preemptible
context rather than hard IRQ context. Hence, use the IRQ_WORK_INIT_HARD
variant so that with PREEMPT_RT kernels, the IRQ work is processed in
hardirq context instead of being deferred to a thread which is required
for calling generic_handle_irq().

On non-PREEMPT_RT kernels, both init_irq_work() and IRQ_WORK_INIT_HARD()
execute in IRQ context, so this change has no functional impact for
standard kernel configurations.

Signed-off-by: Petlozu Pravareshwar <petlozup@nvidia.com>
Signed-off-by: Prathamesh Shete <pshete@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
[treding@nvidia.com: miscellaneous cleanups]
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Complete Analysis

### 1. COMMIT MESSAGE ANALYSIS

**Subject**: "soc/tegra: pmc: Fix unsafe generic_handle_irq() call"

The commit message is unambiguous - it's fixing a **real, observable
WARNING** during system resume on Tegra platforms. The warning backtrace
is included:

```
WARNING: CPU: 0 PID: 14459 at kernel/irq/irqdesc.c:666
Call trace:
 handle_irq_desc+0x20/0x58 (P)
 tegra186_pmc_wake_syscore_resume+0xe4/0x15c
 syscore_resume+0x3c/0xb8
 suspend_devices_and_enter+0x510/0x540
 pm_suspend+0x16c/0x1d8
```

The fix has been **Reviewed-by** and **Tested-by** Jon Hunter (NVIDIA
kernel engineer), and authored by NVIDIA engineers who are experts on
Tegra PMC.

### 2. CODE CHANGE ANALYSIS - The Bug

**Root cause**: `tegra186_pmc_wake_syscore_resume()` is called by
`syscore_resume()` during system resume. Looking at `syscore_resume()`
in `drivers/base/syscore.c:93-110`:

```c
void syscore_resume(void)
{
    WARN_ONCE(!irqs_disabled(), ...);
    list_for_each_entry(syscore, &syscore_list, node)
        if (syscore->ops->resume)
            syscore->ops->resume(syscore->data);
}
```

This runs with **interrupts disabled** but **NOT in hard IRQ context**
(i.e., `in_hardirq()` returns false). However, when
`generic_handle_irq()` is called, it goes through `handle_irq_desc()`
(in `kernel/irq/irqdesc.c:658-671`):

```c
int handle_irq_desc(struct irq_desc *desc)
{
    data = irq_desc_get_irq_data(desc);
    if (WARN_ON_ONCE(!in_hardirq() &&
irqd_is_handle_enforce_irqctx(data)))
        return -EPERM;
    generic_handle_irq_desc(desc);
    return 0;
}
```

The ARM GIC v2 and GIC v3 interrupt controllers **set
`IRQD_HANDLE_ENFORCE_IRQCTX` on ALL interrupts** they map (confirmed in
`drivers/irqchip/irq-gic.c:1077` and `drivers/irqchip/irq-
gic-v3.c:1585`). Since Tegra SoCs use ARM GIC, any IRQ going through the
Tegra wake domain will have this flag set. Thus, calling
`generic_handle_irq()` from the syscore resume context (not hard IRQ)
triggers the `WARN_ON_ONCE`.

**Why it matters**: Beyond the WARN, the function returns `-EPERM`
without actually handling the interrupt. This means **edge-triggered
wake IRQs are silently dropped**, potentially causing missed wakeup
events and device misbehavior after resume.

### 3. THE FIX

The fix:
1. Adds `struct irq_work wake_work` and `u32 *wake_status` fields to
   `struct tegra_pmc`
2. Creates a new `tegra186_pmc_wake_handler()` function (the IRQ work
   handler) that contains the logic previously in
   `tegra186_pmc_process_wake_events()`
3. `tegra186_pmc_wake_syscore_resume()` now saves wake status to
   `pmc->wake_status[]` and queues the IRQ work via `irq_work_queue()`
4. The IRQ work handler runs in **hard IRQ context** where
   `generic_handle_irq()` is safe
5. Uses `IRQ_WORK_INIT_HARD()` to ensure hard IRQ context even on
   `PREEMPT_RT` kernels
6. Adds a check in `tegra186_pmc_wake_syscore_suspend()` for unhandled
   wake IRQs

This is the **correct** fix pattern - using `irq_work` to defer IRQ
handling to hard IRQ context is a well-established kernel pattern.
Multiple other drivers have been fixed similarly (see commits
`94ec234a16cf3`, `118c3ba24d04f`, `c6a91405ac5cd`, `f285de79569f9`,
`f460c70125bcb` which all converted to `generic_handle_irq_safe()`).

### 4. DEPENDENCY ANALYSIS

**Critical dependency**: This commit was written against the v6.19 tree
which has the new `struct syscore` API from commit `a97fbc3ee3e2a`
(v6.19-rc1 only). In stable trees (v6.6, v6.12), the functions use the
old `struct syscore_ops` interface:
- `tegra186_pmc_wake_syscore_resume(void)` (no `void *data` parameter)
- `tegra186_pmc_wake_syscore_suspend(void)` (no `void *data` parameter)
- `struct syscore_ops syscore` (not `struct syscore`)

**However**, the core fix logic (adding `irq_work`, deferring
`generic_handle_irq()`) does NOT depend on the syscore API change. The
fix can be backported with minor adaptation:
- The function signatures need to match the old `syscore_ops` callbacks
  (no `void *data`)
- `IRQ_WORK_INIT_HARD` and `irq_work_queue` are available in v6.6+
- The `irq_work` and `wake_status` additions to `struct tegra_pmc` are
  self-contained

**Affected stable trees**: v6.6.y, v6.12.y (and any other active stable
tree based on v6.2+, since commit `0474cc8489bda` introduced the buggy
code in v6.2).

### 5. SCOPE AND RISK

- **Files changed**: 1 file (`drivers/soc/tegra/pmc.c`)
- **Lines**: ~+60, ~-30 (net ~30 lines added)
- **Subsystem**: Tegra SoC PMC - a self-contained platform driver
- **Risk**: LOW - The change only affects Tegra platforms with wake
  events. The `irq_work` mechanism is well-tested kernel infrastructure.
  The fix has been reviewed and tested by NVIDIA engineers.
- **Regression risk**: Very low - non-Tegra systems are unaffected. The
  new code path is functionally equivalent but executes in the correct
  context.

### 6. USER IMPACT

- **Who**: All users of NVIDIA Tegra186+ SoCs (Tegra186, Tegra194,
  Tegra234) using suspend/resume
- **Severity**: Every system resume triggers a kernel WARNING. Beyond
  the warning itself, edge-triggered wake IRQs may be silently dropped
  because `handle_irq_desc()` returns `-EPERM` without processing the
  interrupt
- **Real-world**: Tegra platforms are used in NVIDIA Jetson embedded
  systems, automotive platforms, and gaming devices (Nintendo Switch)

### 7. CONCLUSION

This is a clear bug fix for a real, reproducible issue that fires on
every resume on affected hardware. The fix uses a well-established
kernel pattern (irq_work) and is self-contained within a single platform
driver file. It has been reviewed and tested by the subsystem
maintainers. The fix will need minor adaptation for stable trees
(different `syscore_ops` API), but the core logic is fully applicable.
The bug exists in all stable trees from v6.2 onward.

**YES**

 drivers/soc/tegra/pmc.c | 104 ++++++++++++++++++++++++++++------------
 1 file changed, 74 insertions(+), 30 deletions(-)

diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
index f3760a3b3026d..407fa840814c3 100644
--- a/drivers/soc/tegra/pmc.c
+++ b/drivers/soc/tegra/pmc.c
@@ -28,6 +28,7 @@
 #include <linux/iopoll.h>
 #include <linux/irqdomain.h>
 #include <linux/irq.h>
+#include <linux/irq_work.h>
 #include <linux/kernel.h>
 #include <linux/of_address.h>
 #include <linux/of_clk.h>
@@ -468,6 +469,10 @@ struct tegra_pmc {
 	unsigned long *wake_sw_status_map;
 	unsigned long *wake_cntrl_level_map;
 	struct syscore syscore;
+
+	/* Pending wake IRQ processing */
+	struct irq_work wake_work;
+	u32 *wake_status;
 };
 
 static struct tegra_pmc *pmc = &(struct tegra_pmc) {
@@ -1905,6 +1910,50 @@ static int tegra_pmc_parse_dt(struct tegra_pmc *pmc, struct device_node *np)
 	return 0;
 }
 
+/* translate sc7 wake sources back into IRQs to catch edge triggered wakeups */
+static void tegra186_pmc_wake_handler(struct irq_work *work)
+{
+	struct tegra_pmc *pmc = container_of(work, struct tegra_pmc, wake_work);
+	unsigned int i, wake;
+
+	for (i = 0; i < pmc->soc->max_wake_vectors; i++) {
+		unsigned long status = pmc->wake_status[i];
+
+		for_each_set_bit(wake, &status, 32) {
+			irq_hw_number_t hwirq = wake + (i * 32);
+			struct irq_desc *desc;
+			unsigned int irq;
+
+			irq = irq_find_mapping(pmc->domain, hwirq);
+			if (!irq) {
+				dev_warn(pmc->dev,
+					 "No IRQ found for WAKE#%lu!\n",
+					 hwirq);
+				continue;
+			}
+
+			dev_dbg(pmc->dev,
+				"Resume caused by WAKE#%lu mapped to IRQ#%u\n",
+				hwirq, irq);
+
+			desc = irq_to_desc(irq);
+			if (!desc) {
+				dev_warn(pmc->dev,
+					 "No descriptor found for IRQ#%u\n",
+					 irq);
+				continue;
+			}
+
+			if (!desc->action || !desc->action->name)
+				continue;
+
+			generic_handle_irq(irq);
+		}
+
+		pmc->wake_status[i] = 0;
+	}
+}
+
 static int tegra_pmc_init(struct tegra_pmc *pmc)
 {
 	if (pmc->soc->max_wake_events > 0) {
@@ -1923,6 +1972,18 @@ static int tegra_pmc_init(struct tegra_pmc *pmc)
 		pmc->wake_cntrl_level_map = bitmap_zalloc(pmc->soc->max_wake_events, GFP_KERNEL);
 		if (!pmc->wake_cntrl_level_map)
 			return -ENOMEM;
+
+		pmc->wake_status = kcalloc(pmc->soc->max_wake_vectors, sizeof(u32), GFP_KERNEL);
+		if (!pmc->wake_status)
+			return -ENOMEM;
+
+		/*
+		 * Initialize IRQ work for processing wake IRQs. Must use
+		 * HARD_IRQ variant to run in hard IRQ context on PREEMPT_RT
+		 * because we call generic_handle_irq() which requires hard
+		 * IRQ context.
+		 */
+		pmc->wake_work = IRQ_WORK_INIT_HARD(tegra186_pmc_wake_handler);
 	}
 
 	if (pmc->soc->init)
@@ -3129,47 +3190,30 @@ static void wke_clear_wake_status(struct tegra_pmc *pmc)
 	}
 }
 
-/* translate sc7 wake sources back into IRQs to catch edge triggered wakeups */
-static void tegra186_pmc_process_wake_events(struct tegra_pmc *pmc, unsigned int index,
-					     unsigned long status)
-{
-	unsigned int wake;
-
-	dev_dbg(pmc->dev, "Wake[%d:%d]  status=%#lx\n", (index * 32) + 31, index * 32, status);
-
-	for_each_set_bit(wake, &status, 32) {
-		irq_hw_number_t hwirq = wake + 32 * index;
-		struct irq_desc *desc;
-		unsigned int irq;
-
-		irq = irq_find_mapping(pmc->domain, hwirq);
-
-		desc = irq_to_desc(irq);
-		if (!desc || !desc->action || !desc->action->name) {
-			dev_dbg(pmc->dev, "Resume caused by WAKE%ld, IRQ %d\n", hwirq, irq);
-			continue;
-		}
-
-		dev_dbg(pmc->dev, "Resume caused by WAKE%ld, %s\n", hwirq, desc->action->name);
-		generic_handle_irq(irq);
-	}
-}
-
 static void tegra186_pmc_wake_syscore_resume(void *data)
 {
-	u32 status, mask;
 	unsigned int i;
+	u32 mask;
 
 	for (i = 0; i < pmc->soc->max_wake_vectors; i++) {
 		mask = readl(pmc->wake + WAKE_AOWAKE_TIER2_ROUTING(i));
-		status = readl(pmc->wake + WAKE_AOWAKE_STATUS_R(i)) & mask;
-
-		tegra186_pmc_process_wake_events(pmc, i, status);
+		pmc->wake_status[i] = readl(pmc->wake + WAKE_AOWAKE_STATUS_R(i)) & mask;
 	}
+
+	/* Schedule IRQ work to process wake IRQs (if any) */
+	irq_work_queue(&pmc->wake_work);
 }
 
 static int tegra186_pmc_wake_syscore_suspend(void *data)
 {
+	unsigned int i;
+
+	/* Check if there are unhandled wake IRQs */
+	for (i = 0; i < pmc->soc->max_wake_vectors; i++)
+		if (pmc->wake_status[i])
+			dev_warn(pmc->dev,
+				 "Unhandled wake IRQs pending vector[%u]: 0x%x\n",
+				 i, pmc->wake_status[i]);
 	wke_read_sw_wake_status(pmc);
 
 	/* flip the wakeup trigger for dual-edge triggered pads
-- 
2.51.0


  parent reply	other threads:[~2026-02-14  0:58 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-14  0:58 [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Sasha Levin
2026-02-14  0:58 ` [PATCH AUTOSEL 6.19-6.18] soc: imx8m: Fix error handling for clk_prepare_enable() Sasha Levin
2026-02-14  0:58 ` Sasha Levin [this message]
2026-02-14  0:58 ` [PATCH AUTOSEL 6.19-6.18] x86/sev: Use kfree_sensitive() when freeing a SNP message descriptor Sasha Levin
2026-02-14  0:58 ` [PATCH AUTOSEL 6.19-6.12] firmware: arm_ffa: Unmap Rx/Tx buffers on init failure Sasha Levin
2026-02-14  0:58 ` [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add two Intel Amston Lake SoCs support Sasha Levin
2026-02-14  0:58 ` [PATCH AUTOSEL 6.19-6.18] EDAC/igen6: Add more Intel Panther Lake-H " Sasha Levin
2026-02-14  0:58 ` [PATCH AUTOSEL 6.19-5.10] arm64: tegra: smaug: Add usb-role-switch support Sasha Levin
2026-02-14  0:58 ` [PATCH AUTOSEL 6.19-6.12] Revert "arm64: zynqmp: Add an OP-TEE node to the device tree" Sasha Levin
2026-02-16 10:21 ` [PATCH AUTOSEL 6.19-5.10] parisc: Prevent interrupts during reboot Geert Uytterhoeven
2026-02-16 13:12   ` Sasha Levin
2026-02-16 13:28     ` Geert Uytterhoeven
2026-02-16 15:48       ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260214005825.3665084-3-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=chleroy@kernel.org \
    --cc=clrkwllms@kernel.org \
    --cc=jirislaby@kernel.org \
    --cc=jonathanh@nvidia.com \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=linux-tegra@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=petlozup@nvidia.com \
    --cc=pshete@nvidia.com \
    --cc=rostedt@goodmis.org \
    --cc=stable@vger.kernel.org \
    --cc=thierry.reding@gmail.com \
    --cc=treding@nvidia.com \
    --cc=ulf.hansson@linaro.org \
    --cc=yelangyan@huaqin.corp-partner.google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.