public inbox for linux-tegra@vger.kernel.org
* [PATCH AUTOSEL 7.0-6.12] arm64: tegra: Fix snps,blen properties
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
@ 2026-04-20 13:17 ` Sasha Levin
  2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.19] spi: tegra210-quad: Fix false positive WARN on interrupt timeout with transfer complete Sasha Levin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-04-20 13:17 UTC
  To: patches, stable
  Cc: Thierry Reding, Sasha Levin, robh, krzk+dt, conor+dt,
	thierry.reding, jonathanh, devicetree, linux-tegra, linux-kernel

From: Thierry Reding <treding@nvidia.com>

[ Upstream commit 51f10c527a63dc4a71bce4b40fc53eee78bbbd52 ]

The snps,blen property of stmmac-axi-config nodes needs to have 7
entries in total, with unsupported burst lengths listed as 0.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed for my analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `arm64: tegra` (ARM64 device tree, Tegra SoC)
- Action verb: "Fix" — direct bug fix language
- Summary: Fix the `snps,blen` DT properties for MGBE Ethernet
  controllers
- Record: [arm64: tegra] [fix] [correcting snps,blen DT property to have
  required 7 entries]

**Step 1.2: Tags**
- `Signed-off-by: Thierry Reding <treding@nvidia.com>` — Author is the
  Tegra platform maintainer at NVIDIA.
- No Fixes: tag, no Cc: stable, no Reported-by (expected for
  auto-selection candidates).
- Record: No notable tags beyond authorship. Author is the subsystem
  maintainer.

**Step 1.3: Commit Body**
- States: "The snps,blen property of stmmac-axi-config nodes needs to
  have 7 entries in total, with unsupported burst lengths listed as 0."
- This describes a DT schema violation — the property had 4 entries but
  the driver requires 7.
- Record: Bug is a malformed DT property that doesn't match the driver's
  expectation.

**Step 1.4: Hidden Bug Fix Detection**
- This is NOT a hidden fix — it's explicitly labeled "Fix." The
  underlying bug is that `of_property_read_u32_array(np, "snps,blen",
  axi_blen, 7)` fails silently when the property only has 4 entries,
  leaving the stack buffer uninitialized.
- Record: Direct bug fix, not disguised.

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- 1 file changed: `arch/arm64/boot/dts/nvidia/tegra234.dtsi`
- 3 lines changed (each identical):
  - `snps,blen = <256 128 64 32>;` →
    `snps,blen = <256 128 64 32 0 0 0>;`
- Affects MGBE0, MGBE1, MGBE2 stmmac-axi-config nodes.
- Record: Single DT file, 3 identical one-line changes. Scope:
  minimal/surgical.

**Step 2.2: Code Flow**
- Before: DT property has 4 u32 entries.
- After: DT property has 7 u32 entries (3 trailing zeros for unsupported
  burst lengths).
- The stmmac driver calls `of_property_read_u32_array(np, "snps,blen",
  axi_blen, AXI_BLEN)` where `AXI_BLEN = 7`. With only 4 entries,
  `of_find_property_value_of_size()` checks `prop->length (16) < min
  (28)` and returns `-EOVERFLOW`. The stack array `axi_blen[7]` is never
  written. Then `stmmac_axi_blen_to_mask()` processes uninitialized
  stack data.

**Step 2.3: Bug Mechanism**
- Category: **Uninitialized data** / **incorrect DT specification**
- Mechanism: The DT property is too short, causing
  `of_property_read_u32_array()` to fail, leaving a stack buffer
  uninitialized. The uninitialized data is then used to configure the
  AXI DMA burst length register for network hardware.
- Record: Uninitialized stack data used for hardware DMA configuration.
  The fix ensures the property has the correct count.
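
The mechanism above can be modeled in userspace. The following is a
minimal sketch, not the kernel's implementation: `struct model_prop`
and `model_read_u32_array()` are hypothetical stand-ins for the DT
property and for `of_property_read_u32_array()`. It assumes only the
size check described in Step 2.2, namely that a property shorter than
the requested size yields `-EOVERFLOW` and the output buffer is not
written.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AXI_BLEN 7 /* number of burst-length slots the driver requests */

/* Hypothetical stand-in for a DT property: byte length plus cell values. */
struct model_prop {
	size_t length;         /* property length in bytes */
	const uint32_t *cells; /* cell values (host-endian in this model) */
};

/*
 * Hypothetical model of a fixed-size array read: fail with -EOVERFLOW
 * when the property holds fewer than 'count' cells, and leave 'out'
 * untouched in that case.
 */
static int model_read_u32_array(const struct model_prop *prop,
				uint32_t *out, size_t count)
{
	size_t min = count * sizeof(uint32_t);

	if (prop->length < min)
		return -EOVERFLOW;

	memcpy(out, prop->cells, min);
	return 0;
}
```

In this model a 4-cell property is 16 bytes, which is below the 28
bytes needed for `AXI_BLEN` cells, so the read fails and a caller that
ignores the return value goes on to use whatever `out` already
contained. A 7-cell property passes the check and fills all slots.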

**Step 2.4: Fix Quality**
- Obviously correct: all other DT files using `snps,blen` have exactly 7
  entries (verified by grep across all arm64 DT files).
- Minimal/surgical: 3 identical one-line changes.
- Zero regression risk: adding trailing zeros only enables the driver to
  read the property successfully, and zero entries are explicitly
  skipped by `stmmac_axi_blen_to_mask()`.
- Record: Fix is obviously correct, minimal, zero regression risk.
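
The claim that trailing zeros are harmless can be illustrated with a
sketch of a burst-length-to-mask conversion. This is an assumed model
rather than the driver's `stmmac_axi_blen_to_mask()`: the name
`model_blen_to_mask()` and the bit assignment (each power-of-two burst
length sets the bit equal to its own value) are hypothetical. Only the
skip-zero behavior is taken from the analysis above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical conversion: each supported burst length (a power of two
 * between 4 and 256) sets one bit in the mask. Zero entries mean
 * "unsupported" and are skipped. The bit positions are illustrative and
 * do not describe the real register layout.
 */
static uint32_t model_blen_to_mask(const uint32_t *blen, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t b = blen[i];

		if (b == 0)
			continue; /* unsupported slot, nothing to set */
		if (b < 4 || b > 256 || (b & (b - 1)))
			continue; /* ignore out-of-range or non-power-of-two values */
		mask |= b; /* a power of two is a single bit */
	}
	return mask;
}
```

Under these assumptions, `<256 128 64 32>` and `<256 128 64 32 0 0 0>`
produce the same mask, which is why padding the property with zeros
changes only whether the read succeeds, not which burst lengths are
enabled.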

## PHASE 3: GIT HISTORY

**Step 3.1: Blame**
- The buggy `snps,blen = <256 128 64 32>` was introduced by commit
  `81695da63b977` ("arm64: tegra: Add AXI configuration for Tegra234
  MGBE") by Thierry Reding, dated 2024-02-21, merged in v6.9.
- Record: Bug introduced in v6.9 by the same author who is now fixing
  it.

**Step 3.2: Fixes tag**
- No Fixes: tag present. The implicit fix target is `81695da63b977`.

**Step 3.3: File History**
- Recent changes to `tegra234.dtsi` are mostly DT cleanup/additions. No
  related fixes.
- Record: Standalone fix, no prerequisites.

**Step 3.4: Author**
- Thierry Reding is the Tegra platform maintainer at NVIDIA. He wrote
  the original buggy commit and is now fixing it.
- Record: Subsystem maintainer self-fix.

**Step 3.5: Dependencies**
- None. The fix is a pure DT property value change that applies
  independently.
- Record: No dependencies, applies cleanly standalone.

## PHASE 4: MAILING LIST RESEARCH

**Step 4.1: Original Submission**
- Found via web search: patch is `[PATCH 09/10]` in a series
  "dt-bindings: Various cleanups for Tegra-related bindings" posted
  2026-02-23.
- Part of a v3 cleanup series. While most patches in the series are DT
  binding cleanups, this specific patch (09/10) is a genuine bug fix.
- Record: Part of a larger DT cleanup series, but this patch is an
  independent bug fix.

**Step 4.2: Reviewers**
- b4 dig found the original commit (81695da63b977) was reviewed and
  tested by Jon Hunter (NVIDIA Tegra co-maintainer). The fix itself is
  straightforward enough that formal review was likely implicit.
- Record: Original buggy code was reviewed by Jon Hunter.

**Step 4.3-4.5: Bug Reports / Stable History**
- No specific bug report found. Likely discovered by the author during
  code review / DT validation.
- No prior stable discussion found.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1-5.4: Key Functions**
- `stmmac_axi_setup()` in `stmmac_platform.c` parses the DT property.
- Called during stmmac platform driver probe for any device using
  `snps,axi-config` DT phandle.
- `stmmac_axi_blen_to_mask()` converts the burst length array to
  register value.
- The register value is written to hardware in `dwxgmac2_dma_init()` /
  `dwmac4_dma_init()` / `dwmac1000_dma_init()`.
- Impact: Affects AXI DMA configuration for MGBE Ethernet on Tegra234.

**Step 5.5: Similar Patterns**
- All other arm64 DT files consistently use 7 entries for `snps,blen`.
  Tegra234 was the ONLY outlier with 4 entries.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1: Buggy Code in Stable**
- Commit `81695da63b977` was introduced in v6.9. It exists in stable
  trees v6.12.y and any newer LTS branches.
- It does NOT exist in v6.6.y (confirmed via
  `git merge-base --is-ancestor`).
- Record: Bug affects v6.9+ stable trees (v6.12.y at minimum).

**Step 6.2: Backport Complications**
- Pure DT change with no code dependencies. Should apply cleanly.
- Record: Clean apply expected.

**Step 6.3: Related Fixes**
- No prior fix for this issue found in any tree.

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1: Subsystem**
- ARM64 Device Tree for Tegra234 MGBE (Multi-Gigabit Ethernet).
- Criticality: IMPORTANT — affects Tegra234 network hardware users
  (NVIDIA Jetson AGX Orin, etc.).
- Record: [arm64/tegra DT] [IMPORTANT — Jetson platform network
  hardware]

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1: Affected Users**
- Users of NVIDIA Tegra234 platforms (Jetson AGX Orin) using MGBE
  Ethernet controllers.
- Platform-specific but widely deployed in industrial/embedded
  applications.

**Step 8.2: Trigger Conditions**
- Triggered on every boot when the MGBE driver probes. No special
  configuration needed.
- The AXI burst length read fails silently, so uninitialized data
  configures DMA hardware.

**Step 8.3: Failure Mode**
- Uninitialized stack data used for DMA hardware configuration.
- Could result in suboptimal network performance, potential DMA errors,
  or undefined hardware behavior depending on what stack values happen
  to be.
- Severity: **HIGH** (hardware misconfiguration from uninitialized data,
  affecting network DMA)

**Step 8.4: Risk-Benefit**
- BENEFIT: High — fixes broken AXI DMA configuration for 3 MGBE Ethernet
  controllers on Tegra234.
- RISK: Very low — 3 identical one-line DT property changes, adding
  trailing zeros, no code change.
- Record: Very favorable risk-benefit ratio.

## PHASE 9: FINAL SYNTHESIS

**Step 9.1: Evidence Summary**

FOR backporting:
- Fixes real bug: `of_property_read_u32_array()` fails with only 4/7
  entries, leading to uninitialized stack data being used for hardware
  DMA configuration
- Minimal, surgical fix: 3 identical one-line DT changes
- Obviously correct: all other DT files use 7 entries
- Written by subsystem maintainer (Thierry Reding, NVIDIA Tegra
  maintainer)
- Zero regression risk: trailing zeros are explicitly handled ("Burst
  values of zero must be skipped")
- Applies to v6.12.y+ stable trees

AGAINST backporting:
- No explicit user report of symptoms (could mean it works by luck with
  stack data)
- Part of a larger series (but this patch is independent)

**Step 9.2: Stable Rules Checklist**
1. Obviously correct? **YES** — all other DT files have 7 entries
2. Fixes a real bug? **YES** — uninitialized data used for HW
   configuration
3. Important issue? **YES** — hardware DMA misconfiguration
4. Small and contained? **YES** — 3 one-line changes in 1 DT file
5. No new features? **YES** — pure DT fix
6. Can apply to stable? **YES** — clean DT-only change

**Step 9.3: Exception Category**
- DT update fixing existing hardware description — automatic YES
  category

## Verification

- [Phase 1] Parsed subject: "Fix" verb, arm64 tegra subsystem, snps,blen
  property
- [Phase 2] Diff: 3 lines changed in tegra234.dtsi, each adds `0 0 0` to
  snps,blen property
- [Phase 2] Verified AXI_BLEN=7 in `include/linux/stmmac.h` line 109
- [Phase 2] Verified `of_property_read_u32_array()` calls
  `of_find_property_value_of_size()` with min=28 bytes; property has 16
  bytes (4 entries), so it returns -EOVERFLOW
- [Phase 2] Verified `stmmac_axi_blen_to_mask()` skips zero entries
  ("Burst values of zero must be skipped")
- [Phase 3] git blame: buggy code from 81695da63b977 (v6.9), author:
  Thierry Reding
- [Phase 3] git merge-base: confirmed 81695da63b977 is in v6.12 but not
  v6.6
- [Phase 4] b4 dig: found original commit submission at lore (patch-id
  match)
- [Phase 4] Web search: fix is PATCH 09/10 of "dt-bindings: Various
  cleanups for Tegra-related bindings"
- [Phase 5] Verified all other arm64 DT snps,blen entries have exactly 7
  values (grep across arch/arm64/boot/dts)
- [Phase 5] Traced call chain: stmmac_axi_setup →
  of_property_read_u32_array → of_find_property_value_of_size
- [Phase 6] Bug exists in v6.9+ stable trees
- [Phase 8] Failure mode: uninitialized stack data used for AXI DMA
  register, severity HIGH

**YES**

 arch/arm64/boot/dts/nvidia/tegra234.dtsi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index 850c473235e36..04a95b6658caa 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -3621,7 +3621,7 @@ ethernet@6800000 {
 			snps,axi-config = <&mgbe0_axi_setup>;
 
 			mgbe0_axi_setup: stmmac-axi-config {
-				snps,blen = <256 128 64 32>;
+				snps,blen = <256 128 64 32 0 0 0>;
 				snps,rd_osr_lmt = <63>;
 				snps,wr_osr_lmt = <63>;
 			};
@@ -3663,7 +3663,7 @@ ethernet@6900000 {
 			snps,axi-config = <&mgbe1_axi_setup>;
 
 			mgbe1_axi_setup: stmmac-axi-config {
-				snps,blen = <256 128 64 32>;
+				snps,blen = <256 128 64 32 0 0 0>;
 				snps,rd_osr_lmt = <63>;
 				snps,wr_osr_lmt = <63>;
 			};
@@ -3705,7 +3705,7 @@ ethernet@6a00000 {
 			snps,axi-config = <&mgbe2_axi_setup>;
 
 			mgbe2_axi_setup: stmmac-axi-config {
-				snps,blen = <256 128 64 32>;
+				snps,blen = <256 128 64 32 0 0 0>;
 				snps,rd_osr_lmt = <63>;
 				snps,wr_osr_lmt = <63>;
 			};
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.19] spi: tegra210-quad: Fix false positive WARN on interrupt timeout with transfer complete
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
  2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-6.12] arm64: tegra: Fix snps,blen properties Sasha Levin
@ 2026-04-20 13:19 ` Sasha Levin
  2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] dt-bindings: net: Fix Tegra234 MGBE PTP clock Sasha Levin
  2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] gpio: tegra: fix irq_release_resources calling enable instead of disable Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-04-20 13:19 UTC
  To: patches, stable
  Cc: Breno Leitao, Mark Brown, Sasha Levin, ldewangan, thierry.reding,
	jonathanh, skomatineni, linux-tegra, linux-spi, linux-kernel

From: Breno Leitao <leitao@debian.org>

[ Upstream commit 5b94c94caafcad3c77cc6b1d213a93bf5dc0a98e ]

The WARN_ON_ONCE/WARN_ON fired unconditionally on any completion
timeout, including the recoverable case where the interrupt was lost but
the hardware actually finished the transfer. This produced a noisy splat
with a full call trace even though the driver successfully recovered via
tegra_qspi_handle_timeout().

Since tegra210 uses threaded interrupts, the transfer completion can be
signaled before the interrupt fires, making this false positive case
common in practice.

Almost all the hosts I sysadmin in my fleet produce the following splat:

	WARNING: CPU: 47 PID: 844 at drivers/spi/spi-tegra210-quad.c:1226 tegra_qspi_transfer_one_message+0x8a4/0xba8
	....
	tegra-qspi NVDA1513:00: QSPI interrupt timeout, but transfer complete

Move WARN_ON_ONCE/WARN_ON to fire only on real unrecoverable timeouts,
i.e., when tegra_qspi_handle_timeout() confirms the hardware did NOT
complete. This makes the warning actionable instead of just polluting
the metrics.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260408-tegra_warn-v1-1-669a3bc74d77@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the data needed for a comprehensive analysis. Here is the
full walkthrough:

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: PARSE THE SUBJECT LINE
Record: [spi: tegra210-quad] [Fix] [false positive WARN on interrupt
timeout with transfer complete]

### Step 1.2: PARSE ALL COMMIT MESSAGE TAGS
- **Signed-off-by:** Breno Leitao <leitao@debian.org> (author)
- **Link:**
  https://patch.msgid.link/20260408-tegra_warn-v1-1-669a3bc74d77@debian.org
- **Signed-off-by:** Mark Brown <broonie@kernel.org> (SPI subsystem
  maintainer)
- No Fixes: tag (expected for autosel candidates)
- No Reported-by: (author is the reporter, via commit message text)
- No Cc: stable tag (expected)

Record: Author is a regular tegra210-quad contributor (9 commits in this
driver). Patch was accepted by the SPI subsystem maintainer Mark Brown.

### Step 1.3: ANALYZE THE COMMIT BODY TEXT
The bug: `WARN_ON_ONCE`/`WARN_ON` fires unconditionally on any
completion timeout, including the recoverable case where the interrupt
was lost but hardware finished the transfer. The driver already has
`tegra_qspi_handle_timeout()` that recovers from this situation.

Symptom: Full kernel warning stack trace on every recoverable timeout:
```
WARNING: CPU: 47 PID: 844 at drivers/spi/spi-tegra210-quad.c:1226
tegra_qspi_transfer_one_message+0x8a4/0xba8
```

Scale: Author states "Almost all the hosts I sysadmin in my fleet
produce the following splat." This is common because tegra210 uses
threaded interrupts where transfer completion can be signaled before the
IRQ fires.

Root cause: The WARN fires before `tegra_qspi_handle_timeout()` has a
chance to determine if the hardware actually completed.

Record: [False positive WARN on recoverable timeout] [WARNING splat with
full call trace on every occurrence] [Common in practice across fleet of
Tegra machines] [Root cause: WARN placed before recovery check]

### Step 1.4: DETECT HIDDEN BUG FIXES
This IS a bug fix - the WARN fires when there's no actual problem. With
`panic_on_warn=1` (verified exists in `kernel/panic.c`), this causes
kernel panics on recoverable situations.

Record: [Yes, this is a real bug fix. WARN fires on non-error
conditions, which causes panics with panic_on_warn=1 and pollutes
logs/monitoring on all other systems.]

---

## PHASE 2: DIFF ANALYSIS - LINE BY LINE

### Step 2.1: INVENTORY THE CHANGES
- **File:** `drivers/spi/spi-tegra210-quad.c`
- **Two logical changes (each spanning two diff hunks):** Both doing the
  same thing
  - `tegra_qspi_combined_seq_xfer`: -1 line, +2 lines (net +1)
  - `tegra_qspi_non_combined_seq_xfer`: -1 line, +2 lines (net +1)
- **Functions modified:** `tegra_qspi_combined_seq_xfer`,
  `tegra_qspi_non_combined_seq_xfer`
- **Scope:** Single-file surgical fix, two identical pattern changes

Record: [1 file, 2 functions, +4/-2 lines, scope: surgical fix]

### Step 2.2: UNDERSTAND THE CODE FLOW CHANGE
**Hunk 1 (combined_seq_xfer):**
- BEFORE: `if (WARN_ON_ONCE(ret == 0))` fires WARN then calls
  `tegra_qspi_handle_timeout()` inside the `if` body
- AFTER: `if (ret == 0)` (no WARN), then calls
  `tegra_qspi_handle_timeout()`, and only `WARN_ON_ONCE(1)` if
  handle_timeout returns < 0 (real unrecoverable timeout)

**Hunk 2 (non_combined_seq_xfer):**
- BEFORE: `if (WARN_ON(ret == 0))` fires WARN then calls
  `tegra_qspi_handle_timeout()`
- AFTER: `if (ret == 0)` (no WARN), then calls
  `tegra_qspi_handle_timeout()`, and only `WARN_ON(1)` if handle_timeout
  returns < 0

Record: [Both hunks: WARN moved from unconditional timeout to only
real-timeout branch after recovery check]
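
The before/after control flow can be sketched in plain C. This is a
simplified model under stated assumptions, not the driver code:
`model_handle_timeout()`, `model_xfer_before()`, `model_xfer_after()`
and the `warn_count` counter are hypothetical names, and a counter
increment stands in for `WARN_ON()`/`WARN_ON_ONCE()`.

```c
#include <errno.h>

static int warn_count; /* counts simulated WARN splats */

/* Hypothetical recovery check: 0 if hardware finished, negative otherwise. */
static int model_handle_timeout(int hw_completed)
{
	return hw_completed ? 0 : -ETIMEDOUT;
}

/* Old flow: warn on every completion timeout, then try to recover. */
static int model_xfer_before(unsigned long wait_ret, int hw_completed)
{
	if (wait_ret == 0) {
		warn_count++; /* fires even when recovery will succeed */
		return model_handle_timeout(hw_completed);
	}
	return 0;
}

/* New flow: try to recover first, warn only if recovery fails. */
static int model_xfer_after(unsigned long wait_ret, int hw_completed)
{
	if (wait_ret == 0) {
		int ret = model_handle_timeout(hw_completed);

		if (ret < 0)
			warn_count++; /* real timeout only */
		return ret;
	}
	return 0;
}
```

In this model a timed-out wait with completed hardware increments
`warn_count` in the old flow but not in the new one, while a timed-out
wait with incomplete hardware warns and returns an error in both.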

### Step 2.3: IDENTIFY THE BUG MECHANISM
Category: **Logic/correctness fix** - The WARN was placed at the wrong
scope level. It fires for all timeouts (including recoverable ones where
hardware completed but interrupt was delayed), when it should only fire
for genuine failures.

Record: [Logic bug: WARN fires at wrong scope. Fix moves WARN to only
trigger after confirming hardware did NOT complete.]

### Step 2.4: ASSESS THE FIX QUALITY
- Obviously correct: YES - the WARN should only fire for real errors,
  not recoverable conditions
- Minimal/surgical: YES - just 2 identical changes moving WARN placement
- Regression risk: VERY LOW - the same WARN still fires on real
  timeouts, and the recovery path is untouched
- No red flags: single file, same subsystem, no API changes

Record: [Fix is obviously correct, minimal, very low regression risk]

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: BLAME THE CHANGED LINES
- Line 1226 (`WARN_ON_ONCE`): introduced by `41c721fc09393` (Breno
  Leitao, 2025-04-01) in v6.15
- Line 1343 (`WARN_ON`): original from `921fc1838fb036` (Sowjanya
  Komatineni, 2020-12-21) in v5.11
- Lines 1227-1248 (timeout handling): introduced by `380fd29d57abe`
  (Vishwaroop A, 2025-10-28) in v6.19

Record: [WARN_ON from v5.11, WARN_ON_ONCE since v6.15, recovery logic
since v6.19. The false positive only became apparent after 380fd29d57abe
added recovery, revealing the WARN fires even when recovery succeeds.]

### Step 3.2: FOLLOW THE FIXES: TAG
No Fixes: tag present. However, logically this fixes the combination of
`41c721fc09393` (which introduced WARN_ON_ONCE in combined path) and
`380fd29d57abe` (which added the recovery but didn't move the WARN).

Record: [No Fixes: tag. Logically fixes the interaction between WARN
placement and the recovery path added in 380fd29d57abe.]

### Step 3.3: CHECK FILE HISTORY FOR RELATED CHANGES
Recent commits show heavy work on this driver by the same author:
- Spinlock protection for curr_xfer (6 patches, v6.19)
- IRQ_HANDLED fix (v6.19)
- Rate limiting (v6.15)
- WARN_ON_ONCE (v6.15)

Record: [Standalone fix. No prerequisites beyond the already-merged
recovery handler. Author is the most active recent contributor to this
driver.]

### Step 3.4: CHECK THE AUTHOR'S OTHER COMMITS
Breno Leitao has 9 commits to this driver, including the prior
WARN_ON_ONCE change and the spinlock protection series. He is clearly
intimately familiar with this driver and has been fixing real production
issues.

Record: [Active contributor to this driver, fixing real production
issues across a fleet.]

### Step 3.5: CHECK FOR DEPENDENT/PREREQUISITE COMMITS
This patch depends on `380fd29d57abe` ("Check hardware status on
timeout") which introduced `tegra_qspi_handle_timeout()`. That commit is
in v6.19+.

Record: [Requires tegra_qspi_handle_timeout() from 380fd29d57abe
(v6.19). Patch cannot apply to older stable trees.]

---

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

### Step 4.1: FIND THE ORIGINAL PATCH DISCUSSION
Lore.kernel.org is behind Anubis anti-bot protection. B4 dig could not
be used directly on the new commit hash (not yet in tree).

The patch was submitted as v1 on 2026-04-08 (based on msgid). It was a
single patch (not part of a series). Mark Brown applied it directly,
suggesting straightforward acceptance.

Record: [Patch accepted directly by SPI maintainer. No concerns or NAKs
visible. v1 was applied directly (no revisions needed).]

### Step 4.2: CHECK WHO REVIEWED THE PATCH
Mark Brown (SPI subsystem maintainer) signed off, indicating direct
review and acceptance.

Record: [Mark Brown (subsystem maintainer) reviewed and applied.]

### Step 4.3-4.5: BUG REPORT / RELATED PATCHES / STABLE DISCUSSION
The commit message itself serves as the bug report - the author
describes fleet-wide impact. No separate bug report.

Record: [Author is the reporter. Fleet-wide impact documented in commit
message.]

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: IDENTIFY KEY FUNCTIONS IN THE DIFF
- `tegra_qspi_combined_seq_xfer()` - Combined sequence transfer path
- `tegra_qspi_non_combined_seq_xfer()` - Non-combined sequence transfer
  path

### Step 5.2: TRACE CALLERS
Both functions are called from `tegra_qspi_transfer_one_message()`,
which is the SPI core transfer callback. This is the main transfer path
for ALL QSPI operations on Tegra.

Record: [Called from core SPI transfer path - every QSPI transaction
goes through this code.]

### Step 5.3-5.4: TRACE CALLEES / CALL CHAIN
The affected code calls `tegra_qspi_handle_timeout()` which checks
hardware status (QSPI_RDY bit) and manually processes completion if
hardware finished. This is a standard SPI data path.

Record: [Standard SPI data transfer path, reachable from any QSPI
operation on Tegra platforms.]
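
The recovery step described above (check a hardware status bit, then
process the completion by hand) might look roughly like the following.
This is an illustrative sketch only: `struct model_qspi`,
`MODEL_STATUS_RDY`, and both functions are hypothetical, and the
simulated status word does not reflect the real QSPI register layout
or the body of `tegra_qspi_handle_timeout()`.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_STATUS_RDY (1u << 0) /* hypothetical "transfer done" bit */

struct model_qspi {
	uint32_t status;     /* simulated status register */
	int completed_xfers; /* transfers finished by the completion path */
};

/* Stand-in for the work the interrupt handler would normally do. */
static void model_process_completion(struct model_qspi *q)
{
	q->completed_xfers++;
	q->status &= ~MODEL_STATUS_RDY; /* acknowledge the ready flag */
}

/*
 * Called after the completion wait timed out: if the hardware reports
 * the transfer as done, finish it manually and report success;
 * otherwise report a real timeout.
 */
static int model_recover_from_timeout(struct model_qspi *q)
{
	if (q->status & MODEL_STATUS_RDY) {
		model_process_completion(q);
		return 0;
	}
	return -ETIMEDOUT;
}
```

A zero return here corresponds to the "interrupt timeout, but transfer
complete" case from the commit message, where the caller can continue
as if the interrupt had arrived.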

### Step 5.5: SEARCH FOR SIMILAR PATTERNS
The exact same pattern exists in both functions (combined and
non-combined), and the fix addresses both identically.

Record: [Two instances of the same pattern, both fixed.]

---

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

### Step 6.1: DOES THE BUGGY CODE EXIST IN STABLE TREES?
- **v6.19.y:** YES - has `tegra_qspi_handle_timeout()` and both
  WARN_ON/WARN_ON_ONCE. Patch applies cleanly (verified code is
  identical to v7.0).
- **v6.18.y and earlier:** NO - `tegra_qspi_handle_timeout()` does not
  exist. The timeout handling is completely different. This patch is
  INAPPLICABLE.
- **v6.6.y, v6.1.y, v5.15.y:** NO - completely different timeout code.

Record: [Applicable ONLY to v6.19.y and v7.0. No applicability to older
stable trees.]

### Step 6.2: CHECK FOR BACKPORT COMPLICATIONS
For v6.19.y: The code is identical - clean apply expected.

Record: [Clean apply to v6.19.y. Not applicable to older trees.]

### Step 6.3: CHECK IF RELATED FIXES ARE ALREADY IN STABLE
No related fix is in stable for this specific false positive WARN issue.

Record: [No existing fix in stable for this issue.]

---

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

### Step 7.1: IDENTIFY THE SUBSYSTEM AND ITS CRITICALITY
- Subsystem: `drivers/spi` - SPI bus driver for Tegra QSPI controller
- Criticality: IMPORTANT for Tegra (NVIDIA) platforms (widely used in
  embedded/automotive/Jetson)

Record: [SPI driver subsystem, IMPORTANT for Tegra/Jetson platforms]

### Step 7.2: ASSESS SUBSYSTEM ACTIVITY
Very active - 20 recent commits to this file, ongoing improvements and
bug fixes.

Record: [Very active subsystem with ongoing fixes]

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: DETERMINE WHO IS AFFECTED
All users of Tegra210+ QSPI hardware. This includes NVIDIA Jetson
platforms widely used in robotics, automotive, and embedded
applications.

Record: [All Tegra QSPI users - Jetson/embedded platforms]

### Step 8.2: DETERMINE THE TRIGGER CONDITIONS
- Trigger: Any QSPI transfer where interrupt is delayed (common with
  threaded IRQs)
- Frequency: Very common - author reports "almost all hosts in my fleet"
- No privilege required - happens during normal SPI operations

Record: [Very common, happens during normal QSPI operations, no special
trigger needed]

### Step 8.3: DETERMINE THE FAILURE MODE SEVERITY
- Without `panic_on_warn`: MEDIUM - log pollution, monitoring noise,
  full stack trace on recoverable events
- With `panic_on_warn=1`: CRITICAL - kernel panic on a non-error
  condition
- The warning count can be extremely high (94,451 in commit
  41c721fc09393's message)

Record: [MEDIUM severity (CRITICAL with panic_on_warn=1). False positive
warnings pollute logs and can cause panics.]

### Step 8.4: CALCULATE RISK-BENEFIT RATIO
- **BENEFIT:** HIGH - eliminates false positive warnings that are common
  across Tegra fleet. Prevents panics on `panic_on_warn=1` systems.
  Makes the warning actionable (only fires on real errors).
- **RISK:** VERY LOW - 2-line change per function, obviously correct, no
  behavioral change except WARN placement.

Record: [HIGH benefit, VERY LOW risk. Excellent ratio.]

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: COMPILE THE EVIDENCE

**FOR backporting:**
- Fixes a real, common false positive WARN that fires during normal
  recoverable operation
- Can cause kernel panics on `panic_on_warn=1` systems
- Author documents fleet-wide impact ("almost all hosts in my fleet")
- Fix is tiny (the same two-line change in two functions), obviously
  correct, very low regression risk
- Accepted directly by SPI maintainer Mark Brown without revisions
- Author is an experienced contributor to this driver with 9 commits
- Applies cleanly to v6.19.y stable tree

**AGAINST backporting:**
- Only applicable to v6.19.y (depends on v6.19's
  `tegra_qspi_handle_timeout()`)
- Without `panic_on_warn`, this is "just" log noise (though significant
  noise)
- Limited to Tegra QSPI hardware users

### Step 9.2: APPLY THE STABLE RULES CHECKLIST
1. Obviously correct and tested? **YES** - trivially verifiable, tested
   in author's fleet
2. Fixes a real bug that affects users? **YES** - false positive WARN,
   panic with panic_on_warn
3. Important issue? **YES** - WARN with full call trace on every
   recoverable timeout; panic on panic_on_warn
4. Small and contained? **YES** - 2 identical single-line changes in one
   file
5. No new features or APIs? **YES** - just moves WARN placement
6. Can apply to stable trees? **YES** - cleanly to v6.19.y

### Step 9.3: CHECK FOR EXCEPTION CATEGORIES
Not an exception category - this is a straightforward bug fix.

### Step 9.4: MAKE YOUR DECISION
The fix is small, surgical, obviously correct, and fixes a real problem
that affects real Tegra users in production. It eliminates false
positive warnings that can cause panics on `panic_on_warn=1` systems and
significantly reduces log noise. The fix meets all stable kernel
criteria.

---

## Verification

- [Phase 1] Parsed tags: Signed-off-by author (Breno Leitao),
  Signed-off-by maintainer (Mark Brown), Link to patch
- [Phase 2] Diff analysis: 2 identical changes in 2 functions, moving
  WARN from outer scope to real-timeout-only branch
- [Phase 3] git blame: WARN_ON_ONCE at line 1226 from commit
  41c721fc09393 (v6.15); recovery handler from 380fd29d57abe (v6.19)
- [Phase 3] git show 380fd29d57abe: confirmed this introduced
  tegra_qspi_handle_timeout(), the function that makes the false
  positive detectable
- [Phase 3] git show 41c721fc09393: confirmed 94,451 warnings reported
  per host
- [Phase 3] git log --author: author has 9 commits to this driver
- [Phase 4] b4 dig found prior series by same author; current patch is
  standalone v1 accepted directly
- [Phase 5] Functions modified are in the core SPI transfer path, called
  from tegra_qspi_transfer_one_message
- [Phase 6] v6.19 code verified identical to v7.0 for affected areas
  (tegra_qspi_handle_timeout exists, WARN_ON locations match)
- [Phase 6] v6.12, v6.6 confirmed: NO tegra_qspi_handle_timeout
  function, completely different timeout code - patch inapplicable
- [Phase 6] v6.15, v6.18 confirmed: WARN_ON_ONCE exists but NO
  tegra_qspi_handle_timeout - patch inapplicable
- [Phase 8] panic_on_warn confirmed in kernel/panic.c: WARN triggers
  check_panic_on_warn() which calls panic()
- UNVERIFIED: Could not access lore.kernel.org due to Anubis bot
  protection; unable to verify full review discussion

**YES**

 drivers/spi/spi-tegra210-quad.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
index 7ea5aa993596f..05f1753e2bbb1 100644
--- a/drivers/spi/spi-tegra210-quad.c
+++ b/drivers/spi/spi-tegra210-quad.c
@@ -1223,7 +1223,7 @@ static int tegra_qspi_combined_seq_xfer(struct tegra_qspi *tqspi,
 					(&tqspi->xfer_completion,
 					QSPI_DMA_TIMEOUT);
 
-			if (WARN_ON_ONCE(ret == 0)) {
+			if (ret == 0) {
 				/*
 				 * Check if hardware completed the transfer
 				 * even though interrupt was lost or delayed.
@@ -1232,6 +1232,7 @@ static int tegra_qspi_combined_seq_xfer(struct tegra_qspi *tqspi,
 				ret = tegra_qspi_handle_timeout(tqspi);
 				if (ret < 0) {
 					/* Real timeout - clean up and fail */
+					WARN_ON_ONCE(1);
 					dev_err(tqspi->dev, "transfer timeout\n");
 
 					/* Abort transfer by resetting pio/dma bit */
@@ -1340,7 +1341,7 @@ static int tegra_qspi_non_combined_seq_xfer(struct tegra_qspi *tqspi,
 
 		ret = wait_for_completion_timeout(&tqspi->xfer_completion,
 						  QSPI_DMA_TIMEOUT);
-		if (WARN_ON(ret == 0)) {
+		if (ret == 0) {
 			/*
 			 * Check if hardware completed the transfer even though
 			 * interrupt was lost or delayed. If so, process the
@@ -1349,6 +1350,7 @@ static int tegra_qspi_non_combined_seq_xfer(struct tegra_qspi *tqspi,
 			ret = tegra_qspi_handle_timeout(tqspi);
 			if (ret < 0) {
 				/* Real timeout - clean up and fail */
+				WARN_ON(1);
 				dev_err(tqspi->dev, "transfer timeout\n");
 
 				if (tqspi->is_curr_dma_xfer)
-- 
2.53.0



* [PATCH AUTOSEL 6.18] dt-bindings: net: Fix Tegra234 MGBE PTP clock
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
  2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-6.12] arm64: tegra: Fix snps,blen properties Sasha Levin
  2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.19] spi: tegra210-quad: Fix false positive WARN on interrupt timeout with transfer complete Sasha Levin
@ 2026-04-20 13:21 ` Sasha Levin
  2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] gpio: tegra: fix irq_release_resources calling enable instead of disable Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-04-20 13:21 UTC
  To: patches, stable
  Cc: Jon Hunter, Krzysztof Kozlowski, Jakub Kicinski, Sasha Levin,
	andrew+netdev, davem, edumazet, pabeni, robh, krzk+dt, conor+dt,
	thierry.reding, treding, vbhadram, netdev, devicetree,
	linux-tegra, linux-kernel

From: Jon Hunter <jonathanh@nvidia.com>

[ Upstream commit fb22b1fc5bca3c0aad95388933497ceb30f1fb26 ]

The PTP clock for the Tegra234 MGBE device is incorrectly named
'ptp-ref' and should be 'ptp_ref'. This is causing the following
warning to be observed on Tegra234 platforms that use this device:

 ERR KERN tegra-mgbe 6800000.ethernet eth0: Invalid PTP clock rate
 WARNING KERN tegra-mgbe 6800000.ethernet eth0: PTP init failed

Although this constitutes an ABI breakage in the binding for this
device, PTP support has clearly never worked and so fix this now
so we can correct the device-tree for this device. Note that the
MGBE driver still supports the legacy 'ptp-ref' clock name and so
older/existing device-trees will still work, but given that this
is not the correct name, there is no point to advertise this in the
binding.

Fixes: 189c2e5c7669 ("dt-bindings: net: Add Tegra234 MGBE")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260401102941.17466-3-jonathanh@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
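
The commit message notes that the MGBE driver still accepts the legacy
'ptp-ref' clock name, so existing device trees keep working after the
binding switches to 'ptp_ref'. A minimal sketch of that kind of
preferred-name-then-legacy-name lookup follows. It is an illustration only:
`find_clock_index()` and `find_ptp_clock_index()` are hypothetical helpers
operating on a plain string array, not functions from the stmmac or Tegra
MGBE driver, and the real driver may implement the fallback differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the position of 'wanted' in a clock-names
 * style string list, or -1 if it is not present.
 */
int find_clock_index(const char *const *names, size_t count,
		     const char *wanted)
{
	for (size_t i = 0; i < count; i++)
		if (strcmp(names[i], wanted) == 0)
			return (int)i;

	return -1;
}

/*
 * Hypothetical helper: prefer the corrected name "ptp_ref" and fall back
 * to the legacy "ptp-ref" so that older device trees still resolve.
 */
int find_ptp_clock_index(const char *const *names, size_t count)
{
	int idx;

	assert(names != NULL);

	idx = find_clock_index(names, count, "ptp_ref");
	if (idx < 0)
		idx = find_clock_index(names, count, "ptp-ref");

	return idx;
}
```

With this ordering, a device tree using either spelling resolves to the same
clock entry, while a list containing neither name reports -1.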

 .../devicetree/bindings/net/nvidia,tegra234-mgbe.yaml         | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml b/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml
index 2bd3efff2485e..215f14d1897d2 100644
--- a/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml
+++ b/Documentation/devicetree/bindings/net/nvidia,tegra234-mgbe.yaml
@@ -42,7 +42,7 @@ properties:
       - const: mgbe
       - const: mac
       - const: mac-divider
-      - const: ptp-ref
+      - const: ptp_ref
       - const: rx-input-m
       - const: rx-input
       - const: tx
@@ -133,7 +133,7 @@ examples:
                  <&bpmp TEGRA234_CLK_MGBE0_RX_PCS_M>,
                  <&bpmp TEGRA234_CLK_MGBE0_RX_PCS>,
                  <&bpmp TEGRA234_CLK_MGBE0_TX_PCS>;
-        clock-names = "mgbe", "mac", "mac-divider", "ptp-ref", "rx-input-m",
+        clock-names = "mgbe", "mac", "mac-divider", "ptp_ref", "rx-input-m",
                       "rx-input", "tx", "eee-pcs", "rx-pcs-input", "rx-pcs-m",
                       "rx-pcs", "tx-pcs";
         resets = <&bpmp TEGRA234_RESET_MGBE0_MAC>,
-- 
2.53.0


* [PATCH AUTOSEL 6.18] gpio: tegra: fix irq_release_resources calling enable instead of disable
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Samasth Norway Ananda, Bartosz Golaszewski, Sasha Levin, linusw,
	brgl, thierry.reding, jonathanh, linux-gpio, linux-tegra,
	linux-kernel

From: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>

[ Upstream commit 1561d96f5f55c1bca9ff047ace5813f4f244eea6 ]

tegra_gpio_irq_release_resources() erroneously calls tegra_gpio_enable()
instead of tegra_gpio_disable(). When IRQ resources are released, the
GPIO configuration bit (CNF) should be cleared to deconfigure the pin as
a GPIO. Leaving it enabled wastes power and can cause unexpected behavior
if the pin is later reused for an alternate function via pinctrl.

Fixes: 66fecef5bde0 ("gpio: tegra: Convert to gpio_irq_chip")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Link: https://patch.msgid.link/20260407210247.1737938-1-samasth.norway.ananda@oracle.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
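
The commit message describes the CNF bit as the per-pin flag that configures
a pin as a GPIO, and says releasing IRQ resources should clear it. The
sketch below models that pairing with a single in-memory word. Everything in
it is hypothetical: `struct fake_gpio_bank` and the `fake_*` functions are
stand-ins for illustration, not the register layout or code of
drivers/gpio/gpio-tegra.c, and it assumes the request path is the one that
sets the bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one bank's CNF register (one bit per pin). */
struct fake_gpio_bank {
	uint32_t cnf;
};

/* Set the CNF bit: the pin is configured as a GPIO. */
void fake_gpio_enable(struct fake_gpio_bank *bank, unsigned int pin)
{
	assert(pin < 32);
	bank->cnf |= UINT32_C(1) << pin;
}

/* Clear the CNF bit: the pin is no longer configured as a GPIO. */
void fake_gpio_disable(struct fake_gpio_bank *bank, unsigned int pin)
{
	assert(pin < 32);
	bank->cnf &= ~(UINT32_C(1) << pin);
}

int fake_gpio_is_configured(const struct fake_gpio_bank *bank,
			    unsigned int pin)
{
	assert(pin < 32);
	return (bank->cnf >> pin) & 1;
}

/* Assumed request path: claims the pin for IRQ use and enables it. */
void fake_irq_request_resources(struct fake_gpio_bank *bank,
				unsigned int pin)
{
	fake_gpio_enable(bank, pin);
}

/*
 * Release path: must undo the request. Calling fake_gpio_enable() here
 * instead would leave the CNF bit set, which is the bug being fixed.
 */
void fake_irq_release_resources(struct fake_gpio_bank *bank,
				unsigned int pin)
{
	fake_gpio_disable(bank, pin);
}
```

In this model, a request followed by a release returns the bit to its
cleared state, so the pin is free to be handed to an alternate function.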

 drivers/gpio/gpio-tegra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-tegra.c b/drivers/gpio/gpio-tegra.c
index 15a5762a82c25..b14052fe64ac6 100644
--- a/drivers/gpio/gpio-tegra.c
+++ b/drivers/gpio/gpio-tegra.c
@@ -595,7 +595,7 @@ static void tegra_gpio_irq_release_resources(struct irq_data *d)
 	struct tegra_gpio_info *tgi = gpiochip_get_data(chip);
 
 	gpiochip_relres_irq(chip, d->hwirq);
-	tegra_gpio_enable(tgi, d->hwirq);
+	tegra_gpio_disable(tgi, d->hwirq);
 }
 
 static void tegra_gpio_irq_print_chip(struct irq_data *d, struct seq_file *s)
-- 
2.53.0

