public inbox for dmaengine@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 6.19-6.18] dmaengine: idxd: Fix lockdep warnings when calling idxd_device_config()
       [not found] <20260330123842.756154-1-sashal@kernel.org>
@ 2026-03-30 12:38 ` Sasha Levin
  0 siblings, 0 replies; only message in thread
From: Sasha Levin @ 2026-03-30 12:38 UTC (permalink / raw)
  To: patches, stable
  Cc: Vinicius Costa Gomes, Dave Jiang, Vinod Koul, Sasha Levin,
	dmaengine, linux-kernel

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

[ Upstream commit caf91cdf2de8b7134749d32cd4ae5520b108abb7 ]

Move the check for IDXD_FLAG_CONFIGURABLE and the locking to "inside"
idxd_device_config(), as this is common to all callers, and the one
that wasn't holding the lock was an error (that was causing the
lockdep warning).

Suggested-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-1-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a thorough understanding of the code. Let me compile the
final analysis.

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: PARSE THE SUBJECT LINE**
Record: [dmaengine: idxd:] [Fix] [lockdep warnings when calling
idxd_device_config()] — "Fix" verb explicitly indicates a bug fix.

**Step 1.2: PARSE ALL COMMIT MESSAGE TAGS**
Record:
- Suggested-by: Dave Jiang <dave.jiang@intel.com> — IDXD designated
  reviewer per MAINTAINERS
- Reviewed-by: Dave Jiang <dave.jiang@intel.com> — same person,
  subsystem expert
- Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> —
  author; listed as IDXD maintainer (M:) in MAINTAINERS
- Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-
  queues-v3-v3-1-7ed70658a9d1@intel.com
- Signed-off-by: Vinod Koul <vkoul@kernel.org> — dmaengine subsystem
  maintainer who applied it
- No Fixes:, Reported-by:, Tested-by:, or Cc: stable tags (expected)

**Step 1.3: ANALYZE THE COMMIT BODY TEXT**
Record: Bug = `idxd_device_config()` had
`lockdep_assert_held(&idxd->dev_lock)` but at least one caller was NOT
holding the lock. The commit message explicitly says "the one that
wasn't holding the lock was an error." Symptom = lockdep warnings at
runtime. Fix = move lock acquisition and `IDXD_FLAG_CONFIGURABLE` check
inside `idxd_device_config()` to centralize for all callers.

**Step 1.4: DETECT HIDDEN BUG FIXES**
Record: This is explicitly labeled a "Fix" for lockdep warnings, but the
underlying issue is a real synchronization bug. The commit message says
the missing lock "was an error" — the function's callee contract
required `dev_lock` for device configuration MMIO/state writes, and at
least one caller violated it.

---

## PHASE 2: DIFF ANALYSIS

**Step 2.1: INVENTORY THE CHANGES**
Record: 2 files: `drivers/dma/idxd/device.c` (3 hunks),
`drivers/dma/idxd/init.c` (1 hunk). Net change: approximately -5 lines.
Functions modified: `idxd_device_config()`, `idxd_drv_enable_wq()`,
`idxd_device_drv_probe()`, `idxd_reset_done()`. Scope: single-driver
surgical fix.

**Step 2.2: UNDERSTAND THE CODE FLOW CHANGE**
- **Hunk 1 (idxd_device_config):**
  `lockdep_assert_held(&idxd->dev_lock)` replaced with
  `guard(spinlock)(&idxd->dev_lock)` (scope-based lock).
  `IDXD_FLAG_CONFIGURABLE` early-return check added. Lock is now
  acquired internally.
- **Hunk 2 (idxd_drv_enable_wq):** External `spin_lock/spin_unlock` +
  `IDXD_FLAG_CONFIGURABLE` check removed; direct call to
  `idxd_device_config()`.
- **Hunk 3 (idxd_device_drv_probe):** Same as Hunk 2.
- **Hunk 4 (idxd_reset_done in init.c):** Outer `IDXD_FLAG_CONFIGURABLE`
  guard removed; direct unconditional call.
- **Key insight:** The unmodified caller `idxd_device_reinit()` in
  `irq.c` (line 39) is automatically fixed because
  `idxd_device_config()` now acquires the lock internally.

**Step 2.3: IDENTIFY THE BUG MECHANISM**
Record: Synchronization / locking-contract bug. `idxd_device_config()`
asserted `lockdep_assert_held(&idxd->dev_lock)` but two callers violated
this: (1) `idxd_device_reinit()` in irq.c — present since v5.9, never
held the lock; (2) `idxd_reset_done()` in init.c — added in v6.14, also
didn't hold the lock. The fix centralizes lock acquisition inside the
callee.

**Step 2.4: ASSESS THE FIX QUALITY**
Record: Obviously correct — standard pattern of moving lock from callers
into callee. Uses well-established `guard(spinlock)` API. Minimal:
removes code from callers, adds to callee. Reviewed and suggested by the
IDXD designated reviewer (Dave Jiang). Very low regression risk — the
two callers that had the lock (drv_enable_wq, device_drv_probe) simply
shift the lock scope inside the function.

---

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: BLAME THE CHANGED LINES**
Record: `idxd_device_config()` with
`lockdep_assert_held(&idxd->dev_lock)` introduced by `bfe1d56091c1a4`
(Dave Jiang, 2020-01-21) — first appeared in v5.6. The unlocked
`idxd_device_reinit()` caller was introduced by `0d5c10b4c84d6a` (Dave
Jiang, 2020-06-26) — first appeared in v5.9. The unlocked
`idxd_reset_done()` caller from `98d187a9890360` (Fenghua Yu,
2024-11-22) — first appeared in v6.14.

**Step 3.2: FOLLOW THE FIXES TAG**
Record: No Fixes: tag present (expected — that's why this needs manual
review).

**Step 3.3: CHECK FILE HISTORY**
Record: Active idxd development. Recent commits include FLR support, UAF
fixes, deadlock fixes (`407171717a4f4`), spinlock-to-mutex conversions.
This fix is standalone.

**Step 3.4: CHECK THE AUTHOR**
Record: Vinicius Costa Gomes is the listed IDXD maintainer (M:) in
MAINTAINERS. Dave Jiang (Suggested-by/Reviewed-by) is the designated
reviewer (R:). Vinod Koul is the dmaengine subsystem maintainer who
applied it. Strong trust signal.

**Step 3.5: CHECK FOR DEPENDENCIES**
Record: Uses `guard(spinlock)` which requires `include/linux/cleanup.h`
— verified present in v6.6 (`cleanup.h` exists), NOT present in v6.1
(file does not exist at that tag). The `init.c` hunk only applies to
trees with FLR support (v6.14+). The `device.c` changes are standalone.

---

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1-4.4:** Lore.kernel.org and patch.msgid.link URLs were blocked
by Anubis bot protection. The Link header indicates this is part of a
series related to "idxd-fix-flr-on-kernel-queues-v3." No stable-specific
discussion could be verified externally.

Record: UNVERIFIED — could not access lore/patch thread directly due to
anti-bot blocking.

---

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: KEY FUNCTIONS**
Record: `idxd_device_config()`, `idxd_drv_enable_wq()`,
`idxd_device_drv_probe()`, `idxd_reset_done()`, and the unmodified but
affected `idxd_device_reinit()`.

**Step 5.2: TRACE CALLERS**
Verified all callers of `idxd_device_config()`:
1. `idxd_device_reinit()` in irq.c:39 — work queue callback for software
   reset recovery. **NO lock held — BUG (since v5.9)**
2. `idxd_reset_done()` in init.c:1097 — PCI FLR completion callback.
   **NO lock held — BUG (since v6.14)**
3. `idxd_drv_enable_wq()` in device.c:1455 — WQ enable path. Lock held
   correctly.
4. `idxd_device_drv_probe()` in device.c:1554 — Device driver probe.
   Lock held correctly.

**Step 5.3-5.4: CALL CHAINS**
- `idxd_device_reinit` ← `INIT_WORK` ← `idxd_halt()` ←
  `idxd_misc_thread()` (threaded IRQ handler via
  `request_threaded_irq`). Triggered when device enters halt state with
  software reset type.
- `idxd_reset_done` ← PCI error handler `.reset_done` callback.
  Triggered during FLR completion.
- Both are reachable during normal device operation under error/recovery
  conditions.

**Step 5.5: SIMILAR PATTERNS**
Record: Prior idxd locking fixes exist: `407171717a4f4` ("avoid deadlock
in process_misc_interrupts()") and `cf4ac3fef3388` (lockdep warning on
driver removal). This is a pattern in idxd of locking bugs being found
and fixed.

---

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1: BUGGY CODE IN STABLE TREES**
Verified:
- **v6.6:** `idxd_device_reinit()` calling `idxd_device_config()`
  without lock EXISTS (confirmed via `git show
  v6.6:drivers/dma/idxd/irq.c`). `idxd_reset_done()` does NOT exist
  (confirmed: `git grep idxd_device_config v6.6:drivers/dma/idxd/init.c`
  returns nothing).
- **v6.1:** Same unlocked `irq.c` caller exists, but `cleanup.h` /
  `guard(spinlock)` does NOT exist — backport would need traditional
  `spin_lock`/`spin_unlock`.
- **v6.14+:** Both buggy callers exist.

**Step 6.2: BACKPORT COMPLICATIONS**
- v6.6+: `guard(spinlock)` available. `device.c` changes apply cleanly.
  `init.c` hunk irrelevant (no FLR code).
- v6.1 and older: No `guard(spinlock)` — needs rework to explicit
  `spin_lock`/`spin_unlock`.

**Step 6.3: RELATED FIXES IN STABLE**
No prior fix for this specific `idxd_device_config()` lockdep/locking
issue found in stable.

---

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1:** `drivers/dma/idxd` — Intel Data Streaming Accelerator
(DSA) / Intel Analytics Accelerator (IAA). Used in data center /
enterprise environments. Criticality: PERIPHERAL-to-IMPORTANT (hardware-
specific but used in enterprise/cloud computing).

**Step 7.2:** Actively maintained with regular bug fixes.

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1: WHO IS AFFECTED**
Record: Users of Intel IDXD/DSA/IAA hardware, primarily enterprise/data
center systems.

**Step 8.2: TRIGGER CONDITIONS**
Record: The `irq.c` unlocked path is triggered when the device enters
halt state and does a software reset (interrupt-driven). The `init.c`
path is triggered during FLR completion (v6.14+). These are error
recovery paths, not routine hot paths. However, they are reachable
during normal device operation when hardware issues occur.

**Step 8.3: FAILURE MODE SEVERITY**
The verified symptom is a lockdep warning. However, the missing lock on
`idxd_device_config()` means the function executes MMIO configuration
writes (`idxd_wqs_config_write`, `idxd_groups_config_write`) and
manipulates shared device state without the required `dev_lock`. This
creates a real (not theoretical) race window where concurrent access to
device configuration could cause incorrect MMIO programming or corrupted
device state. Severity: MEDIUM-HIGH — lockdep warning verified, race on
MMIO/config paths is the underlying risk.

**Step 8.4: RISK-BENEFIT RATIO**
- Benefit: MEDIUM-HIGH — fixes real synchronization bug that lockdep
  detected; prevents potential device misconfiguration during error
  recovery
- Risk: VERY LOW — net -5 lines, purely moves existing logic, reviewed
  by subsystem maintainer/reviewer
- Ratio: Strongly favorable

---

## PHASE 9: FINAL SYNTHESIS

**Step 9.1: EVIDENCE FOR**
- Fixes a verified locking contract violation (lockdep_assert_held
  fails)
- The commit message explicitly says the missing lock "was an error"
- The unlocked `irq.c` caller has existed since v5.9 — long-standing bug
  affecting all stable trees with IDXD
- `idxd_device_config()` performs hardware MMIO writes — missing lock
  means real race on device state
- Small, surgical fix: net -5 lines, two files, single driver
- Written by the IDXD maintainer, suggested and reviewed by the IDXD
  designated reviewer
- Applied by the dmaengine subsystem maintainer
- Similar prior locking fixes in IDXD have been applied (e.g.,
  `407171717a4f4`)

**Step 9.2: EVIDENCE AGAINST**
- No verified crash, data corruption, or security issue — the verified
  symptom is a lockdep warning
- The init.c hunk only applies to v6.14+ (FLR code)
- v6.1 and older need rework (no `guard(spinlock)`)
- Niche hardware (Intel DSA/IAA, enterprise/data center)
- Could not verify mailing list discussion due to anti-bot blocking

**Step 9.3: STABLE RULES CHECKLIST**
1. Obviously correct and tested? **YES** — standard lock-inside-callee
   pattern, reviewed by subsystem expert
2. Fixes a real bug? **YES** — locking contract violation on hardware
   configuration path
3. Important issue? **YES** — race on MMIO/device configuration writes
   during error recovery; lockdep fires on any debug/CI kernel
4. Small and contained? **YES** — 2 files, net -5 lines, single driver
5. No new features or APIs? **YES** — pure bug fix
6. Can apply to stable? **YES** for v6.6+ (with init.c hunk dropped on
   pre-6.14); needs minor adaptation for v6.1

**Step 9.4: DECISION**
The fix addresses a real synchronization bug where device configuration
MMIO writes execute without the required `dev_lock`. The lockdep warning
is the verified symptom, but the underlying issue is a genuine race on
shared hardware state — missing locks on MMIO paths are not cosmetic.
The fix is small, obviously correct, and comes from the subsystem
maintainer chain. It meets all stable kernel criteria.

---

## Verification

- [Phase 1] Parsed all tags from commit message. Confirmed Vinicius
  Costa Gomes is IDXD maintainer (M:) and Dave Jiang is reviewer (R:) in
  MAINTAINERS file at line 12750-12752.
- [Phase 2] Read current `device.c` lines 1121-1145: confirmed
  `lockdep_assert_held(&idxd->dev_lock)` at line 1125.
- [Phase 2] Read current `device.c` lines 1452-1456: confirmed
  `idxd_drv_enable_wq()` takes `spin_lock(&idxd->dev_lock)` before
  calling `idxd_device_config()`.
- [Phase 2] Read current `device.c` lines 1552-1555: confirmed
  `idxd_device_drv_probe()` takes `spin_lock(&idxd->dev_lock)` before
  calling `idxd_device_config()`.
- [Phase 2] Read current `init.c` lines 1093-1102: confirmed
  `idxd_reset_done()` calls `idxd_device_config()` under
  `IDXD_FLAG_CONFIGURABLE` check but WITHOUT `dev_lock`.
- [Phase 2] Read current `irq.c` lines 32-64: confirmed
  `idxd_device_reinit()` calls `idxd_device_config()` at line 39 WITHOUT
  any lock and WITHOUT `IDXD_FLAG_CONFIGURABLE` check.
- [Phase 3] git blame `irq.c` lines 32-40: confirmed
  `idxd_device_reinit()` from commit `0d5c10b4c84d6a` (Dave Jiang,
  2020-06-26); `idxd_device_config()` call from `bfe1d56091c1a4`
  (2020-01-21).
- [Phase 3] git blame `device.c` lines 1121-1126: confirmed
  `idxd_device_config()` with `lockdep_assert_held` from
  `bfe1d56091c1a4`.
- [Phase 3] git blame `init.c` lines 1093-1102: confirmed
  `idxd_reset_done()` code from `98d187a9890360` (Fenghua Yu,
  2024-11-22).
- [Phase 3] `git describe --contains 0d5c10b4c84d6a` =
  `v5.9-rc1~96^2~1^2~52` — bug introduced in v5.9.
- [Phase 3] `git describe --contains 98d187a9890360` = `v6.14-rc1~43^2`
  — FLR caller introduced in v6.14.
- [Phase 3] `git describe --contains bfe1d56091c1a4` =
  `v5.6-rc1~196^2~7` — original function since v5.6.
- [Phase 5] Verified all callers via grep: `irq.c:39`, `init.c:1097`,
  `device.c:1455`, `device.c:1554`, `idxd.h:762` (declaration).
- [Phase 5] Read `irq.c` lines 400-421: verified `idxd_halt()` queues
  `idxd_device_reinit` via `INIT_WORK` + `queue_work` on software reset,
  and `idxd_device_flr` on FLR reset type.
- [Phase 6] `git show v6.6:drivers/dma/idxd/irq.c`: confirmed unlocked
  `idxd_device_reinit()` calling `idxd_device_config()` exists in v6.6.
- [Phase 6] `git show v6.6:drivers/dma/idxd/device.c` grep: confirmed
  `lockdep_assert_held` and locked callers exist in v6.6.
- [Phase 6] `git grep idxd_device_config v6.6:drivers/dma/idxd/init.c`:
  empty — FLR code does NOT exist in v6.6.
- [Phase 6] `git show v6.6:include/linux/cleanup.h`: exists —
  `guard(spinlock)` available in v6.6.
- [Phase 6] `git show v6.1:include/linux/cleanup.h`: does NOT exist —
  `guard(spinlock)` NOT available in v6.1.
- [Phase 8] Verified `idxd_device_config()` callees perform MMIO writes:
  `idxd_wqs_config_write()`, `idxd_groups_config_write()` at lines
  1136-1141.
- UNVERIFIED: Mailing list discussion content (lore blocked by Anubis
  anti-bot protection).
- UNVERIFIED: Whether an unprivileged user can reliably trigger the
  halt/software-reset path.

**YES**

 drivers/dma/idxd/device.c | 17 +++++++----------
 drivers/dma/idxd/init.c   | 10 ++++------
 2 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index c2cdf41b6e576..621f797b50d84 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -1122,7 +1122,11 @@ int idxd_device_config(struct idxd_device *idxd)
 {
 	int rc;
 
-	lockdep_assert_held(&idxd->dev_lock);
+	guard(spinlock)(&idxd->dev_lock);
+
+	if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+		return 0;
+
 	rc = idxd_wqs_setup(idxd);
 	if (rc < 0)
 		return rc;
@@ -1449,11 +1453,7 @@ int idxd_drv_enable_wq(struct idxd_wq *wq)
 		}
 	}
 
-	rc = 0;
-	spin_lock(&idxd->dev_lock);
-	if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
-		rc = idxd_device_config(idxd);
-	spin_unlock(&idxd->dev_lock);
+	rc = idxd_device_config(idxd);
 	if (rc < 0) {
 		dev_dbg(dev, "Writing wq %d config failed: %d\n", wq->id, rc);
 		goto err;
@@ -1549,10 +1549,7 @@ int idxd_device_drv_probe(struct idxd_dev *idxd_dev)
 	}
 
 	/* Device configuration */
-	spin_lock(&idxd->dev_lock);
-	if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
-		rc = idxd_device_config(idxd);
-	spin_unlock(&idxd->dev_lock);
+	rc = idxd_device_config(idxd);
 	if (rc < 0)
 		return -ENXIO;
 
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 2acc34b3daff8..c29f9123934ae 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -1093,12 +1093,10 @@ static void idxd_reset_done(struct pci_dev *pdev)
 	idxd_device_config_restore(idxd, idxd->idxd_saved);
 
 	/* Re-configure IDXD device if allowed. */
-	if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
-		rc = idxd_device_config(idxd);
-		if (rc < 0) {
-			dev_err(dev, "HALT: %s config fails\n", idxd_name);
-			goto out;
-		}
+	rc = idxd_device_config(idxd);
+	if (rc < 0) {
+		dev_err(dev, "HALT: %s config fails\n", idxd_name);
+		goto out;
 	}
 
 	/* Bind IDXD device to driver. */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] only message in thread

only message in thread, other threads:[~2026-03-30 12:39 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260330123842.756154-1-sashal@kernel.org>
2026-03-30 12:38 ` [PATCH AUTOSEL 6.19-6.18] dmaengine: idxd: Fix lockdep warnings when calling idxd_device_config() Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox