* [PATCH AUTOSEL 6.19-6.18] dmaengine: idxd: Fix lockdep warnings when calling idxd_device_config()
[not found] <20260330123842.756154-1-sashal@kernel.org>
@ 2026-03-30 12:38 ` Sasha Levin
0 siblings, 0 replies; only message in thread
From: Sasha Levin @ 2026-03-30 12:38 UTC (permalink / raw)
To: patches, stable
Cc: Vinicius Costa Gomes, Dave Jiang, Vinod Koul, Sasha Levin,
dmaengine, linux-kernel
From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
[ Upstream commit caf91cdf2de8b7134749d32cd4ae5520b108abb7 ]
Move the check for IDXD_FLAG_CONFIGURABLE and the locking to "inside"
idxd_device_config(), as this is common to all callers, and the one
that wasn't holding the lock was an error (that was causing the
lockdep warning).
Suggested-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-1-7ed70658a9d1@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a thorough understanding of the code. Let me compile the
final analysis.
## PHASE 1: COMMIT MESSAGE FORENSICS
**Step 1.1: PARSE THE SUBJECT LINE**
Record: [dmaengine: idxd:] [Fix] [lockdep warnings when calling
idxd_device_config()] — "Fix" verb explicitly indicates a bug fix.
**Step 1.2: PARSE ALL COMMIT MESSAGE TAGS**
Record:
- Suggested-by: Dave Jiang <dave.jiang@intel.com> — IDXD designated
reviewer per MAINTAINERS
- Reviewed-by: Dave Jiang <dave.jiang@intel.com> — same person,
subsystem expert
- Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> —
author; listed as IDXD maintainer (M:) in MAINTAINERS
- Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-
queues-v3-v3-1-7ed70658a9d1@intel.com
- Signed-off-by: Vinod Koul <vkoul@kernel.org> — dmaengine subsystem
maintainer who applied it
- No Fixes:, Reported-by:, Tested-by:, or Cc: stable tags (expected)
**Step 1.3: ANALYZE THE COMMIT BODY TEXT**
Record: Bug = `idxd_device_config()` had
`lockdep_assert_held(&idxd->dev_lock)` but at least one caller was NOT
holding the lock. The commit message explicitly says "the one that
wasn't holding the lock was an error." Symptom = lockdep warnings at
runtime. Fix = move lock acquisition and `IDXD_FLAG_CONFIGURABLE` check
inside `idxd_device_config()` to centralize for all callers.
**Step 1.4: DETECT HIDDEN BUG FIXES**
Record: This is explicitly labeled a "Fix" for lockdep warnings, but the
underlying issue is a real synchronization bug. The commit message says
the missing lock "was an error" — the function's callee contract
required `dev_lock` for device configuration MMIO/state writes, and at
least one caller violated it.
---
## PHASE 2: DIFF ANALYSIS
**Step 2.1: INVENTORY THE CHANGES**
Record: 2 files: `drivers/dma/idxd/device.c` (3 hunks),
`drivers/dma/idxd/init.c` (1 hunk). Net change: approximately -5 lines.
Functions modified: `idxd_device_config()`, `idxd_drv_enable_wq()`,
`idxd_device_drv_probe()`, `idxd_reset_done()`. Scope: single-driver
surgical fix.
**Step 2.2: UNDERSTAND THE CODE FLOW CHANGE**
- **Hunk 1 (idxd_device_config):**
`lockdep_assert_held(&idxd->dev_lock)` replaced with
`guard(spinlock)(&idxd->dev_lock)` (scope-based lock).
`IDXD_FLAG_CONFIGURABLE` early-return check added. Lock is now
acquired internally.
- **Hunk 2 (idxd_drv_enable_wq):** External `spin_lock/spin_unlock` +
`IDXD_FLAG_CONFIGURABLE` check removed; direct call to
`idxd_device_config()`.
- **Hunk 3 (idxd_device_drv_probe):** Same as Hunk 2.
- **Hunk 4 (idxd_reset_done in init.c):** Outer `IDXD_FLAG_CONFIGURABLE`
guard removed; direct unconditional call.
- **Key insight:** The unmodified caller `idxd_device_reinit()` in
`irq.c` (line 39) is automatically fixed because
`idxd_device_config()` now acquires the lock internally.
**Step 2.3: IDENTIFY THE BUG MECHANISM**
Record: Synchronization / locking-contract bug. `idxd_device_config()`
asserted `lockdep_assert_held(&idxd->dev_lock)` but two callers violated
this: (1) `idxd_device_reinit()` in irq.c — present since v5.9, never
held the lock; (2) `idxd_reset_done()` in init.c — added in v6.14, also
didn't hold the lock. The fix centralizes lock acquisition inside the
callee.
**Step 2.4: ASSESS THE FIX QUALITY**
Record: Obviously correct — standard pattern of moving lock from callers
into callee. Uses well-established `guard(spinlock)` API. Minimal:
removes code from callers, adds to callee. Reviewed and suggested by the
IDXD designated reviewer (Dave Jiang). Very low regression risk — the
two callers that had the lock (drv_enable_wq, device_drv_probe) simply
shift the lock scope inside the function.
---
## PHASE 3: GIT HISTORY INVESTIGATION
**Step 3.1: BLAME THE CHANGED LINES**
Record: `idxd_device_config()` with
`lockdep_assert_held(&idxd->dev_lock)` introduced by `bfe1d56091c1a4`
(Dave Jiang, 2020-01-21) — first appeared in v5.6. The unlocked
`idxd_device_reinit()` caller was introduced by `0d5c10b4c84d6a` (Dave
Jiang, 2020-06-26) — first appeared in v5.9. The unlocked
`idxd_reset_done()` caller from `98d187a9890360` (Fenghua Yu,
2024-11-22) — first appeared in v6.14.
**Step 3.2: FOLLOW THE FIXES TAG**
Record: No Fixes: tag present (expected — that's why this needs manual
review).
**Step 3.3: CHECK FILE HISTORY**
Record: Active idxd development. Recent commits include FLR support, UAF
fixes, deadlock fixes (`407171717a4f4`), spinlock-to-mutex conversions.
This fix is standalone.
**Step 3.4: CHECK THE AUTHOR**
Record: Vinicius Costa Gomes is the listed IDXD maintainer (M:) in
MAINTAINERS. Dave Jiang (Suggested-by/Reviewed-by) is the designated
reviewer (R:). Vinod Koul is the dmaengine subsystem maintainer who
applied it. Strong trust signal.
**Step 3.5: CHECK FOR DEPENDENCIES**
Record: Uses `guard(spinlock)` which requires `include/linux/cleanup.h`
— verified present in v6.6 (`cleanup.h` exists), NOT present in v6.1
(file does not exist at that tag). The `init.c` hunk only applies to
trees with FLR support (v6.14+). The `device.c` changes are standalone.
---
## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH
**Step 4.1-4.4:** Lore.kernel.org and patch.msgid.link URLs were blocked
by Anubis bot protection. The Link header indicates this is part of a
series related to "idxd-fix-flr-on-kernel-queues-v3." No stable-specific
discussion could be verified externally.
Record: UNVERIFIED — could not access lore/patch thread directly due to
anti-bot blocking.
---
## PHASE 5: CODE SEMANTIC ANALYSIS
**Step 5.1: KEY FUNCTIONS**
Record: `idxd_device_config()`, `idxd_drv_enable_wq()`,
`idxd_device_drv_probe()`, `idxd_reset_done()`, and the unmodified but
affected `idxd_device_reinit()`.
**Step 5.2: TRACE CALLERS**
Verified all callers of `idxd_device_config()`:
1. `idxd_device_reinit()` in irq.c:39 — work queue callback for software
reset recovery. **NO lock held — BUG (since v5.9)**
2. `idxd_reset_done()` in init.c:1097 — PCI FLR completion callback.
**NO lock held — BUG (since v6.14)**
3. `idxd_drv_enable_wq()` in device.c:1455 — WQ enable path. Lock held
correctly.
4. `idxd_device_drv_probe()` in device.c:1554 — Device driver probe.
Lock held correctly.
**Step 5.3-5.4: CALL CHAINS**
- `idxd_device_reinit` ← `INIT_WORK` ← `idxd_halt()` ←
`idxd_misc_thread()` (threaded IRQ handler via
`request_threaded_irq`). Triggered when device enters halt state with
software reset type.
- `idxd_reset_done` ← PCI error handler `.reset_done` callback.
Triggered during FLR completion.
- Both are reachable during normal device operation under error/recovery
conditions.
**Step 5.5: SIMILAR PATTERNS**
Record: Prior idxd locking fixes exist: `407171717a4f4` ("avoid deadlock
in process_misc_interrupts()") and `cf4ac3fef3388` (lockdep warning on
driver removal). This is a pattern in idxd of locking bugs being found
and fixed.
---
## PHASE 6: STABLE TREE ANALYSIS
**Step 6.1: BUGGY CODE IN STABLE TREES**
Verified:
- **v6.6:** `idxd_device_reinit()` calling `idxd_device_config()`
without lock EXISTS (confirmed via `git show
v6.6:drivers/dma/idxd/irq.c`). `idxd_reset_done()` does NOT exist
(confirmed: `git grep idxd_device_config v6.6:drivers/dma/idxd/init.c`
returns nothing).
- **v6.1:** Same unlocked `irq.c` caller exists, but `cleanup.h` /
`guard(spinlock)` does NOT exist — backport would need traditional
`spin_lock`/`spin_unlock`.
- **v6.14+:** Both buggy callers exist.
**Step 6.2: BACKPORT COMPLICATIONS**
- v6.6+: `guard(spinlock)` available. `device.c` changes apply cleanly.
`init.c` hunk irrelevant (no FLR code).
- v6.1 and older: No `guard(spinlock)` — needs rework to explicit
`spin_lock`/`spin_unlock`.
**Step 6.3: RELATED FIXES IN STABLE**
No prior fix for this specific `idxd_device_config()` lockdep/locking
issue found in stable.
---
## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT
**Step 7.1:** `drivers/dma/idxd` — Intel Data Streaming Accelerator
(DSA) / Intel Analytics Accelerator (IAA). Used in data center /
enterprise environments. Criticality: PERIPHERAL-to-IMPORTANT (hardware-
specific but used in enterprise/cloud computing).
**Step 7.2:** Actively maintained with regular bug fixes.
---
## PHASE 8: IMPACT AND RISK ASSESSMENT
**Step 8.1: WHO IS AFFECTED**
Record: Users of Intel IDXD/DSA/IAA hardware, primarily enterprise/data
center systems.
**Step 8.2: TRIGGER CONDITIONS**
Record: The `irq.c` unlocked path is triggered when the device enters
halt state and does a software reset (interrupt-driven). The `init.c`
path is triggered during FLR completion (v6.14+). These are error
recovery paths, not routine hot paths. However, they are reachable
during normal device operation when hardware issues occur.
**Step 8.3: FAILURE MODE SEVERITY**
The verified symptom is a lockdep warning. However, the missing lock on
`idxd_device_config()` means the function executes MMIO configuration
writes (`idxd_wqs_config_write`, `idxd_groups_config_write`) and
manipulates shared device state without the required `dev_lock`. This
creates a real (not theoretical) race window where concurrent access to
device configuration could cause incorrect MMIO programming or corrupted
device state. Severity: MEDIUM-HIGH — lockdep warning verified, race on
MMIO/config paths is the underlying risk.
**Step 8.4: RISK-BENEFIT RATIO**
- Benefit: MEDIUM-HIGH — fixes real synchronization bug that lockdep
detected; prevents potential device misconfiguration during error
recovery
- Risk: VERY LOW — net -5 lines, purely moves existing logic, reviewed
by subsystem maintainer/reviewer
- Ratio: Strongly favorable
---
## PHASE 9: FINAL SYNTHESIS
**Step 9.1: EVIDENCE FOR**
- Fixes a verified locking contract violation (lockdep_assert_held
fails)
- The commit message explicitly says the missing lock "was an error"
- The unlocked `irq.c` caller has existed since v5.9 — long-standing bug
affecting all stable trees with IDXD
- `idxd_device_config()` performs hardware MMIO writes — missing lock
means real race on device state
- Small, surgical fix: net -5 lines, two files, single driver
- Written by the IDXD maintainer, suggested and reviewed by the IDXD
designated reviewer
- Applied by the dmaengine subsystem maintainer
- Similar prior locking fixes in IDXD have been applied (e.g.,
`407171717a4f4`)
**Step 9.2: EVIDENCE AGAINST**
- No verified crash, data corruption, or security issue — the verified
symptom is a lockdep warning
- The init.c hunk only applies to v6.14+ (FLR code)
- v6.1 and older need rework (no `guard(spinlock)`)
- Niche hardware (Intel DSA/IAA, enterprise/data center)
- Could not verify mailing list discussion due to anti-bot blocking
**Step 9.3: STABLE RULES CHECKLIST**
1. Obviously correct and tested? **YES** — standard lock-inside-callee
pattern, reviewed by subsystem expert
2. Fixes a real bug? **YES** — locking contract violation on hardware
configuration path
3. Important issue? **YES** — race on MMIO/device configuration writes
during error recovery; lockdep fires on any debug/CI kernel
4. Small and contained? **YES** — 2 files, net -5 lines, single driver
5. No new features or APIs? **YES** — pure bug fix
6. Can apply to stable? **YES** for v6.6+ (with init.c hunk dropped on
pre-6.14); needs minor adaptation for v6.1
**Step 9.4: DECISION**
The fix addresses a real synchronization bug where device configuration
MMIO writes execute without the required `dev_lock`. The lockdep warning
is the verified symptom, but the underlying issue is a genuine race on
shared hardware state — missing locks on MMIO paths are not cosmetic.
The fix is small, obviously correct, and comes from the subsystem
maintainer chain. It meets all stable kernel criteria.
---
## Verification
- [Phase 1] Parsed all tags from commit message. Confirmed Vinicius
Costa Gomes is IDXD maintainer (M:) and Dave Jiang is reviewer (R:) in
MAINTAINERS file at line 12750-12752.
- [Phase 2] Read current `device.c` lines 1121-1145: confirmed
`lockdep_assert_held(&idxd->dev_lock)` at line 1125.
- [Phase 2] Read current `device.c` lines 1452-1456: confirmed
`idxd_drv_enable_wq()` takes `spin_lock(&idxd->dev_lock)` before
calling `idxd_device_config()`.
- [Phase 2] Read current `device.c` lines 1552-1555: confirmed
`idxd_device_drv_probe()` takes `spin_lock(&idxd->dev_lock)` before
calling `idxd_device_config()`.
- [Phase 2] Read current `init.c` lines 1093-1102: confirmed
`idxd_reset_done()` calls `idxd_device_config()` under
`IDXD_FLAG_CONFIGURABLE` check but WITHOUT `dev_lock`.
- [Phase 2] Read current `irq.c` lines 32-64: confirmed
`idxd_device_reinit()` calls `idxd_device_config()` at line 39 WITHOUT
any lock and WITHOUT `IDXD_FLAG_CONFIGURABLE` check.
- [Phase 3] git blame `irq.c` lines 32-40: confirmed
`idxd_device_reinit()` from commit `0d5c10b4c84d6a` (Dave Jiang,
2020-06-26); `idxd_device_config()` call from `bfe1d56091c1a4`
(2020-01-21).
- [Phase 3] git blame `device.c` lines 1121-1126: confirmed
`idxd_device_config()` with `lockdep_assert_held` from
`bfe1d56091c1a4`.
- [Phase 3] git blame `init.c` lines 1093-1102: confirmed
`idxd_reset_done()` code from `98d187a9890360` (Fenghua Yu,
2024-11-22).
- [Phase 3] `git describe --contains 0d5c10b4c84d6a` =
`v5.9-rc1~96^2~1^2~52` — bug introduced in v5.9.
- [Phase 3] `git describe --contains 98d187a9890360` = `v6.14-rc1~43^2`
— FLR caller introduced in v6.14.
- [Phase 3] `git describe --contains bfe1d56091c1a4` =
`v5.6-rc1~196^2~7` — original function since v5.6.
- [Phase 5] Verified all callers via grep: `irq.c:39`, `init.c:1097`,
`device.c:1455`, `device.c:1554`, `idxd.h:762` (declaration).
- [Phase 5] Read `irq.c` lines 400-421: verified `idxd_halt()` queues
`idxd_device_reinit` via `INIT_WORK` + `queue_work` on software reset,
and `idxd_device_flr` on FLR reset type.
- [Phase 6] `git show v6.6:drivers/dma/idxd/irq.c`: confirmed unlocked
`idxd_device_reinit()` calling `idxd_device_config()` exists in v6.6.
- [Phase 6] `git show v6.6:drivers/dma/idxd/device.c` grep: confirmed
`lockdep_assert_held` and locked callers exist in v6.6.
- [Phase 6] `git grep idxd_device_config v6.6:drivers/dma/idxd/init.c`:
empty — FLR code does NOT exist in v6.6.
- [Phase 6] `git show v6.6:include/linux/cleanup.h`: exists —
`guard(spinlock)` available in v6.6.
- [Phase 6] `git show v6.1:include/linux/cleanup.h`: does NOT exist —
`guard(spinlock)` NOT available in v6.1.
- [Phase 8] Verified `idxd_device_config()` callees perform MMIO writes:
`idxd_wqs_config_write()`, `idxd_groups_config_write()` at lines
1136-1141.
- UNVERIFIED: Mailing list discussion content (lore blocked by Anubis
anti-bot protection).
- UNVERIFIED: Whether an unprivileged user can reliably trigger the
halt/software-reset path.
**YES**
drivers/dma/idxd/device.c | 17 +++++++----------
drivers/dma/idxd/init.c | 10 ++++------
2 files changed, 11 insertions(+), 16 deletions(-)
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index c2cdf41b6e576..621f797b50d84 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -1122,7 +1122,11 @@ int idxd_device_config(struct idxd_device *idxd)
{
int rc;
- lockdep_assert_held(&idxd->dev_lock);
+ guard(spinlock)(&idxd->dev_lock);
+
+ if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+ return 0;
+
rc = idxd_wqs_setup(idxd);
if (rc < 0)
return rc;
@@ -1449,11 +1453,7 @@ int idxd_drv_enable_wq(struct idxd_wq *wq)
}
}
- rc = 0;
- spin_lock(&idxd->dev_lock);
- if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
- rc = idxd_device_config(idxd);
- spin_unlock(&idxd->dev_lock);
+ rc = idxd_device_config(idxd);
if (rc < 0) {
dev_dbg(dev, "Writing wq %d config failed: %d\n", wq->id, rc);
goto err;
@@ -1549,10 +1549,7 @@ int idxd_device_drv_probe(struct idxd_dev *idxd_dev)
}
/* Device configuration */
- spin_lock(&idxd->dev_lock);
- if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
- rc = idxd_device_config(idxd);
- spin_unlock(&idxd->dev_lock);
+ rc = idxd_device_config(idxd);
if (rc < 0)
return -ENXIO;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 2acc34b3daff8..c29f9123934ae 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -1093,12 +1093,10 @@ static void idxd_reset_done(struct pci_dev *pdev)
idxd_device_config_restore(idxd, idxd->idxd_saved);
/* Re-configure IDXD device if allowed. */
- if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
- rc = idxd_device_config(idxd);
- if (rc < 0) {
- dev_err(dev, "HALT: %s config fails\n", idxd_name);
- goto out;
- }
+ rc = idxd_device_config(idxd);
+ if (rc < 0) {
+ dev_err(dev, "HALT: %s config fails\n", idxd_name);
+ goto out;
}
/* Bind IDXD device to driver. */
--
2.53.0
^ permalink raw reply related [flat|nested] only message in thread
only message in thread, other threads:[~2026-03-30 12:39 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260330123842.756154-1-sashal@kernel.org>
2026-03-30 12:38 ` [PATCH AUTOSEL 6.19-6.18] dmaengine: idxd: Fix lockdep warnings when calling idxd_device_config() Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox