From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A0C43D349C; Mon, 30 Mar 2026 12:39:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774874343; cv=none; b=fTkWHPRTaKQtuYBf0zeX8qNCh8ePgKBzVt4JvYGU0Ucj+JHtI5m79A4qJP1eIEZoKMlIVe1PA4nfBkQb4lKdCjREjdhMFMAhIvZvISb/Sm4yaDculbkDmL4RBX/LtaztfYaOyIE7KZiqRo5ZuGJEXHbVllQE6F+98U6OVTYaJBc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774874343; c=relaxed/simple; bh=4Q/+NjtiN41IHzlIUFHT3FSSM5dzjFkMurNrExV5Mco=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=MT4UkvQij5jMP7OfmPkpEaNvtaZqiThrQlwMFhcdRerwgHaAii1gTmpVDT23n7NaGiO7KmvHGqQkSEYi2m5Ld6GdxS+qQ66+FgnxK86HegEk4kVCPil7Z0jUl8YqWYhX0EPNKH+yuzbZqpEyjeelib937OnqJ3zij0nQ3NWWAs0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Dwwk/eh4; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Dwwk/eh4" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0549FC2BCB4; Mon, 30 Mar 2026 12:39:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774874342; bh=4Q/+NjtiN41IHzlIUFHT3FSSM5dzjFkMurNrExV5Mco=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Dwwk/eh4Yz0MNY53ow+aknYWHiw/RWnlQm5zxf5k+l7vtPasuRMRqojUQXV+2DxaH UOW71mWIofKhhXghZVOSHbhy4hsI77f+N12W8MwWDRFg6nhIu8q1CzFTibEyKvi/nz Jyfm9NRwZwBSAVYwbLYA/yMPR9F10q4EMG38mZskUayiEepjpWRtrn0aKA0n5KoORz ryMyP5bL11MloTi5nZ/fp5y36JFgted/wxJOlXOWqk+wLJgdyaBrmO2hQI5IuaFKOV yG5RvIWQOFI0896v8jOntq1BBH8VD15AiwltqczTBeO+pzk4vLLjYUg/WGdchyG9dC CHKlZ4E0KiW8g== From: Sasha Levin To: patches@lists.linux.dev, stable@vger.kernel.org Cc: Vinicius Costa Gomes , Dave Jiang , Vinod Koul , Sasha Levin , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH AUTOSEL 6.19-6.18] dmaengine: idxd: Fix lockdep warnings when calling idxd_device_config() Date: Mon, 30 Mar 2026 08:38:27 -0400 Message-ID: <20260330123842.756154-14-sashal@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260330123842.756154-1-sashal@kernel.org> References: <20260330123842.756154-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 6.19.10 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Vinicius Costa Gomes [ Upstream commit caf91cdf2de8b7134749d32cd4ae5520b108abb7 ] Move the check for IDXD_FLAG_CONFIGURABLE and the locking to "inside" idxd_device_config(), as this is common to all callers, and the one that wasn't holding the lock was an error (that was causing the lockdep warning). Suggested-by: Dave Jiang Reviewed-by: Dave Jiang Signed-off-by: Vinicius Costa Gomes Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel-queues-v3-v3-1-7ed70658a9d1@intel.com Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- LLM Generated explanations, may be completely bogus: Now I have a thorough understanding of the code. Let me compile the final analysis. ## PHASE 1: COMMIT MESSAGE FORENSICS **Step 1.1: PARSE THE SUBJECT LINE** Record: [dmaengine: idxd:] [Fix] [lockdep warnings when calling idxd_device_config()] — "Fix" verb explicitly indicates a bug fix. **Step 1.2: PARSE ALL COMMIT MESSAGE TAGS** Record: - Suggested-by: Dave Jiang — IDXD designated reviewer per MAINTAINERS - Reviewed-by: Dave Jiang — same person, subsystem expert - Signed-off-by: Vinicius Costa Gomes — author; listed as IDXD maintainer (M:) in MAINTAINERS - Link: https://patch.msgid.link/20260121-idxd-fix-flr-on-kernel- queues-v3-v3-1-7ed70658a9d1@intel.com - Signed-off-by: Vinod Koul — dmaengine subsystem maintainer who applied it - No Fixes:, Reported-by:, Tested-by:, or Cc: stable tags (expected) **Step 1.3: ANALYZE THE COMMIT BODY TEXT** Record: Bug = `idxd_device_config()` had `lockdep_assert_held(&idxd->dev_lock)` but at least one caller was NOT holding the lock. The commit message explicitly says "the one that wasn't holding the lock was an error." Symptom = lockdep warnings at runtime. Fix = move lock acquisition and `IDXD_FLAG_CONFIGURABLE` check inside `idxd_device_config()` to centralize for all callers. **Step 1.4: DETECT HIDDEN BUG FIXES** Record: This is explicitly labeled a "Fix" for lockdep warnings, but the underlying issue is a real synchronization bug. The commit message says the missing lock "was an error" — the function's callee contract required `dev_lock` for device configuration MMIO/state writes, and at least one caller violated it. --- ## PHASE 2: DIFF ANALYSIS **Step 2.1: INVENTORY THE CHANGES** Record: 2 files: `drivers/dma/idxd/device.c` (3 hunks), `drivers/dma/idxd/init.c` (1 hunk). Net change: approximately -5 lines. Functions modified: `idxd_device_config()`, `idxd_drv_enable_wq()`, `idxd_device_drv_probe()`, `idxd_reset_done()`. Scope: single-driver surgical fix. **Step 2.2: UNDERSTAND THE CODE FLOW CHANGE** - **Hunk 1 (idxd_device_config):** `lockdep_assert_held(&idxd->dev_lock)` replaced with `guard(spinlock)(&idxd->dev_lock)` (scope-based lock). `IDXD_FLAG_CONFIGURABLE` early-return check added. Lock is now acquired internally. - **Hunk 2 (idxd_drv_enable_wq):** External `spin_lock/spin_unlock` + `IDXD_FLAG_CONFIGURABLE` check removed; direct call to `idxd_device_config()`. - **Hunk 3 (idxd_device_drv_probe):** Same as Hunk 2. - **Hunk 4 (idxd_reset_done in init.c):** Outer `IDXD_FLAG_CONFIGURABLE` guard removed; direct unconditional call. - **Key insight:** The unmodified caller `idxd_device_reinit()` in `irq.c` (line 39) is automatically fixed because `idxd_device_config()` now acquires the lock internally. **Step 2.3: IDENTIFY THE BUG MECHANISM** Record: Synchronization / locking-contract bug. `idxd_device_config()` asserted `lockdep_assert_held(&idxd->dev_lock)` but two callers violated this: (1) `idxd_device_reinit()` in irq.c — present since v5.9, never held the lock; (2) `idxd_reset_done()` in init.c — added in v6.14, also didn't hold the lock. The fix centralizes lock acquisition inside the callee. **Step 2.4: ASSESS THE FIX QUALITY** Record: Obviously correct — standard pattern of moving lock from callers into callee. Uses well-established `guard(spinlock)` API. Minimal: removes code from callers, adds to callee. Reviewed and suggested by the IDXD designated reviewer (Dave Jiang). Very low regression risk — the two callers that had the lock (drv_enable_wq, device_drv_probe) simply shift the lock scope inside the function. --- ## PHASE 3: GIT HISTORY INVESTIGATION **Step 3.1: BLAME THE CHANGED LINES** Record: `idxd_device_config()` with `lockdep_assert_held(&idxd->dev_lock)` introduced by `bfe1d56091c1a4` (Dave Jiang, 2020-01-21) — first appeared in v5.6. The unlocked `idxd_device_reinit()` caller was introduced by `0d5c10b4c84d6a` (Dave Jiang, 2020-06-26) — first appeared in v5.9. The unlocked `idxd_reset_done()` caller from `98d187a9890360` (Fenghua Yu, 2024-11-22) — first appeared in v6.14. **Step 3.2: FOLLOW THE FIXES TAG** Record: No Fixes: tag present (expected — that's why this needs manual review). **Step 3.3: CHECK FILE HISTORY** Record: Active idxd development. Recent commits include FLR support, UAF fixes, deadlock fixes (`407171717a4f4`), spinlock-to-mutex conversions. This fix is standalone. **Step 3.4: CHECK THE AUTHOR** Record: Vinicius Costa Gomes is the listed IDXD maintainer (M:) in MAINTAINERS. Dave Jiang (Suggested-by/Reviewed-by) is the designated reviewer (R:). Vinod Koul is the dmaengine subsystem maintainer who applied it. Strong trust signal. **Step 3.5: CHECK FOR DEPENDENCIES** Record: Uses `guard(spinlock)` which requires `include/linux/cleanup.h` — verified present in v6.6 (`cleanup.h` exists), NOT present in v6.1 (file does not exist at that tag). The `init.c` hunk only applies to trees with FLR support (v6.14+). The `device.c` changes are standalone. --- ## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH **Step 4.1-4.4:** Lore.kernel.org and patch.msgid.link URLs were blocked by Anubis bot protection. The Link header indicates this is part of a series related to "idxd-fix-flr-on-kernel-queues-v3." No stable-specific discussion could be verified externally. Record: UNVERIFIED — could not access lore/patch thread directly due to anti-bot blocking. --- ## PHASE 5: CODE SEMANTIC ANALYSIS **Step 5.1: KEY FUNCTIONS** Record: `idxd_device_config()`, `idxd_drv_enable_wq()`, `idxd_device_drv_probe()`, `idxd_reset_done()`, and the unmodified but affected `idxd_device_reinit()`. **Step 5.2: TRACE CALLERS** Verified all callers of `idxd_device_config()`: 1. `idxd_device_reinit()` in irq.c:39 — work queue callback for software reset recovery. **NO lock held — BUG (since v5.9)** 2. `idxd_reset_done()` in init.c:1097 — PCI FLR completion callback. **NO lock held — BUG (since v6.14)** 3. `idxd_drv_enable_wq()` in device.c:1455 — WQ enable path. Lock held correctly. 4. `idxd_device_drv_probe()` in device.c:1554 — Device driver probe. Lock held correctly. **Step 5.3-5.4: CALL CHAINS** - `idxd_device_reinit` ← `INIT_WORK` ← `idxd_halt()` ← `idxd_misc_thread()` (threaded IRQ handler via `request_threaded_irq`). Triggered when device enters halt state with software reset type. - `idxd_reset_done` ← PCI error handler `.reset_done` callback. Triggered during FLR completion. - Both are reachable during normal device operation under error/recovery conditions. **Step 5.5: SIMILAR PATTERNS** Record: Prior idxd locking fixes exist: `407171717a4f4` ("avoid deadlock in process_misc_interrupts()") and `cf4ac3fef3388` (lockdep warning on driver removal). This is a pattern in idxd of locking bugs being found and fixed. --- ## PHASE 6: STABLE TREE ANALYSIS **Step 6.1: BUGGY CODE IN STABLE TREES** Verified: - **v6.6:** `idxd_device_reinit()` calling `idxd_device_config()` without lock EXISTS (confirmed via `git show v6.6:drivers/dma/idxd/irq.c`). `idxd_reset_done()` does NOT exist (confirmed: `git grep idxd_device_config v6.6:drivers/dma/idxd/init.c` returns nothing). - **v6.1:** Same unlocked `irq.c` caller exists, but `cleanup.h` / `guard(spinlock)` does NOT exist — backport would need traditional `spin_lock`/`spin_unlock`. - **v6.14+:** Both buggy callers exist. **Step 6.2: BACKPORT COMPLICATIONS** - v6.6+: `guard(spinlock)` available. `device.c` changes apply cleanly. `init.c` hunk irrelevant (no FLR code). - v6.1 and older: No `guard(spinlock)` — needs rework to explicit `spin_lock`/`spin_unlock`. **Step 6.3: RELATED FIXES IN STABLE** No prior fix for this specific `idxd_device_config()` lockdep/locking issue found in stable. --- ## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT **Step 7.1:** `drivers/dma/idxd` — Intel Data Streaming Accelerator (DSA) / Intel Analytics Accelerator (IAA). Used in data center / enterprise environments. Criticality: PERIPHERAL-to-IMPORTANT (hardware- specific but used in enterprise/cloud computing). **Step 7.2:** Actively maintained with regular bug fixes. --- ## PHASE 8: IMPACT AND RISK ASSESSMENT **Step 8.1: WHO IS AFFECTED** Record: Users of Intel IDXD/DSA/IAA hardware, primarily enterprise/data center systems. **Step 8.2: TRIGGER CONDITIONS** Record: The `irq.c` unlocked path is triggered when the device enters halt state and does a software reset (interrupt-driven). The `init.c` path is triggered during FLR completion (v6.14+). These are error recovery paths, not routine hot paths. However, they are reachable during normal device operation when hardware issues occur. **Step 8.3: FAILURE MODE SEVERITY** The verified symptom is a lockdep warning. However, the missing lock on `idxd_device_config()` means the function executes MMIO configuration writes (`idxd_wqs_config_write`, `idxd_groups_config_write`) and manipulates shared device state without the required `dev_lock`. This creates a real (not theoretical) race window where concurrent access to device configuration could cause incorrect MMIO programming or corrupted device state. Severity: MEDIUM-HIGH — lockdep warning verified, race on MMIO/config paths is the underlying risk. **Step 8.4: RISK-BENEFIT RATIO** - Benefit: MEDIUM-HIGH — fixes real synchronization bug that lockdep detected; prevents potential device misconfiguration during error recovery - Risk: VERY LOW — net -5 lines, purely moves existing logic, reviewed by subsystem maintainer/reviewer - Ratio: Strongly favorable --- ## PHASE 9: FINAL SYNTHESIS **Step 9.1: EVIDENCE FOR** - Fixes a verified locking contract violation (lockdep_assert_held fails) - The commit message explicitly says the missing lock "was an error" - The unlocked `irq.c` caller has existed since v5.9 — long-standing bug affecting all stable trees with IDXD - `idxd_device_config()` performs hardware MMIO writes — missing lock means real race on device state - Small, surgical fix: net -5 lines, two files, single driver - Written by the IDXD maintainer, suggested and reviewed by the IDXD designated reviewer - Applied by the dmaengine subsystem maintainer - Similar prior locking fixes in IDXD have been applied (e.g., `407171717a4f4`) **Step 9.2: EVIDENCE AGAINST** - No verified crash, data corruption, or security issue — the verified symptom is a lockdep warning - The init.c hunk only applies to v6.14+ (FLR code) - v6.1 and older need rework (no `guard(spinlock)`) - Niche hardware (Intel DSA/IAA, enterprise/data center) - Could not verify mailing list discussion due to anti-bot blocking **Step 9.3: STABLE RULES CHECKLIST** 1. Obviously correct and tested? **YES** — standard lock-inside-callee pattern, reviewed by subsystem expert 2. Fixes a real bug? **YES** — locking contract violation on hardware configuration path 3. Important issue? **YES** — race on MMIO/device configuration writes during error recovery; lockdep fires on any debug/CI kernel 4. Small and contained? **YES** — 2 files, net -5 lines, single driver 5. No new features or APIs? **YES** — pure bug fix 6. Can apply to stable? **YES** for v6.6+ (with init.c hunk dropped on pre-6.14); needs minor adaptation for v6.1 **Step 9.4: DECISION** The fix addresses a real synchronization bug where device configuration MMIO writes execute without the required `dev_lock`. The lockdep warning is the verified symptom, but the underlying issue is a genuine race on shared hardware state — missing locks on MMIO paths are not cosmetic. The fix is small, obviously correct, and comes from the subsystem maintainer chain. It meets all stable kernel criteria. --- ## Verification - [Phase 1] Parsed all tags from commit message. Confirmed Vinicius Costa Gomes is IDXD maintainer (M:) and Dave Jiang is reviewer (R:) in MAINTAINERS file at line 12750-12752. - [Phase 2] Read current `device.c` lines 1121-1145: confirmed `lockdep_assert_held(&idxd->dev_lock)` at line 1125. - [Phase 2] Read current `device.c` lines 1452-1456: confirmed `idxd_drv_enable_wq()` takes `spin_lock(&idxd->dev_lock)` before calling `idxd_device_config()`. - [Phase 2] Read current `device.c` lines 1552-1555: confirmed `idxd_device_drv_probe()` takes `spin_lock(&idxd->dev_lock)` before calling `idxd_device_config()`. - [Phase 2] Read current `init.c` lines 1093-1102: confirmed `idxd_reset_done()` calls `idxd_device_config()` under `IDXD_FLAG_CONFIGURABLE` check but WITHOUT `dev_lock`. - [Phase 2] Read current `irq.c` lines 32-64: confirmed `idxd_device_reinit()` calls `idxd_device_config()` at line 39 WITHOUT any lock and WITHOUT `IDXD_FLAG_CONFIGURABLE` check. - [Phase 3] git blame `irq.c` lines 32-40: confirmed `idxd_device_reinit()` from commit `0d5c10b4c84d6a` (Dave Jiang, 2020-06-26); `idxd_device_config()` call from `bfe1d56091c1a4` (2020-01-21). - [Phase 3] git blame `device.c` lines 1121-1126: confirmed `idxd_device_config()` with `lockdep_assert_held` from `bfe1d56091c1a4`. - [Phase 3] git blame `init.c` lines 1093-1102: confirmed `idxd_reset_done()` code from `98d187a9890360` (Fenghua Yu, 2024-11-22). - [Phase 3] `git describe --contains 0d5c10b4c84d6a` = `v5.9-rc1~96^2~1^2~52` — bug introduced in v5.9. - [Phase 3] `git describe --contains 98d187a9890360` = `v6.14-rc1~43^2` — FLR caller introduced in v6.14. - [Phase 3] `git describe --contains bfe1d56091c1a4` = `v5.6-rc1~196^2~7` — original function since v5.6. - [Phase 5] Verified all callers via grep: `irq.c:39`, `init.c:1097`, `device.c:1455`, `device.c:1554`, `idxd.h:762` (declaration). - [Phase 5] Read `irq.c` lines 400-421: verified `idxd_halt()` queues `idxd_device_reinit` via `INIT_WORK` + `queue_work` on software reset, and `idxd_device_flr` on FLR reset type. - [Phase 6] `git show v6.6:drivers/dma/idxd/irq.c`: confirmed unlocked `idxd_device_reinit()` calling `idxd_device_config()` exists in v6.6. - [Phase 6] `git show v6.6:drivers/dma/idxd/device.c` grep: confirmed `lockdep_assert_held` and locked callers exist in v6.6. - [Phase 6] `git grep idxd_device_config v6.6:drivers/dma/idxd/init.c`: empty — FLR code does NOT exist in v6.6. - [Phase 6] `git show v6.6:include/linux/cleanup.h`: exists — `guard(spinlock)` available in v6.6. - [Phase 6] `git show v6.1:include/linux/cleanup.h`: does NOT exist — `guard(spinlock)` NOT available in v6.1. - [Phase 8] Verified `idxd_device_config()` callees perform MMIO writes: `idxd_wqs_config_write()`, `idxd_groups_config_write()` at lines 1136-1141. - UNVERIFIED: Mailing list discussion content (lore blocked by Anubis anti-bot protection). - UNVERIFIED: Whether an unprivileged user can reliably trigger the halt/software-reset path. **YES** drivers/dma/idxd/device.c | 17 +++++++---------- drivers/dma/idxd/init.c | 10 ++++------ 2 files changed, 11 insertions(+), 16 deletions(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index c2cdf41b6e576..621f797b50d84 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -1122,7 +1122,11 @@ int idxd_device_config(struct idxd_device *idxd) { int rc; - lockdep_assert_held(&idxd->dev_lock); + guard(spinlock)(&idxd->dev_lock); + + if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) + return 0; + rc = idxd_wqs_setup(idxd); if (rc < 0) return rc; @@ -1449,11 +1453,7 @@ int idxd_drv_enable_wq(struct idxd_wq *wq) } } - rc = 0; - spin_lock(&idxd->dev_lock); - if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) - rc = idxd_device_config(idxd); - spin_unlock(&idxd->dev_lock); + rc = idxd_device_config(idxd); if (rc < 0) { dev_dbg(dev, "Writing wq %d config failed: %d\n", wq->id, rc); goto err; @@ -1549,10 +1549,7 @@ int idxd_device_drv_probe(struct idxd_dev *idxd_dev) } /* Device configuration */ - spin_lock(&idxd->dev_lock); - if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) - rc = idxd_device_config(idxd); - spin_unlock(&idxd->dev_lock); + rc = idxd_device_config(idxd); if (rc < 0) return -ENXIO; diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 2acc34b3daff8..c29f9123934ae 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -1093,12 +1093,10 @@ static void idxd_reset_done(struct pci_dev *pdev) idxd_device_config_restore(idxd, idxd->idxd_saved); /* Re-configure IDXD device if allowed. */ - if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) { - rc = idxd_device_config(idxd); - if (rc < 0) { - dev_err(dev, "HALT: %s config fails\n", idxd_name); - goto out; - } + rc = idxd_device_config(idxd); + if (rc < 0) { + dev_err(dev, "HALT: %s config fails\n", idxd_name); + goto out; } /* Bind IDXD device to driver. */ -- 2.53.0