[PATCH AUTOSEL 7.0-6.18] cxl/pci: Hold memdev lock in cxl_event_trace

public inbox for linux-cxl@vger.kernel.org

* [PATCH AUTOSEL 7.0-6.18] cxl/pci: Hold memdev lock in cxl_event_trace_record()
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
@ 2026-04-20 13:21 ` Sasha Levin
  2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.18] cxl/region: Fix use-after-free from auto assembly failure Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Li Ming, Dan Williams, Dave Jiang, Sasha Levin, dave, jic23,
	alison.schofield, vishal.l.verma, ira.weiny, djbw, linux-cxl,
	linux-kernel

From: Li Ming <ming.li@zohomail.com>

[ Upstream commit dc372e5f429ced834d81ff12a945397dc43585a8 ]

cxl_event_config() invokes cxl_mem_get_event_record() to get remain
event logs from CXL device during cxl_pci_probe(). If CXL memdev probing
failed before that, it is possible to access an invalid endpoint. So
adding a cxlmd->driver binding status checking inside
cxl_dpa_to_region() to ensure the corresponding endpoint is valid.

Besides, cxl_event_trace_record() needs to hold memdev lock to invoke
cxl_dpa_to_region() to ensure the memdev probing completed. It is
possible that cxl_event_trace_record() is invoked during the CXL memdev
probing, especially user or cxl_acpi triggers CXL memdev re-probing.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Li Ming <ming.li@zohomail.com>
Link: https://patch.msgid.link/20260314-fix_access_endpoint_without_drv_check-v2-3-4c09edf2e1db@zohomail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

# ANALYSIS: cxl/pci: Hold memdev lock in cxl_event_trace_record()

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Parse the subject line
**Record:** Subsystem: cxl/pci. Action verb: "Hold" (synchronization -
adding a lock). One-line intent: Add memdev device lock around
`cxl_event_trace_record()` to serialize with memdev probing.

### Step 1.2: Parse all tags
**Record:**
- Suggested-by: Dan Williams (CXL maintainer)
- Reviewed-by: Dan Williams <dan.j.williams@intel.com>
- Reviewed-by: Dave Jiang <dave.jiang@intel.com>
- Signed-off-by: Li Ming (author)
- Signed-off-by: Dave Jiang (subsystem maintainer, applied via tree)
- Link: patch.msgid.link ->
  20260314-fix_access_endpoint_without_drv_check-v2-3
- **NO Fixes: tag** (patch 4 of the same series has one, but this one
  doesn't)
- **NO Cc: stable** tag
- Strong review from TWO senior CXL maintainers

### Step 1.3: Analyze the commit body
**Record:**
- Bug description: (1) During `cxl_pci_probe()`, `cxl_event_config()`
  calls `cxl_mem_get_event_record()` which can eventually call
  `cxl_event_trace_record()`. If the cxl_memdev driver probing failed
  before this, `cxlmd->endpoint` remains at its initial value
  `ERR_PTR(-ENXIO)` (non-NULL but invalid). (2)
  `cxl_event_trace_record()` can also race with re-probing triggered by
  user (sysfs) or cxl_acpi.
- Symptom: Invalid endpoint access in `cxl_dpa_to_region()` ->
  NULL-ptr-deref / GPF (same symptom as the KASAN trace in the related
  commit 0066688dbcdcf).
- Author's root cause explanation: `cxlmd->endpoint` is initialized to
  `ERR_PTR(-ENXIO)` at memdev creation, and only gets updated to valid
  port on successful probe. If probing fails, consumers can see the
  sentinel and crash when dereferencing.

### Step 1.4: Detect hidden bug fixes
**Record:** The commit uses "Hold memdev lock" (synchronization change).
Per the guidance, "Clean up locking"/synchronization changes often fix
races. This is explicitly a race fix even though the subject says "Hold
lock" rather than "Fix".

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Inventory the changes
**Record:**
- `drivers/cxl/core/mbox.c`: ~3 lines changed (+1, added
  `guard(device)`, changed `const` to non-const)
- `drivers/cxl/core/region.c`: ~7 lines changed (added
  `!cxlmd->dev.driver` check, removed `port && is_cxl_endpoint(port)`
  check)
- `drivers/cxl/cxlmem.h`: 1 line changed (const removed from prototype)
- Total: 3 files, ~12 lines. Small, surgical.

### Step 2.2: Understand the code flow change
**Record:**
- `cxl_event_trace_record()`: BEFORE: takes region/dpa rwsems only.
  AFTER: takes memdev device lock first (synchronizes with memdev
  probe), then rwsems.
- `cxl_dpa_to_region()`: BEFORE: `port = cxlmd->endpoint; if (port &&
  is_cxl_endpoint(port) && ...)` - dereferences `ERR_PTR(-ENXIO)` in
  `is_cxl_endpoint()`. AFTER: First check `if (!cxlmd->dev.driver)
  return NULL;` - early exit when driver not bound. Then
  `cxl_num_decoders_committed(port)` check.
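
The AFTER flow described above can be modeled in userspace. This is only a sketch under stated assumptions: `mock_memdev`, `mock_region`, `mock_dpa_to_region()` and `mock_trace_record()` are hypothetical stand-ins, and a pthread mutex plays the role of the memdev device lock. It is not the kernel code itself.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct mock_region { int id; };

struct mock_memdev {
	pthread_mutex_t lock;        /* models the memdev device lock */
	const void *driver;          /* non-NULL only while a driver is bound */
	struct mock_region *region;  /* what a bound endpoint would resolve to */
};

/* Models the fixed lookup: bail out early when no driver is bound. */
static struct mock_region *mock_dpa_to_region(struct mock_memdev *md)
{
	if (!md->driver)
		return NULL;
	return md->region;
}

/* Models the fixed event path: take the device lock, then look up. */
static struct mock_region *mock_trace_record(struct mock_memdev *md)
{
	struct mock_region *r;

	pthread_mutex_lock(&md->lock);   /* a probe or unbind would hold this too */
	r = mock_dpa_to_region(md);
	pthread_mutex_unlock(&md->lock);
	return r;
}
```

Because the binding check runs with the lock held, the lookup in this model sees a memdev that is either fully unbound or fully bound, never one that is mid-probe.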

### Step 2.3: Identify the bug mechanism
**Record:** Combination bug category:
- **Race condition** in synchronization (commit adds `guard(device)`)
- **Memory safety** (commit adds NULL-ish check `!cxlmd->dev.driver`)
- **Invalid pointer dereference**: `cxlmd->endpoint` can be
  `ERR_PTR(-ENXIO)` (verified in drivers/cxl/core/memdev.c:678 where
  it's initialized). The old code `if (port && is_cxl_endpoint(port))`
  passes the NULL check since `ERR_PTR(-ENXIO)` is non-NULL, but then
  `is_cxl_endpoint()` dereferences `port->uport_dev` causing a GPF.
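
Why a plain NULL test lets the sentinel through can be shown with a small userspace sketch. The helpers below are simplified re-creations of the kernel's `ERR_PTR()`/`IS_ERR()` idea, not the kernel headers themselves; the names `err_ptr`, `is_err`, `old_null_check_passes` and `pointer_is_usable` are illustrative assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR()/IS_ERR(). */
#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True when the pointer value falls in the error-encoding range. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Models the first half of the old test: `if (port && ...)`. */
static inline bool old_null_check_passes(const void *port)
{
	return port != NULL;
}

/* A check that also rejects error-encoded pointers. */
static inline bool pointer_is_usable(const void *port)
{
	return port != NULL && !is_err(port);
}
```

With `err_ptr(-ENXIO)` as input, `old_null_check_passes()` returns true while `pointer_is_usable()` returns false. Note that the patch itself does not add an `IS_ERR()` test; it checks `cxlmd->dev.driver` instead, so the sentinel is never consulted on an unbound memdev.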

### Step 2.4: Assess fix quality
**Record:**
- Fix is correct and minimal
- Regression risk: Adding `guard(device)` serializes event processing
  with memdev probing. Acceptable - this is the intent. All call sites
  (the `cxl_event_thread` threaded IRQ handler, `cxl_event_config` in
  process context, and `cxl_handle_cper_event`) are sleepable contexts.
- No deadlock risk: cxl_mem_probe does not need any cxl_pci-held
  resources; device locks are per-device.

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame the changed lines
**Record:**
- `cxl_event_trace_record()` in its current form was introduced by
  commit 6aec00139d3a8 ("cxl/core: Add region info to cxl_general_media
  and cxl_dram events"), which is based on v6.9-rc6 and first shipped in
  v6.10. Before v6.10 it was a static function without the region-lookup
  path.
- `cxlmd->endpoint = ERR_PTR(-ENXIO)` initialization in memdev.c:678 has
  been present for years.

### Step 3.2: Follow the Fixes: tag
**Record:** No Fixes: tag on this patch. The patch is a hardening
against race/NULL deref discovered during analysis rather than a
targeted fix. However, the bug fundamentally exists since v6.10 when
`cxl_dpa_to_region()` was first called from `cxl_event_trace_record()`.

### Step 3.3: Check file history for related changes
**Record:**
- Related recent fix: `0066688dbcdcf` ("cxl/port: Hold port host lock
  during dport adding") - merged v7.0-rc1+3. Shows an actual KASAN crash
  stack: `cxl_dpa_to_region+0x105 -> cxl_event_trace_record ->
  cxl_mock_mem_probe`. This confirms the same code path has produced
  observable crashes (in cxl_test).
- Related older fix: `285f2a0884143` ("cxl/region: Avoid null pointer
  dereference in region lookup") from v6.10 - an earlier attempt to
  harden `cxl_dpa_to_region` against the same invalid-endpoint scenario.
- This commit is patch 3/4 of the series "cxl: Consolidate
  cxlmd->endpoint accessing" (v2 from 20260314).

### Step 3.4: Check author's other commits
**Record:** Li Ming is an active CXL contributor with recent fixes in
the subsystem (PCI/IDE fixes, cxl/edac fixes, cxl/port fixes including
the related 0066688dbcdcf). Suggested-by Dan Williams = the CXL
architect. Patch-to-maintainer credibility is high.

### Step 3.5: Check for dependent/prerequisite commits
**Record:**
- Patch 3 uses `guard(device)(&cxlmd->dev)` which relies on
  `DEFINE_GUARD(device, ...)` in include/linux/device.h. This was
  introduced in v6.7-rc7 (commit 134c6eaa6087d), so all stable trees
  v6.7+ have it.
- Patch 3 does NOT depend on patch 1 of the series (which adds
  `DEFINE_GUARD_COND(device, _intr, ...)` - used only by patch 2).
- Patch 3 does NOT strictly depend on patch 2 (patch 2 fixes poison
  debugfs paths; orthogonal).
- However, older stable trees (v6.10-v6.16) use
  `cxl_region_rwsem`/`cxl_dpa_rwsem` instead of
  `cxl_rwsem.region`/`cxl_rwsem.dpa` (consolidated in v6.17 via
  d03fcf50ba56f). Backport would need rwsem name changes.
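
For readers unfamiliar with the `guard()` family, the scope-based pattern builds on the compiler's `cleanup` variable attribute (a GCC/Clang extension). The following is a minimal userspace sketch of that idea, not the kernel's `DEFINE_GUARD()` machinery; `mock_lock`, `MOCK_GUARD` and the helper names are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical lock that only counts how many times it is held. */
struct mock_lock { int depth; };

static void mock_lock_acquire(struct mock_lock *l) { l->depth++; }
static void mock_lock_release(struct mock_lock *l) { l->depth--; }

/* Cleanup handler: the compiler passes a pointer to the guard variable. */
static void mock_guard_exit(struct mock_lock **guard)
{
	if (*guard)
		mock_lock_release(*guard);
}

/* Acquire helper so the guard can be declared and taken in one statement. */
static struct mock_lock *mock_guard_enter(struct mock_lock *l)
{
	mock_lock_acquire(l);
	return l;
}

/* Declares a scope-bound guard; the lock is released when the scope ends. */
#define MOCK_GUARD(l) \
	struct mock_lock *mock_guard_var \
		__attribute__((cleanup(mock_guard_exit), unused)) = \
		mock_guard_enter(l)

/* Reports the depth seen while guarded; negative on the early-exit path. */
static int depth_inside_guard(struct mock_lock *l, int bail_early)
{
	MOCK_GUARD(l);

	if (bail_early)
		return -l->depth;	/* early exit: lock is still dropped */
	return l->depth;
}
```

Both return paths leave `depth` at zero afterwards, which is the property that makes a single `guard(device)(&cxlmd->dev);` line sufficient in a function with several exits.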

## PHASE 4: MAILING LIST RESEARCH

### Step 4.1: Find the original patch discussion
**Record:**
- b4 am successfully fetched the full series: 4 patches in "cxl:
  Consolidate cxlmd->endpoint accessing" v2.
- v1 of the series was at
  `20260310-fix_access_endpoint_without_drv_check-v1`.
- Changes v1->v2 per cover letter: squashed two patches into patch 3
  (this one), dropped an ineffective patch, moved lock placement per
  Alison Schofield's feedback.
- Dave Jiang confirmed applying patches 2/3/4 to `cxl/next` for v7.1:
  `43e4c205197e`, `11ce2524b7f3` (this patch), `b227d1faed0a`.
- **No stable nomination discussed** in the thread.
- No NAKs. Two rounds of review with all feedback addressed.

### Step 4.2: Check who reviewed the patch
**Record:** Dan Williams (Intel, CXL subsystem co-maintainer), Dave
Jiang (Intel, CXL subsystem maintainer), Alison Schofield (Intel, CXL
developer). All three CXL-specific mailing lists and linux-kernel were
CC'd. Full subsystem maintainer review.

### Step 4.3: Search for bug report
**Record:** No separate bug report link. The commit describes the
scenario analytically. The related commit `0066688dbcdcf` shows a real
KASAN crash in cxl_test with the same stack trace leading through
`cxl_event_trace_record -> cxl_dpa_to_region`, confirming the crash is
reproducible.

### Step 4.4: Check related patches in series
**Record:** Patch 3 is self-contained for its stated scenarios
(cxl_pci_probe event path, re-probing race). Patches 2 and 4 address
different callers (poison debugfs, cxl_reset_done). Patch 1 is a driver-
core helper used only by patch 2. Patch 3 stands on its own.

### Step 4.5: Stable mailing list history
**Record:** No stable-list discussion found for this specific patch
(only 1 month old - on its way to v7.1-rc1).

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: Identify key functions
**Record:** Modified: `cxl_event_trace_record()`,
`__cxl_event_trace_record()`, `cxl_dpa_to_region()`.

### Step 5.2: Trace callers
**Record:**
- `cxl_event_trace_record()` callers (verified via grep):
  `cxl_handle_cper_event()` in pci.c (firmware event handler),
  `__cxl_event_trace_record()` in mbox.c.
- `__cxl_event_trace_record()` is called from
  `cxl_mem_get_records_log()` which is called from
  `cxl_mem_get_event_records()` which is called from: (a)
  `cxl_event_thread` (IRQ thread, pci.c:582), (b) `cxl_event_config()`
  (cxl_pci_probe path, pci.c:755).
- `cxl_dpa_to_region()` callers: `cxl_event_trace_record` (mbox.c),
  `cxl_inject_poison` and `cxl_clear_poison` (memdev.c via lines 315,
  384).

### Step 5.3: Trace callees
**Record:** `cxl_dpa_to_region` calls `device_for_each_child()` on the
endpoint port, iterating decoders. Pre-fix, first access is
`is_cxl_endpoint(port)` which dereferences `port->uport_dev` - this is
where `ERR_PTR(-ENXIO)` causes GPF.

### Step 5.4: Follow the call chain
**Record:** Path from user/firmware to crash:
1. cxl_pci_probe (boot/hotplug) -> cxl_event_config ->
   cxl_mem_get_event_records -> __cxl_event_trace_record ->
   cxl_event_trace_record -> cxl_dpa_to_region -> CRASH
2. CXL IRQ thread -> cxl_mem_get_event_records -> ... -> CRASH (if
   happens concurrent with re-probe)
3. Firmware CPER handler -> cxl_handle_cper_event ->
   cxl_event_trace_record -> CRASH

**Path is user-triggerable**: User can `echo` to sysfs to unbind/rebind
cxl_memdev, creating the race window with any ongoing event processing.

### Step 5.5: Search for similar patterns
**Record:** Commit `285f2a0884143` was an earlier (v6.10) attempt to
harden this same function against NULL-ish pointer issues. This current
patch provides stronger guarantees via driver-binding check + device
lock.

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: Does the buggy code exist in stable?
**Record:** The function `cxl_event_trace_record()` started calling
`cxl_dpa_to_region()` in v6.10 (commit `6aec00139d3a8`). Before that
(v6.6, v6.1) the function didn't have this call path, so the bug doesn't
exist.

Bug exists in: v6.19.y, v6.17.y, v6.12.y (LTS), and any other tree based
on v6.10 or later.
Bug does NOT exist in: v6.6.y, v6.1.y, v5.15.y, v5.10.y, v5.4.y.

### Step 6.2: Check for backport complications
**Record:**
- v6.19.y: applies with minor adjustment (uses `cxl_rwsem.region/dpa` -
  matches current tree ✓)
- v6.17.y: applies cleanly (has cxl_rwsem consolidation from v6.17)
- v6.12.y: needs rwsem name changes (`cxl_region_rwsem`,
  `cxl_dpa_rwsem`) - manual backport needed
- v6.17+ already has the function in the format this patch modifies.
  Earlier trees need non-trivial rewording of the rwsem guards.

### Step 6.3: Check if related fixes are in stable
**Record:** Commit `0066688dbcdcf` has a Fixes: tag (`4f06d81e7c6a`) and
a clear backport candidate - but it addresses a different race (dport
addition). This commit is a separate, complementary fix for a related
but distinct scenario.

## PHASE 7: SUBSYSTEM CONTEXT

### Step 7.1: Subsystem criticality
**Record:** drivers/cxl = CXL memory/interconnect subsystem.
Criticality: IMPORTANT (used in data center servers, but fraction of
users compared to core mm/fs/net). CXL is relatively new hardware -
affected user population is concentrated in enterprise/server.

### Step 7.2: Subsystem activity
**Record:** CXL is actively developed - many commits per release. The
bug has existed since v6.10 (~2 years). No user-filed bug reports found,
but a reproducible test-environment crash exists.

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Affected users
**Record:** CXL-hardware users: enterprise servers using CXL Type 3
memory devices. A subset of Linux deployments, but important for data
center.

### Step 8.2: Trigger conditions
**Record:**
- Requires probing failure OR user/firmware-initiated re-probing with
  concurrent event processing
- Triggerable through sysfs unbind/bind, which requires root
  (unprivileged users cannot write those attributes)
- Timing-dependent race with a realistic window during probe
- Not triggered on every boot, but possible in fault/recovery scenarios

### Step 8.3: Failure mode severity
**Record:** CRITICAL - NULL-ptr-deref / general protection fault. Per
KASAN stack trace in sibling commit, the crash is reproducible. On a
server, this would be a kernel oops/panic during probe or device
recovery.

### Step 8.4: Risk-benefit
**Record:**
- Benefit: MEDIUM-HIGH (prevents crashes on CXL-enabled servers,
  especially during probe failure/recovery)
- Risk: LOW (~12 lines, surgical change, no API changes, well-reviewed
  by two maintainers)
- Ratio: favorable for backport

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Compile evidence

**For backporting:**
- Fixes a real crash (null-ptr-deref / GPF) reachable from boot probe
  path
- Small and surgical (~12 lines, 3 files)
- Well-reviewed by two senior subsystem maintainers (Dan Williams, Dave
  Jiang)
- Suggested by Dan Williams (CXL architect)
- Bug is reachable from userspace via sysfs unbind/rebind + concurrent
  event
- Similar crash confirmed in KASAN testing (related sibling commit)
- No new features, no API changes
- Patch 3 is self-contained (doesn't require patches 1/2/4 to be
  correct)

**Against backporting:**
- No Fixes: tag (the author/maintainers didn't mark this as a regression
  fix)
- No Cc: stable annotation
- Described as "consolidate endpoint accessing" (hardening effort, not
  targeted fix)
- Part of a larger series, though this patch is self-contained
- Older stable trees (v6.12) need rwsem name adaptation
- Race is theoretical in that no user report exists (only test-env KASAN
  hits)

### Step 9.2: Stable rules checklist
1. Obviously correct and tested? YES (reviewed by two maintainers,
   applied to cxl-next)
2. Fixes a real bug that affects users? YES (null-ptr-deref crash)
3. Important issue? YES (CRITICAL severity - kernel crash)
4. Small and contained? YES (~12 lines, 3 files)
5. No new features or APIs? YES (only changes prototype const-ness and
   adds lock)
6. Can apply to stable trees? YES for v6.17+, needs adaptation for
   v6.12-v6.16

### Step 9.3: Exception categories
Not a simple device ID/quirk/DT/build fix. Falls under "race condition /
invalid pointer dereference fix" category.

### Step 9.4: Decision
The evidence favors backporting: CRITICAL severity, small scope,
maintainer review, self-contained fix for a user-triggerable crash. The
lack of a Fixes: tag is explainable (the patch is a hardening against a
long-standing issue diagnosed through systematic review) but per the
prompt, absence of tags is not a negative signal. The prompt explicitly
calls out null-ptr-deref and race condition fixes as STRONG YES signals.

## Verification

- [Phase 1] Parsed tags: Found `Suggested-by: Dan Williams`,
  `Reviewed-by: Dan Williams`, `Reviewed-by: Dave Jiang`,
  `Signed-off-by: Li Ming`, `Signed-off-by: Dave Jiang`. No Fixes:, no
  Cc: stable on this patch (confirmed by reading commit text and mailing
  list mbox).
- [Phase 1] Link to lore discussion:
  `20260314-fix_access_endpoint_without_drv_check-v2-3-4c09edf2e1db@zohomail.com`
  - confirmed series name "cxl: Consolidate cxlmd->endpoint accessing".
- [Phase 2] Diff analysis: Verified 3 files changed with ~12 lines total
  (mbox.c: const -> mutable + `guard(device)` add; region.c: driver
  check added, is_cxl_endpoint removed; cxlmem.h: prototype updated).
- [Phase 2] Verified `ERR_PTR(-ENXIO)` initialization at
  drivers/cxl/core/memdev.c:678 via Grep.
- [Phase 2] Verified `is_cxl_endpoint()` dereferences port->uport_dev at
  drivers/cxl/cxlmem.h:99-101, confirming crash mechanism.
- [Phase 3] `git log --oneline --grep="cxl_event_trace_record"`: found
  related fix `0066688dbcdcf` with KASAN stack trace showing the same
  crash pattern.
- [Phase 3] `git show 6aec00139d3a8`: confirmed `cxl_dpa_to_region()`
  began being called from `cxl_event_trace_record` in v6.9-rc6-4-g (part
  of v6.10 release).
- [Phase 3] `git describe --contains d03fcf50ba56f`: cxl_rwsem
  consolidation in v6.17-rc1.
- [Phase 3] `git describe --contains 134c6eaa6087d`:
  `DEFINE_GUARD(device, ...)` in v6.7-rc7, so `guard(device)` available
  in all affected stable trees.
- [Phase 4] `b4 am` successfully fetched the series, confirmed 4-patch
  structure.
- [Phase 4] Read the mbox thread - confirmed Dave Jiang applied patches
  2/3/4 to `cxl/next` for v7.1 (commits 43e4c205197e, 11ce2524b7f3,
  b227d1faed0a).
- [Phase 4] No stable nomination or concerns raised in the thread.
- [Phase 5] `grep cxl_event_trace_record`: callers are
  `cxl_handle_cper_event` (pci.c) and `__cxl_event_trace_record`
  (mbox.c); further callers in `cxl_event_thread` (IRQ) and
  `cxl_event_config` (probe).
- [Phase 5] `grep cxl_dpa_to_region`: called from
  `cxl_event_trace_record` (mbox.c) and from
  `cxl_inject_poison`/`cxl_clear_poison` (memdev.c lines 315, 384).
- [Phase 6] `git show v6.19:drivers/cxl/core/region.c`: confirmed pre-
  fix `cxl_dpa_to_region()` code exists in v6.19 (buggy pattern).
- [Phase 6] `git show v6.6:drivers/cxl/core/mbox.c`: confirmed
  `cxl_event_trace_record()` in v6.6 is different (old signature) and
  doesn't call `cxl_dpa_to_region`, so bug doesn't exist there.
- [Phase 6] `git show v6.17:drivers/cxl/core/mbox.c`: confirmed v6.17
  has `cxl_rwsem.region/dpa` (patch applies cleanly).
- [Phase 8] Failure mode: GPF/null-ptr-deref in `is_cxl_endpoint(port)`
  when port is `ERR_PTR(-ENXIO)` - verified code path. Severity:
  CRITICAL.
- UNVERIFIED: Whether this specific crash has been reported by end users
  on real (non-test) hardware - no user bug report found in lore.
- UNVERIFIED: Whether this patch has already been picked up by mainline
  Linus tree - as of this analysis, it's in cxl/next targeting v7.1-rc1,
  not yet in mainline.

## Conclusion

This patch fixes a user-triggerable invalid-pointer dereference (of the
`ERR_PTR(-ENXIO)` sentinel in `cxlmd->endpoint`) in a CXL event trace
path. The
crash pattern has been observed with KASAN in testing environments. The
fix is small (~12 lines), well-reviewed by two senior maintainers, and
self-contained. It adds a `guard(device)(&cxlmd->dev)` to serialize with
memdev probing and an explicit driver-binding check. No new features or
API changes.

The main backport complication is that older stable trees (v6.12.y) use
`cxl_region_rwsem`/`cxl_dpa_rwsem` instead of the consolidated
`cxl_rwsem` (introduced in v6.17), requiring a minor adaptation. The bug
does not exist in v6.6 and earlier.

Given it fixes a kernel crash with clear mechanism, is small and
contained, and has strong maintainer review, it meets the stable kernel
rules criteria despite lacking a Fixes: tag.

**YES**

 drivers/cxl/core/mbox.c   | 5 +++--
 drivers/cxl/core/region.c | 8 +++++---
 drivers/cxl/cxlmem.h      | 2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 12386d9127054..c4a2a1ba13ca6 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -893,7 +893,7 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, "CXL");

-void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+void cxl_event_trace_record(struct cxl_memdev *cxlmd,
 			    enum cxl_event_log_type type,
 			    enum cxl_event_type event_type,
 			    const uuid_t *uuid, union cxl_event *evt)
@@ -920,6 +920,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 		 * translations. Take topology mutation locks and lookup
 		 * { HPA, REGION } from { DPA, MEMDEV } in the event record.
 		 */
+		guard(device)(&cxlmd->dev);
 		guard(rwsem_read)(&cxl_rwsem.region);
 		guard(rwsem_read)(&cxl_rwsem.dpa);

@@ -968,7 +969,7 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");

-static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+static void __cxl_event_trace_record(struct cxl_memdev *cxlmd,
 				     enum cxl_event_log_type type,
 				     struct cxl_event_record_raw *record)
 {
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c37ae0b28bbbc..373551022a2b3 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2950,13 +2950,15 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg)
 struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
 {
 	struct cxl_dpa_to_region_context ctx;
-	struct cxl_port *port;
+	struct cxl_port *port = cxlmd->endpoint;
+
+	if (!cxlmd->dev.driver)
+		return NULL;

 	ctx = (struct cxl_dpa_to_region_context) {
 		.dpa = dpa,
 	};
-	port = cxlmd->endpoint;
-	if (port && is_cxl_endpoint(port) && cxl_num_decoders_committed(port))
+	if (cxl_num_decoders_committed(port))
 		device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);

 	return ctx.cxlr;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e21d744d639bd..7a34a19c02c87 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -864,7 +864,7 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 				  unsigned long *cmds);
 void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
-void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+void cxl_event_trace_record(struct cxl_memdev *cxlmd,
 			    enum cxl_event_log_type type,
 			    enum cxl_event_type event_type,
 			    const uuid_t *uuid, union cxl_event *evt);
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] cxl/region: Fix use-after-free from auto assembly failure
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
  2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.18] cxl/pci: Hold memdev lock in cxl_event_trace_record() Sasha Levin
@ 2026-04-20 13:21 ` Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Dan Williams, Ira Weiny, Alison Schofield, Dave Jiang,
	Sasha Levin, dave, jic23, vishal.l.verma, djbw, linux-cxl,
	linux-kernel

From: Dan Williams <dan.j.williams@intel.com>

[ Upstream commit 87805c32e6ad7b5ce2d9f7f47e76081857a4a335 ]

The following crash signature results from region destruction while an
endpoint decoder is staged, but not fully attached.

[ dj: Moved bus_find_device( to next line. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260327052821.440749-2-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

# Analysis: cxl/region: Fix use-after-free from auto assembly failure

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `cxl/region` (CXL memory region management)
- Action verb: "Fix" (explicit bug fix)
- Record: cxl/region Fix use-after-free bug triggered by region
  destruction after auto-assembly staging failure

**Step 1.2: Commit Tags (from original mbox found via b4)**
- Original mbox version (20260327052821.440749-2) contains:
  - `Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")` ←
    v6.3-rc1
  - `Cc: <stable@vger.kernel.org>` ← explicit stable nomination by
    author
  - `Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>`
- Reviewed-by: Ira Weiny, Alison Schofield, Dave Jiang (three maintainer
  reviewers)
- Signed-off-by: Dan Williams (author; CXL subsystem maintainer), Dave
  Jiang (committer)
- Link:
  patch.msgid.link/20260327052821.440749-2-dan.j.williams@intel.com
- Note: `[ dj: Moved bus_find_device( to next line. ]` - minor
  formatting adjustment at commit time
- Record: Author explicitly Cc'd stable, provides Fixes: tag, triple
  maintainer Reviewed-by

**Step 1.3: Commit Body**
- Candidate commit message is very short. Original mbox (before
  committer trimming) shows a full KASAN splat:
  ```
  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830
  [cxl_core]
  Read of size 8 at addr ffff888265638840 by task modprobe/1287
  ... unregister_region+0x88/0x140 [cxl_core]
  ... devres_release_all+0x172/0x230
  ```
- The "staged" state is established by `cxl_region_attach_auto()` and
  finalized by `cxl_region_attach_position()`
- Memdev removal sees `cxled->cxld.region == NULL` (staged but not
  finalized) and falsely thinks decoder is unattached; later region
  removal finds stale pointer to freed endpoint decoder
- Record: Real bug, KASAN UAF, concrete crash, reachable via memdev
  unregister during autoassembly

**Step 1.4: Hidden Fix Detection**
- Not hidden - explicit "Fix use-after-free"
- Record: Explicit UAF fix, not disguised

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Files: `drivers/cxl/core/region.c` (+50), `drivers/cxl/cxl.h` (+4 -2)
- Functions modified: `cxl_rr_ep_add`, `cxl_region_attach_auto`,
  `__cxl_decoder_detach`
- New functions: `cxl_region_by_target`, `cxl_cancel_auto_attach`
- Scope: single-subsystem surgical fix
- Record: ~60 lines added in 2 files, contained in CXL core

**Step 2.2: Code Flow Changes**
- Before: `cxl_region_attach_auto()` places cxled into
  `p->targets[pos]`, increments `nr_targets`, but `cxld->region` remains
  NULL until `cxl_rr_ep_add()` runs later. If the auto-assembly fails
  (never reaches `cxl_rr_ep_add`), the stale pointer in `p->targets[]`
  persists.
- After: New intermediate state `CXL_DECODER_STATE_AUTO_STAGED` tracks
  the "attached to target array but not yet fully attached" window;
  `__cxl_decoder_detach` now cancels the staging when `cxlr == NULL`
- Record: Adds state tracking for the previously-untracked window
  between target-array placement and region attachment

**Step 2.3: Bug Mechanism**
- Category: (d) Memory safety / UAF fix + state machine gap
- Mechanism: Race between auto-assembly failure and memdev removal. When
  memdev is removed via `cxld_unregister()`, `cxl_decoder_detach(NULL,
  cxled, -1, DETACH_INVALIDATE)` is called. Path hits `cxlr =
  cxled->cxld.region` which is NULL for a staged-but-not-assembled
  decoder, returns NULL without removing the stale `p->targets[pos]`
  pointer. Later region destruction dereferences the freed cxled.
- Record: UAF in `__cxl_decoder_detach` call path from
  `unregister_region` -> iterates freed targets
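
The staged-but-not-attached window and its cleanup can be sketched as a small userspace model. Everything here is hypothetical: the `mock_*` types and functions only mirror the description above, the array passed to `mock_detach()` stands in for a bus-wide region search, and resetting the state to "auto" after cancelling is an assumption rather than something the text confirms.

```c
#include <stddef.h>

#define MAX_TARGETS 4

enum mock_decoder_state {
	MOCK_STATE_MANUAL,
	MOCK_STATE_AUTO,
	MOCK_STATE_AUTO_STAGED,	/* in targets[], region pointer not yet set */
};

struct mock_region;

struct mock_decoder {
	enum mock_decoder_state state;
	struct mock_region *region;	/* NULL until fully attached */
};

struct mock_region {
	struct mock_decoder *targets[MAX_TARGETS];
	int nr_targets;
};

/* Stage a decoder in the target array without setting d->region. */
static int mock_stage(struct mock_region *r, struct mock_decoder *d, int pos)
{
	if (pos < 0 || pos >= MAX_TARGETS || r->targets[pos])
		return -1;
	r->targets[pos] = d;
	r->nr_targets++;
	d->state = MOCK_STATE_AUTO_STAGED;
	return 0;
}

/* Undo staging: clear every slot in @r that still points at @d. */
static void mock_cancel_staging(struct mock_region *r, struct mock_decoder *d)
{
	for (int i = 0; i < MAX_TARGETS; i++) {
		if (r->targets[i] == d) {
			r->targets[i] = NULL;
			r->nr_targets--;
		}
	}
	d->state = MOCK_STATE_AUTO;	/* assumption: fall back to "auto" */
}

/* Detach on decoder removal; @regions models the bus-wide region search. */
static void mock_detach(struct mock_region **regions, int nr,
			struct mock_decoder *d)
{
	if (d->region)
		return;	/* fully attached: normal detach path, not modeled */
	if (d->state != MOCK_STATE_AUTO_STAGED)
		return;	/* genuinely unattached, nothing to clean up */
	for (int i = 0; i < nr; i++)
		mock_cancel_staging(regions[i], d);
}
```

Without the `MOCK_STATE_AUTO_STAGED` branch, `mock_detach()` would return as soon as it saw `d->region == NULL`, leaving the stale entry in `targets[]` for a later region teardown to trip over.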

**Step 2.4: Fix Quality**
- Surgical: introduces one new enum value, state transitions in 2
  places, one new cleanup helper, one new matcher
- No API changes, no locking changes, no hot-path changes
- Low regression risk: only affects auto-assembly path on failure
- Record: High-quality, well-contained fix

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
- `cxl_region_attach_auto()` and the `CXL_DECODER_STATE_AUTO` enum were
  introduced in the Fixes: target
- Record: Buggy code introduced in v6.3-rc1 via a32320b71f08

**Step 3.2: Follow Fixes: Tag**
- `git describe a32320b71f08 --contains` → `v6.3-rc1~89^2~6^2~7`
- Commit: "cxl/region: Add region autodiscovery" by Dan Williams, Feb
  2023
- Present in all stable trees from v6.3+: 6.6.y, 6.12.y, 6.15.y, 6.17.y
  (note: 6.1 predates the bug)
- Record: Bug exists in all stable trees from v6.3 onwards

**Step 3.3: File History**
- Recent changes relevant: `b3a88225519cf cxl/region: Consolidate
  cxl_decoder_kill_region() and cxl_region_detach()` (v6.17-rc1)
  refactored the two call sites into `__cxl_decoder_detach`;
  `d03fcf50ba56f cxl: Convert to ACQUIRE() for conditional rwsem
  locking` introduced new locking helpers
- Record: Code was refactored in v6.17, so 7.0 carries the refactored
  form; older stable trees (<6.17) use `cxl_region_detach()` with a
  similar `if (!cxlr) return 0;` pattern that has the same bug and would
  need an adapted backport

**Step 3.4: Author**
- Dan Williams is the CXL subsystem maintainer (originator of region
  autodiscovery); regular prolific contributor to drivers/cxl/
- Record: Subsystem maintainer authoring the fix → high trust

**Step 3.5: Dependencies**
- Fix uses `bus_find_device(&cxl_bus_type, ...)` - available since CXL
  bus exists
- Uses `__free(put_device)` scope-based cleanup - present in 6.6+
- No explicit prerequisites; part of a 9-patch series but patches 2-9
  are test/dax_hmem work unrelated to this fix
- Record: This patch (1/9) is self-contained; it does not depend on the
  subsequent patches

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1: b4 dig / Lore Discussion**
- `b4 am` at
  patch.msgid.link/20260327052821.440749-2-dan.j.williams@intel.com
  fetched the full 9-patch thread
- This is the only revision (no v1/v2 indicated in cover letter)
- Cover letter states: "One use-after-free has been there since the
  original automatic region assembly code."
- Record: Single revision, clean review history, author explicitly flags
  UAF age

**Step 4.2: Reviewers**
- Ira Weiny, Alison Schofield, Dave Jiang - all CXL maintainers (DKIM-
  verified intel.com sign-offs)
- All three provided Reviewed-by on this patch
- Record: Thoroughly reviewed by core CXL maintainers

**Step 4.3: Bug Report**
- Bug was discovered by the author while writing test code (patch 8/9:
  "Simulate auto-assembly failure"). Patch 9/9 adds a test that
  exercises this path.
- Record: Discovered via new test harness; reproducible and tested in
  tree

**Step 4.4: Related Patches**
- 9-patch series: patch 1/9 (this) is a standalone UAF fix; remaining
  patches refactor dax_hmem and add tests
- No dependencies between this patch and 2-9
- Record: Standalone fix, no series dependencies

**Step 4.5: Stable Mailing List**
- Cc: stable@vger.kernel.org was present in original mbox posting
- Record: Explicitly nominated for stable by author

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Key Functions**
- Modified: `cxl_rr_ep_add`, `cxl_region_attach_auto`,
  `__cxl_decoder_detach`
- Added: `cxl_region_by_target`, `cxl_cancel_auto_attach`
- Record: 3 modified, 2 new helpers

**Step 5.2: Callers**
- `cxl_region_attach_auto` is called from `cxl_region_attach` during
  region creation
- `__cxl_decoder_detach` is called from `cxl_decoder_detach`, which is
  called from `cxld_unregister()` (on endpoint decoder device removal)
  and `detach_target()` (sysfs detach)
- `cxld_unregister` is registered via `devm_add_action_or_reset` in
  `cxl_decoder_autoremove` - fires on device/driver removal
- Record: Reachable via module unload, memdev hot-unplug, and sysfs-
  driven detach

**Step 5.3: Callees**
- `cxl_cancel_auto_attach` uses `bus_find_device` (existing API) with a
  simple matcher
- Record: Uses existing, well-established kernel APIs
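
The find-with-matcher pattern noted above can be sketched in userspace C. This is only an illustrative model: `struct region`, `struct decoder`, `find_region()` and `region_by_target()` are hypothetical stand-ins, not the kernel's `struct device` or `bus_find_device()`, and the reference counting that the real lookup performs is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's device or bus API. */
#define MAX_TARGETS 4

struct decoder {
	int pos;                    /* slot index in the owning region's target list */
};

struct region {
	struct decoder *targets[MAX_TARGETS];
};

/* Callback contract modeled on a bus matcher: nonzero means "this one". */
typedef int (*match_fn)(struct region *r, const void *data);

/* Return the first region the callback accepts, or NULL if none match. */
struct region *find_region(struct region *regions, size_t n,
			   const void *data, match_fn match)
{
	for (size_t i = 0; i < n; i++)
		if (match(&regions[i], data))
			return &regions[i];
	return NULL;
}

/* Match a region that lists the decoder at the decoder's recorded position. */
int region_by_target(struct region *r, const void *data)
{
	const struct decoder *d = data;

	assert(d->pos >= 0 && d->pos < MAX_TARGETS);
	return r->targets[d->pos] == d;
}
```

The matcher compares pointers at a single recorded slot rather than scanning the whole target list, which is the same shape as `cxl_region_by_target()` in the diff.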

**Step 5.4: Call Chain Reachability**
- modprobe / rmmod cxl_test / rmmod cxl_mem → memdev removal →
  cxld_unregister → cxl_decoder_detach → __cxl_decoder_detach → UAF
- Production scenarios: CXL hot-unplug, module unload during
  autoassembly, memdev probe failure during multi-decoder region
  assembly
- Record: Reachable from module-unload paths; triggerable on real
  hardware

**Step 5.5: Similar Patterns**
- The `state != CXL_DECODER_STATE_AUTO` guard in
  `cxl_region_attach_auto()` (line 1779) was written against the original
  two-state enum; adding a staged state does not regress it because a
  decoder is marked staged only after passing that guard, and returns to
  `CXL_DECODER_STATE_AUTO` on completion or cancellation
- Record: No parallel instances needing the same fix
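
The staged/auto transitions described above can be modeled as a small userspace state machine. This is a sketch under assumptions: the enum values echo the patch, but `stage_auto()`, `complete_attach()` and `cancel_staged()` are hypothetical names, and the region is passed in directly instead of being looked up on the bus as the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; names echo the patch but the types are hypothetical. */
enum decoder_state { STATE_MANUAL, STATE_AUTO, STATE_AUTO_STAGED };

#define MAX_TARGETS 4

struct region;

struct decoder {
	enum decoder_state state;
	int pos;                    /* slot in the region's target list, -1 if none */
	struct region *region;      /* NULL until fully attached */
};

struct region {
	struct decoder *targets[MAX_TARGETS];
	int nr_targets;
};

/* Stage: list the decoder as a target before it is fully attached. */
void stage_auto(struct region *r, struct decoder *d)
{
	assert(r->nr_targets < MAX_TARGETS);
	d->pos = r->nr_targets;
	r->targets[d->pos] = d;
	d->state = STATE_AUTO_STAGED;
	r->nr_targets++;
}

/* Complete: once the back-pointer is set, staging is over. */
void complete_attach(struct region *r, struct decoder *d)
{
	d->region = r;
	if (d->state == STATE_AUTO_STAGED)
		d->state = STATE_AUTO;
}

/* Cancel: undo staging so the region holds no pointer to a dying decoder. */
void cancel_staged(struct region *r, struct decoder *d)
{
	if (d->state != STATE_AUTO_STAGED)
		return;
	r->targets[d->pos] = NULL;
	r->nr_targets--;
	d->state = STATE_AUTO;
	d->pos = -1;
}
```

In this model a decoder that is torn down while still staged leaves no stale entry in `targets[]`, while a fully attached decoder is untouched by `cancel_staged()` and goes through the normal detach path instead.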

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

**Step 6.1: Code in Stable Trees**
- `CXL_DECODER_STATE_AUTO` enum exists in v6.3 onwards (confirmed by
  checking v6.1 → missing, v6.3 → present)
- `cxl_region_attach_auto()` exists in v6.3 onwards
- The buggy `if (!cxlr) return 0;` (or `return NULL;`) pattern exists in
  v6.6, v6.12, v6.15 equivalents (verified by reading v6.6 and v6.12
  tags)
- Record: Bug exists in v6.3, v6.6, v6.12, v6.15, v6.17, v7.0 trees

**Step 6.2: Backport Complications**
- v6.17+: `__cxl_decoder_detach` exists with same structure → should
  apply cleanly or with minor offsets
- Pre-v6.17 (6.6, 6.12, 6.15): function was named `cxl_region_detach`
  and called directly from `cxl_decoder_kill_region` +
  `cxld_unregister`; fix would need adaptation - inserting
  `cxl_cancel_auto_attach(cxled)` before the `return 0` in
  `cxl_region_detach`
- Pre-6.6 trees: `__free(put_device)` relies on `linux/cleanup.h`, which
  first appeared in v6.5, so older trees would need an explicit
  `put_device()` instead
- Record: Clean apply on 6.17+/7.0; adapted backport needed for 6.6-6.15
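
For backport adaptation, it may help to see what scope-based cleanup does compared with an explicit release. The sketch below is a userspace illustration using the GCC/Clang `cleanup` attribute, which the kernel's `__free()` macro is built on; `struct obj`, `obj_get()`, `obj_put()` and both `use_*()` functions are hypothetical and stand in for `struct device`, `get_device()` and `put_device()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct device. */
struct obj {
	int refs;
};

struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refs++;
	return o;
}

void obj_put(struct obj *o)
{
	if (o) {
		assert(o->refs > 0);
		o->refs--;
	}
}

/* Cleanup handlers receive a pointer to the annotated variable. */
void obj_put_cleanup(struct obj **op)
{
	obj_put(*op);
}

/* Scope-based form: the reference drops on every exit from the function. */
int use_scoped(struct obj *o)
{
	struct obj *held __attribute__((cleanup(obj_put_cleanup))) = obj_get(o);

	if (!held)
		return -1;
	return held->refs;
}

/* Explicit form, as an adaptation might look without scope-based helpers. */
int use_explicit(struct obj *o)
{
	struct obj *held = obj_get(o);
	int seen;

	if (!held)
		return -1;
	seen = held->refs;
	obj_put(held);
	return seen;
}
```

Both forms leave the reference count where it started; the explicit form simply requires a release on each return path that holds a reference.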

**Step 6.3: Related Fixes in Stable**
- `101c268bd2f37 cxl/port: Fix use-after-free, permit out-of-order
  decoder shutdown` (v6.12-rc6) - different UAF, already backported
- `b3a88225519cf cxl/region: Consolidate...` (v6.17-rc1) - refactor, not
  a fix
- Record: No duplicate fix already in stable

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1: Subsystem Criticality**
- drivers/cxl/ - CXL (Compute Express Link) memory subsystem
- Used for CXL memory devices, increasingly common in server/datacenter
  deployments
- Bug triggers during module unload or memdev removal - important for
  operability
- Record: IMPORTANT (growing datacenter usage; data-tier memory path)

**Step 7.2: Activity**
- Very actively developed subsystem (~140 commits to region.c since
  v6.6)
- Record: Active subsystem; fix is current

## PHASE 8: IMPACT AND RISK

**Step 8.1: Affected Users**
- Users of CXL memory devices whose auto-assembly fails (e.g., firmware-
  programmed decoders that can't fully assemble, partial hardware
  configurations, module unload races)
- Record: CXL hardware users; scope grows as CXL adoption grows

**Step 8.2: Trigger Conditions**
- Memdev removed while at least one endpoint decoder is in staged-but-
  not-completed state
- Reproducible via cxl_test with `fail_autoassemble` module option
  (added in patch 8/9)
- Production trigger: module reload during partial assembly; hardware
  hotplug during assembly
- Record: Realistic trigger; concrete reproducer provided in same series

**Step 8.3: Failure Mode**
- KASAN reports a slab-use-after-free (a panic only if the kernel is
  configured to panic on such reports)
- Without KASAN: silent memory corruption or crash in
  `__cxl_decoder_detach`
- Severity: CRITICAL (UAF with clear path to crash)
- Record: CRITICAL - memory safety violation

**Step 8.4: Risk/Benefit**
- Benefit: HIGH - eliminates real UAF in CXL subsystem
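- Risk: LOW - adds one intermediate state and leaves the successful path
  unchanged; the new state is set only in `cxl_region_attach_auto()` and
  cleared in `cxl_rr_ep_add()` or `cxl_cancel_auto_attach()`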
- Ratio: Strong positive
- Record: Clear net benefit

## PHASE 9: SYNTHESIS

**Step 9.1: Evidence Compilation**
- FOR: UAF with KASAN trace, Fixes: tag → v6.3 (affects all modern
  stable trees), explicit Cc: stable by author, triple maintainer
  Reviewed-by, author is subsystem maintainer, concrete reproducer in
  same series, contained ~60-line fix, no new userspace API
- AGAINST: Some adaptation needed for pre-v6.17 stable trees (function
  renamed), patch is very new (not in mainline yet - currently in linux-
  next)
- Record: FOR evidence overwhelming

**Step 9.2: Stable Rules Check**
1. Obviously correct: YES (state transitions are bounded and reviewed)
2. Real bug: YES (KASAN-confirmed UAF)
3. Important: YES (CRITICAL - UAF, potential crash/corruption)
4. Small/contained: YES (2 files, ~60 lines)
5. No new features/APIs: YES (internal state enum addition only)
6. Applies cleanly: Mostly - clean on v6.17+/v7.0, needs adaptation for
   6.6-6.15

**Step 9.3: Exception Categories**
- Not a device-ID-add or quirk; standalone UAF fix

**Step 9.4: Decision**
- Clear YES. Real UAF, author-nominated for stable, well-reviewed,
  contained scope.

## Verification

- [Phase 1] Read original mbox via `b4 am` at
  `/tmp/20260326_dan_j_williams_dax_hmem_add_tests_for_the_dax_hmem_takeover_capability.mbx`:
  confirmed `Fixes: a32320b71f08`, `Cc: <stable@vger.kernel.org>`, KASAN
  splat, three Reviewed-by from CXL maintainers
- [Phase 2] Read `drivers/cxl/core/region.c` 1040-1070, 1780-1810,
  2150-2220 and `drivers/cxl/cxl.h` 360-405 to validate code flow
- [Phase 3] `git show a32320b71f08 --stat`: confirmed introduction
  commit is "cxl/region: Add region autodiscovery" by Dan Williams, Feb
  2023
- [Phase 3] `git describe a32320b71f08 --contains` →
  `v6.3-rc1~89^2~6^2~7`: bug present since v6.3
- [Phase 3] `git show v6.1:drivers/cxl/cxl.h | grep cxl_decoder_state`:
  empty (enum didn't exist before v6.3)
- [Phase 3] `git show v6.3:drivers/cxl/cxl.h`: confirmed enum exists in
  v6.3
- [Phase 3] `git log --author="Dan Williams"` in drivers/cxl/: confirmed
  Dan Williams as subsystem maintainer
- [Phase 3] `git log --grep="cxl_decoder_detach"`: confirmed
  consolidation in `b3a88225519cf` (v6.17-rc1)
- [Phase 4] `b4 am https://patch.msgid.link/...`: fetched 9-patch
  series, confirmed triple DKIM-verified Reviewed-by
- [Phase 4] Cover letter read: confirmed "One use-after-free has been
  there since the original automatic region assembly code"
- [Phase 4] `git log linux-next/master --grep="use-after-free from auto
  assembly"`: commit `87805c32e6ad7` present in linux-next but not
  mainline yet
- [Phase 5] `grep -n CXL_DECODER_STATE` in drivers/cxl: identified all
  usage sites
- [Phase 5] Read `drivers/cxl/core/port.c` around line 2190: confirmed
  `cxld_unregister` calls `cxl_decoder_detach(NULL, cxled, -1,
  DETACH_INVALIDATE)`, matching the UAF trigger path
- [Phase 6] `git show v6.6:drivers/cxl/core/region.c` and `v6.12`:
  confirmed `cxl_region_detach()` has same `if (!cxlr) return 0;` bug
- [Phase 8] KASAN stack trace in original mbox shows
  `__cxl_decoder_detach+0x724 ... unregister_region+0x88 ...
  devres_release_all+0x172` - concrete reachability
- UNVERIFIED: Whether backport adaptation for pre-6.17 stable trees will
  be straightforward or require substantial rework beyond renaming
  `__cxl_decoder_detach` → `cxl_region_detach`

**Summary**

This is a genuine, well-reviewed use-after-free fix with a KASAN-
confirmed crash signature, originating from the CXL subsystem
maintainer. The bug has existed since v6.3 when region autodiscovery was
introduced, affects all current stable trees, and the author explicitly
Cc'd stable. The fix is small, contained, and introduces only an
internal enum value plus a cleanup helper. Reviewed by three CXL
maintainers. Pre-v6.17 stable trees will need minor contextual
adaptation due to the `__cxl_decoder_detach` refactor, but the
underlying logic is directly transferable.

**YES**

 drivers/cxl/core/region.c | 54 ++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |  6 +++--
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 373551022a2b3..1e97443535167 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1063,6 +1063,14 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
 
 	if (!cxld->region) {
 		cxld->region = cxlr;
+
+		/*
+		 * Now that cxld->region is set the intermediate staging state
+		 * can be cleared.
+		 */
+		if (cxld == &cxled->cxld &&
+		    cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
+			cxled->state = CXL_DECODER_STATE_AUTO;
 		get_device(&cxlr->dev);
 	}
 
@@ -1804,6 +1812,7 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
 	pos = p->nr_targets;
 	p->targets[pos] = cxled;
 	cxled->pos = pos;
+	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;
 	p->nr_targets++;
 
 	return 0;
@@ -2153,6 +2162,47 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	return 0;
 }
 
+static int cxl_region_by_target(struct device *dev, const void *data)
+{
+	const struct cxl_endpoint_decoder *cxled = data;
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+
+	if (!is_cxl_region(dev))
+		return 0;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+	return p->targets[cxled->pos] == cxled;
+}
+
+/*
+ * When an auto-region fails to assemble the decoder may be listed as a target,
+ * but not fully attached.
+ */
+static void cxl_cancel_auto_attach(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+	int pos = cxled->pos;
+
+	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
+		return;
+
+	struct device *dev __free(put_device) =
+		bus_find_device(&cxl_bus_type, NULL, cxled, cxl_region_by_target);
+	if (!dev)
+		return;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+
+	p->nr_targets--;
+	cxled->state = CXL_DECODER_STATE_AUTO;
+	cxled->pos = -1;
+	p->targets[pos] = NULL;
+}
+
 static struct cxl_region *
 __cxl_decoder_detach(struct cxl_region *cxlr,
 		     struct cxl_endpoint_decoder *cxled, int pos,
@@ -2176,8 +2226,10 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
 		cxled = p->targets[pos];
 	} else {
 		cxlr = cxled->cxld.region;
-		if (!cxlr)
+		if (!cxlr) {
+			cxl_cancel_auto_attach(cxled);
 			return NULL;
+		}
 		p = &cxlr->params;
 	}
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9b947286eb9b0..30a31968f2663 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -378,12 +378,14 @@ struct cxl_decoder {
 };
 
 /*
- * Track whether this decoder is reserved for region autodiscovery, or
- * free for userspace provisioning.
+ * Track whether this decoder is free for userspace provisioning, reserved for
+ * region autodiscovery, whether it is started connecting (awaiting other
+ * peers), or has completed auto assembly.
  */
 enum cxl_decoder_state {
 	CXL_DECODER_STATE_MANUAL,
 	CXL_DECODER_STATE_AUTO,
+	CXL_DECODER_STATE_AUTO_STAGED,
 };
 
 /**
-- 
2.53.0


