Linux Tegra architecture development
* [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs
@ 2026-05-28  8:31 Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders Srirangan Madhavan
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Hi folks!

This patch series introduces support for the CXL Reset method for CXL
Type 2 devices, implementing the reset procedure outlined in the CXL
Specification r3.2 [1], Sections 8.1.3, 9.6, and 9.7.

The userspace ABI is a write-only cxl_reset attribute under the CXL
memdev device:

    /sys/bus/cxl/devices/memX/cxl_reset

The memdev is the userspace handle, while the implementation coordinates
the target PCI function, affected sibling PCI functions, active CXL
memdevs, and any CXL regions reachable through those memdevs.

v6 changes (from v5 [2]):
- Rebased on the current CXL tree used for v7.1-rc4 development.
- Move the ABI from /sys/bus/pci/devices/.../cxl_reset to
  /sys/bus/cxl/devices/memX/cxl_reset.
- Use the memdev as the userspace handle while confining reset
  orchestration to the functions within the CXL device reset scope.
- Reduce the earlier PCI/CXL save/restore series [3] to a single CXL HDM
  decoder restore/commit helper patch, included here as patch 1.
- Do not offline or hot-remove memory as part of reset. Return -EBUSY
  if an affected CXL region is online as System RAM or has an active
  region driver bound.
- Add reset-idle validation and CPU cache invalidation for affected CXL
  regions.
- Add CXL sibling PCI function discovery using the Non-CXL Function Map
  DVSEC and CXL.cache/CXL.mem capability bits.
- Coordinate PCI save/disable/restore and IOMMU reset prepare/done for
  the target and affected sibling functions.
- Add CXL DVSEC reset sequencing, including CXL.cache disable,
  writeback-invalidate, a minimum 100ms quiet period, reset-complete
  polling, and Reset Error reporting.
- Track affected memdevs, lock active memdevs across reset, restore and
  commit decoder state, re-enable CXL.mem, and wait for media ready
  after reset.
- Cache reset capability at memdev registration time for sysfs
  visibility.
- Document reset scope, Memory Clear not being requested, and -EBUSY
  behavior for active CXL regions.

Motivation:
-----------
- As support for Type 2 devices is being introduced, more devices need a
  CXL-specific reset mechanism beyond bus-wide PCI reset methods.

- FLR does not affect CXL.cache or CXL.mem protocol state, making CXL
  Reset the appropriate mechanism for cases where those protocols must
  be reset.

- The CXL specification highlights use cases such as function rebinding
  and error recovery where CXL Reset is explicitly required.

Change Description:
-------------------

Patch 1: cxl/hdm: Add helpers to restore and commit memdev decoders
- Restore endpoint decoder programming from CXL core's cached decoder
  objects while keeping CXL.mem disabled.
- Commit restored HDM decoders as a separate step so reset orchestration
  can re-enable CXL.mem only after safety checks complete.

Patch 2: PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
- Export PCI reset lifecycle helpers so CXL reset orchestration can save,
  disable, restore, and invoke reset callbacks for affected functions.

Patch 3: cxl: Add reset-idle and cache flush helpers
- Collect CXL regions affected by a memdev reset.
- Fail reset if affected regions are not idle.
- Invalidate CPU caches for each affected region once.

Patch 4: PCI/CXL: Add sibling function coordination for reset
- Identify CXL.cache/CXL.mem sibling functions in the reset scope.
- Use the Non-CXL Function Map DVSEC to exclude non-CXL functions.
- Save, disable, restore, and unlock affected PCI sibling functions.

Patch 5: cxl/pci: Add CXL DVSEC reset helper
- Execute CXL Reset through the CXL Device DVSEC.
- Disable CXL.cache and request writeback-invalidate where supported.
- Enforce the post-reset quiet period and poll for reset completion.
- Block and restore IOMMU traffic while reset is active.

Patch 6: cxl/pci: Track memdevs affected by CXL reset
- Track the target memdev and any sibling-function memdevs affected by
  reset.
- Revalidate and lock active memdevs before reset proceeds.

Patch 7: cxl/pci: Orchestrate CXL reset for affected memdevs
- Coordinate region validation, CPU cache invalidation, PCI function
  preparation, DVSEC reset, decoder restore and commit, CXL.mem enable,
  and media-ready wait.

Patch 8: cxl/memdev: Add cxl_reset sysfs attribute
- Expose /sys/bus/cxl/devices/memX/cxl_reset.
- Only make the attribute visible when the underlying PCI function is
  Type 2 and reset capable.
- Write a boolean true value, such as "1" or "true", to trigger reset.

Patch 9: Documentation/ABI: Document CXL memdev cxl_reset
- Document the new memdev sysfs ABI, reset scope, Memory Clear behavior,
  and idle-region requirement.

The CPU cache invalidation step depends on
cpu_cache_invalidate_memregion() support for the affected address ranges.
If no provider is available, reset fails before hardware reset is
requested.

Command line to test CXL reset on a capable memdev:

    echo 1 > /sys/bus/cxl/devices/memX/cxl_reset
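
For test harnesses that prefer C over a shell one-liner, the same write
could be sketched roughly as below. This is only an illustration and not
part of the series: cxl_reset_write() is a hypothetical helper name, the
attribute path is supplied by the caller, and the only assumption taken
from this cover letter is that a failed reset surfaces as an errno (for
example -EBUSY when an affected region is still active).

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical userspace helper (not part of this series): write "1" to a
 * cxl_reset attribute. Returns 0 on success or a negative errno, e.g.
 * -EBUSY when an affected region is online or has a driver bound.
 */
int cxl_reset_write(const char *attr_path)
{
	static const char val[] = "1\n";
	ssize_t n;
	int fd, err = 0;

	fd = open(attr_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, strlen(val));
	if (n < 0)
		err = -errno;
	else if ((size_t)n != strlen(val))
		err = -EIO;	/* short write: treat as an I/O error */

	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

A caller would pass the full attribute path of the memdev under test and
treat any negative return as "reset not performed".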

Basic CXL DVSEC reset testing was done on a CXL Type 2 device. The reset
sequence completed successfully and ResetComplete was observed. Full
memdev/region integration testing is still in progress.

References:
[1] https://computeexpresslink.org/wp-content/uploads/2024/12/CXL_3.2-Spec-Announcement_FINAL-1.pdf
[2] https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/
[3] https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/

Srirangan Madhavan (9):
  cxl/hdm: Add helpers to restore and commit memdev decoders
  PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
  cxl: Add reset-idle and cache flush helpers
  PCI/CXL: Add sibling function coordination for reset
  cxl/pci: Add CXL DVSEC reset helper
  cxl/pci: Track memdevs affected by CXL reset
  cxl/pci: Orchestrate CXL reset for affected memdevs
  cxl/memdev: Add cxl_reset sysfs attribute
  Documentation/ABI: Document CXL memdev cxl_reset

 Documentation/ABI/testing/sysfs-bus-cxl |   28 +
 drivers/cxl/core/hdm.c                  |  318 ++++++-
 drivers/cxl/core/memdev.c               |   30 +
 drivers/cxl/core/pci.c                  | 1140 +++++++++++++++++++++++
 drivers/cxl/cxl.h                       |    5 +
 drivers/cxl/cxlmem.h                    |    2 +
 drivers/pci/pci.c                       |   22 +-
 include/linux/pci.h                     |    2 +
 include/uapi/linux/pci_regs.h           |   15 +
 9 files changed, 1557 insertions(+), 5 deletions(-)

base-commit: abb3c0de119032f4c0c81177884a3bb0a133e6ca
-- 
2.43.0


* [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28 11:06   ` Richard Cheng
  2026-05-28  8:31 ` [PATCH v6 2/9] PCI: Export pci_dev_save_and_disable() and pci_dev_restore() Srirangan Madhavan
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Add helpers to restore endpoint decoder programming for a CXL memdev from
CXL core's cached decoder objects, then commit it as a distinct step.
Callers are expected to have established reset safety and to hold
cxl_rwsem.region for write.

cxl_restore_memdev_decoders() restores programmable decoder state while
keeping traffic disabled. For HDM-backed endpoints it programs the fields
of enabled, unlocked endpoint decoders without setting COMMIT, keeps the
HDM Decoder Capability disabled, and mirrors matching endpoint DVSEC
ranges where possible. For endpoints without HDM decoder registers, it
restores the legacy DVSEC ranges that model endpoint decode.

cxl_commit_memdev_decoders() enables the HDM Decoder Capability and
commits enabled, unlocked endpoint decoders after safety checks pass. It
sets COMMIT only after decoder fields have been restored, does not
re-lock decoders, and does not set DVSEC MEM_ENABLE.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
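Note for reviewers: the restore-then-commit split described in the commit
message can be modeled in userspace roughly as follows. This is a sketch
under stated assumptions, not the kernel implementation: struct
fake_decoder and both functions are hypothetical, the bit positions are
illustrative (the kernel's CXL_HDM_DECODER0_CTRL_* definitions are
authoritative), and the simulated hardware reports COMMITTED immediately
where the real code polls for it.

```c
#include <stdint.h>

/* Illustrative bit positions for this sketch only. */
#define CTRL_LOCK      (1u << 8)
#define CTRL_COMMIT    (1u << 9)
#define CTRL_COMMITTED (1u << 10)

/* Hypothetical stand-in for one endpoint decoder's registers. */
struct fake_decoder {
	uint64_t base;
	uint64_t size;
	uint32_t ctrl;
};

/* Phase 1: program the fields and leave COMMIT clear; skip locked decoders. */
void fake_restore(struct fake_decoder *d, uint64_t base, uint64_t size)
{
	if (d->ctrl & CTRL_LOCK)
		return;

	d->base = base;
	d->size = size;
	d->ctrl &= ~(CTRL_COMMIT | CTRL_COMMITTED);
}

/* Phase 2: set COMMIT as the final step; skip locked or committed decoders. */
void fake_commit(struct fake_decoder *d)
{
	if (d->ctrl & (CTRL_LOCK | CTRL_COMMITTED))
		return;

	/* Simulated hardware acknowledges at once; real hardware is polled. */
	d->ctrl |= CTRL_COMMIT | CTRL_COMMITTED;
}
```

Keeping the two phases separate is what lets a caller run its safety
checks between them.
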
 drivers/cxl/core/hdm.c | 318 ++++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h      |   2 +
 2 files changed, 317 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 0c80b76a5f9b..f7af1041a9fc 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -679,7 +679,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
 	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
 }
 
-static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
+static int cxld_set_interleave_fields(struct cxl_decoder *cxld, u32 *ctrl)
 {
 	u16 eig;
 	u8 eiw;
@@ -690,14 +690,22 @@ static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
 	 */
 	if (WARN_ONCE(ways_to_eiw(cxld->interleave_ways, &eiw),
 		      "invalid interleave_ways: %d\n", cxld->interleave_ways))
-		return;
+		return -EINVAL;
 	if (WARN_ONCE(granularity_to_eig(cxld->interleave_granularity, &eig),
 		      "invalid interleave_granularity: %d\n",
 		      cxld->interleave_granularity))
-		return;
+		return -EINVAL;
 
 	u32p_replace_bits(ctrl, eig, CXL_HDM_DECODER0_CTRL_IG_MASK);
 	u32p_replace_bits(ctrl, eiw, CXL_HDM_DECODER0_CTRL_IW_MASK);
+	return 0;
+}
+
+static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
+{
+	if (cxld_set_interleave_fields(cxld, ctrl))
+		return;
+
 	*ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT;
 }
 
@@ -927,6 +935,310 @@ static void cxl_decoder_reset(struct cxl_decoder *cxld)
 	}
 }
 
+static int cxl_restore_dvsec_range(struct cxl_memdev *cxlmd,
+				   struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_decoder *cxld = &cxled->cxld;
+	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+	u64 base = cxld->hpa_range.start;
+	u64 size = range_len(&cxld->hpa_range);
+	u32 lo;
+	int dvsec = cxlds->cxl_dvsec;
+	int id = cxld->id;
+	int rc;
+
+	if (!dvsec)
+		return 0;
+
+	if (id >= CXL_DVSEC_RANGE_MAX)
+		return 0;
+
+	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(id),
+				    upper_32_bits(base));
+	if (rc)
+		return rc;
+
+	rc = pci_read_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(id),
+				   &lo);
+	if (rc)
+		return rc;
+	lo &= ~PCI_DVSEC_CXL_MEM_BASE_LOW;
+	lo |= lower_32_bits(base) & PCI_DVSEC_CXL_MEM_BASE_LOW;
+
+	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(id),
+				    lo);
+	if (rc)
+		return rc;
+
+	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(id),
+				    upper_32_bits(size));
+	if (rc)
+		return rc;
+
+	rc = pci_read_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
+				   &lo);
+	if (rc)
+		return rc;
+
+	/*
+	 * Preserve MEM_INFO_VALID / MEM_ACTIVE and any reserved bits while
+	 * restoring only the programmable size bits.
+	 */
+	lo &= ~PCI_DVSEC_CXL_MEM_SIZE_LOW;
+	lo |= lower_32_bits(size) & PCI_DVSEC_CXL_MEM_SIZE_LOW;
+
+	return pci_write_config_dword(pdev,
+				      dvsec + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
+				      lo);
+}
+
+static int cxl_restore_hdm_decoder(struct cxl_hdm *cxlhdm,
+				   struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_decoder *cxld = &cxled->cxld;
+	void __iomem *hdm;
+	u64 base, size, skip;
+	u32 ctrl;
+	int id;
+
+	id = cxld->id;
+	hdm = cxlhdm->regs.hdm_decoder;
+	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+	if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
+		return 0;
+
+	base = cxld->hpa_range.start;
+	size = range_len(&cxld->hpa_range);
+	skip = cxled->skip;
+
+	ctrl &= ~(CXL_HDM_DECODER0_CTRL_LOCK |
+		  CXL_HDM_DECODER0_CTRL_COMMIT |
+		  CXL_HDM_DECODER0_CTRL_COMMITTED |
+		  CXL_HDM_DECODER0_CTRL_COMMIT_ERROR);
+	if (cxld_set_interleave_fields(cxld, &ctrl))
+		return -EINVAL;
+	cxld_set_type(cxld, &ctrl);
+
+	/* Preserve setup_hw_decoder() programming order, without COMMIT. */
+	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
+	writel(lower_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
+	writel(upper_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id));
+	writel(lower_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id));
+	writel(upper_32_bits(skip), hdm + CXL_HDM_DECODER0_SKIP_HIGH(id));
+	writel(lower_32_bits(skip), hdm + CXL_HDM_DECODER0_SKIP_LOW(id));
+	wmb();
+	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+
+	return 0;
+}
+
+struct cxl_restore_ctx {
+	struct cxl_memdev *cxlmd;
+	struct cxl_hdm *cxlhdm;
+};
+
+static int cxl_restore_decoder(struct device *dev, void *data)
+{
+	struct cxl_restore_ctx *ctx = data;
+	struct cxl_endpoint_decoder *cxled;
+	struct cxl_decoder *cxld;
+	int rc;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	cxld = &cxled->cxld;
+	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
+		return 0;
+
+	if (ctx->cxlhdm->regs.hdm_decoder) {
+		if (cxld->id >= ctx->cxlhdm->decoder_count)
+			return -EINVAL;
+
+		rc = cxl_restore_hdm_decoder(ctx->cxlhdm, cxled);
+		if (rc)
+			return rc;
+	}
+
+	return cxl_restore_dvsec_range(ctx->cxlmd, cxled);
+}
+
+static int cxl_restore_decoders(struct cxl_memdev *cxlmd, struct cxl_hdm *cxlhdm)
+{
+	struct cxl_port *port = cxlhdm->port;
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	struct cxl_restore_ctx ctx = {
+		.cxlmd = cxlmd,
+		.cxlhdm = cxlhdm,
+	};
+	u32 global_ctrl;
+
+	if (hdm) {
+		global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+		writel(global_ctrl & ~CXL_HDM_DECODER_ENABLE,
+		       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	}
+
+	return device_for_each_child(&port->dev, &ctx, cxl_restore_decoder);
+}
+
+/**
+ * cxl_restore_memdev_decoders - Restore endpoint decoder programming
+ * @cxlmd: CXL memdev whose endpoint decoders need to be restored
+ *
+ * Restore only programmable decoder state from CXL core's cached decoder
+ * objects. For endpoints with HDM decoder registers, program the HDM decoder
+ * fields and mirror decoder ids representable by CXL_DVSEC_RANGE_MAX into the
+ * DVSEC range registers when present. For endpoints without HDM decoder
+ * registers, restore DVSEC range registers only.
+ *
+ * This helper leaves CXL.mem disabled: it does not commit HDM decoders, enable
+ * the HDM Decoder Capability, set PCI_DVSEC_CXL_MEM_ENABLE, or restore
+ * unrelated DVSEC CTRL, CTRL2, LOCK, MEM_ENABLE, or other control state.
+ * Callers must perform final commit/resume steps only after reset safety checks
+ * pass.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cxl_restore_memdev_decoders(struct cxl_memdev *cxlmd)
+{
+	struct cxl_port *endpoint = cxlmd->endpoint;
+	struct cxl_hdm *cxlhdm;
+	int rc;
+
+	lockdep_assert_held_write(&cxl_rwsem.region);
+
+	if (!endpoint)
+		return -ENODEV;
+
+	cxlhdm = dev_get_drvdata(&endpoint->dev);
+	if (!cxlhdm)
+		return -ENODEV;
+
+	scoped_guard(rwsem_read, &cxl_rwsem.dpa)
+		rc = cxl_restore_decoders(cxlmd, cxlhdm);
+	return rc;
+}
+
+static int cxl_commit_restored_hdm_decoder(struct cxl_hdm *cxlhdm,
+					   struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_decoder *cxld = &cxled->cxld;
+	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
+	u32 ctrl;
+	int id;
+
+	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
+		return 0;
+
+	if (!hdm)
+		return 0;
+
+	id = cxld->id;
+	if (id >= cxlhdm->decoder_count)
+		return -EINVAL;
+
+	/*
+	 * cxl_restore_hdm_decoder() programmed the decoder fields first. This
+	 * control register write sets COMMIT as the final programming step.
+	 */
+	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+	if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
+		return 0;
+
+	if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
+		return 0;
+
+	ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT;
+	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
+
+	return cxld_await_commit(hdm, id);
+}
+
+struct cxl_commit_decoder_ctx {
+	struct cxl_hdm *cxlhdm;
+	int id;
+};
+
+static int cxl_commit_restored_decoder_by_id(struct device *dev, void *data)
+{
+	struct cxl_commit_decoder_ctx *ctx = data;
+	struct cxl_endpoint_decoder *cxled;
+	int rc;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	if (cxled->cxld.id != ctx->id)
+		return 0;
+
+	rc = cxl_commit_restored_hdm_decoder(ctx->cxlhdm, cxled);
+	return rc ?: 1;
+}
+
+/**
+ * cxl_commit_memdev_decoders - Commit restored endpoint decoder programming
+ * @cxlmd: CXL memdev whose endpoint decoders need to be committed
+ *
+ * Resume endpoint decoding after cxl_restore_memdev_decoders() has restored
+ * programmable decoder fields. For endpoints with HDM decoder registers, enable
+ * the HDM Decoder Capability and commit enabled, unlocked endpoint decoders.
+ * Locked decoders are left to their current hardware/firmware-owned state.
+ *
+ * This helper does not set PCI_DVSEC_CXL_MEM_ENABLE. Callers must enable
+ * CXL.mem only after all reset safety checks and decoder restore/commit steps
+ * have completed.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int cxl_commit_memdev_decoders(struct cxl_memdev *cxlmd)
+{
+	struct cxl_port *endpoint = cxlmd->endpoint;
+	struct cxl_hdm *cxlhdm;
+	void __iomem *hdm;
+	u32 global_ctrl;
+	int i, rc;
+
+	lockdep_assert_held_write(&cxl_rwsem.region);
+
+	if (!endpoint)
+		return -ENODEV;
+
+	cxlhdm = dev_get_drvdata(&endpoint->dev);
+	if (!cxlhdm)
+		return -ENODEV;
+
+	hdm = cxlhdm->regs.hdm_decoder;
+	if (!hdm)
+		return 0;
+
+	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+	writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
+	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+	for (i = 0; i < cxlhdm->decoder_count; i++) {
+		struct cxl_commit_decoder_ctx ctx = {
+			.cxlhdm = cxlhdm,
+			.id = i,
+		};
+
+		/*
+		 * Per CXL Spec 3.1 8.2.4.20.12 software must commit decoders
+		 * in HPA order. Region setup already enforces that ordering by
+		 * decoder id, so restore commits follow ascending id order.
+		 */
+		rc = device_for_each_child(&endpoint->dev, &ctx,
+					   cxl_commit_restored_decoder_by_id);
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
+
 static int cxl_setup_hdm_decoder_from_dvsec(
 	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
 	int which, struct cxl_endpoint_dvsec_info *info)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..b51b1e9d6400 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -794,6 +794,8 @@ int cxl_port_setup_regs(struct cxl_port *port,
 struct cxl_dev_state;
 int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
 			struct cxl_endpoint_dvsec_info *info);
+int cxl_restore_memdev_decoders(struct cxl_memdev *cxlmd);
+int cxl_commit_memdev_decoders(struct cxl_memdev *cxlmd);
 
 bool is_cxl_region(struct device *dev);
 
-- 
2.43.0



* [PATCH v6 2/9] PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 3/9] cxl: Add reset-idle and cache flush helpers Srirangan Madhavan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Export pci_dev_save_and_disable() and pci_dev_restore() so CXL reset
orchestration can reuse the PCI core reset lifecycle for non-standard
reset flows.

These helpers invoke driver reset_prepare/reset_done callbacks, save and
restore PCI config state, and disable the device while the caller holds
the device lock.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/pci/pci.c   | 22 ++++++++++++++++++++--
 include/linux/pci.h |  2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d34266651ad0..75d2f4074750 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5003,7 +5003,15 @@ void pci_dev_unlock(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_dev_unlock);
 
-static void pci_dev_save_and_disable(struct pci_dev *dev)
+/**
+ * pci_dev_save_and_disable - Save device state and disable it
+ * @dev: PCI device to save and disable
+ *
+ * Save the PCI configuration state, invoke the driver's reset_prepare()
+ * callback if present, and disable the device by clearing the Command
+ * register. The device lock must be held by the caller.
+ */
+void pci_dev_save_and_disable(struct pci_dev *dev)
 {
 	const struct pci_error_handlers *err_handler =
 			dev->driver ? dev->driver->err_handler : NULL;
@@ -5036,8 +5044,17 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
 	 */
 	pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
 }
+EXPORT_SYMBOL_GPL(pci_dev_save_and_disable);
 
-static void pci_dev_restore(struct pci_dev *dev)
+/**
+ * pci_dev_restore - Restore device state after reset
+ * @dev: PCI device to restore
+ *
+ * Restore the saved PCI configuration state and invoke the driver's
+ * reset_done() callback if present. The device lock must be held by the
+ * caller.
+ */
+void pci_dev_restore(struct pci_dev *dev)
 {
 	const struct pci_error_handlers *err_handler =
 			dev->driver ? dev->driver->err_handler : NULL;
@@ -5054,6 +5071,7 @@ static void pci_dev_restore(struct pci_dev *dev)
 	else if (dev->driver)
 		pci_warn(dev, "reset done");
 }
+EXPORT_SYMBOL_GPL(pci_dev_restore);
 
 /* dev->reset_methods[] is a 0-terminated list of indices into this array */
 const struct pci_reset_fn_method pci_reset_fn_methods[] = {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4454583c11..d6303e16e11b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2012,6 +2012,8 @@ void pci_dev_lock(struct pci_dev *dev);
 int pci_dev_trylock(struct pci_dev *dev);
 void pci_dev_unlock(struct pci_dev *dev);
 DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
+void pci_dev_save_and_disable(struct pci_dev *dev);
+void pci_dev_restore(struct pci_dev *dev);
 
 /*
  * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
-- 
2.43.0



* [PATCH v6 3/9] cxl: Add reset-idle and cache flush helpers
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 2/9] PCI: Export pci_dev_save_and_disable() and pci_dev_restore() Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset Srirangan Madhavan
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Add helpers to collect the CXL regions affected by a memdev reset,
verify that those regions are idle, and invalidate CPU caches for the
affected address ranges before reset.

A memdev can participate in an interleaved region through multiple
endpoint decoders. Track affected regions in a temporary xarray so each
region is checked and cache-invalidated once per reset operation.

These helpers prepare the CXL.mem data path for reset. The actual reset
orchestration and decoder restore flow are added separately.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
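Note for reviewers: the "each region once" behavior described in the
commit message can be illustrated with a small userspace model. It is a
sketch only: struct region_set and region_set_add() are hypothetical
stand-ins for the temporary xarray keyed by region pointer, and the
fixed-size array is an arbitrary simplification.

```c
#include <stddef.h>

#define MAX_REGIONS 16	/* arbitrary bound for the sketch */

/* Hypothetical stand-in for the temporary xarray of affected regions. */
struct region_set {
	const void *items[MAX_REGIONS];
	size_t count;
};

/*
 * Add a region once. A repeated insertion is not an error, mirroring how
 * the patch maps xa_insert()'s -EBUSY to 0. Returns -1 only when full.
 */
int region_set_add(struct region_set *set, const void *region)
{
	size_t i;

	if (!region)
		return 0;	/* decoder is not attached to a region */

	for (i = 0; i < set->count; i++)
		if (set->items[i] == region)
			return 0;	/* already tracked */

	if (set->count == MAX_REGIONS)
		return -1;

	set->items[set->count++] = region;
	return 0;
}
```

Walking the resulting set afterwards visits each region exactly once,
however many endpoint decoders referenced it.
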
 drivers/cxl/core/pci.c | 170 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index d1f487b3d809..318744695f62 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -4,9 +4,11 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/device.h>
 #include <linux/delay.h>
+#include <linux/memregion.h>
 #include <linux/pci.h>
 #include <linux/pci-doe.h>
 #include <linux/aer.h>
+#include <linux/xarray.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
 #include <cxl.h>
@@ -926,3 +928,171 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
 
 	return ctx.count;
 }
+
+static int cxl_reset_system_ram_found(struct resource *res, void *data)
+{
+	return 1;
+}
+
+struct cxl_reset_region_context {
+	struct xarray regions;
+};
+
+static void __maybe_unused
+cxl_reset_region_context_init(struct cxl_reset_region_context *ctx)
+{
+	xa_init(&ctx->regions);
+}
+
+static void __maybe_unused
+cxl_reset_region_context_destroy(struct cxl_reset_region_context *ctx)
+{
+	xa_destroy(&ctx->regions);
+}
+
+static int cxl_reset_add_region(struct cxl_reset_region_context *ctx,
+				struct cxl_region *cxlr)
+{
+	int rc;
+
+	if (!cxlr || !cxlr->params.res)
+		return 0;
+
+	rc = xa_insert(&ctx->regions, (unsigned long)cxlr, cxlr, GFP_KERNEL);
+
+	/* A region may be referenced by multiple affected endpoint decoders. */
+	return rc == -EBUSY ? 0 : rc;
+}
+
+static int cxl_reset_collect_region(struct device *dev, void *data)
+{
+	struct cxl_reset_region_context *ctx = data;
+	struct cxl_endpoint_decoder *cxled;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	return cxl_reset_add_region(ctx, cxled->cxld.region);
+}
+
+static int __maybe_unused
+cxl_reset_collect_memdev_regions(struct cxl_reset_region_context *ctx,
+				 struct cxl_memdev *cxlmd)
+{
+	struct cxl_port *endpoint;
+
+	if (!cxlmd || !cxlmd->cxlds)
+		return -ENODEV;
+
+	endpoint = cxlmd->endpoint;
+	if (!endpoint)
+		return 0;
+
+	return device_for_each_child(&endpoint->dev, ctx,
+				     cxl_reset_collect_region);
+}
+
+static bool cxl_reset_region_has_system_ram(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int rc;
+
+	if (!p->res)
+		return false;
+
+	rc = walk_iomem_res_desc(IORES_DESC_NONE,
+				 IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+				 p->res->start, p->res->end, NULL,
+				 cxl_reset_system_ram_found);
+
+	return rc > 0;
+}
+
+static int cxl_reset_validate_region_idle(struct cxl_region *cxlr)
+{
+	struct resource *res = cxlr->params.res;
+	int rc = 0;
+
+	lockdep_assert_held_write(&cxl_rwsem.region);
+
+	if (cxl_reset_region_has_system_ram(cxlr)) {
+		dev_err(&cxlr->dev,
+			"Cannot reset while CXL memory is online as System RAM [%pr]\n",
+			res);
+		return -EBUSY;
+	}
+
+	if (!device_trylock(&cxlr->dev))
+		return -EAGAIN;
+
+	if (cxlr->dev.driver) {
+		dev_err(&cxlr->dev,
+			"Cannot reset while CXL region has an active driver\n");
+		rc = -EBUSY;
+	}
+
+	device_unlock(&cxlr->dev);
+	return rc;
+}
+
+static int __maybe_unused
+cxl_reset_validate_regions_idle(struct cxl_reset_region_context *ctx)
+{
+	struct cxl_region *cxlr;
+	unsigned long index;
+	int rc;
+
+	xa_for_each(&ctx->regions, index, cxlr) {
+		rc = cxl_reset_validate_region_idle(cxlr);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int cxl_reset_flush_region_cache(struct cxl_region *cxlr)
+{
+	struct resource *res = cxlr->params.res;
+	int rc;
+
+	if (!res)
+		return 0;
+
+	rc = cpu_cache_invalidate_memregion(res->start, resource_size(res));
+	if (rc)
+		dev_err(&cxlr->dev, "Failed to invalidate CPU cache [%pr]: %d\n",
+			res, rc);
+
+	return rc;
+}
+
+static int __maybe_unused
+cxl_reset_flush_cpu_caches(struct cxl_reset_region_context *ctx)
+{
+	struct cxl_region *cxlr;
+	unsigned long index;
+	int rc;
+
+	if (xa_empty(&ctx->regions))
+		return 0;
+
+	if (!cpu_cache_has_invalidate_memregion()) {
+		if (IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)) {
+			pr_info_once(
+				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
+			return 0;
+		}
+		pr_warn("Failed to synchronize CPU cache state\n");
+		return -ENXIO;
+	}
+
+	xa_for_each(&ctx->regions, index, cxlr) {
+		rc = cxl_reset_flush_region_cache(cxlr);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
-- 
2.43.0



* [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
                   ` (2 preceding siblings ...)
  2026-05-28  8:31 ` [PATCH v6 3/9] cxl: Add reset-idle and cache flush helpers Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28 11:15   ` Richard Cheng
  2026-05-28  8:31 ` [PATCH v6 5/9] cxl/pci: Add CXL DVSEC reset helper Srirangan Madhavan
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Add helpers to collect CXL sibling PCI functions affected by a CXL reset
and prepare them for reset by saving and disabling them. Restore those
siblings and drop their references when reset coordination completes.

Use the Non-CXL Function Map DVSEC to exclude non-CXL functions, and
filter remaining siblings to functions that advertise CXL.cache or
CXL.mem capability.

Use pci_dev_trylock() for sibling locking and unwind on contention or
allocation failure, so competing reset paths fail with an errno.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
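Note for reviewers: the Non-CXL Function Map lookup used for sibling
filtering can be sketched in userspace as below. This mirrors the index
arithmetic in cxl_reset_is_cxl_sibling() from this patch; the helper
names and the plain uint32_t array are hypothetical stand-ins for the
kernel bitmap, and the sketch leaves out the same-bus and same-slot
checks the real code performs.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAP_REGS 8	/* 8 x 32-bit registers = 256 function bits */

/* devfn layout as in PCI: device (slot) in bits 7:3, function in bits 2:0. */
static unsigned int slot_of(uint8_t devfn) { return devfn >> 3; }
static unsigned int func_of(uint8_t devfn) { return devfn & 0x7; }

/*
 * Bit index into the map. With ARI the 8-bit function number indexes the
 * map directly. Without ARI the function number selects a 32-bit register
 * and the device number selects the bit within it.
 */
unsigned int non_cxl_map_bit(uint8_t devfn, bool ari)
{
	return ari ? devfn : func_of(devfn) * 32 + slot_of(devfn);
}

/* True when the map marks this function as non-CXL. */
bool is_non_cxl(const uint32_t map[MAP_REGS], uint8_t devfn, bool ari)
{
	unsigned int bit = non_cxl_map_bit(devfn, ari);

	return (map[bit / 32] & (1u << (bit % 32))) != 0;
}
```

For example, device 3 function 2 (devfn 26) maps to bit 26 with ARI and
to bit 67 (register 2, bit 3) without it.
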
 drivers/cxl/core/pci.c        | 207 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/pci_regs.h |   2 +
 2 files changed, 209 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 318744695f62..01effbb4e7cd 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1,9 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2021 Intel Corporation. All rights reserved. */
 #include <linux/units.h>
+#include <linux/bitmap.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/device.h>
 #include <linux/delay.h>
+#include <linux/iommu.h>
 #include <linux/memregion.h>
 #include <linux/pci.h>
 #include <linux/pci-doe.h>
@@ -15,6 +17,10 @@
 #include "core.h"
 #include "trace.h"
 
+#define CXL_RESET_MAX_FUNCTIONS		256
+#define CXL_RESET_FUNCTION_MAP_REGS	(CXL_RESET_MAX_FUNCTIONS / 32)
+#define CXL_RESET_SIBLINGS_INIT		8
+
 /**
  * DOC: cxl core pci
  *
@@ -1096,3 +1102,204 @@ cxl_reset_flush_cpu_caches(struct cxl_reset_region_context *ctx)
 
 	return 0;
 }
+
+struct cxl_reset_context {
+	struct pci_dev *target;
+	struct pci_dev **siblings;
+	int nr_siblings;
+	int sibling_capacity;
+	int nr_siblings_prepared;
+};
+
+struct cxl_reset_walk_ctx {
+	struct cxl_reset_context *ctx;
+	unsigned long *non_cxl_func_map;
+	int rc;
+};
+
+static void
+cxl_reset_read_non_cxl_func_map(struct pci_dev *pdev,
+				unsigned long *non_cxl_func_map)
+{
+	u32 map[CXL_RESET_FUNCTION_MAP_REGS] = {};
+	u16 dvsec;
+	int rc, i;
+
+	bitmap_zero(non_cxl_func_map, CXL_RESET_MAX_FUNCTIONS);
+
+	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+					  PCI_DVSEC_CXL_FUNCTION_MAP);
+	if (!dvsec)
+		return;
+
+	for (i = 0; i < CXL_RESET_FUNCTION_MAP_REGS; i++) {
+		rc = pci_read_config_dword(pdev,
+					   dvsec + PCI_DVSEC_CXL_FUNCTION_MAP_REG +
+					   i * sizeof(map[i]), &map[i]);
+		if (rc) {
+			pci_warn(pdev,
+				 "failed to read CXL Function Map; treating all siblings as CXL: %d\n",
+				 rc);
+			bitmap_zero(non_cxl_func_map, CXL_RESET_MAX_FUNCTIONS);
+			return;
+		}
+	}
+
+	bitmap_from_arr32(non_cxl_func_map, map, CXL_RESET_MAX_FUNCTIONS);
+}
+
+static bool cxl_reset_is_cxl_sibling(struct pci_dev *pdev,
+				     struct pci_dev *sibling,
+				     unsigned long *non_cxl_func_map)
+{
+	if (sibling == pdev || sibling->bus != pdev->bus)
+		return false;
+
+	if (pci_ari_enabled(pdev->bus))
+		return !test_bit(sibling->devfn, non_cxl_func_map);
+
+	if (PCI_SLOT(sibling->devfn) != PCI_SLOT(pdev->devfn))
+		return false;
+
+	return !test_bit(PCI_FUNC(sibling->devfn) * 32 +
+			 PCI_SLOT(sibling->devfn), non_cxl_func_map);
+}
+
+static bool cxl_reset_has_cache_or_mem(struct pci_dev *pdev)
+{
+	u16 dvsec, cap;
+
+	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+					  PCI_DVSEC_CXL_DEVICE);
+	if (!dvsec)
+		return false;
+
+	if (pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap))
+		return false;
+
+	return cap & (PCI_DVSEC_CXL_CACHE_CAPABLE | PCI_DVSEC_CXL_MEM_CAPABLE);
+}
+
+static int cxl_reset_add_sibling(struct cxl_reset_context *ctx,
+				 struct pci_dev *sibling)
+{
+	struct pci_dev **siblings;
+	int capacity;
+
+	if (ctx->nr_siblings < ctx->sibling_capacity)
+		goto add;
+
+	capacity = ctx->sibling_capacity ? ctx->sibling_capacity * 2 :
+		   CXL_RESET_SIBLINGS_INIT;
+	siblings = krealloc(ctx->siblings, capacity * sizeof(*siblings),
+			    GFP_KERNEL);
+	if (!siblings)
+		return -ENOMEM;
+
+	ctx->siblings = siblings;
+	ctx->sibling_capacity = capacity;
+
+add:
+	pci_dev_get(sibling);
+	ctx->siblings[ctx->nr_siblings++] = sibling;
+	return 0;
+}
+
+static int cxl_reset_collect_sibling(struct pci_dev *sibling, void *data)
+{
+	struct cxl_reset_walk_ctx *wctx = data;
+	struct cxl_reset_context *ctx = wctx->ctx;
+	struct pci_dev *pdev = ctx->target;
+
+	if (!cxl_reset_is_cxl_sibling(pdev, sibling, wctx->non_cxl_func_map))
+		return 0;
+
+	if (!cxl_reset_has_cache_or_mem(sibling))
+		return 0;
+
+	wctx->rc = cxl_reset_add_sibling(ctx, sibling);
+	return wctx->rc;
+}
+
+static int cxl_reset_collect_siblings(struct cxl_reset_context *ctx)
+{
+	DECLARE_BITMAP(non_cxl_func_map, CXL_RESET_MAX_FUNCTIONS);
+	struct cxl_reset_walk_ctx wctx = {
+		.ctx = ctx,
+		.non_cxl_func_map = non_cxl_func_map,
+	};
+
+	cxl_reset_read_non_cxl_func_map(ctx->target, non_cxl_func_map);
+	pci_walk_bus(ctx->target->bus, cxl_reset_collect_sibling, &wctx);
+	return wctx.rc;
+}
+
+static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
+{
+	int i;
+
+	for (i = ctx->nr_siblings_prepared - 1; i >= 0; i--) {
+		struct pci_dev *sibling = ctx->siblings[i];
+
+		pci_dev_reset_iommu_done(sibling);
+		pci_dev_restore(sibling);
+		pci_dev_unlock(sibling);
+	}
+
+	for (i = 0; i < ctx->nr_siblings; i++)
+		pci_dev_put(ctx->siblings[i]);
+
+	kfree(ctx->siblings);
+	ctx->siblings = NULL;
+	ctx->nr_siblings = 0;
+	ctx->sibling_capacity = 0;
+	ctx->nr_siblings_prepared = 0;
+}
+
+static int __maybe_unused
+cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
+{
+	int rc, i;
+
+	ctx->siblings = NULL;
+	ctx->nr_siblings = 0;
+	ctx->sibling_capacity = 0;
+	ctx->nr_siblings_prepared = 0;
+
+	rc = cxl_reset_collect_siblings(ctx);
+	if (rc)
+		goto err;
+
+	for (i = 0; i < ctx->nr_siblings; i++) {
+		struct pci_dev *sibling = ctx->siblings[i];
+
+		if (!pci_dev_trylock(sibling)) {
+			rc = -EAGAIN;
+			goto err;
+		}
+
+		pci_dev_save_and_disable(sibling);
+		rc = pci_dev_reset_iommu_prepare(sibling);
+		if (rc) {
+			pci_err(sibling,
+				"failed to block IOMMU for CXL reset: %d\n",
+				rc);
+			/*
+			 * Undo save_and_disable() for this sibling. IOMMU
+			 * prepare failed, so this sibling is not counted in
+			 * nr_siblings_prepared and must not get iommu_done().
+			 */
+			pci_dev_restore(sibling);
+			pci_dev_unlock(sibling);
+			goto err;
+		}
+
+		ctx->nr_siblings_prepared++;
+	}
+
+	return 0;
+
+err:
+	cxl_pci_functions_reset_done(ctx);
+	return rc;
+}
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 14f634ab9350..fa1fcd26af01 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1349,6 +1349,7 @@
 /* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
 #define PCI_DVSEC_CXL_DEVICE				0
 #define  PCI_DVSEC_CXL_CAP				0xA
+#define   PCI_DVSEC_CXL_CACHE_CAPABLE			_BITUL(0)
 #define   PCI_DVSEC_CXL_MEM_CAPABLE			_BITUL(2)
 #define   PCI_DVSEC_CXL_HDM_COUNT			__GENMASK(5, 4)
 #define  PCI_DVSEC_CXL_CTRL				0xC
@@ -1366,6 +1367,7 @@
 
 /* CXL r4.0, 8.1.4: Non-CXL Function Map DVSEC */
 #define PCI_DVSEC_CXL_FUNCTION_MAP			2
+#define  PCI_DVSEC_CXL_FUNCTION_MAP_REG			0x0C
 
 /* CXL r4.0, 8.1.5: Extensions DVSEC for Ports */
 #define PCI_DVSEC_CXL_PORT				3
-- 
2.43.0



* [PATCH v6 5/9] cxl/pci: Add CXL DVSEC reset helper
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
                   ` (3 preceding siblings ...)
  2026-05-28  8:31 ` [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 6/9] cxl/pci: Track memdevs affected by CXL reset Srirangan Madhavan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Add a helper to execute CXL Reset through the CXL Device DVSEC. The
helper verifies reset capability, waits for pending PCIe transactions,
disables CXL.cache, optionally initiates cache writeback and invalidation,
and then starts CXL Reset through the DVSEC Control2 register.

Block IOMMU traffic while reset is active, then restore IOMMU
translations after reset completes.

Wait for the DVSEC reset timeout before checking reset completion, and
report reset error or timeout status from the DVSEC Status2 register. Add
the CXL Device DVSEC reset and cache control definitions needed by the
helper.
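
The timeout and status handling can be sketched in userspace as below.
This is an illustrative model under stated assumptions, not the helper
itself: the macro and function names are hypothetical, while the bit
positions and the timeout table mirror the values used in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit positions mirror the definitions added by this patch. */
#define CAP_RST_TIMEOUT_SHIFT	8
#define CAP_RST_TIMEOUT_MASK	0x7u
#define STATUS2_RST_DONE	(1u << 1)
#define STATUS2_RST_ERR		(1u << 2)
#define MIN_QUIET_MS		100u

/* Decode the reset timeout field and apply the minimum quiet period. */
static uint32_t reset_wait_ms(uint16_t cap)
{
	static const uint32_t timeout_ms[] = { 10, 100, 1000, 10000, 100000 };
	const unsigned int last = sizeof(timeout_ms) / sizeof(timeout_ms[0]) - 1;
	unsigned int idx = (cap >> CAP_RST_TIMEOUT_SHIFT) & CAP_RST_TIMEOUT_MASK;

	if (idx > last)
		idx = last;	/* clamp out-of-table encodings to the longest wait */
	assert(idx <= last);

	return timeout_ms[idx] > MIN_QUIET_MS ? timeout_ms[idx] : MIN_QUIET_MS;
}

/* Map a Status2 value read after the wait to an errno-style result. */
static int reset_status_to_errno(uint16_t status2)
{
	if (status2 & STATUS2_RST_ERR)
		return -EIO;		/* error takes priority over done */
	if (!(status2 & STATUS2_RST_DONE))
		return -ETIMEDOUT;
	return 0;
}
```

Note that the error bit is checked before the done bit, so a device that
reports both is treated as failed.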

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/cxl/core/pci.c        | 185 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/pci_regs.h |  13 +++
 2 files changed, 198 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 01effbb4e7cd..1dd880f5a333 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -20,6 +20,9 @@
 #define CXL_RESET_MAX_FUNCTIONS		256
 #define CXL_RESET_FUNCTION_MAP_REGS	(CXL_RESET_MAX_FUNCTIONS / 32)
 #define CXL_RESET_SIBLINGS_INIT		8
+#define CXL_RESET_CACHE_WBI_POLL_US	100
+#define CXL_RESET_CACHE_WBI_TIMEOUT_US	(100 * USEC_PER_MSEC)
+#define CXL_RESET_MIN_QUIET_MS		100
 
 /**
  * DOC: cxl core pci
@@ -1303,3 +1306,185 @@ cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
 	cxl_pci_functions_reset_done(ctx);
 	return rc;
 }
+
+static int cxl_reset_update_ctrl2(struct pci_dev *pdev, int dvsec, u16 set,
+				  u16 clear)
+{
+	u16 ctrl2;
+	int rc;
+
+	rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
+	if (rc)
+		return rc;
+
+	ctrl2 &= ~clear;
+	ctrl2 |= set;
+
+	return pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, ctrl2);
+}
+
+static int cxl_reset_wait_cache_inv(struct pci_dev *pdev, int dvsec)
+{
+	int remaining_us = CXL_RESET_CACHE_WBI_TIMEOUT_US;
+	u16 status2;
+	int rc;
+
+	do {
+		usleep_range(CXL_RESET_CACHE_WBI_POLL_US,
+			     CXL_RESET_CACHE_WBI_POLL_US + 1);
+		remaining_us -= CXL_RESET_CACHE_WBI_POLL_US;
+
+		rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_STATUS2,
+					  &status2);
+		if (rc)
+			return rc;
+
+		if (status2 & PCI_DVSEC_CXL_CACHE_INV)
+			return 0;
+	} while (remaining_us > 0);
+
+	pci_err(pdev, "CXL cache WB+I timed out\n");
+	return -ETIMEDOUT;
+}
+
+static int cxl_reset_enable_cache(struct pci_dev *pdev, int dvsec, u16 cap)
+{
+	if (!(cap & PCI_DVSEC_CXL_CACHE_CAPABLE))
+		return 0;
+
+	return cxl_reset_update_ctrl2(pdev, dvsec, 0,
+				      PCI_DVSEC_CXL_DISABLE_CACHING);
+}
+
+static int cxl_reset_disable_cache(struct pci_dev *pdev, int dvsec, u16 cap)
+{
+	int rc;
+
+	if (!(cap & PCI_DVSEC_CXL_CACHE_CAPABLE))
+		return 0;
+
+	rc = cxl_reset_update_ctrl2(pdev, dvsec,
+				    PCI_DVSEC_CXL_DISABLE_CACHING, 0);
+	if (rc)
+		return rc;
+
+	if (!(cap & PCI_DVSEC_CXL_CACHE_WBI_CAPABLE))
+		return 0;
+
+	rc = cxl_reset_update_ctrl2(pdev, dvsec,
+				    PCI_DVSEC_CXL_INIT_CACHE_WBI, 0);
+	if (rc)
+		goto err_enable_cache;
+
+	rc = cxl_reset_wait_cache_inv(pdev, dvsec);
+	if (rc)
+		goto err_enable_cache;
+
+	return 0;
+
+err_enable_cache:
+	/*
+	 * Best effort rollback: preserve the original WB+I failure even if
+	 * re-enabling CXL.cache also fails.
+	 */
+	cxl_reset_enable_cache(pdev, dvsec, cap);
+	return rc;
+}
+
+static int cxl_reset_wait_done(struct pci_dev *pdev, int dvsec, u16 cap)
+{
+	static const u32 reset_timeout_ms[] = { 10, 100, 1000, 10000, 100000 };
+	u32 timeout_ms;
+	u16 status2;
+	int rc, idx;
+
+	idx = FIELD_GET(PCI_DVSEC_CXL_RST_TIMEOUT, cap);
+	if (idx >= ARRAY_SIZE(reset_timeout_ms))
+		idx = ARRAY_SIZE(reset_timeout_ms) - 1;
+	timeout_ms = reset_timeout_ms[idx];
+
+	msleep(max_t(u32, timeout_ms, CXL_RESET_MIN_QUIET_MS));
+
+	rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_STATUS2,
+				  &status2);
+	if (rc)
+		return rc;
+
+	if (status2 & PCI_DVSEC_CXL_RST_ERR) {
+		pci_err(pdev, "CXL reset error\n");
+		return -EIO;
+	}
+
+	if (!(status2 & PCI_DVSEC_CXL_RST_DONE)) {
+		pci_err(pdev, "CXL reset timed out\n");
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
+static int __maybe_unused cxl_dev_reset(struct pci_dev *pdev, bool mem_clear)
+{
+	int dvsec, rc;
+	u16 ctrl2_clear = 0;
+	u16 cap;
+
+	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+					  PCI_DVSEC_CXL_DEVICE);
+	if (!dvsec)
+		return -ENODEV;
+
+	rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap);
+	if (rc)
+		return rc;
+
+	if (!(cap & PCI_DVSEC_CXL_RST_CAPABLE))
+		return -EOPNOTSUPP;
+
+	if (mem_clear && !(cap & PCI_DVSEC_CXL_RST_MEM_CLR_CAPABLE))
+		return -EOPNOTSUPP;
+
+	if (!pci_wait_for_pending_transaction(pdev))
+		pci_err(pdev, "timed out waiting for pending transactions\n");
+
+	rc = pci_dev_reset_iommu_prepare(pdev);
+	if (rc) {
+		pci_err(pdev, "failed to block IOMMU for CXL reset: %d\n",
+			rc);
+		return rc;
+	}
+
+	rc = cxl_reset_disable_cache(pdev, dvsec, cap);
+	if (rc)
+		goto out_iommu;
+	if (cap & PCI_DVSEC_CXL_CACHE_CAPABLE)
+		ctrl2_clear |= PCI_DVSEC_CXL_DISABLE_CACHING;
+
+	if (mem_clear) {
+		rc = cxl_reset_update_ctrl2(pdev, dvsec,
+					    PCI_DVSEC_CXL_RST_MEM_CLR_EN, 0);
+		if (rc)
+			goto out_ctrl2;
+		ctrl2_clear |= PCI_DVSEC_CXL_RST_MEM_CLR_EN;
+	}
+
+	rc = cxl_reset_update_ctrl2(pdev, dvsec,
+				    PCI_DVSEC_CXL_INIT_CXL_RST, 0);
+	if (rc)
+		goto out_ctrl2;
+
+	rc = cxl_reset_wait_done(pdev, dvsec, cap);
+	if (rc)
+		goto out_iommu;
+
+	rc = cxl_reset_update_ctrl2(pdev, dvsec, 0,
+				    PCI_DVSEC_CXL_DISABLE_CACHING);
+
+out_ctrl2:
+	if (rc && ctrl2_clear)
+		cxl_reset_update_ctrl2(pdev, dvsec, 0, ctrl2_clear);
+
+out_iommu:
+	pci_dev_reset_iommu_done(pdev);
+	return rc;
+}
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index fa1fcd26af01..7fc1d34fcce7 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1352,8 +1352,21 @@
 #define   PCI_DVSEC_CXL_CACHE_CAPABLE			_BITUL(0)
 #define   PCI_DVSEC_CXL_MEM_CAPABLE			_BITUL(2)
 #define   PCI_DVSEC_CXL_HDM_COUNT			__GENMASK(5, 4)
+#define   PCI_DVSEC_CXL_CACHE_WBI_CAPABLE		_BITUL(6)
+#define   PCI_DVSEC_CXL_RST_CAPABLE			_BITUL(7)
+#define   PCI_DVSEC_CXL_RST_TIMEOUT			__GENMASK(10, 8)
+#define   PCI_DVSEC_CXL_RST_MEM_CLR_CAPABLE		_BITUL(11)
 #define  PCI_DVSEC_CXL_CTRL				0xC
 #define   PCI_DVSEC_CXL_MEM_ENABLE			_BITUL(2)
+#define  PCI_DVSEC_CXL_CTRL2				0x10
+#define   PCI_DVSEC_CXL_DISABLE_CACHING			_BITUL(0)
+#define   PCI_DVSEC_CXL_INIT_CACHE_WBI			_BITUL(1)
+#define   PCI_DVSEC_CXL_INIT_CXL_RST			_BITUL(2)
+#define   PCI_DVSEC_CXL_RST_MEM_CLR_EN			_BITUL(3)
+#define  PCI_DVSEC_CXL_STATUS2				0x12
+#define   PCI_DVSEC_CXL_CACHE_INV			_BITUL(0)
+#define   PCI_DVSEC_CXL_RST_DONE			_BITUL(1)
+#define   PCI_DVSEC_CXL_RST_ERR			_BITUL(2)
 #define  PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i)		(0x18 + (i * 0x10))
 #define  PCI_DVSEC_CXL_RANGE_SIZE_LOW(i)		(0x1C + (i * 0x10))
 #define   PCI_DVSEC_CXL_MEM_INFO_VALID			_BITUL(0)
-- 
2.43.0



* [PATCH v6 6/9] cxl/pci: Track memdevs affected by CXL reset
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
                   ` (4 preceding siblings ...)
  2026-05-28  8:31 ` [PATCH v6 5/9] cxl/pci: Add CXL DVSEC reset helper Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs Srirangan Madhavan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

CXL reset is scoped to the CXL.cache/mem function set, so reset
orchestration needs to account for the target memdev and any affected
sibling-function memdevs.

Add reset context tracking for affected memdevs. Collect the memdevs
associated with the target and sibling PCI functions, track which ones
are active, collect their regions, and provide helpers to lock and
revalidate the active memdevs before reset proceeds.
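
The lock-and-revalidate step can be modelled in userspace roughly as
follows. This is a sketch only: struct fake_dev, fake_trylock(),
lock_all() and unlock_all() are hypothetical stand-ins for the device
lock and the helpers in this patch, and the "active" flag stands in for
the driver-bound and endpoint-present check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a trylock-style lock. */
struct fake_dev {
	bool lock_held;	/* models the device lock */
	bool active;	/* models "driver bound and endpoint present" */
};

struct tracked {
	struct fake_dev *dev;
	bool active;	/* sampled earlier, during region collection */
	bool locked;	/* set only while this context holds the lock */
};

static bool fake_trylock(struct fake_dev *d)
{
	if (d->lock_held)
		return false;
	d->lock_held = true;
	return true;
}

/* Release, in reverse order, only the locks this context took. */
static void unlock_all(struct tracked *t, size_t n)
{
	while (n--) {
		if (!t[n].locked)
			continue;
		assert(t[n].dev->lock_held);
		t[n].dev->lock_held = false;
		t[n].locked = false;
	}
}

/* Lock every active entry, or fail and leave nothing locked. */
static int lock_all(struct tracked *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!t[i].active)
			continue;
		if (!fake_trylock(t[i].dev)) {
			unlock_all(t, n);
			return -EAGAIN;	/* contention: caller may retry */
		}
		t[i].locked = true;
		if (!t[i].dev->active) {
			unlock_all(t, n);
			return -ENODEV;	/* went away since collection */
		}
	}
	return 0;
}
```

The per-entry locked flag is what makes the unwind safe to call from any
point: entries that were skipped or never reached are left alone.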

The reset orchestration and CXL.mem restore flow are added in a
subsequent patch.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/cxl/core/pci.c | 176 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 1dd880f5a333..c755c18c8d84 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1106,8 +1106,17 @@ cxl_reset_flush_cpu_caches(struct cxl_reset_region_context *ctx)
 	return 0;
 }
 
+struct cxl_reset_memdev {
+	struct cxl_memdev *cxlmd;
+	bool active;
+	bool locked;
+};
+
 struct cxl_reset_context {
 	struct pci_dev *target;
+	struct cxl_reset_memdev *memdevs;
+	int nr_memdevs;
+	int memdev_capacity;
 	struct pci_dev **siblings;
 	int nr_siblings;
 	int sibling_capacity;
@@ -1237,6 +1246,173 @@ static int cxl_reset_collect_siblings(struct cxl_reset_context *ctx)
 	return wctx.rc;
 }
 
+static int cxl_reset_match_memdev_by_parent(struct device *dev,
+					    const void *parent)
+{
+	return is_cxl_memdev(dev) && dev->parent == parent;
+}
+
+static bool cxl_reset_memdev_active(struct cxl_memdev *cxlmd)
+{
+	return cxlmd->dev.driver && cxlmd->endpoint &&
+	       !IS_ERR(cxlmd->endpoint);
+}
+
+static int cxl_reset_collect_pci_memdev(struct cxl_reset_context *ctx,
+					struct pci_dev *pdev)
+{
+	struct cxl_reset_memdev *memdevs;
+	struct cxl_memdev *cxlmd;
+	struct device *dev;
+	int capacity, i;
+
+	dev = bus_find_device(&cxl_bus_type, NULL, &pdev->dev,
+			      cxl_reset_match_memdev_by_parent);
+	if (!dev)
+		return 0;
+
+	cxlmd = to_cxl_memdev(dev);
+	for (i = 0; i < ctx->nr_memdevs; i++) {
+		if (ctx->memdevs[i].cxlmd == cxlmd) {
+			put_device(dev);
+			return 0;
+		}
+	}
+
+	if (ctx->nr_memdevs < ctx->memdev_capacity)
+		goto add;
+
+	capacity = ctx->memdev_capacity ? ctx->memdev_capacity * 2 :
+		   CXL_RESET_SIBLINGS_INIT;
+	memdevs = krealloc(ctx->memdevs, capacity * sizeof(*memdevs),
+			   GFP_KERNEL);
+	if (!memdevs) {
+		put_device(dev);
+		return -ENOMEM;
+	}
+
+	ctx->memdevs = memdevs;
+	ctx->memdev_capacity = capacity;
+
+add:
+	ctx->memdevs[ctx->nr_memdevs++] = (struct cxl_reset_memdev) {
+		.cxlmd = cxlmd,
+	};
+	return 0;
+}
+
+/*
+ * CXL Reset is device scoped for CXL.cache/mem. Use the affected PCI
+ * function set to find memdevs whose regions and endpoint decoder state must
+ * be handled around the reset.
+ */
+static int __maybe_unused cxl_reset_collect_memdevs(struct cxl_reset_context *ctx)
+{
+	int rc, i;
+
+	rc = cxl_reset_collect_pci_memdev(ctx, ctx->target);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < ctx->nr_siblings; i++) {
+		rc = cxl_reset_collect_pci_memdev(ctx, ctx->siblings[i]);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
+static int __maybe_unused
+cxl_reset_collect_regions(struct cxl_reset_context *ctx,
+			  struct cxl_reset_region_context *region_ctx)
+{
+	int rc, i;
+
+	lockdep_assert_held_write(&cxl_rwsem.region);
+
+	for (i = 0; i < ctx->nr_memdevs; i++) {
+		struct cxl_reset_memdev *rmd = &ctx->memdevs[i];
+		struct cxl_memdev *cxlmd = rmd->cxlmd;
+
+		if (!device_trylock(&cxlmd->dev))
+			return -EAGAIN;
+
+		if (cxl_reset_memdev_active(cxlmd)) {
+			rc = cxl_reset_collect_memdev_regions(region_ctx,
+							      cxlmd);
+			if (!rc)
+				rmd->active = true;
+		} else {
+			rc = 0;
+		}
+
+		device_unlock(&cxlmd->dev);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
+static void cxl_reset_unlock_memdevs(struct cxl_reset_context *ctx)
+{
+	int i;
+
+	for (i = ctx->nr_memdevs - 1; i >= 0; i--) {
+		struct cxl_reset_memdev *rmd = &ctx->memdevs[i];
+
+		if (!rmd->locked)
+			continue;
+
+		device_unlock(&rmd->cxlmd->dev);
+		rmd->locked = false;
+	}
+}
+
+static int __maybe_unused cxl_reset_lock_memdevs(struct cxl_reset_context *ctx)
+{
+	int i;
+
+	lockdep_assert_held_write(&cxl_rwsem.region);
+
+	for (i = 0; i < ctx->nr_memdevs; i++) {
+		struct cxl_reset_memdev *rmd = &ctx->memdevs[i];
+		struct cxl_memdev *cxlmd = rmd->cxlmd;
+
+		if (!rmd->active)
+			continue;
+
+		if (!device_trylock(&cxlmd->dev))
+			goto err;
+
+		rmd->locked = true;
+		if (!cxl_reset_memdev_active(cxlmd)) {
+			cxl_reset_unlock_memdevs(ctx);
+			return -ENODEV;
+		}
+	}
+
+	return 0;
+
+err:
+	cxl_reset_unlock_memdevs(ctx);
+	return -EAGAIN;
+}
+
+static void __maybe_unused cxl_reset_put_memdevs(struct cxl_reset_context *ctx)
+{
+	int i;
+
+	for (i = 0; i < ctx->nr_memdevs; i++)
+		put_device(&ctx->memdevs[i].cxlmd->dev);
+
+	kfree(ctx->memdevs);
+	ctx->memdevs = NULL;
+	ctx->nr_memdevs = 0;
+	ctx->memdev_capacity = 0;
+}
+
 static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
 {
 	int i;
-- 
2.43.0



* [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
                   ` (5 preceding siblings ...)
  2026-05-28  8:31 ` [PATCH v6 6/9] cxl/pci: Track memdevs affected by CXL reset Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 8/9] cxl/memdev: Add cxl_reset sysfs attribute Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 9/9] Documentation/ABI: Document CXL memdev cxl_reset Srirangan Madhavan
  8 siblings, 0 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Add the reset flow that coordinates the target function, affected CXL
sibling functions, and any active memdevs in the CXL.cache/mem reset
scope.

The flow collects regions for the affected memdevs under
cxl_rwsem.region, verifies that those regions are idle, flushes CPU
caches for the affected ranges, saves and disables the target and sibling
PCI functions, and locks active memdevs to revalidate that their
endpoints are still present before reset.

After the CXL DVSEC reset completes, restore PCI config space so CXL
MMIO is accessible, restore decoder programming for all active affected
memdevs, commit their restored decoders, and only then re-enable CXL.mem
for the affected set.
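
The ordering of the post-reset restore can be sketched in userspace as
three passes over the affected set. This is a simplified model, not the
kernel flow: struct md, restore_all() and disable_all() are hypothetical,
the restore and commit steps are assumed to succeed, and enable_rc is an
injected result used only to exercise the rollback path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memdev's post-reset state. */
struct md {
	bool active;
	bool restored, committed, enabled;
	int enable_rc;	/* injected result for the enable step */
};

/* Roll back: turn CXL.mem off again on every active entry. */
static void disable_all(struct md *m, size_t n)
{
	while (n--)
		if (m[n].active)
			m[n].enabled = false;
}

static int restore_all(struct md *m, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)		/* pass 1: reprogram decoders */
		if (m[i].active)
			m[i].restored = true;

	for (i = 0; i < n; i++) {	/* pass 2: commit decoders */
		if (!m[i].active)
			continue;
		assert(m[i].restored);
		m[i].committed = true;
	}

	for (i = 0; i < n; i++) {	/* pass 3: enable CXL.mem last */
		if (!m[i].active)
			continue;
		if (m[i].enable_rc) {
			disable_all(m, n);
			return m[i].enable_rc;
		}
		m[i].enabled = true;
	}

	return 0;
}
```

Running each pass over the whole set before starting the next means no
memdev has CXL.mem enabled until every active memdev in the reset scope
has committed decoders.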

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/cxl/core/pci.c | 414 +++++++++++++++++++++++++++++++++++------
 1 file changed, 358 insertions(+), 56 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c755c18c8d84..486c447e98f3 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -947,14 +947,12 @@ struct cxl_reset_region_context {
 	struct xarray regions;
 };
 
-static void __maybe_unused
-cxl_reset_region_context_init(struct cxl_reset_region_context *ctx)
+static void cxl_reset_region_context_init(struct cxl_reset_region_context *ctx)
 {
 	xa_init(&ctx->regions);
 }
 
-static void __maybe_unused
-cxl_reset_region_context_destroy(struct cxl_reset_region_context *ctx)
+static void cxl_reset_region_context_destroy(struct cxl_reset_region_context *ctx)
 {
 	xa_destroy(&ctx->regions);
 }
@@ -985,9 +983,8 @@ static int cxl_reset_collect_region(struct device *dev, void *data)
 	return cxl_reset_add_region(ctx, cxled->cxld.region);
 }
 
-static int __maybe_unused
-cxl_reset_collect_memdev_regions(struct cxl_reset_region_context *ctx,
-				 struct cxl_memdev *cxlmd)
+static int cxl_reset_collect_memdev_regions(struct cxl_reset_region_context *ctx,
+					    struct cxl_memdev *cxlmd)
 {
 	struct cxl_port *endpoint;
 
@@ -1045,8 +1042,7 @@ static int cxl_reset_validate_region_idle(struct cxl_region *cxlr)
 	return rc;
 }
 
-static int __maybe_unused
-cxl_reset_validate_regions_idle(struct cxl_reset_region_context *ctx)
+static int cxl_reset_validate_regions_idle(struct cxl_reset_region_context *ctx)
 {
 	struct cxl_region *cxlr;
 	unsigned long index;
@@ -1077,26 +1073,41 @@ static int cxl_reset_flush_region_cache(struct cxl_region *cxlr)
 	return rc;
 }
 
-static int __maybe_unused
-cxl_reset_flush_cpu_caches(struct cxl_reset_region_context *ctx)
+static int cxl_reset_cpu_cache_flush_preflight(struct cxl_reset_region_context *ctx,
+					       bool *skip)
 {
-	struct cxl_region *cxlr;
-	unsigned long index;
-	int rc;
+	if (skip)
+		*skip = false;
 
 	if (xa_empty(&ctx->regions))
 		return 0;
 
-	if (!cpu_cache_has_invalidate_memregion()) {
-		if (IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)) {
-			pr_info_once(
-				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
-			return 0;
-		}
-		pr_warn("Failed to synchronize CPU cache state\n");
-		return -ENXIO;
+	if (cpu_cache_has_invalidate_memregion())
+		return 0;
+
+	if (IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)) {
+		pr_info_once(
+			"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
+		if (skip)
+			*skip = true;
+		return 0;
 	}
 
+	pr_warn("Failed to synchronize CPU cache state\n");
+	return -ENXIO;
+}
+
+static int cxl_reset_flush_cpu_caches(struct cxl_reset_region_context *ctx)
+{
+	struct cxl_region *cxlr;
+	unsigned long index;
+	bool skip;
+	int rc;
+
+	rc = cxl_reset_cpu_cache_flush_preflight(ctx, &skip);
+	if (rc || skip)
+		return rc;
+
 	xa_for_each(&ctx->regions, index, cxlr) {
 		rc = cxl_reset_flush_region_cache(cxlr);
 		if (rc)
@@ -1120,7 +1131,11 @@ struct cxl_reset_context {
 	struct pci_dev **siblings;
 	int nr_siblings;
 	int sibling_capacity;
+	int nr_siblings_locked;
 	int nr_siblings_prepared;
+	bool target_locked;
+	bool target_saved;
+	bool target_iommu_prepared;
 };
 
 struct cxl_reset_walk_ctx {
@@ -1306,7 +1321,7 @@ static int cxl_reset_collect_pci_memdev(struct cxl_reset_context *ctx,
  * function set to find memdevs whose regions and endpoint decoder state must
  * be handled around the reset.
  */
-static int __maybe_unused cxl_reset_collect_memdevs(struct cxl_reset_context *ctx)
+static int cxl_reset_collect_memdevs(struct cxl_reset_context *ctx)
 {
 	int rc, i;
 
@@ -1323,7 +1338,7 @@ static int __maybe_unused cxl_reset_collect_memdevs(struct cxl_reset_context *ct
 	return 0;
 }
 
-static int __maybe_unused
+static int
 cxl_reset_collect_regions(struct cxl_reset_context *ctx,
 			  struct cxl_reset_region_context *region_ctx)
 {
@@ -1370,7 +1385,7 @@ static void cxl_reset_unlock_memdevs(struct cxl_reset_context *ctx)
 	}
 }
 
-static int __maybe_unused cxl_reset_lock_memdevs(struct cxl_reset_context *ctx)
+static int cxl_reset_lock_memdevs(struct cxl_reset_context *ctx)
 {
 	int i;
 
@@ -1400,7 +1415,7 @@ static int __maybe_unused cxl_reset_lock_memdevs(struct cxl_reset_context *ctx)
 	return -EAGAIN;
 }
 
-static void __maybe_unused cxl_reset_put_memdevs(struct cxl_reset_context *ctx)
+static void cxl_reset_put_memdevs(struct cxl_reset_context *ctx)
 {
 	int i;
 
@@ -1417,14 +1432,20 @@ static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
 {
 	int i;
 
+	/*
+	 * Config state was restored early for CXL MMIO access. Complete PCI
+	 * reset recovery here by unblocking IOMMU and running reset_done().
+	 */
 	for (i = ctx->nr_siblings_prepared - 1; i >= 0; i--) {
 		struct pci_dev *sibling = ctx->siblings[i];
 
 		pci_dev_reset_iommu_done(sibling);
 		pci_dev_restore(sibling);
-		pci_dev_unlock(sibling);
 	}
 
+	for (i = ctx->nr_siblings_locked - 1; i >= 0; i--)
+		pci_dev_unlock(ctx->siblings[i]);
+
 	for (i = 0; i < ctx->nr_siblings; i++)
 		pci_dev_put(ctx->siblings[i]);
 
@@ -1432,31 +1453,39 @@ static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
 	ctx->siblings = NULL;
 	ctx->nr_siblings = 0;
 	ctx->sibling_capacity = 0;
+	ctx->nr_siblings_locked = 0;
 	ctx->nr_siblings_prepared = 0;
 }
 
-static int __maybe_unused
-cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
+static int cxl_pci_functions_lock(struct cxl_reset_context *ctx)
 {
-	int rc, i;
-
-	ctx->siblings = NULL;
-	ctx->nr_siblings = 0;
-	ctx->sibling_capacity = 0;
-	ctx->nr_siblings_prepared = 0;
+	int i;
 
-	rc = cxl_reset_collect_siblings(ctx);
-	if (rc)
-		goto err;
+	ctx->nr_siblings_locked = 0;
 
 	for (i = 0; i < ctx->nr_siblings; i++) {
 		struct pci_dev *sibling = ctx->siblings[i];
 
 		if (!pci_dev_trylock(sibling)) {
-			rc = -EAGAIN;
-			goto err;
+			cxl_pci_functions_reset_done(ctx);
+			return -EAGAIN;
 		}
 
+		ctx->nr_siblings_locked++;
+	}
+
+	return 0;
+}
+
+static int cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
+{
+	int rc, i;
+
+	ctx->nr_siblings_prepared = 0;
+
+	for (i = 0; i < ctx->nr_siblings_locked; i++) {
+		struct pci_dev *sibling = ctx->siblings[i];
+
 		pci_dev_save_and_disable(sibling);
 		rc = pci_dev_reset_iommu_prepare(sibling);
 		if (rc) {
@@ -1469,7 +1498,6 @@ cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
 			 * nr_siblings_prepared and must not get iommu_done().
 			 */
 			pci_dev_restore(sibling);
-			pci_dev_unlock(sibling);
 			goto err;
 		}
 
@@ -1483,6 +1511,79 @@ cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
 	return rc;
 }
 
+/*
+ * Restore PCI config space after reset so CXL MMIO is accessible for memdev
+ * restore. Driver reset_done callbacks remain deferred to final cleanup.
+ */
+static void cxl_pci_functions_restore_state(struct cxl_reset_context *ctx)
+{
+	int i;
+
+	for (i = ctx->nr_siblings_prepared - 1; i >= 0; i--)
+		pci_restore_state(ctx->siblings[i]);
+}
+
+static int cxl_pci_target_lock(struct cxl_reset_context *ctx)
+{
+	struct pci_dev *pdev = ctx->target;
+
+	if (!pci_dev_trylock(pdev))
+		return -EAGAIN;
+
+	ctx->target_locked = true;
+	return 0;
+}
+
+static int cxl_pci_target_reset_prepare(struct cxl_reset_context *ctx)
+{
+	struct pci_dev *pdev = ctx->target;
+	int rc;
+
+	/* Disable first to stop new transactions, then drain in-flight ones. */
+	pci_dev_save_and_disable(pdev);
+	ctx->target_saved = true;
+
+	if (!pci_wait_for_pending_transaction(pdev))
+		pci_err(pdev, "timed out waiting for pending transactions\n");
+
+	rc = pci_dev_reset_iommu_prepare(pdev);
+	if (rc) {
+		pci_err(pdev, "failed to block IOMMU for CXL reset: %d\n", rc);
+		return rc;
+	}
+
+	ctx->target_iommu_prepared = true;
+	return 0;
+}
+
+static void cxl_pci_target_restore_state(struct cxl_reset_context *ctx)
+{
+	if (ctx->target_saved)
+		pci_restore_state(ctx->target);
+}
+
+static void cxl_pci_target_reset_done(struct cxl_reset_context *ctx)
+{
+	if (ctx->target_iommu_prepared) {
+		pci_dev_reset_iommu_done(ctx->target);
+		ctx->target_iommu_prepared = false;
+	}
+
+	/*
+	 * cxl_pci_target_restore_state() restores config space before memdev
+	 * restore. Complete PCI reset recovery here with reset_done().
+	 */
+	if (ctx->target_saved) {
+		pci_dev_restore(ctx->target);
+		ctx->target_saved = false;
+	}
+
+	if (ctx->target_locked) {
+		pci_dev_unlock(ctx->target);
+		ctx->target_locked = false;
+	}
+}
+
 static int cxl_reset_update_ctrl2(struct pci_dev *pdev, int dvsec, u16 set,
 				  u16 clear)
 {
@@ -1599,7 +1700,7 @@ static int cxl_reset_wait_done(struct pci_dev *pdev, int dvsec, u16 cap)
 	return 0;
 }
 
-static int __maybe_unused cxl_dev_reset(struct pci_dev *pdev, bool mem_clear)
+static int cxl_dev_reset(struct pci_dev *pdev, bool mem_clear)
 {
 	int dvsec, rc;
 	u16 ctrl2_clear = 0;
@@ -1620,19 +1721,9 @@ static int __maybe_unused cxl_dev_reset(struct pci_dev *pdev, bool mem_clear)
 	if (mem_clear && !(cap & PCI_DVSEC_CXL_RST_MEM_CLR_CAPABLE))
 		return -EOPNOTSUPP;
 
-	if (!pci_wait_for_pending_transaction(pdev))
-		pci_err(pdev, "timed out waiting for pending transactions\n");
-
-	rc = pci_dev_reset_iommu_prepare(pdev);
-	if (rc) {
-		pci_err(pdev, "failed to block IOMMU for CXL reset: %d\n",
-			rc);
-		return rc;
-	}
-
 	rc = cxl_reset_disable_cache(pdev, dvsec, cap);
 	if (rc)
-		goto out_iommu;
+		return rc;
 	if (cap & PCI_DVSEC_CXL_CACHE_CAPABLE)
 		ctrl2_clear |= PCI_DVSEC_CXL_DISABLE_CACHING;
 
@@ -1651,7 +1742,7 @@ static int __maybe_unused cxl_dev_reset(struct pci_dev *pdev, bool mem_clear)
 
 	rc = cxl_reset_wait_done(pdev, dvsec, cap);
 	if (rc)
-		goto out_iommu;
+		return rc;
 
 	rc = cxl_reset_update_ctrl2(pdev, dvsec, 0,
 				    PCI_DVSEC_CXL_DISABLE_CACHING);
@@ -1660,7 +1751,218 @@ static int __maybe_unused cxl_dev_reset(struct pci_dev *pdev, bool mem_clear)
 	if (rc && ctrl2_clear)
 		cxl_reset_update_ctrl2(pdev, dvsec, 0, ctrl2_clear);
 
-out_iommu:
-	pci_dev_reset_iommu_done(pdev);
+	return rc;
+}
+
+static int cxl_reset_restore_memdev(struct cxl_reset_memdev *rmd)
+{
+	struct cxl_memdev *cxlmd = rmd->cxlmd;
+	int rc;
+
+	if (!rmd->active)
+		return 0;
+
+	rc = cxl_restore_memdev_decoders(cxlmd);
+	if (rc)
+		dev_err(&cxlmd->dev,
+			"Failed to restore CXL.mem decoders after reset: %d\n",
+			rc);
+
+	return rc;
+}
+
+static int cxl_reset_commit_memdev(struct cxl_reset_memdev *rmd)
+{
+	struct cxl_memdev *cxlmd = rmd->cxlmd;
+	int rc;
+
+	if (!rmd->active)
+		return 0;
+
+	rc = cxl_commit_memdev_decoders(cxlmd);
+	if (rc)
+		dev_err(&cxlmd->dev,
+			"Failed to commit CXL.mem decoders after reset: %d\n",
+			rc);
+
+	return rc;
+}
+
+static int cxl_reset_enable_memdev(struct cxl_reset_memdev *rmd)
+{
+	struct cxl_memdev *cxlmd = rmd->cxlmd;
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	int rc;
+
+	if (!rmd->active)
+		return 0;
+
+	cxlds->media_ready = false;
+
+	rc = cxl_set_mem_enable(cxlds, PCI_DVSEC_CXL_MEM_ENABLE);
+	if (rc < 0) {
+		dev_err(&cxlmd->dev,
+			"Failed to enable CXL.mem after reset: %d\n", rc);
+		return rc;
+	}
+
+	rc = cxl_await_media_ready(cxlds);
+	if (rc) {
+		dev_err(&cxlmd->dev,
+			"Media not active after CXL reset: %d\n", rc);
+		return rc;
+	}
+	cxlds->media_ready = true;
+
+	return 0;
+}
+
+static void cxl_reset_disable_memdevs(struct cxl_reset_context *ctx)
+{
+	int rc, i;
+
+	for (i = ctx->nr_memdevs - 1; i >= 0; i--) {
+		struct cxl_memdev *cxlmd = ctx->memdevs[i].cxlmd;
+
+		if (!ctx->memdevs[i].active)
+			continue;
+
+		rc = cxl_set_mem_enable(cxlmd->cxlds, 0);
+		if (rc < 0)
+			dev_err(&cxlmd->dev,
+				"Failed to disable CXL.mem after reset restore failure; device state may be inconsistent: %d\n",
+				rc);
+	}
+}
+
+static int cxl_reset_restore_memdevs(struct cxl_reset_context *ctx)
+{
+	int rc;
+	int i;
+
+	lockdep_assert_held_write(&cxl_rwsem.region);
+
+	for (i = 0; i < ctx->nr_memdevs; i++) {
+		rc = cxl_reset_restore_memdev(&ctx->memdevs[i]);
+		if (rc)
+			return rc;
+	}
+
+	for (i = 0; i < ctx->nr_memdevs; i++) {
+		rc = cxl_reset_commit_memdev(&ctx->memdevs[i]);
+		if (rc)
+			return rc;
+	}
+
+	for (i = 0; i < ctx->nr_memdevs; i++) {
+		rc = cxl_reset_enable_memdev(&ctx->memdevs[i]);
+		if (rc) {
+			cxl_reset_disable_memdevs(ctx);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+static void cxl_reset_context_destroy(struct cxl_reset_context *ctx)
+{
+	/*
+	 * LIFO unwind for regular completion and partial initialization:
+	 * memdevs, sibling functions, target function, then references.
+	 * Each cleanup helper tolerates being called after its state was
+	 * already released on an earlier error path.
+	 */
+	cxl_reset_unlock_memdevs(ctx);
+	cxl_pci_functions_reset_done(ctx);
+	cxl_pci_target_reset_done(ctx);
+	cxl_reset_put_memdevs(ctx);
+}
+
+static int cxl_do_reset_locked(struct cxl_reset_context *ctx, bool mem_clear)
+{
+	struct cxl_reset_region_context region_ctx;
+	int rc;
+
+	lockdep_assert_held_write(&cxl_rwsem.region);
+
+	cxl_reset_region_context_init(&region_ctx);
+
+	rc = cxl_reset_collect_regions(ctx, &region_ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_pci_target_lock(ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_pci_functions_lock(ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_reset_lock_memdevs(ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_reset_cpu_cache_flush_preflight(&region_ctx, NULL);
+	if (rc)
+		goto out;
+
+	rc = cxl_reset_validate_regions_idle(&region_ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_reset_flush_cpu_caches(&region_ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_pci_target_reset_prepare(ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_pci_functions_reset_prepare(ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_dev_reset(ctx->target, mem_clear);
+
+	cxl_pci_target_restore_state(ctx);
+	cxl_pci_functions_restore_state(ctx);
+
+	if (!rc)
+		rc = cxl_reset_restore_memdevs(ctx);
+
+	cxl_reset_unlock_memdevs(ctx);
+
+out:
+	cxl_reset_region_context_destroy(&region_ctx);
+	return rc;
+}
+
+static int __maybe_unused cxl_do_reset(struct pci_dev *pdev, bool mem_clear)
+{
+	struct cxl_reset_context ctx = {
+		.target = pdev,
+	};
+	int rc;
+
+	/*
+	 * Snapshot the CXL r3.2 9.7 device reset scope before taking
+	 * cxl_rwsem.region. Hot-added functions after this point are not
+	 * coordinated by this reset operation.
+	 */
+	rc = cxl_reset_collect_siblings(&ctx);
+	if (rc)
+		goto out;
+
+	rc = cxl_reset_collect_memdevs(&ctx);
+	if (rc)
+		goto out;
+
+	scoped_guard(rwsem_write, &cxl_rwsem.region)
+		rc = cxl_do_reset_locked(&ctx, mem_clear);
+
+out:
+	cxl_reset_context_destroy(&ctx);
 	return rc;
 }
-- 
2.43.0



* [PATCH v6 8/9] cxl/memdev: Add cxl_reset sysfs attribute
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
                   ` (6 preceding siblings ...)
  2026-05-28  8:31 ` [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  2026-05-28  8:31 ` [PATCH v6 9/9] Documentation/ABI: Document CXL memdev cxl_reset Srirangan Madhavan
  8 siblings, 0 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Expose CXL reset through the CXL memdev device. The reset flow
depends on CXL memdev state to identify affected regions, coordinate
decoder restore, and keep CXL-specific policy out of the PCI sysfs ABI.

Add a write-only cxl_reset attribute under memX. The attribute is visible
only when the memdev's PCI parent advertises CXL Reset capability.
Writing a true boolean value invokes the CXL reset orchestration.
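
As a rough illustration of the Function 0 lookup this patch relies on,
the devfn arithmetic can be modeled in plain userspace C. This is only a
sketch: the PCI_DEVFN/PCI_SLOT/PCI_FUNC macros mirror the kernel's
encoding, while fn0_devfn() and its ari_enabled flag are hypothetical
stand-ins for cxl_reset_get_fn0() and pci_ari_enabled().

```c
#include <stdbool.h>

/* Same encoding as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/*
 * Hypothetical helper: return the devfn of the function that carries the
 * CXL Reset control for @devfn. With ARI the whole 8-bit devfn is treated
 * as a function number, so Function 0 is devfn 0. Without ARI it is
 * function 0 of the same slot.
 */
static unsigned int fn0_devfn(unsigned int devfn, bool ari_enabled)
{
	if (ari_enabled)
		return 0;

	return PCI_DEVFN(PCI_SLOT(devfn), 0);
}
```

Under this model, a memdev whose PCI parent is function 2 of slot 3 on a
non-ARI bus resolves to slot 3, function 0, which is the function whose
DVSEC is checked for the CXL Reset capability bit.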

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/cxl/core/memdev.c |  30 +++++++++++
 drivers/cxl/core/pci.c    | 102 +++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |   3 ++
 drivers/cxl/cxlmem.h      |   2 +
 4 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 80e65690eb77..af67fa3d11b8 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -199,6 +199,26 @@ static ssize_t security_erase_store(struct device *dev,
 static struct device_attribute dev_attr_security_erase =
 	__ATTR(erase, 0200, NULL, security_erase_store);
 
+static ssize_t cxl_reset_store(struct device *dev,
+			       struct device_attribute *attr, const char *buf,
+			       size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	bool reset;
+	int rc;
+
+	rc = kstrtobool(buf, &reset);
+	if (rc)
+		return rc;
+
+	if (!reset)
+		return -EINVAL;
+
+	rc = cxl_memdev_reset(cxlmd);
+	return rc ? rc : len;
+}
+static DEVICE_ATTR_WO(cxl_reset);
+
 bool cxl_memdev_has_poison_cmd(struct cxl_memdev *cxlmd,
 			       enum poison_cmd_enabled_bits cmd)
 {
@@ -421,6 +441,7 @@ static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_payload_max.attr,
 	&dev_attr_label_storage_size.attr,
 	&dev_attr_numa_node.attr,
+	&dev_attr_cxl_reset.attr,
 	NULL,
 };
 
@@ -485,8 +506,16 @@ static struct attribute *cxl_memdev_security_attributes[] = {
 static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 				  int n)
 {
+	struct device *dev = kobj_to_dev(kobj);
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+
 	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
 		return 0;
+
+	if (a == &dev_attr_cxl_reset.attr &&
+	    !cxl_memdev_reset_capable(cxlmd))
+		return 0;
+
 	return a->mode;
 }
 
@@ -1099,6 +1128,7 @@ static int cxlmd_add(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds)
 
 	cxlmd->cxlds = cxlds;
 	cxlds->cxlmd = cxlmd;
+	cxl_memdev_init_reset(cxlmd);
 
 	rc = cdev_device_add(&cxlmd->cdev, &cxlmd->dev);
 	if (rc) {
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 486c447e98f3..09f016544d24 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1207,6 +1207,22 @@ static bool cxl_reset_has_cache_or_mem(struct pci_dev *pdev)
 	return cap & (PCI_DVSEC_CXL_CACHE_CAPABLE | PCI_DVSEC_CXL_MEM_CAPABLE);
 }
 
+static bool cxl_reset_is_type2(struct pci_dev *pdev)
+{
+	u16 dvsec, cap;
+
+	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+					  PCI_DVSEC_CXL_DEVICE);
+	if (!dvsec)
+		return false;
+
+	if (pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap))
+		return false;
+
+	return (cap & PCI_DVSEC_CXL_CACHE_CAPABLE) &&
+	       (cap & PCI_DVSEC_CXL_MEM_CAPABLE);
+}
+
 static int cxl_reset_add_sibling(struct cxl_reset_context *ctx,
 				 struct pci_dev *sibling)
 {
@@ -1939,7 +1955,7 @@ static int cxl_do_reset_locked(struct cxl_reset_context *ctx, bool mem_clear)
 	return rc;
 }
 
-static int __maybe_unused cxl_do_reset(struct pci_dev *pdev, bool mem_clear)
+static int cxl_do_reset(struct pci_dev *pdev, bool mem_clear)
 {
 	struct cxl_reset_context ctx = {
 		.target = pdev,
@@ -1966,3 +1982,87 @@ static int __maybe_unused cxl_do_reset(struct pci_dev *pdev, bool mem_clear)
 	cxl_reset_context_destroy(&ctx);
 	return rc;
 }
+
+static struct pci_dev *cxl_reset_get_fn0(struct pci_dev *pdev)
+{
+	unsigned int devfn;
+
+	/*
+	 * CXL Reset control/status is exposed in Function 0 and affects all
+	 * CXL.cache/mem functions in the device.
+	 */
+	if (pci_ari_enabled(pdev->bus))
+		devfn = 0;
+	else
+		devfn = PCI_DEVFN(PCI_SLOT(pdev->devfn), 0);
+
+	if (pdev->devfn == devfn)
+		return pci_dev_get(pdev);
+
+	return pci_get_slot(pdev->bus, devfn);
+}
+
+static bool cxl_memdev_probe_reset_capable(struct cxl_memdev *cxlmd)
+{
+	struct device *dev = cxlmd->dev.parent;
+	struct pci_dev *pdev, *fn0;
+	int dvsec;
+	u16 cap;
+
+	if (!dev || !dev_is_pci(dev))
+		return false;
+
+	pdev = to_pci_dev(dev);
+	if (!cxl_reset_is_type2(pdev))
+		return false;
+
+	fn0 = cxl_reset_get_fn0(pdev);
+	if (!fn0)
+		return false;
+
+	dvsec = pci_find_dvsec_capability(fn0, PCI_VENDOR_ID_CXL,
+					  PCI_DVSEC_CXL_DEVICE);
+	if (!dvsec)
+		goto out;
+
+	if (pci_read_config_word(fn0, dvsec + PCI_DVSEC_CXL_CAP, &cap))
+		goto out;
+
+	pci_dev_put(fn0);
+	return cap & PCI_DVSEC_CXL_RST_CAPABLE;
+
+out:
+	pci_dev_put(fn0);
+	return false;
+}
+
+void cxl_memdev_init_reset(struct cxl_memdev *cxlmd)
+{
+	cxlmd->reset_capable = cxl_memdev_probe_reset_capable(cxlmd);
+}
+
+bool cxl_memdev_reset_capable(struct cxl_memdev *cxlmd)
+{
+	return cxlmd->reset_capable;
+}
+
+int cxl_memdev_reset(struct cxl_memdev *cxlmd)
+{
+	struct device *dev = cxlmd->dev.parent;
+	struct pci_dev *fn0;
+	int rc;
+
+	if (!cxl_memdev_reset_capable(cxlmd))
+		return -EOPNOTSUPP;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	fn0 = cxl_reset_get_fn0(to_pci_dev(dev));
+	if (!fn0)
+		return -ENODEV;
+
+	rc = cxl_do_reset(fn0, false);
+	pci_dev_put(fn0);
+	return rc;
+}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b51b1e9d6400..bf65996e24dc 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -796,6 +796,9 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
 			struct cxl_endpoint_dvsec_info *info);
 int cxl_restore_memdev_decoders(struct cxl_memdev *cxlmd);
 int cxl_commit_memdev_decoders(struct cxl_memdev *cxlmd);
+void cxl_memdev_init_reset(struct cxl_memdev *cxlmd);
+bool cxl_memdev_reset_capable(struct cxl_memdev *cxlmd);
+int cxl_memdev_reset(struct cxl_memdev *cxlmd);
 
 bool is_cxl_region(struct device *dev);
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 776c50d1db51..c8e7349fb130 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -48,6 +48,7 @@ struct cxl_memdev_attach {
  * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
  * @endpoint: connection to the CXL port topology for this memory device
  * @attach: creator of this memdev depends on CXL link attach to operate
+ * @reset_capable: cached CXL Reset support
  * @id: id number of this memdev instance.
  * @depth: endpoint port depth
  * @scrub_cycle: current scrub cycle set for this device
@@ -65,6 +66,7 @@ struct cxl_memdev {
 	struct cxl_nvdimm *cxl_nvd;
 	struct cxl_port *endpoint;
 	const struct cxl_memdev_attach *attach;
+	bool reset_capable;
 	int id;
 	int depth;
 	u8 scrub_cycle;
-- 
2.43.0



* [PATCH v6 9/9] Documentation/ABI: Document CXL memdev cxl_reset
  2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
                   ` (7 preceding siblings ...)
  2026-05-28  8:31 ` [PATCH v6 8/9] cxl/memdev: Add cxl_reset sysfs attribute Srirangan Madhavan
@ 2026-05-28  8:31 ` Srirangan Madhavan
  8 siblings, 0 replies; 12+ messages in thread
From: Srirangan Madhavan @ 2026-05-28  8:31 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-kernel
  Cc: vsethi, alwilliamson, Dan Williams, Sai Yashwanth Reddy Kancherla,
	Vishal Aslot, Manish Honap, Jiandi An, Richard Cheng, linux-tegra,
	Srirangan Madhavan

Document the write-only cxl_reset attribute under CXL memdev devices.
The attribute is visible only when the memdev's PCI parent advertises
CXL Reset capability, and writing a true boolean value requests the CXL
reset flow.
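
For illustration, a userspace caller might request the reset roughly as
follows. This is a minimal sketch, not a reference tool: the helper name
request_cxl_reset() is made up, and the memX instance to target is left to
the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "1" to the cxl_reset attribute at @path.
 * Returns 0 on success or a negative errno value, for example -EBUSY
 * when an affected region is still in use, or -ENOENT when the
 * attribute is not present.
 */
static int request_cxl_reset(const char *path)
{
	static const char val[] = "1";
	ssize_t n;
	int fd, err;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, strlen(val));
	if (n < 0)
		err = -errno;
	else if ((size_t)n != strlen(val))
		err = -EIO;	/* Short write: treat as an I/O error. */
	else
		err = 0;

	close(fd);
	return err;
}
```

A caller would pass a path such as /sys/bus/cxl/devices/mem0/cxl_reset,
where mem0 is only a placeholder, and should be prepared to retry after
releasing the affected regions if the helper returns -EBUSY.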

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 28 +++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 16a9b3d2e2c0..d5d055e7a756 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -110,6 +110,34 @@ Description:
 		affinity for this device.
 
 
+What:		/sys/bus/cxl/devices/memX/cxl_reset
+Date:		May, 2026
+KernelVersion:	v7.1
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) Write a boolean true value, for example "1" or "true", to
+		request CXL Reset for this memory device. The driver performs
+		CXL-specific reset coordination for the target memdev before
+		issuing reset, including any required preparation for affected
+		CXL memory regions and related CXL memory devices.
+
+		CXL Reset control is Function 0 scoped. A write to this
+		attribute resets the CXL.cache and CXL.mem state for all
+		CXL.cache or CXL.mem functions in the same CXL device reset
+		scope, not only the memX device associated with this file.
+
+		The optional CXL Reset Memory Clear operation is not exposed by
+		this attribute.
+
+		A reset fails with -EBUSY if any affected CXL region is
+		online as System RAM or has an active region driver bound.
+		Userspace must first quiesce and release affected CXL memory
+		mappings.
+
+		If this file is not present, then CXL Reset is not supported
+		for the device.
+
+
 What:		/sys/bus/cxl/devices/memX/security/state
 Date:		June, 2023
 KernelVersion:	v6.5
-- 
2.43.0



* Re: [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders
  2026-05-28  8:31 ` [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders Srirangan Madhavan
@ 2026-05-28 11:06   ` Richard Cheng
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Cheng @ 2026-05-28 11:06 UTC (permalink / raw)
  To: Srirangan Madhavan
  Cc: linux-cxl, linux-pci, linux-kernel, vsethi, alwilliamson,
	Dan Williams, Sai Yashwanth Reddy Kancherla, Vishal Aslot,
	Manish Honap, Jiandi An, linux-tegra

On Thu, May 28, 2026 at 08:31:46AM +0800, Srirangan Madhavan wrote:
> Add helpers to restore endpoint decoder programming for a CXL memdev from
> CXL core's cached decoder objects, then commit it as a distinct step.
> Callers are expected to have established reset safety and to hold
> cxl_rwsem.region for write.
> 
> cxl_restore_memdev_decoders() restores programmable decoder state while
> keeping traffic disabled. For HDM-backed endpoints it programs enabled
> endpoint decoder fields without COMMIT, keeps the HDM Decoder Capability
> disabled, and mirrors matching endpoint DVSEC ranges where possible. For
> endpoints without HDM decoder registers, it restores the legacy DVSEC
> ranges that model endpoint decode.
> 
> cxl_commit_memdev_decoders() enables the HDM Decoder Capability and
> commits enabled, unlocked endpoint decoders after safety checks pass. It
> sets COMMIT only after decoder fields have been restored, does not
> re-lock decoders, and does not set DVSEC MEM_ENABLE.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  drivers/cxl/core/hdm.c | 318 ++++++++++++++++++++++++++++++++++++++++-
>  drivers/cxl/cxl.h      |   2 +
>  2 files changed, 317 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 0c80b76a5f9b..f7af1041a9fc 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -679,7 +679,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
>  	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>  }
>  
> -static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
> +static int cxld_set_interleave_fields(struct cxl_decoder *cxld, u32 *ctrl)
>  {
>  	u16 eig;
>  	u8 eiw;
> @@ -690,14 +690,22 @@ static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
>  	 */
>  	if (WARN_ONCE(ways_to_eiw(cxld->interleave_ways, &eiw),
>  		      "invalid interleave_ways: %d\n", cxld->interleave_ways))
> -		return;
> +		return -EINVAL;
>  	if (WARN_ONCE(granularity_to_eig(cxld->interleave_granularity, &eig),
>  		      "invalid interleave_granularity: %d\n",
>  		      cxld->interleave_granularity))
> -		return;
> +		return -EINVAL;
>  
>  	u32p_replace_bits(ctrl, eig, CXL_HDM_DECODER0_CTRL_IG_MASK);
>  	u32p_replace_bits(ctrl, eiw, CXL_HDM_DECODER0_CTRL_IW_MASK);
> +	return 0;
> +}
> +
> +static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
> +{
> +	if (cxld_set_interleave_fields(cxld, ctrl))
> +		return;
> +
>  	*ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT;
>  }
>  
> @@ -927,6 +935,310 @@ static void cxl_decoder_reset(struct cxl_decoder *cxld)
>  	}
>  }
>  
> +static int cxl_restore_dvsec_range(struct cxl_memdev *cxlmd,
> +				   struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> +	u64 base = cxld->hpa_range.start;
> +	u64 size = range_len(&cxld->hpa_range);
> +	u32 lo;
> +	int dvsec = cxlds->cxl_dvsec;
> +	int id = cxld->id;
> +	int rc;
> +
> +	if (!dvsec)
> +		return 0;
> +
> +	if (id >= CXL_DVSEC_RANGE_MAX)
> +		return 0;
> +
> +	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(id),
> +				    upper_32_bits(base));
> +	if (rc)
> +		return rc;
> +
> +	rc = pci_read_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(id),
> +				   &lo);
> +	if (rc)
> +		return rc;

Here pci_read_config_dword() and pci_write_config_dword() return positive
PCIBIOS_* values on failure, and you pass that value up unchanged. It
eventually surfaces through cxl_reset_store() to userspace, where sysfs
interprets a positive return value as the number of bytes written.

I think this needs a fix?
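
To make the concern concrete, here is a small userspace model, offered as
a sketch rather than a proposed patch. The PCIBIOS_* values and the
mapping follow pcibios_err_to_errno() as I read it in include/linux/pci.h;
the model_* names are made up for illustration.

```c
#include <errno.h>

/* PCIBIOS_* status values, as in the kernel's include/linux/pci.h. */
#define PCIBIOS_SUCCESSFUL		0x00
#define PCIBIOS_FUNC_NOT_SUPPORTED	0x81
#define PCIBIOS_BAD_VENDOR_ID		0x83
#define PCIBIOS_DEVICE_NOT_FOUND	0x86
#define PCIBIOS_BAD_REGISTER_NUMBER	0x87
#define PCIBIOS_SET_FAILED		0x88
#define PCIBIOS_BUFFER_TOO_SMALL	0x89

/* Userspace model of the kernel's pcibios_err_to_errno(). */
static int model_pcibios_err_to_errno(int err)
{
	if (err <= PCIBIOS_SUCCESSFUL)
		return err;	/* Already 0 or a negative errno. */

	switch (err) {
	case PCIBIOS_FUNC_NOT_SUPPORTED:
		return -ENOENT;
	case PCIBIOS_BAD_VENDOR_ID:
		return -ENOTTY;
	case PCIBIOS_DEVICE_NOT_FOUND:
		return -ENODEV;
	case PCIBIOS_BAD_REGISTER_NUMBER:
		return -EFAULT;
	case PCIBIOS_SET_FAILED:
		return -EIO;
	case PCIBIOS_BUFFER_TOO_SMALL:
		return -ENOSPC;
	}

	return -ERANGE;
}

/* Model of the "return rc ? rc : len" convention in a sysfs store. */
static long model_store_result(int rc, long len)
{
	return rc ? rc : len;
}
```

With this model, an unconverted PCIBIOS_DEVICE_NOT_FOUND (0x86) comes back
from the store as 134, which a writer would read as "134 bytes accepted",
whereas the converted value comes back as -ENODEV.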

Best regards,
Richard Cheng.


> +	lo &= ~PCI_DVSEC_CXL_MEM_BASE_LOW;
> +	lo |= lower_32_bits(base) & PCI_DVSEC_CXL_MEM_BASE_LOW;
> +
> +	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(id),
> +				    lo);
> +	if (rc)
> +		return rc;
> +
> +	rc = pci_write_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_SIZE_HIGH(id),
> +				    upper_32_bits(size));
> +	if (rc)
> +		return rc;
> +
> +	rc = pci_read_config_dword(pdev, dvsec + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
> +				   &lo);
> +	if (rc)
> +		return rc;
> +
> +	/*
> +	 * Preserve MEM_INFO_VALID / MEM_ACTIVE and any reserved bits while
> +	 * restoring only the programmable size bits.
> +	 */
> +	lo &= ~PCI_DVSEC_CXL_MEM_SIZE_LOW;
> +	lo |= lower_32_bits(size) & PCI_DVSEC_CXL_MEM_SIZE_LOW;
> +
> +	return pci_write_config_dword(pdev,
> +				      dvsec + PCI_DVSEC_CXL_RANGE_SIZE_LOW(id),
> +				      lo);
> +}
> +
> +static int cxl_restore_hdm_decoder(struct cxl_hdm *cxlhdm,
> +				   struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	void __iomem *hdm;
> +	u64 base, size, skip;
> +	u32 ctrl;
> +	int id;
> +
> +	id = cxld->id;
> +	hdm = cxlhdm->regs.hdm_decoder;
> +	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> +	if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
> +		return 0;
> +
> +	base = cxld->hpa_range.start;
> +	size = range_len(&cxld->hpa_range);
> +	skip = cxled->skip;
> +
> +	ctrl &= ~(CXL_HDM_DECODER0_CTRL_LOCK |
> +		  CXL_HDM_DECODER0_CTRL_COMMIT |
> +		  CXL_HDM_DECODER0_CTRL_COMMITTED |
> +		  CXL_HDM_DECODER0_CTRL_COMMIT_ERROR);
> +	if (cxld_set_interleave_fields(cxld, &ctrl))
> +		return -EINVAL;
> +	cxld_set_type(cxld, &ctrl);
> +
> +	/* Preserve setup_hw_decoder() programming order, without COMMIT. */
> +	writel(upper_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id));
> +	writel(lower_32_bits(base), hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
> +	writel(upper_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(id));
> +	writel(lower_32_bits(size), hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id));
> +	writel(upper_32_bits(skip), hdm + CXL_HDM_DECODER0_SKIP_HIGH(id));
> +	writel(lower_32_bits(skip), hdm + CXL_HDM_DECODER0_SKIP_LOW(id));
> +	wmb();
> +	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> +
> +	return 0;
> +}
> +
> +struct cxl_restore_ctx {
> +	struct cxl_memdev *cxlmd;
> +	struct cxl_hdm *cxlhdm;
> +};
> +
> +static int cxl_restore_decoder(struct device *dev, void *data)
> +{
> +	struct cxl_restore_ctx *ctx = data;
> +	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_decoder *cxld;
> +	int rc;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	cxld = &cxled->cxld;
> +	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
> +		return 0;
> +
> +	if (ctx->cxlhdm->regs.hdm_decoder) {
> +		if (cxld->id >= ctx->cxlhdm->decoder_count)
> +			return -EINVAL;
> +
> +		rc = cxl_restore_hdm_decoder(ctx->cxlhdm, cxled);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return cxl_restore_dvsec_range(ctx->cxlmd, cxled);
> +}
> +
> +static int cxl_restore_decoders(struct cxl_memdev *cxlmd, struct cxl_hdm *cxlhdm)
> +{
> +	struct cxl_port *port = cxlhdm->port;
> +	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
> +	struct cxl_restore_ctx ctx = {
> +		.cxlmd = cxlmd,
> +		.cxlhdm = cxlhdm,
> +	};
> +	u32 global_ctrl;
> +
> +	if (hdm) {
> +		global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> +		writel(global_ctrl & ~CXL_HDM_DECODER_ENABLE,
> +		       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> +	}
> +
> +	return device_for_each_child(&port->dev, &ctx, cxl_restore_decoder);
> +}
> +
> +/**
> + * cxl_restore_memdev_decoders - Restore endpoint decoder programming
> + * @cxlmd: CXL memdev whose endpoint decoders need to be restored
> + *
> + * Restore only programmable decoder state from CXL core's cached decoder
> + * objects. For endpoints with HDM decoder registers, program the HDM decoder
> + * fields and mirror decoder ids representable by CXL_DVSEC_RANGE_MAX into the
> + * DVSEC range registers when present. For endpoints without HDM decoder
> + * registers, restore DVSEC range registers only.
> + *
> + * This helper leaves CXL.mem disabled: it does not commit HDM decoders, enable
> + * the HDM Decoder Capability, set PCI_DVSEC_CXL_MEM_ENABLE, or restore
> + * unrelated DVSEC CTRL, CTRL2, LOCK, MEM_ENABLE, or other control state.
> + * Callers must perform final commit/resume steps only after reset safety checks
> + * pass.
> + *
> + * Return: 0 on success, negative errno on failure.
> + */
> +int cxl_restore_memdev_decoders(struct cxl_memdev *cxlmd)
> +{
> +	struct cxl_port *endpoint = cxlmd->endpoint;
> +	struct cxl_hdm *cxlhdm;
> +	int rc;
> +
> +	lockdep_assert_held_write(&cxl_rwsem.region);
> +
> +	if (!endpoint)
> +		return -ENODEV;
> +
> +	cxlhdm = dev_get_drvdata(&endpoint->dev);
> +	if (!cxlhdm)
> +		return -ENODEV;
> +
> +	scoped_guard(rwsem_read, &cxl_rwsem.dpa)
> +		rc = cxl_restore_decoders(cxlmd, cxlhdm);
> +	return rc;
> +}
> +
> +static int cxl_commit_restored_hdm_decoder(struct cxl_hdm *cxlhdm,
> +					   struct cxl_endpoint_decoder *cxled)
> +{
> +	struct cxl_decoder *cxld = &cxled->cxld;
> +	void __iomem *hdm = cxlhdm->regs.hdm_decoder;
> +	u32 ctrl;
> +	int id;
> +
> +	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
> +		return 0;
> +
> +	if (!hdm)
> +		return 0;
> +
> +	id = cxld->id;
> +	if (id >= cxlhdm->decoder_count)
> +		return -EINVAL;
> +
> +	/*
> +	 * cxl_restore_hdm_decoder() programmed the decoder fields first. This
> +	 * control register write sets COMMIT as the final programming step.
> +	 */
> +	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> +	if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
> +		return 0;
> +
> +	if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
> +		return 0;
> +
> +	ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT;
> +	writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> +
> +	return cxld_await_commit(hdm, id);
> +}
> +
> +struct cxl_commit_decoder_ctx {
> +	struct cxl_hdm *cxlhdm;
> +	int id;
> +};
> +
> +static int cxl_commit_restored_decoder_by_id(struct device *dev, void *data)
> +{
> +	struct cxl_commit_decoder_ctx *ctx = data;
> +	struct cxl_endpoint_decoder *cxled;
> +	int rc;
> +
> +	if (!is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxled = to_cxl_endpoint_decoder(dev);
> +	if (cxled->cxld.id != ctx->id)
> +		return 0;
> +
> +	rc = cxl_commit_restored_hdm_decoder(ctx->cxlhdm, cxled);
> +	return rc ?: 1;
> +}
> +
> +/**
> + * cxl_commit_memdev_decoders - Commit restored endpoint decoder programming
> + * @cxlmd: CXL memdev whose endpoint decoders need to be committed
> + *
> + * Resume endpoint decoding after cxl_restore_memdev_decoders() has restored
> + * programmable decoder fields. For endpoints with HDM decoder registers, enable
> + * the HDM Decoder Capability and commit enabled, unlocked endpoint decoders.
> + * Locked decoders are left to their current hardware/firmware-owned state.
> + *
> + * This helper does not set PCI_DVSEC_CXL_MEM_ENABLE. Callers must enable
> + * CXL.mem only after all reset safety checks and decoder restore/commit steps
> + * have completed.
> + *
> + * Return: 0 on success, negative errno on failure.
> + */
> +int cxl_commit_memdev_decoders(struct cxl_memdev *cxlmd)
> +{
> +	struct cxl_port *endpoint = cxlmd->endpoint;
> +	struct cxl_hdm *cxlhdm;
> +	void __iomem *hdm;
> +	u32 global_ctrl;
> +	int i, rc;
> +
> +	lockdep_assert_held_write(&cxl_rwsem.region);
> +
> +	if (!endpoint)
> +		return -ENODEV;
> +
> +	cxlhdm = dev_get_drvdata(&endpoint->dev);
> +	if (!cxlhdm)
> +		return -ENODEV;
> +
> +	hdm = cxlhdm->regs.hdm_decoder;
> +	if (!hdm)
> +		return 0;
> +
> +	global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> +	writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
> +	       hdm + CXL_HDM_DECODER_CTRL_OFFSET);
> +
> +	for (i = 0; i < cxlhdm->decoder_count; i++) {
> +		struct cxl_commit_decoder_ctx ctx = {
> +			.cxlhdm = cxlhdm,
> +			.id = i,
> +		};
> +
> +		/*
> +		 * Per CXL Spec 3.1 8.2.4.20.12 software must commit decoders
> +		 * in HPA order. Region setup already enforces that ordering by
> +		 * decoder id, so restore commits follow ascending id order.
> +		 */
> +		rc = device_for_each_child(&endpoint->dev, &ctx,
> +					   cxl_commit_restored_decoder_by_id);
> +		if (rc < 0)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
>  static int cxl_setup_hdm_decoder_from_dvsec(
>  	struct cxl_port *port, struct cxl_decoder *cxld, u64 *dpa_base,
>  	int which, struct cxl_endpoint_dvsec_info *info)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1297594beaec..b51b1e9d6400 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -794,6 +794,8 @@ int cxl_port_setup_regs(struct cxl_port *port,
>  struct cxl_dev_state;
>  int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds,
>  			struct cxl_endpoint_dvsec_info *info);
> +int cxl_restore_memdev_decoders(struct cxl_memdev *cxlmd);
> +int cxl_commit_memdev_decoders(struct cxl_memdev *cxlmd);
>  
>  bool is_cxl_region(struct device *dev);
>  
> -- 
> 2.43.0
> 


* Re: [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset
  2026-05-28  8:31 ` [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset Srirangan Madhavan
@ 2026-05-28 11:15   ` Richard Cheng
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Cheng @ 2026-05-28 11:15 UTC (permalink / raw)
  To: Srirangan Madhavan
  Cc: linux-cxl, linux-pci, linux-kernel, vsethi, alwilliamson,
	Dan Williams, Sai Yashwanth Reddy Kancherla, Vishal Aslot,
	Manish Honap, Jiandi An, linux-tegra

On Thu, May 28, 2026 at 08:31:49AM +0800, Srirangan Madhavan wrote:
> Add helpers to collect CXL sibling PCI functions affected by a CXL reset
> and prepare them for reset by saving and disabling them. Restore those
> siblings and drop their references when reset coordination completes.
> 
> Use the Non-CXL Function Map DVSEC to exclude non-CXL functions, and
> filter remaining siblings to functions that advertise CXL.cache or
> CXL.mem capability.
> 
> Use pci_dev_trylock() for sibling locking and unwind on contention or
> allocation failure, so competing reset paths fail with an errno.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  drivers/cxl/core/pci.c        | 207 ++++++++++++++++++++++++++++++++++
>  include/uapi/linux/pci_regs.h |   2 +
>  2 files changed, 209 insertions(+)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 318744695f62..01effbb4e7cd 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -1,9 +1,11 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2021 Intel Corporation. All rights reserved. */
>  #include <linux/units.h>
> +#include <linux/bitmap.h>
>  #include <linux/io-64-nonatomic-lo-hi.h>
>  #include <linux/device.h>
>  #include <linux/delay.h>
> +#include <linux/iommu.h>
>  #include <linux/memregion.h>
>  #include <linux/pci.h>
>  #include <linux/pci-doe.h>
> @@ -15,6 +17,10 @@
>  #include "core.h"
>  #include "trace.h"
>  
> +#define CXL_RESET_MAX_FUNCTIONS		256
> +#define CXL_RESET_FUNCTION_MAP_REGS	(CXL_RESET_MAX_FUNCTIONS / 32)
> +#define CXL_RESET_SIBLINGS_INIT		8
> +
>  /**
>   * DOC: cxl core pci
>   *
> @@ -1096,3 +1102,204 @@ cxl_reset_flush_cpu_caches(struct cxl_reset_region_context *ctx)
>  
>  	return 0;
>  }
> +
> +struct cxl_reset_context {
> +	struct pci_dev *target;
> +	struct pci_dev **siblings;
> +	int nr_siblings;
> +	int sibling_capacity;
> +	int nr_siblings_prepared;
> +};
> +
> +struct cxl_reset_walk_ctx {
> +	struct cxl_reset_context *ctx;
> +	unsigned long *non_cxl_func_map;
> +	int rc;
> +};
> +
> +static void
> +cxl_reset_read_non_cxl_func_map(struct pci_dev *pdev,
> +				unsigned long *non_cxl_func_map)
> +{
> +	u32 map[CXL_RESET_FUNCTION_MAP_REGS] = {};
> +	u16 dvsec;
> +	int rc, i;
> +
> +	bitmap_zero(non_cxl_func_map, CXL_RESET_MAX_FUNCTIONS);
> +
> +	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> +					  PCI_DVSEC_CXL_FUNCTION_MAP);
> +	if (!dvsec)
> +		return;
> +
> +	for (i = 0; i < CXL_RESET_FUNCTION_MAP_REGS; i++) {
> +		rc = pci_read_config_dword(pdev,
> +					   dvsec + PCI_DVSEC_CXL_FUNCTION_MAP_REG +
> +					   i * sizeof(map[i]), &map[i]);
> +		if (rc) {
> +			pci_warn(pdev,
> +				 "failed to read CXL Function Map; treating all siblings as CXL: %d\n",
> +				 rc);
> +			bitmap_zero(non_cxl_func_map, CXL_RESET_MAX_FUNCTIONS);
> +			return;
> +		}
> +	}
> +
> +	bitmap_from_arr32(non_cxl_func_map, map, CXL_RESET_MAX_FUNCTIONS);
> +}
> +
> +static bool cxl_reset_is_cxl_sibling(struct pci_dev *pdev,
> +				     struct pci_dev *sibling,
> +				     unsigned long *non_cxl_func_map)
> +{
> +	if (sibling == pdev || sibling->bus != pdev->bus)
> +		return false;
> +
> +	if (pci_ari_enabled(pdev->bus))
> +		return !test_bit(sibling->devfn, non_cxl_func_map);
> +
> +	if (PCI_SLOT(sibling->devfn) != PCI_SLOT(pdev->devfn))
> +		return false;
> +
> +	return !test_bit(PCI_FUNC(sibling->devfn) * 32 +
> +			 PCI_SLOT(sibling->devfn), non_cxl_func_map);
> +}
> +

I agree with sashiko-bot's finding. Going further: the function has already
returned false for any sibling whose slot differs from the target's slot, so
by the time it reaches the final test_bit() call, PCI_SLOT(sibling->devfn) is
guaranteed to equal the target's slot. It is effectively a constant.

According to the spec, the Non-CXL Function Map has one bit per function within
the same multi-function device, so I think the following change would be
reasonable:
"""
return !test_bit(PCI_FUNC(sibling->devfn), non_cxl_func_map);
"""

Besides the false-negative case, I think the more common failure would be a
false positive. For function F >= 1, the current code reads bit F * 32 + slot
(bits 32, 64, ... for slot 0), which lies in the reserved portion of the
256-bit map and is almost always clear, so non-CXL siblings get pulled into
the CXL reset path.
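
To show the difference in index arithmetic, here is a small userspace
sketch. It is not kernel code, and the two helper names are made up for
illustration. It only computes which map bit each variant would test for a
given devfn on the non-ARI path; which layout matches the spec is the
question being raised here.

```c
/* Same encoding as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Bit index tested by the patch as posted (non-ARI path). */
static unsigned int map_bit_as_posted(unsigned int devfn)
{
	return PCI_FUNC(devfn) * 32 + PCI_SLOT(devfn);
}

/* Bit index tested by the change suggested in this reply. */
static unsigned int map_bit_suggested(unsigned int devfn)
{
	return PCI_FUNC(devfn);
}
```

For slot 0, function 1, the posted code tests bit 32 while the suggested
code tests bit 1. For slot 3, function 2, the indices are 67 and 2.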

Best regards,
Richard Cheng.

> +static bool cxl_reset_has_cache_or_mem(struct pci_dev *pdev)
> +{
> +	u16 dvsec, cap;
> +
> +	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> +					  PCI_DVSEC_CXL_DEVICE);
> +	if (!dvsec)
> +		return false;
> +
> +	if (pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap))
> +		return false;
> +
> +	return cap & (PCI_DVSEC_CXL_CACHE_CAPABLE | PCI_DVSEC_CXL_MEM_CAPABLE);
> +}
> +
> +static int cxl_reset_add_sibling(struct cxl_reset_context *ctx,
> +				 struct pci_dev *sibling)
> +{
> +	struct pci_dev **siblings;
> +	int capacity;
> +
> +	if (ctx->nr_siblings < ctx->sibling_capacity)
> +		goto add;
> +
> +	capacity = ctx->sibling_capacity ? ctx->sibling_capacity * 2 :
> +		   CXL_RESET_SIBLINGS_INIT;
> +	siblings = krealloc(ctx->siblings, capacity * sizeof(*siblings),
> +			    GFP_KERNEL);
> +	if (!siblings)
> +		return -ENOMEM;
> +
> +	ctx->siblings = siblings;
> +	ctx->sibling_capacity = capacity;
> +
> +add:
> +	pci_dev_get(sibling);
> +	ctx->siblings[ctx->nr_siblings++] = sibling;
> +	return 0;
> +}
> +
> +static int cxl_reset_collect_sibling(struct pci_dev *sibling, void *data)
> +{
> +	struct cxl_reset_walk_ctx *wctx = data;
> +	struct cxl_reset_context *ctx = wctx->ctx;
> +	struct pci_dev *pdev = ctx->target;
> +
> +	if (!cxl_reset_is_cxl_sibling(pdev, sibling, wctx->non_cxl_func_map))
> +		return 0;
> +
> +	if (!cxl_reset_has_cache_or_mem(sibling))
> +		return 0;
> +
> +	wctx->rc = cxl_reset_add_sibling(ctx, sibling);
> +	return wctx->rc;
> +}
> +
> +static int cxl_reset_collect_siblings(struct cxl_reset_context *ctx)
> +{
> +	DECLARE_BITMAP(non_cxl_func_map, CXL_RESET_MAX_FUNCTIONS);
> +	struct cxl_reset_walk_ctx wctx = {
> +		.ctx = ctx,
> +		.non_cxl_func_map = non_cxl_func_map,
> +	};
> +
> +	cxl_reset_read_non_cxl_func_map(ctx->target, non_cxl_func_map);
> +	pci_walk_bus(ctx->target->bus, cxl_reset_collect_sibling, &wctx);
> +	return wctx.rc;
> +}
> +
> +static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
> +{
> +	int i;
> +
> +	for (i = ctx->nr_siblings_prepared - 1; i >= 0; i--) {
> +		struct pci_dev *sibling = ctx->siblings[i];
> +
> +		pci_dev_reset_iommu_done(sibling);
> +		pci_dev_restore(sibling);
> +		pci_dev_unlock(sibling);
> +	}
> +
> +	for (i = 0; i < ctx->nr_siblings; i++)
> +		pci_dev_put(ctx->siblings[i]);
> +
> +	kfree(ctx->siblings);
> +	ctx->siblings = NULL;
> +	ctx->nr_siblings = 0;
> +	ctx->sibling_capacity = 0;
> +	ctx->nr_siblings_prepared = 0;
> +}
> +
> +static int __maybe_unused
> +cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
> +{
> +	int rc, i;
> +
> +	ctx->siblings = NULL;
> +	ctx->nr_siblings = 0;
> +	ctx->sibling_capacity = 0;
> +	ctx->nr_siblings_prepared = 0;
> +
> +	rc = cxl_reset_collect_siblings(ctx);
> +	if (rc)
> +		goto err;
> +
> +	for (i = 0; i < ctx->nr_siblings; i++) {
> +		struct pci_dev *sibling = ctx->siblings[i];
> +
> +		if (!pci_dev_trylock(sibling)) {
> +			rc = -EAGAIN;
> +			goto err;
> +		}
> +
> +		pci_dev_save_and_disable(sibling);
> +		rc = pci_dev_reset_iommu_prepare(sibling);
> +		if (rc) {
> +			pci_err(sibling,
> +				"failed to block IOMMU for CXL reset: %d\n",
> +				rc);
> +			/*
> +			 * Undo save_and_disable() for this sibling. IOMMU
> +			 * prepare failed, so this sibling is not counted in
> +			 * nr_siblings_prepared and must not get iommu_done().
> +			 */
> +			pci_dev_restore(sibling);
> +			pci_dev_unlock(sibling);
> +			goto err;
> +		}
> +
> +		ctx->nr_siblings_prepared++;
> +	}
> +
> +	return 0;
> +
> +err:
> +	cxl_pci_functions_reset_done(ctx);
> +	return rc;
> +}
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 14f634ab9350..fa1fcd26af01 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1349,6 +1349,7 @@
>  /* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
>  #define PCI_DVSEC_CXL_DEVICE				0
>  #define  PCI_DVSEC_CXL_CAP				0xA
> +#define   PCI_DVSEC_CXL_CACHE_CAPABLE			_BITUL(0)
>  #define   PCI_DVSEC_CXL_MEM_CAPABLE			_BITUL(2)
>  #define   PCI_DVSEC_CXL_HDM_COUNT			__GENMASK(5, 4)
>  #define  PCI_DVSEC_CXL_CTRL				0xC
> @@ -1366,6 +1367,7 @@
>  
>  /* CXL r4.0, 8.1.4: Non-CXL Function Map DVSEC */
>  #define PCI_DVSEC_CXL_FUNCTION_MAP			2
> +#define  PCI_DVSEC_CXL_FUNCTION_MAP_REG			0x0C
>  
>  /* CXL r4.0, 8.1.5: Extensions DVSEC for Ports */
>  #define PCI_DVSEC_CXL_PORT				3
> -- 
> 2.43.0
> 


end of thread, other threads:[~2026-05-28 11:15 UTC | newest]

Thread overview: 12+ messages
2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders Srirangan Madhavan
2026-05-28 11:06   ` Richard Cheng
2026-05-28  8:31 ` [PATCH v6 2/9] PCI: Export pci_dev_save_and_disable() and pci_dev_restore() Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 3/9] cxl: Add reset-idle and cache flush helpers Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset Srirangan Madhavan
2026-05-28 11:15   ` Richard Cheng
2026-05-28  8:31 ` [PATCH v6 5/9] cxl/pci: Add CXL DVSEC reset helper Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 6/9] cxl/pci: Track memdevs affected by CXL reset Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 8/9] cxl/memdev: Add cxl_reset sysfs attribute Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 9/9] Documentation/ABI: Document CXL memdev cxl_reset Srirangan Madhavan
