public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
@ 2026-03-06  8:00 smadhavan
  2026-03-06  8:00 ` [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions smadhavan
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: smadhavan @ 2026-03-06  8:00 UTC (permalink / raw)
  To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

From: Srirangan Madhavan <smadhavan@nvidia.com>

CXL devices can lose their DVSEC configuration and HDM decoder
programming across several reset methods (any reset that involves a
link disable/enable). A device that was fully configured, with DVSEC
control/range registers set and HDM decoders committed, loses that
state after reset. When these registers are programmed by firmware,
downstream drivers cannot re-initialize the device because the CXL
memory ranges are no longer mapped.

This series adds CXL state save/restore logic to the PCI core so
that, for a CXL-capable device, DVSEC and HDM decoder state is
preserved across any PCI reset path that calls pci_save_state() /
pci_restore_state().

HDM decoder defines and the cxl_register_map infrastructure are moved from
internal CXL driver headers to a new public include/cxl/pci.h, allowing
drivers/pci/cxl.c to use them.
This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
changes on top of his.

These patches were previously part of the CXL reset series and have been
split out [2] to allow independent review and merging. Review feedback on
the save/restore portions from v4 has been addressed.

Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
before reset and restored after, with decoder commit confirmed via the
COMMITTED status bit. Type-3 device testing is in progress.

This series is based on v7.0-rc1.

[1] https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
[2] https://lore.kernel.org/linux-cxl/aa8d4f6a-e7bd-4a20-8d34-4376ea314b8f@intel.com/T/#m825c6bdd1934022123807e86d235358a63b08dbc

Srirangan Madhavan (5):
  PCI: Add CXL DVSEC control, lock, and range register definitions
  cxl: Move HDM decoder and register map definitions to
    include/cxl/pci.h
  PCI: Add virtual extended cap save buffer for CXL state
  PCI: Add CXL DVSEC state save/restore across resets
  PCI/CXL: Add HDM decoder state save/restore

 drivers/cxl/cxl.h             | 107 +-------
 drivers/cxl/cxlpci.h          |  10 -
 drivers/pci/Kconfig           |   4 +
 drivers/pci/Makefile          |   1 +
 drivers/pci/cxl.c             | 468 ++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c             |  23 ++
 drivers/pci/pci.h             |  18 ++
 include/cxl/pci.h             | 129 ++++++++++
 include/uapi/linux/pci_regs.h |   6 +
 9 files changed, 650 insertions(+), 116 deletions(-)
 create mode 100644 drivers/pci/cxl.c
 create mode 100644 include/cxl/pci.h

base-commit: 6de23f81a5e0
--
2.43.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions
  2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
@ 2026-03-06  8:00 ` smadhavan
  2026-03-06 17:45   ` Alex Williamson
  2026-03-10 21:44   ` Dan Williams
  2026-03-06  8:00 ` [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h smadhavan
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: smadhavan @ 2026-03-06  8:00 UTC (permalink / raw)
  To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

From: Srirangan Madhavan <smadhavan@nvidia.com>

Add register offset and field definitions for CXL DVSEC registers needed
by CXL state save/restore across resets:

  - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
  - CONFIG_LOCK bit in the LOCK register
  - RWL (read-write-when-locked) field masks for CTRL and range base
    registers.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 include/uapi/linux/pci_regs.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index ec1c54b5a310..6fdc20d7f5e6 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1353,14 +1353,20 @@
 #define   PCI_DVSEC_CXL_HDM_COUNT			__GENMASK(5, 4)
 #define  PCI_DVSEC_CXL_CTRL				0xC
 #define   PCI_DVSEC_CXL_MEM_ENABLE			_BITUL(2)
+#define   PCI_DVSEC_CXL_CTRL_RWL			0x5FED
+#define  PCI_DVSEC_CXL_CTRL2				0x10
+#define  PCI_DVSEC_CXL_LOCK				0x14
+#define   PCI_DVSEC_CXL_LOCK_CONFIG			_BITUL(0)
 #define  PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i)		(0x18 + (i * 0x10))
 #define  PCI_DVSEC_CXL_RANGE_SIZE_LOW(i)		(0x1C + (i * 0x10))
 #define   PCI_DVSEC_CXL_MEM_INFO_VALID			_BITUL(0)
 #define   PCI_DVSEC_CXL_MEM_ACTIVE			_BITUL(1)
 #define   PCI_DVSEC_CXL_MEM_SIZE_LOW			__GENMASK(31, 28)
 #define  PCI_DVSEC_CXL_RANGE_BASE_HIGH(i)		(0x20 + (i * 0x10))
+#define   PCI_DVSEC_CXL_RANGE_BASE_HI_RWL		0xFFFFFFFF
 #define  PCI_DVSEC_CXL_RANGE_BASE_LOW(i)		(0x24 + (i * 0x10))
 #define   PCI_DVSEC_CXL_MEM_BASE_LOW			__GENMASK(31, 28)
+#define   PCI_DVSEC_CXL_RANGE_BASE_LO_RWL		0xF0000000
 
 #define CXL_DVSEC_RANGE_MAX				2
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h
  2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
  2026-03-06  8:00 ` [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions smadhavan
@ 2026-03-06  8:00 ` smadhavan
  2026-03-06 17:45   ` Alex Williamson
  2026-03-06  8:00 ` [PATCH 3/5] PCI: Add virtual extended cap save buffer for CXL state smadhavan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: smadhavan @ 2026-03-06  8:00 UTC (permalink / raw)
  To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

From: Srirangan Madhavan <smadhavan@nvidia.com>

Move CXL HDM decoder register defines, register map structs
(cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map,
cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(),
enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs()
declarations from internal CXL headers to include/cxl/pci.h.

This makes them accessible to code outside the CXL subsystem, in
particular the PCI core CXL state save/restore support added in a
subsequent patch.

No functional change.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/cxl/cxl.h    | 107 +----------------------------------
 drivers/cxl/cxlpci.h |  10 ----
 include/cxl/pci.h    | 129 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 130 insertions(+), 116 deletions(-)
 create mode 100644 include/cxl/pci.h

diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 04c673e7cdb0..0c84695449d8 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -12,6 +12,7 @@
 #include <linux/node.h>
 #include <linux/io.h>
 #include <linux/range.h>
+#include <cxl/pci.h>

 extern const struct nvdimm_security_ops *cxl_security_ops;

@@ -23,63 +24,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
  * (port-driver, region-driver, nvdimm object-drivers... etc).
  */

-/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
-#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
-
-/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
-#define CXL_CM_OFFSET 0x1000
-#define CXL_CM_CAP_HDR_OFFSET 0x0
-#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
-#define     CM_CAP_HDR_CAP_ID 1
-#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
-#define     CM_CAP_HDR_CAP_VERSION 1
-#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
-#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
-#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
-#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
-
-#define   CXL_CM_CAP_CAP_ID_RAS 0x2
-#define   CXL_CM_CAP_CAP_ID_HDM 0x5
-#define   CXL_CM_CAP_CAP_HDM_VERSION 1
-
-/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
-#define CXL_HDM_DECODER_CAP_OFFSET 0x0
-#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
-#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
-#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
-#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
-#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
-#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
-#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
-#define   CXL_HDM_DECODER_ENABLE BIT(1)
-#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
-#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
-#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
-#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
-#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
-#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
-#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
-#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
-#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
-#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
-#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
-#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
-#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
-#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
-#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
-#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
-
-/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
-#define CXL_DECODER_MIN_GRANULARITY 256
-#define CXL_DECODER_MAX_ENCODED_IG 6
-
-static inline int cxl_hdm_decoder_count(u32 cap_hdr)
-{
-	int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
-
-	return val ? val * 2 : 1;
-}
-
 /* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
 static inline int eig_to_granularity(u16 eig, unsigned int *granularity)
 {
@@ -246,51 +190,6 @@ struct cxl_regs {
 	);
 };

-struct cxl_reg_map {
-	bool valid;
-	int id;
-	unsigned long offset;
-	unsigned long size;
-};
-
-struct cxl_component_reg_map {
-	struct cxl_reg_map hdm_decoder;
-	struct cxl_reg_map ras;
-};
-
-struct cxl_device_reg_map {
-	struct cxl_reg_map status;
-	struct cxl_reg_map mbox;
-	struct cxl_reg_map memdev;
-};
-
-struct cxl_pmu_reg_map {
-	struct cxl_reg_map pmu;
-};
-
-/**
- * struct cxl_register_map - DVSEC harvested register block mapping parameters
- * @host: device for devm operations and logging
- * @base: virtual base of the register-block-BAR + @block_offset
- * @resource: physical resource base of the register block
- * @max_size: maximum mapping size to perform register search
- * @reg_type: see enum cxl_regloc_type
- * @component_map: cxl_reg_map for component registers
- * @device_map: cxl_reg_maps for device registers
- * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
- */
-struct cxl_register_map {
-	struct device *host;
-	void __iomem *base;
-	resource_size_t resource;
-	resource_size_t max_size;
-	u8 reg_type;
-	union {
-		struct cxl_component_reg_map component_map;
-		struct cxl_device_reg_map device_map;
-		struct cxl_pmu_reg_map pmu_map;
-	};
-};

 void cxl_probe_component_regs(struct device *dev, void __iomem *base,
 			      struct cxl_component_reg_map *map);
@@ -304,13 +203,9 @@ int cxl_map_device_regs(const struct cxl_register_map *map,
 int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);

 #define CXL_INSTANCES_COUNT -1
-enum cxl_regloc_type;
 int cxl_count_regblock(struct pci_dev *pdev, enum cxl_regloc_type type);
 int cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_type type,
 			       struct cxl_register_map *map, unsigned int index);
-int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
-		      struct cxl_register_map *map);
-int cxl_setup_regs(struct cxl_register_map *map);
 struct cxl_dport;
 resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
 					   struct cxl_dport *dport);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 0cf64218aa16..9e825c039dd9 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -13,16 +13,6 @@
  */
 #define CXL_PCI_DEFAULT_MAX_VECTORS 16

-/* Register Block Identifier (RBI) */
-enum cxl_regloc_type {
-	CXL_REGLOC_RBI_EMPTY = 0,
-	CXL_REGLOC_RBI_COMPONENT,
-	CXL_REGLOC_RBI_VIRT,
-	CXL_REGLOC_RBI_MEMDEV,
-	CXL_REGLOC_RBI_PMU,
-	CXL_REGLOC_RBI_TYPES
-};
-
 /*
  * Table Access DOE, CDAT Read Entry Response
  *
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
new file mode 100644
index 000000000000..763a048262c7
--- /dev/null
+++ b/include/cxl/pci.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __CXL_CXL_PCI_H__
+#define __CXL_CXL_PCI_H__
+
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/io.h>
+#include <linux/types.h>
+
+/* Register Block Identifier (RBI) */
+enum cxl_regloc_type {
+	CXL_REGLOC_RBI_EMPTY = 0,
+	CXL_REGLOC_RBI_COMPONENT,
+	CXL_REGLOC_RBI_VIRT,
+	CXL_REGLOC_RBI_MEMDEV,
+	CXL_REGLOC_RBI_PMU,
+	CXL_REGLOC_RBI_TYPES
+};
+
+/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
+#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
+
+/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers */
+#define CXL_CM_OFFSET 0x1000
+#define CXL_CM_CAP_HDR_OFFSET 0x0
+#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
+#define     CM_CAP_HDR_CAP_ID 1
+#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
+#define     CM_CAP_HDR_CAP_VERSION 1
+#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
+#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
+#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
+#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
+
+#define   CXL_CM_CAP_CAP_ID_RAS 0x2
+#define   CXL_CM_CAP_CAP_ID_HDM 0x5
+#define   CXL_CM_CAP_CAP_HDM_VERSION 1
+
+/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
+#define CXL_HDM_DECODER_CAP_OFFSET 0x0
+#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
+#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
+#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
+#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
+#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
+#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
+#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
+#define   CXL_HDM_DECODER_ENABLE BIT(1)
+#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
+#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
+#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
+#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
+#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
+#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
+#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
+#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
+#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
+#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
+#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
+#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
+#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
+#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
+#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
+#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
+
+/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
+#define CXL_DECODER_MIN_GRANULARITY 256
+#define CXL_DECODER_MAX_ENCODED_IG 6
+
+static inline int cxl_hdm_decoder_count(u32 cap_hdr)
+{
+	int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
+
+	return val ? val * 2 : 1;
+}
+
+struct cxl_reg_map {
+	bool valid;
+	int id;
+	unsigned long offset;
+	unsigned long size;
+};
+
+struct cxl_component_reg_map {
+	struct cxl_reg_map hdm_decoder;
+	struct cxl_reg_map ras;
+};
+
+struct cxl_device_reg_map {
+	struct cxl_reg_map status;
+	struct cxl_reg_map mbox;
+	struct cxl_reg_map memdev;
+};
+
+struct cxl_pmu_reg_map {
+	struct cxl_reg_map pmu;
+};
+
+/**
+ * struct cxl_register_map - DVSEC harvested register block mapping parameters
+ * @host: device for devm operations and logging
+ * @base: virtual base of the register-block-BAR + @block_offset
+ * @resource: physical resource base of the register block
+ * @max_size: maximum mapping size to perform register search
+ * @reg_type: see enum cxl_regloc_type
+ * @component_map: cxl_reg_map for component registers
+ * @device_map: cxl_reg_maps for device registers
+ * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
+ */
+struct cxl_register_map {
+	struct device *host;
+	void __iomem *base;
+	resource_size_t resource;
+	resource_size_t max_size;
+	u8 reg_type;
+	union {
+		struct cxl_component_reg_map component_map;
+		struct cxl_device_reg_map device_map;
+		struct cxl_pmu_reg_map pmu_map;
+	};
+};
+
+struct pci_dev;
+
+int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
+		      struct cxl_register_map *map);
+int cxl_setup_regs(struct cxl_register_map *map);
+
+#endif /* __CXL_CXL_PCI_H__ */
--
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] PCI: Add virtual extended cap save buffer for CXL state
  2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
  2026-03-06  8:00 ` [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions smadhavan
  2026-03-06  8:00 ` [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h smadhavan
@ 2026-03-06  8:00 ` smadhavan
  2026-03-10 21:45   ` Dan Williams
  2026-03-06  8:00 ` [PATCH 4/5] PCI: Add CXL DVSEC state save/restore across resets smadhavan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: smadhavan @ 2026-03-06  8:00 UTC (permalink / raw)
  To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

From: Srirangan Madhavan <smadhavan@nvidia.com>

Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers
using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require
a real capability in config space.

The existing pci_add_ext_cap_save_buffer() cannot be used for
CXL DVSEC state because it calls pci_find_saved_ext_cap(),
which searches for a matching capability in PCI config space.
The CXL state saved here is a synthetic snapshot (DVSEC + HDM)
and should not be tied to a real extended-capability instance.
A virtual extended-cap save buffer API (cap IDs above
PCI_EXT_CAP_ID_MAX) lets the PCI core track this state without
a backing config space capability.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/pci/pci.c | 20 ++++++++++++++++++++
 drivers/pci/pci.h | 18 ++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8479c2e1f74f..dc8181f13864 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3446,6 +3446,26 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev, u16 cap, unsigned int size)
 	return _pci_add_cap_save_buffer(dev, cap, true, size);
 }

+int pci_add_virtual_ext_cap_save_buffer(struct pci_dev *dev, u16 cap,
+					unsigned int size)
+{
+	struct pci_cap_saved_state *save_state;
+
+	if (cap <= PCI_EXT_CAP_ID_MAX)
+		return -EINVAL;
+
+	save_state = kzalloc(sizeof(*save_state) + size, GFP_KERNEL);
+	if (!save_state)
+		return -ENOMEM;
+
+	save_state->cap.cap_nr = cap;
+	save_state->cap.cap_extended = true;
+	save_state->cap.size = size;
+	pci_add_saved_cap(dev, save_state);
+
+	return 0;
+}
+
 /**
  * pci_allocate_cap_save_buffers - allocate buffers for saving capabilities
  * @dev: the PCI device
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 13d998fbacce..05c57f1e4701 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -245,15 +245,33 @@ struct pci_cap_saved_state {
 	struct pci_cap_saved_data	cap;
 };

+/*
+ * Virtual extended cap ID for CXL DVSEC state in the cap save chain.
+ */
+#define PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL	0xFFFF
+static_assert(PCI_EXT_CAP_ID_MAX < PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL);
+
 void pci_allocate_cap_save_buffers(struct pci_dev *dev);
 void pci_free_cap_save_buffers(struct pci_dev *dev);
 int pci_add_cap_save_buffer(struct pci_dev *dev, char cap, unsigned int size);
 int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 				u16 cap, unsigned int size);
+int pci_add_virtual_ext_cap_save_buffer(struct pci_dev *dev, u16 cap,
+					unsigned int size);
 struct pci_cap_saved_state *pci_find_saved_cap(struct pci_dev *dev, char cap);
 struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
 						   u16 cap);

+#ifdef CONFIG_PCI_CXL
+void pci_allocate_cxl_save_buffer(struct pci_dev *dev);
+void pci_save_cxl_state(struct pci_dev *dev);
+void pci_restore_cxl_state(struct pci_dev *dev);
+#else
+static inline void pci_allocate_cxl_save_buffer(struct pci_dev *dev) { }
+static inline void pci_save_cxl_state(struct pci_dev *dev) { }
+static inline void pci_restore_cxl_state(struct pci_dev *dev) { }
+#endif
+
 #define PCI_PM_D2_DELAY         200	/* usec; see PCIe r4.0, sec 5.9.1 */
 #define PCI_PM_D3HOT_WAIT       10	/* msec */
 #define PCI_PM_D3COLD_WAIT      100	/* msec */
--
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] PCI: Add CXL DVSEC state save/restore across resets
  2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
                   ` (2 preceding siblings ...)
  2026-03-06  8:00 ` [PATCH 3/5] PCI: Add virtual extended cap save buffer for CXL state smadhavan
@ 2026-03-06  8:00 ` smadhavan
  2026-03-06 17:45   ` Alex Williamson
  2026-03-12 12:28   ` Jonathan Cameron
  2026-03-06  8:00 ` [PATCH 5/5] PCI: Add HDM decoder state save/restore smadhavan
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: smadhavan @ 2026-03-06  8:00 UTC (permalink / raw)
  To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

From: Srirangan Madhavan <smadhavan@nvidia.com>

Save and restore CXL DVSEC control registers (CTRL, CTRL2), range
base registers, and lock state across PCI resets.

When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields
become read-only and hardware may have updated them. Blindly
restoring saved values would either be silently ignored or
conflict with hardware state. Instead, a read-merge-write
approach is used: the current hardware values of the RWL
(read-write-when-locked) fields are read and merged with the
saved state, so only writable bits are restored while locked
bits retain their hardware values.
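
The merge used in the locked path is plain mask arithmetic. As a
standalone sketch (userspace C; the mask value is taken from the
PCI_DVSEC_CXL_CTRL_RWL definition added in patch 1):

```c
#include <stdint.h>

/* CTRL RWL mask from patch 1 (bits that track hardware when locked) */
#define CTRL_RWL_MASK 0x5FEDu

/*
 * Merge a saved register snapshot with the current hardware value:
 * bits set in rwl_mask keep their current hardware value, all other
 * bits come from the saved snapshot.
 */
static uint32_t cxl_merge_rwl(uint32_t saved, uint32_t hw, uint32_t rwl_mask)
{
	return (hw & rwl_mask) | (saved & ~rwl_mask);
}
```

With CTRL_RWL_MASK, a saved value that matches hardware merges to
itself, and any disagreement in RWL bits resolves in hardware's favor.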

Hooked into pci_save_state()/pci_restore_state() so all PCI reset
paths automatically preserve CXL DVSEC configuration.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/pci/Kconfig  |   4 +
 drivers/pci/Makefile |   1 +
 drivers/pci/cxl.c    | 177 +++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c    |   3 +
 4 files changed, 185 insertions(+)
 create mode 100644 drivers/pci/cxl.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index e3f848ffb52a..6b96650b3f31 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -119,6 +119,10 @@ config XEN_PCIDEV_FRONTEND
 	  The PCI device frontend driver allows the kernel to import arbitrary
 	  PCI devices from a PCI backend to support PCI driver domains.

+config PCI_CXL
+	bool
+	default y if CXL_BUS
+
 config PCI_ATS
 	bool

diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 41ebc3b9a518..a6168ecef9c1 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_PCI_TSM)		+= tsm.o
 obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
 obj-$(CONFIG_PCI_NPEM)		+= npem.o
 obj-$(CONFIG_PCIE_TPH)		+= tph.o
+obj-$(CONFIG_PCI_CXL)		+= cxl.o
 obj-$(CONFIG_CARDBUS)		+= setup-cardbus.o

 # Endpoint library must be initialized before its users
diff --git a/drivers/pci/cxl.c b/drivers/pci/cxl.c
new file mode 100644
index 000000000000..abcf70de9171
--- /dev/null
+++ b/drivers/pci/cxl.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * CXL PCI state save/restore support.
+ *
+ * Saves and restores CXL DVSEC registers across PCI resets and link
+ * disable/enable transitions. Hooked into pci_save_state() /
+ * pci_restore_state() via the PCI capability save chain.
+ */
+#include <linux/pci.h>
+#include <cxl/pci.h>
+#include "pci.h"
+
+struct cxl_pci_state {
+	u16 dvsec;
+	u16 dvsec_ctrl;
+	u16 dvsec_ctrl2;
+	u32 range_base_hi[CXL_DVSEC_RANGE_MAX];
+	u32 range_base_lo[CXL_DVSEC_RANGE_MAX];
+	u16 dvsec_lock;
+	bool dvsec_valid;
+};
+
+static void cxl_save_dvsec(struct pci_dev *pdev, struct cxl_pci_state *state)
+{
+	int rc_ctrl, rc_ctrl2;
+	u16 dvsec;
+	int i;
+
+	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+					  PCI_DVSEC_CXL_DEVICE);
+	if (!dvsec)
+		return;
+
+	state->dvsec = dvsec;
+	rc_ctrl = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL,
+				       &state->dvsec_ctrl);
+	rc_ctrl2 = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
+					&state->dvsec_ctrl2);
+	if (rc_ctrl || rc_ctrl2) {
+		pci_warn(pdev,
+			 "CXL: DVSEC read failed (ctrl rc=%d, ctrl2 rc=%d)\n",
+			 rc_ctrl, rc_ctrl2);
+		return;
+	}
+
+	for (i = 0; i < CXL_DVSEC_RANGE_MAX; i++) {
+		pci_read_config_dword(pdev,
+			dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
+			&state->range_base_hi[i]);
+		pci_read_config_dword(pdev,
+			dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
+			&state->range_base_lo[i]);
+	}
+
+	pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_LOCK,
+			     &state->dvsec_lock);
+
+	state->dvsec_valid = true;
+}
+
+static u32 cxl_merge_rwl(u32 saved, u32 current_hw, u32 rwl_mask)
+{
+	return (current_hw & rwl_mask) | (saved & ~rwl_mask);
+}
+
+static void cxl_restore_dvsec(struct pci_dev *pdev,
+			      const struct cxl_pci_state *state)
+{
+	u16 lock_reg = 0;
+	int i;
+
+	if (!state->dvsec_valid)
+		return;
+
+	pci_read_config_word(pdev, state->dvsec + PCI_DVSEC_CXL_LOCK,
+			     &lock_reg);
+
+	if (lock_reg & PCI_DVSEC_CXL_LOCK_CONFIG) {
+		u16 hw_ctrl;
+		u32 hw_range_hi, hw_range_lo;
+
+		pci_read_config_word(pdev,
+				     state->dvsec + PCI_DVSEC_CXL_CTRL,
+				     &hw_ctrl);
+		pci_write_config_word(pdev,
+			state->dvsec + PCI_DVSEC_CXL_CTRL,
+			cxl_merge_rwl(state->dvsec_ctrl, hw_ctrl,
+				      PCI_DVSEC_CXL_CTRL_RWL));
+
+		pci_write_config_word(pdev,
+			state->dvsec + PCI_DVSEC_CXL_CTRL2,
+			state->dvsec_ctrl2);
+
+		for (i = 0; i < CXL_DVSEC_RANGE_MAX; i++) {
+			pci_read_config_dword(pdev,
+				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
+				&hw_range_hi);
+			pci_write_config_dword(pdev,
+				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
+				cxl_merge_rwl(state->range_base_hi[i],
+					      hw_range_hi,
+					      PCI_DVSEC_CXL_RANGE_BASE_HI_RWL));
+
+			pci_read_config_dword(pdev,
+				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
+				&hw_range_lo);
+			pci_write_config_dword(pdev,
+				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
+				cxl_merge_rwl(state->range_base_lo[i],
+					      hw_range_lo,
+					      PCI_DVSEC_CXL_RANGE_BASE_LO_RWL));
+		}
+	} else {
+		pci_write_config_word(pdev,
+				      state->dvsec + PCI_DVSEC_CXL_CTRL,
+				      state->dvsec_ctrl);
+		pci_write_config_word(pdev,
+				      state->dvsec + PCI_DVSEC_CXL_CTRL2,
+				      state->dvsec_ctrl2);
+		for (i = 0; i < CXL_DVSEC_RANGE_MAX; i++) {
+			pci_write_config_dword(pdev,
+				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
+				state->range_base_hi[i]);
+			pci_write_config_dword(pdev,
+				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
+				state->range_base_lo[i]);
+		}
+
+		pci_write_config_word(pdev,
+			state->dvsec + PCI_DVSEC_CXL_LOCK,
+			state->dvsec_lock);
+	}
+}
+
+void pci_allocate_cxl_save_buffer(struct pci_dev *dev)
+{
+	if (!pcie_is_cxl(dev))
+		return;
+
+	if (pci_add_virtual_ext_cap_save_buffer(dev,
+			PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL,
+			sizeof(struct cxl_pci_state)))
+		pci_err(dev, "unable to allocate CXL save buffer\n");
+}
+
+void pci_save_cxl_state(struct pci_dev *pdev)
+{
+	struct pci_cap_saved_state *save_state;
+	struct cxl_pci_state *state;
+
+	save_state = pci_find_saved_ext_cap(pdev,
+					    PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL);
+	if (!save_state)
+		return;
+
+	state = (struct cxl_pci_state *)save_state->cap.data;
+	state->dvsec_valid = false;
+
+	cxl_save_dvsec(pdev, state);
+}
+
+void pci_restore_cxl_state(struct pci_dev *pdev)
+{
+	struct pci_cap_saved_state *save_state;
+	struct cxl_pci_state *state;
+
+	save_state = pci_find_saved_ext_cap(pdev,
+					    PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL);
+	if (!save_state)
+		return;
+
+	state = (struct cxl_pci_state *)save_state->cap.data;
+	if (!state->dvsec_valid)
+		return;
+
+	cxl_restore_dvsec(pdev, state);
+}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index dc8181f13864..497720c64d6d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1759,6 +1759,7 @@ int pci_save_state(struct pci_dev *dev)
 	pci_save_aer_state(dev);
 	pci_save_ptm_state(dev);
 	pci_save_tph_state(dev);
+	pci_save_cxl_state(dev);
 	return pci_save_vc_state(dev);
 }
 EXPORT_SYMBOL(pci_save_state);
@@ -1841,6 +1842,7 @@ void pci_restore_state(struct pci_dev *dev)
 	pci_restore_aer_state(dev);

 	pci_restore_config_space(dev);
+	pci_restore_cxl_state(dev);

 	pci_restore_pcix_state(dev);
 	pci_restore_msi_state(dev);
@@ -3489,6 +3491,7 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
 		pci_err(dev, "unable to allocate suspend buffer for LTR\n");

 	pci_allocate_vc_save_buffers(dev);
+	pci_allocate_cxl_save_buffer(dev);
 }

 void pci_free_cap_save_buffers(struct pci_dev *dev)
--
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] PCI: Add HDM decoder state save/restore
  2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
                   ` (3 preceding siblings ...)
  2026-03-06  8:00 ` [PATCH 4/5] PCI: Add CXL DVSEC state save/restore across resets smadhavan
@ 2026-03-06  8:00 ` smadhavan
  2026-03-10 21:39 ` [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets Dan Williams
  2026-03-12 12:34 ` Jonathan Cameron
  6 siblings, 0 replies; 25+ messages in thread
From: smadhavan @ 2026-03-06  8:00 UTC (permalink / raw)
  To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

From: Srirangan Madhavan <smadhavan@nvidia.com>

Save and restore CXL HDM decoder registers (global control,
per-decoder base/size/target-list, and commit state) across PCI
resets. On restore, decoders that were committed are reprogrammed
and recommitted with a 10ms timeout. Locked decoders that are
already committed are skipped, since their state is protected by
hardware and reprogramming them would fail.

The Register Locator DVSEC is parsed directly via PCI config space
reads rather than calling cxl_find_regblock()/cxl_setup_regs(),
since this code lives in the PCI core and must not depend on CXL
module symbols.

Memory Space Enable (MSE) is temporarily set during save/restore to
allow MMIO access to the HDM decoder register block.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
 drivers/pci/cxl.c | 297 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 294 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/cxl.c b/drivers/pci/cxl.c
index abcf70de9171..ed246f32dfa6 100644
--- a/drivers/pci/cxl.c
+++ b/drivers/pci/cxl.c
@@ -2,15 +2,31 @@
 /*
  * CXL PCI state save/restore support.
  *
- * Saves and restores CXL DVSEC registers across PCI resets and link
- * disable/enable transitions. Hooked into pci_save_state() /
+ * Saves and restores CXL DVSEC and HDM decoder registers across PCI resets
+ * and link disable/enable transitions. Hooked into pci_save_state() /
  * pci_restore_state() via the PCI capability save chain.
  */
 #include <linux/pci.h>
+#include <linux/io.h>
+#include <linux/jiffies.h>
+#include <linux/cleanup.h>
 #include <cxl/pci.h>
 #include "pci.h"

+#define CXL_HDM_MAX_DECODERS 32
+
+struct cxl_hdm_decoder_snapshot {
+	u32 base_lo;
+	u32 base_hi;
+	u32 size_lo;
+	u32 size_hi;
+	u32 ctrl;
+	u32 tl_lo;
+	u32 tl_hi;
+};
+
 struct cxl_pci_state {
+	/* DVSEC saved state */
 	u16 dvsec;
 	u16 dvsec_ctrl;
 	u16 dvsec_ctrl2;
@@ -18,6 +34,15 @@ struct cxl_pci_state {
 	u32 range_base_lo[CXL_DVSEC_RANGE_MAX];
 	u16 dvsec_lock;
 	bool dvsec_valid;
+
+	/* HDM decoder saved state */
+	int hdm_bar;
+	unsigned long hdm_bar_offset;
+	unsigned long hdm_map_size;
+	u32 hdm_global_ctrl;
+	int hdm_count;
+	struct cxl_hdm_decoder_snapshot decoders[CXL_HDM_MAX_DECODERS];
+	bool hdm_valid;
 };

 static void cxl_save_dvsec(struct pci_dev *pdev, struct cxl_pci_state *state)
@@ -132,6 +157,269 @@ static void cxl_restore_dvsec(struct pci_dev *pdev,
 	}
 }

+struct pci_cmd_saved {
+	struct pci_dev *pdev;
+	u16 cmd;
+};
+
+DEFINE_FREE(restore_pci_cmd, struct pci_cmd_saved,
+	    if (!(_T.cmd & PCI_COMMAND_MEMORY))
+		    pci_write_config_word(_T.pdev, PCI_COMMAND, _T.cmd))
+
+/**
+ * cxl_find_component_regblock - Find the Component Register Block via
+ *                               the Register Locator DVSEC
+ * @pdev: PCI device to scan
+ * @bir: output BAR index
+ * @offset: output offset within the BAR
+ *
+ * Parses the Register Locator DVSEC (ID 8) directly via PCI config space
+ * reads.  No dependency on CXL module symbols.
+ *
+ * Return: 0 on success, -ENODEV if not found.
+ */
+static int cxl_find_component_regblock(struct pci_dev *pdev,
+				       int *bir, u64 *offset)
+{
+	u32 regloc_size, regblocks;
+	u16 regloc;
+	int i;
+
+	regloc = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+					   PCI_DVSEC_CXL_REG_LOCATOR);
+	if (!regloc)
+		return -ENODEV;
+
+	pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, &regloc_size);
+	regloc_size = PCI_DVSEC_HEADER1_LEN(regloc_size);
+	regblocks = (regloc_size - PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1) / 8;
+
+	for (i = 0; i < regblocks; i++) {
+		u32 reg_lo, reg_hi;
+		unsigned int off;
+
+		off = regloc + PCI_DVSEC_CXL_REG_LOCATOR_BLOCK1 + i * 8;
+		pci_read_config_dword(pdev, off, &reg_lo);
+		pci_read_config_dword(pdev, off + 4, &reg_hi);
+
+		if (FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_ID, reg_lo) !=
+		    CXL_REGLOC_RBI_COMPONENT)
+			continue;
+
+		*bir = FIELD_GET(PCI_DVSEC_CXL_REG_LOCATOR_BIR, reg_lo);
+		*offset = ((u64)reg_hi << 32) |
+			  (reg_lo & PCI_DVSEC_CXL_REG_LOCATOR_BLOCK_OFF_LOW);
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+/*
+ * Discover and map HDM decoder registers.
+ * Caller must pci_iounmap() the returned pointer.
+ */
+static void __iomem *cxl_hdm_map(struct pci_dev *pdev, int *bar_out,
+				  unsigned long *offset_out,
+				  unsigned long *size_out)
+{
+	int bir;
+	u64 reg_offset;
+	void __iomem *comp_base, *cm_base;
+	u32 cap_hdr;
+	int cap, cap_count;
+	unsigned long hdm_offset = 0, hdm_size = 0;
+	void __iomem *hdm;
+
+	if (cxl_find_component_regblock(pdev, &bir, &reg_offset))
+		return NULL;
+
+	comp_base = pci_iomap_range(pdev, bir, reg_offset,
+				    CXL_CM_OFFSET + SZ_4K);
+	if (!comp_base)
+		return NULL;
+
+	cm_base = comp_base + CXL_CM_OFFSET;
+	cap_hdr = readl(cm_base);
+
+	if (FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, cap_hdr) != CM_CAP_HDR_CAP_ID) {
+		pci_iounmap(pdev, comp_base);
+		return NULL;
+	}
+
+	cap_count = FIELD_GET(CXL_CM_CAP_HDR_ARRAY_SIZE_MASK, cap_hdr);
+
+	for (cap = 1; cap <= cap_count; cap++) {
+		u32 hdr = readl(cm_base + cap * 4);
+		u16 cap_id = FIELD_GET(CXL_CM_CAP_HDR_ID_MASK, hdr);
+		u32 cap_off = FIELD_GET(CXL_CM_CAP_PTR_MASK, hdr);
+
+		if (cap_id != CXL_CM_CAP_CAP_ID_HDM)
+			continue;
+
+		hdr = readl(cm_base + cap_off);
+		hdm_offset = CXL_CM_OFFSET + cap_off;
+		hdm_size = 0x20 * cxl_hdm_decoder_count(hdr) + 0x10;
+		break;
+	}
+
+	pci_iounmap(pdev, comp_base);
+
+	if (!hdm_size)
+		return NULL;
+
+	hdm = pci_iomap_range(pdev, bir, reg_offset + hdm_offset, hdm_size);
+	if (!hdm)
+		return NULL;
+
+	*bar_out = bir;
+	*offset_out = reg_offset + hdm_offset;
+	*size_out = hdm_size;
+	return hdm;
+}
+
+static void cxl_save_hdm(struct pci_dev *pdev, void __iomem *hdm,
+			  struct cxl_pci_state *state, int count)
+{
+	int i;
+
+	state->hdm_count = min_t(int, count, CXL_HDM_MAX_DECODERS);
+	state->hdm_global_ctrl = readl(hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+	for (i = 0; i < state->hdm_count; i++) {
+		struct cxl_hdm_decoder_snapshot *d = &state->decoders[i];
+
+		d->base_lo = readl(hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(i));
+		d->base_hi = readl(hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i));
+		d->size_lo = readl(hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i));
+		d->size_hi = readl(hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i));
+		d->ctrl    = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
+		d->tl_lo   = readl(hdm + CXL_HDM_DECODER0_TL_LOW(i));
+		d->tl_hi   = readl(hdm + CXL_HDM_DECODER0_TL_HIGH(i));
+	}
+}
+
+static void cxl_restore_hdm(struct pci_dev *pdev, void __iomem *hdm,
+			     const struct cxl_pci_state *state)
+{
+	int i;
+
+	writel(state->hdm_global_ctrl, hdm + CXL_HDM_DECODER_CTRL_OFFSET);
+
+	for (i = 0; i < state->hdm_count; i++) {
+		const struct cxl_hdm_decoder_snapshot *d = &state->decoders[i];
+		unsigned long timeout;
+		u32 ctrl;
+
+		if (!(d->ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED))
+			continue;
+
+		ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
+		if ((ctrl & CXL_HDM_DECODER0_CTRL_LOCK) &&
+		    (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED))
+			continue;
+
+		if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED) {
+			ctrl &= ~CXL_HDM_DECODER0_CTRL_COMMIT;
+			writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
+		}
+
+		writel(d->base_lo, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(i));
+		writel(d->base_hi, hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i));
+		writel(d->size_lo, hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i));
+		writel(d->size_hi, hdm + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i));
+		writel(d->tl_lo, hdm + CXL_HDM_DECODER0_TL_LOW(i));
+		writel(d->tl_hi, hdm + CXL_HDM_DECODER0_TL_HIGH(i));
+
+		wmb();
+
+		ctrl = d->ctrl & ~(CXL_HDM_DECODER0_CTRL_COMMITTED |
+				   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR);
+		ctrl |= CXL_HDM_DECODER0_CTRL_COMMIT;
+		writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
+
+		timeout = jiffies + msecs_to_jiffies(10);
+		for (;;) {
+			ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
+			if (ctrl & CXL_HDM_DECODER0_CTRL_COMMITTED)
+				break;
+			if (ctrl & CXL_HDM_DECODER0_CTRL_COMMIT_ERROR) {
+				pci_warn(pdev,
+					 "HDM decoder %d commit error on restore\n",
+					 i);
+				break;
+			}
+			if (time_after(jiffies, timeout)) {
+				pci_warn(pdev,
+					 "HDM decoder %d commit timeout on restore\n",
+					 i);
+				break;
+			}
+			cpu_relax();
+		}
+	}
+}
+
+static void cxl_save_hdm_decoders(struct pci_dev *pdev,
+				   struct cxl_pci_state *state)
+{
+	int hdm_bar;
+	unsigned long hdm_bar_offset, hdm_map_size;
+	void __iomem *hdm;
+	u16 cmd;
+	u32 cap;
+	struct pci_cmd_saved saved __free(restore_pci_cmd) = {
+		.pdev = pdev, .cmd = PCI_COMMAND_MEMORY,
+	};
+
+	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
+	saved.cmd = cmd;
+	if (!(cmd & PCI_COMMAND_MEMORY))
+		pci_write_config_word(pdev, PCI_COMMAND,
+				      cmd | PCI_COMMAND_MEMORY);
+
+	hdm = cxl_hdm_map(pdev, &hdm_bar, &hdm_bar_offset, &hdm_map_size);
+	if (!hdm)
+		return;
+
+	cap = readl(hdm + CXL_HDM_DECODER_CAP_OFFSET);
+	cxl_save_hdm(pdev, hdm, state, cxl_hdm_decoder_count(cap));
+	state->hdm_bar = hdm_bar;
+	state->hdm_bar_offset = hdm_bar_offset;
+	state->hdm_map_size = hdm_map_size;
+	state->hdm_valid = true;
+	pci_iounmap(pdev, hdm);
+}
+
+static void cxl_restore_hdm_decoders(struct pci_dev *pdev,
+				      const struct cxl_pci_state *state)
+{
+	void __iomem *hdm;
+	u16 cmd;
+	struct pci_cmd_saved saved __free(restore_pci_cmd) = {
+		.pdev = pdev, .cmd = PCI_COMMAND_MEMORY,
+	};
+
+	if (!state->hdm_valid)
+		return;
+
+	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
+	saved.cmd = cmd;
+	if (!(cmd & PCI_COMMAND_MEMORY))
+		pci_write_config_word(pdev, PCI_COMMAND,
+				      cmd | PCI_COMMAND_MEMORY);
+
+	hdm = pci_iomap_range(pdev, state->hdm_bar, state->hdm_bar_offset,
+			      state->hdm_map_size);
+	if (!hdm) {
+		pci_warn(pdev, "CXL: failed to map HDM for restore\n");
+		return;
+	}
+
+	cxl_restore_hdm(pdev, hdm, state);
+	pci_iounmap(pdev, hdm);
+}
+
 void pci_allocate_cxl_save_buffer(struct pci_dev *dev)
 {
 	if (!pcie_is_cxl(dev))
@@ -155,8 +443,10 @@ void pci_save_cxl_state(struct pci_dev *pdev)

 	state = (struct cxl_pci_state *)save_state->cap.data;
 	state->dvsec_valid = false;
+	state->hdm_valid = false;

 	cxl_save_dvsec(pdev, state);
+	cxl_save_hdm_decoders(pdev, state);
 }

 void pci_restore_cxl_state(struct pci_dev *pdev)
@@ -170,8 +460,9 @@ void pci_restore_cxl_state(struct pci_dev *pdev)
 		return;

 	state = (struct cxl_pci_state *)save_state->cap.data;
-	if (!state->dvsec_valid)
+	if (!state->dvsec_valid && !state->hdm_valid)
 		return;

 	cxl_restore_dvsec(pdev, state);
+	cxl_restore_hdm_decoders(pdev, state);
 }
--
2.43.0



* Re: [PATCH 4/5] PCI: Add CXL DVSEC state save/restore across resets
  2026-03-06  8:00 ` [PATCH 4/5] PCI: Add CXL DVSEC state save/restore across resets smadhavan
@ 2026-03-06 17:45   ` Alex Williamson
  2026-03-12 12:28   ` Jonathan Cameron
  1 sibling, 0 replies; 25+ messages in thread
From: Alex Williamson @ 2026-03-06 17:45 UTC (permalink / raw)
  To: smadhavan
  Cc: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave, jeshuas, vsethi,
	skancherla, vaslot, sdonthineni, mhonap, vidyas, jan, mochs,
	dschumacher, linux-cxl, linux-pci, linux-kernel

On Fri, 6 Mar 2026 08:00:18 +0000
<smadhavan@nvidia.com> wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> Save and restore CXL DVSEC control registers (CTRL, CTRL2), range
> base registers, and lock state across PCI resets.
> 
> When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields
> become read-only and hardware may have updated them. Blindly
> restoring saved values would be silently ignored or conflict
> with hardware state. Instead, a read-merge-write approach is
> used: current hardware values are read for the RWL
> (read-write-when-locked) fields and merged with saved state,
> so only writable bits are restored while locked bits retain
> their hardware values.
> 
> Hooked into pci_save_state()/pci_restore_state() so all PCI reset
> paths automatically preserve CXL DVSEC configuration.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  drivers/pci/Kconfig  |   4 +
>  drivers/pci/Makefile |   1 +
>  drivers/pci/cxl.c    | 177 +++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.c    |   3 +
>  4 files changed, 185 insertions(+)
>  create mode 100644 drivers/pci/cxl.c
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index e3f848ffb52a..6b96650b3f31 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -119,6 +119,10 @@ config XEN_PCIDEV_FRONTEND
>  	  The PCI device frontend driver allows the kernel to import arbitrary
>  	  PCI devices from a PCI backend to support PCI driver domains.
> 
> +config PCI_CXL
> +	bool
> +	default y if CXL_BUS
> +
>  config PCI_ATS
>  	bool
> 
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 41ebc3b9a518..a6168ecef9c1 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -39,6 +39,7 @@ obj-$(CONFIG_PCI_TSM)		+= tsm.o
>  obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
>  obj-$(CONFIG_PCI_NPEM)		+= npem.o
>  obj-$(CONFIG_PCIE_TPH)		+= tph.o
> +obj-$(CONFIG_PCI_CXL)		+= cxl.o
>  obj-$(CONFIG_CARDBUS)		+= setup-cardbus.o
> 
>  # Endpoint library must be initialized before its users
> diff --git a/drivers/pci/cxl.c b/drivers/pci/cxl.c
> new file mode 100644
> index 000000000000..abcf70de9171
> --- /dev/null
> +++ b/drivers/pci/cxl.c
> @@ -0,0 +1,177 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * CXL PCI state save/restore support.
> + *
> + * Saves and restores CXL DVSEC registers across PCI resets and link
> + * disable/enable transitions. Hooked into pci_save_state() /
> + * pci_restore_state() via the PCI capability save chain.
> + */
> +#include <linux/pci.h>
> +#include <cxl/pci.h>
> +#include "pci.h"
> +
> +struct cxl_pci_state {
> +	u16 dvsec;
> +	u16 dvsec_ctrl;
> +	u16 dvsec_ctrl2;
> +	u32 range_base_hi[CXL_DVSEC_RANGE_MAX];
> +	u32 range_base_lo[CXL_DVSEC_RANGE_MAX];
> +	u16 dvsec_lock;
> +	bool dvsec_valid;
> +};
> +
> +static void cxl_save_dvsec(struct pci_dev *pdev, struct cxl_pci_state *state)
> +{
> +	int rc_ctrl, rc_ctrl2;
> +	u16 dvsec;
> +	int i;
> +
> +	dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> +					  PCI_DVSEC_CXL_DEVICE);
> +	if (!dvsec)
> +		return;
> +
> +	state->dvsec = dvsec;
> +	rc_ctrl = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL,
> +				       &state->dvsec_ctrl);
> +	rc_ctrl2 = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> +					&state->dvsec_ctrl2);
> +	if (rc_ctrl || rc_ctrl2) {
> +		pci_warn(pdev,
> +			 "CXL: DVSEC read failed (ctrl rc=%d, ctrl2 rc=%d)\n",
> +			 rc_ctrl, rc_ctrl2);
> +		return;
> +	}
> +
> +	for (i = 0; i < CXL_DVSEC_RANGE_MAX; i++) {
> +		pci_read_config_dword(pdev,
> +			dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
> +			&state->range_base_hi[i]);
> +		pci_read_config_dword(pdev,
> +			dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
> +			&state->range_base_lo[i]);
> +	}

Shouldn't we also be handling the array of RANGE_SIZE registers here?
NB. the low bits indicating active/valid need special handling.  Thanks,

Alex

> +
> +	pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_LOCK,
> +			     &state->dvsec_lock);
> +
> +	state->dvsec_valid = true;
> +}
> +
> +static u32 cxl_merge_rwl(u32 saved, u32 current_hw, u32 rwl_mask)
> +{
> +	return (current_hw & rwl_mask) | (saved & ~rwl_mask);
> +}
> +
> +static void cxl_restore_dvsec(struct pci_dev *pdev,
> +			      const struct cxl_pci_state *state)
> +{
> +	u16 lock_reg = 0;
> +	int i;
> +
> +	if (!state->dvsec_valid)
> +		return;
> +
> +	pci_read_config_word(pdev, state->dvsec + PCI_DVSEC_CXL_LOCK,
> +			     &lock_reg);
> +
> +	if (lock_reg & PCI_DVSEC_CXL_LOCK_CONFIG) {
> +		u16 hw_ctrl;
> +		u32 hw_range_hi, hw_range_lo;
> +
> +		pci_read_config_word(pdev,
> +				     state->dvsec + PCI_DVSEC_CXL_CTRL,
> +				     &hw_ctrl);
> +		pci_write_config_word(pdev,
> +			state->dvsec + PCI_DVSEC_CXL_CTRL,
> +			cxl_merge_rwl(state->dvsec_ctrl, hw_ctrl,
> +				      PCI_DVSEC_CXL_CTRL_RWL));
> +
> +		pci_write_config_word(pdev,
> +			state->dvsec + PCI_DVSEC_CXL_CTRL2,
> +			state->dvsec_ctrl2);
> +
> +		for (i = 0; i < CXL_DVSEC_RANGE_MAX; i++) {
> +			pci_read_config_dword(pdev,
> +				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
> +				&hw_range_hi);
> +			pci_write_config_dword(pdev,
> +				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
> +				cxl_merge_rwl(state->range_base_hi[i],
> +					      hw_range_hi,
> +					      PCI_DVSEC_CXL_RANGE_BASE_HI_RWL));
> +
> +			pci_read_config_dword(pdev,
> +				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
> +				&hw_range_lo);
> +			pci_write_config_dword(pdev,
> +				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
> +				cxl_merge_rwl(state->range_base_lo[i],
> +					      hw_range_lo,
> +					      PCI_DVSEC_CXL_RANGE_BASE_LO_RWL));
> +		}
> +	} else {
> +		pci_write_config_word(pdev,
> +				      state->dvsec + PCI_DVSEC_CXL_CTRL,
> +				      state->dvsec_ctrl);
> +		pci_write_config_word(pdev,
> +				      state->dvsec + PCI_DVSEC_CXL_CTRL2,
> +				      state->dvsec_ctrl2);
> +		for (i = 0; i < CXL_DVSEC_RANGE_MAX; i++) {
> +			pci_write_config_dword(pdev,
> +				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_HIGH(i),
> +				state->range_base_hi[i]);
> +			pci_write_config_dword(pdev,
> +				state->dvsec + PCI_DVSEC_CXL_RANGE_BASE_LOW(i),
> +				state->range_base_lo[i]);
> +		}
> +
> +		pci_write_config_word(pdev,
> +			state->dvsec + PCI_DVSEC_CXL_LOCK,
> +			state->dvsec_lock);
> +	}
> +}
> +
> +void pci_allocate_cxl_save_buffer(struct pci_dev *dev)
> +{
> +	if (!pcie_is_cxl(dev))
> +		return;
> +
> +	if (pci_add_virtual_ext_cap_save_buffer(dev,
> +			PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL,
> +			sizeof(struct cxl_pci_state)))
> +		pci_err(dev, "unable to allocate CXL save buffer\n");
> +}
> +
> +void pci_save_cxl_state(struct pci_dev *pdev)
> +{
> +	struct pci_cap_saved_state *save_state;
> +	struct cxl_pci_state *state;
> +
> +	save_state = pci_find_saved_ext_cap(pdev,
> +					    PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL);
> +	if (!save_state)
> +		return;
> +
> +	state = (struct cxl_pci_state *)save_state->cap.data;
> +	state->dvsec_valid = false;
> +
> +	cxl_save_dvsec(pdev, state);
> +}
> +
> +void pci_restore_cxl_state(struct pci_dev *pdev)
> +{
> +	struct pci_cap_saved_state *save_state;
> +	struct cxl_pci_state *state;
> +
> +	save_state = pci_find_saved_ext_cap(pdev,
> +					    PCI_EXT_CAP_ID_CXL_DVSEC_VIRTUAL);
> +	if (!save_state)
> +		return;
> +
> +	state = (struct cxl_pci_state *)save_state->cap.data;
> +	if (!state->dvsec_valid)
> +		return;
> +
> +	cxl_restore_dvsec(pdev, state);
> +}
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index dc8181f13864..497720c64d6d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1759,6 +1759,7 @@ int pci_save_state(struct pci_dev *dev)
>  	pci_save_aer_state(dev);
>  	pci_save_ptm_state(dev);
>  	pci_save_tph_state(dev);
> +	pci_save_cxl_state(dev);
>  	return pci_save_vc_state(dev);
>  }
>  EXPORT_SYMBOL(pci_save_state);
> @@ -1841,6 +1842,7 @@ void pci_restore_state(struct pci_dev *dev)
>  	pci_restore_aer_state(dev);
> 
>  	pci_restore_config_space(dev);
> +	pci_restore_cxl_state(dev);
> 
>  	pci_restore_pcix_state(dev);
>  	pci_restore_msi_state(dev);
> @@ -3489,6 +3491,7 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>  		pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> 
>  	pci_allocate_vc_save_buffers(dev);
> +	pci_allocate_cxl_save_buffer(dev);
>  }
> 
>  void pci_free_cap_save_buffers(struct pci_dev *dev)
> --
> 2.43.0
> 



* Re: [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h
  2026-03-06  8:00 ` [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h smadhavan
@ 2026-03-06 17:45   ` Alex Williamson
  2026-03-07  0:35     ` Srirangan Madhavan
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2026-03-06 17:45 UTC (permalink / raw)
  To: smadhavan
  Cc: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave, jeshuas, vsethi,
	skancherla, vaslot, sdonthineni, mhonap, vidyas, jan, mochs,
	dschumacher, linux-cxl, linux-pci, linux-kernel

On Fri, 6 Mar 2026 08:00:16 +0000
<smadhavan@nvidia.com> wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> Move CXL HDM decoder register defines, register map structs
> (cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map,
> cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(),
> enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs()
> declarations from internal CXL headers to include/cxl/pci.h.

Regarding cxl_find_regblock()/cxl_setup_regs()...
 
> This makes them accessible to code outside the CXL subsystem, in
> particular the PCI core CXL state save/restore support added in a
> subsequent patch.

Yet in 5/:

    The Register Locator DVSEC is parsed directly via PCI config space
    reads rather than calling cxl_find_regblock()/cxl_setup_regs(),
    since this code lives in the PCI core and must not depend on CXL
    module symbols.

So moving them to the public header is just unnecessary code churn.

Also, specifically regarding cxl_find_regblock(): 5/ implements a new
cxl_find_component_regblock() that is substantially similar to the
original.  Shouldn't we refactor the new PCI-implemented version to be
more generic rather than scope-reduced, and export it for use by CXL
code to avoid the duplication?  Thanks,

Alex
 
> No functional change.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  drivers/cxl/cxl.h    | 107 +----------------------------------
>  drivers/cxl/cxlpci.h |  10 ----
>  include/cxl/pci.h    | 129 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 130 insertions(+), 116 deletions(-)
>  create mode 100644 include/cxl/pci.h
> 
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 04c673e7cdb0..0c84695449d8 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -12,6 +12,7 @@
>  #include <linux/node.h>
>  #include <linux/io.h>
>  #include <linux/range.h>
> +#include <cxl/pci.h>
> 
>  extern const struct nvdimm_security_ops *cxl_security_ops;
> 
> @@ -23,63 +24,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
>   * (port-driver, region-driver, nvdimm object-drivers... etc).
>   */
> 
> -/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
> -#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
> -
> -/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
> -#define CXL_CM_OFFSET 0x1000
> -#define CXL_CM_CAP_HDR_OFFSET 0x0
> -#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
> -#define     CM_CAP_HDR_CAP_ID 1
> -#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
> -#define     CM_CAP_HDR_CAP_VERSION 1
> -#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
> -#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
> -#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
> -#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
> -
> -#define   CXL_CM_CAP_CAP_ID_RAS 0x2
> -#define   CXL_CM_CAP_CAP_ID_HDM 0x5
> -#define   CXL_CM_CAP_CAP_HDM_VERSION 1
> -
> -/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> -#define CXL_HDM_DECODER_CAP_OFFSET 0x0
> -#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> -#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
> -#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
> -#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> -#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
> -#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
> -#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> -#define   CXL_HDM_DECODER_ENABLE BIT(1)
> -#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> -#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
> -#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
> -#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
> -#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
> -#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
> -#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
> -#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
> -#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
> -#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> -#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> -#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
> -#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> -#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> -#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
> -#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
> -
> -/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
> -#define CXL_DECODER_MIN_GRANULARITY 256
> -#define CXL_DECODER_MAX_ENCODED_IG 6
> -
> -static inline int cxl_hdm_decoder_count(u32 cap_hdr)
> -{
> -	int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
> -
> -	return val ? val * 2 : 1;
> -}
> -
>  /* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
>  static inline int eig_to_granularity(u16 eig, unsigned int *granularity)
>  {
> @@ -246,51 +190,6 @@ struct cxl_regs {
>  	);
>  };
> 
> -struct cxl_reg_map {
> -	bool valid;
> -	int id;
> -	unsigned long offset;
> -	unsigned long size;
> -};
> -
> -struct cxl_component_reg_map {
> -	struct cxl_reg_map hdm_decoder;
> -	struct cxl_reg_map ras;
> -};
> -
> -struct cxl_device_reg_map {
> -	struct cxl_reg_map status;
> -	struct cxl_reg_map mbox;
> -	struct cxl_reg_map memdev;
> -};
> -
> -struct cxl_pmu_reg_map {
> -	struct cxl_reg_map pmu;
> -};
> -
> -/**
> - * struct cxl_register_map - DVSEC harvested register block mapping parameters
> - * @host: device for devm operations and logging
> - * @base: virtual base of the register-block-BAR + @block_offset
> - * @resource: physical resource base of the register block
> - * @max_size: maximum mapping size to perform register search
> - * @reg_type: see enum cxl_regloc_type
> - * @component_map: cxl_reg_map for component registers
> - * @device_map: cxl_reg_maps for device registers
> - * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
> - */
> -struct cxl_register_map {
> -	struct device *host;
> -	void __iomem *base;
> -	resource_size_t resource;
> -	resource_size_t max_size;
> -	u8 reg_type;
> -	union {
> -		struct cxl_component_reg_map component_map;
> -		struct cxl_device_reg_map device_map;
> -		struct cxl_pmu_reg_map pmu_map;
> -	};
> -};
> 
>  void cxl_probe_component_regs(struct device *dev, void __iomem *base,
>  			      struct cxl_component_reg_map *map);
> @@ -304,13 +203,9 @@ int cxl_map_device_regs(const struct cxl_register_map *map,
>  int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
> 
>  #define CXL_INSTANCES_COUNT -1
> -enum cxl_regloc_type;
>  int cxl_count_regblock(struct pci_dev *pdev, enum cxl_regloc_type type);
>  int cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_type type,
>  			       struct cxl_register_map *map, unsigned int index);
> -int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> -		      struct cxl_register_map *map);
> -int cxl_setup_regs(struct cxl_register_map *map);
>  struct cxl_dport;
>  resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
>  					   struct cxl_dport *dport);
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 0cf64218aa16..9e825c039dd9 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -13,16 +13,6 @@
>   */
>  #define CXL_PCI_DEFAULT_MAX_VECTORS 16
> 
> -/* Register Block Identifier (RBI) */
> -enum cxl_regloc_type {
> -	CXL_REGLOC_RBI_EMPTY = 0,
> -	CXL_REGLOC_RBI_COMPONENT,
> -	CXL_REGLOC_RBI_VIRT,
> -	CXL_REGLOC_RBI_MEMDEV,
> -	CXL_REGLOC_RBI_PMU,
> -	CXL_REGLOC_RBI_TYPES
> -};
> -
>  /*
>   * Table Access DOE, CDAT Read Entry Response
>   *
> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
> new file mode 100644
> index 000000000000..763a048262c7
> --- /dev/null
> +++ b/include/cxl/pci.h
> @@ -0,0 +1,129 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __CXL_CXL_PCI_H__
> +#define __CXL_CXL_PCI_H__
> +
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/io.h>
> +#include <linux/types.h>
> +
> +/* Register Block Identifier (RBI) */
> +enum cxl_regloc_type {
> +	CXL_REGLOC_RBI_EMPTY = 0,
> +	CXL_REGLOC_RBI_COMPONENT,
> +	CXL_REGLOC_RBI_VIRT,
> +	CXL_REGLOC_RBI_MEMDEV,
> +	CXL_REGLOC_RBI_PMU,
> +	CXL_REGLOC_RBI_TYPES
> +};
> +
> +/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
> +#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
> +
> +/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers */
> +#define CXL_CM_OFFSET 0x1000
> +#define CXL_CM_CAP_HDR_OFFSET 0x0
> +#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
> +#define     CM_CAP_HDR_CAP_ID 1
> +#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
> +#define     CM_CAP_HDR_CAP_VERSION 1
> +#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
> +#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
> +#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
> +#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
> +
> +#define   CXL_CM_CAP_CAP_ID_RAS 0x2
> +#define   CXL_CM_CAP_CAP_ID_HDM 0x5
> +#define   CXL_CM_CAP_CAP_HDM_VERSION 1
> +
> +/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> +#define CXL_HDM_DECODER_CAP_OFFSET 0x0
> +#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> +#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
> +#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
> +#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> +#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
> +#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
> +#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> +#define   CXL_HDM_DECODER_ENABLE BIT(1)
> +#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> +#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
> +#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
> +#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
> +#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
> +#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
> +#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
> +#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
> +#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
> +#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> +#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> +#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
> +#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> +#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> +#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
> +#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
> +
> +/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
> +#define CXL_DECODER_MIN_GRANULARITY 256
> +#define CXL_DECODER_MAX_ENCODED_IG 6
> +
> +static inline int cxl_hdm_decoder_count(u32 cap_hdr)
> +{
> +	int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
> +
> +	return val ? val * 2 : 1;
> +}
> +
> +struct cxl_reg_map {
> +	bool valid;
> +	int id;
> +	unsigned long offset;
> +	unsigned long size;
> +};
> +
> +struct cxl_component_reg_map {
> +	struct cxl_reg_map hdm_decoder;
> +	struct cxl_reg_map ras;
> +};
> +
> +struct cxl_device_reg_map {
> +	struct cxl_reg_map status;
> +	struct cxl_reg_map mbox;
> +	struct cxl_reg_map memdev;
> +};
> +
> +struct cxl_pmu_reg_map {
> +	struct cxl_reg_map pmu;
> +};
> +
> +/**
> + * struct cxl_register_map - DVSEC harvested register block mapping parameters
> + * @host: device for devm operations and logging
> + * @base: virtual base of the register-block-BAR + @block_offset
> + * @resource: physical resource base of the register block
> + * @max_size: maximum mapping size to perform register search
> + * @reg_type: see enum cxl_regloc_type
> + * @component_map: cxl_reg_map for component registers
> + * @device_map: cxl_reg_maps for device registers
> + * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
> + */
> +struct cxl_register_map {
> +	struct device *host;
> +	void __iomem *base;
> +	resource_size_t resource;
> +	resource_size_t max_size;
> +	u8 reg_type;
> +	union {
> +		struct cxl_component_reg_map component_map;
> +		struct cxl_device_reg_map device_map;
> +		struct cxl_pmu_reg_map pmu_map;
> +	};
> +};
> +
> +struct pci_dev;
> +
> +int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> +		      struct cxl_register_map *map);
> +int cxl_setup_regs(struct cxl_register_map *map);
> +
> +#endif /* __CXL_CXL_PCI_H__ */
> --
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions
  2026-03-06  8:00 ` [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions smadhavan
@ 2026-03-06 17:45   ` Alex Williamson
  2026-03-07  0:37     ` Srirangan Madhavan
  2026-03-10 21:44   ` Dan Williams
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2026-03-06 17:45 UTC (permalink / raw)
  To: smadhavan
  Cc: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
	vishal.l.verma, alison.schofield, dave, jeshuas, vsethi,
	skancherla, vaslot, sdonthineni, mhonap, vidyas, jan, mochs,
	dschumacher, linux-cxl, linux-pci, linux-kernel

On Fri, 6 Mar 2026 08:00:15 +0000
<smadhavan@nvidia.com> wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> PCI: Add CXL DVSEC control, lock, and range register definitions
> 
> Add register offset and field definitions for CXL DVSEC registers needed
> by CXL state save/restore across resets:
> 
>   - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
>   - CONFIG_LOCK bit in the LOCK register
>   - RWL (read-write-when-locked) field masks for CTRL and range base
>     registers.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  include/uapi/linux/pci_regs.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index ec1c54b5a310..6fdc20d7f5e6 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1353,14 +1353,20 @@
>  #define   PCI_DVSEC_CXL_HDM_COUNT			__GENMASK(5, 4)
>  #define  PCI_DVSEC_CXL_CTRL				0xC
>  #define   PCI_DVSEC_CXL_MEM_ENABLE			_BITUL(2)
> +#define   PCI_DVSEC_CXL_CTRL_RWL			0x5FED
> +#define  PCI_DVSEC_CXL_CTRL2				0x10
> +#define  PCI_DVSEC_CXL_LOCK				0x14
> +#define   PCI_DVSEC_CXL_LOCK_CONFIG			_BITUL(0)
>  #define  PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i)		(0x18 + (i * 0x10))
>  #define  PCI_DVSEC_CXL_RANGE_SIZE_LOW(i)		(0x1C + (i * 0x10))
>  #define   PCI_DVSEC_CXL_MEM_INFO_VALID			_BITUL(0)
>  #define   PCI_DVSEC_CXL_MEM_ACTIVE			_BITUL(1)
>  #define   PCI_DVSEC_CXL_MEM_SIZE_LOW			__GENMASK(31, 28)
>  #define  PCI_DVSEC_CXL_RANGE_BASE_HIGH(i)		(0x20 + (i * 0x10))
> +#define   PCI_DVSEC_CXL_RANGE_BASE_HI_RWL		0xFFFFFFFF
>  #define  PCI_DVSEC_CXL_RANGE_BASE_LOW(i)		(0x24 + (i * 0x10))
>  #define   PCI_DVSEC_CXL_MEM_BASE_LOW			__GENMASK(31, 28)
> +#define   PCI_DVSEC_CXL_RANGE_BASE_LO_RWL		0xF0000000
>  
>  #define CXL_DVSEC_RANGE_MAX				2
>  

These RWL defines really seem to be more kernel policy than spec
definitions.  Do they really belong in the uAPI header?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h
  2026-03-06 17:45   ` Alex Williamson
@ 2026-03-07  0:35     ` Srirangan Madhavan
  2026-03-10 16:13       ` Dave Jiang
  0 siblings, 1 reply; 25+ messages in thread
From: Srirangan Madhavan @ 2026-03-07  0:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: bhelgaas@google.com, dan.j.williams@intel.com,
	dave.jiang@intel.com, jonathan.cameron@huawei.com,
	ira.weiny@intel.com, vishal.l.verma@intel.com,
	alison.schofield@intel.com, dave@stgolabs.net, Jeshua Smith,
	Vikram Sethi, Sai Yashwanth Reddy Kancherla, Vishal Aslot,
	Shanker Donthineni, Manish Honap, Vidya Sagar, Jiandi An,
	Matt Ochs, Derek Schumacher, linux-cxl@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org

> So moving them to the public header is just unnecessary code churn.
Apologies for that oversight. I can remove the unnecessary moves from this patch.

> Also, specifically regarding cxl_find_regblock(), where 5/ implements a
> new cxl_find_component_regblock().  The latter looks significantly
> similar to the original.  Shouldn't we refactor the new PCI implemented
> version to be more generic rather than scope reduced, and export it for
> use by CXL code to avoid the duplication?

Regarding the duplication between cxl_find_component_regblock()
and the CXL subsystem's cxl_find_regblock():
I agree the duplication isn't ideal. I looked into generalizing the PCI version
(cxl_find_component_regblock) and exporting it for CXL to use, but exporting only
the find step from PCI while the rest of the register discovery pipeline stays in
CXL felt like an inconsistent split across the two subsystems. Fixing that properly
would mean moving a significant portion of regs.c into the PCI core, which also
doesn't seem right.
Given that, would it be acceptable to keep the current scope-reduced version in the
PCI core? Alternatively, if you prefer, I could refactor cxl_find_component_regblock()
to be generic and export it from PCI.

Thank you.

Regards,
Srirangan.

________________________________________
From: Alex Williamson <alwilliamson@nvidia.com>
Sent: Friday, March 6, 2026 9:45 AM
To: Srirangan Madhavan
Cc: bhelgaas@google.com; dan.j.williams@intel.com; dave.jiang@intel.com; jonathan.cameron@huawei.com; ira.weiny@intel.com; vishal.l.verma@intel.com; alison.schofield@intel.com; dave@stgolabs.net; Jeshua Smith; Vikram Sethi; Sai Yashwanth Reddy Kancherla; Vishal Aslot; Shanker Donthineni; Manish Honap; Vidya Sagar; Jiandi An; Matt Ochs; Derek Schumacher; linux-cxl@vger.kernel.org; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h

On Fri, 6 Mar 2026 08:00:16 +0000
<smadhavan@nvidia.com> wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Move CXL HDM decoder register defines, register map structs
> (cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map,
> cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(),
> enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs()
> declarations from internal CXL headers to include/cxl/pci.h.

Regarding cxl_find_regblock()/cxl_setup_regs()..

> This makes them accessible to code outside the CXL subsystem, in
> particular the PCI core CXL state save/restore support added in a
> subsequent patch.

Yet in 5/:

    The Register Locator DVSEC is parsed directly via PCI config space
    reads rather than calling cxl_find_regblock()/cxl_setup_regs(),
    since this code lives in the PCI core and must not depend on CXL
    module symbols.

So moving them to the public header is just unnecessary code churn.

Also, specifically regarding cxl_find_regblock(), where 5/ implements a
new cxl_find_component_regblock().  The latter looks significantly
similar to the original.  Shouldn't we refactor the new PCI implemented
version to be more generic rather than scope reduced, and export it for
use by CXL code to avoid the duplication?  Thanks,

Alex

> No functional change.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  drivers/cxl/cxl.h    | 107 +----------------------------------
>  drivers/cxl/cxlpci.h |  10 ----
>  include/cxl/pci.h    | 129 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 130 insertions(+), 116 deletions(-)
>  create mode 100644 include/cxl/pci.h
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 04c673e7cdb0..0c84695449d8 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -12,6 +12,7 @@
>  #include <linux/node.h>
>  #include <linux/io.h>
>  #include <linux/range.h>
> +#include <cxl/pci.h>
>
>  extern const struct nvdimm_security_ops *cxl_security_ops;
>
> @@ -23,63 +24,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
>   * (port-driver, region-driver, nvdimm object-drivers... etc).
>   */
>
> -/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
> -#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
> -
> -/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
> -#define CXL_CM_OFFSET 0x1000
> -#define CXL_CM_CAP_HDR_OFFSET 0x0
> -#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
> -#define     CM_CAP_HDR_CAP_ID 1
> -#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
> -#define     CM_CAP_HDR_CAP_VERSION 1
> -#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
> -#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
> -#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
> -#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
> -
> -#define   CXL_CM_CAP_CAP_ID_RAS 0x2
> -#define   CXL_CM_CAP_CAP_ID_HDM 0x5
> -#define   CXL_CM_CAP_CAP_HDM_VERSION 1
> -
> -/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> -#define CXL_HDM_DECODER_CAP_OFFSET 0x0
> -#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> -#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
> -#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
> -#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> -#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
> -#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
> -#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> -#define   CXL_HDM_DECODER_ENABLE BIT(1)
> -#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> -#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
> -#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
> -#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
> -#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
> -#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
> -#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
> -#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
> -#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
> -#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> -#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> -#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
> -#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> -#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> -#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
> -#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
> -
> -/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
> -#define CXL_DECODER_MIN_GRANULARITY 256
> -#define CXL_DECODER_MAX_ENCODED_IG 6
> -
> -static inline int cxl_hdm_decoder_count(u32 cap_hdr)
> -{
> -     int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
> -
> -     return val ? val * 2 : 1;
> -}
> -
>  /* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
>  static inline int eig_to_granularity(u16 eig, unsigned int *granularity)
>  {
> @@ -246,51 +190,6 @@ struct cxl_regs {
>       );
>  };
>
> -struct cxl_reg_map {
> -     bool valid;
> -     int id;
> -     unsigned long offset;
> -     unsigned long size;
> -};
> -
> -struct cxl_component_reg_map {
> -     struct cxl_reg_map hdm_decoder;
> -     struct cxl_reg_map ras;
> -};
> -
> -struct cxl_device_reg_map {
> -     struct cxl_reg_map status;
> -     struct cxl_reg_map mbox;
> -     struct cxl_reg_map memdev;
> -};
> -
> -struct cxl_pmu_reg_map {
> -     struct cxl_reg_map pmu;
> -};
> -
> -/**
> - * struct cxl_register_map - DVSEC harvested register block mapping parameters
> - * @host: device for devm operations and logging
> - * @base: virtual base of the register-block-BAR + @block_offset
> - * @resource: physical resource base of the register block
> - * @max_size: maximum mapping size to perform register search
> - * @reg_type: see enum cxl_regloc_type
> - * @component_map: cxl_reg_map for component registers
> - * @device_map: cxl_reg_maps for device registers
> - * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
> - */
> -struct cxl_register_map {
> -     struct device *host;
> -     void __iomem *base;
> -     resource_size_t resource;
> -     resource_size_t max_size;
> -     u8 reg_type;
> -     union {
> -             struct cxl_component_reg_map component_map;
> -             struct cxl_device_reg_map device_map;
> -             struct cxl_pmu_reg_map pmu_map;
> -     };
> -};
>
>  void cxl_probe_component_regs(struct device *dev, void __iomem *base,
>                             struct cxl_component_reg_map *map);
> @@ -304,13 +203,9 @@ int cxl_map_device_regs(const struct cxl_register_map *map,
>  int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
>
>  #define CXL_INSTANCES_COUNT -1
> -enum cxl_regloc_type;
>  int cxl_count_regblock(struct pci_dev *pdev, enum cxl_regloc_type type);
>  int cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_type type,
>                              struct cxl_register_map *map, unsigned int index);
> -int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> -                   struct cxl_register_map *map);
> -int cxl_setup_regs(struct cxl_register_map *map);
>  struct cxl_dport;
>  resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
>                                          struct cxl_dport *dport);
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 0cf64218aa16..9e825c039dd9 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -13,16 +13,6 @@
>   */
>  #define CXL_PCI_DEFAULT_MAX_VECTORS 16
>
> -/* Register Block Identifier (RBI) */
> -enum cxl_regloc_type {
> -     CXL_REGLOC_RBI_EMPTY = 0,
> -     CXL_REGLOC_RBI_COMPONENT,
> -     CXL_REGLOC_RBI_VIRT,
> -     CXL_REGLOC_RBI_MEMDEV,
> -     CXL_REGLOC_RBI_PMU,
> -     CXL_REGLOC_RBI_TYPES
> -};
> -
>  /*
>   * Table Access DOE, CDAT Read Entry Response
>   *
> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
> new file mode 100644
> index 000000000000..763a048262c7
> --- /dev/null
> +++ b/include/cxl/pci.h
> @@ -0,0 +1,129 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef __CXL_CXL_PCI_H__
> +#define __CXL_CXL_PCI_H__
> +
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/io.h>
> +#include <linux/types.h>
> +
> +/* Register Block Identifier (RBI) */
> +enum cxl_regloc_type {
> +     CXL_REGLOC_RBI_EMPTY = 0,
> +     CXL_REGLOC_RBI_COMPONENT,
> +     CXL_REGLOC_RBI_VIRT,
> +     CXL_REGLOC_RBI_MEMDEV,
> +     CXL_REGLOC_RBI_PMU,
> +     CXL_REGLOC_RBI_TYPES
> +};
> +
> +/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
> +#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
> +
> +/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers */
> +#define CXL_CM_OFFSET 0x1000
> +#define CXL_CM_CAP_HDR_OFFSET 0x0
> +#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
> +#define     CM_CAP_HDR_CAP_ID 1
> +#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
> +#define     CM_CAP_HDR_CAP_VERSION 1
> +#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
> +#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
> +#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
> +#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
> +
> +#define   CXL_CM_CAP_CAP_ID_RAS 0x2
> +#define   CXL_CM_CAP_CAP_ID_HDM 0x5
> +#define   CXL_CM_CAP_CAP_HDM_VERSION 1
> +
> +/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> +#define CXL_HDM_DECODER_CAP_OFFSET 0x0
> +#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> +#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
> +#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
> +#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> +#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
> +#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
> +#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> +#define   CXL_HDM_DECODER_ENABLE BIT(1)
> +#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> +#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
> +#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
> +#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
> +#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
> +#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
> +#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
> +#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
> +#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
> +#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> +#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> +#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
> +#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> +#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> +#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
> +#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
> +
> +/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
> +#define CXL_DECODER_MIN_GRANULARITY 256
> +#define CXL_DECODER_MAX_ENCODED_IG 6
> +
> +static inline int cxl_hdm_decoder_count(u32 cap_hdr)
> +{
> +     int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
> +
> +     return val ? val * 2 : 1;
> +}
> +
> +struct cxl_reg_map {
> +     bool valid;
> +     int id;
> +     unsigned long offset;
> +     unsigned long size;
> +};
> +
> +struct cxl_component_reg_map {
> +     struct cxl_reg_map hdm_decoder;
> +     struct cxl_reg_map ras;
> +};
> +
> +struct cxl_device_reg_map {
> +     struct cxl_reg_map status;
> +     struct cxl_reg_map mbox;
> +     struct cxl_reg_map memdev;
> +};
> +
> +struct cxl_pmu_reg_map {
> +     struct cxl_reg_map pmu;
> +};
> +
> +/**
> + * struct cxl_register_map - DVSEC harvested register block mapping parameters
> + * @host: device for devm operations and logging
> + * @base: virtual base of the register-block-BAR + @block_offset
> + * @resource: physical resource base of the register block
> + * @max_size: maximum mapping size to perform register search
> + * @reg_type: see enum cxl_regloc_type
> + * @component_map: cxl_reg_map for component registers
> + * @device_map: cxl_reg_maps for device registers
> + * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
> + */
> +struct cxl_register_map {
> +     struct device *host;
> +     void __iomem *base;
> +     resource_size_t resource;
> +     resource_size_t max_size;
> +     u8 reg_type;
> +     union {
> +             struct cxl_component_reg_map component_map;
> +             struct cxl_device_reg_map device_map;
> +             struct cxl_pmu_reg_map pmu_map;
> +     };
> +};
> +
> +struct pci_dev;
> +
> +int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
> +                   struct cxl_register_map *map);
> +int cxl_setup_regs(struct cxl_register_map *map);
> +
> +#endif /* __CXL_CXL_PCI_H__ */
> --
> 2.43.0
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions
  2026-03-06 17:45   ` Alex Williamson
@ 2026-03-07  0:37     ` Srirangan Madhavan
  0 siblings, 0 replies; 25+ messages in thread
From: Srirangan Madhavan @ 2026-03-07  0:37 UTC (permalink / raw)
  To: Alex Williamson
  Cc: bhelgaas@google.com, dan.j.williams@intel.com,
	dave.jiang@intel.com, jonathan.cameron@huawei.com,
	ira.weiny@intel.com, vishal.l.verma@intel.com,
	alison.schofield@intel.com, dave@stgolabs.net, Jeshua Smith,
	Vikram Sethi, Sai Yashwanth Reddy Kancherla, Vishal Aslot,
	Shanker Donthineni, Manish Honap, Vidya Sagar, Jiandi An,
	Matt Ochs, Derek Schumacher, linux-cxl@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org

> These RWL defines really seem to be more kernel policy than spec
> definitions.  Do they really belong in the uAPI header?  Thanks,

Ack. I can move the RWL defines locally to where they are used.

________________________________________
From: Alex Williamson <alwilliamson@nvidia.com>
Sent: Friday, March 6, 2026 9:45 AM
To: Srirangan Madhavan
Cc: bhelgaas@google.com; dan.j.williams@intel.com; dave.jiang@intel.com; jonathan.cameron@huawei.com; ira.weiny@intel.com; vishal.l.verma@intel.com; alison.schofield@intel.com; dave@stgolabs.net; Jeshua Smith; Vikram Sethi; Sai Yashwanth Reddy Kancherla; Vishal Aslot; Shanker Donthineni; Manish Honap; Vidya Sagar; Jiandi An; Matt Ochs; Derek Schumacher; linux-cxl@vger.kernel.org; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions

On Fri, 6 Mar 2026 08:00:15 +0000
<smadhavan@nvidia.com> wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> PCI: Add CXL DVSEC control, lock, and range register definitions
>
> Add register offset and field definitions for CXL DVSEC registers needed
> by CXL state save/restore across resets:
>
>   - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
>   - CONFIG_LOCK bit in the LOCK register
>   - RWL (read-write-when-locked) field masks for CTRL and range base
>     registers.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  include/uapi/linux/pci_regs.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index ec1c54b5a310..6fdc20d7f5e6 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1353,14 +1353,20 @@
>  #define   PCI_DVSEC_CXL_HDM_COUNT                    __GENMASK(5, 4)
>  #define  PCI_DVSEC_CXL_CTRL                          0xC
>  #define   PCI_DVSEC_CXL_MEM_ENABLE                   _BITUL(2)
> +#define   PCI_DVSEC_CXL_CTRL_RWL                     0x5FED
> +#define  PCI_DVSEC_CXL_CTRL2                         0x10
> +#define  PCI_DVSEC_CXL_LOCK                          0x14
> +#define   PCI_DVSEC_CXL_LOCK_CONFIG                  _BITUL(0)
>  #define  PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i)            (0x18 + (i * 0x10))
>  #define  PCI_DVSEC_CXL_RANGE_SIZE_LOW(i)             (0x1C + (i * 0x10))
>  #define   PCI_DVSEC_CXL_MEM_INFO_VALID                       _BITUL(0)
>  #define   PCI_DVSEC_CXL_MEM_ACTIVE                   _BITUL(1)
>  #define   PCI_DVSEC_CXL_MEM_SIZE_LOW                 __GENMASK(31, 28)
>  #define  PCI_DVSEC_CXL_RANGE_BASE_HIGH(i)            (0x20 + (i * 0x10))
> +#define   PCI_DVSEC_CXL_RANGE_BASE_HI_RWL            0xFFFFFFFF
>  #define  PCI_DVSEC_CXL_RANGE_BASE_LOW(i)             (0x24 + (i * 0x10))
>  #define   PCI_DVSEC_CXL_MEM_BASE_LOW                 __GENMASK(31, 28)
> +#define   PCI_DVSEC_CXL_RANGE_BASE_LO_RWL            0xF0000000
>
>  #define CXL_DVSEC_RANGE_MAX                          2
>

These RWL defines really seem to be more kernel policy than spec
definitions.  Do they really belong in the uAPI header?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h
  2026-03-07  0:35     ` Srirangan Madhavan
@ 2026-03-10 16:13       ` Dave Jiang
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Jiang @ 2026-03-10 16:13 UTC (permalink / raw)
  To: Srirangan Madhavan, Alex Williamson
  Cc: bhelgaas@google.com, dan.j.williams@intel.com,
	jonathan.cameron@huawei.com, ira.weiny@intel.com,
	vishal.l.verma@intel.com, alison.schofield@intel.com,
	dave@stgolabs.net, Jeshua Smith, Vikram Sethi,
	Sai Yashwanth Reddy Kancherla, Vishal Aslot, Shanker Donthineni,
	Manish Honap, Vidya Sagar, Jiandi An, Matt Ochs, Derek Schumacher,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org



On 3/6/26 5:35 PM, Srirangan Madhavan wrote:
>> So moving them to the public header is just unnecessary code churn.
> Apologies for that oversight. I can remove the unnecessary moves from this patch.
> 
>> Also, specifically regarding cxl_find_regblock(), where 5/ implements a
>> new cxl_find_component_regblock().  The latter looks significantly
>> similar to the original.  Shouldn't we refactor the new PCI implemented
>> version to be more generic rather than scope reduced, and export it for
>> use by CXL code to avoid the duplication?
> 
> Regarding the duplication between cxl_find_component_regblock()
> and the CXL subsystem's cxl_find_regblock():
> I understand the duplication isn't ideal. I looked into generalizing the PCI version
> (cxl_find_component_regblock) and exporting it for CXL to use, but exporting just
> the find step from PCI while the rest of the register discovery pipeline stays in CXL
> felt like an inconsistent split across the two subsystems. It seems like addressing
> that would mean refactoring a significant portion of regs.c into the PCI subsystem,
> which also doesn't appear correct.
> Given that, would it be acceptable to keep the current scope-reduced version in the
> PCI?
> Please let me know if it would be acceptable to just refactor the cxl_find_component_regblock
> and export it from pci.

Hi Srirangan. The PCI base spec r6.2 introduced MMIO register blocks (MRB), which
are patterned after CXL register blocks. A while back I hacked up some code [1] to
unify the CXL register block discovery code with the PCI usage. It never went
upstream since I was waiting on a PCI device that utilizes the new scheme, but
maybe it can serve as inspiration for your use case. Perhaps we should start
refactoring to make this code common, given there will eventually be PCI users?

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/djiang/linux.git/log/?h=pci-mbox
 
> 
> Thank you.
> 
> Regards,
> Srirangan.
> 
> ________________________________________
> From: Alex Williamson <alwilliamson@nvidia.com>
> Sent: Friday, March 6, 2026 9:45 AM
> To: Srirangan Madhavan
> Cc: bhelgaas@google.com; dan.j.williams@intel.com; dave.jiang@intel.com; jonathan.cameron@huawei.com; ira.weiny@intel.com; vishal.l.verma@intel.com; alison.schofield@intel.com; dave@stgolabs.net; Jeshua Smith; Vikram Sethi; Sai Yashwanth Reddy Kancherla; Vishal Aslot; Shanker Donthineni; Manish Honap; Vidya Sagar; Jiandi An; Matt Ochs; Derek Schumacher; linux-cxl@vger.kernel.org; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h
> 
> On Fri, 6 Mar 2026 08:00:16 +0000
> <smadhavan@nvidia.com> wrote:
> 
>> From: Srirangan Madhavan <smadhavan@nvidia.com>
>>
>> Move CXL HDM decoder register defines, register map structs
>> (cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map,
>> cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(),
>> enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs()
>> declarations from internal CXL headers to include/cxl/pci.h.
> 
> Regarding cxl_find_regblock()/cxl_setup_regs()..
> 
>> This makes them accessible to code outside the CXL subsystem, in
>> particular the PCI core CXL state save/restore support added in a
>> subsequent patch.
> 
> Yet in 5/:
> 
>     The Register Locator DVSEC is parsed directly via PCI config space
>     reads rather than calling cxl_find_regblock()/cxl_setup_regs(),
>     since this code lives in the PCI core and must not depend on CXL
>     module symbols.
> 
> So moving them to the public header is just unnecessary code churn.
> 
> Also, specifically regarding cxl_find_regblock(), where 5/ implements a
> new cxl_find_component_regblock().  The latter looks significantly
> similar to the original.  Shouldn't we refactor the new PCI implemented
> version to be more generic rather than scope reduced, and export it for
> use by CXL code to avoid the duplication?  Thanks,
> 
> Alex
> 
>> No functional change.
>>
>> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
>> ---
>>  drivers/cxl/cxl.h    | 107 +----------------------------------
>>  drivers/cxl/cxlpci.h |  10 ----
>>  include/cxl/pci.h    | 129 +++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 130 insertions(+), 116 deletions(-)
>>  create mode 100644 include/cxl/pci.h
>>
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 04c673e7cdb0..0c84695449d8 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -12,6 +12,7 @@
>>  #include <linux/node.h>
>>  #include <linux/io.h>
>>  #include <linux/range.h>
>> +#include <cxl/pci.h>
>>
>>  extern const struct nvdimm_security_ops *cxl_security_ops;
>>
>> @@ -23,63 +24,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
>>   * (port-driver, region-driver, nvdimm object-drivers... etc).
>>   */
>>
>> -/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
>> -#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
>> -
>> -/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
>> -#define CXL_CM_OFFSET 0x1000
>> -#define CXL_CM_CAP_HDR_OFFSET 0x0
>> -#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
>> -#define     CM_CAP_HDR_CAP_ID 1
>> -#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
>> -#define     CM_CAP_HDR_CAP_VERSION 1
>> -#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
>> -#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
>> -#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
>> -#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
>> -
>> -#define   CXL_CM_CAP_CAP_ID_RAS 0x2
>> -#define   CXL_CM_CAP_CAP_ID_HDM 0x5
>> -#define   CXL_CM_CAP_CAP_HDM_VERSION 1
>> -
>> -/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
>> -#define CXL_HDM_DECODER_CAP_OFFSET 0x0
>> -#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
>> -#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
>> -#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
>> -#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
>> -#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
>> -#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
>> -#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
>> -#define   CXL_HDM_DECODER_ENABLE BIT(1)
>> -#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
>> -#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
>> -#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
>> -#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
>> -#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
>> -#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
>> -#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
>> -#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
>> -#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
>> -#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
>> -#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
>> -#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
>> -#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
>> -#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
>> -#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
>> -#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
>> -
>> -/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
>> -#define CXL_DECODER_MIN_GRANULARITY 256
>> -#define CXL_DECODER_MAX_ENCODED_IG 6
>> -
>> -static inline int cxl_hdm_decoder_count(u32 cap_hdr)
>> -{
>> -     int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
>> -
>> -     return val ? val * 2 : 1;
>> -}
>> -
>>  /* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
>>  static inline int eig_to_granularity(u16 eig, unsigned int *granularity)
>>  {
>> @@ -246,51 +190,6 @@ struct cxl_regs {
>>       );
>>  };
>>
>> -struct cxl_reg_map {
>> -     bool valid;
>> -     int id;
>> -     unsigned long offset;
>> -     unsigned long size;
>> -};
>> -
>> -struct cxl_component_reg_map {
>> -     struct cxl_reg_map hdm_decoder;
>> -     struct cxl_reg_map ras;
>> -};
>> -
>> -struct cxl_device_reg_map {
>> -     struct cxl_reg_map status;
>> -     struct cxl_reg_map mbox;
>> -     struct cxl_reg_map memdev;
>> -};
>> -
>> -struct cxl_pmu_reg_map {
>> -     struct cxl_reg_map pmu;
>> -};
>> -
>> -/**
>> - * struct cxl_register_map - DVSEC harvested register block mapping parameters
>> - * @host: device for devm operations and logging
>> - * @base: virtual base of the register-block-BAR + @block_offset
>> - * @resource: physical resource base of the register block
>> - * @max_size: maximum mapping size to perform register search
>> - * @reg_type: see enum cxl_regloc_type
>> - * @component_map: cxl_reg_map for component registers
>> - * @device_map: cxl_reg_maps for device registers
>> - * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
>> - */
>> -struct cxl_register_map {
>> -     struct device *host;
>> -     void __iomem *base;
>> -     resource_size_t resource;
>> -     resource_size_t max_size;
>> -     u8 reg_type;
>> -     union {
>> -             struct cxl_component_reg_map component_map;
>> -             struct cxl_device_reg_map device_map;
>> -             struct cxl_pmu_reg_map pmu_map;
>> -     };
>> -};
>>
>>  void cxl_probe_component_regs(struct device *dev, void __iomem *base,
>>                             struct cxl_component_reg_map *map);
>> @@ -304,13 +203,9 @@ int cxl_map_device_regs(const struct cxl_register_map *map,
>>  int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
>>
>>  #define CXL_INSTANCES_COUNT -1
>> -enum cxl_regloc_type;
>>  int cxl_count_regblock(struct pci_dev *pdev, enum cxl_regloc_type type);
>>  int cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_type type,
>>                              struct cxl_register_map *map, unsigned int index);
>> -int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
>> -                   struct cxl_register_map *map);
>> -int cxl_setup_regs(struct cxl_register_map *map);
>>  struct cxl_dport;
>>  resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
>>                                          struct cxl_dport *dport);
>> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
>> index 0cf64218aa16..9e825c039dd9 100644
>> --- a/drivers/cxl/cxlpci.h
>> +++ b/drivers/cxl/cxlpci.h
>> @@ -13,16 +13,6 @@
>>   */
>>  #define CXL_PCI_DEFAULT_MAX_VECTORS 16
>>
>> -/* Register Block Identifier (RBI) */
>> -enum cxl_regloc_type {
>> -     CXL_REGLOC_RBI_EMPTY = 0,
>> -     CXL_REGLOC_RBI_COMPONENT,
>> -     CXL_REGLOC_RBI_VIRT,
>> -     CXL_REGLOC_RBI_MEMDEV,
>> -     CXL_REGLOC_RBI_PMU,
>> -     CXL_REGLOC_RBI_TYPES
>> -};
>> -
>>  /*
>>   * Table Access DOE, CDAT Read Entry Response
>>   *
>> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
>> new file mode 100644
>> index 000000000000..763a048262c7
>> --- /dev/null
>> +++ b/include/cxl/pci.h
>> @@ -0,0 +1,129 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef __CXL_CXL_PCI_H__
>> +#define __CXL_CXL_PCI_H__
>> +
>> +#include <linux/bitfield.h>
>> +#include <linux/bitops.h>
>> +#include <linux/io.h>
>> +#include <linux/types.h>
>> +
>> +/* Register Block Identifier (RBI) */
>> +enum cxl_regloc_type {
>> +     CXL_REGLOC_RBI_EMPTY = 0,
>> +     CXL_REGLOC_RBI_COMPONENT,
>> +     CXL_REGLOC_RBI_VIRT,
>> +     CXL_REGLOC_RBI_MEMDEV,
>> +     CXL_REGLOC_RBI_PMU,
>> +     CXL_REGLOC_RBI_TYPES
>> +};
>> +
>> +/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
>> +#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
>> +
>> +/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers */
>> +#define CXL_CM_OFFSET 0x1000
>> +#define CXL_CM_CAP_HDR_OFFSET 0x0
>> +#define   CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
>> +#define     CM_CAP_HDR_CAP_ID 1
>> +#define   CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
>> +#define     CM_CAP_HDR_CAP_VERSION 1
>> +#define   CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
>> +#define     CM_CAP_HDR_CACHE_MEM_VERSION 1
>> +#define   CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
>> +#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
>> +
>> +#define   CXL_CM_CAP_CAP_ID_RAS 0x2
>> +#define   CXL_CM_CAP_CAP_ID_HDM 0x5
>> +#define   CXL_CM_CAP_CAP_HDM_VERSION 1
>> +
>> +/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
>> +#define CXL_HDM_DECODER_CAP_OFFSET 0x0
>> +#define   CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
>> +#define   CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
>> +#define   CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
>> +#define   CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
>> +#define   CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
>> +#define   CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
>> +#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
>> +#define   CXL_HDM_DECODER_ENABLE BIT(1)
>> +#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
>> +#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
>> +#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
>> +#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
>> +#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
>> +#define   CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
>> +#define   CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
>> +#define   CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
>> +#define   CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
>> +#define   CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
>> +#define   CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
>> +#define   CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
>> +#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
>> +#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
>> +#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
>> +#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
>> +
>> +/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
>> +#define CXL_DECODER_MIN_GRANULARITY 256
>> +#define CXL_DECODER_MAX_ENCODED_IG 6
>> +
>> +static inline int cxl_hdm_decoder_count(u32 cap_hdr)
>> +{
>> +     int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
>> +
>> +     return val ? val * 2 : 1;
>> +}
>> +
>> +struct cxl_reg_map {
>> +     bool valid;
>> +     int id;
>> +     unsigned long offset;
>> +     unsigned long size;
>> +};
>> +
>> +struct cxl_component_reg_map {
>> +     struct cxl_reg_map hdm_decoder;
>> +     struct cxl_reg_map ras;
>> +};
>> +
>> +struct cxl_device_reg_map {
>> +     struct cxl_reg_map status;
>> +     struct cxl_reg_map mbox;
>> +     struct cxl_reg_map memdev;
>> +};
>> +
>> +struct cxl_pmu_reg_map {
>> +     struct cxl_reg_map pmu;
>> +};
>> +
>> +/**
>> + * struct cxl_register_map - DVSEC harvested register block mapping parameters
>> + * @host: device for devm operations and logging
>> + * @base: virtual base of the register-block-BAR + @block_offset
>> + * @resource: physical resource base of the register block
>> + * @max_size: maximum mapping size to perform register search
>> + * @reg_type: see enum cxl_regloc_type
>> + * @component_map: cxl_reg_map for component registers
>> + * @device_map: cxl_reg_maps for device registers
>> + * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
>> + */
>> +struct cxl_register_map {
>> +     struct device *host;
>> +     void __iomem *base;
>> +     resource_size_t resource;
>> +     resource_size_t max_size;
>> +     u8 reg_type;
>> +     union {
>> +             struct cxl_component_reg_map component_map;
>> +             struct cxl_device_reg_map device_map;
>> +             struct cxl_pmu_reg_map pmu_map;
>> +     };
>> +};
>> +
>> +struct pci_dev;
>> +
>> +int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
>> +                   struct cxl_register_map *map);
>> +int cxl_setup_regs(struct cxl_register_map *map);
>> +
>> +#endif /* __CXL_CXL_PCI_H__ */
>> --
>> 2.43.0
>>
> 
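For reference, the decoder-count encoding carried over into the new include/cxl/pci.h can be exercised outside the kernel. The following is an illustrative userspace re-implementation only: FIELD_GET()/GENMASK() from linux/bitfield.h are kernel-only, so an explicit mask is used instead.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of cxl_hdm_decoder_count() from the quoted header:
 * the 4-bit Decoder Count field encodes 0 as "one decoder" and any
 * nonzero value n as "2 * n decoders". */
#define CXL_HDM_DECODER_COUNT_MASK 0xfu	/* GENMASK(3, 0) */

static int cxl_hdm_decoder_count(uint32_t cap_hdr)
{
	int val = (int)(cap_hdr & CXL_HDM_DECODER_COUNT_MASK);

	return val ? val * 2 : 1;
}
```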


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
                   ` (4 preceding siblings ...)
  2026-03-06  8:00 ` [PATCH 5/5] PCI: Add HDM decoder state save/restore smadhavan
@ 2026-03-10 21:39 ` Dan Williams
  2026-03-10 22:46   ` Alex Williamson
  2026-03-12 12:34 ` Jonathan Cameron
  6 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2026-03-10 21:39 UTC (permalink / raw)
  To: smadhavan, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
	ira.weiny, vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

smadhavan@ wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> CXL devices could lose their DVSEC configuration and HDM decoder programming
> after multiple reset methods (whenever link disable/enable). This means a
> device that was fully configured — with DVSEC control/range registers set
> and HDM decoders committed — loses that state after reset. In cases where
> these are programmed by firmware, downstream drivers are unable to re-initialize
> the device because CXL memory ranges are no longer mapped.
> 
> This series adds CXL state save/restore logic to the PCI core so
> that DVSEC and HDM decoder state is preserved across any PCI reset
> path that calls pci_save_state() / pci_restore_state(), for a CXL capable device.

The PCI core has no business learning CXL core internals.

For example, I have been pushing the CXL port protocol error handling
series to minimally involve the PCI core. Just enough enabling to
forward AER events, but otherwise PCI core stays blissfully unaware of
CXL details. The alternative adds a maintenance burden to the PCI core
that I expect is best avoided.

> HDM decoder defines and the cxl_register_map infrastructure are moved from
> internal CXL driver headers to a new public include/cxl/pci.h, allowing
> drivers/pci/cxl.c to use them.
> This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> changes on top of his.

I think we need to evaluate where things stand after both the CXL port
error handling series and the CXL accelerator base series have landed.
Not that they are functionally dependent on each other, but there is
a review backlog that needs to clear, and those establish the precedent
about where CXL functionality lands between PCI core, CXL core, and CXL
enlightened drivers.

> These patches were previously part of the CXL reset series and have been
> split out [2] to allow independent review and merging. Review feedback on
> the save/restore portions from v4 has been addressed.
> 
> Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> before reset and restored after, with decoder commit confirmed via the
> COMMITTED status bit. Type-3 device testing is in progress.

It is a memory hot plug event. An accelerator driver can coordinate
quiescing CXL.mem over events like reset; a memory expander driver
cannot. The PCI core cannot manage memory hot plug. It is the wrong
place to enable this specific CXL reset because the PCI core has no
idea about the suitability of a reset at any given point in time.

Now, the secondary bus reset enabling for the CXL did end up with
changes to the PCI core:

53c49b6e6dd2 PCI/CXL: Add 'cxl_bus' reset method for devices below CXL Ports

...but only to disambiguate that hardware may be blocking secondary bus
reset by default. However, as the cxl_reset_done() handler shows, there
is zero coordination. One might get lucky and be able to see those
dev_crit() messages before the kernel crashes in the memory expander
case.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions
  2026-03-06  8:00 ` [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions smadhavan
  2026-03-06 17:45   ` Alex Williamson
@ 2026-03-10 21:44   ` Dan Williams
  2026-03-16 14:02     ` Vishal Aslot
  1 sibling, 1 reply; 25+ messages in thread
From: Dan Williams @ 2026-03-10 21:44 UTC (permalink / raw)
  To: smadhavan, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
	ira.weiny, vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

smadhavan@ wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> PCI: Add CXL DVSEC control, lock, and range register definitions
> 
> Add register offset and field definitions for CXL DVSEC registers needed
> by CXL state save/restore across resets:
> 
>   - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
>   - CONFIG_LOCK bit in the LOCK register
>   - RWL (read-write-when-locked) field masks for CTRL and range base
>     registers.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  include/uapi/linux/pci_regs.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index ec1c54b5a310..6fdc20d7f5e6 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1353,14 +1353,20 @@
>  #define   PCI_DVSEC_CXL_HDM_COUNT			__GENMASK(5, 4)
>  #define  PCI_DVSEC_CXL_CTRL				0xC
>  #define   PCI_DVSEC_CXL_MEM_ENABLE			_BITUL(2)
> +#define   PCI_DVSEC_CXL_CTRL_RWL			0x5FED

This is odd, why is it needed? If the bits are locked then writes are
dropped.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] PCI: Add virtual extended cap save buffer for CXL state
  2026-03-06  8:00 ` [PATCH 3/5] PCI: Add virtual extended cap save buffer for CXL state smadhavan
@ 2026-03-10 21:45   ` Dan Williams
  0 siblings, 0 replies; 25+ messages in thread
From: Dan Williams @ 2026-03-10 21:45 UTC (permalink / raw)
  To: smadhavan, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
	ira.weiny, vishal.l.verma, alison.schofield, dave
  Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
	mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
	linux-kernel, Srirangan Madhavan

smadhavan@ wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers
> using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require
> a real capability in config space.
> 
> The existing pci_add_ext_cap_save_buffer() cannot be used for
> CXL DVSEC state because it calls pci_find_saved_ext_cap()
> which searches for a matching capability in PCI config space.
> The CXL state saved here is a synthetic snapshot (DVSEC+HDM)
> and should not be tied to a real extended-cap instance. A
> virtual extended-cap save buffer API (cap IDs above
> PCI_EXT_CAP_ID_MAX) allows PCI to track this state without
> a backing config space capability.

I think this goes away if the PCI core is not tasked with understanding
CXL internals.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-10 21:39 ` [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets Dan Williams
@ 2026-03-10 22:46   ` Alex Williamson
  2026-03-11  1:45     ` Dan Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2026-03-10 22:46 UTC (permalink / raw)
  To: Dan Williams
  Cc: alex, smadhavan, bhelgaas, dave.jiang, jonathan.cameron,
	ira.weiny, vishal.l.verma, alison.schofield, dave, jeshuas,
	vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
	mochs, dschumacher, linux-cxl, linux-pci, linux-kernel

Hey Dan,

On Tue, 10 Mar 2026 14:39:25 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> smadhavan@ wrote:
> > From: Srirangan Madhavan <smadhavan@nvidia.com>
> > 
> > CXL devices could lose their DVSEC configuration and HDM decoder programming
> > after multiple reset methods (whenever link disable/enable). This means a
> > device that was fully configured — with DVSEC control/range registers set
> > and HDM decoders committed — loses that state after reset. In cases where
> > these are programmed by firmware, downstream drivers are unable to re-initialize
> > the device because CXL memory ranges are no longer mapped.
> > 
> > This series adds CXL state save/restore logic to the PCI core so
> > that DVSEC and HDM decoder state is preserved across any PCI reset
> > path that calls pci_save_state() / pci_restore_state(), for a CXL capable device.  
> 
> The PCI core has no business learning CXL core internals.
> 
> For example, I have been pushing the CXL port protocol error handling
> series to minimally involve the PCI core. Just enough enabling to
> forward AER events, but otherwise PCI core stays blissfully unaware of
> CXL details. The alternative is maintenance burden to the
> PCI core that I expect is best to avoid.
> 
> > HDM decoder defines and the cxl_register_map infrastructure are moved from
> > internal CXL driver headers to a new public include/cxl/pci.h, allowing
> > drivers/pci/cxl.c to use them.
> > This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> > minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> > changes on top of his.  
> 
> I think we need to evaluate where things stand after both the CXL port
> error handling series and the CXL accelerator base series have landed.
> Not that they are functionally dependendent on each other, but there is
> a review backlog that needs to clear, and those establish the precedent
> about where CXL functionality lands between PCI core, CXL core, and CXL
> enlightened drivers.
> 
> > These patches were previously part of the CXL reset series and have been
> > split out [2] to allow independent review and merging. Review feedback on
> > the save/restore portions from v4 has been addressed.
> > 
> > Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> > before reset and restored after, with decoder commit confirmed via the
> > COMMITTED status bit. Type-3 device testing is in progress.  
> 
> It is a memory hot plug event.An accelerator driver can coordinate
> quiescing CXL.mem over events like reset, a memory expander driver can
> not. The PCI core can not manage memory hot plug.  It is the wrong place
> to enable this specific CXL reset because PCI core has no idea about the
> suitability of reset at any given point of time.
> 
> Now, the secondary bus reset enabling for the CXL did end up with
> changes to the PCI core:
> 
> 53c49b6e6dd2 PCI/CXL: Add 'cxl_bus' reset method for devices below CXL Ports
> 
> ...but only to disambiguate that hardware may be blocking secondary bus
> reset by default. However, as the cxl_reset_done() handler shows, there
> is zero coordination. One might get lucky and be able to see those
> dev_crit() messages before the kernel crashes in the memory expander
> case.

A constraint here is that CXL_BUS can be modular while PCI is builtin,
but reset is initiated through PCI and drivers like vfio-pci already
manage an opaque blob of PCI device state that can be pushed back into
the device to restore it between use cases.  If PCI is not enlightened
about CXL state to some extent, how does this work?

PCI core has already been enlightened about things like virtual-channel
that it doesn't otherwise touch in order to be able to save and restore
firmware-initiated configurations.  I think there are aspects of that
sort of thing here as well.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-10 22:46   ` Alex Williamson
@ 2026-03-11  1:45     ` Dan Williams
  2026-03-17 14:51       ` Manish Honap
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2026-03-11  1:45 UTC (permalink / raw)
  To: Alex Williamson, Dan Williams
  Cc: alex, smadhavan, bhelgaas, dave.jiang, jonathan.cameron,
	ira.weiny, vishal.l.verma, alison.schofield, dave, jeshuas,
	vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
	mochs, dschumacher, linux-cxl, linux-pci, linux-kernel

Alex Williamson wrote:
[..]
> A constraint here is that CXL_BUS can be modular while PCI is builtin,
> but reset is initiated through PCI and drivers like vfio-pci already
> manage an opaque blob of PCI device state that can be pushed back into
> the device to restore it between use cases.  If PCI is not enlightened
> about CXL state to some extent, how does this work?

My expectation is that "vfio-cxl" is responsible. Similar to vfio-pci
that builds on PCI core functionality for assigning a device, a vfio-cxl
driver would build on CXL.

Specifically, register generic 'struct cxl_memdev' and/or 'struct
cxl_cachdev' objects with the CXL core, like any other accelerator
driver, and coordinate various levels of reset on those objects, not
the 'struct pci_dev'.

> PCI core has already been enlightened about things like virtual-channel
> that it doesn't otherwise touch in order to be able to save and restore
> firmware initiated configurations.  I think there are aspects of that
> sort of thing here as well.  Thanks,

I am willing to hear more and admit I am not familiar with the details
of virtual-channel that make it both amenable to vfio-pci management and
similar to CXL. CXL needs to consider MM, cross-device dependencies, and
decoder topology management that is more dynamic than what happens for
PCI resources.

The CXL accelerator series is currently contending with being able to
restore device configuration after reset. I expect vfio-cxl to build on
that, not push CXL flows into the PCI core.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/5] PCI: Add cxl DVSEC state save/restore across resets
  2026-03-06  8:00 ` [PATCH 4/5] PCI: Add cxl DVSEC state save/restore across resets smadhavan
  2026-03-06 17:45   ` Alex Williamson
@ 2026-03-12 12:28   ` Jonathan Cameron
  1 sibling, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2026-03-12 12:28 UTC (permalink / raw)
  To: smadhavan
  Cc: bhelgaas, dan.j.williams, dave.jiang, ira.weiny, vishal.l.verma,
	alison.schofield, dave, alwilliamson, jeshuas, vsethi, skancherla,
	vaslot, sdonthineni, mhonap, vidyas, jan, mochs, dschumacher,
	linux-cxl, linux-pci, linux-kernel

On Fri, 6 Mar 2026 08:00:18 +0000
smadhavan@nvidia.com wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> Save and restore CXL DVSEC control registers (CTRL, CTRL2), range
> base registers, and lock state across PCI resets.
> 
> When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields
> become read-only and hardware may have updated them. 

This I'm not following.  Can you give an example of which
fields the hardware is allowed to change after lock is set?

> Blindly
> restoring saved values would be silently ignored or conflict
> with hardware state. Instead, a read-merge-write approach is
> used: current hardware values are read for the RWL
> (read-write-when-locked) fields and merged with saved state,
> so only writable bits are restored while locked bits retain
> their hardware values.
> 
> Hooked into pci_save_state()/pci_restore_state() so all PCI reset
> paths automatically preserve CXL DVSEC configuration.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
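The read-merge-write idea in the changelog above can be sketched minimally as follows. The helper name and the concrete mask value used in the test are made up for illustration; this is not the patch's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* Bits set in the RWL (read-write-when-locked) mask remain writable
 * after CONFIG_LOCK and are taken from the saved snapshot; the
 * remaining bits are locked and keep the current hardware value. */
static uint16_t rwl_merge(uint16_t saved, uint16_t hw, uint16_t rwl_mask)
{
	return (uint16_t)((saved & rwl_mask) | (hw & ~rwl_mask));
}
```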




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
                   ` (5 preceding siblings ...)
  2026-03-10 21:39 ` [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets Dan Williams
@ 2026-03-12 12:34 ` Jonathan Cameron
  2026-03-16 13:59   ` Vishal Aslot
  6 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2026-03-12 12:34 UTC (permalink / raw)
  To: smadhavan
  Cc: bhelgaas, dan.j.williams, dave.jiang, ira.weiny, vishal.l.verma,
	alison.schofield, dave, alwilliamson, jeshuas, vsethi, skancherla,
	vaslot, sdonthineni, mhonap, vidyas, jan, mochs, dschumacher,
	linux-cxl, linux-pci, linux-kernel

On Fri, 6 Mar 2026 08:00:14 +0000
smadhavan@nvidia.com wrote:

> From: Srirangan Madhavan <smadhavan@nvidia.com>
> 
> CXL devices could lose their DVSEC configuration and HDM decoder programming
> after multiple reset methods (whenever link disable/enable). This means a
> device that was fully configured — with DVSEC control/range registers set
> and HDM decoders committed — loses that state after reset. In cases where
> these are programmed by firmware, downstream drivers are unable to re-initialize
> the device because CXL memory ranges are no longer mapped.

Hi Srirangan,

Firstly this might be because I'm behind on patch review and there is
a lot going on right now!  So this might be addressed in a different series.

I'd like to understand the whole use case + flow here.  In general I think
we have a problem if a driver is relying on the bios having set up the
decoders and simply doesn't function if the bios didn't do it (and
that applies in this reset case as well). For starters, no hotplug.
Anyhow, that's a different issue, so we can leave that for now.

I'm thinking the reset flow is a good deal more complex than simply
putting the bios programmed values back.  In some cases that might
be a very bad idea as autonomous traffic can hit the type 2 device
the moment these decoders are enabled and I'm guessing that may be
before the device has fully recovered. There are very few spec rules
about this that I can recall. On the setup path the BIOS presumably
got the device into a state where enabling such traffic was fine
and hopefully the driver bind doesn't break that state.

I think you are restoring CXL.mem as well so that gate isn't
going to save us.  Note it would be good to document what is restored and
why more clearly.  Sure we can figure it out from the code, but
a document might make life easier.

A device might handle this mess for us, but I doubt that this is universal.
For type 3 devices, I'm not sure what we want to do on reset in general.

Anyhow, this is really a request for a more detailed description of the
expected reset flow that goes into what the spec constrains and what
it doesn't.  Probably something worthy of going in Documentation.

Thanks,

Jonathan

> 
> This series adds CXL state save/restore logic to the PCI core so
> that DVSEC and HDM decoder state is preserved across any PCI reset
> path that calls pci_save_state() / pci_restore_state(), for a CXL capable device.
> 
> HDM decoder defines and the cxl_register_map infrastructure are moved from
> internal CXL driver headers to a new public include/cxl/pci.h, allowing
> drivers/pci/cxl.c to use them.
> This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> changes on top of his.
> 
> These patches were previously part of the CXL reset series and have been
> split out [2] to allow independent review and merging. Review feedback on
> the save/restore portions from v4 has been addressed.
> 
> Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> before reset and restored after, with decoder commit confirmed via the
> COMMITTED status bit. Type-3 device testing is in progress.
> 
> This series is based on v7.0-rc1.
> 
> [1] https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
> [2] https://lore.kernel.org/linux-cxl/aa8d4f6a-e7bd-4a20-8d34-4376ea314b8f@intel.com/T/#m825c6bdd1934022123807e86d235358a63b08dbc
> 
> Srirangan Madhavan (5):
>   PCI: Add CXL DVSEC control, lock, and range register definitions
>   cxl: Move HDM decoder and register map definitions to
>     include/cxl/pci.h
>   PCI: Add virtual extended cap save buffer for CXL state
>   PCI: Add cxl DVSEC state save/restore across resets
>   PCI/CXL: Add HDM decoder state save/restore
> 
>  drivers/cxl/cxl.h             | 107 +-------
>  drivers/cxl/cxlpci.h          |  10 -
>  drivers/pci/Kconfig           |   4 +
>  drivers/pci/Makefile          |   1 +
>  drivers/pci/cxl.c             | 468 ++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.c             |  23 ++
>  drivers/pci/pci.h             |  18 ++
>  include/cxl/pci.h             | 129 ++++++++++
>  include/uapi/linux/pci_regs.h |   6 +
>  9 files changed, 650 insertions(+), 116 deletions(-)
>  create mode 100644 drivers/pci/cxl.c
>  create mode 100644 include/cxl/pci.h
> 
> base-commit: 6de23f81a5e0
> --
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-12 12:34 ` Jonathan Cameron
@ 2026-03-16 13:59   ` Vishal Aslot
  2026-03-16 17:28     ` Jonathan Cameron
  0 siblings, 1 reply; 25+ messages in thread
From: Vishal Aslot @ 2026-03-16 13:59 UTC (permalink / raw)
  To: Jonathan Cameron, Srirangan Madhavan
  Cc: bhelgaas@google.com, dan.j.williams@intel.com,
	dave.jiang@intel.com, ira.weiny@intel.com,
	vishal.l.verma@intel.com, alison.schofield@intel.com,
	dave@stgolabs.net, Alex Williamson, Jeshua Smith, Vikram Sethi,
	Sai Yashwanth Reddy Kancherla, Shanker Donthineni, Manish Honap,
	Vidya Sagar, Jiandi An, Matt Ochs, Derek Schumacher,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

Hi Jonathan,

Replying on behalf of Srirangan. Srirangan, feel free to add to this.

> In general I think we have a problem if a driver is relying on the
> bios having set up the decoders and simply doesn't function if the
> bios didn't do it (and that applies in this reset case as well).

The issue for us (and possibly for other devices as well) is that it is
possible to reset a device without the device driver being loaded at
all. One use case is an in-band firmware update followed by a reset to
make the device go through its own secure boot. Now, arguably the
device driver can restore the DVSEC, HDM decoders, and whatever else
once it loads, and if the driver is not loaded, it is not possible to
use the device anyway. The main thing here is that without the driver
present, reset_prep() and reset_done() cannot be handled, so we wanted
this state to be restored by the kernel itself.

There is a virtualization as well as a confidential computing angle
here, but I am not versed enough in that path to comment.

> I'm thinking the reset flow is a good deal more complex than simply
> putting the bios programmed values back.  In some cases that might
> be a very bad idea as autonomous traffic can hit the type 2 device
> the moment these decoders are enabled and I'm guessing that may be
> before the device has fully recovered.

Yes, completely agree. In fact, for our GPUs it's a combination of
kernel-first and firmware-first. But for our use case we at least
ensure, by means outside of OS control, that the device will not be
used before the reset is completed.

> Note it would be good to document what is restored ....more clearly
I agree. Srirangan, could you please list out the DVSECs and
capabilities that you are saving and restoring?
In general, we should be saving and restoring anything that was locked
but possibly got unlocked or was set back to its reset value. Things
like mem.enable and cache.enable in DVSEC 0, the range registers, and
the HDM decoders would be at least unlocked, if not wiped out, etc.
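As a rough illustration of the scope being discussed (not the series' actual code), a save/restore pass over just those fields could look like this toy model. The CTRL (0xC), CTRL2 (0x10), and LOCK (0x14) offsets come from patch 1 of this series; config space is simulated with an array, and a memset stands in for the reset wiping the device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical DVSEC offsets to snapshot (values from patch 1). */
enum { CXL_DVSEC_CTRL = 0x0c, CXL_DVSEC_CTRL2 = 0x10, CXL_DVSEC_LOCK = 0x14 };

static const unsigned int save_offsets[] = {
	CXL_DVSEC_CTRL, CXL_DVSEC_CTRL2, CXL_DVSEC_LOCK,
};

#define NSAVE (sizeof(save_offsets) / sizeof(save_offsets[0]))

/* Copy the listed 16-bit registers out of the simulated config space. */
static void save_state(const uint16_t *cfg, uint16_t *buf)
{
	for (unsigned int i = 0; i < NSAVE; i++)
		buf[i] = cfg[save_offsets[i] / 2];
}

/* Write the snapshot back after the simulated reset. */
static void restore_state(uint16_t *cfg, const uint16_t *buf)
{
	for (unsigned int i = 0; i < NSAVE; i++)
		cfg[save_offsets[i] / 2] = buf[i];
}
```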
 
expected reset flow that goes into what the spec constrains and what
it doesn't.  Probably something worthy of going in Documentation.

Thanks,

Jonathan

>
> This series adds CXL state save/restore logic to the PCI core so
> that DVSEC and HDM decoder state is preserved across any PCI reset
> path that calls pci_save_state() / pci_restore_state(), for a CXL capable device.
>
> HDM decoder defines and the cxl_register_map infrastructure are moved from
> internal CXL driver headers to a new public include/cxl/pci.h, allowing
> drivers/pci/cxl.c to use them.
> This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> changes on top of his.
>
> These patches were previously part of the CXL reset series and have been
> split out [2] to allow independent review and merging. Review feedback on
> the save/restore portions from v4 has been addressed.
>
> Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> before reset and restored after, with decoder commit confirmed via the
> COMMITTED status bit. Type-3 device testing is in progress.
>
> This series is based on v7.0-rc1.
>
> [1] https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
> [2] https://lore.kernel.org/linux-cxl/aa8d4f6a-e7bd-4a20-8d34-4376ea314b8f@intel.com/T/#m825c6bdd1934022123807e86d235358a63b08dbc
>
> Srirangan Madhavan (5):
>   PCI: Add CXL DVSEC control, lock, and range register definitions
>   cxl: Move HDM decoder and register map definitions to
>     include/cxl/pci.h
>   PCI: Add virtual extended cap save buffer for CXL state
>   PCI: Add cxl DVSEC state save/restore across resets
>   PCI/CXL: Add HDM decoder state save/restore
>
>  drivers/cxl/cxl.h             | 107 +-------
>  drivers/cxl/cxlpci.h          |  10 -
>  drivers/pci/Kconfig           |   4 +
>  drivers/pci/Makefile          |   1 +
>  drivers/pci/cxl.c             | 468 ++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.c             |  23 ++
>  drivers/pci/pci.h             |  18 ++
>  include/cxl/pci.h             | 129 ++++++++++
>  include/uapi/linux/pci_regs.h |   6 +
>  9 files changed, 650 insertions(+), 116 deletions(-)
>  create mode 100644 drivers/pci/cxl.c
>  create mode 100644 include/cxl/pci.h
>
> base-commit: 6de23f81a5e0
> --
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions
  2026-03-10 21:44   ` Dan Williams
@ 2026-03-16 14:02     ` Vishal Aslot
  0 siblings, 0 replies; 25+ messages in thread
From: Vishal Aslot @ 2026-03-16 14:02 UTC (permalink / raw)
  To: Dan Williams, Srirangan Madhavan, bhelgaas@google.com,
	dave.jiang@intel.com, jonathan.cameron@huawei.com,
	ira.weiny@intel.com, vishal.l.verma@intel.com,
	alison.schofield@intel.com, dave@stgolabs.net
  Cc: Alex Williamson, Jeshua Smith, Vikram Sethi,
	Sai Yashwanth Reddy Kancherla, Shanker Donthineni, Manish Honap,
	Vidya Sagar, Jiandi An, Matt Ochs, Derek Schumacher,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

I am wondering... thinking out loud here... what if the CXL "driver" (e.g. the port driver) registered for its own "reset_prep" and "reset_done"? Or would having a CXL-specific routine in the CXL driver that the PCIe reset path calls into work? I realize the second suggestion may not address the main concern, but it's an improvement.

-Vishal
________________________________________
From: Dan Williams <dan.j.williams@intel.com>
Sent: Tuesday, March 10, 2026 4:44 PM
To: Srirangan Madhavan; bhelgaas@google.com; dan.j.williams@intel.com; dave.jiang@intel.com; jonathan.cameron@huawei.com; ira.weiny@intel.com; vishal.l.verma@intel.com; alison.schofield@intel.com; dave@stgolabs.net
Cc: Alex Williamson; Jeshua Smith; Vikram Sethi; Sai Yashwanth Reddy Kancherla; Vishal Aslot; Shanker Donthineni; Manish Honap; Vidya Sagar; Jiandi An; Matt Ochs; Derek Schumacher; linux-cxl@vger.kernel.org; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Srirangan Madhavan
Subject: Re: [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions

smadhavan@ wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> PCI: Add CXL DVSEC control, lock, and range register definitions
>
> Add register offset and field definitions for CXL DVSEC registers needed
> by CXL state save/restore across resets:
>
>   - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
>   - CONFIG_LOCK bit in the LOCK register
>   - RWL (read-write-when-locked) field masks for CTRL and range base
>     registers.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
>  include/uapi/linux/pci_regs.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index ec1c54b5a310..6fdc20d7f5e6 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1353,14 +1353,20 @@
>  #define   PCI_DVSEC_CXL_HDM_COUNT                    __GENMASK(5, 4)
>  #define  PCI_DVSEC_CXL_CTRL                          0xC
>  #define   PCI_DVSEC_CXL_MEM_ENABLE                   _BITUL(2)
> +#define   PCI_DVSEC_CXL_CTRL_RWL                     0x5FED

This is odd, why is it needed? If the bits are locked then writes are
dropped.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-16 13:59   ` Vishal Aslot
@ 2026-03-16 17:28     ` Jonathan Cameron
  0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2026-03-16 17:28 UTC (permalink / raw)
  To: Vishal Aslot
  Cc: Srirangan Madhavan, bhelgaas@google.com, dan.j.williams@intel.com,
	dave.jiang@intel.com, ira.weiny@intel.com,
	vishal.l.verma@intel.com, alison.schofield@intel.com,
	dave@stgolabs.net, Alex Williamson, Jeshua Smith, Vikram Sethi,
	Sai Yashwanth Reddy Kancherla, Shanker Donthineni, Manish Honap,
	Vidya Sagar, Jiandi An, Matt Ochs, Derek Schumacher,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon, 16 Mar 2026 13:59:59 +0000
Vishal Aslot <vaslot@nvidia.com> wrote:

> Hi Jonathan,
>           Replying on behalf of Srirangan. Srirangan, feel free to add to this.

Hi Vishal,

I've hacked the thread a bit to make it more readable going forwards
(it's tricky to spot replies in plain text otherwise!). If you can
persuade your email client to do quoting in a similar fashion, that
would be great.

>           
> >  In general I think  
> > we have a problem if a driver is relying on the bios having set up the
> > decoders and simply doesn't function if the bios didn't do it (and
> > that applies in this reset case as well).
> 

> The issue for us (and possibly for other devices as well) is that it
> is possible to reset a device without the device driver being loaded
> at all. One usecase is in-band fw update followed by reset to cause
> the device to go through its own secure boot. Now, arguably the
> device driver can restore DVSEC + HDM decoder + whatever else once it
> loads. If the driver is not loaded, it is not possible to use the
> device anyway. The main thing here is that without the driver
> present, reset_prep() and reset_done() cannot be handled, so we
> wanted this to be restored by the kernel itself.


Understood the use case. I'm just concerned about how vague the spec
leaves things.

I think we may need to go get some specification clarifications,
or we are going to end up with a long-running quirk list for when
it is safe to do restoration.  One thing I did think about is
restoring the stuff the bios set up with the exception of some of
the 'enables'. That might mean not turning on mem.enable or something
similar (and maybe not committing the decoders). Thus 'safe' but
retaining the info in the hardware state.

Dan mentioned in another thread that he was looking at how we retain
some information on bios config over resets - that might help as well. 

We can probably also 'steal' info from the RP, assuming it's not a
single-RP host bridge with pass-through. The host window is visible
there, for instance.
> 
> There is a virtualization as well as confidential computing angle
> here as well but I am not fully versed in that path to comment.

That's messy as well, as we need to extract some data but also reset
to a clean state.  Confidential compute is more fun in general, as I'm
not sure we can restore TSP after a reset. We are certainly nowhere
near doing so today.

> 
> > I'm thinking the reset flow is a good deal more complex than simply
> >  
> > putting the bios programmed values back.  In some cases that might
> > be a very bad idea as autonomous traffic can hit the type 2 device
> > the moment these decoders are enabled and I'm guessing that may be
> > before the device has fully recovered. 
> 
> Yes, completely agree. In fact for our GPUs it's a combination of
> kernel-first and firmware-first. But we, at least for our use case,
> ensure that the device will not be used before reset is completed,
> by means outside of OS control.

I'm curious about this.  How?  Do you mean the device is safe to be hit
with CPU-side prefetchers fetching cachelines from CXL.mem immediately
after reset? Or that the checks in here are good enough for that to be safe?

> 
> > Note it would be good to document what is restored ....more clearly
> >  
> I agree. Srirangan, could you please list out the DVSECs and
> capabilities that you are saving and restoring? In general, we should
> be saving and restoring anything that was locked but may have become
> unlocked or been set back to its reset value. Things like mem.enable
> and cache.enable in DVSEC 0, the range registers, and the HDM decoders
> would be at least unlocked, if not wiped out, etc.
> ________________________________________ From: Jonathan Cameron
> <jonathan.cameron@huawei.com> Sent: Thursday, March 12, 2026 7:34 AM
> To: Srirangan Madhavan Cc: bhelgaas@google.com;
> dan.j.williams@intel.com; dave.jiang@intel.com; ira.weiny@intel.com;
> vishal.l.verma@intel.com; alison.schofield@intel.com;
> dave@stgolabs.net; Alex Williamson; Jeshua Smith; Vikram Sethi; Sai
> Yashwanth Reddy Kancherla; Vishal Aslot; Shanker Donthineni; Manish
> Honap; Vidya Sagar; Jiandi An; Matt Ochs; Derek Schumacher;
> linux-cxl@vger.kernel.org; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/5] PCI/CXL: Save
> and restore CXL DVSEC and HDM state across resets
> 
> On Fri, 6 Mar 2026 08:00:14 +0000
> smadhavan@nvidia.com wrote:
> 
> > From: Srirangan Madhavan <smadhavan@nvidia.com>
> >
> > CXL devices could lose their DVSEC configuration and HDM decoder
> > programming after multiple reset methods (whenever link
> > disable/enable). This means a device that was fully configured —
> > with DVSEC control/range registers set and HDM decoders committed —
> > loses that state after reset. In cases where these are programmed
> > by firmware, downstream drivers are unable to re-initialize the
> > device because CXL memory ranges are no longer mapped.  
> 
> Hi Srirangan,
> 
> Firstly this might be because I'm behind on patch review and there is
> a lot going on right now!  So this might be addressed in a different
> series.
> 
> I'd like to understand the whole use case + flow here.  In general I
> think we have a problem if a driver is relying on the bios having set
> up the decoders and simply doesn't function if the bios didn't do it
> (and that applies in this reset case as well). For starters, no
> hotplug. Anyhow, that's a different issue, so we can leave that for
> now.
> 
> I'm thinking the reset flow is a good deal more complex than simply
> putting the bios programmed values back.  In some cases that might
> be a very bad idea as autonomous traffic can hit the type 2 device
> the moment these decoders are enabled and I'm guessing that may be
> before the device has fully recovered. There are very few spec rules
> about this that I can recall. On the setup path the BIOS presumably
> got the device into a state where enabling such traffic was fine
> and hopefully the driver bind doesn't break that state.
> 
> I think you are restoring CXL.mem as well so that gate isn't
> going to save us.  Note it would be good to document what is restored
> and why more clearly.  Sure we can figure it out from the code, but
> a document might make life easier.
> 
> A device might handle this mess for us, but I doubt that this is
> universal. For type 3 devices, I'm not sure what we want to do on
> reset in general.
> 
> Anyhow, this is really a request for a more detailed description of
> the expected reset flow that goes into what the spec constrains and
> what it doesn't.  Probably something worthy of going in Documentation.
> 
> Thanks,
> 
> Jonathan
> 
> >
> > This series adds CXL state save/restore logic to the PCI core so
> > that DVSEC and HDM decoder state is preserved across any PCI reset
> > path that calls pci_save_state() / pci_restore_state(), for a CXL
> > capable device.
> >
> > HDM decoder defines and the cxl_register_map infrastructure are
> > moved from internal CXL driver headers to a new public
> > include/cxl/pci.h, allowing drivers/pci/cxl.c to use them.
> > This layout aligns with Alejandro Lucero's CXL Type-2 device series
> > [1] to minimize conflicts when both land. When he rebases to
> > 7.0-rc2, I can move my changes on top of his.
> >
> > These patches were previously part of the CXL reset series and have
> > been split out [2] to allow independent review and merging. Review
> > feedback on the save/restore portions from v4 has been addressed.
> >
> > Tested on a CXL Type-2 device. DVSEC and HDM state is correctly
> > saved before reset and restored after, with decoder commit
> > confirmed via the COMMITTED status bit. Type-3 device testing is in
> > progress.
> >
> > This series is based on v7.0-rc1.
> >
> > [1] https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
> > [2] https://lore.kernel.org/linux-cxl/aa8d4f6a-e7bd-4a20-8d34-4376ea314b8f@intel.com/T/#m825c6bdd1934022123807e86d235358a63b08dbc
> >
> > Srirangan Madhavan (5):
> >   PCI: Add CXL DVSEC control, lock, and range register definitions
> >   cxl: Move HDM decoder and register map definitions to
> >     include/cxl/pci.h
> >   PCI: Add virtual extended cap save buffer for CXL state
> >   PCI: Add cxl DVSEC state save/restore across resets
> >   PCI/CXL: Add HDM decoder state save/restore
> >
> >  drivers/cxl/cxl.h             | 107 +-------
> >  drivers/cxl/cxlpci.h          |  10 -
> >  drivers/pci/Kconfig           |   4 +
> >  drivers/pci/Makefile          |   1 +
> >  drivers/pci/cxl.c             | 468
> > ++++++++++++++++++++++++++++++++++ drivers/pci/pci.c             |
> > 23 ++ drivers/pci/pci.h             |  18 ++
> >  include/cxl/pci.h             | 129 ++++++++++
> >  include/uapi/linux/pci_regs.h |   6 +
> >  9 files changed, 650 insertions(+), 116 deletions(-)
> >  create mode 100644 drivers/pci/cxl.c
> >  create mode 100644 include/cxl/pci.h
> >
> > base-commit: 6de23f81a5e0
> > --
> > 2.43.0
> >
> >  
> 
> 
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-11  1:45     ` Dan Williams
@ 2026-03-17 14:51       ` Manish Honap
  2026-03-17 17:03         ` Dan Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Manish Honap @ 2026-03-17 14:51 UTC (permalink / raw)
  To: Dan Williams, Alex Williamson, jonathan.cameron@huawei.com
  Cc: alex@shazbot.org, Srirangan Madhavan, bhelgaas@google.com,
	dave.jiang@intel.com, ira.weiny@intel.com,
	vishal.l.verma@intel.com, alison.schofield@intel.com,
	dave@stgolabs.net, Jeshua Smith, Vikram Sethi,
	Sai Yashwanth Reddy Kancherla, Vishal Aslot, Shanker Donthineni,
	Vidya Sagar, Jiandi An, Matt Ochs, Derek Schumacher,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Manish Honap



> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: 11 March 2026 07:15
> To: Alex Williamson <alex@shazbot.org>; Dan Williams
> <dan.j.williams@intel.com>
> Cc: alex@shazbot.org; Srirangan Madhavan <smadhavan@nvidia.com>;
> bhelgaas@google.com; dave.jiang@intel.com; jonathan.cameron@huawei.com;
> ira.weiny@intel.com; vishal.l.verma@intel.com; alison.schofield@intel.com;
> dave@stgolabs.net; Jeshua Smith <jeshuas@nvidia.com>; Vikram Sethi
> <vsethi@nvidia.com>; Sai Yashwanth Reddy Kancherla
> <skancherla@nvidia.com>; Vishal Aslot <vaslot@nvidia.com>; Shanker
> Donthineni <sdonthineni@nvidia.com>; Manish Honap <mhonap@nvidia.com>;
> Vidya Sagar <vidyas@nvidia.com>; Jiandi An <jan@nvidia.com>; Matt Ochs
> <mochs@nvidia.com>; Derek Schumacher <dschumacher@nvidia.com>; linux-
> cxl@vger.kernel.org; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state
> across resets
> 
> Alex Williamson wrote:
> [..]
> > A constraint here is that CXL_BUS can be modular while PCI is builtin,
> > but reset is initiated through PCI and drivers like vfio-pci already
> > manage an opaque blob of PCI device state that can be pushed back into
> > the device to restore it between use cases.  If PCI is not enlightened
> > about CXL state to some extent, how does this work?
> 
> My expectation is that "vfio-cxl" is responsible. Similar to vfio-pci that
> builds on PCI core functionality for assigning a device, a vfio-cxl driver
> would build on CXL.
> 
> Specficially, register generic 'struct cxl_memdev' and/or, 'struct
> cxl_cachdev' objects with the CXL core, like any other accelerator driver,
> and coordinate various levels of reset on those objects, not the 'struct
> pci_dev'.
> 
> > PCI core has already been enlightened about things like
> > virtual-channel that it doesn't otherwise touch in order to be able to
> > save and restore firmware initiated configurations.  I think there are
> > aspects of that sort of thing here as well.  Thanks,
> 
> I am willing to hear more and admit I am not familiar with the details of
> virtual-channel that make it both amenable to vfio-pci management and
> similar to CXL. CXL needs to consider MM, cross-device dependencies, and
> decoder topology management that is more dynamic than what happens for PCI
> resources.
> 
> The CXL accelerator series is currently contending with being able to
> restore device configuration after reset. I expect vfio-cxl to build on
> that, not push CXL flows into the PCI core.

Hello Dan,

My VFIO CXL Type-2 passthrough series [1] takes a position on this that I
would like to explain because I expect you will have similar concerns about
it and I'd rather have this conversation now.

Type-2 passthrough series takes the opposite structural approach as you are
suggesting here: CXL Type-2 support is an optional extension compiled into
vfio-pci-core (CONFIG_VFIO_CXL_CORE), not a separate driver.

Here is the reasoning:

1. Device enumeration
=====================

CXL Type-2 devices (GPU + accelerator class) are enumerated as struct pci_dev
objects.  The kernel discovers them through PCI config space scan, not through
the CXL bus. The CXL capability is advertised via the DVSEC (PCI_EXT_CAP_ID
0x23, Vendor ID 0x1E98), which is PCI config space. There is no CXL bus
device to bind to.

A standalone vfio-cxl driver would therefore need to match on the PCI device
just like vfio-pci does, and then call into vfio-pci-core for every PCI
concern: config space emulation, BAR region handling, MSI/MSI-X, INTx, DMA
mapping, FLR, and migration callbacks. That is the variant driver pattern
we rejected in favour of generic CXL passthrough. We have seen this exact
outcome with the prior iterations of this series before we moved to the
enlightened vfio-pci model.

2. CXL-CORE involvement
=======================

The CXL Type-2 passthrough series does not bypass the CXL core. At
vfio_pci_probe() time the CXL enlightenment layer:

  - calls cxl_get_hdm_info() to probe the HDM Decoder Capability block,
  - calls cxl_get_committed_decoder() to locate pre-committed firmware regions,
  - calls cxl_create_region() / cxl_request_dpa() for dynamic allocation,
  - creates a struct cxl_memdev via the CXL core (via cxl_probe_component_regs,
    the same path Alejandro's v23 series uses).

The CXL core is fully involved.  The difference is that the binding to
userspace is still through vfio-pci, which already manages the pci_dev
lifecycle, reset sequencing, and VFIO region/irq API.

3. Standalone vfio-cxl
======================

To match the model you are suggesting, vfio-cxl would need to:

  (a) Register a new driver on the CXL bus (struct cxl_driver), probing
      struct cxl_memdev or a new struct cxl_endpoint,
  (b) Re-implement or delegate everything vfio-pci-core provides — config
      space, BAR regions, IRQs, DMA, FLR, and VFIO container management —
      either by calling vfio-pci-core as a library or by duplicating it, and
  (c) present to userspace through a new device model distinct from
      vfio-pci.

This is a significant new surface. QEMU's CXL passthrough support already
builds on vfio-pci: it receives the PCI device via VFIO, reads the
VFIO_DEVICE_INFO_CAP_CXL capability chain, and exposes the CXL topology.
A vfio-cxl object model would require non-trivial QEMU changes for something
that already works in the enlightened vfio-pci model.

4. Module dependency
====================

Current solution: CONFIG_VFIO_CXL_CORE depends on CONFIG_CXL_BUS. We do not
add CXL knowledge to the PCI core; we add it to the VFIO layer that is already
CXL_BUS-dependent.

I would very much appreciate your thoughts on [1] considering the above. I
want to understand whether you think vfio-pci-core can remain the single
entry point from userspace, or whether you envision a new VFIO device type.

Jonathan has indicated he has thoughts on this as well; hopefully, we
can converge on a direction that doesn't require duplicating vfio-pci-core.

[1] https://lore.kernel.org/linux-cxl/20260311203440.752648-1-mhonap@nvidia.com/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-17 14:51       ` Manish Honap
@ 2026-03-17 17:03         ` Dan Williams
  2026-03-17 18:19           ` Alex Williamson
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Williams @ 2026-03-17 17:03 UTC (permalink / raw)
  To: Manish Honap, Dan Williams, Alex Williamson,
	jonathan.cameron@huawei.com
  Cc: alex@shazbot.org, Srirangan Madhavan, bhelgaas@google.com,
	dave.jiang@intel.com, ira.weiny@intel.com,
	vishal.l.verma@intel.com, alison.schofield@intel.com,
	dave@stgolabs.net, Jeshua Smith, Vikram Sethi,
	Sai Yashwanth Reddy Kancherla, Vishal Aslot, Shanker Donthineni,
	Vidya Sagar, Jiandi An, Matt Ochs, Derek Schumacher,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Manish Honap

Manish Honap wrote:
[..]
> > The CXL accelerator series is currently contending with being able to
> > restore device configuration after reset. I expect vfio-cxl to build on
> > that, not push CXL flows into the PCI core.
> 
> Hello Dan,
> 
> My VFIO CXL Type-2 passthrough series [1] takes a position on this that I
> would like to explain because I expect you will have similar concerns about
> it and I'd rather have this conversation now.
> 
> Type-2 passthrough series takes the opposite structural approach as you are
> suggesting here: CXL Type-2 support is an optional extension compiled into
> vfio-pci-core (CONFIG_VFIO_CXL_CORE), not a separate driver.
> 
> Here is the reasoning:
> 
> 1. Device enumeration
> =====================
> 
> CXL Type-2 devices (GPU + accelerator class) are enumerated as struct pci_dev
> objects.  The kernel discovers them through PCI config space scan, not through
> the CXL bus. The CXL capability is advertised via the DVSEC (PCI_EXT_CAP_ID
> 0x23, Vendor ID 0x1E98), which is PCI config space. There is no CXL bus
> device to bind to.
> 
> A standalone vfio-cxl driver would therefore need to match on the PCI device
> just like vfio-pci does, and then call into vfio-pci-core for every PCI
> concern: config space emulation, BAR region handling, MSI/MSI-X, INTx, DMA
> mapping, FLR, and migration callbacks. That is the variant driver pattern
> we rejected in favour of generic CXL passthrough. We have seen this exact

Lore link for this "rejection" discussion?

> outcome with the prior iterations of this series before we moved to the
> enlightened vfio-pci model.

I still do not understand the argument. CXL functionality is a library
that PCI drivers can use. If vfio-pci functionality is also a library
then vfio-cxl is a driver that uses services from both libraries. Where
the module and driver name boundaries are drawn is more an organizational
decision than a functional one.

The argument for vfio-cxl organizational independence is more about
being able to tell at a diffstat level the relative PCI vs CXL
maintenance impact / regression risk.

> 2. CXL-CORE involvement
> =======================
> 
> CXL type-2 passthrough series does not bypass CXL core. At vfio_pci_probe()
> time the CXL enlightenment layer:
> 
>   - calls cxl_get_hdm_info() to probe the HDM Decoder Capability block,
>   - calls cxl_get_committed_decoder() to locate pre-committed firmware regions,
>   - calls cxl_create_region() / cxl_request_dpa() for dynamic allocation,
>   - creates a struct cxl_memdev via the CXL core (via cxl_probe_component_regs,
>     the same path Alejandro's v23 series uses).
> 
> The CXL core is fully involved.  The difference is that the binding to
> userspace is still through vfio-pci, which already manages the pci_dev
> lifecycle, reset sequencing, and VFIO region/irq API.

Sure, every CXL driver in the system will do the same.

> 3. Standalone vfio-cxl
> ======================
> 
> To match the model you are suggesting, vfio-cxl would need to:
> 
>   (a) Register a new driver on the CXL bus (struct cxl_driver), probing
>       struct cxl_memdev or a new struct cxl_endpoint,

What, why? Just like this patch series was proposing to extend the PCI
core with additional common functionality, the proposal is to extend the
CXL core object drivers with the same.

>   (b) Re-implement or delegate everything vfio-pci-core provides — config
>       space, BAR regions, IRQs, DMA, FLR, and VFIO container management —
>       either by calling vfio-pci-core as a library or by duplicating it, and

What is the argument against a library?

>   (c) present to userspace through a new device model distinct from
>       vfio-pci.

CXL is a distinct operational model. What breaks if userspace is
required to explicitly account for CXL passthrough?

> This is a significant new surface. QEMU's CXL passthrough support already
> builds on vfio-pci: it receives the PCI device via VFIO, reads the
> VFIO_DEVICE_INFO_CAP_CXL capability chain, and exposes the CXL topology.
> A vfio-cxl object model would require non-trivial QEMU changes for something
> that already works in the enlightened vfio-pci model.

What specifically about a kernel code organization choice affects the
QEMU implementation? A uAPI is kernel code organization agnostic.

The concern is designing ourselves into a PCI corner when longterm QEMU
benefits from understanding CXL objects. For example, CXL error handling
/ recovery is already well on its way to being performed in terms of CXL
port objects.

> 4. Module dependency
> ====================
> 
> Current solution: CONFIG_VFIO_CXL_CORE depends on CONFIG_CXL_BUS. We do not
> add CXL knowledge to the PCI core;

drivers/pci/cxl.c

> we add it to the VFIO layer that is already CXL_BUS-dependent.

Yes, the VFIO layer needs CXL enlightenment, and VFIO's requirements
imply wider benefits for other CXL-capable devices.

> I would very much appreciate your thoughts on [1] considering the above. I want
> to understand your thoughts on whether vfio-pci-core can remain the single
> entry point from userspace, or whether you envision a new VFIO device type.
> 
> Jonathan has indicated he has thoughts on this as well; hopefully, we
> can converge on a direction that doesn't require duplicating vfio-pci-core.

No one is suggesting we "require duplicating vfio-pci-core"; please do
not argue with strawman caricatures like this.

> [1] https://lore.kernel.org/linux-cxl/20260311203440.752648-1-mhonap@nvidia.com/

Will take a look...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
  2026-03-17 17:03         ` Dan Williams
@ 2026-03-17 18:19           ` Alex Williamson
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Williamson @ 2026-03-17 18:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Manish Honap, jonathan.cameron@huawei.com, Srirangan Madhavan,
	bhelgaas@google.com, dave.jiang@intel.com, ira.weiny@intel.com,
	vishal.l.verma@intel.com, alison.schofield@intel.com,
	dave@stgolabs.net, Jeshua Smith, Vikram Sethi,
	Sai Yashwanth Reddy Kancherla, Vishal Aslot, Shanker Donthineni,
	Vidya Sagar, Jiandi An, Matt Ochs, Derek Schumacher,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, alex

On Tue, 17 Mar 2026 10:03:28 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Manish Honap wrote:
> [..]
> > > The CXL accelerator series is currently contending with being able to
> > > restore device configuration after reset. I expect vfio-cxl to build on
> > > that, not push CXL flows into the PCI core.  
> > 
> > Hello Dan,
> > 
> > My VFIO CXL Type-2 passthrough series [1] takes a position on this that I
> > would like to explain because I expect you will have similar concerns about
> > it and I'd rather have this conversation now.
> > 
> > Type-2 passthrough series takes the opposite structural approach as you are
> > suggesting here: CXL Type-2 support is an optional extension compiled into
> > vfio-pci-core (CONFIG_VFIO_CXL_CORE), not a separate driver.
> > 
> > Here is the reasoning:
> > 
> > 1. Device enumeration
> > =====================
> > 
> > CXL Type-2 devices (GPU + accelerator class) are enumerated as struct pci_dev
> > objects.  The kernel discovers them through PCI config space scan, not through
> > the CXL bus. The CXL capability is advertised via the DVSEC (PCI_EXT_CAP_ID
> > 0x23, Vendor ID 0x1E98), which is PCI config space. There is no CXL bus
> > device to bind to.
> > 
> > A standalone vfio-cxl driver would therefore need to match on the PCI device
> > just like vfio-pci does, and then call into vfio-pci-core for every PCI
> > concern: config space emulation, BAR region handling, MSI/MSI-X, INTx, DMA
> > mapping, FLR, and migration callbacks. That is the variant driver pattern
> > we rejected in favour of generic CXL passthrough. We have seen this exact  
> 
> Lore link for this "rejection" discussion?
> 
> > outcome with the prior iterations of this series before we moved to the
> > enlightened vfio-pci model.  
> 
> I still do not understand the argument. CXL functionality is a library
> that PCI drivers can use.

This is a key aspect of the decision to "enlighten" vfio-pci to know
about CXL.  Ultimately the vfio driver for CXL devices is a PCI driver,
it binds to a PCI device.  We've developed macros for PCI devices to
identify in their ID table that they provide a vfio-pci override
driver, see for example the output of the following on your own system:

$ grep vfio_pci /lib/modules/`uname -r`/modules.alias

The catch-all entry is vfio-pci itself:

alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci

A vfio-pci variant driver for a specific device will include vendor and
device ID matches:

alias vfio_pci:v000015B3d0000101Esv*sd*bc*sc*i* mlx5_vfio_pci

Tools like libvirt know to make use of this when assigning a PCI
hostdev device to a VM by matching the most appropriate driver based on
these aliases.  They know they'll get a vfio-pci interface for use with
things like QEMU with a vfio-pci driver option.

If we introduce vfio-cxl, that also binds to a PCI device, how do we
end up making this automatic for userspace?

If we were to make "vfio-cxl" a vfio-pci variant driver, we'd need
to expand the ID table for specific devices, which becomes a
maintenance issue.  Otherwise userspace would need to detect the CXL
capabilities and override the automatic driver aliases.  We can't match
drivers based on DVSEC capabilities and we don't have any protocol to
define a "2nd best" match for a device alias if probe fails.

> If vfio-pci functionality is also a library
> then vfio-cxl is a driver that uses services from both libraries. Where
> the module and driver name boundaries are drawn is more an organizational
> decision than a functional one.

But as above, it is functional.  Someone needs to define when to use
which driver, which leads to libvirt needing to specify whether a
device is being exposed as PCI or CXL, and the same understanding in
each VMM.  OTOH, using vfio-pci as the basis and layering CXL feature
detection, ie. enlightenment, gives us a more compatible, incremental
approach.

> The argument for vfio-cxl organizational independence is more about
> being able to tell at a diffstat level the relative PCI vs CXL
> maintenance impact / regression risk.

But we still have that.  CXL enlightenment for vfio-pci(-core) can
still be configured out and compartmentalized into separate helper
library code.

> > 2. CXL-CORE involvement
> > =======================
> > 
> > CXL type-2 passthrough series does not bypass CXL core. At vfio_pci_probe()
> > time the CXL enlightenment layer:
> > 
> >   - calls cxl_get_hdm_info() to probe the HDM Decoder Capability block,
> >   - calls cxl_get_committed_decoder() to locate pre-committed firmware regions,
> >   - calls cxl_create_region() / cxl_request_dpa() for dynamic allocation,
> >   - creates a struct cxl_memdev via the CXL core (via cxl_probe_component_regs,
> >     the same path Alejandro's v23 series uses).
> > 
> > The CXL core is fully involved.  The difference is that the binding to
> > userspace is still through vfio-pci, which already manages the pci_dev
> > lifecycle, reset sequencing, and VFIO region/irq API.  
> 
> Sure, every CXL driver in the system will do the same.
> 
> > 3. Standalone vfio-cxl
> > ======================
> > 
> > To match the model you are suggesting, vfio-cxl would need to:
> > 
> >   (a) Register a new driver on the CXL bus (struct cxl_driver), probing
> >       struct cxl_memdev or a new struct cxl_endpoint,  
> 
> What, why? Just like this patch series was proposing to extend the
> PCI core with additional common functionality, the proposal is to extend
> the CXL core object drivers with the same.

I don't follow, what is the proposal?
 
> >   (b) Re-implement or delegate everything vfio-pci-core provides — config
> >       space, BAR regions, IRQs, DMA, FLR, and VFIO container management —
> >       either by calling vfio-pci-core as a library or by duplicating it, and  
> 
> What is the argument against a library?

vfio-pci-core is already a library, and the extensions to support CXL as an
enlightenment of vfio-pci are also a library.  The issue is that a
vfio-cxl PCI driver module presents more issues than simply code
organization.
 
> >   (c) present to userspace through a new device model distinct from
> >       vfio-pci.  
> 
> CXL is a distinct operational model. What breaks if userspace is
> required to explicitly account for CXL passthrough?

The entire virtualization stack needs to gain an understanding of the
intended use case of the device rather than simply push a PCI device
with CXL capabilities out to the guest.
 
> > This is a significant new surface. QEMU's CXL passthrough support already
> > builds on vfio-pci: it receives the PCI device via VFIO, reads the
> > VFIO_DEVICE_INFO_CAP_CXL capability chain, and exposes the CXL topology.
> > A vfio-cxl object model would require non-trivial QEMU changes for something
> > that already works in the enlightened vfio-pci model.  
> 
> What specifically about a kernel code organization choice affects the
> QEMU implementation? A uAPI is kernel code organization agnostic.
> 
> The concern is designing ourselves into a PCI corner when longterm QEMU
> benefits from understanding CXL objects. For example, CXL error handling
> / recovery is already well on its way to being performed in terms of CXL
> port objects.

Are you suggesting that rather than using the PCI device as the basis
for assignment to a userspace driver or VM, we make each port
object assignable and somehow collect them into a configuration on top of
a PCI device?  I don't think these port objects are isolated for such a
use case.  I'd like to better understand how you envision this to work.

The organization of the code in the kernel seems 90%+ the same whether
we enlighten vfio-pci to detect and expose CXL features or we create a
separate vfio-cxl PCI driver only for CXL devices, but the userspace
consequences are increased significantly.
 
> > 4. Module dependency
> > ====================
> > 
> > Current solution: CONFIG_VFIO_CXL_CORE depends on CONFIG_CXL_BUS. We do not
> > add CXL knowledge to the PCI core;  
> 
> drivers/pci/cxl.c

This is largely a consequence of CXL_BUS being a loadable module.
 
> > we add it to the VFIO layer that is already CXL_BUS-dependent.  
> 
> Yes, VFIO layer needs CXL enlightenment and VFIO's requirements imply
> wider benefits to other CXL capable devices.
> 
> > I would very much appreciate your thoughts on [1] considering the above. I want
> > to understand your thoughts on whether vfio-pci-core can remain the single
> > entry point from userspace, or whether you envision a new VFIO device type.
> > 
> > Jonathan has indicated he has thoughts on this as well; hopefully, we
> > can converge on a direction that doesn't require duplicating vfio-pci-core.  
> 
> No one is suggesting we "require duplicating vfio-pci-core"; please do not
> argue with strawman caricatures like this.

I think it comes down to whether the enlightenment maps to the existing
granularity of the core module.  Reset is probably a good example, ie.
how does the device being CXL affect the emulation of FLR, initiated
through device config space, versus the device reset ioctl.  The former
should maintain the CXL.io scope while the latter has an expanded scope
with CXL.
 
> > [1] https://lore.kernel.org/linux-cxl/20260311203440.752648-1-mhonap@nvidia.com/  
> 
> Will take a look...

Thanks!

Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2026-03-17 18:19 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-06  8:00 [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets smadhavan
2026-03-06  8:00 ` [PATCH 1/5] PCI: Add CXL DVSEC control, lock, and range register definitions smadhavan
2026-03-06 17:45   ` Alex Williamson
2026-03-07  0:37     ` Srirangan Madhavan
2026-03-10 21:44   ` Dan Williams
2026-03-16 14:02     ` Vishal Aslot
2026-03-06  8:00 ` [PATCH 2/5] cxl: Move HDM decoder and register map definitions to include/cxl/pci.h smadhavan
2026-03-06 17:45   ` Alex Williamson
2026-03-07  0:35     ` Srirangan Madhavan
2026-03-10 16:13       ` Dave Jiang
2026-03-06  8:00 ` [PATCH 3/5] PCI: Add virtual extended cap save buffer for CXL state smadhavan
2026-03-10 21:45   ` Dan Williams
2026-03-06  8:00 ` [PATCH 4/5] PCI: Add cxl DVSEC state save/restore across resets smadhavan
2026-03-06 17:45   ` Alex Williamson
2026-03-12 12:28   ` Jonathan Cameron
2026-03-06  8:00 ` [PATCH 5/5] PCI: Add HDM decoder state save/restore smadhavan
2026-03-10 21:39 ` [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets Dan Williams
2026-03-10 22:46   ` Alex Williamson
2026-03-11  1:45     ` Dan Williams
2026-03-17 14:51       ` Manish Honap
2026-03-17 17:03         ` Dan Williams
2026-03-17 18:19           ` Alex Williamson
2026-03-12 12:34 ` Jonathan Cameron
2026-03-16 13:59   ` Vishal Aslot
2026-03-16 17:28     ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox