* [PATCH v5 1/7] PCI: Add CXL DVSEC reset and capability register definitions
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
@ 2026-03-06 9:23 ` smadhavan
2026-03-06 9:23 ` [PATCH v5 2/7] PCI: Export pci_dev_save_and_disable() and pci_dev_restore() smadhavan
` (6 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: smadhavan @ 2026-03-06 9:23 UTC (permalink / raw)
To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel, Srirangan Madhavan
From: Srirangan Madhavan <smadhavan@nvidia.com>
Add CXL DVSEC register definitions needed for CXL device reset per
CXL r3.2 section 8.1.3.1:
- Capability bits: RST_CAPABLE, CACHE_CAPABLE, CACHE_WBI_CAPABLE,
RST_TIMEOUT, RST_MEM_CLR_CAPABLE
- Control2 register: DISABLE_CACHING, INIT_CACHE_WBI, INIT_CXL_RST,
RST_MEM_CLR_EN
- Status2 register: CACHE_INV, RST_DONE, RST_ERR
- Non-CXL Function Map DVSEC register offset
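The new fields are plain mask arithmetic; a minimal userspace sketch (bit positions copied from the patch, names shortened here for illustration — not kernel code):

```c
#include <stdint.h>

/* Bit positions mirror the patch's PCI_DVSEC_CXL_* capability additions. */
#define CAP_CACHE_CAPABLE        (1u << 0)
#define CAP_CACHE_WBI_CAPABLE    (1u << 6)
#define CAP_RST_CAPABLE          (1u << 7)
#define CAP_RST_TIMEOUT_MASK     (0x7u << 8)   /* bits 10:8 */
#define CAP_RST_TIMEOUT_SHIFT    8
#define CAP_RST_MEM_CLR_CAPABLE  (1u << 11)

/* A device supports CXL Reset only when RST_CAPABLE is set. */
static int cxl_reset_capable(uint16_t cap)
{
	return !!(cap & CAP_RST_CAPABLE);
}

/* Extract the raw 3-bit reset-timeout encoding.  The spec maps this
 * encoding to an actual time; that mapping is not modeled here. */
static unsigned int cxl_reset_timeout_enc(uint16_t cap)
{
	return (cap & CAP_RST_TIMEOUT_MASK) >> CAP_RST_TIMEOUT_SHIFT;
}
```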
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
include/uapi/linux/pci_regs.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 6fdc20d7f5e6..a9dcca54b01c 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1349,12 +1349,25 @@
/* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
#define PCI_DVSEC_CXL_DEVICE 0
#define PCI_DVSEC_CXL_CAP 0xA
+#define PCI_DVSEC_CXL_CACHE_CAPABLE _BITUL(0)
#define PCI_DVSEC_CXL_MEM_CAPABLE _BITUL(2)
#define PCI_DVSEC_CXL_HDM_COUNT __GENMASK(5, 4)
+#define PCI_DVSEC_CXL_CACHE_WBI_CAPABLE _BITUL(6)
+#define PCI_DVSEC_CXL_RST_CAPABLE _BITUL(7)
+#define PCI_DVSEC_CXL_RST_TIMEOUT __GENMASK(10, 8)
+#define PCI_DVSEC_CXL_RST_MEM_CLR_CAPABLE _BITUL(11)
#define PCI_DVSEC_CXL_CTRL 0xC
#define PCI_DVSEC_CXL_MEM_ENABLE _BITUL(2)
#define PCI_DVSEC_CXL_CTRL_RWL 0x5FED
#define PCI_DVSEC_CXL_CTRL2 0x10
+#define PCI_DVSEC_CXL_DISABLE_CACHING _BITUL(0)
+#define PCI_DVSEC_CXL_INIT_CACHE_WBI _BITUL(1)
+#define PCI_DVSEC_CXL_INIT_CXL_RST _BITUL(2)
+#define PCI_DVSEC_CXL_RST_MEM_CLR_EN _BITUL(3)
+#define PCI_DVSEC_CXL_STATUS2 0x12
+#define PCI_DVSEC_CXL_CACHE_INV _BITUL(0)
+#define PCI_DVSEC_CXL_RST_DONE _BITUL(1)
+#define PCI_DVSEC_CXL_RST_ERR _BITUL(2)
#define PCI_DVSEC_CXL_LOCK 0x14
#define PCI_DVSEC_CXL_LOCK_CONFIG _BITUL(0)
#define PCI_DVSEC_CXL_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
@@ -1372,6 +1385,7 @@
/* CXL r4.0, 8.1.4: Non-CXL Function Map DVSEC */
#define PCI_DVSEC_CXL_FUNCTION_MAP 2
+#define PCI_DVSEC_CXL_FUNCTION_MAP_REG 0x0C
/* CXL r4.0, 8.1.5: Extensions DVSEC for Ports */
#define PCI_DVSEC_CXL_PORT 3
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread

* [PATCH v5 2/7] PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
2026-03-06 9:23 ` [PATCH v5 1/7] PCI: Add CXL DVSEC reset and capability register definitions smadhavan
@ 2026-03-06 9:23 ` smadhavan
2026-03-06 9:23 ` [PATCH v5 3/7] cxl: Add memory offlining and cache flush helpers smadhavan
` (5 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: smadhavan @ 2026-03-06 9:23 UTC (permalink / raw)
To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel, Srirangan Madhavan
From: Srirangan Madhavan <smadhavan@nvidia.com>
Export pci_dev_save_and_disable() and pci_dev_restore() so that
subsystems performing non-standard reset sequences (e.g. CXL)
can reuse the PCI core's standard pre-/post-reset lifecycle:
driver reset_prepare/reset_done callbacks, PCI config space
save/restore, and device disable/re-enable.
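As a rough userspace model of the lifecycle being exported (hypothetical struct and flags; the real helpers operate on struct pci_dev and the driver's reset_prepare/reset_done error handlers):

```c
/* Hypothetical stand-in for the pci_dev state the helpers touch. */
struct fake_dev {
	int state_saved;   /* pci_save_state() done */
	int enabled;       /* Command-register decoding enabled */
	int prepared;      /* driver's reset_prepare ran */
	int done;          /* driver's reset_done ran */
};

/* Mirrors pci_dev_save_and_disable(): notify the driver, save config
 * state, then disable the device. */
static void fake_save_and_disable(struct fake_dev *d)
{
	d->prepared = 1;
	d->state_saved = 1;
	d->enabled = 0;
}

/* Mirrors pci_dev_restore(): restore config state, then notify the
 * driver that the reset is complete. */
static void fake_restore(struct fake_dev *d)
{
	d->state_saved = 0;
	d->enabled = 1;
	d->done = 1;
}
```

A subsystem doing a non-standard reset would call the first helper before its reset and the second after, with the device lock held throughout.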
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/pci/pci.c | 21 +++++++++++++++++++--
include/linux/pci.h | 3 +++
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 497720c64d6d..2ef8d7274b30 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5033,7 +5033,15 @@ void pci_dev_unlock(struct pci_dev *dev)
}
EXPORT_SYMBOL_GPL(pci_dev_unlock);
-static void pci_dev_save_and_disable(struct pci_dev *dev)
+/**
+ * pci_dev_save_and_disable - Save device state and disable it
+ * @dev: PCI device to save and disable
+ *
+ * Save the PCI config state, invoke the driver's reset_prepare callback
+ * (if any), and disable the device by writing only INTx Disable to the
+ * Command register. The device lock must be held by the caller.
+ */
+void pci_dev_save_and_disable(struct pci_dev *dev)
{
const struct pci_error_handlers *err_handler =
dev->driver ? dev->driver->err_handler : NULL;
@@ -5066,8 +5074,16 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
*/
pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
}
+EXPORT_SYMBOL_GPL(pci_dev_save_and_disable);
-static void pci_dev_restore(struct pci_dev *dev)
+/**
+ * pci_dev_restore - Restore device state after reset
+ * @dev: PCI device to restore
+ *
+ * Restore the saved PCI configuration state and invoke the driver's
+ * reset_done callback (if any). The device lock must be held by the caller.
+ */
+void pci_dev_restore(struct pci_dev *dev)
{
const struct pci_error_handlers *err_handler =
dev->driver ? dev->driver->err_handler : NULL;
@@ -5084,6 +5100,7 @@ static void pci_dev_restore(struct pci_dev *dev)
else if (dev->driver)
pci_warn(dev, "reset done");
}
+EXPORT_SYMBOL_GPL(pci_dev_restore);
/* dev->reset_methods[] is a 0-terminated list of indices into this array */
const struct pci_reset_fn_method pci_reset_fn_methods[] = {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1c270f1d5123..b229c1d93735 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2007,6 +2007,9 @@ int pci_dev_trylock(struct pci_dev *dev);
void pci_dev_unlock(struct pci_dev *dev);
DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
+void pci_dev_save_and_disable(struct pci_dev *dev);
+void pci_dev_restore(struct pci_dev *dev);
+
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
* a PCI domain is defined to be a set of PCI buses which share
--
2.43.0
* [PATCH v5 3/7] cxl: Add memory offlining and cache flush helpers
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
2026-03-06 9:23 ` [PATCH v5 1/7] PCI: Add CXL DVSEC reset and capability register definitions smadhavan
2026-03-06 9:23 ` [PATCH v5 2/7] PCI: Export pci_dev_save_and_disable() and pci_dev_restore() smadhavan
@ 2026-03-06 9:23 ` smadhavan
2026-03-06 23:34 ` Alex Williamson
2026-03-09 23:01 ` Dave Jiang
2026-03-06 9:23 ` [PATCH v5 4/7] cxl: Add multi-function sibling coordination for CXL reset smadhavan
` (4 subsequent siblings)
7 siblings, 2 replies; 19+ messages in thread
From: smadhavan @ 2026-03-06 9:23 UTC (permalink / raw)
To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel, Srirangan Madhavan
From: Srirangan Madhavan <smadhavan@nvidia.com>
Add infrastructure for quiescing the CXL data path before reset:
- Memory offlining: check if CXL-backed memory is online and offline
it via offline_and_remove_memory() before reset, per CXL
spec requirement to quiesce all CXL.mem transactions before issuing
CXL Reset.
- CPU cache flush: invalidate cache lines before reset
as a safety measure after memory offline.
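The online-System-RAM check gating the offline step amounts to an interval-overlap test over busy RAM ranges; a userspace sketch (the kernel walks the iomem resource tree via walk_iomem_res_desc(), modeled here as a flat array):

```c
#include <stdint.h>

/* Inclusive range, like struct resource. */
struct ram_range {
	uint64_t start, end;
};

/* Return 1 if [start, end] overlaps any online System RAM range --
 * i.e. the memory must be offlined before CXL Reset can be issued. */
static int overlaps_system_ram(const struct ram_range *ram, int n,
			       uint64_t start, uint64_t end)
{
	for (int i = 0; i < n; i++)
		if (start <= ram[i].end && end >= ram[i].start)
			return 1;
	return 0;
}
```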
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/core/pci.c | 110 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 110 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index f96ce884a213..9e6f0c4b3cb6 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -4,6 +4,8 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/device.h>
#include <linux/delay.h>
+#include <linux/memory_hotplug.h>
+#include <linux/memregion.h>
#include <linux/pci.h>
#include <linux/pci-doe.h>
#include <linux/aer.h>
@@ -869,3 +871,111 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
return ctx.count;
}
+
+/*
+ * CXL Reset support - core-provided reset logic for CXL devices.
+ *
+ * These functions implement the CXL reset sequence.
+ */
+
+/*
+ * If CXL memory backed by this decoder is online as System RAM, offline
+ * and remove it per CXL spec requirements before issuing CXL Reset.
+ * Returns 0 if memory was not online or was successfully offlined.
+ */
+static int __maybe_unused cxl_offline_memory(struct device *dev, void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_region *cxlr;
+ struct cxl_region_params *p;
+ int rc;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ cxlr = cxled->cxld.region;
+ if (!cxlr)
+ return 0;
+
+ p = &cxlr->params;
+ if (!p->res)
+ return 0;
+
+ if (walk_iomem_res_desc(IORES_DESC_NONE,
+ IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+ p->res->start, p->res->end, NULL, NULL) <= 0)
+ return 0;
+
+ dev_info(dev, "Offlining CXL memory [%pr] for reset\n", p->res);
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+ rc = offline_and_remove_memory(p->res->start, resource_size(p->res));
+ if (rc) {
+ dev_err(dev,
+ "Failed to offline CXL memory [%pr]: %d\n",
+ p->res, rc);
+ return rc;
+ }
+#else
+ dev_err(dev, "Memory hotremove not supported, cannot offline CXL memory\n");
+ rc = -EOPNOTSUPP;
+ return rc;
+#endif
+
+ return 0;
+}
+
+static int __maybe_unused cxl_reset_prepare_memdev(struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *endpoint;
+ struct device *dev;
+
+ if (!cxlmd || !cxlmd->cxlds)
+ return -ENODEV;
+
+ dev = cxlmd->cxlds->dev;
+ endpoint = cxlmd->endpoint;
+ if (!endpoint)
+ return 0;
+
+ return device_for_each_child(&endpoint->dev, NULL,
+ cxl_offline_memory);
+}
+
+static int __maybe_unused cxl_decoder_flush_cache(struct device *dev, void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_region *cxlr;
+ struct resource *res;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ cxlr = cxled->cxld.region;
+ if (!cxlr || !cxlr->params.res)
+ return 0;
+
+ res = cxlr->params.res;
+ cpu_cache_invalidate_memregion(res->start, resource_size(res));
+ return 0;
+}
+
+static int __maybe_unused cxl_reset_flush_cpu_caches(struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *endpoint;
+
+ if (!cxlmd)
+ return 0;
+
+ endpoint = cxlmd->endpoint;
+ if (!endpoint || IS_ERR(endpoint))
+ return 0;
+
+ if (!cpu_cache_has_invalidate_memregion())
+ return 0;
+
+ device_for_each_child(&endpoint->dev, NULL, cxl_decoder_flush_cache);
+ return 0;
+}
--
2.43.0
* Re: [PATCH v5 3/7] cxl: Add memory offlining and cache flush helpers
2026-03-06 9:23 ` [PATCH v5 3/7] cxl: Add memory offlining and cache flush helpers smadhavan
@ 2026-03-06 23:34 ` Alex Williamson
2026-03-09 23:01 ` Dave Jiang
1 sibling, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2026-03-06 23:34 UTC (permalink / raw)
To: smadhavan
Cc: alex, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
ira.weiny, vishal.l.verma, alison.schofield, dave, jeshuas,
vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
mochs, dschumacher, linux-cxl, linux-pci, linux-kernel
On Fri, 6 Mar 2026 09:23:18 +0000
<smadhavan@nvidia.com> wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Add infrastructure for quiescing the CXL data path before reset:
>
> - Memory offlining: check if CXL-backed memory is online and offline
> it via offline_and_remove_memory() before reset, per CXL
> spec requirement to quiesce all CXL.mem transactions before issuing
> CXL Reset.
> - CPU cache flush: invalidate cache lines before reset
> as a safety measure after memory offline.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/core/pci.c | 110 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 110 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index f96ce884a213..9e6f0c4b3cb6 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -4,6 +4,8 @@
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> +#include <linux/memory_hotplug.h>
> +#include <linux/memregion.h>
> #include <linux/pci.h>
> #include <linux/pci-doe.h>
> #include <linux/aer.h>
> @@ -869,3 +871,111 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
>
> return ctx.count;
> }
> +
> +/*
> + * CXL Reset support - core-provided reset logic for CXL devices.
> + *
> + * These functions implement the CXL reset sequence.
> + */
> +
> +/*
> + * If CXL memory backed by this decoder is online as System RAM, offline
> + * and remove it per CXL spec requirements before issuing CXL Reset.
> + * Returns 0 if memory was not online or was successfully offlined.
> + */
> +static int __maybe_unused cxl_offline_memory(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_region *cxlr;
> + struct cxl_region_params *p;
> + int rc;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + cxlr = cxled->cxld.region;
> + if (!cxlr)
> + return 0;
> +
> + p = &cxlr->params;
> + if (!p->res)
> + return 0;
> +
> + if (walk_iomem_res_desc(IORES_DESC_NONE,
> + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> + p->res->start, p->res->end, NULL, NULL) <= 0)
> + return 0;
> +
> + dev_info(dev, "Offlining CXL memory [%pr] for reset\n", p->res);
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> + rc = offline_and_remove_memory(p->res->start, resource_size(p->res));
> + if (rc) {
> + dev_err(dev,
> + "Failed to offline CXL memory [%pr]: %d\n",
> + p->res, rc);
> + return rc;
> + }
> +#else
> + dev_err(dev, "Memory hotremove not supported, cannot offline CXL memory\n");
> + rc = -EOPNOTSUPP;
> + return rc;
> +#endif
This would be cleaner if we stubbed offline_and_remove_memory() with
-EOPNOTSUPP. Thanks,
Alex
> +
> + return 0;
> +}
> +
> +static int __maybe_unused cxl_reset_prepare_memdev(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint;
> + struct device *dev;
> +
> + if (!cxlmd || !cxlmd->cxlds)
> + return -ENODEV;
> +
> + dev = cxlmd->cxlds->dev;
> + endpoint = cxlmd->endpoint;
> + if (!endpoint)
> + return 0;
> +
> + return device_for_each_child(&endpoint->dev, NULL,
> + cxl_offline_memory);
> +}
> +
> +static int __maybe_unused cxl_decoder_flush_cache(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_region *cxlr;
> + struct resource *res;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + cxlr = cxled->cxld.region;
> + if (!cxlr || !cxlr->params.res)
> + return 0;
> +
> + res = cxlr->params.res;
> + cpu_cache_invalidate_memregion(res->start, resource_size(res));
> + return 0;
> +}
> +
> +static int __maybe_unused cxl_reset_flush_cpu_caches(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint;
> +
> + if (!cxlmd)
> + return 0;
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint || IS_ERR(endpoint))
> + return 0;
> +
> + if (!cpu_cache_has_invalidate_memregion())
> + return 0;
> +
> + device_for_each_child(&endpoint->dev, NULL, cxl_decoder_flush_cache);
> + return 0;
> +}
> --
> 2.43.0
>
>
* Re: [PATCH v5 3/7] cxl: Add memory offlining and cache flush helpers
2026-03-06 9:23 ` [PATCH v5 3/7] cxl: Add memory offlining and cache flush helpers smadhavan
2026-03-06 23:34 ` Alex Williamson
@ 2026-03-09 23:01 ` Dave Jiang
1 sibling, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2026-03-09 23:01 UTC (permalink / raw)
To: smadhavan, bhelgaas, dan.j.williams, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel
On 3/6/26 2:23 AM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Add infrastructure for quiescing the CXL data path before reset:
>
> - Memory offlining: check if CXL-backed memory is online and offline
> it via offline_and_remove_memory() before reset, per CXL
> spec requirement to quiesce all CXL.mem transactions before issuing
> CXL Reset.
> - CPU cache flush: invalidate cache lines before reset
> as a safety measure after memory offline.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/core/pci.c | 110 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 110 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index f96ce884a213..9e6f0c4b3cb6 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -4,6 +4,8 @@
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> +#include <linux/memory_hotplug.h>
> +#include <linux/memregion.h>
> #include <linux/pci.h>
> #include <linux/pci-doe.h>
> #include <linux/aer.h>
> @@ -869,3 +871,111 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
>
> return ctx.count;
> }
> +
> +/*
> + * CXL Reset support - core-provided reset logic for CXL devices.
> + *
> + * These functions implement the CXL reset sequence.
> + */
> +
> +/*
> + * If CXL memory backed by this decoder is online as System RAM, offline
> + * and remove it per CXL spec requirements before issuing CXL Reset.
> + * Returns 0 if memory was not online or was successfully offlined.
> + */
> +static int __maybe_unused cxl_offline_memory(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_region *cxlr;
> + struct cxl_region_params *p;
> + int rc;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + cxlr = cxled->cxld.region;
> + if (!cxlr)
> + return 0;
> +
> + p = &cxlr->params;
> + if (!p->res)
> + return 0;
> +
> + if (walk_iomem_res_desc(IORES_DESC_NONE,
> + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
> + p->res->start, p->res->end, NULL, NULL) <= 0)
This function is performed per endpoint. So if a region is backed by
multiple endpoints, wouldn't this memory offline operation be performed
over the same region on every related endpoint instead of just once?
Maybe a temp xarray during the reset process that keeps track of the
regions that are being hit with reset?
> + return 0;
> +
> + dev_info(dev, "Offlining CXL memory [%pr] for reset\n", p->res);
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> + rc = offline_and_remove_memory(p->res->start, resource_size(p->res));
> + if (rc) {
> + dev_err(dev,
> + "Failed to offline CXL memory [%pr]: %d\n",
> + p->res, rc);
> + return rc;
> + }
> +#else
> + dev_err(dev, "Memory hotremove not supported, cannot offline CXL memory\n");
> + rc = -EOPNOTSUPP;
> + return rc;
> +#endif
Same comment as Alex. ifdefs in C files are not preferred. Maybe a helper function can be used and stubbed out when !CONFIG_MEMORY_HOTREMOVE.
> +
> + return 0;
> +}
> +
> +static int __maybe_unused cxl_reset_prepare_memdev(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint;
> + struct device *dev;
> +
> + if (!cxlmd || !cxlmd->cxlds)
> + return -ENODEV;
> +
> + dev = cxlmd->cxlds->dev;
> + endpoint = cxlmd->endpoint;
> + if (!endpoint)
> + return 0;
> +
> + return device_for_each_child(&endpoint->dev, NULL,
> + cxl_offline_memory);
> +}
> +
> +static int __maybe_unused cxl_decoder_flush_cache(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_region *cxlr;
> + struct resource *res;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + cxlr = cxled->cxld.region;
> + if (!cxlr || !cxlr->params.res)
> + return 0;
> +
> + res = cxlr->params.res;
> + cpu_cache_invalidate_memregion(res->start, resource_size(res));
Same comment as offline memory. Cache being invalidated per region for every decoder. Probably not something you want to do.
DJ
> + return 0;
> +}
> +
> +static int __maybe_unused cxl_reset_flush_cpu_caches(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint;
> +
> + if (!cxlmd)
> + return 0;
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint || IS_ERR(endpoint))
> + return 0;
> +
> + if (!cpu_cache_has_invalidate_memregion())
> + return 0;
> +
> + device_for_each_child(&endpoint->dev, NULL, cxl_decoder_flush_cache);
> + return 0;
> +}
> --
> 2.43.0
>
* [PATCH v5 4/7] cxl: Add multi-function sibling coordination for CXL reset
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
` (2 preceding siblings ...)
2026-03-06 9:23 ` [PATCH v5 3/7] cxl: Add memory offlining and cache flush helpers smadhavan
@ 2026-03-06 9:23 ` smadhavan
2026-03-06 23:34 ` Alex Williamson
2026-03-06 9:23 ` [PATCH v5 5/7] cxl: Add CXL DVSEC reset sequence and flow orchestration smadhavan
` (3 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: smadhavan @ 2026-03-06 9:23 UTC (permalink / raw)
To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel, Srirangan Madhavan
From: Srirangan Madhavan <smadhavan@nvidia.com>
Add sibling PCI function save/disable/restore coordination for CXL
reset. Before reset, all CXL.cachemem sibling functions are locked,
saved, and disabled; after reset they are restored. The Non-CXL Function
Map DVSEC and per-function DVSEC capability register are consulted to
skip non-CXL and CXL.io-only functions. A global mutex serializes
concurrent resets to prevent deadlocks between sibling functions.
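The Non-CXL Function Map indexing referred to above is small enough to show directly; a userspace sketch mirroring the patch's cxl_is_non_cxl_function() (with ARI, the function number indexes a flat bitmap spanning the map registers; without ARI, each function number selects its own register and the device's slot selects the bit):

```c
/* Which 32-bit Function Map register holds the bit for function fn. */
static int func_map_reg(int ari, int fn)
{
	return ari ? fn / 32 : fn;
}

/* Which bit within that register: fn % 32 with ARI, the device's slot
 * number otherwise. */
static int func_map_bit(int ari, int fn, int slot)
{
	return ari ? fn % 32 : slot;
}
```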
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/core/pci.c | 137 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 137 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 9e6f0c4b3cb6..b6f10a2cb404 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -15,6 +15,9 @@
#include "core.h"
#include "trace.h"
+/* Initial sibling array capacity: covers max non-ARI functions per slot */
+#define CXL_RESET_SIBLINGS_INIT 8
+
/**
* DOC: cxl core pci
*
@@ -979,3 +982,137 @@ static int __maybe_unused cxl_reset_flush_cpu_caches(struct cxl_memdev *cxlmd)
device_for_each_child(&endpoint->dev, NULL, cxl_decoder_flush_cache);
return 0;
}
+
+/*
+ * Serialize all CXL reset operations globally.
+ */
+static DEFINE_MUTEX(cxl_reset_mutex);
+
+struct cxl_reset_context {
+ struct pci_dev *target;
+ struct pci_dev **pci_functions;
+ int pci_func_count;
+ int pci_func_cap;
+};
+
+/*
+ * Check if a sibling function is non-CXL using the Non-CXL Function Map
+ * DVSEC. Returns true if fn is listed as non-CXL, false otherwise (including
+ * on any read failure).
+ */
+static bool cxl_is_non_cxl_function(struct pci_dev *pdev,
+ u16 func_map_dvsec, int fn)
+{
+ int reg, bit;
+ u32 map;
+
+ if (pci_ari_enabled(pdev->bus)) {
+ reg = fn / 32;
+ bit = fn % 32;
+ } else {
+ reg = fn;
+ bit = PCI_SLOT(pdev->devfn);
+ }
+
+ if (pci_read_config_dword(pdev,
+ func_map_dvsec + PCI_DVSEC_CXL_FUNCTION_MAP_REG + (reg * 4),
+ &map))
+ return false;
+
+ return map & BIT(bit);
+}
+
+struct cxl_reset_walk_ctx {
+ struct cxl_reset_context *ctx;
+ u16 func_map_dvsec;
+ bool ari;
+};
+
+static int cxl_reset_collect_sibling(struct pci_dev *func, void *data)
+{
+ struct cxl_reset_walk_ctx *wctx = data;
+ struct cxl_reset_context *ctx = wctx->ctx;
+ struct pci_dev *pdev = ctx->target;
+ u16 dvsec, cap;
+ int fn;
+
+ if (func == pdev)
+ return 0;
+
+ if (!wctx->ari &&
+ PCI_SLOT(func->devfn) != PCI_SLOT(pdev->devfn))
+ return 0;
+
+ fn = wctx->ari ? func->devfn : PCI_FUNC(func->devfn);
+ if (wctx->func_map_dvsec &&
+ cxl_is_non_cxl_function(pdev, wctx->func_map_dvsec, fn))
+ return 0;
+
+ /* Only coordinate with siblings that have CXL.cachemem */
+ dvsec = pci_find_dvsec_capability(func, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec)
+ return 0;
+ if (pci_read_config_word(func, dvsec + PCI_DVSEC_CXL_CAP, &cap))
+ return 0;
+ if (!(cap & (PCI_DVSEC_CXL_CACHE_CAPABLE |
+ PCI_DVSEC_CXL_MEM_CAPABLE)))
+ return 0;
+
+ /* Grow sibling array; double capacity for ARI devices when running out of space */
+ if (ctx->pci_func_count >= ctx->pci_func_cap) {
+ struct pci_dev **new;
+ int new_cap = ctx->pci_func_cap ? ctx->pci_func_cap * 2
+ : CXL_RESET_SIBLINGS_INIT;
+
+ new = krealloc(ctx->pci_functions,
+ new_cap * sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return 1;
+ ctx->pci_functions = new;
+ ctx->pci_func_cap = new_cap;
+ }
+
+ pci_dev_get(func);
+ ctx->pci_functions[ctx->pci_func_count++] = func;
+ return 0;
+}
+
+static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
+{
+ struct pci_dev *pdev = ctx->target;
+ struct cxl_reset_walk_ctx wctx;
+ int i;
+
+ ctx->pci_func_count = 0;
+ ctx->pci_functions = NULL;
+ ctx->pci_func_cap = 0;
+
+ wctx.ctx = ctx;
+ wctx.ari = pci_ari_enabled(pdev->bus);
+ wctx.func_map_dvsec = pci_find_dvsec_capability(pdev,
+ PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_FUNCTION_MAP);
+
+ /* Collect CXL.cachemem siblings under pci_bus_sem */
+ pci_walk_bus(pdev->bus, cxl_reset_collect_sibling, &wctx);
+
+ /* Lock and save/disable siblings outside pci_bus_sem */
+ for (i = 0; i < ctx->pci_func_count; i++) {
+ pci_dev_lock(ctx->pci_functions[i]);
+ pci_dev_save_and_disable(ctx->pci_functions[i]);
+ }
+}
+
+static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
+{
+ int i;
+
+ for (i = 0; i < ctx->pci_func_count; i++) {
+ pci_dev_restore(ctx->pci_functions[i]);
+ pci_dev_unlock(ctx->pci_functions[i]);
+ pci_dev_put(ctx->pci_functions[i]);
+ }
+ kfree(ctx->pci_functions);
+ ctx->pci_functions = NULL;
+ ctx->pci_func_count = 0;
+}
--
2.43.0
* Re: [PATCH v5 4/7] cxl: Add multi-function sibling coordination for CXL reset
2026-03-06 9:23 ` [PATCH v5 4/7] cxl: Add multi-function sibling coordination for CXL reset smadhavan
@ 2026-03-06 23:34 ` Alex Williamson
0 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2026-03-06 23:34 UTC (permalink / raw)
To: smadhavan, linux-cxl
Cc: alex, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
ira.weiny, vishal.l.verma, alison.schofield, dave, jeshuas,
vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
mochs, dschumacher, linux-pci, linux-kernel
On Fri, 6 Mar 2026 09:23:19 +0000
<smadhavan@nvidia.com> wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Add sibling PCI function save/disable/restore coordination for CXL
> reset. Before reset, all CXL.cachemem sibling functions are locked,
> saved, and disabled; after reset they are restored. The Non-CXL Function
> Map DVSEC and per-function DVSEC capability register are consulted to
> skip non-CXL and CXL.io-only functions. A global mutex serializes
> concurrent resets to prevent deadlocks between sibling functions.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/core/pci.c | 137 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 137 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 9e6f0c4b3cb6..b6f10a2cb404 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -15,6 +15,9 @@
> #include "core.h"
> #include "trace.h"
>
> +/* Initial sibling array capacity: covers max non-ARI functions per slot */
> +#define CXL_RESET_SIBLINGS_INIT 8
> +
> /**
> * DOC: cxl core pci
> *
> @@ -979,3 +982,137 @@ static int __maybe_unused cxl_reset_flush_cpu_caches(struct cxl_memdev *cxlmd)
> device_for_each_child(&endpoint->dev, NULL, cxl_decoder_flush_cache);
> return 0;
> }
> +
> +/*
> + * Serialize all CXL reset operations globally.
> + */
> +static DEFINE_MUTEX(cxl_reset_mutex);
> +
> +struct cxl_reset_context {
> + struct pci_dev *target;
> + struct pci_dev **pci_functions;
> + int pci_func_count;
> + int pci_func_cap;
> +};
> +
> +/*
> + * Check if a sibling function is non-CXL using the Non-CXL Function Map
> + * DVSEC. Returns true if fn is listed as non-CXL, false otherwise (including
> + * on any read failure).
> + */
> +static bool cxl_is_non_cxl_function(struct pci_dev *pdev,
> + u16 func_map_dvsec, int fn)
I think we can do better, how about:
static bool is_cxl_sibling(struct pci_dev *pdev, struct pci_dev *sibling,
			   unsigned long *non_cxl_func_map)
{
	if (pci_ari_enabled(pdev->bus))
		return !test_bit(sibling->devfn, non_cxl_func_map);

	return PCI_SLOT(pdev->devfn) == PCI_SLOT(sibling->devfn) ?
		!test_bit(PCI_FUNC(sibling->devfn) * 32 +
			  PCI_SLOT(sibling->devfn), non_cxl_func_map) :
		false;
}
To get this we'd eliminate ari and func_map_dvsec from the walk data
and replace them with a bitmap of all 8 non-cxl functions pre-filled
before the walk.
> +{
> + int reg, bit;
> + u32 map;
> +
> + if (pci_ari_enabled(pdev->bus)) {
> + reg = fn / 32;
> + bit = fn % 32;
> + } else {
> + reg = fn;
> + bit = PCI_SLOT(pdev->devfn);
> + }
> +
> + if (pci_read_config_dword(pdev,
> + func_map_dvsec + PCI_DVSEC_CXL_FUNCTION_MAP_REG + (reg * 4),
> + &map))
> + return false;
> +
> + return map & BIT(bit);
> +}
> +
> +struct cxl_reset_walk_ctx {
> + struct cxl_reset_context *ctx;
> + u16 func_map_dvsec;
> + bool ari;
> +};
> +
> +static int cxl_reset_collect_sibling(struct pci_dev *func, void *data)
> +{
> + struct cxl_reset_walk_ctx *wctx = data;
> + struct cxl_reset_context *ctx = wctx->ctx;
> + struct pci_dev *pdev = ctx->target;
> + u16 dvsec, cap;
> + int fn;
> +
> + if (func == pdev)
> + return 0;
> +
> + if (!wctx->ari &&
> + PCI_SLOT(func->devfn) != PCI_SLOT(pdev->devfn))
> + return 0;
> +
> + fn = wctx->ari ? func->devfn : PCI_FUNC(func->devfn);
> + if (wctx->func_map_dvsec &&
> + cxl_is_non_cxl_function(pdev, wctx->func_map_dvsec, fn))
> + return 0;
The above, after the identity check, becomes:
if (!is_cxl_sibling(pdev, func, non_cxl_func_map))
return 0;
> +
> + /* Only coordinate with siblings that have CXL.cachemem */
Is this meant to read as "cache/mem" since we test for either below?
> + dvsec = pci_find_dvsec_capability(func, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!dvsec)
> + return 0;
> + if (pci_read_config_word(func, dvsec + PCI_DVSEC_CXL_CAP, &cap))
> + return 0;
> + if (!(cap & (PCI_DVSEC_CXL_CACHE_CAPABLE |
> + PCI_DVSEC_CXL_MEM_CAPABLE)))
> + return 0;
> +
> + /* Grow sibling array; double capacity for ARI devices when running out of space */
> + if (ctx->pci_func_count >= ctx->pci_func_cap) {
> + struct pci_dev **new;
> + int new_cap = ctx->pci_func_cap ? ctx->pci_func_cap * 2
> + : CXL_RESET_SIBLINGS_INIT;
> +
> + new = krealloc(ctx->pci_functions,
> + new_cap * sizeof(*new), GFP_KERNEL);
> + if (!new)
> + return 1;
> + ctx->pci_functions = new;
> + ctx->pci_func_cap = new_cap;
> + }
> +
> + pci_dev_get(func);
> + ctx->pci_functions[ctx->pci_func_count++] = func;
> + return 0;
> +}
> +
> +static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
> +{
> + struct pci_dev *pdev = ctx->target;
> + struct cxl_reset_walk_ctx wctx;
> + int i;
> +
> + ctx->pci_func_count = 0;
> + ctx->pci_functions = NULL;
> + ctx->pci_func_cap = 0;
> +
> + wctx.ctx = ctx;
> + wctx.ari = pci_ari_enabled(pdev->bus);
> + wctx.func_map_dvsec = pci_find_dvsec_capability(pdev,
> + PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_FUNCTION_MAP);
> +
> + /* Collect CXL.cachemem siblings under pci_bus_sem */
> + pci_walk_bus(pdev->bus, cxl_reset_collect_sibling, &wctx);
> +
> + /* Lock and save/disable siblings outside pci_bus_sem */
> + for (i = 0; i < ctx->pci_func_count; i++) {
> + pci_dev_lock(ctx->pci_functions[i]);
> + pci_dev_save_and_disable(ctx->pci_functions[i]);
> + }
We also need to trigger the pci_dev_reset_iommu_{prepare,done}() hook
around reset.
Also, I think this whole path needs to be able to return an error. We've
got the krealloc in the walk function that can fail and is just
silently ignored, and also the pci_dev_lock() here, where we've
eliminated the PCI bus semaphore issue, but it's still prone to
deadlock. We should use trylock and support unwind when it fails, and
return an errno to the sysfs interface.
An example deadlock: a CXL reset initiated on fn1, where we then try
to lock fn0, fn2, etc. (fn1 already locked) while another caller walks
fn0, fn1, etc. Thanks,
Alex
> +}
> +
> +static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
> +{
> + int i;
> +
> + for (i = 0; i < ctx->pci_func_count; i++) {
> + pci_dev_restore(ctx->pci_functions[i]);
> + pci_dev_unlock(ctx->pci_functions[i]);
> + pci_dev_put(ctx->pci_functions[i]);
> + }
> + kfree(ctx->pci_functions);
> + ctx->pci_functions = NULL;
> + ctx->pci_func_count = 0;
> +}
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v5 5/7] cxl: Add CXL DVSEC reset sequence and flow orchestration
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
` (3 preceding siblings ...)
2026-03-06 9:23 ` [PATCH v5 4/7] cxl: Add multi-function sibling coordination for CXL reset smadhavan
@ 2026-03-06 9:23 ` smadhavan
2026-03-06 23:33 ` Alex Williamson
2026-03-10 0:26 ` Dave Jiang
2026-03-06 9:23 ` [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices smadhavan
` (2 subsequent siblings)
7 siblings, 2 replies; 19+ messages in thread
From: smadhavan @ 2026-03-06 9:23 UTC (permalink / raw)
To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel, Srirangan Madhavan
From: Srirangan Madhavan <smadhavan@nvidia.com>
cxl_dev_reset() implements the hardware reset sequence:
optionally enable memory clear, initiate reset via
CTRL2, wait for completion, and re-enable caching.
cxl_do_reset() orchestrates the full reset flow:
1. CXL pre-reset: mem offlining and cache flush (when memdev present)
2. PCI save/disable: pci_dev_save_and_disable() automatically saves
CXL DVSEC and HDM decoder state via PCI core hooks
3. Sibling coordination: save/disable CXL.cachemem sibling functions
4. Execute CXL DVSEC reset
5. Sibling restore: always runs to re-enable sibling functions
6. PCI restore: pci_dev_restore() automatically restores CXL state
The CXL-specific DVSEC and HDM save/restore is handled
by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/core/pci.c | 181 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 179 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index b6f10a2cb404..c758b3f1b3f9 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1078,7 +1078,7 @@ static int cxl_reset_collect_sibling(struct pci_dev *func, void *data)
return 0;
}
-static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
+static void cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
{
struct pci_dev *pdev = ctx->target;
struct cxl_reset_walk_ctx wctx;
@@ -1103,7 +1103,7 @@ static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_cont
}
}
-static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
+static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
{
int i;
@@ -1116,3 +1116,180 @@ static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context
ctx->pci_functions = NULL;
ctx->pci_func_count = 0;
}
+
+/*
+ * CXL device reset execution
+ */
+static int cxl_dev_reset(struct pci_dev *pdev, int dvsec)
+{
+ static const u32 reset_timeout_ms[] = { 10, 100, 1000, 10000, 100000 };
+ u16 cap, ctrl2, status2;
+ u32 timeout_ms;
+ int rc, idx;
+
+ if (!pci_wait_for_pending_transaction(pdev))
+ pci_err(pdev, "timed out waiting for pending transactions\n");
+
+ rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap);
+ if (rc)
+ return rc;
+
+ rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
+ if (rc)
+ return rc;
+
+ /*
+ * Disable caching and initiate cache writeback+invalidation if the
+ * device supports it. Poll for completion.
+ * Per CXL r3.2 section 9.6, software may use the cache size from
+ * DVSEC CXL Capability2 to compute a suitable timeout; we use a
+ * default of 10ms.
+ */
+ if (cap & PCI_DVSEC_CXL_CACHE_WBI_CAPABLE) {
+ u32 wbi_poll_us = 100;
+ s32 wbi_remaining_us = 10000;
+
+ ctrl2 |= PCI_DVSEC_CXL_DISABLE_CACHING;
+ rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
+ ctrl2);
+ if (rc)
+ return rc;
+
+ ctrl2 |= PCI_DVSEC_CXL_INIT_CACHE_WBI;
+ rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
+ ctrl2);
+ if (rc)
+ return rc;
+
+ do {
+ usleep_range(wbi_poll_us, wbi_poll_us + 1);
+ wbi_remaining_us -= wbi_poll_us;
+ rc = pci_read_config_word(pdev,
+ dvsec + PCI_DVSEC_CXL_STATUS2,
+ &status2);
+ if (rc)
+ return rc;
+ } while (!(status2 & PCI_DVSEC_CXL_CACHE_INV) &&
+ wbi_remaining_us > 0);
+
+ if (!(status2 & PCI_DVSEC_CXL_CACHE_INV)) {
+ pci_err(pdev, "CXL cache WB+I timed out\n");
+ return -ETIMEDOUT;
+ }
+ } else if (cap & PCI_DVSEC_CXL_CACHE_CAPABLE) {
+ ctrl2 |= PCI_DVSEC_CXL_DISABLE_CACHING;
+ rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
+ ctrl2);
+ if (rc)
+ return rc;
+ }
+
+ if (cap & PCI_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
+ rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
+ &ctrl2);
+ if (rc)
+ return rc;
+
+ ctrl2 |= PCI_DVSEC_CXL_RST_MEM_CLR_EN;
+ rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
+ ctrl2);
+ if (rc)
+ return rc;
+ }
+
+ idx = FIELD_GET(PCI_DVSEC_CXL_RST_TIMEOUT, cap);
+ if (idx >= ARRAY_SIZE(reset_timeout_ms))
+ idx = ARRAY_SIZE(reset_timeout_ms) - 1;
+ timeout_ms = reset_timeout_ms[idx];
+
+ rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
+ if (rc)
+ return rc;
+
+ ctrl2 |= PCI_DVSEC_CXL_INIT_CXL_RST;
+ rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, ctrl2);
+ if (rc)
+ return rc;
+
+ msleep(timeout_ms);
+
+ rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_STATUS2,
+ &status2);
+ if (rc)
+ return rc;
+
+ if (status2 & PCI_DVSEC_CXL_RST_ERR) {
+ pci_err(pdev, "CXL reset error\n");
+ return -EIO;
+ }
+
+ if (!(status2 & PCI_DVSEC_CXL_RST_DONE)) {
+ pci_err(pdev, "CXL reset timeout\n");
+ return -ETIMEDOUT;
+ }
+
+ rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
+ if (rc)
+ return rc;
+
+ ctrl2 &= ~PCI_DVSEC_CXL_DISABLE_CACHING;
+ rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, ctrl2);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int match_memdev_by_parent(struct device *dev, const void *parent)
+{
+ return is_cxl_memdev(dev) && dev->parent == parent;
+}
+
+static int cxl_do_reset(struct pci_dev *pdev)
+{
+ struct cxl_reset_context ctx = { .target = pdev };
+ struct cxl_memdev *cxlmd = NULL;
+ struct device *memdev = NULL;
+ int dvsec, rc;
+
+ dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec)
+ return -ENODEV;
+
+ memdev = bus_find_device(&cxl_bus_type, NULL, &pdev->dev,
+ match_memdev_by_parent);
+ if (memdev) {
+ cxlmd = to_cxl_memdev(memdev);
+ guard(device)(&cxlmd->dev);
+ }
+
+ mutex_lock(&cxl_reset_mutex);
+ pci_dev_lock(pdev);
+
+ if (cxlmd) {
+ rc = cxl_reset_prepare_memdev(cxlmd);
+ if (rc)
+ goto out_unlock;
+
+ cxl_reset_flush_cpu_caches(cxlmd);
+ }
+
+ pci_dev_save_and_disable(pdev);
+ cxl_pci_functions_reset_prepare(&ctx);
+
+ rc = cxl_dev_reset(pdev, dvsec);
+
+ cxl_pci_functions_reset_done(&ctx);
+
+ pci_dev_restore(pdev);
+
+out_unlock:
+ pci_dev_unlock(pdev);
+ mutex_unlock(&cxl_reset_mutex);
+
+ if (memdev)
+ put_device(memdev);
+
+ return rc;
+}
--
2.43.0
* Re: [PATCH v5 5/7] cxl: Add CXL DVSEC reset sequence and flow orchestration
2026-03-06 9:23 ` [PATCH v5 5/7] cxl: Add CXL DVSEC reset sequence and flow orchestration smadhavan
@ 2026-03-06 23:33 ` Alex Williamson
2026-03-10 0:26 ` Dave Jiang
1 sibling, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2026-03-06 23:33 UTC (permalink / raw)
To: smadhavan
Cc: alex, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
ira.weiny, vishal.l.verma, alison.schofield, dave, jeshuas,
vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
mochs, dschumacher, linux-cxl, linux-pci, linux-kernel
On Fri, 6 Mar 2026 09:23:20 +0000
<smadhavan@nvidia.com> wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> cxl_dev_reset() implements the hardware reset sequence:
> optionally enable memory clear, initiate reset via
> CTRL2, wait for completion, and re-enable caching.
>
> cxl_do_reset() orchestrates the full reset flow:
> 1. CXL pre-reset: mem offlining and cache flush (when memdev present)
> 2. PCI save/disable: pci_dev_save_and_disable() automatically saves
> CXL DVSEC and HDM decoder state via PCI core hooks
> 3. Sibling coordination: save/disable CXL.cachemem sibling functions
> 4. Execute CXL DVSEC reset
> 5. Sibling restore: always runs to re-enable sibling functions
> 6. PCI restore: pci_dev_restore() automatically restores CXL state
>
> The CXL-specific DVSEC and HDM save/restore is handled
> by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/core/pci.c | 181 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 179 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index b6f10a2cb404..c758b3f1b3f9 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -1078,7 +1078,7 @@ static int cxl_reset_collect_sibling(struct pci_dev *func, void *data)
> return 0;
> }
>
> -static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
> +static void cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
> {
> struct pci_dev *pdev = ctx->target;
> struct cxl_reset_walk_ctx wctx;
> @@ -1103,7 +1103,7 @@ static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_cont
> }
> }
>
> -static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
> +static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
> {
> int i;
>
> @@ -1116,3 +1116,180 @@ static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context
> ctx->pci_functions = NULL;
> ctx->pci_func_count = 0;
> }
> +
> +/*
> + * CXL device reset execution
> + */
> +static int cxl_dev_reset(struct pci_dev *pdev, int dvsec)
> +{
> + static const u32 reset_timeout_ms[] = { 10, 100, 1000, 10000, 100000 };
> + u16 cap, ctrl2, status2;
> + u32 timeout_ms;
> + int rc, idx;
> +
> + if (!pci_wait_for_pending_transaction(pdev))
> + pci_err(pdev, "timed out waiting for pending transactions\n");
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap);
> + if (rc)
> + return rc;
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
> + if (rc)
> + return rc;
> +
> + /*
> + * Disable caching and initiate cache writeback+invalidation if the
> + * device supports it. Poll for completion.
> + * Per CXL r3.2 section 9.6, software may use the cache size from
> + * DVSEC CXL Capability2 to compute a suitable timeout; we use a
> + * default of 10ms.
> + */
> + if (cap & PCI_DVSEC_CXL_CACHE_WBI_CAPABLE) {
> + u32 wbi_poll_us = 100;
> + s32 wbi_remaining_us = 10000;
> +
> + ctrl2 |= PCI_DVSEC_CXL_DISABLE_CACHING;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 |= PCI_DVSEC_CXL_INIT_CACHE_WBI;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> +
> + do {
> + usleep_range(wbi_poll_us, wbi_poll_us + 1);
> + wbi_remaining_us -= wbi_poll_us;
> + rc = pci_read_config_word(pdev,
> + dvsec + PCI_DVSEC_CXL_STATUS2,
> + &status2);
> + if (rc)
> + return rc;
> + } while (!(status2 & PCI_DVSEC_CXL_CACHE_INV) &&
> + wbi_remaining_us > 0);
> +
> + if (!(status2 & PCI_DVSEC_CXL_CACHE_INV)) {
> + pci_err(pdev, "CXL cache WB+I timed out\n");
> + return -ETIMEDOUT;
> + }
> + } else if (cap & PCI_DVSEC_CXL_CACHE_CAPABLE) {
> + ctrl2 |= PCI_DVSEC_CXL_DISABLE_CACHING;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> + }
> +
> + if (cap & PCI_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + &ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 |= PCI_DVSEC_CXL_RST_MEM_CLR_EN;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> + }
> +
> + idx = FIELD_GET(PCI_DVSEC_CXL_RST_TIMEOUT, cap);
> + if (idx >= ARRAY_SIZE(reset_timeout_ms))
> + idx = ARRAY_SIZE(reset_timeout_ms) - 1;
> + timeout_ms = reset_timeout_ms[idx];
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 |= PCI_DVSEC_CXL_INIT_CXL_RST;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, ctrl2);
> + if (rc)
> + return rc;
> +
> + msleep(timeout_ms);
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_STATUS2,
> + &status2);
> + if (rc)
> + return rc;
> +
> + if (status2 & PCI_DVSEC_CXL_RST_ERR) {
> + pci_err(pdev, "CXL reset error\n");
> + return -EIO;
> + }
> +
> + if (!(status2 & PCI_DVSEC_CXL_RST_DONE)) {
> + pci_err(pdev, "CXL reset timeout\n");
> + return -ETIMEDOUT;
> + }
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 &= ~PCI_DVSEC_CXL_DISABLE_CACHING;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, ctrl2);
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> +static int match_memdev_by_parent(struct device *dev, const void *parent)
> +{
> + return is_cxl_memdev(dev) && dev->parent == parent;
> +}
> +
> +static int cxl_do_reset(struct pci_dev *pdev)
> +{
> + struct cxl_reset_context ctx = { .target = pdev };
> + struct cxl_memdev *cxlmd = NULL;
> + struct device *memdev = NULL;
> + int dvsec, rc;
> +
> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!dvsec)
> + return -ENODEV;
> +
> + memdev = bus_find_device(&cxl_bus_type, NULL, &pdev->dev,
> + match_memdev_by_parent);
> + if (memdev) {
> + cxlmd = to_cxl_memdev(memdev);
> + guard(device)(&cxlmd->dev);
> + }
The guard scope ends at the closing brace, aiui.
Also consider whether we could be racing a remove; I think we probably
need a trylock and an error return here.
> +
> + mutex_lock(&cxl_reset_mutex);
> + pci_dev_lock(pdev);
> +
> + if (cxlmd) {
> + rc = cxl_reset_prepare_memdev(cxlmd);
> + if (rc)
> + goto out_unlock;
> +
> + cxl_reset_flush_cpu_caches(cxlmd);
> + }
We're holding device-lock across memory offline, which could take some
time. Is the guard above sufficient that we could consolidate the
offline and flush above the mutex and device lock?
What about memdev devices collected as part of the save_and_disable
with restore below? How do we get to skip offline of that memory?
If we switch to a trylock scheme as noted in 4/, do we still need the
global mutex? Thanks,
Alex
> +
> + pci_dev_save_and_disable(pdev);
> + cxl_pci_functions_reset_prepare(&ctx);
> +
> + rc = cxl_dev_reset(pdev, dvsec);
> +
> + cxl_pci_functions_reset_done(&ctx);
> +
> + pci_dev_restore(pdev);
> +
> +out_unlock:
> + pci_dev_unlock(pdev);
> + mutex_unlock(&cxl_reset_mutex);
> +
> + if (memdev)
> + put_device(memdev);
> +
> + return rc;
> +}
> --
> 2.43.0
>
>
* Re: [PATCH v5 5/7] cxl: Add CXL DVSEC reset sequence and flow orchestration
2026-03-06 9:23 ` [PATCH v5 5/7] cxl: Add CXL DVSEC reset sequence and flow orchestration smadhavan
2026-03-06 23:33 ` Alex Williamson
@ 2026-03-10 0:26 ` Dave Jiang
1 sibling, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2026-03-10 0:26 UTC (permalink / raw)
To: smadhavan, bhelgaas, dan.j.williams, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel
On 3/6/26 2:23 AM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> cxl_dev_reset() implements the hardware reset sequence:
> optionally enable memory clear, initiate reset via
> CTRL2, wait for completion, and re-enable caching.
>
> cxl_do_reset() orchestrates the full reset flow:
> 1. CXL pre-reset: mem offlining and cache flush (when memdev present)
> 2. PCI save/disable: pci_dev_save_and_disable() automatically saves
> CXL DVSEC and HDM decoder state via PCI core hooks
> 3. Sibling coordination: save/disable CXL.cachemem sibling functions
> 4. Execute CXL DVSEC reset
> 5. Sibling restore: always runs to re-enable sibling functions
> 6. PCI restore: pci_dev_restore() automatically restores CXL state
>
> The CXL-specific DVSEC and HDM save/restore is handled
> by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/core/pci.c | 181 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 179 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index b6f10a2cb404..c758b3f1b3f9 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -1078,7 +1078,7 @@ static int cxl_reset_collect_sibling(struct pci_dev *func, void *data)
> return 0;
> }
>
> -static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
> +static void cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
> {
> struct pci_dev *pdev = ctx->target;
> struct cxl_reset_walk_ctx wctx;
> @@ -1103,7 +1103,7 @@ static void __maybe_unused cxl_pci_functions_reset_prepare(struct cxl_reset_cont
> }
> }
>
> -static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
> +static void cxl_pci_functions_reset_done(struct cxl_reset_context *ctx)
> {
> int i;
>
> @@ -1116,3 +1116,180 @@ static void __maybe_unused cxl_pci_functions_reset_done(struct cxl_reset_context
> ctx->pci_functions = NULL;
> ctx->pci_func_count = 0;
> }
> +
> +/*
> + * CXL device reset execution
> + */
> +static int cxl_dev_reset(struct pci_dev *pdev, int dvsec)
> +{
> + static const u32 reset_timeout_ms[] = { 10, 100, 1000, 10000, 100000 };
> + u16 cap, ctrl2, status2;
> + u32 timeout_ms;
> + int rc, idx;
> +
> + if (!pci_wait_for_pending_transaction(pdev))
> + pci_err(pdev, "timed out waiting for pending transactions\n");
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap);
> + if (rc)
> + return rc;
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
> + if (rc)
> + return rc;
> +
> + /*
> + * Disable caching and initiate cache writeback+invalidation if the
> + * device supports it. Poll for completion.
> + * Per CXL r3.2 section 9.6, software may use the cache size from
> + * DVSEC CXL Capability2 to compute a suitable timeout; we use a
> + * default of 10ms.
> + */
> + if (cap & PCI_DVSEC_CXL_CACHE_WBI_CAPABLE) {
> + u32 wbi_poll_us = 100;
> + s32 wbi_remaining_us = 10000;
> +
> + ctrl2 |= PCI_DVSEC_CXL_DISABLE_CACHING;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 |= PCI_DVSEC_CXL_INIT_CACHE_WBI;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> +
> + do {
> + usleep_range(wbi_poll_us, wbi_poll_us + 1);
> + wbi_remaining_us -= wbi_poll_us;
> + rc = pci_read_config_word(pdev,
> + dvsec + PCI_DVSEC_CXL_STATUS2,
> + &status2);
> + if (rc)
> + return rc;
> + } while (!(status2 & PCI_DVSEC_CXL_CACHE_INV) &&
> + wbi_remaining_us > 0);
> +
> + if (!(status2 & PCI_DVSEC_CXL_CACHE_INV)) {
> + pci_err(pdev, "CXL cache WB+I timed out\n");
> + return -ETIMEDOUT;
> + }
> + } else if (cap & PCI_DVSEC_CXL_CACHE_CAPABLE) {
> + ctrl2 |= PCI_DVSEC_CXL_DISABLE_CACHING;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> + }
> +
> + if (cap & PCI_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + &ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 |= PCI_DVSEC_CXL_RST_MEM_CLR_EN;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2,
> + ctrl2);
> + if (rc)
> + return rc;
> + }
> +
> + idx = FIELD_GET(PCI_DVSEC_CXL_RST_TIMEOUT, cap);
> + if (idx >= ARRAY_SIZE(reset_timeout_ms))
> + idx = ARRAY_SIZE(reset_timeout_ms) - 1;
> + timeout_ms = reset_timeout_ms[idx];
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 |= PCI_DVSEC_CXL_INIT_CXL_RST;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, ctrl2);
> + if (rc)
> + return rc;
> +
> + msleep(timeout_ms);
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_STATUS2,
> + &status2);
> + if (rc)
> + return rc;
> +
> + if (status2 & PCI_DVSEC_CXL_RST_ERR) {
> + pci_err(pdev, "CXL reset error\n");
> + return -EIO;
> + }
> +
> + if (!(status2 & PCI_DVSEC_CXL_RST_DONE)) {
> + pci_err(pdev, "CXL reset timeout\n");
> + return -ETIMEDOUT;
> + }
> +
> + rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, &ctrl2);
> + if (rc)
> + return rc;
> +
> + ctrl2 &= ~PCI_DVSEC_CXL_DISABLE_CACHING;
> + rc = pci_write_config_word(pdev, dvsec + PCI_DVSEC_CXL_CTRL2, ctrl2);
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> +static int match_memdev_by_parent(struct device *dev, const void *parent)
> +{
> + return is_cxl_memdev(dev) && dev->parent == parent;
> +}
> +
> +static int cxl_do_reset(struct pci_dev *pdev)
> +{
> + struct cxl_reset_context ctx = { .target = pdev };
> + struct cxl_memdev *cxlmd = NULL;
> + struct device *memdev = NULL;
> + int dvsec, rc;
> +
> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!dvsec)
> + return -ENODEV;
> +
> + memdev = bus_find_device(&cxl_bus_type, NULL, &pdev->dev,
> + match_memdev_by_parent);
You can create a custom __free() function here with memdev.
> + if (memdev) {
> + cxlmd = to_cxl_memdev(memdev);
> + guard(device)(&cxlmd->dev);
> + }
> +
> + mutex_lock(&cxl_reset_mutex);
guard(mutex)(&cxl_reset_mutex)?
> + pci_dev_lock(pdev);
> +
> + if (cxlmd) {
> + rc = cxl_reset_prepare_memdev(cxlmd);
> + if (rc)
> + goto out_unlock;
> +
> + cxl_reset_flush_cpu_caches(cxlmd);
> + }
Can you move the discovery and touching of cxlmd to a helper function? Would that clean things up a bit here?
DJ
> +
> + pci_dev_save_and_disable(pdev);
> + cxl_pci_functions_reset_prepare(&ctx);
> +
> + rc = cxl_dev_reset(pdev, dvsec);
> +
> + cxl_pci_functions_reset_done(&ctx);
> +
> + pci_dev_restore(pdev);
> +
> +out_unlock:
> + pci_dev_unlock(pdev);
> + mutex_unlock(&cxl_reset_mutex);
> +
> + if (memdev)
> + put_device(memdev);
> +
> + return rc;
> +}
> --
> 2.43.0
>
* [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
` (4 preceding siblings ...)
2026-03-06 9:23 ` [PATCH v5 5/7] cxl: Add CXL DVSEC reset sequence and flow orchestration smadhavan
@ 2026-03-06 9:23 ` smadhavan
2026-03-06 23:32 ` Alex Williamson
` (2 more replies)
2026-03-06 9:23 ` [PATCH v5 7/7] Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute smadhavan
2026-03-09 22:37 ` [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices Dave Jiang
7 siblings, 3 replies; 19+ messages in thread
From: smadhavan @ 2026-03-06 9:23 UTC (permalink / raw)
To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel, Srirangan Madhavan
From: Srirangan Madhavan <smadhavan@nvidia.com>
Add a "cxl_reset" sysfs attribute to PCI devices that support CXL
Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on
devices with both CXL.cache and CXL.mem capabilities and the CXL
Reset Capable bit set in the DVSEC.
Writing "1" to the attribute triggers the full CXL reset flow via
cxl_do_reset(). The interface is decoupled from memdev creation:
when a CXL memdev exists, memory offlining and cache flush are
performed; otherwise reset proceeds without the memory management.
The sysfs attribute is managed entirely by the CXL module using
sysfs_create_group() / sysfs_remove_group() rather than the PCI
core's static attribute groups. This avoids cross-module symbol
dependencies between the PCI core (always built-in) and CXL_BUS
(potentially modular).
At module init, existing PCI devices are scanned and a PCI bus
notifier handles hot-plug/unplug. kernfs_drain() makes sure that
any in-flight store() completes before sysfs_remove_group() returns,
preventing use-after-free during module unload.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/pci.c | 113 ++++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/port.c | 3 ++
3 files changed, 118 insertions(+)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 007b8aff0238..edd0389eac52 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -136,6 +136,8 @@ extern struct cxl_rwsem cxl_rwsem;
int cxl_memdev_init(void);
void cxl_memdev_exit(void);
void cxl_mbox_init(void);
+void cxl_reset_sysfs_init(void);
+void cxl_reset_sysfs_exit(void);
enum cxl_poison_trace_type {
CXL_POISON_TRACE_LIST,
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c758b3f1b3f9..3a53d4314f24 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1293,3 +1293,116 @@ static int cxl_do_reset(struct pci_dev *pdev)
return rc;
}
+
+/*
+ * CXL reset sysfs attribute management.
+ *
+ * The cxl_reset attribute is added to PCI devices that advertise CXL Reset
+ * capability. Managed entirely by the CXL module via subsys_interface on
+ * pci_bus_type, avoiding cross-module symbol dependencies between the PCI
+ * core (built-in) and CXL (potentially modular).
+ *
+ * subsys_interface handles existing devices at register time and hot-plug
+ * add/remove automatically. On unregister, remove_dev runs for all tracked
+ * devices under bus core serialization.
+ */
+
+static bool pci_cxl_reset_capable(struct pci_dev *pdev)
+{
+ int dvsec;
+ u16 cap;
+
+ dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec)
+ return false;
+
+ if (pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap))
+ return false;
+
+ if (!(cap & PCI_DVSEC_CXL_CACHE_CAPABLE) ||
+ !(cap & PCI_DVSEC_CXL_MEM_CAPABLE))
+ return false;
+
+ return !!(cap & PCI_DVSEC_CXL_RST_CAPABLE);
+}
+
+static ssize_t cxl_reset_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ int rc;
+
+ if (!sysfs_streq(buf, "1"))
+ return -EINVAL;
+
+ rc = cxl_do_reset(pdev);
+ return rc ? rc : count;
+}
+static DEVICE_ATTR_WO(cxl_reset);
+
+static umode_t cxl_reset_attr_is_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
+
+ if (!pci_cxl_reset_capable(pdev))
+ return 0;
+
+ return a->mode;
+}
+
+static struct attribute *cxl_reset_attrs[] = {
+ &dev_attr_cxl_reset.attr,
+ NULL,
+};
+
+static const struct attribute_group cxl_reset_attr_group = {
+ .attrs = cxl_reset_attrs,
+ .is_visible = cxl_reset_attr_is_visible,
+};
+
+static int cxl_reset_add_dev(struct device *dev,
+ struct subsys_interface *sif)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (!pci_cxl_reset_capable(pdev))
+ return 0;
+
+ return sysfs_create_group(&dev->kobj, &cxl_reset_attr_group);
+}
+
+static void cxl_reset_remove_dev(struct device *dev,
+ struct subsys_interface *sif)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (!pci_cxl_reset_capable(pdev))
+ return;
+
+ sysfs_remove_group(&dev->kobj, &cxl_reset_attr_group);
+}
+
+static struct subsys_interface cxl_reset_interface = {
+ .name = "cxl_reset",
+ .subsys = &pci_bus_type,
+ .add_dev = cxl_reset_add_dev,
+ .remove_dev = cxl_reset_remove_dev,
+};
+
+void cxl_reset_sysfs_init(void)
+{
+ int rc;
+
+ rc = subsys_interface_register(&cxl_reset_interface);
+ if (rc)
+ pr_warn("CXL: failed to register cxl_reset interface (%d)\n",
+ rc);
+}
+
+void cxl_reset_sysfs_exit(void)
+{
+ subsys_interface_unregister(&cxl_reset_interface);
+}
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index b69c2529744c..050dbe63b7fb 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2542,6 +2542,8 @@ static __init int cxl_core_init(void)
if (rc)
goto err_ras;
+ cxl_reset_sysfs_init();
+
return 0;
err_ras:
@@ -2557,6 +2559,7 @@ static __init int cxl_core_init(void)
static void cxl_core_exit(void)
{
+ cxl_reset_sysfs_exit();
cxl_ras_exit();
cxl_region_exit();
bus_unregister(&cxl_bus_type);
--
2.43.0
* Re: [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices
2026-03-06 9:23 ` [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices smadhavan
@ 2026-03-06 23:32 ` Alex Williamson
2026-03-12 13:01 ` Jonathan Cameron
2026-03-14 20:39 ` Krzysztof Wilczyński
2 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2026-03-06 23:32 UTC (permalink / raw)
To: smadhavan
Cc: alex, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
ira.weiny, vishal.l.verma, alison.schofield, dave, jeshuas,
vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
mochs, dschumacher, linux-cxl, linux-pci, linux-kernel
On Fri, 6 Mar 2026 09:23:21 +0000
<smadhavan@nvidia.com> wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Add a "cxl_reset" sysfs attribute to PCI devices that support CXL
> Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on
> devices with both CXL.cache and CXL.mem capabilities and the CXL
> Reset Capable bit set in the DVSEC.
>
> Writing "1" to the attribute triggers the full CXL reset flow via
> cxl_do_reset(). The interface is decoupled from memdev creation:
> when a CXL memdev exists, memory offlining and cache flush are
> performed; otherwise reset proceeds without the memory management.
>
> The sysfs attribute is managed entirely by the CXL module using
> sysfs_create_group() / sysfs_remove_group() rather than the PCI
> core's static attribute groups. This avoids cross-module symbol
> dependencies between the PCI core (always built-in) and CXL_BUS
> (potentially modular).
>
> At module init, existing PCI devices are scanned and a PCI bus
> notifier handles hot-plug/unplug. kernfs_drain() makes sure that
> any in-flight store() completes before sysfs_remove_group() returns,
> preventing use-after-free during module unload.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/core/core.h | 2 +
> drivers/cxl/core/pci.c | 113 ++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/port.c | 3 ++
> 3 files changed, 118 insertions(+)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 007b8aff0238..edd0389eac52 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -136,6 +136,8 @@ extern struct cxl_rwsem cxl_rwsem;
> int cxl_memdev_init(void);
> void cxl_memdev_exit(void);
> void cxl_mbox_init(void);
> +void cxl_reset_sysfs_init(void);
> +void cxl_reset_sysfs_exit(void);
>
> enum cxl_poison_trace_type {
> CXL_POISON_TRACE_LIST,
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index c758b3f1b3f9..3a53d4314f24 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -1293,3 +1293,116 @@ static int cxl_do_reset(struct pci_dev *pdev)
>
> return rc;
> }
> +
> +/*
> + * CXL reset sysfs attribute management.
> + *
> + * The cxl_reset attribute is added to PCI devices that advertise CXL Reset
> + * capability. Managed entirely by the CXL module via subsys_interface on
> + * pci_bus_type, avoiding cross-module symbol dependencies between the PCI
> + * core (built-in) and CXL (potentially modular).
> + *
> + * subsys_interface handles existing devices at register time and hot-plug
> + * add/remove automatically. On unregister, remove_dev runs for all tracked
> + * devices under bus core serialization.
> + */
> +
> +static bool pci_cxl_reset_capable(struct pci_dev *pdev)
> +{
> + int dvsec;
> + u16 cap;
> +
> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!dvsec)
> + return false;
> +
> + if (pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap))
> + return false;
> +
> + if (!(cap & PCI_DVSEC_CXL_CACHE_CAPABLE) ||
> + !(cap & PCI_DVSEC_CXL_MEM_CAPABLE))
> + return false;
> +
> + return !!(cap & PCI_DVSEC_CXL_RST_CAPABLE);
> +}
> +
> +static ssize_t cxl_reset_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> + int rc;
> +
> + if (!sysfs_streq(buf, "1"))
> + return -EINVAL;
This should use kstrtoul like the pci-sysfs interface so it accepts the
same formats. Thanks,
Alex
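For illustration, here is a minimal userspace sketch of what kstrtoul-style
(base-0) parsing accepts compared to sysfs_streq(buf, "1"). The helper name
is hypothetical and the kernel code would simply call kstrtoul(); this just
shows that "1", "0x1" and a trailing newline are all taken alike:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for kstrtoul(buf, 0, &val) as used by
 * the pci-sysfs reset_store() path: base 0 means decimal, 0x-prefixed
 * hex and 0-prefixed octal are all accepted, plus one trailing newline.
 */
static int parse_reset_val(const char *buf, unsigned long *val)
{
	char *end;

	errno = 0;
	*val = strtoul(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;
	/* tolerate a single trailing newline, nothing else */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	return 0;
}
```

With this, "echo 1", "echo 0x1" and "printf 1" all behave the same from the
shell, which is the consistency Alex is asking for.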
> +
> + rc = cxl_do_reset(pdev);
> + return rc ? rc : count;
> +}
> +static DEVICE_ATTR_WO(cxl_reset);
> +
> +static umode_t cxl_reset_attr_is_visible(struct kobject *kobj,
> + struct attribute *a, int n)
> +{
> + struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
> +
> + if (!pci_cxl_reset_capable(pdev))
> + return 0;
> +
> + return a->mode;
> +}
> +
> +static struct attribute *cxl_reset_attrs[] = {
> + &dev_attr_cxl_reset.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group cxl_reset_attr_group = {
> + .attrs = cxl_reset_attrs,
> + .is_visible = cxl_reset_attr_is_visible,
> +};
> +
> +static int cxl_reset_add_dev(struct device *dev,
> + struct subsys_interface *sif)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pci_cxl_reset_capable(pdev))
> + return 0;
> +
> + return sysfs_create_group(&dev->kobj, &cxl_reset_attr_group);
> +}
> +
> +static void cxl_reset_remove_dev(struct device *dev,
> + struct subsys_interface *sif)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pci_cxl_reset_capable(pdev))
> + return;
> +
> + sysfs_remove_group(&dev->kobj, &cxl_reset_attr_group);
> +}
> +
> +static struct subsys_interface cxl_reset_interface = {
> + .name = "cxl_reset",
> + .subsys = &pci_bus_type,
> + .add_dev = cxl_reset_add_dev,
> + .remove_dev = cxl_reset_remove_dev,
> +};
> +
> +void cxl_reset_sysfs_init(void)
> +{
> + int rc;
> +
> + rc = subsys_interface_register(&cxl_reset_interface);
> + if (rc)
> + pr_warn("CXL: failed to register cxl_reset interface (%d)\n",
> + rc);
> +}
> +
> +void cxl_reset_sysfs_exit(void)
> +{
> + subsys_interface_unregister(&cxl_reset_interface);
> +}
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index b69c2529744c..050dbe63b7fb 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -2542,6 +2542,8 @@ static __init int cxl_core_init(void)
> if (rc)
> goto err_ras;
>
> + cxl_reset_sysfs_init();
> +
> return 0;
>
> err_ras:
> @@ -2557,6 +2559,7 @@ static __init int cxl_core_init(void)
>
> static void cxl_core_exit(void)
> {
> + cxl_reset_sysfs_exit();
> cxl_ras_exit();
> cxl_region_exit();
> bus_unregister(&cxl_bus_type);
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices
2026-03-06 9:23 ` [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices smadhavan
2026-03-06 23:32 ` Alex Williamson
@ 2026-03-12 13:01 ` Jonathan Cameron
2026-03-14 20:39 ` Krzysztof Wilczyński
2 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron @ 2026-03-12 13:01 UTC (permalink / raw)
To: smadhavan
Cc: bhelgaas, dan.j.williams, dave.jiang, ira.weiny, vishal.l.verma,
alison.schofield, dave, alwilliamson, jeshuas, vsethi, skancherla,
vaslot, sdonthineni, mhonap, vidyas, jan, mochs, dschumacher,
linux-cxl, linux-pci, linux-kernel
On Fri, 6 Mar 2026 09:23:21 +0000
<smadhavan@nvidia.com> wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Add a "cxl_reset" sysfs attribute to PCI devices that support CXL
> Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on
> devices with both CXL.cache and CXL.mem capabilities and the CXL
> Reset Capable bit set in the DVSEC.
>
> Writing "1" to the attribute triggers the full CXL reset flow via
> cxl_do_reset(). The interface is decoupled from memdev creation:
> when a CXL memdev exists, memory offlining and cache flushing are
> performed; otherwise the reset proceeds without those memory
> management steps.
>
> The sysfs attribute is managed entirely by the CXL module using
> sysfs_create_group() / sysfs_remove_group() rather than the PCI
> core's static attribute groups. This avoids cross-module symbol
> dependencies between the PCI core (always built-in) and CXL_BUS
> (potentially modular).
The side effect is the races that tend to come with dynamic creation
of sysfs attributes. Not sure we can avoid that though.
>
> At module init, subsys_interface registration walks existing PCI
> devices and handles hot-plug add/remove thereafter. kernfs_drain()
> (invoked from sysfs_remove_group()) ensures that any in-flight
> store() completes before removal returns, preventing use-after-free
> during module unload.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
A few trivial things inline.
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index c758b3f1b3f9..3a53d4314f24 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -1293,3 +1293,116 @@ static int cxl_do_reset(struct pci_dev *pdev)
>
> return rc;
> }
> +
> +/*
> + * CXL reset sysfs attribute management.
> + *
> + * The cxl_reset attribute is added to PCI devices that advertise CXL Reset
> + * capability. Managed entirely by the CXL module via subsys_interface on
> + * pci_bus_type, avoiding cross-module symbol dependencies between the PCI
> + * core (built-in) and CXL (potentially modular).
> + *
> + * subsys_interface handles existing devices at register time and hot-plug
> + * add/remove automatically. On unregister, remove_dev runs for all tracked
> + * devices under bus core serialization.
> + */
> +
> +static bool pci_cxl_reset_capable(struct pci_dev *pdev)
> +{
> + int dvsec;
> + u16 cap;
> +
> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!dvsec)
> + return false;
> +
> + if (pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_CAP, &cap))
> + return false;
> +
> + if (!(cap & PCI_DVSEC_CXL_CACHE_CAPABLE) ||
> + !(cap & PCI_DVSEC_CXL_MEM_CAPABLE))
Whilst it's a nonsensical setup to have a CXL device with no
CXL features, is there a reason we need this explicit check?
> + return false;
> +
> + return !!(cap & PCI_DVSEC_CXL_RST_CAPABLE);
Technically the !! is not needed as the implicit conversion to bool
deals with it. If you want to force a 0/1 I'd prefer FIELD_GET().
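To illustrate the suggestion, here is a userspace approximation of
FIELD_GET() semantics; the macro below is a simplified stand-in for the
real one in linux/bitfield.h, and the register values are made up:

```c
#include <stdint.h>

/*
 * Simplified sketch of FIELD_GET(): mask the register, then shift the
 * field down by the mask's lowest set bit. A single-bit mask therefore
 * yields a plain 0 or 1 with no !! needed, and multi-bit fields like
 * the reset timeout come out right-justified.
 */
#define FIELD_GET_SKETCH(mask, reg) \
	(((reg) & (mask)) >> __builtin_ctz(mask))

#define CXL_RST_CAPABLE	(1u << 7)	/* PCI_DVSEC_CXL_RST_CAPABLE */
#define CXL_RST_TIMEOUT	(0x7u << 8)	/* PCI_DVSEC_CXL_RST_TIMEOUT */
```

So `return FIELD_GET(PCI_DVSEC_CXL_RST_CAPABLE, cap);` reads as "extract
this field" rather than "truthiness of a masked value".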
> +}
> +
> +static ssize_t cxl_reset_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> + int rc;
> +
> + if (!sysfs_streq(buf, "1"))
> + return -EINVAL;
> +
> + rc = cxl_do_reset(pdev);
> + return rc ? rc : count;
> +}
> +static DEVICE_ATTR_WO(cxl_reset);
> +
> +static umode_t cxl_reset_attr_is_visible(struct kobject *kobj,
> + struct attribute *a, int n)
> +{
> + struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
> +
> + if (!pci_cxl_reset_capable(pdev))
> + return 0;
> +
> + return a->mode;
> +}
> +
> +static struct attribute *cxl_reset_attrs[] = {
> + &dev_attr_cxl_reset.attr,
> + NULL,
No comma on a terminating entry. We don't want to make it easy to
add stuff after this!
> +};
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices
2026-03-06 9:23 ` [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices smadhavan
2026-03-06 23:32 ` Alex Williamson
2026-03-12 13:01 ` Jonathan Cameron
@ 2026-03-14 20:39 ` Krzysztof Wilczyński
2 siblings, 0 replies; 19+ messages in thread
From: Krzysztof Wilczyński @ 2026-03-14 20:39 UTC (permalink / raw)
To: smadhavan
Cc: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave, alwilliamson, jeshuas,
vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
mochs, dschumacher, linux-cxl, linux-pci, linux-kernel
Hello,
> +static ssize_t cxl_reset_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> + int rc;
> +
> + if (!sysfs_streq(buf, "1"))
> + return -EINVAL;
Not sure what the pattern/approach would be for CXL sysfs entries, but
perhaps using kstrtobool() would work here? It handles all the boolean
forms a user could pass.
Thank you,
Krzysztof
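For reference, a userspace approximation of kstrtobool()'s acceptance
rules; this is written from memory of lib/kstrtox.c, so treat the exact
accepted set as an assumption rather than a quote of the kernel code:

```c
#include <ctype.h>
#include <errno.h>

/*
 * Approximation of the kernel's kstrtobool(): the first character
 * decides ('y'/'Y'/'1' true, 'n'/'N'/'0' false), "on"/"off" are
 * matched on their first two characters, anything else is -EINVAL.
 * Trailing characters (e.g. the newline from echo) are ignored.
 */
static int kstrtobool_sketch(const char *s, int *res)
{
	if (!s || !s[0])
		return -EINVAL;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = 1;
		return 0;
	case 'n': case 'N': case '0':
		*res = 0;
		return 0;
	case 'o': case 'O':
		switch (tolower((unsigned char)s[1])) {
		case 'n':
			*res = 1;
			return 0;
		case 'f':
			*res = 0;
			return 0;
		}
		return -EINVAL;
	}
	return -EINVAL;
}
```

Under these rules "echo 1", "echo y" and "echo on" would all trigger the
reset, which may or may not be desirable for a destructive attribute.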
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v5 7/7] Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
` (5 preceding siblings ...)
2026-03-06 9:23 ` [PATCH v5 6/7] cxl: Add cxl_reset sysfs interface for PCI devices smadhavan
@ 2026-03-06 9:23 ` smadhavan
2026-03-06 23:32 ` Alex Williamson
2026-03-09 22:37 ` [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices Dave Jiang
7 siblings, 1 reply; 19+ messages in thread
From: smadhavan @ 2026-03-06 9:23 UTC (permalink / raw)
To: bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel, Srirangan Madhavan
From: Srirangan Madhavan <smadhavan@nvidia.com>
Document the cxl_reset sysfs attribute added to PCI devices that
support CXL Reset.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
Documentation/ABI/testing/sysfs-bus-pci | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index b767db2c52cb..d67c733626b8 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -174,6 +174,28 @@ Description:
similiar to writing 1 to their individual "reset" file, so use
with caution.
+What: /sys/bus/pci/devices/.../cxl_reset
+Date: February 2026
+Contact: linux-cxl@vger.kernel.org
+Description:
+ This attribute is only visible when the device advertises
+ CXL Reset Capable in the CXL DVSEC Capability register
+ (CXL r3.2, section 8.1.3).
+
+ Writing 1 to this file triggers a CXL device reset which
+ affects CXL.cache and CXL.mem state on all CXL functions
+ (i.e. those not listed in the Non-CXL Function Map DVSEC,
+ section 8.1.4), not just CXL.io/PCIe state. This is
+ separate from the standard PCI reset interface because CXL
+ Reset has different scope.
+
+ The reset will fail with -EBUSY if any CXL regions using this
+ device have drivers bound. Active regions are torn down as
+ part of the reset sequence.
+
+ This attribute is registered by the CXL core when a CXL device
+ is discovered, independent of which driver binds the PCI device.
+
What: /sys/bus/pci/devices/.../vpd
Date: February 2008
Contact: Ben Hutchings <bwh@kernel.org>
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v5 7/7] Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute
2026-03-06 9:23 ` [PATCH v5 7/7] Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute smadhavan
@ 2026-03-06 23:32 ` Alex Williamson
0 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2026-03-06 23:32 UTC (permalink / raw)
To: smadhavan
Cc: alex, bhelgaas, dan.j.williams, dave.jiang, jonathan.cameron,
ira.weiny, vishal.l.verma, alison.schofield, dave, jeshuas,
vsethi, skancherla, vaslot, sdonthineni, mhonap, vidyas, jan,
mochs, dschumacher, linux-cxl, linux-pci, linux-kernel
On Fri, 6 Mar 2026 09:23:22 +0000
<smadhavan@nvidia.com> wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Document the cxl_reset sysfs attribute added to PCI devices that
> support CXL Reset.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> Documentation/ABI/testing/sysfs-bus-pci | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index b767db2c52cb..d67c733626b8 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -174,6 +174,28 @@ Description:
> similiar to writing 1 to their individual "reset" file, so use
> with caution.
>
> +What: /sys/bus/pci/devices/.../cxl_reset
> +Date: February 2026
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + This attribute is only visible when the device advertises
> + CXL Reset Capable in the CXL DVSEC Capability register
> + (CXL r3.2, section 8.1.3).
> +
> + Writing 1 to this file triggers a CXL device reset which
> + affects CXL.cache and CXL.mem state on all CXL functions
> + (i.e. those not listed in the Non-CXL Function Map DVSEC,
> + section 8.1.4), not just CXL.io/PCIe state. This is
> + separate from the standard PCI reset interface because CXL
> + Reset has different scope.
> +
> + The reset will fail with -EBUSY if any CXL regions using this
> + device have drivers bound. Active regions are torn down as
> + part of the reset sequence.
There's no such test afaict. Thanks,
Alex
> +
> + This attribute is registered by the CXL core when a CXL device
> + is discovered, independent of which driver binds the PCI device.
> +
> What: /sys/bus/pci/devices/.../vpd
> Date: February 2008
> Contact: Ben Hutchings <bwh@kernel.org>
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices
2026-03-06 9:23 [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices smadhavan
` (6 preceding siblings ...)
2026-03-06 9:23 ` [PATCH v5 7/7] Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute smadhavan
@ 2026-03-09 22:37 ` Dave Jiang
2026-03-09 22:40 ` Dave Jiang
7 siblings, 1 reply; 19+ messages in thread
From: Dave Jiang @ 2026-03-09 22:37 UTC (permalink / raw)
To: smadhavan, bhelgaas, dan.j.williams, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel
On 3/6/26 2:23 AM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Hi folks!
>
> This patch series introduces support for the CXL Reset method for CXL
> Type 2 devices, implementing the reset procedure outlined in CXL Spec [1]
> v3.2, Sections 8.1.3, 9.6 and 9.7.
>
> v5 changes (from v4):
> - Rebased on v7.0-rc1 and applied fixes from the review v4.
> - Added CXL DVSEC and HDM save/restore as a prerequisite series [2]
> - Switched from PCI reset method to sysfs
> interface at /sys/bus/pci/devices/.../cxl_reset (Dan, Alex)
> - Removed all PCI core changes - reset logic stays in CXL driver
> - Use cpu_cache_invalidate_memregion() instead of arch-specific code
> - Removed CONFIG_X86/CONFIG_ARM64 ifdefs
> - Added ABI documentation for sysfs interface
>
> v4 changes:
> - Fix CXL reset capability check parentheses warning
> - Gate CXL reset path on CONFIG_CXL_PCI reachability
>
> v3 changes:
> - Restrict CXL reset to Type 2 devices only
> - Add host and device cache flushing for sibling functions and region peers
> - Add region teardown and memory online detection before reset
> - Add configuration state save/restore (DVSEC, HDM, IDE)
> - Split the series by subsystem and functional blocks
>
> Motivation:
> -----------
> - As support for Type 2 devices [6] is being introduced, more devices will
> require finer-grained reset mechanisms beyond bus-wide reset methods.
>
> - FLR does not affect CXL.cache or CXL.mem protocols, making CXL Reset
> the preferred method in some cases.
>
> - The CXL spec (Sections 7.2.3 Binding and Unbinding, 9.5 FLR) highlights use
> cases like function rebinding and error recovery, where CXL Reset is
> explicitly mentioned.
>
> ABI Change reasoning (v5):
> -------------------------
> Previous versions (v1-v4) integrated CXL reset as a new PCI reset method
> in pci_reset_methods[]. Based on feedback from Dan Williams and Alex
> Williamson, v5 switches to a sysfs-based approach.
>
> The key reasoning is that CXL Reset has a broader scope than
> existing PCI reset methods, and mixing them in the same reset
> infrastructure causes problems. v5 therefore selectively exposes a
> cxl_reset attribute in pci-sysfs and leaves the existing reset
> interface unaffected.
>
> Change Description:
> -------------------
>
> Patch 1: PCI: Add CXL DVSEC reset and capability register definitions
> - Add reset and cache control bit definitions to pci_regs.h
>
> Patch 2: PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
> - Export for sibling function save/restore during CXL reset
>
> Patch 3: cxl: Add memory offlining and cache flush helpers
> - Offline CXL memory regions before reset
> - Flush CPU caches using cpu_cache_invalidate_memregion()
>
> Patch 4: cxl: Add multi-function sibling coordination for CXL reset
> - Identify CXL.cachemem sibling functions via Non-CXL Function Map DVSEC
> - Save/disable and restore sibling PCI functions around reset
>
> Patch 5: cxl: Add CXL DVSEC reset sequence and flow orchestration
> - Implement cxl_dev_reset() to trigger reset via DVSEC
> - Poll for reset completion with timeout
> - cxl_do_reset() orchestrates the complete reset sequence with
> proper locking and error handling
>
> Patch 6: cxl: Add cxl_reset sysfs interface for PCI devices
> - Expose /sys/bus/pci/devices/.../cxl_reset
> - Only visible for devices with Reset Capable bit set
> - Write "1" to trigger reset
>
> Patch 7: Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute
> - Document the new sysfs interface
> - Explain scope, visibility, and error conditions
>
> Dependencies:
> -------------
>
> This series depends on:
> [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
> https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/T/#t
>
> The cpu_cache_invalidate_memregion() call used for CPU cache flush currently
> has support on x86. ARM64 support will be addressed in a separate RFC.
>
> Command line to test the CXL reset on a capable device:
> echo 1 > /sys/bus/pci/devices/<pci_device>/cxl_reset
>
> Basic cxl_reset testing was done on a CXL Type-2 device: writing to the
> sysfs attribute, exercising the DVSEC reset sequence including WB+I and
> init reset, restore. Further testing is in progress.
>
> This series is based on v7.0-rc1.
>
> Srirangan Madhavan (7):
> PCI: Add CXL DVSEC reset and capability register definitions
> PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
> cxl: Add memory offlining and cache flush helpers
> cxl: Add multi-function sibling coordination for CXL reset
> cxl: Add CXL DVSEC reset sequence and flow orchestration
> cxl: Add cxl_reset sysfs interface for PCI devices
> Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute
>
> Documentation/ABI/testing/sysfs-bus-pci | 22 +
> drivers/cxl/core/core.h | 2 +
> drivers/cxl/core/pci.c | 537 ++++++++++++++++++++++++
> drivers/cxl/core/port.c | 3 +
> drivers/pci/pci.c | 21 +-
> include/linux/pci.h | 3 +
> include/uapi/linux/pci_regs.h | 14 +
> 7 files changed, 600 insertions(+), 2 deletions(-)
>
> base-commit: 6de23f81a5e0
The commit is 7.0-rc1. But b4 shazam seems to fail when attempting to apply.
Applying: PCI: Add CXL DVSEC reset and capability register definitions
Patch failed at 0001 PCI: Add CXL DVSEC reset and capability register definitions
error: patch failed: include/uapi/linux/pci_regs.h:1349
error: include/uapi/linux/pci_regs.h: patch does not apply
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices
2026-03-09 22:37 ` [PATCH v5 0/7] CXL: Add cxl_reset sysfs attribute for PCI devices Dave Jiang
@ 2026-03-09 22:40 ` Dave Jiang
0 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2026-03-09 22:40 UTC (permalink / raw)
To: smadhavan, bhelgaas, dan.j.williams, jonathan.cameron, ira.weiny,
vishal.l.verma, alison.schofield, dave
Cc: alwilliamson, jeshuas, vsethi, skancherla, vaslot, sdonthineni,
mhonap, vidyas, jan, mochs, dschumacher, linux-cxl, linux-pci,
linux-kernel
On 3/9/26 3:37 PM, Dave Jiang wrote:
>
>
> On 3/6/26 2:23 AM, smadhavan@nvidia.com wrote:
>> From: Srirangan Madhavan <smadhavan@nvidia.com>
>>
>> [...]
>>
>> base-commit: 6de23f81a5e0
>
> The commit is 7.0-rc1. But b4 shazam seems to fail when attempting to apply.
>
> Applying: PCI: Add CXL DVSEC reset and capability register definitions
> Patch failed at 0001 PCI: Add CXL DVSEC reset and capability register definitions
> error: patch failed: include/uapi/linux/pci_regs.h:1349
> error: include/uapi/linux/pci_regs.h: patch does not apply
>
nm. I need to apply the save/restore series first. The base-commit should not be the bare v7.0-rc1 commit.
>
>> --
>> 2.43.0
>>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread