* [PATCH v4 01/10] cxl: move DVSEC defines to cxl pci header
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 10:31 ` Jonathan Cameron
2026-01-20 22:26 ` [PATCH v4 02/10] PCI: switch CXL port DVSEC defines smadhavan
` (11 subsequent siblings)
12 siblings, 1 reply; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
CXL DVSEC definitions are shared across PCI core and CXL drivers, so
move the register macros into the common CXL PCI header. This keeps
the DVSEC surface in one place and avoids duplication as the reset and
config helpers build on these offsets and bitfields.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/core/pci.c | 1 +
drivers/cxl/core/regs.c | 1 +
drivers/cxl/cxlpci.h | 53 -----------------------------------
drivers/cxl/pci.c | 1 +
include/cxl/pci.h | 62 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 65 insertions(+), 53 deletions(-)
create mode 100644 include/cxl/pci.h
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 5b023a0178a4..968babcc09a2 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -7,6 +7,7 @@
#include <linux/pci.h>
#include <linux/pci-doe.h>
#include <linux/aer.h>
+#include <cxl/pci.h>
#include <cxlpci.h>
#include <cxlmem.h>
#include <cxl.h>
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 5ca7b0eed568..ecdb22ae6952 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -4,6 +4,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pci.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <pmu.h>
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 1d526bea8431..cdb7cf3dbcb4 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -7,59 +7,6 @@
#define CXL_MEMORY_PROGIF 0x10
-/*
- * See section 8.1 Configuration Space Registers in the CXL 2.0
- * Specification. Names are taken straight from the specification with "CXL" and
- * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
- */
-#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
-
-/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
-#define CXL_DVSEC_PCIE_DEVICE 0
-#define CXL_DVSEC_CAP_OFFSET 0xA
-#define CXL_DVSEC_MEM_CAPABLE BIT(2)
-#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
-#define CXL_DVSEC_CTRL_OFFSET 0xC
-#define CXL_DVSEC_MEM_ENABLE BIT(2)
-#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
-#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
-#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
-#define CXL_DVSEC_MEM_ACTIVE BIT(1)
-#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
-#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
-#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
-#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
-
-#define CXL_DVSEC_RANGE_MAX 2
-
-/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
-#define CXL_DVSEC_FUNCTION_MAP 2
-
-/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
-#define CXL_DVSEC_PORT_EXTENSIONS 3
-
-/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
-#define CXL_DVSEC_PORT_GPF 4
-#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
-#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
-
-/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
-#define CXL_DVSEC_DEVICE_GPF 5
-
-/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
-#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
-
-/* CXL 2.0 8.1.9: Register Locator DVSEC */
-#define CXL_DVSEC_REG_LOCATOR 8
-#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
-#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
-#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
-
/*
* NOTE: Currently all the functions which are enabled for CXL require their
* vectors to be in the first 16. Use this as the default max.
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 3b2293dffb3f..55c767df4543 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -12,6 +12,7 @@
#include <linux/aer.h>
#include <linux/io.h>
#include <cxl/mailbox.h>
+#include <cxl/pci.h>
#include "cxlmem.h"
#include "cxlpci.h"
#include "cxl.h"
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
new file mode 100644
index 000000000000..728ba0cdd289
--- /dev/null
+++ b/include/cxl/pci.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+
+#ifndef __CXL_ACCEL_PCI_H
+#define __CXL_ACCEL_PCI_H
+
+/*
+ * See section 8.1 Configuration Space Registers in the CXL 2.0
+ * Specification. Names are taken straight from the specification with "CXL" and
+ * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
+ */
+#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
+
+/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
+#define CXL_DVSEC_PCIE_DEVICE 0
+#define CXL_DVSEC_CAP_OFFSET 0xA
+#define CXL_DVSEC_MEM_CAPABLE BIT(2)
+#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
+#define CXL_DVSEC_CTRL_OFFSET 0xC
+#define CXL_DVSEC_MEM_ENABLE BIT(2)
+#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
+#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
+#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
+#define CXL_DVSEC_MEM_ACTIVE BIT(1)
+#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
+#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + ((i) * 0x10))
+#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + ((i) * 0x10))
+#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
+
+#define CXL_DVSEC_RANGE_MAX 2
+
+/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
+#define CXL_DVSEC_FUNCTION_MAP 2
+
+/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
+#define CXL_DVSEC_PORT_EXTENSIONS 3
+#define CXL_DVSEC_PORT_CTL 0xC
+#define CXL_DVSEC_UNMASK_SBR BIT(0)
+
+/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
+#define CXL_DVSEC_PORT_GPF 4
+#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
+#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
+#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
+#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
+#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
+#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
+
+/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
+#define CXL_DVSEC_DEVICE_GPF 5
+
+/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
+#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
+
+/* CXL 2.0 8.1.9: Register Locator DVSEC */
+#define CXL_DVSEC_REG_LOCATOR 8
+#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
+#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
+#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
+#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
+
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread

* Re: [PATCH v4 01/10] cxl: move DVSEC defines to cxl pci header
2026-01-20 22:26 ` [PATCH v4 01/10] cxl: move DVSEC defines to cxl pci header smadhavan
@ 2026-01-21 10:31 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 10:31 UTC (permalink / raw)
To: smadhavan
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 22:26:01 +0000
smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> CXL DVSEC definitions are shared across PCI core and CXL drivers, so
> move the register macros into the common CXL PCI header. This keeps
> the DVSEC surface in one place and avoids duplication as the reset and
> config helpers build on these offsets and bitfields.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
These are moving in:
https://lore.kernel.org/all/20260114182055.46029-2-terry.bowman@amd.com/
However it is to the main uapi/linux/pci_regs.h file.
*fingers crossed* that should land this cycle and simplify your set a little.
Jonathan
> ---
> drivers/cxl/core/pci.c | 1 +
> drivers/cxl/core/regs.c | 1 +
> drivers/cxl/cxlpci.h | 53 -----------------------------------
> drivers/cxl/pci.c | 1 +
> include/cxl/pci.h | 62 +++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 65 insertions(+), 53 deletions(-)
> create mode 100644 include/cxl/pci.h
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 5b023a0178a4..968babcc09a2 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -7,6 +7,7 @@
> #include <linux/pci.h>
> #include <linux/pci-doe.h>
> #include <linux/aer.h>
> +#include <cxl/pci.h>
> #include <cxlpci.h>
> #include <cxlmem.h>
> #include <cxl.h>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 5ca7b0eed568..ecdb22ae6952 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -4,6 +4,7 @@
> #include <linux/device.h>
> #include <linux/slab.h>
> #include <linux/pci.h>
> +#include <cxl/pci.h>
> #include <cxlmem.h>
> #include <cxlpci.h>
> #include <pmu.h>
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 1d526bea8431..cdb7cf3dbcb4 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -7,59 +7,6 @@
>
> #define CXL_MEMORY_PROGIF 0x10
>
> -/*
> - * See section 8.1 Configuration Space Registers in the CXL 2.0
> - * Specification. Names are taken straight from the specification with "CXL" and
> - * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
> - */
> -#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> -
> -/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
> -#define CXL_DVSEC_PCIE_DEVICE 0
> -#define CXL_DVSEC_CAP_OFFSET 0xA
> -#define CXL_DVSEC_MEM_CAPABLE BIT(2)
> -#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
> -#define CXL_DVSEC_CTRL_OFFSET 0xC
> -#define CXL_DVSEC_MEM_ENABLE BIT(2)
> -#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
> -#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
> -#define CXL_DVSEC_MEM_ACTIVE BIT(1)
> -#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> -#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
> -#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
> -#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
> -
> -#define CXL_DVSEC_RANGE_MAX 2
> -
> -/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
> -#define CXL_DVSEC_FUNCTION_MAP 2
> -
> -/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
> -#define CXL_DVSEC_PORT_EXTENSIONS 3
> -
> -/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
> -#define CXL_DVSEC_PORT_GPF 4
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
> -#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
> -
> -/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
> -#define CXL_DVSEC_DEVICE_GPF 5
> -
> -/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
> -#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
> -
> -/* CXL 2.0 8.1.9: Register Locator DVSEC */
> -#define CXL_DVSEC_REG_LOCATOR 8
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
> -#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> -#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
> -
> /*
> * NOTE: Currently all the functions which are enabled for CXL require their
> * vectors to be in the first 16. Use this as the default max.
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 3b2293dffb3f..55c767df4543 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -12,6 +12,7 @@
> #include <linux/aer.h>
> #include <linux/io.h>
> #include <cxl/mailbox.h>
> +#include <cxl/pci.h>
> #include "cxlmem.h"
> #include "cxlpci.h"
> #include "cxl.h"
> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
> new file mode 100644
> index 000000000000..728ba0cdd289
> --- /dev/null
> +++ b/include/cxl/pci.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +
> +#ifndef __CXL_ACCEL_PCI_H
> +#define __CXL_ACCEL_PCI_H
> +
> +/*
> + * See section 8.1 Configuration Space Registers in the CXL 2.0
> + * Specification. Names are taken straight from the specification with "CXL" and
> + * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
> + */
> +#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
> +
> +/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
> +#define CXL_DVSEC_PCIE_DEVICE 0
> +#define CXL_DVSEC_CAP_OFFSET 0xA
> +#define CXL_DVSEC_MEM_CAPABLE BIT(2)
> +#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
> +#define CXL_DVSEC_CTRL_OFFSET 0xC
> +#define CXL_DVSEC_MEM_ENABLE BIT(2)
> +#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
> +#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
> +#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
> +#define CXL_DVSEC_MEM_ACTIVE BIT(1)
> +#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
> +#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + ((i) * 0x10))
> +#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + ((i) * 0x10))
> +#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
> +
> +#define CXL_DVSEC_RANGE_MAX 2
> +
> +/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
> +#define CXL_DVSEC_FUNCTION_MAP 2
> +
> +/* CXL 2.0 8.1.5: CXL 2.0 Extensions DVSEC for Ports */
> +#define CXL_DVSEC_PORT_EXTENSIONS 3
> +#define CXL_DVSEC_PORT_CTL 0xC
> +#define CXL_DVSEC_UNMASK_SBR BIT(0)
> +
> +/* CXL 2.0 8.1.6: GPF DVSEC for CXL Port */
> +#define CXL_DVSEC_PORT_GPF 4
> +#define CXL_DVSEC_PORT_GPF_PHASE_1_CONTROL_OFFSET 0x0C
> +#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_BASE_MASK GENMASK(3, 0)
> +#define CXL_DVSEC_PORT_GPF_PHASE_1_TMO_SCALE_MASK GENMASK(11, 8)
> +#define CXL_DVSEC_PORT_GPF_PHASE_2_CONTROL_OFFSET 0xE
> +#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_BASE_MASK GENMASK(3, 0)
> +#define CXL_DVSEC_PORT_GPF_PHASE_2_TMO_SCALE_MASK GENMASK(11, 8)
> +
> +/* CXL 2.0 8.1.7: GPF DVSEC for CXL Device */
> +#define CXL_DVSEC_DEVICE_GPF 5
> +
> +/* CXL 2.0 8.1.8: PCIe DVSEC for Flex Bus Port */
> +#define CXL_DVSEC_PCIE_FLEXBUS_PORT 7
> +
> +/* CXL 2.0 8.1.9: Register Locator DVSEC */
> +#define CXL_DVSEC_REG_LOCATOR 8
> +#define CXL_DVSEC_REG_LOCATOR_BLOCK1_OFFSET 0xC
> +#define CXL_DVSEC_REG_LOCATOR_BIR_MASK GENMASK(2, 0)
> +#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> +#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
> +
> +#endif
> --
> 2.34.1
>
>
* [PATCH v4 02/10] PCI: switch CXL port DVSEC defines
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
2026-01-20 22:26 ` [PATCH v4 01/10] cxl: move DVSEC defines to cxl pci header smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 10:34 ` Jonathan Cameron
2026-01-20 22:26 ` [PATCH v4 03/10] cxl: add type 2 helper and reset DVSEC bits smadhavan
` (10 subsequent siblings)
12 siblings, 1 reply; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
The PCI core consumes CXL port DVSEC fields for reset handling, so
switch it to use the shared CXL PCI header instead of the uapi header.
This aligns the core with the header split and keeps internal code from
depending on uapi-only definitions.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/pci/pci.c | 17 +++++++++--------
include/uapi/linux/pci_regs.h | 5 -----
2 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..8bb07e253646 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -30,6 +30,7 @@
#include <asm/dma.h>
#include <linux/aer.h>
#include <linux/bitfield.h>
+#include <cxl/pci.h>
#include "pci.h"
DEFINE_MUTEX(pci_slot_mutex);
@@ -4842,7 +4843,7 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
static u16 cxl_port_dvsec(struct pci_dev *dev)
{
return pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
- PCI_DVSEC_CXL_PORT);
+ CXL_DVSEC_PORT_EXTENSIONS);
}
static bool cxl_sbr_masked(struct pci_dev *dev)
@@ -4854,7 +4855,7 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
if (!dvsec)
return false;
- rc = pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_PORT_CTL, &reg);
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_PORT_CTL, &reg);
if (rc || PCI_POSSIBLE_ERROR(reg))
return false;
@@ -4863,7 +4864,7 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
* bit in Bridge Control has no effect. When 1, the Port generates
* hot reset when the SBR bit is set to 1.
*/
- if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR)
+ if (reg & CXL_DVSEC_UNMASK_SBR)
return false;
return true;
@@ -4908,22 +4909,22 @@ static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
if (probe)
return 0;
- rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, &reg);
+ rc = pci_read_config_word(bridge, dvsec + CXL_DVSEC_PORT_CTL, &reg);
if (rc)
return -ENOTTY;
- if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR) {
+ if (reg & CXL_DVSEC_UNMASK_SBR) {
val = reg;
} else {
- val = reg | PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR;
- pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
+ val = reg | CXL_DVSEC_UNMASK_SBR;
+ pci_write_config_word(bridge, dvsec + CXL_DVSEC_PORT_CTL,
val);
}
rc = pci_reset_bus_function(dev, probe);
if (reg != val)
- pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
+ pci_write_config_word(bridge, dvsec + CXL_DVSEC_PORT_CTL,
reg);
return rc;
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 3add74ae2594..4f9e6dddc282 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1253,11 +1253,6 @@
#define PCI_DEV3_STA 0x0c /* Device 3 Status Register */
#define PCI_DEV3_STA_SEGMENT 0x8 /* Segment Captured (end-to-end flit-mode detected) */
-/* Compute Express Link (CXL r3.1, sec 8.1.5) */
-#define PCI_DVSEC_CXL_PORT 3
-#define PCI_DVSEC_CXL_PORT_CTL 0x0c
-#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
-
/* Integrity and Data Encryption Extended Capability */
#define PCI_IDE_CAP 0x04
#define PCI_IDE_CAP_LINK 0x1 /* Link IDE Stream Supported */
--
2.34.1
* Re: [PATCH v4 02/10] PCI: switch CXL port DVSEC defines
2026-01-20 22:26 ` [PATCH v4 02/10] PCI: switch CXL port DVSEC defines smadhavan
@ 2026-01-21 10:34 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 10:34 UTC (permalink / raw)
To: smadhavan
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 22:26:02 +0000
smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> The PCI core consumes CXL port DVSEC fields for reset handling, so
> switch it to use the shared CXL PCI header instead of the uapi header.
> This aligns the core with the header split and keeps internal code from
> depending on uapi-only definitions.
Why do you think they are uapi only? Masses of kernel code rely on those
definitions.
Intent is to just have one source of truth for userspace tools and kernel
space ones.
Also, removing anything from those headers will probably break someone's
user space so is basically impossible to do safely.
J
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/pci/pci.c | 17 +++++++++--------
> include/uapi/linux/pci_regs.h | 5 -----
> 2 files changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..8bb07e253646 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -30,6 +30,7 @@
> #include <asm/dma.h>
> #include <linux/aer.h>
> #include <linux/bitfield.h>
> +#include <cxl/pci.h>
> #include "pci.h"
>
> DEFINE_MUTEX(pci_slot_mutex);
> @@ -4842,7 +4843,7 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
> static u16 cxl_port_dvsec(struct pci_dev *dev)
> {
> return pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> - PCI_DVSEC_CXL_PORT);
> + CXL_DVSEC_PORT_EXTENSIONS);
> }
>
> static bool cxl_sbr_masked(struct pci_dev *dev)
> @@ -4854,7 +4855,7 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
> if (!dvsec)
> return false;
>
> - rc = pci_read_config_word(dev, dvsec + PCI_DVSEC_CXL_PORT_CTL, &reg);
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_PORT_CTL, &reg);
> if (rc || PCI_POSSIBLE_ERROR(reg))
> return false;
>
> @@ -4863,7 +4864,7 @@ static bool cxl_sbr_masked(struct pci_dev *dev)
> * bit in Bridge Control has no effect. When 1, the Port generates
> * hot reset when the SBR bit is set to 1.
> */
> - if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR)
> + if (reg & CXL_DVSEC_UNMASK_SBR)
> return false;
>
> return true;
> @@ -4908,22 +4909,22 @@ static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
> if (probe)
> return 0;
>
> - rc = pci_read_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL, &reg);
> + rc = pci_read_config_word(bridge, dvsec + CXL_DVSEC_PORT_CTL, &reg);
> if (rc)
> return -ENOTTY;
>
> - if (reg & PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR) {
> + if (reg & CXL_DVSEC_UNMASK_SBR) {
> val = reg;
> } else {
> - val = reg | PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR;
> - pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
> + val = reg | CXL_DVSEC_UNMASK_SBR;
> + pci_write_config_word(bridge, dvsec + CXL_DVSEC_PORT_CTL,
> val);
> }
>
> rc = pci_reset_bus_function(dev, probe);
>
> if (reg != val)
> - pci_write_config_word(bridge, dvsec + PCI_DVSEC_CXL_PORT_CTL,
> + pci_write_config_word(bridge, dvsec + CXL_DVSEC_PORT_CTL,
> reg);
>
> return rc;
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 3add74ae2594..4f9e6dddc282 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1253,11 +1253,6 @@
> #define PCI_DEV3_STA 0x0c /* Device 3 Status Register */
> #define PCI_DEV3_STA_SEGMENT 0x8 /* Segment Captured (end-to-end flit-mode detected) */
>
> -/* Compute Express Link (CXL r3.1, sec 8.1.5) */
> -#define PCI_DVSEC_CXL_PORT 3
> -#define PCI_DVSEC_CXL_PORT_CTL 0x0c
> -#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR 0x00000001
> -
> /* Integrity and Data Encryption Extended Capability */
> #define PCI_IDE_CAP 0x04
> #define PCI_IDE_CAP_LINK 0x1 /* Link IDE Stream Supported */
> --
> 2.34.1
>
>
* [PATCH v4 03/10] cxl: add type 2 helper and reset DVSEC bits
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
2026-01-20 22:26 ` [PATCH v4 01/10] cxl: move DVSEC defines to cxl pci header smadhavan
2026-01-20 22:26 ` [PATCH v4 02/10] PCI: switch CXL port DVSEC defines smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-20 23:27 ` Dave Jiang
2026-01-20 22:26 ` [PATCH v4 04/10] PCI: add CXL reset method smadhavan
` (9 subsequent siblings)
12 siblings, 1 reply; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
Introduce a helper to identify CXL Type 2 devices and define the DVSEC
reset/cache control bits used by the reset flow.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/pci.c | 10 ++++++++++
include/cxl/pci.h | 14 ++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 55c767df4543..b562e607ec46 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1075,6 +1075,16 @@ static pci_ers_result_t cxl_slot_reset(struct pci_dev *pdev)
return PCI_ERS_RESULT_RECOVERED;
}
+bool cxl_is_type2_device(struct pci_dev *pdev)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+
+ if (!cxlds)
+ return false;
+
+ return cxlds->type == CXL_DEVTYPE_DEVMEM;
+}
+
static void cxl_error_resume(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
index 728ba0cdd289..71d8de5de948 100644
--- a/include/cxl/pci.h
+++ b/include/cxl/pci.h
@@ -14,10 +14,24 @@
/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
#define CXL_DVSEC_PCIE_DEVICE 0
#define CXL_DVSEC_CAP_OFFSET 0xA
+#define CXL_DVSEC_CACHE_CAPABLE BIT(0)
#define CXL_DVSEC_MEM_CAPABLE BIT(2)
#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
+#define CXL_DVSEC_CACHE_WBI_CAPABLE BIT(6)
+#define CXL_DVSEC_CXL_RST_CAPABLE BIT(7)
+#define CXL_DVSEC_CXL_RST_TIMEOUT_MASK GENMASK(10, 8)
+#define CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE BIT(11)
#define CXL_DVSEC_CTRL_OFFSET 0xC
#define CXL_DVSEC_MEM_ENABLE BIT(2)
+#define CXL_DVSEC_CTRL2_OFFSET 0x10
+#define CXL_DVSEC_DISABLE_CACHING BIT(0)
+#define CXL_DVSEC_INIT_CACHE_WBI BIT(1)
+#define CXL_DVSEC_INIT_CXL_RESET BIT(2)
+#define CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE BIT(3)
+#define CXL_DVSEC_STATUS2_OFFSET 0x12
+#define CXL_DVSEC_CACHE_INVALID BIT(0)
+#define CXL_DVSEC_CXL_RST_COMPLETE BIT(1)
+#define CXL_DVSEC_CXL_RESET_ERR BIT(2)
#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
--
2.34.1
* Re: [PATCH v4 03/10] cxl: add type 2 helper and reset DVSEC bits
2026-01-20 22:26 ` [PATCH v4 03/10] cxl: add type 2 helper and reset DVSEC bits smadhavan
@ 2026-01-20 23:27 ` Dave Jiang
2026-01-21 10:45 ` Jonathan Cameron
0 siblings, 1 reply; 48+ messages in thread
From: Dave Jiang @ 2026-01-20 23:27 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Introduce a helper to identify CXL Type 2 devices and define the DVSEC
> reset/cache control bits used by the reset flow.
Should probably be 2 separate patches for these 2 things.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/pci.c | 10 ++++++++++
> include/cxl/pci.h | 14 ++++++++++++++
> 2 files changed, 24 insertions(+)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 55c767df4543..b562e607ec46 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1075,6 +1075,16 @@ static pci_ers_result_t cxl_slot_reset(struct pci_dev *pdev)
> return PCI_ERS_RESULT_RECOVERED;
> }
>
> +bool cxl_is_type2_device(struct pci_dev *pdev)
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +
> + if (!cxlds)
> + return false;
> +
> + return cxlds->type == CXL_DEVTYPE_DEVMEM;
> +}
> +
> static void cxl_error_resume(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
> index 728ba0cdd289..71d8de5de948 100644
> --- a/include/cxl/pci.h
> +++ b/include/cxl/pci.h
> @@ -14,10 +14,24 @@
> /* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
> #define CXL_DVSEC_PCIE_DEVICE 0
> #define CXL_DVSEC_CAP_OFFSET 0xA
> +#define CXL_DVSEC_CACHE_CAPABLE BIT(0)
> #define CXL_DVSEC_MEM_CAPABLE BIT(2)
> #define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
> +#define CXL_DVSEC_CACHE_WBI_CAPABLE BIT(6)
> +#define CXL_DVSEC_CXL_RST_CAPABLE BIT(7)
> +#define CXL_DVSEC_CXL_RST_TIMEOUT_MASK GENMASK(10, 8)
> +#define CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE BIT(11)
> #define CXL_DVSEC_CTRL_OFFSET 0xC
> #define CXL_DVSEC_MEM_ENABLE BIT(2)
> +#define CXL_DVSEC_CTRL2_OFFSET 0x10
> +#define CXL_DVSEC_DISABLE_CACHING BIT(0)
> +#define CXL_DVSEC_INIT_CACHE_WBI BIT(1)
> +#define CXL_DVSEC_INIT_CXL_RESET BIT(2)
> +#define CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE BIT(3)
> +#define CXL_DVSEC_STATUS2_OFFSET 0x12
> +#define CXL_DVSEC_CACHE_INVALID BIT(0)
> +#define CXL_DVSEC_CXL_RST_COMPLETE BIT(1)
> +#define CXL_DVSEC_CXL_RESET_ERR BIT(2)
> #define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
> #define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
> #define CXL_DVSEC_MEM_INFO_VALID BIT(0)
Should this chunk go with a different patch where the definitions are being used?
> --
> 2.34.1
>
2026-01-20 23:27 ` Dave Jiang
@ 2026-01-21 10:45 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 10:45 UTC (permalink / raw)
To: Dave Jiang
Cc: smadhavan, dave, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 16:27:33 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> > From: Srirangan Madhavan <smadhavan@nvidia.com>
> >
> > Introduce a helper to identify CXL Type 2 devices and define the DVSEC
> > reset/cache control bits used by the reset flow.
>
> Should probably be 2 separate patches for these 2 things.
Also, follow existing convention and put the new DVSEC defs
in the uapi/pci_regs.h file. The rest are moving there shortly.
Given they are now in a uapi file, I also wonder if we should just
fill in the rest of the structure definitions as a standalone
patch. A partial set isn't much use to userspace tooling.
Jonathan
>
> >
> > Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> > ---
> > drivers/cxl/pci.c | 10 ++++++++++
> > include/cxl/pci.h | 14 ++++++++++++++
> > 2 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 55c767df4543..b562e607ec46 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -1075,6 +1075,16 @@ static pci_ers_result_t cxl_slot_reset(struct pci_dev *pdev)
> > return PCI_ERS_RESULT_RECOVERED;
> > }
> >
> > +bool cxl_is_type2_device(struct pci_dev *pdev)
> > +{
> > + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> > +
> > + if (!cxlds)
> > + return false;
> > +
> > + return cxlds->type == CXL_DEVTYPE_DEVMEM;
> > +}
> > +
> > static void cxl_error_resume(struct pci_dev *pdev)
> > {
> > struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> > diff --git a/include/cxl/pci.h b/include/cxl/pci.h
> > index 728ba0cdd289..71d8de5de948 100644
> > --- a/include/cxl/pci.h
> > +++ b/include/cxl/pci.h
> > @@ -14,10 +14,24 @@
> > /* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
> > #define CXL_DVSEC_PCIE_DEVICE 0
> > #define CXL_DVSEC_CAP_OFFSET 0xA
> > +#define CXL_DVSEC_CACHE_CAPABLE BIT(0)
> > #define CXL_DVSEC_MEM_CAPABLE BIT(2)
> > #define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
> > +#define CXL_DVSEC_CACHE_WBI_CAPABLE BIT(6)
> > +#define CXL_DVSEC_CXL_RST_CAPABLE BIT(7)
> > +#define CXL_DVSEC_CXL_RST_TIMEOUT_MASK GENMASK(10, 8)
> > +#define CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE BIT(11)
> > #define CXL_DVSEC_CTRL_OFFSET 0xC
> > #define CXL_DVSEC_MEM_ENABLE BIT(2)
> > +#define CXL_DVSEC_CTRL2_OFFSET 0x10
> > +#define CXL_DVSEC_DISABLE_CACHING BIT(0)
> > +#define CXL_DVSEC_INIT_CACHE_WBI BIT(1)
> > +#define CXL_DVSEC_INIT_CXL_RESET BIT(2)
> > +#define CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE BIT(3)
> > +#define CXL_DVSEC_STATUS2_OFFSET 0x12
> > +#define CXL_DVSEC_CACHE_INVALID BIT(0)
> > +#define CXL_DVSEC_CXL_RST_COMPLETE BIT(1)
> > +#define CXL_DVSEC_CXL_RESET_ERR BIT(2)
> > #define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
> > #define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
> > #define CXL_DVSEC_MEM_INFO_VALID BIT(0)
>
> Should this chunk go with a different patch where the definitions are being used?
>
> > --
> > 2.34.1
> >
>
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v4 04/10] PCI: add CXL reset method
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (2 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 03/10] cxl: add type 2 helper and reset DVSEC bits smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 0:08 ` Dave Jiang
` (2 more replies)
2026-01-20 22:26 ` [PATCH v4 05/10] cxl: add reset prepare and region teardown smadhavan
` (8 subsequent siblings)
12 siblings, 3 replies; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira,
kernel test robot
From: Srirangan Madhavan <smadhavan@nvidia.com>
Add a PCI reset method "cxl_reset" that drives the CXL reset sequence using
DVSEC controls and timeout encoding. The method is restricted to
Type 2 devices, limiting the scope of the changes.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601172246.rz4Orygn-lkp@intel.com/
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/pci/pci.c | 104 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 10 ++++-
2 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8bb07e253646..e2d5ff25ab67 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4892,6 +4892,109 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe)
return pci_parent_bus_reset(dev, probe);
}
+static int cxl_reset_init(struct pci_dev *dev, u16 dvsec)
+{
+ /*
+ * Timeout values ref CXL Spec v3.2 Ch 8 Control and Status Registers,
+ * under section 8.1.3.1 DVSEC CXL Capability.
+ */
+ u32 reset_timeouts_ms[] = { 10, 100, 1000, 10000, 100000 };
+ u16 reg;
+ u32 timeout_ms;
+ int rc, ind;
+
+ /* Check if CXL Reset MEM CLR is supported. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
+ if (rc)
+ return rc;
+
+ if (reg & CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
+ &reg);
+ if (rc)
+ return rc;
+
+ reg |= CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE;
+ pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
+ }
+
+ /* Read timeout value. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
+ if (rc)
+ return rc;
+ ind = FIELD_GET(CXL_DVSEC_CXL_RST_TIMEOUT_MASK, reg);
+ timeout_ms = reset_timeouts_ms[ind];
+
+ /* Write reset config. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
+ if (rc)
+ return rc;
+
+ reg |= CXL_DVSEC_INIT_CXL_RESET;
+ pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
+
+ /* Wait till timeout and then check reset status is complete. */
+ msleep(timeout_ms);
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &reg);
+ if (rc)
+ return rc;
+ if (reg & CXL_DVSEC_CXL_RESET_ERR ||
+ ~reg & CXL_DVSEC_CXL_RST_COMPLETE)
+ return -ETIMEDOUT;
+
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
+ if (rc)
+ return rc;
+ reg &= (~CXL_DVSEC_DISABLE_CACHING);
+ pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
+
+ return 0;
+}
+
+/**
+ * cxl_reset - initiate a cxl reset
+ * @dev: device to reset
+ * @probe: if true, return 0 if device can be reset this way
+ *
+ * Initiate a cxl reset on @dev.
+ */
+static int cxl_reset(struct pci_dev *dev, bool probe)
+{
+ u16 dvsec, reg;
+ int rc;
+
+ dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (!dvsec)
+ return -ENOTTY;
+
+ /* Check if CXL Reset is supported. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
+ if (rc)
+ return -ENOTTY;
+
+ if ((reg & CXL_DVSEC_CXL_RST_CAPABLE) == 0)
+ return -ENOTTY;
+
+#if !IS_REACHABLE(CONFIG_CXL_PCI)
+ return -ENOTTY;
+#endif
+
+ /*
+ * Expose CXL reset for Type 2 devices.
+ */
+ if (!cxl_is_type2_device(dev))
+ return -ENOTTY;
+
+ if (probe)
+ return 0;
+
+ if (!pci_wait_for_pending_transaction(dev))
+ pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
+
+ return cxl_reset_init(dev, dvsec);
+}
+
static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
{
struct pci_dev *bridge;
@@ -5016,6 +5119,7 @@ const struct pci_reset_fn_method pci_reset_fn_methods[] = {
{ pci_dev_acpi_reset, .name = "acpi" },
{ pcie_reset_flr, .name = "flr" },
{ pci_af_flr, .name = "af_flr" },
+ { cxl_reset, .name = "cxl_reset" },
{ pci_pm_reset, .name = "pm" },
{ pci_reset_bus_function, .name = "bus" },
{ cxl_reset_bus_function, .name = "cxl_bus" },
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 864775651c6f..4a8c4767db6e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -51,7 +51,7 @@
PCI_STATUS_PARITY)
/* Number of reset methods used in pci_reset_fn_methods array in pci.c */
-#define PCI_NUM_RESET_METHODS 8
+#define PCI_NUM_RESET_METHODS 9
#define PCI_RESET_PROBE true
#define PCI_RESET_DO_RESET false
@@ -1464,6 +1464,14 @@ int __must_check pci_resize_resource(struct pci_dev *dev, int i, int size,
int pci_select_bars(struct pci_dev *dev, unsigned long flags);
bool pci_device_is_present(struct pci_dev *pdev);
+#ifdef CONFIG_CXL_PCI
+bool cxl_is_type2_device(struct pci_dev *dev);
+#else
+static inline bool cxl_is_type2_device(struct pci_dev *dev)
+{
+ return false;
+}
+#endif
void pci_ignore_hotplug(struct pci_dev *dev);
struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
int pci_status_get_and_clear_errors(struct pci_dev *pdev);
--
2.34.1
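[Editorial note: as an aside on the timeout decode in cxl_reset_init() above, the Reset Timeout field (capability register bits 10:8) is three bits wide while the patch's lookup table carries only the five encodings it lists, so a range check is worth considering. A self-contained sketch of the decode, with FIELD_GET() reimplemented for illustration only:]

```c
#include <assert.h>
#include <stdint.h>

#define CXL_DVSEC_CXL_RST_TIMEOUT_MASK 0x0700 /* GENMASK(10, 8) */

/* Reset timeout encodings used by the patch (CXL 3.2 sec. 8.1.3.1). */
static const uint32_t reset_timeouts_ms[] = { 10, 100, 1000, 10000, 100000 };

/*
 * Decode the capability word into a timeout; a stand-in for
 * FIELD_GET(CXL_DVSEC_CXL_RST_TIMEOUT_MASK, reg) plus a range check,
 * since the 3-bit field can hold values beyond the defined table.
 * Returns -1 for an out-of-table encoding.
 */
static int cxl_rst_timeout_ms(uint16_t cap, uint32_t *ms)
{
	unsigned int ind = (cap & CXL_DVSEC_CXL_RST_TIMEOUT_MASK) >> 8;

	if (ind >= sizeof(reset_timeouts_ms) / sizeof(reset_timeouts_ms[0]))
		return -1;

	*ms = reset_timeouts_ms[ind];
	return 0;
}
```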
* Re: [PATCH v4 04/10] PCI: add CXL reset method
2026-01-20 22:26 ` [PATCH v4 04/10] PCI: add CXL reset method smadhavan
@ 2026-01-21 0:08 ` Dave Jiang
2026-01-21 10:57 ` Jonathan Cameron
2026-01-23 13:54 ` kernel test robot
2 siblings, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2026-01-21 0:08 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, linux-cxl, linux-pci
Cc: vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira,
kernel test robot
On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Add a PCI reset method "cxl_reset" that drives the CXL reset sequence using
> DVSEC controls and timeout encoding. The method is restricted to
> Type 2 devices, limiting the scope of the changes.
>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202601172246.rz4Orygn-lkp@intel.com/
Don't think this is needed if it's a kbot issue found during the series postings.
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/pci/pci.c | 104 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 10 ++++-
> 2 files changed, 113 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8bb07e253646..e2d5ff25ab67 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4892,6 +4892,109 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe)
> return pci_parent_bus_reset(dev, probe);
> }
>
> +static int cxl_reset_init(struct pci_dev *dev, u16 dvsec)
cxl_dev_reset() to go with existing reset method naming?
> +{
> + /*
> + * Timeout values ref CXL Spec v3.2 Ch 8 Control and Status Registers,
> + * under section 8.1.3.1 DVSEC CXL Capability.
> + */
> + u32 reset_timeouts_ms[] = { 10, 100, 1000, 10000, 100000 };
Should this be const?
> + u16 reg;
> + u32 timeout_ms;
> + int rc, ind;
> +
> + /* Check if CXL Reset MEM CLR is supported. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> + if (rc)
> + return rc;
> +
> + if (reg & CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
> + &reg);
> + if (rc)
> + return rc;
> +
> + reg |= CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE;
> + pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> + }
> +
> + /* Read timeout value. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> + if (rc)
> + return rc;
> + ind = FIELD_GET(CXL_DVSEC_CXL_RST_TIMEOUT_MASK, reg);
> + timeout_ms = reset_timeouts_ms[ind];
> +
> + /* Write reset config. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
> + if (rc)
> + return rc;
> +
> + reg |= CXL_DVSEC_INIT_CXL_RESET;
> + pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +
> + /* Wait till timeout and then check reset status is complete. */
> + msleep(timeout_ms);
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &reg);
> + if (rc)
> + return rc;
> + if (reg & CXL_DVSEC_CXL_RESET_ERR ||
> + ~reg & CXL_DVSEC_CXL_RST_COMPLETE)
> + return -ETIMEDOUT;
> +
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
> + if (rc)
> + return rc;
> + reg &= (~CXL_DVSEC_DISABLE_CACHING);
> + pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +
> + return 0;
> +}
> +
> +/**
> + * cxl_reset - initiate a cxl reset
> + * @dev: device to reset
> + * @probe: if true, return 0 if device can be reset this way
> + *
> + * Initiate a cxl reset on @dev.
> + */
> +static int cxl_reset(struct pci_dev *dev, bool probe)
> +{
> + u16 dvsec, reg;
> + int rc;
> +
> + dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + return -ENOTTY;
> +
> + /* Check if CXL Reset is supported. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> + if (rc)
> + return -ENOTTY;
> +
> + if ((reg & CXL_DVSEC_CXL_RST_CAPABLE) == 0)
> + return -ENOTTY;
> +
> +#if !IS_REACHABLE(CONFIG_CXL_PCI)
Does this not require CONFIG_CXL_PCI to be built in (not module) to evaluate to 'y'? Also, instead of adding ifdef in the C code, maybe create a helper function for the remaining code within the function below and add an ifdef in the header depending on the config?
Although I think you may want to define cxl_is_type2_device() differently rather than relying on cxl_pci driver data. It probably should check the device type via config space instead to lessen the complications.
DJ
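[Editorial note: for illustration, one hypothetical shape of the config-space check Dave suggests (names and placement are assumptions, not from the series). The Device DVSEC capability word already distinguishes device types: a Type 2 accelerator sets both Cache_Capable and Mem_Capable, while a Type 3 memory expander sets only Mem_Capable. Real code would read the word with pci_read_config_word() at dvsec + CXL_DVSEC_CAP_OFFSET.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit names taken from the series' include/cxl/pci.h defines. */
#define CXL_DVSEC_CACHE_CAPABLE (1u << 0)
#define CXL_DVSEC_MEM_CAPABLE   (1u << 2)

/*
 * Classify from the raw capability word alone: CXL.cache plus CXL.mem
 * capable implies a Type 2 device, with no dependency on cxl_pci driver
 * data being bound.
 */
static bool cxl_cap_is_type2(uint16_t cap)
{
	return (cap & CXL_DVSEC_CACHE_CAPABLE) &&
	       (cap & CXL_DVSEC_MEM_CAPABLE);
}
```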
> + return -ENOTTY;
> +#endif
> +
> + /*
> + * Expose CXL reset for Type 2 devices.
> + */
> + if (!cxl_is_type2_device(dev))
> + return -ENOTTY;
> +
> + if (probe)
> + return 0;
> +
> + if (!pci_wait_for_pending_transaction(dev))
> + pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
> +
> + return cxl_reset_init(dev, dvsec);
> +}
> +
> static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
> {
> struct pci_dev *bridge;
> @@ -5016,6 +5119,7 @@ const struct pci_reset_fn_method pci_reset_fn_methods[] = {
> { pci_dev_acpi_reset, .name = "acpi" },
> { pcie_reset_flr, .name = "flr" },
> { pci_af_flr, .name = "af_flr" },
> + { cxl_reset, .name = "cxl_reset" },
> { pci_pm_reset, .name = "pm" },
> { pci_reset_bus_function, .name = "bus" },
> { cxl_reset_bus_function, .name = "cxl_bus" },
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 864775651c6f..4a8c4767db6e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -51,7 +51,7 @@
> PCI_STATUS_PARITY)
>
> /* Number of reset methods used in pci_reset_fn_methods array in pci.c */
> -#define PCI_NUM_RESET_METHODS 8
> +#define PCI_NUM_RESET_METHODS 9
>
> #define PCI_RESET_PROBE true
> #define PCI_RESET_DO_RESET false
> @@ -1464,6 +1464,14 @@ int __must_check pci_resize_resource(struct pci_dev *dev, int i, int size,
>
> int pci_select_bars(struct pci_dev *dev, unsigned long flags);
> bool pci_device_is_present(struct pci_dev *pdev);
> +#ifdef CONFIG_CXL_PCI
> +bool cxl_is_type2_device(struct pci_dev *dev);
> +#else
> +static inline bool cxl_is_type2_device(struct pci_dev *dev)
> +{
> + return false;
> +}
> +#endif
> void pci_ignore_hotplug(struct pci_dev *dev);
> struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
> int pci_status_get_and_clear_errors(struct pci_dev *pdev);
> --
> 2.34.1
>
* Re: [PATCH v4 04/10] PCI: add CXL reset method
2026-01-20 22:26 ` [PATCH v4 04/10] PCI: add CXL reset method smadhavan
2026-01-21 0:08 ` Dave Jiang
@ 2026-01-21 10:57 ` Jonathan Cameron
2026-01-23 13:54 ` kernel test robot
2 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 10:57 UTC (permalink / raw)
To: smadhavan
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira,
kernel test robot
On Tue, 20 Jan 2026 22:26:04 +0000
smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Add a PCI reset method "cxl_reset" that drives the CXL reset sequence using
> DVSEC controls and timeout encoding. The method is restricted to
> Type 2 devices, limiting the scope of the changes.
>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202601172246.rz4Orygn-lkp@intel.com/
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/pci/pci.c | 104 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 10 ++++-
> 2 files changed, 113 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8bb07e253646..e2d5ff25ab67 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4892,6 +4892,109 @@ static int pci_reset_bus_function(struct pci_dev *dev, bool probe)
> return pci_parent_bus_reset(dev, probe);
> }
>
> +static int cxl_reset_init(struct pci_dev *dev, u16 dvsec)
> +{
> + /*
> + * Timeout values ref CXL Spec v3.2 Ch 8 Control and Status Registers,
> + * under section 8.1.3.1 DVSEC CXL Capability.
> + */
> + u32 reset_timeouts_ms[] = { 10, 100, 1000, 10000, 100000 };
> + u16 reg;
> + u32 timeout_ms;
> + int rc, ind;
> +
> + /* Check if CXL Reset MEM CLR is supported. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> + if (rc)
> + return rc;
> +
> + if (reg & CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
> + &reg);
> + if (rc)
> + return rc;
> +
> + reg |= CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE;
> + pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
Be consistent on checking for errors on these pci config accesses.
Pity there isn't a pci_clear_and_set_config_word() like the dword one.
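[Editorial note: a hypothetical word-sized helper mirroring the existing pci_clear_and_set_config_dword(), sketched here against a toy config space rather than a real struct pci_dev:]

```c
#include <assert.h>
#include <stdint.h>

/* Toy config space standing in for a device's DVSEC region. */
static uint16_t cfg[64];

static int read_config_word_mock(int pos, uint16_t *val)
{
	*val = cfg[pos / 2];
	return 0;
}

static int write_config_word_mock(int pos, uint16_t val)
{
	cfg[pos / 2] = val;
	return 0;
}

/*
 * Hypothetical sibling of pci_clear_and_set_config_dword(): one
 * read-modify-write with explicit clear and set masks, so callers like
 * cxl_reset_init() collapse their three-line RMW sequences into one call
 * with error checking in a single place.
 */
static int clear_and_set_config_word_mock(int pos, uint16_t clear, uint16_t set)
{
	uint16_t val;
	int rc;

	rc = read_config_word_mock(pos, &val);
	if (rc)
		return rc;
	val &= ~clear;
	val |= set;
	return write_config_word_mock(pos, val);
}
```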
> + }
> +
> + /* Read timeout value. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> + if (rc)
> + return rc;
> + ind = FIELD_GET(CXL_DVSEC_CXL_RST_TIMEOUT_MASK, reg);
> + timeout_ms = reset_timeouts_ms[ind];
> +
> + /* Write reset config. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
Separate writes needed for this and the MEM_CLR_ENABLE? If not, refactoring
to build it up as one value to be written would be good. If separate write
is needed then add a comment to make that clear and avoid someone 'tidying'
this up later.
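[Editorial note: if separate writes turn out not to be required, the refactor Jonathan describes might reduce to building the Control 2 value once and issuing a single write. Whether the spec permits combining Mem_Clr_Enable with Initiate_CXL_Reset in one write is exactly the open question above; this sketch just assumes it does:]

```c
#include <assert.h>
#include <stdint.h>

/* Control 2 bit names from the series' include/cxl/pci.h defines. */
#define CXL_DVSEC_INIT_CXL_RESET         (1u << 2)
#define CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE (1u << 3)

/*
 * Accumulate every Control 2 bit the reset sequence needs, so the caller
 * performs one read-modify-write instead of two.
 */
static uint16_t build_reset_ctrl2(uint16_t ctrl2, int mem_clr_capable)
{
	if (mem_clr_capable)
		ctrl2 |= CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE;
	ctrl2 |= CXL_DVSEC_INIT_CXL_RESET;
	return ctrl2;
}
```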
> + if (rc)
> + return rc;
> +
> + reg |= CXL_DVSEC_INIT_CXL_RESET;
> + pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +
> + /* Wait till timeout and then check reset status is complete. */
> + msleep(timeout_ms);
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &reg);
> + if (rc)
> + return rc;
> + if (reg & CXL_DVSEC_CXL_RESET_ERR ||
> + ~reg & CXL_DVSEC_CXL_RST_COMPLETE)
> + return -ETIMEDOUT;
I'd split these two conditions. If we saw a reset_err it probably wasn't
a timeout so a different return code would be good to indicate that.
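[Editorial note: a sketch of the split being asked for; the -EIO choice for the device-reported error is an assumption for illustration, not from the thread:]

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Status 2 bit names from the series' include/cxl/pci.h defines. */
#define CXL_DVSEC_CXL_RST_COMPLETE (1u << 1)
#define CXL_DVSEC_CXL_RESET_ERR    (1u << 2)

/*
 * Distinguish the two failure modes: a reset the device reported as
 * failed is not a timeout, so return -EIO for it, and keep -ETIMEDOUT
 * for the completion bit still being clear after the spec'd wait.
 */
static int check_reset_status(uint16_t status2)
{
	if (status2 & CXL_DVSEC_CXL_RESET_ERR)
		return -EIO;
	if (!(status2 & CXL_DVSEC_CXL_RST_COMPLETE))
		return -ETIMEDOUT;
	return 0;
}
```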
> +
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &reg);
> + if (rc)
> + return rc;
> + reg &= (~CXL_DVSEC_DISABLE_CACHING);
Brackets not needed.
> + pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
> +
> + return 0;
> +}
> +
> +/**
> + * cxl_reset - initiate a cxl reset
> + * @dev: device to reset
> + * @probe: if true, return 0 if device can be reset this way
> + *
> + * Initiate a cxl reset on @dev.
> + */
> +static int cxl_reset(struct pci_dev *dev, bool probe)
> +{
> + u16 dvsec, reg;
> + int rc;
> +
> + dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + return -ENOTTY;
> +
> + /* Check if CXL Reset is supported. */
> + rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &reg);
> + if (rc)
> + return -ENOTTY;
> +
> + if ((reg & CXL_DVSEC_CXL_RST_CAPABLE) == 0)
> + return -ENOTTY;
> +
> +#if !IS_REACHABLE(CONFIG_CXL_PCI)
> + return -ENOTTY;
> +#endif
> +
> + /*
> + * Expose CXL reset for Type 2 devices.
Agree with Dave. Good to avoid linking to the CXL_PCI driver if possible.
> + */
> + if (!cxl_is_type2_device(dev))
> + return -ENOTTY;
> +
> + if (probe)
> + return 0;
> +
> + if (!pci_wait_for_pending_transaction(dev))
> + pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
> +
> + return cxl_reset_init(dev, dvsec);
> +}
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 864775651c6f..4a8c4767db6e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -51,7 +51,7 @@
> PCI_STATUS_PARITY)
>
> /* Number of reset methods used in pci_reset_fn_methods array in pci.c */
> -#define PCI_NUM_RESET_METHODS 8
> +#define PCI_NUM_RESET_METHODS 9
>
> #define PCI_RESET_PROBE true
> #define PCI_RESET_DO_RESET false
> @@ -1464,6 +1464,14 @@ int __must_check pci_resize_resource(struct pci_dev *dev, int i, int size,
>
> int pci_select_bars(struct pci_dev *dev, unsigned long flags);
> bool pci_device_is_present(struct pci_dev *pdev);
> +#ifdef CONFIG_CXL_PCI
> +bool cxl_is_type2_device(struct pci_dev *dev);
> +#else
> +static inline bool cxl_is_type2_device(struct pci_dev *dev)
> +{
> + return false;
> +}
> +#endif
> void pci_ignore_hotplug(struct pci_dev *dev);
> struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
> int pci_status_get_and_clear_errors(struct pci_dev *pdev);
> --
> 2.34.1
>
>
* Re: [PATCH v4 04/10] PCI: add CXL reset method
2026-01-20 22:26 ` [PATCH v4 04/10] PCI: add CXL reset method smadhavan
2026-01-21 0:08 ` Dave Jiang
2026-01-21 10:57 ` Jonathan Cameron
@ 2026-01-23 13:54 ` kernel test robot
2 siblings, 0 replies; 48+ messages in thread
From: kernel test robot @ 2026-01-23 13:54 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, linux-cxl, linux-pci
Cc: oe-kbuild-all, smadhavan, vaslot, vsethi, sdonthineni, vidyas,
mochs, jsequeira, kernel test robot
Hi,
kernel test robot noticed the following build errors:
[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.19-rc6]
[cannot apply to next-20260122]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/smadhavan-nvidia-com/cxl-move-DVSEC-defines-to-cxl-pci-header/20260121-071852
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link: https://lore.kernel.org/r/20260120222610.2227109-5-smadhavan%40nvidia.com
patch subject: [PATCH v4 04/10] PCI: add CXL reset method
config: openrisc-allmodconfig (https://download.01.org/0day-ci/archive/20260123/202601232148.jndyojSY-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260123/202601232148.jndyojSY-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601232148.jndyojSY-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/cxl/pci.c:1085:6: error: redefinition of 'cxl_is_type2_device'
1085 | bool cxl_is_type2_device(struct pci_dev *pdev)
| ^~~~~~~~~~~~~~~~~~~
In file included from drivers/cxl/pci.c:11:
include/linux/pci.h:1474:20: note: previous definition of 'cxl_is_type2_device' with type 'bool(struct pci_dev *)' {aka '_Bool(struct pci_dev *)'}
1474 | static inline bool cxl_is_type2_device(struct pci_dev *dev)
| ^~~~~~~~~~~~~~~~~~~
vim +/cxl_is_type2_device +1085 drivers/cxl/pci.c
2905cb5236cba6 Dan Williams 2022-11-29 1084
b345d117d51557 Srirangan Madhavan 2026-01-20 @1085 bool cxl_is_type2_device(struct pci_dev *pdev)
b345d117d51557 Srirangan Madhavan 2026-01-20 1086 {
b345d117d51557 Srirangan Madhavan 2026-01-20 1087 struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
b345d117d51557 Srirangan Madhavan 2026-01-20 1088
b345d117d51557 Srirangan Madhavan 2026-01-20 1089 if (!cxlds)
b345d117d51557 Srirangan Madhavan 2026-01-20 1090 return false;
b345d117d51557 Srirangan Madhavan 2026-01-20 1091
b345d117d51557 Srirangan Madhavan 2026-01-20 1092 return cxlds->type == CXL_DEVTYPE_DEVMEM;
b345d117d51557 Srirangan Madhavan 2026-01-20 1093 }
b345d117d51557 Srirangan Madhavan 2026-01-20 1094
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [PATCH v4 05/10] cxl: add reset prepare and region teardown
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (3 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 04/10] PCI: add CXL reset method smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 11:09 ` Jonathan Cameron
2026-01-21 21:25 ` Dave Jiang
2026-01-20 22:26 ` [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup smadhavan
` (7 subsequent siblings)
12 siblings, 2 replies; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
Prepare a Type 2 device for cxl_reset by validating memory is offline,
flushing device caches for region participants, and tearing down decoders
under cxl_region_rwsem. The lock stays held across reset to prevent new
region creation while reset is in progress.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/pci.c | 214 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 214 insertions(+)
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index b562e607ec46..e4134162e82a 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1085,6 +1085,220 @@ bool cxl_is_type2_device(struct pci_dev *pdev)
return cxlds->type == CXL_DEVTYPE_DEVMEM;
}
+static int cxl_check_region_driver_bound(struct device *dev, void *data)
+{
+ struct cxl_decoder *cxld = to_cxl_decoder(dev);
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ guard(rwsem_read)(&cxl_region_rwsem);
+ if (cxld->region && cxld->region->driver)
+ return -EBUSY;
+
+ return 0;
+}
+
+static int cxl_decoder_kill_region_iter(struct device *dev, void *data)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ int rc;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ if (!cxled->cxld.region)
+ return 0;
+
+ cxl_decoder_kill_region_locked(cxled);
+
+ rc = device_for_each_child(&cxled->cxld.dev, NULL,
+ cxl_check_region_driver_bound);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int cxl_device_cache_wb_invalidate(struct pci_dev *pdev)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ u16 reg, val, cap;
+ int dvsec, rc;
+
+ if (!cxlds)
+ return -ENODEV;
+
+ dvsec = cxlds->cxl_dvsec;
+ if (!dvsec)
+ return -ENODEV;
+
+ rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CAP_OFFSET, &cap);
+ if (rc)
+ return rc;
+
+ if (!(cap & CXL_DVSEC_CACHE_WBI_CAPABLE))
+ return 1;
+
+ rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &val);
+ if (rc)
+ return rc;
+
+ val |= CXL_DVSEC_INIT_CACHE_WBI;
+ rc = pci_write_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET, val);
+ if (rc)
+ return rc;
+
+ do {
+ rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &reg);
+ if (rc)
+ return rc;
+ } while (!(reg & CXL_DVSEC_CACHE_INVALID));
+
+ return 0;
+}
+
+static int cxl_region_flush_device_caches(struct device *dev, void *data)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ struct cxl_region *cxlr = cxled->cxld.region;
+ struct cxl_region_params *p = &cxlr->params;
+ struct pci_dev *target_pdev = data;
+ int i, rc;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ if (!cxlr || !cxlr->params.res)
+ return 0;
+
+ for (i = 0; i < p->nr_targets; i++) {
+ struct cxl_endpoint_decoder *target_cxled = p->targets[i];
+ struct cxl_memdev *target_cxlmd = cxled_to_memdev(target_cxled);
+ struct cxl_dev_state *target_cxlds = target_cxlmd->cxlds;
+
+ if (!target_cxlds || !target_cxlds->pdev)
+ continue;
+
+ if (target_cxlds->pdev != target_pdev)
+ continue;
+
+ rc = cxl_device_cache_wb_invalidate(target_pdev);
+ if (rc && rc != 1)
+ return rc;
+ }
+
+ return 0;
+}
+
+/**
+ * cxl_reset_prepare_memdev - Prepare CXL device for reset
+ * @pdev: PCI device
+ *
+ * Validates it's safe to reset and tears down regions atomically under lock.
+ * Acquires cxl_region_rwsem and keeps it held throughout reset.
+ *
+ * Return: 0 on success (lock held), -EBUSY if memory online, negative on error
+ */
+static int cxl_reset_prepare_memdev(struct pci_dev *pdev)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct cxl_memdev *cxlmd;
+ struct cxl_port *endpoint;
+ int rc;
+
+ if (!cxlds)
+ return -ENODEV;
+
+ cxlmd = cxlds->cxlmd;
+ if (!cxlmd)
+ return -ENODEV;
+
+ endpoint = cxlmd->endpoint;
+ if (!endpoint)
+ return 0;
+
+ if (cxl_num_decoders_committed(endpoint) == 0)
+ return 0;
+
+ down_write(&cxl_region_rwsem);
+
+ /* Check and error out if memory is online */
+ rc = device_for_each_child(&endpoint->dev, NULL,
+ cxl_check_region_driver_bound);
+ if (rc) {
+ up_write(&cxl_region_rwsem);
+ dev_err(&pdev->dev,
+ "Reset blocked: device has active regions with drivers bound\n");
+ return -EBUSY;
+ }
+
+ /* Flush device caches and tear down regions */
+ device_for_each_child(&endpoint->dev, pdev,
+ cxl_region_flush_device_caches);
+
+ rc = device_for_each_child(&endpoint->dev, NULL,
+ cxl_decoder_kill_region_iter);
+ if (rc) {
+ up_write(&cxl_region_rwsem);
+ dev_err(&pdev->dev, "Failed to tear down regions: %d\n", rc);
+ return rc;
+ }
+
+ /* Keep cxl_region_rwsem held, released by cleanup function */
+ return 0;
+}
+
+/**
+ * cxl_reset_cleanup_memdev - Release locks after CXL reset
+ * @pdev: PCI device
+ */
+static void cxl_reset_cleanup_memdev(struct pci_dev *pdev)
+{
+ if (lockdep_is_held_type(&cxl_region_rwsem, -1))
+ up_write(&cxl_region_rwsem);
+}
+
+/**
+ * cxl_reset_prepare_device - Prepare CXL device for reset
+ * @pdev: PCI device being reset
+ *
+ * CXL-reset-specific preparation. Validates memory is offline, flushes
+ * device caches, and tears down regions.
+ *
+ * Returns: 0 on success, -EBUSY if memory online, negative on error
+ */
+int cxl_reset_prepare_device(struct pci_dev *pdev)
+{
+ int rc;
+
+ rc = cxl_reset_prepare_memdev(pdev);
+ if (rc) {
+ if (rc == -EBUSY)
+ dev_err(&pdev->dev,
+ "Cannot reset: device has online memory or active regions\n");
+ else
+ dev_err(&pdev->dev,
+ "Failed to prepare device for reset: %d\n", rc);
+ return rc;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
+
+/**
+ * cxl_reset_cleanup_device - Cleanup after CXL reset
+ * @pdev: PCI device that was reset
+ *
+ * Releases region locks held during reset.
+ */
+void cxl_reset_cleanup_device(struct pci_dev *pdev)
+{
+ cxl_reset_cleanup_memdev(pdev);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_reset_cleanup_device, "CXL");
+
static void cxl_error_resume(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
--
2.34.1
* Re: [PATCH v4 05/10] cxl: add reset prepare and region teardown
2026-01-20 22:26 ` [PATCH v4 05/10] cxl: add reset prepare and region teardown smadhavan
@ 2026-01-21 11:09 ` Jonathan Cameron
2026-01-21 21:25 ` Dave Jiang
1 sibling, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 11:09 UTC (permalink / raw)
To: smadhavan
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 22:26:05 +0000
smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Prepare a Type 2 device for cxl_reset by validating memory is offline,
> flushing device caches for region participants, and tearing down decoders
> under cxl_region_rwsem. The lock stays held across reset to prevent new
> region creation while reset is in progress.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
Some minor feedback from a quick look. I'll want to take a closer look
when we are closer to merging this.
> ---
> drivers/cxl/pci.c | 214 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 214 insertions(+)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index b562e607ec46..e4134162e82a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1085,6 +1085,220 @@ bool cxl_is_type2_device(struct pci_dev *pdev)
> return cxlds->type == CXL_DEVTYPE_DEVMEM;
> }
>
> +static int cxl_check_region_driver_bound(struct device *dev, void *data)
> +{
> + struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + guard(rwsem_read)(&cxl_region_rwsem);
> + if (cxld->region && cxld->region->driver)
> + return -EBUSY;
> +
> + return 0;
> +}
> +
> +static int cxl_decoder_kill_region_iter(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + int rc;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + if (!cxled->cxld.region)
> + return 0;
> +
> + cxl_decoder_kill_region_locked(cxled);
> +
> + rc = device_for_each_child(&cxled->cxld.dev, NULL,
> + cxl_check_region_driver_bound);
return device_for_each_child()
If that doesn't make sense after later patches, fine to leave as it is.
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> +static int cxl_device_cache_wb_invalidate(struct pci_dev *pdev)
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + u16 reg, val, cap;
> + int dvsec, rc;
> +
> + if (!cxlds)
> + return -ENODEV;
> +
> + dvsec = cxlds->cxl_dvsec;
> + if (!dvsec)
> + return -ENODEV;
> +
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CAP_OFFSET, &cap);
> + if (rc)
> + return rc;
> +
> + if (!(cap & CXL_DVSEC_CACHE_WBI_CAPABLE))
> + return 1;
With unusual return value, definitely need docs for this function.
Given use below, maybe just return 0?
If there are caches and there is no way to force WB, what does that mean
for whether we can reset the device? Feels like maybe this at least
deserves a warning print.
My suspicion is that lack of that feature just means there is a device
specific way to do it but I'm fine with Linux not supporting that ;)
> +
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &val);
> + if (rc)
> + return rc;
> +
> + val |= CXL_DVSEC_INIT_CACHE_WBI;
> + rc = pci_write_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET, val);
> + if (rc)
> + return rc;
> +
> + do {
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &reg);
> + if (rc)
> + return rc;
> + } while (!(reg & CXL_DVSEC_CACHE_INVALID));
> +
> + return 0;
> +}
> +
> +static int cxl_region_flush_device_caches(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + struct cxl_region *cxlr = cxled->cxld.region;
> + struct cxl_region_params *p = &cxlr->params;
> + struct pci_dev *target_pdev = data;
> + int i, rc;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + if (!cxlr || !cxlr->params.res)
> + return 0;
> +
> + for (i = 0; i < p->nr_targets; i++) {
> + struct cxl_endpoint_decoder *target_cxled = p->targets[i];
> + struct cxl_memdev *target_cxlmd = cxled_to_memdev(target_cxled);
> + struct cxl_dev_state *target_cxlds = target_cxlmd->cxlds;
> +
> + if (!target_cxlds || !target_cxlds->pdev)
> + continue;
> +
> + if (target_cxlds->pdev != target_pdev)
Seems like target_pdev == NULL is a bug and if possible should be checked for
before doing anything in this function, so you could simplify this as
if (!target_cxlds || target_cxlds->pdev != target_pdev)
> + continue;
> +
> + rc = cxl_device_cache_wb_invalidate(target_pdev);
> + if (rc && rc != 1)
As above, I'm not sure the rc == 1 return is helpful.
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * cxl_reset_prepare_memdev - Prepare CXL device for reset
> + * @pdev: PCI device
> + *
> + * Validates it's safe to reset and tears down regions atomically under lock.
> + * Acquires cxl_region_rwsem and keeps it held throughout reset.
That may need some lockdep annotations. Make sure to run a lockdep build.
> + *
> + * Return: 0 on success (lock held), -EBUSY if memory online, negative on error
> + */
> +static int cxl_reset_prepare_memdev(struct pci_dev *pdev)
> +{
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 05/10] cxl: add reset prepare and region teardown
2026-01-20 22:26 ` [PATCH v4 05/10] cxl: add reset prepare and region teardown smadhavan
2026-01-21 11:09 ` Jonathan Cameron
@ 2026-01-21 21:25 ` Dave Jiang
1 sibling, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2026-01-21 21:25 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, linux-cxl, linux-pci
Cc: vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Prepare a Type 2 device for cxl_reset by validating memory is offline,
> flushing device caches for region participants, and tearing down decoders
> under cxl_region_rwsem. The lock stays held across reset to prevent new
> region creation while reset is in progress.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/pci.c | 214 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 214 insertions(+)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index b562e607ec46..e4134162e82a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1085,6 +1085,220 @@ bool cxl_is_type2_device(struct pci_dev *pdev)
> return cxlds->type == CXL_DEVTYPE_DEVMEM;
> }
>
> +static int cxl_check_region_driver_bound(struct device *dev, void *data)
> +{
> + struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + guard(rwsem_read)(&cxl_region_rwsem);
> + if (cxld->region && cxld->region->driver)
I think you may need to take the region device lock before checking the region driver. The cxl_region_rwsem won't protect you from region device driver un/binding. While we know cxl_pci is bound since the callback is on a PCI device, there's no guarantee that the region driver is.
> + return -EBUSY;
> +
> + return 0;
> +}
> +
> +static int cxl_decoder_kill_region_iter(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + int rc;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + if (!cxled->cxld.region)
> + return 0;
> +
> + cxl_decoder_kill_region_locked(cxled);
Where does this function come from? It's not present upstream.
> +
> + rc = device_for_each_child(&cxled->cxld.dev, NULL,
> + cxl_check_region_driver_bound);
Why check if the driver is bound after the region is killed? Also, the check happens in cxl_reset_prepare_memdev() before cxl_decoder_kill_region_iter() is called, and if a driver is bound, cxl_reset_prepare_memdev() fails. So at this point there shouldn't be any drivers bound, right?
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> +static int cxl_device_cache_wb_invalidate(struct pci_dev *pdev)
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + u16 reg, val, cap;
> + int dvsec, rc;
> +
> + if (!cxlds)
> + return -ENODEV;
> +
> + dvsec = cxlds->cxl_dvsec;
> + if (!dvsec)
> + return -ENODEV;
> +
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CAP_OFFSET, &cap);
> + if (rc)
> + return rc;
> +
> + if (!(cap & CXL_DVSEC_CACHE_WBI_CAPABLE))
> + return 1;
Why return 1 when it's ignored anyhow? Can just return 0.
> +
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET, &val);
> + if (rc)
> + return rc;
> +
> + val |= CXL_DVSEC_INIT_CACHE_WBI;
> + rc = pci_write_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET, val);
> + if (rc)
> + return rc;
> +
> + do {
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &reg);
> + if (rc)
> + return rc;
> + } while (!(reg & CXL_DVSEC_CACHE_INVALID));
> +
> + return 0;
> +}
> +
> +static int cxl_region_flush_device_caches(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + struct cxl_region *cxlr = cxled->cxld.region;
> + struct cxl_region_params *p = &cxlr->params;
> + struct pci_dev *target_pdev = data;
> + int i, rc;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + if (!cxlr || !cxlr->params.res)
> + return 0;
> +
> + for (i = 0; i < p->nr_targets; i++) {
> + struct cxl_endpoint_decoder *target_cxled = p->targets[i];
> + struct cxl_memdev *target_cxlmd = cxled_to_memdev(target_cxled);
> + struct cxl_dev_state *target_cxlds = target_cxlmd->cxlds;
> +
> + if (!target_cxlds || !target_cxlds->pdev)
> + continue;
> +
> + if (target_cxlds->pdev != target_pdev)
> + continue;
> +
> + rc = cxl_device_cache_wb_invalidate(target_pdev);
> + if (rc && rc != 1)
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * cxl_reset_prepare_memdev - Prepare CXL device for reset
> + * @pdev: PCI device
> + *
> + * Validates it's safe to reset and tears down regions atomically under lock.
> + * Acquires cxl_region_rwsem and keeps it held throughout reset.
> + *
> + * Return: 0 on success (lock held), -EBUSY if memory online, negative on error
> + */
> +static int cxl_reset_prepare_memdev(struct pci_dev *pdev)
Given the function name, shouldn't this pass in a cxl_memdev?
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + struct cxl_memdev *cxlmd;
> + struct cxl_port *endpoint;
> + int rc;
> +
> + if (!cxlds)
> + return -ENODEV;
> +
> + cxlmd = cxlds->cxlmd;
> + if (!cxlmd)
> + return -ENODEV;
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint)
> + return 0;
> +
> + if (cxl_num_decoders_committed(endpoint) == 0)
> + return 0;
Without any decoders committed, this endpoint shouldn't be part of any region, correct? Why bother checking whether a driver is bound after this? Seems like overkill.
> +
> + down_write(&cxl_region_rwsem);
> +
> + /* Check and error out if memory is online */
Having the cxl_region driver bound does not mean memory is online in the sense of being hot-plugged into the memory subsystem. Are you looking for it to not be system memory here, or that cxl has claimed the region, or just that there's a hardware-committed region?
> + rc = device_for_each_child(&endpoint->dev, NULL,
> + cxl_check_region_driver_bound);
> + if (rc) {
> + up_write(&cxl_region_rwsem);
> + dev_err(&pdev->dev,
> + "Reset blocked: device has active regions with drivers bound\n");
> + return -EBUSY;
> + }
> +
> + /* Flush device caches and tear down regions */
> + device_for_each_child(&endpoint->dev, pdev,
> + cxl_region_flush_device_caches);
> +
> + rc = device_for_each_child(&endpoint->dev, NULL,
> + cxl_decoder_kill_region_iter);
> + if (rc) {
> + up_write(&cxl_region_rwsem);
> + dev_err(&pdev->dev, "Failed to tear down regions: %d\n", rc);
> + return rc;
> + }
> +
> + /* Keep cxl_region_rwsem held, released by cleanup function */
May want to annotate the function with __acquires()
> + return 0;
> +}
> +
> +/**
> + * cxl_reset_cleanup_memdev - Release locks after CXL reset
> + * @pdev: PCI device
> + */
> +static void cxl_reset_cleanup_memdev(struct pci_dev *pdev)
> +{
> + if (lockdep_is_held_type(&cxl_region_rwsem, -1))
Why check when it's expected that the lock is held? Probably should be lockdep_assert_held() instead. Also may need a __releases() to annotate that this function releases the lock acquired in cxl_reset_prepare_memdev().
DJ
> + up_write(&cxl_region_rwsem);
> +}
> +
> +/**
> + * cxl_reset_prepare_device - Prepare CXL device for reset
> + * @pdev: PCI device being reset
> + *
> + * CXL-reset-specific preparation. Validates memory is offline, flushes
> + * device caches, and tears down regions.
> + *
> + * Returns: 0 on success, -EBUSY if memory online, negative on error
> + */
> +int cxl_reset_prepare_device(struct pci_dev *pdev)
> +{
> + int rc;
> +
> + rc = cxl_reset_prepare_memdev(pdev);
> + if (rc) {
> + if (rc == -EBUSY)
> + dev_err(&pdev->dev,
> + "Cannot reset: device has online memory or active regions\n");
> + else
> + dev_err(&pdev->dev,
> + "Failed to prepare device for reset: %d\n", rc);
> + return rc;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
> +
> +/**
> + * cxl_reset_cleanup_device - Cleanup after CXL reset
> + * @pdev: PCI device that was reset
> + *
> + * Releases region locks held during reset.
> + */
> +void cxl_reset_cleanup_device(struct pci_dev *pdev)
> +{
> + cxl_reset_cleanup_memdev(pdev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_reset_cleanup_device, "CXL");
> +
> static void cxl_error_resume(struct pci_dev *pdev)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (4 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 05/10] cxl: add reset prepare and region teardown smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 22:13 ` Dave Jiang
2026-01-24 7:54 ` kernel test robot
2026-01-20 22:26 ` [PATCH v4 07/10] cxl: add host cache flush and multi-function reset smadhavan
` (6 subsequent siblings)
12 siblings, 2 replies; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
Wire CXL reset preparation and cleanup into the PCI CXL reset path.
The flow now validates/offlines regions, performs teardown and cache
flushes, then releases the lock on completion or error. This keeps the
common reset_prepare flow intact while adding cxl_reset-specific quiesce logic.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/pci/pci.c | 19 ++++++++++++++++++-
include/linux/pci.h | 11 +++++++++++
2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e2d5ff25ab67..18047c893b0c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4989,10 +4989,27 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
if (probe)
return 0;
+ /*
+ * CXL-reset-specific preparation: validate memory offline,
+ * tear down regions, flush device caches.
+ */
+ rc = cxl_reset_prepare_device(dev);
+ if (rc)
+ return rc;
+
if (!pci_wait_for_pending_transaction(dev))
pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
- return cxl_reset_init(dev, dvsec);
+ rc = cxl_reset_init(dev, dvsec);
+ if (rc)
+ goto out_cleanup;
+
+ cxl_reset_cleanup_device(dev);
+ return 0;
+
+out_cleanup:
+ cxl_reset_cleanup_device(dev);
+ return rc;
}
static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a8c4767db6e..c074c2040b28 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1466,11 +1466,22 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags);
bool pci_device_is_present(struct pci_dev *pdev);
#ifdef CONFIG_CXL_PCI
bool cxl_is_type2_device(struct pci_dev *dev);
+int cxl_reset_prepare_device(struct pci_dev *pdev);
+void cxl_reset_cleanup_device(struct pci_dev *pdev);
#else
static inline bool cxl_is_type2_device(struct pci_dev *dev)
{
return false;
}
+
+static inline int cxl_reset_prepare_device(struct pci_dev *pdev)
+{
+ return 0;
+}
+
+static inline void cxl_reset_cleanup_device(struct pci_dev *pdev)
+{
+}
#endif
void pci_ignore_hotplug(struct pci_dev *dev);
struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup
2026-01-20 22:26 ` [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup smadhavan
@ 2026-01-21 22:13 ` Dave Jiang
2026-01-22 2:17 ` Srirangan Madhavan
2026-01-24 7:54 ` kernel test robot
1 sibling, 1 reply; 48+ messages in thread
From: Dave Jiang @ 2026-01-21 22:13 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Wire CXL reset preparation and cleanup into the PCI CXL reset path.
> The flow now validates/offlines regions, performs teardown and cache
> flushes, then releases the lock on completion or error. This keeps the
> common reset_prepare flow intact while adding cxl_reset-specific quiesce logic.
Can this be moved to the ->reset_prepare() callback of 'pci_error_handlers' rather than directly wired into PCI core code? I don't think we want to build a dependency of the cxl core on the PCI core.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/pci/pci.c | 19 ++++++++++++++++++-
> include/linux/pci.h | 11 +++++++++++
> 2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e2d5ff25ab67..18047c893b0c 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4989,10 +4989,27 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> if (probe)
> return 0;
>
> + /*
> + * CXL-reset-specific preparation: validate memory offline,
> + * tear down regions, flush device caches.
> + */
> + rc = cxl_reset_prepare_device(dev);
> + if (rc)
> + return rc;
> +
> if (!pci_wait_for_pending_transaction(dev))
> pci_err(dev, "timed out waiting for pending transaction; performing function level reset anyway\n");
>
> - return cxl_reset_init(dev, dvsec);
> + rc = cxl_reset_init(dev, dvsec);
> + if (rc)
> + goto out_cleanup;
> +
> + cxl_reset_cleanup_device(dev);
> + return 0;
> +
> +out_cleanup:
> + cxl_reset_cleanup_device(dev);
> + return rc;
> }
>
> static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 4a8c4767db6e..c074c2040b28 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1466,11 +1466,22 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags);
> bool pci_device_is_present(struct pci_dev *pdev);
> #ifdef CONFIG_CXL_PCI
> bool cxl_is_type2_device(struct pci_dev *dev);
> +int cxl_reset_prepare_device(struct pci_dev *pdev);
> +void cxl_reset_cleanup_device(struct pci_dev *pdev);
> #else
> static inline bool cxl_is_type2_device(struct pci_dev *dev)
> {
> return false;
> }
> +
> +static inline int cxl_reset_prepare_device(struct pci_dev *pdev)
> +{
> + return 0;
> +}
> +
> +static inline void cxl_reset_cleanup_device(struct pci_dev *pdev)
> +{
> +}
> #endif
> void pci_ignore_hotplug(struct pci_dev *dev);
> struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup
2026-01-21 22:13 ` Dave Jiang
@ 2026-01-22 2:17 ` Srirangan Madhavan
2026-01-22 15:11 ` Dave Jiang
0 siblings, 1 reply; 48+ messages in thread
From: Srirangan Madhavan @ 2026-01-22 2:17 UTC (permalink / raw)
To: Dave Jiang, dave@stgolabs.net, jonathan.cameron@huawei.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org
Cc: Vishal Aslot, Vikram Sethi, Shanker Donthineni, Vidya Sagar,
Matt Ochs, Jason Sequeira
On 1/21/26, 2:14 PM, "Dave Jiang" <dave.jiang@intel.com <mailto:dave.jiang@intel.com>> wrote:
>> Wire CXL reset preparation and cleanup into the PCI CXL reset path.
>> The flow now validates/offlines regions, performs teardown and cache
>> flushes, then releases the lock on completion or error. This keeps the
>> common reset_prepare flow intact while adding cxl_reset-specific quiesce logic.
>
> Can this be moved to the ->reset_prepare() callback of 'pci_error_handlers' rather than directly wired into PCI core code? I don't think we want to build a
> dependency of the cxl core on the PCI core.
I understand the current implementation is not ideal. But I opted not to insert the call into ->reset_prepare() as pci_error_handler would be invoked for all the other reset methods (cxl_bus, FLR, etc).
Would it be okay to include the same reset prepare steps (i.e., region teardown, cache flush, etc.) in the common reset prepare?
Thank you.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup
2026-01-22 2:17 ` Srirangan Madhavan
@ 2026-01-22 15:11 ` Dave Jiang
0 siblings, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2026-01-22 15:11 UTC (permalink / raw)
To: Srirangan Madhavan, dave@stgolabs.net,
jonathan.cameron@huawei.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
dan.j.williams@intel.com, bhelgaas@google.com,
ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org
Cc: Vishal Aslot, Vikram Sethi, Shanker Donthineni, Vidya Sagar,
Matt Ochs, Jason Sequeira
On 1/21/26 7:17 PM, Srirangan Madhavan wrote:
>
>
> On 1/21/26, 2:14 PM, "Dave Jiang" <dave.jiang@intel.com <mailto:dave.jiang@intel.com>> wrote:
>
>
>>> Wire CXL reset preparation and cleanup into the PCI CXL reset path.
>>> The flow now validates/offlines regions, performs teardown and cache
>>> flushes, then releases the lock on completion or error. This keeps the
>>> common reset_prepare flow intact while adding cxl_reset-specific quiesce logic.
>>
>> Can this be moved to the ->reset_prepare() callback of 'pci_error_handlers' rather than directly wired into PCI core code? I don't think we want to build a
>> dependency of the cxl core on the PCI core.
>
> I understand the current implementation is not ideal. But I opted not to insert the call into ->reset_prepare() as pci_error_handler would be invoked for all the other reset methods (cxl_bus, FLR, etc).
> Would it be okay to include the same reset prepare steps (i.e., region teardown, cache flush, etc.) in the common reset prepare?
See my suggestion reply against patch 10/10. Maybe we shouldn't be going to PCI reset path.
>
> Thank you.
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup
2026-01-20 22:26 ` [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup smadhavan
2026-01-21 22:13 ` Dave Jiang
@ 2026-01-24 7:54 ` kernel test robot
1 sibling, 0 replies; 48+ messages in thread
From: kernel test robot @ 2026-01-24 7:54 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, linux-cxl, linux-pci
Cc: oe-kbuild-all, smadhavan, vaslot, vsethi, sdonthineni, vidyas,
mochs, jsequeira
Hi,
kernel test robot noticed the following build errors:
[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.19-rc6]
[cannot apply to next-20260123]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/smadhavan-nvidia-com/cxl-move-DVSEC-defines-to-cxl-pci-header/20260121-071852
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link: https://lore.kernel.org/r/20260120222610.2227109-7-smadhavan%40nvidia.com
patch subject: [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup
config: openrisc-allmodconfig (https://download.01.org/0day-ci/archive/20260124/202601241505.gxL2m9pU-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260124/202601241505.gxL2m9pU-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601241505.gxL2m9pU-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/cxl/pci.c:1085:6: error: redefinition of 'cxl_is_type2_device'
1085 | bool cxl_is_type2_device(struct pci_dev *pdev)
| ^~~~~~~~~~~~~~~~~~~
In file included from drivers/cxl/pci.c:11:
include/linux/pci.h:1476:20: note: previous definition of 'cxl_is_type2_device' with type 'bool(struct pci_dev *)' {aka '_Bool(struct pci_dev *)'}
1476 | static inline bool cxl_is_type2_device(struct pci_dev *dev)
| ^~~~~~~~~~~~~~~~~~~
drivers/cxl/pci.c: In function 'cxl_check_region_driver_bound':
drivers/cxl/pci.c:1102:28: error: 'cxl_region_rwsem' undeclared (first use in this function); did you mean 'cxl_region_ref'?
1102 | guard(rwsem_read)(&cxl_region_rwsem);
| ^~~~~~~~~~~~~~~~
| cxl_region_ref
drivers/cxl/pci.c:1102:28: note: each undeclared identifier is reported only once for each function it appears in
drivers/cxl/pci.c:1103:41: error: 'struct cxl_region' has no member named 'driver'
1103 | if (cxld->region && cxld->region->driver)
| ^~
drivers/cxl/pci.c: In function 'cxl_decoder_kill_region_iter':
drivers/cxl/pci.c:1120:9: error: implicit declaration of function 'cxl_decoder_kill_region_locked'; did you mean 'cxl_decoder_kill_region_iter'? [-Wimplicit-function-declaration]
1120 | cxl_decoder_kill_region_locked(cxled);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| cxl_decoder_kill_region_iter
drivers/cxl/pci.c: In function 'cxl_region_flush_device_caches':
drivers/cxl/pci.c:1187:53: error: 'struct cxl_dev_state' has no member named 'pdev'; did you mean 'dev'?
1187 | if (!target_cxlds || !target_cxlds->pdev)
| ^~~~
| dev
drivers/cxl/pci.c:1190:35: error: 'struct cxl_dev_state' has no member named 'pdev'; did you mean 'dev'?
1190 | if (target_cxlds->pdev != target_pdev)
| ^~~~
| dev
drivers/cxl/pci.c: In function 'cxl_reset_prepare_memdev':
drivers/cxl/pci.c:1231:21: error: 'cxl_region_rwsem' undeclared (first use in this function); did you mean 'cxl_region_ref'?
1231 | down_write(&cxl_region_rwsem);
| ^~~~~~~~~~~~~~~~
| cxl_region_ref
In file included from include/linux/spinlock.h:63,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:7,
from include/linux/mm.h:7,
from arch/openrisc/include/asm/pgalloc.h:20,
from arch/openrisc/include/asm/io.h:18,
from include/linux/io.h:12,
from include/linux/io-64-nonatomic-lo-hi.h:5,
from drivers/cxl/pci.c:4:
drivers/cxl/pci.c: In function 'cxl_reset_cleanup_memdev':
drivers/cxl/pci.c:1265:35: error: 'cxl_region_rwsem' undeclared (first use in this function); did you mean 'cxl_region_ref'?
1265 | if (lockdep_is_held_type(&cxl_region_rwsem, -1))
| ^~~~~~~~~~~~~~~~
include/linux/lockdep.h:253:61: note: in definition of macro 'lockdep_is_held_type'
253 | #define lockdep_is_held_type(lock, r) lock_is_held_type(&(lock)->dep_map, (r))
| ^~~~
drivers/cxl/pci.c: At top level:
>> drivers/cxl/pci.c:1278:5: error: redefinition of 'cxl_reset_prepare_device'
1278 | int cxl_reset_prepare_device(struct pci_dev *pdev)
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/pci.h:1481:19: note: previous definition of 'cxl_reset_prepare_device' with type 'int(struct pci_dev *)'
1481 | static inline int cxl_reset_prepare_device(struct pci_dev *pdev)
| ^~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/cxl/pci.c:1303:6: error: redefinition of 'cxl_reset_cleanup_device'
1303 | void cxl_reset_cleanup_device(struct pci_dev *pdev)
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/pci.h:1486:20: note: previous definition of 'cxl_reset_cleanup_device' with type 'void(struct pci_dev *)'
1486 | static inline void cxl_reset_cleanup_device(struct pci_dev *pdev)
| ^~~~~~~~~~~~~~~~~~~~~~~~
vim +/cxl_reset_prepare_device +1278 drivers/cxl/pci.c
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1268
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1269 /**
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1270 * cxl_reset_prepare_device - Prepare CXL device for reset
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1271 * @pdev: PCI device being reset
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1272 *
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1273 * CXL-reset-specific preparation. Validates memory is offline, flushes
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1274 * device caches, and tears down regions.
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1275 *
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1276 * Returns: 0 on success, -EBUSY if memory online, negative on error
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1277 */
76e9ed09acad72 Srirangan Madhavan 2026-01-20 @1278 int cxl_reset_prepare_device(struct pci_dev *pdev)
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1279 {
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1280 int rc;
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1281
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1282 rc = cxl_reset_prepare_memdev(pdev);
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1283 if (rc) {
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1284 if (rc == -EBUSY)
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1285 dev_err(&pdev->dev,
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1286 "Cannot reset: device has online memory or active regions\n");
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1287 else
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1288 dev_err(&pdev->dev,
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1289 "Failed to prepare device for reset: %d\n", rc);
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1290 return rc;
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1291 }
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1292
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1293 return 0;
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1294 }
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1295 EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1296
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1297 /**
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1298 * cxl_reset_cleanup_device - Cleanup after CXL reset
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1299 * @pdev: PCI device that was reset
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1300 *
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1301 * Releases region locks held during reset.
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1302 */
76e9ed09acad72 Srirangan Madhavan 2026-01-20 @1303 void cxl_reset_cleanup_device(struct pci_dev *pdev)
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1304 {
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1305 cxl_reset_cleanup_memdev(pdev);
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1306 }
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1307 EXPORT_SYMBOL_NS_GPL(cxl_reset_cleanup_device, "CXL");
76e9ed09acad72 Srirangan Madhavan 2026-01-20 1308
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (5 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 06/10] PCI: wire CXL reset prepare/cleanup smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 11:20 ` Jonathan Cameron
2026-01-21 23:59 ` Dave Jiang
2026-01-20 22:26 ` [PATCH v4 08/10] cxl: add DVSEC config save/restore smadhavan
` (5 subsequent siblings)
12 siblings, 2 replies; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
Flush host CPU caches for mapped HDM ranges after teardown and prepare
sibling Type 2 functions on multi-function devices. The host cache
maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
invalidate on arm64 via memremap() and on_each_cpu(), matching the
required ordering before reset.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/pci.c | 150 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 148 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index e4134162e82a..f9cc452ccb8a 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -11,6 +11,10 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/io.h>
+#include <linux/align.h>
+#include <linux/cache.h>
+#include <linux/cacheflush.h>
+#include <linux/smp.h>
#include <cxl/mailbox.h>
#include <cxl/pci.h>
#include "cxlmem.h"
@@ -1085,6 +1089,71 @@ bool cxl_is_type2_device(struct pci_dev *pdev)
return cxlds->type == CXL_DEVTYPE_DEVMEM;
}
+#ifdef CONFIG_ARM64
+struct cxl_cache_flush_ctx {
+ void *va;
+ size_t len;
+};
+
+static void cxl_flush_by_va_local(void *info)
+{
+ struct cxl_cache_flush_ctx *ctx = info;
+
+ dcache_clean_inval_poc((unsigned long)ctx->va,
+ (unsigned long)ctx->va + ctx->len);
+ asm volatile("dsb ish" ::: "memory");
+}
+#endif
+
+static int cxl_region_flush_host_cpu_caches(struct device *dev, void *data)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ struct cxl_region *cxlr = cxled->cxld.region;
+ struct resource *res;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ if (!cxlr || !cxlr->params.res)
+ return 0;
+
+ res = cxlr->params.res;
+
+#ifdef CONFIG_X86
+ static bool flushed;
+
+ if (!flushed) {
+ wbinvd_on_all_cpus();
+ flushed = true;
+ }
+#elif defined(CONFIG_ARM64)
+ void *va;
+ size_t len, line_size = L1_CACHE_BYTES;
+ phys_addr_t start, end, aligned_start, aligned_end;
+ struct cxl_cache_flush_ctx flush_ctx;
+
+ start = res->start;
+ end = res->end;
+
+ aligned_start = ALIGN_DOWN(start, line_size);
+ aligned_end = ALIGN(end + 1, line_size);
+ len = aligned_end - aligned_start;
+
+ va = memremap(aligned_start, len, MEMREMAP_WB);
+ if (!va) {
+ pr_warn("Failed to map region for cache flush\n");
+ return 0;
+ }
+
+ flush_ctx.va = va;
+ flush_ctx.len = len;
+ on_each_cpu(cxl_flush_by_va_local, &flush_ctx, 1);
+
+ memunmap(va);
+#endif
+ return 0;
+}
+
static int cxl_check_region_driver_bound(struct device *dev, void *data)
{
struct cxl_decoder *cxld = to_cxl_decoder(dev);
@@ -1245,6 +1314,9 @@ static int cxl_reset_prepare_memdev(struct pci_dev *pdev)
return rc;
}
+ device_for_each_child(&endpoint->dev, NULL,
+ cxl_region_flush_host_cpu_caches);
+
/* Keep cxl_region_rwsem held, released by cleanup function */
return 0;
}
@@ -1259,12 +1331,79 @@ static void cxl_reset_cleanup_memdev(struct pci_dev *pdev)
up_write(&cxl_region_rwsem);
}
+static int cxl_reset_prepare_all_functions(struct pci_dev *pdev)
+{
+ struct pci_dev *func_dev;
+ unsigned int devfn;
+ int func, rc;
+ struct pci_dev *prepared_funcs[8] = { NULL };
+ int prepared_count = 0;
+
+ for (func = 0; func < 8; func++) {
+ devfn = PCI_DEVFN(PCI_SLOT(pdev->devfn), func);
+
+ if (devfn == pdev->devfn)
+ continue;
+
+ func_dev = pci_get_slot(pdev->bus, devfn);
+ if (!func_dev)
+ continue;
+
+ if (!cxl_is_type2_device(func_dev)) {
+ pci_dev_put(func_dev);
+ continue;
+ }
+
+ rc = cxl_reset_prepare_memdev(func_dev);
+ if (rc) {
+ pci_dev_put(func_dev);
+ goto cleanup_funcs;
+ }
+
+ prepared_funcs[prepared_count++] = func_dev;
+ }
+
+ return 0;
+
+cleanup_funcs:
+ for (func = 0; func < prepared_count; func++) {
+ if (prepared_funcs[func]) {
+ cxl_reset_cleanup_memdev(prepared_funcs[func]);
+ pci_dev_put(prepared_funcs[func]);
+ }
+ }
+ return rc;
+}
+
+static void cxl_reset_cleanup_all_functions(struct pci_dev *pdev)
+{
+ struct pci_dev *func_dev;
+ unsigned int devfn;
+ int func;
+
+ for (func = 0; func < 8; func++) {
+ devfn = PCI_DEVFN(PCI_SLOT(pdev->devfn), func);
+
+ if (devfn == pdev->devfn)
+ continue;
+
+ func_dev = pci_get_slot(pdev->bus, devfn);
+ if (!func_dev)
+ continue;
+
+ if (cxl_is_type2_device(func_dev))
+ cxl_reset_cleanup_memdev(func_dev);
+
+ pci_dev_put(func_dev);
+ }
+}
+
/**
* cxl_reset_prepare_device - Prepare CXL device for reset
* @pdev: PCI device being reset
*
* CXL-reset-specific preparation. Validates memory is offline, flushes
- * device caches, and tears down regions.
+ * device caches, and tears down regions for device and siblings.
*
* Returns: 0 on success, -EBUSY if memory online, negative on error
*/
@@ -1283,6 +1422,12 @@ int cxl_reset_prepare_device(struct pci_dev *pdev)
return rc;
}
+ rc = cxl_reset_prepare_all_functions(pdev);
+ if (rc) {
+ cxl_reset_cleanup_memdev(pdev);
+ return rc;
+ }
+
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
@@ -1291,10 +1436,11 @@ EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
* cxl_reset_cleanup_device - Cleanup after CXL reset
* @pdev: PCI device that was reset
*
- * Releases region locks held during reset.
+ * Releases region locks for device and all sibling functions.
*/
void cxl_reset_cleanup_device(struct pci_dev *pdev)
{
+ cxl_reset_cleanup_all_functions(pdev);
cxl_reset_cleanup_memdev(pdev);
}
EXPORT_SYMBOL_NS_GPL(cxl_reset_cleanup_device, "CXL");
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-20 22:26 ` [PATCH v4 07/10] cxl: add host cache flush and multi-function reset smadhavan
@ 2026-01-21 11:20 ` Jonathan Cameron
2026-01-21 20:27 ` Davidlohr Bueso
` (2 more replies)
2026-01-21 23:59 ` Dave Jiang
1 sibling, 3 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 11:20 UTC (permalink / raw)
To: smadhavan
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 22:26:07 +0000
smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Flush host CPU caches for mapped HDM ranges after teardown and prepare
> sibling Type 2 functions on multi-function devices. The host cache
> maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
That's not sufficient in general on arm64. It might flush far enough
or there might be buffers beyond the point of coherence that are not
flushed.
Until there are clarifications from Arm we need something like the
agents in drivers/cache to do the equivalent of wbinvd_on_all_cpus()
(be it on a PA range to make it a tiny bit less horrible).
Or we need an opt-in list for platforms where a flush to PoC is enough.
Needs to be some sort of arch_ call as well, not hidden in the cxl driver.
I'd suggest a separate patch to add the necessary infrastructure for arm64
with the relevant maintainers and lists included.
It needs enough work to be something they'd consider that I haven't +CC'd them
on this version.
With that in mind I'll skip reviewing this patch for now.
Jonathan
> invalidate on arm64 via memremap() and on_each_cpu(), matching the
> required ordering before reset.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/pci.c | 150 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 148 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e4134162e82a..f9cc452ccb8a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -11,6 +11,10 @@
> #include <linux/pci.h>
> #include <linux/aer.h>
> #include <linux/io.h>
> +#include <linux/align.h>
> +#include <linux/cache.h>
> +#include <linux/cacheflush.h>
> +#include <linux/smp.h>
> #include <cxl/mailbox.h>
> #include <cxl/pci.h>
> #include "cxlmem.h"
> @@ -1085,6 +1089,71 @@ bool cxl_is_type2_device(struct pci_dev *pdev)
> return cxlds->type == CXL_DEVTYPE_DEVMEM;
> }
>
> +#ifdef CONFIG_ARM64
> +struct cxl_cache_flush_ctx {
> + void *va;
> + size_t len;
> +};
> +
> +static void cxl_flush_by_va_local(void *info)
> +{
> + struct cxl_cache_flush_ctx *ctx = info;
> +
> + dcache_clean_inval_poc((unsigned long)ctx->va,
> + (unsigned long)ctx->va + ctx->len);
> + asm volatile("dsb ish" ::: "memory");
> +}
> +#endif
> +
> +static int cxl_region_flush_host_cpu_caches(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + struct cxl_region *cxlr = cxled->cxld.region;
> + struct resource *res;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + if (!cxlr || !cxlr->params.res)
> + return 0;
> +
> + res = cxlr->params.res;
> +
> +#ifdef CONFIG_X86
> + static bool flushed;
> +
> + if (!flushed) {
> + wbinvd_on_all_cpus();
> + flushed = true;
> + }
> +#elif defined(CONFIG_ARM64)
> + void *va;
> + size_t len, line_size = L1_CACHE_BYTES;
> + phys_addr_t start, end, aligned_start, aligned_end;
> + struct cxl_cache_flush_ctx flush_ctx;
> +
> + start = res->start;
> + end = res->end;
> +
> + aligned_start = ALIGN_DOWN(start, line_size);
> + aligned_end = ALIGN(end + 1, line_size);
> + len = aligned_end - aligned_start;
> +
> + va = memremap(aligned_start, len, MEMREMAP_WB);
> + if (!va) {
> + pr_warn("Failed to map region for cache flush\n");
> + return 0;
> + }
> +
> + flush_ctx.va = va;
> + flush_ctx.len = len;
> + on_each_cpu(cxl_flush_by_va_local, &flush_ctx, 1);
> +
> + memunmap(va);
> +#endif
> + return 0;
> +}
> +
> static int cxl_check_region_driver_bound(struct device *dev, void *data)
> {
> struct cxl_decoder *cxld = to_cxl_decoder(dev);
> @@ -1245,6 +1314,9 @@ static int cxl_reset_prepare_memdev(struct pci_dev *pdev)
> return rc;
> }
>
> + device_for_each_child(&endpoint->dev, NULL,
> + cxl_region_flush_host_cpu_caches);
> +
> /* Keep cxl_region_rwsem held, released by cleanup function */
> return 0;
> }
> @@ -1259,12 +1331,79 @@ static void cxl_reset_cleanup_memdev(struct pci_dev *pdev)
> up_write(&cxl_region_rwsem);
> }
>
> +static int cxl_reset_prepare_all_functions(struct pci_dev *pdev)
> +{
> + struct pci_dev *func_dev;
> + unsigned int devfn;
> + int func, rc;
> + struct pci_dev *prepared_funcs[8] = { NULL };
> + int prepared_count = 0;
> +
> + for (func = 0; func < 8; func++) {
> + devfn = PCI_DEVFN(PCI_SLOT(pdev->devfn), func);
> +
> + if (devfn == pdev->devfn)
> + continue;
> +
> + func_dev = pci_get_slot(pdev->bus, devfn);
> + if (!func_dev)
> + continue;
> +
> + if (!cxl_is_type2_device(func_dev)) {
> + pci_dev_put(func_dev);
> + continue;
> + }
> +
> + rc = cxl_reset_prepare_memdev(func_dev);
> + if (rc) {
> + pci_dev_put(func_dev);
> + goto cleanup_funcs;
> + }
> +
> + prepared_funcs[prepared_count++] = func_dev;
> + }
> +
> + return 0;
> +
> +cleanup_funcs:
> + for (func = 0; func < prepared_count; func++) {
> + if (prepared_funcs[func]) {
> + cxl_reset_cleanup_memdev(prepared_funcs[func]);
> + pci_dev_put(prepared_funcs[func]);
> + }
> + }
> + return rc;
> +}
> +
> +static void cxl_reset_cleanup_all_functions(struct pci_dev *pdev)
> +{
> + struct pci_dev *func_dev;
> + unsigned int devfn;
> + int func;
> +
> + for (func = 0; func < 8; func++) {
> + devfn = PCI_DEVFN(PCI_SLOT(pdev->devfn), func);
> +
> + if (devfn == pdev->devfn)
> + continue;
> +
> + func_dev = pci_get_slot(pdev->bus, devfn);
> + if (!func_dev)
> + continue;
> +
> + if (cxl_is_type2_device(func_dev))
> + cxl_reset_cleanup_memdev(func_dev);
> +
> + pci_dev_put(func_dev);
> + }
> +}
> +
> /**
> * cxl_reset_prepare_device - Prepare CXL device for reset
> * @pdev: PCI device being reset
> *
> * CXL-reset-specific preparation. Validates memory is offline, flushes
> - * device caches, and tears down regions.
> + * device caches, and tears down regions for device and siblings.
> *
> * Returns: 0 on success, -EBUSY if memory online, negative on error
> */
> @@ -1283,6 +1422,12 @@ int cxl_reset_prepare_device(struct pci_dev *pdev)
> return rc;
> }
>
> + rc = cxl_reset_prepare_all_functions(pdev);
> + if (rc) {
> + cxl_reset_cleanup_memdev(pdev);
> + return rc;
> + }
> +
> return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
> @@ -1291,10 +1436,11 @@ EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
> * cxl_reset_cleanup_device - Cleanup after CXL reset
> * @pdev: PCI device that was reset
> *
> - * Releases region locks held during reset.
> + * Releases region locks for device and all sibling functions.
> */
> void cxl_reset_cleanup_device(struct pci_dev *pdev)
> {
> + cxl_reset_cleanup_all_functions(pdev);
> cxl_reset_cleanup_memdev(pdev);
> }
> EXPORT_SYMBOL_NS_GPL(cxl_reset_cleanup_device, "CXL");
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-21 11:20 ` Jonathan Cameron
@ 2026-01-21 20:27 ` Davidlohr Bueso
2026-01-22 9:53 ` Jonathan Cameron
2026-01-21 22:19 ` Vikram Sethi
[not found] ` <PH7PR12MB9175CDFC163843BB497073CEBD96A@PH7PR12MB9175.namprd12.prod.outlook.com>
2 siblings, 1 reply; 48+ messages in thread
From: Davidlohr Bueso @ 2026-01-21 20:27 UTC (permalink / raw)
To: Jonathan Cameron
Cc: smadhavan, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Wed, 21 Jan 2026, Jonathan Cameron wrote:
>On Tue, 20 Jan 2026 22:26:07 +0000
>smadhavan@nvidia.com wrote:
>
>> From: Srirangan Madhavan <smadhavan@nvidia.com>
>>
>> Flush host CPU caches for mapped HDM ranges after teardown and prepare
>> sibling Type 2 functions on multi-function devices. The host cache
>> maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
>
>That's not sufficient in general on arm64. It might flush far enough
>or there might be buffers beyond the point of coherence that are not
>flushed.
... and quite slow for large regions without firmware assistance
(CLEAN_INV_MEMREGION from PSCI 1.3).
>Until there are clarifications from Arm we need something like the
>agents in drivers/cache to do the equivalent of wbinvd_on_all_cpus()
>(be it on a PA range to make it a tiny bit less horrible).
>
>Or we need an opt-in list for platforms where a flush to PoC is enough.
>
>Needs to be some sort of arch_ call as well, not hidden in the cxl driver.
Reset should just be using cpu_cache_invalidate_memregion().
... now, in addition to DCD, we have two users that are beyond the
"just once at boot time" use case.
Thanks,
Davidlohr
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-21 20:27 ` Davidlohr Bueso
@ 2026-01-22 9:53 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-22 9:53 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: smadhavan, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Wed, 21 Jan 2026 12:27:55 -0800
Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Wed, 21 Jan 2026, Jonathan Cameron wrote:
>
> >On Tue, 20 Jan 2026 22:26:07 +0000
> >smadhavan@nvidia.com wrote:
> >
> >> From: Srirangan Madhavan <smadhavan@nvidia.com>
> >>
> >> Flush host CPU caches for mapped HDM ranges after teardown and prepare
> >> sibling Type 2 functions on multi-function devices. The host cache
> >> maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
> >
> >That's not sufficient in general on arm64. It might flush far enough
> >or there might be buffers beyond the point of coherence that are not
> >flushed.
>
> ... and quite slow for large regions without firmware assistance
> (CLEAN_INV_MEMREGION from PSCI 1.3).
Just for completeness in case anyone takes this comment out of context.
CLEAN_INV_MEMREGION was dropped after that spec went beyond alpha. See the F.b release.
https://developer.arm.com/documentation/den0022/fb/?lang=en
Talk to your favorite Arm architecture person for why
(I'll just say it was a good reason).
Also there is no guarantee that the firmware interface does it any faster.
Implementation defined and all that.
No one jumped on my proposal to write an ACPI AML wrapper spec,
so for now the only option is to write a specific driver if you have
a hardware agent that offloads these operations (like the hisi_hha does).
>
> >Until there are clarifications from Arm we need something like the
> >agents in drivers/cache to do the equivalent of wbinvd_on_all_cpus()
> >(be it on a PA range to make it a tiny bit less horrible).
> >
> >Or we need an opt-in list for platforms where a flush to PoC is enough.
> >
> >Needs to be some sort of arch_ call as well, not hidden in the cxl driver.
>
> Reset should just be using cpu_cache_invalidate_memregion().
>
> ... now, in addition to DCD, we have two users that are beyond the
> "just once at boot time" use case.
or Back-Invalidate to do this from the device side. If anyone has built that yet...
Beware physical address based prefetchers though as there be monsters.
Jonathan
>
> Thanks,
> Davidlohr
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-21 11:20 ` Jonathan Cameron
2026-01-21 20:27 ` Davidlohr Bueso
@ 2026-01-21 22:19 ` Vikram Sethi
2026-01-22 9:40 ` Souvik Chakravarty
[not found] ` <PH7PR12MB9175CDFC163843BB497073CEBD96A@PH7PR12MB9175.namprd12.prod.outlook.com>
2 siblings, 1 reply; 48+ messages in thread
From: Vikram Sethi @ 2026-01-21 22:19 UTC (permalink / raw)
To: jonathan.cameron@huawei.com, Srirangan Madhavan
Cc: dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Vishal Aslot, Vikram Sethi, Shanker Donthineni, Vidya Sagar,
Matt Ochs, Jason Sequeira, Souvik Chakravarty
Hi Jonathan,
________________________________________
From: Jonathan Cameron <jonathan.cameron@huawei.com>
Sent: Wednesday, January 21, 2026 5:20 AM
To: Srirangan Madhavan
Cc: dave@stgolabs.net; dave.jiang@intel.com; alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com; dan.j.williams@intel.com; bhelgaas@google.com; ming.li@zohomail.com; rrichter@amd.com; Smita.KoralahalliChannabasappa@amd.com; huaisheng.ye@intel.com; linux-cxl@vger.kernel.org; linux-pci@vger.kernel.org; Vishal Aslot; Vikram Sethi; Shanker Donthineni; Vidya Sagar; Matt Ochs; Jason Sequeira
Subject: Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
On Tue, 20 Jan 2026 22:26:07 +0000
smadhavan@nvidia.com wrote:
>> From: Srirangan Madhavan <smadhavan@nvidia.com>
>>
>> Flush host CPU caches for mapped HDM ranges after teardown and prepare
>> sibling Type 2 functions on multi-function devices. The host cache
>> maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
>That's not sufficient in general on arm64. It might flush far enough
>or there might be buffers beyond the point of coherence that are not
>flushed.
>Until there are clarifications from Arm we need something like the
>agents in drivers/cache to do the equivalent of wbinvd_on_all_cpus()
>(be it on a PA range to make it a tiny bit less horrible).
>Or we need an opt-in list for platforms where a flush to PoC is enough.
>Needs to be some sort of arch_ call as well, not hidden in the cxl driver.
Yes, and in fact it cannot be a cache flush by VA while PTE mappings to the memory are valid, as the core can prefetch the lines back right away, and later evictions post device reset can cause device errors as the device snoop filter isn’t aware of host having any lines.
The only way to do this correctly is via a SMC call to Arm Trusted Firmware to handle in a SOC specific way (after the memory has been offlined and removed from the PTEs), since there is no architectural way to flush by PA (set/way isn’t usable in the kernel). There was a SMC call defined for this in the PSCI spec 2 years ago, but was removed for some reason. I had a discussion with some ARM folks maintaining the PSCI and SMCCC specifications and the direction was to bring the SMC call back in SMCCC specification. Like you say, that can be a separate patch series. Adding Souvik for SMCCC specification details once it is available.
Thanks
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-21 22:19 ` Vikram Sethi
@ 2026-01-22 9:40 ` Souvik Chakravarty
0 siblings, 0 replies; 48+ messages in thread
From: Souvik Chakravarty @ 2026-01-22 9:40 UTC (permalink / raw)
To: Vikram Sethi, jonathan.cameron@huawei.com, Srirangan Madhavan
Cc: dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Vishal Aslot, Shanker Donthineni, Vidya Sagar, Matt Ochs,
Jason Sequeira
Hi,
On 21/01/2026 22:19, Vikram Sethi wrote:
>
> Hi Jonathan,
>
>
> ________________________________________
> From: Jonathan Cameron <jonathan.cameron@huawei.com>
> Sent: Wednesday, January 21, 2026 5:20 AM
> To: Srirangan Madhavan
> Cc: dave@stgolabs.net; dave.jiang@intel.com; alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com; dan.j.williams@intel.com; bhelgaas@google.com; ming.li@zohomail.com; rrichter@amd.com; Smita.KoralahalliChannabasappa@amd.com; huaisheng.ye@intel.com; linux-cxl@vger.kernel.org; linux-pci@vger.kernel.org; Vishal Aslot; Vikram Sethi; Shanker Donthineni; Vidya Sagar; Matt Ochs; Jason Sequeira
> Subject: Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
>
>
> On Tue, 20 Jan 2026 22:26:07 +0000
> smadhavan@nvidia.com wrote:
>
>>> From: Srirangan Madhavan <smadhavan@nvidia.com>
>>>
>>> Flush host CPU caches for mapped HDM ranges after teardown and prepare
>>> sibling Type 2 functions on multi-function devices. The host cache
>>> maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
>
>> That's not sufficient in general on arm64. It might flush far enough
>> or there might be buffers beyond the point of coherence that are not
>> flushed.
>
>> Until there are clarifications from Arm we need something like the
>> agents in drivers/cache to do the equivalent of wbinvd_on_all_cpus()
>> (be it on a PA range to make it a tiny bit less horrible).
>
>> Or we need an opt-in list for platforms where a flush to PoC is enough.
>
>> Needs to be some sort of arch_ call as well, not hidden in the cxl driver.
>
> Yes, and in fact it cannot be a cache flush by VA while PTE mappings to the memory are valid, as the core can prefetch the lines back right away, and later evictions post device reset can cause device errors as the device snoop filter isn’t aware of host having any lines.
>
> The only way to do this correctly is via a SMC call to Arm Trusted Firmware to handle in a SOC specific way (after the memory has been offlined and removed from the PTEs), since there is no architectural way to flush by PA (set/way isn’t usable in the kernel). There was a SMC call defined for this in the PSCI spec 2 years ago, but was removed for some reason. I had a discussion with some ARM folks maintaining the PSCI and SMCCC specifications and the direction was to bring the SMC call back in SMCCC specification. Like you say, that can be a separate patch series. Adding Souvik for SMCCC specification details once it is available.
Yes can confirm this is being actively worked on. Something should be
available in the public SMCCC spec soon. Will be the same PSCI functions
moved to SMCCC but slightly simplified.
Regards,
Souvik
>
> Thanks
>
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <PH7PR12MB9175CDFC163843BB497073CEBD96A@PH7PR12MB9175.namprd12.prod.outlook.com>]
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
[not found] ` <PH7PR12MB9175CDFC163843BB497073CEBD96A@PH7PR12MB9175.namprd12.prod.outlook.com>
@ 2026-01-22 10:31 ` Jonathan Cameron
2026-01-22 19:24 ` Vikram Sethi
0 siblings, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-22 10:31 UTC (permalink / raw)
To: Vikram Sethi
Cc: Srirangan Madhavan, dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Vishal Aslot, Shanker Donthineni, Vidya Sagar, Matt Ochs,
Jason Sequeira, Souvik Chakravarty, james.morse
On Wed, 21 Jan 2026 20:36:01 +0000
Vikram Sethi <vsethi@nvidia.com> wrote:
> Hi Jonathan,
>
> From: Jonathan Cameron <jonathan.cameron@huawei.com>
> Date: Wednesday, January 21, 2026 at 5:20 AM
> To: Srirangan Madhavan <smadhavan@nvidia.com>
> Cc: dave@stgolabs.net <dave@stgolabs.net>, dave.jiang@intel.com <dave.jiang@intel.com>, alison.schofield@intel.com <alison.schofield@intel.com>, vishal.l.verma@intel.com <vishal.l.verma@intel.com>, ira.weiny@intel.com <ira.weiny@intel.com>, dan.j.williams@intel.com <dan.j.williams@intel.com>, bhelgaas@google.com <bhelgaas@google.com>, ming.li@zohomail.com <ming.li@zohomail.com>, rrichter@amd.com <rrichter@amd.com>, Smita.KoralahalliChannabasappa@amd.com <Smita.KoralahalliChannabasappa@amd.com>, huaisheng.ye@intel.com <huaisheng.ye@intel.com>, linux-cxl@vger.kernel.org <linux-cxl@vger.kernel.org>, linux-pci@vger.kernel.org <linux-pci@vger.kernel.org>, Vishal Aslot <vaslot@nvidia.com>, Vikram Sethi <vsethi@nvidia.com>, Shanker Donthineni <sdonthineni@nvidia.com>, Vidya Sagar <vidyas@nvidia.com>, Matt Ochs <mochs@nvidia.com>, Jason Sequeira <jsequeira@nvidia.com>
> Subject: Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
>
Hi Vikram, Happy new year!
> On Tue, 20 Jan 2026 22:26:07 +0000
> smadhavan@nvidia.com wrote:
>
> >> From: Srirangan Madhavan <smadhavan@nvidia.com>
> >>Flush host CPU caches for mapped HDM ranges after teardown and prepare
> >> sibling Type 2 functions on multi-function devices. The host cache
> > >maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
>
> >That's not sufficient in general on arm64. It might flush far enough
> >or there might be buffers beyond the point of coherence that are not
> >flushed.
> snip
>
> >Needs to be some sort of arch_ call as well, not hidden in the cxl driver.
>
> Yes, and in fact it cannot be a cache flush by VA while PTE mappings
> to the memory are valid, as the core can prefetch the lines back
> right away, and later evictions post device reset can cause device
> errors as the device snoop filter isn’t aware of host having any
> lines.
Prefetching should (hopefully) not make any dirty lines. Hopefully
no one does clean writebacks (and there is a way to check if they
do in the ACPI tables). You need to flush again after to force out
that stale stuff though before any demand fetches occur.
+ don't forget PA-based prefetchers. Doesn't even need to
be a mapping for fetches into distant caches to happen. A given
platform might have restrictions on where those PA prefetchers will
go but we currently have no way to discover that.
I need to think a bit more on this as there are some scary
comments in the spec on CXL.reset such as all CXL.mem reads
are dropped (timeout fun). Mind you a device is permitted
to do that anyway before cxl.mem is enabled, so hopefully no
one times out if a prefetcher hits the device before that's on.
> The only way to do this correctly is via a SMC call to Arm
> trusted Firmware to handle in SOC specific way (after the memory has
> been offlined and removed from the PTEs), since there is no
> architectural way to flush by PA (set/way isn’t usable in the
> kernel). There was a SMC call defined for this in the PSCI spec 2
> years ago, but was removed for some reason. I had a discussion with
> some ARM folks maintaining the PSCI and SMCCC specifications and the
> direction was to bring the SMC call back in SMCCC specification. Like
> you say, that can be a separate patch series. Adding Souvik for SMCCC
> specification details once it is available.
>
There are other possible paths, which is where the drivers/cache stuff
came from. +CC James Morse.
From a kernel point of view SMCCC needs to be just one option as a
bunch of hardware (I'll only point at ours as not sure what else is
public) provide MMIO accessible agents to do this stuff and going via
EL3 to talk to an engine the kernel can poke directly is silly.
I'm not against the PSCI thing coming back though if someone needs it.
Preferably without the CPU rendezvous stuff though -> or Linux can
just reject anyone who does that.
We could also revisit the approach of using an AML op region to issue
the SMCCC, thus allowing us to support vendor SMCCC calls (+ a general
one if ARM do bring that back) or whatever magic they want to use.
https://lore.kernel.org/all/20250820102950.175065-8-Jonathan.Cameron@huawei.com/
Advantage of that is you can wrap SMCCC, MMIO or whatever else you like
up and the kernel only needs one driver. Disadvantage is we need a spec,
and it's ACPI only, so we need a driver anyway if anyone cares about DT systems...
If you want to mess with that I can dig out the QEMU emulation and rebase
that driver. I thought it was kind of cute, just a solution we didn't need
at the time!
Jonathan
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-22 10:31 ` Jonathan Cameron
@ 2026-01-22 19:24 ` Vikram Sethi
2026-01-23 13:13 ` Jonathan Cameron
0 siblings, 1 reply; 48+ messages in thread
From: Vikram Sethi @ 2026-01-22 19:24 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Srirangan Madhavan, dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Vishal Aslot, Shanker Donthineni, Vidya Sagar, Matt Ochs,
Jason Sequeira, Souvik Chakravarty, james.morse@arm.com
Hi Jonathan,
Happy new year!
> From: Jonathan Cameron <jonathan.cameron@huawei.com>
> Date: Thursday, January 22, 2026 4:31 AM
> To: Srirangan Madhavan <smadhavan@nvidia.com>
> Cc: dave@stgolabs.net <dave@stgolabs.net>, dave.jiang@intel.com <dave.jiang@intel.com>, alison.schofield@intel.com <alison.schofield@intel.com>, vishal.l.verma@intel.com <vishal.l.verma@intel.com>, ira.weiny@intel.com <ira.weiny@intel.com>, dan.j.williams@intel.com <dan.j.williams@intel.com>, bhelgaas@google.com <bhelgaas@google.com>, ming.li@zohomail.com <ming.li@zohomail.com>, rrichter@amd.com <rrichter@amd.com>, Smita.KoralahalliChannabasappa@amd.com <Smita.KoralahalliChannabasappa@amd.com>, huaisheng.ye@intel.com <huaisheng.ye@intel.com>, linux-cxl@vger.kernel.org <linux-cxl@vger.kernel.org>, linux-pci@vger.kernel.org <linux-pci@vger.kernel.org>, Vishal Aslot <vaslot@nvidia.com>, Vikram Sethi <vsethi@nvidia.com>, Shanker Donthineni <sdonthineni@nvidia.com>, Vidya Sagar <vidyas@nvidia.com>, Matt Ochs <mochs@nvidia.com>, Jason Sequeira <jsequeira@nvidia.com>
> Subject: Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
>Prefetching should (hopefully) not make any dirty lines. Hopefully
>no one does clean writebacks (and there is a way to check if they
>do in the ACPI tables). You need to flush again after to force out
>that stale stuff though before any demand fetches occur.
I am aware of some systems which do clean writebacks to device memory.
What exact ACPI table has this discovery?
>I need to think a bit more on this as there are some scary
>comments in the spec on CXL.reset such as all CXL.mem reads
>are dropped (timeout fun). Mind you a device is permitted
>to do that anyway before cxl.mem is enabled, so hopefully no
>one times out if a prefetcher hits the device before that's on.
Yes, any sensible coherent memory device reset will have to include first offlining the memory such that there is no demand or speculative fetch possible, else you get to deal with CXL error isolation fun.
>From a kernel point of view SMCCC needs to be just one option as a
>bunch of hardware (I'll only point at ours as not sure what else is
>public) provide MMIO accessible agents to do this stuff and going via
>EL3 to talk to an engine the kernel can poke directly is silly.
We have a similar custom engine for efficient flushing, but the interface is not available to the kernel, so the SMCCC is preferred for our implementation. Like you say, the AML wrapper for SMCCC is another option, although we also have some device tree based systems where the efficient custom flushing is desirable.
>I'm not against the PSCI thing coming back though if someone needs it.
>Preferably without the CPU rendezvous stuff though -> or Linux can
>just reject anyone who does that.
Agreed. I think Souvik also agreed offline that rendezvous is not needed for the SMC based cache flush.
Vikram
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-22 19:24 ` Vikram Sethi
@ 2026-01-23 13:13 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-23 13:13 UTC (permalink / raw)
To: Vikram Sethi
Cc: Srirangan Madhavan, dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Vishal Aslot, Shanker Donthineni, Vidya Sagar, Matt Ochs,
Jason Sequeira, Souvik Chakravarty, james.morse@arm.com
On Thu, 22 Jan 2026 19:24:49 +0000
Vikram Sethi <vsethi@nvidia.com> wrote:
> Hi Jonathan,
> Happy new year!
>
> > From: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Date: Thursday, January 22, 2026 4:31 AM
> > To: Srirangan Madhavan <smadhavan@nvidia.com>
> > Cc: dave@stgolabs.net, dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com, bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com, Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com, linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org, Vishal Aslot <vaslot@nvidia.com>, Vikram Sethi <vsethi@nvidia.com>, Shanker Donthineni <sdonthineni@nvidia.com>, Vidya Sagar <vidyas@nvidia.com>, Matt Ochs <mochs@nvidia.com>, Jason Sequeira <jsequeira@nvidia.com>
> > Subject: Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
>
>
> >Prefetching should (hopefully) not make any dirty lines. Hopefully
> >no one does clean writebacks (and there is a way to check if they
> >do in the ACPI tables). You need to flush again after to force out
> >that stale stuff though before any demand fetches occur.
>
> I am aware of some systems which do clean writebacks to device memory.
> What exact ACPI table has this discovery?
CEDT (so in the CXL spec, it's there in 3.2 and 4, I'm too lazy to open
earlier specs). Specifically the CXL System Description Structure (CSDS),
System Capabilities bit[1]:
No Clean Writeback: Specifies the clean writeback behavior of the host
- 0 = The host may or may not generate clean writebacks.
- 1 = The host guarantees to never generate clean writebacks at the host's
cacheline granularity.
>
> >I need to think a bit more on this as there are some scary
> >comments in the spec on CXL.reset such as all CXL.mem reads
> >are dropped (timeout fun). Mind you a device is permitted
> >to do that anyway before cxl.mem is enabled, so hopefully no
> >one times out if a prefetcher hits the device before that's on..
>
> Yes, any sensible coherent memory device reset will have to include first offlining the memory such that there is no demand or speculative fetch possible, else you get to deal with CXL error isolation fun.
>
> >From a kernel point of view SMCCC needs to be just one option as a
> >bunch of hardware (I'll only point at ours as not sure what else is
> >public) provide MMIO accessible agents to do this stuff and going via
> >EL3 to talk to an engine the kernel can poke directly is silly.
>
> We have a similar custom engine for efficient flushing, but the interface is not available to the kernel, so the SMCCC is preferred for our implementation. Like you say, the AML wrapper for SMCCC is another option, although we also have some device tree based systems where the efficient custom flushing is desirable.
Great. So all that work they did before dropping the PSCI call
was worth doing. I'll keep an eye open for the new spec.
>
> >I'm not against the PSCI thing coming back though if someone needs it.
> >Preferably without the CPU rendezvous stuff though -> or Linux can
> >just reject anyone who does that.
>
> Agreed. I think Souvik also agreed offline that rendezvous is not needed for the SMC based cache flush.
Great.
Thanks,
Jonathan
>
> Vikram
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/10] cxl: add host cache flush and multi-function reset
2026-01-20 22:26 ` [PATCH v4 07/10] cxl: add host cache flush and multi-function reset smadhavan
2026-01-21 11:20 ` Jonathan Cameron
@ 2026-01-21 23:59 ` Dave Jiang
1 sibling, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2026-01-21 23:59 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, linux-cxl, linux-pci
Cc: vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Flush host CPU caches for mapped HDM ranges after teardown and prepare
> sibling Type 2 functions on multi-function devices. The host cache
> maintenance uses wbinvd_on_all_cpus() on x86 and VA-based PoC clean+
> invalidate on arm64 via memremap() and on_each_cpu(), matching the
> required ordering before reset.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/pci.c | 150 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 148 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e4134162e82a..f9cc452ccb8a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -11,6 +11,10 @@
> #include <linux/pci.h>
> #include <linux/aer.h>
> #include <linux/io.h>
> +#include <linux/align.h>
> +#include <linux/cache.h>
> +#include <linux/cacheflush.h>
> +#include <linux/smp.h>
> #include <cxl/mailbox.h>
> #include <cxl/pci.h>
> #include "cxlmem.h"
> @@ -1085,6 +1089,71 @@ bool cxl_is_type2_device(struct pci_dev *pdev)
> return cxlds->type == CXL_DEVTYPE_DEVMEM;
> }
>
> +#ifdef CONFIG_ARM64
Please figure out some arch common way to do this. Really do not want CONFIG_$ARCH ifdefs sprinkled all over CXL code. Also this ARCH specific code probably does not belong under CXL anyhow.
DJ
> +struct cxl_cache_flush_ctx {
> + void *va;
> + size_t len;
> +};
> +
> +static void cxl_flush_by_va_local(void *info)
> +{
> + struct cxl_cache_flush_ctx *ctx = info;
> +
> + dcache_clean_inval_poc((unsigned long)ctx->va,
> + (unsigned long)ctx->va + ctx->len);
> + asm volatile("dsb ish" ::: "memory");
> +}
> +#endif
> +
> +static int cxl_region_flush_host_cpu_caches(struct device *dev, void *data)
> +{
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + struct cxl_region *cxlr = cxled->cxld.region;
> + struct resource *res;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + if (!cxlr || !cxlr->params.res)
> + return 0;
> +
> + res = cxlr->params.res;
> +
> +#ifdef CONFIG_X86
> + static bool flushed;
> +
> + if (!flushed) {
> + wbinvd_on_all_cpus();
> + flushed = true;
> + }
> +#elif defined(CONFIG_ARM64)
> + void *va;
> + size_t len, line_size = L1_CACHE_BYTES;
> + phys_addr_t start, end, aligned_start, aligned_end;
> + struct cxl_cache_flush_ctx flush_ctx;
> +
> + start = res->start;
> + end = res->end;
> +
> + aligned_start = ALIGN_DOWN(start, line_size);
> + aligned_end = ALIGN(end + 1, line_size);
> + len = aligned_end - aligned_start;
> +
> + va = memremap(aligned_start, len, MEMREMAP_WB);
> + if (!va) {
> + pr_warn("Failed to map region for cache flush\n");
> + return 0;
> + }
> +
> + flush_ctx.va = va;
> + flush_ctx.len = len;
> + on_each_cpu(cxl_flush_by_va_local, &flush_ctx, 1);
> +
> + memunmap(va);
> +#endif
> + return 0;
> +}
> +
> static int cxl_check_region_driver_bound(struct device *dev, void *data)
> {
> struct cxl_decoder *cxld = to_cxl_decoder(dev);
> @@ -1245,6 +1314,9 @@ static int cxl_reset_prepare_memdev(struct pci_dev *pdev)
> return rc;
> }
>
> + device_for_each_child(&endpoint->dev, NULL,
> + cxl_region_flush_host_cpu_caches);
> +
> /* Keep cxl_region_rwsem held, released by cleanup function */
> return 0;
> }
> @@ -1259,12 +1331,79 @@ static void cxl_reset_cleanup_memdev(struct pci_dev *pdev)
> up_write(&cxl_region_rwsem);
> }
>
> +static int cxl_reset_prepare_all_functions(struct pci_dev *pdev)
> +{
> + struct pci_dev *func_dev;
> + unsigned int devfn;
> + int func, rc;
> + struct pci_dev *prepared_funcs[8] = { NULL };
> + int prepared_count = 0;
> +
> + for (func = 0; func < 8; func++) {
> + devfn = PCI_DEVFN(PCI_SLOT(pdev->devfn), func);
> +
> + if (devfn == pdev->devfn)
> + continue;
> +
> + func_dev = pci_get_slot(pdev->bus, devfn);
> + if (!func_dev)
> + continue;
> +
> + if (!cxl_is_type2_device(func_dev)) {
> + pci_dev_put(func_dev);
> + continue;
> + }
> +
> + rc = cxl_reset_prepare_memdev(func_dev);
> + if (rc) {
> + pci_dev_put(func_dev);
> + goto cleanup_funcs;
> + }
> +
> + prepared_funcs[prepared_count++] = func_dev;
> + }
> +
> + return 0;
> +
> +cleanup_funcs:
> + for (func = 0; func < prepared_count; func++) {
> + if (prepared_funcs[func]) {
> + cxl_reset_cleanup_memdev(prepared_funcs[func]);
> + pci_dev_put(prepared_funcs[func]);
> + }
> + }
> + return rc;
> +}
> +
> +static void cxl_reset_cleanup_all_functions(struct pci_dev *pdev)
> +{
> + struct pci_dev *func_dev;
> + unsigned int devfn;
> + int func;
> +
> + for (func = 0; func < 8; func++) {
> + devfn = PCI_DEVFN(PCI_SLOT(pdev->devfn), func);
> +
> + if (devfn == pdev->devfn)
> + continue;
> +
> + func_dev = pci_get_slot(pdev->bus, devfn);
> + if (!func_dev)
> + continue;
> +
> + if (cxl_is_type2_device(func_dev))
> + cxl_reset_cleanup_memdev(func_dev);
> +
> + pci_dev_put(func_dev);
> + }
> +}
> +
> /**
> * cxl_reset_prepare_device - Prepare CXL device for reset
> * @pdev: PCI device being reset
> *
> * CXL-reset-specific preparation. Validates memory is offline, flushes
> - * device caches, and tears down regions.
> + * device caches, and tears down regions for device and siblings.
> *
> * Returns: 0 on success, -EBUSY if memory online, negative on error
> */
> @@ -1283,6 +1422,12 @@ int cxl_reset_prepare_device(struct pci_dev *pdev)
> return rc;
> }
>
> + rc = cxl_reset_prepare_all_functions(pdev);
> + if (rc) {
> + cxl_reset_cleanup_memdev(pdev);
> + return rc;
> + }
> +
> return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
> @@ -1291,10 +1436,11 @@ EXPORT_SYMBOL_NS_GPL(cxl_reset_prepare_device, "CXL");
> * cxl_reset_cleanup_device - Cleanup after CXL reset
> * @pdev: PCI device that was reset
> *
> - * Releases region locks held during reset.
> + * Releases region locks for device and all sibling functions.
> */
> void cxl_reset_cleanup_device(struct pci_dev *pdev)
> {
> + cxl_reset_cleanup_all_functions(pdev);
> cxl_reset_cleanup_memdev(pdev);
> }
> EXPORT_SYMBOL_NS_GPL(cxl_reset_cleanup_device, "CXL");
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v4 08/10] cxl: add DVSEC config save/restore
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (6 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 07/10] cxl: add host cache flush and multi-function reset smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 11:31 ` Jonathan Cameron
2026-01-20 22:26 ` [PATCH v4 09/10] PCI: save/restore CXL config around reset smadhavan
` (4 subsequent siblings)
12 siblings, 1 reply; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
Save and restore CXL DVSEC control registers across reset with
CONFIG_LOCK handling so RWL fields are preserved when locked. This
maintains device policy and capability state across cxl_reset while
avoiding writes to locked fields.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/pci.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++
include/cxl/pci.h | 15 +++++++
2 files changed, 122 insertions(+)
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index f9cc452ccb8a..7d6a0ef70b2d 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1154,6 +1154,113 @@ static int cxl_region_flush_host_cpu_caches(struct device *dev, void *data)
return 0;
}
+/*
+ * CXL DVSEC register save/restore
+ */
+static int cxl_save_dvsec_state(struct pci_dev *pdev,
+ struct cxl_type2_saved_state *state, int dvsec)
+{
+ int rc;
+
+ rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL_OFFSET,
+ &state->dvsec_ctrl);
+ if (rc)
+ return rc;
+
+ rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
+ &state->dvsec_ctrl2);
+ return rc;
+}
+
+static int cxl_restore_dvsec_state(struct pci_dev *pdev,
+ const struct cxl_type2_saved_state *state,
+ int dvsec, bool config_locked)
+{
+ int rc;
+ u16 val_to_restore;
+
+ if (config_locked) {
+ u16 current_val;
+
+ rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL_OFFSET,
+ ¤t_val);
+ if (rc)
+ return rc;
+
+ val_to_restore = (current_val & CXL_DVSEC_CTRL_RWL_MASK) |
+ (state->dvsec_ctrl & ~CXL_DVSEC_CTRL_RWL_MASK);
+ } else {
+ val_to_restore = state->dvsec_ctrl;
+ }
+
+ rc = pci_write_config_word(pdev, dvsec + CXL_DVSEC_CTRL_OFFSET,
+ val_to_restore);
+ if (rc)
+ return rc;
+
+ rc = pci_write_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
+ state->dvsec_ctrl2);
+ return rc;
+}
+
+/**
+ * cxl_config_save_state - Save CXL configuration state
+ * @pdev: PCI device
+ * @state: Structure to store saved state
+ *
+ * Saves CXL DVSEC state before reset.
+ */
+int cxl_config_save_state(struct pci_dev *pdev,
+ struct cxl_type2_saved_state *state)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ int dvsec;
+
+ if (!cxlds || !state)
+ return -EINVAL;
+
+ memset(state, 0, sizeof(*state));
+
+ dvsec = cxlds->cxl_dvsec;
+ if (!dvsec)
+ return -ENODEV;
+
+ return cxl_save_dvsec_state(pdev, state, dvsec);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_config_save_state, "CXL");
+
+/**
+ * cxl_config_restore_state - Restore CXL configuration state
+ * @pdev: PCI device
+ * @state: Previously saved state
+ *
+ * Restores CXL DVSEC state after reset.
+ */
+int cxl_config_restore_state(struct pci_dev *pdev,
+ const struct cxl_type2_saved_state *state)
+{
+ struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ bool config_locked;
+ int rc, dvsec;
+ u16 lock_reg;
+
+ if (!cxlds || !state)
+ return -EINVAL;
+
+ dvsec = cxlds->cxl_dvsec;
+ if (!dvsec)
+ return -ENODEV;
+
+ rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_LOCK_OFFSET, &lock_reg);
+ if (rc)
+ return rc;
+
+ config_locked = !!(lock_reg & CXL_DVSEC_LOCK_CONFIG_LOCK);
+
+ return cxl_restore_dvsec_state(pdev, state, dvsec, config_locked);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_config_restore_state, "CXL");
+
static int cxl_check_region_driver_bound(struct device *dev, void *data)
{
struct cxl_decoder *cxld = to_cxl_decoder(dev);
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
index 71d8de5de948..2c629ded73cc 100644
--- a/include/cxl/pci.h
+++ b/include/cxl/pci.h
@@ -4,6 +4,18 @@
#ifndef __CXL_ACCEL_PCI_H
#define __CXL_ACCEL_PCI_H
+/* CXL Type 2 device state for save/restore across reset */
+struct cxl_type2_saved_state {
+ /* DVSEC registers */
+ u16 dvsec_ctrl;
+ u16 dvsec_ctrl2;
+};
+
+int cxl_config_save_state(struct pci_dev *pdev,
+ struct cxl_type2_saved_state *state);
+int cxl_config_restore_state(struct pci_dev *pdev,
+ const struct cxl_type2_saved_state *state);
+
/*
* See section 8.1 Configuration Space Registers in the CXL 2.0
* Specification. Names are taken straight from the specification with "CXL" and
@@ -23,6 +35,7 @@
#define CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE BIT(11)
#define CXL_DVSEC_CTRL_OFFSET 0xC
#define CXL_DVSEC_MEM_ENABLE BIT(2)
+#define CXL_DVSEC_CTRL_RWL_MASK 0x5FED
#define CXL_DVSEC_CTRL2_OFFSET 0x10
#define CXL_DVSEC_DISABLE_CACHING BIT(0)
#define CXL_DVSEC_INIT_CACHE_WBI BIT(1)
@@ -32,6 +45,8 @@
#define CXL_DVSEC_CACHE_INVALID BIT(0)
#define CXL_DVSEC_CXL_RST_COMPLETE BIT(1)
#define CXL_DVSEC_CXL_RESET_ERR BIT(2)
+#define CXL_DVSEC_LOCK_OFFSET 0x14
+#define CXL_DVSEC_LOCK_CONFIG_LOCK BIT(0)
#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 08/10] cxl: add DVSEC config save/restore
2026-01-20 22:26 ` [PATCH v4 08/10] cxl: add DVSEC config save/restore smadhavan
@ 2026-01-21 11:31 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 11:31 UTC (permalink / raw)
To: smadhavan
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 22:26:08 +0000
smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Save and restore CXL DVSEC control registers across reset with
> CONFIG_LOCK handling so RWL fields are preserved when locked. This
> maintains device policy and capability state across cxl_reset while
> avoiding writes to locked fields.
Add some more info here on what is protected by the lock and
preserved across reset. This is odd enough I think that detail is needed.
There are other RWL fields outside those registers you are restoring here.
e.g. the range registers, various things in extended metadata, etc.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/pci.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++
> include/cxl/pci.h | 15 +++++++
> 2 files changed, 122 insertions(+)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index f9cc452ccb8a..7d6a0ef70b2d 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1154,6 +1154,113 @@ static int cxl_region_flush_host_cpu_caches(struct device *dev, void *data)
> return 0;
> }
>
> +/*
> + * CXL DVSEC register save/restore
> + */
> +static int cxl_save_dvsec_state(struct pci_dev *pdev,
> + struct cxl_type2_saved_state *state, int dvsec)
> +{
> + int rc;
> +
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL_OFFSET,
> + &state->dvsec_ctrl);
> + if (rc)
> + return rc;
> +
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
> + &state->dvsec_ctrl2);
> + return rc;
Similar to below. Just combine 2 lines into 1.
> +}
> +
> +static int cxl_restore_dvsec_state(struct pci_dev *pdev,
> + const struct cxl_type2_saved_state *state,
> + int dvsec, bool config_locked)
> +{
> + int rc;
> + u16 val_to_restore;
> +
> + if (config_locked) {
> + u16 current_val;
> +
> + rc = pci_read_config_word(pdev, dvsec + CXL_DVSEC_CTRL_OFFSET,
> + ¤t_val);
> + if (rc)
> + return rc;
> +
> + val_to_restore = (current_val & CXL_DVSEC_CTRL_RWL_MASK) |
> + (state->dvsec_ctrl & ~CXL_DVSEC_CTRL_RWL_MASK);
> + } else {
> + val_to_restore = state->dvsec_ctrl;
> + }
> +
> + rc = pci_write_config_word(pdev, dvsec + CXL_DVSEC_CTRL_OFFSET,
> + val_to_restore);
> + if (rc)
> + return rc;
> +
> + rc = pci_write_config_word(pdev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
> + state->dvsec_ctrl2);
> + return rc;
If this isn't expected to get more complex (I haven't checked) then
return pci_write_config()
> +}
> +
> +/**
> + * cxl_config_save_state - Save CXL configuration state
> + * @pdev: PCI device
> + * @state: Structure to store saved state
> + *
> + * Saves CXL DVSEC state before reset.
> + */
> +int cxl_config_save_state(struct pci_dev *pdev,
> + struct cxl_type2_saved_state *state)
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + int dvsec;
> +
> + if (!cxlds || !state)
> + return -EINVAL;
> +
> + memset(state, 0, sizeof(*state));
We need to make sure all registers are read, so why is zeroing helpful?
> +
> + dvsec = cxlds->cxl_dvsec;
> + if (!dvsec)
> + return -ENODEV;
> +
> + return cxl_save_dvsec_state(pdev, state, dvsec);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_config_save_state, "CXL");
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v4 09/10] PCI: save/restore CXL config around reset
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (7 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 08/10] cxl: add DVSEC config save/restore smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 22:32 ` Dave Jiang
2026-01-22 10:01 ` Lukas Wunner
2026-01-20 22:26 ` [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore smadhavan
` (3 subsequent siblings)
12 siblings, 2 replies; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
Save PCI and CXL configuration state before cxl_reset and restore it
after reset completes. This preserves DVSEC state alongside standard
PCI state and avoids losing reset-sensitive CXL configuration.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/pci/pci.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 18047c893b0c..0bc85c4cc5fd 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4960,6 +4960,7 @@ static int cxl_reset_init(struct pci_dev *dev, u16 dvsec)
*/
static int cxl_reset(struct pci_dev *dev, bool probe)
{
+ struct cxl_type2_saved_state cxl_state;
u16 dvsec, reg;
int rc;
@@ -4989,6 +4990,11 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
if (probe)
return 0;
+ pci_save_state(dev);
+ rc = cxl_config_save_state(dev, &cxl_state);
+ if (rc)
+ pci_warn(dev, "Failed to save CXL config state: %d\n", rc);
+
/*
* CXL-reset-specific preparation: validate memory offline,
* tear down regions, flush device caches.
@@ -5004,10 +5010,16 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
if (rc)
goto out_cleanup;
+ pci_restore_state(dev);
+ rc = cxl_config_restore_state(dev, &cxl_state);
+ if (rc)
+ pci_warn(dev, "Failed to restore CXL config state: %d\n", rc);
+
cxl_reset_cleanup_device(dev);
return 0;
out_cleanup:
+ pci_restore_state(dev);
cxl_reset_cleanup_device(dev);
return rc;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 09/10] PCI: save/restore CXL config around reset
2026-01-20 22:26 ` [PATCH v4 09/10] PCI: save/restore CXL config around reset smadhavan
@ 2026-01-21 22:32 ` Dave Jiang
2026-01-22 10:01 ` Lukas Wunner
1 sibling, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2026-01-21 22:32 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Save PCI and CXL configuration state before cxl_reset and restore it
> after reset completes. This preserves DVSEC state alongside standard
> PCI state and avoids losing reset-sensitive CXL configuration.
Instead of putting dependency on the cxl core, maybe just move the code in the previous patch here since it's just a few lines of config read/writes. But an explanation of why the DVSEC needs to be preserved on the device regardless of whether a driver is present is needed in the commit log.
DJ
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/pci/pci.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 18047c893b0c..0bc85c4cc5fd 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4960,6 +4960,7 @@ static int cxl_reset_init(struct pci_dev *dev, u16 dvsec)
> */
> static int cxl_reset(struct pci_dev *dev, bool probe)
> {
> + struct cxl_type2_saved_state cxl_state;
> u16 dvsec, reg;
> int rc;
>
> @@ -4989,6 +4990,11 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> if (probe)
> return 0;
>
> + pci_save_state(dev);
> + rc = cxl_config_save_state(dev, &cxl_state);
> + if (rc)
> + pci_warn(dev, "Failed to save CXL config state: %d\n", rc);
> +
> /*
> * CXL-reset-specific preparation: validate memory offline,
> * tear down regions, flush device caches.
> @@ -5004,10 +5010,16 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> if (rc)
> goto out_cleanup;
>
> + pci_restore_state(dev);
> + rc = cxl_config_restore_state(dev, &cxl_state);
> + if (rc)
> + pci_warn(dev, "Failed to restore CXL config state: %d\n", rc);
> +
> cxl_reset_cleanup_device(dev);
> return 0;
>
> out_cleanup:
> + pci_restore_state(dev);
> cxl_reset_cleanup_device(dev);
> return rc;
> }
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 09/10] PCI: save/restore CXL config around reset
2026-01-20 22:26 ` [PATCH v4 09/10] PCI: save/restore CXL config around reset smadhavan
2026-01-21 22:32 ` Dave Jiang
@ 2026-01-22 10:01 ` Lukas Wunner
2026-01-22 10:47 ` Jonathan Cameron
1 sibling, 1 reply; 48+ messages in thread
From: Lukas Wunner @ 2026-01-22 10:01 UTC (permalink / raw)
To: smadhavan
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira,
Terry Bowman
On Tue, Jan 20, 2026 at 10:26:09PM +0000, smadhavan@nvidia.com wrote:
> +++ b/drivers/pci/pci.c
> @@ -4989,6 +4990,11 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> if (probe)
> return 0;
>
> + pci_save_state(dev);
> + rc = cxl_config_save_state(dev, &cxl_state);
> + if (rc)
> + pci_warn(dev, "Failed to save CXL config state: %d\n", rc);
> +
Hm, shouldn't the call to cxl_config_save_state() be moved to
pci_save_state() (and likewise, cxl_config_restore_state() moved to
pci_restore_state())?
E.g. when a DPC event occurs, I assume CXL registers need to
be restored as well on recovery, right?
Note that since v6.19-rc1, state is saved on enumeration and
thus always available for recovery, see a2f1e22390ac.
As a general remark on this series, it seems to have considerable
overlap with Terry's work on AER support for CXL, particularly in
patch [01/10], as Jonathan has remarked. Please cc Terry on future
submissions and coordinate with him on conflicting parts of your
patches:
https://lore.kernel.org/all/20260114182055.46029-1-terry.bowman@amd.com/
Thanks,
Lukas
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 09/10] PCI: save/restore CXL config around reset
2026-01-22 10:01 ` Lukas Wunner
@ 2026-01-22 10:47 ` Jonathan Cameron
2026-01-26 22:34 ` Alex Williamson
0 siblings, 1 reply; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-22 10:47 UTC (permalink / raw)
To: Lukas Wunner
Cc: smadhavan, dave, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira,
Terry Bowman
On Thu, 22 Jan 2026 11:01:57 +0100
Lukas Wunner <lukas@wunner.de> wrote:
> On Tue, Jan 20, 2026 at 10:26:09PM +0000, smadhavan@nvidia.com wrote:
> > +++ b/drivers/pci/pci.c
> > @@ -4989,6 +4990,11 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> > if (probe)
> > return 0;
> >
> > + pci_save_state(dev);
> > + rc = cxl_config_save_state(dev, &cxl_state);
> > + if (rc)
> > + pci_warn(dev, "Failed to save CXL config state: %d\n", rc);
> > +
>
> Hm, shouldn't the call to cxl_config_save_state() be moved to
> pci_save_state() (and likewise, cxl_config_restore_state() moved to
> pci_restore_state())?
>
> E.g. when a DPC event occurs, I assume CXL registers need to
> be restored as well on recovery, right?
The CXL spec has some comic language around DPC that basically says
"use with care, DPC trigger will bring down physical link, reset device state,
disrupt CXL.cache and CXL.mem traffic".
or in shorter words
'Good luck'
If a CXL device undergoes DPC high chance you'll either trigger CXL isolation
which we aren't handling yet in Linux because we aren't convinced software
can really recover from it, or stall a CPU and end up rebooting.
Maybe one day we'll figure this out. Today turn off DPC on CXL ports! :)
J
>
> Note that since v6.19-rc1, state is saved on enumeration and
> thus always available for recovery, see a2f1e22390ac.
>
> As a general remark on this series, it seems to have considerable
> overlap with Terry's work on AER support for CXL, particularly in
> patch [01/10], as Jonathan has remarked. Please cc Terry on future
> submissions and coordinate with him on conflicting parts of your
> patches:
>
> https://lore.kernel.org/all/20260114182055.46029-1-terry.bowman@amd.com/
>
> Thanks,
>
> Lukas
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 09/10] PCI: save/restore CXL config around reset
2026-01-22 10:47 ` Jonathan Cameron
@ 2026-01-26 22:34 ` Alex Williamson
2026-03-12 18:24 ` Jonathan Cameron
0 siblings, 1 reply; 48+ messages in thread
From: Alex Williamson @ 2026-01-26 22:34 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Lukas Wunner, smadhavan, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira,
Terry Bowman
On Thu, 22 Jan 2026 10:47:45 +0000
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:
> On Thu, 22 Jan 2026 11:01:57 +0100
> Lukas Wunner <lukas@wunner.de> wrote:
>
> > On Tue, Jan 20, 2026 at 10:26:09PM +0000, smadhavan@nvidia.com wrote:
> > > +++ b/drivers/pci/pci.c
> > > @@ -4989,6 +4990,11 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> > > if (probe)
> > > return 0;
> > >
> > > + pci_save_state(dev);
> > > + rc = cxl_config_save_state(dev, &cxl_state);
> > > + if (rc)
> > > + pci_warn(dev, "Failed to save CXL config state: %d\n", rc);
> > > +
> >
> > Hm, shouldn't the call to cxl_config_save_state() be moved to
> > pci_save_state() (and likewise, cxl_config_restore_state() moved to
> > pci_restore_state())?
> >
> > E.g. when a DPC event occurs, I assume CXL registers need to
> > be restored as well on recovery, right?
> The CXL spec has some comic language around DPC that basically says
> "use with care, DPC trigger will bring down physical link, reset devicestate,
> disrupt CXL.cache and CXL.mem traffic".
> or in shorter words
> 'Good luck'
>
> If a CXL device undergoes DPC high chance you'll either trigger CXL isolation
> which we aren't handing yet in Linux because we aren't convinced software
> can really recover form it, or stall a CPU and end up rebooting.
>
> Maybe we'll one day we'll figure this out. Today turn off DPC on CXL ports! :)
Even if we hand-wave that DPC isn't an issue, save/restore of the PCI
state happens at a higher level for every other PCI reset method and
we're creating inconsistency here.
PCI-core includes interfaces for saving PCI state, offloading PCI state
as an opaque blob, reloading, and restoring that state, and performing
resets without saving and restoring state. This has a couple users,
including vfio.
If we want similar behavior for CXL type2 devices for a future vfio use
case, we shouldn't create unnecessary differentiation here with saving
the CXL state separately and making the reset method behave
differently. Thanks,
Alex
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 09/10] PCI: save/restore CXL config around reset
2026-01-26 22:34 ` Alex Williamson
@ 2026-03-12 18:24 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-03-12 18:24 UTC (permalink / raw)
To: Alex Williamson
Cc: Lukas Wunner, smadhavan, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira,
Terry Bowman
On Mon, 26 Jan 2026 15:34:35 -0700
Alex Williamson <alex@shazbot.org> wrote:
> On Thu, 22 Jan 2026 10:47:45 +0000
> Jonathan Cameron <jonathan.cameron@huawei.com> wrote:
>
> > On Thu, 22 Jan 2026 11:01:57 +0100
> > Lukas Wunner <lukas@wunner.de> wrote:
> >
> > > On Tue, Jan 20, 2026 at 10:26:09PM +0000, smadhavan@nvidia.com wrote:
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -4989,6 +4990,11 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> > > > if (probe)
> > > > return 0;
> > > >
> > > > + pci_save_state(dev);
> > > > + rc = cxl_config_save_state(dev, &cxl_state);
> > > > + if (rc)
> > > > + pci_warn(dev, "Failed to save CXL config state: %d\n", rc);
> > > > +
> > >
> > > Hm, shouldn't the call to cxl_config_save_state() be moved to
> > > pci_save_state() (and likewise, cxl_config_restore_state() moved to
> > > pci_restore_state())?
> > >
> > > E.g. when a DPC event occurs, I assume CXL registers need to
> > > be restored as well on recovery, right?
> > The CXL spec has some comic language around DPC that basically says
> > "use with care, DPC trigger will bring down physical link, reset devicestate,
> > disrupt CXL.cache and CXL.mem traffic".
> > or in shorter words
> > 'Good luck'
> >
> > If a CXL device undergoes DPC there's a high chance you'll either trigger
> > CXL isolation, which we aren't handling yet in Linux because we aren't
> > convinced software can really recover from it, or stall a CPU and end up rebooting.
> >
> > Maybe one day we'll figure this out. Today, turn off DPC on CXL ports! :)
>
> Even if we hand-wave that DPC isn't an issue, save/restore of the PCI
> state happens at a higher level for every other PCI reset method and
> we're creating inconsistency here.
>
> PCI-core includes interfaces for saving PCI state, offloading PCI state
> as an opaque blob, reloading, and restoring that state, and performing
> resets without saving and restoring state. This has a couple users,
> including vfio.
>
> If we want similar behavior for CXL type2 devices for a future vfio use
> case, we shouldn't create unnecessary differentiation here with saving
> the CXL state separately and making the reset method behave
> differently. Thanks,
>
I'm a bit concerned that, unlike PCI where no traffic flows after reset
and restore of basic PCIe stuff, for CXL once you've put the decoders
etc back in place, CXL.mem traffic can happen autonomously. It's
cacheable, and physical address prefetchers on the CPU side may be able to
wander into it more or less randomly, whether there are page tables yet
or not.
This is somewhat similar to PCI devices misbehaving if you enable
bus mastering without ensuring they are in a clean state (just in the
other direction).
So I'm not sure how safe it is to restore the generic CXL state without
the driver taking control.
I don't think there are tight enough guarantees that devices will be
able to survive this if their drivers haven't managed the setup of CXL.mem
as carefully as they did during driver bind etc. Maybe they had to
load a firmware first before there was anything behind a CXL protocol
front end.
The drivers can't stop CXL.mem in a prepare reset callback
prior to saving state as it may be RWL by an annoying BIOS.
Maybe I'm overly paranoid and all device manufacturers are sensible.
Or I missed some spec text that says devices should politely handle
traffic turning up before they are ready. If they implement the memory
ready checks then we may be fine as hopefully Media Status == Ready
doesn't happen until it's safe to enable access (though I can't find
any spec text that actually says that is sufficient).
I need to do some more digging and maybe a spot of prototyping.
Also, it's more than plausible that I'm missing a nugget of code in here
that makes this all safe.
Jonathan
> Alex
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (8 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 09/10] PCI: save/restore CXL config around reset smadhavan
@ 2026-01-20 22:26 ` smadhavan
2026-01-21 11:42 ` Jonathan Cameron
2026-01-22 15:09 ` Dave Jiang
2026-01-21 1:19 ` [PATCH v4 0/10] CXL Reset support for Type 2 devices Alison Schofield
` (2 subsequent siblings)
12 siblings, 2 replies; 48+ messages in thread
From: smadhavan @ 2026-01-20 22:26 UTC (permalink / raw)
To: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: smadhavan, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
From: Srirangan Madhavan <smadhavan@nvidia.com>
Extend state save/restore to HDM decoder and IDE registers for Type 2
devices. The HDM/IDE blocks are located via the component register map,
then preserved across reset to retain decoder configuration and IDE
policy. This avoids losing HDM/IDE programming when cxl_reset is issued.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
---
drivers/cxl/core/regs.c | 7 ++
drivers/cxl/cxl.h | 4 ++
drivers/cxl/pci.c | 153 ++++++++++++++++++++++++++++++++++++++--
include/cxl/pci.h | 43 +++++++++++
4 files changed, 201 insertions(+), 6 deletions(-)
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index ecdb22ae6952..76d6869d82ea 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -93,6 +93,12 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
length = CXL_RAS_CAPABILITY_LENGTH;
rmap = &map->ras;
break;
+ case CXL_CM_CAP_CAP_ID_IDE:
+ dev_dbg(dev, "found IDE capability (0x%x)\n",
+ offset);
+ length = CXL_IDE_CAPABILITY_LENGTH;
+ rmap = &map->ide;
+ break;
default:
dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
offset);
@@ -212,6 +218,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
} mapinfo[] = {
{ &map->component_map.hdm_decoder, &regs->hdm_decoder },
{ &map->component_map.ras, &regs->ras },
+ { &map->component_map.ide, &regs->ide },
};
int i;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ba17fa86d249..a7a6b79755b3 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -39,8 +39,10 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
#define CXL_CM_CAP_CAP_ID_RAS 0x2
+#define CXL_CM_CAP_CAP_ID_IDE 0x4
#define CXL_CM_CAP_CAP_ID_HDM 0x5
#define CXL_CM_CAP_CAP_HDM_VERSION 1
+#define CXL_IDE_CAPABILITY_LENGTH 0x20
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
#define CXL_HDM_DECODER_CAP_OFFSET 0x0
@@ -214,6 +216,7 @@ struct cxl_regs {
struct_group_tagged(cxl_component_regs, component,
void __iomem *hdm_decoder;
void __iomem *ras;
+ void __iomem *ide;
);
/*
* Common set of CXL Device register block base pointers
@@ -256,6 +259,7 @@ struct cxl_reg_map {
struct cxl_component_reg_map {
struct cxl_reg_map hdm_decoder;
struct cxl_reg_map ras;
+ struct cxl_reg_map ide;
};
struct cxl_device_reg_map {
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 7d6a0ef70b2d..eb735d0ae175 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -956,7 +956,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
dev_dbg(&pdev->dev, "RAS registers not found\n");
rc = cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component,
- BIT(CXL_CM_CAP_CAP_ID_RAS));
+ BIT(CXL_CM_CAP_CAP_ID_RAS) |
+ BIT(CXL_CM_CAP_CAP_ID_IDE));
if (rc)
dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
@@ -1203,18 +1204,126 @@ static int cxl_restore_dvsec_state(struct pci_dev *pdev,
return rc;
}
+/*
+ * CXL HDM Decoder register save/restore
+ */
+static int cxl_save_hdm_state(struct cxl_dev_state *cxlds,
+ struct cxl_type2_saved_state *state)
+{
+ void __iomem *hdm = cxlds->regs.hdm_decoder;
+ u32 cap, ctrl;
+ int i, count;
+
+ if (!hdm)
+ return 0;
+
+ cap = readl(hdm + CXL_HDM_DECODER_CAP_OFFSET);
+ count = cap & CXL_HDM_DECODER_COUNT_MASK;
+ count = min(count, CXL_MAX_DECODERS);
+
+ state->hdm_decoder_count = count;
+ state->hdm_global_ctrl = readl(hdm + CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET);
+
+ for (i = 0; i < count; i++) {
+ struct cxl_hdm_decoder_state *d = &state->decoders[i];
+ u32 base_low, base_high, size_low, size_high;
+ u32 dpa_skip_low, dpa_skip_high;
+
+ base_low = readl(hdm + CXL_HDM_DECODER_BASE_LOW(i));
+ base_high = readl(hdm + CXL_HDM_DECODER_BASE_HIGH(i));
+ size_low = readl(hdm + CXL_HDM_DECODER_SIZE_LOW(i));
+ size_high = readl(hdm + CXL_HDM_DECODER_SIZE_HIGH(i));
+ ctrl = readl(hdm + CXL_HDM_DECODER_CTRL(i));
+ dpa_skip_low = readl(hdm + CXL_HDM_DECODER_DPA_SKIP_LOW(i));
+ dpa_skip_high = readl(hdm + CXL_HDM_DECODER_DPA_SKIP_HIGH(i));
+
+ d->base = ((u64)base_high << 32) | base_low;
+ d->size = ((u64)size_high << 32) | size_low;
+ d->ctrl = ctrl;
+ d->dpa_skip = ((u64)dpa_skip_high << 32) | dpa_skip_low;
+ d->enabled = !!(ctrl & CXL_HDM_DECODER_ENABLE);
+ }
+
+ return 0;
+}
+
+static int cxl_restore_hdm_state(struct cxl_dev_state *cxlds,
+ const struct cxl_type2_saved_state *state)
+{
+ void __iomem *hdm = cxlds->regs.hdm_decoder;
+ int i;
+
+ if (!hdm || state->hdm_decoder_count == 0)
+ return 0;
+
+ writel(state->hdm_global_ctrl, hdm + CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET);
+
+ for (i = 0; i < state->hdm_decoder_count; i++) {
+ const struct cxl_hdm_decoder_state *d = &state->decoders[i];
+
+ writel((u32)d->base, hdm + CXL_HDM_DECODER_BASE_LOW(i));
+ writel((u32)(d->base >> 32), hdm + CXL_HDM_DECODER_BASE_HIGH(i));
+ writel((u32)d->size, hdm + CXL_HDM_DECODER_SIZE_LOW(i));
+ writel((u32)(d->size >> 32), hdm + CXL_HDM_DECODER_SIZE_HIGH(i));
+ writel(d->ctrl, hdm + CXL_HDM_DECODER_CTRL(i));
+ writel((u32)d->dpa_skip, hdm + CXL_HDM_DECODER_DPA_SKIP_LOW(i));
+ writel((u32)(d->dpa_skip >> 32), hdm + CXL_HDM_DECODER_DPA_SKIP_HIGH(i));
+ }
+
+ return 0;
+}
+
+/*
+ * CXL IDE register save/restore
+ */
+static int cxl_save_ide_state(struct cxl_dev_state *cxlds,
+ struct cxl_type2_saved_state *state)
+{
+ void __iomem *ide = cxlds->regs.ide;
+ u32 cap;
+
+ if (!ide)
+ return 0;
+
+ cap = readl(ide + CXL_IDE_CAP_OFFSET);
+ if (!(cap & CXL_IDE_CAP_CAPABLE))
+ return 0;
+
+ state->ide_cap = cap;
+ state->ide_ctrl = readl(ide + CXL_IDE_CTRL_OFFSET);
+ state->ide_key_refresh_time = readl(ide + CXL_IDE_KEY_REFRESH_TIME_CTRL_OFFSET);
+ state->ide_truncation_delay = readl(ide + CXL_IDE_TRUNCATION_DELAY_CTRL_OFFSET);
+
+ return 0;
+}
+
+static int cxl_restore_ide_state(struct cxl_dev_state *cxlds,
+ const struct cxl_type2_saved_state *state)
+{
+ void __iomem *ide = cxlds->regs.ide;
+
+ if (!ide || !(state->ide_cap & CXL_IDE_CAP_CAPABLE))
+ return 0;
+
+ writel(state->ide_ctrl, ide + CXL_IDE_CTRL_OFFSET);
+ writel(state->ide_key_refresh_time, ide + CXL_IDE_KEY_REFRESH_TIME_CTRL_OFFSET);
+ writel(state->ide_truncation_delay, ide + CXL_IDE_TRUNCATION_DELAY_CTRL_OFFSET);
+
+ return 0;
+}
+
/**
* cxl_config_save_state - Save CXL configuration state
* @pdev: PCI device
* @state: Structure to store saved state
*
- * Saves CXL DVSEC state before reset.
+ * Saves CXL DVSEC, HDM decoder, and IDE state before reset.
*/
int cxl_config_save_state(struct pci_dev *pdev,
struct cxl_type2_saved_state *state)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
- int dvsec;
+ int rc, dvsec;
if (!cxlds || !state)
return -EINVAL;
@@ -1225,7 +1334,23 @@ int cxl_config_save_state(struct pci_dev *pdev,
if (!dvsec)
return -ENODEV;
- return cxl_save_dvsec_state(pdev, state, dvsec);
+ rc = cxl_save_dvsec_state(pdev, state, dvsec);
+ if (rc)
+ return rc;
+
+ if (cxlds->regs.hdm_decoder) {
+ rc = cxl_save_hdm_state(cxlds, state);
+ if (rc)
+ pci_warn(pdev, "Failed to save HDM state: %d\n", rc);
+ }
+
+ if (cxlds->regs.ide) {
+ rc = cxl_save_ide_state(cxlds, state);
+ if (rc)
+ pci_warn(pdev, "Failed to save IDE state: %d\n", rc);
+ }
+
+ return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_config_save_state, "CXL");
@@ -1234,7 +1359,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_config_save_state, "CXL");
* @pdev: PCI device
* @state: Previously saved state
*
- * Restores CXL DVSEC state after reset.
+ * Restores CXL DVSEC, HDM decoder, and IDE state after reset.
*/
int cxl_config_restore_state(struct pci_dev *pdev,
const struct cxl_type2_saved_state *state)
@@ -1257,7 +1382,23 @@ int cxl_config_restore_state(struct pci_dev *pdev,
config_locked = !!(lock_reg & CXL_DVSEC_LOCK_CONFIG_LOCK);
- return cxl_restore_dvsec_state(pdev, state, dvsec, config_locked);
+ rc = cxl_restore_dvsec_state(pdev, state, dvsec, config_locked);
+ if (rc)
+ return rc;
+
+ if (cxlds->regs.hdm_decoder && state->hdm_decoder_count > 0) {
+ rc = cxl_restore_hdm_state(cxlds, state);
+ if (rc)
+ pci_warn(pdev, "Failed to restore HDM state: %d\n", rc);
+ }
+
+ if (cxlds->regs.ide && (state->ide_cap & CXL_IDE_CAP_CAPABLE)) {
+ rc = cxl_restore_ide_state(cxlds, state);
+ if (rc)
+ pci_warn(pdev, "Failed to restore IDE state: %d\n", rc);
+ }
+
+ return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_config_restore_state, "CXL");
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
index 2c629ded73cc..9f2a9ad10d75 100644
--- a/include/cxl/pci.h
+++ b/include/cxl/pci.h
@@ -4,11 +4,33 @@
#ifndef __CXL_ACCEL_PCI_H
#define __CXL_ACCEL_PCI_H
+/* HDM Decoder state for save/restore */
+struct cxl_hdm_decoder_state {
+ u64 base;
+ u64 size;
+ u32 ctrl;
+ u64 dpa_skip;
+ bool enabled;
+};
+
+#define CXL_MAX_DECODERS 10
+
/* CXL Type 2 device state for save/restore across reset */
struct cxl_type2_saved_state {
/* DVSEC registers */
u16 dvsec_ctrl;
u16 dvsec_ctrl2;
+
+ /* HDM Decoder registers */
+ u32 hdm_decoder_count;
+ u32 hdm_global_ctrl;
+ struct cxl_hdm_decoder_state decoders[CXL_MAX_DECODERS];
+
+ /* IDE registers */
+ u32 ide_cap;
+ u32 ide_ctrl;
+ u32 ide_key_refresh_time;
+ u32 ide_truncation_delay;
};
int cxl_config_save_state(struct pci_dev *pdev,
@@ -58,6 +80,27 @@ int cxl_config_restore_state(struct pci_dev *pdev,
#define CXL_DVSEC_RANGE_MAX 2
+/* CXL HDM Decoder Capability Structure (Section 8.2.4.20) */
+#define CXL_HDM_DECODER_CAP_OFFSET 0x0
+#define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
+#define CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET 0x4
+#define CXL_HDM_DECODER_ENABLE BIT(1)
+/* CXL HDM Decoder n registers (Offset 20h*n + base) */
+#define CXL_HDM_DECODER_BASE_LOW(n) (0x10 + ((n) * 0x20))
+#define CXL_HDM_DECODER_BASE_HIGH(n) (0x14 + ((n) * 0x20))
+#define CXL_HDM_DECODER_SIZE_LOW(n) (0x18 + ((n) * 0x20))
+#define CXL_HDM_DECODER_SIZE_HIGH(n) (0x1C + ((n) * 0x20))
+#define CXL_HDM_DECODER_CTRL(n) (0x20 + ((n) * 0x20))
+#define CXL_HDM_DECODER_DPA_SKIP_LOW(n) (0x24 + ((n) * 0x20))
+#define CXL_HDM_DECODER_DPA_SKIP_HIGH(n) (0x28 + ((n) * 0x20))
+
+/* CXL IDE Capability Structure (Section 8.2.4.22) */
+#define CXL_IDE_CAP_OFFSET 0x00
+#define CXL_IDE_CAP_CAPABLE BIT(0)
+#define CXL_IDE_CTRL_OFFSET 0x04
+#define CXL_IDE_KEY_REFRESH_TIME_CTRL_OFFSET 0x18
+#define CXL_IDE_TRUNCATION_DELAY_CTRL_OFFSET 0x1C
+
/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
#define CXL_DVSEC_FUNCTION_MAP 2
--
2.34.1
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore
2026-01-20 22:26 ` [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore smadhavan
@ 2026-01-21 11:42 ` Jonathan Cameron
2026-01-22 15:09 ` Dave Jiang
1 sibling, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-21 11:42 UTC (permalink / raw)
To: smadhavan
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 22:26:10 +0000
smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Extend state save/restore to HDM decoder and IDE registers for Type 2
> devices. The HDM/IDE blocks are located via the component register map,
> then preserved across reset to retain decoder configuration and IDE
> policy. This avoids losing HDM/IDE programming when cxl_reset is issued.
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
Split the two up. One patch for HDM, one for IDE.
Been a while since I read the IDE stuff.
Isn't reset going to trip up the encryption to the extent that we need
to redo the key exchange etc?
> ---
> drivers/cxl/core/regs.c | 7 ++
> drivers/cxl/cxl.h | 4 ++
> drivers/cxl/pci.c | 153 ++++++++++++++++++++++++++++++++++++++--
> include/cxl/pci.h | 43 +++++++++++
> 4 files changed, 201 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index ecdb22ae6952..76d6869d82ea 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -93,6 +93,12 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
> length = CXL_RAS_CAPABILITY_LENGTH;
> rmap = &map->ras;
> break;
> + case CXL_CM_CAP_CAP_ID_IDE:
> + dev_dbg(dev, "found IDE capability (0x%x)\n",
> + offset);
Trivial: Fits on one line under 80 chars.
> + length = CXL_IDE_CAPABILITY_LENGTH;
> + rmap = &map->ide;
> + break;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 7d6a0ef70b2d..eb735d0ae175 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> +/*
> + * CXL HDM Decoder register save/restore
> + */
> +static int cxl_save_hdm_state(struct cxl_dev_state *cxlds,
> + struct cxl_type2_saved_state *state)
> +{
> + void __iomem *hdm = cxlds->regs.hdm_decoder;
> + u32 cap, ctrl;
> + int i, count;
> +
> + if (!hdm)
> + return 0;
> +
> + cap = readl(hdm + CXL_HDM_DECODER_CAP_OFFSET);
> + count = cap & CXL_HDM_DECODER_COUNT_MASK;
Use FIELD_GET() for this so we don't need to go check the definition to be
sure there isn't a shift.
> + count = min(count, CXL_MAX_DECODERS);
> +
> + state->hdm_decoder_count = count;
> + state->hdm_global_ctrl = readl(hdm + CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET);
> +
> + for (i = 0; i < count; i++) {
> + struct cxl_hdm_decoder_state *d = &state->decoders[i];
> + u32 base_low, base_high, size_low, size_high;
> + u32 dpa_skip_low, dpa_skip_high;
> +
> + base_low = readl(hdm + CXL_HDM_DECODER_BASE_LOW(i));
> + base_high = readl(hdm + CXL_HDM_DECODER_BASE_HIGH(i));
> + size_low = readl(hdm + CXL_HDM_DECODER_SIZE_LOW(i));
> + size_high = readl(hdm + CXL_HDM_DECODER_SIZE_HIGH(i));
> + ctrl = readl(hdm + CXL_HDM_DECODER_CTRL(i));
> + dpa_skip_low = readl(hdm + CXL_HDM_DECODER_DPA_SKIP_LOW(i));
> + dpa_skip_high = readl(hdm + CXL_HDM_DECODER_DPA_SKIP_HIGH(i));
> +
> + d->base = ((u64)base_high << 32) | base_low;
> + d->size = ((u64)size_high << 32) | size_low;
> + d->ctrl = ctrl;
> + d->dpa_skip = ((u64)dpa_skip_high << 32) | dpa_skip_low;
> + d->enabled = !!(ctrl & CXL_HDM_DECODER_ENABLE);
I'm lazy. Why not just stash the register values as a raw block and
put them back? Do we want to put them back if they aren't locked?
We ripped down the region so I'd be kind of expecting the type 2
driver for the specific hardware to need to put this stuff back
as part of its own reinit.
> + }
> +
> + return 0;
> +}
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore
2026-01-20 22:26 ` [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore smadhavan
2026-01-21 11:42 ` Jonathan Cameron
@ 2026-01-22 15:09 ` Dave Jiang
1 sibling, 0 replies; 48+ messages in thread
From: Dave Jiang @ 2026-01-22 15:09 UTC (permalink / raw)
To: smadhavan, dave, jonathan.cameron, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci
Cc: vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On 1/20/26 3:26 PM, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Extend state save/restore to HDM decoder and IDE registers for Type 2
> devices. The HDM/IDE blocks are located via the component register map,
> then preserved across reset to retain decoder configuration and IDE
> policy. This avoids losing HDM/IDE programming when cxl_reset is issued.
So the current implementation tries to save DVSEC, HDM decoder, and IDE registers (MMIO registers). This is not a standard device reset and seems more usage specific.

Given that is the case, and the fact that the implementation keeps trying to pull very CXL specific bits into PCI core, maybe going through the PCI reset path isn't the right place? The whole thing would be significantly simplified if we add a 'reset' sysfs attribute to memdev and just do it from there? Thoughts? That would prevent CXL symbols from being dragged into PCI and keep everything local for this specific usage.

Probably should also add some text on the reasoning for saving these MMIO registers.
DJ
>
> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
> ---
> drivers/cxl/core/regs.c | 7 ++
> drivers/cxl/cxl.h | 4 ++
> drivers/cxl/pci.c | 153 ++++++++++++++++++++++++++++++++++++++--
> include/cxl/pci.h | 43 +++++++++++
> 4 files changed, 201 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index ecdb22ae6952..76d6869d82ea 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -93,6 +93,12 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
> length = CXL_RAS_CAPABILITY_LENGTH;
> rmap = &map->ras;
> break;
> + case CXL_CM_CAP_CAP_ID_IDE:
> + dev_dbg(dev, "found IDE capability (0x%x)\n",
> + offset);
> + length = CXL_IDE_CAPABILITY_LENGTH;
> + rmap = &map->ide;
> + break;
> default:
> dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
> offset);
> @@ -212,6 +218,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
> } mapinfo[] = {
> { &map->component_map.hdm_decoder, &regs->hdm_decoder },
> { &map->component_map.ras, &regs->ras },
> + { &map->component_map.ide, &regs->ide },
> };
> int i;
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index ba17fa86d249..a7a6b79755b3 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -39,8 +39,10 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
>
> #define CXL_CM_CAP_CAP_ID_RAS 0x2
> +#define CXL_CM_CAP_CAP_ID_IDE 0x4
> #define CXL_CM_CAP_CAP_ID_HDM 0x5
> #define CXL_CM_CAP_CAP_HDM_VERSION 1
> +#define CXL_IDE_CAPABILITY_LENGTH 0x20
>
> /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> #define CXL_HDM_DECODER_CAP_OFFSET 0x0
> @@ -214,6 +216,7 @@ struct cxl_regs {
> struct_group_tagged(cxl_component_regs, component,
> void __iomem *hdm_decoder;
> void __iomem *ras;
> + void __iomem *ide;
> );
> /*
> * Common set of CXL Device register block base pointers
> @@ -256,6 +259,7 @@ struct cxl_reg_map {
> struct cxl_component_reg_map {
> struct cxl_reg_map hdm_decoder;
> struct cxl_reg_map ras;
> + struct cxl_reg_map ide;
> };
>
> struct cxl_device_reg_map {
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 7d6a0ef70b2d..eb735d0ae175 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -956,7 +956,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> dev_dbg(&pdev->dev, "RAS registers not found\n");
>
> rc = cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component,
> - BIT(CXL_CM_CAP_CAP_ID_RAS));
> + BIT(CXL_CM_CAP_CAP_ID_RAS) |
> + BIT(CXL_CM_CAP_CAP_ID_IDE));
> if (rc)
> dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
>
> @@ -1203,18 +1204,126 @@ static int cxl_restore_dvsec_state(struct pci_dev *pdev,
> return rc;
> }
>
> +/*
> + * CXL HDM Decoder register save/restore
> + */
> +static int cxl_save_hdm_state(struct cxl_dev_state *cxlds,
> + struct cxl_type2_saved_state *state)
> +{
> + void __iomem *hdm = cxlds->regs.hdm_decoder;
> + u32 cap, ctrl;
> + int i, count;
> +
> + if (!hdm)
> + return 0;
> +
> + cap = readl(hdm + CXL_HDM_DECODER_CAP_OFFSET);
> + count = cap & CXL_HDM_DECODER_COUNT_MASK;
> + count = min(count, CXL_MAX_DECODERS);
> +
> + state->hdm_decoder_count = count;
> + state->hdm_global_ctrl = readl(hdm + CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET);
> +
> + for (i = 0; i < count; i++) {
> + struct cxl_hdm_decoder_state *d = &state->decoders[i];
> + u32 base_low, base_high, size_low, size_high;
> + u32 dpa_skip_low, dpa_skip_high;
> +
> + base_low = readl(hdm + CXL_HDM_DECODER_BASE_LOW(i));
> + base_high = readl(hdm + CXL_HDM_DECODER_BASE_HIGH(i));
> + size_low = readl(hdm + CXL_HDM_DECODER_SIZE_LOW(i));
> + size_high = readl(hdm + CXL_HDM_DECODER_SIZE_HIGH(i));
> + ctrl = readl(hdm + CXL_HDM_DECODER_CTRL(i));
> + dpa_skip_low = readl(hdm + CXL_HDM_DECODER_DPA_SKIP_LOW(i));
> + dpa_skip_high = readl(hdm + CXL_HDM_DECODER_DPA_SKIP_HIGH(i));
> +
> + d->base = ((u64)base_high << 32) | base_low;
> + d->size = ((u64)size_high << 32) | size_low;
> + d->ctrl = ctrl;
> + d->dpa_skip = ((u64)dpa_skip_high << 32) | dpa_skip_low;
> + d->enabled = !!(ctrl & CXL_HDM_DECODER_ENABLE);
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_restore_hdm_state(struct cxl_dev_state *cxlds,
> + const struct cxl_type2_saved_state *state)
> +{
> + void __iomem *hdm = cxlds->regs.hdm_decoder;
> + int i;
> +
> + if (!hdm || state->hdm_decoder_count == 0)
> + return 0;
> +
> + writel(state->hdm_global_ctrl, hdm + CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET);
> +
> + for (i = 0; i < state->hdm_decoder_count; i++) {
> + const struct cxl_hdm_decoder_state *d = &state->decoders[i];
> +
> + writel((u32)d->base, hdm + CXL_HDM_DECODER_BASE_LOW(i));
> + writel((u32)(d->base >> 32), hdm + CXL_HDM_DECODER_BASE_HIGH(i));
> + writel((u32)d->size, hdm + CXL_HDM_DECODER_SIZE_LOW(i));
> + writel((u32)(d->size >> 32), hdm + CXL_HDM_DECODER_SIZE_HIGH(i));
> + writel(d->ctrl, hdm + CXL_HDM_DECODER_CTRL(i));
> + writel((u32)d->dpa_skip, hdm + CXL_HDM_DECODER_DPA_SKIP_LOW(i));
> + writel((u32)(d->dpa_skip >> 32), hdm + CXL_HDM_DECODER_DPA_SKIP_HIGH(i));
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * CXL IDE register save/restore
> + */
> +static int cxl_save_ide_state(struct cxl_dev_state *cxlds,
> + struct cxl_type2_saved_state *state)
> +{
> + void __iomem *ide = cxlds->regs.ide;
> + u32 cap;
> +
> + if (!ide)
> + return 0;
> +
> + cap = readl(ide + CXL_IDE_CAP_OFFSET);
> + if (!(cap & CXL_IDE_CAP_CAPABLE))
> + return 0;
> +
> + state->ide_cap = cap;
> + state->ide_ctrl = readl(ide + CXL_IDE_CTRL_OFFSET);
> + state->ide_key_refresh_time = readl(ide + CXL_IDE_KEY_REFRESH_TIME_CTRL_OFFSET);
> + state->ide_truncation_delay = readl(ide + CXL_IDE_TRUNCATION_DELAY_CTRL_OFFSET);
> +
> + return 0;
> +}
> +
> +static int cxl_restore_ide_state(struct cxl_dev_state *cxlds,
> + const struct cxl_type2_saved_state *state)
> +{
> + void __iomem *ide = cxlds->regs.ide;
> +
> + if (!ide || !(state->ide_cap & CXL_IDE_CAP_CAPABLE))
> + return 0;
> +
> + writel(state->ide_ctrl, ide + CXL_IDE_CTRL_OFFSET);
> + writel(state->ide_key_refresh_time, ide + CXL_IDE_KEY_REFRESH_TIME_CTRL_OFFSET);
> + writel(state->ide_truncation_delay, ide + CXL_IDE_TRUNCATION_DELAY_CTRL_OFFSET);
> +
> + return 0;
> +}
> +
> /**
> * cxl_config_save_state - Save CXL configuration state
> * @pdev: PCI device
> * @state: Structure to store saved state
> *
> - * Saves CXL DVSEC state before reset.
> + * Saves CXL DVSEC, HDM decoder, and IDE state before reset.
> */
> int cxl_config_save_state(struct pci_dev *pdev,
> struct cxl_type2_saved_state *state)
> {
> struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> - int dvsec;
> + int rc, dvsec;
>
> if (!cxlds || !state)
> return -EINVAL;
> @@ -1225,7 +1334,23 @@ int cxl_config_save_state(struct pci_dev *pdev,
> if (!dvsec)
> return -ENODEV;
>
> - return cxl_save_dvsec_state(pdev, state, dvsec);
> + rc = cxl_save_dvsec_state(pdev, state, dvsec);
> + if (rc)
> + return rc;
> +
> + if (cxlds->regs.hdm_decoder) {
> + rc = cxl_save_hdm_state(cxlds, state);
> + if (rc)
> + pci_warn(pdev, "Failed to save HDM state: %d\n", rc);
> + }
> +
> + if (cxlds->regs.ide) {
> + rc = cxl_save_ide_state(cxlds, state);
> + if (rc)
> + pci_warn(pdev, "Failed to save IDE state: %d\n", rc);
> + }
> +
> + return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_config_save_state, "CXL");
>
> @@ -1234,7 +1359,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_config_save_state, "CXL");
> * @pdev: PCI device
> * @state: Previously saved state
> *
> - * Restores CXL DVSEC state after reset.
> + * Restores CXL DVSEC, HDM decoder, and IDE state after reset.
> */
> int cxl_config_restore_state(struct pci_dev *pdev,
> const struct cxl_type2_saved_state *state)
> @@ -1257,7 +1382,23 @@ int cxl_config_restore_state(struct pci_dev *pdev,
>
> config_locked = !!(lock_reg & CXL_DVSEC_LOCK_CONFIG_LOCK);
>
> - return cxl_restore_dvsec_state(pdev, state, dvsec, config_locked);
> + rc = cxl_restore_dvsec_state(pdev, state, dvsec, config_locked);
> + if (rc)
> + return rc;
> +
> + if (cxlds->regs.hdm_decoder && state->hdm_decoder_count > 0) {
> + rc = cxl_restore_hdm_state(cxlds, state);
> + if (rc)
> + pci_warn(pdev, "Failed to restore HDM state: %d\n", rc);
> + }
> +
> + if (cxlds->regs.ide && (state->ide_cap & CXL_IDE_CAP_CAPABLE)) {
> + rc = cxl_restore_ide_state(cxlds, state);
> + if (rc)
> + pci_warn(pdev, "Failed to restore IDE state: %d\n", rc);
> + }
> +
> + return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_config_restore_state, "CXL");
>
> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
> index 2c629ded73cc..9f2a9ad10d75 100644
> --- a/include/cxl/pci.h
> +++ b/include/cxl/pci.h
> @@ -4,11 +4,33 @@
> #ifndef __CXL_ACCEL_PCI_H
> #define __CXL_ACCEL_PCI_H
>
> +/* HDM Decoder state for save/restore */
> +struct cxl_hdm_decoder_state {
> + u64 base;
> + u64 size;
> + u32 ctrl;
> + u64 dpa_skip;
> + bool enabled;
> +};
> +
> +#define CXL_MAX_DECODERS 10
> +
> /* CXL Type 2 device state for save/restore across reset */
> struct cxl_type2_saved_state {
> /* DVSEC registers */
> u16 dvsec_ctrl;
> u16 dvsec_ctrl2;
> +
> + /* HDM Decoder registers */
> + u32 hdm_decoder_count;
> + u32 hdm_global_ctrl;
> + struct cxl_hdm_decoder_state decoders[CXL_MAX_DECODERS];
> +
> + /* IDE registers */
> + u32 ide_cap;
> + u32 ide_ctrl;
> + u32 ide_key_refresh_time;
> + u32 ide_truncation_delay;
> };
>
> int cxl_config_save_state(struct pci_dev *pdev,
> @@ -58,6 +80,27 @@ int cxl_config_restore_state(struct pci_dev *pdev,
>
> #define CXL_DVSEC_RANGE_MAX 2
>
> +/* CXL HDM Decoder Capability Structure (Section 8.2.4.20) */
> +#define CXL_HDM_DECODER_CAP_OFFSET 0x0
> +#define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> +#define CXL_HDM_DECODER_GLOBAL_CTRL_OFFSET 0x4
> +#define CXL_HDM_DECODER_ENABLE BIT(1)
> +/* CXL HDM Decoder n registers (Offset 20h*n + base) */
> +#define CXL_HDM_DECODER_BASE_LOW(n) (0x10 + ((n) * 0x20))
> +#define CXL_HDM_DECODER_BASE_HIGH(n) (0x14 + ((n) * 0x20))
> +#define CXL_HDM_DECODER_SIZE_LOW(n) (0x18 + ((n) * 0x20))
> +#define CXL_HDM_DECODER_SIZE_HIGH(n) (0x1C + ((n) * 0x20))
> +#define CXL_HDM_DECODER_CTRL(n) (0x20 + ((n) * 0x20))
> +#define CXL_HDM_DECODER_DPA_SKIP_LOW(n) (0x24 + ((n) * 0x20))
> +#define CXL_HDM_DECODER_DPA_SKIP_HIGH(n) (0x28 + ((n) * 0x20))
> +
> +/* CXL IDE Capability Structure (Section 8.2.4.22) */
> +#define CXL_IDE_CAP_OFFSET 0x00
> +#define CXL_IDE_CAP_CAPABLE BIT(0)
> +#define CXL_IDE_CTRL_OFFSET 0x04
> +#define CXL_IDE_KEY_REFRESH_TIME_CTRL_OFFSET 0x18
> +#define CXL_IDE_TRUNCATION_DELAY_CTRL_OFFSET 0x1C
> +
> /* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
> #define CXL_DVSEC_FUNCTION_MAP 2
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 0/10] CXL Reset support for Type 2 devices
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (9 preceding siblings ...)
2026-01-20 22:26 ` [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore smadhavan
@ 2026-01-21 1:19 ` Alison Schofield
2026-01-22 0:00 ` Bjorn Helgaas
2026-01-27 16:33 ` Alex Williamson
12 siblings, 0 replies; 48+ messages in thread
From: Alison Schofield @ 2026-01-21 1:19 UTC (permalink / raw)
To: smadhavan
Cc: dave, jonathan.cameron, dave.jiang, vishal.l.verma, ira.weiny,
dan.j.williams, bhelgaas, ming.li, rrichter,
Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, Jan 20, 2026 at 10:26:00PM +0000, smadhavan@nvidia.com wrote:
> From: Srirangan Madhavan <smadhavan@nvidia.com>
>
> Hi folks!
>
> This patch series introduces support for the CXL Reset method for CXL
> devices, implementing the reset procedure outlined in CXL Spec [1] v3.2,
> Sections 9.6 and 9.7.
Hi Srirangan,
Following-up on the base commit for this patch. For the cxl subsystem,
a patchset today is expected to be based on 6.19-rc4 (or rc5 or rc6).
We get to that because our cxl/next is based on rc4. We ask folks not
to base on cxl/next because we want to see conflicts between patchsets.
I like to use the base commit option to automatically append it to
the cover letter, but you can just state it in the change log if your
method of sending patches doesn't support that.
With this set, there is something else that you are depending upon.
Please call that out and point to the commits in some way so we can
build on 6.19-rc4 plus those commits plus this patchset.
Thanks!
Alison
>
> v4 changes:
> - Fix CXL reset capability check parentheses warning
> - Gate CXL reset path on CONFIG_CXL_PCI reachability
>
> v3 changes:
> - Restrict CXL reset to Type 2 devices only
> - Add host and device cache flushing for
> * all sibling functions on multi-function devices
> * all sibling devices in a given region
> - Add region teardown and memory online detection before reset
> - Add configuration state save/restore (DVSEC, HDM, IDE)
> - Split the series by subsystem and functional blocks
>
> v2 changes:
> - De-duplicate CXL DVSEC register defines under include/cxl/pci.h
> - Fix style-related issues
>
> v1 changes:
> - Added cover letter and dropped the RFC
>
> The RFC patches can be found here [2]
> v2 patches can be found here [3]
>
> Motivation:
> -----------
> This change is broadly useful for reasons including but not limited to the
> following:
>
> - As support for Type 2 devices [4] is being introduced, more devices will
> require finer-grained reset mechanisms beyond bus-wide reset methods.
>
> - FLR does not affect CXL.cache or CXL.mem protocols, making CXL Reset
> the preferred method in some cases.
>
> - The CXL spec (Sections 7.2.3 Binding and Unbinding, 9.5 FLR) highlights use
> cases like function rebinding and error recovery, where CXL Reset is
> explicitly mentioned.
>
> Change Description:
> -------------------
>
> Patch 1: Move CXL DVSEC defines to the CXL PCI header
> - Consolidate DVSEC register definitions under include/cxl/pci.h
>
> Patch 2: Switch PCI CXL port DVSEC defines
> - Use the shared CXL PCI header in the PCI core
>
> Patch 3: Add Type 2 helper and reset DVSEC bits
> - Add helper to identify Type 2 devices
> - Define DVSEC reset/cache control bits
>
> Patch 4: Add the CXL reset method in the PCI core
> - Implement cxl_reset() method with capability checks and reset sequence
> - Restrict to Type 2 devices
>
> Patch 5: Add reset preparation and region teardown
> - Implement region validation and teardown before reset
> - Add device cache flush for all sibling devices in a given region
>
> Patch 6: Wire CXL reset prepare/cleanup in PCI
> - Call CXL reset prepare/cleanup around the core reset flow
>
> Patch 7: Add host CPU cache flush and multi-function support
> - Add host CPU cache flush (x86: wbinvd, arm64: dcache_clean_inval_poc)
> - Add device cache flush for all sibling functions on multi-function devices
>
> Patch 8: Add DVSEC configuration state save/restore
> - Save/restore DVSEC registers (DEVCTL, DEVCTL2) with CONFIG_LOCK handling
>
> Patch 9: Save/restore CXL config around reset
> - Save PCI and CXL config before reset and restore afterwards
>
> Patch 10: Add HDM decoder and IDE state save/restore
> - Save/restore HDM decoder and IDE register state
>
> The reset sequence: validate device type, check memory offline, tear down
> regions, flush host CPU caches, flush device caches (all functions), save
> config state, initiate reset, wait for completion, restore config state.
>
> Command line to test the CXL reset on a capable device:
> echo cxl_reset > /sys/bus/pci/devices/<pci_device>/reset_method
> echo 1 > /sys/bus/pci/devices/<pci_device>/reset
>
> [1] https://computeexpresslink.org/cxl-specification/
> [2] https://lore.kernel.org/all/20241213074143.374-1-smadhavan@nvidia.com/
> [3] https://lore.kernel.org/all/20250221043906.1593189-1-smadhavan@nvidia.com/
> [4] https://lore.kernel.org/linux-cxl/20251205115248.772945-1-alejandro.lucero-palau@amd.com/
>
> Srirangan Madhavan (10):
> [PATCH v4 1/10] cxl: move DVSEC defines to cxl pci header
> [PATCH v4 2/10] PCI: switch CXL port DVSEC defines
> [PATCH v4 3/10] cxl: add type 2 helper and reset DVSEC bits
> [PATCH v4 4/10] PCI: add CXL reset method
> [PATCH v4 5/10] cxl: add reset prepare and region teardown
> [PATCH v4 6/10] PCI: wire CXL reset prepare/cleanup
> [PATCH v4 7/10] cxl: add host cache flush and multi-function reset
> [PATCH v4 8/10] cxl: add DVSEC config save/restore
> [PATCH v4 9/10] PCI: save/restore CXL config around reset
> [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore
>
> drivers/cxl/core/pci.c | 1 +
> drivers/cxl/core/regs.c | 8 +
> drivers/cxl/cxl.h | 4 +
> drivers/cxl/cxlpci.h | 53 ---
> drivers/cxl/pci.c | 621 +++++++++++++++++++++++++++++++++-
> drivers/pci/pci.c | 150 +++++++-
> include/cxl/pci.h | 134 ++++++++
> include/linux/pci.h | 21 +-
> include/uapi/linux/pci_regs.h | 5 -
> 9 files changed, 929 insertions(+), 68 deletions(-)
> create mode 100644 include/cxl/pci.h
>
> --
> 2.34.1
* Re: [PATCH v4 0/10] CXL Reset support for Type 2 devices
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (10 preceding siblings ...)
2026-01-21 1:19 ` [PATCH v4 0/10] CXL Reset support for Type 2 devices Alison Schofield
@ 2026-01-22 0:00 ` Bjorn Helgaas
2026-01-27 16:33 ` Alex Williamson
12 siblings, 0 replies; 48+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 0:00 UTC (permalink / raw)
To: smadhavan
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, Jan 20, 2026 at 10:26:00PM +0000, smadhavan@nvidia.com wrote:
> ...
> [PATCH v4 1/10] cxl: move DVSEC defines to cxl pci header
> [PATCH v4 2/10] PCI: switch CXL port DVSEC defines
> [PATCH v4 3/10] cxl: add type 2 helper and reset DVSEC bits
> [PATCH v4 4/10] PCI: add CXL reset method
> [PATCH v4 5/10] cxl: add reset prepare and region teardown
> [PATCH v4 6/10] PCI: wire CXL reset prepare/cleanup
> [PATCH v4 7/10] cxl: add host cache flush and multi-function reset
> [PATCH v4 8/10] cxl: add DVSEC config save/restore
> [PATCH v4 9/10] PCI: save/restore CXL config around reset
> [PATCH v4 10/10] cxl: add HDM decoder and IDE save/restore
Please run "git log --oneline" on the files you change and match the
style of subject lines. In drivers/pci/, the subject lines start with
capital letters, e.g.,
PCI: Switch CXL port ...
PCI: Add CXL reset method (but please include the function name)
...
* Re: [PATCH v4 0/10] CXL Reset support for Type 2 devices
2026-01-20 22:26 [PATCH v4 0/10] CXL Reset support for Type 2 devices smadhavan
` (11 preceding siblings ...)
2026-01-22 0:00 ` Bjorn Helgaas
@ 2026-01-27 16:33 ` Alex Williamson
2026-01-27 17:02 ` dan.j.williams
12 siblings, 1 reply; 48+ messages in thread
From: Alex Williamson @ 2026-01-27 16:33 UTC (permalink / raw)
To: smadhavan
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
On Tue, 20 Jan 2026 22:26:00 +0000
<smadhavan@nvidia.com> wrote:
> Motivation:
> -----------
> This change is broadly useful for reasons including but not limited to the
> following:
>
> - As support for Type 2 devices [4] is being introduced, more devices will
> require finer-grained reset mechanisms beyond bus-wide reset methods.
>
> - FLR does not affect CXL.cache or CXL.mem protocols, making CXL Reset
> the preferred method in some cases.
This proposal adds cxl_reset to the pci_reset_fn_methods[] array, so
while the above suggests there are some use cases that would prefer CXL
reset, we're interleaving the capability into the default function for
PCI function scoped reset.
I'm concerned about use cases like vfio-pci emulation of FLR, where we
currently call through the pci_reset_function() interfaces to take
advantage of device specific quirks. Such a use case (guest directed
FLR) might reasonably expect the scope of the reset is limited to CXL.io
without affecting CXL.cache or CXL.mem, therefore it's not clear that
pci_reset_function() necessarily has the correct scope.
OTOH, we also expose a reset ioctl through vfio-pci that could
reasonably reset the full state of the device, PCI and CXL. cxl_reset
would be appropriate here, however cxl_reset is prioritized below FLR
reset in the default reset_methods array, so pci_reset_function()
doesn't have the correct scope here either.
Is it really appropriate to consider cxl_reset as just another PCI reset
method given the scope difference? Does there need to be a way to
specify the scope when calling pci_reset_function() for drivers aware
of CXL.mem/cache? Thanks,
Alex
* Re: [PATCH v4 0/10] CXL Reset support for Type 2 devices
2026-01-27 16:33 ` Alex Williamson
@ 2026-01-27 17:02 ` dan.j.williams
2026-01-27 18:07 ` Vikram Sethi
0 siblings, 1 reply; 48+ messages in thread
From: dan.j.williams @ 2026-01-27 17:02 UTC (permalink / raw)
To: Alex Williamson, smadhavan
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas, ming.li,
rrichter, Smita.KoralahalliChannabasappa, huaisheng.ye, linux-cxl,
linux-pci, vaslot, vsethi, sdonthineni, vidyas, mochs, jsequeira
Alex Williamson wrote:
[..]
> Is it really appropriate to consider cxl_reset as just another PCI reset
> method given the scope difference? Does there need to be a way to
> specify the scope when calling pci_reset_function() for drivers aware
> of CXL.mem/cache?
Right, FWIW, I am of the opinion that trying to plumb CXL reset through
PCI reset is an awkward fit that will be an ongoing source of pain. For
the same reason that CXL protocol error handling is designed to have
little to do with PCIe protocol error handling, CXL is not PCIe. For
example, the goal in the error handling case is to arrive at a point
where the PCIe error handling implementation can continue to evolve
without inflicting the maintenance burden load of "what about CXL?".
Notifications are simply shunted over to the CXL core early in the flow.
CXL reset does not need any PCI core entanglements as far as I can see.
Once the protocol error handling series has landed there is a potential
to extend that with CXL Reset recovery. However, that needs a clear
error model defined as to which resets have a chance of recovering
*system* operation when CXL.cache/mem fails. In Terry's series, panic /
reboot is the recovery, not reset.
* Re: [PATCH v4 0/10] CXL Reset support for Type 2 devices
2026-01-27 17:02 ` dan.j.williams
@ 2026-01-27 18:07 ` Vikram Sethi
2026-01-28 3:42 ` dan.j.williams
0 siblings, 1 reply; 48+ messages in thread
From: Vikram Sethi @ 2026-01-27 18:07 UTC (permalink / raw)
To: dan.j.williams@intel.com, Alex Williamson, Srirangan Madhavan
Cc: dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Vishal Aslot, Shanker Donthineni, Vidya Sagar, Jason Gunthorpe,
Matt Ochs, Jason Sequeira
Hi Dan,
>Once the protocol error handling series has landed there is a potential
>to extend that with CXL Reset recovery. However, that needs a clear
>error model defined as to which resets have a chance of recovering
>*system* operation when CXL.cache/mem fails. In Terry's series, panic /
>reboot is the recovery, not reset.
It's not just about CXL protocol error handling though. Type 2 device passthrough will be a common use case for CXL reset (the entire device is assigned to a VM) across different VM assignments.
The cache and mem of the device must be cleared via CXL reset for full device passthrough, in addition to the PCIe/CXL.io reset. Another use case is when the device is reconfigured, either on bare metal or in a VM for passthrough. Such use cases often require a reset of the device and its cache and mem, not a narrow "memregion" reset, so I'm not in favor of exposing it via CXL.mem sysfs entries. IMO, a device sysfs attribute reset method is appropriate for Type 2 devices.
Finally, there is the error use case, which in the common case is as simple as an uncorrected ECC error in the HDM. While not strictly necessary, it is common practice to reset the device in such cases, to recover the bad page/row via PPR on device reset. You want a reset of the memory controller as part of the CXL reset here; FLR is not enough.
Regarding scope and "recoverability", we have significant experience recovering "pre-CXL" coherent GB200 devices from device errors (memory ECC or other) by killing the application using device memory, unloading the driver, resetting the device, and reloading the driver. The CXL protocol error series can use the CXL reset series, but I don't see that this series needs to wait for protocol error handling to be merged.
Thanks,
Vikram
* Re: [PATCH v4 0/10] CXL Reset support for Type 2 devices
2026-01-27 18:07 ` Vikram Sethi
@ 2026-01-28 3:42 ` dan.j.williams
2026-01-28 12:36 ` Jonathan Cameron
0 siblings, 1 reply; 48+ messages in thread
From: dan.j.williams @ 2026-01-28 3:42 UTC (permalink / raw)
To: Vikram Sethi, dan.j.williams@intel.com, Alex Williamson,
Srirangan Madhavan
Cc: dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
bhelgaas@google.com, ming.li@zohomail.com, rrichter@amd.com,
Smita.KoralahalliChannabasappa@amd.com, huaisheng.ye@intel.com,
linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Vishal Aslot, Shanker Donthineni, Vidya Sagar, Jason Gunthorpe,
Matt Ochs, Jason Sequeira
Vikram Sethi wrote:
> Hi Dan,
>
> >Once the protocol error handling series has landed there is a potential
> >to extend that with CXL Reset recovery. However, that needs a clear
> >error model defined as to which resets have a chance of recovering
> >*system* operation when CXL.cache/mem fails. In Terry's series, panic /
> >reboot is the recovery, not reset.
>
> It's not just about CXL protocol error handling though. Type2 Device
> passthrough will be a common usecase for CXL reset (entire device is
> assigned to VM) across different VM assignment.
I understand. The point about CXL Protocol Error handling series is that
it at least enlightens the PCIe core about the presence of active CXL
links.
It is also the case that it is much closer to being upstream than this
set, which still has fundamental questions.
> The cache and mem of the device must be cleared via CXL reset for full
> device passthrough in addition to the PCIe/CXL.IO reset. Another
> usecase is when the device is reconfigured either in baremetal or in
> VM for passthrough usecase. Such usecases often require a reset of the
> device, and its cache, mem and not a narrow "memregion" reset, so I'm
> not in favor of exposing it via CXL.mem sysfs entries. IMO, device
> sysfs attribute reset method is appropriate for type2 devices.
That case is already handled today with secondary bus reset to
completely reset an entire device. That path is problematic for CXL
because PCI reset has no idea about how to manage caches or handle
memory unplug.
The administrator is responsible for making sure that event is not a
surprise memory removal or cache protocol corruption event. That is a
level of explosiveness that PCI reset has never needed to consider.
CXL Reset wants to be more surgical than secondary bus reset. The only
way to be more surgical is to cooperate with the CXL core that knows
whether the explosives have been rendered inert.
> Finally, there is the error usecase, which in the common case is as
> simple as an uncorrected ECC error in the HDM. While not strictly
> necessary, it is common practice to reset the device in such cases, to
> recover the bad page/row via PPR on device reset. You want reset of
> the memory controller as part of the CXL reset here, FLR is not
> enough.
??
CXL memory repair is already upstream without CXL Reset, see
CONFIG_CXL_EDAC_MEM_REPAIR.
> Regarding scope, and "recoverability", we have significant experience
> with recovering "pre-CXL" coherent GB200 devices for device errors:
> memory ECC or other, by killing the application using device memory,
> unloading driver, resetting the device, and reloading driver. The CXL
> protocol error series can use CXL reset series, but I don't see that
> this series needs to wait for protocol error handling to be merged.
I do want to get to the point where device memory error recovery is a
first class citizen. CXL Protocol Error handling just happens to be at
the top of the review queue and safe to assume it can be a foundation
for further RAS features.
* Re: [PATCH v4 0/10] CXL Reset support for Type 2 devices
2026-01-28 3:42 ` dan.j.williams
@ 2026-01-28 12:36 ` Jonathan Cameron
0 siblings, 0 replies; 48+ messages in thread
From: Jonathan Cameron @ 2026-01-28 12:36 UTC (permalink / raw)
To: dan.j.williams
Cc: Vikram Sethi, Alex Williamson, Srirangan Madhavan,
dave@stgolabs.net, dave.jiang@intel.com,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, bhelgaas@google.com, ming.li@zohomail.com,
rrichter@amd.com, Smita.KoralahalliChannabasappa@amd.com,
huaisheng.ye@intel.com, linux-cxl@vger.kernel.org,
linux-pci@vger.kernel.org, Vishal Aslot, Shanker Donthineni,
Vidya Sagar, Jason Gunthorpe, Matt Ochs, Jason Sequeira
On Tue, 27 Jan 2026 19:42:58 -0800
dan.j.williams@intel.com wrote:
> Vikram Sethi wrote:
> > Hi Dan,
> >
> > >Once the protocol error handling series has landed there is a potential
> > >to extend that with CXL Reset recovery. However, that needs a clear
> > >error model defined as to which resets have a chance of recovering
> > >*system* operation when CXL.cache/mem fails. In Terry's series, panic /
> > >reboot is the recovery, not reset.
> >
> > It's not just about CXL protocol error handling though. Type2 Device
> > passthrough will be a common usecase for CXL reset (entire device is
> > assigned to VM) across different VM assignment.
>
> I understand. The point about CXL Protocol Error handling series is that
> it at least enlightens the PCIe core about the presence of active CXL
> links.
>
> It is also the case that it is much closer to being upstream than this set
> which still has fundamental questions.
>
> > The cache and mem of the device must be cleared via CXL reset for full
> > device passthrough in addition to the PCIe/CXL.IO reset. Another
> > usecase is when the device is reconfigured either in baremetal or in
> > VM for passthrough usecase. Such usecases often require a reset of the
> > device, and its cache, mem and not a narrow "memregion" reset, so I'm
> > not in favor of exposing it via CXL.mem sysfs entries. IMO, device
> > sysfs attribute reset method is appropriate for type2 devices.
>
> That case is already handled today with secondary bus reset to
> completely reset an entire device. That path is problematic for CXL
> because PCI reset has no idea about how to manage caches or handle
> memory unplug.
>
> Administrator is responsible for making sure that event is not a
> surprise memory removal or cache protocol corruption event. That is a
> level of explosiveness that PCI reset has never needed to consider.
>
> CXL Reset wants to be more surgical than secondary bus reset. The only
> way to be more surgical is to cooperate with the CXL core that knows
> whether the explosives have been rendered inert.
>
> > Finally, there is the error usecase, which in the common case is as
> > simple as an uncorrected ECC error in the HDM. While not strictly
> > necessary, it is common practice to reset the device in such cases, to
> > recover the bad page/row via PPR on device reset. You want reset of
> > the memory controller as part of the CXL reset here, FLR is not
> > enough.
>
> ??
>
> CXL memory repair is already upstream without CXL Reset, see
> CONFIG_CXL_EDAC_MEM_REPAIR.
To actually do it we rely on tearing down all access to the device
(if it's disruptive).
There is new spec stuff covering asking a device to just do it on the next
reset, which would fit more closely with this flow. No support yet, I think.
Mind you we are talking type 2, so who knows what people will implement.
Nice if they used that part of the spec but I doubt we can rely on it!
Jonathan
>
> > Regarding scope, and "recoverability", we have significant experience
> > with recovering "pre-CXL" coherent GB200 devices for device errors:
> > memory ECC or other, by killing the application using device memory,
> > unloading driver, resetting the device, and reloading driver. The CXL
> > protocol error series can use CXL reset series, but I don't see that
> > this series needs to wait for protocol error handling to be merged.
>
> I do want to get to the point where device memory error recovery is a
> first class citizen. CXL Protocol Error handling just happens to be at
> the top of the review queue and safe to assume it can be a foundation
> for further RAS features.
>
>