linux-cxl.vger.kernel.org archive mirror
* [PATCH 00/16] CXL.mem error isolation support
@ 2025-07-30 21:47 Ben Cheatham
  2025-07-30 21:47 ` [PATCH 01/16] cxl/regs: Add cxl_unmap_component_regs() Ben Cheatham
                   ` (15 more replies)
  0 siblings, 16 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Overview
========
This series adds support for the CXL.mem Timeout & Isolation Capability
as defined by the CXL 3.2 spec (section 8.2.4.24), with some extras
(explained below). This is an optional capability implemented by
CXL-capable PCIe Root Ports to prevent the host system from resetting
when the CXL.mem link times out or goes down (e.g. the CXL memory device
is surprise removed or dies). Without this capability, the system is
expected to immediately reset or power off when either of these
conditions occurs.

When CXL.mem isolation is triggered, the CXL memory below the port is no
longer accessible. Writes to the memory from the host are expected to
silently drop, while a synchronous response is expected for reads. This
response is implementation specific, but an example response would be poisoned
data.
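
As a rough illustration of the behavior described above, here is a small
userspace model (not kernel code). The model_* names, the 8-word memory
size, and the choice of a poison flag as the read response are all
assumptions made for this sketch; real hardware may respond differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MEM_WORDS 8

/* Toy model of the CXL memory below one root port (hypothetical) */
struct model_port {
	bool isolated;
	uint64_t mem[MODEL_MEM_WORDS];
};

struct model_read_result {
	uint64_t data;
	bool poisoned;
};

/* Host write: silently dropped once the port is isolated */
static void model_write(struct model_port *port, size_t idx, uint64_t val)
{
	assert(idx < MODEL_MEM_WORDS);
	if (port->isolated)
		return;
	port->mem[idx] = val;
}

/* Host read: always completes; poison is one possible isolated response */
static struct model_read_result model_read(const struct model_port *port,
					   size_t idx)
{
	struct model_read_result res = { 0, false };

	assert(idx < MODEL_MEM_WORDS);
	if (port->isolated) {
		res.poisoned = true;
		return res;
	}
	res.data = port->mem[idx];
	return res;
}
```

The point of the model is that neither path blocks or faults the caller:
the write returns normally and the read returns a value the consumer has
to check.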

The specific features enabled by this series are:
 - Enabling CXL.mem isolation on link down conditions and transaction
   timeout
 - Setting up, enabling, and handling CXL isolation interrupts
 - Preventing onlining/enabling of isolated CXL memory
 - Sysfs attributes for system administrators to tune isolation
   capabilities

The Extras
==========
The last 3 commits provide support for an ECN [1] submitted by AMD that
allows platform firmware to modify how the OS enables and handles CXL
isolation. The ECN contents are expected to land in revision 4 of the
CXL spec. The link at [1] is only accessible to CXL SSWG members, but
I've done my best to explain the changes in the relevant commits.

The changes in these commits could probably be moved to earlier commits,
but I've opted to leave them tacked on the end just in case anyone has a
problem with their inclusion.

Intended Behavior
=================
Due to how CXL memory is currently handled by Linux, this feature isn't
all that useful for type 3 cards. The intended behavior for type 3 cards
is to panic when isolation is triggered, which defeats the purpose of
the feature.

The reason I'm sending this out anyway is twofold:
1) I've seen rumblings that CXL memory will be part of its own opt-in
allocator in the future and the memory may be safely removable at that
point.
2) CXL memory provided by a Type 2 card may be safely removable, though
it's left up to the type 2 endpoint driver to handle isolation recovery.

I've also not included a flow for isolation recovery. This is because
a) I don't have a system that supports it, and b) it's not applicable to
the type 3 driver.

Building the Set
================
This series is based on both Terry's port error handling patch set (v10)
and Dave's deferred downstream port probe set (v7). Terry's set was needed
since it introduces the uncorrectable CXL error = system panic paradigm, as
well as the routines for logging the AER info from the CXL subsystem.

I included Dave's set due to a timing issue I saw where the PCIe portdrv
code would run after the CXL ports that have the isolation capability
were probed. This caused the isolation set up to fail because the PCIe
portdrv provides the information to allocate the CXL isolation
interrupt. I tried deferring the probe, but the deferral caused the
cxl_mem driver to break because the port wasn't probed yet. I could have
introduced a scheme to get around this, but it seemed easier to just use
Dave's set to fix it.

The isolation support is gated behind the CXL core being built-in
because the CXL isolation PCIe service needs the mapping code in
cxl/core/regs.c. I realize a rework is planned for the PCIe portdrv to
(hopefully) not make this the case, so I've kept the code as minimal as
possible.

To build the set I applied Terry's set to the base commit below, Dave's
on top of that, then my patches.

Patch Breakdown
===============
Patches 3-5 & 12-13 will need eyes from PCIe folks.
Patch 14 needs an ACPI reviewer.

- Patches 1-2: Register mapping updates needed for isolation support
- Patches 3-5: CXL isolation service driver & MSI/-X vector allocation
- Patch 6: Enable CXL.mem isolation
- Patches 7-8: Set up and enable CXL isolation interrupts
- Patch 9: Prevent onlining of isolated memory
- Patch 10: Enable CXL.mem transaction timeout
- Patch 11: cxl_pci isolation handler
- Patches 12-13: CXL isolation sysfs attributes
- Patch 14: ECN changes to CXL _OSC method
- Patches 15-16: ECN additions

[1]:
Link: https://members.computeexpresslink.org/wg/software_systems/document/3118

Ben Cheatham (16):
  cxl/regs: Add cxl_unmap_component_regs()
  cxl/regs: Add CXL Isolation capability mapping
  PCI: PCIe portdrv: Add CXL Isolation service driver
  PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector
  PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ
  cxl/core: Enable CXL.mem isolation
  cxl/core: Set up isolation interrupts
  cxl/core: Enable CXL isolation interrupts
  cxl/core: Prevent onlining CXL memory behind isolated ports
  cxl/core: Enable CXL.mem timeout
  cxl/pci: Add isolation handler
  PCI: PCIe portdrv: Add cxl_isolation sysfs attributes
  cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming
  ACPI: Add CXL isolation _OSC fields
  cxl/core, cxl/acpi: Enable CXL isolation based on _OSC handshake
  cxl/core, cxl/acpi: Add CXL isolation notify handler

 drivers/acpi/pci_root.c          |   9 +
 drivers/cxl/Kconfig              |  14 ++
 drivers/cxl/acpi.c               |  75 +++++++
 drivers/cxl/core/core.h          |   2 +
 drivers/cxl/core/pci.c           | 138 ++++++++++++
 drivers/cxl/core/port.c          | 248 +++++++++++++++++++++
 drivers/cxl/core/region.c        |   3 +
 drivers/cxl/core/regs.c          |  85 +++++--
 drivers/cxl/cxl.h                |  35 +++
 drivers/cxl/cxlmem.h             |   4 +
 drivers/cxl/pci.c                |   9 +
 drivers/pci/pci-sysfs.c          |   3 +
 drivers/pci/pci.h                |   4 +
 drivers/pci/pcie/Makefile        |   1 +
 drivers/pci/pcie/cxl_isolation.c | 371 +++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.c       |  21 +-
 drivers/pci/pcie/portdrv.h       |  18 +-
 include/cxl/isolation.h          |  66 ++++++
 include/linux/acpi.h             |   3 +
 19 files changed, 1086 insertions(+), 23 deletions(-)
 create mode 100644 drivers/pci/pcie/cxl_isolation.c
 create mode 100644 include/cxl/isolation.h

base-commit: a403fe6c0b17f472e01246eb350f5eef105243ac
-- 
2.34.1



* [PATCH 01/16] cxl/regs: Add cxl_unmap_component_regs()
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 02/16] cxl/regs: Add CXL Isolation capability mapping Ben Cheatham
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

In order to allocate the MSI/-X interrupt for CXL error isolation the
PCIe portdrv driver needs to map the MMIO-space isolation capability
register that contains the interrupt vector. The CXL core already
provides a function to map registers in this space:
cxl_map_component_regs(). The mappings created by this function are managed
by devres. If the isolation registers are left mapped by the PCIe
portdrv driver, the CXL driver will run into a devres conflict when it
goes to map the same registers during probe.

Add cxl_unmap_component_regs() to be called from the PCIe portdrv
driver. This function will unwind the devres allocations done by
cxl_map_component_regs(), which will allow the PCIe portdrv driver to
map the isolation capability register without conflicts.

Factor out the register mapping retrieval code in
cxl_map_component_regs() to be reused by cxl_map/unmap_component_regs().
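
To make the conflict concrete, here is a small userspace model (not
kernel code) of an exclusive region tracker. The model_claim() and
model_release() names, the fixed-size table, and the -EBUSY return are
assumptions for this sketch; they only stand in for the managed region
request and its release.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_CLAIMS 4

struct model_claim_ent {
	uint64_t start;
	uint64_t len;
	bool used;
};

static struct model_claim_ent model_claims[MODEL_MAX_CLAIMS];

/* Claim [start, start + len); fails if it overlaps an existing claim */
static int model_claim(uint64_t start, uint64_t len)
{
	struct model_claim_ent *slot = NULL;
	size_t i;

	assert(len > 0);
	for (i = 0; i < MODEL_MAX_CLAIMS; i++) {
		struct model_claim_ent *c = &model_claims[i];

		if (!c->used) {
			if (!slot)
				slot = c;
			continue;
		}
		if (start < c->start + c->len && c->start < start + len)
			return -EBUSY;
	}
	if (!slot)
		return -ENOMEM;

	slot->start = start;
	slot->len = len;
	slot->used = true;
	return 0;
}

/* Drop a claim made earlier with the same start and length */
static int model_release(uint64_t start, uint64_t len)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_CLAIMS; i++) {
		struct model_claim_ent *c = &model_claims[i];

		if (c->used && c->start == start && c->len == len) {
			c->used = false;
			return 0;
		}
	}
	return -ENOENT;
}
```

In this model the second claim of the same range fails until the first
owner releases it, which is the situation the unmap helper is meant to
avoid.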

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/regs.c | 77 +++++++++++++++++++++++++++++++----------
 drivers/cxl/cxl.h       |  3 ++
 2 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index b8e767a9571c..da8e668a3b70 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -201,40 +201,81 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
 }
 EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, "CXL");
 
-int cxl_map_component_regs(const struct cxl_register_map *map,
+struct mapinfo {
+	const struct cxl_reg_map *rmap;
+	void __iomem **addr;
+};
+
+static int cxl_get_mapinfo(const struct cxl_register_map *map,
 			   struct cxl_component_regs *regs,
-			   unsigned long map_mask)
+			   unsigned long map_mask, struct mapinfo *info)
 {
-	struct device *host = map->host;
-	struct mapinfo {
-		const struct cxl_reg_map *rmap;
-		void __iomem **addr;
-	} mapinfo[] = {
+	struct mapinfo mapinfo[] = {
 		{ &map->component_map.hdm_decoder, &regs->hdm_decoder },
 		{ &map->component_map.ras, &regs->ras },
 	};
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(mapinfo); i++) {
-		struct mapinfo *mi = &mapinfo[i];
-		resource_size_t addr;
-		resource_size_t length;
-
-		if (!mi->rmap->valid)
+		if (!mapinfo[i].rmap->valid)
 			continue;
-		if (!test_bit(mi->rmap->id, &map_mask))
+		if (!test_bit(mapinfo[i].rmap->id, &map_mask))
 			continue;
-		addr = map->resource + mi->rmap->offset;
-		length = mi->rmap->size;
-		*(mi->addr) = devm_cxl_iomap_block(host, addr, length);
-		if (!*(mi->addr))
-			return -ENOMEM;
+
+		*info = mapinfo[i];
+		return 0;
 	}
 
+	return -ENXIO;
+}
+
+int cxl_map_component_regs(const struct cxl_register_map *map,
+			   struct cxl_component_regs *regs,
+			   unsigned long map_mask)
+{
+	struct device *host = map->host;
+	resource_size_t addr, length;
+	struct mapinfo mi;
+	int rc;
+
+	rc = cxl_get_mapinfo(map, regs, map_mask, &mi);
+	if (rc)
+		return rc;
+
+	addr = map->resource + mi.rmap->offset;
+	length = mi.rmap->size;
+	*mi.addr = devm_cxl_iomap_block(host, addr, length);
+	if (!(*mi.addr))
+		return -ENOMEM;
+
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_map_component_regs, "CXL");
 
+int cxl_unmap_component_regs(const struct cxl_register_map *map,
+			     struct cxl_component_regs *regs,
+			     unsigned long map_mask)
+{
+	struct device *host = map->host;
+	resource_size_t addr, length;
+	struct mapinfo mi;
+	int rc;
+
+	rc = cxl_get_mapinfo(map, regs, map_mask, &mi);
+	if (rc)
+		return rc;
+
+	if (!(*mi.addr))
+		return 0;
+
+	addr = map->resource + mi.rmap->offset;
+	length = mi.rmap->size;
+
+	devm_iounmap(host, *mi.addr);
+	devm_release_mem_region(host, addr, length);
+	return 0;
+}
+
 int cxl_map_device_regs(const struct cxl_register_map *map,
 			struct cxl_device_regs *regs)
 {
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 51b7058387ef..a0fda305e25b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -300,6 +300,9 @@ void cxl_probe_device_regs(struct device *dev, void __iomem *base,
 int cxl_map_component_regs(const struct cxl_register_map *map,
 			   struct cxl_component_regs *regs,
 			   unsigned long map_mask);
+int cxl_unmap_component_regs(const struct cxl_register_map *map,
+			     struct cxl_component_regs *regs,
+			     unsigned long map_mask);
 int cxl_map_device_regs(const struct cxl_register_map *map,
 			struct cxl_device_regs *regs);
 int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
-- 
2.34.1



* [PATCH 02/16] cxl/regs: Add CXL Isolation capability mapping
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
  2025-07-30 21:47 ` [PATCH 01/16] cxl/regs: Add cxl_unmap_component_regs() Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 03/16] PCI: PCIe portdrv: Add CXL Isolation service driver Ben Cheatham
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add necessary information to map the CXL Timeout & Isolation Capability
(CXL 3.2 8.2.4.24). This will be used in later commits by the CXL core
and PCIe portdrv driver to set up and manage the capability.
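
For readers unfamiliar with how a capability is located, here is a
userspace sketch (not kernel code) of walking a capability array for ID
0x9. It assumes the header layout I understand the existing probe code
to use: ID in bits 15:0, array size in bits 31:24 of the first word, and
a 12-bit pointer in bits 31:20 of each entry. model_find_cap() and the
MODEL_* names are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAP_ID_MASK	0xffffu	/* bits 15:0 (assumed layout) */
#define MODEL_CAP_ARRAY_SHIFT	24	/* array size in bits 31:24 (assumed) */
#define MODEL_CAP_PTR_SHIFT	20	/* pointer in bits 31:20 (assumed) */
#define MODEL_CAP_HDR_ID	1	/* ID of the array header itself */

/*
 * Search an in-memory copy of the capability array for @want and store
 * the capability's offset in @offset.
 */
static int model_find_cap(const uint32_t *cm, size_t nwords, uint16_t want,
			  uint32_t *offset)
{
	uint32_t count, i;

	assert(cm && offset);
	if (nwords == 0 || (cm[0] & MODEL_CAP_ID_MASK) != MODEL_CAP_HDR_ID)
		return -ENODEV;

	count = cm[0] >> MODEL_CAP_ARRAY_SHIFT;
	for (i = 1; i <= count && i < nwords; i++) {
		if ((cm[i] & MODEL_CAP_ID_MASK) != want)
			continue;
		*offset = cm[i] >> MODEL_CAP_PTR_SHIFT;
		return 0;
	}

	return -ENXIO;
}
```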

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/regs.c | 8 ++++++++
 drivers/cxl/cxl.h       | 7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index da8e668a3b70..bdc1eb59d69c 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -92,6 +92,13 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
 			length = CXL_RAS_CAPABILITY_LENGTH;
 			rmap = &map->ras;
 			break;
+		case CXL_CM_CAP_CAP_ID_ISOLATION:
+			dev_dbg(dev,
+				"found Timeout & Isolation capability (0x%x)\n",
+				offset);
+			length = CXL_ISOLATION_CAPABILITY_LENGTH;
+			rmap = &map->isolation;
+			break;
 		default:
 			dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
 				offset);
@@ -213,6 +220,7 @@ static int cxl_get_mapinfo(const struct cxl_register_map *map,
 	struct mapinfo mapinfo[] = {
 		{ &map->component_map.hdm_decoder, &regs->hdm_decoder },
 		{ &map->component_map.ras, &regs->ras },
+		{ &map->component_map.isolation, &regs->isolation },
 	};
 	int i;
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a0fda305e25b..3013ba600ba3 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -41,6 +41,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
 
 #define   CXL_CM_CAP_CAP_ID_RAS 0x2
 #define   CXL_CM_CAP_CAP_ID_HDM 0x5
+#define   CXL_CM_CAP_CAP_ID_ISOLATION 0x9
 #define   CXL_CM_CAP_CAP_HDM_VERSION 1
 
 /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
@@ -133,6 +134,9 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 	return 0;
 }
 
+/* CXL 3.2 8.2.4.24 CXL Timeout and Isolation Capability Structure */
+#define CXL_ISOLATION_CAPABILITY_LENGTH 0x10
+
 /* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
 #define CXL_RAS_UNCORRECTABLE_STATUS_OFFSET 0x0
 #define   CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
@@ -211,10 +215,12 @@ struct cxl_regs {
 	 * Common set of CXL Component register block base pointers
 	 * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
 	 * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+	 * @isolation: CXL 3.2 8.2.4.24 CXL Timeout & Isolation Capability Structure
 	 */
 	struct_group_tagged(cxl_component_regs, component,
 		void __iomem *hdm_decoder;
 		void __iomem *ras;
+		void __iomem *isolation;
 	);
 	/*
 	 * Common set of CXL Device register block base pointers
@@ -257,6 +263,7 @@ struct cxl_reg_map {
 struct cxl_component_reg_map {
 	struct cxl_reg_map hdm_decoder;
 	struct cxl_reg_map ras;
+	struct cxl_reg_map isolation;
 };
 
 struct cxl_device_reg_map {
-- 
2.34.1



* [PATCH 03/16] PCI: PCIe portdrv: Add CXL Isolation service driver
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
  2025-07-30 21:47 ` [PATCH 01/16] cxl/regs: Add cxl_unmap_component_regs() Ben Cheatham
  2025-07-30 21:47 ` [PATCH 02/16] cxl/regs: Add CXL Isolation capability mapping Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 04/16] PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector Ben Cheatham
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add the CXL isolation service, which will provide the necessary
information to the PCIe portdrv and CXL drivers to map, set up, and
handle CXL isolation interrupts.

Add functions to get the CXL isolation MSI/-X interrupt vector
from the PCIe portdrv.
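
As a standalone illustration of the decode, the userspace sketch below
(not kernel code) uses the bit positions this patch adds to
drivers/cxl/cxl.h: bit 16 for CXL.mem isolation support, bit 26 for
interrupt support, and bits 31:27 for the interrupt message number. The
MODEL_* names and model_isolation_intr_vec() are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_ISO_CAP_MEM_ISO_SUPP	(1u << 16)
#define MODEL_ISO_CAP_INTR_SUPP		(1u << 26)
#define MODEL_ISO_CAP_INTR_SHIFT	27
#define MODEL_ISO_CAP_INTR_MASK		(0x1fu << MODEL_ISO_CAP_INTR_SHIFT)

/*
 * Return the interrupt message number from a capability register value,
 * or -ENXIO if either isolation or its interrupt is unsupported.
 */
static int model_isolation_intr_vec(uint32_t cap)
{
	if (!(cap & MODEL_ISO_CAP_INTR_SUPP) ||
	    !(cap & MODEL_ISO_CAP_MEM_ISO_SUPP))
		return -ENXIO;

	return (int)((cap & MODEL_ISO_CAP_INTR_MASK) >>
		     MODEL_ISO_CAP_INTR_SHIFT);
}
```

Because the message number field is five bits wide, the result is always
in the range 0-31 when both support bits are set.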

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/Kconfig              | 14 +++++
 drivers/cxl/cxl.h                |  4 ++
 drivers/pci/pcie/Makefile        |  1 +
 drivers/pci/pcie/cxl_isolation.c | 87 ++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.c       |  1 +
 drivers/pci/pcie/portdrv.h       | 18 ++++++-
 6 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/cxl_isolation.c

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 57274de54a45..537e1e8e13da 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -247,4 +247,18 @@ config CXL_NATIVE_RAS
 	  If unsure, or if this kernel is meant for production environments,
 	  say Y.
 
+config CXL_ISOLATION
+	bool "CXL.mem Isolation Support"
+	depends on PCIEPORTBUS
+	depends on CXL_BUS=PCIEPORTBUS
+	help
+	  Enables the CXL.mem isolation PCIe port bus service driver. This
+	  driver, in combination with the CXL driver core, is responsible
+	  for managing CXL-capable PCIe root ports that undergo CXL.mem
+	  error isolation due to either a CXL.mem transaction timeout or
+	  link down condition. Without error isolation, either of these
+	  conditions will trigger a system reset.
+
+	  If unsure say 'y'
+
 endif
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3013ba600ba3..fdd7c4e024a6 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -135,6 +135,10 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 }
 
 /* CXL 3.2 8.2.4.24 CXL Timeout and Isolation Capability Structure */
+#define CXL_ISOLATION_CAPABILITY_OFFSET 0x0
+#define   CXL_ISOLATION_CAP_MEM_ISO_SUPP BIT(16)
+#define   CXL_ISOLATION_CAP_INTR_SUPP BIT(26)
+#define   CXL_ISOLATION_CAP_INTR_MASK GENMASK(31, 27)
 #define CXL_ISOLATION_CAPABILITY_LENGTH 0x10
 
 /* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index cd2cb925dbd5..8b959ddb9684 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -14,3 +14,4 @@ obj-$(CONFIG_PCIE_PME)		+= pme.o
 obj-$(CONFIG_PCIE_DPC)		+= dpc.o
 obj-$(CONFIG_PCIE_PTM)		+= ptm.o
 obj-$(CONFIG_PCIE_EDR)		+= edr.o
+obj-$(CONFIG_CXL_ISOLATION)	+= cxl_isolation.o
diff --git a/drivers/pci/pcie/cxl_isolation.c b/drivers/pci/pcie/cxl_isolation.c
new file mode 100644
index 000000000000..550f16271d1c
--- /dev/null
+++ b/drivers/pci/pcie/cxl_isolation.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * The CXL Isolation PCIe port service driver provides functions to allocate
+ * and set up CXL Timeout & Isolation interrupts (CXL 3.2 12.3). This driver
+ * does no actual interrupt handling, it only provides the information for
+ * the CXL driver to set up its own handling because the CXL driver is better
+ * equipped to handle isolation interrupts.
+ *
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Ben Cheatham <Benjamin.Cheatham@amd.com>
+ */
+
+#include <linux/pci.h>
+
+#include "../../cxl/cxlpci.h"
+#include "portdrv.h"
+
+static int get_isolation_intr_vec(u32 cap)
+{
+	if (!FIELD_GET(CXL_ISOLATION_CAP_INTR_SUPP, cap) ||
+	    !FIELD_GET(CXL_ISOLATION_CAP_MEM_ISO_SUPP, cap))
+		return -ENXIO;
+
+	return FIELD_GET(CXL_ISOLATION_CAP_INTR_MASK, cap);
+}
+
+int pcie_cxliso_get_intr_vec(struct pci_dev *dev, int *vec)
+{
+	struct cxl_component_regs regs;
+	struct cxl_register_map map;
+	u32 cap;
+	int rc;
+
+	rc = cxl_find_regblock(dev, CXL_REGLOC_RBI_COMPONENT, &map);
+	if (rc)
+		return rc;
+
+	rc = cxl_setup_regs(&map);
+	if (rc)
+		return rc;
+
+	if (!map.component_map.isolation.valid)
+		return -ENXIO;
+
+	rc = cxl_map_component_regs(&map, &regs,
+				    BIT(CXL_CM_CAP_CAP_ID_ISOLATION));
+	if (rc)
+		return rc;
+
+	cap = readl(regs.isolation + CXL_ISOLATION_CAPABILITY_OFFSET);
+	rc = get_isolation_intr_vec(cap);
+	if (rc < 0) {
+		cxl_unmap_component_regs(&map, &regs,
+					 BIT(CXL_CM_CAP_CAP_ID_ISOLATION));
+		return rc;
+	}
+
+	if (vec)
+		*vec = rc;
+
+	cxl_unmap_component_regs(&map, &regs, BIT(CXL_CM_CAP_CAP_ID_ISOLATION));
+	return 0;
+
+}
+
+static int cxl_isolation_probe(struct pcie_device *dev)
+{
+	if (!pcie_is_cxl(dev->port) || pcie_cxliso_get_intr_vec(dev->port, NULL))
+		return -ENXIO;
+
+	pci_info(dev->port, "CXLISO: Signaling with IRQ %d\n", dev->irq);
+	return 0;
+}
+
+static struct pcie_port_service_driver isolationdriver = {
+	.name = "cxl_isolation",
+	.port_type = PCI_EXP_TYPE_ROOT_PORT,
+	.service = PCIE_PORT_SERVICE_CXLISO,
+	.probe = cxl_isolation_probe,
+};
+
+int __init pcie_cxliso_init(void)
+{
+	return pcie_port_service_register(&isolationdriver);
+}
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index e8318fd5f6ed..465d1aec4ca6 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -834,6 +834,7 @@ static void __init pcie_init_services(void)
 	pcie_dpc_init();
 	pcie_bwctrl_init();
 	pcie_hp_init();
+	pcie_cxliso_init();
 }
 
 static int __init pcie_portdrv_init(void)
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index bd29d1cc7b8b..a9bfdb0b82be 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -22,8 +22,10 @@
 #define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
 #define PCIE_PORT_SERVICE_BWCTRL_SHIFT	4	/* Bandwidth Controller (notifications) */
 #define PCIE_PORT_SERVICE_BWCTRL	(1 << PCIE_PORT_SERVICE_BWCTRL_SHIFT)
+#define PCIE_PORT_SERVICE_CXLISO_SHIFT	5	/* CXL Timeout & Isolation */
+#define PCIE_PORT_SERVICE_CXLISO	(1 << PCIE_PORT_SERVICE_CXLISO_SHIFT)
 
-#define PCIE_PORT_DEVICE_MAXSERVICES   5
+#define PCIE_PORT_DEVICE_MAXSERVICES   6
 
 extern bool pcie_ports_dpc_native;
 
@@ -53,6 +55,12 @@ static inline int pcie_dpc_init(void) { return 0; }
 
 int pcie_bwctrl_init(void);
 
+#ifdef CONFIG_CXL_ISOLATION
+int pcie_cxliso_init(void);
+#else
+static inline int pcie_cxliso_init(void) { return 0; }
+#endif
+
 /* Port Type */
 #define PCIE_ANY_PORT			(~0)
 
@@ -123,4 +131,12 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
 #endif /* !CONFIG_PCIE_PME */
 
 struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
+
+#ifdef CONFIG_CXL_ISOLATION
+int pcie_cxliso_get_intr_vec(struct pci_dev *dev, int *vec);
+#else
+static inline int pcie_cxliso_get_intr_vec(struct pci_dev *dev, int *vec)
+{ return -ENXIO; }
+#endif
+
 #endif /* _PORTDRV_H_ */
-- 
2.34.1



* [PATCH 04/16] PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (2 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 03/16] PCI: PCIe portdrv: Add CXL Isolation service driver Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-08-04 21:39   ` Bjorn Helgaas
  2025-07-30 21:47 ` [PATCH 05/16] PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ Ben Cheatham
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Update the PCIe portdrv MSI/-X vector allocation code to include the CXL
isolation service.
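
The sizing rule is the same one already used for the other services:
the number of vectors must cover the largest message number in use. A
userspace sketch (not kernel code), with model_vectors_needed() as a
made-up name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the message numbers of all enabled services, return how many
 * vectors are needed so that the largest message number is valid.
 */
static uint32_t model_vectors_needed(const uint32_t *msg_nums, size_t n)
{
	uint32_t nvec = 0;
	size_t i;

	assert(msg_nums || n == 0);
	for (i = 0; i < n; i++) {
		if (msg_nums[i] + 1 > nvec)
			nvec = msg_nums[i] + 1;
	}

	return nvec;
}
```

For example, services using message numbers 0, 0, 2, and 5 need six
vectors, since message number 5 is index 5 of a zero-based table.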

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/pci/pcie/portdrv.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 465d1aec4ca6..6119e2e25ad2 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -50,12 +50,12 @@ static void release_pcie_device(struct device *dev)
 }
 
 /*
- * Fill in *pme, *aer, *dpc with the relevant Interrupt Message Numbers if
+ * Fill in *pme, *aer, *dpc, *iso with the relevant Interrupt Message Numbers if
  * services are enabled in "mask".  Return the number of MSI/MSI-X vectors
  * required to accommodate the largest Message Number.
  */
 static int pcie_message_numbers(struct pci_dev *dev, int mask,
-				u32 *pme, u32 *aer, u32 *dpc)
+				u32 *pme, u32 *aer, u32 *dpc, u32 *iso)
 {
 	u32 nvec = 0, pos;
 	u16 reg16;
@@ -98,6 +98,11 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 		}
 	}
 
+	if (mask & PCIE_PORT_SERVICE_CXLISO) {
+		if (pcie_cxliso_get_intr_vec(dev, iso) == 0)
+			nvec = max(nvec, *iso + 1);
+	}
+
 	return nvec;
 }
 
@@ -113,7 +118,7 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 {
 	int nr_entries, nvec, pcie_irq;
-	u32 pme = 0, aer = 0, dpc = 0;
+	u32 pme = 0, aer = 0, dpc = 0, iso = 0;
 
 	/* Allocate the maximum possible number of MSI/MSI-X vectors */
 	nr_entries = pci_alloc_irq_vectors(dev, 1, PCIE_PORT_MAX_MSI_ENTRIES,
@@ -122,7 +127,7 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 		return nr_entries;
 
 	/* See how many and which Interrupt Message Numbers we actually use */
-	nvec = pcie_message_numbers(dev, mask, &pme, &aer, &dpc);
+	nvec = pcie_message_numbers(dev, mask, &pme, &aer, &dpc, &iso);
 	if (nvec > nr_entries) {
 		pci_free_irq_vectors(dev);
 		return -EIO;
@@ -163,6 +168,9 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 	if (mask & PCIE_PORT_SERVICE_DPC)
 		irqs[PCIE_PORT_SERVICE_DPC_SHIFT] = pci_irq_vector(dev, dpc);
 
+	if (mask & PCIE_PORT_SERVICE_CXLISO)
+		irqs[PCIE_PORT_SERVICE_CXLISO_SHIFT] = pci_irq_vector(dev, iso);
+
 	return 0;
 }
 
@@ -278,6 +286,10 @@ static int get_port_device_capability(struct pci_dev *dev)
 			services |= PCIE_PORT_SERVICE_BWCTRL;
 	}
 
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
+	    pcie_cxliso_get_intr_vec(dev, NULL) == 0)
+			services |= PCIE_PORT_SERVICE_CXLISO;
+
 	return services;
 }
 
-- 
2.34.1



* [PATCH 05/16] PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (3 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 04/16] PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-31  5:59   ` Lukas Wunner
  2025-07-30 21:47 ` [PATCH 06/16] cxl/core: Enable CXL.mem isolation Ben Cheatham
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add a function to the CXL isolation service driver that allows the CXL
core to get the necessary information for setting up an interrupt
handler.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/pci/pcie/cxl_isolation.c | 26 +++++++++++++++++++++++++-
 include/cxl/isolation.h          | 26 ++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 include/cxl/isolation.h

diff --git a/drivers/pci/pcie/cxl_isolation.c b/drivers/pci/pcie/cxl_isolation.c
index 550f16271d1c..5a56a327b599 100644
--- a/drivers/pci/pcie/cxl_isolation.c
+++ b/drivers/pci/pcie/cxl_isolation.c
@@ -13,6 +13,7 @@
  */
 
 #include <linux/pci.h>
+#include <cxl/isolation.h>
 
 #include "../../cxl/cxlpci.h"
 #include "portdrv.h"
@@ -62,14 +63,37 @@ int pcie_cxliso_get_intr_vec(struct pci_dev *dev, int *vec)
 
 	cxl_unmap_component_regs(&map, &regs, BIT(CXL_CM_CAP_CAP_ID_ISOLATION));
 	return 0;
+}
+
+struct cxl_isolation_info *
+pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev)
+{
+	struct device *dev;
 
+	dev = pcie_port_find_device(dport_dev, PCIE_PORT_SERVICE_CXLISO);
+	if (!dev)
+		return NULL;
+
+	return get_service_data(to_pcie_device(dev));
 }
 
 static int cxl_isolation_probe(struct pcie_device *dev)
 {
-	if (!pcie_is_cxl(dev->port) || pcie_cxliso_get_intr_vec(dev->port, NULL))
+	struct cxl_isolation_info *info;
+	if (!pcie_is_cxl(dev->port) ||
+	    pcie_cxliso_get_intr_vec(dev->port, NULL))
 		return -ENXIO;
 
+	info = devm_kzalloc(&dev->device, sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	*info = (struct cxl_isolation_info) {
+		.dev = &dev->device,
+		.irq = dev->irq,
+	};
+
+	set_service_data(dev, info);
 	pci_info(dev->port, "CXLISO: Signaling with IRQ %d\n", dev->irq);
 	return 0;
 }
diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
new file mode 100644
index 000000000000..20a3a8942b2c
--- /dev/null
+++ b/include/cxl/isolation.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __CXL_ISOLATION_H__
+#define __CXL_ISOLATION_H__
+
+#include <linux/pci.h>
+
+/**
+ * struct cxl_isolation_info - Information for mapping CXL Isolation interrupts
+ * @dev: PCIe portdrv service device associated with IRQ
+ * @irq: IRQ line for interrupt
+ */
+struct cxl_isolation_info {
+	struct device *dev;
+	int irq;
+};
+
+#ifdef CONFIG_CXL_ISOLATION
+struct cxl_isolation_info *
+pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev);
+#else /* !CONFIG_CXL_ISOLATION */
+static inline struct cxl_isolation_info *
+pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev)
+{ return NULL; }
+#endif /* !CONFIG_CXL_ISOLATION */
+
+#endif
-- 
2.34.1



* [PATCH 06/16] cxl/core: Enable CXL.mem isolation
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (4 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 05/16] PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 07/16] cxl/core: Set up isolation interrupts Ben Cheatham
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Enable CXL.mem isolation during set up of CXL-capable PCIe Root Ports
that have the capability. This capability is optional, so failure to
enable isolation is not an error that should fail probe.
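
Enabling and disabling are read-modify-write operations on the control
register, so the other control fields must be left untouched. The
userspace sketch below (not kernel code) shows that pattern on a plain
variable. The bit position used for the enable bit is an assumption for
the example, as are the model_* names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical position of the CXL.mem isolation enable bit */
#define MODEL_ISO_CTRL_MEM_ISO_ENABLE	(1u << 16)

/* Set the enable bit, preserving every other control field */
static void model_enable_isolation(uint32_t *ctrl)
{
	assert(ctrl);
	*ctrl |= MODEL_ISO_CTRL_MEM_ISO_ENABLE;
}

/* Clear only the enable bit, preserving every other control field */
static void model_disable_isolation(uint32_t *ctrl)
{
	assert(ctrl);
	*ctrl &= ~MODEL_ISO_CTRL_MEM_ISO_ENABLE;
}
```

Note the complement in the disable path: masking with the bit itself
instead of its complement would clear everything except the enable bit.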

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/pci.c  | 45 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/port.c | 36 +++++++++++++++++++++++++++++++++
 drivers/cxl/cxl.h       |  5 +++++
 include/cxl/isolation.h | 10 +++++++++
 4 files changed, 96 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 3106ed6b5f49..c4d8d9690214 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -7,6 +7,7 @@
 #include <linux/pci.h>
 #include <linux/pci-doe.h>
 #include <linux/aer.h>
+#include <cxl/isolation.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
 #include <cxl.h>
@@ -1170,3 +1171,47 @@ int cxl_port_update_total_dports(struct cxl_port *port)
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_port_update_total_dports, "CXL");
+
+void cxl_enable_isolation(struct cxl_dport *dport)
+{
+	u32 ctrl;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	ctrl |= CXL_ISOLATION_CTRL_MEM_ISO_ENABLE;
+	writel(ctrl, dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+
+	dev_dbg(dport->dport_dev, "Enabled CXL.mem isolation\n");
+}
+
+static int cxl_dport_wait_for_rp_busy(void __iomem *reg)
+{
+	u32 status;
+	int i = 4;
+
+	do {
+		status = readl(reg + CXL_ISOLATION_STATUS_OFFSET);
+		if (!(status & CXL_ISOLATION_STAT_RP_BUSY))
+			return 0;
+
+		msleep(500);
+	} while (--i);
+
+	return -ETIMEDOUT;
+}
+
+int cxl_disable_isolation(struct cxl_dport *dport)
+{
+	u32 ctrl;
+	int rc;
+
+	rc = cxl_dport_wait_for_rp_busy(dport->regs.isolation);
+	if (rc)
+		return rc;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	ctrl &= ~CXL_ISOLATION_CTRL_MEM_ISO_ENABLE;
+	writel(ctrl, dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+
+	dev_dbg(dport->dport_dev, "Disabled CXL.mem isolation\n");
+	return 0;
+}
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 48f632aa3c7e..71e954ebc5aa 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -10,6 +10,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/node.h>
+#include <cxl/isolation.h>
 #include <cxl/einj.h>
 #include <cxlmem.h>
 #include <cxlpci.h>
@@ -1173,6 +1174,35 @@ static void cxl_dport_unlink(void *data)
 	sysfs_remove_link(&port->dev.kobj, link_name);
 }
 
+/**
+ * cxl_dport_enable_isolation - Enable CXL Isolation for a CXL dport. This is
+ * an optional capability only supported by PCIe Root Ports.
+ *
+ * @dport: CXL-capable PCIe Root Port
+ *
+ * Returns 0 if capability unsupported, or when enabled.
+ */
+static int cxl_dport_enable_isolation(struct cxl_dport *dport)
+{
+	u32 cap;
+	int rc;
+
+	if (!dport->reg_map.component_map.isolation.valid)
+		return 0;
+
+	rc = cxl_map_component_regs(&dport->reg_map, &dport->regs.component,
+				    BIT(CXL_CM_CAP_CAP_ID_ISOLATION));
+	if (rc)
+		return rc;
+
+	cap = readl(dport->regs.isolation + CXL_ISOLATION_CAPABILITY_OFFSET);
+	if (!(cap & CXL_ISOLATION_CAP_MEM_ISO_SUPP))
+		return 0;
+
+	cxl_enable_isolation(dport);
+	return 0;
+}
+
 static struct cxl_dport *
 __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
 		     int port_id, resource_size_t component_reg_phys,
@@ -1235,6 +1265,12 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
 		dev_dbg(dport_dev, "Component Registers found for dport: %pa\n",
 			&component_reg_phys);
 
+	if (IS_ENABLED(CONFIG_CXL_ISOLATION)) {
+		rc = cxl_dport_enable_isolation(dport);
+		if (rc)
+			return ERR_PTR(rc);
+	}
+
 	cond_cxl_root_lock(port);
 	rc = add_dport(port, dport);
 	cond_cxl_root_unlock(port);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index fdd7c4e024a6..999ffa05b68f 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -139,6 +139,11 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define   CXL_ISOLATION_CAP_MEM_ISO_SUPP BIT(16)
 #define   CXL_ISOLATION_CAP_INTR_SUPP BIT(26)
 #define   CXL_ISOLATION_CAP_INTR_MASK GENMASK(31, 27)
+#define CXL_ISOLATION_CTRL_OFFSET 0x8
+#define   CXL_ISOLATION_CTRL_MEM_ISO_ENABLE BIT(16)
+#define CXL_ISOLATION_STATUS_OFFSET 0xC
+#define   CXL_ISOLATION_STAT_MEM_ISO BIT(8)
+#define   CXL_ISOLATION_STAT_RP_BUSY BIT(14)
 #define CXL_ISOLATION_CAPABILITY_LENGTH 0x10
 
 /* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
index 20a3a8942b2c..429501a655dd 100644
--- a/include/cxl/isolation.h
+++ b/include/cxl/isolation.h
@@ -14,6 +14,16 @@ struct cxl_isolation_info {
 	int irq;
 };
 
+struct cxl_dport;
+#if IS_ENABLED(CONFIG_CXL_BUS)
+void cxl_enable_isolation(struct cxl_dport *dport);
+int cxl_disable_isolation(struct cxl_dport *dport);
+#else /* !CONFIG_CXL_BUS */
+static inline void cxl_enable_isolation(struct cxl_dport *dport) {}
+static inline int cxl_disable_isolation(struct cxl_dport *dport)
+{ return -ENXIO; }
+#endif /* !CONFIG_CXL_BUS */
+
 #ifdef CONFIG_CXL_ISOLATION
 struct cxl_isolation_info *
 pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev);
-- 
2.34.1


* [PATCH 07/16] cxl/core: Set up isolation interrupts
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (5 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 06/16] cxl/core: Enable CXL.mem isolation Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 08/16] cxl/core: Enable CXL " Ben Cheatham
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Register a CXL isolation interrupt handler as part of cxl_dport set up.
Only CXL-capable PCIe Root Ports have CXL.mem isolation interrupt support.
The interrupts are left masked and will be unmasked in a later commit.

A CXL-capable PCIe Root Port that has CXL.mem isolation support but no
interrupt support will not have isolation enabled. If isolation were
enabled without interrupts, CXL.mem transactions could return poisoned
data. This could cause data/system corruption if left unhandled, so the
capability is left disabled in this case.

CXL endpoint drivers can add an isolation handler for a device through
the isolation_handler member of struct cxl_dev_state. If this handler
is not present, the system will panic. If the handler opts to not panic
(i.e. returns "CXL_ERR_NONE"), the endpoint driver is charged with
maintaining system reliability (cleaning up CXL memory, disabling device
state, etc.).
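
To illustrate how the per-endpoint results combine, here is a small
user-space sketch, assuming the same "worst result wins" accumulation
used by the interrupt thread in this patch. enum err_result mirrors
enum cxl_err_results; iso_handler_fn, the two sample handlers, and
accumulate_results() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors enum cxl_err_results: a larger value is a worse outcome. */
enum err_result {
	ERR_NONE = 0,
	ERR_PANIC,
};

/* Hypothetical handler type; the real one also takes a cxl_dev_state. */
typedef enum err_result (*iso_handler_fn)(bool lnk_down);

static enum err_result handler_recovers(bool lnk_down)
{
	(void)lnk_down;
	return ERR_NONE;
}

static enum err_result handler_gives_up(bool lnk_down)
{
	(void)lnk_down;
	return ERR_PANIC;
}

/*
 * Returns ERR_NONE only when every endpoint has a handler and every
 * handler reports that it can recover. A missing handler counts as
 * ERR_PANIC, matching the default described in the commit message.
 */
static enum err_result accumulate_results(const iso_handler_fn *handlers,
					  size_t n, bool lnk_down)
{
	enum err_result acc = ERR_NONE;

	assert(handlers || n == 0);

	for (size_t i = 0; i < n; i++) {
		enum err_result res = ERR_PANIC;

		if (handlers[i])
			res = handlers[i](lnk_down);
		if (res > acc)
			acc = res;
	}

	return acc;
}
```

A single endpoint without a handler, or with a handler that gives up, is
enough to escalate the whole event to a panic.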

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/port.c | 113 ++++++++++++++++++++++++++++++++++++++--
 drivers/cxl/cxl.h       |   1 +
 drivers/cxl/cxlmem.h    |   4 ++
 include/cxl/isolation.h |  10 ++++
 4 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 71e954ebc5aa..a36440e85647 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1174,15 +1174,117 @@ static void cxl_dport_unlink(void *data)
 	sysfs_remove_link(&port->dev.kobj, link_name);
 }
 
+struct isolation_intr_data {
+	struct cxl_dport *dport;
+	struct cxl_port *port;
+};
+
+static irqreturn_t cxl_isolation_thread(int irq, void *_data)
+{
+	struct isolation_intr_data *data = _data;
+	struct cxl_dport *dport = data->dport;
+	struct cxl_port *port = data->port;
+	enum cxl_err_results res, acc;
+	struct cxl_dev_state *cxlds;
+	struct cxl_memdev *cxlmd;
+	struct cxl_dport *iter;
+	unsigned long index;
+	struct cxl_ep *ep;
+	bool lnk_down;
+	u32 status;
+
+	if (!dport || !port)
+		return IRQ_NONE;
+
+	guard(device)(&port->dev);
+	if (!dport->regs.isolation)
+		goto panic;
+
+	status = readl(dport->regs.isolation + CXL_ISOLATION_STATUS_OFFSET);
+	lnk_down = FIELD_GET(CXL_ISOLATION_STAT_LNK_DOWN, status);
+
+	acc = CXL_ERR_NONE;
+	xa_for_each(&port->endpoints, index, ep) {
+		iter = ep->dport;
+		while (iter && (&iter->port->dev != &port->dev))
+			iter = iter->port->parent_dport;
+
+		res = CXL_ERR_PANIC;
+		if (iter && iter->dport_dev == dport->dport_dev) {
+			cxlmd = to_cxl_memdev(ep->ep);
+			cxlds = cxlmd->cxlds;
+
+			if (cxlds && cxlds->isolation_handler)
+				res = cxlds->isolation_handler(cxlds, lnk_down);
+		}
+
+		acc = max(res, acc);
+	}
+
+	if (acc == CXL_ERR_NONE)
+		return IRQ_HANDLED;
+
+panic:
+	panic("%s: downstream devices could not recover from CXL.mem link down\n",
+	      dev_name(dport->dport_dev));
+	return IRQ_NONE;
+}
+
+static void cxl_dport_free_interrupts(void *data)
+{
+	struct cxl_isolation_info *info;
+	struct cxl_dport *dport = data;
+	struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
+
+	info = pcie_cxl_dport_get_isolation_info(pdev);
+	if (!info)
+		return;
+
+	devm_free_irq(info->dev, info->irq, dport);
+}
+
+static int cxl_dport_setup_interrupts(struct device *host,
+				      struct cxl_dport *dport)
+{
+	struct isolation_intr_data *data;
+	struct cxl_isolation_info *info;
+	u32 cap;
+	int rc;
+
+	cap = readl(dport->regs.isolation + CXL_ISOLATION_CAPABILITY_OFFSET);
+	if (!(cap & CXL_ISOLATION_CAP_INTR_SUPP))
+		return -ENXIO;
+
+	info = pcie_cxl_dport_get_isolation_info(to_pci_dev(dport->dport_dev));
+	if (!info)
+		return -ENXIO;
+
+	data = devm_kmalloc(host, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+	data->port = dport->port;
+	data->dport = dport;
+
+	rc = devm_request_threaded_irq(info->dev, info->irq, NULL,
+				       cxl_isolation_thread,
+				       IRQF_SHARED | IRQF_ONESHOT,
+				       "cxl_isolation", data);
+	if (rc)
+		return rc;
+
+	return devm_add_action_or_reset(host, cxl_dport_free_interrupts, dport);
+}
+
 /**
  * cxl_dport_enable_isolation - Enable CXL Isolation for a CXL dport. This is
  * an optional capability only supported by PCIe Root Ports.
- *
+ * @host: Host device for @dport
  * @dport: CXL-capable PCIe Root Port
  *
  * Returns 0 if capability unsupported, or when enabled.
  */
-static int cxl_dport_enable_isolation(struct cxl_dport *dport)
+static int cxl_dport_enable_isolation(struct device *host,
+				      struct cxl_dport *dport)
 {
 	u32 cap;
 	int rc;
@@ -1199,6 +1301,10 @@ static int cxl_dport_enable_isolation(struct cxl_dport *dport)
 	if (!(cap & CXL_ISOLATION_CAP_MEM_ISO_SUPP))
 		return 0;
 
+	rc = cxl_dport_setup_interrupts(host, dport);
+	if (rc)
+		return rc == -ENXIO ? 0 : rc;
+
 	cxl_enable_isolation(dport);
 	return 0;
 }
@@ -1266,7 +1372,7 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
 			&component_reg_phys);
 
 	if (IS_ENABLED(CONFIG_CXL_ISOLATION)) {
-		rc = cxl_dport_enable_isolation(dport);
+		rc = cxl_dport_enable_isolation(host, dport);
 		if (rc)
 			return ERR_PTR(rc);
 	}
@@ -1543,6 +1649,7 @@ static void reap_dport(struct cxl_port *port, struct cxl_dport *dport)
 {
 	devm_release_action(&port->dev, cxl_dport_unlink, dport);
 	devm_release_action(&port->dev, cxl_dport_remove, dport);
+	devm_release_action(&port->dev, cxl_dport_free_interrupts, dport);
 	devm_kfree(&port->dev, dport);
 }
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 999ffa05b68f..9e3ca754251d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -143,6 +143,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define   CXL_ISOLATION_CTRL_MEM_ISO_ENABLE BIT(16)
 #define CXL_ISOLATION_STATUS_OFFSET 0xC
 #define   CXL_ISOLATION_STAT_MEM_ISO BIT(8)
+#define   CXL_ISOLATION_STAT_LNK_DOWN BIT(9)
 #define   CXL_ISOLATION_STAT_RP_BUSY BIT(14)
 #define CXL_ISOLATION_CAPABILITY_LENGTH 0x10
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 551b0ba2caa1..fbe64c580785 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -9,6 +9,7 @@
 #include <linux/node.h>
 #include <cxl/event.h>
 #include <cxl/mailbox.h>
+#include <cxl/isolation.h>
 #include "cxl.h"
 
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -426,6 +427,7 @@ struct cxl_dpa_partition {
  * @type: Generic Memory Class device or Vendor Specific Memory device
  * @cxl_mbox: CXL mailbox context
  * @cxlfs: CXL features context
+ * @isolation_handler: CXL isolation CXL.mem link down handler
  */
 struct cxl_dev_state {
 	struct device *dev;
@@ -444,6 +446,8 @@ struct cxl_dev_state {
 #ifdef CONFIG_CXL_FEATURES
 	struct cxl_features_state *cxlfs;
 #endif
+	enum cxl_err_results (*isolation_handler)(struct cxl_dev_state *cxlds,
+						  bool lnk_down);
 };
 
 static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
index 429501a655dd..3ad05ccc5e01 100644
--- a/include/cxl/isolation.h
+++ b/include/cxl/isolation.h
@@ -4,6 +4,16 @@
 
 #include <linux/pci.h>
 
+/**
+ * enum cxl_err_results - Possible results of a CXL isolation handler
+ * @CXL_ERR_NONE: Device can recover without CXL core intervention
+ * @CXL_ERR_PANIC: Device can't recover
+ */
+enum cxl_err_results {
+	CXL_ERR_NONE = 0,
+	CXL_ERR_PANIC,
+};
+
 /**
  * struct cxl_isolation_info - Information for mapping CXL Isolation interrupts
  * @dev: PCIe portdrv service device associated with IRQ
-- 
2.34.1


* [PATCH 08/16] cxl/core: Enable CXL isolation interrupts
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (6 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 07/16] cxl/core: Set up isolation interrupts Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 09/16] cxl/core: Prevent onlining CXL memory behind isolated ports Ben Cheatham
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add functions to enable and disable CXL isolation interrupts. Use these
functions to enable interrupts as part of isolation set up.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/core.h |  2 ++
 drivers/cxl/core/pci.c  | 22 ++++++++++++++++++++++
 drivers/cxl/core/port.c | 10 +++++++++-
 drivers/cxl/cxl.h       |  1 +
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index a84151238e17..4702a1f27318 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -146,4 +146,6 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
 		    u16 *return_code);
 #endif
 
+void cxl_enable_isolation_interrupts(struct cxl_dport *dport);
+void cxl_disable_isolation_interrupts(struct cxl_dport *dport);
 #endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c4d8d9690214..89fb6d3854e3 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1172,6 +1172,28 @@ int cxl_port_update_total_dports(struct cxl_port *port)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_port_update_total_dports, "CXL");
 
+void cxl_enable_isolation_interrupts(struct cxl_dport *dport)
+{
+	u32 ctrl;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	ctrl |= CXL_ISOLATION_CTRL_MEM_INTR_ENABLE;
+	writel(ctrl, dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+
+	dev_dbg(dport->dport_dev, "Enabled CXL isolation interrupts\n");
+}
+
+void cxl_disable_isolation_interrupts(struct cxl_dport *dport)
+{
+	u32 ctrl;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	ctrl &= ~CXL_ISOLATION_CTRL_MEM_INTR_ENABLE;
+	writel(ctrl, dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+
+	dev_dbg(dport->dport_dev, "Disabled CXL isolation interrupts\n");
+}
+
 void cxl_enable_isolation(struct cxl_dport *dport)
 {
 	u32 ctrl;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index a36440e85647..90588bf927e0 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1236,6 +1236,9 @@ static void cxl_dport_free_interrupts(void *data)
 	struct cxl_dport *dport = data;
 	struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
 
+	if (dport->regs.isolation)
+		cxl_disable_isolation_interrupts(dport);
+
 	info = pcie_cxl_dport_get_isolation_info(pdev);
 	if (!info)
 		return;
@@ -1272,7 +1275,12 @@ static int cxl_dport_setup_interrupts(struct device *host,
 	if (rc)
 		return rc;
 
-	return devm_add_action_or_reset(host, cxl_dport_free_interrupts, dport);
+	rc = devm_add_action_or_reset(host, cxl_dport_free_interrupts, dport);
+	if (rc)
+		return rc;
+
+	cxl_enable_isolation_interrupts(dport);
+	return 0;
 }
 
 /**
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9e3ca754251d..62b3ed188949 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -141,6 +141,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define   CXL_ISOLATION_CAP_INTR_MASK GENMASK(31, 27)
 #define CXL_ISOLATION_CTRL_OFFSET 0x8
 #define   CXL_ISOLATION_CTRL_MEM_ISO_ENABLE BIT(16)
+#define   CXL_ISOLATION_CTRL_MEM_INTR_ENABLE BIT(26)
 #define CXL_ISOLATION_STATUS_OFFSET 0xC
 #define   CXL_ISOLATION_STAT_MEM_ISO BIT(8)
 #define   CXL_ISOLATION_STAT_LNK_DOWN BIT(9)
-- 
2.34.1


* [PATCH 09/16] cxl/core: Prevent onlining CXL memory behind isolated ports
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (7 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 08/16] cxl/core: Enable CXL " Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 10/16] cxl/core: Enable CXL.mem timeout Ben Cheatham
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

The host will not be able to access the CXL memory on devices
enabled/added below an isolated CXL downstream port. Add a check during
cxl_mem probe to prevent adding devices below an isolated port. Also add
a check to prevent CXL region creation below an isolated port for
previously disabled devices below the port.
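
The ancestor walk can be sketched in user space as follows. This is an
illustration only, assuming that the isolation status bit (bit 8,
CXL_ISOLATION_STAT_MEM_ISO in this series) is what marks a port as
isolated. struct fake_dport and fake_endpoint_isolated() are
hypothetical names, and the status register is modelled as a plain
integer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit position as CXL_ISOLATION_STAT_MEM_ISO in this series. */
#define ISO_STAT_MEM_ISO (UINT32_C(1) << 8)

/* Hypothetical, simplified view of a dport and its upstream dport. */
struct fake_dport {
	const struct fake_dport *parent; /* next dport toward the root */
	bool has_iso_regs;               /* isolation registers are mapped */
	uint32_t iso_status;             /* isolation status register */
};

/*
 * Walk from the endpoint's parent dport toward the root. Ports without
 * isolation registers cannot be isolated themselves, so they are
 * skipped; the first port reporting isolation ends the walk.
 */
static bool fake_endpoint_isolated(const struct fake_dport *dport)
{
	for (; dport; dport = dport->parent) {
		if (!dport->has_iso_regs)
			continue;
		if (dport->iso_status & ISO_STAT_MEM_ISO)
			return true;
	}

	return false;
}
```

Because only Root Ports implement the capability, a status value on a
switch dport without mapped registers is ignored in this model.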

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/port.c   | 28 ++++++++++++++++++++++++++++
 drivers/cxl/core/region.c |  3 +++
 drivers/cxl/cxl.h         |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 90588bf927e0..c9e7bfc082d5 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -869,6 +869,13 @@ static int cxl_port_add(struct cxl_port *port,
 		 */
 		port->reg_map = cxlds->reg_map;
 		port->reg_map.host = &port->dev;
+
+		if (cxl_endpoint_port_isolated(port)) {
+			dev_err(&port->dev,
+				"port is under isolated CXL dport\n");
+			return -EBUSY;
+		}
+
 		cxlmd->endpoint = port;
 	} else if (parent_dport) {
 		rc = dev_set_name(dev, "port%d", port->id);
@@ -1174,6 +1181,27 @@ static void cxl_dport_unlink(void *data)
 	sysfs_remove_link(&port->dev.kobj, link_name);
 }
 
+bool cxl_endpoint_port_isolated(struct cxl_port *ep)
+{
+	struct cxl_dport *iter;
+	u32 status;
+
+	for (iter = ep->parent_dport;
+	     iter && iter->port && !is_cxl_root(iter->port);
+	     iter = iter->port->parent_dport) {
+		if (!iter->regs.isolation)
+			continue;
+
+		status = readl(iter->regs.isolation +
+			       CXL_ISOLATION_STATUS_OFFSET);
+		if (status & CXL_ISOLATION_STAT_MEM_ISO)
+			return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_isolated, "CXL");
+
 struct isolation_intr_data {
 	struct cxl_dport *dport;
 	struct cxl_port *port;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b94fda6f2e4c..db9ff3b683aa 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3407,6 +3407,9 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 	int rc, part = READ_ONCE(cxled->part);
 	struct cxl_region *cxlr;
 
+	if (cxl_endpoint_port_isolated(cxlmd->endpoint))
+		return ERR_PTR(-EBUSY);
+
 	do {
 		cxlr = __create_region(cxlrd, cxlds->part[part].mode,
 				       atomic_read(&cxlrd->region_id));
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 62b3ed188949..8da1e40ab4e7 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -838,6 +838,8 @@ pci_ers_result_t cxl_error_detected(struct device *dev);
 void cxl_port_cor_error_detected(struct device *dev);
 pci_ers_result_t cxl_port_error_detected(struct device *dev);
 
+bool cxl_endpoint_port_isolated(struct cxl_port *port);
+
 /**
  * struct cxl_endpoint_dvsec_info - Cached DVSEC info
  * @mem_enabled: cached value of mem_enabled in the DVSEC at init time
-- 
2.34.1


* [PATCH 10/16] cxl/core: Enable CXL.mem timeout
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (8 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 09/16] cxl/core: Prevent onlining CXL memory behind isolated ports Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 11/16] cxl/pci: Add isolation handler Ben Cheatham
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add functions to enable and disable CXL.mem transaction timeout. Enable
timeout as part of CXL isolation set up.
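
A minimal user-space sketch of the intended gating, assuming timeout
should only be switched on when the capability register advertises it.
ctrl_with_timeout() and ctrl_without_timeout() are hypothetical helpers
operating on plain integers; the bit positions match the defines added
in this patch.

```c
#include <stdint.h>

/* Bit positions mirror the CXL_ISOLATION_* defines added in this patch. */
#define ISO_CAP_MEM_TIME_SUPP    (UINT32_C(1) << 4)
#define ISO_CTRL_MEM_TIME_ENABLE (UINT32_C(1) << 4)

/*
 * Return the control value with timeout enabled, but only when the
 * capability register says transaction timeout is supported.
 */
static uint32_t ctrl_with_timeout(uint32_t cap, uint32_t ctrl)
{
	if (cap & ISO_CAP_MEM_TIME_SUPP)
		ctrl |= ISO_CTRL_MEM_TIME_ENABLE;

	return ctrl;
}

/* Clear only the timeout enable bit; the timeout value field is kept. */
static uint32_t ctrl_without_timeout(uint32_t ctrl)
{
	return ctrl & ~ISO_CTRL_MEM_TIME_ENABLE;
}
```

Both helpers leave the low four bits (the timeout value field) and the
isolation enable bit untouched.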

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/pci.c  | 22 ++++++++++++++++++++++
 drivers/cxl/core/port.c | 14 +++++++++-----
 drivers/cxl/cxl.h       |  4 ++++
 include/cxl/isolation.h |  5 +++++
 4 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 89fb6d3854e3..dd6c602d57d3 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1237,3 +1237,25 @@ int cxl_disable_isolation(struct cxl_dport *dport)
 	dev_dbg(dport->dport_dev, "Disabled CXL.mem isolation\n");
 	return 0;
 }
+
+void cxl_enable_timeout(struct cxl_dport *dport)
+{
+	u32 ctrl;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	ctrl |= CXL_ISOLATION_CTRL_MEM_TIME_ENABLE;
+	writel(ctrl, dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+
+	dev_dbg(dport->dport_dev, "Enabled CXL.mem transaction timeout\n");
+}
+
+void cxl_disable_timeout(struct cxl_dport *dport)
+{
+	u32 ctrl;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	ctrl &= ~CXL_ISOLATION_CTRL_MEM_TIME_ENABLE;
+	writel(ctrl, dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+
+	dev_dbg(dport->dport_dev, "Disabled CXL.mem transaction timeout\n");
+}
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c9e7bfc082d5..6591e83e719c 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1312,15 +1312,15 @@ static int cxl_dport_setup_interrupts(struct device *host,
 }
 
 /**
- * cxl_dport_enable_isolation - Enable CXL Isolation for a CXL dport. This is
- * an optional capability only supported by PCIe Root Ports.
+ * cxl_dport_enable_timeout_isolation - Enable CXL.mem timeout and isolation
+ * for a CXL dport. Optional capability only supported by PCIe Root Ports.
  * @host: Host device for @dport
  * @dport: CXL-capable PCIe Root Port
  *
  * Returns 0 if capability unsupported, or when enabled.
  */
-static int cxl_dport_enable_isolation(struct device *host,
-				      struct cxl_dport *dport)
+static int cxl_dport_enable_timeout_isolation(struct device *host,
+					      struct cxl_dport *dport)
 {
 	u32 cap;
 	int rc;
@@ -1342,6 +1342,10 @@ static int cxl_dport_enable_isolation(struct device *host,
 		return rc == -ENXIO ? 0 : rc;
 
 	cxl_enable_isolation(dport);
+
+	if (cap & CXL_ISOLATION_CAP_MEM_TIME_SUPP)
+		cxl_enable_timeout(dport);
+
 	return 0;
 }
 
@@ -1408,7 +1412,7 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
 			&component_reg_phys);
 
 	if (IS_ENABLED(CONFIG_CXL_ISOLATION)) {
-		rc = cxl_dport_enable_isolation(host, dport);
+		rc = cxl_dport_enable_timeout_isolation(host, dport);
 		if (rc)
 			return ERR_PTR(rc);
 	}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 8da1e40ab4e7..7f9c6bd6e010 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -136,10 +136,14 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 
 /* CXL 3.2 8.2.4.24 CXL Timeout and Isolation Capability Structure */
 #define CXL_ISOLATION_CAPABILITY_OFFSET 0x0
+#define   CXL_ISOLATION_CAP_MEM_TIME_MASK GENMASK(3, 0)
+#define   CXL_ISOLATION_CAP_MEM_TIME_SUPP BIT(4)
 #define   CXL_ISOLATION_CAP_MEM_ISO_SUPP BIT(16)
 #define   CXL_ISOLATION_CAP_INTR_SUPP BIT(26)
 #define   CXL_ISOLATION_CAP_INTR_MASK GENMASK(31, 27)
 #define CXL_ISOLATION_CTRL_OFFSET 0x8
+#define   CXL_ISOLATION_CTRL_MEM_TIME_MASK GENMASK(3, 0)
+#define   CXL_ISOLATION_CTRL_MEM_TIME_ENABLE BIT(4)
 #define   CXL_ISOLATION_CTRL_MEM_ISO_ENABLE BIT(16)
 #define   CXL_ISOLATION_CTRL_MEM_INTR_ENABLE BIT(26)
 #define CXL_ISOLATION_STATUS_OFFSET 0xC
diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
index 3ad05ccc5e01..73282ac262a6 100644
--- a/include/cxl/isolation.h
+++ b/include/cxl/isolation.h
@@ -28,10 +28,15 @@ struct cxl_dport;
 #if IS_ENABLED(CONFIG_CXL_BUS)
 void cxl_enable_isolation(struct cxl_dport *dport);
 int cxl_disable_isolation(struct cxl_dport *dport);
+void cxl_enable_timeout(struct cxl_dport *dport);
+void cxl_disable_timeout(struct cxl_dport *dport);
+
 #else /* !CONFIG_CXL_BUS */
 static inline void cxl_enable_isolation(struct cxl_dport *dport) {}
 static inline int cxl_disable_isolation(struct cxl_dport *dport)
 { return -ENXIO; }
+static inline void cxl_enable_timeout(struct cxl_dport *dport) {}
+static inline void cxl_disable_timeout(struct cxl_dport *dport) {}
 #endif /* !CONFIG_CXL_BUS */
 
 #ifdef CONFIG_CXL_ISOLATION
-- 
2.34.1


* [PATCH 11/16] cxl/pci: Add isolation handler
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (9 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 10/16] cxl/core: Enable CXL.mem timeout Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 12/16] PCI: PCIe portdrv: Add cxl_isolation sysfs attributes Ben Cheatham
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add an error isolation handler for type 3 CXL devices. The current
behavior is to log any AER information and then panic.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/pci.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 2a56936ac4f2..b0f2400c6ac9 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -900,6 +900,13 @@ static struct attribute_group cxl_rcd_group = {
 };
 __ATTRIBUTE_GROUPS(cxl_rcd);
 
+static enum cxl_err_results
+cxl_pci_isolation_handler(struct cxl_dev_state *cxlds, bool lnk_down)
+{
+	cxl_error_detected(cxlds->dev);
+	return CXL_ERR_PANIC;
+}
+
 static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
@@ -1006,6 +1013,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		dev_dbg(&pdev->dev, "No CXL Features discovered\n");
 
+	cxlds->isolation_handler = cxl_pci_isolation_handler;
+
 	cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds);
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
-- 
2.34.1


* [PATCH 12/16] PCI: PCIe portdrv: Add cxl_isolation sysfs attributes
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (10 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 11/16] cxl/pci: Add isolation handler Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 13/16] cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming Ben Cheatham
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add sysfs attributes to enable/disable CXL isolation and transaction
timeout. The intended use for these attributes is to disable isolation
and/or timeout as part of device maintenance or hotplug.

The attributes are added under a new "cxl_isolation" group on the PCIe
Root Port device.
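
For illustration, a user-space tool might toggle these attributes along
the lines of the sketch below. The sysfs path layout is an assumption
derived from the "cxl_isolation" group name and the standard
/sys/bus/pci/devices/<BDF>/ directory; cxl_isolation_attr_path() and
cxl_isolation_write_bool() are hypothetical helpers, not part of this
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the assumed sysfs path for an attribute in the "cxl_isolation"
 * group of the Root Port identified by @bdf (e.g. "0000:00:01.0").
 * Returns 0 on success, -1 if @buf is too small.
 */
static int cxl_isolation_attr_path(char *buf, size_t len, const char *bdf,
				   const char *attr)
{
	int n;

	assert(buf && bdf && attr);

	n = snprintf(buf, len, "/sys/bus/pci/devices/%s/cxl_isolation/%s",
		     bdf, attr);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}

/*
 * Write "1" or "0" to @attr ("isolation_ctrl" or "timeout_ctrl").
 * Returns 0 on success, -1 on any failure.
 */
static int cxl_isolation_write_bool(const char *bdf, const char *attr,
				    int enable)
{
	char path[256];
	FILE *f;
	int rc = 0;

	if (cxl_isolation_attr_path(path, sizeof(path), bdf, attr))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(enable ? "1\n" : "0\n", f) < 0)
		rc = -1;
	/* The store callback's error, if any, surfaces at flush/close. */
	if (fclose(f) != 0)
		rc = -1;

	return rc;
}
```

Disabling isolation before maintenance would then be a call such as
cxl_isolation_write_bool("0000:00:01.0", "isolation_ctrl", 0), with the
BDF replaced by the actual Root Port address.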

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/port.c          |  29 ++++++
 drivers/pci/pci-sysfs.c          |   3 +
 drivers/pci/pci.h                |   4 +
 drivers/pci/pcie/cxl_isolation.c | 158 +++++++++++++++++++++++++++++++
 include/cxl/isolation.h          |   8 ++
 5 files changed, 202 insertions(+)

diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 6591e83e719c..b5a306341bb2 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1346,6 +1346,7 @@ static int cxl_dport_enable_timeout_isolation(struct device *host,
 	if (cap & CXL_ISOLATION_CAP_MEM_TIME_SUPP)
 		cxl_enable_timeout(dport);
 
+	pcie_update_cxl_isolation_group(dport->dport_dev);
 	return 0;
 }
 
@@ -1598,6 +1599,34 @@ struct cxl_port *find_cxl_port(struct device *dport_dev,
 	return port;
 }
 
+/**
+ * cxl_find_pcie_rp - Find CXL port that contains a CXL-capable PCIe Root Port
+ * @dport_dev: CXL-capable PCIe Root Port device
+ * @rp: Pointer to store found dport in
+ *
+ * Returns CXL port with elevated reference count if @dport_dev is found
+ */
+struct cxl_port *cxl_find_pcie_rp(struct pci_dev *dport_dev,
+				  struct cxl_dport **rp)
+{
+	struct cxl_dport *dport;
+	struct cxl_port *parent;
+
+	struct cxl_port *hb __free(put_cxl_port) =
+		find_cxl_port(&dport_dev->dev, &dport);
+	if (!hb || !dport)
+		return NULL;
+
+	parent = parent_port_of(hb);
+	if (!parent || !is_cxl_root(parent))
+		return NULL;
+
+	if (rp)
+		*rp = dport;
+
+	return_ptr(hb);
+}
+
 static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port,
 					 struct device *dport_dev,
 					 struct cxl_dport **dport)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 268c69daa4d5..86e8d8d918cf 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1815,6 +1815,9 @@ const struct attribute_group *pci_dev_attr_groups[] = {
 #endif
 #ifdef CONFIG_PCI_DOE
 	&pci_doe_sysfs_group,
+#endif
+#ifdef CONFIG_CXL_ISOLATION
+	&cxl_isolation_attr_group,
 #endif
 	NULL,
 };
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c7fc86d93bea..3510a75c880b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -677,6 +677,10 @@ static inline void pci_rcec_exit(struct pci_dev *dev) { }
 static inline void pcie_link_rcec(struct pci_dev *rcec) { }
 #endif
 
+#ifdef CONFIG_CXL_ISOLATION
+extern const struct attribute_group cxl_isolation_attr_group;
+#endif
+
 #ifdef CONFIG_PCI_ATS
 /* Address Translation Service */
 void pci_ats_init(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/cxl_isolation.c b/drivers/pci/pcie/cxl_isolation.c
index 5a56a327b599..9d2ad14810e8 100644
--- a/drivers/pci/pcie/cxl_isolation.c
+++ b/drivers/pci/pcie/cxl_isolation.c
@@ -77,6 +77,164 @@ pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev)
 	return get_service_data(to_pcie_device(dev));
 }
 
+static ssize_t isolation_ctrl_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t n)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct cxl_port *port;
+	bool enable;
+	int rc;
+
+	rc = kstrtobool(buf, &enable);
+	if (rc)
+		return rc;
+
+	struct cxl_dport **dport __free(kfree) =
+		kzalloc(sizeof(*dport), GFP_KERNEL);
+	if (!dport)
+		return -ENOMEM;
+
+	port = cxl_find_pcie_rp(pdev, dport);
+	if (!port || !(*dport))
+		return -ENODEV;
+
+	if (enable)
+		cxl_enable_isolation(*dport);
+	else
+		rc = cxl_disable_isolation(*dport);
+
+	put_device(&port->dev);
+	return rc ? rc : n;
+}
+
+static ssize_t isolation_ctrl_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct cxl_port *port;
+	u32 ctrl;
+
+	struct cxl_dport **dport __free(kfree) =
+		kzalloc(sizeof(*dport), GFP_KERNEL);
+	if (!dport)
+		return -ENOMEM;
+
+	port = cxl_find_pcie_rp(pdev, dport);
+	if (!port || !(*dport))
+		return -ENODEV;
+
+	if (!(*dport)->regs.isolation)
+		return -ENXIO;
+
+	ctrl = readl((*dport)->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	put_device(&port->dev);
+
+	return sysfs_emit(buf, "%lu\n",
+			  FIELD_GET(CXL_ISOLATION_CTRL_MEM_ISO_ENABLE, ctrl));
+}
+DEVICE_ATTR_RW(isolation_ctrl);
+
+static ssize_t timeout_ctrl_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t n)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct cxl_port *port;
+	bool enable;
+	int rc;
+
+	rc = kstrtobool(buf, &enable);
+	if (rc)
+		return rc;
+
+	struct cxl_dport **dport __free(kfree) =
+		kzalloc(sizeof(*dport), GFP_KERNEL);
+	if (!dport)
+		return -ENOMEM;
+
+	port = cxl_find_pcie_rp(pdev, dport);
+	if (!port || !(*dport))
+		return -ENODEV;
+
+	if (enable)
+		cxl_enable_timeout(*dport);
+	else
+		cxl_disable_timeout(*dport);
+
+	put_device(&port->dev);
+	return n;
+}
+
+static ssize_t timeout_ctrl_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct cxl_port *port;
+	u32 ctrl;
+
+	struct cxl_dport **dport __free(kfree) =
+		kzalloc(sizeof(*dport), GFP_KERNEL);
+	if (!dport)
+		return -ENOMEM;
+
+	port = cxl_find_pcie_rp(pdev, dport);
+	if (!port || !(*dport))
+		return -ENODEV;
+
+	if (!(*dport)->regs.isolation)
+		return -ENXIO;
+
+	ctrl = readl((*dport)->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	put_device(&port->dev);
+
+	return sysfs_emit(buf, "%lu\n",
+			  FIELD_GET(CXL_ISOLATION_CTRL_MEM_TIME_ENABLE, ctrl));
+}
+DEVICE_ATTR_RW(timeout_ctrl);
+
+static struct attribute *isolation_attrs[] = {
+	&dev_attr_timeout_ctrl.attr,
+	&dev_attr_isolation_ctrl.attr,
+	NULL,
+};
+
+static umode_t cxl_isolation_attrs_visible(struct kobject *kobj,
+					   struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pcie_is_cxl(pdev) || pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+		return 0;
+
+	if (pcie_port_find_device(pdev, PCIE_PORT_SERVICE_CXLISO))
+		return a->mode;
+	return 0;
+}
+
+const struct attribute_group cxl_isolation_attr_group = {
+	.name = "cxl_isolation",
+	.attrs = isolation_attrs,
+	.is_visible = cxl_isolation_attrs_visible,
+};
+
+void
+pcie_update_cxl_isolation_group(struct device *dport_dev)
+{
+	struct device *dev;
+
+	if (!dev_is_pci(dport_dev))
+		return;
+
+	dev = pcie_port_find_device(to_pci_dev(dport_dev),
+				    PCIE_PORT_SERVICE_CXLISO);
+	if (!dev)
+		return;
+
+	sysfs_update_group(&dport_dev->kobj, &cxl_isolation_attr_group);
+}
+
 static int cxl_isolation_probe(struct pcie_device *dev)
 {
 	struct cxl_isolation_info *info;
diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
index 73282ac262a6..0b6e4f0160a8 100644
--- a/include/cxl/isolation.h
+++ b/include/cxl/isolation.h
@@ -31,21 +31,29 @@ int cxl_disable_isolation(struct cxl_dport *dport);
 void cxl_enable_timeout(struct cxl_dport *dport);
 void cxl_disable_timeout(struct cxl_dport *dport);
 
+struct cxl_port *cxl_find_pcie_rp(struct pci_dev *pdev,
+				  struct cxl_dport **dport);
 #else /* !CONFIG_CXL_BUS */
 static inline void cxl_enable_isolation(struct cxl_dport *dport) {}
 static inline int cxl_disable_isolation(struct cxl_dport *dport)
 { return -ENXIO; }
 static inline void cxl_enable_timeout(struct cxl_dport *dport) {}
 static inline void cxl_disable_timeout(struct cxl_dport *dport) {}
+
+static inline struct cxl_port *cxl_find_pcie_rp(struct pci_dev *pdev,
+						struct cxl_dport **dport)
+{ return NULL; }
 #endif /* !CONFIG_CXL_BUS */
 
 #ifdef CONFIG_CXL_ISOLATION
 struct cxl_isolation_info *
 pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev);
+void pcie_update_cxl_isolation_group(struct device *dport_dev);
 #else /* !CONFIG_CXL_ISOLATION */
 static inline struct cxl_isolation_info *
 pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev)
 { return NULL; }
+static inline void pcie_update_cxl_isolation_group(struct device *dport_dev) {}
 #endif /* !CONFIG_CXL_ISOLATION */
 
 #endif
-- 
2.34.1



* [PATCH 13/16] cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (11 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 12/16] PCI: PCIe portdrv: Add cxl_isolation sysfs attributes Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-08-04 21:39   ` Bjorn Helgaas
  2025-07-30 21:47 ` [PATCH 14/16] ACPI: Add CXL isolation _OSC fields Ben Cheatham
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add functions to enable programming the CXL.mem transaction timeout
range, if supported. Add a sysfs attribute to the "cxl_isolation" group
to allow programming the timeout from userspace.

The attribute can take either the CXL spec-defined hex value for the
associated timeout range (CXL 3.2 8.2.4.24.2 field 3:0) or a
string with the range. The range string is formatted as the range letter
in uppercase or lowercase, with an optional "2" to specify the second
range in the aforementioned spec ref.

For example, to program the port with a timeout of 65ms to 210ms (range B)
the following strings could be specified: "b2"/"B2". Picking the first
portion of range B (16ms to 55ms) would be: "b"/"B".
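
As a rough illustration of the accepted inputs, the sketch below parses
either form in userspace C. It is not the kernel implementation:
parse_timeout_range() and the table name are hypothetical, the encodings
are copied from the ranges[] table in this patch, and strcasecmp() is
used so that one table entry covers both cases. The input is assumed to
be trimmed of whitespace already, and whether the port supports a given
encoding is still left to the capability check in
cxl_set_timeout_range().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Encodings copied from the ranges[] table in this patch (field 3:0). */
static const struct {
	const char *name;
	uint8_t val;
} timeout_ranges[] = {
	{ "A", 0x1 }, { "A2", 0x2 },
	{ "B", 0x5 }, { "B2", 0x6 },
	{ "C", 0x9 }, { "C2", 0xA },
	{ "D", 0xD }, { "D2", 0xE },
};

/*
 * Hypothetical helper: accept a numeric encoding ("6", "0x6") or a range
 * string ("b2", "B2"). Returns 0 and stores the encoding in *val on
 * success, -1 if the input matches neither form.
 */
static int parse_timeout_range(const char *s, uint8_t *val)
{
	unsigned long num;
	char *end;

	assert(s && val);

	/* Numeric form first; the field is 4 bits wide. */
	num = strtoul(s, &end, 0);
	if (end != s && *end == '\0' && num <= 0xF) {
		*val = (uint8_t)num;
		return 0;
	}

	/* Otherwise look for a case-insensitive range name. */
	for (size_t i = 0; i < ARRAY_LEN(timeout_ranges); i++) {
		if (strcasecmp(s, timeout_ranges[i].name) == 0) {
			*val = timeout_ranges[i].val;
			return 0;
		}
	}
	return -1;
}
```

With this sketch, "b2", "B2" and "0x6" all yield the encoding 0x6, while
an unknown string such as "e" is rejected.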

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/core/pci.c           |  49 +++++++++++++++
 drivers/pci/pcie/cxl_isolation.c | 102 +++++++++++++++++++++++++++++++
 include/cxl/isolation.h          |   3 +
 3 files changed, 154 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index dd6c602d57d3..616c337c818d 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1259,3 +1259,52 @@ void cxl_disable_timeout(struct cxl_dport *dport)
 
 	dev_dbg(dport->dport_dev, "Disabled CXL.mem transaction timeout\n");
 }
+
+static bool timeout_range_supported(u32 cap, u32 val)
+{
+	u32 supported = FIELD_GET(CXL_ISOLATION_CAP_MEM_TIME_MASK, cap);
+
+	if (!supported)
+		return false;
+
+	/* CXL 3.2 8.2.4.24.1 field 3:0 */
+	switch (val) {
+	/* Range A (default) */
+	case 0x0:
+	case 0x1:
+	case 0x2:
+		return (supported & BIT(0));
+	/* Range B */
+	case 0x5:
+	case 0x6:
+		return (supported & BIT(1));
+	/* Range C */
+	case 0x9:
+	case 0xA:
+		return (supported & BIT(2));
+	/* Range D */
+	case 0xD:
+	case 0xE:
+		return (supported & BIT(3));
+	default:
+		return false;
+	}
+}
+
+int cxl_set_timeout_range(struct cxl_dport *dport, u8 val)
+{
+	u32 cap, ctrl;
+
+	cap = readl(dport->regs.isolation + CXL_ISOLATION_CAPABILITY_OFFSET);
+	if (!(cap & CXL_ISOLATION_CAP_MEM_TIME_SUPP))
+		return -ENXIO;
+
+	if (!timeout_range_supported(cap, val))
+		return -EINVAL;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	ctrl &= ~CXL_ISOLATION_CTRL_MEM_TIME_MASK;
+	ctrl |= FIELD_PREP(CXL_ISOLATION_CTRL_MEM_TIME_MASK, val);
+	writel(ctrl, dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	return 0;
+}
diff --git a/drivers/pci/pcie/cxl_isolation.c b/drivers/pci/pcie/cxl_isolation.c
index 9d2ad14810e8..107201b5843f 100644
--- a/drivers/pci/pcie/cxl_isolation.c
+++ b/drivers/pci/pcie/cxl_isolation.c
@@ -193,9 +193,111 @@ static ssize_t timeout_ctrl_show(struct device *dev,
 }
 DEVICE_ATTR_RW(timeout_ctrl);
 
+/* CXL 3.2 8.2.4.24.2 CXL Timeout & Isolation Control Register, field 3:0 */
+static const struct timeout_ranges {
+	const char *str;
+	u8 val;
+} ranges[] = {
+	{ .str = "a", .val = 0x1 },
+	{ .str = "A", .val = 0x1 },
+	{ .str = "a2", .val = 0x2 },
+	{ .str = "A2", .val = 0x2 },
+	{ .str = "b", .val = 0x5 },
+	{ .str = "B", .val = 0x5 },
+	{ .str = "b2", .val = 0x6 },
+	{ .str = "B2", .val = 0x6 },
+	{ .str = "c", .val = 0x9 },
+	{ .str = "C", .val = 0x9 },
+	{ .str = "c2", .val = 0xA },
+	{ .str = "C2", .val = 0xA },
+	{ .str = "d", .val = 0xD },
+	{ .str = "D", .val = 0xD },
+	{ .str = "d2", .val = 0xE },
+	{ .str = "D2", .val = 0xE },
+};
+
+static int timeout_range_str_to_val(const char *str, u8 *val)
+{
+	char val_buf[32] = { 0 };
+	char *start;
+
+	strscpy(val_buf, str, ARRAY_SIZE(val_buf) - 1);
+	start = strim(val_buf);
+
+	for (int i = 0; i < ARRAY_SIZE(ranges); i++) {
+		if (strcmp(start, ranges[i].str) == 0) {
+			*val = ranges[i].val;
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+static ssize_t timeout_range_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t n)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct cxl_port *port;
+	u8 val;
+	int rc;
+
+	rc = kstrtou8(buf, 0, &val);
+	if (rc && timeout_range_str_to_val(buf, &val) < 0)
+		return -EINVAL;
+
+	struct cxl_dport **dport __free(kfree) =
+		kzalloc(sizeof(*dport), GFP_KERNEL);
+	if (!dport)
+		return -ENOMEM;
+
+	port = cxl_find_pcie_rp(pdev, dport);
+	if (!port || !(*dport))
+		return -ENODEV;
+
+	if (!(*dport)->regs.isolation)
+		return -ENXIO;
+
+	rc = cxl_set_timeout_range(*dport, val);
+	put_device(&port->dev);
+	return rc ? rc : n;
+}
+
+static ssize_t timeout_range_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct cxl_port *port;
+	u32 ctrl, val;
+
+	struct cxl_dport **dport __free(kfree) =
+		kzalloc(sizeof(*dport), GFP_KERNEL);
+	if (!dport)
+		return -ENOMEM;
+
+	port = cxl_find_pcie_rp(pdev, dport);
+	if (!port || !(*dport))
+		return -ENODEV;
+
+	if (!(*dport)->regs.isolation)
+		return -ENXIO;
+
+	ctrl = readl((*dport)->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	put_device(&port->dev);
+
+	val = FIELD_GET(CXL_ISOLATION_CTRL_MEM_TIME_MASK, ctrl);
+	for (int i = 0; i < ARRAY_SIZE(ranges); i++)
+		if (ranges[i].val == val)
+			return sysfs_emit(buf, "%s\n", ranges[i].str);
+
+	return -ENXIO;
+}
+DEVICE_ATTR_RW(timeout_range);
+
 static struct attribute *isolation_attrs[] = {
 	&dev_attr_timeout_ctrl.attr,
 	&dev_attr_isolation_ctrl.attr,
+	&dev_attr_timeout_range.attr,
 	NULL,
 };
 
diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
index 0b6e4f0160a8..f2c4feb5a42b 100644
--- a/include/cxl/isolation.h
+++ b/include/cxl/isolation.h
@@ -30,6 +30,7 @@ void cxl_enable_isolation(struct cxl_dport *dport);
 int cxl_disable_isolation(struct cxl_dport *dport);
 void cxl_enable_timeout(struct cxl_dport *dport);
 void cxl_disable_timeout(struct cxl_dport *dport);
+int cxl_set_timeout_range(struct cxl_dport *dport, u8 val);
 
 struct cxl_port *cxl_find_pcie_rp(struct pci_dev *pdev,
 				  struct cxl_dport **dport);
@@ -39,6 +40,8 @@ static inline int cxl_disable_isolation(struct cxl_dport *dport)
 { return -ENXIO; }
 static inline void cxl_enable_timeout(struct cxl_dport *dport) {}
 static inline void cxl_disable_timeout(struct cxl_dport *dport) {}
+static inline int cxl_set_timeout_range(struct cxl_dport *dport, u8 val)
+{ return -ENXIO; }
 
 static inline struct cxl_port *cxl_find_pcie_rp(struct pci_dev *pdev,
 						struct cxl_dport **dport)
-- 
2.34.1



* [PATCH 14/16] ACPI: Add CXL isolation _OSC fields
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (12 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 13/16] cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-08-22 19:19   ` Rafael J. Wysocki
  2025-07-30 21:47 ` [PATCH 15/16] cxl/core, cxl/acpi: Enable CXL isolation based on _OSC handshake Ben Cheatham
  2025-07-30 21:47 ` [PATCH 16/16] cxl/core, cxl/acpi: Add CXL isolation notify handler Ben Cheatham
  15 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Add CXL Timeout and Isolation _OSC support and control fields, as
defined in the ECN at the link below. The ECN contents are expected to
appear in the CXL 4.0 specification. The link is only accessible to CXL
SSWG members, so a brief overview is provided here:

The ECN adds several fields to the CXL _OSC method (CXL 3.2 9.18.2)
that let platform firmware reserve CXL isolation features for its own
use. The control fields relevant to the kernel govern who may toggle the
CXL.mem Isolation Enable bit in the isolation control register (CXL 3.2
8.2.4.24.2) and how the host is notified that isolation has occurred.

These fields will be used by the CXL driver to enable CXL isolation
according to the result of the handshake. Descriptions of these fields
are included in the commit messages of the commits where they are used.
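
To make the handshake result concrete, here is a minimal userspace
sketch of decoding the two new control bits. The bit values are copied
from the include/linux/acpi.h hunk of this patch; the struct and
function names are hypothetical, and the sketch assumes the usual _OSC
convention that firmware clears the control bits it does not grant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Control field bits, copied from the include/linux/acpi.h hunk. */
#define OSC_CXL_MEM_ISOLATION_CONTROL	0x00000002
#define OSC_CXL_ISOLATION_NOTIF_CONTROL	0x00000020

/* Hypothetical summary of what the OS ended up owning. */
struct cxl_iso_osc_result {
	bool os_owns_iso_enable;	/* may toggle CXL.mem Isolation Enable */
	bool os_owns_notification;	/* OS controls isolation notification */
};

static struct cxl_iso_osc_result
decode_cxl_iso_control(uint32_t requested, uint32_t returned)
{
	/* Only bits that were requested and left set count as granted. */
	uint32_t granted = requested & returned;
	struct cxl_iso_osc_result res = {
		.os_owns_iso_enable = granted & OSC_CXL_MEM_ISOLATION_CONTROL,
		.os_owns_notification = granted & OSC_CXL_ISOLATION_NOTIF_CONTROL,
	};

	return res;
}
```

For example, if the OS requests both bits and firmware returns only the
CXL.mem isolation bit, the OS may toggle the enable bit but firmware
keeps control of how isolation events are reported.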

Link: https://members.computeexpresslink.org/wg/software_systems/document/3118
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/acpi/pci_root.c | 9 +++++++++
 include/linux/acpi.h    | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 74ade4160314..33a922e160fc 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -145,10 +145,13 @@ static struct pci_osc_bit_struct cxl_osc_support_bit[] = {
 	{ OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT, "CXL20PortDevRegAccess" },
 	{ OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT, "CXLProtocolErrorReporting" },
 	{ OSC_CXL_NATIVE_HP_SUPPORT, "CXLNativeHotPlug" },
+	{ OSC_CXL_TIMEOUT_ISOLATION_SUPPORT, "CXLTimeoutIsolation" },
 };
 
 static struct pci_osc_bit_struct cxl_osc_control_bit[] = {
 	{ OSC_CXL_ERROR_REPORTING_CONTROL, "CXLMemErrorReporting" },
+	{ OSC_CXL_MEM_ISOLATION_CONTROL, "CXLMemIsolation" },
+	{ OSC_CXL_ISOLATION_NOTIF_CONTROL, "CXLIsolationNotifications" },
 };
 
 static void decode_osc_bits(struct acpi_pci_root *root, char *msg, u32 word,
@@ -493,6 +496,8 @@ static u32 calculate_cxl_support(void)
 		support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
 	if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
 		support |= OSC_CXL_NATIVE_HP_SUPPORT;
+	if (IS_ENABLED(CONFIG_CXL_ISOLATION))
+		support |= OSC_CXL_TIMEOUT_ISOLATION_SUPPORT;
 
 	return support;
 }
@@ -535,6 +540,10 @@ static u32 calculate_cxl_control(void)
 	if (IS_ENABLED(CONFIG_MEMORY_FAILURE))
 		control |= OSC_CXL_ERROR_REPORTING_CONTROL;
 
+	if (IS_ENABLED(CONFIG_CXL_ISOLATION))
+		control |= (OSC_CXL_MEM_ISOLATION_CONTROL |
+			    OSC_CXL_ISOLATION_NOTIF_CONTROL);
+
 	return control;
 }
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index f102c0fe3431..f172182aa029 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -626,9 +626,12 @@ extern u32 osc_sb_native_usb4_control;
 #define OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT	0x00000002
 #define OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT	0x00000004
 #define OSC_CXL_NATIVE_HP_SUPPORT		0x00000008
+#define OSC_CXL_TIMEOUT_ISOLATION_SUPPORT	0x00000010
 
 /* CXL _OSC: Capabilities DWORD 5: Control Field */
 #define OSC_CXL_ERROR_REPORTING_CONTROL		0x00000001
+#define OSC_CXL_MEM_ISOLATION_CONTROL		0x00000002
+#define OSC_CXL_ISOLATION_NOTIF_CONTROL		0x00000020
 
 static inline u32 acpi_osc_ctx_get_pci_control(struct acpi_osc_context *context)
 {
-- 
2.34.1



* [PATCH 15/16] cxl/core, cxl/acpi: Enable CXL isolation based on _OSC handshake
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (13 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 14/16] ACPI: Add CXL isolation _OSC fields Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  2025-07-30 21:47 ` [PATCH 16/16] cxl/core, cxl/acpi: Add CXL isolation notify handler Ben Cheatham
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Enable CXL isolation based on the result of the _OSC handshake, if
applicable. In the absence of an _OSC interpretation callback (i.e. the
platform is non-ACPI), assume that we have control of all features.

A link to the ECN (expected in the CXL 4.0 spec) that introduces the
relevant parts of the _OSC method (CXL 3.2 9.18.2) can be found below.
This link is only accessible to CXL SSWG members, so here's a brief
overview:

The ECN introduces an _OSC field for controlling whether the OSPM can write
to the CXL.mem Isolation Enable bit in the CXL isolation control
register (CXL 3.2 8.2.4.24.2). If the firmware reserves control, the
OSPM is expected to not modify the isolation enable bit when writing the
register.
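
The "do not modify the enable bit" rule amounts to a masked
read-modify-write. The sketch below shows one way to express it in plain
C; it is an illustration, not the code in this series. The helper name
is hypothetical and the bit position is an assumption made for the
example (the series uses CXL_ISOLATION_CTRL_MEM_ISO_ENABLE from
drivers/cxl/cxl.h).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define ISO_CTRL_MEM_ISO_ENABLE	(UINT32_C(1) << 16)

/*
 * Hypothetical helper: compute the value to write to the isolation
 * control register. 'cur' is the value just read back, 'want' is the
 * value the OS would like to program.
 */
static uint32_t iso_ctrl_merge(uint32_t cur, uint32_t want,
			       bool os_owns_enable)
{
	if (os_owns_enable)
		return want;

	/* Firmware owns the bit: carry its current state through. */
	return (want & ~ISO_CTRL_MEM_ISO_ENABLE) |
	       (cur & ISO_CTRL_MEM_ISO_ENABLE);
}
```

When firmware has reserved control, the enable bit in the written value
always matches what was read back, regardless of what the OS asked for.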

Link: https://members.computeexpresslink.org/wg/software_systems/document/3118
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/acpi.c      | 24 ++++++++++++++++++++++++
 drivers/cxl/core/port.c | 17 +++++++++++++++--
 drivers/cxl/cxl.h       |  5 +++++
 include/cxl/isolation.h |  4 ++++
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index badaa99ab33a..b964f02fb56b 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -8,6 +8,7 @@
 #include <linux/pci.h>
 #include <linux/node.h>
 #include <asm/div64.h>
+#include <cxl/isolation.h>
 #include "cxlpci.h"
 #include "cxl.h"
 
@@ -367,9 +368,32 @@ static int cxl_acpi_setup_hostbridge_uport(struct cxl_root *cxl_root,
 	return 0;
 }
 
+static void decode_isolation_osc(struct cxl_port *hb, u32 iso_cap)
+{
+	bool err_corr = FIELD_GET(CXL_ISOLATION_CAP_ERR_COR_SUPP, iso_cap);
+	struct acpi_device *adev = ACPI_COMPANION(hb->uport_dev);
+	struct acpi_pci_root *pci_root;
+	u32 osc_ctrl;
+
+	if (!adev)
+		return;
+
+	pci_root = acpi_pci_find_root(adev->handle);
+	if (!pci_root)
+		return;
+
+	osc_ctrl = pci_root->osc_ext_control_set;
+	if (osc_ctrl & OSC_CXL_MEM_ISOLATION_CONTROL)
+		hb->isolation_caps |= CXL_ISOLATION_MEM_ENABLE;
+
+	if (!err_corr || (osc_ctrl & OSC_CXL_ISOLATION_NOTIF_CONTROL))
+		hb->isolation_caps |= CXL_ISOLATION_INTERRUPTS;
+}
+
 static const struct cxl_root_ops acpi_root_ops = {
 	.qos_class = cxl_acpi_qos_class,
 	.setup_hostbridge_uport = cxl_acpi_setup_hostbridge_uport,
+	.get_isolation_caps = decode_isolation_osc,
 };
 
 static void del_cxl_resource(struct resource *res)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index b5a306341bb2..e9eb7a8a5f72 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1322,7 +1322,8 @@ static int cxl_dport_setup_interrupts(struct device *host,
 static int cxl_dport_enable_timeout_isolation(struct device *host,
 					      struct cxl_dport *dport)
 {
-	u32 cap;
+	struct cxl_port *port = dport->port;
+	u32 cap, ctrl;
 	int rc;
 
 	if (!dport->reg_map.component_map.isolation.valid)
@@ -1337,11 +1338,23 @@ static int cxl_dport_enable_timeout_isolation(struct device *host,
 	if (!(cap & CXL_ISOLATION_CAP_MEM_ISO_SUPP))
 		return 0;
 
+	struct cxl_root *root __free(put_cxl_root) = find_cxl_root(dport->port);
+	if (root && root->ops && root->ops->get_isolation_caps)
+		root->ops->get_isolation_caps(port, cap);
+	else
+		port->isolation_caps = ~0;
+
+	ctrl = readl(dport->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
+	if (!(port->isolation_caps & CXL_ISOLATION_MEM_ENABLE) &&
+	    !(ctrl & CXL_ISOLATION_CTRL_MEM_ISO_ENABLE))
+		return 0;
+
 	rc = cxl_dport_setup_interrupts(host, dport);
 	if (rc)
 		return rc == -ENXIO ? 0 : rc;
 
-	cxl_enable_isolation(dport);
+	if (port->isolation_caps & CXL_ISOLATION_MEM_ENABLE)
+		cxl_enable_isolation(dport);
 
 	if (!(cap & CXL_ISOLATION_CAP_MEM_TIME_SUPP))
 		cxl_enable_timeout(dport);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 7f9c6bd6e010..aa36eba79181 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -139,6 +139,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define   CXL_ISOLATION_CAP_MEM_TIME_MASK GENMASK(3, 0)
 #define   CXL_ISOLATION_CAP_MEM_TIME_SUPP BIT(4)
 #define   CXL_ISOLATION_CAP_MEM_ISO_SUPP BIT(16)
+#define   CXL_ISOLATION_CAP_ERR_COR_SUPP BIT(25)
 #define   CXL_ISOLATION_CAP_INTR_SUPP BIT(26)
 #define   CXL_ISOLATION_CAP_INTR_MASK GENMASK(31, 27)
 #define CXL_ISOLATION_CTRL_OFFSET 0x8
@@ -629,6 +630,7 @@ struct cxl_dax_region {
  * @cdat: Cached CDAT data
  * @cdat_available: Should a CDAT attribute be available in sysfs
  * @pci_latency: Upstream latency in picoseconds
+ * @isolation_caps: Isolation capabilities given by platform firmware
  */
 struct cxl_port {
 	struct device dev;
@@ -654,6 +656,7 @@ struct cxl_port {
 	} cdat;
 	bool cdat_available;
 	long pci_latency;
+	u32 isolation_caps;
 };
 
 /**
@@ -679,6 +682,8 @@ struct cxl_root_ops {
 			 int *qos_class);
 	int (*setup_hostbridge_uport)(struct cxl_root *cxl_root,
 				      struct device *bridge_dev);
+	void (*get_isolation_caps)(struct cxl_port *hb,
+				   u32 iso_cap);
 };
 
 static inline struct cxl_dport *
diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
index f2c4feb5a42b..54b57c42e46e 100644
--- a/include/cxl/isolation.h
+++ b/include/cxl/isolation.h
@@ -4,6 +4,10 @@
 
 #include <linux/pci.h>
 
+/* CXL Isolation capabilities we have control over */
+#define CXL_ISOLATION_MEM_ENABLE BIT(0)
+#define CXL_ISOLATION_INTERRUPTS BIT(1)
+
 /**
  * enum cxl_err_results - Possible results of an CXL isolation handler
  * @CXL_ERR_NONE: Device can recover without CXL core intervention
-- 
2.34.1



* [PATCH 16/16] cxl/core, cxl/acpi: Add CXL isolation notify handler
  2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
                   ` (14 preceding siblings ...)
  2025-07-30 21:47 ` [PATCH 15/16] cxl/core, cxl/acpi: Enable CXL isolation based on _OSC handshake Ben Cheatham
@ 2025-07-30 21:47 ` Ben Cheatham
  15 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2025-07-30 21:47 UTC (permalink / raw)
  To: linux-cxl, linux-pci, linux-acpi; +Cc: Ben Cheatham

Install an ACPI 0x80 notify handler for CXL isolation, based on the
result of the _OSC negotiation. The handler is installed once per CXL
host bridge (HID of "ACPI0016") as part of isolation setup.

A link to the ECN (expected in the CXL 4.0 spec) that introduces the
relevant parts of the _OSC method (CXL 3.2 9.18.2) can be found below.
This link is only accessible to CXL SSWG members, so here's a brief
overview:

The ECN introduces a field in the _OSC method to control how the OSPM is
notified that isolation has occurred. If the ERR_COR Signaling Supported
bit in the isolation capability register (CXL 3.2 8.2.4.24.1) is NOT
set, this portion of the _OSC can be ignored.

If the OSPM is given control of isolation notification, the mechanism to
be used when isolation occurs is an MSI/-X interrupt (pre-ECN behavior).
If the platform firmware reserves control, the OSPM will be notified
through an ACPI 0x80 notify on the CXL host bridge ACPI device (ACPI
HID: "ACPI0016").
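
The selection rule described above can be summarized in a few lines of
standalone C. This is a sketch under the assumptions stated in this
commit message; the enum and function names are hypothetical, and the
control bit value is the one added to include/linux/acpi.h earlier in
the series.

```c
#include <stdbool.h>
#include <stdint.h>

#define OSC_CXL_ISOLATION_NOTIF_CONTROL	0x00000020

/* Hypothetical names for the two notification paths. */
enum iso_notify_path {
	ISO_NOTIFY_MSI,		/* MSI/MSI-X interrupt on the root port */
	ISO_NOTIFY_ACPI,	/* ACPI 0x80 notify on the host bridge */
};

static enum iso_notify_path
pick_iso_notify_path(bool err_cor_supported, uint32_t osc_granted)
{
	/* Without ERR_COR signaling the _OSC field does not apply. */
	if (!err_cor_supported)
		return ISO_NOTIFY_MSI;

	/* OS was granted control: keep the pre-ECN interrupt path. */
	if (osc_granted & OSC_CXL_ISOLATION_NOTIF_CONTROL)
		return ISO_NOTIFY_MSI;

	/* Firmware kept control and forwards events as a notify. */
	return ISO_NOTIFY_ACPI;
}
```

Only the last case requires the notify handler that this patch installs
on the host bridge.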

Link: https://members.computeexpresslink.org/wg/software_systems/document/3118
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/cxl/acpi.c      | 51 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/port.c | 41 +++++++++++++++++++++++++--------
 drivers/cxl/cxl.h       |  3 +++
 3 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index b964f02fb56b..145a03f15255 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -390,10 +390,61 @@ static void decode_isolation_osc(struct cxl_port *hb, u32 iso_cap)
 		hb->isolation_caps |= CXL_ISOLATION_INTERRUPTS;
 }
 
+static void isolation_notify_handler(acpi_handle handle, u32 event, void *data)
+{
+	struct cxl_port *hb = data;
+	struct cxl_dport *dport;
+	unsigned long index;
+
+	guard(device)(&hb->dev);
+	xa_for_each(&hb->dports, index, dport) {
+		if (dport->regs.isolation)
+			cxl_isolation_interrupt_handler(dport);
+	}
+}
+
+static void cxl_remove_iso_handler(void *data)
+{
+	acpi_handle handle = data;
+
+	acpi_remove_notify_handler(handle, ACPI_DEVICE_NOTIFY,
+				   isolation_notify_handler);
+}
+
+static int cxl_register_iso_handler(struct cxl_port *hb)
+{
+	struct acpi_device *adev = ACPI_COMPANION(hb->uport_dev);
+	acpi_handle handle;
+	int rc;
+
+	if (!adev)
+		return -EINVAL;
+
+	handle = acpi_device_handle(adev);
+
+	guard(device)(&hb->dev);
+	if (devm_is_action_added(&hb->dev, cxl_remove_iso_handler, handle))
+		return 0;
+
+	rc = acpi_install_notify_handler(handle, ACPI_DEVICE_NOTIFY,
+					 isolation_notify_handler, hb);
+	if (rc != AE_OK)
+		return -ENXIO;
+
+	rc = devm_add_action_or_reset(&hb->dev, cxl_remove_iso_handler,
+				      handle);
+	if (rc)
+		return rc;
+
+	dev_dbg(&hb->dev, "Installed CXL isolation notify handler\n");
+	return 0;
+}
+
 static const struct cxl_root_ops acpi_root_ops = {
 	.qos_class = cxl_acpi_qos_class,
 	.setup_hostbridge_uport = cxl_acpi_setup_hostbridge_uport,
 	.get_isolation_caps = decode_isolation_osc,
+	.register_hb_isolation_handler = cxl_register_iso_handler,
 };
 
 static void del_cxl_resource(struct resource *res)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e9eb7a8a5f72..ece667f3aaf5 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1207,11 +1207,9 @@ struct isolation_intr_data {
 	struct cxl_port *port;
 };
 
-static irqreturn_t cxl_isolation_thread(int irq, void *_data)
+int cxl_isolation_interrupt_handler(struct cxl_dport *dport)
 {
-	struct isolation_intr_data *data = _data;
-	struct cxl_dport *dport = data->dport;
-	struct cxl_port *port = data->port;
+	struct cxl_port *port = dport->port;
 	enum cxl_err_results res, acc;
 	struct cxl_dev_state *cxlds;
 	struct cxl_memdev *cxlmd;
@@ -1221,10 +1219,6 @@ static irqreturn_t cxl_isolation_thread(int irq, void *_data)
 	bool lnk_down;
 	u32 status;
 
-	if (!dport || !port)
-		return IRQ_NONE;
-
-	guard(device)(&port->dev);
 	if (!dport->regs.isolation)
 		goto panic;
 
@@ -1255,8 +1249,9 @@ static irqreturn_t cxl_isolation_thread(int irq, void *_data)
 panic:
 	panic("%s: downstream devices could not recover from CXL.mem link down\n",
 	      dev_name(dport->dport_dev));
-	return IRQ_NONE;
+	return -ENXIO;
 }
+EXPORT_SYMBOL_NS_GPL(cxl_isolation_interrupt_handler, "CXL");
 
 static void cxl_dport_free_interrupts(void *data)
 {
@@ -1274,6 +1269,24 @@ static void cxl_dport_free_interrupts(void *data)
 	devm_free_irq(info->dev, info->irq, dport);
 }
 
+static irqreturn_t cxl_isolation_thread(int irq, void *_data)
+{
+	struct isolation_intr_data *data = _data;
+	struct cxl_dport *dport = data->dport;
+	struct cxl_port *port = data->port;
+	int rc;
+
+	if (!dport || !port)
+		return IRQ_NONE;
+
+	guard(device)(&port->dev);
+	rc = cxl_isolation_interrupt_handler(dport);
+	if (rc)
+		return IRQ_NONE;
+
+	return IRQ_HANDLED;
+}
+
 static int cxl_dport_setup_interrupts(struct device *host,
 				      struct cxl_dport *dport)
 {
@@ -1282,6 +1295,16 @@ static int cxl_dport_setup_interrupts(struct device *host,
 	u32 cap;
 	int rc;
 
+	if (!(dport->port->isolation_caps & CXL_ISOLATION_INTERRUPTS)) {
+		struct cxl_root *root __free(put_cxl_root) =
+			find_cxl_root(dport->port);
+		if (!root || !root->ops ||
+		    !root->ops->register_hb_isolation_handler)
+			return -ENXIO;
+
+		return root->ops->register_hb_isolation_handler(dport->port);
+	}
+
 	cap = readl(dport->regs.isolation + CXL_ISOLATION_CAPABILITY_OFFSET);
 	if (!(cap & CXL_ISOLATION_CAP_INTR_SUPP))
 		return -ENXIO;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index aa36eba79181..68074c0d78d1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -684,8 +684,11 @@ struct cxl_root_ops {
 				      struct device *bridge_dev);
 	void (*get_isolation_caps)(struct cxl_port *hb,
 				   u32 iso_cap);
+	int (*register_hb_isolation_handler)(struct cxl_port *hb);
 };
 
+int cxl_isolation_interrupt_handler(struct cxl_dport *dport);
+
 static inline struct cxl_dport *
 cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev)
 {
-- 
2.34.1



* Re: [PATCH 05/16] PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ
  2025-07-30 21:47 ` [PATCH 05/16] PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ Ben Cheatham
@ 2025-07-31  5:59   ` Lukas Wunner
  2025-07-31 13:13     ` Cheatham, Benjamin
  0 siblings, 1 reply; 24+ messages in thread
From: Lukas Wunner @ 2025-07-31  5:59 UTC (permalink / raw)
  To: Ben Cheatham; +Cc: linux-cxl, linux-pci, linux-acpi

On Wed, Jul 30, 2025 at 04:47:07PM -0500, Ben Cheatham wrote:
> Add a function to the CXL isolation service driver that allows the CXL
> core to get the necessary information for setting up an interrupt
> handler.
[...]
>  static int cxl_isolation_probe(struct pcie_device *dev)
>  {
> -	if (!pcie_is_cxl(dev->port) || pcie_cxliso_get_intr_vec(dev->port, NULL))
> +	struct cxl_isolation_info *info;
> +	if (!pcie_is_cxl(dev->port) ||
> +	    pcie_cxliso_get_intr_vec(dev->port, NULL))
>  		return -ENXIO;

The re-wrapping of the if-condition shouldn't be here, it should be
wrapped the way you want it in the patch *introducing* the if-condition.

> +	info = devm_kzalloc(&dev->device, sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		return -ENOMEM;
> +
> +	*info = (struct cxl_isolation_info) {
> +		.dev = &dev->device,
> +		.irq = dev->irq,
> +	};
> +
> +	set_service_data(dev, info);

No, the irq is already saved in struct pcie_device, there's no need
to duplicate that.

> +struct cxl_isolation_info *
> +pcie_cxl_dport_get_isolation_info(struct pci_dev *dport_dev)
> +{
> +	struct device *dev;
>  
> +	dev = pcie_port_find_device(dport_dev, PCIE_PORT_SERVICE_CXLISO);
> +	if (!dev)
> +		return NULL;
> +
> +	return get_service_data(to_pcie_device(dev));
>  }

Just retrieve the irq from to_pcie_device(dev) and either return it
directly or through a call-by-reference parameter.

Thanks,

Lukas


* Re: [PATCH 05/16] PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ
  2025-07-31  5:59   ` Lukas Wunner
@ 2025-07-31 13:13     ` Cheatham, Benjamin
  0 siblings, 0 replies; 24+ messages in thread
From: Cheatham, Benjamin @ 2025-07-31 13:13 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-cxl, linux-pci, linux-acpi

Thanks for taking a look Lukas! Responses inline.

On 7/31/2025 12:59 AM, Lukas Wunner wrote:
> On Wed, Jul 30, 2025 at 04:47:07PM -0500, Ben Cheatham wrote:
>> Add a function to the CXL isolation service driver that allows the CXL
>> core to get the necessary information for setting up an interrupt
>> handler.
> [...]
>>  static int cxl_isolation_probe(struct pcie_device *dev)
>>  {
>> -	if (!pcie_is_cxl(dev->port) || pcie_cxliso_get_intr_vec(dev->port, NULL))
>> +	struct cxl_isolation_info *info;
>> +	if (!pcie_is_cxl(dev->port) ||
>> +	    pcie_cxliso_get_intr_vec(dev->port, NULL))
>>  		return -ENXIO;
> 
> The re-wrapping of the if-condition shouldn't be here, it should be
> wrapped the way you want it in the patch *introducing* the if-condition.
> 

You're right, don't know why I did that :/.

>> +	info = devm_kzalloc(&dev->device, sizeof(*info), GFP_KERNEL);
>> +	if (!info)
>> +		return -ENOMEM;
>> +
>> +	*info = (struct cxl_isolation_info) {
>> +		.dev = &dev->device,
>> +		.irq = dev->irq,
>> +	};
>> +
>> +	set_service_data(dev, info);
> 
> No, the irq is already saved in struct pcie_device, there's no need
> to duplicate that.
> 

Good point, will do. I think I originally had the mapping of the CXL isolation
register in the struct cxl_isolation_info, but it makes no sense to keep this
around anymore.

Thanks,
Ben


* Re: [PATCH 13/16] cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming
  2025-07-30 21:47 ` [PATCH 13/16] cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming Ben Cheatham
@ 2025-08-04 21:39   ` Bjorn Helgaas
  2025-08-06 17:58     ` Cheatham, Benjamin
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2025-08-04 21:39 UTC (permalink / raw)
  To: Ben Cheatham; +Cc: linux-cxl, linux-pci, linux-acpi

Drop a few of the subject line prefixes and mention something about
sysfs.  This has nothing to do with portdrv.  Following sibling "port
service drivers," I guess the prefix would be something like
"PCI/CXL:" or "PCI/CXL_ISO:"

On Wed, Jul 30, 2025 at 04:47:15PM -0500, Ben Cheatham wrote:
> Add functions to enable programming the CXL.mem transaction timeout
> range, if supported. Add a sysfs attribute to the "cxl_isolation" group
> to allow programming the timeout from userspace.

Include a sample path with at least the last 2-3 components.  Maybe
even an example, e.g.,

  # echo B2 > /sys/.../cxl_isolation/timeout_range

Probably also some doc in Documentation/ABI/testing/?

> The attribute can take either the CXL spec-defined hex value for the
> associated timeout range (CXL 3.2 8.2.4.24.2 field 3:0) or a
> string with the range. The range string is formatted as the range letter
> in uppercase or lowercase, with an optional "2" to specify the second
> range in the aforementioned spec ref.
> 
> For example, to program the port with a timeout of 65ms to 210ms (range B)
> the following strings could be specified: "b2"/"B2". Picking the first
> portion of range B (16ms to 55ms) would be: "b"/"B".

What's the value of accepting either upper- or lower-case?  It doubles
the size of ranges[], and I think timeout_range_show() always shows
the lower-case one.  The spec uses upper-case.

> +/* CXL 3.2 8.2.4.24.2 CXL Timeout & Isolation Control Register, field 3:0 */
> +const struct timeout_ranges {
> +	char *str;
> +	u8 val;
> +} ranges[] = {
> +	{ .str = "a", .val = 0x1 },
> +	{ .str = "A", .val = 0x1 },
> +	{ .str = "a2", .val = 0x2 },
> +	{ .str = "A2", .val = 0x2 },
> +	{ .str = "b", .val = 0x5 },
> +	{ .str = "B", .val = 0x5 },
> +	{ .str = "b2", .val = 0x6 },
> +	{ .str = "B2", .val = 0x6 },
> +	{ .str = "c", .val = 0x9 },
> +	{ .str = "C", .val = 0x9 },
> +	{ .str = "c2", .val = 0xA },
> +	{ .str = "C2", .val = 0xA },
> +	{ .str = "d", .val = 0xD },
> +	{ .str = "D", .val = 0xD },
> +	{ .str = "d2", .val = 0xE },
> +	{ .str = "D2", .val = 0xE },
> +};
> +
> +static int timeout_range_str_to_val(const char *str, u8 *val)
> +{
> +	char val_buf[32] = { 0 };
> +	char *start;
> +
> +	strscpy(val_buf, str, ARRAY_SIZE(val_buf) - 1);
> +	start = strim(val_buf);
> +	if (!start)
> +		return -EINVAL;
> +
> +	for (int i = 0; i < ARRAY_SIZE(ranges); i++)
> +		if (strcmp(start, ranges[i].str) == 0)
> +			return ranges[i].val;
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t timeout_range_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t n)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct cxl_port *port;
> +	u8 val;
> +	int rc;
> +
> +	rc = kstrtou8(buf, 0, &val);
> +	if (rc && timeout_range_str_to_val(buf, &val) < 0)
> +		return -EINVAL;
> +
> +	struct cxl_dport **dport __free(kfree) =
> +		kzalloc(sizeof(*dport), GFP_KERNEL);
> +	if (!dport)
> +		return -ENOMEM;
> +
> +	port = cxl_find_pcie_rp(pdev, dport);
> +	if (!port || !(*dport))
> +		return -ENODEV;
> +
> +	if (!(*dport)->regs.isolation)
> +		return -ENXIO;
> +
> +	rc = cxl_set_timeout_range(*dport, val);
> +	put_device(&port->dev);
> +	return rc ? rc : n;
> +}
> +
> +static ssize_t timeout_range_show(struct device *dev,
> +				  struct device_attribute *attr, char * buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct cxl_port *port;
> +	u32 ctrl, val;
> +
> +	struct cxl_dport **dport __free(kfree) =
> +		kzalloc(sizeof(*dport), GFP_KERNEL);
> +	if (!dport)
> +		return -ENOMEM;
> +
> +	port = cxl_find_pcie_rp(pdev, dport);
> +	if (!port || !(*dport))
> +		return -ENODEV;
> +
> +	if (!(*dport)->regs.isolation)
> +		return -ENXIO;
> +
> +	ctrl = readl((*dport)->regs.isolation + CXL_ISOLATION_CTRL_OFFSET);
> +	put_device(&port->dev);
> +
> +	val = FIELD_GET(CXL_ISOLATION_CTRL_MEM_TIME_MASK, ctrl);
> +	for (int i = 0; i < ARRAY_SIZE(ranges); i++)
> +		if (ranges[i].val == val)
> +			return sysfs_emit(buf, "%s\n", ranges[i].str);
> +
> +	return -ENXIO;
> +}
> +DEVICE_ATTR_RW(timeout_range);
> +
>  static struct attribute *isolation_attrs[] = {
>  	&dev_attr_timeout_ctrl.attr,
>  	&dev_attr_isolation_ctrl.attr,
> +	&dev_attr_timeout_range.attr,
>  	NULL,
>  };
>  
> diff --git a/include/cxl/isolation.h b/include/cxl/isolation.h
> index 0b6e4f0160a8..f2c4feb5a42b 100644
> --- a/include/cxl/isolation.h
> +++ b/include/cxl/isolation.h
> @@ -30,6 +30,7 @@ void cxl_enable_isolation(struct cxl_dport *dport);
>  int cxl_disable_isolation(struct cxl_dport *dport);
>  void cxl_enable_timeout(struct cxl_dport *dport);
>  void cxl_disable_timeout(struct cxl_dport *dport);
> +int cxl_set_timeout_range(struct cxl_dport *dport, u8 val);
>  
>  struct cxl_port *cxl_find_pcie_rp(struct pci_dev *pdev,
>  				  struct cxl_dport **dport);
> @@ -39,6 +40,8 @@ static inline int cxl_disable_isolation(struct cxl_dport *dport)
>  { return -ENXIO; }
>  static inline void cxl_enable_timeout(struct cxl_dport *dport) {}
>  static inline void cxl_disable_timeout(struct cxl_dport *dport) {}
> +static inline int cxl_set_timeout_range(struct cxl_dport *dport, u8 val)
> +{ return -ENXIO; }
>  
>  static inline struct cxl_port *cxl_find_pcie_rp(struct pci_dev *pdev,
>  						struct cxl_dport **dport);
> -- 
> 2.34.1
> 


* Re: [PATCH 04/16] PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector
  2025-07-30 21:47 ` [PATCH 04/16] PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector Ben Cheatham
@ 2025-08-04 21:39   ` Bjorn Helgaas
  2025-08-06 17:58     ` Cheatham, Benjamin
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2025-08-04 21:39 UTC (permalink / raw)
  To: Ben Cheatham; +Cc: linux-cxl, linux-pci, linux-acpi

Observe the subject line convention for drivers/pci/pcie/portdrv.c.

On Wed, Jul 30, 2025 at 04:47:06PM -0500, Ben Cheatham wrote:
> Update the PCIe portdrv MSI/-X vector allocation code to include the CXL
> isolation service.

Use "MSI/MSI-X", not "MSI/-X" to follow PCIe spec formatting and make
this more greppable.  In subject also.

Bjorn


* Re: [PATCH 04/16] PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector
  2025-08-04 21:39   ` Bjorn Helgaas
@ 2025-08-06 17:58     ` Cheatham, Benjamin
  0 siblings, 0 replies; 24+ messages in thread
From: Cheatham, Benjamin @ 2025-08-06 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-cxl, linux-pci, linux-acpi

Hi Bjorn, thanks for taking a look!

On 8/4/2025 4:39 PM, Bjorn Helgaas wrote:
> Observe the subject line convention for drivers/pci/pcie/portdrv.c.
> 
> On Wed, Jul 30, 2025 at 04:47:06PM -0500, Ben Cheatham wrote:
>> Update the PCIe portdrv MSI/-X vector allocation code to include the CXL
>> isolation service.
> 
> Use "MSI/MSI-X", not "MSI/-X" to follow PCIe spec formatting and make
> this more greppable.  In subject also.
> 

For sure, I'll update it throughout the series.

> Bjorn



* Re: [PATCH 13/16] cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming
  2025-08-04 21:39   ` Bjorn Helgaas
@ 2025-08-06 17:58     ` Cheatham, Benjamin
  0 siblings, 0 replies; 24+ messages in thread
From: Cheatham, Benjamin @ 2025-08-06 17:58 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-cxl, linux-pci, linux-acpi

On 8/4/2025 4:39 PM, Bjorn Helgaas wrote:
> Drop a few of the subject line prefixes and mention something about
> sysfs.  This has nothing to do with portdrv.  Following sibling "port
> service drivers," I guess the prefix would be something like
> "PCI/CXL:" or "PCI/CXL_ISO:"

Will do.

> 
> On Wed, Jul 30, 2025 at 04:47:15PM -0500, Ben Cheatham wrote:
>> Add functions to enable programming the CXL.mem transaction timeout
>> range, if supported. Add a sysfs attribute to the "cxl_isolation" group
>> to allow programming the timeout from userspace.
> 
> Include a sample path with at least the last 2-3 components.  Maybe
> even an example, e.g.,
> 
>   # echo B2 > /sys/.../cxl_isolation/timeout_range
> 
> Probably also some doc in Documentation/ABI/testing/?

I'll add it to the commit message and make an entry under Documentation.

> 
>> The attribute can take either the CXL spec-defined hex value for the
>> associated timeout range (CXL 3.2 8.2.4.24.2 field 3:0) or a
>> string with the range. The range string is formatted as the range letter
>> in uppercase or lowercase, with an optional "2" to specify the second
>> range in the aforementioned spec ref.
>>
>> For example, to program the port with a timeout of 65ms to 210ms (range B)
>> the following strings could be specified: "b2"/"B2". Picking the first
>> portion of range B (16ms to 55ms) would be: "b"/"B".
> 
> What's the value of accepting either upper- or lower-case?  It doubles
> the size of ranges[], and I think timeout_range_show() always shows
> the lower-case one.  The spec uses upper-case.
> 

Just ease of use. I'll restrict it to uppercase to match the spec (unless anyone has
a strong opinion otherwise).
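
For illustration only, an uppercase-only lookup might look like the untested
userspace sketch below. The encodings are copied from ranges[] in the quoted
patch; the trimming logic, the reverse-lookup helper, and the use of stdint
types instead of kernel types are assumptions made so the example stands on
its own. The helper also stores the result through "val" and returns 0 on
success, which, as far as the quoted hunk shows, the posted version does not
do (it returns the encoding and leaves *val untouched).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Encodings copied from ranges[] in the quoted patch (uppercase only). */
static const struct timeout_range {
	const char *str;
	uint8_t val;
} ranges[] = {
	{ "A", 0x1 }, { "A2", 0x2 },
	{ "B", 0x5 }, { "B2", 0x6 },
	{ "C", 0x9 }, { "C2", 0xA },
	{ "D", 0xD }, { "D2", 0xE },
};

/*
 * Hypothetical userspace stand-in for timeout_range_str_to_val().
 * Returns 0 and stores the encoding in *val, or -EINVAL if the string
 * is not a known range name.
 */
static int timeout_range_str_to_val(const char *str, uint8_t *val)
{
	char buf[8];
	size_t len = strlen(str);

	/* "echo B2 > ..." appends a newline; drop trailing whitespace. */
	while (len && (str[len - 1] == '\n' || str[len - 1] == ' '))
		len--;
	if (len == 0 || len >= sizeof(buf))
		return -EINVAL;

	memcpy(buf, str, len);
	buf[len] = '\0';

	for (size_t i = 0; i < ARRAY_SIZE(ranges); i++) {
		if (strcmp(buf, ranges[i].str) == 0) {
			*val = ranges[i].val;
			return 0;
		}
	}

	return -EINVAL;
}

/* Reverse lookup for the show() side; NULL for reserved encodings. */
static const char *timeout_range_val_to_str(uint8_t val)
{
	for (size_t i = 0; i < ARRAY_SIZE(ranges); i++)
		if (ranges[i].val == val)
			return ranges[i].str;

	return NULL;
}
```

With one entry per encoding, the show() side and the store() side use the
same spelling, so reading the attribute back returns exactly what was
written.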

Thanks,
Ben


* Re: [PATCH 14/16] ACPI: Add CXL isolation _OSC fields
  2025-07-30 21:47 ` [PATCH 14/16] ACPI: Add CXL isolation _OSC fields Ben Cheatham
@ 2025-08-22 19:19   ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2025-08-22 19:19 UTC (permalink / raw)
  To: Ben Cheatham; +Cc: linux-cxl, linux-pci, linux-acpi, Dave Jiang

Cc: Dave

On Wed, Jul 30, 2025 at 11:50 PM Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
>
> Add CXL Timeout and Isolation _OSC support and control fields, as
> defined in the ECN at the link below. The ECN contents are expected to
> appear in the CXL 4.0 specification. The link is only accessible to CXL
> SSWG members, so a brief overview is provided here:
>
> The ECN adds several fields to the CXL _OSC method (CXL 3.2 9.18.2)
> for the purpose of reserving CXL isolation features for the platform
> firmware's use. The fields introduced for kernel support reserve
> toggling the CXL.mem isolation enable bit in the isolation control
> register (CXL 3.2 8.2.4.24.2) and how the host is notified isolation has
> occurred.
>
> These fields will be used by the CXL driver to enable CXL isolation
> according to the result of the handshake. Descriptions of these fields
> are included in the commit messages of the commits where they are used.
>
> Link: https://members.computeexpresslink.org/wg/software_systems/document/3118
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>

In case you need this:

Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

> ---
>  drivers/acpi/pci_root.c | 9 +++++++++
>  include/linux/acpi.h    | 3 +++
>  2 files changed, 12 insertions(+)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 74ade4160314..33a922e160fc 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -145,10 +145,13 @@ static struct pci_osc_bit_struct cxl_osc_support_bit[] = {
>         { OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT, "CXL20PortDevRegAccess" },
>         { OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT, "CXLProtocolErrorReporting" },
>         { OSC_CXL_NATIVE_HP_SUPPORT, "CXLNativeHotPlug" },
> +       { OSC_CXL_TIMEOUT_ISOLATION_SUPPORT, "CXLTimeoutIsolation" },
>  };
>
>  static struct pci_osc_bit_struct cxl_osc_control_bit[] = {
>         { OSC_CXL_ERROR_REPORTING_CONTROL, "CXLMemErrorReporting" },
> +       { OSC_CXL_MEM_ISOLATION_CONTROL, "CXLMemIsolation" },
> +       { OSC_CXL_ISOLATION_NOTIF_CONTROL, "CXLIsolationNotifications" },
>  };
>
>  static void decode_osc_bits(struct acpi_pci_root *root, char *msg, u32 word,
> @@ -493,6 +496,8 @@ static u32 calculate_cxl_support(void)
>                 support |= OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT;
>         if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
>                 support |= OSC_CXL_NATIVE_HP_SUPPORT;
> +       if (IS_ENABLED(CONFIG_CXL_ISOLATION))
> +               support |= OSC_CXL_TIMEOUT_ISOLATION_SUPPORT;
>
>         return support;
>  }
> @@ -535,6 +540,10 @@ static u32 calculate_cxl_control(void)
>         if (IS_ENABLED(CONFIG_MEMORY_FAILURE))
>                 control |= OSC_CXL_ERROR_REPORTING_CONTROL;
>
> +       if (IS_ENABLED(CONFIG_CXL_ISOLATION))
> +               control |= (OSC_CXL_MEM_ISOLATION_CONTROL |
> +                           OSC_CXL_ISOLATION_NOTIF_CONTROL);
> +
>         return control;
>  }
>
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f102c0fe3431..f172182aa029 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -626,9 +626,12 @@ extern u32 osc_sb_native_usb4_control;
>  #define OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT        0x00000002
>  #define OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT 0x00000004
>  #define OSC_CXL_NATIVE_HP_SUPPORT              0x00000008
> +#define OSC_CXL_TIMEOUT_ISOLATION_SUPPORT      0x00000010
>
>  /* CXL _OSC: Capabilities DWORD 5: Control Field */
>  #define OSC_CXL_ERROR_REPORTING_CONTROL                0x00000001
> +#define OSC_CXL_MEM_ISOLATION_CONTROL          0x00000002
> +#define OSC_CXL_ISOLATION_NOTIF_CONTROL                0x00000020
>
>  static inline u32 acpi_osc_ctx_get_pci_control(struct acpi_osc_context *context)
>  {
> --



Thread overview: 24+ messages
2025-07-30 21:47 [PATCH 00/16] CXL.mem error isolation support Ben Cheatham
2025-07-30 21:47 ` [PATCH 01/16] cxl/regs: Add cxl_unmap_component_regs() Ben Cheatham
2025-07-30 21:47 ` [PATCH 02/16] cxl/regs: Add CXL Isolation capability mapping Ben Cheatham
2025-07-30 21:47 ` [PATCH 03/16] PCI: PCIe portdrv: Add CXL Isolation service driver Ben Cheatham
2025-07-30 21:47 ` [PATCH 04/16] PCI: PCIe portdrv: Allocate CXL isolation MSI/-X vector Ben Cheatham
2025-08-04 21:39   ` Bjorn Helgaas
2025-08-06 17:58     ` Cheatham, Benjamin
2025-07-30 21:47 ` [PATCH 05/16] PCI: PCIe portdrv: Add interface for getting CXL isolation IRQ Ben Cheatham
2025-07-31  5:59   ` Lukas Wunner
2025-07-31 13:13     ` Cheatham, Benjamin
2025-07-30 21:47 ` [PATCH 06/16] cxl/core: Enable CXL.mem isolation Ben Cheatham
2025-07-30 21:47 ` [PATCH 07/16] cxl/core: Set up isolation interrupts Ben Cheatham
2025-07-30 21:47 ` [PATCH 08/16] cxl/core: Enable CXL " Ben Cheatham
2025-07-30 21:47 ` [PATCH 09/16] cxl/core: Prevent onlining CXL memory behind isolated ports Ben Cheatham
2025-07-30 21:47 ` [PATCH 10/16] cxl/core: Enable CXL.mem timeout Ben Cheatham
2025-07-30 21:47 ` [PATCH 11/16] cxl/pci: Add isolation handler Ben Cheatham
2025-07-30 21:47 ` [PATCH 12/16] PCI: PCIe portdrv: Add cxl_isolation sysfs attributes Ben Cheatham
2025-07-30 21:47 ` [PATCH 13/16] cxl/core, PCI: PCIe portdrv: Add CXL timeout range programming Ben Cheatham
2025-08-04 21:39   ` Bjorn Helgaas
2025-08-06 17:58     ` Cheatham, Benjamin
2025-07-30 21:47 ` [PATCH 14/16] ACPI: Add CXL isolation _OSC fields Ben Cheatham
2025-08-22 19:19   ` Rafael J. Wysocki
2025-07-30 21:47 ` [PATCH 15/16] cxl/core, cxl/acpi: Enable CXL isolation based on _OSC handshake Ben Cheatham
2025-07-30 21:47 ` [PATCH 16/16] cxl/core, cxl/acpi: Add CXL isolation notify handler Ben Cheatham
