* [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support
@ 2024-02-15 19:40 Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 1/6] cxl/core: Add CXL Timeout & Isolation capability parsing Ben Cheatham
` (6 more replies)
0 siblings, 7 replies; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 19:40 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham
Implement initial support for CXL.mem Timeout & Isolation (CXL 3.0
12.3.2). This series implements support for CXL.mem enabling and
programming CXL.mem transaction timeout, CXL.mem error isolation,
and error isolation interrupts for CXL-enabled PCIe root ports that
implement the CXL Timeout & Isolation capability.
I am operating under the assumption that recovery from error isolation
will be more involved than just resetting the port and turning off
isolation, so that flow is not implemented here. There is also no
support for CXL.cache, but I plan to eventually implement both.
The series also introduces a PCIe port bus driver dependency on the CXL
core. I expect to be able to remove that when when my team submits
patches for a future rework of the PCIe port bus driver.
I have done some testing using QEMU by adding the isolation registers
and a hacked-up QMP command to test the interrupt flow, but I *DID NOT*
implement the actual isolation feature and the subsequent device
behavior. I'd be willing to share these changes (and my config) if
anyone is interested in testing this.
Any thoughts/comments would be greatly appreciated!
Ben Cheatham (6):
cxl/core: Add CXL Timeout & Isolation capability parsing
pcie/cxl_timeout: Add CXL Timeout & Isolation service driver
pcie/cxl_timeout: Add CXL.mem timeout range programming
pcie/cxl_timeout: Add CXL.mem error isolation support
pcie/portdrv: Add CXL MSI/-X allocation
pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support
drivers/cxl/core/pci.c | 5 +
drivers/cxl/core/port.c | 80 +++++
drivers/cxl/core/region.c | 9 +
drivers/cxl/core/regs.c | 7 +
drivers/cxl/cxl.h | 37 +++
drivers/cxl/cxlpci.h | 9 +
drivers/pci/pcie/Kconfig | 10 +
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/cxl_timeout.c | 592 +++++++++++++++++++++++++++++++++
drivers/pci/pcie/portdrv.c | 36 +-
drivers/pci/pcie/portdrv.h | 17 +-
11 files changed, 799 insertions(+), 4 deletions(-)
create mode 100644 drivers/pci/pcie/cxl_timeout.c
--
2.34.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [RFC PATCH 1/6] cxl/core: Add CXL Timeout & Isolation capability parsing
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
@ 2024-02-15 19:40 ` Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver Ben Cheatham
` (5 subsequent siblings)
6 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 19:40 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham
Add parsing and mapping of the CXL Timeout & Isolation capability
structure (CXL 3.1 8.2.4.24) to regs.c.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/cxl/core/regs.c | 7 +++++++
drivers/cxl/cxl.h | 6 ++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 372786f80955..fe0c19826a6a 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -92,6 +92,12 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
length = CXL_RAS_CAPABILITY_LENGTH;
rmap = &map->ras;
break;
+ case CXL_CM_CAP_CAP_ID_TIMEOUT:
+ dev_dbg(dev, "found Isolation & Timeout capability (0x%x)\n",
+ offset);
+ length = CXL_TIMEOUT_CAPABILITY_LENGTH;
+ rmap = &map->timeout;
+ break;
default:
dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
offset);
@@ -211,6 +217,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
} mapinfo[] = {
{ &map->component_map.hdm_decoder, ®s->hdm_decoder },
{ &map->component_map.ras, ®s->ras },
+ { &map->component_map.timeout, ®s->timeout },
};
int i;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b6017c0c57b4..87f3178d6642 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -36,6 +36,7 @@
#define CXL_CM_CAP_CAP_ID_RAS 0x2
#define CXL_CM_CAP_CAP_ID_HDM 0x5
+#define CXL_CM_CAP_CAP_ID_TIMEOUT 0x9
#define CXL_CM_CAP_CAP_HDM_VERSION 1
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
@@ -126,6 +127,9 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
return 0;
}
+/* CXL 3.0 8.2.4.23 CXL Timeout and Isolation Capability Structure */
+#define CXL_TIMEOUT_CAPABILITY_OFFSET 0x0
+#define CXL_TIMEOUT_CAPABILITY_LENGTH 0x10
/* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
#define CXL_RAS_UNCORRECTABLE_STATUS_OFFSET 0x0
#define CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
@@ -208,6 +212,7 @@ struct cxl_regs {
struct_group_tagged(cxl_component_regs, component,
void __iomem *hdm_decoder;
void __iomem *ras;
+ void __iomem *timeout;
);
/*
* Common set of CXL Device register block base pointers
@@ -242,6 +247,7 @@ struct cxl_reg_map {
struct cxl_component_reg_map {
struct cxl_reg_map hdm_decoder;
struct cxl_reg_map ras;
+ struct cxl_reg_map timeout;
};
struct cxl_device_reg_map {
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 1/6] cxl/core: Add CXL Timeout & Isolation capability parsing Ben Cheatham
@ 2024-02-15 19:40 ` Ben Cheatham
2024-02-15 21:13 ` Bjorn Helgaas
2024-02-15 19:40 ` [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming Ben Cheatham
` (4 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 19:40 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham
Add a CXL Timeout & Isolation (CXL 3.0 12.3) service driver to the
PCIe port bus driver for CXL root ports. The service will support
enabling/programming CXL.mem transaction timeout, error isolation,
and interrupt handling.
Add code to find and map CXL Timeout & Isolation capability register
(CXL 3.0 8.2.4.23.1) from service driver. Then use capability register
mapping to enable CXL.mem transaction timeout with the default value.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/cxl/cxl.h | 4 +
drivers/pci/pcie/Kconfig | 10 ++
drivers/pci/pcie/Makefile | 1 +
drivers/pci/pcie/cxl_timeout.c | 197 +++++++++++++++++++++++++++++++++
drivers/pci/pcie/portdrv.c | 1 +
drivers/pci/pcie/portdrv.h | 10 +-
6 files changed, 222 insertions(+), 1 deletion(-)
create mode 100644 drivers/pci/pcie/cxl_timeout.c
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 87f3178d6642..0c65f4ec7aae 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -129,7 +129,11 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
/* CXL 3.0 8.2.4.23 CXL Timeout and Isolation Capability Structure */
#define CXL_TIMEOUT_CAPABILITY_OFFSET 0x0
+#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP BIT(4)
+#define CXL_TIMEOUT_CONTROL_OFFSET 0x8
+#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE BIT(4)
#define CXL_TIMEOUT_CAPABILITY_LENGTH 0x10
+
/* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
#define CXL_RAS_UNCORRECTABLE_STATUS_OFFSET 0x0
#define CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 8999fcebde6a..27820af4502e 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -58,6 +58,16 @@ config PCIEAER_CXL
If unsure, say Y.
+config PCIE_CXL_TIMEOUT
+ bool "PCI Express CXL.mem Timeout & Isolation Interrupt support"
+ depends on PCIEPORTBUS
+ depends on CXL_BUS=PCIEPORTBUS && CXL_PORT
+ help
+ Enables the CXL.mem Timeout & Isolation PCIE port service driver. This
+ driver, in combination with the CXL driver core, is responsible for
+ handling CXL capable PCIE root ports that undergo CXL.mem error isolation
+ due to either a CXL.mem transaction timeout or uncorrectable device error.
+
#
# PCI Express ECRC
#
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 8de4ed5f98f1..433ef08efc6f 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME) += pme.o
obj-$(CONFIG_PCIE_DPC) += dpc.o
obj-$(CONFIG_PCIE_PTM) += ptm.o
obj-$(CONFIG_PCIE_EDR) += edr.o
+obj-$(CONFIG_PCIE_CXL_TIMEOUT) += cxl_timeout.o
diff --git a/drivers/pci/pcie/cxl_timeout.c b/drivers/pci/pcie/cxl_timeout.c
new file mode 100644
index 000000000000..84f2df0e0397
--- /dev/null
+++ b/drivers/pci/pcie/cxl_timeout.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Implements CXL Timeout & Isolation (CXL 3.0 12.3.2) interrupt support as a
+ * PCIE port service driver. The driver is set up such that near all of the
+ * work for setting up and handling interrupts are in this file, while the
+ * CXL core enables the interrupts during port enumeration.
+ *
+ * Copyright (C) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Ben Cheatham <Benjamin.Cheatham@amd.com>
+ */
+
+#define pr_fmt(fmt) "cxl_timeout: " fmt
+#define dev_fmt pr_fmt
+
+#include <linux/pci.h>
+#include <linux/acpi.h>
+
+#include "../../cxl/cxlpci.h"
+#include "portdrv.h"
+
+struct cxl_timeout {
+ struct pcie_device *dev;
+ void __iomem *regs;
+ u32 cap;
+};
+
+struct pcie_cxlt_data {
+ struct cxl_timeout *cxlt;
+ struct cxl_dport *dport;
+};
+
+static int cxl_map_timeout_regs(struct pci_dev *port,
+ struct cxl_register_map *map,
+ struct cxl_component_regs *regs)
+{
+ int rc = 0;
+
+ rc = cxl_find_regblock(port, CXL_REGLOC_RBI_COMPONENT, map);
+ if (rc)
+ return rc;
+
+ rc = cxl_setup_regs(map);
+ if (rc)
+ return rc;
+
+ rc = cxl_map_component_regs(map, regs,
+ BIT(CXL_CM_CAP_CAP_ID_TIMEOUT));
+ return rc;
+}
+
+static void cxl_unmap_timeout_regs(struct pci_dev *port,
+ struct cxl_register_map *map,
+ struct cxl_component_regs *regs)
+{
+ struct cxl_reg_map *timeout_map = &map->component_map.timeout;
+
+ devm_iounmap(map->host, regs->timeout);
+ devm_release_mem_region(map->host, map->resource + timeout_map->offset,
+ timeout_map->size);
+}
+
+static struct cxl_timeout *cxl_create_cxlt(struct pcie_device *dev)
+{
+ struct cxl_component_regs *regs;
+ struct cxl_register_map *map;
+ struct cxl_timeout *cxlt;
+ int rc;
+
+ regs = devm_kmalloc(&dev->device, sizeof(*regs), GFP_KERNEL);
+ if (!regs)
+ return ERR_PTR(-ENOMEM);
+
+ map = devm_kmalloc(&dev->device, sizeof(*map), GFP_KERNEL);
+ if (!map) {
+ devm_kfree(&dev->device, regs);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ rc = cxl_map_timeout_regs(dev->port, map, regs);
+ if (rc)
+ goto err;
+
+ cxlt = devm_kmalloc(&dev->device, sizeof(*cxlt), GFP_KERNEL);
+ if (!cxlt)
+ goto err;
+
+ cxlt->regs = regs->timeout;
+ cxlt->dev = dev;
+ cxlt->cap = readl(cxlt->regs + CXL_TIMEOUT_CAPABILITY_OFFSET);
+
+ return cxlt;
+
+err:
+ cxl_unmap_timeout_regs(dev->port, map, regs);
+ return ERR_PTR(rc);
+}
+
+int cxl_find_timeout_cap(struct pci_dev *dev, u32 *cap)
+{
+ struct cxl_component_regs regs;
+ struct cxl_register_map map;
+ int rc = 0;
+
+ rc = cxl_map_timeout_regs(dev, &map, ®s);
+ if (rc)
+ return rc;
+
+ *cap = readl(regs.timeout + CXL_TIMEOUT_CAPABILITY_OFFSET);
+ cxl_unmap_timeout_regs(dev, &map, ®s);
+
+ return rc;
+}
+
+static struct pcie_cxlt_data *cxlt_create_pdata(struct pcie_device *dev)
+{
+ struct pcie_cxlt_data *data;
+
+ data = devm_kzalloc(&dev->device, sizeof(*data), GFP_KERNEL);
+ if (IS_ERR_OR_NULL(data))
+ return ERR_PTR(-ENOMEM);
+
+ data->cxlt = cxl_create_cxlt(dev);
+ if (IS_ERR_OR_NULL(data->cxlt))
+ return ERR_PTR(PTR_ERR(data->cxlt));
+
+ data->dport = NULL;
+
+ return data;
+}
+
+static void cxl_disable_timeout(void *data)
+{
+ struct cxl_timeout *cxlt = data;
+ u32 cntrl = readl(cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+
+ cntrl &= ~CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE;
+ writel(cntrl, cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+}
+
+static int cxl_enable_timeout(struct pcie_device *dev, struct cxl_timeout *cxlt)
+{
+ u32 cntrl;
+
+ if (!cxlt || !FIELD_GET(CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP, cxlt->cap))
+ return -ENXIO;
+
+ cntrl = readl(cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+ cntrl |= CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE;
+ writel(cntrl, cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+
+ return devm_add_action_or_reset(&dev->device, cxl_disable_timeout,
+ cxlt);
+}
+
+static int cxl_timeout_probe(struct pcie_device *dev)
+{
+ struct pci_dev *port = dev->port;
+ struct pcie_cxlt_data *pdata;
+ struct cxl_timeout *cxlt;
+ int rc = 0;
+
+ /* Limit to CXL root ports */
+ if (!pci_find_dvsec_capability(port, PCI_DVSEC_VENDOR_ID_CXL,
+ CXL_DVSEC_PORT_EXTENSIONS))
+ return -ENODEV;
+
+ pdata = cxlt_create_pdata(dev);
+ if (IS_ERR_OR_NULL(pdata))
+ return PTR_ERR(pdata);
+
+ set_service_data(dev, pdata);
+ cxlt = pdata->cxlt;
+
+ rc = cxl_enable_timeout(dev, cxlt);
+ if (rc)
+ pci_dbg(dev->port, "Failed to enable CXL.mem timeout: %d\n",
+ rc);
+
+ return rc;
+}
+
+static struct pcie_port_service_driver cxltdriver = {
+ .name = "cxl_timeout",
+ .port_type = PCI_EXP_TYPE_ROOT_PORT,
+ .service = PCIE_PORT_SERVICE_CXLT,
+
+ .probe = cxl_timeout_probe,
+};
+
+int __init pcie_cxlt_init(void)
+{
+ return pcie_port_service_register(&cxltdriver);
+}
+
+MODULE_IMPORT_NS(CXL);
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 14a4b89a3b83..7aa0a6f2da4e 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -829,6 +829,7 @@ static void __init pcie_init_services(void)
pcie_pme_init();
pcie_dpc_init();
pcie_hp_init();
+ pcie_cxlt_init();
}
static int __init pcie_portdrv_init(void)
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 1f3803bde7ee..5395a0e36956 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -22,8 +22,10 @@
#define PCIE_PORT_SERVICE_DPC (1 << PCIE_PORT_SERVICE_DPC_SHIFT)
#define PCIE_PORT_SERVICE_BWNOTIF_SHIFT 4 /* Bandwidth notification */
#define PCIE_PORT_SERVICE_BWNOTIF (1 << PCIE_PORT_SERVICE_BWNOTIF_SHIFT)
+#define PCIE_PORT_SERVICE_CXLT_SHIFT 5 /* CXL Timeout & Isolation */
+#define PCIE_PORT_SERVICE_CXLT (1 << PCIE_PORT_SERVICE_CXLT_SHIFT)
-#define PCIE_PORT_DEVICE_MAXSERVICES 5
+#define PCIE_PORT_DEVICE_MAXSERVICES 6
extern bool pcie_ports_dpc_native;
@@ -51,6 +53,12 @@ int pcie_dpc_init(void);
static inline int pcie_dpc_init(void) { return 0; }
#endif
+#ifdef CONFIG_PCIE_CXL_TIMEOUT
+int pcie_cxlt_init(void);
+#else
+static inline int pcie_cxlt_init(void) { return 0; }
+#endif
+
/* Port Type */
#define PCIE_ANY_PORT (~0)
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 1/6] cxl/core: Add CXL Timeout & Isolation capability parsing Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver Ben Cheatham
@ 2024-02-15 19:40 ` Ben Cheatham
2024-02-15 21:35 ` Bjorn Helgaas
2024-02-15 19:40 ` [RFC PATCH 4/6] pcie/cxl_timeout: Add CXL.mem error isolation support Ben Cheatham
` (3 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 19:40 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham
Add programming of CXL.mem transaction timeout range (CXL 3.0 8.2.4.23.1
bits 4 & 11:8) through sysfs. To program the device, read the ranges
supported from "available_timeout_ranges" in the PCIe service device
directory, then write the desired value to "timeout_range". The current
value can be found by reading from "timeout_range".
Example for CXL-enabled PCIe port 0000:0c:00.0, with CXL timeout
service as 020:
# cd /sys/bus/pci_express/devices/0000:0c:00.0:pcie020
# cat available_timeout_ranges
0x0 Default range (50us-10ms)
0x1 Range A (50us-100us)
0x2 Range A (1ms-10ms)
# cat timeout_range
0
# echo 0x2 > timeout_range
# cat timeout_range
2
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/cxl/cxl.h | 13 +++
drivers/pci/pcie/cxl_timeout.c | 185 +++++++++++++++++++++++++++++++++
2 files changed, 198 insertions(+)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 0c65f4ec7aae..4aa5fecc43bd 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -129,11 +129,24 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
/* CXL 3.0 8.2.4.23 CXL Timeout and Isolation Capability Structure */
#define CXL_TIMEOUT_CAPABILITY_OFFSET 0x0
+#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_MASK GENMASK(3, 0)
#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP BIT(4)
#define CXL_TIMEOUT_CONTROL_OFFSET 0x8
+#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_MASK GENMASK(3, 0)
#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE BIT(4)
#define CXL_TIMEOUT_CAPABILITY_LENGTH 0x10
+/* CXL 3.0 8.2.4.23.2 CXL Timeout and Isolation Control Register, bits 3:0 */
+#define CXL_TIMEOUT_TIMEOUT_RANGE_DEFAULT 0x0
+#define CXL_TIMEOUT_TIMEOUT_RANGE_A1 0x1
+#define CXL_TIMEOUT_TIMEOUT_RANGE_A2 0x2
+#define CXL_TIMEOUT_TIMEOUT_RANGE_B1 0x5
+#define CXL_TIMEOUT_TIMEOUT_RANGE_B2 0x6
+#define CXL_TIMEOUT_TIMEOUT_RANGE_C1 0x9
+#define CXL_TIMEOUT_TIMEOUT_RANGE_C2 0xA
+#define CXL_TIMEOUT_TIMEOUT_RANGE_D1 0xD
+#define CXL_TIMEOUT_TIMEOUT_RANGE_D2 0xE
+
/* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
#define CXL_RAS_UNCORRECTABLE_STATUS_OFFSET 0x0
#define CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
diff --git a/drivers/pci/pcie/cxl_timeout.c b/drivers/pci/pcie/cxl_timeout.c
index 84f2df0e0397..916dbaf2bb58 100644
--- a/drivers/pci/pcie/cxl_timeout.c
+++ b/drivers/pci/pcie/cxl_timeout.c
@@ -20,6 +20,8 @@
#include "../../cxl/cxlpci.h"
#include "portdrv.h"
+#define NUM_CXL_TIMEOUT_RANGES 9
+
struct cxl_timeout {
struct pcie_device *dev;
void __iomem *regs;
@@ -130,6 +132,57 @@ static struct pcie_cxlt_data *cxlt_create_pdata(struct pcie_device *dev)
return data;
}
+static bool cxl_validate_timeout_range(struct cxl_timeout *cxlt, u8 range)
+{
+ u8 timeout_ranges = FIELD_GET(CXL_TIMEOUT_CAP_MEM_TIMEOUT_MASK,
+ cxlt->cap);
+
+ if (!timeout_ranges)
+ return false;
+
+ switch (range) {
+ case CXL_TIMEOUT_TIMEOUT_RANGE_DEFAULT:
+ return true;
+ case CXL_TIMEOUT_TIMEOUT_RANGE_A1:
+ case CXL_TIMEOUT_TIMEOUT_RANGE_A2:
+ return timeout_ranges & BIT(0);
+ case CXL_TIMEOUT_TIMEOUT_RANGE_B1:
+ case CXL_TIMEOUT_TIMEOUT_RANGE_B2:
+ return timeout_ranges & BIT(1);
+ case CXL_TIMEOUT_TIMEOUT_RANGE_C1:
+ case CXL_TIMEOUT_TIMEOUT_RANGE_C2:
+ return timeout_ranges & BIT(2);
+ case CXL_TIMEOUT_TIMEOUT_RANGE_D1:
+ case CXL_TIMEOUT_TIMEOUT_RANGE_D2:
+ return timeout_ranges & BIT(3);
+ default:
+ pci_info(cxlt->dev->port, "Invalid timeout range: %d\n",
+ range);
+ return false;
+ }
+}
+
+static int cxl_set_mem_timeout_range(struct cxl_timeout *cxlt, u8 range)
+{
+ u32 cntrl;
+
+ if (!cxlt)
+ return -ENXIO;
+
+ if (!FIELD_GET(CXL_TIMEOUT_CAP_MEM_TIMEOUT_MASK, cxlt->cap)
+ || !cxl_validate_timeout_range(cxlt, range))
+ return -ENXIO;
+
+ cntrl = readl(cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+ cntrl &= ~CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_MASK;
+ cntrl |= CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_MASK & range;
+ writel(cntrl, cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+
+ pci_dbg(cxlt->dev->port,
+ "Timeout & isolation timeout set to range 0x%x\n", range);
+ return 0;
+}
+
static void cxl_disable_timeout(void *data)
{
struct cxl_timeout *cxlt = data;
@@ -154,6 +207,135 @@ static int cxl_enable_timeout(struct pcie_device *dev, struct cxl_timeout *cxlt)
cxlt);
}
+static ssize_t timeout_range_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct pcie_device *pdev = to_pcie_device(dev);
+ struct pcie_cxlt_data *pdata = get_service_data(pdev);
+ u32 cntrl, range;
+
+ if (!pdata || !pdata->cxlt)
+ return -ENXIO;
+
+ cntrl = readl(pdata->cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+
+ range = FIELD_GET(CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_MASK, cntrl);
+ return sysfs_emit(buf, "%u\n", range);
+}
+
+static ssize_t timeout_range_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct pcie_device *pdev = to_pcie_device(dev);
+ struct pcie_cxlt_data *pdata = get_service_data(pdev);
+ u8 range;
+ int rc;
+
+ if (kstrtou8(buf, 16, &range) < 0)
+ return -EINVAL;
+
+ if (!pdata || !pdata->cxlt)
+ return -ENXIO;
+
+ rc = cxl_set_mem_timeout_range(pdata->cxlt, range);
+ if (rc)
+ return rc;
+
+ return count;
+}
+static DEVICE_ATTR_RW(timeout_range);
+
+const struct cxl_timeout_range {
+ const char *str;
+ u8 range_val;
+} cxl_timeout_ranges[NUM_CXL_TIMEOUT_RANGES] = {
+ { "Default range (50us-10ms)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_DEFAULT },
+ { "Range A (50us-100us)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_A1 },
+ { "Range A (1ms-10ms)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_A2 },
+ { "Range B (16ms-55ms)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_B1 },
+ { "Range B (65ms-210ms)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_B2 },
+ { "Range C (260ms-900ms)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_C1 },
+ { "Range C (1s-3.5s)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_C2 },
+ { "Range D (4s-13s)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_D1 },
+ { "Range D (17s-64s)",
+ CXL_TIMEOUT_TIMEOUT_RANGE_D2 },
+};
+
+static ssize_t available_timeout_ranges_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct pcie_device *pdev = to_pcie_device(dev);
+ struct pcie_cxlt_data *pdata = get_service_data(pdev);
+ ssize_t count = 0;
+ u8 range;
+
+ if (!pdata || !pdata->cxlt)
+ return -ENXIO;
+
+ for (int i = 0; i < ARRAY_SIZE(cxl_timeout_ranges); i++) {
+ range = cxl_timeout_ranges[i].range_val;
+
+ if (cxl_validate_timeout_range(pdata->cxlt, range)) {
+ count += sysfs_emit_at(buf, count, "0x%x\t%s\n",
+ cxl_timeout_ranges[i].range_val,
+ cxl_timeout_ranges[i].str);
+ }
+ }
+
+ return count;
+}
+static DEVICE_ATTR_RO(available_timeout_ranges);
+
+static umode_t cxl_timeout_is_visible(struct kobject *kobj,
+ struct attribute *attr, int val)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct pcie_device *pdev = to_pcie_device(dev);
+ struct pcie_cxlt_data *pdata = get_service_data(pdev);
+ u32 cap;
+
+ if (!pdata || !pdata->cxlt)
+ return 0;
+
+ cap = pdata->cxlt->cap;
+
+ if ((attr == &dev_attr_timeout_range.attr) &&
+ cap & CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP)
+ return attr->mode;
+
+ if ((attr == &dev_attr_available_timeout_ranges.attr) &&
+ (FIELD_GET(CXL_TIMEOUT_CAP_MEM_TIMEOUT_MASK, cap)))
+ return attr->mode;
+
+ return 0;
+}
+static struct attribute *cxl_timeout_timeout_attributes[] = {
+ &dev_attr_timeout_range.attr,
+ &dev_attr_available_timeout_ranges.attr,
+ NULL,
+};
+
+static struct attribute_group cxl_timeout_timeout_group = {
+ .attrs = cxl_timeout_timeout_attributes,
+ .is_visible = cxl_timeout_is_visible,
+};
+
+static const struct attribute_group *cxl_timeout_attribute_groups[] = {
+ &cxl_timeout_timeout_group,
+ NULL,
+};
+
static int cxl_timeout_probe(struct pcie_device *dev)
{
struct pci_dev *port = dev->port;
@@ -187,6 +369,9 @@ static struct pcie_port_service_driver cxltdriver = {
.service = PCIE_PORT_SERVICE_CXLT,
.probe = cxl_timeout_probe,
+ .driver = {
+ .dev_groups = cxl_timeout_attribute_groups,
+ },
};
int __init pcie_cxlt_init(void)
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [RFC PATCH 4/6] pcie/cxl_timeout: Add CXL.mem error isolation support
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
` (2 preceding siblings ...)
2024-02-15 19:40 ` [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming Ben Cheatham
@ 2024-02-15 19:40 ` Ben Cheatham
2024-02-15 21:49 ` Bjorn Helgaas
2024-02-15 19:40 ` [RFC PATCH 5/6] pcie/portdrv: Add CXL MSI/-X allocation Ben Cheatham
` (2 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 19:40 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham
Add and enable CXL.mem error isolation support (CXL 3.0 12.3.2)
to the CXL Timeout & Isolation service driver.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/cxl/cxl.h | 2 ++
drivers/pci/pcie/cxl_timeout.c | 40 +++++++++++++++++++++++++++++++++-
2 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 4aa5fecc43bd..b1d5232a0127 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -131,9 +131,11 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_TIMEOUT_CAPABILITY_OFFSET 0x0
#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_MASK GENMASK(3, 0)
#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP BIT(4)
+#define CXL_TIMEOUT_CAP_MEM_ISO_SUPP BIT(16)
#define CXL_TIMEOUT_CONTROL_OFFSET 0x8
#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_MASK GENMASK(3, 0)
#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE BIT(4)
+#define CXL_TIMEOUT_CONTROL_MEM_ISO_ENABLE BIT(16)
#define CXL_TIMEOUT_CAPABILITY_LENGTH 0x10
/* CXL 3.0 8.2.4.23.2 CXL Timeout and Isolation Control Register, bits 3:0 */
diff --git a/drivers/pci/pcie/cxl_timeout.c b/drivers/pci/pcie/cxl_timeout.c
index 916dbaf2bb58..5900239e5bbf 100644
--- a/drivers/pci/pcie/cxl_timeout.c
+++ b/drivers/pci/pcie/cxl_timeout.c
@@ -207,6 +207,31 @@ static int cxl_enable_timeout(struct pcie_device *dev, struct cxl_timeout *cxlt)
cxlt);
}
+static void cxl_disable_isolation(void *data)
+{
+ struct cxl_timeout *cxlt = data;
+ u32 cntrl = readl(cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+
+ cntrl &= ~CXL_TIMEOUT_CONTROL_MEM_ISO_ENABLE;
+ writel(cntrl, cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+}
+
+static int cxl_enable_isolation(struct pcie_device *dev,
+ struct cxl_timeout *cxlt)
+{
+ u32 cntrl;
+
+ if (!cxlt || !FIELD_GET(CXL_TIMEOUT_CAP_MEM_ISO_SUPP, cxlt->cap))
+ return -ENXIO;
+
+ cntrl = readl(cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+ cntrl |= CXL_TIMEOUT_CONTROL_MEM_ISO_ENABLE;
+ writel(cntrl, cxlt->regs + CXL_TIMEOUT_CONTROL_OFFSET);
+
+ return devm_add_action_or_reset(&dev->device, cxl_disable_isolation,
+ cxlt);
+}
+
static ssize_t timeout_range_show(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -341,7 +366,8 @@ static int cxl_timeout_probe(struct pcie_device *dev)
struct pci_dev *port = dev->port;
struct pcie_cxlt_data *pdata;
struct cxl_timeout *cxlt;
- int rc = 0;
+ bool timeout_enabled;
+ int rc;
/* Limit to CXL root ports */
if (!pci_find_dvsec_capability(port, PCI_DVSEC_VENDOR_ID_CXL,
@@ -360,6 +386,18 @@ static int cxl_timeout_probe(struct pcie_device *dev)
pci_dbg(dev->port, "Failed to enable CXL.mem timeout: %d\n",
rc);
+ timeout_enabled = !rc;
+
+ rc = cxl_enable_isolation(dev, cxlt);
+ if (rc)
+ pci_dbg(dev->port, "Failed to enable CXL.mem isolation: %d\n",
+ rc);
+
+ if (rc && !timeout_enabled) {
+ pci_info(dev->port,
+ "Failed to enable CXL.mem timeout and isolation.\n");
+ }
+
return rc;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [RFC PATCH 5/6] pcie/portdrv: Add CXL MSI/-X allocation
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
` (3 preceding siblings ...)
2024-02-15 19:40 ` [RFC PATCH 4/6] pcie/cxl_timeout: Add CXL.mem error isolation support Ben Cheatham
@ 2024-02-15 19:40 ` Ben Cheatham
2024-02-15 21:51 ` Bjorn Helgaas
2024-02-15 19:40 ` [RFC PATCH 6/6] pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support Ben Cheatham
2024-02-15 23:43 ` [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Dan Williams
6 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 19:40 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham
Allocate an MSI/-X for CXL-enabled PCIe root ports that support
timeout & isolation interrupts. This vector will be used by the
CXL timeout & isolation service driver to disable the root port
in the CXL port hierarchy and any associated memory if the port
enters isolation.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/cxl/cxl.h | 2 ++
drivers/pci/pcie/cxl_timeout.c | 11 ++++++++++-
drivers/pci/pcie/portdrv.c | 35 +++++++++++++++++++++++++++++++---
drivers/pci/pcie/portdrv.h | 6 ++++++
4 files changed, 50 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b1d5232a0127..3b5645ec95b9 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -132,6 +132,8 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_MASK GENMASK(3, 0)
#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP BIT(4)
#define CXL_TIMEOUT_CAP_MEM_ISO_SUPP BIT(16)
+#define CXL_TIMEOUT_CAP_INTR_SUPP BIT(26)
+#define CXL_TIMEOUT_CAP_INTR_MASK GENMASK(31, 27)
#define CXL_TIMEOUT_CONTROL_OFFSET 0x8
#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_MASK GENMASK(3, 0)
#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE BIT(4)
diff --git a/drivers/pci/pcie/cxl_timeout.c b/drivers/pci/pcie/cxl_timeout.c
index 5900239e5bbf..352d9370a999 100644
--- a/drivers/pci/pcie/cxl_timeout.c
+++ b/drivers/pci/pcie/cxl_timeout.c
@@ -99,7 +99,7 @@ static struct cxl_timeout *cxl_create_cxlt(struct pcie_device *dev)
return ERR_PTR(rc);
}
-int cxl_find_timeout_cap(struct pci_dev *dev, u32 *cap)
+int pcie_cxl_find_timeout_cap(struct pci_dev *dev, u32 *cap)
{
struct cxl_component_regs regs;
struct cxl_register_map map;
@@ -115,6 +115,15 @@ int cxl_find_timeout_cap(struct pci_dev *dev, u32 *cap)
return rc;
}
+bool pcie_supports_cxl_timeout_interrupts(u32 cap)
+{
+ if (!(cap & CXL_TIMEOUT_CAP_INTR_SUPP))
+ return false;
+
+ return (cap & CXL_TIMEOUT_CAP_MEM_ISO_SUPP) ||
+ (cap & CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP);
+}
+
static struct pcie_cxlt_data *cxlt_create_pdata(struct pcie_device *dev)
{
struct pcie_cxlt_data *data;
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 7aa0a6f2da4e..c36fe6ccfeae 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -21,6 +21,7 @@
#include "../pci.h"
#include "portdrv.h"
+#include "../../cxl/cxlpci.h"
/*
* The PCIe Capability Interrupt Message Number (PCIe r3.1, sec 7.8.2) must
@@ -55,7 +56,7 @@ static void release_pcie_device(struct device *dev)
* required to accommodate the largest Message Number.
*/
static int pcie_message_numbers(struct pci_dev *dev, int mask,
- u32 *pme, u32 *aer, u32 *dpc)
+ u32 *pme, u32 *aer, u32 *dpc, u32 *cxl)
{
u32 nvec = 0, pos;
u16 reg16;
@@ -98,6 +99,19 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
}
}
+#ifdef CONFIG_PCIE_CXL_TIMEOUT
+ if (mask & PCIE_PORT_SERVICE_CXLT) {
+ u32 cap;
+
+ if (!pcie_cxl_find_timeout_cap(dev, &cap) &&
+ pcie_supports_cxl_timeout_interrupts(cap)) {
+ *cxl = FIELD_GET(CXL_TIMEOUT_CAP_INTR_MASK,
+ pos);
+ nvec = max(nvec, *cxl + 1);
+ }
+ }
+#endif
+
return nvec;
}
@@ -113,7 +127,7 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
{
int nr_entries, nvec, pcie_irq;
- u32 pme = 0, aer = 0, dpc = 0;
+ u32 pme = 0, aer = 0, dpc = 0, cxlt = 0;
/* Allocate the maximum possible number of MSI/MSI-X vectors */
nr_entries = pci_alloc_irq_vectors(dev, 1, PCIE_PORT_MAX_MSI_ENTRIES,
@@ -122,7 +136,7 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
return nr_entries;
/* See how many and which Interrupt Message Numbers we actually use */
- nvec = pcie_message_numbers(dev, mask, &pme, &aer, &dpc);
+ nvec = pcie_message_numbers(dev, mask, &pme, &aer, &dpc, &cxlt);
if (nvec > nr_entries) {
pci_free_irq_vectors(dev);
return -EIO;
@@ -163,6 +177,9 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
if (mask & PCIE_PORT_SERVICE_DPC)
irqs[PCIE_PORT_SERVICE_DPC_SHIFT] = pci_irq_vector(dev, dpc);
+ if (mask & PCIE_PORT_SERVICE_CXLT)
+ irqs[PCIE_PORT_SERVICE_CXLT_SHIFT] = pci_irq_vector(dev, cxlt);
+
return 0;
}
@@ -274,6 +291,18 @@ static int get_port_device_capability(struct pci_dev *dev)
services |= PCIE_PORT_SERVICE_BWNOTIF;
}
+#ifdef CONFIG_PCIE_CXL_TIMEOUT
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
+ pci_find_dvsec_capability(dev, PCI_DVSEC_VENDOR_ID_CXL,
+ CXL_DVSEC_PORT_EXTENSIONS)) {
+ u32 cap;
+
+ if (!pcie_cxl_find_timeout_cap(dev, &cap) &&
+ pcie_supports_cxl_timeout_interrupts(cap))
+ services |= PCIE_PORT_SERVICE_CXLT;
+ }
+#endif
+
return services;
}
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 5395a0e36956..f89e7366e986 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -129,4 +129,10 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
#endif /* !CONFIG_PCIE_PME */
struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
+
+#ifdef CONFIG_PCIE_CXL_TIMEOUT
+int pcie_cxl_find_timeout_cap(struct pci_dev *dev, u32 *cap);
+bool pcie_supports_cxl_timeout_interrupts(u32 cap);
+#endif
+
#endif /* _PORTDRV_H_ */
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [RFC PATCH 6/6] pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
` (4 preceding siblings ...)
2024-02-15 19:40 ` [RFC PATCH 5/6] pcie/portdrv: Add CXL MSI/-X allocation Ben Cheatham
@ 2024-02-15 19:40 ` Ben Cheatham
2024-02-15 21:57 ` Bjorn Helgaas
2024-02-15 23:43 ` [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Dan Williams
6 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 19:40 UTC (permalink / raw)
To: linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham
Add support for CXL.mem Timeout & Isolation interrupts. A CXL root port
under isolation will not complete writes and will return an exception
response (i.e. poison) on reads (CXL 3.0 12.3.2). Therefore, when a
CXL-enabled PCIe root port enters isolation, we assume that the memory
under the port is unreachable and will need to be unmapped.
When an isolation interrupt occurs, the CXL Timeout & Isolation service
driver will mark the port (cxl_dport) as isolated, destroy any CXL memory
regions under the *bridge* (cxl_port), and attempt to unmap the memory
backing the CXL region(s). If the memory was already in use, the mapping
is not guaranteed to succeed.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/cxl/core/pci.c | 5 +
drivers/cxl/core/port.c | 80 ++++++++++++++++
drivers/cxl/core/region.c | 9 ++
drivers/cxl/cxl.h | 10 ++
drivers/cxl/cxlpci.h | 9 ++
drivers/pci/pcie/cxl_timeout.c | 165 ++++++++++++++++++++++++++++++++-
drivers/pci/pcie/portdrv.h | 1 +
7 files changed, 278 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 6c9c8d92f8f7..95b6a5f0d0cc 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -64,6 +64,11 @@ static int match_add_dports(struct pci_dev *pdev, void *data)
}
ctx->count++;
+ if (type == PCI_EXP_TYPE_ROOT_PORT && !pcie_cxlt_register_dport(dport))
+ return devm_add_action_or_reset(dport->dport_dev,
+ pcie_cxlt_unregister_dport,
+ dport);
+
return 0;
}
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index e59d9d37aa65..88d114c67596 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -2000,6 +2000,86 @@ int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld)
}
EXPORT_SYMBOL_NS_GPL(cxl_decoder_autoremove, CXL);
+/* Checks to see if a dport is above an endpoint */
+static bool cxl_dport_is_parent(struct cxl_dport *parent, struct cxl_ep *ep)
+{
+ struct cxl_dport *ep_dport = ep->dport;
+ struct cxl_port *ep_port = ep_dport->port;
+
+ while (!is_cxl_root(ep_port)) {
+ if (ep_dport == parent)
+ return true;
+
+ ep_dport = ep_port->parent_dport;
+ ep_port = ep_dport->port;
+ }
+
+ return false;
+}
+
+bool cxl_dport_is_in_region(struct cxl_dport *dport,
+ struct cxl_region_ref *rr)
+{
+ struct cxl_region_params *p = &rr->region->params;
+ struct cxl_ep *ep;
+ int i;
+
+ for (i = 0; i < p->nr_targets; i++) {
+ if (!p->targets[i])
+ continue;
+
+ ep = cxl_ep_load(dport->port, cxled_to_memdev(p->targets[i]));
+
+ if (ep && cxl_dport_is_parent(dport, ep))
+ return true;
+ }
+
+ return false;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dport_is_in_region, CXL);
+
+void cxl_port_kill_regions(struct cxl_port *port)
+{
+ struct cxl_endpoint_decoder *ep_decoder;
+ struct cxl_region_params *p;
+ struct cxl_region_ref *ref;
+ unsigned long index;
+ struct cxl_ep *ep;
+ int i;
+
+ xa_for_each(&port->regions, index, ref) {
+ p = &ref->region->params;
+
+ for (i = 0; i < p->nr_targets; i++) {
+ ep_decoder = p->targets[i];
+ if (!ep_decoder)
+ continue;
+
+ ep = cxl_ep_load(port, cxled_to_memdev(ep_decoder));
+ if (ep)
+ cxl_decoder_kill_region(ep_decoder);
+ }
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_kill_regions, CXL);
+
+bool cxl_port_is_isolated(struct cxl_port *port)
+{
+ struct cxl_dport *dport = port->parent_dport;
+
+ while (!is_cxl_root(port) && dport) {
+ if (dport->isolated || !dport->port)
+ return true;
+
+ dport = dport->port->parent_dport;
+ port = dport->port;
+ }
+
+
+ return false;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_is_isolated, CXL);
+
/**
* __cxl_driver_register - register a driver for the cxl bus
* @cxl_drv: cxl driver structure to attach
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 0f05692bfec3..f9aef17db26c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1699,6 +1699,12 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return -ENXIO;
}
+ if (cxl_port_is_isolated(ep_port)) {
+ dev_err(&cxlr->dev, "%s:%s endpoint is under a dport in error isolation\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
+ return -EBUSY;
+ }
+
if (cxled->cxld.target_type != cxlr->type) {
dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
@@ -2782,6 +2788,9 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
struct resource *res;
int rc;
+ if (cxl_port_is_isolated(cxlmd->endpoint))
+ return ERR_PTR(-EBUSY);
+
do {
cxlr = __create_region(cxlrd, cxled->mode,
atomic_read(&cxlrd->region_id));
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3b5645ec95b9..1bee2560446a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -138,6 +138,10 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_MASK GENMASK(3, 0)
#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE BIT(4)
#define CXL_TIMEOUT_CONTROL_MEM_ISO_ENABLE BIT(16)
+#define CXL_TIMEOUT_CONTROL_MEM_INTR_ENABLE BIT(26)
+#define CXL_TIMEOUT_STATUS_OFFSET 0xC
+#define CXL_TIMEOUT_STATUS_MEM_TIMEOUT BIT(0)
+#define CXL_TIMEOUT_STATUS_MEM_ISO BIT(8)
#define CXL_TIMEOUT_CAPABILITY_LENGTH 0x10
/* CXL 3.0 8.2.4.23.2 CXL Timeout and Isolation Control Register, bits 3:0 */
@@ -700,8 +704,11 @@ struct cxl_dport {
struct access_coordinate sw_coord;
struct access_coordinate hb_coord;
long link_latency;
+ bool isolated;
};
+bool cxl_port_is_isolated(struct cxl_port *port);
+
/**
* struct cxl_ep - track an endpoint's interest in a port
* @ep: device that hosts a generic CXL endpoint (expander or accelerator)
@@ -735,6 +742,9 @@ struct cxl_region_ref {
int nr_targets;
};
+bool cxl_dport_is_in_region(struct cxl_dport *dport, struct cxl_region_ref *ref);
+void cxl_port_kill_regions(struct cxl_port *port);
+
/*
* The platform firmware device hosting the root is also the top of the
* CXL port topology. All other CXL ports have another CXL port as their
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 711b05d9a370..7100e23a1819 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -106,4 +106,13 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+
+#ifdef CONFIG_PCIE_CXL_TIMEOUT
+int pcie_cxlt_register_dport(struct cxl_dport *dport);
+void pcie_cxlt_unregister_dport(struct cxl_dport *dport);
+#else
+int pcie_cxlt_register_dport(void *dport) { return -ENXIO; }
+void pcie_cxlt_unregister_dport(void *dport) {}
+#endif
+
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/pci/pcie/cxl_timeout.c b/drivers/pci/pcie/cxl_timeout.c
index 352d9370a999..2193e872b4b7 100644
--- a/drivers/pci/pcie/cxl_timeout.c
+++ b/drivers/pci/pcie/cxl_timeout.c
@@ -22,6 +22,8 @@
#define NUM_CXL_TIMEOUT_RANGES 9
+static u32 num_cxlt_devs;
+
struct cxl_timeout {
struct pcie_device *dev;
void __iomem *regs;
@@ -141,6 +143,141 @@ static struct pcie_cxlt_data *cxlt_create_pdata(struct pcie_device *dev)
return data;
}
+int pcie_cxlt_register_dport(struct cxl_dport *dport)
+{
+ struct device *dev = dport->dport_dev;
+ struct pcie_device *pcie_dev;
+ struct pcie_cxlt_data *pdata;
+ struct pci_dev *pdev;
+
+ if (!dev_is_pci(dev))
+ return -ENXIO;
+
+ pdev = to_pci_dev(dev);
+
+ dev = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_CXLT);
+ if (!dev) {
+ dev_warn(dev,
+ "Device is not registered with cxl_timeout driver.\n");
+ return -ENODEV;
+ }
+
+ pcie_dev = to_pcie_device(dev);
+
+ pdata = get_service_data(pcie_dev);
+ pdata->dport = dport;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pcie_cxlt_register_dport);
+
+void pcie_cxlt_unregister_dport(struct cxl_dport *dport)
+{
+ struct device *dev = dport->dport_dev;
+ struct pcie_device *pcie_dev;
+ struct pcie_cxlt_data *pdata;
+ struct pci_dev *pdev;
+
+ if (!dev_is_pci(dev))
+ return;
+
+ pdev = to_pci_dev(dev);
+
+ dev = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_CXLT);
+ if (!dev) {
+ dev_dbg(dev,
+ "Device was not registered with cxl_timeout driver.\n");
+ return;
+ }
+
+ pcie_dev = to_pcie_device(dev);
+ pdata = get_service_data(pcie_dev);
+ pdata->dport = NULL;
+}
+EXPORT_SYMBOL_GPL(pcie_cxlt_unregister_dport);
+
+struct cxl_timeout_wq_data {
+ struct work_struct w;
+ struct cxl_dport *dport;
+};
+
+static struct workqueue_struct *cxl_timeout_wq;
+
+static void cxl_timeout_handler(struct work_struct *w)
+{
+ struct cxl_timeout_wq_data *data =
+ container_of(w, struct cxl_timeout_wq_data, w);
+ struct cxl_dport *dport = data->dport;
+ struct cxl_port *port;
+ struct cxl_region_ref *ref;
+ unsigned long index;
+ bool kill_regions;
+
+ if (!dport || !dport->port)
+ return;
+
+ port = dport->port;
+
+ xa_for_each(&port->regions, index, ref)
+ if (cxl_dport_is_in_region(dport, ref))
+ kill_regions = true;
+
+ if (kill_regions)
+ cxl_port_kill_regions(port);
+
+ kfree(data);
+}
+
+irqreturn_t cxl_timeout_thread(int irq, void *data)
+{
+ struct cxl_timeout_wq_data *wq_data;
+ struct cxl_timeout *cxlt = data;
+ struct pcie_device *pcie_dev = cxlt->dev;
+ struct pcie_cxlt_data *pdata;
+ struct cxl_dport *dport;
+ u32 status;
+
+ /*
+ * If the CXL core didn't register a cxl_dport with this PCIe device,
+ * then dport enumeration failed and there's nothing to do CXL-wise.
+ */
+ pdata = get_service_data(pcie_dev);
+ if (!pdata || !pdata->dport)
+ return IRQ_HANDLED;
+
+ dport = pdata->dport;
+
+ status = readl(cxlt->regs + CXL_TIMEOUT_STATUS_OFFSET);
+ if (!(status & CXL_TIMEOUT_STATUS_MEM_ISO
+ || status & CXL_TIMEOUT_STATUS_MEM_TIMEOUT))
+ return IRQ_HANDLED;
+
+ dport->isolated = true;
+
+ wq_data = kzalloc(sizeof(struct cxl_timeout_wq_data), GFP_NOWAIT);
+ if (!wq_data)
+ return IRQ_NONE;
+
+ wq_data->dport = dport;
+
+ INIT_WORK(&wq_data->w, cxl_timeout_handler);
+ queue_work(cxl_timeout_wq, &wq_data->w);
+
+ return IRQ_HANDLED;
+}
+
+static int cxl_enable_interrupts(struct pcie_device *dev,
+ struct cxl_timeout *cxlt)
+{
+ if (!cxlt || !FIELD_GET(CXL_TIMEOUT_CAP_INTR_SUPP, cxlt->cap))
+ return -ENXIO;
+
+ return devm_request_threaded_irq(&dev->device, dev->irq, NULL,
+ cxl_timeout_thread,
+ IRQF_SHARED | IRQF_ONESHOT, "cxltdrv",
+ cxlt);
+}
+
static bool cxl_validate_timeout_range(struct cxl_timeout *cxlt, u8 range)
{
u8 timeout_ranges = FIELD_GET(CXL_TIMEOUT_CAP_MEM_TIMEOUT_MASK,
@@ -405,9 +542,28 @@ static int cxl_timeout_probe(struct pcie_device *dev)
if (rc && !timeout_enabled) {
pci_info(dev->port,
"Failed to enable CXL.mem timeout and isolation.\n");
+ return rc;
}
- return rc;
+ rc = cxl_enable_interrupts(dev, cxlt);
+ if (rc) {
+ pci_info(dev->port,
+ "Failed to enable CXL.mem timeout & isolation interrupts: %d\n",
+ rc);
+ } else {
+ pci_info(port, "enabled with IRQ %d\n", dev->irq);
+ }
+
+ num_cxlt_devs++;
+ return 0;
+}
+
+static void cxl_timeout_remove(struct pcie_device *dev)
+{
+ num_cxlt_devs--;
+
+ if (!num_cxlt_devs)
+ destroy_workqueue(cxl_timeout_wq);
}
static struct pcie_port_service_driver cxltdriver = {
@@ -416,6 +572,7 @@ static struct pcie_port_service_driver cxltdriver = {
.service = PCIE_PORT_SERVICE_CXLT,
.probe = cxl_timeout_probe,
+ .remove = cxl_timeout_remove,
.driver = {
.dev_groups = cxl_timeout_attribute_groups,
},
@@ -423,6 +580,12 @@ static struct pcie_port_service_driver cxltdriver = {
int __init pcie_cxlt_init(void)
{
+ cxl_timeout_wq = alloc_ordered_workqueue("cxl_timeout", 0);
+ if (!cxl_timeout_wq)
+ return -ENOMEM;
+
+ num_cxlt_devs = 0;
+
return pcie_port_service_register(&cxltdriver);
}
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index f89e7366e986..c56c629cf563 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -10,6 +10,7 @@
#define _PORTDRV_H_
#include <linux/compiler.h>
+#include <linux/errno.h>
/* Service Type */
#define PCIE_PORT_SERVICE_PME_SHIFT 0 /* Power Management Event */
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver
2024-02-15 19:40 ` [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver Ben Cheatham
@ 2024-02-15 21:13 ` Bjorn Helgaas
2024-02-15 22:21 ` Ben Cheatham
0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-02-15 21:13 UTC (permalink / raw)
To: Ben Cheatham
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
Follow drivers/pci/ subject line convention. Suggest
PCI/CXL: Add CXL Timeout & Isolation service driver
On Thu, Feb 15, 2024 at 01:40:44PM -0600, Ben Cheatham wrote:
> Add a CXL Timeout & Isolation (CXL 3.0 12.3) service driver to the
> PCIe port bus driver for CXL root ports. The service will support
> enabling/programming CXL.mem transaction timeout, error isolation,
> and interrupt handling.
> ...
> /* CXL 3.0 8.2.4.23 CXL Timeout and Isolation Capability Structure */
> #define CXL_TIMEOUT_CAPABILITY_OFFSET 0x0
> +#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP BIT(4)
> +#define CXL_TIMEOUT_CONTROL_OFFSET 0x8
> +#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE BIT(4)
> #define CXL_TIMEOUT_CAPABILITY_LENGTH 0x10
Weird lack of alignment makes these #defines hard to read, but kudos
for following the local style ;)
> /* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
Update to current CXL spec, please. In r3.0, it looks like this is
sec 8.2.4.16. Updating everything to r3.1 references would be even
better.
> +config PCIE_CXL_TIMEOUT
> + bool "PCI Express CXL.mem Timeout & Isolation Interrupt support"
> + depends on PCIEPORTBUS
> + depends on CXL_BUS=PCIEPORTBUS && CXL_PORT
> + help
> + Enables the CXL.mem Timeout & Isolation PCIE port service driver. This
> + driver, in combination with the CXL driver core, is responsible for
> + handling CXL capable PCIE root ports that undergo CXL.mem error isolation
> + due to either a CXL.mem transaction timeout or uncorrectable device error.
When running menuconfig, it seems like both PCIEAER_CXL and
PCIE_CXL_TIMEOUT would fit more logically in the drivers/cxl/Kconfig
menu along with the rest of the CXL options.
Rewrap to fit in 78 columns.
s/PCIE/PCIe/ (also below)
> + * Implements CXL Timeout & Isolation (CXL 3.0 12.3.2) interrupt support as a
> + * PCIE port service driver. The driver is set up such that near all of the
> + * work for setting up and handling interrupts are in this file, while the
> + * CXL core enables the interrupts during port enumeration.
Seems like sec 12.3.2 is slightly too specific here. Maybe 12.3?
> +struct pcie_cxlt_data {
> + struct cxl_timeout *cxlt;
> + struct cxl_dport *dport;
"dport" is not used here. Would be better to add it in the patch that
needs it.
> +static int cxl_map_timeout_regs(struct pci_dev *port,
> + struct cxl_register_map *map,
> + struct cxl_component_regs *regs)
> +{
> + int rc = 0;
Unnecessary initialization.
> + rc = cxl_find_regblock(port, CXL_REGLOC_RBI_COMPONENT, map);
> ...
> +int cxl_find_timeout_cap(struct pci_dev *dev, u32 *cap)
> +{
> + struct cxl_component_regs regs;
> + struct cxl_register_map map;
> + int rc = 0;
Unnecessary initialization.
> + rc = cxl_map_timeout_regs(dev, &map, ®s);
> + if (rc)
> + return rc;
> +
> + *cap = readl(regs.timeout + CXL_TIMEOUT_CAPABILITY_OFFSET);
> + cxl_unmap_timeout_regs(dev, &map, ®s);
> +
> + return rc;
Since we already know the value:
return 0;
> +}
Move cxl_find_timeout_cap() to the patch that needs it. It's unused
in this one.
> +static int cxl_timeout_probe(struct pcie_device *dev)
> +{
> + struct pci_dev *port = dev->port;
> + struct pcie_cxlt_data *pdata;
> + struct cxl_timeout *cxlt;
> + int rc = 0;
Unnecessary initialization.
> + /* Limit to CXL root ports */
> + if (!pci_find_dvsec_capability(port, PCI_DVSEC_VENDOR_ID_CXL,
> + CXL_DVSEC_PORT_EXTENSIONS))
> + return -ENODEV;
> +
> + pdata = cxlt_create_pdata(dev);
> + if (IS_ERR_OR_NULL(pdata))
> + return PTR_ERR(pdata);
> +
> + set_service_data(dev, pdata);
> + cxlt = pdata->cxlt;
> +
> + rc = cxl_enable_timeout(dev, cxlt);
> + if (rc)
> + pci_dbg(dev->port, "Failed to enable CXL.mem timeout: %d\n",
> + rc);
"(%pe)" (similar in subsequent patches)
> + return rc;
> +}
> +
> @@ -829,6 +829,7 @@ static void __init pcie_init_services(void)
> pcie_pme_init();
> pcie_dpc_init();
> pcie_hp_init();
> + pcie_cxlt_init();
"cxlt" seems slightly too generic outside cxl_timeout.c, since there
might be other CXL things that start with "t" someday.
> +#define PCIE_PORT_SERVICE_CXLT_SHIFT 5 /* CXL Timeout & Isolation */
> +#define PCIE_PORT_SERVICE_CXLT (1 << PCIE_PORT_SERVICE_CXLT_SHIFT)
I hate having to add this to the portdrv, but I see why you need to
(so we can deal with the weirdness of PCIe feature interrupts).
Bjorn
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming
2024-02-15 19:40 ` [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming Ben Cheatham
@ 2024-02-15 21:35 ` Bjorn Helgaas
2024-02-15 22:21 ` Ben Cheatham
0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-02-15 21:35 UTC (permalink / raw)
To: Ben Cheatham
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On Thu, Feb 15, 2024 at 01:40:45PM -0600, Ben Cheatham wrote:
> Add programming of CXL.mem transaction timeout range (CXL 3.0 8.2.4.23.1
> bits 4 & 11:8) through sysfs. To program the device, read the ranges
> supported from "available_timeout_ranges" in the PCIe service device
> directory, then write the desired value to "timeout_range". The current
> value can be found by reading from "timeout_range".
>
> Example for CXL-enabled PCIe port 0000:0c:00.0, with CXL timeout
> service as 020:
> # cd /sys/bus/pci_express/devices/0000:0c:00.0:pcie020
I would really, really like to avoid adding sysfs dependences on
portdrv. Is there any chance these files could go in the normal
/sys/bus/pci/devices/ hierarchy instead?
> +/* CXL 3.0 8.2.4.23.2 CXL Timeout and Isolation Control Register, bits 3:0 */
> +#define CXL_TIMEOUT_TIMEOUT_RANGE_DEFAULT 0x0
> +#define CXL_TIMEOUT_TIMEOUT_RANGE_D2 0xE
Looks like the single other example in this file of hex constants
using A-F uses lower-case.
Does "TIMEOUT_TIMEOUT" add information over just "TIMEOUT"?
> +#define NUM_CXL_TIMEOUT_RANGES 9
I don't think we actually need this constant, do we?
> +static bool cxl_validate_timeout_range(struct cxl_timeout *cxlt, u8 range)
"validate" is not a very name for a function returning "bool" because
you can't tell what true/false means from the name. "valid" would be
fine.
> + pci_dbg(cxlt->dev->port,
> + "Timeout & isolation timeout set to range 0x%x\n", range);
I don't know CXL, but "timeout ... timeout" reads sort of strange. Is
it actually a timeout for a timeout? Maybe it was supposed to be
"transaction timeout"?
> +const struct cxl_timeout_range {
> + const char *str;
> + u8 range_val;
> +} cxl_timeout_ranges[NUM_CXL_TIMEOUT_RANGES] = {
Static?
Bjorn
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 4/6] pcie/cxl_timeout: Add CXL.mem error isolation support
2024-02-15 19:40 ` [RFC PATCH 4/6] pcie/cxl_timeout: Add CXL.mem error isolation support Ben Cheatham
@ 2024-02-15 21:49 ` Bjorn Helgaas
2024-02-15 22:21 ` Ben Cheatham
0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-02-15 21:49 UTC (permalink / raw)
To: Ben Cheatham
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On Thu, Feb 15, 2024 at 01:40:46PM -0600, Ben Cheatham wrote:
> Add and enable CXL.mem error isolation support (CXL 3.0 12.3.2)
> to the CXL Timeout & Isolation service driver.
> @@ -341,7 +366,8 @@ static int cxl_timeout_probe(struct pcie_device *dev)
> struct pci_dev *port = dev->port;
> struct pcie_cxlt_data *pdata;
> struct cxl_timeout *cxlt;
> - int rc = 0;
> + bool timeout_enabled;
> + int rc;
>
> /* Limit to CXL root ports */
> if (!pci_find_dvsec_capability(port, PCI_DVSEC_VENDOR_ID_CXL,
> @@ -360,6 +386,18 @@ static int cxl_timeout_probe(struct pcie_device *dev)
> pci_dbg(dev->port, "Failed to enable CXL.mem timeout: %d\n",
> rc);
>
> + timeout_enabled = !rc;
This ends up being a little weird: mixing int and bool, negating rc
here and then negating timeout_enabled later, several messages.
Maybe could just keep rc1 and rc2, drop the pci_dbg messages and
enhance the "enabled" message to be something like:
"enabled %s%s with IRQ", rc1 ? "" : "timeout", rc2 ? "" : "isolation"
("&" left for your imagination).
Or something like
#define FLAG(x) ((x) ? '-' : '+')
"CXL.mem timeout%c isolation%c enabled with IRQ %d", FLAG(rc1), FLAG(rc2)
> + rc = cxl_enable_isolation(dev, cxlt);
> + if (rc)
> + pci_dbg(dev->port, "Failed to enable CXL.mem isolation: %d\n",
> + rc);
"(%pe)"
> + if (rc && !timeout_enabled) {
> + pci_info(dev->port,
> + "Failed to enable CXL.mem timeout and isolation.\n");
Most messages don't include a period at end. It just adds the chance
for line wrapping unnecessarily.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 5/6] pcie/portdrv: Add CXL MSI/-X allocation
2024-02-15 19:40 ` [RFC PATCH 5/6] pcie/portdrv: Add CXL MSI/-X allocation Ben Cheatham
@ 2024-02-15 21:51 ` Bjorn Helgaas
2024-02-15 22:22 ` Ben Cheatham
0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-02-15 21:51 UTC (permalink / raw)
To: Ben Cheatham
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On Thu, Feb 15, 2024 at 01:40:47PM -0600, Ben Cheatham wrote:
> Allocate an MSI/-X for CXL-enabled PCIe root ports that support
> timeout & isolation interrupts. This vector will be used by the
> CXL timeout & isolation service driver to disable the root port
> in the CXL port hierarchy and any associated memory if the port
> enters isolation.
Wrap to fill 75 columns.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 6/6] pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support
2024-02-15 19:40 ` [RFC PATCH 6/6] pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support Ben Cheatham
@ 2024-02-15 21:57 ` Bjorn Helgaas
2024-02-15 22:22 ` Ben Cheatham
0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-02-15 21:57 UTC (permalink / raw)
To: Ben Cheatham
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On Thu, Feb 15, 2024 at 01:40:48PM -0600, Ben Cheatham wrote:
> Add support for CXL.mem Timeout & Isolation interrupts. A CXL root port
> under isolation will not complete writes and will return an exception
> response (i.e. poison) on reads (CXL 3.0 12.3.2). Therefore, when a
> CXL-enabled PCIe root port enters isolation, we assume that the memory
> under the port is unreachable and will need to be unmapped.
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -10,6 +10,7 @@
> #define _PORTDRV_H_
>
> #include <linux/compiler.h>
> +#include <linux/errno.h>
This doesn't look required by portdrv.h. If it's only needed by
something that *includes* portdrv.h, I think errno.h should be
included by that .c file instead.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver
2024-02-15 21:13 ` Bjorn Helgaas
@ 2024-02-15 22:21 ` Ben Cheatham
2024-02-15 22:26 ` Bjorn Helgaas
0 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 22:21 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
Hi Bjorn, thanks for the comments! Responses inline.
On 2/15/24 3:13 PM, Bjorn Helgaas wrote:
> Follow drivers/pci/ subject line convention. Suggest
>
> PCI/CXL: Add CXL Timeout & Isolation service driver
>
> On Thu, Feb 15, 2024 at 01:40:44PM -0600, Ben Cheatham wrote:
>> Add a CXL Timeout & Isolation (CXL 3.0 12.3) service driver to the
>> PCIe port bus driver for CXL root ports. The service will support
>> enabling/programming CXL.mem transaction timeout, error isolation,
>> and interrupt handling.
>> ...
>
>> /* CXL 3.0 8.2.4.23 CXL Timeout and Isolation Capability Structure */
>> #define CXL_TIMEOUT_CAPABILITY_OFFSET 0x0
>> +#define CXL_TIMEOUT_CAP_MEM_TIMEOUT_SUPP BIT(4)
>> +#define CXL_TIMEOUT_CONTROL_OFFSET 0x8
>> +#define CXL_TIMEOUT_CONTROL_MEM_TIMEOUT_ENABLE BIT(4)
>> #define CXL_TIMEOUT_CAPABILITY_LENGTH 0x10
>
> Weird lack of alignment makes these #defines hard to read, but kudos
> for following the local style ;)
>
>> /* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
>
> Update to current CXL spec, please. In r3.0, it looks like this is
> sec 8.2.4.16. Updating everything to r3.1 references would be even
> better.
>
Yeah a lot of the spec references in cxl.h are out of date, I'll update them
where I touch them. And I didn't realize 3.1 was out yet, I'll update the references
to that revision as well.
>> +config PCIE_CXL_TIMEOUT
>> + bool "PCI Express CXL.mem Timeout & Isolation Interrupt support"
>> + depends on PCIEPORTBUS
>> + depends on CXL_BUS=PCIEPORTBUS && CXL_PORT
>> + help
>> + Enables the CXL.mem Timeout & Isolation PCIE port service driver. This
>> + driver, in combination with the CXL driver core, is responsible for
>> + handling CXL capable PCIE root ports that undergo CXL.mem error isolation
>> + due to either a CXL.mem transaction timeout or uncorrectable device error.
>
> When running menuconfig, it seems like both PCIEAER_CXL and
> PCIE_CXL_TIMEOUT would fit more logically in the drivers/cxl/Kconfig
> menu along with the rest of the CXL options.
>
Will do.
> Rewrap to fit in 78 columns.
>
> s/PCIE/PCIe/ (also below)
>
Will do both above.
>> + * Implements CXL Timeout & Isolation (CXL 3.0 12.3.2) interrupt support as a
>> + * PCIE port service driver. The driver is set up such that near all of the
>> + * work for setting up and handling interrupts are in this file, while the
>> + * CXL core enables the interrupts during port enumeration.
>
> Seems like sec 12.3.2 is slightly too specific here. Maybe 12.3?
>
12.3.2 is the CXL.mem portion of the section, which is the only section implemented
in the patch set. That being said, I don't mind changing it to 12.3 instead.
>> +struct pcie_cxlt_data {
>> + struct cxl_timeout *cxlt;
>> + struct cxl_dport *dport;
>
> "dport" is not used here. Would be better to add it in the patch that
> needs it.
>
Will do.
>> +static int cxl_map_timeout_regs(struct pci_dev *port,
>> + struct cxl_register_map *map,
>> + struct cxl_component_regs *regs)
>> +{
>> + int rc = 0;
>
> Unnecessary initialization.
>
>> + rc = cxl_find_regblock(port, CXL_REGLOC_RBI_COMPONENT, map);
>> ...
>
>> +int cxl_find_timeout_cap(struct pci_dev *dev, u32 *cap)
>> +{
>> + struct cxl_component_regs regs;
>> + struct cxl_register_map map;
>> + int rc = 0;
>
> Unnecessary initialization.
>
>> + rc = cxl_map_timeout_regs(dev, &map, ®s);
>> + if (rc)
>> + return rc;
>> +
>> + *cap = readl(regs.timeout + CXL_TIMEOUT_CAPABILITY_OFFSET);
>> + cxl_unmap_timeout_regs(dev, &map, ®s);
>> +
>> + return rc;
>
> Since we already know the value:
>
> return 0;
>
>> +}
>
> Move cxl_find_timeout_cap() to the patch that needs it. It's unused
> in this one.
>
Will do to all 4 above.
>> +static int cxl_timeout_probe(struct pcie_device *dev)
>> +{
>> + struct pci_dev *port = dev->port;
>> + struct pcie_cxlt_data *pdata;
>> + struct cxl_timeout *cxlt;
>> + int rc = 0;
>
> Unnecessary initialization.
>
>> + /* Limit to CXL root ports */
>> + if (!pci_find_dvsec_capability(port, PCI_DVSEC_VENDOR_ID_CXL,
>> + CXL_DVSEC_PORT_EXTENSIONS))
>> + return -ENODEV;
>> +
>> + pdata = cxlt_create_pdata(dev);
>> + if (IS_ERR_OR_NULL(pdata))
>> + return PTR_ERR(pdata);
>> +
>> + set_service_data(dev, pdata);
>> + cxlt = pdata->cxlt;
>> +
>> + rc = cxl_enable_timeout(dev, cxlt);
>> + if (rc)
>> + pci_dbg(dev->port, "Failed to enable CXL.mem timeout: %d\n",
>> + rc);
>
> "(%pe)" (similar in subsequent patches)
>
I wasn't aware of that, I'll change it.
>> + return rc;
>> +}
>> +
>
>> @@ -829,6 +829,7 @@ static void __init pcie_init_services(void)
>> pcie_pme_init();
>> pcie_dpc_init();
>> pcie_hp_init();
>> + pcie_cxlt_init();
>
> "cxlt" seems slightly too generic outside cxl_timeout.c, since there
> might be other CXL things that start with "t" someday.
>
>> +#define PCIE_PORT_SERVICE_CXLT_SHIFT 5 /* CXL Timeout & Isolation */
>> +#define PCIE_PORT_SERVICE_CXLT (1 << PCIE_PORT_SERVICE_CXLT_SHIFT)
>
> I hate having to add this to the portdrv, but I see why you need to
> (so we can deal with the weirdness of PCIe feature interrupts).
>
> Bjorn
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming
2024-02-15 21:35 ` Bjorn Helgaas
@ 2024-02-15 22:21 ` Ben Cheatham
2024-02-15 22:29 ` Bjorn Helgaas
0 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 22:21 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On 2/15/24 3:35 PM, Bjorn Helgaas wrote:
> On Thu, Feb 15, 2024 at 01:40:45PM -0600, Ben Cheatham wrote:
>> Add programming of CXL.mem transaction timeout range (CXL 3.0 8.2.4.23.1
>> bits 4 & 11:8) through sysfs. To program the device, read the ranges
>> supported from "available_timeout_ranges" in the PCIe service device
>> directory, then write the desired value to "timeout_range". The current
>> value can be found by reading from "timeout_range".
>>
>> Example for CXL-enabled PCIe port 0000:0c:00.0, with CXL timeout
>> service as 020:
>> # cd /sys/bus/pci_express/devices/0000:0c:00.0:pcie020
>
> I would really, really like to avoid adding sysfs dependences on
> portdrv. Is there any chance these files could go in the normal
> /sys/bus/pci/devices/ hierarchy instead?
>
This was just the first place I tried putting it that seemed ok. I'll take a look
at moving them over to pci/devices instead.
>> +/* CXL 3.0 8.2.4.23.2 CXL Timeout and Isolation Control Register, bits 3:0 */
>> +#define CXL_TIMEOUT_TIMEOUT_RANGE_DEFAULT 0x0
>
>> +#define CXL_TIMEOUT_TIMEOUT_RANGE_D2 0xE
>
> Looks like the single other example in this file of hex constants
> using A-F uses lower-case.
>
> Does "TIMEOUT_TIMEOUT" add information over just "TIMEOUT"?
>
TIMEOUT_TIMEOUT is supposed to specify transaction timeout, but I'll agree it is
confusing. I could try changing to CXL_TRANSACTION_TIMEOUT_RANGE_* instead, but
it would break the convention of CXL timeout & isolation defines starting with
CXL_TIMEOUT.
>> +#define NUM_CXL_TIMEOUT_RANGES 9
>
> I don't think we actually need this constant, do we?
>
Not really, just gets rid of a magic number (as well as makes sure anyone who
changes that array remembers to update the array size).
>> +static bool cxl_validate_timeout_range(struct cxl_timeout *cxlt, u8 range)
>
> "validate" is not a very name for a function returning "bool" because
> you can't tell what true/false means from the name. "valid" would be
> fine.
>
Will change.
>> + pci_dbg(cxlt->dev->port,
>> + "Timeout & isolation timeout set to range 0x%x\n", range);
>
> I don't know CXL, but "timeout ... timeout" reads sort of strange. Is
> it actually a timeout for a timeout? Maybe it was supposed to be
> "transaction timeout"?
>
Yeah it's the same as above. I'll change it to "Timeout & isolation transaction timeout ..." instead.
>> +const struct cxl_timeout_range {
>> + const char *str;
>> + u8 range_val;
>> +} cxl_timeout_ranges[NUM_CXL_TIMEOUT_RANGES] = {
>
> Static?
>
Yep, I'll change it.
> Bjorn
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 4/6] pcie/cxl_timeout: Add CXL.mem error isolation support
2024-02-15 21:49 ` Bjorn Helgaas
@ 2024-02-15 22:21 ` Ben Cheatham
0 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 22:21 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On 2/15/24 3:49 PM, Bjorn Helgaas wrote:
> On Thu, Feb 15, 2024 at 01:40:46PM -0600, Ben Cheatham wrote:
>> Add and enable CXL.mem error isolation support (CXL 3.0 12.3.2)
>> to the CXL Timeout & Isolation service driver.
>
>> @@ -341,7 +366,8 @@ static int cxl_timeout_probe(struct pcie_device *dev)
>> struct pci_dev *port = dev->port;
>> struct pcie_cxlt_data *pdata;
>> struct cxl_timeout *cxlt;
>> - int rc = 0;
>> + bool timeout_enabled;
>> + int rc;
>>
>> /* Limit to CXL root ports */
>> if (!pci_find_dvsec_capability(port, PCI_DVSEC_VENDOR_ID_CXL,
>> @@ -360,6 +386,18 @@ static int cxl_timeout_probe(struct pcie_device *dev)
>> pci_dbg(dev->port, "Failed to enable CXL.mem timeout: %d\n",
>> rc);
>>
>> + timeout_enabled = !rc;
>
> This ends up being a little weird: mixing int and bool, negating rc
> here and then negating timeout_enabled later, several messages.
>
> Maybe could just keep rc1 and rc2, drop the pci_dbg messages and
> enhance the "enabled" message to be something like:
> "enabled %s%s with IRQ", rc1 ? "" : "timeout", rc2 ? "" : "isolation"
> ("&" left for your imagination).
>
> Or something like
> #define FLAG(x) ((x) ? '-' : '+')
> "CXL.mem timeout%c isolation%c enabled with IRQ %d", FLAG(rc1), FLAG(rc2)
>
I thought it was janky as well. I tried something really quick using ternaries
and it looked weirder :/. But I agree with you here, I'll take another stab at it.
>> + rc = cxl_enable_isolation(dev, cxlt);
>
>> + if (rc)
>> + pci_dbg(dev->port, "Failed to enable CXL.mem isolation: %d\n",
>> + rc);
>
> "(%pe)"
>
>> + if (rc && !timeout_enabled) {
>> + pci_info(dev->port,
>> + "Failed to enable CXL.mem timeout and isolation.\n");
>
> Most messages don't include a period at end. It just adds the chance
> for line wrapping unnecessarily.
>
I'll remove them.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 5/6] pcie/portdrv: Add CXL MSI/-X allocation
2024-02-15 21:51 ` Bjorn Helgaas
@ 2024-02-15 22:22 ` Ben Cheatham
0 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 22:22 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On 2/15/24 3:51 PM, Bjorn Helgaas wrote:
> On Thu, Feb 15, 2024 at 01:40:47PM -0600, Ben Cheatham wrote:
>> Allocate an MSI/-X for CXL-enabled PCIe root ports that support
>> timeout & isolation interrupts. This vector will be used by the
>> CXL timeout & isolation service driver to disable the root port
>> in the CXL port hierarchy and any associated memory if the port
>> enters isolation.
>
> Wrap to fill 75 columns.
Consider it done :).
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 6/6] pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support
2024-02-15 21:57 ` Bjorn Helgaas
@ 2024-02-15 22:22 ` Ben Cheatham
0 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 22:22 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On 2/15/24 3:57 PM, Bjorn Helgaas wrote:
> On Thu, Feb 15, 2024 at 01:40:48PM -0600, Ben Cheatham wrote:
>> Add support for CXL.mem Timeout & Isolation interrupts. A CXL root port
>> under isolation will not complete writes and will return an exception
>> response (i.e. poison) on reads (CXL 3.0 12.3.2). Therefore, when a
>> CXL-enabled PCIe root port enters isolation, we assume that the memory
>> under the port is unreachable and will need to be unmapped.
>
>> --- a/drivers/pci/pcie/portdrv.h
>> +++ b/drivers/pci/pcie/portdrv.h
>> @@ -10,6 +10,7 @@
>> #define _PORTDRV_H_
>>
>> #include <linux/compiler.h>
>> +#include <linux/errno.h>
>
> This doesn't look required by portdrv.h. If it's only needed by
> something that *includes* portdrv.h, I think errno.h should be
> included by that .c file instead.
>
I agree. It's a remnant of an older version of this that had a declaration
that needed it, I'll remove it.
Thanks again,
Ben
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver
2024-02-15 22:21 ` Ben Cheatham
@ 2024-02-15 22:26 ` Bjorn Helgaas
0 siblings, 0 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2024-02-15 22:26 UTC (permalink / raw)
To: Ben Cheatham
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On Thu, Feb 15, 2024 at 04:21:42PM -0600, Ben Cheatham wrote:
> On 2/15/24 3:13 PM, Bjorn Helgaas wrote:
> >> + rc = cxl_enable_timeout(dev, cxlt);
> >> + if (rc)
> >> + pci_dbg(dev->port, "Failed to enable CXL.mem timeout: %d\n",
> >> + rc);
> >
> > "(%pe)" (similar in subsequent patches)
> >
>
> I wasn't aware of that, I'll change it.
I wasn't either, but I'm getting patches to change them, and it does
seem a little bit easier than looking up the errnos when debugging,
so ...
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming
2024-02-15 22:21 ` Ben Cheatham
@ 2024-02-15 22:29 ` Bjorn Helgaas
2024-02-15 22:30 ` Ben Cheatham
0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2024-02-15 22:29 UTC (permalink / raw)
To: Ben Cheatham
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On Thu, Feb 15, 2024 at 04:21:48PM -0600, Ben Cheatham wrote:
> On 2/15/24 3:35 PM, Bjorn Helgaas wrote:
> > On Thu, Feb 15, 2024 at 01:40:45PM -0600, Ben Cheatham wrote:
> >> +#define NUM_CXL_TIMEOUT_RANGES 9
> >
> > I don't think we actually need this constant, do we?
>
> Not really, just gets rid of a magic number (as well as makes sure
> anyone who changes that array remembers to update the array size).
My point was that I don't think you need to specify the size of the
array at all since it's initialized, and you already use ARRAY_SIZE()
elsewhere.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming
2024-02-15 22:29 ` Bjorn Helgaas
@ 2024-02-15 22:30 ` Ben Cheatham
0 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2024-02-15 22:30 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-cxl, linux-pci, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
bhelgaas
On 2/15/24 4:29 PM, Bjorn Helgaas wrote:
> On Thu, Feb 15, 2024 at 04:21:48PM -0600, Ben Cheatham wrote:
>> On 2/15/24 3:35 PM, Bjorn Helgaas wrote:
>>> On Thu, Feb 15, 2024 at 01:40:45PM -0600, Ben Cheatham wrote:
>
>>>> +#define NUM_CXL_TIMEOUT_RANGES 9
>>>
>>> I don't think we actually need this constant, do we?
>>
>> Not really, just gets rid of a magic number (as well as makes sure
>> anyone who changes that array remembers to update the array size).
>
> My point was that I don't think you need to specify the size of the
> array at all since it's initialized, and you already use ARRAY_SIZE()
> elsewhere.
>
Sorry about that, I just completely misinterpreted that. I agree, I'll remove it.
Thanks,
Ben
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
` (5 preceding siblings ...)
2024-02-15 19:40 ` [RFC PATCH 6/6] pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support Ben Cheatham
@ 2024-02-15 23:43 ` Dan Williams
2024-03-25 15:15 ` Ben Cheatham
6 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2024-02-15 23:43 UTC (permalink / raw)
To: Ben Cheatham, linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, bhelgaas,
benjamin.cheatham, linux-mm
[ add linux-mm because the implications of this feature weigh much
heavier on mm/ than drivers/ ]
Ben Cheatham wrote:
> Implement initial support for CXL.mem Timeout & Isolation (CXL 3.0
> 12.3.2). This series implements support for CXL.mem enabling and
> programming CXL.mem transaction timeout, CXL.mem error isolation,
> and error isolation interrupts for CXL-enabled PCIe root ports that
> implement the CXL Timeout & Isolation capability.
>
> I am operating under the assumption that recovery from error isolation
> will be more involved than just resetting the port and turning off
> isolation, so that flow is not implemented here.
That needs to be answered first.
The notification infrastructure is trivial in comparison. The use case
needs to be clear *before* we start adding infrastructure to Linux that
may end up just being dead code, and certainly before we further burden
portdrv.c with more entanglements.
I put together the write-up below in a separate context for my thoughts
on the error isolation capability and that Linux really needs an end
user to stand up and say, "yes, even with all those caveats CXL Error
Isolation is still useful for our environment." The tl;dr is "are you
abosolutely sure you would not just rather reboot?"
> There is also no support for CXL.cache, but I plan to eventually
> implement both.
To date there are zero upstream drivers for CXL.cache initiators, and
CXL 3.0 introduced HDM-DB to supplant HDM-D. All that to say, if there
are zero mass-market (devices with upstream drivers) adopters of HDM-D,
and there are zero HDM-DB platforms available today it seems Linux has
time for this to continue to evolve. If anyone reading this has a
CXL.cache initiator driver waiting in the wings, do reach out on the
list to clarify what commons belong in the Linux CXL and/or PCI core.
> The series also introduces a PCIe port bus driver dependency on the CXL
> core. I expect to be able to remove that when when my team submits
> patches for a future rework of the PCIe port bus driver.
We have time to wait for that work to settle. Do not make the job harder
in the near term by adding one more dependency to unwind.
> I have done some testing using QEMU by adding the isolation registers
> and a hacked-up QMP command to test the interrupt flow, but I *DID NOT*
> implement the actual isolation feature and the subsequent device
> behavior. I'd be willing to share these changes (and my config) if
> anyone is interested in testing this.
>
> Any thoughts/comments would be greatly appreciated!
---
Memory Error Isolation is a mechanism that *might* allow recovery of CXL
Root Port Link Down conditions or CXL transaction timeouts. When the
event occurs, outstanding writes are dropped, and outstanding reads
terminate with machine check. In order to exit Isolation, the link must
transition through a "link down" status. Effectively Isolation behaves
like a combination of a machine check storm until system software can
evacuate all users and then a surprise hot-removal (unplug) of the
memory before the device can be recovered. Where recovery is effectively
a hot re-plug of the device with a full re-enumeration thereafter. From
the Linux perspective all memory contents are considered forfeited. This
poses several challenges for how to utilize the memory to achieve
reliable recovery. The criteria for evaluating the Linux upstream
maintenance cost for overcoming those challenges is whether the sum
total of those mitigations remains an improvement over a system-reboot
to recover from the same Isolation event. Add to that the fact that not
fully accounting for all the mentioned challenges is still going to
result in a reboot due to kernel panic.
In order to understand the limits of Isolation recovery relative to
reboot recovery, it is important to understand the fundamental
limitations of Linux machine check recovery and memory-hotremove.
Machine checks are typically only recoverable when they hit user memory.
Roughly, if a machine check occurs in a kernel mapped page the machine
check handler triggers a kernel panic. Machine checks are often
recoverable because failures are limited to a few cachelines at a time
and the page allocation distribution is heavily skewed towards
user-memory.
CXL Isolation takes down an entire CXL root port, and with interleaving
can lead to a region spanning multiple ports to be taken down. Even if
the interleave configuration forfeited bandwidth to contain isolation
events to a single CXL root port, it is still on the order of 100s of
GBs that will start throwing machine checks on access all at once. If
that memory is being used by the kernel as typical System-RAM, some of
it is likely to be kernel mapped. Marking all CXL memory as ZONE_MOVABLE
to avoid kernel allocations is not a reliable mitigation for this
scenario as the kernel always needs some ratio of ZONE_NORMAL to access
ZONE_MOVABLE memory.
The "kernel memory" problem similarly effects surprise removal events.
The kernel can only reliably unplug memory that it can force offline and
there is no facility to force-offline ZONE_NORMAL memory. Long term
memory pinning like guests VMs with assigned devices or RDMA, or even
short-term pins from transient device DMA, can hold off memory removal
indefinitely which means recovery may be held off indefinitely. For
System-RAM there is no facility to notify all users to evacuate, instead
the memory hot-removal code walks every page in the range to be offlined
and aborts if any single page has an elevated reference count.
It follows from the above that Isolation requires that the CXL memory
ranges to be recovered must never be online as System-RAM, the kernel
cannot offer typical memory management services for memory that is
subject to "surprise removal". Instead, device-dax is a facility that
has some properties that may allow recovery to be reliable. The
device-dax facility arranges for a given memory range to be mappable via
a device-file. This effectively allows userspace management of that
memory, but at the cost of application changes.
If, for example, the intended use case of Isolation capable CXL memory
is to place VMs that can die during an Isolation event while keeping the
rest of the system up, then that could be achieved with device-dax. For
example, allocate a device-dax instance per-VM and specify it as a
"memory-backend-file" to qemu-kvm.
Again the loss of typical core mm memory semantics and the need for
application changes raises the question of whether reboot is preferred
over Isolation recovery. Unlike System-RAM that supports anonymous
mappings disconnected from the physical memory device, device-dax
implements file-backed mappings which includes methods to reverse-map
all users and revoke their access to the memory range that has gone into
Isolation.
---
So if someone says, "yes I can tolerate losing a root port at a time and
I can tolerate deploying my workloads with userspace memory management,
and this is preferable to a reboot", then maybe Linux should entertain
CXL Error Isolation. Until such an end use case gains clear uptake it
seems too early to worry about plumbing the notification mechanism.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support
2024-02-15 23:43 ` [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Dan Williams
@ 2024-03-25 15:15 ` Ben Cheatham
2024-03-25 15:54 ` Dan Williams
0 siblings, 1 reply; 24+ messages in thread
From: Ben Cheatham @ 2024-03-25 15:15 UTC (permalink / raw)
To: Dan Williams, linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, bhelgaas, linux-mm
Hey Dan,
Sorry that I went radio silent on this, I've been thinking about the approach to take
from here on out. More below!
On 2/15/24 5:43 PM, Dan Williams wrote:
[snip]
>> The series also introduces a PCIe port bus driver dependency on the CXL
>> core. I expect to be able to remove that when when my team submits
>> patches for a future rework of the PCIe port bus driver.
>
> We have time to wait for that work to settle. Do not make the job harder
> in the near term by adding one more dependency to unwind.
>
For sure, I'll withhold any work I do that touches the port bus driver
until that's sent upstream. I'll send anything that doesn't touch
the port driver, i.e. CXL region changes, as RFC as well.
[snip]
> So if someone says, "yes I can tolerate losing a root port at a time and
> I can tolerate deploying my workloads with userspace memory management,
> and this is preferable to a reboot", then maybe Linux should entertain
> CXL Error Isolation. Until such an end use case gains clear uptake it
> seems too early to worry about plumbing the notification mechanism.
So in my mind the primary use case for this feature is in a server/data center environment.
Losing a root port and the attached memory is definitely preferable to a reboot in this environment,
so I think that isolation would still be useful even if it isn't plug-and-play.
I agree with your assessment of what has to happen before we can make this feature work. From what I understand
these are the main requirements for making isolation work:
1. The memory region can't be onlined as System RAM
2. The region needs to be mapped as device-dax
3. There need to be safeguards such that someone can't remap the region to System RAM with
error isolation enabled (added by me)
Considering these all have to do with mm, I think that's a good place to start. What I'm thinking of starting with
is adding a module parameter (or config option) to enable isolation for CXL dax regions by default, as well as
a sysfs option for toggling error isolation for the CXL region. When the module parameter is specified, the CXL
region driver would create the region as a device-dax region by default instead of onlining the region as system RAM.
The sysfs would allow toggling error isolation for CXL regions that are already device-dax.
That would cover requirements 1 & 2, but would still allow someone to reconfigure the region as a system RAM region
with error isolation still enabled using sysfs/daxctl. To make sure the this doesn't happen, my plan is to have the
CXL region driver automatically disable error isolation when the underlying memory region is going offline, then
check to make sure the memory isn't onlined as System RAM before re-enabling isolation.
So, with that in mind, isolation would most likely go from something that is enabled by default when compiled in to a
feature for correctly-configured CXL regions that has to be enabled by the end user. If that is sounds good, here's
what my roadmap would be going forward:
1. Enable creating device-dax mapped CXL regions by default
2. Add support for checking a CXL region meets the requirements for enabling isolation (i.e. all devices in
dax region(s) are device-dax)
3. Add back in the error isolation enablement and notification mechanisms in this patch series
And then after those are in place:
4. Add back in timeout mechanisms
5. Recovery flows/support
I'll also be dropping CXL.cache support until there is more support. I'd like to hear your (or anyone else's) thoughts
on this!
Thanks,
Ben
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support
2024-03-25 15:15 ` Ben Cheatham
@ 2024-03-25 15:54 ` Dan Williams
2024-04-01 19:41 ` Ben Cheatham
0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2024-03-25 15:54 UTC (permalink / raw)
To: Ben Cheatham, Dan Williams, linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, bhelgaas, linux-mm
Ben Cheatham wrote:
> Hey Dan,
>
> Sorry that I went radio silent on this, I've been thinking about the approach to take
> from here on out. More below!
>
> On 2/15/24 5:43 PM, Dan Williams wrote:
>
> [snip]
>
> >> The series also introduces a PCIe port bus driver dependency on the CXL
> >> core. I expect to be able to remove that when when my team submits
> >> patches for a future rework of the PCIe port bus driver.
> >
> > We have time to wait for that work to settle. Do not make the job harder
> > in the near term by adding one more dependency to unwind.
> >
>
> For sure, I'll withhold any work I do that touches the port bus driver
> until that's sent upstream. I'll send anything that doesn't touch
> the port driver, i.e. CXL region changes, as RFC as well.
>
> [snip]
>
> > So if someone says, "yes I can tolerate losing a root port at a time and
> > I can tolerate deploying my workloads with userspace memory management,
> > and this is preferable to a reboot", then maybe Linux should entertain
> > CXL Error Isolation. Until such an end use case gains clear uptake it
> > seems too early to worry about plumbing the notification mechanism.
>
> So in my mind the primary use case for this feature is in a
> server/data center environment. Losing a root port and the attached
> memory is definitely preferable to a reboot in this environment, so I
> think that isolation would still be useful even if it isn't
> plug-and-play.
That is not sufficient. This feature needs an implementation partner to
work through the caveats.
> I agree with your assessment of what has to happen before we can make
> this feature work. From what I understand these are the main
> requirements for making isolation work:
Requirement 1 is "Find customer".
> 1. The memory region can't be onlined as System RAM
> 2. The region needs to be mapped as device-dax
> 3. There need to be safeguards such that someone can't remap the
> region to System RAM with error isolation enabled (added by me)
No, this policy does not belong in the kernel.
> Considering these all have to do with mm, I think that's a good place
> to start. What I'm thinking of starting with is adding a module
> parameter (or config option) to enable isolation for CXL dax regions
> by default, as well as a sysfs option for toggling error isolation for
> the CXL region. When the module parameter is specified, the CXL region
> driver would create the region as a device-dax region by default
> instead of onlining the region as system RAM. The sysfs would allow
> toggling error isolation for CXL regions that are already device-dax.
No, definitely not going to invent module paramter policy for this. The
policy to not online dax regions is already available.
> That would cover requirements 1 & 2, but would still allow someone to
> reconfigure the region as a system RAM region with error isolation
> still enabled using sysfs/daxctl. To make sure the this doesn't
> happen, my plan is to have the CXL region driver automatically disable
> error isolation when the underlying memory region is going offline,
> then check to make sure the memory isn't onlined as System RAM before
> re-enabling isolation.
Why would you ever need to disable isolation? If it is present it at
least nominally allows software to keep running vs hang awaiting a
watchdog-timer reboot.
> So, with that in mind, isolation would most likely go from something
> that is enabled by default when compiled in to a feature for
> correctly-configured CXL regions that has to be enabled by the end
> user. If that is sounds good, here's what my roadmap would be going
> forward:
Again, this needs a customer to weigh the mitigations that the kernel
needs to carry.
> 1. Enable creating device-dax mapped CXL regions by default
Already exists.
> 2. Add support for checking a CXL region meets the requirements
> for enabling isolation (i.e. all devices in
> dax region(s) are device-dax)
Regions should not know or care if isolation is enabled, the
implications are identical to force unbinding the region driver.
> 3. Add back in the error isolation enablement and notification
> mechanisms in this patch series
Not until it is clear that this feature has deployment value.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support
2024-03-25 15:54 ` Dan Williams
@ 2024-04-01 19:41 ` Ben Cheatham
0 siblings, 0 replies; 24+ messages in thread
From: Ben Cheatham @ 2024-04-01 19:41 UTC (permalink / raw)
To: Dan Williams, linux-cxl, linux-pci
Cc: dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, bhelgaas, linux-mm
Sorry for the delay, I got a bit busy for a while there. Responses inline.
On 3/25/24 10:54 AM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Hey Dan,
>>
>> Sorry that I went radio silent on this, I've been thinking about the approach to take
>> from here on out. More below!
>>
>> On 2/15/24 5:43 PM, Dan Williams wrote:
>>
>> [snip]
>>
>>>> The series also introduces a PCIe port bus driver dependency on the CXL
>>>> core. I expect to be able to remove that when when my team submits
>>>> patches for a future rework of the PCIe port bus driver.
>>>
>>> We have time to wait for that work to settle. Do not make the job harder
>>> in the near term by adding one more dependency to unwind.
>>>
>>
>> For sure, I'll withhold any work I do that touches the port bus driver
>> until that's sent upstream. I'll send anything that doesn't touch
>> the port driver, i.e. CXL region changes, as RFC as well.
>>
>> [snip]
>>
>>> So if someone says, "yes I can tolerate losing a root port at a time and
>>> I can tolerate deploying my workloads with userspace memory management,
>>> and this is preferable to a reboot", then maybe Linux should entertain
>>> CXL Error Isolation. Until such an end use case gains clear uptake it
>>> seems too early to worry about plumbing the notification mechanism.
>>
>> So in my mind the primary use case for this feature is in a
>> server/data center environment. Losing a root port and the attached
>> memory is definitely preferable to a reboot in this environment, so I
>> think that isolation would still be useful even if it isn't
>> plug-and-play.
>
> That is not sufficient. This feature needs an implementation partner to
> work through the caveats.
>
Got it, I'll come back to this if/when I have a customer willing to commit
to this.
>> I agree with your assessment of what has to happen before we can make
>> this feature work. From what I understand these are the main
>> requirements for making isolation work:
>
> Requirement 1 is "Find customer".
>
>> 1. The memory region can't be onlined as System RAM
>> 2. The region needs to be mapped as device-dax
>> 3. There need to be safeguards such that someone can't remap the
>> region to System RAM with error isolation enabled (added by me)
>
> No, this policy does not belong in the kernel.
>
Understood.
>> Considering these all have to do with mm, I think that's a good place
>> to start. What I'm thinking of starting with is adding a module
>> parameter (or config option) to enable isolation for CXL dax regions
>> by default, as well as a sysfs option for toggling error isolation for
>> the CXL region. When the module parameter is specified, the CXL region
>> driver would create the region as a device-dax region by default
>> instead of onlining the region as system RAM. The sysfs would allow
>> toggling error isolation for CXL regions that are already device-dax.
>
> No, definitely not going to invent module paramter policy for this. The
> policy to not online dax regions is already available.
>
>> That would cover requirements 1 & 2, but would still allow someone to
>> reconfigure the region as a system RAM region with error isolation
>> still enabled using sysfs/daxctl. To make sure the this doesn't
>> happen, my plan is to have the CXL region driver automatically disable
>> error isolation when the underlying memory region is going offline,
>> then check to make sure the memory isn't onlined as System RAM before
>> re-enabling isolation.
>
> Why would you ever need to disable isolation? If it is present it at
> least nominally allows software to keep running vs hang awaiting a
> watchdog-timer reboot.
>
That's a good point. I think I was just looking at this too long and
got a bit lost in the sauce, I'll keep this in mind for if/when I
come back to this.
>> So, with that in mind, isolation would most likely go from something
>> that is enabled by default when compiled in to a feature for
>> correctly-configured CXL regions that has to be enabled by the end
>> user. If that is sounds good, here's what my roadmap would be going
>> forward:
>
> Again, this needs a customer to weigh the mitigations that the kernel
> needs to carry.
>
>> 1. Enable creating device-dax mapped CXL regions by default
>
> Already exists.
>
>> 2. Add support for checking a CXL region meets the requirements
>> for enabling isolation (i.e. all devices in
>> dax region(s) are device-dax)
>
> Regions should not know or care if isolation is enabled, the
> implications are identical to force unbinding the region driver.
>
Got it.
Thanks,
Ben
>> 3. Add back in the error isolation enablement and notification
>> mechanisms in this patch series
>
> Not until it is clear that this feature has deployment value.
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2024-04-01 19:41 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-15 19:40 [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 1/6] cxl/core: Add CXL Timeout & Isolation capability parsing Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 2/6] pcie/cxl_timeout: Add CXL Timeout & Isolation service driver Ben Cheatham
2024-02-15 21:13 ` Bjorn Helgaas
2024-02-15 22:21 ` Ben Cheatham
2024-02-15 22:26 ` Bjorn Helgaas
2024-02-15 19:40 ` [RFC PATCH 3/6] pcie/cxl_timeout: Add CXL.mem timeout range programming Ben Cheatham
2024-02-15 21:35 ` Bjorn Helgaas
2024-02-15 22:21 ` Ben Cheatham
2024-02-15 22:29 ` Bjorn Helgaas
2024-02-15 22:30 ` Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 4/6] pcie/cxl_timeout: Add CXL.mem error isolation support Ben Cheatham
2024-02-15 21:49 ` Bjorn Helgaas
2024-02-15 22:21 ` Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 5/6] pcie/portdrv: Add CXL MSI/-X allocation Ben Cheatham
2024-02-15 21:51 ` Bjorn Helgaas
2024-02-15 22:22 ` Ben Cheatham
2024-02-15 19:40 ` [RFC PATCH 6/6] pcie/cxl_timeout: Add CXL.mem Timeout & Isolation interrupt support Ben Cheatham
2024-02-15 21:57 ` Bjorn Helgaas
2024-02-15 22:22 ` Ben Cheatham
2024-02-15 23:43 ` [RFC PATCH 0/6] Implement initial CXL Timeout & Isolation support Dan Williams
2024-03-25 15:15 ` Ben Cheatham
2024-03-25 15:54 ` Dan Williams
2024-04-01 19:41 ` Ben Cheatham
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).