* [PATCH v5 0/3] CXL, ACPI, APEI, EINJ: Update EINJ for CXL 1.1 error types
@ 2023-09-25 20:01 Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev Ben Cheatham
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-25 20:01 UTC (permalink / raw)
To: rafael, dan.j.williams, linux-cxl, linux-acpi
Cc: bhelgaas, benjamin.cheatham, yazen.ghannam
v5 Changes:
- Removed device_add_file() in favor of using sysfs_create_group()
and an is_visible() callback
- Add setting cxl_root to NULL on cxl_acpi driver exit
- Add real path for cxl_rcrb_addr file to documentation in addition
to symlink path
This patch is a follow up to the discussion at [1], and builds on Tony's
CXL error patch at [2].
The new CXL error types will use the Memory Address field in the
SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
compliant memory-mapped downstream port. The value of the memory address
will be in the port's MMIO range, and it will not represent physical
(normal or persistent) memory.
In v2 [3], the user supplied the MMIO address for the downstream port, but
per Dan Williams' suggestion [3], the addresses are predetermined and
the user only picks the error type to inject and the downstream port to
inject into. In order to inject an error, the user write the error type
to the error_type file under the einj debugfs directory, then writes any
integer into one of the files under the cxl directory.
[1]:
Link: https://lore.kernel.org/linux-acpi/20221206205234.606073-1-Benjamin.Cheatham@amd.com/
[2]:
Link: https://lore.kernel.org/linux-cxl/CAJZ5v0hNQUfWViqxbJ5B4JCGJUuHpWWSpqpCFWPNpGuagoFbsQ@mail.gmail.com/T/#t
[3]:
Link: https://lore.kernel.org/linux-cxl/20230403151849.43408-1-Benjamin.Cheatham@amd.com/
Ben Cheatham (3):
CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
ACPI, APEI, EINJ: Update EINJ documentation
Documentation/ABI/testing/sysfs-bus-cxl | 9 +++
.../firmware-guide/acpi/apei/einj.rst | 25 ++++++-
drivers/acpi/apei/Kconfig | 2 +
drivers/acpi/apei/einj.c | 24 +++++-
drivers/cxl/acpi.c | 3 +
drivers/cxl/core/port.c | 75 +++++++++++++++++++
drivers/cxl/cxl.h | 3 +
include/linux/cxl.h | 11 +++
8 files changed, 147 insertions(+), 5 deletions(-)
create mode 100644 include/linux/cxl.h
--
2.34.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
2023-09-25 20:01 [PATCH v5 0/3] CXL, ACPI, APEI, EINJ: Update EINJ for CXL 1.1 error types Ben Cheatham
@ 2023-09-25 20:01 ` Ben Cheatham
2023-09-26 10:50 ` Jonathan Cameron
` (2 more replies)
2023-09-25 20:01 ` [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation Ben Cheatham
2 siblings, 3 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-25 20:01 UTC (permalink / raw)
To: rafael, dan.j.williams, linux-cxl, linux-acpi
Cc: bhelgaas, benjamin.cheatham, yazen.ghannam
Add cxl_rcrb_addr to the dport_dev (normally represented by a pcie
device) for CXL RCH root ports. The file will print the RCRB base
MMIO address of the root port when read and will be used by
users looking to inject CXL EINJ error types for RCH hosts.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
Documentation/ABI/testing/sysfs-bus-cxl | 9 ++++
drivers/cxl/acpi.c | 2 +
drivers/cxl/core/port.c | 58 +++++++++++++++++++++++++
drivers/cxl/cxl.h | 2 +
4 files changed, 71 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 087f762ebfd5..85621da69296 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -177,6 +177,15 @@ Description:
integer reflects the hardware port unique-id used in the
hardware decoder target list.
+What: /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
+What: /sys/devices/pciX/cxl_rcrb_addr
+Date: August, 2023
+KernelVersion: v6.6
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RO) The 'cxl_rcrb_addr' device file gives the MMIO base address
+ of the RCRB of the corresponding CXL 1.1 downstream port. Only
+ present for CXL 1.1 dports.
What: /sys/bus/cxl/devices/decoderX.Y
Date: June, 2021
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index d1c559879dcc..3e2ca946bf47 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -676,6 +676,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
if (IS_ERR(root_port))
return PTR_ERR(root_port);
+ set_cxl_root(root_port);
+
rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
add_host_bridge_dport);
if (rc < 0)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 724be8448eb4..c3914e73f67e 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -875,6 +875,14 @@ struct cxl_port *find_cxl_root(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
+static struct cxl_port *cxl_root;
+
+void set_cxl_root(struct cxl_port *root_port)
+{
+ cxl_root = root_port;
+}
+EXPORT_SYMBOL_NS_GPL(set_cxl_root, CXL);
+
static struct cxl_dport *find_dport(struct cxl_port *port, int id)
{
struct cxl_dport *dport;
@@ -930,11 +938,56 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
device_unlock(&port->dev);
}
+static ssize_t cxl_rcrb_addr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cxl_dport *dport;
+
+ if (!cxl_root)
+ return -ENODEV;
+
+ dport = cxl_find_dport_by_dev(cxl_root, dev);
+ if (!dport)
+ return -ENODEV;
+
+ return sysfs_emit(buf, "0x%llx\n", (u64) dport->rcrb.base);
+}
+DEVICE_ATTR_RO(cxl_rcrb_addr);
+
+static umode_t cxl_rcrb_addr_is_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct cxl_dport *dport;
+
+ if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ) || !cxl_root)
+ return 0;
+
+ dport = cxl_find_dport_by_dev(cxl_root, dev);
+ if (!dport || !dport->rch || dport->rcrb.base == CXL_RESOURCE_NONE)
+ return 0;
+
+ return a->mode;
+}
+
+static struct attribute *cxl_rcrb_addr_attrs[] = {
+ &dev_attr_cxl_rcrb_addr.attr,
+ NULL,
+};
+
+static const struct attribute_group cxl_rcrb_addr_group = {
+ .attrs = cxl_rcrb_addr_attrs,
+ .is_visible = cxl_rcrb_addr_is_visible,
+};
+
static void cxl_dport_remove(void *data)
{
struct cxl_dport *dport = data;
struct cxl_port *port = dport->port;
+ if (dport->rch)
+ sysfs_remove_group(&dport->dport_dev->kobj, &cxl_rcrb_addr_group);
+
xa_erase(&port->dports, (unsigned long) dport->dport_dev);
put_device(dport->dport_dev);
}
@@ -1021,6 +1074,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
if (rc)
return ERR_PTR(rc);
+ rc = sysfs_create_group(&dport_dev->kobj, &cxl_rcrb_addr_group);
+ if (rc)
+ dev_dbg(dport_dev, "Couldn't create cxl_rcrb_addr group: %d\n",
+ rc);
+
return dport;
}
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 76d92561af29..4d5bce4bae7e 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -690,6 +690,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
resource_size_t component_reg_phys,
struct cxl_dport *parent_dport);
struct cxl_port *find_cxl_root(struct cxl_port *port);
+void set_cxl_root(struct cxl_port *root_port);
+
int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
void cxl_bus_rescan(void);
void cxl_bus_drain(void);
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
2023-09-25 20:01 [PATCH v5 0/3] CXL, ACPI, APEI, EINJ: Update EINJ for CXL 1.1 error types Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev Ben Cheatham
@ 2023-09-25 20:01 ` Ben Cheatham
2023-09-26 11:04 ` Jonathan Cameron
` (2 more replies)
2023-09-25 20:01 ` [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation Ben Cheatham
2 siblings, 3 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-25 20:01 UTC (permalink / raw)
To: rafael, dan.j.williams, linux-cxl, linux-acpi
Cc: bhelgaas, benjamin.cheatham, yazen.ghannam
Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
v6.5. Because these error types target memory-mapped CXL 1.1 compliant
downstream ports and not physical (normal/persistent) memory, these
error types are not currently allowed through the memory range
validation done by the EINJ driver.
The MMIO address of a CXL 1.1 downstream port can be found in the
cxl_rcrb_addr file in the corresponding dport directory under
/sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
procedure as a memory error type, but with param1 set to the
downstream port MMIO address.
Example usage:
$ cd /sys/kernel/debug/apei/einj
$ cat available_error_type
0x00000008 Memory Correctable
0x00000010 Memory Uncorrectable non-fatal
0x00000020 Memory Uncorrectable fatal
0x00000040 PCI Express Correctable
0x00000080 PCI Express Uncorrectable non-fatal
0x00000100 PCI Express Uncorrectable fatal
0x00008000 CXL.mem Protocol Correctable
0x00020000 CXL.mem Protocol Uncorrectable fatal
$ echo 0x8000 > error_type
$ echo 0xfffffffffffff000 > param2
$ echo 0x2 > flags
$ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
0xb2f00000
$ echo 0xb2f00000 > param1
$ echo 1 > error_inject
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/acpi/apei/Kconfig | 2 ++
drivers/acpi/apei/einj.c | 24 +++++++++++++++++++++++-
drivers/cxl/acpi.c | 1 +
drivers/cxl/core/port.c | 17 +++++++++++++++++
drivers/cxl/cxl.h | 1 +
include/linux/cxl.h | 11 +++++++++++
6 files changed, 55 insertions(+), 1 deletion(-)
create mode 100644 include/linux/cxl.h
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 6b18f8bc7be3..eb9cc7157342 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,8 @@ config ACPI_APEI_MEMORY_FAILURE
config ACPI_APEI_EINJ
tristate "APEI Error INJection (EINJ)"
depends on ACPI_APEI && DEBUG_FS
+ imply CXL_BUS
+ imply CXL_ACPI
help
EINJ provides a hardware error injection mechanism, it is
mainly used for debugging and testing the other parts of
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..8000417a5f26 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -21,6 +21,7 @@
#include <linux/nmi.h>
#include <linux/delay.h>
#include <linux/mm.h>
+#include <linux/cxl.h>
#include <asm/unaligned.h>
#include "apei-internal.h"
@@ -36,6 +37,7 @@
#define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
ACPI_EINJ_MEMORY_UNCORRECTABLE | \
ACPI_EINJ_MEMORY_FATAL)
+#define CXL_ERROR_MASK GENMASK(17, 12)
/*
* ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
@@ -512,6 +514,22 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
return rc;
}
+static int is_valid_cxl_addr(u64 addr)
+{
+ struct cxl_dport *dport;
+
+ if (IS_REACHABLE(CONFIG_CXL_ACPI))
+ dport = cxl_find_rch_dport_by_rcrb((resource_size_t) addr);
+ else
+ return 0;
+
+ if (!IS_ERR_OR_NULL(dport))
+ return 1;
+
+ pr_info("Could not find dport with rcrb 0x%llx\n", addr);
+ return 0;
+}
+
/* Inject the specified hardware error */
static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
u64 param3, u64 param4)
@@ -537,8 +555,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
if (type & ACPI5_VENDOR_BIT) {
if (vendor_flags != SETWA_FLAGS_MEM)
goto inject;
- } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+ } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
goto inject;
+ } else if (type & CXL_ERROR_MASK && is_valid_cxl_addr(param1)) {
+ goto inject;
+ }
/*
* Disallow crazy address masks that give BIOS leeway to pick
@@ -807,3 +828,4 @@ module_exit(einj_exit);
MODULE_AUTHOR("Huang Ying");
MODULE_DESCRIPTION("APEI Error INJection support");
MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(CXL);
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 3e2ca946bf47..a7adf3d9814e 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -756,6 +756,7 @@ static void __exit cxl_acpi_exit(void)
{
platform_driver_unregister(&cxl_acpi_driver);
cxl_bus_drain();
+ set_cxl_root(NULL);
}
/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c3914e73f67e..cb653cb1a1ea 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1147,6 +1147,23 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, CXL);
+struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base)
+{
+ struct cxl_dport *dport;
+ unsigned long index;
+
+ if (!cxl_root)
+ return ERR_PTR(-ENODEV);
+
+ xa_for_each(&cxl_root->dports, index, dport)
+ if ((dport->rch && dport->rcrb.base != CXL_RESOURCE_NONE)
+ && dport->rcrb.base == rcrb_base)
+ return dport;
+
+ return NULL;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_find_rch_dport_by_rcrb, CXL);
+
static int add_ep(struct cxl_ep *new)
{
struct cxl_port *port = new->dport->port;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 4d5bce4bae7e..3e6779dbcd23 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -8,6 +8,7 @@
#include <linux/bitfield.h>
#include <linux/bitops.h>
#include <linux/log2.h>
+#include <linux/cxl.h>
#include <linux/io.h>
/**
diff --git a/include/linux/cxl.h b/include/linux/cxl.h
new file mode 100644
index 000000000000..6702caa78e7a
--- /dev/null
+++ b/include/linux/cxl.h
@@ -0,0 +1,11 @@
+#ifndef _LINUX_CXL_H
+#define _LINUX_CXL_H
+
+#include <linux/xarray.h>
+#include <linux/errno.h>
+
+struct cxl_dport;
+
+struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base);
+
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation
2023-09-25 20:01 [PATCH v5 0/3] CXL, ACPI, APEI, EINJ: Update EINJ for CXL 1.1 error types Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support Ben Cheatham
@ 2023-09-25 20:01 ` Ben Cheatham
2023-09-26 11:05 ` Jonathan Cameron
2023-09-26 20:24 ` Bjorn Helgaas
2 siblings, 2 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-25 20:01 UTC (permalink / raw)
To: rafael, dan.j.williams, linux-cxl, linux-acpi
Cc: bhelgaas, benjamin.cheatham, yazen.ghannam
Update EINJ documentation to include CXL errors in available_error_types
table and usage of the types.
Also fix a formatting error in the param4 file description that caused
the description to be on the same line as the bullet point.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
.../firmware-guide/acpi/apei/einj.rst | 25 ++++++++++++++++---
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index d6b61d22f525..c6f28118c48b 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -32,6 +32,9 @@ configuration::
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ
+To use CXL error types ``CONFIG_CXL_ACPI`` needs to be set to the same
+value as ``CONFIG_ACPI_APEI_EINJ`` (either "y" or "m").
+
The EINJ user interface is in <debugfs mount point>/apei/einj.
The following files belong to it:
@@ -40,9 +43,9 @@ The following files belong to it:
This file shows which error types are supported:
- ================ ===================================
+ ================ =========================================
Error Type Value Error Description
- ================ ===================================
+ ================ =========================================
0x00000001 Processor Correctable
0x00000002 Processor Uncorrectable non-fatal
0x00000004 Processor Uncorrectable fatal
@@ -55,7 +58,13 @@ The following files belong to it:
0x00000200 Platform Correctable
0x00000400 Platform Uncorrectable non-fatal
0x00000800 Platform Uncorrectable fatal
- ================ ===================================
+ 0x00001000 CXL.cache Protocol Correctable
+ 0x00002000 CXL.cache Protocol Uncorrectable non-fatal
+ 0x00004000 CXL.cache Protocol Uncorrectable fatal
+ 0x00008000 CXL.mem Protocol Correctable
+ 0x00010000 CXL.mem Protocol Uncorrectable non-fatal
+ 0x00020000 CXL.mem Protocol Uncorrectable fatal
+ ================ =========================================
The format of the file contents are as above, except present are only
the available error types.
@@ -106,6 +115,7 @@ The following files belong to it:
Used when the 0x1 bit is set in "flags" to specify the APIC id
- param4
+
Used when the 0x4 bit is set in "flags" to specify target PCIe device
- notrigger
@@ -159,6 +169,13 @@ and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
documentation for details (and expect changes to this API if vendors
creativity in using this feature expands beyond our expectations).
+CXL error types are supported from ACPI 6.5 onwards. To use these error
+types you need the MMIO address of a CXL 1.1 downstream port. You can
+find the address of dportY in /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
+(it's possible that the dport is under the CXL root, in that case the
+path would be /sys/us/cxl/devices/rootX/dportY/cxl_rcrb_addr).
+From there, write the address to param1 and continue as you would for a
+memory error type.
An error injection example::
@@ -201,4 +218,4 @@ The following sequence can be used:
7) Read from the virtual address. This will trigger the error
For more information about EINJ, please refer to ACPI specification
-version 4.0, section 17.5 and ACPI 5.0, section 18.6.
+version 4.0, section 17.5 and ACPI 6.5, section 18.6.
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
2023-09-25 20:01 ` [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev Ben Cheatham
@ 2023-09-26 10:50 ` Jonathan Cameron
2023-09-26 16:00 ` Ben Cheatham
2023-09-26 20:23 ` Bjorn Helgaas
2023-09-26 21:15 ` Dan Williams
2 siblings, 1 reply; 20+ messages in thread
From: Jonathan Cameron @ 2023-09-26 10:50 UTC (permalink / raw)
To: Ben Cheatham
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On Mon, 25 Sep 2023 15:01:25 -0500
Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> Add cxl_rcrb_addr to the dport_dev (normally represented by a pcie
> device) for CXL RCH root ports. The file will print the RCRB base
> MMIO address of the root port when read and will be used by
> users looking to inject CXL EINJ error types for RCH hosts.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Hi Ben,
I'm still not totally convinced that injecting the group via the link
onto the PCI device is necessarily a good idea, but if Bjorn is fine with
that I don't mind too much.
There is a question on whether this should also be added to the
sysfs-bus-pci docs given it turns up on a device where people might look there
first.
So with that in mind
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 9 ++++
> drivers/cxl/acpi.c | 2 +
> drivers/cxl/core/port.c | 58 +++++++++++++++++++++++++
> drivers/cxl/cxl.h | 2 +
> 4 files changed, 71 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 087f762ebfd5..85621da69296 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -177,6 +177,15 @@ Description:
> integer reflects the hardware port unique-id used in the
> hardware decoder target list.
>
> +What: /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
> +What: /sys/devices/pciX/cxl_rcrb_addr
> +Date: August, 2023
> +KernelVersion: v6.6
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (RO) The 'cxl_rcrb_addr' device file gives the MMIO base address
> + of the RCRB of the corresponding CXL 1.1 downstream port. Only
> + present for CXL 1.1 dports.
>
> What: /sys/bus/cxl/devices/decoderX.Y
> Date: June, 2021
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index d1c559879dcc..3e2ca946bf47 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -676,6 +676,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> if (IS_ERR(root_port))
> return PTR_ERR(root_port);
>
> + set_cxl_root(root_port);
> +
> rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
> add_host_bridge_dport);
> if (rc < 0)
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 724be8448eb4..c3914e73f67e 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -875,6 +875,14 @@ struct cxl_port *find_cxl_root(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
>
> +static struct cxl_port *cxl_root;
> +
> +void set_cxl_root(struct cxl_port *root_port)
> +{
> + cxl_root = root_port;
> +}
> +EXPORT_SYMBOL_NS_GPL(set_cxl_root, CXL);
> +
> static struct cxl_dport *find_dport(struct cxl_port *port, int id)
> {
> struct cxl_dport *dport;
> @@ -930,11 +938,56 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
> device_unlock(&port->dev);
> }
>
> +static ssize_t cxl_rcrb_addr_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cxl_dport *dport;
> +
> + if (!cxl_root)
> + return -ENODEV;
> +
> + dport = cxl_find_dport_by_dev(cxl_root, dev);
> + if (!dport)
> + return -ENODEV;
> +
> + return sysfs_emit(buf, "0x%llx\n", (u64) dport->rcrb.base);
> +}
> +DEVICE_ATTR_RO(cxl_rcrb_addr);
> +
> +static umode_t cxl_rcrb_addr_is_visible(struct kobject *kobj,
> + struct attribute *a, int n)
> +{
> + struct device *dev = kobj_to_dev(kobj);
> + struct cxl_dport *dport;
> +
> + if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ) || !cxl_root)
> + return 0;
> +
> + dport = cxl_find_dport_by_dev(cxl_root, dev);
> + if (!dport || !dport->rch || dport->rcrb.base == CXL_RESOURCE_NONE)
> + return 0;
> +
> + return a->mode;
> +}
> +
> +static struct attribute *cxl_rcrb_addr_attrs[] = {
> + &dev_attr_cxl_rcrb_addr.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group cxl_rcrb_addr_group = {
> + .attrs = cxl_rcrb_addr_attrs,
> + .is_visible = cxl_rcrb_addr_is_visible,
> +};
> +
> static void cxl_dport_remove(void *data)
> {
> struct cxl_dport *dport = data;
> struct cxl_port *port = dport->port;
>
> + if (dport->rch)
> + sysfs_remove_group(&dport->dport_dev->kobj, &cxl_rcrb_addr_group);
> +
> xa_erase(&port->dports, (unsigned long) dport->dport_dev);
> put_device(dport->dport_dev);
> }
> @@ -1021,6 +1074,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> if (rc)
> return ERR_PTR(rc);
>
> + rc = sysfs_create_group(&dport_dev->kobj, &cxl_rcrb_addr_group);
> + if (rc)
> + dev_dbg(dport_dev, "Couldn't create cxl_rcrb_addr group: %d\n",
> + rc);
> +
> return dport;
> }
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 76d92561af29..4d5bce4bae7e 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -690,6 +690,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
> resource_size_t component_reg_phys,
> struct cxl_dport *parent_dport);
> struct cxl_port *find_cxl_root(struct cxl_port *port);
> +void set_cxl_root(struct cxl_port *root_port);
> +
> int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
> void cxl_bus_rescan(void);
> void cxl_bus_drain(void);
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
2023-09-25 20:01 ` [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support Ben Cheatham
@ 2023-09-26 11:04 ` Jonathan Cameron
2023-09-26 16:00 ` Ben Cheatham
2023-09-26 20:22 ` Bjorn Helgaas
2023-09-26 21:36 ` Dan Williams
2 siblings, 1 reply; 20+ messages in thread
From: Jonathan Cameron @ 2023-09-26 11:04 UTC (permalink / raw)
To: Ben Cheatham
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On Mon, 25 Sep 2023 15:01:26 -0500
Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
> v6.5. Because these error types target memory-mapped CXL 1.1 compliant
> downstream ports and not physical (normal/persistent) memory, these
> error types are not currently allowed through the memory range
> validation done by the EINJ driver.
>
> The MMIO address of a CXL 1.1 downstream port can be found in the
> cxl_rcrb_addr file in the corresponding dport directory under
> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
> procedure as a memory error type, but with param1 set to the
> downstream port MMIO address.
>
> Example usage:
> $ cd /sys/kernel/debug/apei/einj
> $ cat available_error_type
> 0x00000008 Memory Correctable
> 0x00000010 Memory Uncorrectable non-fatal
> 0x00000020 Memory Uncorrectable fatal
> 0x00000040 PCI Express Correctable
> 0x00000080 PCI Express Uncorrectable non-fatal
> 0x00000100 PCI Express Uncorrectable fatal
> 0x00008000 CXL.mem Protocol Correctable
> 0x00020000 CXL.mem Protocol Uncorrectable fatal
> $ echo 0x8000 > error_type
> $ echo 0xfffffffffffff000 > param2
> $ echo 0x2 > flags
> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
> 0xb2f00000
> $ echo 0xb2f00000 > param1
> $ echo 1 > error_inject
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Hi Ben
A few trivial things inline.
Jonathan
> ---
> drivers/acpi/apei/Kconfig | 2 ++
> drivers/acpi/apei/einj.c | 24 +++++++++++++++++++++++-
> drivers/cxl/acpi.c | 1 +
> drivers/cxl/core/port.c | 17 +++++++++++++++++
> drivers/cxl/cxl.h | 1 +
> include/linux/cxl.h | 11 +++++++++++
> 6 files changed, 55 insertions(+), 1 deletion(-)
> create mode 100644 include/linux/cxl.h
>
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index 6b18f8bc7be3..eb9cc7157342 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -55,6 +55,8 @@ config ACPI_APEI_MEMORY_FAILURE
> config ACPI_APEI_EINJ
> tristate "APEI Error INJection (EINJ)"
> depends on ACPI_APEI && DEBUG_FS
> + imply CXL_BUS
> + imply CXL_ACPI
> help
> EINJ provides a hardware error injection mechanism, it is
> mainly used for debugging and testing the other parts of
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 013eb621dc92..8000417a5f26 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -21,6 +21,7 @@
> #include <linux/nmi.h>
> #include <linux/delay.h>
> #include <linux/mm.h>
> +#include <linux/cxl.h>
> #include <asm/unaligned.h>
>
> #include "apei-internal.h"
> @@ -36,6 +37,7 @@
> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> ACPI_EINJ_MEMORY_FATAL)
> +#define CXL_ERROR_MASK GENMASK(17, 12)
>
> /*
> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
> @@ -512,6 +514,22 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> return rc;
> }
>
> +static int is_valid_cxl_addr(u64 addr)
> +{
> + struct cxl_dport *dport;
> +
> + if (IS_REACHABLE(CONFIG_CXL_ACPI))
> + dport = cxl_find_rch_dport_by_rcrb((resource_size_t) addr);
> + else
> + return 0;
Flip this logic probably so the quick exit is the indented path.
if (!IS_REACHABLE(CONFIG_CXL_ACPI))
return 0
dport = ...
> +
> + if (!IS_ERR_OR_NULL(dport))
> + return 1;
> +
> + pr_info("Could not find dport with rcrb 0x%llx\n", addr);
> + return 0;
> +}
> +
> /* Inject the specified hardware error */
> static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> u64 param3, u64 param4)
> @@ -537,8 +555,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> if (type & ACPI5_VENDOR_BIT) {
> if (vendor_flags != SETWA_FLAGS_MEM)
> goto inject;
> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> goto inject;
> + } else if (type & CXL_ERROR_MASK && is_valid_cxl_addr(param1)) {
> + goto inject;
> + }
>
> /*
> * Disallow crazy address masks that give BIOS leeway to pick
> @@ -807,3 +828,4 @@ module_exit(einj_exit);
> MODULE_AUTHOR("Huang Ying");
> MODULE_DESCRIPTION("APEI Error INJection support");
> MODULE_LICENSE("GPL");
> +MODULE_IMPORT_NS(CXL);
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 3e2ca946bf47..a7adf3d9814e 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -756,6 +756,7 @@ static void __exit cxl_acpi_exit(void)
> {
> platform_driver_unregister(&cxl_acpi_driver);
> cxl_bus_drain();
> + set_cxl_root(NULL);
Why this patch rather than previous?
> }
>
> /* load before dax_hmem sees 'Soft Reserved' CXL ranges */
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index c3914e73f67e..cb653cb1a1ea 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1147,6 +1147,23 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, CXL);
>
> +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base)
> +{
> + struct cxl_dport *dport;
> + unsigned long index;
> +
> + if (!cxl_root)
> + return ERR_PTR(-ENODEV);
> +
> + xa_for_each(&cxl_root->dports, index, dport)
> + if ((dport->rch && dport->rcrb.base != CXL_RESOURCE_NONE)
> + && dport->rcrb.base == rcrb_base)
Local style in this file has && at end of line above.
> + return dport;
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_find_rch_dport_by_rcrb, CXL);
> +
> static int add_ep(struct cxl_ep *new)
> {
> struct cxl_port *port = new->dport->port;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 4d5bce4bae7e..3e6779dbcd23 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -8,6 +8,7 @@
> #include <linux/bitfield.h>
> #include <linux/bitops.h>
> #include <linux/log2.h>
> +#include <linux/cxl.h>
> #include <linux/io.h>
>
> /**
> diff --git a/include/linux/cxl.h b/include/linux/cxl.h
> new file mode 100644
> index 000000000000..6702caa78e7a
> --- /dev/null
> +++ b/include/linux/cxl.h
We have other code outside drivers/cxl that gets to the
CXL headers directly (to avoid this include being created before now).
https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/perf/cxl_pmu.c#L24
There wasn't much enthusiasm for creating a globally visible header hence
I did that ../cxl stuff for that.
Same probably applies here.
Jonathan
> @@ -0,0 +1,11 @@
> +#ifndef _LINUX_CXL_H
> +#define _LINUX_CXL_H
> +
> +#include <linux/xarray.h>
> +#include <linux/errno.h>
> +
> +struct cxl_dport;
> +
> +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base);
> +
> +#endif
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation
2023-09-25 20:01 ` [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation Ben Cheatham
@ 2023-09-26 11:05 ` Jonathan Cameron
2023-09-26 16:00 ` Ben Cheatham
2023-09-26 20:24 ` Bjorn Helgaas
1 sibling, 1 reply; 20+ messages in thread
From: Jonathan Cameron @ 2023-09-26 11:05 UTC (permalink / raw)
To: Ben Cheatham
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On Mon, 25 Sep 2023 15:01:27 -0500
Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> Update EINJ documentation to include CXL errors in available_error_types
> table and usage of the types.
>
> Also fix a formatting error in the param4 file description that caused
> the description to be on the same line as the bullet point.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
A trivial comment inline.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> .../firmware-guide/acpi/apei/einj.rst | 25 ++++++++++++++++---
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
> index d6b61d22f525..c6f28118c48b 100644
> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
> @@ -32,6 +32,9 @@ configuration::
> CONFIG_ACPI_APEI
> CONFIG_ACPI_APEI_EINJ
>
> +To use CXL error types ``CONFIG_CXL_ACPI`` needs to be set to the same
> +value as ``CONFIG_ACPI_APEI_EINJ`` (either "y" or "m").
> +
> The EINJ user interface is in <debugfs mount point>/apei/einj.
>
> The following files belong to it:
> @@ -40,9 +43,9 @@ The following files belong to it:
>
> This file shows which error types are supported:
>
> - ================ ===================================
> + ================ =========================================
> Error Type Value Error Description
> - ================ ===================================
> + ================ =========================================
> 0x00000001 Processor Correctable
> 0x00000002 Processor Uncorrectable non-fatal
> 0x00000004 Processor Uncorrectable fatal
> @@ -55,7 +58,13 @@ The following files belong to it:
> 0x00000200 Platform Correctable
> 0x00000400 Platform Uncorrectable non-fatal
> 0x00000800 Platform Uncorrectable fatal
> - ================ ===================================
> + 0x00001000 CXL.cache Protocol Correctable
> + 0x00002000 CXL.cache Protocol Uncorrectable non-fatal
> + 0x00004000 CXL.cache Protocol Uncorrectable fatal
> + 0x00008000 CXL.mem Protocol Correctable
> + 0x00010000 CXL.mem Protocol Uncorrectable non-fatal
> + 0x00020000 CXL.mem Protocol Uncorrectable fatal
> + ================ =========================================
>
> The format of the file contents are as above, except present are only
> the available error types.
> @@ -106,6 +115,7 @@ The following files belong to it:
> Used when the 0x1 bit is set in "flags" to specify the APIC id
>
> - param4
> +
#Unrelated change. Probably reasonable but should be separate patch really.
> Used when the 0x4 bit is set in "flags" to specify target PCIe device
>
> - notrigger
> @@ -159,6 +169,13 @@ and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
> documentation for details (and expect changes to this API if vendors
> creativity in using this feature expands beyond our expectations).
>
> +CXL error types are supported from ACPI 6.5 onwards. To use these error
> +types you need the MMIO address of a CXL 1.1 downstream port. You can
> +find the address of dportY in /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
> +(it's possible that the dport is under the CXL root, in that case the
> +path would be /sys/us/cxl/devices/rootX/dportY/cxl_rcrb_addr).
> +From there, write the address to param1 and continue as you would for a
> +memory error type.
>
> An error injection example::
>
> @@ -201,4 +218,4 @@ The following sequence can be used:
> 7) Read from the virtual address. This will trigger the error
>
> For more information about EINJ, please refer to ACPI specification
> -version 4.0, section 17.5 and ACPI 5.0, section 18.6.
> +version 4.0, section 17.5 and ACPI 6.5, section 18.6.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
2023-09-26 10:50 ` Jonathan Cameron
@ 2023-09-26 16:00 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-26 16:00 UTC (permalink / raw)
To: Jonathan Cameron
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
Hi Jonathan, thanks for the review! Responses inline.
On 9/26/23 5:50 AM, Jonathan Cameron wrote:
> On Mon, 25 Sep 2023 15:01:25 -0500
> Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
>
>> Add cxl_rcrb_addr to the dport_dev (normally represented by a pcie
>> device) for CXL RCH root ports. The file will print the RCRB base
>> MMIO address of the root port when read and will be used by
>> users looking to inject CXL EINJ error types for RCH hosts.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> Hi Ben,
>
> I'm still not totally convinced that injecting the group via the link
> onto the PCI device is necessarily a good idea, but if Bjorn is fine with
> that I don't mind too much.
>
I'm not 100% convinced it's great either, but it was the cleanest implementation
I could come up with at the time. I'll try and see if I can come up with something
better if I get some extra time in the near future, otherwise I'll leave it as is
if Bjorn approves.
> There is a question on whether this should also be added to the
> sysfs-bus-pci docs given it turns up on a device where people might look there
> first.
>
I agree, I'll move it to sysfs-bus-pci. I didn't move it this revision because the device
the file was on (pciX) wasn't under sys/bus/pci on my test system.
> So with that in mind
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
>> ---
>> Documentation/ABI/testing/sysfs-bus-cxl | 9 ++++
>> drivers/cxl/acpi.c | 2 +
>> drivers/cxl/core/port.c | 58 +++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 2 +
>> 4 files changed, 71 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
>> index 087f762ebfd5..85621da69296 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-cxl
>> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
>> @@ -177,6 +177,15 @@ Description:
>> integer reflects the hardware port unique-id used in the
>> hardware decoder target list.
>>
>> +What: /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
>> +What: /sys/devices/pciX/cxl_rcrb_addr
>> +Date: August, 2023
>> +KernelVersion: v6.6
>> +Contact: linux-cxl@vger.kernel.org
>> +Description:
>> + (RO) The 'cxl_rcrb_addr' device file gives the MMIO base address
>> + of the RCRB of the corresponding CXL 1.1 downstream port. Only
>> + present for CXL 1.1 dports.
>>
>> What: /sys/bus/cxl/devices/decoderX.Y
>> Date: June, 2021
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index d1c559879dcc..3e2ca946bf47 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -676,6 +676,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>> if (IS_ERR(root_port))
>> return PTR_ERR(root_port);
>>
>> + set_cxl_root(root_port);
>> +
>> rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
>> add_host_bridge_dport);
>> if (rc < 0)
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 724be8448eb4..c3914e73f67e 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -875,6 +875,14 @@ struct cxl_port *find_cxl_root(struct cxl_port *port)
>> }
>> EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
>>
>> +static struct cxl_port *cxl_root;
>> +
>> +void set_cxl_root(struct cxl_port *root_port)
>> +{
>> + cxl_root = root_port;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(set_cxl_root, CXL);
>> +
>> static struct cxl_dport *find_dport(struct cxl_port *port, int id)
>> {
>> struct cxl_dport *dport;
>> @@ -930,11 +938,56 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
>> device_unlock(&port->dev);
>> }
>>
>> +static ssize_t cxl_rcrb_addr_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct cxl_dport *dport;
>> +
>> + if (!cxl_root)
>> + return -ENODEV;
>> +
>> + dport = cxl_find_dport_by_dev(cxl_root, dev);
>> + if (!dport)
>> + return -ENODEV;
>> +
>> + return sysfs_emit(buf, "0x%llx\n", (u64) dport->rcrb.base);
>> +}
>> +DEVICE_ATTR_RO(cxl_rcrb_addr);
>> +
>> +static umode_t cxl_rcrb_addr_is_visible(struct kobject *kobj,
>> + struct attribute *a, int n)
>> +{
>> + struct device *dev = kobj_to_dev(kobj);
>> + struct cxl_dport *dport;
>> +
>> + if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ) || !cxl_root)
>> + return 0;
>> +
>> + dport = cxl_find_dport_by_dev(cxl_root, dev);
>> + if (!dport || !dport->rch || dport->rcrb.base == CXL_RESOURCE_NONE)
>> + return 0;
>> +
>> + return a->mode;
>> +}
>> +
>> +static struct attribute *cxl_rcrb_addr_attrs[] = {
>> + &dev_attr_cxl_rcrb_addr.attr,
>> + NULL,
>> +};
>> +
>> +static const struct attribute_group cxl_rcrb_addr_group = {
>> + .attrs = cxl_rcrb_addr_attrs,
>> + .is_visible = cxl_rcrb_addr_is_visible,
>> +};
>> +
>> static void cxl_dport_remove(void *data)
>> {
>> struct cxl_dport *dport = data;
>> struct cxl_port *port = dport->port;
>>
>> + if (dport->rch)
>> + sysfs_remove_group(&dport->dport_dev->kobj, &cxl_rcrb_addr_group);
>> +
>> xa_erase(&port->dports, (unsigned long) dport->dport_dev);
>> put_device(dport->dport_dev);
>> }
>> @@ -1021,6 +1074,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
>> if (rc)
>> return ERR_PTR(rc);
>>
>> + rc = sysfs_create_group(&dport_dev->kobj, &cxl_rcrb_addr_group);
>> + if (rc)
>> + dev_dbg(dport_dev, "Couldn't create cxl_rcrb_addr group: %d\n",
>> + rc);
>> +
>> return dport;
>> }
>>
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 76d92561af29..4d5bce4bae7e 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -690,6 +690,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
>> resource_size_t component_reg_phys,
>> struct cxl_dport *parent_dport);
>> struct cxl_port *find_cxl_root(struct cxl_port *port);
>> +void set_cxl_root(struct cxl_port *root_port);
>> +
>> int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
>> void cxl_bus_rescan(void);
>> void cxl_bus_drain(void);
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
2023-09-26 11:04 ` Jonathan Cameron
@ 2023-09-26 16:00 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-26 16:00 UTC (permalink / raw)
To: Jonathan Cameron
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On 9/26/23 6:04 AM, Jonathan Cameron wrote:
> On Mon, 25 Sep 2023 15:01:26 -0500
> Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
>
>> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
>> v6.5. Because these error types target memory-mapped CXL 1.1 compliant
>> downstream ports and not physical (normal/persistent) memory, these
>> error types are not currently allowed through the memory range
>> validation done by the EINJ driver.
>>
>> The MMIO address of a CXL 1.1 downstream port can be found in the
>> cxl_rcrb_addr file in the corresponding dport directory under
>> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
>> procedure as a memory error type, but with param1 set to the
>> downstream port MMIO address.
>>
>> Example usage:
>> $ cd /sys/kernel/debug/apei/einj
>> $ cat available_error_type
>> 0x00000008 Memory Correctable
>> 0x00000010 Memory Uncorrectable non-fatal
>> 0x00000020 Memory Uncorrectable fatal
>> 0x00000040 PCI Express Correctable
>> 0x00000080 PCI Express Uncorrectable non-fatal
>> 0x00000100 PCI Express Uncorrectable fatal
>> 0x00008000 CXL.mem Protocol Correctable
>> 0x00020000 CXL.mem Protocol Uncorrectable fatal
>> $ echo 0x8000 > error_type
>> $ echo 0xfffffffffffff000 > param2
>> $ echo 0x2 > flags
>> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
>> 0xb2f00000
>> $ echo 0xb2f00000 > param1
>> $ echo 1 > error_inject
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> Hi Ben
>
> A few trivial things inline.
>
> Jonathan
>
>> ---
>> drivers/acpi/apei/Kconfig | 2 ++
>> drivers/acpi/apei/einj.c | 24 +++++++++++++++++++++++-
>> drivers/cxl/acpi.c | 1 +
>> drivers/cxl/core/port.c | 17 +++++++++++++++++
>> drivers/cxl/cxl.h | 1 +
>> include/linux/cxl.h | 11 +++++++++++
>> 6 files changed, 55 insertions(+), 1 deletion(-)
>> create mode 100644 include/linux/cxl.h
>>
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index 6b18f8bc7be3..eb9cc7157342 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -55,6 +55,8 @@ config ACPI_APEI_MEMORY_FAILURE
>> config ACPI_APEI_EINJ
>> tristate "APEI Error INJection (EINJ)"
>> depends on ACPI_APEI && DEBUG_FS
>> + imply CXL_BUS
>> + imply CXL_ACPI
>> help
>> EINJ provides a hardware error injection mechanism, it is
>> mainly used for debugging and testing the other parts of
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 013eb621dc92..8000417a5f26 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -21,6 +21,7 @@
>> #include <linux/nmi.h>
>> #include <linux/delay.h>
>> #include <linux/mm.h>
>> +#include <linux/cxl.h>
>> #include <asm/unaligned.h>
>>
>> #include "apei-internal.h"
>> @@ -36,6 +37,7 @@
>> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
>> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>> ACPI_EINJ_MEMORY_FATAL)
>> +#define CXL_ERROR_MASK GENMASK(17, 12)
>>
>> /*
>> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>> @@ -512,6 +514,22 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>> return rc;
>> }
>>
>> +static int is_valid_cxl_addr(u64 addr)
>> +{
>> + struct cxl_dport *dport;
>> +
>> + if (IS_REACHABLE(CONFIG_CXL_ACPI))
>> + dport = cxl_find_rch_dport_by_rcrb((resource_size_t) addr);
>> + else
>> + return 0;
> Flip this logic probably so the quick exit is the indented path.
>
> if (!IS_REACHABLE(CONFIG_CXL_ACPI))
> return 0
>
> dport = ...
Will do.
>
>> +
>> + if (!IS_ERR_OR_NULL(dport))
>> + return 1;
>> +
>> + pr_info("Could not find dport with rcrb 0x%llx\n", addr);
>> + return 0;
>> +}
>> +
>> /* Inject the specified hardware error */
>> static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>> u64 param3, u64 param4)
>> @@ -537,8 +555,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>> if (type & ACPI5_VENDOR_BIT) {
>> if (vendor_flags != SETWA_FLAGS_MEM)
>> goto inject;
>> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
>> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
>> goto inject;
>> + } else if (type & CXL_ERROR_MASK && is_valid_cxl_addr(param1)) {
>> + goto inject;
>> + }
>>
>> /*
>> * Disallow crazy address masks that give BIOS leeway to pick
>> @@ -807,3 +828,4 @@ module_exit(einj_exit);
>> MODULE_AUTHOR("Huang Ying");
>> MODULE_DESCRIPTION("APEI Error INJection support");
>> MODULE_LICENSE("GPL");
>> +MODULE_IMPORT_NS(CXL);
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 3e2ca946bf47..a7adf3d9814e 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -756,6 +756,7 @@ static void __exit cxl_acpi_exit(void)
>> {
>> platform_driver_unregister(&cxl_acpi_driver);
>> cxl_bus_drain();
>> + set_cxl_root(NULL);
>
> Why this patch rather than previous?
>
Good point. I added this a bit late and seem to have accidentally
put it in the wrong patch. I'll move it into the previous.
>> }
>>
>> /* load before dax_hmem sees 'Soft Reserved' CXL ranges */
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index c3914e73f67e..cb653cb1a1ea 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -1147,6 +1147,23 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
>> }
>> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, CXL);
>>
>> +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base)
>> +{
>> + struct cxl_dport *dport;
>> + unsigned long index;
>> +
>> + if (!cxl_root)
>> + return ERR_PTR(-ENODEV);
>> +
>> + xa_for_each(&cxl_root->dports, index, dport)
>> + if ((dport->rch && dport->rcrb.base != CXL_RESOURCE_NONE)
>> + && dport->rcrb.base == rcrb_base)
>
> Local style in this file has && at end of line above.
>
I'll fix that, thanks for pointing it out.
>> + return dport;
>> +
>> + return NULL;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_find_rch_dport_by_rcrb, CXL);
>> +
>> static int add_ep(struct cxl_ep *new)
>> {
>> struct cxl_port *port = new->dport->port;
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 4d5bce4bae7e..3e6779dbcd23 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -8,6 +8,7 @@
>> #include <linux/bitfield.h>
>> #include <linux/bitops.h>
>> #include <linux/log2.h>
>> +#include <linux/cxl.h>
>> #include <linux/io.h>
>>
>> /**
>> diff --git a/include/linux/cxl.h b/include/linux/cxl.h
>> new file mode 100644
>> index 000000000000..6702caa78e7a
>> --- /dev/null
>> +++ b/include/linux/cxl.h
>
> We have other code outside drivers/cxl that gets to the
> CXL headers directly (to avoid this include being created before now).
>
> https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/perf/cxl_pmu.c#L24
>
> There wasn't much enthusiasm for creating a globally visible header hence
> I did that ../cxl stuff for that.
>
> Same probably applies here.
>
I'll look into using the private header. Thanks!
> Jonathan
>
>> @@ -0,0 +1,11 @@
>> +#ifndef _LINUX_CXL_H
>> +#define _LINUX_CXL_H
>> +
>> +#include <linux/xarray.h>
>> +#include <linux/errno.h>
>> +
>> +struct cxl_dport;
>> +
>> +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base);
>> +
>> +#endif
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation
2023-09-26 11:05 ` Jonathan Cameron
@ 2023-09-26 16:00 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-26 16:00 UTC (permalink / raw)
To: Jonathan Cameron
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On 9/26/23 6:05 AM, Jonathan Cameron wrote:
> On Mon, 25 Sep 2023 15:01:27 -0500
> Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
>
>> Update EINJ documentation to include CXL errors in available_error_types
>> table and usage of the types.
>>
>> Also fix a formatting error in the param4 file description that caused
>> the description to be on the same line as the bullet point.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> A trivial comment inline.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
>> ---
>> .../firmware-guide/acpi/apei/einj.rst | 25 ++++++++++++++++---
>> 1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
>> index d6b61d22f525..c6f28118c48b 100644
>> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
>> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
>> @@ -32,6 +32,9 @@ configuration::
>> CONFIG_ACPI_APEI
>> CONFIG_ACPI_APEI_EINJ
>>
>> +To use CXL error types ``CONFIG_CXL_ACPI`` needs to be set to the same
>> +value as ``CONFIG_ACPI_APEI_EINJ`` (either "y" or "m").
>> +
>> The EINJ user interface is in <debugfs mount point>/apei/einj.
>>
>> The following files belong to it:
>> @@ -40,9 +43,9 @@ The following files belong to it:
>>
>> This file shows which error types are supported:
>>
>> - ================ ===================================
>> + ================ =========================================
>> Error Type Value Error Description
>> - ================ ===================================
>> + ================ =========================================
>> 0x00000001 Processor Correctable
>> 0x00000002 Processor Uncorrectable non-fatal
>> 0x00000004 Processor Uncorrectable fatal
>> @@ -55,7 +58,13 @@ The following files belong to it:
>> 0x00000200 Platform Correctable
>> 0x00000400 Platform Uncorrectable non-fatal
>> 0x00000800 Platform Uncorrectable fatal
>> - ================ ===================================
>> + 0x00001000 CXL.cache Protocol Correctable
>> + 0x00002000 CXL.cache Protocol Uncorrectable non-fatal
>> + 0x00004000 CXL.cache Protocol Uncorrectable fatal
>> + 0x00008000 CXL.mem Protocol Correctable
>> + 0x00010000 CXL.mem Protocol Uncorrectable non-fatal
>> + 0x00020000 CXL.mem Protocol Uncorrectable fatal
>> + ================ =========================================
>>
>> The format of the file contents are as above, except present are only
>> the available error types.
>> @@ -106,6 +115,7 @@ The following files belong to it:
>> Used when the 0x1 bit is set in "flags" to specify the APIC id
>>
>> - param4
>> +
>
> #Unrelated change. Probably reasonable but should be separate patch really.
>
I'll take that out.
Thanks,
Ben
>> Used when the 0x4 bit is set in "flags" to specify target PCIe device
>>
>> - notrigger
>> @@ -159,6 +169,13 @@ and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
>> documentation for details (and expect changes to this API if vendors
>> creativity in using this feature expands beyond our expectations).
>>
>> +CXL error types are supported from ACPI 6.5 onwards. To use these error
>> +types you need the MMIO address of a CXL 1.1 downstream port. You can
>> +find the address of dportY in /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
>> +(it's possible that the dport is under the CXL root, in that case the
>> +path would be /sys/us/cxl/devices/rootX/dportY/cxl_rcrb_addr).
>> +From there, write the address to param1 and continue as you would for a
>> +memory error type.
>>
>> An error injection example::
>>
>> @@ -201,4 +218,4 @@ The following sequence can be used:
>> 7) Read from the virtual address. This will trigger the error
>>
>> For more information about EINJ, please refer to ACPI specification
>> -version 4.0, section 17.5 and ACPI 5.0, section 18.6.
>> +version 4.0, section 17.5 and ACPI 6.5, section 18.6.
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
2023-09-25 20:01 ` [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support Ben Cheatham
2023-09-26 11:04 ` Jonathan Cameron
@ 2023-09-26 20:22 ` Bjorn Helgaas
2023-09-27 15:31 ` Ben Cheatham
2023-09-26 21:36 ` Dan Williams
2 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2023-09-26 20:22 UTC (permalink / raw)
To: Ben Cheatham
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On Mon, Sep 25, 2023 at 03:01:26PM -0500, Ben Cheatham wrote:
> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
> v6.5. Because these error types target memory-mapped CXL 1.1 compliant
> downstream ports and not physical (normal/persistent) memory, these
> error types are not currently allowed through the memory range
> validation done by the EINJ driver.
>
> The MMIO address of a CXL 1.1 downstream port can be found in the
> cxl_rcrb_addr file in the corresponding dport directory under
> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
> procedure as a memory error type, but with param1 set to the
> downstream port MMIO address.
>
> Example usage:
> $ cd /sys/kernel/debug/apei/einj
> $ cat available_error_type
> 0x00000008 Memory Correctable
> 0x00000010 Memory Uncorrectable non-fatal
> 0x00000020 Memory Uncorrectable fatal
> 0x00000040 PCI Express Correctable
> 0x00000080 PCI Express Uncorrectable non-fatal
> 0x00000100 PCI Express Uncorrectable fatal
> 0x00008000 CXL.mem Protocol Correctable
> 0x00020000 CXL.mem Protocol Uncorrectable fatal
> $ echo 0x8000 > error_type
> $ echo 0xfffffffffffff000 > param2
> $ echo 0x2 > flags
> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
> 0xb2f00000
> $ echo 0xb2f00000 > param1
It seems sort of unpleasant to expose the physical base address of the
RCRB. Is there any way to use a device address or other logical
identifier instead and keep the actual MMIO address internal? E.g., a
PCI device has a <domain>:<bus>:<device>.<function> address. I assume
CXL addresses would look similar?
> $ echo 1 > error_inject
> ...
> +static int is_valid_cxl_addr(u64 addr)
Maybe the function name should include a hint that this should be an
RCRB address? But I don't claim to know the CXL architecture or
nomenclature.
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
2023-09-25 20:01 ` [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev Ben Cheatham
2023-09-26 10:50 ` Jonathan Cameron
@ 2023-09-26 20:23 ` Bjorn Helgaas
2023-09-27 15:30 ` Ben Cheatham
2023-09-26 21:15 ` Dan Williams
2 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2023-09-26 20:23 UTC (permalink / raw)
To: Ben Cheatham
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On Mon, Sep 25, 2023 at 03:01:25PM -0500, Ben Cheatham wrote:
> Add cxl_rcrb_addr to the dport_dev (normally represented by a pcie
> device) for CXL RCH root ports. The file will print the RCRB base
> MMIO address of the root port when read and will be used by
> users looking to inject CXL EINJ error types for RCH hosts.
I guess this is talking about a sysfs file? If so, maybe mention that
explicitly in the subject and commit log.
I don't know how you decided to capitalize CXL initialisms and not
"pcie", but I usually use "PCIe".
> +static struct cxl_port *cxl_root;
> +
> +void set_cxl_root(struct cxl_port *root_port)
> +{
> + cxl_root = root_port;
> +}
Is there always at most one cxl_root? Seems worth a one-line comment
at the static data item, since in the world of devices, data is
usually in a per-device struct, not in a single static item.
> @@ -1021,6 +1074,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> if (rc)
> return ERR_PTR(rc);
>
> + rc = sysfs_create_group(&dport_dev->kobj, &cxl_rcrb_addr_group);
> + if (rc)
> + dev_dbg(dport_dev, "Couldn't create cxl_rcrb_addr group: %d\n",
> + rc);
Is there any way to create this with an attribute group that the sysfs
infrastructure adds automatically? I'm not suggesting you have a race
condition here, but using the sysfs infrastructure avoids a lot of
potential problems.
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation
2023-09-25 20:01 ` [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation Ben Cheatham
2023-09-26 11:05 ` Jonathan Cameron
@ 2023-09-26 20:24 ` Bjorn Helgaas
2023-09-27 15:31 ` Ben Cheatham
1 sibling, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2023-09-26 20:24 UTC (permalink / raw)
To: Ben Cheatham
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On Mon, Sep 25, 2023 at 03:01:27PM -0500, Ben Cheatham wrote:
> Update EINJ documentation to include CXL errors in available_error_types
> table and usage of the types.
>
> Also fix a formatting error in the param4 file description that caused
> the description to be on the same line as the bullet point.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
> .../firmware-guide/acpi/apei/einj.rst | 25 ++++++++++++++++---
> 1 file changed, 21 insertions(+), 4 deletions(-)
I always feel like the documentation update should be in the same
patch as the new functionality so it's easy to match up with the code
and keep things together when backporting. But I know that sentiment
is not universal and maybe there's good reason to keep them separate.
> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
> index d6b61d22f525..c6f28118c48b 100644
> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
> @@ -32,6 +32,9 @@ configuration::
> CONFIG_ACPI_APEI
> CONFIG_ACPI_APEI_EINJ
>
> +To use CXL error types ``CONFIG_CXL_ACPI`` needs to be set to the same
> +value as ``CONFIG_ACPI_APEI_EINJ`` (either "y" or "m").
> ...
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
2023-09-25 20:01 ` [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev Ben Cheatham
2023-09-26 10:50 ` Jonathan Cameron
2023-09-26 20:23 ` Bjorn Helgaas
@ 2023-09-26 21:15 ` Dan Williams
2023-09-27 15:31 ` Ben Cheatham
2 siblings, 1 reply; 20+ messages in thread
From: Dan Williams @ 2023-09-26 21:15 UTC (permalink / raw)
To: Ben Cheatham, rafael, dan.j.williams, linux-cxl, linux-acpi
Cc: bhelgaas, benjamin.cheatham, yazen.ghannam
Ben Cheatham wrote:
> Add cxl_rcrb_addr to the dport_dev (normally represented by a pcie
> device) for CXL RCH root ports. The file will print the RCRB base
> MMIO address of the root port when read and will be used by
> users looking to inject CXL EINJ error types for RCH hosts.
RCRB is an implementation detail of RCH topologies, I don't see why
userspace needs this information, maybe it becomes clearer in the follow
on patches, but I would hope this detail could be hidden.
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 9 ++++
> drivers/cxl/acpi.c | 2 +
> drivers/cxl/core/port.c | 58 +++++++++++++++++++++++++
> drivers/cxl/cxl.h | 2 +
> 4 files changed, 71 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 087f762ebfd5..85621da69296 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -177,6 +177,15 @@ Description:
> integer reflects the hardware port unique-id used in the
> hardware decoder target list.
>
> +What: /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
> +What: /sys/devices/pciX/cxl_rcrb_addr
> +Date: August, 2023
> +KernelVersion: v6.6
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (RO) The 'cxl_rcrb_addr' device file gives the MMIO base address
> + of the RCRB of the corresponding CXL 1.1 downstream port. Only
> + present for CXL 1.1 dports.
>
> What: /sys/bus/cxl/devices/decoderX.Y
> Date: June, 2021
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index d1c559879dcc..3e2ca946bf47 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -676,6 +676,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> if (IS_ERR(root_port))
> return PTR_ERR(root_port);
>
> + set_cxl_root(root_port);
> +
The cxl_root is not a singleton and the way to determine this linkage is
by walking up the port hierarchy. See find_cxl_root().
> rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
> add_host_bridge_dport);
> if (rc < 0)
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 724be8448eb4..c3914e73f67e 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -875,6 +875,14 @@ struct cxl_port *find_cxl_root(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
>
> +static struct cxl_port *cxl_root;
> +
> +void set_cxl_root(struct cxl_port *root_port)
> +{
> + cxl_root = root_port;
> +}
> +EXPORT_SYMBOL_NS_GPL(set_cxl_root, CXL);
> +
> static struct cxl_dport *find_dport(struct cxl_port *port, int id)
> {
> struct cxl_dport *dport;
> @@ -930,11 +938,56 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
> device_unlock(&port->dev);
> }
>
> +static ssize_t cxl_rcrb_addr_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cxl_dport *dport;
> +
> + if (!cxl_root)
> + return -ENODEV;
> +
> + dport = cxl_find_dport_by_dev(cxl_root, dev);
> + if (!dport)
> + return -ENODEV;
> +
> + return sysfs_emit(buf, "0x%llx\n", (u64) dport->rcrb.base);
> +}
> +DEVICE_ATTR_RO(cxl_rcrb_addr);
> +
> +static umode_t cxl_rcrb_addr_is_visible(struct kobject *kobj,
> + struct attribute *a, int n)
> +{
> + struct device *dev = kobj_to_dev(kobj);
> + struct cxl_dport *dport;
> +
> + if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ) || !cxl_root)
> + return 0;
> +
> + dport = cxl_find_dport_by_dev(cxl_root, dev);
> + if (!dport || !dport->rch || dport->rcrb.base == CXL_RESOURCE_NONE)
> + return 0;
> +
> + return a->mode;
> +}
> +
> +static struct attribute *cxl_rcrb_addr_attrs[] = {
> + &dev_attr_cxl_rcrb_addr.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group cxl_rcrb_addr_group = {
> + .attrs = cxl_rcrb_addr_attrs,
> + .is_visible = cxl_rcrb_addr_is_visible,
> +};
> +
> static void cxl_dport_remove(void *data)
> {
> struct cxl_dport *dport = data;
> struct cxl_port *port = dport->port;
>
> + if (dport->rch)
> + sysfs_remove_group(&dport->dport_dev->kobj, &cxl_rcrb_addr_group);
> +
> xa_erase(&port->dports, (unsigned long) dport->dport_dev);
> put_device(dport->dport_dev);
> }
> @@ -1021,6 +1074,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> if (rc)
> return ERR_PTR(rc);
>
> + rc = sysfs_create_group(&dport_dev->kobj, &cxl_rcrb_addr_group);
> + if (rc)
> + dev_dbg(dport_dev, "Couldn't create cxl_rcrb_addr group: %d\n",
> + rc);
Please no dynamic sysfs attribute registration. If this attribute is
needed it should be static.
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
2023-09-25 20:01 ` [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support Ben Cheatham
2023-09-26 11:04 ` Jonathan Cameron
2023-09-26 20:22 ` Bjorn Helgaas
@ 2023-09-26 21:36 ` Dan Williams
2023-09-27 15:32 ` Ben Cheatham
2 siblings, 1 reply; 20+ messages in thread
From: Dan Williams @ 2023-09-26 21:36 UTC (permalink / raw)
To: Ben Cheatham, rafael, dan.j.williams, linux-cxl, linux-acpi
Cc: bhelgaas, benjamin.cheatham, yazen.ghannam
Ben Cheatham wrote:
> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
> v6.5. Because these error types target memory-mapped CXL 1.1 compliant
> downstream ports and not physical (normal/persistent) memory, these
> error types are not currently allowed through the memory range
> validation done by the EINJ driver.
>
> The MMIO address of a CXL 1.1 downstream port can be found in the
> cxl_rcrb_addr file in the corresponding dport directory under
> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
> procedure as a memory error type, but with param1 set to the
> downstream port MMIO address.
>
> Example usage:
> $ cd /sys/kernel/debug/apei/einj
> $ cat available_error_type
> 0x00000008 Memory Correctable
> 0x00000010 Memory Uncorrectable non-fatal
> 0x00000020 Memory Uncorrectable fatal
> 0x00000040 PCI Express Correctable
> 0x00000080 PCI Express Uncorrectable non-fatal
> 0x00000100 PCI Express Uncorrectable fatal
> 0x00008000 CXL.mem Protocol Correctable
> 0x00020000 CXL.mem Protocol Uncorrectable fatal
> $ echo 0x8000 > error_type
> $ echo 0xfffffffffffff000 > param2
> $ echo 0x2 > flags
> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
> 0xb2f00000
> $ echo 0xb2f00000 > param1
> $ echo 1 > error_inject
I have the same reaction to this as I did before:
http://lore.kernel.org/r/647817212bcf1_e067a2945@dwillia2-xfh.jf.intel.com.notmuch
Why is per-port error injection being driven from this legacy global
interface where userspace needs to take information from sysfs and walk
it over to this other interface? Especially since "rcrb" is an
implementation detail that will be invalidated with CXL VH topologies?
What I would like to see, since this is a new capability with no need to
be beholden to legacy is to disaggregate the interface to be per-port.
For example:
/sys/kernel/debug/cxl/$mem/{inject,clear}_poison is already established
for memory device poison injection. Why not add something like:
/sys/kernel/debug/cxl/$port/einj_{type,inject}
For triggering errors by the CXL subsystem device name, and unburden
userspace from needing to deal in magic numbers.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
2023-09-26 20:23 ` Bjorn Helgaas
@ 2023-09-27 15:30 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-27 15:30 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
Hi Bjorn, thanks for taking a look!
On 9/26/23 3:23 PM, Bjorn Helgaas wrote:
> On Mon, Sep 25, 2023 at 03:01:25PM -0500, Ben Cheatham wrote:
>> Add cxl_rcrb_addr to the dport_dev (normally represented by a pcie
>> device) for CXL RCH root ports. The file will print the RCRB base
>> MMIO address of the root port when read and will be used by
>> users looking to inject CXL EINJ error types for RCH hosts.
>
> I guess this is talking about a sysfs file? If so, maybe mention that
> explicitly in the subject and commit log.
>
> I don't know how you decided to capitalize CXL initialisms and not
> "pcie", but I usually use "PCIe".
>
Sorry about that, it's a typo. I should've written PCIE/PCIe.
>> +static struct cxl_port *cxl_root;
>> +
>> +void set_cxl_root(struct cxl_port *root_port)
>> +{
>> + cxl_root = root_port;
>> +}
>
> Is there always at most one cxl_root? Seems worth a one-line comment
> at the static data item, since in the world of devices, data is
> usually in a per-device struct, not in a single static item.
>
I thought that the root was a singleton, but per Dan's comment it seems
that isn't the case.
>> @@ -1021,6 +1074,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
>> if (rc)
>> return ERR_PTR(rc);
>>
>> + rc = sysfs_create_group(&dport_dev->kobj, &cxl_rcrb_addr_group);
>> + if (rc)
>> + dev_dbg(dport_dev, "Couldn't create cxl_rcrb_addr group: %d\n",
>> + rc);
>
> Is there any way to create this with an attribute group that the sysfs
> infrastructure adds automatically? I'm not suggesting you have a race
> condition here, but using the sysfs infrastructure avoids a lot of
> potential problems.
>
I would need to look into it more. I did it this way because I wasn't sure
there was a way to set it up statically since the dport_dev can match a
couple of different device types. After looking at Dan's comments I will
probably move away from making this file altogether and move the functionality
under sysfs/debug/cxl.
> Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
2023-09-26 20:22 ` Bjorn Helgaas
@ 2023-09-27 15:31 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-27 15:31 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On 9/26/23 3:22 PM, Bjorn Helgaas wrote:
> On Mon, Sep 25, 2023 at 03:01:26PM -0500, Ben Cheatham wrote:
>> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
>> v6.5. Because these error types target memory-mapped CXL 1.1 compliant
>> downstream ports and not physical (normal/persistent) memory, these
>> error types are not currently allowed through the memory range
>> validation done by the EINJ driver.
>>
>> The MMIO address of a CXL 1.1 downstream port can be found in the
>> cxl_rcrb_addr file in the corresponding dport directory under
>> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
>> procedure as a memory error type, but with param1 set to the
>> downstream port MMIO address.
>>
>> Example usage:
>> $ cd /sys/kernel/debug/apei/einj
>> $ cat available_error_type
>> 0x00000008 Memory Correctable
>> 0x00000010 Memory Uncorrectable non-fatal
>> 0x00000020 Memory Uncorrectable fatal
>> 0x00000040 PCI Express Correctable
>> 0x00000080 PCI Express Uncorrectable non-fatal
>> 0x00000100 PCI Express Uncorrectable fatal
>> 0x00008000 CXL.mem Protocol Correctable
>> 0x00020000 CXL.mem Protocol Uncorrectable fatal
>> $ echo 0x8000 > error_type
>> $ echo 0xfffffffffffff000 > param2
>> $ echo 0x2 > flags
>> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
>> 0xb2f00000
>> $ echo 0xb2f00000 > param1
>
> It seems sort of unpleasant to expose the physical base address of the
> RCRB. Is there any way to use a device address or other logical
> identifier instead and keep the actual MMIO address internal? E.g., a
> PCI device has a <domain>:<bus>:<device>.<function> address. I assume
> CXL addresses would look similar?
>
I have the MMIO address of the dport exposed here because the ACPI spec
says that the address is expected for CXL 1.1 errors and I wanted to match
the spec. For CXL 2.0 errors the SBDF is expected as you described.
>> $ echo 1 > error_inject
>> ...
>
>> +static int is_valid_cxl_addr(u64 addr)
>
> Maybe the function name should include a hint that this should be an
> RCRB address? But I don't claim to know the CXL architecture or
> nomenclature.
>
Good point, if I keep this function in v6 I'll make the name more
descriptive.
> Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation
2023-09-26 20:24 ` Bjorn Helgaas
@ 2023-09-27 15:31 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-27 15:31 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: rafael, dan.j.williams, linux-cxl, linux-acpi, bhelgaas,
yazen.ghannam
On 9/26/23 3:24 PM, Bjorn Helgaas wrote:
> On Mon, Sep 25, 2023 at 03:01:27PM -0500, Ben Cheatham wrote:
>> Update EINJ documentation to include CXL errors in available_error_types
>> table and usage of the types.
>>
>> Also fix a formatting error in the param4 file description that caused
>> the description to be on the same line as the bullet point.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>> .../firmware-guide/acpi/apei/einj.rst | 25 ++++++++++++++++---
>> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> I always feel like the documentation update should be in the same
> patch as the new functionality so it's easy to match up with the code
> and keep things together when backporting. But I know that sentiment
> is not universal and maybe there's good reason to keep them separate.
>
I put it into a separate patch since the documentation change was substantial,
but if it gets shorter in v6 I'll put it into the previous patch.
Thanks,
Ben
>> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
>> index d6b61d22f525..c6f28118c48b 100644
>> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
>> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
>> @@ -32,6 +32,9 @@ configuration::
>> CONFIG_ACPI_APEI
>> CONFIG_ACPI_APEI_EINJ
>>
>> +To use CXL error types ``CONFIG_CXL_ACPI`` needs to be set to the same
>> +value as ``CONFIG_ACPI_APEI_EINJ`` (either "y" or "m").
>> ...
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev
2023-09-26 21:15 ` Dan Williams
@ 2023-09-27 15:31 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-27 15:31 UTC (permalink / raw)
To: Dan Williams, rafael, linux-cxl, linux-acpi; +Cc: bhelgaas, yazen.ghannam
Thanks for the review Dan, responses inline.
On 9/26/23 4:15 PM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Add cxl_rcrb_addr to the dport_dev (normally represented by a pcie
>> device) for CXL RCH root ports. The file will print the RCRB base
>> MMIO address of the root port when read and will be used by
>> users looking to inject CXL EINJ error types for RCH hosts.
>
> RCRB is an implementation detail of RCH topologies, I don't see why
> userspace needs this information, maybe it becomes clearer in the follow
> on patches, but I would hope this detail could be hidden.
>
It doesn't, I'll rename the file (if it stays at all).
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>> Documentation/ABI/testing/sysfs-bus-cxl | 9 ++++
>> drivers/cxl/acpi.c | 2 +
>> drivers/cxl/core/port.c | 58 +++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 2 +
>> 4 files changed, 71 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
>> index 087f762ebfd5..85621da69296 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-cxl
>> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
>> @@ -177,6 +177,15 @@ Description:
>> integer reflects the hardware port unique-id used in the
>> hardware decoder target list.
>>
>> +What: /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
>> +What: /sys/devices/pciX/cxl_rcrb_addr
>> +Date: August, 2023
>> +KernelVersion: v6.6
>> +Contact: linux-cxl@vger.kernel.org
>> +Description:
>> + (RO) The 'cxl_rcrb_addr' device file gives the MMIO base address
>> + of the RCRB of the corresponding CXL 1.1 downstream port. Only
>> + present for CXL 1.1 dports.
>>
>> What: /sys/bus/cxl/devices/decoderX.Y
>> Date: June, 2021
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index d1c559879dcc..3e2ca946bf47 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -676,6 +676,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>> if (IS_ERR(root_port))
>> return PTR_ERR(root_port);
>>
>> + set_cxl_root(root_port);
>> +
>
> The cxl_root is not a singleton and the way to determine this linkage is
> by walking up the port hierarchy. See find_cxl_root().
>
Ok, I was under the impression it was and couldn't find anything definitive
in the CXL spec about it. The reason I did that was that I couldn't get
access to the CXL port hierarchy from the EINJ model and needed an
access point.
>> rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
>> add_host_bridge_dport);
>> if (rc < 0)
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 724be8448eb4..c3914e73f67e 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -875,6 +875,14 @@ struct cxl_port *find_cxl_root(struct cxl_port *port)
>> }
>> EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
>>
>> +static struct cxl_port *cxl_root;
>> +
>> +void set_cxl_root(struct cxl_port *root_port)
>> +{
>> + cxl_root = root_port;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(set_cxl_root, CXL);
>> +
>> static struct cxl_dport *find_dport(struct cxl_port *port, int id)
>> {
>> struct cxl_dport *dport;
>> @@ -930,11 +938,56 @@ static void cond_cxl_root_unlock(struct cxl_port *port)
>> device_unlock(&port->dev);
>> }
>>
>> +static ssize_t cxl_rcrb_addr_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct cxl_dport *dport;
>> +
>> + if (!cxl_root)
>> + return -ENODEV;
>> +
>> + dport = cxl_find_dport_by_dev(cxl_root, dev);
>> + if (!dport)
>> + return -ENODEV;
>> +
>> + return sysfs_emit(buf, "0x%llx\n", (u64) dport->rcrb.base);
>> +}
>> +DEVICE_ATTR_RO(cxl_rcrb_addr);
>> +
>> +static umode_t cxl_rcrb_addr_is_visible(struct kobject *kobj,
>> + struct attribute *a, int n)
>> +{
>> + struct device *dev = kobj_to_dev(kobj);
>> + struct cxl_dport *dport;
>> +
>> + if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ) || !cxl_root)
>> + return 0;
>> +
>> + dport = cxl_find_dport_by_dev(cxl_root, dev);
>> + if (!dport || !dport->rch || dport->rcrb.base == CXL_RESOURCE_NONE)
>> + return 0;
>> +
>> + return a->mode;
>> +}
>> +
>> +static struct attribute *cxl_rcrb_addr_attrs[] = {
>> + &dev_attr_cxl_rcrb_addr.attr,
>> + NULL,
>> +};
>> +
>> +static const struct attribute_group cxl_rcrb_addr_group = {
>> + .attrs = cxl_rcrb_addr_attrs,
>> + .is_visible = cxl_rcrb_addr_is_visible,
>> +};
>> +
>> static void cxl_dport_remove(void *data)
>> {
>> struct cxl_dport *dport = data;
>> struct cxl_port *port = dport->port;
>>
>> + if (dport->rch)
>> + sysfs_remove_group(&dport->dport_dev->kobj, &cxl_rcrb_addr_group);
>> +
>> xa_erase(&port->dports, (unsigned long) dport->dport_dev);
>> put_device(dport->dport_dev);
>> }
>> @@ -1021,6 +1074,11 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
>> if (rc)
>> return ERR_PTR(rc);
>>
>> + rc = sysfs_create_group(&dport_dev->kobj, &cxl_rcrb_addr_group);
>> + if (rc)
>> + dev_dbg(dport_dev, "Couldn't create cxl_rcrb_addr group: %d\n",
>> + rc);
>
> Please no dynamic sysfs attribute registration. If this attribute is
> needed it should be static.
Understood. The file probably won't stay in v6, but I'll keep everything
static.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support
2023-09-26 21:36 ` Dan Williams
@ 2023-09-27 15:32 ` Ben Cheatham
0 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2023-09-27 15:32 UTC (permalink / raw)
To: Dan Williams, rafael, linux-cxl, linux-acpi; +Cc: bhelgaas, yazen.ghannam
On 9/26/23 4:36 PM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI
>> v6.5. Because these error types target memory-mapped CXL 1.1 compliant
>> downstream ports and not physical (normal/persistent) memory, these
>> error types are not currently allowed through the memory range
>> validation done by the EINJ driver.
>>
>> The MMIO address of a CXL 1.1 downstream port can be found in the
>> cxl_rcrb_addr file in the corresponding dport directory under
>> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same
>> procedure as a memory error type, but with param1 set to the
>> downstream port MMIO address.
>>
>> Example usage:
>> $ cd /sys/kernel/debug/apei/einj
>> $ cat available_error_type
>> 0x00000008 Memory Correctable
>> 0x00000010 Memory Uncorrectable non-fatal
>> 0x00000020 Memory Uncorrectable fatal
>> 0x00000040 PCI Express Correctable
>> 0x00000080 PCI Express Uncorrectable non-fatal
>> 0x00000100 PCI Express Uncorrectable fatal
>> 0x00008000 CXL.mem Protocol Correctable
>> 0x00020000 CXL.mem Protocol Uncorrectable fatal
>> $ echo 0x8000 > error_type
>> $ echo 0xfffffffffffff000 > param2
>> $ echo 0x2 > flags
>> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr
>> 0xb2f00000
>> $ echo 0xb2f00000 > param1
>> $ echo 1 > error_inject
>
> I have the same reaction to this as I did before:
>
> http://lore.kernel.org/r/647817212bcf1_e067a2945@dwillia2-xfh.jf.intel.com.notmuch
>
> Why is per-port error injection being driven from this legacy global
> interface where userspace needs to take information from sysfs and walk
> it over to this other interface? Especially since "rcrb" is an
> implementation detail that will be invalidated with CXL VH topologies?
>
I get what you're saying, I did it this way because I saw this as primarily
an EINJ feature and wanted to keep it in that module. I agree with you though,
it's clunky and would be better inside the cxl module/debugfs.
> What I would like to see, since this is a new capability with no need to
> be beholden to legacy is to disaggregate the interface to be per-port.
>
> For example:
>
> /sys/kernel/debug/cxl/$mem/{inject,clear}_poison is already established
> for memory device poison injection. Why not add something like:
>
> /sys/kernel/debug/cxl/$port/einj_{type,inject}
>
> For triggering errors by the CXL subsystem device name, and unburden
> userspace from needing to deal in magic numbers.
I'll go ahead and move everything over to the cxl debugfs and do what you're
suggesting. Thanks for taking the time to take a look!
Thanks,
Ben
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2023-09-27 15:32 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-25 20:01 [PATCH v5 0/3] CXL, ACPI, APEI, EINJ: Update EINJ for CXL 1.1 error types Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 1/3] CXL, PCIE: Add cxl_rcrb_addr file to dport_dev Ben Cheatham
2023-09-26 10:50 ` Jonathan Cameron
2023-09-26 16:00 ` Ben Cheatham
2023-09-26 20:23 ` Bjorn Helgaas
2023-09-27 15:30 ` Ben Cheatham
2023-09-26 21:15 ` Dan Williams
2023-09-27 15:31 ` Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 2/3] ACPI, APEI, EINJ: Add CXL 1.1 EINJ error type support Ben Cheatham
2023-09-26 11:04 ` Jonathan Cameron
2023-09-26 16:00 ` Ben Cheatham
2023-09-26 20:22 ` Bjorn Helgaas
2023-09-27 15:31 ` Ben Cheatham
2023-09-26 21:36 ` Dan Williams
2023-09-27 15:32 ` Ben Cheatham
2023-09-25 20:01 ` [PATCH v5 3/3] ACPI, APEI, EINJ: Update EINJ documentation Ben Cheatham
2023-09-26 11:05 ` Jonathan Cameron
2023-09-26 16:00 ` Ben Cheatham
2023-09-26 20:24 ` Bjorn Helgaas
2023-09-27 15:31 ` Ben Cheatham
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox