* [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types
@ 2024-02-08 20:00 Ben Cheatham
2024-02-08 20:00 ` [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Ben Cheatham @ 2024-02-08 20:00 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
v11 Changes:
- Drop patch 2/6 (Add CXL protocol error defines) and put the
defines in patch 4/6 instead (Dan)
- Add Dan's reviewed-by
v10 Changes:
- Fixups in EINJ module initializtion (Dan)
- Add include/linux/einj-cxl.h to MAINTAINERS under CXL subsystem
(Dan)
- Replace usages of IS_ENABLED(CONFIG_CXL_EINJ) with new
einj_is_initialized() function in cxl/core/port.c (Dan)
- Fix typo in EINJ documentation (Dan)
The new CXL error types will use the Memory Address field in the
SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
compliant memory-mapped downstream port. The value of the memory address
will be in the port's MMIO range, and it will not represent physical
(normal or persistent) memory.
Add the functionality for injecting CXL 1.1 errors to the EINJ module,
but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
Instead, make the error types available under /sys/kernel/debug/cxl.
This allows for validating the MMIO address for a CXL 1.1 error type
while also not making the user responsible for finding it.
Ben Cheatham (4):
cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
EINJ: Migrate to a platform driver
cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
EINJ, Documentation: Update EINJ kernel doc
Documentation/ABI/testing/debugfs-cxl | 22 ++
.../firmware-guide/acpi/apei/einj.rst | 19 ++
MAINTAINERS | 1 +
drivers/acpi/apei/einj.c | 202 ++++++++++++++++--
drivers/cxl/Kconfig | 12 ++
drivers/cxl/core/port.c | 39 ++++
include/linux/einj-cxl.h | 45 ++++
7 files changed, 328 insertions(+), 12 deletions(-)
create mode 100644 include/linux/einj-cxl.h
--
2.34.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
2024-02-08 20:00 [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
@ 2024-02-08 20:00 ` Ben Cheatham
2024-02-14 14:05 ` Jonathan Cameron
2024-02-08 20:00 ` [PATCH v11 2/4] EINJ: Migrate to a platform driver Ben Cheatham
` (3 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Ben Cheatham @ 2024-02-08 20:00 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Add CONFIG_CXL_EINJ to cxl/Kconfig. This option will allow for the CXL
core module to access helpers inside the EINJ module, while also giving
users the option of disabling CXL EINJ error types at build time.
Also update CONFIG_ACPI_APEI_EINJ to set CONFIG_CXL_EINJ by default.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/cxl/Kconfig | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 67998dbd1d46..95f215a2e5dc 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -157,4 +157,16 @@ config CXL_PMU
monitoring units and provide standard perf based interfaces.
If unsure say 'm'.
+
+config CXL_EINJ
+ bool "CXL Error INJection Support"
+ default ACPI_APEI_EINJ
+ depends on ACPI_APEI_EINJ >= CXL_BUS
+ help
+ Support for CXL protocol Error INJection through debugfs/cxl.
+ Availability and which errors are supported is dependent on
+ the host platform. Look to ACPI v6.5 section 18.6.4 and kernel
+ EINJ documentation for more information.
+
+ If unsure say 'n'
endif
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v11 2/4] EINJ: Migrate to a platform driver
2024-02-08 20:00 [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
2024-02-08 20:00 ` [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
@ 2024-02-08 20:00 ` Ben Cheatham
2024-02-14 14:37 ` Jonathan Cameron
2024-02-08 20:00 ` [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
` (2 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Ben Cheatham @ 2024-02-08 20:00 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Change the EINJ module to install a platform device/driver on module
init and move the module init() and exit() functions to driver probe and
remove. This change allows the EINJ module to load regardless of whether
setting up EINJ succeeds, which allows dependent modules to still load
(i.e. the CXL core).
Since EINJ may no longer be initialized when the module loads, any
functions that are called from dependent/external modules should check
the einj_initialized variable before calling any EINJ functions.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/acpi/apei/einj.c | 44 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..73dde21d3e89 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -21,6 +21,7 @@
#include <linux/nmi.h>
#include <linux/delay.h>
#include <linux/mm.h>
+#include <linux/platform_device.h>
#include <asm/unaligned.h>
#include "apei-internal.h"
@@ -136,6 +137,11 @@ static struct apei_exec_ins_type einj_ins_type[] = {
*/
static DEFINE_MUTEX(einj_mutex);
+/*
+ * Exported APIs use this flag to exit early if einj_probe() failed.
+ */
+static bool einj_initialized __ro_after_init;
+
static void *einj_param;
static void einj_exec_ctx_init(struct apei_exec_context *ctx)
@@ -684,7 +690,7 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
return 0;
}
-static int __init einj_init(void)
+static int __init einj_probe(struct platform_device *pdev)
{
int rc;
acpi_status status;
@@ -782,7 +788,7 @@ static int __init einj_init(void)
return rc;
}
-static void __exit einj_exit(void)
+static void __exit einj_remove(struct platform_device *pdev)
{
struct apei_exec_context ctx;
@@ -801,6 +807,40 @@ static void __exit einj_exit(void)
acpi_put_table((struct acpi_table_header *)einj_tab);
}
+static struct platform_device *einj_dev;
+struct platform_driver einj_driver = {
+ .remove_new = einj_remove,
+ .driver = {
+ .name = "acpi-einj",
+ },
+};
+
+static int __init einj_init(void)
+{
+ struct platform_device_info einj_dev_info = {
+ .name = "acpi-einj",
+ .id = -1,
+ };
+ int rc;
+
+ einj_dev = platform_device_register_full(&einj_dev_info);
+ if (IS_ERR_OR_NULL(einj_dev))
+ return PTR_ERR(einj_dev);
+
+ rc = platform_driver_probe(&einj_driver, einj_probe);
+ einj_initialized = rc == 0;
+
+ return 0;
+}
+
+static void __exit einj_exit(void)
+{
+ if (einj_initialized)
+ platform_driver_unregister(&einj_driver);
+
+ platform_device_del(einj_dev);
+}
+
module_init(einj_init);
module_exit(einj_exit);
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-08 20:00 [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
2024-02-08 20:00 ` [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
2024-02-08 20:00 ` [PATCH v11 2/4] EINJ: Migrate to a platform driver Ben Cheatham
@ 2024-02-08 20:00 ` Ben Cheatham
2024-02-09 21:30 ` Ben Cheatham
2024-02-14 15:27 ` Jonathan Cameron
2024-02-08 20:00 ` [PATCH v11 4/4] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
2024-02-11 1:15 ` [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Dan Williams
4 siblings, 2 replies; 21+ messages in thread
From: Ben Cheatham @ 2024-02-08 20:00 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Implement CXL helper functions in the EINJ module for getting/injecting
available CXL protocol error types and export them to sysfs under
kernel/debug/cxl.
The kernel/debug/cxl/einj_types file will print the available CXL
protocol errors in the same format as the available_error_types
file provided by the EINJ module. The
kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
error_type and error_inject files provided by the EINJ module, i.e.:
writing an error type into $dport_dev/einj_inject will inject said error
type into the CXL dport represented by $dport_dev.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
Documentation/ABI/testing/debugfs-cxl | 22 ++++
MAINTAINERS | 1 +
drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
drivers/cxl/core/port.c | 39 +++++++
include/linux/einj-cxl.h | 45 ++++++++
5 files changed, 255 insertions(+), 10 deletions(-)
create mode 100644 include/linux/einj-cxl.h
diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
index fe61d372e3fa..bcd985cca66a 100644
--- a/Documentation/ABI/testing/debugfs-cxl
+++ b/Documentation/ABI/testing/debugfs-cxl
@@ -33,3 +33,25 @@ Description:
device cannot clear poison from the address, -ENXIO is returned.
The clear_poison attribute is only visible for devices
supporting the capability.
+
+What: /sys/kernel/debug/cxl/einj_types
+Date: January, 2024
+KernelVersion: v6.9
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RO) Prints the CXL protocol error types made available by
+ the platform in the format "0x<error number> <error type>".
+ The <error number> can be written to einj_inject to inject
+ <error type> into a chosen dport.
+
+What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
+Date: January, 2024
+KernelVersion: v6.9
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (WO) Writing an integer to this file injects the corresponding
+ CXL protocol error into $dport_dev ($dport_dev will be a device
+ name from /sys/bus/pci/devices). The integer to type mapping for
+ injection can be found by reading from einj_types. If the dport
+ was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
+ a CXL 2.0 error is injected.
diff --git a/MAINTAINERS b/MAINTAINERS
index 9104430e148e..02d7feb2ed1f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
S: Maintained
F: drivers/cxl/
F: include/uapi/linux/cxl_mem.h
+F: include/linux/einj-cxl.h
F: tools/testing/cxl/
COMPUTE EXPRESS LINK PMU (CPMU)
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 73dde21d3e89..9137cc01f791 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -21,6 +21,7 @@
#include <linux/nmi.h>
#include <linux/delay.h>
#include <linux/mm.h>
+#include <linux/einj-cxl.h>
#include <linux/platform_device.h>
#include <asm/unaligned.h>
@@ -37,6 +38,20 @@
#define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
ACPI_EINJ_MEMORY_UNCORRECTABLE | \
ACPI_EINJ_MEMORY_FATAL)
+#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
+#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
+#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
+#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
+#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
+#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
+#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
+#endif
+#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
+ ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
+ ACPI_EINJ_CXL_CACHE_FATAL | \
+ ACPI_EINJ_CXL_MEM_CORRECTABLE | \
+ ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
+ ACPI_EINJ_CXL_MEM_FATAL)
/*
* ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
@@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
if (type & ACPI5_VENDOR_BIT) {
if (vendor_flags != SETWA_FLAGS_MEM)
goto inject;
- } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+ } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
goto inject;
+ } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
+ goto inject;
+ }
/*
* Disallow crazy address masks that give BIOS leeway to pick
@@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
"0x00000200\tPlatform Correctable\n",
"0x00000400\tPlatform Uncorrectable non-fatal\n",
"0x00000800\tPlatform Uncorrectable fatal\n",
+};
+
+static const char * const einj_cxl_error_type_string[] = {
"0x00001000\tCXL.cache Protocol Correctable\n",
"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
@@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
DEFINE_SHOW_ATTRIBUTE(available_error_type);
-static int error_type_get(void *data, u64 *val)
+int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
{
- *val = error_type;
+ int cxl_err, rc;
+ u32 available_error_type = 0;
+
+ if (!einj_initialized)
+ return -ENXIO;
+
+ rc = einj_get_available_error_type(&available_error_type);
+ if (rc)
+ return rc;
+
+ for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
+ cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
+
+ if (available_error_type & cxl_err)
+ seq_puts(m, einj_cxl_error_type_string[pos]);
+ }
return 0;
}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
-static int error_type_set(void *data, u64 val)
+static int validate_error_type(u64 type)
{
+ u32 tval, vendor, available_error_type = 0;
int rc;
- u32 available_error_type = 0;
- u32 tval, vendor;
/* Only low 32 bits for error type are valid */
- if (val & GENMASK_ULL(63, 32))
+ if (type & GENMASK_ULL(63, 32))
return -EINVAL;
/*
* Vendor defined types have 0x80000000 bit set, and
* are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
*/
- vendor = val & ACPI5_VENDOR_BIT;
- tval = val & 0x7fffffff;
+ vendor = type & ACPI5_VENDOR_BIT;
+ tval = type & 0x7fffffff;
/* Only one error type can be specified */
if (tval & (tval - 1))
@@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val)
rc = einj_get_available_error_type(&available_error_type);
if (rc)
return rc;
- if (!(val & available_error_type))
+ if (!(type & available_error_type))
return -EINVAL;
}
+
+ return 0;
+}
+
+static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
+{
+ struct pci_bus *pbus;
+ struct pci_host_bridge *bridge;
+ u64 seg = 0, bus;
+
+ pbus = dport_dev->bus;
+ bridge = pci_find_host_bridge(pbus);
+
+ if (!bridge)
+ return -ENODEV;
+
+ if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
+ seg = bridge->domain_nr << 24;
+
+ bus = pbus->number << 16;
+ *sbdf = seg | bus | dport_dev->devfn;
+
+ return 0;
+}
+
+int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
+{
+ u64 param1 = 0, param2 = 0, param4 = 0;
+ u32 flags;
+ int rc;
+
+ if (!einj_initialized)
+ return -ENXIO;
+
+ /* Only CXL error types can be specified */
+ if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
+ return -EINVAL;
+
+ rc = validate_error_type(type);
+ if (rc)
+ return rc;
+
+ param1 = (u64) rcrb;
+ param2 = 0xfffffffffffff000;
+ flags = 0x2;
+
+ return einj_error_inject(type, flags, param1, param2, 0, param4);
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
+
+int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
+{
+ u64 param1 = 0, param2 = 0, param4 = 0;
+ u32 flags;
+ int rc;
+
+ if (!einj_initialized)
+ return -ENXIO;
+
+ /* Only CXL error types can be specified */
+ if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
+ return -EINVAL;
+
+ rc = validate_error_type(type);
+ if (rc)
+ return rc;
+
+ rc = cxl_dport_get_sbdf(dport, ¶m4);
+ if (rc)
+ return rc;
+
+ flags = 0x4;
+
+ return einj_error_inject(type, flags, param1, param2, 0, param4);
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
+
+static int error_type_get(void *data, u64 *val)
+{
+ *val = error_type;
+
+ return 0;
+}
+
+static int error_type_set(void *data, u64 val)
+{
+ int rc;
+
+ /* CXL error types have to be injected from cxl debugfs */
+ if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
+ return -EINVAL;
+
+ rc = validate_error_type(val);
+ if (rc)
+ return rc;
+
error_type = val;
return 0;
@@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
return 0;
}
+bool einj_is_initialized(void)
+{
+ return einj_initialized;
+}
+EXPORT_SYMBOL_GPL(einj_is_initialized);
+
static int __init einj_probe(struct platform_device *pdev)
{
int rc;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 8c00fd6be730..c52c92222be5 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -3,6 +3,7 @@
#include <linux/platform_device.h>
#include <linux/memregion.h>
#include <linux/workqueue.h>
+#include <linux/einj-cxl.h>
#include <linux/debugfs.h>
#include <linux/device.h>
#include <linux/module.h>
@@ -797,6 +798,37 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
return rc;
}
+DEFINE_SHOW_ATTRIBUTE(einj_cxl_available_error_type);
+
+static int cxl_einj_inject(void *data, u64 type)
+{
+ struct cxl_dport *dport = data;
+
+ if (dport->rch)
+ return einj_cxl_inject_rch_error(dport->rcrb.base, type);
+
+ return einj_cxl_inject_error(to_pci_dev(dport->dport_dev), type);
+}
+DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
+
+static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
+{
+ struct dentry *dir;
+
+ /*
+ * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
+ * EINJ expects a dport SBDF to be specified for 2.0 error injection.
+ */
+ if (!einj_is_initialized() ||
+ (!dport->rch && !dev_is_pci(dport->dport_dev)))
+ return;
+
+ dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
+
+ debugfs_create_file("einj_inject", 0200, dir, dport,
+ &cxl_einj_inject_fops);
+}
+
static struct cxl_port *__devm_cxl_add_port(struct device *host,
struct device *uport_dev,
resource_size_t component_reg_phys,
@@ -1144,6 +1176,8 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
if (dev_is_pci(dport_dev))
dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
+ cxl_debugfs_create_dport_dir(dport);
+
return dport;
}
@@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void)
cxl_debugfs = debugfs_create_dir("cxl", NULL);
+ if (einj_is_initialized()) {
+ debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
+ &einj_cxl_available_error_type_fops);
+ }
+
cxl_mbox_init();
rc = cxl_memdev_init();
diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
new file mode 100644
index 000000000000..af57cc8580a6
--- /dev/null
+++ b/include/linux/einj-cxl.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * CXL protocol Error INJection support.
+ *
+ * Copyright (c) 2023 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Ben Cheatham <benjamin.cheatham@amd.com>
+ */
+#ifndef CXL_EINJ_H
+#define CXL_EINJ_H
+
+#include <linux/pci.h>
+
+#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
+int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
+int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
+int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
+bool einj_is_initialized(void);
+#else
+static inline int einj_cxl_available_error_type_show(struct seq_file *m,
+ void *v)
+{
+ return -ENXIO;
+}
+
+static inline int einj_cxl_type_show(struct seq_file *m, void *data)
+{
+ return -ENXIO;
+}
+
+static inline int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type)
+{
+ return -ENXIO;
+}
+
+static inline int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
+{
+ return -ENXIO;
+}
+
+static inline bool einj_is_initialized(void) { return false; }
+#endif
+
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v11 4/4] EINJ, Documentation: Update EINJ kernel doc
2024-02-08 20:00 [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
` (2 preceding siblings ...)
2024-02-08 20:00 ` [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
@ 2024-02-08 20:00 ` Ben Cheatham
2024-02-14 15:29 ` Jonathan Cameron
2024-02-11 1:15 ` [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Dan Williams
4 siblings, 1 reply; 21+ messages in thread
From: Ben Cheatham @ 2024-02-08 20:00 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Update EINJ kernel document to include how to inject CXL protocol error
types, build the kernel to include CXL error types, and give an example
injection.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
.../firmware-guide/acpi/apei/einj.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index d6b61d22f525..f179adf7b61c 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,6 +181,25 @@ You should see something like this in dmesg::
[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
+CXL error types are supported from ACPI 6.5 onwards. These error types
+are not available in the legacy interface at /sys/kernel/debug/apei/einj,
+and are instead at /sys/kernel/debug/cxl/. There is a file under debug/cxl
+called "einj_type" that is analogous to available_error_type under debug/cxl.
+There is also a "einj_inject" file in each $dport_dev directory under debug/cxl
+that will inject a given error into the dport represented by $dport_dev.
+For example, to inject a CXL.mem protocol correctable error into
+$dport_dev=pci0000:0c::
+
+ # cd /sys/kernel/debug/cxl/
+ # cat einj_type # See which error can be injected
+ 0x00008000 CXL.mem Protocol Correctable
+ 0x00010000 CXL.mem Protocol Uncorrectable non-fatal
+ 0x00020000 CXL.mem Protocol Uncorrectable fatal
+ # cd 0000:e0:01.1 # Navigate to dport to inject into
+ # echo 0x8000 > einj_inject # Inject error
+
+To use CXL error types, ``CONFIG_CXL_EINJ`` will need to be enabled.
+
Special notes for injection into SGX enclaves:
There may be a separate BIOS setup option to enable SGX injection.
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-08 20:00 ` [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
@ 2024-02-09 21:30 ` Ben Cheatham
2024-02-10 0:02 ` Dan Williams
2024-02-14 15:27 ` Jonathan Cameron
1 sibling, 1 reply; 21+ messages in thread
From: Ben Cheatham @ 2024-02-09 21:30 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
On 2/8/24 2:00 PM, Ben Cheatham wrote:
> Implement CXL helper functions in the EINJ module for getting/injecting
> available CXL protocol error types and export them to sysfs under
> kernel/debug/cxl.
>
> The kernel/debug/cxl/einj_types file will print the available CXL
> protocol errors in the same format as the available_error_types
> file provided by the EINJ module. The
> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
> error_type and error_inject files provided by the EINJ module, i.e.:
> writing an error type into $dport_dev/einj_inject will inject said error
> type into the CXL dport represented by $dport_dev.
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
> Documentation/ABI/testing/debugfs-cxl | 22 ++++
> MAINTAINERS | 1 +
> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
> drivers/cxl/core/port.c | 39 +++++++
> include/linux/einj-cxl.h | 45 ++++++++
> 5 files changed, 255 insertions(+), 10 deletions(-)
> create mode 100644 include/linux/einj-cxl.h
>
> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> index fe61d372e3fa..bcd985cca66a 100644
> --- a/Documentation/ABI/testing/debugfs-cxl
> +++ b/Documentation/ABI/testing/debugfs-cxl
> @@ -33,3 +33,25 @@ Description:
> device cannot clear poison from the address, -ENXIO is returned.
> The clear_poison attribute is only visible for devices
> supporting the capability.
> +
> +What: /sys/kernel/debug/cxl/einj_types
> +Date: January, 2024
> +KernelVersion: v6.9
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (RO) Prints the CXL protocol error types made available by
> + the platform in the format "0x<error number> <error type>".
> + The <error number> can be written to einj_inject to inject
> + <error type> into a chosen dport.
> +
> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
> +Date: January, 2024
> +KernelVersion: v6.9
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (WO) Writing an integer to this file injects the corresponding
> + CXL protocol error into $dport_dev ($dport_dev will be a device
> + name from /sys/bus/pci/devices). The integer to type mapping for
> + injection can be found by reading from einj_types. If the dport
> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> + a CXL 2.0 error is injected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9104430e148e..02d7feb2ed1f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
> S: Maintained
> F: drivers/cxl/
> F: include/uapi/linux/cxl_mem.h
> +F: include/linux/einj-cxl.h
> F: tools/testing/cxl/
>
> COMPUTE EXPRESS LINK PMU (CPMU)
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 73dde21d3e89..9137cc01f791 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -21,6 +21,7 @@
> #include <linux/nmi.h>
> #include <linux/delay.h>
> #include <linux/mm.h>
> +#include <linux/einj-cxl.h>
> #include <linux/platform_device.h>
> #include <asm/unaligned.h>
>
> @@ -37,6 +38,20 @@
> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> ACPI_EINJ_MEMORY_FATAL)
> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
> +#endif
> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
> + ACPI_EINJ_CXL_CACHE_FATAL | \
> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
> + ACPI_EINJ_CXL_MEM_FATAL)
>
> /*
> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> if (type & ACPI5_VENDOR_BIT) {
> if (vendor_flags != SETWA_FLAGS_MEM)
> goto inject;
> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> goto inject;
> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
> + goto inject;
> + }
>
> /*
> * Disallow crazy address masks that give BIOS leeway to pick
> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
> "0x00000200\tPlatform Correctable\n",
> "0x00000400\tPlatform Uncorrectable non-fatal\n",
> "0x00000800\tPlatform Uncorrectable fatal\n",
> +};
> +
> +static const char * const einj_cxl_error_type_string[] = {
> "0x00001000\tCXL.cache Protocol Correctable\n",
> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
>
> DEFINE_SHOW_ATTRIBUTE(available_error_type);
>
> -static int error_type_get(void *data, u64 *val)
> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
> {
> - *val = error_type;
> + int cxl_err, rc;
> + u32 available_error_type = 0;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + rc = einj_get_available_error_type(&available_error_type);
> + if (rc)
> + return rc;
> +
> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
> +
> + if (available_error_type & cxl_err)
> + seq_puts(m, einj_cxl_error_type_string[pos]);
> + }
>
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
>
> -static int error_type_set(void *data, u64 val)
> +static int validate_error_type(u64 type)
> {
> + u32 tval, vendor, available_error_type = 0;
> int rc;
> - u32 available_error_type = 0;
> - u32 tval, vendor;
>
> /* Only low 32 bits for error type are valid */
> - if (val & GENMASK_ULL(63, 32))
> + if (type & GENMASK_ULL(63, 32))
> return -EINVAL;
>
> /*
> * Vendor defined types have 0x80000000 bit set, and
> * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
> */
> - vendor = val & ACPI5_VENDOR_BIT;
> - tval = val & 0x7fffffff;
> + vendor = type & ACPI5_VENDOR_BIT;
> + tval = type & 0x7fffffff;
>
> /* Only one error type can be specified */
> if (tval & (tval - 1))
> @@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val)
> rc = einj_get_available_error_type(&available_error_type);
> if (rc)
> return rc;
> - if (!(val & available_error_type))
> + if (!(type & available_error_type))
> return -EINVAL;
> }
> +
> + return 0;
> +}
> +
> +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
> +{
> + struct pci_bus *pbus;
> + struct pci_host_bridge *bridge;
> + u64 seg = 0, bus;
> +
> + pbus = dport_dev->bus;
> + bridge = pci_find_host_bridge(pbus);
> +
> + if (!bridge)
> + return -ENODEV;
> +
> + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
> + seg = bridge->domain_nr << 24;
> +
> + bus = pbus->number << 16;
> + *sbdf = seg | bus | dport_dev->devfn;
> +
> + return 0;
> +}
> +
> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
> +{
> + u64 param1 = 0, param2 = 0, param4 = 0;
> + u32 flags;
> + int rc;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + /* Only CXL error types can be specified */
> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> + return -EINVAL;
> +
> + rc = validate_error_type(type);
> + if (rc)
> + return rc;
> +
> + param1 = (u64) rcrb;
> + param2 = 0xfffffffffffff000;
> + flags = 0x2;
> +
> + return einj_error_inject(type, flags, param1, param2, 0, param4);
> +}
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
> +
> +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
> +{
> + u64 param1 = 0, param2 = 0, param4 = 0;
> + u32 flags;
> + int rc;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + /* Only CXL error types can be specified */
> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> + return -EINVAL;
> +
> + rc = validate_error_type(type);
> + if (rc)
> + return rc;
> +
> + rc = cxl_dport_get_sbdf(dport, ¶m4);
> + if (rc)
> + return rc;
> +
> + flags = 0x4;
> +
> + return einj_error_inject(type, flags, param1, param2, 0, param4);
> +}
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
> +
> +static int error_type_get(void *data, u64 *val)
> +{
> + *val = error_type;
> +
> + return 0;
> +}
> +
> +static int error_type_set(void *data, u64 val)
> +{
> + int rc;
> +
> + /* CXL error types have to be injected from cxl debugfs */
> + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
> + return -EINVAL;
> +
> + rc = validate_error_type(val);
> + if (rc)
> + return rc;
> +
> error_type = val;
>
> return 0;
> @@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
> return 0;
> }
>
> +bool einj_is_initialized(void)
> +{
> + return einj_initialized;
> +}
> +EXPORT_SYMBOL_GPL(einj_is_initialized);
> +
> static int __init einj_probe(struct platform_device *pdev)
> {
> int rc;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8c00fd6be730..c52c92222be5 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -3,6 +3,7 @@
> #include <linux/platform_device.h>
> #include <linux/memregion.h>
> #include <linux/workqueue.h>
> +#include <linux/einj-cxl.h>
> #include <linux/debugfs.h>
> #include <linux/device.h>
> #include <linux/module.h>
> @@ -797,6 +798,37 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
> return rc;
> }
>
> +DEFINE_SHOW_ATTRIBUTE(einj_cxl_available_error_type);
> +
> +static int cxl_einj_inject(void *data, u64 type)
> +{
> + struct cxl_dport *dport = data;
> +
> + if (dport->rch)
> + return einj_cxl_inject_rch_error(dport->rcrb.base, type);
> +
> + return einj_cxl_inject_error(to_pci_dev(dport->dport_dev), type);
> +}
> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
> +
> +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> +{
> + struct dentry *dir;
> +
> + /*
> + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
> + * EINJ expects a dport SBDF to be specified for 2.0 error injection.
> + */
> + if (!einj_is_initialized() ||
> + (!dport->rch && !dev_is_pci(dport->dport_dev)))
> + return;
> +
> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
> +
> + debugfs_create_file("einj_inject", 0200, dir, dport,
> + &cxl_einj_inject_fops);
> +}
> +
> static struct cxl_port *__devm_cxl_add_port(struct device *host,
> struct device *uport_dev,
> resource_size_t component_reg_phys,
> @@ -1144,6 +1176,8 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> if (dev_is_pci(dport_dev))
> dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
>
> + cxl_debugfs_create_dport_dir(dport);
> +
> return dport;
> }
>
> @@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void)
>
> cxl_debugfs = debugfs_create_dir("cxl", NULL);
>
> + if (einj_is_initialized()) {
> + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
> + &einj_cxl_available_error_type_fops);
> + }
> +
> cxl_mbox_init();
>
> rc = cxl_memdev_init();
> diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
> new file mode 100644
> index 000000000000..af57cc8580a6
> --- /dev/null
> +++ b/include/linux/einj-cxl.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * CXL protocol Error INJection support.
> + *
> + * Copyright (c) 2023 Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Ben Cheatham <benjamin.cheatham@amd.com>
> + */
> +#ifndef CXL_EINJ_H
> +#define CXL_EINJ_H
> +
> +#include <linux/pci.h>
> +
> +#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
I was testing some other work using this kernel and was getting a linker error
when I had CXL_BUS set to 'y' and ACPI_APEI_EINJ set to 'm'. I went ahead and
solved it by changing the line above to use IS_REACHABLE() instead. The change
shouldn't actually affect the functionality of the patch since this is in a build
configuration where the CXL driver isn't using the EINJ module since CONFIG_CXL_EINJ
is set to n due to Kconfig restraints.
I can submit another version of this series with this fix, but I don't think it
makes much sense for a 1 line change, so I've but a diff below with the change
for whoever ends up putting this in their tree:
diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
index af57cc8580a6..70d411ea4629 100644
--- a/include/linux/einj-cxl.h
+++ b/include/linux/einj-cxl.h
@@ -12,7 +12,7 @@
#include <linux/pci.h>
-#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
+#if IS_REACHABLE(CONFIG_ACPI_APEI_EINJ)
int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
Thanks,
Ben
> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
> +int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
> +bool einj_is_initialized(void);
> +#else
> +static inline int einj_cxl_available_error_type_show(struct seq_file *m,
> + void *v)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int einj_cxl_type_show(struct seq_file *m, void *data)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
> +{
> + return -ENXIO;
> +}
> +
> +static inline bool einj_is_initialized(void) { return false; }
> +#endif
> +
> +#endif
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-09 21:30 ` Ben Cheatham
@ 2024-02-10 0:02 ` Dan Williams
2024-02-10 6:31 ` Dan Williams
0 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2024-02-10 0:02 UTC (permalink / raw)
To: Ben Cheatham, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
Ben Cheatham wrote:
>
>
> On 2/8/24 2:00 PM, Ben Cheatham wrote:
> > Implement CXL helper functions in the EINJ module for getting/injecting
> > available CXL protocol error types and export them to sysfs under
> > kernel/debug/cxl.
> >
> > The kernel/debug/cxl/einj_types file will print the available CXL
> > protocol errors in the same format as the available_error_types
> > file provided by the EINJ module. The
> > kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
> > error_type and error_inject files provided by the EINJ module, i.e.:
> > writing an error type into $dport_dev/einj_inject will inject said error
> > type into the CXL dport represented by $dport_dev.
> >
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> > ---
> > Documentation/ABI/testing/debugfs-cxl | 22 ++++
> > MAINTAINERS | 1 +
> > drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
> > drivers/cxl/core/port.c | 39 +++++++
> > include/linux/einj-cxl.h | 45 ++++++++
> > 5 files changed, 255 insertions(+), 10 deletions(-)
> > create mode 100644 include/linux/einj-cxl.h
> >
> > diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> > index fe61d372e3fa..bcd985cca66a 100644
> > --- a/Documentation/ABI/testing/debugfs-cxl
> > +++ b/Documentation/ABI/testing/debugfs-cxl
> > @@ -33,3 +33,25 @@ Description:
> > device cannot clear poison from the address, -ENXIO is returned.
> > The clear_poison attribute is only visible for devices
> > supporting the capability.
> > +
> > +What: /sys/kernel/debug/cxl/einj_types
> > +Date: January, 2024
> > +KernelVersion: v6.9
> > +Contact: linux-cxl@vger.kernel.org
> > +Description:
> > + (RO) Prints the CXL protocol error types made available by
> > + the platform in the format "0x<error number> <error type>".
> > + The <error number> can be written to einj_inject to inject
> > + <error type> into a chosen dport.
> > +
> > +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
> > +Date: January, 2024
> > +KernelVersion: v6.9
> > +Contact: linux-cxl@vger.kernel.org
> > +Description:
> > + (WO) Writing an integer to this file injects the corresponding
> > + CXL protocol error into $dport_dev ($dport_dev will be a device
> > + name from /sys/bus/pci/devices). The integer to type mapping for
> > + injection can be found by reading from einj_types. If the dport
> > + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> > + a CXL 2.0 error is injected.
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 9104430e148e..02d7feb2ed1f 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
> > S: Maintained
> > F: drivers/cxl/
> > F: include/uapi/linux/cxl_mem.h
> > +F: include/linux/einj-cxl.h
> > F: tools/testing/cxl/
> >
> > COMPUTE EXPRESS LINK PMU (CPMU)
> > diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> > index 73dde21d3e89..9137cc01f791 100644
> > --- a/drivers/acpi/apei/einj.c
> > +++ b/drivers/acpi/apei/einj.c
> > @@ -21,6 +21,7 @@
> > #include <linux/nmi.h>
> > #include <linux/delay.h>
> > #include <linux/mm.h>
> > +#include <linux/einj-cxl.h>
> > #include <linux/platform_device.h>
> > #include <asm/unaligned.h>
> >
> > @@ -37,6 +38,20 @@
> > #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> > ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> > ACPI_EINJ_MEMORY_FATAL)
> > +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
> > +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
> > +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
> > +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
> > +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
> > +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
> > +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
> > +#endif
> > +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
> > + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
> > + ACPI_EINJ_CXL_CACHE_FATAL | \
> > + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
> > + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
> > + ACPI_EINJ_CXL_MEM_FATAL)
> >
> > /*
> > * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
> > @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> > if (type & ACPI5_VENDOR_BIT) {
> > if (vendor_flags != SETWA_FLAGS_MEM)
> > goto inject;
> > - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> > + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> > goto inject;
> > + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
> > + goto inject;
> > + }
> >
> > /*
> > * Disallow crazy address masks that give BIOS leeway to pick
> > @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
> > "0x00000200\tPlatform Correctable\n",
> > "0x00000400\tPlatform Uncorrectable non-fatal\n",
> > "0x00000800\tPlatform Uncorrectable fatal\n",
> > +};
> > +
> > +static const char * const einj_cxl_error_type_string[] = {
> > "0x00001000\tCXL.cache Protocol Correctable\n",
> > "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> > "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> > @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
> >
> > DEFINE_SHOW_ATTRIBUTE(available_error_type);
> >
> > -static int error_type_get(void *data, u64 *val)
> > +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
> > {
> > - *val = error_type;
> > + int cxl_err, rc;
> > + u32 available_error_type = 0;
> > +
> > + if (!einj_initialized)
> > + return -ENXIO;
> > +
> > + rc = einj_get_available_error_type(&available_error_type);
> > + if (rc)
> > + return rc;
> > +
> > + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
> > + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
> > +
> > + if (available_error_type & cxl_err)
> > + seq_puts(m, einj_cxl_error_type_string[pos]);
> > + }
> >
> > return 0;
> > }
> > +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
> >
> > -static int error_type_set(void *data, u64 val)
> > +static int validate_error_type(u64 type)
> > {
> > + u32 tval, vendor, available_error_type = 0;
> > int rc;
> > - u32 available_error_type = 0;
> > - u32 tval, vendor;
> >
> > /* Only low 32 bits for error type are valid */
> > - if (val & GENMASK_ULL(63, 32))
> > + if (type & GENMASK_ULL(63, 32))
> > return -EINVAL;
> >
> > /*
> > * Vendor defined types have 0x80000000 bit set, and
> > * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
> > */
> > - vendor = val & ACPI5_VENDOR_BIT;
> > - tval = val & 0x7fffffff;
> > + vendor = type & ACPI5_VENDOR_BIT;
> > + tval = type & 0x7fffffff;
> >
> > /* Only one error type can be specified */
> > if (tval & (tval - 1))
> > @@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val)
> > rc = einj_get_available_error_type(&available_error_type);
> > if (rc)
> > return rc;
> > - if (!(val & available_error_type))
> > + if (!(type & available_error_type))
> > return -EINVAL;
> > }
> > +
> > + return 0;
> > +}
> > +
> > +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
> > +{
> > + struct pci_bus *pbus;
> > + struct pci_host_bridge *bridge;
> > + u64 seg = 0, bus;
> > +
> > + pbus = dport_dev->bus;
> > + bridge = pci_find_host_bridge(pbus);
> > +
> > + if (!bridge)
> > + return -ENODEV;
> > +
> > + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
> > + seg = bridge->domain_nr << 24;
> > +
> > + bus = pbus->number << 16;
> > + *sbdf = seg | bus | dport_dev->devfn;
> > +
> > + return 0;
> > +}
> > +
> > +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
> > +{
> > + u64 param1 = 0, param2 = 0, param4 = 0;
> > + u32 flags;
> > + int rc;
> > +
> > + if (!einj_initialized)
> > + return -ENXIO;
> > +
> > + /* Only CXL error types can be specified */
> > + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> > + return -EINVAL;
> > +
> > + rc = validate_error_type(type);
> > + if (rc)
> > + return rc;
> > +
> > + param1 = (u64) rcrb;
> > + param2 = 0xfffffffffffff000;
> > + flags = 0x2;
> > +
> > + return einj_error_inject(type, flags, param1, param2, 0, param4);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
> > +
> > +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
> > +{
> > + u64 param1 = 0, param2 = 0, param4 = 0;
> > + u32 flags;
> > + int rc;
> > +
> > + if (!einj_initialized)
> > + return -ENXIO;
> > +
> > + /* Only CXL error types can be specified */
> > + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> > + return -EINVAL;
> > +
> > + rc = validate_error_type(type);
> > + if (rc)
> > + return rc;
> > +
> > + rc = cxl_dport_get_sbdf(dport, ¶m4);
> > + if (rc)
> > + return rc;
> > +
> > + flags = 0x4;
> > +
> > + return einj_error_inject(type, flags, param1, param2, 0, param4);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
> > +
> > +static int error_type_get(void *data, u64 *val)
> > +{
> > + *val = error_type;
> > +
> > + return 0;
> > +}
> > +
> > +static int error_type_set(void *data, u64 val)
> > +{
> > + int rc;
> > +
> > + /* CXL error types have to be injected from cxl debugfs */
> > + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
> > + return -EINVAL;
> > +
> > + rc = validate_error_type(val);
> > + if (rc)
> > + return rc;
> > +
> > error_type = val;
> >
> > return 0;
> > @@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
> > return 0;
> > }
> >
> > +bool einj_is_initialized(void)
> > +{
> > + return einj_initialized;
> > +}
> > +EXPORT_SYMBOL_GPL(einj_is_initialized);
> > +
> > static int __init einj_probe(struct platform_device *pdev)
> > {
> > int rc;
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 8c00fd6be730..c52c92222be5 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -3,6 +3,7 @@
> > #include <linux/platform_device.h>
> > #include <linux/memregion.h>
> > #include <linux/workqueue.h>
> > +#include <linux/einj-cxl.h>
> > #include <linux/debugfs.h>
> > #include <linux/device.h>
> > #include <linux/module.h>
> > @@ -797,6 +798,37 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
> > return rc;
> > }
> >
> > +DEFINE_SHOW_ATTRIBUTE(einj_cxl_available_error_type);
> > +
> > +static int cxl_einj_inject(void *data, u64 type)
> > +{
> > + struct cxl_dport *dport = data;
> > +
> > + if (dport->rch)
> > + return einj_cxl_inject_rch_error(dport->rcrb.base, type);
> > +
> > + return einj_cxl_inject_error(to_pci_dev(dport->dport_dev), type);
> > +}
> > +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
> > +
> > +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> > +{
> > + struct dentry *dir;
> > +
> > + /*
> > + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
> > + * EINJ expects a dport SBDF to be specified for 2.0 error injection.
> > + */
> > + if (!einj_is_initialized() ||
> > + (!dport->rch && !dev_is_pci(dport->dport_dev)))
> > + return;
> > +
> > + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
> > +
> > + debugfs_create_file("einj_inject", 0200, dir, dport,
> > + &cxl_einj_inject_fops);
> > +}
> > +
> > static struct cxl_port *__devm_cxl_add_port(struct device *host,
> > struct device *uport_dev,
> > resource_size_t component_reg_phys,
> > @@ -1144,6 +1176,8 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> > if (dev_is_pci(dport_dev))
> > dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> >
> > + cxl_debugfs_create_dport_dir(dport);
> > +
> > return dport;
> > }
> >
> > @@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void)
> >
> > cxl_debugfs = debugfs_create_dir("cxl", NULL);
> >
> > + if (einj_is_initialized()) {
> > + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
> > + &einj_cxl_available_error_type_fops);
> > + }
> > +
> > cxl_mbox_init();
> >
> > rc = cxl_memdev_init();
> > diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
> > new file mode 100644
> > index 000000000000..af57cc8580a6
> > --- /dev/null
> > +++ b/include/linux/einj-cxl.h
> > @@ -0,0 +1,45 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/*
> > + * CXL protocol Error INJection support.
> > + *
> > + * Copyright (c) 2023 Advanced Micro Devices, Inc.
> > + * All Rights Reserved.
> > + *
> > + * Author: Ben Cheatham <benjamin.cheatham@amd.com>
> > + */
> > +#ifndef CXL_EINJ_H
> > +#define CXL_EINJ_H
> > +
> > +#include <linux/pci.h>
> > +
> > +#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
>
> I was testing some other work using this kernel and was getting a linker error
> when I had CXL_BUS set to 'y' and ACPI_APEI_EINJ set to 'm'. I went ahead and
> solved it by changing the line above to use IS_REACHABLE() instead. The change
> shouldn't actually affect the functionality of the patch since this is in a build
> configuration where the CXL driver isn't using the EINJ module since CONFIG_CXL_EINJ
> is set to n due to Kconfig restraints.
>
> I can submit another version of this series with this fix, but I don't think it
> makes much sense for a 1 line change, so I've but a diff below with the change
> for whoever ends up putting this in their tree:
I don't think this is the right fix. CONFIG_CXL_EINJ should be disabled when
CONFIG_CXL_BUS is built-in. Looks like the polarity the dependency is
backwards. The problem with IS_REACHABLE() is it removes the simple
debug option of "send me your .config" to see why some functionality is
not turned on.
Could you test out flipping the dependency from:
depends on ACPI_APEI_EINJ >= CXL_BUS
...to:
depends on ACPI_APEI_EINJ <= CXL_BUS
?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-10 0:02 ` Dan Williams
@ 2024-02-10 6:31 ` Dan Williams
2024-02-12 14:12 ` Ben Cheatham
0 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2024-02-10 6:31 UTC (permalink / raw)
To: Dan Williams, Ben Cheatham, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
Dan Williams wrote:
> Ben Cheatham wrote:
> >
> >
> > On 2/8/24 2:00 PM, Ben Cheatham wrote:
> > > Implement CXL helper functions in the EINJ module for getting/injecting
> > > available CXL protocol error types and export them to sysfs under
> > > kernel/debug/cxl.
> > >
> > > The kernel/debug/cxl/einj_types file will print the available CXL
> > > protocol errors in the same format as the available_error_types
> > > file provided by the EINJ module. The
> > > kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
> > > error_type and error_inject files provided by the EINJ module, i.e.:
> > > writing an error type into $dport_dev/einj_inject will inject said error
> > > type into the CXL dport represented by $dport_dev.
> > >
> > > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> > > ---
> > > Documentation/ABI/testing/debugfs-cxl | 22 ++++
> > > MAINTAINERS | 1 +
> > > drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
> > > drivers/cxl/core/port.c | 39 +++++++
> > > include/linux/einj-cxl.h | 45 ++++++++
> > > 5 files changed, 255 insertions(+), 10 deletions(-)
> > > create mode 100644 include/linux/einj-cxl.h
> > >
> > > diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> > > index fe61d372e3fa..bcd985cca66a 100644
> > > --- a/Documentation/ABI/testing/debugfs-cxl
> > > +++ b/Documentation/ABI/testing/debugfs-cxl
> > > @@ -33,3 +33,25 @@ Description:
> > > device cannot clear poison from the address, -ENXIO is returned.
> > > The clear_poison attribute is only visible for devices
> > > supporting the capability.
> > > +
> > > +What: /sys/kernel/debug/cxl/einj_types
> > > +Date: January, 2024
> > > +KernelVersion: v6.9
> > > +Contact: linux-cxl@vger.kernel.org
> > > +Description:
> > > + (RO) Prints the CXL protocol error types made available by
> > > + the platform in the format "0x<error number> <error type>".
> > > + The <error number> can be written to einj_inject to inject
> > > + <error type> into a chosen dport.
> > > +
> > > +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
> > > +Date: January, 2024
> > > +KernelVersion: v6.9
> > > +Contact: linux-cxl@vger.kernel.org
> > > +Description:
> > > + (WO) Writing an integer to this file injects the corresponding
> > > + CXL protocol error into $dport_dev ($dport_dev will be a device
> > > + name from /sys/bus/pci/devices). The integer to type mapping for
> > > + injection can be found by reading from einj_types. If the dport
> > > + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> > > + a CXL 2.0 error is injected.
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 9104430e148e..02d7feb2ed1f 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
> > > S: Maintained
> > > F: drivers/cxl/
> > > F: include/uapi/linux/cxl_mem.h
> > > +F: include/linux/einj-cxl.h
> > > F: tools/testing/cxl/
> > >
> > > COMPUTE EXPRESS LINK PMU (CPMU)
> > > diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> > > index 73dde21d3e89..9137cc01f791 100644
> > > --- a/drivers/acpi/apei/einj.c
> > > +++ b/drivers/acpi/apei/einj.c
> > > @@ -21,6 +21,7 @@
> > > #include <linux/nmi.h>
> > > #include <linux/delay.h>
> > > #include <linux/mm.h>
> > > +#include <linux/einj-cxl.h>
> > > #include <linux/platform_device.h>
> > > #include <asm/unaligned.h>
> > >
> > > @@ -37,6 +38,20 @@
> > > #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> > > ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> > > ACPI_EINJ_MEMORY_FATAL)
> > > +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
> > > +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
> > > +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
> > > +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
> > > +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
> > > +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
> > > +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
> > > +#endif
> > > +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
> > > + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
> > > + ACPI_EINJ_CXL_CACHE_FATAL | \
> > > + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
> > > + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
> > > + ACPI_EINJ_CXL_MEM_FATAL)
> > >
> > > /*
> > > * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
> > > @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> > > if (type & ACPI5_VENDOR_BIT) {
> > > if (vendor_flags != SETWA_FLAGS_MEM)
> > > goto inject;
> > > - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> > > + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> > > goto inject;
> > > + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
> > > + goto inject;
> > > + }
> > >
> > > /*
> > > * Disallow crazy address masks that give BIOS leeway to pick
> > > @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
> > > "0x00000200\tPlatform Correctable\n",
> > > "0x00000400\tPlatform Uncorrectable non-fatal\n",
> > > "0x00000800\tPlatform Uncorrectable fatal\n",
> > > +};
> > > +
> > > +static const char * const einj_cxl_error_type_string[] = {
> > > "0x00001000\tCXL.cache Protocol Correctable\n",
> > > "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> > > "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> > > @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
> > >
> > > DEFINE_SHOW_ATTRIBUTE(available_error_type);
> > >
> > > -static int error_type_get(void *data, u64 *val)
> > > +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
> > > {
> > > - *val = error_type;
> > > + int cxl_err, rc;
> > > + u32 available_error_type = 0;
> > > +
> > > + if (!einj_initialized)
> > > + return -ENXIO;
> > > +
> > > + rc = einj_get_available_error_type(&available_error_type);
> > > + if (rc)
> > > + return rc;
> > > +
> > > + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
> > > + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
> > > +
> > > + if (available_error_type & cxl_err)
> > > + seq_puts(m, einj_cxl_error_type_string[pos]);
> > > + }
> > >
> > > return 0;
> > > }
> > > +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
> > >
> > > -static int error_type_set(void *data, u64 val)
> > > +static int validate_error_type(u64 type)
> > > {
> > > + u32 tval, vendor, available_error_type = 0;
> > > int rc;
> > > - u32 available_error_type = 0;
> > > - u32 tval, vendor;
> > >
> > > /* Only low 32 bits for error type are valid */
> > > - if (val & GENMASK_ULL(63, 32))
> > > + if (type & GENMASK_ULL(63, 32))
> > > return -EINVAL;
> > >
> > > /*
> > > * Vendor defined types have 0x80000000 bit set, and
> > > * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
> > > */
> > > - vendor = val & ACPI5_VENDOR_BIT;
> > > - tval = val & 0x7fffffff;
> > > + vendor = type & ACPI5_VENDOR_BIT;
> > > + tval = type & 0x7fffffff;
> > >
> > > /* Only one error type can be specified */
> > > if (tval & (tval - 1))
> > > @@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val)
> > > rc = einj_get_available_error_type(&available_error_type);
> > > if (rc)
> > > return rc;
> > > - if (!(val & available_error_type))
> > > + if (!(type & available_error_type))
> > > return -EINVAL;
> > > }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
> > > +{
> > > + struct pci_bus *pbus;
> > > + struct pci_host_bridge *bridge;
> > > + u64 seg = 0, bus;
> > > +
> > > + pbus = dport_dev->bus;
> > > + bridge = pci_find_host_bridge(pbus);
> > > +
> > > + if (!bridge)
> > > + return -ENODEV;
> > > +
> > > + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
> > > + seg = bridge->domain_nr << 24;
> > > +
> > > + bus = pbus->number << 16;
> > > + *sbdf = seg | bus | dport_dev->devfn;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
> > > +{
> > > + u64 param1 = 0, param2 = 0, param4 = 0;
> > > + u32 flags;
> > > + int rc;
> > > +
> > > + if (!einj_initialized)
> > > + return -ENXIO;
> > > +
> > > + /* Only CXL error types can be specified */
> > > + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> > > + return -EINVAL;
> > > +
> > > + rc = validate_error_type(type);
> > > + if (rc)
> > > + return rc;
> > > +
> > > + param1 = (u64) rcrb;
> > > + param2 = 0xfffffffffffff000;
> > > + flags = 0x2;
> > > +
> > > + return einj_error_inject(type, flags, param1, param2, 0, param4);
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
> > > +
> > > +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
> > > +{
> > > + u64 param1 = 0, param2 = 0, param4 = 0;
> > > + u32 flags;
> > > + int rc;
> > > +
> > > + if (!einj_initialized)
> > > + return -ENXIO;
> > > +
> > > + /* Only CXL error types can be specified */
> > > + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> > > + return -EINVAL;
> > > +
> > > + rc = validate_error_type(type);
> > > + if (rc)
> > > + return rc;
> > > +
> > > + rc = cxl_dport_get_sbdf(dport, ¶m4);
> > > + if (rc)
> > > + return rc;
> > > +
> > > + flags = 0x4;
> > > +
> > > + return einj_error_inject(type, flags, param1, param2, 0, param4);
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
> > > +
> > > +static int error_type_get(void *data, u64 *val)
> > > +{
> > > + *val = error_type;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int error_type_set(void *data, u64 val)
> > > +{
> > > + int rc;
> > > +
> > > + /* CXL error types have to be injected from cxl debugfs */
> > > + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
> > > + return -EINVAL;
> > > +
> > > + rc = validate_error_type(val);
> > > + if (rc)
> > > + return rc;
> > > +
> > > error_type = val;
> > >
> > > return 0;
> > > @@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
> > > return 0;
> > > }
> > >
> > > +bool einj_is_initialized(void)
> > > +{
> > > + return einj_initialized;
> > > +}
> > > +EXPORT_SYMBOL_GPL(einj_is_initialized);
> > > +
> > > static int __init einj_probe(struct platform_device *pdev)
> > > {
> > > int rc;
> > > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > > index 8c00fd6be730..c52c92222be5 100644
> > > --- a/drivers/cxl/core/port.c
> > > +++ b/drivers/cxl/core/port.c
> > > @@ -3,6 +3,7 @@
> > > #include <linux/platform_device.h>
> > > #include <linux/memregion.h>
> > > #include <linux/workqueue.h>
> > > +#include <linux/einj-cxl.h>
> > > #include <linux/debugfs.h>
> > > #include <linux/device.h>
> > > #include <linux/module.h>
> > > @@ -797,6 +798,37 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
> > > return rc;
> > > }
> > >
> > > +DEFINE_SHOW_ATTRIBUTE(einj_cxl_available_error_type);
> > > +
> > > +static int cxl_einj_inject(void *data, u64 type)
> > > +{
> > > + struct cxl_dport *dport = data;
> > > +
> > > + if (dport->rch)
> > > + return einj_cxl_inject_rch_error(dport->rcrb.base, type);
> > > +
> > > + return einj_cxl_inject_error(to_pci_dev(dport->dport_dev), type);
> > > +}
> > > +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
> > > +
> > > +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> > > +{
> > > + struct dentry *dir;
> > > +
> > > + /*
> > > + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
> > > + * EINJ expects a dport SBDF to be specified for 2.0 error injection.
> > > + */
> > > + if (!einj_is_initialized() ||
> > > + (!dport->rch && !dev_is_pci(dport->dport_dev)))
> > > + return;
> > > +
> > > + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
> > > +
> > > + debugfs_create_file("einj_inject", 0200, dir, dport,
> > > + &cxl_einj_inject_fops);
> > > +}
> > > +
> > > static struct cxl_port *__devm_cxl_add_port(struct device *host,
> > > struct device *uport_dev,
> > > resource_size_t component_reg_phys,
> > > @@ -1144,6 +1176,8 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
> > > if (dev_is_pci(dport_dev))
> > > dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
> > >
> > > + cxl_debugfs_create_dport_dir(dport);
> > > +
> > > return dport;
> > > }
> > >
> > > @@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void)
> > >
> > > cxl_debugfs = debugfs_create_dir("cxl", NULL);
> > >
> > > + if (einj_is_initialized()) {
> > > + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
> > > + &einj_cxl_available_error_type_fops);
> > > + }
> > > +
> > > cxl_mbox_init();
> > >
> > > rc = cxl_memdev_init();
> > > diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
> > > new file mode 100644
> > > index 000000000000..af57cc8580a6
> > > --- /dev/null
> > > +++ b/include/linux/einj-cxl.h
> > > @@ -0,0 +1,45 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > > +/*
> > > + * CXL protocol Error INJection support.
> > > + *
> > > + * Copyright (c) 2023 Advanced Micro Devices, Inc.
> > > + * All Rights Reserved.
> > > + *
> > > + * Author: Ben Cheatham <benjamin.cheatham@amd.com>
> > > + */
> > > +#ifndef CXL_EINJ_H
> > > +#define CXL_EINJ_H
> > > +
> > > +#include <linux/pci.h>
> > > +
> > > +#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
> >
> > I was testing some other work using this kernel and was getting a linker error
> > when I had CXL_BUS set to 'y' and ACPI_APEI_EINJ set to 'm'. I went ahead and
> > solved it by changing the line above to use IS_REACHABLE() instead. The change
> > shouldn't actually affect the functionality of the patch since this is in a build
> > configuration where the CXL driver isn't using the EINJ module since CONFIG_CXL_EINJ
> > is set to n due to Kconfig restraints.
> >
> > I can submit another version of this series with this fix, but I don't think it
> > makes much sense for a 1 line change, so I've but a diff below with the change
> > for whoever ends up putting this in their tree:
>
> I don't think this is the right fix. CONFIG_CXL_EINJ should be disabled when
> CONFIG_CXL_BUS is built-in. Looks like the polarity the dependency is
> backwards. The problem with IS_REACHABLE() is it removes the simple
> debug option of "send me your .config" to see why some functionality is
> not turned on.
>
> Could you test out flipping the dependency from:
>
> depends on ACPI_APEI_EINJ >= CXL_BUS
>
> ...to:
>
> depends on ACPI_APEI_EINJ <= CXL_BUS
Wait, no y == 2 and m == 1, so:
depends on ACPI_APEI_EINJ >= CXL_BUS
...is correct, but you need IS_ENABLED(CONFIG_CXL_EINJ), not
IS_REACHABLE to guard those exports.
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types
2024-02-08 20:00 [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
` (3 preceding siblings ...)
2024-02-08 20:00 ` [PATCH v11 4/4] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
@ 2024-02-11 1:15 ` Dan Williams
2024-02-12 14:13 ` Ben Cheatham
4 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2024-02-11 1:15 UTC (permalink / raw)
To: Ben Cheatham, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham, alexey.kardashevskiy,
Avadhut.Naik, bp, tony.luck, rafael.j.wysocki
[ Cc all the signers of 709f3cbd652e ("ACPI: APEI: EINJ: Refactor
available_error_type_show()") ]
Ben Cheatham wrote:
> v11 Changes:
> - Drop patch 2/6 (Add CXL protocol error defines) and put the
> defines in patch 4/6 instead (Dan)
> - Add Dan's reviewed-by
>
> v10 Changes:
> - Fixups in EINJ module initializtion (Dan)
> - Add include/linux/einj-cxl.h to MAINTAINERS under CXL subsystem
> (Dan)
> - Replace usages of IS_ENABLED(CONFIG_CXL_EINJ) with new
> einj_is_initialized() function in cxl/core/port.c (Dan)
> - Fix typo in EINJ documentation (Dan)
>
> The new CXL error types will use the Memory Address field in the
> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
> compliant memory-mapped downstream port. The value of the memory address
> will be in the port's MMIO range, and it will not represent physical
> (normal or persistent) memory.
>
> Add the functionality for injecting CXL 1.1 errors to the EINJ module,
> but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
> Instead, make the error types available under /sys/kernel/debug/cxl.
> This allows for validating the MMIO address for a CXL 1.1 error type
> while also not making the user responsible for finding it.
>
> Ben Cheatham (4):
> cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
> EINJ: Migrate to a platform driver
> cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
I think the above that go across cxl and EINJ can just be prefixed:
"cxl, EINJ:"
Also please rebase this on v6.8-rc3 to resolve a conflict with:
709f3cbd652e ACPI: APEI: EINJ: Refactor available_error_type_show()
That should also allow you to fixup the missing ifdef CONFIG_CXL_EINJ
guard around the EINJ driver exports.
This needs at least one "non-Dan" reviewer for the ACPI side.
> EINJ, Documentation: Update EINJ kernel doc
>
> Documentation/ABI/testing/debugfs-cxl | 22 ++
> .../firmware-guide/acpi/apei/einj.rst | 19 ++
> MAINTAINERS | 1 +
> drivers/acpi/apei/einj.c | 202 ++++++++++++++++--
> drivers/cxl/Kconfig | 12 ++
> drivers/cxl/core/port.c | 39 ++++
> include/linux/einj-cxl.h | 45 ++++
> 7 files changed, 328 insertions(+), 12 deletions(-)
> create mode 100644 include/linux/einj-cxl.h
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-10 6:31 ` Dan Williams
@ 2024-02-12 14:12 ` Ben Cheatham
2024-02-13 17:29 ` Ben Cheatham
0 siblings, 1 reply; 21+ messages in thread
From: Ben Cheatham @ 2024-02-12 14:12 UTC (permalink / raw)
To: Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
On 2/10/24 12:31 AM, Dan Williams wrote:
> Dan Williams wrote:
>> Ben Cheatham wrote:
>>>
>>>
>>> On 2/8/24 2:00 PM, Ben Cheatham wrote:
>>>> Implement CXL helper functions in the EINJ module for getting/injecting
>>>> available CXL protocol error types and export them to sysfs under
>>>> kernel/debug/cxl.
>>>>
>>>> The kernel/debug/cxl/einj_types file will print the available CXL
>>>> protocol errors in the same format as the available_error_types
>>>> file provided by the EINJ module. The
>>>> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
>>>> error_type and error_inject files provided by the EINJ module, i.e.:
>>>> writing an error type into $dport_dev/einj_inject will inject said error
>>>> type into the CXL dport represented by $dport_dev.
>>>>
>>>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>>>> ---
>>>> Documentation/ABI/testing/debugfs-cxl | 22 ++++
>>>> MAINTAINERS | 1 +
>>>> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
>>>> drivers/cxl/core/port.c | 39 +++++++
>>>> include/linux/einj-cxl.h | 45 ++++++++
>>>> 5 files changed, 255 insertions(+), 10 deletions(-)
>>>> create mode 100644 include/linux/einj-cxl.h
>>>>
>>>> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
>>>> index fe61d372e3fa..bcd985cca66a 100644
>>>> --- a/Documentation/ABI/testing/debugfs-cxl
>>>> +++ b/Documentation/ABI/testing/debugfs-cxl
>>>> @@ -33,3 +33,25 @@ Description:
>>>> device cannot clear poison from the address, -ENXIO is returned.
>>>> The clear_poison attribute is only visible for devices
>>>> supporting the capability.
>>>> +
>>>> +What: /sys/kernel/debug/cxl/einj_types
>>>> +Date: January, 2024
>>>> +KernelVersion: v6.9
>>>> +Contact: linux-cxl@vger.kernel.org
>>>> +Description:
>>>> + (RO) Prints the CXL protocol error types made available by
>>>> + the platform in the format "0x<error number> <error type>".
>>>> + The <error number> can be written to einj_inject to inject
>>>> + <error type> into a chosen dport.
>>>> +
>>>> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
>>>> +Date: January, 2024
>>>> +KernelVersion: v6.9
>>>> +Contact: linux-cxl@vger.kernel.org
>>>> +Description:
>>>> + (WO) Writing an integer to this file injects the corresponding
>>>> + CXL protocol error into $dport_dev ($dport_dev will be a device
>>>> + name from /sys/bus/pci/devices). The integer to type mapping for
>>>> + injection can be found by reading from einj_types. If the dport
>>>> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
>>>> + a CXL 2.0 error is injected.
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 9104430e148e..02d7feb2ed1f 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
>>>> S: Maintained
>>>> F: drivers/cxl/
>>>> F: include/uapi/linux/cxl_mem.h
>>>> +F: include/linux/einj-cxl.h
>>>> F: tools/testing/cxl/
>>>>
>>>> COMPUTE EXPRESS LINK PMU (CPMU)
>>>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>>>> index 73dde21d3e89..9137cc01f791 100644
>>>> --- a/drivers/acpi/apei/einj.c
>>>> +++ b/drivers/acpi/apei/einj.c
>>>> @@ -21,6 +21,7 @@
>>>> #include <linux/nmi.h>
>>>> #include <linux/delay.h>
>>>> #include <linux/mm.h>
>>>> +#include <linux/einj-cxl.h>
>>>> #include <linux/platform_device.h>
>>>> #include <asm/unaligned.h>
>>>>
>>>> @@ -37,6 +38,20 @@
>>>> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
>>>> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>>>> ACPI_EINJ_MEMORY_FATAL)
>>>> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
>>>> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
>>>> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
>>>> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
>>>> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
>>>> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
>>>> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
>>>> +#endif
>>>> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
>>>> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
>>>> + ACPI_EINJ_CXL_CACHE_FATAL | \
>>>> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
>>>> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
>>>> + ACPI_EINJ_CXL_MEM_FATAL)
>>>>
>>>> /*
>>>> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>>>> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>>>> if (type & ACPI5_VENDOR_BIT) {
>>>> if (vendor_flags != SETWA_FLAGS_MEM)
>>>> goto inject;
>>>> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
>>>> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
>>>> goto inject;
>>>> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
>>>> + goto inject;
>>>> + }
>>>>
>>>> /*
>>>> * Disallow crazy address masks that give BIOS leeway to pick
>>>> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
>>>> "0x00000200\tPlatform Correctable\n",
>>>> "0x00000400\tPlatform Uncorrectable non-fatal\n",
>>>> "0x00000800\tPlatform Uncorrectable fatal\n",
>>>> +};
>>>> +
>>>> +static const char * const einj_cxl_error_type_string[] = {
>>>> "0x00001000\tCXL.cache Protocol Correctable\n",
>>>> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>>>> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>>>> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
>>>>
>>>> DEFINE_SHOW_ATTRIBUTE(available_error_type);
>>>>
>>>> -static int error_type_get(void *data, u64 *val)
>>>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
>>>> {
>>>> - *val = error_type;
>>>> + int cxl_err, rc;
>>>> + u32 available_error_type = 0;
>>>> +
>>>> + if (!einj_initialized)
>>>> + return -ENXIO;
>>>> +
>>>> + rc = einj_get_available_error_type(&available_error_type);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
>>>> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
>>>> +
>>>> + if (available_error_type & cxl_err)
>>>> + seq_puts(m, einj_cxl_error_type_string[pos]);
>>>> + }
>>>>
>>>> return 0;
>>>> }
>>>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
>>>>
>>>> -static int error_type_set(void *data, u64 val)
>>>> +static int validate_error_type(u64 type)
>>>> {
>>>> + u32 tval, vendor, available_error_type = 0;
>>>> int rc;
>>>> - u32 available_error_type = 0;
>>>> - u32 tval, vendor;
>>>>
>>>> /* Only low 32 bits for error type are valid */
>>>> - if (val & GENMASK_ULL(63, 32))
>>>> + if (type & GENMASK_ULL(63, 32))
>>>> return -EINVAL;
>>>>
>>>> /*
>>>> * Vendor defined types have 0x80000000 bit set, and
>>>> * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
>>>> */
>>>> - vendor = val & ACPI5_VENDOR_BIT;
>>>> - tval = val & 0x7fffffff;
>>>> + vendor = type & ACPI5_VENDOR_BIT;
>>>> + tval = type & 0x7fffffff;
>>>>
>>>> /* Only one error type can be specified */
>>>> if (tval & (tval - 1))
>>>> @@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val)
>>>> rc = einj_get_available_error_type(&available_error_type);
>>>> if (rc)
>>>> return rc;
>>>> - if (!(val & available_error_type))
>>>> + if (!(type & available_error_type))
>>>> return -EINVAL;
>>>> }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
>>>> +{
>>>> + struct pci_bus *pbus;
>>>> + struct pci_host_bridge *bridge;
>>>> + u64 seg = 0, bus;
>>>> +
>>>> + pbus = dport_dev->bus;
>>>> + bridge = pci_find_host_bridge(pbus);
>>>> +
>>>> + if (!bridge)
>>>> + return -ENODEV;
>>>> +
>>>> + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
>>>> + seg = bridge->domain_nr << 24;
>>>> +
>>>> + bus = pbus->number << 16;
>>>> + *sbdf = seg | bus | dport_dev->devfn;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
>>>> +{
>>>> + u64 param1 = 0, param2 = 0, param4 = 0;
>>>> + u32 flags;
>>>> + int rc;
>>>> +
>>>> + if (!einj_initialized)
>>>> + return -ENXIO;
>>>> +
>>>> + /* Only CXL error types can be specified */
>>>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
>>>> + return -EINVAL;
>>>> +
>>>> + rc = validate_error_type(type);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + param1 = (u64) rcrb;
>>>> + param2 = 0xfffffffffffff000;
>>>> + flags = 0x2;
>>>> +
>>>> + return einj_error_inject(type, flags, param1, param2, 0, param4);
>>>> +}
>>>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
>>>> +
>>>> +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
>>>> +{
>>>> + u64 param1 = 0, param2 = 0, param4 = 0;
>>>> + u32 flags;
>>>> + int rc;
>>>> +
>>>> + if (!einj_initialized)
>>>> + return -ENXIO;
>>>> +
>>>> + /* Only CXL error types can be specified */
>>>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
>>>> + return -EINVAL;
>>>> +
>>>> + rc = validate_error_type(type);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + rc = cxl_dport_get_sbdf(dport, ¶m4);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + flags = 0x4;
>>>> +
>>>> + return einj_error_inject(type, flags, param1, param2, 0, param4);
>>>> +}
>>>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
>>>> +
>>>> +static int error_type_get(void *data, u64 *val)
>>>> +{
>>>> + *val = error_type;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int error_type_set(void *data, u64 val)
>>>> +{
>>>> + int rc;
>>>> +
>>>> + /* CXL error types have to be injected from cxl debugfs */
>>>> + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
>>>> + return -EINVAL;
>>>> +
>>>> + rc = validate_error_type(val);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> error_type = val;
>>>>
>>>> return 0;
>>>> @@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
>>>> return 0;
>>>> }
>>>>
>>>> +bool einj_is_initialized(void)
>>>> +{
>>>> + return einj_initialized;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(einj_is_initialized);
>>>> +
>>>> static int __init einj_probe(struct platform_device *pdev)
>>>> {
>>>> int rc;
>>>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>>>> index 8c00fd6be730..c52c92222be5 100644
>>>> --- a/drivers/cxl/core/port.c
>>>> +++ b/drivers/cxl/core/port.c
>>>> @@ -3,6 +3,7 @@
>>>> #include <linux/platform_device.h>
>>>> #include <linux/memregion.h>
>>>> #include <linux/workqueue.h>
>>>> +#include <linux/einj-cxl.h>
>>>> #include <linux/debugfs.h>
>>>> #include <linux/device.h>
>>>> #include <linux/module.h>
>>>> @@ -797,6 +798,37 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
>>>> return rc;
>>>> }
>>>>
>>>> +DEFINE_SHOW_ATTRIBUTE(einj_cxl_available_error_type);
>>>> +
>>>> +static int cxl_einj_inject(void *data, u64 type)
>>>> +{
>>>> + struct cxl_dport *dport = data;
>>>> +
>>>> + if (dport->rch)
>>>> + return einj_cxl_inject_rch_error(dport->rcrb.base, type);
>>>> +
>>>> + return einj_cxl_inject_error(to_pci_dev(dport->dport_dev), type);
>>>> +}
>>>> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
>>>> +
>>>> +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
>>>> +{
>>>> + struct dentry *dir;
>>>> +
>>>> + /*
>>>> + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
>>>> + * EINJ expects a dport SBDF to be specified for 2.0 error injection.
>>>> + */
>>>> + if (!einj_is_initialized() ||
>>>> + (!dport->rch && !dev_is_pci(dport->dport_dev)))
>>>> + return;
>>>> +
>>>> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
>>>> +
>>>> + debugfs_create_file("einj_inject", 0200, dir, dport,
>>>> + &cxl_einj_inject_fops);
>>>> +}
>>>> +
>>>> static struct cxl_port *__devm_cxl_add_port(struct device *host,
>>>> struct device *uport_dev,
>>>> resource_size_t component_reg_phys,
>>>> @@ -1144,6 +1176,8 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
>>>> if (dev_is_pci(dport_dev))
>>>> dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
>>>>
>>>> + cxl_debugfs_create_dport_dir(dport);
>>>> +
>>>> return dport;
>>>> }
>>>>
>>>> @@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void)
>>>>
>>>> cxl_debugfs = debugfs_create_dir("cxl", NULL);
>>>>
>>>> + if (einj_is_initialized()) {
>>>> + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
>>>> + &einj_cxl_available_error_type_fops);
>>>> + }
>>>> +
>>>> cxl_mbox_init();
>>>>
>>>> rc = cxl_memdev_init();
>>>> diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
>>>> new file mode 100644
>>>> index 000000000000..af57cc8580a6
>>>> --- /dev/null
>>>> +++ b/include/linux/einj-cxl.h
>>>> @@ -0,0 +1,45 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>>>> +/*
>>>> + * CXL protocol Error INJection support.
>>>> + *
>>>> + * Copyright (c) 2023 Advanced Micro Devices, Inc.
>>>> + * All Rights Reserved.
>>>> + *
>>>> + * Author: Ben Cheatham <benjamin.cheatham@amd.com>
>>>> + */
>>>> +#ifndef CXL_EINJ_H
>>>> +#define CXL_EINJ_H
>>>> +
>>>> +#include <linux/pci.h>
>>>> +
>>>> +#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
>>>
>>> I was testing some other work using this kernel and was getting a linker error
>>> when I had CXL_BUS set to 'y' and ACPI_APEI_EINJ set to 'm'. I went ahead and
>>> solved it by changing the line above to use IS_REACHABLE() instead. The change
>>> shouldn't actually affect the functionality of the patch since this is in a build
>>> configuration where the CXL driver isn't using the EINJ module since CONFIG_CXL_EINJ
>>> is set to n due to Kconfig restraints.
>>>
>>> I can submit another version of this series with this fix, but I don't think it
>>> makes much sense for a 1 line change, so I've but a diff below with the change
>>> for whoever ends up putting this in their tree:
>>
>> I don't think this is the right fix. CONFIG_CXL_EINJ should be disabled when
>> CONFIG_CXL_BUS is built-in. Looks like the polarity the dependency is
>> backwards. The problem with IS_REACHABLE() is it removes the simple
>> debug option of "send me your .config" to see why some functionality is
>> not turned on.
>>
>> Could you test out flipping the dependency from:
>>
>> depends on ACPI_APEI_EINJ >= CXL_BUS
>>
>> ...to:
>>
>> depends on ACPI_APEI_EINJ <= CXL_BUS
>
> Wait, no y == 2 and m == 1, so:
>
> depends on ACPI_APEI_EINJ >= CXL_BUS
>
> ...is correct, but you need IS_ENABLED(CONFIG_CXL_EINJ), not
> IS_REACHABLE to guard those exports.
Ok, I'll test it out today and let you know if it works.
Thanks,
Ben
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types
2024-02-11 1:15 ` [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Dan Williams
@ 2024-02-12 14:13 ` Ben Cheatham
0 siblings, 0 replies; 21+ messages in thread
From: Ben Cheatham @ 2024-02-12 14:13 UTC (permalink / raw)
To: Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, alexey.kardashevskiy, Avadhut.Naik, bp,
tony.luck, rafael.j.wysocki
On 2/10/24 7:15 PM, Dan Williams wrote:
> [ Cc all the signers of 709f3cbd652e ("ACPI: APEI: EINJ: Refactor
> available_error_type_show()") ]
>
> Ben Cheatham wrote:
>> v11 Changes:
>> - Drop patch 2/6 (Add CXL protocol error defines) and put the
>> defines in patch 4/6 instead (Dan)
>> - Add Dan's reviewed-by
>>
>> v10 Changes:
>> - Fixups in EINJ module initializtion (Dan)
>> - Add include/linux/einj-cxl.h to MAINTAINERS under CXL subsystem
>> (Dan)
>> - Replace usages of IS_ENABLED(CONFIG_CXL_EINJ) with new
>> einj_is_initialized() function in cxl/core/port.c (Dan)
>> - Fix typo in EINJ documentation (Dan)
>>
>> The new CXL error types will use the Memory Address field in the
>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>> compliant memory-mapped downstream port. The value of the memory address
>> will be in the port's MMIO range, and it will not represent physical
>> (normal or persistent) memory.
>>
>> Add the functionality for injecting CXL 1.1 errors to the EINJ module,
>> but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
>> Instead, make the error types available under /sys/kernel/debug/cxl.
>> This allows for validating the MMIO address for a CXL 1.1 error type
>> while also not making the user responsible for finding it.
>>
>> Ben Cheatham (4):
>> cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
>> EINJ: Migrate to a platform driver
>> cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
>
> I think the above that go across cxl and EINJ can just be prefixed:
>
> "cxl, EINJ:"
>
> Also please rebase this on v6.8-rc3 to resolve a conflict with:
>
> 709f3cbd652e ACPI: APEI: EINJ: Refactor available_error_type_show()
>
> That should also allow you to fixup the missing ifdef CONFIG_CXL_EINJ
> guard around the EINJ driver exports.
Alright, will do.
Thanks,
Ben
>
> This needs at least one "non-Dan" reviewer for the ACPI side.
>
>> EINJ, Documentation: Update EINJ kernel doc
>>
>> Documentation/ABI/testing/debugfs-cxl | 22 ++
>> .../firmware-guide/acpi/apei/einj.rst | 19 ++
>> MAINTAINERS | 1 +
>> drivers/acpi/apei/einj.c | 202 ++++++++++++++++--
>> drivers/cxl/Kconfig | 12 ++
>> drivers/cxl/core/port.c | 39 ++++
>> include/linux/einj-cxl.h | 45 ++++
>> 7 files changed, 328 insertions(+), 12 deletions(-)
>> create mode 100644 include/linux/einj-cxl.h
>>
>> --
>> 2.34.1
>>
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-12 14:12 ` Ben Cheatham
@ 2024-02-13 17:29 ` Ben Cheatham
2024-02-13 18:08 ` Dan Williams
0 siblings, 1 reply; 21+ messages in thread
From: Ben Cheatham @ 2024-02-13 17:29 UTC (permalink / raw)
To: Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
On 2/12/24 8:12 AM, Ben Cheatham wrote:
>
>
> On 2/10/24 12:31 AM, Dan Williams wrote:
[snip]
>>>
>>> I don't think this is the right fix. CONFIG_CXL_EINJ should be disabled when
>>> CONFIG_CXL_BUS is built-in. Looks like the polarity the dependency is
>>> backwards. The problem with IS_REACHABLE() is it removes the simple
>>> debug option of "send me your .config" to see why some functionality is
>>> not turned on.
>>>
>>> Could you test out flipping the dependency from:
>>>
>>> depends on ACPI_APEI_EINJ >= CXL_BUS
>>>
>>> ...to:
>>>
>>> depends on ACPI_APEI_EINJ <= CXL_BUS
>>
>> Wait, no y == 2 and m == 1, so:
>>
>> depends on ACPI_APEI_EINJ >= CXL_BUS
>>
>> ...is correct, but you need IS_ENABLED(CONFIG_CXL_EINJ), not
>> IS_REACHABLE to guard those exports.
>
> Ok, I'll test it out today and let you know if it works.
>
Alright, I tested it yesterday and that fix doesn't work. When I change the guard to
IS_ENABLED(CONFIG_CXL_EINJ) and set CONFIG_CXL_EINJ=n and CONFIG_ACPI_APEI_EINJ=y/m
I get a redefinition error in einj.c. I've found a 4 ways to fix this so far, here they are:
1. Leave the guard as IS_ENABLED(CONFIG_ACPI_APEI_EINJ) and add IS_ENABLED(CONFIG_CXL_EINJ)
around any calls to the functions in einj-cxl.h in cxl/core/port.c.
2. Change the guard to IS_ENABLED(CONFIG_CXL_EINJ) || IS_REACHABLE(CONFIG_ACPI_APEI_EINJ) and
add a IS_ENABLED(CONFIG_CXL_EINJ) check at the beginning of the einj_cxl_* functions in einj.c.
I'm not a fan of this one since it doesn't actually change what's built and just "artificially"
gates the functionality.
3. Change the guard to IS_ENABLED(CONFIG_CXL_EINJ) and add a #if IS_ENABLED(CONFIG_CXL_EINJ) guard
around the einj_cxl_* functions in einj.c. (I know this is pretty undesirable, but I thought I'd
mention it for posterity's sake).
4. Change the Kconfig definition of CXL_EINJ to depends on ACPI_APEI_EINJ = CXL_BUS.
My vote is for option 4, but I figured I should ask to see if you (or anyone else reading this thread)
likes a different one or has a better idea than the ones I laid out.
Thanks,
Ben
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-13 17:29 ` Ben Cheatham
@ 2024-02-13 18:08 ` Dan Williams
0 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2024-02-13 18:08 UTC (permalink / raw)
To: Ben Cheatham, Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
Ben Cheatham wrote:
[..]
> >> Wait, no y == 2 and m == 1, so:
> >>
> >> depends on ACPI_APEI_EINJ >= CXL_BUS
> >>
> >> ...is correct, but you need IS_ENABLED(CONFIG_CXL_EINJ), not
> >> IS_REACHABLE to guard those exports.
> >
> > Ok, I'll test it out today and let you know if it works.
> >
>
> Alright, I tested it yesterday and that fix doesn't work. When I
> change the guard to IS_ENABLED(CONFIG_CXL_EINJ) and set
> CONFIG_CXL_EINJ=n and CONFIG_ACPI_APEI_EINJ=y/m I get a redefinition
> error in einj.c. I've found a 4 ways to fix this so far, here they
> are:
>
[..]
> 4. Change the Kconfig definition of CXL_EINJ to depends on ACPI_APEI_EINJ = CXL_BUS.
>
> My vote is for option 4, but I figured I should ask to see if you (or
> anyone else reading this thread) likes a different one or has a better
> idea than the ones I laid out.
Looks good to me, it is the easiest to understand. Everything else looks
like a Kbuild unit test that only a build-bot would love. EINJ needs to
remain modular for the non-CXL case and if someone decides they want to
turn on this debug feature then CXL needs to be modular too.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
2024-02-08 20:00 ` [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
@ 2024-02-14 14:05 ` Jonathan Cameron
2024-02-14 14:34 ` Ben Cheatham
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-14 14:05 UTC (permalink / raw)
To: Ben Cheatham
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On Thu, 8 Feb 2024 14:00:39 -0600
Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> Add CONFIG_CXL_EINJ to cxl/Kconfig. This option will allow for the CXL
> core module to access helpers inside the EINJ module, while also giving
> users the option of disabling CXL EINJ error types at build time.
>
> Also update CONFIG_ACPI_APEI_EINJ to set CONFIG_CXL_EINJ by default.
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Hi Ben,
I'm not a particular fan of Kconfig only patches, so would personally
have squashed this with patch 3 (or wherever it gets used)
That would also have had the side effect of making it clear this
doesn't actually get used (you and Dan have been discussing how
it should be).
Jonathan
> ---
> drivers/cxl/Kconfig | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 67998dbd1d46..95f215a2e5dc 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -157,4 +157,16 @@ config CXL_PMU
> monitoring units and provide standard perf based interfaces.
>
> If unsure say 'm'.
> +
> +config CXL_EINJ
> + bool "CXL Error INJection Support"
> + default ACPI_APEI_EINJ
> + depends on ACPI_APEI_EINJ >= CXL_BUS
> + help
> + Support for CXL protocol Error INJection through debugfs/cxl.
> + Availability and which errors are supported is dependent on
> + the host platform. Look to ACPI v6.5 section 18.6.4 and kernel
> + EINJ documentation for more information.
> +
> + If unsure say 'n'
> endif
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
2024-02-14 14:05 ` Jonathan Cameron
@ 2024-02-14 14:34 ` Ben Cheatham
0 siblings, 0 replies; 21+ messages in thread
From: Ben Cheatham @ 2024-02-14 14:34 UTC (permalink / raw)
To: Jonathan Cameron
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On 2/14/24 8:05 AM, Jonathan Cameron wrote:
> On Thu, 8 Feb 2024 14:00:39 -0600
> Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
>
>> Add CONFIG_CXL_EINJ to cxl/Kconfig. This option will allow for the CXL
>> core module to access helpers inside the EINJ module, while also giving
>> users the option of disabling CXL EINJ error types at build time.
>>
>> Also update CONFIG_ACPI_APEI_EINJ to set CONFIG_CXL_EINJ by default.
>>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>
> Hi Ben,
>
> I'm not a particular fan of Kconfig only patches, so would personally
> have squashed this with patch 3 (or wherever it gets used)
>
> That would also have had the side effect of making it clear this
> doesn't actually get used (you and Dan have been discussing how
> it should be).
>
> Jonathan
>
I agree, I'll fold this in to patch 3.
Thanks,
Ben
>
>> ---
>> drivers/cxl/Kconfig | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>> index 67998dbd1d46..95f215a2e5dc 100644
>> --- a/drivers/cxl/Kconfig
>> +++ b/drivers/cxl/Kconfig
>> @@ -157,4 +157,16 @@ config CXL_PMU
>> monitoring units and provide standard perf based interfaces.
>>
>> If unsure say 'm'.
>> +
>> +config CXL_EINJ
>> + bool "CXL Error INJection Support"
>> + default ACPI_APEI_EINJ
>> + depends on ACPI_APEI_EINJ >= CXL_BUS
>> + help
>> + Support for CXL protocol Error INJection through debugfs/cxl.
>> + Availability and which errors are supported is dependent on
>> + the host platform. Look to ACPI v6.5 section 18.6.4 and kernel
>> + EINJ documentation for more information.
>> +
>> + If unsure say 'n'
>> endif
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 2/4] EINJ: Migrate to a platform driver
2024-02-08 20:00 ` [PATCH v11 2/4] EINJ: Migrate to a platform driver Ben Cheatham
@ 2024-02-14 14:37 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-14 14:37 UTC (permalink / raw)
To: Ben Cheatham
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On Thu, 8 Feb 2024 14:00:40 -0600
Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> Change the EINJ module to install a platform device/driver on module
> init and move the module init() and exit() functions to driver probe and
> remove. This change allows the EINJ module to load regardless of whether
> setting up EINJ succeeds, which allows dependent modules to still load
> (i.e. the CXL core).
>
> Since EINJ may no longer be initialized when the module loads, any
> functions that are called from dependent/external modules should check
> the einj_initialized variable before calling any EINJ functions.
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
As there is not a platform device about, could reduce the number of globals
in here, but job for another day.
Also devm fun might be useful to reduce the need for manual error handling
and remove().
So generally this enables some potential tidying up, but what you have
here LGTM and would be first step in that process anyway.
If I run out of other stuff to do (unlikely ;) then I might take a stab
at such a cleanup sometime.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/acpi/apei/einj.c | 44 ++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 013eb621dc92..73dde21d3e89 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -21,6 +21,7 @@
> #include <linux/nmi.h>
> #include <linux/delay.h>
> #include <linux/mm.h>
> +#include <linux/platform_device.h>
> #include <asm/unaligned.h>
>
> #include "apei-internal.h"
> @@ -136,6 +137,11 @@ static struct apei_exec_ins_type einj_ins_type[] = {
> */
> static DEFINE_MUTEX(einj_mutex);
>
> +/*
> + * Exported APIs use this flag to exit early if einj_probe() failed.
> + */
> +static bool einj_initialized __ro_after_init;
> +
> static void *einj_param;
>
> static void einj_exec_ctx_init(struct apei_exec_context *ctx)
> @@ -684,7 +690,7 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
> return 0;
> }
>
> -static int __init einj_init(void)
> +static int __init einj_probe(struct platform_device *pdev)
> {
> int rc;
> acpi_status status;
> @@ -782,7 +788,7 @@ static int __init einj_init(void)
> return rc;
> }
>
> -static void __exit einj_exit(void)
> +static void __exit einj_remove(struct platform_device *pdev)
> {
> struct apei_exec_context ctx;
>
> @@ -801,6 +807,40 @@ static void __exit einj_exit(void)
> acpi_put_table((struct acpi_table_header *)einj_tab);
> }
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-08 20:00 ` [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
2024-02-09 21:30 ` Ben Cheatham
@ 2024-02-14 15:27 ` Jonathan Cameron
2024-02-14 16:41 ` Ben Cheatham
1 sibling, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-14 15:27 UTC (permalink / raw)
To: Ben Cheatham
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On Thu, 8 Feb 2024 14:00:41 -0600
Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> Implement CXL helper functions in the EINJ module for getting/injecting
> available CXL protocol error types and export them to sysfs under
> kernel/debug/cxl.
>
> The kernel/debug/cxl/einj_types file will print the available CXL
> protocol errors in the same format as the available_error_types
> file provided by the EINJ module. The
> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
> error_type and error_inject files provided by the EINJ module, i.e.:
> writing an error type into $dport_dev/einj_inject will inject said error
> type into the CXL dport represented by $dport_dev.
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Hi Ben,
Sorry I've not looked at this sooner.
Anyhow, some comments inline. Mostly looks good to me.
Jonathan
> ---
> Documentation/ABI/testing/debugfs-cxl | 22 ++++
> MAINTAINERS | 1 +
> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
> drivers/cxl/core/port.c | 39 +++++++
> include/linux/einj-cxl.h | 45 ++++++++
> 5 files changed, 255 insertions(+), 10 deletions(-)
> create mode 100644 include/linux/einj-cxl.h
>
> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> index fe61d372e3fa..bcd985cca66a 100644
> --- a/Documentation/ABI/testing/debugfs-cxl
> +++ b/Documentation/ABI/testing/debugfs-cxl
> @@ -33,3 +33,25 @@ Description:
> device cannot clear poison from the address, -ENXIO is returned.
> The clear_poison attribute is only visible for devices
> supporting the capability.
> +
> +What: /sys/kernel/debug/cxl/einj_types
> +Date: January, 2024
> +KernelVersion: v6.9
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (RO) Prints the CXL protocol error types made available by
> + the platform in the format "0x<error number> <error type>".
> + The <error number> can be written to einj_inject to inject
> + <error type> into a chosen dport.
I think it's a limited set, so docs could include what the error_type values can
be? From this description it's not obvious they are human readable strings.
> +
> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
> +Date: January, 2024
> +KernelVersion: v6.9
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (WO) Writing an integer to this file injects the corresponding
> + CXL protocol error into $dport_dev ($dport_dev will be a device
> + name from /sys/bus/pci/devices). The integer to type mapping for
> + injection can be found by reading from einj_types. If the dport
> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> + a CXL 2.0 error is injected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9104430e148e..02d7feb2ed1f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
> S: Maintained
> F: drivers/cxl/
> F: include/uapi/linux/cxl_mem.h
> +F: include/linux/einj-cxl.h
> F: tools/testing/cxl/
>
> COMPUTE EXPRESS LINK PMU (CPMU)
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 73dde21d3e89..9137cc01f791 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -21,6 +21,7 @@
> #include <linux/nmi.h>
> #include <linux/delay.h>
> #include <linux/mm.h>
> +#include <linux/einj-cxl.h>
> #include <linux/platform_device.h>
> #include <asm/unaligned.h>
>
> @@ -37,6 +38,20 @@
> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> ACPI_EINJ_MEMORY_FATAL)
> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
> +#endif
> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
> + ACPI_EINJ_CXL_CACHE_FATAL | \
> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
> + ACPI_EINJ_CXL_MEM_FATAL)
>
> /*
> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> if (type & ACPI5_VENDOR_BIT) {
> if (vendor_flags != SETWA_FLAGS_MEM)
> goto inject;
> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> goto inject;
> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
> + goto inject;
> + }
>
> /*
> * Disallow crazy address masks that give BIOS leeway to pick
> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
> "0x00000200\tPlatform Correctable\n",
> "0x00000400\tPlatform Uncorrectable non-fatal\n",
> "0x00000800\tPlatform Uncorrectable fatal\n",
> +};
> +
> +static const char * const einj_cxl_error_type_string[] = {
> "0x00001000\tCXL.cache Protocol Correctable\n",
> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
>
> DEFINE_SHOW_ATTRIBUTE(available_error_type);
>
> -static int error_type_get(void *data, u64 *val)
> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
> {
> - *val = error_type;
> + int cxl_err, rc;
> + u32 available_error_type = 0;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + rc = einj_get_available_error_type(&available_error_type);
> + if (rc)
> + return rc;
> +
> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
Trivial so feel free to ignore but, I'd stick to local styles and have pos
declared in more traditional c style.
> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
Maybe clearer as
cxl_err = FIELD_PREP(CXL_ERROR_MASK, BIT(pos));
> +
> + if (available_error_type & cxl_err)
> + seq_puts(m, einj_cxl_error_type_string[pos]);
> + }
>
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
>
> -static int error_type_set(void *data, u64 val)
> +static int validate_error_type(u64 type)
> {
> + u32 tval, vendor, available_error_type = 0;
> int rc;
> - u32 available_error_type = 0;
> - u32 tval, vendor;
>
> /* Only low 32 bits for error type are valid */
> - if (val & GENMASK_ULL(63, 32))
> + if (type & GENMASK_ULL(63, 32))
> return -EINVAL;
>
> /*
> * Vendor defined types have 0x80000000 bit set, and
> * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
> */
> - vendor = val & ACPI5_VENDOR_BIT;
> - tval = val & 0x7fffffff;
> + vendor = type & ACPI5_VENDOR_BIT;
> + tval = type & 0x7fffffff;
Could flip this to GENMASK whilst you are here.
I don't much like counting fs :)
>
> /* Only one error type can be specified */
> if (tval & (tval - 1))
> @@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val)
> rc = einj_get_available_error_type(&available_error_type);
> if (rc)
> return rc;
> - if (!(val & available_error_type))
> + if (!(type & available_error_type))
> return -EINVAL;
> }
> +
> + return 0;
> +}
> +
> +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
> +{
> + struct pci_bus *pbus;
> + struct pci_host_bridge *bridge;
> + u64 seg = 0, bus;
> +
> + pbus = dport_dev->bus;
> + bridge = pci_find_host_bridge(pbus);
> +
> + if (!bridge)
> + return -ENODEV;
> +
> + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
Ah. x86 is not using the CONFIG_PCI_DOMAINS_GENERIC
so I guess we can't just use pci_domain_nr(pbus)?
(for the generic case bus->domain_nr is filled in when
the host bridge is registered). Pity.
> + seg = bridge->domain_nr << 24;
> +
> + bus = pbus->number << 16;
I'd do the shifts whilst building sbpf.
AS it stands you end up with what look to be wrong values in
seg and bus because you'd shifted them in the local variables.
> + *sbdf = seg | bus | dport_dev->devfn;
*sbdf = (seg << 24) | (bus << 16) | dport_dev->devfn;
(with shifts dropped where seg and bus are set)
> +
> + return 0;
> +}
> +
> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
> +{
> + u64 param1 = 0, param2 = 0, param4 = 0;
> + u32 flags;
> + int rc;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + /* Only CXL error types can be specified */
> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
As below - a utility function would unify the 3 uses of this and
avoid need for comment.
> + return -EINVAL;
> +
> + rc = validate_error_type(type);
> + if (rc)
> + return rc;
> +
> + param1 = (u64) rcrb;
already a u64
> + param2 = 0xfffffffffffff000;
No need to initialize these to 0 above.
> + flags = 0x2;
> +
> + return einj_error_inject(type, flags, param1, param2, 0, param4);
return einj_error_inject(type, flags,
rcrb, 0xfffffffffffff000, 0, 0);
> +}
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
> +
> +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
> +{
> + u64 param1 = 0, param2 = 0, param4 = 0;
> + u32 flags;
> + int rc;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + /* Only CXL error types can be specified */
> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
As below a utility function would do this nicely (and avoid need
for comment).
if (!is_cxl_error(type))
> + return -EINVAL;
> +
> + rc = validate_error_type(type);
> + if (rc)
> + return rc;
> +
> + rc = cxl_dport_get_sbdf(dport, ¶m4);
> + if (rc)
> + return rc;
> +
> + flags = 0x4;
> +
> + return einj_error_inject(type, flags, param1, param2, 0, param4);
Why not
return einj_error_injecT(type, 0x4, 0, 0, 0, param4);
?
Maybe flags is useful as "documentation" but given the parameters are nicely
in order and you already didn't bother with param3, I can't see why
keeping param1 and param2 as local variables is useful.
> +}
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
> +
> +static int error_type_set(void *data, u64 val)
> +{
> + int rc;
> +
> + /* CXL error types have to be injected from cxl debugfs */
> + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
Given you have inverse of this above, maybe it's worth a little
helper function to have the logic only once?
if (is_cxl_error(val))
> + return -EINVAL;
> +
> + rc = validate_error_type(val);
> + if (rc)
> + return rc;
> +
> error_type = val;
>
> return 0;
> @@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
> return 0;
> }
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 8c00fd6be730..c52c92222be5 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> +{
> + struct dentry *dir;
> +
> + /*
> + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
> + * EINJ expects a dport SBDF to be specified for 2.0 error injection.
> + */
> + if (!einj_is_initialized() ||
> + (!dport->rch && !dev_is_pci(dport->dport_dev)))
> + return;
Trivial: Even though it's a little more code I'd break this up so that it's clear
exactly what the comment refers to.
if (!einj_is_initialized)
return;
/*
* dport_dev needs to be a PCIe port for CXL 2.0+ ports because
* EINJ expects a dport SBDF to be specified for 2.0 error injection.
*/
if (!dport->rch && !dev_is_pci(dport->dport_dev))
return;
> +
> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
> +
> + debugfs_create_file("einj_inject", 0200, dir, dport,
> + &cxl_einj_inject_fops);
> +}
> +
> static struct cxl_port *__devm_cxl_add_port(struct device *host,
> struct device *uport_dev,
> resource_size_t component_reg_phys,
...
> @@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void)
>
> cxl_debugfs = debugfs_create_dir("cxl", NULL);
>
> + if (einj_is_initialized()) {
> + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
> + &einj_cxl_available_error_type_fops);
> + }
> +
> cxl_mbox_init();
>
> rc = cxl_memdev_init();
> diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
> new file mode 100644
> index 000000000000..af57cc8580a6
> --- /dev/null
> +++ b/include/linux/einj-cxl.h
> +
> +#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
> +int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
> +bool einj_is_initialized(void);
> +#else
It's trivial but if you are respinning, I'd like to see a comment for the
else and the endif to let us trivially see what they match with.
Lack of indenting for this preprocessor conditions can make this really
hard to undwind once a file grows more complex over time.
> +static inline int einj_cxl_available_error_type_show(struct seq_file *m,
> + void *v)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int einj_cxl_type_show(struct seq_file *m, void *data)
A stub for a function that doesn't exist otherwise. Left
over from refactors?
> +{
> + return -ENXIO;
> +}
> +
> +static inline int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
> +{
> + return -ENXIO;
> +}
> +
> +static inline bool einj_is_initialized(void) { return false; }
> +#endif
> +
> +#endif
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 4/4] EINJ, Documentation: Update EINJ kernel doc
2024-02-08 20:00 ` [PATCH v11 4/4] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
@ 2024-02-14 15:29 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-14 15:29 UTC (permalink / raw)
To: Ben Cheatham
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On Thu, 8 Feb 2024 14:00:42 -0600
Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> Update EINJ kernel document to include how to inject CXL protocol error
> types, build the kernel to include CXL error types, and give an example
> injection.
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> .../firmware-guide/acpi/apei/einj.rst | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
> index d6b61d22f525..f179adf7b61c 100644
> --- a/Documentation/firmware-guide/acpi/apei/einj.rst
> +++ b/Documentation/firmware-guide/acpi/apei/einj.rst
> @@ -181,6 +181,25 @@ You should see something like this in dmesg::
> [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
> [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
>
> +CXL error types are supported from ACPI 6.5 onwards. These error types
> +are not available in the legacy interface at /sys/kernel/debug/apei/einj,
> +and are instead at /sys/kernel/debug/cxl/. There is a file under debug/cxl
> +called "einj_type" that is analogous to available_error_type under debug/cxl.
> +There is also a "einj_inject" file in each $dport_dev directory under debug/cxl
> +that will inject a given error into the dport represented by $dport_dev.
> +For example, to inject a CXL.mem protocol correctable error into
> +$dport_dev=pci0000:0c::
> +
> + # cd /sys/kernel/debug/cxl/
> + # cat einj_type # See which error can be injected
> + 0x00008000 CXL.mem Protocol Correctable
> + 0x00010000 CXL.mem Protocol Uncorrectable non-fatal
> + 0x00020000 CXL.mem Protocol Uncorrectable fatal
> + # cd 0000:e0:01.1 # Navigate to dport to inject into
> + # echo 0x8000 > einj_inject # Inject error
> +
> +To use CXL error types, ``CONFIG_CXL_EINJ`` will need to be enabled.
> +
> Special notes for injection into SGX enclaves:
>
> There may be a separate BIOS setup option to enable SGX injection.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-14 15:27 ` Jonathan Cameron
@ 2024-02-14 16:41 ` Ben Cheatham
2024-02-14 17:46 ` Jonathan Cameron
0 siblings, 1 reply; 21+ messages in thread
From: Ben Cheatham @ 2024-02-14 16:41 UTC (permalink / raw)
To: Jonathan Cameron
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On 2/14/24 9:27 AM, Jonathan Cameron wrote:
> On Thu, 8 Feb 2024 14:00:41 -0600
> Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
>
>> Implement CXL helper functions in the EINJ module for getting/injecting
>> available CXL protocol error types and export them to sysfs under
>> kernel/debug/cxl.
>>
>> The kernel/debug/cxl/einj_types file will print the available CXL
>> protocol errors in the same format as the available_error_types
>> file provided by the EINJ module. The
>> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
>> error_type and error_inject files provided by the EINJ module, i.e.:
>> writing an error type into $dport_dev/einj_inject will inject said error
>> type into the CXL dport represented by $dport_dev.
>>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> Hi Ben,
>
> Sorry I've not looked at this sooner.
>
> Anyhow, some comments inline. Mostly looks good to me.
>
> Jonathan
>
>> ---
>> Documentation/ABI/testing/debugfs-cxl | 22 ++++
>> MAINTAINERS | 1 +
>> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
>> drivers/cxl/core/port.c | 39 +++++++
>> include/linux/einj-cxl.h | 45 ++++++++
>> 5 files changed, 255 insertions(+), 10 deletions(-)
>> create mode 100644 include/linux/einj-cxl.h
>>
>> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
>> index fe61d372e3fa..bcd985cca66a 100644
>> --- a/Documentation/ABI/testing/debugfs-cxl
>> +++ b/Documentation/ABI/testing/debugfs-cxl
>> @@ -33,3 +33,25 @@ Description:
>> device cannot clear poison from the address, -ENXIO is returned.
>> The clear_poison attribute is only visible for devices
>> supporting the capability.
>> +
>> +What: /sys/kernel/debug/cxl/einj_types
>> +Date: January, 2024
>> +KernelVersion: v6.9
>> +Contact: linux-cxl@vger.kernel.org
>> +Description:
>> + (RO) Prints the CXL protocol error types made available by
>> + the platform in the format "0x<error number> <error type>".
>> + The <error number> can be written to einj_inject to inject
>> + <error type> into a chosen dport.
>
> I think it's a limited set, so docs could include what the error_type values can
> be? From this description it's not obvious they are human readable strings.
>
It is a limited set, but that set has 6 variants. It may make the description
a bit long to include all of them, but I could include an example string instead?
If length isn't an issue then I can add them all in.
>> +
>> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
>> +Date: January, 2024
>> +KernelVersion: v6.9
>> +Contact: linux-cxl@vger.kernel.org
>> +Description:
>> + (WO) Writing an integer to this file injects the corresponding
>> + CXL protocol error into $dport_dev ($dport_dev will be a device
>> + name from /sys/bus/pci/devices). The integer to type mapping for
>> + injection can be found by reading from einj_types. If the dport
>> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
>> + a CXL 2.0 error is injected.
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 9104430e148e..02d7feb2ed1f 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
>> S: Maintained
>> F: drivers/cxl/
>> F: include/uapi/linux/cxl_mem.h
>> +F: include/linux/einj-cxl.h
>> F: tools/testing/cxl/
>>
>> COMPUTE EXPRESS LINK PMU (CPMU)
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 73dde21d3e89..9137cc01f791 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -21,6 +21,7 @@
>> #include <linux/nmi.h>
>> #include <linux/delay.h>
>> #include <linux/mm.h>
>> +#include <linux/einj-cxl.h>
>> #include <linux/platform_device.h>
>> #include <asm/unaligned.h>
>>
>> @@ -37,6 +38,20 @@
>> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
>> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>> ACPI_EINJ_MEMORY_FATAL)
>> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
>> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
>> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
>> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
>> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
>> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
>> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
>> +#endif
>> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
>> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
>> + ACPI_EINJ_CXL_CACHE_FATAL | \
>> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
>> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
>> + ACPI_EINJ_CXL_MEM_FATAL)
>>
>> /*
>> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>> if (type & ACPI5_VENDOR_BIT) {
>> if (vendor_flags != SETWA_FLAGS_MEM)
>> goto inject;
>> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
>> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
>> goto inject;
>> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
>> + goto inject;
>> + }
>>
>> /*
>> * Disallow crazy address masks that give BIOS leeway to pick
>> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
>> "0x00000200\tPlatform Correctable\n",
>> "0x00000400\tPlatform Uncorrectable non-fatal\n",
>> "0x00000800\tPlatform Uncorrectable fatal\n",
>> +};
>> +
>> +static const char * const einj_cxl_error_type_string[] = {
>> "0x00001000\tCXL.cache Protocol Correctable\n",
>> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
>>
>> DEFINE_SHOW_ATTRIBUTE(available_error_type);
>>
>> -static int error_type_get(void *data, u64 *val)
>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
>> {
>> - *val = error_type;
>> + int cxl_err, rc;
>> + u32 available_error_type = 0;
>> +
>> + if (!einj_initialized)
>> + return -ENXIO;
>> +
>> + rc = einj_get_available_error_type(&available_error_type);
>> + if (rc)
>> + return rc;
>> +
>> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
>
> Trivial so feel free to ignore but, I'd stick to local styles and have pos
> declared in more traditional c style.
>
Will do.
>> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
>
> Maybe clearer as
> cxl_err = FIELD_PREP(CXL_ERROR_MASK, BIT(pos));
>
I'll think about it. I think I agree with you, but I've seen a good amount of
people who aren't familiar with the FIELD_* macros in which case it isn't much clearer.
>> +
>> + if (available_error_type & cxl_err)
>> + seq_puts(m, einj_cxl_error_type_string[pos]);
>> + }
>>
>> return 0;
>> }
>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
>>
>> -static int error_type_set(void *data, u64 val)
>> +static int validate_error_type(u64 type)
>> {
>> + u32 tval, vendor, available_error_type = 0;
>> int rc;
>> - u32 available_error_type = 0;
>> - u32 tval, vendor;
>>
>> /* Only low 32 bits for error type are valid */
>> - if (val & GENMASK_ULL(63, 32))
>> + if (type & GENMASK_ULL(63, 32))
>> return -EINVAL;
>>
>> /*
>> * Vendor defined types have 0x80000000 bit set, and
>> * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
>> */
>> - vendor = val & ACPI5_VENDOR_BIT;
>> - tval = val & 0x7fffffff;
>> + vendor = type & ACPI5_VENDOR_BIT;
>> + tval = type & 0x7fffffff;
>
> Could flip this to GENMASK whilst you are here.
> I don't much like counting fs :)
>
Me neither. Will do.
>
>>
>> /* Only one error type can be specified */
>> if (tval & (tval - 1))
>> @@ -652,9 +688,105 @@ static int error_type_set(void *data, u64 val)
>> rc = einj_get_available_error_type(&available_error_type);
>> if (rc)
>> return rc;
>> - if (!(val & available_error_type))
>> + if (!(type & available_error_type))
>> return -EINVAL;
>> }
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
>> +{
>> + struct pci_bus *pbus;
>> + struct pci_host_bridge *bridge;
>> + u64 seg = 0, bus;
>> +
>> + pbus = dport_dev->bus;
>> + bridge = pci_find_host_bridge(pbus);
>> +
>> + if (!bridge)
>> + return -ENODEV;
>> +
>> + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
>
> Ah. x86 is not using the CONFIG_PCI_DOMAINS_GENERIC
> so I guess we can't just use pci_domain_nr(pbus)?
> (for the generic case bus->domain_nr is filled in when
> the host bridge is registered). Pity.
>
>> + seg = bridge->domain_nr << 24;
>> +
>> + bus = pbus->number << 16;
> I'd do the shifts whilst building sbpf.
> AS it stands you end up with what look to be wrong values in
> seg and bus because you'd shifted them in the local variables.
>
>> + *sbdf = seg | bus | dport_dev->devfn;
> *sbdf = (seg << 24) | (bus << 16) | dport_dev->devfn;
>
> (with shifts dropped where seg and bus are set)
Will do! I was never 100% sure about this function, but I
never had a problem with it in my testing.
>> +
>> + return 0;
>> +}
>> +
>> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
>> +{
>> + u64 param1 = 0, param2 = 0, param4 = 0;
>> + u32 flags;
>> + int rc;
>> +
>> + if (!einj_initialized)
>> + return -ENXIO;
>> +
>> + /* Only CXL error types can be specified */
>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
>
> As below - a utility function would unify the 3 uses of this and
> avoid need for comment.
>
>> + return -EINVAL;
>> +
>> + rc = validate_error_type(type);
>> + if (rc)
>> + return rc;
>> +
>> + param1 = (u64) rcrb;
> already a u64
probably left over from an earlier revision, I'll get rid of the cast.
>> + param2 = 0xfffffffffffff000;
> No need to initialize these to 0 above.
>> + flags = 0x2;
>> +
>> + return einj_error_inject(type, flags, param1, param2, 0, param4);
>
> return einj_error_inject(type, flags,
> rcrb, 0xfffffffffffff000, 0, 0);
>
Will change.
>> +}
>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
>> +
>> +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
>> +{
>> + u64 param1 = 0, param2 = 0, param4 = 0;
>> + u32 flags;
>> + int rc;
>> +
>> + if (!einj_initialized)
>> + return -ENXIO;
>> +
>> + /* Only CXL error types can be specified */
>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
>
> As below a utility function would do this nicely (and avoid need
> for comment).
> if (!is_cxl_error(type))
>
>> + return -EINVAL;
>> +
>> + rc = validate_error_type(type);
>> + if (rc)
>> + return rc;
>> +
>> + rc = cxl_dport_get_sbdf(dport, ¶m4);
>> + if (rc)
>> + return rc;
>> +
>> + flags = 0x4;
>> +
>> + return einj_error_inject(type, flags, param1, param2, 0, param4);
> Why not
> return einj_error_injecT(type, 0x4, 0, 0, 0, param4);
> ?
>
> Maybe flags is useful as "documentation" but given the parameters are nicely
> in order and you already didn't bother with param3, I can't see why
> keeping param1 and param2 as local variables is useful.
>
It's a good point. I originally had the RCH and VH variants in the same function,
and the param variables did matter then. Now that that's not the case, I'll
remove them.
>> +}
>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
>
>
>
>> +
>> +static int error_type_set(void *data, u64 val)
>> +{
>> + int rc;
>> +
>> + /* CXL error types have to be injected from cxl debugfs */
>> + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
>
> Given you have inverse of this above, maybe it's worth a little
> helper function to have the logic only once?
>
> if (is_cxl_error(val))
>
I agree, I'll add it.
>> + return -EINVAL;
>> +
>> + rc = validate_error_type(val);
>> + if (rc)
>> + return rc;
>> +
>> error_type = val;
>>
>> return 0;
>> @@ -690,6 +822,12 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
>> return 0;
>> }
>
>
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 8c00fd6be730..c52c92222be5 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>
>> +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
>> +{
>> + struct dentry *dir;
>> +
>> + /*
>> + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
>> + * EINJ expects a dport SBDF to be specified for 2.0 error injection.
>> + */
>> + if (!einj_is_initialized() ||
>> + (!dport->rch && !dev_is_pci(dport->dport_dev)))
>> + return;
>
> Trivial: Even though it's a little more code I'd break this up so that it's clear
> exactly what the comment refers to.
>
I'm fine with that, will do.
> if (!einj_is_initialized)
> return;
>
> /*
> * dport_dev needs to be a PCIe port for CXL 2.0+ ports because
> * EINJ expects a dport SBDF to be specified for 2.0 error injection.
> */
> if (!dport->rch && !dev_is_pci(dport->dport_dev))
> return;
>> +
>> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
>> +
>> + debugfs_create_file("einj_inject", 0200, dir, dport,
>> + &cxl_einj_inject_fops);
>> +}
>> +
>> static struct cxl_port *__devm_cxl_add_port(struct device *host,
>> struct device *uport_dev,
>> resource_size_t component_reg_phys,
>
> ...
>
>> @@ -2220,6 +2254,11 @@ static __init int cxl_core_init(void)
>>
>> cxl_debugfs = debugfs_create_dir("cxl", NULL);
>>
>> + if (einj_is_initialized()) {
>> + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
>> + &einj_cxl_available_error_type_fops);
>> + }
>> +
>> cxl_mbox_init();
>>
>> rc = cxl_memdev_init();
>
>
>> diff --git a/include/linux/einj-cxl.h b/include/linux/einj-cxl.h
>> new file mode 100644
>> index 000000000000..af57cc8580a6
>> --- /dev/null
>> +++ b/include/linux/einj-cxl.h
>
>
>> +
>> +#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ)
>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
>> +int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
>> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
>> +bool einj_is_initialized(void);
>> +#else
>
> It's trivial but if you are respinning, I'd like to see a comment for the
> else and the endif to let us trivially see what they match with.
> Lack of indenting for this preprocessor conditions can make this really
> hard to undwind once a file grows more complex over time.
>
I see, I'll comment them!
>> +static inline int einj_cxl_available_error_type_show(struct seq_file *m,
>> + void *v)
>> +{
>> + return -ENXIO;
>> +}
>> +
>> +static inline int einj_cxl_type_show(struct seq_file *m, void *data)
>
> A stub for a function that doesn't exist otherwise. Left
> over from refactors?
>
Probably, I'll remove it.
Thanks,
Ben
>> +{
>> + return -ENXIO;
>> +}
>> +
>> +static inline int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type)
>> +{
>> + return -ENXIO;
>> +}
>> +
>> +static inline int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
>> +{
>> + return -ENXIO;
>> +}
>> +
>> +static inline bool einj_is_initialized(void) { return false; }
>> +#endif
>> +
>> +#endif
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-14 16:41 ` Ben Cheatham
@ 2024-02-14 17:46 ` Jonathan Cameron
2024-02-14 17:54 ` Ben Cheatham
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-14 17:46 UTC (permalink / raw)
To: Ben Cheatham
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On Wed, 14 Feb 2024 10:41:00 -0600
Ben Cheatham <benjamin.cheatham@amd.com> wrote:
> On 2/14/24 9:27 AM, Jonathan Cameron wrote:
> > On Thu, 8 Feb 2024 14:00:41 -0600
> > Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
> >
> >> Implement CXL helper functions in the EINJ module for getting/injecting
> >> available CXL protocol error types and export them to sysfs under
> >> kernel/debug/cxl.
> >>
> >> The kernel/debug/cxl/einj_types file will print the available CXL
> >> protocol errors in the same format as the available_error_types
> >> file provided by the EINJ module. The
> >> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
> >> error_type and error_inject files provided by the EINJ module, i.e.:
> >> writing an error type into $dport_dev/einj_inject will inject said error
> >> type into the CXL dport represented by $dport_dev.
> >>
> >> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> > Hi Ben,
> >
> > Sorry I've not looked at this sooner.
> >
> > Anyhow, some comments inline. Mostly looks good to me.
> >
> > Jonathan
> >
> >> ---
> >> Documentation/ABI/testing/debugfs-cxl | 22 ++++
> >> MAINTAINERS | 1 +
> >> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
> >> drivers/cxl/core/port.c | 39 +++++++
> >> include/linux/einj-cxl.h | 45 ++++++++
> >> 5 files changed, 255 insertions(+), 10 deletions(-)
> >> create mode 100644 include/linux/einj-cxl.h
> >>
> >> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> >> index fe61d372e3fa..bcd985cca66a 100644
> >> --- a/Documentation/ABI/testing/debugfs-cxl
> >> +++ b/Documentation/ABI/testing/debugfs-cxl
> >> @@ -33,3 +33,25 @@ Description:
> >> device cannot clear poison from the address, -ENXIO is returned.
> >> The clear_poison attribute is only visible for devices
> >> supporting the capability.
> >> +
> >> +What: /sys/kernel/debug/cxl/einj_types
> >> +Date: January, 2024
> >> +KernelVersion: v6.9
> >> +Contact: linux-cxl@vger.kernel.org
> >> +Description:
> >> + (RO) Prints the CXL protocol error types made available by
> >> + the platform in the format "0x<error number> <error type>".
> >> + The <error number> can be written to einj_inject to inject
> >> + <error type> into a chosen dport.
> >
> > I think it's a limited set, so docs could include what the error_type values can
> > be? From this description it's not obvious they are human readable strings.
> >
>
> It is a limited set, but that set has 6 variants. It may make the description
> a bit long to include all of them, but I could include an example string instead?
> If length isn't an issue then I can add them all in.
Example works.
>
> >> +
> >> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
> >> +Date: January, 2024
> >> +KernelVersion: v6.9
> >> +Contact: linux-cxl@vger.kernel.org
> >> +Description:
> >> + (WO) Writing an integer to this file injects the corresponding
> >> + CXL protocol error into $dport_dev ($dport_dev will be a device
> >> + name from /sys/bus/pci/devices). The integer to type mapping for
> >> + injection can be found by reading from einj_types. If the dport
> >> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> >> + a CXL 2.0 error is injected.
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 9104430e148e..02d7feb2ed1f 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
> >> S: Maintained
> >> F: drivers/cxl/
> >> F: include/uapi/linux/cxl_mem.h
> >> +F: include/linux/einj-cxl.h
> >> F: tools/testing/cxl/
> >>
> >> COMPUTE EXPRESS LINK PMU (CPMU)
> >> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> >> index 73dde21d3e89..9137cc01f791 100644
> >> --- a/drivers/acpi/apei/einj.c
> >> +++ b/drivers/acpi/apei/einj.c
> >> @@ -21,6 +21,7 @@
> >> #include <linux/nmi.h>
> >> #include <linux/delay.h>
> >> #include <linux/mm.h>
> >> +#include <linux/einj-cxl.h>
> >> #include <linux/platform_device.h>
> >> #include <asm/unaligned.h>
> >>
> >> @@ -37,6 +38,20 @@
> >> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> >> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> >> ACPI_EINJ_MEMORY_FATAL)
> >> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
> >> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
> >> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
> >> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
> >> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
> >> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
> >> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
> >> +#endif
> >> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
> >> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
> >> + ACPI_EINJ_CXL_CACHE_FATAL | \
> >> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
> >> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
> >> + ACPI_EINJ_CXL_MEM_FATAL)
> >>
> >> /*
> >> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
> >> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> >> if (type & ACPI5_VENDOR_BIT) {
> >> if (vendor_flags != SETWA_FLAGS_MEM)
> >> goto inject;
> >> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> >> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> >> goto inject;
> >> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
> >> + goto inject;
> >> + }
> >>
> >> /*
> >> * Disallow crazy address masks that give BIOS leeway to pick
> >> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
> >> "0x00000200\tPlatform Correctable\n",
> >> "0x00000400\tPlatform Uncorrectable non-fatal\n",
> >> "0x00000800\tPlatform Uncorrectable fatal\n",
> >> +};
> >> +
> >> +static const char * const einj_cxl_error_type_string[] = {
> >> "0x00001000\tCXL.cache Protocol Correctable\n",
> >> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> >> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> >> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
> >>
> >> DEFINE_SHOW_ATTRIBUTE(available_error_type);
> >>
> >> -static int error_type_get(void *data, u64 *val)
> >> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
> >> {
> >> - *val = error_type;
> >> + int cxl_err, rc;
> >> + u32 available_error_type = 0;
> >> +
> >> + if (!einj_initialized)
> >> + return -ENXIO;
> >> +
> >> + rc = einj_get_available_error_type(&available_error_type);
> >> + if (rc)
> >> + return rc;
> >> +
> >> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
> >
> > Trivial so feel free to ignore but, I'd stick to local styles and have pos
> > declared in more traditional c style.
> >
>
> Will do.
>
> >> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
> >
> > Maybe clearer as
> > cxl_err = FIELD_PREP(CXL_ERROR_MASK, BIT(pos));
> >
>
> I'll think about it. I think I agree with you, but I've seen a good amount of
> people who aren't familiar with the FIELD_* macros in which case it isn't much clearer.
Lets teach them ;)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions
2024-02-14 17:46 ` Jonathan Cameron
@ 2024-02-14 17:54 ` Ben Cheatham
0 siblings, 0 replies; 21+ messages in thread
From: Ben Cheatham @ 2024-02-14 17:54 UTC (permalink / raw)
To: Jonathan Cameron
Cc: dan.j.williams, dave, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, rafael, linux-cxl, linux-acpi
On 2/14/24 11:46 AM, Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 10:41:00 -0600
> Ben Cheatham <benjamin.cheatham@amd.com> wrote:
>
>> On 2/14/24 9:27 AM, Jonathan Cameron wrote:
>>> On Thu, 8 Feb 2024 14:00:41 -0600
>>> Ben Cheatham <Benjamin.Cheatham@amd.com> wrote:
>>>
>>>> Implement CXL helper functions in the EINJ module for getting/injecting
>>>> available CXL protocol error types and export them to sysfs under
>>>> kernel/debug/cxl.
>>>>
>>>> The kernel/debug/cxl/einj_types file will print the available CXL
>>>> protocol errors in the same format as the available_error_types
>>>> file provided by the EINJ module. The
>>>> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
>>>> error_type and error_inject files provided by the EINJ module, i.e.:
>>>> writing an error type into $dport_dev/einj_inject will inject said error
>>>> type into the CXL dport represented by $dport_dev.
>>>>
>>>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>>> Hi Ben,
>>>
>>> Sorry I've not looked at this sooner.
>>>
>>> Anyhow, some comments inline. Mostly looks good to me.
>>>
>>> Jonathan
>>>
>>>> ---
>>>> Documentation/ABI/testing/debugfs-cxl | 22 ++++
>>>> MAINTAINERS | 1 +
>>>> drivers/acpi/apei/einj.c | 158 ++++++++++++++++++++++++--
>>>> drivers/cxl/core/port.c | 39 +++++++
>>>> include/linux/einj-cxl.h | 45 ++++++++
>>>> 5 files changed, 255 insertions(+), 10 deletions(-)
>>>> create mode 100644 include/linux/einj-cxl.h
>>>>
>>>> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
>>>> index fe61d372e3fa..bcd985cca66a 100644
>>>> --- a/Documentation/ABI/testing/debugfs-cxl
>>>> +++ b/Documentation/ABI/testing/debugfs-cxl
>>>> @@ -33,3 +33,25 @@ Description:
>>>> device cannot clear poison from the address, -ENXIO is returned.
>>>> The clear_poison attribute is only visible for devices
>>>> supporting the capability.
>>>> +
>>>> +What: /sys/kernel/debug/cxl/einj_types
>>>> +Date: January, 2024
>>>> +KernelVersion: v6.9
>>>> +Contact: linux-cxl@vger.kernel.org
>>>> +Description:
>>>> + (RO) Prints the CXL protocol error types made available by
>>>> + the platform in the format "0x<error number> <error type>".
>>>> + The <error number> can be written to einj_inject to inject
>>>> + <error type> into a chosen dport.
>>>
>>> I think it's a limited set, so docs could include what the error_type values can
>>> be? From this description it's not obvious they are human readable strings.
>>>
>>
>> It is a limited set, but that set has 6 variants. It may make the description
>> a bit long to include all of them, but I could include an example string instead?
>> If length isn't an issue then I can add them all in.
>
> Example works.
>
I tried adding them all in and it didn't make the description too long, so I went
ahead and did that instead of an example.
>>
>>>> +
>>>> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
>>>> +Date: January, 2024
>>>> +KernelVersion: v6.9
>>>> +Contact: linux-cxl@vger.kernel.org
>>>> +Description:
>>>> + (WO) Writing an integer to this file injects the corresponding
>>>> + CXL protocol error into $dport_dev ($dport_dev will be a device
>>>> + name from /sys/bus/pci/devices). The integer to type mapping for
>>>> + injection can be found by reading from einj_types. If the dport
>>>> + was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
>>>> + a CXL 2.0 error is injected.
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 9104430e148e..02d7feb2ed1f 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -5246,6 +5246,7 @@ L: linux-cxl@vger.kernel.org
>>>> S: Maintained
>>>> F: drivers/cxl/
>>>> F: include/uapi/linux/cxl_mem.h
>>>> +F: include/linux/einj-cxl.h
>>>> F: tools/testing/cxl/
>>>>
>>>> COMPUTE EXPRESS LINK PMU (CPMU)
>>>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>>>> index 73dde21d3e89..9137cc01f791 100644
>>>> --- a/drivers/acpi/apei/einj.c
>>>> +++ b/drivers/acpi/apei/einj.c
>>>> @@ -21,6 +21,7 @@
>>>> #include <linux/nmi.h>
>>>> #include <linux/delay.h>
>>>> #include <linux/mm.h>
>>>> +#include <linux/einj-cxl.h>
>>>> #include <linux/platform_device.h>
>>>> #include <asm/unaligned.h>
>>>>
>>>> @@ -37,6 +38,20 @@
>>>> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
>>>> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>>>> ACPI_EINJ_MEMORY_FATAL)
>>>> +#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
>>>> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE BIT(12)
>>>> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE BIT(13)
>>>> +#define ACPI_EINJ_CXL_CACHE_FATAL BIT(14)
>>>> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE BIT(15)
>>>> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE BIT(16)
>>>> +#define ACPI_EINJ_CXL_MEM_FATAL BIT(17)
>>>> +#endif
>>>> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
>>>> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
>>>> + ACPI_EINJ_CXL_CACHE_FATAL | \
>>>> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
>>>> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
>>>> + ACPI_EINJ_CXL_MEM_FATAL)
>>>>
>>>> /*
>>>> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>>>> @@ -543,8 +558,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>>>> if (type & ACPI5_VENDOR_BIT) {
>>>> if (vendor_flags != SETWA_FLAGS_MEM)
>>>> goto inject;
>>>> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
>>>> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
>>>> goto inject;
>>>> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
>>>> + goto inject;
>>>> + }
>>>>
>>>> /*
>>>> * Disallow crazy address masks that give BIOS leeway to pick
>>>> @@ -596,6 +614,9 @@ static const char * const einj_error_type_string[] = {
>>>> "0x00000200\tPlatform Correctable\n",
>>>> "0x00000400\tPlatform Uncorrectable non-fatal\n",
>>>> "0x00000800\tPlatform Uncorrectable fatal\n",
>>>> +};
>>>> +
>>>> +static const char * const einj_cxl_error_type_string[] = {
>>>> "0x00001000\tCXL.cache Protocol Correctable\n",
>>>> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>>>> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>>>> @@ -621,29 +642,44 @@ static int available_error_type_show(struct seq_file *m, void *v)
>>>>
>>>> DEFINE_SHOW_ATTRIBUTE(available_error_type);
>>>>
>>>> -static int error_type_get(void *data, u64 *val)
>>>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
>>>> {
>>>> - *val = error_type;
>>>> + int cxl_err, rc;
>>>> + u32 available_error_type = 0;
>>>> +
>>>> + if (!einj_initialized)
>>>> + return -ENXIO;
>>>> +
>>>> + rc = einj_get_available_error_type(&available_error_type);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
>>>
>>> Trivial so feel free to ignore but, I'd stick to local styles and have pos
>>> declared in more traditional c style.
>>>
>>
>> Will do.
>>
>>>> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
>>>
>>> Maybe clearer as
>>> cxl_err = FIELD_PREP(CXL_ERROR_MASK, BIT(pos));
>>>
>>
>> I'll think about it. I think I agree with you, but I've seen a good amount of
>> people who aren't familiar with the FIELD_* macros in which case it isn't much clearer.
>
> Lets teach them ;)
>
Sounds good!
Thanks,
Ben
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-02-14 17:54 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-08 20:00 [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
2024-02-08 20:00 ` [PATCH v11 1/4] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
2024-02-14 14:05 ` Jonathan Cameron
2024-02-14 14:34 ` Ben Cheatham
2024-02-08 20:00 ` [PATCH v11 2/4] EINJ: Migrate to a platform driver Ben Cheatham
2024-02-14 14:37 ` Jonathan Cameron
2024-02-08 20:00 ` [PATCH v11 3/4] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
2024-02-09 21:30 ` Ben Cheatham
2024-02-10 0:02 ` Dan Williams
2024-02-10 6:31 ` Dan Williams
2024-02-12 14:12 ` Ben Cheatham
2024-02-13 17:29 ` Ben Cheatham
2024-02-13 18:08 ` Dan Williams
2024-02-14 15:27 ` Jonathan Cameron
2024-02-14 16:41 ` Ben Cheatham
2024-02-14 17:46 ` Jonathan Cameron
2024-02-14 17:54 ` Ben Cheatham
2024-02-08 20:00 ` [PATCH v11 4/4] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
2024-02-14 15:29 ` Jonathan Cameron
2024-02-11 1:15 ` [PATCH v11 0/4] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Dan Williams
2024-02-12 14:13 ` Ben Cheatham
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox