* [PATCH v8 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types
@ 2023-12-13 22:36 Ben Cheatham
2023-12-13 22:36 ` [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Ben Cheatham @ 2023-12-13 22:36 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
v8 Changes:
- Remove callbacks and use a shared header instead (patch 4) (Dan)
- Add CONFIG_CXL_EINJ as part of header rework (patch 1) (Dan)
- Add wrapper __init function for EINJ module (Dan)
- Move einj_types debugfs file to be directly under debug/cxl (Dan)
- Move dport directories to be directly under debug/cxl (remove
portX directories) (Dan)
v7 Changes:
- Fixed a bug a permissions bug with einj_inject file (was using
0x200 instead of 0200)
- Fixed a kernel test bot error
- Removed a "magic" number in cxl_einj_available_error_type()
- Bumped kernel version in debugfs documentation entries
- Added Jonathan's Reviewed-by
v6 Changes:
- Reworked to have CXL error types under /sys/kernel/debug/cxl (Dan)
- Removed CXL error types from legacy EINJ interface in favor of
new interface
- Removed cxl_rcrb_addr file
- Added optional patch for CXL error type #defines (patch 2/5)
- Changes to documentation updates to match rework
- Change base to cxl-fixes branch
The new CXL error types will use the Memory Address field in the
SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
compliant memory-mapped downstream port. The value of the memory address
will be in the port's MMIO range, and it will not represent physical
(normal or persistent) memory.
Add the functionality for injecting CXL 1.1 errors to the EINJ module,
but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
Instead, make the error types available under /sys/kernel/debug/cxl.
This allows for validating the MMIO address for a CXL 1.1 error type
while also not making the user responsible for finding it.
Ben Cheatham (5):
cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
ACPI, APEI, EINJ: Add wrapper __init function
ACPI: Add CXL protocol error defines
cxl/core, EINJ: Add CXL debugfs files and EINJ functions
EINJ, Documentation: Update EINJ kernel doc
Documentation/ABI/testing/debugfs-cxl | 23 +++
.../firmware-guide/acpi/apei/einj.rst | 19 ++
drivers/acpi/apei/Kconfig | 1 +
drivers/acpi/apei/einj.c | 169 ++++++++++++++++--
drivers/cxl/Kconfig | 12 ++
drivers/cxl/core/port.c | 33 ++++
drivers/cxl/einj.h | 58 ++++++
include/acpi/actbl1.h | 6 +
8 files changed, 310 insertions(+), 11 deletions(-)
create mode 100644 drivers/cxl/einj.h
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
2023-12-13 22:36 [PATCH v8 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
@ 2023-12-13 22:36 ` Ben Cheatham
2023-12-18 23:48 ` Dan Williams
2023-12-13 22:36 ` [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function Ben Cheatham
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2023-12-13 22:36 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Add CONFIG_CXL_EINJ to cxl/Kconfig. This option will allow for the CXL
core module to access helpers inside the EINJ module, while also giving
users the option of disabling CXL EINJ error types at build time.
Also update CONFIG_ACPI_APEI_EINJ to set CONFIG_CXL_EINJ by default.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/acpi/apei/Kconfig | 1 +
drivers/cxl/Kconfig | 12 ++++++++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 6b18f8bc7be3..4c3f0ec5731e 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,7 @@ config ACPI_APEI_MEMORY_FAILURE
config ACPI_APEI_EINJ
tristate "APEI Error INJection (EINJ)"
depends on ACPI_APEI && DEBUG_FS
+ imply CXL_BUS
help
EINJ provides a hardware error injection mechanism, it is
mainly used for debugging and testing the other parts of
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 8ea1d340e438..6f4adcd733e5 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -154,4 +154,16 @@ config CXL_PMU
monitoring units and provide standard perf based interfaces.
If unsure say 'm'.
+
+config CXL_EINJ
+ bool "CXL Error INJection Support"
+ default ACPI_APEI_EINJ
+ depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
+ help
+ Support for CXL protocol Error INJection through debugfs/cxl.
+ Availability and which errors are supported is dependent on
+ the host platform. Look to ACPI v6.5 section 18.6.4 and kernel
+ EINJ documentation for more information.
+
+ If unsure say 'n'
endif
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function
2023-12-13 22:36 [PATCH v8 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
2023-12-13 22:36 ` [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
@ 2023-12-13 22:36 ` Ben Cheatham
2023-12-18 23:59 ` Dan Williams
2023-12-13 22:37 ` [PATCH v8 3/5] ACPI: Add CXL protocol error defines Ben Cheatham
` (2 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2023-12-13 22:36 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
The CXL core module should be able to load regardless of whether the
EINJ module initializes correctly. Instead of porting the EINJ module to
a library module, add a wrapper __init function around einj_init() to
pin the EINJ module even if it does not initialize correctly. This
should be fine since the EINJ module is only ever unloaded manually.
One note: since the CXL core will be calling into the EINJ module
directly, even though it may not have initialized, all CXL helper
functions *have* to check if the EINJ module is initialized before
doing any work.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
drivers/acpi/apei/einj.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..26a887d2a5cd 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -615,6 +615,8 @@ static int available_error_type_show(struct seq_file *m, void *v)
DEFINE_SHOW_ATTRIBUTE(available_error_type);
+static bool einj_initialized;
+
static int error_type_get(void *data, u64 *val)
{
*val = error_type;
@@ -684,7 +686,7 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
return 0;
}
-static int __init einj_init(void)
+static int __init __einj_init(void)
{
int rc;
acpi_status status;
@@ -782,10 +784,31 @@ static int __init einj_init(void)
return rc;
}
+static int __init einj_init(void)
+{
+ int rc = __einj_init();
+
+ einj_initialized = (rc == 0);
+
+ /*
+ * CXL needs to be able to link and call its EINJ helpers
+ * regardless of whether the EINJ table is present and initialized
+ * correctly. CXL helpers check @einj_initialized before
+ * doing any work.
+ */
+ if (IS_ENABLED(CONFIG_CXL_EINJ))
+ return 0;
+
+ return rc;
+}
+
static void __exit einj_exit(void)
{
struct apei_exec_context ctx;
+ if (!einj_initialized)
+ return;
+
if (einj_param) {
acpi_size size = (acpi5) ?
sizeof(struct set_error_type_with_address) :
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v8 3/5] ACPI: Add CXL protocol error defines
2023-12-13 22:36 [PATCH v8 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
2023-12-13 22:36 ` [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
2023-12-13 22:36 ` [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function Ben Cheatham
@ 2023-12-13 22:37 ` Ben Cheatham
2023-12-19 5:09 ` Dan Williams
2023-12-13 22:37 ` [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions Ben Cheatham
2023-12-13 22:37 ` [PATCH v8 5/5] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
4 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2023-12-13 22:37 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Add CXL protocol error defines to include/actbl1.h.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
I made a pull request for this support in the ACPICA project which has
been accepted (link below), so this patch is temporary and I expect it
to be dropped once the kernel updates from ACPICA.
[1]:
Link: https://github.com/acpica/acpica/pull/884
---
include/acpi/actbl1.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index a33375e055ad..1f58c5d86869 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1096,6 +1096,12 @@ enum acpi_einj_command_status {
#define ACPI_EINJ_PLATFORM_CORRECTABLE (1<<9)
#define ACPI_EINJ_PLATFORM_UNCORRECTABLE (1<<10)
#define ACPI_EINJ_PLATFORM_FATAL (1<<11)
+#define ACPI_EINJ_CXL_CACHE_CORRECTABLE (1<<12)
+#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE (1<<13)
+#define ACPI_EINJ_CXL_CACHE_FATAL (1<<14)
+#define ACPI_EINJ_CXL_MEM_CORRECTABLE (1<<15)
+#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE (1<<16)
+#define ACPI_EINJ_CXL_MEM_FATAL (1<<17)
#define ACPI_EINJ_VENDOR_DEFINED (1<<31)
/*******************************************************************************
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions
2023-12-13 22:36 [PATCH v8 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
` (2 preceding siblings ...)
2023-12-13 22:37 ` [PATCH v8 3/5] ACPI: Add CXL protocol error defines Ben Cheatham
@ 2023-12-13 22:37 ` Ben Cheatham
2023-12-19 4:47 ` Dan Williams
2023-12-13 22:37 ` [PATCH v8 5/5] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
4 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2023-12-13 22:37 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Implement CXL helper functions in the EINJ module for getting the
available CXL protocol error types and injecting CXL errors and export
them to sysfs under kernel/debug/cxl.
The kernel/debug/cxl/einj_types file will print the available CXL
protocol errors in the same format as the available_error_types
file provided by the EINJ module. The
kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
error_type and error_inject files provided by the EINJ module, i.e.:
writing an error type into $dport_dev/einj_inject will inject said error
type into the CXL dport represented by $dport_dev.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
Documentation/ABI/testing/debugfs-cxl | 23 ++++
drivers/acpi/apei/einj.c | 144 ++++++++++++++++++++++++--
drivers/cxl/core/port.c | 33 ++++++
drivers/cxl/einj.h | 58 +++++++++++
4 files changed, 248 insertions(+), 10 deletions(-)
create mode 100644 drivers/cxl/einj.h
diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
index fe61d372e3fa..97a8684bad84 100644
--- a/Documentation/ABI/testing/debugfs-cxl
+++ b/Documentation/ABI/testing/debugfs-cxl
@@ -33,3 +33,26 @@ Description:
device cannot clear poison from the address, -ENXIO is returned.
The clear_poison attribute is only visible for devices
supporting the capability.
+
+What: /sys/kernel/debug/cxl/einj_types
+Date: November, 2023
+KernelVersion: v6.8
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RO) Prints the CXL protocol error types made available by
+ the platform in the format "0x<error number> <error type>".
+ The <error number> can be written to einj_inject to inject
+ <error type> into a chosen dport. This file is only visible if
+ CONFIG_CXL_EINJ is enabled.
+
+What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
+Date: November, 2023
+KernelVersion: v6.8
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (WO) Writing an integer to this file injects the corresponding
+ CXL protocol error into $dport_dev (integer to type mapping is
+ available by reading from einj_types). If the dport was
+ enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
+ a CXL 2.0 error is injected. This file is only visible if
+ CONFIG_CXL_EINJ is enabled.
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 26a887d2a5cd..1a2195779b52 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -24,6 +24,7 @@
#include <asm/unaligned.h>
#include "apei-internal.h"
+#include "../../cxl/einj.h"
#undef pr_fmt
#define pr_fmt(fmt) "EINJ: " fmt
@@ -36,6 +37,12 @@
#define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
ACPI_EINJ_MEMORY_UNCORRECTABLE | \
ACPI_EINJ_MEMORY_FATAL)
+#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
+ ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
+ ACPI_EINJ_CXL_CACHE_FATAL | \
+ ACPI_EINJ_CXL_MEM_CORRECTABLE | \
+ ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
+ ACPI_EINJ_CXL_MEM_FATAL)
/*
* ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
@@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
if (type & ACPI5_VENDOR_BIT) {
if (vendor_flags != SETWA_FLAGS_MEM)
goto inject;
- } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+ } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
goto inject;
+ } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
+ goto inject;
+ }
/*
* Disallow crazy address masks that give BIOS leeway to pick
@@ -590,6 +600,9 @@ static const char * const einj_error_type_string[] = {
"0x00000200\tPlatform Correctable\n",
"0x00000400\tPlatform Uncorrectable non-fatal\n",
"0x00000800\tPlatform Uncorrectable fatal\n",
+};
+
+static const char * const einj_cxl_error_type_string[] = {
"0x00001000\tCXL.cache Protocol Correctable\n",
"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
@@ -617,29 +630,44 @@ DEFINE_SHOW_ATTRIBUTE(available_error_type);
static bool einj_initialized;
-static int error_type_get(void *data, u64 *val)
+int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
{
- *val = error_type;
+ int cxl_err, rc;
+ u32 available_error_type = 0;
+
+ if (!einj_initialized)
+ return -ENXIO;
+
+ rc = einj_get_available_error_type(&available_error_type);
+ if (rc)
+ return rc;
+
+ for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
+ cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
+
+ if (available_error_type & cxl_err)
+ seq_puts(m, einj_cxl_error_type_string[pos]);
+ }
return 0;
}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
-static int error_type_set(void *data, u64 val)
+static int validate_error_type(u64 type)
{
+ u32 tval, vendor, available_error_type = 0;
int rc;
- u32 available_error_type = 0;
- u32 tval, vendor;
/* Only low 32 bits for error type are valid */
- if (val & GENMASK_ULL(63, 32))
+ if (type & GENMASK_ULL(63, 32))
return -EINVAL;
/*
* Vendor defined types have 0x80000000 bit set, and
* are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
*/
- vendor = val & ACPI5_VENDOR_BIT;
- tval = val & 0x7fffffff;
+ vendor = type & ACPI5_VENDOR_BIT;
+ tval = type & 0x7fffffff;
/* Only one error type can be specified */
if (tval & (tval - 1))
@@ -648,9 +676,105 @@ static int error_type_set(void *data, u64 val)
rc = einj_get_available_error_type(&available_error_type);
if (rc)
return rc;
- if (!(val & available_error_type))
+ if (!(type & available_error_type))
return -EINVAL;
}
+
+ return 0;
+}
+
+static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
+{
+ struct pci_bus *pbus;
+ struct pci_host_bridge *bridge;
+ u64 seg = 0, bus;
+
+ pbus = dport_dev->bus;
+ bridge = pci_find_host_bridge(pbus);
+
+ if (!bridge)
+ return -ENODEV;
+
+ if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
+ seg = bridge->domain_nr << 24;
+
+ bus = pbus->number << 16;
+ *sbdf = seg | bus | dport_dev->devfn;
+
+ return 0;
+}
+
+int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
+{
+ u64 param1 = 0, param2 = 0, param4 = 0;
+ u32 flags;
+ int rc;
+
+ if (!einj_initialized)
+ return -ENXIO;
+
+ /* Only CXL error types can be specified */
+ if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
+ return -EINVAL;
+
+ rc = validate_error_type(type);
+ if (rc)
+ return rc;
+
+ param1 = (u64) rcrb;
+ param2 = 0xfffffffffffff000;
+ flags = 0x2;
+
+ return einj_error_inject(type, flags, param1, param2, 0, param4);
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
+
+int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
+{
+ u64 param1 = 0, param2 = 0, param4 = 0;
+ u32 flags;
+ int rc;
+
+ if (!einj_initialized)
+ return -ENXIO;
+
+ /* Only CXL error types can be specified */
+ if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
+ return -EINVAL;
+
+ rc = validate_error_type(type);
+ if (rc)
+ return rc;
+
+ rc = cxl_dport_get_sbdf(dport, ¶m4);
+ if (rc)
+ return rc;
+
+ flags = 0x4;
+
+ return einj_error_inject(type, flags, param1, param2, 0, param4);
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
+
+static int error_type_get(void *data, u64 *val)
+{
+ *val = error_type;
+
+ return 0;
+}
+
+static int error_type_set(void *data, u64 val)
+{
+ int rc;
+
+ /* CXL error types have to be injected from cxl debugfs */
+ if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
+ return -EINVAL;
+
+ rc = validate_error_type(val);
+ if (rc)
+ return rc;
+
error_type = val;
return 0;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 38441634e4c6..4ed4a24138c3 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -11,6 +11,7 @@
#include <linux/idr.h>
#include <cxlmem.h>
#include <cxlpci.h>
+#include <einj.h>
#include <cxl.h>
#include "core.h"
@@ -783,6 +784,32 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
return rc;
}
+DEFINE_SHOW_ATTRIBUTE(cxl_einj_available_error_type);
+
+static int cxl_einj_inject(void *data, u64 type)
+{
+ struct cxl_dport *dport = data;
+
+ if (dport->rch)
+ return cxl_einj_inject_rch_error(dport->rcrb.base, type);
+
+ if (!dev_is_pci(dport->dport_dev))
+ return -EINVAL;
+
+ return cxl_einj_inject_error(to_pci_dev(dport->dport_dev), type);
+}
+DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
+
+static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
+{
+ struct dentry *dir;
+
+ dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
+
+ debugfs_create_file("einj_inject", 0200, dir, dport,
+ &cxl_einj_inject_fops);
+}
+
static struct cxl_port *__devm_cxl_add_port(struct device *host,
struct device *uport_dev,
resource_size_t component_reg_phys,
@@ -1136,6 +1163,8 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
} else {
dev_dbg(dport_dev, "dport added to %s\n",
dev_name(&port->dev));
+
+ cxl_debugfs_create_dport_dir(dport);
}
return dport;
@@ -1170,6 +1199,8 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
} else {
dev_dbg(dport_dev, "RCH dport added to %s\n",
dev_name(&port->dev));
+
+ cxl_debugfs_create_dport_dir(dport);
}
return dport;
@@ -2109,6 +2140,8 @@ static __init int cxl_core_init(void)
int rc;
cxl_debugfs = debugfs_create_dir("cxl", NULL);
+ debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
+ &cxl_einj_available_error_type_fops);
cxl_mbox_init();
diff --git a/drivers/cxl/einj.h b/drivers/cxl/einj.h
new file mode 100644
index 000000000000..b913763c3238
--- /dev/null
+++ b/drivers/cxl/einj.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * CXL protocol Error INJection support.
+ *
+ * Copyright (c) 2023 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Ben Cheatham <benjamin.cheatham@amd.com>
+ */
+#ifndef CXL_EINJ_H
+#define CXL_EINJ_H
+#include <linux/pci.h>
+
+int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
+int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
+int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
+
+#if IS_ENABLED(CONFIG_CXL_EINJ)
+static inline int cxl_einj_available_error_type_show(struct seq_file *m,
+ void *v)
+{
+ return einj_cxl_available_error_type_show(m, v);
+}
+
+static inline int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
+{
+ return einj_cxl_inject_error(dport_dev, type);
+}
+
+static inline int cxl_einj_inject_rch_error(u64 rcrb, u64 type)
+{
+ return einj_cxl_inject_rch_error(rcrb, type);
+}
+
+#else
+static inline int cxl_einj_available_error_type_show(struct seq_file *m,
+ void *v)
+{
+ return -ENXIO;
+}
+
+static inline int cxl_einj_type_show(struct seq_file *m, void *data)
+{
+ return -ENXIO;
+}
+
+static inline int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
+{
+ return -ENXIO;
+}
+
+static inline int cxl_einj_inject_rch_error(u64 rcrb, u64 type)
+{
+ return -ENXIO;
+}
+#endif
+
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v8 5/5] EINJ, Documentation: Update EINJ kernel doc
2023-12-13 22:36 [PATCH v8 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
` (3 preceding siblings ...)
2023-12-13 22:37 ` [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions Ben Cheatham
@ 2023-12-13 22:37 ` Ben Cheatham
4 siblings, 0 replies; 19+ messages in thread
From: Ben Cheatham @ 2023-12-13 22:37 UTC (permalink / raw)
To: dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Update EINJ kernel document to include how to inject CXL protocol error
types, build the kernel to include CXL error types, and give an example
injection.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
.../firmware-guide/acpi/apei/einj.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index d6b61d22f525..a9d3f6926cae 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,6 +181,25 @@ You should see something like this in dmesg::
[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
+CXL error types are supported from ACPI 6.5 onwards. These error types
+are not available in the legacy interface at /sys/kernel/debug/apei/einj,
+and are instead at /sys/kernel/debug/cxl/. There is a file under debug/cxl
+called "einj_type" that is analagous to available_error_type under debug/cxl.
+There is also a "einj_inject" file in each $dport_dev directory under debug/cxl
+that will inject a given error into the dport represented by $dport_dev.
+For example, to inject a CXL.mem protocol correctable error into
+$dport_dev=pci0000:0c::
+
+ # cd /sys/kernel/debug/cxl/
+ # cat einj_type # See which error can be injected
+ 0x00008000 CXL.mem Protocol Correctable
+ 0x00010000 CXL.mem Protocol Uncorrectable non-fatal
+ 0x00020000 CXL.mem Protocol Uncorrectable fatal
+ # cd pci0000:0c # Navigate to dport to inject into
+ # echo 0x8000 > einj_inject # Inject error
+
+To use CXL error types, ``CONFIG_CXL_EINJ`` will need to be enabled.
+
Special notes for injection into SGX enclaves:
There may be a separate BIOS setup option to enable SGX injection.
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
2023-12-13 22:36 ` [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
@ 2023-12-18 23:48 ` Dan Williams
2024-01-10 20:31 ` Ben Cheatham
0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2023-12-18 23:48 UTC (permalink / raw)
To: Ben Cheatham, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Ben Cheatham wrote:
> Add CONFIG_CXL_EINJ to cxl/Kconfig. This option will allow for the CXL
> core module to access helpers inside the EINJ module, while also giving
> users the option of disabling CXL EINJ error types at build time.
>
> Also update CONFIG_ACPI_APEI_EINJ to set CONFIG_CXL_EINJ by default.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
> drivers/acpi/apei/Kconfig | 1 +
> drivers/cxl/Kconfig | 12 ++++++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index 6b18f8bc7be3..4c3f0ec5731e 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -55,6 +55,7 @@ config ACPI_APEI_MEMORY_FAILURE
> config ACPI_APEI_EINJ
> tristate "APEI Error INJection (EINJ)"
> depends on ACPI_APEI && DEBUG_FS
> + imply CXL_BUS
This can safely be deleted, it is sufficient for CXL_BUS to depend on
ACPI_APEI_EINJ.
> help
> EINJ provides a hardware error injection mechanism, it is
> mainly used for debugging and testing the other parts of
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 8ea1d340e438..6f4adcd733e5 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -154,4 +154,16 @@ config CXL_PMU
> monitoring units and provide standard perf based interfaces.
>
> If unsure say 'm'.
> +
> +config CXL_EINJ
> + bool "CXL Error INJection Support"
> + default ACPI_APEI_EINJ
> + depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
It is ok for CXL_BUS to be a module while ACPI_APEI_EINJ is built-in, so
the && can be dropped since CXL_BUS is guaranteed to be > 0 here, i.e.:
default ACPI_APEI_EINJ
depends on ACPI_APEI_EINJ >= CXL_BUS
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function
2023-12-13 22:36 ` [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function Ben Cheatham
@ 2023-12-18 23:59 ` Dan Williams
2023-12-19 15:39 ` Jonathan Cameron
0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2023-12-18 23:59 UTC (permalink / raw)
To: Ben Cheatham, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Ben Cheatham wrote:
> The CXL core module should be able to load regardless of whether the
> EINJ module initializes correctly. Instead of porting the EINJ module to
> a library module, add a wrapper __init function around einj_init() to
Small quibble with this wording... the larger EINJ module refactoring
would be separating module_init() from EINJ probe(). As is this simple
introduction of an einit_init() wrapper *is* refactoring this module to
be used as a module dependency.
> pin the EINJ module even if it does not initialize correctly. This
> should be fine since the EINJ module is only ever unloaded manually.
>
> One note: since the CXL core will be calling into the EINJ module
> directly, even though it may not have initialized, all CXL helper
> functions *have* to check if the EINJ module is initialized before
> doing any work.
Another small quibble here, perhaps s/may not have initialized/may not
have successfully initialized/? Because initialization will have
definitely completed one way or the other, but callers need to abort if
it completed in error.
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Did Jonathan really get in and review this new patch in the series
before me? If yes, apologies I missed it, if no I think it is best
practice to not carry forward series Reviewed-by's if new patches appear
in the series between revisions.
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
With above fixups, feel free to add:
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions
2023-12-13 22:37 ` [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions Ben Cheatham
@ 2023-12-19 4:47 ` Dan Williams
2024-01-10 20:31 ` Ben Cheatham
0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2023-12-19 4:47 UTC (permalink / raw)
To: Ben Cheatham, dan.j.williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Ben Cheatham wrote:
> Implement CXL helper functions in the EINJ module for getting the
> available CXL protocol error types and injecting CXL errors and export
> them to sysfs under kernel/debug/cxl.
>
> The kernel/debug/cxl/einj_types file will print the available CXL
> protocol errors in the same format as the available_error_types
> file provided by the EINJ module. The
> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
> error_type and error_inject files provided by the EINJ module, i.e.:
> writing an error type into $dport_dev/einj_inject will inject said error
> type into the CXL dport represented by $dport_dev.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
> Documentation/ABI/testing/debugfs-cxl | 23 ++++
> drivers/acpi/apei/einj.c | 144 ++++++++++++++++++++++++--
> drivers/cxl/core/port.c | 33 ++++++
> drivers/cxl/einj.h | 58 +++++++++++
> 4 files changed, 248 insertions(+), 10 deletions(-)
> create mode 100644 drivers/cxl/einj.h
>
> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> index fe61d372e3fa..97a8684bad84 100644
> --- a/Documentation/ABI/testing/debugfs-cxl
> +++ b/Documentation/ABI/testing/debugfs-cxl
> @@ -33,3 +33,26 @@ Description:
> device cannot clear poison from the address, -ENXIO is returned.
> The clear_poison attribute is only visible for devices
> supporting the capability.
> +
> +What: /sys/kernel/debug/cxl/einj_types
> +Date: November, 2023
> +KernelVersion: v6.8
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (RO) Prints the CXL protocol error types made available by
> + the platform in the format "0x<error number> <error type>".
> + The <error number> can be written to einj_inject to inject
> + <error type> into a chosen dport. This file is only visible if
> + CONFIG_CXL_EINJ is enabled.
I don't think the Kconfig dependency needs to be mentioned here.
> +
> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
> +Date: November, 2023
> +KernelVersion: v6.8
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (WO) Writing an integer to this file injects the corresponding
> + CXL protocol error into $dport_dev (integer to type mapping is
> + available by reading from einj_types).
Maybe mention that $dport_dev is a device name from /sys/bus/pci/devices?
> + enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> + a CXL 2.0 error is injected. This file is only visible if
> + CONFIG_CXL_EINJ is enabled.
Same "drop the Kconfig mention" here.
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 26a887d2a5cd..1a2195779b52 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -24,6 +24,7 @@
> #include <asm/unaligned.h>
>
> #include "apei-internal.h"
> +#include "../../cxl/einj.h"
I thought this was going to move to a top-level header?
>
> #undef pr_fmt
> #define pr_fmt(fmt) "EINJ: " fmt
> @@ -36,6 +37,12 @@
> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> ACPI_EINJ_MEMORY_FATAL)
> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
> + ACPI_EINJ_CXL_CACHE_FATAL | \
> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
> + ACPI_EINJ_CXL_MEM_FATAL)
>
> /*
> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
> @@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
> if (type & ACPI5_VENDOR_BIT) {
> if (vendor_flags != SETWA_FLAGS_MEM)
> goto inject;
> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> goto inject;
> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
> + goto inject;
> + }
>
> /*
> * Disallow crazy address masks that give BIOS leeway to pick
> @@ -590,6 +600,9 @@ static const char * const einj_error_type_string[] = {
> "0x00000200\tPlatform Correctable\n",
> "0x00000400\tPlatform Uncorrectable non-fatal\n",
> "0x00000800\tPlatform Uncorrectable fatal\n",
> +};
> +
> +static const char * const einj_cxl_error_type_string[] = {
> "0x00001000\tCXL.cache Protocol Correctable\n",
> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> @@ -617,29 +630,44 @@ DEFINE_SHOW_ATTRIBUTE(available_error_type);
>
> static bool einj_initialized;
>
> -static int error_type_get(void *data, u64 *val)
> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
Oh nice an einj_cxl_ prefix in the global namespace, looks appropriate. I
mention this here because I don't think a later cxl_einj_ wrapper makes
sense, see below.
> {
> - *val = error_type;
> + int cxl_err, rc;
> + u32 available_error_type = 0;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + rc = einj_get_available_error_type(&available_error_type);
> + if (rc)
> + return rc;
> +
> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
> +
> + if (available_error_type & cxl_err)
> + seq_puts(m, einj_cxl_error_type_string[pos]);
> + }
>
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
>
> -static int error_type_set(void *data, u64 val)
> +static int validate_error_type(u64 type)
> {
> + u32 tval, vendor, available_error_type = 0;
> int rc;
> - u32 available_error_type = 0;
> - u32 tval, vendor;
>
> /* Only low 32 bits for error type are valid */
> - if (val & GENMASK_ULL(63, 32))
> + if (type & GENMASK_ULL(63, 32))
> return -EINVAL;
>
> /*
> * Vendor defined types have 0x80000000 bit set, and
> * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
> */
> - vendor = val & ACPI5_VENDOR_BIT;
> - tval = val & 0x7fffffff;
> + vendor = type & ACPI5_VENDOR_BIT;
> + tval = type & 0x7fffffff;
>
> /* Only one error type can be specified */
> if (tval & (tval - 1))
> @@ -648,9 +676,105 @@ static int error_type_set(void *data, u64 val)
> rc = einj_get_available_error_type(&available_error_type);
> if (rc)
> return rc;
> - if (!(val & available_error_type))
> + if (!(type & available_error_type))
> return -EINVAL;
> }
> +
> + return 0;
> +}
> +
> +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
> +{
> + struct pci_bus *pbus;
> + struct pci_host_bridge *bridge;
> + u64 seg = 0, bus;
> +
> + pbus = dport_dev->bus;
> + bridge = pci_find_host_bridge(pbus);
> +
> + if (!bridge)
> + return -ENODEV;
> +
> + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
> + seg = bridge->domain_nr << 24;
> +
> + bus = pbus->number << 16;
> + *sbdf = seg | bus | dport_dev->devfn;
> +
> + return 0;
> +}
> +
> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
> +{
> + u64 param1 = 0, param2 = 0, param4 = 0;
> + u32 flags;
> + int rc;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + /* Only CXL error types can be specified */
> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> + return -EINVAL;
> +
> + rc = validate_error_type(type);
> + if (rc)
> + return rc;
> +
> + param1 = (u64) rcrb;
> + param2 = 0xfffffffffffff000;
> + flags = 0x2;
> +
> + return einj_error_inject(type, flags, param1, param2, 0, param4);
> +}
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
> +
> +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
> +{
> + u64 param1 = 0, param2 = 0, param4 = 0;
> + u32 flags;
> + int rc;
> +
> + if (!einj_initialized)
> + return -ENXIO;
> +
> + /* Only CXL error types can be specified */
> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> + return -EINVAL;
> +
> + rc = validate_error_type(type);
> + if (rc)
> + return rc;
> +
> + rc = cxl_dport_get_sbdf(dport, ¶m4);
> + if (rc)
> + return rc;
> +
> + flags = 0x4;
> +
> + return einj_error_inject(type, flags, param1, param2, 0, param4);
> +}
> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
> +
> +static int error_type_get(void *data, u64 *val)
> +{
> + *val = error_type;
> +
> + return 0;
> +}
> +
> +static int error_type_set(void *data, u64 val)
> +{
> + int rc;
> +
> + /* CXL error types have to be injected from cxl debugfs */
> + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
> + return -EINVAL;
> +
> + rc = validate_error_type(val);
> + if (rc)
> + return rc;
> +
> error_type = val;
>
> return 0;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 38441634e4c6..4ed4a24138c3 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -11,6 +11,7 @@
> #include <linux/idr.h>
> #include <cxlmem.h>
> #include <cxlpci.h>
> +#include <einj.h>
> #include <cxl.h>
> #include "core.h"
>
> @@ -783,6 +784,32 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
> return rc;
> }
>
> +DEFINE_SHOW_ATTRIBUTE(cxl_einj_available_error_type);
> +
> +static int cxl_einj_inject(void *data, u64 type)
> +{
> + struct cxl_dport *dport = data;
> +
> + if (dport->rch)
> + return cxl_einj_inject_rch_error(dport->rcrb.base, type);
> +
> + if (!dev_is_pci(dport->dport_dev))
> + return -EINVAL;
> +
> + return cxl_einj_inject_error(to_pci_dev(dport->dport_dev), type);
> +}
> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
> +
> +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
> +{
> + struct dentry *dir;
I think an "IS_ENABLED(CONFIG_CXL_EINJ)" is warranted here.
> +
> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
> +
> + debugfs_create_file("einj_inject", 0200, dir, dport,
> + &cxl_einj_inject_fops);
I will note that I am little bit uneasy about this ACPI'ism escaping
into the common core, but he mitigation for me is that if some other
platform firmware invented a platform-firmware method for error inject
it would simply need to reuse the einj_cxl_ namespace to make it common.
> +}
> +
> static struct cxl_port *__devm_cxl_add_port(struct device *host,
> struct device *uport_dev,
> resource_size_t component_reg_phys,
> @@ -1136,6 +1163,8 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
> } else {
> dev_dbg(dport_dev, "dport added to %s\n",
> dev_name(&port->dev));
> +
> + cxl_debugfs_create_dport_dir(dport);
> }
>
> return dport;
> @@ -1170,6 +1199,8 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
> } else {
> dev_dbg(dport_dev, "RCH dport added to %s\n",
> dev_name(&port->dev));
> +
> + cxl_debugfs_create_dport_dir(dport);
> }
Move the above 2 invocations into 1 common call from
__devm_cxl_add_dport().
>
> return dport;
> @@ -2109,6 +2140,8 @@ static __init int cxl_core_init(void)
> int rc;
>
> cxl_debugfs = debugfs_create_dir("cxl", NULL);
> + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
> + &cxl_einj_available_error_type_fops);
This should be gated by "IS_ENABLED(CONFIG_CXL_EINJ)"
>
> cxl_mbox_init();
>
> diff --git a/drivers/cxl/einj.h b/drivers/cxl/einj.h
> new file mode 100644
> index 000000000000..b913763c3238
> --- /dev/null
> +++ b/drivers/cxl/einj.h
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * CXL protocol Error INJection support.
> + *
> + * Copyright (c) 2023 Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Ben Cheatham <benjamin.cheatham@amd.com>
> + */
> +#ifndef CXL_EINJ_H
> +#define CXL_EINJ_H
> +#include <linux/pci.h>
> +
> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
> +int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
See below, these should be within the ifdef guard.
> +
> +#if IS_ENABLED(CONFIG_CXL_EINJ)
This should be
#ifdef CONFIG_ACPI_APEI_EINJ
...because that is the module defining the exported public interface.
> +static inline int cxl_einj_available_error_type_show(struct seq_file *m,
> + void *v)
> +{
> + return einj_cxl_available_error_type_show(m, v);
> +}
> +
> +static inline int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
> +{
> + return einj_cxl_inject_error(dport_dev, type);
> +}
> +
> +static inline int cxl_einj_inject_rch_error(u64 rcrb, u64 type)
> +{
> + return einj_cxl_inject_rch_error(rcrb, type);
> +}
Above 3 static inlines can be deleted, this section of the ifdef would
just be:
int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
> +
> +#else
> +static inline int cxl_einj_available_error_type_show(struct seq_file *m,
> + void *v)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int cxl_einj_type_show(struct seq_file *m, void *data)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
> +{
> + return -ENXIO;
> +}
> +
> +static inline int cxl_einj_inject_rch_error(u64 rcrb, u64 type)
> +{
> + return -ENXIO;
> +}
> +#endif
Rename these to their einj_cxl_ equivalent.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 3/5] ACPI: Add CXL protocol error defines
2023-12-13 22:37 ` [PATCH v8 3/5] ACPI: Add CXL protocol error defines Ben Cheatham
@ 2023-12-19 5:09 ` Dan Williams
0 siblings, 0 replies; 19+ messages in thread
From: Dan Williams @ 2023-12-19 5:09 UTC (permalink / raw)
To: rafael, Ben Cheatham, dan.j.williams, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny
Cc: linux-cxl, linux-acpi, benjamin.cheatham
Ben Cheatham wrote:
> Add CXL protocol error defines to include/actbl1.h.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
>
> I made a pull request for this support in the ACPICA project which has
> been accepted (link below), so this patch is temporary and I expect it
> to be dropped once the kernel updates from ACPICA.
>
> [1]:
> Link: https://github.com/acpica/acpica/pull/884
Hi Rafael,
Might the kernel side of this ACPICA update hit a stable commit I can use to
land this topic branch in cxl.git for v6.8? Alternatively, if you want to take
the whole thing through your tree with my ack, just let me know.
> ---
> include/acpi/actbl1.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index a33375e055ad..1f58c5d86869 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -1096,6 +1096,12 @@ enum acpi_einj_command_status {
> #define ACPI_EINJ_PLATFORM_CORRECTABLE (1<<9)
> #define ACPI_EINJ_PLATFORM_UNCORRECTABLE (1<<10)
> #define ACPI_EINJ_PLATFORM_FATAL (1<<11)
> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE (1<<12)
> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE (1<<13)
> +#define ACPI_EINJ_CXL_CACHE_FATAL (1<<14)
> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE (1<<15)
> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE (1<<16)
> +#define ACPI_EINJ_CXL_MEM_FATAL (1<<17)
> #define ACPI_EINJ_VENDOR_DEFINED (1<<31)
>
> /*******************************************************************************
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function
2023-12-18 23:59 ` Dan Williams
@ 2023-12-19 15:39 ` Jonathan Cameron
2023-12-19 20:48 ` Dan Williams
2023-12-20 14:33 ` Ben Cheatham
0 siblings, 2 replies; 19+ messages in thread
From: Jonathan Cameron @ 2023-12-19 15:39 UTC (permalink / raw)
To: Dan Williams
Cc: Ben Cheatham, dave, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, rafael, linux-cxl, linux-acpi
On Mon, 18 Dec 2023 15:59:12 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> Ben Cheatham wrote:
> > The CXL core module should be able to load regardless of whether the
> > EINJ module initializes correctly. Instead of porting the EINJ module to
> > a library module, add a wrapper __init function around einj_init() to
>
> Small quibble with this wording... the larger EINJ module refactoring
> would be separating module_init() from EINJ probe(). As is this simple
> introduction of an einit_init() wrapper *is* refactoring this module to
> be used as a module dependency.
>
> > pin the EINJ module even if it does not initialize correctly. This
> > should be fine since the EINJ module is only ever unloaded manually.
> >
> > One note: since the CXL core will be calling into the EINJ module
> > directly, even though it may not have initialized, all CXL helper
> > functions *have* to check if the EINJ module is initialized before
> > doing any work.
>
> Another small quibble here, perhaps s/may not have initialized/may not
> have successfully initialized/? Because initialization will have
> definitely completed one way or the other, but callers need to abort if
> it completed in error.
>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Did Jonathan really get in and review this new patch in the series
> before me? If yes, apologies I missed it, if no I think it is best
> practice to not carry forward series Reviewed-by's if new patches appear
> in the series between revisions.
I'm not keen on the solution as it's esoteric and to me seems fragile.
I've looked at discussion on v7 and can see why you ended up with this
but I'd have preferred to see the 'violent' approach :)
I don't mind if there is consensus on this, but not reviewed by from
me for this one.
>
>
> > Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>
> With above fixups, feel free to add:
>
> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function
2023-12-19 15:39 ` Jonathan Cameron
@ 2023-12-19 20:48 ` Dan Williams
2023-12-20 14:37 ` Ben Cheatham
2023-12-20 14:33 ` Ben Cheatham
1 sibling, 1 reply; 19+ messages in thread
From: Dan Williams @ 2023-12-19 20:48 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: Ben Cheatham, dave, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, rafael, linux-cxl, linux-acpi
Jonathan Cameron wrote:
> On Mon, 18 Dec 2023 15:59:12 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Ben Cheatham wrote:
> > > The CXL core module should be able to load regardless of whether the
> > > EINJ module initializes correctly. Instead of porting the EINJ module to
> > > a library module, add a wrapper __init function around einj_init() to
> >
> > Small quibble with this wording... the larger EINJ module refactoring
> > would be separating module_init() from EINJ probe(). As is this simple
> > introduction of an einit_init() wrapper *is* refactoring this module to
> > be used as a module dependency.
> >
> > > pin the EINJ module even if it does not initialize correctly. This
> > > should be fine since the EINJ module is only ever unloaded manually.
> > >
> > > One note: since the CXL core will be calling into the EINJ module
> > > directly, even though it may not have initialized, all CXL helper
> > > functions *have* to check if the EINJ module is initialized before
> > > doing any work.
> >
> > Another small quibble here, perhaps s/may not have initialized/may not
> > have successfully initialized/? Because initialization will have
> > definitely completed one way or the other, but callers need to abort if
> > it completed in error.
> >
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >
> > Did Jonathan really get in and review this new patch in the series
> > before me? If yes, apologies I missed it, if no I think it is best
> > practice to not carry forward series Reviewed-by's if new patches appear
> > in the series between revisions.
>
> I'm not keen on the solution as it's esoteric and to me seems fragile.
> I've looked at discussion on v7 and can see why you ended up with this
> but I'd have preferred to see the 'violent' approach :)
The issue though is similar to the argument for the creation of the
ACPI0017 device for CXL, there is not a great place to hang the einj
device-driver.
However, since einj has no legacy "auto-load" behavior, I think it is
not a lot of code to have einj's module_init() do something like this:
einj_dev = platform_device_register_full(&einj_dev_info);
platform_driver_register(&einj_driver);
Ben, you want to give that a shot? Jonathan is right that my proposed
hack is *a* solution but probably not *the* solution where this should
end up.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function
2023-12-19 15:39 ` Jonathan Cameron
2023-12-19 20:48 ` Dan Williams
@ 2023-12-20 14:33 ` Ben Cheatham
1 sibling, 0 replies; 19+ messages in thread
From: Ben Cheatham @ 2023-12-20 14:33 UTC (permalink / raw)
To: Jonathan Cameron, Dan Williams
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
rafael, linux-cxl, linux-acpi
On 12/19/23 9:39 AM, Jonathan Cameron wrote:
> On Mon, 18 Dec 2023 15:59:12 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Ben Cheatham wrote:
>>> The CXL core module should be able to load regardless of whether the
>>> EINJ module initializes correctly. Instead of porting the EINJ module to
>>> a library module, add a wrapper __init function around einj_init() to
>>
>> Small quibble with this wording... the larger EINJ module refactoring
>> would be separating module_init() from EINJ probe(). As is this simple
>> introduction of an einit_init() wrapper *is* refactoring this module to
>> be used as a module dependency.
>>
>>> pin the EINJ module even if it does not initialize correctly. This
>>> should be fine since the EINJ module is only ever unloaded manually.
>>>
>>> One note: since the CXL core will be calling into the EINJ module
>>> directly, even though it may not have initialized, all CXL helper
>>> functions *have* to check if the EINJ module is initialized before
>>> doing any work.
>>
>> Another small quibble here, perhaps s/may not have initialized/may not
>> have successfully initialized/? Because initialization will have
>> definitely completed one way or the other, but callers need to abort if
>> it completed in error.
>>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>
>> Did Jonathan really get in and review this new patch in the series
>> before me? If yes, apologies I missed it, if no I think it is best
>> practice to not carry forward series Reviewed-by's if new patches appear
>> in the series between revisions.
>
> I'm not keen on the solution as it's esoteric and to me seems fragile.
> I've looked at discussion on v7 and can see why you ended up with this
> but I'd have preferred to see the 'violent' approach :)
>
> I don't mind if there is consensus on this, but not reviewed by from
> me for this one.
>
Gotcha, I'll go ahead on drop your reviewed-by.
>>
>>
>>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>>
>> With above fixups, feel free to add:
>>
>> Co-developed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function
2023-12-19 20:48 ` Dan Williams
@ 2023-12-20 14:37 ` Ben Cheatham
2023-12-20 23:51 ` Dan Williams
0 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2023-12-20 14:37 UTC (permalink / raw)
To: Dan Williams, Jonathan Cameron
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
rafael, linux-cxl, linux-acpi
On 12/19/23 2:48 PM, Dan Williams wrote:
> Jonathan Cameron wrote:
>> On Mon, 18 Dec 2023 15:59:12 -0800
>> Dan Williams <dan.j.williams@intel.com> wrote:
>>
>>> Ben Cheatham wrote:
>>>> The CXL core module should be able to load regardless of whether the
>>>> EINJ module initializes correctly. Instead of porting the EINJ module to
>>>> a library module, add a wrapper __init function around einj_init() to
>>>
>>> Small quibble with this wording... the larger EINJ module refactoring
>>> would be separating module_init() from EINJ probe(). As is this simple
>>> introduction of an einit_init() wrapper *is* refactoring this module to
>>> be used as a module dependency.
>>>
>>>> pin the EINJ module even if it does not initialize correctly. This
>>>> should be fine since the EINJ module is only ever unloaded manually.
>>>>
>>>> One note: since the CXL core will be calling into the EINJ module
>>>> directly, even though it may not have initialized, all CXL helper
>>>> functions *have* to check if the EINJ module is initialized before
>>>> doing any work.
>>>
>>> Another small quibble here, perhaps s/may not have initialized/may not
>>> have successfully initialized/? Because initialization will have
>>> definitely completed one way or the other, but callers need to abort if
>>> it completed in error.
>>>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>
>>> Did Jonathan really get in and review this new patch in the series
>>> before me? If yes, apologies I missed it, if no I think it is best
>>> practice to not carry forward series Reviewed-by's if new patches appear
>>> in the series between revisions.
>>
>> I'm not keen on the solution as it's esoteric and to me seems fragile.
>> I've looked at discussion on v7 and can see why you ended up with this
>> but I'd have preferred to see the 'violent' approach :)
>
> The issue though is similar to the argument for the creation of the
> ACPI0017 device for CXL, there is not a great place to hang the einj
> device-driver.
>
> However, since einj has no legacy "auto-load" behavior, I think it is
> not a lot of code to have einj's module_init() do something like this:
>
> einj_dev = platform_device_register_full(&einj_dev_info);
> platform_driver_register(&einj_driver);
>
> Ben, you want to give that a shot? Jonathan is right that my proposed
> hack is *a* solution but probably not *the* solution where this should
> end up.
I can take a look. I won't be able to get to it until around the new year
since I'm vacation at the moment. I'll also respond take a look at the
rest of your review around then.
Thanks,
Ben
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function
2023-12-20 14:37 ` Ben Cheatham
@ 2023-12-20 23:51 ` Dan Williams
0 siblings, 0 replies; 19+ messages in thread
From: Dan Williams @ 2023-12-20 23:51 UTC (permalink / raw)
To: Ben Cheatham, Dan Williams, Jonathan Cameron
Cc: dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
rafael, linux-cxl, linux-acpi
Ben Cheatham wrote:
[..]
> > Ben, you want to give that a shot? Jonathan is right that my proposed
> > hack is *a* solution but probably not *the* solution where this should
> > end up.
>
> I can take a look. I won't be able to get to it until around the new year
> since I'm vacation at the moment. I'll also respond take a look at the
> rest of your review around then.
Enjoy your vacation, and thanks for all the work on this. We'll get it
sorted, and I think small bit of EINJ modernization will be a nice New
Year's gift.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option
2023-12-18 23:48 ` Dan Williams
@ 2024-01-10 20:31 ` Ben Cheatham
0 siblings, 0 replies; 19+ messages in thread
From: Ben Cheatham @ 2024-01-10 20:31 UTC (permalink / raw)
To: Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
Hi Dan, sorry for the wait, but I finally am getting some time to take a look at this. I haven't gotten around
to taking a look at updating the module init to use a platform driver, but I should be able to get to it in the next
couple of days here. Thanks for the review!
On 12/18/23 5:48 PM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Add CONFIG_CXL_EINJ to cxl/Kconfig. This option will allow for the CXL
>> core module to access helpers inside the EINJ module, while also giving
>> users the option of disabling CXL EINJ error types at build time.
>>
>> Also update CONFIG_ACPI_APEI_EINJ to set CONFIG_CXL_EINJ by default.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>> drivers/acpi/apei/Kconfig | 1 +
>> drivers/cxl/Kconfig | 12 ++++++++++++
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index 6b18f8bc7be3..4c3f0ec5731e 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -55,6 +55,7 @@ config ACPI_APEI_MEMORY_FAILURE
>> config ACPI_APEI_EINJ
>> tristate "APEI Error INJection (EINJ)"
>> depends on ACPI_APEI && DEBUG_FS
>> + imply CXL_BUS
>
> This can safely be deleted, it is sufficient for CXL_BUS to depend on
> ACPI_APEI_EINJ.
>
Ok, will do.
>> help
>> EINJ provides a hardware error injection mechanism, it is
>> mainly used for debugging and testing the other parts of
>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>> index 8ea1d340e438..6f4adcd733e5 100644
>> --- a/drivers/cxl/Kconfig
>> +++ b/drivers/cxl/Kconfig
>> @@ -154,4 +154,16 @@ config CXL_PMU
>> monitoring units and provide standard perf based interfaces.
>>
>> If unsure say 'm'.
>> +
>> +config CXL_EINJ
>> + bool "CXL Error INJection Support"
>> + default ACPI_APEI_EINJ
>> + depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
>
> It is ok for CXL_BUS to be a module while ACPI_APEI_EINJ is built-in, so
> the && can be dropped since CXL_BUS is guaranteed to be > 0 here, i.e.:
>
> default ACPI_APEI_EINJ
> depends on ACPI_APEI_EINJ >= CXL_BUS
Sounds good. I don't have much experience with Kconfig and wasn't aware that >= was a thing, good to know!
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions
2023-12-19 4:47 ` Dan Williams
@ 2024-01-10 20:31 ` Ben Cheatham
2024-01-10 22:10 ` Dan Williams
0 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2024-01-10 20:31 UTC (permalink / raw)
To: Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
On 12/18/23 10:47 PM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Implement CXL helper functions in the EINJ module for getting the
>> available CXL protocol error types and injecting CXL errors and export
>> them to sysfs under kernel/debug/cxl.
>>
>> The kernel/debug/cxl/einj_types file will print the available CXL
>> protocol errors in the same format as the available_error_types
>> file provided by the EINJ module. The
>> kernel/debug/cxl/$dport_dev/einj_inject is functionally the same as the
>> error_type and error_inject files provided by the EINJ module, i.e.:
>> writing an error type into $dport_dev/einj_inject will inject said error
>> type into the CXL dport represented by $dport_dev.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>> Documentation/ABI/testing/debugfs-cxl | 23 ++++
>> drivers/acpi/apei/einj.c | 144 ++++++++++++++++++++++++--
>> drivers/cxl/core/port.c | 33 ++++++
>> drivers/cxl/einj.h | 58 +++++++++++
>> 4 files changed, 248 insertions(+), 10 deletions(-)
>> create mode 100644 drivers/cxl/einj.h
>>
>> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
>> index fe61d372e3fa..97a8684bad84 100644
>> --- a/Documentation/ABI/testing/debugfs-cxl
>> +++ b/Documentation/ABI/testing/debugfs-cxl
>> @@ -33,3 +33,26 @@ Description:
>> device cannot clear poison from the address, -ENXIO is returned.
>> The clear_poison attribute is only visible for devices
>> supporting the capability.
>> +
>> +What: /sys/kernel/debug/cxl/einj_types
>> +Date: November, 2023
>> +KernelVersion: v6.8
>> +Contact: linux-cxl@vger.kernel.org
>> +Description:
>> + (RO) Prints the CXL protocol error types made available by
>> + the platform in the format "0x<error number> <error type>".
>> + The <error number> can be written to einj_inject to inject
>> + <error type> into a chosen dport. This file is only visible if
>> + CONFIG_CXL_EINJ is enabled.
>
> I don't think the Kconfig dependency needs to be mentioned here.
>
I'll remove it.
>> +
>> +What: /sys/kernel/debug/cxl/$dport_dev/einj_inject
>> +Date: November, 2023
>> +KernelVersion: v6.8
>> +Contact: linux-cxl@vger.kernel.org
>> +Description:
>> + (WO) Writing an integer to this file injects the corresponding
>> + CXL protocol error into $dport_dev (integer to type mapping is
>> + available by reading from einj_types).
>
> Maybe mention that $dport_dev is a device name from /sys/bus/pci/devices?
>
Will do!
>> + enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
>> + a CXL 2.0 error is injected. This file is only visible if
>> + CONFIG_CXL_EINJ is enabled.
>
> Same "drop the Kconfig mention" here.
>
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 26a887d2a5cd..1a2195779b52 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -24,6 +24,7 @@
>> #include <asm/unaligned.h>
>>
>> #include "apei-internal.h"
>> +#include "../../cxl/einj.h"
>
> I thought this was going to move to a top-level header?
>
Sorry, I either misunderstood you a couple of versions ago or misread the original request.
I'll move it under include/linux.
>>
>> #undef pr_fmt
>> #define pr_fmt(fmt) "EINJ: " fmt
>> @@ -36,6 +37,12 @@
>> #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
>> ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>> ACPI_EINJ_MEMORY_FATAL)
>> +#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
>> + ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
>> + ACPI_EINJ_CXL_CACHE_FATAL | \
>> + ACPI_EINJ_CXL_MEM_CORRECTABLE | \
>> + ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
>> + ACPI_EINJ_CXL_MEM_FATAL)
>>
>> /*
>> * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>> @@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>> if (type & ACPI5_VENDOR_BIT) {
>> if (vendor_flags != SETWA_FLAGS_MEM)
>> goto inject;
>> - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
>> + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
>> goto inject;
>> + } else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
>> + goto inject;
>> + }
>>
>> /*
>> * Disallow crazy address masks that give BIOS leeway to pick
>> @@ -590,6 +600,9 @@ static const char * const einj_error_type_string[] = {
>> "0x00000200\tPlatform Correctable\n",
>> "0x00000400\tPlatform Uncorrectable non-fatal\n",
>> "0x00000800\tPlatform Uncorrectable fatal\n",
>> +};
>> +
>> +static const char * const einj_cxl_error_type_string[] = {
>> "0x00001000\tCXL.cache Protocol Correctable\n",
>> "0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>> "0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>> @@ -617,29 +630,44 @@ DEFINE_SHOW_ATTRIBUTE(available_error_type);
>>
>> static bool einj_initialized;
>>
>> -static int error_type_get(void *data, u64 *val)
>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
>
> Oh nice an einj_cxl_ prefix in the global namespace, looks appropriate. I
> mention this here because I don't think a later cxl_einj_ wrapper makes
> sense, see below.
>
>> {
>> - *val = error_type;
>> + int cxl_err, rc;
>> + u32 available_error_type = 0;
>> +
>> + if (!einj_initialized)
>> + return -ENXIO;
>> +
>> + rc = einj_get_available_error_type(&available_error_type);
>> + if (rc)
>> + return rc;
>> +
>> + for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
>> + cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
>> +
>> + if (available_error_type & cxl_err)
>> + seq_puts(m, einj_cxl_error_type_string[pos]);
>> + }
>>
>> return 0;
>> }
>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
>>
>> -static int error_type_set(void *data, u64 val)
>> +static int validate_error_type(u64 type)
>> {
>> + u32 tval, vendor, available_error_type = 0;
>> int rc;
>> - u32 available_error_type = 0;
>> - u32 tval, vendor;
>>
>> /* Only low 32 bits for error type are valid */
>> - if (val & GENMASK_ULL(63, 32))
>> + if (type & GENMASK_ULL(63, 32))
>> return -EINVAL;
>>
>> /*
>> * Vendor defined types have 0x80000000 bit set, and
>> * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
>> */
>> - vendor = val & ACPI5_VENDOR_BIT;
>> - tval = val & 0x7fffffff;
>> + vendor = type & ACPI5_VENDOR_BIT;
>> + tval = type & 0x7fffffff;
>>
>> /* Only one error type can be specified */
>> if (tval & (tval - 1))
>> @@ -648,9 +676,105 @@ static int error_type_set(void *data, u64 val)
>> rc = einj_get_available_error_type(&available_error_type);
>> if (rc)
>> return rc;
>> - if (!(val & available_error_type))
>> + if (!(type & available_error_type))
>> return -EINVAL;
>> }
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
>> +{
>> + struct pci_bus *pbus;
>> + struct pci_host_bridge *bridge;
>> + u64 seg = 0, bus;
>> +
>> + pbus = dport_dev->bus;
>> + bridge = pci_find_host_bridge(pbus);
>> +
>> + if (!bridge)
>> + return -ENODEV;
>> +
>> + if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
>> + seg = bridge->domain_nr << 24;
>> +
>> + bus = pbus->number << 16;
>> + *sbdf = seg | bus | dport_dev->devfn;
>> +
>> + return 0;
>> +}
>> +
>> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
>> +{
>> + u64 param1 = 0, param2 = 0, param4 = 0;
>> + u32 flags;
>> + int rc;
>> +
>> + if (!einj_initialized)
>> + return -ENXIO;
>> +
>> + /* Only CXL error types can be specified */
>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
>> + return -EINVAL;
>> +
>> + rc = validate_error_type(type);
>> + if (rc)
>> + return rc;
>> +
>> + param1 = (u64) rcrb;
>> + param2 = 0xfffffffffffff000;
>> + flags = 0x2;
>> +
>> + return einj_error_inject(type, flags, param1, param2, 0, param4);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
>> +
>> +int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
>> +{
>> + u64 param1 = 0, param2 = 0, param4 = 0;
>> + u32 flags;
>> + int rc;
>> +
>> + if (!einj_initialized)
>> + return -ENXIO;
>> +
>> + /* Only CXL error types can be specified */
>> + if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
>> + return -EINVAL;
>> +
>> + rc = validate_error_type(type);
>> + if (rc)
>> + return rc;
>> +
>> + rc = cxl_dport_get_sbdf(dport, ¶m4);
>> + if (rc)
>> + return rc;
>> +
>> + flags = 0x4;
>> +
>> + return einj_error_inject(type, flags, param1, param2, 0, param4);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
>> +
>> +static int error_type_get(void *data, u64 *val)
>> +{
>> + *val = error_type;
>> +
>> + return 0;
>> +}
>> +
>> +static int error_type_set(void *data, u64 val)
>> +{
>> + int rc;
>> +
>> + /* CXL error types have to be injected from cxl debugfs */
>> + if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
>> + return -EINVAL;
>> +
>> + rc = validate_error_type(val);
>> + if (rc)
>> + return rc;
>> +
>> error_type = val;
>>
>> return 0;
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 38441634e4c6..4ed4a24138c3 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -11,6 +11,7 @@
>> #include <linux/idr.h>
>> #include <cxlmem.h>
>> #include <cxlpci.h>
>> +#include <einj.h>
>> #include <cxl.h>
>> #include "core.h"
>>
>> @@ -783,6 +784,32 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
>> return rc;
>> }
>>
>> +DEFINE_SHOW_ATTRIBUTE(cxl_einj_available_error_type);
>> +
>> +static int cxl_einj_inject(void *data, u64 type)
>> +{
>> + struct cxl_dport *dport = data;
>> +
>> + if (dport->rch)
>> + return cxl_einj_inject_rch_error(dport->rcrb.base, type);
>> +
>> + if (!dev_is_pci(dport->dport_dev))
>> + return -EINVAL;
>> +
>> + return cxl_einj_inject_error(to_pci_dev(dport->dport_dev), type);
>> +}
>> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
>> +
>> +static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport)
>> +{
>> + struct dentry *dir;
>
> I think an "IS_ENABLED(CONFIG_CXL_EINJ)" is warranted here.
>
I agree. I'll add one.
>> +
>> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
>> +
>> + debugfs_create_file("einj_inject", 0200, dir, dport,
>> + &cxl_einj_inject_fops);
>
> I will note that I am little bit uneasy about this ACPI'ism escaping
> into the common core, but he mitigation for me is that if some other
> platform firmware invented a platform-firmware method for error inject
> it would simply need to reuse the einj_cxl_ namespace to make it common.
>
I'll be honest; I'm not sure I understand the concern here, but that's probably
just inexperience on my part. That being said, I don't mind changing it if you
have any suggestions!
>
>> +}
>> +
>> static struct cxl_port *__devm_cxl_add_port(struct device *host,
>> struct device *uport_dev,
>> resource_size_t component_reg_phys,
>> @@ -1136,6 +1163,8 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>> } else {
>> dev_dbg(dport_dev, "dport added to %s\n",
>> dev_name(&port->dev));
>> +
>> + cxl_debugfs_create_dport_dir(dport);
>> }
>>
>> return dport;
>> @@ -1170,6 +1199,8 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
>> } else {
>> dev_dbg(dport_dev, "RCH dport added to %s\n",
>> dev_name(&port->dev));
>> +
>> + cxl_debugfs_create_dport_dir(dport);
>> }
>
> Move the above 2 invocations into 1 common call from
> __devm_cxl_add_dport().
>
Will do.
>>
>> return dport;
>> @@ -2109,6 +2140,8 @@ static __init int cxl_core_init(void)
>> int rc;
>>
>> cxl_debugfs = debugfs_create_dir("cxl", NULL);
>> + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL,
>> + &cxl_einj_available_error_type_fops);
>
> This should be gated by "IS_ENABLED(CONFIG_CXL_EINJ)"
>
Will do.
>>
>> cxl_mbox_init();
>>
>> diff --git a/drivers/cxl/einj.h b/drivers/cxl/einj.h
>> new file mode 100644
>> index 000000000000..b913763c3238
>> --- /dev/null
>> +++ b/drivers/cxl/einj.h
>> @@ -0,0 +1,58 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * CXL protocol Error INJection support.
>> + *
>> + * Copyright (c) 2023 Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Ben Cheatham <benjamin.cheatham@amd.com>
>> + */
>> +#ifndef CXL_EINJ_H
>> +#define CXL_EINJ_H
>> +#include <linux/pci.h>
>> +
>> +int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
>> +int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
>> +int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
>
> See below, these should be within the ifdef guard.
>
>> +
>> +#if IS_ENABLED(CONFIG_CXL_EINJ)
>
> This should be
>
> #ifdef CONFIG_ACPI_APEI_EINJ
>
> ...because that is the module defining the exported public interface.
>
Makes sense, I'll change it.
>> +static inline int cxl_einj_available_error_type_show(struct seq_file *m,
>> + void *v)
>> +{
>> + return einj_cxl_available_error_type_show(m, v);
>> +}
>> +
>> +static inline int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
>> +{
>> + return einj_cxl_inject_error(dport_dev, type);
>> +}
>> +
>> +static inline int cxl_einj_inject_rch_error(u64 rcrb, u64 type)
>> +{
>> + return einj_cxl_inject_rch_error(rcrb, type);
>> +}
>
> Above 3 static inlines can be deleted, this section of the ifdef would
> just be:
>
> int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
> int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
> int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
>
Sounds good. I originally tried this but was getting compile-time errors in einj.c
most likely due to gating around CONFIG_CXL_EINJ instead of CONFIG_ACPI_APEI_EINJ.
>> +
>> +#else
>> +static inline int cxl_einj_available_error_type_show(struct seq_file *m,
>> + void *v)
>> +{
>> + return -ENXIO;
>> +}
>> +
>> +static inline int cxl_einj_type_show(struct seq_file *m, void *data)
>> +{
>> + return -ENXIO;
>> +}
>> +
>> +static inline int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
>> +{
>> + return -ENXIO;
>> +}
>> +
>> +static inline int cxl_einj_inject_rch_error(u64 rcrb, u64 type)
>> +{
>> + return -ENXIO;
>> +}
>> +#endif
>
> Rename these to their einj_cxl_ equivalent.
Gladly! I wasn't a fan of using cxl_einj_ and einj_cxl_ at the same time since the prefixes are so similiar :/.
Thanks again,
Ben
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions
2024-01-10 20:31 ` Ben Cheatham
@ 2024-01-10 22:10 ` Dan Williams
2024-01-10 22:17 ` Ben Cheatham
0 siblings, 1 reply; 19+ messages in thread
From: Dan Williams @ 2024-01-10 22:10 UTC (permalink / raw)
To: Ben Cheatham, Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
Ben Cheatham wrote:
[..]
> >> +
> >> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
> >> +
> >> + debugfs_create_file("einj_inject", 0200, dir, dport,
> >> + &cxl_einj_inject_fops);
> >
> > I will note that I am little bit uneasy about this ACPI'ism escaping
> > into the common core, but he mitigation for me is that if some other
> > platform firmware invented a platform-firmware method for error inject
> > it would simply need to reuse the einj_cxl_ namespace to make it common.
> >
>
> I'll be honest; I'm not sure I understand the concern here, but that's probably
> just inexperience on my part. That being said, I don't mind changing it if you
> have any suggestions!
>
Notice how the CXL subsystem organizes all the ACPI interface concerns
into drivers/cxl/acpi.c. There are some generic callbacks like the
@calc_hb argument to cxl_root_decoder_alloc(), but there are no direct
ties of the CXL core code to ACPI details. This separation allows, in
principle, a non-ACPI platform (like PowerPC OpenFirmware) to support
CXL without needing to unwind a bunch of CXL-ACPI specific stuff.
This "cxl_einj" work is leaking an ACPI specific term "EINJ" into the
core code. The reason I am ok with this is that the ABI is still gated
on CONFIG_ACPI_APEI_EINJ. Also, we still have the option to require that
OpenFirmware error injection just call their user facing ABI "EINJ" as
well, even if it is a different name in the OpenFirmware specification.
To be clear though no one has sent patches for CXL support on anything
other than ACPI based platforms.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions
2024-01-10 22:10 ` Dan Williams
@ 2024-01-10 22:17 ` Ben Cheatham
0 siblings, 0 replies; 19+ messages in thread
From: Ben Cheatham @ 2024-01-10 22:17 UTC (permalink / raw)
To: Dan Williams, dave, jonathan.cameron, dave.jiang,
alison.schofield, vishal.l.verma, ira.weiny, rafael
Cc: linux-cxl, linux-acpi
On 1/10/24 4:10 PM, Dan Williams wrote:
> Ben Cheatham wrote:
> [..]
>>>> +
>>>> + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev));
>>>> +
>>>> + debugfs_create_file("einj_inject", 0200, dir, dport,
>>>> + &cxl_einj_inject_fops);
>>>
>>> I will note that I am little bit uneasy about this ACPI'ism escaping
>>> into the common core, but he mitigation for me is that if some other
>>> platform firmware invented a platform-firmware method for error inject
>>> it would simply need to reuse the einj_cxl_ namespace to make it common.
>>>
>>
>> I'll be honest; I'm not sure I understand the concern here, but that's probably
>> just inexperience on my part. That being said, I don't mind changing it if you
>> have any suggestions!
>>
>
> Notice how the CXL subsystem organizes all the ACPI interface concerns
> into drivers/cxl/acpi.c. There are some generic callbacks like the
> @calc_hb argument to cxl_root_decoder_alloc(), but there are no direct
> ties of the CXL core code to ACPI details. This separation allows, in
> principle, a non-ACPI platform (like PowerPC OpenFirmware) to support
> CXL without needing to unwind a bunch of CXL-ACPI specific stuff.
>
> This "cxl_einj" work is leaking an ACPI specific term "EINJ" into the
> core code. The reason I am ok with this is that the ABI is still gated
> on CONFIG_ACPI_APEI_EINJ. Also, we still have the option to require that
> OpenFirmware error injection just call their user facing ABI "EINJ" as
> well, even if it is a different name in the OpenFirmware specification.
> To be clear though no one has sent patches for CXL support on anything
> other than ACPI based platforms.
Ok that makes sense, thanks for the explanation!
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-01-10 22:17 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-13 22:36 [PATCH v8 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
2023-12-13 22:36 ` [PATCH v8 1/5] cxl, ACPI, APEI, EINJ: Add CXL_EINJ Kconfig option Ben Cheatham
2023-12-18 23:48 ` Dan Williams
2024-01-10 20:31 ` Ben Cheatham
2023-12-13 22:36 ` [PATCH v8 2/5] ACPI, APEI, EINJ: Add wrapper __init function Ben Cheatham
2023-12-18 23:59 ` Dan Williams
2023-12-19 15:39 ` Jonathan Cameron
2023-12-19 20:48 ` Dan Williams
2023-12-20 14:37 ` Ben Cheatham
2023-12-20 23:51 ` Dan Williams
2023-12-20 14:33 ` Ben Cheatham
2023-12-13 22:37 ` [PATCH v8 3/5] ACPI: Add CXL protocol error defines Ben Cheatham
2023-12-19 5:09 ` Dan Williams
2023-12-13 22:37 ` [PATCH v8 4/5] cxl/core, EINJ: Add CXL debugfs files and EINJ functions Ben Cheatham
2023-12-19 4:47 ` Dan Williams
2024-01-10 20:31 ` Ben Cheatham
2024-01-10 22:10 ` Dan Williams
2024-01-10 22:17 ` Ben Cheatham
2023-12-13 22:37 ` [PATCH v8 5/5] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox