* [PATCH 1/6] libcxl: Add debugfs path to CXL context
2025-04-24 21:23 [ndctl PATCH 0/6] Add error injection support Ben Cheatham
@ 2025-04-24 21:23 ` Ben Cheatham
2025-04-24 21:23 ` [PATCH 2/6] libcxl: Add CXL protocol errors Ben Cheatham
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-04-24 21:23 UTC
To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield
Add the debugfs path to the CXL library context. Library functions that
read information from the CXL debugfs will use this path.
The default path is the normal mount point for the debugfs
(/sys/kernel/debug) but the debugfs mount point can vary. Add a library
API call for setting the debugfs path for cases where the debugfs isn't
mounted at the default.
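For illustration only, a minimal standalone sketch of the default-plus-override
pattern described above. The demo_* names are hypothetical stand-ins and are not
part of libcxl; the sketch assumes the context stores the caller's pointer
without copying it, which is what the setter in this patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the library context (not struct cxl_ctx). */
struct demo_ctx {
	const char *debugfs;
};

/* Start from the usual debugfs mount point. */
static void demo_ctx_init(struct demo_ctx *ctx)
{
	assert(ctx);
	ctx->debugfs = "/sys/kernel/debug";
}

/* Like the setter in this patch: the pointer is stored, not copied. */
static void demo_set_debugfs(struct demo_ctx *ctx, const char *debugfs)
{
	assert(ctx && debugfs);
	ctx->debugfs = debugfs;
}

/* Build "<debugfs>/cxl/<leaf>"; returns 0 on success, -1 on truncation. */
static int demo_debugfs_path(const struct demo_ctx *ctx, const char *leaf,
			     char *out, size_t len)
{
	int n;

	assert(ctx && leaf && out);
	n = snprintf(out, len, "%s/cxl/%s", ctx->debugfs, leaf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because only the pointer is kept, a caller of such a setter has to keep the
string alive for as long as the context is in use.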
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/lib/libcxl.c | 7 +++++++
cxl/lib/libcxl.sym | 5 +++++
cxl/libcxl.h | 1 +
3 files changed, 13 insertions(+)
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 63aa4ef..e86d00f 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -54,6 +54,7 @@ struct cxl_ctx {
struct kmod_ctx *kmod_ctx;
struct daxctl_ctx *daxctl_ctx;
void *private_data;
+ const char *debugfs;
};
static void free_pmem(struct cxl_pmem *pmem)
@@ -294,6 +295,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
c->udev = udev;
c->udev_queue = udev_queue;
c->timeout = 5000;
+ c->debugfs = "/sys/kernel/debug";
return 0;
@@ -3265,6 +3267,11 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
return port->decoders_committed;
}
+CXL_EXPORT void cxl_set_debugfs(struct cxl_ctx *ctx, const char *debugfs)
+{
+ ctx->debugfs = debugfs;
+}
+
static void *add_cxl_bus(void *parent, int id, const char *cxlbus_base)
{
const char *devname = devpath_to_devname(cxlbus_base);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 763151f..61553c0 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -287,3 +287,8 @@ global:
cxl_memdev_trigger_poison_list;
cxl_region_trigger_poison_list;
} LIBCXL_7;
+
+LIBCXL_9 {
+global:
+ cxl_set_debugfs;
+} LIBCXL_8;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 43c082a..f3f11ad 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -32,6 +32,7 @@ void cxl_set_userdata(struct cxl_ctx *ctx, void *userdata);
void *cxl_get_userdata(struct cxl_ctx *ctx);
void cxl_set_private_data(struct cxl_ctx *ctx, void *data);
void *cxl_get_private_data(struct cxl_ctx *ctx);
+void cxl_set_debugfs(struct cxl_ctx *ctx, const char *debugfs);
enum cxl_fwl_status {
CXL_FWL_STATUS_UNKNOWN,
--
2.34.1
* [PATCH 2/6] libcxl: Add CXL protocol errors
2025-04-24 21:23 [ndctl PATCH 0/6] Add error injection support Ben Cheatham
2025-04-24 21:23 ` [PATCH 1/6] libcxl: Add debugfs path to CXL context Ben Cheatham
@ 2025-04-24 21:23 ` Ben Cheatham
2025-04-24 21:23 ` [PATCH 3/6] libcxl: Add poison injection functions Ben Cheatham
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-04-24 21:23 UTC
To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield
The v6.11 kernel adds CXL protocol (CXL.cache/CXL.mem) error injection
for platforms that implement the v6.5+ ACPI specification. These errors
are reported by the kernel through the einj_types file and injected
through the einj_inject file under the relevant CXL RCH dport or VH root
port.
Add a library API to retrieve the CXL error types and inject them. This
API will be used in a later commit by the cxl inject-error and list
commands.
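As a rough sketch of the parsing step, assuming the einj_types text has one
"<hex number> <name>" pair per line as the comment in the patch describes: the
code below maps each number to one of the six names used in this patch and
ignores anything else. The demo_* names are hypothetical, and unlike the patch
this version uses strtoul()/strchr() rather than strtok_r(), so it does not
modify its input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Bit positions and names as listed in the table in this patch. */
struct demo_perror {
	unsigned int bit;
	const char *name;
};

static const struct demo_perror demo_perrors[] = {
	{ 12, "cache-correctable" },
	{ 13, "cache-uncorrectable" },
	{ 14, "cache-fatal" },
	{ 15, "mem-correctable" },
	{ 16, "mem-uncorrectable" },
	{ 17, "mem-fatal" },
};

/* Name for a mask with exactly one known bit set, else NULL. */
static const char *demo_perror_name(unsigned long mask)
{
	size_t i;

	for (i = 0; i < sizeof(demo_perrors) / sizeof(demo_perrors[0]); i++)
		if (mask == (1UL << demo_perrors[i].bit))
			return demo_perrors[i].name;
	return NULL;
}

/*
 * Walk "<hex number> <name>\n" lines and store the names of recognised
 * CXL protocol errors in out[]. Returns the number of names stored.
 */
static size_t demo_parse_einj_types(const char *text, const char **out,
				    size_t max)
{
	size_t count = 0;

	assert(text && out);
	while (*text && count < max) {
		const char *eol = strchr(text, '\n');
		char *end;
		unsigned long mask = strtoul(text, &end, 16);

		/* Only trust a number that was parsed from this line. */
		if (end != text && (!eol || end <= eol)) {
			const char *name = demo_perror_name(mask);

			if (name)
				out[count++] = name;
		}
		if (!eol)
			break;
		text = eol + 1; /* skip the rest of the line */
	}
	return count;
}
```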
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/lib/libcxl.c | 166 +++++++++++++++++++++++++++++++++++++++++++++
cxl/lib/libcxl.sym | 5 ++
cxl/lib/private.h | 14 ++++
cxl/libcxl.h | 13 ++++
4 files changed, 198 insertions(+)
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index e86d00f..408b2a3 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -46,11 +46,13 @@ struct cxl_ctx {
void *userdata;
int memdevs_init;
int buses_init;
+ int perrors_init;
unsigned long timeout;
struct udev *udev;
struct udev_queue *udev_queue;
struct list_head memdevs;
struct list_head buses;
+ struct list_head perrors;
struct kmod_ctx *kmod_ctx;
struct daxctl_ctx *daxctl_ctx;
void *private_data;
@@ -204,6 +206,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
free(bus);
}
+static void free_protocol_error(struct cxl_protocol_error *perror,
+ struct list_head *head)
+{
+ if (head)
+ list_del_from(head, &perror->list);
+ free(perror);
+}
+
/**
* cxl_get_userdata - retrieve stored data pointer from library context
* @ctx: cxl library context
@@ -290,6 +300,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
*ctx = c;
list_head_init(&c->memdevs);
list_head_init(&c->buses);
+ list_head_init(&c->perrors);
c->kmod_ctx = kmod_ctx;
c->daxctl_ctx = daxctl_ctx;
c->udev = udev;
@@ -331,6 +342,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
*/
CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
{
+ struct cxl_protocol_error *perror, *_p;
struct cxl_memdev *memdev, *_d;
struct cxl_bus *bus, *_b;
@@ -346,6 +358,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
list_for_each_safe(&ctx->buses, bus, _b, port.list)
free_bus(bus, &ctx->buses);
+ list_for_each_safe(&ctx->perrors, perror, _p, list)
+ free_protocol_error(perror, &ctx->perrors);
+
udev_queue_unref(ctx->udev_queue);
udev_unref(ctx->udev);
kmod_unref(ctx->kmod_ctx);
@@ -3272,6 +3287,157 @@ CXL_EXPORT void cxl_set_debugfs(struct cxl_ctx *ctx, const char *debugfs)
ctx->debugfs = debugfs;
}
+const struct cxl_protocol_error cxl_protocol_errors[] = {
+ CXL_PROTOCOL_ERROR(12, "cache-correctable"),
+ CXL_PROTOCOL_ERROR(13, "cache-uncorrectable"),
+ CXL_PROTOCOL_ERROR(14, "cache-fatal"),
+ CXL_PROTOCOL_ERROR(15, "mem-correctable"),
+ CXL_PROTOCOL_ERROR(16, "mem-uncorrectable"),
+ CXL_PROTOCOL_ERROR(17, "mem-fatal")
+};
+
+static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
+ unsigned long n)
+{
+ struct cxl_protocol_error *perror;
+
+ for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
+ if (n != BIT(cxl_protocol_errors[i].num))
+ continue;
+
+ perror = calloc(1, sizeof(*perror));
+ if (!perror)
+ return NULL;
+
+ *perror = cxl_protocol_errors[i];
+ perror->ctx = ctx;
+ return perror;
+ }
+
+ return NULL;
+}
+
+static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
+{
+ size_t path_len = strlen(ctx->debugfs) + 30;
+ struct cxl_protocol_error *perror;
+ char *path, *num, *save;
+ unsigned long n;
+ char buf[512];
+ int rc = 0;
+
+ path = calloc(1, path_len);
+ if (!path)
+ return;
+
+ snprintf(path, path_len, "%s/cxl/einj_types", ctx->debugfs);
+ rc = access(path, F_OK);
+ if (rc) {
+ err(ctx, "failed to access %s: %s\n", path, strerror(errno));
+ goto err;
+ }
+
+ rc = sysfs_read_attr(ctx, path, buf);
+ if (rc) {
+ err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
+ goto err;
+ }
+
+ /*
+ * The format of the output of the einj_types attr is:
+ * <Error number in hex 1> <Error name 1>
+ * <Error number in hex 2> <Error name 2>
+ * ...
+ *
+ * We only need the number, so parse that and skip the rest of
+ * the line.
+ */
+ num = strtok_r(buf, " \n", &save);
+ while (num) {
+ n = strtoul(num, NULL, 16);
+ perror = create_cxl_protocol_error(ctx, n);
+ if (perror)
+ list_add(&ctx->perrors, &perror->list);
+
+ num = strtok_r(NULL, "\n", &save);
+ if (!num)
+ break;
+
+ num = strtok_r(NULL, " \n", &save);
+ }
+
+err:
+ free(path);
+}
+
+static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
+{
+ if (ctx->perrors_init)
+ return;
+
+ ctx->perrors_init = 1;
+ cxl_add_protocol_errors(ctx);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_first(struct cxl_ctx *ctx)
+{
+ cxl_protocol_errors_init(ctx);
+
+ return list_top(&ctx->perrors, struct cxl_protocol_error, list);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
+{
+ struct cxl_ctx *ctx = perror->ctx;
+
+ return list_next(&ctx->perrors, perror, list);
+}
+
+CXL_EXPORT unsigned long
+cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
+{
+ return perror->num;
+}
+
+CXL_EXPORT const char *
+cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
+{
+ return perror->string;
+}
+
+CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+ unsigned long error)
+{
+ struct cxl_ctx *ctx = dport->port->ctx;
+ unsigned long path_len = strlen(ctx->debugfs) + 100;
+ char buf[32] = { 0 };
+ char *path;
+ int rc;
+
+ path = calloc(path_len, sizeof(char));
+ if (!path)
+ return -ENOMEM;
+
+ snprintf(path, path_len, "%s/cxl/%s/einj_inject", ctx->debugfs,
+ cxl_dport_get_devname(dport));
+ rc = access(path, F_OK);
+ if (rc) {
+ err(ctx, "failed to access %s: %s\n", path, strerror(errno));
+ free(path);
+ return rc;
+ }
+
+ snprintf(buf, sizeof(buf), "0x%lx\n", error);
+ rc = sysfs_write_attr(ctx, path, buf);
+ if (rc)
+ err(ctx, "failed to write %s: %s\n", path, strerror(-rc));
+
+ free(path);
+ return rc;
+}
+
static void *add_cxl_bus(void *parent, int id, const char *cxlbus_base)
{
const char *devname = devpath_to_devname(cxlbus_base);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 61553c0..a0ab86d 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -291,4 +291,9 @@ global:
LIBCXL_9 {
global:
cxl_set_debugfs;
+ cxl_protocol_error_get_first;
+ cxl_protocol_error_get_next;
+ cxl_protocol_error_get_num;
+ cxl_protocol_error_get_str;
+ cxl_dport_protocol_error_inject;
} LIBCXL_8;
diff --git a/cxl/lib/private.h b/cxl/lib/private.h
index b6cd910..85806ac 100644
--- a/cxl/lib/private.h
+++ b/cxl/lib/private.h
@@ -102,6 +102,20 @@ struct cxl_port {
struct list_head dports;
};
+struct cxl_protocol_error {
+ unsigned long num;
+ const char *string;
+ struct cxl_ctx *ctx;
+ struct list_node list;
+};
+
+#define CXL_PROTOCOL_ERROR(n, str) \
+ ((struct cxl_protocol_error){ \
+ .num = (n), \
+ .string = (str), \
+ .ctx = NULL, \
+ })
+
struct cxl_bus {
struct cxl_port port;
};
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index f3f11ad..f8b2aff 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -487,6 +487,19 @@ int cxl_cmd_alert_config_set_enable_alert_actions(struct cxl_cmd *cmd,
int enable);
struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memdev);
+struct cxl_protocol_error;
+struct cxl_protocol_error *cxl_protocol_error_get_first(struct cxl_ctx *ctx);
+struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
+unsigned long cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
+const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
+int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+ unsigned long error);
+
+#define cxl_protocol_error_foreach(ctx, perror) \
+ for (perror = cxl_protocol_error_get_first(ctx); perror != NULL; \
+ perror = cxl_protocol_error_get_next(perror))
+
#ifdef __cplusplus
} /* extern "C" */
#endif
--
2.34.1
* [PATCH 3/6] libcxl: Add poison injection functions
2025-04-24 21:23 [ndctl PATCH 0/6] Add error injection support Ben Cheatham
2025-04-24 21:23 ` [PATCH 1/6] libcxl: Add debugfs path to CXL context Ben Cheatham
2025-04-24 21:23 ` [PATCH 2/6] libcxl: Add CXL protocol errors Ben Cheatham
@ 2025-04-24 21:23 ` Ben Cheatham
2025-04-24 21:23 ` [PATCH 4/6] cxl/list: Add debugfs option Ben Cheatham
` (3 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-04-24 21:23 UTC
To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield
Add a library API for clearing and injecting poison into a CXL memory
device through the kernel debugfs.
This API will be used by the cxl inject-error command to inject/clear
poison from a CXL memory device.
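A minimal standalone sketch of the two strings involved, assuming the layout
used in this patch: the attribute lives at
"<debugfs>/cxl/<memdev>/inject_poison" (or "clear_poison") and the device
physical address is written as hex with a trailing newline. The demo_* helpers
are hypothetical and only format the strings; they do not touch debugfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "<debugfs>/cxl/<memdev>/{inject,clear}_poison"; -1 on truncation. */
static int demo_poison_path(const char *debugfs, const char *memdev,
			    bool clear, char *out, size_t len)
{
	int n;

	assert(debugfs && memdev && out);
	n = snprintf(out, len, "%s/cxl/%s/%s", debugfs, memdev,
		     clear ? "clear_poison" : "inject_poison");
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Format the DPA as hex followed by a newline; -1 on truncation. */
static int demo_format_dpa(unsigned long long dpa, char *out, size_t len)
{
	int n;

	assert(out);
	n = snprintf(out, len, "0x%llx\n", dpa);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking the snprintf() return value catches a truncated path before anything
is written to the wrong file.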
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/lib/libcxl.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++
cxl/lib/libcxl.sym | 3 +++
cxl/libcxl.h | 3 +++
3 files changed, 58 insertions(+)
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 408b2a3..bc4b08c 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -4856,3 +4856,55 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
{
return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
}
+
+CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
+{
+ struct cxl_ctx *ctx = memdev->ctx;
+ size_t path_len = strlen(ctx->debugfs) + 100;
+ bool exists;
+ char *path;
+
+ path = calloc(path_len, sizeof(char));
+ if (!path)
+ return false;
+
+ snprintf(path, path_len, "%s/cxl/%s/inject_poison", ctx->debugfs,
+ cxl_memdev_get_devname(memdev));
+ exists = access(path, F_OK) == 0;
+
+ free(path);
+ return exists;
+}
+
+static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
+ bool clear)
+{
+ struct cxl_ctx *ctx = memdev->ctx;
+ size_t path_len = strlen(ctx->debugfs) + 100;
+ char addr[32];
+ char *path;
+ int rc;
+
+ path = calloc(path_len, sizeof(char));
+ if (!path)
+ return -ENOMEM;
+
+ snprintf(path, path_len, "%s/cxl/%s/%s", ctx->debugfs,
+ cxl_memdev_get_devname(memdev),
+ clear ? "clear_poison" : "inject_poison");
+ snprintf(addr, sizeof(addr), "0x%zx\n", dpa);
+
+ rc = sysfs_write_attr(ctx, path, addr);
+ free(path);
+ return rc;
+}
+
+CXL_EXPORT int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t addr)
+{
+ return cxl_memdev_poison_action(memdev, addr, false);
+}
+
+CXL_EXPORT int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t addr)
+{
+ return cxl_memdev_poison_action(memdev, addr, true);
+}
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index a0ab86d..783a257 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -296,4 +296,7 @@ global:
cxl_protocol_error_get_num;
cxl_protocol_error_get_str;
cxl_dport_protocol_error_inject;
+ cxl_memdev_has_poison_injection;
+ cxl_memdev_inject_poison;
+ cxl_memdev_clear_poison;
} LIBCXL_8;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index f8b2aff..6840d2a 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -101,6 +101,9 @@ int cxl_memdev_read_label(struct cxl_memdev *memdev, void *buf, size_t length,
size_t offset);
int cxl_memdev_write_label(struct cxl_memdev *memdev, void *buf, size_t length,
size_t offset);
+bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev);
+int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa);
+int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa);
struct cxl_cmd *cxl_cmd_new_get_fw_info(struct cxl_memdev *memdev);
unsigned int cxl_cmd_fw_info_get_num_slots(struct cxl_cmd *cmd);
unsigned int cxl_cmd_fw_info_get_active_slot(struct cxl_cmd *cmd);
--
2.34.1
* [PATCH 4/6] cxl/list: Add debugfs option
2025-04-24 21:23 [ndctl PATCH 0/6] Add error injection support Ben Cheatham
` (2 preceding siblings ...)
2025-04-24 21:23 ` [PATCH 3/6] libcxl: Add poison injection functions Ben Cheatham
@ 2025-04-24 21:23 ` Ben Cheatham
2025-04-24 21:24 ` [PATCH 5/6] cxl/list: Add injectable-errors option Ben Cheatham
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-04-24 21:23 UTC
To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield
Add "--debugfs" option to specify the path to the kernel debugfs.
Defaults to "/sys/kernel/debug" if left unspecified.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
Documentation/cxl/cxl-list.txt | 4 ++++
cxl/list.c | 6 ++++++
2 files changed, 10 insertions(+)
diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 9a9911e..56eb516 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -491,6 +491,10 @@ OPTIONS
If the cxl tool was built with debug enabled, turn on debug
messages.
+--debugfs::
+ Specifies the path to the kernel debug filesystem. If not specified,
+ defaults to "/sys/kernel/debug".
+
include::human-option.txt[]
include::../copyright.txt[]
diff --git a/cxl/list.c b/cxl/list.c
index 0b25d78..5f77d87 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -13,6 +13,7 @@
#include "filter.h"
static struct cxl_filter_params param;
+static const char *debugfs;
static bool debug;
static const struct option options[] = {
@@ -60,6 +61,8 @@ static const struct option options[] = {
OPT_BOOLEAN('L', "media-errors", &param.media_errors,
"include media-error information "),
OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
+ OPT_STRING(0, "debugfs", &debugfs, "debugfs mount point",
+ "mount point of kernel debugfs (defaults to '/sys/kernel/debug')"),
#ifdef ENABLE_DEBUG
OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
#endif
@@ -146,6 +149,9 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
param.ctx.log_priority = LOG_DEBUG;
}
+ if (debugfs)
+ cxl_set_debugfs(ctx, debugfs);
+
if (cxl_filter_has(param.port_filter, "root") && param.ports)
param.buses = true;
--
2.34.1
* [PATCH 5/6] cxl/list: Add injectable-errors option
2025-04-24 21:23 [ndctl PATCH 0/6] Add error injection support Ben Cheatham
` (3 preceding siblings ...)
2025-04-24 21:23 ` [PATCH 4/6] cxl/list: Add debugfs option Ben Cheatham
@ 2025-04-24 21:24 ` Ben Cheatham
2025-04-24 21:24 ` [PATCH 6/6] cxl: Add inject-error command Ben Cheatham
2025-04-29 2:35 ` [ndctl PATCH 0/6] Add error injection support Alison Schofield
6 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-04-24 21:24 UTC
To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield
Add "--injectable-errors"/"-N" option to show injectable error
information for CXL objects. Applicable CXL objects are CXL memory
devices, where the information reported is whether poison is injectable,
and CXL busses, which list the CXL protocol error types available for
injection.
The CXL protocol error types will be the same across busses because the
information comes from the ACPI EINJ error types table (ACPI v6.5 18.6),
but are presented under the bus for easier filtering.
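A small sketch of how the new option could flow into the JSON flags, assuming
the pattern visible in this patch: a boolean parameter is turned into a flag
bit, and the highest verbosity level switches the parameter on. Everything
named demo_* or DEMO_* is hypothetical; only bit 14 is taken from
UTIL_JSON_INJ_ERRORS above, the other value is made up.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_json_flags {
	DEMO_JSON_MEDIA_ERRORS = 1 << 0,	/* hypothetical value */
	DEMO_JSON_INJ_ERRORS = 1 << 14,	/* same bit as UTIL_JSON_INJ_ERRORS */
};

/* Hypothetical subset of the list parameters. */
struct demo_params {
	bool media_errors;
	bool inj_errors;
	int verbose;
};

/* At verbosity 3 and above, turn on the optional sections. */
static void demo_apply_verbose(struct demo_params *p)
{
	assert(p);
	if (p->verbose >= 3) {
		p->media_errors = true;
		p->inj_errors = true;
	}
}

/* Translate the boolean parameters into a flag mask. */
static unsigned long demo_to_flags(const struct demo_params *p)
{
	unsigned long flags = 0;

	assert(p);
	if (p->media_errors)
		flags |= DEMO_JSON_MEDIA_ERRORS;
	if (p->inj_errors)
		flags |= DEMO_JSON_INJ_ERRORS;
	return flags;
}
```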
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
Documentation/cxl/cxl-list.txt | 35 +++++++++++++++++++++++++++++++++-
cxl/filter.h | 3 +++
cxl/json.c | 30 +++++++++++++++++++++++++++++
cxl/list.c | 3 +++
util/json.h | 1 +
5 files changed, 71 insertions(+), 1 deletion(-)
diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 56eb516..6d65947 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -469,6 +469,38 @@ OPTIONS
}
----
+-N::
+--injectable-errors::
+ Include injectable error information in the output. For CXL memory devices
+ this includes whether poison is injectable through the kernel debug filesystem.
+ The types of CXL protocol errors available for injection into downstream ports
+ are listed as part of a CXL bus object.
+
+----
+# cxl list -NB
+[
+ {
+ "bus":"root0",
+ "provider":"ACPI.CXL",
+ "injectable_protocol_errors":[
+ "mem-correctable",
+ "mem-fatal"
+ ]
+ }
+]
+
+# cxl list -N
+[
+ {
+ "memdev":"mem0",
+ "pmem_size":268435456,
+ "ram_size":268435456,
+ "serial":2,
+ "poison_injectable":true
+ }
+]
+
+----
-v::
--verbose::
Increase verbosity of the output. This can be specified
@@ -485,7 +517,8 @@ OPTIONS
devices with --idle.
- *-vvv*
Everything *-vv* provides, plus enable
- --health, --partition, and --media-errors.
+ --health, --partition, --media-errors, and
+ --injectable-errors.
--debug::
If the cxl tool was built with debug enabled, turn on debug
diff --git a/cxl/filter.h b/cxl/filter.h
index 956a46e..34f8387 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -31,6 +31,7 @@ struct cxl_filter_params {
bool alert_config;
bool dax;
bool media_errors;
+ bool inj_errors;
int verbose;
struct log_ctx ctx;
};
@@ -91,6 +92,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
if (param->media_errors)
flags |= UTIL_JSON_MEDIA_ERRORS;
+ if (param->inj_errors)
+ flags |= UTIL_JSON_INJ_ERRORS;
return flags;
}
diff --git a/cxl/json.c b/cxl/json.c
index e65bd80..6f1a7cf 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -855,6 +855,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
json_object_object_add(jdev, "firmware", jobj);
}
+ if (flags & UTIL_JSON_INJ_ERRORS) {
+ jobj = json_object_new_boolean(cxl_memdev_has_poison_injection(memdev));
+ if (jobj)
+ json_object_object_add(jdev, "poison_injectable", jobj);
+ }
+
if (flags & UTIL_JSON_MEDIA_ERRORS) {
jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
if (jobj)
@@ -930,6 +936,8 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
unsigned long flags)
{
const char *devname = cxl_bus_get_devname(bus);
+ struct cxl_ctx *ctx = cxl_bus_get_ctx(bus);
+ struct cxl_protocol_error *perror;
struct json_object *jbus, *jobj;
jbus = json_object_new_object();
@@ -945,6 +953,28 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
json_object_object_add(jbus, "provider", jobj);
json_object_set_userdata(jbus, bus, NULL);
+
+ if (flags & UTIL_JSON_INJ_ERRORS) {
+ jobj = json_object_new_array();
+ if (!jobj)
+ return jbus;
+
+ cxl_protocol_error_foreach(ctx, perror)
+ {
+ struct json_object *jerr_str;
+ const char *perror_str;
+
+ perror_str = cxl_protocol_error_get_str(perror);
+
+ jerr_str = json_object_new_string(perror_str);
+ if (jerr_str)
+ json_object_array_add(jobj, jerr_str);
+ }
+
+ json_object_object_add(jbus, "injectable_protocol_errors",
+ jobj);
+ }
+
return jbus;
}
diff --git a/cxl/list.c b/cxl/list.c
index 5f77d87..d43b47e 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -60,6 +60,8 @@ static const struct option options[] = {
"include alert configuration information"),
OPT_BOOLEAN('L', "media-errors", &param.media_errors,
"include media-error information "),
+ OPT_BOOLEAN('N', "injectable-errors", &param.inj_errors,
+ "include injectable error information"),
OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
OPT_STRING(0, "debugfs", &debugfs, "debugfs mount point",
"mount point of kernel debugfs (defaults to '/sys/kernel/debug')"),
@@ -127,6 +129,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
param.alert_config = true;
param.dax = true;
param.media_errors = true;
+ param.inj_errors = true;
/* fallthrough */
case 2:
param.idle = true;
diff --git a/util/json.h b/util/json.h
index 560f845..57278cb 100644
--- a/util/json.h
+++ b/util/json.h
@@ -21,6 +21,7 @@ enum util_json_flags {
UTIL_JSON_TARGETS = (1 << 11),
UTIL_JSON_PARTITION = (1 << 12),
UTIL_JSON_ALERT_CONFIG = (1 << 13),
+ UTIL_JSON_INJ_ERRORS = (1 << 14),
};
void util_display_json_array(FILE *f_out, struct json_object *jarray,
--
2.34.1
* [PATCH 6/6] cxl: Add inject-error command
2025-04-24 21:23 [ndctl PATCH 0/6] Add error injection support Ben Cheatham
` (4 preceding siblings ...)
2025-04-24 21:24 ` [PATCH 5/6] cxl/list: Add injectable-errors option Ben Cheatham
@ 2025-04-24 21:24 ` Ben Cheatham
2025-04-29 2:35 ` [ndctl PATCH 0/6] Add error injection support Alison Schofield
6 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-04-24 21:24 UTC
To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield
Add the "inject-error" command that can be used to inject CXL protocol
errors into CXL downstream ports and poison into memory devices. The
available error types can be found by using 'cxl-list' with the
"-N"/"--injectable-errors" option.
The full list of supported device and error types can be found in the
command's documentation.
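The command takes the poison address as a string in hex or decimal. A hedged
sketch of one way to validate such a string is shown below; demo_parse_dpa()
is a hypothetical helper, not the code in this patch. It clears errno before
calling strtoull() so that a stale ERANGE is not misread, and it rejects empty
input and trailing garbage.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a hex ("0x...") or decimal address string.
 * Returns 0 and stores the value on success, -ERANGE on overflow,
 * -EINVAL on empty input or trailing characters.
 */
static int demo_parse_dpa(const char *s, unsigned long long *dpa)
{
	unsigned long long v;
	char *end;

	assert(s && dpa);
	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s || *end != '\0')
		return -EINVAL;
	*dpa = v;
	return 0;
}
```

With base 0, strtoull() also treats a leading "0" as octal, so "010" parses
as 8; a caller that wants strictly hex or decimal would need an extra check.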
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
Documentation/cxl/cxl-inject-error.txt | 139 ++++++++++++++++
Documentation/cxl/meson.build | 1 +
cxl/builtin.h | 1 +
cxl/cxl.c | 1 +
cxl/inject-error.c | 211 +++++++++++++++++++++++++
cxl/meson.build | 1 +
6 files changed, 354 insertions(+)
create mode 100644 Documentation/cxl/cxl-inject-error.txt
create mode 100644 cxl/inject-error.c
diff --git a/Documentation/cxl/cxl-inject-error.txt b/Documentation/cxl/cxl-inject-error.txt
new file mode 100644
index 0000000..50b25fe
--- /dev/null
+++ b/Documentation/cxl/cxl-inject-error.txt
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-inject-error(1)
+===================
+
+NAME
+----
+cxl-inject-error - Inject CXL errors into CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl inject-error' <device name> [<options>]
+
+Inject an error into a CXL device. The types of errors supported depend on the
+device specified. The types of devices supported are:
+
+"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
+Eligible CXL 2.0+ ports are dports of ports at depth 1 in the output of cxl-list.
+Dports are specified by host name ("0000:0e:01.1").
+"memdevs":: A CXL memory device. Memory devices are specified by device name
+("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
+
+There are two types of errors which can be injected: CXL protocol errors
+and device poison.
+
+CXL protocol errors can only be used with downstream ports (as defined above).
+Protocol errors follow the format of "<protocol>-<severity>". For example,
+a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
+found with the '-N' option of 'cxl-list' under a CXL bus object. For example:
+
+----
+
+# cxl list -NB
+[
+ {
+ "bus":"root0",
+ "provider":"ACPI.CXL",
+ "injectable_protocol_errors":[
+ "mem-correctable",
+ "mem-fatal"
+ ]
+ }
+]
+
+----
+
+CXL protocol (CXL.cache/mem) error injection requires the platform to support
+ACPI v6.5+ error injection (EINJ). In addition to platform support, the
+CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
+will need to be enabled. For more information, view the Linux kernel documentation
+on EINJ.
+
+Device poison can only be used with CXL memory devices. A device physical address
+(DPA) is required to do poison injection. DPAs range from 0 to the size of
+the device's memory, which can be found using 'cxl-list'. An example injection:
+
+----
+
+# cxl inject-error mem0 -t poison -a 0x1000
+poison injected at mem0:0x1000
+# cxl list -m mem0 -u --media-errors
+{
+ "memdev":"mem0",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0",
+ "host":"0000:0d:00.0",
+ "firmware_version":"BWFW VERSION 00",
+ "media_errors":[
+ {
+ "offset":"0x1000",
+ "length":64,
+ "source":"Injected"
+ }
+ ]
+}
+
+----
+
+Not all devices support poison injection. To see if a device supports poison injection
+through debugfs, use 'cxl-list' with the '-N' option and look for the "poison_injectable"
+attribute under the device. Example:
+
+----
+
+# cxl list -Nu -m mem0
+{
+ "memdev":"mem0",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0",
+ "host":"0000:0d:00.0",
+ "firmware_version":"BWFW VERSION 00",
+ "poison_injectable":true
+}
+
+----
+
+This command depends on the kernel debug filesystem (debugfs) to do CXL protocol
+error and device poison injection. If your kernel debugfs is not mounted at
+the normal spot (/sys/kernel/debug) you will need to provide the path for it
+using the '--debugfs' option.
+
+
+OPTIONS
+-------
+-a::
+--address::
+ Device physical address (DPA) to use for poison injection. Address can
+ be specified in hex or decimal. Required for poison injection.
+
+-t::
+--type::
+ Type of error to inject into <device name>. The type of error is restricted
+ by device type. The following shows the possible types under their associated
+ device type(s):
+----
+
+Downstream Ports: ::
+ cache-correctable, cache-uncorrectable, cache-fatal, mem-correctable,
+ mem-uncorrectable, mem-fatal
+
+Memdevs: ::
+ poison
+
+----
+
+--clear::
+ Clear poison previously injected into a device.
+
+--debug::
+ Enable debug output
+
+--debugfs::
+ The mount point of the Linux kernel debug filesystem (debugfs). Defaults
+ to "/sys/kernel/debug" if left unspecified.
+
+SEE ALSO
+--------
+linkcxl:cxl-list[1]
diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
index 8085c1c..1502d25 100644
--- a/Documentation/cxl/meson.build
+++ b/Documentation/cxl/meson.build
@@ -50,6 +50,7 @@ cxl_manpages = [
'cxl-update-firmware.txt',
'cxl-set-alert-config.txt',
'cxl-wait-sanitize.txt',
+ 'cxl-inject-error.txt',
]
foreach man : cxl_manpages
diff --git a/cxl/builtin.h b/cxl/builtin.h
index c483f30..e82fcb5 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
#ifdef ENABLE_LIBTRACEFS
int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
#else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index 1643667..a98bd6b 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
{ "disable-region", .c_fn = cmd_disable_region },
{ "destroy-region", .c_fn = cmd_destroy_region },
{ "monitor", .c_fn = cmd_monitor },
+ { "inject-error", .c_fn = cmd_inject_error },
};
int main(int argc, const char **argv)
diff --git a/cxl/inject-error.c b/cxl/inject-error.c
new file mode 100644
index 0000000..907bfc2
--- /dev/null
+++ b/cxl/inject-error.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 AMD. All rights reserved. */
+#include <util/parse-options.h>
+#include <cxl/libcxl.h>
+#include <cxl/filter.h>
+#include <util/log.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <errno.h>
+#include <limits.h>
+
+#define EINJ_TYPES_BUF_SIZE 512
+
+static const char *debugfs;
+static bool debug;
+
+static struct inject_params {
+ const char *type;
+ const char *address;
+ bool clear;
+} param;
+
+static const struct option inject_options[] = {
+ OPT_STRING('t', "type", &param.type, "Error type",
+ "Error type to inject into <device>"),
+ OPT_STRING('a', "address", &param.address, "Address for poison injection",
+ "Device physical address for poison injection in hex or decimal"),
+ OPT_BOOLEAN(0, "clear", &param.clear, "Clear poison instead of inject"),
+ OPT_STRING(0, "debugfs", &debugfs, "debugfs mount point",
+ "Mount point for debug file system, defaults to /sys/kernel/debug"),
+#ifdef ENABLE_DEBUG
+ OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
+#endif
+ OPT_END(),
+};
+
+static struct log_ctx iel;
+
+static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
+ const char *type)
+{
+ struct cxl_protocol_error *perror;
+
+ cxl_protocol_error_foreach(ctx, perror) {
+ if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
+ return perror;
+ }
+
+ log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
+ return NULL;
+}
+
+static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
+{
+ struct cxl_port *port, *top;
+ struct cxl_dport *dport;
+ struct cxl_bus *bus;
+
+ cxl_bus_foreach(ctx, bus) {
+ top = cxl_bus_get_port(bus);
+
+ cxl_port_foreach_all(top, port)
+ cxl_dport_foreach(port, dport)
+ if (!strcmp(devname,
+ cxl_dport_get_devname(dport)))
+ return dport;
+ }
+
+ log_err(&iel, "Downstream port \"%s\" not found\n", devname);
+ return NULL;
+}
+
+static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx, const char *filter)
+{
+ struct cxl_memdev *memdev;
+
+ cxl_memdev_foreach(ctx, memdev) {
+ if (util_cxl_memdev_filter(memdev, filter, NULL))
+ return memdev;
+ }
+
+ log_err(&iel, "Memdev \"%s\" not found\n", filter);
+ return NULL;
+}
+
+static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
+ struct cxl_protocol_error *perror)
+{
+ struct cxl_dport *dport;
+ int rc;
+
+ if (!devname) {
+ log_err(&iel, "No downstream port specified for injection\n");
+ return -EINVAL;
+ }
+
+ dport = find_cxl_dport(ctx, devname);
+ if (!dport)
+ return -ENODEV;
+
+ rc = cxl_dport_protocol_error_inject(dport,
+ cxl_protocol_error_get_num(perror));
+ if (rc)
+ return rc;
+
+ printf("injected %s protocol error.\n",
+ cxl_protocol_error_get_str(perror));
+ return 0;
+}
+
+static int inject_poison(struct cxl_ctx *ctx, const char *filter,
+ const char *addr, bool clear)
+{
+ struct cxl_memdev *memdev;
+ size_t a;
+ int rc;
+
+ memdev = find_cxl_memdev(ctx, filter);
+ if (!memdev)
+ return -ENODEV;
+
+ if (!cxl_memdev_has_poison_injection(memdev)) {
+ log_err(&iel, "%s does not support error injection\n",
+ cxl_memdev_get_devname(memdev));
+ return -EINVAL;
+ }
+
+ if (!addr) {
+ log_err(&iel, "no address provided\n");
+ return -EINVAL;
+ }
+
+ a = strtoull(addr, NULL, 0);
+ if (a == ULLONG_MAX && errno == ERANGE) {
+ log_err(&iel, "invalid address %s: %s\n", addr, strerror(EINVAL));
+ return -EINVAL;
+ }
+
+ if (clear)
+ rc = cxl_memdev_clear_poison(memdev, a);
+ else
+ rc = cxl_memdev_inject_poison(memdev, a);
+
+ if (rc) {
+ log_err(&iel, "failed to %s %s:%s: %s\n",
+ clear ? "clear poison at" : "inject poison at",
+ cxl_memdev_get_devname(memdev), addr, strerror(-rc));
+ } else {
+ printf("poison %s at %s:%s\n", clear ? "cleared" : "injected",
+ cxl_memdev_get_devname(memdev), addr);
+ }
+
+ return rc;
+}
+
+static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
+ const struct option *options, const char *usage)
+{
+ struct cxl_protocol_error *perr;
+ const char * const u[] = {
+ usage,
+ NULL
+ };
+ int rc = -EINVAL;
+
+ log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG");
+ argc = parse_options(argc, argv, options, u, 0);
+
+ if (debug) {
+ cxl_set_log_priority(ctx, LOG_DEBUG);
+ iel.log_priority = LOG_DEBUG;
+ } else {
+ iel.log_priority = LOG_INFO;
+ }
+
+ if (debugfs)
+ cxl_set_debugfs(ctx, debugfs);
+
+ if (argc != 1 || !param.type) {
+ usage_with_options(u, options);
+ return rc;
+ }
+
+ if (strcmp(param.type, "poison") == 0) {
+ rc = inject_poison(ctx, argv[0], param.address, param.clear);
+ if (rc)
+ log_err(&iel, "Failed to inject poison into %s: %s\n",
+ argv[0], strerror(-rc));
+
+ return rc;
+ }
+
+ perr = find_cxl_proto_err(ctx, param.type);
+ if (!perr)
+ return -EINVAL;
+
+ rc = inject_proto_err(ctx, argv[0], perr);
+ if (rc)
+ log_err(&iel, "Failed to inject error: %d\n", rc);
+
+ return rc;
+}
+
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+ int rc = inject_action(argc, argv, ctx, inject_options,
+ "inject-error <device> [<options>]");
+
+ return rc ? EXIT_FAILURE : EXIT_SUCCESS;
+}
diff --git a/cxl/meson.build b/cxl/meson.build
index e4d1683..29918e4 100644
--- a/cxl/meson.build
+++ b/cxl/meson.build
@@ -7,6 +7,7 @@ cxl_src = [
'memdev.c',
'json.c',
'filter.c',
+ 'inject-error.c',
'../daxctl/json.c',
'../daxctl/filter.c',
]
--
2.34.1
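
The address handling in inject_poison() converts the --address string with strtoull() and only checks for overflow. As a minimal sketch of a stricter conversion, the helper below also rejects empty strings and trailing characters. The name parse_dpa() is hypothetical and not part of this patch; it only illustrates the technique and assumes the address is passed as a NUL-terminated string in hex or decimal.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a device physical
 * address given in hex or decimal.
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE on overflow.
 */
static int parse_dpa(const char *str, unsigned long long *out)
{
	unsigned long long val;
	char *end;

	if (!str || *str == '\0')
		return -EINVAL;

	errno = 0;	/* strtoull() does not clear errno on success */
	val = strtoull(str, &end, 0);	/* base 0: accepts "0x..." and decimal */
	if (errno == ERANGE)
		return -ERANGE;
	if (end == str || *end != '\0')
		return -EINVAL;	/* no digits consumed, or trailing characters */

	*out = val;
	return 0;
}
```

A caller could then log the original string on failure and pass the parsed value on to the inject or clear call, keeping the conversion errors separate from the injection errors.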
* Re: [ndctl PATCH 0/6] Add error injection support
2025-04-24 21:23 [ndctl PATCH 0/6] Add error injection support Ben Cheatham
` (5 preceding siblings ...)
2025-04-24 21:24 ` [PATCH 6/6] cxl: Add inject-error command Ben Cheatham
@ 2025-04-29 2:35 ` Alison Schofield
2025-04-29 20:01 ` Ben Cheatham
6 siblings, 1 reply; 10+ messages in thread
From: Alison Schofield @ 2025-04-29 2:35 UTC (permalink / raw)
To: Ben Cheatham, Junhyeok Im; +Cc: nvdimm, linux-cxl
On Thu, Apr 24, 2025 at 04:23:55PM -0500, Ben Cheatham wrote:
> This series adds support for injecting CXL protocol (CXL.cache/mem)
> errors[1] into CXL RCH Downstream ports and VH root ports[2] and
> poison into CXL memory devices through the CXL debugfs. Errors are
> injected using a new 'inject-error' command, while errors are reported
> using a new cxl-list "-N"/"--injectable-errors" option.
>
> The 'inject-error' command and "-N" option of cxl-list both require
> access to the CXL driver's debugfs. Because the debugfs doesn't have a
> required mount point, a "--debugfs" option is added to both cxl-list and
> cxl-inject-error to specify the path to the debugfs if it isn't mounted
> to the usual place (/sys/kernel/debug).
>
> The documentation for the new cxl-inject-error command shows both usage
> and the possible device/error types, as well as how to retrieve them
> using cxl-list. The documentation for cxl-list has also been updated to
> show the usage of the new injectable errors and debugfs options.
>
> [1]: ACPI v6.5 spec, section 18.6.4
> [2]: ACPI v6.5 spec, table 18.31
Hi Ben,
Junhyeok Im posted a set for inject & clear poison back in 2023.[1] It
went through one round of review but was a bit ahead of its time as we
were still working out the presentation of media-errors in the trigger
poison patch set. I'll cc them here in case they have interest and can
help review this set.
How come you're not interested in implementing clear-poison?
[1] https://lore.kernel.org/linux-cxl/20230517032311.19923-1-junhyeok.im@samsung.com/
--Alison
>
> Ben Cheatham (6):
> libcxl: Add debugfs path to CXL context
> libcxl: Add CXL protocol errors
> libcxl: Add poison injection functions
> cxl/list: Add debugfs option
> cxl/list: Add injectable-errors option
> cxl: Add inject-error command
>
> Documentation/cxl/cxl-inject-error.txt | 139 +++++++++++++++
> Documentation/cxl/cxl-list.txt | 39 ++++-
> Documentation/cxl/meson.build | 1 +
> cxl/builtin.h | 1 +
> cxl/cxl.c | 1 +
> cxl/filter.h | 3 +
> cxl/inject-error.c | 211 +++++++++++++++++++++++
> cxl/json.c | 30 ++++
> cxl/lib/libcxl.c | 225 +++++++++++++++++++++++++
> cxl/lib/libcxl.sym | 13 ++
> cxl/lib/private.h | 14 ++
> cxl/libcxl.h | 17 ++
> cxl/list.c | 9 +
> cxl/meson.build | 1 +
> util/json.h | 1 +
> 15 files changed, 704 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/cxl/cxl-inject-error.txt
> create mode 100644 cxl/inject-error.c
>
> --
> 2.34.1
>
* Re: [ndctl PATCH 0/6] Add error injection support
2025-04-29 2:35 ` [ndctl PATCH 0/6] Add error injection support Alison Schofield
@ 2025-04-29 20:01 ` Ben Cheatham
2025-04-30 0:53 ` Alison Schofield
0 siblings, 1 reply; 10+ messages in thread
From: Ben Cheatham @ 2025-04-29 20:01 UTC (permalink / raw)
To: Alison Schofield, Junhyeok Im; +Cc: nvdimm, linux-cxl
On 4/28/25 9:35 PM, Alison Schofield wrote:
> On Thu, Apr 24, 2025 at 04:23:55PM -0500, Ben Cheatham wrote:
>> This series adds support for injecting CXL protocol (CXL.cache/mem)
>> errors[1] into CXL RCH Downstream ports and VH root ports[2] and
>> poison into CXL memory devices through the CXL debugfs. Errors are
>> injected using a new 'inject-error' command, while errors are reported
>> using a new cxl-list "-N"/"--injectable-errors" option.
>>
>> The 'inject-error' command and "-N" option of cxl-list both require
>> access to the CXL driver's debugfs. Because the debugfs doesn't have a
>> required mount point, a "--debugfs" option is added to both cxl-list and
>> cxl-inject-error to specify the path to the debugfs if it isn't mounted
>> to the usual place (/sys/kernel/debug).
>>
>> The documentation for the new cxl-inject-error command shows both usage
>> and the possible device/error types, as well as how to retrieve them
>> using cxl-list. The documentation for cxl-list has also been updated to
>> show the usage of the new injectable errors and debugfs options.
>>
>> [1]: ACPI v6.5 spec, section 18.6.4
>> [2]: ACPI v6.5 spec, table 18.31
>
> Hi Ben,
>
> Junhyeok Im posted a set for inject & clear poison back in 2023.[1] It
> went through one round of review but was a bit ahead of its time as we
> were still working out the presentation of media-errors in the trigger
> poison patch set. I'll cc them here in case they have interest and can
> help review this set.
Thanks for pointing this out. I forgot to look for an existing set before
implementing it myself, sorry about that :/.
I'd be willing to drop the poison support from this set and use Junhyeok's
instead, integrate it into this one, or leave it as-is.
>
> How come you're not interested in implementing clear-poison?
It is implemented; it's a flag ("--clear") for the inject-error command. I forgot
to mention it in the cover letter, so I'll add it in v2.
Thanks,
Ben
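
As a rough sketch of how a single "--clear" flag and the "--debugfs" override could together select the file to write, the helper below builds a path from the mount point, the memdev name, and the flag. The function name poison_attr_path() and the "<debugfs>/cxl/<memdev>/{inject,clear}_poison" layout are assumptions made for illustration; the series' library code may resolve the path differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define DEFAULT_DEBUGFS "/sys/kernel/debug"

/*
 * Hypothetical helper: build the path of the poison attribute for a memdev.
 * The directory layout used here is an assumption, not taken from the patch.
 * Returns 0 on success, -EINVAL on bad arguments, -ENAMETOOLONG if the
 * result does not fit in buf.
 */
static int poison_attr_path(char *buf, size_t len, const char *debugfs,
			    const char *memdev, bool clear)
{
	int n;

	if (!buf || !memdev)
		return -EINVAL;

	/* Fall back to the usual mount point when no override is given */
	n = snprintf(buf, len, "%s/cxl/%s/%s",
		     debugfs ? debugfs : DEFAULT_DEBUGFS, memdev,
		     clear ? "clear_poison" : "inject_poison");
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;

	return 0;
}
```

With this shape, splitting clear-poison into its own command would only change which caller passes clear=true; the path construction itself stays shared.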
* Re: [ndctl PATCH 0/6] Add error injection support
2025-04-29 20:01 ` Ben Cheatham
@ 2025-04-30 0:53 ` Alison Schofield
0 siblings, 0 replies; 10+ messages in thread
From: Alison Schofield @ 2025-04-30 0:53 UTC (permalink / raw)
To: Ben Cheatham; +Cc: Junhyeok Im, nvdimm, linux-cxl
On Tue, Apr 29, 2025 at 03:01:42PM -0500, Ben Cheatham wrote:
>
> On 4/28/25 9:35 PM, Alison Schofield wrote:
> > On Thu, Apr 24, 2025 at 04:23:55PM -0500, Ben Cheatham wrote:
> >> This series adds support for injecting CXL protocol (CXL.cache/mem)
> >> errors[1] into CXL RCH Downstream ports and VH root ports[2] and
> >> poison into CXL memory devices through the CXL debugfs. Errors are
> >> injected using a new 'inject-error' command, while errors are reported
> >> using a new cxl-list "-N"/"--injectable-errors" option.
> >>
> >> The 'inject-error' command and "-N" option of cxl-list both require
> >> access to the CXL driver's debugfs. Because the debugfs doesn't have a
> >> required mount point, a "--debugfs" option is added to both cxl-list and
> >> cxl-inject-error to specify the path to the debugfs if it isn't mounted
> >> to the usual place (/sys/kernel/debug).
> >>
> >> The documentation for the new cxl-inject-error command shows both usage
> >> and the possible device/error types, as well as how to retrieve them
> >> using cxl-list. The documentation for cxl-list has also been updated to
> >> show the usage of the new injectable errors and debugfs options.
> >>
> >> [1]: ACPI v6.5 spec, section 18.6.4
> >> [2]: ACPI v6.5 spec, table 18.31
> >
> > Hi Ben,
> >
> > Junhyeok Im posted a set for inject & clear poison back in 2023.[1] It
> > went through one round of review but was a bit ahead of its time as we
> > were still working out the presentation of media-errors in the trigger
> > poison patch set. I'll cc them here in case they have interest and can
> > help review this set.
>
> Thanks for pointing this out. I forgot to look for an existing set before
> implementing it myself, sorry about that :/.
>
> I'd be willing to drop the poison support from this set and use Junhyeok's
> instead, integrate it into this one, or leave it as-is.
I should have recalled this at RFC time. Anyway, compare the two sets and
select the best path forward.
>
> >
> > How come you're not interested in implementing clear-poison?
>
> It is implemented; it's a flag ("--clear") for the inject-error command. I forgot
> to mention it in the cover letter, so I'll add it in v2.
Ah, I haven't reviewed yet to see that. I'm going to ask for that to be
its own command. We may get into some naming brouhaha. You are using the
word 'error' for multiple types of errors and we used 'media-error' specifically
for device poison. I'll put more thought into it when I review in detail.
>
> Thanks,
> Ben