* [ndctl PATCH v6 0/7] Add error injection support
@ 2026-01-09 16:07 Ben Cheatham
2026-01-09 16:07 ` [PATCH 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
` (6 more replies)
0 siblings, 7 replies; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
v6 Changes:
- Rebase to pending branch (Alison)
- Drop const for ctx->debugfs (Alison)
- Rename get_debugfs_dir() to get_cxl_debugfs_dir() and return cxl directory in debugfs
(i.e. "/sys/kernel/debug" -> "/sys/kernel/debug/cxl")
- Rename ctx->debugfs to ctx->cxl_debugfs
- Fix missing free of einj path (Alison)
- Add protocol errors to the perrors list in order (Alison)
- Use hex constants instead of BIT() for protocol errors (Alison)
- Add symbols to LIBCXL_11 instead of LIBCXL_10 (Alison)
- Update commit message to reflect util_cxl_dport_filter() behavior (Alison)
- Remove EINJ_TYPES_BUF_SIZE #ifdef (Alison)
- Fix type mismatch of addr in poison_action() (Alison)
- Fix inject_action() to catch missing 'type' option (Alison)
- Remove '-N' option and show the information behind that option by default when
CXL debugfs is present (Alison)
- Add 'protocol_injectable' attribute for dports (Alison)
- Update inject-error man page with port injection example (Alison)
- Add warning to inject-error man page (Alison)
v5 Changes:
- Use setmntent()/getmntent() instead of open-coding getting the
debugfs path (Dave)
- Use correct return code for sysfs_read_attr() (Dave)
v4 Changes:
- Variable renames for clarity (Dave)
- Use errno instead of rc for access() calls (Dave)
- Check returns for snprintf() (Dave)
- Add util_cxl_dport_filter() (Dave)
- Replace printf() calls with log_info() (Dave)
- Write correct value to debugfs during protocol error injection
(BIT(error) vs. error)
v3 Changes:
- Rebase on v83 release
- Fix whitespace errors (Alison)
v2 Changes:
- Make the --clear option of 'inject-error' its own command (Alison)
- Debugfs is now found using the /proc/mounts entry instead of
providing the path using a --debugfs option
- Man page added for 'clear-error'
- Reword commit descriptions for clarity
This series adds support for injecting CXL protocol (CXL.cache/mem)
errors[1] into CXL RCH downstream ports and VH root ports[2], and for
injecting poison into CXL memory devices, through the CXL debugfs.
Errors are injected using the new 'inject-error' command, and device
poison can be cleared using the new 'clear-error' command. Both
commands require access to the CXL driver's debugfs.
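As a rough illustration of how the error type given to 'inject-error' is interpreted, the sketch below mirrors the dispatch in inject_action(): "poison" selects memdev poison injection, and any other string is looked up by name among the protocol error types. The table reuses the names and values from cxl_protocol_errors[] in patch 2/7; the enum and classify_inject_type() are hypothetical, and unlike the real command this sketch does not consult the platform's einj_types list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of classifying the inject-error -t argument */
enum inject_kind { INJECT_INVALID, INJECT_POISON, INJECT_PROTOCOL };

/* Names and EINJ type values copied from cxl_protocol_errors[] (patch 2/7) */
static const struct {
	unsigned int num;
	const char *name;
} proto_errors[] = {
	{ 0x1000, "cache-correctable" },
	{ 0x2000, "cache-uncorrectable" },
	{ 0x4000, "cache-fatal" },
	{ 0x8000, "mem-correctable" },
	{ 0x10000, "mem-uncorrectable" },
	{ 0x20000, "mem-fatal" },
};

/* Classify @type; for protocol errors, store the EINJ value in *num */
static enum inject_kind classify_inject_type(const char *type,
					     unsigned int *num)
{
	if (!type)
		return INJECT_INVALID;

	/* "poison" is handled separately, as in inject_action() */
	if (strcmp(type, "poison") == 0)
		return INJECT_POISON;

	for (size_t i = 0; i < sizeof(proto_errors) / sizeof(proto_errors[0]);
	     i++) {
		if (strcmp(type, proto_errors[i].name) == 0) {
			*num = proto_errors[i].num;
			return INJECT_PROTOCOL;
		}
	}

	return INJECT_INVALID;
}
```

In the series, a protocol error name is only accepted if it also appears in the list the kernel reports as injectable, so a static table like this one is a simplification.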
The documentation for the new cxl-inject-error command shows both usage
and the possible device/error types, as well as how to retrieve them
using cxl-list. cxl-list has been updated to include the possible error
types for protocol error injection (under the "bus" object) and which CXL
dports and memory devices support injection.
[1]: ACPI v6.5 spec, section 18.6.4
[2]: ACPI v6.5 spec, table 18.31
Ben Cheatham (7):
libcxl: Add debugfs path to CXL context
libcxl: Add CXL protocol errors
libcxl: Add poison injection support
cxl: Add inject-error command
cxl: Add clear-error command
cxl/list: Add injectable errors in output
Documentation: Add docs for inject/clear-error commands
Documentation/cxl/cxl-clear-error.txt | 69 ++++++
Documentation/cxl/cxl-inject-error.txt | 161 ++++++++++++
Documentation/cxl/meson.build | 2 +
cxl/builtin.h | 2 +
cxl/cxl.c | 2 +
cxl/filter.c | 26 ++
cxl/filter.h | 2 +
cxl/inject-error.c | 248 +++++++++++++++++++
cxl/json.c | 38 +++
cxl/lib/libcxl.c | 330 +++++++++++++++++++++++++
cxl/lib/libcxl.sym | 10 +
cxl/lib/private.h | 14 ++
cxl/libcxl.h | 18 ++
cxl/meson.build | 1 +
14 files changed, 923 insertions(+)
create mode 100644 Documentation/cxl/cxl-clear-error.txt
create mode 100644 Documentation/cxl/cxl-inject-error.txt
create mode 100644 cxl/inject-error.c
--
2.52.0
* [PATCH 1/7] libcxl: Add debugfs path to CXL context
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
@ 2026-01-09 16:07 ` Ben Cheatham
2026-01-09 17:43 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 2/7] libcxl: Add CXL protocol errors Ben Cheatham
` (5 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
Find the CXL debugfs mount point and add it to the CXL library context.
This will be used by the poison and protocol error library functions to
access the files the CXL driver exposes in debugfs.
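A stand-alone sketch of the lookup, assuming glibc's setmntent()/getmntent() and a mount table path supplied by the caller so it can be exercised against a test file. find_cxl_debugfs() is a hypothetical name, and it deliberately omits the access() existence check that get_cxl_debugfs_dir() performs.

```c
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd "<debugfs mount>/cxl" for the
 * first debugfs entry in the mount table at @mounts (normally
 * "/proc/mounts"), or NULL. The caller frees the result and may check
 * that the directory exists with access(path, F_OK).
 */
static char *find_cxl_debugfs(const char *mounts)
{
	static const char suffix[] = "/cxl";
	char *dir = NULL;
	struct mntent *ent;
	FILE *mntf;

	mntf = setmntent(mounts, "r");
	if (!mntf)
		return NULL;

	while ((ent = getmntent(mntf)) != NULL) {
		if (strcmp(ent->mnt_type, "debugfs") != 0)
			continue;

		/* sizeof(suffix) already counts the NUL terminator */
		dir = malloc(strlen(ent->mnt_dir) + sizeof(suffix));
		if (dir) {
			strcpy(dir, ent->mnt_dir);
			strcat(dir, suffix);
		}
		break; /* every exit path reaches endmntent() */
	}

	endmntent(mntf);
	return dir;
}
```

With debugfs mounted at /sys/kernel/debug, find_cxl_debugfs("/proc/mounts") would return "/sys/kernel/debug/cxl". Leaving the loop with break on allocation failure, instead of returning, keeps the mount table stream from leaking.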
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/lib/libcxl.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 32728de..6b7e92c 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -8,6 +8,8 @@
#include <stdlib.h>
#include <dirent.h>
#include <unistd.h>
+#include <mntent.h>
+#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
@@ -54,6 +56,7 @@ struct cxl_ctx {
struct kmod_ctx *kmod_ctx;
struct daxctl_ctx *daxctl_ctx;
void *private_data;
+ char *cxl_debugfs;
};
static void free_pmem(struct cxl_pmem *pmem)
@@ -240,6 +243,38 @@ CXL_EXPORT void *cxl_get_private_data(struct cxl_ctx *ctx)
return ctx->private_data;
}
+static char *get_cxl_debugfs_dir(void)
+{
+ char *debugfs_dir = NULL;
+ struct mntent *ent;
+ FILE *mntf;
+
+ mntf = setmntent("/proc/mounts", "r");
+ if (!mntf)
+ return NULL;
+
+ while ((ent = getmntent(mntf)) != NULL) {
+ if (!strcmp(ent->mnt_type, "debugfs")) {
+ /* Magic '5' here is length of "/cxl" + NUL terminator */
+ debugfs_dir = calloc(strlen(ent->mnt_dir) + 5, 1);
+ if (!debugfs_dir)
+ break; /* endmntent() below still closes mntf */
+
+ strcpy(debugfs_dir, ent->mnt_dir);
+ strcat(debugfs_dir, "/cxl");
+ if (access(debugfs_dir, F_OK) != 0) {
+ free(debugfs_dir);
+ debugfs_dir = NULL;
+ }
+
+ break;
+ }
+ }
+
+ endmntent(mntf);
+ return debugfs_dir;
+}
+
/**
* cxl_new - instantiate a new library context
* @ctx: context to establish
@@ -295,6 +330,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
c->udev = udev;
c->udev_queue = udev_queue;
c->timeout = 5000;
+ c->cxl_debugfs = get_cxl_debugfs_dir();
return 0;
@@ -350,6 +386,7 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
kmod_unref(ctx->kmod_ctx);
daxctl_unref(ctx->daxctl_ctx);
info(ctx, "context %p released\n", ctx);
+ free(ctx->cxl_debugfs);
free(ctx);
}
--
2.52.0
* [PATCH 2/7] libcxl: Add CXL protocol errors
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
2026-01-09 16:07 ` [PATCH 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
@ 2026-01-09 16:07 ` Ben Cheatham
2026-01-09 17:54 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 3/7] libcxl: Add poison injection support Ben Cheatham
` (4 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
injection for platforms that implement the error types as defined by
the ACPI v6.5+ specification. The kernel provides the interface for
injecting these errors under the CXL debugfs. The relevant files in
the interface are einj_types, which lists the CXL error types available
for injection, and einj_inject, which injects an error into a CXL VH
root port or CXL RCH downstream port.
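The einj_types parsing can be sketched as follows, assuming only the "<hex error number> <name>" per-line format described in the comment in cxl_add_protocol_errors(). parse_einj_types() is a hypothetical helper, and the input strings in the test are illustrative rather than captured kernel output.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: collect the hex error number at the start of each
 * line of @text into @out (at most @max entries) and return the count.
 * The name column after the number is ignored.
 */
static size_t parse_einj_types(const char *text, unsigned long *out,
			       size_t max)
{
	const char *line = text;
	size_t count = 0;

	while (*line && count < max) {
		const char *eol = strchr(line, '\n');
		char *end;
		unsigned long n = strtoul(line, &end, 16);

		/*
		 * Keep the value only if a number was parsed on this line;
		 * strtoul() skips leading whitespace, including '\n', so a
		 * blank line could otherwise pick up the next line's value.
		 */
		if (end != line && (!eol || end <= eol))
			out[count++] = n;

		if (!eol)
			break;
		line = eol + 1;
	}

	return count;
}
```

Each parsed value would then be matched against the known protocol error table, and unknown values skipped, as create_cxl_protocol_error() does.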
Add a library API to retrieve the CXL error types and inject them. This
API will be used in a later commit by the 'cxl-inject-error' and
'cxl-list' commands.
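A minimal sketch of the write side, assuming einj_inject accepts the "0x<type>\n" string that cxl_dport_protocol_error_inject() formats. write_einj_value() is hypothetical and uses plain open()/write() rather than libcxl's sysfs_write_attr(); the point it illustrates is returning a negative errno captured immediately after the failing call, before other calls can overwrite errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "0x<type>\n" to an einj_inject-style file.
 * Returns 0 on success or a negative errno.
 */
static int write_einj_value(const char *path, unsigned int type)
{
	char buf[32];
	ssize_t written;
	int len, fd, rc = 0;

	len = snprintf(buf, sizeof(buf), "0x%x\n", type);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -EINVAL;

	fd = open(path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -errno; /* e.g. -ENOENT if the attribute is missing */

	/* Write the whole value in a single write() call */
	written = write(fd, buf, (size_t)len);
	if (written < 0)
		rc = -errno;
	else if (written != len)
		rc = -EIO; /* short write */

	close(fd);
	return rc;
}
```

For example, write_einj_value("<cxl debugfs>/<dport>/einj_inject", 0x8000) would request a mem-correctable injection under these assumptions.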
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/lib/libcxl.c | 194 +++++++++++++++++++++++++++++++++++++++++++++
cxl/lib/libcxl.sym | 5 ++
cxl/lib/private.h | 14 ++++
cxl/libcxl.h | 13 +++
4 files changed, 226 insertions(+)
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 6b7e92c..27ff037 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -48,11 +48,13 @@ struct cxl_ctx {
void *userdata;
int memdevs_init;
int buses_init;
+ int perrors_init;
unsigned long timeout;
struct udev *udev;
struct udev_queue *udev_queue;
struct list_head memdevs;
struct list_head buses;
+ struct list_head perrors;
struct kmod_ctx *kmod_ctx;
struct daxctl_ctx *daxctl_ctx;
void *private_data;
@@ -207,6 +209,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
free(bus);
}
+static void free_protocol_error(struct cxl_protocol_error *perror,
+ struct list_head *head)
+{
+ if (head)
+ list_del_from(head, &perror->list);
+ free(perror);
+}
+
/**
* cxl_get_userdata - retrieve stored data pointer from library context
* @ctx: cxl library context
@@ -325,6 +335,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
*ctx = c;
list_head_init(&c->memdevs);
list_head_init(&c->buses);
+ list_head_init(&c->perrors);
c->kmod_ctx = kmod_ctx;
c->daxctl_ctx = daxctl_ctx;
c->udev = udev;
@@ -366,6 +377,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
*/
CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
{
+ struct cxl_protocol_error *perror, *_p;
struct cxl_memdev *memdev, *_d;
struct cxl_bus *bus, *_b;
@@ -381,6 +393,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
list_for_each_safe(&ctx->buses, bus, _b, port.list)
free_bus(bus, &ctx->buses);
+ list_for_each_safe(&ctx->perrors, perror, _p, list)
+ free_protocol_error(perror, &ctx->perrors);
+
udev_queue_unref(ctx->udev_queue);
udev_unref(ctx->udev);
kmod_unref(ctx->kmod_ctx);
@@ -3423,6 +3438,185 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
return port->decoders_committed;
}
+static const struct cxl_protocol_error cxl_protocol_errors[] = {
+ CXL_PROTOCOL_ERROR(0x1000, "cache-correctable"),
+ CXL_PROTOCOL_ERROR(0x2000, "cache-uncorrectable"),
+ CXL_PROTOCOL_ERROR(0x4000, "cache-fatal"),
+ CXL_PROTOCOL_ERROR(0x8000, "mem-correctable"),
+ CXL_PROTOCOL_ERROR(0x10000, "mem-uncorrectable"),
+ CXL_PROTOCOL_ERROR(0x20000, "mem-fatal")
+};
+
+static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
+ unsigned int n)
+{
+ struct cxl_protocol_error *perror;
+
+ for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
+ if (n != cxl_protocol_errors[i].num)
+ continue;
+
+ perror = calloc(1, sizeof(*perror));
+ if (!perror)
+ return NULL;
+
+ *perror = cxl_protocol_errors[i];
+ perror->ctx = ctx;
+ return perror;
+ }
+
+ return NULL;
+}
+
+static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
+{
+ struct cxl_protocol_error *perror;
+ char buf[SYSFS_ATTR_SIZE];
+ char *path, *num, *save;
+ size_t path_len, len;
+ unsigned long n;
+ int rc = 0;
+
+ if (!ctx->cxl_debugfs)
+ return;
+
+ path_len = strlen(ctx->cxl_debugfs) + 100;
+ path = calloc(1, path_len);
+ if (!path)
+ return;
+
+ len = snprintf(path, path_len, "%s/einj_types", ctx->cxl_debugfs);
+ if (len >= path_len) {
+ err(ctx, "Buffer too small\n");
+ goto err;
+ }
+
+ rc = access(path, F_OK);
+ if (rc) {
+ err(ctx, "failed to access %s: %s\n", path, strerror(errno));
+ goto err;
+ }
+
+ rc = sysfs_read_attr(ctx, path, buf);
+ if (rc) {
+ err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
+ goto err;
+ }
+
+ /*
+ * The format of the output of the einj_types attr is:
+ * <Error number in hex 1> <Error name 1>
+ * <Error number in hex 2> <Error name 2>
+ * ...
+ *
+ * We only need the number, so parse that and skip the rest of
+ * the line.
+ */
+ num = strtok_r(buf, " \n", &save);
+ while (num) {
+ n = strtoul(num, NULL, 16);
+ perror = create_cxl_protocol_error(ctx, n);
+ if (perror)
+ list_add_tail(&ctx->perrors, &perror->list);
+
+ num = strtok_r(NULL, "\n", &save);
+ if (!num)
+ break;
+
+ num = strtok_r(NULL, " \n", &save);
+ }
+
+err:
+ free(path);
+}
+
+static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
+{
+ if (ctx->perrors_init)
+ return;
+
+ ctx->perrors_init = 1;
+ cxl_add_protocol_errors(ctx);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_first(struct cxl_ctx *ctx)
+{
+ cxl_protocol_errors_init(ctx);
+
+ return list_top(&ctx->perrors, struct cxl_protocol_error, list);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
+{
+ struct cxl_ctx *ctx = perror->ctx;
+
+ return list_next(&ctx->perrors, perror, list);
+}
+
+CXL_EXPORT unsigned int
+cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
+{
+ return perror->num;
+}
+
+CXL_EXPORT const char *
+cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
+{
+ return perror->string;
+}
+
+CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+ unsigned int error)
+{
+ struct cxl_ctx *ctx = dport->port->ctx;
+ char buf[32] = { 0 };
+ size_t path_len, len;
+ char *path;
+ int rc;
+
+ if (!ctx->cxl_debugfs)
+ return -ENOENT;
+
+ path_len = strlen(ctx->cxl_debugfs) + 100;
+ path = calloc(path_len, sizeof(char));
+ if (!path)
+ return -ENOMEM;
+
+ len = snprintf(path, path_len, "%s/%s/einj_inject", ctx->cxl_debugfs,
+ cxl_dport_get_devname(dport));
+ if (len >= path_len) {
+ err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
+ free(path);
+ return -ENOMEM;
+ }
+
+ if (access(path, F_OK)) {
+ rc = -errno;
+ err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
+ free(path);
+ return rc;
+ }
+
+ len = snprintf(buf, sizeof(buf), "0x%x\n", error);
+ if (len >= sizeof(buf)) {
+ err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
+ free(path);
+ return -ENOMEM;
+ }
+
+ rc = sysfs_write_attr(ctx, path, buf);
+ if (rc) {
+ err(ctx, "failed to write %s: %s\n", path, strerror(-rc));
+ free(path);
+ return rc;
+ }
+
+ free(path);
+ return 0;
+}
+
static void *add_cxl_bus(void *parent, int id, const char *cxlbus_base)
{
const char *devname = devpath_to_devname(cxlbus_base);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 36a93c3..c683b83 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -304,4 +304,9 @@ global:
LIBCXL_11 {
global:
cxl_region_get_extended_linear_cache_size;
+ cxl_protocol_error_get_first;
+ cxl_protocol_error_get_next;
+ cxl_protocol_error_get_num;
+ cxl_protocol_error_get_str;
+ cxl_dport_protocol_error_inject;
} LIBCXL_10;
diff --git a/cxl/lib/private.h b/cxl/lib/private.h
index 542cdb7..582eebf 100644
--- a/cxl/lib/private.h
+++ b/cxl/lib/private.h
@@ -108,6 +108,20 @@ struct cxl_port {
struct list_head dports;
};
+struct cxl_protocol_error {
+ unsigned int num;
+ const char *string;
+ struct cxl_ctx *ctx;
+ struct list_node list;
+};
+
+#define CXL_PROTOCOL_ERROR(n, str) \
+ ((struct cxl_protocol_error){ \
+ .num = (n), \
+ .string = (str), \
+ .ctx = NULL, \
+ })
+
struct cxl_bus {
struct cxl_port port;
};
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 9371aac..faef62e 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -498,6 +498,19 @@ int cxl_cmd_alert_config_set_enable_alert_actions(struct cxl_cmd *cmd,
int enable);
struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memdev);
+struct cxl_protocol_error;
+struct cxl_protocol_error *cxl_protocol_error_get_first(struct cxl_ctx *ctx);
+struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
+unsigned int cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
+const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
+int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+ unsigned int error);
+
+#define cxl_protocol_error_foreach(ctx, perror) \
+ for (perror = cxl_protocol_error_get_first(ctx); perror != NULL; \
+ perror = cxl_protocol_error_get_next(perror))
+
#ifdef __cplusplus
} /* extern "C" */
#endif
--
2.52.0
* [PATCH 3/7] libcxl: Add poison injection support
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
2026-01-09 16:07 ` [PATCH 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
2026-01-09 16:07 ` [PATCH 2/7] libcxl: Add CXL protocol errors Ben Cheatham
@ 2026-01-09 16:07 ` Ben Cheatham
2026-01-09 18:03 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 4/7] cxl: Add inject-error command Ben Cheatham
` (3 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
Add a library API for clearing and injecting poison into a CXL memory
device through the CXL debugfs.
This API will be used by the 'cxl-inject-error' and 'cxl-clear-error'
commands in later commits.
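The path and value handling for poison injection and clearing can be sketched as below. format_poison_write() is a hypothetical helper; it assumes the inject_poison/clear_poison file names used in cxl_memdev_poison_action() and formats the DPA as a uint64_t with PRIx64, so the conversion specifier always matches the argument type regardless of the width of size_t.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<debugfs>/<memdev>/{inject,clear}_poison"
 * in @path and the "0x<dpa>\n" value in @val.
 * Returns 0, or -1 if either buffer is too small.
 */
static int format_poison_write(char *path, size_t path_len, char *val,
			       size_t val_len, const char *debugfs,
			       const char *memdev, uint64_t dpa, bool clear)
{
	int len;

	len = snprintf(path, path_len, "%s/%s/%s", debugfs, memdev,
		       clear ? "clear_poison" : "inject_poison");
	if (len < 0 || (size_t)len >= path_len)
		return -1;

	/* PRIx64 matches uint64_t on both 32- and 64-bit builds */
	len = snprintf(val, val_len, "0x%" PRIx64 "\n", dpa);
	if (len < 0 || (size_t)len >= val_len)
		return -1;

	return 0;
}
```

The memdev name "mem0" and the debugfs location used in the test are illustrative only.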
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/lib/libcxl.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++
cxl/lib/libcxl.sym | 3 ++
cxl/libcxl.h | 3 ++
3 files changed, 89 insertions(+)
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 27ff037..deebf7f 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -5046,3 +5046,86 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
{
return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
}
+
+CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
+{
+ struct cxl_ctx *ctx = memdev->ctx;
+ size_t path_len, len;
+ bool exists = true;
+ char *path;
+ int rc;
+
+ if (!ctx->cxl_debugfs)
+ return false;
+
+ path_len = strlen(ctx->cxl_debugfs) + 100;
+ path = calloc(path_len, sizeof(char));
+ if (!path)
+ return false;
+
+ len = snprintf(path, path_len, "%s/%s/inject_poison", ctx->cxl_debugfs,
+ cxl_memdev_get_devname(memdev));
+ if (len >= path_len) {
+ err(ctx, "%s: buffer too small\n",
+ cxl_memdev_get_devname(memdev));
+ free(path);
+ return false;
+ }
+
+ rc = access(path, F_OK);
+ if (rc)
+ exists = false;
+
+ free(path);
+ return exists;
+}
+
+static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
+ bool clear)
+{
+ struct cxl_ctx *ctx = memdev->ctx;
+ size_t path_len, len;
+ char addr[32];
+ char *path;
+ int rc;
+
+ if (!ctx->cxl_debugfs)
+ return -ENOENT;
+
+ path_len = strlen(ctx->cxl_debugfs) + 100;
+ path = calloc(path_len, sizeof(char));
+ if (!path)
+ return -ENOMEM;
+
+ len = snprintf(path, path_len, "%s/%s/%s", ctx->cxl_debugfs,
+ cxl_memdev_get_devname(memdev),
+ clear ? "clear_poison" : "inject_poison");
+ if (len >= path_len) {
+ err(ctx, "%s: buffer too small\n",
+ cxl_memdev_get_devname(memdev));
+ free(path);
+ return -ENOMEM;
+ }
+
+ len = snprintf(addr, sizeof(addr), "0x%zx\n", dpa);
+ if (len >= sizeof(addr)) {
+ err(ctx, "%s: buffer too small\n",
+ cxl_memdev_get_devname(memdev));
+ free(path);
+ return -ENOMEM;
+ }
+
+ rc = sysfs_write_attr(ctx, path, addr);
+ free(path);
+ return rc;
+}
+
+CXL_EXPORT int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa)
+{
+ return cxl_memdev_poison_action(memdev, dpa, false);
+}
+
+CXL_EXPORT int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa)
+{
+ return cxl_memdev_poison_action(memdev, dpa, true);
+}
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index c683b83..c636edb 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -309,4 +309,7 @@ global:
cxl_protocol_error_get_num;
cxl_protocol_error_get_str;
cxl_dport_protocol_error_inject;
+ cxl_memdev_has_poison_injection;
+ cxl_memdev_inject_poison;
+ cxl_memdev_clear_poison;
} LIBCXL_10;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index faef62e..4d035f0 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -105,6 +105,9 @@ int cxl_memdev_read_label(struct cxl_memdev *memdev, void *buf, size_t length,
size_t offset);
int cxl_memdev_write_label(struct cxl_memdev *memdev, void *buf, size_t length,
size_t offset);
+bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev);
+int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa);
+int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa);
struct cxl_cmd *cxl_cmd_new_get_fw_info(struct cxl_memdev *memdev);
unsigned int cxl_cmd_fw_info_get_num_slots(struct cxl_cmd *cmd);
unsigned int cxl_cmd_fw_info_get_active_slot(struct cxl_cmd *cmd);
--
2.52.0
* [PATCH 4/7] cxl: Add inject-error command
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
` (2 preceding siblings ...)
2026-01-09 16:07 ` [PATCH 3/7] libcxl: Add poison injection support Ben Cheatham
@ 2026-01-09 16:07 ` Ben Cheatham
2026-01-09 21:53 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 5/7] cxl: Add clear-error command Ben Cheatham
` (2 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
Add the 'cxl-inject-error' command. This command provides CXL protocol
error injection for CXL VH root ports and CXL RCH downstream ports, as
well as poison injection for CXL memory devices.
Add util_cxl_dport_filter() to find downstream ports by device name.
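A sketch of the name matching that util_cxl_dport_filter() performs, assuming a filter string listing one or more dport device names. dport_name_matches() is hypothetical, the PCI-style names in the test are only illustrative, and splitting on spaces and commas at the same time is a simplification of the which_sep() separator selection used in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return true if @devname appears in @filter, a list
 * of device names separated by spaces or commas. A NULL filter matches
 * every device, as util_cxl_dport_filter() does.
 */
static bool dport_name_matches(const char *devname, const char *filter)
{
	char *copy, *tok, *save;
	bool found = false;

	if (!filter)
		return true;

	/* strtok_r() modifies its input, so work on a private copy */
	copy = strdup(filter);
	if (!copy)
		return false;

	for (tok = strtok_r(copy, " ,", &save); tok;
	     tok = strtok_r(NULL, " ,", &save)) {
		if (strcmp(tok, devname) == 0) {
			found = true;
			break;
		}
	}

	free(copy);
	return found;
}
```

Matching whole tokens with strcmp() means a partial name such as "0000:0c:00" does not match "0000:0c:00.0".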
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/builtin.h | 1 +
cxl/cxl.c | 1 +
cxl/filter.c | 26 +++++++
cxl/filter.h | 2 +
cxl/inject-error.c | 188 +++++++++++++++++++++++++++++++++++++++++++++
cxl/meson.build | 1 +
6 files changed, 219 insertions(+)
create mode 100644 cxl/inject-error.c
diff --git a/cxl/builtin.h b/cxl/builtin.h
index c483f30..e82fcb5 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
#ifdef ENABLE_LIBTRACEFS
int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
#else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index 1643667..a98bd6b 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
{ "disable-region", .c_fn = cmd_disable_region },
{ "destroy-region", .c_fn = cmd_destroy_region },
{ "monitor", .c_fn = cmd_monitor },
+ { "inject-error", .c_fn = cmd_inject_error },
};
int main(int argc, const char **argv)
diff --git a/cxl/filter.c b/cxl/filter.c
index b135c04..8c7dc6e 100644
--- a/cxl/filter.c
+++ b/cxl/filter.c
@@ -171,6 +171,32 @@ util_cxl_endpoint_filter_by_port(struct cxl_endpoint *endpoint,
return NULL;
}
+struct cxl_dport *util_cxl_dport_filter(struct cxl_dport *dport,
+ const char *__ident)
+{
+
+ char *ident, *save;
+ const char *arg;
+
+ if (!__ident)
+ return dport;
+
+ ident = strdup(__ident);
+ if (!ident)
+ return NULL;
+
+ for (arg = strtok_r(ident, which_sep(__ident), &save); arg;
+ arg = strtok_r(NULL, which_sep(__ident), &save)) {
+ if (strcmp(arg, cxl_dport_get_devname(dport)) == 0)
+ break;
+ }
+
+ free(ident);
+ if (arg)
+ return dport;
+ return NULL;
+}
+
static struct cxl_decoder *
util_cxl_decoder_filter_by_port(struct cxl_decoder *decoder, const char *ident,
enum cxl_port_filter_mode mode)
diff --git a/cxl/filter.h b/cxl/filter.h
index 956a46e..70463c4 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -55,6 +55,8 @@ enum cxl_port_filter_mode {
struct cxl_port *util_cxl_port_filter(struct cxl_port *port, const char *ident,
enum cxl_port_filter_mode mode);
+struct cxl_dport *util_cxl_dport_filter(struct cxl_dport *dport,
+ const char *__ident);
struct cxl_bus *util_cxl_bus_filter(struct cxl_bus *bus, const char *__ident);
struct cxl_endpoint *util_cxl_endpoint_filter(struct cxl_endpoint *endpoint,
const char *__ident);
diff --git a/cxl/inject-error.c b/cxl/inject-error.c
new file mode 100644
index 0000000..0ca2e6b
--- /dev/null
+++ b/cxl/inject-error.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 AMD. All rights reserved. */
+#include <util/parse-options.h>
+#include <cxl/libcxl.h>
+#include <cxl/filter.h>
+#include <util/log.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <errno.h>
+#include <limits.h>
+
+static bool debug;
+
+static struct inject_params {
+ const char *type;
+ const char *address;
+} inj_param;
+
+static const struct option inject_options[] = {
+ OPT_STRING('t', "type", &inj_param.type, "Error type",
+ "Error type to inject into <device>"),
+ OPT_STRING('a', "address", &inj_param.address, "Address for poison injection",
+ "Device physical address for poison injection in hex or decimal"),
+#ifdef ENABLE_DEBUG
+ OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
+#endif
+ OPT_END(),
+};
+
+static struct log_ctx iel;
+
+static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
+ const char *type)
+{
+ struct cxl_protocol_error *perror;
+
+ cxl_protocol_error_foreach(ctx, perror) {
+ if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
+ return perror;
+ }
+
+ log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
+ return NULL;
+}
+
+static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
+{
+ struct cxl_dport *dport;
+ struct cxl_port *port;
+ struct cxl_bus *bus;
+
+ cxl_bus_foreach(ctx, bus)
+ cxl_port_foreach_all(cxl_bus_get_port(bus), port)
+ cxl_dport_foreach(port, dport)
+ if (util_cxl_dport_filter(dport, devname))
+ return dport;
+
+ log_err(&iel, "Downstream port \"%s\" not found\n", devname);
+ return NULL;
+}
+
+static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx,
+ const char *filter)
+{
+ struct cxl_memdev *memdev;
+
+ cxl_memdev_foreach(ctx, memdev) {
+ if (util_cxl_memdev_filter(memdev, filter, NULL))
+ return memdev;
+ }
+
+ log_err(&iel, "Memdev \"%s\" not found\n", filter);
+ return NULL;
+}
+
+static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
+ struct cxl_protocol_error *perror)
+{
+ struct cxl_dport *dport;
+ int rc;
+
+ if (!devname) {
+ log_err(&iel, "No downstream port specified for injection\n");
+ return -EINVAL;
+ }
+
+ dport = find_cxl_dport(ctx, devname);
+ if (!dport)
+ return -ENODEV;
+
+ rc = cxl_dport_protocol_error_inject(dport,
+ cxl_protocol_error_get_num(perror));
+ if (rc)
+ return rc;
+
+ log_info(&iel, "injected %s protocol error.\n",
+ cxl_protocol_error_get_str(perror));
+ return 0;
+}
+
+static int poison_action(struct cxl_ctx *ctx, const char *filter,
+ const char *addr_str)
+{
+ struct cxl_memdev *memdev;
+ unsigned long long addr;
+ int rc;
+
+ memdev = find_cxl_memdev(ctx, filter);
+ if (!memdev)
+ return -ENODEV;
+
+ if (!cxl_memdev_has_poison_injection(memdev)) {
+ log_err(&iel, "%s does not support poison injection\n",
+ cxl_memdev_get_devname(memdev));
+ return -EINVAL;
+ }
+
+ if (!addr_str) {
+ log_err(&iel, "no address provided\n");
+ return -EINVAL;
+ }
+
+ errno = 0;
+ addr = strtoull(addr_str, NULL, 0);
+ if (addr == ULLONG_MAX && errno == ERANGE) {
+ log_err(&iel, "invalid address %s\n", addr_str);
+ return -EINVAL;
+ }
+
+ rc = cxl_memdev_inject_poison(memdev, addr);
+ if (rc)
+ log_err(&iel, "failed to inject poison at %s:%s: %s\n",
+ cxl_memdev_get_devname(memdev), addr_str, strerror(-rc));
+ else
+ log_info(&iel, "poison injected at %s:%s\n",
+ cxl_memdev_get_devname(memdev), addr_str);
+
+ return rc;
+}
+
+static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
+ const struct option *options, const char *usage)
+{
+ struct cxl_protocol_error *perr;
+ const char * const u[] = {
+ usage,
+ NULL
+ };
+ int rc = -EINVAL;
+
+ log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG");
+ argc = parse_options(argc, argv, options, u, 0);
+
+ if (debug) {
+ cxl_set_log_priority(ctx, LOG_DEBUG);
+ iel.log_priority = LOG_DEBUG;
+ } else {
+ iel.log_priority = LOG_INFO;
+ }
+
+ if (argc != 1 || inj_param.type == NULL) {
+ usage_with_options(u, options);
+ return rc;
+ }
+
+ if (strcmp(inj_param.type, "poison") == 0) {
+ rc = poison_action(ctx, argv[0], inj_param.address);
+ return rc;
+ }
+
+ perr = find_cxl_proto_err(ctx, inj_param.type);
+ if (perr) {
+ rc = inject_proto_err(ctx, argv[0], perr);
+ if (rc)
+ log_err(&iel, "Failed to inject error: %s\n", strerror(-rc));
+ }
+
+ return rc;
+}
+
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+ int rc = inject_action(argc, argv, ctx, inject_options,
+ "inject-error <device> -t <type> [<options>]");
+
+ return rc ? EXIT_FAILURE : EXIT_SUCCESS;
+}
diff --git a/cxl/meson.build b/cxl/meson.build
index b9924ae..92031b5 100644
--- a/cxl/meson.build
+++ b/cxl/meson.build
@@ -7,6 +7,7 @@ cxl_src = [
'memdev.c',
'json.c',
'filter.c',
+ 'inject-error.c',
'../daxctl/json.c',
'../daxctl/filter.c',
]
--
2.52.0
* [PATCH 5/7] cxl: Add clear-error command
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
` (3 preceding siblings ...)
2026-01-09 16:07 ` [PATCH 4/7] cxl: Add inject-error command Ben Cheatham
@ 2026-01-09 16:07 ` Ben Cheatham
2026-01-09 22:12 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 6/7] cxl/list: Add injectable errors in output Ben Cheatham
2026-01-09 16:07 ` [PATCH 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
6 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
Add the 'cxl-clear-error' command. This command allows the user to clear
device poison from CXL memory devices.
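Both commands take --address as hex or decimal text, and poison_action() only rejects out-of-range values, so a stricter parse may be worth considering. The sketch below is hypothetical (parse_dpa() is not part of the series): it rejects empty input, trailing characters, negative values, and values outside the range of unsigned long long. Because it uses base 0, like the patch, a leading zero selects octal, so "010" parses as 8.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a device physical address such as "0x1000"
 * or "4096" into *out. Returns false for empty, malformed, negative, or
 * out-of-range input.
 */
static bool parse_dpa(const char *str, unsigned long long *out)
{
	unsigned long long val;
	char *end;

	/* strtoull() would silently negate "-1" into a huge value */
	if (!str || !*str || strchr(str, '-'))
		return false;

	errno = 0;
	val = strtoull(str, &end, 0);
	if (errno == ERANGE || end == str || *end != '\0')
		return false;

	*out = val;
	return true;
}
```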
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/builtin.h | 1 +
cxl/cxl.c | 1 +
cxl/inject-error.c | 70 ++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 67 insertions(+), 5 deletions(-)
diff --git a/cxl/builtin.h b/cxl/builtin.h
index e82fcb5..68ed1de 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -26,6 +26,7 @@ int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx);
#ifdef ENABLE_LIBTRACEFS
int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
#else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index a98bd6b..e1740b5 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -81,6 +81,7 @@ static struct cmd_struct commands[] = {
{ "destroy-region", .c_fn = cmd_destroy_region },
{ "monitor", .c_fn = cmd_monitor },
{ "inject-error", .c_fn = cmd_inject_error },
+ { "clear-error", .c_fn = cmd_clear_error },
};
int main(int argc, const char **argv)
diff --git a/cxl/inject-error.c b/cxl/inject-error.c
index 0ca2e6b..76f9fa9 100644
--- a/cxl/inject-error.c
+++ b/cxl/inject-error.c
@@ -17,6 +17,10 @@ static struct inject_params {
const char *address;
} inj_param;
+static struct clear_params {
+ const char *address;
+} clear_param;
+
static const struct option inject_options[] = {
OPT_STRING('t', "type", &inj_param.type, "Error type",
"Error type to inject into <device>"),
@@ -28,6 +32,15 @@ static const struct option inject_options[] = {
OPT_END(),
};
+static const struct option clear_options[] = {
+ OPT_STRING('a', "address", &clear_param.address, "Address for poison clearing",
+ "Device physical address to clear poison from in hex or decimal"),
+#ifdef ENABLE_DEBUG
+ OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
+#endif
+ OPT_END(),
+};
+
static struct log_ctx iel;
static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
@@ -100,7 +113,7 @@ static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
}
static int poison_action(struct cxl_ctx *ctx, const char *filter,
- const char *addr_str)
+ const char *addr_str, bool clear)
{
struct cxl_memdev *memdev;
unsigned long long addr;
@@ -128,12 +141,18 @@ static int poison_action(struct cxl_ctx *ctx, const char *filter,
return -EINVAL;
}
- rc = cxl_memdev_inject_poison(memdev, addr);
+ if (clear)
+ rc = cxl_memdev_clear_poison(memdev, addr);
+ else
+ rc = cxl_memdev_inject_poison(memdev, addr);
+
if (rc)
- log_err(&iel, "failed to inject poison at %s:%s: %s\n",
+ log_err(&iel, "failed to %s %s:%s: %s\n",
+ clear ? "clear poison at" : "inject poison at",
cxl_memdev_get_devname(memdev), addr_str, strerror(-rc));
else
- log_info(&iel, "poison injected at %s:%s\n",
+ log_info(&iel,
+ "poison %s at %s:%s\n", clear ? "cleared" : "injected",
cxl_memdev_get_devname(memdev), addr_str);
return rc;
@@ -165,7 +184,7 @@ static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
}
if (strcmp(inj_param.type, "poison") == 0) {
- rc = poison_action(ctx, argv[0], inj_param.address);
+ rc = poison_action(ctx, argv[0], inj_param.address, false);
return rc;
}
@@ -186,3 +205,44 @@ int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
return rc ? EXIT_FAILURE : EXIT_SUCCESS;
}
+
+static int clear_action(int argc, const char **argv, struct cxl_ctx *ctx,
+ const struct option *options, const char *usage)
+{
+ const char * const u[] = {
+ usage,
+ NULL
+ };
+ int rc = -EINVAL;
+
+ log_init(&iel, "cxl clear-error", "CXL_CLEAR_LOG");
+ argc = parse_options(argc, argv, options, u, 0);
+
+ if (debug) {
+ cxl_set_log_priority(ctx, LOG_DEBUG);
+ iel.log_priority = LOG_DEBUG;
+ } else {
+ iel.log_priority = LOG_INFO;
+ }
+
+ if (argc != 1) {
+ usage_with_options(u, options);
+ return rc;
+ }
+
+ rc = poison_action(ctx, argv[0], clear_param.address, true);
+ if (rc) {
+ log_err(&iel, "Failed to clear poison on %s: %s\n",
+ argv[0], strerror(-rc));
+ return rc;
+ }
+
+ return rc;
+}
+
+int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+ int rc = clear_action(argc, argv, ctx, clear_options,
+ "clear-error <device> [<options>]");
+ return rc ? EXIT_FAILURE : EXIT_SUCCESS;
+}
--
2.52.0
* [PATCH 6/7] cxl/list: Add injectable errors in output
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
` (4 preceding siblings ...)
2026-01-09 16:07 ` [PATCH 5/7] cxl: Add clear-error command Ben Cheatham
@ 2026-01-09 16:07 ` Ben Cheatham
2026-01-09 22:17 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
6 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
Add injectable error information for CXL memory devices and busses.
This information is only shown when the CXL debugfs is accessible
(normally mounted at /sys/kernel/debug/cxl).
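For CXL memory devices this reports whether the device supports poison
injection ("poison_injectable"), and the "--media-errors"/"-L" option
shows injected poison. For dports this reports whether CXL protocol
errors can be injected through the port ("protocol_injectable").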
For CXL busses this shows injectable CXL protocol error types. The
information will be the same across busses because the error types are
system-wide. The information is presented under the bus for easier
filtering.
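A minimal standalone sketch of the check behind the dport "protocol_injectable" attribute, assuming the per-dport layout <cxl debugfs>/<dport name>/einj_inject that cxl_dport_get_einj_path() builds. build_einj_path() and dport_protocol_injectable() are hypothetical names, not libcxl API. The sketch formats into a PATH_MAX buffer and returns early when no CXL debugfs directory is known, so it never dereferences a NULL debugfs path:

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback if <limits.h> does not provide it */
#endif

/*
 * Hypothetical helper: build "<cxl_debugfs>/<dport>/einj_inject" into buf.
 * Returns 0 on success, -ENOENT when no CXL debugfs directory is known,
 * or -ENAMETOOLONG if the path does not fit in buf.
 */
static int build_einj_path(const char *cxl_debugfs, const char *dport,
			   char *buf, size_t len)
{
	int n;

	/* Guard before touching the string: debugfs may not be mounted. */
	if (!cxl_debugfs)
		return -ENOENT;

	n = snprintf(buf, len, "%s/%s/einj_inject", cxl_debugfs, dport);
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/* Hypothetical: true only if the dport's einj_inject file exists. */
static bool dport_protocol_injectable(const char *cxl_debugfs,
				      const char *dport)
{
	char path[PATH_MAX];

	if (build_einj_path(cxl_debugfs, dport, path, sizeof(path)))
		return false;
	return access(path, F_OK) == 0;
}
```

Returning false (rather than logging an error) for a missing einj_inject file keeps a listing of many dports quiet when only some of them support injection.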
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/json.c | 38 ++++++++++++++++++++++++++++++++++++++
cxl/lib/libcxl.c | 34 +++++++++++++++++++++++++---------
cxl/lib/libcxl.sym | 2 ++
cxl/libcxl.h | 2 ++
4 files changed, 67 insertions(+), 9 deletions(-)
diff --git a/cxl/json.c b/cxl/json.c
index e9cb88a..6cdf513 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -663,6 +663,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
json_object_object_add(jdev, "state", jobj);
}
+ if (cxl_debugfs_exists(cxl_memdev_get_ctx(memdev))) {
+ jobj = json_object_new_boolean(cxl_memdev_has_poison_injection(memdev));
+ if (jobj)
+ json_object_object_add(jdev, "poison_injectable", jobj);
+ }
+
if (flags & UTIL_JSON_PARTITION) {
jobj = util_cxl_memdev_partition_to_json(memdev, flags);
if (jobj)
@@ -691,6 +697,7 @@ void util_cxl_dports_append_json(struct json_object *jport,
{
struct json_object *jobj, *jdports;
struct cxl_dport *dport;
+ char *einj_path;
int val;
val = cxl_port_get_nr_dports(port);
@@ -739,6 +746,13 @@ void util_cxl_dports_append_json(struct json_object *jport,
if (jobj)
json_object_object_add(jdport, "id", jobj);
+ einj_path = cxl_dport_get_einj_path(dport);
+ jobj = json_object_new_boolean(einj_path != NULL);
+ if (jobj)
+ json_object_object_add(jdport, "protocol_injectable",
+ jobj);
+ free(einj_path);
+
json_object_array_add(jdports, jdport);
json_object_set_userdata(jdport, dport, NULL);
}
@@ -750,6 +764,8 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
unsigned long flags)
{
const char *devname = cxl_bus_get_devname(bus);
+ struct cxl_ctx *ctx = cxl_bus_get_ctx(bus);
+ struct cxl_protocol_error *perror;
struct json_object *jbus, *jobj;
jbus = json_object_new_object();
@@ -765,6 +781,28 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
json_object_object_add(jbus, "provider", jobj);
json_object_set_userdata(jbus, bus, NULL);
+
+ if (cxl_debugfs_exists(ctx)) {
+ jobj = json_object_new_array();
+ if (!jobj)
+ return jbus;
+
+ cxl_protocol_error_foreach(ctx, perror)
+ {
+ struct json_object *jerr_str;
+ const char *perror_str;
+
+ perror_str = cxl_protocol_error_get_str(perror);
+
+ jerr_str = json_object_new_string(perror_str);
+ if (jerr_str)
+ json_object_array_add(jobj, jerr_str);
+ }
+
+ json_object_object_add(jbus, "injectable_protocol_errors",
+ jobj);
+ }
+
return jbus;
}
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index deebf7f..f824701 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -285,6 +285,11 @@ static char* get_cxl_debugfs_dir(void)
return debugfs_dir;
}
+CXL_EXPORT bool cxl_debugfs_exists(struct cxl_ctx *ctx)
+{
+ return ctx->cxl_debugfs != NULL;
+}
+
/**
* cxl_new - instantiate a new library context
* @ctx: context to establish
@@ -3567,38 +3572,49 @@ cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
return perror->string;
}
-CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
- unsigned int error)
+CXL_EXPORT char *cxl_dport_get_einj_path(struct cxl_dport *dport)
{
struct cxl_ctx *ctx = dport->port->ctx;
- char buf[32] = { 0 };
size_t path_len, len;
char *path;
int rc;
- if (!ctx->cxl_debugfs)
- return -ENOENT;
-
path_len = strlen(ctx->cxl_debugfs) + 100;
path = calloc(path_len, sizeof(char));
if (!path)
- return -ENOMEM;
+ return NULL;
len = snprintf(path, path_len, "%s/%s/einj_inject", ctx->cxl_debugfs,
cxl_dport_get_devname(dport));
if (len >= path_len) {
err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
free(path);
- return -ENOMEM;
+ return NULL;
}
rc = access(path, F_OK);
if (rc) {
err(ctx, "failed to access %s: %s\n", path, strerror(errno));
free(path);
- return -errno;
+ return NULL;
}
+ return path;
+}
+
+CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+ unsigned int error)
+{
+ struct cxl_ctx *ctx = dport->port->ctx;
+ char buf[32] = { 0 };
+ char *path;
+ size_t len;
+ int rc;
+
+ path = cxl_dport_get_einj_path(dport);
+ if (!path)
+ return -ENOENT;
+
len = snprintf(buf, sizeof(buf), "0x%x\n", error);
if (len >= sizeof(buf)) {
err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index c636edb..ebca543 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -308,8 +308,10 @@ global:
cxl_protocol_error_get_next;
cxl_protocol_error_get_num;
cxl_protocol_error_get_str;
+ cxl_dport_get_einj_path;
cxl_dport_protocol_error_inject;
cxl_memdev_has_poison_injection;
cxl_memdev_inject_poison;
cxl_memdev_clear_poison;
+ cxl_debugfs_exists;
} LIBCXL_10;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 4d035f0..e390aca 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -32,6 +32,7 @@ void cxl_set_userdata(struct cxl_ctx *ctx, void *userdata);
void *cxl_get_userdata(struct cxl_ctx *ctx);
void cxl_set_private_data(struct cxl_ctx *ctx, void *data);
void *cxl_get_private_data(struct cxl_ctx *ctx);
+bool cxl_debugfs_exists(struct cxl_ctx *ctx);
enum cxl_fwl_status {
CXL_FWL_STATUS_UNKNOWN,
@@ -507,6 +508,7 @@ struct cxl_protocol_error *
cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
unsigned int cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
+char *cxl_dport_get_einj_path(struct cxl_dport *dport);
int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
unsigned int error);
--
2.52.0
* [PATCH 7/7] Documentation: Add docs for inject/clear-error commands
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
` (5 preceding siblings ...)
2026-01-09 16:07 ` [PATCH 6/7] cxl/list: Add injectable errors in output Ben Cheatham
@ 2026-01-09 16:07 ` Ben Cheatham
2026-01-09 22:25 ` Dave Jiang
6 siblings, 1 reply; 19+ messages in thread
From: Ben Cheatham @ 2026-01-09 16:07 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
Add man pages for the 'cxl-inject-error' and 'cxl-clear-error' commands.
These man pages show usage and examples for each of their use cases.
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
Documentation/cxl/cxl-clear-error.txt | 69 +++++++++++
Documentation/cxl/cxl-inject-error.txt | 161 +++++++++++++++++++++++++
Documentation/cxl/meson.build | 2 +
3 files changed, 232 insertions(+)
create mode 100644 Documentation/cxl/cxl-clear-error.txt
create mode 100644 Documentation/cxl/cxl-inject-error.txt
diff --git a/Documentation/cxl/cxl-clear-error.txt b/Documentation/cxl/cxl-clear-error.txt
new file mode 100644
index 0000000..9d77855
--- /dev/null
+++ b/Documentation/cxl/cxl-clear-error.txt
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-clear-error(1)
+==================
+
+NAME
+----
+cxl-clear-error - Clear CXL errors from CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl clear-error' <device name> [<options>]
+
+Clear an error from a CXL device. The types of devices supported are:
+
+"memdevs":: A CXL memory device. Memory devices are specified by device
+name ("mem0"), device id ("0") and/or host device name ("0000:35:00.0").
+
+Only device poison (viewable using the '-L'/'--media-errors' option of
+'cxl-list') can be cleared from a device using this command. For example:
+
+----
+
+# cxl list -m mem0 -L -u
+{
+ "memdev":"mem0",
+ "ram_size":"1024.00 MiB (1073.74 MB)",
+ "ram_qos_class":42,
+ "serial":"0x0",
+ "numa_node":1,
+ "host":"0000:35:00.0",
+ "media_errors":[
+ {
+ "offset":"0x1000",
+ "length":64,
+ "source":"Injected"
+ }
+ ]
+}
+
+# cxl clear-error mem0 -a 0x1000
+poison cleared at mem0:0x1000
+
+# cxl list -m mem0 -L -u
+{
+ "memdev":"mem0",
+ "ram_size":"1024.00 MiB (1073.74 MB)",
+ "ram_qos_class":42,
+ "serial":"0x0",
+ "numa_node":1,
+ "host":"0000:35:00.0",
+ "media_errors":[
+ ]
+}
+
+----
+
+This command depends on the kernel debug filesystem (debugfs) to clear device poison.
+
+OPTIONS
+-------
+-a::
+--address::
+ Device physical address (DPA) to clear poison from. Address can be specified
+ in hex or decimal. Required for clearing poison.
+
+--debug::
+ Enable debug output
diff --git a/Documentation/cxl/cxl-inject-error.txt b/Documentation/cxl/cxl-inject-error.txt
new file mode 100644
index 0000000..80d03be
--- /dev/null
+++ b/Documentation/cxl/cxl-inject-error.txt
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-inject-error(1)
+===================
+
+NAME
+----
+cxl-inject-error - Inject CXL errors into CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl inject-error' <device name> [<options>]
+
+WARNING: Error injection can cause system instability and should only be used
+for debugging hardware and software error recovery flows. Use at your own risk!
+
+Inject an error into a CXL device. The types of errors supported depend on the
+device specified. The types of devices supported are:
+
+"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
+Eligible ports will have their 'protocol_injectable' attribute in 'cxl-list'
+set to true. Dports are specified by host name ("0000:0e:01.1").
+"memdevs":: A CXL memory device. Memory devices are specified by device name
+("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
+
+There are two types of errors which can be injected: CXL protocol errors
+and device poison.
+
+CXL protocol errors can only be used with downstream ports (as defined above).
+Protocol errors follow the format of "<protocol>-<severity>". For example,
+a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
+found in the "injectable_protocol_errors" list under a CXL bus object. This
+list is only available when the CXL debugfs is accessible (normally mounted
+at "/sys/kernel/debug/cxl"). For example:
+
+----
+
+# cxl list -B
+[
+ {
+ "bus":"root0",
+ "provider":"ACPI.CXL",
+ "injectable_protocol_errors":[
+ "mem-correctable",
+ "mem-fatal"
+ ]
+ }
+]
+
+----
+
+CXL protocol (CXL.cache/mem) error injection requires the platform to support
+ACPI v6.5+ error injection (EINJ). In addition to platform support, the
+CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
+will need to be enabled. For more information, view the Linux kernel documentation
+on EINJ. Example using the bus output above:
+
+----
+
+# cxl list -TP
+[
+ {
+ "port":"port1",
+ "host":"pci0000:e0",
+ "depth":1,
+ "decoders_committed":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:e0:01.1",
+ "alias":"device:02",
+ "id":0,
+ "protocol_injectable":true
+ }
+ ]
+ }
+]
+
+# cxl inject-error "0000:e0:01.1" -t mem-correctable
+cxl inject-error: inject_proto_err: injected mem-correctable protocol error.
+
+----
+
+Device poison can only be used with CXL memory devices. A device physical address
+(DPA) is required for poison injection. Valid DPAs range from 0 up to (but not
+including) the device's memory size, as reported by 'cxl-list'. An example injection:
+
+----
+
+# cxl inject-error mem0 -t poison -a 0x1000
+poison injected at mem0:0x1000
+# cxl list -m mem0 -u --media-errors
+{
+ "memdev":"mem0",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0",
+ "host":"0000:0d:00.0",
+ "firmware_version":"BWFW VERSION 00",
+ "media_errors":[
+ {
+ "offset":"0x1000",
+ "length":64,
+ "source":"Injected"
+ }
+ ]
+}
+
+----
+
+Not all memory devices support poison injection. To see if a device supports
+poison injection through debugfs, use 'cxl-list' and look for the "poison_injectable"
+attribute under the device. This attribute is only available when the CXL debugfs
+is accessible. Example:
+
+----
+
+# cxl list -u -m mem0
+{
+ "memdev":"mem0",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0",
+ "host":"0000:0d:00.0",
+ "firmware_version":"BWFW VERSION 00",
+ "poison_injectable":true
+}
+
+----
+
+This command depends on the kernel debug filesystem (debugfs) to do CXL protocol
+error and device poison injection.
+
+OPTIONS
+-------
+-a::
+--address::
+ Device physical address (DPA) to use for poison injection. Address can
+ be specified in hex or decimal. Required for poison injection.
+
+-t::
+--type::
+ Type of error to inject into <device name>. The type of error is restricted
+ by device type. The following shows the possible types under their associated
+ device type(s):
+----
+
+Downstream Ports: ::
+ cache-correctable, cache-uncorrectable, cache-fatal, mem-correctable,
+ mem-uncorrectable, mem-fatal
+
+Memdevs: ::
+ poison
+
+----
+
+--debug::
+ Enable debug output
+
+SEE ALSO
+--------
+linkcxl:cxl-list[1]
diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
index 8085c1c..0b75eed 100644
--- a/Documentation/cxl/meson.build
+++ b/Documentation/cxl/meson.build
@@ -50,6 +50,8 @@ cxl_manpages = [
'cxl-update-firmware.txt',
'cxl-set-alert-config.txt',
'cxl-wait-sanitize.txt',
+ 'cxl-inject-error.txt',
+ 'cxl-clear-error.txt',
]
foreach man : cxl_manpages
--
2.52.0
* Re: [PATCH 1/7] libcxl: Add debugfs path to CXL context
2026-01-09 16:07 ` [PATCH 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
@ 2026-01-09 17:43 ` Dave Jiang
0 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2026-01-09 17:43 UTC (permalink / raw)
To: Ben Cheatham, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/26 9:07 AM, Ben Cheatham wrote:
> Find the CXL debugfs mount point and add it to the CXL library context.
> This will be used by poison and protocol error library functions to
> access the information presented by the filesystem.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> cxl/lib/libcxl.c | 37 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index 32728de..6b7e92c 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -8,6 +8,8 @@
> #include <stdlib.h>
> #include <dirent.h>
> #include <unistd.h>
> +#include <mntent.h>
> +#include <string.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> @@ -54,6 +56,7 @@ struct cxl_ctx {
> struct kmod_ctx *kmod_ctx;
> struct daxctl_ctx *daxctl_ctx;
> void *private_data;
> + char *cxl_debugfs;
> };
>
> static void free_pmem(struct cxl_pmem *pmem)
> @@ -240,6 +243,38 @@ CXL_EXPORT void *cxl_get_private_data(struct cxl_ctx *ctx)
> return ctx->private_data;
> }
>
> +static char* get_cxl_debugfs_dir(void)
> +{
> + char *debugfs_dir = NULL;
> + struct mntent *ent;
> + FILE *mntf;
> +
> + mntf = setmntent("/proc/mounts", "r");
> + if (!mntf)
> + return NULL;
> +
> + while ((ent = getmntent(mntf)) != NULL) {
> + if (!strcmp(ent->mnt_type, "debugfs")) {
> + /* Magic '5' here is length of "/cxl" + NULL terminator */
> + debugfs_dir = calloc(strlen(ent->mnt_dir) + 5, 1);
> + if (!debugfs_dir)
> + return NULL;
> +
> + strcpy(debugfs_dir, ent->mnt_dir);
> + strcat(debugfs_dir, "/cxl");
> + if (access(debugfs_dir, F_OK) != 0) {
> + free(debugfs_dir);
> + debugfs_dir = NULL;
> + }
> +
> + break;
> + }
> + }
> +
> + endmntent(mntf);
> + return debugfs_dir;
> +}
> +
> /**
> * cxl_new - instantiate a new library context
> * @ctx: context to establish
> @@ -295,6 +330,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
> c->udev = udev;
> c->udev_queue = udev_queue;
> c->timeout = 5000;
> + c->cxl_debugfs = get_cxl_debugfs_dir();
>
> return 0;
>
> @@ -350,6 +386,7 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
> kmod_unref(ctx->kmod_ctx);
> daxctl_unref(ctx->daxctl_ctx);
> info(ctx, "context %p released\n", ctx);
> + free((void *)ctx->cxl_debugfs);
> free(ctx);
> }
>
* Re: [PATCH 2/7] libcxl: Add CXL protocol errors
2026-01-09 16:07 ` [PATCH 2/7] libcxl: Add CXL protocol errors Ben Cheatham
@ 2026-01-09 17:54 ` Dave Jiang
2026-01-12 17:20 ` Cheatham, Benjamin
0 siblings, 1 reply; 19+ messages in thread
From: Dave Jiang @ 2026-01-09 17:54 UTC (permalink / raw)
To: Ben Cheatham, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/26 9:07 AM, Ben Cheatham wrote:
> The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
> injection for platforms that implement the error types as defined in
> the v6.5+ ACPI specification. The interface for injecting these errors
> is provided by the kernel under the CXL debugfs. The relevant files in
> the interface are the einj_types file, which provides the available CXL
> error types for injection, and the einj_inject file, which injects the
> error into a CXL VH root port or CXL RCH downstream port.
>
> Add a library API to retrieve the CXL error types and inject them. This
> API will be used in a later commit by the 'cxl-inject-error' and
> 'cxl-list' commands.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Just a nit below. otherwise
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> cxl/lib/libcxl.c | 194 +++++++++++++++++++++++++++++++++++++++++++++
> cxl/lib/libcxl.sym | 5 ++
> cxl/lib/private.h | 14 ++++
> cxl/libcxl.h | 13 +++
> 4 files changed, 226 insertions(+)
>
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index 6b7e92c..27ff037 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -48,11 +48,13 @@ struct cxl_ctx {
> void *userdata;
> int memdevs_init;
> int buses_init;
> + int perrors_init;
> unsigned long timeout;
> struct udev *udev;
> struct udev_queue *udev_queue;
> struct list_head memdevs;
> struct list_head buses;
> + struct list_head perrors;
> struct kmod_ctx *kmod_ctx;
> struct daxctl_ctx *daxctl_ctx;
> void *private_data;
> @@ -207,6 +209,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
> free(bus);
> }
>
> +static void free_protocol_error(struct cxl_protocol_error *perror,
> + struct list_head *head)
> +{
> + if (head)
> + list_del_from(head, &perror->list);
> + free(perror);
> +}
> +
> /**
> * cxl_get_userdata - retrieve stored data pointer from library context
> * @ctx: cxl library context
> @@ -325,6 +335,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
> *ctx = c;
> list_head_init(&c->memdevs);
> list_head_init(&c->buses);
> + list_head_init(&c->perrors);
> c->kmod_ctx = kmod_ctx;
> c->daxctl_ctx = daxctl_ctx;
> c->udev = udev;
> @@ -366,6 +377,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
> */
> CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
> {
> + struct cxl_protocol_error *perror, *_p;
> struct cxl_memdev *memdev, *_d;
> struct cxl_bus *bus, *_b;
>
> @@ -381,6 +393,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
> list_for_each_safe(&ctx->buses, bus, _b, port.list)
> free_bus(bus, &ctx->buses);
>
> + list_for_each_safe(&ctx->perrors, perror, _p, list)
> + free_protocol_error(perror, &ctx->perrors);
> +
> udev_queue_unref(ctx->udev_queue);
> udev_unref(ctx->udev);
> kmod_unref(ctx->kmod_ctx);
> @@ -3423,6 +3438,185 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
> return port->decoders_committed;
> }
>
> +const struct cxl_protocol_error cxl_protocol_errors[] = {
> + CXL_PROTOCOL_ERROR(0x1000, "cache-correctable"),
> + CXL_PROTOCOL_ERROR(0x2000, "cache-uncorrectable"),
> + CXL_PROTOCOL_ERROR(0x4000, "cache-fatal"),
> + CXL_PROTOCOL_ERROR(0x8000, "mem-correctable"),
> + CXL_PROTOCOL_ERROR(0x10000, "mem-uncorrectable"),
> + CXL_PROTOCOL_ERROR(0x20000, "mem-fatal")
> +};
> +
> +static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
> + unsigned int n)
> +{
> + struct cxl_protocol_error *perror;
> +
> + for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
> + if (n != cxl_protocol_errors[i].num)
> + continue;
> +
> + perror = calloc(1, sizeof(*perror));
> + if (!perror)
> + return NULL;
> +
> + *perror = cxl_protocol_errors[i];
> + perror->ctx = ctx;
> + return perror;
> + }
> +
> + return NULL;
> +}
> +
> +static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
> +{
> + struct cxl_protocol_error *perror;
> + char buf[SYSFS_ATTR_SIZE];
> + char *path, *num, *save;
> + size_t path_len, len;
> + unsigned long n;
> + int rc = 0;
> +
> + if (!ctx->cxl_debugfs)
> + return;
> +
> + path_len = strlen(ctx->cxl_debugfs) + 100;
> + path = calloc(1, path_len);
Maybe just use PATH_MAX from <linux/limits.h>.
DJ
> + if (!path)
> + return;
> +
> + len = snprintf(path, path_len, "%s/einj_types", ctx->cxl_debugfs);
> + if (len >= path_len) {
> + err(ctx, "Buffer too small\n");
> + goto err;
> + }
> +
> + rc = access(path, F_OK);
> + if (rc) {
> + err(ctx, "failed to access %s: %s\n", path, strerror(errno));
> + goto err;
> + }
> +
> + rc = sysfs_read_attr(ctx, path, buf);
> + if (rc) {
> + err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
> + goto err;
> + }
> +
> + /*
> + * The format of the output of the einj_types attr is:
> + * <Error number in hex 1> <Error name 1>
> + * <Error number in hex 2> <Error name 2>
> + * ...
> + *
> + * We only need the number, so parse that and skip the rest of
> + * the line.
> + */
> + num = strtok_r(buf, " \n", &save);
> + while (num) {
> + n = strtoul(num, NULL, 16);
> + perror = create_cxl_protocol_error(ctx, n);
> + if (perror)
> + list_add_tail(&ctx->perrors, &perror->list);
> +
> + num = strtok_r(NULL, "\n", &save);
> + if (!num)
> + break;
> +
> + num = strtok_r(NULL, " \n", &save);
> + }
> +
> +err:
> + free(path);
> +}
> +
> +static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
> +{
> + if (ctx->perrors_init)
> + return;
> +
> + ctx->perrors_init = 1;
> + cxl_add_protocol_errors(ctx);
> +}
> +
> +CXL_EXPORT struct cxl_protocol_error *
> +cxl_protocol_error_get_first(struct cxl_ctx *ctx)
> +{
> + cxl_protocol_errors_init(ctx);
> +
> + return list_top(&ctx->perrors, struct cxl_protocol_error, list);
> +}
> +
> +CXL_EXPORT struct cxl_protocol_error *
> +cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
> +{
> + struct cxl_ctx *ctx = perror->ctx;
> +
> + return list_next(&ctx->perrors, perror, list);
> +}
> +
> +CXL_EXPORT unsigned int
> +cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
> +{
> + return perror->num;
> +}
> +
> +CXL_EXPORT const char *
> +cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
> +{
> + return perror->string;
> +}
> +
> +CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
> + unsigned int error)
> +{
> + struct cxl_ctx *ctx = dport->port->ctx;
> + char buf[32] = { 0 };
> + size_t path_len, len;
> + char *path;
> + int rc;
> +
> + if (!ctx->cxl_debugfs)
> + return -ENOENT;
> +
> + path_len = strlen(ctx->cxl_debugfs) + 100;
> + path = calloc(path_len, sizeof(char));
> + if (!path)
> + return -ENOMEM;
> +
> + len = snprintf(path, path_len, "%s/%s/einj_inject", ctx->cxl_debugfs,
> + cxl_dport_get_devname(dport));
> + if (len >= path_len) {
> + err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
> + free(path);
> + return -ENOMEM;
> + }
> +
> + rc = access(path, F_OK);
> + if (rc) {
> + err(ctx, "failed to access %s: %s\n", path, strerror(errno));
> + free(path);
> + return -errno;
> + }
> +
> + len = snprintf(buf, sizeof(buf), "0x%x\n", error);
> + if (len >= sizeof(buf)) {
> + err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
> + free(path);
> + return -ENOMEM;
> + }
> +
> + rc = sysfs_write_attr(ctx, path, buf);
> + if (rc) {
> + err(ctx, "failed to write %s: %s\n", path, strerror(-rc));
> + free(path);
> + return -errno;
> + }
> +
> + free(path);
> + return 0;
> +}
> +
> static void *add_cxl_bus(void *parent, int id, const char *cxlbus_base)
> {
> const char *devname = devpath_to_devname(cxlbus_base);
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index 36a93c3..c683b83 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -304,4 +304,9 @@ global:
> LIBCXL_11 {
> global:
> cxl_region_get_extended_linear_cache_size;
> + cxl_protocol_error_get_first;
> + cxl_protocol_error_get_next;
> + cxl_protocol_error_get_num;
> + cxl_protocol_error_get_str;
> + cxl_dport_protocol_error_inject;
> } LIBCXL_10;
> diff --git a/cxl/lib/private.h b/cxl/lib/private.h
> index 542cdb7..582eebf 100644
> --- a/cxl/lib/private.h
> +++ b/cxl/lib/private.h
> @@ -108,6 +108,20 @@ struct cxl_port {
> struct list_head dports;
> };
>
> +struct cxl_protocol_error {
> + unsigned int num;
> + const char *string;
> + struct cxl_ctx *ctx;
> + struct list_node list;
> +};
> +
> +#define CXL_PROTOCOL_ERROR(n, str) \
> + ((struct cxl_protocol_error){ \
> + .num = (n), \
> + .string = (str), \
> + .ctx = NULL, \
> + })
> +
> struct cxl_bus {
> struct cxl_port port;
> };
> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index 9371aac..faef62e 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -498,6 +498,19 @@ int cxl_cmd_alert_config_set_enable_alert_actions(struct cxl_cmd *cmd,
> int enable);
> struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memdev);
>
> +struct cxl_protocol_error;
> +struct cxl_protocol_error *cxl_protocol_error_get_first(struct cxl_ctx *ctx);
> +struct cxl_protocol_error *
> +cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
> +unsigned int cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
> +const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
> +int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
> + unsigned int error);
> +
> +#define cxl_protocol_error_foreach(ctx, perror) \
> + for (perror = cxl_protocol_error_get_first(ctx); perror != NULL; \
> + perror = cxl_protocol_error_get_next(perror))
> +
> #ifdef __cplusplus
> } /* extern "C" */
> #endif
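A minimal sketch of the einj_types parsing described in the quoted comment, assuming each line has the form "<hex number> <description>". parse_einj_types() and proto_err_name() are hypothetical helpers; the table values mirror cxl_protocol_errors[] above. Instead of strtok_r(), the sketch uses strtoul()'s end pointer and skips to the next newline, which leaves the input buffer unmodified. Numbers not in the table are skipped, as create_cxl_protocol_error() does:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct proto_err {
	unsigned int num;
	const char *name;
};

/* Values mirror the cxl_protocol_errors[] table in the quoted patch. */
static const struct proto_err proto_errs[] = {
	{ 0x1000, "cache-correctable" },
	{ 0x2000, "cache-uncorrectable" },
	{ 0x4000, "cache-fatal" },
	{ 0x8000, "mem-correctable" },
	{ 0x10000, "mem-uncorrectable" },
	{ 0x20000, "mem-fatal" },
};

/* Map an EINJ error number to its name, or NULL if it is not a CXL type. */
static const char *proto_err_name(unsigned long n)
{
	for (size_t i = 0; i < sizeof(proto_errs) / sizeof(proto_errs[0]); i++)
		if (proto_errs[i].num == n)
			return proto_errs[i].name;
	return NULL;
}

/*
 * Parse einj_types-style text, storing up to @max recognized names in
 * @out. Unknown numbers and malformed lines are skipped. Returns the
 * number of names stored.
 */
static size_t parse_einj_types(const char *text, const char **out, size_t max)
{
	size_t count = 0;
	const char *p = text;

	while (*p && count < max) {
		char *end;
		unsigned long n = strtoul(p, &end, 16); /* accepts "0x" prefix */
		const char *name = end != p ? proto_err_name(n) : NULL;

		if (name)
			out[count++] = name;

		/* Skip the description: advance to the start of the next line. */
		p = strchr(end, '\n');
		if (!p)
			break;
		p++;
	}
	return count;
}
```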
* Re: [PATCH 3/7] libcxl: Add poison injection support
2026-01-09 16:07 ` [PATCH 3/7] libcxl: Add poison injection support Ben Cheatham
@ 2026-01-09 18:03 ` Dave Jiang
2026-01-12 17:20 ` Cheatham, Benjamin
0 siblings, 1 reply; 19+ messages in thread
From: Dave Jiang @ 2026-01-09 18:03 UTC (permalink / raw)
To: Ben Cheatham, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/26 9:07 AM, Ben Cheatham wrote:
> Add a library API for clearing and injecting poison into a CXL memory
> device through the CXL debugfs.
>
> This API will be used by the 'cxl-inject-error' and 'cxl-clear-error'
> commands in later commits.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
> cxl/lib/libcxl.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++
> cxl/lib/libcxl.sym | 3 ++
> cxl/libcxl.h | 3 ++
> 3 files changed, 89 insertions(+)
>
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index 27ff037..deebf7f 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -5046,3 +5046,86 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
> {
> return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
> }
> +
> +CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
> +{
> + struct cxl_ctx *ctx = memdev->ctx;
> + size_t path_len, len;
> + bool exists = true;
> + char *path;
> + int rc;
> +
> + if (!ctx->cxl_debugfs)
> + return false;
> +
> + path_len = strlen(ctx->cxl_debugfs) + 100;
Same comment about PATH_MAX.
> + path = calloc(path_len, sizeof(char));
> + if (!path)
> + return false;
> +
> + len = snprintf(path, path_len, "%s/%s/inject_poison", ctx->cxl_debugfs,
> + cxl_memdev_get_devname(memdev));
> + if (len >= path_len) {
> + err(ctx, "%s: buffer too small\n",
> + cxl_memdev_get_devname(memdev));
> + free(path);
> + return false;
I think I saw in an earlier patch that you were using goto to funnel error paths to a single exit point, so you may as well make it consistent and do it here as well.
> + }
> +
> + rc = access(path, F_OK);
> + if (rc)
> + exists = false;
> +
> + free(path);
> + return exists;
> +}
> +
> +static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
> + bool clear)
> +{
> + struct cxl_ctx *ctx = memdev->ctx;
> + size_t path_len, len;
> + char addr[32];
> + char *path;
> + int rc;
> +
> + if (!ctx->cxl_debugfs)
> + return -ENOENT;
> +
> + path_len = strlen(ctx->cxl_debugfs) + 100;
same comment about path len
> + path = calloc(path_len, sizeof(char));
> + if (!path)
> + return -ENOMEM;
> +
> + len = snprintf(path, path_len, "%s/%s/%s", ctx->cxl_debugfs,
> + cxl_memdev_get_devname(memdev),
> + clear ? "clear_poison" : "inject_poison");
> + if (len >= path_len) {
> + err(ctx, "%s: buffer too small\n",
> + cxl_memdev_get_devname(memdev));
> + free(path);
> + return -ENOMEM;
same comment about error paths
DJ
> + }
> +
> + len = snprintf(addr, sizeof(addr), "0x%lx\n", dpa);
> + if (len >= sizeof(addr)) {
> + err(ctx, "%s: buffer too small\n",
> + cxl_memdev_get_devname(memdev));
> + free(path);
> + return -ENOMEM;
> + }
> +
> + rc = sysfs_write_attr(ctx, path, addr);
> + free(path);
> + return rc;
> +}
> +
> +CXL_EXPORT int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t addr)
> +{
> + return cxl_memdev_poison_action(memdev, addr, false);
> +}
> +
> +CXL_EXPORT int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t addr)
> +{
> + return cxl_memdev_poison_action(memdev, addr, true);
> +}
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index c683b83..c636edb 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -309,4 +309,7 @@ global:
> cxl_protocol_error_get_num;
> cxl_protocol_error_get_str;
> cxl_dport_protocol_error_inject;
> + cxl_memdev_has_poison_injection;
> + cxl_memdev_inject_poison;
> + cxl_memdev_clear_poison;
> } LIBCXL_10;
> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index faef62e..4d035f0 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -105,6 +105,9 @@ int cxl_memdev_read_label(struct cxl_memdev *memdev, void *buf, size_t length,
> size_t offset);
> int cxl_memdev_write_label(struct cxl_memdev *memdev, void *buf, size_t length,
> size_t offset);
> +bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev);
> +int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa);
> +int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa);
> struct cxl_cmd *cxl_cmd_new_get_fw_info(struct cxl_memdev *memdev);
> unsigned int cxl_cmd_fw_info_get_num_slots(struct cxl_cmd *cmd);
> unsigned int cxl_cmd_fw_info_get_active_slot(struct cxl_cmd *cmd);
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/7] cxl: Add inject-error command
2026-01-09 16:07 ` [PATCH 4/7] cxl: Add inject-error command Ben Cheatham
@ 2026-01-09 21:53 ` Dave Jiang
2026-01-12 17:20 ` Cheatham, Benjamin
0 siblings, 1 reply; 19+ messages in thread
From: Dave Jiang @ 2026-01-09 21:53 UTC (permalink / raw)
To: Ben Cheatham, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/26 9:07 AM, Ben Cheatham wrote:
> Add the 'cxl-inject-error' command. This command will provide CXL
> protocol error injection for CXL VH root ports and CXL RCH downstream
> ports, as well as poison injection for CXL memory devices.
>
> Add util_cxl_dport_filter() to find downstream ports by device name.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
> cxl/builtin.h | 1 +
> cxl/cxl.c | 1 +
> cxl/filter.c | 26 +++++++
> cxl/filter.h | 2 +
> cxl/inject-error.c | 188 +++++++++++++++++++++++++++++++++++++++++++++
> cxl/meson.build | 1 +
> 6 files changed, 219 insertions(+)
> create mode 100644 cxl/inject-error.c
>
> diff --git a/cxl/builtin.h b/cxl/builtin.h
> index c483f30..e82fcb5 100644
> --- a/cxl/builtin.h
> +++ b/cxl/builtin.h
> @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
> int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
> int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
> int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
> #ifdef ENABLE_LIBTRACEFS
> int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
> #else
> diff --git a/cxl/cxl.c b/cxl/cxl.c
> index 1643667..a98bd6b 100644
> --- a/cxl/cxl.c
> +++ b/cxl/cxl.c
> @@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
> { "disable-region", .c_fn = cmd_disable_region },
> { "destroy-region", .c_fn = cmd_destroy_region },
> { "monitor", .c_fn = cmd_monitor },
> + { "inject-error", .c_fn = cmd_inject_error },
> };
>
> int main(int argc, const char **argv)
> diff --git a/cxl/filter.c b/cxl/filter.c
> index b135c04..8c7dc6e 100644
> --- a/cxl/filter.c
> +++ b/cxl/filter.c
> @@ -171,6 +171,32 @@ util_cxl_endpoint_filter_by_port(struct cxl_endpoint *endpoint,
> return NULL;
> }
>
> +struct cxl_dport *util_cxl_dport_filter(struct cxl_dport *dport,
> + const char *__ident)
> +{
> +
> + char *ident, *save;
> + const char *arg;
> +
> + if (!__ident)
> + return dport;
> +
> + ident = strdup(__ident);
> + if (!ident)
> + return NULL;
> +
> + for (arg = strtok_r(ident, which_sep(__ident), &save); arg;
> + arg = strtok_r(NULL, which_sep(__ident), &save)) {
> + if (strcmp(arg, cxl_dport_get_devname(dport)) == 0)
> + break;
> + }
> +
> + free(ident);
> + if (arg)
> + return dport;
> + return NULL;
> +}
> +
> static struct cxl_decoder *
> util_cxl_decoder_filter_by_port(struct cxl_decoder *decoder, const char *ident,
> enum cxl_port_filter_mode mode)
> diff --git a/cxl/filter.h b/cxl/filter.h
> index 956a46e..70463c4 100644
> --- a/cxl/filter.h
> +++ b/cxl/filter.h
> @@ -55,6 +55,8 @@ enum cxl_port_filter_mode {
>
> struct cxl_port *util_cxl_port_filter(struct cxl_port *port, const char *ident,
> enum cxl_port_filter_mode mode);
> +struct cxl_dport *util_cxl_dport_filter(struct cxl_dport *dport,
> + const char *__ident);
> struct cxl_bus *util_cxl_bus_filter(struct cxl_bus *bus, const char *__ident);
> struct cxl_endpoint *util_cxl_endpoint_filter(struct cxl_endpoint *endpoint,
> const char *__ident);
> diff --git a/cxl/inject-error.c b/cxl/inject-error.c
> new file mode 100644
> index 0000000..0ca2e6b
> --- /dev/null
> +++ b/cxl/inject-error.c
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 AMD. All rights reserved. */
> +#include <util/parse-options.h>
> +#include <cxl/libcxl.h>
> +#include <cxl/filter.h>
> +#include <util/log.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <stdio.h>
> +#include <errno.h>
> +#include <limits.h>
> +
> +static bool debug;
> +
> +static struct inject_params {
> + const char *type;
> + const char *address;
> +} inj_param;
> +
> +static const struct option inject_options[] = {
> + OPT_STRING('t', "type", &inj_param.type, "Error type",
> + "Error type to inject into <device>"),
> + OPT_STRING('a', "address", &inj_param.address, "Address for poison injection",
> + "Device physical address for poison injection in hex or decimal"),
> +#ifdef ENABLE_DEBUG
> + OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
> +#endif
> + OPT_END(),
> +};
> +
> +static struct log_ctx iel;
> +
> +static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
> + const char *type)
> +{
> + struct cxl_protocol_error *perror;
> +
> + cxl_protocol_error_foreach(ctx, perror) {
> + if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
> + return perror;
> + }
> +
> + log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
> + return NULL;
> +}
> +
> +static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
> +{
> + struct cxl_dport *dport;
> + struct cxl_port *port;
> + struct cxl_bus *bus;
> +
> + cxl_bus_foreach(ctx, bus)
> + cxl_port_foreach_all(cxl_bus_get_port(bus), port)
> + cxl_dport_foreach(port, dport)
> + if (util_cxl_dport_filter(dport, devname))
> + return dport;
> +
> + log_err(&iel, "Downstream port \"%s\" not found\n", devname);
> + return NULL;
> +}
> +
> +static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx,
> + const char *filter)
> +{
> + struct cxl_memdev *memdev;
> +
> + cxl_memdev_foreach(ctx, memdev) {
> + if (util_cxl_memdev_filter(memdev, filter, NULL))
> + return memdev;
> + }
> +
> + log_err(&iel, "Memdev \"%s\" not found\n", filter);
> + return NULL;
> +}
> +
> +static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
> + struct cxl_protocol_error *perror)
> +{
> + struct cxl_dport *dport;
> + int rc;
> +
> + if (!devname) {
> + log_err(&iel, "No downstream port specified for injection\n");
> + return -EINVAL;
> + }
> +
> + dport = find_cxl_dport(ctx, devname);
> + if (!dport)
> + return -ENODEV;
> +
> + rc = cxl_dport_protocol_error_inject(dport,
> + cxl_protocol_error_get_num(perror));
> + if (rc)
> + return rc;
> +
> + log_info(&iel, "injected %s protocol error.\n",
> + cxl_protocol_error_get_str(perror));
> + return 0;
> +}
> +
> +static int poison_action(struct cxl_ctx *ctx, const char *filter,
> + const char *addr_str)
> +{
> + struct cxl_memdev *memdev;
> + unsigned long long addr;
> + int rc;
> +
> + memdev = find_cxl_memdev(ctx, filter);
> + if (!memdev)
> + return -ENODEV;
> +
> + if (!cxl_memdev_has_poison_injection(memdev)) {
> + log_err(&iel, "%s does not support error injection\n",
> + cxl_memdev_get_devname(memdev));
> + return -EINVAL;
> + }
> +
> + if (!addr_str) {
> + log_err(&iel, "no address provided\n");
> + return -EINVAL;
> + }
> +
> + errno = 0;
Why does errno need to be set here?
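Some context on the question: strtoull() sets errno only on failure and never clears it. Without the reset, a stale ERANGE left by an earlier call would make a legitimate input of ULLONG_MAX look like an overflow. The sketch below is a stricter parser built around a hypothetical parse_dpa() helper. It also checks the end pointer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a DPA given in hex ("0x1000") or decimal.
 * errno is cleared first because strtoull() only writes it on failure.
 * The end-pointer check rejects empty strings and trailing garbage,
 * which strtoull() would otherwise accept as a partial number.
 */
static int parse_dpa(const char *str, unsigned long long *dpa)
{
	unsigned long long val;
	char *end;

	errno = 0;
	val = strtoull(str, &end, 0);
	if (errno == ERANGE && val == ULLONG_MAX)
		return -ERANGE;
	if (end == str || *end != '\0')
		return -EINVAL;

	*dpa = val;
	return 0;
}
```

With base 0, a leading "0x" selects hex and a leading "0" selects octal, so "010" parses as 8. strtoull() also accepts a leading '-' and negates the result in unsigned arithmetic. A caller may want to reject that case separately.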
> + addr = strtoull(addr_str, NULL, 0);
> + if (addr == ULLONG_MAX && errno == ERANGE) {
> + log_err(&iel, "invalid address %s\n", addr_str);
> + return -EINVAL;
> + }
> +
> + rc = cxl_memdev_inject_poison(memdev, addr);
> + if (rc)
> + log_err(&iel, "failed to inject poison at %s:%s: %s\n",
> + cxl_memdev_get_devname(memdev), addr_str, strerror(-rc));
We don't error if poison fails to inject?
DJ
> + else
> + log_info(&iel, "poison injected at %s:%s\n",
> + cxl_memdev_get_devname(memdev), addr_str);
> +
> + return rc;
> +}
> +
> +static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
> + const struct option *options, const char *usage)
> +{
> + struct cxl_protocol_error *perr;
> + const char * const u[] = {
> + usage,
> + NULL
> + };
> + int rc = -EINVAL;
> +
> + log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG");
> + argc = parse_options(argc, argv, options, u, 0);
> +
> + if (debug) {
> + cxl_set_log_priority(ctx, LOG_DEBUG);
> + iel.log_priority = LOG_DEBUG;
> + } else {
> + iel.log_priority = LOG_INFO;
> + }
> +
> + if (argc != 1 || inj_param.type == NULL) {
> + usage_with_options(u, options);
> + return rc;
> + }
> +
> + if (strcmp(inj_param.type, "poison") == 0) {
> + rc = poison_action(ctx, argv[0], inj_param.address);
> + return rc;
> + }
> +
> + perr = find_cxl_proto_err(ctx, inj_param.type);
> + if (perr) {
> + rc = inject_proto_err(ctx, argv[0], perr);
> + if (rc)
> + log_err(&iel, "Failed to inject error: %d\n", rc);
> + }
> +
> + return rc;
> +}
> +
> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
> +{
> + int rc = inject_action(argc, argv, ctx, inject_options,
> + "inject-error <device> -t <type> [<options>]");
> +
> + return rc ? EXIT_FAILURE : EXIT_SUCCESS;
> +}
> diff --git a/cxl/meson.build b/cxl/meson.build
> index b9924ae..92031b5 100644
> --- a/cxl/meson.build
> +++ b/cxl/meson.build
> @@ -7,6 +7,7 @@ cxl_src = [
> 'memdev.c',
> 'json.c',
> 'filter.c',
> + 'inject-error.c',
> '../daxctl/json.c',
> '../daxctl/filter.c',
> ]
* Re: [PATCH 5/7] cxl: Add clear-error command
2026-01-09 16:07 ` [PATCH 5/7] cxl: Add clear-error command Ben Cheatham
@ 2026-01-09 22:12 ` Dave Jiang
0 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2026-01-09 22:12 UTC (permalink / raw)
To: Ben Cheatham, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/26 9:07 AM, Ben Cheatham wrote:
> Add the 'cxl-clear-error' command. This command allows the user to clear
> device poison from CXL memory devices.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> cxl/builtin.h | 1 +
> cxl/cxl.c | 1 +
> cxl/inject-error.c | 70 ++++++++++++++++++++++++++++++++++++++++++----
> 3 files changed, 67 insertions(+), 5 deletions(-)
>
> diff --git a/cxl/builtin.h b/cxl/builtin.h
> index e82fcb5..68ed1de 100644
> --- a/cxl/builtin.h
> +++ b/cxl/builtin.h
> @@ -26,6 +26,7 @@ int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
> int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
> int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
> int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
> +int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx);
> #ifdef ENABLE_LIBTRACEFS
> int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
> #else
> diff --git a/cxl/cxl.c b/cxl/cxl.c
> index a98bd6b..e1740b5 100644
> --- a/cxl/cxl.c
> +++ b/cxl/cxl.c
> @@ -81,6 +81,7 @@ static struct cmd_struct commands[] = {
> { "destroy-region", .c_fn = cmd_destroy_region },
> { "monitor", .c_fn = cmd_monitor },
> { "inject-error", .c_fn = cmd_inject_error },
> + { "clear-error", .c_fn = cmd_clear_error },
> };
>
> int main(int argc, const char **argv)
> diff --git a/cxl/inject-error.c b/cxl/inject-error.c
> index 0ca2e6b..76f9fa9 100644
> --- a/cxl/inject-error.c
> +++ b/cxl/inject-error.c
> @@ -17,6 +17,10 @@ static struct inject_params {
> const char *address;
> } inj_param;
>
> +static struct clear_params {
> + const char *address;
> +} clear_param;
> +
> static const struct option inject_options[] = {
> OPT_STRING('t', "type", &inj_param.type, "Error type",
> "Error type to inject into <device>"),
> @@ -28,6 +32,15 @@ static const struct option inject_options[] = {
> OPT_END(),
> };
>
> +static const struct option clear_options[] = {
> + OPT_STRING('a', "address", &clear_param.address, "Address for poison clearing",
> + "Device physical address to clear poison from in hex or decimal"),
> +#ifdef ENABLE_DEBUG
> + OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
> +#endif
> + OPT_END(),
> +};
> +
> static struct log_ctx iel;
>
> static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
> @@ -100,7 +113,7 @@ static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
> }
>
> static int poison_action(struct cxl_ctx *ctx, const char *filter,
> - const char *addr_str)
> + const char *addr_str, bool clear)
> {
> struct cxl_memdev *memdev;
> unsigned long long addr;
> @@ -128,12 +141,18 @@ static int poison_action(struct cxl_ctx *ctx, const char *filter,
> return -EINVAL;
> }
>
> - rc = cxl_memdev_inject_poison(memdev, addr);
> + if (clear)
> + rc = cxl_memdev_clear_poison(memdev, addr);
> + else
> + rc = cxl_memdev_inject_poison(memdev, addr);
> +
> if (rc)
> - log_err(&iel, "failed to inject poison at %s:%s: %s\n",
> + log_err(&iel, "failed to %s %s:%s: %s\n",
> + clear ? "clear poison at" : "inject poison at",
> cxl_memdev_get_devname(memdev), addr_str, strerror(-rc));
> else
> - log_info(&iel, "poison injected at %s:%s\n",
> + log_info(&iel,
> + "poison %s at %s:%s\n", clear ? "cleared" : "injected",
> cxl_memdev_get_devname(memdev), addr_str);
>
> return rc;
> @@ -165,7 +184,7 @@ static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
> }
>
> if (strcmp(inj_param.type, "poison") == 0) {
> - rc = poison_action(ctx, argv[0], inj_param.address);
> + rc = poison_action(ctx, argv[0], inj_param.address, false);
> return rc;
> }
>
> @@ -186,3 +205,44 @@ int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
>
> return rc ? EXIT_FAILURE : EXIT_SUCCESS;
> }
> +
> +static int clear_action(int argc, const char **argv, struct cxl_ctx *ctx,
> + const struct option *options, const char *usage)
> +{
> + const char * const u[] = {
> + usage,
> + NULL
> + };
> + int rc = -EINVAL;
> +
> + log_init(&iel, "cxl clear-error", "CXL_CLEAR_LOG");
> + argc = parse_options(argc, argv, options, u, 0);
> +
> + if (debug) {
> + cxl_set_log_priority(ctx, LOG_DEBUG);
> + iel.log_priority = LOG_DEBUG;
> + } else {
> + iel.log_priority = LOG_INFO;
> + }
> +
> + if (argc != 1) {
> + usage_with_options(u, options);
> + return rc;
> + }
> +
> + rc = poison_action(ctx, argv[0], clear_param.address, true);
> + if (rc) {
> + log_err(&iel, "Failed to clear poison on %s: %s\n",
> + argv[0], strerror(-rc));
> + return rc;
> + }
> +
> + return rc;
> +}
> +
> +int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx)
> +{
> + int rc = clear_action(argc, argv, ctx, clear_options,
> + "clear-error <device> [<options>]");
> + return rc ? EXIT_FAILURE : EXIT_SUCCESS;
> +}
* Re: [PATCH 6/7] cxl/list: Add injectable errors in output
2026-01-09 16:07 ` [PATCH 6/7] cxl/list: Add injectable errors in output Ben Cheatham
@ 2026-01-09 22:17 ` Dave Jiang
0 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2026-01-09 22:17 UTC (permalink / raw)
To: Ben Cheatham, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/26 9:07 AM, Ben Cheatham wrote:
> Add injectable error information for CXL memory devices and busses.
> This information is only shown when the CXL debugfs is accessible
> (normally mounted at /sys/kernel/debug/cxl).
>
> For CXL memory devices this reports whether the device supports poison
> injection, and for dports whether the port supports CXL protocol error
> injection. The "--media-errors"/"-L" option shows injected poison for
> memory devices.
>
> For CXL busses this shows injectable CXL protocol error types. The
> information will be the same across busses because the error types are
> system-wide. The information is presented under the bus for easier
> filtering.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> cxl/json.c | 38 ++++++++++++++++++++++++++++++++++++++
> cxl/lib/libcxl.c | 34 +++++++++++++++++++++++++---------
> cxl/lib/libcxl.sym | 2 ++
> cxl/libcxl.h | 2 ++
> 4 files changed, 67 insertions(+), 9 deletions(-)
>
> diff --git a/cxl/json.c b/cxl/json.c
> index e9cb88a..6cdf513 100644
> --- a/cxl/json.c
> +++ b/cxl/json.c
> @@ -663,6 +663,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
> json_object_object_add(jdev, "state", jobj);
> }
>
> + if (cxl_debugfs_exists(cxl_memdev_get_ctx(memdev))) {
> + jobj = json_object_new_boolean(cxl_memdev_has_poison_injection(memdev));
> + if (jobj)
> + json_object_object_add(jdev, "poison_injectable", jobj);
> + }
> +
> if (flags & UTIL_JSON_PARTITION) {
> jobj = util_cxl_memdev_partition_to_json(memdev, flags);
> if (jobj)
> @@ -691,6 +697,7 @@ void util_cxl_dports_append_json(struct json_object *jport,
> {
> struct json_object *jobj, *jdports;
> struct cxl_dport *dport;
> + char *einj_path;
> int val;
>
> val = cxl_port_get_nr_dports(port);
> @@ -739,6 +746,13 @@ void util_cxl_dports_append_json(struct json_object *jport,
> if (jobj)
> json_object_object_add(jdport, "id", jobj);
>
> + einj_path = cxl_dport_get_einj_path(dport);
> + jobj = json_object_new_boolean(einj_path != NULL);
> + if (jobj)
> + json_object_object_add(jdport, "protocol_injectable",
> + jobj);
> + free(einj_path);
> +
> json_object_array_add(jdports, jdport);
> json_object_set_userdata(jdport, dport, NULL);
> }
> @@ -750,6 +764,8 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
> unsigned long flags)
> {
> const char *devname = cxl_bus_get_devname(bus);
> + struct cxl_ctx *ctx = cxl_bus_get_ctx(bus);
> + struct cxl_protocol_error *perror;
> struct json_object *jbus, *jobj;
>
> jbus = json_object_new_object();
> @@ -765,6 +781,28 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
> json_object_object_add(jbus, "provider", jobj);
>
> json_object_set_userdata(jbus, bus, NULL);
> +
> + if (cxl_debugfs_exists(ctx)) {
> + jobj = json_object_new_array();
> + if (!jobj)
> + return jbus;
> +
> + cxl_protocol_error_foreach(ctx, perror)
> + {
> + struct json_object *jerr_str;
> + const char *perror_str;
> +
> + perror_str = cxl_protocol_error_get_str(perror);
> +
> + jerr_str = json_object_new_string(perror_str);
> + if (jerr_str)
> + json_object_array_add(jobj, jerr_str);
> + }
> +
> + json_object_object_add(jbus, "injectable_protocol_errors",
> + jobj);
> + }
> +
> return jbus;
> }
>
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index deebf7f..f824701 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -285,6 +285,11 @@ static char* get_cxl_debugfs_dir(void)
> return debugfs_dir;
> }
>
> +CXL_EXPORT bool cxl_debugfs_exists(struct cxl_ctx *ctx)
> +{
> + return ctx->cxl_debugfs != NULL;
> +}
> +
> /**
> * cxl_new - instantiate a new library context
> * @ctx: context to establish
> @@ -3567,38 +3572,49 @@ cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
> return perror->string;
> }
>
> -CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
> - unsigned int error)
> +CXL_EXPORT char *cxl_dport_get_einj_path(struct cxl_dport *dport)
> {
> struct cxl_ctx *ctx = dport->port->ctx;
> - char buf[32] = { 0 };
> size_t path_len, len;
> char *path;
> int rc;
>
> - if (!ctx->cxl_debugfs)
> - return -ENOENT;
> -
> path_len = strlen(ctx->cxl_debugfs) + 100;
> path = calloc(path_len, sizeof(char));
> if (!path)
> - return -ENOMEM;
> + return NULL;
>
> len = snprintf(path, path_len, "%s/%s/einj_inject", ctx->cxl_debugfs,
> cxl_dport_get_devname(dport));
> if (len >= path_len) {
> err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
> free(path);
> - return -ENOMEM;
> + return NULL;
> }
>
> rc = access(path, F_OK);
> if (rc) {
> err(ctx, "failed to access %s: %s\n", path, strerror(errno));
> free(path);
> - return -errno;
> + return NULL;
> }
>
> + return path;
> +}
> +
> +CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
> + unsigned int error)
> +{
> + struct cxl_ctx *ctx = dport->port->ctx;
> + char buf[32] = { 0 };
> + char *path;
> + size_t len;
> + int rc;
> +
> + path = cxl_dport_get_einj_path(dport);
> + if (!path)
> + return -ENOENT;
> +
> len = snprintf(buf, sizeof(buf), "0x%x\n", error);
> if (len >= sizeof(buf)) {
> err(ctx, "%s: buffer too small\n", cxl_dport_get_devname(dport));
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index c636edb..ebca543 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -308,8 +308,10 @@ global:
> cxl_protocol_error_get_next;
> cxl_protocol_error_get_num;
> cxl_protocol_error_get_str;
> + cxl_dport_get_einj_path;
> cxl_dport_protocol_error_inject;
> cxl_memdev_has_poison_injection;
> cxl_memdev_inject_poison;
> cxl_memdev_clear_poison;
> + cxl_debugfs_exists;
> } LIBCXL_10;
> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index 4d035f0..e390aca 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -32,6 +32,7 @@ void cxl_set_userdata(struct cxl_ctx *ctx, void *userdata);
> void *cxl_get_userdata(struct cxl_ctx *ctx);
> void cxl_set_private_data(struct cxl_ctx *ctx, void *data);
> void *cxl_get_private_data(struct cxl_ctx *ctx);
> +bool cxl_debugfs_exists(struct cxl_ctx *ctx);
>
> enum cxl_fwl_status {
> CXL_FWL_STATUS_UNKNOWN,
> @@ -507,6 +508,7 @@ struct cxl_protocol_error *
> cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
> unsigned int cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
> const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
> +char *cxl_dport_get_einj_path(struct cxl_dport *dport);
> int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
> unsigned int error);
>
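The sketch below shows the pattern behind cxl_dport_get_einj_path(). It returns a heap-allocated path only when the debugfs file exists, so a NULL return doubles as "not injectable", and the caller frees the result, as the json.c hunk does. get_attr_path_if_present() is a hypothetical name. The sketch also returns NULL early when no debugfs directory is known. The quoted hunk drops the !ctx->cxl_debugfs check, yet util_cxl_dports_append_json() calls the function for every dport. The sketch also stays silent when the file is absent, since 'cxl list' probes all dports.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: return a malloc'd "<debugfs>/<devname>/<attr>"
 * if that file exists, else NULL. The caller must free() the result.
 */
static char *get_attr_path_if_present(const char *debugfs,
				      const char *devname, const char *attr)
{
	size_t path_len;
	char *path;
	int len;

	/* no debugfs directory known: nothing can be injectable */
	if (!debugfs)
		return NULL;

	path_len = strlen(debugfs) + strlen(devname) + strlen(attr) + 3;
	path = calloc(path_len, 1);
	if (!path)
		return NULL;

	len = snprintf(path, path_len, "%s/%s/%s", debugfs, devname, attr);
	if (len < 0 || (size_t)len >= path_len || access(path, F_OK) != 0) {
		free(path);
		return NULL;
	}
	return path;
}
```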
* Re: [PATCH 7/7] Documentation: Add docs for inject/clear-error commands
2026-01-09 16:07 ` [PATCH 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
@ 2026-01-09 22:25 ` Dave Jiang
0 siblings, 0 replies; 19+ messages in thread
From: Dave Jiang @ 2026-01-09 22:25 UTC (permalink / raw)
To: Ben Cheatham, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/26 9:07 AM, Ben Cheatham wrote:
> Add man pages for the 'cxl-inject-error' and 'cxl-clear-error' commands.
> These man pages show usage and examples for each of their use cases.
>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> Documentation/cxl/cxl-clear-error.txt | 69 +++++++++++
> Documentation/cxl/cxl-inject-error.txt | 161 +++++++++++++++++++++++++
> Documentation/cxl/meson.build | 2 +
> 3 files changed, 232 insertions(+)
> create mode 100644 Documentation/cxl/cxl-clear-error.txt
> create mode 100644 Documentation/cxl/cxl-inject-error.txt
>
> diff --git a/Documentation/cxl/cxl-clear-error.txt b/Documentation/cxl/cxl-clear-error.txt
> new file mode 100644
> index 0000000..9d77855
> --- /dev/null
> +++ b/Documentation/cxl/cxl-clear-error.txt
> @@ -0,0 +1,69 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +cxl-clear-error(1)
> +==================
> +
> +NAME
> +----
> +cxl-clear-error - Clear CXL errors from CXL devices
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'cxl clear-error' <device name> [<options>]
> +
> +Clear an error from a CXL device. The types of devices supported are:
> +
> +"memdevs":: A CXL memory device. Memory devices are specified by device
> +name ("mem0"), device id ("0") and/or host device name ("0000:35:00.0").
> +
> +Only device poison (viewable using the '-L'/'--media-errors' option of
> +'cxl-list') can be cleared from a device using this command. For example:
> +
> +----
> +
> +# cxl list -m mem0 -L -u
> +{
> + "memdev":"mem0",
> + "ram_size":"1024.00 MiB (1073.74 MB)",
> + "ram_qos_class":42,
> + "serial":"0x0",
> + "numa_node":1,
> + "host":"0000:35:00.0",
> + "media_errors":[
> + {
> + "offset":"0x1000",
> + "length":64,
> + "source":"Injected"
> + }
> + ]
> +}
> +
> +# cxl clear-error mem0 -a 0x1000
> +poison cleared at mem0:0x1000
> +
> +# cxl list -m mem0 -L -u
> +{
> + "memdev":"mem0",
> + "ram_size":"1024.00 MiB (1073.74 MB)",
> + "ram_qos_class":42,
> + "serial":"0x0",
> + "numa_node":1,
> + "host":"0000:35:00.0",
> + "media_errors":[
> + ]
> +}
> +
> +----
> +
> +This command depends on the kernel debug filesystem (debugfs) to clear device poison.
> +
> +OPTIONS
> +-------
> +-a::
> +--address::
> + Device physical address (DPA) to clear poison from. Address can be specified
> + in hex or decimal. Required for clearing poison.
> +
> +--debug::
> + Enable debug output
> diff --git a/Documentation/cxl/cxl-inject-error.txt b/Documentation/cxl/cxl-inject-error.txt
> new file mode 100644
> index 0000000..80d03be
> --- /dev/null
> +++ b/Documentation/cxl/cxl-inject-error.txt
> @@ -0,0 +1,161 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +cxl-inject-error(1)
> +===================
> +
> +NAME
> +----
> +cxl-inject-error - Inject CXL errors into CXL devices
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'cxl inject-error' <device name> [<options>]
> +
> +WARNING: Error injection can cause system instability and should only be used
> +for debugging hardware and software error recovery flows. Use at your own risk!
> +
> +Inject an error into a CXL device. The types of errors supported depend on the
> +device specified. The types of devices supported are:
> +
> +"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
> +Eligible ports will have their 'protocol_injectable' attribute in 'cxl-list'
> +set to true. Dports are specified by host name ("0000:0e:01.1").
> +"memdevs":: A CXL memory device. Memory devices are specified by device name
> +("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
> +
> +There are two types of errors which can be injected: CXL protocol errors
> +and device poison.
> +
> +CXL protocol errors can only be used with downstream ports (as defined above).
> +Protocol errors follow the format of "<protocol>-<severity>". For example,
> +a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
> +found in the "injectable_protocol_errors" list under a CXL bus object. This
> +list is only available when the CXL debugfs is accessible (normally mounted
> +at "/sys/kernel/debug/cxl"). For example:
> +
> +----
> +
> +# cxl list -B
> +[
> + {
> + "bus":"root0",
> + "provider":"ACPI.CXL",
> + "injectable_protocol_errors":[
> + "mem-correctable",
> + "mem-fatal"
> + ]
> + }
> +]
> +
> +----
> +
> +CXL protocol (CXL.cache/mem) error injection requires the platform to support
> +ACPI v6.5+ error injection (EINJ). In addition to platform support, the
> +CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
> +must be enabled. For more information, see the Linux kernel documentation on
> +EINJ. Example using the bus output above:
> +
> +----
> +
> +# cxl list -TP
> +[
> + {
> + "port":"port1",
> + "host":"pci0000:e0",
> + "depth":1,
> + "decoders_committed":1,
> + "nr_dports":1,
> + "dports":[
> + {
> + "dport":"0000:e0:01.1",
> + "alias":"device:02",
> + "id":0,
> + "protocol_injectable":true
> + }
> + ]
> + }
> +]
> +
> +# cxl inject-error "0000:e0:01.1" -t mem-correctable
> +cxl inject-error: inject_proto_err: injected mem-correctable protocol error.
> +
> +----
> +
> +Device poison can only be used with CXL memory devices. A device physical address
> +(DPA) is required for poison injection. DPAs range from 0 to the size of the
> +device's memory, which can be found using 'cxl-list'. An example injection:
> +
> +----
> +
> +# cxl inject-error mem0 -t poison -a 0x1000
> +poison injected at mem0:0x1000
> +# cxl list -m mem0 -u --media-errors
> +{
> + "memdev":"mem0",
> + "ram_size":"256.00 MiB (268.44 MB)",
> + "serial":"0",
> + "host":"0000:0d:00.0",
> + "firmware_version":"BWFW VERSION 00",
> + "media_errors":[
> + {
> + "offset":"0x1000",
> + "length":64,
> + "source":"Injected"
> + }
> + ]
> +}
> +
> +----
> +
> +Not all memory devices support poison injection. To see if a device supports
> +poison injection through debugfs, use 'cxl-list' and look for the "poison_injectable"
> +attribute under the device. This attribute is only available when the CXL debugfs
> +is accessible. Example:
> +
> +----
> +
> +# cxl list -u -m mem0
> +{
> + "memdev":"mem0",
> + "ram_size":"256.00 MiB (268.44 MB)",
> + "serial":"0",
> + "host":"0000:0d:00.0",
> + "firmware_version":"BWFW VERSION 00",
> + "poison_injectable":true
> +}
> +
> +----
> +
> +This command depends on the kernel debug filesystem (debugfs) to do CXL protocol
> +error and device poison injection.
> +
> +OPTIONS
> +-------
> +-a::
> +--address::
> + Device physical address (DPA) to use for poison injection. Address can
> + be specified in hex or decimal. Required for poison injection.
> +
> +-t::
> +--type::
> + Type of error to inject into <device name>. The type of error is restricted
> + by device type. The following shows the possible types under their associated
> + device type(s):
> +----
> +
> +Downstream Ports: ::
> + cache-correctable, cache-uncorrectable, cache-fatal, mem-correctable,
> + mem-uncorrectable, mem-fatal
> +
> +Memdevs: ::
> + poison
> +
> +----
> +
> +--debug::
> + Enable debug output
> +
> +SEE ALSO
> +--------
> +linkcxl:cxl-list[1]
> diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
> index 8085c1c..0b75eed 100644
> --- a/Documentation/cxl/meson.build
> +++ b/Documentation/cxl/meson.build
> @@ -50,6 +50,8 @@ cxl_manpages = [
> 'cxl-update-firmware.txt',
> 'cxl-set-alert-config.txt',
> 'cxl-wait-sanitize.txt',
> + 'cxl-inject-error.txt',
> + 'cxl-clear-error.txt',
> ]
>
> foreach man : cxl_manpages
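The man page states that DPAs range from 0 to the size of the device's memory. The sketch below is a client-side pre-check along those lines, built around a hypothetical check_poison_dpa() helper. The 64-byte alignment rule is an assumption, based on the 64-byte "length" of the media_errors records shown in the examples. The kernel applies its own validation regardless, so this check would only give an earlier, friendlier error.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed poison granularity; matches the "length":64 records above. */
#define POISON_GRANULE 64ULL

/*
 * Hypothetical pre-check: the DPA must fall inside [0, dev_size) and,
 * under the stated assumption, be aligned to POISON_GRANULE bytes.
 */
static int check_poison_dpa(uint64_t dpa, uint64_t dev_size)
{
	if (dpa >= dev_size)
		return -ERANGE;
	if (dpa % POISON_GRANULE)
		return -EINVAL;
	return 0;
}
```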
* Re: [PATCH 2/7] libcxl: Add CXL protocol errors
2026-01-09 17:54 ` Dave Jiang
@ 2026-01-12 17:20 ` Cheatham, Benjamin
0 siblings, 0 replies; 19+ messages in thread
From: Cheatham, Benjamin @ 2026-01-12 17:20 UTC (permalink / raw)
To: Dave Jiang, nvdimm, alison.schofield; +Cc: linux-cxl
Hey Dave, thanks for taking a look! Responses inline.
On 1/9/2026 11:54 AM, Dave Jiang wrote:
>
>
> On 1/9/26 9:07 AM, Ben Cheatham wrote:
>> The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
>> injection for platforms that implement the error types according to
>> the v6.5+ ACPI specification. The interface for injecting these errors
>> is provided by the kernel under the CXL debugfs. The relevant files in
>> the interface are the einj_types file, which provides the available CXL
>> error types for injection, and the einj_inject file, which injects the
>> error into a CXL VH root port or CXL RCH downstream port.
>>
>> Add a library API to retrieve the CXL error types and inject them. This
>> API will be used in a later commit by the 'cxl-inject-error' and
>> 'cxl-list' commands.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>
> Just a nit below. otherwise
>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
[snip]
>> +static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
>> +{
>> + struct cxl_protocol_error *perror;
>> + char buf[SYSFS_ATTR_SIZE];
>> + char *path, *num, *save;
>> + size_t path_len, len;
>> + unsigned long n;
>> + int rc = 0;
>> +
>> + if (!ctx->cxl_debugfs)
>> + return;
>> +
>> + path_len = strlen(ctx->cxl_debugfs) + 100;
>> + path = calloc(1, path_len);
>
> Maybe just use PATH_MAX from <linux/limits.h>.
Sounds good to me, I'll change it.
* Re: [PATCH 3/7] libcxl: Add poison injection support
2026-01-09 18:03 ` Dave Jiang
@ 2026-01-12 17:20 ` Cheatham, Benjamin
0 siblings, 0 replies; 19+ messages in thread
From: Cheatham, Benjamin @ 2026-01-12 17:20 UTC (permalink / raw)
To: Dave Jiang, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/2026 12:03 PM, Dave Jiang wrote:
>
>
> On 1/9/26 9:07 AM, Ben Cheatham wrote:
>> Add a library API for clearing and injecting poison into a CXL memory
>> device through the CXL debugfs.
>>
>> This API will be used by the 'cxl-inject-error' and 'cxl-clear-error'
>> commands in later commits.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>> cxl/lib/libcxl.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++
>> cxl/lib/libcxl.sym | 3 ++
>> cxl/libcxl.h | 3 ++
>> 3 files changed, 89 insertions(+)
>>
>> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
>> index 27ff037..deebf7f 100644
>> --- a/cxl/lib/libcxl.c
>> +++ b/cxl/lib/libcxl.c
>> @@ -5046,3 +5046,86 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
>> {
>> return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
>> }
>> +
>> +CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
>> +{
>> + struct cxl_ctx *ctx = memdev->ctx;
>> + size_t path_len, len;
>> + bool exists = true;
>> + char *path;
>> + int rc;
>> +
>> + if (!ctx->cxl_debugfs)
>> + return false;
>> +
>> + path_len = strlen(ctx->cxl_debugfs) + 100;
>
> Same comment about PATH_MAX.
I'll change it (here and everywhere else).
>
>> + path = calloc(path_len, sizeof(char));
>> + if (!path)
>> + return false;
>> +
>> + len = snprintf(path, path_len, "%s/%s/inject_poison", ctx->cxl_debugfs,
>> + cxl_memdev_get_devname(memdev));
>> + if (len >= path_len) {
>> + err(ctx, "%s: buffer too small\n",
>> + cxl_memdev_get_devname(memdev));
>> + free(path);
>> + return false;
>
> I think I saw in an earlier patch that you were using goto to funnel error exit points, so you may as well make it consistent and do it here too.
Sure, I'll update this and the function below. I already screwed up one of the return paths last revision so
it's probably warranted.
>
>> + }
>> +
>> + rc = access(path, F_OK);
>> + if (rc)
>> + exists = false;
>> +
>> + free(path);
>> + return exists;
>> +}
>> +
>> +static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
>> + bool clear)
>> +{
>> + struct cxl_ctx *ctx = memdev->ctx;
>> + size_t path_len, len;
>> + char addr[32];
>> + char *path;
>> + int rc;
>> +
>> + if (!ctx->cxl_debugfs)
>> + return -ENOENT;
>> +
>> + path_len = strlen(ctx->cxl_debugfs) + 100;
>
> same comment about path len
>
>> + path = calloc(path_len, sizeof(char));
>> + if (!path)
>> + return -ENOMEM;
>> +
>> + len = snprintf(path, path_len, "%s/%s/%s", ctx->cxl_debugfs,
>> + cxl_memdev_get_devname(memdev),
>> + clear ? "clear_poison" : "inject_poison");
>> + if (len >= path_len) {
>> + err(ctx, "%s: buffer too small\n",
>> + cxl_memdev_get_devname(memdev));
>> + free(path);
>> + return -ENOMEM;
>
> same comment about error paths
>
> DJ
>
>> + }
>> +
>> + len = snprintf(addr, sizeof(addr), "0x%zx\n", dpa);
>> + if (len >= sizeof(addr)) {
>> + err(ctx, "%s: buffer too small\n",
>> + cxl_memdev_get_devname(memdev));
>> + free(path);
>> + return -ENOMEM;
>> + }
>> +
>> + rc = sysfs_write_attr(ctx, path, addr);
>> + free(path);
>> + return rc;
>> +}
>> +
>> +CXL_EXPORT int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t addr)
>> +{
>> + return cxl_memdev_poison_action(memdev, addr, false);
>> +}
>> +
>> +CXL_EXPORT int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t addr)
>> +{
>> + return cxl_memdev_poison_action(memdev, addr, true);
>> +}
>> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
>> index c683b83..c636edb 100644
>> --- a/cxl/lib/libcxl.sym
>> +++ b/cxl/lib/libcxl.sym
>> @@ -309,4 +309,7 @@ global:
>> cxl_protocol_error_get_num;
>> cxl_protocol_error_get_str;
>> cxl_dport_protocol_error_inject;
>> + cxl_memdev_has_poison_injection;
>> + cxl_memdev_inject_poison;
>> + cxl_memdev_clear_poison;
>> } LIBCXL_10;
>> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
>> index faef62e..4d035f0 100644
>> --- a/cxl/libcxl.h
>> +++ b/cxl/libcxl.h
>> @@ -105,6 +105,9 @@ int cxl_memdev_read_label(struct cxl_memdev *memdev, void *buf, size_t length,
>> size_t offset);
>> int cxl_memdev_write_label(struct cxl_memdev *memdev, void *buf, size_t length,
>> size_t offset);
>> +bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev);
>> +int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa);
>> +int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa);
>> struct cxl_cmd *cxl_cmd_new_get_fw_info(struct cxl_memdev *memdev);
>> unsigned int cxl_cmd_fw_info_get_num_slots(struct cxl_cmd *cmd);
>> unsigned int cxl_cmd_fw_info_get_active_slot(struct cxl_cmd *cmd);
>
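A minimal sketch of the single-exit style Dave asks for, applied to a poison helper that also follows his PATH_MAX suggestion. `write_attr()` is a hypothetical stand-in for ndctl's internal sysfs_write_attr(), and `memdev_poison_action()` takes the debugfs directory and device name as plain strings rather than a `struct cxl_memdev`, so the sketch compiles outside libcxl.

```c
#define _POSIX_C_SOURCE 200809L /* PATH_MAX */
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for sysfs_write_attr(): write val to path. */
static int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -errno;
	if (fputs(val, f) == EOF)
		rc = -EIO;
	if (fclose(f) == EOF && rc == 0)
		rc = -errno;
	return rc;
}

/*
 * Every failure after the allocation jumps to "out", so the path buffer
 * is freed exactly once no matter which check fails.
 */
static int memdev_poison_action(const char *cxl_debugfs, const char *devname,
				size_t dpa, bool clear)
{
	char addr[32];
	char *path;
	int len, rc;

	path = calloc(PATH_MAX, 1);
	if (!path)
		return -ENOMEM;

	len = snprintf(path, PATH_MAX, "%s/%s/%s", cxl_debugfs, devname,
		       clear ? "clear_poison" : "inject_poison");
	if (len < 0 || len >= PATH_MAX) {
		rc = -ENAMETOOLONG;
		goto out;
	}

	/* %zx is the conversion that matches size_t on 32- and 64-bit builds */
	len = snprintf(addr, sizeof(addr), "0x%zx\n", dpa);
	if (len < 0 || (size_t)len >= sizeof(addr)) {
		rc = -EINVAL;
		goto out;
	}

	rc = write_attr(path, addr);
out:
	free(path);
	return rc;
}
```

With this shape, adding another validation step later cannot leak `path`, which is the class of mistake Ben mentions from the previous revision.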
* Re: [PATCH 4/7] cxl: Add inject-error command
2026-01-09 21:53 ` Dave Jiang
@ 2026-01-12 17:20 ` Cheatham, Benjamin
0 siblings, 0 replies; 19+ messages in thread
From: Cheatham, Benjamin @ 2026-01-12 17:20 UTC (permalink / raw)
To: Dave Jiang, nvdimm, alison.schofield; +Cc: linux-cxl
On 1/9/2026 3:53 PM, Dave Jiang wrote:
>
>
> On 1/9/26 9:07 AM, Ben Cheatham wrote:
>> Add the 'cxl-inject-error' command. This command will provide CXL
>> protocol error injection for CXL VH root ports and CXL RCH downstream
>> ports, as well as poison injection for CXL memory devices.
>>
>> Add util_cxl_dport_filter() to find downstream ports by device name.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
[snip]
>> +
>> +static int poison_action(struct cxl_ctx *ctx, const char *filter,
>> + const char *addr_str)
>> +{
>> + struct cxl_memdev *memdev;
>> + unsigned long long addr;
>> + int rc;
>> +
>> + memdev = find_cxl_memdev(ctx, filter);
>> + if (!memdev)
>> + return -ENODEV;
>> +
>> + if (!cxl_memdev_has_poison_injection(memdev)) {
>> + log_err(&iel, "%s does not support error injection\n",
>> + cxl_memdev_get_devname(memdev));
>> + return -EINVAL;
>> + }
>> +
>> + if (!addr_str) {
>> + log_err(&iel, "no address provided\n");
>> + return -EINVAL;
>> + }
>> +
>> + errno = 0;
>
> Why does errno needs to be set here?
Alison suggested it last revision. It's been a while so my memory is a bit hazy,
but I think strtoull() doesn't reset errno and checking it below could cause a problem
if it was set to ERANGE by a previous function.
>
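The C standard specifies that no library function sets errno to zero, and strtoull() only sets it (to ERANGE) on overflow, so without the reset a stale value from an earlier call would be indistinguishable from a real overflow. The hypothetical `parse_addr()` below illustrates the reset. It also checks the end pointer and rejects a leading minus sign (which strtoull() otherwise accepts and negates), two checks the quoted patch does not make.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for a user-supplied address (decimal, octal, or 0x hex).
 * Returns 0 and stores the value in *out, or returns a negative errno.
 */
static int parse_addr(const char *str, unsigned long long *out)
{
	unsigned long long val;
	char *end;

	/* strtoull() would silently turn "-1" into ULLONG_MAX */
	if (str[strspn(str, " \t\n\v\f\r")] == '-')
		return -EINVAL;

	/*
	 * strtoull() sets errno only on failure and never clears it, so a
	 * stale ERANGE from an earlier call must be wiped before checking.
	 */
	errno = 0;
	val = strtoull(str, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == str || *end != '\0')
		return -EINVAL; /* no digits, or trailing garbage */

	*out = val;
	return 0;
}
```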
>> + addr = strtoull(addr_str, NULL, 0);
>> + if (addr == ULLONG_MAX && errno == ERANGE) {
>> + log_err(&iel, "invalid address %s\n", addr_str);
>> + return -EINVAL;
>> + }
>> +
>> + rc = cxl_memdev_inject_poison(memdev, addr);
>> + if (rc)
>> + log_err(&iel, "failed to inject poison at %s:%s: %s\n",
>> + cxl_memdev_get_devname(memdev), addr_str, strerror(-rc));
>
> We don't error if poison fails to inject?
It does? The return code of cxl_memdev_inject_poison() is returned below, the only
thing that this if statement does is pick which message is emitted.
Thanks,
Ben
>
> DJ
>
>> + else
>> + log_info(&iel, "poison injected at %s:%s\n",
>> + cxl_memdev_get_devname(memdev), addr_str);
>> +
>> + return rc;
>> +}
>> +
>> +static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
>> + const struct option *options, const char *usage)
>> +{
>> + struct cxl_protocol_error *perr;
>> + const char * const u[] = {
>> + usage,
>> + NULL
>> + };
>> + int rc = -EINVAL;
>> +
>> + log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG");
>> + argc = parse_options(argc, argv, options, u, 0);
>> +
>> + if (debug) {
>> + cxl_set_log_priority(ctx, LOG_DEBUG);
>> + iel.log_priority = LOG_DEBUG;
>> + } else {
>> + iel.log_priority = LOG_INFO;
>> + }
>> +
>> + if (argc != 1 || inj_param.type == NULL) {
>> + usage_with_options(u, options);
>> + return rc;
>> + }
>> +
>> + if (strcmp(inj_param.type, "poison") == 0) {
>> + rc = poison_action(ctx, argv[0], inj_param.address);
>> + return rc;
>> + }
>> +
>> + perr = find_cxl_proto_err(ctx, inj_param.type);
>> + if (perr) {
>> + rc = inject_proto_err(ctx, argv[0], perr);
>> + if (rc)
>> + log_err(&iel, "Failed to inject error: %d\n", rc);
>> + }
>> +
>> + return rc;
>> +}
>> +
>> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
>> +{
>> + int rc = inject_action(argc, argv, ctx, inject_options,
>> + "inject-error <device> -t <type> [<options>]");
>> +
>> + return rc ? EXIT_FAILURE : EXIT_SUCCESS;
>> +}
>> diff --git a/cxl/meson.build b/cxl/meson.build
>> index b9924ae..92031b5 100644
>> --- a/cxl/meson.build
>> +++ b/cxl/meson.build
>> @@ -7,6 +7,7 @@ cxl_src = [
>> 'memdev.c',
>> 'json.c',
>> 'filter.c',
>> + 'inject-error.c',
>> '../daxctl/json.c',
>> '../daxctl/filter.c',
>> ]
>
* [PATCH 1/7] libcxl: Add debugfs path to CXL context
2026-01-22 20:37 [ndctl PATCH v7 0/7] Add error injection support Ben Cheatham
@ 2026-01-22 20:37 ` Ben Cheatham
0 siblings, 0 replies; 19+ messages in thread
From: Ben Cheatham @ 2026-01-22 20:37 UTC (permalink / raw)
To: nvdimm, alison.schofield, dave.jiang; +Cc: linux-cxl, benjamin.cheatham
Find the CXL debugfs mount point and add it to the CXL library context.
This will be used by the poison and protocol error library functions to
access the information presented by the filesystem.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
cxl/lib/libcxl.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 32728de..6b7e92c 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -8,6 +8,8 @@
#include <stdlib.h>
#include <dirent.h>
#include <unistd.h>
+#include <mntent.h>
+#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
@@ -54,6 +56,7 @@ struct cxl_ctx {
struct kmod_ctx *kmod_ctx;
struct daxctl_ctx *daxctl_ctx;
void *private_data;
+ char *cxl_debugfs;
};
static void free_pmem(struct cxl_pmem *pmem)
@@ -240,6 +243,38 @@ CXL_EXPORT void *cxl_get_private_data(struct cxl_ctx *ctx)
return ctx->private_data;
}
+static char *get_cxl_debugfs_dir(void)
+{
+ char *debugfs_dir = NULL;
+ struct mntent *ent;
+ FILE *mntf;
+
+ mntf = setmntent("/proc/mounts", "r");
+ if (!mntf)
+ return NULL;
+
+ while ((ent = getmntent(mntf)) != NULL) {
+ if (!strcmp(ent->mnt_type, "debugfs")) {
+ /* Magic '5' here is length of "/cxl" + NUL terminator */
+ debugfs_dir = calloc(strlen(ent->mnt_dir) + 5, 1);
+ if (!debugfs_dir)
+ break; /* fall through to endmntent() */
+
+ strcpy(debugfs_dir, ent->mnt_dir);
+ strcat(debugfs_dir, "/cxl");
+ if (access(debugfs_dir, F_OK) != 0) {
+ free(debugfs_dir);
+ debugfs_dir = NULL;
+ }
+
+ break;
+ }
+ }
+
+ endmntent(mntf);
+ return debugfs_dir;
+}
+
/**
* cxl_new - instantiate a new library context
* @ctx: context to establish
@@ -295,6 +330,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
c->udev = udev;
c->udev_queue = udev_queue;
c->timeout = 5000;
+ c->cxl_debugfs = get_cxl_debugfs_dir();
return 0;
@@ -350,6 +386,7 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
kmod_unref(ctx->kmod_ctx);
daxctl_unref(ctx->daxctl_ctx);
info(ctx, "context %p released\n", ctx);
+ free((void *)ctx->cxl_debugfs);
free(ctx);
}
--
2.52.0
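The `+ 5` in get_cxl_debugfs_dir() depends on a hand-counted length. Below is a sketch of the same lookup that lets snprintf() size the join into a PATH_MAX buffer, in line with Dave's PATH_MAX comments elsewhere in this thread. `find_cxl_debugfs_dir()` is a hypothetical name, and it takes the mount table path as a parameter (the patch hardcodes /proc/mounts) so it can be pointed at a test file.

```c
#define _GNU_SOURCE /* setmntent()/getmntent()/strdup() on glibc */
#include <limits.h>
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return a malloc'd "<debugfs>/cxl" path if it exists, else NULL. */
static char *find_cxl_debugfs_dir(const char *mounts)
{
	char buf[PATH_MAX];
	char *dir = NULL;
	struct mntent *ent;
	FILE *mntf;

	mntf = setmntent(mounts, "r");
	if (!mntf)
		return NULL;

	while ((ent = getmntent(mntf)) != NULL) {
		int len;

		if (strcmp(ent->mnt_type, "debugfs") != 0)
			continue;

		/* snprintf() sizes the join; no hand-counted length needed */
		len = snprintf(buf, sizeof(buf), "%s/cxl", ent->mnt_dir);
		if (len > 0 && (size_t)len < sizeof(buf) &&
		    access(buf, F_OK) == 0)
			dir = strdup(buf); /* NULL on allocation failure */
		break; /* only the first debugfs mount is considered */
	}

	/* every path after setmntent() goes through endmntent() */
	endmntent(mntf);
	return dir;
}
```

Like the patch, it stops at the first debugfs entry in the mount table.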
Thread overview: 19+ messages
2026-01-09 16:07 [ndctl PATCH v6 0/7] Add error injection support Ben Cheatham
2026-01-09 16:07 ` [PATCH 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
2026-01-09 17:43 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 2/7] libcxl: Add CXL protocol errors Ben Cheatham
2026-01-09 17:54 ` Dave Jiang
2026-01-12 17:20 ` Cheatham, Benjamin
2026-01-09 16:07 ` [PATCH 3/7] libcxl: Add poison injection support Ben Cheatham
2026-01-09 18:03 ` Dave Jiang
2026-01-12 17:20 ` Cheatham, Benjamin
2026-01-09 16:07 ` [PATCH 4/7] cxl: Add inject-error command Ben Cheatham
2026-01-09 21:53 ` Dave Jiang
2026-01-12 17:20 ` Cheatham, Benjamin
2026-01-09 16:07 ` [PATCH 5/7] cxl: Add clear-error command Ben Cheatham
2026-01-09 22:12 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 6/7] cxl/list: Add injectable errors in output Ben Cheatham
2026-01-09 22:17 ` Dave Jiang
2026-01-09 16:07 ` [PATCH 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
2026-01-09 22:25 ` Dave Jiang
-- strict thread matches above, loose matches on Subject: below --
2026-01-22 20:37 [ndctl PATCH v7 0/7] Add error injection support Ben Cheatham
2026-01-22 20:37 ` [PATCH 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham