nvdimm.lists.linux.dev archive mirror
* [ndctl PATCH v2 0/7] Add error injection support
@ 2025-06-02 20:56 Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

v2 Changes:
	- Make the --clear option of 'inject-error' its own command (Alison)
	- Debugfs is now found using its /proc/mounts entry instead of
	providing the path using a --debugfs option
	- Man page added for 'clear-error'
	- Reword commit descriptions for clarity

This series adds support for injecting CXL protocol (CXL.cache/mem)
errors[1] into CXL RCH Downstream ports and VH root ports[2] and
poison into CXL memory devices through the CXL debugfs. Errors are
injected using a new 'inject-error' command, while injectable errors are
reported using a new cxl-list "-N"/"--injectable-errors" option. Device
poison can be cleared using the new 'clear-error' command.

The 'inject-error'/'clear-error' commands and "-N" option of cxl-list all
require access to the CXL driver's debugfs.

The documentation for the new cxl-inject-error command shows both usage
and the possible device/error types, as well as how to retrieve them
using cxl-list. The documentation for cxl-list has also been updated to
show the usage of the new injectable errors option.

[1]: ACPI v6.5 spec, section 18.6.4
[2]: ACPI v6.5 spec, table 18.31

--

Alison, I reached out to Junhyeok about his poison injection series but
never heard back, so I've just continued with my original plans for a
v2.

Quick note: My testing setup is screwed up at the moment, so this
revision is untested. I'll try to get it fixed for the next revision.

--

Ben Cheatham (7):
  libcxl: Add debugfs path to CXL context
  libcxl: Add CXL protocol errors
  libcxl: Add poison injection support
  cxl: Add inject-error command
  cxl: Add clear-error command
  cxl/list: Add injectable errors in output
  Documentation: Add docs for inject/clear-error commands

 Documentation/cxl/cxl-clear-error.txt  |  67 ++++++
 Documentation/cxl/cxl-inject-error.txt | 129 ++++++++++++
 Documentation/cxl/cxl-list.txt         |  35 +++-
 Documentation/cxl/meson.build          |   2 +
 cxl/builtin.h                          |   2 +
 cxl/cxl.c                              |   2 +
 cxl/filter.h                           |   3 +
 cxl/inject-error.c                     | 253 +++++++++++++++++++++++
 cxl/json.c                             |  30 +++
 cxl/lib/libcxl.c                       | 274 +++++++++++++++++++++++++
 cxl/lib/libcxl.sym                     |  12 ++
 cxl/lib/private.h                      |  14 ++
 cxl/libcxl.h                           |  16 ++
 cxl/list.c                             |   3 +
 cxl/meson.build                        |   1 +
 util/json.h                            |   1 +
 16 files changed, 843 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/cxl/cxl-clear-error.txt
 create mode 100644 Documentation/cxl/cxl-inject-error.txt
 create mode 100644 cxl/inject-error.c

-- 
2.34.1



* [ndctl PATCH v2 1/7] libcxl: Add debugfs path to CXL context
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
@ 2025-06-02 20:56 ` Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 2/7] libcxl: Add CXL protocol errors Ben Cheatham
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

Find the CXL debugfs mount point and add it to the CXL library context.
This will be used by poison and protocol error library functions to
access the information presented by the filesystem.
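
For readers following along, the lookup described above can be sketched
outside of libcxl as below. This is an illustrative sketch only, not the
code in this patch: find_mount_by_type() is a hypothetical helper name,
and it takes an already-open FILE so it can be exercised against any
file laid out like /proc/mounts. Unlike the patch, it skips malformed
lines rather than stopping at the first one.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libcxl): scan a mounts table laid out
 * like /proc/mounts ("<dev> <dir> <type> ...") and copy the mount point of
 * the first entry whose filesystem type matches fstype.
 * Returns 0 on success, -1 if nothing matched or dir_out is too small.
 */
int find_mount_by_type(FILE *fp, const char *fstype, char *dir_out, size_t len)
{
	char line[4096 + 256 + 1];
	char dev[256], dir[4096], type[64];

	while (fgets(line, sizeof(line), fp)) {
		/* Skip malformed lines instead of ending the search */
		if (sscanf(line, "%255s %4095s %63s", dev, dir, type) != 3)
			continue;

		if (strcmp(type, fstype) != 0)
			continue;

		if (strlen(dir) >= len)
			return -1;

		strcpy(dir_out, dir);
		return 0;
	}

	return -1;
}
```

A caller would typically pass the result of fopen("/proc/mounts", "r")
and "debugfs" as the type, then close the file afterwards.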

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/lib/libcxl.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 63aa4ef..3e0aa5c 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -54,6 +54,7 @@ struct cxl_ctx {
 	struct kmod_ctx *kmod_ctx;
 	struct daxctl_ctx *daxctl_ctx;
 	void *private_data;
+	const char *debugfs;
 };
 
 static void free_pmem(struct cxl_pmem *pmem)
@@ -239,6 +240,43 @@ CXL_EXPORT void *cxl_get_private_data(struct cxl_ctx *ctx)
 	return ctx->private_data;
 }
 
+static char *get_debugfs_dir(void)
+{
+	char *dev, *dir, *type, *ret = NULL;
+	char line[PATH_MAX + 256 + 1];
+	FILE *fp;
+
+	fp = fopen("/proc/mounts", "r");
+	if (!fp)
+		return ret;
+
+	while (fgets(line, sizeof(line), fp)) {
+		dev = strtok(line, " \t");
+		if (!dev)
+			break;
+
+		dir = strtok(NULL, " \t");
+		if (!dir)
+			break;
+
+		type = strtok(NULL, " \t");
+		if (!type)
+			break;
+
+		if (!strcmp(type, "debugfs")) {
+			ret = calloc(strlen(dir) + 1, 1);
+			if (!ret)
+				break;
+
+			strcpy(ret, dir);
+			break;
+		}
+	}
+
+	fclose(fp);
+	return ret;
+}
+
 /**
  * cxl_new - instantiate a new library context
  * @ctx: context to establish
@@ -294,6 +332,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
 	c->udev = udev;
 	c->udev_queue = udev_queue;
 	c->timeout = 5000;
+	c->debugfs = get_debugfs_dir();
 
 	return 0;
 
@@ -349,6 +388,7 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
 	kmod_unref(ctx->kmod_ctx);
 	daxctl_unref(ctx->daxctl_ctx);
 	info(ctx, "context %p released\n", ctx);
+	free((void *)ctx->debugfs);
 	free(ctx);
 }
 
-- 
2.34.1



* [ndctl PATCH v2 2/7] libcxl: Add CXL protocol errors
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
@ 2025-06-02 20:56 ` Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 3/7] libcxl: Add poison injection support Ben Cheatham
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
injection for platforms that implement the error types according to
the v6.5+ ACPI specification. The interface for injecting these errors
is provided by the kernel under the CXL debugfs. The relevant files in
the interface are the einj_types file, which provides the available CXL
error types for injection, and the einj_inject file, which injects the
error into a CXL VH root port or CXL RCH downstream port.

Add a library API to retrieve the CXL error types and inject them. This
API will be used in a later commit by the 'cxl-inject-error' and
'cxl-list' commands.
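
As a rough illustration of the einj_types parsing described above, the
sketch below turns text of the form "<hex number> <name>" per line into
a mask of the CXL protocol error bits (12 through 17, matching the table
in this patch). It is a standalone sketch under those assumptions:
parse_einj_types() and the two macros are hypothetical names, and the
library itself builds a list of error objects rather than a mask.

```c
#include <stdlib.h>
#include <string.h>

/* Bit positions of the CXL protocol error types listed in this patch */
#define CXL_EINJ_FIRST_BIT 12
#define CXL_EINJ_LAST_BIT 17

/*
 * Hypothetical helper: parse "<hex number> <name>\n" lines and return a
 * mask of the CXL protocol error bits that were listed. Lines that do not
 * start with a number, or that name a non-CXL error type, are ignored.
 */
unsigned long parse_einj_types(const char *text)
{
	unsigned long mask = 0;
	const char *line = text;

	while (line && *line) {
		char *end;
		unsigned long val = strtoul(line, &end, 16);

		if (end != line) {
			for (int bit = CXL_EINJ_FIRST_BIT;
			     bit <= CXL_EINJ_LAST_BIT; bit++)
				if (val == (1UL << bit))
					mask |= val;
		}

		/* Only the number matters: jump past the name to the next line */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return mask;
}
```

Testing a bit in the returned mask then answers whether a given error
type, for example bit 15 for "mem-correctable", was advertised.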

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/lib/libcxl.c   | 174 +++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |   9 +++
 cxl/lib/private.h  |  14 ++++
 cxl/libcxl.h       |  13 ++++
 4 files changed, 210 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 3e0aa5c..0403fa9 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -46,11 +46,13 @@ struct cxl_ctx {
 	void *userdata;
 	int memdevs_init;
 	int buses_init;
+	int perrors_init;
 	unsigned long timeout;
 	struct udev *udev;
 	struct udev_queue *udev_queue;
 	struct list_head memdevs;
 	struct list_head buses;
+	struct list_head perrors;
 	struct kmod_ctx *kmod_ctx;
 	struct daxctl_ctx *daxctl_ctx;
 	void *private_data;
@@ -204,6 +206,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
 	free(bus);
 }
 
+static void free_protocol_error(struct cxl_protocol_error *perror,
+				struct list_head *head)
+{
+	if (head)
+		list_del_from(head, &perror->list);
+	free(perror);
+}
+
 /**
  * cxl_get_userdata - retrieve stored data pointer from library context
  * @ctx: cxl library context
@@ -327,6 +337,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
 	*ctx = c;
 	list_head_init(&c->memdevs);
 	list_head_init(&c->buses);
+	list_head_init(&c->perrors);
 	c->kmod_ctx = kmod_ctx;
 	c->daxctl_ctx = daxctl_ctx;
 	c->udev = udev;
@@ -368,6 +379,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
  */
 CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
 {
+	struct cxl_protocol_error *perror, *_p;
 	struct cxl_memdev *memdev, *_d;
 	struct cxl_bus *bus, *_b;
 
@@ -383,6 +395,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
 	list_for_each_safe(&ctx->buses, bus, _b, port.list)
 		free_bus(bus, &ctx->buses);
 
+	list_for_each_safe(&ctx->perrors, perror, _p, list)
+		free_protocol_error(perror, &ctx->perrors);
+
 	udev_queue_unref(ctx->udev_queue);
 	udev_unref(ctx->udev);
 	kmod_unref(ctx->kmod_ctx);
@@ -3305,6 +3320,165 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
 	return port->decoders_committed;
 }
 
+const struct cxl_protocol_error cxl_protocol_errors[] = {
+	CXL_PROTOCOL_ERROR(12, "cache-correctable"),
+	CXL_PROTOCOL_ERROR(13, "cache-uncorrectable"),
+	CXL_PROTOCOL_ERROR(14, "cache-fatal"),
+	CXL_PROTOCOL_ERROR(15, "mem-correctable"),
+	CXL_PROTOCOL_ERROR(16, "mem-uncorrectable"),
+	CXL_PROTOCOL_ERROR(17, "mem-fatal")
+};
+
+static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
+							    unsigned long n)
+{
+	struct cxl_protocol_error *perror;
+
+	for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
+		if (n != BIT(cxl_protocol_errors[i].num))
+			continue;
+
+		perror = calloc(1, sizeof(*perror));
+		if (!perror)
+			return NULL;
+
+		*perror = cxl_protocol_errors[i];
+		perror->ctx = ctx;
+		return perror;
+	}
+
+	return NULL;
+}
+
+static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
+{
+	struct cxl_protocol_error *perror;
+	char *path, *num, *save;
+	unsigned long n;
+	size_t path_len;
+	char buf[512];
+	int rc = 0;
+
+	if (!ctx->debugfs)
+		return;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(1, path_len);
+	if (!path)
+		return;
+
+	snprintf(path, path_len, "%s/cxl/einj_types", ctx->debugfs);
+	rc = access(path, F_OK);
+	if (rc) {
+		err(ctx, "failed to access %s: %s\n", path, strerror(errno));
+		goto err;
+	}
+
+	rc = sysfs_read_attr(ctx, path, buf);
+	if (rc) {
+		err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
+		goto err;
+	}
+
+	/*
+	 * The format of the output of the einj_types attr is:
+	 * <Error number in hex 1> <Error name 1>
+	 * <Error number in hex 2> <Error name 2>
+	 * ...
+	 *
+	 * We only need the number, so parse that and skip the rest of
+	 * the line.
+	 */
+	num = strtok_r(buf, " \n", &save);
+	while (num) {
+		n = strtoul(num, NULL, 16);
+		perror = create_cxl_protocol_error(ctx, n);
+		if (perror)
+			list_add(&ctx->perrors, &perror->list);
+
+		num = strtok_r(NULL, "\n", &save);
+		if (!num)
+			break;
+
+		num = strtok_r(NULL, " \n", &save);
+	}
+
+err:
+	free(path);
+}
+
+static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
+{
+	if (ctx->perrors_init)
+		return;
+
+	ctx->perrors_init = 1;
+	cxl_add_protocol_errors(ctx);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_first(struct cxl_ctx *ctx)
+{
+	cxl_protocol_errors_init(ctx);
+
+	return list_top(&ctx->perrors, struct cxl_protocol_error, list);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
+{
+	struct cxl_ctx *ctx = perror->ctx;
+
+	return list_next(&ctx->perrors, perror, list);
+}
+
+CXL_EXPORT unsigned long
+cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
+{
+	return perror->num;
+}
+
+CXL_EXPORT const char *
+cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
+{
+	return perror->string;
+}
+
+CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+					       unsigned long error)
+{
+	struct cxl_ctx *ctx = dport->port->ctx;
+	unsigned long path_len;
+	char buf[32] = { 0 };
+	char *path;
+	int rc;
+
+	if (!ctx->debugfs)
+		return -ENOENT;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(path_len, sizeof(char));
+	if (!path)
+		return -ENOMEM;
+
+	snprintf(path, path_len, "%s/cxl/%s/einj_inject", ctx->debugfs,
+		 cxl_dport_get_devname(dport));
+	if (access(path, F_OK)) {
+		rc = -errno;
+		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
+		free(path);
+		return rc;
+	}
+
+	snprintf(buf, sizeof(buf), "0x%lx\n", error);
+	rc = sysfs_write_attr(ctx, path, buf);
+	if (rc)
+		err(ctx, "failed to write %s: %s\n", path, strerror(-rc));
+
+	free(path);
+	return rc;
+}
+
 static void *add_cxl_bus(void *parent, int id, const char *cxlbus_base)
 {
 	const char *devname = devpath_to_devname(cxlbus_base);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 763151f..61ed0db 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -287,3 +287,12 @@ global:
 	cxl_memdev_trigger_poison_list;
 	cxl_region_trigger_poison_list;
 } LIBCXL_7;
+
+LIBCXL_9 {
+global:
+	cxl_protocol_error_get_first;
+	cxl_protocol_error_get_next;
+	cxl_protocol_error_get_num;
+	cxl_protocol_error_get_str;
+	cxl_dport_protocol_error_inject;
+} LIBCXL_8;
diff --git a/cxl/lib/private.h b/cxl/lib/private.h
index b6cd910..85806ac 100644
--- a/cxl/lib/private.h
+++ b/cxl/lib/private.h
@@ -102,6 +102,20 @@ struct cxl_port {
 	struct list_head dports;
 };
 
+struct cxl_protocol_error {
+	unsigned long num;
+	const char *string;
+	struct cxl_ctx *ctx;
+	struct list_node list;
+};
+
+#define CXL_PROTOCOL_ERROR(n, str)	\
+	((struct cxl_protocol_error){	\
+		.num = (n),		\
+		.string = (str),	\
+		.ctx = NULL,		\
+	})
+
 struct cxl_bus {
 	struct cxl_port port;
 };
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 43c082a..afa076a 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -486,6 +486,19 @@ int cxl_cmd_alert_config_set_enable_alert_actions(struct cxl_cmd *cmd,
 						  int enable);
 struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memdev);
 
+struct cxl_protocol_error;
+struct cxl_protocol_error *cxl_protocol_error_get_first(struct cxl_ctx *ctx);
+struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
+unsigned long cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
+const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
+int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+				    unsigned long error);
+
+#define cxl_protocol_error_foreach(ctx, perror)				       \
+	for (perror = cxl_protocol_error_get_first(ctx); perror != NULL;       \
+	     perror = cxl_protocol_error_get_next(perror))
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
-- 
2.34.1



* [ndctl PATCH v2 3/7] libcxl: Add poison injection support
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 2/7] libcxl: Add CXL protocol errors Ben Cheatham
@ 2025-06-02 20:56 ` Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 4/7] cxl: Add inject-error command Ben Cheatham
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

Add a library API for clearing and injecting poison into a CXL memory
device through the CXL debugfs.

This API will be used by the 'cxl-inject-error' and 'cxl-clear-error'
commands in later commits.
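
To make the debugfs interaction concrete, the sketch below shows the two
strings involved: the attribute path under "<debugfs>/cxl/<memdev>/" and
the address written to it as "0x<hex>" followed by a newline. The helper
names poison_attr_path() and format_dpa() are hypothetical and exist
only for illustration; the patch does the equivalent inline before
calling sysfs_write_attr().

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<debugfs>/cxl/<memdev>/{inject,clear}_poison"
 * into buf. Returns 0 on success, -1 if the path would be truncated.
 */
int poison_attr_path(char *buf, size_t len, const char *debugfs,
		     const char *memdev, bool clear)
{
	int n = snprintf(buf, len, "%s/cxl/%s/%s", debugfs, memdev,
			 clear ? "clear_poison" : "inject_poison");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: format a device physical address as "0x<hex>\n".
 * Returns 0 on success, -1 if the string would be truncated.
 */
int format_dpa(char *buf, size_t len, unsigned long long dpa)
{
	int n = snprintf(buf, len, "0x%llx\n", dpa);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking the snprintf() return value catches truncated paths, which
would otherwise show up later as a confusing write failure.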

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/lib/libcxl.c   | 60 ++++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |  3 +++
 cxl/libcxl.h       |  3 +++
 3 files changed, 66 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 0403fa9..e1c9951 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -4897,3 +4897,63 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
 {
 	return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
 }
+
+CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
+{
+	struct cxl_ctx *ctx = memdev->ctx;
+	size_t path_len;
+	bool exists;
+	char *path;
+
+	if (!ctx->debugfs)
+		return false;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(path_len, sizeof(char));
+	if (!path)
+		return false;
+
+	snprintf(path, path_len, "%s/cxl/%s/inject_poison", ctx->debugfs,
+		 cxl_memdev_get_devname(memdev));
+	exists = access(path, F_OK) == 0;
+
+	free(path);
+	return exists;
+}
+
+static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
+				    bool clear)
+{
+	struct cxl_ctx *ctx = memdev->ctx;
+	size_t path_len;
+	char addr[32];
+	char *path;
+	int rc;
+
+	if (!ctx->debugfs)
+		return -ENOENT;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(path_len, sizeof(char));
+	if (!path)
+		return -ENOMEM;
+
+	snprintf(path, path_len, "%s/cxl/%s/%s", ctx->debugfs,
+		 cxl_memdev_get_devname(memdev),
+		 clear ? "clear_poison" : "inject_poison");
+	snprintf(addr, sizeof(addr), "0x%zx\n", dpa);
+
+	rc = sysfs_write_attr(ctx, path, addr);
+	free(path);
+	return rc;
+}
+
+CXL_EXPORT int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t addr)
+{
+	return cxl_memdev_poison_action(memdev, addr, false);
+}
+
+CXL_EXPORT int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t addr)
+{
+	return cxl_memdev_poison_action(memdev, addr, true);
+}
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 61ed0db..012d344 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -295,4 +295,7 @@ global:
 	cxl_protocol_error_get_num;
 	cxl_protocol_error_get_str;
 	cxl_dport_protocol_error_inject;
+	cxl_memdev_has_poison_injection;
+	cxl_memdev_inject_poison;
+	cxl_memdev_clear_poison;
 } LIBCXL_8;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index afa076a..fa007d0 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -100,6 +100,9 @@ int cxl_memdev_read_label(struct cxl_memdev *memdev, void *buf, size_t length,
 		size_t offset);
 int cxl_memdev_write_label(struct cxl_memdev *memdev, void *buf, size_t length,
 		size_t offset);
+bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev);
+int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa);
+int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa);
 struct cxl_cmd *cxl_cmd_new_get_fw_info(struct cxl_memdev *memdev);
 unsigned int cxl_cmd_fw_info_get_num_slots(struct cxl_cmd *cmd);
 unsigned int cxl_cmd_fw_info_get_active_slot(struct cxl_cmd *cmd);
-- 
2.34.1



* [ndctl PATCH v2 4/7] cxl: Add inject-error command
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
                   ` (2 preceding siblings ...)
  2025-06-02 20:56 ` [ndctl PATCH v2 3/7] libcxl: Add poison injection support Ben Cheatham
@ 2025-06-02 20:56 ` Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 5/7] cxl: Add clear-error command Ben Cheatham
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

Add the 'cxl-inject-error' command. This command will provide CXL
protocol error injection for CXL VH root ports and CXL RCH downstream
ports, as well as poison injection for CXL memory devices.
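
Since the address option accepts hex or decimal input, a stricter parse
than a bare strtoull() call may be worth considering. The sketch below
is one possible shape, not what this patch implements: parse_dpa() is a
hypothetical helper that resets errno before the call and rejects empty
strings and trailing garbage. Note that base 0 also accepts octal
(leading "0").

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a hex ("0x...") or decimal address string.
 * Returns 0 and stores the value on success, -EINVAL for empty or
 * malformed input, -ERANGE if the value does not fit.
 */
int parse_dpa(const char *str, unsigned long long *dpa)
{
	unsigned long long val;
	char *end;

	if (!str || !*str)
		return -EINVAL;

	errno = 0; /* strtoull() leaves errno untouched on success */
	val = strtoull(str, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;

	/* No digits consumed, or characters left over after the number */
	if (end == str || *end != '\0')
		return -EINVAL;

	*dpa = val;
	return 0;
}
```

With this shape, inputs such as "0x1000" and "4096" both yield 4096,
while "zz" or "0x10q" are reported as invalid instead of silently
parsing as 0 or 16.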

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/builtin.h      |   1 +
 cxl/cxl.c          |   1 +
 cxl/inject-error.c | 196 +++++++++++++++++++++++++++++++++++++++++++++
 cxl/meson.build    |   1 +
 4 files changed, 199 insertions(+)
 create mode 100644 cxl/inject-error.c

diff --git a/cxl/builtin.h b/cxl/builtin.h
index c483f30..e82fcb5 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
 #ifdef ENABLE_LIBTRACEFS
 int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
 #else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index 1643667..a98bd6b 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
 	{ "disable-region", .c_fn = cmd_disable_region },
 	{ "destroy-region", .c_fn = cmd_destroy_region },
 	{ "monitor", .c_fn = cmd_monitor },
+	{ "inject-error", .c_fn = cmd_inject_error },
 };
 
 int main(int argc, const char **argv)
diff --git a/cxl/inject-error.c b/cxl/inject-error.c
new file mode 100644
index 0000000..bc46f82
--- /dev/null
+++ b/cxl/inject-error.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 AMD. All rights reserved. */
+#include <util/parse-options.h>
+#include <cxl/libcxl.h>
+#include <cxl/filter.h>
+#include <util/log.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <errno.h>
+#include <limits.h>
+
+#define EINJ_TYPES_BUF_SIZE 512
+
+static bool debug;
+
+static struct inject_params {
+	const char *type;
+	const char *address;
+} inj_param;
+
+static const struct option inject_options[] = {
+	OPT_STRING('t', "type", &inj_param.type, "Error type",
+		   "Error type to inject into <device>"),
+	OPT_STRING('a', "address", &inj_param.address, "Address for poison injection",
+		   "Device physical address for poison injection in hex or decimal"),
+#ifdef ENABLE_DEBUG
+	OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
+#endif
+	OPT_END(),
+};
+
+static struct log_ctx iel;
+
+static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
+						     const char *type)
+{
+	struct cxl_protocol_error *perror;
+
+	cxl_protocol_error_foreach(ctx, perror) {
+		if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
+			return perror;
+	}
+
+	log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
+	return NULL;
+}
+
+static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
+{
+	struct cxl_port *port, *top;
+	struct cxl_dport *dport;
+	struct cxl_bus *bus;
+
+	cxl_bus_foreach(ctx, bus) {
+		top = cxl_bus_get_port(bus);
+
+		cxl_port_foreach_all(top, port)
+			cxl_dport_foreach(port, dport)
+				if (!strcmp(devname,
+					    cxl_dport_get_devname(dport)))
+					return dport;
+	}
+
+	log_err(&iel, "Downstream port \"%s\" not found\n", devname);
+	return NULL;
+}
+
+static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx,
+					  const char *filter)
+{
+	struct cxl_memdev *memdev;
+
+	cxl_memdev_foreach(ctx, memdev) {
+		if (util_cxl_memdev_filter(memdev, filter, NULL))
+			return memdev;
+	}
+
+	log_err(&iel, "Memdev \"%s\" not found\n", filter);
+	return NULL;
+}
+
+static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
+			    struct cxl_protocol_error *perror)
+{
+	struct cxl_dport *dport;
+	int rc;
+
+	if (!devname) {
+		log_err(&iel, "No downstream port specified for injection\n");
+		return -EINVAL;
+	}
+
+	dport = find_cxl_dport(ctx, devname);
+	if (!dport)
+		return -ENODEV;
+
+	rc = cxl_dport_protocol_error_inject(dport,
+					     cxl_protocol_error_get_num(perror));
+	if (rc)
+		return rc;
+
+	printf("injected %s protocol error.\n",
+	       cxl_protocol_error_get_str(perror));
+	return 0;
+}
+
+static int poison_action(struct cxl_ctx *ctx, const char *filter,
+			 const char *addr)
+{
+	struct cxl_memdev *memdev;
+	size_t a;
+	int rc;
+
+	memdev = find_cxl_memdev(ctx, filter);
+	if (!memdev)
+		return -ENODEV;
+
+	if (!cxl_memdev_has_poison_injection(memdev)) {
+		log_err(&iel, "%s does not support error injection\n",
+			cxl_memdev_get_devname(memdev));
+		return -EINVAL;
+	}
+
+	if (!addr) {
+		log_err(&iel, "no address provided\n");
+		return -EINVAL;
+	}
+
+	a = strtoull(addr, NULL, 0);
+	if (a == ULLONG_MAX && errno == ERANGE) {
+		log_err(&iel, "invalid address %s\n", addr);
+		return -EINVAL;
+	}
+
+	rc = cxl_memdev_inject_poison(memdev, a);
+
+	if (rc)
+		log_err(&iel, "failed to inject poison at %s:%s: %s\n",
+			cxl_memdev_get_devname(memdev), addr, strerror(-rc));
+	else
+		printf("poison injected at %s:%s\n",
+		       cxl_memdev_get_devname(memdev), addr);
+
+	return rc;
+}
+
+static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
+			 const struct option *options, const char *usage)
+{
+	struct cxl_protocol_error *perr;
+	const char * const u[] = {
+		usage,
+		NULL
+	};
+	int rc = -EINVAL;
+
+	log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG");
+	argc = parse_options(argc, argv, options, u, 0);
+
+	if (debug) {
+		cxl_set_log_priority(ctx, LOG_DEBUG);
+		iel.log_priority = LOG_DEBUG;
+	} else {
+		iel.log_priority = LOG_INFO;
+	}
+
+	if (argc != 1) {
+		usage_with_options(u, options);
+		return rc;
+	}
+
+	if (strcmp(inj_param.type, "poison") == 0) {
+		rc = poison_action(ctx, argv[0], inj_param.address);
+		return rc;
+	}
+
+	perr = find_cxl_proto_err(ctx, inj_param.type);
+	if (!perr)
+		return rc;
+
+	rc = inject_proto_err(ctx, argv[0], perr);
+	if (rc)
+		log_err(&iel, "Failed to inject error: %d\n", rc);
+
+	return rc;
+}
+
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+	int rc = inject_action(argc, argv, ctx, inject_options,
+			       "inject-error <device> [<options>]");
+
+	return rc ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
diff --git a/cxl/meson.build b/cxl/meson.build
index e4d1683..29918e4 100644
--- a/cxl/meson.build
+++ b/cxl/meson.build
@@ -7,6 +7,7 @@ cxl_src = [
   'memdev.c',
   'json.c',
   'filter.c',
+  'inject-error.c',
   '../daxctl/json.c',
   '../daxctl/filter.c',
 ]
-- 
2.34.1



* [ndctl PATCH v2 5/7] cxl: Add clear-error command
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
                   ` (3 preceding siblings ...)
  2025-06-02 20:56 ` [ndctl PATCH v2 4/7] cxl: Add inject-error command Ben Cheatham
@ 2025-06-02 20:56 ` Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 6/7] cxl/list: Add injectable errors in output Ben Cheatham
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

Add the 'cxl-clear-error' command. This command allows the user to clear
device poison from CXL memory devices.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/builtin.h      |  1 +
 cxl/cxl.c          |  1 +
 cxl/inject-error.c | 67 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/cxl/builtin.h b/cxl/builtin.h
index e82fcb5..68ed1de 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -26,6 +26,7 @@ int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx);
 #ifdef ENABLE_LIBTRACEFS
 int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
 #else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index a98bd6b..e1740b5 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -81,6 +81,7 @@ static struct cmd_struct commands[] = {
 	{ "destroy-region", .c_fn = cmd_destroy_region },
 	{ "monitor", .c_fn = cmd_monitor },
 	{ "inject-error", .c_fn = cmd_inject_error },
+	{ "clear-error", .c_fn = cmd_clear_error },
 };
 
 int main(int argc, const char **argv)
diff --git a/cxl/inject-error.c b/cxl/inject-error.c
index bc46f82..f8a9445 100644
--- a/cxl/inject-error.c
+++ b/cxl/inject-error.c
@@ -19,6 +19,10 @@ static struct inject_params {
 	const char *address;
 } inj_param;
 
+static struct clear_params {
+	const char *address;
+} clear_param;
+
 static const struct option inject_options[] = {
 	OPT_STRING('t', "type", &inj_param.type, "Error type",
 		   "Error type to inject into <device>"),
@@ -30,6 +34,15 @@ static const struct option inject_options[] = {
 	OPT_END(),
 };
 
+static const struct option clear_options[] = {
+	OPT_STRING('a', "address", &clear_param.address, "Address for poison clearing",
+		   "Device physical address to clear poison from in hex or decimal"),
+#ifdef ENABLE_DEBUG
+	OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
+#endif
+	OPT_END(),
+};
+
 static struct log_ctx iel;
 
 static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
@@ -106,7 +119,7 @@ static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
 }
 
 static int poison_action(struct cxl_ctx *ctx, const char *filter,
-			 const char *addr)
+			 const char *addr, bool clear)
 {
 	struct cxl_memdev *memdev;
 	size_t a;
@@ -133,13 +146,17 @@ static int poison_action(struct cxl_ctx *ctx, const char *filter,
 		return -EINVAL;
 	}
 
-	rc = cxl_memdev_inject_poison(memdev, a);
+	if (clear)
+		rc = cxl_memdev_clear_poison(memdev, a);
+	else
+		rc = cxl_memdev_inject_poison(memdev, a);
 
 	if (rc)
-		log_err(&iel, "failed to inject poison at %s:%s: %s\n",
+		log_err(&iel, "failed to %s %s:%s: %s\n",
+			clear ? "clear poison at" : "inject poison at",
 			cxl_memdev_get_devname(memdev), addr, strerror(-rc));
 	else
-		printf("poison injected at %s:%s\n",
+		printf("poison %s at %s:%s\n", clear ? "cleared" : "injected",
 		       cxl_memdev_get_devname(memdev), addr);
 
 	return rc;
@@ -171,7 +188,7 @@ static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
 	}
 
 	if (strcmp(inj_param.type, "poison") == 0) {
-		rc = poison_action(ctx, argv[0], inj_param.address);
+		rc = poison_action(ctx, argv[0], inj_param.address, false);
 		return rc;
 	}
 
@@ -194,3 +211,43 @@ int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
 	return rc ? EXIT_FAILURE : EXIT_SUCCESS;
 }
 
+static int clear_action(int argc, const char **argv, struct cxl_ctx *ctx,
+			const struct option *options, const char *usage)
+{
+	const char * const u[] = {
+		usage,
+		NULL
+	};
+	int rc = -EINVAL;
+
+	log_init(&iel, "cxl clear-error", "CXL_CLEAR_LOG");
+	argc = parse_options(argc, argv, options, u, 0);
+
+	if (debug) {
+		cxl_set_log_priority(ctx, LOG_DEBUG);
+		iel.log_priority = LOG_DEBUG;
+	} else {
+		iel.log_priority = LOG_INFO;
+	}
+
+	if (argc != 1) {
+		usage_with_options(u, options);
+		return rc;
+	}
+
+	rc = poison_action(ctx, argv[0], clear_param.address, true);
+	if (rc) {
+		log_err(&iel, "Failed to clear poison from %s: %s\n",
+			argv[0], strerror(-rc));
+		return rc;
+	}
+
+	return rc;
+}
+
+int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+	int rc = clear_action(argc, argv, ctx, clear_options,
+			      "clear-error <device> [<options>]");
+	return rc ? EXIT_FAILURE : EXIT_SUCCESS;
+}
-- 
2.34.1



* [ndctl PATCH v2 6/7] cxl/list: Add injectable errors in output
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
                   ` (4 preceding siblings ...)
  2025-06-02 20:56 ` [ndctl PATCH v2 5/7] cxl: Add clear-error command Ben Cheatham
@ 2025-06-02 20:56 ` Ben Cheatham
  2025-06-02 20:56 ` [ndctl PATCH v2 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
  2025-06-12  3:29 ` [ndctl PATCH v2 0/7] Add error injection support Alison Schofield
  7 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

Add the "--injectable-errors"/"-N" option to show injectable error
information for CXL devices. The applicable devices are CXL memory
devices and CXL busses.

For CXL memory devices the option reports whether the device supports
poison injection (the "--media-errors"/"-L" option shows injected
poison).

For CXL busses the option shows injectable CXL protocol error types. The
information will be the same across busses because the error types are
system-wide. The information is presented under the bus for easier
filtering.

Update the man page for 'cxl-list' to show the usage of the new option.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 Documentation/cxl/cxl-list.txt | 35 +++++++++++++++++++++++++++++++++-
 cxl/filter.h                   |  3 +++
 cxl/json.c                     | 30 +++++++++++++++++++++++++++++
 cxl/list.c                     |  3 +++
 util/json.h                    |  1 +
 5 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 9a9911e..358a64e 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -469,6 +469,38 @@ OPTIONS
 }
 ----
 
+-N::
+--injectable-errors::
+	Include injectable error information in the output. For CXL memory devices
+	this includes whether poison is injectable through the kernel debug filesystem.
+	The types of CXL protocol errors available for injection into downstream ports
+	are listed as part of a CXL bus object.
+
+----
+# cxl list -NB
+[
+  {
+	"bus":"root0",
+	"provider":"ACPI.CXL",
+	"injectable_protocol_errors":[
+	  "mem-correctable",
+	  "mem-fatal",
+	]
+  }
+]
+
+# cxl list -N
+[
+  {
+    "memdev":"mem0",
+    "pmem_size":268435456,
+    "ram_size":268435456,
+    "serial":2,
+    "poison_injectable":true
+  }
+]
+
+----
 -v::
 --verbose::
 	Increase verbosity of the output. This can be specified
@@ -485,7 +517,8 @@ OPTIONS
 	  devices with --idle.
 	- *-vvv*
 	  Everything *-vv* provides, plus enable
-	  --health, --partition, and --media-errors.
+	  --health, --partition, --media-errors, and
+	  --injectable-errors.
 
 --debug::
 	If the cxl tool was built with debug enabled, turn on debug
diff --git a/cxl/filter.h b/cxl/filter.h
index 956a46e..34f8387 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -31,6 +31,7 @@ struct cxl_filter_params {
 	bool alert_config;
 	bool dax;
 	bool media_errors;
+	bool inj_errors;
 	int verbose;
 	struct log_ctx ctx;
 };
@@ -91,6 +92,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
 		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
 	if (param->media_errors)
 		flags |= UTIL_JSON_MEDIA_ERRORS;
+	if (param->inj_errors)
+		flags |= UTIL_JSON_INJ_ERRORS;
 	return flags;
 }
 
diff --git a/cxl/json.c b/cxl/json.c
index e65bd80..6f1a7cf 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -855,6 +855,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 			json_object_object_add(jdev, "firmware", jobj);
 	}
 
+	if (flags & UTIL_JSON_INJ_ERRORS) {
+		jobj = json_object_new_boolean(cxl_memdev_has_poison_injection(memdev));
+		if (jobj)
+			json_object_object_add(jdev, "poison_injectable", jobj);
+	}
+
 	if (flags & UTIL_JSON_MEDIA_ERRORS) {
 		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
 		if (jobj)
@@ -930,6 +936,8 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
 					 unsigned long flags)
 {
 	const char *devname = cxl_bus_get_devname(bus);
+	struct cxl_ctx *ctx = cxl_bus_get_ctx(bus);
+	struct cxl_protocol_error *perror;
 	struct json_object *jbus, *jobj;
 
 	jbus = json_object_new_object();
@@ -945,6 +953,28 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
 		json_object_object_add(jbus, "provider", jobj);
 
 	json_object_set_userdata(jbus, bus, NULL);
+
+	if (flags & UTIL_JSON_INJ_ERRORS) {
+		jobj = json_object_new_array();
+		if (!jobj)
+			return jbus;
+
+		cxl_protocol_error_foreach(ctx, perror)
+		{
+			struct json_object *jerr_str;
+			const char *perror_str;
+
+			perror_str = cxl_protocol_error_get_str(perror);
+
+			jerr_str = json_object_new_string(perror_str);
+			if (jerr_str)
+				json_object_array_add(jobj, jerr_str);
+		}
+
+		json_object_object_add(jbus, "injectable_protocol_errors",
+				       jobj);
+	}
+
 	return jbus;
 }
 
diff --git a/cxl/list.c b/cxl/list.c
index 0b25d78..a505ed6 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -59,6 +59,8 @@ static const struct option options[] = {
 		    "include alert configuration information"),
 	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
 		    "include media-error information "),
+	OPT_BOOLEAN('N', "injectable-errors", &param.inj_errors,
+		    "include injectable error information"),
 	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
 #ifdef ENABLE_DEBUG
 	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
@@ -124,6 +126,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
 		param.alert_config = true;
 		param.dax = true;
 		param.media_errors = true;
+		param.inj_errors = true;
 		/* fallthrough */
 	case 2:
 		param.idle = true;
diff --git a/util/json.h b/util/json.h
index 560f845..57278cb 100644
--- a/util/json.h
+++ b/util/json.h
@@ -21,6 +21,7 @@ enum util_json_flags {
 	UTIL_JSON_TARGETS	= (1 << 11),
 	UTIL_JSON_PARTITION	= (1 << 12),
 	UTIL_JSON_ALERT_CONFIG	= (1 << 13),
+	UTIL_JSON_INJ_ERRORS	= (1 << 14),
 };
 
 void util_display_json_array(FILE *f_out, struct json_object *jarray,
-- 
2.34.1
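
The plumbing in this patch follows the usual cxl-list pattern: a boolean
option becomes a flag bit, and the JSON builders only add a field when
that bit is set. A standalone sketch of the idea, using made-up sketch_*
names and plain snprintf() instead of json-c (only the 1 << 14 bit
position is taken from the patch, everything else is an assumption):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for the util_json_flags bits */
enum sketch_flags {
	SKETCH_MEDIA_ERRORS	= (1 << 0),	/* arbitrary bit for the sketch */
	SKETCH_INJ_ERRORS	= (1 << 14),	/* same position as the patch */
};

/* Illustrative stand-in for the relevant part of cxl_filter_params */
struct sketch_params {
	bool media_errors;
	bool inj_errors;
};

/* Translate the parsed command-line booleans into a flag mask */
static unsigned long sketch_to_flags(const struct sketch_params *p)
{
	unsigned long flags = 0;

	assert(p != NULL);

	if (p->media_errors)
		flags |= SKETCH_MEDIA_ERRORS;
	if (p->inj_errors)
		flags |= SKETCH_INJ_ERRORS;
	return flags;
}

/*
 * Format a memdev-like object into buf. "poison_injectable" is emitted
 * only when SKETCH_INJ_ERRORS is set. Returns the snprintf() result.
 */
static int sketch_emit_memdev(char *buf, size_t len, const char *name,
			      bool injectable, unsigned long flags)
{
	if (flags & SKETCH_INJ_ERRORS)
		return snprintf(buf, len,
				"{\"memdev\":\"%s\",\"poison_injectable\":%s}",
				name, injectable ? "true" : "false");

	return snprintf(buf, len, "{\"memdev\":\"%s\"}", name);
}
```

Keeping the attribute behind its own flag means the default listing is
unchanged unless -N (or -vvv) is passed.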



* [ndctl PATCH v2 7/7] Documentation: Add docs for inject/clear-error commands
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
                   ` (5 preceding siblings ...)
  2025-06-02 20:56 ` [ndctl PATCH v2 6/7] cxl/list: Add injectable errors in output Ben Cheatham
@ 2025-06-02 20:56 ` Ben Cheatham
  2025-06-12  3:29 ` [ndctl PATCH v2 0/7] Add error injection support Alison Schofield
  7 siblings, 0 replies; 10+ messages in thread
From: Ben Cheatham @ 2025-06-02 20:56 UTC (permalink / raw)
  To: nvdimm; +Cc: linux-cxl, benjamin.cheatham, alison.schofield, junhyeok.im

Add man pages for the 'cxl-inject-error' and 'cxl-clear-error' commands.
These man pages show usage and examples for each of their use cases.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 Documentation/cxl/cxl-clear-error.txt  |  67 +++++++++++++
 Documentation/cxl/cxl-inject-error.txt | 129 +++++++++++++++++++++++++
 Documentation/cxl/meson.build          |   2 +
 3 files changed, 198 insertions(+)
 create mode 100644 Documentation/cxl/cxl-clear-error.txt
 create mode 100644 Documentation/cxl/cxl-inject-error.txt

diff --git a/Documentation/cxl/cxl-clear-error.txt b/Documentation/cxl/cxl-clear-error.txt
new file mode 100644
index 0000000..ccb0e63
--- /dev/null
+++ b/Documentation/cxl/cxl-clear-error.txt
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-clear-error(1)
+==================
+
+NAME
+----
+cxl-clear-error - Clear CXL errors from CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl clear-error' <device name> [<options>]
+
+Clear an error from a CXL device. The types of devices supported are:
+
+"memdevs":: A CXL memory device. Memory devices are specified by device
+name ("mem0"), device id ("0") and/or host device name ("0000:35:00.0").
+
+Only device poison (viewable using the '-L'/'--media-errors' option of
+'cxl-list') can be cleared from a device using this command. For example:
+
+----
+
+# cxl list -m mem0 -L -u
+{
+  "memdev":"mem0",
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x0",
+  "numa_node":1,
+  "host":"0000:35:00.0",
+  "media_errors":[
+    {
+	  "offset":"0x1000",
+	  "length":64,
+	  "source":"Injected"
+	}
+  ]
+}
+
+# cxl clear-error mem0 -a 0x1000
+poison cleared at mem0:0x1000
+
+# cxl list -m mem0 -L -u
+{
+  "memdev":"mem0",
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x0",
+  "numa_node":1,
+  "host":"0000:35:00.0",
+  "media_errors":[
+  ]
+}
+
+----
+
+OPTIONS
+-------
+-a::
+--address::
+	Device physical address (DPA) to clear poison from. Address can be specified
+	in hex or decimal. Required for clearing poison.
+
+--debug::
+	Enable debug output
diff --git a/Documentation/cxl/cxl-inject-error.txt b/Documentation/cxl/cxl-inject-error.txt
new file mode 100644
index 0000000..a0263dd
--- /dev/null
+++ b/Documentation/cxl/cxl-inject-error.txt
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-inject-error(1)
+===================
+
+NAME
+----
+cxl-inject-error - Inject CXL errors into CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl inject-error' <device name> [<options>]
+
+Inject an error into a CXL device. The types of errors supported depend on the
+device specified. The types of devices supported are:
+
+"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
+Eligible CXL 2.0+ ports are dports of ports at depth 1 in the output of cxl-list.
+Dports are specified by host name ("0000:0e:01.1").
+"memdevs":: A CXL memory device. Memory devices are specified by device name
+("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
+
+There are two types of errors which can be injected: CXL protocol errors
+and device poison.
+
+CXL protocol errors can only be used with downstream ports (as defined above).
+Protocol errors follow the format of "<protocol>-<severity>". For example,
+a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
+found with the '-N' option of 'cxl-list' under a CXL bus object. For example:
+
+----
+
+# cxl list -NB
+[
+  {
+	"bus":"root0",
+	"provider":"ACPI.CXL",
+	"injectable_protocol_errors":[
+	  "mem-correctable",
+	  "mem-fatal"
+	]
+  }
+]
+
+----
+
+CXL protocol (CXL.cache/mem) error injection requires the platform to support
+ACPI v6.5+ error injection (EINJ). In addition to platform support, the
+CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
+will need to be enabled. For more information, view the Linux kernel documentation
+on EINJ.
+
+Device poison can only be used with CXL memory devices. A device physical address
+(DPA) is required to do poison injection. DPAs range from 0 to the size of the
+device's memory, which can be found using 'cxl-list'. An example injection:
+
+----
+
+# cxl inject-error mem0 -t poison -a 0x1000
+poison injected at mem0:0x1000
+# cxl list -m mem0 -u --media-errors
+{
+  "memdev":"mem0",
+  "ram_size":"256.00 MiB (268.44 MB)",
+  "serial":"0",
+  "host":"0000:0d:00.0",
+  "firmware_version":"BWFW VERSION 00",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+
+----
+
+Not all devices support poison injection. To see if a device supports poison injection
+through debugfs, use 'cxl-list' with the '-N' option and look for the "poison_injectable"
+attribute under the device. Example:
+
+----
+
+# cxl list -Nu -m mem0
+{
+  "memdev":"mem0",
+  "ram_size":"256.00 MiB (268.44 MB)",
+  "serial":"0",
+  "host":"0000:0d:00.0",
+  "firmware_version":"BWFW VERSION 00",
+  "poison_injectable":true
+}
+
+----
+
+This command depends on the kernel debug filesystem (debugfs) to do CXL protocol
+error and device poison injection.
+
+OPTIONS
+-------
+-a::
+--address::
+	Device physical address (DPA) to use for poison injection. Address can
+	be specified in hex or decimal. Required for poison injection.
+
+-t::
+--type::
+	Type of error to inject into <device name>. The type of error is restricted
+	by device type. The following shows the possible types under their associated
+	device type(s):
+----
+
+Downstream Ports: ::
+	cache-correctable, cache-uncorrectable, cache-fatal, mem-correctable,
+	mem-fatal
+
+Memdevs: ::
+	poison
+
+----
+
+--debug::
+	Enable debug output
+
+SEE ALSO
+--------
+linkcxl:cxl-list[1]
diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
index 8085c1c..0b75eed 100644
--- a/Documentation/cxl/meson.build
+++ b/Documentation/cxl/meson.build
@@ -50,6 +50,8 @@ cxl_manpages = [
   'cxl-update-firmware.txt',
   'cxl-set-alert-config.txt',
   'cxl-wait-sanitize.txt',
+  'cxl-inject-error.txt',
+  'cxl-clear-error.txt',
 ]
 
 foreach man : cxl_manpages
-- 
2.34.1
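
The man page above describes protocol error names as
"<protocol>-<severity>". A small sketch of splitting and checking such a
name, assuming a hypothetical helper split_protocol_error() (not a libcxl
API). It only validates the format against the protocol and severity
words listed in the man page; which combinations a platform can actually
inject is still reported by 'cxl list -N':

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption, not libcxl): split "mem-fatal" into
 * "mem" and "fatal". Returns 0 on success, -1 if the name is malformed,
 * does not fit the buffers, or uses an unknown protocol/severity word.
 */
static int split_protocol_error(const char *name, char *proto, size_t plen,
				char *sev, size_t slen)
{
	const char *dash;
	size_t n;

	assert(name && proto && sev);

	/* The first '-' separates protocol from severity */
	dash = strchr(name, '-');
	if (!dash || dash == name || dash[1] == '\0')
		return -1;

	n = (size_t)(dash - name);
	if (n >= plen || strlen(dash + 1) >= slen)
		return -1;

	memcpy(proto, name, n);
	proto[n] = '\0';
	strcpy(sev, dash + 1);	/* length checked against slen above */

	if (strcmp(proto, "cache") != 0 && strcmp(proto, "mem") != 0)
		return -1;
	if (strcmp(sev, "correctable") != 0 &&
	    strcmp(sev, "uncorrectable") != 0 &&
	    strcmp(sev, "fatal") != 0)
		return -1;

	return 0;
}
```

With 16-byte buffers for both parts, "mem-fatal" splits into "mem" and
"fatal", while a bare "poison" is rejected because it has no separator.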



* Re: [ndctl PATCH v2 0/7] Add error injection support
  2025-06-02 20:56 [ndctl PATCH v2 0/7] Add error injection support Ben Cheatham
                   ` (6 preceding siblings ...)
  2025-06-02 20:56 ` [ndctl PATCH v2 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
@ 2025-06-12  3:29 ` Alison Schofield
  2025-06-16 20:09   ` Cheatham, Benjamin
  7 siblings, 1 reply; 10+ messages in thread
From: Alison Schofield @ 2025-06-12  3:29 UTC (permalink / raw)
  To: Ben Cheatham; +Cc: nvdimm, linux-cxl, junhyeok.im

On Mon, Jun 02, 2025 at 03:56:26PM -0500, Ben Cheatham wrote:
> v2 Changes:
> 	- Make the --clear option of 'inject-error' its own command (Alison)
> 	- Debugfs is now found using the /proc/mount entry instead of
> 	providing the path using a --debugfs option
> 	- Man page added for 'clear-error'
> 	- Reword commit descriptions for clarity
> 
> This series adds support for injecting CXL protocol (CXL.cache/mem)
> errors[1] into CXL RCH Downstream ports and VH root ports[2] and
> poison into CXL memory devices through the CXL debugfs. Errors are
> injected using a new 'inject-error' command, while errors are reported
> using a new cxl-list "-N"/"--injectable-errors" option. Device poison
> can be cleared using the 'clear-error' command.
> 
> The 'inject-error'/'clear-error' commands and "-N" option of cxl-list all
> require access to the CXL driver's debugfs.
> 
> The documentation for the new cxl-inject-error command shows both usage
> and the possible device/error types, as well as how to retrieve them
> using cxl-list. The documentation for cxl-list has also been updated to
> show the usage of the new injectable errors option.
> 
> [1]: ACPI v6.5 spec, section 18.6.4
> [2]: ACPI v6.5 spec, table 18.31
> 
> --
> 
> Alison, I reached out to Junhyeok about his poison injection series but
> never heard back, so I've just continued with my original plans for a
> v2.
> 
> Quick note: My testing setup is screwed up at the moment, so this
> revision is untested. I'll try to get it fixed for the next revision.

I applied this to v82 (needs a sync up in libcxl.sym) and ran cxl-poison unit
test using your new cxl-cli cmds instead of writing to debugfs directly.[1]
Works for me. Just thought I'd share that as proof of life until I review it
completely.

Adding more test cases to cxl-poison.sh makes sense for the device poison.
Wondering about the protocol errors. How do we test those?

[1] diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
index 6ed890bc666c..41ab670b1094 100644
--- a/test/cxl-poison.sh
+++ b/test/cxl-poison.sh
@@ -68,7 +68,8 @@ inject_poison_sysfs()
        memdev="$1"
        addr="$2"
 
-       echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+#      echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
+       $CXL inject-error "$memdev" -t poison -a "$addr"
 }
 
 clear_poison_sysfs()
@@ -76,7 +77,8 @@ clear_poison_sysfs()
        memdev="$1"
        addr="$2"
 
-       echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+#      echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
+       $CXL clear-error "$memdev" -a "$addr"
 }


While applying this: Documentation: Add docs for inject/clear-error commands
Got these whitespace complaints:
234: new blank line at EOF
158: space before tab in indent.
        "offset":"0x1000",
159: space before tab in indent.
        "length":64,
160: space before tab in indent.
        "source":"Injected"


-- snip



* Re: [ndctl PATCH v2 0/7] Add error injection support
  2025-06-12  3:29 ` [ndctl PATCH v2 0/7] Add error injection support Alison Schofield
@ 2025-06-16 20:09   ` Cheatham, Benjamin
  0 siblings, 0 replies; 10+ messages in thread
From: Cheatham, Benjamin @ 2025-06-16 20:09 UTC (permalink / raw)
  To: Alison Schofield; +Cc: nvdimm, linux-cxl, junhyeok.im

[snip]

> I applied this to v82 (needs a sync up in libcxl.sym) and ran cxl-poison unit
> test using your new cxl-cli cmds instead of writing to debugfs directly.[1]
> Works for me. Just thought I'd share that as proof of life until I review it
> completely.

Thanks for testing!

> 
> Adding more test cases to cxl-poison.sh makes sense for the device poison.
> Wondering about the protocol errors. How do we test those?

So protocol errors are provided by the platform through EINJ (section 18.6 of ACPI v6.5 spec).
The best way to test would be with real hardware (what I've been doing), but that obviously doesn't
work for everyone. I'm not aware of a way to mock/emulate an actual injection (QEMU doesn't support EINJ),
so I'm not sure software-only testing is viable.

I do have an idea for testing the interface though. It would probably look like writing to mock
error_types/einj_inject attributes (that replace the ones in debugfs) and having a phony error
come up in the dmesg. Something like:

# echo 0x8000 > <debugfs>/<cxl dport>/einj_inject
# dmesg
...
[CXL Error print]

Of course I'm not sure how useful that is since it's basically a roundabout way of testing the debugfs
files exist :/.
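
For illustration only, the write side of such an interface test could
look roughly like the sketch below. The helper name write_einj_inject()
and the idea of pointing it at an arbitrary directory are assumptions for
the sketch; the mock attributes themselves do not exist yet, and nothing
here is part of the series:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical test helper: write an error type value to
 * "<dir>/einj_inject", where <dir> stands in for a (mock) dport
 * directory. Returns 0 on success or a negative errno value.
 */
static int write_einj_inject(const char *dir, unsigned int type)
{
	char path[4096];
	FILE *f;
	int rc = 0;

	assert(dir != NULL);

	if (snprintf(path, sizeof(path), "%s/einj_inject", dir) >=
	    (int)sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	/* Same textual form as the echo example above, e.g. "0x8000" */
	if (fprintf(f, "0x%x\n", type) < 0)
		rc = -EIO;
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;

	return rc;
}
```

A test could call this against the mock directory and then look for the
expected message in dmesg, which is the part that still needs a mock on
the kernel side.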

> 
> [1] diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
> index 6ed890bc666c..41ab670b1094 100644
> --- a/test/cxl-poison.sh
> +++ b/test/cxl-poison.sh
> @@ -68,7 +68,8 @@ inject_poison_sysfs()
>         memdev="$1"
>         addr="$2"
>  
> -       echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
> +#      echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison
> +       $CXL inject-error "$memdev" -t poison -a "$addr"
>  }
>  
>  clear_poison_sysfs()
> @@ -76,7 +77,8 @@ clear_poison_sysfs()
>         memdev="$1"
>         addr="$2"
>  
> -       echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
> +#      echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison
> +       $CXL clear-error "$memdev" -a "$addr"
>  }
> 
> 
> While applying this: Documentation: Add docs for inject/clear-error commands
> Got these whitespace complaints:
> 234: new blank line at EOF
> 158: space before tab in indent.
>         "offset":"0x1000",
> 159: space before tab in indent.
>         "length":64,
> 160: space before tab in indent.
>         "source":"Injected"
> 

I'll fix these for v3.

Thanks,
Ben
