public inbox for nvdimm@lists.linux.dev
* [ndctl PATCH v3 0/7] Add error injection support
@ 2025-10-21 18:31 Ben Cheatham
  2025-10-21 18:31 ` [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

v3 Changes:
	- Rebase on v83 release
	- Fix whitespace errors (Alison)

v2 Changes:
	- Make the --clear option of 'inject-error' its own command (Alison)
	- The debugfs mount point is now found by reading /proc/mounts
	instead of being passed in through a --debugfs option
	- Man page added for 'clear-error'
	- Reword commit descriptions for clarity

This series adds support for injecting CXL protocol (CXL.cache/mem)
errors[1] into CXL RCH downstream ports and VH root ports[2], and for
injecting poison into CXL memory devices, through the CXL debugfs.
Errors are injected using a new 'inject-error' command, while the
injectable error types are reported using a new cxl-list
"-N"/"--injectable-errors" option. Device poison can be cleared using
the 'clear-error' command.

The 'inject-error'/'clear-error' commands and "-N" option of cxl-list all
require access to the CXL driver's debugfs.

The documentation for the new cxl-inject-error command shows both usage
and the possible device/error types, as well as how to retrieve them
using cxl-list. The documentation for cxl-list has also been updated to
show the usage of the new injectable errors option.

[1]: ACPI v6.5 spec, section 18.6.4
[2]: ACPI v6.5 spec, table 18.31

Ben Cheatham (7):
  libcxl: Add debugfs path to CXL context
  libcxl: Add CXL protocol errors
  libcxl: Add poison injection support
  cxl: Add inject-error command
  cxl: Add clear-error command
  cxl/list: Add injectable errors in output
  Documentation: Add docs for inject/clear-error commands

 Documentation/cxl/cxl-clear-error.txt  |  67 ++++++
 Documentation/cxl/cxl-inject-error.txt | 129 ++++++++++++
 Documentation/cxl/cxl-list.txt         |  35 +++-
 Documentation/cxl/meson.build          |   2 +
 cxl/builtin.h                          |   2 +
 cxl/cxl.c                              |   2 +
 cxl/filter.h                           |   3 +
 cxl/inject-error.c                     | 253 +++++++++++++++++++++++
 cxl/json.c                             |  30 +++
 cxl/lib/libcxl.c                       | 274 +++++++++++++++++++++++++
 cxl/lib/libcxl.sym                     |   8 +
 cxl/lib/private.h                      |  14 ++
 cxl/libcxl.h                           |  16 ++
 cxl/list.c                             |   3 +
 cxl/meson.build                        |   1 +
 util/json.h                            |   1 +
 16 files changed, 839 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/cxl/cxl-clear-error.txt
 create mode 100644 Documentation/cxl/cxl-inject-error.txt
 create mode 100644 cxl/inject-error.c

-- 
2.34.1



* [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context
  2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
@ 2025-10-21 18:31 ` Ben Cheatham
  2025-10-21 22:55   ` Dave Jiang
  2025-10-21 18:31 ` [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors Ben Cheatham
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

Find the CXL debugfs mount point and add it to the CXL library context.
This will be used by poison and protocol error library functions to
access the information presented by the filesystem.
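
For illustration only, the lookup described above can be sketched as a
standalone function that works on a string in the /proc/mounts format,
which makes it easy to exercise without a mounted debugfs. The name
find_debugfs_mount() and the string-based interface are assumptions of
this sketch and are not part of the patch, which reads /proc/mounts
directly:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): scan text in the
 * /proc/mounts format ("<dev> <dir> <type> <opts> ...", one mount per
 * line) and return a malloc'd copy of the first debugfs mount point,
 * or NULL if none is found. The caller frees the result.
 */
static char *find_debugfs_mount(const char *mounts)
{
	char *copy, *line, *save_line, *ret = NULL;

	assert(mounts != NULL);

	/* strtok_r() modifies its input, so work on a private copy */
	copy = malloc(strlen(mounts) + 1);
	if (!copy)
		return NULL;
	strcpy(copy, mounts);

	for (line = strtok_r(copy, "\n", &save_line); line;
	     line = strtok_r(NULL, "\n", &save_line)) {
		char *save_field, *dev, *dir, *type;

		dev = strtok_r(line, " \t", &save_field);
		dir = strtok_r(NULL, " \t", &save_field);
		type = strtok_r(NULL, " \t", &save_field);
		if (!dev || !dir || !type)
			continue;	/* skip malformed lines */

		if (strcmp(type, "debugfs") == 0) {
			ret = malloc(strlen(dir) + 1);
			if (ret)
				strcpy(ret, dir);
			break;
		}
	}

	free(copy);
	return ret;
}
```

Unlike the loop in the patch, this sketch skips a malformed line rather
than stopping at it; whether that is preferable is a judgment call.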

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/lib/libcxl.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index cafde1c..ea5831f 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -54,6 +54,7 @@ struct cxl_ctx {
 	struct kmod_ctx *kmod_ctx;
 	struct daxctl_ctx *daxctl_ctx;
 	void *private_data;
+	const char *debugfs;
 };
 
 static void free_pmem(struct cxl_pmem *pmem)
@@ -240,6 +241,43 @@ CXL_EXPORT void *cxl_get_private_data(struct cxl_ctx *ctx)
 	return ctx->private_data;
 }
 
+static char *get_debugfs_dir(void)
+{
+	char *dev, *dir, *type, *ret = NULL;
+	char line[PATH_MAX + 256 + 1];
+	FILE *fp;
+
+	fp = fopen("/proc/mounts", "r");
+	if (!fp)
+		return ret;
+
+	while (fgets(line, sizeof(line), fp)) {
+		dev = strtok(line, " \t");
+		if (!dev)
+			break;
+
+		dir = strtok(NULL, " \t");
+		if (!dir)
+			break;
+
+		type = strtok(NULL, " \t");
+		if (!type)
+			break;
+
+		if (!strcmp(type, "debugfs")) {
+			ret = calloc(strlen(dir) + 1, 1);
+			if (!ret)
+				break;
+
+			strcpy(ret, dir);
+			break;
+		}
+	}
+
+	fclose(fp);
+	return ret;
+}
+
 /**
  * cxl_new - instantiate a new library context
  * @ctx: context to establish
@@ -295,6 +333,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
 	c->udev = udev;
 	c->udev_queue = udev_queue;
 	c->timeout = 5000;
+	c->debugfs = get_debugfs_dir();
 
 	return 0;
 
@@ -350,6 +389,7 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
 	kmod_unref(ctx->kmod_ctx);
 	daxctl_unref(ctx->daxctl_ctx);
 	info(ctx, "context %p released\n", ctx);
+	free((void *)ctx->debugfs);
 	free(ctx);
 }
 
-- 
2.34.1



* [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors
  2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
  2025-10-21 18:31 ` [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
@ 2025-10-21 18:31 ` Ben Cheatham
  2025-10-21 23:15   ` Dave Jiang
  2025-10-21 18:31 ` [ndctl PATCH v3 3/7] libcxl: Add poison injection support Ben Cheatham
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
injection for platforms that implement the error types according to
the v6.5+ ACPI specification. The interface for injecting these errors
is provided by the kernel under the CXL debugfs. The relevant files in
the interface are the einj_types file, which lists the CXL error types
available for injection, and the einj_inject file, which injects an
error into a CXL VH root port or CXL RCH downstream port.

Add a library API to retrieve the CXL error types and inject them. This
API will be used in a later commit by the 'cxl-inject-error' and
'cxl-list' commands.
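
As a rough, standalone sketch of the einj_types handling, the snippet
below parses text of the form "<hex value> <name>" per line and collects
the recognized CXL protocol error bits into a mask. The bit positions
and short names are the ones in the cxl_protocol_errors[] table added by
this patch; the function names, the mask-based return value and the
sample strings used with it are assumptions made for illustration and
are not part of the library API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bit positions and names as listed in cxl_protocol_errors[] */
static const struct {
	unsigned int bit;
	const char *name;
} proto_errors[] = {
	{ 12, "cache-correctable" },
	{ 13, "cache-uncorrectable" },
	{ 14, "cache-fatal" },
	{ 15, "mem-correctable" },
	{ 16, "mem-uncorrectable" },
	{ 17, "mem-fatal" },
};

/* Hypothetical: map an einj_types value to its short name, or NULL */
static const char *proto_error_name(unsigned long val)
{
	size_t i;

	for (i = 0; i < sizeof(proto_errors) / sizeof(proto_errors[0]); i++)
		if (val == (1UL << proto_errors[i].bit))
			return proto_errors[i].name;
	return NULL;
}

/* Hypothetical: OR together every recognized value found in the text */
static unsigned long parse_einj_types(const char *text)
{
	unsigned long mask = 0;
	char *copy, *line, *save;

	assert(text != NULL);

	/* strtok_r() modifies its input, so work on a private copy */
	copy = malloc(strlen(text) + 1);
	if (!copy)
		return 0;
	strcpy(copy, text);

	for (line = strtok_r(copy, "\n", &save); line;
	     line = strtok_r(NULL, "\n", &save)) {
		/* only the leading hex number matters; the name is ignored */
		unsigned long val = strtoul(line, NULL, 16);

		if (proto_error_name(val))
			mask |= val;
	}

	free(copy);
	return mask;
}
```

Tokenizing by line first keeps the "skip the rest of the line" step
implicit, at the cost of copying the buffer.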

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/lib/libcxl.c   | 174 +++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |   5 ++
 cxl/lib/private.h  |  14 ++++
 cxl/libcxl.h       |  13 ++++
 4 files changed, 206 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index ea5831f..9486b0f 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -46,11 +46,13 @@ struct cxl_ctx {
 	void *userdata;
 	int memdevs_init;
 	int buses_init;
+	int perrors_init;
 	unsigned long timeout;
 	struct udev *udev;
 	struct udev_queue *udev_queue;
 	struct list_head memdevs;
 	struct list_head buses;
+	struct list_head perrors;
 	struct kmod_ctx *kmod_ctx;
 	struct daxctl_ctx *daxctl_ctx;
 	void *private_data;
@@ -205,6 +207,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
 	free(bus);
 }
 
+static void free_protocol_error(struct cxl_protocol_error *perror,
+				struct list_head *head)
+{
+	if (head)
+		list_del_from(head, &perror->list);
+	free(perror);
+}
+
 /**
  * cxl_get_userdata - retrieve stored data pointer from library context
  * @ctx: cxl library context
@@ -328,6 +338,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
 	*ctx = c;
 	list_head_init(&c->memdevs);
 	list_head_init(&c->buses);
+	list_head_init(&c->perrors);
 	c->kmod_ctx = kmod_ctx;
 	c->daxctl_ctx = daxctl_ctx;
 	c->udev = udev;
@@ -369,6 +380,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
  */
 CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
 {
+	struct cxl_protocol_error *perror, *_p;
 	struct cxl_memdev *memdev, *_d;
 	struct cxl_bus *bus, *_b;
 
@@ -384,6 +396,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
 	list_for_each_safe(&ctx->buses, bus, _b, port.list)
 		free_bus(bus, &ctx->buses);
 
+	list_for_each_safe(&ctx->perrors, perror, _p, list)
+		free_protocol_error(perror, &ctx->perrors);
+
 	udev_queue_unref(ctx->udev_queue);
 	udev_unref(ctx->udev);
 	kmod_unref(ctx->kmod_ctx);
@@ -3416,6 +3431,165 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
 	return port->decoders_committed;
 }
 
+const struct cxl_protocol_error cxl_protocol_errors[] = {
+	CXL_PROTOCOL_ERROR(12, "cache-correctable"),
+	CXL_PROTOCOL_ERROR(13, "cache-uncorrectable"),
+	CXL_PROTOCOL_ERROR(14, "cache-fatal"),
+	CXL_PROTOCOL_ERROR(15, "mem-correctable"),
+	CXL_PROTOCOL_ERROR(16, "mem-uncorrectable"),
+	CXL_PROTOCOL_ERROR(17, "mem-fatal")
+};
+
+static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
+							    unsigned long n)
+{
+	struct cxl_protocol_error *perror;
+
+	for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
+		if (n != BIT(cxl_protocol_errors[i].num))
+			continue;
+
+		perror = calloc(1, sizeof(*perror));
+		if (!perror)
+			return NULL;
+
+		*perror = cxl_protocol_errors[i];
+		perror->ctx = ctx;
+		return perror;
+	}
+
+	return NULL;
+}
+
+static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
+{
+	struct cxl_protocol_error *perror;
+	char *path, *num, *save;
+	unsigned long n;
+	size_t path_len;
+	char buf[512];
+	int rc = 0;
+
+	if (!ctx->debugfs)
+		return;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(1, path_len);
+	if (!path)
+		return;
+
+	snprintf(path, path_len, "%s/cxl/einj_types", ctx->debugfs);
+	if (access(path, F_OK)) {
+		rc = -errno;
+		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
+		goto err;
+	}
+
+	rc = sysfs_read_attr(ctx, path, buf);
+	if (rc) {
+		err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
+		goto err;
+	}
+
+	/*
+	 * The format of the output of the einj_types attr is:
+	 * <Error number in hex 1> <Error name 1>
+	 * <Error number in hex 2> <Error name 2>
+	 * ...
+	 *
+	 * We only need the number, so parse that and skip the rest of
+	 * the line.
+	 */
+	num = strtok_r(buf, " \n", &save);
+	while (num) {
+		n = strtoul(num, NULL, 16);
+		perror = create_cxl_protocol_error(ctx, n);
+		if (perror)
+			list_add(&ctx->perrors, &perror->list);
+
+		num = strtok_r(NULL, "\n", &save);
+		if (!num)
+			break;
+
+		num = strtok_r(NULL, " \n", &save);
+	}
+
+err:
+	free(path);
+}
+
+static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
+{
+	if (ctx->perrors_init)
+		return;
+
+	ctx->perrors_init = 1;
+	cxl_add_protocol_errors(ctx);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_first(struct cxl_ctx *ctx)
+{
+	cxl_protocol_errors_init(ctx);
+
+	return list_top(&ctx->perrors, struct cxl_protocol_error, list);
+}
+
+CXL_EXPORT struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
+{
+	struct cxl_ctx *ctx = perror->ctx;
+
+	return list_next(&ctx->perrors, perror, list);
+}
+
+CXL_EXPORT unsigned long
+cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
+{
+	return perror->num;
+}
+
+CXL_EXPORT const char *
+cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
+{
+	return perror->string;
+}
+
+CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+					       unsigned long error)
+{
+	struct cxl_ctx *ctx = dport->port->ctx;
+	unsigned long path_len;
+	char buf[32] = { 0 };
+	char *path;
+	int rc;
+
+	if (!ctx->debugfs)
+		return -ENOENT;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(path_len, sizeof(char));
+	if (!path)
+		return -ENOMEM;
+
+	snprintf(path, path_len, "%s/cxl/%s/einj_inject", ctx->debugfs,
+		 cxl_dport_get_devname(dport));
+	if (access(path, F_OK)) {
+		rc = -errno;
+		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
+		free(path);
+		return rc;
+	}
+
+	snprintf(buf, sizeof(buf), "0x%lx\n", error);
+	rc = sysfs_write_attr(ctx, path, buf);
+	if (rc)
+		err(ctx, "failed to write %s: %s\n", path, strerror(-rc));
+
+	free(path);
+	return rc;
+}
+
 static void *add_cxl_bus(void *parent, int id, const char *cxlbus_base)
 {
 	const char *devname = devpath_to_devname(cxlbus_base);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index e01a676..02d5119 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -299,4 +299,9 @@ global:
 LIBCXL_10 {
 global:
 	cxl_memdev_is_port_ancestor;
+	cxl_protocol_error_get_first;
+	cxl_protocol_error_get_next;
+	cxl_protocol_error_get_num;
+	cxl_protocol_error_get_str;
+	cxl_dport_protocol_error_inject;
 } LIBCXL_9;
diff --git a/cxl/lib/private.h b/cxl/lib/private.h
index 7d5a1bc..4e881b6 100644
--- a/cxl/lib/private.h
+++ b/cxl/lib/private.h
@@ -108,6 +108,20 @@ struct cxl_port {
 	struct list_head dports;
 };
 
+struct cxl_protocol_error {
+	unsigned long num;
+	const char *string;
+	struct cxl_ctx *ctx;
+	struct list_node list;
+};
+
+#define CXL_PROTOCOL_ERROR(n, str)	\
+	((struct cxl_protocol_error){	\
+		.num = (n),		\
+		.string = (str),	\
+		.ctx = NULL,		\
+	})
+
 struct cxl_bus {
 	struct cxl_port port;
 };
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 54bc025..9026e05 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -496,6 +496,19 @@ int cxl_cmd_alert_config_set_enable_alert_actions(struct cxl_cmd *cmd,
 						  int enable);
 struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memdev);
 
+struct cxl_protocol_error;
+struct cxl_protocol_error *cxl_protocol_error_get_first(struct cxl_ctx *ctx);
+struct cxl_protocol_error *
+cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
+unsigned long cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
+const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
+int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
+				    unsigned long error);
+
+#define cxl_protocol_error_foreach(ctx, perror)				       \
+	for (perror = cxl_protocol_error_get_first(ctx); perror != NULL;       \
+	     perror = cxl_protocol_error_get_next(perror))
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
-- 
2.34.1



* [ndctl PATCH v3 3/7] libcxl: Add poison injection support
  2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
  2025-10-21 18:31 ` [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
  2025-10-21 18:31 ` [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors Ben Cheatham
@ 2025-10-21 18:31 ` Ben Cheatham
  2025-10-21 23:44   ` Dave Jiang
  2025-10-21 18:31 ` [ndctl PATCH v3 4/7] cxl: Add inject-error command Ben Cheatham
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

Add a library API for clearing and injecting poison into a CXL memory
device through the CXL debugfs.

This API will be used by the 'cxl-inject-error' and 'cxl-clear-error'
commands in later commits.
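
To make the debugfs interaction concrete, here is a minimal standalone
sketch of how the target path and the value written to it could be
formatted. The file names (inject_poison, clear_poison), the
"<debugfs>/cxl/<memdev>/" layout and the "0x<hex>\n" payload follow the
code in this patch; the helper name format_poison_request(), its
caller-supplied buffers and the truncation check are assumptions of the
sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): fill @path with the
 * debugfs attribute to write and @addr with the payload for @dpa.
 * Returns 0 on success or -ENAMETOOLONG if either buffer is too small.
 */
static int format_poison_request(const char *debugfs, const char *memdev,
				 size_t dpa, bool clear,
				 char *path, size_t path_len,
				 char *addr, size_t addr_len)
{
	int n;

	assert(debugfs && memdev && path && addr);

	n = snprintf(path, path_len, "%s/cxl/%s/%s", debugfs, memdev,
		     clear ? "clear_poison" : "inject_poison");
	if (n < 0 || (size_t)n >= path_len)
		return -ENAMETOOLONG;

	/* %zx matches size_t on both 32-bit and 64-bit builds */
	n = snprintf(addr, addr_len, "0x%zx\n", dpa);
	if (n < 0 || (size_t)n >= addr_len)
		return -ENAMETOOLONG;

	return 0;
}
```

Checking the snprintf() return value avoids silently writing to a
truncated path when the debugfs mount point is unusually long.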

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/lib/libcxl.c   | 60 ++++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |  3 +++
 cxl/libcxl.h       |  3 +++
 3 files changed, 66 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index 9486b0f..9d4bd80 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -5019,3 +5019,63 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
 {
 	return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
 }
+
+CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
+{
+	struct cxl_ctx *ctx = memdev->ctx;
+	size_t path_len;
+	bool exists;
+	char *path;
+
+	if (!ctx->debugfs)
+		return false;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(path_len, sizeof(char));
+	if (!path)
+		return false;
+
+	snprintf(path, path_len, "%s/cxl/%s/inject_poison", ctx->debugfs,
+		 cxl_memdev_get_devname(memdev));
+	exists = access(path, F_OK) == 0;
+
+	free(path);
+	return exists;
+}
+
+static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
+				    bool clear)
+{
+	struct cxl_ctx *ctx = memdev->ctx;
+	size_t path_len;
+	char addr[32];
+	char *path;
+	int rc;
+
+	if (!ctx->debugfs)
+		return -ENOENT;
+
+	path_len = strlen(ctx->debugfs) + 100;
+	path = calloc(path_len, sizeof(char));
+	if (!path)
+		return -ENOMEM;
+
+	snprintf(path, path_len, "%s/cxl/%s/%s", ctx->debugfs,
+		 cxl_memdev_get_devname(memdev),
+		 clear ? "clear_poison" : "inject_poison");
+	snprintf(addr, sizeof(addr), "0x%zx\n", dpa);
+
+	rc = sysfs_write_attr(ctx, path, addr);
+	free(path);
+	return rc;
+}
+
+CXL_EXPORT int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t addr)
+{
+	return cxl_memdev_poison_action(memdev, addr, false);
+}
+
+CXL_EXPORT int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t addr)
+{
+	return cxl_memdev_poison_action(memdev, addr, true);
+}
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 02d5119..3bce60d 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -304,4 +304,7 @@ global:
 	cxl_protocol_error_get_num;
 	cxl_protocol_error_get_str;
 	cxl_dport_protocol_error_inject;
+	cxl_memdev_has_poison_injection;
+	cxl_memdev_inject_poison;
+	cxl_memdev_clear_poison;
 } LIBCXL_9;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 9026e05..3b51d61 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -105,6 +105,9 @@ int cxl_memdev_read_label(struct cxl_memdev *memdev, void *buf, size_t length,
 		size_t offset);
 int cxl_memdev_write_label(struct cxl_memdev *memdev, void *buf, size_t length,
 		size_t offset);
+bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev);
+int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa);
+int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa);
 struct cxl_cmd *cxl_cmd_new_get_fw_info(struct cxl_memdev *memdev);
 unsigned int cxl_cmd_fw_info_get_num_slots(struct cxl_cmd *cmd);
 unsigned int cxl_cmd_fw_info_get_active_slot(struct cxl_cmd *cmd);
-- 
2.34.1



* [ndctl PATCH v3 4/7] cxl: Add inject-error command
  2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
                   ` (2 preceding siblings ...)
  2025-10-21 18:31 ` [ndctl PATCH v3 3/7] libcxl: Add poison injection support Ben Cheatham
@ 2025-10-21 18:31 ` Ben Cheatham
  2025-10-22 17:06   ` Dave Jiang
  2025-10-21 18:31 ` [ndctl PATCH v3 5/7] cxl: Add clear-error command Ben Cheatham
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

Add the 'cxl-inject-error' command. This command will provide CXL
protocol error injection for CXL VH root ports and CXL RCH downstream
ports, as well as poison injection for CXL memory devices.
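
Since the poison address is given as a string in hex or decimal, one
possible way to validate it is sketched below. This is a standalone
example under stated assumptions: parse_dpa() is a hypothetical helper
that does not exist in this series, and rejecting trailing characters
and a leading minus sign is a choice of the sketch rather than
documented command behavior:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a hex ("0x...")
 * or decimal address string. Returns 0 and stores the value in @dpa,
 * -ERANGE on overflow, or -EINVAL for anything that is not a number.
 */
static int parse_dpa(const char *str, unsigned long long *dpa)
{
	unsigned long long val;
	char *end;

	assert(dpa != NULL);

	if (!str || *str == '\0' || *str == '-')
		return -EINVAL;

	/* strtoull() only sets errno on failure, so clear it first */
	errno = 0;
	val = strtoull(str, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == str || *end != '\0')
		return -EINVAL;

	*dpa = val;
	return 0;
}
```

Clearing errno before the call matters because a stale ERANGE from an
earlier library call could otherwise be mistaken for an overflow.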

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/builtin.h      |   1 +
 cxl/cxl.c          |   1 +
 cxl/inject-error.c | 195 +++++++++++++++++++++++++++++++++++++++++++++
 cxl/meson.build    |   1 +
 4 files changed, 198 insertions(+)
 create mode 100644 cxl/inject-error.c

diff --git a/cxl/builtin.h b/cxl/builtin.h
index c483f30..e82fcb5 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
 #ifdef ENABLE_LIBTRACEFS
 int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
 #else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index 1643667..a98bd6b 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
 	{ "disable-region", .c_fn = cmd_disable_region },
 	{ "destroy-region", .c_fn = cmd_destroy_region },
 	{ "monitor", .c_fn = cmd_monitor },
+	{ "inject-error", .c_fn = cmd_inject_error },
 };
 
 int main(int argc, const char **argv)
diff --git a/cxl/inject-error.c b/cxl/inject-error.c
new file mode 100644
index 0000000..c48ea69
--- /dev/null
+++ b/cxl/inject-error.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 AMD. All rights reserved. */
+#include <util/parse-options.h>
+#include <cxl/libcxl.h>
+#include <cxl/filter.h>
+#include <util/log.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <errno.h>
+#include <limits.h>
+
+#define EINJ_TYPES_BUF_SIZE 512
+
+static bool debug;
+
+static struct inject_params {
+	const char *type;
+	const char *address;
+} inj_param;
+
+static const struct option inject_options[] = {
+	OPT_STRING('t', "type", &inj_param.type, "Error type",
+		   "Error type to inject into <device>"),
+	OPT_STRING('a', "address", &inj_param.address, "Address for poison injection",
+		   "Device physical address for poison injection in hex or decimal"),
+#ifdef ENABLE_DEBUG
+	OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
+#endif
+	OPT_END(),
+};
+
+static struct log_ctx iel;
+
+static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
+						     const char *type)
+{
+	struct cxl_protocol_error *perror;
+
+	cxl_protocol_error_foreach(ctx, perror) {
+		if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
+			return perror;
+	}
+
+	log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
+	return NULL;
+}
+
+static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
+{
+	struct cxl_port *port, *top;
+	struct cxl_dport *dport;
+	struct cxl_bus *bus;
+
+	cxl_bus_foreach(ctx, bus) {
+		top = cxl_bus_get_port(bus);
+
+		cxl_port_foreach_all(top, port)
+			cxl_dport_foreach(port, dport)
+				if (!strcmp(devname,
+					    cxl_dport_get_devname(dport)))
+					return dport;
+	}
+
+	log_err(&iel, "Downstream port \"%s\" not found\n", devname);
+	return NULL;
+}
+
+static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx,
+					  const char *filter)
+{
+	struct cxl_memdev *memdev;
+
+	cxl_memdev_foreach(ctx, memdev) {
+		if (util_cxl_memdev_filter(memdev, filter, NULL))
+			return memdev;
+	}
+
+	log_err(&iel, "Memdev \"%s\" not found\n", filter);
+	return NULL;
+}
+
+static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
+			    struct cxl_protocol_error *perror)
+{
+	struct cxl_dport *dport;
+	int rc;
+
+	if (!devname) {
+		log_err(&iel, "No downstream port specified for injection\n");
+		return -EINVAL;
+	}
+
+	dport = find_cxl_dport(ctx, devname);
+	if (!dport)
+		return -ENODEV;
+
+	rc = cxl_dport_protocol_error_inject(dport,
+					     cxl_protocol_error_get_num(perror));
+	if (rc)
+		return rc;
+
+	printf("injected %s protocol error.\n",
+	       cxl_protocol_error_get_str(perror));
+	return 0;
+}
+
+static int poison_action(struct cxl_ctx *ctx, const char *filter,
+			 const char *addr)
+{
+	struct cxl_memdev *memdev;
+	size_t a;
+	int rc;
+
+	memdev = find_cxl_memdev(ctx, filter);
+	if (!memdev)
+		return -ENODEV;
+
+	if (!cxl_memdev_has_poison_injection(memdev)) {
+		log_err(&iel, "%s does not support error injection\n",
+			cxl_memdev_get_devname(memdev));
+		return -EINVAL;
+	}
+
+	if (!addr) {
+		log_err(&iel, "no address provided\n");
+		return -EINVAL;
+	}
+
+	a = strtoull(addr, NULL, 0);
+	if (a == ULLONG_MAX && errno == ERANGE) {
+		log_err(&iel, "invalid address %s\n", addr);
+		return -EINVAL;
+	}
+
+	rc = cxl_memdev_inject_poison(memdev, a);
+
+	if (rc)
+		log_err(&iel, "failed to inject poison at %s:%s: %s\n",
+			cxl_memdev_get_devname(memdev), addr, strerror(-rc));
+	else
+		printf("poison injected at %s:%s\n",
+		       cxl_memdev_get_devname(memdev), addr);
+
+	return rc;
+}
+
+static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
+			 const struct option *options, const char *usage)
+{
+	struct cxl_protocol_error *perr;
+	const char * const u[] = {
+		usage,
+		NULL
+	};
+	int rc = -EINVAL;
+
+	log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG");
+	argc = parse_options(argc, argv, options, u, 0);
+
+	if (debug) {
+		cxl_set_log_priority(ctx, LOG_DEBUG);
+		iel.log_priority = LOG_DEBUG;
+	} else {
+		iel.log_priority = LOG_INFO;
+	}
+
+	if (argc != 1) {
+		usage_with_options(u, options);
+		return rc;
+	}
+
+	if (strcmp(inj_param.type, "poison") == 0) {
+		rc = poison_action(ctx, argv[0], inj_param.address);
+		return rc;
+	}
+
+	perr = find_cxl_proto_err(ctx, inj_param.type);
+	if (!perr)
+		return rc;
+
+	rc = inject_proto_err(ctx, argv[0], perr);
+	if (rc)
+		log_err(&iel, "Failed to inject error: %d\n", rc);
+
+	return rc;
+}
+
+int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+	int rc = inject_action(argc, argv, ctx, inject_options,
+			       "inject-error <device> [<options>]");
+
+	return rc ? EXIT_FAILURE : EXIT_SUCCESS;
+}
diff --git a/cxl/meson.build b/cxl/meson.build
index b9924ae..92031b5 100644
--- a/cxl/meson.build
+++ b/cxl/meson.build
@@ -7,6 +7,7 @@ cxl_src = [
   'memdev.c',
   'json.c',
   'filter.c',
+  'inject-error.c',
   '../daxctl/json.c',
   '../daxctl/filter.c',
 ]
-- 
2.34.1



* [ndctl PATCH v3 5/7] cxl: Add clear-error command
  2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
                   ` (3 preceding siblings ...)
  2025-10-21 18:31 ` [ndctl PATCH v3 4/7] cxl: Add inject-error command Ben Cheatham
@ 2025-10-21 18:31 ` Ben Cheatham
  2025-10-21 18:31 ` [ndctl PATCH v3 6/7] cxl/list: Add injectable errors in output Ben Cheatham
  2025-10-21 18:31 ` [ndctl PATCH v3 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
  6 siblings, 0 replies; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

Add the 'cxl-clear-error' command. This command allows the user to clear
device poison from CXL memory devices.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 cxl/builtin.h      |  1 +
 cxl/cxl.c          |  1 +
 cxl/inject-error.c | 68 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/cxl/builtin.h b/cxl/builtin.h
index e82fcb5..68ed1de 100644
--- a/cxl/builtin.h
+++ b/cxl/builtin.h
@@ -26,6 +26,7 @@ int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
 int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
+int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx);
 #ifdef ENABLE_LIBTRACEFS
 int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
 #else
diff --git a/cxl/cxl.c b/cxl/cxl.c
index a98bd6b..e1740b5 100644
--- a/cxl/cxl.c
+++ b/cxl/cxl.c
@@ -81,6 +81,7 @@ static struct cmd_struct commands[] = {
 	{ "destroy-region", .c_fn = cmd_destroy_region },
 	{ "monitor", .c_fn = cmd_monitor },
 	{ "inject-error", .c_fn = cmd_inject_error },
+	{ "clear-error", .c_fn = cmd_clear_error },
 };
 
 int main(int argc, const char **argv)
diff --git a/cxl/inject-error.c b/cxl/inject-error.c
index c48ea69..f8a9445 100644
--- a/cxl/inject-error.c
+++ b/cxl/inject-error.c
@@ -19,6 +19,10 @@ static struct inject_params {
 	const char *address;
 } inj_param;
 
+static struct clear_params {
+	const char *address;
+} clear_param;
+
 static const struct option inject_options[] = {
 	OPT_STRING('t', "type", &inj_param.type, "Error type",
 		   "Error type to inject into <device>"),
@@ -30,6 +34,15 @@ static const struct option inject_options[] = {
 	OPT_END(),
 };
 
+static const struct option clear_options[] = {
+	OPT_STRING('a', "address", &clear_param.address, "Address for poison clearing",
+		   "Device physical address to clear poison from in hex or decimal"),
+#ifdef ENABLE_DEBUG
+	OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
+#endif
+	OPT_END(),
+};
+
 static struct log_ctx iel;
 
 static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
@@ -106,7 +119,7 @@ static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
 }
 
 static int poison_action(struct cxl_ctx *ctx, const char *filter,
-			 const char *addr)
+			 const char *addr, bool clear)
 {
 	struct cxl_memdev *memdev;
 	size_t a;
@@ -133,13 +146,17 @@ static int poison_action(struct cxl_ctx *ctx, const char *filter,
 		return -EINVAL;
 	}
 
-	rc = cxl_memdev_inject_poison(memdev, a);
+	if (clear)
+		rc = cxl_memdev_clear_poison(memdev, a);
+	else
+		rc = cxl_memdev_inject_poison(memdev, a);
 
 	if (rc)
-		log_err(&iel, "failed to inject poison at %s:%s: %s\n",
+		log_err(&iel, "failed to %s %s:%s: %s\n",
+			clear ? "clear poison at" : "inject poison at",
 			cxl_memdev_get_devname(memdev), addr, strerror(-rc));
 	else
-		printf("poison injected at %s:%s\n",
+		printf("poison %s at %s:%s\n", clear ? "cleared" : "injected",
 		       cxl_memdev_get_devname(memdev), addr);
 
 	return rc;
@@ -171,7 +188,7 @@ static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
 	}
 
 	if (strcmp(inj_param.type, "poison") == 0) {
-		rc = poison_action(ctx, argv[0], inj_param.address);
+		rc = poison_action(ctx, argv[0], inj_param.address, false);
 		return rc;
 	}
 
@@ -193,3 +210,44 @@ int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
 
 	return rc ? EXIT_FAILURE : EXIT_SUCCESS;
 }
+
+static int clear_action(int argc, const char **argv, struct cxl_ctx *ctx,
+			const struct option *options, const char *usage)
+{
+	const char * const u[] = {
+		usage,
+		NULL
+	};
+	int rc = -EINVAL;
+
+	log_init(&iel, "cxl clear-error", "CXL_CLEAR_LOG");
+	argc = parse_options(argc, argv, options, u, 0);
+
+	if (debug) {
+		cxl_set_log_priority(ctx, LOG_DEBUG);
+		iel.log_priority = LOG_DEBUG;
+	} else {
+		iel.log_priority = LOG_INFO;
+	}
+
+	if (argc != 1) {
+		usage_with_options(u, options);
+		return rc;
+	}
+
+	rc = poison_action(ctx, argv[0], clear_param.address, true);
+	if (rc) {
+		log_err(&iel, "Failed to clear poison from %s: %s\n",
+			argv[0], strerror(-rc));
+		return rc;
+	}
+
+	return rc;
+}
+
+int cmd_clear_error(int argc, const char **argv, struct cxl_ctx *ctx)
+{
+	int rc = clear_action(argc, argv, ctx, clear_options,
+			      "clear-error <device> [<options>]");
+	return rc ? EXIT_FAILURE : EXIT_SUCCESS;
+}
-- 
2.34.1



* [ndctl PATCH v3 6/7] cxl/list: Add injectable errors in output
  2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
                   ` (4 preceding siblings ...)
  2025-10-21 18:31 ` [ndctl PATCH v3 5/7] cxl: Add clear-error command Ben Cheatham
@ 2025-10-21 18:31 ` Ben Cheatham
  2025-10-22 17:18   ` Dave Jiang
  2025-10-21 18:31 ` [ndctl PATCH v3 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
  6 siblings, 1 reply; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

Add the "--injectable-errors"/"-N" option to show injectable error
information for CXL devices. The applicable devices are CXL memory
devices and CXL busses.

For CXL memory devices the option reports whether the device supports
poison injection (the "--media-errors"/"-L" option shows injected
poison).

For CXL busses the option shows injectable CXL protocol error types. The
information will be the same across busses because the error types are
system-wide. The information is presented under the bus for easier
filtering.

Update the man page for 'cxl-list' to show the usage of the new option.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 Documentation/cxl/cxl-list.txt | 35 +++++++++++++++++++++++++++++++++-
 cxl/filter.h                   |  3 +++
 cxl/json.c                     | 30 +++++++++++++++++++++++++++++
 cxl/list.c                     |  3 +++
 util/json.h                    |  1 +
 5 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 0595638..35ff542 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -471,6 +471,38 @@ The media-errors option is only available with '-Dlibtracefs=enabled'.
 }
 ----
 
+-N::
+--injectable-errors::
+	Include injectable error information in the output. For CXL memory devices
+	this includes whether poison is injectable through the kernel debug filesystem.
+	The types of CXL protocol errors available for injection into downstream ports
+	are listed as part of a CXL bus object.
+
+----
+# cxl list -NB
+[
+  {
+    "bus":"root0",
+    "provider":"ACPI.CXL",
+    "injectable_protocol_errors":[
+      "mem-correctable",
+      "mem-fatal"
+    ]
+  }
+]
+
+# cxl list -N
+[
+  {
+    "memdev":"mem0",
+    "pmem_size":268435456,
+    "ram_size":268435456,
+    "serial":2,
+    "poison_injectable":true
+  }
+]
+
+----
 -v::
 --verbose::
 	Increase verbosity of the output. This can be specified
@@ -487,7 +519,8 @@ The media-errors option is only available with '-Dlibtracefs=enabled'.
 	  devices with --idle.
 	- *-vvv*
 	  Everything *-vv* provides, plus enable
-	  --health, --partition, and --media-errors.
+	  --health, --partition, --media-errors, and
+	  --injectable-errors.
 
 --debug::
 	If the cxl tool was built with debug enabled, turn on debug
diff --git a/cxl/filter.h b/cxl/filter.h
index 956a46e..34f8387 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -31,6 +31,7 @@ struct cxl_filter_params {
 	bool alert_config;
 	bool dax;
 	bool media_errors;
+	bool inj_errors;
 	int verbose;
 	struct log_ctx ctx;
 };
@@ -91,6 +92,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
 		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
 	if (param->media_errors)
 		flags |= UTIL_JSON_MEDIA_ERRORS;
+	if (param->inj_errors)
+		flags |= UTIL_JSON_INJ_ERRORS;
 	return flags;
 }
 
diff --git a/cxl/json.c b/cxl/json.c
index bde4589..2917477 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -675,6 +675,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 			json_object_object_add(jdev, "firmware", jobj);
 	}
 
+	if (flags & UTIL_JSON_INJ_ERRORS) {
+		jobj = json_object_new_boolean(cxl_memdev_has_poison_injection(memdev));
+		if (jobj)
+			json_object_object_add(jdev, "poison_injectable", jobj);
+	}
+
 	if (flags & UTIL_JSON_MEDIA_ERRORS) {
 		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
 		if (jobj)
@@ -750,6 +756,8 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
 					 unsigned long flags)
 {
 	const char *devname = cxl_bus_get_devname(bus);
+	struct cxl_ctx *ctx = cxl_bus_get_ctx(bus);
+	struct cxl_protocol_error *perror;
 	struct json_object *jbus, *jobj;
 
 	jbus = json_object_new_object();
@@ -765,6 +773,28 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
 		json_object_object_add(jbus, "provider", jobj);
 
 	json_object_set_userdata(jbus, bus, NULL);
+
+	if (flags & UTIL_JSON_INJ_ERRORS) {
+		jobj = json_object_new_array();
+		if (!jobj)
+			return jbus;
+
+		cxl_protocol_error_foreach(ctx, perror)
+		{
+			struct json_object *jerr_str;
+			const char *perror_str;
+
+			perror_str = cxl_protocol_error_get_str(perror);
+
+			jerr_str = json_object_new_string(perror_str);
+			if (jerr_str)
+				json_object_array_add(jobj, jerr_str);
+		}
+
+		json_object_object_add(jbus, "injectable_protocol_errors",
+				       jobj);
+	}
+
 	return jbus;
 }
 
diff --git a/cxl/list.c b/cxl/list.c
index 0b25d78..a505ed6 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -59,6 +59,8 @@ static const struct option options[] = {
 		    "include alert configuration information"),
 	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
 		    "include media-error information "),
+	OPT_BOOLEAN('N', "injectable-errors", &param.inj_errors,
+		    "include injectable error information"),
 	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
 #ifdef ENABLE_DEBUG
 	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
@@ -124,6 +126,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
 		param.alert_config = true;
 		param.dax = true;
 		param.media_errors = true;
+		param.inj_errors = true;
 		/* fallthrough */
 	case 2:
 		param.idle = true;
diff --git a/util/json.h b/util/json.h
index 560f845..57278cb 100644
--- a/util/json.h
+++ b/util/json.h
@@ -21,6 +21,7 @@ enum util_json_flags {
 	UTIL_JSON_TARGETS	= (1 << 11),
 	UTIL_JSON_PARTITION	= (1 << 12),
 	UTIL_JSON_ALERT_CONFIG	= (1 << 13),
+	UTIL_JSON_INJ_ERRORS	= (1 << 14),
 };
 
 void util_display_json_array(FILE *f_out, struct json_object *jarray,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [ndctl PATCH v3 7/7] Documentation: Add docs for inject/clear-error commands
  2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
                   ` (5 preceding siblings ...)
  2025-10-21 18:31 ` [ndctl PATCH v3 6/7] cxl/list: Add injectable errors in output Ben Cheatham
@ 2025-10-21 18:31 ` Ben Cheatham
  2025-10-22 17:22   ` Dave Jiang
  6 siblings, 1 reply; 20+ messages in thread
From: Ben Cheatham @ 2025-10-21 18:31 UTC (permalink / raw)
  To: nvdimm; +Cc: linux-cxl, alison.schofield, Ben Cheatham

Add man pages for the 'cxl-inject-error' and 'cxl-clear-error' commands.
These man pages show usage and examples for each of their use cases.

Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 Documentation/cxl/cxl-clear-error.txt  |  67 +++++++++++++
 Documentation/cxl/cxl-inject-error.txt | 129 +++++++++++++++++++++++++
 Documentation/cxl/meson.build          |   2 +
 3 files changed, 198 insertions(+)
 create mode 100644 Documentation/cxl/cxl-clear-error.txt
 create mode 100644 Documentation/cxl/cxl-inject-error.txt

diff --git a/Documentation/cxl/cxl-clear-error.txt b/Documentation/cxl/cxl-clear-error.txt
new file mode 100644
index 0000000..ccb0e63
--- /dev/null
+++ b/Documentation/cxl/cxl-clear-error.txt
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-clear-error(1)
+==================
+
+NAME
+----
+cxl-clear-error - Clear CXL errors from CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl clear-error' <device name> [<options>]
+
+Clear an error from a CXL device. The types of devices supported are:
+
+"memdevs":: A CXL memory device. Memory devices are specified by device
+name ("mem0"), device id ("0") and/or host device name ("0000:35:00.0").
+
+Only device poison (viewable using the '-L'/'--media-errors' option of
+'cxl-list') can be cleared from a device using this command. For example:
+
+----
+
+# cxl list -m mem0 -L -u
+{
+  "memdev":"mem0",
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x0",
+  "numa_node":1,
+  "host":"0000:35:00.0",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+
+# cxl clear-error mem0 -a 0x1000
+poison cleared at mem0:0x1000
+
+# cxl list -m mem0 -L -u
+{
+  "memdev":"mem0",
+  "ram_size":"1024.00 MiB (1073.74 MB)",
+  "ram_qos_class":42,
+  "serial":"0x0",
+  "numa_node":1,
+  "host":"0000:35:00.0",
+  "media_errors":[
+  ]
+}
+
+----
+
+OPTIONS
+-------
+-a::
+--address::
+	Device physical address (DPA) to clear poison from. Address can be specified
+	in hex or decimal. Required for clearing poison.
+
+--debug::
+	Enable debug output
diff --git a/Documentation/cxl/cxl-inject-error.txt b/Documentation/cxl/cxl-inject-error.txt
new file mode 100644
index 0000000..e1bebd7
--- /dev/null
+++ b/Documentation/cxl/cxl-inject-error.txt
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+
+cxl-inject-error(1)
+===================
+
+NAME
+----
+cxl-inject-error - Inject CXL errors into CXL devices
+
+SYNOPSIS
+--------
+[verse]
+'cxl inject-error' <device name> [<options>]
+
+Inject an error into a CXL device. The types of errors supported depend on the
+device specified. The types of devices supported are:
+
+"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
+Eligible CXL 2.0+ ports are dports of ports at depth 1 in the output of cxl-list.
+Dports are specified by host name ("0000:0e:01.1").
+"memdevs":: A CXL memory device. Memory devices are specified by device name
+("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
+
+There are two types of errors which can be injected: CXL protocol errors
+and device poison.
+
+CXL protocol errors can only be used with downstream ports (as defined above).
+Protocol errors follow the format of "<protocol>-<severity>". For example,
+a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
+found with the '-N' option of 'cxl-list' under a CXL bus object. For example:
+
+----
+
+# cxl list -NB
+[
+  {
+    "bus":"root0",
+    "provider":"ACPI.CXL",
+    "injectable_protocol_errors":[
+      "mem-correctable",
+      "mem-fatal"
+    ]
+  }
+]
+
+----
+
+CXL protocol (CXL.cache/mem) error injection requires the platform to support
+ACPI v6.5+ error injection (EINJ). In addition to platform support, the
+CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
+will need to be enabled. For more information, view the Linux kernel documentation
+on EINJ.
+
+Device poison can only be used with CXL memory devices. A device physical address
+(DPA) is required to do poison injection. DPAs range from 0 to the size of the
+device's memory, which can be found using 'cxl-list'. An example injection:
+
+----
+
+# cxl inject-error mem0 -t poison -a 0x1000
+poison injected at mem0:0x1000
+# cxl list -m mem0 -u --media-errors
+{
+  "memdev":"mem0",
+  "ram_size":"256.00 MiB (268.44 MB)",
+  "serial":"0",
+  "host":"0000:0d:00.0",
+  "firmware_version":"BWFW VERSION 00",
+  "media_errors":[
+    {
+      "offset":"0x1000",
+      "length":64,
+      "source":"Injected"
+    }
+  ]
+}
+
+----
+
+Not all devices support poison injection. To see if a device supports poison injection
+through debugfs, use 'cxl-list' with the '-N' option and look for the "poison_injectable"
+attribute under the device. Example:
+
+----
+
+# cxl list -Nu -m mem0
+{
+  "memdev":"mem0",
+  "ram_size":"256.00 MiB (268.44 MB)",
+  "serial":"0",
+  "host":"0000:0d:00.0",
+  "firmware_version":"BWFW VERSION 00",
+  "poison_injectable":true
+}
+
+----
+
+This command depends on the kernel debug filesystem (debugfs) to do CXL protocol
+error and device poison injection.
+
+OPTIONS
+-------
+-a::
+--address::
+	Device physical address (DPA) to use for poison injection. Address can
+	be specified in hex or decimal. Required for poison injection.
+
+-t::
+--type::
+	Type of error to inject into <device name>. The type of error is restricted
+	by device type. The following shows the possible types under their associated
+	device type(s):
+----
+
+Downstream Ports: ::
+	cache-correctable, cache-uncorrectable, cache-fatal, mem-correctable,
+	mem-uncorrectable, mem-fatal
+
+Memdevs: ::
+	poison
+
+----
+
+--debug::
+	Enable debug output
+
+SEE ALSO
+--------
+linkcxl:cxl-list[1]
diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
index 8085c1c..0b75eed 100644
--- a/Documentation/cxl/meson.build
+++ b/Documentation/cxl/meson.build
@@ -50,6 +50,8 @@ cxl_manpages = [
   'cxl-update-firmware.txt',
   'cxl-set-alert-config.txt',
   'cxl-wait-sanitize.txt',
+  'cxl-inject-error.txt',
+  'cxl-clear-error.txt',
 ]
 
 foreach man : cxl_manpages
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context
  2025-10-21 18:31 ` [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
@ 2025-10-21 22:55   ` Dave Jiang
  2025-10-23 20:15     ` Cheatham, Benjamin
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jiang @ 2025-10-21 22:55 UTC (permalink / raw)
  To: Ben Cheatham, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/21/25 11:31 AM, Ben Cheatham wrote:
> Find the CXL debugfs mount point and add it to the CXL library context.
> This will be used by poison and protocol error library functions to
> access the information presented by the filesystem.
> 
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
>  cxl/lib/libcxl.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index cafde1c..ea5831f 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -54,6 +54,7 @@ struct cxl_ctx {
>  	struct kmod_ctx *kmod_ctx;
>  	struct daxctl_ctx *daxctl_ctx;
>  	void *private_data;
> +	const char *debugfs;
>  };
>  
>  static void free_pmem(struct cxl_pmem *pmem)
> @@ -240,6 +241,43 @@ CXL_EXPORT void *cxl_get_private_data(struct cxl_ctx *ctx)
>  	return ctx->private_data;
>  }
>  
> +static char *get_debugfs_dir(void)

const char *?


Also maybe get_debugfs_dir_path()

> +{
> +	char *dev, *dir, *type, *ret = NULL;

'debugfs_dir' rather than 'ret' would be clearer to read.

DJ

> +	char line[PATH_MAX + 256 + 1];
> +	FILE *fp;
> +
> +	fp = fopen("/proc/mounts", "r");
> +	if (!fp)
> +		return ret;
> +
> +	while (fgets(line, sizeof(line), fp)) {
> +		dev = strtok(line, " \t");
> +		if (!dev)
> +			break;
> +
> +		dir = strtok(NULL, " \t");
> +		if (!dir)
> +			break;
> +
> +		type = strtok(NULL, " \t");
> +		if (!type)
> +			break;
> +
> +		if (!strcmp(type, "debugfs")) {
> +			ret = calloc(strlen(dir) + 1, 1);
> +			if (!ret)
> +				break;
> +
> +			strcpy(ret, dir);
> +			break;
> +		}
> +	}
> +
> +	fclose(fp);
> +	return ret;
> +}
> +
>  /**
>   * cxl_new - instantiate a new library context
>   * @ctx: context to establish
> @@ -295,6 +333,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
>  	c->udev = udev;
>  	c->udev_queue = udev_queue;
>  	c->timeout = 5000;
> +	c->debugfs = get_debugfs_dir();
>  
>  	return 0;
>  
> @@ -350,6 +389,7 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
>  	kmod_unref(ctx->kmod_ctx);
>  	daxctl_unref(ctx->daxctl_ctx);
>  	info(ctx, "context %p released\n", ctx);
> +	free((void *)ctx->debugfs);
>  	free(ctx);
>  }
>  



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors
  2025-10-21 18:31 ` [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors Ben Cheatham
@ 2025-10-21 23:15   ` Dave Jiang
  2025-10-23 20:15     ` Cheatham, Benjamin
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jiang @ 2025-10-21 23:15 UTC (permalink / raw)
  To: Ben Cheatham, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/21/25 11:31 AM, Ben Cheatham wrote:
> The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
> injection for platforms that implement the error types according to
> the v6.5+ ACPI specification. The interface for injecting these errors
> is provided by the kernel under the CXL debugfs. The relevant files in
> the interface are the einj_types file, which provides the available CXL
> error types for injection, and the einj_inject file, which injects the
> error into a CXL VH root port or CXL RCH downstream port.
> 
> Add a library API to retrieve the CXL error types and inject them. This
> API will be used in a later commit by the 'cxl-inject-error' and
> 'cxl-list' commands.
> 
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
>  cxl/lib/libcxl.c   | 174 +++++++++++++++++++++++++++++++++++++++++++++
>  cxl/lib/libcxl.sym |   5 ++
>  cxl/lib/private.h  |  14 ++++
>  cxl/libcxl.h       |  13 ++++
>  4 files changed, 206 insertions(+)
> 
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index ea5831f..9486b0f 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -46,11 +46,13 @@ struct cxl_ctx {
>  	void *userdata;
>  	int memdevs_init;
>  	int buses_init;
> +	int perrors_init;
>  	unsigned long timeout;
>  	struct udev *udev;
>  	struct udev_queue *udev_queue;
>  	struct list_head memdevs;
>  	struct list_head buses;
> +	struct list_head perrors;
>  	struct kmod_ctx *kmod_ctx;
>  	struct daxctl_ctx *daxctl_ctx;
>  	void *private_data;
> @@ -205,6 +207,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
>  	free(bus);
>  }
>  
> +static void free_protocol_error(struct cxl_protocol_error *perror,
> +				struct list_head *head)
> +{
> +	if (head)
> +		list_del_from(head, &perror->list);

I would go if (!head) return;

> +	free(perror);
> +}
> +
>  /**
>   * cxl_get_userdata - retrieve stored data pointer from library context
>   * @ctx: cxl library context
> @@ -328,6 +338,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
>  	*ctx = c;
>  	list_head_init(&c->memdevs);
>  	list_head_init(&c->buses);
> +	list_head_init(&c->perrors);
>  	c->kmod_ctx = kmod_ctx;
>  	c->daxctl_ctx = daxctl_ctx;
>  	c->udev = udev;
> @@ -369,6 +380,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
>   */
>  CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
>  {
> +	struct cxl_protocol_error *perror, *_p;
>  	struct cxl_memdev *memdev, *_d;
>  	struct cxl_bus *bus, *_b;
>  
> @@ -384,6 +396,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
>  	list_for_each_safe(&ctx->buses, bus, _b, port.list)
>  		free_bus(bus, &ctx->buses);
>  
> +	list_for_each_safe(&ctx->perrors, perror, _p, list)
> +		free_protocol_error(perror, &ctx->perrors);
> +
>  	udev_queue_unref(ctx->udev_queue);
>  	udev_unref(ctx->udev);
>  	kmod_unref(ctx->kmod_ctx);
> @@ -3416,6 +3431,165 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
>  	return port->decoders_committed;
>  }
>  
> +const struct cxl_protocol_error cxl_protocol_errors[] = {
> +	CXL_PROTOCOL_ERROR(12, "cache-correctable"),
> +	CXL_PROTOCOL_ERROR(13, "cache-uncorrectable"),
> +	CXL_PROTOCOL_ERROR(14, "cache-fatal"),
> +	CXL_PROTOCOL_ERROR(15, "mem-correctable"),
> +	CXL_PROTOCOL_ERROR(16, "mem-uncorrectable"),
> +	CXL_PROTOCOL_ERROR(17, "mem-fatal")
> +};
> +
> +static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
> +							    unsigned long n)

why unsigned long instead of int? are there that many errors?

> +{
> +	struct cxl_protocol_error *perror;
> +
> +	for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
> +		if (n != BIT(cxl_protocol_errors[i].num))
> +			continue;
> +
> +		perror = calloc(1, sizeof(*perror));
> +		if (!perror)
> +			return NULL;
> +
> +		*perror = cxl_protocol_errors[i];
> +		perror->ctx = ctx;
> +		return perror;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
> +{
> +	struct cxl_protocol_error *perror;
> +	char *path, *num, *save;
> +	unsigned long n;
> +	size_t path_len;
> +	char buf[512];

Use SYSFS_ATTR_SIZE rather than 512

> +	int rc = 0;
> +
> +	if (!ctx->debugfs)
> +		return;
> +
> +	path_len = strlen(ctx->debugfs) + 100;
> +	path = calloc(1, path_len);
> +	if (!path)
> +		return;
> +
> +	snprintf(path, path_len, "%s/cxl/einj_types", ctx->debugfs);
> +	rc = access(path, F_OK);
> +	if (rc) {
> +		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
strerror(errno)? access() returns -1 and the actual error is in errno.
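
Something along these lines would keep the errno handling in one place. This is only a sketch, and path_access_rc() is a hypothetical name:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 0 if path exists, otherwise a negative
 * errno value suitable for strerror(-rc) and for returning to callers.
 */
static int path_access_rc(const char *path)
{
	if (access(path, F_OK) == 0)
		return 0;

	/* access() itself only returns -1; the cause is in errno */
	return -errno;
}
```

With that, the existing strerror(-rc) calls would print the real cause.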
> +		goto err;
> +	}
> +
> +	rc = sysfs_read_attr(ctx, path, buf);
> +	if (rc) {
> +		err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
> +		goto err;
> +	}
> +
> +	/*
> +	 * The format of the output of the einj_types attr is:
> +	 * <Error number in hex 1> <Error name 1>
> +	 * <Error number in hex 2> <Error name 2>
> +	 * ...
> +	 *
> +	 * We only need the number, so parse that and skip the rest of
> +	 * the line.
> +	 */
> +	num = strtok_r(buf, " \n", &save);
> +	while (num) {
> +		n = strtoul(num, NULL, 16);
> +		perror = create_cxl_protocol_error(ctx, n);
> +		if (perror)
> +			list_add(&ctx->perrors, &perror->list);
> +
> +		num = strtok_r(NULL, "\n", &save);
> +		if (!num)
> +			break;
> +
> +		num = strtok_r(NULL, " \n", &save);
> +	}
> +
> +err:
> +	free(path);
> +}
> +
> +static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
> +{
> +	if (ctx->perrors_init)
> +		return;
> +
> +	ctx->perrors_init = 1;
> +	cxl_add_protocol_errors(ctx);
> +}
> +
> +CXL_EXPORT struct cxl_protocol_error *
> +cxl_protocol_error_get_first(struct cxl_ctx *ctx)
> +{
> +	cxl_protocol_errors_init(ctx);
> +
> +	return list_top(&ctx->perrors, struct cxl_protocol_error, list);
> +}
> +
> +CXL_EXPORT struct cxl_protocol_error *
> +cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
> +{
> +	struct cxl_ctx *ctx = perror->ctx;
> +
> +	return list_next(&ctx->perrors, perror, list);
> +}
> +
> +CXL_EXPORT unsigned long
> +cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
> +{
> +	return perror->num;
> +}
> +
> +CXL_EXPORT const char *
> +cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
> +{
> +	return perror->string;
> +}
> +
> +CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
> +					       unsigned long error)
> +{
> +	struct cxl_ctx *ctx = dport->port->ctx;
> +	unsigned long path_len;
> +	char buf[32] = { 0 };
> +	char *path;
> +	int rc;
> +
> +	if (!ctx->debugfs)
> +		return -ENOENT;
> +
> +	path_len = strlen(ctx->debugfs) + 100;
> +	path = calloc(path_len, sizeof(char));
> +	if (!path)
> +		return -ENOMEM;
> +
> +	snprintf(path, path_len, "%s/cxl/%s/einj_inject", ctx->debugfs,
> +		 cxl_dport_get_devname(dport));

check return value

> +	rc = access(path, F_OK);
> +	if (rc) {
> +		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));

errno

> +		free(path);
> +		return rc;
-errno instead of rc

> +	}
> +
> +	snprintf(buf, sizeof(buf), "0x%lx\n", error);

check return value?
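
A minimal sketch of what the check could look like, assuming truncation should be treated as an error. format_checked() is a hypothetical helper, not an existing libcxl function:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<base>/cxl/<dev>/<attr>" into buf.
 * Returns 0 on success, -EINVAL on an output error, or -ENAMETOOLONG
 * if the result did not fit in len bytes.
 */
static int format_checked(char *buf, size_t len, const char *base,
			  const char *dev, const char *attr)
{
	int rc = snprintf(buf, len, "%s/cxl/%s/%s", base, dev, attr);

	if (rc < 0)
		return -EINVAL;
	/* snprintf() returns the length it wanted to write */
	if ((size_t)rc >= len)
		return -ENAMETOOLONG;
	return 0;
}
```

The same pattern would apply to the snprintf() calls in the poison helpers.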

DJ

> +	rc = sysfs_write_attr(ctx, path, buf);
> +	if (rc)
> +		err(ctx, "failed to write %s: %s\n", path, strerror(-rc));
> +
> +	free(path);
> +	return rc;
> +}
> +
>  static void *add_cxl_bus(void *parent, int id, const char *cxlbus_base)
>  {
>  	const char *devname = devpath_to_devname(cxlbus_base);
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index e01a676..02d5119 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -299,4 +299,9 @@ global:
>  LIBCXL_10 {
>  global:
>  	cxl_memdev_is_port_ancestor;
> +	cxl_protocol_error_get_first;
> +	cxl_protocol_error_get_next;
> +	cxl_protocol_error_get_num;
> +	cxl_protocol_error_get_str;
> +	cxl_dport_protocol_error_inject;
>  } LIBCXL_9;
> diff --git a/cxl/lib/private.h b/cxl/lib/private.h
> index 7d5a1bc..4e881b6 100644
> --- a/cxl/lib/private.h
> +++ b/cxl/lib/private.h
> @@ -108,6 +108,20 @@ struct cxl_port {
>  	struct list_head dports;
>  };
>  
> +struct cxl_protocol_error {
> +	unsigned long num;
> +	const char *string;
> +	struct cxl_ctx *ctx;
> +	struct list_node list;
> +};
> +
> +#define CXL_PROTOCOL_ERROR(n, str)	\
> +	((struct cxl_protocol_error){	\
> +		.num = (n),		\
> +		.string = (str),	\
> +		.ctx = NULL,		\
> +	})
> +
>  struct cxl_bus {
>  	struct cxl_port port;
>  };
> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index 54bc025..9026e05 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -496,6 +496,19 @@ int cxl_cmd_alert_config_set_enable_alert_actions(struct cxl_cmd *cmd,
>  						  int enable);
>  struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memdev);
>  
> +struct cxl_protocol_error;
> +struct cxl_protocol_error *cxl_protocol_error_get_first(struct cxl_ctx *ctx);
> +struct cxl_protocol_error *
> +cxl_protocol_error_get_next(struct cxl_protocol_error *perror);
> +unsigned long cxl_protocol_error_get_num(struct cxl_protocol_error *perror);
> +const char *cxl_protocol_error_get_str(struct cxl_protocol_error *perror);
> +int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
> +				    unsigned long error);
> +
> +#define cxl_protocol_error_foreach(ctx, perror)				       \
> +	for (perror = cxl_protocol_error_get_first(ctx); perror != NULL;       \
> +	     perror = cxl_protocol_error_get_next(perror))
> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 3/7] libcxl: Add poison injection support
  2025-10-21 18:31 ` [ndctl PATCH v3 3/7] libcxl: Add poison injection support Ben Cheatham
@ 2025-10-21 23:44   ` Dave Jiang
  2025-10-23 20:15     ` Cheatham, Benjamin
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jiang @ 2025-10-21 23:44 UTC (permalink / raw)
  To: Ben Cheatham, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/21/25 11:31 AM, Ben Cheatham wrote:
> Add a library API for clearing and injecting poison into a CXL memory
> device through the CXL debugfs.
> 
> This API will be used by the 'cxl-inject-error' and 'cxl-clear-error'
> commands in later commits.
> 
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
>  cxl/lib/libcxl.c   | 60 ++++++++++++++++++++++++++++++++++++++++++++++
>  cxl/lib/libcxl.sym |  3 +++
>  cxl/libcxl.h       |  3 +++
>  3 files changed, 66 insertions(+)
> 
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index 9486b0f..9d4bd80 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -5019,3 +5019,63 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
>  {
>  	return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
>  }
> +
> +CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
> +{
> +	struct cxl_ctx *ctx = memdev->ctx;
> +	size_t path_len;
> +	bool exists;
> +	char *path;
> +
> +	if (!ctx->debugfs)
> +		return false;
> +
> +	path_len = strlen(ctx->debugfs) + 100;
> +	path = calloc(path_len, sizeof(char));
> +	if (!path)
> +		return false;
> +
> +	snprintf(path, path_len, "%s/cxl/%s/inject_poison", ctx->debugfs,
> +		 cxl_memdev_get_devname(memdev));

check return value

> +	exists = access(path, F_OK) == 0;

While this works, it is more readable this way:

	exists = true;
	...
	rc = access(path, F_OK);
	if (rc)
		exists = false;

> +
> +	free(path);
> +	return exists;
> +}
> +
> +static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
> +				    bool clear)
> +{
> +	struct cxl_ctx *ctx = memdev->ctx;
> +	size_t path_len;
> +	char addr[32];
> +	char *path;
> +	int rc;
> +
> +	if (!ctx->debugfs)
> +		return -ENOENT;
> +
> +	path_len = strlen(ctx->debugfs) + 100;
> +	path = calloc(path_len, sizeof(char));
> +	if (!path)
> +		return -ENOMEM;
> +
> +	snprintf(path, path_len, "%s/cxl/%s/%s", ctx->debugfs,
> +		 cxl_memdev_get_devname(memdev),
> +		 clear ? "clear_poison" : "inject_poison");
> +	snprintf(addr, 32, "0x%lx\n", dpa);

check return values for both snprintf()

DJ

> +
> +	rc = sysfs_write_attr(ctx, path, addr);
> +	free(path);
> +	return rc;
> +}
> +
> +CXL_EXPORT int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t addr)
> +{
> +	return cxl_memdev_poison_action(memdev, addr, false);
> +}
> +
> +CXL_EXPORT int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t addr)
> +{
> +	return cxl_memdev_poison_action(memdev, addr, true);
> +}
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index 02d5119..3bce60d 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -304,4 +304,7 @@ global:
>  	cxl_protocol_error_get_num;
>  	cxl_protocol_error_get_str;
>  	cxl_dport_protocol_error_inject;
> +	cxl_memdev_has_poison_injection;
> +	cxl_memdev_inject_poison;
> +	cxl_memdev_clear_poison;
>  } LIBCXL_9;
> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index 9026e05..3b51d61 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -105,6 +105,9 @@ int cxl_memdev_read_label(struct cxl_memdev *memdev, void *buf, size_t length,
>  		size_t offset);
>  int cxl_memdev_write_label(struct cxl_memdev *memdev, void *buf, size_t length,
>  		size_t offset);
> +bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev);
> +int cxl_memdev_inject_poison(struct cxl_memdev *memdev, size_t dpa);
> +int cxl_memdev_clear_poison(struct cxl_memdev *memdev, size_t dpa);
>  struct cxl_cmd *cxl_cmd_new_get_fw_info(struct cxl_memdev *memdev);
>  unsigned int cxl_cmd_fw_info_get_num_slots(struct cxl_cmd *cmd);
>  unsigned int cxl_cmd_fw_info_get_active_slot(struct cxl_cmd *cmd);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 4/7] cxl: Add inject-error command
  2025-10-21 18:31 ` [ndctl PATCH v3 4/7] cxl: Add inject-error command Ben Cheatham
@ 2025-10-22 17:06   ` Dave Jiang
  2025-10-23 20:15     ` Cheatham, Benjamin
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jiang @ 2025-10-22 17:06 UTC (permalink / raw)
  To: Ben Cheatham, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/21/25 11:31 AM, Ben Cheatham wrote:
> Add the 'cxl-inject-error' command. This command will provide CXL
> protocol error injection for CXL VH root ports and CXL RCH downstream
> ports, as well as poison injection for CXL memory devices.
> 
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
>  cxl/builtin.h      |   1 +
>  cxl/cxl.c          |   1 +
>  cxl/inject-error.c | 195 +++++++++++++++++++++++++++++++++++++++++++++
>  cxl/meson.build    |   1 +
>  4 files changed, 198 insertions(+)
>  create mode 100644 cxl/inject-error.c
> 
> diff --git a/cxl/builtin.h b/cxl/builtin.h
> index c483f30..e82fcb5 100644
> --- a/cxl/builtin.h
> +++ b/cxl/builtin.h
> @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
>  int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>  int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>  int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
>  #ifdef ENABLE_LIBTRACEFS
>  int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
>  #else
> diff --git a/cxl/cxl.c b/cxl/cxl.c
> index 1643667..a98bd6b 100644
> --- a/cxl/cxl.c
> +++ b/cxl/cxl.c
> @@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
>  	{ "disable-region", .c_fn = cmd_disable_region },
>  	{ "destroy-region", .c_fn = cmd_destroy_region },
>  	{ "monitor", .c_fn = cmd_monitor },
> +	{ "inject-error", .c_fn = cmd_inject_error },
>  };
>  
>  int main(int argc, const char **argv)
> diff --git a/cxl/inject-error.c b/cxl/inject-error.c
> new file mode 100644
> index 0000000..c48ea69
> --- /dev/null
> +++ b/cxl/inject-error.c
> @@ -0,0 +1,195 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 AMD. All rights reserved. */
> +#include <util/parse-options.h>
> +#include <cxl/libcxl.h>
> +#include <cxl/filter.h>
> +#include <util/log.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <stdio.h>
> +#include <errno.h>
> +#include <limits.h>
> +
> +#define EINJ_TYPES_BUF_SIZE 512
> +
> +static bool debug;
> +
> +static struct inject_params {
> +	const char *type;
> +	const char *address;
> +} inj_param;
> +
> +static const struct option inject_options[] = {
> +	OPT_STRING('t', "type", &inj_param.type, "Error type",
> +		   "Error type to inject into <device>"),
> +	OPT_STRING('a', "address", &inj_param.address, "Address for poison injection",
> +		   "Device physical address for poison injection in hex or decimal"),
> +#ifdef ENABLE_DEBUG
> +	OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
> +#endif
> +	OPT_END(),
> +};
> +
> +static struct log_ctx iel;
> +
> +static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
> +						     const char *type)
> +{
> +	struct cxl_protocol_error *perror;
> +
> +	cxl_protocol_error_foreach(ctx, perror) {
> +		if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
> +			return perror;
> +	}
> +
> +	log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
> +	return NULL;
> +}
> +
> +static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
> +{
> +	struct cxl_port *port, *top;
> +	struct cxl_dport *dport;
> +	struct cxl_bus *bus;
> +
> +	cxl_bus_foreach(ctx, bus) {
> +		top = cxl_bus_get_port(bus);
> +
> +		cxl_port_foreach_all(top, port)
> +			cxl_dport_foreach(port, dport)
> +				if (!strcmp(devname,
> +					    cxl_dport_get_devname(dport)))
> +					return dport;

Would it be worthwhile to create a util_cxl_dport_filter()?

> +	}
> +
> +	log_err(&iel, "Downstream port \"%s\" not found\n", devname);
> +	return NULL;
> +}
> +
> +static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx,
> +					  const char *filter)
> +{
> +	struct cxl_memdev *memdev;
> +
> +	cxl_memdev_foreach(ctx, memdev) {
> +		if (util_cxl_memdev_filter(memdev, filter, NULL))
> +			return memdev;
> +	}
> +
> +	log_err(&iel, "Memdev \"%s\" not found\n", filter);
> +	return NULL;
> +}
> +
> +static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
> +			    struct cxl_protocol_error *perror)
> +{
> +	struct cxl_dport *dport;
> +	int rc;
> +
> +	if (!devname) {
> +		log_err(&iel, "No downstream port specified for injection\n");
> +		return -EINVAL;
> +	}
> +
> +	dport = find_cxl_dport(ctx, devname);
> +	if (!dport)
> +		return -ENODEV;
> +
> +	rc = cxl_dport_protocol_error_inject(dport,
> +					     cxl_protocol_error_get_num(perror));
> +	if (rc)
> +		return rc;
> +
> +	printf("injected %s protocol error.\n",
> +	       cxl_protocol_error_get_str(perror));

log_info() maybe?

> +	return 0;
> +}
> +
> +static int poison_action(struct cxl_ctx *ctx, const char *filter,
> +			 const char *addr)
> +{
> +	struct cxl_memdev *memdev;
> +	size_t a;

Maybe rename 'addr' to 'addr_str' and rename 'a' to 'addr'

> +	int rc;
> +
> +	memdev = find_cxl_memdev(ctx, filter);
> +	if (!memdev)
> +		return -ENODEV;
> +
> +	if (!cxl_memdev_has_poison_injection(memdev)) {
> +		log_err(&iel, "%s does not support error injection\n",
> +			cxl_memdev_get_devname(memdev));
> +		return -EINVAL;
> +	}
> +
> +	if (!addr) {
> +		log_err(&iel, "no address provided\n");
> +		return -EINVAL;
> +	}
> +
> +	a = strtoull(addr, NULL, 0);
> +	if (a == ULLONG_MAX && errno == ERANGE) {
> +		log_err(&iel, "invalid address %s", addr);
> +		return -EINVAL;
> +	}
> +
> +	rc = cxl_memdev_inject_poison(memdev, a);
> +

unnecessary blank line

> +	if (rc)
> +		log_err(&iel, "failed to inject poison at %s:%s: %s\n",
> +			cxl_memdev_get_devname(memdev), addr, strerror(-rc));
> +	else
> +		printf("poison injected at %s:%s\n",
> +		       cxl_memdev_get_devname(memdev), addr);

log_info() maybe?

DJ

> +
> +	return rc;
> +}
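
For reference, the address check above tests the strtoull() result only against ULLONG_MAX with ERANGE. errno is not cleared before the call, and trailing characters are not rejected. A minimal sketch of a stricter parse follows, assuming a hypothetical helper named parse_dpa() that is not part of the posted patch:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the posted patch): parse a device physical
 * address given in hex ("0x1000") or decimal ("4096").
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
static int parse_dpa(const char *str, uint64_t *dpa)
{
	unsigned long long val;
	char *end;

	assert(dpa != NULL);

	/* Reject NULL, empty strings, leading whitespace and signs. */
	if (!str || !isdigit((unsigned char)*str))
		return -EINVAL;

	errno = 0;	/* strtoull() only sets errno on failure */
	val = strtoull(str, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == str || *end != '\0')
		return -EINVAL;	/* no digits consumed, or trailing junk */

	*dpa = val;
	return 0;
}
```

A caller such as poison_action() could then log "invalid address" on any non-zero return before touching the device.
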
> +
> +static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx,
> +			 const struct option *options, const char *usage)
> +{
> +	struct cxl_protocol_error *perr;
> +	const char * const u[] = {
> +		usage,
> +		NULL
> +	};
> +	int rc = -EINVAL;
> +
> +	log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG");
> +	argc = parse_options(argc, argv, options, u, 0);
> +
> +	if (debug) {
> +		cxl_set_log_priority(ctx, LOG_DEBUG);
> +		iel.log_priority = LOG_DEBUG;
> +	} else {
> +		iel.log_priority = LOG_INFO;
> +	}
> +
> +	if (argc != 1) {
> +		usage_with_options(u, options);
> +		return rc;
> +	}
> +
> +	if (strcmp(inj_param.type, "poison") == 0) {
> +		rc = poison_action(ctx, argv[0], inj_param.address);
> +		return rc;
> +	}
> +
> +	perr = find_cxl_proto_err(ctx, inj_param.type);
> +	if (perr) {
> +		rc = inject_proto_err(ctx, argv[0], perr);
> +		if (rc)
> +			log_err(&iel, "Failed to inject error: %d\n", rc);
> +	}
> +
> +	log_err(&iel, "Invalid error type %s", inj_param.type);
> +	return rc;
> +}
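
As posted, inject_action() appears to reach the final log_err() ("Invalid error type") even after a successful protocol error injection, and strcmp() is called on inj_param.type even when -t was not given. A minimal sketch of the same dispatch with early returns follows. The stub_* functions are hypothetical stand-ins for poison_action() and for find_cxl_proto_err() plus inject_proto_err(), so that the control flow can be shown in isolation:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for poison_action(): only checks that an address was given. */
static int stub_poison(const char *dev, const char *addr)
{
	(void)dev;
	return addr ? 0 : -EINVAL;
}

/* Stand-in for find_cxl_proto_err() + inject_proto_err(). */
static int stub_proto(const char *dev, const char *type)
{
	static const char * const known[] = {
		"cache-correctable", "cache-uncorrectable", "cache-fatal",
		"mem-correctable", "mem-uncorrectable", "mem-fatal",
	};

	(void)dev;
	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (strcmp(type, known[i]) == 0)
			return 0;

	return -EINVAL;	/* unknown type: the lookup already logged it */
}

/* Every branch returns, so no error message is printed on success. */
static int dispatch_inject(const char *dev, const char *type,
			   const char *addr)
{
	if (!type)
		return -EINVAL;	/* -t/--type was not supplied */

	if (strcmp(type, "poison") == 0)
		return stub_poison(dev, addr);

	return stub_proto(dev, type);
}
```
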
> +
> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx)
> +{
> +	int rc = inject_action(argc, argv, ctx, inject_options,
> +			       "inject-error <device> [<options>]");
> +
> +	return rc ? EXIT_FAILURE : EXIT_SUCCESS;
> +}
> diff --git a/cxl/meson.build b/cxl/meson.build
> index b9924ae..92031b5 100644
> --- a/cxl/meson.build
> +++ b/cxl/meson.build
> @@ -7,6 +7,7 @@ cxl_src = [
>    'memdev.c',
>    'json.c',
>    'filter.c',
> +  'inject-error.c',
>    '../daxctl/json.c',
>    '../daxctl/filter.c',
>  ]



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 6/7] cxl/list: Add injectable errors in output
  2025-10-21 18:31 ` [ndctl PATCH v3 6/7] cxl/list: Add injectable errors in output Ben Cheatham
@ 2025-10-22 17:18   ` Dave Jiang
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jiang @ 2025-10-22 17:18 UTC (permalink / raw)
  To: Ben Cheatham, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/21/25 11:31 AM, Ben Cheatham wrote:
> Add the "--injectable-errors"/"-N" option to show injectable error
> information for CXL devices. The applicable devices are CXL memory
> devices and CXL busses.
> 
> For CXL memory devices the option reports whether the device supports
> poison injection (the "--media-errors"/"-L" option shows injected
> poison).
> 
> For CXL busses the option shows injectable CXL protocol error types. The
> information will be the same across busses because the error types are
> system-wide. The information is presented under the bus for easier
> filtering.
> 
> Update the man page for 'cxl-list' to show the usage of the new option.
> 
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  Documentation/cxl/cxl-list.txt | 35 +++++++++++++++++++++++++++++++++-
>  cxl/filter.h                   |  3 +++
>  cxl/json.c                     | 30 +++++++++++++++++++++++++++++
>  cxl/list.c                     |  3 +++
>  util/json.h                    |  1 +
>  5 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> index 0595638..35ff542 100644
> --- a/Documentation/cxl/cxl-list.txt
> +++ b/Documentation/cxl/cxl-list.txt
> @@ -471,6 +471,38 @@ The media-errors option is only available with '-Dlibtracefs=enabled'.
>  }
>  ----
>  
> +-N::
> +--injectable-errors::
> +	Include injectable error information in the output. For CXL memory devices
> +	this includes whether poison is injectable through the kernel debug filesystem.
> +	The types of CXL protocol errors available for injection into downstream ports
> +	are listed as part of a CXL bus object.
> +
> +----
> +# cxl list -NB
> +[
> +  {
> +	"bus":"root0",
> +	"provider":"ACPI.CXL",
> +	"injectable_protocol_errors":[
> +	  "mem-correctable",
> +	  "mem-fatal"
> +	]
> +  }
> +]
> +
> +# cxl list -N
> +[
> +  {
> +    "memdev":"mem0",
> +    "pmem_size":268435456,
> +    "ram_size":268435456,
> +    "serial":2,
> +	"poison_injectable":true
> +  }
> +]
> +
> +----
>  -v::
>  --verbose::
>  	Increase verbosity of the output. This can be specified
> @@ -487,7 +519,8 @@ The media-errors option is only available with '-Dlibtracefs=enabled'.
>  	  devices with --idle.
>  	- *-vvv*
>  	  Everything *-vv* provides, plus enable
> -	  --health, --partition, and --media-errors.
> +	  --health, --partition, --media-errors, and
> +	  --injectable-errors.
>  
>  --debug::
>  	If the cxl tool was built with debug enabled, turn on debug
> diff --git a/cxl/filter.h b/cxl/filter.h
> index 956a46e..34f8387 100644
> --- a/cxl/filter.h
> +++ b/cxl/filter.h
> @@ -31,6 +31,7 @@ struct cxl_filter_params {
>  	bool alert_config;
>  	bool dax;
>  	bool media_errors;
> +	bool inj_errors;
>  	int verbose;
>  	struct log_ctx ctx;
>  };
> @@ -91,6 +92,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
>  		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
>  	if (param->media_errors)
>  		flags |= UTIL_JSON_MEDIA_ERRORS;
> +	if (param->inj_errors)
> +		flags |= UTIL_JSON_INJ_ERRORS;
>  	return flags;
>  }
>  
> diff --git a/cxl/json.c b/cxl/json.c
> index bde4589..2917477 100644
> --- a/cxl/json.c
> +++ b/cxl/json.c
> @@ -675,6 +675,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
>  			json_object_object_add(jdev, "firmware", jobj);
>  	}
>  
> +	if (flags & UTIL_JSON_INJ_ERRORS) {
> +		jobj = json_object_new_boolean(cxl_memdev_has_poison_injection(memdev));
> +		if (jobj)
> +			json_object_object_add(jdev, "poison_injectable", jobj);
> +	}
> +
>  	if (flags & UTIL_JSON_MEDIA_ERRORS) {
>  		jobj = util_cxl_poison_list_to_json(NULL, memdev, flags);
>  		if (jobj)
> @@ -750,6 +756,8 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
>  					 unsigned long flags)
>  {
>  	const char *devname = cxl_bus_get_devname(bus);
> +	struct cxl_ctx *ctx = cxl_bus_get_ctx(bus);
> +	struct cxl_protocol_error *perror;
>  	struct json_object *jbus, *jobj;
>  
>  	jbus = json_object_new_object();
> @@ -765,6 +773,28 @@ struct json_object *util_cxl_bus_to_json(struct cxl_bus *bus,
>  		json_object_object_add(jbus, "provider", jobj);
>  
>  	json_object_set_userdata(jbus, bus, NULL);
> +
> +	if (flags & UTIL_JSON_INJ_ERRORS) {
> +		jobj = json_object_new_array();
> +		if (!jobj)
> +			return jbus;
> +
> +		cxl_protocol_error_foreach(ctx, perror)
> +		{
> +			struct json_object *jerr_str;
> +			const char *perror_str;
> +
> +			perror_str = cxl_protocol_error_get_str(perror);
> +
> +			jerr_str = json_object_new_string(perror_str);
> +			if (jerr_str)
> +				json_object_array_add(jobj, jerr_str);
> +		}
> +
> +		json_object_object_add(jbus, "injectable_protocol_errors",
> +				       jobj);
> +	}
> +
>  	return jbus;
>  }
>  
> diff --git a/cxl/list.c b/cxl/list.c
> index 0b25d78..a505ed6 100644
> --- a/cxl/list.c
> +++ b/cxl/list.c
> @@ -59,6 +59,8 @@ static const struct option options[] = {
>  		    "include alert configuration information"),
>  	OPT_BOOLEAN('L', "media-errors", &param.media_errors,
>  		    "include media-error information "),
> +	OPT_BOOLEAN('N', "injectable-errors", &param.inj_errors,
> +		    "include injectable error information"),
>  	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
>  #ifdef ENABLE_DEBUG
>  	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
> @@ -124,6 +126,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx)
>  		param.alert_config = true;
>  		param.dax = true;
>  		param.media_errors = true;
> +		param.inj_errors = true;
>  		/* fallthrough */
>  	case 2:
>  		param.idle = true;
> diff --git a/util/json.h b/util/json.h
> index 560f845..57278cb 100644
> --- a/util/json.h
> +++ b/util/json.h
> @@ -21,6 +21,7 @@ enum util_json_flags {
>  	UTIL_JSON_TARGETS	= (1 << 11),
>  	UTIL_JSON_PARTITION	= (1 << 12),
>  	UTIL_JSON_ALERT_CONFIG	= (1 << 13),
> +	UTIL_JSON_INJ_ERRORS	= (1 << 14),
>  };
>  
>  void util_display_json_array(FILE *f_out, struct json_object *jarray,


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 7/7] Documentation: Add docs for inject/clear-error commands
  2025-10-21 18:31 ` [ndctl PATCH v3 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
@ 2025-10-22 17:22   ` Dave Jiang
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jiang @ 2025-10-22 17:22 UTC (permalink / raw)
  To: Ben Cheatham, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/21/25 11:31 AM, Ben Cheatham wrote:
> Add man pages for the 'cxl-inject-error' and 'cxl-clear-error' commands.
> These man pages show usage and examples for each of their use cases.
> 
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  Documentation/cxl/cxl-clear-error.txt  |  67 +++++++++++++
>  Documentation/cxl/cxl-inject-error.txt | 129 +++++++++++++++++++++++++
>  Documentation/cxl/meson.build          |   2 +
>  3 files changed, 198 insertions(+)
>  create mode 100644 Documentation/cxl/cxl-clear-error.txt
>  create mode 100644 Documentation/cxl/cxl-inject-error.txt
> 
> diff --git a/Documentation/cxl/cxl-clear-error.txt b/Documentation/cxl/cxl-clear-error.txt
> new file mode 100644
> index 0000000..ccb0e63
> --- /dev/null
> +++ b/Documentation/cxl/cxl-clear-error.txt
> @@ -0,0 +1,67 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +cxl-clear-error(1)
> +==================
> +
> +NAME
> +----
> +cxl-clear-error - Clear CXL errors from CXL devices
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'cxl clear-error' <device name> [<options>]
> +
> +Clear an error from a CXL device. The types of devices supported are:
> +
> +"memdevs":: A CXL memory device. Memory devices are specified by device
> +name ("mem0"), device id ("0") and/or host device name ("0000:35:00.0").
> +
> +Only device poison (viewable using the '-L'/'--media-errors' option of
> +'cxl-list') can be cleared from a device using this command. For example:
> +
> +----
> +
> +# cxl list -m mem0 -L -u
> +{
> +  "memdev":"mem0",
> +  "ram_size":"1024.00 MiB (1073.74 MB)",
> +  "ram_qos_class":42,
> +  "serial":"0x0",
> +  "numa_node":1,
> +  "host":"0000:35:00.0",
> +  "media_errors":[
> +    {
> +	  "offset":"0x1000",
> +	  "length":64,
> +	  "source":"Injected"
> +	}
> +  ]
> +}
> +
> +# cxl clear-error mem0 -a 0x1000
> +poison cleared at mem0:0x1000
> +
> +# cxl list -m mem0 -L -u
> +{
> +  "memdev":"mem0",
> +  "ram_size":"1024.00 MiB (1073.74 MB)",
> +  "ram_qos_class":42,
> +  "serial":"0x0",
> +  "numa_node":1,
> +  "host":"0000:35:00.0",
> +  "media_errors":[
> +  ]
> +}
> +
> +----
> +
> +OPTIONS
> +-------
> +-a::
> +--address::
> +	Device physical address (DPA) to clear poison from. Address can be specified
> +	in hex or decimal. Required for clearing poison.
> +
> +--debug::
> +	Enable debug output
> diff --git a/Documentation/cxl/cxl-inject-error.txt b/Documentation/cxl/cxl-inject-error.txt
> new file mode 100644
> index 0000000..e1bebd7
> --- /dev/null
> +++ b/Documentation/cxl/cxl-inject-error.txt
> @@ -0,0 +1,129 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +cxl-inject-error(1)
> +===================
> +
> +NAME
> +----
> +cxl-inject-error - Inject CXL errors into CXL devices
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'cxl inject-error' <device name> [<options>]
> +
> +Inject an error into a CXL device. The types of errors supported depend on the
> +device specified. The types of devices supported are:
> +
> +"Downstream Ports":: A CXL RCH downstream port (dport) or a CXL VH root port.
> +Eligible CXL 2.0+ ports are dports of ports at depth 1 in the output of cxl-list.
> +Dports are specified by host name ("0000:0e:01.1").
> +"memdevs":: A CXL memory device. Memory devices are specified by device name
> +("mem0"), device id ("0"), and/or host device name ("0000:35:00.0").
> +
> +There are two types of errors which can be injected: CXL protocol errors
> +and device poison.
> +
> +CXL protocol errors can only be used with downstream ports (as defined above).
> +Protocol errors follow the format of "<protocol>-<severity>". For example,
> +a "mem-fatal" error is a CXL.mem fatal protocol error. Protocol errors can be
> +found with the '-N' option of 'cxl-list' under a CXL bus object. For example:
> +
> +----
> +
> +# cxl list -NB
> +[
> +  {
> +	"bus":"root0",
> +	"provider":"ACPI.CXL",
> +	"injectable_protocol_errors":[
> +	  "mem-correctable",
> +	  "mem-fatal"
> +	]
> +  }
> +]
> +
> +----
> +
> +CXL protocol (CXL.cache/mem) error injection requires the platform to support
> +ACPI v6.5+ error injection (EINJ). In addition to platform support, the
> +CONFIG_ACPI_APEI_EINJ and CONFIG_ACPI_APEI_EINJ_CXL kernel configuration options
> +will need to be enabled. For more information, view the Linux kernel documentation
> +on EINJ.
> +
> +Device poison can only be used with CXL memory devices. A device physical address
> +(DPA) is required to do poison injection. DPAs range from 0 to the size of
> +the device's memory, which can be found using 'cxl-list'. An example injection:
> +
> +----
> +
> +# cxl inject-error mem0 -t poison -a 0x1000
> +poison injected at mem0:0x1000
> +# cxl list -m mem0 -u --media-errors
> +{
> +  "memdev":"mem0",
> +  "ram_size":"256.00 MiB (268.44 MB)",
> +  "serial":"0",
> +  "host":"0000:0d:00.0",
> +  "firmware_version":"BWFW VERSION 00",
> +  "media_errors":[
> +    {
> +      "offset":"0x1000",
> +      "length":64,
> +      "source":"Injected"
> +    }
> +  ]
> +}
> +
> +----
> +
> +Not all devices support poison injection. To see if a device supports poison injection
> +through debugfs, use 'cxl-list' with the '-N' option and look for the "poison_injectable"
> +attribute under the device. Example:
> +
> +----
> +
> +# cxl list -Nu -m mem0
> +{
> +  "memdev":"mem0",
> +  "ram_size":"256.00 MiB (268.44 MB)",
> +  "serial":"0",
> +  "host":"0000:0d:00.0",
> +  "firmware_version":"BWFW VERSION 00",
> +  "poison_injectable":true
> +}
> +
> +----
> +
> +This command depends on the kernel debug filesystem (debugfs) to do CXL protocol
> +error and device poison injection.
> +
> +OPTIONS
> +-------
> +-a::
> +--address::
> +	Device physical address (DPA) to use for poison injection. Address can
> +	be specified in hex or decimal. Required for poison injection.
> +
> +-t::
> +--type::
> +	Type of error to inject into <device name>. The type of error is restricted
> +	by device type. The following shows the possible types under their associated
> +	device type(s):
> +----
> +
> +Downstream Ports: ::
> +	cache-correctable, cache-uncorrectable, cache-fatal, mem-correctable,
> +	mem-fatal
> +
> +Memdevs: ::
> +	poison
> +
> +----
> +
> +--debug::
> +	Enable debug output
> +
> +SEE ALSO
> +--------
> +linkcxl:cxl-list[1]
> diff --git a/Documentation/cxl/meson.build b/Documentation/cxl/meson.build
> index 8085c1c..0b75eed 100644
> --- a/Documentation/cxl/meson.build
> +++ b/Documentation/cxl/meson.build
> @@ -50,6 +50,8 @@ cxl_manpages = [
>    'cxl-update-firmware.txt',
>    'cxl-set-alert-config.txt',
>    'cxl-wait-sanitize.txt',
> +  'cxl-inject-error.txt',
> +  'cxl-clear-error.txt',
>  ]
>  
>  foreach man : cxl_manpages


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context
  2025-10-21 22:55   ` Dave Jiang
@ 2025-10-23 20:15     ` Cheatham, Benjamin
  0 siblings, 0 replies; 20+ messages in thread
From: Cheatham, Benjamin @ 2025-10-23 20:15 UTC (permalink / raw)
  To: Dave Jiang, nvdimm; +Cc: linux-cxl, alison.schofield

Hi Dave, thanks for taking a look!

On 10/21/2025 5:55 PM, Dave Jiang wrote:
> 
> 
> On 10/21/25 11:31 AM, Ben Cheatham wrote:
>> Find the CXL debugfs mount point and add it to the CXL library context.
>> This will be used by poison and protocol error library functions to
>> access the information presented by the filesystem.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>>  cxl/lib/libcxl.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>>
>> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
>> index cafde1c..ea5831f 100644
>> --- a/cxl/lib/libcxl.c
>> +++ b/cxl/lib/libcxl.c
>> @@ -54,6 +54,7 @@ struct cxl_ctx {
>>  	struct kmod_ctx *kmod_ctx;
>>  	struct daxctl_ctx *daxctl_ctx;
>>  	void *private_data;
>> +	const char *debugfs;
>>  };
>>  
>>  static void free_pmem(struct cxl_pmem *pmem)
>> @@ -240,6 +241,43 @@ CXL_EXPORT void *cxl_get_private_data(struct cxl_ctx *ctx)
>>  	return ctx->private_data;
>>  }
>>  
>> +static char *get_debugfs_dir(void)
> 
> const char *?
> 

Will do

> 
> Also maybe get_debugfs_dir_path()
> 
>> +{
>> +	char *dev, *dir, *type, *ret = NULL;
> 
> 'debugfs_dir' rather than 'ret' would be clearer to read.
> 

Makes sense, I'll change it.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors
  2025-10-21 23:15   ` Dave Jiang
@ 2025-10-23 20:15     ` Cheatham, Benjamin
  2025-10-23 22:50       ` Dave Jiang
  0 siblings, 1 reply; 20+ messages in thread
From: Cheatham, Benjamin @ 2025-10-23 20:15 UTC (permalink / raw)
  To: Dave Jiang, nvdimm; +Cc: linux-cxl, alison.schofield

On 10/21/2025 6:15 PM, Dave Jiang wrote:
> 
> 
> On 10/21/25 11:31 AM, Ben Cheatham wrote:
>> The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
>> injection for platforms that implement the error types according to
>> the v6.5+ ACPI specification. The interface for injecting these errors
>> is provided by the kernel under the CXL debugfs. The relevant files in
>> the interface are the einj_types file, which provides the available CXL
>> error types for injection, and the einj_inject file, which injects the
>> error into a CXL VH root port or CXL RCH downstream port.
>>
>> Add a library API to retrieve the CXL error types and inject them. This
>> API will be used in a later commit by the 'cxl-inject-error' and
>> 'cxl-list' commands.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>>  cxl/lib/libcxl.c   | 174 +++++++++++++++++++++++++++++++++++++++++++++
>>  cxl/lib/libcxl.sym |   5 ++
>>  cxl/lib/private.h  |  14 ++++
>>  cxl/libcxl.h       |  13 ++++
>>  4 files changed, 206 insertions(+)
>>
>> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
>> index ea5831f..9486b0f 100644
>> --- a/cxl/lib/libcxl.c
>> +++ b/cxl/lib/libcxl.c
>> @@ -46,11 +46,13 @@ struct cxl_ctx {
>>  	void *userdata;
>>  	int memdevs_init;
>>  	int buses_init;
>> +	int perrors_init;
>>  	unsigned long timeout;
>>  	struct udev *udev;
>>  	struct udev_queue *udev_queue;
>>  	struct list_head memdevs;
>>  	struct list_head buses;
>> +	struct list_head perrors;
>>  	struct kmod_ctx *kmod_ctx;
>>  	struct daxctl_ctx *daxctl_ctx;
>>  	void *private_data;
>> @@ -205,6 +207,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
>>  	free(bus);
>>  }
>>  
>> +static void free_protocol_error(struct cxl_protocol_error *perror,
>> +				struct list_head *head)
>> +{
>> +	if (head)
>> +		list_del_from(head, &perror->list);
> 
> I would go if (!head) return;
> 

Would that work? I think I would still need to free perror below.

>> +	free(perror);
>> +}
>> +
>>  /**
>>   * cxl_get_userdata - retrieve stored data pointer from library context
>>   * @ctx: cxl library context
>> @@ -328,6 +338,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
>>  	*ctx = c;
>>  	list_head_init(&c->memdevs);
>>  	list_head_init(&c->buses);
>> +	list_head_init(&c->perrors);
>>  	c->kmod_ctx = kmod_ctx;
>>  	c->daxctl_ctx = daxctl_ctx;
>>  	c->udev = udev;
>> @@ -369,6 +380,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
>>   */
>>  CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
>>  {
>> +	struct cxl_protocol_error *perror, *_p;
>>  	struct cxl_memdev *memdev, *_d;
>>  	struct cxl_bus *bus, *_b;
>>  
>> @@ -384,6 +396,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
>>  	list_for_each_safe(&ctx->buses, bus, _b, port.list)
>>  		free_bus(bus, &ctx->buses);
>>  
>> +	list_for_each_safe(&ctx->perrors, perror, _p, list)
>> +		free_protocol_error(perror, &ctx->perrors);
>> +
>>  	udev_queue_unref(ctx->udev_queue);
>>  	udev_unref(ctx->udev);
>>  	kmod_unref(ctx->kmod_ctx);
>> @@ -3416,6 +3431,165 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
>>  	return port->decoders_committed;
>>  }
>>  
>> +const struct cxl_protocol_error cxl_protocol_errors[] = {
>> +	CXL_PROTOCOL_ERROR(12, "cache-correctable"),
>> +	CXL_PROTOCOL_ERROR(13, "cache-uncorrectable"),
>> +	CXL_PROTOCOL_ERROR(14, "cache-fatal"),
>> +	CXL_PROTOCOL_ERROR(15, "mem-correctable"),
>> +	CXL_PROTOCOL_ERROR(16, "mem-uncorrectable"),
>> +	CXL_PROTOCOL_ERROR(17, "mem-fatal")
>> +};
>> +
>> +static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
>> +							    unsigned long n)
> 
> why unsigned long instead of int? are there that many errors?
>

No there aren't. I'll change it over to unsigned int instead.

>> +{
>> +	struct cxl_protocol_error *perror;
>> +
>> +	for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
>> +		if (n != BIT(cxl_protocol_errors[i].num))
>> +			continue;
>> +
>> +		perror = calloc(1, sizeof(*perror));
>> +		if (!perror)
>> +			return NULL;
>> +
>> +		*perror = cxl_protocol_errors[i];
>> +		perror->ctx = ctx;
>> +		return perror;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
>> +{
>> +	struct cxl_protocol_error *perror;
>> +	char *path, *num, *save;
>> +	unsigned long n;
>> +	size_t path_len;
>> +	char buf[512];
> 
> Use SYSFS_ATTR_SIZE rather than 512

Wasn't aware of that, will do!

> 
>> +	int rc = 0;
>> +
>> +	if (!ctx->debugfs)
>> +		return;
>> +
>> +	path_len = strlen(ctx->debugfs) + 100;
>> +	path = calloc(1, path_len);
>> +	if (!path)
>> +		return;
>> +
>> +	snprintf(path, path_len, "%s/cxl/einj_types", ctx->debugfs);
>> +	rc = access(path, F_OK);
>> +	if (rc) {
>> +		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
> strerror(errno)? access() returns -1 and the actual error is in errno.

My bad, will update it (and elsewhere).
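
For reference, a minimal sketch of the pattern being asked for: access() returns only 0 or -1, so the reason for a failure has to be taken from errno. The helper name path_exists() is hypothetical and not part of libcxl:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 0 if @path exists, or a negative errno
 * value describing why access() failed.
 */
static int path_exists(const char *path)
{
	if (access(path, F_OK) == 0)
		return 0;

	/* access() itself only returns -1; the cause is in errno. */
	return -errno;
}
```

A caller can then pass the negated return value to strerror() for the log message and return the same value to its own caller.
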

>> +		goto err;
>> +	}
>> +
>> +	rc = sysfs_read_attr(ctx, path, buf);
>> +	if (rc) {
>> +		err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
>> +		goto err;
>> +	}
>> +
>> +	/*
>> +	 * The format of the output of the einj_types attr is:
>> +	 * <Error number in hex 1> <Error name 1>
>> +	 * <Error number in hex 2> <Error name 2>
>> +	 * ...
>> +	 *
>> +	 * We only need the number, so parse that and skip the rest of
>> +	 * the line.
>> +	 */
>> +	num = strtok_r(buf, " \n", &save);
>> +	while (num) {
>> +		n = strtoul(num, NULL, 16);
>> +		perror = create_cxl_protocol_error(ctx, n);
>> +		if (perror)
>> +			list_add(&ctx->perrors, &perror->list);
>> +
>> +		num = strtok_r(NULL, "\n", &save);
>> +		if (!num)
>> +			break;
>> +
>> +		num = strtok_r(NULL, " \n", &save);
>> +	}
>> +
>> +err:
>> +	free(path);
>> +}
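
The parsing loop above can also be written one line at a time. The following is a sketch only: it assumes the einj_types layout described in the comment (a hex number, whitespace, then a name on each line) and reuses the bit numbers from the cxl_protocol_errors[] table. The names perr_lookup() and perr_parse() are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

struct perr_name {
	unsigned int bit;	/* bit position of the EINJ error type */
	const char *name;	/* short name used on the command line */
};

/* Same bit-to-name mapping as the table in the patch. */
static const struct perr_name perr_names[] = {
	{ 12, "cache-correctable" }, { 13, "cache-uncorrectable" },
	{ 14, "cache-fatal" },       { 15, "mem-correctable" },
	{ 16, "mem-uncorrectable" }, { 17, "mem-fatal" },
};

/* Map one einj_types value to its short name, or NULL if unknown. */
static const char *perr_lookup(unsigned long val)
{
	for (size_t i = 0; i < sizeof(perr_names) / sizeof(perr_names[0]); i++)
		if (val == (1UL << perr_names[i].bit))
			return perr_names[i].name;

	return NULL;
}

/*
 * Split @buf into lines, parse the leading hex number of each line and
 * store the short names of recognized types in @out (at most @max).
 * Modifies @buf. Returns the number of names stored.
 */
static size_t perr_parse(char *buf, const char **out, size_t max)
{
	char *line, *save;
	size_t n = 0;

	for (line = strtok_r(buf, "\n", &save); line && n < max;
	     line = strtok_r(NULL, "\n", &save)) {
		/* strtoul() stops at the first non-hex character. */
		const char *name = perr_lookup(strtoul(line, NULL, 16));

		if (name)
			out[n++] = name;
	}

	return n;
}
```

Lines whose number does not match a known CXL protocol error type are skipped, which matches the behavior of create_cxl_protocol_error() returning NULL.
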
>> +
>> +static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
>> +{
>> +	if (ctx->perrors_init)
>> +		return;
>> +
>> +	ctx->perrors_init = 1;
>> +	cxl_add_protocol_errors(ctx);
>> +}
>> +
>> +CXL_EXPORT struct cxl_protocol_error *
>> +cxl_protocol_error_get_first(struct cxl_ctx *ctx)
>> +{
>> +	cxl_protocol_errors_init(ctx);
>> +
>> +	return list_top(&ctx->perrors, struct cxl_protocol_error, list);
>> +}
>> +
>> +CXL_EXPORT struct cxl_protocol_error *
>> +cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
>> +{
>> +	struct cxl_ctx *ctx = perror->ctx;
>> +
>> +	return list_next(&ctx->perrors, perror, list);
>> +}
>> +
>> +CXL_EXPORT unsigned long
>> +cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
>> +{
>> +	return perror->num;
>> +}
>> +
>> +CXL_EXPORT const char *
>> +cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
>> +{
>> +	return perror->string;
>> +}
>> +
>> +CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
>> +					       unsigned long error)
>> +{
>> +	struct cxl_ctx *ctx = dport->port->ctx;
>> +	unsigned long path_len;
>> +	char buf[32] = { 0 };
>> +	char *path;
>> +	int rc;
>> +
>> +	if (!ctx->debugfs)
>> +		return -ENOENT;
>> +
>> +	path_len = strlen(ctx->debugfs) + 100;
>> +	path = calloc(path_len, sizeof(char));
>> +	if (!path)
>> +		return -ENOMEM;
>> +
>> +	snprintf(path, path_len, "%s/cxl/%s/einj_inject", ctx->debugfs,
>> +		 cxl_dport_get_devname(dport));
> 
> check return value

Yep, will do (elsewhere as well).
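
For reference, a minimal sketch of checking the snprintf() return value for both encoding errors and truncation. The helper name build_attr_path() and the choice of -ENAMETOOLONG are assumptions, not something taken from the patch:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "<debugfs>/cxl/<dev>/<attr>" into @buf.
 * Returns 0 on success, -EINVAL on an output error, or -ENAMETOOLONG
 * if the result did not fit in @len bytes.
 */
static int build_attr_path(char *buf, size_t len, const char *debugfs,
			   const char *dev, const char *attr)
{
	int rc = snprintf(buf, len, "%s/cxl/%s/%s", debugfs, dev, attr);

	if (rc < 0)
		return -EINVAL;
	if ((size_t)rc >= len)
		return -ENAMETOOLONG;	/* output was truncated */

	return 0;
}
```
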

> 
>> +	rc = access(path, F_OK);
>> +	if (rc) {
>> +		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
> 
> errno
> 
>> +		free(path);
>> +		return rc;
> -errno instead of rc
> 
>> +	}
>> +
>> +	snprintf(buf, sizeof(buf), "0x%lx\n", error);
> 
> check return value?
> 
> DJ
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 3/7] libcxl: Add poison injection support
  2025-10-21 23:44   ` Dave Jiang
@ 2025-10-23 20:15     ` Cheatham, Benjamin
  0 siblings, 0 replies; 20+ messages in thread
From: Cheatham, Benjamin @ 2025-10-23 20:15 UTC (permalink / raw)
  To: Dave Jiang, nvdimm; +Cc: linux-cxl, alison.schofield

On 10/21/2025 6:44 PM, Dave Jiang wrote:
> 
> 
> On 10/21/25 11:31 AM, Ben Cheatham wrote:
>> Add a library API for clearing and injecting poison into a CXL memory
>> device through the CXL debugfs.
>>
>> This API will be used by the 'cxl-inject-error' and 'cxl-clear-error'
>> commands in later commits.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>>  cxl/lib/libcxl.c   | 60 ++++++++++++++++++++++++++++++++++++++++++++++
>>  cxl/lib/libcxl.sym |  3 +++
>>  cxl/libcxl.h       |  3 +++
>>  3 files changed, 66 insertions(+)
>>
>> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
>> index 9486b0f..9d4bd80 100644
>> --- a/cxl/lib/libcxl.c
>> +++ b/cxl/lib/libcxl.c
>> @@ -5019,3 +5019,63 @@ CXL_EXPORT struct cxl_cmd *cxl_cmd_new_set_alert_config(struct cxl_memdev *memde
>>  {
>>  	return cxl_cmd_new_generic(memdev, CXL_MEM_COMMAND_ID_SET_ALERT_CONFIG);
>>  }
>> +
>> +CXL_EXPORT bool cxl_memdev_has_poison_injection(struct cxl_memdev *memdev)
>> +{
>> +	struct cxl_ctx *ctx = memdev->ctx;
>> +	size_t path_len;
>> +	bool exists;
>> +	char *path;
>> +
>> +	if (!ctx->debugfs)
>> +		return false;
>> +
>> +	path_len = strlen(ctx->debugfs) + 100;
>> +	path = calloc(path_len, sizeof(char));
>> +	if (!path)
>> +		return false;
>> +
>> +	snprintf(path, path_len, "%s/cxl/%s/inject_poison", ctx->debugfs,
>> +		 cxl_memdev_get_devname(memdev));
> 
> check return value
> 
>> +	exists = access(path, F_OK) == 0;
> 
> While this works, it is more readable this way:
> 
> 	exists = true;
> 	...
> 	rc = access(path, F_OK);
> 	if (rc)
> 		exists = false;
>
>> +

Ok, I'll change it.

>> +	free(path);
>> +	return exists;
>> +}
>> +
>> +static int cxl_memdev_poison_action(struct cxl_memdev *memdev, size_t dpa,
>> +				    bool clear)
>> +{
>> +	struct cxl_ctx *ctx = memdev->ctx;
>> +	size_t path_len;
>> +	char addr[32];
>> +	char *path;
>> +	int rc;
>> +
>> +	if (!ctx->debugfs)
>> +		return -ENOENT;
>> +
>> +	path_len = strlen(ctx->debugfs) + 100;
>> +	path = calloc(path_len, sizeof(char));
>> +	if (!path)
>> +		return -ENOMEM;
>> +
>> +	snprintf(path, path_len, "%s/cxl/%s/%s", ctx->debugfs,
>> +		 cxl_memdev_get_devname(memdev),
>> +		 clear ? "clear_poison" : "inject_poison");
>> +	snprintf(addr, 32, "0x%lx\n", dpa);
> 
> check return values for both snprintf()
> 

Will do!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [ndctl PATCH v3 4/7] cxl: Add inject-error command
  2025-10-22 17:06   ` Dave Jiang
@ 2025-10-23 20:15     ` Cheatham, Benjamin
  2025-10-23 22:51       ` Dave Jiang
  0 siblings, 1 reply; 20+ messages in thread
From: Cheatham, Benjamin @ 2025-10-23 20:15 UTC (permalink / raw)
  To: Dave Jiang, nvdimm; +Cc: linux-cxl, alison.schofield

On 10/22/2025 12:06 PM, Dave Jiang wrote:
> 
> 
> On 10/21/25 11:31 AM, Ben Cheatham wrote:
>> Add the 'cxl-inject-error' command. This command will provide CXL
>> protocol error injection for CXL VH root ports and CXL RCH downstream
>> ports, as well as poison injection for CXL memory devices.
>>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>>  cxl/builtin.h      |   1 +
>>  cxl/cxl.c          |   1 +
>>  cxl/inject-error.c | 195 +++++++++++++++++++++++++++++++++++++++++++++
>>  cxl/meson.build    |   1 +
>>  4 files changed, 198 insertions(+)
>>  create mode 100644 cxl/inject-error.c
>>
>> diff --git a/cxl/builtin.h b/cxl/builtin.h
>> index c483f30..e82fcb5 100644
>> --- a/cxl/builtin.h
>> +++ b/cxl/builtin.h
>> @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>  int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>  int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>  int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
>> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
>>  #ifdef ENABLE_LIBTRACEFS
>>  int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
>>  #else
>> diff --git a/cxl/cxl.c b/cxl/cxl.c
>> index 1643667..a98bd6b 100644
>> --- a/cxl/cxl.c
>> +++ b/cxl/cxl.c
>> @@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
>>  	{ "disable-region", .c_fn = cmd_disable_region },
>>  	{ "destroy-region", .c_fn = cmd_destroy_region },
>>  	{ "monitor", .c_fn = cmd_monitor },
>> +	{ "inject-error", .c_fn = cmd_inject_error },
>>  };
>>  
>>  int main(int argc, const char **argv)
>> diff --git a/cxl/inject-error.c b/cxl/inject-error.c
>> new file mode 100644
>> index 0000000..c48ea69
>> --- /dev/null
>> +++ b/cxl/inject-error.c
>> @@ -0,0 +1,195 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2025 AMD. All rights reserved. */
>> +#include <util/parse-options.h>
>> +#include <cxl/libcxl.h>
>> +#include <cxl/filter.h>
>> +#include <util/log.h>
>> +#include <stdlib.h>
>> +#include <unistd.h>
>> +#include <stdio.h>
>> +#include <errno.h>
>> +#include <limits.h>
>> +
>> +#define EINJ_TYPES_BUF_SIZE 512
>> +
>> +static bool debug;
>> +
>> +static struct inject_params {
>> +	const char *type;
>> +	const char *address;
>> +} inj_param;
>> +
>> +static const struct option inject_options[] = {
>> +	OPT_STRING('t', "type", &inj_param.type, "Error type",
>> +		   "Error type to inject into <device>"),
>> +	OPT_STRING('a', "address", &inj_param.address, "Address for poison injection",
>> +		   "Device physical address for poison injection in hex or decimal"),
>> +#ifdef ENABLE_DEBUG
>> +	OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
>> +#endif
>> +	OPT_END(),
>> +};
>> +
>> +static struct log_ctx iel;
>> +
>> +static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
>> +						     const char *type)
>> +{
>> +	struct cxl_protocol_error *perror;
>> +
>> +	cxl_protocol_error_foreach(ctx, perror) {
>> +		if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
>> +			return perror;
>> +	}
>> +
>> +	log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
>> +	return NULL;
>> +}
>> +
>> +static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
>> +{
>> +	struct cxl_port *port, *top;
>> +	struct cxl_dport *dport;
>> +	struct cxl_bus *bus;
>> +
>> +	cxl_bus_foreach(ctx, bus) {
>> +		top = cxl_bus_get_port(bus);
>> +
>> +		cxl_port_foreach_all(top, port)
>> +			cxl_dport_foreach(port, dport)
>> +				if (!strcmp(devname,
>> +					    cxl_dport_get_devname(dport)))
>> +					return dport;
> 
> Would it be worthwhile to create a util_cxl_dport_filter()?
> 

Yeah probably. I'll make one for the next revision.

>> +	}
>> +
>> +	log_err(&iel, "Downstream port \"%s\" not found\n", devname);
>> +	return NULL;
>> +}
>> +
>> +static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx,
>> +					  const char *filter)
>> +{
>> +	struct cxl_memdev *memdev;
>> +
>> +	cxl_memdev_foreach(ctx, memdev) {
>> +		if (util_cxl_memdev_filter(memdev, filter, NULL))
>> +			return memdev;
>> +	}
>> +
>> +	log_err(&iel, "Memdev \"%s\" not found\n", filter);
>> +	return NULL;
>> +}
>> +
>> +static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
>> +			    struct cxl_protocol_error *perror)
>> +{
>> +	struct cxl_dport *dport;
>> +	int rc;
>> +
>> +	if (!devname) {
>> +		log_err(&iel, "No downstream port specified for injection\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	dport = find_cxl_dport(ctx, devname);
>> +	if (!dport)
>> +		return -ENODEV;
>> +
>> +	rc = cxl_dport_protocol_error_inject(dport,
>> +					     cxl_protocol_error_get_num(perror));
>> +	if (rc)
>> +		return rc;
>> +
>> +	printf("injected %s protocol error.\n",
>> +	       cxl_protocol_error_get_str(perror));
> 
> log_info() maybe?

I think I had it as log_info() before, but I don't think it was making its way to
the console. I think I wanted the console output because I personally don't like running
silent commands. Not a great reason, so I'm fine with changing it if that's the preferred
way.

> 
>> +	return 0;
>> +}
>> +
>> +static int poison_action(struct cxl_ctx *ctx, const char *filter,
>> +			 const char *addr)
>> +{
>> +	struct cxl_memdev *memdev;
>> +	size_t a;
> 
> Maybe rename 'addr' to 'addr_str' and rename 'a' to 'addr'
> 

Sure.

>> +	int rc;
>> +
>> +	memdev = find_cxl_memdev(ctx, filter);
>> +	if (!memdev)
>> +		return -ENODEV;
>> +
>> +	if (!cxl_memdev_has_poison_injection(memdev)) {
>> +		log_err(&iel, "%s does not support error injection\n",
>> +			cxl_memdev_get_devname(memdev));
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!addr) {
>> +		log_err(&iel, "no address provided\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	a = strtoull(addr, NULL, 0);
>> +	if (a == ULLONG_MAX && errno == ERANGE) {
>> +		log_err(&iel, "invalid address %s", addr);
>> +		return -EINVAL;
>> +	}
>> +
>> +	rc = cxl_memdev_inject_poison(memdev, a);
>> +
> 
> unnecessary blank line
> 
>> +	if (rc)

Will remove!

>> +		log_err(&iel, "failed to inject poison at %s:%s: %s\n",
>> +			cxl_memdev_get_devname(memdev), addr, strerror(-rc));
>> +	else
>> +		printf("poison injected at %s:%s\n",
>> +		       cxl_memdev_get_devname(memdev), addr);
> 
> log_info() maybe?

Same thing as above.

Thanks,
Ben

> 
> DJ
>
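
As a side note on the address handling quoted above, here is a minimal sketch
of one way the --address string could be parsed so that non-numeric input is
rejected as well as out-of-range input. The helper name parse_addr() is an
assumption for this example and is not in the patch.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a hex or decimal address string.
 * Returns 0 and stores the value in *addr on success, or a negative
 * errno value on failure.
 */
static int parse_addr(const char *addr_str, unsigned long long *addr)
{
	unsigned long long val;
	char *end;

	if (!addr_str || *addr_str == '\0')
		return -EINVAL;

	errno = 0;	/* strtoull() does not clear errno on success */
	val = strtoull(addr_str, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == addr_str || *end != '\0')
		return -EINVAL;	/* no digits, or trailing characters */

	*addr = val;
	return 0;
}
```

Resetting errno before the call matters because a stale ERANGE from an earlier
call would otherwise be indistinguishable from a fresh one.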

* Re: [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors
  2025-10-23 20:15     ` Cheatham, Benjamin
@ 2025-10-23 22:50       ` Dave Jiang
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jiang @ 2025-10-23 22:50 UTC (permalink / raw)
  To: Cheatham, Benjamin, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/23/25 1:15 PM, Cheatham, Benjamin wrote:
> On 10/21/2025 6:15 PM, Dave Jiang wrote:
>>
>>
>> On 10/21/25 11:31 AM, Ben Cheatham wrote:
>>> The v6.11 Linux kernel adds CXL protocol (CXL.cache & CXL.mem) error
>>> injection for platforms that implement the error types according to
>>> the v6.5+ ACPI specification. The interface for injecting these errors
>>> is provided by the kernel under the CXL debugfs. The relevant files in
>>> the interface are the einj_types file, which provides the available CXL
>>> error types for injection, and the einj_inject file, which injects the
>>> error into a CXL VH root port or CXL RCH downstream port.
>>>
>>> Add a library API to retrieve the CXL error types and inject them. This
>>> API will be used in a later commit by the 'cxl-inject-error' and
>>> 'cxl-list' commands.
>>>
>>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>>> ---
>>>  cxl/lib/libcxl.c   | 174 +++++++++++++++++++++++++++++++++++++++++++++
>>>  cxl/lib/libcxl.sym |   5 ++
>>>  cxl/lib/private.h  |  14 ++++
>>>  cxl/libcxl.h       |  13 ++++
>>>  4 files changed, 206 insertions(+)
>>>
>>> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
>>> index ea5831f..9486b0f 100644
>>> --- a/cxl/lib/libcxl.c
>>> +++ b/cxl/lib/libcxl.c
>>> @@ -46,11 +46,13 @@ struct cxl_ctx {
>>>  	void *userdata;
>>>  	int memdevs_init;
>>>  	int buses_init;
>>> +	int perrors_init;
>>>  	unsigned long timeout;
>>>  	struct udev *udev;
>>>  	struct udev_queue *udev_queue;
>>>  	struct list_head memdevs;
>>>  	struct list_head buses;
>>> +	struct list_head perrors;
>>>  	struct kmod_ctx *kmod_ctx;
>>>  	struct daxctl_ctx *daxctl_ctx;
>>>  	void *private_data;
>>> @@ -205,6 +207,14 @@ static void free_bus(struct cxl_bus *bus, struct list_head *head)
>>>  	free(bus);
>>>  }
>>>  
>>> +static void free_protocol_error(struct cxl_protocol_error *perror,
>>> +				struct list_head *head)
>>> +{
>>> +	if (head)
>>> +		list_del_from(head, &perror->list);
>>
>> I would go if (!head) return;
>>
> 
> Would that work? I think I would still need to free perror below.

Ah right you need to free that. nm

DJ

> 
>>> +	free(perror);
>>> +}
>>> +
>>>  /**
>>>   * cxl_get_userdata - retrieve stored data pointer from library context
>>>   * @ctx: cxl library context
>>> @@ -328,6 +338,7 @@ CXL_EXPORT int cxl_new(struct cxl_ctx **ctx)
>>>  	*ctx = c;
>>>  	list_head_init(&c->memdevs);
>>>  	list_head_init(&c->buses);
>>> +	list_head_init(&c->perrors);
>>>  	c->kmod_ctx = kmod_ctx;
>>>  	c->daxctl_ctx = daxctl_ctx;
>>>  	c->udev = udev;
>>> @@ -369,6 +380,7 @@ CXL_EXPORT struct cxl_ctx *cxl_ref(struct cxl_ctx *ctx)
>>>   */
>>>  CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
>>>  {
>>> +	struct cxl_protocol_error *perror, *_p;
>>>  	struct cxl_memdev *memdev, *_d;
>>>  	struct cxl_bus *bus, *_b;
>>>  
>>> @@ -384,6 +396,9 @@ CXL_EXPORT void cxl_unref(struct cxl_ctx *ctx)
>>>  	list_for_each_safe(&ctx->buses, bus, _b, port.list)
>>>  		free_bus(bus, &ctx->buses);
>>>  
>>> +	list_for_each_safe(&ctx->perrors, perror, _p, list)
>>> +		free_protocol_error(perror, &ctx->perrors);
>>> +
>>>  	udev_queue_unref(ctx->udev_queue);
>>>  	udev_unref(ctx->udev);
>>>  	kmod_unref(ctx->kmod_ctx);
>>> @@ -3416,6 +3431,165 @@ CXL_EXPORT int cxl_port_decoders_committed(struct cxl_port *port)
>>>  	return port->decoders_committed;
>>>  }
>>>  
>>> +const struct cxl_protocol_error cxl_protocol_errors[] = {
>>> +	CXL_PROTOCOL_ERROR(12, "cache-correctable"),
>>> +	CXL_PROTOCOL_ERROR(13, "cache-uncorrectable"),
>>> +	CXL_PROTOCOL_ERROR(14, "cache-fatal"),
>>> +	CXL_PROTOCOL_ERROR(15, "mem-correctable"),
>>> +	CXL_PROTOCOL_ERROR(16, "mem-uncorrectable"),
>>> +	CXL_PROTOCOL_ERROR(17, "mem-fatal")
>>> +};
>>> +
>>> +static struct cxl_protocol_error *create_cxl_protocol_error(struct cxl_ctx *ctx,
>>> +							    unsigned long n)
>>
>> why unsigned long instead of int? are there that many errors?
>>
> 
> No there aren't. I'll change it over to unsigned int instead.
> 
>>> +{
>>> +	struct cxl_protocol_error *perror;
>>> +
>>> +	for (unsigned long i = 0; i < ARRAY_SIZE(cxl_protocol_errors); i++) {
>>> +		if (n != BIT(cxl_protocol_errors[i].num))
>>> +			continue;
>>> +
>>> +		perror = calloc(1, sizeof(*perror));
>>> +		if (!perror)
>>> +			return NULL;
>>> +
>>> +		*perror = cxl_protocol_errors[i];
>>> +		perror->ctx = ctx;
>>> +		return perror;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +static void cxl_add_protocol_errors(struct cxl_ctx *ctx)
>>> +{
>>> +	struct cxl_protocol_error *perror;
>>> +	char *path, *num, *save;
>>> +	unsigned long n;
>>> +	size_t path_len;
>>> +	char buf[512];
>>
>> Use SYSFS_ATTR_SIZE rather than 512
> 
> Wasn't aware of that, will do!
> 
>>
>>> +	int rc = 0;
>>> +
>>> +	if (!ctx->debugfs)
>>> +		return;
>>> +
>>> +	path_len = strlen(ctx->debugfs) + 100;
>>> +	path = calloc(1, path_len);
>>> +	if (!path)
>>> +		return;
>>> +
>>> +	snprintf(path, path_len, "%s/cxl/einj_types", ctx->debugfs);
>>> +	rc = access(path, F_OK);
>>> +	if (rc) {
>>> +		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
>> strerror(errno)? access() returns -1 and the actual error is in errno.
> 
> My bad, will update it (and elsewhere).
> 
>>> +		goto err;
>>> +	}
>>> +
>>> +	rc = sysfs_read_attr(ctx, path, buf);
>>> +	if (rc) {
>>> +		err(ctx, "failed to read %s: %s\n", path, strerror(-rc));
>>> +		goto err;
>>> +	}
>>> +
>>> +	/*
>>> +	 * The format of the output of the einj_types attr is:
>>> +	 * <Error number in hex 1> <Error name 1>
>>> +	 * <Error number in hex 2> <Error name 2>
>>> +	 * ...
>>> +	 *
>>> +	 * We only need the number, so parse that and skip the rest of
>>> +	 * the line.
>>> +	 */
>>> +	num = strtok_r(buf, " \n", &save);
>>> +	while (num) {
>>> +		n = strtoul(num, NULL, 16);
>>> +		perror = create_cxl_protocol_error(ctx, n);
>>> +		if (perror)
>>> +			list_add(&ctx->perrors, &perror->list);
>>> +
>>> +		num = strtok_r(NULL, "\n", &save);
>>> +		if (!num)
>>> +			break;
>>> +
>>> +		num = strtok_r(NULL, " \n", &save);
>>> +	}
>>> +
>>> +err:
>>> +	free(path);
>>> +}
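
To make the einj_types format described in the comment above concrete, a
minimal standalone sketch of per-line parsing follows. It assumes only the
"<hex number> <name>" layout stated in that comment; parse_einj_types() is a
hypothetical name, and it returns a plain bitmask instead of building the
perrors list.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone parser: OR together the leading hex number of
 * each "<number> <name>" line. The buffer is modified in place.
 */
static unsigned long parse_einj_types(char *buf)
{
	unsigned long mask = 0;
	char *line, *save;

	for (line = strtok_r(buf, "\n", &save); line;
	     line = strtok_r(NULL, "\n", &save)) {
		char *end;
		unsigned long n = strtoul(line, &end, 16);

		if (end == line)
			continue;	/* line does not start with a number */
		mask |= n;
	}

	return mask;
}
```

Splitting on newlines first and letting strtoul() stop at the first
non-hex character avoids the separate "skip the rest of the line" step.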
>>> +
>>> +static void cxl_protocol_errors_init(struct cxl_ctx *ctx)
>>> +{
>>> +	if (ctx->perrors_init)
>>> +		return;
>>> +
>>> +	ctx->perrors_init = 1;
>>> +	cxl_add_protocol_errors(ctx);
>>> +}
>>> +
>>> +CXL_EXPORT struct cxl_protocol_error *
>>> +cxl_protocol_error_get_first(struct cxl_ctx *ctx)
>>> +{
>>> +	cxl_protocol_errors_init(ctx);
>>> +
>>> +	return list_top(&ctx->perrors, struct cxl_protocol_error, list);
>>> +}
>>> +
>>> +CXL_EXPORT struct cxl_protocol_error *
>>> +cxl_protocol_error_get_next(struct cxl_protocol_error *perror)
>>> +{
>>> +	struct cxl_ctx *ctx = perror->ctx;
>>> +
>>> +	return list_next(&ctx->perrors, perror, list);
>>> +}
>>> +
>>> +CXL_EXPORT unsigned long
>>> +cxl_protocol_error_get_num(struct cxl_protocol_error *perror)
>>> +{
>>> +	return perror->num;
>>> +}
>>> +
>>> +CXL_EXPORT const char *
>>> +cxl_protocol_error_get_str(struct cxl_protocol_error *perror)
>>> +{
>>> +	return perror->string;
>>> +}
>>> +
>>> +CXL_EXPORT int cxl_dport_protocol_error_inject(struct cxl_dport *dport,
>>> +					       unsigned long error)
>>> +{
>>> +	struct cxl_ctx *ctx = dport->port->ctx;
>>> +	unsigned long path_len;
>>> +	char buf[32] = { 0 };
>>> +	char *path;
>>> +	int rc;
>>> +
>>> +	if (!ctx->debugfs)
>>> +		return -ENOENT;
>>> +
>>> +	path_len = strlen(ctx->debugfs) + 100;
>>> +	path = calloc(path_len, sizeof(char));
>>> +	if (!path)
>>> +		return -ENOMEM;
>>> +
>>> +	snprintf(path, path_len, "%s/cxl/%s/einj_inject", ctx->debugfs,
>>> +		 cxl_dport_get_devname(dport));
>>
>> check return value
> 
> Yep, will do (elsewhere as well).
> 
>>
>>> +	rc = access(path, F_OK);
>>> +	if (rc) {
>>> +		err(ctx, "failed to access %s: %s\n", path, strerror(-rc));
>>
>> errno
>>
>>> +		free(path);
>>> +		return rc;
>> -errno instead of rc
>>
>>> +	}
>>> +
>>> +	snprintf(buf, sizeof(buf), "0x%lx\n", error);
>>
>> check return value?
>>
>> DJ
>>
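
For reference, a minimal sketch of the access()/errno pattern discussed in
the comments above. check_path() is a hypothetical helper written for this
example and is not part of libcxl.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 0 if the path exists, or a negative errno
 * value. access() itself only returns -1; the reason is left in errno.
 */
static int check_path(const char *path)
{
	if (access(path, F_OK) != 0) {
		int err = errno;	/* save before later calls can change it */

		fprintf(stderr, "failed to access %s: %s\n", path,
			strerror(err));
		return -err;
	}

	return 0;
}
```

Returning -err keeps the caller's convention of negative errno values, which
returning the raw -1 from access() would not.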


* Re: [ndctl PATCH v3 4/7] cxl: Add inject-error command
  2025-10-23 20:15     ` Cheatham, Benjamin
@ 2025-10-23 22:51       ` Dave Jiang
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jiang @ 2025-10-23 22:51 UTC (permalink / raw)
  To: Cheatham, Benjamin, nvdimm; +Cc: linux-cxl, alison.schofield



On 10/23/25 1:15 PM, Cheatham, Benjamin wrote:
> On 10/22/2025 12:06 PM, Dave Jiang wrote:
>>
>>
>> On 10/21/25 11:31 AM, Ben Cheatham wrote:
>>> Add the 'cxl-inject-error' command. This command will provide CXL
>>> protocol error injection for CXL VH root ports and CXL RCH downstream
>>> ports, as well as poison injection for CXL memory devices.
>>>
>>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>>> ---
>>>  cxl/builtin.h      |   1 +
>>>  cxl/cxl.c          |   1 +
>>>  cxl/inject-error.c | 195 +++++++++++++++++++++++++++++++++++++++++++++
>>>  cxl/meson.build    |   1 +
>>>  4 files changed, 198 insertions(+)
>>>  create mode 100644 cxl/inject-error.c
>>>
>>> diff --git a/cxl/builtin.h b/cxl/builtin.h
>>> index c483f30..e82fcb5 100644
>>> --- a/cxl/builtin.h
>>> +++ b/cxl/builtin.h
>>> @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>>  int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>>  int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>>  int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx);
>>> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx);
>>>  #ifdef ENABLE_LIBTRACEFS
>>>  int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx);
>>>  #else
>>> diff --git a/cxl/cxl.c b/cxl/cxl.c
>>> index 1643667..a98bd6b 100644
>>> --- a/cxl/cxl.c
>>> +++ b/cxl/cxl.c
>>> @@ -80,6 +80,7 @@ static struct cmd_struct commands[] = {
>>>  	{ "disable-region", .c_fn = cmd_disable_region },
>>>  	{ "destroy-region", .c_fn = cmd_destroy_region },
>>>  	{ "monitor", .c_fn = cmd_monitor },
>>> +	{ "inject-error", .c_fn = cmd_inject_error },
>>>  };
>>>  
>>>  int main(int argc, const char **argv)
>>> diff --git a/cxl/inject-error.c b/cxl/inject-error.c
>>> new file mode 100644
>>> index 0000000..c48ea69
>>> --- /dev/null
>>> +++ b/cxl/inject-error.c
>>> @@ -0,0 +1,195 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/* Copyright (C) 2025 AMD. All rights reserved. */
>>> +#include <util/parse-options.h>
>>> +#include <cxl/libcxl.h>
>>> +#include <cxl/filter.h>
>>> +#include <util/log.h>
>>> +#include <stdlib.h>
>>> +#include <unistd.h>
>>> +#include <stdio.h>
>>> +#include <errno.h>
>>> +#include <limits.h>
>>> +
>>> +#define EINJ_TYPES_BUF_SIZE 512
>>> +
>>> +static bool debug;
>>> +
>>> +static struct inject_params {
>>> +	const char *type;
>>> +	const char *address;
>>> +} inj_param;
>>> +
>>> +static const struct option inject_options[] = {
>>> +	OPT_STRING('t', "type", &inj_param.type, "Error type",
>>> +		   "Error type to inject into <device>"),
>>> +	OPT_STRING('a', "address", &inj_param.address, "Address for poison injection",
>>> +		   "Device physical address for poison injection in hex or decimal"),
>>> +#ifdef ENABLE_DEBUG
>>> +	OPT_BOOLEAN(0, "debug", &debug, "turn on debug output"),
>>> +#endif
>>> +	OPT_END(),
>>> +};
>>> +
>>> +static struct log_ctx iel;
>>> +
>>> +static struct cxl_protocol_error *find_cxl_proto_err(struct cxl_ctx *ctx,
>>> +						     const char *type)
>>> +{
>>> +	struct cxl_protocol_error *perror;
>>> +
>>> +	cxl_protocol_error_foreach(ctx, perror) {
>>> +		if (strcmp(type, cxl_protocol_error_get_str(perror)) == 0)
>>> +			return perror;
>>> +	}
>>> +
>>> +	log_err(&iel, "Invalid CXL protocol error type: %s\n", type);
>>> +	return NULL;
>>> +}
>>> +
>>> +static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname)
>>> +{
>>> +	struct cxl_port *port, *top;
>>> +	struct cxl_dport *dport;
>>> +	struct cxl_bus *bus;
>>> +
>>> +	cxl_bus_foreach(ctx, bus) {
>>> +		top = cxl_bus_get_port(bus);
>>> +
>>> +		cxl_port_foreach_all(top, port)
>>> +			cxl_dport_foreach(port, dport)
>>> +				if (!strcmp(devname,
>>> +					    cxl_dport_get_devname(dport)))
>>> +					return dport;
>>
>> Would it be worthwhile to create a util_cxl_dport_filter()?
>>
> 
> Yeah probably. I'll make one for the next revision.
> 
>>> +	}
>>> +
>>> +	log_err(&iel, "Downstream port \"%s\" not found\n", devname);
>>> +	return NULL;
>>> +}
>>> +
>>> +static struct cxl_memdev *find_cxl_memdev(struct cxl_ctx *ctx,
>>> +					  const char *filter)
>>> +{
>>> +	struct cxl_memdev *memdev;
>>> +
>>> +	cxl_memdev_foreach(ctx, memdev) {
>>> +		if (util_cxl_memdev_filter(memdev, filter, NULL))
>>> +			return memdev;
>>> +	}
>>> +
>>> +	log_err(&iel, "Memdev \"%s\" not found\n", filter);
>>> +	return NULL;
>>> +}
>>> +
>>> +static int inject_proto_err(struct cxl_ctx *ctx, const char *devname,
>>> +			    struct cxl_protocol_error *perror)
>>> +{
>>> +	struct cxl_dport *dport;
>>> +	int rc;
>>> +
>>> +	if (!devname) {
>>> +		log_err(&iel, "No downstream port specified for injection\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	dport = find_cxl_dport(ctx, devname);
>>> +	if (!dport)
>>> +		return -ENODEV;
>>> +
>>> +	rc = cxl_dport_protocol_error_inject(dport,
>>> +					     cxl_protocol_error_get_num(perror));
>>> +	if (rc)
>>> +		return rc;
>>> +
>>> +	printf("injected %s protocol error.\n",
>>> +	       cxl_protocol_error_get_str(perror));
>>
>> log_info() maybe?
> 
> I think I had it as log_info() before, but I don't think it was making its way to
> the console. I think I wanted the console output because I personally don't like running
> silent commands. Not a great reason, so I'm fine with changing it if that's the preferred
> way.
> 

Alison,
Do you have a preference?

DJ

>>
>>> +	return 0;
>>> +}
>>> +
>>> +static int poison_action(struct cxl_ctx *ctx, const char *filter,
>>> +			 const char *addr)
>>> +{
>>> +	struct cxl_memdev *memdev;
>>> +	size_t a;
>>
>> Maybe rename 'addr' to 'addr_str' and rename 'a' to 'addr'
>>
> 
> Sure.
> 
>>> +	int rc;
>>> +
>>> +	memdev = find_cxl_memdev(ctx, filter);
>>> +	if (!memdev)
>>> +		return -ENODEV;
>>> +
>>> +	if (!cxl_memdev_has_poison_injection(memdev)) {
>>> +		log_err(&iel, "%s does not support error injection\n",
>>> +			cxl_memdev_get_devname(memdev));
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (!addr) {
>>> +		log_err(&iel, "no address provided\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	a = strtoull(addr, NULL, 0);
>>> +	if (a == ULLONG_MAX && errno == ERANGE) {
>>> +		log_err(&iel, "invalid address %s", addr);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	rc = cxl_memdev_inject_poison(memdev, a);
>>> +
>>
>> unnecessary blank line
>> 
>>> +	if (rc)
> 
> Will remove!
> 
>>> +		log_err(&iel, "failed to inject poison at %s:%s: %s\n",
>>> +			cxl_memdev_get_devname(memdev), addr, strerror(-rc));
>>> +	else
>>> +		printf("poison injected at %s:%s\n",
>>> +		       cxl_memdev_get_devname(memdev), addr);
>>
>> log_info() maybe?
> 
> Same thing as above.
> 
> Thanks,
> Ben
> 
>>
>> DJ
>>


Thread overview: 20+ messages
2025-10-21 18:31 [ndctl PATCH v3 0/7] Add error injection support Ben Cheatham
2025-10-21 18:31 ` [ndctl PATCH v3 1/7] libcxl: Add debugfs path to CXL context Ben Cheatham
2025-10-21 22:55   ` Dave Jiang
2025-10-23 20:15     ` Cheatham, Benjamin
2025-10-21 18:31 ` [ndctl PATCH v3 2/7] libcxl: Add CXL protocol errors Ben Cheatham
2025-10-21 23:15   ` Dave Jiang
2025-10-23 20:15     ` Cheatham, Benjamin
2025-10-23 22:50       ` Dave Jiang
2025-10-21 18:31 ` [ndctl PATCH v3 3/7] libcxl: Add poison injection support Ben Cheatham
2025-10-21 23:44   ` Dave Jiang
2025-10-23 20:15     ` Cheatham, Benjamin
2025-10-21 18:31 ` [ndctl PATCH v3 4/7] cxl: Add inject-error command Ben Cheatham
2025-10-22 17:06   ` Dave Jiang
2025-10-23 20:15     ` Cheatham, Benjamin
2025-10-23 22:51       ` Dave Jiang
2025-10-21 18:31 ` [ndctl PATCH v3 5/7] cxl: Add clear-error command Ben Cheatham
2025-10-21 18:31 ` [ndctl PATCH v3 6/7] cxl/list: Add injectable errors in output Ben Cheatham
2025-10-22 17:18   ` Dave Jiang
2025-10-21 18:31 ` [ndctl PATCH v3 7/7] Documentation: Add docs for inject/clear-error commands Ben Cheatham
2025-10-22 17:22   ` Dave Jiang
