* [ndctl PATCH v2 0/3] Support poison list retrieval
@ 2023-10-01 22:31 alison.schofield
2023-10-01 22:31 ` [ndctl PATCH v2 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: alison.schofield @ 2023-10-01 22:31 UTC (permalink / raw)
To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
Changes since v1:
- Replace 'media-error' language with 'poison'.
At v1 I was spec obsessed and following its language strictly. Jonathan
questioned it at the time, and I've come around to simply say poison,
since that is the language we've all been using for the past year+.
It also aligns with the inject-poison and clear-poison options that
have been posted on this list.
- Retrieve poison per region by iterating through the contributing memdevs.
(The by region trigger was designed out of the driver implementation.)
- Add the HPA and region info to both the by region and by memdev cxl list
json.
- Applied one review tag to the untouched pid patch. (Jonathan)
- Link to v1:
https://lore.kernel.org/nvdimm/cover.1668133294.git.alison.schofield@intel.com/
Add the option to include a memory device poison list in cxl list json output.
Examples appended below: by memdev, by region, by memdev and coincidentally
in a region, and no poison found.
Example: By memdev
cxl list -m mem1 --poison -u
{
"memdev":"mem1",
"pmem_size":"1024.00 MiB (1073.74 MB)",
"ram_size":"1024.00 MiB (1073.74 MB)",
"serial":"0x1",
"numa_node":1,
"host":"cxl_mem.1",
"poison":{
"nr_poison_records":4,
"poison_records":[
{
"dpa":"0x40000000",
"dpa_length":64,
"source":"Injected",
"flags":""
},
{
"dpa":"0x40001000",
"dpa_length":64,
"source":"Injected",
"flags":""
},
{
"dpa":"0",
"dpa_length":64,
"source":"Injected",
"flags":""
},
{
"dpa":"0x600",
"dpa_length":64,
"source":"Injected",
"flags":""
}
]
}
}
Example: By region
cxl list -r region5 --poison -u
{
"region":"region5",
"resource":"0xf110000000",
"size":"2.00 GiB (2.15 GB)",
"type":"pmem",
"interleave_ways":2,
"interleave_granularity":4096,
"decode_state":"commit",
"poison":{
"nr_poison_records":2,
"poison_records":[
{
"memdev":"mem1",
"region":"region5",
"hpa":"0xf110001000",
"dpa":"0x40000000",
"dpa_length":64,
"source":"Injected",
"flags":""
},
{
"memdev":"mem0",
"region":"region5",
"hpa":"0xf110000000",
"dpa":"0x40000000",
"dpa_length":64,
"source":"Injected",
"flags":""
}
]
}
}
Example: By memdev and coincidentally in a region
# cxl list -m mem0 --poison -u
{
"memdev":"mem0",
"pmem_size":"1024.00 MiB (1073.74 MB)",
"ram_size":"1024.00 MiB (1073.74 MB)",
"serial":"0",
"numa_node":0,
"host":"cxl_mem.0",
"poison":{
"nr_poison_records":1,
"poison_records":[
{
"region":"region5",
"hpa":"0xf110000000",
"dpa":"0x40000000",
"dpa_length":64,
"source":"Injected",
"flags":""
}
]
}
}
Example: No poison found
cxl list -m mem9 --poison -u
{
"memdev":"mem9",
"pmem_size":"1024.00 MiB (1073.74 MB)",
"ram_size":"1024.00 MiB (1073.74 MB)",
"serial":"0x9",
"numa_node":1,
"host":"cxl_mem.9",
"poison":{
"nr_poison_records":0
}
}
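The by-region example above pairs each DPA with an HPA. As a rough illustration of how the two could relate, the sketch below assumes plain modulo interleave, a decoder DPA base of 0x40000000 on both memdevs, and mem0 at interleave position 0 with mem1 at position 1. None of those assumptions are stated in this posting, and dpa_to_hpa() is a hypothetical helper, not a libcxl or kernel function. Under those assumptions it reproduces the two HPAs listed for region5.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of libcxl: translate a device physical
 * address to a host physical address under simple modulo interleave.
 * Assumes 'gran' is non-zero and dpa >= dpa_base.
 */
static uint64_t dpa_to_hpa(uint64_t region_base, uint64_t dpa_base,
			   uint64_t dpa, unsigned int ways,
			   unsigned int gran, unsigned int pos)
{
	uint64_t off = dpa - dpa_base;	/* offset into this device's share */
	uint64_t chunk = off / gran;	/* which granule on this device */
	uint64_t within = off % gran;	/* byte offset inside the granule */

	/* Each device granule expands to 'ways' granules of host space. */
	return region_base + (chunk * ways + pos) * gran + within;
}
```

With the region5 values (resource 0xf110000000, 2 ways, granularity 4096), position 0 maps DPA 0x40000000 to HPA 0xf110000000 and position 1 maps it to 0xf110001000. Real decoders may use other arithmetic (XOR-based interleave, for instance), so treat this only as a reading aid for the example output.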
Alison Schofield (5):
libcxl: add interfaces for GET_POISON_LIST mailbox commands
cxl: add an optional pid check to event parsing
cxl/list: collect and parse the poison list records
cxl/list: add --poison option to cxl list
cxl/test: add cxl-poison.sh unit test
Documentation/cxl/cxl-list.txt | 64 ++++++++++
cxl/event_trace.c | 5 +
cxl/event_trace.h | 1 +
cxl/filter.h | 3 +
cxl/json.c | 208 +++++++++++++++++++++++++++++++++
cxl/lib/libcxl.c | 47 ++++++++
cxl/lib/libcxl.sym | 6 +
cxl/libcxl.h | 2 +
cxl/list.c | 2 +
test/cxl-poison.sh | 103 ++++++++++++++++
test/meson.build | 2 +
util/json.h | 1 +
12 files changed, 444 insertions(+)
create mode 100644 test/cxl-poison.sh
base-commit: a871e6153b11fe63780b37cdcb1eb347b296095c
--
2.37.3
* [ndctl PATCH v2 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands
From: alison.schofield @ 2023-10-01 22:31 UTC (permalink / raw)
To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

CXL devices maintain a list of locations that are poisoned or result
in poison if the addresses are accessed by the host.

Per the spec (CXL 3.0 8.2.9.8.4.1), the device returns the Poison
List as a set of Media Error Records that include the source of the
error, the starting device physical address and length.

Trigger the retrieval of the poison list by writing to the memory
device sysfs attribute: trigger_poison_list. The CXL driver only
offers triggering per memdev, so the trigger by region interface
offered here is a convenience API that triggers a poison list
retrieval for each memdev contributing to a region.

int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
int cxl_region_trigger_poison_list(struct cxl_region *region);

The resulting poison records are logged as kernel trace events
named 'cxl_poison'.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.sym |  6 ++++++
 cxl/libcxl.h       |  2 ++
 3 files changed, 55 insertions(+)

diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
index af4ca44eae19..2f6e64ea2ae7 100644
--- a/cxl/lib/libcxl.c
+++ b/cxl/lib/libcxl.c
@@ -1647,6 +1647,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
 	return 0;
 }
 
+CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
+{
+	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
+	char *path = memdev->dev_buf;
+	int len = memdev->buf_len, rc;
+
+	if (snprintf(path, len, "%s/trigger_poison_list", memdev->dev_path) >=
+	    len) {
+		err(ctx, "%s: buffer too small\n",
+		    cxl_memdev_get_devname(memdev));
+		return -ENXIO;
+	}
+	rc = sysfs_write_attr(ctx, path, "1\n");
+	if (rc < 0) {
+		fprintf(stderr,
+			"%s: Failed write sysfs attr trigger_poison_list\n",
+			cxl_memdev_get_devname(memdev));
+		return rc;
+	}
+	return 0;
+}
+
+CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
+{
+	struct cxl_memdev_mapping *mapping;
+	int rc;
+
+	cxl_mapping_foreach(region, mapping) {
+		struct cxl_decoder *decoder;
+		struct cxl_memdev *memdev;
+
+		decoder = cxl_mapping_get_decoder(mapping);
+		if (!decoder)
+			continue;
+
+		memdev = cxl_decoder_get_memdev(decoder);
+		if (!memdev)
+			continue;
+
+		rc = cxl_memdev_trigger_poison_list(memdev);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
 {
 	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
index 8fa1cca3d0d7..277b7e21d6a6 100644
--- a/cxl/lib/libcxl.sym
+++ b/cxl/lib/libcxl.sym
@@ -264,3 +264,9 @@ global:
 	cxl_memdev_update_fw;
 	cxl_memdev_cancel_fw_update;
 } LIBCXL_5;
+
+LIBCXL_7 {
+global:
+	cxl_memdev_trigger_poison_list;
+	cxl_region_trigger_poison_list;
+} LIBCXL_6;
diff --git a/cxl/libcxl.h b/cxl/libcxl.h
index 0f4f4b2648fb..ecdffe36df2c 100644
--- a/cxl/libcxl.h
+++ b/cxl/libcxl.h
@@ -460,6 +460,8 @@ enum cxl_setpartition_mode {
 
 int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
 		enum cxl_setpartition_mode mode);
+int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
+int cxl_region_trigger_poison_list(struct cxl_region *region);
 
 #ifdef __cplusplus
 } /* extern "C" */
--
2.37.3
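The memdev trigger in patch 1/5 builds a sysfs attribute path with snprintf() and treats a return value of len or more as truncation. A minimal stand-alone sketch of that check follows. build_attr_path() is a hypothetical helper introduced only for illustration, it is not part of libcxl, and the -ENXIO return simply mirrors the error code the patch uses.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-alone helper: format "<dev_path>/<attr>" into buf.
 * Returns 0 on success, or -ENXIO if the result was truncated or
 * snprintf() failed.
 */
static int build_attr_path(char *buf, size_t len, const char *dev_path,
			   const char *attr)
{
	int n = snprintf(buf, len, "%s/%s", dev_path, attr);

	/* snprintf() returns the length it wanted; >= len means truncation */
	if (n < 0 || (size_t)n >= len)
		return -ENXIO;
	return 0;
}
```

Keeping the snprintf() call on one statement and comparing its result separately also avoids splitting the comparison across lines.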
* Re: [ndctl PATCH v2 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands
From: Verma, Vishal L @ 2023-11-15 10:08 UTC (permalink / raw)
To: Schofield, Alison; +Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev

On Sun, 2023-10-01 at 15:31 -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> CXL devices maintain a list of locations that are poisoned or result
> in poison if the addresses are accessed by the host.
>
> Per the spec (CXL 3.0 8.2.9.8.4.1), the device returns the Poison
> List as a set of Media Error Records that include the source of the
> error, the starting device physical address and length.
>
> Trigger the retrieval of the poison list by writing to the memory
> device sysfs attribute: trigger_poison_list. The CXL driver only
> offers triggering per memdev, so the trigger by region interface
> offered here is a convenience API that triggers a poison list
> retrieval for each memdev contributing to a region.
>
> int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> int cxl_region_trigger_poison_list(struct cxl_region *region);
>
> The resulting poison records are logged as kernel trace events
> named 'cxl_poison'.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  cxl/lib/libcxl.c   | 47 ++++++++++++++++++++++++++++++++++++++++++++++
>  cxl/lib/libcxl.sym |  6 ++++++
>  cxl/libcxl.h       |  2 ++
>  3 files changed, 55 insertions(+)
>
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index af4ca44eae19..2f6e64ea2ae7 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -1647,6 +1647,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
>  	return 0;
>  }
>
> +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> +{
> +	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> +	char *path = memdev->dev_buf;
> +	int len = memdev->buf_len, rc;
> +
> +	if (snprintf(path, len, "%s/trigger_poison_list", memdev->dev_path) >=
> +	    len) {

I see this unfortunate line break Jonathan commented on still crept in,
agreed that breaking up snprintf's args would look better.
* Re: [ndctl PATCH v2 1/5] libcxl: add interfaces for GET_POISON_LIST mailbox commands
From: Alison Schofield @ 2023-11-17 16:21 UTC (permalink / raw)
To: Verma, Vishal L; +Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev

On Wed, Nov 15, 2023 at 02:08:03AM -0800, Vishal Verma wrote:
> On Sun, 2023-10-01 at 15:31 -0700, alison.schofield@intel.com wrote:

snip

> > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> > +{
> > +	struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > +	char *path = memdev->dev_buf;
> > +	int len = memdev->buf_len, rc;
> > +
> > +	if (snprintf(path, len, "%s/trigger_poison_list", memdev->dev_path) >=
> > +	    len) {
>
> I see this unfortunate line break Jonathan commented on still crept in,
> agreed that breaking up snprintf's args would look better.

Fixed up in v3. Thanks!
* [ndctl PATCH v2 2/5] cxl: add an optional pid check to event parsing
From: alison.schofield @ 2023-10-01 22:31 UTC (permalink / raw)
To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl, Jonathan Cameron

From: Alison Schofield <alison.schofield@intel.com>

When parsing CXL events, callers may only be interested in events
that originate from the current process. Introduce an optional
argument to the event trace context: event_pid. When event_pid is
present, only include events with a matching pid in the returned
JSON list. It is not a failure to see other, non matching results.
Simply skip those.

The initial use case for this is device poison listings where
only the poison error records requested by this process are wanted.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 cxl/event_trace.c | 5 +++++
 cxl/event_trace.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/cxl/event_trace.c b/cxl/event_trace.c
index db8cc85f0b6f..269060898118 100644
--- a/cxl/event_trace.c
+++ b/cxl/event_trace.c
@@ -208,6 +208,11 @@ static int cxl_event_parse(struct tep_event *event, struct tep_record *record,
 		return 0;
 	}
 
+	if (event_ctx->event_pid) {
+		if (event_ctx->event_pid != tep_data_pid(event->tep, record))
+			return 0;
+	}
+
 	if (event_ctx->parse_event)
 		return event_ctx->parse_event(event, record,
 					      &event_ctx->jlist_head);
diff --git a/cxl/event_trace.h b/cxl/event_trace.h
index ec6267202c8b..7f7773b2201f 100644
--- a/cxl/event_trace.h
+++ b/cxl/event_trace.h
@@ -15,6 +15,7 @@ struct event_ctx {
 	const char *system;
 	struct list_head jlist_head;
 	const char *event_name; /* optional */
+	int event_pid; /* optional */
 	int (*parse_event)(struct tep_event *event, struct tep_record *record,
 			   struct list_head *jlist_head); /* optional */
 };
--
2.37.3
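The optional pid check in patch 2/5 comes down to a small predicate: a zero event_pid accepts every record, and a non-zero one accepts only matching records and skips the rest without error. The sketch below isolates that decision. struct pid_filter and record_wanted() are hypothetical stand-ins for illustration, not names from ndctl, and the record pid is passed in directly rather than read through libtraceevent.

```c
#include <stdbool.h>

/* Stand-in for the optional field added to struct event_ctx */
struct pid_filter {
	int event_pid; /* 0 means "no filtering" */
};

/*
 * Hypothetical helper: decide whether a trace record should be kept.
 * A zero event_pid accepts every record; otherwise only records whose
 * pid matches are kept. A mismatch is a skip, not an error.
 */
static bool record_wanted(const struct pid_filter *f, int record_pid)
{
	if (!f->event_pid)
		return true;
	return f->event_pid == record_pid;
}
```

A caller that wants only its own events would set event_pid to getpid() before parsing, as patch 3/5 does.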
* [ndctl PATCH v2 3/5] cxl/list: collect and parse the poison list records
From: alison.schofield @ 2023-10-01 22:31 UTC (permalink / raw)
To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Poison list records are logged as events in the kernel tracing
subsystem. To prepare the poison list for cxl list, enable tracing,
trigger the poison list read, and parse the generated cxl_poison
events into a json representation.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 cxl/json.c  | 208 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 util/json.h |   1 +
 2 files changed, 209 insertions(+)

diff --git a/cxl/json.c b/cxl/json.c
index 7678d02020b6..36db73de4f8f 100644
--- a/cxl/json.c
+++ b/cxl/json.c
@@ -2,15 +2,19 @@
 // Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
 #include <limits.h>
 #include <util/json.h>
+#include <util/bitmap.h>
 #include <uuid/uuid.h>
 #include <cxl/libcxl.h>
 #include <json-c/json.h>
 #include <json-c/printbuf.h>
 #include <ccan/short_types/short_types.h>
+#include <traceevent/event-parse.h>
+#include <tracefs/tracefs.h>
 
 #include "filter.h"
 #include "json.h"
 #include "../daxctl/json.h"
+#include "event_trace.h"
 
 #define CXL_FW_VERSION_STR_LEN 16
 #define CXL_FW_MAX_SLOTS 4
@@ -571,6 +575,190 @@ err_jobj:
 	return NULL;
 }
 
+/* CXL 8.2.9.5.4.1 Get Poison List: Poison Source */
+#define CXL_POISON_SOURCE_UNKNOWN 0
+#define CXL_POISON_SOURCE_EXTERNAL 1
+#define CXL_POISON_SOURCE_INTERNAL 2
+#define CXL_POISON_SOURCE_INJECTED 3
+#define CXL_POISON_SOURCE_VENDOR 7
+
+/* CXL 8.2.9.5.4.1 Get Poison List: Payload out flags */
+#define CXL_POISON_FLAG_MORE BIT(0)
+#define CXL_POISON_FLAG_OVERFLOW BIT(1)
+#define CXL_POISON_FLAG_SCANNING BIT(2)
+
+static struct json_object *
+util_cxl_poison_events_to_json(struct tracefs_instance *inst, bool is_region,
+			       unsigned long flags)
+{
+	struct json_object *jerrors, *jmedia, *jobj = NULL;
+	struct jlist_node *jnode, *next;
+	struct event_ctx ectx = {
+		.event_name = "cxl_poison",
+		.event_pid = getpid(),
+		.system = "cxl",
+	};
+	int rc, count = 0;
+
+	list_head_init(&ectx.jlist_head);
+	rc = cxl_parse_events(inst, &ectx);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to parse events: %d\n", rc);
+		return NULL;
+	}
+	/* Add nr_poison_records:0 to json */
+	if (list_empty(&ectx.jlist_head))
+		goto out;
+
+	jerrors = json_object_new_array();
+	if (!jerrors)
+		return NULL;
+
+	list_for_each_safe(&ectx.jlist_head, jnode, next, list) {
+		struct json_object *jval = NULL;
+		struct json_object *jp = NULL;
+		int source, pflags;
+		u64 addr, len;
+
+		jp = json_object_new_object();
+		if (!jp)
+			return NULL;
+
+		if (is_region) {
+			/* Add the memdev name in a by region list */
+			if (json_object_object_get_ex(jnode->jobj, "memdev",
+						      &jval))
+				json_object_object_add(jp, "memdev", jval);
+		}
+
+		/*
+		 * When listing is by memdev, region names and valid HPAs
+		 * will appear if the poison address is part of a region.
+		 * Pick up those valid region names and HPAs but ignore the
+		 * empties and invalids.
+		 */
+
+		/* Only add non NULL region names */
+		if (json_object_object_get_ex(jnode->jobj, "region", &jval)) {
+			if (strlen(json_object_get_string(jval)) != 0)
+				json_object_object_add(jp, "region", jval);
+		}
+		/* Only display valid HPAs */
+		if (json_object_object_get_ex(jnode->jobj, "hpa", &jval)) {
+			addr = json_object_get_uint64(jval);
+			if (addr != ULLONG_MAX) {
+				jobj = util_json_object_hex(addr, flags);
+				json_object_object_add(jp, "hpa", jobj);
+			}
+		}
+		if (json_object_object_get_ex(jnode->jobj, "dpa", &jval)) {
+			addr = json_object_get_int64(jval);
+			jobj = util_json_object_hex(addr, flags);
+			json_object_object_add(jp, "dpa", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "dpa_length", &jval)) {
+			len = json_object_get_int64(jval);
+			jobj = util_json_object_size(len, flags);
+			json_object_object_add(jp, "dpa_length", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "source", &jval)) {
+			source = json_object_get_int(jval);
+			if (source == CXL_POISON_SOURCE_UNKNOWN)
+				jobj = json_object_new_string("Unknown");
+			else if (source == CXL_POISON_SOURCE_EXTERNAL)
+				jobj = json_object_new_string("External");
+			else if (source == CXL_POISON_SOURCE_INTERNAL)
+				jobj = json_object_new_string("Internal");
+			else if (source == CXL_POISON_SOURCE_INJECTED)
+				jobj = json_object_new_string("Injected");
+			else if (source == CXL_POISON_SOURCE_VENDOR)
+				jobj = json_object_new_string("Vendor");
+			else
+				jobj = json_object_new_string("Reserved");
+			json_object_object_add(jp, "source", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "flags", &jval)) {
+			char flag_str[32] = { '\0' };
+
+			pflags = json_object_get_int(jval);
+			if (pflags & CXL_POISON_FLAG_MORE)
+				strcat(flag_str, "More,");
+			if (pflags & CXL_POISON_FLAG_OVERFLOW)
+				strcat(flag_str, "Overflow,");
+			if (pflags & CXL_POISON_FLAG_SCANNING)
+				strcat(flag_str, "Scanning,");
+			jobj = json_object_new_string(flag_str);
+			if (jobj)
+				json_object_object_add(jp, "flags", jobj);
+		}
+		if (json_object_object_get_ex(jnode->jobj, "overflow_t", &jval))
+			json_object_object_add(jp, "overflow_time", jval);
+
+		json_object_array_add(jerrors, jp);
+		count++;
+	} /* list_for_each_safe */
+
+out:
+	jmedia = json_object_new_object();
+	if (!jmedia)
+		return NULL;
+
+	/* Always include the count. If count is zero, no records follow. */
+	jobj = json_object_new_int(count);
+	if (jobj)
+		json_object_object_add(jmedia, "nr_poison_records", jobj);
+	if (count)
+		json_object_object_add(jmedia, "poison_records", jerrors);
+
+	return jmedia;
+}
+
+struct cxl_poison_ctx {
+	void *dev;
+	bool is_region;
+};
+
+static struct json_object *
+util_cxl_poison_list_to_json(struct cxl_poison_ctx *pctx,
+			     unsigned long flags)
+{
+	struct json_object *jmedia = NULL;
+	struct tracefs_instance *inst;
+	int rc;
+
+	inst = tracefs_instance_create("cxl list");
+	if (!inst) {
+		fprintf(stderr, "tracefs_instance_create() failed\n");
+		return NULL;
+	}
+
+	rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison");
+	if (rc < 0) {
+		fprintf(stderr, "Failed to enable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	if (pctx->is_region)
+		rc = cxl_region_trigger_poison_list(pctx->dev);
+	else
+		rc = cxl_memdev_trigger_poison_list(pctx->dev);
+	if (rc) {
+		fprintf(stderr, "Failed write of sysfs attribute: %d\n", rc);
+		goto err_free;
+	}
+
+	rc = cxl_event_tracing_disable(inst);
+	if (rc < 0) {
+		fprintf(stderr, "Failed to disable trace: %d\n", rc);
+		goto err_free;
+	}
+
+	jmedia = util_cxl_poison_events_to_json(inst, pctx->is_region, flags);
+err_free:
+	tracefs_instance_free(inst);
+	return jmedia;
+}
+
 struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 		unsigned long flags)
 {
@@ -649,6 +837,16 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
 			json_object_object_add(jdev, "firmware", jobj);
 	}
 
+	if (flags & UTIL_JSON_POISON_LIST) {
+		struct cxl_poison_ctx pctx = {
+			.dev = memdev,
+			.is_region = false,
+		};
+		jobj = util_cxl_poison_list_to_json(&pctx, flags);
+		if (jobj)
+			json_object_object_add(jdev, "poison", jobj);
+	}
+
 	json_object_set_userdata(jdev, memdev, NULL);
 	return jdev;
 }
@@ -987,6 +1185,16 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region,
 			json_object_object_add(jregion, "state", jobj);
 	}
 
+	if (flags & UTIL_JSON_POISON_LIST) {
+		struct cxl_poison_ctx pectx = {
+			.dev = region,
+			.is_region = true,
+		};
+		jobj = util_cxl_poison_list_to_json(&pectx, flags);
+		if (jobj)
+			json_object_object_add(jregion, "poison", jobj);
+	}
+
 	util_cxl_mappings_append_json(jregion, region, flags);
 
 	if (flags & UTIL_JSON_DAX) {
diff --git a/util/json.h b/util/json.h
index ea370df4d1b7..3ae4074a95c3 100644
--- a/util/json.h
+++ b/util/json.h
@@ -21,6 +21,7 @@ enum util_json_flags {
 	UTIL_JSON_TARGETS = (1 << 11),
 	UTIL_JSON_PARTITION = (1 << 12),
 	UTIL_JSON_ALERT_CONFIG = (1 << 13),
+	UTIL_JSON_POISON_LIST = (1 << 14),
 };
 
 void util_display_json_array(FILE *f_out, struct json_object *jarray,
--
2.37.3
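Patch 3/5 turns the payload flags into a string by appending "More,", "Overflow," and "Scanning," with strcat(), which leaves a trailing comma whenever a flag is set. As an alternative for comparison only, the sketch below builds the same list with bounded snprintf() calls and no trailing comma. This is a deliberate change in output format and is not what the patch does. poison_flags_to_str() and the POISON_FLAG_* names are hypothetical, with the bit positions copied from the patch.

```c
#include <stdio.h>

/* Bit positions as defined in the patch; BIT() is expanded by hand here */
#define POISON_FLAG_MORE     (1u << 0)
#define POISON_FLAG_OVERFLOW (1u << 1)
#define POISON_FLAG_SCANNING (1u << 2)

/*
 * Hypothetical helper: write a comma-separated list of flag names into
 * buf (size len) with no trailing comma. Returns buf. Output is
 * silently truncated if buf is too small.
 */
static char *poison_flags_to_str(unsigned int flags, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} names[] = {
		{ POISON_FLAG_MORE, "More" },
		{ POISON_FLAG_OVERFLOW, "Overflow" },
		{ POISON_FLAG_SCANNING, "Scanning" },
	};
	size_t used = 0;

	if (len)
		buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int n;

		/* Skip unset flags, and stop writing once the buffer is full */
		if (!(flags & names[i].bit) || used >= len)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", names[i].name);
		if (n < 0)
			break;
		used += (size_t)n;
	}
	return buf;
}
```

A table of bit/name pairs keeps the decoding in one place if further flags are ever defined. Whether consumers of the JSON output prefer a delimited string or an array of names is a separate design question this sketch does not settle.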
* Re: [ndctl PATCH v2 3/5] cxl/list: collect and parse the poison list records
From: Verma, Vishal L @ 2023-11-15 10:09 UTC (permalink / raw)
To: Schofield, Alison; +Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev

On Sun, 2023-10-01 at 15:31 -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Poison list records are logged as events in the kernel tracing
> subsystem. To prepare the poison list for cxl list, enable tracing,
> trigger the poison list read, and parse the generated cxl_poison
> events into a json representation.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  cxl/json.c  | 208 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util/json.h |   1 +
>  2 files changed, 209 insertions(+)
>
> diff --git a/cxl/json.c b/cxl/json.c
> index 7678d02020b6..36db73de4f8f 100644
> --- a/cxl/json.c
> +++ b/cxl/json.c
> @@ -2,15 +2,19 @@
>  // Copyright (C) 2015-2021 Intel Corporation. All rights reserved.
>  #include <limits.h>
>  #include <util/json.h>
> +#include <util/bitmap.h>
>  #include <uuid/uuid.h>
>  #include <cxl/libcxl.h>
>  #include <json-c/json.h>
>  #include <json-c/printbuf.h>
>  #include <ccan/short_types/short_types.h>
> +#include <traceevent/event-parse.h>
> +#include <tracefs/tracefs.h>
>
>  #include "filter.h"
>  #include "json.h"
>  #include "../daxctl/json.h"
> +#include "event_trace.h"
>
>  #define CXL_FW_VERSION_STR_LEN 16
>  #define CXL_FW_MAX_SLOTS 4
> @@ -571,6 +575,190 @@ err_jobj:
>  	return NULL;
>  }
>
> +/* CXL 8.2.9.5.4.1 Get Poison List: Poison Source */

These usually have a spec version too - "CXL 3.0 8.2.9... "

> +#define CXL_POISON_SOURCE_UNKNOWN 0
> +#define CXL_POISON_SOURCE_EXTERNAL 1
> +#define CXL_POISON_SOURCE_INTERNAL 2
> +#define CXL_POISON_SOURCE_INJECTED 3
> +#define CXL_POISON_SOURCE_VENDOR 7
> +
> +/* CXL 8.2.9.5.4.1 Get Poison List: Payload out flags */

Same thing here.

> +#define CXL_POISON_FLAG_MORE BIT(0)
> +#define CXL_POISON_FLAG_OVERFLOW BIT(1)
> +#define CXL_POISON_FLAG_SCANNING BIT(2)
> +
> +static struct json_object *
> +util_cxl_poison_events_to_json(struct tracefs_instance *inst, bool is_region,
> +			       unsigned long flags)
> +{
> +	struct json_object *jerrors, *jmedia, *jobj = NULL;

Since everything else is now 'poison', might be good to also
s/jmedia/jpoison/ everywhere.

> +	struct jlist_node *jnode, *next;
> +	struct event_ctx ectx = {
> +		.event_name = "cxl_poison",
> +		.event_pid = getpid(),
> +		.system = "cxl",
> +	};
> +	int rc, count = 0;
> +
> +	list_head_init(&ectx.jlist_head);
> +	rc = cxl_parse_events(inst, &ectx);
> +	if (rc < 0) {
> +		fprintf(stderr, "Failed to parse events: %d\n", rc);
> +		return NULL;
> +	}
> +	/* Add nr_poison_records:0 to json */
> +	if (list_empty(&ectx.jlist_head))
> +		goto out;
> +
> +	jerrors = json_object_new_array();
> +	if (!jerrors)
> +		return NULL;
> +
> +	list_for_each_safe(&ectx.jlist_head, jnode, next, list) {
> +		struct json_object *jval = NULL;
> +		struct json_object *jp = NULL;

Are the NULL assignments needed? At least for @jp, it is
unconditionally assigned below, and isn't used before that. I suspect
json-c probably doesn't care about what's in @jval either before
writing it.

> +		int source, pflags;
> +		u64 addr, len;
> +
> +		jp = json_object_new_object();
> +		if (!jp)
> +			return NULL;
> +
> +		if (is_region) {
> +			/* Add the memdev name in a by region list */
> +			if (json_object_object_get_ex(jnode->jobj, "memdev",
> +						      &jval))
> +				json_object_object_add(jp, "memdev", jval);
> +		}
> +
> +		/*
> +		 * When listing is by memdev, region names and valid HPAs
> +		 * will appear if the poison address is part of a region.
> +		 * Pick up those valid region names and HPAs but ignore the
> +		 * empties and invalids.
> +		 */
> +
> +		/* Only add non NULL region names */
> +		if (json_object_object_get_ex(jnode->jobj, "region", &jval)) {
> +			if (strlen(json_object_get_string(jval)) != 0)
> +				json_object_object_add(jp, "region", jval);
> +		}
> +		/* Only display valid HPAs */
> +		if (json_object_object_get_ex(jnode->jobj, "hpa", &jval)) {
> +			addr = json_object_get_uint64(jval);
> +			if (addr != ULLONG_MAX) {
> +				jobj = util_json_object_hex(addr, flags);
> +				json_object_object_add(jp, "hpa", jobj);
> +			}
> +		}
> +		if (json_object_object_get_ex(jnode->jobj, "dpa", &jval)) {
> +			addr = json_object_get_int64(jval);
> +			jobj = util_json_object_hex(addr, flags);
> +			json_object_object_add(jp, "dpa", jobj);
> +		}
> +		if (json_object_object_get_ex(jnode->jobj, "dpa_length", &jval)) {
> +			len = json_object_get_int64(jval);
> +			jobj = util_json_object_size(len, flags);
> +			json_object_object_add(jp, "dpa_length", jobj);
> +		}
> +		if (json_object_object_get_ex(jnode->jobj, "source", &jval)) {
> +			source = json_object_get_int(jval);
> +			if (source == CXL_POISON_SOURCE_UNKNOWN)
> +				jobj = json_object_new_string("Unknown");
> +			else if (source == CXL_POISON_SOURCE_EXTERNAL)
> +				jobj = json_object_new_string("External");
> +			else if (source == CXL_POISON_SOURCE_INTERNAL)
> +				jobj = json_object_new_string("Internal");
> +			else if (source == CXL_POISON_SOURCE_INJECTED)
> +				jobj = json_object_new_string("Injected");
> +			else if (source == CXL_POISON_SOURCE_VENDOR)
> +				jobj = json_object_new_string("Vendor");
> +			else
> +				jobj = json_object_new_string("Reserved");

Minor nit, but maybe 'switch (source) ...' would look a bit cleaner?

> +			json_object_object_add(jp, "source", jobj);
> +		}
> +		if (json_object_object_get_ex(jnode->jobj, "flags", &jval)) {
> +			char flag_str[32] = { '\0' };
> +
> +			pflags = json_object_get_int(jval);
> +			if (pflags & CXL_POISON_FLAG_MORE)
> +				strcat(flag_str, "More,");
> +			if (pflags & CXL_POISON_FLAG_OVERFLOW)
> +				strcat(flag_str, "Overflow,");
> +			if (pflags & CXL_POISON_FLAG_SCANNING)
> +				strcat(flag_str, "Scanning,");
> +			jobj = json_object_new_string(flag_str);
> +			if (jobj)
> +				json_object_object_add(jp, "flags", jobj);
> +		}
> +		if (json_object_object_get_ex(jnode->jobj, "overflow_t", &jval))
> +			json_object_object_add(jp, "overflow_time", jval);
> +
> +		json_object_array_add(jerrors, jp);
> +		count++;
> +	} /* list_for_each_safe */
> +
> +out:
> +	jmedia = json_object_new_object();
> +	if (!jmedia)
> +		return NULL;
> +
> +	/* Always include the count. If count is zero, no records follow. */
> +	jobj = json_object_new_int(count);
> +	if (jobj)
> +		json_object_object_add(jmedia, "nr_poison_records", jobj);
> +	if (count)
> +		json_object_object_add(jmedia, "poison_records", jerrors);

Since these are already nested under a 'poison' JSON object, I'm
tempted to say these can just be 'nr_records' and 'records'
respectively.

> +
> +	return jmedia;
> +}
> +
> +struct cxl_poison_ctx {
> +	void *dev;
> +	bool is_region;
> +};

This structure is a bit awkward - what do you think about creating
different wrappers for the memdev and region case -

util_cxl_memdev_poison_list_to_json(), and
util_cxl_region_poison_list_to_json() that are called respectively by

util_cxl_{memdev,region}_to_json(), and internally they can call:

util_cxl_poison_list_to_json(NULL, memdev, flags), or
util_cxl_poison_list_to_json(region, NULL, flags)

For the next level down, i.e. poison_events_to_json, the @is_region
bool passed in directly is fine as it doesn't need the memdev or region
objects passed in via void *.

> +
> +static struct json_object *
> +util_cxl_poison_list_to_json(struct cxl_poison_ctx *pctx,
> +			     unsigned long flags)
> +{
> +	struct json_object *jmedia = NULL;
> +	struct tracefs_instance *inst;
> +	int rc;
> +
> +	inst = tracefs_instance_create("cxl list");
> +	if (!inst) {
> +		fprintf(stderr, "tracefs_instance_create() failed\n");
> +		return NULL;
> +	}
> +
> +	rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison");
> +	if (rc < 0) {
> +		fprintf(stderr, "Failed to enable trace: %d\n", rc);
> +		goto err_free;
> +	}
> +
> +	if (pctx->is_region)
> +		rc = cxl_region_trigger_poison_list(pctx->dev);
> +	else
> +		rc = cxl_memdev_trigger_poison_list(pctx->dev);
> +	if (rc) {
> +		fprintf(stderr, "Failed write of sysfs attribute: %d\n", rc);

This would be incorrect if the memdev trigger reported an ENOMEM, and
then this reported a sysfs write failure. It should at least be
something like 'failed to trigger poison" - but since the memdev
trigger helper has prints for every failure case, maybe this can just
be omitted?

> +		goto err_free;
> +	}
> +
> +	rc = cxl_event_tracing_disable(inst);
> +	if (rc < 0) {
> +		fprintf(stderr, "Failed to disable trace: %d\n", rc);
> +		goto err_free;
> +	}
> +
> +	jmedia = util_cxl_poison_events_to_json(inst, pctx->is_region, flags);
> +err_free:
> +	tracefs_instance_free(inst);
> +	return jmedia;
> +}
> +
>  struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
>  		unsigned long flags)
>  {
> @@ -649,6 +837,16 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev,
>  			json_object_object_add(jdev, "firmware", jobj);
>  	}
>
> +	if (flags & UTIL_JSON_POISON_LIST) {
> +		struct cxl_poison_ctx pctx = {
> +			.dev = memdev,
> +			.is_region = false,
> +		};
> +		jobj = util_cxl_poison_list_to_json(&pctx, flags);
> +		if (jobj)
> +			json_object_object_add(jdev, "poison", jobj);
> +	}
> +
>  	json_object_set_userdata(jdev, memdev, NULL);
>  	return jdev;
>  }
> @@ -987,6 +1185,16 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region,
>  			json_object_object_add(jregion, "state", jobj);
>  	}
>
> +	if (flags & UTIL_JSON_POISON_LIST) {
> +		struct cxl_poison_ctx pectx = {
> +			.dev = region,
> +			.is_region = true,
> +		};
> +		jobj = util_cxl_poison_list_to_json(&pectx, flags);
> +		if (jobj)
> +			json_object_object_add(jregion, "poison", jobj);
> +	}
> +
>  	util_cxl_mappings_append_json(jregion, region, flags);
>
>  	if (flags & UTIL_JSON_DAX) {
> diff --git a/util/json.h b/util/json.h
> index ea370df4d1b7..3ae4074a95c3 100644
> --- a/util/json.h
> +++ b/util/json.h
> @@ -21,6 +21,7 @@ enum util_json_flags {
>  	UTIL_JSON_TARGETS = (1 << 11),
>  	UTIL_JSON_PARTITION = (1 << 12),
>  	UTIL_JSON_ALERT_CONFIG = (1 << 13),
> +	UTIL_JSON_POISON_LIST = (1 << 14),

There's already a UTIL_JSON_MEDIA_ERRORS, can we just reuse that (in
spite of the name :))

>  };
>
>  void util_display_json_array(FILE *f_out, struct json_object *jarray,
* Re: [ndctl PATCH v2 3/5] cxl/list: collect and parse the poison list records
2023-11-15 10:09 ` Verma, Vishal L
@ 2023-11-17 16:44 ` Alison Schofield
0 siblings, 0 replies; 12+ messages in thread
From: Alison Schofield @ 2023-11-17 16:44 UTC (permalink / raw)
To: Verma, Vishal L; +Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev

On Wed, Nov 15, 2023 at 02:09:15AM -0800, Vishal Verma wrote:
> On Sun, 2023-10-01 at 15:31 -0700, alison.schofield@intel.com wrote:

snip

> >
> > +/* CXL 8.2.9.5.4.1 Get Poison List: Poison Source */
>
> These usually have a spec version too - "CXL 3.0 8.2.9... "

Got it. Fixed up the two like this and also a commit log reference.
Moved them all to the now released CXL 3.1 spec.

> > +static struct json_object *
> > +util_cxl_poison_events_to_json(struct tracefs_instance *inst, bool is_region,
> > +			       unsigned long flags)
> > +{
> > +	struct json_object *jerrors, *jmedia, *jobj = NULL;
>
> Since everything else is now 'poison', might be good to also
> s/jmedia/jpoison/ everywhere.

Done.

> > +
> > +	list_for_each_safe(&ectx.jlist_head, jnode, next, list) {
> > +		struct json_object *jval = NULL;
> > +		struct json_object *jp = NULL;
>
> Are the NULL assignments needed? At least for @jp, it is
> unconditionally assigned below, and isn't used before that. I suspect
> json-c probably doesn't care about what's in @jval either before
> writing it.
>

For jp, obviously no init is needed. For jval, I guess I worried about
garbage in jval, but the way the trace event is built, it's not a
concern. Removed both those inits.

snip

> > +		if (json_object_object_get_ex(jnode->jobj, "source", &jval)) {
> > +			source = json_object_get_int(jval);
> > +			if (source == CXL_POISON_SOURCE_UNKNOWN)
> > +				jobj = json_object_new_string("Unknown");
> > +			else if (source == CXL_POISON_SOURCE_EXTERNAL)
> > +				jobj = json_object_new_string("External");
> > +			else if (source == CXL_POISON_SOURCE_INTERNAL)
> > +				jobj = json_object_new_string("Internal");
> > +			else if (source == CXL_POISON_SOURCE_INJECTED)
> > +				jobj = json_object_new_string("Injected");
> > +			else if (source == CXL_POISON_SOURCE_VENDOR)
> > +				jobj = json_object_new_string("Vendor");
> > +			else
> > +				jobj = json_object_new_string("Reserved");
>
> Minor nit, but maybe 'switch (source) ...' would look a bit cleaner?

Done.

>
> > +
> > +	/* Always include the count. If count is zero, no records follow. */
> > +	jobj = json_object_new_int(count);
> > +	if (jobj)
> > +		json_object_object_add(jmedia, "nr_poison_records", jobj);
> > +	if (count)
> > +		json_object_object_add(jmedia, "poison_records", jerrors);
>
> Since these are already nested under a 'poison' JSON object, I'm
> tempted to say these can just be 'nr_records' and 'records'
> respectively.
>

Done.

> > +
> > +	return jmedia;
> > +}
> > +
> > +struct cxl_poison_ctx {
> > +	void *dev;
> > +	bool is_region;
> > +};
>
> This structure is a bit awkward - what do you think about creating
> different wrappers for the memdev and region case -
>
> util_cxl_memdev_poison_list_to_json(), and
> util_cxl_region_poison_list_to_json() that are called respectively by
>
> util_cxl_{memdev,region}_to_json(), and internally they can call:
>
> util_cxl_poison_list_to_json(NULL, memdev, flags), or
> util_cxl_poison_list_to_json(region, NULL, flags)
>
> For the next level down, i.e. poison_events_to_json, the @is_region
> bool passed in directly is fine as it doesn't need the memdev or region
> objects passed in via void *.

Thanks! I did something like this but didn't actually add wrappers.
Please look in v3 and let me know.

>

snip

> > +	if (pctx->is_region)
> > +		rc = cxl_region_trigger_poison_list(pctx->dev);
> > +	else
> > +		rc = cxl_memdev_trigger_poison_list(pctx->dev);
> > +	if (rc) {
> > +		fprintf(stderr, "Failed write of sysfs attribute: %d\n", rc);
>
> This would be incorrect if the memdev trigger reported an ENOMEM, and
> then this reported a sysfs write failure.
>
> It should at least be something like 'failed to trigger poison' - but
> since the memdev trigger helper has prints for every failure case,
> maybe this can just be omitted?

Removed it.

>

snip

> > diff --git a/util/json.h b/util/json.h
> > index ea370df4d1b7..3ae4074a95c3 100644
> > --- a/util/json.h
> > +++ b/util/json.h
> > @@ -21,6 +21,7 @@ enum util_json_flags {
> >  	UTIL_JSON_TARGETS = (1 << 11),
> >  	UTIL_JSON_PARTITION = (1 << 12),
> >  	UTIL_JSON_ALERT_CONFIG = (1 << 13),
> > +	UTIL_JSON_POISON_LIST = (1 << 14),
>
> There's already a UTIL_JSON_MEDIA_ERRORS, can we just reuse that (in
> spite of the name :))

Since it's not visible to the user, changed it.

Thanks for the review Vishal!
* [ndctl PATCH v2 4/5] cxl/list: add --poison option to cxl list
2023-10-01 22:31 [ndctl PATCH v2 0/3] Support poison list retrieval alison.schofield
` (2 preceding siblings ...)
2023-10-01 22:31 ` [ndctl PATCH v2 3/5] cxl/list: collect and parse the poison list records alison.schofield
@ 2023-10-01 22:31 ` alison.schofield
2023-10-01 22:31 ` [ndctl PATCH v2 5/5] cxl/test: add cxl-poison.sh unit test alison.schofield
4 siblings, 0 replies; 12+ messages in thread
From: alison.schofield @ 2023-10-01 22:31 UTC (permalink / raw)
To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

The --poison option to 'cxl list' retrieves poison lists from
memory devices supporting the capability and displays the
returned poison records in the cxl list json. This option can
apply to memdevs or regions.

Example usage in the Documentation/cxl/cxl-list.txt update.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/cxl/cxl-list.txt | 64 ++++++++++++++++++++++++++++++++++
 cxl/filter.h                   |  3 ++
 cxl/list.c                     |  2 ++
 3 files changed, 69 insertions(+)

diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
index 838de4086678..2a4a8fe2efaa 100644
--- a/Documentation/cxl/cxl-list.txt
+++ b/Documentation/cxl/cxl-list.txt
@@ -415,6 +415,70 @@ OPTIONS
 --region::
 	Specify CXL region device name(s), or device id(s), to filter the listing.
 
+-L::
+--poison::
+	Include poison information. The poison list is retrieved from the
+	device(s) and poison records are added to the listing. Apply this
+	option to memdevs and regions where devices support the poison
+	list capability.
+
+----
+# cxl list -m mem11 --poison
+[
+  {
+    "memdev":"mem11",
+    "pmem_size":268435456,
+    "ram_size":0,
+    "serial":0,
+    "host":"0000:37:00.0",
+    "poison":{
+      "nr_poison_records":1,
+      "poison_records":[
+        {
+          "dpa":0,
+          "dpa_length":64,
+          "source":"Internal",
+          "flags":"",
+          "overflow_time":0
+        }
+      ]
+    }
+  }
+]
+# cxl list -r region5 --poison
+[
+  {
+    "region":"region5",
+    "resource":1035623989248,
+    "size":2147483648,
+    "interleave_ways":2,
+    "interleave_granularity":4096,
+    "decode_state":"commit",
+    "poison":{
+      "nr_poison_records":2,
+      "poison_records":[
+        {
+          "memdev":"mem2",
+          "dpa":0,
+          "dpa_length":64,
+          "source":"Internal",
+          "flags":"",
+          "overflow_time":0
+        },
+        {
+          "memdev":"mem5",
+          "dpa":0,
+          "length":512,
+          "source":"Vendor",
+          "flags":"",
+          "overflow_time":0
+        }
+      ]
+    }
+  }
+]
+----
+
 -v::
 --verbose::
 	Increase verbosity of the output. This can be specified
diff --git a/cxl/filter.h b/cxl/filter.h
index 3f65990f835a..f38470e39543 100644
--- a/cxl/filter.h
+++ b/cxl/filter.h
@@ -30,6 +30,7 @@ struct cxl_filter_params {
 	bool fw;
 	bool alert_config;
 	bool dax;
+	bool poison;
 	int verbose;
 	struct log_ctx ctx;
 };
@@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param)
 		flags |= UTIL_JSON_ALERT_CONFIG;
 	if (param->dax)
 		flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS;
+	if (param->poison)
+		flags |= UTIL_JSON_POISON_LIST;
 	return flags;
 }
 
diff --git a/cxl/list.c b/cxl/list.c
index 93ba51ef895c..13fef8569340 100644
--- a/cxl/list.c
+++ b/cxl/list.c
@@ -57,6 +57,8 @@ static const struct option options[] = {
 		    "include memory device firmware information"),
 	OPT_BOOLEAN('A', "alert-config", &param.alert_config,
 		    "include alert configuration information"),
+	OPT_BOOLEAN('L', "poison", &param.poison,
+		    "include poison information "),
 	OPT_INCR('v', "verbose", &param.verbose, "increase output detail"),
 #ifdef ENABLE_DEBUG
 	OPT_BOOLEAN(0, "debug", &debug, "debug list walk"),
-- 
2.37.3
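A minimal sketch of consuming the documented output from a script, assuming the JSON layout shown in the cxl-list.txt example of this v2 posting. The field names "nr_poison_records" and "poison_records" are taken from that example; the review of patch 3/5 asks for shorter names, so later versions may differ. The helper name poison_summary and the embedded sample are illustrative only, and jq is assumed to be installed.

```shell
#!/bin/bash
# Sample trimmed from the documentation example in this patch (v2 field names).
sample_json='[
  {
    "memdev":"mem11",
    "poison":{
      "nr_poison_records":1,
      "poison_records":[
        { "dpa":0, "dpa_length":64, "source":"Internal", "flags":"", "overflow_time":0 }
      ]
    }
  }
]'

# poison_summary JSON (hypothetical helper, not part of ndctl):
# print "<memdev> <record count>" for every memdev object in the listing.
poison_summary()
{
	jq -r '.[] | "\(.memdev) \(.poison.nr_poison_records)"' <<< "$1"
}

# On a system with CXL devices the input would come from the new option,
# for example: poison_summary "$(cxl list -m mem11 --poison)"
```

Passing the JSON as an argument keeps the helper usable with captured output, which is the same pattern the unit test in patch 5/5 relies on.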
* [ndctl PATCH v2 5/5] cxl/test: add cxl-poison.sh unit test
2023-10-01 22:31 [ndctl PATCH v2 0/3] Support poison list retrieval alison.schofield
` (3 preceding siblings ...)
2023-10-01 22:31 ` [ndctl PATCH v2 4/5] cxl/list: add --poison option to cxl list alison.schofield
@ 2023-10-01 22:31 ` alison.schofield
2023-11-15 10:13 ` Verma, Vishal L
4 siblings, 1 reply; 12+ messages in thread
From: alison.schofield @ 2023-10-01 22:31 UTC (permalink / raw)
To: Vishal Verma; +Cc: Alison Schofield, nvdimm, linux-cxl

From: Alison Schofield <alison.schofield@intel.com>

Exercise cxl list, libcxl, and driver pieces of the get poison list
pathway. Inject and clear poison using debugfs and use cxl-cli to
read the poison list by memdev and by region.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 test/cxl-poison.sh | 103 +++++++++++++++++++++++++++++++++++++++++++++
 test/meson.build   |   2 +
 2 files changed, 105 insertions(+)
 create mode 100644 test/cxl-poison.sh

diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
new file mode 100644
index 000000000000..3c424532da7b
--- /dev/null
+++ b/test/cxl-poison.sh
@@ -0,0 +1,103 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022 Intel Corporation. All rights reserved.
+
+. $(dirname $0)/common
+
+rc=77
+
+set -ex
+
+trap 'err $LINENO' ERR
+
+check_prereq "jq"
+
+modprobe -r cxl_test
+modprobe cxl_test
+cxl list
+
+# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to
+# inject, clear, and get the poison list. Do it by memdev and by region.
+# Based on current cxl-test topology.
+
+create_region()
+{
+	region=$($CXL create-region -d $decoder -m $memdevs | jq -r ".region")
+
+	if [[ ! $region ]]; then
+		echo "create-region failed for $decoder"
+		err "$LINENO"
+	fi
+}
+
+setup_x2_region()
+{
+	# Find an x2 decoder
+	decoder=$($CXL list -b cxl_test -D -d root | jq -r ".[] |
+		select(.pmem_capable == true) |
+		select(.nr_targets == 2) |
+		.decoder")
+
+	# Find a memdev for each host-bridge interleave position
+	port_dev0=$($CXL list -T -d $decoder | jq -r ".[] |
+		.targets | .[] | select(.position == 0) | .target")
+	port_dev1=$($CXL list -T -d $decoder | jq -r ".[] |
+		.targets | .[] | select(.position == 1) | .target")
+	mem0=$($CXL list -M -p $port_dev0 | jq -r ".[0].memdev")
+	mem1=$($CXL list -M -p $port_dev1 | jq -r ".[0].memdev")
+	memdevs="$mem0 $mem1"
+}
+
+find_media_errors()
+{
+	nr=$(echo $json | jq -r ".nr_poison_records")
+	if [[ $nr -ne $NR_ERRS ]]; then
+		echo "$mem: $NR_ERRS poison records expected, $nr found"
+		err "$LINENO"
+	fi
+}
+
+# Turn Tracing ON
+# Note that 'cxl list --poison' does toggle the tracing, so
+# turning it on here is to enable the test user to view inject
+# and clear trace events, if they wish.
+echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
+
+# Using DEBUGFS:
+# When cxl-cli support for inject and clear arrives, replace
+# the writes to /sys/kernel/debug with the new cxl commands
+# that wrap them.
+
+# Poison by memdev: inject, list, clear, list.
+# Inject 2 into pmem and 2 into ram partition.
+echo 0x40000000 > /sys/kernel/debug/cxl/mem1/inject_poison
+echo 0x40001000 > /sys/kernel/debug/cxl/mem1/inject_poison
+echo 0x0 > /sys/kernel/debug/cxl/mem1/inject_poison
+echo 0x600 > /sys/kernel/debug/cxl/mem1/inject_poison
+NR_ERRS=4
+json=$("$CXL" list -m mem1 --poison | jq -r '.[].poison')
+find_media_errors
+echo 0x40000000 > /sys/kernel/debug/cxl/mem1/clear_poison
+echo 0x40001000 > /sys/kernel/debug/cxl/mem1/clear_poison
+echo 0x0 > /sys/kernel/debug/cxl/mem1/clear_poison
+echo 0x600 > /sys/kernel/debug/cxl/mem1/clear_poison
+NR_ERRS=0
+json=$("$CXL" list -m mem1 --poison | jq -r '.[].poison')
+find_media_errors
+
+# Poison by region: inject, list, clear, list.
+setup_x2_region
+create_region
+echo 0x40000000 > /sys/kernel/debug/cxl/"$mem0"/inject_poison
+echo 0x40000000 > /sys/kernel/debug/cxl/"$mem1"/inject_poison
+NR_ERRS=2
+json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
+find_media_errors
+echo 0x40000000 > /sys/kernel/debug/cxl/"$mem0"/clear_poison
+echo 0x40000000 > /sys/kernel/debug/cxl/"$mem1"/clear_poison
+NR_ERRS=0
+json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
+find_media_errors
+
+check_dmesg "$LINENO"
+modprobe -r cxl-test
diff --git a/test/meson.build b/test/meson.build
index 224adaf41fcc..2706fa5d633c 100644
--- a/test/meson.build
+++ b/test/meson.build
@@ -157,6 +157,7 @@ cxl_create_region = find_program('cxl-create-region.sh')
 cxl_xor_region = find_program('cxl-xor-region.sh')
 cxl_update_firmware = find_program('cxl-update-firmware.sh')
 cxl_events = find_program('cxl-events.sh')
+cxl_poison = find_program('cxl-poison.sh')
 
 tests = [
   [ 'libndctl', libndctl, 'ndctl' ],
@@ -186,6 +187,7 @@ tests = [
   [ 'cxl-create-region.sh', cxl_create_region, 'cxl' ],
   [ 'cxl-xor-region.sh', cxl_xor_region, 'cxl' ],
   [ 'cxl-events.sh', cxl_events, 'cxl' ],
+  [ 'cxl-poison.sh', cxl_poison, 'cxl' ],
 ]
 
 if get_option('destructive').enabled()
-- 
2.37.3
* Re: [ndctl PATCH v2 5/5] cxl/test: add cxl-poison.sh unit test
2023-10-01 22:31 ` [ndctl PATCH v2 5/5] cxl/test: add cxl-poison.sh unit test alison.schofield
@ 2023-11-15 10:13 ` Verma, Vishal L
2023-11-17 16:52 ` Alison Schofield
0 siblings, 1 reply; 12+ messages in thread
From: Verma, Vishal L @ 2023-11-15 10:13 UTC (permalink / raw)
To: Schofield, Alison; +Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev

On Sun, 2023-10-01 at 15:31 -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Exercise cxl list, libcxl, and driver pieces of the get poison list
> pathway. Inject and clear poison using debugfs and use cxl-cli to
> read the poison list by memdev and by region.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  test/cxl-poison.sh | 103 +++++++++++++++++++++++++++++++++++++++++++++
>  test/meson.build   |   2 +
>  2 files changed, 105 insertions(+)
>  create mode 100644 test/cxl-poison.sh
>
> diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
> new file mode 100644
> index 000000000000..3c424532da7b
> --- /dev/null
> +++ b/test/cxl-poison.sh
> @@ -0,0 +1,103 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2022 Intel Corporation. All rights reserved.
> +
> +. $(dirname $0)/common
> +
> +rc=77
> +
> +set -ex
> +
> +trap 'err $LINENO' ERR
> +
> +check_prereq "jq"
> +
> +modprobe -r cxl_test
> +modprobe cxl_test
> +cxl list

"$CXL" list

Also should reset rc from 77 so that it doesn't show as skipped on a
real failure.

> +
> +# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to
> +# inject, clear, and get the poison list. Do it by memdev and by region.
> +# Based on current cxl-test topology.
> +
> +create_region()
> +{
> +	region=$($CXL create-region -d $decoder -m $memdevs | jq -r ".region")
> +
> +	if [[ ! $region ]]; then
> +		echo "create-region failed for $decoder"
> +		err "$LINENO"
> +	fi
> +}
> +
> +setup_x2_region()
> +{
> +	# Find an x2 decoder
> +	decoder=$($CXL list -b cxl_test -D -d root | jq -r ".[] |

I suspect this comes from another test, but test/common defines a
$cxl_test_bus that can be used here.

> +		select(.pmem_capable == true) |
> +		select(.nr_targets == 2) |
> +		.decoder")
> +
> +	# Find a memdev for each host-bridge interleave position
> +	port_dev0=$($CXL list -T -d $decoder | jq -r ".[] |
> +		.targets | .[] | select(.position == 0) | .target")
> +	port_dev1=$($CXL list -T -d $decoder | jq -r ".[] |
> +		.targets | .[] | select(.position == 1) | .target")
> +	mem0=$($CXL list -M -p $port_dev0 | jq -r ".[0].memdev")
> +	mem1=$($CXL list -M -p $port_dev1 | jq -r ".[0].memdev")
> +	memdevs="$mem0 $mem1"
> +}
> +
> +find_media_errors()
> +{
> +	nr=$(echo $json | jq -r ".nr_poison_records")

No need for echo and pipe -

	nr="$(jq -r ".nr_poison_records" <<< "$json")"

Also, this currently assumes that a global '$json' will be available
and up to date. In this test the way it is called, this will always be
true, but it would be cleaner to actually pass $json to
find_media_errors() each time, and in here, do something like

	local json="$1"

> +	if [[ $nr -ne $NR_ERRS ]]; then

If using the bash variant, [[ ]], this should be

	if [[ $nr != $NR_ERRS ]]; then

> +		echo "$mem: $NR_ERRS poison records expected, $nr found"
> +		err "$LINENO"
> +	fi
> +}
> +
> +# Turn Tracing ON
> +# Note that 'cxl list --poison' does toggle the tracing, so
> +# turning it on here is to enable the test user to view inject
> +# and clear trace events, if they wish.
> +echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable
> +
> +# Using DEBUGFS:
> +# When cxl-cli support for inject and clear arrives, replace
> +# the writes to /sys/kernel/debug with the new cxl commands
> +# that wrap them.
> +
> +# Poison by memdev: inject, list, clear, list.
> +# Inject 2 into pmem and 2 into ram partition.
> +echo 0x40000000 > /sys/kernel/debug/cxl/mem1/inject_poison
> +echo 0x40001000 > /sys/kernel/debug/cxl/mem1/inject_poison
> +echo 0x0 > /sys/kernel/debug/cxl/mem1/inject_poison
> +echo 0x600 > /sys/kernel/debug/cxl/mem1/inject_poison
> +NR_ERRS=4
> +json=$("$CXL" list -m mem1 --poison | jq -r '.[].poison')
> +find_media_errors
> +echo 0x40000000 > /sys/kernel/debug/cxl/mem1/clear_poison
> +echo 0x40001000 > /sys/kernel/debug/cxl/mem1/clear_poison
> +echo 0x0 > /sys/kernel/debug/cxl/mem1/clear_poison
> +echo 0x600 > /sys/kernel/debug/cxl/mem1/clear_poison
> +NR_ERRS=0
> +json=$("$CXL" list -m mem1 --poison | jq -r '.[].poison')
> +find_media_errors

For all of the above debugfs writes -

mem1 is hard-coded - is this supposed to be "$mem1" from when
setup_x2_region() was done (similar to how the region stuff is done
below)?

> +
> +# Poison by region: inject, list, clear, list.
> +setup_x2_region
> +create_region
> +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem0"/inject_poison
> +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem1"/inject_poison
> +NR_ERRS=2
> +json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
> +find_media_errors
> +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem0"/clear_poison
> +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem1"/clear_poison

It might be nice to create a couple of helpers -

	inject_poison_sysfs() {
		memdev="$1"
		addr="$2"
		...
	}

And similarly

	clear_poison_sysfs()...

> +NR_ERRS=0
> +json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
> +find_media_errors
> +
> +check_dmesg "$LINENO"
> +modprobe -r cxl-test
> diff --git a/test/meson.build b/test/meson.build
> index 224adaf41fcc..2706fa5d633c 100644
> --- a/test/meson.build
> +++ b/test/meson.build
> @@ -157,6 +157,7 @@ cxl_create_region = find_program('cxl-create-region.sh')
>  cxl_xor_region = find_program('cxl-xor-region.sh')
>  cxl_update_firmware = find_program('cxl-update-firmware.sh')
>  cxl_events = find_program('cxl-events.sh')
> +cxl_poison = find_program('cxl-poison.sh')
>
>  tests = [
>    [ 'libndctl', libndctl, 'ndctl' ],
> @@ -186,6 +187,7 @@ tests = [
>    [ 'cxl-create-region.sh', cxl_create_region, 'cxl' ],
>    [ 'cxl-xor-region.sh', cxl_xor_region, 'cxl' ],
>    [ 'cxl-events.sh', cxl_events, 'cxl' ],
> +  [ 'cxl-poison.sh', cxl_poison, 'cxl' ],
>  ]
>
>  if get_option('destructive').enabled()
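The helper suggestion in the review above could look roughly like the following. This is only a sketch: the function names come from the review, but their bodies are a guess, and the CXL_DEBUGFS variable is a hypothetical addition (not in the posted test) that lets the helpers be pointed at a scratch directory instead of the real debugfs tree.

```shell
#!/bin/bash
# Hypothetical override; the posted test writes to /sys/kernel/debug/cxl directly.
CXL_DEBUGFS="${CXL_DEBUGFS:-/sys/kernel/debug/cxl}"

# inject_poison_sysfs MEMDEV DPA
# Write one device physical address to the memdev's inject_poison file.
inject_poison_sysfs()
{
	local memdev="$1"
	local addr="$2"

	echo "$addr" > "$CXL_DEBUGFS/$memdev/inject_poison"
}

# clear_poison_sysfs MEMDEV DPA
# Write one device physical address to the memdev's clear_poison file.
clear_poison_sysfs()
{
	local memdev="$1"
	local addr="$2"

	echo "$addr" > "$CXL_DEBUGFS/$memdev/clear_poison"
}
```

With these in place, each pair of debugfs writes in the test collapses to a call such as inject_poison_sysfs "$mem0" 0x40000000, which also removes the hard-coded memdev path from every line.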
* Re: [ndctl PATCH v2 5/5] cxl/test: add cxl-poison.sh unit test
2023-11-15 10:13 ` Verma, Vishal L
@ 2023-11-17 16:52 ` Alison Schofield
0 siblings, 0 replies; 12+ messages in thread
From: Alison Schofield @ 2023-11-17 16:52 UTC (permalink / raw)
To: Verma, Vishal L; +Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev

On Wed, Nov 15, 2023 at 02:13:48AM -0800, Vishal Verma wrote:
> On Sun, 2023-10-01 at 15:31 -0700, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > Exercise cxl list, libcxl, and driver pieces of the get poison list
> > pathway. Inject and clear poison using debugfs and use cxl-cli to
> > read the poison list by memdev and by region.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---

snip

> > +cxl list
>
> "$CXL" list
>
> Also should reset rc from 77 so that it doesn't show as skipped on a
> real failure.

Done.

>

snip

> > +setup_x2_region()
> > +{
> > +	# Find an x2 decoder
> > +	decoder=$($CXL list -b cxl_test -D -d root | jq -r ".[] |
>
> I suspect this comes from another test, but test/common defines a
> $cxl_test_bus that can be used here.

Done.

>

snip

> > +find_media_errors()
> > +{
> > +	nr=$(echo $json | jq -r ".nr_poison_records")
>
> No need for echo and pipe -
>
> 	nr="$(jq -r ".nr_poison_records" <<< "$json")"

Done.

>
> Also, this currently assumes that a global '$json' will be available
> and up to date. In this test the way it is called, this will always be
> true, but it would be cleaner to actually pass $json to
> find_media_errors() each time, and in here, do something like
>
> 	local json="$1"
>

Done.

> > +	if [[ $nr -ne $NR_ERRS ]]; then
>
> If using the bash variant, [[ ]], this should be
>
> 	if [[ $nr != $NR_ERRS ]]; then
>

Done.

> > +		echo "$mem: $NR_ERRS poison records expected, $nr found"
> > +		err "$LINENO"
> > +	fi
> > +}
> > +

snip

> > +find_media_errors
>
> For all of the above debugfs writes -
>
> mem1 is hard-coded - is this supposed to be "$mem1" from when
> setup_x2_region() was done (similar to how the region stuff is done
> below)?

It was intentionally hardcoded based on what I expect in the cxl-test
topology. Changed it in v3 to look up a memdev.

>
> > +
> > +# Poison by region: inject, list, clear, list.
> > +setup_x2_region
> > +create_region
> > +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem0"/inject_poison
> > +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem1"/inject_poison
> > +NR_ERRS=2
> > +json=$("$CXL" list -r "$region" --poison | jq -r '.[].poison')
> > +find_media_errors
> > +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem0"/clear_poison
> > +echo 0x40000000 > /sys/kernel/debug/cxl/"$mem1"/clear_poison
>
> It might be nice to create a couple of helpers -
>
> 	inject_poison_sysfs() {
> 		memdev="$1"
> 		addr="$2"
> 		...
> 	}
>
> And similarly
>
> 	clear_poison_sysfs()...
>

Done.

Thanks for the review Vishal, especially the bash & jq wisdom!
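For readers following the find_media_errors() changes agreed above, here is one way the three suggestions (here-string instead of echo and pipe, JSON passed as an argument, string comparison inside [[ ]]) might combine. It is a sketch, not the v3 code: the second argument replacing the global NR_ERRS is an assumption, and returning 1 stands in for the err "$LINENO" call that test/common provides in the real test. jq is assumed to be installed.

```shell
#!/bin/bash
# find_media_errors JSON EXPECTED
# JSON is the "poison" object extracted from 'cxl list --poison' output;
# EXPECTED is the record count the caller expects (hypothetical parameter,
# the posted test uses the global NR_ERRS instead).
find_media_errors()
{
	local json="$1"
	local expected="$2"
	local nr

	# The here-string feeds the JSON to jq without an extra echo process.
	nr="$(jq -r ".nr_poison_records" <<< "$json")"

	# Quoting the right-hand side makes != a literal string comparison.
	if [[ $nr != "$expected" ]]; then
		echo "$expected poison records expected, $nr found" >&2
		return 1	# stand-in for: err "$LINENO"
	fi
}
```

A call site would then read find_media_errors "$json" 4, so the function no longer depends on variables set elsewhere in the script.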