* [PATCH v2 0/4] cxl/events: Update to rev 3.2, improvements and add trace memory sparing event record
@ 2025-07-17 10:18 shiju.jose
From: shiju.jose @ 2025-07-17 10:18 UTC
To: linux-cxl, dan.j.williams, dave.jiang, jonathan.cameron,
alison.schofield, dave, vishal.l.verma, ira.weiny
Cc: tanxiaofei, prime.zeng, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add the following changes to the CXL trace events:
1. Update the Common Event Record to CXL spec rev 3.2
https://lore.kernel.org/all/20250522090002.831-1-shiju.jose@huawei.com/
2. Add extra validity checks for the corrected memory error count
in the General Media Event Record
3. Add extra validity checks for the CVME count in the DRAM Event Record
4. Add support for tracing the Memory Sparing Event Record
Changes
v1 -> v2
https://lore.kernel.org/all/20250716104945.2002-1-shiju.jose@huawei.com/
1. Addressed the comment "Spacing before the } is inconsistent" in the patch
Add support for Trace Memory Sparing Event Record (Jonathan).
2. Dropped the note and question from patches 2 and 3 (Jonathan and
Dave Jiang).
3. Fixed the issues reported in
https://lore.kernel.org/all/202507171153.p2RrAdN4-lkp@intel.com/ and
https://lore.kernel.org/all/202507171217.6p5GHqr0-lkp@intel.com/
(forgot to use get_unaligned_le24()).
Shiju Jose (4):
cxl/events: Update Common Event Record to CXL spec rev 3.2
cxl/events: Add extra validity checks for corrected memory error count
in General Media Event Record
cxl/events: Add extra validity checks for CVME count in DRAM Event
Record
cxl/events: Trace Memory Sparing Event Record
drivers/cxl/core/mbox.c | 24 +++++++
drivers/cxl/core/trace.h | 133 +++++++++++++++++++++++++++++++++++++--
drivers/cxl/cxlmem.h | 8 +++
include/cxl/event.h | 37 ++++++++++-
4 files changed, 195 insertions(+), 7 deletions(-)
--
2.43.0
* [PATCH v2 1/4] cxl/events: Update Common Event Record to CXL spec rev 3.2
@ 2025-07-17 10:18 ` shiju.jose
From: shiju.jose @ 2025-07-17 10:18 UTC
To: linux-cxl, dan.j.williams, dave.jiang, jonathan.cameron,
alison.schofield, dave, vishal.l.verma, ira.weiny
Cc: tanxiaofei, prime.zeng, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec rev 3.2, section 8.2.10.2.1, Table 8-55 (Common Event Record
Format) defines two new fields: LD-ID and Head ID.
LD-ID: ID of the logical device from which the event originated; valid
only if the LD-ID Valid flag is set to 1.
CXL spec rev 3.2, section 2.4, describes how a Type 3 Multi-Logical Device
(MLD) can partition its resources into up to 16 isolated Logical Devices.
Each Logical Device is identified by a Logical Device Identifier (LD-ID)
in CXL.mem and CXL.io protocols. LD-ID is a 16-bit Logical Device
identifier applicable for CXL.io and CXL.mem requests and responses.
CXL.mem supports only the lower 4 bits of LD-ID and therefore can support
up to 16 unique LD-ID values over the link. Requests and responses
forwarded over an MLD Port are tagged with LD-ID.
Head ID: ID of the device head from which the event originated; valid
only if the Head ID Valid flag is set to 1.
Update the CXL event record definition and the common CXL trace event
implementation for these spec changes.
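The header change can be sanity-checked outside the kernel. A minimal userspace sketch, assuming the leading fields (length, 3-byte flags, handles, timestamp) follow the existing struct cxl_event_record_hdr layout, which the hunk below only partly shows; struct evt_hdr, the HDR_FLAG_* macros and the evt_hdr_*() helpers are hypothetical names. It confirms that LD-ID and Head ID only consume former reserved space, and shows a consumer honouring the valid flags (the trace output itself prints ld_id and head_id unconditionally).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header flag bits added by this patch; Flags is a 3-byte field. */
#define HDR_FLAG_LD_ID_VALID   (1u << 7)
#define HDR_FLAG_HEAD_ID_VALID (1u << 8)

/*
 * Userspace mirror of the rev 3.2 header (the 16-byte UUID that precedes it
 * is excluded). Multi-byte fields are little-endian byte arrays, so there is
 * no padding and decoding does not depend on host endianness.
 */
struct evt_hdr {
	uint8_t length;
	uint8_t flags[3];
	uint8_t handle[2];
	uint8_t related_handle[2];
	uint8_t timestamp[8];
	uint8_t maint_op_class;
	uint8_t maint_op_sub_class;
	uint8_t ld_id[2];	/* new in rev 3.2 */
	uint8_t head_id;	/* new in rev 3.2 */
	uint8_t reserved[11];	/* 14 bytes before rev 3.2 */
};

/* The new fields fit in the old reserved space: the header stays 32 bytes. */
static_assert(sizeof(struct evt_hdr) == 32, "header size changed");
static_assert(offsetof(struct evt_hdr, ld_id) == 18, "unexpected LD-ID offset");
static_assert(offsetof(struct evt_hdr, head_id) == 20, "unexpected Head ID offset");

static uint32_t le24(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
}

/* Return the LD-ID, or -1 if the LD-ID Valid flag is clear. */
static int evt_hdr_ld_id(const struct evt_hdr *h)
{
	if (!(le24(h->flags) & HDR_FLAG_LD_ID_VALID))
		return -1;
	return h->ld_id[0] | h->ld_id[1] << 8;
}

/* Return the Head ID, or -1 if the Head ID Valid flag is clear. */
static int evt_hdr_head_id(const struct evt_hdr *h)
{
	if (!(le24(h->flags) & HDR_FLAG_HEAD_ID_VALID))
		return -1;
	return h->head_id;
}
```

The decoder returns the full 16-bit LD-ID; as noted above, only its lower 4 bits are carried on CXL.mem.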
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/trace.h | 18 ++++++++++++++----
include/cxl/event.h | 4 +++-
2 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 25ebfbc1616c..a77487a257b3 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -214,12 +214,16 @@ TRACE_EVENT(cxl_overflow,
#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
#define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
#define CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID BIT(6)
+#define CXL_EVENT_RECORD_FLAG_LD_ID_VALID BIT(7)
+#define CXL_EVENT_RECORD_FLAG_HEAD_ID_VALID BIT(8)
#define show_hdr_flags(flags) __print_flags(flags, " | ", \
{ CXL_EVENT_RECORD_FLAG_PERMANENT, "PERMANENT_CONDITION" }, \
{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED, "MAINTENANCE_NEEDED" }, \
{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "PERFORMANCE_DEGRADED" }, \
{ CXL_EVENT_RECORD_FLAG_HW_REPLACE, "HARDWARE_REPLACEMENT_NEEDED" }, \
- { CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID, "MAINT_OP_SUB_CLASS_VALID" } \
+ { CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID, "MAINT_OP_SUB_CLASS_VALID" }, \
+ { CXL_EVENT_RECORD_FLAG_LD_ID_VALID, "LD_ID_VALID" }, \
+ { CXL_EVENT_RECORD_FLAG_HEAD_ID_VALID, "HEAD_ID_VALID" } \
)
/*
@@ -247,7 +251,9 @@ TRACE_EVENT(cxl_overflow,
__field(u64, hdr_timestamp) \
__field(u8, hdr_length) \
__field(u8, hdr_maint_op_class) \
- __field(u8, hdr_maint_op_sub_class)
+ __field(u8, hdr_maint_op_sub_class) \
+ __field(u16, hdr_ld_id) \
+ __field(u8, hdr_head_id)
#define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \
__assign_str(memdev); \
@@ -260,18 +266,22 @@ TRACE_EVENT(cxl_overflow,
__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
__entry->hdr_maint_op_class = (hdr).maint_op_class; \
- __entry->hdr_maint_op_sub_class = (hdr).maint_op_sub_class
+ __entry->hdr_maint_op_sub_class = (hdr).maint_op_sub_class; \
+ __entry->hdr_ld_id = le16_to_cpu((hdr).ld_id); \
+ __entry->hdr_head_id = (hdr).head_id
#define CXL_EVT_TP_printk(fmt, ...) \
TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb " \
"len=%d flags='%s' handle=%x related_handle=%x " \
- "maint_op_class=%u maint_op_sub_class=%u : " fmt, \
+ "maint_op_class=%u maint_op_sub_class=%u " \
+ "ld_id=%x head_id=%x : " fmt, \
__get_str(memdev), __get_str(host), __entry->serial, \
cxl_event_log_type_str(__entry->log), \
__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
__entry->hdr_related_handle, __entry->hdr_maint_op_class, \
__entry->hdr_maint_op_sub_class, \
+ __entry->hdr_ld_id, __entry->hdr_head_id, \
##__VA_ARGS__)
TRACE_EVENT(cxl_generic_event,
diff --git a/include/cxl/event.h b/include/cxl/event.h
index f9ae1796da85..f4cb8568566b 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -19,7 +19,9 @@ struct cxl_event_record_hdr {
__le64 timestamp;
u8 maint_op_class;
u8 maint_op_sub_class;
- u8 reserved[14];
+ __le16 ld_id;
+ u8 head_id;
+ u8 reserved[11];
} __packed;
struct cxl_event_media_hdr {
--
2.43.0
* [PATCH v2 2/4] cxl/events: Add extra validity checks for corrected memory error count in General Media Event Record
@ 2025-07-17 10:18 ` shiju.jose
From: shiju.jose @ 2025-07-17 10:18 UTC
To: linux-cxl, dan.j.williams, dave.jiang, jonathan.cameron,
alison.schofield, dave, vishal.l.verma, ira.weiny
Cc: tanxiaofei, prime.zeng, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
According to the CXL Specification Revision 3.2, Section 8.2.10.2.1.1,
Table 8-57 (General Media Event Record), the Corrected Memory Error Count
field is valid under the following conditions:
1. The Threshold Event bit is set in the Memory Event Descriptor field,
and
2. The Corrected Memory Error Count must be greater than 0 for events
where the Advanced Programmable Threshold Counter has expired.
Additionally, if the Advanced Programmable Corrected Memory Error Counter
Expire bit in the Memory Event Type field is set, then the Threshold Event
bit in the Memory Event Descriptor field shall also be set.
Add validity checks for the above conditions when reporting the event to
userspace.
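The rules above reduce to a small predicate. A minimal sketch, assuming the Threshold Event descriptor bit is bit 1 and that Memory Event Type is an enumerated byte in which 05h denotes Advanced Programmable CME Counter Expiration; both values are assumptions, and DESC_THRESHOLD_EVENT, MEM_EVT_TYPE_AP_CME_EXPIRE and the function names are hypothetical. Because it treats the type as an enumeration, the sketch compares it with ==, whereas the patch tests it with a bitwise AND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings; the names are hypothetical. */
#define DESC_THRESHOLD_EVENT       (1u << 1)	/* Memory Event Descriptor bit */
#define MEM_EVT_TYPE_AP_CME_EXPIRE 0x05		/* Memory Event Type value */

/* Return true if descriptor, type and count obey the rules listed above. */
static bool cme_fields_consistent(uint8_t descriptor, uint8_t type,
				  uint32_t cme_count)
{
	bool threshold = descriptor & DESC_THRESHOLD_EVENT;
	bool ap_expired = type == MEM_EVT_TYPE_AP_CME_EXPIRE;

	/* An expired AP counter must also set the Threshold Event bit. */
	if (ap_expired && !threshold)
		return false;
	/* An expired AP counter must report a non-zero count. */
	if (ap_expired && cme_count == 0)
		return false;
	return true;
}

/* The count is only meaningful for threshold events; report 0 otherwise. */
static uint32_t cme_count_to_report(uint8_t descriptor, uint32_t raw_count)
{
	return (descriptor & DESC_THRESHOLD_EVENT) ? raw_count : 0;
}
```

In the patch itself a violation only triggers WARN_ON_ONCE(); the event is still traced, with the count forced to 0 when the Threshold Event bit is clear.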
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/mbox.c | 9 +++++++++
drivers/cxl/core/trace.h | 5 ++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 2689e6453c5a..ba4a29afd3aa 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -926,6 +926,15 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
if (cxl_store_rec_gen_media((struct cxl_memdev *)cxlmd, evt))
dev_dbg(&cxlmd->dev, "CXL store rec_gen_media failed\n");
+ if (evt->gen_media.media_hdr.descriptor &
+ CXL_GMER_EVT_DESC_THRESHOLD_EVENT)
+ WARN_ON_ONCE((evt->gen_media.media_hdr.type &
+ CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE) &&
+ !get_unaligned_le24(evt->gen_media.cme_count));
+ else
+ WARN_ON_ONCE(evt->gen_media.media_hdr.type &
+ CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE);
+
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
hpa_alias, &evt->gen_media);
} else if (event_type == CXL_CPER_EVENT_DRAM) {
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a77487a257b3..c38f94ca0ca1 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -506,7 +506,10 @@ TRACE_EVENT(cxl_general_media,
uuid_copy(&__entry->region_uuid, &uuid_null);
}
__entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags;
- __entry->cme_count = get_unaligned_le24(rec->cme_count);
+ if (rec->media_hdr.descriptor & CXL_GMER_EVT_DESC_THRESHOLD_EVENT)
+ __entry->cme_count = get_unaligned_le24(rec->cme_count);
+ else
+ __entry->cme_count = 0;
),
CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
--
2.43.0
* [PATCH v2 3/4] cxl/events: Add extra validity checks for CVME count in DRAM Event Record
@ 2025-07-17 10:18 ` shiju.jose
From: shiju.jose @ 2025-07-17 10:18 UTC
To: linux-cxl, dan.j.williams, dave.jiang, jonathan.cameron,
alison.schofield, dave, vishal.l.verma, ira.weiny
Cc: tanxiaofei, prime.zeng, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
According to the CXL Specification Revision 3.2, Section 8.2.10.2.1.2,
Table 8-58 (DRAM Event Record), the CVME (Corrected Volatile Memory Error)
Count field is valid under the following conditions:
1. The Threshold Event bit is set in the Memory Event Descriptor field,
and
2. The CVME Count must be greater than 0 for events where the Advanced
Programmable Threshold Counter has expired.
Additionally, if the Advanced Programmable Corrected Memory Error Counter
Expire bit in the Memory Event Type field is set, then the Threshold Event
bit in the Memory Event Descriptor field shall also be set.
Add validity checks for the above conditions when reporting the event to
userspace.
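The CVME Count is a 3-byte little-endian field, which is why the trace code reads it with get_unaligned_le24() (the v2 change log mentions a missing call to that helper). A userspace sketch of the decoding plus the trace-side gating, in which le24_to_u32() stands in for the kernel helper and dram_cvme_count() is hypothetical; the descriptor bit position is again an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_THRESHOLD_EVENT (1u << 1)	/* assumed descriptor bit; hypothetical name */

/*
 * Stand-in for the kernel's get_unaligned_le24(): build the value byte by
 * byte, so it works at any alignment and on any host endianness.
 */
static uint32_t le24_to_u32(const uint8_t p[3])
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
}

/* Mirror of the trace-side rule: CVME Count is reported only for threshold events. */
static uint32_t dram_cvme_count(uint8_t descriptor, const uint8_t cvme_count[3])
{
	if (!(descriptor & DESC_THRESHOLD_EVENT))
		return 0;
	return le24_to_u32(cvme_count);
}
```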
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/mbox.c | 9 +++++++++
drivers/cxl/core/trace.h | 5 ++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index ba4a29afd3aa..445889b128cd 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -941,6 +941,15 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
if (cxl_store_rec_dram((struct cxl_memdev *)cxlmd, evt))
dev_dbg(&cxlmd->dev, "CXL store rec_dram failed\n");
+ if (evt->dram.media_hdr.descriptor &
+ CXL_GMER_EVT_DESC_THRESHOLD_EVENT)
+ WARN_ON_ONCE((evt->dram.media_hdr.type &
+ CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE) &&
+ !get_unaligned_le24(evt->dram.cvme_count));
+ else
+ WARN_ON_ONCE(evt->dram.media_hdr.type &
+ CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE);
+
trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias,
&evt->dram);
}
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index c38f94ca0ca1..462c2e892ba2 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -661,7 +661,10 @@ TRACE_EVENT(cxl_dram,
CXL_EVENT_GEN_MED_COMP_ID_SIZE);
__entry->sub_channel = rec->sub_channel;
__entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags;
- __entry->cvme_count = get_unaligned_le24(rec->cvme_count);
+ if (rec->media_hdr.descriptor & CXL_GMER_EVT_DESC_THRESHOLD_EVENT)
+ __entry->cvme_count = get_unaligned_le24(rec->cvme_count);
+ else
+ __entry->cvme_count = 0;
),
CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' sub_type='%s' " \
--
2.43.0
* [PATCH v2 4/4] cxl/events: Trace Memory Sparing Event Record
@ 2025-07-17 10:18 ` shiju.jose
From: shiju.jose @ 2025-07-17 10:18 UTC
To: linux-cxl, dan.j.williams, dave.jiang, jonathan.cameron,
alison.schofield, dave, vishal.l.verma, ira.weiny
Cc: tanxiaofei, prime.zeng, linuxarm, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL rev 3.2 section 8.2.10.2.1.4 Table 8-60 defines the Memory Sparing
Event Record.
Determine whether the event read is a Memory Sparing Event Record and, if
so, trace it.
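That classification step amounts to a UUID comparison. A minimal sketch, using the Memory Sparing UUID this patch adds as CXL_EVENT_MEM_SPARING_UUID (e71f3a40-2d29-4092-8a39-4d1c966c7c65), laid out in the big-endian byte order that %pUb prints; classify() and enum evt_kind are hypothetical, and the fallback for unknown UUIDs mirrors the trace_cxl_generic_event() path in mbox.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the event kinds the driver distinguishes. */
enum evt_kind { EVT_GENERIC, EVT_MEM_SPARING };

/* e71f3a40-2d29-4092-8a39-4d1c966c7c65 as 16 big-endian bytes. */
static const uint8_t mem_sparing_uuid[16] = {
	0xe7, 0x1f, 0x3a, 0x40, 0x2d, 0x29, 0x40, 0x92,
	0x8a, 0x39, 0x4d, 0x1c, 0x96, 0x6c, 0x7c, 0x65,
};

/* Map a record UUID to an event kind; unknown UUIDs are traced as generic. */
static enum evt_kind classify(const uint8_t uuid[16])
{
	if (memcmp(uuid, mem_sparing_uuid, sizeof(mem_sparing_uuid)) == 0)
		return EVT_MEM_SPARING;
	return EVT_GENERIC;
}
```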
A memory device shall produce a Memory Sparing Event Record:
1. After completion of a PPR maintenance operation, if the Memory Sparing
Event Record Enable bit is set (field sPPR/hPPR Operation Mode in
Table 8-128/Table 8-131).
2. In response to a query request from the host (see section 8.2.10.7.1.4)
to determine the availability of sparing resources.
In the second case, the device shall report resource availability by
producing a Memory Sparing Event Record (see Table 8-60) in which the
channel, rank, nibble mask, bank group, bank, row, column and sub-channel
fields are copies of the values specified in the request. If the
controller does not support reporting resource availability and a Perform
Maintenance operation for memory sparing is issued with Query Resources set
to 1, the controller shall return Invalid Input.
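The record layout can be checked against the 128-byte event record size. A userspace sketch mirroring the fields that follow the common header in struct cxl_event_mem_sparing (added later in this patch), assuming a 32-byte common header and a 16-byte component ID (the example trace log in this message prints 16 comp_id bytes); struct mser_body, COMP_ID_SIZE and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMP_ID_SIZE 16	/* assumed value of CXL_EVENT_GEN_MED_COMP_ID_SIZE */

/*
 * Mirror of the Memory Sparing fields after the common header.
 * Byte arrays only, so the struct has no padding.
 */
struct mser_body {
	uint8_t rsv1;			/* duplicated Maint Op Class (reserved) */
	uint8_t rsv2;			/* duplicated Maint Op Subclass (reserved) */
	uint8_t flags;
	uint8_t result;
	uint8_t validity_flags[2];
	uint8_t reserved1[6];
	uint8_t res_avail[2];
	uint8_t channel;
	uint8_t rank;
	uint8_t nibble_mask[3];
	uint8_t bank_group;
	uint8_t bank;
	uint8_t row[3];
	uint8_t column[2];
	uint8_t component_id[COMP_ID_SIZE];
	uint8_t sub_channel;
	uint8_t reserved2[0x25];
};

/* 16-byte UUID + 32-byte common header + 80-byte body = 128-byte record. */
static_assert(sizeof(struct mser_body) == 80, "body must be 80 bytes");
static_assert(offsetof(struct mser_body, component_id) == 26,
	      "unexpected component ID offset");

/* Little-endian decoders for the 3-byte and 2-byte fields. */
static uint32_t le24(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
}

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}
```

Filling nibble_mask, row and column with the raw bytes of the example log's record decodes to nibble_mask=a59c, row=13 and column=23.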
Example trace log for a Memory Sparing Event Record produced on completion
of a soft PPR operation:
cxl_memory_sparing: memdev=mem1 host=0000:0f:00.0 serial=3
log=Informational : time=55045163029
uuid=e71f3a40-2d29-4092-8a39-4d1c966c7c65 len=128 flags='0x1' handle=1
related_handle=0 maint_op_class=2 maint_op_sub_class=1
ld_id=0 head_id=0 : flags='' result=0
validity_flags='CHANNEL|RANK|NIBBLE|BANK GROUP|BANK|ROW|COLUMN'
spare resource avail=1 channel=2 rank=5 nibble_mask=a59c bank_group=2
bank=4 row=13 column=23 sub_channel=0
comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
comp_id_pldm_valid_flags='' pldm_entity_id=0x00 pldm_resource_id=0x00
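The validity_flags string in this log is produced by __print_flags() over the show_mem_sparing_valid_flags() table. A rough userspace equivalent for decoding raw validity_flags values by hand is sketched here; mser_valid_str() is hypothetical and, unlike __print_flags(), it silently ignores bits that have no name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names copied from show_mem_sparing_valid_flags() in this patch. */
static const struct {
	uint16_t bit;
	const char *name;
} mser_valid[] = {
	{ 1u << 0, "CHANNEL" },
	{ 1u << 1, "RANK" },
	{ 1u << 2, "NIBBLE" },
	{ 1u << 3, "BANK GROUP" },
	{ 1u << 4, "BANK" },
	{ 1u << 5, "ROW" },
	{ 1u << 6, "COLUMN" },
	{ 1u << 7, "COMPONENT ID" },
	{ 1u << 8, "COMPONENT ID PLDM FORMAT" },
	{ 1u << 9, "SUB CHANNEL" },
};

/*
 * Write the names of the set bits, joined by '|', into buf (len >= 1).
 * Output is truncated rather than overflowing if buf is too small.
 */
static const char *mser_valid_str(uint16_t flags, char *buf, size_t len)
{
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(mser_valid) / sizeof(mser_valid[0]); i++) {
		if (!(flags & mser_valid[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, mser_valid[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

Decoding 0x7f (bits 0 to 6) yields the same CHANNEL|RANK|...|COLUMN string shown in the log.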
Note: In the Memory Sparing Event Record, the 'Maintenance Operation
Class' and 'Maintenance Operation Subclass' fields are defined twice:
first in the Common Event Record (Table 8-55) and again in the Memory
Sparing Event Record (Table 8-60). The copies in the sparing event record
are therefore coded as reserved, to be removed once the spec is updated.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/mbox.c | 6 +++
drivers/cxl/core/trace.h | 105 +++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 8 +++
include/cxl/event.h | 33 ++++++++++++
4 files changed, 152 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 445889b128cd..f7e081c00c49 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -899,6 +899,10 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
trace_cxl_generic_event(cxlmd, type, uuid, &evt->generic);
return;
}
+ if (event_type == CXL_CPER_EVENT_MEM_SPARING) {
+ trace_cxl_memory_sparing(cxlmd, type, &evt->mem_sparing);
+ return;
+ }
if (trace_cxl_general_media_enabled() || trace_cxl_dram_enabled()) {
u64 dpa, hpa = ULLONG_MAX, hpa_alias = ULLONG_MAX;
@@ -970,6 +974,8 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
ev_type = CXL_CPER_EVENT_DRAM;
else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
ev_type = CXL_CPER_EVENT_MEM_MODULE;
+ else if (uuid_equal(uuid, &CXL_EVENT_MEM_SPARING_UUID))
+ ev_type = CXL_CPER_EVENT_MEM_SPARING;
cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
}
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 462c2e892ba2..a53ec4798b12 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -887,6 +887,111 @@ TRACE_EVENT(cxl_memory_module,
)
);
+/*
+ * Memory Sparing Event Record - MSER
+ *
+ * CXL rev 3.2 section 8.2.10.2.1.4; Table 8-60
+ */
+#define CXL_MSER_QUERY_RESOURCE_FLAG BIT(0)
+#define CXL_MSER_HARD_SPARING_FLAG BIT(1)
+#define CXL_MSER_DEV_INITED_FLAG BIT(2)
+#define show_mem_sparing_flags(flags) __print_flags(flags, "|", \
+ { CXL_MSER_QUERY_RESOURCE_FLAG, "Query Resources" }, \
+ { CXL_MSER_HARD_SPARING_FLAG, "Hard Sparing" }, \
+ { CXL_MSER_DEV_INITED_FLAG, "Device Initiated Sparing" } \
+)
+
+#define CXL_MSER_VALID_CHANNEL BIT(0)
+#define CXL_MSER_VALID_RANK BIT(1)
+#define CXL_MSER_VALID_NIBBLE BIT(2)
+#define CXL_MSER_VALID_BANK_GROUP BIT(3)
+#define CXL_MSER_VALID_BANK BIT(4)
+#define CXL_MSER_VALID_ROW BIT(5)
+#define CXL_MSER_VALID_COLUMN BIT(6)
+#define CXL_MSER_VALID_COMPONENT_ID BIT(7)
+#define CXL_MSER_VALID_COMPONENT_ID_FORMAT BIT(8)
+#define CXL_MSER_VALID_SUB_CHANNEL BIT(9)
+#define show_mem_sparing_valid_flags(flags) __print_flags(flags, "|", \
+ { CXL_MSER_VALID_CHANNEL, "CHANNEL" }, \
+ { CXL_MSER_VALID_RANK, "RANK" }, \
+ { CXL_MSER_VALID_NIBBLE, "NIBBLE" }, \
+ { CXL_MSER_VALID_BANK_GROUP, "BANK GROUP" }, \
+ { CXL_MSER_VALID_BANK, "BANK" }, \
+ { CXL_MSER_VALID_ROW, "ROW" }, \
+ { CXL_MSER_VALID_COLUMN, "COLUMN" }, \
+ { CXL_MSER_VALID_COMPONENT_ID, "COMPONENT ID" }, \
+ { CXL_MSER_VALID_COMPONENT_ID_FORMAT, "COMPONENT ID PLDM FORMAT" }, \
+ { CXL_MSER_VALID_SUB_CHANNEL, "SUB CHANNEL" } \
+)
+
+TRACE_EVENT(cxl_memory_sparing,
+
+ TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
+ struct cxl_event_mem_sparing *rec),
+
+ TP_ARGS(cxlmd, log, rec),
+
+ TP_STRUCT__entry(
+ CXL_EVT_TP_entry
+
+ /* Memory Sparing Event */
+ __field(u8, flags)
+ __field(u8, result)
+ __field(u16, validity_flags)
+ __field(u16, res_avail)
+ __field(u8, channel)
+ __field(u8, rank)
+ __field(u32, nibble_mask)
+ __field(u8, bank_group)
+ __field(u8, bank)
+ __field(u32, row)
+ __field(u16, column)
+ __field(u8, sub_channel)
+ __array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
+ ),
+
+ TP_fast_assign(
+ CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ __entry->hdr_uuid = CXL_EVENT_MEM_SPARING_UUID;
+
+ /* Memory Sparing Event */
+ __entry->flags = rec->flags;
+ __entry->result = rec->result;
+ __entry->validity_flags = le16_to_cpu(rec->validity_flags);
+ __entry->res_avail = le16_to_cpu(rec->res_avail);
+ __entry->channel = rec->channel;
+ __entry->rank = rec->rank;
+ __entry->nibble_mask = get_unaligned_le24(rec->nibble_mask);
+ __entry->bank_group = rec->bank_group;
+ __entry->bank = rec->bank;
+ __entry->row = get_unaligned_le24(rec->row);
+ __entry->column = le16_to_cpu(rec->column);
+ __entry->sub_channel = rec->sub_channel;
+ memcpy(__entry->comp_id, &rec->component_id,
+ CXL_EVENT_GEN_MED_COMP_ID_SIZE);
+ ),
+
+ CXL_EVT_TP_printk("flags='%s' result=%u validity_flags='%s' " \
+ "spare resource avail=%u channel=%u rank=%u " \
+ "nibble_mask=%x bank_group=%u bank=%u " \
+ "row=%u column=%u sub_channel=%u " \
+ "comp_id=%s comp_id_pldm_valid_flags='%s' " \
+ "pldm_entity_id=%s pldm_resource_id=%s",
+ show_mem_sparing_flags(__entry->flags),
+ __entry->result,
+ show_mem_sparing_valid_flags(__entry->validity_flags),
+ __entry->res_avail, __entry->channel, __entry->rank,
+ __entry->nibble_mask, __entry->bank_group, __entry->bank,
+ __entry->row, __entry->column, __entry->sub_channel,
+ __print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
+ show_comp_id_pldm_flags(__entry->comp_id[0]),
+ show_pldm_entity_id(__entry->validity_flags, CXL_MSER_VALID_COMPONENT_ID,
+ CXL_MSER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
+ show_pldm_resource_id(__entry->validity_flags, CXL_MSER_VALID_COMPONENT_ID,
+ CXL_MSER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id)
+ )
+);
+
#define show_poison_trace_type(type) \
__print_symbolic(type, \
{ CXL_POISON_TRACE_LIST, "List" }, \
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 551b0ba2caa1..f98311f357b7 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -633,6 +633,14 @@ struct cxl_mbox_identify {
UUID_INIT(0xfe927475, 0xdd59, 0x4339, 0xa5, 0x86, 0x79, 0xba, 0xb1, \
0x13, 0xb7, 0x74)
+/*
+ * Memory Sparing Event Record UUID
+ * CXL rev 3.2 section 8.2.10.2.1.4: Table 8-60
+ */
+#define CXL_EVENT_MEM_SPARING_UUID \
+ UUID_INIT(0xe71f3a40, 0x2d29, 0x4092, 0x8a, 0x39, 0x4d, 0x1c, 0x96, \
+ 0x6c, 0x7c, 0x65)
+
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
diff --git a/include/cxl/event.h b/include/cxl/event.h
index f4cb8568566b..6fd90f9cc203 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -110,11 +110,43 @@ struct cxl_event_mem_module {
u8 reserved[0x2a];
} __packed;
+/*
+ * Memory Sparing Event Record - MSER
+ * CXL rev 3.2 section 8.2.10.2.1.4; Table 8-60
+ */
+struct cxl_event_mem_sparing {
+ struct cxl_event_record_hdr hdr;
+ /*
+ * The fields maintenance operation class and maintenance operation
+ * subclass defined in the Memory Sparing Event Record are the
+ * duplication of the same in the common event record. Thus defined
+ * as reserved and to be removed after the spec correction.
+ */
+ u8 rsv1;
+ u8 rsv2;
+ u8 flags;
+ u8 result;
+ __le16 validity_flags;
+ u8 reserved1[6];
+ __le16 res_avail;
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ __le16 column;
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 sub_channel;
+ u8 reserved2[0x25];
+} __packed;
+
union cxl_event {
struct cxl_event_generic generic;
struct cxl_event_gen_media gen_media;
struct cxl_event_dram dram;
struct cxl_event_mem_module mem_module;
+ struct cxl_event_mem_sparing mem_sparing;
/* dram & gen_media event header */
struct cxl_event_media_hdr media_hdr;
} __packed;
@@ -133,6 +165,7 @@ enum cxl_event_type {
CXL_CPER_EVENT_GEN_MEDIA,
CXL_CPER_EVENT_DRAM,
CXL_CPER_EVENT_MEM_MODULE,
+ CXL_CPER_EVENT_MEM_SPARING,
};
#define CPER_CXL_DEVICE_ID_VALID BIT(0)
--
2.43.0
* Re: [PATCH v2 0/4] cxl/events: Update to rev 3.2, improvements and add trace memory sparing event record
@ 2025-07-18 23:27 ` Dave Jiang
From: Dave Jiang @ 2025-07-18 23:27 UTC
To: shiju.jose, linux-cxl, dan.j.williams, jonathan.cameron,
alison.schofield, dave, vishal.l.verma, ira.weiny
Cc: tanxiaofei, prime.zeng, linuxarm
On 7/17/25 3:18 AM, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add following changes to the CXL trace events,
> 1. Update Common Event Record to CXL spec rev 3.2
> https://lore.kernel.org/all/20250522090002.831-1-shiju.jose@huawei.com/
> 2. Add extra validity checks for corrected memory error count
> in General Media Event Record
> 3. Add extra validity checks for CVME count in DRAM Event Record
> 4. Add support for Trace Memory Sparing Event Record.
Series applied to cxl/next
3a32c5b3bb7d2dfad5fab94817f59e8963e2b1a6