* [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1
@ 2024-11-20 9:37 shiju.jose
2024-11-20 9:37 ` [PATCH v4 1/6] cxl/events: Update Common Event Record " shiju.jose
` (5 more replies)
0 siblings, 6 replies; 19+ messages in thread
From: shiju.jose @ 2024-11-20 9:37 UTC (permalink / raw)
To: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl
Cc: linux-kernel, linuxarm, tanxiaofei, prime.zeng, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add updates for the changes in CXL spec rev 3.1 in the CXL events
records and CXL trace events implementation.
Notes:
=====
1. The number of decoded strings in TP_printk() caused parsing error when
libtraceevent in userspace parses the CXL trace events, for rasdaemon.
It was found that long decoded strings of field values in the TP_printk()
caused the issue. As a solution, decoding of some fields in the TP_printk()
are removed to accommodate the new fields.
Decoding of all these fields is added in the userspace tool rasdaemon.
2. Verified with modified CXL error injection in QEMU CXL emulation and
CXL mock test in kernel.(Thanks to Jonathan for verify with CXL mock test).
Shiju Jose (6):
cxl/events: Update Common Event Record to CXL spec rev 3.1
cxl/events: Add Component Identifier formatting for CXL spec rev 3.1
cxl/events: Update General Media Event Record to CXL spec rev 3.1
cxl/events: Update DRAM Event Record to CXL spec rev 3.1
cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
cxl/test: Update test code for event records to CXL spec rev 3.1
Changes:
V3 -> V4
1. Changes for the parsing error parsing error when libtraceevent in
userspace parses the CXL trace events, for rasdaemon.
It was found that long decoded strings of field values in the TP_printk()
caused the issue, looks like due to buffer overflow/corruption.
Increasing known buffer sizes in userspace and kernel did not help.
As a solution, decoding of some fields in the TP_printk() are removed
to accommodate the new fields.
Decoding of these fields is added in the userspace tool rasdaemon.
V2 -> V3
1. Changes for the feedbacks from Jonathan.
- Added printing component Id format bit in show_valid_flags()
- Modified parsing component ID in patch [2] and added logging
of raw comp-id, comp_id_pldm_flags, PLDM entity id and
PLDM resource id in patches 3 to 4.
V1 -> V2
1. Changes for the feedbacks from Jonathan.
- Separate patch for Component Identifier formatting.
- Moved printing of event sub type after event type.
- For memory module event, rename sub_type to event_sub_type.
2. Changes for the feedbacks from Alison.
- Updated patch's subject
- Updated CXL test code for CXL spec rev 3.1 event records.
3. Changed logic for Component Identifier formatting and other improvements.
drivers/cxl/core/trace.h | 233 ++++++++++++++++++++++++++---------
include/cxl/event.h | 28 +++--
tools/testing/cxl/test/mem.c | 23 +++-
3 files changed, 217 insertions(+), 67 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v4 1/6] cxl/events: Update Common Event Record to CXL spec rev 3.1
2024-11-20 9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
@ 2024-11-20 9:37 ` shiju.jose
2024-11-26 17:27 ` Fan Ni
2024-11-20 9:37 ` [PATCH v4 2/6] cxl/events: Add Component Identifier formatting for " shiju.jose
` (4 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: shiju.jose @ 2024-11-20 9:37 UTC (permalink / raw)
To: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl
Cc: linux-kernel, linuxarm, tanxiaofei, prime.zeng, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.1 section 8.2.9.2.1 Table 8-42, Common Event Record format has
updated with Maintenance Operation Subclass information.
Add updates for the above spec change in the CXL events record and CXL
common trace event implementations.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/trace.h | 13 +++++++++----
include/cxl/event.h | 3 ++-
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 8389a94adb1a..7305974e2301 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -166,11 +166,13 @@ TRACE_EVENT(cxl_overflow,
#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED BIT(3)
#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
#define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
+#define CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID BIT(6)
#define show_hdr_flags(flags) __print_flags(flags, " | ", \
{ CXL_EVENT_RECORD_FLAG_PERMANENT, "PERMANENT_CONDITION" }, \
{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED, "MAINTENANCE_NEEDED" }, \
{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "PERFORMANCE_DEGRADED" }, \
- { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "HARDWARE_REPLACEMENT_NEEDED" } \
+ { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "HARDWARE_REPLACEMENT_NEEDED" }, \
+ { CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID, "MAINT_OP_SUB_CLASS_VALID" } \
)
/*
@@ -197,7 +199,8 @@ TRACE_EVENT(cxl_overflow,
__field(u16, hdr_related_handle) \
__field(u64, hdr_timestamp) \
__field(u8, hdr_length) \
- __field(u8, hdr_maint_op_class)
+ __field(u8, hdr_maint_op_class) \
+ __field(u8, hdr_maint_op_sub_class)
#define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \
__assign_str(memdev); \
@@ -209,17 +212,19 @@ TRACE_EVENT(cxl_overflow,
__entry->hdr_handle = le16_to_cpu((hdr).handle); \
__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
- __entry->hdr_maint_op_class = (hdr).maint_op_class
+ __entry->hdr_maint_op_class = (hdr).maint_op_class; \
+ __entry->hdr_maint_op_sub_class = (hdr).maint_op_sub_class
#define CXL_EVT_TP_printk(fmt, ...) \
TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb " \
"len=%d flags='%s' handle=%x related_handle=%x " \
- "maint_op_class=%u : " fmt, \
+ "maint_op_class=%u maint_op_sub_class=%u : " fmt, \
__get_str(memdev), __get_str(host), __entry->serial, \
cxl_event_log_type_str(__entry->log), \
__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
__entry->hdr_related_handle, __entry->hdr_maint_op_class, \
+ __entry->hdr_maint_op_sub_class, \
##__VA_ARGS__)
TRACE_EVENT(cxl_generic_event,
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 0bea1afbd747..e1d485ad376b 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -18,7 +18,8 @@ struct cxl_event_record_hdr {
__le16 related_handle;
__le64 timestamp;
u8 maint_op_class;
- u8 reserved[15];
+ u8 maint_op_sub_class;
+ u8 reserved[14];
} __packed;
struct cxl_event_media_hdr {
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v4 2/6] cxl/events: Add Component Identifier formatting for CXL spec rev 3.1
2024-11-20 9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
2024-11-20 9:37 ` [PATCH v4 1/6] cxl/events: Update Common Event Record " shiju.jose
@ 2024-11-20 9:37 ` shiju.jose
2024-11-20 9:37 ` [PATCH v4 3/6] cxl/events: Update General Media Event Record to " shiju.jose
` (3 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: shiju.jose @ 2024-11-20 9:37 UTC (permalink / raw)
To: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl
Cc: linux-kernel, linuxarm, tanxiaofei, prime.zeng, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add Component Identifier formatting for CXL spec rev 3.1, Section
8.2.9.2.1, Table 8-44.
Examples for Component Identifier format in trace log,
validity_flags='CHANNEL|RANK|DEVICE|COMPONENT|COMPONENT PLDM FORMAT' \
comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
comp_id_pldm_valid_flags='PLDM Entity ID | Resource ID' \
pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \
validity_flags='COMPONENT|COMPONENT PLDM FORMAT' \
comp_id=02 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
comp_id_pldm_valid_flags='Resource ID' \
pldm_entity_id=0x00 pldm_resource_id=fc d2 7e 2f
If the validity flags for component ID/component ID format or PLDM ID or
resource ID are not set, then pldm_entity_id=0x00 or pldm_resource_id=0x00
would be printed.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/trace.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 7305974e2301..8e9d80e34a28 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -268,6 +268,23 @@ TRACE_EVENT(cxl_generic_event,
{ CXL_DPA_NOT_REPAIRABLE, "NOT_REPAIRABLE" } \
)
+/*
+ * Component ID Format
+ * CXL 3.1 section 8.2.9.2.1; Table 8-44
+ */
+#define CXL_PLDM_COMPONENT_ID_ENTITY_VALID BIT(0)
+#define CXL_PLDM_COMPONENT_ID_RES_VALID BIT(1)
+
+#define show_pldm_entity_id(flags, valid_comp_id, valid_id_format, comp_id) \
+ (flags & valid_comp_id && flags & valid_id_format) ? \
+ (comp_id[0] & CXL_PLDM_COMPONENT_ID_ENTITY_VALID) ? \
+ __print_hex(&comp_id[1], 6) : "0x00" : "0x00"
+
+#define show_pldm_resource_id(flags, valid_comp_id, valid_id_format, comp_id) \
+ (flags & valid_comp_id && flags & valid_id_format) ? \
+ (comp_id[0] & CXL_PLDM_COMPONENT_ID_RES_VALID) ? \
+ __print_hex(&comp_id[7], 4) : "0x00" : "0x00"
+
/*
* General Media Event Record - GMER
* CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-20 9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
2024-11-20 9:37 ` [PATCH v4 1/6] cxl/events: Update Common Event Record " shiju.jose
2024-11-20 9:37 ` [PATCH v4 2/6] cxl/events: Add Component Identifier formatting for " shiju.jose
@ 2024-11-20 9:37 ` shiju.jose
2024-11-26 11:51 ` Shiju Jose
2024-11-20 9:37 ` [PATCH v4 4/6] cxl/events: Update DRAM " shiju.jose
` (2 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: shiju.jose @ 2024-11-20 9:37 UTC (permalink / raw)
To: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl
Cc: linux-kernel, linuxarm, tanxiaofei, prime.zeng, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event
Record has updated with following new fields and new types for Memory
Event Type and Transaction Type fields.
1. Advanced Programmable Corrected Memory Error Threshold Event Flags
2. Corrected Memory Error Count at Event
3. Memory Event Sub-Type
The format of component identifier has changed (CXL spec 3.1 section
8.2.9.2.1 Table 8-44).
Update the general media event record and general media trace event for
the above spec changes. The new fields are inserted in logical places.
Example trace log of cxl_general_media trace event,
cxl_general_media: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \
time=45104947948 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 \
flags='0x1' handle=1 related_handle=0 maint_op_class=2 \
maint_op_sub_class=4 : dpa=0x30d40 dpa_flags=0x0 \
descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' \
type='TE State Violation' sub_type=0x2 transaction_type=0x4 channel=3 \
rank=33 device=0x5 validity_flags=0x1f \
comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \
hpa=0xffffffffffffffff region= region_uuid=00000000-0000-0000-0000-000000000000 \
cme_threshold_ev_flags=0x3 cme_count=0x78
The number of decoded strings in TP_printk() caused parsing error when
libtraceevent in userspace parses the CXL general media trace event for
rasdaemon. It was found that long decoded strings of field values in the
TP_printk() caused the issue. As a solution, decoding of some fields
in the TP_printk() were removed to accommodate the new fields.
Decoding of all these fields is added in the userspace tool rasdaemon.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/trace.h | 58 +++++++++++++++++++++++++++++-----------
include/cxl/event.h | 7 +++--
2 files changed, 48 insertions(+), 17 deletions(-)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 8e9d80e34a28..77055d66b56e 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -287,7 +287,7 @@ TRACE_EVENT(cxl_generic_event,
/*
* General Media Event Record - GMER
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
*/
#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT BIT(0)
#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT BIT(1)
@@ -301,10 +301,18 @@ TRACE_EVENT(cxl_generic_event,
#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR 0x00
#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR 0x01
#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x02
-#define show_gmer_mem_event_type(type) __print_symbolic(type, \
- { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
- { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
- { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" } \
+#define CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION 0x03
+#define CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR 0x04
+#define CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE 0x05
+#define CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION 0x06
+#define show_gmer_mem_event_type(type) __print_symbolic(type, \
+ { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
+ { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
+ { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" }, \
+ { CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION, "TE State Violation" }, \
+ { CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR, "Scrub Media ECC Error" }, \
+ { CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE, "Adv Prog CME Counter Expiration" }, \
+ { CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION, "CKID Violation" } \
)
#define CXL_GMER_TRANS_UNKNOWN 0x00
@@ -314,6 +322,8 @@ TRACE_EVENT(cxl_generic_event,
#define CXL_GMER_TRANS_HOST_INJECT_POISON 0x04
#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB 0x05
#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT 0x06
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_ECS 0x07
+#define CXL_GMER_TRANS_MEDIA_INITIALIZATION 0x08
#define show_trans_type(type) __print_symbolic(type, \
{ CXL_GMER_TRANS_UNKNOWN, "Unknown" }, \
{ CXL_GMER_TRANS_HOST_READ, "Host Read" }, \
@@ -321,18 +331,22 @@ TRACE_EVENT(cxl_generic_event,
{ CXL_GMER_TRANS_HOST_SCAN_MEDIA, "Host Scan Media" }, \
{ CXL_GMER_TRANS_HOST_INJECT_POISON, "Host Inject Poison" }, \
{ CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB, "Internal Media Scrub" }, \
- { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT, "Internal Media Management" } \
+ { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT, "Internal Media Management" }, \
+ { CXL_GMER_TRANS_INTERNAL_MEDIA_ECS, "Internal Media Error Check Scrub" }, \
+ { CXL_GMER_TRANS_MEDIA_INITIALIZATION, "Media Initialization" } \
)
#define CXL_GMER_VALID_CHANNEL BIT(0)
#define CXL_GMER_VALID_RANK BIT(1)
#define CXL_GMER_VALID_DEVICE BIT(2)
#define CXL_GMER_VALID_COMPONENT BIT(3)
+#define CXL_GMER_VALID_COMPONENT_ID_FORMAT BIT(4)
#define show_valid_flags(flags) __print_flags(flags, "|", \
{ CXL_GMER_VALID_CHANNEL, "CHANNEL" }, \
{ CXL_GMER_VALID_RANK, "RANK" }, \
{ CXL_GMER_VALID_DEVICE, "DEVICE" }, \
- { CXL_GMER_VALID_COMPONENT, "COMPONENT" } \
+ { CXL_GMER_VALID_COMPONENT, "COMPONENT" }, \
+ { CXL_GMER_VALID_COMPONENT_ID_FORMAT, "COMPONENT PLDM FORMAT" } \
)
TRACE_EVENT(cxl_general_media,
@@ -348,6 +362,7 @@ TRACE_EVENT(cxl_general_media,
__field(u64, dpa)
__field(u8, descriptor)
__field(u8, type)
+ __field(u8, sub_type)
__field(u8, transaction_type)
__field(u8, channel)
__field(u32, device)
@@ -359,6 +374,8 @@ TRACE_EVENT(cxl_general_media,
__field(u8, rank)
__field(u8, dpa_flags)
__string(region_name, cxlr ? dev_name(&cxlr->dev) : "")
+ __field(u8, cme_threshold_ev_flags)
+ __field(u32, cme_count)
),
TP_fast_assign(
@@ -372,6 +389,7 @@ TRACE_EVENT(cxl_general_media,
__entry->dpa &= CXL_DPA_MASK;
__entry->descriptor = rec->media_hdr.descriptor;
__entry->type = rec->media_hdr.type;
+ __entry->sub_type = rec->sub_type;
__entry->transaction_type = rec->media_hdr.transaction_type;
__entry->channel = rec->media_hdr.channel;
__entry->rank = rec->media_hdr.rank;
@@ -380,6 +398,8 @@ TRACE_EVENT(cxl_general_media,
CXL_EVENT_GEN_MED_COMP_ID_SIZE);
__entry->validity_flags = get_unaligned_le16(&rec->media_hdr.validity_flags);
__entry->hpa = hpa;
+ __entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags;
+ __entry->cme_count = get_unaligned_le24(rec->cme_count);
if (cxlr) {
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
@@ -389,18 +409,26 @@ TRACE_EVENT(cxl_general_media,
}
),
- CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
- "descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
- "device=%x comp_id=%s validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
- __entry->dpa, show_dpa_flags(__entry->dpa_flags),
+ CXL_EVT_TP_printk("dpa=0x%llx dpa_flags=0x%x " \
+ "descriptor='%s' type='%s' sub_type=0x%x " \
+ "transaction_type=0x%x channel=%u rank=%u " \
+ "device=0x%x validity_flags=0x%x " \
+ "comp_id=%s pldm_entity_id=%s pldm_resource_id=%s " \
+ "hpa=0x%llx region=%s region_uuid=%pUb " \
+ "cme_threshold_ev_flags=0x%x cme_count=0x%x ",
+ __entry->dpa, __entry->dpa_flags,
show_event_desc_flags(__entry->descriptor),
show_gmer_mem_event_type(__entry->type),
- show_trans_type(__entry->transaction_type),
+ __entry->sub_type, __entry->transaction_type,
__entry->channel, __entry->rank, __entry->device,
+ __entry->validity_flags,
__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
- show_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ show_pldm_entity_id(__entry->validity_flags, CXL_GMER_VALID_COMPONENT,
+ CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
+ show_pldm_resource_id(__entry->validity_flags, CXL_GMER_VALID_COMPONENT,
+ CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
+ __entry->hpa, __get_str(region_name), &__entry->region_uuid,
+ __entry->cme_threshold_ev_flags, __entry->cme_count
)
);
diff --git a/include/cxl/event.h b/include/cxl/event.h
index e1d485ad376b..2b07adf39010 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -45,14 +45,17 @@ struct cxl_event_generic {
/*
* General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
*/
#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
struct cxl_event_gen_media {
struct cxl_event_media_hdr media_hdr;
u8 device[3];
u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
- u8 reserved[46];
+ u8 cme_threshold_ev_flags;
+ u8 cme_count[3];
+ u8 sub_type;
+ u8 reserved[41];
} __packed;
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v4 4/6] cxl/events: Update DRAM Event Record to CXL spec rev 3.1
2024-11-20 9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
` (2 preceding siblings ...)
2024-11-20 9:37 ` [PATCH v4 3/6] cxl/events: Update General Media Event Record to " shiju.jose
@ 2024-11-20 9:37 ` shiju.jose
2024-11-20 9:37 ` [PATCH v4 5/6] cxl/events: Update Memory Module " shiju.jose
2024-11-20 9:37 ` [PATCH v4 6/6] cxl/test: Update test code for event records " shiju.jose
5 siblings, 0 replies; 19+ messages in thread
From: shiju.jose @ 2024-11-20 9:37 UTC (permalink / raw)
To: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl
Cc: linux-kernel, linuxarm, tanxiaofei, prime.zeng, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.1 section 8.2.9.2.1.2 Table 8-46, DRAM Event Record has updated
with following new fields and new types for Memory Event Type, Transaction
Type and Validity Flags fields.
1. Component Identifier
2. Sub-channel
3. Advanced Programmable Corrected Memory Error Threshold Event Flags
4. Corrected Memory Error Count at Event
5. Memory Event Sub-Type
Update DRAM events record and DRAM trace event for the above spec
changes. The new fields are inserted in logical places.
Includes trivial consistency of white space improvements.
Example trace print of cxl_dram trace event,
cxl_dram: memdev=mem0 host=0000:0f:00.0 serial=3 log=Informational : \
time=76068483335 uuid=601dcbb3-9c06-4eab-b8af-4e9bfb5c9624 len=128 \
flags='0x1' handle=1 related_handle=0 maint_op_class=1 \
maint_op_sub_class=3 : dpa=0x18680 dpa_flags=0x0 descriptor=0x3 \
type=0x3 sub_type=0x5 transaction_type=0x5 channel=3 rank=17 \
nibble_mask=0x3b00b2 bank_group=7 bank=11 row=2 column=77 \
cor_mask=21 00 00 00 00 00 00 00 2c 00 00 00 00 00 00 00 37 00 00 00 \
00 00 00 00 42 00 00 00 00 00 00 00 validity_flags=0x3ff \
comp_id=01 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=0x00 \
hpa=0xffffffffffffffff region= \
region_uuid=00000000-0000-0000-0000-000000000000 \
sub_channel=5 cme_threshold_ev_flags=0x3 cvme_count=0x94
The number of decoded strings in TP_printk() caused a parsing error when
libtraceevent in userspace parsed the CXL dram trace event for rasdaemon.
It was found that long decoded strings of field values in the TP_printk()
caused the issue. As a solution, decoding of field values in the TP_printk()
were removed to accommodate the new fields.
Decoding of all these fields is added in the userspace tool rasdaemon.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/trace.h | 82 +++++++++++++++++++++++++++-------------
include/cxl/event.h | 9 ++++-
2 files changed, 63 insertions(+), 28 deletions(-)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 77055d66b56e..f4f3d49166cd 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -435,7 +435,7 @@ TRACE_EVENT(cxl_general_media,
/*
* DRAM Event Record - DER
*
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ * CXL rev 3.1 section 8.2.9.2.1.2; Table 8-46
*/
/*
* DRAM Event Record defines many fields the same as the General Media Event
@@ -445,11 +445,17 @@ TRACE_EVENT(cxl_general_media,
#define CXL_DER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR 0x01
#define CXL_DER_MEM_EVT_TYPE_INV_ADDR 0x02
#define CXL_DER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x03
-#define show_dram_mem_event_type(type) __print_symbolic(type, \
- { CXL_DER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
- { CXL_DER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR, "Scrub Media ECC Error" }, \
- { CXL_DER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
- { CXL_DER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" } \
+#define CXL_DER_MEM_EVT_TYPE_TE_STATE_VIOLATION 0x04
+#define CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE 0x05
+#define CXL_DER_MEM_EVT_TYPE_CKID_VIOLATION 0x06
+#define show_dram_mem_event_type(type) __print_symbolic(type, \
+ { CXL_DER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
+ { CXL_DER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR, "Scrub Media ECC Error" }, \
+ { CXL_DER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
+ { CXL_DER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" }, \
+ { CXL_DER_MEM_EVT_TYPE_TE_STATE_VIOLATION, "TE State Violation" }, \
+ { CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE, "Adv Prog CME Counter Expiration" }, \
+ { CXL_DER_MEM_EVT_TYPE_CKID_VIOLATION, "CKID Violation" } \
)
#define CXL_DER_VALID_CHANNEL BIT(0)
@@ -460,15 +466,21 @@ TRACE_EVENT(cxl_general_media,
#define CXL_DER_VALID_ROW BIT(5)
#define CXL_DER_VALID_COLUMN BIT(6)
#define CXL_DER_VALID_CORRECTION_MASK BIT(7)
-#define show_dram_valid_flags(flags) __print_flags(flags, "|", \
- { CXL_DER_VALID_CHANNEL, "CHANNEL" }, \
- { CXL_DER_VALID_RANK, "RANK" }, \
- { CXL_DER_VALID_NIBBLE, "NIBBLE" }, \
- { CXL_DER_VALID_BANK_GROUP, "BANK GROUP" }, \
- { CXL_DER_VALID_BANK, "BANK" }, \
- { CXL_DER_VALID_ROW, "ROW" }, \
- { CXL_DER_VALID_COLUMN, "COLUMN" }, \
- { CXL_DER_VALID_CORRECTION_MASK, "CORRECTION MASK" } \
+#define CXL_DER_VALID_COMPONENT BIT(8)
+#define CXL_DER_VALID_COMPONENT_ID_FORMAT BIT(9)
+#define CXL_DER_VALID_SUB_CHANNEL BIT(10)
+#define show_dram_valid_flags(flags) __print_flags(flags, "|", \
+ { CXL_DER_VALID_CHANNEL, "CHANNEL" }, \
+ { CXL_DER_VALID_RANK, "RANK" }, \
+ { CXL_DER_VALID_NIBBLE, "NIBBLE" }, \
+ { CXL_DER_VALID_BANK_GROUP, "BANK GROUP" }, \
+ { CXL_DER_VALID_BANK, "BANK" }, \
+ { CXL_DER_VALID_ROW, "ROW" }, \
+ { CXL_DER_VALID_COLUMN, "COLUMN" }, \
+ { CXL_DER_VALID_CORRECTION_MASK, "CORRECTION MASK" }, \
+ { CXL_DER_VALID_COMPONENT, "COMPONENT" }, \
+ { CXL_DER_VALID_COMPONENT_ID_FORMAT, "COMPONENT PLDM FORMAT" }, \
+ { CXL_DER_VALID_SUB_CHANNEL, "SUB CHANNEL" } \
)
TRACE_EVENT(cxl_dram,
@@ -484,6 +496,7 @@ TRACE_EVENT(cxl_dram,
__field(u64, dpa)
__field(u8, descriptor)
__field(u8, type)
+ __field(u8, sub_type)
__field(u8, transaction_type)
__field(u8, channel)
__field(u16, validity_flags)
@@ -498,6 +511,10 @@ TRACE_EVENT(cxl_dram,
__field(u8, bank) /* Out of order to pack trace record */
__field(u8, dpa_flags) /* Out of order to pack trace record */
__string(region_name, cxlr ? dev_name(&cxlr->dev) : "")
+ __array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
+ __field(u8, sub_channel)
+ __field(u8, cme_threshold_ev_flags)
+ __field(u32, cvme_count)
),
TP_fast_assign(
@@ -510,6 +527,7 @@ TRACE_EVENT(cxl_dram,
__entry->dpa &= CXL_DPA_MASK;
__entry->descriptor = rec->media_hdr.descriptor;
__entry->type = rec->media_hdr.type;
+ __entry->sub_type = rec->sub_type;
__entry->transaction_type = rec->media_hdr.transaction_type;
__entry->validity_flags = get_unaligned_le16(rec->media_hdr.validity_flags);
__entry->channel = rec->media_hdr.channel;
@@ -529,23 +547,35 @@ TRACE_EVENT(cxl_dram,
__assign_str(region_name);
uuid_copy(&__entry->region_uuid, &uuid_null);
}
+ memcpy(__entry->comp_id, &rec->component_id,
+ CXL_EVENT_GEN_MED_COMP_ID_SIZE);
+ __entry->sub_channel = rec->sub_channel;
+ __entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags;
+ __entry->cvme_count = get_unaligned_le24(rec->cvme_count);
),
- CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' " \
- "transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
- "bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
- "validity_flags='%s' " \
- "hpa=%llx region=%s region_uuid=%pUb",
- __entry->dpa, show_dpa_flags(__entry->dpa_flags),
- show_event_desc_flags(__entry->descriptor),
- show_dram_mem_event_type(__entry->type),
- show_trans_type(__entry->transaction_type),
+ CXL_EVT_TP_printk("dpa=0x%llx dpa_flags=0x%x " \
+ "descriptor=0x%x type=0x%x sub_type=0x%x " \
+ "transaction_type=0x%x channel=%u rank=%u " \
+ "nibble_mask=0x%x bank_group=%u bank=%u " \
+ "row=%u column=%u cor_mask=%s validity_flags=0x%x " \
+ "comp_id=%s pldm_entity_id=%s pldm_resource_id=%s " \
+ "hpa=0x%llx region=%s region_uuid=%pUb " \
+ "sub_channel=%u cme_threshold_ev_flags=0x%x cvme_count=0x%x ", \
+ __entry->dpa, __entry->dpa_flags, __entry->descriptor,
+ __entry->type, __entry->sub_type, __entry->transaction_type,
__entry->channel, __entry->rank, __entry->nibble_mask,
__entry->bank_group, __entry->bank,
__entry->row, __entry->column,
__print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
- show_dram_valid_flags(__entry->validity_flags),
- __entry->hpa, __get_str(region_name), &__entry->region_uuid
+ __entry->validity_flags,
+ __print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
+ show_pldm_entity_id(__entry->validity_flags, CXL_DER_VALID_COMPONENT,
+ CXL_DER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
+ show_pldm_resource_id(__entry->validity_flags, CXL_DER_VALID_COMPONENT,
+ CXL_DER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
+ __entry->hpa, __get_str(region_name), &__entry->region_uuid,
+ __entry->sub_channel, __entry->cme_threshold_ev_flags, __entry->cvme_count
)
);
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 2b07adf39010..dd85aa9beddf 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -60,7 +60,7 @@ struct cxl_event_gen_media {
/*
* DRAM Event Record - DER
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
+ * CXL rev 3.1 section 8.2.9.2.1.2; Table 3-46
*/
#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
struct cxl_event_dram {
@@ -71,7 +71,12 @@ struct cxl_event_dram {
u8 row[3];
u8 column[2];
u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
- u8 reserved[0x17];
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 sub_channel;
+ u8 cme_threshold_ev_flags;
+ u8 cvme_count[3];
+ u8 sub_type;
+ u8 reserved;
} __packed;
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v4 5/6] cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
2024-11-20 9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
` (3 preceding siblings ...)
2024-11-20 9:37 ` [PATCH v4 4/6] cxl/events: Update DRAM " shiju.jose
@ 2024-11-20 9:37 ` shiju.jose
2024-11-20 9:37 ` [PATCH v4 6/6] cxl/test: Update test code for event records " shiju.jose
5 siblings, 0 replies; 19+ messages in thread
From: shiju.jose @ 2024-11-20 9:37 UTC (permalink / raw)
To: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl
Cc: linux-kernel, linuxarm, tanxiaofei, prime.zeng, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.1 section 8.2.9.2.1.3 Table 8-47, Memory Module Event Record
has updated with following new fields and new info for Device Event Type
and Device Health Information fields.
1. Validity Flags
2. Component Identifier
3. Device Event Sub-Type
Update the Memory Module event record and Memory Module trace event for
the above spec changes. The new fields are inserted in logical places.
Example trace print of cxl_memory_module trace event,
cxl_memory_module: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \
time=46654654941 uuid=fe927475-dd59-4339-a586-79bab113b774 len=128 \
flags='0x1' handle=1 related_handle=0 maint_op_class=0 \
maint_op_sub_class=1 : event_type='Temperature Change' event_sub_type=0x2 \
health_status=0x5 media_status=0x7 as_life_used=0x3 as_dev_temp=Normal \
as_cor_vol_err_cnt=Normal as_cor_per_err_cnt=Normal life_used=8 \
device_temp=3 dirty_shutdown_cnt=33 cor_vol_err_cnt=25 cor_per_err_cnt=45 \
validity_flags=0x3 comp_id=02 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
pldm_entity_id=0x00 pldm_resource_id=fc d2 7e 2f
The number of decoded strings in TP_printk() cause parsing error when
libtraceevent in userspace parses the CXL memory module trace event for
rasdaemon. It was found that long decoded strings of field values in the
TP_printk() caused the issue. As a solution, decoding of some of the
fields in the TP_printk() were removed to accommodate the new fields.
Decoding of all these fields is added in the userspace tool rasdaemon.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/trace.h | 63 +++++++++++++++++++++++++++++++++-------
include/cxl/event.h | 9 ++++--
2 files changed, 58 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index f4f3d49166cd..d0b8f35ae09f 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -582,7 +582,7 @@ TRACE_EVENT(cxl_dram,
/*
* Memory Module Event Record - MMER
*
- * CXL res 3.0 section 8.2.9.2.1.3; Table 8-45
+ * CXL res 3.1 section 8.2.9.2.1.3; Table 8-47
*/
#define CXL_MMER_HEALTH_STATUS_CHANGE 0x00
#define CXL_MMER_MEDIA_STATUS_CHANGE 0x01
@@ -590,27 +590,35 @@ TRACE_EVENT(cxl_dram,
#define CXL_MMER_TEMP_CHANGE 0x03
#define CXL_MMER_DATA_PATH_ERROR 0x04
#define CXL_MMER_LSA_ERROR 0x05
+#define CXL_MMER_UNRECOV_SIDEBAND_BUS_ERROR 0x06
+#define CXL_MMER_MEMORY_MEDIA_FRU_ERROR 0x07
+#define CXL_MMER_POWER_MANAGEMENT_FAULT 0x08
#define show_dev_evt_type(type) __print_symbolic(type, \
{ CXL_MMER_HEALTH_STATUS_CHANGE, "Health Status Change" }, \
{ CXL_MMER_MEDIA_STATUS_CHANGE, "Media Status Change" }, \
{ CXL_MMER_LIFE_USED_CHANGE, "Life Used Change" }, \
{ CXL_MMER_TEMP_CHANGE, "Temperature Change" }, \
{ CXL_MMER_DATA_PATH_ERROR, "Data Path Error" }, \
- { CXL_MMER_LSA_ERROR, "LSA Error" } \
+ { CXL_MMER_LSA_ERROR, "LSA Error" }, \
+ { CXL_MMER_UNRECOV_SIDEBAND_BUS_ERROR, "Unrecoverable Internal Sideband Bus Error" }, \
+ { CXL_MMER_MEMORY_MEDIA_FRU_ERROR, "Memory Media FRU Error" }, \
+ { CXL_MMER_POWER_MANAGEMENT_FAULT, "Power Management Fault" } \
)
/*
* Device Health Information - DHI
*
- * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
+ * CXL res 3.1 section 8.2.9.9.3.1; Table 8-133
*/
#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
+#define CXL_DHI_HS_MEM_CAPACITY_DEGRADED BIT(3)
#define show_health_status_flags(flags) __print_flags(flags, "|", \
{ CXL_DHI_HS_MAINTENANCE_NEEDED, "MAINTENANCE_NEEDED" }, \
{ CXL_DHI_HS_PERFORMANCE_DEGRADED, "PERFORMANCE_DEGRADED" }, \
- { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "REPLACEMENT_NEEDED" } \
+ { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "REPLACEMENT_NEEDED" }, \
+ { CXL_DHI_HS_MEM_CAPACITY_DEGRADED, "MEM_CAPACITY_DEGRADED" } \
)
#define CXL_DHI_MS_NORMAL 0x00
@@ -664,6 +672,26 @@ TRACE_EVENT(cxl_dram,
#define CXL_DHI_AS_COR_VOL_ERR_CNT(as) ((as & 0x10) >> 4)
#define CXL_DHI_AS_COR_PER_ERR_CNT(as) ((as & 0x20) >> 5)
+#define CXL_MMER_VALID_COMPONENT BIT(0)
+#define CXL_MMER_VALID_COMPONENT_ID_FORMAT BIT(1)
+#define show_mem_module_valid_flags(flags) __print_flags(flags, "|", \
+ { CXL_MMER_VALID_COMPONENT, "COMPONENT" }, \
+ { CXL_MMER_VALID_COMPONENT_ID_FORMAT, "COMPONENT PLDM FORMAT" } \
+)
+#define CXL_MMER_DEV_EVT_SUB_TYPE_NOT_REPORTED 0x00
+#define CXL_MMER_DEV_EVT_SUB_TYPE_INVALID_CONFIG_DATA 0x01
+#define CXL_MMER_DEV_EVT_SUB_TYPE_UNSUPP_CONFIG_DATA 0x02
+#define CXL_MMER_DEV_EVT_SUB_TYPE_UNSUPP_MEM_MEDIA_FRU 0x03
+#define show_dev_event_sub_type(sub_type) __print_symbolic(sub_type, \
+ { CXL_MMER_DEV_EVT_SUB_TYPE_NOT_REPORTED, "Not Reported" }, \
+ { CXL_MMER_DEV_EVT_SUB_TYPE_INVALID_CONFIG_DATA, "Invalid Config Data" }, \
+ { CXL_MMER_DEV_EVT_SUB_TYPE_UNSUPP_CONFIG_DATA, "Unsupported Config Data" }, \
+ { \
+ CXL_MMER_DEV_EVT_SUB_TYPE_UNSUPP_MEM_MEDIA_FRU, \
+ "Unsupported Memory Media FRU" \
+ } \
+)
+
TRACE_EVENT(cxl_memory_module,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
@@ -676,6 +704,7 @@ TRACE_EVENT(cxl_memory_module,
/* Memory Module Event */
__field(u8, event_type)
+ __field(u8, event_sub_type)
/* Device Health Info */
__field(u8, health_status)
@@ -686,6 +715,8 @@ TRACE_EVENT(cxl_memory_module,
__field(u32, cor_per_err_cnt)
__field(s16, device_temp)
__field(u8, add_status)
+ __field(u16, validity_flags)
+ __array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
),
TP_fast_assign(
@@ -694,6 +725,7 @@ TRACE_EVENT(cxl_memory_module,
/* Memory Module Event */
__entry->event_type = rec->event_type;
+ __entry->event_sub_type = rec->event_sub_type;
/* Device Health Info */
__entry->health_status = rec->info.health_status;
@@ -704,22 +736,31 @@ TRACE_EVENT(cxl_memory_module,
__entry->cor_per_err_cnt = get_unaligned_le32(rec->info.cor_per_err_cnt);
__entry->device_temp = get_unaligned_le16(rec->info.device_temp);
__entry->add_status = rec->info.add_status;
+ __entry->validity_flags = get_unaligned_le16(rec->validity_flags);
+ memcpy(__entry->comp_id, &rec->component_id,
+ CXL_EVENT_GEN_MED_COMP_ID_SIZE);
),
- CXL_EVT_TP_printk("event_type='%s' health_status='%s' media_status='%s' " \
- "as_life_used=%s as_dev_temp=%s as_cor_vol_err_cnt=%s " \
+ CXL_EVT_TP_printk("event_type='%s' event_sub_type=0x%x health_status=0x%x " \
+ "media_status=0x%x as_life_used=%s as_dev_temp=%s as_cor_vol_err_cnt=%s " \
"as_cor_per_err_cnt=%s life_used=%u device_temp=%d " \
- "dirty_shutdown_cnt=%u cor_vol_err_cnt=%u cor_per_err_cnt=%u",
- show_dev_evt_type(__entry->event_type),
- show_health_status_flags(__entry->health_status),
- show_media_status(__entry->media_status),
+ "dirty_shutdown_cnt=%u cor_vol_err_cnt=%u cor_per_err_cnt=%u " \
+ "validity_flags=0x%x " \
+ "comp_id=%s pldm_entity_id=%s pldm_resource_id=%s ",
+ show_dev_evt_type(__entry->event_type), __entry->event_sub_type,
+ __entry->health_status, __entry->media_status,
show_two_bit_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
show_two_bit_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
show_one_bit_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
show_one_bit_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
__entry->life_used, __entry->device_temp,
__entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
- __entry->cor_per_err_cnt
+ __entry->cor_per_err_cnt, __entry->validity_flags,
+ __print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
+ show_pldm_entity_id(__entry->validity_flags, CXL_MMER_VALID_COMPONENT,
+ CXL_MMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
+ show_pldm_resource_id(__entry->validity_flags, CXL_MMER_VALID_COMPONENT,
+ CXL_MMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id)
)
);
diff --git a/include/cxl/event.h b/include/cxl/event.h
index dd85aa9beddf..f44c9487ca39 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -81,7 +81,7 @@ struct cxl_event_dram {
/*
* Get Health Info Record
- * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ * CXL rev 3.1 section 8.2.9.9.3.1; Table 8-133
*/
struct cxl_get_health_info {
u8 health_status;
@@ -96,13 +96,16 @@ struct cxl_get_health_info {
/*
* Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ * CXL rev 3.1 section 8.2.9.2.1.3; Table 8-47
*/
struct cxl_event_mem_module {
struct cxl_event_record_hdr hdr;
u8 event_type;
struct cxl_get_health_info info;
- u8 reserved[0x3d];
+ u8 validity_flags[2];
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 event_sub_type;
+ u8 reserved[0x2a];
} __packed;
union cxl_event {
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v4 6/6] cxl/test: Update test code for event records to CXL spec rev 3.1
2024-11-20 9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
` (4 preceding siblings ...)
2024-11-20 9:37 ` [PATCH v4 5/6] cxl/events: Update Memory Module " shiju.jose
@ 2024-11-20 9:37 ` shiju.jose
5 siblings, 0 replies; 19+ messages in thread
From: shiju.jose @ 2024-11-20 9:37 UTC (permalink / raw)
To: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl
Cc: linux-kernel, linuxarm, tanxiaofei, prime.zeng, shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Update test code for General Media, DRAM, Memory Module Event
Records to CXL spec rev 3.1.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
tools/testing/cxl/test/mem.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 71916e0e1546..94e7534581d8 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -401,6 +401,10 @@ struct cxl_test_gen_media gen_media = {
.channel = 1,
.rank = 30,
},
+ .component_id = { 0x3, 0x74, 0xc5, 0x8, 0x9a, 0x1a, 0xb, 0xfc, 0xd2, 0x7e, 0x2f, 0x31, 0x9b, 0x3c, 0x81, 0x4d },
+ .cme_threshold_ev_flags = 3,
+ .cme_count = { 33, 0, 0 },
+ .sub_type = 0x2,
},
};
@@ -429,6 +433,11 @@ struct cxl_test_dram dram = {
.bank_group = 5,
.bank = 2,
.column = {0xDE, 0xAD},
+ .component_id = { 0x1, 0x74, 0xc5, 0x8, 0x9a, 0x1a, 0xb, 0xfc, 0xd2, 0x7e, 0x2f, 0x31, 0x9b, 0x3c, 0x81, 0x4d },
+ .sub_channel = 8,
+ .cme_threshold_ev_flags = 2,
+ .cvme_count = { 14, 0, 0 },
+ .sub_type = 0x5,
},
};
@@ -456,7 +465,10 @@ struct cxl_test_mem_module mem_module = {
.dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
.cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
.cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
- }
+ },
+ /* .validity_flags = <set below> */
+ .component_id = { 0x2, 0x74, 0xc5, 0x8, 0x9a, 0x1a, 0xb, 0xfc, 0xd2, 0x7e, 0x2f, 0x31, 0x9b, 0x3c, 0x81, 0x4d },
+ .event_sub_type = 0x3,
},
};
@@ -478,13 +490,18 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds,
static void cxl_mock_add_event_logs(struct mock_event_store *mes)
{
- put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK,
+ put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK |
+ CXL_GMER_VALID_COMPONENT | CXL_GMER_VALID_COMPONENT_ID_FORMAT,
&gen_media.rec.media_hdr.validity_flags);
put_unaligned_le16(CXL_DER_VALID_CHANNEL | CXL_DER_VALID_BANK_GROUP |
- CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN,
+ CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN | CXL_DER_VALID_SUB_CHANNEL |
+ CXL_DER_VALID_COMPONENT | CXL_DER_VALID_COMPONENT_ID_FORMAT,
&dram.rec.media_hdr.validity_flags);
+ put_unaligned_le16(CXL_MMER_VALID_COMPONENT | CXL_MMER_VALID_COMPONENT_ID_FORMAT,
+ &mem_module.rec.validity_flags);
+
mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
mes_add_event(mes, CXL_EVENT_TYPE_INFO,
(struct cxl_event_record_raw *)&gen_media);
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-20 9:37 ` [PATCH v4 3/6] cxl/events: Update General Media Event Record to " shiju.jose
@ 2024-11-26 11:51 ` Shiju Jose
2024-11-26 17:02 ` Steven Rostedt
0 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2024-11-26 11:51 UTC (permalink / raw)
To: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org
Cc: rostedt@goodmis.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
CC Steven Rostedt.
Hi Steven,
We are encountering a parsing error ("FAILED TO PARSE") from libtraceevent when it
tries to parse some of the CXL trace events for the user-space tool rasdaemon.
This issue appeared after new fields were added to the trace events.
It was found that the issue does not occur when all or some of the decoded strings
for the event's data and flags are removed from the TP_printk() function in the kernel,
and only the values are printed instead.
https://elixir.bootlin.com/linux/v6.12/source/drivers/cxl/core/trace.h
https://lore.kernel.org/lkml/20241120093745.1847-1-shiju.jose@huawei.com/
Below is the information from the debugging in libtraceevent:
The failure occurs in the following functions and locations within libtraceevent:
File: src/event-parse.c
Function: event_read_format()
ret = event_read_fields(event->tep, event, &event->format.fields); if (ret < 0)
return ret;
Function: event_read_fields()
if (test_type_token(type, token, TEP_EVENT_ITEM, "field"))
goto fail;
Can you recognize if there are any limitations or issues that would prevent
libtraceevent from parsing the trace event in the condition described above?
Thanks,
Shiju
>-----Original Message-----
>From: Shiju Jose <shiju.jose@huawei.com>
>Sent: 20 November 2024 09:38
>To: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org
>Cc: linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
>tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>Shiju Jose <shiju.jose@huawei.com>
>Subject: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL
>spec rev 3.1
>
>From: Shiju Jose <shiju.jose@huawei.com>
>
>CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record has
>updated with following new fields and new types for Memory Event Type and
>Transaction Type fields.
>1. Advanced Programmable Corrected Memory Error Threshold Event Flags 2.
>Corrected Memory Error Count at Event 3. Memory Event Sub-Type
>
>The format of component identifier has changed (CXL spec 3.1 section
>8.2.9.2.1 Table 8-44).
>
>Update the general media event record and general media trace event for the
>above spec changes. The new fields are inserted in logical places.
>
>Example trace log of cxl_general_media trace event,
>
>cxl_general_media: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \
>time=45104947948 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 \
>flags='0x1' handle=1 related_handle=0 maint_op_class=2 \
>maint_op_sub_class=4 : dpa=0x30d40 dpa_flags=0x0 \
>descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVER
>FLOW' \ type='TE State Violation' sub_type=0x2 transaction_type=0x4 channel=3
>\
>rank=33 device=0x5 validity_flags=0x1f \
>comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
>pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \
>hpa=0xffffffffffffffff region= region_uuid=00000000-0000-0000-0000-
>000000000000 \
>cme_threshold_ev_flags=0x3 cme_count=0x78
>
>The number of decoded strings in TP_printk() caused parsing error when
>libtraceevent in userspace parses the CXL general media trace event for
>rasdaemon. It was found that long decoded strings of field values in the
>TP_printk() caused the issue. As a solution, decoding of some fields in the
>TP_printk() were removed to accommodate the new fields.
>Decoding of all these fields is added in the userspace tool rasdaemon.
>
>Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>---
> drivers/cxl/core/trace.h | 58 +++++++++++++++++++++++++++++-----------
> include/cxl/event.h | 7 +++--
> 2 files changed, 48 insertions(+), 17 deletions(-)
>
>diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index
>8e9d80e34a28..77055d66b56e 100644
>--- a/drivers/cxl/core/trace.h
>+++ b/drivers/cxl/core/trace.h
>@@ -287,7 +287,7 @@ TRACE_EVENT(cxl_generic_event,
>
> /*
> * General Media Event Record - GMER
>- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
>+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
> */
> #define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT BIT(0)
> #define CXL_GMER_EVT_DESC_THRESHOLD_EVENT BIT(1)
>@@ -301,10 +301,18 @@ TRACE_EVENT(cxl_generic_event,
> #define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR 0x00
> #define CXL_GMER_MEM_EVT_TYPE_INV_ADDR 0x01
> #define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x02
>-#define show_gmer_mem_event_type(type) __print_symbolic(type,
> \
>- { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" },
> \
>- { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid
>Address" }, \
>- { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path
>Error" } \
>+#define CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION 0x03
>+#define CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR 0x04
>+#define CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE 0x05
>+#define CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION 0x06
>+#define show_gmer_mem_event_type(type) __print_symbolic(type,
> \
>+ { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" },
> \
>+ { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid
>Address" }, \
>+ { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path
>Error" }, \
>+ { CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION, "TE State
>Violation" }, \
>+ { CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR, "Scrub
>Media ECC Error" }, \
>+ { CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE, "Adv
>Prog CME Counter Expiration" }, \
>+ { CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION, "CKID
>Violation" } \
> )
>
> #define CXL_GMER_TRANS_UNKNOWN 0x00
>@@ -314,6 +322,8 @@ TRACE_EVENT(cxl_generic_event,
> #define CXL_GMER_TRANS_HOST_INJECT_POISON 0x04
> #define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB 0x05
> #define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT 0x06
>+#define CXL_GMER_TRANS_INTERNAL_MEDIA_ECS 0x07
>+#define CXL_GMER_TRANS_MEDIA_INITIALIZATION 0x08
> #define show_trans_type(type) __print_symbolic(type,
> \
> { CXL_GMER_TRANS_UNKNOWN, "Unknown" },
> \
> { CXL_GMER_TRANS_HOST_READ, "Host Read" },
> \
>@@ -321,18 +331,22 @@ TRACE_EVENT(cxl_generic_event,
> { CXL_GMER_TRANS_HOST_SCAN_MEDIA, "Host Scan
>Media" }, \
> { CXL_GMER_TRANS_HOST_INJECT_POISON, "Host Inject
>Poison" }, \
> { CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB, "Internal
>Media Scrub" }, \
>- { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,
> "Internal Media Management" } \
>+ { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,
> "Internal Media Management" }, \
>+ { CXL_GMER_TRANS_INTERNAL_MEDIA_ECS, "Internal
>Media Error Check Scrub" }, \
>+ { CXL_GMER_TRANS_MEDIA_INITIALIZATION, "Media
>Initialization" } \
> )
>
> #define CXL_GMER_VALID_CHANNEL BIT(0)
> #define CXL_GMER_VALID_RANK BIT(1)
> #define CXL_GMER_VALID_DEVICE BIT(2)
> #define CXL_GMER_VALID_COMPONENT BIT(3)
>+#define CXL_GMER_VALID_COMPONENT_ID_FORMAT BIT(4)
> #define show_valid_flags(flags) __print_flags(flags, "|", \
> { CXL_GMER_VALID_CHANNEL, "CHANNEL" }, \
> { CXL_GMER_VALID_RANK, "RANK"
> }, \
> { CXL_GMER_VALID_DEVICE, "DEVICE" }, \
>- { CXL_GMER_VALID_COMPONENT, "COMPONENT"
> } \
>+ { CXL_GMER_VALID_COMPONENT, "COMPONENT"
> }, \
>+ { CXL_GMER_VALID_COMPONENT_ID_FORMAT,
> "COMPONENT PLDM FORMAT" } \
> )
>
> TRACE_EVENT(cxl_general_media,
>@@ -348,6 +362,7 @@ TRACE_EVENT(cxl_general_media,
> __field(u64, dpa)
> __field(u8, descriptor)
> __field(u8, type)
>+ __field(u8, sub_type)
> __field(u8, transaction_type)
> __field(u8, channel)
> __field(u32, device)
>@@ -359,6 +374,8 @@ TRACE_EVENT(cxl_general_media,
> __field(u8, rank)
> __field(u8, dpa_flags)
> __string(region_name, cxlr ? dev_name(&cxlr->dev) : "")
>+ __field(u8, cme_threshold_ev_flags)
>+ __field(u32, cme_count)
> ),
>
> TP_fast_assign(
>@@ -372,6 +389,7 @@ TRACE_EVENT(cxl_general_media,
> __entry->dpa &= CXL_DPA_MASK;
> __entry->descriptor = rec->media_hdr.descriptor;
> __entry->type = rec->media_hdr.type;
>+ __entry->sub_type = rec->sub_type;
> __entry->transaction_type = rec->media_hdr.transaction_type;
> __entry->channel = rec->media_hdr.channel;
> __entry->rank = rec->media_hdr.rank;
>@@ -380,6 +398,8 @@ TRACE_EVENT(cxl_general_media,
> CXL_EVENT_GEN_MED_COMP_ID_SIZE);
> __entry->validity_flags = get_unaligned_le16(&rec-
>>media_hdr.validity_flags);
> __entry->hpa = hpa;
>+ __entry->cme_threshold_ev_flags = rec-
>>cme_threshold_ev_flags;
>+ __entry->cme_count = get_unaligned_le24(rec->cme_count);
> if (cxlr) {
> __assign_str(region_name);
> uuid_copy(&__entry->region_uuid, &cxlr-
>>params.uuid); @@ -389,18 +409,26 @@ TRACE_EVENT(cxl_general_media,
> }
> ),
>
>- CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
>- "descriptor='%s' type='%s' transaction_type='%s' channel=%u
>rank=%u " \
>- "device=%x comp_id=%s validity_flags='%s' " \
>- "hpa=%llx region=%s region_uuid=%pUb",
>- __entry->dpa, show_dpa_flags(__entry->dpa_flags),
>+ CXL_EVT_TP_printk("dpa=0x%llx dpa_flags=0x%x " \
>+ "descriptor='%s' type='%s' sub_type=0x%x " \
>+ "transaction_type=0x%x channel=%u rank=%u " \
>+ "device=0x%x validity_flags=0x%x " \
>+ "comp_id=%s pldm_entity_id=%s pldm_resource_id=%s " \
>+ "hpa=0x%llx region=%s region_uuid=%pUb " \
>+ "cme_threshold_ev_flags=0x%x cme_count=0x%x ",
>+ __entry->dpa, __entry->dpa_flags,
> show_event_desc_flags(__entry->descriptor),
> show_gmer_mem_event_type(__entry->type),
>- show_trans_type(__entry->transaction_type),
>+ __entry->sub_type, __entry->transaction_type,
> __entry->channel, __entry->rank, __entry->device,
>+ __entry->validity_flags,
> __print_hex(__entry->comp_id,
>CXL_EVENT_GEN_MED_COMP_ID_SIZE),
>- show_valid_flags(__entry->validity_flags),
>- __entry->hpa, __get_str(region_name), &__entry->region_uuid
>+ show_pldm_entity_id(__entry->validity_flags,
>CXL_GMER_VALID_COMPONENT,
>+
>CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
>+ show_pldm_resource_id(__entry->validity_flags,
>CXL_GMER_VALID_COMPONENT,
>+
>CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
>+ __entry->hpa, __get_str(region_name), &__entry->region_uuid,
>+ __entry->cme_threshold_ev_flags, __entry->cme_count
> )
> );
>
>diff --git a/include/cxl/event.h b/include/cxl/event.h index
>e1d485ad376b..2b07adf39010 100644
>--- a/include/cxl/event.h
>+++ b/include/cxl/event.h
>@@ -45,14 +45,17 @@ struct cxl_event_generic {
>
> /*
> * General Media Event Record
>- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
>+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
> */
> #define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
> struct cxl_event_gen_media {
> struct cxl_event_media_hdr media_hdr;
> u8 device[3];
> u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
>- u8 reserved[46];
>+ u8 cme_threshold_ev_flags;
>+ u8 cme_count[3];
>+ u8 sub_type;
>+ u8 reserved[41];
> } __packed;
>
> /*
>--
>2.43.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-26 11:51 ` Shiju Jose
@ 2024-11-26 17:02 ` Steven Rostedt
2024-11-27 10:12 ` Shiju Jose
0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2024-11-26 17:02 UTC (permalink / raw)
To: Shiju Jose
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
On Tue, 26 Nov 2024 11:51:23 +0000
Shiju Jose <shiju.jose@huawei.com> wrote:
> We are encountering a parsing error ("FAILED TO PARSE") from libtraceevent when it
> tries to parse some of the CXL trace events for the user-space tool rasdaemon.
> This issue appeared after new fields were added to the trace events.
> It was found that the issue does not occur when all or some of the decoded strings
> for the event's data and flags are removed from the TP_printk() function in the kernel,
> and only the values are printed instead.
> https://elixir.bootlin.com/linux/v6.12/source/drivers/cxl/core/trace.h
> https://lore.kernel.org/lkml/20241120093745.1847-1-shiju.jose@huawei.com/
>
> Below is the information from the debugging in libtraceevent:
> The failure occurs in the following functions and locations within libtraceevent:
> File: src/event-parse.c
> Function: event_read_format()
> ret = event_read_fields(event->tep, event, &event->format.fields); if (ret < 0)
> return ret;
>
> Function: event_read_fields()
> if (test_type_token(type, token, TEP_EVENT_ITEM, "field"))
> goto fail;
>
> Can you recognize if there are any limitations or issues that would prevent
> libtraceevent from parsing the trace event in the condition described above?
Can you show me the output of the format files for the affected trace events:
# cat /sys/kernel/tracing/cxl/<affected_event>/format
You can attach it too if your email does whitespace mangling.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v4 1/6] cxl/events: Update Common Event Record to CXL spec rev 3.1
2024-11-20 9:37 ` [PATCH v4 1/6] cxl/events: Update Common Event Record " shiju.jose
@ 2024-11-26 17:27 ` Fan Ni
2024-11-27 10:15 ` Shiju Jose
0 siblings, 1 reply; 19+ messages in thread
From: Fan Ni @ 2024-11-26 17:27 UTC (permalink / raw)
To: shiju.jose
Cc: dave.jiang, dan.j.williams, jonathan.cameron, alison.schofield,
nifan.cxl, vishal.l.verma, ira.weiny, dave, linux-cxl,
linux-kernel, linuxarm, tanxiaofei, prime.zeng
On Wed, Nov 20, 2024 at 09:37:40AM +0000, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.1 section 8.2.9.2.1 Table 8-42, Common Event Record format has
> updated with Maintenance Operation Subclass information.
The table reference is incorrect, should be Table 8-43 based on spec r3.1,
v1.0, Auguest 7, 2023.
Other than that,
Reviewed-by: Fan Ni <fan.ni@samsung.com>
>
> Add updates for the above spec change in the CXL events record and CXL
> common trace event implementations.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/trace.h | 13 +++++++++----
> include/cxl/event.h | 3 ++-
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index 8389a94adb1a..7305974e2301 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -166,11 +166,13 @@ TRACE_EVENT(cxl_overflow,
> #define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED BIT(3)
> #define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
> #define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID BIT(6)
> #define show_hdr_flags(flags) __print_flags(flags, " | ", \
> { CXL_EVENT_RECORD_FLAG_PERMANENT, "PERMANENT_CONDITION" }, \
> { CXL_EVENT_RECORD_FLAG_MAINT_NEEDED, "MAINTENANCE_NEEDED" }, \
> { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "PERFORMANCE_DEGRADED" }, \
> - { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "HARDWARE_REPLACEMENT_NEEDED" } \
> + { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "HARDWARE_REPLACEMENT_NEEDED" }, \
> + { CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID, "MAINT_OP_SUB_CLASS_VALID" } \
> )
>
> /*
> @@ -197,7 +199,8 @@ TRACE_EVENT(cxl_overflow,
> __field(u16, hdr_related_handle) \
> __field(u64, hdr_timestamp) \
> __field(u8, hdr_length) \
> - __field(u8, hdr_maint_op_class)
> + __field(u8, hdr_maint_op_class) \
> + __field(u8, hdr_maint_op_sub_class)
>
> #define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \
> __assign_str(memdev); \
> @@ -209,17 +212,19 @@ TRACE_EVENT(cxl_overflow,
> __entry->hdr_handle = le16_to_cpu((hdr).handle); \
> __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
> __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
> - __entry->hdr_maint_op_class = (hdr).maint_op_class
> + __entry->hdr_maint_op_class = (hdr).maint_op_class; \
> + __entry->hdr_maint_op_sub_class = (hdr).maint_op_sub_class
>
> #define CXL_EVT_TP_printk(fmt, ...) \
> TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb " \
> "len=%d flags='%s' handle=%x related_handle=%x " \
> - "maint_op_class=%u : " fmt, \
> + "maint_op_class=%u maint_op_sub_class=%u : " fmt, \
> __get_str(memdev), __get_str(host), __entry->serial, \
> cxl_event_log_type_str(__entry->log), \
> __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
> __entry->hdr_related_handle, __entry->hdr_maint_op_class, \
> + __entry->hdr_maint_op_sub_class, \
> ##__VA_ARGS__)
>
> TRACE_EVENT(cxl_generic_event,
> diff --git a/include/cxl/event.h b/include/cxl/event.h
> index 0bea1afbd747..e1d485ad376b 100644
> --- a/include/cxl/event.h
> +++ b/include/cxl/event.h
> @@ -18,7 +18,8 @@ struct cxl_event_record_hdr {
> __le16 related_handle;
> __le64 timestamp;
> u8 maint_op_class;
> - u8 reserved[15];
> + u8 maint_op_sub_class;
> + u8 reserved[14];
> } __packed;
>
> struct cxl_event_media_hdr {
> --
> 2.43.0
>
--
Fan Ni (From gmail)
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-26 17:02 ` Steven Rostedt
@ 2024-11-27 10:12 ` Shiju Jose
2024-11-27 15:41 ` Steven Rostedt
0 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2024-11-27 10:12 UTC (permalink / raw)
To: Steven Rostedt
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
[-- Attachment #1: Type: text/plain, Size: 2317 bytes --]
>-----Original Message-----
>From: Steven Rostedt <rostedt@goodmis.org>
>Sent: 26 November 2024 17:03
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: Re: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>On Tue, 26 Nov 2024 11:51:23 +0000
>Shiju Jose <shiju.jose@huawei.com> wrote:
>
>> We are encountering a parsing error ("FAILED TO PARSE") from
>> libtraceevent when it tries to parse some of the CXL trace events for the user-
>space tool rasdaemon.
>> This issue appeared after new fields were added to the trace events.
>> It was found that the issue does not occur when all or some of the
>> decoded strings for the event's data and flags are removed from the
>> TP_printk() function in the kernel, and only the values are printed instead.
>> https://elixir.bootlin.com/linux/v6.12/source/drivers/cxl/core/trace.h
>> https://lore.kernel.org/lkml/20241120093745.1847-1-shiju.jose@huawei.c
>> om/
>>
>> Below is the information from the debugging in libtraceevent:
>> The failure occurs in the following functions and locations within libtraceevent:
>> File: src/event-parse.c
>> Function: event_read_format()
>> ret = event_read_fields(event->tep, event, &event->format.fields); if (ret < 0)
>> return ret;
>>
>> Function: event_read_fields()
>> if (test_type_token(type, token, TEP_EVENT_ITEM, "field"))
>> goto fail;
>>
>> Can you recognize if there are any limitations or issues that would
>> prevent libtraceevent from parsing the trace event in the condition described
>above?
>
>Can you show me the output of the format files for the affected trace events:
>
> # cat /sys/kernel/tracing/cxl/<affected_event>/format
>
>You can attach it too if your email does whitespace mangling.
Hi Steve,
Please find attached, output of format file for the CXL general media trace event.
>
>Thanks,
>
>-- Steve
Thanks,
Shiju
[-- Attachment #2: format_cxl_general_media.txt --]
[-- Type: text/plain, Size: 4222 bytes --]
root@localhost:~# cat /sys/kernel/debug/tracing/events/cxl/cxl_general_media/format
name: cxl_general_media
ID: 1464
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:__data_loc char[] memdev; offset:8; size:4; signed:0;
field:__data_loc char[] host; offset:12; size:4; signed:0;
field:int log; offset:16; size:4; signed:1;
field:uuid_t hdr_uuid; offset:20; size:16; signed:0;
field:u64 serial; offset:40; size:8; signed:0;
field:u32 hdr_flags; offset:48; size:4; signed:0;
field:u16 hdr_handle; offset:52; size:2; signed:0;
field:u16 hdr_related_handle; offset:54; size:2; signed:0;
field:u64 hdr_timestamp; offset:56; size:8; signed:0;
field:u8 hdr_length; offset:64; size:1; signed:0;
field:u8 hdr_maint_op_class; offset:65; size:1; signed:0;
field:u8 hdr_maint_op_sub_class; offset:66; size:1; signed:0;
field:u64 dpa; offset:72; size:8; signed:0;
field:u8 descriptor; offset:80; size:1; signed:0;
field:u8 type; offset:81; size:1; signed:0;
field:u8 transaction_type; offset:82; size:1; signed:0;
field:u8 channel; offset:83; size:1; signed:0;
field:u32 device; offset:84; size:4; signed:0;
field:u8 comp_id[16]; offset:88; size:16; signed:0;
field:u64 hpa; offset:104; size:8; signed:0;
field:uuid_t region_uuid; offset:112; size:16; signed:0;
field:u16 validity_flags; offset:128; size:2; signed:0;
field:u8 rank; offset:130; size:1; signed:0;
field:u8 dpa_flags; offset:131; size:1; signed:0;
field:__data_loc char[] region_name; offset:132; size:4; signed:0;
field:u8 sub_type; offset:136; size:1; signed:0;
field:u8 cme_threshold_ev_flags; offset:137; size:1; signed:0;
field:u32 cme_count; offset:140; size:4; signed:0;
print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s' comp_id=%shpa=%llx region=%s region_uuid=%pUb sub_type=%u cme_threshold_ev_flags=%u cme_count=%u", __get_str(memdev), __get_str(host), REC->serial, __print_symbolic(REC->log, { CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN, "Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL, "Fatal" }), REC->hdr_timestamp, &REC->hdr_uuid, REC->hdr_length, __print_flags(REC->hdr_flags, " | ", { ((((1UL))) << (2)), "PERMANENT_CONDITION" }, { ((((1UL))) << (3)), "MAINTENANCE_NEEDED" }, { ((((1UL))) << (4)), "PERFORMANCE_DEGRADED" }, { ((((1UL))) << (5)), "HARDWARE_REPLACEMENT_NEEDED" }, { ((((1UL))) << (6)), "MAINT_OP_SUB_CLASS_VALID" } ), REC->hdr_handle, REC->hdr_related_handle, REC->hdr_maint_op_class, REC->hdr_maint_op_sub_class, REC->dpa, __print_flags(REC->dpa_flags, "|", { ((((1UL))) << (0)), "VOLATILE" }, { ((((1UL))) << (1)), "NOT_REPAIRABLE" } ), __print_flags(REC->descriptor, "|", { ((((1UL))) << (0)), "UNCORRECTABLE_EVENT" }, { ((((1UL))) << (1)), "THRESHOLD_EVENT" }, { ((((1UL))) << (2)), "POISON_LIST_OVERFLOW" } ), __print_symbolic(REC->type, { 0x00, "ECC Error" }, { 0x01, "Invalid Address" }, { 0x02, "Data Path Error" }, { 0x03, "TE State Violation" }, { 0x04, "Scrub Media ECC Error" }, { 0x05, "Adv Prog CME Counter Expiration" }, { 0x06, "CKID Violation" } ), __print_symbolic(REC->transaction_type, { 0x00, "Unknown" }, { 0x01, "Host Read" }, { 0x02, "Host Write" }, { 0x03, "Host Scan Media" }, { 0x04, "Host Inject Poison" }, { 0x05, "Internal Media Scrub" }, { 0x06, "Internal Media Management" }, { 0x07, "Internal Media Error Check Scrub" }, { 0x08, "Media Initialization" } ), REC->channel, REC->rank, REC->device, __print_flags(REC->validity_flags, "|", { ((((1UL))) << (0)), "CHANNEL" }, { ((((1UL))) << (1)), "RANK" }, { ((((1UL))) << (2)), "DEVICE" }, { ((((1UL))) << (3)), "COMPONENT" }, { ((((1UL))) << (4)), "COMPONENT PLDM FORMAT" } ), __print_hex(REC->comp_id, 0x10), REC->hpa, __get_str(region_name), &REC->region_uuid, REC->sub_type, REC->cme_threshold_ev_flags, REC->cme_count
root@localhost:~#
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH v4 1/6] cxl/events: Update Common Event Record to CXL spec rev 3.1
2024-11-26 17:27 ` Fan Ni
@ 2024-11-27 10:15 ` Shiju Jose
0 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2024-11-27 10:15 UTC (permalink / raw)
To: Fan Ni
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, vishal.l.verma@intel.com,
ira.weiny@intel.com, dave@stgolabs.net, linux-cxl@vger.kernel.org,
linux-kernel@vger.kernel.org, Linuxarm, tanxiaofei, Zengtao (B)
>-----Original Message-----
>From: Fan Ni <nifan.cxl@gmail.com>
>Sent: 26 November 2024 17:27
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: Re: [PATCH v4 1/6] cxl/events: Update Common Event Record to CXL
>spec rev 3.1
>
>On Wed, Nov 20, 2024 at 09:37:40AM +0000, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> CXL spec 3.1 section 8.2.9.2.1 Table 8-42, Common Event Record format
>> has updated with Maintenance Operation Subclass information.
>The table reference is incorrect, should be Table 8-43 based on spec r3.1, v1.0,
>Auguest 7, 2023.
>
Hi Fan,
Thanks for reviewing. Will fix.
>Other than that,
>
>
>Reviewed-by: Fan Ni <fan.ni@samsung.com>
>
>>
>> Add updates for the above spec change in the CXL events record and CXL
>> common trace event implementations.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>> drivers/cxl/core/trace.h | 13 +++++++++----
>> include/cxl/event.h | 3 ++-
>> 2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index
>> 8389a94adb1a..7305974e2301 100644
>> --- a/drivers/cxl/core/trace.h
>> +++ b/drivers/cxl/core/trace.h
>> @@ -166,11 +166,13 @@ TRACE_EVENT(cxl_overflow,
>> #define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED BIT(3)
>> #define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
>> #define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
>> +#define CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID BIT(6)
>> #define show_hdr_flags(flags) __print_flags(flags, " | ",
> \
>> { CXL_EVENT_RECORD_FLAG_PERMANENT,
> "PERMANENT_CONDITION" }, \
>> { CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> "MAINTENANCE_NEEDED" }, \
>> { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> "PERFORMANCE_DEGRADED" }, \
>> - { CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> "HARDWARE_REPLACEMENT_NEEDED" } \
>> + { CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> "HARDWARE_REPLACEMENT_NEEDED" }, \
>> + { CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID,
> "MAINT_OP_SUB_CLASS_VALID" } \
>> )
>>
>> /*
>> @@ -197,7 +199,8 @@ TRACE_EVENT(cxl_overflow,
>> __field(u16, hdr_related_handle) \
>> __field(u64, hdr_timestamp) \
>> __field(u8, hdr_length) \
>> - __field(u8, hdr_maint_op_class)
>> + __field(u8, hdr_maint_op_class) \
>> + __field(u8, hdr_maint_op_sub_class)
>>
>> #define CXL_EVT_TP_fast_assign(cxlmd, l, hdr)
> \
>> __assign_str(memdev); \
>> @@ -209,17 +212,19 @@ TRACE_EVENT(cxl_overflow,
>> __entry->hdr_handle = le16_to_cpu((hdr).handle);
> \
>> __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);
> \
>> __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);
> \
>> - __entry->hdr_maint_op_class = (hdr).maint_op_class
>> + __entry->hdr_maint_op_class = (hdr).maint_op_class;
> \
>> + __entry->hdr_maint_op_sub_class = (hdr).maint_op_sub_class
>>
>> #define CXL_EVT_TP_printk(fmt, ...) \
>> TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu
>uuid=%pUb " \
>> "len=%d flags='%s' handle=%x related_handle=%x "
> \
>> - "maint_op_class=%u : " fmt,
> \
>> + "maint_op_class=%u maint_op_sub_class=%u : " fmt,
> \
>> __get_str(memdev), __get_str(host), __entry->serial,
> \
>> cxl_event_log_type_str(__entry->log),
> \
>> __entry->hdr_timestamp, &__entry->hdr_uuid, __entry-
>>hdr_length,\
>> show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,
> \
>> __entry->hdr_related_handle, __entry->hdr_maint_op_class,
> \
>> + __entry->hdr_maint_op_sub_class, \
>> ##__VA_ARGS__)
>>
>> TRACE_EVENT(cxl_generic_event,
>> diff --git a/include/cxl/event.h b/include/cxl/event.h index
>> 0bea1afbd747..e1d485ad376b 100644
>> --- a/include/cxl/event.h
>> +++ b/include/cxl/event.h
>> @@ -18,7 +18,8 @@ struct cxl_event_record_hdr {
>> __le16 related_handle;
>> __le64 timestamp;
>> u8 maint_op_class;
>> - u8 reserved[15];
>> + u8 maint_op_sub_class;
>> + u8 reserved[14];
>> } __packed;
>>
>> struct cxl_event_media_hdr {
>> --
>> 2.43.0
>>
>
>--
>Fan Ni (From gmail)
Thanks,
Shiju
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-27 10:12 ` Shiju Jose
@ 2024-11-27 15:41 ` Steven Rostedt
2024-11-27 18:20 ` Shiju Jose
0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2024-11-27 15:41 UTC (permalink / raw)
To: Shiju Jose
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
On Wed, 27 Nov 2024 10:12:12 +0000
Shiju Jose <shiju.jose@huawei.com> wrote:
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> field:int common_pid; offset:4; size:4; signed:1;
>
> field:__data_loc char[] memdev; offset:8; size:4; signed:0;
> field:__data_loc char[] host; offset:12; size:4; signed:0;
> field:int log; offset:16; size:4; signed:1;
> field:uuid_t hdr_uuid; offset:20; size:16; signed:0;
New type for me ;-)
> field:u64 serial; offset:40; size:8; signed:0;
> field:u32 hdr_flags; offset:48; size:4; signed:0;
> field:u16 hdr_handle; offset:52; size:2; signed:0;
> field:u16 hdr_related_handle; offset:54; size:2; signed:0;
> field:u64 hdr_timestamp; offset:56; size:8; signed:0;
> field:u8 hdr_length; offset:64; size:1; signed:0;
> field:u8 hdr_maint_op_class; offset:65; size:1; signed:0;
> field:u8 hdr_maint_op_sub_class; offset:66; size:1; signed:0;
> field:u64 dpa; offset:72; size:8; signed:0;
> field:u8 descriptor; offset:80; size:1; signed:0;
> field:u8 type; offset:81; size:1; signed:0;
> field:u8 transaction_type; offset:82; size:1; signed:0;
> field:u8 channel; offset:83; size:1; signed:0;
> field:u32 device; offset:84; size:4; signed:0;
> field:u8 comp_id[16]; offset:88; size:16; signed:0;
> field:u64 hpa; offset:104; size:8; signed:0;
> field:uuid_t region_uuid; offset:112; size:16; signed:0;
> field:u16 validity_flags; offset:128; size:2; signed:0;
> field:u8 rank; offset:130; size:1; signed:0;
> field:u8 dpa_flags; offset:131; size:1; signed:0;
> field:__data_loc char[] region_name; offset:132; size:4; signed:0;
> field:u8 sub_type; offset:136; size:1; signed:0;
> field:u8 cme_threshold_ev_flags; offset:137; size:1; signed:0;
> field:u32 cme_count; offset:140; size:4; signed:0;
>
> print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s' comp_id=%shpa=%llx region=%s region_uuid=%pUb sub_type=%u cme_threshold_ev_flags=%u cme_count=%u", __get_str(memdev), __get_str(host), REC->serial, __print_symbolic(REC->log, { CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN, "Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL, "Fatal" }), REC->hdr_timestamp,
> &REC->hdr_uuid,
libtraceevent doesn't know about taking an address with '&'.
If I remove it (and the other one below for region_uuid), it parses fine.
I'll have to add this to the library, as it should be able to handle this.
I bet I also have to add "%pUb" as well.
Thanks,
-- Steve
> REC->hdr_length, __print_flags(REC->hdr_flags, " | ", { ((((1UL))) << (2)), "PERMANENT_CONDITION" }, { ((((1UL))) << (3)), "MAINTENANCE_NEEDED" }, { ((((1UL))) << (4)), "PERFORMANCE_DEGRADED" }, { ((((1UL))) << (5)), "HARDWARE_REPLACEMENT_NEEDED" }, { ((((1UL))) << (6)), "MAINT_OP_SUB_CLASS_VALID" } ), REC->hdr_handle, REC->hdr_related_handle, REC->hdr_maint_op_class, REC->hdr_maint_op_sub_class, REC->dpa, __print_flags(REC->dpa_flags, "|", { ((((1UL))) << (0)), "VOLATILE" }, { ((((1UL))) << (1)), "NOT_REPAIRABLE" } ), __print_flags(REC->descriptor, "|", { ((((1UL))) << (0)), "UNCORRECTABLE_EVENT" }, { ((((1UL))) << (1)), "THRESHOLD_EVENT" }, { ((((1UL))) << (2)), "POISON_LIST_OVERFLOW" } ), __print_symbolic(REC->type, { 0x00, "ECC Error" }, { 0x01, "Invalid Address" }, { 0x02, "Data Path Error" }, { 0x03, "TE State Violation" }, { 0x04, "Scrub Media ECC Error" }, { 0x05, "Adv Prog CME Counter Expiration" }, { 0x06, "CKID Violation" } ), __print_symbolic(REC->transaction_type, { 0x00, "Unknown" }, { 0x01, "Host Read" }, { 0x02, "Host Write" }, { 0x03, "Host Scan Media" }, { 0x04, "Host Inject Poison" }, { 0x05, "Internal Media Scrub" }, { 0x06, "Internal Media Management" }, { 0x07, "Internal Media Error Check Scrub" }, { 0x08, "Media Initialization" } ), REC->channel, REC->rank, REC->device, __print_flags(REC->validity_flags, "|", { ((((1UL))) << (0)), "CHANNEL" }, { ((((1UL))) << (1)), "RANK" }, { ((((1UL))) << (2)), "DEVICE" }, { ((((1UL))) << (3)), "COMPONENT" }, { ((((1UL))) << (4)), "COMPONENT PLDM FORMAT" } ), __print_hex(REC->comp_id, 0x10), REC->hpa, __get_str(region_name), &REC->region_uuid, REC->sub_type, REC->cme_threshold_ev_flags, REC->cme_count
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-27 15:41 ` Steven Rostedt
@ 2024-11-27 18:20 ` Shiju Jose
2024-11-27 18:34 ` Steven Rostedt
0 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2024-11-27 18:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
[-- Attachment #1: Type: text/plain, Size: 4363 bytes --]
Hi Steve,
Thanks for the quick reply.
Please find reply inline.
>-----Original Message-----
>From: Steven Rostedt <rostedt@goodmis.org>
>Sent: 27 November 2024 15:42
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: Re: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>On Wed, 27 Nov 2024 10:12:12 +0000
>Shiju Jose <shiju.jose@huawei.com> wrote:
>
>> format:
>> field:unsigned short common_type; offset:0; size:2;
> signed:0;
>> field:unsigned char common_flags; offset:2; size:1;
> signed:0;
>> field:unsigned char common_preempt_count; offset:3; size:1;
> signed:0;
>> field:int common_pid; offset:4; size:4; signed:1;
>>
>> field:__data_loc char[] memdev; offset:8; size:4;
> signed:0;
>> field:__data_loc char[] host; offset:12; size:4; signed:0;
>> field:int log; offset:16; size:4; signed:1;
>
>> field:uuid_t hdr_uuid; offset:20; size:16; signed:0;
>
>New type for me ;-)
>
>> field:u64 serial; offset:40; size:8; signed:0;
>> field:u32 hdr_flags; offset:48; size:4; signed:0;
>> field:u16 hdr_handle; offset:52; size:2; signed:0;
>> field:u16 hdr_related_handle; offset:54; size:2; signed:0;
>> field:u64 hdr_timestamp; offset:56; size:8; signed:0;
>> field:u8 hdr_length; offset:64; size:1; signed:0;
>> field:u8 hdr_maint_op_class; offset:65; size:1; signed:0;
>> field:u8 hdr_maint_op_sub_class; offset:66; size:1;
> signed:0;
>> field:u64 dpa; offset:72; size:8; signed:0;
>> field:u8 descriptor; offset:80; size:1; signed:0;
>> field:u8 type; offset:81; size:1; signed:0;
>> field:u8 transaction_type; offset:82; size:1; signed:0;
>> field:u8 channel; offset:83; size:1; signed:0;
>> field:u32 device; offset:84; size:4; signed:0;
>> field:u8 comp_id[16]; offset:88; size:16; signed:0;
>> field:u64 hpa; offset:104; size:8; signed:0;
>> field:uuid_t region_uuid; offset:112; size:16; signed:0;
>> field:u16 validity_flags; offset:128; size:2; signed:0;
>> field:u8 rank; offset:130; size:1; signed:0;
>> field:u8 dpa_flags; offset:131; size:1; signed:0;
>> field:__data_loc char[] region_name; offset:132; size:4;
> signed:0;
>> field:u8 sub_type; offset:136; size:1; signed:0;
>> field:u8 cme_threshold_ev_flags; offset:137; size:1;
> signed:0;
>> field:u32 cme_count; offset:140; size:4; signed:0;
>>
>> print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb
>len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u
>maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s'
>transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s'
>comp_id=%shpa=%llx region=%s region_uuid=%pUb sub_type=%u
>cme_threshold_ev_flags=%u cme_count=%u", __get_str(memdev),
>__get_str(host), REC->serial, __print_symbolic(REC->log, {
>CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN,
>"Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL,
>"Fatal" }), REC->hdr_timestamp,
>
>
>> &REC->hdr_uuid,
>
>libtraceevent doesn't know about taking an address with '&'.
>
>If I remove it (and the other one below for region_uuid), it parses fine.
>
>I'll have to add this to the library, as it should be able to handle this.
>I bet I also have to add "%pUb" as well.
>
I tested removing hdr_uuid and region_uuid from the rasdaemon test setup
as you suggested. As a result, libtraceevent parses correctly, as you mentioned.
However, I encounter similar parsing error ("FAILED TO PARSE") when I add two additional
decoded strings (%s) to the TP_printk, replacing (%u). Please see the attached format file,
"format_cxl_general_media_v3.1_basic", for your reference.
I've also attached another format file, "format_cxl_general_media_v3.1_full",
which contains the complete TP_printk() intended.
Can you please help or else can you share how to debug these errors in the
libtraceevent setup?
>Thanks,
>
>-- Steve
>
Thanks,
Shiju
[-- Attachment #2: format_cxl_general_media_v3.1_full.txt --]
[-- Type: text/plain, Size: 4999 bytes --]
root@localhost:~# cat /sys/kernel/debug/tracing/events/cxl/cxl_general_media/format
name: cxl_general_media
ID: 1464
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:__data_loc char[] memdev; offset:8; size:4; signed:0;
field:__data_loc char[] host; offset:12; size:4; signed:0;
field:int log; offset:16; size:4; signed:1;
field:u64 serial; offset:24; size:8; signed:0;
field:u32 hdr_flags; offset:32; size:4; signed:0;
field:u16 hdr_handle; offset:36; size:2; signed:0;
field:u16 hdr_related_handle; offset:38; size:2; signed:0;
field:u64 hdr_timestamp; offset:40; size:8; signed:0;
field:u8 hdr_length; offset:48; size:1; signed:0;
field:u8 hdr_maint_op_class; offset:49; size:1; signed:0;
field:u8 hdr_maint_op_sub_class; offset:50; size:1; signed:0;
field:u64 dpa; offset:56; size:8; signed:0;
field:u8 descriptor; offset:64; size:1; signed:0;
field:u8 type; offset:65; size:1; signed:0;
field:u8 sub_type; offset:66; size:1; signed:0;
field:u8 transaction_type; offset:67; size:1; signed:0;
field:u8 channel; offset:68; size:1; signed:0;
field:u32 device; offset:72; size:4; signed:0;
field:u8 comp_id[16]; offset:76; size:16; signed:0;
field:u64 hpa; offset:96; size:8; signed:0;
field:u16 validity_flags; offset:104; size:2; signed:0;
field:u8 rank; offset:106; size:1; signed:0;
field:u8 dpa_flags; offset:107; size:1; signed:0;
field:u8 cme_threshold_ev_flags; offset:108; size:1; signed:0;
field:u32 cme_count; offset:112; size:4; signed:0;
field:__data_loc char[] region_name; offset:116; size:4; signed:0;
print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' sub_type='%s' transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s' comp_id=%s comp_id_pldm_valid_flags='%s' pldm_entity_id=%s pldm_resource_id=%s hpa=%llx cme_threshold_ev_flags='%s' cme_count=%x region=%s", __get_str(memdev), __get_str(host), REC->serial, __print_symbolic(REC->log, { CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN, "Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL, "Fatal" }), REC->hdr_timestamp, REC->hdr_length, __print_flags(REC->hdr_flags, " | ", { ((((1UL))) << (2)), "PERMANENT_CONDITION" }, { ((((1UL))) << (3)), "MAINTENANCE_NEEDED" }, { ((((1UL))) << (4)), "PERFORMANCE_DEGRADED" }, { ((((1UL))) << (5)), "HARDWARE_REPLACEMENT_NEEDED" }, { ((((1UL))) << (6)), "MAINT_OP_SUB_CLASS_VALID" } ), REC->hdr_handle, REC->hdr_related_handle, REC->hdr_maint_op_class, REC->hdr_maint_op_sub_class, REC->dpa, __print_flags(REC->dpa_flags, "|", { ((((1UL))) << (0)), "VOLATILE" }, { ((((1UL))) << (1)), "NOT_REPAIRABLE" } ), __print_flags(REC->descriptor, "|", { ((((1UL))) << (0)), "UNCORRECTABLE_EVENT" }, { ((((1UL))) << (1)), "THRESHOLD_EVENT" }, { ((((1UL))) << (2)), "POISON_LIST_OVERFLOW" } ), __print_symbolic(REC->type, { 0x00, "ECC Error" }, { 0x01, "Invalid Address" }, { 0x02, "Data Path Error" }, { 0x03, "TE State Violation" }, { 0x04, "Scrub Media ECC Error" }, { 0x05, "Adv Prog CME Counter Expiration" }, { 0x06, "CKID Violation" } ), __print_symbolic(REC->sub_type, { 0x00, "Not Reported" }, { 0x01, "Internal Datapath Error" }, { 0x02, "Media Link Command Training Error" }, { 0x03, "Media Link Control Training Error" }, { 0x04, "Media Link Data Training Error" }, { 0x05, "Media Link CRC Error" } ), __print_symbolic(REC->transaction_type, { 0x00, "Unknown" }, { 0x01, "Host Read" }, { 0x02, "Host Write" }, { 0x03, "Host Scan Media" }, { 0x04, "Host Inject Poison" }, { 0x05, "Internal Media Scrub" }, { 0x06, "Internal Media Management" }, { 0x07, "Internal Media Error Check Scrub" }, { 0x08, "Media Initialization" } ), REC->channel, REC->rank, REC->device, __print_flags(REC->validity_flags, "|", { ((((1UL))) << (0)), "CHANNEL" }, { ((((1UL))) << (1)), "RANK" }, { ((((1UL))) << (2)), "DEVICE" }, { ((((1UL))) << (3)), "COMPONENT" }, { ((((1UL))) << (4)), "COMPONENT PLDM FORMAT" } ), __print_hex(REC->comp_id, 0x10), __print_flags(REC->comp_id[0], " | ", { ((((1UL))) << (0)), "PLDM Entity ID" }, { ((((1UL))) << (1)), "Resource ID" } ), (REC->validity_flags & ((((1UL))) << (3)) && REC->validity_flags & ((((1UL))) << (4))) ? (REC->comp_id[0] & ((((1UL))) << (0))) ? __print_hex(&REC->comp_id[1], 6) : "0x00" : "0x00", (REC->validity_flags & ((((1UL))) << (3)) && REC->validity_flags & ((((1UL))) << (4))) ? (REC->comp_id[0] & ((((1UL))) << (1))) ? __print_hex(&REC->comp_id[7], 4) : "0x00" : "0x00", REC->hpa, __print_flags(REC->cme_threshold_ev_flags, "|", { ((((1UL))) << (0)), "Corrected Memory Errors in Multiple Media Components" }, { ((((1UL))) << (1)), "Exceeded Programmable Threshold" } ), REC->cme_count, __get_str(region_name)
[-- Attachment #3: format_cxl_general_media_v3.1_basic.txt --]
[-- Type: text/plain, Size: 4463 bytes --]
root@localhost:~# cat /sys/kernel/debug/tracing/events/cxl/cxl_general_media/format
name: cxl_general_media
ID: 1464
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:__data_loc char[] memdev; offset:8; size:4; signed:0;
field:__data_loc char[] host; offset:12; size:4; signed:0;
field:int log; offset:16; size:4; signed:1;
field:u64 serial; offset:24; size:8; signed:0;
field:u32 hdr_flags; offset:32; size:4; signed:0;
field:u16 hdr_handle; offset:36; size:2; signed:0;
field:u16 hdr_related_handle; offset:38; size:2; signed:0;
field:u64 hdr_timestamp; offset:40; size:8; signed:0;
field:u8 hdr_length; offset:48; size:1; signed:0;
field:u8 hdr_maint_op_class; offset:49; size:1; signed:0;
field:u8 hdr_maint_op_sub_class; offset:50; size:1; signed:0;
field:u64 dpa; offset:56; size:8; signed:0;
field:u8 descriptor; offset:64; size:1; signed:0;
field:u8 type; offset:65; size:1; signed:0;
field:u8 transaction_type; offset:66; size:1; signed:0;
field:u8 channel; offset:67; size:1; signed:0;
field:u32 device; offset:68; size:4; signed:0;
field:u8 comp_id[16]; offset:72; size:16; signed:0;
field:u64 hpa; offset:88; size:8; signed:0;
field:u16 validity_flags; offset:96; size:2; signed:0;
field:u8 rank; offset:98; size:1; signed:0;
field:u8 dpa_flags; offset:99; size:1; signed:0;
field:__data_loc char[] region_name; offset:100; size:4; signed:0;
field:u8 sub_type; offset:104; size:1; signed:0;
field:u8 cme_threshold_ev_flags; offset:105; size:1; signed:0;
field:u32 cme_count; offset:108; size:4; signed:0;
print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s' comp_id=%s hpa=%llx region=%s sub_type='%s' cme_threshold_ev_flags='%s' cme_count=%u", __get_str(memdev), __get_str(host), REC->serial, __print_symbolic(REC->log, { CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN, "Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL, "Fatal" }), REC->hdr_timestamp, REC->hdr_length, __print_flags(REC->hdr_flags, " | ", { ((((1UL))) << (2)), "PERMANENT_CONDITION" }, { ((((1UL))) << (3)), "MAINTENANCE_NEEDED" }, { ((((1UL))) << (4)), "PERFORMANCE_DEGRADED" }, { ((((1UL))) << (5)), "HARDWARE_REPLACEMENT_NEEDED" }, { ((((1UL))) << (6)), "MAINT_OP_SUB_CLASS_VALID" } ), REC->hdr_handle, REC->hdr_related_handle, REC->hdr_maint_op_class, REC->hdr_maint_op_sub_class, REC->dpa, __print_flags(REC->dpa_flags, "|", { ((((1UL))) << (0)), "VOLATILE" }, { ((((1UL))) << (1)), "NOT_REPAIRABLE" } ), __print_flags(REC->descriptor, "|", { ((((1UL))) << (0)), "UNCORRECTABLE_EVENT" }, { ((((1UL))) << (1)), "THRESHOLD_EVENT" }, { ((((1UL))) << (2)), "POISON_LIST_OVERFLOW" } ), __print_symbolic(REC->type, { 0x00, "ECC Error" }, { 0x01, "Invalid Address" }, { 0x02, "Data Path Error" }, { 0x03, "TE State Violation" }, { 0x04, "Scrub Media ECC Error" }, { 0x05, "Adv Prog CME Counter Expiration" }, { 0x06, "CKID Violation" } ), __print_symbolic(REC->transaction_type, { 0x00, "Unknown" }, { 0x01, "Host Read" }, { 0x02, "Host Write" }, { 0x03, "Host Scan Media" }, { 0x04, "Host Inject Poison" }, { 0x05, "Internal Media Scrub" }, { 0x06, "Internal Media Management" }, { 0x07, "Internal Media Error Check Scrub" }, { 0x08, "Media Initialization" } ), REC->channel, REC->rank, REC->device, __print_flags(REC->validity_flags, "|", { ((((1UL))) << (0)), "CHANNEL" }, { ((((1UL))) << (1)), "RANK" }, { ((((1UL))) << (2)), "DEVICE" }, { ((((1UL))) << (3)), "COMPONENT" }, { ((((1UL))) << (4)), "COMPONENT PLDM FORMAT" } ), __print_hex(REC->comp_id, 0x10), REC->hpa, __get_str(region_name), __print_symbolic(REC->sub_type, { 0x00, "Not Reported" }, { 0x01, "Internal Datapath Error" }, { 0x02, "Media Link Command Training Error" }, { 0x03, "Media Link Control Training Error" }, { 0x04, "Media Link Data Training Error" }, { 0x05, "Media Link CRC Error" } ), __print_flags(REC->cme_threshold_ev_flags, "|", { ((((1UL))) << (0)), "Corrected Memory Errors in Multiple Media Components" }, { ((((1UL))) << (1)), "Exceeded Programmable Threshold" } ), REC->cme_count
root@localhost:~#
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-27 18:20 ` Shiju Jose
@ 2024-11-27 18:34 ` Steven Rostedt
2024-11-28 10:01 ` Shiju Jose
0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2024-11-27 18:34 UTC (permalink / raw)
To: Shiju Jose
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
[-- Attachment #1: Type: text/plain, Size: 1593 bytes --]
On Wed, 27 Nov 2024 18:20:26 +0000
Shiju Jose <shiju.jose@huawei.com> wrote:
> I tested removing hdr_uuid and region_uuid from the rasdaemon test setup
> as you suggested. As a result, libtraceevent parses correctly, as you mentioned.
>
> However, I encounter similar parsing error ("FAILED TO PARSE") when I add two additional
> decoded strings (%s) to the TP_printk, replacing (%u). Please see the attached format file,
> "format_cxl_general_media_v3.1_basic", for your reference.
Are you sure. I don't see anything wrong with that one. Which version of
libtraceevent do you have?
>
> I've also attached another format file, "format_cxl_general_media_v3.1_full",
> which contains the complete TP_printk() intended.
This one has some complex arguments and also uses the '&' address of an
argument.
>
> Can you please help or else can you share how to debug these errors in the
> libtraceevent setup?
Basically, I use the attached program (that just links to libtraceevent).
Note, I need to delete the first line of your files, which has the "cat"
command. But you can run this on the file itself:
./tep_load_event /sys/kernel/tracing/events/cxl/cxl_general_media/format
But you may need to be root to do that. If root just copies that file, you
can then run it as non-root.
# cp /sys/kernel/tracing/events/cxl/cxl_general_media/format /tmp
$ ./tep_load_event /tmp/format
I run it under gdb and see where it fails. But it should let you know if it
will pass or not. I put a breakpoint on tep_warning and when it gets hit, I
examine what it did up to that point.
-- Steve
[-- Attachment #2: tep_load_event.c --]
[-- Type: text/x-c++src, Size: 1627 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <event-parse.h>
static char *argv0;
static char *get_this_name(void)
{
static char *this_name;
char *arg;
char *p;
if (this_name)
return this_name;
arg = argv0;
p = arg+strlen(arg);
while (p >= arg && *p != '/')
p--;
p++;
this_name = p;
return p;
}
static void usage(void)
{
char *p = get_this_name();
printf("usage: %s exec\n"
"\n",p);
exit(-1);
}
static void __vdie(const char *fmt, va_list ap, int err)
{
int ret = errno;
char *p = get_this_name();
if (err && errno)
perror(p);
else
ret = -1;
fprintf(stderr, " ");
vfprintf(stderr, fmt, ap);
fprintf(stderr, "\n");
exit(ret);
}
void die(const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
__vdie(fmt, ap, 0);
va_end(ap);
}
void pdie(const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
__vdie(fmt, ap, 1);
va_end(ap);
}
int main (int argc, char **argv)
{
struct tep_handle *tep;
off_t size;
char *buf;
int ret;
argv0 = argv[0];
if (argc < 2)
usage();
tep = tep_alloc();
if (!tep)
pdie("tep_alloc");
tep_set_loglevel(TEP_LOG_ALL);
ret = open(argv[1], O_RDONLY);
if (ret < 0)
pdie(argv[1]);
size = lseek(ret, 0, SEEK_END);
if (size < 0)
pdie("lseek");
lseek(ret, 0, SEEK_SET);
buf = malloc(size);
if (!buf)
pdie("malloc");
size = read(ret, buf, size);
if (size < 0)
pdie("read");
close(ret);
ret = tep_parse_event(tep, buf, size, "sys");
if (ret < 0)
pdie("tep_parse_event");
free(buf);
tep_free(tep);
return 0;
}
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-27 18:34 ` Steven Rostedt
@ 2024-11-28 10:01 ` Shiju Jose
2024-11-29 13:22 ` Shiju Jose
0 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2024-11-28 10:01 UTC (permalink / raw)
To: Steven Rostedt
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
>-----Original Message-----
>From: Steven Rostedt <rostedt@goodmis.org>
>Sent: 27 November 2024 18:34
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: Re: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>On Wed, 27 Nov 2024 18:20:26 +0000
>Shiju Jose <shiju.jose@huawei.com> wrote:
>
>> I tested removing hdr_uuid and region_uuid from the rasdaemon test
>> setup as you suggested. As a result, libtraceevent parses correctly, as you
>mentioned.
>>
>> However, I encounter similar parsing error ("FAILED TO PARSE") when I
>> add two additional decoded strings (%s) to the TP_printk, replacing
>> (%u). Please see the attached format file,
>"format_cxl_general_media_v3.1_basic", for your reference.
>
>Are you sure. I don't see anything wrong with that one. Which version of
>libtraceevent do you have?
libtraceevent source code version 1.8.4, build for arm64.
>
>>
>> I've also attached another format file,
>> "format_cxl_general_media_v3.1_full",
>> which contains the complete TP_printk() intended.
>
>This one has some complex arguments and also uses the '&' address of an
>argument.
Thanks.
I will debug this when basic one is working.
>
>>
>> Can you please help or else can you share how to debug these errors in
>> the libtraceevent setup?
>
>Basically, I use the attached program (that just links to libtraceevent).
>
>Note, I need to delete the first line of your files, which has the "cat"
>command. But you can run this on the file itself:
>
> ./tep_load_event /sys/kernel/tracing/events/cxl/cxl_general_media/format
>
>But you may need to be root to do that. If root just copies that file, you can then
>run it as non-root.
>
> # cp /sys/kernel/tracing/events/cxl/cxl_general_media/format /tmp $
>./tep_load_event /tmp/format
>
>I run it under gdb and see where it fails. But it should let you know if it will pass
>or not. I put a breakpoint on tep_warning and when it gets hit, I examine what it
>did up to that point.
Thanks Steve for the instructions. I will try this.
>
>-- Steve
Thanks,
Shiju
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-28 10:01 ` Shiju Jose
@ 2024-11-29 13:22 ` Shiju Jose
2024-12-03 15:21 ` Shiju Jose
0 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2024-11-29 13:22 UTC (permalink / raw)
To: Shiju Jose, Steven Rostedt
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
>-----Original Message-----
>From: Shiju Jose <shiju.jose@huawei.com>
>Sent: 28 November 2024 10:02
>To: Steven Rostedt <rostedt@goodmis.org>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>>-----Original Message-----
>>From: Steven Rostedt <rostedt@goodmis.org>
>>Sent: 27 November 2024 18:34
>>To: Shiju Jose <shiju.jose@huawei.com>
>>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
>><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>>dave@stgolabs.net; linux-cxl@vger.kernel.org;
>>linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
>>tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
>><prime.zeng@hisilicon.com>
>>Subject: Re: [PATCH v4 3/6] cxl/events: Update General Media Event
>>Record to CXL spec rev 3.1
>>
>>On Wed, 27 Nov 2024 18:20:26 +0000
>>Shiju Jose <shiju.jose@huawei.com> wrote:
>>
>>> I tested removing hdr_uuid and region_uuid from the rasdaemon test
>>> setup as you suggested. As a result, libtraceevent parses correctly,
>>> as you
>>mentioned.
>>>
>>> However, I encounter similar parsing error ("FAILED TO PARSE") when
>>> I add two additional decoded strings (%s) to the TP_printk, replacing
>>> (%u). Please see the attached format file,
>>"format_cxl_general_media_v3.1_basic", for your reference.
>>
>>Are you sure. I don't see anything wrong with that one. Which version
>>of libtraceevent do you have?
>
>libtraceevent source code version 1.8.4, build for arm64.
Hi Steve,
I debug this case and please find the info,
1. rasdaemon : read() from format file return around 1/3rd of the
actual data in the file only when the total size of the format's file
is above 4K bytes (page size), For example, in this case, the total size was 4512 bytes,
but read only 1674 bytes.
This partial data resulted tep_parse_event() in libtraceevent failed internally in the parse_format()
and since __parse_event() does not return error when parse_format() fail,
thus initialization of the event does not fail.
The following solution in rasdaemon solved the issue,
(provided if no other fix for the above issue with read()),
1. read() and accumulate content of format file until EOF reached.
2. Increased the buffer size from 4K bytes to 8K bytes.
3. May be __parse_event() in libtraceevent and thus tep_parse_event() return error
when parse_format() fail so that the initialization would fail if the input data is
corrupted or partial?
>
>>
>>>
>>> I've also attached another format file,
>>> "format_cxl_general_media_v3.1_full",
>>> which contains the complete TP_printk() intended.
>>
>>This one has some complex arguments and also uses the '&' address of an
>>argument.
>Thanks.
>I will debug this when basic one is working.
This case too worked with above workaround after removing '&' address
from the TP_printk().
>>
>>>
>>> Can you please help or else can you share how to debug these errors
>>> in the libtraceevent setup?
>>
>>Basically, I use the attached program (that just links to libtraceevent).
>>
>>Note, I need to delete the first line of your files, which has the "cat"
>>command. But you can run this on the file itself:
>>
>> ./tep_load_event
>> /sys/kernel/tracing/events/cxl/cxl_general_media/format
>>
>>But you may need to be root to do that. If root just copies that file,
>>you can then run it as non-root.
>>
>> # cp /sys/kernel/tracing/events/cxl/cxl_general_media/format /tmp $
>>./tep_load_event /tmp/format
>>
>>I run it under gdb and see where it fails. But it should let you know
>>if it will pass or not. I put a breakpoint on tep_warning and when it
>>gets hit, I examine what it did up to that point.
>
>Thanks Steve for the instructions. I will try this.
I built tep_load_event.c for arm64 and it worked after copy
format file to the /tmp/ folder as you said.
>>
>>-- Steve
>
Thanks,
Shiju
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-11-29 13:22 ` Shiju Jose
@ 2024-12-03 15:21 ` Shiju Jose
2024-12-04 11:35 ` Shiju Jose
0 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2024-12-03 15:21 UTC (permalink / raw)
To: Steven Rostedt
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
>-----Original Message-----
>From: Shiju Jose <shiju.jose@huawei.com>
>Sent: 29 November 2024 13:23
>To: Shiju Jose <shiju.jose@huawei.com>; Steven Rostedt <rostedt@goodmis.org>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>>-----Original Message-----
>>From: Shiju Jose <shiju.jose@huawei.com>
>>Sent: 28 November 2024 10:02
>>To: Steven Rostedt <rostedt@goodmis.org>
>>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
>><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>>dave@stgolabs.net; linux-cxl@vger.kernel.org;
>>linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
>>tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
>><prime.zeng@hisilicon.com>
>>Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event
>>Record to CXL spec rev 3.1
>>
[...]
>
>Hi Steve,
>
>I debug this case and please find the info, 1. rasdaemon : read() from format
>file return around 1/3rd of the
> actual data in the file only when the total size of the format's file
> is above 4K bytes (page size), For example, in this case, the total size was 4512
>bytes,
> but read only 1674 bytes.
> This partial data resulted tep_parse_event() in libtraceevent failed internally
>in the parse_format()
> and since __parse_event() does not return error when parse_format() fail,
> thus initialization of the event does not fail.
>
> The following solution in rasdaemon solved the issue,
> (provided if no other fix for the above issue with read()),
> 1. read() and accumulate content of format file until EOF reached.
> 2. Increased the buffer size from 4K bytes to 8K bytes.
> 3. May be __parse_event() in libtraceevent and thus tep_parse_event()
>return error
> when parse_format() fail so that the initialization would fail if the input
>data is
> corrupted or partial?
Hi Steve,
Identified the root cause of this issue in the kernel:
The read() function for the event's format file calls seq_read() and seq_read_iter() in the kernel,
which allocates a buffer of PAGE_SIZE (4 KB) for sequential reads. However, it
detects an overflow in the following marked call (seq_has_overflowed()) and
returns with partial data read.
=====================================
File : https://elixir.bootlin.com/linux/v6.13-rc1/source/fs/seq_file.c
ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct seq_file *m = iocb->ki_filp->private_data;
[...]
/* grab buffer if we didn't have one */
if (!m->buf) {
---------> m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
if (!m->buf)
goto Enomem;
}
// something left in the buffer - copy it out first
[...]
// get a non-empty record in the buffer
m->from = 0;
p = m->op->start(m, &m->index);
while (1) {
err = PTR_ERR(p);
if (!p || IS_ERR(p)) // EOF or an error
break;
[...]
}
// EOF or an error
m->op->stop(m, p);
m->count = 0;
goto Done;
Fill:
// one non-empty record is in the buffer; if they want more,
// try to fit more in, but in any case we need to advance
// the iterator once for every record shown.
while (1) {
size_t offs = m->count;
loff_t pos = m->index;
[...]
err = m->op->show(m, p);
if (err > 0) { // ->show() says "skip it"
m->count = offs;
----------> } else if (err || seq_has_overflowed(m)) {
m->count = offs;
break;
}
}
m->op->stop(m, p);
n = copy_to_iter(m->buf, m->count, iter);
copied += n;
m->count -= n;
m->from = n;
Done:
[...]
goto Done;
}
EXPORT_SYMBOL(seq_read_iter);
====================================
The following fix in the kernel's seq_read_iter() allows userspace to read the
content of the format file etc in one go if the requested size exceeds PAGE_SIZE,
and resolves the parsing error caused by the event's format file exceeding PAGE_SIZE.
====================================
diff --git a/fs/seq_file.c b/fs/seq_file.c index e676c8b0cf5d..f1f1af180562 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -207,7 +207,11 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
/* grab buffer if we didn't have one */
if (!m->buf) {
- m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
+ if (iter->count % PAGE_SIZE)
+ m->size = ((iter->count / PAGE_SIZE) + 1) * PAGE_SIZE;
+ else
+ m->size = (iter->count / PAGE_SIZE) * PAGE_SIZE;
+ m->buf = seq_buf_alloc(m->size);
if (!m->buf)
goto Enomem;
}
====================================
Can you suggest which fix is the right one, kernel based fix or rasdaemon based fix (which
mentioned in the previous email)?
Thanks,
Shiju
>>
>>>
>>>>
>>>> I've also attached another format file,
>>>> "format_cxl_general_media_v3.1_full",
>>>> which contains the complete TP_printk() intended.
>>>
[...]
>>>-- Steve
>>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
2024-12-03 15:21 ` Shiju Jose
@ 2024-12-04 11:35 ` Shiju Jose
0 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2024-12-04 11:35 UTC (permalink / raw)
To: Shiju Jose, Steven Rostedt
Cc: dave.jiang@intel.com, dan.j.williams@intel.com, Jonathan Cameron,
alison.schofield@intel.com, nifan.cxl@gmail.com,
vishal.l.verma@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
tanxiaofei, Zengtao (B)
>-----Original Message-----
>From: Shiju Jose <shiju.jose@huawei.com>
>Sent: 03 December 2024 15:22
>To: Steven Rostedt <rostedt@goodmis.org>
>Cc: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org; linux-kernel@vger.kernel.org;
>Linuxarm <linuxarm@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>Zengtao (B) <prime.zeng@hisilicon.com>
>Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>[...]
>>
>>Hi Steve,
>>
>>I debug this case and please find the info, 1. rasdaemon : read() from
>>format file return around 1/3rd of the
>> actual data in the file only when the total size of the format's file
>> is above 4K bytes (page size), For example, in this case, the total
>>size was 4512 bytes,
>> but read only 1674 bytes.
>> This partial data resulted tep_parse_event() in libtraceevent
>>failed internally in the parse_format()
>> and since __parse_event() does not return error when parse_format() fail,
>> thus initialization of the event does not fail.
>>
>> The following solution in rasdaemon solved the issue,
>> (provided if no other fix for the above issue with read()),
>> 1. read() and accumulate content of format file until EOF reached.
>> 2. Increased the buffer size from 4K bytes to 8K bytes.
>> 3. May be __parse_event() in libtraceevent and thus
>>tep_parse_event() return error
>> when parse_format() fail so that the initialization would fail
>>if the input data is
>> corrupted or partial?
>
>Hi Steve,
>
>Identified the root cause of this issue in the kernel:
>The read() function for the event's format file calls seq_read() and
>seq_read_iter() in the kernel, which allocates a buffer of PAGE_SIZE (4 KB) for
>sequential reads. However, it detects an overflow in the following marked call
>(seq_has_overflowed()) and returns with partial data read.
>
>=====================================
>
>File : https://elixir.bootlin.com/linux/v6.13-rc1/source/fs/seq_file.c
>
>ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter) {
> struct seq_file *m = iocb->ki_filp->private_data; [...]
> /* grab buffer if we didn't have one */
> if (!m->buf) {
>---------> m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
> if (!m->buf)
> goto Enomem;
> }
> // something left in the buffer - copy it out first [...]
> // get a non-empty record in the buffer
> m->from = 0;
> p = m->op->start(m, &m->index);
> while (1) {
> err = PTR_ERR(p);
> if (!p || IS_ERR(p)) // EOF or an error
> break;
>[...]
> }
> // EOF or an error
> m->op->stop(m, p);
> m->count = 0;
> goto Done;
>Fill:
> // one non-empty record is in the buffer; if they want more,
> // try to fit more in, but in any case we need to advance
> // the iterator once for every record shown.
> while (1) {
> size_t offs = m->count;
> loff_t pos = m->index;
>
>[...]
> err = m->op->show(m, p);
> if (err > 0) { // ->show() says "skip it"
> m->count = offs;
>----------> } else if (err || seq_has_overflowed(m)) {
> m->count = offs;
> break;
> }
> }
> m->op->stop(m, p);
> n = copy_to_iter(m->buf, m->count, iter);
> copied += n;
> m->count -= n;
> m->from = n;
>Done:
>[...]
> goto Done;
>}
>EXPORT_SYMBOL(seq_read_iter);
>====================================
>
>The following fix in the kernel's seq_read_iter() allows userspace to read the
>content of the format file etc in one go if the requested size exceeds PAGE_SIZE,
>and resolves the parsing error caused by the event's format file exceeding
>PAGE_SIZE.
>
>====================================
>diff --git a/fs/seq_file.c b/fs/seq_file.c index e676c8b0cf5d..f1f1af180562
>100644
>--- a/fs/seq_file.c
>+++ b/fs/seq_file.c
>@@ -207,7 +207,11 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter
>*iter)
>
> /* grab buffer if we didn't have one */
> if (!m->buf) {
>- m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
>+ if (iter->count % PAGE_SIZE)
>+ m->size = ((iter->count / PAGE_SIZE) + 1) * PAGE_SIZE;
>+ else
>+ m->size = (iter->count / PAGE_SIZE) * PAGE_SIZE;
>+ m->buf = seq_buf_alloc(m->size);
> if (!m->buf)
> goto Enomem;
> }
>====================================
>
>Can you suggest which fix is the right one, kernel based fix or rasdaemon based
>fix (which mentioned in the previous email)?
Following change for reading trace event's format file in the kernel worked fine to fix
the parsing error, which seems better than changing the common seq_read().
============================================
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7266ec2a4eea..9cb1c5a1f070 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1674,6 +1674,35 @@ static int trace_format_open(struct inode *inode, struct file *file)
return 0;
}
+/**
+ * trace_format_read - read() method for format file.
+ * @file: the file to read from
+ * @buf: the buffer to read to
+ * @size: the maximum number of bytes to read
+ * @ppos: the current position in the file
+ *
+ * * Return:
+ * * %0 - No bytes copied (EOF).
+ * * %>0 - Number of bytes copied.
+ * * %<0 - Error code.
+ */
+static ssize_t trace_format_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
+{
+ ssize_t copied = 0;
+ ssize_t ret;
+
+ do {
+ ret = seq_read(file, buf + copied, size - copied, ppos);
+ if (ret < 0)
+ return ret;
+ copied += ret;
+ if (copied >= size)
+ break;
+ } while (ret);
+
+ return copied;
+}
+
#ifdef CONFIG_PERF_EVENTS
static ssize_t
event_id_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
@@ -2157,7 +2186,7 @@ static const struct file_operations ftrace_enable_fops = {
static const struct file_operations ftrace_event_format_fops = {
.open = trace_format_open,
- .read = seq_read,
+ .read = trace_format_read,
.llseek = seq_lseek,
.release = seq_release,
};
=============================================
>
>[...]
>>>>-- Steve
>>>
>>
>
Thanks,
Shiju
^ permalink raw reply related [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-12-04 11:35 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-20 9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
2024-11-20 9:37 ` [PATCH v4 1/6] cxl/events: Update Common Event Record " shiju.jose
2024-11-26 17:27 ` Fan Ni
2024-11-27 10:15 ` Shiju Jose
2024-11-20 9:37 ` [PATCH v4 2/6] cxl/events: Add Component Identifier formatting for " shiju.jose
2024-11-20 9:37 ` [PATCH v4 3/6] cxl/events: Update General Media Event Record to " shiju.jose
2024-11-26 11:51 ` Shiju Jose
2024-11-26 17:02 ` Steven Rostedt
2024-11-27 10:12 ` Shiju Jose
2024-11-27 15:41 ` Steven Rostedt
2024-11-27 18:20 ` Shiju Jose
2024-11-27 18:34 ` Steven Rostedt
2024-11-28 10:01 ` Shiju Jose
2024-11-29 13:22 ` Shiju Jose
2024-12-03 15:21 ` Shiju Jose
2024-12-04 11:35 ` Shiju Jose
2024-11-20 9:37 ` [PATCH v4 4/6] cxl/events: Update DRAM " shiju.jose
2024-11-20 9:37 ` [PATCH v4 5/6] cxl/events: Update Memory Module " shiju.jose
2024-11-20 9:37 ` [PATCH v4 6/6] cxl/test: Update test code for event records " shiju.jose
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox