From: Shiju Jose <shiju.jose@huawei.com>
To: "dave.jiang@intel.com" <dave.jiang@intel.com>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Cc: "rostedt@goodmis.org" <rostedt@goodmis.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linuxarm <linuxarm@huawei.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>
Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1
Date: Tue, 26 Nov 2024 11:51:23 +0000
Message-ID: <180fcfd623c64cdb86cdc9059f749af0@huawei.com>
In-Reply-To: <20241120093745.1847-4-shiju.jose@huawei.com>

CC Steven Rostedt.

Hi Steven,

We are seeing a parsing error ("FAILED TO PARSE") from libtraceevent when it
parses some of the CXL trace events for the user-space tool rasdaemon.
The issue appeared after new fields were added to the trace events.
It does not occur when some or all of the decoded strings for the event's
data and flags are removed from TP_printk() in the kernel and only the raw
values are printed instead.
https://elixir.bootlin.com/linux/v6.12/source/drivers/cxl/core/trace.h
https://lore.kernel.org/lkml/20241120093745.1847-1-shiju.jose@huawei.com/

Debugging libtraceevent shows that the failure occurs at the following locations:
File: src/event-parse.c
Function: event_read_format()
ret = event_read_fields(event->tep, event, &event->format.fields);
if (ret < 0)
    return ret;

Function: event_read_fields()
if (test_type_token(type, token, TEP_EVENT_ITEM, "field"))
    goto fail;

Are you aware of any limitations or issues in libtraceevent that would prevent
it from parsing a trace event under the conditions described above?

Thanks,
Shiju

>-----Original Message-----
>From: Shiju Jose <shiju.jose@huawei.com>
>Sent: 20 November 2024 09:38
>To: dave.jiang@intel.com; dan.j.williams@intel.com; Jonathan Cameron
><jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>nifan.cxl@gmail.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>dave@stgolabs.net; linux-cxl@vger.kernel.org
>Cc: linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>;
>tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
>Shiju Jose <shiju.jose@huawei.com>
>Subject: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL
>spec rev 3.1
>
>From: Shiju Jose <shiju.jose@huawei.com>
>
>CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record, has
>been updated with the following new fields, and with new types for the Memory
>Event Type and Transaction Type fields:
>1. Advanced Programmable Corrected Memory Error Threshold Event Flags
>2. Corrected Memory Error Count at Event
>3. Memory Event Sub-Type
>
>The format of component identifier has changed (CXL spec 3.1 section
>8.2.9.2.1 Table 8-44).
>
>Update the general media event record and general media trace event for the
>above spec changes. The new fields are inserted in logical places.
>
>Example trace log of the cxl_general_media trace event:
>
>cxl_general_media: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \
>time=45104947948 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 \
>flags='0x1' handle=1 related_handle=0 maint_op_class=2 \
>maint_op_sub_class=4 : dpa=0x30d40 dpa_flags=0x0 \
>descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' \
>type='TE State Violation' sub_type=0x2 transaction_type=0x4 channel=3 \
>rank=33 device=0x5 validity_flags=0x1f \
>comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
>pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \
>hpa=0xffffffffffffffff region= region_uuid=00000000-0000-0000-0000-000000000000 \
>cme_threshold_ev_flags=0x3 cme_count=0x78
>
>The number of decoded strings in TP_printk() caused a parsing error when
>libtraceevent in userspace parses the CXL general media trace event for
>rasdaemon. It was found that long decoded strings of field values in
>TP_printk() caused the issue. As a solution, decoding of some fields in
>TP_printk() was removed to accommodate the new fields.
>Decoding of all these fields is added in the userspace tool rasdaemon.
>
>Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>---
> drivers/cxl/core/trace.h | 58 +++++++++++++++++++++++++++++-----------
> include/cxl/event.h      |  7 +++--
> 2 files changed, 48 insertions(+), 17 deletions(-)
>
>diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
>index 8e9d80e34a28..77055d66b56e 100644
>--- a/drivers/cxl/core/trace.h
>+++ b/drivers/cxl/core/trace.h
>@@ -287,7 +287,7 @@ TRACE_EVENT(cxl_generic_event,
>
> /*
>  * General Media Event Record - GMER
>- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
>+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
>  */
> #define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT		BIT(0)
> #define CXL_GMER_EVT_DESC_THRESHOLD_EVENT		BIT(1)
>@@ -301,10 +301,18 @@ TRACE_EVENT(cxl_generic_event,
> #define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR			0x00
> #define CXL_GMER_MEM_EVT_TYPE_INV_ADDR			0x01
> #define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR		0x02
>-#define show_gmer_mem_event_type(type)	__print_symbolic(type,			\
>-	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },			\
>-	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid Address" },		\
>-	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path Error" }		\
>+#define CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION	0x03
>+#define CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR	0x04
>+#define CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE	0x05
>+#define CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION		0x06
>+#define show_gmer_mem_event_type(type)	__print_symbolic(type,				\
>+	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },				\
>+	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid Address" },			\
>+	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path Error" },			\
>+	{ CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION,	"TE State Violation" },			\
>+	{ CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR,	"Scrub Media ECC Error" },		\
>+	{ CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE,	"Adv Prog CME Counter Expiration" },	\
>+	{ CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION,		"CKID Violation" }			\
> )
>
> #define CXL_GMER_TRANS_UNKNOWN				0x00
>@@ -314,6 +322,8 @@ TRACE_EVENT(cxl_generic_event,
> #define CXL_GMER_TRANS_HOST_INJECT_POISON		0x04
> #define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB		0x05
> #define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT	0x06
>+#define CXL_GMER_TRANS_INTERNAL_MEDIA_ECS		0x07
>+#define CXL_GMER_TRANS_MEDIA_INITIALIZATION		0x08
> #define show_trans_type(type)	__print_symbolic(type,					\
> 	{ CXL_GMER_TRANS_UNKNOWN,			"Unknown" },			\
> 	{ CXL_GMER_TRANS_HOST_READ,			"Host Read" },			\
>@@ -321,18 +331,22 @@ TRACE_EVENT(cxl_generic_event,
> 	{ CXL_GMER_TRANS_HOST_SCAN_MEDIA,		"Host Scan Media" },		\
> 	{ CXL_GMER_TRANS_HOST_INJECT_POISON,		"Host Inject Poison" },		\
> 	{ CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,		"Internal Media Scrub" },	\
>-	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,	"Internal Media Management" }	\
>+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,	"Internal Media Management" },	\
>+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_ECS,		"Internal Media Error Check Scrub" },	\
>+	{ CXL_GMER_TRANS_MEDIA_INITIALIZATION,		"Media Initialization" }	\
> )
>
> #define CXL_GMER_VALID_CHANNEL				BIT(0)
> #define CXL_GMER_VALID_RANK				BIT(1)
> #define CXL_GMER_VALID_DEVICE				BIT(2)
> #define CXL_GMER_VALID_COMPONENT			BIT(3)
>+#define CXL_GMER_VALID_COMPONENT_ID_FORMAT		BIT(4)
> #define show_valid_flags(flags)	__print_flags(flags, "|",		   \
> 	{ CXL_GMER_VALID_CHANNEL,			"CHANNEL"	}, \
> 	{ CXL_GMER_VALID_RANK,				"RANK"		}, \
> 	{ CXL_GMER_VALID_DEVICE,			"DEVICE"	}, \
>-	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"	}  \
>+	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"	}, \
>+	{ CXL_GMER_VALID_COMPONENT_ID_FORMAT,		"COMPONENT PLDM FORMAT"	} \
> )
>
> TRACE_EVENT(cxl_general_media,
>@@ -348,6 +362,7 @@ TRACE_EVENT(cxl_general_media,
> 		__field(u64, dpa)
> 		__field(u8, descriptor)
> 		__field(u8, type)
>+		__field(u8, sub_type)
> 		__field(u8, transaction_type)
> 		__field(u8, channel)
> 		__field(u32, device)
>@@ -359,6 +374,8 @@ TRACE_EVENT(cxl_general_media,
> 		__field(u8, rank)
> 		__field(u8, dpa_flags)
> 		__string(region_name, cxlr ? dev_name(&cxlr->dev) : "")
>+		__field(u8, cme_threshold_ev_flags)
>+		__field(u32, cme_count)
> 	),
>
> 	TP_fast_assign(
>@@ -372,6 +389,7 @@ TRACE_EVENT(cxl_general_media,
> 		__entry->dpa &= CXL_DPA_MASK;
> 		__entry->descriptor = rec->media_hdr.descriptor;
> 		__entry->type = rec->media_hdr.type;
>+		__entry->sub_type = rec->sub_type;
> 		__entry->transaction_type = rec->media_hdr.transaction_type;
> 		__entry->channel = rec->media_hdr.channel;
> 		__entry->rank = rec->media_hdr.rank;
>@@ -380,6 +398,8 @@ TRACE_EVENT(cxl_general_media,
> 			CXL_EVENT_GEN_MED_COMP_ID_SIZE);
> 		__entry->validity_flags = get_unaligned_le16(&rec->media_hdr.validity_flags);
> 		__entry->hpa = hpa;
>+		__entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags;
>+		__entry->cme_count = get_unaligned_le24(rec->cme_count);
> 		if (cxlr) {
> 			__assign_str(region_name);
> 			uuid_copy(&__entry->region_uuid, &cxlr->params.uuid);
>@@ -389,18 +409,26 @@ TRACE_EVENT(cxl_general_media,
> 		}
> 	),
>
>-	CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
>-		"descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
>-		"device=%x comp_id=%s validity_flags='%s' " \
>-		"hpa=%llx region=%s region_uuid=%pUb",
>-		__entry->dpa, show_dpa_flags(__entry->dpa_flags),
>+	CXL_EVT_TP_printk("dpa=0x%llx dpa_flags=0x%x " \
>+		"descriptor='%s' type='%s' sub_type=0x%x " \
>+		"transaction_type=0x%x channel=%u rank=%u " \
>+		"device=0x%x validity_flags=0x%x " \
>+		"comp_id=%s pldm_entity_id=%s pldm_resource_id=%s " \
>+		"hpa=0x%llx region=%s region_uuid=%pUb " \
>+		"cme_threshold_ev_flags=0x%x cme_count=0x%x ",
>+		__entry->dpa, __entry->dpa_flags,
> 		show_event_desc_flags(__entry->descriptor),
> 		show_gmer_mem_event_type(__entry->type),
>-		show_trans_type(__entry->transaction_type),
>+		__entry->sub_type, __entry->transaction_type,
> 		__entry->channel, __entry->rank, __entry->device,
>+		__entry->validity_flags,
> 		__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
>-		show_valid_flags(__entry->validity_flags),
>-		__entry->hpa, __get_str(region_name), &__entry->region_uuid
>+		show_pldm_entity_id(__entry->validity_flags, CXL_GMER_VALID_COMPONENT,
>+				    CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
>+		show_pldm_resource_id(__entry->validity_flags, CXL_GMER_VALID_COMPONENT,
>+				      CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
>+		__entry->hpa, __get_str(region_name), &__entry->region_uuid,
>+		__entry->cme_threshold_ev_flags, __entry->cme_count
> 	)
> );
>
>diff --git a/include/cxl/event.h b/include/cxl/event.h
>index e1d485ad376b..2b07adf39010 100644
>--- a/include/cxl/event.h
>+++ b/include/cxl/event.h
>@@ -45,14 +45,17 @@ struct cxl_event_generic {
>
> /*
>  * General Media Event Record
>- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
>+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
>  */
> #define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
> struct cxl_event_gen_media {
> 	struct cxl_event_media_hdr media_hdr;
> 	u8 device[3];
> 	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
>-	u8 reserved[46];
>+	u8 cme_threshold_ev_flags;
>+	u8 cme_count[3];
>+	u8 sub_type;
>+	u8 reserved[41];
> } __packed;
>
> /*
>--
>2.43.0


Thread overview: 19+ messages
2024-11-20  9:37 [PATCH v4 0/6] Update Event Records to CXL spec rev 3.1 shiju.jose
2024-11-20  9:37 ` [PATCH v4 1/6] cxl/events: Update Common Event Record " shiju.jose
2024-11-26 17:27   ` Fan Ni
2024-11-27 10:15     ` Shiju Jose
2024-11-20  9:37 ` [PATCH v4 2/6] cxl/events: Add Component Identifier formatting for " shiju.jose
2024-11-20  9:37 ` [PATCH v4 3/6] cxl/events: Update General Media Event Record to " shiju.jose
2024-11-26 11:51   ` Shiju Jose [this message]
2024-11-26 17:02     ` Steven Rostedt
2024-11-27 10:12       ` Shiju Jose
2024-11-27 15:41         ` Steven Rostedt
2024-11-27 18:20           ` Shiju Jose
2024-11-27 18:34             ` Steven Rostedt
2024-11-28 10:01               ` Shiju Jose
2024-11-29 13:22                 ` Shiju Jose
2024-12-03 15:21                   ` Shiju Jose
2024-12-04 11:35                     ` Shiju Jose
2024-11-20  9:37 ` [PATCH v4 4/6] cxl/events: Update DRAM " shiju.jose
2024-11-20  9:37 ` [PATCH v4 5/6] cxl/events: Update Memory Module " shiju.jose
2024-11-20  9:37 ` [PATCH v4 6/6] cxl/test: Update test code for event records " shiju.jose
