public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events
@ 2023-11-01 21:11 Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events Ira Weiny
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-01 21:11 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

Series status/background
========================

This is another RFC version of processing the CXL CPER records through
the CXL trace mechanisms as Dan mentioned in [1].

This raises the cxl event structures to a core header and rearranges them
such that they can be shared most efficiently.  Thus eliminating a
memcpy Smita noticed.  Also BDF is used instead of serial number.

NOTE: I'm still fuzzy on which fields in the CPER record are correct to
find the BDF in the Linux code.  It would be nice to double check those
for me.

The CPER code remains compile tested only.  The original event code
continues to pass cxl-test.

[1] https://lore.kernel.org/all/6528808cef2ba_780ef294c5@dwillia2-xfh.jf.intel.com.notmuch/

Cover letter
============

CXL Component Events, as defined by EFI 2.10 Section N.2.14, wrap a
mostly CXL event payload in an EFI Common Platform Error Record (CPER)
record.  If a device is configured for firmware first CXL event records
are not sent directly to the host.

The CXL sub-system uniquely has DPA to HPA translation information.  It
also already properly decodes the event format.  Send the CXL CPER
records to the CXL sub-system for processing.

With CXL event logs the device interrupts the host with events.  In the
EFI case events are wrapped with device information which needs to be
matched with memdev devices the CXL driver is tracking.

A number of alternatives were considered to match the memdev with the
CPER record.  The most robust was to find the PCI device via Bus,
Device, Function and match it to the memdev driver data.

CPER records are identified with GUID's while CXL event logs contain
UUID's.  The UUID was previously printed for all events.  But the UUID
is redundant information which presents unnecessary complexity when
processing CPER data.  Remove the UUIDs from known events.  Restructure
the code to make sharing the data between CPER/event logs most
efficient.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
Changes in RFC v3:
- djbw: Share structures between CPER/event logs
- Smita: use BDF to resolve the memdev
- djbw/Smita: various cleanups
- Link to v2: https://lore.kernel.org/r/20230601-cxl-cper-v2-0-314d9c36ab02@intel.com

---
Ira Weiny (6):
      cxl/trace: Remove uuid from event trace known events
      cxl/events: Promote CXL event structures to a core header
      cxl/events: Remove UUID from non-generic event structures
      cxl/events: Create a CXL event union
      firmware/efi: Process CXL Component Events
      cxl/memdev: Register for and process CPER events

 drivers/cxl/core/mbox.c         |  57 +++++++++-----
 drivers/cxl/core/trace.h        |  18 ++---
 drivers/cxl/cxlmem.h            |  96 ++---------------------
 drivers/cxl/pci.c               |  59 +++++++++++++-
 drivers/firmware/efi/cper.c     |  15 ++++
 drivers/firmware/efi/cper_cxl.c |  40 ++++++++++
 drivers/firmware/efi/cper_cxl.h |  29 +++++++
 include/linux/cxl-event.h       | 160 ++++++++++++++++++++++++++++++++++++++
 tools/testing/cxl/test/mem.c    | 166 +++++++++++++++++++++++-----------------
 9 files changed, 451 insertions(+), 189 deletions(-)
---
base-commit: 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
change-id: 20230601-cxl-cper-26ffc839c6c6

Best regards,
-- 
Ira Weiny <ira.weiny@intel.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events
  2023-11-01 21:11 [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
@ 2023-11-01 21:11 ` Ira Weiny
  2023-11-03 14:27   ` Jonathan Cameron
  2023-11-01 21:11 ` [PATCH RFC v3 2/6] cxl/events: Promote CXL event structures to a core header Ira Weiny
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2023-11-01 21:11 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

The uuid printed in the well known events is redundant.  The uuid
defines what the event was.

Remove the uuid from the known events and only report it in the generic
event as it remains informative there.

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/core/trace.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a0b5819bc70b..79ed03637604 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -189,7 +189,6 @@ TRACE_EVENT(cxl_overflow,
 	__string(memdev, dev_name(&cxlmd->dev))			\
 	__string(host, dev_name(cxlmd->dev.parent))		\
 	__field(int, log)					\
-	__field_struct(uuid_t, hdr_uuid)			\
 	__field(u64, serial)					\
 	__field(u32, hdr_flags)					\
 	__field(u16, hdr_handle)				\
@@ -203,7 +202,6 @@ TRACE_EVENT(cxl_overflow,
 	__assign_str(host, dev_name((cxlmd)->dev.parent));			\
 	__entry->log = (l);							\
 	__entry->serial = (cxlmd)->cxlds->serial;				\
-	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
 	__entry->hdr_length = (hdr).length;					\
 	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
 	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
@@ -212,12 +210,12 @@ TRACE_EVENT(cxl_overflow,
 	__entry->hdr_maint_op_class = (hdr).maint_op_class
 
 #define CXL_EVT_TP_printk(fmt, ...) \
-	TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb "	\
+	TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu "		\
 		"len=%d flags='%s' handle=%x related_handle=%x "		\
 		"maint_op_class=%u : " fmt,					\
 		__get_str(memdev), __get_str(host), __entry->serial,		\
 		cxl_event_log_type_str(__entry->log),				\
-		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
+		__entry->hdr_timestamp, __entry->hdr_length,			\
 		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
 		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
 		##__VA_ARGS__)
@@ -231,15 +229,17 @@ TRACE_EVENT(cxl_generic_event,
 
 	TP_STRUCT__entry(
 		CXL_EVT_TP_entry
+		__field_struct(uuid_t, hdr_uuid)
 		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
 	),
 
 	TP_fast_assign(
 		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+		memcpy(&__entry->hdr_uuid, &rec->hdr.id, sizeof(uuid_t));
 		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
 	),
 
-	CXL_EVT_TP_printk("%s",
+	CXL_EVT_TP_printk("uuid=%pUb %s", &__entry->hdr_uuid,
 		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
 );
 

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v3 2/6] cxl/events: Promote CXL event structures to a core header
  2023-11-01 21:11 [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events Ira Weiny
@ 2023-11-01 21:11 ` Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 3/6] cxl/events: Remove UUID from non-generic event structures Ira Weiny
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-01 21:11 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

EFI code can process CXL events through CPER records.  Those records use
almost the same format as the CXL events.

Lift the CXL event structures to a core header to be shared.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
[djbw: new patch]
---
 drivers/cxl/cxlmem.h      |  90 +----------------------------------------
 include/linux/cxl-event.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+), 89 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 706f8a6d1ef4..d694820ce8f5 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -6,6 +6,7 @@
 #include <linux/cdev.h>
 #include <linux/uuid.h>
 #include <linux/rcuwait.h>
+#include <linux/cxl-event.h>
 #include "cxl.h"
 
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -576,27 +577,6 @@ struct cxl_mbox_identify {
 	u8 qos_telemetry_caps;
 } __packed;
 
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
-struct cxl_event_record_hdr {
-	uuid_t id;
-	u8 length;
-	u8 flags[3];
-	__le16 handle;
-	__le16 related_handle;
-	__le64 timestamp;
-	u8 maint_op_class;
-	u8 reserved[15];
-} __packed;
-
-#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
-	struct cxl_event_record_hdr hdr;
-	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
-} __packed;
-
 /*
  * Get Event Records output payload
  * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
@@ -638,74 +618,6 @@ struct cxl_mbox_clear_event_payload {
 } __packed;
 #define CXL_CLEAR_EVENT_MAX_HANDLES U8_MAX
 
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
-struct cxl_event_gen_media {
-	struct cxl_event_record_hdr hdr;
-	__le64 phys_addr;
-	u8 descriptor;
-	u8 type;
-	u8 transaction_type;
-	u8 validity_flags[2];
-	u8 channel;
-	u8 rank;
-	u8 device[3];
-	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
-	u8 reserved[46];
-} __packed;
-
-/*
- * DRAM Event Record - DER
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
- */
-#define CXL_EVENT_DER_CORRECTION_MASK_SIZE	0x20
-struct cxl_event_dram {
-	struct cxl_event_record_hdr hdr;
-	__le64 phys_addr;
-	u8 descriptor;
-	u8 type;
-	u8 transaction_type;
-	u8 validity_flags[2];
-	u8 channel;
-	u8 rank;
-	u8 nibble_mask[3];
-	u8 bank_group;
-	u8 bank;
-	u8 row[3];
-	u8 column[2];
-	u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
-	u8 reserved[0x17];
-} __packed;
-
-/*
- * Get Health Info Record
- * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
- */
-struct cxl_get_health_info {
-	u8 health_status;
-	u8 media_status;
-	u8 add_status;
-	u8 life_used;
-	u8 device_temp[2];
-	u8 dirty_shutdown_cnt[4];
-	u8 cor_vol_err_cnt[4];
-	u8 cor_per_err_cnt[4];
-} __packed;
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-struct cxl_event_mem_module {
-	struct cxl_event_record_hdr hdr;
-	u8 event_type;
-	struct cxl_get_health_info info;
-	u8 reserved[0x3d];
-} __packed;
-
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
new file mode 100644
index 000000000000..1c94e8fdd227
--- /dev/null
+++ b/include/linux/cxl-event.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CXL_EVENT_H
+#define _LINUX_CXL_EVENT_H
+
+/*
+ * CXL event records; CXL rev 3.0
+ *
+ * Copyright(c) 2023 Intel Corporation.
+ */
+
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+	uuid_t id;
+	u8 length;
+	u8 flags[3];
+	__le16 handle;
+	__le16 related_handle;
+	__le64 timestamp;
+	u8 maint_op_class;
+	u8 reserved[15];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+	struct cxl_event_record_hdr hdr;
+	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
+struct cxl_event_gen_media {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	u8 validity_flags[2];
+	u8 channel;
+	u8 rank;
+	u8 device[3];
+	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+	u8 reserved[46];
+} __packed;
+
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE	0x20
+struct cxl_event_dram {
+	struct cxl_event_record_hdr hdr;
+	__le64 phys_addr;
+	u8 descriptor;
+	u8 type;
+	u8 transaction_type;
+	u8 validity_flags[2];
+	u8 channel;
+	u8 rank;
+	u8 nibble_mask[3];
+	u8 bank_group;
+	u8 bank;
+	u8 row[3];
+	u8 column[2];
+	u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+	u8 reserved[0x17];
+} __packed;
+
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+	u8 health_status;
+	u8 media_status;
+	u8 add_status;
+	u8 life_used;
+	u8 device_temp[2];
+	u8 dirty_shutdown_cnt[4];
+	u8 cor_vol_err_cnt[4];
+	u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+	struct cxl_event_record_hdr hdr;
+	u8 event_type;
+	struct cxl_get_health_info info;
+	u8 reserved[0x3d];
+} __packed;
+
+#endif /* _LINUX_CXL_EVENT_H */

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v3 3/6] cxl/events: Remove UUID from non-generic event structures
  2023-11-01 21:11 [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 2/6] cxl/events: Promote CXL event structures to a core header Ira Weiny
@ 2023-11-01 21:11 ` Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 4/6] cxl/events: Create a CXL event union Ira Weiny
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-01 21:11 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

The UEFI CXL CPER structure does not include the UUID.  Now that the
UUID is only reported in the generic CXL trace event there is no need to
carry the UUID in the structures which could come from CPER records.

Remove the UUID from the event record header.  Retain the UUID for
processing the raw records.  Create dummy structures for creating test
records.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
[iweiny: new patch]
---
 drivers/cxl/core/mbox.c      |   2 +-
 drivers/cxl/core/trace.h     |   2 +-
 include/linux/cxl-event.h    |  10 ++--
 tools/testing/cxl/test/mem.c | 135 +++++++++++++++++++++++++------------------
 4 files changed, 85 insertions(+), 64 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 4df4f614f490..f27c409f9034 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -864,7 +864,7 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 				   enum cxl_event_log_type type,
 				   struct cxl_event_record_raw *record)
 {
-	uuid_t *id = &record->hdr.id;
+	uuid_t *id = &record->id;
 
 	if (uuid_equal(id, &gen_media_event_uuid)) {
 		struct cxl_event_gen_media *rec =
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 79ed03637604..72c620597708 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -235,7 +235,7 @@ TRACE_EVENT(cxl_generic_event,
 
 	TP_fast_assign(
 		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
-		memcpy(&__entry->hdr_uuid, &rec->hdr.id, sizeof(uuid_t));
+		memcpy(&__entry->hdr_uuid, &rec->id, sizeof(uuid_t));
 		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
 	),
 
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 1c94e8fdd227..ebb00ead1496 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -8,12 +8,7 @@
  * Copyright(c) 2023 Intel Corporation.
  */
 
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
 struct cxl_event_record_hdr {
-	uuid_t id;
 	u8 length;
 	u8 flags[3];
 	__le16 handle;
@@ -23,8 +18,13 @@ struct cxl_event_record_hdr {
 	u8 reserved[15];
 } __packed;
 
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
 #define CXL_EVENT_RECORD_DATA_LENGTH 0x50
 struct cxl_event_record_raw {
+	uuid_t id;
 	struct cxl_event_record_hdr hdr;
 	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
 } __packed;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 464fc39ed277..85862be00c48 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -330,9 +330,9 @@ static void cxl_mock_event_trigger(struct device *dev)
 }
 
 struct cxl_event_record_raw maint_needed = {
+	.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
+			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 	.hdr = {
-		.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
-				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 		.length = sizeof(struct cxl_event_record_raw),
 		.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
 		/* .handle = Set dynamically */
@@ -342,9 +342,9 @@ struct cxl_event_record_raw maint_needed = {
 };
 
 struct cxl_event_record_raw hardware_replace = {
+	.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
+			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 	.hdr = {
-		.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
-				0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
 		.length = sizeof(struct cxl_event_record_raw),
 		.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
 		/* .handle = Set dynamically */
@@ -353,64 +353,85 @@ struct cxl_event_record_raw hardware_replace = {
 	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
-struct cxl_event_gen_media gen_media = {
-	.hdr = {
-		.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
-				0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
-		.length = sizeof(struct cxl_event_gen_media),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0),
+struct cxl_test_gen_media {
+	uuid_t id;
+	struct cxl_event_gen_media rec;
+} __packed;
+
+struct cxl_test_gen_media gen_media = {
+	.id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+			0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_gen_media),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0),
+		},
+		.phys_addr = cpu_to_le64(0x2000),
+		.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+		.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+		.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+		/* .validity_flags = <set below> */
+		.channel = 1,
+		.rank = 30
 	},
-	.phys_addr = cpu_to_le64(0x2000),
-	.descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
-	.type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
-	.transaction_type = CXL_GMER_TRANS_HOST_WRITE,
-	/* .validity_flags = <set below> */
-	.channel = 1,
-	.rank = 30
 };
 
-struct cxl_event_dram dram = {
-	.hdr = {
-		.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
-				0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
-		.length = sizeof(struct cxl_event_dram),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0),
+struct cxl_test_dram {
+	uuid_t id;
+	struct cxl_event_dram rec;
+} __packed;
+
+struct cxl_test_dram dram = {
+	.id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+			0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_dram),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0),
+		},
+		.phys_addr = cpu_to_le64(0x8000),
+		.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+		.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+		.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+		/* .validity_flags = <set below> */
+		.channel = 1,
+		.bank_group = 5,
+		.bank = 2,
+		.column = {0xDE, 0xAD},
 	},
-	.phys_addr = cpu_to_le64(0x8000),
-	.descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
-	.type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
-	.transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
-	/* .validity_flags = <set below> */
-	.channel = 1,
-	.bank_group = 5,
-	.bank = 2,
-	.column = {0xDE, 0xAD},
 };
 
-struct cxl_event_mem_module mem_module = {
-	.hdr = {
-		.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
-				0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
-		.length = sizeof(struct cxl_event_mem_module),
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0),
+struct cxl_test_mem_module {
+	uuid_t id;
+	struct cxl_event_mem_module rec;
+} __packed;
+
+struct cxl_test_mem_module mem_module = {
+	.id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+			0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_mem_module),
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0),
+		},
+		.event_type = CXL_MMER_TEMP_CHANGE,
+		.info = {
+			.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+			.media_status = CXL_DHI_MS_ALL_DATA_LOST,
+			.add_status = (CXL_DHI_AS_CRITICAL << 2) |
+				      (CXL_DHI_AS_WARNING << 4) |
+				      (CXL_DHI_AS_WARNING << 5),
+			.device_temp = { 0xDE, 0xAD},
+			.dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+			.cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+			.cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+		}
 	},
-	.event_type = CXL_MMER_TEMP_CHANGE,
-	.info = {
-		.health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
-		.media_status = CXL_DHI_MS_ALL_DATA_LOST,
-		.add_status = (CXL_DHI_AS_CRITICAL << 2) |
-			      (CXL_DHI_AS_WARNING << 4) |
-			      (CXL_DHI_AS_WARNING << 5),
-		.device_temp = { 0xDE, 0xAD},
-		.dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
-		.cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
-		.cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
-	}
 };
 
 static int mock_set_timestamp(struct cxl_dev_state *cxlds,
@@ -432,11 +453,11 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds,
 static void cxl_mock_add_event_logs(struct mock_event_store *mes)
 {
 	put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK,
-			   &gen_media.validity_flags);
+			   &gen_media.rec.validity_flags);
 
 	put_unaligned_le16(CXL_DER_VALID_CHANNEL | CXL_DER_VALID_BANK_GROUP |
 			   CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN,
-			   &dram.validity_flags);
+			   &dram.rec.validity_flags);
 
 	mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
 	mes_add_event(mes, CXL_EVENT_TYPE_INFO,

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v3 4/6] cxl/events: Create a CXL event union
  2023-11-01 21:11 [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
                   ` (2 preceding siblings ...)
  2023-11-01 21:11 ` [PATCH RFC v3 3/6] cxl/events: Remove UUID from non-generic event structures Ira Weiny
@ 2023-11-01 21:11 ` Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 5/6] firmware/efi: Process CXL Component Events Ira Weiny
  2023-11-01 21:11 ` [PATCH RFC v3 6/6] cxl/memdev: Register for and process CPER events Ira Weiny
  5 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-01 21:11 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

The CXL CPER and event log records share everything but a UUID/GUID in
their structures.

Define a cxl_event union without the UUID/GUID to be shared between the
CPER and event log record formats.  Adjust the code to use this union
after the record type is determined.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
[iweiny: new patch]
---
 drivers/cxl/core/mbox.c      | 23 +++++++++--------------
 drivers/cxl/core/trace.h     | 10 +++++-----
 include/linux/cxl-event.h    | 23 +++++++++++++++++------
 tools/testing/cxl/test/mem.c | 31 ++++++++++++++++++-------------
 4 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f27c409f9034..bb40b844e3bd 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -864,25 +864,17 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 				   enum cxl_event_log_type type,
 				   struct cxl_event_record_raw *record)
 {
+	union cxl_event *evt = &record->event;
 	uuid_t *id = &record->id;
 
 	if (uuid_equal(id, &gen_media_event_uuid)) {
-		struct cxl_event_gen_media *rec =
-				(struct cxl_event_gen_media *)record;
-
-		trace_cxl_general_media(cxlmd, type, rec);
+		trace_cxl_general_media(cxlmd, type, &evt->gen_media);
 	} else if (uuid_equal(id, &dram_event_uuid)) {
-		struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
-
-		trace_cxl_dram(cxlmd, type, rec);
+		trace_cxl_dram(cxlmd, type, &evt->dram);
 	} else if (uuid_equal(id, &mem_mod_event_uuid)) {
-		struct cxl_event_mem_module *rec =
-				(struct cxl_event_mem_module *)record;
-
-		trace_cxl_memory_module(cxlmd, type, rec);
+		trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
 	} else {
-		/* For unknown record types print just the header */
-		trace_cxl_generic_event(cxlmd, type, record);
+		trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
 	}
 }
 
@@ -926,7 +918,10 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
 	 */
 	i = 0;
 	for (cnt = 0; cnt < total; cnt++) {
-		payload->handles[i++] = get_pl->records[cnt].hdr.handle;
+		struct cxl_event_record_raw *raw = &get_pl->records[cnt];
+		struct cxl_event_generic *gen = &raw->event.generic;
+
+		payload->handles[i++] = gen->hdr.handle;
 		dev_dbg(mds->cxlds.dev, "Event log '%d': Clearing %u\n", log,
 			le16_to_cpu(payload->handles[i]));
 
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 72c620597708..be0bda68f80f 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -223,9 +223,9 @@ TRACE_EVENT(cxl_overflow,
 TRACE_EVENT(cxl_generic_event,
 
 	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
-		 struct cxl_event_record_raw *rec),
+		 uuid_t *uuid, struct cxl_event_generic *gen_rec),
 
-	TP_ARGS(cxlmd, log, rec),
+	TP_ARGS(cxlmd, log, uuid, gen_rec),
 
 	TP_STRUCT__entry(
 		CXL_EVT_TP_entry
@@ -234,9 +234,9 @@ TRACE_EVENT(cxl_generic_event,
 	),
 
 	TP_fast_assign(
-		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
-		memcpy(&__entry->hdr_uuid, &rec->id, sizeof(uuid_t));
-		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+		CXL_EVT_TP_fast_assign(cxlmd, log, gen_rec->hdr);
+		memcpy(&__entry->hdr_uuid, uuid, sizeof(uuid_t));
+		memcpy(__entry->data, gen_rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
 	),
 
 	CXL_EVT_TP_printk("uuid=%pUb %s", &__entry->hdr_uuid,
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index ebb00ead1496..6b689e1efc78 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -18,13 +18,8 @@ struct cxl_event_record_hdr {
 	u8 reserved[15];
 } __packed;
 
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
 #define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
-	uuid_t id;
+struct cxl_event_generic {
 	struct cxl_event_record_hdr hdr;
 	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
 } __packed;
@@ -97,4 +92,20 @@ struct cxl_event_mem_module {
 	u8 reserved[0x3d];
 } __packed;
 
+union cxl_event {
+	struct cxl_event_generic generic;
+	struct cxl_event_gen_media gen_media;
+	struct cxl_event_dram dram;
+	struct cxl_event_mem_module mem_module;
+};
+
+/*
+ * Common Event Record Format; in event logs
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_raw {
+	uuid_t id;
+	union cxl_event event;
+} __packed;
+
 #endif /* _LINUX_CXL_EVENT_H */
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 85862be00c48..2535eb182ece 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -244,7 +244,8 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 	for (i = 0; i < CXL_TEST_EVENT_CNT && !event_log_empty(log); i++) {
 		memcpy(&pl->records[i], event_get_current(log),
 		       sizeof(pl->records[i]));
-		pl->records[i].hdr.handle = event_get_cur_event_handle(log);
+		pl->records[i].event.generic.hdr.handle =
+				event_get_cur_event_handle(log);
 		log->cur_idx++;
 	}
 
@@ -332,25 +333,29 @@ static void cxl_mock_event_trigger(struct device *dev)
 struct cxl_event_record_raw maint_needed = {
 	.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
 			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
-	.hdr = {
-		.length = sizeof(struct cxl_event_record_raw),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0xa5b6),
+	.event.generic = {
+		.hdr = {
+			.length = sizeof(struct cxl_event_record_raw),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0xa5b6),
+		},
+		.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 	},
-	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
 struct cxl_event_record_raw hardware_replace = {
 	.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
 			0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
-	.hdr = {
-		.length = sizeof(struct cxl_event_record_raw),
-		.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
-		/* .handle = Set dynamically */
-		.related_handle = cpu_to_le16(0xb6a5),
+	.event.generic = {
+		.hdr = {
+			.length = sizeof(struct cxl_event_record_raw),
+			.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
+			/* .handle = Set dynamically */
+			.related_handle = cpu_to_le16(0xb6a5),
+		},
+		.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 	},
-	.data = { 0xDE, 0xAD, 0xBE, 0xEF },
 };
 
 struct cxl_test_gen_media {

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v3 5/6] firmware/efi: Process CXL Component Events
  2023-11-01 21:11 [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
                   ` (3 preceding siblings ...)
  2023-11-01 21:11 ` [PATCH RFC v3 4/6] cxl/events: Create a CXL event union Ira Weiny
@ 2023-11-01 21:11 ` Ira Weiny
  2023-11-03  0:30   ` Smita Koralahalli
  2023-11-01 21:11 ` [PATCH RFC v3 6/6] cxl/memdev: Register for and process CPER events Ira Weiny
  5 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2023-11-01 21:11 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

BIOS can configure memory devices as firmware first.  This will send CXL
events to the firmware instead of the OS.  The firmware can then send
these events to the OS via UEFI.

UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
format for CXL Component Events.  The format is mostly the same as the
CXL Common Event Record Format.  The only difference is the UUID is
passed via the Section Type as a GUID and not included as part of the
record data.

Add EFI support to detect CXL CPER records and call a notifier chain
with the record data blobs.

Note that the format of a GUID and UUID are not the same.  Therefore the
Section Type GUID defines are duplicated from the CXL code.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2
[djbw: use common event structures]
[djbw: remove print in core cper code]
[djbw: export register call as NS_GPL]
[iweiny: fix 0day issues]

Changes from RFC v1
[iweiny: use an enum for know record types and skip converting GUID to UUID]
[iweiny: commit to the UUID not being part of the event record data]
[iweiny: use defines for GUID definitions]
---
 drivers/firmware/efi/cper.c     | 15 +++++++++++++
 drivers/firmware/efi/cper_cxl.c | 40 +++++++++++++++++++++++++++++++++
 drivers/firmware/efi/cper_cxl.h | 29 ++++++++++++++++++++++++
 include/linux/cxl-event.h       | 49 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 133 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 35c37f667781..3d0b60144a07 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -22,6 +22,7 @@
 #include <linux/aer.h>
 #include <linux/printk.h>
 #include <linux/bcd.h>
+#include <linux/cxl-event.h>
 #include <acpi/ghes.h>
 #include <ras/ras_event.h>
 #include "cper_cxl.h"
@@ -607,6 +608,20 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
 			cper_print_prot_err(newpfx, prot_err);
 		else
 			goto err_section_too_small;
+	} else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA) ||
+		   guid_equal(sec_type, &CPER_SEC_CXL_DRAM) ||
+		   guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE)) {
+		struct cper_cxl_event_rec *rec = acpi_hest_get_payload(gdata);
+
+		if (rec->hdr.length <= sizeof(rec->hdr))
+			goto err_section_too_small;
+
+		if (rec->hdr.length > sizeof(*rec)) {
+			pr_err(FW_WARN "error section length is too big\n");
+			return;
+		}
+
+		cper_post_cxl_event(newpfx, sec_type, rec);
 	} else {
 		const void *err = acpi_hest_get_payload(gdata);
 
diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
index a55771b99a97..bf642962a7ba 100644
--- a/drivers/firmware/efi/cper_cxl.c
+++ b/drivers/firmware/efi/cper_cxl.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/cper.h>
+#include <linux/cxl-event.h>
 #include "cper_cxl.h"
 
 #define PROT_ERR_VALID_AGENT_TYPE		BIT_ULL(0)
@@ -187,3 +188,42 @@ void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_e
 			       sizeof(cxl_ras->header_log), 0);
 	}
 }
+
+/* CXL CPER notifier chain */
+static BLOCKING_NOTIFIER_HEAD(cxl_cper_chain_head);
+
+void cper_post_cxl_event(const char *pfx, guid_t *sec_type,
+			 struct cper_cxl_event_rec *rec)
+{
+	struct cxl_cper_notifier_data nd = {
+		.rec = rec,
+	};
+
+	if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
+		pr_err(FW_WARN "cxl event no Component Event Log present\n");
+		return;
+	}
+
+	if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA))
+		nd.event_type = CXL_CPER_EVENT_GEN_MEDIA;
+	else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM))
+		nd.event_type = CXL_CPER_EVENT_DRAM;
+	else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE))
+		nd.event_type = CXL_CPER_EVENT_MEM_MODULE;
+
+	if (blocking_notifier_call_chain(&cxl_cper_chain_head, 0, (void *)&nd)
+			== NOTIFY_BAD)
+		pr_err(FW_WARN "cxl event notifier chain failed\n");
+}
+
+int register_cxl_cper_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&cxl_cper_chain_head, nb);
+}
+EXPORT_SYMBOL_NS_GPL(register_cxl_cper_notifier, CXL);
+
+void unregister_cxl_cper_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&cxl_cper_chain_head, nb);
+}
+EXPORT_SYMBOL_NS_GPL(unregister_cxl_cper_notifier, CXL);
diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
index 86bfcf7909ec..e83d727f7489 100644
--- a/drivers/firmware/efi/cper_cxl.h
+++ b/drivers/firmware/efi/cper_cxl.h
@@ -10,11 +10,38 @@
 #ifndef LINUX_CPER_CXL_H
 #define LINUX_CPER_CXL_H
 
+#include <linux/cxl-event.h>
+
 /* CXL Protocol Error Section */
 #define CPER_SEC_CXL_PROT_ERR						\
 	GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78,	\
 		  0x4B, 0x77, 0x10, 0x48)
 
+/* CXL Event record UUIDs are used as the section type */
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA						\
+	GUID_INIT(0xfbcd0a77, 0xc260, 0x417f,				\
+		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM						\
+	GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,				\
+		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CPER_SEC_CXL_MEM_MODULE						\
+	GUID_INIT(0xfe927475, 0xdd59, 0x4339,				\
+		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
 #pragma pack(1)
 
 /* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
@@ -62,5 +89,7 @@ struct cper_sec_prot_err {
 #pragma pack()
 
 void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
+void cper_post_cxl_event(const char *pfx, guid_t *sec_type,
+			 struct cper_cxl_event_rec *rec);
 
 #endif //__CPER_CXL_
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 6b689e1efc78..2bdadde80f1a 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -108,4 +108,53 @@ struct cxl_event_record_raw {
 	union cxl_event event;
 } __packed;
 
+enum cxl_event_type {
+	CXL_CPER_EVENT_GEN_MEDIA,
+	CXL_CPER_EVENT_DRAM,
+	CXL_CPER_EVENT_MEM_MODULE,
+};
+
+#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
+#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
+#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
+struct cper_cxl_event_rec {
+	struct {
+		u32 length;
+		u64 validation_bits;
+		struct cper_cxl_event_devid {
+			u16 vendor_id;
+			u16 device_id;
+			u8 func_num;
+			u8 device_num;
+			u8 bus_num;
+			u16 segment_num;
+			u16 slot_num; /* bits 2:0 reserved */
+			u8 reserved;
+		} device_id;
+		struct cper_cxl_event_sn {
+			u32 lower_dw;
+			u32 upper_dw;
+		} dev_serial_num;
+	} hdr;
+
+	union cxl_event event;
+};
+
+struct cxl_cper_notifier_data {
+	enum cxl_event_type event_type;
+	struct cper_cxl_event_rec *rec;
+};
+
+#ifdef CONFIG_UEFI_CPER
+int register_cxl_cper_notifier(struct notifier_block *nb);
+void unregister_cxl_cper_notifier(struct notifier_block *nb);
+#else
+static inline int register_cxl_cper_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline void unregister_cxl_cper_notifier(struct notifier_block *nb) { }
+#endif
+
 #endif /* _LINUX_CXL_EVENT_H */

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC v3 6/6] cxl/memdev: Register for and process CPER events
  2023-11-01 21:11 [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
                   ` (4 preceding siblings ...)
  2023-11-01 21:11 ` [PATCH RFC v3 5/6] firmware/efi: Process CXL Component Events Ira Weiny
@ 2023-11-01 21:11 ` Ira Weiny
  2023-11-03  0:32   ` Smita Koralahalli
  5 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2023-11-01 21:11 UTC (permalink / raw)
  To: Dan Williams, Jonathan Cameron, Smita Koralahalli
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl,
	Ira Weiny

If the firmware has configured CXL event support to be firmware first
the OS can process those events through CPER records.  Matching memory
devices to the CPER records can be done via Bus, Device, Function which
is part of the CPER record header.

Detect firmware first, register a notifier callback for each memdev, and
trace events when they match the proper device.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC v2:
[smita/djbw: use BDF not serial number for memdev ID]
[smita: eliminate memcpy]
[djbw: adjust to new structures]
[iweiny: fix 0day errors]

Changes from RFC v1:
[iweiny: adjust to cper_event enum instead of converting guids]
---
 drivers/cxl/core/mbox.c | 42 ++++++++++++++++++++++++++---------
 drivers/cxl/cxlmem.h    |  6 +++++
 drivers/cxl/pci.c       | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index bb40b844e3bd..b949abb1b765 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -860,22 +860,44 @@ static const uuid_t mem_mod_event_uuid =
 	UUID_INIT(0xfe927475, 0xdd59, 0x4339,
 		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
 
-static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
-				   enum cxl_event_log_type type,
-				   struct cxl_event_record_raw *record)
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+			    enum cxl_event_log_type type,
+			    enum cxl_event_type event_type,
+			    union cxl_event *event)
 {
-	union cxl_event *evt = &record->event;
+	switch (event_type) {
+	case CXL_CPER_EVENT_GEN_MEDIA:
+		trace_cxl_general_media(cxlmd, type, &event->gen_media);
+		break;
+	case CXL_CPER_EVENT_DRAM:
+		trace_cxl_dram(cxlmd, type, &event->dram);
+		break;
+	case CXL_CPER_EVENT_MEM_MODULE:
+		trace_cxl_memory_module(cxlmd, type, &event->mem_module);
+		break;
+	}
+}
+EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
+
+static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+				     enum cxl_event_log_type type,
+				     struct cxl_event_record_raw *record)
+{
+	enum cxl_event_type event_type;
 	uuid_t *id = &record->id;
 
 	if (uuid_equal(id, &gen_media_event_uuid)) {
-		trace_cxl_general_media(cxlmd, type, &evt->gen_media);
+		event_type = CXL_CPER_EVENT_GEN_MEDIA;
 	} else if (uuid_equal(id, &dram_event_uuid)) {
-		trace_cxl_dram(cxlmd, type, &evt->dram);
+		event_type = CXL_CPER_EVENT_DRAM;
 	} else if (uuid_equal(id, &mem_mod_event_uuid)) {
-		trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
+		event_type = CXL_CPER_EVENT_MEM_MODULE;
 	} else {
-		trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
+		trace_cxl_generic_event(cxlmd, type, id, &record->event.generic);
+		return;
 	}
+
+	cxl_event_trace_record(cxlmd, type, event_type, &record->event);
 }
 
 static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -986,8 +1008,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
 			break;
 
 		for (i = 0; i < nr_rec; i++)
-			cxl_event_trace_record(cxlmd, type,
-					       &payload->records[i]);
+			__cxl_event_trace_record(cxlmd, type,
+						 &payload->records[i]);
 
 		if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
 			trace_cxl_overflow(cxlmd, type, payload);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index d694820ce8f5..b85cbf421cf4 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -478,6 +478,8 @@ struct cxl_memdev_state {
 	struct cxl_security_state security;
 	struct cxl_fw_state fw;
 
+	struct notifier_block cxl_cper_nb;
+
 	struct rcuwait mbox_wait;
 	int (*mbox_send)(struct cxl_memdev_state *mds,
 			 struct cxl_mbox_cmd *cmd);
@@ -775,6 +777,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 				  unsigned long *cmds);
 void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+			    enum cxl_event_log_type type,
+			    enum cxl_event_type event_type,
+			    union cxl_event *event);
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 44a21ab7add5..37add91068c0 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <asm-generic/unaligned.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/moduleparam.h>
 #include <linux/module.h>
@@ -748,6 +749,60 @@ static bool cxl_event_int_is_fw(u8 setting)
 	return mode == CXL_INT_FW;
 }
 
+#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
+static int cxl_cper_event_call(struct notifier_block *nb, unsigned long action,
+			       void *data)
+{
+	struct cxl_cper_notifier_data *nd = data;
+	struct cper_cxl_event_devid *device_id = &nd->rec->hdr.device_id;
+	enum cxl_event_log_type log_type;
+	struct cxl_memdev_state *mds;
+	struct cxl_dev_state *cxlds;
+	struct pci_dev *pdev;
+	unsigned int devfn;
+	u32 hdr_flags;
+
+	mds = container_of(nb, struct cxl_memdev_state, cxl_cper_nb);
+
+	/* PCI_DEVFN() would require 2 extra bit shifts; skip those */
+	devfn = (device_id->slot_num & 0xfff8) | (device_id->func_num & 0x07);
+	pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
+					   device_id->bus_num, devfn);
+	cxlds = pci_get_drvdata(pdev);
+	if (cxlds != &mds->cxlds) {
+		pci_dev_put(pdev);
+		return NOTIFY_DONE;
+	}
+
+	/* Fabricate a log type */
+	hdr_flags = get_unaligned_le24(nd->rec->event.generic.hdr.flags);
+	log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
+
+	cxl_event_trace_record(mds->cxlds.cxlmd, log_type, nd->event_type,
+			       &nd->rec->event);
+	pci_dev_put(pdev);
+	return NOTIFY_OK;
+}
+
+static void cxl_unregister_cper_events(void *_mds)
+{
+	struct cxl_memdev_state *mds = _mds;
+
+	unregister_cxl_cper_notifier(&mds->cxl_cper_nb);
+}
+
+static void register_cper_events(struct cxl_memdev_state *mds)
+{
+	mds->cxl_cper_nb.notifier_call = cxl_cper_event_call;
+
+	if (register_cxl_cper_notifier(&mds->cxl_cper_nb)) {
+		dev_err(mds->cxlds.dev, "CPER registration failed\n");
+		return;
+	}
+
+	devm_add_action_or_reset(mds->cxlds.dev, cxl_unregister_cper_events, mds);
+}
+
 static int cxl_event_config(struct pci_host_bridge *host_bridge,
 			    struct cxl_memdev_state *mds)
 {
@@ -758,8 +813,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	 * When BIOS maintains CXL error reporting control, it will process
 	 * event records.  Only one agent can do so.
 	 */
-	if (!host_bridge->native_cxl_error)
+	if (!host_bridge->native_cxl_error) {
+		register_cper_events(mds);
 		return 0;
+	}
 
 	rc = cxl_mem_alloc_event_buf(mds);
 	if (rc)

-- 
2.41.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 5/6] firmware/efi: Process CXL Component Events
  2023-11-01 21:11 ` [PATCH RFC v3 5/6] firmware/efi: Process CXL Component Events Ira Weiny
@ 2023-11-03  0:30   ` Smita Koralahalli
  2023-11-06 22:12     ` Ira Weiny
  0 siblings, 1 reply; 15+ messages in thread
From: Smita Koralahalli @ 2023-11-03  0:30 UTC (permalink / raw)
  To: Ira Weiny, Dan Williams, Jonathan Cameron
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

On 11/1/2023 2:11 PM, Ira Weiny wrote:

[snip]

> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 6b689e1efc78..2bdadde80f1a 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -108,4 +108,53 @@ struct cxl_event_record_raw {
>   	union cxl_event event;
>   } __packed;
>   
> +enum cxl_event_type {
> +	CXL_CPER_EVENT_GEN_MEDIA,
> +	CXL_CPER_EVENT_DRAM,
> +	CXL_CPER_EVENT_MEM_MODULE,
> +};
> +
> +#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
> +#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
> +#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
> +struct cper_cxl_event_rec {
> +	struct {
> +		u32 length;
> +		u64 validation_bits;
> +		struct cper_cxl_event_devid {
> +			u16 vendor_id;
> +			u16 device_id;
> +			u8 func_num;
> +			u8 device_num;
> +			u8 bus_num;
> +			u16 segment_num;
> +			u16 slot_num; /* bits 2:0 reserved */
> +			u8 reserved;
> +		} device_id;
> +		struct cper_cxl_event_sn {
> +			u32 lower_dw;
> +			u32 upper_dw;
> +		} dev_serial_num;
> +	} hdr;
> +
> +	union cxl_event event;
> +};

Do we need pragma pack or similar for alignment here?

Thanks,
Smita

> +
> +struct cxl_cper_notifier_data {
> +	enum cxl_event_type event_type;
> +	struct cper_cxl_event_rec *rec;
> +};
> +
> +#ifdef CONFIG_UEFI_CPER
> +int register_cxl_cper_notifier(struct notifier_block *nb);
> +void unregister_cxl_cper_notifier(struct notifier_block *nb);
> +#else
> +static inline int register_cxl_cper_notifier(struct notifier_block *nb)
> +{
> +	return 0;
> +}
> +
> +static inline void unregister_cxl_cper_notifier(struct notifier_block *nb) { }
> +#endif
> +
>   #endif /* _LINUX_CXL_EVENT_H */
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 6/6] cxl/memdev: Register for and process CPER events
  2023-11-01 21:11 ` [PATCH RFC v3 6/6] cxl/memdev: Register for and process CPER events Ira Weiny
@ 2023-11-03  0:32   ` Smita Koralahalli
  2023-11-06 22:10     ` Ira Weiny
  0 siblings, 1 reply; 15+ messages in thread
From: Smita Koralahalli @ 2023-11-03  0:32 UTC (permalink / raw)
  To: Ira Weiny, Dan Williams, Jonathan Cameron
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

On 11/1/2023 2:11 PM, Ira Weiny wrote:

[snip]

> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 44a21ab7add5..37add91068c0 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1,5 +1,6 @@
>   // SPDX-License-Identifier: GPL-2.0-only
>   /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include <asm-generic/unaligned.h>
>   #include <linux/io-64-nonatomic-lo-hi.h>
>   #include <linux/moduleparam.h>
>   #include <linux/module.h>
> @@ -748,6 +749,60 @@ static bool cxl_event_int_is_fw(u8 setting)
>   	return mode == CXL_INT_FW;
>   }
>   
> +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> +static int cxl_cper_event_call(struct notifier_block *nb, unsigned long action,
> +			       void *data)
> +{
> +	struct cxl_cper_notifier_data *nd = data;
> +	struct cper_cxl_event_devid *device_id = &nd->rec->hdr.device_id;
> +	enum cxl_event_log_type log_type;
> +	struct cxl_memdev_state *mds;
> +	struct cxl_dev_state *cxlds;
> +	struct pci_dev *pdev;
> +	unsigned int devfn;
> +	u32 hdr_flags;
> +
> +	mds = container_of(nb, struct cxl_memdev_state, cxl_cper_nb);
> +
> +	/* PCI_DEVFN() would require 2 extra bit shifts; skip those */
> +	devfn = (device_id->slot_num & 0xfff8) | (device_id->func_num & 0x07);

devfn = PCI_DEVFN(device_id->device_num, device_id->func_num) should 
also work correct?

> +	pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> +					   device_id->bus_num, devfn);
> +	cxlds = pci_get_drvdata(pdev);
> +	if (cxlds != &mds->cxlds) {

Do we need a error message here?

Thanks,
Smita

> +		pci_dev_put(pdev);
> +		return NOTIFY_DONE;
> +	}
> +
> +	/* Fabricate a log type */
> +	hdr_flags = get_unaligned_le24(nd->rec->event.generic.hdr.flags);
> +	log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> +
> +	cxl_event_trace_record(mds->cxlds.cxlmd, log_type, nd->event_type,
> +			       &nd->rec->event);
> +	pci_dev_put(pdev);
> +	return NOTIFY_OK;
> +}
> +
> +static void cxl_unregister_cper_events(void *_mds)
> +{
> +	struct cxl_memdev_state *mds = _mds;
> +
> +	unregister_cxl_cper_notifier(&mds->cxl_cper_nb);
> +}
> +
> +static void register_cper_events(struct cxl_memdev_state *mds)
> +{
> +	mds->cxl_cper_nb.notifier_call = cxl_cper_event_call;
> +
> +	if (register_cxl_cper_notifier(&mds->cxl_cper_nb)) {
> +		dev_err(mds->cxlds.dev, "CPER registration failed\n");
> +		return;
> +	}
> +
> +	devm_add_action_or_reset(mds->cxlds.dev, cxl_unregister_cper_events, mds);
> +}
> +
>   static int cxl_event_config(struct pci_host_bridge *host_bridge,
>   			    struct cxl_memdev_state *mds)
>   {
> @@ -758,8 +813,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
>   	 * When BIOS maintains CXL error reporting control, it will process
>   	 * event records.  Only one agent can do so.
>   	 */
> -	if (!host_bridge->native_cxl_error)
> +	if (!host_bridge->native_cxl_error) {
> +		register_cper_events(mds);
>   		return 0;
> +	}
>   
>   	rc = cxl_mem_alloc_event_buf(mds);
>   	if (rc)
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events
  2023-11-01 21:11 ` [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events Ira Weiny
@ 2023-11-03 14:27   ` Jonathan Cameron
  2023-11-03 16:12     ` Shiju Jose
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2023-11-03 14:27 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Smita Koralahalli, Yazen Ghannam, Davidlohr Bueso,
	Dave Jiang, Alison Schofield, Vishal Verma, Ard Biesheuvel,
	linux-efi, linux-kernel, linux-cxl, shiju.jose

On Wed, 01 Nov 2023 14:11:18 -0700
Ira Weiny <ira.weiny@intel.com> wrote:

> The uuid printed in the well known events is redundant.  The uuid
> defines what the event was.
> 
> Remove the uuid from the known events and only report it in the generic
> event as it remains informative there.
> 
> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Removing the print is fine, but look like this also removes the actual trace point
field.  That's userspace ABI.  Expanding it is fine, but taking fields away
is more problematic.

Are we sure we don't break anyone?  Shiju, will rasdaemon be fine with this
change?

Thanks,

Jonathan



> ---
>  drivers/cxl/core/trace.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index a0b5819bc70b..79ed03637604 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -189,7 +189,6 @@ TRACE_EVENT(cxl_overflow,
>  	__string(memdev, dev_name(&cxlmd->dev))			\
>  	__string(host, dev_name(cxlmd->dev.parent))		\
>  	__field(int, log)					\
> -	__field_struct(uuid_t, hdr_uuid)			\
>  	__field(u64, serial)					\
>  	__field(u32, hdr_flags)					\
>  	__field(u16, hdr_handle)				\
> @@ -203,7 +202,6 @@ TRACE_EVENT(cxl_overflow,
>  	__assign_str(host, dev_name((cxlmd)->dev.parent));			\
>  	__entry->log = (l);							\
>  	__entry->serial = (cxlmd)->cxlds->serial;				\
> -	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
>  	__entry->hdr_length = (hdr).length;					\
>  	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
>  	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
> @@ -212,12 +210,12 @@ TRACE_EVENT(cxl_overflow,
>  	__entry->hdr_maint_op_class = (hdr).maint_op_class
>  
>  #define CXL_EVT_TP_printk(fmt, ...) \
> -	TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb "	\
> +	TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu "		\
>  		"len=%d flags='%s' handle=%x related_handle=%x "		\
>  		"maint_op_class=%u : " fmt,					\
>  		__get_str(memdev), __get_str(host), __entry->serial,		\
>  		cxl_event_log_type_str(__entry->log),				\
> -		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> +		__entry->hdr_timestamp, __entry->hdr_length,			\
>  		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
>  		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
>  		##__VA_ARGS__)
> @@ -231,15 +229,17 @@ TRACE_EVENT(cxl_generic_event,
>  
>  	TP_STRUCT__entry(
>  		CXL_EVT_TP_entry
> +		__field_struct(uuid_t, hdr_uuid)
>  		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
>  	),
>  
>  	TP_fast_assign(
>  		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
> +		memcpy(&__entry->hdr_uuid, &rec->hdr.id, sizeof(uuid_t));
>  		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
>  	),
>  
> -	CXL_EVT_TP_printk("%s",
> +	CXL_EVT_TP_printk("uuid=%pUb %s", &__entry->hdr_uuid,
>  		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
>  );
>  
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events
  2023-11-03 14:27   ` Jonathan Cameron
@ 2023-11-03 16:12     ` Shiju Jose
  2023-11-06 22:05       ` Ira Weiny
  0 siblings, 1 reply; 15+ messages in thread
From: Shiju Jose @ 2023-11-03 16:12 UTC (permalink / raw)
  To: Jonathan Cameron, Ira Weiny
  Cc: Dan Williams, Smita Koralahalli, Yazen Ghannam, Davidlohr Bueso,
	Dave Jiang, Alison Schofield, Vishal Verma, Ard Biesheuvel,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org



>-----Original Message-----
>From: Jonathan Cameron <jonathan.cameron@huawei.com>
>Sent: 03 November 2023 14:28
>To: Ira Weiny <ira.weiny@intel.com>
>Cc: Dan Williams <dan.j.williams@intel.com>; Smita Koralahalli
><Smita.KoralahalliChannabasappa@amd.com>; Yazen Ghannam
><yazen.ghannam@amd.com>; Davidlohr Bueso <dave@stgolabs.net>; Dave
>Jiang <dave.jiang@intel.com>; Alison Schofield <alison.schofield@intel.com>;
>Vishal Verma <vishal.l.verma@intel.com>; Ard Biesheuvel <ardb@kernel.org>;
>linux-efi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
>cxl@vger.kernel.org; Shiju Jose <shiju.jose@huawei.com>
>Subject: Re: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known
>events
>
>On Wed, 01 Nov 2023 14:11:18 -0700
>Ira Weiny <ira.weiny@intel.com> wrote:
>
>> The uuid printed in the well known events is redundant.  The uuid
>> defines what the event was.
>>
>> Remove the uuid from the known events and only report it in the
>> generic event as it remains informative there.
>>
>> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>
>Removing the print is fine, but look like this also removes the actual trace point
>field.  That's userspace ABI.  Expanding it is fine, but taking fields away is more
>problematic.
>
>Are we sure we don't break anyone?  Shiju, will rasdaemon be fine with this
>change?

The field hdr_uuid is removed from the common CXL_EVT_TP_entry shared by the
trace events cxl_generic_event, cxl_general_media, cxl_dram and cxl_memory_module .
rasdaemon will break because of this while processing these trace events
and also affects the corresponding error records in the SQLite data base. 
Rasdaemon needs update to avoid this.
 
>
>Thanks,
>
>Jonathan
>

Thanks,
Shiju

>
>
>> ---
>>  drivers/cxl/core/trace.h | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index
>> a0b5819bc70b..79ed03637604 100644
>> --- a/drivers/cxl/core/trace.h
>> +++ b/drivers/cxl/core/trace.h
>> @@ -189,7 +189,6 @@ TRACE_EVENT(cxl_overflow,
>>  	__string(memdev, dev_name(&cxlmd->dev))			\
>>  	__string(host, dev_name(cxlmd->dev.parent))		\
>>  	__field(int, log)					\
>> -	__field_struct(uuid_t, hdr_uuid)			\
>>  	__field(u64, serial)					\
>>  	__field(u32, hdr_flags)					\
>>  	__field(u16, hdr_handle)				\
>> @@ -203,7 +202,6 @@ TRACE_EVENT(cxl_overflow,
>>  	__assign_str(host, dev_name((cxlmd)->dev.parent));
>	\
>>  	__entry->log = (l);
>	\
>>  	__entry->serial = (cxlmd)->cxlds->serial;				\
>> -	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));
>	\
>>  	__entry->hdr_length = (hdr).length;
>	\
>>  	__entry->hdr_flags = get_unaligned_le24((hdr).flags);
>	\
>>  	__entry->hdr_handle = le16_to_cpu((hdr).handle);
>	\
>> @@ -212,12 +210,12 @@ TRACE_EVENT(cxl_overflow,
>>  	__entry->hdr_maint_op_class = (hdr).maint_op_class
>>
>>  #define CXL_EVT_TP_printk(fmt, ...) \
>> -	TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu
>uuid=%pUb "	\
>> +	TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu "
>	\
>>  		"len=%d flags='%s' handle=%x related_handle=%x "
>	\
>>  		"maint_op_class=%u : " fmt,
>	\
>>  		__get_str(memdev), __get_str(host), __entry->serial,
>	\
>>  		cxl_event_log_type_str(__entry->log),
>	\
>> -		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry-
>>hdr_length,\
>> +		__entry->hdr_timestamp, __entry->hdr_length,
>	\
>>  		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,
>	\
>>  		__entry->hdr_related_handle, __entry->hdr_maint_op_class,
>	\
>>  		##__VA_ARGS__)
>> @@ -231,15 +229,17 @@ TRACE_EVENT(cxl_generic_event,
>>
>>  	TP_STRUCT__entry(
>>  		CXL_EVT_TP_entry
>> +		__field_struct(uuid_t, hdr_uuid)
>>  		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
>>  	),
>>
>>  	TP_fast_assign(
>>  		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
>> +		memcpy(&__entry->hdr_uuid, &rec->hdr.id, sizeof(uuid_t));
>>  		memcpy(__entry->data, &rec->data,
>CXL_EVENT_RECORD_DATA_LENGTH);
>>  	),
>>
>> -	CXL_EVT_TP_printk("%s",
>> +	CXL_EVT_TP_printk("uuid=%pUb %s", &__entry->hdr_uuid,
>>  		__print_hex(__entry->data,
>CXL_EVENT_RECORD_DATA_LENGTH))  );
>>
>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events
  2023-11-03 16:12     ` Shiju Jose
@ 2023-11-06 22:05       ` Ira Weiny
  2023-11-07  9:38         ` Shiju Jose
  0 siblings, 1 reply; 15+ messages in thread
From: Ira Weiny @ 2023-11-06 22:05 UTC (permalink / raw)
  To: Shiju Jose, Jonathan Cameron, Ira Weiny
  Cc: Dan Williams, Smita Koralahalli, Yazen Ghannam, Davidlohr Bueso,
	Dave Jiang, Alison Schofield, Vishal Verma, Ard Biesheuvel,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org

Shiju Jose wrote:
> 
> 
> >-----Original Message-----
> >From: Jonathan Cameron <jonathan.cameron@huawei.com>
> >Sent: 03 November 2023 14:28
> >To: Ira Weiny <ira.weiny@intel.com>
> >Cc: Dan Williams <dan.j.williams@intel.com>; Smita Koralahalli
> ><Smita.KoralahalliChannabasappa@amd.com>; Yazen Ghannam
> ><yazen.ghannam@amd.com>; Davidlohr Bueso <dave@stgolabs.net>; Dave
> >Jiang <dave.jiang@intel.com>; Alison Schofield <alison.schofield@intel.com>;
> >Vishal Verma <vishal.l.verma@intel.com>; Ard Biesheuvel <ardb@kernel.org>;
> >linux-efi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> >cxl@vger.kernel.org; Shiju Jose <shiju.jose@huawei.com>
> >Subject: Re: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known
> >events
> >
> >On Wed, 01 Nov 2023 14:11:18 -0700
> >Ira Weiny <ira.weiny@intel.com> wrote:
> >
> >> The uuid printed in the well known events is redundant.  The uuid
> >> defines what the event was.
> >>
> >> Remove the uuid from the known events and only report it in the
> >> generic event as it remains informative there.
> >>
> >> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
> >> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >
> >Removing the print is fine, but look like this also removes the actual trace point
> >field.  That's userspace ABI.  Expanding it is fine, but taking fields away is more
> >problematic.
> >
> >Are we sure we don't break anyone?  Shiju, will rasdaemon be fine with this
> >change?
> 
> The field hdr_uuid is removed from the common CXL_EVT_TP_entry shared by the
> trace events cxl_generic_event, cxl_general_media, cxl_dram and cxl_memory_module .
> rasdaemon will break because of this while processing these trace events
> and also affects the corresponding error records in the SQLite data base. 
> Rasdaemon needs update to avoid this.
>  

Ok we can leave the uuid field in easy enough.

But does rasdaemon use the value of the field for anything?  In other
words does CPER record processing need to generate a proper UUID value?

Ira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 6/6] cxl/memdev: Register for and process CPER events
  2023-11-03  0:32   ` Smita Koralahalli
@ 2023-11-06 22:10     ` Ira Weiny
  0 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-06 22:10 UTC (permalink / raw)
  To: Smita Koralahalli, Ira Weiny, Dan Williams, Jonathan Cameron
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

Smita Koralahalli wrote:
> On 11/1/2023 2:11 PM, Ira Weiny wrote:
> 

[snip]

> > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > +static int cxl_cper_event_call(struct notifier_block *nb, unsigned long action,
> > +			       void *data)
> > +{
> > +	struct cxl_cper_notifier_data *nd = data;
> > +	struct cper_cxl_event_devid *device_id = &nd->rec->hdr.device_id;
> > +	enum cxl_event_log_type log_type;
> > +	struct cxl_memdev_state *mds;
> > +	struct cxl_dev_state *cxlds;
> > +	struct pci_dev *pdev;
> > +	unsigned int devfn;
> > +	u32 hdr_flags;
> > +
> > +	mds = container_of(nb, struct cxl_memdev_state, cxl_cper_nb);
> > +
> > +	/* PCI_DEVFN() would require 2 extra bit shifts; skip those */
> > +	devfn = (device_id->slot_num & 0xfff8) | (device_id->func_num & 0x07);
> 
> devfn = PCI_DEVFN(device_id->device_num, device_id->func_num) should 
> also work correct?

Device num is the slot number right shifted?  If so then yes.  I'm not an
expert on the PCIe nomenclature.

> 
> > +	pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> > +					   device_id->bus_num, devfn);
> > +	cxlds = pci_get_drvdata(pdev);
> > +	if (cxlds != &mds->cxlds) {
> 
> Do we need a error message here?

No, it is just that this event is not for this device.  Another device
will process it.  Or if there is no driver loaded for the device it will
be ignored.  (Same as would happen if the events were coming through the
log because the driver is not monitoring the log.)

Ira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 5/6] firmware/efi: Process CXL Component Events
  2023-11-03  0:30   ` Smita Koralahalli
@ 2023-11-06 22:12     ` Ira Weiny
  0 siblings, 0 replies; 15+ messages in thread
From: Ira Weiny @ 2023-11-06 22:12 UTC (permalink / raw)
  To: Smita Koralahalli, Ira Weiny, Dan Williams, Jonathan Cameron
  Cc: Yazen Ghannam, Davidlohr Bueso, Dave Jiang, Alison Schofield,
	Vishal Verma, Ard Biesheuvel, linux-efi, linux-kernel, linux-cxl

Smita Koralahalli wrote:
> On 11/1/2023 2:11 PM, Ira Weiny wrote:
> 
>

[snip]

> > +
> > +#define CPER_CXL_DEVICE_ID_VALID		BIT(0)
> > +#define CPER_CXL_DEVICE_SN_VALID		BIT(1)
> > +#define CPER_CXL_COMP_EVENT_LOG_VALID		BIT(2)
> > +struct cper_cxl_event_rec {
> > +	struct {
> > +		u32 length;
> > +		u64 validation_bits;
> > +		struct cper_cxl_event_devid {
> > +			u16 vendor_id;
> > +			u16 device_id;
> > +			u8 func_num;
> > +			u8 device_num;
> > +			u8 bus_num;
> > +			u16 segment_num;
> > +			u16 slot_num; /* bits 2:0 reserved */
> > +			u8 reserved;
> > +		} device_id;
> > +		struct cper_cxl_event_sn {
> > +			u32 lower_dw;
> > +			u32 upper_dw;
> > +		} dev_serial_num;
> > +	} hdr;
> > +
> > +	union cxl_event event;
> > +};
> 
> Do we need pragma pack or similar for alignment here?

Yes good catch.

Fixed,
Ira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events
  2023-11-06 22:05       ` Ira Weiny
@ 2023-11-07  9:38         ` Shiju Jose
  0 siblings, 0 replies; 15+ messages in thread
From: Shiju Jose @ 2023-11-07  9:38 UTC (permalink / raw)
  To: Ira Weiny, Jonathan Cameron
  Cc: Dan Williams, Smita Koralahalli, Yazen Ghannam, Davidlohr Bueso,
	Dave Jiang, Alison Schofield, Vishal Verma, Ard Biesheuvel,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org

Hi Ira,

>-----Original Message-----
>From: Ira Weiny <ira.weiny@intel.com>
>Sent: 06 November 2023 22:06
>To: Shiju Jose <shiju.jose@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; Ira Weiny <ira.weiny@intel.com>
>Cc: Dan Williams <dan.j.williams@intel.com>; Smita Koralahalli
><Smita.KoralahalliChannabasappa@amd.com>; Yazen Ghannam
><yazen.ghannam@amd.com>; Davidlohr Bueso <dave@stgolabs.net>; Dave
>Jiang <dave.jiang@intel.com>; Alison Schofield <alison.schofield@intel.com>;
>Vishal Verma <vishal.l.verma@intel.com>; Ard Biesheuvel <ardb@kernel.org>;
>linux-efi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
>cxl@vger.kernel.org
>Subject: RE: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known
>events
>
>Shiju Jose wrote:
>>
>>
>> >-----Original Message-----
>> >From: Jonathan Cameron <jonathan.cameron@huawei.com>
>> >Sent: 03 November 2023 14:28
>> >To: Ira Weiny <ira.weiny@intel.com>
>> >Cc: Dan Williams <dan.j.williams@intel.com>; Smita Koralahalli
>> ><Smita.KoralahalliChannabasappa@amd.com>; Yazen Ghannam
>> ><yazen.ghannam@amd.com>; Davidlohr Bueso <dave@stgolabs.net>; Dave
>> >Jiang <dave.jiang@intel.com>; Alison Schofield
>> ><alison.schofield@intel.com>; Vishal Verma
>> ><vishal.l.verma@intel.com>; Ard Biesheuvel <ardb@kernel.org>;
>> >linux-efi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
>> >cxl@vger.kernel.org; Shiju Jose <shiju.jose@huawei.com>
>> >Subject: Re: [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event
>> >trace known events
>> >
>> >On Wed, 01 Nov 2023 14:11:18 -0700
>> >Ira Weiny <ira.weiny@intel.com> wrote:
>> >
>> >> The uuid printed in the well known events is redundant.  The uuid
>> >> defines what the event was.
>> >>
>> >> Remove the uuid from the known events and only report it in the
>> >> generic event as it remains informative there.
>> >>
>> >> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>> >> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> >> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>> >
>> >Removing the print is fine, but look like this also removes the
>> >actual trace point field.  That's userspace ABI.  Expanding it is
>> >fine, but taking fields away is more problematic.
>> >
>> >Are we sure we don't break anyone?  Shiju, will rasdaemon be fine
>> >with this change?
>>
>> The field hdr_uuid is removed from the common CXL_EVT_TP_entry shared
>> by the trace events cxl_generic_event, cxl_general_media, cxl_dram and
>cxl_memory_module .
>> rasdaemon will break because of this while processing these trace
>> events and also affects the corresponding error records in the SQLite data
>base.
>> Rasdaemon needs update to avoid this.
>>
>
>Ok we can leave the uuid field in easy enough.
>
>But does rasdaemon use the value of the field for anything?  In other words does
>CPER record processing need to generate a proper UUID value?
No. Presently used for logging purpose only in the rasdaemon.

>
>Ira

Thanks,
Shiju

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-11-07  9:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-01 21:11 [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events Ira Weiny
2023-11-01 21:11 ` [PATCH RFC v3 1/6] cxl/trace: Remove uuid from event trace known events Ira Weiny
2023-11-03 14:27   ` Jonathan Cameron
2023-11-03 16:12     ` Shiju Jose
2023-11-06 22:05       ` Ira Weiny
2023-11-07  9:38         ` Shiju Jose
2023-11-01 21:11 ` [PATCH RFC v3 2/6] cxl/events: Promote CXL event structures to a core header Ira Weiny
2023-11-01 21:11 ` [PATCH RFC v3 3/6] cxl/events: Remove UUID from non-generic event structures Ira Weiny
2023-11-01 21:11 ` [PATCH RFC v3 4/6] cxl/events: Create a CXL event union Ira Weiny
2023-11-01 21:11 ` [PATCH RFC v3 5/6] firmware/efi: Process CXL Component Events Ira Weiny
2023-11-03  0:30   ` Smita Koralahalli
2023-11-06 22:12     ` Ira Weiny
2023-11-01 21:11 ` [PATCH RFC v3 6/6] cxl/memdev: Register for and process CPER events Ira Weiny
2023-11-03  0:32   ` Smita Koralahalli
2023-11-06 22:10     ` Ira Weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox