From: Dan Williams <dan.j.williams@intel.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
Shiju Jose <shiju.jose@huawei.com>,
Yazen Ghannam <yazen.ghannam@amd.com>,
"Davidlohr Bueso" <dave@stgolabs.net>,
Dave Jiang <dave.jiang@intel.com>,
"Alison Schofield" <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ard Biesheuvel <ardb@kernel.org>, <linux-efi@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>
Subject: Re: [PATCH v5 0/9] efi/cxl-cper: Report CPER CXL component events through trace events
Date: Mon, 8 Jan 2024 18:08:13 -0800
Message-ID: <659caa8da651c_127da22947b@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <eefc5888-4610-8e39-61ed-2d84e9ebf255@amd.com>
Smita Koralahalli wrote:
> On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > On Wed, 20 Dec 2023 16:17:27 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >
> >> Series status/background
> >> ========================
> >>
> >> Smita has been a great help with this series. Thank you again!
> >>
> >> Smita's testing found that the GHES code ended up printing the events
> >> twice. This version avoids the duplicate print by calling the callback
> >> from the GHES code instead of the EFI code as suggested by Dan.
> >
> > I'm not sure this is working as intended.
> >
> > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > and now the EFI code handling that pretty printed things is missing we get
> > the horrible kernel logging for an unknown block instead.
> >
> > So I think we need some minimal code in cper.c to match the guids then not
> > log them (on basis we are arguing there is no need for new cper records).
> > Otherwise we are in for some messy kernel logs
> >
> > Something like:
> >
> > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > {1}[Hardware Error]: event severity: recoverable
> > {1}[Hardware Error]: Error 0, type: recoverable
> > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > {1}[Hardware Error]: section length: 0x90
> > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> >
> > (I'm filling the record with 0s currently)
>
> Yeah, when I tested this, I thought it's okay for the hexdump to be there
> in dmesg from EFI as the handling is done in trace events from GHES.
>
> If we need to handle from EFI, then it would be a good reason to move
> the GUIDs out from GHES and place them in a common location for EFI/cper
> to share, similar to protocol errors.

Ah, yes, my expectation was more aligned with Jonathan's observation to
do the processing in GHES code *and* skip the processing in the CPER
code, something like:

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 56a5d2ef9e0a..e13e5fa4df4b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -666,30 +666,6 @@ static cxl_cper_callback cper_callback;

 /* CXL Event record UUIDs are formatted as GUIDs and reported in section type */

-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CPER_SEC_CXL_GEN_MEDIA_GUID \
-	GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
-		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
-
-/*
- * DRAM Event Record
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
- */
-#define CPER_SEC_CXL_DRAM_GUID \
-	GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
-		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-#define CPER_SEC_CXL_MEM_MODULE_GUID \
-	GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
-		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
-
 static void cxl_cper_post_event(enum cxl_event_type event_type,
 				struct cxl_cper_event_rec *rec)
 {
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 35c37f667781..0a4eed470750 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -24,6 +24,7 @@
 #include <linux/bcd.h>
 #include <acpi/ghes.h>
 #include <ras/ras_event.h>
+#include <linux/cxl-event.h>
 #include "cper_cxl.h"

 /*
@@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
 			cper_print_prot_err(newpfx, prot_err);
 		else
 			goto err_section_too_small;
+	} else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
+		printk("%ssection_type: CXL General Media Error\n", newpfx);
+		/* see: cxl_cper_event_call() */
+	} else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
+		printk("%ssection_type: CXL DRAM Error\n", newpfx);
+		/* see: cxl_cper_event_call() */
+	} else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+		printk("%ssection_type: CXL Memory Module Error\n", newpfx);
+		/* see: cxl_cper_event_call() */
 	} else {
 		const void *err = acpi_hest_get_payload(gdata);

diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 17eadee819b6..6d9a7df88d4a 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -1,12 +1,31 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_CXL_EVENT_H
 #define _LINUX_CXL_EVENT_H
+#include <linux/uuid.h>

 /*
- * CXL event records; CXL rev 3.0
- *
- * Copyright(c) 2023 Intel Corporation.
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA_GUID \
+	GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM_GUID \
+	GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
  */
+#define CPER_SEC_CXL_MEM_MODULE_GUID \
+	GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)

 struct cxl_event_record_hdr {
 	u8 length;
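
For anyone who wants to poke at the matching logic outside the kernel,
here is a minimal userspace sketch in C. It is not part of the proposed
change and has not been run against real CPER records. All names in it
(sketch_guid_t, SKETCH_GUID_INIT, sketch_cxl_sec_name,
sketch_print_section) are hypothetical stand-ins. SKETCH_GUID_INIT is
assumed to use the same byte layout as the kernel's GUID_INIT(), with
the first three fields stored little-endian, and the three GUID values
are copied from the diff above. The idea it illustrates: a section
whose type matches a known CXL event GUID gets a one-line label and no
hexdump, and everything else falls through to the "unknown" path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's guid_t: 16 raw bytes. */
typedef struct { uint8_t b[16]; } sketch_guid_t;
static_assert(sizeof(sketch_guid_t) == 16, "a GUID is 16 bytes");

/* Assumed to mirror GUID_INIT(): first three fields are little-endian. */
#define SKETCH_GUID_INIT(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7) \
	{{ (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
	   (b) & 0xff, ((b) >> 8) & 0xff, \
	   (c) & 0xff, ((c) >> 8) & 0xff, \
	   (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }}

struct sketch_cxl_sec {
	sketch_guid_t guid;
	const char *name;
};

/* GUID values taken from the diff; the table itself is illustrative. */
static const struct sketch_cxl_sec sketch_cxl_secs[] = {
	{ SKETCH_GUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
			   0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
	  "CXL General Media Error" },
	{ SKETCH_GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
			   0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
	  "CXL DRAM Error" },
	{ SKETCH_GUID_INIT(0xfe927475, 0xdd59, 0x4339,
			   0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
	  "CXL Memory Module Error" },
};

/* Return the label for a known CXL event GUID, or NULL if unknown. */
static const char *sketch_cxl_sec_name(const sketch_guid_t *sec_type)
{
	size_t n = sizeof(sketch_cxl_secs) / sizeof(sketch_cxl_secs[0]);

	for (size_t i = 0; i < n; i++)
		if (memcmp(sec_type, &sketch_cxl_secs[i].guid,
			   sizeof(*sec_type)) == 0)
			return sketch_cxl_secs[i].name;
	return NULL;
}

/*
 * Format the section line into out. Returns 1 when the section is a
 * known CXL event (the caller would skip the hexdump), 0 otherwise.
 */
static int sketch_print_section(const char *pfx, const sketch_guid_t *sec_type,
				char *out, size_t len)
{
	const char *name = sketch_cxl_sec_name(sec_type);

	if (name) {
		snprintf(out, len, "%ssection_type: %s\n", pfx, name);
		return 1;
	}
	snprintf(out, len, "%ssection type: unknown\n", pfx);
	return 0;
}
```

A caller would pass the section type from the record and only emit the
raw payload dump when sketch_print_section() returns 0.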