* [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors
@ 2024-11-19 0:39 Smita Koralahalli
2024-11-19 0:39 ` [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_ Smita Koralahalli
` (6 more replies)
0 siblings, 7 replies; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
This patchset adds logging support for CXL CPER endpoint and port protocol
errors.
The first 5 patches update the existing codebase to support CXL CPER
Protocol error reporting.
The last 2 patches add recognition and reporting of CXL CPER Protocol
errors.
Should be based on top of:
https://lore.kernel.org/linux-cxl/20241113215429.3177981-1-terry.bowman@amd.com
Link to v2:
https://lore.kernel.org/linux-cxl/20241001005234.61409-1-Smita.KoralahalliChannabasappa@amd.com/
Changes in v2 -> v3:
[Dan] Define a new workqueue for CXL CPER Protocol errors and avoid
reusing existing workqueue which handles CXL CPER events.
[Dan] Update function and struct names.
[Ira] Don't define common function get_cxl_devstate().
[Dan] Use switch cases rather than defining array of structures.
[Dan] Pass the entire cxl_cper_prot_err struct for CXL subsystem.
[Dan] Use pr_err_ratelimited().
[Dan] Use AER_ severities directly. Don't define CXL_ severities.
[Dan] Limit either to Device ID or Agent Info check.
[Dan] Validate size of RAS field matches expectations.
Changes in v1 -> v2:
[Jonathan] Refactor code for trace support. Rename get_cxl_dev()
to get_cxl_devstate().
[Jonathan] Cleanups for get_cxl_devstate().
[Alison, Jonathan] Define array of structures for Device ID and Serial
number comparison.
[Dave] p_err -> rec/p_rec.
[Jonathan] Remove pr_warn.
Smita Koralahalli (7):
efi/cper, cxl: Prefix protocol error struct and function names with
cxl_
efi/cper, cxl: Make definitions and structures global
efi/cper, cxl: Remove cper_cxl.h
acpi/ghes, cxl: Rename cxl_cper_register_work to
cxl_cper_register_event_work
acpi/ghes, cxl: Refactor work registration functions to support
multiple workqueues
acpi/ghes, cper: Recognize and cache CXL Protocol errors
acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
drivers/acpi/apei/ghes.c | 129 +++++++++++++++++++++++++++++---
drivers/cxl/core/pci.c | 50 +++++++++++++
drivers/cxl/cxlpci.h | 6 ++
drivers/cxl/pci.c | 59 ++++++++++++++-
drivers/firmware/efi/cper.c | 6 +-
drivers/firmware/efi/cper_cxl.c | 39 +---------
drivers/firmware/efi/cper_cxl.h | 66 ----------------
include/cxl/event.h | 109 ++++++++++++++++++++++++++-
include/linux/cper.h | 8 ++
9 files changed, 351 insertions(+), 121 deletions(-)
delete mode 100644 drivers/firmware/efi/cper_cxl.h
--
2.17.1
* [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
@ 2024-11-19 0:39 ` Smita Koralahalli
2024-11-26 15:05 ` Jonathan Cameron
2024-12-02 18:12 ` Ira Weiny
2024-11-19 0:39 ` [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global Smita Koralahalli
` (5 subsequent siblings)
6 siblings, 2 replies; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Rename the protocol error struct from struct cper_sec_prot_err to
struct cxl_cper_sec_prot_err and cper_print_prot_err() to
cxl_cper_print_prot_err() to maintain naming consistency. No
functional changes.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/firmware/efi/cper.c | 4 ++--
drivers/firmware/efi/cper_cxl.c | 3 ++-
drivers/firmware/efi/cper_cxl.h | 5 +++--
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index b69e68ef3f02..8e5762f7ef2e 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -624,11 +624,11 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
else
goto err_section_too_small;
} else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
- struct cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
+ struct cxl_cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
printk("%ssection_type: CXL Protocol Error\n", newpfx);
if (gdata->error_data_length >= sizeof(*prot_err))
- cper_print_prot_err(newpfx, prot_err);
+ cxl_cper_print_prot_err(newpfx, prot_err);
else
goto err_section_too_small;
} else {
diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
index a55771b99a97..cbaabcb7382d 100644
--- a/drivers/firmware/efi/cper_cxl.c
+++ b/drivers/firmware/efi/cper_cxl.c
@@ -55,7 +55,8 @@ enum {
USP, /* CXL Upstream Switch Port */
};
-void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err)
+void cxl_cper_print_prot_err(const char *pfx,
+ const struct cxl_cper_sec_prot_err *prot_err)
{
if (prot_err->valid_bits & PROT_ERR_VALID_AGENT_TYPE)
pr_info("%s agent_type: %d, %s\n", pfx, prot_err->agent_type,
diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
index 86bfcf7909ec..0e3ab0ba17c3 100644
--- a/drivers/firmware/efi/cper_cxl.h
+++ b/drivers/firmware/efi/cper_cxl.h
@@ -18,7 +18,7 @@
#pragma pack(1)
/* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
-struct cper_sec_prot_err {
+struct cxl_cper_sec_prot_err {
u64 valid_bits;
u8 agent_type;
u8 reserved[7];
@@ -61,6 +61,7 @@ struct cper_sec_prot_err {
#pragma pack()
-void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
+void cxl_cper_print_prot_err(const char *pfx,
+ const struct cxl_cper_sec_prot_err *prot_err);
#endif //__CPER_CXL_
--
2.17.1
* [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
2024-11-19 0:39 ` [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_ Smita Koralahalli
@ 2024-11-19 0:39 ` Smita Koralahalli
2024-11-26 15:09 ` Jonathan Cameron
2024-12-02 18:15 ` Ira Weiny
2024-11-19 0:39 ` [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h Smita Koralahalli
` (4 subsequent siblings)
6 siblings, 2 replies; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
In preparation for adding tracepoint support, move the protocol error UUID
definition to a common location. Also, make struct cxl_ras_capability_regs,
struct cxl_cper_sec_prot_err and the CPER validation flags global for use
across different modules.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
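Note (not part of the commit): since the structures below become visible
to other modules, it may help to see the layout they imply. The following
is a userspace sketch only. The "_sketch" type names are hypothetical and
the fields are copied from the structs moved by this patch. Under
#pragma pack(1) the section header works out to 116 bytes, and the RAS
capability block to 88 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace mirror of struct cxl_cper_sec_prot_err for checking the
 * packed layout. The "_sketch" names are hypothetical; the fields follow
 * the struct moved into include/cxl/event.h by this patch.
 */
#pragma pack(1)
struct prot_err_sketch {
	uint64_t valid_bits;
	uint8_t agent_type;
	uint8_t reserved[7];
	union {
		uint64_t rcrb_base_addr;
		struct {
			uint8_t function;
			uint8_t device;
			uint8_t bus;
			uint16_t segment;
			uint8_t reserved_1[3];
		};
	} agent_addr;
	struct {
		uint16_t vendor_id;
		uint16_t device_id;
		uint16_t subsystem_vendor_id;
		uint16_t subsystem_id;
		uint8_t class_code[2];
		uint16_t slot;
		uint8_t reserved_1[4];
	} device_id;
	struct {
		uint32_t lower_dw;
		uint32_t upper_dw;
	} dev_serial_num;
	uint8_t capability[60];
	uint16_t dvsec_len;
	uint16_t err_len;
	uint8_t reserved_2[4];
};
#pragma pack()

/* Mirror of struct cxl_ras_capability_regs: naturally aligned u32s only. */
struct ras_cap_sketch {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t uncor_severity;
	uint32_t cor_status;
	uint32_t cor_mask;
	uint32_t cap_control;
	uint32_t header_log[16];
};

/* Compile-time layout checks; a mismatch fails the build. */
_Static_assert(sizeof(struct prot_err_sketch) == 116, "section header size");
_Static_assert(offsetof(struct prot_err_sketch, dvsec_len) == 108,
	       "dvsec_len offset");
_Static_assert(sizeof(struct ras_cap_sketch) == 88, "RAS capability size");
```

The RAS capability struct needs no packing because it only contains u32
members, which is presumably why it sits outside the pack(1) region.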
drivers/firmware/efi/cper.c | 1 +
drivers/firmware/efi/cper_cxl.c | 35 +--------------
drivers/firmware/efi/cper_cxl.h | 51 ---------------------
include/cxl/event.h | 80 +++++++++++++++++++++++++++++++++
include/linux/cper.h | 4 ++
5 files changed, 86 insertions(+), 85 deletions(-)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 8e5762f7ef2e..ae1953e2b214 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -24,6 +24,7 @@
#include <linux/bcd.h>
#include <acpi/ghes.h>
#include <ras/ras_event.h>
+#include <cxl/event.h>
#include "cper_cxl.h"
/*
diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
index cbaabcb7382d..64c0dd27be6e 100644
--- a/drivers/firmware/efi/cper_cxl.c
+++ b/drivers/firmware/efi/cper_cxl.c
@@ -8,27 +8,9 @@
*/
#include <linux/cper.h>
+#include <cxl/event.h>
#include "cper_cxl.h"
-#define PROT_ERR_VALID_AGENT_TYPE BIT_ULL(0)
-#define PROT_ERR_VALID_AGENT_ADDRESS BIT_ULL(1)
-#define PROT_ERR_VALID_DEVICE_ID BIT_ULL(2)
-#define PROT_ERR_VALID_SERIAL_NUMBER BIT_ULL(3)
-#define PROT_ERR_VALID_CAPABILITY BIT_ULL(4)
-#define PROT_ERR_VALID_DVSEC BIT_ULL(5)
-#define PROT_ERR_VALID_ERROR_LOG BIT_ULL(6)
-
-/* CXL RAS Capability Structure, CXL v3.0 sec 8.2.4.16 */
-struct cxl_ras_capability_regs {
- u32 uncor_status;
- u32 uncor_mask;
- u32 uncor_severity;
- u32 cor_status;
- u32 cor_mask;
- u32 cap_control;
- u32 header_log[16];
-};
-
static const char * const prot_err_agent_type_strs[] = {
"Restricted CXL Device",
"Restricted CXL Host Downstream Port",
@@ -40,21 +22,6 @@ static const char * const prot_err_agent_type_strs[] = {
"CXL Upstream Switch Port",
};
-/*
- * The layout of the enumeration and the values matches CXL Agent Type
- * field in the UEFI 2.10 Section N.2.13,
- */
-enum {
- RCD, /* Restricted CXL Device */
- RCH_DP, /* Restricted CXL Host Downstream Port */
- DEVICE, /* CXL Device */
- LD, /* CXL Logical Device */
- FMLD, /* CXL Fabric Manager managed Logical Device */
- RP, /* CXL Root Port */
- DSP, /* CXL Downstream Switch Port */
- USP, /* CXL Upstream Switch Port */
-};
-
void cxl_cper_print_prot_err(const char *pfx,
const struct cxl_cper_sec_prot_err *prot_err)
{
diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
index 0e3ab0ba17c3..5ce1401ee17a 100644
--- a/drivers/firmware/efi/cper_cxl.h
+++ b/drivers/firmware/efi/cper_cxl.h
@@ -10,57 +10,6 @@
#ifndef LINUX_CPER_CXL_H
#define LINUX_CPER_CXL_H
-/* CXL Protocol Error Section */
-#define CPER_SEC_CXL_PROT_ERR \
- GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
- 0x4B, 0x77, 0x10, 0x48)
-
-#pragma pack(1)
-
-/* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
-struct cxl_cper_sec_prot_err {
- u64 valid_bits;
- u8 agent_type;
- u8 reserved[7];
-
- /*
- * Except for RCH Downstream Port, all the remaining CXL Agent
- * types are uniquely identified by the PCIe compatible SBDF number.
- */
- union {
- u64 rcrb_base_addr;
- struct {
- u8 function;
- u8 device;
- u8 bus;
- u16 segment;
- u8 reserved_1[3];
- };
- } agent_addr;
-
- struct {
- u16 vendor_id;
- u16 device_id;
- u16 subsystem_vendor_id;
- u16 subsystem_id;
- u8 class_code[2];
- u16 slot;
- u8 reserved_1[4];
- } device_id;
-
- struct {
- u32 lower_dw;
- u32 upper_dw;
- } dev_serial_num;
-
- u8 capability[60];
- u16 dvsec_len;
- u16 err_len;
- u8 reserved_2[4];
-};
-
-#pragma pack()
-
void cxl_cper_print_prot_err(const char *pfx,
const struct cxl_cper_sec_prot_err *prot_err);
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 0bea1afbd747..66d85fc87701 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -152,6 +152,86 @@ struct cxl_cper_work_data {
struct cxl_cper_event_rec rec;
};
+#define PROT_ERR_VALID_AGENT_TYPE BIT_ULL(0)
+#define PROT_ERR_VALID_AGENT_ADDRESS BIT_ULL(1)
+#define PROT_ERR_VALID_DEVICE_ID BIT_ULL(2)
+#define PROT_ERR_VALID_SERIAL_NUMBER BIT_ULL(3)
+#define PROT_ERR_VALID_CAPABILITY BIT_ULL(4)
+#define PROT_ERR_VALID_DVSEC BIT_ULL(5)
+#define PROT_ERR_VALID_ERROR_LOG BIT_ULL(6)
+
+/*
+ * The layout of the enumeration and the values matches CXL Agent Type
+ * field in the UEFI 2.10 Section N.2.13,
+ */
+enum {
+ RCD, /* Restricted CXL Device */
+ RCH_DP, /* Restricted CXL Host Downstream Port */
+ DEVICE, /* CXL Device */
+ LD, /* CXL Logical Device */
+ FMLD, /* CXL Fabric Manager managed Logical Device */
+ RP, /* CXL Root Port */
+ DSP, /* CXL Downstream Switch Port */
+ USP, /* CXL Upstream Switch Port */
+};
+
+#pragma pack(1)
+
+/* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
+struct cxl_cper_sec_prot_err {
+ u64 valid_bits;
+ u8 agent_type;
+ u8 reserved[7];
+
+ /*
+ * Except for RCH Downstream Port, all the remaining CXL Agent
+ * types are uniquely identified by the PCIe compatible SBDF number.
+ */
+ union {
+ u64 rcrb_base_addr;
+ struct {
+ u8 function;
+ u8 device;
+ u8 bus;
+ u16 segment;
+ u8 reserved_1[3];
+ };
+ } agent_addr;
+
+ struct {
+ u16 vendor_id;
+ u16 device_id;
+ u16 subsystem_vendor_id;
+ u16 subsystem_id;
+ u8 class_code[2];
+ u16 slot;
+ u8 reserved_1[4];
+ } device_id;
+
+ struct {
+ u32 lower_dw;
+ u32 upper_dw;
+ } dev_serial_num;
+
+ u8 capability[60];
+ u16 dvsec_len;
+ u16 err_len;
+ u8 reserved_2[4];
+};
+
+#pragma pack()
+
+/* CXL RAS Capability Structure, CXL v3.0 sec 8.2.4.16 */
+struct cxl_ras_capability_regs {
+ u32 uncor_status;
+ u32 uncor_mask;
+ u32 uncor_severity;
+ u32 cor_status;
+ u32 cor_mask;
+ u32 cap_control;
+ u32 header_log[16];
+};
+
#ifdef CONFIG_ACPI_APEI_GHES
int cxl_cper_register_work(struct work_struct *work);
int cxl_cper_unregister_work(struct work_struct *work);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 265b0f8fc0b3..5c6d4d5b9975 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -89,6 +89,10 @@ enum {
#define CPER_NOTIFY_DMAR \
GUID_INIT(0x667DD791, 0xC6B3, 0x4c27, 0x8A, 0x6B, 0x0F, 0x8E, \
0x72, 0x2D, 0xEB, 0x41)
+/* CXL Protocol Error Section */
+#define CPER_SEC_CXL_PROT_ERR \
+ GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
+ 0x4B, 0x77, 0x10, 0x48)
/* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
/*
--
2.17.1
* [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
2024-11-19 0:39 ` [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_ Smita Koralahalli
2024-11-19 0:39 ` [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global Smita Koralahalli
@ 2024-11-19 0:39 ` Smita Koralahalli
2024-11-26 15:51 ` Jonathan Cameron
2024-12-02 18:15 ` Ira Weiny
2024-11-19 0:39 ` [PATCH v3 4/7] acpi/ghes, cxl: Rename cxl_cper_register_work to cxl_cper_register_event_work Smita Koralahalli
` (3 subsequent siblings)
6 siblings, 2 replies; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Move the declaration of cxl_cper_print_prot_err() to include/linux/cper.h
to avoid maintaining a separate header file just for this function
declaration. Remove drivers/firmware/efi/cper_cxl.h as its contents have
been reorganized.
Eliminate its corresponding #include directives from source files that
previously included it, since the header file has been removed.
No functional changes.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/firmware/efi/cper.c | 1 -
drivers/firmware/efi/cper_cxl.c | 1 -
drivers/firmware/efi/cper_cxl.h | 16 ----------------
include/linux/cper.h | 4 ++++
4 files changed, 4 insertions(+), 18 deletions(-)
delete mode 100644 drivers/firmware/efi/cper_cxl.h
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index ae1953e2b214..928409199a1a 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -25,7 +25,6 @@
#include <acpi/ghes.h>
#include <ras/ras_event.h>
#include <cxl/event.h>
-#include "cper_cxl.h"
/*
* CPER record ID need to be unique even after reboot, because record
diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
index 64c0dd27be6e..8a7667faf953 100644
--- a/drivers/firmware/efi/cper_cxl.c
+++ b/drivers/firmware/efi/cper_cxl.c
@@ -9,7 +9,6 @@
#include <linux/cper.h>
#include <cxl/event.h>
-#include "cper_cxl.h"
static const char * const prot_err_agent_type_strs[] = {
"Restricted CXL Device",
diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
deleted file mode 100644
index 5ce1401ee17a..000000000000
--- a/drivers/firmware/efi/cper_cxl.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * UEFI Common Platform Error Record (CPER) support for CXL Section.
- *
- * Copyright (C) 2022 Advanced Micro Devices, Inc.
- *
- * Author: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
- */
-
-#ifndef LINUX_CPER_CXL_H
-#define LINUX_CPER_CXL_H
-
-void cxl_cper_print_prot_err(const char *pfx,
- const struct cxl_cper_sec_prot_err *prot_err);
-
-#endif //__CPER_CXL_
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 5c6d4d5b9975..0ed60a91eca9 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -605,4 +605,8 @@ void cper_estatus_print(const char *pfx,
int cper_estatus_check_header(const struct acpi_hest_generic_status *estatus);
int cper_estatus_check(const struct acpi_hest_generic_status *estatus);
+struct cxl_cper_sec_prot_err;
+void cxl_cper_print_prot_err(const char *pfx,
+ const struct cxl_cper_sec_prot_err *prot_err);
+
#endif
--
2.17.1
* [PATCH v3 4/7] acpi/ghes, cxl: Rename cxl_cper_register_work to cxl_cper_register_event_work
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
` (2 preceding siblings ...)
2024-11-19 0:39 ` [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h Smita Koralahalli
@ 2024-11-19 0:39 ` Smita Koralahalli
2024-11-26 15:53 ` Jonathan Cameron
2024-11-19 0:39 ` [PATCH v3 5/7] acpi/ghes, cxl: Refactor work registration functions to support multiple workqueues Smita Koralahalli
` (2 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Rename cxl_cper_register_work() to cxl_cper_register_event_work() to
better reflect its purpose of registering CXL Component Events based work
within the CXL subsystem.
This rename prepares the codebase to support future patches where
cxl_cper_register_work() will accept generic pointers for Protocol Error
workqueue integration.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
drivers/acpi/apei/ghes.c | 8 ++++----
drivers/cxl/pci.c | 4 ++--
include/cxl/event.h | 8 ++++----
3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ada93cfde9ba..082c409707ba 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -717,7 +717,7 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
schedule_work(cxl_cper_work);
}
-int cxl_cper_register_work(struct work_struct *work)
+int cxl_cper_register_event_work(struct work_struct *work)
{
if (cxl_cper_work)
return -EINVAL;
@@ -726,9 +726,9 @@ int cxl_cper_register_work(struct work_struct *work)
cxl_cper_work = work;
return 0;
}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_work, CXL);
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_event_work, CXL);
-int cxl_cper_unregister_work(struct work_struct *work)
+int cxl_cper_unregister_event_work(struct work_struct *work)
{
if (cxl_cper_work != work)
return -EINVAL;
@@ -737,7 +737,7 @@ int cxl_cper_unregister_work(struct work_struct *work)
cxl_cper_work = NULL;
return 0;
}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_work, CXL);
+EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_event_work, CXL);
int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
{
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 188412d45e0d..88a14d7baa65 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1075,7 +1075,7 @@ static int __init cxl_pci_driver_init(void)
if (rc)
return rc;
- rc = cxl_cper_register_work(&cxl_cper_work);
+ rc = cxl_cper_register_event_work(&cxl_cper_work);
if (rc)
pci_unregister_driver(&cxl_pci_driver);
@@ -1084,7 +1084,7 @@ static int __init cxl_pci_driver_init(void)
static void __exit cxl_pci_driver_exit(void)
{
- cxl_cper_unregister_work(&cxl_cper_work);
+ cxl_cper_unregister_event_work(&cxl_cper_work);
cancel_work_sync(&cxl_cper_work);
pci_unregister_driver(&cxl_pci_driver);
}
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 66d85fc87701..992568b35455 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -233,16 +233,16 @@ struct cxl_ras_capability_regs {
};
#ifdef CONFIG_ACPI_APEI_GHES
-int cxl_cper_register_work(struct work_struct *work);
-int cxl_cper_unregister_work(struct work_struct *work);
+int cxl_cper_register_event_work(struct work_struct *work);
+int cxl_cper_unregister_event_work(struct work_struct *work);
int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd);
#else
-static inline int cxl_cper_register_work(struct work_struct *work)
+static inline int cxl_cper_register_event_work(struct work_struct *work)
{
return 0;
}
-static inline int cxl_cper_unregister_work(struct work_struct *work)
+static inline int cxl_cper_unregister_event_work(struct work_struct *work)
{
return 0;
}
--
2.17.1
* [PATCH v3 5/7] acpi/ghes, cxl: Refactor work registration functions to support multiple workqueues
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
` (3 preceding siblings ...)
2024-11-19 0:39 ` [PATCH v3 4/7] acpi/ghes, cxl: Rename cxl_cper_register_work to cxl_cper_register_event_work Smita Koralahalli
@ 2024-11-19 0:39 ` Smita Koralahalli
2024-11-26 15:57 ` Jonathan Cameron
2024-11-19 0:39 ` [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors Smita Koralahalli
2024-11-19 0:39 ` [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors Smita Koralahalli
6 siblings, 1 reply; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Refactor the work registration and unregistration functions in GHES to
enable reuse across different workqueues. This update lays the foundation
for integrating additional workqueues in the CXL subsystem for better
modularity and code reuse.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
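Note (not part of the commit): the refactor boils down to "one slot per
consumer, one helper that takes the slot as a parameter". A userspace
sketch of that shape follows. All names are hypothetical, and it swaps the
kernel spinlock for a C11 compare-and-swap so that it builds standalone;
the patch itself tests the slot and then takes the lock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct work_sketch { int id; };	/* hypothetical stand-in for struct work_struct */

/* One slot per consumer: CXL events and CXL protocol errors. */
static struct work_sketch *_Atomic event_slot;
static struct work_sketch *_Atomic prot_err_slot;

/* Claim an empty slot; fail with -EINVAL if one is already registered. */
static int register_work_sketch(struct work_sketch *_Atomic *slot,
				struct work_sketch *work)
{
	struct work_sketch *expected = NULL;

	return atomic_compare_exchange_strong(slot, &expected, work) ?
		0 : -EINVAL;
}

/* Release the slot only if it still holds the caller's work item. */
static int unregister_work_sketch(struct work_sketch *_Atomic *slot,
				  struct work_sketch *work)
{
	struct work_sketch *expected = work;

	return atomic_compare_exchange_strong(slot, &expected, NULL) ?
		0 : -EINVAL;
}
```

A caller would pass &event_slot or &prot_err_slot, mirroring how the
exported wrappers pass their own work pointer and lock to the common
helper.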
drivers/acpi/apei/ghes.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 082c409707ba..62ffe6eb5503 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -717,26 +717,42 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
schedule_work(cxl_cper_work);
}
-int cxl_cper_register_event_work(struct work_struct *work)
+static int cxl_cper_register_work(struct work_struct **work_ptr,
+ spinlock_t *lock,
+ struct work_struct *work)
{
- if (cxl_cper_work)
+ if (*work_ptr)
return -EINVAL;
- guard(spinlock)(&cxl_cper_work_lock);
- cxl_cper_work = work;
+ guard(spinlock)(lock);
+ *work_ptr = work;
return 0;
}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_event_work, CXL);
-int cxl_cper_unregister_event_work(struct work_struct *work)
+static int cxl_cper_unregister_work(struct work_struct **work_ptr,
+ spinlock_t *lock,
+ struct work_struct *work)
{
- if (cxl_cper_work != work)
+ if (*work_ptr != work)
return -EINVAL;
- guard(spinlock)(&cxl_cper_work_lock);
- cxl_cper_work = NULL;
+ guard(spinlock)(lock);
+ *work_ptr = NULL;
return 0;
}
+
+int cxl_cper_register_event_work(struct work_struct *work)
+{
+ return cxl_cper_register_work(&cxl_cper_work, &cxl_cper_work_lock,
+ work);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_event_work, CXL);
+
+int cxl_cper_unregister_event_work(struct work_struct *work)
+{
+ return cxl_cper_unregister_work(&cxl_cper_work, &cxl_cper_work_lock,
+ work);
+}
EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_event_work, CXL);
int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
--
2.17.1
* [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
` (4 preceding siblings ...)
2024-11-19 0:39 ` [PATCH v3 5/7] acpi/ghes, cxl: Refactor work registration functions to support multiple workqueues Smita Koralahalli
@ 2024-11-19 0:39 ` Smita Koralahalli
2024-11-26 16:05 ` Jonathan Cameron
2024-12-02 18:41 ` Ira Weiny
2024-11-19 0:39 ` [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors Smita Koralahalli
6 siblings, 2 replies; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Add support in GHES to detect and process CXL CPER Protocol errors, as
defined in UEFI v2.10, section N.2.13.
Define struct cxl_cper_prot_err_work_data to cache CXL protocol error
information, including RAS capabilities and severity, for further
handling.
These cached CXL CPER records will later be processed by workqueues
within the CXL subsystem.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
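Note (not part of the commit): the RAS capability is found by skipping the
fixed section header and then dvsec_len bytes of DVSEC. The userspace
sketch below walks a raw byte buffer the same way. The function name,
macro names and the section_len bounds checks are assumptions of the
sketch and do not appear in the patch; offsets are derived from the packed
struct cxl_cper_sec_prot_err.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets follow the packed struct cxl_cper_sec_prot_err (116 bytes). */
#define HDR_SIZE		116u
#define OFF_VALID_BITS		0u
#define OFF_DVSEC_LEN		108u
#define OFF_ERR_LEN		110u
#define RAS_CAP_SIZE		88u	/* sizeof(struct cxl_ras_capability_regs) */

#define VALID_AGENT_ADDRESS	(1ull << 1)
#define VALID_ERROR_LOG		(1ull << 6)

/*
 * Return a pointer to the RAS capability inside a raw section, or NULL if
 * the record fails validation. Fields are read in host byte order, so real
 * firmware data assumes a little-endian host. The section_len bounds
 * checks are an addition of this sketch.
 */
static const uint8_t *find_ras_cap_sketch(const uint8_t *sec,
					  size_t section_len)
{
	uint64_t valid_bits;
	uint16_t dvsec_len, err_len;

	if (section_len < HDR_SIZE)
		return NULL;

	memcpy(&valid_bits, sec + OFF_VALID_BITS, sizeof(valid_bits));
	memcpy(&dvsec_len, sec + OFF_DVSEC_LEN, sizeof(dvsec_len));
	memcpy(&err_len, sec + OFF_ERR_LEN, sizeof(err_len));

	if (!(valid_bits & VALID_AGENT_ADDRESS) ||
	    !(valid_bits & VALID_ERROR_LOG))
		return NULL;
	if (err_len != RAS_CAP_SIZE)
		return NULL;
	if ((size_t)HDR_SIZE + dvsec_len + err_len > section_len)
		return NULL;

	/* The DVSEC follows the fixed header; the error log follows the DVSEC. */
	return sec + HDR_SIZE + dvsec_len;
}
```

Whether the kernel side should also check the offsets against
gdata->error_data_length is left open here; the sketch only shows where
such a check could go.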
drivers/acpi/apei/ghes.c | 52 ++++++++++++++++++++++++++++++++++++++++
include/cxl/event.h | 6 +++++
2 files changed, 58 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 62ffe6eb5503..6cd9d5375d7c 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -676,6 +676,54 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
schedule_work(&entry->work);
}
+static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
+ int severity)
+{
+ struct cxl_cper_prot_err_work_data wd;
+ u8 *dvsec_start, *cap_start;
+
+ if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
+ pr_err_ratelimited("CXL CPER invalid agent address\n");
+ return;
+ }
+
+ if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
+ pr_err_ratelimited("CXL CPER invalid protocol error log\n");
+ return;
+ }
+
+ if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
+ pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
+ prot_err->err_len);
+ return;
+ }
+
+ if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
+ pr_warn(FW_WARN "CXL CPER no device serial number\n");
+
+ switch (prot_err->agent_type) {
+ case RCD:
+ case DEVICE:
+ case LD:
+ case FMLD:
+ case RP:
+ case DSP:
+ case USP:
+ memcpy(&wd.prot_err, prot_err, sizeof(wd.prot_err));
+
+ dvsec_start = (u8 *)(prot_err + 1);
+ cap_start = dvsec_start + prot_err->dvsec_len;
+
+ wd.ras_cap = *(struct cxl_ras_capability_regs *)cap_start;
+ wd.severity = cper_severity_to_aer(severity);
+ break;
+ default:
+ pr_err_ratelimited("CXL CPER invalid agent type: %d\n",
+ prot_err->agent_type);
+ return;
+ }
+}
+
/* Room for 8 entries for each of the 4 event log queues */
#define CXL_CPER_FIFO_DEPTH 32
DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
@@ -795,6 +843,10 @@ static bool ghes_do_proc(struct ghes *ghes,
}
else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
queued = ghes_handle_arm_hw_error(gdata, sev, sync);
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
+ struct cxl_cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_prot_err(prot_err, gdata->error_severity);
} else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 992568b35455..c9a38ebaf207 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -232,6 +232,12 @@ struct cxl_ras_capability_regs {
u32 header_log[16];
};
+struct cxl_cper_prot_err_work_data {
+ struct cxl_cper_sec_prot_err prot_err;
+ struct cxl_ras_capability_regs ras_cap;
+ int severity;
+};
+
#ifdef CONFIG_ACPI_APEI_GHES
int cxl_cper_register_event_work(struct work_struct *work);
int cxl_cper_unregister_event_work(struct work_struct *work);
--
2.17.1
* [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
` (5 preceding siblings ...)
2024-11-19 0:39 ` [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors Smita Koralahalli
@ 2024-11-19 0:39 ` Smita Koralahalli
2024-11-26 16:05 ` Jonathan Cameron
6 siblings, 1 reply; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-19 0:39 UTC
To: linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
When PCIe AER is in FW-First mode, the OS should process CXL Protocol
errors from CPER records. Introduce support for handling and logging CXL
Protocol errors.
The defined trace events cxl_aer_uncorrectable_error and
cxl_aer_correctable_error trace native CXL AER endpoint errors, while
cxl_port_aer_uncorrectable_error and cxl_port_aer_correctable_error
trace native CXL AER port errors. Reuse both sets to trace FW-First
protocol errors.
Since the CXL code is required to be called from process context and
GHES is in interrupt context, use workqueues for processing.
Similar to CXL CPER event handling, use a kfifo to handle errors, as it
simplifies queue processing by providing lock-free FIFO operations.
Add the ability for the CXL sub-system to register a workqueue to
process CXL CPER protocol errors.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
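Note (not part of the commit): the hand-off between GHES and the CXL
worker is a fixed-depth FIFO where the producer drops records when full
and the consumer drains whatever is queued. A single-threaded userspace
sketch of that behaviour follows. All names are hypothetical, and it has
none of the memory barriers a real kfifo provides, so it only illustrates
the indexing and the full/empty return values.

```c
#include <stdint.h>

#define FIFO_DEPTH_SKETCH 8u	/* power of two, like CXL_CPER_PROT_ERR_FIFO_DEPTH */

struct rec_sketch { int severity; uint32_t status; };	/* hypothetical record */

struct fifo_sketch {
	struct rec_sketch slots[FIFO_DEPTH_SKETCH];
	unsigned int in;	/* advanced only by the producer */
	unsigned int out;	/* advanced only by the consumer */
};

/* Producer side: returns 1 if stored, 0 if the FIFO was full. */
static int fifo_put_sketch(struct fifo_sketch *f, struct rec_sketch rec)
{
	if (f->in - f->out == FIFO_DEPTH_SKETCH)
		return 0;
	f->slots[f->in++ & (FIFO_DEPTH_SKETCH - 1)] = rec;
	return 1;
}

/* Consumer side: returns 1 and fills *rec, or 0 if the FIFO was empty. */
static int fifo_get_sketch(struct fifo_sketch *f, struct rec_sketch *rec)
{
	if (f->in == f->out)
		return 0;
	*rec = f->slots[f->out++ & (FIFO_DEPTH_SKETCH - 1)];
	return 1;
}

/* What a work handler would do: drain everything queued so far. */
static unsigned int fifo_drain_sketch(struct fifo_sketch *f)
{
	struct rec_sketch rec;
	unsigned int handled = 0;

	while (fifo_get_sketch(f, &rec))
		handled++;
	return handled;
}
```

With a depth of 8, a ninth record posted before the worker runs is
dropped, which corresponds to the "CXL CPER kfifo overflow" message in
cxl_cper_post_prot_err().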
drivers/acpi/apei/ghes.c | 41 ++++++++++++++++++++++++++++++
drivers/cxl/core/pci.c | 50 ++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlpci.h | 6 +++++
drivers/cxl/pci.c | 55 ++++++++++++++++++++++++++++++++++++++++
include/cxl/event.h | 15 +++++++++++
5 files changed, 167 insertions(+)
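Note (not part of the commit): cxl_cper_trace_uncorr_prot_err() reports a
"first error" alongside the unmasked status. When more than one status bit
is set, the first error is taken from the First Error Pointer in
cap_control; otherwise it is the status itself. The userspace sketch below
restates that rule. The function and macro names are hypothetical, and the
mask value assumes CXL_RAS_CAP_CONTROL_FE_MASK covers bits 5:0.

```c
#include <stdint.h>

/* Assumed to match CXL_RAS_CAP_CONTROL_FE_MASK, i.e. bits 5:0 of cap_control. */
#define FE_MASK_SKETCH 0x3fu

/* Portable population count, standing in for the kernel's hweight32(). */
static unsigned int popcount32_sketch(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

static uint32_t first_error_sketch(uint32_t uncor_status, uint32_t uncor_mask,
				   uint32_t cap_control)
{
	/* Only unmasked errors are reported. */
	uint32_t status = uncor_status & ~uncor_mask;

	/* Shift in 64 bits and truncate, so a pointer value >= 32 is not UB. */
	if (popcount32_sketch(status) > 1)
		return (uint32_t)(UINT64_C(1) << (cap_control & FE_MASK_SKETCH));
	return status;
}
```

For example, with unmasked status 0x5 and a First Error Pointer of 2, the
sketch returns 0x4; with a single bit set, the pointer is ignored and the
status is returned as is.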
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6cd9d5375d7c..32062b6a9985 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -676,6 +676,15 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
schedule_work(&entry->work);
}
+/* Room for 8 entries */
+#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
+static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
+ CXL_CPER_PROT_ERR_FIFO_DEPTH);
+
+/* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
+static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
+struct work_struct *cxl_cper_prot_err_work;
+
static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
int severity)
{
@@ -701,6 +710,11 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
pr_warn(FW_WARN "CXL CPER no device serial number\n");
+ guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
+
+ if (!cxl_cper_prot_err_work)
+ return;
+
switch (prot_err->agent_type) {
case RCD:
case DEVICE:
@@ -722,6 +736,13 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
prot_err->agent_type);
return;
}
+
+ if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
+ pr_err_ratelimited("CXL CPER kfifo overflow\n");
+ return;
+ }
+
+ schedule_work(cxl_cper_prot_err_work);
}
/* Room for 8 entries for each of the 4 event log queues */
@@ -809,6 +830,26 @@ int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
}
EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, CXL);
+int cxl_cper_register_prot_err_work(struct work_struct *work)
+{
+ return cxl_cper_register_work(&cxl_cper_prot_err_work,
+ &cxl_cper_prot_err_work_lock, work);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, CXL);
+
+int cxl_cper_unregister_prot_err_work(struct work_struct *work)
+{
+ return cxl_cper_unregister_work(&cxl_cper_prot_err_work,
+ &cxl_cper_prot_err_work_lock, work);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, CXL);
+
+int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
+{
+ return kfifo_get(&cxl_cper_prot_err_fifo, wd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, CXL);
+
static bool ghes_do_proc(struct ghes *ghes,
const struct acpi_hest_generic_status *estatus)
{
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 4ede038a7148..c992b34c290b 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -650,6 +650,56 @@ void read_cdat_data(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
+void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
+ struct cxl_ras_capability_regs ras_cap)
+{
+ struct cxl_dev_state *cxlds;
+ u32 status;
+
+ status = ras_cap.cor_status & ~ras_cap.cor_mask;
+
+ if (!flag) {
+ trace_cxl_port_aer_correctable_error(&pdev->dev, status);
+ return;
+ }
+
+ cxlds = pci_get_drvdata(pdev);
+ if (!cxlds)
+ return;
+
+ trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_prot_err, CXL);
+
+void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
+ struct cxl_ras_capability_regs ras_cap)
+{
+ struct cxl_dev_state *cxlds;
+ u32 status, fe;
+
+ status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
+
+ if (hweight32(status) > 1)
+ fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
+ ras_cap.cap_control));
+ else
+ fe = status;
+
+ if (!flag) {
+ trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,
+ ras_cap.header_log);
+ return;
+ }
+
+ cxlds = pci_get_drvdata(pdev);
+ if (!cxlds)
+ return;
+
+ trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe,
+ ras_cap.header_log);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_uncorr_prot_err, CXL);
+
static void __cxl_handle_cor_ras(struct device *dev,
void __iomem *ras_base)
{
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 4da07727ab9c..5e4aa8681937 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -129,4 +129,10 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+
+struct cxl_ras_capability_regs;
+void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
+ struct cxl_ras_capability_regs ras_cap);
+void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
+ struct cxl_ras_capability_regs ras_cap);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 88a14d7baa65..e261abe60e90 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1067,6 +1067,53 @@ static void cxl_cper_work_fn(struct work_struct *work)
}
static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
+static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
+{
+ unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
+ data->prot_err.agent_addr.function);
+ struct pci_dev *pdev __free(pci_dev_put) =
+ pci_get_domain_bus_and_slot(
+ data->prot_err.agent_addr.segment,
+ data->prot_err.agent_addr.bus,
+ devfn
+ );
+ int port_type;
+
+ if (!pdev)
+ return;
+
+ guard(device)(&pdev->dev);
+ if (pdev->driver != &cxl_pci_driver)
+ return;
+
+ port_type = pci_pcie_type(pdev);
+ if (port_type == PCI_EXP_TYPE_ROOT_PORT ||
+ port_type == PCI_EXP_TYPE_DOWNSTREAM ||
+ port_type == PCI_EXP_TYPE_UPSTREAM) {
+ if (data->severity == AER_CORRECTABLE)
+ cxl_cper_trace_corr_prot_err(pdev, false, data->ras_cap);
+ else
+ cxl_cper_trace_uncorr_prot_err(pdev, false, data->ras_cap);
+
+ return;
+ }
+
+ if (data->severity == AER_CORRECTABLE)
+ cxl_cper_trace_corr_prot_err(pdev, true, data->ras_cap);
+ else
+ cxl_cper_trace_uncorr_prot_err(pdev, true, data->ras_cap);
+
+}
+
+static void cxl_cper_prot_err_work_fn(struct work_struct *work)
+{
+ struct cxl_cper_prot_err_work_data wd;
+
+ while (cxl_cper_prot_err_kfifo_get(&wd))
+ cxl_cper_handle_prot_err(&wd);
+}
+static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
+
static int __init cxl_pci_driver_init(void)
{
int rc;
@@ -1079,13 +1126,21 @@ static int __init cxl_pci_driver_init(void)
if (rc)
pci_unregister_driver(&cxl_pci_driver);
+ rc = cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
+ if (rc) {
+ cxl_cper_unregister_event_work(&cxl_cper_work);
+ pci_unregister_driver(&cxl_pci_driver);
+ }
+
return rc;
}
static void __exit cxl_pci_driver_exit(void)
{
cxl_cper_unregister_event_work(&cxl_cper_work);
+ cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
cancel_work_sync(&cxl_cper_work);
+ cancel_work_sync(&cxl_cper_prot_err_work);
pci_unregister_driver(&cxl_pci_driver);
}
diff --git a/include/cxl/event.h b/include/cxl/event.h
index c9a38ebaf207..5f83c3bfc813 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -242,6 +242,9 @@ struct cxl_cper_prot_err_work_data {
int cxl_cper_register_event_work(struct work_struct *work);
int cxl_cper_unregister_event_work(struct work_struct *work);
int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd);
+int cxl_cper_register_prot_err_work(struct work_struct *work);
+int cxl_cper_unregister_prot_err_work(struct work_struct *work);
+int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd);
#else
static inline int cxl_cper_register_event_work(struct work_struct *work)
{
@@ -256,6 +259,18 @@ static inline int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
{
return 0;
}
+static inline int cxl_cper_register_prot_err_work(struct work_struct *work)
+{
+ return 0;
+}
+static inline int cxl_cper_unregister_prot_err_work(struct work_struct *work)
+{
+ return 0;
+}
+static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
+{
+ return 0;
+}
#endif
#endif /* _LINUX_CXL_EVENT_H */
--
2.17.1
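
For readers unfamiliar with the pattern in this patch, the producer/consumer
flow (GHES queues a record, the CXL work function drains the queue) can be
modelled in userspace roughly as below. This is only a sketch: the depth of 8
is taken from CXL_CPER_PROT_ERR_FIFO_DEPTH, but struct work_data, struct fifo,
fifo_put(), fifo_get() and drain() are hypothetical stand-ins, not the kernel
kfifo API, and the model is single-threaded with none of the memory barriers
a real kfifo provides.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_DEPTH 8	/* same depth as CXL_CPER_PROT_ERR_FIFO_DEPTH; power of two */

/* Hypothetical, trimmed-down stand-in for struct cxl_cper_prot_err_work_data */
struct work_data {
	int severity;
	uint32_t status;
};

struct fifo {
	struct work_data slot[FIFO_DEPTH];
	unsigned int in;	/* advanced only by the producer */
	unsigned int out;	/* advanced only by the consumer */
};

/* Producer side: returns 1 on success, 0 when the FIFO is full (overflow). */
int fifo_put(struct fifo *f, struct work_data wd)
{
	if (f->in - f->out == FIFO_DEPTH)
		return 0;
	f->slot[f->in % FIFO_DEPTH] = wd;
	f->in++;
	return 1;
}

/* Consumer side: returns 1 and fills *wd, or 0 when the FIFO is empty. */
int fifo_get(struct fifo *f, struct work_data *wd)
{
	if (f->in == f->out)
		return 0;
	*wd = f->slot[f->out % FIFO_DEPTH];
	f->out++;
	return 1;
}

/* Shaped like cxl_cper_prot_err_work_fn(): drain until empty, count records. */
int drain(struct fifo *f)
{
	struct work_data wd;
	int handled = 0;

	while (fifo_get(f, &wd))
		handled++;	/* the real work function would trace the record here */
	return handled;
}
```

A ninth fifo_put() without an intervening drain() returns 0, which corresponds
to the "CXL CPER kfifo overflow" path in cxl_cper_post_prot_err().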
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_
2024-11-19 0:39 ` [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_ Smita Koralahalli
@ 2024-11-26 15:05 ` Jonathan Cameron
2024-12-02 18:12 ` Ira Weiny
1 sibling, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-26 15:05 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On Tue, 19 Nov 2024 00:39:09 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> Rename the protocol error struct from struct cper_sec_prot_err to
> struct cxl_cper_sec_prot_err and cper_print_prot_err() to
> cxl_cper_print_prot_err() to maintain naming consistency. No
> functional changes.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Hi Smita,
Seems sensible to me.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global
2024-11-19 0:39 ` [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global Smita Koralahalli
@ 2024-11-26 15:09 ` Jonathan Cameron
2024-12-02 18:15 ` Ira Weiny
1 sibling, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-26 15:09 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On Tue, 19 Nov 2024 00:39:10 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> In preparation to add tracepoint support, move protocol error UUID
> definition to a common location. Also, make struct CXL RAS capability,
> cxl_cper_sec_prot_err and CPER validation flags global for use across
> different modules.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h
2024-11-19 0:39 ` [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h Smita Koralahalli
@ 2024-11-26 15:51 ` Jonathan Cameron
2024-11-27 19:36 ` Smita Koralahalli
2024-12-02 18:15 ` Ira Weiny
1 sibling, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-26 15:51 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On Tue, 19 Nov 2024 00:39:11 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> Move the declaration of cxl_cper_print_prot_err() to include/linux/cper.h
> to avoid maintaining a separate header file just for this function
> declaration. Remove drivers/firmware/efi/cper_cxl.h as its contents have
> been reorganized.
>
> Eliminate its corresponding #include directives from source files that
> previously included it, since the header file has been removed.
You lost me on this one. It looks like the only place these existed was the
now-empty header? I'd not mention that, as it's just a bit confusing.
>
> No functional changes.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
> deleted file mode 100644
> index 5ce1401ee17a..000000000000
> --- a/drivers/firmware/efi/cper_cxl.h
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * UEFI Common Platform Error Record (CPER) support for CXL Section.
> - *
> - * Copyright (C) 2022 Advanced Micro Devices, Inc.
> - *
> - * Author: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> - */
> -
> -#ifndef LINUX_CPER_CXL_H
> -#define LINUX_CPER_CXL_H
> -
> -void cxl_cper_print_prot_err(const char *pfx,
> - const struct cxl_cper_sec_prot_err *prot_err);
> -
> -#endif //__CPER_CXL_
* Re: [PATCH v3 4/7] acpi/ghes, cxl: Rename cxl_cper_register_work to cxl_cper_register_event_work
2024-11-19 0:39 ` [PATCH v3 4/7] acpi/ghes, cxl: Rename cxl_cper_register_work to cxl_cper_register_event_work Smita Koralahalli
@ 2024-11-26 15:53 ` Jonathan Cameron
0 siblings, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-26 15:53 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On Tue, 19 Nov 2024 00:39:12 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> Rename cxl_cper_register_work() to cxl_cper_register_event_work() to
> better reflect its purpose of registering CXL Component Events based work
> within the CXL subsystem.
>
> This rename prepares the codebase to support future patches where
> cxl_cper_register_work() will accept generic pointers for Protocol Error
> workqueue integration.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v3 5/7] acpi/ghes, cxl: Refactor work registration functions to support multiple workqueues
2024-11-19 0:39 ` [PATCH v3 5/7] acpi/ghes, cxl: Refactor work registration functions to support multiple workqueues Smita Koralahalli
@ 2024-11-26 15:57 ` Jonathan Cameron
2024-11-27 19:46 ` Smita Koralahalli
0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-26 15:57 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On Tue, 19 Nov 2024 00:39:13 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> Refactor the work registration and unregistration functions in GHES to
> enable reuse across different workqueues. This update lays the foundation
> for integrating additional workqueues in the CXL subsystem for better
> modularity and code reuse.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/acpi/apei/ghes.c | 34 +++++++++++++++++++++++++---------
> 1 file changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 082c409707ba..62ffe6eb5503 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -717,26 +717,42 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
> schedule_work(cxl_cper_work);
> }
>
> -int cxl_cper_register_event_work(struct work_struct *work)
> +static int cxl_cper_register_work(struct work_struct **work_ptr,
> + spinlock_t *lock,
> + struct work_struct *work)
This is a somewhat strange interface. It doesn't
really do anything particularly useful. I'd be tempted to
just open code this at each call site.
> {
> - if (cxl_cper_work)
> + if (*work_ptr)
> return -EINVAL;
>
> - guard(spinlock)(&cxl_cper_work_lock);
> - cxl_cper_work = work;
> + guard(spinlock)(lock);
> + *work_ptr = work;
> return 0;
> }
> -EXPORT_SYMBOL_NS_GPL(cxl_cper_register_event_work, CXL);
>
> -int cxl_cper_unregister_event_work(struct work_struct *work)
> +static int cxl_cper_unregister_work(struct work_struct **work_ptr,
> + spinlock_t *lock,
> + struct work_struct *work)
> {
> - if (cxl_cper_work != work)
> + if (*work_ptr != work)
As above.
> return -EINVAL;
>
> - guard(spinlock)(&cxl_cper_work_lock);
> - cxl_cper_work = NULL;
> + guard(spinlock)(lock);
> + *work_ptr = NULL;
> return 0;
> }
> +
> +int cxl_cper_register_event_work(struct work_struct *work)
> +{
> + return cxl_cper_register_work(&cxl_cper_work, &cxl_cper_work_lock,
> + work);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_event_work, CXL);
> +
> +int cxl_cper_unregister_event_work(struct work_struct *work)
> +{
> + return cxl_cper_unregister_work(&cxl_cper_work, &cxl_cper_work_lock,
> + work);
> +}
> EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_event_work, CXL);
>
> int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
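
To make the "open code this at each call site" suggestion concrete, here is a
rough userspace sketch of what one open-coded pair might look like. Everything
in it is an assumption for illustration: struct work_struct is an empty
stand-in, the atomic_flag lock replaces the kernel spinlock, and the function
names are invented here. Unlike the posted helper, the sketch tests the
pointer while holding the lock; that is a choice made for the example, not
something requested in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for the kernel type; illustration only. */
struct work_struct { int unused; };

/* Model of cxl_cper_prot_err_work and its lock, private to this file. */
static atomic_flag prot_err_work_lock = ATOMIC_FLAG_INIT;
static struct work_struct *prot_err_work;

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin until the flag is released */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Open-coded registration: no generic helper taking a pointer-to-pointer. */
int register_prot_err_work(struct work_struct *work)
{
	int rc = 0;

	model_lock(&prot_err_work_lock);
	if (prot_err_work)
		rc = -EINVAL;	/* a handler is already registered */
	else
		prot_err_work = work;
	model_unlock(&prot_err_work_lock);
	return rc;
}

/* Open-coded unregistration: only the registered owner may clear the slot. */
int unregister_prot_err_work(struct work_struct *work)
{
	int rc = 0;

	model_lock(&prot_err_work_lock);
	if (prot_err_work != work)
		rc = -EINVAL;
	else
		prot_err_work = NULL;
	model_unlock(&prot_err_work_lock);
	return rc;
}
```

The event-work pair would be a second copy of the same few lines against its
own pointer and lock, which is the duplication being traded for a simpler
interface.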
* Re: [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
2024-11-19 0:39 ` [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors Smita Koralahalli
@ 2024-11-26 16:05 ` Jonathan Cameron
2024-11-27 20:35 ` Smita Koralahalli
0 siblings, 1 reply; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-26 16:05 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On Tue, 19 Nov 2024 00:39:15 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> When PCIe AER is in FW-First, OS should process CXL Protocol errors from
> CPER records. Introduce support for handling and logging CXL Protocol
> errors.
>
> The defined trace events cxl_aer_uncorrectable_error and
> cxl_aer_correctable_error trace native CXL AER endpoint errors, while
> cxl_port_aer_uncorrectable_error and cxl_port_aer_correctable_error
> trace native CXL AER port errors. Reuse both sets to trace FW-First
> protocol errors.
>
> Since the CXL code is required to be called from process context and
> GHES is in interrupt context, use workqueues for processing.
>
> Similar to CXL CPER event handling, use kfifo to handle errors as it
> simplifies queue processing by providing lock free fifo operations.
>
> Add the ability for the CXL sub-system to register a workqueue to
> process CXL CPER protocol errors.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
A few minor comments inline.
Thanks
Jonathan
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 4ede038a7148..c992b34c290b 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -650,6 +650,56 @@ void read_cdat_data(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
>
> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
> + struct cxl_ras_capability_regs ras_cap)
> +{
> + struct cxl_dev_state *cxlds;
> + u32 status;
> +
> + status = ras_cap.cor_status & ~ras_cap.cor_mask;
> +
> + if (!flag) {
As below. Name of flag is not very helpful when reading the code.
Perhaps we can rename?
> + trace_cxl_port_aer_correctable_error(&pdev->dev, status);
> + return;
> + }
> +
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + return;
> +
> + trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_prot_err, CXL);
> +
> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
> + struct cxl_ras_capability_regs ras_cap)
> +{
> + struct cxl_dev_state *cxlds;
> + u32 status, fe;
> +
> + status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
> +
> + if (hweight32(status) > 1)
> + fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> + ras_cap.cap_control));
> + else
> + fe = status;
> +
> + if (!flag) {
Why does a bool named flag indicate it's a port error?
> + trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,
> + ras_cap.header_log);
> + return;
> + }
> +
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + return;
> +
> + trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe,
> + ras_cap.header_log);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_uncorr_prot_err, CXL);
> +
> static void __cxl_handle_cor_ras(struct device *dev,
> void __iomem *ras_base)
> {
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 4da07727ab9c..5e4aa8681937 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -129,4 +129,10 @@ void read_cdat_data(struct cxl_port *port);
> void cxl_cor_error_detected(struct pci_dev *pdev);
> pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> pci_channel_state_t state);
> +
> +struct cxl_ras_capability_regs;
> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
> + struct cxl_ras_capability_regs ras_cap);
> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
> + struct cxl_ras_capability_regs ras_cap);
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 88a14d7baa65..e261abe60e90 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1067,6 +1067,53 @@ static void cxl_cper_work_fn(struct work_struct *work)
> }
> static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
>
> +static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
> +{
> + unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
> + data->prot_err.agent_addr.function);
> + struct pci_dev *pdev __free(pci_dev_put) =
> + pci_get_domain_bus_and_slot(
> + data->prot_err.agent_addr.segment,
> + data->prot_err.agent_addr.bus,
> + devfn
> + );
		pci_get_domain_bus_and_slot(data->prot_err.agent_addr.segment,
					    data->prot_err.agent_addr.bus,
					    devfn);
> + int port_type;
> +
> + if (!pdev)
> + return;
> +
> + guard(device)(&pdev->dev);
> + if (pdev->driver != &cxl_pci_driver)
> + return;
> +
> + port_type = pci_pcie_type(pdev);
> + if (port_type == PCI_EXP_TYPE_ROOT_PORT ||
> + port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> + port_type == PCI_EXP_TYPE_UPSTREAM) {
> + if (data->severity == AER_CORRECTABLE)
> + cxl_cper_trace_corr_prot_err(pdev, false, data->ras_cap);
> + else
> + cxl_cper_trace_uncorr_prot_err(pdev, false, data->ras_cap);
> +
> + return;
> + }
> +
> + if (data->severity == AER_CORRECTABLE)
> + cxl_cper_trace_corr_prot_err(pdev, true, data->ras_cap);
> + else
> + cxl_cper_trace_uncorr_prot_err(pdev, true, data->ras_cap);
> +
> +}
> static int __init cxl_pci_driver_init(void)
> {
> int rc;
> @@ -1079,13 +1126,21 @@ static int __init cxl_pci_driver_init(void)
> if (rc)
> pci_unregister_driver(&cxl_pci_driver);
>
> + rc = cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> + if (rc) {
> + cxl_cper_unregister_event_work(&cxl_cper_work);
> + pci_unregister_driver(&cxl_pci_driver);
I'd switch this to a goto style for error handling.
> + }
> +
> return rc;
That is:

	return 0;

err_unregister_event_work:
	cxl_cper_unregister_event_work(&cxl_cper_work);
err_unreg:
	pci_unregister_driver(&cxl_pci_driver);
	return rc;
> }
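
As a fuller illustration of the goto-style unwinding suggested above, the
following userspace sketch shows a complete init function in that shape. The
register_*/unregister_* stubs and the fail_* switches are hypothetical and
exist only so the error paths can be exercised; they are not the kernel
functions. The point of the shape is that each failure jumps to a label that
undoes only what has already succeeded, and that a failure in an earlier step
never falls through to a later registration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical failure switches and state flags for the three steps. */
int fail_event, fail_prot;
int driver_up, event_up, prot_up;

static int register_driver(void)
{
	driver_up = 1;
	return 0;
}

static void unregister_driver(void)
{
	driver_up = 0;
}

static int register_event_work(void)
{
	if (fail_event)
		return -EINVAL;
	event_up = 1;
	return 0;
}

static void unregister_event_work(void)
{
	event_up = 0;
}

static int register_prot_err_work(void)
{
	if (fail_prot)
		return -EINVAL;
	prot_up = 1;
	return 0;
}

/* Init with goto-style cleanup: unwind in reverse order of setup. */
int driver_init(void)
{
	int rc;

	rc = register_driver();
	if (rc)
		return rc;

	rc = register_event_work();
	if (rc)
		goto err_unreg;

	rc = register_prot_err_work();
	if (rc)
		goto err_unregister_event_work;

	return 0;

err_unregister_event_work:
	unregister_event_work();
err_unreg:
	unregister_driver();
	return rc;
}
```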
* Re: [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors
2024-11-19 0:39 ` [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors Smita Koralahalli
@ 2024-11-26 16:05 ` Jonathan Cameron
2024-12-02 18:41 ` Ira Weiny
1 sibling, 0 replies; 25+ messages in thread
From: Jonathan Cameron @ 2024-11-26 16:05 UTC (permalink / raw)
To: Smita Koralahalli
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On Tue, 19 Nov 2024 00:39:14 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> Add support in GHES to detect and process CXL CPER Protocol errors, as
> defined in UEFI v2.10, section N.2.13.
>
> Define struct cxl_cper_prot_err_work_data to cache CXL protocol error
> information, including RAS capabilities and severity, for further
> handling.
>
> These cached CXL CPER records will later be processed by workqueues
> within the CXL subsystem.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Looks fine,
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h
2024-11-26 15:51 ` Jonathan Cameron
@ 2024-11-27 19:36 ` Smita Koralahalli
0 siblings, 0 replies; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-27 19:36 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On 11/26/2024 7:51 AM, Jonathan Cameron wrote:
> On Tue, 19 Nov 2024 00:39:11 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
>
>> Move the declaration of cxl_cper_print_prot_err() to include/linux/cper.h
>> to avoid maintaining a separate header file just for this function
>> declaration. Remove drivers/firmware/efi/cper_cxl.h as its contents have
>> been reorganized.
>>
>> Eliminate its corresponding #include directives from source files that
>> previously included it, since the header file has been removed.
>
> You lost me on this one. It looks like the only place these existed was the
> now-empty header? I'd not mention that, as it's just a bit confusing.
Yes. I will remove this sentence.
Thanks
Smita
>
>
>>
>> No functional changes.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
>> deleted file mode 100644
>> index 5ce1401ee17a..000000000000
>> --- a/drivers/firmware/efi/cper_cxl.h
>> +++ /dev/null
>> @@ -1,16 +0,0 @@
>> -/* SPDX-License-Identifier: GPL-2.0-only */
>> -/*
>> - * UEFI Common Platform Error Record (CPER) support for CXL Section.
>> - *
>> - * Copyright (C) 2022 Advanced Micro Devices, Inc.
>> - *
>> - * Author: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> - */
>> -
>> -#ifndef LINUX_CPER_CXL_H
>> -#define LINUX_CPER_CXL_H
>> -
>> -void cxl_cper_print_prot_err(const char *pfx,
>> - const struct cxl_cper_sec_prot_err *prot_err);
>> -
>> -#endif //__CPER_CXL_
* Re: [PATCH v3 5/7] acpi/ghes, cxl: Refactor work registration functions to support multiple workqueues
2024-11-26 15:57 ` Jonathan Cameron
@ 2024-11-27 19:46 ` Smita Koralahalli
0 siblings, 0 replies; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-27 19:46 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On 11/26/2024 7:57 AM, Jonathan Cameron wrote:
> On Tue, 19 Nov 2024 00:39:13 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
>
>> Refactor the work registration and unregistration functions in GHES to
>> enable reuse across different workqueues. This update lays the foundation
>> for integrating additional workqueues in the CXL subsystem for better
>> modularity and code reuse.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>> drivers/acpi/apei/ghes.c | 34 +++++++++++++++++++++++++---------
>> 1 file changed, 25 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 082c409707ba..62ffe6eb5503 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -717,26 +717,42 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>> schedule_work(cxl_cper_work);
>> }
>>
>> -int cxl_cper_register_event_work(struct work_struct *work)
>> +static int cxl_cper_register_work(struct work_struct **work_ptr,
>> + spinlock_t *lock,
>> + struct work_struct *work)
>
> This is a somewhat strange interface. It doesn't
> really do anything particularly useful. I'd be tempted to
> just open code this at each call site.
Okay, I will change it.
>
>
>> {
>> - if (cxl_cper_work)
>> + if (*work_ptr)
>> return -EINVAL;
>>
>> - guard(spinlock)(&cxl_cper_work_lock);
>> - cxl_cper_work = work;
>> + guard(spinlock)(lock);
>> + *work_ptr = work;
>> return 0;
>> }
>> -EXPORT_SYMBOL_NS_GPL(cxl_cper_register_event_work, CXL);
>>
>> -int cxl_cper_unregister_event_work(struct work_struct *work)
>> +static int cxl_cper_unregister_work(struct work_struct **work_ptr,
>> + spinlock_t *lock,
>> + struct work_struct *work)
>> {
>> - if (cxl_cper_work != work)
>> + if (*work_ptr != work)
> As above.
okay.
Thanks
Smita
>
>> return -EINVAL;
>>
>> - guard(spinlock)(&cxl_cper_work_lock);
>> - cxl_cper_work = NULL;
>> + guard(spinlock)(lock);
>> + *work_ptr = NULL;
>> return 0;
>> }
>> +
>> +int cxl_cper_register_event_work(struct work_struct *work)
>> +{
>> + return cxl_cper_register_work(&cxl_cper_work, &cxl_cper_work_lock,
>> + work);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_event_work, CXL);
>> +
>> +int cxl_cper_unregister_event_work(struct work_struct *work)
>> +{
>> + return cxl_cper_unregister_work(&cxl_cper_work, &cxl_cper_work_lock,
>> + work);
>> +}
>> EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_event_work, CXL);
>>
>> int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
>
* Re: [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
2024-11-26 16:05 ` Jonathan Cameron
@ 2024-11-27 20:35 ` Smita Koralahalli
2024-12-02 18:48 ` Ira Weiny
0 siblings, 1 reply; 25+ messages in thread
From: Smita Koralahalli @ 2024-11-27 20:35 UTC (permalink / raw)
To: Jonathan Cameron
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
On 11/26/2024 8:05 AM, Jonathan Cameron wrote:
> On Tue, 19 Nov 2024 00:39:15 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
>
>> When PCIe AER is in FW-First, OS should process CXL Protocol errors from
>> CPER records. Introduce support for handling and logging CXL Protocol
>> errors.
>>
>> The defined trace events cxl_aer_uncorrectable_error and
>> cxl_aer_correctable_error trace native CXL AER endpoint errors, while
>> cxl_port_aer_uncorrectable_error and cxl_port_aer_correctable_error
>> trace native CXL AER port errors. Reuse both sets to trace FW-First
>> protocol errors.
>>
>> Since the CXL code is required to be called from process context and
>> GHES is in interrupt context, use workqueues for processing.
>>
>> Similar to CXL CPER event handling, use kfifo to handle errors as it
>> simplifies queue processing by providing lock free fifo operations.
>>
>> Add the ability for the CXL sub-system to register a workqueue to
>> process CXL CPER protocol errors.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>
> A few minor comments inline.
>
> Thanks
>
> Jonathan
>
>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>> index 4ede038a7148..c992b34c290b 100644
>> --- a/drivers/cxl/core/pci.c
>> +++ b/drivers/cxl/core/pci.c
>> @@ -650,6 +650,56 @@ void read_cdat_data(struct cxl_port *port)
>> }
>> EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
>>
>> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
>> + struct cxl_ras_capability_regs ras_cap)
>> +{
>> + struct cxl_dev_state *cxlds;
>> + u32 status;
>> +
>> + status = ras_cap.cor_status & ~ras_cap.cor_mask;
>> +
>> + if (!flag) {
>
> As below. Name of flag is not very helpful when reading the code.
> Perhaps we can rename?
Okay. Maybe flag -> is_device_error?
>
>> + trace_cxl_port_aer_correctable_error(&pdev->dev, status);
>> + return;
>> + }
>> +
>> + cxlds = pci_get_drvdata(pdev);
>> + if (!cxlds)
>> + return;
>> +
>> + trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_prot_err, CXL);
>> +
>> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
>> + struct cxl_ras_capability_regs ras_cap)
>> +{
>> + struct cxl_dev_state *cxlds;
>> + u32 status, fe;
>> +
>> + status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
>> +
>> + if (hweight32(status) > 1)
>> + fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
>> + ras_cap.cap_control));
>> + else
>> + fe = status;
>> +
>> + if (!flag) {
>
> Why does a bool named flag indicate it's a port error?
I will rename it.
Or maybe use an enum to explicitly define the error type
(CXL_ERROR_TYPE_DEVICE and CXL_ERROR_TYPE_PORT).
Or maybe split the function into two distinct ones, one for port errors
and one for device errors.
Let me know your preference, or any other suggestions, and I will change
it accordingly.
>
>> + trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,
>> + ras_cap.header_log);
>> + return;
>> + }
>> +
>> + cxlds = pci_get_drvdata(pdev);
>> + if (!cxlds)
>> + return;
>> +
>> + trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe,
>> + ras_cap.header_log);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_uncorr_prot_err, CXL);
>> +
>> static void __cxl_handle_cor_ras(struct device *dev,
>> void __iomem *ras_base)
>> {
>> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
>> index 4da07727ab9c..5e4aa8681937 100644
>> --- a/drivers/cxl/cxlpci.h
>> +++ b/drivers/cxl/cxlpci.h
>> @@ -129,4 +129,10 @@ void read_cdat_data(struct cxl_port *port);
>> void cxl_cor_error_detected(struct pci_dev *pdev);
>> pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>> pci_channel_state_t state);
>> +
>> +struct cxl_ras_capability_regs;
>> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
>> + struct cxl_ras_capability_regs ras_cap);
>> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
>> + struct cxl_ras_capability_regs ras_cap);
>> #endif /* __CXL_PCI_H__ */
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 88a14d7baa65..e261abe60e90 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -1067,6 +1067,53 @@ static void cxl_cper_work_fn(struct work_struct *work)
>> }
>> static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
>>
>> +static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
>> +{
>> + unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
>> + data->prot_err.agent_addr.function);
>> + struct pci_dev *pdev __free(pci_dev_put) =
>> + pci_get_domain_bus_and_slot(
>> + data->prot_err.agent_addr.segment,
>> + data->prot_err.agent_addr.bus,
>> + devfn
>> + );
> pci_get_domain_bus_and_slot(data->prot_err.agent_addr.segment,
> data->prot_err.agent_addr.bus,
> devfn);
Noted.
>
>> + int port_type;
>> +
>> + if (!pdev)
>> + return;
>> +
>> + guard(device)(&pdev->dev);
>> + if (pdev->driver != &cxl_pci_driver)
>> + return;
>> +
>> + port_type = pci_pcie_type(pdev);
>> + if (port_type == PCI_EXP_TYPE_ROOT_PORT ||
>> + port_type == PCI_EXP_TYPE_DOWNSTREAM ||
>> + port_type == PCI_EXP_TYPE_UPSTREAM) {
>> + if (data->severity == AER_CORRECTABLE)
>> + cxl_cper_trace_corr_prot_err(pdev, false, data->ras_cap);
>> + else
>> + cxl_cper_trace_uncorr_prot_err(pdev, false, data->ras_cap);
>> +
>> + return;
>> + }
>> +
>> + if (data->severity == AER_CORRECTABLE)
>> + cxl_cper_trace_corr_prot_err(pdev, true, data->ras_cap);
>> + else
>> + cxl_cper_trace_uncorr_prot_err(pdev, true, data->ras_cap);
>> +
>> +}
>
>> static int __init cxl_pci_driver_init(void)
>> {
>> int rc;
>> @@ -1079,13 +1126,21 @@ static int __init cxl_pci_driver_init(void)
>> if (rc)
>> pci_unregister_driver(&cxl_pci_driver);
>>
>> + rc = cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
>> + if (rc) {
>> + cxl_cper_unregister_event_work(&cxl_cper_work);
>> + pci_unregister_driver(&cxl_pci_driver);
> I'd switch this to a goto style for error handling.
>
>
>> + }
>> +
>> return rc;
>
> that is
> return 0;
>
> err_unregister_event_work:
> cxl_cper_unregister_event_work(&cxl_cper_work);
> err_unreg:
> pci_unregister_driver(&cxl_pci_driver);
> return rc;
>> }
Noted.
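For reference, a minimal sketch of the goto-style unwinding suggested above. This is standalone userspace C, not the actual patch: struct init_state, do_register() and example_driver_init() are hypothetical stand-ins for the three registration steps in cxl_pci_driver_init().

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for driver state. fail_step selects which
 * registration step (1..3) fails; 0 means every step succeeds.
 */
struct init_state {
	int fail_step;
	bool driver;
	bool event_work;
	bool prot_err_work;
};

/* Pretend to register one resource; returns 0 or an errno-style failure. */
static int do_register(struct init_state *s, int step, bool *flag)
{
	if (s->fail_step == step)
		return -1;
	*flag = true;
	return 0;
}

static int example_driver_init(struct init_state *s)
{
	int rc;

	rc = do_register(s, 1, &s->driver);
	if (rc)
		return rc;

	rc = do_register(s, 2, &s->event_work);
	if (rc)
		goto err_unreg;

	rc = do_register(s, 3, &s->prot_err_work);
	if (rc)
		goto err_unregister_event_work;

	return 0;

	/* Unwind in reverse order of registration. */
err_unregister_event_work:
	s->event_work = false;
err_unreg:
	s->driver = false;
	return rc;
}
```

Each failure jumps to the label that undoes only what has already succeeded, so a later step never runs after an earlier one has failed.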
Thanks
Smita
>
* Re: [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_
2024-11-19 0:39 ` [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_ Smita Koralahalli
2024-11-26 15:05 ` Jonathan Cameron
@ 2024-12-02 18:12 ` Ira Weiny
1 sibling, 0 replies; 25+ messages in thread
From: Ira Weiny @ 2024-12-02 18:12 UTC (permalink / raw)
To: Smita Koralahalli, linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Smita Koralahalli wrote:
> Rename the protocol error struct from struct cper_sec_prot_err to
> struct cxl_cper_sec_prot_err and cper_print_prot_err() to
> cxl_cper_print_prot_err() to maintain naming consistency. No
> functional changes.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h
2024-11-19 0:39 ` [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h Smita Koralahalli
2024-11-26 15:51 ` Jonathan Cameron
@ 2024-12-02 18:15 ` Ira Weiny
1 sibling, 0 replies; 25+ messages in thread
From: Ira Weiny @ 2024-12-02 18:15 UTC (permalink / raw)
To: Smita Koralahalli, linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Smita Koralahalli wrote:
> Move the declaration of cxl_cper_print_prot_err() to include/linux/cper.h
> to avoid maintaining a separate header file just for this function
> declaration. Remove drivers/firmware/efi/cper_cxl.h as its contents have
> been reorganized.
>
> Eliminate its corresponding #include directives from source files that
> previously included it, since the header file has been removed.
>
> No functional changes.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
I was going to make a comment on this header being pretty sparse after
patch 2. Thanks!
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global
2024-11-19 0:39 ` [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global Smita Koralahalli
2024-11-26 15:09 ` Jonathan Cameron
@ 2024-12-02 18:15 ` Ira Weiny
1 sibling, 0 replies; 25+ messages in thread
From: Ira Weiny @ 2024-12-02 18:15 UTC (permalink / raw)
To: Smita Koralahalli, linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Smita Koralahalli wrote:
> In preparation to add tracepoint support, move protocol error UUID
> definition to a common location. Also, make struct CXL RAS capability,
> cxl_cper_sec_prot_err and CPER validation flags global for use across
> different modules.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
After seeing patch 3. :-D
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
* Re: [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors
2024-11-19 0:39 ` [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors Smita Koralahalli
2024-11-26 16:05 ` Jonathan Cameron
@ 2024-12-02 18:41 ` Ira Weiny
2024-12-06 16:16 ` Koralahalli Channabasappa, Smita
1 sibling, 1 reply; 25+ messages in thread
From: Ira Weiny @ 2024-12-02 18:41 UTC (permalink / raw)
To: Smita Koralahalli, linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Ira Weiny,
Dan Williams, Jonathan Cameron, Yazen Ghannam, Terry Bowman,
Smita Koralahalli
Smita Koralahalli wrote:
> Add support in GHES to detect and process CXL CPER Protocol errors, as
> defined in UEFI v2.10, section N.2.13.
>
> Define struct cxl_cper_prot_err_work_data to cache CXL protocol error
> information, including RAS capabilities and severity, for further
> handling.
>
> These cached CXL CPER records will later be processed by workqueues
> within the CXL subsystem.
>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> drivers/acpi/apei/ghes.c | 52 ++++++++++++++++++++++++++++++++++++++++
> include/cxl/event.h | 6 +++++
> 2 files changed, 58 insertions(+)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 62ffe6eb5503..6cd9d5375d7c 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -676,6 +676,54 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
> schedule_work(&entry->work);
> }
>
> +static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> + int severity)
> +{
> + struct cxl_cper_prot_err_work_data wd;
> + u8 *dvsec_start, *cap_start;
> +
> + if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
> + pr_err_ratelimited("CXL CPER invalid agent type\n");
> + return;
> + }
> +
> + if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
> + pr_err_ratelimited("CXL CPER invalid protocol error log\n");
> + return;
> + }
> +
> + if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
> + pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
> + prot_err->err_len);
> + return;
> + }
> +
> + if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
> + pr_warn(FW_WARN "CXL CPER no device serial number\n");
> +
> + switch (prot_err->agent_type) {
> + case RCD:
> + case DEVICE:
> + case LD:
> + case FMLD:
> + case RP:
> + case DSP:
> + case USP:
> + memcpy(&wd.prot_err, prot_err, sizeof(wd.prot_err));
> +
> + dvsec_start = (u8 *)(prot_err + 1);
> + cap_start = dvsec_start + prot_err->dvsec_len;
> +
> + wd.ras_cap = *(struct cxl_ras_capability_regs *)cap_start;
Why not memcpy()?
Ira
[snip]
* Re: [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
2024-11-27 20:35 ` Smita Koralahalli
@ 2024-12-02 18:48 ` Ira Weiny
2024-12-06 16:29 ` Koralahalli Channabasappa, Smita
0 siblings, 1 reply; 25+ messages in thread
From: Ira Weiny @ 2024-12-02 18:48 UTC (permalink / raw)
To: Smita Koralahalli, Jonathan Cameron
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Yazen Ghannam, Terry Bowman
Smita Koralahalli wrote:
> On 11/26/2024 8:05 AM, Jonathan Cameron wrote:
> > On Tue, 19 Nov 2024 00:39:15 +0000
> > Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> >
> >> When PCIe AER is in FW-First, OS should process CXL Protocol errors from
> >> CPER records. Introduce support for handling and logging CXL Protocol
> >> errors.
> >>
> >> The defined trace events cxl_aer_uncorrectable_error and
> >> cxl_aer_correctable_error trace native CXL AER endpoint errors, while
> >> cxl_cper_trace_corr_prot_err and cxl_cper_trace_uncorr_prot_err
> >> trace native CXL AER port errors. Reuse both sets to trace FW-First
> >> protocol errors.
> >>
> >> Since the CXL code is required to be called from process context and
> >> GHES is in interrupt context, use workqueues for processing.
> >>
> >> Similar to CXL CPER event handling, use kfifo to handle errors as it
> >> simplifies queue processing by providing lock free fifo operations.
> >>
> >> Add the ability for the CXL sub-system to register a workqueue to
> >> process CXL CPER protocol errors.
> >>
> >> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> >
> > A few minor comments inline.
> >
> > Thanks
> >
> > Jonathan
> >
> >> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> >> index 4ede038a7148..c992b34c290b 100644
> >> --- a/drivers/cxl/core/pci.c
> >> +++ b/drivers/cxl/core/pci.c
> >> @@ -650,6 +650,56 @@ void read_cdat_data(struct cxl_port *port)
> >> }
> >> EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
> >>
> >> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
> >> + struct cxl_ras_capability_regs ras_cap)
> >> +{
> >> + struct cxl_dev_state *cxlds;
> >> + u32 status;
> >> +
> >> + status = ras_cap.cor_status & ~ras_cap.cor_mask;
> >> +
> >> + if (!flag) {
> >
> > As below. Name of flag is not very helpful when reading the code.
> > Perhaps we can rename?
>
> Okay. Maybe flag -> is_device_error?
I had the same question about 'flag'.
> >
> >> + trace_cxl_port_aer_correctable_error(&pdev->dev, status);
> >> + return;
> >> + }
> >> +
> >> + cxlds = pci_get_drvdata(pdev);
> >> + if (!cxlds)
> >> + return;
> >> +
> >> + trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
> >> +}
> >> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_prot_err, CXL);
> >> +
> >> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
> >> + struct cxl_ras_capability_regs ras_cap)
> >> +{
> >> + struct cxl_dev_state *cxlds;
> >> + u32 status, fe;
> >> +
> >> + status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
> >> +
> >> + if (hweight32(status) > 1)
> >> + fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> >> + ras_cap.cap_control));
> >> + else
> >> + fe = status;
> >> +
> >> + if (!flag) {
> >
> > Why does a bool named flag indicate it's a port error?
>
> I will rename it.
>
> Or maybe use an enum to explicitly define the error type
> (CXL_ERROR_TYPE_DEVICE and CXL_ERROR_TYPE_PORT).
>
> Or maybe split the function into two distinct ones, one for port errors
> and one for device errors.
I would vote for 2 functions.
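Roughly what I have in mind, as a userspace sketch rather than kernel code. All names here (struct ras_regs, struct trace_record, trace_uncorr_port_err(), trace_uncorr_device_err()) are hypothetical, the trace calls are replaced by a returned record, and FE_MASK assumes the first-error pointer sits in the low 6 bits of cap_control.

```c
#include <stddef.h>
#include <stdint.h>

#define FE_MASK 0x3fu	/* assumption: first-error pointer field */

/* Hypothetical subset of the RAS capability registers. */
struct ras_regs {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t cap_control;
};

enum trace_target { TRACE_NONE, TRACE_PORT, TRACE_DEVICE };

/* Stands in for the arguments a trace event would receive. */
struct trace_record {
	enum trace_target target;
	uint32_t status;
	uint32_t first_error;
};

static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Shared helper: unmasked status plus first-error bit. */
static struct trace_record decode_uncorr(const struct ras_regs *r)
{
	struct trace_record rec = { TRACE_NONE, 0, 0 };
	unsigned int idx = r->cap_control & FE_MASK;

	rec.status = r->uncor_status & ~r->uncor_mask;
	if (popcount32(rec.status) > 1)
		/* Guard the shift: idx can exceed 31 with a 6-bit field. */
		rec.first_error = idx < 32 ? UINT32_C(1) << idx : 0;
	else
		rec.first_error = rec.status;
	return rec;
}

/* Port errors need no driver data. */
static struct trace_record trace_uncorr_port_err(const struct ras_regs *r)
{
	struct trace_record rec = decode_uncorr(r);

	rec.target = TRACE_PORT;
	return rec;
}

/* Device errors are dropped when no driver data is bound. */
static struct trace_record trace_uncorr_device_err(const struct ras_regs *r,
						   const void *drvdata)
{
	struct trace_record rec = decode_uncorr(r);

	rec.target = drvdata ? TRACE_DEVICE : TRACE_NONE;
	return rec;
}
```

The caller then picks the function from the PCIe port type, and no bool has to carry that meaning across the call.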
Ira
* Re: [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors
2024-12-02 18:41 ` Ira Weiny
@ 2024-12-06 16:16 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 25+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2024-12-06 16:16 UTC (permalink / raw)
To: Ira Weiny, linux-efi, linux-kernel, linux-cxl
Cc: Ard Biesheuvel, Alison Schofield, Vishal Verma, Dan Williams,
Jonathan Cameron, Yazen Ghannam, Terry Bowman
Hi Ira,
On 12/2/2024 10:41 AM, Ira Weiny wrote:
> Smita Koralahalli wrote:
>> Add support in GHES to detect and process CXL CPER Protocol errors, as
>> defined in UEFI v2.10, section N.2.13.
>>
>> Define struct cxl_cper_prot_err_work_data to cache CXL protocol error
>> information, including RAS capabilities and severity, for further
>> handling.
>>
>> These cached CXL CPER records will later be processed by workqueues
>> within the CXL subsystem.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>> drivers/acpi/apei/ghes.c | 52 ++++++++++++++++++++++++++++++++++++++++
>> include/cxl/event.h | 6 +++++
>> 2 files changed, 58 insertions(+)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 62ffe6eb5503..6cd9d5375d7c 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -676,6 +676,54 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
>> schedule_work(&entry->work);
>> }
>>
>> +static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>> + int severity)
>> +{
>> + struct cxl_cper_prot_err_work_data wd;
>> + u8 *dvsec_start, *cap_start;
>> +
>> + if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
>> + pr_err_ratelimited("CXL CPER invalid agent type\n");
>> + return;
>> + }
>> +
>> + if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
>> + pr_err_ratelimited("CXL CPER invalid protocol error log\n");
>> + return;
>> + }
>> +
>> + if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
>> + pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
>> + prot_err->err_len);
>> + return;
>> + }
>> +
>> + if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
>> + pr_warn(FW_WARN "CXL CPER no device serial number\n");
>> +
>> + switch (prot_err->agent_type) {
>> + case RCD:
>> + case DEVICE:
>> + case LD:
>> + case FMLD:
>> + case RP:
>> + case DSP:
>> + case USP:
>> + memcpy(&wd.prot_err, prot_err, sizeof(wd.prot_err));
>> +
>> + dvsec_start = (u8 *)(prot_err + 1);
>> + cap_start = dvsec_start + prot_err->dvsec_len;
>> +
>> + wd.ras_cap = *(struct cxl_ras_capability_regs *)cap_start;
> Why not memcpy()?
Thanks for pointing that out. Yes, I will change to memcpy() in the next
revision. I think memcpy() suits better here, both for consistency with
the copy above and to address alignment concerns.
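To illustrate the alignment point, a small userspace sketch (not the ghes.c code): struct ras_regs and copy_ras_regs() are hypothetical names, and the struct is only a stand-in for struct cxl_ras_capability_regs. Because the RAS capability follows a variable-length DVSEC, its start may not be naturally aligned, and copying bytes avoids dereferencing a possibly misaligned struct pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the RAS capability register block. */
struct ras_regs {
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t cor_status;
	uint32_t cor_mask;
};

/*
 * Copy a register block out of a raw record at an arbitrary byte
 * offset. Returns 0 on success, -1 if the block does not fit.
 */
static int copy_ras_regs(struct ras_regs *dst, const uint8_t *buf,
			 size_t buf_len, size_t offset)
{
	if (offset > buf_len || buf_len - offset < sizeof(*dst))
		return -1;

	/* memcpy() has no alignment requirement on buf + offset. */
	memcpy(dst, buf + offset, sizeof(*dst));
	return 0;
}
```

With an odd offset such as 3, a cast-and-dereference would be undefined behaviour in C, while the memcpy() form stays well defined.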
Thanks
Smita
>
> Ira
>
> [snip]
* Re: [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
2024-12-02 18:48 ` Ira Weiny
@ 2024-12-06 16:29 ` Koralahalli Channabasappa, Smita
0 siblings, 0 replies; 25+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2024-12-06 16:29 UTC (permalink / raw)
To: Ira Weiny, Jonathan Cameron
Cc: linux-efi, linux-kernel, linux-cxl, Ard Biesheuvel,
Alison Schofield, Vishal Verma, Dan Williams, Yazen Ghannam,
Terry Bowman
On 12/2/2024 10:48 AM, Ira Weiny wrote:
> Smita Koralahalli wrote:
>> On 11/26/2024 8:05 AM, Jonathan Cameron wrote:
>>> On Tue, 19 Nov 2024 00:39:15 +0000
>>> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
>>>
>>>> When PCIe AER is in FW-First, OS should process CXL Protocol errors from
>>>> CPER records. Introduce support for handling and logging CXL Protocol
>>>> errors.
>>>>
>>>> The defined trace events cxl_aer_uncorrectable_error and
>>>> cxl_aer_correctable_error trace native CXL AER endpoint errors, while
>>>> cxl_cper_trace_corr_prot_err and cxl_cper_trace_uncorr_prot_err
>>>> trace native CXL AER port errors. Reuse both sets to trace FW-First
>>>> protocol errors.
>>>>
>>>> Since the CXL code is required to be called from process context and
>>>> GHES is in interrupt context, use workqueues for processing.
>>>>
>>>> Similar to CXL CPER event handling, use kfifo to handle errors as it
>>>> simplifies queue processing by providing lock free fifo operations.
>>>>
>>>> Add the ability for the CXL sub-system to register a workqueue to
>>>> process CXL CPER protocol errors.
>>>>
>>>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>>>
>>> A few minor comments inline.
>>>
>>> Thanks
>>>
>>> Jonathan
>>>
>>>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>>>> index 4ede038a7148..c992b34c290b 100644
>>>> --- a/drivers/cxl/core/pci.c
>>>> +++ b/drivers/cxl/core/pci.c
>>>> @@ -650,6 +650,56 @@ void read_cdat_data(struct cxl_port *port)
>>>> }
>>>> EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
>>>>
>>>> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, bool flag,
>>>> + struct cxl_ras_capability_regs ras_cap)
>>>> +{
>>>> + struct cxl_dev_state *cxlds;
>>>> + u32 status;
>>>> +
>>>> + status = ras_cap.cor_status & ~ras_cap.cor_mask;
>>>> +
>>>> + if (!flag) {
>>>
>>> As below. Name of flag is not very helpful when reading the code.
>>> Perhaps we can rename?
>>
>> Okay. Maybe flag -> is_device_error?
>
> I had the same question about 'flag'.
>
>>>
>>>> + trace_cxl_port_aer_correctable_error(&pdev->dev, status);
>>>> + return;
>>>> + }
>>>> +
>>>> + cxlds = pci_get_drvdata(pdev);
>>>> + if (!cxlds)
>>>> + return;
>>>> +
>>>> + trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
>>>> +}
>>>> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_prot_err, CXL);
>>>> +
>>>> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, bool flag,
>>>> + struct cxl_ras_capability_regs ras_cap)
>>>> +{
>>>> + struct cxl_dev_state *cxlds;
>>>> + u32 status, fe;
>>>> +
>>>> + status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
>>>> +
>>>> + if (hweight32(status) > 1)
>>>> + fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
>>>> + ras_cap.cap_control));
>>>> + else
>>>> + fe = status;
>>>> +
>>>> + if (!flag) {
>>>
>>> Why does a bool named flag indicate it's a port error?
>>
>> I will rename it.
>>
>> Or maybe use an enum to explicitly define the error type
>> (CXL_ERROR_TYPE_DEVICE and CXL_ERROR_TYPE_PORT).
>>
>> Or maybe split the function into two distinct ones, one for port errors
>> and one for device errors.
>
> I would vote for 2 functions.
> Ira
Noted. Thanks!
Thanks
Smita
end of thread, other threads:[~2024-12-06 16:29 UTC | newest]
Thread overview: 25+ messages
2024-11-19 0:39 [PATCH v3 0/7] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
2024-11-19 0:39 ` [PATCH v3 1/7] efi/cper, cxl: Prefix protocol error struct and function names with cxl_ Smita Koralahalli
2024-11-26 15:05 ` Jonathan Cameron
2024-12-02 18:12 ` Ira Weiny
2024-11-19 0:39 ` [PATCH v3 2/7] efi/cper, cxl: Make definitions and structures global Smita Koralahalli
2024-11-26 15:09 ` Jonathan Cameron
2024-12-02 18:15 ` Ira Weiny
2024-11-19 0:39 ` [PATCH v3 3/7] efi/cper, cxl: Remove cper_cxl.h Smita Koralahalli
2024-11-26 15:51 ` Jonathan Cameron
2024-11-27 19:36 ` Smita Koralahalli
2024-12-02 18:15 ` Ira Weiny
2024-11-19 0:39 ` [PATCH v3 4/7] acpi/ghes, cxl: Rename cxl_cper_register_work to cxl_cper_register_event_work Smita Koralahalli
2024-11-26 15:53 ` Jonathan Cameron
2024-11-19 0:39 ` [PATCH v3 5/7] acpi/ghes, cxl: Refactor work registration functions to support multiple workqueues Smita Koralahalli
2024-11-26 15:57 ` Jonathan Cameron
2024-11-27 19:46 ` Smita Koralahalli
2024-11-19 0:39 ` [PATCH v3 6/7] acpi/ghes, cper: Recognize and cache CXL Protocol errors Smita Koralahalli
2024-11-26 16:05 ` Jonathan Cameron
2024-12-02 18:41 ` Ira Weiny
2024-12-06 16:16 ` Koralahalli Channabasappa, Smita
2024-11-19 0:39 ` [PATCH v3 7/7] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors Smita Koralahalli
2024-11-26 16:05 ` Jonathan Cameron
2024-11-27 20:35 ` Smita Koralahalli
2024-12-02 18:48 ` Ira Weiny
2024-12-06 16:29 ` Koralahalli Channabasappa, Smita