linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/5] Fix issues with ARM Processor CPER records
@ 2025-08-05 18:35 Daniel Ferguson
  2025-08-05 18:35 ` [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace Daniel Ferguson
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-05 18:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, James Morse,
	Tony Luck, Borislav Petkov, linux-doc, linux-kernel, linux-acpi,
	linux-efi, linux-edac, Jason Tian, Shengwei Luo, Daniel Ferguson,
	Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose

[NO NEED FOR INTERNAL REVIEW, THIS IS JUST A TEST]

This is needed for both kernelspace and userspace properly handle
ARM processor CPER events.

Patch 1 of this series fix the UEFI 2.6+ implementation of the ARM
trace event, as the original implementation was incomplete.
Changeset e9279e83ad1f ("trace, ras: add ARM processor error trace event")
added such event, but it reports only some fields of the CPER record
defined on UEFI 2.6+ appendix N, table N.16.  Those are not enough
actually parse such events on userspace, as not even the event type
is exported.

Patch 2 fixes a compilation breakage when W=1;

Patch 3 adds a new helper function to be used by cper and ghes drivers to
display CPER bitmaps;

Patch 4 fixes CPER logic according with UEFI 2.9A errata. Before it, there
was no description about how processor type field was encoded. The errata
defines it as a bitmask, and provides the information about how it should
be encoded.

Patch 5 adds CPER functions to Kernel-doc.

This series was validated with the help of an ARM EINJ code for QEMU:

	https://gitlab.com/mchehab_kernel/qemu/-/tree/qemu_submission

$ scripts/ghes_inject.py -d arm -p 0xdeadbeef -t cache,bus,micro-arch

[   11.094205] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[   11.095009] {1}[Hardware Error]: event severity: recoverable
[   11.095486] {1}[Hardware Error]:  Error 0, type: recoverable
[   11.096090] {1}[Hardware Error]:   section_type: ARM processor error
[   11.096399] {1}[Hardware Error]:   MIDR: 0x00000000000f0510
[   11.097135] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[   11.097811] {1}[Hardware Error]:   running state: 0x0
[   11.098193] {1}[Hardware Error]:   Power State Coordination Interface state: 0
[   11.098699] {1}[Hardware Error]:   Error info structure 0:
[   11.099174] {1}[Hardware Error]:   num errors: 2
[   11.099682] {1}[Hardware Error]:    error_type: 0x1a: cache error|bus error|micro-architectural error
[   11.100150] {1}[Hardware Error]:    physical fault address: 0x00000000deadbeef
[   11.111214] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered

- 

I also tested the ghes and cper reports both with and without this
change, using different versions of rasdaemon, with and without
support for the extended trace event. Those are a summary of the
test results:

- adding more fields to the trace events didn't break userspace API:
  both versions of rasdaemon handled it;

- the rasdaemon patches to handle the new trace report was missing
  a backward-compatibility logic. I fixed already. So, rasdaemon
  can now handle both old and new trace events.

Btw, rasdaemon has gained support for the extended trace since its
version 0.5.8 (released in 2021). I didn't saw any issues there
complain about troubles on it, so either distros used on ARM servers
are using an old version of rasdaemon, or they're carrying on the trace
event changes as well.

---
v4:
 - rebase to kernel v6.16
 - modify commit message of patch 1, and adjust white spaces
   per Boris' suggestions.

v3:
 - history of patch 1 improved with a chain of co-developed-by;
 - add a better description and an example on patch 3;
 - use BIT_ULL() on patch 3;
 - add a missing include on patch 4.

v2:
  - removed an uneeded patch adding #ifdef for CONFIG_ARM/ARM64;
  - cper_bits_to_str() now returns the number of chars filled at the buffer;
  - did a cosmetic (blank lines) improvement at include/linux/ras.h;
  - arm_event trace dynamic arrays renamed to pei_buf/ctx_buf/oem_buf.

Jason Tian (1):
      RAS: Report all ARM processor CPER information to userspace

Mauro Carvalho Chehab (4):
      efi/cper: Adjust infopfx size to accept an extra space
      efi/cper: Add a new helper function to print bitmasks
      efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs
      docs: efi: add CPER functions to driver-api

 Documentation/driver-api/firmware/efi/index.rst | 11 +++--
 drivers/acpi/apei/ghes.c                        | 27 +++++------
 drivers/firmware/efi/cper-arm.c                 | 52 ++++++++++-----------
 drivers/firmware/efi/cper.c                     | 62 ++++++++++++++++++++++++-
 drivers/ras/ras.c                               | 41 +++++++++++++++-
 include/linux/cper.h                            | 12 +++--
 include/linux/ras.h                             | 16 +++++--
 include/ras/ras_event.h                         | 48 +++++++++++++++++--
 8 files changed, 210 insertions(+), 59 deletions(-)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace
  2025-08-05 18:35 [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
@ 2025-08-05 18:35 ` Daniel Ferguson
  2025-08-08 15:22   ` Jonathan Cameron
  2025-08-05 18:35 ` [PATCH v4 2/5] efi/cper: Adjust infopfx size to accept an extra space Daniel Ferguson
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-05 18:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, James Morse,
	Tony Luck, Borislav Petkov, linux-doc, linux-kernel, linux-acpi,
	linux-efi, linux-edac, Jason Tian, Shengwei Luo, Daniel Ferguson,
	Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose

From: Jason Tian <jason@os.amperecomputing.com>

The ARM processor CPER record was added in UEFI v2.6 and remained
unchanged up to v2.10.

Yet, the original arm_event trace code added by

  e9279e83ad1f ("trace, ras: add ARM processor error trace event")

is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
exporting any information from tables N.17 to N.29 of the record.

This is not enough for the user to be able to figure out what has
exactly happened or to take appropriate action.

According to the UEFI v2.9 specification chapter N2.4.4, the ARM
processor error section includes:

- several (ERR_INFO_NUM) ARM processor error information structures
  (Tables N.17 to N.20);
- several (CONTEXT_INFO_NUM) ARM processor context information
  structures (Tables N.21 to N.29);
- several vendor specific error information structures. The
  size is given by Section Length minus the size of the other
  fields.

In addition, it also exports two fields that are parsed by the GHES
driver when firmware reports it, e.g.:

- error severity
- CPU logical index

Report all of these information to userspace via a the ARM tracepoint so
that userspace can properly record the error and take decisions related
to CPU core isolation according to error severity and other info.

The updated ARM trace event now contains the following fields:

======================================  =============================
UEFI field on table N.16                ARM Processor trace fields
======================================  =============================
Validation                              handled when filling data for
                                        affinity MPIDR and running
                                        state.
ERR_INFO_NUM                            pei_len
CONTEXT_INFO_NUM                        ctx_len
Section Length                          indirectly reported by
                                        pei_len, ctx_len and oem_len
Error affinity level                    affinity
MPIDR_EL1                               mpidr
MIDR_EL1                                midr
Running State                           running_state
PSCI State                              psci_state
Processor Error Information Structure   pei_err - count at pei_len
Processor Context                       ctx_err- count at ctx_len
Vendor Specific Error Info              oem - count at oem_len
======================================  =============================

It should be noted that decoding of tables N.17 to N.29, if needed, will
be handled in userspace. That gives more flexibility, as there won't be
any need to flood the kernel with micro-architecture specific error
decoding.

Also, decoding the other fields require a complex logic, and should be
done for each of the several values inside the record field.  So, let
userspace daemons like rasdaemon decode them, parsing such tables and
having vendor-specific micro-architecture-specific decoders.

  [mchehab: modified description, solved merge conflicts and fixed coding style]

Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")

Co-developed-by: Jason Tian <jason@os.amperecomputing.com>
Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
---
 drivers/acpi/apei/ghes.c | 11 ++++-------
 drivers/ras/ras.c        | 41 +++++++++++++++++++++++++++++++++++++++--
 include/linux/ras.h      | 16 +++++++++++++---
 include/ras/ras_event.h  | 48 +++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 99 insertions(+), 17 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index f0584ccad451915a2679c17f2367461c141663c5..99e25553fc1320b2306efb751e12f2377c86878a 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -527,7 +527,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 }
 
 static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
-				       int sev, bool sync)
+				     int sev, bool sync)
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
@@ -535,9 +535,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 	int sec_sev, i;
 	char *p;
 
-	log_arm_hw_error(err);
-
 	sec_sev = ghes_severity(gdata->error_severity);
+	log_arm_hw_error(err, sec_sev);
 	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
 		return false;
 
@@ -870,11 +869,9 @@ static bool ghes_do_proc(struct ghes *ghes,
 
 			arch_apei_report_mem_error(sev, mem_err);
 			queued = ghes_handle_memory_failure(gdata, sev, sync);
-		}
-		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+		} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
 			ghes_handle_aer(gdata);
-		}
-		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
+		} else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			queued = ghes_handle_arm_hw_error(gdata, sev, sync);
 		} else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
 			struct cxl_cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index a6e4792a1b2e9239f44f29102a7cc058d64b93ef..179b1744df71ee1eac28718628d550075c480cd5 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -52,9 +52,46 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
 	trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
 }
 
-void log_arm_hw_error(struct cper_sec_proc_arm *err)
+void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 {
-	trace_arm_event(err);
+	struct cper_arm_err_info *err_info;
+	struct cper_arm_ctx_info *ctx_info;
+	u8 *ven_err_data;
+	u32 ctx_len = 0;
+	int n, sz, cpu;
+	s32 vsei_len;
+	u32 pei_len;
+	u8 *pei_err;
+	u8 *ctx_err;
+
+	pei_len = sizeof(struct cper_arm_err_info) * err->err_info_num;
+	pei_err = (u8 *)err + sizeof(struct cper_sec_proc_arm);
+
+	err_info = (struct cper_arm_err_info *)(err + 1);
+	ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
+	ctx_err = (u8 *)ctx_info;
+
+	for (n = 0; n < err->context_info_num; n++) {
+		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
+		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
+		ctx_len += sz;
+	}
+
+	vsei_len = err->section_length - (sizeof(struct cper_sec_proc_arm) + pei_len + ctx_len);
+	if (vsei_len < 0) {
+		pr_warn(FW_BUG "section length: %d\n", err->section_length);
+		pr_warn(FW_BUG "section length is too small\n");
+		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
+		vsei_len = 0;
+	}
+	ven_err_data = (u8 *)ctx_info;
+
+	cpu = GET_LOGICAL_INDEX(err->mpidr);
+	if (cpu < 0)
+		cpu = -1;
+
+	trace_arm_event(err, pei_err, pei_len, ctx_err, ctx_len,
+			ven_err_data, (u32)vsei_len, sev, cpu);
 }
 
 static int __init ras_init(void)
diff --git a/include/linux/ras.h b/include/linux/ras.h
index a64182bc72ad3f2b430c53c7a9e23e798a1c1fbe..468941bfe855f6a1e3471245d98df5ffb362385b 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -24,8 +24,7 @@ int __init parse_cec_param(char *str);
 void log_non_standard_event(const guid_t *sec_type,
 			    const guid_t *fru_id, const char *fru_text,
 			    const u8 sev, const u8 *err, const u32 len);
-void log_arm_hw_error(struct cper_sec_proc_arm *err);
-
+void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev);
 #else
 static inline void
 log_non_standard_event(const guid_t *sec_type,
@@ -33,7 +32,7 @@ log_non_standard_event(const guid_t *sec_type,
 		       const u8 sev, const u8 *err, const u32 len)
 { return; }
 static inline void
-log_arm_hw_error(struct cper_sec_proc_arm *err) { return; }
+log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev) { return; }
 #endif
 
 struct atl_err {
@@ -53,4 +52,15 @@ static inline unsigned long
 amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
 #endif /* CONFIG_AMD_ATL */
 
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+#include <asm/smp_plat.h>
+/*
+ * Include ARM-specific SMP header which provides a function mapping mpidr to
+ * CPU logical index.
+ */
+#define GET_LOGICAL_INDEX(mpidr) get_logical_index(mpidr & MPIDR_HWID_BITMASK)
+#else
+#define GET_LOGICAL_INDEX(mpidr) -EINVAL
+#endif /* CONFIG_ARM || CONFIG_ARM64 */
+
 #endif /* __RAS_H__ */
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 14c9f943d53fb6cbadeef3f4b13e61470f0b5dee..a6b7ac0adaff9dfeab05a0c75ed359930ae04479 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -168,11 +168,24 @@ TRACE_EVENT(mc_event,
  * This event is generated when hardware detects an ARM processor error
  * has occurred. UEFI 2.6 spec section N.2.4.4.
  */
+#define APEIL "ARM Processor Err Info data len"
+#define APEID "ARM Processor Err Info raw data"
+#define APECIL "ARM Processor Err Context Info data len"
+#define APECID "ARM Processor Err Context Info raw data"
+#define VSEIL "Vendor Specific Err Info data len"
+#define VSEID "Vendor Specific Err Info raw data"
 TRACE_EVENT(arm_event,
 
-	TP_PROTO(const struct cper_sec_proc_arm *proc),
+	TP_PROTO(const struct cper_sec_proc_arm *proc, const u8 *pei_err,
+			const u32 pei_len,
+			const u8 *ctx_err,
+			const u32 ctx_len,
+			const u8 *oem,
+			const u32 oem_len,
+			u8 sev,
+			int cpu),
 
-	TP_ARGS(proc),
+	TP_ARGS(proc, pei_err, pei_len, ctx_err, ctx_len, oem, oem_len, sev, cpu),
 
 	TP_STRUCT__entry(
 		__field(u64, mpidr)
@@ -180,6 +193,14 @@ TRACE_EVENT(arm_event,
 		__field(u32, running_state)
 		__field(u32, psci_state)
 		__field(u8, affinity)
+		__field(u32, pei_len)
+		__dynamic_array(u8, pei_buf, pei_len)
+		__field(u32, ctx_len)
+		__dynamic_array(u8, ctx_buf, ctx_len)
+		__field(u32, oem_len)
+		__dynamic_array(u8, oem_buf, oem_len)
+		__field(u8, sev)
+		__field(int, cpu)
 	),
 
 	TP_fast_assign(
@@ -199,12 +220,29 @@ TRACE_EVENT(arm_event,
 			__entry->running_state = ~0;
 			__entry->psci_state = ~0;
 		}
+		__entry->pei_len = pei_len;
+		memcpy(__get_dynamic_array(pei_buf), pei_err, pei_len);
+		__entry->ctx_len = ctx_len;
+		memcpy(__get_dynamic_array(ctx_buf), ctx_err, ctx_len);
+		__entry->oem_len = oem_len;
+		memcpy(__get_dynamic_array(oem_buf), oem, oem_len);
+		__entry->sev = sev;
+		__entry->cpu = cpu;
 	),
 
-	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
-		  "running state: %d; PSCI state: %d",
+	TP_printk("cpu: %d; error: %d; affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
+		  "running state: %d; PSCI state: %d; "
+		  "%s: %d; %s: %s; %s: %d; %s: %s; %s: %d; %s: %s",
+		  __entry->cpu,
+		  __entry->sev,
 		  __entry->affinity, __entry->mpidr, __entry->midr,
-		  __entry->running_state, __entry->psci_state)
+		  __entry->running_state, __entry->psci_state,
+		  APEIL, __entry->pei_len, APEID,
+		  __print_hex(__get_dynamic_array(pei_buf), __entry->pei_len),
+		  APECIL, __entry->ctx_len, APECID,
+		  __print_hex(__get_dynamic_array(ctx_buf), __entry->ctx_len),
+		  VSEIL, __entry->oem_len, VSEID,
+		  __print_hex(__get_dynamic_array(oem_buf), __entry->oem_len))
 );
 
 /*

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 2/5] efi/cper: Adjust infopfx size to accept an extra space
  2025-08-05 18:35 [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
  2025-08-05 18:35 ` [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace Daniel Ferguson
@ 2025-08-05 18:35 ` Daniel Ferguson
  2025-08-05 18:35 ` [PATCH v4 3/5] efi/cper: Add a new helper function to print bitmasks Daniel Ferguson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-05 18:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, James Morse,
	Tony Luck, Borislav Petkov, linux-doc, linux-kernel, linux-acpi,
	linux-efi, linux-edac, Mauro Carvalho Chehab, Jonathan Cameron

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Compiling with W=1 with werror enabled produces an error:

drivers/firmware/efi/cper-arm.c: In function ‘cper_print_proc_arm’:
drivers/firmware/efi/cper-arm.c:298:64: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
  298 |                         snprintf(infopfx, sizeof(infopfx), "%s ", newpfx);
      |                                                                ^
drivers/firmware/efi/cper-arm.c:298:25: note: ‘snprintf’ output between 2 and 65 bytes into a destination of size 64
  298 |                         snprintf(infopfx, sizeof(infopfx), "%s ", newpfx);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As the logic there adds an space at the end of infopx buffer.
Add an extra space to avoid such warning.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 drivers/firmware/efi/cper-arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index f0a63d09d3c499b2f1e51d5ef35b69deeb3cbdb3..6ff781e47147c05c784ca5aa57149d1435cb2467 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -240,7 +240,7 @@ void cper_print_proc_arm(const char *pfx,
 	int i, len, max_ctx_type;
 	struct cper_arm_err_info *err_info;
 	struct cper_arm_ctx_info *ctx_info;
-	char newpfx[64], infopfx[64];
+	char newpfx[64], infopfx[ARRAY_SIZE(newpfx) + 1];
 
 	printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
 

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 3/5] efi/cper: Add a new helper function to print bitmasks
  2025-08-05 18:35 [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
  2025-08-05 18:35 ` [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace Daniel Ferguson
  2025-08-05 18:35 ` [PATCH v4 2/5] efi/cper: Adjust infopfx size to accept an extra space Daniel Ferguson
@ 2025-08-05 18:35 ` Daniel Ferguson
  2025-08-08 15:23   ` Jonathan Cameron
  2025-08-05 18:35 ` [PATCH v4 4/5] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Daniel Ferguson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-05 18:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, James Morse,
	Tony Luck, Borislav Petkov, linux-doc, linux-kernel, linux-acpi,
	linux-efi, linux-edac, Mauro Carvalho Chehab

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Add a helper function to print a string with names associated
to each bit field.

A typical example is:

	const char * const bits[] = {
		"bit 3 name",
		"bit 4 name",
		"bit 5 name",
	};
	char str[120];
        unsigned int bitmask = BIT(3) | BIT(5);

	#define MASK  GENMASK(5,3)

	cper_bits_to_str(str, sizeof(str), FIELD_GET(MASK, bitmask),
			 bits, ARRAY_SIZE(bits));

The above code fills string "str" with "bit 3 name|bit 5 name".

Reviewed-by; Jonathan Cameron <Jonathan.Cameron@huawei.com>

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 drivers/firmware/efi/cper.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cper.h        |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 928409199a1a4009b11cf3189fe036ad8861169c..79ba688a64f8da7af2dad097b9331c72afc73864 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -12,6 +12,7 @@
  * Specification version 2.4.
  */
 
+#include <linux/bitmap.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/time.h>
@@ -106,6 +107,65 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 		printk("%s\n", buf);
 }
 
+/**
+ * cper_bits_to_str - return a string for set bits
+ * @buf: buffer to store the output string
+ * @buf_size: size of the output string buffer
+ * @bits: bit mask
+ * @strs: string array, indexed by bit position
+ * @strs_size: size of the string array: @strs
+ *
+ * Add to @buf the bitmask in hexadecimal. Then, for each set bit in @bits,
+ * add the corresponding string describing the bit in @strs to @buf.
+ *
+ * A typical example is::
+ *
+ *	const char * const bits[] = {
+ *		"bit 3 name",
+ *		"bit 4 name",
+ *		"bit 5 name",
+ *	};
+ *	char str[120];
+ *	unsigned int bitmask = BIT(3) | BIT(5);
+ *	#define MASK GENMASK(5,3)
+ *
+ *	cper_bits_to_str(str, sizeof(str), FIELD_GET(MASK, bitmask),
+ *			 bits, ARRAY_SIZE(bits));
+ *
+ * The above code fills the string ``str`` with ``bit 3 name|bit 5 name``.
+ *
+ * Return: number of bytes stored or an error code if lower than zero.
+ */
+int cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
+		     const char * const strs[], unsigned int strs_size)
+{
+	int len = buf_size;
+	char *str = buf;
+	int i, size;
+
+	*buf = '\0';
+
+	for_each_set_bit(i, &bits, strs_size) {
+		if (!(bits & BIT_ULL(i)))
+			continue;
+
+		if (*buf && len > 0) {
+			*str = '|';
+			len--;
+			str++;
+		}
+
+		size = strscpy(str, strs[i], len);
+		if (size < 0)
+			return size;
+
+		len -= size;
+		str += size;
+	}
+	return len - buf_size;
+}
+EXPORT_SYMBOL_GPL(cper_bits_to_str);
+
 static const char * const proc_type_strs[] = {
 	"IA32/X64",
 	"IA64",
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 0ed60a91eca9d6425c9a41947a927b59f7aa2c71..58f40477c824e61c7f798978947bf1f441ce45ad 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -588,6 +588,8 @@ const char *cper_mem_err_type_str(unsigned int);
 const char *cper_mem_err_status_str(u64 status);
 void cper_print_bits(const char *prefix, unsigned int bits,
 		     const char * const strs[], unsigned int strs_size);
+int cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
+		     const char * const strs[], unsigned int strs_size);
 void cper_mem_err_pack(const struct cper_sec_mem_err *,
 		       struct cper_mem_err_compact *);
 const char *cper_mem_err_unpack(struct trace_seq *,

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 4/5] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs
  2025-08-05 18:35 [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
                   ` (2 preceding siblings ...)
  2025-08-05 18:35 ` [PATCH v4 3/5] efi/cper: Add a new helper function to print bitmasks Daniel Ferguson
@ 2025-08-05 18:35 ` Daniel Ferguson
  2025-08-05 18:35 ` [PATCH v4 5/5] docs: efi: add CPER functions to driver-api Daniel Ferguson
  2025-08-05 18:39 ` [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
  5 siblings, 0 replies; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-05 18:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, James Morse,
	Tony Luck, Borislav Petkov, linux-doc, linux-kernel, linux-acpi,
	linux-efi, linux-edac, Mauro Carvalho Chehab, Jonathan Cameron

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor
was defined simply as:

Type at byte offset 4:

	- Cache error
	- TLB Error
	- Bus Error
	- Micro-architectural Error
	All other values are reserved

Yet, there was no information about how this would be encoded.

Spec 2.9A errata corrected it by defining:

	- Bit 1 - Cache Error
	- Bit 2 - TLB Error
	- Bit 3 - Bus Error
	- Bit 4 - Micro-architectural Error
	All other values are reserved

That actually aligns with the values already defined on older
versions at N.2.4.1. Generic Processor Error Section.

Spec 2.10 also preserve the same encoding as 2.9A.

Adjust CPER and GHES handling code for both generic and ARM
processors to properly handle UEFI 2.9A and 2.10 encoding.

Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 drivers/acpi/apei/ghes.c        | 16 ++++++++-----
 drivers/firmware/efi/cper-arm.c | 50 ++++++++++++++++++++---------------------
 include/linux/cper.h            | 10 ++++-----
 3 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 99e25553fc1320b2306efb751e12f2377c86878a..79a128cb04c351c1d01ee749904ee844963d0f10 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -22,6 +22,7 @@
 #include <linux/moduleparam.h>
 #include <linux/init.h>
 #include <linux/acpi.h>
+#include <linux/bitfield.h>
 #include <linux/io.h>
 #include <linux/interrupt.h>
 #include <linux/timer.h>
@@ -531,6 +532,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
+	char error_type[120];
 	bool queued = false;
 	int sec_sev, i;
 	char *p;
@@ -543,9 +545,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 	p = (char *)(err + 1);
 	for (i = 0; i < err->err_info_num; i++) {
 		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
-		bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR);
+		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
 		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
-		const char *error_type = "unknown error";
 
 		/*
 		 * The field (err_info->error_info & BIT(26)) is fixed to set to
@@ -559,12 +560,15 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 			continue;
 		}
 
-		if (err_info->type < ARRAY_SIZE(cper_proc_error_type_strs))
-			error_type = cper_proc_error_type_strs[err_info->type];
+		cper_bits_to_str(error_type, sizeof(error_type),
+				 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
+				 cper_proc_error_type_strs,
+				 ARRAY_SIZE(cper_proc_error_type_strs));
 
 		pr_warn_ratelimited(FW_WARN GHES_PFX
-				    "Unhandled processor error type: %s\n",
-				    error_type);
+				    "Unhandled processor error type 0x%02x: %s%s\n",
+				    err_info->type, error_type,
+				    (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
 		p += err_info->length;
 	}
 
diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index 6ff781e47147c05c784ca5aa57149d1435cb2467..76542a53e20275cf0f059e9ce409fd898de16d4d 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -93,15 +93,11 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 	bool proc_context_corrupt, corrected, precise_pc, restartable_pc;
 	bool time_out, access_mode;
 
-	/* If the type is unknown, bail. */
-	if (type > CPER_ARM_MAX_TYPE)
-		return;
-
 	/*
 	 * Vendor type errors have error information values that are vendor
 	 * specific.
 	 */
-	if (type == CPER_ARM_VENDOR_ERROR)
+	if (type & CPER_ARM_VENDOR_ERROR)
 		return;
 
 	if (error_info & CPER_ARM_ERR_VALID_TRANSACTION_TYPE) {
@@ -116,43 +112,38 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 	if (error_info & CPER_ARM_ERR_VALID_OPERATION_TYPE) {
 		op_type = ((error_info >> CPER_ARM_ERR_OPERATION_SHIFT)
 			   & CPER_ARM_ERR_OPERATION_MASK);
-		switch (type) {
-		case CPER_ARM_CACHE_ERROR:
+		if (type & CPER_ARM_CACHE_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_cache_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%scache error, operation type: %s\n", pfx,
 				       arm_cache_err_op_strs[op_type]);
 			}
-			break;
-		case CPER_ARM_TLB_ERROR:
+		}
+		if (type & CPER_ARM_TLB_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_tlb_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%sTLB error, operation type: %s\n", pfx,
 				       arm_tlb_err_op_strs[op_type]);
 			}
-			break;
-		case CPER_ARM_BUS_ERROR:
+		}
+		if (type & CPER_ARM_BUS_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_bus_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%sbus error, operation type: %s\n", pfx,
 				       arm_bus_err_op_strs[op_type]);
 			}
-			break;
 		}
 	}
 
 	if (error_info & CPER_ARM_ERR_VALID_LEVEL) {
 		level = ((error_info >> CPER_ARM_ERR_LEVEL_SHIFT)
 			 & CPER_ARM_ERR_LEVEL_MASK);
-		switch (type) {
-		case CPER_ARM_CACHE_ERROR:
+		if (type & CPER_ARM_CACHE_ERROR)
 			printk("%scache level: %d\n", pfx, level);
-			break;
-		case CPER_ARM_TLB_ERROR:
+
+		if (type & CPER_ARM_TLB_ERROR)
 			printk("%sTLB level: %d\n", pfx, level);
-			break;
-		case CPER_ARM_BUS_ERROR:
+
+		if (type & CPER_ARM_BUS_ERROR)
 			printk("%saffinity level at which the bus error occurred: %d\n",
 			       pfx, level);
-			break;
-		}
 	}
 
 	if (error_info & CPER_ARM_ERR_VALID_PROC_CONTEXT_CORRUPT) {
@@ -241,6 +232,7 @@ void cper_print_proc_arm(const char *pfx,
 	struct cper_arm_err_info *err_info;
 	struct cper_arm_ctx_info *ctx_info;
 	char newpfx[64], infopfx[ARRAY_SIZE(newpfx) + 1];
+	char error_type[120];
 
 	printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
 
@@ -289,9 +281,15 @@ void cper_print_proc_arm(const char *pfx,
 				       newpfx);
 		}
 
-		printk("%serror_type: %d, %s\n", newpfx, err_info->type,
-			err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ?
-			cper_proc_error_type_strs[err_info->type] : "unknown");
+		cper_bits_to_str(error_type, sizeof(error_type),
+				 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
+				 cper_proc_error_type_strs,
+				 ARRAY_SIZE(cper_proc_error_type_strs));
+
+		printk("%serror_type: 0x%02x: %s%s\n", newpfx, err_info->type,
+		       error_type,
+		       (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
+
 		if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) {
 			printk("%serror_info: 0x%016llx\n", newpfx,
 			       err_info->error_info);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 58f40477c824e61c7f798978947bf1f441ce45ad..5b1236d8c65bb7d285a327c457115a18fc9d7953 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -297,11 +297,11 @@ enum {
 #define CPER_ARM_INFO_FLAGS_PROPAGATED		BIT(2)
 #define CPER_ARM_INFO_FLAGS_OVERFLOW		BIT(3)
 
-#define CPER_ARM_CACHE_ERROR			0
-#define CPER_ARM_TLB_ERROR			1
-#define CPER_ARM_BUS_ERROR			2
-#define CPER_ARM_VENDOR_ERROR			3
-#define CPER_ARM_MAX_TYPE			CPER_ARM_VENDOR_ERROR
+#define CPER_ARM_ERR_TYPE_MASK			GENMASK(4,1)
+#define CPER_ARM_CACHE_ERROR			BIT(1)
+#define CPER_ARM_TLB_ERROR			BIT(2)
+#define CPER_ARM_BUS_ERROR			BIT(3)
+#define CPER_ARM_VENDOR_ERROR			BIT(4)
 
 #define CPER_ARM_ERR_VALID_TRANSACTION_TYPE	BIT(0)
 #define CPER_ARM_ERR_VALID_OPERATION_TYPE	BIT(1)

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 5/5] docs: efi: add CPER functions to driver-api
  2025-08-05 18:35 [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
                   ` (3 preceding siblings ...)
  2025-08-05 18:35 ` [PATCH v4 4/5] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Daniel Ferguson
@ 2025-08-05 18:35 ` Daniel Ferguson
  2025-08-05 18:39 ` [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
  5 siblings, 0 replies; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-05 18:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, James Morse,
	Tony Luck, Borislav Petkov, linux-doc, linux-kernel, linux-acpi,
	linux-efi, linux-edac, Mauro Carvalho Chehab, Jonathan Cameron

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

There are two kernel-doc like descriptions at cper, which is used
by other parts of cper and on ghes driver. They both have kernel-doc
like descriptions.

Change the tags for them to be actual kernel-doc tags and add them
to the driver-api documentaion at the UEFI section.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 Documentation/driver-api/firmware/efi/index.rst | 11 ++++++++---
 drivers/firmware/efi/cper.c                     |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/driver-api/firmware/efi/index.rst b/Documentation/driver-api/firmware/efi/index.rst
index 4fe8abba9fc6bf8ed53443e48e79285730871c32..5a6b6229592c9a9d1eb223966c582e0969ee9514 100644
--- a/Documentation/driver-api/firmware/efi/index.rst
+++ b/Documentation/driver-api/firmware/efi/index.rst
@@ -1,11 +1,16 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-============
-UEFI Support
-============
+====================================================
+Unified Extensible Firmware Interface (UEFI) Support
+====================================================
 
 UEFI stub library functions
 ===========================
 
 .. kernel-doc:: drivers/firmware/efi/libstub/mem.c
    :internal:
+
+UEFI Common Platform Error Record (CPER) functions
+==================================================
+
+.. kernel-doc:: drivers/firmware/efi/cper.c
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 79ba688a64f8da7af2dad097b9331c72afc73864..0232bd040f61c9b4521ae50ec4b6a1b0bfa5cc19 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -70,7 +70,7 @@ const char *cper_severity_str(unsigned int severity)
 }
 EXPORT_SYMBOL_GPL(cper_severity_str);
 
-/*
+/**
  * cper_print_bits - print strings for set bits
  * @pfx: prefix for each line, including log level and prefix string
  * @bits: bit mask

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 0/5] Fix issues with ARM Processor CPER records
  2025-08-05 18:35 [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
                   ` (4 preceding siblings ...)
  2025-08-05 18:35 ` [PATCH v4 5/5] docs: efi: add CPER functions to driver-api Daniel Ferguson
@ 2025-08-05 18:39 ` Daniel Ferguson
  2025-08-08 15:01   ` Jonathan Cameron
  5 siblings, 1 reply; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-05 18:39 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jonathan Corbet, Rafael J. Wysocki, Len Brown, James Morse,
	Tony Luck, Borislav Petkov, linux-doc, linux-kernel, linux-acpi,
	linux-efi, linux-edac, Jason Tian, Shengwei Luo,
	Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose

On 8/5/2025 11:35 AM, Daniel Ferguson wrote:
> [NO NEED FOR INTERNAL REVIEW, THIS IS JUST A TEST]

Yes, PLEASE REVIEW...
I accidentally left that message in...

~Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 0/5] Fix issues with ARM Processor CPER records
  2025-08-05 18:39 ` [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
@ 2025-08-08 15:01   ` Jonathan Cameron
  2025-08-08 19:03     ` Daniel Ferguson
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2025-08-08 15:01 UTC (permalink / raw)
  To: Daniel Ferguson
  Cc: Ard Biesheuvel, Jonathan Corbet, Rafael J. Wysocki, Len Brown,
	James Morse, Tony Luck, Borislav Petkov, linux-doc, linux-kernel,
	linux-acpi, linux-efi, linux-edac, Jason Tian, Shengwei Luo,
	Mauro Carvalho Chehab, Shiju Jose

On Tue, 5 Aug 2025 11:39:38 -0700
Daniel Ferguson <danielf@os.amperecomputing.com> wrote:

> On 8/5/2025 11:35 AM, Daniel Ferguson wrote:
> > [NO NEED FOR INTERNAL REVIEW, THIS IS JUST A TEST]  
> 
> Yes, PLEASE REVIEW...
> I accidentally left that message in...
> 
> ~Daniel

Hi Daniel,

No problem with you picking this up to take forwards as I gather you asked
Mauro first, but good to mention that change and that it's with agreement
(or not if it's abandoned code which I don't think this was.) 

Thanks,

Jonathan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace
  2025-08-05 18:35 ` [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace Daniel Ferguson
@ 2025-08-08 15:22   ` Jonathan Cameron
  2025-08-09 15:55     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2025-08-08 15:22 UTC (permalink / raw)
  To: Daniel Ferguson
  Cc: Ard Biesheuvel, Jonathan Corbet, Rafael J. Wysocki, Len Brown,
	James Morse, Tony Luck, Borislav Petkov, linux-doc, linux-kernel,
	linux-acpi, linux-efi, linux-edac, Jason Tian, Shengwei Luo,
	Mauro Carvalho Chehab, Shiju Jose

On Tue, 05 Aug 2025 11:35:38 -0700
Daniel Ferguson <danielf@os.amperecomputing.com> wrote:

> From: Jason Tian <jason@os.amperecomputing.com>
> 
> The ARM processor CPER record was added in UEFI v2.6 and remained
> unchanged up to v2.10.
> 
> Yet, the original arm_event trace code added by
> 
>   e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> 
> is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
> exporting any information from tables N.17 to N.29 of the record.
> 
> This is not enough for the user to be able to figure out what has
> exactly happened or to take appropriate action.
> 
> According to the UEFI v2.9 specification chapter N2.4.4, the ARM
> processor error section includes:
> 
> - several (ERR_INFO_NUM) ARM processor error information structures
>   (Tables N.17 to N.20);
> - several (CONTEXT_INFO_NUM) ARM processor context information
>   structures (Tables N.21 to N.29);
> - several vendor specific error information structures. The
>   size is given by Section Length minus the size of the other
>   fields.
> 
> In addition, it also exports two fields that are parsed by the GHES
> driver when firmware reports it, e.g.:
> 
> - error severity
> - CPU logical index
> 
> Report all of these information to userspace via a the ARM tracepoint so
> that userspace can properly record the error and take decisions related
> to CPU core isolation according to error severity and other info.
> 
> The updated ARM trace event now contains the following fields:
> 
> ======================================  =============================
> UEFI field on table N.16                ARM Processor trace fields
> ======================================  =============================
> Validation                              handled when filling data for
>                                         affinity MPIDR and running
>                                         state.
> ERR_INFO_NUM                            pei_len
> CONTEXT_INFO_NUM                        ctx_len
> Section Length                          indirectly reported by
>                                         pei_len, ctx_len and oem_len
> Error affinity level                    affinity
> MPIDR_EL1                               mpidr
> MIDR_EL1                                midr
> Running State                           running_state
> PSCI State                              psci_state
> Processor Error Information Structure   pei_err - count at pei_len
> Processor Context                       ctx_err- count at ctx_len
> Vendor Specific Error Info              oem - count at oem_len
> ======================================  =============================
> 
> It should be noted that decoding of tables N.17 to N.29, if needed, will
> be handled in userspace. That gives more flexibility, as there won't be
> any need to flood the kernel with micro-architecture specific error
> decoding.
> 
> Also, decoding the other fields require a complex logic, and should be
> done for each of the several values inside the record field.  So, let
> userspace daemons like rasdaemon decode them, parsing such tables and
> having vendor-specific micro-architecture-specific decoders.
> 
>   [mchehab: modified description, solved merge conflicts and fixed coding style]
> 
> Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> 

Fixes tag is part of the main tag block so no blank line here.
There are at least some scripts running on the kernel tree that trip
up on this (and one that moans at the submitter ;)

I'd also add something to explain the SoB sequence for the curious.

> Co-developed-by: Jason Tian <jason@os.amperecomputing.com>

Jason's the Author, so shouldn't have a Co-dev tag.
There is some info on this in
https://docs.kernel.org/process/submitting-patches.html

> Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
> Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
> Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
> Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>

As person submitting I'd normally expect your SoB last.

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

I guess this is because Mauro posted an earlier version in which
case this is arguably correct, but likely to confuse.
For cases like this I add comments.

> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Tested-by: Shiju Jose <shiju.jose@huawei.com>
> Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
> Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section

A couple of trivial things inline.

> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index a6e4792a1b2e9239f44f29102a7cc058d64b93ef..179b1744df71ee1eac28718628d550075c480cd5 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -52,9 +52,46 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
>  	trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
>  }
>  
> -void log_arm_hw_error(struct cper_sec_proc_arm *err)
> +void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
>  {
> -	trace_arm_event(err);
> +	struct cper_arm_err_info *err_info;
> +	struct cper_arm_ctx_info *ctx_info;
> +	u8 *ven_err_data;
> +	u32 ctx_len = 0;
> +	int n, sz, cpu;
> +	s32 vsei_len;
> +	u32 pei_len;
> +	u8 *pei_err;
> +	u8 *ctx_err;

Maybe combine the two u8 *

> +
> +	pei_len = sizeof(struct cper_arm_err_info) * err->err_info_num;
> +	pei_err = (u8 *)err + sizeof(struct cper_sec_proc_arm);
	pei_err = (u8 *)(err + 1);

Which is the same as err_info. Setting one to the other will make that
relationship more obvious if we do need to keep them both around.
(Similar to the ctx_err = (u8 *)ctx_info; below)

> +
> +	err_info = (struct cper_arm_err_info *)(err + 1);
> +	ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> +	ctx_err = (u8 *)ctx_info;
> +
> +	for (n = 0; n < err->context_info_num; n++) {
> +		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> +		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> +		ctx_len += sz;
> +	}
> +
> +	vsei_len = err->section_length - (sizeof(struct cper_sec_proc_arm) + pei_len + ctx_len);
> +	if (vsei_len < 0) {
> +		pr_warn(FW_BUG "section length: %d\n", err->section_length);
> +		pr_warn(FW_BUG "section length is too small\n");
> +		pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
> +		vsei_len = 0;
> +	}
> +	ven_err_data = (u8 *)ctx_info;
> +
> +	cpu = GET_LOGICAL_INDEX(err->mpidr);
> +	if (cpu < 0)
> +		cpu = -1;
> +
> +	trace_arm_event(err, pei_err, pei_len, ctx_err, ctx_len,
> +			ven_err_data, (u32)vsei_len, sev, cpu);
>  }

> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index 14c9f943d53fb6cbadeef3f4b13e61470f0b5dee..a6b7ac0adaff9dfeab05a0c75ed359930ae04479 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -168,11 +168,24 @@ TRACE_EVENT(mc_event,
>   * This event is generated when hardware detects an ARM processor error
>   * has occurred. UEFI 2.6 spec section N.2.4.4.
>   */
> +#define APEIL "ARM Processor Err Info data len"
> +#define APEID "ARM Processor Err Info raw data"
> +#define APECIL "ARM Processor Err Context Info data len"
> +#define APECID "ARM Processor Err Context Info raw data"
> +#define VSEIL "Vendor Specific Err Info data len"
> +#define VSEID "Vendor Specific Err Info raw data"
>  TRACE_EVENT(arm_event,
>  
> -	TP_PROTO(const struct cper_sec_proc_arm *proc),
> +	TP_PROTO(const struct cper_sec_proc_arm *proc, const u8 *pei_err,
> +			const u32 pei_len,
Trivial but this is an odd bit of wrapping to have two items
on first line, but then only one on later lines.  Also additional
indent seems unusual and isn't matched by the first few things I looked
at in this file.  So should be under the c of const (one space I think
rather than a tab).

> +			const u8 *ctx_err,
> +			const u32 ctx_len,
> +			const u8 *oem,
> +			const u32 oem_len,
> +			u8 sev,
> +			int cpu),
>  




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/5] efi/cper: Add a new helper function to print bitmasks
  2025-08-05 18:35 ` [PATCH v4 3/5] efi/cper: Add a new helper function to print bitmasks Daniel Ferguson
@ 2025-08-08 15:23   ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2025-08-08 15:23 UTC (permalink / raw)
  To: Daniel Ferguson
  Cc: Ard Biesheuvel, Jonathan Corbet, Rafael J. Wysocki, Len Brown,
	James Morse, Tony Luck, Borislav Petkov, linux-doc, linux-kernel,
	linux-acpi, linux-efi, linux-edac, Mauro Carvalho Chehab

On Tue, 05 Aug 2025 11:35:40 -0700
Daniel Ferguson <danielf@os.amperecomputing.com> wrote:

> From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> 
> Add a helper function to print a string with names associated
> to each bit field.
> 
> A typical example is:
> 
> 	const char * const bits[] = {
> 		"bit 3 name",
> 		"bit 4 name",
> 		"bit 5 name",
> 	};
> 	char str[120];
>         unsigned int bitmask = BIT(3) | BIT(5);
> 
> 	#define MASK  GENMASK(5,3)
> 
> 	cper_bits_to_str(str, sizeof(str), FIELD_GET(MASK, bitmask),
> 			 bits, ARRAY_SIZE(bits));
> 
> The above code fills string "str" with "bit 3 name|bit 5 name".
> 
> Reviewed-by; Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 

oops.  That was probably my mess up.  Please fix that ; to : and
put it in the right place.

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Acked-by: Borislav Petkov (AMD) <bp@alien8.de>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 0/5] Fix issues with ARM Processor CPER records
  2025-08-08 15:01   ` Jonathan Cameron
@ 2025-08-08 19:03     ` Daniel Ferguson
  2025-08-09 15:49       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-08 19:03 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ard Biesheuvel, Jonathan Corbet, Rafael J. Wysocki, Len Brown,
	James Morse, Tony Luck, Borislav Petkov, linux-doc, linux-kernel,
	linux-acpi, linux-efi, linux-edac, Jason Tian, Shengwei Luo,
	Mauro Carvalho Chehab, Shiju Jose



On 8/8/2025 8:01 AM, Jonathan Cameron wrote:
> On Tue, 5 Aug 2025 11:39:38 -0700
> Daniel Ferguson <danielf@os.amperecomputing.com> wrote:
> 
>> On 8/5/2025 11:35 AM, Daniel Ferguson wrote:
>>> [NO NEED FOR INTERNAL REVIEW, THIS IS JUST A TEST]  
>>
>> Yes, PLEASE REVIEW...
>> I accidentally left that message in...
>>
>> ~Daniel
> 
> Hi Daniel,
> 
> No problem with you picking this up to take forwards as I gather you asked
> Mauro first, but good to mention that change and that it's with agreement
> (or not if it's abandoned code which I don't think this was.) 
> 
> Thanks,
> 
> Jonathan

Hi Jonathan,

I didn't ask Mauro directly, but I did ask this list if my intervention would be
useful. I was given feedback that it would be.

When I resubmit this patch, to take into account your feedback, I'll be sure to
also explain in the cover letter the change of hands.

Hopefully I didn't step on any toes, that wasn't my intention.

FWIW, here is a link to the brief conversation that led to the hand-off:
https://lore.kernel.org/linux-acpi/CAMj1kXGSNT08QrCp1jazmi9ANpZ7RCuS4kHo9x6hwxtp6z0Nhg@mail.gmail.com/

Regards,
~Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 0/5] Fix issues with ARM Processor CPER records
  2025-08-08 19:03     ` Daniel Ferguson
@ 2025-08-09 15:49       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-09 15:49 UTC (permalink / raw)
  To: Daniel Ferguson
  Cc: Jonathan Cameron, Ard Biesheuvel, Jonathan Corbet,
	Rafael J. Wysocki, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, linux-doc, linux-kernel, linux-acpi, linux-efi,
	linux-edac, Jason Tian, Shengwei Luo, Shiju Jose

Em Fri, 8 Aug 2025 12:03:06 -0700
Daniel Ferguson <danielf@os.amperecomputing.com> escreveu:

> On 8/8/2025 8:01 AM, Jonathan Cameron wrote:
> > On Tue, 5 Aug 2025 11:39:38 -0700
> > Daniel Ferguson <danielf@os.amperecomputing.com> wrote:
> >   
> >> On 8/5/2025 11:35 AM, Daniel Ferguson wrote:  
> >>> [NO NEED FOR INTERNAL REVIEW, THIS IS JUST A TEST]    
> >>
> >> Yes, PLEASE REVIEW...
> >> I accidentally left that message in...
> >>
> >> ~Daniel  
> > 
> > Hi Daniel,
> > 
> > No problem with you picking this up to take forwards as I gather you asked
> > Mauro first, but good to mention that change and that it's with agreement
> > (or not if it's abandoned code which I don't think this was.) 
> > 
> > Thanks,
> > 
> > Jonathan  
> 
> Hi Jonathan,
> 
> I didn't ask Mauro directly, but I did ask this list if my intervention would be
> useful. I was given feedback that it would be.
> 
> When I resubmit this patch, to take into account your feedback, I'll be sure to
> also explain in the cover letter the change of hands.

I saw your past e-mail. I'm fine if you can resubmit this.

> 
> Hopefully I didn't step on any toes, that wasn't my intention.
> 
> FWIW, here is a link to the brief conversation that led to the hand-off:
> https://lore.kernel.org/linux-acpi/CAMj1kXGSNT08QrCp1jazmi9ANpZ7RCuS4kHo9x6hwxtp6z0Nhg@mail.gmail.com/

Regards,
Mauro

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace
  2025-08-08 15:22   ` Jonathan Cameron
@ 2025-08-09 15:55     ` Mauro Carvalho Chehab
  2025-08-11 10:52       ` Jonathan Cameron
  0 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2025-08-09 15:55 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Daniel Ferguson, Ard Biesheuvel, Jonathan Corbet,
	Rafael J. Wysocki, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, linux-doc, linux-kernel, linux-acpi, linux-efi,
	linux-edac, Jason Tian, Shengwei Luo, Shiju Jose

Em Fri, 8 Aug 2025 16:22:09 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:

> On Tue, 05 Aug 2025 11:35:38 -0700
> Daniel Ferguson <danielf@os.amperecomputing.com> wrote:
> 
> > From: Jason Tian <jason@os.amperecomputing.com>
> > 
> > The ARM processor CPER record was added in UEFI v2.6 and remained
> > unchanged up to v2.10.
> > 
> > Yet, the original arm_event trace code added by
> > 
> >   e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> > 
> > is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
> > exporting any information from tables N.17 to N.29 of the record.
> > 
> > This is not enough for the user to be able to figure out what has
> > exactly happened or to take appropriate action.
> > 
> > According to the UEFI v2.9 specification chapter N2.4.4, the ARM
> > processor error section includes:
> > 
> > - several (ERR_INFO_NUM) ARM processor error information structures
> >   (Tables N.17 to N.20);
> > - several (CONTEXT_INFO_NUM) ARM processor context information
> >   structures (Tables N.21 to N.29);
> > - several vendor specific error information structures. The
> >   size is given by Section Length minus the size of the other
> >   fields.
> > 
> > In addition, it also exports two fields that are parsed by the GHES
> > driver when firmware reports it, e.g.:
> > 
> > - error severity
> > - CPU logical index
> > 
> > Report all of these information to userspace via a the ARM tracepoint so
> > that userspace can properly record the error and take decisions related
> > to CPU core isolation according to error severity and other info.
> > 
> > The updated ARM trace event now contains the following fields:
> > 
> > ======================================  =============================
> > UEFI field on table N.16                ARM Processor trace fields
> > ======================================  =============================
> > Validation                              handled when filling data for
> >                                         affinity MPIDR and running
> >                                         state.
> > ERR_INFO_NUM                            pei_len
> > CONTEXT_INFO_NUM                        ctx_len
> > Section Length                          indirectly reported by
> >                                         pei_len, ctx_len and oem_len
> > Error affinity level                    affinity
> > MPIDR_EL1                               mpidr
> > MIDR_EL1                                midr
> > Running State                           running_state
> > PSCI State                              psci_state
> > Processor Error Information Structure   pei_err - count at pei_len
> > Processor Context                       ctx_err- count at ctx_len
> > Vendor Specific Error Info              oem - count at oem_len
> > ======================================  =============================
> > 
> > It should be noted that decoding of tables N.17 to N.29, if needed, will
> > be handled in userspace. That gives more flexibility, as there won't be
> > any need to flood the kernel with micro-architecture specific error
> > decoding.
> > 
> > Also, decoding the other fields require a complex logic, and should be
> > done for each of the several values inside the record field.  So, let
> > userspace daemons like rasdaemon decode them, parsing such tables and
> > having vendor-specific micro-architecture-specific decoders.
> > 
> >   [mchehab: modified description, solved merge conflicts and fixed coding style]
> > 
> > Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> >   
> 
> Fixes tag is part of the main tag block so no blank line here.
> There are at least some scripts running on the kernel tree that trip
> up on this (and one that moans at the submitter ;)
> 
> I'd also add something to explain the SoB sequence for the curious.
> 
> > Co-developed-by: Jason Tian <jason@os.amperecomputing.com>  
> 
> Jason's the Author, so shouldn't have a Co-dev tag.
> There is some info on this in
> https://docs.kernel.org/process/submitting-patches.html

My understanding is that all co-authors shall have co-developed-by
and SoB. Anyway, doesn't matter much in practice, I guess.

> 
> > Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
> > Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
> > Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
> > Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> > Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>  
> 
> As person submitting I'd normally expect your SoB last.
> 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> 
> I guess this is because Mauro posted an earlier version in which
> case this is arguably correct, but likely to confuse.
> For cases like this I add comments.

If the patch is identical, and it is just a resubmission,
I would keep the original order.

Otherwise, if Daniel did some changes at the code (except for a
trivial rebase stuff), better to move his co-developed-by/SoB to
the end, eventually adding:

[Daniel: <did something>] before the custody chain block.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace
  2025-08-09 15:55     ` Mauro Carvalho Chehab
@ 2025-08-11 10:52       ` Jonathan Cameron
  2025-08-11 22:37         ` Daniel Ferguson
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2025-08-11 10:52 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Ferguson, Ard Biesheuvel, Jonathan Corbet,
	Rafael J. Wysocki, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, linux-doc, linux-kernel, linux-acpi, linux-efi,
	linux-edac, Jason Tian, Shengwei Luo, Shiju Jose

On Sat, 9 Aug 2025 17:55:19 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Em Fri, 8 Aug 2025 16:22:09 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:
> 
> > On Tue, 05 Aug 2025 11:35:38 -0700
> > Daniel Ferguson <danielf@os.amperecomputing.com> wrote:
> >   
> > > From: Jason Tian <jason@os.amperecomputing.com>
> > > 
> > > The ARM processor CPER record was added in UEFI v2.6 and remained
> > > unchanged up to v2.10.
> > > 
> > > Yet, the original arm_event trace code added by
> > > 
> > >   e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> > > 
> > > is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
> > > exporting any information from tables N.17 to N.29 of the record.
> > > 
> > > This is not enough for the user to be able to figure out what has
> > > exactly happened or to take appropriate action.
> > > 
> > > According to the UEFI v2.9 specification chapter N2.4.4, the ARM
> > > processor error section includes:
> > > 
> > > - several (ERR_INFO_NUM) ARM processor error information structures
> > >   (Tables N.17 to N.20);
> > > - several (CONTEXT_INFO_NUM) ARM processor context information
> > >   structures (Tables N.21 to N.29);
> > > - several vendor specific error information structures. The
> > >   size is given by Section Length minus the size of the other
> > >   fields.
> > > 
> > > In addition, it also exports two fields that are parsed by the GHES
> > > driver when firmware reports it, e.g.:
> > > 
> > > - error severity
> > > - CPU logical index
> > > 
> > > Report all of these information to userspace via a the ARM tracepoint so
> > > that userspace can properly record the error and take decisions related
> > > to CPU core isolation according to error severity and other info.
> > > 
> > > The updated ARM trace event now contains the following fields:
> > > 
> > > ======================================  =============================
> > > UEFI field on table N.16                ARM Processor trace fields
> > > ======================================  =============================
> > > Validation                              handled when filling data for
> > >                                         affinity MPIDR and running
> > >                                         state.
> > > ERR_INFO_NUM                            pei_len
> > > CONTEXT_INFO_NUM                        ctx_len
> > > Section Length                          indirectly reported by
> > >                                         pei_len, ctx_len and oem_len
> > > Error affinity level                    affinity
> > > MPIDR_EL1                               mpidr
> > > MIDR_EL1                                midr
> > > Running State                           running_state
> > > PSCI State                              psci_state
> > > Processor Error Information Structure   pei_err - count at pei_len
> > > Processor Context                       ctx_err- count at ctx_len
> > > Vendor Specific Error Info              oem - count at oem_len
> > > ======================================  =============================
> > > 
> > > It should be noted that decoding of tables N.17 to N.29, if needed, will
> > > be handled in userspace. That gives more flexibility, as there won't be
> > > any need to flood the kernel with micro-architecture specific error
> > > decoding.
> > > 
> > > Also, decoding the other fields require a complex logic, and should be
> > > done for each of the several values inside the record field.  So, let
> > > userspace daemons like rasdaemon decode them, parsing such tables and
> > > having vendor-specific micro-architecture-specific decoders.
> > > 
> > >   [mchehab: modified description, solved merge conflicts and fixed coding style]
> > > 
> > > Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> > >     
> > 
> > Fixes tag is part of the main tag block so no blank line here.
> > There are at least some scripts running on the kernel tree that trip
> > up on this (and one that moans at the submitter ;)
> > 
> > I'd also add something to explain the SoB sequence for the curious.
> >   
> > > Co-developed-by: Jason Tian <jason@os.amperecomputing.com>    
> > 
> > Jason's the Author, so shouldn't have a Co-dev tag.
> > There is some info on this in
> > https://docs.kernel.org/process/submitting-patches.html  
> 
> My understanding is that all co-authors shall have co-developed-by
> and SoB. Anyway, doesn't matter much in practice, I guess.

Nope.  In the description the docs say "in addition to the author
attribute in the From: tag"  There are also examples where there
isn't a Co-dev for the From author including the subtle question of
ordering if someone else posts that patch.

I have a vague recollection one of the scripts checking linux-next
might moan about this. 

> 
> >   
> > > Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
> > > Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
> > > Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
> > > Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> > > Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>    
> > 
> > As person submitting I'd normally expect your SoB last.
> >   
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>    
> > 
> > I guess this is because Mauro posted an earlier version in which
> > case this is arguably correct, but likely to confuse.
> > For cases like this I add comments.  
> 
> If the patch is identical, and it is just a resubmission,
> I would keep the original order.
> 
> Otherwise, if Daniel did some changes at the code (except for a
> trivial rebase stuff), better to move his co-developed-by/SoB to
> the end, eventually adding:
> 
> [Daniel: <did something>] before the custody chain block.

Docs are clear that sender of the patch must be last SoB.
That's also checked by the some of the tag block check scripts
I think.  There's an example of exactly this case in the in the
submitting-patches.rst file (last one in the section that talks
about Co-developed-by.

For meaning I don't care that much, but keeping to the rigid
rules in that doc makes it easier for scripts to check for the
more important stuff like whether all necessary SoB are there.

Jonathan

> 
> Thanks,
> Mauro
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace
  2025-08-11 10:52       ` Jonathan Cameron
@ 2025-08-11 22:37         ` Daniel Ferguson
  2025-08-12 14:05           ` Jonathan Cameron
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Ferguson @ 2025-08-11 22:37 UTC (permalink / raw)
  To: Jonathan Cameron, Mauro Carvalho Chehab
  Cc: Ard Biesheuvel, Jonathan Corbet, Rafael J. Wysocki, Len Brown,
	James Morse, Tony Luck, Borislav Petkov, linux-doc, linux-kernel,
	linux-acpi, linux-efi, linux-edac, Jason Tian, Shengwei Luo,
	Shiju Jose



On 8/11/2025 3:52 AM, Jonathan Cameron wrote:
> On Sat, 9 Aug 2025 17:55:19 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
>> Em Fri, 8 Aug 2025 16:22:09 +0100
>> Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:
>>
>>> On Tue, 05 Aug 2025 11:35:38 -0700
>>> Daniel Ferguson <danielf@os.amperecomputing.com> wrote:
>>>   
>>>> From: Jason Tian <jason@os.amperecomputing.com>
>>>>
>>>> The ARM processor CPER record was added in UEFI v2.6 and remained
>>>> unchanged up to v2.10.
>>>>
>>>> Yet, the original arm_event trace code added by
>>>>
>>>>   e9279e83ad1f ("trace, ras: add ARM processor error trace event")
>>>>
>>>> is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
>>>> exporting any information from tables N.17 to N.29 of the record.
>>>>
>>>> This is not enough for the user to be able to figure out what has
>>>> exactly happened or to take appropriate action.
>>>>
>>>> According to the UEFI v2.9 specification chapter N2.4.4, the ARM
>>>> processor error section includes:
>>>>
>>>> - several (ERR_INFO_NUM) ARM processor error information structures
>>>>   (Tables N.17 to N.20);
>>>> - several (CONTEXT_INFO_NUM) ARM processor context information
>>>>   structures (Tables N.21 to N.29);
>>>> - several vendor specific error information structures. The
>>>>   size is given by Section Length minus the size of the other
>>>>   fields.
>>>>
>>>> In addition, it also exports two fields that are parsed by the GHES
>>>> driver when firmware reports it, e.g.:
>>>>
>>>> - error severity
>>>> - CPU logical index
>>>>
>>>> Report all of these information to userspace via a the ARM tracepoint so
>>>> that userspace can properly record the error and take decisions related
>>>> to CPU core isolation according to error severity and other info.
>>>>
>>>> The updated ARM trace event now contains the following fields:
>>>>
>>>> ======================================  =============================
>>>> UEFI field on table N.16                ARM Processor trace fields
>>>> ======================================  =============================
>>>> Validation                              handled when filling data for
>>>>                                         affinity MPIDR and running
>>>>                                         state.
>>>> ERR_INFO_NUM                            pei_len
>>>> CONTEXT_INFO_NUM                        ctx_len
>>>> Section Length                          indirectly reported by
>>>>                                         pei_len, ctx_len and oem_len
>>>> Error affinity level                    affinity
>>>> MPIDR_EL1                               mpidr
>>>> MIDR_EL1                                midr
>>>> Running State                           running_state
>>>> PSCI State                              psci_state
>>>> Processor Error Information Structure   pei_err - count at pei_len
>>>> Processor Context                       ctx_err- count at ctx_len
>>>> Vendor Specific Error Info              oem - count at oem_len
>>>> ======================================  =============================
>>>>
>>>> It should be noted that decoding of tables N.17 to N.29, if needed, will
>>>> be handled in userspace. That gives more flexibility, as there won't be
>>>> any need to flood the kernel with micro-architecture specific error
>>>> decoding.
>>>>
>>>> Also, decoding the other fields require a complex logic, and should be
>>>> done for each of the several values inside the record field.  So, let
>>>> userspace daemons like rasdaemon decode them, parsing such tables and
>>>> having vendor-specific micro-architecture-specific decoders.
>>>>
>>>>   [mchehab: modified description, solved merge conflicts and fixed coding style]
>>>>
>>>> Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
>>>>     
>>>
>>> Fixes tag is part of the main tag block so no blank line here.
>>> There are at least some scripts running on the kernel tree that trip
>>> up on this (and one that moans at the submitter ;)
>>>
>>> I'd also add something to explain the SoB sequence for the curious.
>>>   
>>>> Co-developed-by: Jason Tian <jason@os.amperecomputing.com>    
>>>
>>> Jason's the Author, so shouldn't have a Co-dev tag.
>>> There is some info on this in
>>> https://docs.kernel.org/process/submitting-patches.html  
>>
>> My understanding is that all co-authors shall have co-developed-by
>> and SoB. Anyway, doesn't matter much in practice, I guess.
> 
> Nope.  In the description the docs say "in addition to the author
> attribute in the From: tag"  There are also examples where there
> isn't a Co-dev for the From author including the subtle question of
> ordering if someone else posts that patch.
> 
> I have a vague recollection one of the scripts checking linux-next
> might moan about this. 
> 
>>
>>>   
>>>> Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
>>>> Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
>>>> Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
>>>> Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
>>>> Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>    
>>>
>>> As person submitting I'd normally expect your SoB last.
>>>   
>>>> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>    
>>>
>>> I guess this is because Mauro posted an earlier version in which
>>> case this is arguably correct, but likely to confuse.
>>> For cases like this I add comments.  
>>
>> If the patch is identical, and it is just a resubmission,
>> I would keep the original order.
>>
>> Otherwise, if Daniel did some changes at the code (except for a
>> trivial rebase stuff), better to move his co-developed-by/SoB to
>> the end, eventually adding:
>>
>> [Daniel: <did something>] before the custody chain block.
> 
> Docs are clear that sender of the patch must be last SoB.
> That's also checked by the some of the tag block check scripts
> I think.  There's an example of exactly this case in the in the
> submitting-patches.rst file (last one in the section that talks
> about Co-developed-by.
> 
> For meaning I don't care that much, but keeping to the rigid
> rules in that doc makes it easier for scripts to check for the
> more important stuff like whether all necessary SoB are there.
> 
> Jonathan
> 

Just to make sure I get this right...

I will move my SoB and my Co-developed-by after any other SoB's in the tag
block. As a result, I do not need to provide an explanation in this case,
because there is nothing curious remaining ? Or should I still add comments
indicating why I rearranged the tag block ?

Based on your feedback, I'm intending to organize the tag block to look like this:

 Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
 Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
 Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
 Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
 Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
 Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
 Tested-by: Shiju Jose <shiju.jose@huawei.com>
 Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
 Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
 Link:
https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section


Is that satisfactory?

Thank you,
Daniel

>>
>> Thanks,
>> Mauro
>>
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace
  2025-08-11 22:37         ` Daniel Ferguson
@ 2025-08-12 14:05           ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2025-08-12 14:05 UTC (permalink / raw)
  To: Daniel Ferguson
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, Jonathan Corbet,
	Rafael J. Wysocki, Len Brown, James Morse, Tony Luck,
	Borislav Petkov, linux-doc, linux-kernel, linux-acpi, linux-efi,
	linux-edac, Jason Tian, Shengwei Luo, Shiju Jose

On Mon, 11 Aug 2025 15:37:18 -0700
Daniel Ferguson <danielf@os.amperecomputing.com> wrote:

> On 8/11/2025 3:52 AM, Jonathan Cameron wrote:
> > On Sat, 9 Aug 2025 17:55:19 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> >> Em Fri, 8 Aug 2025 16:22:09 +0100
> >> Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:
> >>  
> >>> On Tue, 05 Aug 2025 11:35:38 -0700
> >>> Daniel Ferguson <danielf@os.amperecomputing.com> wrote:
> >>>     
> >>>> From: Jason Tian <jason@os.amperecomputing.com>
> >>>>
> >>>> The ARM processor CPER record was added in UEFI v2.6 and remained
> >>>> unchanged up to v2.10.
> >>>>
> >>>> Yet, the original arm_event trace code added by
> >>>>
> >>>>   e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> >>>>
> >>>> is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
> >>>> exporting any information from tables N.17 to N.29 of the record.
> >>>>
> >>>> This is not enough for the user to be able to figure out what has
> >>>> exactly happened or to take appropriate action.
> >>>>
> >>>> According to the UEFI v2.9 specification chapter N2.4.4, the ARM
> >>>> processor error section includes:
> >>>>
> >>>> - several (ERR_INFO_NUM) ARM processor error information structures
> >>>>   (Tables N.17 to N.20);
> >>>> - several (CONTEXT_INFO_NUM) ARM processor context information
> >>>>   structures (Tables N.21 to N.29);
> >>>> - several vendor specific error information structures. The
> >>>>   size is given by Section Length minus the size of the other
> >>>>   fields.
> >>>>
> >>>> In addition, it also exports two fields that are parsed by the GHES
> >>>> driver when firmware reports it, e.g.:
> >>>>
> >>>> - error severity
> >>>> - CPU logical index
> >>>>
> >>>> Report all of these information to userspace via a the ARM tracepoint so
> >>>> that userspace can properly record the error and take decisions related
> >>>> to CPU core isolation according to error severity and other info.
> >>>>
> >>>> The updated ARM trace event now contains the following fields:
> >>>>
> >>>> ======================================  =============================
> >>>> UEFI field on table N.16                ARM Processor trace fields
> >>>> ======================================  =============================
> >>>> Validation                              handled when filling data for
> >>>>                                         affinity MPIDR and running
> >>>>                                         state.
> >>>> ERR_INFO_NUM                            pei_len
> >>>> CONTEXT_INFO_NUM                        ctx_len
> >>>> Section Length                          indirectly reported by
> >>>>                                         pei_len, ctx_len and oem_len
> >>>> Error affinity level                    affinity
> >>>> MPIDR_EL1                               mpidr
> >>>> MIDR_EL1                                midr
> >>>> Running State                           running_state
> >>>> PSCI State                              psci_state
> >>>> Processor Error Information Structure   pei_err - count at pei_len
> >>>> Processor Context                       ctx_err- count at ctx_len
> >>>> Vendor Specific Error Info              oem - count at oem_len
> >>>> ======================================  =============================
> >>>>
> >>>> It should be noted that decoding of tables N.17 to N.29, if needed, will
> >>>> be handled in userspace. That gives more flexibility, as there won't be
> >>>> any need to flood the kernel with micro-architecture specific error
> >>>> decoding.
> >>>>
> >>>> Also, decoding the other fields require a complex logic, and should be
> >>>> done for each of the several values inside the record field.  So, let
> >>>> userspace daemons like rasdaemon decode them, parsing such tables and
> >>>> having vendor-specific micro-architecture-specific decoders.
> >>>>
> >>>>   [mchehab: modified description, solved merge conflicts and fixed coding style]
> >>>>
> >>>> Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> >>>>       
> >>>
> >>> Fixes tag is part of the main tag block so no blank line here.
> >>> There are at least some scripts running on the kernel tree that trip
> >>> up on this (and one that moans at the submitter ;)
> >>>
> >>> I'd also add something to explain the SoB sequence for the curious.
> >>>     
> >>>> Co-developed-by: Jason Tian <jason@os.amperecomputing.com>      
> >>>
> >>> Jason's the Author, so shouldn't have a Co-dev tag.
> >>> There is some info on this in
> >>> https://docs.kernel.org/process/submitting-patches.html    
> >>
> >> My understanding is that all co-authors shall have co-developed-by
> >> and SoB. Anyway, doesn't matter much in practice, I guess.  
> > 
> > Nope.  In the description the docs say "in addition to the author
> > attribute in the From: tag"  There are also examples where there
> > isn't a Co-dev for the From author including the subtle question of
> > ordering if someone else posts that patch.
> > 
> > I have a vague recollection one of the scripts checking linux-next
> > might moan about this. 
> >   
> >>  
> >>>     
> >>>> Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
> >>>> Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
> >>>> Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
> >>>> Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> >>>> Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>      
> >>>
> >>> As person submitting I'd normally expect your SoB last.
> >>>     
> >>>> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>      
> >>>
> >>> I guess this is because Mauro posted an earlier version in which
> >>> case this is arguably correct, but likely to confuse.
> >>> For cases like this I add comments.    
> >>
> >> If the patch is identical, and it is just a resubmission,
> >> I would keep the original order.
> >>
> >> Otherwise, if Daniel did some changes at the code (except for a
> >> trivial rebase stuff), better to move his co-developed-by/SoB to
> >> the end, eventually adding:
> >>
> >> [Daniel: <did something>] before the custody chain block.  
> > 
> > Docs are clear that sender of the patch must be last SoB.
> > That's also checked by the some of the tag block check scripts
> > I think.  There's an example of exactly this case in the in the
> > submitting-patches.rst file (last one in the section that talks
> > about Co-developed-by.
> > 
> > For meaning I don't care that much, but keeping to the rigid
> > rules in that doc makes it easier for scripts to check for the
> > more important stuff like whether all necessary SoB are there.
> > 
> > Jonathan
> >   
> 
> Just to make sure I get this right...
> 
> I will move my SoB and my Co-developed-by after any other SoB's in the tag
> block. As a result, I do not need to provide an explanation in this case,
> because there is nothing curious remaining ? Or should I still add comments
> indicating why I rearranged the tag block ?
> 
> Based on your feedback, I'm intending to organize the tag block to look like this:
> 
>  Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
>  Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
>  Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
>  Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>  Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
>  Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
>  Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>  Tested-by: Shiju Jose <shiju.jose@huawei.com>
>  Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
>  Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
>  Link:
> https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
> 
> 
> Is that satisfactory?

The SoB from Mauro is still out of the normal pattern which
tends to happen only when things are bouncing between different submitters.

So I'd still add a comment and keep that after you Codev/SoB.

#1 [mchehab: modified description, solved merge conflicts and fixed coding style
   in version 3.]

Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
Co-developed-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> # 1
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section

Or something along those lines

Jonathan

> 
> Thank you,
> Daniel
> 
> >>
> >> Thanks,
> >> Mauro
> >>  
> >   
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-08-12 14:06 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-05 18:35 [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
2025-08-05 18:35 ` [PATCH v4 1/5] RAS: Report all ARM processor CPER information to userspace Daniel Ferguson
2025-08-08 15:22   ` Jonathan Cameron
2025-08-09 15:55     ` Mauro Carvalho Chehab
2025-08-11 10:52       ` Jonathan Cameron
2025-08-11 22:37         ` Daniel Ferguson
2025-08-12 14:05           ` Jonathan Cameron
2025-08-05 18:35 ` [PATCH v4 2/5] efi/cper: Adjust infopfx size to accept an extra space Daniel Ferguson
2025-08-05 18:35 ` [PATCH v4 3/5] efi/cper: Add a new helper function to print bitmasks Daniel Ferguson
2025-08-08 15:23   ` Jonathan Cameron
2025-08-05 18:35 ` [PATCH v4 4/5] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Daniel Ferguson
2025-08-05 18:35 ` [PATCH v4 5/5] docs: efi: add CPER functions to driver-api Daniel Ferguson
2025-08-05 18:39 ` [PATCH v4 0/5] Fix issues with ARM Processor CPER records Daniel Ferguson
2025-08-08 15:01   ` Jonathan Cameron
2025-08-08 19:03     ` Daniel Ferguson
2025-08-09 15:49       ` Mauro Carvalho Chehab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).