linux-efi.vger.kernel.org archive mirror
* [PATCH 0/6] Fix issues with ARM Processor CPER records
@ 2024-07-08 11:18 Mauro Carvalho Chehab
  2024-07-08 11:18 ` [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions Mauro Carvalho Chehab
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Shiju Jose, linux-efi, linux-kernel, linux-edac,
	Len Brown, linux-acpi, Rafael J. Wysocki, Jonathan Corbet,
	linux-doc

This series replaces two previously-sent series:
- https://lore.kernel.org/linux-edac/cover.1719219886.git.mchehab+huawei@kernel.org/T/#t
- https://lore.kernel.org/linux-edac/cover.1719484498.git.mchehab+huawei@kernel.org/T/#t

It is also available at:

	https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac.git/log/?h=edac-arm64

This is needed for both kernelspace and userspace to properly handle ARM
processor CPER events.

Patches 1 and 2 of this series fix the UEFI 2.6+ implementation of the ARM
trace event, as the original implementation was incomplete.
Changeset e9279e83ad1f ("trace, ras: add ARM processor error trace event")
added such an event, but it reports only some fields of the CPER record
defined in UEFI 2.6+ appendix N, table N.16. Those are not enough to
actually parse such events in userspace, as not even the event type
is exported.

Patch 3 fixes a compilation breakage when W=1;

Patch 4 adds a new helper function to be used by cper and ghes drivers to
display CPER bitmaps;

Patch 5 fixes the CPER logic according to the UEFI 2.9A errata. Before it,
there was no description of how the processor type field was encoded. The
errata defines it as a bitmask and describes how it should be encoded.

Patch 6 adds CPER functions to Kernel-doc.
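
To illustrate the bitmask encoding mentioned for patches 4 and 5, here is a
minimal userspace sketch, not the kernel implementation. It turns an error
type bitmask into a single '|'-separated string. The bit names are taken
from the log output shown in this cover letter (0x02 cache, 0x04 TLB,
0x08 bus, 0x10 micro-architectural); the helper name err_type_to_str() and
its signature are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Bit names indexed by bit position, as seen in the log in this cover
 * letter. Bit 0 has no name in this sketch.
 */
static const char *const arm_err_type_names[] = {
	NULL,
	"cache error",
	"TLB error",
	"bus error",
	"micro-architectural error",
};

/* Hypothetical helper: write "name|name|..." for each set bit into buf. */
static char *err_type_to_str(char *buf, size_t buf_size, unsigned int bits)
{
	size_t n_names = sizeof(arm_err_type_names) / sizeof(arm_err_type_names[0]);
	size_t used = 0;
	size_t i;

	if (buf_size == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < n_names; i++) {
		int n;

		if (!(bits & (1U << i)) || !arm_err_type_names[i])
			continue;

		n = snprintf(buf + used, buf_size - used, "%s%s",
			     used ? "|" : "", arm_err_type_names[i]);
		if (n < 0 || (size_t)n >= buf_size - used) {
			/* Out of space: drop the partially written name. */
			buf[used] = '\0';
			break;
		}
		used += (size_t)n;
	}
	return buf;
}
```

With a 64-byte buffer, a mask of 0x1c yields the same text that appears in
the "error_type" line of the log.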

This series was validated with the help of ARM EINJ code for QEMU:

	https://github.com/mchehab/rasdaemon/wiki/error-injection

using the QEMU injection code at:

   https://gitlab.com/mchehab_kernel/qemu/-/commits/arm-error-inject-v2?ref_type=heads

The guest was run on QEMU, and the following commands were sent via the
QEMU QMP interface:

    { "execute": "qmp_capabilities" } 
    { "execute": "arm-inject-error", "arguments": {
	"validation": ["mpidr-valid", "affinity-valid", "running-state-valid", "vendor-specific-valid"],
	"running-state": [], "psci-state": 1229279264, "error": [
	{"type": ["tlb-error", "bus-error", "micro-arch-error"], "multiple-error": 2}, 
	{"type": ["micro-arch-error"]},
	{"type": ["tlb-error"]}, 
	{"type": ["bus-error"]}, 
	{"type": ["cache-error"]}]} }

The CPER event is now properly handled:

[   53.223383] {1}[Hardware Error]: event severity: recoverable
[   53.223690] {1}[Hardware Error]:  Error 0, type: recoverable
[   53.224073] {1}[Hardware Error]:   section_type: ARM processor error
[   53.224419] {1}[Hardware Error]:   MIDR: 0x0000000000000000
[   53.224694] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[   53.225029] {1}[Hardware Error]:   error affinity level: 2
[   53.225266] {1}[Hardware Error]:   running state: 0x0
[   53.225516] {1}[Hardware Error]:   Power State Coordination Interface state: 1229279264
[   53.225857] {1}[Hardware Error]:   Error info structure 0:
[   53.226094] {1}[Hardware Error]:   num errors: 3
[   53.226317] {1}[Hardware Error]:    first error captured
[   53.226548] {1}[Hardware Error]:    propagated error captured
[   53.226806] {1}[Hardware Error]:    overflow occurred, error info is incomplete
[   53.227180] {1}[Hardware Error]:    error_type: 0x1c: TLB error|bus error|micro-architectural error
[   53.227549] {1}[Hardware Error]:    error_info: 0x000000000054007f
[   53.227819] {1}[Hardware Error]:    virtual fault address: 0x0000000067320230
[   53.228106] {1}[Hardware Error]:    physical fault address: 0x000000005cdfd492
[   53.228403] {1}[Hardware Error]:   Error info structure 1:
[   53.228636] {1}[Hardware Error]:   num errors: 3
[   53.228840] {1}[Hardware Error]:    first error captured
[   53.229061] {1}[Hardware Error]:    propagated error captured
[   53.229296] {1}[Hardware Error]:    overflow occurred, error info is incomplete
[   53.229577] {1}[Hardware Error]:    error_type: 0x10: micro-architectural error
[   53.229873] {1}[Hardware Error]:    error_info: 0x0000000078da03ff
[   53.230130] {1}[Hardware Error]:    virtual fault address: 0x0000000067320230
[   53.230412] {1}[Hardware Error]:    physical fault address: 0x000000005cdfd492
[   53.230694] {1}[Hardware Error]:   Error info structure 2:
[   53.230924] {1}[Hardware Error]:   num errors: 3
[   53.231128] {1}[Hardware Error]:    first error captured
[   53.231349] {1}[Hardware Error]:    propagated error captured
[   53.231582] {1}[Hardware Error]:    overflow occurred, error info is incomplete
[   53.231863] {1}[Hardware Error]:    error_type: 0x04: TLB error
[   53.232116] {1}[Hardware Error]:    error_info: 0x000000000054007f
[   53.232396] {1}[Hardware Error]:     transaction type: Instruction
[   53.232686] {1}[Hardware Error]:     TLB error, operation type: Instruction fetch
[   53.232998] {1}[Hardware Error]:     TLB level: 1
[   53.233215] {1}[Hardware Error]:     processor context not corrupted
[   53.233479] {1}[Hardware Error]:     the error has not been corrected
[   53.233740] {1}[Hardware Error]:     PC is imprecise
[   53.233974] {1}[Hardware Error]:    virtual fault address: 0x0000000067320230
[   53.234264] {1}[Hardware Error]:    physical fault address: 0x000000005cdfd492
[   53.234547] {1}[Hardware Error]:   Error info structure 3:
[   53.234776] {1}[Hardware Error]:   num errors: 3
[   53.234980] {1}[Hardware Error]:    first error captured
[   53.235199] {1}[Hardware Error]:    propagated error captured
[   53.235433] {1}[Hardware Error]:    overflow occurred, error info is incomplete
[   53.235714] {1}[Hardware Error]:    error_type: 0x08: bus error
[   53.235966] {1}[Hardware Error]:    error_info: 0x00000080d6460fff
[   53.236223] {1}[Hardware Error]:     transaction type: Generic
[   53.236478] {1}[Hardware Error]:     bus error, operation type: Generic read (type of instruction or data request cannot be determined)
[   53.236923] {1}[Hardware Error]:     affinity level at which the bus error occurred: 1
[   53.237234] {1}[Hardware Error]:     processor context corrupted
[   53.237481] {1}[Hardware Error]:     the error has been corrected
[   53.237728] {1}[Hardware Error]:     PC is imprecise
[   53.237937] {1}[Hardware Error]:     Program execution can be restarted reliably at the PC associated with the error.
[   53.238329] {1}[Hardware Error]:     participation type: Local processor observed
[   53.238627] {1}[Hardware Error]:     request timed out
[   53.238851] {1}[Hardware Error]:     address space: External Memory Access
[   53.239129] {1}[Hardware Error]:     memory access attributes:0x20
[   53.239393] {1}[Hardware Error]:     access mode: secure
[   53.239613] {1}[Hardware Error]:    virtual fault address: 0x0000000067320230
[   53.239890] {1}[Hardware Error]:    physical fault address: 0x000000005cdfd492
[   53.240168] {1}[Hardware Error]:   Error info structure 4:
[   53.240396] {1}[Hardware Error]:   num errors: 3
[   53.240601] {1}[Hardware Error]:    first error captured
[   53.240816] {1}[Hardware Error]:    propagated error captured
[   53.241048] {1}[Hardware Error]:    overflow occurred, error info is incomplete
[   53.241332] {1}[Hardware Error]:    error_type: 0x02: cache error
[   53.241589] {1}[Hardware Error]:    error_info: 0x000000000091000f
[   53.241843] {1}[Hardware Error]:     transaction type: Data Access
[   53.242101] {1}[Hardware Error]:     cache error, operation type: Data write
[   53.242385] {1}[Hardware Error]:     cache level: 2
[   53.242596] {1}[Hardware Error]:     processor context not corrupted
[   53.242847] {1}[Hardware Error]:    virtual fault address: 0x0000000067320230
[   53.243125] {1}[Hardware Error]:    physical fault address: 0x000000005cdfd492
[   53.243426] {1}[Hardware Error]:   Context info structure 0:
[   53.243675] {1}[Hardware Error]:    register context type: AArch64 EL1 context registers
[   53.244185] {1}[Hardware Error]:    00000000: 12abde67 00000000 00000000 00000000
[   53.244540] {1}[Hardware Error]:    00000010: 00000000 00000000 00000000 00000000
[   53.244864] {1}[Hardware Error]:    00000020: 00000000 00000000 00000000 00000000
[   53.245183] {1}[Hardware Error]:    00000030: 00000000 00000000 00000000 00000000
[   53.245504] {1}[Hardware Error]:    00000040: 00000000 00000000 00000000 00000000
[   53.245828] {1}[Hardware Error]:    00000050: 00000000 00000000 00000000 00000000
[   53.246149] {1}[Hardware Error]:    00000060: 00000000 00000000 00000000 00000000
[   53.246475] {1}[Hardware Error]:    00000070: 00000000 00000000 00000000 00000000
[   53.246799] {1}[Hardware Error]:    00000080: 00000000 00000000 00000000 00000000
[   53.247122] {1}[Hardware Error]:    00000090: 00000000 00000000 00000000 00000000
[   53.247446] {1}[Hardware Error]:    000000a0: 00000000 00000000 00000000 00000000
[   53.247767] {1}[Hardware Error]:    000000b0: 00000000 00000000 00000000 00000000
[   53.248090] {1}[Hardware Error]:    000000c0: 00000000 00000000 00000000 00000000
[   53.248415] {1}[Hardware Error]:    000000d0: 00000000 00000000 00000000 00000000
[   53.248739] {1}[Hardware Error]:    000000e0: 00000000 00000000 00000000 00000000
[   53.249064] {1}[Hardware Error]:    000000f0: 00000000 00000000 00000000 00000000
[   53.249398] {1}[Hardware Error]:    00000100: 00000000 00000000 00000000 00000000
[   53.249727] {1}[Hardware Error]:    00000110: 00000000 00000000 00000000 00000000
[   53.250053] {1}[Hardware Error]:    00000120: 00000000 00000000 00000000 00000000
[   53.250377] {1}[Hardware Error]:    00000130: 00000000 00000000 00000000 00000000
[   53.250700] {1}[Hardware Error]:    00000140: 00000000 00000000 00000000 00000000
[   53.251038] {1}[Hardware Error]:    00000150: 00000000 00000000 00000000 00000000
[   53.251368] {1}[Hardware Error]:    00000160: 00000000 00000000 00000000 00000000
[   53.251694] {1}[Hardware Error]:    00000170: 00000000 00000000 00000000 00000000
[   53.252017] {1}[Hardware Error]:    00000180: 00000000 00000000 00000000 00000000
[   53.252342] {1}[Hardware Error]:    00000190: 00000000 00000000 00000000 00000000
[   53.252664] {1}[Hardware Error]:    000001a0: 00000000 00000000 00000000 00000000
[   53.252984] {1}[Hardware Error]:    000001b0: 00000000 00000000 00000000 00000000
[   53.253309] {1}[Hardware Error]:    000001c0: 00000000 00000000 00000000 00000000
[   53.253630] {1}[Hardware Error]:    000001d0: 00000000 00000000 00000000 00000000
[   53.253949] {1}[Hardware Error]:    000001e0: 00000000 00000000 00000000 00000000
[   53.254273] {1}[Hardware Error]:    000001f0: 00000000 00000000 00000000 00000000
[   53.254595] {1}[Hardware Error]:    00000200: 00000000 00000000 00000000 00000000
[   53.254917] {1}[Hardware Error]:    00000210: 00000000 00000000 00000000 00000000
[   53.255245] {1}[Hardware Error]:    00000220: 00000000 00000000 00000000 00000000
[   53.255569] {1}[Hardware Error]:    00000230: 00000000 00000000 00000000 00000000
[   53.255890] {1}[Hardware Error]:    00000240: 00000000 00000000 00000000 00000000
[   53.256794] [Firmware Warn]: GHES: Unhandled processor error type 0x1c: TLB error|bus error|micro-architectural error
[   53.257203] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[   53.257543] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[   53.257842] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error

- 

I also tested the ghes and cper reports both with and without this
change, using different versions of rasdaemon, with and without
support for the extended trace event. Here is a summary of the
test results:

- adding more fields to the trace events didn't break the userspace API:
  both versions of rasdaemon handled it;

- the rasdaemon patches to handle the new trace report were missing
  backward-compatibility logic. I have already fixed that, so rasdaemon
  can now handle both old and new trace events.

Btw, rasdaemon has had support for the extended trace since
version 0.5.8 (released in 2021). I didn't see any issues reported there
complaining about it, so either distros used on ARM servers
are using an old version of rasdaemon, or they're carrying the trace
event changes as well.

Daniel Ferguson (1):
  RAS: ACPI: APEI: add conditional compilation to ARM error report
    functions

Mauro Carvalho Chehab (4):
  efi/cper: Adjust infopfx size to accept an extra space
  efi/cper: Add a new helper function to print bitmasks
  efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs
  docs: efi: add CPER functions to driver-api

Shengwei Luo (1):
  RAS: Report all ARM processor CPER information to userspace

 .../driver-api/firmware/efi/index.rst         | 11 ++--
 drivers/acpi/apei/ghes.c                      | 31 +++++------
 drivers/firmware/efi/cper-arm.c               | 52 +++++++++----------
 drivers/firmware/efi/cper.c                   | 43 ++++++++++++++-
 drivers/ras/ras.c                             | 47 ++++++++++++++++-
 include/linux/cper.h                          | 12 +++--
 include/linux/ras.h                           | 16 ++++--
 include/ras/ras_event.h                       | 48 +++++++++++++++--
 8 files changed, 198 insertions(+), 62 deletions(-)

-- 
2.45.2




* [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions
  2024-07-08 11:18 [PATCH 0/6] Fix issues with ARM Processor CPER records Mauro Carvalho Chehab
@ 2024-07-08 11:18 ` Mauro Carvalho Chehab
  2024-07-08 11:32   ` Borislav Petkov
  2024-07-08 11:18 ` [PATCH 2/6] RAS: Report all ARM processor CPER information to userspace Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Daniel Ferguson, Ard Biesheuvel, James Morse, Jonathan Cameron,
	Len Brown, Rafael J. Wysocki, Shiju Jose, Uwe Kleine-König,
	Dan Williams, Dave Jiang, Ira Weiny, Shuai Xue, linux-acpi,
	linux-edac, linux-efi, linux-kernel, Mauro Carvalho Chehab

From: Daniel Ferguson <danielf@os.amperecomputing.com>

This prevents the unnecessary inclusion of ARM-specific RAS error
handling routines on non-ARM platforms.
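
As a rough illustration of the pattern (not the code in this patch), the
function keeps its signature on every platform while its body is compiled
only on ARM builds. SKETCH_ARM_BUILD is a hypothetical stand-in for
CONFIG_ARM / CONFIG_ARM64, and handle_arm_hw_error_sketch() is a made-up
name.

```c
#include <stdbool.h>

/*
 * SKETCH_ARM_BUILD is a hypothetical macro standing in for
 * CONFIG_ARM / CONFIG_ARM64 in this userspace sketch.
 */
static bool handle_arm_hw_error_sketch(int sev)
{
	bool queued = false;
#if defined(SKETCH_ARM_BUILD)
	/* ARM-only handling would go here; it is compiled out elsewhere. */
	if (sev > 0)
		queued = true;
#else
	(void)sev;	/* unused on non-ARM builds */
#endif
	return queued;
}
```

Callers need no ifdefs of their own: on a non-ARM build the function simply
reports that nothing was queued.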

[mchehab: avoid unneeded ifdefs and fix coding style issues]
Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/acpi/apei/ghes.c | 13 ++++++-------
 drivers/ras/ras.c        |  2 ++
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 623cc0cb4a65..2589a3536d91 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -529,11 +529,12 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 }
 
 static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
-				       int sev, bool sync)
+				     int sev, bool sync)
 {
+	bool queued = false;
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
-	bool queued = false;
 	int sec_sev, i;
 	char *p;
 
@@ -570,7 +571,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 				    error_type);
 		p += err_info->length;
 	}
-
+#endif
 	return queued;
 }
 
@@ -773,11 +774,9 @@ static bool ghes_do_proc(struct ghes *ghes,
 
 			arch_apei_report_mem_error(sev, mem_err);
 			queued = ghes_handle_memory_failure(gdata, sev, sync);
-		}
-		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+		} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
 			ghes_handle_aer(gdata);
-		}
-		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
+		} else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			queued = ghes_handle_arm_hw_error(gdata, sev, sync);
 		} else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
 			struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index a6e4792a1b2e..5d94ab79c8c3 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -54,7 +54,9 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
 
 void log_arm_hw_error(struct cper_sec_proc_arm *err)
 {
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
 	trace_arm_event(err);
+#endif
 }
 
 static int __init ras_init(void)
-- 
2.45.2



* [PATCH 2/6] RAS: Report all ARM processor CPER information to userspace
  2024-07-08 11:18 [PATCH 0/6] Fix issues with ARM Processor CPER records Mauro Carvalho Chehab
  2024-07-08 11:18 ` [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions Mauro Carvalho Chehab
@ 2024-07-08 11:18 ` Mauro Carvalho Chehab
  2024-07-08 15:34   ` Jonathan Cameron
  2024-07-08 11:18 ` [PATCH 3/6] efi/cper: Adjust infopfx size to accept an extra space Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Shengwei Luo, Ard Biesheuvel, James Morse, Jonathan Cameron,
	Len Brown, Rafael J. Wysocki, Shiju Jose, Uwe Kleine-König,
	Dan Williams, Dave Jiang, Ira Weiny, Shuai Xue, Steven Rostedt,
	Tyler Baicar, Will Deacon, Xie XiuQi, linux-acpi, linux-edac,
	linux-efi, linux-kernel, Jason Tian, Daniel Ferguson,
	Mauro Carvalho Chehab

From: Shengwei Luo <luoshengwei@huawei.com>

The ARM processor CPER record was added in UEFI 2.6, and its struct
hasn't changed up to UEFI 2.10.

Yet, the original arm_event trace code added in changeset
e9279e83ad1f ("trace, ras: add ARM processor error trace event") is
incomplete, as it only traces some fields of UEFI 2.6 table N.16,
not exporting any information at all from tables N.17 to N.29 of
the record.

This is not enough for the user to take appropriate action or to log
what exactly happened.

According to the UEFI 2.9 specification, section N.2.4.4, the ARM processor
error section includes:

- several (ERR_INFO_NUM) ARM processor error information structures
  (Tables N.17 to N.20);
- several (CONTEXT_INFO_NUM) ARM processor context information
  structures (Tables N.21 to N.29);
- several vendor specific error information structures. The
  size is given by Section Length minus the size of the other
  fields.
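
The size arithmetic in the list above can be sketched as follows. This is a
standalone illustration, not the code in this patch: the three size
constants are assumptions about the fixed-size parts of the record, and
vendor_info_len() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, in bytes, of the fixed-size parts of the record. */
#define ARM_SEC_HDR_SIZE   40u	/* fixed part of the ARM processor error section */
#define ARM_ERR_INFO_SIZE  32u	/* one processor error information structure */
#define ARM_CTX_HDR_SIZE    8u	/* header of one context information structure */

/*
 * Return the length of the vendor-specific area, or -1 if the section
 * length is too small to hold the structures it claims to contain.
 * ctx_reg_sizes[i] is the register array size of context structure i.
 */
static int64_t vendor_info_len(uint32_t section_length, uint16_t err_info_num,
			       const uint32_t *ctx_reg_sizes,
			       uint16_t ctx_info_num)
{
	uint64_t used = ARM_SEC_HDR_SIZE;
	uint16_t i;

	used += (uint64_t)ARM_ERR_INFO_SIZE * err_info_num;
	for (i = 0; i < ctx_info_num; i++)
		used += ARM_CTX_HDR_SIZE + (uint64_t)ctx_reg_sizes[i];

	if (used > section_length)
		return -1;	/* firmware-provided length is inconsistent */
	return (int64_t)(section_length - used);
}
```

A negative result corresponds to the case where the firmware-generated
record is inconsistent and the vendor-specific area should be treated as
empty.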

In addition to that data, it also exports two fields that are
parsed by the GHES driver when firmware reports them, namely:

- error severity
- cpu logical index

Report all of this information to userspace via the trace uAPI, so that
userspace can properly record the error and take decisions related
to cpu core isolation according to error severity and other info.

After this patch, all the data from the ARM Processor record in table
N.16 is directly or indirectly visible in userspace:

======================================	=============================
UEFI field on table N.16		ARM Processor trace fields
======================================	=============================
Validation				handled when filling data for
					affinity MPIDR and running
					state.
ERR_INFO_NUM				pei_len
CONTEXT_INFO_NUM			ctx_len
Section Length				indirectly reported by
					pei_len, ctx_len and oem_len
Error affinity level			affinity
MPIDR_EL1				mpidr
MIDR_EL1				midr
Running State				running_state
PSCI State				psci_state
Processor Error Information Structure	pei_err - count at pei_len
Processor Context			ctx_err - count at ctx_len
Vendor Specific Error Info		oem - count at oem_len
======================================	=============================

Note that decoding of tables N.17 to N.29, if needed, will be handled
in userspace. That gives more flexibility, as there won't be any need
to flood the kernel with micro-architecture-specific error decoding.
Also, decoding the other fields requires complex logic, and should
be done for each of the several values inside the record field.
So, let userspace daemons like rasdaemon decode them, parsing such
tables and having vendor-specific, micro-architecture-specific decoders.
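
As a hedged sketch of what such userspace parsing could look like (this is
not rasdaemon code), the raw processor error information buffer can be
walked entry by entry using each entry's own length field. The byte offsets
below are assumptions about the layout of one error information structure,
and walk_err_info() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed byte offsets inside one error information structure. */
#define ERR_INFO_LENGTH_OFF 1	/* 1-byte length of this structure */
#define ERR_INFO_TYPE_OFF   4	/* 1-byte error type */
#define ERR_INFO_MIN_SIZE   5	/* enough bytes to read both fields */

/*
 * Walk the raw buffer and OR together the type fields of all entries.
 * Returns the number of entries visited; stops at a malformed entry.
 */
static size_t walk_err_info(const uint8_t *buf, size_t len, uint8_t *types_seen)
{
	size_t off = 0, count = 0;

	*types_seen = 0;
	while (len - off >= ERR_INFO_MIN_SIZE) {
		uint8_t entry_len = buf[off + ERR_INFO_LENGTH_OFF];

		/* Reject entries that are too short or run past the buffer. */
		if (entry_len < ERR_INFO_MIN_SIZE || entry_len > len - off)
			break;

		*types_seen |= buf[off + ERR_INFO_TYPE_OFF];
		off += entry_len;
		count++;
	}
	return count;
}
```

Stepping by the per-entry length rather than a fixed structure size keeps
the walk bounded even if the buffer is shorter than expected.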

[mchehab: modified patch description and fix coding style]
Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
---
 drivers/acpi/apei/ghes.c |  3 +--
 drivers/ras/ras.c        | 45 +++++++++++++++++++++++++++++++++++--
 include/linux/ras.h      | 16 ++++++++++----
 include/ras/ras_event.h  | 48 +++++++++++++++++++++++++++++++++++-----
 4 files changed, 99 insertions(+), 13 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 2589a3536d91..90efca025d27 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -538,9 +538,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 	int sec_sev, i;
 	char *p;
 
-	log_arm_hw_error(err);
-
 	sec_sev = ghes_severity(gdata->error_severity);
+	log_arm_hw_error(err, sec_sev);
 	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
 		return false;
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 5d94ab79c8c3..75acc09bc96a 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -52,10 +52,51 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
 	trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
 }
 
-void log_arm_hw_error(struct cper_sec_proc_arm *err)
+void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 {
 #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
-	trace_arm_event(err);
+	struct cper_arm_err_info *err_info;
+	struct cper_arm_ctx_info *ctx_info;
+	u8 *ven_err_data;
+	u32 ctx_len = 0;
+	int n, sz, cpu;
+	s32 vsei_len;
+	u32 pei_len;
+	u8 *pei_err;
+	u8 *ctx_err;
+
+	pei_len = sizeof(struct cper_arm_err_info) * err->err_info_num;
+	pei_err = (u8 *)err + sizeof(struct cper_sec_proc_arm);
+
+	err_info = (struct cper_arm_err_info *)(err + 1);
+	ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
+	ctx_err = (u8 *)ctx_info;
+	for (n = 0; n < err->context_info_num; n++) {
+		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
+		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
+		ctx_len += sz;
+	}
+
+	vsei_len = err->section_length - (sizeof(struct cper_sec_proc_arm) +
+					  pei_len + ctx_len);
+	if (vsei_len < 0) {
+		pr_warn(FW_BUG
+			"section length: %d\n", err->section_length);
+		pr_warn(FW_BUG
+			"section length is too small\n");
+		pr_warn(FW_BUG
+			"firmware-generated error record is incorrect\n");
+		vsei_len = 0;
+	}
+	ven_err_data = (u8 *)ctx_info;
+
+	cpu = GET_LOGICAL_INDEX(err->mpidr);
+	/* when return value is invalid, set cpu index to -1 */
+	if (cpu < 0)
+		cpu = -1;
+
+	trace_arm_event(err, pei_err, pei_len, ctx_err, ctx_len,
+			ven_err_data, (u32)vsei_len, sev, cpu);
 #endif
 }
 
diff --git a/include/linux/ras.h b/include/linux/ras.h
index a64182bc72ad..6025afe5736a 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -24,8 +24,7 @@ int __init parse_cec_param(char *str);
 void log_non_standard_event(const guid_t *sec_type,
 			    const guid_t *fru_id, const char *fru_text,
 			    const u8 sev, const u8 *err, const u32 len);
-void log_arm_hw_error(struct cper_sec_proc_arm *err);
-
+void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev);
 #else
 static inline void
 log_non_standard_event(const guid_t *sec_type,
@@ -33,7 +32,7 @@ log_non_standard_event(const guid_t *sec_type,
 		       const u8 sev, const u8 *err, const u32 len)
 { return; }
 static inline void
-log_arm_hw_error(struct cper_sec_proc_arm *err) { return; }
+log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev) { return; }
 #endif
 
 struct atl_err {
@@ -52,5 +51,14 @@ static inline void amd_retire_dram_row(struct atl_err *err) { }
 static inline unsigned long
 amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
 #endif /* CONFIG_AMD_ATL */
-
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+#include <asm/smp_plat.h>
+/*
+ * Include ARM specific SMP header which provides a function mapping mpidr to
+ * cpu logical index.
+ */
+#define GET_LOGICAL_INDEX(mpidr) get_logical_index(mpidr & MPIDR_HWID_BITMASK)
+#else
+#define GET_LOGICAL_INDEX(mpidr) -EINVAL
+#endif /* CONFIG_ARM || CONFIG_ARM64 */
 #endif /* __RAS_H__ */
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 7c47151d5c72..ce5214f008eb 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -168,11 +168,24 @@ TRACE_EVENT(mc_event,
  * This event is generated when hardware detects an ARM processor error
  * has occurred. UEFI 2.6 spec section N.2.4.4.
  */
+#define APEIL "ARM Processor Err Info data len"
+#define APEID "ARM Processor Err Info raw data"
+#define APECIL "ARM Processor Err Context Info data len"
+#define APECID "ARM Processor Err Context Info raw data"
+#define VSEIL "Vendor Specific Err Info data len"
+#define VSEID "Vendor Specific Err Info raw data"
 TRACE_EVENT(arm_event,
 
-	TP_PROTO(const struct cper_sec_proc_arm *proc),
+	TP_PROTO(const struct cper_sec_proc_arm *proc, const u8 *pei_err,
+			const u32 pei_len,
+			const u8 *ctx_err,
+			const u32 ctx_len,
+			const u8 *oem,
+			const u32 oem_len,
+			u8 sev,
+			int cpu),
 
-	TP_ARGS(proc),
+	TP_ARGS(proc, pei_err, pei_len, ctx_err, ctx_len, oem, oem_len, sev, cpu),
 
 	TP_STRUCT__entry(
 		__field(u64, mpidr)
@@ -180,6 +193,14 @@ TRACE_EVENT(arm_event,
 		__field(u32, running_state)
 		__field(u32, psci_state)
 		__field(u8, affinity)
+		__field(u32, pei_len)
+		__dynamic_array(u8, buf, pei_len)
+		__field(u32, ctx_len)
+		__dynamic_array(u8, buf1, ctx_len)
+		__field(u32, oem_len)
+		__dynamic_array(u8, buf2, oem_len)
+		__field(u8, sev)
+		__field(int, cpu)
 	),
 
 	TP_fast_assign(
@@ -199,12 +220,29 @@ TRACE_EVENT(arm_event,
 			__entry->running_state = ~0;
 			__entry->psci_state = ~0;
 		}
+		__entry->pei_len = pei_len;
+		memcpy(__get_dynamic_array(buf), pei_err, pei_len);
+		__entry->ctx_len = ctx_len;
+		memcpy(__get_dynamic_array(buf1), ctx_err, ctx_len);
+		__entry->oem_len = oem_len;
+		memcpy(__get_dynamic_array(buf2), oem, oem_len);
+		__entry->sev = sev;
+		__entry->cpu = cpu;
 	),
 
-	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
-		  "running state: %d; PSCI state: %d",
+	TP_printk("cpu: %d; error: %d; affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
+		  "running state: %d; PSCI state: %d; "
+		  "%s: %d; %s: %s; %s: %d; %s: %s; %s: %d; %s: %s",
+		  __entry->cpu,
+		  __entry->sev,
 		  __entry->affinity, __entry->mpidr, __entry->midr,
-		  __entry->running_state, __entry->psci_state)
+		  __entry->running_state, __entry->psci_state,
+		  APEIL, __entry->pei_len, APEID,
+		  __print_hex(__get_dynamic_array(buf), __entry->pei_len),
+		  APECIL, __entry->ctx_len, APECID,
+		  __print_hex(__get_dynamic_array(buf1), __entry->ctx_len),
+		  VSEIL, __entry->oem_len, VSEID,
+		  __print_hex(__get_dynamic_array(buf2), __entry->oem_len))
 );
 
 /*
-- 
2.45.2



* [PATCH 3/6] efi/cper: Adjust infopfx size to accept an extra space
  2024-07-08 11:18 [PATCH 0/6] Fix issues with ARM Processor CPER records Mauro Carvalho Chehab
  2024-07-08 11:18 ` [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions Mauro Carvalho Chehab
  2024-07-08 11:18 ` [PATCH 2/6] RAS: Report all ARM processor CPER information to userspace Mauro Carvalho Chehab
@ 2024-07-08 11:18 ` Mauro Carvalho Chehab
  2024-07-08 11:18 ` [PATCH 4/6] efi/cper: Add a new helper function to print bitmasks Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	linux-acpi, linux-edac, linux-efi, linux-kernel

Compiling with W=1 and -Werror enabled produces an error:

drivers/firmware/efi/cper-arm.c: In function ‘cper_print_proc_arm’:
drivers/firmware/efi/cper-arm.c:298:64: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
  298 |                         snprintf(infopfx, sizeof(infopfx), "%s ", newpfx);
      |                                                                ^
drivers/firmware/efi/cper-arm.c:298:25: note: ‘snprintf’ output between 2 and 65 bytes into a destination of size 64
  298 |                         snprintf(infopfx, sizeof(infopfx), "%s ", newpfx);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The logic there adds a space at the end of the infopfx buffer, so it needs
one byte more than newpfx. Enlarge infopfx by one byte to avoid this warning.
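
The "between 2 and 65 bytes" figure in the compiler note can be reproduced
with a small standalone sketch. The helper names are hypothetical; only
snprintf() and the 64-byte prefix size come from the code being fixed.

```c
#include <stdio.h>
#include <string.h>

/* Bytes needed, including the NUL, to hold prefix followed by one space. */
static int bytes_needed_with_space(const char *prefix)
{
	/* With a NULL buffer and size 0, snprintf only reports the length. */
	return snprintf(NULL, 0, "%s ", prefix) + 1;
}

/* Worst case: a 64-byte prefix buffer holding 63 characters plus the NUL. */
static int worst_case_bytes(void)
{
	char newpfx[64];

	memset(newpfx, 'x', sizeof(newpfx) - 1);
	newpfx[sizeof(newpfx) - 1] = '\0';
	return bytes_needed_with_space(newpfx);
}
```

An empty prefix needs 2 bytes (the space and the NUL), while a full 64-byte
prefix needs 65, one more than a 64-byte destination can hold.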

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/firmware/efi/cper-arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index fa9c1c3bf168..eb7ee6af55f2 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -240,7 +240,7 @@ void cper_print_proc_arm(const char *pfx,
 	int i, len, max_ctx_type;
 	struct cper_arm_err_info *err_info;
 	struct cper_arm_ctx_info *ctx_info;
-	char newpfx[64], infopfx[64];
+	char newpfx[64], infopfx[ARRAY_SIZE(newpfx) + 1];
 
 	printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
 
-- 
2.45.2



* [PATCH 4/6] efi/cper: Add a new helper function to print bitmasks
  2024-07-08 11:18 [PATCH 0/6] Fix issues with ARM Processor CPER records Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2024-07-08 11:18 ` [PATCH 3/6] efi/cper: Adjust infopfx size to accept an extra space Mauro Carvalho Chehab
@ 2024-07-08 11:18 ` Mauro Carvalho Chehab
  2024-07-08 15:45   ` Jonathan Cameron
  2024-07-08 11:18 ` [PATCH 5/6] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Mauro Carvalho Chehab
  2024-07-08 11:18 ` [PATCH 6/6] docs: efi: add CPER functions to driver-api Mauro Carvalho Chehab
  5 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Alison Schofield, Ira Weiny, linux-acpi, linux-edac, linux-efi,
	linux-kernel

Sometimes it is desirable to produce a single log line for errors.
Add a new helper function for that purpose.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/firmware/efi/cper.c | 43 +++++++++++++++++++++++++++++++++++++
 include/linux/cper.h        |  2 ++
 2 files changed, 45 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 7d2cdd9e2227..f8c8a15cd527 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -106,6 +106,49 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 		printk("%s\n", buf);
 }
 
+/*
+ * cper_bits_to_str - return a string for set bits
+ * @buf: buffer to store the output string
+ * @buf_size: size of the output string buffer
+ * @bits: bit mask
+ * @strs: string array, indexed by bit position
+ * @strs_size: size of the string array: @strs
+ * @mask: a continuous bitmask used to detect the first valid bit of the
+ *        bitmap.
+ *
+ * Add to @buf the bitmask in hexadecimal. Then, for each set bit in @bits
+ * mask, add the corresponding string describing the bit in @strs to @buf.
+ */
+char *cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
+		       const char * const strs[], unsigned int strs_size)
+{
+	int len = buf_size;
+	char *str = buf;
+	int i, size;
+
+	*buf = '\0';
+
+	for_each_set_bit(i, &bits, strs_size) {
+		if (!(bits & (1U << (i))))
+			continue;
+
+		if (*buf && len > 0) {
+			*str = '|';
+			len--;
+			str++;
+		}
+
+		size = strscpy(str, strs[i], len);
+		if (size < 0)
+			break;
+
+		len -= size;
+		str += size;
+	}
+	return buf;
+}
+EXPORT_SYMBOL_GPL(cper_bits_to_str);
+
 static const char * const proc_type_strs[] = {
 	"IA32/X64",
 	"IA64",
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 265b0f8fc0b3..c2f14b916bfb 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -584,6 +584,8 @@ const char *cper_mem_err_type_str(unsigned int);
 const char *cper_mem_err_status_str(u64 status);
 void cper_print_bits(const char *prefix, unsigned int bits,
 		     const char * const strs[], unsigned int strs_size);
+char *cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
+		       const char * const strs[], unsigned int strs_size);
 void cper_mem_err_pack(const struct cper_sec_mem_err *,
 		       struct cper_mem_err_compact *);
 const char *cper_mem_err_unpack(struct trace_seq *,
-- 
2.45.2
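
The helper's idea can be sketched in plain userspace C. This is only an
illustration under stated assumptions: bits_to_str() is a hypothetical
stand-in that uses memcpy() and a manual loop instead of the kernel's
strscpy() and for_each_set_bit(), and it is not the function from the patch.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch: write the names of the bits set in @bits into @buf,
 * separated by '|'. The output is truncated if @buf is too small and is
 * always NUL-terminated when @buf_size is non-zero.
 */
static char *bits_to_str(char *buf, size_t buf_size, unsigned long bits,
			 const char *const strs[], size_t strs_size)
{
	size_t used = 0;
	size_t i;

	if (buf_size == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < strs_size && i < sizeof(bits) * CHAR_BIT; i++) {
		size_t n;

		if (!(bits & (1UL << i)))
			continue;

		/* A separator goes before every name except the first. */
		if (used > 0 && used + 1 < buf_size)
			buf[used++] = '|';

		n = strlen(strs[i]);
		if (n > buf_size - 1 - used)
			n = buf_size - 1 - used;	/* truncate to fit */
		memcpy(buf + used, strs[i], n);
		used += n;
		buf[used] = '\0';

		if (used == buf_size - 1)
			break;				/* buffer is full */
	}
	return buf;
}
```

For example, with a string table whose entries 0 and 2 are "cache error"
and "bus error", a bits value of 0x5 yields "cache error|bus error".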



* [PATCH 5/6] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs
  2024-07-08 11:18 [PATCH 0/6] Fix issues with ARM Processor CPER records Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2024-07-08 11:18 ` [PATCH 4/6] efi/cper: Add a new helper function to print bitmasks Mauro Carvalho Chehab
@ 2024-07-08 11:18 ` Mauro Carvalho Chehab
  2024-07-08 15:50   ` Jonathan Cameron
  2024-07-08 11:18 ` [PATCH 6/6] docs: efi: add CPER functions to driver-api Mauro Carvalho Chehab
  5 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Uwe Kleine-König, Alison Schofield, Dan Williams, Dave Jiang,
	Ira Weiny, Shuai Xue, linux-acpi, linux-edac, linux-efi,
	linux-kernel

Before the UEFI 2.9A errata, the type byte of the CPER struct for ARM
processors was defined simply as:

Type at byte offset 4:

	- Cache error
	- TLB Error
	- Bus Error
	- Micro-architectural Error
	All other values are reserved

Yet, there was no information about how this would be encoded.

Spec 2.9A errata corrected it by defining:

	- Bit 1 - Cache Error
	- Bit 2 - TLB Error
	- Bit 3 - Bus Error
	- Bit 4 - Micro-architectural Error
	All other values are reserved

That aligns with the values already defined in older versions of the
spec at N.2.4.1, Generic Processor Error Section.

Spec 2.10 preserves the same encoding as 2.9A.

Adjust CPER and GHES handling code for both generic and ARM
processors to properly handle UEFI 2.9A and 2.10 encoding.

Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 drivers/acpi/apei/ghes.c        | 15 ++++++----
 drivers/firmware/efi/cper-arm.c | 50 ++++++++++++++++-----------------
 include/linux/cper.h            | 10 +++----
 3 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 90efca025d27..e796140d93f0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -535,6 +535,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 #if defined(CONFIG_ARM) || defined (CONFIG_ARM64)
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
+	char error_type[120];
 	int sec_sev, i;
 	char *p;
 
@@ -546,9 +547,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 	p = (char *)(err + 1);
 	for (i = 0; i < err->err_info_num; i++) {
 		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
-		bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR);
+		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
 		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
-		const char *error_type = "unknown error";
 
 		/*
 		 * The field (err_info->error_info & BIT(26)) is fixed to set to
@@ -562,12 +562,15 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 			continue;
 		}
 
-		if (err_info->type < ARRAY_SIZE(cper_proc_error_type_strs))
-			error_type = cper_proc_error_type_strs[err_info->type];
+		cper_bits_to_str(error_type, sizeof(error_type),
+				 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
+				 cper_proc_error_type_strs,
+				 ARRAY_SIZE(cper_proc_error_type_strs));
 
 		pr_warn_ratelimited(FW_WARN GHES_PFX
-				    "Unhandled processor error type: %s\n",
-				    error_type);
+				    "Unhandled processor error type 0x%02x: %s%s\n",
+				    err_info->type, error_type,
+				    (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
 		p += err_info->length;
 	}
 #endif
diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index eb7ee6af55f2..52d18490b59e 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -93,15 +93,11 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 	bool proc_context_corrupt, corrected, precise_pc, restartable_pc;
 	bool time_out, access_mode;
 
-	/* If the type is unknown, bail. */
-	if (type > CPER_ARM_MAX_TYPE)
-		return;
-
 	/*
 	 * Vendor type errors have error information values that are vendor
 	 * specific.
 	 */
-	if (type == CPER_ARM_VENDOR_ERROR)
+	if (type & CPER_ARM_VENDOR_ERROR)
 		return;
 
 	if (error_info & CPER_ARM_ERR_VALID_TRANSACTION_TYPE) {
@@ -116,43 +112,38 @@ static void cper_print_arm_err_info(const char *pfx, u32 type,
 	if (error_info & CPER_ARM_ERR_VALID_OPERATION_TYPE) {
 		op_type = ((error_info >> CPER_ARM_ERR_OPERATION_SHIFT)
 			   & CPER_ARM_ERR_OPERATION_MASK);
-		switch (type) {
-		case CPER_ARM_CACHE_ERROR:
+		if (type & CPER_ARM_CACHE_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_cache_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%scache error, operation type: %s\n", pfx,
 				       arm_cache_err_op_strs[op_type]);
 			}
-			break;
-		case CPER_ARM_TLB_ERROR:
+		}
+		if (type & CPER_ARM_TLB_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_tlb_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%sTLB error, operation type: %s\n", pfx,
 				       arm_tlb_err_op_strs[op_type]);
 			}
-			break;
-		case CPER_ARM_BUS_ERROR:
+		}
+		if (type & CPER_ARM_BUS_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_bus_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%sbus error, operation type: %s\n", pfx,
 				       arm_bus_err_op_strs[op_type]);
 			}
-			break;
 		}
 	}
 
 	if (error_info & CPER_ARM_ERR_VALID_LEVEL) {
 		level = ((error_info >> CPER_ARM_ERR_LEVEL_SHIFT)
 			 & CPER_ARM_ERR_LEVEL_MASK);
-		switch (type) {
-		case CPER_ARM_CACHE_ERROR:
+		if (type & CPER_ARM_CACHE_ERROR)
 			printk("%scache level: %d\n", pfx, level);
-			break;
-		case CPER_ARM_TLB_ERROR:
+
+		if (type & CPER_ARM_TLB_ERROR)
 			printk("%sTLB level: %d\n", pfx, level);
-			break;
-		case CPER_ARM_BUS_ERROR:
+
+		if (type & CPER_ARM_BUS_ERROR)
 			printk("%saffinity level at which the bus error occurred: %d\n",
 			       pfx, level);
-			break;
-		}
 	}
 
 	if (error_info & CPER_ARM_ERR_VALID_PROC_CONTEXT_CORRUPT) {
@@ -241,6 +232,7 @@ void cper_print_proc_arm(const char *pfx,
 	struct cper_arm_err_info *err_info;
 	struct cper_arm_ctx_info *ctx_info;
 	char newpfx[64], infopfx[ARRAY_SIZE(newpfx) + 1];
+	char error_type[120];
 
 	printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
 
@@ -289,9 +281,15 @@ void cper_print_proc_arm(const char *pfx,
 				       newpfx);
 		}
 
-		printk("%serror_type: %d, %s\n", newpfx, err_info->type,
-			err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ?
-			cper_proc_error_type_strs[err_info->type] : "unknown");
+		cper_bits_to_str(error_type, sizeof(error_type),
+				 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
+				 cper_proc_error_type_strs,
+				 ARRAY_SIZE(cper_proc_error_type_strs));
+
+		printk("%serror_type: 0x%02x: %s%s\n", newpfx, err_info->type,
+		       error_type,
+		       (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
+
 		if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) {
 			printk("%serror_info: 0x%016llx\n", newpfx,
 			       err_info->error_info);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index c2f14b916bfb..fc62a80575e8 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -293,11 +293,11 @@ enum {
 #define CPER_ARM_INFO_FLAGS_PROPAGATED		BIT(2)
 #define CPER_ARM_INFO_FLAGS_OVERFLOW		BIT(3)
 
-#define CPER_ARM_CACHE_ERROR			0
-#define CPER_ARM_TLB_ERROR			1
-#define CPER_ARM_BUS_ERROR			2
-#define CPER_ARM_VENDOR_ERROR			3
-#define CPER_ARM_MAX_TYPE			CPER_ARM_VENDOR_ERROR
+#define CPER_ARM_ERR_TYPE_MASK			GENMASK(4,1)
+#define CPER_ARM_CACHE_ERROR			BIT(1)
+#define CPER_ARM_TLB_ERROR			BIT(2)
+#define CPER_ARM_BUS_ERROR			BIT(3)
+#define CPER_ARM_VENDOR_ERROR			BIT(4)
 
 #define CPER_ARM_ERR_VALID_TRANSACTION_TYPE	BIT(0)
 #define CPER_ARM_ERR_VALID_OPERATION_TYPE	BIT(1)
-- 
2.45.2
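
The bitmask encoding described in the commit message can be illustrated
with a small userspace sketch. The macro and struct names below are
hypothetical (they mirror, but are not, the kernel's CPER_ARM_* defines);
only the bit positions are taken from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the commit message (UEFI 2.9A / 2.10). */
#define ARM_CACHE_ERROR		(1u << 1)
#define ARM_TLB_ERROR		(1u << 2)
#define ARM_BUS_ERROR		(1u << 3)
#define ARM_VENDOR_ERROR	(1u << 4)	/* micro-architectural error */
#define ARM_ERR_TYPE_MASK	(ARM_CACHE_ERROR | ARM_TLB_ERROR | \
				 ARM_BUS_ERROR | ARM_VENDOR_ERROR)

/* Hypothetical result type, for illustration only. */
struct arm_err_type {
	bool cache, tlb, bus, vendor;
	bool reserved;	/* a bit outside ARM_ERR_TYPE_MASK is set */
};

/* Several error kinds may be reported at once, so test each bit. */
static struct arm_err_type decode_arm_err_type(uint8_t type)
{
	struct arm_err_type t = {
		.cache    = (type & ARM_CACHE_ERROR) != 0,
		.tlb      = (type & ARM_TLB_ERROR) != 0,
		.bus      = (type & ARM_BUS_ERROR) != 0,
		.vendor   = (type & ARM_VENDOR_ERROR) != 0,
		.reserved = (type & ~ARM_ERR_TYPE_MASK) != 0,
	};

	return t;
}
```

Because the field is a bitmask, a type of 0x06 reports both a cache and a
TLB error, which is why the patch replaces the switch statements with
independent if tests. FIELD_GET() in the patch additionally shifts the
masked value down, so that bit 1 lands at index 0 of the string table.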



* [PATCH 6/6] docs: efi: add CPER functions to driver-api
  2024-07-08 11:18 [PATCH 0/6] Fix issues with ARM Processor CPER records Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2024-07-08 11:18 ` [PATCH 5/6] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Mauro Carvalho Chehab
@ 2024-07-08 11:18 ` Mauro Carvalho Chehab
  2024-07-08 15:47   ` Jonathan Cameron
  5 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 11:18 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Mauro Carvalho Chehab, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Jonathan Corbet, linux-acpi, linux-doc, linux-edac, linux-efi,
	linux-kernel

There are two functions in cper.c that are used by other parts of cper
and by the ghes driver. Both have kernel-doc-like descriptions.

Change their comment markers to actual kernel-doc markers and add them
to the driver-api documentation in the UEFI section.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 Documentation/driver-api/firmware/efi/index.rst | 11 ++++++++---
 drivers/firmware/efi/cper.c                     | 10 ++++------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/Documentation/driver-api/firmware/efi/index.rst b/Documentation/driver-api/firmware/efi/index.rst
index 4fe8abba9fc6..5a6b6229592c 100644
--- a/Documentation/driver-api/firmware/efi/index.rst
+++ b/Documentation/driver-api/firmware/efi/index.rst
@@ -1,11 +1,16 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-============
-UEFI Support
-============
+====================================================
+Unified Extensible Firmware Interface (UEFI) Support
+====================================================
 
 UEFI stub library functions
 ===========================
 
 .. kernel-doc:: drivers/firmware/efi/libstub/mem.c
    :internal:
+
+UEFI Common Platform Error Record (CPER) functions
+==================================================
+
+.. kernel-doc:: drivers/firmware/efi/cper.c
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index f8c8a15cd527..2785c8ea8ad8 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -69,7 +69,7 @@ const char *cper_severity_str(unsigned int severity)
 }
 EXPORT_SYMBOL_GPL(cper_severity_str);
 
-/*
+/**
  * cper_print_bits - print strings for set bits
  * @pfx: prefix for each line, including log level and prefix string
  * @bits: bit mask
@@ -106,18 +106,16 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 		printk("%s\n", buf);
 }
 
-/*
+/**
  * cper_bits_to_str - return a string for set bits
  * @buf: buffer to store the output string
  * @buf_size: size of the output string buffer
  * @bits: bit mask
  * @strs: string array, indexed by bit position
  * @strs_size: size of the string array: @strs
- * @mask: a continuous bitmask used to detect the first valid bit of the
- *        bitmap.
  *
- * Add to @buf the bitmask in hexadecimal. Then, for each set bit in @bits
- * mask, add the corresponding string describing the bit in @strs to @buf.
+ * Add to @buf the bitmask in hexadecimal. Then, for each set bit in @bits,
+ * add the corresponding string describing the bit in @strs to @buf.
  */
 char *cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
 		       const char * const strs[], unsigned int strs_size)
-- 
2.45.2



* Re: [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions
  2024-07-08 11:18 ` [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions Mauro Carvalho Chehab
@ 2024-07-08 11:32   ` Borislav Petkov
  2024-07-08 12:10     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2024-07-08 11:32 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Tony Luck, Daniel Ferguson, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Uwe Kleine-König, Dan Williams, Dave Jiang, Ira Weiny,
	Shuai Xue, linux-acpi, linux-edac, linux-efi, linux-kernel

On Mon, Jul 08, 2024 at 01:18:10PM +0200, Mauro Carvalho Chehab wrote:
> From: Daniel Ferguson <danielf@os.amperecomputing.com>
> 
> This prevents the unnecessary inclusion of ARM specific RAS error

s/This prevents/Prevent/

Avoid having "This patch" or "This commit" or "This does <bla>" in the commit
message. It is tautologically useless.

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

From Documentation/process/submitting-patches.rst

> handling routines in non-ARM platforms.

Ok, this does "something". Why does it do it?

Otherwise it won't build on other architectures or is it going to cause code
bloat or why are we doing this?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions
  2024-07-08 11:32   ` Borislav Petkov
@ 2024-07-08 12:10     ` Mauro Carvalho Chehab
  2024-07-08 14:43       ` Borislav Petkov
  2024-07-08 14:55       ` Jonathan Cameron
  0 siblings, 2 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-08 12:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, Daniel Ferguson, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Uwe Kleine-König, Dan Williams, Dave Jiang, Ira Weiny,
	Shuai Xue, linux-acpi, linux-edac, linux-efi, linux-kernel

Em Mon, 8 Jul 2024 13:32:34 +0200
Borislav Petkov <bp@alien8.de> escreveu:

> On Mon, Jul 08, 2024 at 01:18:10PM +0200, Mauro Carvalho Chehab wrote:
> > From: Daniel Ferguson <danielf@os.amperecomputing.com>
> > 
> > This prevents the unnecessary inclusion of ARM specific RAS error  
> 
> s/This prevents/Prevent/
> 
> Avoid having "This patch" or "This commit" or "This does <bla>" in the commit
> message. It is tautologically useless.
> 
> "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> to do frotz", as if you are giving orders to the codebase to change
> its behaviour."
> 
> From Documentation/process/submitting-patches.rst
> 
> > handling routines in non-ARM platforms.  
> 
> Ok, this does "something". Why does it do it?
> 
> Otherwise it won't build on other architectures or is it going to cause code
> bloat or why are we doing this?

Probably a better description would be:

    RAS: ACPI: APEI: add conditional compilation to ARM error report functions
    
    Don't include ARM Processor specific error handling routines in 
    non-ARM platforms, preparing it to the next patch, as arm-specific
    kAPI symbols will be used, thus avoiding build breakages when ARM
    is not selected.
    
    [mchehab: avoid unneeded ifdefs and fix coding style issues]
    Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

This patch itself just adds conditionals to compile out code on
non-ARM architectures. The next one adds some ARM-specific bits to
the ARM processor CPER trace event, which would break compilation
on non-ARM because of the arm-specific kAPI bits used there.

Thanks,
Mauro
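
The pattern under discussion, guarding an arch-specific call and giving
other builds a fallback, can be sketched in userspace as follows. All names
here (arch_logical_index(), LOGICAL_INDEX(), cpu_from_mpidr()) are
hypothetical; CONFIG_ARM and CONFIG_ARM64 are not defined in a plain
userspace build, so the fallback branch is the one compiled.

```c
#include <errno.h>

#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
/* Hypothetical arch helper; in a kernel it would come from an arch header. */
int arch_logical_index(unsigned long long mpidr);
#define LOGICAL_INDEX(mpidr) arch_logical_index(mpidr)
#else
/* No arch helper is available: report the lookup as invalid. */
#define LOGICAL_INDEX(mpidr) (-EINVAL)
#endif

/* Map an MPIDR value to a CPU index, or -1 when it cannot be resolved. */
static int cpu_from_mpidr(unsigned long long mpidr)
{
	int cpu;

	(void)mpidr;	/* unused when the fallback macro is selected */
	cpu = LOGICAL_INDEX(mpidr);

	return cpu < 0 ? -1 : cpu;
}
```

Built without either CONFIG symbol, the function never references the arch
helper and always returns -1, so no arch-only symbol is needed at link time.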


* Re: [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions
  2024-07-08 12:10     ` Mauro Carvalho Chehab
@ 2024-07-08 14:43       ` Borislav Petkov
  2024-07-11  5:26         ` Mauro Carvalho Chehab
  2024-07-08 14:55       ` Jonathan Cameron
  1 sibling, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2024-07-08 14:43 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Tony Luck, Daniel Ferguson, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Uwe Kleine-König, Dan Williams, Dave Jiang, Ira Weiny,
	Shuai Xue, linux-acpi, linux-edac, linux-efi, linux-kernel

On Mon, Jul 08, 2024 at 02:10:25PM +0200, Mauro Carvalho Chehab wrote:
> This patch itself just add conditionals to optimize out code on
> non-ARM architectures. The next one will add some ARM-specific bits
> inside ARM processor CPER trace, thus causing compilation breakages
> on non-ARM, due to arm-specific kAPI bits that will be used then.

Are you sure?

I have both patches applied and then practically reverting the second one
builds an allmodconfig just fine.

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 90efca025d27..524fea3f4f76 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -532,7 +532,6 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 				     int sev, bool sync)
 {
 	bool queued = false;
-#if defined(CONFIG_ARM) || defined (CONFIG_ARM64)
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
 	int sec_sev, i;
@@ -570,7 +569,6 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 				    error_type);
 		p += err_info->length;
 	}
-#endif
 	return queued;
 }
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 75acc09bc96a..359bb163aee0 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -54,7 +54,6 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
 
 void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 {
-#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
 	struct cper_arm_err_info *err_info;
 	struct cper_arm_ctx_info *ctx_info;
 	u8 *ven_err_data;
@@ -97,7 +96,6 @@ void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
 
 	trace_arm_event(err, pei_err, pei_len, ctx_err, ctx_len,
 			ven_err_data, (u32)vsei_len, sev, cpu);
-#endif
 }
 
 static int __init ras_init(void)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions
  2024-07-08 12:10     ` Mauro Carvalho Chehab
  2024-07-08 14:43       ` Borislav Petkov
@ 2024-07-08 14:55       ` Jonathan Cameron
  1 sibling, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2024-07-08 14:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Tony Luck, Daniel Ferguson, Ard Biesheuvel,
	James Morse, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Uwe Kleine-König, Dan Williams, Dave Jiang, Ira Weiny,
	Shuai Xue, linux-acpi, linux-edac, linux-efi, linux-kernel

On Mon, 8 Jul 2024 14:10:25 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Em Mon, 8 Jul 2024 13:32:34 +0200
> Borislav Petkov <bp@alien8.de> escreveu:
> 
> > On Mon, Jul 08, 2024 at 01:18:10PM +0200, Mauro Carvalho Chehab wrote:  
> > > From: Daniel Ferguson <danielf@os.amperecomputing.com>
> > > 
> > > This prevents the unnecessary inclusion of ARM specific RAS error    
> > 
> > s/This prevents/Prevent/
> > 
> > Avoid having "This patch" or "This commit" or "This does <bla>" in the commit
> > message. It is tautologically useless.
> > 
> > "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> > instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> > to do frotz", as if you are giving orders to the codebase to change
> > its behaviour."
> > 
> > From Documentation/process/submitting-patches.rst
> >   
> > > handling routines in non-ARM platforms.    
> > 
> > Ok, this does "something". Why does it do it?
> > 
> > Otherwise it won't build on other architectures or is it going to cause code
> > bloat or why are we doing this?  
> 
> Probably a better description would be:
> 
>     RAS: ACPI: APEI: add conditional compilation to ARM error report functions
>     
>     Don't include ARM Processor specific error handling routines in 
>     non-ARM platforms, preparing it to the next patch, as arm-specific
>     kAPI symbols will be used, thus avoiding build breakages when ARM
>     is not selected.
>     
>     [mchehab: avoid unneeded ifdefs and fix coding style issues]
>     Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
>     Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
With that change log, this seems fine to me.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> This patch itself just add conditionals to optimize out code on
> non-ARM architectures. The next one will add some ARM-specific bits
> inside ARM processor CPER trace, thus causing compilation breakages
> on non-ARM, due to arm-specific kAPI bits that will be used then.
> 
> Thanks,
> Mauro



* Re: [PATCH 2/6] RAS: Report all ARM processor CPER information to userspace
  2024-07-08 11:18 ` [PATCH 2/6] RAS: Report all ARM processor CPER information to userspace Mauro Carvalho Chehab
@ 2024-07-08 15:34   ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2024-07-08 15:34 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Tony Luck, Shengwei Luo, Ard Biesheuvel,
	James Morse, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Uwe Kleine-König, Dan Williams, Dave Jiang, Ira Weiny,
	Shuai Xue, Steven Rostedt, Tyler Baicar, Will Deacon, Xie XiuQi,
	linux-acpi, linux-edac, linux-efi, linux-kernel, Jason Tian,
	Daniel Ferguson

On Mon,  8 Jul 2024 13:18:11 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> From: Shengwei Luo <luoshengwei@huawei.com>
> 
> The ARM processor CPER record was added in UEFI 2.6, and its struct
> has not changed up to UEFI 2.10.
> 
> Yet, the original arm_event trace code added on changeset
> e9279e83ad1f ("trace, ras: add ARM processor error trace event") is
> incomplete, as it only traces some fields of UEFI 2.6 table N.16,
> not exporting at all any information from tables N.17 to N.29 of
> the record.
> 
> This is not enough for user to take appropriate action or to log
> what exactly happened.
> 
> According to the UEFI 2.9 specification, section N.2.4.4, the ARM
> processor error section includes:
> 
> - several (ERR_INFO_NUM) ARM processor error information structures
>   (Tables N.17 to N.20);
> - several (CONTEXT_INFO_NUM) ARM processor context information
>   structures (Tables N.21 to N.29);
> - several vendor specific error information structures. The
>   size is given by Section Length minus the size of the other
>   fields.
> 
> In addition to those data, it also exports two fields that are
> parsed by the GHES driver when firmware reports it, e. g.:
> 
> - error severity
> - cpu logical index
> 
> Report all of this information to userspace via trace uAPI, so that
> userspace can properly record the error and take decisions related
> to cpu core isolation according to error severity and other info.
> 
> After this patch, all the data from ARM Processor record from table
> N.16 are directly or indirectly visible on userspace:
> 
> ======================================	=============================
> UEFI field on table N.16		ARM Processor trace fields
> ======================================	=============================
> Validation				handled when filling data for
> 					affinity MPIDR and running
> 					state.
> ERR_INFO_NUM				pei_len
> CONTEXT_INFO_NUM			ctx_len
> Section Length				indirectly reported by
> 					pei_len, ctx_len and oem_len
> Error affinity level			affinity
> MPIDR_EL1				mpidr
> MIDR_EL1				midr
> Running State				running_state
> PSCI State				psci_state
> Processor Error Information Structure	pei_err - count at pei_len
> Processor Context			ctx_err- count at ctx_len
> Vendor Specific Error Info		oem - count at oem_len
> ======================================	=============================
> 
> Note that decoding of tables N.17 to N.29, if needed, is left to
> userspace. That gives more flexibility, as there is no need to
> flood the kernel with micro-architecture specific error decoding.
> Also, decoding the other fields requires complex logic, and must
> be done for each of the several values inside the record field.
> So, let userspace daemons like rasdaemon decode them, parsing such
> tables and having vendor-specific, micro-architecture-specific decoders.
> 
> [mchehab: modified patch description and fix coding style]
> Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
> Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
> Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com>
> Tested-by: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section

A few comments inline but all of the 'I'd have done this slightly differently'
variety.  This is fine as it stands though.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/acpi/apei/ghes.c |  3 +--
>  drivers/ras/ras.c        | 45 +++++++++++++++++++++++++++++++++++--
>  include/linux/ras.h      | 16 ++++++++++----
>  include/ras/ras_event.h  | 48 +++++++++++++++++++++++++++++++++++-----
>  4 files changed, 99 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 2589a3536d91..90efca025d27 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -538,9 +538,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
>  	int sec_sev, i;
>  	char *p;
>  
> -	log_arm_hw_error(err);
> -
>  	sec_sev = ghes_severity(gdata->error_severity);
> +	log_arm_hw_error(err, sec_sev);
>  	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
>  		return false;
>  
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index 5d94ab79c8c3..75acc09bc96a 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -52,10 +52,51 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
>  	trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
>  }
>  
> -void log_arm_hw_error(struct cper_sec_proc_arm *err)
> +void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev)
>  {
>  #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> -	trace_arm_event(err);
> +	struct cper_arm_err_info *err_info;
> +	struct cper_arm_ctx_info *ctx_info;
> +	u8 *ven_err_data;
> +	u32 ctx_len = 0;
> +	int n, sz, cpu;
> +	s32 vsei_len;
> +	u32 pei_len;
> +	u8 *pei_err;
> +	u8 *ctx_err;
> +
> +	pei_len = sizeof(struct cper_arm_err_info) * err->err_info_num;
> +	pei_err = (u8 *)err + sizeof(struct cper_sec_proc_arm);
> +
> +	err_info = (struct cper_arm_err_info *)(err + 1);
> +	ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
> +	ctx_err = (u8 *)ctx_info;
> +	for (n = 0; n < err->context_info_num; n++) {
> +		sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
> +		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
> +		ctx_len += sz;
> +	}
> +
> +	vsei_len = err->section_length - (sizeof(struct cper_sec_proc_arm) +
> +					  pei_len + ctx_len);
> +	if (vsei_len < 0) {
> +		pr_warn(FW_BUG
> +			"section length: %d\n", err->section_length);
> +		pr_warn(FW_BUG
> +			"section length is too small\n");
> +		pr_warn(FW_BUG
> +			"firmware-generated error record is incorrect\n");
> +		vsei_len = 0;
> +	}
> +	ven_err_data = (u8 *)ctx_info;
> +
> +	cpu = GET_LOGICAL_INDEX(err->mpidr);
> +	/* when return value is invalid, set cpu index to -1 */
> +	if (cpu < 0)
> +		cpu = -1;
> +
> +	trace_arm_event(err, pei_err, pei_len, ctx_err, ctx_len,
> +			ven_err_data, (u32)vsei_len, sev, cpu);
>  #endif
>  }
>  
> diff --git a/include/linux/ras.h b/include/linux/ras.h
> index a64182bc72ad..6025afe5736a 100644
> --- a/include/linux/ras.h
> +++ b/include/linux/ras.h
> @@ -24,8 +24,7 @@ int __init parse_cec_param(char *str);
>  void log_non_standard_event(const guid_t *sec_type,
>  			    const guid_t *fru_id, const char *fru_text,
>  			    const u8 sev, const u8 *err, const u32 len);
> -void log_arm_hw_error(struct cper_sec_proc_arm *err);
> -
> +void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev);
>  #else
>  static inline void
>  log_non_standard_event(const guid_t *sec_type,
> @@ -33,7 +32,7 @@ log_non_standard_event(const guid_t *sec_type,
>  		       const u8 sev, const u8 *err, const u32 len)
>  { return; }
>  static inline void
> -log_arm_hw_error(struct cper_sec_proc_arm *err) { return; }
> +log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev) { return; }
>  #endif
>  
>  struct atl_err {
> @@ -52,5 +51,14 @@ static inline void amd_retire_dram_row(struct atl_err *err) { }
>  static inline unsigned long
>  amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
>  #endif /* CONFIG_AMD_ATL */
> -

I'd keep a blank line here for readability.

> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +#include <asm/smp_plat.h>
> +/*
> + * Include ARM specific SMP header which provides a function mapping mpidr to
> + * cpu logical index.
> + */
> +#define GET_LOGICAL_INDEX(mpidr) get_logical_index(mpidr & MPIDR_HWID_BITMASK)
> +#else
> +#define GET_LOGICAL_INDEX(mpidr) -EINVAL
> +#endif /* CONFIG_ARM || CONFIG_ARM64 */
>  #endif /* __RAS_H__ */
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index 7c47151d5c72..ce5214f008eb 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -168,11 +168,24 @@ TRACE_EVENT(mc_event,
>   * This event is generated when hardware detects an ARM processor error
>   * has occurred. UEFI 2.6 spec section N.2.4.4.
>   */
> +#define APEIL "ARM Processor Err Info data len"
> +#define APEID "ARM Processor Err Info raw data"
> +#define APECIL "ARM Processor Err Context Info data len"
> +#define APECID "ARM Processor Err Context Info raw data"
> +#define VSEIL "Vendor Specific Err Info data len"
> +#define VSEID "Vendor Specific Err Info raw data"

I don't think I'd have bothered with these defines, but
it doesn't really matter.
The trace printk output is strictly a debug convenience, so how
it is formatted is not vital. It could perhaps have used a
shorter description such as
"Vendor Specific Err Info (Length %d): %s"
but that would be inconsistent with the existing entries.


>  TRACE_EVENT(arm_event,
>  
> -	TP_PROTO(const struct cper_sec_proc_arm *proc),
> +	TP_PROTO(const struct cper_sec_proc_arm *proc, const u8 *pei_err,
> +			const u32 pei_len,
> +			const u8 *ctx_err,
> +			const u32 ctx_len,
> +			const u8 *oem,
> +			const u32 oem_len,
> +			u8 sev,
> +			int cpu),
>  
> -	TP_ARGS(proc),
> +	TP_ARGS(proc, pei_err, pei_len, ctx_err, ctx_len, oem, oem_len, sev, cpu),
>  
>  	TP_STRUCT__entry(
>  		__field(u64, mpidr)
> @@ -180,6 +193,14 @@ TRACE_EVENT(arm_event,
>  		__field(u32, running_state)
>  		__field(u32, psci_state)
>  		__field(u8, affinity)
> +		__field(u32, pei_len)
> +		__dynamic_array(u8, buf, pei_len)

Can we do better than naming these buf, buf1, buf2?
The code below will be easier to read if they are pei_buf, ctx_buf and oem_buf.

> +		__field(u32, ctx_len)
> +		__dynamic_array(u8, buf1, ctx_len)
> +		__field(u32, oem_len)
> +		__dynamic_array(u8, buf2, oem_len)
> +		__field(u8, sev)
> +		__field(int, cpu)
>  	),
>  
>  	TP_fast_assign(
> @@ -199,12 +220,29 @@ TRACE_EVENT(arm_event,
>  			__entry->running_state = ~0;
>  			__entry->psci_state = ~0;
>  		}
> +		__entry->pei_len = pei_len;
> +		memcpy(__get_dynamic_array(buf), pei_err, pei_len);
> +		__entry->ctx_len = ctx_len;
> +		memcpy(__get_dynamic_array(buf1), ctx_err, ctx_len);
> +		__entry->oem_len = oem_len;
> +		memcpy(__get_dynamic_array(buf2), oem, oem_len);
> +		__entry->sev = sev;
> +		__entry->cpu = cpu;
>  	),
>  
> -	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
> -		  "running state: %d; PSCI state: %d",
> +	TP_printk("cpu: %d; error: %d; affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
> +		  "running state: %d; PSCI state: %d; "
> +		  "%s: %d; %s: %s; %s: %d; %s: %s; %s: %d; %s: %s",
> +		  __entry->cpu,
> +		  __entry->sev,
>  		  __entry->affinity, __entry->mpidr, __entry->midr,
> -		  __entry->running_state, __entry->psci_state)
> +		  __entry->running_state, __entry->psci_state,
> +		  APEIL, __entry->pei_len, APEID,
> +		  __print_hex(__get_dynamic_array(buf), __entry->pei_len),
> +		  APECIL, __entry->ctx_len, APECID,
> +		  __print_hex(__get_dynamic_array(buf1), __entry->ctx_len),
> +		  VSEIL, __entry->oem_len, VSEID,
> +		  __print_hex(__get_dynamic_array(buf2), __entry->oem_len))
>  );
>  
>  /*



* Re: [PATCH 4/6] efi/cper: Add a new helper function to print bitmasks
  2024-07-08 11:18 ` [PATCH 4/6] efi/cper: Add a new helper function to print bitmasks Mauro Carvalho Chehab
@ 2024-07-08 15:45   ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2024-07-08 15:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Tony Luck, Ard Biesheuvel, James Morse,
	Len Brown, Rafael J. Wysocki, Shiju Jose, Alison Schofield,
	Ira Weiny, linux-acpi, linux-edac, linux-efi, linux-kernel

On Mon,  8 Jul 2024 13:18:13 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Sometimes it is desired to produce a single log line for errors.
> Add a new helper function for such purpose.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  drivers/firmware/efi/cper.c | 43 +++++++++++++++++++++++++++++++++++++
>  include/linux/cper.h        |  2 ++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 7d2cdd9e2227..f8c8a15cd527 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -106,6 +106,49 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>  		printk("%s\n", buf);
>  }
>  
> +/*
> + * cper_bits_to_str - return a string for set bits
> + * @buf: buffer to store the output string
> + * @buf_size: size of the output string buffer
> + * @bits: bit mask
> + * @strs: string array, indexed by bit position
> + * @strs_size: size of the string array: @strs
> + * @mask: a continuous bitmask used to detect the first valid bit of the
> + *        bitmap.
> + *
> + * Add to @buf the bitmask in hexadecimal. Then, for each set bit in @bits
> + * mask, add the corresponding string describing the bit in @strs to @buf.

It would be good to document what the return value is.

Also, I note that some fixes for this doc are in patch 6; they should be here.

I wonder if it would be better to return the number of bytes filled?
Currently the return value isn't used, but that feels potentially more
useful than returning the buffer and having someone run strlen()
on it if they want to append something afterwards.

It also allows detection of an out-of-space condition.
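
As a rough illustration of the length-returning variant, here is a minimal userspace sketch. The name bits_to_str_len() and the "-1 on out of space" convention are assumptions made for the example, not what the patch or the kernel helper does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace sketch, not the kernel helper: join the names of
 * the set bits with '|'. Returns the number of characters written
 * (excluding the trailing NUL), or -1 if @buf is too small. On failure
 * @buf still holds a NUL-terminated string with the names that fit.
 */
static int bits_to_str_len(char *buf, size_t buf_size, unsigned long bits,
			   const char * const strs[], unsigned int strs_size)
{
	size_t used = 0;
	unsigned int i;

	if (buf_size == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < strs_size && i < 8 * sizeof(bits); i++) {
		size_t name_len, need;

		if (!(bits & (1UL << i)))
			continue;

		/* the name, plus a separator when it is not the first entry */
		name_len = strlen(strs[i]);
		need = name_len + (used ? 1 : 0);
		if (used + need >= buf_size)
			return -1;	/* no room for the name and the NUL */

		if (used)
			buf[used++] = '|';
		memcpy(buf + used, strs[i], name_len + 1);
		used += name_len;
	}

	return (int)used;
}
```

With a return value like this, a caller can append to the buffer at buf + ret without running strlen(), and a negative result signals truncation.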


> + */
> +char *cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
> +		       const char * const strs[], unsigned int strs_size)
> +{
> +	int len = buf_size;
> +	char *str = buf;
> +	int i, size;
> +
> +	*buf = '\0';
> +
> +	for_each_set_bit(i, &bits, strs_size) {
> +		if (!(bits & (1U << (i))))
> +			continue;

How would that happen? We only enter the loop body
if that condition is true.
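
To make the point concrete, a small userspace sketch modelled loosely on how a set-bit iterator works. next_set_bit() and for_each_set_bit_sketch() are hypothetical stand-ins for the kernel helpers, limited to a single word.

```c
#include <limits.h>

/*
 * Hypothetical single-word stand-in for a find-next-bit helper: return the
 * index of the first set bit at or after @start, or @size if there is none.
 */
static unsigned int next_set_bit(unsigned long word, unsigned int size,
				 unsigned int start)
{
	unsigned int i;

	for (i = start; i < size && i < sizeof(word) * CHAR_BIT; i++)
		if (word & (1UL << i))
			return i;

	return size;
}

/* The iterator only ever yields indices of bits that are set. */
#define for_each_set_bit_sketch(bit, word, size)		\
	for ((bit) = next_set_bit((word), (size), 0);		\
	     (bit) < (size);					\
	     (bit) = next_set_bit((word), (size), (bit) + 1))

/* Record every visited index in @out and return how many were visited. */
static unsigned int visit_set_bits(unsigned long word, unsigned int size,
				   unsigned int *out)
{
	unsigned int bit, n = 0;

	for_each_set_bit_sketch(bit, word, size)
		out[n++] = bit;	/* no extra "is this bit set?" test needed */

	return n;
}
```

Since every index handed to the body is already a set bit, a second test inside the loop can never skip anything.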

> +
> +		if (*buf && len > 0) {
> +			*str = '|';
> +			len--;
> +			str++;
> +		}
> +
> +		size = strscpy(str, strs[i], len);
> +		if (size < 0)
> +			break;
> +
> +		len -= size;
> +		str += size;
> +	}
> +	return buf;
> +}
> +EXPORT_SYMBOL_GPL(cper_bits_to_str);
> +
>  static const char * const proc_type_strs[] = {
>  	"IA32/X64",
>  	"IA64",
> diff --git a/include/linux/cper.h b/include/linux/cper.h
> index 265b0f8fc0b3..c2f14b916bfb 100644
> --- a/include/linux/cper.h
> +++ b/include/linux/cper.h
> @@ -584,6 +584,8 @@ const char *cper_mem_err_type_str(unsigned int);
>  const char *cper_mem_err_status_str(u64 status);
>  void cper_print_bits(const char *prefix, unsigned int bits,
>  		     const char * const strs[], unsigned int strs_size);
> +char *cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
> +		       const char * const strs[], unsigned int strs_size);
>  void cper_mem_err_pack(const struct cper_sec_mem_err *,
>  		       struct cper_mem_err_compact *);
>  const char *cper_mem_err_unpack(struct trace_seq *,



* Re: [PATCH 6/6] docs: efi: add CPER functions to driver-api
  2024-07-08 11:18 ` [PATCH 6/6] docs: efi: add CPER functions to driver-api Mauro Carvalho Chehab
@ 2024-07-08 15:47   ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2024-07-08 15:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Tony Luck, Ard Biesheuvel, James Morse,
	Len Brown, Rafael J. Wysocki, Shiju Jose, Jonathan Corbet,
	linux-acpi, linux-doc, linux-edac, linux-efi, linux-kernel

On Mon,  8 Jul 2024 13:18:15 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> There are two kernel-doc like descriptions at cper, which are used
> by other parts of cper and on ghes driver. They both have kernel-doc
> like descriptions.
> 
> Change the tags for them to be actual kernel-doc tags and add them
> to the driver-api documentation at the UEFI section.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Other than the blob at the end, which belongs in an earlier patch, LGTM.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Documentation/driver-api/firmware/efi/index.rst | 11 ++++++++---
>  drivers/firmware/efi/cper.c                     | 10 ++++------
>  2 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/driver-api/firmware/efi/index.rst b/Documentation/driver-api/firmware/efi/index.rst
> index 4fe8abba9fc6..5a6b6229592c 100644
> --- a/Documentation/driver-api/firmware/efi/index.rst
> +++ b/Documentation/driver-api/firmware/efi/index.rst
> @@ -1,11 +1,16 @@
>  .. SPDX-License-Identifier: GPL-2.0
>  
> -============
> -UEFI Support
> -============
> +====================================================
> +Unified Extensible Firmware Interface (UEFI) Support
> +====================================================
>  
>  UEFI stub library functions
>  ===========================
>  
>  .. kernel-doc:: drivers/firmware/efi/libstub/mem.c
>     :internal:
> +
> +UEFI Common Platform Error Record (CPER) functions
> +==================================================
> +
> +.. kernel-doc:: drivers/firmware/efi/cper.c
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index f8c8a15cd527..2785c8ea8ad8 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -69,7 +69,7 @@ const char *cper_severity_str(unsigned int severity)
>  }
>  EXPORT_SYMBOL_GPL(cper_severity_str);
>  
> -/*
> +/**
>   * cper_print_bits - print strings for set bits
>   * @pfx: prefix for each line, including log level and prefix string
>   * @bits: bit mask
> @@ -106,18 +106,16 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>  		printk("%s\n", buf);
>  }
>  
> -/*
> +/**
>   * cper_bits_to_str - return a string for set bits
>   * @buf: buffer to store the output string
>   * @buf_size: size of the output string buffer
>   * @bits: bit mask
>   * @strs: string array, indexed by bit position
>   * @strs_size: size of the string array: @strs
> - * @mask: a continuous bitmask used to detect the first valid bit of the
> - *        bitmap.
>   *
> - * Add to @buf the bitmask in hexadecimal. Then, for each set bit in @bits
> - * mask, add the corresponding string describing the bit in @strs to @buf.
> + * Add to @buf the bitmask in hexadecimal. Then, for each set bit in @bits,
> + * add the corresponding string describing the bit in @strs to @buf.

This is in the wrong patch. There is no point in introducing wrong docs only to fix them later.

>   */
>  char *cper_bits_to_str(char *buf, int buf_size, unsigned long bits,
>  		       const char * const strs[], unsigned int strs_size)



* Re: [PATCH 5/6] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs
  2024-07-08 11:18 ` [PATCH 5/6] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Mauro Carvalho Chehab
@ 2024-07-08 15:50   ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2024-07-08 15:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Borislav Petkov, Tony Luck, Ard Biesheuvel, James Morse,
	Len Brown, Rafael J. Wysocki, Shiju Jose, Uwe Kleine-König,
	Alison Schofield, Dan Williams, Dave Jiang, Ira Weiny, Shuai Xue,
	linux-acpi, linux-edac, linux-efi, linux-kernel

On Mon,  8 Jul 2024 13:18:14 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Up to UEFI spec, the type byte of CPER struct for ARM processor was

Up to UEFI spec XXX?

> defined simply as:
> 
> Type at byte offset 4:
> 
> 	- Cache error
> 	- TLB Error
> 	- Bus Error
> 	- Micro-architectural Error
> 	All other values are reserved
> 
> Yet, there was no information about how this would be encoded.
> 
> Spec 2.9A errata corrected it by defining:
> 
> 	- Bit 1 - Cache Error
> 	- Bit 2 - TLB Error
> 	- Bit 3 - Bus Error
> 	- Bit 4 - Micro-architectural Error
> 	All other values are reserved
> 
> That actually aligns with the values already defined on older
> versions at N.2.4.1. Generic Processor Error Section.
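
A minimal sketch of decoding the type byte as the bitmask listed above (bits 1 to 4, everything else reserved). The function and table names are hypothetical and this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Names indexed by bit position; bit 0 is reserved in the listing above. */
static const char * const arm_err_type_names[] = {
	NULL,
	"Cache Error",
	"TLB Error",
	"Bus Error",
	"Micro-architectural Error",
};

/*
 * Hypothetical helper: write the '|'-separated names of the error types
 * set in @type into @buf. Reserved bits (0 and 5..7) are ignored here.
 */
static char *arm_err_type_str(uint8_t type, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int bit;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (bit = 1; bit <= 4; bit++) {
		int n;

		if (!(type & (1U << bit)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", arm_err_type_names[bit]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}

	return buf;
}
```

Because the field is a bitmask, one record can report several error types at once, which a plain enumerated value could not express.
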
> 
> Spec 2.10 also preserves the same encoding as 2.9A.
> 
> Adjust CPER and GHES handling code for both generic and ARM
> processors to properly handle UEFI 2.9A and 2.10 encoding.
> 
> Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

With the above tidied up:

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



* Re: [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions
  2024-07-08 14:43       ` Borislav Petkov
@ 2024-07-11  5:26         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-11  5:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, Daniel Ferguson, Ard Biesheuvel, James Morse,
	Jonathan Cameron, Len Brown, Rafael J. Wysocki, Shiju Jose,
	Uwe Kleine-König, Dan Williams, Dave Jiang, Ira Weiny,
	Shuai Xue, linux-acpi, linux-edac, linux-efi, linux-kernel

On Mon, 8 Jul 2024 16:43:12 +0200
Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Jul 08, 2024 at 02:10:25PM +0200, Mauro Carvalho Chehab wrote:
> > This patch itself just adds conditionals to optimize out code on
> > non-ARM architectures. The next one will add some ARM-specific bits
> > inside ARM processor CPER trace, thus causing compilation breakages
> > on non-ARM, due to arm-specific kAPI bits that will be used then.  
> 
> Are you sure?

That is what reviews of past attempts to merge patch 2 implied.

> I have both patches applied and then practically reverting the second one
> builds an allmodconfig just fine.

I double-checked the logic: I noticed just one kABI symbol that
is arm-specific (CPU midr), and there is already a wrapper
for it.

I also did a cross-compilation for both x86_64 and s390 to verify,
and it does indeed build fine without the ifdefs.

So, I'll drop patch 1.

Thanks,
Mauro


Thread overview: 16+ messages
2024-07-08 11:18 [PATCH 0/6] Fix issues with ARM Processor CPER records Mauro Carvalho Chehab
2024-07-08 11:18 ` [PATCH 1/6] RAS: ACPI: APEI: add conditional compilation to ARM error report functions Mauro Carvalho Chehab
2024-07-08 11:32   ` Borislav Petkov
2024-07-08 12:10     ` Mauro Carvalho Chehab
2024-07-08 14:43       ` Borislav Petkov
2024-07-11  5:26         ` Mauro Carvalho Chehab
2024-07-08 14:55       ` Jonathan Cameron
2024-07-08 11:18 ` [PATCH 2/6] RAS: Report all ARM processor CPER information to userspace Mauro Carvalho Chehab
2024-07-08 15:34   ` Jonathan Cameron
2024-07-08 11:18 ` [PATCH 3/6] efi/cper: Adjust infopfx size to accept an extra space Mauro Carvalho Chehab
2024-07-08 11:18 ` [PATCH 4/6] efi/cper: Add a new helper function to print bitmasks Mauro Carvalho Chehab
2024-07-08 15:45   ` Jonathan Cameron
2024-07-08 11:18 ` [PATCH 5/6] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Mauro Carvalho Chehab
2024-07-08 15:50   ` Jonathan Cameron
2024-07-08 11:18 ` [PATCH 6/6] docs: efi: add CPER functions to driver-api Mauro Carvalho Chehab
2024-07-08 15:47   ` Jonathan Cameron
