[PATCH v5 0/4] fix CPER issues related to UEFI 2.9A Errata
From: Mauro Carvalho Chehab @ 2024-06-24  9:19 UTC
  Cc: Mauro Carvalho Chehab, Borislav Petkov, Tony Luck, James Morse,
	Jonathan Cameron, Shiju Jose, linux-efi, linux-kernel, linux-edac,
	Ard Biesheuvel, Jonathan Corbet, Len Brown, linux-acpi, linux-doc

The UEFI 2.9A errata makes clear how the ARM processor type should be
encoded: it is meant to match the Generic processor encoding, which uses
a bitmask.

The current code assumes, for both generic and ARM processor types,
that this field is an integer, which is incorrect.

Fix it. While here, also fix a compilation issue when using W=1.

After the change, the kernel properly decodes two errors reported in the
same message, as defined in the UEFI spec:

[   75.282430] Memory failure: 0x5cdfd: recovery action for free buddy page: Recovered
[   94.973081] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[   94.973770] {2}[Hardware Error]: event severity: recoverable
[   94.974334] {2}[Hardware Error]:  Error 0, type: recoverable
[   94.974962] {2}[Hardware Error]:   section_type: ARM processor error
[   94.975586] {2}[Hardware Error]:   MIDR: 0x000000000000cd24
[   94.976202] {2}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x000000000000ab12
[   94.977011] {2}[Hardware Error]:   error affinity level: 2
[   94.977593] {2}[Hardware Error]:   running state: 0x1
[   94.978135] {2}[Hardware Error]:   Power State Coordination Interface state: 4660
[   94.978884] {2}[Hardware Error]:   Error info structure 0:
[   94.979463] {2}[Hardware Error]:   num errors: 3
[   94.979971] {2}[Hardware Error]:    first error captured
[   94.980523] {2}[Hardware Error]:    propagated error captured
[   94.981110] {2}[Hardware Error]:    overflow occurred, error info is incomplete
[   94.981893] {2}[Hardware Error]:    error_type: 0x0006: cache error|TLB error
[   94.982606] {2}[Hardware Error]:    error_info: 0x000000000091000f
[   94.983249] {2}[Hardware Error]:     transaction type: Data Access
[   94.983891] {2}[Hardware Error]:     cache error, operation type: Data write
[   94.984559] {2}[Hardware Error]:     TLB error, operation type: Data write
[   94.985215] {2}[Hardware Error]:     cache level: 2
[   94.985749] {2}[Hardware Error]:     TLB level: 2
[   94.986277] {2}[Hardware Error]:     processor context not corrupted

And the error type is properly decoded according to Table N.17 of the
UEFI 2.10 spec:

	[   94.981893] {2}[Hardware Error]:    error_type: 0x0006: cache error|TLB error

The error injection logic was checked via QEMU using this patch:
https://lore.kernel.org/all/20240621165115.336-1-shiju.jose@huawei.com/

v5:
- Do some cleanups and minor fixes as suggested by Jonathan and Tony:
  - check errors at strscpy();
  - simplify cper_bits_to_str() function;
  - use FIELD_GET() and for_each_set_bit();
  - use ARRAY_SIZE() on infopfx to make it clear that its size should be that of newpfx + 1;
  - fix kernel-doc warning with W=1;
  - use kernel-doc for two exported functions at cper.c.

v4:
- The print function had some bugs in it, which were discovered with
  the help of an error injection tool I'm now using.

v3:
- It adds a helper function to produce a buffer describing the
  error bits at cper's printk and ghes pr_warn_ratelimited. It also
  fixes a W=1 error while building cper.

v2:
- It fixes the way printks are handled on both cper_arm and ghes
  drivers.

v1:
- Tagged as RFC; mostly meant to give a heads-up that the current
  implementation does not follow the spec. It touches only the
  cper code.




Mauro Carvalho Chehab (4):
  efi/cper: Adjust infopfx size to accept an extra space
  efi/cper: Add a new helper function to print bitmasks
  efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs
  docs: efi: add CPER functions to driver-api

 .../driver-api/firmware/efi/index.rst         | 11 ++--
 drivers/acpi/apei/ghes.c                      | 15 +++---
 drivers/firmware/efi/cper-arm.c               | 52 +++++++++----------
 drivers/firmware/efi/cper.c                   | 41 ++++++++++++++-
 include/linux/cper.h                          | 12 +++--
 5 files changed, 89 insertions(+), 42 deletions(-)

-- 
2.45.2




Thread overview: 8+ messages
2024-06-24  9:19 [PATCH v5 0/4] fix CPER issues related to UEFI 2.9A Errata Mauro Carvalho Chehab
2024-06-24  9:19 ` [PATCH v5 1/4] efi/cper: Adjust infopfx size to accept an extra space Mauro Carvalho Chehab
2024-06-24  9:19 ` [PATCH v5 2/4] efi/cper: Add a new helper function to print bitmasks Mauro Carvalho Chehab
2024-07-01 15:07   ` Jonathan Cameron
2024-06-24  9:19 ` [PATCH v5 3/4] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs Mauro Carvalho Chehab
2024-06-27  8:59   ` kernel test robot
2024-06-28 20:28   ` kernel test robot
2024-06-24  9:19 ` [PATCH v5 4/4] docs: efi: add CPER functions to driver-api Mauro Carvalho Chehab
