* [PATCH 1/2] ACPI: APEI: GHES: fix severity namespace in ghes_log_hwerr()
2026-06-17 13:32 [PATCH 0/2] vmcoreinfo: GHES: track fatal hardware errors Breno Leitao
@ 2026-06-17 13:32 ` Breno Leitao
2026-06-17 13:32 ` [PATCH 2/2] vmcore_info: track fatal hardware errors Breno Leitao
2026-06-29 16:02 ` [PATCH 0/2] vmcoreinfo: GHES: " Breno Leitao
2 siblings, 0 replies; 4+ messages in thread
From: Breno Leitao @ 2026-06-17 13:32 UTC
To: Rafael J. Wysocki, Tony Luck, Borislav Petkov, Hanjun Guo,
Mauro Carvalho Chehab, Shuai Xue, Len Brown, Andrew Morton,
Baoquan He, Mike Rapoport, Pasha Tatashin, Pratyush Yadav,
Dave Young
Cc: linux-acpi, linux-kernel, riel, caggio, kexec, Breno Leitao,
kernel-team
ghes_log_hwerr() receives a GHES_SEV_* value from ghes_severity() but
tests it against CPER_SEV_RECOVERABLE. GHES_SEV_RECOVERABLE is 2 while
CPER_SEV_RECOVERABLE is 0, so every recoverable error is dropped and
only GHES_SEV_NO slips through; nothing useful is recorded through the
APEI/GHES path, which is the only one arm64 has.
Compare against GHES_SEV_RECOVERABLE so recoverable hardware errors are
tracked as intended.
Fixes: 918e1507cff9 ("vmcoreinfo: track and log recoverable hardware errors")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
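For reviewers who want to see the collision spelled out, here is a minimal
userspace sketch. It assumes the severity values as I understand them from
include/acpi/ghes.h and include/linux/cper.h; only the two RECOVERABLE
values and the GHES_SEV_NO overlap are stated in the commit message above.
old_check_logs() and new_check_logs() are hypothetical helpers for
illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* GHES severities (assumed to mirror include/acpi/ghes.h). */
enum {
	GHES_SEV_NO          = 0,
	GHES_SEV_CORRECTED   = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC       = 3,
};

/* CPER severities (assumed to mirror include/linux/cper.h). */
enum {
	CPER_SEV_RECOVERABLE   = 0,
	CPER_SEV_FATAL         = 1,
	CPER_SEV_CORRECTED     = 2,
	CPER_SEV_INFORMATIONAL = 3,
};

/* The namespaces overlap: GHES "no error" equals CPER "recoverable". */
static_assert(GHES_SEV_NO == CPER_SEV_RECOVERABLE,
	      "GHES_SEV_NO and CPER_SEV_RECOVERABLE share the value 0");

/* Hypothetical: the pre-patch test applied to a GHES_SEV_* value. */
bool old_check_logs(int ghes_sev)
{
	return ghes_sev == CPER_SEV_RECOVERABLE;
}

/* Hypothetical: the post-patch test applied to a GHES_SEV_* value. */
bool new_check_logs(int ghes_sev)
{
	return ghes_sev == GHES_SEV_RECOVERABLE;
}
```

With these values, the old test accepts only GHES_SEV_NO and rejects
GHES_SEV_RECOVERABLE, while the new test does the opposite.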
drivers/acpi/apei/ghes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3236a3ce79d6b..f0f9f1529e7aa 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -877,7 +877,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
 static void ghes_log_hwerr(int sev, guid_t *sec_type)
 {
-	if (sev != CPER_SEV_RECOVERABLE)
+	if (sev != GHES_SEV_RECOVERABLE)
 		return;
 	if (guid_equal(sec_type, &CPER_SEC_PROC_ARM) ||
--
2.53.0-Meta
* [PATCH 2/2] vmcore_info: track fatal hardware errors
From: Breno Leitao @ 2026-06-17 13:32 UTC
To: Rafael J. Wysocki, Tony Luck, Borislav Petkov, Hanjun Guo,
Mauro Carvalho Chehab, Shuai Xue, Len Brown, Andrew Morton,
Baoquan He, Mike Rapoport, Pasha Tatashin, Pratyush Yadav,
Dave Young
Cc: linux-acpi, linux-kernel, riel, caggio, kexec, Breno Leitao,
kernel-team
Fatal (panic-severity) hardware errors reported through APEI/GHES are
the ones most likely to have caused a crash, but hwerr_data did not
record them. Add a HWERR_FATAL bucket and bump it from
ghes_log_hwerr() when a GHES_SEV_PANIC error is seen, so crash tooling
can tell from the vmcore that a fatal hardware error preceded the
crash.
Tools reading hwerr_data gain one entry (HWERR_FATAL).
Signed-off-by: Breno Leitao <leitao@debian.org>
---
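A rough userspace model of the resulting control flow, in case it helps
review. Everything here is a simplified stand-in: hwerr_count[] and the
one-line hwerr_log_error_type() are hypothetical replacements for the
kernel's hwerr_data bookkeeping, the enum entries before HWERR_RECOV_PCI
are assumed from the existing header, and model_ghes_log_hwerr() takes
the recoverable bucket directly instead of deriving it from the section
GUID.

```c
#include <assert.h>

/* GHES severities (assumed to mirror include/acpi/ghes.h). */
enum {
	GHES_SEV_NO,
	GHES_SEV_CORRECTED,
	GHES_SEV_RECOVERABLE,
	GHES_SEV_PANIC,
};

/* Buckets after this patch; CPU and MEMORY are assumed, not shown in the hunk. */
enum hwerr_error_type {
	HWERR_RECOV_CPU,
	HWERR_RECOV_MEMORY,
	HWERR_RECOV_PCI,
	HWERR_RECOV_CXL,
	HWERR_RECOV_OTHERS,
	HWERR_FATAL,		/* new: fatal hardware errors */
	HWERR_RECOV_MAX,
};

/* Adding HWERR_FATAL before the sentinel grows the array by one entry. */
static_assert(HWERR_RECOV_MAX == HWERR_FATAL + 1,
	      "HWERR_FATAL is the last real bucket");

/* Hypothetical stand-in for hwerr_data: a bare counter per bucket. */
unsigned int hwerr_count[HWERR_RECOV_MAX];

/* Simplified stand-in: only bumps the counter for the given bucket. */
void hwerr_log_error_type(enum hwerr_error_type type)
{
	hwerr_count[type]++;
}

/* Model of ghes_log_hwerr() after both patches are applied. */
void model_ghes_log_hwerr(int sev, enum hwerr_error_type recov_bucket)
{
	if (sev == GHES_SEV_PANIC) {
		hwerr_log_error_type(HWERR_FATAL);
		return;
	}

	if (sev != GHES_SEV_RECOVERABLE)
		return;

	hwerr_log_error_type(recov_bucket);
}
```

In this model a panic-severity error lands in HWERR_FATAL regardless of
section type, a recoverable one lands in its per-type bucket, and
corrected or informational errors are not counted.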
drivers/acpi/apei/ghes.c | 5 +++++
include/uapi/linux/vmcore.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index f0f9f1529e7aa..5a9e16bdca2b6 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -877,6 +877,11 @@ EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
 static void ghes_log_hwerr(int sev, guid_t *sec_type)
 {
+	if (sev == GHES_SEV_PANIC) {
+		hwerr_log_error_type(HWERR_FATAL);
+		return;
+	}
+
 	if (sev != GHES_SEV_RECOVERABLE)
 		return;
diff --git a/include/uapi/linux/vmcore.h b/include/uapi/linux/vmcore.h
index 2ba89fafa518a..c774b037603e2 100644
--- a/include/uapi/linux/vmcore.h
+++ b/include/uapi/linux/vmcore.h
@@ -21,6 +21,7 @@ enum hwerr_error_type {
 	HWERR_RECOV_PCI,
 	HWERR_RECOV_CXL,
 	HWERR_RECOV_OTHERS,
+	HWERR_FATAL, /* fatal hardware errors */
 	HWERR_RECOV_MAX,
 };
--
2.53.0-Meta
* Re: [PATCH 0/2] vmcoreinfo: GHES: track fatal hardware errors
From: Breno Leitao @ 2026-06-29 16:02 UTC
To: Rafael J. Wysocki, Tony Luck, Borislav Petkov, Hanjun Guo,
Mauro Carvalho Chehab, Shuai Xue, Len Brown, Andrew Morton,
Baoquan He, Mike Rapoport, Pasha Tatashin, Pratyush Yadav,
Dave Young
Cc: linux-acpi, linux-kernel, riel, caggio, kexec, kernel-team
On Wed, Jun 17, 2026 at 06:32:46AM -0700, Breno Leitao wrote:
> Hardware errors reported through APEI/GHES are recorded in the kernel's
> hwerr_data array so that crash tooling can tell from the vmcore whether a
> hardware error preceded a crash.
This hardware error tracking code currently sits in an awkward
location: it doesn't belong in RAS and has minimal connection to
vmcore info.
Following Baoquan's earlier suggestion, I'll refactor this as a standalone
driver, which should make the code organization clearer and more maintainable.
https://lore.kernel.org/all/aYvi4Y_HNqk_u1-v@fedora/