From: Breno Leitao <leitao@debian.org>
To: akpm@linux-foundation.org, bhe@redhat.com
Cc: linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
linux-arm-kernel@lists.infradead.org,
linux-acpi@vger.kernel.org, dyoung@redhat.com,
tony.luck@intel.com, xueshuai@linux.alibaba.com,
vgoyal@redhat.com, zhiquan1.li@intel.com, olja@meta.com,
Breno Leitao <leitao@debian.org>,
kernel-team@meta.com
Subject: [PATCH v2 1/2] vmcoreinfo: expose hardware error recovery statistics via sysfs
Date: Mon, 02 Feb 2026 06:27:39 -0800
Message-ID: <20260202-vmcoreinfo_sysfs-v2-1-8f3b5308b894@debian.org>
In-Reply-To: <20260202-vmcoreinfo_sysfs-v2-0-8f3b5308b894@debian.org>
Add a sysfs directory at /sys/kernel/hwerr_recovery_stats/ to expose
hardware error recovery statistics that are already tracked by the
kernel. This lets userspace monitoring tools track recovered hardware
errors as a time series while the host keeps running, without waiting
for a kernel crash.

The sysfs directory contains one file per error subsystem:

  /sys/kernel/hwerr_recovery_stats/cpu    - CPU-related errors (MCE, ARM errors)
  /sys/kernel/hwerr_recovery_stats/memory - Memory-related errors
  /sys/kernel/hwerr_recovery_stats/pci    - PCI/PCIe AER non-fatal errors
  /sys/kernel/hwerr_recovery_stats/cxl    - CXL errors
  /sys/kernel/hwerr_recovery_stats/others - Other hardware errors

Each file contains a single integer: the count of recovered errors for
that subsystem.

These statistics give visibility into the health of the system's
hardware, so system administrators can detect failing components
before they cause a system crash.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
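The file layout described in the commit message (one file per subsystem, one integer per file) can be consumed from userspace roughly as follows. This is a minimal sketch, not part of the patch: the helper names hwerr_read_counter() and hwerr_read_all() and the HWERR_STATS_DIR macro are hypothetical, and the directory is passed as a parameter so the code can be exercised against any directory with the same layout.

```c
#include <assert.h>
#include <stdio.h>

/* Directory named in the commit message; only used as a default by callers. */
#define HWERR_STATS_DIR "/sys/kernel/hwerr_recovery_stats"

/* Counter file names, as listed in the commit message. */
static const char *const hwerr_files[] = {
	"cpu", "memory", "pci", "cxl", "others",
};
#define HWERR_NFILES (sizeof(hwerr_files) / sizeof(hwerr_files[0]))

/*
 * Read one counter from <dir>/<name>. Returns 0 and stores the value in
 * *out on success, -1 if the file is missing or does not start with an
 * integer. (Hypothetical helper, for illustration only.)
 */
static int hwerr_read_counter(const char *dir, const char *name, long *out)
{
	char path[512];
	FILE *f;
	int len, ok;

	assert(dir && name && out);

	len = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (len < 0 || len >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);

	return ok ? 0 : -1;
}

/*
 * Fill counts[] (HWERR_NFILES entries, same order as hwerr_files[]).
 * Returns how many counters were read; unreadable ones are set to -1.
 */
static int hwerr_read_all(const char *dir, long counts[])
{
	size_t i;
	int n = 0;

	for (i = 0; i < HWERR_NFILES; i++) {
		if (hwerr_read_counter(dir, hwerr_files[i], &counts[i]) == 0)
			n++;
		else
			counts[i] = -1;
	}

	return n;
}
```

A collector could call hwerr_read_all(HWERR_STATS_DIR, counts) at a fixed interval and subtract the previous sample from the current one to get a per-interval count of recovered errors for each subsystem.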
kernel/vmcore_info.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index e2784038bbed7..b7fcd21be7c59 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -6,6 +6,8 @@
#include <linux/buildid.h>
#include <linux/init.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
#include <linux/sizes.h>
@@ -139,6 +141,56 @@ void hwerr_log_error_type(enum hwerr_error_type src)
}
EXPORT_SYMBOL_GPL(hwerr_log_error_type);
+/* sysfs interface for hardware error recovery statistics */
+#define HWERR_ATTR_RO(_name, _type)					\
+static ssize_t _name##_show(struct kobject *kobj,			\
+			    struct kobj_attribute *attr, char *buf)	\
+{									\
+	return sysfs_emit(buf, "%d\n",					\
+			  atomic_read(&hwerr_data[_type].count));	\
+}									\
+static struct kobj_attribute hwerr_##_name##_attr = __ATTR_RO(_name)
+
+HWERR_ATTR_RO(cpu, HWERR_RECOV_CPU);
+HWERR_ATTR_RO(memory, HWERR_RECOV_MEMORY);
+HWERR_ATTR_RO(pci, HWERR_RECOV_PCI);
+HWERR_ATTR_RO(cxl, HWERR_RECOV_CXL);
+HWERR_ATTR_RO(others, HWERR_RECOV_OTHERS);
+
+static struct attribute *hwerr_recovery_stats_attrs[] = {
+	&hwerr_cpu_attr.attr,
+	&hwerr_memory_attr.attr,
+	&hwerr_pci_attr.attr,
+	&hwerr_cxl_attr.attr,
+	&hwerr_others_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group hwerr_recovery_stats_group = {
+	.attrs = hwerr_recovery_stats_attrs,
+};
+
+static struct kobject *hwerr_recovery_stats_kobj;
+
+static int __init hwerr_recovery_stats_init(void)
+{
+	hwerr_recovery_stats_kobj = kobject_create_and_add("hwerr_recovery_stats",
+							   kernel_kobj);
+	if (!hwerr_recovery_stats_kobj) {
+		pr_warn("Failed to create hwerr_recovery_stats kobject\n");
+		return -ENOMEM;
+	}
+
+	if (sysfs_create_group(hwerr_recovery_stats_kobj,
+			       &hwerr_recovery_stats_group)) {
+		kobject_put(hwerr_recovery_stats_kobj);
+		pr_warn("Failed to create hwerr_recovery_stats sysfs group\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
static int __init crash_save_vmcoreinfo_init(void)
{
vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL);
@@ -248,6 +300,9 @@ static int __init crash_save_vmcoreinfo_init(void)
arch_crash_save_vmcoreinfo();
update_vmcoreinfo_note();
+	/* Create /sys/kernel/hwerr_recovery_stats/ directory */
+	hwerr_recovery_stats_init();
+
return 0;
}
--
2.47.3
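For readers unfamiliar with the token-pasting pattern behind HWERR_ATTR_RO() in the patch above, the following userspace sketch mimics it: one macro invocation generates both a <name>_show() function and an attribute object pointing at it, and the objects are then collected in a NULL-terminated array. Everything here is a hypothetical stand-in (struct demo_attr, DEMO_ATTR_RO(), demo_count[], demo_show_by_name()); the kernel's struct kobj_attribute, __ATTR_RO() and sysfs_emit() have different definitions and signatures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's error types and counters. */
enum demo_type { DEMO_CPU, DEMO_MEMORY, DEMO_MAX };
static int demo_count[DEMO_MAX];

/* Simplified attribute: a name plus a callback that formats the value. */
struct demo_attr {
	const char *name;
	int (*show)(char *buf, size_t len);
};

/*
 * Generate <name>_show() and demo_<name>_attr in one go.
 * #_name turns the identifier into the attribute's file name string,
 * _name##_show pastes tokens to build the callback's identifier.
 */
#define DEMO_ATTR_RO(_name, _type)					\
static int _name##_show(char *buf, size_t len)				\
{									\
	return snprintf(buf, len, "%d\n", demo_count[_type]);		\
}									\
static struct demo_attr demo_##_name##_attr = { #_name, _name##_show }

DEMO_ATTR_RO(cpu, DEMO_CPU);
DEMO_ATTR_RO(memory, DEMO_MEMORY);

/* NULL-terminated list, analogous in shape to the attrs array in the patch. */
static struct demo_attr *demo_attrs[] = {
	&demo_cpu_attr,
	&demo_memory_attr,
	NULL,
};

/* Look up an attribute by name and format its value; -1 if not found. */
static int demo_show_by_name(const char *name, char *buf, size_t len)
{
	struct demo_attr **a;

	assert(name && buf);

	for (a = demo_attrs; *a; a++) {
		if (strcmp((*a)->name, name) == 0)
			return (*a)->show(buf, len);
	}

	return -1;
}
```

The benefit of the pattern is that adding a new counter needs one macro line and one array entry, with no hand-written show function per file.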