From: Baoquan He <bhe@redhat.com>
To: Breno Leitao <leitao@debian.org>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
linux-acpi@vger.kernel.org, dyoung@redhat.com,
tony.luck@intel.com, xueshuai@linux.alibaba.com,
vgoyal@redhat.com, zhiquan1.li@intel.com, olja@meta.com,
kernel-team@meta.com
Subject: Re: [PATCH v2 1/2] vmcoreinfo: expose hardware error recovery statistics via sysfs
Date: Wed, 11 Feb 2026 10:01:05 +0800
Message-ID: <aYvi4Y_HNqk_u1-v@fedora>
In-Reply-To: <20260202-vmcoreinfo_sysfs-v2-1-8f3b5308b894@debian.org>

Hi Breno,

On 02/02/26 at 06:27am, Breno Leitao wrote:
> Add a sysfs directory at /sys/kernel/hwerr_recovery_stats/ to expose
> hardware error recovery statistics that are already tracked by the
> kernel. This allows userspace monitoring tools to track recovered
> hardware errors without requiring kernel crashes.
>
> This is useful to track recoverable hardware errors in a time series,
> even if the host doesn't crash.
>
> The sysfs directory contains one file per error subsystem:
>
> /sys/kernel/hwerr_recovery_stats/cpu - CPU-related errors (MCE, ARM errors)
> /sys/kernel/hwerr_recovery_stats/memory - Memory-related errors
> /sys/kernel/hwerr_recovery_stats/pci - PCI/PCIe AER non-fatal errors
> /sys/kernel/hwerr_recovery_stats/cxl - CXL errors
> /sys/kernel/hwerr_recovery_stats/others - Other hardware errors
>
> Each file contains a single integer representing the count of recovered
> errors for that subsystem.
>
> These statistics provide visibility into the health of the system's
> hardware and can be used by system administrators to proactively detect
> failing components before they cause system crashes.
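
For reference, a userspace monitor could consume these files roughly as
follows. This is only a minimal sketch, assuming each file holds one
decimal integer followed by a newline (which matches the "%d\n" format in
the show routine further down). The helper names hwerr_read_count() and
hwerr_print_all() are hypothetical and are not part of this patch, and
the directory only exists on a kernel with this series applied.

```c
/* Userspace sketch: read the per-subsystem recovered-error counters. */
#include <stddef.h>
#include <stdio.h>

/* File names taken from the commit message quoted above. */
static const char *const hwerr_files[] = {
	"cpu", "memory", "pci", "cxl", "others",
};

/*
 * Read one counter from <dir>/<name> (hypothetical helper).
 * Returns 0 and stores the value in *out on success, or -1 if the
 * file cannot be opened or does not start with an integer.
 */
int hwerr_read_count(const char *dir, const char *name, long *out)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);

	return ok ? 0 : -1;
}

/*
 * Print one "name count" line per counter file found in dir
 * (hypothetical helper). Returns the number of files read successfully.
 */
int hwerr_print_all(const char *dir)
{
	size_t i;
	int found = 0;

	for (i = 0; i < sizeof(hwerr_files) / sizeof(hwerr_files[0]); i++) {
		long count;

		if (hwerr_read_count(dir, hwerr_files[i], &count) == 0) {
			printf("%s %ld\n", hwerr_files[i], count);
			found++;
		} else {
			printf("%s unavailable\n", hwerr_files[i]);
		}
	}

	return found;
}
```

A monitoring tool would call hwerr_print_all("/sys/kernel/hwerr_recovery_stats")
periodically and record the difference between samples to build the time
series mentioned above. The directory is passed as a parameter so the
same code can be pointed elsewhere if the files end up in a different
location.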
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> kernel/vmcore_info.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
> diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
> index e2784038bbed7..b7fcd21be7c59 100644
> --- a/kernel/vmcore_info.c
> +++ b/kernel/vmcore_info.c
Since we agreed that hwerr_recovery_stats has nothing to do with vmcore,
it seems inappropriate to put its sysfs handling code in
kernel/vmcore_info.c. That file is only used to build the vmcoreinfo data
for later vmcore dumping. hwerr_log_error_type() should not live in
kernel/vmcore_info.c either. Sorry, I didn't check this carefully before.
Please reconsider whether these can be handled better.

Thanks
Baoquan
> @@ -6,6 +6,8 @@
>
> #include <linux/buildid.h>
> #include <linux/init.h>
> +#include <linux/kobject.h>
> +#include <linux/sysfs.h>
> #include <linux/utsname.h>
> #include <linux/vmalloc.h>
> #include <linux/sizes.h>
> @@ -139,6 +141,56 @@ void hwerr_log_error_type(enum hwerr_error_type src)
> }
> EXPORT_SYMBOL_GPL(hwerr_log_error_type);
>
> +/* sysfs interface for hardware error recovery statistics */
> +#define HWERR_ATTR_RO(_name, _type) \
> +static ssize_t _name##_show(struct kobject *kobj, \
> + struct kobj_attribute *attr, char *buf) \
> +{ \
> + return sysfs_emit(buf, "%d\n", \
> + atomic_read(&hwerr_data[_type].count)); \
> +} \
> +static struct kobj_attribute hwerr_##_name##_attr = __ATTR_RO(_name)
> +
> +HWERR_ATTR_RO(cpu, HWERR_RECOV_CPU);
> +HWERR_ATTR_RO(memory, HWERR_RECOV_MEMORY);
> +HWERR_ATTR_RO(pci, HWERR_RECOV_PCI);
> +HWERR_ATTR_RO(cxl, HWERR_RECOV_CXL);
> +HWERR_ATTR_RO(others, HWERR_RECOV_OTHERS);
> +
> +static struct attribute *hwerr_recovery_stats_attrs[] = {
> + &hwerr_cpu_attr.attr,
> + &hwerr_memory_attr.attr,
> + &hwerr_pci_attr.attr,
> + &hwerr_cxl_attr.attr,
> + &hwerr_others_attr.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group hwerr_recovery_stats_group = {
> + .attrs = hwerr_recovery_stats_attrs,
> +};
> +
> +static struct kobject *hwerr_recovery_stats_kobj;
> +
> +static int __init hwerr_recovery_stats_init(void)
> +{
> + hwerr_recovery_stats_kobj = kobject_create_and_add("hwerr_recovery_stats",
> + kernel_kobj);
> + if (!hwerr_recovery_stats_kobj) {
> + pr_warn("Failed to create hwerr_recovery_stats kobject\n");
> + return -ENOMEM;
> + }
> +
> + if (sysfs_create_group(hwerr_recovery_stats_kobj,
> + &hwerr_recovery_stats_group)) {
> + kobject_put(hwerr_recovery_stats_kobj);
> + pr_warn("Failed to create hwerr_recovery_stats sysfs group\n");
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> static int __init crash_save_vmcoreinfo_init(void)
> {
> vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL);
> @@ -248,6 +300,9 @@ static int __init crash_save_vmcoreinfo_init(void)
> arch_crash_save_vmcoreinfo();
> update_vmcoreinfo_note();
>
> + /* Create /sys/kernel/hwerr_recovery_stats/ directory */
> + hwerr_recovery_stats_init();
> +
> return 0;
> }
>
>
> --
> 2.47.3
>