From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>, linux-pci@vger.kernel.org
Cc: "Jon Pan-Doh" <pandoh@google.com>,
"Karolina Stolarek" <karolina.stolarek@oracle.com>,
"Weinan Liu" <wnliu@google.com>,
"Martin Petersen" <martin.petersen@oracle.com>,
"Ben Fuller" <ben.fuller@oracle.com>,
"Drew Walton" <drewwalton@microsoft.com>,
"Anil Agrawal" <anilagrawal@meta.com>,
"Tony Luck" <tony.luck@intel.com>,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
"Lukas Wunner" <lukas@wunner.de>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
"Sargun Dhillon" <sargun@meta.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
"Oliver O'Halloran" <oohall@gmail.com>,
"Kai-Heng Feng" <kaihengf@nvidia.com>,
"Keith Busch" <kbusch@kernel.org>,
"Robert Richter" <rrichter@amd.com>,
"Terry Bowman" <terry.bowman@amd.com>,
"Shiju Jose" <shiju.jose@huawei.com>,
"Dave Jiang" <dave.jiang@intel.com>,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
"Bjorn Helgaas" <bhelgaas@google.com>
Subject: Re: [PATCH v8 20/20] PCI/AER: Add sysfs attributes for log ratelimits
Date: Thu, 22 May 2025 16:50:58 -0700
Message-ID: <97a45686-3b54-489f-b0f4-847b99312aa0@linux.intel.com>
In-Reply-To: <20250522232339.1525671-21-helgaas@kernel.org>
Hi,
On 5/22/25 4:21 PM, Bjorn Helgaas wrote:
> From: Jon Pan-Doh <pandoh@google.com>
>
> Allow userspace to read/write log ratelimits per device (including
> enable/disable). Create aer/ sysfs directory to store them and any
> future AER configs.
>
> The new sysfs files are:
>
> /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst
> /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms
> /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst
> /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms
>
> The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so
> if we try to emit more than 10 messages in a 5 second period, some are
> suppressed.
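
To make the burst/interval semantics described above concrete, here is a minimal userspace sketch. It is a simplified model written for illustration, not the kernel's ratelimit code; the names rl_model, rl_init(), rl_allow() and rl_count_logged() are hypothetical, and time is passed in explicitly as milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a burst/interval ratelimit. */
struct rl_model {
	long interval_ms;   /* window length in ms; 0 disables ratelimiting */
	int burst;          /* messages allowed per window */
	long window_start;  /* start of the current window, in ms */
	int emitted;        /* messages emitted in the current window */
	int suppressed;     /* total messages suppressed so far */
};

static void rl_init(struct rl_model *rl, long interval_ms, int burst)
{
	assert(rl);
	rl->interval_ms = interval_ms;
	rl->burst = burst;
	rl->window_start = 0;
	rl->emitted = 0;
	rl->suppressed = 0;
}

/* Return true if a message arriving at now_ms should be logged. */
static bool rl_allow(struct rl_model *rl, long now_ms)
{
	if (rl->interval_ms == 0)
		return true;                    /* ratelimiting disabled */

	if (now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;      /* open a new window */
		rl->emitted = 0;
	}

	if (rl->emitted < rl->burst) {
		rl->emitted++;
		return true;
	}

	rl->suppressed++;
	return false;
}

/* Count how many of n messages, spaced step_ms apart from t=0, are logged. */
static int rl_count_logged(struct rl_model *rl, int n, long step_ms)
{
	int logged = 0;

	for (int i = 0; i < n; i++)
		if (rl_allow(rl, i * step_ms))
			logged++;

	return logged;
}
```

With the defaults quoted above (burst 10, interval 5000 ms), feeding this model 12 messages inside one 5 second window lets 10 through and suppresses 2. Setting the interval to 0 lets everything through, which matches the "writing 0 disables ratelimiting" behavior documented in the patch.
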
>
> Update AER sysfs ABI filename to reflect the broader scope of AER sysfs
> attributes (e.g. stats and ratelimits).
>
> Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats ->
> sysfs-bus-pci-devices-aer
>
> Tested using aer-inject[1]. Configured correctable log ratelimit to 5.
> Sent 6 AER errors. Observed 5 errors logged while AER stats
> (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6.
>
> Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors
> logged and accounted in AER stats (12 total errors).
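
For anyone who wants to script the test steps above instead of using echo/cat, the following is a rough userspace sketch of building the attribute path and writing or reading a value. The helper names aer_attr_path(), write_int_attr() and read_int_attr() are made up for this example, and the device address passed in is whatever the caller chooses. Per the patch, writes to the real sysfs files need CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path of an AER ratelimit attribute for one device.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int aer_attr_path(char *buf, size_t len, const char *bdf,
			 const char *attr)
{
	int n;

	assert(buf && bdf && attr);
	n = snprintf(buf, len, "/sys/bus/pci/devices/%s/aer/%s", bdf, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a decimal value to the file at path. Returns 0 on success. */
static int write_int_attr(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)     /* a rejected write may only show up here */
		ok = 0;
	return ok ? 0 : -1;
}

/* Read a decimal value from the file at path. Returns 0 on success. */
static int read_int_attr(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

To mirror the first test step, a caller would build the path for "correctable_ratelimit_burst" on the device under test and call write_int_attr(path, 5); writing 0 to "correctable_ratelimit_interval_ms" would correspond to the "disabled ratelimiting" step.
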
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
>
> [bhelgaas: note fatal errors are not ratelimited, "aer_report" ->
> "aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms]
>
> Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
> Signed-off-by: Jon Pan-Doh <pandoh@google.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Link: https://patch.msgid.link/20250520215047.1350603-18-helgaas@kernel.org
> ---
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ...es-aer_stats => sysfs-bus-pci-devices-aer} | 44 ++++++++
> Documentation/PCI/pcieaer-howto.rst | 5 +-
> drivers/pci/pci-sysfs.c | 1 +
> drivers/pci/pci.h | 1 +
> drivers/pci/pcie/aer.c | 105 ++++++++++++++++++
> 5 files changed, 155 insertions(+), 1 deletion(-)
> rename Documentation/ABI/testing/{sysfs-bus-pci-devices-aer_stats => sysfs-bus-pci-devices-aer} (72%)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
> similarity index 72%
> rename from Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> rename to Documentation/ABI/testing/sysfs-bus-pci-devices-aer
> index d1f67bb81d5d..5ed284523956 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
> @@ -117,3 +117,47 @@ Date: July 2018
> KernelVersion: 4.19.0
> Contact: linux-pci@vger.kernel.org, rajatja@google.com
> Description: Total number of ERR_NONFATAL messages reported to rootport.
> +
> +PCIe AER ratelimits
> +-------------------
> +
> +These attributes show up under all the devices that are AER capable.
> +They represent configurable ratelimits of logs per error type.
> +
> +See Documentation/PCI/pcieaer-howto.rst for more info on ratelimits.
> +
> +What: /sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_interval_ms
> +Date: May 2025
> +KernelVersion: 6.16.0
> +Contact: linux-pci@vger.kernel.org
> +Description: Writing 0 disables AER correctable error log ratelimiting.
> + Writing a positive value sets the ratelimit interval in ms.
> + Default is DEFAULT_RATELIMIT_INTERVAL (5000 ms).
> +
> +What: /sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_burst
> +Date: May 2025
> +KernelVersion: 6.16.0
> +Contact: linux-pci@vger.kernel.org
> +Description: Ratelimit burst for correctable error logs. Writing a value
> + changes the number of errors (burst) allowed per interval
> + before ratelimiting. Reading gets the current ratelimit
> + burst. Default is DEFAULT_RATELIMIT_BURST (10).
> +
> +What: /sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_interval_ms
> +Date: May 2025
> +KernelVersion: 6.16.0
> +Contact: linux-pci@vger.kernel.org
> +Description: Writing 0 disables AER non-fatal uncorrectable error log
> + ratelimiting. Writing a positive value sets the ratelimit
> + interval in ms. Default is DEFAULT_RATELIMIT_INTERVAL
> + (5000 ms).
> +
> +What: /sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_burst
> +Date: May 2025
> +KernelVersion: 6.16.0
> +Contact: linux-pci@vger.kernel.org
> +Description: Ratelimit burst for non-fatal uncorrectable error logs.
> + Writing a value changes the number of errors (burst)
> + allowed per interval before ratelimiting. Reading gets the
> + current ratelimit burst. Default is DEFAULT_RATELIMIT_BURST
> + (10).
> diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
> index 6fb31516fff1..4b71e2f43ca7 100644
> --- a/Documentation/PCI/pcieaer-howto.rst
> +++ b/Documentation/PCI/pcieaer-howto.rst
> @@ -97,12 +97,15 @@ DPC errors, are not ratelimited.
> AER uses the default ratelimit of DEFAULT_RATELIMIT_BURST (10 events) over
> DEFAULT_RATELIMIT_INTERVAL (5 seconds).
>
> +Ratelimits are exposed in the form of sysfs attributes and configurable.
> +See Documentation/ABI/testing/sysfs-bus-pci-devices-aer.
> +
> AER Statistics / Counters
> -------------------------
>
> When PCIe AER errors are captured, the counters / statistics are also exposed
> in the form of sysfs attributes which are documented at
> -Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> +Documentation/ABI/testing/sysfs-bus-pci-devices-aer.
>
> Developer Guide
> ===============
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index c6cda56ca52c..278de99b00ce 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1805,6 +1805,7 @@ const struct attribute_group *pci_dev_attr_groups[] = {
> &pcie_dev_attr_group,
> #ifdef CONFIG_PCIEAER
> &aer_stats_attr_group,
> + &aer_attr_group,
> #endif
> #ifdef CONFIG_PCIEASPM
> &aspm_ctrl_attr_group,
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 3023c68fe485..eca2812cfd25 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -965,6 +965,7 @@ void pci_no_aer(void);
> void pci_aer_init(struct pci_dev *dev);
> void pci_aer_exit(struct pci_dev *dev);
> extern const struct attribute_group aer_stats_attr_group;
> +extern const struct attribute_group aer_attr_group;
> void pci_aer_clear_fatal_status(struct pci_dev *dev);
> int pci_aer_clear_status(struct pci_dev *dev);
> int pci_aer_raw_clear_status(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ebac126144fc..6c331695af58 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -627,6 +627,111 @@ const struct attribute_group aer_stats_attr_group = {
> .is_visible = aer_stats_attrs_are_visible,
> };
>
> +/*
> + * Ratelimit interval
> + * <=0: disabled with ratelimit.interval = 0
> + * >0: enabled with ratelimit.interval in ms
> + */
> +#define aer_ratelimit_interval_attr(name, ratelimit) \
> + static ssize_t \
> + name##_show(struct device *dev, struct device_attribute *attr, \
> + char *buf) \
> + { \
> + struct pci_dev *pdev = to_pci_dev(dev); \
> + \
> + return sysfs_emit(buf, "%d\n", \
> + pdev->aer_info->ratelimit.interval); \
> + } \
> + \
> + static ssize_t \
> + name##_store(struct device *dev, struct device_attribute *attr, \
> + const char *buf, size_t count) \
> + { \
> + struct pci_dev *pdev = to_pci_dev(dev); \
> + int interval; \
> + \
> + if (!capable(CAP_SYS_ADMIN)) \
> + return -EPERM; \
> + \
> + if (kstrtoint(buf, 0, &interval) < 0) \
> + return -EINVAL; \
> + \
> + if (interval <= 0) \
> + interval = 0; \
> + else \
> + interval = msecs_to_jiffies(interval); \
> + \
> + pdev->aer_info->ratelimit.interval = interval; \
> + \
> + return count; \
> + } \
> + static DEVICE_ATTR_RW(name);
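
The interval store path above clamps non-positive input to 0 and converts positive input from ms to jiffies before saving it. The sketch below models that in userspace so the arithmetic is easy to check. It is an assumption-laden illustration: the model_* names are hypothetical, the tick rate is passed as a parameter instead of using the kernel's HZ, and only tick rates that divide 1000 evenly are handled.

```c
#include <assert.h>

/* Model of ms -> ticks, rounding up; valid when hz divides 1000 evenly. */
static long model_msecs_to_jiffies(long ms, int hz)
{
	long ms_per_tick;

	assert(hz > 0 && 1000 % hz == 0);
	ms_per_tick = 1000 / hz;
	return (ms + ms_per_tick - 1) / ms_per_tick;
}

/* Model of ticks -> ms for the same restricted set of tick rates. */
static long model_jiffies_to_msecs(long ticks, int hz)
{
	assert(hz > 0 && 1000 % hz == 0);
	return ticks * (1000 / hz);
}

/* Mirror of the store logic: <= 0 disables, > 0 is stored as ticks. */
static long model_store_interval(int requested_ms, int hz)
{
	if (requested_ms <= 0)
		return 0;
	return model_msecs_to_jiffies(requested_ms, hz);
}
```

With a tick rate of 250, a request of 5000 ms is stored as 1250 ticks; with a tick rate of 1000 the stored value equals the ms value. Going by the quoted hunk alone, the show side prints the stored field as-is, so the value read back would match what was written only when the two units coincide, unless it is converted back along the lines of model_jiffies_to_msecs().
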
> +
> +#define aer_ratelimit_burst_attr(name, ratelimit) \
> + static ssize_t \
> + name##_show(struct device *dev, struct device_attribute *attr, \
> + char *buf) \
> + { \
> + struct pci_dev *pdev = to_pci_dev(dev); \
> + \
> + return sysfs_emit(buf, "%d\n", \
> + pdev->aer_info->ratelimit.burst); \
> + } \
> + \
> + static ssize_t \
> + name##_store(struct device *dev, struct device_attribute *attr, \
> + const char *buf, size_t count) \
> + { \
> + struct pci_dev *pdev = to_pci_dev(dev); \
> + int burst; \
> + \
> + if (!capable(CAP_SYS_ADMIN)) \
> + return -EPERM; \
> + \
> + if (kstrtoint(buf, 0, &burst) < 0) \
> + return -EINVAL; \
> + \
> + pdev->aer_info->ratelimit.burst = burst; \
> + \
> + return count; \
> + } \
> + static DEVICE_ATTR_RW(name);
> +
> +#define aer_ratelimit_attrs(name) \
> + aer_ratelimit_interval_attr(name##_ratelimit_interval_ms, \
> + name##_ratelimit) \
> + aer_ratelimit_burst_attr(name##_ratelimit_burst, \
> + name##_ratelimit)
> +
> +aer_ratelimit_attrs(correctable)
> +aer_ratelimit_attrs(nonfatal)
> +
> +static struct attribute *aer_attrs[] = {
> + &dev_attr_correctable_ratelimit_interval_ms.attr,
> + &dev_attr_correctable_ratelimit_burst.attr,
> + &dev_attr_nonfatal_ratelimit_interval_ms.attr,
> + &dev_attr_nonfatal_ratelimit_burst.attr,
> + NULL
> +};
> +
> +static umode_t aer_attrs_are_visible(struct kobject *kobj,
> + struct attribute *a, int n)
> +{
> + struct device *dev = kobj_to_dev(kobj);
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pdev->aer_info)
> + return 0;
> +
> + return a->mode;
> +}
> +
> +const struct attribute_group aer_attr_group = {
> + .name = "aer",
> + .attrs = aer_attrs,
> + .is_visible = aer_attrs_are_visible,
> +};
> +
> static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
> struct aer_err_info *info)
> {
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Thread overview: 38+ messages
2025-05-22 23:21 [PATCH v8 00/20] Rate limit AER logs Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 01/20] PCI/DPC: Initialize aer_err_info before using it Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 02/20] PCI/DPC: Log Error Source ID only when valid Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 03/20] PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error() Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 04/20] PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type() Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 05/20] PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 06/20] PCI/AER: Rename aer_print_port_info() to aer_print_source() Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 07/20] PCI/AER: Move aer_print_source() earlier in file Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 08/20] PCI/AER: Initialize aer_err_info before using it Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 09/20] PCI/AER: Simplify pci_print_aer() Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 10/20] PCI/AER: Update statistics before ratelimiting Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 11/20] PCI/AER: Trace error event " Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 12/20] PCI/AER: Check log level once and remember it Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 13/20] PCI/ERR: Add printk level to pcie_print_tlp_log() Bjorn Helgaas
2025-05-22 23:44 ` Sathyanarayanan Kuppuswamy
2025-05-23 9:56 ` Ilpo Järvinen
2025-05-28 6:38 ` Lukas Wunner
2025-05-28 10:00 ` Ilpo Järvinen
2025-05-22 23:21 ` [PATCH v8 14/20] PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNING Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 15/20] PCI/AER: Rename struct aer_stats to aer_info Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 16/20] PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index Bjorn Helgaas
2025-05-22 23:58 ` Sathyanarayanan Kuppuswamy
2025-05-23 11:13 ` Ilpo Järvinen
2025-05-23 16:12 ` Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 17/20] PCI/AER: Simplify add_error_device() Bjorn Helgaas
2025-05-22 23:57 ` Sathyanarayanan Kuppuswamy
2025-05-23 11:14 ` Ilpo Järvinen
2025-05-22 23:21 ` [PATCH v8 18/20] PCI/AER: Ratelimit correctable and non-fatal error logging Bjorn Helgaas
2025-05-22 23:56 ` Sathyanarayanan Kuppuswamy
2025-05-23 16:06 ` Bjorn Helgaas
2025-08-01 13:16 ` Breno Leitao
2025-08-01 13:35 ` Breno Leitao
2025-10-01 21:38 ` Bjorn Helgaas
2025-10-02 9:08 ` Breno Leitao
2025-05-22 23:21 ` [PATCH v8 19/20] PCI/AER: Add ratelimits to PCI AER Documentation Bjorn Helgaas
2025-05-22 23:21 ` [PATCH v8 20/20] PCI/AER: Add sysfs attributes for log ratelimits Bjorn Helgaas
2025-05-22 23:50 ` Sathyanarayanan Kuppuswamy [this message]
2025-05-23 16:21 ` [PATCH v8 00/20] Rate limit AER logs Bjorn Helgaas