From: Bjorn Helgaas <helgaas@kernel.org>
To: linux-pci@vger.kernel.org
Cc: "Jon Pan-Doh" <pandoh@google.com>,
"Karolina Stolarek" <karolina.stolarek@oracle.com>,
"Weinan Liu" <wnliu@google.com>,
"Martin Petersen" <martin.petersen@oracle.com>,
"Ben Fuller" <ben.fuller@oracle.com>,
"Drew Walton" <drewwalton@microsoft.com>,
"Anil Agrawal" <anilagrawal@meta.com>,
"Tony Luck" <tony.luck@intel.com>,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
"Sathyanarayanan Kuppuswamy"
<sathyanarayanan.kuppuswamy@linux.intel.com>,
"Lukas Wunner" <lukas@wunner.de>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
"Sargun Dhillon" <sargun@meta.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
"Oliver O'Halloran" <oohall@gmail.com>,
"Kai-Heng Feng" <kaihengf@nvidia.com>,
"Keith Busch" <kbusch@kernel.org>,
"Robert Richter" <rrichter@amd.com>,
"Terry Bowman" <terry.bowman@amd.com>,
"Shiju Jose" <shiju.jose@huawei.com>,
"Dave Jiang" <dave.jiang@intel.com>,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>
Subject: Re: [PATCH v7 17/17] PCI/AER: Add sysfs attributes for log ratelimits
Date: Thu, 22 May 2025 18:05:45 -0500 [thread overview]
Message-ID: <20250522230545.GA1496972@bhelgaas> (raw)
In-Reply-To: <20250520215047.1350603-18-helgaas@kernel.org>
On Tue, May 20, 2025 at 04:50:34PM -0500, Bjorn Helgaas wrote:
> From: Jon Pan-Doh <pandoh@google.com>
>
> Allow userspace to read/write log ratelimits per device (including
> enable/disable). Create aer/ sysfs directory to store them and any
> future aer configs.
> +PCIe AER ratelimits
> +-------------------
> +
> +These attributes show up under all the devices that are AER capable.
> +They represent configurable ratelimits of logs per error type.
> +
> +See Documentation/PCI/pcieaer-howto.rst for more info on ratelimits.
> +
> +What: /sys/bus/pci/devices/<dev>/aer/ratelimit_log_enable
> +Date: May 2025
> +KernelVersion: 6.16.0
> +Contact: linux-pci@vger.kernel.org, pandoh@google.com
> +Description: Writing 1/0 enables/disables AER log ratelimiting. Reading
> + gets whether or not AER ratelimiting is currently enabled.
> + Enabled by default.
> +
> +What: /sys/bus/pci/devices/<dev>/aer/ratelimit_burst_cor_log
> +Date: May 2025
> +KernelVersion: 6.16.0
> +Contact: linux-pci@vger.kernel.org, pandoh@google.com
> +Description: Ratelimit burst for correctable error logs. Writing a value
> + changes the number of errors (burst) allowed per interval
> + (5 second window) before ratelimiting. Reading gets the
> + current ratelimit burst.
> +
> +What: /sys/bus/pci/devices/<dev>/aer/ratelimit_burst_uncor_log
> +Date: May 2025
> +KernelVersion: 6.16.0
> +Contact: linux-pci@vger.kernel.org, pandoh@google.com
> +Description: Ratelimit burst for non-fatal uncorrectable error logs.
> + Writing a value changes the number of errors (burst)
> + allowed per interval (5 second window) before ratelimiting.
> + Reading gets the current ratelimit burst.
I propose that we rename these to:
aer/correctable_ratelimit_burst
aer/nonfatal_ratelimit_burst
I think "log" is probably unnecessary, and the error type part
should correspond with the existing entries (unfortunately not in the
"aer" directory):
aer_dev_correctable
aer_dev_nonfatal
and making "ratelimit_burst" a suffix will make it correspond to
these:
fs/ext4/sysfs.c: ATTR_LIST(err_ratelimit_burst),
fs/ext4/sysfs.c: ATTR_LIST(warning_ratelimit_burst),
fs/ext4/sysfs.c: ATTR_LIST(msg_ratelimit_burst),
lib/fault-inject.c:FAULT_CONFIGFS_ATTR_NAMED(ratelimit_burst, "verbose_ratelimit_burst",
Also, I think we should split up "aer/ratelimit_log_enable" so we can
do it separately for correctable and nonfatal errors and name as:
aer/correctable_ratelimit_interval_ms
aer/nonfatal_ratelimit_interval_ms
so we can set the interval as well as the burst and to match these:
fs/ext4/sysfs.c: ATTR_LIST(err_ratelimit_interval_ms),
fs/ext4/sysfs.c: ATTR_LIST(warning_ratelimit_interval_ms),
fs/ext4/sysfs.c: ATTR_LIST(msg_ratelimit_interval_ms),
lib/fault-inject.c:FAULT_CONFIGFS_ATTR_NAMED(ratelimit_interval, "verbose_ratelimit_interval_ms",
I'll post the series as a whole again, but here's the interdiff
related to this particular change:
diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
index 01bb577bfee8..5ed284523956 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
@@ -126,28 +126,38 @@ They represent configurable ratelimits of logs per error type.
See Documentation/PCI/pcieaer-howto.rst for more info on ratelimits.
-What: /sys/bus/pci/devices/<dev>/aer/ratelimit_log_enable
+What: /sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_interval_ms
Date: May 2025
KernelVersion: 6.16.0
-Contact: linux-pci@vger.kernel.org, pandoh@google.com
-Description: Writing 1/0 enables/disables AER log ratelimiting. Reading
- gets whether or not AER ratelimiting is currently enabled.
- Enabled by default.
+Contact: linux-pci@vger.kernel.org
+Description: Writing 0 disables AER correctable error log ratelimiting.
+ Writing a positive value sets the ratelimit interval in ms.
+ Default is DEFAULT_RATELIMIT_INTERVAL (5000 ms).
-What: /sys/bus/pci/devices/<dev>/aer/ratelimit_burst_cor_log
+What: /sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_burst
Date: May 2025
KernelVersion: 6.16.0
-Contact: linux-pci@vger.kernel.org, pandoh@google.com
+Contact: linux-pci@vger.kernel.org
Description: Ratelimit burst for correctable error logs. Writing a value
changes the number of errors (burst) allowed per interval
- (5 second window) before ratelimiting. Reading gets the
- current ratelimit burst.
+ before ratelimiting. Reading gets the current ratelimit
+ burst. Default is DEFAULT_RATELIMIT_BURST (10).
-What: /sys/bus/pci/devices/<dev>/aer/ratelimit_burst_uncor_log
+What: /sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_interval_ms
Date: May 2025
KernelVersion: 6.16.0
-Contact: linux-pci@vger.kernel.org, pandoh@google.com
+Contact: linux-pci@vger.kernel.org
+Description: Writing 0 disables AER non-fatal uncorrectable error log
+ ratelimiting. Writing a positive value sets the ratelimit
+ interval in ms. Default is DEFAULT_RATELIMIT_INTERVAL
+ (5000 ms).
+
+What: /sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_burst
+Date: May 2025
+KernelVersion: 6.16.0
+Contact: linux-pci@vger.kernel.org
Description: Ratelimit burst for non-fatal uncorrectable error logs.
Writing a value changes the number of errors (burst)
- allowed per interval (5 second window) before ratelimiting.
- Reading gets the current ratelimit burst.
+ allowed per interval before ratelimiting. Reading gets the
+ current ratelimit burst. Default is DEFAULT_RATELIMIT_BURST
+ (10).
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 48014010dc8b..6c331695af58 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -91,8 +91,8 @@ struct aer_info {
u64 rootport_total_nonfatal_errs;
/* Ratelimits for errors */
- struct ratelimit_state cor_log_ratelimit;
- struct ratelimit_state uncor_log_ratelimit;
+ struct ratelimit_state correctable_ratelimit;
+ struct ratelimit_state nonfatal_ratelimit;
};
#define AER_LOG_TLP_MASKS (PCI_ERR_UNC_POISON_TLP| \
@@ -384,9 +384,9 @@ void pci_aer_init(struct pci_dev *dev)
dev->aer_info = kzalloc(sizeof(*dev->aer_info), GFP_KERNEL);
- ratelimit_state_init(&dev->aer_info->cor_log_ratelimit,
+ ratelimit_state_init(&dev->aer_info->correctable_ratelimit,
DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
- ratelimit_state_init(&dev->aer_info->uncor_log_ratelimit,
+ ratelimit_state_init(&dev->aer_info->nonfatal_ratelimit,
DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
/*
@@ -628,45 +628,44 @@ const struct attribute_group aer_stats_attr_group = {
};
/*
- * Ratelimit enable toggle
- * 0: disabled with ratelimit.interval = 0
- * 1: enabled with ratelimit.interval = nonzero
+ * Ratelimit interval
+ * <=0: disabled with ratelimit.interval = 0
+ * >0: enabled with ratelimit.interval in ms
*/
-static ssize_t ratelimit_log_enable_show(struct device *dev,
- struct device_attribute *attr,
- char *buf)
-{
- struct pci_dev *pdev = to_pci_dev(dev);
- bool enabled = pdev->aer_info->cor_log_ratelimit.interval != 0;
-
- return sysfs_emit(buf, "%d\n", enabled);
-}
-
-static ssize_t ratelimit_log_enable_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct pci_dev *pdev = to_pci_dev(dev);
- bool enable;
- int interval;
-
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
-
- if (kstrtobool(buf, &enable) < 0)
- return -EINVAL;
-
- if (enable)
- interval = DEFAULT_RATELIMIT_INTERVAL;
- else
- interval = 0;
-
- pdev->aer_info->cor_log_ratelimit.interval = interval;
- pdev->aer_info->uncor_log_ratelimit.interval = interval;
-
- return count;
-}
-static DEVICE_ATTR_RW(ratelimit_log_enable);
+#define aer_ratelimit_interval_attr(name, ratelimit) \
+ static ssize_t \
+ name##_show(struct device *dev, struct device_attribute *attr, \
+ char *buf) \
+ { \
+ struct pci_dev *pdev = to_pci_dev(dev); \
+ \
+ return sysfs_emit(buf, "%d\n", \
+ pdev->aer_info->ratelimit.interval); \
+ } \
+ \
+ static ssize_t \
+ name##_store(struct device *dev, struct device_attribute *attr, \
+ const char *buf, size_t count) \
+ { \
+ struct pci_dev *pdev = to_pci_dev(dev); \
+ int interval; \
+ \
+ if (!capable(CAP_SYS_ADMIN)) \
+ return -EPERM; \
+ \
+ if (kstrtoint(buf, 0, &interval) < 0) \
+ return -EINVAL; \
+ \
+ if (interval <= 0) \
+ interval = 0; \
+ else \
+ interval = msecs_to_jiffies(interval); \
+ \
+ pdev->aer_info->ratelimit.interval = interval; \
+ \
+ return count; \
+ } \
+ static DEVICE_ATTR_RW(name);
#define aer_ratelimit_burst_attr(name, ratelimit) \
static ssize_t \
@@ -696,15 +695,22 @@ static DEVICE_ATTR_RW(ratelimit_log_enable);
\
return count; \
} \
- static DEVICE_ATTR_RW(name)
+ static DEVICE_ATTR_RW(name);
-aer_ratelimit_burst_attr(ratelimit_burst_cor_log, cor_log_ratelimit);
-aer_ratelimit_burst_attr(ratelimit_burst_uncor_log, uncor_log_ratelimit);
+#define aer_ratelimit_attrs(name) \
+ aer_ratelimit_interval_attr(name##_ratelimit_interval_ms, \
+ name##_ratelimit) \
+ aer_ratelimit_burst_attr(name##_ratelimit_burst, \
+ name##_ratelimit)
+
+aer_ratelimit_attrs(correctable)
+aer_ratelimit_attrs(nonfatal)
static struct attribute *aer_attrs[] = {
- &dev_attr_ratelimit_log_enable.attr,
- &dev_attr_ratelimit_burst_cor_log.attr,
- &dev_attr_ratelimit_burst_uncor_log.attr,
+ &dev_attr_correctable_ratelimit_interval_ms.attr,
+ &dev_attr_correctable_ratelimit_burst.attr,
+ &dev_attr_nonfatal_ratelimit_interval_ms.attr,
+ &dev_attr_nonfatal_ratelimit_burst.attr,
NULL
};
@@ -786,9 +792,9 @@ static int aer_ratelimit(struct pci_dev *dev, unsigned int severity)
return 1; /* AER_FATAL not ratelimited */
if (severity == AER_CORRECTABLE)
- ratelimit = &dev->aer_info->cor_log_ratelimit;
+ ratelimit = &dev->aer_info->correctable_ratelimit;
else
- ratelimit = &dev->aer_info->uncor_log_ratelimit;
+ ratelimit = &dev->aer_info->nonfatal_ratelimit;
return __ratelimit(ratelimit);
}
next prev parent reply other threads:[~2025-05-22 23:05 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-20 21:50 [PATCH v7 00/17] Rate limit AER logs Bjorn Helgaas
2025-05-20 21:50 ` [PATCH v7 01/17] PCI/DPC: Initialize aer_err_info before using it Bjorn Helgaas
2025-05-21 8:52 ` Jonathan Cameron
2025-05-21 19:18 ` Bjorn Helgaas
2025-05-21 10:06 ` Ilpo Järvinen
2025-05-20 21:50 ` [PATCH v7 02/17] PCI/DPC: Log Error Source ID only when valid Bjorn Helgaas
2025-05-21 9:00 ` Jonathan Cameron
2025-05-21 19:23 ` Bjorn Helgaas
2025-05-21 10:09 ` Ilpo Järvinen
2025-05-20 21:50 ` [PATCH v7 03/17] PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error() Bjorn Helgaas
2025-05-20 22:26 ` Sathyanarayanan Kuppuswamy
2025-05-21 9:15 ` Jonathan Cameron
2025-05-21 10:12 ` Ilpo Järvinen
2025-05-20 21:50 ` [PATCH v7 04/17] PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type() Bjorn Helgaas
2025-05-20 22:27 ` Sathyanarayanan Kuppuswamy
2025-05-21 9:20 ` Jonathan Cameron
2025-05-21 19:39 ` Bjorn Helgaas
2025-05-21 10:14 ` Ilpo Järvinen
2025-05-20 21:50 ` [PATCH v7 05/17] PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc Bjorn Helgaas
2025-05-21 9:21 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 06/17] PCI/AER: Rename aer_print_port_info() to aer_print_source() Bjorn Helgaas
2025-05-21 9:22 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 07/17] PCI/AER: Move aer_print_source() earlier in file Bjorn Helgaas
2025-05-21 9:23 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 08/17] PCI/AER: Initialize aer_err_info before using it Bjorn Helgaas
2025-05-21 9:24 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 09/17] PCI/AER: Simplify pci_print_aer() Bjorn Helgaas
2025-05-20 22:29 ` Sathyanarayanan Kuppuswamy
2025-05-21 9:27 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 10/17] PCI/AER: Update statistics early in logging Bjorn Helgaas
2025-05-21 9:43 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 11/17] PCI/AER: Combine trace_aer_event() with statistics updates Bjorn Helgaas
2025-05-21 9:46 ` Jonathan Cameron
2025-05-21 17:12 ` Bjorn Helgaas
2025-05-20 21:50 ` [PATCH v7 12/17] PCI/AER: Check log level once and remember it Bjorn Helgaas
2025-05-21 9:52 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 13/17] PCI/AER: Make all pci_print_aer() log levels depend on error type Bjorn Helgaas
2025-05-21 9:56 ` Jonathan Cameron
2025-05-21 17:45 ` Bjorn Helgaas
2025-05-20 21:50 ` [PATCH v7 14/17] PCI/AER: Rename struct aer_stats to aer_info Bjorn Helgaas
2025-05-21 9:59 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 15/17] PCI/AER: Ratelimit correctable and non-fatal error logging Bjorn Helgaas
2025-05-20 22:33 ` Sathyanarayanan Kuppuswamy
2025-05-21 23:06 ` Bjorn Helgaas
2025-05-21 10:24 ` Ilpo Järvinen
2025-05-21 10:31 ` Jonathan Cameron
2025-05-21 22:54 ` Bjorn Helgaas
2025-05-22 11:17 ` Jonathan Cameron
2025-05-20 21:50 ` [PATCH v7 16/17] PCI/AER: Add ratelimits to PCI AER Documentation Bjorn Helgaas
2025-05-20 21:50 ` [PATCH v7 17/17] PCI/AER: Add sysfs attributes for log ratelimits Bjorn Helgaas
2025-05-20 22:35 ` Sathyanarayanan Kuppuswamy
2025-05-21 10:46 ` Jonathan Cameron
2025-05-21 11:05 ` Ilpo Järvinen
2025-05-21 22:59 ` Bjorn Helgaas
2025-05-22 11:21 ` Jonathan Cameron
2025-05-22 23:05 ` Bjorn Helgaas [this message]
2025-05-21 23:14 ` [PATCH v7 00/17] Rate limit AER logs Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250522230545.GA1496972@bhelgaas \
--to=helgaas@kernel.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=anilagrawal@meta.com \
--cc=ben.fuller@oracle.com \
--cc=bhelgaas@google.com \
--cc=dave.jiang@intel.com \
--cc=drewwalton@microsoft.com \
--cc=ilpo.jarvinen@linux.intel.com \
--cc=kaihengf@nvidia.com \
--cc=karolina.stolarek@oracle.com \
--cc=kbusch@kernel.org \
--cc=kwilczynski@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=martin.petersen@oracle.com \
--cc=oohall@gmail.com \
--cc=pandoh@google.com \
--cc=paulmck@kernel.org \
--cc=rrichter@amd.com \
--cc=sargun@meta.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
--cc=tony.luck@intel.com \
--cc=wnliu@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.