From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Jon Pan-Doh <pandoh@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
Karolina Stolarek <karolina.stolarek@oracle.com>,
<linux-pci@vger.kernel.org>,
Martin Petersen <martin.petersen@oracle.com>,
Ben Fuller <ben.fuller@oracle.com>,
"Drew Walton" <drewwalton@microsoft.com>,
Anil Agrawal <anilagrawal@meta.com>,
Tony Luck <tony.luck@intel.com>
Subject: Re: [PATCH 6/8] PCI/AER: Add AER sysfs attributes for ratelimits
Date: Fri, 31 Jan 2025 14:32:46 +0000 [thread overview]
Message-ID: <20250131143246.000037a2@huawei.com> (raw)
In-Reply-To: <20250115074301.3514927-7-pandoh@google.com>
On Tue, 14 Jan 2025 23:42:58 -0800
Jon Pan-Doh <pandoh@google.com> wrote:
> Allow userspace to read/write ratelimits per device. Create aer/ sysfs
> directory to store them and any future aer configs.
>
> Tested using aer-inject[1] tool. Configured correctable log/irq ratelimits
> to 5/10 respectively. Sent 12 AER errors. Observed 5 errors logged while
> AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 11
> (1 masked).
>
> Before: CEMsk: BadTLP-
> After: CEMsk: BadTLP+.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
>
> Signed-off-by: Jon Pan-Doh <pandoh@google.com>
> ---
> .../testing/sysfs-bus-pci-devices-aer_stats | 32 +++++++++++
> Documentation/PCI/pcieaer-howto.rst | 4 +-
> drivers/pci/pci-sysfs.c | 1 +
> drivers/pci/pci.h | 1 +
> drivers/pci/pcie/aer.c | 56 +++++++++++++++++++
> 5 files changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> index d1f67bb81d5d..c680a53af0f4 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
> @@ -117,3 +117,35 @@ Date: July 2018
> KernelVersion: 4.19.0
> Contact: linux-pci@vger.kernel.org, rajatja@google.com
> Description: Total number of ERR_NONFATAL messages reported to rootport.
> +
> +PCIe AER ratelimits
> +----------------------------
Purely from an rst formatting point of view (this gets picked up in the docs
build) --- should extend I think only under the text.
> +
> +These attributes show up under all the devices that are AER capable. Provides
> +configurable ratelimits of logs/irq per error type. Writing a nonzero value
> +changes the number of errors (burst) allowed per 5 second window before
> +ratelimiting. Reading gets the current ratelimits.
> +
> +What: /sys/bus/pci/devices/<dev>/aer/ratelimit_cor_irq
> +Date: Jan 2025
> +KernelVersion: 6.14.0
> +Contact: linux-pci@vger.kernel.org, pandoh@google.com
> +Description: IRQ ratelimit for correctable errors.
I would add enough info that we can understand what values to write and what
to read to each description. This doc didn't lead me to the comment below
and it should have done...
> +
> +What: /sys/bus/pci/devices/<dev>/aer/ratelimit_uncor_irq
> +Date: Jan 2025
> +KernelVersion: 6.14.0
> +Contact: linux-pci@vger.kernel.org, pandoh@google.com
> +Description: IRQ ratelimit for uncorrectable errors.
> +
> +What: /sys/bus/pci/devices/<dev>/aer/ratelimit_cor_log
> +Date: Jan 2025
> +KernelVersion: 6.14.0
> +Contact: linux-pci@vger.kernel.org, pandoh@google.com
> +Description: Log ratelimit for correctable errors.
> +
> +What: /sys/bus/pci/devices/<dev>/aer/ratelimit_uncor_log
> +Date: Jan 2025
> +KernelVersion: 6.14.0
> +Contact: linux-pci@vger.kernel.org, pandoh@google.com
> +Description: Log ratelimit for uncorrectable errors.
> diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
> index d41272504b18..4d5b0638f120 100644
> --- a/Documentation/PCI/pcieaer-howto.rst
> +++ b/Documentation/PCI/pcieaer-howto.rst
> @@ -89,7 +89,9 @@ AER Ratelimits
> -------------------------
>
> Errors, both at log and IRQ level, are ratelimited per device and error type.
> -This prevents spammy devices from stalling execution.
> +This prevents spammy devices from stalling execution. Ratelimits are exposed
> +in the form of sysfs attributes and configurable. See
> +Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
>
> AER Statistics / Counters
> -------------------------
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 7679d75d71e5..41acb6713e2d 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1693,6 +1693,7 @@ const struct attribute_group *pci_dev_attr_groups[] = {
> &pcie_dev_attr_group,
> #ifdef CONFIG_PCIEAER
> &aer_stats_attr_group,
> + &aer_attr_group,
> #endif
> #ifdef CONFIG_PCIEASPM
> &aspm_ctrl_attr_group,
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 2e40fc63ba31..9d0272a890ef 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -881,6 +881,7 @@ void pci_no_aer(void);
> void pci_aer_init(struct pci_dev *dev);
> void pci_aer_exit(struct pci_dev *dev);
> extern const struct attribute_group aer_stats_attr_group;
> +extern const struct attribute_group aer_attr_group;
> void pci_aer_clear_fatal_status(struct pci_dev *dev);
> int pci_aer_clear_status(struct pci_dev *dev);
> int pci_aer_raw_clear_status(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1db70ae87f52..e48e2951baae 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -630,6 +630,62 @@ const struct attribute_group aer_stats_attr_group = {
> .is_visible = aer_stats_attrs_are_visible,
> };
>
> +#define aer_ratelimit_attr(name, ratelimit) \
> + static ssize_t \
> + name##_show(struct device *dev, struct device_attribute *attr, \
> + char *buf) \
> +{ \
> + struct pci_dev *pdev = to_pci_dev(dev); \
> + return sysfs_emit(buf, "%u errors every %u seconds\n", \
> + pdev->aer_info->ratelimit.burst, \
> + pdev->aer_info->ratelimit.interval / HZ); \
Usual rule of thumb is you need a very strong reason to read something
different from what was written to a sysfs file.
I think your interval is fixed? If so name the files so that is apparent
and just have a count returned here. Or if you want to future proof
add a read only ratelimit_period_secs file that prints 5
ratelimit_in_5secs_uncor_log etc.
> +} \
> + \
> + static ssize_t \
> + name##_store(struct device *dev, struct device_attribute *attr, \
> + const char *buf, size_t count) \
> +{ \
> + struct pci_dev *pdev = to_pci_dev(dev); \
> + int burst; \
> + \
> + if (kstrtoint(buf, 0, &burst) < 0) \
> + return -EINVAL; \
> + \
> + pdev->aer_info->ratelimit.burst = burst; \
> + return count; \
> +} \
> +static DEVICE_ATTR_RW(name)
next prev parent reply other threads:[~2025-01-31 14:32 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-15 7:42 [PATCH 0/8] Rate limit AER logs/IRQs Jon Pan-Doh
2025-01-15 7:42 ` [PATCH 1/8] PCI/AER: Remove aer_print_port_info Jon Pan-Doh
2025-01-16 14:27 ` Karolina Stolarek
2025-01-18 1:57 ` Jon Pan-Doh
2025-01-20 9:25 ` Karolina Stolarek
2025-02-12 23:20 ` Jon Pan-Doh
2025-01-21 14:18 ` Ilpo Järvinen
2025-02-12 23:20 ` Jon Pan-Doh
2025-01-25 4:15 ` Sathyanarayanan Kuppuswamy
2025-02-12 23:20 ` Jon Pan-Doh
2025-01-15 7:42 ` [PATCH 2/8] PCI/AER: Move AER stat collection out of __aer_print_error Jon Pan-Doh
2025-01-16 14:47 ` Karolina Stolarek
2025-01-18 1:57 ` Jon Pan-Doh
2025-01-25 4:37 ` Sathyanarayanan Kuppuswamy
2025-02-12 23:20 ` Jon Pan-Doh
2025-01-15 7:42 ` [PATCH 3/8] PCI/AER: Rename struct aer_stats to aer_info Jon Pan-Doh
2025-01-16 10:11 ` Karolina Stolarek
2025-01-18 1:59 ` Jon Pan-Doh
2025-01-20 10:13 ` Karolina Stolarek
2025-02-12 23:20 ` Jon Pan-Doh
2025-01-15 7:42 ` [PATCH 4/8] PCI/AER: Introduce ratelimit for error logs Jon Pan-Doh
2025-01-16 11:11 ` Karolina Stolarek
2025-01-18 1:59 ` Jon Pan-Doh
2025-01-20 10:25 ` Karolina Stolarek
2025-01-15 7:42 ` [PATCH 5/8] PCI/AER: Introduce ratelimit for AER IRQs Jon Pan-Doh
2025-01-16 12:02 ` Karolina Stolarek
2025-01-18 1:58 ` Jon Pan-Doh
2025-01-20 10:38 ` Karolina Stolarek
2025-01-25 7:39 ` Lukas Wunner
2025-01-31 14:43 ` Jonathan Cameron
2025-03-04 23:42 ` Jon Pan-Doh
2025-02-06 13:56 ` Karolina Stolarek
2025-02-06 20:25 ` Lukas Wunner
2025-01-31 14:55 ` Jonathan Cameron
2025-03-04 23:42 ` Jon Pan-Doh
2025-01-15 7:42 ` [PATCH 6/8] PCI/AER: Add AER sysfs attributes for ratelimits Jon Pan-Doh
2025-01-31 14:32 ` Jonathan Cameron [this message]
2025-02-28 23:11 ` Jon Pan-Doh
2025-01-15 7:42 ` [PATCH 7/8] PCI/AER: Update AER sysfs ABI filename Jon Pan-Doh
2025-01-15 7:43 ` [PATCH 8/8] PCI/AER: Move AER sysfs attributes into separate directory Jon Pan-Doh
2025-01-16 10:26 ` Karolina Stolarek
2025-01-16 17:18 ` Rajat Jain
2025-01-31 14:36 ` Jonathan Cameron
2025-02-12 23:19 ` Jon Pan-Doh
2025-01-23 15:18 ` [PATCH 0/8] Rate limit AER logs/IRQs Bowman, Terry
2025-01-24 6:46 ` Jon Pan-Doh
2025-01-25 7:59 ` Lukas Wunner
2025-02-06 13:32 ` Karolina Stolarek
2025-02-12 23:19 ` Jon Pan-Doh
2025-02-13 16:00 ` Karolina Stolarek
2025-02-14 2:49 ` Jon Pan-Doh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250131143246.000037a2@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=anilagrawal@meta.com \
--cc=ben.fuller@oracle.com \
--cc=bhelgaas@google.com \
--cc=drewwalton@microsoft.com \
--cc=karolina.stolarek@oracle.com \
--cc=linux-pci@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=pandoh@google.com \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox