[PATCH RESEND 0/4] Rate limit reporting of Correctable Errors

public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors
@ 2025-01-08 13:55 Karolina Stolarek
  2025-01-08 13:55 ` [PATCH RESEND 1/4] PCI/AER: Use the same log level for all messages Karolina Stolarek
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Karolina Stolarek @ 2025-01-08 13:55 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Martin Petersen, Ben Fuller

This is a resend of RFC, all patches apply cleanly on the top of
pci-next. I adjusted the series title, there are no functional
changes.

TL;DR
====

We are getting multiple reports about excessive logging of Correctable
Errors with no clear common root cause. As these errors are already
corrected by hardware, it makes sense to limit them. Introduce
a ratelimit state definition to pci_dev to control the number of
messages reported by a Root Port within a specified time interval.
The series adds other improvements in the area, as outlined in the
Proposal section.

Introduction
============

PCIe devices are expected to detect and report errors related to
transactions they participate in. If an error is discovered,
the device creates an Error Message and sends it to the Root Port
which, in turn, generates an interrupt upon receiving it. The
AER driver services such an interrupt by logging the error and
performing recovery work if needed. This means that every time
an error is detected, potentially every transaction, it gets
reported. But not all errors are of the same significance. We can
categorize them into errors from which the hardware can recover
(Correctable), and errors that require software intervention and
impact the functionality of the interface (Uncorrectable). While
informing about the latter is essential, reports of Correctable
Errors are purely informational. Such logs can helpful in monitoring
the stability of the link but if such errors are reported too
frequently, they pollute the logs and overshadow failures that require
our attention.

Background
==========

The excessive logging of AER Correctable Errors has been reported
many times in the past, across different devices, with no obvious
common root cause: [1], [2], [3] and more. There is no easy way to
control the rate of reported errors, so the end users would go for
workarounds, such as disabling AER feature, or creating scripts to
directly disable generating messages for Correctable Errors per
device[4]. In the majority of cases, lowering the rate at which
such errors are reported, should be enough to elevate the symptoms.

Previous Work
=============

The series submitted by Grant Grundler[5] attempted to rate limit
messages by using a ratelimited variant of pci_info wrapper. Although
this change lowers the volume of specific prinkts, the rate limiting
does not work globally, and messages could appear out of sync.
The desired solution is to introduce a manually managed ratelimit_state,
which would be used every time a Correctable Error is signaled.

Proposal
========

To overcome the limitations of per-printk-call state, this series
introduces a ratelimit state definition and uses it to decide if
a registered Correctable Error should be reported. Initially, we
allow to report one error per 2 seconds, and extend that interval
to 30 seconds after reaching the threshold of 1000 Correctable Errors
on a Root Port. The values proposed here are arbitrary and up
for discussion.

If all of this still does not help, we give an option to silence
the messages by exposing a bit in Device Control Register responsible
for Correctable Error Message generation as a sysfs attribute.

The series introduces changes as follows:

  1) Use the same log level across printks when reporting an AER Error

  2) Add a ratelimit_state definition to pci_dev struct to maintain
     the ratelimit state between interrupts. Check it to decide if
     a Correctable Error can be reported

  3) Extend the interval between reported Correctable Errors after
     reaching a threshold. Make this decision based on the value of
     aer_rootport_total_err_cor counter. Add a flag to pci_dev to signal
     this event to avoid continuous updates of the ratelimit interval

  4) Add cor_errs_reporting_enable sysfs attribute to read and write
     the bit controlling the generation of Correctable Error messages
     by the device.

Patches 1-3 depend one on another, patch 4 is a standalone change.

This solution requires modification to pci_dev struct, which can be
deemed invasive. Still, this is the only place capable of maintaining
a stateful ratelimit. The other structure used by aer_isr_one_error,
a function that processes the signaled error, is populated and reused
on each serviced interrupt. An alternative solution would be to have
one global rate limit state in the aforementioned function, but this
definition could be easily abused by one noisy device. It would hit
limit in no time, leaving Correctable Errors, even coming from other
Root Ports, unreported.

Having said that, I am more than happy to discuss other ways of
getting Correctable errors under control.

Results
=======

The patches were tested with aer-inject tool as well as with actual
hardware, Samsung PM1733 NVMe. The disk would generate high volumes of
Correctable Error messages, sometimes as soon as an IRQ was allocated
for it:

[   20.662657] pcieport 0000:00:01.3: AER: Multiple Corrected error
message received from 0000:03:00.0
[   20.677779] pci 0000:03:00.0: PCIe Bus Error: severity=Corrected,
type=Physical Layer, (Receiver ID)
[   20.693003] pci 0000:03:00.0:   device [144d:a824] error
status/mask=00000001/00000000
[   20.706861] pci 0000:03:00.0:    [ 0] RxErr
[   20.718171] pcieport 0000:00:01.3: AER: Multiple Corrected error
message received from 0000:03:00.0
[   20.733303] pci 0000:03:00.0: PCIe Bus Error: severity=Corrected,
type=Physical Layer, (Receiver ID)
[   20.748545] pci 0000:03:00.0:   device [144d:a824] error
status/mask=00000001/00000000
[   20.762421] pci 0000:03:00.0:    [ 0] RxErr
[   20.773733] pcieport 0000:00:01.3: AER: Multiple Corrected error
message received from 0000:03:00.0
[   20.788874] pci 0000:03:00.0: PCIe Bus Error: severity=Corrected,
type=Physical Layer, (Receiver ID)
[   20.804130] pci 0000:03:00.0:   device [144d:a824] error
status/mask=00000001/00000000
[   20.818026] pci 0000:03:00.0:    [ 0] RxErr
[   20.829359] pcieport 0000:00:01.3: AER: Corrected error message
received from 0000:03:00.0
[   20.843615] pci 0000:03:00.0: PCIe Bus Error: severity=Corrected,
type=Physical Layer, (Receiver ID)
[   20.858868] pci 0000:03:00.0:   device [144d:a824] error
status/mask=00000001/00000000
[   20.872752] pci 0000:03:00.0:    [ 0] RxErr

With the basic rate limiting, the dmesg is still populated at the
high rate:

[  248.871792] pcieport 0000:00:01.3: Correctable error message received
from 0000:03:00.0
[  248.871805] nvme 0000:03:00.0: PCIe Bus Error: severity=Correctable,
type=Physical Layer, (Receiver ID)
[  248.871808] nvme 0000:03:00.0:   device [144d:a824] error
status/mask=00000001/00000000
[  248.871811] nvme 0000:03:00.0:    [ 0] RxErr                  (First)
[  251.037039] report_aer_cor_err: 7 callbacks suppressed
[  251.037048] pcieport 0000:00:01.3: Correctable error message received
from 0000:03:00.0
[  251.037061] nvme 0000:03:00.0: PCIe Bus Error: severity=Correctable,
type=Physical Layer, (Receiver ID)
[  251.037064] nvme 0000:03:00.0:   device [144d:a824] error
status/mask=00000001/00000000
[  251.037067] nvme 0000:03:00.0:    [ 0] RxErr                  (First)
[  253.272609] report_aer_cor_err: 5 callbacks suppressed

After the threshold of Correctable Errors is reached, the rate limit
interval is updated:

[  342.528507] pcieport 0000:00:01.3: AER: Over 1000 Correctable Errors
reported, increasing the rate limit
[  371.958275] report_aer_cor_err: 94 callbacks suppressed
[  371.958284] pcieport 0000:00:01.3: Correctable error message received
from 0000:03:00.0
[  371.958297] nvme 0000:03:00.0: PCIe Bus Error: severity=Correctable,
type=Physical Layer, (Receiver ID)
[  371.958300] nvme 0000:03:00.0:   device [144d:a824] error
status/mask=00000001/00000000
[  371.958302] nvme 0000:03:00.0:    [ 0] RxErr                  (First)
[  402.018141] report_aer_cor_err: 89 callbacks suppressed

----------------------
[1] - https://bugzilla.kernel.org/show_bug.cgi?id=215027
[2] - https://bugzilla.kernel.org/show_bug.cgi?id=201517
[3] - https://bugzilla.kernel.org/show_bug.cgi?id=196183
[4] - https://gist.github.com/flisboac/5a0711201311b63d23b292110bb383cd
[5] -
https://lore.kernel.org/linux-pci/20230606035442.2886343-2-grundler@chromium.org/

Karolina Stolarek (4):
  PCI/AER: Use the same log level for all messages
  PCI/AER: Add Correctable Errors rate limiting
  PCI/AER: Increase the rate limit interval after threshold
  PCI: Add 'cor_err_reporting_enable' attribute

 Documentation/ABI/testing/sysfs-bus-pci |   7 ++
 drivers/pci/pci-sysfs.c                 |  42 +++++++++
 drivers/pci/pci.h                       |   2 +-
 drivers/pci/pcie/aer.c                  | 108 ++++++++++++++++--------
 drivers/pci/pcie/dpc.c                  |   2 +-
 include/linux/pci.h                     |   2 +
 6 files changed, 127 insertions(+), 36 deletions(-)

-- 
2.43.5

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RESEND 1/4] PCI/AER: Use the same log level for all messages
  2025-01-08 13:55 [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Karolina Stolarek
@ 2025-01-08 13:55 ` Karolina Stolarek
  2025-01-15  7:50   ` Jon Pan-Doh
  2025-01-08 13:55 ` [PATCH RESEND 2/4] PCI/AER: Add Correctable Errors rate limiting Karolina Stolarek
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Karolina Stolarek @ 2025-01-08 13:55 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Martin Petersen, Ben Fuller

When reporting an AER error, we check its type multiple times
to determine the log level for each message. Do this check only
in the top-level function and propagate the result down the call
chain. Make aer_print_port_info output to match the level of the
reported error.

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
---
 drivers/pci/pci.h      |  2 +-
 drivers/pci/pcie/aer.c | 64 ++++++++++++++++++++++--------------------
 drivers/pci/pcie/dpc.c |  2 +-
 3 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2e40fc63ba31..139ea4f01448 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -546,7 +546,7 @@ struct aer_err_info {
 };
 
 int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
-void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
+void aer_print_error(struct pci_dev *dev, struct aer_err_info *info, const char *level);
 #endif	/* CONFIG_PCIEAER */
 
 #ifdef CONFIG_PCIEPORTBUS
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 80c5ba8d8296..b13690fd172f 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -672,20 +672,18 @@ static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
 }
 
 static void __aer_print_error(struct pci_dev *dev,
-			      struct aer_err_info *info)
+			      struct aer_err_info *info,
+			      const char *level)
 {
 	const char **strings;
 	unsigned long status = info->status & ~info->mask;
-	const char *level, *errmsg;
+	const char *errmsg;
 	int i;
 
-	if (info->severity == AER_CORRECTABLE) {
+	if (info->severity == AER_CORRECTABLE)
 		strings = aer_correctable_error_string;
-		level = KERN_WARNING;
-	} else {
+	else
 		strings = aer_uncorrectable_error_string;
-		level = KERN_ERR;
-	}
 
 	for_each_set_bit(i, &status, 32) {
 		errmsg = strings[i];
@@ -698,11 +696,11 @@ static void __aer_print_error(struct pci_dev *dev,
 	pci_dev_aer_stats_incr(dev, info);
 }
 
-void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
+void aer_print_error(struct pci_dev *dev, struct aer_err_info *info,
+		     const char *level)
 {
 	int layer, agent;
 	int id = pci_dev_id(dev);
-	const char *level;
 
 	if (!info->status) {
 		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
@@ -713,8 +711,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
 	agent = AER_GET_AGENT(info->severity, info->status);
 
-	level = (info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR;
-
 	pci_printk(level, dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
 		   aer_error_severity_string[info->severity],
 		   aer_error_layer[layer], aer_agent_string[agent]);
@@ -722,7 +718,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	pci_printk(level, dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
 		   dev->vendor, dev->device, info->status, info->mask);
 
-	__aer_print_error(dev, info);
+	__aer_print_error(dev, info, level);
 
 	if (info->tlp_header_valid)
 		__print_tlp_header(dev, &info->tlp);
@@ -735,16 +731,17 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 			info->severity, info->tlp_header_valid, &info->tlp);
 }
 
-static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
+static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info,
+				const char *level)
 {
 	u8 bus = info->id >> 8;
 	u8 devfn = info->id & 0xff;
 
-	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
-		 info->multi_error_valid ? "Multiple " : "",
-		 aer_error_severity_string[info->severity],
-		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
-		 PCI_FUNC(devfn));
+	pci_printk(level, dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
+		   info->multi_error_valid ? "Multiple " : "",
+		   aer_error_severity_string[info->severity],
+		   pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
+		   PCI_FUNC(devfn));
 }
 
 #ifdef CONFIG_ACPI_APEI_PCIEAER
@@ -767,15 +764,18 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 {
 	int layer, agent, tlp_header_valid = 0;
 	u32 status, mask;
+	const char *level;
 	struct aer_err_info info;
 
 	if (aer_severity == AER_CORRECTABLE) {
 		status = aer->cor_status;
 		mask = aer->cor_mask;
+		level = KERN_WARNING;
 	} else {
 		status = aer->uncor_status;
 		mask = aer->uncor_mask;
 		tlp_header_valid = status & AER_LOG_TLP_MASKS;
+		level = KERN_ERR;
 	}
 
 	layer = AER_GET_LAYER_ERROR(aer_severity, status);
@@ -787,14 +787,14 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 	info.mask = mask;
 	info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
 
-	pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
-	__aer_print_error(dev, &info);
-	pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
-		aer_error_layer[layer], aer_agent_string[agent]);
+	pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
+	__aer_print_error(dev, &info, level);
+	pci_printk(level, dev, "aer_layer=%s, aer_agent=%s\n",
+		   aer_error_layer[layer], aer_agent_string[agent]);
 
 	if (aer_severity != AER_CORRECTABLE)
-		pci_err(dev, "aer_uncor_severity: 0x%08x\n",
-			aer->uncor_severity);
+		pci_printk(level, dev, "aer_uncor_severity: 0x%08x\n",
+			   aer->uncor_severity);
 
 	if (tlp_header_valid)
 		__print_tlp_header(dev, &aer->header_log);
@@ -1255,14 +1255,15 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 	return 1;
 }
 
-static inline void aer_process_err_devices(struct aer_err_info *e_info)
+static inline void aer_process_err_devices(struct aer_err_info *e_info,
+					   const char *level)
 {
 	int i;
 
 	/* Report all before handle them, not to lost records by reset etc. */
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
 		if (aer_get_device_error_info(e_info->dev[i], e_info))
-			aer_print_error(e_info->dev[i], e_info);
+			aer_print_error(e_info->dev[i], e_info, level);
 	}
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
 		if (aer_get_device_error_info(e_info->dev[i], e_info))
@@ -1280,6 +1281,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 {
 	struct pci_dev *pdev = rpc->rpd;
 	struct aer_err_info e_info;
+	const char *level;
 
 	pci_rootport_aer_stats_incr(pdev, e_src);
 
@@ -1290,19 +1292,21 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 	if (e_src->status & PCI_ERR_ROOT_COR_RCV) {
 		e_info.id = ERR_COR_ID(e_src->id);
 		e_info.severity = AER_CORRECTABLE;
+		level = KERN_WARNING;
 
 		if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV)
 			e_info.multi_error_valid = 1;
 		else
 			e_info.multi_error_valid = 0;
-		aer_print_port_info(pdev, &e_info);
+		aer_print_port_info(pdev, &e_info, level);
 
 		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info);
+			aer_process_err_devices(&e_info, level);
 	}
 
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
 		e_info.id = ERR_UNCOR_ID(e_src->id);
+		level = KERN_ERR;
 
 		if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
 			e_info.severity = AER_FATAL;
@@ -1314,10 +1318,10 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 		else
 			e_info.multi_error_valid = 0;
 
-		aer_print_port_info(pdev, &e_info);
+		aer_print_port_info(pdev, &e_info, level);
 
 		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info);
+			aer_process_err_devices(&e_info, level);
 	}
 }
 
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 2b6ef7efa3c1..9e48d571d9e7 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -291,7 +291,7 @@ void dpc_process_error(struct pci_dev *pdev)
 	else if (reason == PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR &&
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
-		aer_print_error(pdev, &info);
+		aer_print_error(pdev, &info, KERN_ERR);
 		pci_aer_clear_nonfatal_status(pdev);
 		pci_aer_clear_fatal_status(pdev);
 	}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RESEND 2/4] PCI/AER: Add Correctable Errors rate limiting
  2025-01-08 13:55 [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Karolina Stolarek
  2025-01-08 13:55 ` [PATCH RESEND 1/4] PCI/AER: Use the same log level for all messages Karolina Stolarek
@ 2025-01-08 13:55 ` Karolina Stolarek
  2025-01-15  7:52   ` Jon Pan-Doh
  2025-01-08 13:55 ` [PATCH RESEND 3/4] PCI/AER: Increase the rate limit interval after threshold Karolina Stolarek
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Karolina Stolarek @ 2025-01-08 13:55 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Martin Petersen, Ben Fuller

In the case of a compromised Link integrity, we may see excessive
logging of Correctable Errors. This kind of errors is handled by
the hardware, so the messages are purely informational. It should
suffice to report the error once in a while, and inform how many
messages were suppressed over that time.

Add a ratelimit_state to control the number of printed Correctable
Errors per Root Port and check it each time a Correctable Error is
to be reported.

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
---
 drivers/pci/pcie/aer.c | 44 ++++++++++++++++++++++++++++--------------
 include/linux/pci.h    |  1 +
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index b13690fd172f..5c34cc2b5bf3 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -40,6 +40,8 @@
 #define AER_MAX_TYPEOF_COR_ERRS		16	/* as per PCI_ERR_COR_STATUS */
 #define AER_MAX_TYPEOF_UNCOR_ERRS	27	/* as per PCI_ERR_UNCOR_STATUS*/
 
+#define AER_COR_ERR_INTERVAL		(2 * HZ)
+
 struct aer_err_source {
 	u32 status;			/* PCI_ERR_ROOT_STATUS */
 	u32 id;				/* PCI_ERR_ROOT_ERR_SRC */
@@ -375,6 +377,9 @@ void pci_aer_init(struct pci_dev *dev)
 
 	dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
 
+	/* Allow Root Port to report a Correctable Error message every 2 seconds */
+	ratelimit_state_init(&dev->cor_rs, AER_COR_ERR_INTERVAL, 1);
+
 	/*
 	 * We save/restore PCI_ERR_UNCOR_MASK, PCI_ERR_UNCOR_SEVER,
 	 * PCI_ERR_COR_MASK, and PCI_ERR_CAP.  Root and Root Complex Event
@@ -766,11 +771,13 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 	u32 status, mask;
 	const char *level;
 	struct aer_err_info info;
+	bool no_ratelimit = true;
 
 	if (aer_severity == AER_CORRECTABLE) {
 		status = aer->cor_status;
 		mask = aer->cor_mask;
 		level = KERN_WARNING;
+		no_ratelimit = __ratelimit(&dev->cor_rs);
 	} else {
 		status = aer->uncor_status;
 		mask = aer->uncor_mask;
@@ -787,17 +794,20 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 	info.mask = mask;
 	info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
 
-	pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
-	__aer_print_error(dev, &info, level);
-	pci_printk(level, dev, "aer_layer=%s, aer_agent=%s\n",
-		   aer_error_layer[layer], aer_agent_string[agent]);
+	if (no_ratelimit) {
+		pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
+			   status, mask);
+		__aer_print_error(dev, &info, level);
+		pci_printk(level, dev, "aer_layer=%s, aer_agent=%s\n",
+			   aer_error_layer[layer], aer_agent_string[agent]);
 
-	if (aer_severity != AER_CORRECTABLE)
-		pci_printk(level, dev, "aer_uncor_severity: 0x%08x\n",
-			   aer->uncor_severity);
+		if (aer_severity != AER_CORRECTABLE)
+			pci_printk(level, dev, "aer_uncor_severity: 0x%08x\n",
+				   aer->uncor_severity);
 
-	if (tlp_header_valid)
-		__print_tlp_header(dev, &aer->header_log);
+		if (tlp_header_valid)
+			__print_tlp_header(dev, &aer->header_log);
+	}
 
 	trace_aer_event(dev_name(&dev->dev), (status & ~mask),
 			aer_severity, tlp_header_valid, &aer->header_log);
@@ -1256,13 +1266,14 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 }
 
 static inline void aer_process_err_devices(struct aer_err_info *e_info,
-					   const char *level)
+					   const char *level,
+					   bool no_ratelimit)
 {
 	int i;
 
 	/* Report all before handle them, not to lost records by reset etc. */
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
-		if (aer_get_device_error_info(e_info->dev[i], e_info))
+		if (aer_get_device_error_info(e_info->dev[i], e_info) && no_ratelimit)
 			aer_print_error(e_info->dev[i], e_info, level);
 	}
 	for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
@@ -1282,6 +1293,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 	struct pci_dev *pdev = rpc->rpd;
 	struct aer_err_info e_info;
 	const char *level;
+	bool no_ratelimit = true;
 
 	pci_rootport_aer_stats_incr(pdev, e_src);
 
@@ -1298,10 +1310,14 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 			e_info.multi_error_valid = 1;
 		else
 			e_info.multi_error_valid = 0;
-		aer_print_port_info(pdev, &e_info, level);
+
+		no_ratelimit = __ratelimit(&pdev->cor_rs);
+
+		if (no_ratelimit)
+			aer_print_port_info(pdev, &e_info, level);
 
 		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info, level);
+			aer_process_err_devices(&e_info, level, no_ratelimit);
 	}
 
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
@@ -1321,7 +1337,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 		aer_print_port_info(pdev, &e_info, level);
 
 		if (find_source_device(pdev, &e_info))
-			aer_process_err_devices(&e_info, level);
+			aer_process_err_devices(&e_info, level, no_ratelimit);
 	}
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b5eb8bda655d..a736547396ca 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -347,6 +347,7 @@ struct pci_dev {
 #ifdef CONFIG_PCIEAER
 	u16		aer_cap;	/* AER capability offset */
 	struct aer_stats *aer_stats;	/* AER stats for this device */
+	struct ratelimit_state cor_rs;	/* Correctable Errors Ratelimit */
 #endif
 #ifdef CONFIG_PCIEPORTBUS
 	struct rcec_ea	*rcec_ea;	/* RCEC cached endpoint association */
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RESEND 3/4] PCI/AER: Increase the rate limit interval after threshold
  2025-01-08 13:55 [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Karolina Stolarek
  2025-01-08 13:55 ` [PATCH RESEND 1/4] PCI/AER: Use the same log level for all messages Karolina Stolarek
  2025-01-08 13:55 ` [PATCH RESEND 2/4] PCI/AER: Add Correctable Errors rate limiting Karolina Stolarek
@ 2025-01-08 13:55 ` Karolina Stolarek
  2025-01-08 13:55 ` [PATCH RESEND 4/4] PCI: Add 'cor_err_reporting_enable' attribute Karolina Stolarek
  2025-01-15  7:55 ` [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Jon Pan-Doh
  4 siblings, 0 replies; 10+ messages in thread
From: Karolina Stolarek @ 2025-01-08 13:55 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Martin Petersen, Ben Fuller

In extreme circumstances, the default rate limit might not
be enough and a longer timeout is needed. To avoid spamming
the logs, update the interval to 30 seconds for the specific
Root Port after it observes over 1000 Correctable Errors.

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
---
 drivers/pci/pcie/aer.c | 22 +++++++++++++++++++++-
 include/linux/pci.h    |  1 +
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 5c34cc2b5bf3..98bf8bbadc07 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -40,7 +40,9 @@
 #define AER_MAX_TYPEOF_COR_ERRS		16	/* as per PCI_ERR_COR_STATUS */
 #define AER_MAX_TYPEOF_UNCOR_ERRS	27	/* as per PCI_ERR_UNCOR_STATUS*/
 
+#define AER_COR_ERR_THRESHOLD		1000
 #define AER_COR_ERR_INTERVAL		(2 * HZ)
+#define AER_COR_ERR_LONG_INTERVAL	(30 * HZ)
 
 struct aer_err_source {
 	u32 status;			/* PCI_ERR_ROOT_STATUS */
@@ -670,6 +672,24 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
 	}
 }
 
+static bool report_aer_cor_err(struct pci_dev *pdev)
+{
+	struct ratelimit_state *rs = &pdev->cor_rs;
+	struct aer_stats *aer_stats = pdev->aer_stats;
+	unsigned int total_cor_errs = aer_stats->rootport_total_cor_errs;
+
+	/* A significant number of errors reported, increase the rate limit */
+	if (total_cor_errs > AER_COR_ERR_THRESHOLD && !pdev->cor_err_throttled) {
+		pci_warn(pdev,
+			 "Over %d Correctable Errors reported, increasing the rate limit",
+			 AER_COR_ERR_THRESHOLD);
+		rs->interval = AER_COR_ERR_LONG_INTERVAL;
+		pdev->cor_err_throttled = 1;
+	}
+
+	return __ratelimit(&pdev->cor_rs);
+}
+
 static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
 {
 	pci_err(dev, "  TLP Header: %08x %08x %08x %08x\n",
@@ -1311,7 +1331,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 		else
 			e_info.multi_error_valid = 0;
 
-		no_ratelimit = __ratelimit(&pdev->cor_rs);
+		no_ratelimit = report_aer_cor_err(pdev);
 
 		if (no_ratelimit)
 			aer_print_port_info(pdev, &e_info, level);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a736547396ca..6b062b1fcd3c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -348,6 +348,7 @@ struct pci_dev {
 	u16		aer_cap;	/* AER capability offset */
 	struct aer_stats *aer_stats;	/* AER stats for this device */
 	struct ratelimit_state cor_rs;	/* Correctable Errors Ratelimit */
+	unsigned int cor_err_throttled:1;
 #endif
 #ifdef CONFIG_PCIEPORTBUS
 	struct rcec_ea	*rcec_ea;	/* RCEC cached endpoint association */
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RESEND 4/4] PCI: Add 'cor_err_reporting_enable' attribute
  2025-01-08 13:55 [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Karolina Stolarek
                   ` (2 preceding siblings ...)
  2025-01-08 13:55 ` [PATCH RESEND 3/4] PCI/AER: Increase the rate limit interval after threshold Karolina Stolarek
@ 2025-01-08 13:55 ` Karolina Stolarek
  2025-01-15  7:55 ` [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Jon Pan-Doh
  4 siblings, 0 replies; 10+ messages in thread
From: Karolina Stolarek @ 2025-01-08 13:55 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Martin Petersen, Ben Fuller

In some cases, the number of Correctable Error messages is overwhelming,
and even with the rate limit imposed, they fill up the logs. The system
cannot do much about such errors, so a user might wish to silence them
completely.

Add a sysfs attribute to control reporting of the Correctable Error
Messages per device.

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
---
 Documentation/ABI/testing/sysfs-bus-pci |  7 +++++
 drivers/pci/pci-sysfs.c                 | 42 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 5da6a14dc326..dba72ee37ce4 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -479,6 +479,13 @@ Description:
 		The file is writable if the PF is bound to a driver that
 		implements ->sriov_set_msix_vec_count().
 
+What:		/sys/bus/pci/devices/.../cor_err_reporting_enable
+Date:		December 2024
+Contact:	Linux PCI developers <linux-pci@vger.kernel.org>
+Description:
+		This file exposes a bit to control sending of Correctable Error
+		Messages. The value comes from the Device Control register.
+
 What:		/sys/bus/pci/devices/.../resourceN_resize
 Date:		September 2022
 Contact:	Alex Williamson <alex.williamson@redhat.com>
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6f1bb7514efb..f7f0d7971ad7 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -186,6 +186,47 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(resource);
 
+static ssize_t cor_err_reporting_enable_show(struct device *dev,
+					     struct device_attribute *attr,
+					     char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	u16 reg;
+	int err;
+
+	err = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &reg);
+
+	if (err)
+		return pcibios_err_to_errno(err);
+
+	return sysfs_emit(buf, "%u\n", reg & PCI_EXP_DEVCTL_CERE);
+}
+
+static ssize_t cor_err_reporting_enable_store(struct device *dev,
+					      struct device_attribute *attr,
+					      const char *buf, size_t count)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	u16 reg;
+	u8 val;
+	int err;
+
+	if (kstrtou8(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &reg);
+
+	reg &= ~PCI_EXP_DEVCTL_CERE;
+	reg |= val;
+	err = pcie_capability_write_word(pdev, PCI_EXP_DEVCTL, reg);
+
+	if (err)
+		return pcibios_err_to_errno(err);
+
+	return count;
+}
+static DEVICE_ATTR_RW(cor_err_reporting_enable);
+
 static ssize_t max_link_speed_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -659,6 +700,7 @@ static struct attribute *pcie_dev_attrs[] = {
 	&dev_attr_current_link_width.attr,
 	&dev_attr_max_link_width.attr,
 	&dev_attr_max_link_speed.attr,
+	&dev_attr_cor_err_reporting_enable.attr,
 	NULL,
 };
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RESEND 1/4] PCI/AER: Use the same log level for all messages
  2025-01-08 13:55 ` [PATCH RESEND 1/4] PCI/AER: Use the same log level for all messages Karolina Stolarek
@ 2025-01-15  7:50   ` Jon Pan-Doh
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Pan-Doh @ 2025-01-15  7:50 UTC (permalink / raw)
  To: karolina.stolarek
  Cc: ben.fuller, bhelgaas, linux-pci, martin.petersen, Jon Pan-Doh

On Wed, 8 Jan 2025 13:55:31 +0000
Karolina Stolarek <karolina.stolarek@oracle.com> wrote:
>
> Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>

Reviewed-by: Jon Pan-Doh <pandoh@google.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RESEND 2/4] PCI/AER: Add Correctable Errors rate limiting
  2025-01-08 13:55 ` [PATCH RESEND 2/4] PCI/AER: Add Correctable Errors rate limiting Karolina Stolarek
@ 2025-01-15  7:52   ` Jon Pan-Doh
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Pan-Doh @ 2025-01-15  7:52 UTC (permalink / raw)
  To: karolina.stolarek; +Cc: ben.fuller, bhelgaas, linux-pci, martin.petersen

On Wed, 8 Jan 2025 13:55:32 +0000
Karolina Stolarek <karolina.stolarek@oracle.com> wrote:
> Add a ratelimit_state to control the number of printed Correctable
> Errors per Root Port and check it each time a Correctable Error is
> to be reported.

I think the ratelimits should be consistent (i.e. per-device) whether from
GHES/CXL (pci_print_aer()) or native AER (aer_print_error).

If there are two devices under a root port and one is spamming correctable
errors, I think we'd still want to get correctable errors from the non-spammy
device.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors
  2025-01-08 13:55 [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Karolina Stolarek
                   ` (3 preceding siblings ...)
  2025-01-08 13:55 ` [PATCH RESEND 4/4] PCI: Add 'cor_err_reporting_enable' attribute Karolina Stolarek
@ 2025-01-15  7:55 ` Jon Pan-Doh
  2025-01-15 14:18   ` Karolina Stolarek
  4 siblings, 1 reply; 10+ messages in thread
From: Jon Pan-Doh @ 2025-01-15  7:55 UTC (permalink / raw)
  To: karolina.stolarek; +Cc: ben.fuller, bhelgaas, linux-pci, martin.petersen

On Wed, 8 Jan 2025 13:55:30 +0000
Karolina Stolarek <karolina.stolarek@oracle.com> wrote:
> TL;DR
> ====
> 
> We are getting multiple reports about excessive logging of Correctable
> Errors with no clear common root cause. As these errors are already
> corrected by hardware, it makes sense to limit them. Introduce
> a ratelimit state definition to pci_dev to control the number of
> messages reported by a Root Port within a specified time interval.
> The series adds other improvements in the area, as outlined in the
> Proposal section.

Hi Karolina,

This is a common impediment for many folks that want to enable AER. The
excessive logging stalls execution, making machines unusable. I've been
working on a similar solution[1] to yours (i.e. ratelimiting) with a few
differences:

- ratelimit uncorrectable errors
- ratelimit IRQs
- configure ratelimits from userspace (sysfs knobs)

Hoping we can collaborate on a solution (i.e. take best parts of both patch
series).

Thanks,
Jon

[1] https://lore.kernel.org/linux-pci/20250115074301.3514927-1-pandoh@google.com/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors
  2025-01-15  7:55 ` [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Jon Pan-Doh
@ 2025-01-15 14:18   ` Karolina Stolarek
  2025-01-17  2:51     ` Jon Pan-Doh
  0 siblings, 1 reply; 10+ messages in thread
From: Karolina Stolarek @ 2025-01-15 14:18 UTC (permalink / raw)
  To: Jon Pan-Doh; +Cc: ben.fuller, bhelgaas, linux-pci, martin.petersen

Hi Jon,

Many thanks for reaching out.

On 15/01/2025 08:55, Jon Pan-Doh wrote:
> On Wed, 8 Jan 2025 13:55:30 +0000
> Karolina Stolarek <karolina.stolarek@oracle.com> wrote:
>> TL;DR
>> ====
>>
>> We are getting multiple reports about excessive logging of Correctable
>> Errors with no clear common root cause. As these errors are already
>> corrected by hardware, it makes sense to limit them. Introduce
>> a ratelimit state definition to pci_dev to control the number of
>> messages reported by a Root Port within a specified time interval.
>> The series adds other improvements in the area, as outlined in the
>> Proposal section.
> 
> Hi Karolina,
> 
> This is a common impediment for many folks that want to enable AER. The
> excessive logging stalls execution, making machines unusable. I've been
> working on a similar solution[1] to yours (i.e. ratelimiting) with a few
> differences:
> 
> - ratelimit uncorrectable errors
> - ratelimit IRQs
> - configure ratelimits from userspace (sysfs knobs)
> 
> Hoping we can collaborate on a solution (i.e. take best parts of both patch
> series).

That indeed looks like a more robust solution, I'm more than happy to
join forces and work on this together.

Feel free to incorporate the 1/4 patch into your series. I plan to do a
proper review tomorrow.

Out of curiosity, do your patches apply to cleanly to pci/err and/or
pci-next branches? From what I can see, "PCI: Consolidate TLP Log
reading and printing" series[1] had been just merged, so there could be
conflicts.

All the best,
Karolina

--------------------------------------------------------------
[1] - https://lore.kernel.org/linux-pci/20250114170840.1633-1-
ilpo.jarvinen@linux.intel.com/

> 
> Thanks,
> Jon
> 
> [1] https://lore.kernel.org/linux-pci/20250115074301.3514927-1-pandoh@google.com/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors
  2025-01-15 14:18   ` Karolina Stolarek
@ 2025-01-17  2:51     ` Jon Pan-Doh
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Pan-Doh @ 2025-01-17  2:51 UTC (permalink / raw)
  To: Karolina Stolarek; +Cc: ben.fuller, bhelgaas, linux-pci, martin.petersen

On Wed, Jan 15, 2025 at 6:18 AM Karolina Stolarek
<karolina.stolarek@oracle.com> wrote:
> Feel free to incorporate the 1/4 patch into your series. I plan to do a
> proper review tomorrow.

Ack.

> Out of curiosity, do your patches apply to cleanly to pci/err and/or
> pci-next branches? From what I can see, "PCI: Consolidate TLP Log
> reading and printing" series[1] had been just merged, so there could be
> conflicts.

Good catch. I rebased off of Linus' master branch so it's unlikely to
be clean. I'll rebase off of pci-next in the next version.

Thanks,
Jon

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-01-17  2:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-08 13:55 [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Karolina Stolarek
2025-01-08 13:55 ` [PATCH RESEND 1/4] PCI/AER: Use the same log level for all messages Karolina Stolarek
2025-01-15  7:50   ` Jon Pan-Doh
2025-01-08 13:55 ` [PATCH RESEND 2/4] PCI/AER: Add Correctable Errors rate limiting Karolina Stolarek
2025-01-15  7:52   ` Jon Pan-Doh
2025-01-08 13:55 ` [PATCH RESEND 3/4] PCI/AER: Increase the rate limit interval after threshold Karolina Stolarek
2025-01-08 13:55 ` [PATCH RESEND 4/4] PCI: Add 'cor_err_reporting_enable' attribute Karolina Stolarek
2025-01-15  7:55 ` [PATCH RESEND 0/4] Rate limit reporting of Correctable Errors Jon Pan-Doh
2025-01-15 14:18   ` Karolina Stolarek
2025-01-17  2:51     ` Jon Pan-Doh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox