* [PATCH v1] PCI/DPC: Fix AER error logging for DPC/EDR triggered events
@ 2026-03-18 17:04 Kuppuswamy Sathyanarayanan
2026-03-18 17:22 ` Bjorn Helgaas
0 siblings, 1 reply; 4+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-03-18 17:04 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar
Cc: Oliver OHalloran, Jon Pan-Doh, linux-pci, linux-kernel,
linuxppc-dev

aer_print_error() skips printing if ratelimit_print[i] is not set.
In the native AER path, ratelimit_print is initialized by
add_error_device() during source device discovery, and is set to 1
for fatal errors to bypass rate limiting since fatal errors should
always be logged.

The DPC/EDR path uses the DPC-capable port as the error source and
reads its AER uncorrectable error status registers directly in
dpc_get_aer_uncorrect_severity(). Since it does not go through
add_error_device(), ratelimit_print[0] is left uninitialized and zero.
As a result, aer_print_error() silently drops all AER error messages
for DPC/EDR triggered events.

Set ratelimit_print[0] to 1 to bypass rate limiting and always print
AER logs for fatal errors.

Fixes: a57f2bfb4a58 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
Co-developed-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
Signed-off-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
drivers/pci/pcie/dpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7..7605ddd9f0ba 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -256,6 +256,7 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
 
 	info->dev[0] = dev;
 	info->error_dev_num = 1;
+	info->ratelimit_print[0] = 1;
 
 	return 1;
 }
--
2.43.0
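
The failure mode described in the commit message above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel code: struct model_err_info, model_print_errors() and model_dpc_fill() are hypothetical stand-ins for struct aer_err_info, aer_print_error() and the tail of dpc_get_aer_uncorrect_severity(), and the array bound is made up for the model.

```c
#include <stdbool.h>
#include <stdio.h>

#define MODEL_MAX_ERROR_DEVS 5	/* hypothetical bound, for this model only */

/* Simplified stand-in for the kernel's struct aer_err_info. */
struct model_err_info {
	const char *dev[MODEL_MAX_ERROR_DEVS];
	int ratelimit_print[MODEL_MAX_ERROR_DEVS];
	int error_dev_num;
};

/*
 * Models the gate described in the commit message: a device whose
 * ratelimit_print[] entry is zero is skipped without any output.
 * Returns the number of devices that were actually logged.
 */
static int model_print_errors(const struct model_err_info *info)
{
	int printed = 0;

	for (int i = 0; i < info->error_dev_num; i++) {
		if (!info->ratelimit_print[i])
			continue;
		printf("%s: AER: uncorrectable error details\n", info->dev[i]);
		printed++;
	}
	return printed;
}

/*
 * Models the DPC/EDR path: the port itself is the only error source and
 * nothing else initializes ratelimit_print[0]. "patched" selects whether
 * the one-line fix from this patch is applied.
 */
static void model_dpc_fill(struct model_err_info *info, const char *port,
			   bool patched)
{
	*info = (struct model_err_info){0};
	info->dev[0] = port;
	info->error_dev_num = 1;
	if (patched)
		info->ratelimit_print[0] = 1;
}
```

Calling model_dpc_fill() with patched set to false and then model_print_errors() logs nothing, which corresponds to the silent drop described above; with patched set to true the single device is logged.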
* Re: [PATCH v1] PCI/DPC: Fix AER error logging for DPC/EDR triggered events
2026-03-18 17:04 [PATCH v1] PCI/DPC: Fix AER error logging for DPC/EDR triggered events Kuppuswamy Sathyanarayanan
@ 2026-03-18 17:22 ` Bjorn Helgaas
2026-03-18 17:48 ` Kuppuswamy Sathyanarayanan
0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2026-03-18 17:22 UTC (permalink / raw)
To: Kuppuswamy Sathyanarayanan
Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran, Jon Pan-Doh,
linux-pci, linux-kernel, linuxppc-dev, Sizhe Liu

[+cc Sizhe]

On Wed, Mar 18, 2026 at 10:04:49AM -0700, Kuppuswamy Sathyanarayanan wrote:
> aer_print_error() skips printing if ratelimit_print[i] is not set.
> In the native AER path, ratelimit_print is initialized by
> add_error_device() during source device discovery, and is set to 1
> for fatal errors to bypass rate limiting since fatal errors should
> always be logged.
>
> The DPC/EDR path uses the DPC-capable port as the error source and
> reads its AER uncorrectable error status registers directly in
> dpc_get_aer_uncorrect_severity(). Since it does not go through
> add_error_device(), ratelimit_print[0] is left uninitialized and zero.
> As a result, aer_print_error() silently drops all AER error messages
> for DPC/EDR triggered events.
>
> Set ratelimit_print[0] to 1 to bypass rate limiting and always print
> AER logs for fatal errors.
>
> Fixes: a57f2bfb4a58 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> Co-developed-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
> Signed-off-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

I think this does the same as
https://git.kernel.org/cgit/linux/kernel/git/pci/pci.git/commit/?id=d4d1ecff2c2d
which is already queued for v7.1.

> ---
> drivers/pci/pcie/dpc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index fc18349614d7..7605ddd9f0ba 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -256,6 +256,7 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
>
>  	info->dev[0] = dev;
>  	info->error_dev_num = 1;
> +	info->ratelimit_print[0] = 1;
>
>  	return 1;
>  }
> --
> 2.43.0

* Re: [PATCH v1] PCI/DPC: Fix AER error logging for DPC/EDR triggered events
2026-03-18 17:22 ` Bjorn Helgaas
@ 2026-03-18 17:48 ` Kuppuswamy Sathyanarayanan
2026-03-30 22:20 ` Bjorn Helgaas
0 siblings, 1 reply; 4+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-03-18 17:48 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran, Jon Pan-Doh,
linux-pci, linux-kernel, linuxppc-dev, Sizhe Liu

Hi Bjorn,

On 3/18/2026 10:22 AM, Bjorn Helgaas wrote:
> [+cc Sizhe]
>
> On Wed, Mar 18, 2026 at 10:04:49AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> aer_print_error() skips printing if ratelimit_print[i] is not set.
>> In the native AER path, ratelimit_print is initialized by
>> add_error_device() during source device discovery, and is set to 1
>> for fatal errors to bypass rate limiting since fatal errors should
>> always be logged.
>>
>> The DPC/EDR path uses the DPC-capable port as the error source and
>> reads its AER uncorrectable error status registers directly in
>> dpc_get_aer_uncorrect_severity(). Since it does not go through
>> add_error_device(), ratelimit_print[0] is left uninitialized and zero.
>> As a result, aer_print_error() silently drops all AER error messages
>> for DPC/EDR triggered events.
>>
>> Set ratelimit_print[0] to 1 to bypass rate limiting and always print
>> AER logs for fatal errors.
>>
>> Fixes: a57f2bfb4a58 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
>> Co-developed-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
>> Signed-off-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
> I think this does the same as
> https://git.kernel.org/cgit/linux/kernel/git/pci/pci.git/commit/?id=d4d1ecff2c2d
> which is already queued for v7.1.

Thanks for the reference.

Since errors in the DPC path lead to port containment, I think it is
best to always log them for reference and debug purposes. So I think we
don't need to export aer_print_init() from the AER driver (which can
ratelimit non-fatal DPC errors). Instead, we can skip ratelimiting for
DPC errors by default by initializing ratelimit_print[0] = 1.

>
>> ---
>> drivers/pci/pcie/dpc.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index fc18349614d7..7605ddd9f0ba 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -256,6 +256,7 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
>>
>>  	info->dev[0] = dev;
>>  	info->error_dev_num = 1;
>> +	info->ratelimit_print[0] = 1;
>>
>>  	return 1;
>>  }
>> --
>> 2.43.0
>>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

* Re: [PATCH v1] PCI/DPC: Fix AER error logging for DPC/EDR triggered events
2026-03-18 17:48 ` Kuppuswamy Sathyanarayanan
@ 2026-03-30 22:20 ` Bjorn Helgaas
0 siblings, 0 replies; 4+ messages in thread
From: Bjorn Helgaas @ 2026-03-30 22:20 UTC (permalink / raw)
To: Kuppuswamy Sathyanarayanan
Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran, Jon Pan-Doh,
linux-pci, linux-kernel, linuxppc-dev, Sizhe Liu

On Wed, Mar 18, 2026 at 10:48:07AM -0700, Kuppuswamy Sathyanarayanan wrote:
> On 3/18/2026 10:22 AM, Bjorn Helgaas wrote:
> > On Wed, Mar 18, 2026 at 10:04:49AM -0700, Kuppuswamy Sathyanarayanan wrote:
> >> aer_print_error() skips printing if ratelimit_print[i] is not set.
> >> In the native AER path, ratelimit_print is initialized by
> >> add_error_device() during source device discovery, and is set to 1
> >> for fatal errors to bypass rate limiting since fatal errors should
> >> always be logged.
> >>
> >> The DPC/EDR path uses the DPC-capable port as the error source and
> >> reads its AER uncorrectable error status registers directly in
> >> dpc_get_aer_uncorrect_severity(). Since it does not go through
> >> add_error_device(), ratelimit_print[0] is left uninitialized and zero.
> >> As a result, aer_print_error() silently drops all AER error messages
> >> for DPC/EDR triggered events.
> >>
> >> Set ratelimit_print[0] to 1 to bypass rate limiting and always print
> >> AER logs for fatal errors.

To be precise, I think this bypasses rate limiting for all
uncorrectable errors (both fatal and non-fatal) that cause DPC to be
triggered, i.e., uncorrectable errors detected directly by the DPC
port, right?

Uncorrectable errors detected downstream of the DPC port would
generate ERR_NONFATAL or ERR_FATAL messages. When the DPC port
receives those, it triggers DPC but logs only the "containment event
... received from" message. That message isn't ratelimited, and this
patch doesn't change that.

I guess there aren't any AER log details to log in this case because
they're in downstream devices that we can't read while DPC is
triggered.

> >> Fixes: a57f2bfb4a58 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> >> Co-developed-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
> >> Signed-off-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
> >> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> >
> > I think this does the same as
> > https://git.kernel.org/cgit/linux/kernel/git/pci/pci.git/commit/?id=d4d1ecff2c2d
> > which is already queued for v7.1.
>
> Thanks for the reference.
>
> Since errors in the DPC path lead to port containment, I think it
> is best to always log them for reference and debug purposes. So I
> think we don't need to export aer_print_init() from the AER driver
> (which can ratelimit non-fatal DPC errors). Instead, we can skip
> ratelimiting for DPC errors by default by initializing
> ratelimit_print[0] = 1.

I think that makes sense. With Sizhe's patch, the pci_warn() in
dpc_process_error() is not ratelimited but the aer_print_error() part
is, so we always see the "containment event" warning but may not see
the rest.

I guess we only call dpc_get_aer_uncorrect_severity() for the
PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR case; the NFE, FE, and IN_EXT
cases aren't affected by this patch.

I replaced Sizhe's ratelimit patch on pci/dpc with this one, keeping
the patch that holds a reference while calling dpc_process_error().

> >> ---
> >> drivers/pci/pcie/dpc.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> >> index fc18349614d7..7605ddd9f0ba 100644
> >> --- a/drivers/pci/pcie/dpc.c
> >> +++ b/drivers/pci/pcie/dpc.c
> >> @@ -256,6 +256,7 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
> >>
> >>  	info->dev[0] = dev;
> >>  	info->error_dev_num = 1;
> >> +	info->ratelimit_print[0] = 1;
> >>
> >>  	return 1;
> >>  }
> >> --
> >> 2.43.0
> >>
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
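
The trigger-reason distinction discussed in the last message (only the "unmasked uncorrectable error" reason leads to reading the port's own AER status; NFE, FE and IN_EXT do not) can be sketched as a small decoder. This is an illustrative user-space model, not the dpc.c code: the enum and both helper names are hypothetical, and the bit layout (Trigger Reason in bits 2:1 of the DPC Status register) is stated here as an assumption based on the PCIe DPC capability definition.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Trigger Reason encodings, assumed to occupy bits 2:1 of DPC Status.
 * The names are local to this sketch; the kernel uses the
 * PCI_EXP_DPC_STATUS_TRIGGER_RSN_* definitions instead.
 */
enum model_dpc_reason {
	MODEL_RSN_UNCOR  = 0,	/* unmasked uncorrectable error seen by the port */
	MODEL_RSN_NFE    = 1,	/* ERR_NONFATAL message received from downstream */
	MODEL_RSN_FE     = 2,	/* ERR_FATAL message received from downstream */
	MODEL_RSN_IN_EXT = 3,	/* reason is in the Trigger Reason Extension */
};

/* Extract the trigger reason field from a raw DPC Status value. */
static enum model_dpc_reason model_dpc_reason(uint16_t status)
{
	return (enum model_dpc_reason)((status >> 1) & 0x3);
}

/*
 * True only for the case in which, per the discussion above, the port's
 * own AER uncorrectable status is read and printed, i.e. the only case
 * whose logging is changed by this patch.
 */
static bool model_reads_port_aer(uint16_t status)
{
	return model_dpc_reason(status) == MODEL_RSN_UNCOR;
}
```

For example, a status value of 0x0001 (triggered, reason 0) makes model_reads_port_aer() return true, while 0x0003, 0x0005 and 0x0007 (reasons 1 to 3) make it return false.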