From: Bjorn Helgaas <helgaas@kernel.org>
To: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Oliver OHalloran <oohall@gmail.com>,
Jon Pan-Doh <pandoh@google.com>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, Sizhe Liu <liusizhe5@huawei.com>
Subject: Re: [PATCH v1] PCI/DPC: Fix AER error logging for DPC/EDR triggered events
Date: Mon, 30 Mar 2026 17:20:59 -0500
Message-ID: <20260330222059.GA103174@bhelgaas>
In-Reply-To: <875b2f99-35b9-4a7c-b47e-bfdc543ef7a0@linux.intel.com>

On Wed, Mar 18, 2026 at 10:48:07AM -0700, Kuppuswamy Sathyanarayanan wrote:
> On 3/18/2026 10:22 AM, Bjorn Helgaas wrote:
> > On Wed, Mar 18, 2026 at 10:04:49AM -0700, Kuppuswamy Sathyanarayanan wrote:
> >> aer_print_error() skips printing if ratelimit_print[i] is not set.
> >> In the native AER path, ratelimit_print is initialized by
> >> add_error_device() during source device discovery, and is set to 1
> >> for fatal errors to bypass rate limiting since fatal errors should
> >> always be logged.
> >>
> >> The DPC/EDR path uses the DPC-capable port as the error source and
> >> reads its AER uncorrectable error status registers directly in
> >> dpc_get_aer_uncorrect_severity(). Since it does not go through
> >> add_error_device(), ratelimit_print[0] is left uninitialized and zero.
> >> As a result, aer_print_error() silently drops all AER error messages
> >> for DPC/EDR triggered events.
> >>
> >> Set ratelimit_print[0] to 1 to bypass rate limiting and always print
> >> AER logs for fatal errors.
To be precise, I think this bypasses rate limiting for all
uncorrectable errors (both fatal and non-fatal) that cause DPC to be
triggered, i.e., uncorrectable errors detected directly by the DPC
port, right?

Uncorrectable errors detected downstream of the DPC port would
generate ERR_NONFATAL or ERR_FATAL messages. When the DPC port
receives those, it triggers DPC but logs only the "containment event
... received from" message. That message isn't ratelimited, and this
patch doesn't change that. I guess there aren't any AER log details
to log in this case because they're in downstream devices that we
can't read while DPC is triggered.
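
To make the distinction above concrete, here is a rough user-space
sketch of decoding the DPC Trigger Reason field. The helper names are
hypothetical, and the constant values are written from memory of
include/uapi/linux/pci_regs.h, so treat them as assumptions to check
against the header rather than as the driver's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN        0x0006 /* Trigger Reason field */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR  0x0000 /* Uncorrectable error */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE    0x0002 /* Rcvd ERR_NONFATAL */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE     0x0004 /* Rcvd ERR_FATAL */
#define PCI_EXP_DPC_STATUS_TRIGGER_RSN_IN_EXT 0x0006 /* See extension field */

/*
 * Hypothetical helper: true only when the DPC port itself detected the
 * uncorrectable error, i.e. the one case where its own AER registers
 * hold details worth printing.
 */
static bool dpc_port_has_aer_details(uint16_t dpc_status)
{
	return (dpc_status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) ==
	       PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR;
}

/* Hypothetical helper: human-readable name for the trigger reason. */
static const char *dpc_reason_name(uint16_t dpc_status)
{
	switch (dpc_status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) {
	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR:
		return "uncorrectable error detected by the DPC port";
	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE:
		return "ERR_NONFATAL received from downstream";
	case PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE:
		return "ERR_FATAL received from downstream";
	default:
		return "reason in Trigger Reason Extension field";
	}
}
```

Only a status whose reason bits are zero makes
dpc_port_has_aer_details() return true; for the received-message
cases there is nothing in the port's AER registers to print.
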
> >> Fixes: a57f2bfb4a58 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
> >> Co-developed-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
> >> Signed-off-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
> >> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> >
> > I think this does the same as
> > https://git.kernel.org/cgit/linux/kernel/git/pci/pci.git/commit/?id=d4d1ecff2c2d
> > which is already queued for v7.1.
>
> Thanks for the reference.
>
> Since errors in the DPC path lead to port containment, I think it
> is best to always log them for reference and debug purposes. So I
> think we don't need to export aer_print_init() from the AER driver
> (which can ratelimit non-fatal DPC errors). Instead we can by default
> skip ratelimit for DPC errors by initializing ratelimit_print[0] =
> 1.
I think that makes sense. With Sizhe's patch, the pci_warn() in
dpc_process_error() is not ratelimited but the aer_print_error() part
is, so we always see the "containment event" warning but may not see
the rest.

I guess we only call dpc_get_aer_uncorrect_severity() for the
PCI_EXP_DPC_STATUS_TRIGGER_RSN_UNCOR case; the NFE, FE, and IN_EXT
cases aren't affected by this patch.

I replaced Sizhe's ratelimit patch on pci/dpc with this one, keeping
the patch that holds a reference while calling dpc_process_error().
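
For anyone following along, the effect of the flag can be modeled in
a few lines of user-space C. This is only a toy model of the behavior
described in this thread, not the kernel code: the struct, the
function, and MODEL_MAX_DEVS are hypothetical stand-ins, and the
printf() calls stand in for the real pci_warn() and aer_print_error()
output.

```c
#include <stdio.h>

#define MODEL_MAX_DEVS 5 /* hypothetical bound, not a kernel constant */

/* Simplified stand-in for the aer_err_info fields discussed here. */
struct model_err_info {
	int error_dev_num;
	int ratelimit_print[MODEL_MAX_DEVS];
};

/* Returns the number of lines "logged" so the effect is easy to check. */
static int model_dpc_log(const struct model_err_info *info)
{
	int lines = 0;

	/* Models the unconditional warning in dpc_process_error(). */
	printf("containment event (always printed)\n");
	lines++;

	/* Models aer_print_error(): details are skipped unless the flag is set. */
	for (int i = 0; i < info->error_dev_num; i++) {
		if (!info->ratelimit_print[i])
			continue;
		printf("AER details for device %d\n", i);
		lines++;
	}

	return lines;
}
```

With ratelimit_print[0] left at zero the model logs only the
containment line; with it set to 1 the details line appears as well,
which is the behavior the one-line patch below restores.
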
> >> ---
> >> drivers/pci/pcie/dpc.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> >> index fc18349614d7..7605ddd9f0ba 100644
> >> --- a/drivers/pci/pcie/dpc.c
> >> +++ b/drivers/pci/pcie/dpc.c
> >> @@ -256,6 +256,7 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
> >>
> >>  	info->dev[0] = dev;
> >>  	info->error_dev_num = 1;
> >> +	info->ratelimit_print[0] = 1;
> >>
> >>  	return 1;
> >>  }
> >> --
> >> 2.43.0
> >>
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
>