public inbox for linuxppc-dev@ozlabs.org
From: Lukas Wunner <lukas@wunner.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Terry Bowman <terry.bowman@amd.com>,
	Sathyanarayanan Kuppuswamy
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	linux-pci@vger.kernel.org, Shuai Xue <xueshuai@linux.alibaba.com>,
	tianruidong@linux.alibaba.com, Keith Busch <kbusch@kernel.org>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Oliver OHalloran <oohall@gmail.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] PCI/AER: Clear stale errors on reporting agents upon probe
Date: Mon, 2 Feb 2026 11:42:22 +0100
Message-ID: <aYB_jmq7xlyKpBFb@wunner.de>
In-Reply-To: <20260127230055.GA384686@bhelgaas>

On Tue, Jan 27, 2026 at 05:00:55PM -0600, Bjorn Helgaas wrote:
> On Sun, Jan 25, 2026 at 10:25:51AM +0100, Lukas Wunner wrote:
> > Correctable and Uncorrectable Error Status Registers on reporting agents
> > are cleared upon PCI device enumeration in pci_aer_init() to flush past
> > events.  They're cleared again when an error is handled by the AER driver.
> 
> Do you think pci_aer_init() is the right time to clear the error
> status bits?  Most of those bits are sticky, so they're not cleared by
> reset.
> 
> I'm thinking about the scenario where a PCIe error occurs and is captured
> in the AER error status registers, but the system reboots before the
> AER driver can log the error.  Since the bits are sticky, the new
> kernel might have a chance to find and log the error that happened
> with the previous kernel.

I agree that *reporting* errors instead of just silently *clearing* them
could be useful.

We cannot pinpoint when the errors occurred, so we'd have to mark them
in the log messages as having occurred "during shutdown or early boot"
or "during suspend or resume" (for errors occurring during a system sleep
cycle).  But that could still be good enough and helpful for users.

We could report them with KERN_INFO severity and if that turns out to be
too noisy, demote them to KERN_DEBUG or exempt certain error types
(such as Unsupported Requests).
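
The reporting policy above (log at info level, optionally demote everything
to debug, optionally exempt certain error types) can be sketched as a small
hypothetical userspace model in C. Nothing here is kernel API: the enum,
struct stale_policy, stale_classify() and stale_report() are made up for
illustration, and printf() stands in for pci_info()/pci_dbg(). The only
real detail is that Unsupported Request is bit 20 of the Uncorrectable
Error Status Register (PCI_ERR_UNC_UNSUP in pci_regs.h).

```c
#include <stdint.h>
#include <stdio.h>

/* Unsupported Request Error Status is bit 20 of the AER Uncorrectable
 * Error Status Register (PCI_ERR_UNC_UNSUP in the kernel's pci_regs.h). */
#define UNC_UNSUP (1u << 20)

/* Hypothetical stand-ins for "don't log", KERN_DEBUG and KERN_INFO. */
enum stale_level { STALE_NONE, STALE_DEBUG, STALE_INFO };

/* Hypothetical policy knobs for reporting stale errors. */
struct stale_policy {
	int demote;       /* nonzero: log at DEBUG instead of INFO */
	uint32_t exempt;  /* status bits never reported, e.g. UNC_UNSUP */
};

/* Decide how loudly to report a stale status value and store the bits
 * that survive the exemption mask in *reported. */
enum stale_level stale_classify(uint32_t status, const struct stale_policy *p,
				uint32_t *reported)
{
	*reported = status & ~p->exempt;
	if (*reported == 0)
		return STALE_NONE;
	return p->demote ? STALE_DEBUG : STALE_INFO;
}

/* The exact time of a stale error is unknown, so the message names the
 * window it fell into instead. Returns the level that was used. */
enum stale_level stale_report(const char *dev, uint32_t status,
			      const char *window, const struct stale_policy *p)
{
	uint32_t bits;
	enum stale_level lvl = stale_classify(status, p, &bits);

	if (lvl != STALE_NONE)
		printf("%s: %s: stale uncorrectable error status %#010x, occurred %s\n",
		       lvl == STALE_DEBUG ? "debug" : "info", dev,
		       (unsigned int)bits, window);
	return lvl;
}
```

With exempt set to UNC_UNSUP, a device whose only stale error is an
Unsupported Request produces no message at all, while any other stale bit
is still reported at the configured level.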

Shuai Xue and I had a discussion late last year about reporting
versus silently clearing stale errors:
https://lore.kernel.org/all/aPoIDW_Yt90VgHL8@wunner.de/

I think we were both unsure back then whether you would entertain a patch
to report stale errors.  But since you're now raising the issue yourself,
I'd say yes, it's worth pursuing.

However, I think the $SUBJECT_PATCH still makes sense:  If I were to submit
a series to report stale errors, I'd still first amend the code to clear
all stale errors (instead of leaving some of them uncleared), then amend it
to report errors prior to clearing them.  The $SUBJECT_PATCH is sort of
a fix that distributions may want to backport, whereas *reporting*
stale errors would be a new feature not eligible for backporting.
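
The two steps (first clear all stale errors, later report them before
clearing) differ mainly in what gets written back to the status register.
The AER status registers are write-1-to-clear, so a careful report-then-clear
writes back exactly the snapshot it reported. Below is a hypothetical
userspace model of that: struct rw1c_reg, reg_read()/reg_write(),
clear_all_stale(), report_then_clear() and demo_report() are invented
names, not kernel functions, and the register is an ordinary variable
rather than config space.

```c
#include <stdint.h>

/* Simulated AER status register. The real registers are RW1C (write 1 to
 * clear); this struct is a hypothetical stand-in for config space access. */
struct rw1c_reg {
	uint32_t val;
};

static uint32_t reg_read(const struct rw1c_reg *r)
{
	return r->val;
}

static void reg_write(struct rw1c_reg *r, uint32_t v)
{
	r->val &= ~v;	/* each 1 written clears that bit; 0s are ignored */
}

/* Step 1, the fix: clear every stale bit without looking at it. */
void clear_all_stale(struct rw1c_reg *r)
{
	reg_write(r, reg_read(r));
}

/* Step 2, the feature: report the snapshot first, then clear exactly the
 * bits that were reported. A bit latched in between stays set. */
uint32_t report_then_clear(struct rw1c_reg *r, void (*report)(uint32_t))
{
	uint32_t snap = reg_read(r);

	if (snap) {
		report(snap);
		reg_write(r, snap);
	}
	return snap;
}

/* Demo reporter: remembers what it was given and, if racing_reg is set,
 * simulates a new error (bit 14) latching while the report is printed. */
uint32_t last_reported;
struct rw1c_reg *racing_reg;

void demo_report(uint32_t bits)
{
	last_reported = bits;
	if (racing_reg)
		racing_reg->val |= 1u << 14;
}
```

Writing back the snapshot rather than all ones means an error that arrives
while the stale one is being logged is left in the register instead of
being silently discarded.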

> So I wonder if pci_aer_init() should just find the Capability and
> alloc its buffers, and aer_probe() should look for existing errors and
> log them before clearing them.

Devices may be enumerated after aer_probe(), e.g. when they're hot-added
below an AER-capable and hotplug-capable Root Port.  For cases like this,
we'll still have to clear (and in the future report) stale errors in
pci_aer_init().

(The $SUBJECT_PATCH takes this into account and explicitly calls out
this corner case in the commit message.)
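
The hot-add corner case can be illustrated with a toy model of the two
designs. This is a hypothetical userspace sketch: struct sim_dev,
sim_aer_init() (standing in for pci_aer_init()) and sim_aer_probe()
(standing in for aer_probe()) are invented, and the clear_in_init flag
selects between clearing only in the probe path and also clearing at
enumeration time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a reporting agent: only its latched AER status. */
struct sim_dev {
	uint32_t stale;	/* bits left over from before the kernel looked */
};

/* Stand-in for pci_aer_init(), run whenever a device is enumerated.
 * clear_in_init selects between the two designs under discussion. */
void sim_aer_init(struct sim_dev *d, bool clear_in_init)
{
	if (clear_in_init)
		d->stale = 0;	/* future: report first, then clear */
}

/* Stand-in for aer_probe(): it can only reach devices that already
 * exist when the AER service driver binds to the Root Port. */
void sim_aer_probe(struct sim_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].stale = 0;
}
```

Devices present at boot are covered either way, but a device hot-added
after sim_aer_probe() has run keeps its stale bits unless the enumeration
path clears them too.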

Thanks,

Lukas



Thread overview: 6+ messages
2026-01-25  9:25 [PATCH] PCI/AER: Clear stale errors on reporting agents upon probe Lukas Wunner
2026-01-26 18:42 ` Kuppuswamy Sathyanarayanan
2026-01-27  7:56   ` Lukas Wunner
2026-01-27 23:00 ` Bjorn Helgaas
2026-02-02 10:42   ` Lukas Wunner [this message]
2026-02-06 22:24 ` Bjorn Helgaas
