From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Lukas Wunner <lukas@wunner.de>, Bjorn Helgaas <helgaas@kernel.org>
Cc: Terry Bowman <terry.bowman@amd.com>,
linux-pci@vger.kernel.org, Shuai Xue <xueshuai@linux.alibaba.com>,
tianruidong@linux.alibaba.com, Keith Busch <kbusch@kernel.org>,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Oliver OHalloran <oohall@gmail.com>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] PCI/AER: Clear stale errors on reporting agents upon probe
Date: Mon, 26 Jan 2026 10:42:06 -0800
Message-ID: <06fcb922-458c-473c-999a-1dd8518976f1@linux.intel.com>
In-Reply-To: <3011c2ed30c11f858e35e29939add754adea7478.1769332702.git.lukas@wunner.de>
On 1/25/2026 1:25 AM, Lukas Wunner wrote:
> Correctable and Uncorrectable Error Status Registers on reporting agents
> are cleared upon PCI device enumeration in pci_aer_init() to flush past
> events. They're cleared again when an error is handled by the AER driver.
>
> If an agent reports a new error after pci_aer_init() and before the AER
> driver has probed on the corresponding Root Port or Root Complex Event
> Collector, that error is not handled by the AER driver: It clears the
> Root Error Status Register on probe, but neglects to re-clear the
> Correctable and Uncorrectable Error Status Registers on reporting agents.
>
> The error will eventually be reported when another error occurs. Which
> is irritating because to an end user it appears as if the earlier error
> has just happened.
>
> Amend the AER driver to clear stale errors on reporting agents upon probe.
>
> Skip reporting agents which have not invoked pci_aer_init() yet to avoid
> using an uninitialized pdev->aer_cap. They're recognizable by the error
> bits in the Device Control register still being clear.
>
> Reporting agents may execute pci_aer_init() after the AER driver has
> probed, particularly when devices are hotplugged or removed/rescanned via
> sysfs. For this reason, it continues to be necessary that pci_aer_init()
> clears Correctable and Uncorrectable Error Status Registers.
>

Can you include details about where and in what configuration you observed
this issue?

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> Reported-by: Lucas Van <lucas.van@intel.com> # off-list
> Tested-by: Lucas Van <lucas.van@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
> drivers/pci/pcie/aer.c | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e0bcaa8..4299c55 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1608,6 +1608,20 @@ static void aer_disable_irq(struct pci_dev *pdev)
>  	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>  }
>
> +static int clear_status_iter(struct pci_dev *dev, void *data)
> +{
> +	u16 devctl;
> +
> +	/* Skip if pci_enable_pcie_error_reporting() hasn't been called yet */
> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &devctl);
> +	if (!(devctl & PCI_EXP_AER_FLAGS))
> +		return 0;
> +
> +	pci_aer_clear_status(dev);
> +	pcie_clear_device_status(dev);

Should pci_aer_init() also clear the Device Status register, in addition to
the Correctable and Uncorrectable Error Status Registers?

> +	return 0;
> +}
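
For anyone following along, the skip-then-clear behaviour of the iterator
above can be modelled in user space. This is only a sketch under stated
assumptions: struct sketch_agent, SKETCH_DEVCTL_ERR_FLAGS and
sketch_clear_stale() are hypothetical names for illustration, not kernel
APIs, and the status registers are simply zeroed instead of going through
config space accessors. The mask 0x000f stands for the four error
reporting enable bits in the Device Control register.

```c
#include <stddef.h>
#include <stdint.h>

/* Device Control error reporting enable bits (correctable, non-fatal,
 * fatal, unsupported request). Hypothetical name for this sketch. */
#define SKETCH_DEVCTL_ERR_FLAGS 0x000fu

/* Hypothetical stand-in for a reporting agent; not struct pci_dev. */
struct sketch_agent {
	uint16_t devctl;       /* modelled Device Control register */
	uint32_t cor_status;   /* modelled Correctable Error Status */
	uint32_t uncor_status; /* modelled Uncorrectable Error Status */
};

/*
 * Walk all agents and clear stale status, skipping agents whose error
 * reporting bits are still clear (i.e. not initialized yet in this model).
 * Returns the number of agents whose status was cleared.
 */
static size_t sketch_clear_stale(struct sketch_agent *agents, size_t n)
{
	size_t cleared = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(agents[i].devctl & SKETCH_DEVCTL_ERR_FLAGS))
			continue;

		agents[i].cor_status = 0;
		agents[i].uncor_status = 0;
		cleared++;
	}

	return cleared;
}
```

In this model an agent with devctl == 0 keeps its status untouched, which
corresponds to the early "return 0" in clear_status_iter().
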
> +
> /**
> * aer_enable_rootport - enable Root Port's interrupts when receiving messages
> * @rpc: pointer to a Root Port data structure
> @@ -1629,9 +1643,19 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>  	pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
>  				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
>
> -	/* Clear error status */
> +	/* Clear error status of this Root Port or RCEC */
>  	pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
>  	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
> +
> +	/* Clear error status of agents reporting to this Root Port or RCEC */
> +	if (reg32 & AER_ERR_STATUS_MASK) {
> +		if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_EC)
> +			pcie_walk_rcec(pdev, clear_status_iter, NULL);
> +		else if (pdev->subordinate)
> +			pci_walk_bus(pdev->subordinate, clear_status_iter,
> +				     NULL);
> +	}
> +
>  	pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
>  	pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
>  	pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
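
As a side note for readers of the thread: the read-then-write-back pairs in
this hunk rely on the AER status registers being write-1-to-clear (RW1C).
A minimal user-space sketch of that semantic follows. The names
fake_rw1c_reg, fake_read(), fake_write() and fake_clear_status() are
hypothetical and exist only for this illustration; real code uses the
pci_read_config_dword()/pci_write_config_dword() calls shown in the diff.

```c
#include <stdint.h>

/* Hypothetical model of an RW1C status register; not a kernel API. */
struct fake_rw1c_reg {
	uint32_t value;
};

static uint32_t fake_read(const struct fake_rw1c_reg *reg)
{
	return reg->value;
}

/* Writing a 1 clears that bit; writing a 0 leaves the bit untouched. */
static void fake_write(struct fake_rw1c_reg *reg, uint32_t val)
{
	reg->value &= ~val;
}

/*
 * Read the status and write the same value back, which clears exactly
 * the bits that were set at the time of the read. Returns the value
 * that was read, so a caller can still inspect which bits were set.
 */
static uint32_t fake_clear_status(struct fake_rw1c_reg *reg)
{
	uint32_t status = fake_read(reg);

	fake_write(reg, status);
	return status;
}
```

Because the value that was read stays available after the write, a caller
can test it afterwards, in the same way the patch tests reg32 against
AER_ERR_STATUS_MASK after clearing the Root Error Status Register.
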
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

Thread overview: 6+ messages
2026-01-25 9:25 [PATCH] PCI/AER: Clear stale errors on reporting agents upon probe Lukas Wunner
2026-01-26 18:42 ` Kuppuswamy Sathyanarayanan [this message]
2026-01-27 7:56 ` Lukas Wunner
2026-01-27 23:00 ` Bjorn Helgaas
2026-02-02 10:42 ` Lukas Wunner
2026-02-06 22:24 ` Bjorn Helgaas