* [PATCH] PCI/AER: Stop ruling out unbound devices as error source
@ 2026-03-27 9:56 Lukas Wunner
2026-03-30 6:32 ` Stefan Roese
2026-03-30 19:19 ` Bjorn Helgaas
0 siblings, 2 replies; 3+ messages in thread
From: Lukas Wunner @ 2026-03-27 9:56 UTC
To: Bjorn Helgaas
Cc: linux-pci, Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev,
Stefan Roese
When searching for the error source, the AER driver rules out devices
whose enable_cnt is zero. This was introduced in 2009 by commit
28eb27cf0839 ("PCI AER: support invalid error source IDs") without
providing a rationale.
Drivers typically call pci_enable_device() on probe, hence the enable_cnt
check essentially filters out unbound devices. At the time of the commit,
drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
and so any AER-enabled device could be assumed to be bound to a driver.
The check thus made sense because it allowed skipping config space
accesses to devices which were known not to be the error source.
But since 2022, AER is universally enabled on all devices when they are
enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
AER is native").
Errors may very well be reported by unbound devices, e.g. due to link
instability. By ruling them out as error source, errors reported by them
are neither logged nor cleared. When they do get bound and another error
occurs, the earlier error is reported together with the new error, which
may confuse users. Stop doing so.
Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.0+
---
drivers/pci/pcie/aer.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 4299c55..384d026 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1039,8 +1039,6 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
* 3) There are multiple errors and prior ID comparing fails;
* We check AER status registers to find possible reporter.
*/
- if (atomic_read(&dev->enable_cnt) == 0)
- return false;
/* Check if AER is enabled */
pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &reg16);
--
2.51.0
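
For readers following along, the effect of the removed check can be modelled in userspace. This is only a minimal sketch under stated assumptions, not the kernel code: struct mock_dev, is_error_source_old(), is_error_source_new() and count_sources() are hypothetical names invented for illustration, and the real is_error_source() reads the device's config space rather than struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel type. */
struct mock_dev {
	int enable_cnt;          /* 0 models a device no driver has enabled */
	bool aer_enabled;        /* models error reporting enabled at enumeration */
	unsigned int aer_status; /* nonzero models a latched AER error */
};

/* Behaviour before the patch: devices with enable_cnt == 0 are ruled out. */
static bool is_error_source_old(const struct mock_dev *dev)
{
	if (dev->enable_cnt == 0)
		return false;
	return dev->aer_enabled && dev->aer_status != 0;
}

/* Behaviour after the patch: only reporting state and status matter. */
static bool is_error_source_new(const struct mock_dev *dev)
{
	return dev->aer_enabled && dev->aer_status != 0;
}

/* Count how many devices a given predicate identifies as error source. */
static size_t count_sources(const struct mock_dev *devs, size_t n,
			    bool (*pred)(const struct mock_dev *))
{
	size_t found = 0;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pred(&devs[i]))
			found++;
	return found;
}
```

In this model, an unbound device with a latched error is skipped by the old predicate and found by the new one, which corresponds to the case where its error would otherwise be neither logged nor cleared.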
* Re: [PATCH] PCI/AER: Stop ruling out unbound devices as error source
From: Stefan Roese @ 2026-03-30 6:32 UTC
To: Lukas Wunner, Bjorn Helgaas
Cc: linux-pci, Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev
On 3/27/26 10:56, Lukas Wunner wrote:
> When searching for the error source, the AER driver rules out devices
> whose enable_cnt is zero. This was introduced in 2009 by commit
> 28eb27cf0839 ("PCI AER: support invalid error source IDs") without
> providing a rationale.
>
> Drivers typically call pci_enable_device() on probe, hence the enable_cnt
> check essentially filters out unbound devices. At the time of the commit,
> drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
> and so any AER-enabled device could be assumed to be bound to a driver.
> The check thus made sense because it allowed skipping config space
> accesses to devices which were known not to be the error source.
>
> But since 2022, AER is universally enabled on all devices when they are
> enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
> AER is native").
>
> Errors may very well be reported by unbound devices, e.g. due to link
> instability. By ruling them out as error source, errors reported by them
> are neither logged nor cleared. When they do get bound and another error
> occurs, the earlier error is reported together with the new error, which
> may confuse users. Stop doing so.
>
> Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
> ---
> drivers/pci/pcie/aer.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 4299c55..384d026 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,8 +1039,6 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
> * 3) There are multiple errors and prior ID comparing fails;
> * We check AER status registers to find possible reporter.
> */
> - if (atomic_read(&dev->enable_cnt) == 0)
> - return false;
Reviewed-by: Stefan Roese <stefan.roese@mailbox.org>
Thanks,
Stefan
* Re: [PATCH] PCI/AER: Stop ruling out unbound devices as error source
From: Bjorn Helgaas @ 2026-03-30 19:19 UTC
To: Lukas Wunner
Cc: linux-pci, Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev,
Stefan Roese
On Fri, Mar 27, 2026 at 10:56:43AM +0100, Lukas Wunner wrote:
> When searching for the error source, the AER driver rules out devices
> whose enable_cnt is zero. This was introduced in 2009 by commit
> 28eb27cf0839 ("PCI AER: support invalid error source IDs") without
> providing a rationale.
>
> Drivers typically call pci_enable_device() on probe, hence the enable_cnt
> check essentially filters out unbound devices. At the time of the commit,
> drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
> and so any AER-enabled device could be assumed to be bound to a driver.
> The check thus made sense because it allowed skipping config space
> accesses to devices which were known not to be the error source.
>
> But since 2022, AER is universally enabled on all devices when they are
> enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
> AER is native").
>
> Errors may very well be reported by unbound devices, e.g. due to link
> instability. By ruling them out as error source, errors reported by them
> are neither logged nor cleared. When they do get bound and another error
> occurs, the earlier error is reported together with the new error, which
> may confuse users. Stop doing so.
>
> Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
Applied to pci/aer for v7.1, thanks!
> ---
> drivers/pci/pcie/aer.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 4299c55..384d026 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,8 +1039,6 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
> * 3) There are multiple errors and prior ID comparing fails;
> * We check AER status registers to find possible reporter.
> */
> - if (atomic_read(&dev->enable_cnt) == 0)
> - return false;
>
> /* Check if AER is enabled */
> pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &reg16);
> --
> 2.51.0
>