public inbox for linux-pci@vger.kernel.org
* [PATCH] PCI/AER: Stop ruling out unbound devices as error source
@ 2026-03-27  9:56 Lukas Wunner
  2026-03-30  6:32 ` Stefan Roese
  2026-03-30 19:19 ` Bjorn Helgaas
  0 siblings, 2 replies; 3+ messages in thread
From: Lukas Wunner @ 2026-03-27  9:56 UTC
  To: Bjorn Helgaas
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev,
	Stefan Roese

When searching for the error source, the AER driver rules out devices
whose enable_cnt is zero.  This was introduced in 2009 by commit
28eb27cf0839 ("PCI AER: support invalid error source IDs") without
providing a rationale.

Drivers typically call pci_enable_device() on probe, hence the enable_cnt
check essentially filters out unbound devices.  At the time of the commit,
drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
and so any AER-enabled device could be assumed to be bound to a driver.
The check thus made sense because it allowed skipping config space
accesses to devices which were known not to be the error source.

But since 2022, AER is universally enabled on all devices when they are
enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
AER is native").

Errors may very well be reported by unbound devices, e.g. due to link
instability.  By ruling them out as error source, errors reported by them
are neither logged nor cleared.  When they do get bound and another error
occurs, the earlier error is reported together with the new error, which
may confuse users.  Stop doing so.

Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.0+
---
 drivers/pci/pcie/aer.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 4299c55..384d026 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1039,8 +1039,6 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
 	 *      3) There are multiple errors and prior ID comparing fails;
 	 * We check AER status registers to find possible reporter.
 	 */
-	if (atomic_read(&dev->enable_cnt) == 0)
-		return false;
 
 	/* Check if AER is enabled */
 	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &reg16);
-- 
2.51.0
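
The effect described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: struct model_dev, model_is_error_source() and model_find_sources() are hypothetical names, the fields stand in for dev->enable_cnt, the error reporting bits in Device Control and the AER status register, and clearing the status here stands in for the logging and clearing that the AER driver performs once a source has been identified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few pieces of device state that matter here. */
struct model_dev {
	const char *name;
	int enable_cnt;		/* > 0 once a driver called pci_enable_device() */
	bool aer_reporting;	/* models the error reporting bits in Device Control */
	uint32_t aer_status;	/* models pending bits in the AER status register */
};

/*
 * Simplified model of the decision made when the Error Source ID cannot
 * be trusted.  check_enable_cnt = true models the behavior before the
 * patch, false models the behavior after it.
 */
static bool model_is_error_source(const struct model_dev *dev,
				  bool check_enable_cnt)
{
	assert(dev != NULL);

	/* Old behavior: a device without an enable count is ruled out. */
	if (check_enable_cnt && dev->enable_cnt == 0)
		return false;

	/* A device with error reporting disabled cannot be the reporter. */
	if (!dev->aer_reporting)
		return false;

	/* Any pending status bit marks the device as a possible reporter. */
	return dev->aer_status != 0;
}

/*
 * Walk all devices and treat every identified source as handled by
 * clearing its status.  Returns the number of sources found.
 */
static size_t model_find_sources(struct model_dev *devs, size_t n,
				 bool check_enable_cnt)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (model_is_error_source(&devs[i], check_enable_cnt)) {
			devs[i].aer_status = 0;
			found++;
		}
	}
	return found;
}
```

In this model, a device with enable_cnt == 0 and a pending status bit is never found while check_enable_cnt is true, so its status bit stays set. That matches the stale error which the commit message says is later reported together with a new one. With check_enable_cnt set to false, the same device is found and its status is cleared.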



* Re: [PATCH] PCI/AER: Stop ruling out unbound devices as error source
From: Stefan Roese @ 2026-03-30  6:32 UTC
  To: Lukas Wunner, Bjorn Helgaas
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev

On 3/27/26 10:56, Lukas Wunner wrote:
> When searching for the error source, the AER driver rules out devices
> whose enable_cnt is zero.  This was introduced in 2009 by commit
> 28eb27cf0839 ("PCI AER: support invalid error source IDs") without
> providing a rationale.
> 
> Drivers typically call pci_enable_device() on probe, hence the enable_cnt
> check essentially filters out unbound devices.  At the time of the commit,
> drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
> and so any AER-enabled device could be assumed to be bound to a driver.
> The check thus made sense because it allowed skipping config space
> accesses to devices which were known not to be the error source.
> 
> But since 2022, AER is universally enabled on all devices when they are
> enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
> AER is native").
> 
> Errors may very well be reported by unbound devices, e.g. due to link
> instability.  By ruling them out as error source, errors reported by them
> are neither logged nor cleared.  When they do get bound and another error
> occurs, the earlier error is reported together with the new error, which
> may confuse users.  Stop doing so.
> 
> Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
> ---
>   drivers/pci/pcie/aer.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 4299c55..384d026 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,8 +1039,6 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
>   	 *      3) There are multiple errors and prior ID comparing fails;
>   	 * We check AER status registers to find possible reporter.
>   	 */
> -	if (atomic_read(&dev->enable_cnt) == 0)
> -		return false;

Reviewed-by: Stefan Roese <stefan.roese@mailbox.org>

Thanks,
Stefan



* Re: [PATCH] PCI/AER: Stop ruling out unbound devices as error source
From: Bjorn Helgaas @ 2026-03-30 19:19 UTC
  To: Lukas Wunner
  Cc: linux-pci, Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev,
	Stefan Roese

On Fri, Mar 27, 2026 at 10:56:43AM +0100, Lukas Wunner wrote:
> When searching for the error source, the AER driver rules out devices
> whose enable_cnt is zero.  This was introduced in 2009 by commit
> 28eb27cf0839 ("PCI AER: support invalid error source IDs") without
> providing a rationale.
> 
> Drivers typically call pci_enable_device() on probe, hence the enable_cnt
> check essentially filters out unbound devices.  At the time of the commit,
> drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
> and so any AER-enabled device could be assumed to be bound to a driver.
> The check thus made sense because it allowed skipping config space
> accesses to devices which were known not to be the error source.
> 
> But since 2022, AER is universally enabled on all devices when they are
> enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
> AER is native").
> 
> Errors may very well be reported by unbound devices, e.g. due to link
> instability.  By ruling them out as error source, errors reported by them
> are neither logged nor cleared.  When they do get bound and another error
> occurs, the earlier error is reported together with the new error, which
> may confuse users.  Stop doing so.
> 
> Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+

Applied to pci/aer for v7.1, thanks!

> ---
>  drivers/pci/pcie/aer.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 4299c55..384d026 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,8 +1039,6 @@ static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
>  	 *      3) There are multiple errors and prior ID comparing fails;
>  	 * We check AER status registers to find possible reporter.
>  	 */
> -	if (atomic_read(&dev->enable_cnt) == 0)
> -		return false;
>  
>  	/* Check if AER is enabled */
>  	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &reg16);
> -- 
> 2.51.0
> 
