linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules
@ 2025-11-21 17:31 Lukas Wunner
  2025-11-21 18:57 ` Farhan Ali
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Lukas Wunner @ 2025-11-21 17:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Corbet
  Cc: Rafael J. Wysocki, Farhan Ali, Benjamin Block, Niklas Schnelle,
	Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev, linux-pci,
	linux-pm, Linas Vepstas, linux-doc

After recovering from a PCI error through reset, affected devices are in
D0_uninitialized state and need to be brought into D0_active state by
re-initializing their Config Space registers (PCIe r7.0 sec 5.3.1.1).

To facilitate that, the PCI core provides pci_restore_state() and
pci_save_state() helpers.  Document rules governing their usage.

As Bjorn notes, so far no file in "Documentation/ includes anything about
the idea of a driver using pci_save_state() to capture the state it wants
to restore after an error", even though it is a common pattern in drivers.
So that's obviously a gap that should be closed.

Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Closes: https://lore.kernel.org/r/20251113161556.GA2284238@bhelgaas/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 Documentation/PCI/pci-error-recovery.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
index 5df481a..43bc4e3 100644
--- a/Documentation/PCI/pci-error-recovery.rst
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -326,6 +326,21 @@ be recovered, there is nothing more that can be done;  the platform
 will typically report a "permanent failure" in such a case.  The
 device will be considered "dead" in this case.
 
+Drivers typically need to call pci_restore_state() after reset to
+re-initialize the device's config space registers and thereby
+bring it from D0\ :sub:`uninitialized` into D0\ :sub:`active` state
+(PCIe r7.0 sec 5.3.1.1).  The PCI core invokes pci_save_state()
+on enumeration after initializing config space to ensure that a
+saved state is available for subsequent error recovery.
+Drivers which modify config space on probe may need to invoke
+pci_save_state() afterwards to record those changes for later
+error recovery.  When going into system suspend, pci_save_state()
+is called for every PCI device and that state will be restored
+not only on resume, but also on any subsequent error recovery.
+In the unlikely event that the saved state recorded on suspend
+is unsuitable for error recovery, drivers should call
+pci_save_state() on resume.
+
 Drivers for multi-function cards will need to coordinate among
 themselves as to which driver instance will perform any "one-shot"
 or global device initialization. For example, the Symbios sym53cxx2
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  2025-11-21 17:31 [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules Lukas Wunner
@ 2025-11-21 18:57 ` Farhan Ali
  2025-11-22 10:24   ` Lukas Wunner
  2025-11-24 18:23 ` Rafael J. Wysocki
  2025-11-24 22:54 ` Bjorn Helgaas
  2 siblings, 1 reply; 5+ messages in thread
From: Farhan Ali @ 2025-11-21 18:57 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, Jonathan Corbet
  Cc: Rafael J. Wysocki, Benjamin Block, Niklas Schnelle,
	Mahesh J Salgaonkar, Oliver OHalloran, linuxppc-dev, linux-pci,
	linux-pm, Linas Vepstas, linux-doc

Hi Lukas,

Thanks for the update to documentation.

On 11/21/2025 9:31 AM, Lukas Wunner wrote:
> After recovering from a PCI error through reset, affected devices are in
> D0_uninitialized state and need to be brought into D0_active state by
> re-initializing their Config Space registers (PCIe r7.0 sec 5.3.1.1).
>
> To facilitate that, the PCI core provides pci_restore_state() and
> pci_save_state() helpers.  Document rules governing their usage.
>
> As Bjorn notes, so far no file in "Documentation/ includes anything about
> the idea of a driver using pci_save_state() to capture the state it wants
> to restore after an error", even though it is a common pattern in drivers.
> So that's obviously a gap that should be closed.
>
> Reported-by: Bjorn Helgaas <helgaas@kernel.org>
> Closes: https://lore.kernel.org/r/20251113161556.GA2284238@bhelgaas/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>   Documentation/PCI/pci-error-recovery.rst | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 5df481a..43bc4e3 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -326,6 +326,21 @@ be recovered, there is nothing more that can be done;  the platform
>   will typically report a "permanent failure" in such a case.  The
>   device will be considered "dead" in this case.
>   
> +Drivers typically need to call pci_restore_state() after reset to
> +re-initialize the device's config space registers and thereby
> +bring it from D0\ :sub:`uninitialized` into D0\ :sub:`active` state
> +(PCIe r7.0 sec 5.3.1.1).  The PCI core invokes pci_save_state()
> +on enumeration after initializing config space to ensure that a
> +saved state is available for subsequent error recovery.
> +Drivers which modify config space on probe may need to invoke
> +pci_save_state() afterwards to record those changes for later
> +error recovery.  When going into system suspend, pci_save_state()
> +is called for every PCI device and that state will be restored
> +not only on resume, but also on any subsequent error recovery.

Nit: Should we clarify in the above sentence on what calls the 
pci_save_state() when going into suspend? My assumption is the 
pci_save_state() is called by the PCI core and not the drivers?

> +In the unlikely event that the saved state recorded on suspend
> +is unsuitable for error recovery, drivers should call
> +pci_save_state() on resume.
> +

What should the PCI core do if the saved state recorded is bad? should 
we continue to restore the device with the recorded bad state? On s390 
restoring the device with the bad state can break the device put into 
error again.

Thanks

Farhan




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  2025-11-21 18:57 ` Farhan Ali
@ 2025-11-22 10:24   ` Lukas Wunner
  0 siblings, 0 replies; 5+ messages in thread
From: Lukas Wunner @ 2025-11-22 10:24 UTC (permalink / raw)
  To: Farhan Ali
  Cc: Bjorn Helgaas, Jonathan Corbet, Rafael J. Wysocki, Benjamin Block,
	Niklas Schnelle, Mahesh J Salgaonkar, Oliver OHalloran,
	linuxppc-dev, linux-pci, linux-pm, Linas Vepstas, linux-doc

On Fri, Nov 21, 2025 at 10:57:24AM -0800, Farhan Ali wrote:
> On 11/21/2025 9:31 AM, Lukas Wunner wrote:
> > +++ b/Documentation/PCI/pci-error-recovery.rst
> > @@ -326,6 +326,21 @@ be recovered, there is nothing more that can be done;  the platform
> >   will typically report a "permanent failure" in such a case.  The
> >   device will be considered "dead" in this case.
> > +Drivers typically need to call pci_restore_state() after reset to
> > +re-initialize the device's config space registers and thereby
> > +bring it from D0\ :sub:`uninitialized` into D0\ :sub:`active` state
> > +(PCIe r7.0 sec 5.3.1.1).  The PCI core invokes pci_save_state()
> > +on enumeration after initializing config space to ensure that a
> > +saved state is available for subsequent error recovery.
> > +Drivers which modify config space on probe may need to invoke
> > +pci_save_state() afterwards to record those changes for later
> > +error recovery.  When going into system suspend, pci_save_state()
> > +is called for every PCI device and that state will be restored
> > +not only on resume, but also on any subsequent error recovery.
> 
> Nit: Should we clarify in the above sentence on what calls the
> pci_save_state() when going into suspend? My assumption is the
> pci_save_state() is called by the PCI core and not the drivers?

Per section 3.1.2 of Documentation/power/pci.rst, pci_save_state()
may be called by either the driver or the PCI core.  Normally it's
the PCI core's responsibility, but a driver may choose to call it
and bring the device into a low power state itself.  The PCI core
recognizes that by looking at the state_saved flag in struct pci_dev
and will then neither call pci_save_state() nor transition the device
to a low power state.  That is the (only) purpose of the flag.

I could maybe add a cross-reference pointing to Documentation/power/pci.rst.
And/or that document could be moved to Documentation/PCI/.

> What should the PCI core do if the saved state recorded is bad? should we
> continue to restore the device with the recorded bad state?

Basically the answer is, it should never happen and if it does,
we've got a bug somewhere.

> On s390 restoring the device with the bad state can break the device
> put into error again.

My (limited) understanding is that you may end up with a bad
saved state on s390 virtualization scenarios because you're
telling the PCI core in the ->error_detected phase() that the
device has recovered and then you try to reset and recover the
device on your own.  I think the solution is to enhance qemu
to integrate better with error recovery on the host.

Thanks,

Lukas


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  2025-11-21 17:31 [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules Lukas Wunner
  2025-11-21 18:57 ` Farhan Ali
@ 2025-11-24 18:23 ` Rafael J. Wysocki
  2025-11-24 22:54 ` Bjorn Helgaas
  2 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2025-11-24 18:23 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Jonathan Corbet, Rafael J. Wysocki, Farhan Ali,
	Benjamin Block, Niklas Schnelle, Mahesh J Salgaonkar,
	Oliver OHalloran, linuxppc-dev, linux-pci, linux-pm,
	Linas Vepstas, linux-doc

On Fri, Nov 21, 2025 at 6:31 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> After recovering from a PCI error through reset, affected devices are in
> D0_uninitialized state and need to be brought into D0_active state by
> re-initializing their Config Space registers (PCIe r7.0 sec 5.3.1.1).
>
> To facilitate that, the PCI core provides pci_restore_state() and
> pci_save_state() helpers.  Document rules governing their usage.
>
> As Bjorn notes, so far no file in "Documentation/ includes anything about
> the idea of a driver using pci_save_state() to capture the state it wants
> to restore after an error", even though it is a common pattern in drivers.
> So that's obviously a gap that should be closed.
>
> Reported-by: Bjorn Helgaas <helgaas@kernel.org>
> Closes: https://lore.kernel.org/r/20251113161556.GA2284238@bhelgaas/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

It looks good to me, so

Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

> ---
>  Documentation/PCI/pci-error-recovery.rst | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 5df481a..43bc4e3 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -326,6 +326,21 @@ be recovered, there is nothing more that can be done;  the platform
>  will typically report a "permanent failure" in such a case.  The
>  device will be considered "dead" in this case.
>
> +Drivers typically need to call pci_restore_state() after reset to
> +re-initialize the device's config space registers and thereby
> +bring it from D0\ :sub:`uninitialized` into D0\ :sub:`active` state
> +(PCIe r7.0 sec 5.3.1.1).  The PCI core invokes pci_save_state()
> +on enumeration after initializing config space to ensure that a
> +saved state is available for subsequent error recovery.
> +Drivers which modify config space on probe may need to invoke
> +pci_save_state() afterwards to record those changes for later
> +error recovery.  When going into system suspend, pci_save_state()
> +is called for every PCI device and that state will be restored
> +not only on resume, but also on any subsequent error recovery.
> +In the unlikely event that the saved state recorded on suspend
> +is unsuitable for error recovery, drivers should call
> +pci_save_state() on resume.
> +
>  Drivers for multi-function cards will need to coordinate among
>  themselves as to which driver instance will perform any "one-shot"
>  or global device initialization. For example, the Symbios sym53cxx2
> --
> 2.51.0
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  2025-11-21 17:31 [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules Lukas Wunner
  2025-11-21 18:57 ` Farhan Ali
  2025-11-24 18:23 ` Rafael J. Wysocki
@ 2025-11-24 22:54 ` Bjorn Helgaas
  2 siblings, 0 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2025-11-24 22:54 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jonathan Corbet, Rafael J. Wysocki, Farhan Ali, Benjamin Block,
	Niklas Schnelle, Mahesh J Salgaonkar, Oliver OHalloran,
	linuxppc-dev, linux-pci, linux-pm, Linas Vepstas, linux-doc

On Fri, Nov 21, 2025 at 06:31:17PM +0100, Lukas Wunner wrote:
> After recovering from a PCI error through reset, affected devices are in
> D0_uninitialized state and need to be brought into D0_active state by
> re-initializing their Config Space registers (PCIe r7.0 sec 5.3.1.1).
> 
> To facilitate that, the PCI core provides pci_restore_state() and
> pci_save_state() helpers.  Document rules governing their usage.
> 
> As Bjorn notes, so far no file in "Documentation/ includes anything about
> the idea of a driver using pci_save_state() to capture the state it wants
> to restore after an error", even though it is a common pattern in drivers.
> So that's obviously a gap that should be closed.
> 
> Reported-by: Bjorn Helgaas <helgaas@kernel.org>
> Closes: https://lore.kernel.org/r/20251113161556.GA2284238@bhelgaas/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Applied with Rafael's ack to pci/err for v6.19, thanks!

> ---
>  Documentation/PCI/pci-error-recovery.rst | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 5df481a..43bc4e3 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -326,6 +326,21 @@ be recovered, there is nothing more that can be done;  the platform
>  will typically report a "permanent failure" in such a case.  The
>  device will be considered "dead" in this case.
>  
> +Drivers typically need to call pci_restore_state() after reset to
> +re-initialize the device's config space registers and thereby
> +bring it from D0\ :sub:`uninitialized` into D0\ :sub:`active` state
> +(PCIe r7.0 sec 5.3.1.1).  The PCI core invokes pci_save_state()
> +on enumeration after initializing config space to ensure that a
> +saved state is available for subsequent error recovery.
> +Drivers which modify config space on probe may need to invoke
> +pci_save_state() afterwards to record those changes for later
> +error recovery.  When going into system suspend, pci_save_state()
> +is called for every PCI device and that state will be restored
> +not only on resume, but also on any subsequent error recovery.
> +In the unlikely event that the saved state recorded on suspend
> +is unsuitable for error recovery, drivers should call
> +pci_save_state() on resume.
> +
>  Drivers for multi-function cards will need to coordinate among
>  themselves as to which driver instance will perform any "one-shot"
>  or global device initialization. For example, the Symbios sym53cxx2
> -- 
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-11-24 22:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-21 17:31 [PATCH] Documentation: PCI: Amend error recovery doc with pci_save_state() rules Lukas Wunner
2025-11-21 18:57 ` Farhan Ali
2025-11-22 10:24   ` Lukas Wunner
2025-11-24 18:23 ` Rafael J. Wysocki
2025-11-24 22:54 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).