linuxppc-dev.lists.ozlabs.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Riana Tauro <riana.tauro@intel.com>,
	"Sean C. Dardis" <sean.c.dardis@intel.com>,
	Farhan Ali <alifm@linux.ibm.com>,
	Benjamin Block <bblock@linux.ibm.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Alek Du <alek.du@intel.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Oliver OHalloran <oohall@gmail.com>,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	Giovanni Cabiddu <giovanni.cabiddu@intel.com>,
	qat-linux@intel.com, Dave Jiang <dave.jiang@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jirislaby@kernel.org>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH 1/2] PCI: Ensure error recoverability at all times
Date: Fri, 14 Nov 2025 17:39:27 -0600
Message-ID: <20251114233927.GA2340588@bhelgaas>
In-Reply-To: <aRd7y8blTOn1XYFE@wunner.de>

On Fri, Nov 14, 2025 at 07:58:19PM +0100, Lukas Wunner wrote:
> On Thu, Nov 13, 2025 at 10:15:56AM -0600, Bjorn Helgaas wrote:
> > It seems like there are two things going on here, and I'm not sure
> > they're completely compatible:
> > 
> >   1) Driver calls pci_save_state() to take over device power
> >      management and prevent the PCI core from doing it.
> > 
> >   2) Driver calls pci_save_state() to capture the device state it
> >      wants to restore when recovering from an error.
> > 
> > Shouldn't a driver be able to do 2) without also getting 1)?
> 
> In general, it can:
> 
> A number of drivers already call pci_save_state() on probe to capture
> the state for subsequent error recovery.  If the driver has modified
> config space in its probe hook, then calling pci_save_state() continues
> to make sense.  If the driver has *not* modified config space, then the
> call becomes obsolete once this patch is accepted.

So I guess "state_saved == true" means "driver does its own power
management and PCI core shouldn't do it", and drivers that want 2) but
not 1) just need to set state_saved = false after they call
pci_save_state()?

That makes sense in sort of a weird way that makes my head hurt every
time I try to understand it.  I think it's the sequence of
"pci_save_state(); dev->state_saved = false" that seems
counter-intuitive: we just saved the state; why would we immediately
turn around and say we didn't?  Maybe "state_saved" isn't the most
appropriate name.

After error recovery, those drivers will see the state the driver
identified when it called pci_save_state().  But after resume, they
will see the state the PCI core saved at suspend time.  Right?
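
To check my own understanding, here is a user-space toy model of the
flag.  It is not kernel code: the toy_* names and the single integer
standing in for config space are made up for illustration.  It assumes
that saving snapshots the config and sets state_saved, that restoring
writes back the latest snapshot and clears the flag even if the flag
was already clear, and that the suspend path saves only when the flag
is clear.

```c
#include <assert.h>	/* only needed for checks like those in the note below */
#include <stdbool.h>

/* Toy stand-in for struct pci_dev (hypothetical, illustration only). */
struct toy_dev {
	int config;		/* stands in for live config space */
	int saved_config;	/* the single snapshot slot */
	bool state_saved;
};

/* Models pci_save_state(): take a snapshot and set the flag. */
static void toy_save_state(struct toy_dev *dev)
{
	dev->saved_config = dev->config;
	dev->state_saved = true;
}

/* Models restore (assumption): write back the snapshot, clear the flag. */
static void toy_restore_state(struct toy_dev *dev)
{
	dev->config = dev->saved_config;
	dev->state_saved = false;
}

/* Models the core on suspend: save only if the driver did not. */
static void toy_suspend(struct toy_dev *dev)
{
	if (!dev->state_saved)
		toy_save_state(dev);
}

/* A driver that wants 2) but not 1): snapshot, then clear the flag. */
static void toy_probe(struct toy_dev *dev, int probe_config)
{
	dev->config = probe_config;
	toy_save_state(dev);
	dev->state_saved = false;
}

/* Error recovery with no suspend in between: probe-time state returns. */
int toy_after_error_recovery(void)
{
	struct toy_dev dev = {0};

	toy_probe(&dev, 0x10);
	dev.config = 0x20;	/* config changes at runtime */
	dev.config = 0;		/* reset during recovery clobbers it */
	toy_restore_state(&dev);
	return dev.config;
}

/* Suspend/resume: the core's suspend-time snapshot returns. */
int toy_after_resume(void)
{
	struct toy_dev dev = {0};

	toy_probe(&dev, 0x10);
	dev.config = 0x20;	/* config changes at runtime */
	toy_suspend(&dev);	/* flag is clear, so the core saves 0x20 */
	dev.config = 0;		/* power loss clobbers it */
	toy_restore_state(&dev);
	return dev.config;
}
```

In this model, toy_after_error_recovery() returns 0x10 (the probe-time
value) and toy_after_resume() returns 0x20 (the suspend-time value),
which is the asymmetry I'm asking about.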

> The reason I'm using the "in general" qualifier:
> 
> I've identified two corner cases where the PCI core neglects to set
> state_saved = false before commencing the suspend sequence:
> 
> * If a driver has legacy PCI PM callbacks, pci_legacy_suspend() neglects
>   to set state_saved = false.  Yet both pci_legacy_suspend() and
>   pci_legacy_suspend_late() subsequently query the state_saved flag.

> * If a device is unbound or its driver has no PM callbacks
>   (driver->pm == NULL), pci_pm_freeze() neglects to set state_saved = false.
>   Yet pci_pm_freeze_noirq() subsequently queries the state_saved flag.
> 
> In these corner cases, pci_legacy_suspend() and pci_pm_freeze() depend
> on some other part of the PCI core to set state_saved = false.
> For a freshly enumerated device, the flag is initialized to false by
> kzalloc() and pci_device_add() also explicitly sets it to false for good
> measure.  On resume (or thaw or restore), the flag is set to false by
> pci_restore_state().  The latter is preserved as is by my patch and the
> former is moved to pci_bus_add_device() to retain the current behavior.
> 
> Clearly, the two corner cases should be fixed and then setting
> state_saved = false in pci_bus_add_device() becomes unnecessary.
> 
> I'd prefer doing that in a separate step though.
> 
> So drivers which use legacy PCI PM callbacks or have no PM callbacks
> should currently not call pci_save_state() on probe without manually
> setting state_saved = false afterwards.  If they neglect that, then
> pci_legacy_suspend_late() and pci_pm_freeze_noirq() will not call
> pci_save_state() on the next suspend cycle and so the state that
> will be restored on resume is the one recorded on probe, not the
> one that the device had on suspend.  If these two states happen
> to be identical, there's no problem.
> 
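
The corner case described above can be sketched the same way.  Again
this is a user-space toy, not kernel code; the toy_* names are
hypothetical and the suspend step only assumes what is stated above:
the core does not clear state_saved before checking it, and it saves
only if the flag is clear.

```c
#include <assert.h>	/* only needed for checks like those in the note below */
#include <stdbool.h>

/* Toy stand-in for struct pci_dev (hypothetical, illustration only). */
struct toy_dev {
	int config;		/* stands in for live config space */
	int saved_config;	/* the single snapshot slot */
	bool state_saved;
};

/* Models pci_save_state(): take a snapshot and set the flag. */
static void toy_save_state(struct toy_dev *dev)
{
	dev->saved_config = dev->config;
	dev->state_saved = true;
}

/* Models restore: write back the snapshot, clear the flag. */
static void toy_restore_state(struct toy_dev *dev)
{
	dev->config = dev->saved_config;
	dev->state_saved = false;
}

/*
 * One probe/suspend/resume cycle for a driver with legacy PM callbacks
 * (or none).  Returns the config value seen after resume.
 */
int toy_legacy_cycle(bool driver_clears_flag)
{
	struct toy_dev dev = {0};

	/* probe: driver records the state it wants after an error */
	dev.config = 0x10;
	toy_save_state(&dev);
	if (driver_clears_flag)
		dev.state_saved = false;

	dev.config = 0x20;	/* config changes at runtime */

	/* suspend: flag is not reset first, save happens only if clear */
	if (!dev.state_saved)
		toy_save_state(&dev);

	dev.config = 0;		/* power loss clobbers it */

	/* resume */
	toy_restore_state(&dev);
	return dev.config;
}
```

Here toy_legacy_cycle(false) returns 0x10, the probe-time state, while
toy_legacy_cycle(true) returns 0x20, the state the device had on
suspend.
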
> > > > > +++ b/drivers/pci/bus.c
> > > > > @@ -358,6 +358,13 @@ void pci_bus_add_device(struct pci_dev *dev)
> > > > >  	pci_bridge_d3_update(dev);
> > > > >  
> > > > >  	/*
> > > > > +	 * Save config space for error recoverability.  Clear state_saved
> > > > > +	 * to detect whether drivers invoked pci_save_state() on suspen
> [...]
> > > > Can we expand this a little to explain how this is detected and what
> > > > drivers *should* be doing?
> [...]
> > Yes.  I should have proposed some text for the comment, e.g.,
> > 
> >   Save config space for error recoverability.  Clear state_saved.  If
> >   driver calls pci_save_state() again, state_saved will be set and
> >   we'll know that on suspend, the PCI core shouldn't call
> >   pci_save_state() or change the device power state.
> 
> I'm fine with rewording the code comment like this, as well as splitting
> the code comment as suggested by Rafael.  Would you prefer amending the
> code comment when applying or should I respin with a reworded comment?
> 
> Again, clearing state_saved in pci_bus_add_device() is just a temporary
> measure to retain the existing behavior.  It (and an accompanying code
> comment) can be dropped once pci_legacy_suspend() and pci_pm_freeze()
> are fixed.
> 
> > I'm just wishing for a more concrete mention of "pci_save_state()",
> > since that's where the critical "state_saved" flag is updated.
> > 
> > And I'm not sure Documentation/ includes anything about the idea of
> > a driver using pci_save_state() to capture the state it wants to
> > restore after an error.  I guess that's obvious now that I write it
> > down, but I'm sure learning a lot from this conversation :)
> 
> Okay, noted that the documentation could be improved.  I'd be glad
> if this could be postponed to a separate step as well though.
> I can only address problems one at a time. :)

Absolutely.  Would you mind posting an update with the tweaks above?
I'm not at all confident about doing it myself.

Bjorn

