From: Alex Williamson <alex@shazbot.org>
To: Farhan Ali <alifm@linux.ibm.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
Keith Busch <kbusch@kernel.org>,
linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, lukas@wunner.de, clg@redhat.com,
stable@vger.kernel.org, schnelle@linux.ibm.com,
mjrosato@linux.ibm.com, Bjorn Helgaas <bhelgaas@google.com>,
alex@shazbot.org
Subject: Re: [PATCH v9 3/9] PCI: Avoid saving config space state in reset path
Date: Fri, 20 Feb 2026 13:53:09 -0700
Message-ID: <20260220135309.0014e308@shazbot.org>
In-Reply-To: <f6ef9900-ae3a-4580-a89d-f497fb4e5adf@linux.ibm.com>

On Thu, 19 Feb 2026 10:06:05 -0800
Farhan Ali <alifm@linux.ibm.com> wrote:
> On 2/18/2026 4:20 PM, Bjorn Helgaas wrote:
> > On Wed, Feb 18, 2026 at 01:48:57PM -0800, Farhan Ali wrote:
> >> On 2/18/2026 11:35 AM, Bjorn Helgaas wrote:
> >>> On Wed, Feb 18, 2026 at 12:02:01PM -0700, Keith Busch wrote:
> >>>> On Tue, Feb 17, 2026 at 11:55:43AM -0800, Farhan Ali wrote:
> >>>>> Yes I think you are right, with this change the PCI Command
> >>>>> register gets restored to state at enumeration. So we will
> >>>>> lose the updated state after pci_clear_master() and
> >>>>> pci_enable_device(). I think we can update the vfio driver to
> >>>>> call pci_save_state() after pci_enable_device()?
> >>>> Either that, or move the pci_enable_device() call to after the
> >>>> function reset.
> >>> I kind of like the latter idea because it seems a little simpler
> >>> for the rule of thumb to be that a reset done by the PCI core
> >>> returns the device to the same state as when the driver first
> >>> probed the device. Drivers would generally not use
> >>> pci_save_state() at all, and they could share some initialization
> >>> logic between probe and post-reset recovery.
> >> I think the vfio-pci driver was intentionally doing the
> >> pci_enable_device() before doing the reset. As per commit
> >> 9a92c5091a42 ("vfio-pci: Enable device before attempting reset") it
> >> was done to handle devices using PM reset, that were getting
> >> incorrectly identified not supporting PM reset due to current state
> >> of the device not being D0. It looks like pci_pm_reset() still
> >> returns -EINVAL if current power state is not D0. So I think we
> >> can't move pci_enable_device() after reset. Unless we want to update
> >> pci_pm_reset() to not use cached value of current_state and read it
> >> directly from register?
> > Devices are generally disabled at .probe() time, so that will be the
> > default saved state. But every driver will expect the device to be
> > enabled after the reset. Skipping the save state at reset time seems
> > like it would need a lot of work first and maybe it wouldn't ever be
> > practical. It wasn't really thought out; I was just hoping we could
> > simplify the save-state model and maybe unify driver reset and error
> > recovery paths. I think we need to drop this patch at least for now.
>
> Yeah, I agree this patch might be too disruptive for drivers. In that
> case would my previous version [1] to at least prevent saving state in
> case of an error be acceptable? Or is there another approach we should
> consider?
>
> [1] https://lore.kernel.org/all/20260122194437.1903-4-alifm@linux.ibm.com/
>
> >
> > 9a92c5091a42 ("vfio-pci: Enable device before attempting reset") was
> > mostly done to make pci_pm_reset() work, which requires the device to
> > be in D0. The main purpose of pci_enable_device() is to make device
> > BARs accessible; it *does* also put the device in D0 because BARs are
> > only accessible in D0, but pci_pm_reset() itself doesn't need the
> > BARs.
> >
> > Other reset methods, e.g., FLR, don't seem to require the device to be
> > in D0, so I'm not sure why pci_pm_reset() requires that. I think the
> > critical piece is the D3->D0 transition, and maybe we could arrange
> > for that to happen even if the device is already in D1/D2/D3hot or
> > even D3cold.
>
> Looking at the PCI spec (v6.1) I didn't see any requirement for the
> device to be in D0 state to perform a power state change. So I think we
> should be able to transition from D1/D2/D3hot to D0. But IIUC if a
> device is in D3cold, then won't register reads/writes fail till power is
> available to the device?

Yes, config space could be inaccessible in D3cold. IIRC, 9a92c5091a42
was specifically addressing that devices are typically provided to the
driver in the PCI_UNKNOWN state and at the time vfio-pci wasn't
changing that in the .probe function, like most drivers would, so we
needed to adjust the ordering of enabling the device versus calling
the reset function.

Now that we've gained PM management in vfio-pci, that's no longer an
issue, but pci_pm_reset() does still require the device to arrive in
D0. Accepting devices arriving in D3cold or D3hot (with NoSoftReset-)
might avoid a power state bounce in some circumstances, but would not
have solved the original 9a92c5091a42 scenario where the device was in
PCI_UNKNOWN power state.
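
To make the D0 precondition concrete, here is a minimal userspace
sketch of the behavior described in this thread. It is not kernel
code: pm_model_dev, pm_model_reset() and the PM_MODEL_* states are
hypothetical names, and the only behavior taken from the discussion is
that the reset returns -EINVAL when the cached power state is not D0.

```c
#include <errno.h>

/* Hypothetical model of the cached power state; not the kernel's types. */
enum pm_model_state {
	PM_MODEL_D0,
	PM_MODEL_D3HOT,
	PM_MODEL_D3COLD,
	PM_MODEL_UNKNOWN,	/* state in which a device is typically handed to a driver */
};

struct pm_model_dev {
	enum pm_model_state current_state;	/* cached value, not read from hardware */
};

/*
 * Model of a PM reset: refuse unless the cached state is D0, otherwise
 * bounce D0 -> D3hot -> D0, which is what resets the function.
 */
static int pm_model_reset(struct pm_model_dev *dev)
{
	if (dev->current_state != PM_MODEL_D0)
		return -EINVAL;

	dev->current_state = PM_MODEL_D3HOT;
	dev->current_state = PM_MODEL_D0;
	return 0;
}
```

In this model a device still in PM_MODEL_UNKNOWN fails the reset until
something moves it to D0 first, which is the ordering problem
9a92c5091a42 worked around by enabling the device before the reset.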

Sorry I missed my opportunity to reply to the suggestion for this
approach in the previous revision. I'm not sure if anything
specifically breaks with this approach to restore the initial device
state, but it's certainly not the contract I currently expect as a
user of the reset-function interfaces. I think that contract is
"reset the internal state of the device while saving and restoring
current config space". If we stray from that, what's the expectation
for things like resizable BARs? I don't think we want to reprovision
resources as a result of reset.

Here we seem to be worried about a specific, testable scenario where
config space might be inaccessible after an error, yet we are applying
the workaround regardless of whether that specific scenario is present.
I don't see that a "test if config space is accessible and stuff the
original save state into the buffer rather than creating an invalid
save state" should be so complex as to require this simplification and
associated risk. Thanks,

Alex
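
The "test if config space is accessible" idea above could look roughly
like the following userspace sketch. It is only an illustration under
stated assumptions: cfg_model_dev, cfg_model_read() and
cfg_model_save_state() are hypothetical names, and an all-ones read of
the first dword (Vendor ID/Device ID) is assumed to mean that config
space is inaccessible. When that happens, the previously saved state is
left in the buffer instead of being overwritten with invalid data.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_MODEL_DWORDS 16	/* standard 64-byte config header */

/* Hypothetical device model; not the kernel's struct pci_dev. */
struct cfg_model_dev {
	bool accessible;			/* false: every read returns all ones */
	uint32_t config[CFG_MODEL_DWORDS];	/* live config space */
	uint32_t saved[CFG_MODEL_DWORDS];	/* saved-state buffer */
};

static uint32_t cfg_model_read(const struct cfg_model_dev *dev, unsigned int idx)
{
	return dev->accessible ? dev->config[idx] : UINT32_MAX;
}

/*
 * Save config space only if it is readable. Returns true when a fresh
 * snapshot was taken, false when the existing saved state was kept.
 */
static bool cfg_model_save_state(struct cfg_model_dev *dev)
{
	unsigned int i;

	/* All ones is never a valid Vendor ID/Device ID dword. */
	if (cfg_model_read(dev, 0) == UINT32_MAX)
		return false;

	for (i = 0; i < CFG_MODEL_DWORDS; i++)
		dev->saved[i] = cfg_model_read(dev, i);

	return true;
}
```

A later restore would then write back the last good snapshot, so the
"save and restore current config space" contract is preserved whenever
the device is reachable.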

Thread overview: 21+ messages
2026-02-17 18:22 [PATCH v9 0/9] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-02-17 18:22 ` [PATCH v9 1/9] PCI: Allow per function PCI slots Farhan Ali
2026-02-17 21:14 ` Keith Busch
2026-02-17 22:26 ` Farhan Ali
2026-02-19 21:37 ` Niklas Schnelle
2026-02-17 18:22 ` [PATCH v9 2/9] s390/pci: Add architecture specific resource/bus address translation Farhan Ali
2026-02-17 18:22 ` [PATCH v9 3/9] PCI: Avoid saving config space state in reset path Farhan Ali
2026-02-17 19:11 ` Keith Busch
2026-02-17 19:55 ` Farhan Ali
2026-02-18 19:02 ` Keith Busch
2026-02-18 19:35 ` Bjorn Helgaas
2026-02-18 21:48 ` Farhan Ali
2026-02-19 0:20 ` Bjorn Helgaas
2026-02-19 18:06 ` Farhan Ali
2026-02-20 20:53 ` Alex Williamson [this message]
2026-02-17 18:22 ` [PATCH v9 4/9] PCI: Add additional checks for flr reset Farhan Ali
2026-02-17 18:22 ` [PATCH v9 5/9] s390/pci: Update the logic for detecting passthrough device Farhan Ali
2026-02-17 18:22 ` [PATCH v9 6/9] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-02-17 18:22 ` [PATCH v9 7/9] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-02-17 18:22 ` [PATCH v9 8/9] vfio: Add a reset_done callback for vfio-pci driver Farhan Ali
2026-02-17 18:22 ` [PATCH v9 9/9] vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali