From: Alex Williamson <alex.williamson@redhat.com>
To: Farhan Ali <alifm@linux.ibm.com>
Cc: linux-s390@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, helgaas@kernel.org,
schnelle@linux.ibm.com, mjrosato@linux.ibm.com
Subject: Re: [PATCH v2 1/9] PCI: Avoid restoring error values in config space
Date: Tue, 26 Aug 2025 09:48:45 -0600
Message-ID: <20250826094845.517e0fa7.alex.williamson@redhat.com>
In-Reply-To: <eb6d05d0-b448-4f4e-a734-50c56078dd9b@linux.ibm.com>
On Mon, 25 Aug 2025 15:13:00 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> On 8/25/2025 2:35 PM, Alex Williamson wrote:
> > On Mon, 25 Aug 2025 10:12:18 -0700
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >
> >> The current reset process saves the device's config space state before
> >> reset and restores it afterward. However, when a device is in an error
> >> state before reset, config space reads may return error values instead of
> >> valid data. This results in saving corrupted values that get written back
> >> to the device during state restoration. Add validation to prevent writing
> >> error values to the device when restoring the config space state after
> >> reset.
> >>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >> drivers/pci/pci.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index b0f4d98036cd..0dd95d782022 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -1825,6 +1825,9 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> >> if (!force && val == saved_val)
> >> return;
> >>
> >> + if (PCI_POSSIBLE_ERROR(saved_val))
> >> + return;
> >> +
> >> for (;;) {
> >> pci_dbg(pdev, "restore config %#04x: %#010x -> %#010x\n",
> >> offset, val, saved_val);
> >
> > The commit log makes this sound like more than it is. We're really
> > only error checking the first 64 bytes of config space before restore,
> > the capabilities are not checked. I suppose skipping the BARs and
> > whatnot is no worse than writing -1 to them, but this is only a
> > complete solution in the narrow case where we're relying on vfio-pci to
> > come in and restore the pre-open device state.
> >
> > I had imagined that pci_save_state() might detect the error state of
> > the device, avoid setting state_saved, but we'd still perform the
> > restore callouts that only rely on internal kernel state, maybe adding a
> > fallback to restore the BARs from resource information.
>
> I initially started with pci_save_state(), and avoid saving the state
> altogether. But that would mean we don't go restore the msix state and
> for s390 don't call arch_restore_msi_irqs(). Do you prefer to avoid
> saving the state at all? This change was small and sufficient enough to
> avoid breaking the device in my testing.
If we're only reading -1 from the device anyway, I'm not sure what
value we add by continuing to save bogus data from the device.
There are also various restore sub-functions that don't need that saved
state, e.g. PASID, PRI, ATS, REBAR, AER, MSI, MSIX, ACS, VF REBAR,
SRIOV. We could push the state_saved check down into the functions
that do need the prior device state, add warnings, and let the remaining
functions proceed. We really need to at least pull BAR values from
resource information for there to be a chance of a functional device
without relying on vfio-pci to restore them, though. Thanks,
Alex
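
The flow suggested above can be sketched as a small userspace model. This
is only an illustration under stated assumptions, not kernel code: struct
model_dev, model_read() and the model_*() helpers are hypothetical names
invented here, the all-ones check stands in for PCI_POSSIBLE_ERROR(), and
resource_bar[] stands in for the BAR values the kernel already tracks in
its resource information. The save step declines to record state when the
device answers with all ones; the restore step keeps the state_saved check
inside the one function that needs prior device state, falls back to the
resource-derived BARs, and still runs the MSI-X restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CFG_DWORDS     16           /* first 64 bytes of config space */
#define BAR0_DWORD     4            /* BAR0 sits at config offset 0x10 */
#define NUM_BARS       6
#define ERROR_RESPONSE 0xffffffffu  /* all ones, as read from a failed device */

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct model_dev {
	uint32_t config[CFG_DWORDS];        /* simulated device registers */
	uint32_t saved_config[CFG_DWORDS];  /* what the save step captured */
	uint32_t resource_bar[NUM_BARS];    /* BAR values known kernel-side */
	bool in_error;                      /* reads return all ones when set */
	bool state_saved;
	bool msix_restored;
};

static uint32_t model_read(const struct model_dev *d, int i)
{
	return d->in_error ? ERROR_RESPONSE : d->config[i];
}

/* Save step: do not record state if the device answers with all ones. */
static void model_save_state(struct model_dev *d)
{
	d->state_saved = false;
	if (model_read(d, 0) == ERROR_RESPONSE)  /* vendor/device ID dword */
		return;
	for (int i = 0; i < CFG_DWORDS; i++)
		d->saved_config[i] = model_read(d, i);
	d->state_saved = true;
}

/* Needs prior device state, so the state_saved check lives here. */
static void model_restore_config(struct model_dev *d)
{
	if (!d->state_saved) {
		fprintf(stderr, "no saved state, restoring BARs from resources\n");
		for (int bar = 0; bar < NUM_BARS; bar++) {
			assert(BAR0_DWORD + bar < CFG_DWORDS);
			d->config[BAR0_DWORD + bar] = d->resource_bar[bar];
		}
		return;
	}
	for (int i = 0; i < CFG_DWORDS; i++)
		d->config[i] = d->saved_config[i];
}

/* Relies only on kernel-side state, so it runs unconditionally. */
static void model_restore_msix(struct model_dev *d)
{
	d->msix_restored = true;
}

static void model_restore_state(struct model_dev *d)
{
	model_restore_config(d);
	model_restore_msix(d);
	d->state_saved = false;
}
```

In this model, a device that was in error at save time still comes back
with usable BARs and a restored MSI-X state, while the bogus all-ones
values are never written back. Which real restore helpers would need the
state_saved check is left open here, as it is in the discussion above.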
Thread overview: 17+ messages
2025-08-25 17:12 [PATCH v2 0/9] Error recovery for vfio-pci devices on s390x Farhan Ali
2025-08-25 17:12 ` [PATCH v2 1/9] PCI: Avoid restoring error values in config space Farhan Ali
2025-08-25 21:35 ` Alex Williamson
2025-08-25 22:13 ` Farhan Ali
2025-08-26 15:48 ` Alex Williamson [this message]
2025-08-25 17:12 ` [PATCH v2 2/9] PCI: Add additional checks for flr and pm reset Farhan Ali
2025-08-25 21:54 ` Alex Williamson
2025-08-25 22:28 ` Farhan Ali
2025-08-25 17:12 ` [PATCH v2 3/9] PCI: Allow per function PCI slots for hypervisor isolated functions Farhan Ali
2025-08-27 7:50 ` Niklas Schnelle
2025-08-25 17:12 ` [PATCH v2 4/9] s390/pci: Restore airq unconditionally for the zPCI device Farhan Ali
2025-08-27 13:27 ` Niklas Schnelle
2025-08-25 17:12 ` [PATCH v2 5/9] s390/pci: Update the logic for detecting passthrough device Farhan Ali
2025-08-25 17:12 ` [PATCH v2 6/9] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2025-08-25 17:12 ` [PATCH v2 7/9] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2025-08-25 17:12 ` [PATCH v2 8/9] vfio: Add a reset_done callback for vfio-pci driver Farhan Ali
2025-08-25 17:12 ` [PATCH v2 9/9] vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali