From: Alex Williamson <alex.williamson@redhat.com>
To: Farhan Ali <alifm@linux.ibm.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>,
Bjorn Helgaas <helgaas@kernel.org>,
linux-s390@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, mjrosato@linux.ibm.com
Subject: Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI
Date: Thu, 14 Aug 2025 14:57:43 -0600 [thread overview]
Message-ID: <20250814145743.204ca19a.alex.williamson@redhat.com> (raw)
In-Reply-To: <350a9bd5-c2a9-4206-98fd-8a7913d36112@linux.ibm.com>
On Thu, 14 Aug 2025 09:33:47 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:
> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> > On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
> >> On Wed, 13 Aug 2025 14:52:24 -0700
> >> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>
> >>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
> >>>> On Wed, 13 Aug 2025 10:08:19 -0700
> >>>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>>
> >>>>> For zPCI devices we should drive a platform specific function reset
> >>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
> >>>>> in error state.
> >>>>>
> >>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>>> ---
> >>>>> arch/s390/pci/pci.c | 1 +
> >>>>> drivers/vfio/pci/vfio_pci_core.c | 4 ++++
> >>>>> drivers/vfio/pci/vfio_pci_priv.h | 5 ++++
> >>>>> drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
> >>>>> 4 files changed, 49 insertions(+)
> > --- snip ---
> >>>>>
> >>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
> >>>>> +{
> >>>>> + struct zpci_dev *zdev = to_zpci(vdev->pdev);
> >>>>> + int rc = -EIO;
> >>>>> +
> >>>>> + if (!zdev)
> >>>>> + return -ENODEV;
> >>>>> +
> >>>>> + /*
> >>>>> + * If we can't get the zdev->state_lock the device state is
> >>>>> + * currently undergoing a transition and we bail out - just
> >>>>> + * the same as if the device's state is not configured at all.
> >>>>> + */
> >>>>> + if (!mutex_trylock(&zdev->state_lock))
> >>>>> + return rc;
> >>>>> +
> >>>>> + /* We can reset only if the function is configured */
> >>>>> + if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
> >>>>> + goto out;
> >>>>> +
> >>>>> + rc = zpci_hot_reset_device(zdev);
> >>>>> + if (rc != 0)
> >>>>> + goto out;
> >>>>> +
> >>>>> + if (!vdev->pci_saved_state) {
> >>>>> + pci_err(vdev->pdev, "No saved available for the device");
> >>>>> + rc = -EIO;
> >>>>> + goto out;
> >>>>> + }
> >>>>> +
> >>>>> + pci_dev_lock(vdev->pdev);
> >>>>> + pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
> >>>>> + pci_restore_state(vdev->pdev);
> >>>>> + pci_dev_unlock(vdev->pdev);
> >>>>> +out:
> >>>>> + mutex_unlock(&zdev->state_lock);
> >>>>> + return rc;
> >>>>> +}
> >>>> This looks like it should be a device or arch specific reset
> >>>> implemented in drivers/pci, not vfio. Thanks,
> >>>>
> >>>> Alex
> >>> Are you suggesting to move this to an arch specific function? One thing
> >>> we need to do after the zpci_hot_reset_device, is to correctly restore
> >>> the config space of the device. And for vfio-pci bound devices we want
> >>> to restore the state of the device to when it was initially opened.
> >> We generally rely on the abstraction of pci_reset_function() to select
> >> the correct type of reset for a function scope reset. We've gone to
> >> quite a bit of effort to implement all device specific resets and
> >> quirks in the PCI core to be re-used across the kernel.
> >>
> >> Calling zpci_hot_reset_device() directly seems contradictory to those
> >> efforts. Should pci_reset_function() call this universally on s390x
> >> rather than providing access to FLR/PM/SBR reset?
> >>
> > I agree with you Alex. Still trying to figure out what's needed for
> > this. We already do zpci_hot_reset_device() in reset_slot() from the
> > s390_pci_hpc.c hotplug slot driver and that does get called via
> > pci_reset_hotplug_slot() and pci_reset_function(). There are a few
> > problems though that meant it didn't work for Farhan but I agree maybe
> > we can fix them for the general case. For one pci_reset_function()
> > via DEVICE_RESET first tries FLR but that won't work with the device in
> > the error state and MMIO blocked. Sadly __pci_reset_function_locked()
> > then concludes that other resets also won't work. So that's something
> > we might want to improve in general, for example maybe we need
> > something more like pci_dev_acpi_reset() with higher priority than FLR.
>
> Yeah I did think of adding something like s390x CLP reset as part of the
> reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But
> that would introduce s390x specific code in pci core common code.
>
> >
> > Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> > sure why that won't work as is. @Farhan do you know?
>
> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for
> majority of PCI devices on s390x as that would drive a bus reset. It was
> sufficient as most devices were single bus devices. But in the latest
> generation of machines (z17) we expose true SR-IOV and an OS can have
> access to both PF and VFs and so these are on the same bus and can have
> different ownership based on what is bound to vfio-pci.
>
> My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per
> function reset mechanism, which maps well with what our architecture
> provides. On s390x we can drive a per function reset (via firmware)
> through the CLP instruction driven by the zpci_hot_reset_device(). And
> doing it as vfio zpci specific function would confine the s390x logic.
>
> >> Why is it
> >> universally correct here given the ioctl previously made use of
> >> standard reset mechanisms?
> >>
> >> The DEVICE_RESET ioctl is simply an in-place reset of the device,
> >> without restoring the original device state. So we're also subtly
> >> changing that behavior here, presumably because we're targeting the
> >> specific error recovery case. Have you considered how this might
> >> break non-error-recovery use cases?
> >>
> >> I wonder if we want a different reset mechanism for this use case
> >> rather than these subtle semantic changes.
> > I think an alternative to that, which Farhan actually had in the
> > previous internal version, is to implement
> > pci_error_handlers::reset_done() and do the pci_load_saved_state()
> > there. That would only affect the error recovery case leaving other
> > cases alone.
> >
> >
> > Thanks,
> > Niklas
>
> The reason I abandoned reset_done() callback idea is because its not
> sufficient to recover the device correctly. Today before driving a reset
> we save the state of the device. When a device is in error state, any
> pci load/store (on s390x they are actual instructions :)) to config
> space would return an error value (0xffffffff). We don't have any checks
> in pci_save_state to prevent storing error values. And after a reset
> when we try to restore the config space (pci_dev_restore) we try to
> write the error value and this can be problematic. By the time the
> reset_done() callback is invoked, its already too late.
It's too late because we've re-written the error value back to config
space and as a result the device is broken? What if
pci_restore_state() were a little smarter to detect that it has bad
read data from pci_save_state() and only restores state based on kernel
data? Would that leave the device in a functional state that
reset_done() could restore the original saved state and push it out to
the device?
> @Alex,
> I am open to ideas/suggestions on this. Do we think we need a separate
> VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?
Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
ioctl that doesn't have any flags, so it's not very extensible.
Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
return -ENOTTY if the device is in an error state and config space is
returning -1 such that we fall through to a slot reset that doesn't
care how broken the device is and you auto-magically get the zpci
function you want? Follow-up with pushing the original state in
reset_done()? Thanks,
Alex
next prev parent reply other threads:[~2025-08-14 20:57 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-13 17:08 [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
2025-08-13 17:08 ` [PATCH v1 1/6] s390/pci: Restore airq unconditionally for the zPCI device Farhan Ali
2025-08-14 11:32 ` Niklas Schnelle
2025-08-14 16:42 ` Farhan Ali
2025-08-13 17:08 ` [PATCH v1 2/6] s390/pci: Update the logic for detecting passthrough device Farhan Ali
2025-08-13 17:08 ` [PATCH v1 3/6] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2025-08-13 17:08 ` [PATCH v1 4/6] vfio-pci/zdev: Setup a zpci memory region for error information Farhan Ali
2025-08-13 20:30 ` Alex Williamson
2025-08-13 21:25 ` Farhan Ali
2025-08-13 21:42 ` Alex Williamson
2025-08-13 17:08 ` [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI Farhan Ali
2025-08-13 20:30 ` Alex Williamson
2025-08-13 21:52 ` Farhan Ali
2025-08-13 22:56 ` Alex Williamson
2025-08-14 13:12 ` Niklas Schnelle
2025-08-14 16:33 ` Farhan Ali
2025-08-14 19:55 ` Niklas Schnelle
2025-08-14 20:57 ` Alex Williamson [this message]
2025-08-14 22:33 ` Farhan Ali
2025-08-14 5:22 ` kernel test robot
2025-08-14 7:42 ` kernel test robot
2025-08-13 17:08 ` [PATCH v1 6/6] vfio: Allow error notification and recovery for ISM device Farhan Ali
2025-08-14 20:48 ` Bjorn Helgaas
2025-08-14 21:02 ` Farhan Ali
2025-08-15 20:48 ` Alex Williamson
2025-08-15 21:36 ` Farhan Ali
2025-08-13 17:45 ` [PATCH v1 0/6] Error recovery for vfio-pci devices on s390x Farhan Ali
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250814145743.204ca19a.alex.williamson@redhat.com \
--to=alex.williamson@redhat.com \
--cc=alifm@linux.ibm.com \
--cc=helgaas@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mjrosato@linux.ibm.com \
--cc=schnelle@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).