From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Lukas Wunner <lukas@wunner.de>, Farhan Ali <alifm@linux.ibm.com>
Cc: Benjamin Block <bblock@linux.ibm.com>,
linux-s390@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
alex.williamson@redhat.com, helgaas@kernel.org, clg@redhat.com,
mjrosato@linux.ibm.com
Subject: Re: [PATCH v4 01/10] PCI: Avoid saving error values for config space
Date: Thu, 09 Oct 2025 11:12:03 +0200
Message-ID: <d69f239040b830718b124c5bcef01b5075768226.camel@linux.ibm.com>
In-Reply-To: <aOaqEhLOzWzswx8O@wunner.de>
On Wed, 2025-10-08 at 20:14 +0200, Lukas Wunner wrote:
> On Wed, Oct 08, 2025 at 10:56:35AM -0700, Farhan Ali wrote:
> > On 10/8/2025 6:34 AM, Lukas Wunner wrote:
> > > I'm not sure yet. Let's back up a little: I'm missing an
> > > architectural description how you're intending to do error
> > > recovery in the VM. If I understand correctly, you're
> > > informing the VM of the error via the ->error_detected() callback.
> > >
> > > You're saying you need to check for accessibility of the device
> > > prior to resetting it from the VM, does that mean you're attempting
> > > a reset from the ->error_detected() callback?
> > >
> > > According to Documentation/PCI/pci-error-recovery.rst, the device
> > > isn't supposed to be considered accessible in ->error_detected().
> > > The first callback which allows access is ->mmio_enabled().
> > >
> >
> > The ->error_detected() callback is used to inform userspace of an error. In
> > the case of a VM, using QEMU as a userspace, once notified of an error QEMU
> > will inject an error into the guest in s390x architecture specific way [1]
> > (probably should have linked the QEMU series in the cover letter). Once
> > notified of the error VM's device driver will drive the recovery action. The
> > recovery action require a reset of the device and on s390x PCI devices are
> > reset using architecture specific instructions (zpci_device_hot_reset()).
>
> According to Documentation/PCI/pci-error-recovery.rst:
>
> "STEP 1: Notification
> --------------------
> Platform calls the error_detected() callback on every instance of
> every driver affected by the error.
> At this point, the device might not be accessible anymore, [...]
> it gives the driver a chance to cleanup, waiting for pending stuff
> (timers, whatever, etc...) to complete; it can take semaphores,
> schedule, etc... everything but touch the device."
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> And yet you're touching the device by trying to reset it.
>
> The code you're introducing in patch [01/10] only becomes necessary
> because you're not following the above-quoted protocol. If you
> follow the protocol, patch [01/10] becomes unnecessary.
>
I agree with your point above that error_detected() should not touch
the device. My understanding of Farhan's series, though, is that it
follows that rule. As I understand it, error_detected() is only used to
inject the s390-specific PCI error event into the VM using the
information stored in patch 7. As before, vfio-pci returns
PCI_ERS_RESULT_CAN_RECOVER from error_detected(), but then with patch 7
the pass-through case is detected and this gets turned into
PCI_ERS_RESULT_RECOVERED and the rest of the s390 recovery code gets
skipped. And yeah, writing it down I'm not super happy with this part,
maybe it would be better to have an explicit
PCI_ERS_RESULT_LEAVE_AS_IS.

Either way, this leaves the PCI device in the error state, just as the
platform leaves the device in the error state for the host. Up until
this point, even if the VM/QEMU had already tried to do a reset, it
would get blocked on at least the zdev->state_lock until the recovery
code is done. Only after that would the VM run its recovery code and
with that drive the reset.
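
To make the flow described above concrete, here is a minimal sketch as a
self-contained C model. It is not code from the series: the enum, the
struct and both functions are hypothetical stand-ins (the real code uses
pci_ers_result_t and struct zpci_dev), and the non-pass-through branch
is collapsed into a single flag purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PCI_ERS_RESULT_* values. */
enum model_ers_result {
	MODEL_ERS_CAN_RECOVER,
	MODEL_ERS_NEED_RESET,
	MODEL_ERS_DISCONNECT,
	MODEL_ERS_RECOVERED,
};

/* Hypothetical model of a zPCI device; not the real struct zpci_dev. */
struct model_zdev {
	bool passthrough;       /* device is assigned to a guest via vfio-pci */
	bool error_pending;     /* error info stored for injection into the VM */
	bool host_recovery_ran; /* host continued past the notification step */
};

/*
 * Models the driver's error_detected() callback: notification only.
 * It records the error for later injection and never touches the device.
 */
static enum model_ers_result model_error_detected(struct model_zdev *zdev)
{
	if (zdev->passthrough)
		zdev->error_pending = true;
	return MODEL_ERS_CAN_RECOVER;
}

/*
 * Models the host's recovery entry point. For a pass-through device
 * CAN_RECOVER is turned into RECOVERED and the remaining host steps are
 * skipped, so the device stays in the error state for the guest to reset.
 */
static enum model_ers_result model_host_recover(struct model_zdev *zdev)
{
	enum model_ers_result res = model_error_detected(zdev);

	if (res == MODEL_ERS_CAN_RECOVER && zdev->passthrough)
		return MODEL_ERS_RECOVERED;

	/* Assumption: stands in for the host's remaining recovery steps. */
	zdev->host_recovery_ran = true;
	return MODEL_ERS_RECOVERED;
}
```

In this model a pass-through device comes back as recovered with
error_pending set and host_recovery_ran still false, which is the
"leave as is" outcome discussed above.
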
> > > I also don't quite understand why the VM needs to perform a reset.
> > > Why can't you just let the VM tell the host that a reset is needed
> > > (PCI_ERS_RESULT_NEED_RESET) and then the host resets the device on
> > > behalf of the VM?

The reason is that we want the behavior from the VM's perspective to
follow s390's PCI error event handling architecture. In this model,
however, there is no mechanism to synchronously ask the OS "An error
occurred, do you want the device reset?" or to tell it that we as
hypervisor already unblocked MMIO/DMA or performed a reset. So instead
our idea was that we just do the error_detected() part in the host's
recovery code and then leave the device as is, driving the rest from
the guest.

Thanks,
Niklas