linux-pci.vger.kernel.org archive mirror
From: Farhan Ali <alifm@linux.ibm.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Benjamin Block <bblock@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	alex.williamson@redhat.com, helgaas@kernel.org, clg@redhat.com,
	schnelle@linux.ibm.com, mjrosato@linux.ibm.com
Subject: Re: [PATCH v4 01/10] PCI: Avoid saving error values for config space
Date: Thu, 9 Oct 2025 10:02:12 -0700
Message-ID: <3df48e3e-48e1-4cfb-aca9-7af606481b7c@linux.ibm.com>
In-Reply-To: <aOc_k2MjZI6hYgKy@wunner.de>


On 10/8/2025 9:52 PM, Lukas Wunner wrote:
> On Wed, Oct 08, 2025 at 02:55:56PM -0700, Farhan Ali wrote:
>>>> On 10/8/2025 6:34 AM, Lukas Wunner wrote:
>>>>> I also don't quite understand why the VM needs to perform a reset.
>>>>> Why can't you just let the VM tell the host that a reset is needed
>>>>> (PCI_ERS_RESULT_NEED_RESET) and then the host resets the device on
>>>>> behalf of the VM?
>> The reset is not performed by the VM, reset is still done by the host. My
>> approach for a VM to let the host know that reset was needed, was to
>> intercept any reset instructions for the PCI device in QEMU. QEMU would then
>> drive a reset via VFIO_DEVICE_RESET. Maybe I am missing something, but based
>> on what we have today in vfio driver, we don't have a mechanism for
>> userspace to reset a device other than VFIO_DEVICE_RESET and
>> VFIO_PCI_DEVICE_HOT_RESET ioctls.
> The ask is for the host to notify the VM of the ->error_detected() event
> and the VM then responding with one of the "enum pci_ers_result" values.

Maybe there is some confusion here. Could you clarify what you mean by
the VM responding with "enum pci_ers_result" values? Is it a device
driver (for example, an NVMe driver) running in the VM that should do
that? Or are you suggesting something else?

Let me try to clarify what I am trying to do with this patch series. For
devices passed through to a VM, the driver bound to the device on the
host is vfio-pci. The vfio-pci driver does support the error_detected()
callback (vfio_pci_core_aer_err_detected()), and on a PCI error the
s390x recovery code on the host will call the vfio-pci error_detected()
callback. The vfio-pci error_detected() callback will notify
userspace/QEMU via an eventfd and return PCI_ERS_RESULT_CAN_RECOVER. At
this point the s390x error recovery on the host will skip any further
action (see patch 7) and let userspace drive the error recovery.
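
To illustrate the host-side step above, here is a minimal userspace
sketch. It is a model only, not the kernel code: every model_* name is
made up for this example, the enum merely borrows the spirit of
"enum pci_ers_result", and the is_passthrough flag is an assumed
stand-in for whatever check patch 7 actually performs. The only real
API used is eventfd(2).

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Model-only result codes; NOT the kernel's enum pci_ers_result. */
enum model_ers_result {
	MODEL_ERS_CAN_RECOVER,
	MODEL_ERS_NEED_RESET,
};

/* Hypothetical stand-in for a device bound to vfio-pci on the host. */
struct model_vfio_dev {
	int err_trigger;	/* eventfd that userspace (e.g. QEMU) reads */
	int notified;		/* set to 1 once the eventfd was signalled */
};

/* Models the vfio-pci error_detected() step: signal userspace through
 * the eventfd and report that recovery can proceed. */
static enum model_ers_result
model_vfio_error_detected(struct model_vfio_dev *vdev)
{
	uint64_t one = 1;
	ssize_t n = write(vdev->err_trigger, &one, sizeof(one));

	vdev->notified = (n == (ssize_t)sizeof(one));
	return MODEL_ERS_CAN_RECOVER;
}

/* Models the host recovery code. Returns 0 when the host leaves the
 * remaining recovery to userspace, 1 when it would continue itself. */
static int model_host_handle_error(struct model_vfio_dev *vdev,
				   int is_passthrough)
{
	enum model_ers_result res = model_vfio_error_detected(vdev);

	if (is_passthrough && res == MODEL_ERS_CAN_RECOVER)
		return 0;
	return 1;
}

/* Models the userspace side: returns the number of pending error
 * notifications, or 0 if none are pending (eventfd is non-blocking). */
static uint64_t model_userspace_poll(struct model_vfio_dev *vdev)
{
	uint64_t count = 0;

	if (read(vdev->err_trigger, &count, sizeof(count)) !=
	    (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

In this model, err_trigger would be created with
eventfd(0, EFD_NONBLOCK), so that polling before any error has been
reported simply returns 0 instead of blocking.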

Once userspace/QEMU is notified, it then injects this error into the VM
so device drivers in the VM can take recovery actions. For example, for
a passthrough NVMe device, the VM's OS NVMe driver will access the
device. At this point the VM's NVMe driver's error_detected() will drive
the recovery by returning PCI_ERS_RESULT_NEED_RESET, and the s390x error
recovery in the VM's OS will try to do a reset. Resets are privileged
operations, so the VM will need intervention from QEMU to perform the
reset. QEMU will invoke the ioctls to notify the host that the VM is
requesting a reset of the device. The vfio-pci driver on the host will
then perform the reset on the device.
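
The last step, where userspace asks the host to reset the device, could
look roughly like the sketch below. This is not QEMU's code: the helper
name model_forward_guest_reset() is hypothetical, and device_fd is
assumed to be the VFIO device file descriptor that userspace already
holds for the passed-through function. VFIO_DEVICE_RESET itself is the
existing ioctl from <linux/vfio.h> and takes no argument.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper, called after userspace has intercepted a guest
 * reset request for a passed-through function.
 * Returns 0 on success or a negative errno value on failure.
 */
static int model_forward_guest_reset(int device_fd)
{
	/* The reset itself is carried out by the host vfio-pci driver. */
	if (ioctl(device_fd, VFIO_DEVICE_RESET) < 0)
		return -errno;
	return 0;
}
```

The helper only forwards the request; how the guest's reset instruction
is intercepted in the first place is outside the scope of this sketch.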

Thanks,
Farhan


