linux-pci.vger.kernel.org archive mirror
From: Farhan Ali <alifm@linux.ibm.com>
To: Niklas Schnelle <schnelle@linux.ibm.com>, Lukas Wunner <lukas@wunner.de>
Cc: Benjamin Block <bblock@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	alex.williamson@redhat.com, helgaas@kernel.org, clg@redhat.com,
	mjrosato@linux.ibm.com
Subject: Re: [PATCH v4 01/10] PCI: Avoid saving error values for config space
Date: Thu, 16 Oct 2025 14:00:22 -0700
Message-ID: <1ee79c53-4c29-475f-b44e-6839b1feef78@linux.ibm.com>
In-Reply-To: <bb59edee909ceb09527cedec10896d45126f0027.camel@linux.ibm.com>


On 10/14/2025 5:07 AM, Niklas Schnelle wrote:
> On Sun, 2025-10-12 at 08:34 +0200, Lukas Wunner wrote:
>> On Thu, Oct 09, 2025 at 11:12:03AM +0200, Niklas Schnelle wrote:
>>> On Wed, 2025-10-08 at 20:14 +0200, Lukas Wunner wrote:
>>>> And yet you're touching the device by trying to reset it.
>>>>
>>>> The code you're introducing in patch [01/10] only becomes necessary
>>>> because you're not following the above-quoted protocol.  If you
>>>> follow the protocol, patch [01/10] becomes unnecessary.
>>> I agree with your point above that error_detected() should not touch the
>>> device. My understanding of Farhan's series though is that it follows
>>> that rule. As I understand it error_detected() is only used to inject
>>> the s390 specific PCI error event into the VM using the information
>>> stored in patch 7. As before vfio-pci returns
>>> PCI_ERS_RESULT_CAN_RECOVER from error_detected() but then with patch 7
>>> the pass-through case is detected and this gets turned into
>>> PCI_ERS_RESULT_RECOVERED and the rest of the s390 recovery code gets
>>> skipped. And yeah, writing it down I'm not super happy with this part,
>>> maybe it would be better to have an explicit
>>> PCI_ERS_RESULT_LEAVE_AS_IS.
>> Thanks, that's the high-level overview I was looking for.
>>
>> It would be good to include something like this at least
>> in the cover letter or additionally in the commit messages
>> so that it's easier for reviewers to connect the dots.
>>
>> I was expecting paravirtualized error handling, i.e. the
>> VM is aware it's virtualized and vfio essentially proxies
>> the pci_ers_result return value of the driver (e.g. nvme)
>> back to the host, thereby allowing the host to drive error
>> recovery normally.  I'm not sure if there are technical
>> reasons preventing such an approach.
> It does sound technically feasible but sticking to the already
> architected error reporting and recovery has clear advantages. For one
> it will work with existing Linux versions without guest changes and it
> also has precedent with it working already in the z/VM hypervisor for
> years. I agree that there is some level of mismatch with Linux'
> recovery support but I don't think that outweighs having a clean
> virtualization support where the host and guest use the same interface.
>
>> If you do want to stick with your alternative approach,
>> maybe doing the error handling in the ->mmio_enabled() phase
>> instead of ->error_detected() would make more sense.
>> In that phase you're allowed to access the device,
>> you can also attempt a local reset and return
>> PCI_ERS_RESULT_RECOVERED on success.
>>
>> You'd have to return PCI_ERS_RESULT_CAN_RECOVER though
>> from the ->error_detected() callback in order to progress
>> to the ->mmio_enabled() step.
>>
>> Does that make sense?
>>
>> Thanks,
>>
>> Lukas
> The problem with using ->mmio_enabled() is twofold. For one we
> sometimes have to do a reset instead of clearing the error state, for
> example if the device was not only put in the error state but also
> disabled, or if the guest driver wants it, so we would also have to use
> ->slot_reset() and could end up with two resets. Second and more
> importantly this would break the guest's assumption that the device will
> be in the error state with MMIO and DMA blocked when it gets an error
> event. On the other hand, that's exactly the state it is in if we
> report the error in the ->error_detected() callback and then skip the
> rest of recovery so it can be done in the guest, likely with the exact
> same Linux code. I'd assume this should be similar if QEMU/KVM wanted
> to virtualize AER+DPC except that there MMIO remains accessible?
>
> Here's an idea. Could it be an option to detect the pass-through in the
> vfio-pci driver's ->error_detected() callback, possibly with feedback
> from QEMU (@Alex?), and then return PCI_ERS_RESULT_RECOVERED from there
> skipping the rest of recovery?
>
> The skipping would be in-line with the below section of the
> documentation i.e. "no further intervention":
>
>    - PCI_ERS_RESULT_RECOVERED
>        Driver returns this if it thinks the device is usable despite
>        the error and does not need further intervention.
>
> It's just that in this case the device really remains with MMIO and DMA
> blocked, usable only in the sense that the vfio-pci + guest VM combo
> knows how to use a device with MMIO and DMA blocked with the guest
> recovery.
>
> Thanks,
> Niklas

Hi Lukas,

I hope this helps clarify why we still need patch [01/10], or at least
the check in pci_save_state() for whether the device responds with an
error value, if we move forward with your patch series "PCI: Universal
error recoverability of devices". If the overhead in pci_save_state()
is a concern, we can discuss moving that check somewhere else.

After discussing with Niklas off-list, we were wondering whether
vfio_pci_core_aer_err_detected() should return PCI_ERS_RESULT_RECOVERED
when no further intervention from platform recovery is needed, to align
more closely with the pcie-error-recovery documentation. One proposal
is to add a flag to struct vfio_pci_core_device (e.g.
vdev->mediated_recovery) and have vfio_pci_core_aer_err_detected()
return PCI_ERS_RESULT_RECOVERED if the flag is set. Userspace could set
the flag using VFIO_DEVICE_FEATURE_SET for the device feature
VFIO_DEVICE_FEATURE_ZPCI_ERROR. I would like to hear Alex's thoughts on
this proposal.
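
To make the flag idea concrete, here is a rough userspace mock of the
decision I have in mind. It is only a sketch and not kernel code: the
mock_* types and the function name are stand-ins invented for
illustration, mediated_recovery is just the proposed (not yet existing)
field, and the real callback takes a struct pci_dev and a channel state
rather than the device struct directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's pci_ers_result_t values (mock only). */
enum mock_ers_result {
	MOCK_ERS_RESULT_CAN_RECOVER,
	MOCK_ERS_RESULT_RECOVERED,
};

/* Mock of struct vfio_pci_core_device holding only the proposed flag. */
struct mock_vfio_pci_core_device {
	/* Hypothetical: would be set by userspace via VFIO_DEVICE_FEATURE_SET. */
	bool mediated_recovery;
};

/*
 * Mock of the decision proposed for vfio_pci_core_aer_err_detected():
 * report RECOVERED when userspace has taken over recovery, so the
 * platform skips its remaining recovery steps; otherwise keep today's
 * CAN_RECOVER answer.
 */
enum mock_ers_result
mock_aer_err_detected(const struct mock_vfio_pci_core_device *vdev)
{
	assert(vdev != NULL);

	if (vdev->mediated_recovery)
		return MOCK_ERS_RESULT_RECOVERED;

	return MOCK_ERS_RESULT_CAN_RECOVER;
}
```

With the flag clear the behaviour stays as it is today, so only
userspace that explicitly opts in would see a change.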

Thanks

Farhan

