public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
From: Farhan Ali <alifm@linux.ibm.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: lukas@wunner.de, alex@shazbot.org, kbusch@kernel.org,
	clg@redhat.com, stable@vger.kernel.org, schnelle@linux.ibm.com,
	mjrosato@linux.ibm.com, linux-s390@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v11 0/9] Error recovery for vfio-pci devices on s390x
Date: Tue, 24 Mar 2026 12:34:13 -0700	[thread overview]
Message-ID: <f171af52-87f5-4b1e-9caf-6d7b4e161534@linux.ibm.com> (raw)
In-Reply-To: <20260316191544.2279-1-alifm@linux.ibm.com>

Hi Bjorn,

Polite ping for the PCI patches in the series. I would appreciate any 
feedback for these patches :) .

Thanks

Farhan

On 3/16/2026 12:15 PM, Farhan Ali wrote:
> Hi,
>
> This Linux kernel patch series introduces support for error recovery for
> passthrough PCI devices on System Z (s390x).
>
> Background
> ----------
> For PCI devices on s390x an operating system receives platform specific
> error events from firmware rather than through AER.Today for
> passthrough/userspace devices, we don't attempt any error recovery and
> ignore any error events for the devices. The passthrough/userspace devices
> are managed by the vfio-pci driver. The driver does register error handling
> callbacks (error_detected), and on an error trigger an eventfd to
> userspace.  But we need a mechanism to notify userspace
> (QEMU/guest/userspace drivers) about the error event.
>
> Proposal
> --------
> We can expose this error information (currently only the PCI Error Code)
> via a device feature. Userspace can then obtain the error information
> via VFIO_DEVICE_FEATURE ioctl and take appropriate actions such as driving
> a device reset.
>
> This is how a typical flow for passthrough devices to a VM would work:
> For passthrough devices to a VM, the driver bound to the device on the host
> is vfio-pci. vfio-pci driver does support the error_detected() callback
> (vfio_pci_core_aer_err_detected()), and on an PCI error s390x recovery
> code on the host will call the vfio-pci error_detected() callback. The
> vfio-pci error_detected() callback will notify userspace/QEMU via an
> eventfd, and return PCI_ERS_RESULT_CAN_RECOVER. At this point the s390x
> error recovery on the host will skip any further action(see patch 6) and
> let userspace drive the error recovery.
>
> Once userspace/QEMU is notified, it then injects this error into the VM
> so device drivers in the VM can take recovery actions. For example for a
> passthrough NVMe device, the VM's OS NVMe driver will access the device.
> At this point the VM's NVMe driver's error_detected() will drive the
> recovery by returning PCI_ERS_RESULT_NEED_RESET, and the s390x error
> recovery in the VM's OS will try to do a reset. Resets are privileged
> operations and so the VM will need intervention from QEMU to perform the
> reset. QEMU will invoke the VFIO_DEVICE_RESET ioctl to now notify the
> host that the VM is requesting a reset of the device. The vfio-pci driver
> on the host will then perform the reset on the device to recover it.
>
>
> Thanks
> Farhan
>
> ChangeLog
> ---------
> v10 series https://lore.kernel.org/all/20260302203325.3826-1-alifm@linux.ibm.com/
> v10 -> v11
>      - Rebase on pci/next to handle merge conflicts with patch 1.
>
>      - Typo fixup in commit message (patch 4) and use guard() for mutex
>      (patch 6).
>
> v9 series https://lore.kernel.org/all/20260217182257.1582-1-alifm@linux.ibm.com/
> v9 -> v10
>     - Change pci_slot number to u16 (patch 1).
>
>     - Avoid saving invalid config space state if config space is
>     inaccessible in the device reset path. It uses the same patch as in v8
>     with R-b from Niklas.
>
>     - Rebase on 7.0.0-rc2
>
>
> v8 series https://lore.kernel.org/all/20260122194437.1903-1-alifm@linux.ibm.com/
> v8 -> v9
>     - Avoid saving PCI config space state in reset path (patch 3) (suggested by Bjorn)
>     
>     - Add explicit version to struct vfio_device_feature_zpci_err (patch 7).
>
>     - Rebase on 6.19
>
>
> v7 series https://lore.kernel.org/all/20260107183217.1365-1-alifm@linux.ibm.com/
> v7 -> v8
>     - Rebase on 6.19-rc4
>
>     - Address feedback from Niklas and Julien.
>
>
> v6 series https://lore.kernel.org/all/2c609e61-1861-4bf3-b019-a11c137d26a5@linux.ibm.com/
> v6 -> v7
>      - Rebase on 6.19-rc4
>
>      - Update commit message based on Niklas's suggestion (patch 3).
>
> v5 series https://lore.kernel.org/all/20251113183502.2388-1-alifm@linux.ibm.com/
> v5 -> v6
>     - Rebase on 6.18 + Lukas's PCI: Universal error recoverability of
>     devices series (https://lore.kernel.org/all/cover.1763483367.git.lukas@wunner.de/)
>
>     - Re-work config space accessibility check to pci_dev_save_and_disable() (patch 3).
>     This avoids saving the config space, in the reset path, if the device's config space is
>     corrupted or inaccessible.
>
> v4 series https://lore.kernel.org/all/20250924171628.826-1-alifm@linux.ibm.com/
> v4 -> v5
>      - Rebase on 6.18-rc5
>
>      - Move bug fixes to the beginning of the series (patch 1 and 2). These patches
>      were posted as a separate fixes series
> https://lore.kernel.org/all/a14936ac-47d6-461b-816f-0fd66f869b0f@linux.ibm.com/
>
>      - Add matching pci_put_dev() for pci_get_slot() (patch 6).
>
> v3 series https://lore.kernel.org/all/20250911183307.1910-1-alifm@linux.ibm.com/
> v3 -> v4
>      - Remove warn messages for each PCI capability not restored (patch 1)
>
>      - Check PCI_COMMAND and PCI_STATUS register for error value instead of device id
>      (patch 1)
>
>      - Fix kernel crash in patch 3
>
>      - Added reviewed by tags
>
>      - Address comments from Niklas's (patches 4, 5, 7)
>
>      - Fix compilation error non s390x system (patch 8)
>
>      - Explicitly align struct vfio_device_feature_zpci_err (patch 8)
>
>
> v2 series https://lore.kernel.org/all/20250825171226.1602-1-alifm@linux.ibm.com/
> v2 -> v3
>     - Patch 1 avoids saving any config space state if the device is in error
>     (suggested by Alex)
>
>     - Patch 2 adds additional check only for FLR reset to try other function
>       reset method (suggested by Alex).
>
>     - Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
>       functions. Creates a new flag pci_slot to allow per function slot.
>
>     - Patch 4 fixes a bug in s390 for resource to bus address translation.
>
>     - Rebase on 6.17-rc5
>
>
> v1 series https://lore.kernel.org/all/20250813170821.1115-1-alifm@linux.ibm.com/
> v1 - > v2
>     - Patches 1 and 2 adds some additional checks for FLR/PM reset to
>       try other function reset method (suggested by Alex).
>
>     - Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
>       functions.
>
>     - Patch 7 adds a new device feature for zPCI devices for the VFIO_DEVICE_FEATURE
>       ioctl. The ioctl is used by userspace to retriece any PCI error
>       information for the device (suggested by Alex).
>
>     - Patch 8 adds a reset_done() callback for the vfio-pci driver, to
>       restore the state of the device after a reset.
>
>     - Patch 9 removes the pcie check for triggering VFIO_PCI_ERR_IRQ_INDEX.
>
> Farhan Ali (9):
>    PCI: Allow per function PCI slots
>    s390/pci: Add architecture specific resource/bus address translation
>    PCI: Avoid saving config space state if inaccessible
>    PCI: Add additional checks for flr reset
>    s390/pci: Update the logic for detecting passthrough device
>    s390/pci: Store PCI error information for passthrough devices
>    vfio-pci/zdev: Add a device feature for error information
>    vfio: Add a reset_done callback for vfio-pci driver
>    vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
>
>   arch/s390/include/asm/pci.h       |  29 ++++++++
>   arch/s390/pci/pci.c               |  75 +++++++++++++++++++++
>   arch/s390/pci/pci_event.c         | 106 +++++++++++++++++-------------
>   drivers/pci/host-bridge.c         |   8 +--
>   drivers/pci/pci.c                 |  26 +++++++-
>   drivers/pci/slot.c                |  31 +++++++--
>   drivers/vfio/pci/vfio_pci_core.c  |  22 +++++--
>   drivers/vfio/pci/vfio_pci_intrs.c |   3 +-
>   drivers/vfio/pci/vfio_pci_priv.h  |   9 +++
>   drivers/vfio/pci/vfio_pci_zdev.c  |  47 ++++++++++++-
>   include/linux/pci.h               |   5 +-
>   include/uapi/linux/vfio.h         |  17 +++++
>   12 files changed, 307 insertions(+), 71 deletions(-)
>

      parent reply	other threads:[~2026-03-24 19:34 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-16 19:15 [PATCH v11 0/9] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-03-16 19:15 ` [PATCH v11 1/9] PCI: Allow per function PCI slots Farhan Ali
2026-03-24 21:55   ` Bjorn Helgaas
2026-03-24 23:08     ` Farhan Ali
2026-03-24 23:20       ` Bjorn Helgaas
2026-03-16 19:15 ` [PATCH v11 2/9] s390/pci: Add architecture specific resource/bus address translation Farhan Ali
2026-03-24 23:06   ` Bjorn Helgaas
2026-03-24 23:47     ` Farhan Ali
2026-03-25 11:58     ` Ilpo Järvinen
2026-03-25 17:44       ` Farhan Ali
2026-03-16 19:15 ` [PATCH v11 3/9] PCI: Avoid saving config space state if inaccessible Farhan Ali
2026-03-24 21:40   ` Bjorn Helgaas
2026-03-24 22:38     ` Farhan Ali
2026-03-24 22:52       ` Bjorn Helgaas
2026-03-16 19:15 ` [PATCH v11 4/9] PCI: Add additional checks for flr reset Farhan Ali
2026-03-24 22:49   ` Bjorn Helgaas
2026-03-24 23:22     ` Farhan Ali
2026-03-25 16:25     ` Alex Williamson
2026-03-25 18:40       ` Farhan Ali
2026-03-16 19:15 ` [PATCH v11 5/9] s390/pci: Update the logic for detecting passthrough device Farhan Ali
2026-03-25 16:46   ` Alex Williamson
2026-03-16 19:15 ` [PATCH v11 6/9] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-03-25 17:01   ` Alex Williamson
2026-03-25 18:06     ` Farhan Ali
2026-03-16 19:15 ` [PATCH v11 7/9] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-03-25 17:18   ` Alex Williamson
2026-03-16 19:15 ` [PATCH v11 8/9] vfio: Add a reset_done callback for vfio-pci driver Farhan Ali
2026-03-25 17:30   ` Alex Williamson
2026-03-16 19:15 ` [PATCH v11 9/9] vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali
2026-03-24 21:26   ` Bjorn Helgaas
2026-03-24 22:30     ` Farhan Ali
2026-03-25 17:50     ` Alex Williamson
2026-03-24 19:34 ` Farhan Ali [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f171af52-87f5-4b1e-9caf-6d7b4e161534@linux.ibm.com \
    --to=alifm@linux.ibm.com \
    --cc=alex@shazbot.org \
    --cc=clg@redhat.com \
    --cc=helgaas@kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mjrosato@linux.ibm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox