public inbox for linux-s390@vger.kernel.org
From: Farhan Ali <alifm@linux.ibm.com>
To: linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
Cc: helgaas@kernel.org, alex@shazbot.org, alifm@linux.ibm.com,
	schnelle@linux.ibm.com, mjrosato@linux.ibm.com
Subject: [PATCH v15 0/7] Error recovery for vfio-pci devices on s390x
Date: Tue,  5 May 2026 13:05:03 -0700
Message-ID: <20260505200510.2954-1-alifm@linux.ibm.com>

Hi,

This Linux kernel patch series introduces support for error recovery for
passthrough PCI devices on System Z (s390x).

Background
----------
For PCI devices on s390x, an operating system receives platform-specific
error events from firmware rather than through AER. Today, for
passthrough/userspace devices, we don't attempt any error recovery and
ignore any error events for the devices. The passthrough/userspace devices
are managed by the vfio-pci driver. The driver does register an error
handling callback (error_detected) and, on an error, triggers an eventfd
to userspace. But we need a mechanism to notify userspace
(QEMU/guest/userspace drivers) about the error event.

Proposal
--------
We can expose this error information (currently only the PCI Error Code)
via a device feature. Userspace can then obtain the error information
via VFIO_DEVICE_FEATURE ioctl and take appropriate actions such as driving
a device reset.
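
As a rough userspace sketch of the idea above (not the code from this
series): the helpers below build a VFIO_DEVICE_FEATURE GET request and issue
the ioctl. struct vfio_device_feature, VFIO_DEVICE_FEATURE_GET and the
VFIO_DEVICE_FEATURE ioctl are existing vfio uAPI. The feature index
EXAMPLE_FEATURE_ZPCI_ERROR and struct example_zpci_error are placeholders
made up for illustration; the real definitions are in patch 5.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* ASSUMPTION: placeholder feature index, patch 5 defines the real one. */
#define EXAMPLE_FEATURE_ZPCI_ERROR 0xffu

/* ASSUMPTION: placeholder payload layout, not the uAPI struct from patch 5. */
struct example_zpci_error {
	uint32_t flags;		/* would describe use of the reserved space */
	uint16_t pec;		/* PCI Error Code */
	uint8_t reserved[58];
};

/* Allocate and fill a GET request. The caller frees it. NULL on OOM. */
static struct vfio_device_feature *example_alloc_error_request(void)
{
	size_t argsz = sizeof(struct vfio_device_feature) +
		       sizeof(struct example_zpci_error);
	struct vfio_device_feature *feat = calloc(1, argsz);

	if (!feat)
		return NULL;
	feat->argsz = (uint32_t)argsz;
	feat->flags = VFIO_DEVICE_FEATURE_GET | EXAMPLE_FEATURE_ZPCI_ERROR;
	return feat;
}

/* Fetch one error record. Returns 0 on success or -errno on failure. */
static int example_get_error(int device_fd, struct example_zpci_error *out)
{
	struct vfio_device_feature *feat = example_alloc_error_request();
	int ret = 0;

	if (!feat)
		return -ENOMEM;
	if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feat) < 0)
		ret = -errno;
	else
		memcpy(out, feat->data, sizeof(*out));
	free(feat);
	return ret;
}
```

A caller would pass the vfio device fd it already holds and treat any
negative return as "nothing to report"; which errno signals "no more pending
errors" is defined by patch 5 and is not assumed here.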

This is how a typical flow for devices passed through to a VM would work:
The driver bound to the device on the host is vfio-pci. The vfio-pci driver
supports the error_detected() callback (vfio_pci_core_aer_err_detected()),
and on a PCI error the s390x recovery code on the host will call it. The
callback will notify userspace/QEMU via an eventfd and return
PCI_ERS_RESULT_CAN_RECOVER. At this point the s390x error recovery on the
host will skip any further action (see patch 4) and let userspace drive
the error recovery.
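
To illustrate the userspace end of that notification, here is a minimal
sketch of registering an eventfd for VFIO_PCI_ERR_IRQ_INDEX and waiting for
it to fire. VFIO_DEVICE_SET_IRQS, struct vfio_irq_set and eventfd(2) are
existing interfaces; the example_* helper names are made up here, and this is
not how QEMU structures its own code.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hand efd to vfio-pci as the trigger for the error IRQ index. */
static int example_register_error_eventfd(int device_fd, int efd)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
	struct vfio_irq_set *set = calloc(1, argsz);
	int32_t fd32 = efd;
	int ret;

	if (!set)
		return -ENOMEM;
	set->argsz = (uint32_t)argsz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_ERR_IRQ_INDEX;
	set->start = 0;
	set->count = 1;
	memcpy(set->data, &fd32, sizeof(fd32));

	ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set) < 0 ? -errno : 0;
	free(set);
	return ret;
}

/*
 * Wait up to timeout_ms for the error eventfd. Returns the eventfd counter
 * (number of signals since the last read), 0 on timeout, -errno on failure.
 */
static int64_t example_wait_for_error(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	ssize_t r;
	int n;

	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);
	if (n < 0)
		return -errno;
	if (n == 0)
		return 0;

	r = read(efd, &count, sizeof(count));
	if (r < 0)
		return -errno;
	if (r != (ssize_t)sizeof(count))
		return -EIO;
	return (int64_t)count;
}
```

Several error events can be coalesced into one wakeup, which is why the
helper returns the counter rather than a boolean.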

Once userspace/QEMU is notified, it injects this error into the VM so that
device drivers in the VM can take recovery actions. For example, for a
passthrough NVMe device, the VM's OS NVMe driver will access the device.
At this point the VM's NVMe driver's error_detected() will drive the
recovery by returning PCI_ERS_RESULT_NEED_RESET, and the s390x error
recovery in the VM's OS will try to do a reset. Resets are privileged
operations, so the VM will need intervention from QEMU to perform the
reset. QEMU will then invoke the VFIO_DEVICE_RESET ioctl to notify the
host that the VM is requesting a reset of the device. The vfio-pci driver
on the host will then perform the reset on the device to recover it.
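
The last step of that flow, seen from userspace, could look roughly like the
sketch below. VFIO_DEVICE_GET_INFO, VFIO_DEVICE_FLAGS_RESET and
VFIO_DEVICE_RESET are existing vfio uAPI; the helper name
example_reset_device() is made up for illustration, and the check for reset
support is a choice made for this example, not something the series requires.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Ask the host vfio-pci driver to reset the device on behalf of the guest.
 * Returns 0 on success or -errno on failure.
 */
static int example_reset_device(int device_fd)
{
	struct vfio_device_info info = { .argsz = sizeof(info) };

	/* Only attempt the reset if the device advertises support for it. */
	if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &info) < 0)
		return -errno;
	if (!(info.flags & VFIO_DEVICE_FLAGS_RESET))
		return -ENOTSUP;

	if (ioctl(device_fd, VFIO_DEVICE_RESET) < 0)
		return -errno;
	return 0;
}
```

A VMM would call this from whatever path handles the guest's reset request
and report a failure back to the guest rather than retrying blindly.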


Thanks
Farhan

ChangeLog
---------
This only includes the change log for the last 5 revisions. The older
change log is available in the v14 cover letter.

v14 series https://lore.kernel.org/all/20260421163031.704-1-alifm@linux.ibm.com/
v14 -> v15
   - Fix issues identified by Sashiko (patch 4).

   - Address Niklas feedback (patch 4 and patch 5).

   - Rebase on 7.1-rc2.


v13 series https://lore.kernel.org/all/20260413210608.2912-1-alifm@linux.ibm.com/
v13 -> v14
   - Remove version from vfio uAPI struct. Instead reserve additional space
   and add a flags field. The flags will be used to indicate any usage of
   the reserved space (patch 5).

   - Remove pending_errors from vfio uAPI struct and instead return an
   error to indicate there are no more pending errors for userspace to
   handle (patch 5).

   - Rebase on recent linux master

v12 series https://lore.kernel.org/all/20260330174011.1161-1-alifm@linux.ibm.com/
v12 -> v13
   - Add the mediated_recovery flag as part of struct zpci_ccdf_pending
   and protect the struct with pending_errs_lock (patch 4).

   - Move dequeuing pending error logic to a helper function (patch 5).

   - Update device feature number for VFIO_DEVICE_FEATURE_ZPCI_ERROR (patch 5).

   - Rebase on linux-next with tag next-20260410


v11 series https://lore.kernel.org/all/20260316191544.2279-1-alifm@linux.ibm.com/
v11 -> v12
   - Address Bjorn's comments from v11 (patches 1-3).

   - Create a common function to check config space accessibility
   (patch 2).

   - Address Alex's comments from v11 (patches 4, 5, 7).

   - Protect the mediated_recovery flag with the pending_errs_lock.
   With that change it made sense to squash patches 5 and 6 from v11
   (current patch 4). Even though the code didn't change significantly,
   I have dropped the R-b tags for it. Would appreciate another look at
   the patch (current patch 4).

   - Dropped arch specific pcibios_resource_to_bus and
   pcibios_bus_to_resource as they are not needed for this series. Will
   address the issue as a standalone patch separate from this series.

   - Rebased on pci/next, with head at f8a1c947ccc6 ("Merge branch 'pci/misc'")


v10 series https://lore.kernel.org/all/20260302203325.3826-1-alifm@linux.ibm.com/
v10 -> v11
   - Rebase on pci/next to handle merge conflicts with patch 1.

   - Typo fixup in commit message (patch 4) and use guard() for mutex
    (patch 6).



Farhan Ali (7):
  PCI: Allow per function PCI slots to fix slot reset on s390
  PCI: Avoid saving config space state if inaccessible
  PCI: Fail FLR when config space is inaccessible
  s390/pci: Store PCI error information for passthrough devices
  vfio-pci/zdev: Add a device feature for error information
  vfio/pci: Add a reset_done callback for vfio-pci driver
  vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX

 arch/s390/include/asm/pci.h       |  32 +++++++
 arch/s390/pci/pci.c               |   1 +
 arch/s390/pci/pci_event.c         | 133 ++++++++++++++++++------------
 drivers/pci/hotplug/rpaphp_slot.c |   2 +-
 drivers/pci/pci.c                 |  32 ++++++-
 drivers/pci/slot.c                |  33 ++++++--
 drivers/vfio/pci/vfio_pci_core.c  |  22 +++--
 drivers/vfio/pci/vfio_pci_intrs.c |   3 +-
 drivers/vfio/pci/vfio_pci_priv.h  |   9 ++
 drivers/vfio/pci/vfio_pci_zdev.c  |  57 ++++++++++++-
 include/linux/pci.h               |   8 +-
 include/uapi/linux/vfio.h         |  30 +++++++
 12 files changed, 287 insertions(+), 75 deletions(-)

-- 
2.43.0

