Linux PCI subsystem development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Farhan Ali" <alifm@linux.ibm.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v15 4/7] s390/pci: Store PCI error information for passthrough devices
Date: Tue, 05 May 2026 22:56:25 +0000	[thread overview]
Message-ID: <20260505225625.ADC00C2BCB9@smtp.kernel.org> (raw)
In-Reply-To: <20260505200510.2954-5-alifm@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The mediated recovery feature for non-KVM passthrough devices is gated by a KVM-dependent Kconfig option, causing it to be compiled out and fail when KVM is disabled.
- [High] A time-of-check to time-of-use (TOCTOU) race condition exists in `zpci_event_attempt_error_recovery` regarding the `mediated_recovery` state, leading to permanently broken passthrough devices.
- [Low] Missing trailing newline (`\n`) characters in `pr_err()` and `pr_info()` format strings.
--

commit b20dbe013c9c596d29efb7c91a62d8992eaf33b9
Author: Farhan Ali <alifm@linux.ibm.com>

s390/pci: Store PCI error information for passthrough devices

This commit modifies s390 PCI error handling to store error information for
passthrough devices. This allows userspace drivers like vfio-pci to manage
device recovery directly, removing the dependency on a KVM file descriptor.

> diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
> --- a/arch/s390/pci/pci_event.c
> +++ b/arch/s390/pci/pci_event.c
> @@ -81,6 +52,46 @@ static bool is_driver_supported(struct pci_driver *driver)
[ ... ]
> +static void zpci_store_pci_error(struct pci_dev *pdev,
> +				 struct zpci_ccdf_err *ccdf)
> +{
> +	struct zpci_dev *zdev = to_zpci(pdev);
> +	int i;
> +
> +	guard(mutex)(&zdev->pending_errs_lock);
> +	if (!zdev->pending_errs.mediated_recovery)
> +		return;
> +
> +	if (zdev->pending_errs.count >= ZPCI_ERR_PENDING_MAX) {
> +		pr_err("%s: Maximum number (%d) of pending error events queued",
> +		       pci_name(pdev), ZPCI_ERR_PENDING_MAX);

Is this missing a trailing newline (\n) in the format string?

> +		return;
> +	}
> +
> +	i = zdev->pending_errs.tail % ZPCI_ERR_PENDING_MAX;
> +	memcpy(&zdev->pending_errs.err[i], ccdf, sizeof(struct zpci_ccdf_err));
> +	zdev->pending_errs.tail++;
> +	zdev->pending_errs.count++;
> +}
[ ... ]
> +void zpci_stop_mediated_recovery(struct zpci_dev *zdev)
> +{
> +	guard(mutex)(&zdev->pending_errs_lock);
> +	zdev->pending_errs.mediated_recovery = false;
> +	if (zdev->pending_errs.count)
> +		pr_info("Unhandled PCI error events count=%d for PCI function 0x%x",
> +			zdev->pending_errs.count, zdev->fid);

Does this pr_info() also need a trailing newline (\n)?

> +	memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
> +}
> +EXPORT_SYMBOL_GPL(zpci_stop_mediated_recovery);
[ ... ]
> @@ -216,12 +221,25 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
>  		goto out_unlock;
>  	}
>  
> +	zpci_store_pci_error(pdev, ccdf);
> +
>  	ers_res = zpci_event_notify_error_detected(pdev, driver);
>  	if (ers_result_indicates_abort(ers_res)) {
>  		status_str = "failed (abort on detection)";
>  		goto out_unlock;
>  	}
>  
> +	mutex_lock(&zdev->pending_errs_lock);
> +	if (zdev->pending_errs.mediated_recovery) {
> +		pr_info("%s: Leaving recovery of pass-through device to user-space\n",
> +			pci_name(pdev));
> +		ers_res = PCI_ERS_RESULT_RECOVERED;
> +		status_str = "in progress";
> +		mutex_unlock(&zdev->pending_errs_lock);
> +		goto out_unlock;
> +	}
> +	mutex_unlock(&zdev->pending_errs_lock);

Is there a time-of-check to time-of-use race condition here?

zpci_store_pci_error() checks zdev->pending_errs.mediated_recovery and
returns without storing the error if it is false. Later in this function,
we acquire pending_errs_lock again and re-evaluate mediated_recovery to
decide whether to skip host-level recovery.

If a concurrent userspace open of the VFIO device sets mediated_recovery to
true between these two checks, the error is discarded but the host recovery
is still skipped. Could this state be cached once under a single critical
section?

> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -144,24 +144,36 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>  int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> +	int ret;
>  
>  	if (!zdev)
>  		return -ENODEV;
>  
> +	zpci_start_mediated_recovery(zdev);

The commit message states the goal is to support userspace drivers without
any KVM fd. However, does placing this logic in vfio_pci_zdev.c inadvertently
tie the mediated recovery feature to KVM?

Since vfio_pci_zdev.c is only built if CONFIG_VFIO_PCI_ZDEV_KVM is enabled,
if a kernel is configured without KVM, vfio_pci_zdev_open_device() resolves
to an empty inline stub. This means mediated_recovery would never be set to
true, and the host kernel would still attempt recovery on the passthrough
device.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260505200510.2954-1-alifm@linux.ibm.com?part=4

  reply	other threads:[~2026-05-05 22:56 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05 20:05 [PATCH v15 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-05-05 20:05 ` [PATCH v15 1/7] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-05-05 21:29   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 2/7] PCI: Avoid saving config space state if inaccessible Farhan Ali
2026-05-05 21:58   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 3/7] PCI: Fail FLR when config space is inaccessible Farhan Ali
2026-05-05 22:20   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-05-05 22:56   ` sashiko-bot [this message]
2026-05-06  9:38   ` Niklas Schnelle
2026-05-06 17:20     ` Farhan Ali
2026-05-08 19:58       ` Niklas Schnelle
2026-05-05 20:05 ` [PATCH v15 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-05-05 23:27   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
2026-05-05 23:56   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260505225625.ADC00C2BCB9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=alifm@linux.ibm.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox