All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Farhan Ali" <alifm@linux.ibm.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v15 4/7] s390/pci: Store PCI error information for passthrough devices
Date: Tue, 05 May 2026 22:56:25 +0000	[thread overview]
Message-ID: <20260505225625.ADC00C2BCB9@smtp.kernel.org> (raw)
In-Reply-To: <20260505200510.2954-5-alifm@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The mediated recovery feature for non-KVM passthrough devices is gated by a KVM-dependent Kconfig option, causing it to be compiled out and fail when KVM is disabled.
- [High] A time-of-check to time-of-use (TOCTOU) race condition exists in `zpci_event_attempt_error_recovery` regarding the `mediated_recovery` state, leading to permanently broken passthrough devices.
- [Low] Missing trailing newline (`\n`) characters in `pr_err()` and `pr_info()` format strings.
--

commit b20dbe013c9c596d29efb7c91a62d8992eaf33b9
Author: Farhan Ali <alifm@linux.ibm.com>

s390/pci: Store PCI error information for passthrough devices

This commit modifies s390 PCI error handling to store error information for
passthrough devices. This allows userspace drivers like vfio-pci to manage
device recovery directly, removing the dependency on a KVM file descriptor.

> diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
> --- a/arch/s390/pci/pci_event.c
> +++ b/arch/s390/pci/pci_event.c
> @@ -81,6 +52,46 @@ static bool is_driver_supported(struct pci_driver *driver)
[ ... ]
> +static void zpci_store_pci_error(struct pci_dev *pdev,
> +				 struct zpci_ccdf_err *ccdf)
> +{
> +	struct zpci_dev *zdev = to_zpci(pdev);
> +	int i;
> +
> +	guard(mutex)(&zdev->pending_errs_lock);
> +	if (!zdev->pending_errs.mediated_recovery)
> +		return;
> +
> +	if (zdev->pending_errs.count >= ZPCI_ERR_PENDING_MAX) {
> +		pr_err("%s: Maximum number (%d) of pending error events queued",
> +		       pci_name(pdev), ZPCI_ERR_PENDING_MAX);

Is this missing a trailing newline (\n) in the format string?

> +		return;
> +	}
> +
> +	i = zdev->pending_errs.tail % ZPCI_ERR_PENDING_MAX;
> +	memcpy(&zdev->pending_errs.err[i], ccdf, sizeof(struct zpci_ccdf_err));
> +	zdev->pending_errs.tail++;
> +	zdev->pending_errs.count++;
> +}
[ ... ]
> +void zpci_stop_mediated_recovery(struct zpci_dev *zdev)
> +{
> +	guard(mutex)(&zdev->pending_errs_lock);
> +	zdev->pending_errs.mediated_recovery = false;
> +	if (zdev->pending_errs.count)
> +		pr_info("Unhandled PCI error events count=%d for PCI function 0x%x",
> +			zdev->pending_errs.count, zdev->fid);

Does this pr_info() also need a trailing newline (\n)?

> +	memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
> +}
> +EXPORT_SYMBOL_GPL(zpci_stop_mediated_recovery);
[ ... ]
> @@ -216,12 +221,25 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
>  		goto out_unlock;
>  	}
>  
> +	zpci_store_pci_error(pdev, ccdf);
> +
>  	ers_res = zpci_event_notify_error_detected(pdev, driver);
>  	if (ers_result_indicates_abort(ers_res)) {
>  		status_str = "failed (abort on detection)";
>  		goto out_unlock;
>  	}
>  
> +	mutex_lock(&zdev->pending_errs_lock);
> +	if (zdev->pending_errs.mediated_recovery) {
> +		pr_info("%s: Leaving recovery of pass-through device to user-space\n",
> +			pci_name(pdev));
> +		ers_res = PCI_ERS_RESULT_RECOVERED;
> +		status_str = "in progress";
> +		mutex_unlock(&zdev->pending_errs_lock);
> +		goto out_unlock;
> +	}
> +	mutex_unlock(&zdev->pending_errs_lock);

Is there a time-of-check to time-of-use race condition here?

zpci_store_pci_error() checks zdev->pending_errs.mediated_recovery and
returns without storing the error if it is false. Later in this function,
we acquire pending_errs_lock again and re-evaluate mediated_recovery to
decide whether to skip host-level recovery.

If a concurrent userspace open of the VFIO device sets mediated_recovery to
true between these two checks, the error is discarded but the host recovery
is still skipped. Could this state be cached once under a single critical
section?

> diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
> --- a/drivers/vfio/pci/vfio_pci_zdev.c
> +++ b/drivers/vfio/pci/vfio_pci_zdev.c
> @@ -144,24 +144,36 @@ int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev,
>  int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> +	int ret;
>  
>  	if (!zdev)
>  		return -ENODEV;
>  
> +	zpci_start_mediated_recovery(zdev);

The commit message states the goal is to support userspace drivers without
any KVM fd. However, does placing this logic in vfio_pci_zdev.c inadvertently
tie the mediated recovery feature to KVM?

Since vfio_pci_zdev.c is only built if CONFIG_VFIO_PCI_ZDEV_KVM is enabled,
if a kernel is configured without KVM, vfio_pci_zdev_open_device() resolves
to an empty inline stub. This means mediated_recovery would never be set to
true, and the host kernel would still attempt recovery on the passthrough
device.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260505200510.2954-1-alifm@linux.ibm.com?part=4

  reply	other threads:[~2026-05-05 22:56 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05 20:05 [PATCH v15 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-05-05 20:05 ` [PATCH v15 1/7] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-05-05 21:29   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 2/7] PCI: Avoid saving config space state if inaccessible Farhan Ali
2026-05-05 21:58   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 3/7] PCI: Fail FLR when config space is inaccessible Farhan Ali
2026-05-05 22:20   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-05-05 22:56   ` sashiko-bot [this message]
2026-05-06  9:38   ` Niklas Schnelle
2026-05-06 17:20     ` Farhan Ali
2026-05-08 19:58       ` Niklas Schnelle
2026-05-05 20:05 ` [PATCH v15 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-05-05 23:27   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
2026-05-05 23:56   ` sashiko-bot
2026-05-05 20:05 ` [PATCH v15 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260505225625.ADC00C2BCB9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=alifm@linux.ibm.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.