public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Farhan Ali <alifm@linux.ibm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
Cc: helgaas@kernel.org, lukas@wunner.de, alex@shazbot.org,
	clg@redhat.com, mjrosato@linux.ibm.com
Subject: Re: [PATCH v14 5/7] vfio-pci/zdev: Add a device feature for error information
Date: Wed, 29 Apr 2026 11:40:10 +0200	[thread overview]
Message-ID: <56bcc0f67c15ad01f6fff161164698f1689af0f1.camel@linux.ibm.com> (raw)
In-Reply-To: <20260421163031.704-6-alifm@linux.ibm.com>

On Tue, 2026-04-21 at 09:30 -0700, Farhan Ali wrote:
> For zPCI devices, we have platform specific error information. The platform
> firmware provides this error information to the operating system in an
> architecture specific mechanism. To enable recovery from userspace for
> these devices, we want to expose this error information to userspace. Add a
> new device feature to expose this information.
> 
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  arch/s390/include/asm/pci.h      |  2 ++
>  arch/s390/pci/pci_event.c        | 19 ++++++++++++++++
>  drivers/vfio/pci/vfio_pci_core.c |  2 ++
>  drivers/vfio/pci/vfio_pci_priv.h |  9 ++++++++
>  drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h        | 29 ++++++++++++++++++++++++
>  6 files changed, 100 insertions(+)
> 
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 9a6a4eb9d7c1..b1858d512fec 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -360,6 +360,8 @@ int zpci_clear_error_state(struct zpci_dev *zdev);
>  int zpci_reset_load_store_blocked(struct zpci_dev *zdev);
>  void zpci_start_mediated_recovery(struct zpci_dev *zdev);
>  void zpci_stop_mediated_recovery(struct zpci_dev *zdev);
> +int zpci_get_pending_error(struct zpci_dev *zdev,
> +			   struct zpci_ccdf_err *ccdf);
>  
>  #ifdef CONFIG_NUMA
>  
> diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
> index c279a9f50a64..227a939fcad3 100644
> --- a/arch/s390/pci/pci_event.c
> +++ b/arch/s390/pci/pci_event.c
> @@ -74,6 +74,25 @@ static void zpci_store_pci_error(struct pci_dev *pdev,
>  	zdev->pending_errs.count++;
>  }
>  
> +int zpci_get_pending_error(struct zpci_dev *zdev,
> +			    struct zpci_ccdf_err *ccdf)
> +{
> +	int head = 0;
> +
> +	guard(mutex)(&zdev->pending_errs_lock);
> +
> +	if (!zdev->pending_errs.count)
> +		return -ENOMSG;

I think we may want to explicitly document that the feature ioctl fails
with -ENOMSG if there are no  more errors.

> +
> +	head = zdev->pending_errs.head % ZPCI_ERR_PENDING_MAX;
> +	memcpy(ccdf, &zdev->pending_errs.err[head],
> +		sizeof(struct zpci_ccdf_err));
> +	zdev->pending_errs.head++;
> +	zdev->pending_errs.count--;
> +	return 0;
> +}
--- snip ---
>  
> +int vfio_pci_zdev_feature_err(struct vfio_device *device, u32 flags,
> +			      void __user *arg, size_t argsz)
> +{
> +	struct vfio_device_feature_zpci_err err = {};
> +	struct vfio_pci_core_device *vdev;
> +	struct zpci_ccdf_err ccdf = {};
> +	struct zpci_dev *zdev;
> +	int ret;
> +
> +	vdev = container_of(device, struct vfio_pci_core_device, vdev);
> +	zdev = to_zpci(vdev->pdev);
> +	if (!zdev)
> +		return -ENODEV;
> +
> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> +				 sizeof(err));
> +	if (ret != 1)
> +		return ret;
> +
> +	ret = zpci_get_pending_error(zdev, &ccdf);
> +	if (ret)
> +		return ret;
> +
> +	err.fh = ccdf.fh;
> +	err.fid = ccdf.fid;
> +	err.ett = ccdf.ett;
> +	err.mvn = ccdf.mvn;
> +	err.dmaas = ccdf.dmaas;
> +	err.q = ccdf.q;
> +	err.rw = ccdf.rw;
> +	err.faddr = ccdf.faddr;
> +	err.pec = ccdf.pec;

I like this explicit approach and I think having all of the information
available from the get got is a nice side effect also.

> +
> +	if (copy_to_user(arg, &err, sizeof(err)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev)
>  {
>  	struct zpci_dev *zdev = to_zpci(vdev->pdev);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5de618a3a5ee..e78f53ab561a 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1534,6 +1534,35 @@ struct vfio_device_feature_dma_buf {
>   */
>  #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2  12
>  
> +/**
> + * VFIO_DEVICE_FEATURE_ZPCI_ERROR feature provides PCI error information to
> + * userspace for vfio-pci devices on s390x. On s390x, PCI error recovery

In the kernel we use s390 while qemu uses s390x (and it really is 64
bit only for both) it's confusing but it is what it is. Stick to s390
here.

> + * involves platform firmware and notification to operating system is done

"operating systems" (plural) or "the operating system"

> + * by architecture specific mechanism. Exposing this information to
> + * userspace allows it to take appropriate actions to handle an
> + * error on the device.
> + */
> +
> +struct vfio_device_feature_zpci_err {
> +	__u32 flags;			/* Indicate feature in non CCDF reserved space */

The "non CCDF" feels a bit subtle for marking this as a vfio-pci
feature specific field. Maybe call it "vfio_feat_flags"?

> +	__u32 reserved1;
> +	__u32 fh;			/* function handle */
> +	__u32 fid;			/* function id */
> +	__u32 ett		:  4;	/* expected table type */
> +	__u32 mvn		: 12;	/* MSI vector number */
> +	__u32 dmaas		:  8;	/* DMA address space */
> +	__u32 reserved2		:  6;
> +	__u32 q			:  1;	/* event qualifier */
> +	__u32 rw		:  1;	/* read/write */
> +	__u64 faddr;			/* failing address */
> +	__u32 reserved3;
> +	__u16 reserved4;
> +	__u16 pec;			/* PCI event code */
> +	__u8 reserved5[28];		/* Non CCDF reserved space */

While also vfio-pci specific for now, the reserved5 would potentially
become CCDF if that ever grows. Maybe the comment could say "Allow for
future expansion"?

> +};
> +
> +#define VFIO_DEVICE_FEATURE_ZPCI_ERROR 13
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  

I like this new approach with returning one error ccdf at a time and
then -ENOMSG when there are no more errors. Thanks Alex for the
suggestion! 

So feel free to add my:

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>

Thanks,
Niklas

  reply	other threads:[~2026-04-29  9:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21 16:30 [PATCH v14 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-04-21 16:30 ` [PATCH v14 1/7] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-05-04 15:52   ` Gerd Bayer
2026-05-04 17:00     ` Farhan Ali
2026-05-06 11:55       ` Gerd Bayer
2026-04-21 16:30 ` [PATCH v14 2/7] PCI: Avoid saving config space state if inaccessible Farhan Ali
2026-04-21 16:30 ` [PATCH v14 3/7] PCI: Fail FLR when config space is inaccessible Farhan Ali
2026-04-21 16:30 ` [PATCH v14 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-04-29 11:41   ` Niklas Schnelle
2026-04-29 16:48     ` Farhan Ali
2026-04-30  7:48       ` Niklas Schnelle
2026-04-30 16:44         ` Farhan Ali
2026-04-21 16:30 ` [PATCH v14 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-04-29  9:40   ` Niklas Schnelle [this message]
2026-04-29 16:59     ` Farhan Ali
2026-04-30  8:35       ` Niklas Schnelle
2026-04-21 16:30 ` [PATCH v14 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
2026-04-21 16:30 ` [PATCH v14 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali
2026-04-28 18:30 ` [PATCH v14 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-04-28 22:01 ` Bjorn Helgaas
2026-04-29 17:02   ` Farhan Ali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56bcc0f67c15ad01f6fff161164698f1689af0f1.camel@linux.ibm.com \
    --to=schnelle@linux.ibm.com \
    --cc=alex@shazbot.org \
    --cc=alifm@linux.ibm.com \
    --cc=clg@redhat.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mjrosato@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox