From: Alex Williamson <alex.williamson@redhat.com>
To: Yi Liu <yi.l.liu@intel.com>
Cc: jgg@nvidia.com, kevin.tian@intel.com, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
xudong.hao@intel.com, yan.y.zhao@intel.com,
terrence.xu@intel.com, yanting.jiang@intel.com
Subject: Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
Date: Mon, 27 Mar 2023 14:40:56 -0600 [thread overview]
Message-ID: <20230327144056.3a287eea.alex.williamson@redhat.com> (raw)
In-Reply-To: <20230327132619.5ab15440.alex.williamson@redhat.com>
On Mon, 27 Mar 2023 13:26:19 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:
> On Mon, 27 Mar 2023 02:34:58 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
>
> > This is a preparation for vfio device cdev as cdev gives userspace the
> > capability to open device cdev fd and management stack (e.g. libvirt)
> > could pass the device fd to the actual user (e.g. QEMU). As a result,
> > the actual user has no idea about the device's bus:devfn information.
> > This is a problem when user uses VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to
> > know the hot reset affected device scope as this ioctl returns bus:devfn
> > info. For the fd passing usage, the acutal user cannot map the bus:devfn
> > to the devices it has opened via the fd passed from management stack. So
> > a new ioctl is required.
> >
> > This new ioctl reports the list of iommufd dev_id that is opened by the
> > user. If there is affected device that is not bound to vfio driver or
> > opened by another user, this command shall fail with -EPERM. For the
> > noiommu mode in the vfio device cdev path, this shall fail as no dev_id
> > would be generated, hence nothing to report.
> >
> > This ioctl is useless to the users that open vfio group as such users
> > have no idea about the iommufd dev_id and it can use the existing
> > VFIO_DEVICE_GET_PCI_HOT_RESET_INFO. The user that uses the traditional
> > mode vfio group/container would be failed if invoking this ioctl. But
> > the user that uses the iommufd compat mode vfio group/container shall
> > succeed. This is harmless as long as user cannot make use of it and
> > should use VFIO_DEVICE_GET_PCI_HOT_RESET_INFO.
>
>
> So VFIO_DEVICE_GET_PCI_HOT_RESET_INFO reports a group and bdf, but
> VFIO_DEVICE_GET_PCI_HOT_RESET_*GROUP*_INFO is meant for the non-group,
> cdev use case and returns a dev_id rather than a group???
>
> Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg that
> isn't used, why do we need a new ioctl vs defining
> VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID. In fact, we could define
> vfio_dependent_device as:
>
> struct vfio_pci_dependent_device {
> union {
> __u32 group_id;
> __u32 dev_id;
> }
> __u16 segment;
> __u8 bus;
> __u8 devfn;
> };
>
> If the user calls with the above flag, dev_id is valid, otherwise
> group_id. Perhaps segment:buus:devfn could still be filled in with a
> NULL/invalid dev_id if the user doesn't have permissions for the device
> so that debugging from userspace isn't so opaque. Thanks,
>
> Alex
>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 98 ++++++++++++++++++++++++++++++++
> > include/uapi/linux/vfio.h | 47 +++++++++++++++
> > 2 files changed, 145 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 19f5b075d70a..45edf4e9b98b 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1181,6 +1181,102 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> > return ret;
> > }
> >
> > +static struct pci_dev *
> > +vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set);
> > +
> > +static int vfio_pci_ioctl_get_pci_hot_reset_group_info(
> > + struct vfio_pci_core_device *vdev,
> > + struct vfio_pci_hot_reset_group_info __user *arg)
> > +{
> > + unsigned long minsz =
> > + offsetofend(struct vfio_pci_hot_reset_group_info, count);
> > + struct vfio_pci_hot_reset_group_info hdr;
> > + struct iommufd_ctx *iommufd, *cur_iommufd;
> > + u32 count = 0, index = 0, *devices = NULL;
> > + struct vfio_pci_core_device *cur;
> > + bool slot = false;
> > + int ret = 0;
> > +
> > + if (copy_from_user(&hdr, arg, minsz))
> > + return -EFAULT;
> > +
> > + if (hdr.argsz < minsz)
> > + return -EINVAL;
> > +
> > + hdr.flags = 0;
> > +
> > + /* Can we do a slot or bus reset or neither? */
> > + if (!pci_probe_reset_slot(vdev->pdev->slot))
> > + slot = true;
> > + else if (pci_probe_reset_bus(vdev->pdev->bus))
> > + return -ENODEV;
> > +
> > + mutex_lock(&vdev->vdev.dev_set->lock);
> > + if (!vfio_pci_dev_set_resettable(vdev->vdev.dev_set)) {
> > + ret = -EPERM;
> > + goto out_unlock;
> > + }
> > +
> > + iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > + if (!iommufd) {
> > + ret = -EPERM;
> > + goto out_unlock;
> > + }
> > +
> > + /* How many devices are affected? */
> > + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
> > + &count, slot);
> > + if (ret)
> > + goto out_unlock;
> > +
> > + WARN_ON(!count); /* Should always be at least one */
> > +
> > + /*
> > + * If there's enough space, fill it now, otherwise return -ENOSPC and
> > + * the number of devices affected.
> > + */
> > + if (hdr.argsz < sizeof(hdr) + (count * sizeof(*devices))) {
> > + ret = -ENOSPC;
> > + hdr.count = count;
> > + goto reset_info_exit;
> > + }
> > +
> > + devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> > + if (!devices) {
> > + ret = -ENOMEM;
> > + goto reset_info_exit;
> > + }
> > +
> > + list_for_each_entry(cur, &vdev->vdev.dev_set->device_list, vdev.dev_set_list) {
> > + cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> > + if (cur->vdev.open_count) {
> > + if (cur_iommufd != iommufd) {
> > + ret = -EPERM;
> > + break;
> > + }
> > + ret = vfio_iommufd_physical_devid(&cur->vdev, &devices[index]);
> > + if (ret)
> > + break;
> > + index++;
> > + }
> > + }
This also makes use of vfio_pci_for_each_slot_or_bus() to iterate
affected devices, but then still assumes those affected devices are
necessarily represented in the dev_set. For example, a two-function
device with ACS isolation can have function 0 bound to vfio and
function 1 bound to a native host driver. The above code requires the
user to pass a buffer large enough for both functions, but only
function 0 is part of the dev_set, so function 1 is not reported as
affected, nor does it generate an error.
It looks like we also fail to set hdr.count except in the error path
above, so the below copy_to_user() copies a user specified range beyond
the end the allocated devices buffer out to userspace! Thanks,
Alex
> > +
> > +reset_info_exit:
> > + if (copy_to_user(arg, &hdr, minsz))
> > + ret = -EFAULT;
> > +
> > + if (!ret) {
> > + if (copy_to_user(&arg->devices, devices,
> > + hdr.count * sizeof(*devices)))
> > + ret = -EFAULT;
> > + }
> > +
> > + kfree(devices);
> > +out_unlock:
> > + mutex_unlock(&vdev->vdev.dev_set->lock);
> > + return ret;
> > +}
> > +
> > static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > struct vfio_pci_core_device *vdev,
> > struct vfio_pci_hot_reset_info __user *arg)
> > @@ -1404,6 +1500,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> > return vfio_pci_ioctl_get_irq_info(vdev, uarg);
> > case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
> > return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
> > + case VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO:
> > + return vfio_pci_ioctl_get_pci_hot_reset_group_info(vdev, uarg);
> > case VFIO_DEVICE_GET_REGION_INFO:
> > return vfio_pci_ioctl_get_region_info(vdev, uarg);
> > case VFIO_DEVICE_IOEVENTFD:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 25432ef213ee..61b801dfd40b 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -669,6 +669,53 @@ struct vfio_pci_hot_reset_info {
> >
> > #define VFIO_DEVICE_GET_PCI_HOT_RESET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >
> > +/**
> > + * VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> > + * struct vfio_pci_hot_reset_group_info)
> > + *
> > + * This is used in the vfio device cdev mode. It returns the list of
> > + * affected devices (represented by iommufd dev_id) when hot reset is
> > + * issued on the current device with which this ioctl is invoked. It
> > + * only includes the devices that are opened by the current user in the
> > + * time of this command is invoked. This list may change when user opens
> > + * new device or close opened device, hence user should re-invoke it
> > + * after open/close devices. This command has no guarantee on the result
> > + * of VFIO_DEVICE_PCI_HOT_RESET since the not-opened affected device can
> > + * be by other users in the window between the two ioctls. If the affected
> > + * devices are opened by multiple users, the VFIO_DEVICE_PCI_HOT_RESET
> > + * shall fail, detail can check the description of VFIO_DEVICE_PCI_HOT_RESET.
> > + *
> > + * For the users that open vfio group/container, this ioctl is useless as
> > + * they have no idea about the iommufd dev_id returned by this ioctl. For
> > + * the users of the traditional mode vfio group/container, this ioctl will
> > + * fail as this mode does not use iommufd hence no dev_id to report back.
> > + * For the users of the iommufd compat mode vfio group/container, this ioctl
> > + * would succeed as this mode uses iommufd as container fd. But such users
> > + * still have no idea about the iommufd dev_id as the dev_id is only stored
> > + * in kernel in this mode. For the users of the vfio group/container, the
> > + * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO should be used to know the hot reset
> > + * affected devices.
> > + *
> > + * Return: 0 on success, -errno on failure:
> > + * -enospc = insufficient buffer;
> > + * -enodev = unsupported for device;
> > + * -eperm = no permission for device, this error comes:
> > + * - when there are affected devices that are opened but
> > + * not bound to the same iommufd with the current device
> > + * with which this ioctl is invoked,
> > + * - there are affected devices that are not bound to vfio
> > + * driver yet.
> > + * - no valid iommufd is bound (e.g. noiommu mode)
> > + */
> > +struct vfio_pci_hot_reset_group_info {
> > + __u32 argsz;
> > + __u32 flags;
> > + __u32 count;
> > + __u32 devices[];
> > +};
> > +
> > +#define VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO _IO(VFIO_TYPE, VFIO_BASE + 18)
> > +
> > /**
> > * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> > * struct vfio_pci_hot_reset)
>
next prev parent reply other threads:[~2023-03-27 20:41 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-27 9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
2023-03-27 9:34 ` [PATCH v2 01/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-03-27 9:34 ` [PATCH v2 02/10] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
2023-03-27 9:34 ` [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
2023-03-30 23:39 ` Jason Gunthorpe
2023-03-30 23:44 ` Jason Gunthorpe
2023-03-27 9:34 ` [PATCH v2 04/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
2023-03-30 23:44 ` Jason Gunthorpe
2023-03-27 9:34 ` [PATCH v2 05/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
2023-03-30 23:47 ` Jason Gunthorpe
2023-03-27 9:34 ` [PATCH v2 06/10] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
2023-03-30 23:48 ` Jason Gunthorpe
2023-03-27 9:34 ` [PATCH v2 07/10] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
2023-03-30 23:49 ` Jason Gunthorpe
2023-03-27 9:34 ` [PATCH v2 08/10] vfio/pci: Renaming for accepting device fd in " Yi Liu
2023-03-27 9:34 ` [PATCH v2 09/10] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
2023-03-30 23:50 ` Jason Gunthorpe
2023-03-27 9:34 ` [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO Yi Liu
2023-03-27 19:26 ` Alex Williamson
2023-03-27 20:40 ` Alex Williamson [this message]
2023-03-28 3:45 ` Liu, Yi L
2023-03-28 3:32 ` Liu, Yi L
2023-03-28 6:19 ` Tian, Kevin
2023-03-28 14:25 ` Alex Williamson
2023-03-28 14:38 ` Liu, Yi L
2023-03-28 14:46 ` Alex Williamson
2023-03-28 15:00 ` Liu, Yi L
2023-03-28 15:18 ` Alex Williamson
2023-03-28 15:45 ` Liu, Yi L
2023-03-28 16:00 ` Alex Williamson
2023-03-29 3:13 ` Liu, Yi L
2023-03-29 9:41 ` Tian, Kevin
2023-03-29 15:49 ` Alex Williamson
2023-03-29 15:57 ` Jason Gunthorpe
2023-03-30 1:17 ` Tian, Kevin
2023-03-30 22:38 ` Jason Gunthorpe
2023-03-30 12:48 ` Liu, Yi L
2023-03-30 12:56 ` Liu, Yi L
2023-03-30 22:44 ` Jason Gunthorpe
2023-03-30 23:05 ` Alex Williamson
2023-03-30 23:18 ` Jason Gunthorpe
2023-03-29 15:50 ` Jason Gunthorpe
2023-03-30 1:10 ` Tian, Kevin
2023-03-30 1:33 ` Tian, Kevin
2023-03-28 16:29 ` Jason Gunthorpe
2023-03-28 19:09 ` Alex Williamson
2023-03-28 19:22 ` Jason Gunthorpe
2023-03-28 12:40 ` Jason Gunthorpe
2023-03-28 14:45 ` Liu, Yi L
2023-03-31 3:14 ` [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Jiang, Yanting
2023-03-31 13:24 ` Alex Williamson
2023-04-03 2:04 ` Jiang, Yanting
2023-03-31 5:01 ` Jiang, Yanting
2023-03-31 17:27 ` Xu, Terrence
2023-03-31 17:49 ` Alex Williamson
2023-04-01 9:15 ` Xu, Terrence
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230327144056.3a287eea.alex.williamson@redhat.com \
--to=alex.williamson@redhat.com \
--cc=chao.p.peng@linux.intel.com \
--cc=cohuck@redhat.com \
--cc=eric.auger@redhat.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-gvt-dev@lists.freedesktop.org \
--cc=jasowang@redhat.com \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=lulu@redhat.com \
--cc=mjrosato@linux.ibm.com \
--cc=nicolinc@nvidia.com \
--cc=peterx@redhat.com \
--cc=robin.murphy@arm.com \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=suravee.suthikulpanit@amd.com \
--cc=terrence.xu@intel.com \
--cc=xudong.hao@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=yanting.jiang@intel.com \
--cc=yi.l.liu@intel.com \
--cc=yi.y.sun@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox