public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device
@ 2023-05-13 13:21 Yi Liu
  2023-05-13 13:21 ` [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices Yi Liu
                   ` (10 more replies)
  0 siblings, 11 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds
to prove that it owns all devices affected by resetting the calling
device. While for cdev devices, user can use an iommufd-based ownership
checking model and invoke VFIO_DEVICE_PCI_HOT_RESET with a zero-length
fd array.

This series first creates iommufd_access for noiommu devices to fill the
gap for adding iommufd-based ownership checking model, then extends
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to check ownership and return the
check result and the devid of affected devices to user. In the end, extends
the VFIO_DEVICE_PCI_HOT_RESET to accept zero-length fd array for hot-reset
with cdev devices.

The new hot reset method and updated _INFO ioctl are tested with the
below qemu:

https://github.com/yiliu1765/qemu/tree/iommufd_rfcv4.mig.reset.v4_var3
(requires to test with the cdev kernel)

Change log:

v5:
 - Drop patch 01 of v4 (Alex)
 - Create noiommu_access for noiommu devices (Jason)
 - Reserve all negative iommufd IDs, hence VFIO can encode negative
   values (Jason)
 - Make vfio_iommufd_physical_devid() return -EINVAL if it's not called
   with a physical device or a noiommu device.
 - Add vfio_find_device_in_devset() in vfio_main.c (Alex)
 - Add iommufd_ctx_has_group() to replace vfio_devset_iommufd_has_group().
   Reason: vfio_devset_iommufd_has_group() only loops the devices within
   the given devset to check the iommufd an iommu_group, but an iommu_group
   can span into multiple devsets. So if failed to find the group in a
   devset doesn't mean the group is not owned by the iommufd. So here either
   needs to search all the devsets or add an iommufd API to check it. It
   appears an iommufd API makes more sense.
 - Adopt suggestions from Alex on patch 08 and 09 of v4, refine the hot-reset
   uapi description and minor tweaks
 - Use bitfields for bool members (Alex)

v4: https://lore.kernel.org/kvm/20230426145419.450922-1-yi.l.liu@intel.com/
 - Rename the patch series subject
 - Patch 01 is moved from the cdev series
 - Patch 02, 06 are new per review comments in v3
 - Patch 03/04/05/07/08/09 are from v3 with updates

v3: https://lore.kernel.org/kvm/20230401144429.88673-1-yi.l.liu@intel.com/
 - Remove the new _INFO ioctl of v2, extend the existing _INFO ioctl to
   report devid (Alex)
 - Add r-b from Jason
 - Add t-b from Terrence Xu and Yanting Jiang (mainly regression test)

v2: https://lore.kernel.org/kvm/20230327093458.44939-1-yi.l.liu@intel.com/
 - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
 - Add r-b from Kevin and Jason
 - Add patch 10 to introduce a new _INFO ioctl for the usage of device
   fd passing usage in cdev path (Jason, Alex)

v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Yi Liu (10):
  vfio-iommufd: Create iommufd_access for noiommu devices
  vfio/pci: Update comment around group_fd get in
    vfio_pci_ioctl_pci_hot_reset()
  vfio/pci: Move the existing hot reset logic to be a helper
  vfio: Mark cdev usage in vfio_device
  iommufd: Reserve all negative IDs in the iommufd xarray
  vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
    vfio_device
  vfio: Add helper to search vfio_device in a dev_set
  iommufd: Add iommufd_ctx_has_group()
  vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device
    cdev
  vfio/pci: Allow passing zero-length fd array in
    VFIO_DEVICE_PCI_HOT_RESET

 drivers/iommu/iommufd/device.c   |  53 +++++++++
 drivers/iommu/iommufd/main.c     |   2 +-
 drivers/vfio/iommufd.c           |  63 ++++++++++-
 drivers/vfio/pci/vfio_pci_core.c | 184 ++++++++++++++++++++++++-------
 drivers/vfio/vfio_main.c         |  15 +++
 include/linux/iommufd.h          |  14 +++
 include/linux/vfio.h             |  23 ++++
 include/uapi/linux/vfio.h        |  60 +++++++++-
 8 files changed, 368 insertions(+), 46 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-17 17:26   ` Alex Williamson
  2023-05-13 13:21 ` [PATCH v5 02/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This binds noiommu device to iommufd and creates iommufd_access for this
bond. This is useful for adding an iommufd-based device ownership check
for VFIO_DEVICE_PCI_HOT_RESET since this model requires all the other
affected devices bound to the same iommufd as the device to be reset.
For noiommu devices, there is no backend iommu, so create iommufd_access
instead of iommufd_device.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/iommufd.c | 43 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/vfio.h   |  1 +
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 88b00c501015..c1379e826052 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,6 +10,42 @@
 MODULE_IMPORT_NS(IOMMUFD);
 MODULE_IMPORT_NS(IOMMUFD_VFIO);
 
+static void vfio_noiommu_access_unmap(void *data, unsigned long iova,
+				      unsigned long length)
+{
+}
+
+static const struct iommufd_access_ops vfio_user_noiommu_ops = {
+	.needs_pin_pages = 1,
+	.unmap = vfio_noiommu_access_unmap,
+};
+
+static int vfio_iommufd_noiommu_bind(struct vfio_device *vdev,
+				     struct iommufd_ctx *ictx,
+				     u32 *out_device_id)
+{
+	struct iommufd_access *user;
+
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	user = iommufd_access_create(ictx, &vfio_user_noiommu_ops,
+				     vdev, out_device_id);
+	if (IS_ERR(user))
+		return PTR_ERR(user);
+	vdev->noiommu_access = user;
+	return 0;
+}
+
+static void vfio_iommufd_noiommu_unbind(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (vdev->noiommu_access) {
+		iommufd_access_destroy(vdev->noiommu_access);
+		vdev->noiommu_access = NULL;
+	}
+}
+
 int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 {
 	u32 ioas_id;
@@ -29,7 +65,8 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 		 */
 		if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
 			return -EPERM;
-		return 0;
+
+		return vfio_iommufd_noiommu_bind(vdev, ictx, &device_id);
 	}
 
 	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
@@ -59,8 +96,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_noiommu(vdev))
+	if (vfio_device_is_noiommu(vdev)) {
+		vfio_iommufd_noiommu_unbind(vdev);
 		return;
+	}
 
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2c137ea94a3e..16fd04490550 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -57,6 +57,7 @@ struct vfio_device {
 	struct list_head group_next;
 	struct list_head iommu_entry;
 	struct iommufd_access *iommufd_access;
+	struct iommufd_access *noiommu_access;
 	void (*put_kvm)(struct kvm *kvm);
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 02/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
  2023-05-13 13:21 ` [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-13 13:21 ` [PATCH v5 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

this suits more on what the code does.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a5ab416cf476..f824de4dbf27 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,9 +1308,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	/*
-	 * For each group_fd, get the group through the vfio external user
-	 * interface and store the group and iommu ID.  This ensures the group
-	 * is held across the reset.
+	 * Get the group file for each fd to ensure the group is held across
+	 * the reset
 	 */
 	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 03/10] vfio/pci: Move the existing hot reset logic to be a helper
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
  2023-05-13 13:21 ` [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices Yi Liu
  2023-05-13 13:21 ` [PATCH v5 02/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-13 13:21 ` [PATCH v5 04/10] vfio: Mark cdev usage in vfio_device Yi Liu
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This prepares to add another method for hot reset. The major hot reset logic
are moved to vfio_pci_ioctl_pci_hot_reset_groups().

No functional change is intended.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 55 +++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f824de4dbf27..39e7823088e7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1255,29 +1255,16 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	return ret;
 }
 
-static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
-					struct vfio_pci_hot_reset __user *arg)
+static int
+vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
+				    int array_count, bool slot,
+				    struct vfio_pci_hot_reset __user *arg)
 {
-	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
-	struct vfio_pci_hot_reset hdr;
 	int32_t *group_fds;
 	struct file **files;
 	struct vfio_pci_group_info info;
-	bool slot = false;
 	int file_idx, count = 0, ret = 0;
 
-	if (copy_from_user(&hdr, arg, minsz))
-		return -EFAULT;
-
-	if (hdr.argsz < minsz || hdr.flags)
-		return -EINVAL;
-
-	/* Can we do a slot or bus reset or neither? */
-	if (!pci_probe_reset_slot(vdev->pdev->slot))
-		slot = true;
-	else if (pci_probe_reset_bus(vdev->pdev->bus))
-		return -ENODEV;
-
 	/*
 	 * We can't let userspace give us an arbitrarily large buffer to copy,
 	 * so verify how many we think there could be.  Note groups can have
@@ -1289,11 +1276,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		return ret;
 
 	/* Somewhere between 1 and count is OK */
-	if (!hdr.count || hdr.count > count)
+	if (!array_count || array_count > count)
 		return -EINVAL;
 
-	group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
-	files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
+	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
+	files = kcalloc(array_count, sizeof(*files), GFP_KERNEL);
 	if (!group_fds || !files) {
 		kfree(group_fds);
 		kfree(files);
@@ -1301,7 +1288,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	if (copy_from_user(group_fds, arg->group_fds,
-			   hdr.count * sizeof(*group_fds))) {
+			   array_count * sizeof(*group_fds))) {
 		kfree(group_fds);
 		kfree(files);
 		return -EFAULT;
@@ -1311,7 +1298,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	 * Get the group file for each fd to ensure the group is held across
 	 * the reset
 	 */
-	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
+	for (file_idx = 0; file_idx < array_count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
 
 		if (!file) {
@@ -1335,7 +1322,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	if (ret)
 		goto hot_reset_release;
 
-	info.count = hdr.count;
+	info.count = array_count;
 	info.files = files;
 
 	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
@@ -1348,6 +1335,28 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
+					struct vfio_pci_hot_reset __user *arg)
+{
+	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
+	struct vfio_pci_hot_reset hdr;
+	bool slot = false;
+
+	if (copy_from_user(&hdr, arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags)
+		return -EINVAL;
+
+	/* Can we do a slot or bus reset or neither? */
+	if (!pci_probe_reset_slot(vdev->pdev->slot))
+		slot = true;
+	else if (pci_probe_reset_bus(vdev->pdev->bus))
+		return -ENODEV;
+
+	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
+}
+
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
 				    struct vfio_device_ioeventfd __user *arg)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 04/10] vfio: Mark cdev usage in vfio_device
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (2 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-13 13:21 ` [PATCH v5 05/10] iommufd: Reserve all negative IDs in the iommufd xarray Yi Liu
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Use it to differentiate whether to report group_id or devid in the revised
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no cdev path yet,
so the vfio_device_cdev_opened() helper always returns false.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/vfio.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 16fd04490550..a61130bc06a2 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -140,6 +140,11 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
 #endif
 
+static inline bool vfio_device_cdev_opened(struct vfio_device *device)
+{
+	return false;
+}
+
 /**
  * struct vfio_migration_ops - VFIO bus device driver migration callbacks
  *
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 05/10] iommufd: Reserve all negative IDs in the iommufd xarray
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (3 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 04/10] vfio: Mark cdev usage in vfio_device Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-13 13:21 ` [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Hence IOMMUFD users can encode the negative IDs for specific purposes.
e.g. VFIO needs two reserved values to tell userspace the ID returned
is not valid but has other meaning.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3fbe636c3d8a..32ce7befc8dd 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -50,7 +50,7 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 	 * before calling iommufd_object_finalize().
 	 */
 	rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY,
-		      xa_limit_32b, GFP_KERNEL_ACCOUNT);
+		      xa_limit_31b, GFP_KERNEL_ACCOUNT);
 	if (rc)
 		goto out_free;
 	return obj;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (4 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 05/10] iommufd: Reserve all negative IDs in the iommufd xarray Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-17 18:15   ` Alex Williamson
  2023-05-13 13:21 ` [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set Yi Liu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This is needed by the vfio-pci driver to report affected devices in the
hot reset for a given device.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 24 ++++++++++++++++++++++++
 drivers/vfio/iommufd.c         | 20 ++++++++++++++++++++
 include/linux/iommufd.h        |  6 ++++++
 include/linux/vfio.h           | 14 ++++++++++++++
 4 files changed, 64 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4f9b2142274c..81466b97023f 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -116,6 +116,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
 
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
+{
+	return idev->ictx;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
+
+u32 iommufd_device_to_id(struct iommufd_device *idev)
+{
+	return idev->obj.id;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
+
 static int iommufd_device_setup_msi(struct iommufd_device *idev,
 				    struct iommufd_hw_pagetable *hwpt,
 				    phys_addr_t sw_msi_start)
@@ -463,6 +475,18 @@ void iommufd_access_destroy(struct iommufd_access *access)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
+struct iommufd_ctx *iommufd_access_to_ictx(struct iommufd_access *access)
+{
+	return access->ictx;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_to_ictx, IOMMUFD);
+
+u32 iommufd_access_to_id(struct iommufd_access *access)
+{
+	return access->obj.id;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_to_id, IOMMUFD);
+
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
 {
 	struct iommufd_ioas *new_ioas;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index c1379e826052..a18e920be164 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -105,6 +105,26 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 		vdev->ops->unbind_iommufd(vdev);
 }
 
+struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
+{
+	if (vdev->iommufd_device)
+		return iommufd_device_to_ictx(vdev->iommufd_device);
+	if (vdev->noiommu_access)
+		return iommufd_access_to_ictx(vdev->noiommu_access);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
+
+int vfio_iommufd_physical_devid(struct vfio_device *vdev)
+{
+	if (vdev->iommufd_device)
+		return iommufd_device_to_id(vdev->iommufd_device);
+	if (vdev->noiommu_access)
+		return iommufd_access_to_id(vdev->noiommu_access);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
+
 /*
  * The physical standard ops mean that the iommufd_device is bound to the
  * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 1129a36a74c4..68cd65274e28 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
 int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
 void iommufd_device_detach(struct iommufd_device *idev);
 
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
+u32 iommufd_device_to_id(struct iommufd_device *idev);
+
 struct iommufd_access_ops {
 	u8 needs_pin_pages : 1;
 	void (*unmap)(void *data, unsigned long iova, unsigned long length);
@@ -45,6 +48,9 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 void iommufd_access_destroy(struct iommufd_access *access);
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
 
+struct iommufd_ctx *iommufd_access_to_ictx(struct iommufd_access *access);
+u32 iommufd_access_to_id(struct iommufd_access *access);
+
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a61130bc06a2..fcbe084b18c8 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -115,6 +115,8 @@ struct vfio_device_ops {
 };
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
+struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
+int vfio_iommufd_physical_devid(struct vfio_device *vdev);
 int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
@@ -124,6 +126,18 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 #else
+static inline struct iommufd_ctx *
+vfio_iommufd_physical_ictx(struct vfio_device *vdev)
+{
+	return NULL;
+}
+
+static inline int
+vfio_iommufd_physical_devid(struct vfio_device *vdev)
+{
+	return -EOPNOTSUPP;
+}
+
 #define vfio_iommufd_physical_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (5 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-15  7:11   ` Cédric Le Goater
  2023-05-17 19:12   ` Alex Williamson
  2023-05-13 13:21 ` [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group() Yi Liu
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

There are drivers that need to search vfio_device within a given dev_set.
e.g. vfio-pci. So add a helper.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c |  8 +++-----
 drivers/vfio/vfio_main.c         | 15 +++++++++++++++
 include/linux/vfio.h             |  3 +++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 39e7823088e7..4df2def35bdd 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2335,12 +2335,10 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
 static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
 {
 	struct vfio_device_set *dev_set = data;
-	struct vfio_device *cur;
 
-	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
-		if (cur->dev == &pdev->dev)
-			return 0;
-	return -EBUSY;
+	lockdep_assert_held(&dev_set->lock);
+
+	return vfio_find_device_in_devset(dev_set, &pdev->dev) ? 0 : -EBUSY;
 }
 
 /*
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f0ca33b2e1df..ab4f3a794f78 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -141,6 +141,21 @@ unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set)
 }
 EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
 
+struct vfio_device *
+vfio_find_device_in_devset(struct vfio_device_set *dev_set,
+			   struct device *dev)
+{
+	struct vfio_device *cur;
+
+	lockdep_assert_held(&dev_set->lock);
+
+	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
+		if (cur->dev == dev)
+			return cur;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_find_device_in_devset);
+
 /*
  * Device objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fcbe084b18c8..4c17395ed4d2 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -259,6 +259,9 @@ void vfio_unregister_group_dev(struct vfio_device *device);
 
 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
 unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
+struct vfio_device *
+vfio_find_device_in_devset(struct vfio_device_set *dev_set,
+			   struct device *dev);
 
 int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state cur_fsm,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group()
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (6 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-17 19:40   ` Alex Williamson
  2023-05-13 13:21 ` [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev Yi Liu
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

to check if any device within the given iommu_group has been bound with
the iommufd_ctx. This helpful for the checking on device ownership for
the devices which have been bound but cannot be bound to any other iommufd
as the iommu_group has been bound.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 29 +++++++++++++++++++++++++++++
 include/linux/iommufd.h        |  8 ++++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 81466b97023f..5e5f7912807b 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -98,6 +98,35 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
 
+/**
+ * iommufd_ctx_has_group - True if the struct device is bound to this ictx
+ * @ictx: iommufd file descriptor
+ * @group: Pointer to a physical iommu_group struct
+ *
+ * True if a iommufd_device_bind() is present for any device within the
+ * group.
+ */
+bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
+{
+	struct iommufd_object *obj;
+	unsigned long index;
+
+	if (!ictx || !group)
+		return false;
+
+	xa_lock(&ictx->objects);
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type == IOMMUFD_OBJ_DEVICE &&
+		    container_of(obj, struct iommufd_device, obj)->group == group) {
+			xa_unlock(&ictx->objects);
+			return true;
+		}
+	}
+	xa_unlock(&ictx->objects);
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
+
 /**
  * iommufd_device_unbind - Undo iommufd_device_bind()
  * @idev: Device returned by iommufd_device_bind()
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 68cd65274e28..e49c16cd6831 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -16,6 +16,7 @@ struct page;
 struct iommufd_ctx;
 struct iommufd_access;
 struct file;
+struct iommu_group;
 
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
@@ -56,6 +57,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
 #if IS_ENABLED(CONFIG_IOMMUFD)
 struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
 void iommufd_ctx_put(struct iommufd_ctx *ictx);
+bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
 
 int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
@@ -77,6 +79,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
 {
 }
 
+static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
+					 struct iommu_group *group)
+{
+	return false;
+}
+
 static inline int iommufd_access_pin_pages(struct iommufd_access *access,
 					   unsigned long iova,
 					   unsigned long length,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (7 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group() Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-15  7:29   ` Cédric Le Goater
  2023-05-17 22:01   ` Alex Williamson
  2023-05-13 13:21 ` [PATCH v5 10/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
  2023-05-18  5:51 ` [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Xu, Terrence
  10 siblings, 2 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx
of the cdev device to check the ownership of the other affected devices.

This returns devid for each of the affected devices. If it is bound to the
iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
not opened by the calling user, but it belongs to the same iommu_group of
a device that is bound to the iommufd_ctx of the cdev device, reports devid
value of 0; If the device is un-owned device, configured within a different
iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl shall
report devid value of -1.

devid >=0 doesn't block hot-reset as the affected devices are considered to
be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
outside of proof-of-ownership calling conventions (ie. via legacy group
accessed devices).

This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
returned in case of calling user get device fd from other software stack
and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
the affected devices are owned, so user can know it without looping all
the returned devids.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h        | 46 +++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 4df2def35bdd..57586be770af 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -27,6 +27,7 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
+#include <linux/iommufd.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -36,6 +37,10 @@
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC "core driver for VFIO based PCI devices"
 
+#ifdef CONFIG_IOMMUFD
+MODULE_IMPORT_NS(IOMMUFD);
+#endif
+
 static bool nointxmask;
 static bool disable_vga;
 static bool disable_idle_d3;
@@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
 	int max;
 	int cur;
 	struct vfio_pci_dependent_device *devices;
+	struct vfio_device *vdev;
+	bool devid:1;
+	bool dev_owned:1;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
@@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 	if (!iommu_group)
 		return -EPERM; /* Cannot reset non-isolated devices */
 
-	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+	if (fill->devid) {
+		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
+		struct vfio_device_set *dev_set = fill->vdev->dev_set;
+		struct vfio_device *vdev;
+
+		/*
+		 * Report devid for the affected devices:
+		 * - valid devid > 0 for the devices that are bound with
+		 *   the iommufd of the calling device.
+		 * - devid == 0 for the devices that have not been opened
+		 *   but have same group with one of the devices bound to
+		 *   the iommufd of the calling device.
+		 * - devid == -1 for others, and clear dev_owned flag.
+		 */
+		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
+		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
+			int ret;
+
+			ret = vfio_iommufd_physical_devid(vdev);
+			if (WARN_ON(ret < 0))
+				return ret;
+			fill->devices[fill->cur].devid = ret;
+		} else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) {
+			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
+		} else {
+			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
+			fill->dev_owned = false;
+		}
+	} else {
+		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+	}
 	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
 	fill->devices[fill->cur].bus = pdev->bus->number;
 	fill->devices[fill->cur].devfn = pdev->devfn;
@@ -1229,17 +1267,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 		return -ENOMEM;
 
 	fill.devices = devices;
+	fill.vdev = &vdev->vdev;
 
+	mutex_lock(&vdev->vdev.dev_set->lock);
+	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
+	mutex_unlock(&vdev->vdev.dev_set->lock);
 
 	/*
 	 * If a device was removed between counting and filling, we may come up
 	 * short of fill.max.  If a device was added, we'll have a return of
 	 * -EAGAIN above.
 	 */
-	if (!ret)
+	if (!ret) {
 		hdr.count = fill.cur;
+		if (fill.devid) {
+			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
+			if (fill.dev_owned)
+				hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
+		}
+	}
 
 reset_info_exit:
 	if (copy_to_user(arg, &hdr, minsz))
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..01203215251a 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -650,11 +650,53 @@ enum {
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
  *
+ * This command is used to query the affected devices in the hot reset for
+ * a given device.
+ *
+ * This command always reports the segment, bus, and devfn information for
+ * each affected device, and selectively reports the group_id or devid per
+ * the way how the calling device is opened.
+ *
+ *	- If the calling device is opened via the traditional group/container
+ *	  API, group_id is reported.  User should check if it has owned all
+ *	  the affected devices and provides a set of group fds to prove the
+ *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
+ *
+ *	- If the calling device is opened as a cdev, devid is reported.
+ *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
+ *	  data type.  For a given affected device, it is considered owned by
+ *	  this interface if it meets the following conditions:
+ *	  1) Has a valid devid within the iommufd_ctx of the calling device.
+ *	     Ownership cannot be determined across separate iommufd_ctx and the
+ *	     cdev calling conventions do not support a proof-of-ownership model
+ *	     as provided in the legacy group interface.  In this case a valid
+ *	     devid with value greater than zero is provided in the return
+ *	     structure.
+ *	  2) Does not have a valid devid within the iommufd_ctx of the calling
+ *	     device, but belongs to the same IOMMU group as the calling device
+ *	     or another opened device that has a valid devid within the
+ *	     iommufd_ctx of the calling device.  This provides implicit ownership
+ *	     for devices within the same DMA isolation context.  In this case
+ *	     the invalid devid value of zero is provided in the return structure.
+ *
+ *	  A devid value of -1 is provided in the return structure for devices
+ *	  where ownership is not available.  Such devices prevent the use of
+ *	  VFIO_DEVICE_PCI_HOT_RESET outside of proof-of-ownership calling
+ *	  conventions (ie. via legacy group accessed devices).
+ *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
+ *	  affected devices are owned by the user.  This flag is available only
+ *	  when VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
+ *
  * Return: 0 on success, -errno on failure:
  *	-enospc = insufficient buffer, -enodev = unsupported for device.
  */
 struct vfio_pci_dependent_device {
-	__u32	group_id;
+	union {
+		__u32   group_id;
+		__u32	devid;
+#define VFIO_PCI_DEVID_OWNED		0
+#define VFIO_PCI_DEVID_NOT_OWNED	-1
+	};
 	__u16	segment;
 	__u8	bus;
 	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
@@ -663,6 +705,8 @@ struct vfio_pci_dependent_device {
 struct vfio_pci_hot_reset_info {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
+#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
 	__u32	count;
 	struct vfio_pci_dependent_device	devices[];
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v5 10/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (8 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev Yi Liu
@ 2023-05-13 13:21 ` Yi Liu
  2023-05-18  5:51 ` [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Xu, Terrence
  10 siblings, 0 replies; 36+ messages in thread
From: Yi Liu @ 2023-05-13 13:21 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This is the way user to invoke hot-reset for the devices opened by cdev
interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
hot-reset for cdev devices.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 68 ++++++++++++++++++++++++++------
 include/uapi/linux/vfio.h        | 14 +++++++
 2 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 57586be770af..7fa3f54eb855 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -185,7 +185,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 struct vfio_pci_group_info;
 static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups);
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1323,8 +1324,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 	if (ret)
 		return ret;
 
-	/* Somewhere between 1 and count is OK */
-	if (!array_count || array_count > count)
+	if (array_count > count || vfio_device_cdev_opened(&vdev->vdev))
 		return -EINVAL;
 
 	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
@@ -1373,7 +1373,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 	info.count = array_count;
 	info.files = files;
 
-	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
+	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
 
 hot_reset_release:
 	for (file_idx--; file_idx >= 0; file_idx--)
@@ -1402,7 +1402,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	else if (pci_probe_reset_bus(vdev->pdev->bus))
 		return -ENODEV;
 
-	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
+	if (hdr.count)
+		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
+
+	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
+					  vfio_iommufd_physical_ictx(&vdev->vdev));
 }
 
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
@@ -2369,13 +2373,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
 };
 EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
 
-static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
+static bool vfio_dev_in_groups(struct vfio_device *vdev,
 			       struct vfio_pci_group_info *groups)
 {
 	unsigned int i;
 
+	if (!groups)
+		return false;
+
 	for (i = 0; i < groups->count; i++)
-		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
+		if (vfio_file_has_dev(groups->files[i], vdev))
 			return true;
 	return false;
 }
@@ -2447,13 +2454,37 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
 	return ret;
 }
 
+static bool vfio_device_owned(struct vfio_device *vdev,
+			      struct vfio_pci_group_info *groups,
+			      struct iommufd_ctx *iommufd_ctx)
+{
+	struct iommu_group *group;
+
+	WARN_ON(!!groups == !!iommufd_ctx);
+
+	if (groups)
+		return vfio_dev_in_groups(vdev, groups);
+
+	if (vfio_iommufd_physical_ictx(vdev) == iommufd_ctx)
+		return true;
+
+	group = iommu_group_get(vdev->dev);
+	if (!group)
+		return false;
+
+	iommu_group_put(group);
+
+	return iommufd_ctx_has_group(iommufd_ctx, group);
+}
+
 /*
  * We need to get memory_lock for each device, but devices can share mmap_lock,
  * therefore we need to zap and hold the vma_lock for each device, and only then
  * get each memory_lock.
  */
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups)
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx)
 {
 	struct vfio_pci_core_device *cur_mem;
 	struct vfio_pci_core_device *cur_vma;
@@ -2484,10 +2515,25 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 
 	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
 		/*
-		 * Test whether all the affected devices are contained by the
-		 * set of groups provided by the user.
+		 * Test whether all the affected devices can be reset by the
+		 * user.
+		 *
+		 * If the user provides a set of groups, all the devices
+		 * in the dev_set should be contained by the set of groups
+		 * provided by the user.
+		 *
+		 * If the user provides a zero-length group fd array, then
+		 * all the devices in the dev_set must be bound to the same
+		 * iommufd_ctx as the input iommufd_ctx.  If there is any
+		 * device that has not been bound to iommufd_ctx yet, check
+		 * if its iommu_group has any device bound to the input
+		 * iommufd_ctx Such devices can be considered owned by
+		 * the input iommufd_ctx as the device cannot be owned
+		 * by another iommufd_ctx when its iommu_group is owned.
+		 *
+		 * Otherwise, reset is not allowed.
 		 */
-		if (!vfio_dev_in_groups(cur_vma, groups)) {
+		if (!vfio_device_owned(&cur_vma->vdev, groups, iommufd_ctx)) {
 			ret = -EINVAL;
 			goto err_undo;
 		}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 01203215251a..24858b650562 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -686,6 +686,9 @@ enum {
  *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
  *	  affected devices are owned by the user.  This flag is available only
  *	  when VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
+ *	  When set, user could invoke VFIO_DEVICE_PCI_HOT_RESET with a zero
+ *	  length fd array on the calling device as the ownership is validated
+ *	  by iommufd_ctx.
  *
  * Return: 0 on success, -errno on failure:
  *	-enospc = insufficient buffer, -enodev = unsupported for device.
@@ -717,6 +720,17 @@ struct vfio_pci_hot_reset_info {
  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
  *				    struct vfio_pci_hot_reset)
  *
+ * Userspace requests hot reset for the devices it operates.  Due to the
+ * underlying topology, multiple devices can be affected in the reset
+ * while some might be opened by another user.  To avoid interference
+ * the calling user must ensure all affected devices are owned by itself.
+ *
+ * As the ownership described by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, the
+ * cdev opened devices must exclusively provide a zero-length fd array and
+ * the group opened devices must exclusively use an array of group fds for
+ * proof of ownership.  Mixed access to devices between cdev and legacy
+ * groups are not supported by this interface.
+ *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_pci_hot_reset {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set
  2023-05-13 13:21 ` [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set Yi Liu
@ 2023-05-15  7:11   ` Cédric Le Goater
  2023-05-17 19:12   ` Alex Williamson
  1 sibling, 0 replies; 36+ messages in thread
From: Cédric Le Goater @ 2023-05-15  7:11 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan

On 5/13/23 15:21, Yi Liu wrote:
> There are drivers that need to search vfio_device within a given dev_set.
> e.g. vfio-pci. So add a helper.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/vfio/pci/vfio_pci_core.c |  8 +++-----
>   drivers/vfio/vfio_main.c         | 15 +++++++++++++++
>   include/linux/vfio.h             |  3 +++
>   3 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 39e7823088e7..4df2def35bdd 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2335,12 +2335,10 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
>   static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
>   {
>   	struct vfio_device_set *dev_set = data;
> -	struct vfio_device *cur;
>   
> -	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> -		if (cur->dev == &pdev->dev)
> -			return 0;
> -	return -EBUSY;
> +	lockdep_assert_held(&dev_set->lock);

May be drop the lockdep_assert_held() above since there is one in
vfio_find_device_in_devset().

Thanks,

C.

> +
> +	return vfio_find_device_in_devset(dev_set, &pdev->dev) ? 0 : -EBUSY;
>   }
>   
>   /*
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index f0ca33b2e1df..ab4f3a794f78 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -141,6 +141,21 @@ unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set)
>   }
>   EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
>   
> +struct vfio_device *
> +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> +			   struct device *dev)
> +{
> +	struct vfio_device *cur;
> +
> +	lockdep_assert_held(&dev_set->lock);
> +
> +	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> +		if (cur->dev == dev)
> +			return cur;
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_find_device_in_devset);
> +
>   /*
>    * Device objects - create, release, get, put, search
>    */
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index fcbe084b18c8..4c17395ed4d2 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -259,6 +259,9 @@ void vfio_unregister_group_dev(struct vfio_device *device);
>   
>   int vfio_assign_device_set(struct vfio_device *device, void *set_id);
>   unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
> +struct vfio_device *
> +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> +			   struct device *dev);
>   
>   int vfio_mig_get_next_state(struct vfio_device *device,
>   			    enum vfio_device_mig_state cur_fsm,


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-13 13:21 ` [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev Yi Liu
@ 2023-05-15  7:29   ` Cédric Le Goater
  2023-05-15  7:47     ` Liu, Yi L
  2023-05-17 22:01   ` Alex Williamson
  1 sibling, 1 reply; 36+ messages in thread
From: Cédric Le Goater @ 2023-05-15  7:29 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan

On 5/13/23 15:21, Yi Liu wrote:
> This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx
> of the cdev device to check the ownership of the other affected devices.
> 
> This returns devid for each of the affected devices. If it is bound to the
> iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
> not opened by the calling user, but it belongs to the same iommu_group of
> a device that is bound to the iommufd_ctx of the cdev device, reports devid
> value of 0; If the device is un-owned device, configured within a different
> iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl shall
> report devid value of -1.
> 
> devid >=0 doesn't block hot-reset as the affected devices are considered to
> be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
> outside of proof-of-ownership calling conventions (ie. via legacy group
> accessed devices).
> 
> This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
> returned in case of calling user get device fd from other software stack
> and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
> the affected devices are owned, so user can know it without looping all
> the returned devids.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++++++++++++++++--
>   include/uapi/linux/vfio.h        | 46 +++++++++++++++++++++++++++-
>   2 files changed, 95 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 4df2def35bdd..57586be770af 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -27,6 +27,7 @@
>   #include <linux/vgaarb.h>
>   #include <linux/nospec.h>
>   #include <linux/sched/mm.h>
> +#include <linux/iommufd.h>
>   #if IS_ENABLED(CONFIG_EEH)
>   #include <asm/eeh.h>
>   #endif
> @@ -36,6 +37,10 @@
>   #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>   #define DRIVER_DESC "core driver for VFIO based PCI devices"
>   
> +#ifdef CONFIG_IOMMUFD

To import the IOMMUFD namespace, I had to use :

#if IS_ENABLED(CONFIG_IOMMUFD)

Thanks,

C.


> +MODULE_IMPORT_NS(IOMMUFD);
> +#endif
> +
>   static bool nointxmask;
>   static bool disable_vga;
>   static bool disable_idle_d3;
> @@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
>   	int max;
>   	int cur;
>   	struct vfio_pci_dependent_device *devices;
> +	struct vfio_device *vdev;
> +	bool devid:1;
> +	bool dev_owned:1;
>   };
>   
>   static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> @@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>   	if (!iommu_group)
>   		return -EPERM; /* Cannot reset non-isolated devices */
>   
> -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +	if (fill->devid) {
> +		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
> +		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> +		struct vfio_device *vdev;
> +
> +		/*
> +		 * Report devid for the affected devices:
> +		 * - valid devid > 0 for the devices that are bound with
> +		 *   the iommufd of the calling device.
> +		 * - devid == 0 for the devices that have not been opened
> +		 *   but have same group with one of the devices bound to
> +		 *   the iommufd of the calling device.
> +		 * - devid == -1 for others, and clear dev_owned flag.
> +		 */
> +		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> +		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> +			int ret;
> +
> +			ret = vfio_iommufd_physical_devid(vdev);
> +			if (WARN_ON(ret < 0))
> +				return ret;
> +			fill->devices[fill->cur].devid = ret;
> +		} else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) {
> +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
> +		} else {
> +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> +			fill->dev_owned = false;
> +		}
> +	} else {
> +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +	}
>   	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
>   	fill->devices[fill->cur].bus = pdev->bus->number;
>   	fill->devices[fill->cur].devfn = pdev->devfn;
> @@ -1229,17 +1267,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>   		return -ENOMEM;
>   
>   	fill.devices = devices;
> +	fill.vdev = &vdev->vdev;
>   
> +	mutex_lock(&vdev->vdev.dev_set->lock);
> +	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
>   	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>   					    &fill, slot);
> +	mutex_unlock(&vdev->vdev.dev_set->lock);
>   
>   	/*
>   	 * If a device was removed between counting and filling, we may come up
>   	 * short of fill.max.  If a device was added, we'll have a return of
>   	 * -EAGAIN above.
>   	 */
> -	if (!ret)
> +	if (!ret) {
>   		hdr.count = fill.cur;
> +		if (fill.devid) {
> +			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
> +			if (fill.dev_owned)
> +				hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> +		}
> +	}
>   
>   reset_info_exit:
>   	if (copy_to_user(arg, &hdr, minsz))
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 0552e8dcf0cb..01203215251a 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -650,11 +650,53 @@ enum {
>    * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
>    *					      struct vfio_pci_hot_reset_info)
>    *
> + * This command is used to query the affected devices in the hot reset for
> + * a given device.
> + *
> + * This command always reports the segment, bus, and devfn information for
> + * each affected device, and selectively reports the group_id or devid per
> + * the way how the calling device is opened.
> + *
> + *	- If the calling device is opened via the traditional group/container
> + *	  API, group_id is reported.  User should check if it has owned all
> + *	  the affected devices and provides a set of group fds to prove the
> + *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> + *
> + *	- If the calling device is opened as a cdev, devid is reported.
> + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> + *	  data type.  For a given affected device, it is considered owned by
> + *	  this interface if it meets the following conditions:
> + *	  1) Has a valid devid within the iommufd_ctx of the calling device.
> + *	     Ownership cannot be determined across separate iommufd_ctx and the
> + *	     cdev calling conventions do not support a proof-of-ownership model
> + *	     as provided in the legacy group interface.  In this case a valid
> + *	     devid with value greater than zero is provided in the return
> + *	     structure.
> + *	  2) Does not have a valid devid within the iommufd_ctx of the calling
> + *	     device, but belongs to the same IOMMU group as the calling device
> + *	     or another opened device that has a valid devid within the
> + *	     iommufd_ctx of the calling device.  This provides implicit ownership
> + *	     for devices within the same DMA isolation context.  In this case
> + *	     the invalid devid value of zero is provided in the return structure.
> + *
> + *	  A devid value of -1 is provided in the return structure for devices
> + *	  where ownership is not available.  Such devices prevent the use of
> + *	  VFIO_DEVICE_PCI_HOT_RESET outside of proof-of-ownership calling
> + *	  conventions (ie. via legacy group accessed devices).
> + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
> + *	  affected devices are owned by the user.  This flag is available only
> + *	  when VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> + *
>    * Return: 0 on success, -errno on failure:
>    *	-enospc = insufficient buffer, -enodev = unsupported for device.
>    */
>   struct vfio_pci_dependent_device {
> -	__u32	group_id;
> +	union {
> +		__u32   group_id;
> +		__u32	devid;
> +#define VFIO_PCI_DEVID_OWNED		0
> +#define VFIO_PCI_DEVID_NOT_OWNED	-1
> +	};
>   	__u16	segment;
>   	__u8	bus;
>   	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
> @@ -663,6 +705,8 @@ struct vfio_pci_dependent_device {
>   struct vfio_pci_hot_reset_info {
>   	__u32	argsz;
>   	__u32	flags;
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
>   	__u32	count;
>   	struct vfio_pci_dependent_device	devices[];
>   };


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-15  7:29   ` Cédric Le Goater
@ 2023-05-15  7:47     ` Liu, Yi L
  0 siblings, 0 replies; 36+ messages in thread
From: Liu, Yi L @ 2023-05-15  7:47 UTC (permalink / raw)
  To: Cédric Le Goater, alex.williamson@redhat.com, jgg@nvidia.com,
	Tian, Kevin
  Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
	eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org,
	mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com,
	yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong

> From: Cédric Le Goater <clegoate@redhat.com>
> Sent: Monday, May 15, 2023 3:30 PM
> 
> On 5/13/23 15:21, Yi Liu wrote:
> > This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx
> > of the cdev device to check the ownership of the other affected devices.
> >
> > This returns devid for each of the affected devices. If it is bound to the
> > iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
> > not opened by the calling user, but it belongs to the same iommu_group of
> > a device that is bound to the iommufd_ctx of the cdev device, reports devid
> > value of 0; If the device is un-owned device, configured within a different
> > iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl shall
> > report devid value of -1.
> >
> > devid >=0 doesn't block hot-reset as the affected devices are considered to
> > be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
> > outside of proof-of-ownership calling conventions (ie. via legacy group
> > accessed devices).
> >
> > This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
> > returned in case of calling user get device fd from other software stack
> > and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
> > the affected devices are owned, so user can know it without looping all
> > the returned devids.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++++++++++++++++--
> >   include/uapi/linux/vfio.h        | 46 +++++++++++++++++++++++++++-
> >   2 files changed, 95 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 4df2def35bdd..57586be770af 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -27,6 +27,7 @@
> >   #include <linux/vgaarb.h>
> >   #include <linux/nospec.h>
> >   #include <linux/sched/mm.h>
> > +#include <linux/iommufd.h>
> >   #if IS_ENABLED(CONFIG_EEH)
> >   #include <asm/eeh.h>
> >   #endif
> > @@ -36,6 +37,10 @@
> >   #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> >   #define DRIVER_DESC "core driver for VFIO based PCI devices"
> >
> > +#ifdef CONFIG_IOMMUFD
> 
> To import the IOMMUFD namespace, I had to use :
> 
> #if IS_ENABLED(CONFIG_IOMMUFD)

Thanks. Yes, IOMMUFD is tristate now, so needs to test CONFIG_IOMMUFD=m.
and "#if IS_ENABLED(CONFIG_IOMMUFD)" fixes the compiling failure.

Regards,
Yi Liu
> 
> 
> > +MODULE_IMPORT_NS(IOMMUFD);
> > +#endif
> > +
> >   static bool nointxmask;
> >   static bool disable_vga;
> >   static bool disable_idle_d3;
> > @@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
> >   	int max;
> >   	int cur;
> >   	struct vfio_pci_dependent_device *devices;
> > +	struct vfio_device *vdev;
> > +	bool devid:1;
> > +	bool dev_owned:1;
> >   };
> >
> >   static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > @@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> >   	if (!iommu_group)
> >   		return -EPERM; /* Cannot reset non-isolated devices */
> >
> > -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +	if (fill->devid) {
> > +		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
> > +		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> > +		struct vfio_device *vdev;
> > +
> > +		/*
> > +		 * Report devid for the affected devices:
> > +		 * - valid devid > 0 for the devices that are bound with
> > +		 *   the iommufd of the calling device.
> > +		 * - devid == 0 for the devices that have not been opened
> > +		 *   but have same group with one of the devices bound to
> > +		 *   the iommufd of the calling device.
> > +		 * - devid == -1 for others, and clear dev_owned flag.
> > +		 */
> > +		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> > +		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> > +			int ret;
> > +
> > +			ret = vfio_iommufd_physical_devid(vdev);
> > +			if (WARN_ON(ret < 0))
> > +				return ret;
> > +			fill->devices[fill->cur].devid = ret;
> > +		} else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) {
> > +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
> > +		} else {
> > +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> > +			fill->dev_owned = false;
> > +		}
> > +	} else {
> > +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +	}
> >   	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> >   	fill->devices[fill->cur].bus = pdev->bus->number;
> >   	fill->devices[fill->cur].devfn = pdev->devfn;
> > @@ -1229,17 +1267,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> >   		return -ENOMEM;
> >
> >   	fill.devices = devices;
> > +	fill.vdev = &vdev->vdev;
> >
> > +	mutex_lock(&vdev->vdev.dev_set->lock);
> > +	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
> >   	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> >   					    &fill, slot);
> > +	mutex_unlock(&vdev->vdev.dev_set->lock);
> >
> >   	/*
> >   	 * If a device was removed between counting and filling, we may come up
> >   	 * short of fill.max.  If a device was added, we'll have a return of
> >   	 * -EAGAIN above.
> >   	 */
> > -	if (!ret)
> > +	if (!ret) {
> >   		hdr.count = fill.cur;
> > +		if (fill.devid) {
> > +			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
> > +			if (fill.dev_owned)
> > +				hdr.flags |=
> VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> > +		}
> > +	}
> >
> >   reset_info_exit:
> >   	if (copy_to_user(arg, &hdr, minsz))
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 0552e8dcf0cb..01203215251a 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -650,11 +650,53 @@ enum {
> >    * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> >    *					      struct vfio_pci_hot_reset_info)
> >    *
> > + * This command is used to query the affected devices in the hot reset for
> > + * a given device.
> > + *
> > + * This command always reports the segment, bus, and devfn information for
> > + * each affected device, and selectively reports the group_id or devid per
> > + * the way how the calling device is opened.
> > + *
> > + *	- If the calling device is opened via the traditional group/container
> > + *	  API, group_id is reported.  User should check if it has owned all
> > + *	  the affected devices and provides a set of group fds to prove the
> > + *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> > + *
> > + *	- If the calling device is opened as a cdev, devid is reported.
> > + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> > + *	  data type.  For a given affected device, it is considered owned by
> > + *	  this interface if it meets the following conditions:
> > + *	  1) Has a valid devid within the iommufd_ctx of the calling device.
> > + *	     Ownership cannot be determined across separate iommufd_ctx and the
> > + *	     cdev calling conventions do not support a proof-of-ownership model
> > + *	     as provided in the legacy group interface.  In this case a valid
> > + *	     devid with value greater than zero is provided in the return
> > + *	     structure.
> > + *	  2) Does not have a valid devid within the iommufd_ctx of the calling
> > + *	     device, but belongs to the same IOMMU group as the calling device
> > + *	     or another opened device that has a valid devid within the
> > + *	     iommufd_ctx of the calling device.  This provides implicit ownership
> > + *	     for devices within the same DMA isolation context.  In this case
> > + *	     the invalid devid value of zero is provided in the return structure.
> > + *
> > + *	  A devid value of -1 is provided in the return structure for devices
> > + *	  where ownership is not available.  Such devices prevent the use of
> > + *	  VFIO_DEVICE_PCI_HOT_RESET outside of proof-of-ownership calling
> > + *	  conventions (ie. via legacy group accessed devices).
> > + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
> > + *	  affected devices are owned by the user.  This flag is available only
> > + *	  when VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> > + *
> >    * Return: 0 on success, -errno on failure:
> >    *	-enospc = insufficient buffer, -enodev = unsupported for device.
> >    */
> >   struct vfio_pci_dependent_device {
> > -	__u32	group_id;
> > +	union {
> > +		__u32   group_id;
> > +		__u32	devid;
> > +#define VFIO_PCI_DEVID_OWNED		0
> > +#define VFIO_PCI_DEVID_NOT_OWNED	-1
> > +	};
> >   	__u16	segment;
> >   	__u8	bus;
> >   	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
> > @@ -663,6 +705,8 @@ struct vfio_pci_dependent_device {
> >   struct vfio_pci_hot_reset_info {
> >   	__u32	argsz;
> >   	__u32	flags;
> > +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
> > +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
> >   	__u32	count;
> >   	struct vfio_pci_dependent_device	devices[];
> >   };


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices
  2023-05-13 13:21 ` [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices Yi Liu
@ 2023-05-17 17:26   ` Alex Williamson
  2023-05-17 18:20     ` Jason Gunthorpe
  0 siblings, 1 reply; 36+ messages in thread
From: Alex Williamson @ 2023-05-17 17:26 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Sat, 13 May 2023 06:21:27 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This binds noiommu device to iommufd and creates iommufd_access for this
> bond. This is useful for adding an iommufd-based device ownership check
> for VFIO_DEVICE_PCI_HOT_RESET since this model requires all the other
> affected devices bound to the same iommufd as the device to be reset.
> For noiommu devices, there is no backend iommu, so create iommufd_access
> instead of iommufd_device.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/iommufd.c | 43 ++++++++++++++++++++++++++++++++++++++++--
>  include/linux/vfio.h   |  1 +
>  2 files changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 88b00c501015..c1379e826052 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -10,6 +10,42 @@
>  MODULE_IMPORT_NS(IOMMUFD);
>  MODULE_IMPORT_NS(IOMMUFD_VFIO);
>  
> +static void vfio_noiommu_access_unmap(void *data, unsigned long iova,
> +				      unsigned long length)
> +{

Should this WARN_ON if called?

> +}
> +
> +static const struct iommufd_access_ops vfio_user_noiommu_ops = {
> +	.needs_pin_pages = 1,

But it doesn't.

> +	.unmap = vfio_noiommu_access_unmap,
> +};
> +
> +static int vfio_iommufd_noiommu_bind(struct vfio_device *vdev,
> +				     struct iommufd_ctx *ictx,
> +				     u32 *out_device_id)
> +{
> +	struct iommufd_access *user;
> +
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	user = iommufd_access_create(ictx, &vfio_user_noiommu_ops,
> +				     vdev, out_device_id);
> +	if (IS_ERR(user))
> +		return PTR_ERR(user);
> +	vdev->noiommu_access = user;
> +	return 0;
> +}
> +
> +static void vfio_iommufd_noiommu_unbind(struct vfio_device *vdev)
> +{
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	if (vdev->noiommu_access) {
> +		iommufd_access_destroy(vdev->noiommu_access);
> +		vdev->noiommu_access = NULL;
> +	}
> +}
> +
>  int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
>  {
>  	u32 ioas_id;
> @@ -29,7 +65,8 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
>  		 */
>  		if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
>  			return -EPERM;
> -		return 0;
> +
> +		return vfio_iommufd_noiommu_bind(vdev, ictx, &device_id);
>  	}
>  
>  	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
> @@ -59,8 +96,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
>  {
>  	lockdep_assert_held(&vdev->dev_set->lock);
>  
> -	if (vfio_device_is_noiommu(vdev))
> +	if (vfio_device_is_noiommu(vdev)) {
> +		vfio_iommufd_noiommu_unbind(vdev);
>  		return;
> +	}
>  
>  	if (vdev->ops->unbind_iommufd)
>  		vdev->ops->unbind_iommufd(vdev);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 2c137ea94a3e..16fd04490550 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -57,6 +57,7 @@ struct vfio_device {
>  	struct list_head group_next;
>  	struct list_head iommu_entry;
>  	struct iommufd_access *iommufd_access;
> +	struct iommufd_access *noiommu_access;

It's not clear to me why we need a separate iommufd_access for noiommu.
Can't we add a vfio_device_is_noiommu() check to the
vfio_{un}pin_pages() and vfio_dma_rw() interfaces and reuse the
existing pointer for both emulated and noiommu cases?  Maybe even the
iommufd_access* functions should test needs_pin_pages and generate an
error/warning if an access that was registered without reporting that
it needs page pinning makes use of such an interface.  Thanks,

Alex

>  	void (*put_kvm)(struct kvm *kvm);
>  #if IS_ENABLED(CONFIG_IOMMUFD)
>  	struct iommufd_device *iommufd_device;


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-13 13:21 ` [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
@ 2023-05-17 18:15   ` Alex Williamson
  2023-05-17 18:22     ` Jason Gunthorpe
  2023-05-18 13:25     ` Liu, Yi L
  0 siblings, 2 replies; 36+ messages in thread
From: Alex Williamson @ 2023-05-17 18:15 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Sat, 13 May 2023 06:21:32 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This is needed by the vfio-pci driver to report affected devices in the
> hot reset for a given device.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c | 24 ++++++++++++++++++++++++
>  drivers/vfio/iommufd.c         | 20 ++++++++++++++++++++
>  include/linux/iommufd.h        |  6 ++++++
>  include/linux/vfio.h           | 14 ++++++++++++++
>  4 files changed, 64 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 4f9b2142274c..81466b97023f 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -116,6 +116,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
>  
> +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> +{
> +	return idev->ictx;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> +
> +u32 iommufd_device_to_id(struct iommufd_device *idev)
> +{
> +	return idev->obj.id;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> +
>  static int iommufd_device_setup_msi(struct iommufd_device *idev,
>  				    struct iommufd_hw_pagetable *hwpt,
>  				    phys_addr_t sw_msi_start)
> @@ -463,6 +475,18 @@ void iommufd_access_destroy(struct iommufd_access *access)
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
>  
> +struct iommufd_ctx *iommufd_access_to_ictx(struct iommufd_access *access)
> +{
> +	return access->ictx;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_ictx, IOMMUFD);
> +
> +u32 iommufd_access_to_id(struct iommufd_access *access)
> +{
> +	return access->obj.id;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_id, IOMMUFD);
> +
>  int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
>  {
>  	struct iommufd_ioas *new_ioas;
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index c1379e826052..a18e920be164 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -105,6 +105,26 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
>  		vdev->ops->unbind_iommufd(vdev);
>  }
>  
> +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> +{
> +	if (vdev->iommufd_device)
> +		return iommufd_device_to_ictx(vdev->iommufd_device);
> +	if (vdev->noiommu_access)
> +		return iommufd_access_to_ictx(vdev->noiommu_access);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> +
> +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> +{
> +	if (vdev->iommufd_device)
> +		return iommufd_device_to_id(vdev->iommufd_device);
> +	if (vdev->noiommu_access)
> +		return iommufd_access_to_id(vdev->noiommu_access);
> +	return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);

I think these exemplify that it would be better if both emulated and
noiommu use the same iommufd_access pointer.  Thanks,

Alex

> +
>  /*
>   * The physical standard ops mean that the iommufd_device is bound to the
>   * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
> diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> index 1129a36a74c4..68cd65274e28 100644
> --- a/include/linux/iommufd.h
> +++ b/include/linux/iommufd.h
> @@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
>  int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
>  void iommufd_device_detach(struct iommufd_device *idev);
>  
> +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
> +u32 iommufd_device_to_id(struct iommufd_device *idev);
> +
>  struct iommufd_access_ops {
>  	u8 needs_pin_pages : 1;
>  	void (*unmap)(void *data, unsigned long iova, unsigned long length);
> @@ -45,6 +48,9 @@ iommufd_access_create(struct iommufd_ctx *ictx,
>  void iommufd_access_destroy(struct iommufd_access *access);
>  int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
>  
> +struct iommufd_ctx *iommufd_access_to_ictx(struct iommufd_access *access);
> +u32 iommufd_access_to_id(struct iommufd_access *access);
> +
>  void iommufd_ctx_get(struct iommufd_ctx *ictx);
>  
>  #if IS_ENABLED(CONFIG_IOMMUFD)
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index a61130bc06a2..fcbe084b18c8 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -115,6 +115,8 @@ struct vfio_device_ops {
>  };
>  
>  #if IS_ENABLED(CONFIG_IOMMUFD)
> +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
> +int vfio_iommufd_physical_devid(struct vfio_device *vdev);
>  int vfio_iommufd_physical_bind(struct vfio_device *vdev,
>  			       struct iommufd_ctx *ictx, u32 *out_device_id);
>  void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
> @@ -124,6 +126,18 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
>  void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
>  int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
>  #else
> +static inline struct iommufd_ctx *
> +vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> +{
> +	return NULL;
> +}
> +
> +static inline int
> +vfio_iommufd_physical_devid(struct vfio_device *vdev)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  #define vfio_iommufd_physical_bind                                      \
>  	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
>  		  u32 *out_device_id)) NULL)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices
  2023-05-17 17:26   ` Alex Williamson
@ 2023-05-17 18:20     ` Jason Gunthorpe
  2023-05-18 12:23       ` Liu, Yi L
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Gunthorpe @ 2023-05-17 18:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yi Liu, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Wed, May 17, 2023 at 11:26:09AM -0600, Alex Williamson wrote:

> It's not clear to me why we need a separate iommufd_access for
> noiommu.

The point was to allocate an ID for the device so we can use that ID
with the other interfaces in all cases.

Otherwise it is a too weird special case that is probably going to
keep causing trouble down the road...

Jason

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-17 18:15   ` Alex Williamson
@ 2023-05-17 18:22     ` Jason Gunthorpe
  2023-05-17 18:40       ` Alex Williamson
  2023-05-18 13:25     ` Liu, Yi L
  1 sibling, 1 reply; 36+ messages in thread
From: Jason Gunthorpe @ 2023-05-17 18:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yi Liu, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Wed, May 17, 2023 at 12:15:17PM -0600, Alex Williamson wrote:

> > +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > +{
> > +	if (vdev->iommufd_device)
> > +		return iommufd_device_to_id(vdev->iommufd_device);
> > +	if (vdev->noiommu_access)
> > +		return iommufd_access_to_id(vdev->noiommu_access);
> > +	return -EINVAL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> 
> I think these exemplify that it would be better if both emulated and
> noiommu use the same iommufd_access pointer.  Thanks,

Oh, I mis understood your other remark.. Yeah good question I have to
study this also

Jason

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-17 18:22     ` Jason Gunthorpe
@ 2023-05-17 18:40       ` Alex Williamson
  2023-05-17 18:43         ` Jason Gunthorpe
  0 siblings, 1 reply; 36+ messages in thread
From: Alex Williamson @ 2023-05-17 18:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Wed, 17 May 2023 15:22:27 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, May 17, 2023 at 12:15:17PM -0600, Alex Williamson wrote:
> 
> > > +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > > +{
> > > +	if (vdev->iommufd_device)
> > > +		return iommufd_device_to_id(vdev->iommufd_device);
> > > +	if (vdev->noiommu_access)
> > > +		return iommufd_access_to_id(vdev->noiommu_access);
> > > +	return -EINVAL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);  
> > 
> > I think these exemplify that it would be better if both emulated and
> > noiommu use the same iommufd_access pointer.  Thanks,  
> 
> Oh, I mis understood your other remark.. Yeah good question I have to
> study this also

I guess I also missed that this wasn't iommufd_access vs
noiommu_access, it's device vs access, but shouldn't any iommufd_access
pointer provide the devid?  I need to go look why we need two different
methods to get a devid...  Thanks,

Alex


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-17 18:40       ` Alex Williamson
@ 2023-05-17 18:43         ` Jason Gunthorpe
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Gunthorpe @ 2023-05-17 18:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yi Liu, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Wed, May 17, 2023 at 12:40:32PM -0600, Alex Williamson wrote:
> On Wed, 17 May 2023 15:22:27 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, May 17, 2023 at 12:15:17PM -0600, Alex Williamson wrote:
> > 
> > > > +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > > > +{
> > > > +	if (vdev->iommufd_device)
> > > > +		return iommufd_device_to_id(vdev->iommufd_device);
> > > > +	if (vdev->noiommu_access)
> > > > +		return iommufd_access_to_id(vdev->noiommu_access);
> > > > +	return -EINVAL;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);  
> > > 
> > > I think these exemplify that it would be better if both emulated and
> > > noiommu use the same iommufd_access pointer.  Thanks,  
> > 
> > Oh, I mis understood your other remark.. Yeah good question I have to
> > study this also
> 
> I guess I also missed that this wasn't iommufd_access vs
> noiommu_access, it's device vs access, but shouldn't any iommufd_access
> pointer provide the devid?  I need to go look why we need two different
> methods to get a devid...

At least this hunk above makes sense, access and device are two
different objects with different types, so having different ID
accessors is reasonably logical.

But it is a good point that this should return the dev id of the
normal access for a normal mdev too

Ideally I'd like to see that we always return a dev id to userspace
for all vfio device types. Then we can rely on it

Jason

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set
  2023-05-13 13:21 ` [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set Yi Liu
  2023-05-15  7:11   ` Cédric Le Goater
@ 2023-05-17 19:12   ` Alex Williamson
  2023-05-18 12:31     ` Liu, Yi L
  1 sibling, 1 reply; 36+ messages in thread
From: Alex Williamson @ 2023-05-17 19:12 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Sat, 13 May 2023 06:21:33 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> There are drivers that need to search vfio_device within a given dev_set.
> e.g. vfio-pci. So add a helper.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c |  8 +++-----
>  drivers/vfio/vfio_main.c         | 15 +++++++++++++++
>  include/linux/vfio.h             |  3 +++
>  3 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 39e7823088e7..4df2def35bdd 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2335,12 +2335,10 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
>  static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
>  {
>  	struct vfio_device_set *dev_set = data;
> -	struct vfio_device *cur;
>  
> -	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> -		if (cur->dev == &pdev->dev)
> -			return 0;
> -	return -EBUSY;
> +	lockdep_assert_held(&dev_set->lock);
> +
> +	return vfio_find_device_in_devset(dev_set, &pdev->dev) ? 0 : -EBUSY;

Maybe an opportunity to revisit why this returns -EBUSY rather than
something reasonable like -ENODEV.  It looks like we picked up the
-EBUSY in a882c16a2b7e where I think it was trying to preserve the
return of vfio_pci_try_zap_and_vma_lock_cb() but the return value here
is not even propagated so this looks like an chance to have it make
sense again.  Thanks,

Alex

>  }
>  
>  /*
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index f0ca33b2e1df..ab4f3a794f78 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -141,6 +141,21 @@ unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set)
>  }
>  EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
>  
> +struct vfio_device *
> +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> +			   struct device *dev)
> +{
> +	struct vfio_device *cur;
> +
> +	lockdep_assert_held(&dev_set->lock);
> +
> +	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> +		if (cur->dev == dev)
> +			return cur;
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_find_device_in_devset);
> +
>  /*
>   * Device objects - create, release, get, put, search
>   */
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index fcbe084b18c8..4c17395ed4d2 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -259,6 +259,9 @@ void vfio_unregister_group_dev(struct vfio_device *device);
>  
>  int vfio_assign_device_set(struct vfio_device *device, void *set_id);
>  unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
> +struct vfio_device *
> +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> +			   struct device *dev);
>  
>  int vfio_mig_get_next_state(struct vfio_device *device,
>  			    enum vfio_device_mig_state cur_fsm,


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group()
  2023-05-13 13:21 ` [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group() Yi Liu
@ 2023-05-17 19:40   ` Alex Williamson
  2023-05-18 12:33     ` Liu, Yi L
  0 siblings, 1 reply; 36+ messages in thread
From: Alex Williamson @ 2023-05-17 19:40 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Sat, 13 May 2023 06:21:34 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> to check if any device within the given iommu_group has been bound with

Nit, I find these commit logs where the subject line is intended to
flow into the commit log to form a complete sentence difficult to read.
I expect complete thoughts within the commit log itself and the subject
should be a separate summary of the log.  Repeating the subject within
the commit log is ok.

> the iommufd_ctx. This helpful for the checking on device ownership for

s/This/This is/

> the devices which have been bound but cannot be bound to any other iommufd

s/have been/have not been/?

> as the iommu_group has been bound.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c | 29 +++++++++++++++++++++++++++++
>  include/linux/iommufd.h        |  8 ++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 81466b97023f..5e5f7912807b 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -98,6 +98,35 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
>  
> +/**
> + * iommufd_ctx_has_group - True if the struct device is bound to this ictx

What struct device?  Isn't this "True if any device within the group is
bound to the ictx"?

> + * @ictx: iommufd file descriptor
> + * @group: Pointer to a physical iommu_group struct
> + *
> + * True if a iommufd_device_bind() is present for any device within the
> + * group.

How can a function be present for a device?  Maybe "True if any device
within the group has been bound to this ictx, ex. via
iommufd_device_bind(), therefore implying ictx ownership of the group."  Thanks,

Alex

> + */
> +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
> +{
> +	struct iommufd_object *obj;
> +	unsigned long index;
> +
> +	if (!ictx || !group)
> +		return false;
> +
> +	xa_lock(&ictx->objects);
> +	xa_for_each(&ictx->objects, index, obj) {
> +		if (obj->type == IOMMUFD_OBJ_DEVICE &&
> +		    container_of(obj, struct iommufd_device, obj)->group == group) {
> +			xa_unlock(&ictx->objects);
> +			return true;
> +		}
> +	}
> +	xa_unlock(&ictx->objects);
> +	return false;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> +
>  /**
>   * iommufd_device_unbind - Undo iommufd_device_bind()
>   * @idev: Device returned by iommufd_device_bind()
> diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> index 68cd65274e28..e49c16cd6831 100644
> --- a/include/linux/iommufd.h
> +++ b/include/linux/iommufd.h
> @@ -16,6 +16,7 @@ struct page;
>  struct iommufd_ctx;
>  struct iommufd_access;
>  struct file;
> +struct iommu_group;
>  
>  struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
>  					   struct device *dev, u32 *id);
> @@ -56,6 +57,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
>  #if IS_ENABLED(CONFIG_IOMMUFD)
>  struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
>  void iommufd_ctx_put(struct iommufd_ctx *ictx);
> +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
>  
>  int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
>  			     unsigned long length, struct page **out_pages,
> @@ -77,6 +79,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
>  {
>  }
>  
> +static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
> +					 struct iommu_group *group)
> +{
> +	return false;
> +}
> +
>  static inline int iommufd_access_pin_pages(struct iommufd_access *access,
>  					   unsigned long iova,
>  					   unsigned long length,


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-13 13:21 ` [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev Yi Liu
  2023-05-15  7:29   ` Cédric Le Goater
@ 2023-05-17 22:01   ` Alex Williamson
  2023-05-18 13:21     ` Liu, Yi L
  1 sibling, 1 reply; 36+ messages in thread
From: Alex Williamson @ 2023-05-17 22:01 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Sat, 13 May 2023 06:21:35 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx

s/makes/allows/?

s/to//

> of the cdev device to check the ownership of the other affected devices.
> 
> This returns devid for each of the affected devices. If it is bound to the
> iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
> not opened by the calling user, but it belongs to the same iommu_group of
> a device that is bound to the iommufd_ctx of the cdev device, reports devid
> value of 0; If the device is un-owned device, configured within a different
> iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl shall
> report devid value of -1.
> 
> devid >=0 doesn't block hot-reset as the affected devices are considered to
> be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
> outside of proof-of-ownership calling conventions (ie. via legacy group
> accessed devices).
> 
> This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
> returned in case of calling user get device fd from other software stack

"other software stack"?  I think this is trying to say something like:

  When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD
  managed device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is
  reported to indicate the values returned are IOMMUFD devids rather
  than group IDs as used when accessing vfio devices through the
  conventional vfio group interface.  Additionally the flag
  VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported in this mode if
  all of the devices affected by the hot-reset are owned by either
  virtue of being directly bound to the same iommufd context as the
  calling device, or implicitly owned via a shared IOMMU group.

> and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
> the affected devices are owned, so user can know it without looping all
> the returned devids.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++++++++++++++++--
>  include/uapi/linux/vfio.h        | 46 +++++++++++++++++++++++++++-
>  2 files changed, 95 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 4df2def35bdd..57586be770af 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -27,6 +27,7 @@
>  #include <linux/vgaarb.h>
>  #include <linux/nospec.h>
>  #include <linux/sched/mm.h>
> +#include <linux/iommufd.h>
>  #if IS_ENABLED(CONFIG_EEH)
>  #include <asm/eeh.h>
>  #endif
> @@ -36,6 +37,10 @@
>  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>  #define DRIVER_DESC "core driver for VFIO based PCI devices"
>  
> +#ifdef CONFIG_IOMMUFD
> +MODULE_IMPORT_NS(IOMMUFD);
> +#endif
> +
>  static bool nointxmask;
>  static bool disable_vga;
>  static bool disable_idle_d3;
> @@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
>  	int max;
>  	int cur;
>  	struct vfio_pci_dependent_device *devices;
> +	struct vfio_device *vdev;
> +	bool devid:1;
> +	bool dev_owned:1;
>  };
>  
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> @@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  	if (!iommu_group)
>  		return -EPERM; /* Cannot reset non-isolated devices */
>  
> -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +	if (fill->devid) {
> +		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
> +		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> +		struct vfio_device *vdev;
> +
> +		/*
> +		 * Report devid for the affected devices:
> +		 * - valid devid > 0 for the devices that are bound with
> +		 *   the iommufd of the calling device.
> +		 * - devid == 0 for the devices that have not been opened
> +		 *   but have same group with one of the devices bound to
> +		 *   the iommufd of the calling device.
> +		 * - devid == -1 for others, and clear dev_owned flag.
> +		 */
> +		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> +		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> +			int ret;
> +
> +			ret = vfio_iommufd_physical_devid(vdev);
> +			if (WARN_ON(ret < 0))
> +				return ret;
> +			fill->devices[fill->cur].devid = ret;

Nit, @devid seems like a better variable name here rather than @ret.

> +		} else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) {
> +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
> +		} else {
> +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> +			fill->dev_owned = false;
> +		}

I think we're not describing the requirements for this middle test
correctly.  We're essentially only stating the iommufd_ctx_has_group()
part of the requirement, but we're also enforcing a
vfio_find_device_in_devset() requirement, which means the device is not
just unopened within a group shared by the iommufd context, but it must
also still be a device registered as a member of the devset, ie. it
must be bound to a vfio driver.

It's not a new requirement, it's imposed in the hot-reset ioctl itself,
but it's new for the info ioctl given that it's now trying to report
that the user can perform the reset for cdev callers.

This also shares too much logic with vfio_device_owned() added in the
next patch.  I think it might be cleaner to move the iommu_group_get() to
the group path below and change vfio_device_owned() to something that
can be used here and in the reset path.  For example, if we had a
function like:

static int vfio_hot_reset_devid(struct vfio_device *vdev,
                                struct iommufd_ctx *iommufd_ctx)
{
        struct iommu_group *group;
        int devid;

        if (!vdev)
                return VFIO_PCI_DEVID_NOT_OWNED;

        if (vfio_iommufd_physical_ictx(vdev) == iommufd_ctx) 
                return vfio_iommufd_physical_devid(vdev);

        group = iommu_group_get(vdev->dev);
        if (!group)
                return VFIO_PCI_DEVID_NOT_OWNED;
                        
        if (iommufd_ctx_has_group(iommufd_ctx, group))
                devid = VFIO_PCI_DEVID_OWNED;

        iommu_group_put(group);
                                
        return devid;
} 

It could be called above as:

	vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
	fill->devices[fill->cur].devid =
			vfio_hot_reset_devid(vdev, iommufd);


And from vfio_pci_dev_set_hot_reset() as:

	bool owned;

	if (iommufd_ctx) {
		int devid = vfio_hot_reset_devid(&cur_vma->vdev,
						 iommufd_ctx);

		owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
	} else
		owned = vfio_dev_in_groups(&cur_vma->vdev, groups);

Any better?

> +	} else {
> +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +	}
>  	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
>  	fill->devices[fill->cur].bus = pdev->bus->number;
>  	fill->devices[fill->cur].devfn = pdev->devfn;
> @@ -1229,17 +1267,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  		return -ENOMEM;
>  
>  	fill.devices = devices;
> +	fill.vdev = &vdev->vdev;
>  
> +	mutex_lock(&vdev->vdev.dev_set->lock);
> +	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
>  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>  					    &fill, slot);
> +	mutex_unlock(&vdev->vdev.dev_set->lock);
>  
>  	/*
>  	 * If a device was removed between counting and filling, we may come up
>  	 * short of fill.max.  If a device was added, we'll have a return of
>  	 * -EAGAIN above.
>  	 */
> -	if (!ret)
> +	if (!ret) {
>  		hdr.count = fill.cur;
> +		if (fill.devid) {
> +			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
> +			if (fill.dev_owned)
> +				hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> +		}
> +	}

Does this clean up the flag and branching a bit?

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 737115d16a79..6a2a079e452d 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -786,8 +786,7 @@ struct vfio_pci_fill_info {
 	int cur;
 	struct vfio_pci_dependent_device *devices;
 	struct vfio_device *vdev;
-	bool devid:1;
-	bool dev_owned:1;
+	u32 flags;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
@@ -802,7 +801,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 	if (!iommu_group)
 		return -EPERM; /* Cannot reset non-isolated devices */
 
-	if (fill->devid) {
+	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
 		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
 		struct vfio_device_set *dev_set = fill->vdev->dev_set;
 		struct vfio_device *vdev;
@@ -814,7 +813,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 		 * - devid == 0 for the devices that have not been opened
 		 *   but have same group with one of the devices bound to
 		 *   the iommufd of the calling device.
-		 * - devid == -1 for others, and clear dev_owned flag.
+		 * - devid == -1 for others, and clear owned flag.
 		 */
 		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
 		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
@@ -828,7 +827,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
 		} else {
 			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
-			fill->dev_owned = false;
+			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
 		}
 	} else {
 		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
@@ -1273,8 +1272,11 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	fill.devices = devices;
 	fill.vdev = &vdev->vdev;
 
+	if (vfio_device_cdev_opened(&vdev->vdev))
+		fill.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID |
+			     VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
+
 	mutex_lock(&vdev->vdev.dev_set->lock);
-	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
 	mutex_unlock(&vdev->vdev.dev_set->lock);
@@ -1286,11 +1288,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	 */
 	if (!ret) {
 		hdr.count = fill.cur;
-		if (fill.devid) {
-			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
-			if (fill.dev_owned)
-				hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
-		}
+		hdr.flags = fill.flags;
 	}
 
 reset_info_exit:

Thanks,
Alex

>  
>  reset_info_exit:
>  	if (copy_to_user(arg, &hdr, minsz))
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 0552e8dcf0cb..01203215251a 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -650,11 +650,53 @@ enum {
>   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
>   *					      struct vfio_pci_hot_reset_info)
>   *
> + * This command is used to query the affected devices in the hot reset for
> + * a given device.
> + *
> + * This command always reports the segment, bus, and devfn information for
> + * each affected device, and selectively reports the group_id or devid per
> + * the way how the calling device is opened.
> + *
> + *	- If the calling device is opened via the traditional group/container
> + *	  API, group_id is reported.  User should check if it has owned all
> + *	  the affected devices and provides a set of group fds to prove the
> + *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> + *
> + *	- If the calling device is opened as a cdev, devid is reported.
> + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> + *	  data type.  For a given affected device, it is considered owned by
> + *	  this interface if it meets the following conditions:
> + *	  1) Has a valid devid within the iommufd_ctx of the calling device.
> + *	     Ownership cannot be determined across separate iommufd_ctx and the
> + *	     cdev calling conventions do not support a proof-of-ownership model
> + *	     as provided in the legacy group interface.  In this case a valid
> + *	     devid with value greater than zero is provided in the return
> + *	     structure.
> + *	  2) Does not have a valid devid within the iommufd_ctx of the calling
> + *	     device, but belongs to the same IOMMU group as the calling device
> + *	     or another opened device that has a valid devid within the
> + *	     iommufd_ctx of the calling device.  This provides implicit ownership
> + *	     for devices within the same DMA isolation context.  In this case
> + *	     the invalid devid value of zero is provided in the return structure.
> + *
> + *	  A devid value of -1 is provided in the return structure for devices
> + *	  where ownership is not available.  Such devices prevent the use of
> + *	  VFIO_DEVICE_PCI_HOT_RESET outside of proof-of-ownership calling
> + *	  conventions (ie. via legacy group accessed devices).
> + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
> + *	  affected devices are owned by the user.  This flag is available only
> + *	  when VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> + *
>   * Return: 0 on success, -errno on failure:
>   *	-enospc = insufficient buffer, -enodev = unsupported for device.
>   */
>  struct vfio_pci_dependent_device {
> -	__u32	group_id;
> +	union {
> +		__u32   group_id;
> +		__u32	devid;
> +#define VFIO_PCI_DEVID_OWNED		0
> +#define VFIO_PCI_DEVID_NOT_OWNED	-1
> +	};
>  	__u16	segment;
>  	__u8	bus;
>  	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
> @@ -663,6 +705,8 @@ struct vfio_pci_dependent_device {
>  struct vfio_pci_hot_reset_info {
>  	__u32	argsz;
>  	__u32	flags;
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
>  	__u32	count;
>  	struct vfio_pci_dependent_device	devices[];
>  };


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device
  2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
                   ` (9 preceding siblings ...)
  2023-05-13 13:21 ` [PATCH v5 10/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
@ 2023-05-18  5:51 ` Xu, Terrence
  10 siblings, 0 replies; 36+ messages in thread
From: Xu, Terrence @ 2023-05-18  5:51 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson@redhat.com, jgg@nvidia.com,
	Tian, Kevin
  Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
	eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org,
	mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com,
	yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Jiang, Yanting, Duan, Zhenzhong,
	clegoate@redhat.com

> -----Original Message-----
> From: Liu, Yi L <yi.l.liu@intel.com>
> Subject: [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device
> 
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. While
> for cdev devices, user can use an iommufd-based ownership checking model
> and invoke VFIO_DEVICE_PCI_HOT_RESET with a zero-length fd array.
> 
> This series first creates iommufd_access for noiommu devices to fill the gap
> for adding iommufd-based ownership checking model, then extends
> VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to check ownership and return
> the check result and the devid of affected devices to user. In the end,
> extends the VFIO_DEVICE_PCI_HOT_RESET to accept zero-length fd array for
> hot-reset with cdev devices.
> 
> The new hot reset method and updated _INFO ioctl are tested with the
> below qemu:
> 
> https://github.com/yiliu1765/qemu/tree/iommufd_rfcv4.mig.reset.v4_var3
> (requires to test with the cdev kernel)
> 
> Change log:
> 
> v5:
>  - Drop patch 01 of v4 (Alex)
>  - Create noiommu_access for noiommu devices (Jason)
>  - Reserve all negative iommufd IDs, hence VFIO can encode negative
>    values (Jason)
>  - Make vfio_iommufd_physical_devid() return -EINVAL if it's not called
>    with a physical device or a noiommu device.
>  - Add vfio_find_device_in_devset() in vfio_main.c (Alex)
>  - Add iommufd_ctx_has_group() to replace
> vfio_devset_iommufd_has_group().
>    Reason: vfio_devset_iommufd_has_group() only loops the devices within
>    the given devset to check the iommufd an iommu_group, but an
> iommu_group
>    can span into multiple devsets. So if failed to find the group in a
>    devset doesn't mean the group is not owned by the iommufd. So here
> either
>    needs to search all the devsets or add an iommufd API to check it. It
>    appears an iommufd API makes more sense.
>  - Adopt suggestions from Alex on patch 08 and 09 of v4, refine the hot-reset
>    uapi description and minor tweaks
>  - Use bitfields for bool members (Alex)
> 
> v4: https://lore.kernel.org/kvm/20230426145419.450922-1-yi.l.liu@intel.com/
>  - Rename the patch series subject
>  - Patch 01 is moved from the cdev series
>  - Patch 02, 06 are new per review comments in v3
>  - Patch 03/04/05/07/08/09 are from v3 with updates
> 
> v3: https://lore.kernel.org/kvm/20230401144429.88673-1-yi.l.liu@intel.com/
>  - Remove the new _INFO ioctl of v2, extend the existing _INFO ioctl to
>    report devid (Alex)
>  - Add r-b from Jason
>  - Add t-b from Terrence Xu and Yanting Jiang (mainly regression test)
> 
> v2: https://lore.kernel.org/kvm/20230327093458.44939-1-yi.l.liu@intel.com/
>  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
>  - Add r-b from Kevin and Jason
>  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
>    fd passing usage in cdev path (Jason, Alex)
> 
> v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/
> 
> Regards,
> 	Yi Liu
> 
> Yi Liu (10):
>   vfio-iommufd: Create iommufd_access for noiommu devices
>   vfio/pci: Update comment around group_fd get in
>     vfio_pci_ioctl_pci_hot_reset()
>   vfio/pci: Move the existing hot reset logic to be a helper
>   vfio: Mark cdev usage in vfio_device
>   iommufd: Reserve all negative IDs in the iommufd xarray
>   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
>     vfio_device
>   vfio: Add helper to search vfio_device in a dev_set
>   iommufd: Add iommufd_ctx_has_group()
>   vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device
>     cdev
>   vfio/pci: Allow passing zero-length fd array in
>     VFIO_DEVICE_PCI_HOT_RESET
> 
>  drivers/iommu/iommufd/device.c   |  53 +++++++++
>  drivers/iommu/iommufd/main.c     |   2 +-
>  drivers/vfio/iommufd.c           |  63 ++++++++++-
>  drivers/vfio/pci/vfio_pci_core.c | 184 ++++++++++++++++++++++++-------
>  drivers/vfio/vfio_main.c         |  15 +++
>  include/linux/iommufd.h          |  14 +++
>  include/linux/vfio.h             |  23 ++++
>  include/uapi/linux/vfio.h        |  60 +++++++++-
>  8 files changed, 368 insertions(+), 46 deletions(-)
> 
> --
> 2.34.1

The new uapi works fine.
Tested GVT-g / GVT-d VFIO legacy mode / compat mode / cdev mode, including negative tests. No regression be introduced.  

Tested-by: Terrence Xu <terrence.xu@intel.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices
  2023-05-17 18:20     ` Jason Gunthorpe
@ 2023-05-18 12:23       ` Liu, Yi L
  2023-05-18 19:43         ` Alex Williamson
  0 siblings, 1 reply; 36+ messages in thread
From: Liu, Yi L @ 2023-05-18 12:23 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Tian, Kevin, joro@8bytes.org, robin.murphy@arm.com,
	cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
	kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, May 18, 2023 2:21 AM
> 
> On Wed, May 17, 2023 at 11:26:09AM -0600, Alex Williamson wrote:
> 
> > It's not clear to me why we need a separate iommufd_access for
> > noiommu.
> 
> The point was to allocate an ID for the device so we can use that ID
> with the other interfaces in all cases.

I guess Alex's question is why adding a new pointer named noiommu_access
while there is already the iommufd_access pointer named iommufd_access.

Maybe we shall reuse the iommufd_access pointer?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set
  2023-05-17 19:12   ` Alex Williamson
@ 2023-05-18 12:31     ` Liu, Yi L
  2023-05-18 19:45       ` Alex Williamson
  0 siblings, 1 reply; 36+ messages in thread
From: Liu, Yi L @ 2023-05-18 12:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, May 18, 2023 3:13 AM
> 
> On Sat, 13 May 2023 06:21:33 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > There are drivers that need to search vfio_device within a given dev_set.
> > e.g. vfio-pci. So add a helper.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c |  8 +++-----
> >  drivers/vfio/vfio_main.c         | 15 +++++++++++++++
> >  include/linux/vfio.h             |  3 +++
> >  3 files changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 39e7823088e7..4df2def35bdd 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -2335,12 +2335,10 @@ static bool vfio_dev_in_groups(struct
> vfio_pci_core_device *vdev,
> >  static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> >  {
> >  	struct vfio_device_set *dev_set = data;
> > -	struct vfio_device *cur;
> >
> > -	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > -		if (cur->dev == &pdev->dev)
> > -			return 0;
> > -	return -EBUSY;
> > +	lockdep_assert_held(&dev_set->lock);
> > +
> > +	return vfio_find_device_in_devset(dev_set, &pdev->dev) ? 0 : -EBUSY;
> 
> Maybe an opportunity to revisit why this returns -EBUSY rather than
> something reasonable like -ENODEV.  It looks like we picked up the
> -EBUSY in a882c16a2b7e where I think it was trying to preserve the
> return of vfio_pci_try_zap_and_vma_lock_cb() but the return value here
> is not even propagated so this looks like an chance to have it make
> sense again.  Thanks,

From the name of this function, yes, -ENODEV is better. is it
Ok to modify it to be -ENODEV in this patch or a separate one?

Regards,
Yi Liu

> 
> >  }
> >
> >  /*
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index f0ca33b2e1df..ab4f3a794f78 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -141,6 +141,21 @@ unsigned int vfio_device_set_open_count(struct
> vfio_device_set *dev_set)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
> >
> > +struct vfio_device *
> > +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> > +			   struct device *dev)
> > +{
> > +	struct vfio_device *cur;
> > +
> > +	lockdep_assert_held(&dev_set->lock);
> > +
> > +	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > +		if (cur->dev == dev)
> > +			return cur;
> > +	return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_find_device_in_devset);
> > +
> >  /*
> >   * Device objects - create, release, get, put, search
> >   */
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index fcbe084b18c8..4c17395ed4d2 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -259,6 +259,9 @@ void vfio_unregister_group_dev(struct vfio_device *device);
> >
> >  int vfio_assign_device_set(struct vfio_device *device, void *set_id);
> >  unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
> > +struct vfio_device *
> > +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> > +			   struct device *dev);
> >
> >  int vfio_mig_get_next_state(struct vfio_device *device,
> >  			    enum vfio_device_mig_state cur_fsm,


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group()
  2023-05-17 19:40   ` Alex Williamson
@ 2023-05-18 12:33     ` Liu, Yi L
  0 siblings, 0 replies; 36+ messages in thread
From: Liu, Yi L @ 2023-05-18 12:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, May 18, 2023 3:40 AM
> 
> On Sat, 13 May 2023 06:21:34 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > to check if any device within the given iommu_group has been bound with
> 
> Nit, I find these commit logs where the subject line is intended to
> flow into the commit log to form a complete sentence difficult to read.
> I expect complete thoughts within the commit log itself and the subject
> should be a separate summary of the log.  Repeating the subject within
> the commit log is ok.

Sure. I'll go through the commit messages.

> 
> > the iommufd_ctx. This helpful for the checking on device ownership for
> 
> s/This/This is/
> 
> > the devices which have been bound but cannot be bound to any other iommufd
> 
> s/have been/have not been/?
> 
> > as the iommu_group has been bound.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/device.c | 29 +++++++++++++++++++++++++++++
> >  include/linux/iommufd.h        |  8 ++++++++
> >  2 files changed, 37 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 81466b97023f..5e5f7912807b 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -98,6 +98,35 @@ struct iommufd_device *iommufd_device_bind(struct
> iommufd_ctx *ictx,
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
> >
> > +/**
> > + * iommufd_ctx_has_group - True if the struct device is bound to this ictx
> 
> What struct device?  Isn't this "True if any device within the group is
> bound to the ictx"?

Yes, yes. a poor copy from a prior version..

> 
> > + * @ictx: iommufd file descriptor
> > + * @group: Pointer to a physical iommu_group struct
> > + *
> > + * True if a iommufd_device_bind() is present for any device within the
> > + * group.
> 
> How can a function be present for a device?  Maybe "True if any device
> within the group has been bound to this ictx, ex. via
> iommufd_device_bind(), therefore implying ictx ownership of the group."  Thanks,

Yes, this is the meaning of it. will fix it.

Regards,
Yi Liu

> 
> > + */
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
> > +{
> > +	struct iommufd_object *obj;
> > +	unsigned long index;
> > +
> > +	if (!ictx || !group)
> > +		return false;
> > +
> > +	xa_lock(&ictx->objects);
> > +	xa_for_each(&ictx->objects, index, obj) {
> > +		if (obj->type == IOMMUFD_OBJ_DEVICE &&
> > +		    container_of(obj, struct iommufd_device, obj)->group == group) {
> > +			xa_unlock(&ictx->objects);
> > +			return true;
> > +		}
> > +	}
> > +	xa_unlock(&ictx->objects);
> > +	return false;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > +
> >  /**
> >   * iommufd_device_unbind - Undo iommufd_device_bind()
> >   * @idev: Device returned by iommufd_device_bind()
> > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > index 68cd65274e28..e49c16cd6831 100644
> > --- a/include/linux/iommufd.h
> > +++ b/include/linux/iommufd.h
> > @@ -16,6 +16,7 @@ struct page;
> >  struct iommufd_ctx;
> >  struct iommufd_access;
> >  struct file;
> > +struct iommu_group;
> >
> >  struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> >  					   struct device *dev, u32 *id);
> > @@ -56,6 +57,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
> >  #if IS_ENABLED(CONFIG_IOMMUFD)
> >  struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
> >  void iommufd_ctx_put(struct iommufd_ctx *ictx);
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
> >
> >  int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
> >  			     unsigned long length, struct page **out_pages,
> > @@ -77,6 +79,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
> >  {
> >  }
> >
> > +static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
> > +					 struct iommu_group *group)
> > +{
> > +	return false;
> > +}
> > +
> >  static inline int iommufd_access_pin_pages(struct iommufd_access *access,
> >  					   unsigned long iova,
> >  					   unsigned long length,


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-17 22:01   ` Alex Williamson
@ 2023-05-18 13:21     ` Liu, Yi L
  2023-05-18 13:31       ` Liu, Yi L
  2023-05-18 20:02       ` Alex Williamson
  0 siblings, 2 replies; 36+ messages in thread
From: Liu, Yi L @ 2023-05-18 13:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, May 18, 2023 6:02 AM
> 
> On Sat, 13 May 2023 06:21:35 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx
> 
> s/makes/allows/?
> 
> s/to//
> 
> > of the cdev device to check the ownership of the other affected devices.
> >
> > This returns devid for each of the affected devices. If it is bound to the
> > iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
> > not opened by the calling user, but it belongs to the same iommu_group of
> > a device that is bound to the iommufd_ctx of the cdev device, reports devid
> > value of 0; If the device is un-owned device, configured within a different
> > iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl shall
> > report devid value of -1.
> >
> > devid >=0 doesn't block hot-reset as the affected devices are considered to
> > be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
> > outside of proof-of-ownership calling conventions (ie. via legacy group
> > accessed devices).
> >
> > This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
> > returned in case of calling user get device fd from other software stack
> 
> "other software stack"?  I think this is trying to say something like:
> 
>   When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD
>   managed device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is
>   reported to indicate the values returned are IOMMUFD devids rather
>   than group IDs as used when accessing vfio devices through the
>   conventional vfio group interface.  Additionally the flag
>   VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported in this mode if
>   all of the devices affected by the hot-reset are owned by either
>   virtue of being directly bound to the same iommufd context as the
>   calling device, or implicitly owned via a shared IOMMU group.

Yes. it is.

> > and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
> > the affected devices are owned, so user can know it without looping all
> > the returned devids.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++++++++++++++++--
> >  include/uapi/linux/vfio.h        | 46 +++++++++++++++++++++++++++-
> >  2 files changed, 95 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 4df2def35bdd..57586be770af 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/vgaarb.h>
> >  #include <linux/nospec.h>
> >  #include <linux/sched/mm.h>
> > +#include <linux/iommufd.h>
> >  #if IS_ENABLED(CONFIG_EEH)
> >  #include <asm/eeh.h>
> >  #endif
> > @@ -36,6 +37,10 @@
> >  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> >  #define DRIVER_DESC "core driver for VFIO based PCI devices"
> >
> > +#ifdef CONFIG_IOMMUFD
> > +MODULE_IMPORT_NS(IOMMUFD);
> > +#endif
> > +
> >  static bool nointxmask;
> >  static bool disable_vga;
> >  static bool disable_idle_d3;
> > @@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
> >  	int max;
> >  	int cur;
> >  	struct vfio_pci_dependent_device *devices;
> > +	struct vfio_device *vdev;
> > +	bool devid:1;
> > +	bool dev_owned:1;
> >  };
> >
> >  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > @@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> >  	if (!iommu_group)
> >  		return -EPERM; /* Cannot reset non-isolated devices */
> >
> > -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +	if (fill->devid) {
> > +		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
> > +		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> > +		struct vfio_device *vdev;
> > +
> > +		/*
> > +		 * Report devid for the affected devices:
> > +		 * - valid devid > 0 for the devices that are bound with
> > +		 *   the iommufd of the calling device.
> > +		 * - devid == 0 for the devices that have not been opened
> > +		 *   but have same group with one of the devices bound to
> > +		 *   the iommufd of the calling device.
> > +		 * - devid == -1 for others, and clear dev_owned flag.
> > +		 */
> > +		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> > +		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> > +			int ret;
> > +
> > +			ret = vfio_iommufd_physical_devid(vdev);
> > +			if (WARN_ON(ret < 0))
> > +				return ret;
> > +			fill->devices[fill->cur].devid = ret;
> 
> Nit, @devid seems like a better variable name here rather than @ret.
> 
> > +		} else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) {
> > +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
> > +		} else {
> > +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> > +			fill->dev_owned = false;
> > +		}
> 
> I think we're not describing the requirements for this middle test
> correctly.  We're essentially only stating the iommufd_ctx_has_group()
> part of the requirement, but we're also enforcing a
> vfio_find_device_in_devset() requirement, which means the device is not
> just unopened within a group shared by the iommufd context, but it must
> also still be a device registered as a member of the devset, ie. it
> must be bound to a vfio driver.

Yes. This is to make it be par with hot-reset path. This is needed. Is it?

> 
> It's not a new requirement, it's imposed in the hot-reset ioctl itself,
> but it's new for the info ioctl given that it's now trying to report
> that the user can perform the reset for cdev callers.
>
> This also shares too much logic with vfio_device_owned() added in the
> next patch.  I think it might be cleaner to move the iommu_group_get() to
> the group path below and change vfio_device_owned() to something that
> can be used here and in the reset path.  For example, if we had a
> function like:
> 
> static int vfio_hot_reset_devid(struct vfio_device *vdev,
>                                 struct iommufd_ctx *iommufd_ctx)
> {
>         struct iommu_group *group;
>         int devid;
> 
>         if (!vdev)
>                 return VFIO_PCI_DEVID_NOT_OWNED;
> 
>         if (vfio_iommufd_physical_ictx(vdev) == iommufd_ctx)
>                 return vfio_iommufd_physical_devid(vdev);
> 
>         group = iommu_group_get(vdev->dev);
>         if (!group)
>                 return VFIO_PCI_DEVID_NOT_OWNED;
> 
>         if (iommufd_ctx_has_group(iommufd_ctx, group))
>                 devid = VFIO_PCI_DEVID_OWNED;
> 
>         iommu_group_put(group);
> 
>         return devid;
> }

Thanks. This can be placed in vfio/iommufd.c, hence no need to
Import the IOMMUFD namespace in vfio_pci_core.c.
vfio_iommufd_physical_devid() can ebe static within vfio/iommufd.c

> It could be called above as:
> 
> 	vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> 	fill->devices[fill->cur].devid =
> 			vfio_hot_reset_devid(vdev, iommufd);
> 
> 
> And from vfio_pci_dev_set_hot_reset() as:
> 
> 	bool owned;
> 
> 	if (iommufd_ctx) {
> 		int devid = vfio_hot_reset_devid(&cur_vma->vdev,
> 						 iommufd_ctx);
> 
> 		owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
> 	} else
> 		owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
> 
> Any better?

Above looks good enough.

> 
> > +	} else {
> > +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +	}
> >  	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> >  	fill->devices[fill->cur].bus = pdev->bus->number;
> >  	fill->devices[fill->cur].devfn = pdev->devfn;
> > @@ -1229,17 +1267,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> >  		return -ENOMEM;
> >
> >  	fill.devices = devices;
> > +	fill.vdev = &vdev->vdev;
> >
> > +	mutex_lock(&vdev->vdev.dev_set->lock);
> > +	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
> >  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> >  					    &fill, slot);
> > +	mutex_unlock(&vdev->vdev.dev_set->lock);
> >
> >  	/*
> >  	 * If a device was removed between counting and filling, we may come up
> >  	 * short of fill.max.  If a device was added, we'll have a return of
> >  	 * -EAGAIN above.
> >  	 */
> > -	if (!ret)
> > +	if (!ret) {
> >  		hdr.count = fill.cur;
> > +		if (fill.devid) {
> > +			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
> > +			if (fill.dev_owned)
> > +				hdr.flags |=
> VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> > +		}
> > +	}
> 
> Does this clean up the flag and branching a bit?

Yes. 😊

> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 737115d16a79..6a2a079e452d 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -786,8 +786,7 @@ struct vfio_pci_fill_info {
>  	int cur;
>  	struct vfio_pci_dependent_device *devices;
>  	struct vfio_device *vdev;
> -	bool devid:1;
> -	bool dev_owned:1;
> +	u32 flags;
>  };
> 
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> @@ -802,7 +801,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  	if (!iommu_group)
>  		return -EPERM; /* Cannot reset non-isolated devices */
> 
> -	if (fill->devid) {
> +	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
>  		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
>  		struct vfio_device_set *dev_set = fill->vdev->dev_set;
>  		struct vfio_device *vdev;
> @@ -814,7 +813,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  		 * - devid == 0 for the devices that have not been opened
>  		 *   but have same group with one of the devices bound to
>  		 *   the iommufd of the calling device.
> -		 * - devid == -1 for others, and clear dev_owned flag.
> +		 * - devid == -1 for others, and clear owned flag.
>  		 */
>  		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
>  		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> @@ -828,7 +827,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
>  		} else {
>  			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> -			fill->dev_owned = false;
> +			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
>  		}
>  	} else {
>  		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> @@ -1273,8 +1272,11 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	fill.devices = devices;
>  	fill.vdev = &vdev->vdev;
> 
> +	if (vfio_device_cdev_opened(&vdev->vdev))
> +		fill.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID |
> +			     VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> +
>  	mutex_lock(&vdev->vdev.dev_set->lock);
> -	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
>  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>  					    &fill, slot);
>  	mutex_unlock(&vdev->vdev.dev_set->lock);
> @@ -1286,11 +1288,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	 */
>  	if (!ret) {
>  		hdr.count = fill.cur;
> -		if (fill.devid) {
> -			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
> -			if (fill.dev_owned)
> -				hdr.flags |=
> VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> -		}
> +		hdr.flags = fill.flags;
>  	}
> 
>  reset_info_exit:
> 
> Thanks,
> Alex

Thanks,
Yi Liu

> >
> >  reset_info_exit:
> >  	if (copy_to_user(arg, &hdr, minsz))
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 0552e8dcf0cb..01203215251a 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -650,11 +650,53 @@ enum {
> >   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> >   *					      struct vfio_pci_hot_reset_info)
> >   *
> > + * This command is used to query the affected devices in the hot reset for
> > + * a given device.
> > + *
> > + * This command always reports the segment, bus, and devfn information for
> > + * each affected device, and selectively reports the group_id or devid per
> > + * the way how the calling device is opened.
> > + *
> > + *	- If the calling device is opened via the traditional group/container
> > + *	  API, group_id is reported.  User should check if it has owned all
> > + *	  the affected devices and provides a set of group fds to prove the
> > + *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> > + *
> > + *	- If the calling device is opened as a cdev, devid is reported.
> > + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> > + *	  data type.  For a given affected device, it is considered owned by
> > + *	  this interface if it meets the following conditions:
> > + *	  1) Has a valid devid within the iommufd_ctx of the calling device.
> > + *	     Ownership cannot be determined across separate iommufd_ctx and the
> > + *	     cdev calling conventions do not support a proof-of-ownership model
> > + *	     as provided in the legacy group interface.  In this case a valid
> > + *	     devid with value greater than zero is provided in the return
> > + *	     structure.
> > + *	  2) Does not have a valid devid within the iommufd_ctx of the calling
> > + *	     device, but belongs to the same IOMMU group as the calling device
> > + *	     or another opened device that has a valid devid within the
> > + *	     iommufd_ctx of the calling device.  This provides implicit ownership
> > + *	     for devices within the same DMA isolation context.  In this case
> > + *	     the invalid devid value of zero is provided in the return structure.
> > + *
> > + *	  A devid value of -1 is provided in the return structure for devices
> > + *	  where ownership is not available.  Such devices prevent the use of
> > + *	  VFIO_DEVICE_PCI_HOT_RESET outside of proof-of-ownership calling
> > + *	  conventions (ie. via legacy group accessed devices).
> > + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
> > + *	  affected devices are owned by the user.  This flag is available only
> > + *	  when VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> > + *
> >   * Return: 0 on success, -errno on failure:
> >   *	-enospc = insufficient buffer, -enodev = unsupported for device.
> >   */
> >  struct vfio_pci_dependent_device {
> > -	__u32	group_id;
> > +	union {
> > +		__u32   group_id;
> > +		__u32	devid;
> > +#define VFIO_PCI_DEVID_OWNED		0
> > +#define VFIO_PCI_DEVID_NOT_OWNED	-1
> > +	};
> >  	__u16	segment;
> >  	__u8	bus;
> >  	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
> > @@ -663,6 +705,8 @@ struct vfio_pci_dependent_device {
> >  struct vfio_pci_hot_reset_info {
> >  	__u32	argsz;
> >  	__u32	flags;
> > +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
> > +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
> >  	__u32	count;
> >  	struct vfio_pci_dependent_device	devices[];
> >  };


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-17 18:15   ` Alex Williamson
  2023-05-17 18:22     ` Jason Gunthorpe
@ 2023-05-18 13:25     ` Liu, Yi L
  2023-05-18 19:50       ` Alex Williamson
  2023-05-19  7:57       ` Tian, Kevin
  1 sibling, 2 replies; 36+ messages in thread
From: Liu, Yi L @ 2023-05-18 13:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, May 18, 2023 2:15 AM
> 
> On Sat, 13 May 2023 06:21:32 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This is needed by the vfio-pci driver to report affected devices in the
> > hot reset for a given device.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/device.c | 24 ++++++++++++++++++++++++
> >  drivers/vfio/iommufd.c         | 20 ++++++++++++++++++++
> >  include/linux/iommufd.h        |  6 ++++++
> >  include/linux/vfio.h           | 14 ++++++++++++++
> >  4 files changed, 64 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 4f9b2142274c..81466b97023f 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -116,6 +116,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
> >
> > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> > +{
> > +	return idev->ictx;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> > +
> > +u32 iommufd_device_to_id(struct iommufd_device *idev)
> > +{
> > +	return idev->obj.id;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> > +
> >  static int iommufd_device_setup_msi(struct iommufd_device *idev,
> >  				    struct iommufd_hw_pagetable *hwpt,
> >  				    phys_addr_t sw_msi_start)
> > @@ -463,6 +475,18 @@ void iommufd_access_destroy(struct iommufd_access
> *access)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
> >
> > +struct iommufd_ctx *iommufd_access_to_ictx(struct iommufd_access *access)
> > +{
> > +	return access->ictx;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_ictx, IOMMUFD);
> > +
> > +u32 iommufd_access_to_id(struct iommufd_access *access)
> > +{
> > +	return access->obj.id;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_id, IOMMUFD);
> > +
> >  int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
> >  {
> >  	struct iommufd_ioas *new_ioas;
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index c1379e826052..a18e920be164 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -105,6 +105,26 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> >  		vdev->ops->unbind_iommufd(vdev);
> >  }
> >
> > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > +{
> > +	if (vdev->iommufd_device)
> > +		return iommufd_device_to_ictx(vdev->iommufd_device);
> > +	if (vdev->noiommu_access)
> > +		return iommufd_access_to_ictx(vdev->noiommu_access);
> > +	return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> > +
> > +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > +{
> > +	if (vdev->iommufd_device)
> > +		return iommufd_device_to_id(vdev->iommufd_device);
> > +	if (vdev->noiommu_access)
> > +		return iommufd_access_to_id(vdev->noiommu_access);
> > +	return -EINVAL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> 
> I think these exemplify that it would be better if both emulated and
> noiommu use the same iommufd_access pointer.  Thanks,

Sure. Then I shall rename this helper. vfio_iommufd_device_devid()
What about your opinion?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-18 13:21     ` Liu, Yi L
@ 2023-05-18 13:31       ` Liu, Yi L
  2023-05-18 19:53         ` Alex Williamson
  2023-05-18 20:02       ` Alex Williamson
  1 sibling, 1 reply; 36+ messages in thread
From: Liu, Yi L @ 2023-05-18 13:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 18, 2023 9:22 PM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, May 18, 2023 6:02 AM
> >
> > On Sat, 13 May 2023 06:21:35 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:

> >
> > static int vfio_hot_reset_devid(struct vfio_device *vdev,
> >                                 struct iommufd_ctx *iommufd_ctx)
> > {
> >         struct iommu_group *group;
> >         int devid;
> >
> >         if (!vdev)
> >                 return VFIO_PCI_DEVID_NOT_OWNED;
> >
> >         if (vfio_iommufd_physical_ictx(vdev) == iommufd_ctx)
> >                 return vfio_iommufd_physical_devid(vdev);

Do we need to check the return of this helper? It returns -EINVAL
when iommufd_access and iommufd_device are both null. Though
not possible in this path. Is a WARN_ON needed or not?

Regards,
Yi Liu

> >
> >         group = iommu_group_get(vdev->dev);
> >         if (!group)
> >                 return VFIO_PCI_DEVID_NOT_OWNED;
> >
> >         if (iommufd_ctx_has_group(iommufd_ctx, group))
> >                 devid = VFIO_PCI_DEVID_OWNED;
> >
> >         iommu_group_put(group);
> >
> >         return devid;
> > }

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices
  2023-05-18 12:23       ` Liu, Yi L
@ 2023-05-18 19:43         ` Alex Williamson
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Williamson @ 2023-05-18 19:43 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Jason Gunthorpe, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

On Thu, 18 May 2023 12:23:29 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, May 18, 2023 2:21 AM
> > 
> > On Wed, May 17, 2023 at 11:26:09AM -0600, Alex Williamson wrote:
> >   
> > > It's not clear to me why we need a separate iommufd_access for
> > > noiommu.  
> > 
> > The point was to allocate an ID for the device so we can use that ID
> > with the other interfaces in all cases.  
> 
> I guess Alex's question is why adding a new pointer named noiommu_access
> while there is already the iommufd_access pointer named iommufd_access.

Yes, precisely.  Sorry that was confusing, we need the access for
noiommu but it's not clear why that access needs to be stored in a
separate pointer when we can already differentiate noiommu devices
otherwise.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set
  2023-05-18 12:31     ` Liu, Yi L
@ 2023-05-18 19:45       ` Alex Williamson
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Williamson @ 2023-05-18 19:45 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

On Thu, 18 May 2023 12:31:07 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, May 18, 2023 3:13 AM
> > 
> > On Sat, 13 May 2023 06:21:33 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > There are drivers that need to search vfio_device within a given dev_set.
> > > e.g. vfio-pci. So add a helper.
> > >
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/pci/vfio_pci_core.c |  8 +++-----
> > >  drivers/vfio/vfio_main.c         | 15 +++++++++++++++
> > >  include/linux/vfio.h             |  3 +++
> > >  3 files changed, 21 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > > index 39e7823088e7..4df2def35bdd 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -2335,12 +2335,10 @@ static bool vfio_dev_in_groups(struct  
> > vfio_pci_core_device *vdev,  
> > >  static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> > >  {
> > >  	struct vfio_device_set *dev_set = data;
> > > -	struct vfio_device *cur;
> > >
> > > -	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > > -		if (cur->dev == &pdev->dev)
> > > -			return 0;
> > > -	return -EBUSY;
> > > +	lockdep_assert_held(&dev_set->lock);
> > > +
> > > +	return vfio_find_device_in_devset(dev_set, &pdev->dev) ? 0 : -EBUSY;  
> > 
> > Maybe an opportunity to revisit why this returns -EBUSY rather than
> > something reasonable like -ENODEV.  It looks like we picked up the
> > -EBUSY in a882c16a2b7e where I think it was trying to preserve the
> > return of vfio_pci_try_zap_and_vma_lock_cb() but the return value here
> > is not even propagated so this looks like an chance to have it make
> > sense again.  Thanks,  
> 
> From the name of this function, yes, -ENODEV is better. is it
> Ok to modify it to be -ENODEV in this patch or a separate one?

This patch is fine so long as it's noted in the commit log.  Thanks,

Alex
 
> > >  }
> > >
> > >  /*
> > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > index f0ca33b2e1df..ab4f3a794f78 100644
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -141,6 +141,21 @@ unsigned int vfio_device_set_open_count(struct  
> > vfio_device_set *dev_set)  
> > >  }
> > >  EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
> > >
> > > +struct vfio_device *
> > > +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> > > +			   struct device *dev)
> > > +{
> > > +	struct vfio_device *cur;
> > > +
> > > +	lockdep_assert_held(&dev_set->lock);
> > > +
> > > +	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > > +		if (cur->dev == dev)
> > > +			return cur;
> > > +	return NULL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_find_device_in_devset);
> > > +
> > >  /*
> > >   * Device objects - create, release, get, put, search
> > >   */
> > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > index fcbe084b18c8..4c17395ed4d2 100644
> > > --- a/include/linux/vfio.h
> > > +++ b/include/linux/vfio.h
> > > @@ -259,6 +259,9 @@ void vfio_unregister_group_dev(struct vfio_device *device);
> > >
> > >  int vfio_assign_device_set(struct vfio_device *device, void *set_id);
> > >  unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
> > > +struct vfio_device *
> > > +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> > > +			   struct device *dev);
> > >
> > >  int vfio_mig_get_next_state(struct vfio_device *device,
> > >  			    enum vfio_device_mig_state cur_fsm,  
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-18 13:25     ` Liu, Yi L
@ 2023-05-18 19:50       ` Alex Williamson
  2023-05-19  7:57       ` Tian, Kevin
  1 sibling, 0 replies; 36+ messages in thread
From: Alex Williamson @ 2023-05-18 19:50 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

On Thu, 18 May 2023 13:25:59 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, May 18, 2023 2:15 AM
> > 
> > On Sat, 13 May 2023 06:21:32 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > This is needed by the vfio-pci driver to report affected devices in the
> > > hot reset for a given device.
> > >
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/iommu/iommufd/device.c | 24 ++++++++++++++++++++++++
> > >  drivers/vfio/iommufd.c         | 20 ++++++++++++++++++++
> > >  include/linux/iommufd.h        |  6 ++++++
> > >  include/linux/vfio.h           | 14 ++++++++++++++
> > >  4 files changed, 64 insertions(+)
> > >
> > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > index 4f9b2142274c..81466b97023f 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -116,6 +116,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
> > >
> > > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> > > +{
> > > +	return idev->ictx;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> > > +
> > > +u32 iommufd_device_to_id(struct iommufd_device *idev)
> > > +{
> > > +	return idev->obj.id;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> > > +
> > >  static int iommufd_device_setup_msi(struct iommufd_device *idev,
> > >  				    struct iommufd_hw_pagetable *hwpt,
> > >  				    phys_addr_t sw_msi_start)
> > > @@ -463,6 +475,18 @@ void iommufd_access_destroy(struct iommufd_access  
> > *access)  
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
> > >
> > > +struct iommufd_ctx *iommufd_access_to_ictx(struct iommufd_access *access)
> > > +{
> > > +	return access->ictx;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_ictx, IOMMUFD);
> > > +
> > > +u32 iommufd_access_to_id(struct iommufd_access *access)
> > > +{
> > > +	return access->obj.id;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_id, IOMMUFD);
> > > +
> > >  int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
> > >  {
> > >  	struct iommufd_ioas *new_ioas;
> > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > index c1379e826052..a18e920be164 100644
> > > --- a/drivers/vfio/iommufd.c
> > > +++ b/drivers/vfio/iommufd.c
> > > @@ -105,6 +105,26 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> > >  		vdev->ops->unbind_iommufd(vdev);
> > >  }
> > >
> > > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > > +{
> > > +	if (vdev->iommufd_device)
> > > +		return iommufd_device_to_ictx(vdev->iommufd_device);
> > > +	if (vdev->noiommu_access)
> > > +		return iommufd_access_to_ictx(vdev->noiommu_access);
> > > +	return NULL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> > > +
> > > +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > > +{
> > > +	if (vdev->iommufd_device)
> > > +		return iommufd_device_to_id(vdev->iommufd_device);
> > > +	if (vdev->noiommu_access)
> > > +		return iommufd_access_to_id(vdev->noiommu_access);
> > > +	return -EINVAL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);  
> > 
> > I think these exemplify that it would be better if both emulated and
> > noiommu use the same iommufd_access pointer.  Thanks,  
> 
> Sure. Then I shall rename this helper. vfio_iommufd_device_devid()
> What about your opinion?

Yes, it really didn't even occur to me that "physical" in the name was
meant to suggest this shouldn't be used for emulated mdev devices.  It
should work for all devices and using a shared iommufd access pointer
between noiommu and emulated should simplify that somewhat.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-18 13:31       ` Liu, Yi L
@ 2023-05-18 19:53         ` Alex Williamson
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Williamson @ 2023-05-18 19:53 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

On Thu, 18 May 2023 13:31:47 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 18, 2023 9:22 PM
> >   
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Thursday, May 18, 2023 6:02 AM
> > >
> > > On Sat, 13 May 2023 06:21:35 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:  
> 
> > >
> > > static int vfio_hot_reset_devid(struct vfio_device *vdev,
> > >                                 struct iommufd_ctx *iommufd_ctx)
> > > {
> > >         struct iommu_group *group;
> > >         int devid;
> > >
> > >         if (!vdev)
> > >                 return VFIO_PCI_DEVID_NOT_OWNED;
> > >
> > >         if (vfio_iommufd_physical_ictx(vdev) == iommufd_ctx)
> > >                 return vfio_iommufd_physical_devid(vdev);  
> 
> Do we need to check the return of this helper? It returns -EINVAL
> when iommufd_access and iommufd_device are both null. Though
> not possible in this path. Is a WARN_ON needed or not?

I also came to the conclusion that it wasn't possible while reviewing
the code.  I wouldn't got to extreme measures to introduce paranoia
checks for impossible conditions.  Thanks,

Alex

> > >
> > >         group = iommu_group_get(vdev->dev);
> > >         if (!group)
> > >                 return VFIO_PCI_DEVID_NOT_OWNED;
> > >
> > >         if (iommufd_ctx_has_group(iommufd_ctx, group))
> > >                 devid = VFIO_PCI_DEVID_OWNED;
> > >
> > >         iommu_group_put(group);
> > >
> > >         return devid;
> > > }  


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-05-18 13:21     ` Liu, Yi L
  2023-05-18 13:31       ` Liu, Yi L
@ 2023-05-18 20:02       ` Alex Williamson
  1 sibling, 0 replies; 36+ messages in thread
From: Alex Williamson @ 2023-05-18 20:02 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
	robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

On Thu, 18 May 2023 13:21:57 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, May 18, 2023 6:02 AM
> > 
> > On Sat, 13 May 2023 06:21:35 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx  
> > 
> > s/makes/allows/?
> > 
> > s/to//
> >   
> > > of the cdev device to check the ownership of the other affected devices.
> > >
> > > This returns devid for each of the affected devices. If it is bound to the
> > > iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
> > > not opened by the calling user, but it belongs to the same iommu_group of
> > > a device that is bound to the iommufd_ctx of the cdev device, reports devid
> > > value of 0; If the device is un-owned device, configured within a different
> > > iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl shall
> > > report devid value of -1.
> > >
> > > devid >=0 doesn't block hot-reset as the affected devices are considered to
> > > be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
> > > outside of proof-of-ownership calling conventions (ie. via legacy group
> > > accessed devices).
> > >
> > > This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
> > > returned in case of calling user get device fd from other software stack  
> > 
> > "other software stack"?  I think this is trying to say something like:
> > 
> >   When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD
> >   managed device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is
> >   reported to indicate the values returned are IOMMUFD devids rather
> >   than group IDs as used when accessing vfio devices through the
> >   conventional vfio group interface.  Additionally the flag
> >   VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported in this mode if
> >   all of the devices affected by the hot-reset are owned by either
> >   virtue of being directly bound to the same iommufd context as the
> >   calling device, or implicitly owned via a shared IOMMU group.  
> 
> Yes. it is.
> 
> > > and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
> > > the affected devices are owned, so user can know it without looping all
> > > the returned devids.
> > >
> > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++++++++++++++++--
> > >  include/uapi/linux/vfio.h        | 46 +++++++++++++++++++++++++++-
> > >  2 files changed, 95 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > > index 4df2def35bdd..57586be770af 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -27,6 +27,7 @@
> > >  #include <linux/vgaarb.h>
> > >  #include <linux/nospec.h>
> > >  #include <linux/sched/mm.h>
> > > +#include <linux/iommufd.h>
> > >  #if IS_ENABLED(CONFIG_EEH)
> > >  #include <asm/eeh.h>
> > >  #endif
> > > @@ -36,6 +37,10 @@
> > >  #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> > >  #define DRIVER_DESC "core driver for VFIO based PCI devices"
> > >
> > > +#ifdef CONFIG_IOMMUFD
> > > +MODULE_IMPORT_NS(IOMMUFD);
> > > +#endif
> > > +
> > >  static bool nointxmask;
> > >  static bool disable_vga;
> > >  static bool disable_idle_d3;
> > > @@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
> > >  	int max;
> > >  	int cur;
> > >  	struct vfio_pci_dependent_device *devices;
> > > +	struct vfio_device *vdev;
> > > +	bool devid:1;
> > > +	bool dev_owned:1;
> > >  };
> > >
> > >  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > @@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > >  	if (!iommu_group)
> > >  		return -EPERM; /* Cannot reset non-isolated devices */
> > >
> > > -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > > +	if (fill->devid) {
> > > +		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
> > > +		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> > > +		struct vfio_device *vdev;
> > > +
> > > +		/*
> > > +		 * Report devid for the affected devices:
> > > +		 * - valid devid > 0 for the devices that are bound with
> > > +		 *   the iommufd of the calling device.
> > > +		 * - devid == 0 for the devices that have not been opened
> > > +		 *   but have same group with one of the devices bound to
> > > +		 *   the iommufd of the calling device.
> > > +		 * - devid == -1 for others, and clear dev_owned flag.
> > > +		 */
> > > +		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> > > +		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> > > +			int ret;
> > > +
> > > +			ret = vfio_iommufd_physical_devid(vdev);
> > > +			if (WARN_ON(ret < 0))
> > > +				return ret;
> > > +			fill->devices[fill->cur].devid = ret;  
> > 
> > Nit, @devid seems like a better variable name here rather than @ret.
> >   
> > > +		} else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) {
> > > +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
> > > +		} else {
> > > +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> > > +			fill->dev_owned = false;
> > > +		}  
> > 
> > I think we're not describing the requirements for this middle test
> > correctly.  We're essentially only stating the iommufd_ctx_has_group()
> > part of the requirement, but we're also enforcing a
> > vfio_find_device_in_devset() requirement, which means the device is not
> > just unopened within a group shared by the iommufd context, but it must
> > also still be a device registered as a member of the devset, ie. it
> > must be bound to a vfio driver.  
> 
> Yes. This is to make it be par with hot-reset path. This is needed. Is it?

Yes.  In supporting this null-array model, the kernel is taking on the
responsibility to indicate that the hot-reset is available, which
necessarily includes the devset test.  I think the logic we have is
correct, but the documentation and comments should match the code as
well.

> > It's not a new requirement, it's imposed in the hot-reset ioctl itself,
> > but it's new for the info ioctl given that it's now trying to report
> > that the user can perform the reset for cdev callers.
> >
> > This also shares too much logic with vfio_device_owned() added in the
> > next patch.  I think it might be cleaner to move the iommu_group_get() to
> > the group path below and change vfio_device_owned() to something that
> > can be used here and in the reset path.  For example, if we had a
> > function like:
> > 
> > static int vfio_hot_reset_devid(struct vfio_device *vdev,
> >                                 struct iommufd_ctx *iommufd_ctx)
> > {
> >         struct iommu_group *group;
> >         int devid;
> > 
> >         if (!vdev)
> >                 return VFIO_PCI_DEVID_NOT_OWNED;
> > 
> >         if (vfio_iommufd_physical_ictx(vdev) == iommufd_ctx)
> >                 return vfio_iommufd_physical_devid(vdev);
> > 
> >         group = iommu_group_get(vdev->dev);
> >         if (!group)
> >                 return VFIO_PCI_DEVID_NOT_OWNED;
> > 
> >         if (iommufd_ctx_has_group(iommufd_ctx, group))
> >                 devid = VFIO_PCI_DEVID_OWNED;
> > 
> >         iommu_group_put(group);
> > 
> >         return devid;
> > }  
> 
> Thanks. This can be placed in vfio/iommufd.c, hence no need to
> Import the IOMMUFD namespace in vfio_pci_core.c.
> vfio_iommufd_physical_devid() can ebe static within vfio/iommufd.c

Good, that's an improvement too.  Thanks,

Alex
 
> > It could be called above as:
> > 
> > 	vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> > 	fill->devices[fill->cur].devid =
> > 			vfio_hot_reset_devid(vdev, iommufd);
> > 
> > 
> > And from vfio_pci_dev_set_hot_reset() as:
> > 
> > 	bool owned;
> > 
> > 	if (iommufd_ctx) {
> > 		int devid = vfio_hot_reset_devid(&cur_vma->vdev,
> > 						 iommufd_ctx);
> > 
> > 		owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
> > 	} else
> > 		owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
> > 
> > Any better?  
> 
> Above looks good enough.
> 
> >   
> > > +	} else {
> > > +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > > +	}
> > >  	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> > >  	fill->devices[fill->cur].bus = pdev->bus->number;
> > >  	fill->devices[fill->cur].devfn = pdev->devfn;
> > > @@ -1229,17 +1267,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > >  		return -ENOMEM;
> > >
> > >  	fill.devices = devices;
> > > +	fill.vdev = &vdev->vdev;
> > >
> > > +	mutex_lock(&vdev->vdev.dev_set->lock);
> > > +	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
> > >  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> > >  					    &fill, slot);
> > > +	mutex_unlock(&vdev->vdev.dev_set->lock);
> > >
> > >  	/*
> > >  	 * If a device was removed between counting and filling, we may come up
> > >  	 * short of fill.max.  If a device was added, we'll have a return of
> > >  	 * -EAGAIN above.
> > >  	 */
> > > -	if (!ret)
> > > +	if (!ret) {
> > >  		hdr.count = fill.cur;
> > > +		if (fill.devid) {
> > > +			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
> > > +			if (fill.dev_owned)
> > > +				hdr.flags |=  
> > VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;  
> > > +		}
> > > +	}  
> > 
> > Does this clean up the flag and branching a bit?  
> 
> Yes. 😊
> 
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 737115d16a79..6a2a079e452d 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -786,8 +786,7 @@ struct vfio_pci_fill_info {
> >  	int cur;
> >  	struct vfio_pci_dependent_device *devices;
> >  	struct vfio_device *vdev;
> > -	bool devid:1;
> > -	bool dev_owned:1;
> > +	u32 flags;
> >  };
> > 
> >  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > @@ -802,7 +801,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> >  	if (!iommu_group)
> >  		return -EPERM; /* Cannot reset non-isolated devices */
> > 
> > -	if (fill->devid) {
> > +	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
> >  		struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(fill->vdev);
> >  		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> >  		struct vfio_device *vdev;
> > @@ -814,7 +813,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> >  		 * - devid == 0 for the devices that have not been opened
> >  		 *   but have same group with one of the devices bound to
> >  		 *   the iommufd of the calling device.
> > -		 * - devid == -1 for others, and clear dev_owned flag.
> > +		 * - devid == -1 for others, and clear owned flag.
> >  		 */
> >  		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> >  		if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> > @@ -828,7 +827,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> >  			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
> >  		} else {
> >  			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> > -			fill->dev_owned = false;
> > +			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> >  		}
> >  	} else {
> >  		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > @@ -1273,8 +1272,11 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> >  	fill.devices = devices;
> >  	fill.vdev = &vdev->vdev;
> > 
> > +	if (vfio_device_cdev_opened(&vdev->vdev))
> > +		fill.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID |
> > +			     VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> > +
> >  	mutex_lock(&vdev->vdev.dev_set->lock);
> > -	fill.devid = fill.dev_owned = vfio_device_cdev_opened(&vdev->vdev);
> >  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> >  					    &fill, slot);
> >  	mutex_unlock(&vdev->vdev.dev_set->lock);
> > @@ -1286,11 +1288,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> >  	 */
> >  	if (!ret) {
> >  		hdr.count = fill.cur;
> > -		if (fill.devid) {
> > -			hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
> > -			if (fill.dev_owned)
> > -				hdr.flags |=
> > VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> > -		}
> > +		hdr.flags = fill.flags;
> >  	}
> > 
> >  reset_info_exit:
> > 
> > Thanks,
> > Alex  
> 
> Thanks,
> Yi Liu
> 
> > >
> > >  reset_info_exit:
> > >  	if (copy_to_user(arg, &hdr, minsz))
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 0552e8dcf0cb..01203215251a 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -650,11 +650,53 @@ enum {
> > >   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> > >   *					      struct vfio_pci_hot_reset_info)
> > >   *
> > > + * This command is used to query the affected devices in the hot reset for
> > > + * a given device.
> > > + *
> > > + * This command always reports the segment, bus, and devfn information for
> > > + * each affected device, and selectively reports the group_id or devid per
> > > + * the way how the calling device is opened.
> > > + *
> > > + *	- If the calling device is opened via the traditional group/container
> > > + *	  API, group_id is reported.  User should check if it has owned all
> > > + *	  the affected devices and provides a set of group fds to prove the
> > > + *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> > > + *
> > > + *	- If the calling device is opened as a cdev, devid is reported.
> > > + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> > > + *	  data type.  For a given affected device, it is considered owned by
> > > + *	  this interface if it meets the following conditions:
> > > + *	  1) Has a valid devid within the iommufd_ctx of the calling device.
> > > + *	     Ownership cannot be determined across separate iommufd_ctx and the
> > > + *	     cdev calling conventions do not support a proof-of-ownership model
> > > + *	     as provided in the legacy group interface.  In this case a valid
> > > + *	     devid with value greater than zero is provided in the return
> > > + *	     structure.
> > > + *	  2) Does not have a valid devid within the iommufd_ctx of the calling
> > > + *	     device, but belongs to the same IOMMU group as the calling device
> > > + *	     or another opened device that has a valid devid within the
> > > + *	     iommufd_ctx of the calling device.  This provides implicit ownership
> > > + *	     for devices within the same DMA isolation context.  In this case
> > > + *	     the invalid devid value of zero is provided in the return structure.
> > > + *
> > > + *	  A devid value of -1 is provided in the return structure for devices
> > > + *	  where ownership is not available.  Such devices prevent the use of
> > > + *	  VFIO_DEVICE_PCI_HOT_RESET outside of proof-of-ownership calling
> > > + *	  conventions (ie. via legacy group accessed devices).
> > > + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
> > > + *	  affected devices are owned by the user.  This flag is available only
> > > + *	  when VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> > > + *
> > >   * Return: 0 on success, -errno on failure:
> > >   *	-enospc = insufficient buffer, -enodev = unsupported for device.
> > >   */
> > >  struct vfio_pci_dependent_device {
> > > -	__u32	group_id;
> > > +	union {
> > > +		__u32   group_id;
> > > +		__u32	devid;
> > > +#define VFIO_PCI_DEVID_OWNED		0
> > > +#define VFIO_PCI_DEVID_NOT_OWNED	-1
> > > +	};
> > >  	__u16	segment;
> > >  	__u8	bus;
> > >  	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
> > > @@ -663,6 +705,8 @@ struct vfio_pci_dependent_device {
> > >  struct vfio_pci_hot_reset_info {
> > >  	__u32	argsz;
> > >  	__u32	flags;
> > > +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
> > > +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
> > >  	__u32	count;
> > >  	struct vfio_pci_dependent_device	devices[];
> > >  };  
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-05-18 13:25     ` Liu, Yi L
  2023-05-18 19:50       ` Alex Williamson
@ 2023-05-19  7:57       ` Tian, Kevin
  1 sibling, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2023-05-19  7:57 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: jgg@nvidia.com, joro@8bytes.org, robin.murphy@arm.com,
	cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
	kvm@vger.kernel.org, mjrosato@linux.ibm.com,
	chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
	peterx@redhat.com, jasowang@redhat.com,
	shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
	suravee.suthikulpanit@amd.com,
	intel-gvt-dev@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
	Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
	Duan, Zhenzhong, clegoate@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, May 18, 2023 9:26 PM
> > > +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > > +{
> > > +	if (vdev->iommufd_device)
> > > +		return iommufd_device_to_id(vdev->iommufd_device);
> > > +	if (vdev->noiommu_access)
> > > +		return iommufd_access_to_id(vdev->noiommu_access);
> > > +	return -EINVAL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> >
> > I think these exemplify that it would be better if both emulated and
> > noiommu use the same iommufd_access pointer.  Thanks,
> 
> Sure. Then I shall rename this helper. vfio_iommufd_device_devid()
> What about your opinion?
> 

Probably just vfio_iommufd_device_id().

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-05-19  7:57 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-13 13:21 [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
2023-05-13 13:21 ` [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices Yi Liu
2023-05-17 17:26   ` Alex Williamson
2023-05-17 18:20     ` Jason Gunthorpe
2023-05-18 12:23       ` Liu, Yi L
2023-05-18 19:43         ` Alex Williamson
2023-05-13 13:21 ` [PATCH v5 02/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-05-13 13:21 ` [PATCH v5 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
2023-05-13 13:21 ` [PATCH v5 04/10] vfio: Mark cdev usage in vfio_device Yi Liu
2023-05-13 13:21 ` [PATCH v5 05/10] iommufd: Reserve all negative IDs in the iommufd xarray Yi Liu
2023-05-13 13:21 ` [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
2023-05-17 18:15   ` Alex Williamson
2023-05-17 18:22     ` Jason Gunthorpe
2023-05-17 18:40       ` Alex Williamson
2023-05-17 18:43         ` Jason Gunthorpe
2023-05-18 13:25     ` Liu, Yi L
2023-05-18 19:50       ` Alex Williamson
2023-05-19  7:57       ` Tian, Kevin
2023-05-13 13:21 ` [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set Yi Liu
2023-05-15  7:11   ` Cédric Le Goater
2023-05-17 19:12   ` Alex Williamson
2023-05-18 12:31     ` Liu, Yi L
2023-05-18 19:45       ` Alex Williamson
2023-05-13 13:21 ` [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group() Yi Liu
2023-05-17 19:40   ` Alex Williamson
2023-05-18 12:33     ` Liu, Yi L
2023-05-13 13:21 ` [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev Yi Liu
2023-05-15  7:29   ` Cédric Le Goater
2023-05-15  7:47     ` Liu, Yi L
2023-05-17 22:01   ` Alex Williamson
2023-05-18 13:21     ` Liu, Yi L
2023-05-18 13:31       ` Liu, Yi L
2023-05-18 19:53         ` Alex Williamson
2023-05-18 20:02       ` Alex Williamson
2023-05-13 13:21 ` [PATCH v5 10/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
2023-05-18  5:51 ` [PATCH v5 00/10] Enhance vfio PCI hot reset for vfio cdev device Xu, Terrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox