intel-gfx.lists.freedesktop.org archive mirror
* [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
@ 2023-02-27 11:11 Yi Liu
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 01/19] vfio: Allocate per device file structure Yi Liu
                   ` (22 more replies)
  0 siblings, 23 replies; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

Existing VFIO provides group-centric user APIs. Userspace opens
/dev/vfio/$group_id first, then gets a device fd from the group, and only
then gains access to the device. This is not the desired model for iommufd.
Per the conclusion of the community discussion[1], iommufd provides
device-centric kAPIs and requires its consumers (like VFIO) to provide
device-centric user APIs. Such user APIs are used to associate a device
with an iommufd and with the I/O address spaces managed by that iommufd.

This series first introduces a per device file structure in preparation
for further enhancement, and refactors the kvm-vfio code to prepare for
accepting device files from userspace. It then refactors vfio so that it
can handle iommufd binding. This refactor includes the mechanism for
blocking device access before iommufd bind and for making device_open
exclusive between the group path and the cdev path. Finally, it adds cdev
support for vfio devices and makes the group infrastructure optional, as it
is not needed when vfio device cdev is compiled.
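
The "block device access before iommufd bind" idea can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not the
code from the series: the demo_* names are hypothetical, C11 atomics stand in
for the kernel's smp_store_release()/smp_load_acquire(), and the -EINVAL
return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the per device file state. */
struct demo_device_file {
	atomic_bool access_granted;	/* false until bind succeeds */
	int bound_iommufd;		/* made visible by the release store */
};

static void demo_file_init(struct demo_device_file *df)
{
	atomic_init(&df->access_granted, false);
	df->bound_iommufd = -1;
}

/*
 * Bind path: finish all setup first, then publish the flag with release
 * semantics. The flag only ever goes from false to true. A real
 * implementation would serialize concurrent binders with a lock; this
 * model assumes a single binder.
 */
static int demo_bind(struct demo_device_file *df, int iommufd)
{
	if (iommufd < 0)
		return -EINVAL;
	if (atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;	/* already bound */

	df->bound_iommufd = iommufd;
	atomic_store_explicit(&df->access_granted, true, memory_order_release);
	return 0;
}

/* Access path: refuse any device access until the flag reads true. */
static int demo_access(struct demo_device_file *df)
{
	assert(df != NULL);
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	return df->bound_iommufd;
}
```

The acquire load pairs with the release store, so a caller that observes
access_granted as true also observes the state written before it.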

This is also a prerequisite for iommu nesting for vfio device[2].

The complete code can be found in the branch below. Simple tests were done
with the legacy group path and the cdev path. A draft QEMU branch can be
found at [3].

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
(config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

base-commit: 63777bd2daa3625da6eada88bd9081f047664dad

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/
[3] https://github.com/yiliu1765/qemu/tree/iommufd_rfcv3 (it is based on Eric's
    QEMU iommufd rfcv3 (https://lore.kernel.org/kvm/20230131205305.2726330-1-eric.auger@redhat.com/)
    plus two commits to align with vfio_device_cdev v3/v4/v5)

Change log:

v5:
 - Add r-b from Kevin on patch 08, 13, 14, 15 and 17.
 - Rename patch 02 to limit the change for KVM facing kAPIs. The vfio pci
   hot reset path only accepts group file until patch 09. (Kevin)
 - Update comment around smp_load_acquire(&df->access_granted) (Yan)
 - Adopt Jason's suggestion on the vfio pci hot reset path, passing zero-length
   fd array to indicate using bound iommufd_ctx as ownership check. (Jason, Kevin)
 - Direct read df->access_granted value in vfio_device_cdev_close() (Kevin, Yan, Jason)
 - Wrap the iommufd get/put into a helper to refine the error path of
   vfio_device_ioctl_bind_iommufd(). (Yan)

v4: https://lore.kernel.org/kvm/20230221034812.138051-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 09/10
 - Add a line in devices/vfio.rst to emphasize that the user should add the
   group/device to KVM prior to invoking the open_device op, which may be
   called in the VFIO_GROUP_GET_DEVICE_FD or VFIO_DEVICE_BIND_IOMMUFD ioctl.
 - Modify VFIO_GROUP/VFIO_DEVICE_CDEV Kconfig dependency (Alex)
 - Select VFIO_GROUP for SPAPR (Jason)
 - Check device fully-opened in PCI hotreset path for device fd (Jason)
 - Set df->access_granted in the caller of vfio_device_open() since
   the caller may fail in other operations, but df->access_granted
   does not allow a true to false change. So it should be set only when
   the open path is really done successfully. (Yan, Kevin)
 - Fix missing iommufd_ctx_put() in the cdev path (Yan)
 - Fix an issue found in testing exclusion between group and cdev path.
   vfio_device_cdev_close() should check df->access_granted before heading
   to other operations.
 - Update vfio.rst for iommufd/cdev

v3: https://lore.kernel.org/kvm/20230213151348.56451-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 03, 06, 07, 08.
 - Refine the group and cdev path exclusion. Remove vfio_device:single_open;
   add vfio_group::cdev_device_open_cnt to achieve exclusion between group
   path and cdev path (Kevin, Jason)
 - Fix a bug in the error handling path (Yan Zhao)
 - Address misc remarks from Kevin

v2: https://lore.kernel.org/kvm/20230206090532.95598-1-yi.l.liu@intel.com/
 - Add r-b from Kevin and Eric on patch 01 02 04.
 - Split "kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy()"
   from this series; it has been applied. (Alex, Kevin, Jason, Matthew)
 - Add kvm_ref_lock to protect vfio_device_file->kvm instead of reusing
   dev_set->lock, as a deadlock is observed with vfio-ap, which would try to
   acquire kvm_lock. This is the opposite lock order to kvm_device_release(),
   which holds kvm_lock first and then takes dev_set->lock. (Kevin)
 - Use a separate ioctl for detaching IOAS. (Alex)
 - Rename vfio_device_file::single_open to be is_cdev_device (Kevin, Alex)
 - Move the vfio device cdev code into device_cdev.c and add a VFIO_DEVICE_CDEV
   kconfig for it. (Kevin, Jason)

v1: https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
 - Fix the circular refcount between kvm struct and device file reference. (JasonG)
 - Address comments from KevinT
 - Kept the ioctl for detach; whether it stays depends on Alex's preference
   (https://lore.kernel.org/kvm/BN9PR11MB5276BE9F4B0613EE859317028CFF9@BN9PR11MB5276.namprd11.prod.outlook.com/)

rfc: https://lore.kernel.org/kvm/20221219084718.9342-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Yi Liu (19):
  vfio: Allocate per device file structure
  vfio: Refine vfio file kAPIs for KVM
  vfio: Accept vfio device file in the KVM facing kAPI
  kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
    fd
  kvm/vfio: Accept vfio device file from userspace
  vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  vfio: Block device access via device fd until device is opened
  vfio/pci: Update comment around group_fd get in
    vfio_pci_ioctl_pci_hot_reset()
  vfio/pci: Allow passing zero-length fd array in
    VFIO_DEVICE_PCI_HOT_RESET
  vfio: Add infrastructure for bind_iommufd from userspace
  vfio-iommufd: Add detach_ioas support for physical VFIO devices
  vfio-iommufd: Add detach_ioas for emulated VFIO devices
  vfio: Add cdev_device_open_cnt to vfio_group
  vfio: Make vfio_device_open() single open for device cdev path
  vfio: Add cdev for vfio_device
  vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  vfio: Compile group optionally
  docs: vfio: Add vfio device cdev description

 Documentation/driver-api/vfio.rst             | 133 +++++++-
 Documentation/virt/kvm/devices/vfio.rst       |  52 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c              |   1 +
 drivers/s390/cio/vfio_ccw_ops.c               |   1 +
 drivers/s390/crypto/vfio_ap_ops.c             |   1 +
 drivers/vfio/Kconfig                          |  26 ++
 drivers/vfio/Makefile                         |   3 +-
 drivers/vfio/device_cdev.c                    | 285 ++++++++++++++++++
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |   1 +
 drivers/vfio/group.c                          | 139 ++++++---
 drivers/vfio/iommufd.c                        |  59 +++-
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   2 +
 drivers/vfio/pci/mlx5/main.c                  |   1 +
 drivers/vfio/pci/vfio_pci.c                   |   1 +
 drivers/vfio/pci/vfio_pci_core.c              | 116 +++++--
 drivers/vfio/platform/vfio_amba.c             |   1 +
 drivers/vfio/platform/vfio_platform.c         |   1 +
 drivers/vfio/vfio.h                           | 192 +++++++++++-
 drivers/vfio/vfio_main.c                      | 244 +++++++++++++--
 include/linux/iommufd.h                       |   6 +
 include/linux/vfio.h                          |  40 ++-
 include/uapi/linux/kvm.h                      |  16 +-
 include/uapi/linux/vfio.h                     | 102 +++++++
 virt/kvm/vfio.c                               | 141 ++++-----
 24 files changed, 1348 insertions(+), 216 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

-- 
2.34.1



* [Intel-gfx] [PATCH v5 01/19] vfio: Allocate per device file structure
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:46   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 02/19] vfio: Refine vfio file kAPIs for KVM Yi Liu
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This is preparation for adding vfio device cdev support. vfio device
cdev requires:
1) per device file memory to store the kvm pointer set by KVM. It will
   be propagated to vfio_device::kvm after the device cdev file is bound
   to an iommufd
2) a mechanism to block device access through the device cdev fd before
   it is bound to an iommufd

To address the above requirements, this adds a per device file structure
named vfio_device_file. For now, it is only a wrapper around a struct
vfio_device pointer. Other fields will be added to this per file structure
in future commits.
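
The wrapper pattern described above can be modelled in userspace as follows.
This is a minimal sketch, not the kernel code: all demo_* types and functions
are hypothetical stand-ins, and calloc()/free() replace kzalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vfio_device and struct file. */
struct demo_device {
	int open_count;
};

struct demo_file {
	void *private_data;
};

/* Per file wrapper: one instance per open file, all pointing at one device. */
struct demo_device_file {
	struct demo_device *device;
};

static struct demo_device_file *
demo_allocate_device_file(struct demo_device *device)
{
	struct demo_device_file *df = calloc(1, sizeof(*df));

	if (!df)
		return NULL;
	df->device = device;
	return df;
}

/* Open: the file's private data is the wrapper, not the device itself. */
static int demo_open(struct demo_file *filep, struct demo_device *device)
{
	struct demo_device_file *df = demo_allocate_device_file(device);

	if (!df)
		return -ENOMEM;
	device->open_count++;
	filep->private_data = df;
	return 0;
}

/* Release: reach the device through the wrapper, then free the wrapper. */
static void demo_release(struct demo_file *filep)
{
	struct demo_device_file *df = filep->private_data;

	assert(df != NULL);
	df->device->open_count--;
	free(df);
	filep->private_data = NULL;
}
```

Because each open file owns its own wrapper, per file state can later be
added to the wrapper without touching the shared device structure.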

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/group.c     | 13 +++++++++++--
 drivers/vfio/vfio.h      |  6 ++++++
 drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++-----
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 0e9036e2b9c4..cf51e1a0fd96 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -215,19 +215,26 @@ void vfio_device_group_close(struct vfio_device *device)
 
 static struct file *vfio_device_open_file(struct vfio_device *device)
 {
+	struct vfio_device_file *df;
 	struct file *filep;
 	int ret;
 
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_out;
+	}
+
 	ret = vfio_device_group_open(device);
 	if (ret)
-		goto err_out;
+		goto err_free;
 
 	/*
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
-				   device, O_RDWR);
+				   df, O_RDWR);
 	if (IS_ERR(filep)) {
 		ret = PTR_ERR(filep);
 		goto err_close_device;
@@ -251,6 +258,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 
 err_close_device:
 	vfio_device_group_close(device);
+err_free:
+	kfree(df);
 err_out:
 	return ERR_PTR(ret);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e9721d8424bc..61bbf673e672 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,11 +16,17 @@ struct iommu_group;
 struct vfio_device;
 struct vfio_container;
 
+struct vfio_device_file {
+	struct vfio_device *device;
+};
+
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
 int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
 void vfio_device_close(struct vfio_device *device,
 		       struct iommufd_ctx *iommufd);
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 3a597e799918..d99fa0cec18e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -396,6 +396,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device)
+{
+	struct vfio_device_file *df;
+
+	df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
+	if (!df)
+		return ERR_PTR(-ENOMEM);
+
+	df->device = device;
+
+	return df;
+}
+
 static int vfio_device_first_open(struct vfio_device *device,
 				  struct iommufd_ctx *iommufd)
 {
@@ -509,12 +523,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
  */
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	vfio_device_group_close(device);
 
 	vfio_device_put_registration(device);
 
+	kfree(df);
+
 	return 0;
 }
 
@@ -1079,7 +1096,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 	int ret;
 
 	ret = vfio_device_pm_runtime_get(device);
@@ -1106,7 +1124,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -1118,7 +1137,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 				      const char __user *buf,
 				      size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -1128,7 +1148,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 
 static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
-- 
2.34.1



* [Intel-gfx] [PATCH v5 02/19] vfio: Refine vfio file kAPIs for KVM
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 01/19] vfio: Allocate per device file structure Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:46   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 03/19] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This prepares for making the kAPIs below accept both group files and
device files instead of only vfio group files.

  bool vfio_file_enforced_coherent(struct file *file);
  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);

Besides the above change, vfio_file_is_valid() is added to check whether a
given file is a valid vfio file. It will be extended later to check both
vfio group files and vfio device files.

vfio_file_is_group() is kept for the VFIO PCI hot reset path.
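
The way these kAPIs tell file types apart can be sketched in userspace as
below. This is an illustrative model under assumptions, not the kernel code:
the demo_* names are hypothetical, and a plain struct stands in for struct
file and its file_operations pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; each file type is identified by its ops table. */
struct demo_fops {
	const char *name;
};

static const struct demo_fops demo_group_fops = { "group" };
static const struct demo_fops demo_device_fops = { "device" };

struct demo_group {
	bool coherent;
};

struct demo_file {
	const struct demo_fops *f_op;
	void *private_data;
};

/* Return the group only if the file really is a group file. */
static struct demo_group *demo_group_from_file(struct demo_file *file)
{
	assert(file != NULL);
	if (file->f_op != &demo_group_fops)
		return NULL;
	return file->private_data;
}

/* At this stage of the series only group files count as valid. */
static bool demo_file_is_valid(struct demo_file *file)
{
	return demo_group_from_file(file) != NULL;
}

/* File-level entry point dispatches to the group-level helper. */
static bool demo_file_enforced_coherent(struct demo_file *file)
{
	struct demo_group *group = demo_group_from_file(file);

	if (group)
		return group->coherent;
	return true;	/* mirrors the patch: other files report true */
}
```

Comparing the ops pointer before trusting private_data keeps the cast safe
for files that do not belong to vfio at all.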

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/group.c     | 57 +++++++++++++++-------------------------
 drivers/vfio/vfio.h      |  3 +++
 drivers/vfio/vfio_main.c | 45 +++++++++++++++++++++++++++++++
 include/linux/vfio.h     |  1 +
 virt/kvm/vfio.c          | 10 +++----
 5 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index cf51e1a0fd96..742003e4a796 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -751,6 +751,15 @@ bool vfio_device_has_container(struct vfio_device *device)
 	return device->group->container;
 }
 
+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	struct vfio_group *group = file->private_data;
+
+	if (file->f_op != &vfio_group_fops)
+		return NULL;
+	return group;
+}
+
 /**
  * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
  * @file: VFIO group file
@@ -761,13 +770,13 @@ bool vfio_device_has_container(struct vfio_device *device)
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 	struct iommu_group *iommu_group = NULL;
 
 	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
 		return NULL;
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return NULL;
 
 	mutex_lock(&group->group_lock);
@@ -781,33 +790,20 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
 /**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
+ * vfio_file_is_group - True if the file is a vfio group file
  * @file: VFIO group file
  */
 bool vfio_file_is_group(struct file *file)
 {
-	return file->f_op == &vfio_group_fops;
+	return vfio_group_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_group);
 
-/**
- * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
- *        is always CPU cache coherent
- * @file: VFIO group file
- *
- * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
- * bit in DMA transactions. A return of false indicates that the user has
- * rights to access additional instructions such as wbinvd on x86.
- */
-bool vfio_file_enforced_coherent(struct file *file)
+bool vfio_group_enforced_coherent(struct vfio_group *group)
 {
-	struct vfio_group *group = file->private_data;
 	struct vfio_device *device;
 	bool ret = true;
 
-	if (!vfio_file_is_group(file))
-		return true;
-
 	/*
 	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
 	 * any domain later attached to it will also not support it. If the cap
@@ -825,28 +821,17 @@ bool vfio_file_enforced_coherent(struct file *file)
 	mutex_unlock(&group->device_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
-/**
- * vfio_file_set_kvm - Link a kvm with VFIO drivers
- * @file: VFIO group file
- * @kvm: KVM to link
- *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
- */
-void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return;
-
+	/*
+	 * When a VFIO device is first opened the KVM will be available in
+	 * device->kvm if one was associated with the group.
+	 */
 	spin_lock(&group->kvm_ref_lock);
 	group->kvm = kvm;
 	spin_unlock(&group->kvm_ref_lock);
 }
-EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
 /**
  * vfio_file_has_dev - True if the VFIO file is a handle for device
@@ -857,9 +842,9 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
  */
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return false;
 
 	return group == device->group;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 61bbf673e672..4612cadb6c56 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -90,6 +90,9 @@ void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_enforced_coherent(struct vfio_group *group);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index d99fa0cec18e..42ed3955814f 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1167,6 +1167,51 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+/**
+ * vfio_file_is_valid - True if the file is valid vfio file
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+	return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
+ *        is always CPU cache coherent
+ * @file: VFIO group file or VFIO device file
+ *
+ * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
+ * bit in DMA transactions. A return of false indicates that the user has
+ * rights to access additional instructions such as wbinvd on x86.
+ */
+bool vfio_file_enforced_coherent(struct file *file)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		return vfio_group_enforced_coherent(group);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
+
+/**
+ * vfio_file_set_kvm - Link a kvm with VFIO drivers
+ * @file: VFIO group file or VFIO device file
+ * @kvm: KVM to link
+ *
+ */
+void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		vfio_group_set_kvm(group, kvm);
+}
+EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
+
 /*
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 93134b023968..f2d3d6997ad7 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -246,6 +246,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
 bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 9584eb57e0ed..8bac308ba630 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
 	return ret;
 }
 
-static bool kvm_vfio_file_is_group(struct file *file)
+static bool kvm_vfio_file_is_valid(struct file *file)
 {
 	bool (*fn)(struct file *file);
 	bool ret;
 
-	fn = symbol_get(vfio_file_is_group);
+	fn = symbol_get(vfio_file_is_valid);
 	if (!fn)
 		return false;
 
 	ret = fn(file);
 
-	symbol_put(vfio_file_is_group);
+	symbol_put(vfio_file_is_valid);
 
 	return ret;
 }
@@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	if (!filp)
 		return -EBADF;
 
-	/* Ensure the FD is a vfio group FD.*/
-	if (!kvm_vfio_file_is_group(filp)) {
+	/* Ensure the FD is a vfio FD.*/
+	if (!kvm_vfio_file_is_valid(filp)) {
 		ret = -EINVAL;
 		goto err_fput;
 	}
-- 
2.34.1



* [Intel-gfx] [PATCH v5 03/19] vfio: Accept vfio device file in the KVM facing kAPI
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 01/19] vfio: Allocate per device file structure Yi Liu
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 02/19] vfio: Refine vfio file kAPIs for KVM Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:46   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This makes the vfio file kAPIs accept vfio device files, which is also
preparation for vfio device cdev support.

When kvm is set through a vfio device file, the kvm pointer is stored in
struct vfio_device_file, and kvm_ref_lock protects both setting the kvm
pointer and its usage within VFIO. This kvm pointer will be propagated to
vfio_device after the device file is bound to iommufd in the cdev path.
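
The two-step handling of the kvm pointer can be modelled in userspace as
follows. This is a sketch under assumptions, not the code from the series:
the demo_* names are hypothetical, a C11 atomic_flag spinlock stands in for
the kernel spinlock, and demo_bind_propagate_kvm() only guesses at how the
later bind step might copy the pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kvm and struct vfio_device. */
struct demo_kvm {
	int id;
};

struct demo_device {
	struct demo_kvm *kvm;
};

struct demo_device_file {
	struct demo_device *device;
	atomic_flag kvm_ref_lock;	/* protects the kvm field */
	struct demo_kvm *kvm;
};

static void demo_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void demo_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static void demo_device_file_init(struct demo_device_file *df,
				  struct demo_device *device)
{
	df->device = device;
	df->kvm = NULL;
	atomic_flag_clear(&df->kvm_ref_lock);
}

/* Step 1: KVM records its pointer in the per file structure only. */
static void demo_device_file_set_kvm(struct demo_device_file *df,
				     struct demo_kvm *kvm)
{
	demo_lock(&df->kvm_ref_lock);
	df->kvm = kvm;
	demo_unlock(&df->kvm_ref_lock);
}

/* Step 2 (assumed shape): on a successful bind, copy it to the device. */
static void demo_bind_propagate_kvm(struct demo_device_file *df)
{
	assert(df->device != NULL);
	demo_lock(&df->kvm_ref_lock);
	df->device->kvm = df->kvm;
	demo_unlock(&df->kvm_ref_lock);
}
```

Keeping the pointer in the per file structure first means the device itself
is untouched until the bind step has actually succeeded.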

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 42 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 4612cadb6c56..59ca8b3d7563 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	spinlock_t kvm_ref_lock; /* protect kvm field */
+	struct kvm *kvm;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 42ed3955814f..9941db787891 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -406,6 +406,7 @@ vfio_allocate_device_file(struct vfio_device *device)
 		return ERR_PTR(-ENOMEM);
 
 	df->device = device;
+	spin_lock_init(&df->kvm_ref_lock);
 
 	return df;
 }
@@ -1167,13 +1168,23 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	if (file->f_op != &vfio_device_fops)
+		return NULL;
+	return df->device;
+}
+
 /**
  * vfio_file_is_valid - True if the file is valid vfio file
  * @file: VFIO group file or VFIO device file
  */
 bool vfio_file_is_valid(struct file *file)
 {
-	return vfio_group_from_file(file);
+	return vfio_group_from_file(file) ||
+	       vfio_device_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
@@ -1188,15 +1199,36 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
  */
 bool vfio_file_enforced_coherent(struct file *file)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
+	struct vfio_device *device;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_enforced_coherent(group);
 
+	device = vfio_device_from_file(file);
+	if (device)
+		return device_iommu_capable(device->dev,
+					    IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
+static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	/*
+	 * The kvm is first recorded in the vfio_device_file, and will
+	 * be propagated to vfio_device::kvm when the file is bound to
+	 * iommufd successfully in the vfio device cdev path.
+	 */
+	spin_lock(&df->kvm_ref_lock);
+	df->kvm = kvm;
+	spin_unlock(&df->kvm_ref_lock);
+}
+
 /**
  * vfio_file_set_kvm - Link a kvm with VFIO drivers
  * @file: VFIO group file or VFIO device file
@@ -1205,10 +1237,14 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
  */
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		vfio_group_set_kvm(group, kvm);
+
+	if (vfio_device_from_file(file))
+		vfio_device_file_set_kvm(file, kvm);
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
-- 
2.34.1



* [Intel-gfx] [PATCH v5 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (2 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 03/19] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:47   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

Rename kvm_vfio_group to kvm_vfio_file to prepare for accepting vfio device
fds, and rename the related helpers accordingly. No functional change is
intended.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 8bac308ba630..857d6ba349e1 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -21,7 +21,7 @@
 #include <asm/kvm_ppc.h>
 #endif
 
-struct kvm_vfio_group {
+struct kvm_vfio_file {
 	struct list_head node;
 	struct file *file;
 #ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -30,7 +30,7 @@ struct kvm_vfio_group {
 };
 
 struct kvm_vfio {
-	struct list_head group_list;
+	struct list_head file_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 }
 
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
-					     struct kvm_vfio_group *kvg)
+					     struct kvm_vfio_file *kvf)
 {
-	if (WARN_ON_ONCE(!kvg->iommu_group))
+	if (WARN_ON_ONCE(!kvf->iommu_group))
 		return;
 
-	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
-	iommu_group_put(kvg->iommu_group);
-	kvg->iommu_group = NULL;
+	kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
+	iommu_group_put(kvf->iommu_group);
+	kvf->iommu_group = NULL;
 }
 #endif
 
 /*
- * Groups can use the same or different IOMMU domains.  If the same then
- * adding a new group may change the coherency of groups we've previously
- * been told about.  We don't want to care about any of that so we retest
- * each group and bail as soon as we find one that's noncoherent.  This
- * means we only ever [un]register_noncoherent_dma once for the whole device.
+ * Groups/devices can use the same or different IOMMU domains. If the same
+ * then adding a new group/device may change the coherency of groups/devices
+ * we've previously been told about. We don't want to care about any of
+ * that so we retest each group/device and bail as soon as we find one that's
+ * noncoherent.  This means we only ever [un]register_noncoherent_dma once
+ * for the whole device.
  */
 static void kvm_vfio_update_coherency(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
 	bool noncoherent = false;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
 			noncoherent = true;
 			break;
 		}
@@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_unlock(&kv->lock);
 }
 
-static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct file *filp;
 	int ret;
 
@@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file == filp) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file == filp) {
 			ret = -EEXIST;
 			goto err_unlock;
 		}
 	}
 
-	kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
-	if (!kvg) {
+	kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
+	if (!kvf) {
 		ret = -ENOMEM;
 		goto err_unlock;
 	}
 
-	kvg->file = filp;
-	list_add_tail(&kvg->node, &kv->group_list);
+	kvf->file = filp;
+	list_add_tail(&kvf->node, &kv->file_list);
 
 	kvm_arch_start_assignment(dev->kvm);
 
 	mutex_unlock(&kv->lock);
 
-	kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
+	kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
 	kvm_vfio_update_coherency(dev);
 
 	return 0;
@@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	return ret;
 }
 
-static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		list_del(&kvg->node);
+		list_del(&kvf->node);
 		kvm_arch_end_assignment(dev->kvm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		kfree(kvf);
 		ret = 0;
 		break;
 	}
@@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 }
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
-					void __user *arg)
+static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
+				       void __user *arg)
 {
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		if (!kvg->iommu_group) {
-			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
-			if (WARN_ON_ONCE(!kvg->iommu_group)) {
+		if (!kvf->iommu_group) {
+			kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
+			if (WARN_ON_ONCE(!kvf->iommu_group)) {
 				ret = -EIO;
 				goto err_fdput;
 			}
 		}
 
 		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
-						       kvg->iommu_group);
+						       kvf->iommu_group);
 		break;
 	}
 
@@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 }
 #endif
 
-static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
-			      void __user *arg)
+static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
+			     void __user *arg)
 {
 	int32_t __user *argp = arg;
 	int32_t fd;
@@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
 	case KVM_DEV_VFIO_GROUP_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_add(dev, fd);
+		return kvm_vfio_file_add(dev, fd);
 
 	case KVM_DEV_VFIO_GROUP_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_del(dev, fd);
+		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
-		return kvm_vfio_group_set_spapr_tce(dev, arg);
+		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
 
@@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
-		return kvm_vfio_set_group(dev, attr->attr,
-					  u64_to_user_ptr(attr->addr));
+		return kvm_vfio_set_file(dev, attr->attr,
+					 u64_to_user_ptr(attr->addr));
 	}
 
 	return -ENXIO;
@@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 static void kvm_vfio_release(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg, *tmp;
+	struct kvm_vfio_file *kvf, *tmp;
 
-	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+	list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		list_del(&kvg->node);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		list_del(&kvf->node);
+		kfree(kvf);
 		kvm_arch_end_assignment(dev->kvm);
 	}
 
@@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	if (!kv)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->file_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
2.34.1



* [Intel-gfx] [PATCH v5 05/19] kvm/vfio: Accept vfio device file from userspace
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (3 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:47   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This defines KVM_DEV_VFIO_FILE* and makes the KVM_DEV_VFIO_GROUP* names
aliases of them. Old userspace that uses KVM_DEV_VFIO_GROUP* keeps working.
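
As a rough illustration of the userspace side (not part of this patch), the
sketch below fills a struct kvm_device_attr for KVM_DEV_VFIO_FILE_ADD and
hands it to KVM_SET_DEVICE_ATTR. The demo_* helpers are hypothetical names.
The #ifndef fallback assumes uapi headers that predate this series and maps
the new names onto the existing GROUP values, which the patch keeps identical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumption: older headers only know the GROUP names; the values match. */
#ifndef KVM_DEV_VFIO_FILE
#define KVM_DEV_VFIO_FILE	KVM_DEV_VFIO_GROUP
#define KVM_DEV_VFIO_FILE_ADD	KVM_DEV_VFIO_GROUP_ADD
#define KVM_DEV_VFIO_FILE_DEL	KVM_DEV_VFIO_GROUP_DEL
#endif

/* Hypothetical helper: describe an ADD/DEL request for one VFIO file fd. */
static void demo_fill_vfio_file_attr(struct kvm_device_attr *attr,
				     uint32_t op, const int32_t *vfio_fd)
{
	memset(attr, 0, sizeof(*attr));
	attr->group = KVM_DEV_VFIO_FILE;
	attr->attr = op;
	/* addr points to an int32_t file descriptor, per the documentation */
	attr->addr = (uint64_t)(uintptr_t)vfio_fd;
}

/*
 * Hypothetical helper: register a VFIO group or device fd with the
 * KVM VFIO pseudo device. Returns the raw ioctl() result.
 */
static int demo_kvm_vfio_file_add(int kvm_vfio_dev_fd, int32_t vfio_fd)
{
	struct kvm_device_attr attr;

	demo_fill_vfio_file_attr(&attr, KVM_DEV_VFIO_FILE_ADD, &vfio_fd);
	return ioctl(kvm_vfio_dev_fd, KVM_SET_DEVICE_ATTR, &attr);
}
```

Because the GROUP names alias the FILE names, a caller built against the old
names issues exactly the same request.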

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/virt/kvm/devices/vfio.rst | 52 +++++++++++++++++--------
 include/uapi/linux/kvm.h                | 16 ++++++--
 virt/kvm/vfio.c                         | 16 ++++----
 3 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 79b6811bb4f3..5b05b48abaab 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,24 +9,37 @@ Device types supported:
   - KVM_DEV_TYPE_VFIO
 
 Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM.  As groups/devices are enabled and disabled for use by the
+VM, KVM should be updated about their presence.  When registered with
+KVM, a reference to the VFIO file is held by KVM.
 
 Groups:
-  KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
-  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
+  KVM_DEV_VFIO_FILE
+	alias: KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_FILE attributes:
+  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+	tracking
+
+	alias: KVM_DEV_VFIO_GROUP_ADD
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
+	device tracking
+
+	alias: KVM_DEV_VFIO_GROUP_DEL
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
 	allocated by sPAPR KVM.
+
+	alias: KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE
+
 	kvm_device_attr.addr points to a struct::
 
 		struct kvm_vfio_spapr_tce {
@@ -40,9 +53,14 @@ KVM_DEV_VFIO_GROUP attributes:
 	- @tablefd is a file descriptor for a TCE table allocated via
 	  KVM_CREATE_SPAPR_TCE.
 
+	only accepts vfio group file as SPAPR has no iommufd support
+
 ::
 
-The GROUP_ADD operation above should be invoked prior to accessing the
+The FILE/GROUP_ADD operation above should be invoked prior to accessing the
 device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
 drivers which require a kvm pointer to be set in their .open_device()
-callback.
+callback.  The same applies to device file descriptors obtained by opening
+the character device, which gain device access via VFIO_DEVICE_BIND_IOMMUFD.
+For such file descriptors, FILE_ADD should be invoked before
+VFIO_DEVICE_BIND_IOMMUFD to support the drivers mentioned above as well.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 55155e262646..484a8133bc69 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1401,10 +1401,18 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
-#define  KVM_DEV_VFIO_GROUP			1
-#define   KVM_DEV_VFIO_GROUP_ADD			1
-#define   KVM_DEV_VFIO_GROUP_DEL			2
-#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
+#define  KVM_DEV_VFIO_FILE	1
+
+#define   KVM_DEV_VFIO_FILE_ADD			1
+#define   KVM_DEV_VFIO_FILE_DEL			2
+#define   KVM_DEV_VFIO_FILE_SET_SPAPR_TCE	3
+
+/* KVM_DEV_VFIO_GROUP aliases are for compile time uapi compatibility */
+#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
+
+#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
+#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 857d6ba349e1..d869913baafd 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
 	int32_t fd;
 
 	switch (attr) {
-	case KVM_DEV_VFIO_GROUP_ADD:
+	case KVM_DEV_VFIO_FILE_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_add(dev, fd);
 
-	case KVM_DEV_VFIO_GROUP_DEL:
+	case KVM_DEV_VFIO_FILE_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+	case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		return kvm_vfio_set_file(dev, attr->attr,
 					 u64_to_user_ptr(attr->addr));
 	}
@@ -321,12 +321,12 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		switch (attr->attr) {
-		case KVM_DEV_VFIO_GROUP_ADD:
-		case KVM_DEV_VFIO_GROUP_DEL:
+		case KVM_DEV_VFIO_FILE_ADD:
+		case KVM_DEV_VFIO_FILE_DEL:
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+		case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 #endif
 			return 0;
 		}
-- 
2.34.1



* [Intel-gfx] [PATCH v5 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (4 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:47   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened Yi Liu
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This avoids passing too many parameters through multiple functions.
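
A simplified userspace model of the resulting calling convention may help
when reading the diff: the iommufd pointer travels inside the per-file
structure instead of being a second argument. Everything named demo_* is
hypothetical, the fail_first_open knob only exists to exercise the error
path, and the locking done by the real code is omitted.

```c
#include <errno.h>
#include <stddef.h>

struct demo_device {
	int open_count;
	int fail_first_open;	/* hypothetical knob: make the first open fail */
};

struct demo_device_file {
	struct demo_device *device;
	const void *iommufd;	/* stands in for struct iommufd_ctx * */
};

/* Models vfio_device_open(): only the first opener does real work. */
static int demo_device_open(struct demo_device_file *df)
{
	struct demo_device *device = df->device;
	int ret = 0;

	device->open_count++;
	if (device->open_count == 1) {
		ret = device->fail_first_open ? -ENODEV : 0;
		if (ret)
			device->open_count--;
	}
	return ret;
}

/* Models vfio_device_group_open(): stash the iommufd, undo on failure. */
static int demo_device_group_open(struct demo_device_file *df,
				  const void *group_iommufd)
{
	int ret;

	df->iommufd = group_iommufd;
	ret = demo_device_open(df);
	if (ret)
		df->iommufd = NULL;
	return ret;
}
```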

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/group.c     | 19 +++++++++++++------
 drivers/vfio/vfio.h      |  8 ++++----
 drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
 3 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 742003e4a796..960b1bcb606b 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -166,8 +166,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
 	spin_unlock(&device->group->kvm_ref_lock);
 }
 
-static int vfio_device_group_open(struct vfio_device *device)
+static int vfio_device_group_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -187,7 +188,11 @@ static int vfio_device_group_open(struct vfio_device *device)
 	if (device->open_count == 0)
 		vfio_device_group_get_kvm_safe(device);
 
-	ret = vfio_device_open(device, device->group->iommufd);
+	df->iommufd = device->group->iommufd;
+
+	ret = vfio_device_open(df);
+	if (ret)
+		df->iommufd = NULL;
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -199,12 +204,14 @@ static int vfio_device_group_open(struct vfio_device *device)
 	return ret;
 }
 
-void vfio_device_group_close(struct vfio_device *device)
+void vfio_device_group_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->group->group_lock);
 	mutex_lock(&device->dev_set->lock);
 
-	vfio_device_close(device, device->group->iommufd);
+	vfio_device_close(df);
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -225,7 +232,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
-	ret = vfio_device_group_open(device);
+	ret = vfio_device_group_open(df);
 	if (ret)
 		goto err_free;
 
@@ -257,7 +264,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	return filep;
 
 err_close_device:
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 err_free:
 	kfree(df);
 err_out:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 59ca8b3d7563..7c1ea870d8f3 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -20,13 +20,13 @@ struct vfio_device_file {
 	struct vfio_device *device;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
+	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd);
+int vfio_device_open(struct vfio_device_file *df);
+void vfio_device_close(struct vfio_device_file *df);
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
 
@@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
-void vfio_device_group_close(struct vfio_device *device);
+void vfio_device_group_close(struct vfio_device_file *df);
 struct vfio_group *vfio_group_from_file(struct file *file);
 bool vfio_group_enforced_coherent(struct vfio_group *group);
 void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 9941db787891..609700748082 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -411,9 +411,10 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device *device,
-				  struct iommufd_ctx *iommufd)
+static int vfio_device_first_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
 	int ret;
 
 	lockdep_assert_held(&device->dev_set->lock);
@@ -445,9 +446,11 @@ static int vfio_device_first_open(struct vfio_device *device,
 	return ret;
 }
 
-static void vfio_device_last_close(struct vfio_device *device,
-				   struct iommufd_ctx *iommufd)
+static void vfio_device_last_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	if (device->ops->close_device)
@@ -459,15 +462,16 @@ static void vfio_device_last_close(struct vfio_device *device,
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
+int vfio_device_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret = 0;
 
 	lockdep_assert_held(&device->dev_set->lock);
 
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(device, iommufd);
+		ret = vfio_device_first_open(df);
 		if (ret)
 			device->open_count--;
 	}
@@ -475,14 +479,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
 	return ret;
 }
 
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd)
+void vfio_device_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	vfio_assert_device_open(device);
 	if (device->open_count == 1)
-		vfio_device_last_close(device, iommufd);
+		vfio_device_last_close(df);
 	device->open_count--;
 }
 
@@ -527,7 +532,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
-- 
2.34.1



* [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (5 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:48   ` Jason Gunthorpe
  2023-03-01  9:22   ` Liu, Yi L
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
                   ` (15 subsequent siblings)
  22 siblings, 2 replies; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

Allow the vfio_device file to be in a state where the device FD is
opened but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This in-between state is not used when the device
FD is spawned from the group FD, however when we create the device FD
directly by opening a cdev it will be opened in the blocked state.

The reason for the in-between state is that userspace only gets an FD but
doesn't gain access permission until it binds the FD to an iommufd. So in
the blocked state, only the bind operation is allowed. Completing the bind
allows the user to access the device.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value and serialize all the device setup with the thread accessing
this device.

This lockless scheme safely handles the device FD going unbound->bound,
but it cannot handle bound->unbound. Allowing that would require a lock
on all the vfio ioctls, which seems costly. So once the
device FD is bound, it remains bound until the FD is closed.
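
The release/acquire pairing described above can be sketched in userspace
with C11 atomics, used here as an analogue of the kernel's
smp_store_release() and smp_load_acquire(). The demo_* names and the
device_state field are hypothetical. The point is only that a reader that
observes the flag as true is also guaranteed to observe the setup done
before the flag was published.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_device_file {
	int device_state;		/* stands in for state set up by open */
	atomic_bool access_granted;	/* false means "blocked" */
};

static void demo_file_init(struct demo_device_file *df)
{
	df->device_state = 0;
	atomic_init(&df->access_granted, false);
}

/* Writer: finish all setup, then publish the flag with release ordering. */
static void demo_grant_access(struct demo_device_file *df)
{
	df->device_state = 42;
	atomic_store_explicit(&df->access_granted, true,
			      memory_order_release);
}

/* Reader: an acquire load that sees true also sees device_state == 42. */
static int demo_check_access(struct demo_device_file *df, int *state)
{
	if (!atomic_load_explicit(&df->access_granted,
				  memory_order_acquire))
		return -EINVAL;

	*state = df->device_state;
	return 0;
}
```

The flag only ever goes from false to true in this model, which mirrors the
unbound->bound restriction: clearing it again would need real locking
against readers that have already passed the check.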

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/group.c     |  6 ++++++
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 16 ++++++++++++++++
 3 files changed, 23 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 960b1bcb606b..d8771d585cb1 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -197,6 +197,12 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
 
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
+
 	mutex_unlock(&device->dev_set->lock);
 
 out_unlock:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 7c1ea870d8f3..2e3cb284711d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 609700748082..d16ac573e290 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1106,6 +1106,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	/* Paired with smp_store_release() in vfio_device_group_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	ret = vfio_device_pm_runtime_get(device);
 	if (ret)
 		return ret;
@@ -1133,6 +1137,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() in vfio_device_group_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
 
@@ -1146,6 +1154,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() in vfio_device_group_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
 
@@ -1157,6 +1169,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() in vfio_device_group_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
 
-- 
2.34.1



* [Intel-gfx] [PATCH v5 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (6 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:48   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This better matches what the code does.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a6492a25ff6a..1bf54beeaef2 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,9 +1308,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	/*
-	 * For each group_fd, get the group through the vfio external user
-	 * interface and store the group and iommu ID.  This ensures the group
-	 * is held across the reset.
+	 * Get the group file for each fd to ensure the group held across
+	 * the reset
 	 */
 	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
-- 
2.34.1



* [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (7 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:22   ` Jason Gunthorpe
  2023-03-02  6:07   ` Liu, Yi L
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace Yi Liu
                   ` (13 subsequent siblings)
  22 siblings, 2 replies; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

A zero-length fd array tells the kernel to use the device's bound
iommufd_ctx for the device ownership check. The kernel should loop over
all the opened devices in the dev_set and check that they are bound to the
same iommufd_ctx. Devices that are affected but have not been opened yet
can be reset by the current user, as they cannot be opened by any other
user. This applies to the existing group/container path as well.
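
The ownership rule described above can be modelled as a plain loop. This is
a userspace sketch with hypothetical demo_* names, not the helper this patch
adds: unopened devices are skipped, and every opened device must be bound to
the caller's iommufd_ctx.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_device {
	int open_count;			/* 0 means not opened by anyone */
	const void *iommufd_ctx;	/* context the device is bound to */
};

/*
 * Returns true if a reset affecting all devices in the set is acceptable
 * for the owner of iommufd_ctx.
 */
static bool demo_dev_set_owned_by(const struct demo_device *devs, size_t n,
				  const void *iommufd_ctx)
{
	for (size_t i = 0; i < n; i++) {
		/* Unopened devices cannot be in use by another user. */
		if (devs[i].open_count == 0)
			continue;
		/* An opened device bound elsewhere blocks the reset. */
		if (devs[i].iommufd_ctx != iommufd_ctx)
			return false;
	}
	return true;
}
```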

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 111 +++++++++++++++++++++++--------
 drivers/vfio/vfio.h              |  11 +++
 include/uapi/linux/vfio.h        |  16 +++++
 3 files changed, 109 insertions(+), 29 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1bf54beeaef2..e0ebe55b4df0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -27,11 +27,13 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
+#include <linux/iommufd.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
 
 #include "vfio_pci_priv.h"
+#include "../vfio.h"
 
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC "core driver for VFIO based PCI devices"
@@ -180,7 +182,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 struct vfio_pci_group_info;
 static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups);
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1255,29 +1258,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	return ret;
 }
 
-static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
-					struct vfio_pci_hot_reset __user *arg)
+static int
+vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
+				    struct vfio_pci_hot_reset *hdr,
+				    bool slot,
+				    struct vfio_pci_hot_reset __user *arg)
 {
-	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
-	struct vfio_pci_hot_reset hdr;
 	int32_t *group_fds;
 	struct file **files;
 	struct vfio_pci_group_info info;
-	bool slot = false;
 	int file_idx, count = 0, ret = 0;
 
-	if (copy_from_user(&hdr, arg, minsz))
-		return -EFAULT;
-
-	if (hdr.argsz < minsz || hdr.flags)
-		return -EINVAL;
-
-	/* Can we do a slot or bus reset or neither? */
-	if (!pci_probe_reset_slot(vdev->pdev->slot))
-		slot = true;
-	else if (pci_probe_reset_bus(vdev->pdev->bus))
-		return -ENODEV;
-
 	/*
 	 * We can't let userspace give us an arbitrarily large buffer to copy,
 	 * so verify how many we think there could be.  Note groups can have
@@ -1289,11 +1280,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		return ret;
 
 	/* Somewhere between 1 and count is OK */
-	if (!hdr.count || hdr.count > count)
+	if (hdr->count > count)
 		return -EINVAL;
 
-	group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
-	files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
+	group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
+	files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
 	if (!group_fds || !files) {
 		kfree(group_fds);
 		kfree(files);
@@ -1301,7 +1292,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	if (copy_from_user(group_fds, arg->group_fds,
-			   hdr.count * sizeof(*group_fds))) {
+			   hdr->count * sizeof(*group_fds))) {
 		kfree(group_fds);
 		kfree(files);
 		return -EFAULT;
@@ -1311,7 +1302,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	 * Get the group file for each fd to ensure the group held across
 	 * the reset
 	 */
-	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
+	for (file_idx = 0; file_idx < hdr->count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
 
 		if (!file) {
@@ -1335,10 +1326,10 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	if (ret)
 		goto hot_reset_release;
 
-	info.count = hdr.count;
+	info.count = hdr->count;
 	info.files = files;
 
-	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
+	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
 
 hot_reset_release:
 	for (file_idx--; file_idx >= 0; file_idx--)
@@ -1348,6 +1339,36 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
+					struct vfio_pci_hot_reset __user *arg)
+{
+	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
+	struct vfio_pci_hot_reset hdr;
+	struct iommufd_ctx *iommufd;
+	bool slot = false;
+
+	if (copy_from_user(&hdr, arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags)
+		return -EINVAL;
+
+	/* Can we do a slot or bus reset or neither? */
+	if (!pci_probe_reset_slot(vdev->pdev->slot))
+		slot = true;
+	else if (pci_probe_reset_bus(vdev->pdev->bus))
+		return -ENODEV;
+
+	if (!hdr.count)
+		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+
+	iommufd = vfio_device_iommufd(&vdev->vdev);
+	if (!iommufd)
+		return -EINVAL;
+
+	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
+}
+
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
 				    struct vfio_device_ioeventfd __user *arg)
 {
@@ -2317,6 +2338,9 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
 {
 	unsigned int i;
 
+	if (!groups)
+		return false;
+
 	for (i = 0; i < groups->count; i++)
 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
 			return true;
@@ -2392,13 +2416,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
 	return ret;
 }
 
+static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
+				    struct iommufd_ctx *iommufd_ctx)
+{
+	struct iommufd_ctx *iommufd = vfio_device_iommufd(&vdev->vdev);
+
+	if (!iommufd)
+		return false;
+
+	return iommufd == iommufd_ctx;
+}
+
 /*
  * We need to get memory_lock for each device, but devices can share mmap_lock,
  * therefore we need to zap and hold the vma_lock for each device, and only then
  * get each memory_lock.
  */
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups)
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx)
 {
 	struct vfio_pci_core_device *cur_mem;
 	struct vfio_pci_core_device *cur_vma;
@@ -2429,10 +2465,27 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 
 	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
 		/*
-		 * Test whether all the affected devices are contained by the
-		 * set of groups provided by the user.
+		 * Test whether all the affected devices can be reset by the
+		 * user.  The affected devices may or may not have been opened
+		 * yet.
+		 *
+		 * For the devices not opened yet, user can reset them. The
+		 * reason is that the hot reset is done under the protection
+		 * of the dev_set->lock, and device open is also under this
+		 * lock.  During the hot reset, such devices can not be opened
+		 * by other users.
+		 *
+		 * For the devices that have been opened, we need to check the
+		 * ownership.  If the user provides a set of group fds, the
+		 * ownership check is done by checking if all the opened
+		 * devices are contained by the groups.  If the user provides
+		 * a zero-length fd array, the ownership check is done by
+		 * checking if all the opened devices are bound to the same
+		 * iommufd_ctx.
 		 */
-		if (!vfio_dev_in_groups(cur_vma, groups)) {
+		if (cur_vma->vdev.open_count &&
+		    !vfio_dev_in_groups(cur_vma, groups) &&
+		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx)) {
 			ret = -EINVAL;
 			goto err_undo;
 		}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 2e3cb284711d..64e862a02dad 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -225,6 +225,11 @@ static inline void vfio_container_cleanup(void)
 #if IS_ENABLED(CONFIG_IOMMUFD)
 int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
 void vfio_iommufd_unbind(struct vfio_device *device);
+static inline struct iommufd_ctx *
+vfio_device_iommufd(struct vfio_device *device)
+{
+	return device->iommufd_ictx;
+}
 #else
 static inline int vfio_iommufd_bind(struct vfio_device *device,
 				    struct iommufd_ctx *ictx)
@@ -235,6 +240,12 @@ static inline int vfio_iommufd_bind(struct vfio_device *device,
 static inline void vfio_iommufd_unbind(struct vfio_device *device)
 {
 }
+
+static inline struct iommufd_ctx *
+vfio_device_iommufd(struct vfio_device *device)
+{
+	return NULL;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..4bf11ee8de53 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -673,6 +673,22 @@ struct vfio_pci_hot_reset_info {
  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
  *				    struct vfio_pci_hot_reset)
  *
+ * Userspace requests hot reset for the devices it uses.  Due to the
+ * underlying topology, multiple devices may be affected in the reset.
+ * The affected devices may have been opened by the user or by other
+ * users or not opened yet.  Only when all the affected devices are
+ * either opened by the current user or not opened by any user, should
+ * the reset request be allowed.  Otherwise, this request is expected
+ * to return an error.
+ *
+ * If the user uses the group and container interface, it should pass
+ * down a set of group fds for the ownership check.  If the user uses
+ * iommufd, it should pass down a zero-length group_fds array to tell the
+ * kernel to use the bound iommufd for the ownership check.  A user of the
+ * vfio iommufd compatibility mode can also pass down a zero-length
+ * group_fds array, as this mode uses iommufd in the kernel and there is
+ * no reason to forbid it.
+ *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_pci_hot_reset {
-- 
2.34.1
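
The ownership rule described in the comments of the patch above can be
modelled in plain userspace C. This is only a sketch, not the kernel code:
struct model_dev, model_hot_reset_allowed() and the integer group ids are
hypothetical stand-ins for vfio_pci_core_device,
vfio_pci_dev_set_hot_reset() and the group files, and all locking is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one device in the dev_set. */
struct model_dev {
	unsigned int open_count;	/* 0 means not opened by anyone */
	int group_id;			/* group the device belongs to */
	const void *iommufd;		/* bound iommufd context, or NULL */
};

/* Mirrors vfio_dev_in_groups(): a NULL group list never matches. */
static bool model_dev_in_groups(const struct model_dev *dev,
				const int *groups, size_t ngroups)
{
	size_t i;

	if (!groups)
		return false;

	for (i = 0; i < ngroups; i++)
		if (groups[i] == dev->group_id)
			return true;
	return false;
}

/* Mirrors vfio_dev_in_iommufd_ctx(): an unbound device never matches. */
static bool model_dev_in_iommufd_ctx(const struct model_dev *dev,
				     const void *iommufd_ctx)
{
	if (!dev->iommufd)
		return false;

	return dev->iommufd == iommufd_ctx;
}

/* Returns 0 if every opened device is owned by the caller, else -EINVAL. */
static int model_hot_reset_allowed(const struct model_dev *devs, size_t ndevs,
				   const int *groups, size_t ngroups,
				   const void *iommufd_ctx)
{
	size_t i;

	assert(devs || !ndevs);	/* precondition of this sketch only */

	for (i = 0; i < ndevs; i++) {
		/* Unopened devices pass; opened ones need an owner match. */
		if (devs[i].open_count &&
		    !model_dev_in_groups(&devs[i], groups, ngroups) &&
		    !model_dev_in_iommufd_ctx(&devs[i], iommufd_ctx))
			return -EINVAL;
	}
	return 0;
}
```

In this model the group path passes a group list and a NULL context, and
the zero-length fd array path passes a NULL group list and the caller's
context, matching the two call sites in the patch.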



* [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (8 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:29   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

For a device fd opened from cdev, userspace needs to bind it to an
iommufd and attach it to an IOAS managed by that iommufd. With these
operations, userspace can set up a secure DMA context and hence access
the device.

This changes the existing vfio_iommufd_bind() to accept a pt_id pointer
as an optional input, and also a dev_id pointer to optionally return the
dev_id, in preparation for adding the bind_iommufd ioctl, where the bind
is done first and the IOAS is attached afterwards.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/group.c     | 17 ++++++++++++++---
 drivers/vfio/iommufd.c   | 21 +++++++++------------
 drivers/vfio/vfio.h      |  9 ++++++---
 drivers/vfio/vfio_main.c | 10 ++++++----
 4 files changed, 35 insertions(+), 22 deletions(-)
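
The optional-pointer convention described in the commit message can be
sketched in userspace C as follows. This is an illustration under
assumptions, not the kernel implementation: struct model_dev,
model_attach_ioas(), model_iommufd_bind() and the fixed device id 42 are
hypothetical, and the real bind and attach go through the driver ops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vfio_device bound through iommufd. */
struct model_dev {
	const void *ictx;	/* bound context, NULL when unbound */
	uint32_t device_id;	/* id handed out at bind time */
	uint32_t attached_pt;	/* 0 means not attached */
};

/* Stand-in for ops->attach_ioas(); this sketch rejects the id 0. */
static int model_attach_ioas(struct model_dev *dev, uint32_t *pt_id)
{
	if (*pt_id == 0)
		return -EINVAL;
	dev->attached_pt = *pt_id;
	return 0;
}

/*
 * Bind first; attach only when the caller supplies pt_id; report the
 * device id only when the caller supplies dev_id.
 */
static int model_iommufd_bind(struct model_dev *dev, const void *ictx,
			      uint32_t *dev_id, uint32_t *pt_id)
{
	int ret;

	assert(dev && ictx);	/* precondition of this sketch only */

	dev->ictx = ictx;
	dev->device_id = 42;	/* arbitrary id chosen for the sketch */

	if (pt_id) {
		ret = model_attach_ioas(dev, pt_id);
		if (ret)
			goto err_unbind;
	}

	if (dev_id)
		*dev_id = dev->device_id;
	return 0;

err_unbind:
	/* Undo the bind so a failed attach leaves the device unbound. */
	dev->ictx = NULL;
	dev->device_id = 0;
	return ret;
}
```

In this model the group path corresponds to passing a NULL dev_id and a
pt_id that holds the compat IOAS id, as vfio_device_group_open() does in
the diff below. A caller that only wants the bind passes a NULL pt_id.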

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index d8771d585cb1..e44232551448 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -169,6 +169,7 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
 static int vfio_device_group_open(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
+	u32 ioas_id;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -177,6 +178,13 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 		goto out_unlock;
 	}
 
+	if (device->group->iommufd) {
+		ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
+						  &ioas_id);
+		if (ret)
+			goto out_unlock;
+	}
+
 	mutex_lock(&device->dev_set->lock);
 
 	/*
@@ -188,9 +196,12 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	if (device->open_count == 0)
 		vfio_device_group_get_kvm_safe(device);
 
-	df->iommufd = device->group->iommufd;
-
-	ret = vfio_device_open(df);
+	if (device->group->iommufd) {
+		df->iommufd = device->group->iommufd;
+		ret = vfio_device_open(df, NULL, &ioas_id);
+	} else {
+		ret = vfio_device_open(df, NULL, NULL);
+	}
 	if (ret)
 		df->iommufd = NULL;
 
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 4f82a6fa7c6c..beef6ca21107 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,9 +10,9 @@
 MODULE_IMPORT_NS(IOMMUFD);
 MODULE_IMPORT_NS(IOMMUFD_VFIO);
 
-int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id)
 {
-	u32 ioas_id;
 	u32 device_id;
 	int ret;
 
@@ -29,17 +29,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 	if (ret)
 		return ret;
 
-	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
-	if (ret)
-		goto err_unbind;
-	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
-	if (ret)
-		goto err_unbind;
+	if (pt_id) {
+		ret = vdev->ops->attach_ioas(vdev, pt_id);
+		if (ret)
+			goto err_unbind;
+	}
 
-	/*
-	 * The legacy path has no way to return the device id or the selected
-	 * pt_id
-	 */
+	if (dev_id)
+		*dev_id = device_id;
 	return 0;
 
 err_unbind:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 64e862a02dad..04d2bd2e314d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -26,7 +26,8 @@ struct vfio_device_file {
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device_file *df);
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id);
 void vfio_device_close(struct vfio_device_file *df);
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
@@ -223,7 +224,8 @@ static inline void vfio_container_cleanup(void)
 #endif
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
-int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
+int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id);
 void vfio_iommufd_unbind(struct vfio_device *device);
 static inline struct iommufd_ctx *
 vfio_device_iommufd(struct vfio_device *device)
@@ -232,7 +234,8 @@ vfio_device_iommufd(struct vfio_device *device)
 }
 #else
 static inline int vfio_iommufd_bind(struct vfio_device *device,
-				    struct iommufd_ctx *ictx)
+				    struct iommufd_ctx *ictx,
+				    u32 *dev_id, u32 *pt_id)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index d16ac573e290..19ea65a87072 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -411,7 +411,8 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device_file *df)
+static int vfio_device_first_open(struct vfio_device_file *df,
+				  u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 	struct iommufd_ctx *iommufd = df->iommufd;
@@ -423,7 +424,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 		return -ENODEV;
 
 	if (iommufd)
-		ret = vfio_iommufd_bind(device, iommufd);
+		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
 	else
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
@@ -462,7 +463,8 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device_file *df)
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 	int ret = 0;
@@ -471,7 +473,7 @@ int vfio_device_open(struct vfio_device_file *df)
 
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(df);
+		ret = vfio_device_first_open(df, dev_id, pt_id);
 		if (ret)
 			device->open_count--;
 	}
-- 
2.34.1



* [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (9 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:44   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 12/19] vfio-iommufd: Add detach_ioas for emulated " Yi Liu
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This prepares for adding the DETACH ioctl for physical VFIO devices.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 Documentation/driver-api/vfio.rst             |  8 +++++---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  1 +
 drivers/vfio/iommufd.c                        | 20 +++++++++++++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  2 ++
 drivers/vfio/pci/mlx5/main.c                  |  1 +
 drivers/vfio/pci/vfio_pci.c                   |  1 +
 drivers/vfio/platform/vfio_amba.c             |  1 +
 drivers/vfio/platform/vfio_platform.c         |  1 +
 drivers/vfio/vfio_main.c                      |  3 ++-
 include/linux/vfio.h                          |  8 +++++++-
 10 files changed, 41 insertions(+), 5 deletions(-)
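
The attach and detach state checks added to drivers/vfio/iommufd.c in this
patch can be modelled as a small state machine. The sketch below is a
userspace approximation under assumptions: struct model_dev and the
model_* functions are hypothetical, the boolean lock_held stands in for
dev_set->lock, and assert() stands in for lockdep_assert_held().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the iommufd state of a physical device. */
struct model_dev {
	bool lock_held;		/* stands in for dev_set->lock */
	bool bound;		/* stands in for vdev->iommufd_device != NULL */
	bool attached;		/* stands in for vdev->iommufd_attached */
	uint32_t pt_id;		/* page table the device is attached to */
};

static int model_attach_ioas(struct model_dev *dev, const uint32_t *pt_id)
{
	assert(dev->lock_held);	/* models lockdep_assert_held() */

	if (!dev->bound)
		return -EINVAL;	/* attach requires a prior bind */
	if (dev->attached)
		return -EBUSY;	/* only one attachment at a time */

	dev->pt_id = *pt_id;
	dev->attached = true;
	return 0;
}

static void model_detach_ioas(struct model_dev *dev)
{
	assert(dev->lock_held);	/* models lockdep_assert_held() */

	/* Detach is a no-op when there is nothing to undo. */
	if (!dev->bound || !dev->attached)
		return;

	dev->attached = false;
}
```

Making detach a silent no-op when the device is unbound or unattached
means a caller can invoke it unconditionally on teardown, while attach
reports misuse through distinct error codes.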

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 50b690f7f663..44527420f20d 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -279,6 +279,7 @@ similar to a file operations structure::
 					struct iommufd_ctx *ictx, u32 *out_device_id);
 		void	(*unbind_iommufd)(struct vfio_device *vdev);
 		int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+		void	(*detach_ioas)(struct vfio_device *vdev);
 		int	(*open_device)(struct vfio_device *vdev);
 		void	(*close_device)(struct vfio_device *vdev);
 		ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -315,9 +316,10 @@ container_of().
 	- The [un]bind_iommufd callbacks are issued when the device is bound to
 	  and unbound from iommufd.
 
-	- The attach_ioas callback is issued when the device is attached to an
-	  IOAS managed by the bound iommufd. The attached IOAS is automatically
-	  detached when the device is unbound from iommufd.
+	- The attach_ioas and detach_ioas callbacks are issued when the device
+	  is attached to and detached from an IOAS managed by the bound
+	  iommufd. The attached IOAS can also be automatically detached when
+	  the device is unbound from iommufd.
 
 	- The read/write/mmap callbacks implement the device region access defined
 	  by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index c89a047a4cd8..d540cf683d93 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct fsl_mc_driver vfio_fsl_mc_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index beef6ca21107..bfaa9876499b 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -88,6 +88,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
 	int rc;
 
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_device)
+		return -EINVAL;
+
+	if (vdev->iommufd_attached)
+		return -EBUSY;
+
 	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
@@ -96,6 +104,18 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_device || !vdev->iommufd_attached)
+		return;
+
+	iommufd_device_detach(vdev->iommufd_device);
+	vdev->iommufd_attached = false;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
+
 /*
  * The emulated standard ops mean that vfio_device is going to use the
  * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index a117eaf21c14..b2f9778c8366 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1373,6 +1373,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -1391,6 +1392,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index e897537a9e8a..6fc3410989eb 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -1326,6 +1326,7 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int mlx5vf_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 29091ee2e984..cb5b7f865d58 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -141,6 +141,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index 83fe54015595..6464b3939ebc 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -119,6 +119,7 @@ static const struct vfio_device_ops vfio_amba_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct amba_id pl330_ids[] = {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 22a1efca32a8..8cf22fa65baa 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -108,6 +108,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct platform_driver vfio_platform_driver = {
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 19ea65a87072..a51c0a0e8a1a 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -250,7 +250,8 @@ static int __vfio_register_dev(struct vfio_device *device,
 
 	if (WARN_ON(device->ops->bind_iommufd &&
 		    (!device->ops->unbind_iommufd ||
-		     !device->ops->attach_ioas)))
+		     !device->ops->attach_ioas ||
+		     !device->ops->detach_ioas)))
 		return -EINVAL;
 
 	/*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index f2d3d6997ad7..9815a8c4ac7c 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -73,7 +73,9 @@ struct vfio_device {
  * @bind_iommufd: Called when binding the device to an iommufd
  * @unbind_iommufd: Opposite of bind_iommufd
  * @attach_ioas: Called when attaching device to an IOAS/HWPT managed by the
- *		 bound iommufd. Undo in unbind_iommufd.
+ *		 bound iommufd. Undo in unbind_iommufd if @detach_ioas is not
+ *		 called
+ * @detach_ioas: Opposite of attach_ioas
  * @open_device: Called when the first file descriptor is opened for this device
  * @close_device: Opposite of open_device
  * @read: Perform read(2) on device file descriptor
@@ -97,6 +99,7 @@ struct vfio_device_ops {
 				struct iommufd_ctx *ictx, u32 *out_device_id);
 	void	(*unbind_iommufd)(struct vfio_device *vdev);
 	int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+	void	(*detach_ioas)(struct vfio_device *vdev);
 	int	(*open_device)(struct vfio_device *vdev);
 	void	(*close_device)(struct vfio_device *vdev);
 	ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -118,6 +121,7 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
 int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev);
 int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
@@ -130,6 +134,8 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_physical_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_physical_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
-- 
2.34.1



* [Intel-gfx] [PATCH v5 12/19] vfio-iommufd: Add detach_ioas for emulated VFIO devices
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (10 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:45   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This prepares for adding the DETACH ioctl for emulated VFIO devices.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
 drivers/s390/cio/vfio_ccw_ops.c   |  1 +
 drivers/s390/crypto/vfio_ap_ops.c |  1 +
 drivers/vfio/iommufd.c            | 18 ++++++++++++++++++
 include/linux/vfio.h              |  3 +++
 5 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 8ae7039b3683..8a76a84bc3c1 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1474,6 +1474,7 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static int intel_vgpu_probe(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 5b53b94f13c7..cba4971618ff 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -632,6 +632,7 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 struct mdev_driver vfio_ccw_mdev_driver = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9c01957e56b3..f99c69d40982 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1802,6 +1802,7 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver vfio_ap_matrix_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index bfaa9876499b..faf2516b0f06 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -165,6 +165,12 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (!vdev->iommufd_ictx)
+		return -EINVAL;
+
+	if (vdev->iommufd_access)
+		return -EBUSY;
+
 	user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
 				     vdev);
 	if (IS_ERR(user))
@@ -173,3 +179,15 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
+
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_ictx || !vdev->iommufd_access)
+		return;
+
+	iommufd_access_destroy(vdev->iommufd_access);
+	vdev->iommufd_access = NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_detach_ioas);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 9815a8c4ac7c..c9b91c57de07 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -126,6 +126,7 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev);
 #else
 #define vfio_iommufd_physical_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
@@ -143,6 +144,8 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_emulated_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #endif
 
 /**
-- 
2.34.1



* [Intel-gfx] [PATCH v5 13/19] vfio: Add cdev_device_open_cnt to vfio_group
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (11 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 12/19] vfio-iommufd: Add detach_ioas for emulated " Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 19:20   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

Add cdev_device_open_cnt to vfio_group to count the devices that are
opened via the cdev path. The cdev path increments and decrements this
count, and the group path checks it to achieve exclusion with the cdev
path. With this, only one path (group path or cdev path) will claim DMA
ownership. This avoids scenarios in which devices within the same group
are opened via different paths.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/group.c | 33 +++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h  |  3 +++
 2 files changed, 36 insertions(+)
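
The mutual exclusion between the group path and the cdev path can be
sketched in userspace C as below. The names struct model_group and the
model_* functions are hypothetical, group_lock is omitted, and the
single-open check on opened_file in model_group_open() is an assumption
about the existing group open path rather than something this patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant vfio_group state. */
struct model_group {
	bool opened_file;			/* group fd is currently open */
	unsigned int cdev_device_open_cnt;	/* devices opened via cdev */
};

/* Models vfio_device_block_group(): the cdev path claims the group. */
static int model_block_group(struct model_group *g)
{
	if (g->opened_file)
		return -EBUSY;	/* group path already in use */

	g->cdev_device_open_cnt++;
	return 0;
}

/* Models vfio_device_unblock_group(): the cdev path releases its claim. */
static void model_unblock_group(struct model_group *g)
{
	assert(g->cdev_device_open_cnt > 0);	/* sketch-only sanity check */
	g->cdev_device_open_cnt--;
}

/* Models the check added to vfio_group_fops_open(). */
static int model_group_open(struct model_group *g)
{
	if (g->cdev_device_open_cnt)
		return -EBUSY;	/* cdev path already in use */
	if (g->opened_file)
		return -EBUSY;	/* assumed: the group fd is single open */

	g->opened_file = true;
	return 0;
}
```

Each path checks the state owned by the other before claiming the group,
so whichever path arrives first wins and the other sees -EBUSY until the
first one releases its claim.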

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index e44232551448..d4d78d63db06 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -387,6 +387,33 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 	}
 }
 
+int vfio_device_block_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+	int ret = 0;
+
+	mutex_lock(&group->group_lock);
+	if (group->opened_file) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	group->cdev_device_open_cnt++;
+
+out_unlock:
+	mutex_unlock(&group->group_lock);
+	return ret;
+}
+
+void vfio_device_unblock_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+
+	mutex_lock(&group->group_lock);
+	group->cdev_device_open_cnt--;
+	mutex_unlock(&group->group_lock);
+}
+
 static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 {
 	struct vfio_group *group =
@@ -409,6 +436,11 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 		goto out_unlock;
 	}
 
+	if (group->cdev_device_open_cnt) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
 	 * Do we need multiple instances of the group open?  Seems not.
 	 */
@@ -483,6 +515,7 @@ static void vfio_group_release(struct device *dev)
 	mutex_destroy(&group->device_lock);
 	mutex_destroy(&group->group_lock);
 	WARN_ON(group->iommu_group);
+	WARN_ON(group->cdev_device_open_cnt);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 04d2bd2e314d..a61d4df30716 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -84,8 +84,11 @@ struct vfio_group {
 	struct blocking_notifier_head	notifier;
 	struct iommufd_ctx		*iommufd;
 	spinlock_t			kvm_ref_lock;
+	unsigned int			cdev_device_open_cnt;
 };
 
+int vfio_device_block_group(struct vfio_device *device);
+void vfio_device_unblock_group(struct vfio_device *device);
 int vfio_device_set_group(struct vfio_device *device,
 			  enum vfio_group_type type);
 void vfio_device_remove_group(struct vfio_device *device);
-- 
2.34.1



* [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (12 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:52   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device Yi Liu
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD which is itself only single open.

However, there is no known use of multiple device FDs today. It is a
strange thing to do because new device FDs can naturally be created
via dup().

When we implement the new device uAPI (only used in the cdev path) there
is no natural way to allow the device itself to be multi-opened in a
secure manner. Without the group FD we cannot prove the security context
of the opener.

Thus, when moving to the new uAPI we block the ability to multi-open
the device. The old group path still allows it.

vfio_device_open() needs to support both the legacy behavior, i.e. multi-open
in the group path, and the new behavior, i.e. single-open in the cdev path.
To tell the two apart, a new is_cdev_device flag is introduced in struct
vfio_device_file.
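
A minimal userspace sketch of the resulting open rule follows. It only
models the open_count check added to vfio_device_open(); struct
device_model and the model_*() names are hypothetical, and the first-open
and last-close work done by the real function is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the open_count field of struct vfio_device. */
struct device_model {
	unsigned int open_count;
};

/*
 * The group path may open the device any number of times. The cdev path
 * is only allowed to be the first and only opener.
 */
int model_device_open(struct device_model *dev, bool is_cdev_device)
{
	if (dev->open_count != 0 && is_cdev_device)
		return -EINVAL;
	dev->open_count++;
	return 0;
}

void model_device_close(struct device_model *dev)
{
	assert(dev->open_count > 0);
	dev->open_count--;
}
```

Note that the check alone does not stop a group open from following a
cdev open. That case is covered by the group blocking in patch 13.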

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 10 +++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a61d4df30716..54a48c8596f7 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	bool is_cdev_device;
+
 	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a51c0a0e8a1a..5b485d3bb67e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -472,6 +472,13 @@ int vfio_device_open(struct vfio_device_file *df,
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/*
+	 * The device cdev path cannot support multiple opens of a device
+	 * since there is no secure way to do so.
+	 */
+	if (device->open_count != 0 && df->is_cdev_device)
+		return -EINVAL;
+
 	device->open_count++;
 	if (device->open_count == 1) {
 		ret = vfio_device_first_open(df, dev_id, pt_id);
@@ -535,7 +542,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(df);
+	if (!df->is_cdev_device)
+		vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
-- 
2.34.1



* [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (13 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:55   ` Jason Gunthorpe
  2023-02-27 19:06   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
                   ` (7 subsequent siblings)
  22 siblings, 2 replies; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This allows the user to directly open a vfio device without using the
legacy container/group interface, as a prerequisite for supporting new
iommu features like nested translation.

The device fd opened in this manner cannot access the device, because the
fops open() does not open the device until a successful BIND_IOMMUFD,
which will be added in the next patch.

With this patch, devices registered to the vfio core have both group and
device interfaces created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX  (X is the minor number and
					      is unique across devices)

Given a vfio device, the user can identify the matching vfioX by checking
the sysfs path of the device. Taking PCI device 0000:6a:01.0 as an example,
/sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
major:minor of the matching vfioX.

Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
that the major:minor matches.
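
The check described above could look roughly like the following on the
userspace side. This is a sketch under assumptions: parse_devt() and
fd_matches_sysfs_dev() are hypothetical helper names, and the caller is
expected to pass the sysfs "dev" file and an fd obtained from opening
/dev/vfio/devices/vfioX.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Parse the "major:minor" text found in a sysfs "dev" attribute. */
int parse_devt(const char *text, dev_t *out)
{
	unsigned int maj, min;

	assert(text && out);
	if (sscanf(text, "%u:%u", &maj, &min) != 2)
		return -1;
	*out = makedev(maj, min);
	return 0;
}

/*
 * Return 1 if fd refers to a character device whose number matches the
 * sysfs "dev" file, 0 if it does not, and -1 on any error.
 */
int fd_matches_sysfs_dev(int fd, const char *sysfs_dev_path)
{
	char buf[32];
	struct stat st;
	dev_t expected;
	FILE *f = fopen(sysfs_dev_path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (parse_devt(buf, &expected) || fstat(fd, &st))
		return -1;
	return S_ISCHR(st.st_mode) && st.st_rdev == expected;
}
```

Comparing st_rdev after the open, rather than trusting the path name,
confirms that the fd really is the device named in sysfs.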

The vfio_device cdev logic in this patch:
*) __vfio_register_dev() path ends up doing cdev_device_add() for each
   vfio_device;
*) vfio_unregister_group_dev() path does cdev_device_del();

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/Kconfig       | 12 ++++++++
 drivers/vfio/Makefile      |  1 +
 drivers/vfio/device_cdev.c | 63 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 46 ++++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c   | 22 ++++++++++---
 include/linux/vfio.h       |  4 +++
 6 files changed, 144 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index a8f544629467..169762316513 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,6 +12,18 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_DEVICE_CDEV
+	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
+	depends on IOMMUFD && (X86 || S390 || ARM || ARM64)
+	help
+	  The VFIO device cdev is another way for userspace to get device
+	  access. Userspace gets device fd by opening device cdev under
+	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
+	  to set up secure context for device access. It is not available for
+	  SPAPR_TCE_IOMMU.
+
+	  If you don't know what to do here, say N.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 70e7dcb302ef..245394aeb94b 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_VFIO) += vfio.o
 vfio-y += vfio_main.o \
 	  group.o \
 	  iova_bitmap.o
+vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
new file mode 100644
index 000000000000..9e2c1ecaaf4f
--- /dev/null
+++ b/drivers/vfio/device_cdev.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Intel Corporation.
+ */
+#include <linux/vfio.h>
+
+#include "vfio.h"
+
+static dev_t device_devt;
+
+void vfio_init_device_cdev(struct vfio_device *device)
+{
+	device->device.devt = MKDEV(MAJOR(device_devt), device->index);
+	cdev_init(&device->cdev, &vfio_device_fops);
+	device->cdev.owner = THIS_MODULE;
+}
+
+/*
+ * device access via the fd opened by this function is blocked until
+ * .open_device() is called successfully during BIND_IOMMUFD.
+ */
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_device *device = container_of(inode->i_cdev,
+						  struct vfio_device, cdev);
+	struct vfio_device_file *df;
+	int ret;
+
+	if (!vfio_device_try_get_registration(device))
+		return -ENODEV;
+
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_put_registration;
+	}
+
+	df->is_cdev_device = true;
+	filep->private_data = df;
+
+	return 0;
+
+err_put_registration:
+	vfio_device_put_registration(device);
+	return ret;
+}
+
+static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
+}
+
+int vfio_cdev_init(struct class *device_class)
+{
+	device_class->devnode = vfio_device_devnode;
+	return alloc_chrdev_region(&device_devt, 0,
+				   MINORMASK + 1, "vfio-dev");
+}
+
+void vfio_cdev_cleanup(void)
+{
+	unregister_chrdev_region(device_devt, MINORMASK + 1);
+}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 54a48c8596f7..8661de75f94b 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -256,6 +256,52 @@ vfio_device_iommufd(struct vfio_device *device)
 }
 #endif
 
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	return cdev_device_add(&device->cdev, &device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	cdev_device_del(&device->cdev, &device->device);
+}
+
+void vfio_init_device_cdev(struct vfio_device *device);
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+int vfio_cdev_init(struct class *device_class);
+void vfio_cdev_cleanup(void);
+#else
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	return device_add(&device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	device_del(&device->device);
+}
+
+static inline void vfio_init_device_cdev(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_fops_cdev_open(struct inode *inode,
+					     struct file *filep)
+{
+	return 0;
+}
+
+static inline int vfio_cdev_init(struct class *device_class)
+{
+	return 0;
+}
+
+static inline void vfio_cdev_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_DEVICE_CDEV */
+
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
 int __init vfio_virqfd_init(void);
 void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 5b485d3bb67e..3f83447d022e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -235,6 +235,7 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	device->device.release = vfio_device_release;
 	device->device.class = vfio.device_class;
 	device->device.parent = device->dev;
+	vfio_init_device_cdev(device);
 	return 0;
 
 out_uninit:
@@ -269,7 +270,7 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
-	ret = device_add(&device->device);
+	ret = vfio_device_add(device);
 	if (ret)
 		goto err_out;
 
@@ -309,6 +310,13 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	bool interrupted = false;
 	long rc;
 
+	/*
+	 * Balances vfio_device_add() in the register path. It is the
+	 * first operation in unregister so that new cdev opens cannot
+	 * increment the registration refcount.
+	 */
+	vfio_device_del(device);
+
 	vfio_device_put_registration(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
@@ -334,9 +342,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 
 	vfio_device_group_unregister(device);
 
-	/* Balances device_add in register path */
-	device_del(&device->device);
-
 	/* Balances vfio_device_set_group in register path */
 	vfio_device_remove_group(device);
 }
@@ -1192,6 +1197,7 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 
 const struct file_operations vfio_device_fops = {
 	.owner		= THIS_MODULE,
+	.open		= vfio_device_fops_cdev_open,
 	.release	= vfio_device_fops_release,
 	.read		= vfio_device_fops_read,
 	.write		= vfio_device_fops_write,
@@ -1541,9 +1547,16 @@ static int __init vfio_init(void)
 		goto err_dev_class;
 	}
 
+	ret = vfio_cdev_init(vfio.device_class);
+	if (ret)
+		goto err_alloc_dev_chrdev;
+
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
 
+err_alloc_dev_chrdev:
+	class_destroy(vfio.device_class);
+	vfio.device_class = NULL;
 err_dev_class:
 	vfio_virqfd_exit();
 err_virqfd:
@@ -1554,6 +1567,7 @@ static int __init vfio_init(void)
 static void __exit vfio_cleanup(void)
 {
 	ida_destroy(&vfio.device_ida);
+	vfio_cdev_cleanup();
 	class_destroy(vfio.device_class);
 	vfio.device_class = NULL;
 	vfio_virqfd_exit();
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index c9b91c57de07..ce390533cb30 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/workqueue.h>
 #include <linux/poll.h>
+#include <linux/cdev.h>
 #include <uapi/linux/vfio.h>
 #include <linux/iova_bitmap.h>
 
@@ -51,6 +52,9 @@ struct vfio_device {
 	/* Members below here are private, not for driver use */
 	unsigned int index;
 	struct device device;	/* device.kref covers object life circle */
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+	struct cdev cdev;
+#endif
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-- 
2.34.1



* [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (14 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 19:19   ` Jason Gunthorpe
                     ` (2 more replies)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
                   ` (6 subsequent siblings)
  22 siblings, 3 replies; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This adds an ioctl for userspace to bind a device cdev fd to an iommufd.

    VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
			      control provided by the iommufd. open_device
			      op is called after bind_iommufd op.
			      VFIO no iommu mode is indicated by passing
			      a negative iommufd value.
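
For reference, the argument checks can be sketched in plain C as below.
This is illustrative only: struct bind_iommufd_args is a local mirror of
the vfio_device_bind_iommufd layout proposed in this patch (not an
installed header), and OFFSETOFEND, enum bind_mode and
classify_bind_request() are hypothetical names. The CAP_SYS_RAWIO check
required for noiommu and the actual bind are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the struct proposed in this patch, for illustration. */
struct bind_iommufd_args {
	uint32_t argsz;
	uint32_t flags;
	uint64_t dev_cookie;	/* __aligned_u64 in the patch */
	int32_t  iommufd;
	uint32_t out_devid;
};

/* Userspace equivalent of the kernel's offsetofend(). */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

enum bind_mode { BIND_MODE_IOMMUFD, BIND_MODE_NOIOMMU };

/*
 * Apply the same up-front checks as the ioctl handler: argsz must cover
 * all known fields, flags must be zero, and a negative iommufd selects
 * noiommu mode.
 */
int classify_bind_request(const struct bind_iommufd_args *a,
			  enum bind_mode *mode)
{
	size_t minsz = OFFSETOFEND(struct bind_iommufd_args, out_devid);

	assert(a && mode);
	if (a->argsz < minsz || a->flags)
		return -EINVAL;
	*mode = a->iommufd < 0 ? BIND_MODE_NOIOMMU : BIND_MODE_IOMMUFD;
	return 0;
}
```

A caller would set argsz to sizeof() of the struct, leave flags at zero,
and read out_devid back after a successful ioctl.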

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 146 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        |  17 ++++-
 drivers/vfio/vfio_main.c   |  54 ++++++++++++--
 include/linux/iommufd.h    |   6 ++
 include/uapi/linux/vfio.h  |  34 +++++++++
 5 files changed, 248 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 9e2c1ecaaf4f..37f80e368551 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2023 Intel Corporation.
  */
 #include <linux/vfio.h>
+#include <linux/iommufd.h>
 
 #include "vfio.h"
 
@@ -45,6 +46,151 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 	return ret;
 }
 
+static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
+{
+	spin_lock(&df->kvm_ref_lock);
+	if (!df->kvm)
+		goto unlock;
+
+	_vfio_device_get_kvm_safe(df->device, df->kvm);
+
+unlock:
+	spin_unlock(&df->kvm_ref_lock);
+}
+
+void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+	struct vfio_device *device = df->device;
+
+	mutex_lock(&device->dev_set->lock);
+	/*
+	 * The writer of df->access_granted also holds dev_set->lock, so
+	 * this read does not need smp_load_acquire() to pair with the
+	 * smp_store_release() in the caller of vfio_device_open().
+	 */
+	if (!df->access_granted) {
+		mutex_unlock(&device->dev_set->lock);
+		return;
+	}
+	vfio_device_close(df);
+	vfio_device_put_kvm(device);
+	if (df->iommufd)
+		iommufd_ctx_put(df->iommufd);
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+}
+
+static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
+{
+	struct fd f;
+	struct iommufd_ctx *iommufd;
+
+	f = fdget(fd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	iommufd = iommufd_ctx_from_file(f.file);
+
+	fdput(f);
+	return iommufd;
+}
+
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+				    unsigned long arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_bind_iommufd bind;
+	struct iommufd_ctx *iommufd = NULL;
+	unsigned long minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
+
+	if (copy_from_user(&bind, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (bind.argsz < minsz || bind.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	ret = vfio_device_block_group(device);
+	if (ret)
+		return ret;
+
+	mutex_lock(&device->dev_set->lock);
+	/*
+	 * Fail if this device file is already bound to an iommufd or
+	 * already set to noiommu.
+	 */
+	if (df->iommufd || df->noiommu) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* iommufd < 0 means noiommu mode */
+	if (bind.iommufd < 0) {
+		if (!capable(CAP_SYS_RAWIO)) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+		df->noiommu = true;
+	} else {
+		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
+		if (IS_ERR(iommufd)) {
+			ret = PTR_ERR(iommufd);
+			goto out_unlock;
+		}
+	}
+
+	/*
+	 * Before the device open, get the KVM pointer currently
+	 * associated with the device file (if there is) and obtain
+	 * a reference.  This reference is held until device closed.
+	 * Save the pointer in the device for use by drivers.
+	 */
+	vfio_device_get_kvm_safe(df);
+
+	df->iommufd = iommufd;
+	ret = vfio_device_open(df, &bind.out_devid, NULL);
+	if (ret)
+		goto out_put_kvm;
+
+	ret = copy_to_user((void __user *)arg +
+			   offsetofend(struct vfio_device_bind_iommufd, iommufd),
+			   &bind.out_devid,
+			   sizeof(bind.out_devid)) ? -EFAULT : 0;
+	if (ret)
+		goto out_close_device;
+
+	if (df->noiommu)
+		dev_warn(device->dev, "vfio-noiommu device used by user "
+			 "(%s:%d)\n", current->comm, task_pid_nr(current));
+
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+
+out_close_device:
+	vfio_device_close(df);
+out_put_kvm:
+	df->iommufd = NULL;
+	df->noiommu = false;
+	vfio_device_put_kvm(device);
+	if (iommufd)
+		iommufd_ctx_put(iommufd);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+	return ret;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 8661de75f94b..4716a904e63b 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -23,7 +23,9 @@ struct vfio_device_file {
 	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
-	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
+	/* protected by struct vfio_device_set::lock */
+	struct iommufd_ctx *iommufd;
+	bool noiommu;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
@@ -269,6 +271,9 @@ static inline void vfio_device_del(struct vfio_device *device)
 
 void vfio_init_device_cdev(struct vfio_device *device);
 int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+void vfio_device_cdev_close(struct vfio_device_file *df);
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+				    unsigned long arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -292,6 +297,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
 	return 0;
 }
 
+static inline void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+}
+
+static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+						  unsigned long arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 3f83447d022e..69d0add930bb 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -37,6 +37,7 @@
 #include <linux/interval_tree.h>
 #include <linux/iova_bitmap.h>
 #include <linux/iommufd.h>
+#include <uapi/linux/iommufd.h>
 #include "vfio.h"
 
 #define DRIVER_VERSION	"0.3"
@@ -422,16 +423,32 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 {
 	struct vfio_device *device = df->device;
 	struct iommufd_ctx *iommufd = df->iommufd;
-	int ret;
+	int ret = 0;
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	if (WARN_ON(iommufd && df->noiommu))
+		return -EINVAL;
+
 	if (!try_module_get(device->dev->driver->owner))
 		return -ENODEV;
 
+	/*
+	 * For the group/container path, the iommufd pointer is NULL when
+	 * entering this helper. Its noiommu support is handled by
+	 * vfio_device_group_use_iommu().
+	 *
+	 * For iommufd compat mode, the iommufd pointer here is valid.
+	 * Its noiommu support is in vfio_iommufd_bind().
+	 *
+	 * For the device cdev path, the iommufd pointer here is valid in
+	 * normal cases, but it is NULL for noiommu. Check df->noiommu
+	 * to differentiate cdev noiommu from the group/container path, which
+	 * also passes a NULL iommufd pointer in. If set, do nothing.
+	 */
 	if (iommufd)
 		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
-	else
+	else if (!df->noiommu)
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
 		goto err_module_put;
@@ -446,7 +463,7 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 err_unuse_iommu:
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
 	module_put(device->dev->driver->owner);
@@ -464,7 +481,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 		device->ops->close_device(device);
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
 }
@@ -549,6 +566,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 
 	if (!df->is_cdev_device)
 		vfio_device_group_close(df);
+	else
+		vfio_device_cdev_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1122,7 +1141,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
-	/* Paired with smp_store_release() in vfio_device_group_open() */
+	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+		return vfio_device_ioctl_bind_iommufd(df, arg);
+
+	/*
+	 * Paired with smp_store_release() in the caller of
+	 * vfio_device_open(). e.g. vfio_device_group_open()
+	 * and vfio_device_ioctl_bind_iommufd()
+	 */
 	if (!smp_load_acquire(&df->access_granted))
 		return -EINVAL;
 
@@ -1153,7 +1179,11 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	/* Paired with smp_store_release() in vfio_device_group_open() */
+	/*
+	 * Paired with smp_store_release() in the caller of
+	 * vfio_device_open(). e.g. vfio_device_group_open()
+	 * and vfio_device_ioctl_bind_iommufd()
+	 */
 	if (!smp_load_acquire(&df->access_granted))
 		return -EINVAL;
 
@@ -1170,7 +1200,11 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	/* Paired with smp_store_release() in vfio_device_group_open() */
+	/*
+	 * Paired with smp_store_release() in the caller of
+	 * vfio_device_open(). e.g. vfio_device_group_open()
+	 * and vfio_device_ioctl_bind_iommufd()
+	 */
 	if (!smp_load_acquire(&df->access_granted))
 		return -EINVAL;
 
@@ -1185,7 +1219,11 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	/* Paired with smp_store_release() in vfio_device_group_open() */
+	/*
+	 * Paired with smp_store_release() in the caller of
+	 * vfio_device_open(). e.g. vfio_device_group_open()
+	 * and vfio_device_ioctl_bind_iommufd()
+	 */
 	if (!smp_load_acquire(&df->access_granted))
 		return -EINVAL;
 
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 650d45629647..9672cf839687 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -17,6 +17,12 @@ struct iommufd_ctx;
 struct iommufd_access;
 struct file;
 
+/*
+ * iommufd core init xarray with flags==XA_FLAGS_ALLOC1, so valid
+ * ID starts from 1.
+ */
+#define IOMMUFD_INVALID_ID 0
+
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
 void iommufd_device_unbind(struct iommufd_device *idev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 4bf11ee8de53..92aa8dbc970a 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -194,6 +194,40 @@ struct vfio_group_status {
 
 /* --------------- IOCTLs for DEVICE file descriptors --------------- */
 
+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
+ *				   struct vfio_device_bind_iommufd)
+ *
+ * Bind a vfio_device to the specified iommufd.
+ *
+ * The user should provide a device cookie when calling this ioctl. The
+ * cookie is carried only in events (e.g. I/O faults) reported to userspace
+ * via iommufd. The user should use the devid returned by this ioctl to mark
+ * the target device in other ioctls (e.g. capability query via iommufd).
+ *
+ * User is not allowed to access the device before the binding operation
+ * is completed.
+ *
+ * Unbind is automatically conducted when device fd is closed.
+ *
+ * @argsz:	 user filled size of this data.
+ * @flags:	 reserved for future extension.
+ * @dev_cookie:	 a per device cookie provided by userspace.
+ * @iommufd:	 iommufd to bind. a negative value means noiommu.
+ * @out_devid:	 the device id generated by this bind.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+	__u32		argsz;
+	__u32		flags;
+	__aligned_u64	dev_cookie;
+	__s32		iommufd;
+	__u32		out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1



* [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (15 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 18:39   ` Jason Gunthorpe
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally Yi Liu
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This adds ioctls for userspace to attach a device cdev fd to, and detach
it from, an IOAS/hw_pagetable managed by iommufd.

    VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
				   managed by iommufd. Attach can be
				   undone by VFIO_DEVICE_DETACH_IOMMUFD_PT
				   or device fd close.
    VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
				   IOAS or hw_pagetable managed by iommufd.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/device_cdev.c | 76 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 16 ++++++++
 drivers/vfio/vfio_main.c   |  8 ++++
 include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 37f80e368551..5b5a249a6612 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -191,6 +191,82 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return ret;
 }
 
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+			     void __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_attach_iommufd_pt attach;
+	unsigned long minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
+
+	if (copy_from_user(&attach, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (attach.argsz < minsz || attach.flags ||
+	    attach.pt_id == IOMMUFD_INVALID_ID)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	if (df->noiommu) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ret = device->ops->attach_ioas(device, &attach.pt_id);
+	if (ret)
+		goto out_unlock;
+
+	ret = copy_to_user((void __user *)arg +
+			   offsetofend(struct vfio_device_attach_iommufd_pt, flags),
+			   &attach.pt_id,
+			   sizeof(attach.pt_id)) ? -EFAULT : 0;
+	if (ret)
+		goto out_detach;
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+
+out_detach:
+	device->ops->detach_ioas(device);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+			     void __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_detach_iommufd_pt detach;
+	unsigned long minsz;
+
+	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
+
+	if (copy_from_user(&detach, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (detach.argsz < minsz || detach.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	if (df->noiommu) {
+		mutex_unlock(&device->dev_set->lock);
+		return -EINVAL;
+	}
+	device->ops->detach_ioas(device);
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 4716a904e63b..5a1ceb014779 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -274,6 +274,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
 void vfio_device_cdev_close(struct vfio_device_file *df);
 long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 				    unsigned long arg);
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+			     void __user *arg);
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+			     void __user *arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -307,6 +311,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return -EOPNOTSUPP;
 }
 
+static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
+					   void __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
+					   void __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 69d0add930bb..f68550fe206f 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1161,6 +1161,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
 		break;
 
+	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
+		break;
+
+	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
 			ret = -EINVAL;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 92aa8dbc970a..ff8753d0abb0 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -228,6 +228,58 @@ struct vfio_device_bind_iommufd {
 
 #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
 
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ *					struct vfio_device_attach_iommufd_pt)
+ *
+ * Attach a vfio device to an iommufd address space specified by IOAS
+ * id or hw_pagetable (hwpt) id.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD.
+ *
+ * Undone by VFIO_DEVICE_DETACH_IOMMUFD_PT or by closing the device fd.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ * @pt_id:	Input the target id which can represent an ioas or a hwpt
+ *		allocated via iommufd subsystem.
+ *		Output the attached hwpt id which could be the specified
+ *		hwpt itself or a hwpt automatically created for the
+ *		specified ioas by kernel during the attachment.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
+
+/*
+ * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
+ *					struct vfio_device_detach_iommufd_pt)
+ *
+ * Detach a vfio device from the iommufd address space it has been
+ * attached to. Afterwards, the device should be in a blocking DMA state.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_detach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+};
+
+#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 131+ messages in thread

* [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (16 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 19:20   ` Jason Gunthorpe
  2023-02-28  6:00   ` Liu, Yi L
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 19/19] docs: vfio: Add vfio device cdev description Yi Liu
                   ` (4 subsequent siblings)
  22 siblings, 2 replies; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

Group code is not needed for the vfio device cdev, so once the cdev is
introduced, the group infrastructure can be compiled out when only the
cdev is needed.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/Kconfig  | 14 +++++++++
 drivers/vfio/Makefile |  2 +-
 drivers/vfio/vfio.h   | 72 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h  | 24 ++++++++++++++-
 4 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 169762316513..c3ab06c314ea 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -4,6 +4,8 @@ menuconfig VFIO
 	select IOMMU_API
 	depends on IOMMUFD || !IOMMUFD
 	select INTERVAL_TREE
+	select VFIO_GROUP if SPAPR_TCE_IOMMU
+	select VFIO_DEVICE_CDEV if !VFIO_GROUP && (X86 || S390 || ARM || ARM64)
 	select VFIO_CONTAINER if IOMMUFD=n
 	help
 	  VFIO provides a framework for secure userspace device drivers.
@@ -15,6 +17,7 @@ if VFIO
 config VFIO_DEVICE_CDEV
 	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
 	depends on IOMMUFD && (X86 || S390 || ARM || ARM64)
+	default !VFIO_GROUP
 	help
 	  The VFIO device cdev is another way for userspace to get device
 	  access. Userspace gets device fd by opening device cdev under
@@ -24,9 +27,20 @@ config VFIO_DEVICE_CDEV
 
 	  If you don't know what to do here, say N.
 
+config VFIO_GROUP
+	bool "Support for the VFIO group /dev/vfio/$group_id"
+	default y
+	help
+	   VFIO group support provides the traditional model for accessing
+	   devices through VFIO and is used by the majority of userspace
+	   applications and drivers making use of VFIO.
+
+	   If you don't know what to do here, say Y.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	depends on VFIO_GROUP
 	default y
 	help
 	  The VFIO container is the classic interface to VFIO for establishing
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 245394aeb94b..57c3515af606 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,9 +2,9 @@
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  group.o \
 	  iova_bitmap.o
 vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
+vfio-$(CONFIG_VFIO_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 5a1ceb014779..a7b88521bf48 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -62,6 +62,7 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -107,6 +108,77 @@ void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
+#else
+struct vfio_group;
+
+static inline int vfio_device_block_group(struct vfio_device *device)
+{
+	return 0;
+}
+
+static inline void vfio_device_unblock_group(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+					enum vfio_group_type type)
+{
+	return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+	return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+	return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_GROUP */
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /**
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ce390533cb30..d12384824656 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,9 @@ struct vfio_device {
 	 */
 	const struct vfio_migration_ops *mig_ops;
 	const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 	struct vfio_group *group;
+#endif
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
@@ -58,8 +60,10 @@ struct vfio_device {
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 	struct list_head group_next;
 	struct list_head iommu_entry;
+#endif
 	struct iommufd_access *iommufd_access;
 	void (*put_kvm)(struct kvm *kvm);
 #if IS_ENABLED(CONFIG_IOMMUFD)
@@ -257,12 +261,30 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct iommu_group *vfio_file_iommu_group(struct file *file);
 bool vfio_file_is_group(struct file *file);
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_file_is_group(struct file *file)
+{
+	return false;
+}
+
+static inline bool vfio_file_has_dev(struct file *file,
+				     struct vfio_device *device)
+{
+	return false;
+}
+#endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
 
 #define VFIO_PIN_PAGES_MAX_ENTRIES	(PAGE_SIZE/sizeof(unsigned long))
 
-- 
2.34.1
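
The stub pattern that patch 18 applies in drivers/vfio/vfio.h can be
sketched outside the kernel as follows. This is a minimal illustration,
not the kernel code: DEMO_GROUP stands in for CONFIG_VFIO_GROUP, and
demo_device, demo_group_use_iommu(), demo_has_container() and demo_open()
are hypothetical names invented for the example. The point is that
callers compile unchanged whether or not the group code is built, because
the disabled configuration supplies static inline stubs with the same
signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_VFIO_GROUP; 0 means "compiled out". */
#ifndef DEMO_GROUP
#define DEMO_GROUP 0
#endif

static_assert(DEMO_GROUP == 0 || DEMO_GROUP == 1, "DEMO_GROUP must be 0 or 1");

/* Hypothetical device type, only here to give the helpers an argument. */
struct demo_device {
	int id;
};

#if DEMO_GROUP
/* Real implementations would live in a separately compiled file. */
int demo_group_use_iommu(struct demo_device *dev);
bool demo_has_container(struct demo_device *dev);
#else
/* Stubs: same signatures, fixed results, no group code linked in. */
static inline int demo_group_use_iommu(struct demo_device *dev)
{
	(void)dev;
	return -EOPNOTSUPP;
}

static inline bool demo_has_container(struct demo_device *dev)
{
	(void)dev;
	return false;
}
#endif

/* A caller that needs no #ifdef of its own in either configuration. */
static inline int demo_open(struct demo_device *dev)
{
	if (demo_has_container(dev))
		return demo_group_use_iommu(dev);
	return 0;
}
```

With DEMO_GROUP left at 0, demo_open() never reaches the group helper
because the container stub reports false, which mirrors how the patch
makes vfio_device_has_container() return false when the group code is
absent.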


^ permalink raw reply related	[flat|nested] 131+ messages in thread

* [Intel-gfx] [PATCH v5 19/19] docs: vfio: Add vfio device cdev description
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (17 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally Yi Liu
@ 2023-02-27 11:11 ` Yi Liu
  2023-02-27 11:31 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev5) Patchwork
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 131+ messages in thread
From: Yi Liu @ 2023-02-27 11:11 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.l.liu, yi.y.sun, mjrosato, kvm, intel-gvt-dev,
	joro, cohuck, xudong.hao, peterx, yan.y.zhao, eric.auger,
	terrence.xu, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu, robin.murphy,
	jasowang

This gives notes for userspace applications on device cdev usage.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/driver-api/vfio.rst | 125 ++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 44527420f20d..5d290ceb2bbf 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,123 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMUFD and vfio_iommu_type1
+----------------------------
+
+IOMMUFD is the new user API to manage I/O page tables from userspace.
+It is intended to be the portal for delivering advanced userspace DMA
+features (nested translation [5]_, PASID [6]_, etc.) while remaining
+backward compatible with the vfio_iommu_type1 driver. Eventually
+vfio_iommu_type1 will be deprecated.
+
+With the backward compatibility, no change is required for legacy VFIO
+drivers or applications to connect a VFIO device to IOMMUFD.
+
+	When CONFIG_IOMMUFD_VFIO_CONTAINER=n, VFIO container still provides
+	/dev/vfio/vfio which connects to vfio_iommu_type1. To disable VFIO
+	container and vfio_iommu_type1, the administrator could symlink
+	/dev/vfio/vfio to /dev/iommu to enable VFIO container emulation
+	in IOMMUFD.
+
+	When CONFIG_IOMMUFD_VFIO_CONTAINER=y, IOMMUFD directly provides
+	/dev/vfio/vfio while the VFIO container and vfio_iommu_type1 are
+	explicitly disabled.
+
+VFIO Device cdev
+----------------
+
+Traditionally the user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
+in a VFIO group.
+
+With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
+by directly opening a character device /dev/vfio/devices/vfioX where
+"X" is the number allocated uniquely by VFIO for registered devices.
+
+The cdev only works with IOMMUFD. Both VFIO drivers and applications
+must adapt to the new cdev security model which requires using
+VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
+actually use the device. Once the bind succeeds, the VFIO device can
+be fully accessed by the user.
+
+VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
+Hence those modules can be fully compiled out in an environment
+where no legacy VFIO application exists.
+
+SPAPR does not support IOMMUFD yet, so it cannot support the device
+cdev either.
+
+Device cdev Example
+-------------------
+
+Assume the user wants to access PCI device 0000:6a:01.0::
+
+	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
+	vfio0
+
+This device is therefore represented as vfio0. The user can verify
+its existence::
+
+	$ ls -l /dev/vfio/devices/vfio0
+	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
+	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
+	511:0
+	$ ls -l /dev/char/511\:0
+	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
+
+Then provide the user with access to the device if unprivileged
+operation is desired::
+
+	$ chown user:user /dev/vfio/devices/vfio0
+
+Finally the user can get the cdev fd by::
+
+	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
+
+An opened cdev_fd gives the user no access to the device other than
+binding the cdev_fd to an iommufd. After the bind, the device is
+fully accessible, including attaching it to an IOMMUFD IOAS/HWPT to
+enable userspace DMA::
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_ioas_map map = {
+		.size = sizeof(map),
+		.flags = IOMMU_IOAS_MAP_READABLE |
+			 IOMMU_IOAS_MAP_WRITEABLE |
+			 IOMMU_IOAS_MAP_FIXED_IOVA,
+		.__reserved = 0,
+	};
+
+	iommufd = open("/dev/iommu", O_RDWR);
+
+	bind.iommufd = iommufd;
+	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+
+	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	attach_data.pt_id = alloc_data.out_ioas_id;
+	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+
+	/* Allocate some space and setup a DMA mapping */
+	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	map.iova = 0; /* 1MB starting at 0x0 from device view */
+	map.length = 1024 * 1024;
+	map.ioas_id = alloc_data.out_ioas_id;
+
+	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
+
+	/* Other device operations as stated in "VFIO Usage Example" */
+
 VFIO User API
 -------------------------------------------------------------------------------
 
@@ -566,3 +683,11 @@ This implementation has some specifics:
 				\-0d.1
 
 	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
+
+.. [5] Nested translation is an IOMMU feature which supports two-stage
+   address translation. This improves address translation efficiency
+   in IOMMU virtualization.
+
+.. [6] PASID stands for Process Address Space ID, introduced by PCI
+   Express. It is a prerequisite for Shared Virtual Addressing (SVA)
+   and Scalable I/O Virtualization (Scalable IOV).
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 131+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev5)
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (18 preceding siblings ...)
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 19/19] docs: vfio: Add vfio device cdev description Yi Liu
@ 2023-02-27 11:31 ` Patchwork
  2023-02-27 19:21 ` [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Jason Gunthorpe
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 131+ messages in thread
From: Patchwork @ 2023-02-27 11:31 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: intel-gfx

== Series Details ==

Series: Add vfio_device cdev for iommufd support (rev5)
URL   : https://patchwork.freedesktop.org/series/113696/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/113696/revisions/5/mbox/ not applied
Applying: vfio: Allocate per device file structure
Using index info to reconstruct a base tree...
M	drivers/vfio/group.c
M	drivers/vfio/vfio.h
M	drivers/vfio/vfio_main.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/vfio/vfio_main.c
Auto-merging drivers/vfio/vfio.h
Auto-merging drivers/vfio/group.c
Applying: vfio: Refine vfio file kAPIs for KVM
Using index info to reconstruct a base tree...
M	drivers/vfio/group.c
M	drivers/vfio/vfio.h
M	drivers/vfio/vfio_main.c
M	include/linux/vfio.h
Falling back to patching base and 3-way merge...
Auto-merging include/linux/vfio.h
Auto-merging drivers/vfio/vfio_main.c
Auto-merging drivers/vfio/vfio.h
Auto-merging drivers/vfio/group.c
CONFLICT (content): Merge conflict in drivers/vfio/group.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 vfio: Refine vfio file kAPIs for KVM
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
@ 2023-02-27 18:22   ` Jason Gunthorpe
  2023-02-28  2:31     ` Liu, Yi L
  2023-03-02  6:07   ` Liu, Yi L
  1 sibling, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:22 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:25AM -0800, Yi Liu wrote:
> to indicate kernel to use the device's bound iommufd_ctx for the device
> ownership check. Kernel should loop all the opened devices in the dev_set,
> and check if they are bound to the same iommufd_ctx. Devices that are
> affected but have not been opened yet can be reset by the current
> users as they cannot be opened by any other user. This applies to the
> existing group/container path as well.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 111 +++++++++++++++++++++++--------
>  drivers/vfio/vfio.h              |  11 +++
>  include/uapi/linux/vfio.h        |  16 +++++
>  3 files changed, 109 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1bf54beeaef2..e0ebe55b4df0 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -27,11 +27,13 @@
>  #include <linux/vgaarb.h>
>  #include <linux/nospec.h>
>  #include <linux/sched/mm.h>
> +#include <linux/iommufd.h>

Is this needed anymore?

>  #if IS_ENABLED(CONFIG_EEH)
>  #include <asm/eeh.h>
>  #endif
>  
>  #include "vfio_pci_priv.h"
> +#include "../vfio.h"

Don't do this, put vfio_device_iommufd() in the normal public header

> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 0552e8dcf0cb..4bf11ee8de53 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -673,6 +673,22 @@ struct vfio_pci_hot_reset_info {
>   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>   *				    struct vfio_pci_hot_reset)
>   *
> + * Userspace requests hot reset for the devices it uses.  Due to the
> + * underlying topology, multiple devices may be affected in the reset.
> + * The affected devices may have been opened by the user or by other
> + * users or not opened yet.  Only when all the affected devices are
> + * either opened by the current user or not opened by any user, should
> + * the reset request be allowed.  Otherwise, this request is expected
> + * to return error.
> + *
> + * If the user uses group and container interface, it should pass down
> + * a set of group fds for ownership check.  If the user uses iommufd, it
> + * should pass down a zero-length group_fds array to indicate the kernel
> + * to use the bound iommufd for the ownership check.  User that uses the
> + * vfio iommufd compatible mode can also pass down a zero-length group_fds
> + * array as this mode uses iommufd in kernel, and there is no reason to
> + * forbide it.

'forbid'

Rest looks good

Thanks,
Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace Yi Liu
@ 2023-02-27 18:29   ` Jason Gunthorpe
  2023-02-28  2:35     ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:29 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:26AM -0800, Yi Liu wrote:
> For the device fd opened from cdev, userspace needs to bind it to an
> iommufd and attach it to IOAS managed by iommufd. With such operations,
> userspace can set up a secure DMA context and hence access device.
> 
> This changes the existing vfio_iommufd_bind() to accept a pt_id pointer
> as an optional input, and also a dev_id pointer to selectively return
> the dev_id to prepare for adding bind_iommufd ioctl, which does the bind
> first and then attach IOAS.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/group.c     | 17 ++++++++++++++---
>  drivers/vfio/iommufd.c   | 21 +++++++++------------
>  drivers/vfio/vfio.h      |  9 ++++++---
>  drivers/vfio/vfio_main.c | 10 ++++++----
>  4 files changed, 35 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index d8771d585cb1..e44232551448 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -169,6 +169,7 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
>  static int vfio_device_group_open(struct vfio_device_file *df)
>  {
>  	struct vfio_device *device = df->device;
> +	u32 ioas_id;
>  	int ret;
>  
>  	mutex_lock(&device->group->group_lock);
> @@ -177,6 +178,13 @@ static int vfio_device_group_open(struct vfio_device_file *df)
>  		goto out_unlock;
>  	}
>  
> +	if (device->group->iommufd) {
> +		ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
> +						  &ioas_id);
> +		if (ret)
> +			goto out_unlock;
> +	}

I don't really like this being moved out of iommufd.c

Pass in a NULL pt_id and then do something like

> -int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
> +		      u32 *dev_id, u32 *pt_id)
>  {
> -	u32 ioas_id;
>  	u32 device_id;
>  	int ret;
>  
> @@ -29,17 +29,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
>  	if (ret)
>  		return ret;
>  
> -	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> -	if (ret)
> -		goto err_unbind;

int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
		      u32 *dev_id, u32 *pt_id)
{
	u32 tmp_pt_id;

	if (!pt_id) {
		pt_id = &tmp_pt_id;
		ret = iommufd_vfio_compat_ioas_id(ictx, pt_id);
		if (ret)
			goto err_unbind;
	}

To handle it

And the commit message is sort of out of sync with the patch, more like:

vfio: Pass the pt_id as an argument to vfio_iommufd_bind()

To support binding the cdev the pt_id must come from userspace instead
of being forced to the compat_ioas_id.


Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
@ 2023-02-27 18:39   ` Jason Gunthorpe
  2023-02-28  2:51     ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:39 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:33AM -0800, Yi Liu wrote:
> This adds ioctl for userspace to attach device cdev fd to and detach
> from IOAS/hw_pagetable managed by iommufd.
> 
>     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> 				   managed by iommufd. Attach can be
> 				   undone by VFIO_DEVICE_DETACH_IOMMUFD_PT
> 				   or device fd close.
>     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> 				   IOAS or hw_pagetable managed by iommufd.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 76 ++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        | 16 ++++++++
>  drivers/vfio/vfio_main.c   |  8 ++++
>  include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++
>  4 files changed, 152 insertions(+)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 37f80e368551..5b5a249a6612 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -191,6 +191,82 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
>  	return ret;
>  }
>  
> +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> +			     void __user *arg)

This should be

struct vfio_device_attach_iommufd_pt __user *arg

> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_attach_iommufd_pt attach;
> +	unsigned long minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> +
> +	if (copy_from_user(&attach, (void __user *)arg, minsz))

No cast

> +		return -EFAULT;
> +
> +	if (attach.argsz < minsz || attach.flags ||
> +	    attach.pt_id == IOMMUFD_INVALID_ID)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	if (df->noiommu) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	ret = device->ops->attach_ioas(device, &attach.pt_id);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = copy_to_user((void __user *)arg +
> +			   offsetofend(struct vfio_device_attach_iommufd_pt, flags),

This should just be &arg->pt_id

> +			   &attach.pt_id,
> +			   sizeof(attach.pt_id)) ? -EFAULT : 0;

Also:

static_assert(__same_type(arg->pt_id, attach.pt_id));
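
The layout fact behind these remarks can be checked in plain userspace C.
A minimal sketch, assuming a local mirror of the uAPI struct: attach_pt,
INVALID_ID (with the value 0 assumed) and attach_args_check() are
hypothetical names for illustration, and offsetofend() is redefined here
to follow the idea of the kernel macro. It shows that the end of flags is
exactly the start of pt_id, so the offset arithmetic in the quoted
copy_to_user() addresses the pt_id member, and it repeats the
argsz/flags/pt_id checks the ioctl performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct vfio_device_attach_iommufd_pt (assumed layout). */
struct attach_pt {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pt_id;
};

/* Offset of the first byte past MEMBER, as the kernel's offsetofend() gives. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* pt_id starts where flags ends, so both expressions name the same offset. */
static_assert(offsetofend(struct attach_pt, flags) ==
	      offsetof(struct attach_pt, pt_id),
	      "pt_id must directly follow flags");

/* Hypothetical stand-in for IOMMUFD_INVALID_ID; the value is an assumption. */
#define INVALID_ID 0u

/* Same argument checks as the quoted ioctl: size, reserved flags, valid id. */
static int attach_args_check(const struct attach_pt *a)
{
	size_t minsz = offsetofend(struct attach_pt, pt_id);

	if (a->argsz < minsz || a->flags || a->pt_id == INVALID_ID)
		return -EINVAL;
	return 0;
}
```

Naming the member directly keeps the intent visible at the call site,
while the compile-time assertion guards against the struct layout
drifting later.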

> +	if (ret)
> +		goto out_detach;
> +	mutex_unlock(&device->dev_set->lock);
> +
> +	return 0;
> +
> +out_detach:
> +	device->ops->detach_ioas(device);


> +out_unlock:
> +	mutex_unlock(&device->dev_set->lock);
> +	return ret;
> +}
> +
> +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> +			     void __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_detach_iommufd_pt detach;
> +	unsigned long minsz;
> +
> +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> +
> +	if (copy_from_user(&detach, (void __user *)arg, minsz))
> +		return -EFAULT;

Same comments here

> +
> +	if (detach.argsz < minsz || detach.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	if (df->noiommu) {
> +		mutex_unlock(&device->dev_set->lock);
> +		return -EINVAL;
> +	}

This seems strange. No-iommu mode should have a NULL dev->iommufd_ictx.
Why do we have a df->noiommu at all?

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
@ 2023-02-27 18:44   ` Jason Gunthorpe
  2023-02-28  2:57     ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:44 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:27AM -0800, Yi Liu wrote:
> diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> index c89a047a4cd8..d540cf683d93 100644
> --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> @@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
>  	.bind_iommufd	= vfio_iommufd_physical_bind,
>  	.unbind_iommufd	= vfio_iommufd_physical_unbind,
>  	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
> +	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
>  };
>  
>  static struct fsl_mc_driver vfio_fsl_mc_driver = {
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index beef6ca21107..bfaa9876499b 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -88,6 +88,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
>  {
>  	int rc;
>  
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	if (!vdev->iommufd_device)
> +		return -EINVAL;

This should be a WARN_ON. The vdev->iommufd_ctx should be NULL if it
hasn't been bound, and it can't be bound unless the
iommufd_device/attach was created.

> @@ -96,6 +104,18 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
>  }
>  EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
>  
> +void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
> +{
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	if (!vdev->iommufd_device || !vdev->iommufd_attached)
> +		return;

Same

Jason

* Re: [Intel-gfx] [PATCH v5 12/19] vfio-iommufd: Add detach_ioas for emulated VFIO devices
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 12/19] vfio-iommufd: Add detach_ioas for emulated " Yi Liu
@ 2023-02-27 18:45   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:45 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:28AM -0800, Yi Liu wrote:
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index bfaa9876499b..faf2516b0f06 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -165,6 +165,12 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
>  
>  	lockdep_assert_held(&vdev->dev_set->lock);
>  
> +	if (!vdev->iommufd_ictx)
> +		return -EINVAL;

Same remark about WARN_ON here too

Jason

* Re: [Intel-gfx] [PATCH v5 01/19] vfio: Allocate per device file structure
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 01/19] vfio: Allocate per device file structure Yi Liu
@ 2023-02-27 18:46   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:17AM -0800, Yi Liu wrote:
> This is preparation for adding vfio device cdev support. vfio device
> cdev requires:
> 1) a per device file memory to store the kvm pointer set by KVM. It will
>    be propagated to vfio_device:kvm after the device cdev file is bound
>    to an iommufd
> 2) a mechanism to block device access through device cdev fd before it
>    is bound to an iommufd
> 
> To address above requirements, this adds a per device file structure
> named vfio_device_file. For now, it's only a wrapper of struct vfio_device
> pointer. Other fields will be added to this per file structure in future
> commits.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>  drivers/vfio/group.c     | 13 +++++++++++--
>  drivers/vfio/vfio.h      |  6 ++++++
>  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++-----
>  3 files changed, 43 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [Intel-gfx] [PATCH v5 02/19] vfio: Refine vfio file kAPIs for KVM
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 02/19] vfio: Refine vfio file kAPIs for KVM Yi Liu
@ 2023-02-27 18:46   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:18AM -0800, Yi Liu wrote:
> This prepares for making the below kAPIs to accept both group file
> and device file instead of only vfio group file.
> 
>   bool vfio_file_enforced_coherent(struct file *file);
>   void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
> 
> Besides the above change, vfio_file_is_valid() is added to check if a
> given file is a valid vfio file. It would be extended to check both
> vfio group file and vfio device file later.
> 
> vfio_file_is_group() is kept for the VFIO PCI hot reset path.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>  drivers/vfio/group.c     | 57 +++++++++++++++-------------------------
>  drivers/vfio/vfio.h      |  3 +++
>  drivers/vfio/vfio_main.c | 45 +++++++++++++++++++++++++++++++
>  include/linux/vfio.h     |  1 +
>  virt/kvm/vfio.c          | 10 +++----
>  5 files changed, 75 insertions(+), 41 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [Intel-gfx] [PATCH v5 03/19] vfio: Accept vfio device file in the KVM facing kAPI
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 03/19] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
@ 2023-02-27 18:46   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:19AM -0800, Yi Liu wrote:
> This makes the vfio file kAPIs accept vfio device files, also a
> preparation for vfio device cdev support.
> 
> For the kvm set with vfio device file, kvm pointer is stored in struct
> vfio_device_file, and use kvm_ref_lock to protect kvm set and kvm
> pointer usage within VFIO. This kvm pointer will be set to vfio_device
> after device file is bound to iommufd in the cdev path.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/vfio.h      |  2 ++
>  drivers/vfio/vfio_main.c | 42 +++++++++++++++++++++++++++++++++++++---
>  2 files changed, 41 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [Intel-gfx] [PATCH v5 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
@ 2023-02-27 18:47   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:20AM -0800, Yi Liu wrote:
> Meanwhile, rename related helpers. No functional change is intended.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>  virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
>  1 file changed, 58 insertions(+), 57 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [Intel-gfx] [PATCH v5 05/19] kvm/vfio: Accept vfio device file from userspace
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-02-27 18:47   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:21AM -0800, Yi Liu wrote:
> This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
> Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/virt/kvm/devices/vfio.rst | 52 +++++++++++++++++--------
>  include/uapi/linux/kvm.h                | 16 ++++++--
>  virt/kvm/vfio.c                         | 16 ++++----
>  3 files changed, 55 insertions(+), 29 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [Intel-gfx] [PATCH v5 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2023-02-27 18:47   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:22AM -0800, Yi Liu wrote:
> This avoids passing too many parameters in multiple functions.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/group.c     | 19 +++++++++++++------
>  drivers/vfio/vfio.h      |  8 ++++----
>  drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
>  3 files changed, 32 insertions(+), 20 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-02-27 18:48   ` Jason Gunthorpe
  2023-03-01  9:22   ` Liu, Yi L
  1 sibling, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:48 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:23AM -0800, Yi Liu wrote:
> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This inbetween state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
> 
> The reason for the inbetween state is that userspace only gets a FD but
> doesn't gain access permission until binding the FD to an iommufd. So in
> the blocked state, only the bind operation is allowed. Completing bind
> will allow user to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Following this lockless scheme, it can safely handle the device FD
> unbound->bound but it cannot handle bound->unbound. To allow this we'd
> need to add a lock on all the vfio ioctls which seems costly. So once
> device FD is bound, it remains bound until the FD is closed.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/group.c     |  6 ++++++
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
>  3 files changed, 23 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
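
The lockless blocked/unblocked scheme described in the commit message above can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions: struct example_device_file, example_bind() and example_ioctl() are hypothetical names, and the kernel uses smp_store_release()/smp_load_acquire() rather than <stdatomic.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the per-file state. */
struct example_device_file {
	int device_state;           /* set up before access is granted */
	atomic_bool access_granted; /* false while the fd is "blocked" */
};

/* Bind path: finish all setup first, then publish the flag with release. */
static void example_bind(struct example_device_file *df, int state)
{
	df->device_state = state;
	atomic_store_explicit(&df->access_granted, true, memory_order_release);
}

/* Any other operation: acquire-load the flag and refuse while blocked. */
static int example_ioctl(struct example_device_file *df, int *out_state)
{
	assert(out_state != NULL);

	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;

	/* The acquire load pairs with the release store, so setup is visible. */
	*out_state = df->device_state;
	return 0;
}
```

The flag only ever goes from false to true in this model, which mirrors the unbound->bound restriction in the commit message: going back to the blocked state would need a lock around every reader.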

* Re: [Intel-gfx] [PATCH v5 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-02-27 18:48   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:48 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:24AM -0800, Yi Liu wrote:
> this suits more on what the code does.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
@ 2023-02-27 18:52   ` Jason Gunthorpe
  2023-02-28  3:11     ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:52 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:30AM -0800, Yi Liu wrote:
> @@ -535,7 +542,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> -	vfio_device_group_close(df);
> +	if (!df->is_cdev_device)
> +		vfio_device_group_close(df);

This hunk should go in another patch

Jason

* Re: [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device Yi Liu
@ 2023-02-27 18:55   ` Jason Gunthorpe
  2023-02-28  3:47     ` Liu, Yi L
  2023-02-27 19:06   ` Jason Gunthorpe
  1 sibling, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 18:55 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:31AM -0800, Yi Liu wrote:
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index a8f544629467..169762316513 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -12,6 +12,18 @@ menuconfig VFIO
>  	  If you don't know what to do here, say N.
>  
>  if VFIO
> +config VFIO_DEVICE_CDEV
> +	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> +	depends on IOMMUFD && (X86 || S390 || ARM || ARM64)

We don't need to propagate this arch detection stuff; at worst it
should be in the iommufd Kconfig if it is really needed.

Also, that other thread shows that vfio doesn't work on ARM because we
can never take ownership of a device due to the arm iommu.

Jason

* Re: [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device Yi Liu
  2023-02-27 18:55   ` Jason Gunthorpe
@ 2023-02-27 19:06   ` Jason Gunthorpe
  2023-02-28  3:59     ` Liu, Yi L
  1 sibling, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 19:06 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:31AM -0800, Yi Liu wrote:
> @@ -309,6 +310,13 @@ void vfio_unregister_group_dev(struct vfio_device *device)
>  	bool interrupted = false;
>  	long rc;
>  
> +	/*
> +	 * Balances vfio_device_add in register path. Putting it as the
> +	 * first operation in unregister to prevent registration refcount
> +	 * from incrementing per cdev open.
> +	 */
> +	vfio_device_del(device);
> +
>  	vfio_device_put_registration(device);
>  	rc = try_wait_for_completion(&device->comp);
>  	while (rc <= 0) {
> @@ -334,9 +342,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
>  
>  	vfio_device_group_unregister(device);
>  
> -	/* Balances device_add in register path */
> -	device_del(&device->device);
> -
>  	/* Balances vfio_device_set_group in register path */
>  	vfio_device_remove_group(device);

The same rationale applies to vfio_device_group_unregister too, so it
should be moved up as well.

Jason

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
@ 2023-02-27 19:19   ` Jason Gunthorpe
  2023-02-28  4:08     ` Liu, Yi L
  2023-03-01  9:19   ` Liu, Yi L
  2023-03-10  2:39   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 19:19 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:32AM -0800, Yi Liu wrote:
> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 			      VFIO no iommu mode is indicated by passing
> 			      a negative iommufd value.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 146 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        |  17 ++++-
>  drivers/vfio/vfio_main.c   |  54 ++++++++++++--
>  include/linux/iommufd.h    |   6 ++
>  include/uapi/linux/vfio.h  |  34 +++++++++
>  5 files changed, 248 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 9e2c1ecaaf4f..37f80e368551 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2023 Intel Corporation.
>   */
>  #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>  
>  #include "vfio.h"
>  
> @@ -45,6 +46,151 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	return ret;
>  }
>  
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (!df->kvm)
> +		goto unlock;
> +
> +	_vfio_device_get_kvm_safe(df->device, df->kvm);
> +
> +unlock:

Just 

if (df->kvm)
   _vfio_device_get_kvm_safe(df->device, df->kvm);

Without the goto

> +	spin_unlock(&df->kvm_ref_lock);
> +}
> +
> +void vfio_device_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/*
> +	 * As df->access_granted writer is under dev_set->lock as well,
> +	 * so this read no need to use smp_load_acquire() to pair with
> +	 * smp_store_release() in the caller of vfio_device_open().
> +	 */

This is a bit misleading, we are about to free df in the caller, so at
this moment df has no current access. We don't even need to have the
mutex to test it.

> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				    unsigned long arg)

struct device __user *arg and remove all the casts.

> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_bind_iommufd bind;
> +	struct iommufd_ctx *iommufd = NULL;
> +	unsigned long minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> +
> +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> +		return -EFAULT;
> +
> +	if (bind.argsz < minsz || bind.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;
> +
> +	ret = vfio_device_block_group(device);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/*
> +	 * If already been bound to an iommufd, or already set noiommu
> +	 * then fail it.
> +	 */
> +	if (df->iommufd || df->noiommu) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* iommufd < 0 means noiommu mode */
> +	if (bind.iommufd < 0) {
> +		if (!capable(CAP_SYS_RAWIO)) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +		df->noiommu = true;
> +	} else {
> +		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> +		if (IS_ERR(iommufd)) {
> +			ret = PTR_ERR(iommufd);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	/*
> +	 * Before the device open, get the KVM pointer currently
> +	 * associated with the device file (if there is) and obtain
> +	 * a reference.  This reference is held until device closed.
> +	 * Save the pointer in the device for use by drivers.
> +	 */
> +	vfio_device_get_kvm_safe(df);
> +
> +	df->iommufd = iommufd;
> +	ret = vfio_device_open(df, &bind.out_devid, NULL);
> +	if (ret)
> +		goto out_put_kvm;
> +
> +	ret = copy_to_user((void __user *)arg +
> +			   offsetofend(struct vfio_device_bind_iommufd, iommufd),

??

&arg->out_devid

static_assert(__same_type...)
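
A userspace sketch of what this suggestion might look like: address the output field by name through a typed pointer and let the compiler check that the types match. The struct layout, example_copy_to_user() and example_report_devid() are hypothetical; only the out_devid field name comes from the quoted patch. __builtin_types_compatible_p and __typeof__ are GCC/Clang extensions, used here as an approximation of the kernel's __same_type().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; only the out_devid name is taken from the patch. */
struct example_bind {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
};

/* Stand-in for copy_to_user(); returns the number of bytes NOT copied. */
static unsigned long example_copy_to_user(void *to, const void *from, size_t n)
{
	memcpy(to, from, n);
	return 0;
}

static int example_report_devid(struct example_bind *arg, uint32_t devid)
{
	/* Fail the build if the field and the local ever diverge in type. */
	static_assert(__builtin_types_compatible_p(__typeof__(arg->out_devid),
						   __typeof__(devid)),
		      "out_devid and devid must have the same type");

	/* Name the field directly instead of computing its offset by hand. */
	if (example_copy_to_user(&arg->out_devid, &devid, sizeof(devid)))
		return -EFAULT;
	return 0;
}
```

Compared with pointer arithmetic based on offsetofend() of the previous field, naming the destination field keeps working if the struct gains members in between, and the static assertion catches a size mismatch at build time.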

> diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> index 650d45629647..9672cf839687 100644
> --- a/include/linux/iommufd.h
> +++ b/include/linux/iommufd.h
> @@ -17,6 +17,12 @@ struct iommufd_ctx;
>  struct iommufd_access;
>  struct file;
>  
> +/*
> + * iommufd core init xarray with flags==XA_FLAGS_ALLOC1, so valid
> + * ID starts from 1.
> + */
> +#define IOMMUFD_INVALID_ID 0

Why? vfio doesn't need to check this just to generate EINVAL.

Jason

* Re: [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally Yi Liu
@ 2023-02-27 19:20   ` Jason Gunthorpe
  2023-02-28  3:14     ` Liu, Yi L
  2023-02-28  6:00   ` Liu, Yi L
  1 sibling, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 19:20 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:34AM -0800, Yi Liu wrote:

> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ce390533cb30..d12384824656 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -43,7 +43,9 @@ struct vfio_device {
>  	 */
>  	const struct vfio_migration_ops *mig_ops;
>  	const struct vfio_log_ops *log_ops;
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  	struct vfio_group *group;
> +#endif
>  	struct vfio_device_set *dev_set;
>  	struct list_head dev_set_list;
>  	unsigned int migration_flags;
> @@ -58,8 +60,10 @@ struct vfio_device {
>  	refcount_t refcount;	/* user count on registered device*/
>  	unsigned int open_count;
>  	struct completion comp;
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  	struct list_head group_next;
>  	struct list_head iommu_entry;
> +#endif
>  	struct iommufd_access *iommufd_access;
>  	void (*put_kvm)(struct kvm *kvm);

I'd combine these for readability

Jason

* Re: [Intel-gfx] [PATCH v5 13/19] vfio: Add cdev_device_open_cnt to vfio_group
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
@ 2023-02-27 19:20   ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 19:20 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:29AM -0800, Yi Liu wrote:
> for counting the devices that are opened via the cdev path. This count
> is increased and decreased by the cdev path. The group path checks it
> to achieve exclusion with the cdev path. With this, only one path (group
> path or cdev path) will claim DMA ownership. This avoids scenarios in
> which devices within the same group may be opened via different paths.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/group.c | 33 +++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h  |  3 +++
>  2 files changed, 36 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
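
The exclusion between the two open paths described in the commit message above can be modelled with a small single-threaded sketch. Everything here is hypothetical apart from the cdev_device_open_cnt name: the struct, the function names and the -EBUSY return value are assumptions, and the real code would take a group lock around each of these steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-group state; locking is omitted. */
struct example_group {
	bool opened_by_group_path;         /* group fd path has claimed it */
	unsigned int cdev_device_open_cnt; /* devices opened via the cdev path */
};

/* Group path: refuse if any device was already opened through a cdev. */
static int example_group_path_open(struct example_group *g)
{
	if (g->cdev_device_open_cnt > 0)
		return -EBUSY;
	g->opened_by_group_path = true;
	return 0;
}

/* cdev path: refuse if the group path is in use, otherwise count the open. */
static int example_cdev_path_open(struct example_group *g)
{
	if (g->opened_by_group_path)
		return -EBUSY;
	g->cdev_device_open_cnt++;
	return 0;
}

/* cdev path close: drop the count taken by example_cdev_path_open(). */
static void example_cdev_path_close(struct example_group *g)
{
	assert(g->cdev_device_open_cnt > 0);
	g->cdev_device_open_cnt--;
}
```

Whichever path gets there first wins, so devices in the same group cannot end up split between the group path and the cdev path.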

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (19 preceding siblings ...)
  2023-02-27 11:31 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev5) Patchwork
@ 2023-02-27 19:21 ` Jason Gunthorpe
  2023-02-28  3:03   ` Liu, Yi L
  2023-03-01 21:01 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev6) Patchwork
  2023-03-03  7:00 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev7) Patchwork
  22 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-27 19:21 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, peterx, terrence.xu, chao.p.peng,
	linux-s390, kvm, lulu, joro, nicolinc, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Mon, Feb 27, 2023 at 03:11:16AM -0800, Yi Liu wrote:
> Existing VFIO provides group-centric user APIs for userspace. Userspace
> opens the /dev/vfio/$group_id first before getting device fd and hence
> getting access to device. This is not the desired model for iommufd. Per
> the conclusion of community discussion[1], iommufd provides device-centric
> kAPIs and requires its consumer (like VFIO) to be device-centric user
> APIs. Such user APIs are used to associate device with iommufd and also
> the I/O address spaces managed by the iommufd.
> 
> This series first introduces a per device file structure to be prepared
> for further enhancement and refactors the kvm-vfio code to be prepared
> for accepting device file from userspace. Then refactors the vfio to be
> able to handle iommufd binding. This refactor includes the mechanism of
> blocking device access before iommufd bind, making the device_open exclusive
> between the group path and the cdev path. Eventually, adds the cdev support
> for vfio device, and makes group infrastructure optional as it is not needed
> when vfio device cdev is compiled.
> 
> This is also a prerequisite for iommu nesting for vfio device[2].
> 
> The complete code can be found in below branch, simple test done with the
> legacy group path and the cdev path. Draft QEMU branch can be found at[3]
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> 
> base-commit: 63777bd2daa3625da6eada88bd9081f047664dad

This needs to be rebased onto a clean v6.3-rc1 when it comes out

Jason

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-02-27 18:22   ` Jason Gunthorpe
@ 2023-02-28  2:31     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  2:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 2:23 AM
> 
> On Mon, Feb 27, 2023 at 03:11:25AM -0800, Yi Liu wrote:
> > to indicate kernel to use the device's bound iommufd_ctx for the device
> > ownership check. Kernel should loop all the opened devices in the dev_set,
> > and check if they are bound to the same iommufd_ctx. For the devices that
> > has not been opened yet but affected, they can be reset by the current
> > users as they cannot be opened by any other user. This applies to the
> > existing group/container path as well.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 111 +++++++++++++++++++++++--------
> >  drivers/vfio/vfio.h              |  11 +++
> >  include/uapi/linux/vfio.h        |  16 +++++
> >  3 files changed, 109 insertions(+), 29 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 1bf54beeaef2..e0ebe55b4df0 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -27,11 +27,13 @@
> >  #include <linux/vgaarb.h>
> >  #include <linux/nospec.h>
> >  #include <linux/sched/mm.h>
> > +#include <linux/iommufd.h>
> 
> Is this needed anymore?

No more. Will remove it.

> >  #if IS_ENABLED(CONFIG_EEH)
> >  #include <asm/eeh.h>
> >  #endif
> >
> >  #include "vfio_pci_priv.h"
> > +#include "../vfio.h"
> 
> Don't do this, put vfio_device_iommufd() in the normal public header

OK, will put it in include/linux/vfio.h.

> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 0552e8dcf0cb..4bf11ee8de53 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -673,6 +673,22 @@ struct vfio_pci_hot_reset_info {
> >   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> >   *				    struct vfio_pci_hot_reset)
> >   *
> > + * Userspace requests hot reset for the devices it uses.  Due to the
> > + * underlying topology, multiple devices may be affected in the reset.
> > + * The affected devices may have been opened by the user or by other
> > + * users or not opened yet.  Only when all the affected devices are
> > + * either opened by the current user or not opened by any user, should
> > + * the reset request be allowed.  Otherwise, this request is expected
> > + * to return error.
> > + *
> > + * If the user uses group and container interface, it should pass down
> > + * a set of group fds for ownership check.  If the user uses iommufd, it
> > + * should pass down a zero-length group_fds array to indicate the kernel
> > + * to use the bound iommufd for the ownership check.  User that uses the
> > + * vfio iommufd compatible mode can also pass down a zero-length group_fds
> > + * array as this mode uses iommufd in kernel, and there is no reason to
> > + * forbide it.
> 
> 'forbid'

Oh, yes. Will correct it.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-27 18:29   ` Jason Gunthorpe
@ 2023-02-28  2:35     ` Liu, Yi L
  2023-02-28  6:58       ` Liu, Yi L
  2023-02-28 12:29       ` Jason Gunthorpe
  0 siblings, 2 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  2:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 2:30 AM
>
> On Mon, Feb 27, 2023 at 03:11:26AM -0800, Yi Liu wrote:
> > For the device fd opened from cdev, userspace needs to bind it to an
> > iommufd and attach it to IOAS managed by iommufd. With such operations,
> > userspace can set up a secure DMA context and hence access device.
> >
> > This changes the existing vfio_iommufd_bind() to accept a pt_id pointer
> > as an optional input, and also an dev_id pointer to selectively return
> > the dev_id to prepare for adding bind_iommufd ioctl, which does the bind
> > first and then attach IOAS.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > ---
> >  drivers/vfio/group.c     | 17 ++++++++++++++---
> >  drivers/vfio/iommufd.c   | 21 +++++++++------------
> >  drivers/vfio/vfio.h      |  9 ++++++---
> >  drivers/vfio/vfio_main.c | 10 ++++++----
> >  4 files changed, 35 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index d8771d585cb1..e44232551448 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -169,6 +169,7 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
> >  static int vfio_device_group_open(struct vfio_device_file *df)
> >  {
> >  	struct vfio_device *device = df->device;
> > +	u32 ioas_id;
> >  	int ret;
> >
> >  	mutex_lock(&device->group->group_lock);
> > @@ -177,6 +178,13 @@ static int vfio_device_group_open(struct vfio_device_file *df)
> >  		goto out_unlock;
> >  	}
> >
> > +	if (device->group->iommufd) {
> > +		ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
> > +						  &ioas_id);
> > +		if (ret)
> > +			goto out_unlock;
> > +	}
> 
> I don't really like this being moved out of iommufd.c
> 
> > Pass in a NULL pt_id and then do something like
> 
> > > -int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> > > +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
> > > +		      u32 *dev_id, u32 *pt_id)
> >  {
> > -	u32 ioas_id;
> >  	u32 device_id;
> >  	int ret;
> >
> > @@ -29,17 +29,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> >  	if (ret)
> >  		return ret;
> >
> > -	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> > -	if (ret)
> > -		goto err_unbind;
> 
> int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
> 		      u32 *dev_id, u32 *pt_id)
> {
>    u32 tmp_pt_id;
>    if (!pt_id) {
>        pt_id = &tmp_pt_id;
>        ret = iommufd_vfio_compat_ioas_id(ictx, pt_id);
>        if (ret)
> 		goto err_unbind;
> 
>    }
> 
> To handle it
> 
> And the commit message is sort of out of sync with the patch, more like:
> 
> vfio: Pass the pt_id as an argument to vfio_iommufd_bind()
> 
> To support binding the cdev the pt_id must come from userspace instead
> of being forced to the compat_ioas_id.
> 

Got it. Not only pt_id, but also dev_id. 😊
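
The optional-argument shape suggested above can be sketched in plain userspace C. This is only an illustration: `fake_ictx`, `fake_compat_ioas_id()` and `fake_bind()` are hypothetical stand-ins, not the kernel helpers, and the device ID written back is a placeholder.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iommufd_ctx; only what the sketch needs. */
struct fake_ictx {
	uint32_t compat_ioas_id;	/* 0 models "no compat IOAS available" */
};

/* Stand-in for the compat IOAS lookup; the real helper lives in iommufd. */
static int fake_compat_ioas_id(struct fake_ictx *ictx, uint32_t *out_ioas_id)
{
	if (!ictx->compat_ioas_id)
		return -ENODEV;
	*out_ioas_id = ictx->compat_ioas_id;
	return 0;
}

/*
 * A NULL pt_id selects the compat IOAS (group path); a non-NULL pt_id is
 * supplied by the caller (cdev path). dev_id is optional as well.
 * *attached_pt reports which page table ID would have been attached.
 */
static int fake_bind(struct fake_ictx *ictx, uint32_t *dev_id,
		     uint32_t *pt_id, uint32_t *attached_pt)
{
	uint32_t tmp_pt_id;
	int ret;

	if (!pt_id) {
		pt_id = &tmp_pt_id;
		ret = fake_compat_ioas_id(ictx, pt_id);
		if (ret)
			return ret;
	}

	if (dev_id)
		*dev_id = 1;	/* placeholder device ID */
	*attached_pt = *pt_id;
	return 0;
}
```

With this shape the group path would call `fake_bind(ictx, NULL, NULL, ...)` and the cdev path would pass both pointers, so the compat lookup stays inside one function.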

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-27 18:39   ` Jason Gunthorpe
@ 2023-02-28  2:51     ` Liu, Yi L
  2023-02-28 12:32       ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  2:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 2:39 AM
> 
> On Mon, Feb 27, 2023 at 03:11:33AM -0800, Yi Liu wrote:
> > This adds ioctl for userspace to attach device cdev fd to and detach
> > from IOAS/hw_pagetable managed by iommufd.
> >
> >     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> > 				   managed by iommufd. Attach can be
> > 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> > 				   or device fd close.
> >     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> > 				   IOAS or hw_pagetable managed by iommufd.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > ---
> >  drivers/vfio/device_cdev.c | 76 ++++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio.h        | 16 ++++++++
> >  drivers/vfio/vfio_main.c   |  8 ++++
> >  include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++
> >  4 files changed, 152 insertions(+)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 37f80e368551..5b5a249a6612 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -191,6 +191,82 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> >  	return ret;
> >  }
> >
> > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +			     void __user *arg)
> 
> This should be
> 
> struct vfio_device_attach_iommufd_pt __user *arg

Got it.

> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_attach_iommufd_pt attach;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > +
> > +	if (copy_from_user(&attach, (void __user *)arg, minsz))
> 
> No cast

Yes.

> > +		return -EFAULT;
> > +
> > +	if (attach.argsz < minsz || attach.flags ||
> > +	    attach.pt_id == IOMMUFD_INVALID_ID)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	if (df->noiommu) {
> > +		ret = -EINVAL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	ret = device->ops->attach_ioas(device, &attach.pt_id);
> > +	if (ret)
> > +		goto out_unlock;
> > +
> > +	ret = copy_to_user((void __user *)arg +
> > +			   offsetofend(struct vfio_device_attach_iommufd_pt, flags),
> 
> This should just be &arg->flags

Yes, can use arg->xxx here. I guess you mean &arg->pt_id.

> 
> > +			   &attach.pt_id,
> > +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> 
> Also:
> 
> static_assert(__same_type(arg->flags, attach.pt_id));

Got it, but s/arg->flags/arg->pt_id/.
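
A minimal userspace sketch of the two review points (typed pointer instead of offset arithmetic, plus a compile-time type check). The struct layout is assumed from the quoted patch, `fake_copy_to_user()` is a stand-in for copy_to_user(), and the assert relies on the GCC/Clang builtin that the kernel's __same_type() wraps:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout assumed from the quoted patch: argsz, flags, pt_id (all 32-bit). */
struct attach_arg {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pt_id;
};

/* Userspace stand-in for copy_to_user(); returns bytes NOT copied. */
static unsigned long fake_copy_to_user(void *to, const void *from,
				       unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Write the result back through the typed field, with no casts or offsets. */
static int write_back_pt_id(struct attach_arg *uarg,
			    const struct attach_arg *attach)
{
	/* Fails the build if the source and destination field types diverge. */
	_Static_assert(__builtin_types_compatible_p(__typeof__(uarg->pt_id),
						    __typeof__(attach->pt_id)),
		       "pt_id types must match");

	return fake_copy_to_user(&uarg->pt_id, &attach->pt_id,
				 sizeof(attach->pt_id)) ? -EFAULT : 0;
}
```

Naming the field directly means a later reordering of the struct cannot silently redirect the write to a different member.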

> > +	if (ret)
> > +		goto out_detach;
> > +	mutex_unlock(&device->dev_set->lock);
> > +
> > +	return 0;
> > +
> > +out_detach:
> > +	device->ops->detach_ioas(device);
> 
> 
> > +out_unlock:
> > +	mutex_unlock(&device->dev_set->lock);
> > +	return ret;
> > +}
> > +
> > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +			     void __user *arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_detach_iommufd_pt detach;
> > +	unsigned long minsz;
> > +
> > +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > +
> > +	if (copy_from_user(&detach, (void __user *)arg, minsz))
> > +		return -EFAULT;
> 
> Same comments here

Sure.
 
> > +
> > +	if (detach.argsz < minsz || detach.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	if (df->noiommu) {
> > +		mutex_unlock(&device->dev_set->lock);
> > +		return -EINVAL;
> > +	}
> 
> This seems strange. no iommu mode should have a NULL dev->iommufctx.
> Why do we have a df->noiommu at all?

This is due to vfio_device_first_open(). The details are in the comment below
(part of patch 16).

+	/*
+	 * For group/container path, iommufd pointer is NULL when comes
+	 * into this helper. Its noiommu support is handled by
+	 * vfio_device_group_use_iommu()
+	 *
+	 * For iommufd compat mode, iommufd pointer here is a valid value.
+	 * Its noiommu support is in vfio_iommufd_bind().
+	 *
+	 * For device cdev path, iommufd pointer here is a valid value for
+	 * normal cases, but it is NULL if it's noiommu. Check df->noiommu
+	 * to differentiate cdev noiommu from the group/container path which
+	 * also passes NULL iommufd pointer in. If set then do nothing.
+	 */
 	if (iommufd)
 		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
-	else
+	else if (!df->noiommu)
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
 		goto err_module_put;
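
The three cases described in that comment reduce to a small decision function. The sketch below only models the branch selection; the enum and function names are made up for illustration and nothing here calls into VFIO:

```c
#include <stdbool.h>
#include <stddef.h>

enum open_path {
	PATH_IOMMUFD_BIND,	/* vfio_iommufd_bind() would run */
	PATH_GROUP_USE_IOMMU,	/* vfio_device_group_use_iommu() would run */
	PATH_NOTHING,		/* cdev noiommu: nothing to set up */
};

/*
 * iommufd != NULL: compat mode or normal cdev path.
 * iommufd == NULL and !noiommu: legacy group/container path.
 * iommufd == NULL and noiommu: cdev noiommu, skip both helpers.
 */
static enum open_path first_open_path(const void *iommufd, bool noiommu)
{
	if (iommufd)
		return PATH_IOMMUFD_BIND;
	if (!noiommu)
		return PATH_GROUP_USE_IOMMU;
	return PATH_NOTHING;
}
```

The df->noiommu flag exists only to tell the last two cases apart, since both arrive with a NULL iommufd pointer.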

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-02-27 18:44   ` Jason Gunthorpe
@ 2023-02-28  2:57     ` Liu, Yi L
  2023-02-28 12:33       ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  2:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 2:45 AM
> 
> On Mon, Feb 27, 2023 at 03:11:27AM -0800, Yi Liu wrote:
> > diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > index c89a047a4cd8..d540cf683d93 100644
> > --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > @@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
> >  	.bind_iommufd	= vfio_iommufd_physical_bind,
> >  	.unbind_iommufd	= vfio_iommufd_physical_unbind,
> >  	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
> > +	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
> >  };
> >
> >  static struct fsl_mc_driver vfio_fsl_mc_driver = {
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index beef6ca21107..bfaa9876499b 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -88,6 +88,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
> >  {
> >  	int rc;
> >
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	if (!vdev->iommufd_device)
> > +		return -EINVAL;
> 
> This should be a WARN_ON. The vdev->iommufd_ctx should be NULL if it
> hasn't been bound, and it can't be bound unless the
> iommufd_device/attach was created.

Sure. But it would be a user-triggerable warning. If userspace triggers it on
purpose, would that be a problem for the kernel? Maybe use
dev_warn_ratelimited() instead?

> > @@ -96,6 +104,18 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
> >
> > +void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
> > +{
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	if (!vdev->iommufd_device || !vdev->iommufd_attached)
> > +		return;
> 
> Same

Sure. Will apply the same warning once we agree on the comment above.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-02-27 19:21 ` [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Jason Gunthorpe
@ 2023-02-28  3:03   ` Liu, Yi L
  2023-02-28 16:58     ` Xu, Terrence
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  3:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 3:21 AM
> 
> On Mon, Feb 27, 2023 at 03:11:16AM -0800, Yi Liu wrote:
> > Existing VFIO provides group-centric user APIs for userspace. Userspace
> > opens the /dev/vfio/$group_id first before getting device fd and hence
> > getting access to device. This is not the desired model for iommufd. Per
> > the conclusion of community discussion[1], iommufd provides device-centric
> > kAPIs and requires its consumer (like VFIO) to be device-centric user
> > APIs. Such user APIs are used to associate device with iommufd and also
> > the I/O address spaces managed by the iommufd.
> >
> > This series first introduces a per device file structure to be prepared
> > for further enhancement and refactors the kvm-vfio code to be prepared
> > for accepting device file from userspace. Then refactors the vfio to be
> > able to handle iommufd binding. This refactor includes the mechanism of
> > blocking device access before iommufd bind, making the device_open exclusive
> > between the group path and the cdev path. Eventually, adds the cdev support
> > for vfio device, and makes group infrastructure optional as it is not needed
> > when vfio device cdev is compiled.
> >
> > This is also a prerequisite for iommu nesting for vfio device[2].
> >
> > The complete code can be found in below branch, simple test done with the
> > legacy group path and the cdev path. Draft QEMU branch can be found at[3]
> >
> > https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
> > (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> >
> > base-commit: 63777bd2daa3625da6eada88bd9081f047664dad
> 
> This needs to be rebased onto a clean v6.3-rc1 when it comes out

Yes, I'll rebase and send one more version when v6.3-rc1 comes out.
For now this is just based as close as possible to the vfio code in
Alex's next branch.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path
  2023-02-27 18:52   ` Jason Gunthorpe
@ 2023-02-28  3:11     ` Liu, Yi L
  2023-02-28 12:33       ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  3:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 2:52 AM
> 
> On Mon, Feb 27, 2023 at 03:11:30AM -0800, Yi Liu wrote:
> > @@ -535,7 +542,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > -	vfio_device_group_close(df);
> > +	if (!df->is_cdev_device)
> > +		vfio_device_group_close(df);
> 
> This hunk should go in another patch

Patch 15 or 16? Which one do you prefer? I think patch 15 is better,
since the user may open cdev fds once it is applied, and the cdev release
op should not call vfio_device_group_close().

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally
  2023-02-27 19:20   ` Jason Gunthorpe
@ 2023-02-28  3:14     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  3:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 3:20 AM
> 
> On Mon, Feb 27, 2023 at 03:11:34AM -0800, Yi Liu wrote:
> 
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index ce390533cb30..d12384824656 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -43,7 +43,9 @@ struct vfio_device {
> >  	 */
> >  	const struct vfio_migration_ops *mig_ops;
> >  	const struct vfio_log_ops *log_ops;
> > +#if IS_ENABLED(CONFIG_VFIO_GROUP)
> >  	struct vfio_group *group;
> > +#endif
> >  	struct vfio_device_set *dev_set;
> >  	struct list_head dev_set_list;
> >  	unsigned int migration_flags;
> > @@ -58,8 +60,10 @@ struct vfio_device {
> >  	refcount_t refcount;	/* user count on registered device*/
> >  	unsigned int open_count;
> >  	struct completion comp;
> > +#if IS_ENABLED(CONFIG_VFIO_GROUP)
> >  	struct list_head group_next;
> >  	struct list_head iommu_entry;
> > +#endif
> >  	struct iommufd_access *iommufd_access;
> >  	void (*put_kvm)(struct kvm *kvm);
> 
> I'd combine these for readability

Sure.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device
  2023-02-27 18:55   ` Jason Gunthorpe
@ 2023-02-28  3:47     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  3:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 2:56 AM
> 
> On Mon, Feb 27, 2023 at 03:11:31AM -0800, Yi Liu wrote:
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index a8f544629467..169762316513 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -12,6 +12,18 @@ menuconfig VFIO
> >  	  If you don't know what to do here, say N.
> >
> >  if VFIO
> > +config VFIO_DEVICE_CDEV
> > +	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > +	depends on IOMMUFD && (X86 || S390 || ARM || ARM64)
> 
> We don't need to propagate this arch detection stuff, at worst it
> should be in iommufd kconfig if it is really needed.

OK. This makes sense, as cdev's real dependency is iommufd.

By the way, the arch check in the select below is not needed either, is it?
Just select CDEV if !VFIO_GROUP, right?

select VFIO_DEVICE_CDEV if !VFIO_GROUP && (X86 || S390 || ARM || ARM64)

> Also that other thread shows that vfio doesn't work on ARM because we
> can never take ownership of a device due to arm iommu

Interesting. Could you share a link to that thread? :-)

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device
  2023-02-27 19:06   ` Jason Gunthorpe
@ 2023-02-28  3:59     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  3:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 3:06 AM
> 
> On Mon, Feb 27, 2023 at 03:11:31AM -0800, Yi Liu wrote:
> > @@ -309,6 +310,13 @@ void vfio_unregister_group_dev(struct vfio_device *device)
> >  	bool interrupted = false;
> >  	long rc;
> >
> > +	/*
> > +	 * Balances vfio_device_add in register path. Putting it as the
> > +	 * first operation in unregister to prevent registration refcount
> > +	 * from incrementing per cdev open.
> > +	 */
> > +	vfio_device_del(device);
> > +
> >  	vfio_device_put_registration(device);
> >  	rc = try_wait_for_completion(&device->comp);
> >  	while (rc <= 0) {
> > @@ -334,9 +342,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
> >
> >  	vfio_device_group_unregister(device);
> >
> > -	/* Balances device_add in register path */
> > -	device_del(&device->device);
> > -
> >  	/* Balances vfio_device_set_group in register path */
> >  	vfio_device_remove_group(device);
> 
> The same rationale applies to vfio_device_group_unregister too, so it
> should be moved up as well.

You are right. The user may take a new registration refcount through the
path below, which can run in parallel with this vfio_unregister_group_dev()
path. Let me move it and refine the comment as well.

ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, )
  vfio_group_ioctl_get_device_fd()
    -> vfio_device_get_from_name()
      -> vfio_device_try_get_registration() -- refcount++

Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-27 19:19   ` Jason Gunthorpe
@ 2023-02-28  4:08     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  4:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 3:20 AM
> 
> On Mon, Feb 27, 2023 at 03:11:32AM -0800, Yi Liu wrote:
> > This adds ioctl for userspace to bind device cdev fd to iommufd.
> >
> >     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> > 			      control provided by the iommufd. open_device
> > 			      op is called after bind_iommufd op.
> > 			      VFIO no iommu mode is indicated by passing
> > 			      a negative iommufd value.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/device_cdev.c | 146 +++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio.h        |  17 ++++-
> >  drivers/vfio/vfio_main.c   |  54 ++++++++++++--
> >  include/linux/iommufd.h    |   6 ++
> >  include/uapi/linux/vfio.h  |  34 +++++++++
> >  5 files changed, 248 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 9e2c1ecaaf4f..37f80e368551 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -3,6 +3,7 @@
> >   * Copyright (c) 2023 Intel Corporation.
> >   */
> >  #include <linux/vfio.h>
> > +#include <linux/iommufd.h>
> >
> >  #include "vfio.h"
> >
> > @@ -45,6 +46,151 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> >  	return ret;
> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +	spin_lock(&df->kvm_ref_lock);
> > +	if (!df->kvm)
> > +		goto unlock;
> > +
> > +	_vfio_device_get_kvm_safe(df->device, df->kvm);
> > +
> > +unlock:
> 
> Just
> 
> if (df->kvm)
>    _vfio_device_get_kvm_safe(df->device, df->kvm);
> 
> Without the goto

Got it.

> > +	spin_unlock(&df->kvm_ref_lock);
> > +}
> > +
> > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > +	struct vfio_device *device = df->device;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/*
> > +	 * As df->access_granted writer is under dev_set->lock as well,
> > +	 * so this read no need to use smp_load_acquire() to pair with
> > +	 * smp_store_release() in the caller of vfio_device_open().
> > +	 */
> 
> This is a bit misleading, we are about to free df in the caller, so at
> this moment df has no current access. We don't even need to have the
> mutex to test it.

OK. So I can test it outside the lock and make the comment clearer?
How about the below? Or is there simply no need for a comment here?

/*
  * The caller of vfio_device_cdev_close() is about to free df, so there
  * is no need to use smp_load_acquire() to pair with the
  * smp_store_release() in the writer path of df->access_granted.
  */

> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				    unsigned long arg)
> 
> struct vfio_device_bind_iommufd __user *arg and remove all the casts.
> 
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_bind_iommufd bind;
> > +	struct iommufd_ctx *iommufd = NULL;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (bind.argsz < minsz || bind.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> > +
> > +	ret = vfio_device_block_group(device);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/*
> > +	 * If already been bound to an iommufd, or already set noiommu
> > +	 * then fail it.
> > +	 */
> > +	if (df->iommufd || df->noiommu) {
> > +		ret = -EINVAL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/* iommufd < 0 means noiommu mode */
> > +	if (bind.iommufd < 0) {
> > +		if (!capable(CAP_SYS_RAWIO)) {
> > +			ret = -EPERM;
> > +			goto out_unlock;
> > +		}
> > +		df->noiommu = true;
> > +	} else {
> > +		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> > +		if (IS_ERR(iommufd)) {
> > +			ret = PTR_ERR(iommufd);
> > +			goto out_unlock;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Before the device open, get the KVM pointer currently
> > +	 * associated with the device file (if there is) and obtain
> > +	 * a reference.  This reference is held until device closed.
> > +	 * Save the pointer in the device for use by drivers.
> > +	 */
> > +	vfio_device_get_kvm_safe(df);
> > +
> > +	df->iommufd = iommufd;
> > +	ret = vfio_device_open(df, &bind.out_devid, NULL);
> > +	if (ret)
> > +		goto out_put_kvm;
> > +
> > +	ret = copy_to_user((void __user *)arg +
> > +			   offsetofend(struct vfio_device_bind_iommufd, iommufd),
> 
> ??
> 
> &arg->out_devid
>
> static_assert(__same_type...)

Yes, all the above comments are similar to those on the other two patches.
Will refine accordingly.
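
For reference, the argsz/flags validation shared by these ioctls can be sketched in userspace as follows. The field layout is assumed from how the quoted code uses `bind` (the uAPI header in the series is authoritative), and `OFFSETOFEND` mirrors the kernel's offsetofend():

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's offsetofend(): offset just past a member. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Assumed layout, for illustration only. */
struct bind_arg {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;	/* negative selects noiommu in the quoted patch */
	uint32_t out_devid;
};

/* Reject callers whose struct is too short or that set unknown flags. */
static int check_bind_arg(const struct bind_arg *bind)
{
	size_t minsz = OFFSETOFEND(struct bind_arg, out_devid);

	if (bind->argsz < minsz || bind->flags)
		return -EINVAL;
	return 0;
}
```

Computing minsz from the last field the kernel understands lets newer userspace pass a larger struct while older, shorter ones are rejected.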

> > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > index 650d45629647..9672cf839687 100644
> > --- a/include/linux/iommufd.h
> > +++ b/include/linux/iommufd.h
> > @@ -17,6 +17,12 @@ struct iommufd_ctx;
> >  struct iommufd_access;
> >  struct file;
> >
> > +/*
> > + * iommufd core init xarray with flags==XA_FLAGS_ALLOC1, so valid
> > + * ID starts from 1.
> > + */
> > +#define IOMMUFD_INVALID_ID 0
> 
> Why? vfio doesn't need to check this just to generate EINVAL.

Hmm, you are right. It is not needed anymore.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally Yi Liu
  2023-02-27 19:20   ` Jason Gunthorpe
@ 2023-02-28  6:00   ` Liu, Yi L
  2023-02-28 12:36     ` Jason Gunthorpe
  1 sibling, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  6:00 UTC (permalink / raw)
  To: alex.williamson@redhat.com, jgg@nvidia.com, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, February 27, 2023 7:12 PM
> 
> group code is not needed for vfio device cdev, so with vfio device cdev
> introduced, the group infrastructures can be compiled out if only cdev
> is needed.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/Kconfig  | 14 +++++++++
>  drivers/vfio/Makefile |  2 +-
>  drivers/vfio/vfio.h   | 72 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/vfio.h  | 24 ++++++++++++++-
>  4 files changed, 110 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 169762316513..c3ab06c314ea 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -4,6 +4,8 @@ menuconfig VFIO
>  	select IOMMU_API
>  	depends on IOMMUFD || !IOMMUFD
>  	select INTERVAL_TREE
> +	select VFIO_GROUP if SPAPR_TCE_IOMMU
> +	select VFIO_DEVICE_CDEV if !VFIO_GROUP && (X86 || S390 || ARM || ARM64)

Got the warning below when IOMMUFD=n and VFIO_GROUP=n. So we may either
remove this select, or let VFIO_DEVICE_CDEV select IOMMUFD instead of
depending on it.

WARNING: unmet direct dependencies detected for VFIO_DEVICE_CDEV
  Depends on [n]: VFIO [=m] && IOMMUFD [=n]
  Selected by [m]:
  - VFIO [=m] && (IOMMUFD [=n] || !IOMMUFD [=n]) && !VFIO_GROUP [=n]

>  	select VFIO_CONTAINER if IOMMUFD=n

This needs to be "if IOMMUFD=n && VFIO_GROUP"; otherwise the vfio container
is compiled even when VFIO_GROUP=n.

>  	help
>  	  VFIO provides a framework for secure userspace device drivers.
> @@ -15,6 +17,7 @@ if VFIO
>  config VFIO_DEVICE_CDEV
>  	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
>  	depends on IOMMUFD && (X86 || S390 || ARM || ARM64)

This "depends on IOMMUFD" triggers the warning above when IOMMUFD=n and VFIO_GROUP=n.

> +	default !VFIO_GROUP
>  	help
>  	  The VFIO device cdev is another way for userspace to get device
>  	  access. Userspace gets device fd by opening device cdev under
> @@ -24,9 +27,20 @@ config VFIO_DEVICE_CDEV
> 
>  	  If you don't know what to do here, say N.
> 
> +config VFIO_GROUP
> +	bool "Support for the VFIO group /dev/vfio/$group_id"
> +	default y
> +	help
> +	   VFIO group support provides the traditional model for accessing
> +	   devices through VFIO and is used by the majority of userspace
> +	   applications and drivers making use of VFIO.
> +
> +	   If you don't know what to do here, say Y.
> +
>  config VFIO_CONTAINER
>  	bool "Support for the VFIO container /dev/vfio/vfio"
>  	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
> +	depends on VFIO_GROUP
>  	default y
>  	help
>  	  The VFIO container is the classic interface to VFIO for establishing
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 245394aeb94b..57c3515af606 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -2,9 +2,9 @@
>  obj-$(CONFIG_VFIO) += vfio.o
> 
>  vfio-y += vfio_main.o \
> -	  group.o \
>  	  iova_bitmap.o
>  vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
> +vfio-$(CONFIG_VFIO_GROUP) += group.o
>  vfio-$(CONFIG_IOMMUFD) += iommufd.o
>  vfio-$(CONFIG_VFIO_CONTAINER) += container.o
>  vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 5a1ceb014779..a7b88521bf48 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -62,6 +62,7 @@ enum vfio_group_type {
>  	VFIO_NO_IOMMU,
>  };
> 
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  struct vfio_group {
>  	struct device 			dev;
>  	struct cdev			cdev;
> @@ -107,6 +108,77 @@ void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
>  bool vfio_device_has_container(struct vfio_device *device);
>  int __init vfio_group_init(void);
>  void vfio_group_cleanup(void);
> +#else
> +struct vfio_group;
> +
> +static inline int vfio_device_block_group(struct vfio_device *device)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_device_unblock_group(struct vfio_device *device)
> +{
> +}
> +
> +static inline int vfio_device_set_group(struct vfio_device *device,
> +					enum vfio_group_type type)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_device_remove_group(struct vfio_device *device)
> +{
> +}
> +
> +static inline void vfio_device_group_register(struct vfio_device *device)
> +{
> +}
> +
> +static inline void vfio_device_group_unregister(struct vfio_device *device)
> +{
> +}
> +
> +static inline int vfio_device_group_use_iommu(struct vfio_device *device)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
> +{
> +}
> +
> +static inline void vfio_device_group_close(struct vfio_device_file *df)
> +{
> +}
> +
> +static inline struct vfio_group *vfio_group_from_file(struct file *file)
> +{
> +	return NULL;
> +}
> +
> +static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
> +{
> +	return true;
> +}
> +
> +static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
> +{
> +}
> +
> +static inline bool vfio_device_has_container(struct vfio_device *device)
> +{
> +	return false;
> +}
> +
> +static inline int __init vfio_group_init(void)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_group_cleanup(void)
> +{
> +}
> +#endif /* CONFIG_VFIO_GROUP */
> 
>  #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
>  /**
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ce390533cb30..d12384824656 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -43,7 +43,9 @@ struct vfio_device {
>  	 */
>  	const struct vfio_migration_ops *mig_ops;
>  	const struct vfio_log_ops *log_ops;
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  	struct vfio_group *group;
> +#endif
>  	struct vfio_device_set *dev_set;
>  	struct list_head dev_set_list;
>  	unsigned int migration_flags;
> @@ -58,8 +60,10 @@ struct vfio_device {
>  	refcount_t refcount;	/* user count on registered device*/
>  	unsigned int open_count;
>  	struct completion comp;
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  	struct list_head group_next;
>  	struct list_head iommu_entry;
> +#endif
>  	struct iommufd_access *iommufd_access;
>  	void (*put_kvm)(struct kvm *kvm);
>  #if IS_ENABLED(CONFIG_IOMMUFD)
> @@ -257,12 +261,30 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  /*
>   * External user API
>   */
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  struct iommu_group *vfio_file_iommu_group(struct file *file);
>  bool vfio_file_is_group(struct file *file);
> +bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
> +#else
> +static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
> +{
> +	return NULL;
> +}
> +
> +static inline bool vfio_file_is_group(struct file *file)
> +{
> +	return false;
> +}
> +
> +static inline bool vfio_file_has_dev(struct file *file,
> +				     struct vfio_device *device)
> +{
> +	return false;
> +}
> +#endif
>  bool vfio_file_is_valid(struct file *file);
>  bool vfio_file_enforced_coherent(struct file *file);
>  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
> -bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
> 
>  #define VFIO_PIN_PAGES_MAX_ENTRIES	(PAGE_SIZE/sizeof(unsigned long))
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28  2:35     ` Liu, Yi L
@ 2023-02-28  6:58       ` Liu, Yi L
  2023-02-28 12:31         ` Jason Gunthorpe
  2023-02-28 12:29       ` Jason Gunthorpe
  1 sibling, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28  6:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 28, 2023 10:35 AM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 2:30 AM
> >
> > On Mon, Feb 27, 2023 at 03:11:26AM -0800, Yi Liu wrote:
> > > For the device fd opened from cdev, userspace needs to bind it to an
> > > > iommufd and attach it to IOAS managed by iommufd. With such operations,
> > > > userspace can set up a secure DMA context and hence access device.
> > >
> > > This changes the existing vfio_iommufd_bind() to accept a pt_id pointer
> > > as an optional input, and also an dev_id pointer to selectively return
> > > > the dev_id to prepare for adding bind_iommufd ioctl, which does the bind
> > > > first and then attach IOAS.
> > >
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > ---
> > >  drivers/vfio/group.c     | 17 ++++++++++++++---
> > >  drivers/vfio/iommufd.c   | 21 +++++++++------------
> > >  drivers/vfio/vfio.h      |  9 ++++++---
> > >  drivers/vfio/vfio_main.c | 10 ++++++----
> > >  4 files changed, 35 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > index d8771d585cb1..e44232551448 100644
> > > --- a/drivers/vfio/group.c
> > > +++ b/drivers/vfio/group.c
> > > @@ -169,6 +169,7 @@ static void
> > vfio_device_group_get_kvm_safe(struct vfio_device *device)
> > >  static int vfio_device_group_open(struct vfio_device_file *df)
> > >  {
> > >  	struct vfio_device *device = df->device;
> > > +	u32 ioas_id;
> > >  	int ret;
> > >
> > >  	mutex_lock(&device->group->group_lock);
> > > @@ -177,6 +178,13 @@ static int vfio_device_group_open(struct
> > vfio_device_file *df)
> > >  		goto out_unlock;
> > >  	}
> > >
> > > +	if (device->group->iommufd) {
> > > +		ret = iommufd_vfio_compat_ioas_id(device->group-
> > >iommufd,
> > > +						  &ioas_id);
> > > +		if (ret)
> > > +			goto out_unlock;
> > > +	}
> >
> > I don't really like this being moved out of iommufd.c
> >
> > > Pass in a NULL pt_id and then do something like:
> >
> > > > -int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> > > > +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
> > > +		      u32 *dev_id, u32 *pt_id)
> > >  {
> > > -	u32 ioas_id;
> > >  	u32 device_id;
> > >  	int ret;
> > >
> > > @@ -29,17 +29,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev,
> > struct iommufd_ctx *ictx)
> > >  	if (ret)
> > >  		return ret;
> > >
> > > -	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> > > -	if (ret)
> > > -		goto err_unbind;
> >
> > >   vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
> > 		      u32 *dev_id, u32 *pt_id)
> > {
> >    u32 tmp_pt_id;
> >    if (!pt_id) {
> >        pt_id = &tmp_pt_id;
> >        ret = iommufd_vfio_compat_ioas_id(ictx, pt_id);
> >        if (ret)
> > 		goto err_unbind;
> >
> >    }
> >
> > To handle it
> >
> > And the commit message is sort of out of sync with the patch, more like:
> >
> > vfio: Pass the pt_id as an argument to vfio_iommufd_bind()
> >
> > To support binding the cdev the pt_id must come from userspace instead
> > of being forced to the compat_ioas_id.
> >

It seems pt_id is no longer needed in vfio_iommufd_bind(), since the
function can get the compat_ioas_id itself. The cdev path never passes
a pt_id to vfio_iommufd_bind() because its attach is done by a separate
ATTACH ioctl. Can we use the dev_id pointer to indicate whether it
needs to get the compat IOAS and attach it?

int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
		      u32 *dev_id)
{
...
	/* A NULL dev_id marks the group path: attach the compat IOAS here. */
	if (!dev_id) {
		u32 ioas_id;

		ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
		if (ret)
			goto err_unbind;

		ret = vdev->ops->attach_ioas(vdev, &ioas_id);
		if (ret)
			goto err_unbind;
	}
...
}

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28  2:35     ` Liu, Yi L
  2023-02-28  6:58       ` Liu, Yi L
@ 2023-02-28 12:29       ` Jason Gunthorpe
  2023-02-28 12:48         ` Liu, Yi L
  1 sibling, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:29 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 02:35:25AM +0000, Liu, Yi L wrote:

> > And the commit message is sort of out of sync with the patch, more like:
> > 
> > vfio: Pass the pt_id as an argument to vfio_iommufd_bind()
> > 
> > To support binding the cdev the pt_id must come from userspace instead
> > of being forced to the compat_ioas_id.
> > 
> 
> Got it. not only pt_id, also dev_id. 😊

Maybe dev_id should be read back from the iommufd_device pointer in
the vfio_device. It is trivially stored in that memory already.
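
As a rough userspace illustration of that idea (not kernel code), the sketch below reads the ID back from the bound device object instead of passing a dev_id out-parameter through bind. All mock_* names and the structure layouts are hypothetical stand-ins invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real kernel structures. */
struct mock_iommufd_device {
	uint32_t id;	/* object ID assigned at bind time */
};

struct mock_vfio_device {
	struct mock_iommufd_device *iommufd_device;	/* NULL until bound */
};

/*
 * Hypothetical helper: fetch the ID from the bound iommufd_device rather
 * than threading a u32 *dev_id out-parameter through the bind path.
 * Returns 0 on success, -EINVAL if the device has not been bound.
 */
int mock_vfio_device_get_dev_id(const struct mock_vfio_device *vdev,
				uint32_t *dev_id)
{
	assert(vdev != NULL && dev_id != NULL);

	if (!vdev->iommufd_device)
		return -EINVAL;

	*dev_id = vdev->iommufd_device->id;
	return 0;
}
```

With an accessor of this shape, only the caller that actually reports the ID to userspace needs to look it up, and the bind signature can stay narrow.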

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28  6:58       ` Liu, Yi L
@ 2023-02-28 12:31         ` Jason Gunthorpe
  2023-02-28 12:45           ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:31 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 06:58:38AM +0000, Liu, Yi L wrote:

> Seems like pt_id is no more needed in the vfio_iommufd_bind()
> since it can get compat_ioas_id in the function itself. Cdev path
> never passes a pt_id to vfio_iommufd_bind() as its attach is done
> by separate ATTACH ioctl. Can we use the dev_id pointer to indicate
> if it needs to get the compat ioas and attach it?

In this case you need to split the group code so that it also uses the
two-step attach; the attach will then take in the NULL pt_id.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28  2:51     ` Liu, Yi L
@ 2023-02-28 12:32       ` Jason Gunthorpe
  2023-02-28 12:42         ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:32 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 02:51:28AM +0000, Liu, Yi L wrote:
> > This seems strange. no iommu mode should have a NULL dev->iommufctx.
> > Why do we have a df->noiommu at all?
> 
> This is due to the vfio_device_first_open(). Detail as below comment (part of
> patch 0016).
> 
> +	/*
> +	 * For group/container path, iommufd pointer is NULL when comes
> +	 * into this helper. Its noiommu support is handled by
> +	 * vfio_device_group_use_iommu()
> +	 *
> +	 * For iommufd compat mode, iommufd pointer here is a valid value.
> +	 * Its noiommu support is in vfio_iommufd_bind().
> +	 *
> +	 * For device cdev path, iommufd pointer here is a valid value for
> +	 * normal cases, but it is NULL if it's noiommu. Check df->noiommu
> +	 * to differentiate cdev noiommu from the group/container path which
> +	 * also passes NULL iommufd pointer in. If set then do nothing.
> +	 */

If the group is in iommufd mode then it should set this pointer too.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-02-28  2:57     ` Liu, Yi L
@ 2023-02-28 12:33       ` Jason Gunthorpe
  2023-02-28 12:43         ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:33 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 02:57:42AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 2:45 AM
> > 
> > On Mon, Feb 27, 2023 at 03:11:27AM -0800, Yi Liu wrote:
> > > diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > index c89a047a4cd8..d540cf683d93 100644
> > > --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > @@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
> > >  	.bind_iommufd	= vfio_iommufd_physical_bind,
> > >  	.unbind_iommufd	= vfio_iommufd_physical_unbind,
> > >  	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
> > > +	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
> > >  };
> > >
> > >  static struct fsl_mc_driver vfio_fsl_mc_driver = {
> > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > index beef6ca21107..bfaa9876499b 100644
> > > --- a/drivers/vfio/iommufd.c
> > > +++ b/drivers/vfio/iommufd.c
> > > @@ -88,6 +88,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
> > >  {
> > >  	int rc;
> > >
> > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > +
> > > +	if (!vdev->iommufd_device)
> > > +		return -EINVAL;
> > 
> > This should be a WARN_ON. The vdev->iommufd_ctx should be NULL if it
> > hasn't been bound, and it can't be bound unless the
> > iommufd_device/attach was created.
> 
> sure. But it is a user-triggerable warn. If userspace triggers it on
> purpose, will it be a bad thing for kernel? Maybe use
> dev_warn_ratelimited()?

How can it be user triggerable? You shouldn't be able to reach this
function until the device is bound, because the ioctl should come after
the "is it bound" check.
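
A minimal userspace sketch of that ordering, assuming a per-file "bound" flag that gates every ioctl: the attach helper's own NULL check then becomes an invariant check rather than a user-reachable error path. The mock_* names and the warning counter (standing in for WARN_ON()) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state; loosely mirrors struct vfio_device_file. */
struct mock_device_file {
	bool bound;		/* set once the bind ioctl has succeeded */
	void *iommufd_device;	/* non-NULL once bound */
};

/* Counts invariant violations, standing in for the kernel's WARN_ON(). */
int mock_warn_count;

int mock_attach_ioas(struct mock_device_file *df)
{
	/* Unreachable from userspace if the ioctl gate below does its job. */
	if (!df->iommufd_device) {
		mock_warn_count++;
		return -EINVAL;
	}
	return 0;
}

/* ioctl entry point: the "is it bound" check runs before any dispatch. */
int mock_attach_ioctl(struct mock_device_file *df)
{
	assert(df != NULL);

	if (!df->bound)
		return -EINVAL;

	return mock_attach_ioas(df);
}
```

An unbound file is rejected at the gate, so the warning counter never moves on that path; it would only fire on a genuine kernel-side bug.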

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path
  2023-02-28  3:11     ` Liu, Yi L
@ 2023-02-28 12:33       ` Jason Gunthorpe
  2023-03-01 13:58         ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:33 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 03:11:34AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 2:52 AM
> > 
> > On Mon, Feb 27, 2023 at 03:11:30AM -0800, Yi Liu wrote:
> > > @@ -535,7 +542,8 @@ static int vfio_device_fops_release(struct inode
> > *inode, struct file *filep)
> > >  	struct vfio_device_file *df = filep->private_data;
> > >  	struct vfio_device *device = df->device;
> > >
> > > -	vfio_device_group_close(df);
> > > +	if (!df->is_cdev_device)
> > > +		vfio_device_group_close(df);
> > 
> > This hunk should go in another patch
> 
> Patch 15 or 16? Which one is your preference? To me, I guess patch
> 15 is better since the user may open cdev fds after it. But its release
> op should not call vfio_device_group_close();

It should go with the patch that allows creating the struct file
without calling vfio_device_group_open().

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally
  2023-02-28  6:00   ` Liu, Yi L
@ 2023-02-28 12:36     ` Jason Gunthorpe
  2023-03-01 13:59       ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:36 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 06:00:09AM +0000, Liu, Yi L wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, February 27, 2023 7:12 PM
> > 
> > group code is not needed for vfio device cdev, so with vfio device cdev
> > introduced, the group infrastructures can be compiled out if only cdev
> > is needed.
> > 
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/Kconfig  | 14 +++++++++
> >  drivers/vfio/Makefile |  2 +-
> >  drivers/vfio/vfio.h   | 72 +++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/vfio.h  | 24 ++++++++++++++-
> >  4 files changed, 110 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index 169762316513..c3ab06c314ea 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -4,6 +4,8 @@ menuconfig VFIO
> >  	select IOMMU_API
> >  	depends on IOMMUFD || !IOMMUFD
> >  	select INTERVAL_TREE
> > +	select VFIO_GROUP if SPAPR_TCE_IOMMU
> > +	select VFIO_DEVICE_CDEV if !VFIO_GROUP && (X86 || S390 || ARM || ARM64)
> 
> Got below warning when IOMMUFD=n, VFIO_GROUP=n. so may remove
> this select or needs to let VFIO_DEVICE_CDEV select IOMMUFD instead of
> depends on IOMMUFD.

Add

select VFIO_GROUP if !IOMMUFD

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 12:32       ` Jason Gunthorpe
@ 2023-02-28 12:42         ` Liu, Yi L
  2023-02-28 12:53           ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 12:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:32 PM
> 
> On Tue, Feb 28, 2023 at 02:51:28AM +0000, Liu, Yi L wrote:
> > > This seems strange. no iommu mode should have a NULL dev->iommufctx.
> > > Why do we have a df->noiommu at all?
> >
> > This is due to the vfio_device_first_open(). Detail as below comment (part of
> > patch 0016).
> >
> > +	/*
> > +	 * For group/container path, iommufd pointer is NULL when comes
> > +	 * into this helper. Its noiommu support is handled by
> > +	 * vfio_device_group_use_iommu()
> > +	 *
> > +	 * For iommufd compat mode, iommufd pointer here is a valid value.
> > +	 * Its noiommu support is in vfio_iommufd_bind().
> > +	 *
> > +	 * For device cdev path, iommufd pointer here is a valid value for
> > +	 * normal cases, but it is NULL if it's noiommu. Check df->noiommu
> > +	 * to differentiate cdev noiommu from the group/container path which
> > +	 * also passes NULL iommufd pointer in. If set then do nothing.
> > +	 */
> 
> If the group is in iommufd mode then it should set this pointer too.

Yes, but the key point is that both the group in legacy mode and the
cdev path (in its noiommu case) set iommufd==NULL, and the two must be
handled differently. So this extra info is needed to differentiate them
in vfio_device_first_open().
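
To make the three cases from the quoted comment concrete, here is a small userspace sketch of the decision, assuming the flag works as described there. The enum, the structure, and the function name are hypothetical and only model the control flow, not the real vfio_device_first_open().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the first-open decision. */
enum mock_open_action {
	MOCK_GROUP_USE_IOMMU,	/* legacy group/container path */
	MOCK_IOMMUFD_BIND,	/* iommufd compat mode or normal cdev path */
	MOCK_NOIOMMU_NOTHING,	/* cdev noiommu: nothing to set up */
};

struct mock_device_file {
	const void *iommufd;	/* NULL for legacy group and for cdev noiommu */
	bool noiommu;		/* set only by the cdev noiommu path */
};

enum mock_open_action mock_first_open_action(const struct mock_device_file *df)
{
	assert(df != NULL);

	if (df->iommufd)
		return MOCK_IOMMUFD_BIND;
	/* iommufd is NULL here, so the flag is the only thing left to go on. */
	if (df->noiommu)
		return MOCK_NOIOMMU_NOTHING;
	return MOCK_GROUP_USE_IOMMU;
}
```

Without the flag, the last two branches would be indistinguishable, which is the ambiguity being described.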

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-02-28 12:33       ` Jason Gunthorpe
@ 2023-02-28 12:43         ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 12:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:33 PM
>
> On Tue, Feb 28, 2023 at 02:57:42AM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 28, 2023 2:45 AM
> > >
> > > On Mon, Feb 27, 2023 at 03:11:27AM -0800, Yi Liu wrote:
> > > > diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > > index c89a047a4cd8..d540cf683d93 100644
> > > > --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > > +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > > @@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
> > > >  	.bind_iommufd	= vfio_iommufd_physical_bind,
> > > >  	.unbind_iommufd	= vfio_iommufd_physical_unbind,
> > > >  	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
> > > > +	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
> > > >  };
> > > >
> > > >  static struct fsl_mc_driver vfio_fsl_mc_driver = {
> > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > index beef6ca21107..bfaa9876499b 100644
> > > > --- a/drivers/vfio/iommufd.c
> > > > +++ b/drivers/vfio/iommufd.c
> > > > @@ -88,6 +88,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
> > > >  {
> > > >  	int rc;
> > > >
> > > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > > +
> > > > +	if (!vdev->iommufd_device)
> > > > +		return -EINVAL;
> > >
> > > This should be a WARN_ON. The vdev->iommufd_ctx should be NULL if it
> > > hasn't been bound, and it can't be bound unless the
> > > iommufd_device/attach was created.
> >
> > sure. But it is a user-triggerable warn. If userspace triggers it on
> > purpose, will it be a bad thing for kernel? Maybe use
> > dev_warn_ratelimited()?
> 
> How can it be user triggerable? You shouldn't be able to reach this
> function until the device is bound because the ioctl should be after
> the is it bound check

Oh, yes, you are right. The ioctls are blocked until the device is bound
to iommufd.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28 12:31         ` Jason Gunthorpe
@ 2023-02-28 12:45           ` Liu, Yi L
  2023-02-28 12:52             ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 12:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:31 PM
> 
> On Tue, Feb 28, 2023 at 06:58:38AM +0000, Liu, Yi L wrote:
> 
> > Seems like pt_id is no more needed in the vfio_iommufd_bind()
> > since it can get compat_ioas_id in the function itself. Cdev path
> > never passes a pt_id to vfio_iommufd_bind() as its attach is done
> > by separate ATTACH ioctl. Can we use the dev_id pointer to indicate
> > if it needs to get the compat ioas and attach it?
> 
> In this case you need to split the group code to also use the two step
> attach and then the attach will take in the null pt_id.

This seems to be the current way in this patch, right? The group code
passes a pt_id pointer to vfio_iommufd_bind(), while the cdev path just
passes in a NULL pt_id pointer. Its attach is done later, when the user
gives a pt_id.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28 12:29       ` Jason Gunthorpe
@ 2023-02-28 12:48         ` Liu, Yi L
  2023-02-28 12:52           ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 12:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:30 PM
> 
> On Tue, Feb 28, 2023 at 02:35:25AM +0000, Liu, Yi L wrote:
> 
> > > And the commit message is sort of out of sync with the patch, more like:
> > >
> > > vfio: Pass the pt_id as an argument to vfio_iommufd_bind()
> > >
> > > To support binding the cdev the pt_id must come from userspace instead
> > > of being forced to the compat_ioas_id.
> > >
> >
> > Got it. not only pt_id, also dev_id. 😊
> 
> Maybe dev_id should be read back from the iommufd_device pointer in
> the vfio_device. It is trivially stored in that memory already

Yes. This leaves me with a question: why does iommufd_device_bind()
return both the iommufd_device pointer and the ID, given that the ID is
already stored in the iommufd_device?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28 12:45           ` Liu, Yi L
@ 2023-02-28 12:52             ` Jason Gunthorpe
  2023-02-28 12:56               ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:52 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 12:45:47PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 8:31 PM
> > 
> > On Tue, Feb 28, 2023 at 06:58:38AM +0000, Liu, Yi L wrote:
> > 
> > > Seems like pt_id is no more needed in the vfio_iommufd_bind()
> > > since it can get compat_ioas_id in the function itself. Cdev path
> > > never passes a pt_id to vfio_iommufd_bind() as its attach is done
> > > by separate ATTACH ioctl. Can we use the dev_id pointer to indicate
> > > if it needs to get the compat ioas and attach it?
> > 
> > In this case you need to split the group code to also use the two step
> > attach and then the attach will take in the null pt_id.
> 
> This seems to be the current way in this patch. Right? Group code passes
> a pt_id pointer to vfio_iommufd_bind(). While the cdev path just passes
> in a null pt_id pointer. Its attach is done later when user gives pt_id.

I mean actually call attach explicitly and remove the implicit attach
from the bind flow entirely.
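
One possible reading of that suggestion, sketched in userspace C: bind does nothing but bind, and both the group path and the cdev path call a separate attach step, where a NULL pt_id selects the compat IOAS. Every mock_* name is hypothetical, and the -ENODEV return for a missing compat IOAS is an assumption made for this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for iommufd_ctx and vfio_device. */
struct mock_ictx {
	bool has_compat_ioas;
	uint32_t compat_ioas_id;
};

struct mock_vdev {
	struct mock_ictx *ictx;	/* set by bind */
	bool attached;
	uint32_t pt_id;		/* valid only while attached */
};

/* Models iommufd_vfio_compat_ioas_id(): look up the compat IOAS, if any. */
int mock_compat_ioas_id(const struct mock_ictx *ictx, uint32_t *out_id)
{
	if (!ictx->has_compat_ioas)
		return -ENODEV;	/* assumed error code for this sketch */
	*out_id = ictx->compat_ioas_id;
	return 0;
}

/* Step 1: bind only; no implicit attach happens here. */
int mock_bind(struct mock_vdev *vdev, struct mock_ictx *ictx)
{
	assert(vdev != NULL && ictx != NULL);

	if (vdev->ictx)
		return -EBUSY;
	vdev->ictx = ictx;
	return 0;
}

/*
 * Step 2: explicit attach. A NULL pt_id selects the compat IOAS (group
 * path); a non-NULL pt_id is the value supplied by userspace (cdev path).
 */
int mock_attach(struct mock_vdev *vdev, const uint32_t *pt_id)
{
	uint32_t compat_id;
	int ret;

	if (!vdev->ictx)
		return -EINVAL;	/* not bound yet */

	if (!pt_id) {
		ret = mock_compat_ioas_id(vdev->ictx, &compat_id);
		if (ret)
			return ret;
		pt_id = &compat_id;
	}

	vdev->pt_id = *pt_id;
	vdev->attached = true;
	return 0;
}
```

In this shape the group open path would call mock_bind() followed by mock_attach(vdev, NULL), while the cdev path would call mock_bind() at bind time and mock_attach() only when the ATTACH ioctl arrives.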

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28 12:48         ` Liu, Yi L
@ 2023-02-28 12:52           ` Jason Gunthorpe
  2023-02-28 13:24             ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:52 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 12:48:23PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 8:30 PM
> > 
> > On Tue, Feb 28, 2023 at 02:35:25AM +0000, Liu, Yi L wrote:
> > 
> > > > And the commit message is sort of out of sync with the patch, more like:
> > > >
> > > > vfio: Pass the pt_id as an argument to vfio_iommufd_bind()
> > > >
> > > > To support binding the cdev the pt_id must come from userspace instead
> > > > of being forced to the compat_ioas_id.
> > > >
> > >
> > > Got it. not only pt_id, also dev_id. 😊
> > 
> > Maybe dev_id should be read back from the iommufd_device pointer in
> > the vfio_device. It is trivially stored in that memory already
> 
> Yes. this somehow gives me a doubt. Why iommufd_device_bind() returns
> both iommufd_device pointer and the id back as id is already stored in the
> iommufd_device. Is it?

Yes, it was done this way to avoid another API to get the ID, but
perhaps that is more convenient for vfio anyhow. We could get rid of
the id return pointer as well.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 12:42         ` Liu, Yi L
@ 2023-02-28 12:53           ` Jason Gunthorpe
  2023-02-28 13:22             ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:53 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 12:42:31PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 8:32 PM
> > 
> > On Tue, Feb 28, 2023 at 02:51:28AM +0000, Liu, Yi L wrote:
> > > > This seems strange. no iommu mode should have a NULL dev->iommufctx.
> > > > Why do we have a df->noiommu at all?
> > >
> > > This is due to the vfio_device_first_open(). Detail as below comment (part of
> > > patch 0016).
> > >
> > > +	/*
> > > +	 * For group/container path, iommufd pointer is NULL when comes
> > > +	 * into this helper. Its noiommu support is handled by
> > > +	 * vfio_device_group_use_iommu()
> > > +	 *
> > > +	 * For iommufd compat mode, iommufd pointer here is a valid value.
> > > +	 * Its noiommu support is in vfio_iommufd_bind().
> > > +	 *
> > > +	 * For device cdev path, iommufd pointer here is a valid value for
> > > +	 * normal cases, but it is NULL if it's noiommu. Check df->noiommu
> > > +	 * to differentiate cdev noiommu from the group/container path which
> > > +	 * also passes NULL iommufd pointer in. If set then do nothing.
> > > +	 */
> > 
> > If the group is in iommufd mode then it should set this pointer too.
> 
> Yes, but the key point is that both the group in legacy mode and the
> cdev path sets iommufd==NULL. And the handling for the two should
> be different. So needs this extra info to differentiate them in
> vfio_device_first_open().

Don't encode that in the iommufd pointer, it is confusing.

A null iommufd pointer and a bound df flag is sufficient to see that
it is compat mode.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28 12:52             ` Jason Gunthorpe
@ 2023-02-28 12:56               ` Liu, Yi L
  2023-02-28 12:58                 ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 12:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:52 PM
> 
> On Tue, Feb 28, 2023 at 12:45:47PM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 28, 2023 8:31 PM
> > >
> > > On Tue, Feb 28, 2023 at 06:58:38AM +0000, Liu, Yi L wrote:
> > >
> > > > Seems like pt_id is no more needed in the vfio_iommufd_bind()
> > > > since it can get compat_ioas_id in the function itself. Cdev path
> > > > never passes a pt_id to vfio_iommufd_bind() as its attach is done
> > > > by separate ATTACH ioctl. Can we use the dev_id pointer to indicate
> > > > if it needs to get the compat ioas and attach it?
> > >
> > > In this case you need to split the group code to also use the two step
> > > attach and then the attach will take in the null pt_id.
> >
> > This seems to be the current way in this patch. Right? Group code passes
> > a pt_id pointer to vfio_iommufd_bind(). While the cdev path just passes
> > in a null pt_id pointer. Its attach is done later when user gives pt_id.
> 
> I mean actually explicitly call attach and remove the implicit attach
> during bind flow entirely.

Okay, so I can wrap iommufd_vfio_compat_ioas_id() and ops->attach_ioas
in a helper that the group code calls to attach after bind_iommufd. This avoids
moving iommufd_vfio_compat_ioas_id() out of iommufd.c, in line with your original
remark.

Is this ok?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28 12:56               ` Liu, Yi L
@ 2023-02-28 12:58                 ` Jason Gunthorpe
  0 siblings, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 12:58 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 12:56:11PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 8:52 PM
> > 
> > On Tue, Feb 28, 2023 at 12:45:47PM +0000, Liu, Yi L wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Tuesday, February 28, 2023 8:31 PM
> > > >
> > > > On Tue, Feb 28, 2023 at 06:58:38AM +0000, Liu, Yi L wrote:
> > > >
> > > > > Seems like pt_id is no more needed in the vfio_iommufd_bind()
> > > > > since it can get compat_ioas_id in the function itself. Cdev path
> > > > > never passes a pt_id to vfio_iommufd_bind() as its attach is done
> > > > > by separate ATTACH ioctl. Can we use the dev_id pointer to indicate
> > > > > if it needs to get the compat ioas and attach it?
> > > >
> > > > In this case you need to split the group code to also use the two step
> > > > attach and then the attach will take in the null pt_id.
> > >
> > > This seems to be the current way in this patch. Right? Group code passes
> > > a pt_id pointer to vfio_iommufd_bind(). While the cdev path just passes
> > > in a null pt_id pointer. Its attach is done later when user gives pt_id.
> > 
> > I mean actually explicitly call attach and remove the implicit attach
> > during bind flow entirely.
> 
> Okay, so I can wrap iommufd_vfio_compat_ioas_id() and ops->attach_ioas
> in a helper that the group code calls to attach after bind_iommufd. This avoids
> moving iommufd_vfio_compat_ioas_id() out of iommufd.c, in line with your original
> remark.
> 
> Is this ok?

Yes, some 'attach compat' helper makes sense
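
A minimal sketch of what such an 'attach compat' helper could look like,
using mock types rather than the real vfio/iommufd structures. Every name
prefixed mock_ is hypothetical, and the -ENODEV return for a missing compat
IOAS is an assumption made for illustration. The point is only the ordering:
bind first, then an explicit attach, with an unbind if the attach fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel objects; not the real definitions. */
struct mock_iommufd_ctx {
	int has_compat_ioas;      /* nonzero once a compat IOAS exists */
	uint32_t compat_ioas_id;
};

struct mock_device {
	struct mock_iommufd_ctx *ictx; /* set by bind, cleared by unbind */
	int attached;
	uint32_t pt_id;
};

/* Stand-in for iommufd_vfio_compat_ioas_id(); the error code is assumed. */
static int mock_compat_ioas_id(struct mock_iommufd_ctx *ictx, uint32_t *out)
{
	if (!ictx->has_compat_ioas)
		return -ENODEV;
	*out = ictx->compat_ioas_id;
	return 0;
}

/* Bind only records the context; it does not attach implicitly. */
static int mock_bind(struct mock_device *dev, struct mock_iommufd_ctx *ictx)
{
	if (dev->ictx)
		return -EBUSY;
	dev->ictx = ictx;
	return 0;
}

static void mock_unbind(struct mock_device *dev)
{
	dev->attached = 0;
	dev->ictx = NULL;
}

/* Stand-in for ops->attach_ioas(). */
static int mock_attach(struct mock_device *dev, uint32_t pt_id)
{
	dev->pt_id = pt_id;
	dev->attached = 1;
	return 0;
}

/* The 'attach compat' helper: look up the compat IOAS, then attach to it. */
static int mock_attach_compat(struct mock_device *dev)
{
	uint32_t ioas_id;
	int ret;

	assert(dev->ictx != NULL); /* caller must have bound first */
	ret = mock_compat_ioas_id(dev->ictx, &ioas_id);
	if (ret)
		return ret;
	return mock_attach(dev, ioas_id);
}

/* Group path: explicit two-step bind then attach, unwinding on failure. */
int mock_group_open(struct mock_device *dev, struct mock_iommufd_ctx *ictx)
{
	int ret = mock_bind(dev, ictx);

	if (ret)
		return ret;
	ret = mock_attach_compat(dev);
	if (ret)
		mock_unbind(dev);
	return ret;
}
```

In this shape the cdev path would call only the bind step and leave the
attach to the later ATTACH ioctl, so bind itself never needs a pt_id.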

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 12:53           ` Jason Gunthorpe
@ 2023-02-28 13:22             ` Liu, Yi L
  2023-02-28 13:25               ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 13:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:54 PM
> 
> On Tue, Feb 28, 2023 at 12:42:31PM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 28, 2023 8:32 PM
> > >
> > > On Tue, Feb 28, 2023 at 02:51:28AM +0000, Liu, Yi L wrote:
> > > > > This seems strange. no iommu mode should have a NULL dev->iommufctx.
> > > > > Why do we have a df->noiommu at all?
> > > >
> > > > This is due to the vfio_device_first_open(). Detail as below comment (part of
> > > > patch 0016).
> > > >
> > > > +	/*
> > > > +	 * For group/container path, iommufd pointer is NULL when comes
> > > > +	 * into this helper. Its noiommu support is handled by
> > > > +	 * vfio_device_group_use_iommu()
> > > > +	 *
> > > > +	 * For iommufd compat mode, iommufd pointer here is a valid value.
> > > > +	 * Its noiommu support is in vfio_iommufd_bind().
> > > > +	 *
> > > > +	 * For device cdev path, iommufd pointer here is a valid value for
> > > > +	 * normal cases, but it is NULL if it's noiommu. Check df->noiommu
> > > > +	 * to differentiate cdev noiommu from the group/container path which
> > > > +	 * also passes NULL iommufd pointer in. If set then do nothing.
> > > > +	 */
> > >
> > > If the group is in iommufd mode then it should set this pointer too.
> >
> > Yes, but the key point is that both the group in legacy mode and the
> > cdev path sets iommufd==NULL. And the handling for the two should
> > be different. So needs this extra info to differentiate them in
> > vfio_device_first_open().
> 
> Don't encode that in the iommufd pointer, it is confusing.

Maybe I failed to make it clear. As the code below shows, when
iommufd != NULL there is no need to differentiate whether it is
the group compat mode or the cdev path. But if iommufd == NULL,
it may be either the legacy group path or the cdev noiommu mode, so
df->noiommu was added. I agree this noiommu flag is confusing, though.
We could use a df->is_cdev_device flag instead, since the purpose here is to
differentiate the cdev path from the group path.

	if (iommufd)
		ret = vfio_iommufd_bind(device, iommufd, dev_id);
	else if (!df->noiommu)
		ret = vfio_device_group_use_iommu(device);
	if (ret)
		goto err_module_put;


> A null iommufd pointer and a bound df flag is sufficient to see that
> it is compat mode.

Hope df->is_cdev_device meets your expectation. :-) The code would look
like this:

	if (iommufd) {
		ret = vfio_iommufd_bind(device, iommufd, dev_id);

		if (!ret && !df->is_cdev_device) {
			/* new helper, as discussed for patch 10 */
			ret = vfio_iommufd_attach_compat(device);
			if (ret)
				vfio_iommufd_unbind(device);
		}
	} else if (!df->is_cdev_device) {
		ret = vfio_device_group_use_iommu(device);
	}
	if (ret)
		goto err_module_put;

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-28 12:52           ` Jason Gunthorpe
@ 2023-02-28 13:24             ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 13:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:53 PM
> 
> On Tue, Feb 28, 2023 at 12:48:23PM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 28, 2023 8:30 PM
> > >
> > > On Tue, Feb 28, 2023 at 02:35:25AM +0000, Liu, Yi L wrote:
> > >
> > > > > And the commit message is sort of out of sync with the patch, more like:
> > > > >
> > > > > vfio: Pass the pt_id as an argument to vfio_iommufd_bind()
> > > > >
> > > > > To support binding the cdev the pt_id must come from userspace instead
> > > > > of being forced to the compat_ioas_id.
> > > > >
> > > >
> > > > Got it. not only pt_id, also dev_id. 😊
> > >
> > > Maybe dev_id should be read back from the iommufd_device pointer in
> > > the vfio_device. It is trivially stored in that memory already
> >
> > Yes. this somehow gives me a doubt. Why iommufd_device_bind() returns
> > both iommufd_device pointer and the id back as id is already stored in the
> > iommufd_device. Is it?
> 
> Yes, it was done this way to avoid another API to get the ID, but
> perhaps that is more convenient for vfio anyhow. We could get rid of
> the id return pointer as well

Ok, maybe I can write a small patch that adds an API like iommufd_device_id()
to get the devid and drops the id return pointer, either as part of this
series or as an independent prerequisite patch for it.
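
A rough sketch of that idea, with mock types only. The structure layouts and
every mock_ name are hypothetical and not taken from the iommufd code. It
shows the bind step without an ID out-pointer and a small accessor that
reads the ID back from the already-stored device object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: the device object already stores its own ID. */
struct mock_iommufd_device {
	uint32_t id;
};

struct mock_vfio_device {
	struct mock_iommufd_device *idev; /* kept after a successful bind */
};

/* Accessor in the spirit of the proposed iommufd_device_id(). */
uint32_t mock_iommufd_device_id(const struct mock_iommufd_device *idev)
{
	assert(idev != NULL);
	return idev->id;
}

/* Bind no longer takes a separate pointer for returning the ID. */
void mock_bind(struct mock_vfio_device *vdev, struct mock_iommufd_device *idev)
{
	vdev->idev = idev;
}

/* The caller reads the ID back only when it has to report it. */
uint32_t mock_vfio_devid(const struct mock_vfio_device *vdev)
{
	return mock_iommufd_device_id(vdev->idev);
}
```

With this, only the cdev ioctl path that reports out_devid to userspace
would need to call the accessor.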

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 13:22             ` Liu, Yi L
@ 2023-02-28 13:25               ` Jason Gunthorpe
  2023-02-28 13:36                 ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 13:25 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 01:22:50PM +0000, Liu, Yi L wrote:

> > A null iommufd pointer and a bound df flag is sufficient to see that
> > it is compat mode.
> 
> Hope df->is_cdev_device suits your expectation.:-) The code will look
> like below:

Yes, this is better. However, I'd suggest 'uses_container', as it is
clearer what the special case is.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 13:25               ` Jason Gunthorpe
@ 2023-02-28 13:36                 ` Liu, Yi L
  2023-02-28 13:43                   ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 13:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 9:26 PM
> 
> On Tue, Feb 28, 2023 at 01:22:50PM +0000, Liu, Yi L wrote:
> 
> > > A null iommufd pointer and a bound df flag is sufficient to see that
> > > it is compat mode.
> >
> > Hope df->is_cdev_device suits your expectation.:-) The code will look
> > like below:
> 
> Yes, this is better.. However I'd suggest 'uses_container' as it is
> clearer what the special case is

Surely doable. It needs a helper like the one below:

bool vfio_device_group_uses_container(struct vfio_device *device)
{
	lockdep_assert_held(&device->group->group_lock);
	return device->group->container;
}

But I'm poor at naming it. If it returns true, the code would call
vfio_device_group_use_iommu(). If we do it this way,
I think it's better to rename vfio_device_group_use_iommu()
as well.

Regards,
Yi Liu



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 13:36                 ` Liu, Yi L
@ 2023-02-28 13:43                   ` Jason Gunthorpe
  2023-02-28 14:01                     ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 13:43 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 01:36:24PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 9:26 PM
> > 
> > On Tue, Feb 28, 2023 at 01:22:50PM +0000, Liu, Yi L wrote:
> > 
> > > > A null iommufd pointer and a bound df flag is sufficient to see that
> > > > it is compat mode.
> > >
> > > Hope df->is_cdev_device suits your expectation.:-) The code will look
> > > like below:
> > 
> > Yes, this is better.. However I'd suggest 'uses_container' as it is
> > clearer what the special case is
> 
> Surely doable. Need to add a helper like below:
> 
> bool vfio_device_group_uses_container()
> {
> 	lockdep_assert_held(&device->group->group_lock);
> 	return device->group->container;
> }

It should come from the df.

If you have a df then by definition:
  smp_load_acquire(..) == false     - Not bound
  df->device->iommufd_ctx != NULL   - Using iommufd
  df->group->container != NULL      - Using legacy container
  all other cases                   - NO_IOMMU

No locking required since all these cases after the smp_load_acquire
must be fixed for the lifetime of the df.
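
As a rough illustration of the table above (not code from the series), the
four cases can be written as one classification function. This is a sketch
under assumptions: the mock_ types are hypothetical, C11 atomics stand in
for smp_load_acquire()/smp_store_release(), and access_granted is assumed
to be the flag that is loaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_df_mode {
	MOCK_DF_NOT_BOUND,
	MOCK_DF_IOMMUFD,
	MOCK_DF_CONTAINER,
	MOCK_DF_NOIOMMU,
};

/* Hypothetical stand-ins for vfio_device, vfio_group, vfio_device_file. */
struct mock_device {
	void *iommufd_ctx;
};

struct mock_group {
	void *container;
};

struct mock_device_file {
	atomic_bool access_granted; /* assumed to be the published flag */
	struct mock_device *device;
	struct mock_group *group;
};

/* Publish the df: everything written before this is visible to readers. */
void mock_df_publish(struct mock_device_file *df)
{
	atomic_store_explicit(&df->access_granted, true, memory_order_release);
}

/* Classify a df in the order given by the table; takes no lock. */
enum mock_df_mode mock_df_classify(struct mock_device_file *df)
{
	assert(df != NULL);
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return MOCK_DF_NOT_BOUND;
	if (df->device->iommufd_ctx)
		return MOCK_DF_IOMMUFD;
	if (df->group && df->group->container)
		return MOCK_DF_CONTAINER;
	return MOCK_DF_NOIOMMU;
}
```

The acquire load pairs with the release store, which is what lets the
remaining fields be read without a lock once the flag is seen as true,
provided they are not changed afterwards.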

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 13:43                   ` Jason Gunthorpe
@ 2023-02-28 14:01                     ` Liu, Yi L
  2023-02-28 14:38                       ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-02-28 14:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 9:44 PM
> 
> On Tue, Feb 28, 2023 at 01:36:24PM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 28, 2023 9:26 PM
> > >
> > > On Tue, Feb 28, 2023 at 01:22:50PM +0000, Liu, Yi L wrote:
> > >
> > > > > A null iommufd pointer and a bound df flag is sufficient to see that
> > > > > it is compat mode.
> > > >
> > > > Hope df->is_cdev_device suits your expectation.:-) The code will look
> > > > like below:
> > >
> > > Yes, this is better.. However I'd suggest 'uses_container' as it is
> > > clearer what the special case is
> >
> > Surely doable. Need to add a helper like below:
> >
> > bool vfio_device_group_uses_container()
> > {
> > 	lockdep_assert_held(&device->group->group_lock);
> > 	return device->group->container;
> > }
> 
> It should come from the df.
> 
> If you have a df then by definition:
>   smp_load_acquire(..) == false     - Not bound
>   df->device->iommufd_ctx != NULL   - Using iommufd
>   df->group->container != NULL      - Using legacy container
>   all other cases                   - NO_IOMMU
> 
> No locking required since all these cases after the smp_load_acquire
> must be fixed for the lifetime of the df.

Do you mean df->access_granted (introduced in patch 07) or a new flag?
Following your suggestion, it seems mandatory to do the
smp_load_acquire(..) == false check first, and then call into vfio_device_open(),
which in turn calls vfio_device_first_open() to check the iommufd/
legacy container/noiommu cases. Is that right?

Accessing df->group->container may need a helper to avoid dereferencing the
group field directly. Or maybe just store the container in df?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 14:01                     ` Liu, Yi L
@ 2023-02-28 14:38                       ` Jason Gunthorpe
  2023-03-01 14:04                         ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-02-28 14:38 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 02:01:36PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 9:44 PM
> > 
> > On Tue, Feb 28, 2023 at 01:36:24PM +0000, Liu, Yi L wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Tuesday, February 28, 2023 9:26 PM
> > > >
> > > > On Tue, Feb 28, 2023 at 01:22:50PM +0000, Liu, Yi L wrote:
> > > >
> > > > > > A null iommufd pointer and a bound df flag is sufficient to see that
> > > > > > it is compat mode.
> > > > >
> > > > > Hope df->is_cdev_device suits your expectation.:-) The code will look
> > > > > like below:
> > > >
> > > > Yes, this is better.. However I'd suggest 'uses_container' as it is
> > > > clearer what the special case is
> > >
> > > Surely doable. Need to add a helper like below:
> > >
> > > bool vfio_device_group_uses_container()
> > > {
> > > 	lockdep_assert_held(&device->group->group_lock);
> > > 	return device->group->container;
> > > }
> > 
> > It should come from the df.
> > 
> > If you have a df then by definition:
> >   smp_load_acquire(..) == false     - Not bound
> >   df->device->iommufd_ctx != NULL   - Using iommufd
> >   df->group->container != NULL      - Using legacy container
> >   all other cases                   - NO_IOMMU
> > 
> > No locking required since all these cases after the smp_load_acquire
> > must be fixed for the lifetime of the df.
> 
> Do you mean the df->access_granted (introduced in patch 07) or a new
> flag?

yes

> Following your suggestion, it seems a mandatory requirement to do the
> smp_load_acquire(..) == false check first, and then call into the vfio_device_open()
> which further calls vfio_device_first_open() to check the iommufd/
> legacy container/noiommu stuffs. Is it?

Figuring out if an open should happen or not is a different operation.
You have already built exclusion between cdev and group, so we don't need to
care about the open path.

> df->group->container this may need a helper to avoid decoding group
> field. May be just store container in df?

At worst a flag, but a helper seems like a good idea anyhow, since then it
can be compiled out.
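
A sketch of how such a helper could be compiled out, assuming a build-time
switch. MOCK_CONFIG_VFIO_GROUP and the mock_ types are hypothetical names
chosen for illustration. When the switch is not defined the helper collapses
to a constant false, so callers need no #ifdef of their own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build switch; remove this line to model a group-less build. */
#define MOCK_CONFIG_VFIO_GROUP 1

/* Hypothetical stand-ins for vfio_group and vfio_device_file. */
struct mock_group {
	void *container;
};

struct mock_device_file {
	struct mock_group *group;
};

#ifdef MOCK_CONFIG_VFIO_GROUP
/* Group support built in: report whether a legacy container is in use. */
static inline bool mock_df_uses_container(const struct mock_device_file *df)
{
	assert(df != NULL);
	return df->group && df->group->container;
}
#else
/* Group support compiled out: there can be no legacy container. */
static inline bool mock_df_uses_container(const struct mock_device_file *df)
{
	(void)df;
	return false;
}
#endif
```

Callers then test mock_df_uses_container(df) unconditionally and the
compiler can drop the dead branch in the group-less build.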

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-02-28  3:03   ` Liu, Yi L
@ 2023-02-28 16:58     ` Xu, Terrence
  2023-03-01  2:29       ` Nicolin Chen
  0 siblings, 1 reply; 131+ messages in thread
From: Xu, Terrence @ 2023-02-28 16:58 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 28, 2023 11:03 AM
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 28, 2023 3:21 AM
> >
> > On Mon, Feb 27, 2023 at 03:11:16AM -0800, Yi Liu wrote:
> > > Existing VFIO provides group-centric user APIs for userspace.
> > > Userspace opens the /dev/vfio/$group_id first before getting device
> > > fd and hence getting access to device. This is not the desired model
> > > for iommufd. Per the conclusion of community discussion[1], iommufd
> > > provides device-centric
> > > kAPIs and requires its consumer (like VFIO) to be device-centric
> > > user APIs. Such user APIs are used to associate device with iommufd
> > > and also the I/O address spaces managed by the iommufd.
> > >
> > > This series first introduces a per device file structure to be
> > > prepared for further enhancement and refactors the kvm-vfio code to
> > > be prepared for accepting device file from userspace. Then refactors
> > > the vfio to be able to handle iommufd binding. This refactor
> > > includes the mechanism of blocking device access before iommufd
> > > bind, making the device_open exclusive
> > > between the group path and the cdev path. Eventually, adds the cdev support
> > > for vfio device, and makes group infrastructure optional as it is
> > > not needed when vfio device cdev is compiled.
> > >
> > > This is also a prerequisite for iommu nesting for vfio device[2].
> > >
> > > The complete code can be found in below branch, simple test done with the
> > > legacy group path and the cdev path. Draft QEMU branch can be found at[3]
> > >
> > > https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v5
> > > (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> > >
> > > base-commit: 63777bd2daa3625da6eada88bd9081f047664dad
> >
> > This needs to be rebased onto a clean v6.3-rc1 when it comes out
> 
> Yes, I'll send rebase and send one more version when v6.3-rc1 comes. Here
> just try to be near to the vfio code in Alex's next branch.
> 
> Regards,
> Yi Liu

Verified this series with the "Intel GVT-g GPU device mediated passthrough" and "Intel GVT-d GPU device direct passthrough" technologies.
Both passed in VFIO legacy mode, compat mode, and cdev mode, including negative tests.

Tested-by: Terrence Xu <terrence.xu@intel.com>

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-02-28 16:58     ` Xu, Terrence
@ 2023-03-01  2:29       ` Nicolin Chen
  2023-03-01  3:44         ` Liu, Yi L
                           ` (2 more replies)
  0 siblings, 3 replies; 131+ messages in thread
From: Nicolin Chen @ 2023-03-01  2:29 UTC (permalink / raw)
  To: Xu, Terrence
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, Jason Gunthorpe, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:

> Verified this series by "Intel GVT-g GPU device mediated passthrough" and "Intel GVT-d GPU device direct passthrough" technologies.
> Both passed VFIO legacy mode / compat mode / cdev mode, including negative tests.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>

Sanity-tested this series on ARM64 with my wip branch:
https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
(Covering new iommufd and vfio-compat)

Tested-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-01  2:29       ` Nicolin Chen
@ 2023-03-01  3:44         ` Liu, Yi L
  2023-03-02  9:43         ` Shameerali Kolothum Thodi
  2023-03-03 21:29         ` Matthew Rosato
  2 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-01  3:44 UTC (permalink / raw)
  To: Nicolin Chen, Xu, Terrence
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com,
	suravee.suthikulpanit@amd.com, Zhao, Yan Y, eric.auger@redhat.com,
	shameerali.kolothum.thodi@huawei.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 1, 2023 10:29 AM
> 
> On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:
> 
> > Verified this series by "Intel GVT-g GPU device mediated passthrough" and "Intel GVT-d GPU device direct passthrough" technologies.
> > Both passed VFIO legacy mode / compat mode / cdev mode, including negative tests.
> >
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Sanity-tested this series on ARM64 with my wip branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
> (Covering new iommufd and vfio-compat)
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Thanks.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
  2023-02-27 19:19   ` Jason Gunthorpe
@ 2023-03-01  9:19   ` Liu, Yi L
  2023-03-01 17:46     ` Jason Gunthorpe
  2023-03-10  2:39   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-03-01  9:19 UTC (permalink / raw)
  To: alex.williamson@redhat.com, jgg@nvidia.com, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, February 27, 2023 7:12 PM
[...]
> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				    unsigned long arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_bind_iommufd bind;
> +	struct iommufd_ctx *iommufd = NULL;
> +	unsigned long minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> +
> +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> +		return -EFAULT;
> +
> +	if (bind.argsz < minsz || bind.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;

Hi Jason,

Per the comment in vfio_iommufd_bind(), a driver for a device that
does no DMA won't provide .bind_iommufd(). So should we let this ioctl
proceed to call .open_device() instead of failing it here?
I think we need to let it go further, e.g. by leaving the check
in vfio_iommufd_bind(). Otherwise, users may not be able
to use such devices. Does that sound right?

> +
> +	ret = vfio_device_block_group(device);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/*
> +	 * If already been bound to an iommufd, or already set noiommu
> +	 * then fail it.
> +	 */
> +	if (df->iommufd || df->noiommu) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* iommufd < 0 means noiommu mode */
> +	if (bind.iommufd < 0) {
> +		if (!capable(CAP_SYS_RAWIO)) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +		df->noiommu = true;
> +	} else {
> +		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> +		if (IS_ERR(iommufd)) {
> +			ret = PTR_ERR(iommufd);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	/*
> +	 * Before the device open, get the KVM pointer currently
> +	 * associated with the device file (if there is) and obtain
> +	 * a reference.  This reference is held until device closed.
> +	 * Save the pointer in the device for use by drivers.
> +	 */
> +	vfio_device_get_kvm_safe(df);
> +
> +	df->iommufd = iommufd;
> +	ret = vfio_device_open(df, &bind.out_devid, NULL);
> +	if (ret)
> +		goto out_put_kvm;
[...]
> 
>  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
> 
> +/*
> + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
> + *				   struct vfio_device_bind_iommufd)
> + *
> + * Bind a vfio_device to the specified iommufd.
> + *
> + * The user should provide a device cookie when calling this ioctl. The
> + * cookie is carried only in event e.g. I/O fault reported to userspace
> + * via iommufd. The user should use devid returned by this ioctl to mark
> + * the target device in other ioctls (e.g. capability query via iommufd).
> + *
> + * User is not allowed to access the device before the binding operation
> + * is completed.
> + *
> + * Unbind is automatically conducted when device fd is closed.
> + *
> + * @argsz:	 user filled size of this data.
> + * @flags:	 reserved for future extension.
> + * @dev_cookie:	 a per device cookie provided by userspace.
> + * @iommufd:	 iommufd to bind. a negative value means noiommu.
> + * @out_devid:	 the device id generated by this bind.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_bind_iommufd {
> +	__u32		argsz;
> +	__u32		flags;
> +	__aligned_u64	dev_cookie;
> +	__s32		iommufd;
> +	__u32		out_devid;

As above, for devices that do not do DMA there is no .bind_iommufd
op, hence no iommufd_device is generated. This means there is no good
value to fill into the out_devid field, so the field is optional. Only
for devices that do DMA should out_devid return a valid ID; otherwise
an invalid ID is filled in (e.g. value 0 is an invalid value in the
iommufd object ID pool). Userspace needs to check whether out_devid is
valid before using it. The ID can then be used in iommufd uAPIs such
as IOMMU_HWPT_ALLOC and IOMMU_DEVICE_GET_INFO.
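
A minimal userspace-side sketch of that check, for illustration only.
The struct below is a hypothetical local mirror of the layout proposed
in this patch (not a header that exists today), and treating 0 as the
invalid ID is the assumption stated above, not settled uAPI:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the struct proposed in this patch. */
struct bind_iommufd_args {
	uint32_t argsz;
	uint32_t flags;
	uint64_t dev_cookie;
	int32_t  iommufd;
	uint32_t out_devid;
};

/* The proposed layout has no implicit padding: 4 + 4 + 8 + 4 + 4 bytes. */
static_assert(sizeof(struct bind_iommufd_args) == 24, "unexpected padding");

/* Assumption from this thread: 0 is never a valid iommufd object ID. */
#define DEVID_INVALID 0u

/* Returns true only if the bind produced a device ID that may be reused. */
static bool bind_devid_is_valid(const struct bind_iommufd_args *bind)
{
	return bind->out_devid != DEVID_INVALID;
}
```

Userspace would call such a helper after a successful bind and skip
the iommufd uAPIs that take a device ID when it returns false.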

> +};
> +
> +#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
> +
>  /**
>   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
>   *						struct vfio_device_info)
> --
> 2.34.1

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened Yi Liu
  2023-02-27 18:48   ` Jason Gunthorpe
@ 2023-03-01  9:22   ` Liu, Yi L
  1 sibling, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-01  9:22 UTC (permalink / raw)
  To: alex.williamson@redhat.com, jgg@nvidia.com, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, February 27, 2023 7:11 PM
> 
> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This inbetween state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
> 
> The reason for the inbetween state is that userspace only gets a FD but
> doesn't gain access permission until binding the FD to an iommufd. So in
> the blocked state, only the bind operation is allowed. Completing bind
> will allow user to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Following this lockless scheme, it can safely handle the device FD
> unbound->bound but it cannot handle bound->unbound. To allow this we'd
> need to add a lock on all the vfio ioctls which seems costly. So once
> device FD is bound, it remains bound until the FD is closed.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/group.c     |  6 ++++++
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
>  3 files changed, 23 insertions(+)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 960b1bcb606b..d8771d585cb1 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -197,6 +197,12 @@ static int vfio_device_group_open(struct
> vfio_device_file *df)
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
> 
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +

This is a bug: if the open fails (ret is non-zero), df->access_granted
should not be set. It will be fixed in the next version.
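
Roughly, the intended behaviour is sketched below as a standalone
userspace model using C11 atomics in place of smp_store_release() and
smp_load_acquire(). The df_model names are made up for illustration;
this is not the kernel code and not necessarily the exact fix:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vfio_device_file; only the flag matters. */
struct df_model {
	atomic_bool access_granted;
};

/*
 * Models the tail of the open path: publish access_granted only when the
 * open succeeded (ret == 0). The release store pairs with the acquire
 * load in df_model_may_access().
 */
static int df_model_finish_open(struct df_model *df, int ret)
{
	assert(df);
	if (ret == 0)
		atomic_store_explicit(&df->access_granted, true,
				      memory_order_release);
	return ret;
}

/* Models the check at the top of the ioctl/read/write/mmap handlers. */
static bool df_model_may_access(struct df_model *df)
{
	assert(df);
	return atomic_load_explicit(&df->access_granted,
				    memory_order_acquire);
}
```

With this shape a failed open leaves the file in the blocked state, so
later ioctl/read/write/mmap calls keep being rejected.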

Regards,
Yi Liu
>  	mutex_unlock(&device->dev_set->lock);
> 
>  out_unlock:
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 7c1ea870d8f3..2e3cb284711d 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -18,6 +18,7 @@ struct vfio_container;
> 
>  struct vfio_device_file {
>  	struct vfio_device *device;
> +	bool access_granted;
>  	spinlock_t kvm_ref_lock; /* protect kvm field */
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd; /* protected by struct
> vfio_device_set::lock */
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 609700748082..d16ac573e290 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1106,6 +1106,10 @@ static long vfio_device_fops_unl_ioctl(struct file
> *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
> 
> +	/* Paired with smp_store_release() in vfio_device_group_open()
> */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	ret = vfio_device_pm_runtime_get(device);
>  	if (ret)
>  		return ret;
> @@ -1133,6 +1137,10 @@ static ssize_t vfio_device_fops_read(struct file
> *filep, char __user *buf,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
> 
> +	/* Paired with smp_store_release() in vfio_device_group_open()
> */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->read))
>  		return -EINVAL;
> 
> @@ -1146,6 +1154,10 @@ static ssize_t vfio_device_fops_write(struct file
> *filep,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
> 
> +	/* Paired with smp_store_release() in vfio_device_group_open()
> */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->write))
>  		return -EINVAL;
> 
> @@ -1157,6 +1169,10 @@ static int vfio_device_fops_mmap(struct file
> *filep, struct vm_area_struct *vma)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
> 
> +	/* Paired with smp_store_release() in vfio_device_group_open()
> */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->mmap))
>  		return -EINVAL;
> 
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path
  2023-02-28 12:33       ` Jason Gunthorpe
@ 2023-03-01 13:58         ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-01 13:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:34 PM
> 
> On Tue, Feb 28, 2023 at 03:11:34AM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 28, 2023 2:52 AM
> > >
> > > On Mon, Feb 27, 2023 at 03:11:30AM -0800, Yi Liu wrote:
> > > > @@ -535,7 +542,8 @@ static int vfio_device_fops_release(struct
> inode
> > > *inode, struct file *filep)
> > > >  	struct vfio_device_file *df = filep->private_data;
> > > >  	struct vfio_device *device = df->device;
> > > >
> > > > -	vfio_device_group_close(df);
> > > > +	if (!df->is_cdev_device)
> > > > +		vfio_device_group_close(df);
> > >
> > > This hunk should go in another patch
> >
> > Patch 15 or 16? Which one is your preference? To me, I guess patch
> > 15 is better since the user may open cdev fds after it. But its release
> > op should not call vfio_device_group_close();
> 
> It should go with the patch that allows creating the struct file
> without calling vfio_device_group_open()

Sure. I moved it to the patch which adds cdev, as that is the patch
where df->is_cdev_device can first be set to 1.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally
  2023-02-28 12:36     ` Jason Gunthorpe
@ 2023-03-01 13:59       ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-01 13:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 8:36 PM
> 
> On Tue, Feb 28, 2023 at 06:00:09AM +0000, Liu, Yi L wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, February 27, 2023 7:12 PM
> > >
> > > group code is not needed for vfio device cdev, so with vfio device cdev
> > > introduced, the group infrastructures can be compiled out if only cdev
> > > is needed.
> > >
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/Kconfig  | 14 +++++++++
> > >  drivers/vfio/Makefile |  2 +-
> > >  drivers/vfio/vfio.h   | 72
> > > +++++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/vfio.h  | 24 ++++++++++++++-
> > >  4 files changed, 110 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > > index 169762316513..c3ab06c314ea 100644
> > > --- a/drivers/vfio/Kconfig
> > > +++ b/drivers/vfio/Kconfig
> > > @@ -4,6 +4,8 @@ menuconfig VFIO
> > >  	select IOMMU_API
> > >  	depends on IOMMUFD || !IOMMUFD
> > >  	select INTERVAL_TREE
> > > +	select VFIO_GROUP if SPAPR_TCE_IOMMU
> > > +	select VFIO_DEVICE_CDEV if !VFIO_GROUP && (X86 || S390 || ARM
> || ARM64)
> >
> > Got below warning when IOMMUFD=n, VFIO_GROUP=n. so may remove
> > this select or needs to let VFIO_DEVICE_CDEV select IOMMUFD instead of
> > depends on IOMMUFD.
> 
> Add
> 
> select VFIO_GROUP if !IOMMUFD

Done.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-28 14:38                       ` Jason Gunthorpe
@ 2023-03-01 14:04                         ` Liu, Yi L
  2023-03-01 17:49                           ` Jason Gunthorpe
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-03-01 14:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 28, 2023 10:38 PM
> 
> On Tue, Feb 28, 2023 at 02:01:36PM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 28, 2023 9:44 PM
> > >
> > > On Tue, Feb 28, 2023 at 01:36:24PM +0000, Liu, Yi L wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Tuesday, February 28, 2023 9:26 PM
> > > > >
> > > > > On Tue, Feb 28, 2023 at 01:22:50PM +0000, Liu, Yi L wrote:
> > > > >
> > > > > > > A null iommufd pointer and a bound df flag is sufficient to see
> that
> > > > > > > it is compat mode.
> > > > > >
> > > > > > Hope df->is_cdev_device suits your expectation.:-) The code will
> look
> > > > > > like below:
> > > > >
> > > > > Yes, this is better.. However I'd suggest 'uses_container' as it is
> > > > > clearer what the special case is
> > > >
> > > > Surely doable. Need to add a helper like below:
> > > >
> > > > bool vfio_device_group_uses_container()
> > > > {
> > > > 	lockdep_assert_held(&device->group->group_lock);
> > > > 	return device->group->container;
> > > > }
> > >
> > > It should come from the df.
> > >
> > > If you have a df then by definition:
> > >   smp_load_acquire(..) == false     - Not bound
> > >   df->device->iommufd_ctx != NULL   - Using iommufd
> > >   df->group->container != NULL      - Using legacy container
> > >   all other cases                   - NO_IOMMU
> > >
> > > No locking required since all these cases after the smp_load_acquire
> > > must be fixed for the lifetime of the df.
> >
> > Do you mean the df->access_granted (introduced in patch 07) or a new
> > flag?
> 
> yes
> 
> > Following your suggestion, it seems a mandatory requirement to do the
> > smp_load_acquire(..) == false check first, and then call into the
> vfio_device_open()
> > which further calls vfio_device_first_open() to check the iommufd/
> > legacy container/noiommu stuffs. Is it?
> 
> Figuring out if an open should happen or not is a different operation,
> you already build exclusion between cdev/group so we don't need to
> care about the open path.

Ok.
 
> > df->group->container may need a helper to avoid decoding the group
> > field. Maybe just store the container in df?
> 
> At worst a flag, but a helper seems like a good idea anyhow, then it
> can be compiled out

I added a separate commit, shown below, which adds
vfio_device_group_uses_container().

From 0ce86e6b71d1884e9f5de30ba23e3aa93cc84db9 Mon Sep 17 00:00:00 2001
From: Yi Liu <yi.l.liu@intel.com>
Date: Wed, 1 Mar 2023 02:24:43 -0800
Subject: [PATCH 15/22] vfio: Make vfio_device_first_open() to cover the
 noiommu mode in cdev path

vfio_device_first_open() now covers the below two cases:

1) user uses iommufd (e.g. the group path in iommufd compat mode);
2) user uses container (e.g. the group path in legacy mode);

The above two paths have their own noiommu mode support accordingly.

The cdev path also uses iommufd, so when the user provides a valid
iommufd, this helper already supports it. In noiommu mode, however, the
cdev path provides a NULL iommufd, and the helper needs to cover that as
well. As there is nothing special to do for the cdev path in noiommu
mode, it can be covered by simply differentiating it from the container
case: if the user is using neither iommufd nor container, it is noiommu
mode.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     |  5 +++++
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 19 ++++++++++++++++---
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 2a13442add43..ed3ffe7ceb3f 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -777,6 +777,11 @@ void vfio_device_group_unregister(struct vfio_device *device)
 	mutex_unlock(&device->group->device_lock);
 }
 
+bool vfio_device_group_uses_container(struct vfio_device *device)
+{
+	return READ_ONCE(device->group->container);
+}
+
 int vfio_device_group_use_iommu(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 68d35e1d7b87..e1f5a0310551 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -95,6 +95,7 @@ int vfio_device_set_group(struct vfio_device *device,
 void vfio_device_remove_group(struct vfio_device *device);
 void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
+bool vfio_device_group_uses_container(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device_file *df);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 121a75fadceb..4b5b17e8aaa1 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -422,9 +422,22 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 	if (!try_module_get(device->dev->driver->owner))
 		return -ENODEV;
 
+	/*
+	 * The handling here depends on what the user is using.
+	 *
+	 * If user uses iommufd in the group compat mode or the
+	 * cdev path, call vfio_iommufd_bind().
+	 *
+	 * If user uses container in the group legacy mode, call
+	 * vfio_device_group_use_iommu().
+	 *
+	 * If user doesn't use iommufd nor container, this is
+	 * the noiommufd mode in the cdev path, nothing needs
+	 * to be done here just go ahead to open device.
+	 */
 	if (iommufd)
 		ret = vfio_iommufd_bind(device, iommufd);
-	else
+	else if (vfio_device_group_uses_container(device))
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
 		goto err_module_put;
@@ -439,7 +452,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 err_unuse_iommu:
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (vfio_device_group_uses_container(device))
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
 	module_put(device->dev->driver->owner);
@@ -457,7 +470,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 		device->ops->close_device(device);
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (vfio_device_group_uses_container(device))
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-01  9:19   ` Liu, Yi L
@ 2023-03-01 17:46     ` Jason Gunthorpe
  2023-03-02  4:09       ` Liu, Yi L
  2023-03-03  6:57       ` Liu, Yi L
  0 siblings, 2 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-03-01 17:46 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Wed, Mar 01, 2023 at 09:19:07AM +0000, Liu, Yi L wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, February 27, 2023 7:12 PM
> [...]
> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				    unsigned long arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_bind_iommufd bind;
> > +	struct iommufd_ctx *iommufd = NULL;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (bind.argsz < minsz || bind.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> 
> Hi Jason,
> 
> Per the comment in vfio_iommufd_bind(), such device driver
> won't provide .bind_iommufd(). So shall we allow this ioctl
> to go longer to call .open_device() instead of failing it here?
> I think we need to allow it to go further. E.g. leave the check
> to be in vfio_iommufd_bind(). Otherwise, user may not able
> to use such devices. Is it?

You are thinking about the crazy mdev samples?

We should probably just change them to provide a 'no dma' set of ops.

> > +struct vfio_device_bind_iommufd {
> > +	__u32		argsz;
> > +	__u32		flags;
> > +	__aligned_u64	dev_cookie;
> > +	__s32		iommufd;
> > +	__u32		out_devid;
> 
> As above, for the devices that do not do DMA, there is no .bind_iommufd
> op, hence no iommufd_device generated. This means no good value
> can be filled in this out_devid field. So this field is optional. Only
> for the devices which do DMA, should this out_devid field return a
> valid ID otherwise an invalid ID would be filled (e.g. value #0 is an
> invalid value in the iommufd object id pool). Userspace needs to
> check if the out_devid is valid or not before use. This ID can be further
> used in iommufd uAPIs like IOMMU_HWPT_ALLOC, IOMMU_DEVICE_GET_INFO
> and etc.

I would say create an access and harmonize the no-DMA devices with the
emulated devices.

What should we return here anyhow if an access was created?

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-03-01 14:04                         ` Liu, Yi L
@ 2023-03-01 17:49                           ` Jason Gunthorpe
  2023-03-02  3:24                             ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-03-01 17:49 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Wed, Mar 01, 2023 at 02:04:00PM +0000, Liu, Yi L wrote:
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 2a13442add43..ed3ffe7ceb3f 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -777,6 +777,11 @@ void vfio_device_group_unregister(struct vfio_device *device)
>  	mutex_unlock(&device->group->device_lock);
>  }
>  
> +bool vfio_device_group_uses_container(struct vfio_device *device)
> +{
> +	return READ_ONCE(device->group->container);
> +}

As I said, this should take the vfio_device_file, because as long as
a vfio_device_file exists group->container is required to be stable.
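
To illustrate the shape (a sketch only, using simplified made-up
structs rather than the real vfio ones, and assuming the group is
reachable from the file with a NULL group on the cdev path), a helper
taking the file can classify it without any locking once access has
been granted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct group_model {
	void *container;
};

struct device_file_model {
	bool access_granted;		/* read with acquire in the kernel */
	void *iommufd;
	struct group_model *group;	/* assumed NULL on the cdev path */
};

enum df_mode { DF_NOT_BOUND, DF_IOMMUFD, DF_CONTAINER, DF_NOIOMMU };

/* Takes the file, not the device: the answer is fixed for the df lifetime. */
static bool df_uses_container(const struct device_file_model *df)
{
	assert(df);
	return df->group && df->group->container;
}

/* Mirrors the four cases listed earlier in this thread. */
static enum df_mode df_classify(const struct device_file_model *df)
{
	if (!df->access_granted)
		return DF_NOT_BOUND;
	if (df->iommufd)
		return DF_IOMMUFD;
	if (df_uses_container(df))
		return DF_CONTAINER;
	return DF_NOIOMMU;
}
```

The order of the checks matters: nothing else is meaningful until the
access_granted check has passed.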

> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 121a75fadceb..4b5b17e8aaa1 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -422,9 +422,22 @@ static int vfio_device_first_open(struct vfio_device_file *df)
>  	if (!try_module_get(device->dev->driver->owner))
>  		return -ENODEV;
>  
> +	/*
> +	 * The handling here depends on what the user is using.
> +	 *
> +	 * If user uses iommufd in the group compat mode or the
> +	 * cdev path, call vfio_iommufd_bind().
> +	 *
> +	 * If user uses container in the group legacy mode, call
> +	 * vfio_device_group_use_iommu().
> +	 *
> +	 * If user doesn't use iommufd nor container, this is
> +	 * the noiommufd mode in the cdev path, nothing needs
> +	 * to be done here just go ahead to open device.
> +	 */
>  	if (iommufd)
>  		ret = vfio_iommufd_bind(device, iommufd);
> -	else
> +	else if (vfio_device_group_uses_container(device))
>  		ret = vfio_device_group_use_iommu(device);

But yes, this makes a lot more sense.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev6)
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (20 preceding siblings ...)
  2023-02-27 19:21 ` [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Jason Gunthorpe
@ 2023-03-01 21:01 ` Patchwork
  2023-03-03  7:00 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev7) Patchwork
  22 siblings, 0 replies; 131+ messages in thread
From: Patchwork @ 2023-03-01 21:01 UTC (permalink / raw)
  To: Liu, Yi L; +Cc: intel-gfx

== Series Details ==

Series: Add vfio_device cdev for iommufd support (rev6)
URL   : https://patchwork.freedesktop.org/series/113696/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/113696/revisions/6/mbox/ not applied
Applying: vfio: Allocate per device file structure
Using index info to reconstruct a base tree...
M	drivers/vfio/group.c
M	drivers/vfio/vfio.h
M	drivers/vfio/vfio_main.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/vfio/vfio_main.c
Auto-merging drivers/vfio/vfio.h
Auto-merging drivers/vfio/group.c
Applying: vfio: Refine vfio file kAPIs for KVM
Using index info to reconstruct a base tree...
M	drivers/vfio/group.c
M	drivers/vfio/vfio.h
M	drivers/vfio/vfio_main.c
M	include/linux/vfio.h
Falling back to patching base and 3-way merge...
Auto-merging include/linux/vfio.h
Auto-merging drivers/vfio/vfio_main.c
Auto-merging drivers/vfio/vfio.h
Auto-merging drivers/vfio/group.c
CONFLICT (content): Merge conflict in drivers/vfio/group.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 vfio: Refine vfio file kAPIs for KVM
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-03-01 17:49                           ` Jason Gunthorpe
@ 2023-03-02  3:24                             ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-02  3:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 2, 2023 1:49 AM
> 
> On Wed, Mar 01, 2023 at 02:04:00PM +0000, Liu, Yi L wrote:
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 2a13442add43..ed3ffe7ceb3f 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -777,6 +777,11 @@ void vfio_device_group_unregister(struct
> vfio_device *device)
> >  	mutex_unlock(&device->group->device_lock);
> >  }
> >
> > +bool vfio_device_group_uses_container(struct vfio_device *device)
> > +{
> > +	return READ_ONCE(device->group->container);
> > +}
> 
> As I said this should take in the vfio_device_file because as long as
> a vfio_device_file exists then group->container is required to be stable.

Ok, let me store the vfio_group in vfio_device_file instead of reaching
it via df->device->group.

By the way, with the vfio_group stored in vfio_device_file, it looks
like the is_cdev_device flag (introduced in patch 14) is no longer
necessary. We can always define the group pointer in vfio_device_file,
even when the group code is compiled out, and then use this group
pointer to check whether the vfio_device_file is used in the group path
or the cdev path. Is that right?

> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 121a75fadceb..4b5b17e8aaa1 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -422,9 +422,22 @@ static int vfio_device_first_open(struct
> vfio_device_file *df)
> >  	if (!try_module_get(device->dev->driver->owner))
> >  		return -ENODEV;
> >
> > +	/*
> > +	 * The handling here depends on what the user is using.
> > +	 *
> > +	 * If user uses iommufd in the group compat mode or the
> > +	 * cdev path, call vfio_iommufd_bind().
> > +	 *
> > +	 * If user uses container in the group legacy mode, call
> > +	 * vfio_device_group_use_iommu().
> > +	 *
> > +	 * If user doesn't use iommufd nor container, this is
> > +	 * the noiommufd mode in the cdev path, nothing needs
> > +	 * to be done here just go ahead to open device.
> > +	 */
> >  	if (iommufd)
> >  		ret = vfio_iommufd_bind(device, iommufd);
> > -	else
> > +	else if (vfio_device_group_uses_container(device))
> >  		ret = vfio_device_group_use_iommu(device);
> 
> But yes, this makes alot more sense..
> 
> Jason

Regards,
Yi Liu


* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-01 17:46     ` Jason Gunthorpe
@ 2023-03-02  4:09       ` Liu, Yi L
  2023-03-03  6:57       ` Liu, Yi L
  1 sibling, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-02  4:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 2, 2023 1:47 AM
> 
> On Wed, Mar 01, 2023 at 09:19:07AM +0000, Liu, Yi L wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, February 27, 2023 7:12 PM
> > [...]
> > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > +				    unsigned long arg)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +	struct vfio_device_bind_iommufd bind;
> > > +	struct iommufd_ctx *iommufd = NULL;
> > > +	unsigned long minsz;
> > > +	int ret;
> > > +
> > > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > > +
> > > +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (bind.argsz < minsz || bind.flags)
> > > +		return -EINVAL;
> > > +
> > > +	if (!device->ops->bind_iommufd)
> > > +		return -ENODEV;
> >
> > Hi Jason,
> >
> > Per the comment in vfio_iommufd_bind(), such device driver
> > won't provide .bind_iommufd(). So shall we allow this ioctl
> > to go longer to call .open_device() instead of failing it here?
> > I think we need to allow it to go further. E.g. leave the check
> > to be in vfio_iommufd_bind(). Otherwise, user may not able
> > to use such devices. Is it?
> 
> You are thinking about the crazy mdev samples?

Yes. We don't have real devices that don't do DMA, do we?
 
> We should probably just change them to provide a 'no dma' set of ops.

Yes. At least generate an iommufd_device, I suppose.

> > > +struct vfio_device_bind_iommufd {
> > > +	__u32		argsz;
> > > +	__u32		flags;
> > > +	__aligned_u64	dev_cookie;
> > > +	__s32		iommufd;
> > > +	__u32		out_devid;
> >
> > As above, for the devices that do not do DMA, there is no .bind_iommufd
> > op, hence no iommufd_device generated. This means no good value
> > can be filled in this out_devid field. So this field is optional. Only
> > for the devices which do DMA, should this out_devid field return a
> > valid ID otherwise an invalid ID would be filled (e.g. value #0 is an
> > invalid value in the iommufd object id pool). Userspace needs to
> > check if the out_devid is valid or not before use. This ID can be further
> > used in iommufd uAPIs like IOMMU_HWPT_ALLOC, IOMMU_DEVICE_GET_INFO
> > and etc.
> 
> I would say create an access and harmonize the no-DMA devices with the
> emulated devices.

In this case, iommufd_access would be created instead of iommufd_device.

> What should we return here anyhow if an access was created?

It depends on what can be done with this ID and whether this field is mandatory.
For an iommufd_device ID, the user can further use it to query iommu device info and
allocate a hwpt. Do we have a similar usage for iommufd_access? If we define this
field as optional, we may return the iommufd_access object ID in the future if it is
needed.
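
To illustrate the "optional out_devid" behaviour discussed above, here is a
minimal userspace-side sketch. It assumes, as stated earlier in this thread,
that 0 is never a valid iommufd object ID. The struct is a stand-in that
mirrors the proposed layout with fixed-width types (the real header uses
__u32/__aligned_u64), and IOMMUFD_INVALID_ID and bind_has_devid() are
hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed uAPI struct quoted above. */
struct vfio_device_bind_iommufd {
	uint32_t argsz;
	uint32_t flags;
	uint64_t dev_cookie;
	int32_t  iommufd;
	uint32_t out_devid;
};

/* Assumption from this thread: 0 is not a valid iommufd object ID. */
#define IOMMUFD_INVALID_ID 0u	/* hypothetical name */

/* Userspace should check this before passing out_devid to iommufd uAPIs. */
static bool bind_has_devid(const struct vfio_device_bind_iommufd *bind)
{
	assert(bind != NULL);
	return bind->out_devid != IOMMUFD_INVALID_ID;
}
```

A caller would skip IOMMU_HWPT_ALLOC and similar calls when
bind_has_devid() returns false.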

Regards,
Yi Liu


* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
  2023-02-27 18:22   ` Jason Gunthorpe
@ 2023-03-02  6:07   ` Liu, Yi L
  2023-03-02  9:55     ` Tian, Kevin
  2023-03-02 21:04     ` Alex Williamson
  1 sibling, 2 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-02  6:07 UTC (permalink / raw)
  To: alex.williamson@redhat.com, jgg@nvidia.com, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, February 27, 2023 7:11 PM
[...]
> @@ -2392,13 +2416,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
>  	return ret;
>  }
> 
> +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> +				    struct iommufd_ctx *iommufd_ctx)
> +{
> +	struct iommufd_ctx *iommufd = vfio_device_iommufd(&vdev->vdev);
> +
> +	if (!iommufd)
> +		return false;
> +
> +	return iommufd == iommufd_ctx;
> +}
> +
>  /*
>   * We need to get memory_lock for each device, but devices can share mmap_lock,
>   * therefore we need to zap and hold the vma_lock for each device, and only then
>   * get each memory_lock.
>   */
>  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> -				      struct vfio_pci_group_info *groups)
> +				      struct vfio_pci_group_info *groups,
> +				      struct iommufd_ctx *iommufd_ctx)
>  {
>  	struct vfio_pci_core_device *cur_mem;
>  	struct vfio_pci_core_device *cur_vma;
> @@ -2429,10 +2465,27 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> 
>  	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
>  		/*
> -		 * Test whether all the affected devices are contained by the
> -		 * set of groups provided by the user.
> +		 * Test whether all the affected devices can be reset by the
> +		 * user.  The affected devices may already have been opened or
> +		 * not yet.
> +		 *
> +		 * For the devices not opened yet, user can reset them. The
> +		 * reason is that the hot reset is done under the protection
> +		 * of the dev_set->lock, and device open is also under this
> +		 * lock.  During the hot reset, such devices can not be opened
> +		 * by other users.
> +		 *
> +		 * For the devices that have been opened, needs to check the
> +		 * ownership.  If the user provides a set of group fds, the
> +		 * ownership check is done by checking if all the opened
> +		 * devices are contained by the groups.  If the user provides
> +		 * a zero-length fd array, the ownership check is done by
> +		 * checking if all the opened devices are bound to the same
> +		 * iommufd_ctx.
>  		 */
> -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> +		if (cur_vma->vdev.open_count &&
> +		    !vfio_dev_in_groups(cur_vma, groups) &&
> +		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx)) {

Hi Alex, Jason,

There is one concern with this approach, related to the cdev
noiommu mode. As in patch 16 of this series, the cdev path
supports noiommu mode by passing a negative iommufd to the
kernel. In that case, the vfio_device is not bound to a valid
iommufd, so the check in vfio_dev_in_iommufd_ctx() always
fails for it.

One idea is to add a cdev_noiommu flag in vfio_device and also
check this flag when checking the iommufd_ictx. If all the opened
devices in the dev_set have vfio_device->cdev_noiommu==true,
then the reset is considered doable. But there is a special
case: if devices in this dev_set are opened by two applications
that operate in cdev noiommu mode, this logic is not able
to differentiate them. In that case, should we allow the reset?
It seems OK to allow the reset, since noiommu mode itself means
no security between the applications that use it. Thoughts?
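
The cdev_noiommu idea can be sketched roughly as below. This is only an
illustration with stand-in types, not the patch itself: struct sketch_dev
and sketch_reset_allowed() are hypothetical names, and the rule modelled is
"every opened device is bound to the caller's iommufd_ctx, or every opened
device is in cdev noiommu mode":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct iommufd_ctx { int unused; };	/* opaque stand-in */

struct sketch_dev {
	unsigned int open_count;
	struct iommufd_ctx *iommufd_ictx;	/* NULL in noiommu mode */
	bool cdev_noiommu;			/* the proposed flag */
};

static bool sketch_reset_allowed(const struct sketch_dev *devs, size_t n,
				 const struct iommufd_ctx *caller_ctx)
{
	bool all_in_ctx = true;
	bool all_noiommu = true;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Devices not opened yet never block the reset. */
		if (!devs[i].open_count)
			continue;
		if (!caller_ctx || devs[i].iommufd_ictx != caller_ctx)
			all_in_ctx = false;
		if (!devs[i].cdev_noiommu)
			all_noiommu = false;
	}

	/*
	 * The noiommu branch cannot tell two noiommu applications apart,
	 * which is exactly the special case raised above.
	 */
	return all_in_ctx || all_noiommu;
}
```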

>  			ret = -EINVAL;
>  			goto err_undo;
>  		}
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 2e3cb284711d..64e862a02dad 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -225,6 +225,11 @@ static inline void vfio_container_cleanup(void)
>  #if IS_ENABLED(CONFIG_IOMMUFD)
>  int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
>  void vfio_iommufd_unbind(struct vfio_device *device);
> +static inline struct iommufd_ctx *
> +vfio_device_iommufd(struct vfio_device *device)
> +{
> +	return device->iommufd_ictx;
> +}
>  #else

Regards,
Yi Liu


* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-01  2:29       ` Nicolin Chen
  2023-03-01  3:44         ` Liu, Yi L
@ 2023-03-02  9:43         ` Shameerali Kolothum Thodi
  2023-03-02 23:51           ` Nicolin Chen
  2023-03-03 21:29         ` Matthew Rosato
  2 siblings, 1 reply; 131+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-02  9:43 UTC (permalink / raw)
  To: Nicolin Chen, Xu, Terrence
  Cc: linux-s390@vger.kernel.org, Liu, Yi L, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com,
	suravee.suthikulpanit@amd.com, Zhao,  Yan Y,
	eric.auger@redhat.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com


> -----Original Message-----
> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> Sent: 01 March 2023 02:29
> To: Xu, Terrence <terrence.xu@intel.com>
> Cc: Liu, Yi L <yi.l.liu@intel.com>; Jason Gunthorpe <jgg@nvidia.com>;
> alex.williamson@redhat.com; Tian, Kevin <kevin.tian@intel.com>;
> joro@8bytes.org; robin.murphy@arm.com; cohuck@redhat.com;
> eric.auger@redhat.com; kvm@vger.kernel.org; mjrosato@linux.ibm.com;
> chao.p.peng@linux.intel.com; yi.y.sun@linux.intel.com; peterx@redhat.com;
> jasowang@redhat.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; lulu@redhat.com;
> suravee.suthikulpanit@amd.com; intel-gvt-dev@lists.freedesktop.org;
> intel-gfx@lists.freedesktop.org; linux-s390@vger.kernel.org; Hao, Xudong
> <xudong.hao@intel.com>; Zhao, Yan Y <yan.y.zhao@intel.com>
> Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 
> On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:
> 
> > Verified this series by "Intel GVT-g GPU device mediated passthrough" and
> "Intel GVT-d GPU device direct passthrough" technologies.
> > Both passed VFIO legacy mode / compat mode / cdev mode, including
> negative tests.
> >
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Sanity-tested this series on ARM64 with my wip branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
> (Covering new iommufd and vfio-compat)

Hi Nicolin,

Thanks for the latest ARM64 branch. Do you have a working QEMU branch corresponding to the
above one?

I tried https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
but for some reason I am not able to launch the guest.

Please let me know.

Thanks,
Shameer


* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-02  6:07   ` Liu, Yi L
@ 2023-03-02  9:55     ` Tian, Kevin
  2023-03-02 12:35       ` Jason Gunthorpe
  2023-03-02 21:04     ` Alex Williamson
  1 sibling, 1 reply; 131+ messages in thread
From: Tian, Kevin @ 2023-03-02  9:55 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson@redhat.com, jgg@nvidia.com
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, March 2, 2023 2:07 PM
> 
> > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > +		if (cur_vma->vdev.open_count &&
> > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > +		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx)) {
> 
> Hi Alex, Jason,
> 
> There is one concern on this approach which is related to the
> cdev noiommu mode. As patch 16 of this series, cdev path
> supports noiommu mode by passing a negative iommufd to
> kernel. In such case, the vfio_device is not bound to a valid
> iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> to be broken.
> 
> An idea is to add a cdev_noiommu flag in vfio_device, when
> checking the iommufd_ictx, also check this flag. If all the opened
> devices in the dev_set have vfio_device->cdev_noiommu==true,
> then the reset is considered to be doable. But there is a special
> case. If devices in this dev_set are opened by two applications
> that operates in cdev noiommu mode, then this logic is not able
> to differentiate them. In that case, should we allow the reset?
> It seems to ok to allow reset since noiommu mode itself means
> no security between the applications that use it. thoughts?
> 

We probably still need to pass in a valid iommufd (instead of using
a negative value) in the noiommu case to mark the ownership, so the
check in the reset path can correctly catch whether an opened
device belongs to this user.

That implies we may instead use a flag bit to mark NOIOMMU
mode, and the kernel also keeps a noiommu flag in the device
file to differentiate it from the normal case.


* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-02  9:55     ` Tian, Kevin
@ 2023-03-02 12:35       ` Jason Gunthorpe
  2023-03-02 14:20         ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-03-02 12:35 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, nicolinc@nvidia.com,
	Zhao, Yan Y, intel-gfx@lists.freedesktop.org,
	eric.auger@redhat.com, intel-gvt-dev@lists.freedesktop.org,
	yi.y.sun@linux.intel.com, cohuck@redhat.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Thu, Mar 02, 2023 at 09:55:46AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, March 2, 2023 2:07 PM
> > 
> > > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > > +		if (cur_vma->vdev.open_count &&
> > > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > > +		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx)) {
> > 
> > Hi Alex, Jason,
> > 
> > There is one concern on this approach which is related to the
> > cdev noiommu mode. As patch 16 of this series, cdev path
> > supports noiommu mode by passing a negative iommufd to
> > kernel. In such case, the vfio_device is not bound to a valid
> > iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> > to be broken.
> > 
> > An idea is to add a cdev_noiommu flag in vfio_device, when
> > checking the iommufd_ictx, also check this flag. If all the opened
> > devices in the dev_set have vfio_device->cdev_noiommu==true,
> > then the reset is considered to be doable. But there is a special
> > case. If devices in this dev_set are opened by two applications
> > that operates in cdev noiommu mode, then this logic is not able
> > to differentiate them. In that case, should we allow the reset?
> > It seems to ok to allow reset since noiommu mode itself means
> > no security between the applications that use it. thoughts?
> > 
> 
> Probably we need still pass in a valid iommufd (instead of using
> a negative value) in noiommu case to mark the ownership so the
> check in the reset path can correctly catch whether an opened
> device belongs to this user.

There should be no iommufd at all in no-iommu mode

Adding one just to deal with noiommu reset seems pretty sad :\

no-iommu is only really used by dpdk, and it doesn't invoke
VFIO_DEVICE_PCI_HOT_RESET at all.

I'd say as long as VFIO_DEVICE_PCI_HOT_RESET works if only one vfio
device is open using an empty list (e.g. we should ensure that the
invoking cdev itself is allowed) then I think it is OK.

Jason


* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-02 12:35       ` Jason Gunthorpe
@ 2023-03-02 14:20         ` Liu, Yi L
  2023-03-03  6:36           ` Tian, Kevin
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-03-02 14:20 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 2, 2023 8:35 PM
> 
> On Thu, Mar 02, 2023 at 09:55:46AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, March 2, 2023 2:07 PM
> > >
> > > > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > > > +		if (cur_vma->vdev.open_count &&
> > > > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > > > +		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx)) {
> > >
> > > Hi Alex, Jason,
> > >
> > > There is one concern on this approach which is related to the
> > > cdev noiommu mode. As patch 16 of this series, cdev path
> > > supports noiommu mode by passing a negative iommufd to
> > > kernel. In such case, the vfio_device is not bound to a valid
> > > iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> > > to be broken.
> > >
> > > An idea is to add a cdev_noiommu flag in vfio_device, when
> > > checking the iommufd_ictx, also check this flag. If all the opened
> > > devices in the dev_set have vfio_device->cdev_noiommu==true,
> > > then the reset is considered to be doable. But there is a special
> > > case. If devices in this dev_set are opened by two applications
> > > that operates in cdev noiommu mode, then this logic is not able
> > > to differentiate them. In that case, should we allow the reset?
> > > It seems to ok to allow reset since noiommu mode itself means
> > > no security between the applications that use it. thoughts?
> > >
> >
> > Probably we need still pass in a valid iommufd (instead of using
> > a negative value) in noiommu case to mark the ownership so the
> > check in the reset path can correctly catch whether an opened
> > device belongs to this user.
> 
> There should be no iommufd at all in no-iommu mode
> 
> Adding one just to deal with noiommu reset seems pretty sad :\
> 
> no-iommu is only really used by dpdk, and it doesn't invoke
> VFIO_DEVICE_PCI_HOT_RESET at all.

Is it by chance or by design that dpdk does not need this ioctl?

> I'd say as long as VFIO_DEVICE_PCI_HOT_RESET works if only one vfio
> device is open using a empty list (eg we should ensure that the
> invoking cdev itself is allowed) then I think it is OK.

Sorry, which empty list are you referring to?

Regards,
Yi Liu 


* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-02  6:07   ` Liu, Yi L
  2023-03-02  9:55     ` Tian, Kevin
@ 2023-03-02 21:04     ` Alex Williamson
  1 sibling, 0 replies; 131+ messages in thread
From: Alex Williamson @ 2023-03-02 21:04 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, jgg@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Thu, 2 Mar 2023 06:07:04 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, February 27, 2023 7:11 PM  
> [...]
> > @@ -2392,13 +2416,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
> >  	return ret;
> >  }
> > 
> > +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> > +				    struct iommufd_ctx *iommufd_ctx)
> > +{
> > +	struct iommufd_ctx *iommufd = vfio_device_iommufd(&vdev->vdev);
> > +
> > +	if (!iommufd)
> > +		return false;
> > +
> > +	return iommufd == iommufd_ctx;
> > +}
> > +
> >  /*
> >   * We need to get memory_lock for each device, but devices can share mmap_lock,
> >   * therefore we need to zap and hold the vma_lock for each device, and only then
> >   * get each memory_lock.
> >   */
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > -				      struct vfio_pci_group_info *groups)
> > +				      struct vfio_pci_group_info *groups,
> > +				      struct iommufd_ctx *iommufd_ctx)
> >  {
> >  	struct vfio_pci_core_device *cur_mem;
> >  	struct vfio_pci_core_device *cur_vma;
> > @@ -2429,10 +2465,27 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > 
> >  	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> >  		/*
> > -		 * Test whether all the affected devices are contained by the
> > -		 * set of groups provided by the user.
> > +		 * Test whether all the affected devices can be reset by the
> > +		 * user.  The affected devices may already have been opened or
> > +		 * not yet.
> > +		 *
> > +		 * For the devices not opened yet, user can reset them. The
> > +		 * reason is that the hot reset is done under the protection
> > +		 * of the dev_set->lock, and device open is also under this
> > +		 * lock.  During the hot reset, such devices can not be opened
> > +		 * by other users.
> > +		 *
> > +		 * For the devices that have been opened, needs to check the
> > +		 * ownership.  If the user provides a set of group fds, the
> > +		 * ownership check is done by checking if all the opened
> > +		 * devices are contained by the groups.  If the user provides
> > +		 * a zero-length fd array, the ownership check is done by
> > +		 * checking if all the opened devices are bound to the same
> > +		 * iommufd_ctx.
> >  		 */
> > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > +		if (cur_vma->vdev.open_count &&
> > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > +		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx)) {  
> 
> Hi Alex, Jason,
> 
> There is one concern on this approach which is related to the
> cdev noiommu mode. As patch 16 of this series, cdev path
> supports noiommu mode by passing a negative iommufd to
> kernel. In such case, the vfio_device is not bound to a valid
> iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> to be broken.
> 
> An idea is to add a cdev_noiommu flag in vfio_device, when
> checking the iommufd_ictx, also check this flag. If all the opened
> devices in the dev_set have vfio_device->cdev_noiommu==true,
> then the reset is considered to be doable. But there is a special
> case. If devices in this dev_set are opened by two applications
> that operates in cdev noiommu mode, then this logic is not able
> to differentiate them. In that case, should we allow the reset?
> It seems to ok to allow reset since noiommu mode itself means
> no security between the applications that use it. thoughts?

I don't think the existing vulnerabilities of no-iommu mode should be
carte blanche to add additional weaknesses.  Thanks,

Alex



* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-02  9:43         ` Shameerali Kolothum Thodi
@ 2023-03-02 23:51           ` Nicolin Chen
  2023-03-03 15:01             ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 131+ messages in thread
From: Nicolin Chen @ 2023-03-02 23:51 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, Jason Gunthorpe, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, suravee.suthikulpanit@amd.com,
	robin.murphy@arm.com

On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi wrote:
 
> Hi Nicolin,
> 
> Thanks for the latest ARM64 branch. Do you have a working Qemu branch corresponding to the
> above one?
> 
> I tried the https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
> but for some reason not able to launch the Guest.
> 
> Please let me know.

I do use that branch. It might not be that robust though, as it
went through a big rebase. Can you try with the following?

--trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*" --trace "msi_*" --trace "nvme_*"

Thanks
Nicolin


* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-02 14:20         ` Liu, Yi L
@ 2023-03-03  6:36           ` Tian, Kevin
  2023-03-03 16:55             ` Alex Williamson
  0 siblings, 1 reply; 131+ messages in thread
From: Tian, Kevin @ 2023-03-03  6:36 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, March 2, 2023 10:20 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, March 2, 2023 8:35 PM
> >
> > On Thu, Mar 02, 2023 at 09:55:46AM +0000, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Thursday, March 2, 2023 2:07 PM
> > > >
> > > > > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > > > > +		if (cur_vma->vdev.open_count &&
> > > > > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > > > > +		    !vfio_dev_in_iommufd_ctx(cur_vma,
> iommufd_ctx)) {
> > > >
> > > > Hi Alex, Jason,
> > > >
> > > > There is one concern on this approach which is related to the
> > > > cdev noiommu mode. As patch 16 of this series, cdev path
> > > > supports noiommu mode by passing a negative iommufd to
> > > > kernel. In such case, the vfio_device is not bound to a valid
> > > > iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> > > > to be broken.
> > > >
> > > > An idea is to add a cdev_noiommu flag in vfio_device, when
> > > > checking the iommufd_ictx, also check this flag. If all the opened
> > > > devices in the dev_set have vfio_device->cdev_noiommu==true,
> > > > then the reset is considered to be doable. But there is a special
> > > > case. If devices in this dev_set are opened by two applications
> > > > that operates in cdev noiommu mode, then this logic is not able
> > > > to differentiate them. In that case, should we allow the reset?
> > > > It seems to ok to allow reset since noiommu mode itself means
> > > > no security between the applications that use it. thoughts?
> > > >
> > >
> > > Probably we need still pass in a valid iommufd (instead of using
> > > a negative value) in noiommu case to mark the ownership so the
> > > check in the reset path can correctly catch whether an opened
> > > device belongs to this user.
> >
> > There should be no iommufd at all in no-iommu mode
> >
> > Adding one just to deal with noiommu reset seems pretty sad :\
> >
> > no-iommu is only really used by dpdk, and it doesn't invoke
> > VFIO_DEVICE_PCI_HOT_RESET at all.
> 
> Does it happen to be or by design, this ioctl is not needed by dpdk?

Use of noiommu should be discouraged.

If the only known noiommu user doesn't use it, then having a certain
new restriction for noiommu in the hot reset path might be an
acceptable tradeoff.

But again, this needs Alex's input as he knows all the history about
noiommu. 😊

> 
> > I'd say as long as VFIO_DEVICE_PCI_HOT_RESET works if only one vfio
> > device is open using a empty list (eg we should ensure that the
> > invoking cdev itself is allowed) then I think it is OK.
> 
> Sorry, which empty list are your referring?
> 

I guess it refers to the zero-length fd array.

But IMHO this restriction had better apply only to the case where
a noiommu device (iommufd_ctx=NULL) exists in the device set.

Otherwise we still compare iommufd_ctx when multiple devices
are opened.

Then the impact on the noiommu case is just that the user cannot do
a hot reset when it opens multiple devices in the same set.
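
The rule described above can be sketched as follows, with stand-in types.
struct reset_dev and hot_reset_owner_ok() are hypothetical names, and the
sketch assumes that a NULL iommufd_ictx on an opened device means noiommu
mode and that the invoking device is itself one of the opened devices:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct iommufd_ctx { int unused; };	/* opaque stand-in */

struct reset_dev {
	unsigned int open_count;
	struct iommufd_ctx *iommufd_ictx;	/* NULL: noiommu (assumed) */
};

static bool hot_reset_owner_ok(const struct reset_dev *devs, size_t n,
			       const struct iommufd_ctx *caller_ctx)
{
	size_t opened = 0;
	bool has_noiommu = false;
	bool all_match = true;
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!devs[i].open_count)
			continue;	/* unopened devices are not checked */
		opened++;
		if (!devs[i].iommufd_ictx)
			has_noiommu = true;
		else if (devs[i].iommufd_ictx != caller_ctx)
			all_match = false;
	}

	/* A noiommu device in the set: only the invoking device may be open. */
	if (has_noiommu)
		return opened == 1;

	/* Otherwise every opened device must share the caller's iommufd_ctx. */
	return all_match;
}
```

The only behavioural cost in this sketch lands on noiommu users that open
more than one device in the same set, which matches the tradeoff stated
above.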



* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-01 17:46     ` Jason Gunthorpe
  2023-03-02  4:09       ` Liu, Yi L
@ 2023-03-03  6:57       ` Liu, Yi L
  2023-03-03  7:23         ` Liu, Yi L
  2023-03-07  6:38         ` Tian, Kevin
  1 sibling, 2 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-03  6:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 2, 2023 1:47 AM
> 
> On Wed, Mar 01, 2023 at 09:19:07AM +0000, Liu, Yi L wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, February 27, 2023 7:12 PM
> > [...]
> > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > +				    unsigned long arg)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +	struct vfio_device_bind_iommufd bind;
> > > +	struct iommufd_ctx *iommufd = NULL;
> > > +	unsigned long minsz;
> > > +	int ret;
> > > +
> > > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > > +
> > > +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (bind.argsz < minsz || bind.flags)
> > > +		return -EINVAL;
> > > +
> > > +	if (!device->ops->bind_iommufd)
> > > +		return -ENODEV;
> >
> > Hi Jason,
> >
> > Per the comment in vfio_iommufd_bind(), such device driver
> > won't provide .bind_iommufd(). So shall we allow this ioctl
> > to go longer to call .open_device() instead of failing it here?
> > I think we need to allow it to go further. E.g. leave the check
> > to be in vfio_iommufd_bind(). Otherwise, user may not able
> > to use such devices. Is it?
> 
> You are thinking about the crazy mdev samples?
> 
> We should probably just change them to provide a 'no dma' set of ops.
> 
> > > +struct vfio_device_bind_iommufd {
> > > +	__u32		argsz;
> > > +	__u32		flags;
> > > +	__aligned_u64	dev_cookie;
> > > +	__s32		iommufd;
> > > +	__u32		out_devid;
> >
> > As above, for the devices that do not do DMA, there is no .bind_iommufd
> > op, hence no iommufd_device generated. This means no good value
> > can be filled in this out_devid field. So this field is optional. Only
> > for the devices which do DMA, should this out_devid field return a
> > valid ID otherwise an invalid ID would be filled (e.g. value #0 is an
> > invalid value in the iommufd object id pool). Userspace needs to
> > check if the out_devid is valid or not before use. This ID can be further
> > used in iommufd uAPIs like IOMMU_HWPT_ALLOC,
> IOMMU_DEVICE_GET_INFO
> > and etc.
> 
> I would say create an access and harmonize the no-DMA devices with the
> emulated devices.

How about the change below?

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 4f82a6fa7c6c..e536515086d7 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -18,12 +18,8 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	/*
-	 * If the driver doesn't provide this op then it means the device does
-	 * not do DMA at all. So nothing to do.
-	 */
-	if (!vdev->ops->bind_iommufd)
-		return 0;
+	if (WARN_ON(!vdev->ops->bind_iommufd))
+		return -ENODEV;
 
 	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
 	if (ret)
@@ -102,7 +98,9 @@ EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 /*
  * The emulated standard ops mean that vfio_device is going to use the
  * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
- * ops set should call vfio_register_emulated_iommu_dev().
+ * ops set should call vfio_register_emulated_iommu_dev(). Drivers that do
+ * not call vfio_pin_pages()/vfio_dma_rw() have no need to provide a dma_unmap
+ * callback.
  */
 
 static void vfio_emulated_unmap(void *data, unsigned long iova,
@@ -110,7 +107,8 @@ static void vfio_emulated_unmap(void *data, unsigned long iova,
 {
 	struct vfio_device *vdev = data;
 
-	vdev->ops->dma_unmap(vdev, iova, length);
+	if (vdev->ops->dma_unmap)
+		vdev->ops->dma_unmap(vdev, iova, length);
 }
 
 static const struct iommufd_access_ops vfio_user_ops = {
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
index e54eb752e1ba..19391dda5fba 100644
--- a/samples/vfio-mdev/mbochs.c
+++ b/samples/vfio-mdev/mbochs.c
@@ -1374,6 +1374,9 @@ static const struct vfio_device_ops mbochs_dev_ops = {
 	.write = mbochs_write,
 	.ioctl = mbochs_ioctl,
 	.mmap = mbochs_mmap,
+	.bind_iommufd	= vfio_iommufd_emulated_bind,
+	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
+	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
 };
 
 static struct mdev_driver mbochs_driver = {
diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c
index e8400fdab71d..5f48aef36995 100644
--- a/samples/vfio-mdev/mdpy.c
+++ b/samples/vfio-mdev/mdpy.c
@@ -663,6 +663,9 @@ static const struct vfio_device_ops mdpy_dev_ops = {
 	.write = mdpy_write,
 	.ioctl = mdpy_ioctl,
 	.mmap = mdpy_mmap,
+	.bind_iommufd	= vfio_iommufd_emulated_bind,
+	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
+	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
 };
 
 static struct mdev_driver mdpy_driver = {
diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index e887de672c52..35460901b9f7 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -1269,6 +1269,9 @@ static const struct vfio_device_ops mtty_dev_ops = {
 	.read = mtty_read,
 	.write = mtty_write,
 	.ioctl = mtty_ioctl,
+	.bind_iommufd	= vfio_iommufd_emulated_bind,
+	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
+	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
 };
 
 static struct mdev_driver mtty_driver = {

> What should we return here anyhow if an access was created?

iommufd_access->obj.id should be fine, shouldn't it?
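
To illustrate the optional-callback guard in the vfio_emulated_unmap() hunk
above, here is a minimal userspace sketch. It is not kernel code: the
mock_* names and structures are hypothetical stand-ins for vfio_device and
vfio_device_ops, and the return value is added only so the behaviour can be
observed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real ones. */
struct mock_device;

struct mock_device_ops {
	/* Optional: a driver that never pins pages may leave this NULL. */
	void (*dma_unmap)(struct mock_device *dev, unsigned long iova,
			  unsigned long length);
};

struct mock_device {
	const struct mock_device_ops *ops;
	unsigned long unmap_calls;
};

/* Returns 1 if the driver callback ran, 0 if it was skipped. */
static int mock_emulated_unmap(void *data, unsigned long iova,
			       unsigned long length)
{
	struct mock_device *dev = data;

	/* Same guard as in the diff: only call dma_unmap when provided. */
	if (!dev->ops->dma_unmap)
		return 0;
	dev->ops->dma_unmap(dev, iova, length);
	return 1;
}

/* Example driver callback that just counts invocations. */
static void mock_count_unmap(struct mock_device *dev, unsigned long iova,
			     unsigned long length)
{
	(void)iova;
	(void)length;
	dev->unmap_calls++;
}
```

With this shape, a no-DMA sample driver can reuse the emulated bind/unbind
ops without supplying a dummy dma_unmap.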

Regards,
Yi Liu

^ permalink raw reply related	[flat|nested] 131+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev7)
  2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (21 preceding siblings ...)
  2023-03-01 21:01 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev6) Patchwork
@ 2023-03-03  7:00 ` Patchwork
  22 siblings, 0 replies; 131+ messages in thread
From: Patchwork @ 2023-03-03  7:00 UTC (permalink / raw)
  To: Liu, Yi L; +Cc: intel-gfx

== Series Details ==

Series: Add vfio_device cdev for iommufd support (rev7)
URL   : https://patchwork.freedesktop.org/series/113696/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/113696/revisions/7/mbox/ not applied
Applying: vfio: Allocate per device file structure
Using index info to reconstruct a base tree...
M	drivers/vfio/group.c
M	drivers/vfio/vfio.h
M	drivers/vfio/vfio_main.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/vfio/vfio_main.c
Auto-merging drivers/vfio/vfio.h
Auto-merging drivers/vfio/group.c
Applying: vfio: Refine vfio file kAPIs for KVM
Using index info to reconstruct a base tree...
M	drivers/vfio/group.c
M	drivers/vfio/vfio.h
M	drivers/vfio/vfio_main.c
M	include/linux/vfio.h
Falling back to patching base and 3-way merge...
Auto-merging include/linux/vfio.h
Auto-merging drivers/vfio/vfio_main.c
Auto-merging drivers/vfio/vfio.h
Auto-merging drivers/vfio/group.c
CONFLICT (content): Merge conflict in drivers/vfio/group.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 vfio: Refine vfio file kAPIs for KVM
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-03  6:57       ` Liu, Yi L
@ 2023-03-03  7:23         ` Liu, Yi L
  2023-03-07  6:38         ` Tian, Kevin
  1 sibling, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-03  7:23 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, nicolinc@nvidia.com, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, March 3, 2023 2:58 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, March 2, 2023 1:47 AM
> >
> > On Wed, Mar 01, 2023 at 09:19:07AM +0000, Liu, Yi L wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Monday, February 27, 2023 7:12 PM
> > > [...]
> > > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > > +				    unsigned long arg)
> > > > +{
> > > > +	struct vfio_device *device = df->device;
> > > > +	struct vfio_device_bind_iommufd bind;
> > > > +	struct iommufd_ctx *iommufd = NULL;
> > > > +	unsigned long minsz;
> > > > +	int ret;
> > > > +
> > > > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > > > +
> > > > +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	if (bind.argsz < minsz || bind.flags)
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!device->ops->bind_iommufd)
> > > > +		return -ENODEV;
> > >
> > > Hi Jason,
> > >
> > > Per the comment in vfio_iommufd_bind(), such device driver
> > > won't provide .bind_iommufd(). So shall we allow this ioctl
> > > to go longer to call .open_device() instead of failing it here?
> > > I think we need to allow it to go further. E.g. leave the check
> > > to be in vfio_iommufd_bind(). Otherwise, user may not able
> > > to use such devices. Is it?
> >
> > You are thinking about the crazy mdev samples?
> >
> > We should probably just change them to provide a 'no dma' set of ops.
> >
> > > > +struct vfio_device_bind_iommufd {
> > > > +	__u32		argsz;
> > > > +	__u32		flags;
> > > > +	__aligned_u64	dev_cookie;
> > > > +	__s32		iommufd;
> > > > +	__u32		out_devid;
> > >
> > > As above, for the devices that do not do DMA, there is
> no .bind_iommufd
> > > op, hence no iommufd_device generated. This means no good value
> > > can be filled in this out_devid field. So this field is optional. Only
> > > for the devices which do DMA, should this out_devid field return a
> > > valid ID otherwise an invalid ID would be filled (e.g. value #0 is an
> > > invalid value in the iommufd object id pool). Userspace needs to
> > > check if the out_devid is valid or not before use. This ID can be further
> > > used in iommufd uAPIs like IOMMU_HWPT_ALLOC,
> > IOMMU_DEVICE_GET_INFO
> > > and etc.
> >
> > I would say create an access and harmonize the no-DMA devices with the
> > emulated devices.
> 
> How about below change?
> 
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 4f82a6fa7c6c..e536515086d7 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -18,12 +18,8 @@ int vfio_iommufd_bind(struct vfio_device *vdev,
> struct iommufd_ctx *ictx)
> 
>  	lockdep_assert_held(&vdev->dev_set->lock);
> 
> -	/*
> -	 * If the driver doesn't provide this op then it means the device does
> -	 * not do DMA at all. So nothing to do.
> -	 */
> -	if (!vdev->ops->bind_iommufd)
> -		return 0;
> +	if (WARN_ON(!vdev->ops->bind_iommufd))
> +		return -ENODEV;
> 
>  	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
>  	if (ret)
> @@ -102,7 +98,9 @@
> EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
>  /*
>   * The emulated standard ops mean that vfio_device is going to use the
>   * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using
> this
> - * ops set should call vfio_register_emulated_iommu_dev().
> + * ops set should call vfio_register_emulated_iommu_dev(). Drivers that
> do
> + * not call  vfio_pin_pages()/vfio_dma_rw() has no need to provide
> dma_unmap
> + * callback.
>   */
> 
>  static void vfio_emulated_unmap(void *data, unsigned long iova,
> @@ -110,7 +107,8 @@ static void vfio_emulated_unmap(void *data,
> unsigned long iova,
>  {
>  	struct vfio_device *vdev = data;
> 
> -	vdev->ops->dma_unmap(vdev, iova, length);
> +	if (vdev->ops->dma_unmap)
> +		vdev->ops->dma_unmap(vdev, iova, length);
>  }
> 
>  static const struct iommufd_access_ops vfio_user_ops = {
> diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
> index e54eb752e1ba..19391dda5fba 100644
> --- a/samples/vfio-mdev/mbochs.c
> +++ b/samples/vfio-mdev/mbochs.c
> @@ -1374,6 +1374,9 @@ static const struct vfio_device_ops
> mbochs_dev_ops = {
>  	.write = mbochs_write,
>  	.ioctl = mbochs_ioctl,
>  	.mmap = mbochs_mmap,
> +	.bind_iommufd	= vfio_iommufd_emulated_bind,
> +	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
> +	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
>  };
> 
>  static struct mdev_driver mbochs_driver = {
> diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c
> index e8400fdab71d..5f48aef36995 100644
> --- a/samples/vfio-mdev/mdpy.c
> +++ b/samples/vfio-mdev/mdpy.c
> @@ -663,6 +663,9 @@ static const struct vfio_device_ops mdpy_dev_ops =
> {
>  	.write = mdpy_write,
>  	.ioctl = mdpy_ioctl,
>  	.mmap = mdpy_mmap,
> +	.bind_iommufd	= vfio_iommufd_emulated_bind,
> +	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
> +	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
>  };
> 
>  static struct mdev_driver mdpy_driver = {
> diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
> index e887de672c52..35460901b9f7 100644
> --- a/samples/vfio-mdev/mtty.c
> +++ b/samples/vfio-mdev/mtty.c
> @@ -1269,6 +1269,9 @@ static const struct vfio_device_ops mtty_dev_ops
> = {
>  	.read = mtty_read,
>  	.write = mtty_write,
>  	.ioctl = mtty_ioctl,
> +	.bind_iommufd	= vfio_iommufd_emulated_bind,
> +	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
> +	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
>  };
> 
>  static struct mdev_driver mtty_driver = {
> 
> > What should we return here anyhow if an access was created?
> 
> iommufd_access->obj.id. should be fine. Is it?

BTW, this requires creating the iommufd_access in vfio_iommufd_emulated_bind()
instead of in the attach path. Nicolin's replace-domain series seems to have a
patch that moves iommufd_access creation to bind.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-02 23:51           ` Nicolin Chen
@ 2023-03-03 15:01             ` Shameerali Kolothum Thodi
  2023-03-04  7:00               ` Nicolin Chen
  0 siblings, 1 reply; 131+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-03 15:01 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, Jason Gunthorpe, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, suravee.suthikulpanit@amd.com,
	robin.murphy@arm.com



> -----Original Message-----
> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> Sent: 02 March 2023 23:51
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> lulu@redhat.com; suravee.suthikulpanit@amd.com;
> intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> Yan Y <yan.y.zhao@intel.com>
> Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 
> On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi
> wrote:
> 
> > Hi Nicolin,
> >
> > Thanks for the latest ARM64 branch. Do you have a working Qemu branch
> corresponding to the
> > above one?
> >
> > I tried the
> https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> smmuv3
> > but for some reason not able to launch the Guest.
> >
> > Please let me know.
> 
> I do use that branch. It might not be that robust though as it
> went through a big rebase.

OK. The issue seems to be quite random in nature and only happens when there
are multiple vCPUs. It also doesn't look related to VFIO device assignment,
as I can reproduce the guest hang without it, with only nested-smmuv3 and an
iommufd object.

./qemu-system-aarch64-iommuf -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
-enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 \
-object iommufd,id=iommufd0 \
-bios QEMU_EFI.fd \
-kernel Image-6.2-iommufd \
-initrd rootfs-iperf.cpio \
-net none \
-nographic \
-append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
-trace events=events \
-D trace_iommufd 

When the issue happens, there is no output on the terminal, as if QEMU is in a locked state.

> Can you try with the following?
> 
> --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*" --trace
> "msi_*" --trace "nvme_*"

The only trace events I get with the above are these:

iommufd_backend_connect fd=22 owned=1 users=1 (0)
smmu_add_mr smmuv3-iommu-memory-region-0-0

I haven't debugged this further. Please let me know if the issue is reproducible
with multiple vCPUs at your end. For now I will focus on VFIO device-specific tests.

Thanks,
Shameer 




^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-03  6:36           ` Tian, Kevin
@ 2023-03-03 16:55             ` Alex Williamson
  2023-03-05 14:48               ` Liu, Yi L
  2023-03-06 13:16               ` Jason Gunthorpe
  0 siblings, 2 replies; 131+ messages in thread
From: Alex Williamson @ 2023-03-03 16:55 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, nicolinc@nvidia.com,
	Jason Gunthorpe, Zhao, Yan Y, intel-gfx@lists.freedesktop.org,
	eric.auger@redhat.com, intel-gvt-dev@lists.freedesktop.org,
	yi.y.sun@linux.intel.com, cohuck@redhat.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Fri, 3 Mar 2023 06:36:35 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, March 2, 2023 10:20 PM
> >   
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, March 2, 2023 8:35 PM
> > >
> > > On Thu, Mar 02, 2023 at 09:55:46AM +0000, Tian, Kevin wrote:  
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Thursday, March 2, 2023 2:07 PM
> > > > >  
> > > > > > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > > > > > +		if (cur_vma->vdev.open_count &&
> > > > > > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > > > > > +		    !vfio_dev_in_iommufd_ctx(cur_vma,  
> > iommufd_ctx)) {  
> > > > >
> > > > > Hi Alex, Jason,
> > > > >
> > > > > There is one concern on this approach which is related to the
> > > > > cdev noiommu mode. As patch 16 of this series, cdev path
> > > > > supports noiommu mode by passing a negative iommufd to
> > > > > kernel. In such case, the vfio_device is not bound to a valid
> > > > > iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> > > > > to be broken.
> > > > >
> > > > > An idea is to add a cdev_noiommu flag in vfio_device, when
> > > > > checking the iommufd_ictx, also check this flag. If all the opened
> > > > > devices in the dev_set have vfio_device->cdev_noiommu==true,
> > > > > then the reset is considered to be doable. But there is a special
> > > > > case. If devices in this dev_set are opened by two applications
> > > > > that operates in cdev noiommu mode, then this logic is not able
> > > > > to differentiate them. In that case, should we allow the reset?
> > > > > It seems to ok to allow reset since noiommu mode itself means
> > > > > no security between the applications that use it. thoughts?
> > > > >  
> > > >
> > > > Probably we need still pass in a valid iommufd (instead of using
> > > > a negative value) in noiommu case to mark the ownership so the
> > > > check in the reset path can correctly catch whether an opened
> > > > device belongs to this user.  
> > >
> > > There should be no iommufd at all in no-iommu mode
> > >
> > > Adding one just to deal with noiommu reset seems pretty sad :\
> > >
> > > no-iommu is only really used by dpdk, and it doesn't invoke
> > > VFIO_DEVICE_PCI_HOT_RESET at all.  
> > 
> > Does it happen to be or by design, this ioctl is not needed by dpdk?  

I can't think of a reason DPDK couldn't use hot-reset.  If we want to
make it a policy, it should be enforced by code, but creating that
policy based on a difficulty in supporting that mode with iommufd isn't
great.
 
> use of noiommu should be discouraged.
> 
> if only known noiommu user doesn't use it then having certain
> new restriction for noiommu in the hot reset path might be an
> acceptable tradeoff.
> 
> but again needs Alex's input as he knows all the history about
> noiommu. 😊

No-IOMMU mode was meant to be a minimally invasive code change to
re-use the vfio device interface, or alternatively avoid extending
uio-pci-generic to support MSI/X, with better logging/tainting to know
when userspace is driving devices without IOMMU protection, and as a
means to promote a transition to standard support of vfio.  AFAIK,
there are still environments without v/IOMMU that make use of no-iommu
mode.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-01  2:29       ` Nicolin Chen
  2023-03-01  3:44         ` Liu, Yi L
  2023-03-02  9:43         ` Shameerali Kolothum Thodi
@ 2023-03-03 21:29         ` Matthew Rosato
  2 siblings, 0 replies; 131+ messages in thread
From: Matthew Rosato @ 2023-03-03 21:29 UTC (permalink / raw)
  To: Nicolin Chen, Xu, Terrence
  Cc: linux-s390@vger.kernel.org, Liu, Yi L, yi.y.sun@linux.intel.com,
	lulu@redhat.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com,
	suravee.suthikulpanit@amd.com, Zhao, Yan Y, eric.auger@redhat.com,
	shameerali.kolothum.thodi@huawei.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	robin.murphy@arm.com, jasowang@redhat.com

On 2/28/23 9:29 PM, Nicolin Chen wrote:
> On Tue, Feb 28, 2023 at 04:58:06PM +0000, Xu, Terrence wrote:
> 
>> Verified this series by "Intel GVT-g GPU device mediated passthrough" and "Intel GVT-d GPU device direct passthrough" technologies.
>> Both passed VFIO legacy mode / compat mode / cdev mode, including negative tests.
>>
>> Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Sanity-tested this series on ARM64 with my wip branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-nesting
> (Covering new iommufd and vfio-compat)
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Tested a few different flavors of this series on s390 (I grabbed the most recent v6 copy from github):

legacy (IOMMUFD=n): vfio-pci, vfio-ccw, vfio-ap
compat (CONFIG_IOMMUFD_VFIO_CONTAINER=y): vfio-pci, vfio-ccw, vfio-ap
compat+cdev+group (VFIO_DEVICE_CDEV=y && VFIO_GROUP=y): vfio-pci (over cdev using Yi's qemu branch as well as via group), vfio-ccw and vfio-ap via group
compat+cdev-only (VFIO_DEVICE_CDEV=y && VFIO_GROUP=n): vfio-pci using Yi's qemu branch

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-03 15:01             ` Shameerali Kolothum Thodi
@ 2023-03-04  7:00               ` Nicolin Chen
  2023-03-04  8:22                 ` Liu, Yi L
                                   ` (2 more replies)
  0 siblings, 3 replies; 131+ messages in thread
From: Nicolin Chen @ 2023-03-04  7:00 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, Jason Gunthorpe, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, suravee.suthikulpanit@amd.com,
	robin.murphy@arm.com

On Fri, Mar 03, 2023 at 03:01:03PM +0000, Shameerali Kolothum Thodi wrote:
> 
> > -----Original Message-----
> > From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> > Sent: 02 March 2023 23:51
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> > Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> > Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> > cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> > mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> > yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> > lulu@redhat.com; suravee.suthikulpanit@amd.com;
> > intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> > linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> > Yan Y <yan.y.zhao@intel.com>
> > Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> >
> > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi
> > wrote:
> >
> > > Hi Nicolin,
> > >
> > > Thanks for the latest ARM64 branch. Do you have a working Qemu branch
> > corresponding to the
> > > above one?
> > >
> > > I tried the
> > https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> > smmuv3
> > > but for some reason not able to launch the Guest.
> > >
> > > Please let me know.
> >
> > I do use that branch. It might not be that robust though as it
> > went through a big rebase.
> 
> Ok. The issue seems to be quite random in nature and only happens when there
> are multiple vCPUs. Also doesn't look like related to VFIO device assignment
> as I can reproduce Guest hang without it by only having nested-smmuv3 and
> iommufd object.
> 
> ./qemu-system-aarch64-iommuf -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image-6.2-iommufd \
> -initrd rootfs-iperf.cpio \
> -net none \
> -nographic \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> -trace events=events \
> -D trace_iommufd
> 
> When the issue happens, no output on terminal as if Qemu is in a locked state.
> 
>  Can you try with the followings?
> >
> > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*" --trace
> > "msi_*" --trace "nvme_*"
> 
> The only trace events with above are this,
> 
> iommufd_backend_connect fd=22 owned=1 users=1 (0)
> smmu_add_mr smmuv3-iommu-memory-region-0-0
> 
> I haven't debugged this further. Please let me know if issue is reproducible
> with multiple vCPUs at your end. For now will focus on VFIO dev specific tests.

Oh. My test environment has been a single-core vCPU. So that
doesn't happen to me. Can you try a vanilla QEMU branch that
our nesting branch is rebased on? I took a branch from Yi as
the baseline, while he might take from Eric for the rfcv3.

I am guessing that it might be an issue in the common tree.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-04  7:00               ` Nicolin Chen
@ 2023-03-04  8:22                 ` Liu, Yi L
  2023-03-08 15:54                 ` Shameerali Kolothum Thodi
  2023-03-14 11:38                 ` Shameerali Kolothum Thodi
  2 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-04  8:22 UTC (permalink / raw)
  To: Nicolin Chen, Shameerali Kolothum Thodi
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com,
	suravee.suthikulpanit@amd.com, Zhao,  Yan Y,
	eric.auger@redhat.com, Xu, Terrence, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, March 4, 2023 3:01 PM
> 
> Oh. My test environment has been a single-core vCPU. So that
> doesn't happen to me. Can you try a vanilla QEMU branch that
> our nesting branch is rebased on? I took a branch from Yi as
> the baseline, while he might take from Eric for the rfcv3.

Yes, I took QEMU from Eric's rfcv3, plus two commits to align the uAPI.

Regards
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-03 16:55             ` Alex Williamson
@ 2023-03-05 14:48               ` Liu, Yi L
  2023-03-06  8:16                 ` Tian, Kevin
  2023-03-06  9:59                 ` Liu, Yi L
  2023-03-06 13:16               ` Jason Gunthorpe
  1 sibling, 2 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-05 14:48 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, suravee.suthikulpanit@amd.com,
	yi.y.sun@linux.intel.com, mjrosato@linux.ibm.com,
	kvm@vger.kernel.org, intel-gvt-dev@lists.freedesktop.org,
	joro@8bytes.org, cohuck@redhat.com, Hao, Xudong,
	peterx@redhat.com, Zhao, Yan Y, eric.auger@redhat.com,
	Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, March 4, 2023 12:56 AM
> 
> On Fri, 3 Mar 2023 06:36:35 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, March 2, 2023 10:20 PM
> > >
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Thursday, March 2, 2023 8:35 PM
> > > >
> > > > On Thu, Mar 02, 2023 at 09:55:46AM +0000, Tian, Kevin wrote:
> > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > Sent: Thursday, March 2, 2023 2:07 PM
> > > > > >
> > > > > > > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > > > > > > +		if (cur_vma->vdev.open_count &&
> > > > > > > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > > > > > > +		    !vfio_dev_in_iommufd_ctx(cur_vma,
> > > iommufd_ctx)) {
> > > > > >
> > > > > > Hi Alex, Jason,
> > > > > >
> > > > > > There is one concern on this approach which is related to the
> > > > > > cdev noiommu mode. As patch 16 of this series, cdev path
> > > > > > supports noiommu mode by passing a negative iommufd to
> > > > > > kernel. In such case, the vfio_device is not bound to a valid
> > > > > > iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> > > > > > to be broken.
> > > > > >
> > > > > > An idea is to add a cdev_noiommu flag in vfio_device, when
> > > > > > checking the iommufd_ictx, also check this flag. If all the opened
> > > > > > devices in the dev_set have vfio_device->cdev_noiommu==true,
> > > > > > then the reset is considered to be doable. But there is a special
> > > > > > case. If devices in this dev_set are opened by two applications
> > > > > > that operates in cdev noiommu mode, then this logic is not able
> > > > > > to differentiate them. In that case, should we allow the reset?
> > > > > > It seems to ok to allow reset since noiommu mode itself means
> > > > > > no security between the applications that use it. thoughts?
> > > > > >
> > > > >
> > > > > Probably we need still pass in a valid iommufd (instead of using
> > > > > a negative value) in noiommu case to mark the ownership so the
> > > > > check in the reset path can correctly catch whether an opened
> > > > > device belongs to this user.
> > > >
> > > > There should be no iommufd at all in no-iommu mode
> > > >
> > > > Adding one just to deal with noiommu reset seems pretty sad :\
> > > >
> > > > no-iommu is only really used by dpdk, and it doesn't invoke
> > > > VFIO_DEVICE_PCI_HOT_RESET at all.
> > >
> > > Does it happen to be or by design, this ioctl is not needed by dpdk?
> 
> I can't think of a reason DPDK couldn't use hot-reset.  If we want to
> make it a policy, it should be enforced by code, but creating that
> policy based on a difficulty in supporting that mode with iommufd isn't
> great.

Makes sense. A userspace driver should have the chance to reset the
device.

> 
> > use of noiommu should be discouraged.
> >
> > if only known noiommu user doesn't use it then having certain
> > new restriction for noiommu in the hot reset path might be an
> > acceptable tradeoff.
> >
> > but again needs Alex's input as he knows all the history about
> > noiommu. 😊
> 
> No-IOMMU mode was meant to be a minimally invasive code change to
> re-use the vfio device interface, or alternatively avoid extending
> uio-pci-generic to support MSI/X, with better logging/tainting to know
> when userspace is driving devices without IOMMU protection, and as a
> means to promote a transition to standard support of vfio.  AFAIK,
> there are still environments without v/IOMMU that make use of no-iommu
> mode.  Thanks,

This makes Jason's remark (noiommu should not use iommufd at all) much
more reasonable. If there is no v/IOMMU, then no iommufd at all.

If no iommufd is used in the no-iommu mode, this approach cannot
tell apart two applications that are operating in no-iommu mode. If we
allow reset, it may make no-iommu mode even weaker. So perhaps we need
to have another approach for this ownership check.

How about falling back to the prior solution? Allow userspace to pass a set
of device fds, and the kernel just checks the opened devices in the dev_set:
all the opened devices should be included in the device fd set. If not all
of them are included, fail it.

Regards,
Yi Liu



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-05 14:48               ` Liu, Yi L
@ 2023-03-06  8:16                 ` Tian, Kevin
  2023-03-06  8:23                   ` Tian, Kevin
  2023-03-06  9:59                 ` Liu, Yi L
  1 sibling, 1 reply; 131+ messages in thread
From: Tian, Kevin @ 2023-03-06  8:16 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: linux-s390@vger.kernel.org, suravee.suthikulpanit@amd.com,
	yi.y.sun@linux.intel.com, mjrosato@linux.ibm.com,
	kvm@vger.kernel.org, intel-gvt-dev@lists.freedesktop.org,
	joro@8bytes.org, cohuck@redhat.com, Hao, Xudong,
	peterx@redhat.com, Zhao, Yan Y, eric.auger@redhat.com,
	Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Sunday, March 5, 2023 10:49 PM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Saturday, March 4, 2023 12:56 AM
> >
> > On Fri, 3 Mar 2023 06:36:35 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > use of noiommu should be discouraged.
> > >
> > > if only known noiommu user doesn't use it then having certain
> > > new restriction for noiommu in the hot reset path might be an
> > > acceptable tradeoff.
> > >
> > > but again needs Alex's input as he knows all the history about
> > > noiommu. 😊
> >
> > No-IOMMU mode was meant to be a minimally invasive code change to
> > re-use the vfio device interface, or alternatively avoid extending
> > uio-pci-generic to support MSI/X, with better logging/tainting to know
> > when userspace is driving devices without IOMMU protection, and as a
> > means to promote a transition to standard support of vfio.  AFAIK,
> > there are still environments without v/IOMMU that make use of no-iommu
> > mode.  Thanks,
> 
> This makes Jason's remark (noiommu should not use iommufd at all) much
> more reasonable. If there is no v/IOMMU, then no iommufd at all.

yeah, viommu is a good point.

> 
> If no iommufd is used in the no-iommu mode, this approach cannot
> tell two applications that are operating in no-iommu mode. If we allow
> reset, it may make no-iommu mode more weak. So perhaps we need
> to have another approach for this ownership check.
> 
> How about falling back to prior solution. Allow userspace to pass a set
> of device fd, and the kernel just checks the opened devices in the dev_set,
> all the opened devices should be included in the device fd set. If not all
> of them are included, fail it.
> 

This looks like a cleaner approach.

If a device is not opened we know it's safe to reset.

If a device is opened then it must be opened by the calling process to be
reset.

From this angle we don't need to bother with noiommu vs. iommufd
when iommufd is not always available.
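
A minimal userspace sketch of the rule above, only to make the check
concrete. It is not the vfio-pci code: struct model_dev,
model_dev_in_fds() and model_reset_allowed() are hypothetical names, and
the "fd array" is modelled as a plain list of device ids that the caller
has already resolved from its device fds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vfio device in a dev_set; not the kernel struct. */
struct model_dev {
	int id;                  /* identifies the device within the set */
	unsigned int open_count; /* 0 means nobody has the device open */
};

/* True if dev_id is among the devices resolved from the caller's fd array. */
static bool model_dev_in_fds(int dev_id, const int *fd_devs, size_t nr_fds)
{
	for (size_t i = 0; i < nr_fds; i++)
		if (fd_devs[i] == dev_id)
			return true;
	return false;
}

/*
 * Reset is allowed only if every opened device in the set is covered by the
 * caller's fd array. Devices that are not opened are always safe to reset.
 */
static bool model_reset_allowed(const struct model_dev *set, size_t nr_devs,
				const int *fd_devs, size_t nr_fds)
{
	assert(set || nr_devs == 0);

	for (size_t i = 0; i < nr_devs; i++) {
		if (!set[i].open_count)
			continue; /* unopened: nothing to protect */
		if (!model_dev_in_fds(set[i].id, fd_devs, nr_fds))
			return false; /* opened, but not proven to be the caller's */
	}
	return true;
}
```

Note that the sketch never looks at noiommu or iommufd state, which is
the point of the proposal: ownership is proven purely by the fds the
caller can present.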

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-06  8:16                 ` Tian, Kevin
@ 2023-03-06  8:23                   ` Tian, Kevin
  2023-03-06  8:33                     ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Tian, Kevin @ 2023-03-06  8:23 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, Alex Williamson
  Cc: linux-s390@vger.kernel.org, suravee.suthikulpanit@amd.com,
	yi.y.sun@linux.intel.com, mjrosato@linux.ibm.com,
	kvm@vger.kernel.org, intel-gvt-dev@lists.freedesktop.org,
	joro@8bytes.org, cohuck@redhat.com, Hao, Xudong,
	peterx@redhat.com, Zhao, Yan Y, eric.auger@redhat.com,
	Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Monday, March 6, 2023 4:17 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Sunday, March 5, 2023 10:49 PM
> >
> >
> > How about falling back to prior solution. Allow userspace to pass a set
> > of device fd, and the kernel just checks the opened devices in the dev_set,
> > all the opened devices should be included in the device fd set. If not all
> > of them are included, fail it.
> >
> 
> looks this is a cleaner approach.
> 
> if a device is not opened we know it's safe to reset.
> 
> If a device is opened then it must be opened by the calling process to be
> reset.
> 
> from this angle we don't need to bother with noiommu vs. iommufd
> when iommufd is not always available.

btw there is one thing to be fixed in your next version.

noiommu shouldn't be enabled on a device which always has an iommu group.

We need a check on iommu_group in the following place:

+	/* iommufd < 0 means noiommu mode */
+	if (bind.iommufd < 0) {
+		if (!capable(CAP_SYS_RAWIO)) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+		df->noiommu = true;


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-06  8:23                   ` Tian, Kevin
@ 2023-03-06  8:33                     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-06  8:33 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: linux-s390@vger.kernel.org, suravee.suthikulpanit@amd.com,
	yi.y.sun@linux.intel.com, mjrosato@linux.ibm.com,
	kvm@vger.kernel.org, intel-gvt-dev@lists.freedesktop.org,
	joro@8bytes.org, cohuck@redhat.com, Hao, Xudong,
	peterx@redhat.com, Zhao, Yan Y, eric.auger@redhat.com,
	Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Monday, March 6, 2023 4:23 PM
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Monday, March 6, 2023 4:17 PM
> >
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Sunday, March 5, 2023 10:49 PM
> > >
> > >
> > > How about falling back to prior solution. Allow userspace to pass a set
> > > > of device fd, and the kernel just checks the opened devices in the dev_set,
> > > all the opened devices should be included in the device fd set. If not all
> > > of them are included, fail it.
> > >
> >
> > looks this is a cleaner approach.
> >
> > if a device is not opened we know it's safe to reset.
> >
> > If a device is opened then it must be opened by the calling process to be
> > reset.
> >
> > from this angle we don't need to bother with noiommu vs. iommufd
> > when iommufd is not always available.
> 
> btw there is one thing to be fixed in your next version.
> 
> > noiommu shouldn't be enabled on a device which always has a iommu group.
> 
> We need a check on iommu_group in following place:
> 
> +	/* iommufd < 0 means noiommu mode */
> +	if (bind.iommufd < 0) {
> +		if (!capable(CAP_SYS_RAWIO)) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +		df->noiommu = true;

Yes, it is. If there is an IOMMU in the system, noiommu mode is not available.
Checking for the presence of an iommu_group can detect that. 😊
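
A rough userspace model of the bind-time decision, assuming the
iommu_group check is placed after the capability check. The names
(struct model_bind_ctx, model_bind_noiommu) are hypothetical, the two
booleans stand in for capable(CAP_SYS_RAWIO) and for the device having a
real iommu group, and -EINVAL for the group case is an assumed error
code, not something settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs to the decision; not kernel structures. */
struct model_bind_ctx {
	bool has_rawio;       /* stands in for capable(CAP_SYS_RAWIO) */
	bool has_iommu_group; /* device is backed by a real iommu group */
};

/*
 * Returns 0 on success or a negative errno. *noiommu is set to true only
 * when noiommu mode is actually enabled for this bind.
 */
static int model_bind_noiommu(int iommufd, const struct model_bind_ctx *ctx,
			      bool *noiommu)
{
	assert(ctx && noiommu);

	*noiommu = false;
	if (iommufd >= 0)
		return 0; /* regular iommufd bind, not modelled here */

	/* iommufd < 0 means noiommu mode was requested */
	if (!ctx->has_rawio)
		return -EPERM;
	if (ctx->has_iommu_group)
		return -EINVAL; /* assumed errno: an IOMMU exists, refuse noiommu */

	*noiommu = true;
	return 0;
}
```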

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-05 14:48               ` Liu, Yi L
  2023-03-06  8:16                 ` Tian, Kevin
@ 2023-03-06  9:59                 ` Liu, Yi L
  1 sibling, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-06  9:59 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, suravee.suthikulpanit@amd.com,
	yi.y.sun@linux.intel.com, mjrosato@linux.ibm.com,
	kvm@vger.kernel.org, intel-gvt-dev@lists.freedesktop.org,
	joro@8bytes.org, cohuck@redhat.com, Hao, Xudong,
	peterx@redhat.com, Zhao, Yan Y, eric.auger@redhat.com,
	Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, Jason Gunthorpe,
	intel-gfx@lists.freedesktop.org, chao.p.peng@linux.intel.com,
	lulu@redhat.com, robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Sunday, March 5, 2023 10:49 PM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Saturday, March 4, 2023 12:56 AM
> >
> > On Fri, 3 Mar 2023 06:36:35 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Thursday, March 2, 2023 10:20 PM
> > > >
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Thursday, March 2, 2023 8:35 PM
> > > > >
> > > > > On Thu, Mar 02, 2023 at 09:55:46AM +0000, Tian, Kevin wrote:
> > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > Sent: Thursday, March 2, 2023 2:07 PM
> > > > > > >
> > > > > > > > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > > > > > > > +		if (cur_vma->vdev.open_count &&
> > > > > > > > +		    !vfio_dev_in_groups(cur_vma, groups) &&
> > > > > > > > +		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx)) {
> > > > > > >
> > > > > > > Hi Alex, Jason,
> > > > > > >
> > > > > > > There is one concern on this approach which is related to the
> > > > > > > cdev noiommu mode. As patch 16 of this series, cdev path
> > > > > > > supports noiommu mode by passing a negative iommufd to
> > > > > > > kernel. In such case, the vfio_device is not bound to a valid
> > > > > > > iommufd. Then the check in vfio_dev_in_iommufd_ctx() is
> > > > > > > to be broken.
> > > > > > >
> > > > > > > An idea is to add a cdev_noiommu flag in vfio_device, when
> > > > > > > checking the iommufd_ictx, also check this flag. If all the opened
> > > > > > > devices in the dev_set have vfio_device->cdev_noiommu==true,
> > > > > > > then the reset is considered to be doable. But there is a special
> > > > > > > case. If devices in this dev_set are opened by two applications
> > > > > > > that operates in cdev noiommu mode, then this logic is not able
> > > > > > > to differentiate them. In that case, should we allow the reset?
> > > > > > > It seems to ok to allow reset since noiommu mode itself means
> > > > > > > no security between the applications that use it. thoughts?
> > > > > > >
> > > > > >
> > > > > > Probably we need still pass in a valid iommufd (instead of using
> > > > > > a negative value) in noiommu case to mark the ownership so the
> > > > > > check in the reset path can correctly catch whether an opened
> > > > > > device belongs to this user.
> > > > >
> > > > > There should be no iommufd at all in no-iommu mode
> > > > >
> > > > > Adding one just to deal with noiommu reset seems pretty sad :\
> > > > >
> > > > > no-iommu is only really used by dpdk, and it doesn't invoke
> > > > > VFIO_DEVICE_PCI_HOT_RESET at all.
> > > >
> > > > Does it happen to be or by design, this ioctl is not needed by dpdk?
> >
> > I can't think of a reason DPDK couldn't use hot-reset.  If we want to
> > make it a policy, it should be enforced by code, but creating that
> > policy based on a difficulty in supporting that mode with iommufd isn't
> > great.
> 
> Makes sense. A userspace driver should have the chance to reset
> device.
> 
> >
> > > use of noiommu should be discouraged.
> > >
> > > if only known noiommu user doesn't use it then having certain
> > > new restriction for noiommu in the hot reset path might be an
> > > acceptable tradeoff.
> > >
> > > but again needs Alex's input as he knows all the history about
> > > noiommu. 😊
> >
> > No-IOMMU mode was meant to be a minimally invasive code change to
> > re-use the vfio device interface, or alternatively avoid extending
> > uio-pci-generic to support MSI/X, with better logging/tainting to know
> > when userspace is driving devices without IOMMU protection, and as a
> > means to promote a transition to standard support of vfio.  AFAIK,
> > there are still environments without v/IOMMU that make use of no-iommu
> > mode.  Thanks,
> 
> This makes Jason's remark (noiommu should not use iommufd at all) much
> more reasonable. If there is no v/IOMMU, then no iommufd at all.

A correction. A system without iommu can still have iommufd. But
it doesn't change the direction here.

> If no iommufd is used in the no-iommu mode, this approach cannot
> tell two applications that are operating in no-iommu mode. If we allow
> reset, it may make no-iommu mode more weak. So perhaps we need
> to have another approach for this ownership check.
> 
> How about falling back to prior solution. Allow userspace to pass a set
> of device fd, and the kernel just checks the opened devices in the dev_set,
> all the opened devices should be included in the device fd set. If not all
> of them are included, fail it.
> 
> Regards,
> Yi Liu
> 


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-03 16:55             ` Alex Williamson
  2023-03-05 14:48               ` Liu, Yi L
@ 2023-03-06 13:16               ` Jason Gunthorpe
  2023-03-07  2:31                 ` Tian, Kevin
  1 sibling, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-03-06 13:16 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, nicolinc@nvidia.com,
	Zhao, Yan Y, intel-gfx@lists.freedesktop.org,
	eric.auger@redhat.com, intel-gvt-dev@lists.freedesktop.org,
	yi.y.sun@linux.intel.com, cohuck@redhat.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Fri, Mar 03, 2023 at 09:55:42AM -0700, Alex Williamson wrote:

> I can't think of a reason DPDK couldn't use hot-reset.  If we want to
> make it a policy, it should be enforced by code, but creating that
> policy based on a difficulty in supporting that mode with iommufd isn't
> great.

On the other hand adding code to allow device FDs in the hot reset
path that is never used and never tested isn't great either..

hot-reset does work for DPDK, it just doesn't work in the case where
DPDK would have many VFIO devices open and they have overlapping
device sets. Which, again, is something it doesn't do.

IMHO we should leave it out of the kernel and wait for a no-iommu user
to come forward that wants hot-reset of many devices. Then we can add
and test the device FD part. Most likely such a thing will never come
at this point.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-06 13:16               ` Jason Gunthorpe
@ 2023-03-07  2:31                 ` Tian, Kevin
  2023-03-07  2:35                   ` Liu, Yi L
  2023-03-07 12:36                   ` Jason Gunthorpe
  0 siblings, 2 replies; 131+ messages in thread
From: Tian, Kevin @ 2023-03-07  2:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: linux-s390@vger.kernel.org, Liu, Yi L, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, March 6, 2023 9:17 PM
> 
> On Fri, Mar 03, 2023 at 09:55:42AM -0700, Alex Williamson wrote:
> 
> > I can't think of a reason DPDK couldn't use hot-reset.  If we want to
> > make it a policy, it should be enforced by code, but creating that
> > policy based on a difficulty in supporting that mode with iommufd isn't
> > great.
> 
> On the other hand adding code to allow device FDs in the hot reset
> path that is never used and never tested isn't great either..
> 
> hot-reset does work for DPDK, it just doesn't work in the case where
> DPDK would have many VFIO devices open and they have overlapping
> device sets. Which, again, is something it doesn't do.
> 
> IMHO we should leave it out of the kernel and wait for a no-iommu user
> to come forward that wants hot-reset of many devices. Then we can add
> and test the device FD part. Most likely such a thing will never come
> at this point.
> 

I think we don't need to have this tradeoff if we follow Yi's last proposal,
which requires every opened device in the set to be covered by the
device fd array. With dev_set->lock held in the reset/open path this is
a safe measure and fully contained in vfio-pci w/o need of further
checking noiommu or iommufd.

In the end it is the same reset uAPI, except the fd array can now hold
device fds. 😊

btw Yi, since this also affects the group path (though positively) it's clearer
to first add the open_count check in the existing group path in a separate
patch and then add the device fd support.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-07  2:31                 ` Tian, Kevin
@ 2023-03-07  2:35                   ` Liu, Yi L
  2023-03-07 12:36                   ` Jason Gunthorpe
  1 sibling, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-07  2:35 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Alex Williamson
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Tuesday, March 7, 2023 10:31 AM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, March 6, 2023 9:17 PM
> >
> > On Fri, Mar 03, 2023 at 09:55:42AM -0700, Alex Williamson wrote:
> >
> > > I can't think of a reason DPDK couldn't use hot-reset.  If we want to
> > > make it a policy, it should be enforced by code, but creating that
> > > policy based on a difficulty in supporting that mode with iommufd isn't
> > > great.
> >
> > On the other hand adding code to allow device FDs in the hot reset
> > path that is never used and never tested isn't great either..
> >
> > hot-reset does work for DPDK, it just doesn't work in the case where
> > DPDK would have many VFIO devices open and they have overlapping
> > device sets. Which, again, is something it doesn't do.
> >
> > IMHO we should leave it out of the kernel and wait for a no-iommu user
> > to come forward that wants hot-reset of many devices. Then we can add
> > and test the device FD part. Most likely such a thing will never come
> > at this point.
> >
> 
> I think we don't need to have this tradeoff if following Yi's last proposal
> which requires every opened device in the set to be covered by the
> device fd array. with dev_set->lock held in the reset/open path this is
> a safe measure and fully contained in vfio-pci w/o need of further
> checking noiommu or iommufd.
> 
> In the end same reset uAPI except the fd array can be device fd now. 😊
> 
> btw Yi, since this also affects the group path (though positive) it's clearer
> to first add open_count check in existing group path in a separate patch
> and then add the device fd support.

Yes. I've made them in the below branch. I plan to send v6 out with the
iommufd_access created in bind (not in the below branch yet).

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v6

vfio/pci: Accept device fd for VFIO_DEVICE_PCI_HOT_RESET
(https://github.com/yiliu1765/iommufd/commit/dcefb8ca5d13388ab9b9862992fd77cffcbadc30)
vfio/pci: Only need to check opened devices in the dev_set for hot reset
(https://github.com/yiliu1765/iommufd/commit/f7257f2db958d9d961a6e45ab0e301ee0397a243)

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-03  6:57       ` Liu, Yi L
  2023-03-03  7:23         ` Liu, Yi L
@ 2023-03-07  6:38         ` Tian, Kevin
  2023-03-07 12:37           ` Jason Gunthorpe
  1 sibling, 1 reply; 131+ messages in thread
From: Tian, Kevin @ 2023-03-07  6:38 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, March 3, 2023 2:58 PM
> 
> > What should we return here anyhow if an access was created?
> 
> iommufd_access->obj.id. should be fine. Is it?
> 

Thinking more I'm not sure whether it's a good idea to fill the
dev_id field with an access object id and then later confuse
the user to get an -ENOENT error when trying to allocate a
hwpt with an access object id.

How can the user differentiate it from the real error case where
an invalid iommufd object is used?

It sounds clearer to return dev_id only when there is a true
device object being created by the bind_iommufd cmd. Then
the user can use it to decide whether to further attempt
dev_id related cmds.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-07  2:31                 ` Tian, Kevin
  2023-03-07  2:35                   ` Liu, Yi L
@ 2023-03-07 12:36                   ` Jason Gunthorpe
  2023-03-07 13:28                     ` Liu, Yi L
  1 sibling, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-03-07 12:36 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, nicolinc@nvidia.com,
	Zhao, Yan Y, intel-gfx@lists.freedesktop.org,
	eric.auger@redhat.com, intel-gvt-dev@lists.freedesktop.org,
	yi.y.sun@linux.intel.com, cohuck@redhat.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Mar 07, 2023 at 02:31:11AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, March 6, 2023 9:17 PM
> > 
> > On Fri, Mar 03, 2023 at 09:55:42AM -0700, Alex Williamson wrote:
> > 
> > > I can't think of a reason DPDK couldn't use hot-reset.  If we want to
> > > make it a policy, it should be enforced by code, but creating that
> > > policy based on a difficulty in supporting that mode with iommufd isn't
> > > great.
> > 
> > On the other hand adding code to allow device FDs in the hot reset
> > path that is never used and never tested isn't great either..
> > 
> > hot-reset does work for DPDK, it just doesn't work in the case where
> > DPDK would have many VFIO devices open and they have overlapping
> > device sets. Which, again, is something it doesn't do.
> > 
> > IMHO we should leave it out of the kernel and wait for a no-iommu user
> > to come forward that wants hot-reset of many devices. Then we can add
> > and test the device FD part. Most likely such a thing will never come
> > at this point.
> > 
> 
> I think we don't need to have this tradeoff if following Yi's last proposal
> which requires every opened device in the set to be covered by the
> device fd array. with dev_set->lock held in the reset/open path this is
> a safe measure and fully contained in vfio-pci w/o need of further
> checking noiommu or iommufd.

I would really prefer that the 'use the iommufd' option still exist; it is
so much cleaner and easier for the actual users of this API. We've lost
the point by worrying about no-iommu.

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-07  6:38         ` Tian, Kevin
@ 2023-03-07 12:37           ` Jason Gunthorpe
  2023-03-07 13:03             ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Jason Gunthorpe @ 2023-03-07 12:37 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, nicolinc@nvidia.com,
	Zhao, Yan Y, intel-gfx@lists.freedesktop.org,
	eric.auger@redhat.com, intel-gvt-dev@lists.freedesktop.org,
	yi.y.sun@linux.intel.com, cohuck@redhat.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Tue, Mar 07, 2023 at 06:38:59AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Friday, March 3, 2023 2:58 PM
> > 
> > > What should we return here anyhow if an access was created?
> > 
> > iommufd_access->obj.id. should be fine. Is it?
> 
> Thinking more I'm not sure whether it's a good idea to fill the
> dev_id field with an access object id and then later confuse
> the user to get an -ENOENT error when trying to allocate a
> hwpt with an access object id.
> 
> How can user differentiate it from the real error case where
> invalid iommufd object is used?
> 
> It sounds clearer to return dev_id only when there is a true
> device object being created by the bind_iommufd cmd. Then
> the user can use it to decide whether  to further attempt
> dev_id related cmds.

It means we can never return an access_id.

I don't think this is a problem. The first thing userspace should do
is a get-info on the dev_id, which it needs anyway to learn which iommu
driver is running it. If that returns EOPNOTSUPP then it isn't a
physical iommu device.
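
As a sketch of that userspace flow, assuming the get-info query can be
modelled as a callback that returns 0 or a negative errno. Everything
named model_* here is hypothetical, including the three stub queries,
which exist only to illustrate the three outcomes; none of it is an
iommufd API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback standing in for the get-info query on a dev_id. */
typedef int (*model_get_info_fn)(unsigned int dev_id);

enum model_dev_kind {
	MODEL_DEV_PHYSICAL,     /* get-info worked: dev_id related cmds apply */
	MODEL_DEV_NOT_PHYSICAL, /* EOPNOTSUPP: not a physical iommu device */
	MODEL_DEV_ERROR,        /* anything else is a real error */
};

/* Classify the id returned by bind before attempting dev_id related cmds. */
static enum model_dev_kind model_classify(unsigned int dev_id,
					  model_get_info_fn get_info)
{
	int ret;

	assert(get_info);
	ret = get_info(dev_id);
	if (ret == 0)
		return MODEL_DEV_PHYSICAL;
	if (ret == -EOPNOTSUPP)
		return MODEL_DEV_NOT_PHYSICAL;
	return MODEL_DEV_ERROR;
}

/* Illustrative stubs, one per outcome. */
static int model_info_ok(unsigned int dev_id)
{
	(void)dev_id;
	return 0;
}

static int model_info_unsupported(unsigned int dev_id)
{
	(void)dev_id;
	return -EOPNOTSUPP;
}

static int model_info_missing(unsigned int dev_id)
{
	(void)dev_id;
	return -ENOENT;
}
```

With this ordering the user never reaches hwpt allocation with an id
that does not name a physical device, so the -ENOENT there keeps its
meaning as a real error.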

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-07 12:37           ` Jason Gunthorpe
@ 2023-03-07 13:03             ` Liu, Yi L
  2023-03-08  7:17               ` Tian, Kevin
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-03-07 13:03 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 7, 2023 8:38 PM
> 
> On Tue, Mar 07, 2023 at 06:38:59AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Friday, March 3, 2023 2:58 PM
> > >
> > > > What should we return here anyhow if an access was created?
> > >
> > > iommufd_access->obj.id. should be fine. Is it?
> >
> > Thinking more I'm not sure whether it's a good idea to fill the
> > dev_id field with an access object id and then later confuse
> > the user to get an -ENOENT error when trying to allocate a
> > hwpt with an access object id.
> >
> > How can user differentiate it from the real error case where
> > invalid iommufd object is used?
> >
> > It sounds clearer to return dev_id only when there is a true
> > device object being created by the bind_iommufd cmd. Then
> > the user can use it to decide whether  to further attempt
> > dev_id related cmds.
> 
> It means we can never return an access_id
> 
> I don't think this is a problem, the first thing userspace should do
> is a get info to the dev_id which is needed to learn which iommu
> driver is running it, if that returns EOPNOTSUPP then it isn't a
> physical iommu device.

This may mean your patch below depends on the get-info series. 😊
The description of the ioctl also needs to be updated.

https://lore.kernel.org/linux-iommu/12-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia.com/

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-07 12:36                   ` Jason Gunthorpe
@ 2023-03-07 13:28                     ` Liu, Yi L
  2023-03-08  7:26                       ` Tian, Kevin
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-03-07 13:28 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 7, 2023 8:37 PM
> 
> On Tue, Mar 07, 2023 at 02:31:11AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Monday, March 6, 2023 9:17 PM
> > >
> > > On Fri, Mar 03, 2023 at 09:55:42AM -0700, Alex Williamson wrote:
> > >
> > > > I can't think of a reason DPDK couldn't use hot-reset.  If we want to
> > > > make it a policy, it should be enforced by code, but creating that
> > > > policy based on a difficulty in supporting that mode with iommufd isn't
> > > > great.
> > >
> > > On the other hand adding code to allow device FDs in the hot reset
> > > path that is never used and never tested isn't great either..
> > >
> > > hot-reset does work for DPDK, it just doesn't work in the case where
> > > DPDK would have many VFIO devices open and they have overlapping
> > > device sets. Which, again, is something it doesn't do.
> > >
> > > IMHO we should leave it out of the kernel and wait for a no-iommu user
> > > to come forward that wants hot-reset of many devices. Then we can
> add
> > > and test the device FD part. Most likely such a thing will never come
> > > at this point.
> > >
> >
> > I think we don't need to have this tradeoff if following Yi's last proposal
> > which requires every opened device in the set to be covered by the
> > device fd array. with dev_set->lock held in the reset/open path this is
> > a safe measure and fully contained in vfio-pci w/o need of further
> > checking noiommu or iommufd.
> 
> I really prefer the 'use the iommufd option' still exist, it is so
> much cleaner and easier for the actual users of this API. We've lost
> the point by worrying about no iommu.

Hmmm, so you are suggesting we keep both the device fd approach
and the zero-length array approach, and let the user pick whichever
fits best. Is that right? How about something like below in the
uapi header?

/**
 * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
 *                                  struct vfio_pci_hot_reset)
 *
 * Userspace requests hot reset for the devices it uses.  Due to the
 * underlying topology, multiple devices may be affected in the reset.
 * The affected devices may have been opened by the user or by other
 * users or not opened yet.  Only when all the affected devices are
 * either opened by the current user or not opened by any user, should
 * the reset request be allowed.  Otherwise, this request is expected
 * to return an error.  The group_fds array can accept either group fds
 * or device fds.  Users using iommufd (valid fd) can also pass a
 * zero-length group_fds array to indicate that the bound iommufd_ctx
 * should be used for the ownership check of the affected devices that
 * are opened.
 *
 * Return: 0 on success, -errno on failure.
 */
struct vfio_pci_hot_reset {
        __u32   argsz;
        __u32   flags;
        __u32   count;
        __s32   group_fds[];
};
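
To make the zero-length case concrete, here is a rough userspace sketch of
how such a request could be built. It uses a local mirror of the layout
above; the names hot_reset_req and hot_reset_req_alloc() are made up for
illustration and are not part of any header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the proposed layout (hypothetical name, illustration only). */
struct hot_reset_req {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
	int32_t  group_fds[];	/* group fds or device fds; may be empty */
};

/* Three 32-bit header fields, then the flexible array. */
static_assert(sizeof(struct hot_reset_req) == 12, "unexpected header size");

/*
 * Allocate a request carrying 'count' fds.  With count == 0 the array is
 * empty, which in the proposal asks for the ownership check to be done
 * against the bound iommufd_ctx instead.
 */
static struct hot_reset_req *hot_reset_req_alloc(const int32_t *fds,
						 uint32_t count)
{
	size_t sz = sizeof(struct hot_reset_req) +
		    (size_t)count * sizeof(int32_t);
	struct hot_reset_req *req = calloc(1, sz);

	if (!req)
		return NULL;
	req->argsz = (uint32_t)sz;
	req->count = count;
	if (count)
		memcpy(req->group_fds, fds, (size_t)count * sizeof(int32_t));
	return req;
}
```

The caller would then pass the buffer to ioctl(device_fd,
VFIO_DEVICE_PCI_HOT_RESET, req) and free it afterwards.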

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-07 13:03             ` Liu, Yi L
@ 2023-03-08  7:17               ` Tian, Kevin
  0 siblings, 0 replies; 131+ messages in thread
From: Tian, Kevin @ 2023-03-08  7:17 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, March 7, 2023 9:04 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, March 7, 2023 8:38 PM
> >
> > On Tue, Mar 07, 2023 at 06:38:59AM +0000, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Friday, March 3, 2023 2:58 PM
> > > >
> > > > > What should we return here anyhow if an access was created?
> > > >
> > > > iommufd_access->obj.id. should be fine. Is it?
> > >
> > > Thinking more I'm not sure whether it's a good idea to fill the
> > > dev_id field with an access object id and then later confuse
> > > the user to get an -ENOENT error when trying to allocate a
> > > hwpt with an access object id.
> > >
> > > How can user differentiate it from the real error case where
> > > invalid iommufd object is used?
> > >
> > > It sounds clearer to return dev_id only when there is a true
> > > device object being created by the bind_iommufd cmd. Then
> > > the user can use it to decide whether  to further attempt
> > > dev_id related cmds.
> >
> > It means we can never return an access_id
> >
> > I don't think this is a problem, the first thing userspace should do
> > is a get info to the dev_id which is needed to learn which iommu
> > driver is running it, if that returns EOPNOTSUPP then it isn't a
> > physical iommu device.
> 
> This may mean your below patch depends on the get info series. 😊
> Also need to update the description to the ioctl.
> 
> https://lore.kernel.org/linux-iommu/12-v1-7612f88c19f5+2f21-
> iommufd_alloc_jgg@nvidia.com/
> 

Probably not necessary. It's up to the user to get the info and then
create the hwpt. I don't think we'll ever add a check for whether the
user has acquired the info before creating the hwpt. From this angle
there is no dependency code-wise.

My earlier comment was based on the user creating a hwpt w/o
querying the info. Looks like it's just the user's job to get it right.
We may clarify this point in the hwpt_alloc uAPI comment.
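
For illustration only, the userspace ordering discussed above could look
roughly like this. The get-info command is modelled as a callback because
its uAPI is still under review; get_info_fn, classify_dev_id() and the stub
functions are all hypothetical names, not existing interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed get-info command on a dev_id. */
typedef int (*get_info_fn)(uint32_t dev_id);

enum dev_kind {
	DEV_PHYSICAL,		/* get info worked: hwpt allocation makes sense */
	DEV_NO_PHYS_IOMMU,	/* -EOPNOTSUPP: not a physical iommu device */
	DEV_ERROR,		/* anything else, e.g. an invalid object id */
};

/* Query the id first, so a later hwpt allocation failure is not ambiguous. */
static enum dev_kind classify_dev_id(get_info_fn get_info, uint32_t dev_id)
{
	int ret = get_info(dev_id);

	if (ret == 0)
		return DEV_PHYSICAL;
	if (ret == -EOPNOTSUPP)
		return DEV_NO_PHYS_IOMMU;
	return DEV_ERROR;
}

/* Stubs that play the kernel's role in this sketch. */
static int stub_physical(uint32_t dev_id) { (void)dev_id; return 0; }
static int stub_access(uint32_t dev_id)   { (void)dev_id; return -EOPNOTSUPP; }
static int stub_bad_id(uint32_t dev_id)   { (void)dev_id; return -ENOENT; }
```

Only when the result is DEV_PHYSICAL would the user go on to allocate a
hwpt for that id.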

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-07 13:28                     ` Liu, Yi L
@ 2023-03-08  7:26                       ` Tian, Kevin
  2023-03-08  7:47                         ` Liu, Yi L
  2023-03-08 15:08                         ` Jason Gunthorpe
  0 siblings, 2 replies; 131+ messages in thread
From: Tian, Kevin @ 2023-03-08  7:26 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, March 7, 2023 9:29 PM
> 
> >
> > I really prefer the 'use the iommufd option' still exist, it is so
> > much cleaner and easier for the actual users of this API. We've lost
> > the point by worrying about no iommu.
> 
> Hmmm, so you are suggesting to have both the device fd approach
> and the zero-length array approach, let user to select the best way
> based on their wisdom. Is it? how about something like below in the
> uapi header.
> 
> /**
>  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>  *                                  struct vfio_pci_hot_reset)
>  *
>  * Userspace requests hot reset for the devices it uses.  Due to the
>  * underlying topology, multiple devices may be affected in the reset.
>  * The affected devices may have been opened by the user or by other
>  * users or not opened yet.  Only when all the affected devices are
>  * either opened by the current user or not opened by any user, should
>  * the reset request be allowed.  Otherwise, this request is expected
>  * to return error. group_fds array can accept either group fds or
>  * device fds.  Users using iommufd (valid fd), could also passing a
>  * zero-length group_fds array to indicate using the bound iommufd_ctx
>  * for ownership check to the affected devices that are opened.
>  *
>  * Return: 0 on success, -errno on failure.
>  */
> struct vfio_pci_hot_reset {
>         __u32   argsz;
>         __u32   flags;
>         __u32   count;
>         __s32   group_fds[];
> };
> 

 * Userspace requests hot reset for the devices it uses.  Due to the
 * underlying topology, multiple devices can be affected in the reset
 * while some might be opened by another user. To avoid interference
 * the calling user must ensure all affected devices, if opened, are
 * owned by itself.
 *
 * The ownership can be proved in three ways:
 *   - An array of group fds
 *   - An array of device fds
 *   - A zero-length array
 *
 * In the last case all affected devices which are opened by this user must
 * have been bound to a same iommufd_ctx.

and with this change let's rename 'group_fds'  to 'fds'
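
A minimal model of the zero-length rule, just to pin down the semantics.
This is not the vfio-pci code: struct affected_dev and
zero_len_ownership_ok() are made-up names, and treating a caller without a
bound iommufd_ctx as a failure is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One device affected by the reset (hypothetical model, not kernel code). */
struct affected_dev {
	bool opened;		/* opened by some user */
	const void *ictx;	/* stands in for the bound iommufd_ctx, or NULL */
};

/*
 * Zero-length array case: every affected device that is opened must be
 * bound to the caller's iommufd_ctx.  Unopened devices do not matter.
 */
static bool zero_len_ownership_ok(const struct affected_dev *devs, size_t n,
				  const void *caller_ictx)
{
	if (!caller_ictx)	/* assumption: no bound ctx means no proof */
		return false;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].opened)
			continue;
		if (devs[i].ictx != caller_ictx)
			return false;
	}
	return true;
}
```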

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-08  7:26                       ` Tian, Kevin
@ 2023-03-08  7:47                         ` Liu, Yi L
  2023-03-08  7:55                           ` Tian, Kevin
  2023-03-08 15:08                         ` Jason Gunthorpe
  1 sibling, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-03-08  7:47 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, March 8, 2023 3:26 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, March 7, 2023 9:29 PM
> >
> > >
> > > I really prefer the 'use the iommufd option' still exist, it is so
> > > much cleaner and easier for the actual users of this API. We've lost
> > > the point by worrying about no iommu.
> >
> > Hmmm, so you are suggesting to have both the device fd approach
> > and the zero-length array approach, let user to select the best way
> > based on their wisdom. Is it? how about something like below in the
> > uapi header.
> >
> > /**
> >  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> >  *                                  struct vfio_pci_hot_reset)
> >  *
> >  * Userspace requests hot reset for the devices it uses.  Due to the
> >  * underlying topology, multiple devices may be affected in the reset.
> >  * The affected devices may have been opened by the user or by other
> >  * users or not opened yet.  Only when all the affected devices are
> >  * either opened by the current user or not opened by any user, should
> >  * the reset request be allowed.  Otherwise, this request is expected
> >  * to return error. group_fds array can accept either group fds or
> >  * device fds.  Users using iommufd (valid fd), could also passing a
> >  * zero-length group_fds array to indicate using the bound iommufd_ctx
> >  * for ownership check to the affected devices that are opened.
> >  *
> >  * Return: 0 on success, -errno on failure.
> >  */
> > struct vfio_pci_hot_reset {
> >         __u32   argsz;
> >         __u32   flags;
> >         __u32   count;
> >         __s32   group_fds[];
> > };
> >
> 
>  * Userspace requests hot reset for the devices it uses.  Due to the
>  * underlying topology, multiple devices can be affected in the reset
>  * while some might be opened by another user. To avoid interference
>  * the calling user must ensure all affected devices, if opened, are
>  * owned by itself.
>  *
>  * The ownership can be proved in three ways:
>  *   - An array of group fds
>  *   - An array of device fds
>  *   - A zero-length array
>  *
Thanks.
>  * In the last case all affected devices which are opened by this user must
>  * have been bound to a same iommufd_ctx.

I think we only allow it when this iommufd_ctx is valid, right? To the
user, it means the device should be bound to a positive iommufd.

> and with this change let's rename 'group_fds'  to 'fds'

Sure. It would be something like below:

struct vfio_pci_hot_reset {
	__u32   argsz;
	__u32   flags;
	__u32   count;
	union {
		__s32   group_fds[0];
		__s32   fds[0];
	};
};

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-08  7:47                         ` Liu, Yi L
@ 2023-03-08  7:55                           ` Tian, Kevin
  2023-03-08  8:00                             ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Tian, Kevin @ 2023-03-08  7:55 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, March 8, 2023 3:47 PM
> 
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Wednesday, March 8, 2023 3:26 PM
> >
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Tuesday, March 7, 2023 9:29 PM
> > >
> > > >
> > > > I really prefer the 'use the iommufd option' still exist, it is so
> > > > much cleaner and easier for the actual users of this API. We've lost
> > > > the point by worrying about no iommu.
> > >
> > > Hmmm, so you are suggesting to have both the device fd approach
> > > and the zero-length array approach, let user to select the best way
> > > based on their wisdom. Is it? how about something like below in the
> > > uapi header.
> > >
> > > /**
> > >  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> > >  *                                  struct vfio_pci_hot_reset)
> > >  *
> > >  * Userspace requests hot reset for the devices it uses.  Due to the
> > >  * underlying topology, multiple devices may be affected in the reset.
> > >  * The affected devices may have been opened by the user or by other
> > >  * users or not opened yet.  Only when all the affected devices are
> > >  * either opened by the current user or not opened by any user, should
> > >  * the reset request be allowed.  Otherwise, this request is expected
> > >  * to return error. group_fds array can accept either group fds or
> > >  * device fds.  Users using iommufd (valid fd), could also passing a
> > >  * zero-length group_fds array to indicate using the bound iommufd_ctx
> > >  * for ownership check to the affected devices that are opened.
> > >  *
> > >  * Return: 0 on success, -errno on failure.
> > >  */
> > > struct vfio_pci_hot_reset {
> > >         __u32   argsz;
> > >         __u32   flags;
> > >         __u32   count;
> > >         __s32   group_fds[];
> > > };
> > >
> >
> >  * Userspace requests hot reset for the devices it uses.  Due to the
> >  * underlying topology, multiple devices can be affected in the reset
> >  * while some might be opened by another user. To avoid interference
> >  * the calling user must ensure all affected devices, if opened, are
> >  * owned by itself.
> >  *
> >  * The ownership can be proved in three ways:
> >  *   - An array of group fds
> >  *   - An array of device fds
> >  *   - A zero-length array
> >  *
> Thanks.
> >  * In the last case all affected devices which are opened by this user must
> >  * have been bound to a same iommufd_ctx.
> 
> I think we only allow it when this iommufd_ctx is valid. Is it? To
> user, it means device should be bound to a positive iommufd.

I didn't get it. Do we have an iommufd_ctx created but marked as
invalid?

> 
> > and with this change let's rename 'group_fds'  to 'fds'
> 
> Sure. It would be something like below:
> 
> struct vfio_pci_hot_reset {
> 	__u32   argsz;
> 	__u32   flags;
> 	__u32   count;
> 	union {
> 		__s32   group_fds[0];
> 		__s32   fds[0];
> 	};
> };
> 

Why a union? Just renaming should work. In the kernel we will first
check whether it's a group fd, then whether it's a device fd, and then
compare the iommufd_ctx if the array is zero-length.
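
Roughly, the order of checks could be modelled as below. This is only a
sketch: enum fd_kind, enum proof and pick_proof() are invented for
illustration, and rejecting an array that mixes group and device fds is an
assumption rather than something decided in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What a passed-in fd turned out to be (hypothetical classification). */
enum fd_kind { FD_GROUP, FD_DEVICE, FD_OTHER };

/* Which ownership proof the request ends up using. */
enum proof { PROOF_GROUP, PROOF_DEVICE, PROOF_IOMMUFD_CTX, PROOF_INVALID };

/* True only for a non-empty array whose entries are all of kind 'want'. */
static bool all_of_kind(const enum fd_kind *kinds, uint32_t count,
			enum fd_kind want)
{
	for (uint32_t i = 0; i < count; i++)
		if (kinds[i] != want)
			return false;
	return count > 0;
}

/* Group fds first, then device fds, then iommufd_ctx for an empty array. */
static enum proof pick_proof(const enum fd_kind *kinds, uint32_t count)
{
	if (all_of_kind(kinds, count, FD_GROUP))
		return PROOF_GROUP;
	if (all_of_kind(kinds, count, FD_DEVICE))
		return PROOF_DEVICE;
	if (count == 0)
		return PROOF_IOMMUFD_CTX;
	return PROOF_INVALID;	/* assumption: mixed or unknown fds fail */
}
```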

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-08  7:55                           ` Tian, Kevin
@ 2023-03-08  8:00                             ` Liu, Yi L
  2023-03-08  8:14                               ` Tian, Kevin
  0 siblings, 1 reply; 131+ messages in thread
From: Liu, Yi L @ 2023-03-08  8:00 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, March 8, 2023 3:55 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, March 8, 2023 3:47 PM
> >
> > > From: Tian, Kevin <kevin.tian@intel.com>
> > > Sent: Wednesday, March 8, 2023 3:26 PM
> > >
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Tuesday, March 7, 2023 9:29 PM
> > > >
> > > > >
> > > > > I really prefer the 'use the iommufd option' still exist, it is so
> > > > > much cleaner and easier for the actual users of this API. We've lost
> > > > > the point by worrying about no iommu.
> > > >
> > > > Hmmm, so you are suggesting to have both the device fd approach
> > > > and the zero-length array approach, let user to select the best way
> > > > based on their wisdom. Is it? how about something like below in the
> > > > uapi header.
> > > >
> > > > /**
> > > >  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> > > >  *                                  struct vfio_pci_hot_reset)
> > > >  *
> > > >  * Userspace requests hot reset for the devices it uses.  Due to the
> > > >  * underlying topology, multiple devices may be affected in the reset.
> > > >  * The affected devices may have been opened by the user or by
> other
> > > >  * users or not opened yet.  Only when all the affected devices are
> > > >  * either opened by the current user or not opened by any user,
> should
> > > >  * the reset request be allowed.  Otherwise, this request is expected
> > > >  * to return error. group_fds array can accept either group fds or
> > > >  * device fds.  Users using iommufd (valid fd), could also passing a
> > > >  * zero-length group_fds array to indicate using the bound
> iommufd_ctx
> > > >  * for ownership check to the affected devices that are opened.
> > > >  *
> > > >  * Return: 0 on success, -errno on failure.
> > > >  */
> > > > struct vfio_pci_hot_reset {
> > > >         __u32   argsz;
> > > >         __u32   flags;
> > > >         __u32   count;
> > > >         __s32   group_fds[];
> > > > };
> > > >
> > >
> > >  * Userspace requests hot reset for the devices it uses.  Due to the
> > >  * underlying topology, multiple devices can be affected in the reset
> > >  * while some might be opened by another user. To avoid interference
> > >  * the calling user must ensure all affected devices, if opened, are
> > >  * owned by itself.
> > >  *
> > >  * The ownership can be proved in three ways:
> > >  *   - An array of group fds
> > >  *   - An array of device fds
> > >  *   - A zero-length array
> > >  *
> > Thanks.
> > >  * In the last case all affected devices which are opened by this user
> must
> > >  * have been bound to a same iommufd_ctx.
> >
> > I think we only allow it when this iommufd_ctx is valid. Is it? To
> > user, it means device should be bound to a positive iommufd.
> 
> I didn't get it. Do we have a iommufd_ctx created but marked as
> invalid?

I mean iommufd_ctx==NULL. If a negative iommufd is provided,
then the kernel side only has a NULL iommufd_ctx. If so, the ownership
check just fails if it uses iommufd_ctx for ownership proof.

> 
> >
> > > and with this change let's rename 'group_fds'  to 'fds'
> >
> > Sure. It would be something like below:
> >
> > struct vfio_pci_hot_reset {
> > 	__u32   argsz;
> > 	__u32   flags;
> > 	__u32   count;
> > 	union {
> > 		__s32   group_fds[0];
> > 		__s32   fds[0];
> > 	};
> > };
> >
> 
> why union? Just renaming should work. In the kernel we will first
> check whether it's group, whether it's device, then compare
> iommufd_ctx is zero length.

This is for old QEMU versions. However, since it's just a rename,
perhaps it is not needed. The layout is not changed. If QEMU imports
the new header file, it needs to update its uses of group_fds as
well.
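
A quick way to convince ourselves that the rename alone keeps the ABI is a
compile-time comparison of the two layouts. The struct names below are
local to this sketch and use fixed-width stdint types in place of the
kernel's __u32/__s32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as existing userspace sees it today. */
struct hot_reset_old {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
	int32_t  group_fds[];
};

/* Layout after the rename, without any union. */
struct hot_reset_new {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
	int32_t  fds[];
};

/* Same size and same array offset: old binaries keep working unchanged. */
static_assert(sizeof(struct hot_reset_old) == sizeof(struct hot_reset_new),
	      "rename must not change the struct size");
static_assert(offsetof(struct hot_reset_old, group_fds) ==
	      offsetof(struct hot_reset_new, fds),
	      "rename must not move the fd array");
```

Only source code that gets rebuilt against the new header has to switch to
the new field name.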

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-08  8:00                             ` Liu, Yi L
@ 2023-03-08  8:14                               ` Tian, Kevin
  2023-03-08  8:15                                 ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Tian, Kevin @ 2023-03-08  8:14 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, March 8, 2023 4:01 PM
> 
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Wednesday, March 8, 2023 3:55 PM
> >
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Wednesday, March 8, 2023 3:47 PM
> > >
> > > > From: Tian, Kevin <kevin.tian@intel.com>
> > > > Sent: Wednesday, March 8, 2023 3:26 PM
> > > >
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Tuesday, March 7, 2023 9:29 PM
> > > > >
> > > > > >
> > > > > > I really prefer the 'use the iommufd option' still exist, it is so
> > > > > > much cleaner and easier for the actual users of this API. We've lost
> > > > > > the point by worrying about no iommu.
> > > > >
> > > > > Hmmm, so you are suggesting to have both the device fd approach
> > > > > and the zero-length array approach, let user to select the best way
> > > > > based on their wisdom. Is it? how about something like below in the
> > > > > uapi header.
> > > > >
> > > > > /**
> > > > >  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> > > > >  *                                  struct vfio_pci_hot_reset)
> > > > >  *
> > > > >  * Userspace requests hot reset for the devices it uses.  Due to the
> > > > >  * underlying topology, multiple devices may be affected in the reset.
> > > > >  * The affected devices may have been opened by the user or by
> > other
> > > > >  * users or not opened yet.  Only when all the affected devices are
> > > > >  * either opened by the current user or not opened by any user,
> > should
> > > > >  * the reset request be allowed.  Otherwise, this request is expected
> > > > >  * to return error. group_fds array can accept either group fds or
> > > > >  * device fds.  Users using iommufd (valid fd), could also passing a
> > > > >  * zero-length group_fds array to indicate using the bound
> > iommufd_ctx
> > > > >  * for ownership check to the affected devices that are opened.
> > > > >  *
> > > > >  * Return: 0 on success, -errno on failure.
> > > > >  */
> > > > > struct vfio_pci_hot_reset {
> > > > >         __u32   argsz;
> > > > >         __u32   flags;
> > > > >         __u32   count;
> > > > >         __s32   group_fds[];
> > > > > };
> > > > >
> > > >
> > > >  * Userspace requests hot reset for the devices it uses.  Due to the
> > > >  * underlying topology, multiple devices can be affected in the reset
> > > >  * while some might be opened by another user. To avoid interference
> > > >  * the calling user must ensure all affected devices, if opened, are
> > > >  * owned by itself.
> > > >  *
> > > >  * The ownership can be proved in three ways:
> > > >  *   - An array of group fds
> > > >  *   - An array of device fds
> > > >  *   - A zero-length array
> > > >  *
> > > Thanks.
> > > >  * In the last case all affected devices which are opened by this user
> > must
> > > >  * have been bound to a same iommufd_ctx.
> > >
> > > I think we only allow it when this iommufd_ctx is valid. Is it? To
> > > user, it means device should be bound to a positive iommufd.
> >
> > I didn't get it. Do we have a iommufd_ctx created but marked as
> > invalid?
> 
> I mean iommufd_ctx==NULL. If a negative iommufd is provided,
> then kernel side only has a NULL iommufd_ctx. If so, the ownership
> check just fail if it uses iommufd_ctx for ownership proof.

It's fine. The iommufd_ctx check doesn't work with noiommu.

The user should use device fds if noiommu is involved.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-08  8:14                               ` Tian, Kevin
@ 2023-03-08  8:15                                 ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-08  8:15 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, March 8, 2023 4:14 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, March 8, 2023 4:01 PM
> >
> > > From: Tian, Kevin <kevin.tian@intel.com>
> > > Sent: Wednesday, March 8, 2023 3:55 PM
> > >
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Wednesday, March 8, 2023 3:47 PM
> > > >
> > > > > From: Tian, Kevin <kevin.tian@intel.com>
> > > > > Sent: Wednesday, March 8, 2023 3:26 PM
> > > > >
> > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > Sent: Tuesday, March 7, 2023 9:29 PM
> > > > > >
> > > > > > >
> > > > > > > I really prefer the 'use the iommufd option' still exist, it is so
> > > > > > > much cleaner and easier for the actual users of this API. We've
> lost
> > > > > > > the point by worrying about no iommu.
> > > > > >
> > > > > > Hmmm, so you are suggesting to have both the device fd approach
> > > > > > and the zero-length array approach, let user to select the best way
> > > > > > based on their wisdom. Is it? how about something like below in
> the
> > > > > > uapi header.
> > > > > >
> > > > > > /**
> > > > > >  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE +
> 13,
> > > > > >  *                                  struct vfio_pci_hot_reset)
> > > > > >  *
> > > > > >  * Userspace requests hot reset for the devices it uses.  Due to the
> > > > > >  * underlying topology, multiple devices may be affected in the
> reset.
> > > > > >  * The affected devices may have been opened by the user or by
> > > other
> > > > > >  * users or not opened yet.  Only when all the affected devices are
> > > > > >  * either opened by the current user or not opened by any user,
> > > should
> > > > > >  * the reset request be allowed.  Otherwise, this request is
> expected
> > > > > >  * to return error. group_fds array can accept either group fds or
> > > > > >  * device fds.  Users using iommufd (valid fd), could also passing a
> > > > > >  * zero-length group_fds array to indicate using the bound
> > > iommufd_ctx
> > > > > >  * for ownership check to the affected devices that are opened.
> > > > > >  *
> > > > > >  * Return: 0 on success, -errno on failure.
> > > > > >  */
> > > > > > struct vfio_pci_hot_reset {
> > > > > >         __u32   argsz;
> > > > > >         __u32   flags;
> > > > > >         __u32   count;
> > > > > >         __s32   group_fds[];
> > > > > > };
> > > > > >
> > > > >
> > > > >  * Userspace requests hot reset for the devices it uses.  Due to the
> > > > >  * underlying topology, multiple devices can be affected in the reset
> > > > >  * while some might be opened by another user. To avoid
> interference
> > > > >  * the calling user must ensure all affected devices, if opened, are
> > > > >  * owned by itself.
> > > > >  *
> > > > >  * The ownership can be proved in three ways:
> > > > >  *   - An array of group fds
> > > > >  *   - An array of device fds
> > > > >  *   - A zero-length array
> > > > >  *
> > > > Thanks.
> > > > >  * In the last case all affected devices which are opened by this user
> > > must
> > > > >  * have been bound to a same iommufd_ctx.
> > > >
> > > > I think we only allow it when this iommufd_ctx is valid. Is it? To
> > > > user, it means device should be bound to a positive iommufd.
> > >
> > > I didn't get it. Do we have a iommufd_ctx created but marked as
> > > invalid?
> >
> > I mean iommufd_ctx==NULL. If a negative iommufd is provided,
> > then kernel side only has a NULL iommufd_ctx. If so, the ownership
> > check just fail if it uses iommufd_ctx for ownership proof.
> 
> it's fine. iommufd_ctx check doesn't work with noiommu.
> 
> User should use device fd if involving noiommu.

Yes, this is my point. The zero-length array approach is only
available for devices that are bound to a positive iommufd.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-08  7:26                       ` Tian, Kevin
  2023-03-08  7:47                         ` Liu, Yi L
@ 2023-03-08 15:08                         ` Jason Gunthorpe
  1 sibling, 0 replies; 131+ messages in thread
From: Jason Gunthorpe @ 2023-03-08 15:08 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, nicolinc@nvidia.com,
	Zhao, Yan Y, intel-gfx@lists.freedesktop.org,
	eric.auger@redhat.com, intel-gvt-dev@lists.freedesktop.org,
	yi.y.sun@linux.intel.com, cohuck@redhat.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com

On Wed, Mar 08, 2023 at 07:26:08AM +0000, Tian, Kevin wrote:
>  * Userspace requests hot reset for the devices it uses.  Due to the
>  * underlying topology, multiple devices can be affected in the reset
>  * while some might be opened by another user. To avoid interference
>  * the calling user must ensure all affected devices, if opened, are
>  * owned by itself.
>  *
>  * The ownership can be proved in three ways:
>  *   - An array of group fds
>  *   - An array of device fds
>  *   - A zero-length array
>  *
>  * In the last case all affected devices which are opened by this user must
>  * have been bound to a same iommufd_ctx.
> 
> and with this change let's rename 'group_fds'  to 'fds'

Looks right

Jason

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-04  7:00               ` Nicolin Chen
  2023-03-04  8:22                 ` Liu, Yi L
@ 2023-03-08 15:54                 ` Shameerali Kolothum Thodi
  2023-03-14 11:38                 ` Shameerali Kolothum Thodi
  2 siblings, 0 replies; 131+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-08 15:54 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, Jason Gunthorpe, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, suravee.suthikulpanit@amd.com,
	robin.murphy@arm.com



> -----Original Message-----
> From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> Sent: 04 March 2023 07:01
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> lulu@redhat.com; suravee.suthikulpanit@amd.com;
> intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> Yan Y <yan.y.zhao@intel.com>
> Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 
> On Fri, Mar 03, 2023 at 03:01:03PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> > > -----Original Message-----
> > > From: Nicolin Chen [mailto:nicolinc@nvidia.com]
> > > Sent: 02 March 2023 23:51
> > > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > > Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L
> > > <yi.l.liu@intel.com>; Jason Gunthorpe <jgg@nvidia.com>;
> > > alex.williamson@redhat.com; Tian, Kevin <kevin.tian@intel.com>;
> > > joro@8bytes.org; robin.murphy@arm.com; cohuck@redhat.com;
> > > eric.auger@redhat.com; kvm@vger.kernel.org; mjrosato@linux.ibm.com;
> > > chao.p.peng@linux.intel.com; yi.y.sun@linux.intel.com;
> > > peterx@redhat.com; jasowang@redhat.com; lulu@redhat.com;
> > > suravee.suthikulpanit@amd.com; intel-gvt-dev@lists.freedesktop.org;
> > > intel-gfx@lists.freedesktop.org; linux-s390@vger.kernel.org; Hao,
> > > Xudong <xudong.hao@intel.com>; Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Subject: Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd
> > > support
> > >
> > > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi
> > > wrote:
> > >
> > > > Hi Nicolin,
> > > >
> > > > Thanks for the latest ARM64 branch. Do you have a working Qemu
> > > > branch
> > > corresponding to the
> > > > above one?
> > > >
> > > > I tried the
> > >
> https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> > > smmuv3
> > > > but for some reason not able to launch the Guest.
> > > >
> > > > Please let me know.
> > >
> > > I do use that branch. It might not be that robust though as it went
> > > through a big rebase.
> >
> > Ok. The issue seems to be quite random in nature and only happens when
> > there are multiple vCPUs. Also doesn't look like related to VFIO
> > device assignment as I can reproduce Guest hang without it by only
> > having nested-smmuv3 and iommufd object.
> >
> > ./qemu-system-aarch64-iommuf -machine
> > virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> -enable-kvm
> > -cpu host -m 1G -smp cpus=8,maxcpus=8 \ -object iommufd,id=iommufd0
> \
> > -bios QEMU_EFI.fd \ -kernel Image-6.2-iommufd \ -initrd
> > rootfs-iperf.cpio \ -net none \ -nographic \ -append "rdinit=init
> > console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \ -trace
> > events=events \ -D trace_iommufd
> >
> > When the issue happens, no output on terminal as if Qemu is in a locked
> state.
> >
> >  Can you try with the followings?
> > >
> > > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*"
> > > --trace "msi_*" --trace "nvme_*"
> >
> > The only trace events with above are this,
> >
> > iommufd_backend_connect fd=22 owned=1 users=1 (0) smmu_add_mr
> > smmuv3-iommu-memory-region-0-0
> >
> > I haven't debugged this further. Please let me know if issue is
> > reproducible with multiple vCPUs at your end. For now will focus on VFIO
> dev specific tests.
> 
> Oh. My test environment has been a single-core vCPU. So that doesn't
> happen to me. Can you try a vanilla QEMU branch that our nesting branch is
> rebased on? I took a branch from Yi as the baseline, while he might take
> from Eric for the rfcv3.
> 
> I am guessing that it might be an issue in the common tree.

Yes, that looks like the case.
I tried with:
 commit 13356edb8750("Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging")

And the issue is still there. So hopefully it will go away once we rebase everything.

Thanks,
Shameer


* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
  2023-02-27 19:19   ` Jason Gunthorpe
  2023-03-01  9:19   ` Liu, Yi L
@ 2023-03-10  2:39   ` Alexey Kardashevskiy
  2023-03-10  5:49     ` Liu, Yi L
  2 siblings, 1 reply; 131+ messages in thread
From: Alexey Kardashevskiy @ 2023-03-10  2:39 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg, kevin.tian
  Cc: linux-s390, yi.y.sun, mjrosato, kvm, intel-gvt-dev, joro, cohuck,
	xudong.hao, peterx, yan.y.zhao, eric.auger, terrence.xu, nicolinc,
	shameerali.kolothum.thodi, suravee.suthikulpanit, intel-gfx,
	chao.p.peng, lulu, robin.murphy, jasowang

On 27/2/23 22:11, Yi Liu wrote:
> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>      VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 			      VFIO no iommu mode is indicated by passing
> 			      a negative iommufd value.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/vfio/device_cdev.c | 146 +++++++++++++++++++++++++++++++++++++
>   drivers/vfio/vfio.h        |  17 ++++-
>   drivers/vfio/vfio_main.c   |  54 ++++++++++++--
>   include/linux/iommufd.h    |   6 ++
>   include/uapi/linux/vfio.h  |  34 +++++++++
>   5 files changed, 248 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 9e2c1ecaaf4f..37f80e368551 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>    * Copyright (c) 2023 Intel Corporation.
>    */
>   #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>   
>   #include "vfio.h"
>   
> @@ -45,6 +46,151 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>   	return ret;
>   }
>   
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (!df->kvm)
> +		goto unlock;
> +
> +	_vfio_device_get_kvm_safe(df->device, df->kvm);
> +
> +unlock:
> +	spin_unlock(&df->kvm_ref_lock);
> +}
> +
> +void vfio_device_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/*
> +	 * As df->access_granted writer is under dev_set->lock as well,
> +	 * so this read no need to use smp_load_acquire() to pair with
> +	 * smp_store_release() in the caller of vfio_device_open().
> +	 */
> +	if (!df->access_granted) {
> +		mutex_unlock(&device->dev_set->lock);
> +		return;
> +	}
> +	vfio_device_close(df);
> +	vfio_device_put_kvm(device);
> +	if (df->iommufd)
> +		iommufd_ctx_put(df->iommufd);
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +}
> +
> +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> +{
> +	struct fd f;
> +	struct iommufd_ctx *iommufd;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +
> +	iommufd = iommufd_ctx_from_file(f.file);
> +
> +	fdput(f);
> +	return iommufd;
> +}
> +
> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				    unsigned long arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_bind_iommufd bind;
> +	struct iommufd_ctx *iommufd = NULL;
> +	unsigned long minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> +
> +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> +		return -EFAULT;
> +
> +	if (bind.argsz < minsz || bind.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;
> +
> +	ret = vfio_device_block_group(device);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/*
> +	 * If already been bound to an iommufd, or already set noiommu
> +	 * then fail it.
> +	 */
> +	if (df->iommufd || df->noiommu) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* iommufd < 0 means noiommu mode */
> +	if (bind.iommufd < 0) {
> +		if (!capable(CAP_SYS_RAWIO)) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +		df->noiommu = true;
> +	} else {
> +		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> +		if (IS_ERR(iommufd)) {
> +			ret = PTR_ERR(iommufd);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	/*
> +	 * Before the device open, get the KVM pointer currently
> +	 * associated with the device file (if there is) and obtain
> +	 * a reference.  This reference is held until device closed.
> +	 * Save the pointer in the device for use by drivers.
> +	 */
> +	vfio_device_get_kvm_safe(df);
> +
> +	df->iommufd = iommufd;
> +	ret = vfio_device_open(df, &bind.out_devid, NULL);


This is unrelated to this patch but reminded me - while debugging QEMU, 
vfio_assert_device_open() kept firing as I was killing QEMU (which in 
turn made the kernel close all fds), device->open_count==0 as QEMU was 
dying before calling ioctl(VFIO_DEVICE_BIND_IOMMUFD) which would call 
this vfio_device_open() just above. Has this been reported/fixed, just 
curious?



-- 
Alexey
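
The argument checks in the quoted vfio_device_ioctl_bind_iommufd() can be
modeled in userspace C as below. This is a minimal sketch, assuming a
struct layout of argsz/flags/iommufd/out_devid; struct model_bind,
enum model_mode and model_bind_check() are hypothetical names, and the
two bool parameters stand in for "df->iommufd || df->noiommu" and
capable(CAP_SYS_RAWIO).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields the patch reads from userspace. */
struct model_bind {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;   /* < 0 requests noiommu mode */
	uint32_t out_devid;
};

enum model_mode { MODEL_MODE_IOMMUFD, MODEL_MODE_NOIOMMU };

/* Returns 0 and sets *mode on success, or a negative errno. */
static int model_bind_check(const struct model_bind *bind, bool already_bound,
			    bool has_rawio, enum model_mode *mode)
{
	/* Too-short argument or any unknown flag is rejected. */
	if (bind->argsz < sizeof(*bind) || bind->flags)
		return -EINVAL;

	/* A device file can be bound (or set to noiommu) only once. */
	if (already_bound)
		return -EINVAL;

	/* A negative iommufd selects noiommu, which needs the capability. */
	if (bind->iommufd < 0) {
		if (!has_rawio)
			return -EPERM;
		*mode = MODEL_MODE_NOIOMMU;
		return 0;
	}

	*mode = MODEL_MODE_IOMMUFD;
	return 0;
}
```

The model leaves out the group blocking, the dev_set lock and the fd
lookup; it only shows which inputs lead to -EINVAL, -EPERM or success.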



* Re: [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-10  2:39   ` Alexey Kardashevskiy
@ 2023-03-10  5:49     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-10  5:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy, alex.williamson@redhat.com, jgg@nvidia.com,
	Tian, Kevin
  Cc: linux-s390@vger.kernel.org, yi.y.sun@linux.intel.com,
	mjrosato@linux.ibm.com, kvm@vger.kernel.org,
	intel-gvt-dev@lists.freedesktop.org, joro@8bytes.org,
	cohuck@redhat.com, Hao, Xudong, peterx@redhat.com, Zhao, Yan Y,
	eric.auger@redhat.com, Xu, Terrence, nicolinc@nvidia.com,
	shameerali.kolothum.thodi@huawei.com,
	suravee.suthikulpanit@amd.com, intel-gfx@lists.freedesktop.org,
	chao.p.peng@linux.intel.com, lulu@redhat.com,
	robin.murphy@arm.com, jasowang@redhat.com

> From: Alexey Kardashevskiy <aik@amd.com>
> Sent: Friday, March 10, 2023 10:39 AM
> 
> On 27/2/23 22:11, Yi Liu wrote:
> > This adds ioctl for userspace to bind device cdev fd to iommufd.
> >
> >      VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain
> DMA
> > 			      control provided by the iommufd. open_device
> > 			      op is called after bind_iommufd op.
> > 			      VFIO no iommu mode is indicated by passing
> > 			      a negative iommufd value.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/vfio/device_cdev.c | 146
> +++++++++++++++++++++++++++++++++++++
> >   drivers/vfio/vfio.h        |  17 ++++-
> >   drivers/vfio/vfio_main.c   |  54 ++++++++++++--
> >   include/linux/iommufd.h    |   6 ++
> >   include/uapi/linux/vfio.h  |  34 +++++++++
> >   5 files changed, 248 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 9e2c1ecaaf4f..37f80e368551 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -3,6 +3,7 @@
> >    * Copyright (c) 2023 Intel Corporation.
> >    */
> >   #include <linux/vfio.h>
> > +#include <linux/iommufd.h>
> >
> >   #include "vfio.h"
> >
> > @@ -45,6 +46,151 @@ int vfio_device_fops_cdev_open(struct inode
> *inode, struct file *filep)
> >   	return ret;
> >   }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +	spin_lock(&df->kvm_ref_lock);
> > +	if (!df->kvm)
> > +		goto unlock;
> > +
> > +	_vfio_device_get_kvm_safe(df->device, df->kvm);
> > +
> > +unlock:
> > +	spin_unlock(&df->kvm_ref_lock);
> > +}
> > +
> > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > +	struct vfio_device *device = df->device;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/*
> > +	 * As df->access_granted writer is under dev_set->lock as well,
> > +	 * so this read no need to use smp_load_acquire() to pair with
> > +	 * smp_store_release() in the caller of vfio_device_open().
> > +	 */

device->open_count is sure to be non-zero if df->access_granted
is true. If it is false, this device file has not opened the device
successfully, so there is nothing further to tidy up.

> > +	if (!df->access_granted) {
> > +		mutex_unlock(&device->dev_set->lock);
> > +		return;
> > +	}
> > +	vfio_device_close(df);
> > +	vfio_device_put_kvm(device);
> > +	if (df->iommufd)
> > +		iommufd_ctx_put(df->iommufd);
> > +	mutex_unlock(&device->dev_set->lock);
> > +	vfio_device_unblock_group(device);
> > +}
> > +
> > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> > +{
> > +	struct fd f;
> > +	struct iommufd_ctx *iommufd;
> > +
> > +	f = fdget(fd);
> > +	if (!f.file)
> > +		return ERR_PTR(-EBADF);
> > +
> > +	iommufd = iommufd_ctx_from_file(f.file);
> > +
> > +	fdput(f);
> > +	return iommufd;
> > +}
> > +
> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				    unsigned long arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_bind_iommufd bind;
> > +	struct iommufd_ctx *iommufd = NULL;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (bind.argsz < minsz || bind.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> > +
> > +	ret = vfio_device_block_group(device);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/*
> > +	 * If already been bound to an iommufd, or already set noiommu
> > +	 * then fail it.
> > +	 */
> > +	if (df->iommufd || df->noiommu) {
> > +		ret = -EINVAL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/* iommufd < 0 means noiommu mode */
> > +	if (bind.iommufd < 0) {
> > +		if (!capable(CAP_SYS_RAWIO)) {
> > +			ret = -EPERM;
> > +			goto out_unlock;
> > +		}
> > +		df->noiommu = true;
> > +	} else {
> > +		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> > +		if (IS_ERR(iommufd)) {
> > +			ret = PTR_ERR(iommufd);
> > +			goto out_unlock;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Before the device open, get the KVM pointer currently
> > +	 * associated with the device file (if there is) and obtain
> > +	 * a reference.  This reference is held until device closed.
> > +	 * Save the pointer in the device for use by drivers.
> > +	 */
> > +	vfio_device_get_kvm_safe(df);
> > +
> > +	df->iommufd = iommufd;
> > +	ret = vfio_device_open(df, &bind.out_devid, NULL);
> 
> 
> This is unrelated to this patch but reminded me - while debugging QEMU,
> vfio_assert_device_open() kept firing as I was killing QEMU (which in
> turn made the kernel close all fds), device->open_count==0 as QEMU was
> dying before calling ioctl(VFIO_DEVICE_BIND_IOMMUFD) which would call
> this vfio_device_open() just above. Has this been reported/fixed, just
> curious?

Thanks, I think this was fixed by the code I marked above. I think
it was raised in the v2 review and should have been fixed after that.
Have you tried v5 or v6? If you still see the issue, please feel free
to let me know.

https://lore.kernel.org/kvm/Y+HIWRM%2FTjWcuT6I@yzhao56-desk.sh.intel.com/

Regards,
Yi Liu
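
The early return in vfio_device_cdev_close() discussed above can be
illustrated with a small userspace model. It is a sketch only: the
model_* names are hypothetical, the dev_set lock is omitted, and
open_count is reduced to a plain counter.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_device {
	unsigned int open_count;
};

struct model_device_file {
	struct model_device *device;
	bool access_granted; /* set only after a successful open */
};

/* Models a successful bind + open: bump open_count, then grant access. */
static void model_open(struct model_device_file *df)
{
	df->device->open_count++;
	df->access_granted = true;
}

/*
 * Models the close path. If access was never granted, this file never
 * opened the device, so open_count is left alone. Returns true when a
 * real close was performed.
 */
static bool model_cdev_close(struct model_device_file *df)
{
	if (!df->access_granted)
		return false;

	df->device->open_count--;
	df->access_granted = false;
	return true;
}
```

Closing a file that was never bound leaves open_count at zero in this
model, which is the situation of QEMU dying before it issues the bind
ioctl.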


* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-04  7:00               ` Nicolin Chen
  2023-03-04  8:22                 ` Liu, Yi L
  2023-03-08 15:54                 ` Shameerali Kolothum Thodi
@ 2023-03-14 11:38                 ` Shameerali Kolothum Thodi
  2023-03-15 23:22                   ` Nicolin Chen
  2 siblings, 1 reply; 131+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-03-14 11:38 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, Liu, Yi L, kvm@vger.kernel.org,
	lulu@redhat.com, joro@8bytes.org, Jason Gunthorpe, Zhangfei Gao,
	Zhao, Yan Y, intel-gfx@lists.freedesktop.org,
	eric.auger@redhat.com, intel-gvt-dev@lists.freedesktop.org,
	yi.y.sun@linux.intel.com, cohuck@redhat.com,
	suravee.suthikulpanit@amd.com, robin.murphy@arm.com



> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 08 March 2023 15:55
> To: 'Nicolin Chen' <nicolinc@nvidia.com>
> Cc: Xu, Terrence <terrence.xu@intel.com>; Liu, Yi L <yi.l.liu@intel.com>;
> Jason Gunthorpe <jgg@nvidia.com>; alex.williamson@redhat.com; Tian,
> Kevin <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; kvm@vger.kernel.org;
> mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> lulu@redhat.com; suravee.suthikulpanit@amd.com;
> intel-gvt-dev@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> linux-s390@vger.kernel.org; Hao, Xudong <xudong.hao@intel.com>; Zhao,
> Yan Y <yan.y.zhao@intel.com>
> Subject: RE: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 

[...]
> > > > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum
> > > > Thodi
> > > > wrote:
> > > >
> > > > > Hi Nicolin,
> > > > >
> > > > > Thanks for the latest ARM64 branch. Do you have a working Qemu
> > > > > branch
> > > > corresponding to the
> > > > > above one?
> > > > >
> > > > > I tried the
> > > >
> >
> https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2B
> > > > smmuv3
> > > > > but for some reason not able to launch the Guest.
> > > > >
> > > > > Please let me know.
> > > >
> > > > I do use that branch. It might not be that robust though as it
> > > > went through a big rebase.
> > >
> > > Ok. The issue seems to be quite random in nature and only happens
> > > when there are multiple vCPUs. Also doesn't look like related to
> > > VFIO device assignment as I can reproduce Guest hang without it by
> > > only having nested-smmuv3 and iommufd object.
> > >
> > > ./qemu-system-aarch64-iommuf -machine
> > > virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> > -enable-kvm
> > > -cpu host -m 1G -smp cpus=8,maxcpus=8 \ -object
> iommufd,id=iommufd0
> > \
> > > -bios QEMU_EFI.fd \ -kernel Image-6.2-iommufd \ -initrd
> > > rootfs-iperf.cpio \ -net none \ -nographic \ -append "rdinit=init
> > > console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \ -trace
> > > events=events \ -D trace_iommufd
> > >
> > > When the issue happens, no output on terminal as if Qemu is in a
> > > locked
> > state.
> > >
> > >  Can you try with the followings?
> > > >
> > > > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*"
> > > > --trace "msi_*" --trace "nvme_*"
> > >
> > > The only trace events with above are this,
> > >
> > > iommufd_backend_connect fd=22 owned=1 users=1 (0) smmu_add_mr
> > > smmuv3-iommu-memory-region-0-0
> > >
> > > I haven't debugged this further. Please let me know if issue is
> > > reproducible with multiple vCPUs at your end. For now will focus on
> > > VFIO
> > dev specific tests.
> >
> > Oh. My test environment has been a single-core vCPU. So that doesn't
> > happen to me. Can you try a vanilla QEMU branch that our nesting
> > branch is rebased on? I took a branch from Yi as the baseline, while
> > he might take from Eric for the rfcv3.
> >
> > I am guessing that it might be an issue in the common tree.
> 
> Yes, that looks like the case.
> I tried with:
>  commit 13356edb8750("Merge tag 'block-pull-request' of
> https://gitlab.com/stefanha/qemu into staging")
> 
> And issue is still there. So hopefully once we rebase everything it will go
> away.

Hi Nicolin,

I rebased your latest Qemu branch[1] on top of v7.2.0 and have not
observed the above issue so far. However, I noticed a couple of other
issues when we try to hot add/remove devices.

(qemu) device_del net1
qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy

Ignoring the MMIO UNMAP errors, it looks like the object free is
not done properly on the dev removal path. I have a few quick fixes
here for this,
https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting

With the above, it seems the HWPT/IOAS objects are destroyed properly
on the dev detach path. But when the dev is added back, Qemu gets a seg
fault, and so far I have no clue why that happens.

(qemu) device_add vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1
./qemu_run-iommufd-nested: line 13:  7041 Segmentation fault
(core dumped) ./qemu-system-aarch64-iommufd
-machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0
-enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 -object
iommufd,id=iommufd0 -bios QEMU_EFI_Dec2018.fd -kernel
Image-iommufd -initrd rootfs-iperf.cpio -device
ioh3420,id=rp1 -device
vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1 -append
"rdinit=init console=ttyAMA0 root=/dev/vda rw
earlycon=pl011,0x9000000" -net none -nographic -trace events=events -D
trace_iommufd

There is no kernel log/crash and not many useful traces while this happens.
I understand these are early days and it is not robust in any way, but please
let me know if you suspect anything. I will continue debugging and will update
if I find anything.

Thanks,
Shameer

[1] https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
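
One plausible reading of the "Failed to free id:1 Device or resource
busy" message is a destroy-order problem: an object is freed while
another object still depends on it. The sketch below models that under
the assumption that a destroy request is refused with -EBUSY while
dependents exist. struct model_obj and model_destroy() are hypothetical
and are not the iommufd or QEMU code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical object with a count of other objects still using it. */
struct model_obj {
	unsigned int users;       /* references held by dependent objects */
	struct model_obj *parent; /* object this one depends on, or NULL */
	int destroyed;
};

/* Refuses with -EBUSY while dependents exist; drops the parent ref on success. */
static int model_destroy(struct model_obj *obj)
{
	if (obj->users)
		return -EBUSY;
	if (obj->parent)
		obj->parent->users--;
	obj->destroyed = 1;
	return 0;
}
```

With an IOAS-like parent and a HWPT-like child, destroying the parent
first fails in this model, while destroying the child first lets both
calls succeed.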




* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-14 11:38                 ` Shameerali Kolothum Thodi
@ 2023-03-15 23:22                   ` Nicolin Chen
  2023-03-16  7:39                     ` Liu, Yi L
  0 siblings, 1 reply; 131+ messages in thread
From: Nicolin Chen @ 2023-03-15 23:22 UTC (permalink / raw)
  To: Liu, Yi L, Shameerali Kolothum Thodi
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, Jason Gunthorpe, Zhangfei Gao, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, suravee.suthikulpanit@amd.com,
	robin.murphy@arm.com

On Tue, Mar 14, 2023 at 11:38:11AM +0000, Shameerali Kolothum Thodi wrote:

> Hi Nicolin,
> 
> I rebased your latest Qemu branch[1] on top of v7.2.0 and not observed
> the above issue so far. However noticed couple of other issues when
> we try to hot add/remove devices.
> 
> (qemu) device_del net1
> qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
> qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
> qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
> qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
> qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
> qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy
> 
> Ignoring the MMIO UNMAP errors, it looks like the object free is
> not proper on dev removal path. I have few quick fixes here
> for this,
> https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting

The smmuv3 change looks good to me. I will let Yi check the
iommufd change.

Yi, I wonder if this is the hot reset case that you asked me
for, a couple of weeks ago.

> With the above, it seems the HWPT/IOAS objects are destroyed properly
> on dev detach path. But when the dev is added back, gets a Qemu seg fault
> and so far I have no clue why that happens.
>
> (qemu) device_add vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1
> ./qemu_run-iommufd-nested: line 13:  7041 Segmentation fault
> (core dumped) ./qemu-system-aarch64-iommufd
> -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0
> -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 -object
> iommufd,id=iommufd0 -bios QEMU_EFI_Dec2018.fd -kernel
> Image-iommufd -initrd rootfs-iperf.cpio -device
> ioh3420,id=rp1 -device
> vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1 -append
> "rdinit=init console=ttyAMA0 root=/dev/vda rw
> earlycon=pl011,0x9000000" -net none -nographic -trace events=events -D
> trace_iommufd
> 
> There are no kernel log/crash and not much useful traces while this happens.
> Understand these are early days and it is not robust in anyway, but please
> let me know if you suspect anything. I will continue debugging and will update
> if anything.

Thanks! That'd be very helpful.

Nicolin


* Re: [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support
  2023-03-15 23:22                   ` Nicolin Chen
@ 2023-03-16  7:39                     ` Liu, Yi L
  0 siblings, 0 replies; 131+ messages in thread
From: Liu, Yi L @ 2023-03-16  7:39 UTC (permalink / raw)
  To: Nicolin Chen, Shameerali Kolothum Thodi
  Cc: mjrosato@linux.ibm.com, jasowang@redhat.com, Hao, Xudong,
	peterx@redhat.com, Xu, Terrence, chao.p.peng@linux.intel.com,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org, lulu@redhat.com,
	joro@8bytes.org, Jason Gunthorpe, Zhangfei Gao, Zhao, Yan Y,
	intel-gfx@lists.freedesktop.org, eric.auger@redhat.com,
	intel-gvt-dev@lists.freedesktop.org, yi.y.sun@linux.intel.com,
	cohuck@redhat.com, suravee.suthikulpanit@amd.com,
	robin.murphy@arm.com

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, March 16, 2023 7:23 AM
>
> On Tue, Mar 14, 2023 at 11:38:11AM +0000, Shameerali Kolothum Thodi
> wrote:
> 
> > Hi Nicolin,
> >
> > I rebased your latest Qemu branch[1] on top of v7.2.0 and not observed
> > the above issue so far. However noticed couple of other issues when
> > we try to hot add/remove devices.
> >
> > (qemu) device_del net1
> > qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for
> device
> > qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such
> file or directory
> > qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0,
> 0x8000101000, 0xf000) = -2 (No such file or directory)
> > qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such
> file or directory
> > qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0,
> 0x8000000000, 0x100000) = -2 (No such file or directory)
> > qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource
> busy
> >
> > Ignoring the MMIO UNMAP errors, it looks like the object free is
> > not proper on dev removal path. I have few quick fixes here
> > for this,
> > https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting
> 
> The smmuv3 change looks good to me. I will let Yi check the
> iommufd change.
> 
> Yi, I wonder if this is the hot reset case that you asked me
> for, a couple of weeks ago.

Aha, not really. What Thodi does is hot removal, which emulates
hot-unplugging a physical device from the PCI slot. It may trigger a hot
reset though, since a reset is something that needs to be done during it.
However, it's not the focused test I asked for weeks ago. 😊

Regards,
Yi Liu


end of thread, other threads:[~2023-03-20 13:19 UTC | newest]

Thread overview: 131+ messages
2023-02-27 11:11 [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Yi Liu
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 01/19] vfio: Allocate per device file structure Yi Liu
2023-02-27 18:46   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 02/19] vfio: Refine vfio file kAPIs for KVM Yi Liu
2023-02-27 18:46   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 03/19] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
2023-02-27 18:46   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
2023-02-27 18:47   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
2023-02-27 18:47   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
2023-02-27 18:47   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 07/19] vfio: Block device access via device fd until device is opened Yi Liu
2023-02-27 18:48   ` Jason Gunthorpe
2023-03-01  9:22   ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-02-27 18:48   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 09/19] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
2023-02-27 18:22   ` Jason Gunthorpe
2023-02-28  2:31     ` Liu, Yi L
2023-03-02  6:07   ` Liu, Yi L
2023-03-02  9:55     ` Tian, Kevin
2023-03-02 12:35       ` Jason Gunthorpe
2023-03-02 14:20         ` Liu, Yi L
2023-03-03  6:36           ` Tian, Kevin
2023-03-03 16:55             ` Alex Williamson
2023-03-05 14:48               ` Liu, Yi L
2023-03-06  8:16                 ` Tian, Kevin
2023-03-06  8:23                   ` Tian, Kevin
2023-03-06  8:33                     ` Liu, Yi L
2023-03-06  9:59                 ` Liu, Yi L
2023-03-06 13:16               ` Jason Gunthorpe
2023-03-07  2:31                 ` Tian, Kevin
2023-03-07  2:35                   ` Liu, Yi L
2023-03-07 12:36                   ` Jason Gunthorpe
2023-03-07 13:28                     ` Liu, Yi L
2023-03-08  7:26                       ` Tian, Kevin
2023-03-08  7:47                         ` Liu, Yi L
2023-03-08  7:55                           ` Tian, Kevin
2023-03-08  8:00                             ` Liu, Yi L
2023-03-08  8:14                               ` Tian, Kevin
2023-03-08  8:15                                 ` Liu, Yi L
2023-03-08 15:08                         ` Jason Gunthorpe
2023-03-02 21:04     ` Alex Williamson
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 10/19] vfio: Add infrastructure for bind_iommufd from userspace Yi Liu
2023-02-27 18:29   ` Jason Gunthorpe
2023-02-28  2:35     ` Liu, Yi L
2023-02-28  6:58       ` Liu, Yi L
2023-02-28 12:31         ` Jason Gunthorpe
2023-02-28 12:45           ` Liu, Yi L
2023-02-28 12:52             ` Jason Gunthorpe
2023-02-28 12:56               ` Liu, Yi L
2023-02-28 12:58                 ` Jason Gunthorpe
2023-02-28 12:29       ` Jason Gunthorpe
2023-02-28 12:48         ` Liu, Yi L
2023-02-28 12:52           ` Jason Gunthorpe
2023-02-28 13:24             ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
2023-02-27 18:44   ` Jason Gunthorpe
2023-02-28  2:57     ` Liu, Yi L
2023-02-28 12:33       ` Jason Gunthorpe
2023-02-28 12:43         ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 12/19] vfio-iommufd: Add detach_ioas for emulated " Yi Liu
2023-02-27 18:45   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
2023-02-27 19:20   ` Jason Gunthorpe
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
2023-02-27 18:52   ` Jason Gunthorpe
2023-02-28  3:11     ` Liu, Yi L
2023-02-28 12:33       ` Jason Gunthorpe
2023-03-01 13:58         ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 15/19] vfio: Add cdev for vfio_device Yi Liu
2023-02-27 18:55   ` Jason Gunthorpe
2023-02-28  3:47     ` Liu, Yi L
2023-02-27 19:06   ` Jason Gunthorpe
2023-02-28  3:59     ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
2023-02-27 19:19   ` Jason Gunthorpe
2023-02-28  4:08     ` Liu, Yi L
2023-03-01  9:19   ` Liu, Yi L
2023-03-01 17:46     ` Jason Gunthorpe
2023-03-02  4:09       ` Liu, Yi L
2023-03-03  6:57       ` Liu, Yi L
2023-03-03  7:23         ` Liu, Yi L
2023-03-07  6:38         ` Tian, Kevin
2023-03-07 12:37           ` Jason Gunthorpe
2023-03-07 13:03             ` Liu, Yi L
2023-03-08  7:17               ` Tian, Kevin
2023-03-10  2:39   ` Alexey Kardashevskiy
2023-03-10  5:49     ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
2023-02-27 18:39   ` Jason Gunthorpe
2023-02-28  2:51     ` Liu, Yi L
2023-02-28 12:32       ` Jason Gunthorpe
2023-02-28 12:42         ` Liu, Yi L
2023-02-28 12:53           ` Jason Gunthorpe
2023-02-28 13:22             ` Liu, Yi L
2023-02-28 13:25               ` Jason Gunthorpe
2023-02-28 13:36                 ` Liu, Yi L
2023-02-28 13:43                   ` Jason Gunthorpe
2023-02-28 14:01                     ` Liu, Yi L
2023-02-28 14:38                       ` Jason Gunthorpe
2023-03-01 14:04                         ` Liu, Yi L
2023-03-01 17:49                           ` Jason Gunthorpe
2023-03-02  3:24                             ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 18/19] vfio: Compile group optionally Yi Liu
2023-02-27 19:20   ` Jason Gunthorpe
2023-02-28  3:14     ` Liu, Yi L
2023-02-28  6:00   ` Liu, Yi L
2023-02-28 12:36     ` Jason Gunthorpe
2023-03-01 13:59       ` Liu, Yi L
2023-02-27 11:11 ` [Intel-gfx] [PATCH v5 19/19] docs: vfio: Add vfio device cdev description Yi Liu
2023-02-27 11:31 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev5) Patchwork
2023-02-27 19:21 ` [Intel-gfx] [PATCH v5 00/19] Add vfio_device cdev for iommufd support Jason Gunthorpe
2023-02-28  3:03   ` Liu, Yi L
2023-02-28 16:58     ` Xu, Terrence
2023-03-01  2:29       ` Nicolin Chen
2023-03-01  3:44         ` Liu, Yi L
2023-03-02  9:43         ` Shameerali Kolothum Thodi
2023-03-02 23:51           ` Nicolin Chen
2023-03-03 15:01             ` Shameerali Kolothum Thodi
2023-03-04  7:00               ` Nicolin Chen
2023-03-04  8:22                 ` Liu, Yi L
2023-03-08 15:54                 ` Shameerali Kolothum Thodi
2023-03-14 11:38                 ` Shameerali Kolothum Thodi
2023-03-15 23:22                   ` Nicolin Chen
2023-03-16  7:39                     ` Liu, Yi L
2023-03-03 21:29         ` Matthew Rosato
2023-03-01 21:01 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev6) Patchwork
2023-03-03  7:00 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev7) Patchwork
