public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/5] vfio-pci support pasid attach/detach
@ 2025-03-13 12:47 Yi Liu
  2025-03-13 12:47 ` [PATCH v8 1/5] ida: Add ida_find_first_range() Yi Liu
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Yi Liu @ 2025-03-13 12:47 UTC (permalink / raw)
  To: alex.williamson, kevin.tian
  Cc: jgg, eric.auger, nicolinc, kvm, yi.l.liu, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

This series introduces the PASID attach/detach user APIs (uAPIs) that
allow userspace to attach or detach a device's PASID to or from a specified
IOAS/hwpt. Currently, only the vfio-pci driver is enabled in this series.

Following this update, PASID-capable devices bound to vfio-pci can report
PASID capabilities to userspace and virtual machines (VMs), facilitating
PASID use cases such as Shared Virtual Addressing (SVA). In discussions
about reporting the virtual PASID (vPASID) to VMs [1], it was agreed that
the userspace virtual machine monitor (VMM) will synthesize the vPASID
capability. The VMM must identify a suitable location to insert the vPASID
capability, including handling hidden bits for certain devices. However,
this responsibility lies with userspace and is not the focus of this series.

This series begins by adding helpers for PASID attachment in the vfio core,
then extends the device character device (cdev) attach/detach ioctls to
support PASID attach/detach operations. At the conclusion of this series,
the IOMMU_GET_HW_INFO ioctl is extended to report PCI PASID capabilities
to userspace. Userspace should verify this capability before utilizing any
PASID-related uAPIs provided by VFIO, as agreed in [2]. This series depends
on the iommufd PASID attach/detach series [3].

The complete code is available at [4] and has been tested with a modified
QEMU branch [5].

[1] https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
[2] https://lore.kernel.org/kvm/4f2daf50-a5ad-4599-ab59-bcfc008688d8@intel.com/
[3] https://lore.kernel.org/linux-iommu/20250313123532.103522-1-yi.l.liu@intel.com/
[4] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
[5] https://github.com/yiliu1765/qemu/tree/wip/zhenzhong/iommufd_nesting_rfcv2-test-pasid

Change log:

v8:
 - Rebased on top of the latest iommufd series, mainly using the latest
   kAPI for pasid attach

v7: https://lore.kernel.org/kvm/20250216054638.24603-1-yi.l.liu@intel.com/#t
 - Add Alex's and Kevin's r-b on vfio patches
 - Minor tweaks on patch 04 (Kevin)

v6: https://lore.kernel.org/kvm/20241219133534.16422-1-yi.l.liu@intel.com/
 - Drop the vfio_copy_user_data() generalization as it is not totally clear
   what it would cover. (Alex)
 - Reworked the patch 03 of v5 a bit. e.g. lift the pasid_{at|de}tach_ioas op test
   before the second user data copy; make 'if (xend > minsz)' to be 'if (xend)'
   and remove the comment accordingly. This is because we don't generalize
   the user data copy now, so xend is either 0 or non-zero, no need to check
   against minsz.
 - Make the IOMMU_GET_HW_INFO report out_max_pasid_log2 by checking the
   dev->iommu->max_pasids. This is because iommu drivers enables PASID
   as long as it supports. So checking it is enough. Also, it is more friendly
   to non-PCI PASID supports compared with reading the PCI config space to
   check if PASID is enabled.
 - Add selftest coverage for reporting max_pasid_log2 in IOMMU_HW_INFO ioctl.

v5: https://lore.kernel.org/kvm/20241108121742.18889-1-yi.l.liu@intel.com/
 - Fix a wrong return value (Alex)
 - Fix the policy of setting the xend array per flag extension (Alex)
 - A separate patch to generalize the code of copy user data (Alex)

v4: https://lore.kernel.org/kvm/20241104132732.16759-1-yi.l.liu@intel.com/
 - Add acked-by for the ida patch from Matthew
 - Add r-b from Kevin and Jason on patch 01, 02 and 04 of v3
 - Add common code to copy user data for the user struct with new fields
 - Extend the VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT to support pasid, patch 03
   is updated per this change. Hence drop r-b of it. (Kevin, Alex)
 - Add t-b from Zhangfei for patch 4 of v3
 - Nits from Vasant

v3: https://lore.kernel.org/linux-iommu/20240912131729.14951-1-yi.l.liu@intel.com/
 - Misc enhancement on patch 01 of v2 (Alex, Jason)
 - Add Jason's r-b to patch 03 of v2
 - Drop the logic that report PASID via VFIO_DEVICE_FEATURE ioctl
 - Extend IOMMU_GET_HW_INFO to report PASID support (Kevin, Jason, Alex)

v2: https://lore.kernel.org/kvm/20240412082121.33382-1-yi.l.liu@intel.com/
 - Use IDA to track if PASID is attached or not in VFIO. (Jason)
 - Fix the issue of calling pasid_at[de]tach_ioas callback unconditionally (Alex)
 - Fix the wrong data copy in vfio_df_ioctl_pasid_detach_pt() (Zhenzhong)
 - Minor tweaks in comments (Kevin)

v1: https://lore.kernel.org/kvm/20231127063909.129153-1-yi.l.liu@intel.com/
 - Report PASID capability via VFIO_DEVICE_FEATURE (Alex)

rfc: https://lore.kernel.org/linux-iommu/20230926093121.18676-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Yi Liu (5):
  ida: Add ida_find_first_range()
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  iommufd/selftest: Add coverage for reporting max_pasid_log2 via
    IOMMU_HW_INFO

 drivers/iommu/iommufd/device.c                | 35 +++++++++-
 drivers/pci/ats.c                             | 33 +++++++++
 drivers/vfio/device_cdev.c                    | 60 +++++++++++++---
 drivers/vfio/iommufd.c                        | 50 +++++++++++++
 drivers/vfio/pci/vfio_pci.c                   |  2 +
 include/linux/idr.h                           | 11 +++
 include/linux/pci-ats.h                       |  3 +
 include/linux/vfio.h                          | 14 ++++
 include/uapi/linux/iommufd.h                  | 14 +++-
 include/uapi/linux/vfio.h                     | 29 +++++---
 lib/idr.c                                     | 67 ++++++++++++++++++
 lib/test_ida.c                                | 70 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 18 +++++
 .../selftests/iommu/iommufd_fail_nth.c        |  3 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 17 +++--
 15 files changed, 401 insertions(+), 25 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v8 1/5] ida: Add ida_find_first_range()
  2025-03-13 12:47 [PATCH v8 0/5] vfio-pci support pasid attach/detach Yi Liu
@ 2025-03-13 12:47 ` Yi Liu
  2025-03-13 12:47 ` [PATCH v8 2/5] vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices Yi Liu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Yi Liu @ 2025-03-13 12:47 UTC (permalink / raw)
  To: alex.williamson, kevin.tian
  Cc: jgg, eric.auger, nicolinc, kvm, yi.l.liu, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

There is no helpers for user to check if a given ID is allocated or not,
neither a helper to loop all the allocated IDs in an IDA and do something
for cleanup. With the two needs, a helper to get the lowest allocated ID
of a range and two variants based on it.

Caller can check if a given ID is allocated or not by:

	bool ida_exists(struct ida *ida, unsigned int id)

Caller can iterate all allocated IDs by:

	int id;
	while ((id = ida_find_first(&pasid_ida)) >= 0) {
		//anything to do with the allocated ID
		ida_free(pasid_ida, pasid);
	}

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/idr.h | 11 +++++++
 lib/idr.c           | 67 +++++++++++++++++++++++++++++++++++++++++++
 lib/test_ida.c      | 70 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index da5f5fa4a3a6..718f9b1b91af 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -257,6 +257,7 @@ struct ida {
 int ida_alloc_range(struct ida *, unsigned int min, unsigned int max, gfp_t);
 void ida_free(struct ida *, unsigned int id);
 void ida_destroy(struct ida *ida);
+int ida_find_first_range(struct ida *ida, unsigned int min, unsigned int max);
 
 /**
  * ida_alloc() - Allocate an unused ID.
@@ -328,4 +329,14 @@ static inline bool ida_is_empty(const struct ida *ida)
 {
 	return xa_empty(&ida->xa);
 }
+
+static inline bool ida_exists(struct ida *ida, unsigned int id)
+{
+	return ida_find_first_range(ida, id, id) == id;
+}
+
+static inline int ida_find_first(struct ida *ida)
+{
+	return ida_find_first_range(ida, 0, ~0);
+}
 #endif /* __IDR_H__ */
diff --git a/lib/idr.c b/lib/idr.c
index da36054c3ca0..e2adc457abb4 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -476,6 +476,73 @@ int ida_alloc_range(struct ida *ida, unsigned int min, unsigned int max,
 }
 EXPORT_SYMBOL(ida_alloc_range);
 
+/**
+ * ida_find_first_range - Get the lowest used ID.
+ * @ida: IDA handle.
+ * @min: Lowest ID to get.
+ * @max: Highest ID to get.
+ *
+ * Get the lowest used ID between @min and @max, inclusive.  The returned
+ * ID will not exceed %INT_MAX, even if @max is larger.
+ *
+ * Context: Any context. Takes and releases the xa_lock.
+ * Return: The lowest used ID, or errno if no used ID is found.
+ */
+int ida_find_first_range(struct ida *ida, unsigned int min, unsigned int max)
+{
+	unsigned long index = min / IDA_BITMAP_BITS;
+	unsigned int offset = min % IDA_BITMAP_BITS;
+	unsigned long *addr, size, bit;
+	unsigned long tmp = 0;
+	unsigned long flags;
+	void *entry;
+	int ret;
+
+	if ((int)min < 0)
+		return -EINVAL;
+	if ((int)max < 0)
+		max = INT_MAX;
+
+	xa_lock_irqsave(&ida->xa, flags);
+
+	entry = xa_find(&ida->xa, &index, max / IDA_BITMAP_BITS, XA_PRESENT);
+	if (!entry) {
+		ret = -ENOENT;
+		goto err_unlock;
+	}
+
+	if (index > min / IDA_BITMAP_BITS)
+		offset = 0;
+	if (index * IDA_BITMAP_BITS + offset > max) {
+		ret = -ENOENT;
+		goto err_unlock;
+	}
+
+	if (xa_is_value(entry)) {
+		tmp = xa_to_value(entry);
+		addr = &tmp;
+		size = BITS_PER_XA_VALUE;
+	} else {
+		addr = ((struct ida_bitmap *)entry)->bitmap;
+		size = IDA_BITMAP_BITS;
+	}
+
+	bit = find_next_bit(addr, size, offset);
+
+	xa_unlock_irqrestore(&ida->xa, flags);
+
+	if (bit == size ||
+	    index * IDA_BITMAP_BITS + bit > max)
+		return -ENOENT;
+
+	return index * IDA_BITMAP_BITS + bit;
+
+err_unlock:
+	xa_unlock_irqrestore(&ida->xa, flags);
+	return ret;
+}
+EXPORT_SYMBOL(ida_find_first_range);
+
 /**
  * ida_free() - Release an allocated ID.
  * @ida: IDA handle.
diff --git a/lib/test_ida.c b/lib/test_ida.c
index c80155a1956d..63078f8dc13f 100644
--- a/lib/test_ida.c
+++ b/lib/test_ida.c
@@ -189,6 +189,75 @@ static void ida_check_bad_free(struct ida *ida)
 	IDA_BUG_ON(ida, !ida_is_empty(ida));
 }
 
+/*
+ * Check ida_find_first_range() and varriants.
+ */
+static void ida_check_find_first(struct ida *ida)
+{
+	/* IDA is empty; all of the below should be not exist */
+	IDA_BUG_ON(ida, ida_exists(ida, 0));
+	IDA_BUG_ON(ida, ida_exists(ida, 3));
+	IDA_BUG_ON(ida, ida_exists(ida, 63));
+	IDA_BUG_ON(ida, ida_exists(ida, 1023));
+	IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1));
+
+	/* IDA contains a single value entry */
+	IDA_BUG_ON(ida, ida_alloc_min(ida, 3, GFP_KERNEL) != 3);
+	IDA_BUG_ON(ida, ida_exists(ida, 0));
+	IDA_BUG_ON(ida, !ida_exists(ida, 3));
+	IDA_BUG_ON(ida, ida_exists(ida, 63));
+	IDA_BUG_ON(ida, ida_exists(ida, 1023));
+	IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1));
+
+	IDA_BUG_ON(ida, ida_alloc_min(ida, 63, GFP_KERNEL) != 63);
+	IDA_BUG_ON(ida, ida_exists(ida, 0));
+	IDA_BUG_ON(ida, !ida_exists(ida, 3));
+	IDA_BUG_ON(ida, !ida_exists(ida, 63));
+	IDA_BUG_ON(ida, ida_exists(ida, 1023));
+	IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1));
+
+	/* IDA contains a single bitmap */
+	IDA_BUG_ON(ida, ida_alloc_min(ida, 1023, GFP_KERNEL) != 1023);
+	IDA_BUG_ON(ida, ida_exists(ida, 0));
+	IDA_BUG_ON(ida, !ida_exists(ida, 3));
+	IDA_BUG_ON(ida, !ida_exists(ida, 63));
+	IDA_BUG_ON(ida, !ida_exists(ida, 1023));
+	IDA_BUG_ON(ida, ida_exists(ida, (1 << 20) - 1));
+
+	/* IDA contains a tree */
+	IDA_BUG_ON(ida, ida_alloc_min(ida, (1 << 20) - 1, GFP_KERNEL) != (1 << 20) - 1);
+	IDA_BUG_ON(ida, ida_exists(ida, 0));
+	IDA_BUG_ON(ida, !ida_exists(ida, 3));
+	IDA_BUG_ON(ida, !ida_exists(ida, 63));
+	IDA_BUG_ON(ida, !ida_exists(ida, 1023));
+	IDA_BUG_ON(ida, !ida_exists(ida, (1 << 20) - 1));
+
+	/* Now try to find first */
+	IDA_BUG_ON(ida, ida_find_first(ida) != 3);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, -1, 2) != -EINVAL);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 0, 2) != -ENOENT); // no used ID
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 0, 3) != 3);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 1, 3) != 3);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 3, 3) != 3);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 2, 4) != 3);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 4, 3) != -ENOENT); // min > max, fail
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 4, 60) != -ENOENT); // no used ID
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 4, 64) != 63);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 63, 63) != 63);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 64, 1026) != 1023);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 1023, 1023) != 1023);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 1023, (1 << 20) - 1) != 1023);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, 1024, (1 << 20) - 1) != (1 << 20) - 1);
+	IDA_BUG_ON(ida, ida_find_first_range(ida, (1 << 20), INT_MAX) != -ENOENT);
+
+	ida_free(ida, 3);
+	ida_free(ida, 63);
+	ida_free(ida, 1023);
+	ida_free(ida, (1 << 20) - 1);
+
+	IDA_BUG_ON(ida, !ida_is_empty(ida));
+}
+
 static DEFINE_IDA(ida);
 
 static int ida_checks(void)
@@ -202,6 +271,7 @@ static int ida_checks(void)
 	ida_check_max(&ida);
 	ida_check_conv(&ida);
 	ida_check_bad_free(&ida);
+	ida_check_find_first(&ida);
 
 	printk("IDA: %u of %u tests passed\n", tests_passed, tests_run);
 	return (tests_run != tests_passed) ? 0 : -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v8 2/5] vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
  2025-03-13 12:47 [PATCH v8 0/5] vfio-pci support pasid attach/detach Yi Liu
  2025-03-13 12:47 ` [PATCH v8 1/5] ida: Add ida_find_first_range() Yi Liu
@ 2025-03-13 12:47 ` Yi Liu
  2025-03-13 12:47 ` [PATCH v8 3/5] vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid Yi Liu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Yi Liu @ 2025-03-13 12:47 UTC (permalink / raw)
  To: alex.williamson, kevin.tian
  Cc: jgg, eric.auger, nicolinc, kvm, yi.l.liu, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

This adds pasid_at|de]tach_ioas ops for attaching hwpt to pasid of a
device and the helpers for it. For now, only vfio-pci supports pasid
attach/detach.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/iommufd.c      | 50 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci.c |  2 ++
 include/linux/vfio.h        | 14 +++++++++++
 3 files changed, 66 insertions(+)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 37e1efa2c7bf..c8c3a2d53f86 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -119,14 +119,22 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 	if (IS_ERR(idev))
 		return PTR_ERR(idev);
 	vdev->iommufd_device = idev;
+	ida_init(&vdev->pasids);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind);
 
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
 {
+	int pasid;
+
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	while ((pasid = ida_find_first(&vdev->pasids)) >= 0) {
+		iommufd_device_detach(vdev->iommufd_device, pasid);
+		ida_free(&vdev->pasids, pasid);
+	}
+
 	if (vdev->iommufd_attached) {
 		iommufd_device_detach(vdev->iommufd_device, IOMMU_NO_PASID);
 		vdev->iommufd_attached = false;
@@ -170,6 +178,48 @@ void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
 
+int vfio_iommufd_physical_pasid_attach_ioas(struct vfio_device *vdev,
+					    u32 pasid, u32 *pt_id)
+{
+	int rc;
+
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_device))
+		return -EINVAL;
+
+	if (ida_exists(&vdev->pasids, pasid))
+		return iommufd_device_replace(vdev->iommufd_device,
+					      pasid, pt_id);
+
+	rc = ida_alloc_range(&vdev->pasids, pasid, pasid, GFP_KERNEL);
+	if (rc < 0)
+		return rc;
+
+	rc = iommufd_device_attach(vdev->iommufd_device, pasid, pt_id);
+	if (rc)
+		ida_free(&vdev->pasids, pasid);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_pasid_attach_ioas);
+
+void vfio_iommufd_physical_pasid_detach_ioas(struct vfio_device *vdev,
+					     u32 pasid)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_device))
+		return;
+
+	if (!ida_exists(&vdev->pasids, pasid))
+		return;
+
+	iommufd_device_detach(vdev->iommufd_device, pasid);
+	ida_free(&vdev->pasids, pasid);
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_pasid_detach_ioas);
+
 /*
  * The emulated standard ops mean that vfio_device is going to use the
  * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index e727941f589d..6f7ae7e5b7b0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -144,6 +144,8 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
 	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
+	.pasid_attach_ioas	= vfio_iommufd_physical_pasid_attach_ioas,
+	.pasid_detach_ioas	= vfio_iommufd_physical_pasid_detach_ioas,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 000a6cab2d31..707b00772ce1 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -67,6 +67,7 @@ struct vfio_device {
 	struct inode *inode;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
+	struct ida pasids;
 	u8 iommufd_attached:1;
 #endif
 	u8 cdev_opened:1;
@@ -91,6 +92,8 @@ struct vfio_device {
  *		 bound iommufd. Undo in unbind_iommufd if @detach_ioas is not
  *		 called.
  * @detach_ioas: Opposite of attach_ioas
+ * @pasid_attach_ioas: The pasid variation of attach_ioas
+ * @pasid_detach_ioas: Opposite of pasid_attach_ioas
  * @open_device: Called when the first file descriptor is opened for this device
  * @close_device: Opposite of open_device
  * @read: Perform read(2) on device file descriptor
@@ -115,6 +118,9 @@ struct vfio_device_ops {
 	void	(*unbind_iommufd)(struct vfio_device *vdev);
 	int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
 	void	(*detach_ioas)(struct vfio_device *vdev);
+	int	(*pasid_attach_ioas)(struct vfio_device *vdev, u32 pasid,
+				     u32 *pt_id);
+	void	(*pasid_detach_ioas)(struct vfio_device *vdev, u32 pasid);
 	int	(*open_device)(struct vfio_device *vdev);
 	void	(*close_device)(struct vfio_device *vdev);
 	ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -139,6 +145,10 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
 int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev);
+int vfio_iommufd_physical_pasid_attach_ioas(struct vfio_device *vdev,
+					    u32 pasid, u32 *pt_id);
+void vfio_iommufd_physical_pasid_detach_ioas(struct vfio_device *vdev,
+					     u32 pasid);
 int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
@@ -166,6 +176,10 @@ vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
 #define vfio_iommufd_physical_detach_ioas \
 	((void (*)(struct vfio_device *vdev)) NULL)
+#define vfio_iommufd_physical_pasid_attach_ioas \
+	((int (*)(struct vfio_device *vdev, u32 pasid, u32 *pt_id)) NULL)
+#define vfio_iommufd_physical_pasid_detach_ioas \
+	((void (*)(struct vfio_device *vdev, u32 pasid)) NULL)
 #define vfio_iommufd_emulated_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v8 3/5] vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  2025-03-13 12:47 [PATCH v8 0/5] vfio-pci support pasid attach/detach Yi Liu
  2025-03-13 12:47 ` [PATCH v8 1/5] ida: Add ida_find_first_range() Yi Liu
  2025-03-13 12:47 ` [PATCH v8 2/5] vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices Yi Liu
@ 2025-03-13 12:47 ` Yi Liu
  2025-03-19 17:41   ` Nicolin Chen
  2025-03-13 12:47 ` [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability Yi Liu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2025-03-13 12:47 UTC (permalink / raw)
  To: alex.williamson, kevin.tian
  Cc: jgg, eric.auger, nicolinc, kvm, yi.l.liu, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

This extends the VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT ioctls to attach/detach
a given pasid of a vfio device to/from an IOAS/HWPT.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 60 +++++++++++++++++++++++++++++++++-----
 include/uapi/linux/vfio.h  | 29 +++++++++++-------
 2 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index bb1817bd4ff3..6d436bee8207 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -162,9 +162,9 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df)
 int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
 			    struct vfio_device_attach_iommufd_pt __user *arg)
 {
-	struct vfio_device *device = df->device;
 	struct vfio_device_attach_iommufd_pt attach;
-	unsigned long minsz;
+	struct vfio_device *device = df->device;
+	unsigned long minsz, xend = 0;
 	int ret;
 
 	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
@@ -172,11 +172,34 @@ int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
 	if (copy_from_user(&attach, arg, minsz))
 		return -EFAULT;
 
-	if (attach.argsz < minsz || attach.flags)
+	if (attach.argsz < minsz)
 		return -EINVAL;
 
+	if (attach.flags & (~VFIO_DEVICE_ATTACH_PASID))
+		return -EINVAL;
+
+	if (attach.flags & VFIO_DEVICE_ATTACH_PASID) {
+		if (!device->ops->pasid_attach_ioas)
+			return -EOPNOTSUPP;
+		xend = offsetofend(struct vfio_device_attach_iommufd_pt, pasid);
+	}
+
+	if (xend) {
+		if (attach.argsz < xend)
+			return -EINVAL;
+
+		if (copy_from_user((void *)&attach + minsz,
+				   (void __user *)arg + minsz, xend - minsz))
+			return -EFAULT;
+	}
+
 	mutex_lock(&device->dev_set->lock);
-	ret = device->ops->attach_ioas(device, &attach.pt_id);
+	if (attach.flags & VFIO_DEVICE_ATTACH_PASID)
+		ret = device->ops->pasid_attach_ioas(device,
+						     attach.pasid,
+						     &attach.pt_id);
+	else
+		ret = device->ops->attach_ioas(device, &attach.pt_id);
 	if (ret)
 		goto out_unlock;
 
@@ -198,20 +221,41 @@ int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
 int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
 			    struct vfio_device_detach_iommufd_pt __user *arg)
 {
-	struct vfio_device *device = df->device;
 	struct vfio_device_detach_iommufd_pt detach;
-	unsigned long minsz;
+	struct vfio_device *device = df->device;
+	unsigned long minsz, xend = 0;
 
 	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
 
 	if (copy_from_user(&detach, arg, minsz))
 		return -EFAULT;
 
-	if (detach.argsz < minsz || detach.flags)
+	if (detach.argsz < minsz)
 		return -EINVAL;
 
+	if (detach.flags & (~VFIO_DEVICE_DETACH_PASID))
+		return -EINVAL;
+
+	if (detach.flags & VFIO_DEVICE_DETACH_PASID) {
+		if (!device->ops->pasid_detach_ioas)
+			return -EOPNOTSUPP;
+		xend = offsetofend(struct vfio_device_detach_iommufd_pt, pasid);
+	}
+
+	if (xend) {
+		if (detach.argsz < xend)
+			return -EINVAL;
+
+		if (copy_from_user((void *)&detach + minsz,
+				   (void __user *)arg + minsz, xend - minsz))
+			return -EFAULT;
+	}
+
 	mutex_lock(&device->dev_set->lock);
-	device->ops->detach_ioas(device);
+	if (detach.flags & VFIO_DEVICE_DETACH_PASID)
+		device->ops->pasid_detach_ioas(device, detach.pasid);
+	else
+		device->ops->detach_ioas(device);
 	mutex_unlock(&device->dev_set->lock);
 
 	return 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index c8dbf8219c4f..6899da70b929 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -931,29 +931,34 @@ struct vfio_device_bind_iommufd {
  * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 19,
  *					struct vfio_device_attach_iommufd_pt)
  * @argsz:	User filled size of this data.
- * @flags:	Must be 0.
+ * @flags:	Flags for attach.
  * @pt_id:	Input the target id which can represent an ioas or a hwpt
  *		allocated via iommufd subsystem.
  *		Output the input ioas id or the attached hwpt id which could
  *		be the specified hwpt itself or a hwpt automatically created
  *		for the specified ioas by kernel during the attachment.
+ * @pasid:	The pasid to be attached, only meaningful when
+ *		VFIO_DEVICE_ATTACH_PASID is set in @flags
  *
  * Associate the device with an address space within the bound iommufd.
  * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.  This is only
  * allowed on cdev fds.
  *
- * If a vfio device is currently attached to a valid hw_pagetable, without doing
- * a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl
- * passing in another hw_pagetable (hwpt) id is allowed. This action, also known
- * as a hw_pagetable replacement, will replace the device's currently attached
- * hw_pagetable with a new hw_pagetable corresponding to the given pt_id.
+ * If a vfio device or a pasid of this device is currently attached to a valid
+ * hw_pagetable (hwpt), without doing a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl passing in another hwpt id is allowed.
+ * This action, also known as a hw_pagetable replacement, will replace the
+ * currently attached hwpt of the device or the pasid of this device with a new
+ * hwpt corresponding to the given pt_id.
  *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_device_attach_iommufd_pt {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_DEVICE_ATTACH_PASID	(1 << 0)
 	__u32	pt_id;
+	__u32	pasid;
 };
 
 #define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 19)
@@ -962,17 +967,21 @@ struct vfio_device_attach_iommufd_pt {
  * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
  *					struct vfio_device_detach_iommufd_pt)
  * @argsz:	User filled size of this data.
- * @flags:	Must be 0.
+ * @flags:	Flags for detach.
+ * @pasid:	The pasid to be detached, only meaningful when
+ *		VFIO_DEVICE_DETACH_PASID is set in @flags
  *
- * Remove the association of the device and its current associated address
- * space.  After it, the device should be in a blocking DMA state.  This is only
- * allowed on cdev fds.
+ * Remove the association of the device or a pasid of the device and its current
+ * associated address space.  After it, the device or the pasid should be in a
+ * blocking DMA state.  This is only allowed on cdev fds.
  *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_device_detach_iommufd_pt {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_DEVICE_DETACH_PASID	(1 << 0)
+	__u32	pasid;
 };
 
 #define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-13 12:47 [PATCH v8 0/5] vfio-pci support pasid attach/detach Yi Liu
                   ` (2 preceding siblings ...)
  2025-03-13 12:47 ` [PATCH v8 3/5] vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid Yi Liu
@ 2025-03-13 12:47 ` Yi Liu
  2025-03-17  7:18   ` Yi Liu
  2025-03-19 17:58   ` Nicolin Chen
  2025-03-13 12:47 ` [PATCH v8 5/5] iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO Yi Liu
  2025-03-14 14:48 ` [PATCH v8 0/5] vfio-pci support pasid attach/detach Alex Williamson
  5 siblings, 2 replies; 25+ messages in thread
From: Yi Liu @ 2025-03-13 12:47 UTC (permalink / raw)
  To: alex.williamson, kevin.tian
  Cc: jgg, eric.auger, nicolinc, kvm, yi.l.liu, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

PASID usage requires PASID support in both device and IOMMU. Since the
iommu drivers always enable the PASID capability for the device if it
is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
capability to userspace. Also, enhances the selftest accordingly.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 35 +++++++++++++++++++++++++++++++++-
 drivers/pci/ats.c              | 33 ++++++++++++++++++++++++++++++++
 include/linux/pci-ats.h        |  3 +++
 include/uapi/linux/iommufd.h   | 14 +++++++++++++-
 4 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 70da39f5e227..1f3bec61bcf9 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -3,6 +3,8 @@
  */
 #include <linux/iommu.h>
 #include <linux/iommufd.h>
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
 #include <linux/slab.h>
 #include <uapi/linux/iommufd.h>
 #include <linux/msi.h>
@@ -1535,7 +1537,8 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
 	void *data;
 	int rc;
 
-	if (cmd->flags || cmd->__reserved)
+	if (cmd->flags || cmd->__reserved[0] || cmd->__reserved[1] ||
+	    cmd->__reserved[2])
 		return -EOPNOTSUPP;
 
 	idev = iommufd_get_device(ucmd, cmd->dev_id);
@@ -1592,6 +1595,36 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
 	if (device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING))
 		cmd->out_capabilities |= IOMMU_HW_CAP_DIRTY_TRACKING;
 
+	cmd->out_max_pasid_log2 = 0;
+	/*
+	 * Currently, all iommu drivers enable PASID in the probe_device()
+	 * op if iommu and device supports it. So the max_pasids stored in
+	 * dev->iommu indicates both PASID support and enable status. A
+	 * non-zero dev->iommu->max_pasids means PASID is supported and
+	 * enabled. The iommufd only reports PASID capability to userspace
+	 * if it's enabled.
+	 */
+	if (idev->dev->iommu->max_pasids) {
+		cmd->out_max_pasid_log2 = ilog2(idev->dev->iommu->max_pasids);
+
+		if (dev_is_pci(idev->dev)) {
+			struct pci_dev *pdev = to_pci_dev(idev->dev);
+			int ctrl;
+
+			ctrl = pci_pasid_status(pdev);
+
+			WARN_ON_ONCE(ctrl < 0 ||
+				     !(ctrl & PCI_PASID_CTRL_ENABLE));
+
+			if (ctrl & PCI_PASID_CTRL_EXEC)
+				cmd->out_capabilities |=
+						IOMMU_HW_CAP_PCI_PASID_EXEC;
+			if (ctrl & PCI_PASID_CTRL_PRIV)
+				cmd->out_capabilities |=
+						IOMMU_HW_CAP_PCI_PASID_PRIV;
+		}
+	}
+
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 out_free:
 	kfree(data);
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index c6b266c772c8..ec6c8dbdc5e9 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -538,4 +538,37 @@ int pci_max_pasids(struct pci_dev *pdev)
 	return (1 << FIELD_GET(PCI_PASID_CAP_WIDTH, supported));
 }
 EXPORT_SYMBOL_GPL(pci_max_pasids);
+
+/**
+ * pci_pasid_status - Check the PASID status
+ * @pdev: PCI device structure
+ *
+ * Returns a negative value when no PASID capability is present.
+ * Otherwise the value of the control register is returned.
+ * Status reported are:
+ *
+ * PCI_PASID_CTRL_ENABLE - PASID enabled
+ * PCI_PASID_CTRL_EXEC - Execute permission enabled
+ * PCI_PASID_CTRL_PRIV - Privileged mode enabled
+ */
+int pci_pasid_status(struct pci_dev *pdev)
+{
+	int pasid;
+	u16 ctrl;
+
+	if (pdev->is_virtfn)
+		pdev = pci_physfn(pdev);
+
+	pasid = pdev->pasid_cap;
+	if (!pasid)
+		return -EINVAL;
+
+	pci_read_config_word(pdev, pasid + PCI_PASID_CTRL, &ctrl);
+
+	ctrl &= PCI_PASID_CTRL_ENABLE | PCI_PASID_CTRL_EXEC |
+		PCI_PASID_CTRL_PRIV;
+
+	return ctrl;
+}
+EXPORT_SYMBOL_GPL(pci_pasid_status);
 #endif /* CONFIG_PCI_PASID */
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 0e8b74e63767..75c6c86cf09d 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -42,6 +42,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features);
 void pci_disable_pasid(struct pci_dev *pdev);
 int pci_pasid_features(struct pci_dev *pdev);
 int pci_max_pasids(struct pci_dev *pdev);
+int pci_pasid_status(struct pci_dev *pdev);
 #else /* CONFIG_PCI_PASID */
 static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
 { return -EINVAL; }
@@ -50,6 +51,8 @@ static inline int pci_pasid_features(struct pci_dev *pdev)
 { return -EINVAL; }
 static inline int pci_max_pasids(struct pci_dev *pdev)
 { return -EINVAL; }
+static inline int pci_pasid_status(struct pci_dev *pdev)
+{ return -EINVAL; }
 #endif /* CONFIG_PCI_PASID */
 
 #endif /* LINUX_PCI_ATS_H */
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 75905f59b87f..ac9469576b51 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -611,9 +611,17 @@ enum iommu_hw_info_type {
  *                                   IOMMU_HWPT_GET_DIRTY_BITMAP
  *                                   IOMMU_HWPT_SET_DIRTY_TRACKING
  *
+ * @IOMMU_HW_CAP_PASID_EXEC: Execute Permission Supported, user ignores it
+ *                           when the struct iommu_hw_info::out_max_pasid_log2
+ *                           is zero.
+ * @IOMMU_HW_CAP_PASID_PRIV: Privileged Mode Supported, user ignores it
+ *                           when the struct iommu_hw_info::out_max_pasid_log2
+ *                           is zero.
  */
 enum iommufd_hw_capabilities {
 	IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0,
+	IOMMU_HW_CAP_PCI_PASID_EXEC = 1 << 1,
+	IOMMU_HW_CAP_PCI_PASID_PRIV = 1 << 2,
 };
 
 /**
@@ -629,6 +637,9 @@ enum iommufd_hw_capabilities {
  *                 iommu_hw_info_type.
  * @out_capabilities: Output the generic iommu capability info type as defined
  *                    in the enum iommu_hw_capabilities.
+ * @out_max_pasid_log2: Output the width of PASIDs. 0 means no PASID support.
+ *                      PCI devices turn to out_capabilities to check if the
+ *                      specific capabilities is supported or not.
  * @__reserved: Must be 0
  *
  * Query an iommu type specific hardware information data from an iommu behind
@@ -652,7 +663,8 @@ struct iommu_hw_info {
 	__u32 data_len;
 	__aligned_u64 data_uptr;
 	__u32 out_data_type;
-	__u32 __reserved;
+	__u8 out_max_pasid_log2;
+	__u8 __reserved[3];
 	__aligned_u64 out_capabilities;
 };
 #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v8 5/5] iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  2025-03-13 12:47 [PATCH v8 0/5] vfio-pci support pasid attach/detach Yi Liu
                   ` (3 preceding siblings ...)
  2025-03-13 12:47 ` [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability Yi Liu
@ 2025-03-13 12:47 ` Yi Liu
  2025-03-19 18:12   ` Nicolin Chen
  2025-03-14 14:48 ` [PATCH v8 0/5] vfio-pci support pasid attach/detach Alex Williamson
  5 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2025-03-13 12:47 UTC (permalink / raw)
  To: alex.williamson, kevin.tian
  Cc: jgg, eric.auger, nicolinc, kvm, yi.l.liu, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

IOMMU_HW_INFO is extended to report max_pasid_log2, hence add coverage
for it.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c        | 18 ++++++++++++++++++
 .../testing/selftests/iommu/iommufd_fail_nth.c |  3 ++-
 tools/testing/selftests/iommu/iommufd_utils.h  | 17 +++++++++++++----
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index c41d15e91983..f06e0f554608 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -342,12 +342,14 @@ FIXTURE(iommufd_ioas)
 	uint32_t hwpt_id;
 	uint32_t device_id;
 	uint64_t base_iova;
+	uint32_t pasid_device_id;
 };
 
 FIXTURE_VARIANT(iommufd_ioas)
 {
 	unsigned int mock_domains;
 	unsigned int memory_limit;
+	bool pasid_capable;
 };
 
 FIXTURE_SETUP(iommufd_ioas)
@@ -372,6 +374,12 @@ FIXTURE_SETUP(iommufd_ioas)
 					     IOMMU_TEST_DEV_CACHE_DEFAULT);
 		self->base_iova = MOCK_APERTURE_START;
 	}
+
+	if (variant->pasid_capable)
+		test_cmd_mock_domain_flags(self->ioas_id,
+					   MOCK_FLAGS_DEVICE_PASID,
+					   NULL, NULL,
+					   &self->pasid_device_id);
 }
 
 FIXTURE_TEARDOWN(iommufd_ioas)
@@ -387,6 +395,7 @@ FIXTURE_VARIANT_ADD(iommufd_ioas, no_domain)
 FIXTURE_VARIANT_ADD(iommufd_ioas, mock_domain)
 {
 	.mock_domains = 1,
+	.pasid_capable = true,
 };
 
 FIXTURE_VARIANT_ADD(iommufd_ioas, two_mock_domain)
@@ -752,6 +761,8 @@ TEST_F(iommufd_ioas, get_hw_info)
 	} buffer_smaller;
 
 	if (self->device_id) {
+		uint8_t max_pasid = 0;
+
 		/* Provide a zero-size user_buffer */
 		test_cmd_get_hw_info(self->device_id, NULL, 0);
 		/* Provide a user_buffer with exact size */
@@ -766,6 +777,13 @@ TEST_F(iommufd_ioas, get_hw_info)
 		 * the fields within the size range still gets updated.
 		 */
 		test_cmd_get_hw_info(self->device_id, &buffer_smaller, sizeof(buffer_smaller));
+		test_cmd_get_hw_info_pasid(self->device_id, &max_pasid);
+		ASSERT_EQ(0, max_pasid);
+		if (variant->pasid_capable) {
+			test_cmd_get_hw_info_pasid(self->pasid_device_id,
+						   &max_pasid);
+			ASSERT_EQ(20, max_pasid);
+		}
 	} else {
 		test_err_get_hw_info(ENOENT, self->device_id,
 				     &buffer_exact, sizeof(buffer_exact));
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 6bbdc187a986..121e714a3183 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -664,7 +664,8 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 					&self->stdev_id, NULL, &idev_id))
 		return -1;
 
-	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
+	if (_test_cmd_get_hw_info(self->fd, idev_id, &info,
+				  sizeof(info), NULL, NULL))
 		return -1;
 
 	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 523ff28e4bc9..8ed05838787d 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -757,7 +757,8 @@ static void teardown_iommufd(int fd, struct __test_metadata *_metadata)
 
 /* @data can be NULL */
 static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
-				 size_t data_len, uint32_t *capabilities)
+				 size_t data_len, uint32_t *capabilities,
+				 uint8_t *max_pasid)
 {
 	struct iommu_test_hw_info *info = (struct iommu_test_hw_info *)data;
 	struct iommu_hw_info cmd = {
@@ -802,6 +803,9 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
 			assert(!info->flags);
 	}
 
+	if (max_pasid)
+		*max_pasid = cmd.out_max_pasid_log2;
+
 	if (capabilities)
 		*capabilities = cmd.out_capabilities;
 
@@ -810,14 +814,19 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
 
 #define test_cmd_get_hw_info(device_id, data, data_len)               \
 	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, data, \
-					   data_len, NULL))
+					   data_len, NULL, NULL))
 
 #define test_err_get_hw_info(_errno, device_id, data, data_len)               \
 	EXPECT_ERRNO(_errno, _test_cmd_get_hw_info(self->fd, device_id, data, \
-						   data_len, NULL))
+						   data_len, NULL, NULL))
 
 #define test_cmd_get_hw_capabilities(device_id, caps, mask) \
-	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
+	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, \
+					   0, &caps, NULL))
+
+#define test_cmd_get_hw_info_pasid(device_id, max_pasid)              \
+	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, \
+					   0, NULL, max_pasid))
 
 static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 0/5] vfio-pci support pasid attach/detach
  2025-03-13 12:47 [PATCH v8 0/5] vfio-pci support pasid attach/detach Yi Liu
                   ` (4 preceding siblings ...)
  2025-03-13 12:47 ` [PATCH v8 5/5] iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO Yi Liu
@ 2025-03-14 14:48 ` Alex Williamson
  2025-03-17  7:25   ` Yi Liu
  5 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2025-03-14 14:48 UTC (permalink / raw)
  To: Yi Liu
  Cc: kevin.tian, jgg, eric.auger, nicolinc, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, 13 Mar 2025 05:47:48 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This series introduces the PASID attach/detach user APIs (uAPIs) that
> allow userspace to attach or detach a device's PASID to or from a specified
> IOAS/hwpt. Currently, only the vfio-pci driver is enabled in this series.
> 
> Following this update, PASID-capable devices bound to vfio-pci can report
> PASID capabilities to userspace and virtual machines (VMs), facilitating
> PASID use cases such as Shared Virtual Addressing (SVA). In discussions
> about reporting the virtual PASID (vPASID) to VMs [1], it was agreed that
> the userspace virtual machine monitor (VMM) will synthesize the vPASID
> capability. The VMM must identify a suitable location to insert the vPASID
> capability, including handling hidden bits for certain devices. However,
> this responsibility lies with userspace and is not the focus of this series.
> 
> This series begins by adding helpers for PASID attachment in the vfio core,
> then extends the device character device (cdev) attach/detach ioctls to
> support PASID attach/detach operations. At the conclusion of this series,
> the IOMMU_GET_HW_INFO ioctl is extended to report PCI PASID capabilities
> to userspace. Userspace should verify this capability before utilizing any
> PASID-related uAPIs provided by VFIO, as agreed in [2]. This series depends
> on the iommufd PASID attach/detach series [3].
> 
> The complete code is available at [4] and has been tested with a modified
> QEMU branch [5].

What's missing for this to go in and which tree will take it?  At a
glance it seems like 4/ needs a PCI sign-off and 5/ needs an IOMMUFD
sign-off.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-13 12:47 ` [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability Yi Liu
@ 2025-03-17  7:18   ` Yi Liu
  2025-03-19 17:58   ` Nicolin Chen
  1 sibling, 0 replies; 25+ messages in thread
From: Yi Liu @ 2025-03-17  7:18 UTC (permalink / raw)
  To: alex.williamson, kevin.tian, bhelgaas@google.com
  Cc: jgg, eric.auger, nicolinc, kvm, chao.p.peng, zhenzhong.duan,
	willy, zhangfei.gao, vasant.hegde

Loop Bjorn for the PCI part change.

On 2025/3/13 20:47, Yi Liu wrote:
> PASID usage requires PASID support in both device and IOMMU. Since the
> iommu drivers always enable the PASID capability for the device if it
> is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
> capability to userspace. Also, enhances the selftest accordingly.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/device.c | 35 +++++++++++++++++++++++++++++++++-
>   drivers/pci/ats.c              | 33 ++++++++++++++++++++++++++++++++
>   include/linux/pci-ats.h        |  3 +++
>   include/uapi/linux/iommufd.h   | 14 +++++++++++++-
>   4 files changed, 83 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 70da39f5e227..1f3bec61bcf9 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -3,6 +3,8 @@
>    */
>   #include <linux/iommu.h>
>   #include <linux/iommufd.h>
> +#include <linux/pci.h>
> +#include <linux/pci-ats.h>
>   #include <linux/slab.h>
>   #include <uapi/linux/iommufd.h>
>   #include <linux/msi.h>
> @@ -1535,7 +1537,8 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
>   	void *data;
>   	int rc;
>   
> -	if (cmd->flags || cmd->__reserved)
> +	if (cmd->flags || cmd->__reserved[0] || cmd->__reserved[1] ||
> +	    cmd->__reserved[2])
>   		return -EOPNOTSUPP;
>   
>   	idev = iommufd_get_device(ucmd, cmd->dev_id);
> @@ -1592,6 +1595,36 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
>   	if (device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING))
>   		cmd->out_capabilities |= IOMMU_HW_CAP_DIRTY_TRACKING;
>   
> +	cmd->out_max_pasid_log2 = 0;
> +	/*
> +	 * Currently, all iommu drivers enable PASID in the probe_device()
> +	 * op if iommu and device supports it. So the max_pasids stored in
> +	 * dev->iommu indicates both PASID support and enable status. A
> +	 * non-zero dev->iommu->max_pasids means PASID is supported and
> +	 * enabled. The iommufd only reports PASID capability to userspace
> +	 * if it's enabled.
> +	 */
> +	if (idev->dev->iommu->max_pasids) {
> +		cmd->out_max_pasid_log2 = ilog2(idev->dev->iommu->max_pasids);
> +
> +		if (dev_is_pci(idev->dev)) {
> +			struct pci_dev *pdev = to_pci_dev(idev->dev);
> +			int ctrl;
> +
> +			ctrl = pci_pasid_status(pdev);
> +
> +			WARN_ON_ONCE(ctrl < 0 ||
> +				     !(ctrl & PCI_PASID_CTRL_ENABLE));
> +
> +			if (ctrl & PCI_PASID_CTRL_EXEC)
> +				cmd->out_capabilities |=
> +						IOMMU_HW_CAP_PCI_PASID_EXEC;
> +			if (ctrl & PCI_PASID_CTRL_PRIV)
> +				cmd->out_capabilities |=
> +						IOMMU_HW_CAP_PCI_PASID_PRIV;
> +		}
> +	}
> +
>   	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
>   out_free:
>   	kfree(data);
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index c6b266c772c8..ec6c8dbdc5e9 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -538,4 +538,37 @@ int pci_max_pasids(struct pci_dev *pdev)
>   	return (1 << FIELD_GET(PCI_PASID_CAP_WIDTH, supported));
>   }
>   EXPORT_SYMBOL_GPL(pci_max_pasids);
> +
> +/**
> + * pci_pasid_status - Check the PASID status
> + * @pdev: PCI device structure
> + *
> + * Returns a negative value when no PASID capability is present.
> + * Otherwise the value of the control register is returned.
> + * Status reported are:
> + *
> + * PCI_PASID_CTRL_ENABLE - PASID enabled
> + * PCI_PASID_CTRL_EXEC - Execute permission enabled
> + * PCI_PASID_CTRL_PRIV - Privileged mode enabled
> + */
> +int pci_pasid_status(struct pci_dev *pdev)
> +{
> +	int pasid;
> +	u16 ctrl;
> +
> +	if (pdev->is_virtfn)
> +		pdev = pci_physfn(pdev);
> +
> +	pasid = pdev->pasid_cap;
> +	if (!pasid)
> +		return -EINVAL;
> +
> +	pci_read_config_word(pdev, pasid + PCI_PASID_CTRL, &ctrl);
> +
> +	ctrl &= PCI_PASID_CTRL_ENABLE | PCI_PASID_CTRL_EXEC |
> +		PCI_PASID_CTRL_PRIV;
> +
> +	return ctrl;
> +}
> +EXPORT_SYMBOL_GPL(pci_pasid_status);
>   #endif /* CONFIG_PCI_PASID */
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 0e8b74e63767..75c6c86cf09d 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -42,6 +42,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features);
>   void pci_disable_pasid(struct pci_dev *pdev);
>   int pci_pasid_features(struct pci_dev *pdev);
>   int pci_max_pasids(struct pci_dev *pdev);
> +int pci_pasid_status(struct pci_dev *pdev);
>   #else /* CONFIG_PCI_PASID */
>   static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
>   { return -EINVAL; }
> @@ -50,6 +51,8 @@ static inline int pci_pasid_features(struct pci_dev *pdev)
>   { return -EINVAL; }
>   static inline int pci_max_pasids(struct pci_dev *pdev)
>   { return -EINVAL; }
> +static inline int pci_pasid_status(struct pci_dev *pdev)
> +{ return -EINVAL; }
>   #endif /* CONFIG_PCI_PASID */
>   
>   #endif /* LINUX_PCI_ATS_H */
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 75905f59b87f..ac9469576b51 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -611,9 +611,17 @@ enum iommu_hw_info_type {
>    *                                   IOMMU_HWPT_GET_DIRTY_BITMAP
>    *                                   IOMMU_HWPT_SET_DIRTY_TRACKING
>    *
> + * @IOMMU_HW_CAP_PASID_EXEC: Execute Permission Supported, user ignores it
> + *                           when the struct iommu_hw_info::out_max_pasid_log2
> + *                           is zero.
> + * @IOMMU_HW_CAP_PASID_PRIV: Privileged Mode Supported, user ignores it
> + *                           when the struct iommu_hw_info::out_max_pasid_log2
> + *                           is zero.
>    */
>   enum iommufd_hw_capabilities {
>   	IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0,
> +	IOMMU_HW_CAP_PCI_PASID_EXEC = 1 << 1,
> +	IOMMU_HW_CAP_PCI_PASID_PRIV = 1 << 2,
>   };
>   
>   /**
> @@ -629,6 +637,9 @@ enum iommufd_hw_capabilities {
>    *                 iommu_hw_info_type.
>    * @out_capabilities: Output the generic iommu capability info type as defined
>    *                    in the enum iommu_hw_capabilities.
> + * @out_max_pasid_log2: Output the width of PASIDs. 0 means no PASID support.
> + *                      PCI devices turn to out_capabilities to check if the
> + *                      specific capabilities is supported or not.
>    * @__reserved: Must be 0
>    *
>    * Query an iommu type specific hardware information data from an iommu behind
> @@ -652,7 +663,8 @@ struct iommu_hw_info {
>   	__u32 data_len;
>   	__aligned_u64 data_uptr;
>   	__u32 out_data_type;
> -	__u32 __reserved;
> +	__u8 out_max_pasid_log2;
> +	__u8 __reserved[3];
>   	__aligned_u64 out_capabilities;
>   };
>   #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 0/5] vfio-pci support pasid attach/detach
  2025-03-14 14:48 ` [PATCH v8 0/5] vfio-pci support pasid attach/detach Alex Williamson
@ 2025-03-17  7:25   ` Yi Liu
  2025-03-17 19:28     ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2025-03-17  7:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kevin.tian, jgg, eric.auger, nicolinc, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On 2025/3/14 22:48, Alex Williamson wrote:
> On Thu, 13 Mar 2025 05:47:48 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
>> This series introduces the PASID attach/detach user APIs (uAPIs) that
>> allow userspace to attach or detach a device's PASID to or from a specified
>> IOAS/hwpt. Currently, only the vfio-pci driver is enabled in this series.
>>
>> Following this update, PASID-capable devices bound to vfio-pci can report
>> PASID capabilities to userspace and virtual machines (VMs), facilitating
>> PASID use cases such as Shared Virtual Addressing (SVA). In discussions
>> about reporting the virtual PASID (vPASID) to VMs [1], it was agreed that
>> the userspace virtual machine monitor (VMM) will synthesize the vPASID
>> capability. The VMM must identify a suitable location to insert the vPASID
>> capability, including handling hidden bits for certain devices. However,
>> this responsibility lies with userspace and is not the focus of this series.
>>
>> This series begins by adding helpers for PASID attachment in the vfio core,
>> then extends the device character device (cdev) attach/detach ioctls to
>> support PASID attach/detach operations. At the conclusion of this series,
>> the IOMMU_GET_HW_INFO ioctl is extended to report PCI PASID capabilities
>> to userspace. Userspace should verify this capability before utilizing any
>> PASID-related uAPIs provided by VFIO, as agreed in [2]. This series depends
>> on the iommufd PASID attach/detach series [3].
>>
>> The complete code is available at [4] and has been tested with a modified
>> QEMU branch [5].
> 
> What's missing for this to go in and which tree will take it?  At a
> glance it seems like 4/ needs a PCI sign-off and 5/ needs an IOMMUFD
> sign-off.  Thanks,

Hi Alex,

yep, I just looped Bjorn in patch 4. While for patch 5, Jason is cced. I
thought this may need to be taken together with the iommufd pasid
series [1] due to dependency. Jason also mentioned this in the before [2].
It might be a shared tree from both of you I guess. :)

[1] 
https://lore.kernel.org/linux-iommu/20250313123532.103522-1-yi.l.liu@intel.com/
[2] https://lore.kernel.org/kvm/20250115005503.GP26854@ziepe.ca/

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 0/5] vfio-pci support pasid attach/detach
  2025-03-17  7:25   ` Yi Liu
@ 2025-03-17 19:28     ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2025-03-17 19:28 UTC (permalink / raw)
  To: Yi Liu
  Cc: Alex Williamson, kevin.tian, eric.auger, nicolinc, kvm,
	chao.p.peng, zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Mon, Mar 17, 2025 at 03:25:18PM +0800, Yi Liu wrote:
> On 2025/3/14 22:48, Alex Williamson wrote:
> > On Thu, 13 Mar 2025 05:47:48 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> > 
> > > This series introduces the PASID attach/detach user APIs (uAPIs) that
> > > allow userspace to attach or detach a device's PASID to or from a specified
> > > IOAS/hwpt. Currently, only the vfio-pci driver is enabled in this series.
> > > 
> > > Following this update, PASID-capable devices bound to vfio-pci can report
> > > PASID capabilities to userspace and virtual machines (VMs), facilitating
> > > PASID use cases such as Shared Virtual Addressing (SVA). In discussions
> > > about reporting the virtual PASID (vPASID) to VMs [1], it was agreed that
> > > the userspace virtual machine monitor (VMM) will synthesize the vPASID
> > > capability. The VMM must identify a suitable location to insert the vPASID
> > > capability, including handling hidden bits for certain devices. However,
> > > this responsibility lies with userspace and is not the focus of this series.
> > > 
> > > This series begins by adding helpers for PASID attachment in the vfio core,
> > > then extends the device character device (cdev) attach/detach ioctls to
> > > support PASID attach/detach operations. At the conclusion of this series,
> > > the IOMMU_GET_HW_INFO ioctl is extended to report PCI PASID capabilities
> > > to userspace. Userspace should verify this capability before utilizing any
> > > PASID-related uAPIs provided by VFIO, as agreed in [2]. This series depends
> > > on the iommufd PASID attach/detach series [3].
> > > 
> > > The complete code is available at [4] and has been tested with a modified
> > > QEMU branch [5].
> > 
> > What's missing for this to go in and which tree will take it?  At a
> > glance it seems like 4/ needs a PCI sign-off and 5/ needs an IOMMUFD
> > sign-off.  Thanks,
> 
> Hi Alex,
> 
> yep, I just looped Bjorn in patch 4. While for patch 5, Jason is cced. I
> thought this may need to be taken together with the iommufd pasid
> series [1] due to dependency. Jason also mentioned this in the before [2].
> It might be a shared tree from both of you I guess. :)

Yes, it all has to go together.. Everything is waiting on this:

> [1] https://lore.kernel.org/linux-iommu/20250313123532.103522-1-yi.l.liu@intel.com/

Which will hopefully be done in a few days??

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 3/5] vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  2025-03-13 12:47 ` [PATCH v8 3/5] vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid Yi Liu
@ 2025-03-19 17:41   ` Nicolin Chen
  2025-03-20 12:37     ` Yi Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2025-03-19 17:41 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, jgg, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 13, 2025 at 05:47:51AM -0700, Yi Liu wrote:
> This extends the VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT ioctls to attach/detach
> a given pasid of a vfio device to/from an IOAS/HWPT.
> 
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

With some nits below:

>  drivers/vfio/device_cdev.c | 60 +++++++++++++++++++++++++++++++++-----
>  include/uapi/linux/vfio.h  | 29 +++++++++++-------
>  2 files changed, 71 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index bb1817bd4ff3..6d436bee8207 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -162,9 +162,9 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df)
>  int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
>  			    struct vfio_device_attach_iommufd_pt __user *arg)
>  {
> -	struct vfio_device *device = df->device;
>  	struct vfio_device_attach_iommufd_pt attach;
> -	unsigned long minsz;
> +	struct vfio_device *device = df->device;

It seems that the movement of this device line isn't necessary?

> +	if (attach.flags & (~VFIO_DEVICE_ATTACH_PASID))

Any reason for the parentheses? Why it's outside the ~ operator?

I assume (if adding more flags) we would end up with this:
	if (attach.flags & ~(VFIO_DEVICE_ATTACH_PASID | MORE_FLAGS))
?

> @@ -198,20 +221,41 @@ int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
>  int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
>  			    struct vfio_device_detach_iommufd_pt __user *arg)
>  {
> -	struct vfio_device *device = df->device;
>  	struct vfio_device_detach_iommufd_pt detach;
> -	unsigned long minsz;
> +	struct vfio_device *device = df->device;

Ditto.

> +	if (detach.flags & (~VFIO_DEVICE_DETACH_PASID))
> +		return -EINVAL;

Ditto.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-13 12:47 ` [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability Yi Liu
  2025-03-17  7:18   ` Yi Liu
@ 2025-03-19 17:58   ` Nicolin Chen
  2025-03-20 12:48     ` Yi Liu
  1 sibling, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2025-03-19 17:58 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, jgg, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 13, 2025 at 05:47:52AM -0700, Yi Liu wrote:
> PASID usage requires PASID support in both device and IOMMU. Since the
> iommu drivers always enable the PASID capability for the device if it
> is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
> capability to userspace. Also, enhances the selftest accordingly.

Overall, I am a bit confused by the out_capabilities field in the
IOMMU_GET_HW_INFO. Why these capabilities cannot be reported via
the driver specific data structure?

E.g. we don't report the "NESTING" capability in out_capabilities
at all and that works fine since the iommu driver would reject if
it doesn't support.

Mind elaborate the reason for these two new capabilities? What is
the benefit from keeping them in the core v.s. driver level?

(sorry for doubting this at such a late stage...)

> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c | 35 +++++++++++++++++++++++++++++++++-
>  drivers/pci/ats.c              | 33 ++++++++++++++++++++++++++++++++
>  include/linux/pci-ats.h        |  3 +++
>  include/uapi/linux/iommufd.h   | 14 +++++++++++++-
>  4 files changed, 83 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 70da39f5e227..1f3bec61bcf9 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -3,6 +3,8 @@
>   */
>  #include <linux/iommu.h>
>  #include <linux/iommufd.h>
> +#include <linux/pci.h>
> +#include <linux/pci-ats.h>

"pci-ats" includes "pci.h" already.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 5/5] iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
  2025-03-13 12:47 ` [PATCH v8 5/5] iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO Yi Liu
@ 2025-03-19 18:12   ` Nicolin Chen
  0 siblings, 0 replies; 25+ messages in thread
From: Nicolin Chen @ 2025-03-19 18:12 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, jgg, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 13, 2025 at 05:47:53AM -0700, Yi Liu wrote:
> IOMMU_HW_INFO is extended to report max_pasid_log2, hence add coverage
> for it.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

Two nits:

> @@ -342,12 +342,14 @@ FIXTURE(iommufd_ioas)
>  	uint32_t hwpt_id;
>  	uint32_t device_id;
>  	uint64_t base_iova;
> +	uint32_t pasid_device_id;

Maybe "device_pasid_id", matching with "MOCK_FLAGS_DEVICE_PASID"?

> +		if (variant->pasid_capable) {
> +			test_cmd_get_hw_info_pasid(self->pasid_device_id,
> +						   &max_pasid);
> +			ASSERT_EQ(20, max_pasid);

Maybe add a define in the iommufd_test.h file? Since the selftest
in the kernel needs to provide a "20" also.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 3/5] vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
  2025-03-19 17:41   ` Nicolin Chen
@ 2025-03-20 12:37     ` Yi Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Yi Liu @ 2025-03-20 12:37 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: alex.williamson, kevin.tian, jgg, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde



On 2025/3/20 01:41, Nicolin Chen wrote:
> On Thu, Mar 13, 2025 at 05:47:51AM -0700, Yi Liu wrote:
>> This extends the VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT ioctls to attach/detach
>> a given pasid of a vfio device to/from an IOAS/HWPT.
>>
>> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> 
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> 
> With some nits below:
> 
>>   drivers/vfio/device_cdev.c | 60 +++++++++++++++++++++++++++++++++-----
>>   include/uapi/linux/vfio.h  | 29 +++++++++++-------
>>   2 files changed, 71 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
>> index bb1817bd4ff3..6d436bee8207 100644
>> --- a/drivers/vfio/device_cdev.c
>> +++ b/drivers/vfio/device_cdev.c
>> @@ -162,9 +162,9 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df)
>>   int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
>>   			    struct vfio_device_attach_iommufd_pt __user *arg)
>>   {
>> -	struct vfio_device *device = df->device;
>>   	struct vfio_device_attach_iommufd_pt attach;
>> -	unsigned long minsz;
>> +	struct vfio_device *device = df->device;
> 
> It seems that the movement of this device line isn't necessary?

hmmm. it's just for reverse Christmas tree. no functional reason.
> 
>> +	if (attach.flags & (~VFIO_DEVICE_ATTACH_PASID))
> 
> Any reason for the parentheses? Why it's outside the ~ operator?
> 
> I assume (if adding more flags) we would end up with this:
> 	if (attach.flags & ~(VFIO_DEVICE_ATTACH_PASID | MORE_FLAGS))

yes. need to drop it.

> 
>> @@ -198,20 +221,41 @@ int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
>>   int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
>>   			    struct vfio_device_detach_iommufd_pt __user *arg)
>>   {
>> -	struct vfio_device *device = df->device;
>>   	struct vfio_device_detach_iommufd_pt detach;
>> -	unsigned long minsz;
>> +	struct vfio_device *device = df->device;
> 
> Ditto.
> 
>> +	if (detach.flags & (~VFIO_DEVICE_DETACH_PASID))
>> +		return -EINVAL;
> 
> Ditto.
> 
> Thanks
> Nicolin

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-19 17:58   ` Nicolin Chen
@ 2025-03-20 12:48     ` Yi Liu
  2025-03-20 16:47       ` Nicolin Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2025-03-20 12:48 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: alex.williamson, kevin.tian, jgg, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On 2025/3/20 01:58, Nicolin Chen wrote:
> On Thu, Mar 13, 2025 at 05:47:52AM -0700, Yi Liu wrote:
>> PASID usage requires PASID support in both device and IOMMU. Since the
>> iommu drivers always enable the PASID capability for the device if it
>> is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
>> capability to userspace. Also, enhances the selftest accordingly.
> 
> Overall, I am a bit confused by the out_capabilities field in the
> IOMMU_GET_HW_INFO. Why these capabilities cannot be reported via
> the driver specific data structure?
> 
> E.g. we don't report the "NESTING" capability in out_capabilities
> at all and that works fine since the iommu driver would reject if
> it doesn't support.

NESTING is a bit different. Userspace needs to know underlying PASID
cap and further expose it to guest if it wants. While NESTING is not
from this angle. It's just for the use of userspace. So a try and fail
is ok.

> Mind elaborate the reason for these two new capabilities? What is
> the benefit from keeping them in the core v.s. driver level?

I view the PASID cap is generic just like the DIRTY_TRACKING cap.
Reporting them in the driver-specific part is fine, but I doubt
if it is better since PASID cap is generic to all vendors.

> (sorry for doubting this at such a late stage...)
> 
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/iommufd/device.c | 35 +++++++++++++++++++++++++++++++++-
>>   drivers/pci/ats.c              | 33 ++++++++++++++++++++++++++++++++
>>   include/linux/pci-ats.h        |  3 +++
>>   include/uapi/linux/iommufd.h   | 14 +++++++++++++-
>>   4 files changed, 83 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
>> index 70da39f5e227..1f3bec61bcf9 100644
>> --- a/drivers/iommu/iommufd/device.c
>> +++ b/drivers/iommu/iommufd/device.c
>> @@ -3,6 +3,8 @@
>>    */
>>   #include <linux/iommu.h>
>>   #include <linux/iommufd.h>
>> +#include <linux/pci.h>
>> +#include <linux/pci-ats.h>
> 
> "pci-ats" includes "pci.h" already.
> 

yes, it is.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-20 12:48     ` Yi Liu
@ 2025-03-20 16:47       ` Nicolin Chen
  2025-03-20 18:57         ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2025-03-20 16:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, jgg, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 20, 2025 at 08:48:49PM +0800, Yi Liu wrote:
> On 2025/3/20 01:58, Nicolin Chen wrote:
> > On Thu, Mar 13, 2025 at 05:47:52AM -0700, Yi Liu wrote:
> > > PASID usage requires PASID support in both device and IOMMU. Since the
> > > iommu drivers always enable the PASID capability for the device if it
> > > is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
> > > capability to userspace. Also, enhances the selftest accordingly.
> > 
> > Overall, I am a bit confused by the out_capabilities field in the
> > IOMMU_GET_HW_INFO. Why these capabilities cannot be reported via
> > the driver specific data structure?
> > 
> > E.g. we don't report the "NESTING" capability in out_capabilities
> > at all and that works fine since the iommu driver would reject if
> > it doesn't support.
> 
> NESTING is a bit different. Userspace needs to know underlying PASID
> cap and further expose it to guest if it wants. While NESTING is not
> from this angle. It's just for the use of userspace. So a try and fail
> is ok.

Hmm, would you please elaborate the difference more?

Also, what's that "further expose"?

> > Mind elaborate the reason for these two new capabilities? What is
> > the benefit from keeping them in the core v.s. driver level?
> 
> I view the PASID cap is generic just like the DIRTY_TRACKING cap.

Well, I actually don't get the DIRTY_TRACKING cap either. Based on
what I see from Zhenzhong's implementation in the QEMU, it totally
could be a vendor-specific cap: we can just add another PCIIOMMUOps
op for QEMU core to ask the IOMMU device model to get hw_info and
allocate a DIRTY_TRACKING-enabled HWPT to return that to the core,
instead of core detecting the out_capabilities in the common path.

In that regard, honestly, I don't quite get this out_capabilities.

> Reporting them in the driver-specific part is fine, but I doubt
> if it is better since PASID cap is generic to all vendors.

The argument could be that NESTING is generic to all vendors too,
yet we don't have that in out_capabilities :-/

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-20 16:47       ` Nicolin Chen
@ 2025-03-20 18:57         ` Jason Gunthorpe
  2025-03-20 20:02           ` Nicolin Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 18:57 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Yi Liu, alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 20, 2025 at 09:47:32AM -0700, Nicolin Chen wrote:

> In that regard, honestly, I don't quite get this out_capabilities.

Yeah, I think it is best thought of as place to put discoverability if
people want discoverability.

I have had a wait and see feeling in this area since I don't know what
qemu or libvirt would actually use.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-20 18:57         ` Jason Gunthorpe
@ 2025-03-20 20:02           ` Nicolin Chen
  2025-03-20 23:40             ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2025-03-20 20:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 20, 2025 at 03:57:26PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 09:47:32AM -0700, Nicolin Chen wrote:
> 
> > In that regard, honestly, I don't quite get this out_capabilities.
> 
> Yeah, I think it is best thought of as place to put discoverability if
> people want discoverability.
> 
> I have had a wait and see feeling in this area since I don't know what
> qemu or libvirt would actually use.

Both ARM and Intel have max_pasid_log2 being reported somewhere
in their vendor data structures. So, unless user space really
wants that info immediately without involving the vendor IOMMU,
this max_pasid_log2 seems to be redundant.

Also, this patch polls two IOMMU caps out of pci_pasid_status()
that is a per device function. Is this okay? Can it end up with
two devices (one has PASID; the other doesn't) behind the same
IOMMU reporting two different sets of out_capabilities, which
were supposed to be the same since it the same IOMMU HW?

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-20 20:02           ` Nicolin Chen
@ 2025-03-20 23:40             ` Jason Gunthorpe
  2025-03-20 23:48               ` Nicolin Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 23:40 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Yi Liu, alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 20, 2025 at 01:02:54PM -0700, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 03:57:26PM -0300, Jason Gunthorpe wrote:
> > On Thu, Mar 20, 2025 at 09:47:32AM -0700, Nicolin Chen wrote:
> > 
> > > In that regard, honestly, I don't quite get this out_capabilities.
> > 
> > Yeah, I think it is best thought of as place to put discoverability if
> > people want discoverability.
> > 
> > I have had a wait and see feeling in this area since I don't know what
> > qemu or libvirt would actually use.
> 
> Both ARM and Intel have max_pasid_log2 being reported somewhere
> in their vendor data structures. So, unless user space really
> wants that info immediately without involving the vendor IOMMU,
> this max_pasid_log2 seems to be redundant.

I don't expect that PASID support should require a userspace driver
component, it should work generically. So generic userspace should
have a generic way to get the pasid range.

> Also, this patch polls two IOMMU caps out of pci_pasid_status()
> that is a per device function. Is this okay?

I think so, the hw_info is a per-device operation

> Can it end up with two devices (one has PASID; the other doesn't)
> behind the same IOMMU reporting two different sets of
> out_capabilities, which were supposed to be the same since it the
> same IOMMU HW?

Yes it can report differences, but that is OK as the iommu is not
required to be uniform across all devices? Did you mean something else?

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-20 23:40             ` Jason Gunthorpe
@ 2025-03-20 23:48               ` Nicolin Chen
  2025-03-21  4:27                 ` Nicolin Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2025-03-20 23:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 20, 2025 at 08:40:57PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 01:02:54PM -0700, Nicolin Chen wrote:
> > On Thu, Mar 20, 2025 at 03:57:26PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Mar 20, 2025 at 09:47:32AM -0700, Nicolin Chen wrote:
> > > 
> > > > In that regard, honestly, I don't quite get this out_capabilities.
> > > 
> > > Yeah, I think it is best thought of as place to put discoverability if
> > > people want discoverability.
> > > 
> > > I have had a wait and see feeling in this area since I don't know what
> > > qemu or libvirt would actually use.
> > 
> > Both ARM and Intel have max_pasid_log2 being reported somewhere
> > in their vendor data structures. So, unless user space really
> > wants that info immediately without involving the vendor IOMMU,
> > this max_pasid_log2 seems to be redundant.
> 
> I don't expect that PASID support should require a userspace driver
> component, it should work generically. So generic userspace should
> have a generic way to get the pasid range.

OK.

> > Also, this patch polls two IOMMU caps out of pci_pasid_status()
> > that is a per device function. Is this okay?
> 
> I think so, the hw_info is a per-device operation
> 
> > Can it end up with two devices (one has PASID; the other doesn't)
> > behind the same IOMMU reporting two different sets of
> > out_capabilities, which were supposed to be the same since it the
> > same IOMMU HW?
> 
> Yes it can report differences, but that is OK as the iommu is not
> required to be uniform across all devices? Did you mean something else?

Hmm, I thought hw_info is all about a single IOMMU instance.

Although the ioctl is per-device operation, it feels odd that
different devices behind the same IOMMU will return different
copies of "IOMMU" hw_info for that IOMMU HW..

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-20 23:48               ` Nicolin Chen
@ 2025-03-21  4:27                 ` Nicolin Chen
  2025-03-21 17:37                   ` Yi Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2025-03-21  4:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Thu, Mar 20, 2025 at 04:48:33PM -0700, Nicolin Chen wrote:
> > > Also, this patch polls two IOMMU caps out of pci_pasid_status()
> > > that is a per device function. Is this okay?
> > 
> > I think so, the hw_info is a per-device operation
> > 
> > > Can it end up with two devices (one has PASID; the other doesn't)
> > > behind the same IOMMU reporting two different sets of
> > > out_capabilities, which were supposed to be the same since it the
> > > same IOMMU HW?
> > 
> > Yes it can report differences, but that is OK as the iommu is not
> > required to be uniform across all devices? Did you mean something else?
> 
> Hmm, I thought hw_info is all about a single IOMMU instance.
> 
> Although the ioctl is per-device operation, it feels odd that
> different devices behind the same IOMMU will return different
> copies of "IOMMU" hw_info for that IOMMU HW..

Reading this further, I found that Yi did report VFIO device cap
for PASID via a VFIO ioctl in the early versions but switched to
using the IOMMU_GET_HW_INFO since v3 (nearly a year ago). So, I
see that's a made decision.

Given that our IOMMU_GET_HW_INFO defines this:
  * Query an iommu type specific hardware information data from an iommu behind
  * a given device that has been bound to iommufd. This hardware info data will
  * be used to sync capabilities between the virtual iommu and the physical
  * iommu, e.g. a nested translation setup needs to check the hardware info, so
  * a guest stage-1 page table can be compatible with the physical iommu.

max_pasid_log2 is something that fits well. But PCI device cap
still feels odd in that regard, as it repurposes the ioctl.

So, perhaps we should update the uAPI documentation and ask user
space to run IOMMU_GET_HW_INFO for every iommufd_device, because
the output out_capabilities may be different per iommufd_device,
even if both devices are correctly assigned to the same vIOMMU.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-21  4:27                 ` Nicolin Chen
@ 2025-03-21 17:37                   ` Yi Liu
  2025-03-21 18:29                     ` Nicolin Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2025-03-21 17:37 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On 2025/3/21 12:27, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 04:48:33PM -0700, Nicolin Chen wrote:
>>>> Also, this patch polls two IOMMU caps out of pci_pasid_status()
>>>> that is a per device function. Is this okay?
>>>
>>> I think so, the hw_info is a per-device operation
>>>
>>>> Can it end up with two devices (one has PASID; the other doesn't)
>>>> behind the same IOMMU reporting two different sets of
>>>> out_capabilities, which were supposed to be the same since it the
>>>> same IOMMU HW?
>>>
>>> Yes it can report differences, but that is OK as the iommu is not
>>> required to be uniform across all devices? Did you mean something else?
>>
>> Hmm, I thought hw_info is all about a single IOMMU instance.
>>
>> Although the ioctl is per-device operation, it feels odd that
>> different devices behind the same IOMMU will return different
>> copies of "IOMMU" hw_info for that IOMMU HW..
> 
> Reading this further, I found that Yi did report VFIO device cap
> for PASID via a VFIO ioctl in the early versions but switched to
> using the IOMMU_GET_HW_INFO since v3 (nearly a year ago). So, I
> see that's a made decision.
> 
> Given that our IOMMU_GET_HW_INFO defines this:
>    * Query an iommu type specific hardware information data from an iommu behind
>    * a given device that has been bound to iommufd. This hardware info data will
>    * be used to sync capabilities between the virtual iommu and the physical
>    * iommu, e.g. a nested translation setup needs to check the hardware info, so
>    * a guest stage-1 page table can be compatible with the physical iommu.
> 
> max_pasid_log2 is something that fits well. But PCI device cap
> still feels odd in that regard, as it repurposes the ioctl.

PASID cap is a bit special. It should not be reported to user unless
both iommu and device enabled it. So adding it in this hw_info ioctl
is fine. It can avoid duplicate ioctls across userspace driver frameworks
as well.

> So, perhaps we should update the uAPI documentation and ask user
> space to run IOMMU_GET_HW_INFO for every iommufd_device, because
> the output out_capabilities may be different per iommufd_device,
> even if both devices are correctly assigned to the same vIOMMU.

since this is a per-device ioctl. userspace should expect difference
and. Actually, the userspace e.g. vfio may just invoke this ioctl
to know if the PASID cap instead of asking vIOMMU if we define it
in the driver-specific part. This is much convenient.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-21 17:37                   ` Yi Liu
@ 2025-03-21 18:29                     ` Nicolin Chen
  2025-03-21 18:42                       ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2025-03-21 18:29 UTC (permalink / raw)
  To: Yi Liu
  Cc: Jason Gunthorpe, alex.williamson, kevin.tian, eric.auger, kvm,
	chao.p.peng, zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Sat, Mar 22, 2025 at 01:37:39AM +0800, Yi Liu wrote:
> On 2025/3/21 12:27, Nicolin Chen wrote:
> > On Thu, Mar 20, 2025 at 04:48:33PM -0700, Nicolin Chen wrote:
> > Reading this further, I found that Yi did report VFIO device cap
> > for PASID via a VFIO ioctl in the early versions but switched to
> > using the IOMMU_GET_HW_INFO since v3 (nearly a year ago). So, I
> > see that's a made decision.
> > 
> > Given that our IOMMU_GET_HW_INFO defines this:
> >    * Query an iommu type specific hardware information data from an iommu behind
> >    * a given device that has been bound to iommufd. This hardware info data will
> >    * be used to sync capabilities between the virtual iommu and the physical
> >    * iommu, e.g. a nested translation setup needs to check the hardware info, so
> >    * a guest stage-1 page table can be compatible with the physical iommu.
> > 
> > max_pasid_log2 is something that fits well. But PCI device cap
> > still feels odd in that regard, as it repurposes the ioctl.
> 
> PASID cap is a bit special. It should not be reported to user unless
> both iommu and device enabled it. So adding it in this hw_info ioctl
> is fine. It can avoid duplicate ioctls across userspace driver frameworks
> as well.

Yea, I get the convenience.

> > So, perhaps we should update the uAPI documentation and ask user
> > space to run IOMMU_GET_HW_INFO for every iommufd_device, because
> > the output out_capabilities may be different per iommufd_device,
> > even if both devices are correctly assigned to the same vIOMMU.
> 
> since this is a per-device ioctl. userspace should expect difference
> and. Actually, the userspace e.g. vfio may just invoke this ioctl
> to know if the PASID cap instead of asking vIOMMU if we define it
> in the driver-specific part. This is much convenient.

A PASID cap of an IOMMU's is reported by max_pasid_log2 alone,
isn't it? Only the PCI layer that holds the VFIO device cares
about these two PCI device PASID caps that will be reported in
its emulated PCI_PASID_CAP register.

Yes, this is a per-device ioctl. But we defined it to use the
device only as a bridge to get access to its IOMMU and return
IOMMU's caps/infos. Now, we are reporting HW info about this
bridge itself. I think it repurposes the ioctl.

And honestly, "userspace should expect difference" isn't very
fair. A vIOMMU could have been initialized by the first given
iommufd_device, as it could have expected the IOMMU info from
either the first device or the second device to be consistent.
Yet now how a vIOMMU to get finalized given "userspace should
expect difference"? Certainly, I don't see an issue with these
two PCI caps, since a vIOMMU would unlikely integrate them in
its registers, so long as we note it down clearly that these
two "IOMMU_HW" caps come from the bridging idev v.s. IOMMU HW.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-21 18:29                     ` Nicolin Chen
@ 2025-03-21 18:42                       ` Jason Gunthorpe
  2025-03-21 19:00                         ` Nicolin Chen
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2025-03-21 18:42 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Yi Liu, alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Fri, Mar 21, 2025 at 11:29:53AM -0700, Nicolin Chen wrote:

> Yes, this is a per-device ioctl. But we defined it to use the
> device only as a bridge to get access to its IOMMU and return
> IOMMU's caps/infos. Now, we are reporting HW info about this
> bridge itself. I think it repurposes the ioctl.

I had imagined it as both, it is called GET_HW_INFO not
GET_IOMMU_INFO..

Maybe all that is needed is a bit of clarity in the kdoc which items
are iommu global and which are per-device?

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
  2025-03-21 18:42                       ` Jason Gunthorpe
@ 2025-03-21 19:00                         ` Nicolin Chen
  0 siblings, 0 replies; 25+ messages in thread
From: Nicolin Chen @ 2025-03-21 19:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, alex.williamson, kevin.tian, eric.auger, kvm, chao.p.peng,
	zhenzhong.duan, willy, zhangfei.gao, vasant.hegde

On Fri, Mar 21, 2025 at 03:42:05PM -0300, Jason Gunthorpe wrote:
> On Fri, Mar 21, 2025 at 11:29:53AM -0700, Nicolin Chen wrote:
> 
> > Yes, this is a per-device ioctl. But we defined it to use the
> > device only as a bridge to get access to its IOMMU and return
> > IOMMU's caps/infos. Now, we are reporting HW info about this
> > bridge itself. I think it repurposes the ioctl.
> 
> I had imagined it as both, it is called GET_HW_INFO not
> GET_IOMMU_INFO..
> 
> Maybe all that is needed is a bit of clarity in the kdoc which items
> are iommu global and which are per-device?

Yea, that's my point after all.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-03-21 19:00 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-13 12:47 [PATCH v8 0/5] vfio-pci support pasid attach/detach Yi Liu
2025-03-13 12:47 ` [PATCH v8 1/5] ida: Add ida_find_first_range() Yi Liu
2025-03-13 12:47 ` [PATCH v8 2/5] vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices Yi Liu
2025-03-13 12:47 ` [PATCH v8 3/5] vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid Yi Liu
2025-03-19 17:41   ` Nicolin Chen
2025-03-20 12:37     ` Yi Liu
2025-03-13 12:47 ` [PATCH v8 4/5] iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability Yi Liu
2025-03-17  7:18   ` Yi Liu
2025-03-19 17:58   ` Nicolin Chen
2025-03-20 12:48     ` Yi Liu
2025-03-20 16:47       ` Nicolin Chen
2025-03-20 18:57         ` Jason Gunthorpe
2025-03-20 20:02           ` Nicolin Chen
2025-03-20 23:40             ` Jason Gunthorpe
2025-03-20 23:48               ` Nicolin Chen
2025-03-21  4:27                 ` Nicolin Chen
2025-03-21 17:37                   ` Yi Liu
2025-03-21 18:29                     ` Nicolin Chen
2025-03-21 18:42                       ` Jason Gunthorpe
2025-03-21 19:00                         ` Nicolin Chen
2025-03-13 12:47 ` [PATCH v8 5/5] iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO Yi Liu
2025-03-19 18:12   ` Nicolin Chen
2025-03-14 14:48 ` [PATCH v8 0/5] vfio-pci support pasid attach/detach Alex Williamson
2025-03-17  7:25   ` Yi Liu
2025-03-17 19:28     ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox