virtualization.lists.linux-foundation.org archive mirror
* [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
From: Jason Gunthorpe @ 2025-02-07 14:46 UTC
  To: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon
  Cc: Eric Auger, patches

virtio-iommu and fsl_pamu are the only two drivers left that still
rely on ops.domain_alloc().

Update virtio-iommu to have a global static identity domain, implement
domain_alloc_paging(), and finalize its domain during allocation instead
of on first attach.

As virtio-iommu was the last real iommu driver using domain_alloc(),
update the core code to isolate the remaining use to fsl_pamu.
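
For reference, viommu_ops ends up along these lines after the series
(abridged sketch assembled from patch 3):

static struct iommu_ops viommu_ops = {
	.identity_domain	= &viommu_identity_domain.domain,
	.capable		= viommu_capable,
	.domain_alloc_identity	= viommu_domain_alloc_identity,
	.domain_alloc_paging	= viommu_domain_alloc_paging,
	/* probe_device, device_group, etc. are unchanged */
};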

I tested this using qemu x86 7.0.0 and 9.2.0 with an mlx5 VFIO PCI
device. The kernel behaves the same after this series.

However, there seem to be some unrelated qemu bugs. 7.0.0 does not have
working VIRTIO_IOMMU_F_BYPASS_CONFIG, and neither qemu version works with
an IDENTITY domain using !VIRTIO_IOMMU_F_BYPASS_CONFIG. It prints:

 qemu-system-x86_64: iommu has granularity incompatible with target AS
 qemu-system-x86_64: iommu map to non memory area 80000000
 qemu-system-x86_64: iommu map to non memory area c0000000
 qemu-system-x86_64: iommu map to non memory area e0000000
 qemu-system-x86_64: iommu map to non memory area f0000000
 qemu-system-x86_64: iommu map to non memory area f8000000
 qemu-system-x86_64: iommu map to non memory area fc000000
 qemu-system-x86_64: iommu map to non memory area fe000000
 qemu-system-x86_64: iommu map to non memory area fe800000
 qemu-system-x86_64: iommu map to non memory area 0
 qemu-system-x86_64: iommu map to non memory area fef00000
 qemu-system-x86_64: iommu map to non memory area ff000000
 qemu-system-x86_64: iommu has granularity incompatible with target AS
 qemu-system-x86_64: iommu map to non memory area 200000000
 qemu-system-x86_64: iommu map to non memory area 400000000
 qemu-system-x86_64: iommu map to non memory area 800000000
 qemu-system-x86_64: iommu map to non memory area 1000000000
 qemu-system-x86_64: iommu map to non memory area 2000000000
 qemu-system-x86_64: iommu map to non memory area 4000000000

Jason Gunthorpe (5):
  iommu/virtio: Break out bypass identity support into a global static
  iommu: Add domain_alloc_identity()
  iommu/virtio: Move to domain_alloc_paging()
  iommu: Do not call domain_alloc() in iommu_sva_domain_alloc()
  iommu: Hide ops.domain_alloc behind CONFIG_FSL_PAMU

 drivers/iommu/iommu-sva.c    |  15 ++-
 drivers/iommu/iommu.c        |  16 +--
 drivers/iommu/virtio-iommu.c | 200 +++++++++++++++++++++--------------
 include/linux/iommu.h        |  10 +-
 4 files changed, 141 insertions(+), 100 deletions(-)


base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
-- 
2.43.0


* [PATCH 1/5] iommu/virtio: Break out bypass identity support into a global static
From: Jason Gunthorpe @ 2025-02-07 14:46 UTC
  To: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon
  Cc: Eric Auger, patches

To make way for a domain_alloc_paging() conversion, add the typical
global static IDENTITY domain. This supports VMMs that offer
VIRTIO_IOMMU_F_BYPASS_CONFIG.

If the VMM does not have that support, the domain_alloc() path is still
used, which creates an IDENTITY domain out of a paging domain.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/virtio-iommu.c | 86 ++++++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index b85ce6310ddbda..c71a996760bddb 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -48,6 +48,7 @@ struct viommu_dev {
 	u64				pgsize_bitmap;
 	u32				first_domain;
 	u32				last_domain;
+	u32				identity_domain_id;
 	/* Supported MAP flags */
 	u32				map_flags;
 	u32				probe_size;
@@ -70,7 +71,6 @@ struct viommu_domain {
 	struct rb_root_cached		mappings;
 
 	unsigned long			nr_endpoints;
-	bool				bypass;
 };
 
 struct viommu_endpoint {
@@ -305,6 +305,22 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, void *buf,
 	return ret;
 }
 
+static int viommu_send_attach_req(struct viommu_dev *viommu, struct device *dev,
+				  struct virtio_iommu_req_attach *req)
+{
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	int ret;
+	int i;
+
+	for (i = 0; i < fwspec->num_ids; i++) {
+		req->endpoint = cpu_to_le32(fwspec->ids[i]);
+		ret = viommu_send_req_sync(viommu, req, sizeof(*req));
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 /*
  * viommu_add_mapping - add a mapping to the internal tree
  *
@@ -687,12 +703,6 @@ static int viommu_domain_finalise(struct viommu_endpoint *vdev,
 	vdomain->viommu		= viommu;
 
 	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-		if (virtio_has_feature(viommu->vdev,
-				       VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
-			vdomain->bypass = true;
-			return 0;
-		}
-
 		ret = viommu_domain_map_identity(vdev, vdomain);
 		if (ret) {
 			ida_free(&viommu->domain_ids, vdomain->id);
@@ -719,10 +729,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
 
 static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
-	int i;
 	int ret = 0;
 	struct virtio_iommu_req_attach req;
-	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
 	struct viommu_domain *vdomain = to_viommu_domain(domain);
 
@@ -761,16 +769,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		.domain		= cpu_to_le32(vdomain->id),
 	};
 
-	if (vdomain->bypass)
-		req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
-
-	for (i = 0; i < fwspec->num_ids; i++) {
-		req.endpoint = cpu_to_le32(fwspec->ids[i]);
-
-		ret = viommu_send_req_sync(vdomain->viommu, &req, sizeof(req));
-		if (ret)
-			return ret;
-	}
+	ret = viommu_send_attach_req(vdomain->viommu, dev, &req);
+	if (ret)
+		return ret;
 
 	if (!vdomain->nr_endpoints) {
 		/*
@@ -788,6 +789,40 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	return 0;
 }
 
+static int viommu_attach_identity_domain(struct iommu_domain *domain,
+					 struct device *dev)
+{
+	int ret = 0;
+	struct virtio_iommu_req_attach req;
+	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
+	struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+	req = (struct virtio_iommu_req_attach) {
+		.head.type	= VIRTIO_IOMMU_T_ATTACH,
+		.domain		= cpu_to_le32(vdev->viommu->identity_domain_id),
+		.flags          = cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS),
+	};
+
+	ret = viommu_send_attach_req(vdev->viommu, dev, &req);
+	if (ret)
+		return ret;
+
+	if (vdev->vdomain) {
+		vdev->vdomain->nr_endpoints--;
+		vdomain->nr_endpoints++;
+		vdev->vdomain = vdomain;
+	}
+	return 0;
+}
+
+static struct viommu_domain viommu_identity_domain = {
+	.domain = { .type = IOMMU_DOMAIN_IDENTITY,
+		    .ops =
+			    &(const struct iommu_domain_ops){
+				    .attach_dev = viommu_attach_identity_domain,
+			    } }
+};
+
 static void viommu_detach_dev(struct viommu_endpoint *vdev)
 {
 	int i;
@@ -1061,6 +1096,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
 }
 
 static struct iommu_ops viommu_ops = {
+	.identity_domain	= &viommu_identity_domain.domain,
 	.capable		= viommu_capable,
 	.domain_alloc		= viommu_domain_alloc,
 	.probe_device		= viommu_probe_device,
@@ -1184,6 +1220,18 @@ static int viommu_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_IOMMU_F_MMIO))
 		viommu->map_flags |= VIRTIO_IOMMU_MAP_F_MMIO;
 
+	/* Reserve an ID to use as the bypass domain */
+	if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+		viommu->identity_domain_id = viommu->first_domain;
+		viommu->first_domain++;
+	} else {
+		/*
+		 * Assume the VMM is sensible and it either supports bypass on
+		 * all instances or no instances.
+		 */
+		viommu_ops.identity_domain = NULL;
+	}
+
 	viommu_ops.pgsize_bitmap = viommu->pgsize_bitmap;
 
 	virtio_device_ready(vdev);
-- 
2.43.0


* [PATCH 2/5] iommu: Add domain_alloc_identity()
From: Jason Gunthorpe @ 2025-02-07 14:46 UTC
  To: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon
  Cc: Eric Auger, patches

virtio-iommu has a mode where the IDENTITY domain is actually a paging
domain with a manually created identity mapping covering some of the
system address space.

To support this add a new domain_alloc_identity() op that accepts
the struct device so that virtio can allocate and fully finalize a
paging domain to return.
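
A driver implementation would then follow this shape (a sketch with
made-up mydrv_* names; patch 3 has the real virtio version):

static struct iommu_domain *mydrv_domain_alloc_identity(struct device *dev)
{
	struct iommu_domain *domain;
	int ret;

	/* Cheap case: hardware bypass exists, share one static domain */
	if (mydrv_instance_has_bypass(dev))
		return &mydrv_identity_domain;

	/* Otherwise return a paging domain pre-loaded with a 1:1 mapping */
	domain = mydrv_domain_alloc_paging(dev);
	if (IS_ERR(domain))
		return domain;

	ret = mydrv_map_identity(dev, domain);
	if (ret) {
		mydrv_domain_free(domain);
		return ERR_PTR(ret);
	}
	return domain;
}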

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 20 ++++++++++++--------
 include/linux/iommu.h |  4 ++++
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 870c3cdbd0f622..ee33d26dfcd40d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1595,15 +1595,19 @@ static struct iommu_domain *__iommu_alloc_identity_domain(struct device *dev)
 	if (ops->identity_domain)
 		return ops->identity_domain;
 
-	/* Older drivers create the identity domain via ops->domain_alloc() */
-	if (!ops->domain_alloc)
+	if (ops->domain_alloc_identity) {
+		domain = ops->domain_alloc_identity(dev);
+		if (IS_ERR(domain))
+			return domain;
+	} else if (ops->domain_alloc) {
+		domain = ops->domain_alloc(IOMMU_DOMAIN_IDENTITY);
+		if (!domain)
+			return ERR_PTR(-ENOMEM);
+		if (IS_ERR(domain))
+			return domain;
+	} else {
 		return ERR_PTR(-EOPNOTSUPP);
-
-	domain = ops->domain_alloc(IOMMU_DOMAIN_IDENTITY);
-	if (IS_ERR(domain))
-		return domain;
-	if (!domain)
-		return ERR_PTR(-ENOMEM);
+	}
 
 	iommu_domain_init(domain, IOMMU_DOMAIN_IDENTITY, ops);
 	return domain;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 38c65e92ecd091..6389d59178ba3f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -557,6 +557,9 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
  * @domain_alloc: allocate and return an iommu domain if success. Otherwise
  *                NULL is returned. The domain is not fully initialized until
  *                the caller iommu_domain_alloc() returns.
+ * @domain_alloc_identity: allocate an IDENTITY domain. Drivers should prefer to
+ *                         use identity_domain instead. This should only be used
+ *                         if dynamic logic is necessary.
  * @domain_alloc_paging_flags: Allocate an iommu domain corresponding to the
  *                     input parameters as defined in
  *                     include/uapi/linux/iommufd.h. The @user_data can be
@@ -615,6 +618,7 @@ struct iommu_ops {
 
 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+	struct iommu_domain *(*domain_alloc_identity)(struct device *dev);
 	struct iommu_domain *(*domain_alloc_paging_flags)(
 		struct device *dev, u32 flags,
 		const struct iommu_user_data *user_data);
-- 
2.43.0


* [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
From: Jason Gunthorpe @ 2025-02-07 14:46 UTC
  To: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon
  Cc: Eric Auger, patches

virtio has the complication that it sometimes wants to return a paging
domain for IDENTITY which makes this conversion a little different than
other drivers.

Add a viommu_domain_alloc_paging() that combines viommu_domain_alloc() and
viommu_domain_finalise() to always return a fully initialized and
finalized paging domain.

Use viommu_domain_alloc_identity() to implement the special non-bypass
IDENTITY flow by calling viommu_domain_alloc_paging() then
viommu_domain_map_identity().

Remove support for deferred finalize and the vdomain->mutex.

Remove core support for domain_alloc() IDENTITY as virtio was the last
driver using it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c        |   6 --
 drivers/iommu/virtio-iommu.c | 114 ++++++++++++++++-------------------
 2 files changed, 53 insertions(+), 67 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ee33d26dfcd40d..73a05b34de4768 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1599,12 +1599,6 @@ static struct iommu_domain *__iommu_alloc_identity_domain(struct device *dev)
 		domain = ops->domain_alloc_identity(dev);
 		if (IS_ERR(domain))
 			return domain;
-	} else if (ops->domain_alloc) {
-		domain = ops->domain_alloc(IOMMU_DOMAIN_IDENTITY);
-		if (!domain)
-			return ERR_PTR(-ENOMEM);
-		if (IS_ERR(domain))
-			return domain;
 	} else {
 		return ERR_PTR(-EOPNOTSUPP);
 	}
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index c71a996760bddb..79b471c03b6ee4 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -63,7 +63,6 @@ struct viommu_mapping {
 struct viommu_domain {
 	struct iommu_domain		domain;
 	struct viommu_dev		*viommu;
-	struct mutex			mutex; /* protects viommu pointer */
 	unsigned int			id;
 	u32				map_flags;
 
@@ -97,6 +96,8 @@ struct viommu_event {
 	};
 };
 
+static struct viommu_domain viommu_identity_domain;
+
 #define to_viommu_domain(domain)	\
 	container_of(domain, struct viommu_domain, domain)
 
@@ -653,65 +654,45 @@ static void viommu_event_handler(struct virtqueue *vq)
 
 /* IOMMU API */
 
-static struct iommu_domain *viommu_domain_alloc(unsigned type)
+static struct iommu_domain *viommu_domain_alloc_paging(struct device *dev)
 {
-	struct viommu_domain *vdomain;
-
-	if (type != IOMMU_DOMAIN_UNMANAGED &&
-	    type != IOMMU_DOMAIN_DMA &&
-	    type != IOMMU_DOMAIN_IDENTITY)
-		return NULL;
-
-	vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
-	if (!vdomain)
-		return NULL;
-
-	mutex_init(&vdomain->mutex);
-	spin_lock_init(&vdomain->mappings_lock);
-	vdomain->mappings = RB_ROOT_CACHED;
-
-	return &vdomain->domain;
-}
-
-static int viommu_domain_finalise(struct viommu_endpoint *vdev,
-				  struct iommu_domain *domain)
-{
-	int ret;
-	unsigned long viommu_page_size;
+	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
 	struct viommu_dev *viommu = vdev->viommu;
-	struct viommu_domain *vdomain = to_viommu_domain(domain);
+	unsigned long viommu_page_size;
+	struct viommu_domain *vdomain;
+	int ret;
 
 	viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
 	if (viommu_page_size > PAGE_SIZE) {
 		dev_err(vdev->dev,
 			"granule 0x%lx larger than system page size 0x%lx\n",
 			viommu_page_size, PAGE_SIZE);
-		return -ENODEV;
+		return ERR_PTR(-ENODEV);
 	}
 
+	vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
+	if (!vdomain)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&vdomain->mappings_lock);
+	vdomain->mappings = RB_ROOT_CACHED;
+
 	ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
 			      viommu->last_domain, GFP_KERNEL);
-	if (ret < 0)
-		return ret;
-
-	vdomain->id		= (unsigned int)ret;
-
-	domain->pgsize_bitmap	= viommu->pgsize_bitmap;
-	domain->geometry	= viommu->geometry;
-
-	vdomain->map_flags	= viommu->map_flags;
-	vdomain->viommu		= viommu;
-
-	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-		ret = viommu_domain_map_identity(vdev, vdomain);
-		if (ret) {
-			ida_free(&viommu->domain_ids, vdomain->id);
-			vdomain->viommu = NULL;
-			return ret;
-		}
+	if (ret < 0) {
+		kfree(vdomain);
+		return ERR_PTR(ret);
 	}
 
-	return 0;
+	vdomain->id = (unsigned int)ret;
+
+	vdomain->domain.pgsize_bitmap = viommu->pgsize_bitmap;
+	vdomain->domain.geometry = viommu->geometry;
+
+	vdomain->map_flags = viommu->map_flags;
+	vdomain->viommu = viommu;
+
+	return &vdomain->domain;
 }
 
 static void viommu_domain_free(struct iommu_domain *domain)
@@ -727,6 +708,28 @@ static void viommu_domain_free(struct iommu_domain *domain)
 	kfree(vdomain);
 }
 
+static struct iommu_domain *viommu_domain_alloc_identity(struct device *dev)
+{
+	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
+	struct iommu_domain *domain;
+	int ret;
+
+	if (virtio_has_feature(vdev->viommu->vdev,
+			       VIRTIO_IOMMU_F_BYPASS_CONFIG))
+		return &viommu_identity_domain.domain;
+
+	domain = viommu_domain_alloc_paging(dev);
+	if (IS_ERR(domain))
+		return domain;
+
+	ret = viommu_domain_map_identity(vdev, to_viommu_domain(domain));
+	if (ret) {
+		viommu_domain_free(domain);
+		return ERR_PTR(ret);
+	}
+	return domain;
+}
+
 static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
@@ -734,20 +737,8 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
 	struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-	mutex_lock(&vdomain->mutex);
-	if (!vdomain->viommu) {
-		/*
-		 * Properly initialize the domain now that we know which viommu
-		 * owns it.
-		 */
-		ret = viommu_domain_finalise(vdev, domain);
-	} else if (vdomain->viommu != vdev->viommu) {
-		ret = -EINVAL;
-	}
-	mutex_unlock(&vdomain->mutex);
-
-	if (ret)
-		return ret;
+	if (vdomain->viommu != vdev->viommu)
+		return -EINVAL;
 
 	/*
 	 * In the virtio-iommu device, when attaching the endpoint to a new
@@ -1098,7 +1089,8 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
 static struct iommu_ops viommu_ops = {
 	.identity_domain	= &viommu_identity_domain.domain,
 	.capable		= viommu_capable,
-	.domain_alloc		= viommu_domain_alloc,
+	.domain_alloc_identity	= viommu_domain_alloc_identity,
+	.domain_alloc_paging	= viommu_domain_alloc_paging,
 	.probe_device		= viommu_probe_device,
 	.release_device		= viommu_release_device,
 	.device_group		= viommu_device_group,
-- 
2.43.0


* [PATCH 4/5] iommu: Do not call domain_alloc() in iommu_sva_domain_alloc()
From: Jason Gunthorpe @ 2025-02-07 14:46 UTC
  To: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon
  Cc: Eric Auger, patches

No driver implements SVA via domain_alloc() anymore; this is dead
code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-sva.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 503c5d23c1ea27..b1cd32e665b61c 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -299,15 +299,12 @@ static struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
 	struct iommu_domain *domain;
 
-	if (ops->domain_alloc_sva) {
-		domain = ops->domain_alloc_sva(dev, mm);
-		if (IS_ERR(domain))
-			return domain;
-	} else {
-		domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
-		if (!domain)
-			return ERR_PTR(-ENOMEM);
-	}
+	if (!ops->domain_alloc_sva)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	domain = ops->domain_alloc_sva(dev, mm);
+	if (IS_ERR(domain))
+		return domain;
 
 	domain->type = IOMMU_DOMAIN_SVA;
 	mmgrab(mm);
-- 
2.43.0


* [PATCH 5/5] iommu: Hide ops.domain_alloc behind CONFIG_FSL_PAMU
From: Jason Gunthorpe @ 2025-02-07 14:46 UTC
  To: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon
  Cc: Eric Auger, patches

fsl_pamu is the last user of domain_alloc(), and it is using it to create
something weird that doesn't really fit into the iommu subsystem
architecture. It is not a paging domain since it doesn't have any
map/unmap ops. It may be some special kind of identity domain.

For now just leave it as is. Wrap its definition in CONFIG_FSL_PAMU to
discourage any new drivers from attempting to use it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 2 ++
 include/linux/iommu.h | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 73a05b34de4768..b81a82df9fed25 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1987,8 +1987,10 @@ __iommu_paging_domain_alloc_flags(struct device *dev, unsigned int type,
 		domain = ops->domain_alloc_paging(dev);
 	else if (ops->domain_alloc_paging_flags)
 		domain = ops->domain_alloc_paging_flags(dev, flags, NULL);
+#if IS_ENABLED(CONFIG_FSL_PAMU)
 	else if (ops->domain_alloc && !flags)
 		domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+#endif
 	else
 		return ERR_PTR(-EOPNOTSUPP);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6389d59178ba3f..55cc87bc62f5d0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -554,9 +554,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
  *           op is allocated in the iommu driver and freed by the caller after
  *           use. The information type is one of enum iommu_hw_info_type defined
  *           in include/uapi/linux/iommufd.h.
- * @domain_alloc: allocate and return an iommu domain if success. Otherwise
- *                NULL is returned. The domain is not fully initialized until
- *                the caller iommu_domain_alloc() returns.
+ * @domain_alloc: Do not use in new drivers
  * @domain_alloc_identity: allocate an IDENTITY domain. Drivers should prefer to
  *                         use identity_domain instead. This should only be used
  *                         if dynamic logic is necessary.
@@ -617,7 +615,9 @@ struct iommu_ops {
 	void *(*hw_info)(struct device *dev, u32 *length, u32 *type);
 
 	/* Domain allocation and freeing by the iommu driver */
+#if IS_ENABLED(CONFIG_FSL_PAMU)
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+#endif
 	struct iommu_domain *(*domain_alloc_identity)(struct device *dev);
 	struct iommu_domain *(*domain_alloc_paging_flags)(
 		struct device *dev, u32 flags,
-- 
2.43.0


* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
From: Jacob Pan @ 2025-02-12  0:41 UTC
  To: Jason Gunthorpe
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, jacob.pan

Hi Jason,

On Fri,  7 Feb 2025 10:46:00 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> virtio-iommu and fsl_pamu are the only two drivers left that still
> rely on ops.domain_alloc()
> 
> Update virtio-iommu to have a global static identity domain, implement
> domain_alloc_paging(), and finalize its domain during allocation
> instead of on first attach.
> 
> As virtio-iommu was the last real iommu driver using domain_alloc()
> update the core code to isolate the remains to fsl_pamu.
> 
> I tested this using qemu x86 7.0.0 and 9.2.0 with a mlx5 VFIO PCI
> device. The kernel behaves the same after this series.
> 
> However, there seem to be some unrelated qemu bugs. 7.0.0 does not
> have working VIRTIO_IOMMU_F_BYPASS_CONFIG and neither qemu works with
> an IDENTITY domain using !VIRTIO_IOMMU_F_BYPASS_CONFIG. It prints:
> 
>  qemu-system-x86_64: iommu has granularity incompatible with target AS
>  qemu-system-x86_64: iommu map to non memory area 80000000
>  qemu-system-x86_64: iommu map to non memory area c0000000
>  qemu-system-x86_64: iommu map to non memory area e0000000
>  qemu-system-x86_64: iommu map to non memory area f0000000
>  qemu-system-x86_64: iommu map to non memory area f8000000
>  qemu-system-x86_64: iommu map to non memory area fc000000
>  qemu-system-x86_64: iommu map to non memory area fe000000
>  qemu-system-x86_64: iommu map to non memory area fe800000
>  qemu-system-x86_64: iommu map to non memory area 0
>  qemu-system-x86_64: iommu map to non memory area fef00000
>  qemu-system-x86_64: iommu map to non memory area ff000000
>  qemu-system-x86_64: iommu has granularity incompatible with target AS
>  qemu-system-x86_64: iommu map to non memory area 200000000
>  qemu-system-x86_64: iommu map to non memory area 400000000
>  qemu-system-x86_64: iommu map to non memory area 800000000
>  qemu-system-x86_64: iommu map to non memory area 1000000000
>  qemu-system-x86_64: iommu map to non memory area 2000000000
>  qemu-system-x86_64: iommu map to non memory area 4000000000

I see the same on arm64 with v9.0, assigned an ixgbe nic via VFIO.

qemu-system-aarch64: iommu map to non memory area 0                               
qemu-system-aarch64: iommu map to non memory area 8100000                         
qemu-system-aarch64: iommu map to non memory area 8200000                         
qemu-system-aarch64: iommu map to non memory area 8400000                         
qemu-system-aarch64: iommu map to non memory area 8800000                         
qemu-system-aarch64: iommu map to non memory area 0                               
qemu-system-aarch64: iommu map to non memory area 0                               
qemu-system-aarch64: iommu map to non memory area c000000                         
qemu-system-aarch64: iommu map to non memory area 10000000                        
qemu-system-aarch64: iommu map to non memory area 20000000                        
qemu-system-aarch64: iommu has granularity incompatible with target AS            
qemu-system-aarch64: iommu map to non memory area 200000000                       
qemu-system-aarch64: iommu map to non memory area 400000000                       
qemu-system-aarch64: iommu map to non memory area 800000000                       
qemu-system-aarch64: iommu map to non memory area 1000000000                      
qemu-system-aarch64: iommu map to non memory area 2000000000                      
qemu-system-aarch64: iommu map to non memory area 4000000000                      
qemu-system-aarch64: iommu map to non memory area 8000000000                      
qemu-system-aarch64: iommu map to non memory area 10000000000                     
qemu-system-aarch64: iommu map to non memory area 20000000000                     
qemu-system-aarch64: iommu map to non memory area 40000000000                     
qemu-system-aarch64: iommu map to non memory area 80000000000                     
qemu-system-aarch64: iommu map to non memory area 100000000000                    
qemu-system-aarch64: iommu map to non memory area 200000000000                    
qemu-system-aarch64: iommu map to non memory area 400000000000                    
qemu-system-aarch64: iommu map to non memory area 800000000000   

> Jason Gunthorpe (5):
>   iommu/virtio: Break out bypass identity support into a global static
>   iommu: Add domain_alloc_identity()
>   iommu/virtio: Move to domain_alloc_paging()
>   iommu: Do not call domain_alloc() in iommu_sva_domain_alloc()
>   iommu: Hide ops.domain_alloc behind CONFIG_FSL_PAMU
> 
>  drivers/iommu/iommu-sva.c    |  15 ++-
>  drivers/iommu/iommu.c        |  16 +--
>  drivers/iommu/virtio-iommu.c | 200 +++++++++++++++++++++--------------
>  include/linux/iommu.h        |  10 +-
>  4 files changed, 141 insertions(+), 100 deletions(-)
> 
> 
> base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b


* Re: [PATCH 1/5] iommu/virtio: Break out bypass identity support into a global static
From: Jacob Pan @ 2025-02-12  0:43 UTC
  To: Jason Gunthorpe
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, jacob.pan

Hi Jason,

On Fri,  7 Feb 2025 10:46:01 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> To make way for a domain_alloc_paging conversion add the typical
> global static IDENTITY domain. This supports VMMs that have a
> VIRTIO_IOMMU_F_BYPASS_CONFIG config.
> 
> If the VMM does not have support then the domain_alloc path is still
> used, which creates an IDENTITY domain out of a paging domain.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/virtio-iommu.c | 86 ++++++++++++++++++++++++++++--------
>  1 file changed, 67 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index b85ce6310ddbda..c71a996760bddb 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -48,6 +48,7 @@ struct viommu_dev {
>  	u64				pgsize_bitmap;
>  	u32				first_domain;
>  	u32				last_domain;
> +	u32				identity_domain_id;
>  	/* Supported MAP flags */
>  	u32				map_flags;
>  	u32				probe_size;
> @@ -70,7 +71,6 @@ struct viommu_domain {
>  	struct rb_root_cached		mappings;
>  
>  	unsigned long			nr_endpoints;
> -	bool				bypass;
>  };
>  
>  struct viommu_endpoint {
> @@ -305,6 +305,22 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, void *buf,
>  	return ret;
>  }
>  
> +static int viommu_send_attach_req(struct viommu_dev *viommu, struct device *dev,
> +				  struct virtio_iommu_req_attach *req)
> +{
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	int ret;
> +	int i;
nit: coding style inconsistent within this patch. Inverted xmas tree
here.

> +	for (i = 0; i < fwspec->num_ids; i++) {
> +		req->endpoint = cpu_to_le32(fwspec->ids[i]);
> +		ret = viommu_send_req_sync(viommu, req, sizeof(*req));
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * viommu_add_mapping - add a mapping to the internal tree
>   *
> @@ -687,12 +703,6 @@ static int viommu_domain_finalise(struct viommu_endpoint *vdev,
>  	vdomain->viommu		= viommu;
>  
>  	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
> -		if (virtio_has_feature(viommu->vdev,
> -				       VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> -			vdomain->bypass = true;
> -			return 0;
> -		}
> -
>  		ret = viommu_domain_map_identity(vdev, vdomain);
>  		if (ret) {
>  			ida_free(&viommu->domain_ids, vdomain->id);
> @@ -719,10 +729,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
>  
>  static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
> -	int i;
>  	int ret = 0;
>  	struct virtio_iommu_req_attach req;
> -	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>  	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
>  	struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> @@ -761,16 +769,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		.domain		= cpu_to_le32(vdomain->id),
>  	};
>  
> -	if (vdomain->bypass)
> -		req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
> -
> -	for (i = 0; i < fwspec->num_ids; i++) {
> -		req.endpoint = cpu_to_le32(fwspec->ids[i]);
> -
> -		ret = viommu_send_req_sync(vdomain->viommu, &req, sizeof(req));
> -		if (ret)
> -			return ret;
> -	}
> +	ret = viommu_send_attach_req(vdomain->viommu, dev, &req);
> +	if (ret)
> +		return ret;
>  
>  	if (!vdomain->nr_endpoints) {
>  		/*
> @@ -788,6 +789,40 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	return 0;
>  }
>  
> +static int viommu_attach_identity_domain(struct iommu_domain *domain,
> +					 struct device *dev)
> +{
> +	int ret = 0;
> +	struct virtio_iommu_req_attach req;
> +	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
> +	struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> +	req = (struct virtio_iommu_req_attach) {
> +		.head.type	= VIRTIO_IOMMU_T_ATTACH,
> +		.domain		= cpu_to_le32(vdev->viommu->identity_domain_id),
> +		.flags          = cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS),
> +	};
> +
> +	ret = viommu_send_attach_req(vdev->viommu, dev, &req);
> +	if (ret)
> +		return ret;
> +
> +	if (vdev->vdomain) {
> +		vdev->vdomain->nr_endpoints--;
> +		vdomain->nr_endpoints++;
> +		vdev->vdomain = vdomain;
> +	}
> +	return 0;
> +}
> +
> +static struct viommu_domain viommu_identity_domain = {
> +	.domain = { .type = IOMMU_DOMAIN_IDENTITY,
> +		    .ops =
> +			    &(const struct iommu_domain_ops){
> +				    .attach_dev = viommu_attach_identity_domain,
> +			    } }
> +};
> +
>  static void viommu_detach_dev(struct viommu_endpoint *vdev)
>  {
>  	int i;
> @@ -1061,6 +1096,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
>  }
>  
>  static struct iommu_ops viommu_ops = {
> +	.identity_domain	= &viommu_identity_domain.domain,
>  	.capable		= viommu_capable,
>  	.domain_alloc		= viommu_domain_alloc,
>  	.probe_device		= viommu_probe_device,
> @@ -1184,6 +1220,18 @@ static int viommu_probe(struct virtio_device *vdev)
>  	if (virtio_has_feature(vdev, VIRTIO_IOMMU_F_MMIO))
>  		viommu->map_flags |= VIRTIO_IOMMU_MAP_F_MMIO;
>  
> +	/* Reserve an ID to use as the bypass domain */
> +	if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> +		viommu->identity_domain_id = viommu->first_domain;
> +		viommu->first_domain++;
Could also use ida_alloc() instead of special treatment, it would be
consistent with the paging identity domain ID.

> +	} else {
> +		/*
> +		 * Assume the VMM is sensible and it either supports bypass on
> +		 * all instances or no instances.
> +		 */
> +		viommu_ops.identity_domain = NULL;
> +	}
> +
>  	viommu_ops.pgsize_bitmap = viommu->pgsize_bitmap;
>  
>  	virtio_device_ready(vdev);


* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
From: Jean-Philippe Brucker @ 2025-02-12 11:58 UTC
  To: Jacob Pan
  Cc: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

Hi Jacob,

On Tue, Feb 11, 2025 at 04:41:51PM -0800, Jacob Pan wrote:
> > However, there seem to be some unrelated qemu bugs. 7.0.0 does not
> > have working VIRTIO_IOMMU_F_BYPASS_CONFIG and neither qemu works with
> > an IDENTITY domain using !VIRTIO_IOMMU_F_BYPASS_CONFIG. It prints:
> > 
> >  qemu-system-x86_64: iommu has granularity incompatible with target AS
> >  qemu-system-x86_64: iommu map to non memory area 80000000
> >  qemu-system-x86_64: iommu map to non memory area c0000000
> >  qemu-system-x86_64: iommu map to non memory area e0000000
> >  qemu-system-x86_64: iommu map to non memory area f0000000
> >  qemu-system-x86_64: iommu map to non memory area f8000000
> >  qemu-system-x86_64: iommu map to non memory area fc000000
> >  qemu-system-x86_64: iommu map to non memory area fe000000
> >  qemu-system-x86_64: iommu map to non memory area fe800000
> >  qemu-system-x86_64: iommu map to non memory area 0
> >  qemu-system-x86_64: iommu map to non memory area fef00000
> >  qemu-system-x86_64: iommu map to non memory area ff000000
> >  qemu-system-x86_64: iommu has granularity incompatible with target AS
> >  qemu-system-x86_64: iommu map to non memory area 200000000
> >  qemu-system-x86_64: iommu map to non memory area 400000000
> >  qemu-system-x86_64: iommu map to non memory area 800000000
> >  qemu-system-x86_64: iommu map to non memory area 1000000000
> >  qemu-system-x86_64: iommu map to non memory area 2000000000
> >  qemu-system-x86_64: iommu map to non memory area 4000000000
> 
> I see the same on arm64 with v9.0, assigned an ixgbe nic via VFIO.
> 
> qemu-system-aarch64: iommu map to non memory area 0                               
> qemu-system-aarch64: iommu map to non memory area 8100000                         
> qemu-system-aarch64: iommu map to non memory area 8200000                         
> qemu-system-aarch64: iommu map to non memory area 8400000                         
> qemu-system-aarch64: iommu map to non memory area 8800000                         
> qemu-system-aarch64: iommu map to non memory area 0                               
> qemu-system-aarch64: iommu map to non memory area 0                               
> qemu-system-aarch64: iommu map to non memory area c000000                         
> qemu-system-aarch64: iommu map to non memory area 10000000                        
> qemu-system-aarch64: iommu map to non memory area 20000000                        
> qemu-system-aarch64: iommu has granularity incompatible with target AS            
> qemu-system-aarch64: iommu map to non memory area 200000000                       
> qemu-system-aarch64: iommu map to non memory area 400000000                       
> qemu-system-aarch64: iommu map to non memory area 800000000                       
> qemu-system-aarch64: iommu map to non memory area 1000000000                      
> qemu-system-aarch64: iommu map to non memory area 2000000000                      
> qemu-system-aarch64: iommu map to non memory area 4000000000                      
> qemu-system-aarch64: iommu map to non memory area 8000000000                      
> qemu-system-aarch64: iommu map to non memory area 10000000000                     
> qemu-system-aarch64: iommu map to non memory area 20000000000                     
> qemu-system-aarch64: iommu map to non memory area 40000000000                     
> qemu-system-aarch64: iommu map to non memory area 80000000000                     
> qemu-system-aarch64: iommu map to non memory area 100000000000                    
> qemu-system-aarch64: iommu map to non memory area 200000000000                    
> qemu-system-aarch64: iommu map to non memory area 400000000000                    
> qemu-system-aarch64: iommu map to non memory area 800000000000   

QEMU v9.0 and v9.1 have reached end of life. Do you also get this error
with QEMU v9.2.x?

Are you running an arm64 guest on an x86 host with TCG, or arm64 guest on
arm64 host with KVM?  Does reproducing this require modifying QEMU or Linux
to disable support for F_BYPASS_CONFIG, or just passing QEMU/kernel
parameters?

Thanks,
Jean


* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
From: Jason Gunthorpe @ 2025-02-12 12:50 UTC
  To: Jacob Pan
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Tue, Feb 11, 2025 at 04:41:51PM -0800, Jacob Pan wrote:

> I see the same on arm64 with v9.0, assigned an ixgbe nic via VFIO.

Huh, I figured it worked on ARM..

There is no requirement that an iommu driver implement identity; this
is all an optimization. If it doesn't work in a lot of cases can we
just remove it? It would simplify a lot.

Jason

* Re: [PATCH 2/5] iommu: Add domain_alloc_identity()
From: Robin Murphy @ 2025-02-12 13:56 UTC
  To: Jason Gunthorpe, iommu, Jean-Philippe Brucker, Joerg Roedel,
	virtualization, Will Deacon
  Cc: Eric Auger, patches, Alyssa Rosenzweig, Sven Peter, Janne Grunau

On 2025-02-07 2:46 pm, Jason Gunthorpe wrote:
> virtio-iommu has a mode where the IDENTITY domain is actually a paging
> domain with an identity mapping covering some of the system address
> space manually created.
> 
> To support this add a new domain_alloc_identity() op that accepts
> the struct device so that virtio can allocate and fully finalize a
> paging domain to return.

Oh, I'd already managed to forget this idea - this could be convenient 
for DART to implement per-instance identity domain support as well.

Robin.

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/iommu.c | 20 ++++++++++++--------
>   include/linux/iommu.h |  4 ++++
>   2 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 870c3cdbd0f622..ee33d26dfcd40d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1595,15 +1595,19 @@ static struct iommu_domain *__iommu_alloc_identity_domain(struct device *dev)
>   	if (ops->identity_domain)
>   		return ops->identity_domain;
>   
> -	/* Older drivers create the identity domain via ops->domain_alloc() */
> -	if (!ops->domain_alloc)
> +	if (ops->domain_alloc_identity) {
> +		domain = ops->domain_alloc_identity(dev);
> +		if (IS_ERR(domain))
> +			return domain;
> +	} else if (ops->domain_alloc) {
> +		domain = ops->domain_alloc(IOMMU_DOMAIN_IDENTITY);
> +		if (!domain)
> +			return ERR_PTR(-ENOMEM);
> +		if (IS_ERR(domain))
> +			return domain;
> +	} else {
>   		return ERR_PTR(-EOPNOTSUPP);
> -
> -	domain = ops->domain_alloc(IOMMU_DOMAIN_IDENTITY);
> -	if (IS_ERR(domain))
> -		return domain;
> -	if (!domain)
> -		return ERR_PTR(-ENOMEM);
> +	}
>   
>   	iommu_domain_init(domain, IOMMU_DOMAIN_IDENTITY, ops);
>   	return domain;
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 38c65e92ecd091..6389d59178ba3f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -557,6 +557,9 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
>    * @domain_alloc: allocate and return an iommu domain if success. Otherwise
>    *                NULL is returned. The domain is not fully initialized until
>    *                the caller iommu_domain_alloc() returns.
> + * @domain_alloc_identity: allocate an IDENTITY domain. Drivers should prefer to
> + *                         use identity_domain instead. This should only be used
> + *                         if dynamic logic is necessary.
>    * @domain_alloc_paging_flags: Allocate an iommu domain corresponding to the
>    *                     input parameters as defined in
>    *                     include/uapi/linux/iommufd.h. The @user_data can be
> @@ -615,6 +618,7 @@ struct iommu_ops {
>   
>   	/* Domain allocation and freeing by the iommu driver */
>   	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> +	struct iommu_domain *(*domain_alloc_identity)(struct device *dev);
>   	struct iommu_domain *(*domain_alloc_paging_flags)(
>   		struct device *dev, u32 flags,
>   		const struct iommu_user_data *user_data);


* Re: [PATCH 2/5] iommu: Add domain_alloc_identity()
From: Jason Gunthorpe @ 2025-02-12 14:03 UTC
  To: Robin Murphy
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, virtualization,
	Will Deacon, Eric Auger, patches, Alyssa Rosenzweig, Sven Peter,
	Janne Grunau

On Wed, Feb 12, 2025 at 01:56:55PM +0000, Robin Murphy wrote:
> On 2025-02-07 2:46 pm, Jason Gunthorpe wrote:
> > virtio-iommu has a mode where the IDENTITY domain is actually a paging
> > domain with an identity mapping covering some of the system address
> > space manually created.
> > 
> > To support this add a new domain_alloc_identity() op that accepts
> > the struct device so that virtio can allocate and fully finalize a
> > paging domain to return.
> 
> Oh, I'd already managed to forget this idea - this could be convenient for
> DART to implement per-instance identity domain support as well.

If we are going to broaden this someday then I suggest the core code
should implement it entirely by asking the driver for a paging domain and
then mapping all of system memory into it. There is nothing special
here that needs to be in driver code.
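
Roughly like this (a sketch only; the function name is invented and the
core-code domain initialization details are elided):

static struct iommu_domain *
iommu_alloc_identity_mapped_domain(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);
	struct iommu_domain *domain;
	phys_addr_t start, end;
	u64 i;
	int ret;

	domain = ops->domain_alloc_paging(dev);
	if (IS_ERR(domain))
		return domain;

	/* Map every RAM range 1:1 so that IOVA == physical address */
	for_each_mem_range(i, &start, &end) {
		ret = iommu_map(domain, start, start, end - start,
				IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
		if (ret) {
			iommu_domain_free(domain);
			return ERR_PTR(ret);
		}
	}
	return domain;
}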

Jason

* Re: [PATCH 2/5] iommu: Add domain_alloc_identity()
From: Robin Murphy @ 2025-02-12 14:16 UTC
  To: Jason Gunthorpe
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, virtualization,
	Will Deacon, Eric Auger, patches, Alyssa Rosenzweig, Sven Peter,
	Janne Grunau

On 2025-02-12 2:03 pm, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2025 at 01:56:55PM +0000, Robin Murphy wrote:
>> On 2025-02-07 2:46 pm, Jason Gunthorpe wrote:
>>> virtio-iommu has a mode where the IDENTITY domain is actually a paging
>>> domain with an identity mapping covering some of the system address
>>> space manually created.
>>>
>>> To support this add a new domain_alloc_identity() op that accepts
>>> the struct device so that virtio can allocate and fully finalize a
>>> paging domain to return.
>>
>> Oh, I'd already managed to forget this idea - this could be convenient for
>> DART to implement per-instance identity domain support as well.
> 
> If we are going to broaden this someday then I suggest the core code should
> implement it entirely buy asking the driver for a paging domain and
> then mapping all of system memory into it. There is nothing special
> here that needs to be in driver code.

Sorry, that was just in terms of having a domain_alloc_identity() op 
that's allowed to fail, being potentially neater than otherwise having 
to register distinct per-instance ops with and without the static 
identity_domain. I'm inclined to agree that if anyone really is 
desperate for identity-mapped paging domains in the absence of true 
identity support, then indeed we probably should handle that in core 
code much like we do for improvised blocking domains.
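
i.e. something like this for DART (a sketch; the supports_bypass check
and the lookup helper stand in for however the instance capability is
actually tracked):

static struct iommu_domain *dart_domain_alloc_identity(struct device *dev)
{
	struct apple_dart *dart = dev_to_dart(dev); /* illustrative lookup */

	/* Instances without hardware bypass simply fail the op... */
	if (!dart->supports_bypass)
		return ERR_PTR(-EOPNOTSUPP);

	/* ...while capable instances share one static identity domain */
	return &dart_identity_domain;
}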

Thanks,
Robin.

* Re: [PATCH 2/5] iommu: Add domain_alloc_identity()
From: Jason Gunthorpe @ 2025-02-12 14:45 UTC
  To: Robin Murphy
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, virtualization,
	Will Deacon, Eric Auger, patches, Alyssa Rosenzweig, Sven Peter,
	Janne Grunau

On Wed, Feb 12, 2025 at 02:16:54PM +0000, Robin Murphy wrote:
> On 2025-02-12 2:03 pm, Jason Gunthorpe wrote:
> > On Wed, Feb 12, 2025 at 01:56:55PM +0000, Robin Murphy wrote:
> > > On 2025-02-07 2:46 pm, Jason Gunthorpe wrote:
> > > > virtio-iommu has a mode where the IDENTITY domain is actually a paging
> > > > domain with an identity mapping covering some of the system address
> > > > space manually created.
> > > > 
> > > > To support this add a new domain_alloc_identity() op that accepts
> > > > the struct device so that virtio can allocate and fully finalize a
> > > > paging domain to return.
> > > 
> > > Oh, I'd already managed to forget this idea - this could be convenient for
> > > DART to implement per-instance identity domain support as well.
> > 
> > If we are going to broaden this someday then I suggest the core code should
> > implement it entirely buy asking the driver for a paging domain and
> > then mapping all of system memory into it. There is nothing special
> > here that needs to be in driver code.
> 
> Sorry, that was just in terms of having a domain_alloc_identity() op that's
> allowed to fail, being potentially neater than otherwise having to register
> distinct per-instance ops with and without the static
> identity_domain. 

Oh, I see, yes, that would be much better than the tricks drivers do
with the default domain too.

Jason

* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
From: Jacob Pan @ 2025-02-12 17:05 UTC
  To: Jean-Philippe Brucker
  Cc: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, jacob.pan

Hi Jean-Philippe,

On Wed, 12 Feb 2025 11:58:45 +0000
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> Hi Jacob,
> 
> On Tue, Feb 11, 2025 at 04:41:51PM -0800, Jacob Pan wrote:
> > > However, there seem to be some unrelated qemu bugs. 7.0.0 does not
> > > have working VIRTIO_IOMMU_F_BYPASS_CONFIG and neither qemu works
> > > with an IDENTITY domain using !VIRTIO_IOMMU_F_BYPASS_CONFIG. It
> > > prints:
> > > 
> > >  qemu-system-x86_64: iommu has granularity incompatible with target AS
> > >  qemu-system-x86_64: iommu map to non memory area 80000000
> > >  qemu-system-x86_64: iommu map to non memory area c0000000
> > >  qemu-system-x86_64: iommu map to non memory area e0000000
> > >  qemu-system-x86_64: iommu map to non memory area f0000000
> > >  qemu-system-x86_64: iommu map to non memory area f8000000
> > >  qemu-system-x86_64: iommu map to non memory area fc000000
> > >  qemu-system-x86_64: iommu map to non memory area fe000000
> > >  qemu-system-x86_64: iommu map to non memory area fe800000
> > >  qemu-system-x86_64: iommu map to non memory area 0
> > >  qemu-system-x86_64: iommu map to non memory area fef00000
> > >  qemu-system-x86_64: iommu map to non memory area ff000000
> > >  qemu-system-x86_64: iommu has granularity incompatible with target AS
> > >  qemu-system-x86_64: iommu map to non memory area 200000000
> > >  qemu-system-x86_64: iommu map to non memory area 400000000
> > >  qemu-system-x86_64: iommu map to non memory area 800000000
> > >  qemu-system-x86_64: iommu map to non memory area 1000000000
> > >  qemu-system-x86_64: iommu map to non memory area 2000000000
> > >  qemu-system-x86_64: iommu map to non memory area 4000000000
> > 
> > I see the same on arm64 with v9.0, assigned an ixgbe nic via VFIO.
> > 
> > qemu-system-aarch64: iommu map to non memory area 0
> > qemu-system-aarch64: iommu map to non memory area 8100000
> > qemu-system-aarch64: iommu map to non memory area 8200000
> > qemu-system-aarch64: iommu map to non memory area 8400000
> > qemu-system-aarch64: iommu map to non memory area 8800000
> > qemu-system-aarch64: iommu map to non memory area 0
> > qemu-system-aarch64: iommu map to non memory area 0
> > qemu-system-aarch64: iommu map to non memory area c000000
> > qemu-system-aarch64: iommu map to non memory area 10000000
> > qemu-system-aarch64: iommu map to non memory area 20000000
> > qemu-system-aarch64: iommu has granularity incompatible with target
> > AS qemu-system-aarch64: iommu map to non memory area 200000000
> > qemu-system-aarch64: iommu map to non memory area 400000000
> > qemu-system-aarch64: iommu map to non memory area 800000000
> > qemu-system-aarch64: iommu map to non memory area 1000000000
> > qemu-system-aarch64: iommu map to non memory area 2000000000
> > qemu-system-aarch64: iommu map to non memory area 4000000000
> > qemu-system-aarch64: iommu map to non memory area 8000000000
> > qemu-system-aarch64: iommu map to non memory area 10000000000
> > qemu-system-aarch64: iommu map to non memory area 20000000000
> > qemu-system-aarch64: iommu map to non memory area 40000000000
> > qemu-system-aarch64: iommu map to non memory area 80000000000
> > qemu-system-aarch64: iommu map to non memory area 100000000000
> > qemu-system-aarch64: iommu map to non memory area 200000000000
> > qemu-system-aarch64: iommu map to non memory area 400000000000
> > qemu-system-aarch64: iommu map to non memory area 800000000000     
> 
> QEMU v9.0 and v9.1 have reached end of life. Do you also get this
> error with QEMU v9.2.x?
Have not tried, will do.

> Are you running an arm64 guest on an x86 host with TCG, or arm64
> guest on arm64 host with KVM?
This is arm64 guest and host.

>  Does reproducing this require
> modifying QEMU or Linux to disable support for F_BYPASS_CONFIG, or
> just passing QEMU/kernel parameters?
I only modified the guest code to pretend F_BYPASS_CONFIG is not
supported, in a couple of places like:

-       if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+       if (!virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {

Is there a QEMU command line option to disable
VIRTIO_IOMMU_F_BYPASS_CONFIG?



* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
From: Jacob Pan @ 2025-02-12 18:50 UTC
  To: Jason Gunthorpe
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, jacob.pan

Hi Jason,

On Wed, 12 Feb 2025 08:50:07 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Feb 11, 2025 at 04:41:51PM -0800, Jacob Pan wrote:
> 
> > I see the same on arm64 with v9.0, assigned an ixgbe nic via VFIO.  
> 
> Huh, I figured it worked on ARM..
> 
> There is no requirement that an iommu driver implement identity; this
> is all an optimization. If it doesn't work in a lot of cases can we
> just remove it? It would simplify a lot.
> 
Currently, identity domain type can be set via sysfs. To prevent that,
maybe we need this?

+static int viommu_def_domain_type(struct device *dev)
+{
+	return IOMMU_DOMAIN_DMA;
+}
+
 static struct iommu_ops viommu_ops = {
        .identity_domain        = &viommu_identity_domain.domain,
        .capable                = viommu_capable,
@@ -1096,6 +1106,7 @@ static struct iommu_ops viommu_ops = {
        .device_group           = viommu_device_group,
        .get_resv_regions       = viommu_get_resv_regions,
        .of_xlate               = viommu_of_xlate,
+       .def_domain_type = viommu_def_domain_type,
        .owner                  = THIS_MODULE,



* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
       [not found]     ` <67acd4e2.630a0220.365aab.e098SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2025-02-12 19:16       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 37+ messages in thread
From: Jean-Philippe Brucker @ 2025-02-12 19:16 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Wed, Feb 12, 2025 at 09:05:35AM -0800, Jacob Pan wrote:
> >  Does reproducing this require
> > modifying QEMU or Linux to disable support for F_BYPASS_CONFIG, or
> > just passing QEMU/kernel parameters?
> I only modified the guest code to pretend F_BYPASS_CONFIG is not
> supported, in a couple of places like:
> 
> -       if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> +       if (!virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> 
> Is there a QEMU command line option to disable
> VIRTIO_IOMMU_F_BYPASS_CONFIG?

No, but I have a patch for testing that didn't seem worth upstreaming:
https://jpbrucker.net/git/qemu/commit/?h=virtio-iommu/bypass&id=ab2f435576a95914d2939ef4f0b190548721fe7b
With it I'm able to reproduce the issue; it looks like I only tested this
case with virtual devices and not VFIO. I'll look for a fix.

Thanks,
Jean


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-07 14:46 ` [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging() Jason Gunthorpe
@ 2025-02-12 19:22   ` Jacob Pan
  2025-02-12 23:30     ` Jason Gunthorpe
  0 siblings, 1 reply; 37+ messages in thread
From: Jacob Pan @ 2025-02-12 19:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, jacob.pan

Hi Jason,

On Fri,  7 Feb 2025 10:46:03 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> virtio has the complication that it sometimes wants to return a paging
> domain for IDENTITY which makes this conversion a little different
> than other drivers.
> 
> Add a viommu_domain_alloc_paging() that combines
> viommu_domain_alloc() and viommu_domain_finalise() to always return a
> fully initialized and finalized paging domain.
> 
Slightly off topic, but still related to paging domains.

A virtio spec extension for page tables was proposed a while ago [1];
it adds support for guest-owned page tables for paging domains.

Do you foresee the implementation can leverage your generic iommu_pt
work? i.e. for building guest IO page tables. It will add a new flavor
(page table based in addition to map/unmap) to
viommu_domain_alloc_paging() i think.

[1]
https://lore.kernel.org/all/20231006160947.2227396-8-jean-philippe@linaro.org/T/
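
To make this concrete, a rough sketch of the shape I have in mind (the
two viommu_alloc_*_domain() helpers are made-up names, not from the
spec proposal or from this series):

	static struct iommu_domain *viommu_domain_alloc_paging(struct device *dev)
	{
		struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);

		/* guest-owned page table, handed to the host via ATTACH_TABLE */
		if (virtio_has_feature(vdev->viommu->vdev,
				       VIRTIO_IOMMU_F_ATTACH_TABLE))
			return viommu_alloc_pgtable_domain(vdev);

		/* existing flavor: MAP/UNMAP requests to the host */
		return viommu_alloc_mapping_domain(vdev);
	}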


> Use viommu_domain_alloc_identity() to implement the special non-bypass
> IDENTITY flow by calling viommu_domain_alloc_paging() then
> viommu_domain_map_identity().
> 
> Remove support for deferred finalize and the vdomain->mutex.
> 
> Remove core support for domain_alloc() IDENTITY as virtio was the last
> driver using it.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/iommu.c        |   6 --
>  drivers/iommu/virtio-iommu.c | 114 ++++++++++++++++-------------
>  2 files changed, 53 insertions(+), 67 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ee33d26dfcd40d..73a05b34de4768 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1599,12 +1599,6 @@ static struct iommu_domain *__iommu_alloc_identity_domain(struct device *dev)
>  		domain = ops->domain_alloc_identity(dev);
>  		if (IS_ERR(domain))
>  			return domain;
> -	} else if (ops->domain_alloc) {
> -		domain = ops->domain_alloc(IOMMU_DOMAIN_IDENTITY);
> -		if (!domain)
> -			return ERR_PTR(-ENOMEM);
> -		if (IS_ERR(domain))
> -			return domain;
>  	} else {
>  		return ERR_PTR(-EOPNOTSUPP);
>  	}
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index c71a996760bddb..79b471c03b6ee4 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -63,7 +63,6 @@ struct viommu_mapping {
>  struct viommu_domain {
>  	struct iommu_domain		domain;
>  	struct viommu_dev		*viommu;
> -	struct mutex			mutex; /* protects viommu pointer */
>  	unsigned int			id;
>  	u32				map_flags;
>  
> @@ -97,6 +96,8 @@ struct viommu_event {
>  	};
>  };
>  
> +static struct viommu_domain viommu_identity_domain;
> +
>  #define to_viommu_domain(domain)	\
>  	container_of(domain, struct viommu_domain, domain)
>  
> @@ -653,65 +654,45 @@ static void viommu_event_handler(struct virtqueue *vq)
> 
>  /* IOMMU API */
>  
> -static struct iommu_domain *viommu_domain_alloc(unsigned type)
> +static struct iommu_domain *viommu_domain_alloc_paging(struct device *dev)
>  {
> -	struct viommu_domain *vdomain;
> -
> -	if (type != IOMMU_DOMAIN_UNMANAGED &&
> -	    type != IOMMU_DOMAIN_DMA &&
> -	    type != IOMMU_DOMAIN_IDENTITY)
> -		return NULL;
> -
> -	vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
> -	if (!vdomain)
> -		return NULL;
> -
> -	mutex_init(&vdomain->mutex);
> -	spin_lock_init(&vdomain->mappings_lock);
> -	vdomain->mappings = RB_ROOT_CACHED;
> -
> -	return &vdomain->domain;
> -}
> -
> -static int viommu_domain_finalise(struct viommu_endpoint *vdev,
> -				  struct iommu_domain *domain)
> -{
> -	int ret;
> -	unsigned long viommu_page_size;
> +	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
>  	struct viommu_dev *viommu = vdev->viommu;
> -	struct viommu_domain *vdomain = to_viommu_domain(domain);
> +	unsigned long viommu_page_size;
> +	struct viommu_domain *vdomain;
> +	int ret;
>  
>  	viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
>  	if (viommu_page_size > PAGE_SIZE) {
>  		dev_err(vdev->dev,
>  			"granule 0x%lx larger than system page size 0x%lx\n",
>  			viommu_page_size, PAGE_SIZE);
> -		return -ENODEV;
> +		return ERR_PTR(-ENODEV);
>  	}
>  
> +	vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
> +	if (!vdomain)
> +		return ERR_PTR(-ENOMEM);
> +
> +	spin_lock_init(&vdomain->mappings_lock);
> +	vdomain->mappings = RB_ROOT_CACHED;
> +
>  	ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
>  			      viommu->last_domain, GFP_KERNEL);
> -	if (ret < 0)
> -		return ret;
> -
> -	vdomain->id		= (unsigned int)ret;
> -
> -	domain->pgsize_bitmap	= viommu->pgsize_bitmap;
> -	domain->geometry	= viommu->geometry;
> -
> -	vdomain->map_flags	= viommu->map_flags;
> -	vdomain->viommu		= viommu;
> -
> -	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
> -		ret = viommu_domain_map_identity(vdev, vdomain);
> -		if (ret) {
> -			ida_free(&viommu->domain_ids, vdomain->id);
> -			vdomain->viommu = NULL;
> -			return ret;
> -		}
> +	if (ret < 0) {
> +		kfree(vdomain);
> +		return ERR_PTR(ret);
>  	}
>  
> -	return 0;
> +	vdomain->id = (unsigned int)ret;
> +
> +	vdomain->domain.pgsize_bitmap = viommu->pgsize_bitmap;
> +	vdomain->domain.geometry = viommu->geometry;
> +
> +	vdomain->map_flags = viommu->map_flags;
> +	vdomain->viommu = viommu;
> +
> +	return &vdomain->domain;
>  }
>  
>  static void viommu_domain_free(struct iommu_domain *domain)
> @@ -727,6 +708,28 @@ static void viommu_domain_free(struct iommu_domain *domain)
>  	kfree(vdomain);
>  }
>  
> +static struct iommu_domain *viommu_domain_alloc_identity(struct device *dev)
> +{
> +	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
> +	struct iommu_domain *domain;
> +	int ret;
> +
> +	if (virtio_has_feature(vdev->viommu->vdev,
> +			       VIRTIO_IOMMU_F_BYPASS_CONFIG))
> +		return &viommu_identity_domain.domain;
> +
> +	domain = viommu_domain_alloc_paging(dev);
> +	if (IS_ERR(domain))
> +		return domain;
> +
> +	ret = viommu_domain_map_identity(vdev, to_viommu_domain(domain));
> +	if (ret) {
> +		viommu_domain_free(domain);
> +		return ERR_PTR(ret);
> +	}
> +	return domain;
> +}
> +
>  static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
>  	int ret = 0;
> @@ -734,20 +737,8 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
>  	struct viommu_domain *vdomain = to_viommu_domain(domain);
> 
> -	mutex_lock(&vdomain->mutex);
> -	if (!vdomain->viommu) {
> -		/*
> -		 * Properly initialize the domain now that we know
> which viommu
> -		 * owns it.
> -		 */
> -		ret = viommu_domain_finalise(vdev, domain);
> -	} else if (vdomain->viommu != vdev->viommu) {
> -		ret = -EINVAL;
> -	}
> -	mutex_unlock(&vdomain->mutex);
> -
> -	if (ret)
> -		return ret;
> +	if (vdomain->viommu != vdev->viommu)
> +		return -EINVAL;
>  
>  	/*
>  	 * In the virtio-iommu device, when attaching the endpoint to a new
> @@ -1098,7 +1089,8 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
>  static struct iommu_ops viommu_ops = {
>  	.identity_domain	= &viommu_identity_domain.domain,
>  	.capable		= viommu_capable,
> -	.domain_alloc		= viommu_domain_alloc,
> +	.domain_alloc_identity	= viommu_domain_alloc_identity,
> +	.domain_alloc_paging	= viommu_domain_alloc_paging,
>  	.probe_device		= viommu_probe_device,
>  	.release_device		= viommu_release_device,
>  	.device_group		= viommu_device_group,



* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
  2025-02-12 18:50     ` Jacob Pan
@ 2025-02-12 20:10       ` Robin Murphy
  0 siblings, 0 replies; 37+ messages in thread
From: Robin Murphy @ 2025-02-12 20:10 UTC (permalink / raw)
  To: jacob.pan, Jason Gunthorpe
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, virtualization,
	Will Deacon, Eric Auger, patches

On 2025-02-12 6:50 pm, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 12 Feb 2025 08:50:07 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
>> On Tue, Feb 11, 2025 at 04:41:51PM -0800, Jacob Pan wrote:
>>
>>> I see the same on arm64 with v9.0, assigned an ixgbe nic via VFIO.
>>
>> Huh, I figured it worked on ARM..
>>
>> There is no requirement that an iommu driver implement identity, this
>> is all an optimization. If it doesn't work in a lot of cases can we
>> just remove it? It would simplify a lot..
>>
> Currently, identity domain type can be set via sysfs. To prevent that,
> maybe we need this?
> 
> +static int viommu_def_domain_type(struct device *dev)
> +{
> +	return IOMMU_DOMAIN_DMA;

Ugh, please no more of that.


Yes, a user can *request* to set an identity domain via sysfs, and if 
the driver does not support identity domains then:

iommu_group_store_type()
     iommu_setup_default_domain()
         iommu_group_alloc_default_domain()
             __iommu_group_alloc_default_domain()
                 __iommu_alloc_identity_domain()

will return failure, and everything will remain fine (well, except on 
apple-dart for now...) Find an s390 system and try it yourself* ;)
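
Condensed from the iommu.c hunk quoted earlier in this thread (a
simplification, not the verbatim source), the relevant core logic after
this series is:

	/* __iommu_alloc_identity_domain(), simplified */
	if (ops->identity_domain)
		return ops->identity_domain;
	if (ops->domain_alloc_identity) {
		domain = ops->domain_alloc_identity(dev);
		if (IS_ERR(domain))
			return domain;
	} else {
		/* the sysfs request just fails cleanly */
		return ERR_PTR(-EOPNOTSUPP);
	}
	...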

Thanks,
Robin.



*quickly, before the patches land and that gains identity support too!

> +}
> +
>   static struct iommu_ops viommu_ops = {
>          .identity_domain        = &viommu_identity_domain.domain,
>          .capable                = viommu_capable,
> @@ -1096,6 +1106,7 @@ static struct iommu_ops viommu_ops = {
>          .device_group           = viommu_device_group,
>          .get_resv_regions       = viommu_get_resv_regions,
>          .of_xlate               = viommu_of_xlate,
> +       .def_domain_type = viommu_def_domain_type,
>          .owner                  = THIS_MODULE,
> 



* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-12 19:22   ` Jacob Pan
@ 2025-02-12 23:30     ` Jason Gunthorpe
  2025-02-13  5:47       ` Jacob Pan
       [not found]       ` <67ad876d.170a0220.3c21dc.85ceSMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2025-02-12 23:30 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Wed, Feb 12, 2025 at 11:22:35AM -0800, Jacob Pan wrote:

> Do you foresee the implementation can leverage your generic iommu_pt
> work? i.e. for building guest IO page tables. It will add a new flavor
> (page table based in addition to map/unmap) to
> viommu_domain_alloc_paging() i think.

Yes I do, I think it should be no problem and it will bring the
missing x86 formats that are currently not available in iopgtbl
wrappers.

I'm working toward getting a cut-back AMD-only series ready to post;
the other series, on iommu-pages, should be the last preparation
work.

Certainly I would be happy to help you get it implemented if that is
your interest.

Thanks,
Jason


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-12 23:30     ` Jason Gunthorpe
@ 2025-02-13  5:47       ` Jacob Pan
  2025-02-18 20:01         ` Jason Gunthorpe
       [not found]       ` <67ad876d.170a0220.3c21dc.85ceSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 1 reply; 37+ messages in thread
From: Jacob Pan @ 2025-02-13  5:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, jacob.pan

Hi Jason,

On Wed, 12 Feb 2025 19:30:53 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Feb 12, 2025 at 11:22:35AM -0800, Jacob Pan wrote:
> 
> > Do you foresee the implementation can leverage your generic iommu_pt
> > work? i.e. for building guest IO page tables. It will add a new
> > flavor (page table based in addition to map/unmap) to
> > viommu_domain_alloc_paging() i think.  
> 
> Yes I do, I think it should be no problem and it will bring the
> missing x86 formats that are currently not available in iopgtbl
> wrappers.
I guess the missing x86 formats are AMDv2 and VT-d S1?

> I'm working toward getting a cut-back AMD-only series ready to post;
> the other series, on iommu-pages, should be the last preparation
> work.
> 
> Certainly I would be happy to help you get it implemented if that is
> your interest.

Yes, that is definitely our interest! We are looking at expanding
hyperv-iommu to include guest DMA remapping support (the current code
has IRQ remapping only). This expansion includes paging and SVA
domains. In terms of the need for paging domains, I believe it is
identical to the virtio-iommu page table extension.

It would be great if you could help add/accommodate such usage in the
generic iommu_pt. I think it will be a subset of the features intended
for iommufd, i.e. the map/unmap operations.

One difference/simplification compared to virtio-iommu is that
hyperv-iommu (the backend) does not want to support a map/unmap
hypercall-based paging domain. IOW, page-table-based paging domains
only, letting the guest own S1.
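
Concretely (with made-up hv_* names, just to illustrate), such a domain
would expose iommu_domain_ops without any .map_pages/.unmap_pages at
all; the guest writes its own S1 tables and only forwards
invalidations:

	static const struct iommu_domain_ops hv_pgtable_domain_ops = {
		.attach_dev		= hv_attach_dev,
		.flush_iotlb_all	= hv_flush_iotlb_all,
		.iotlb_sync		= hv_iotlb_sync,
		.free			= hv_domain_free,
	};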

Our code and backend support are still in the early stages, that is why
I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
@Jean @Eric Do you know?

Adding a couple developers from our side. @Yu Zhang @Easwar Hariharan.




* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
       [not found]       ` <67ad876d.170a0220.3c21dc.85ceSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2025-02-13  9:46         ` Jean-Philippe Brucker
  2025-02-13 17:03           ` Yu Zhang
  0 siblings, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2025-02-13  9:46 UTC (permalink / raw)
  To: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com
  Cc: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, tina.zhang

Hi Jacob,

On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> Our code and backend support are still in the early stages, that is why
> I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> @Jean @Eric Do you know?

As far as I know Tina worked on this most recently:
https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/

Thanks,
Jean


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-13  9:46         ` Jean-Philippe Brucker
@ 2025-02-13 17:03           ` Yu Zhang
  2025-02-13 18:09             ` Jean-Philippe Brucker
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Zhang @ 2025-02-13 17:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, tina.zhang

On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
> Hi Jacob,
> 
> On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> > Our code and backend support are still in the early stages, that is why
> > I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> > anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> > @Jean @Eric Do you know?
> 
> As far as I know Tina worked on this most recently:
> https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
> https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/

Thanks a lot for this information, Jean.
IIUC, these patches were trying to add VT-d IO page table support in
virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
does anyone have a plan to do the incorporation?

[1] https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/

B.R.
Yu


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-13 17:03           ` Yu Zhang
@ 2025-02-13 18:09             ` Jean-Philippe Brucker
  2025-02-19  9:39               ` Yu Zhang
  0 siblings, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2025-02-13 18:09 UTC (permalink / raw)
  To: Yu Zhang
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, tina.zhang

On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
> On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
> > Hi Jacob,
> > 
> > On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> > > Our code and backend support are still in the early stages, that is why
> > > I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> > > anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> > > @Jean @Eric Do you know?
> > 
> > As far as I know Tina worked on this most recently:
> > https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
> > https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
> 
> Thanks a lot for this information, Jean.
> IIUC, these patches were trying to add VT-d IO page table support in
> virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
> does anyone have a plan to do the incorporation?

I'm not aware of anyone working on this at the moment. Something you will
need for a portable pviommu is a library that manages PASID tables rather
than page tables [1], because the Arm SMMUv3 arch only supports assigning
PASID tables to the guest. Alternatively you could implement opaque PASID
table allocation via host calls, letting the guest allocate GPA space and
the host manage the PASID table, but that idea didn't seem very popular at
the time.

Thanks,
Jean

[1] https://lore.kernel.org/all/20210115121342.15093-1-vivek.gautam@arm.com/


* Re: [PATCH 1/5] iommu/virtio: Break out bypass identity support into a global static
  2025-02-12  0:43   ` Jacob Pan
@ 2025-02-18 18:21     ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 18:21 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Tue, Feb 11, 2025 at 04:43:55PM -0800, Jacob Pan wrote:
> > +static int viommu_send_attach_req(struct viommu_dev *viommu, struct
> > device *dev,
> > +				  struct virtio_iommu_req_attach
> > *req) +{
> > +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > +	int ret;
> > +	int i;
> nit: coding style inconsistent within this patch. Inverted xmas tree
> here.

Ok

> > +	/* Reserve an ID to use as the bypass domain */
> > +	if (virtio_has_feature(viommu->vdev,
> > VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> > +		viommu->identity_domain_id = viommu->first_domain;
> > +		viommu->first_domain++;
> Could also use ida_alloc() instead of special treatment, it would be
> consistent with the paging identity domain ID.

Could do, but that would allocate memory, and we already have
first_domain, which is free.
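
For comparison, the ida_alloc() version would be something like (a
sketch, reusing the range the driver already computes):

	ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
			      viommu->last_domain, GFP_KERNEL);
	if (ret < 0)
		return ret;
	viommu->identity_domain_id = ret;

It can fail and may allocate tree nodes internally, for no benefit
here.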

Thanks,
Jason


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-13  5:47       ` Jacob Pan
@ 2025-02-18 20:01         ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2025-02-18 20:01 UTC (permalink / raw)
  To: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com
  Cc: iommu, Jean-Philippe Brucker, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 12 Feb 2025 19:30:53 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, Feb 12, 2025 at 11:22:35AM -0800, Jacob Pan wrote:
> > 
> > > Do you foresee the implementation can leverage your generic iommu_pt
> > > work? i.e. for building guest IO page tables. It will add a new
> > > flavor (page table based in addition to map/unmap) to
> > > viommu_domain_alloc_paging() i think.  
> > 
> > Yes I do, I think it should be no problem and it will bring the
> > missing x86 formats that are currently not available in iopgtbl
> > wrappers.
> I guess the missing x86 formats are AMDv2 and VT-d S1?

Those are the same thing :)

> It would be great if you could help add/accommodate such usage in the
> generic iommu_pt. I think it will be a subset of the features intended
> for iommufd, i.e. the map/unmap operations.

Yeah, it should be no problem. hyperv handling the invalidation queue
in its own way should not affect the logic that drives the page table.

Jason


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-13 18:09             ` Jean-Philippe Brucker
@ 2025-02-19  9:39               ` Yu Zhang
  2025-02-19 10:35                 ` Jean-Philippe Brucker
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Zhang @ 2025-02-19  9:39 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches, tina.zhang

On Thu, Feb 13, 2025 at 06:09:19PM +0000, Jean-Philippe Brucker wrote:
> On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
> > On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
> > > Hi Jacob,
> > > 
> > > On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> > > > Our code and backend support are still in the early stages, that is why
> > > > I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> > > > anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> > > > @Jean @Eric Do you know?
> > > 
> > > As far as I know Tina worked on this most recently:
> > > https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
> > > https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
> > 
> > Thanks a lot for this information, Jean.
> > IIUC, these patches were trying to add VT-d IO page table support in
> > virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
> > does anyone have a plan to do the incorporation?
> 
> I'm not aware of anyone working on this at the moment. Something you will
> need for a portable pviommu is a library that manages PASID tables rather
> than page tables [1], because the Arm SMMUv3 arch only supports assigning
> PASID tables to the guest. Alternatively you could implement opaque PASID
> table allocation via host calls, letting the guest allocate GPA space and
> the host manage the PASID table, but that idea didn't seem very popular at
> the time.

Thank you, Jean. Just had a study of the spec. For ARM SMMUv3, letting
the guest manage the PASID table, and then assigning it directly to the
backend in an ATTACH_TABLE request looks quite reasonable. But for VT-d,
my understanding is the PASID table shall be managed by the host. By "that
idea didn't seem very popular", do you mean that people also want the
ATTACH_TABLE request for VT-d to also assign the PASID table (a virtual one
managed by the guest)? If yes, why?
> 
> Thanks,
> Jean
> 
> [1] https://lore.kernel.org/all/20210115121342.15093-1-vivek.gautam@arm.com/
> 

B.R.
Yu


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-19  9:39               ` Yu Zhang
@ 2025-02-19 10:35                 ` Jean-Philippe Brucker
  2025-02-19 11:11                   ` Yu Zhang
                                     ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Jean-Philippe Brucker @ 2025-02-19 10:35 UTC (permalink / raw)
  To: Yu Zhang
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Wed, Feb 19, 2025 at 05:39:19PM +0800, Yu Zhang wrote:
> On Thu, Feb 13, 2025 at 06:09:19PM +0000, Jean-Philippe Brucker wrote:
> > On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
> > > On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
> > > > Hi Jacob,
> > > > 
> > > > On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> > > > > Our code and backend support are still in the early stages, that is why
> > > > > I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> > > > > anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> > > > > @Jean @Eric Do you know?
> > > > 
> > > > As far as I know Tina worked on this most recently:
> > > > https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
> > > > https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
> > > 
> > > Thanks a lot for this information, Jean.
> > > IIUC, these patches were trying to add VT-d IO page table support in
> > > virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
> > > does anyone have a plan to do the incorporation?
> > 
> > I'm not aware of anyone working on this at the moment. Something you will
> > need for a portable pviommu is a library that manages PASID tables rather
> > than page tables [1], because the Arm SMMUv3 arch only supports assigning
> > PASID tables to the guest. Alternatively you could implement opaque PASID
> > table allocation via host calls, letting the guest allocate GPA space and
> > the host manage the PASID table, but that idea didn't seem very popular at
> > the time.
> 
> Thank you, Jean. Just had a study of the spec. For ARM SMMUv3, letting
> the guest manage the PASID table, and then assigning it directly to the
> backend in an ATTACH_TABLE request looks quite reasonable. But for VT-d,
> my understanding is the PASID table shall be managed by the host. By "that
> idea didn't seem very popular", do you mean that people also want the
> ATTACH_TABLE request for VT-d to also assign the PASID table (a virtual one
> managed by the guest)? If yes, why?

No, the proposal for managing the PASID table in the host was done before
the VT-d architecture added Scalable mode, so at the time they also had to
assign whole PASID tables to the guest and weren't keen on managing it in
the host. I believe in revision 3 (2018) the architecture added support
for Scalable mode and the ability to manage PASID tables in the host.

Nowadays it wouldn't make sense for a pvIOMMU to manage the VT-d PASID
tables in the guest, because as I understand it there is no demand for
supporting the legacy mode address translation of VT-d.

Thanks,
Jean


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-19 10:35                 ` Jean-Philippe Brucker
@ 2025-02-19 11:11                   ` Yu Zhang
  2025-02-19 11:57                     ` Jean-Philippe Brucker
  2025-02-19 13:10                   ` Yi Liu
  2025-02-20  2:58                   ` Baolu Lu
  2 siblings, 1 reply; 37+ messages in thread
From: Yu Zhang @ 2025-02-19 11:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Wed, Feb 19, 2025 at 10:35:18AM +0000, Jean-Philippe Brucker wrote:
> On Wed, Feb 19, 2025 at 05:39:19PM +0800, Yu Zhang wrote:
> > On Thu, Feb 13, 2025 at 06:09:19PM +0000, Jean-Philippe Brucker wrote:
> > > On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
> > > > On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
> > > > > Hi Jacob,
> > > > > 
> > > > > On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> > > > > > Our code and backend support are still in the early stages, that is why
> > > > > > I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> > > > > > anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> > > > > > @Jean @Eric Do you know?
> > > > > 
> > > > > As far as I know Tina worked on this most recently:
> > > > > https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
> > > > > https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
> > > > 
> > > > Thanks a lot for this information, Jean.
> > > > IIUC, these patches were trying to add VT-d IO page table support in
> > > > virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
> > > > does anyone have a plan to do the incorporation?
> > > 
> > > I'm not aware of anyone working on this at the moment. Something you will
> > > need for a portable pviommu is a library that manages PASID tables rather
> > > than page tables [1], because the Arm SMMUv3 arch only supports assigning
> > > PASID tables to the guest. Alternatively you could implement opaque PASID
> > > table allocation via host calls, letting the guest allocate GPA space and
> > > the host manage the PASID table, but that idea didn't seem very popular at
> > > the time.
> > 
> > Thank you, Jean. Just had a study of the spec. For ARM SMMUv3, letting
> > the guest manage the PASID table, and then assigning it directly to the
> > backend in an ATTACH_TABLE request looks quite reasonable. But for VT-d,
> > my understanding is the PASID table shall be managed by the host. By "that
> > idea didn't seem very popular", do you mean that people also want the
> > ATTACH_TABLE request for VT-d to also assign the PASID table (a virtual one
> > managed by the guest)? If yes, why?
> 
> No, the proposal for managing the PASID table in the host was done before
> the VT-d architecture added Scalable mode, so at the time they also had to
> assign whole PASID tables to the guest and weren't keen on managing it in
> the host. I believe in revision 3 (2018) the architecture added support
> for Scalable mode and the ability to manage PASID tables in the host.
> 
> Nowadays it wouldn't make sense for a pvIOMMU to manage the VT-d PASID
> tables in the guest, because as I understand it there is no demand for
> supporting the legacy mode address translation of VT-d.

Thanks, Jean. Yeah, my understanding is that old revisions of VT-d looked
more like the ARM SMMUv3. But with scalable mode introduced, there's no
need for a pvIOMMU to allocate a guest version PASID table on VT-d.

So my question becomes: why do we need a library that manages PASID
tables in the guest? To also support the AMD IOMMU besides ARM SMMUv3
(I saw your spec proposed both an ATTACH_TABLE request for the AMD GCR3
table and an ATTACH_TABLE request for the AMD page table, but I don't
know which one is preferred in practice)?

B.R.
Yu



> 
> Thanks,
> Jean


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-19 11:11                   ` Yu Zhang
@ 2025-02-19 11:57                     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 37+ messages in thread
From: Jean-Philippe Brucker @ 2025-02-19 11:57 UTC (permalink / raw)
  To: Yu Zhang
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On Wed, Feb 19, 2025 at 07:11:43PM +0800, Yu Zhang wrote:
> On Wed, Feb 19, 2025 at 10:35:18AM +0000, Jean-Philippe Brucker wrote:
> > On Wed, Feb 19, 2025 at 05:39:19PM +0800, Yu Zhang wrote:
> > > On Thu, Feb 13, 2025 at 06:09:19PM +0000, Jean-Philippe Brucker wrote:
> > > > On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
> > > > > On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
> > > > > > Hi Jacob,
> > > > > > 
> > > > > > On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> > > > > > > Our code and backend support are still in the early stages, that is why
> > > > > > > I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> > > > > > > anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> > > > > > > @Jean @Eric Do you know?
> > > > > > 
> > > > > > As far as I know Tina worked on this most recently:
> > > > > > https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
> > > > > > https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
> > > > > 
> > > > > Thanks a lot for this information, Jean.
> > > > > IIUC, these patches were trying to add VT-d IO page table support in
> > > > > virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
> > > > > does anyone have a plan to do the incorporation?
> > > > 
> > > > I'm not aware of anyone working on this at the moment. Something you will
> > > > need for a portable pviommu is a library that manages PASID tables rather
> > > > than page tables [1], because the Arm SMMUv3 arch only supports assigning
> > > > PASID tables to the guest. Alternatively you could implement opaque PASID
> > > > table allocation via host calls, letting the guest allocate GPA space and
> > > > the host manage the PASID table, but that idea didn't seem very popular at
> > > > the time.
> > > 
> > > Thank you, Jean. Just had a study of the spec. For ARM SMMUv3, letting
> > > the guest manage the PASID table, and then assigning it directly to the
> > > backend in an ATTACH_TABLE request looks quite reasonable. But for VT-d,
> > > my understanding is the PASID table shall be managed by the host. By "that
> > > idea didn't seem very popular", do you mean that people also want the
> > > ATTACH_TABLE request for VT-d to also assign the PASID table (a virtual one
> > > managed by the guest)? If yes, why?
> > 
> > No, the proposal for managing the PASID table in the host was done before
> > the VT-d architecture added Scalable mode, so at the time they also had to
> > assign whole PASID tables to the guest and weren't keen on managing it in
> > the host. I believe in revision 3 (2018) the architecture added support
> > for Scalable mode and the ability to manage PASID tables in the host.
> > 
> > Nowadays it wouldn't make sense for a pvIOMMU to manage the VT-d PASID
> > tables in the guest, because as I understand it there is no demand for
> > supporting the legacy mode address translation of VT-d.
> 
> Thanks, Jean. Yeah, my understanding is that old revisions of VT-d looked
> more like the ARM SMMUv3. But with scalable mode introduced, there's no
> need for a pvIOMMU to allocate a guest version PASID table on VT-d.
> 
> So my question becomes: why do we need a library that manages PASID
> tables in the guest? To also support the AMD IOMMU besides ARM SMMUv3
> (I saw your spec proposed both an ATTACH_TABLE request for the AMD GCR3
> table and an ATTACH_TABLE request for the AMD page table, but I don't
> know which one is preferred in practice)?

I don't know either, but given that the parts about AMD and RISC-V in the
proposal haven't received feedback, they will definitely change; it's a
question for the AMD and RISC-V people.

Thanks,
Jean


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-19 10:35                 ` Jean-Philippe Brucker
  2025-02-19 11:11                   ` Yu Zhang
@ 2025-02-19 13:10                   ` Yi Liu
  2025-02-20  2:58                   ` Baolu Lu
  2 siblings, 0 replies; 37+ messages in thread
From: Yi Liu @ 2025-02-19 13:10 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Yu Zhang
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On 2025/2/19 18:35, Jean-Philippe Brucker wrote:
> On Wed, Feb 19, 2025 at 05:39:19PM +0800, Yu Zhang wrote:
>> On Thu, Feb 13, 2025 at 06:09:19PM +0000, Jean-Philippe Brucker wrote:
>>> On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
>>>> On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
>>>>> Hi Jacob,
>>>>>
>>>>> On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
>>>>>> Our code and backend support are still in the early stages, that is why
>>>>>> I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
>>>>>> anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
>>>>>> @Jean @Eric Do you know?
>>>>>
>>>>> As far as I know Tina worked on this most recently:
>>>>> https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
>>>>> https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
>>>>
>>>> Thanks a lot for this information, Jean.
>>>> IIUC, these patches were trying to add VT-d IO page table support in
>>>> virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
>>>> does anyone have a plan to do the incorporation?
>>>
>>> I'm not aware of anyone working on this at the moment. Something you will
>>> need for a portable pviommu is a library that manages PASID tables rather
>>> than page tables [1], because the Arm SMMUv3 arch only supports assigning
>>> PASID tables to the guest. Alternatively you could implement opaque PASID
>>> table allocation via host calls, letting the guest allocate GPA space and
>>> the host manage the PASID table, but that idea didn't seem very popular at
>>> the time.
>>
>> Thank you, Jean. Just had a study of the spec. For ARM SMMUv3, letting
>> the guest manage the PASID table, and then assigning it directly to the
>> backend in an ATTACH_TABLE request looks quite reasonable. But for VT-d,
>> my understanding is the PASID table shall be managed by the host. By "that
>> idea didn't seem very popular", do you mean that people also want the
>> ATTACH_TABLE request for VT-d to also assign the PASID table (a virtual one
>> managed by the guest)? If yes, why?
> 
> No, the proposal for managing the PASID table in the host was done before
> the VT-d architecture added Scalable mode, so at the time they also had to
> assign whole PASID tables to the guest and weren't keen on managing it in
> the host. I believe in revision 3 (2018) the architecture added support
> for Scalable mode and the ability to manage PASID tables in the host.

That's correct. It was called ECS mode and has already been deprecated.

> Nowadays it wouldn't make sense for a pvIOMMU to manage the VT-d PASID
> tables in the guest, because as I understand it there is no demand for
> supporting the legacy mode address translation of VT-d.

Conceptually, ATTACH_PASID_TABLE does not fit here, as the PASID table is
maintained by the host. It may need an ATTACH_PASID operation, triggered
per guest PASID table entry setup/update/destroy.
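
As a strawman only (none of these fields are in any spec draft), such a
request could be shaped like the existing ones:

	struct virtio_iommu_req_attach_pasid {
		struct virtio_iommu_req_head	head;
		__le32				domain;
		__le32				endpoint;
		__le32				pasid;
		__le64				pgd;	/* guest page-table root */
		__le32				flags;
		struct virtio_iommu_req_tail	tail;
	};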

-- 
Regards,
Yi Liu


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-19 10:35                 ` Jean-Philippe Brucker
  2025-02-19 11:11                   ` Yu Zhang
  2025-02-19 13:10                   ` Yi Liu
@ 2025-02-20  2:58                   ` Baolu Lu
  2025-02-20  3:44                     ` Yu Zhang
  2 siblings, 1 reply; 37+ messages in thread
From: Baolu Lu @ 2025-02-20  2:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Yu Zhang
  Cc: jacob.pan, Easwar Hariharan, zhangyu1@microsoft.com,
	Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	virtualization, Will Deacon, Eric Auger, patches

On 2/19/25 18:35, Jean-Philippe Brucker wrote:
> On Wed, Feb 19, 2025 at 05:39:19PM +0800, Yu Zhang wrote:
>> On Thu, Feb 13, 2025 at 06:09:19PM +0000, Jean-Philippe Brucker wrote:
>>> On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
>>>> On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
>>>>> Hi Jacob,
>>>>>
>>>>> On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
>>>>>> Our code and backend support are still in the early stages, that is why
>>>>>> I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
>>>>>> anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
>>>>>> @Jean @Eric Do you know?
>>>>> As far as I know Tina worked on this most recently:
>>>>> https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
>>>>> https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
>>>> Thanks a lot for this information, Jean.
>>>> IIUC, these patches were trying to add VT-d IO page table support in
>>>> virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
>>>> does anyone have a plan to do the incorporation?
>>> I'm not aware of anyone working on this at the moment. Something you will
>>> need for a portable pviommu is a library that manages PASID tables rather
>>> than page tables [1], because the Arm SMMUv3 arch only supports assigning
>>> PASID tables to the guest. Alternatively you could implement opaque PASID
>>> table allocation via host calls, letting the guest allocate GPA space and
>>> the host manage the PASID table, but that idea didn't seem very popular at
>>> the time.
>> Thank you, Jean. Just had a study of the spec. For ARM SMMUv3, letting
>> the guest manage the PASID table, and then assigning it directly to the
>> backend in an ATTACH_TABLE request looks quite reasonable. But for VT-d,
>> my understanding is the PASID table shall be managed by the host. By "that
>> idea didn't seem very popular", do you mean that people also want the
>> ATTACH_TABLE request for VT-d to also assign the PASID table (a virtual one
>> managed by the guest)? If yes, why?
> No, the proposal for managing the PASID table in the host was done before
> the VT-d architecture added Scalable mode, so at the time they also had to
> assign whole PASID tables to the guest and weren't keen on managing it in
> the host. I believe in revision 3 (2018) the architecture added support
> for Scalable mode and the ability to manage PASID tables in the host.
> 
> Nowadays it wouldn't make sense for a pvIOMMU to manage the VT-d PASID
> tables in the guest, because as I understand it there is no demand for
> supporting the legacy mode address translation of VT-d.

Are you talking about ECS mode? There is no hardware or software
implementation for this mode, so we don't need to consider it.

Thanks,
baolu


* Re: [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging()
  2025-02-20  2:58                   ` Baolu Lu
@ 2025-02-20  3:44                     ` Yu Zhang
  0 siblings, 0 replies; 37+ messages in thread
From: Yu Zhang @ 2025-02-20  3:44 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jean-Philippe Brucker, jacob.pan, Easwar Hariharan,
	zhangyu1@microsoft.com, Jason Gunthorpe, iommu, Joerg Roedel,
	Robin Murphy, virtualization, Will Deacon, Eric Auger, patches

On Thu, Feb 20, 2025 at 10:58:05AM +0800, Baolu Lu wrote:
> On 2/19/25 18:35, Jean-Philippe Brucker wrote:
> > On Wed, Feb 19, 2025 at 05:39:19PM +0800, Yu Zhang wrote:
> > > On Thu, Feb 13, 2025 at 06:09:19PM +0000, Jean-Philippe Brucker wrote:
> > > > On Fri, Feb 14, 2025 at 01:03:43AM +0800, Yu Zhang wrote:
> > > > > On Thu, Feb 13, 2025 at 09:46:01AM +0000, Jean-Philippe Brucker wrote:
> > > > > > Hi Jacob,
> > > > > > 
> > > > > > On Wed, Feb 12, 2025 at 09:47:23PM -0800, Jacob Pan wrote:
> > > > > > > Our code and backend support are still in the early stages, that is why
> > > > > > > I am attempting to convert virtio-iommu driver to iommu_pt. Not sure if
> > > > > > > anyone has done the QEMU part to support VIRTIO_IOMMU_F_ATTACH_TABLE?
> > > > > > > @Jean @Eric Do you know?
> > > > > > As far as I know Tina worked on this most recently:
> > > > > > https://github.com/TinaZhangZW/qemu/commits/virtio-iommu/vt-d-pgtable/
> > > > > > https://lore.kernel.org/all/20231106071226.9656-1-tina.zhang@intel.com/
> > > > > Thanks a lot for this information, Jean.
> > > > > IIUC, these patches were trying to add VT-d IO page table support in
> > > > > virtio-iommu, but it is not based on Jason's generic PT [1]. Just wondering,
> > > > > does anyone have a plan to do the incorporation?
> > > > I'm not aware of anyone working on this at the moment. Something you will
> > > > need for a portable pviommu is a library that manages PASID tables rather
> > > > than page tables [1], because the Arm SMMUv3 arch only supports assigning
> > > > PASID tables to the guest. Alternatively you could implement opaque PASID
> > > > table allocation via host calls, letting the guest allocate GPA space and
> > > > the host manage the PASID table, but that idea didn't seem very popular at
> > > > the time.
> > > Thank you, Jean. Just had a study of the spec. For ARM SMMUv3, letting
> > > the guest manage the PASID table, and then assigning it directly to the
> > > backend in an ATTACH_TABLE request looks quite reasonable. But for VT-d,
> > > my understanding is the PASID table shall be managed by the host. By "that
> > > idea didn't seem very popular", do you mean that people also want the
> > > ATTACH_TABLE request for VT-d to also assign the PASID table (a virtual one
> > > managed by the guest)? If yes, why?
> > No, the proposal for managing the PASID table in the host was done before
> > the VT-d architecture added Scalable mode, so at the time they also had to
> > assign whole PASID tables to the guest and weren't keen on managing it in
> > the host. I believe in revision 3 (2018) the architecture added support
> > for Scalable mode and the ability to manage PASID tables in the host.
> > 
> > Nowadays it wouldn't make sense for a pvIOMMU to manage the VT-d PASID
> > tables in the guest, because as I understand it there is no demand for
> > supporting the legacy mode address translation of VT-d.
> 
> Are you talking about ECS mode? There is no hardware or software

Thanks, Baolu. I suppose so (by 'legacy', Jean was referring to ECS, and
he knows this is no longer supported).

> implementation for this mode, so we don't need to consider it.

It seems we have now reached a consensus that no PASID table needs to be
managed by a pvIOMMU for VT-d.

Meanwhile, it's worth noting that this is not the case for ARM SMMUv3,
where the opposite holds. And as to the AMD IOMMU, we may choose either
approach.

B.R.
Yu


* Re: [PATCH 1/5] iommu/virtio: Break out bypass identity support into a global static
  2025-02-07 14:46 ` [PATCH 1/5] iommu/virtio: Break out bypass identity support into a global static Jason Gunthorpe
  2025-02-12  0:43   ` Jacob Pan
@ 2025-02-21 11:35   ` Jean-Philippe Brucker
  2025-02-24 19:37     ` Jason Gunthorpe
  1 sibling, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2025-02-21 11:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, Robin Murphy, virtualization, Will Deacon,
	Eric Auger, patches

On Fri, Feb 07, 2025 at 10:46:01AM -0400, Jason Gunthorpe wrote:
> To make way for a domain_alloc_paging conversion add the typical global
> static IDENTITY domain. This supports VMMs that have a
> VIRTIO_IOMMU_F_BYPASS_CONFIG config.
> 
> If the VMM does not have support then the domain_alloc path is still used,
> which creates an IDENTITY domain out of a paging domain.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/virtio-iommu.c | 86 ++++++++++++++++++++++++++++--------
>  1 file changed, 67 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index b85ce6310ddbda..c71a996760bddb 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -48,6 +48,7 @@ struct viommu_dev {
>  	u64				pgsize_bitmap;
>  	u32				first_domain;
>  	u32				last_domain;
> +	u32				identity_domain_id;
>  	/* Supported MAP flags */
>  	u32				map_flags;
>  	u32				probe_size;
> @@ -70,7 +71,6 @@ struct viommu_domain {
>  	struct rb_root_cached		mappings;
>  
>  	unsigned long			nr_endpoints;
> -	bool				bypass;
>  };
>  
>  struct viommu_endpoint {
> @@ -305,6 +305,22 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, void *buf,
>  	return ret;
>  }
>  
> +static int viommu_send_attach_req(struct viommu_dev *viommu, struct device *dev,
> +				  struct virtio_iommu_req_attach *req)
> +{
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	int ret;
> +	int i;
> +
> +	for (i = 0; i < fwspec->num_ids; i++) {
> +		req->endpoint = cpu_to_le32(fwspec->ids[i]);
> +		ret = viommu_send_req_sync(viommu, req, sizeof(*req));
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * viommu_add_mapping - add a mapping to the internal tree
>   *
> @@ -687,12 +703,6 @@ static int viommu_domain_finalise(struct viommu_endpoint *vdev,
>  	vdomain->viommu		= viommu;
>  
>  	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
> -		if (virtio_has_feature(viommu->vdev,
> -				       VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> -			vdomain->bypass = true;
> -			return 0;
> -		}
> -
>  		ret = viommu_domain_map_identity(vdev, vdomain);
>  		if (ret) {
>  			ida_free(&viommu->domain_ids, vdomain->id);
> @@ -719,10 +729,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
>  
>  static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  {
> -	int i;
>  	int ret = 0;
>  	struct virtio_iommu_req_attach req;
> -	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>  	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
>  	struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> @@ -761,16 +769,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		.domain		= cpu_to_le32(vdomain->id),
>  	};
>  
> -	if (vdomain->bypass)
> -		req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
> -
> -	for (i = 0; i < fwspec->num_ids; i++) {
> -		req.endpoint = cpu_to_le32(fwspec->ids[i]);
> -
> -		ret = viommu_send_req_sync(vdomain->viommu, &req, sizeof(req));
> -		if (ret)
> -			return ret;
> -	}
> +	ret = viommu_send_attach_req(vdomain->viommu, dev, &req);
> +	if (ret)
> +		return ret;
>  
>  	if (!vdomain->nr_endpoints) {
>  		/*
> @@ -788,6 +789,40 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	return 0;
>  }
>  
> +static int viommu_attach_identity_domain(struct iommu_domain *domain,
> +					 struct device *dev)
> +{
> +	int ret = 0;
> +	struct virtio_iommu_req_attach req;
> +	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
> +	struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> +	req = (struct virtio_iommu_req_attach) {
> +		.head.type	= VIRTIO_IOMMU_T_ATTACH,
> +		.domain		= cpu_to_le32(vdev->viommu->identity_domain_id),
> +		.flags          = cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS),
> +	};
> +
> +	ret = viommu_send_attach_req(vdev->viommu, dev, &req);
> +	if (ret)
> +		return ret;
> +
> +	if (vdev->vdomain) {
> +		vdev->vdomain->nr_endpoints--;
> +		vdomain->nr_endpoints++;
> +		vdev->vdomain = vdomain;

These two need to be unconditional

> +	}
> +	return 0;
> +}
> +
> +static struct viommu_domain viommu_identity_domain = {
> +	.domain = { .type = IOMMU_DOMAIN_IDENTITY,
> +		    .ops =
> +			    &(const struct iommu_domain_ops){
> +				    .attach_dev = viommu_attach_identity_domain,
> +			    } }
> +};

nit: how about

	static struct viommu_domain viommu_identity_domain = {
		.domain = {
			.type = IOMMU_DOMAIN_IDENTITY,
			.ops = &(const struct iommu_domain_ops) {
				.attach_dev = viommu_attach_identity_domain,
			},
		},
	};

> +
>  static void viommu_detach_dev(struct viommu_endpoint *vdev)
>  {
>  	int i;
> @@ -1061,6 +1096,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
>  }
>  
>  static struct iommu_ops viommu_ops = {
> +	.identity_domain	= &viommu_identity_domain.domain,
>  	.capable		= viommu_capable,
>  	.domain_alloc		= viommu_domain_alloc,
>  	.probe_device		= viommu_probe_device,
> @@ -1184,6 +1220,18 @@ static int viommu_probe(struct virtio_device *vdev)
>  	if (virtio_has_feature(vdev, VIRTIO_IOMMU_F_MMIO))
>  		viommu->map_flags |= VIRTIO_IOMMU_MAP_F_MMIO;
>  
> +	/* Reserve an ID to use as the bypass domain */
> +	if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> +		viommu->identity_domain_id = viommu->first_domain;
> +		viommu->first_domain++;
> +	} else {
> +		/*
> +		 * Assume the VMM is sensible and it either supports bypass on
> +		 * all instances or no instances.
> +		 */

Maybe also a WARN_ON(!viommu_ops.identity_domain) above?

Thanks,
Jean



> +		viommu_ops.identity_domain = NULL;
> +	}
> +
>  	viommu_ops.pgsize_bitmap = viommu->pgsize_bitmap;
>  
>  	virtio_device_ready(vdev);
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
  2025-02-12 12:50   ` Jason Gunthorpe
  2025-02-12 18:50     ` Jacob Pan
@ 2025-02-21 11:42     ` Jean-Philippe Brucker
  2025-02-24 19:39       ` Jason Gunthorpe
  1 sibling, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2025-02-21 11:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jacob Pan, iommu, Joerg Roedel, Robin Murphy, virtualization,
	Will Deacon, Eric Auger, patches

On Wed, Feb 12, 2025 at 08:50:07AM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 11, 2025 at 04:41:51PM -0800, Jacob Pan wrote:
> 
> > I see the same on arm64 with v9.0, assigned an ixgbe nic via VFIO.
> 
> Huh, I figured it worked on ARM..

Crosvm would be a more useful test because it doesn't implement any BYPASS
feature, but I don't have it set up at the moment. It's not a problem on
QEMU since it supports the BYPASS feature and doesn't need identity-mapped
domains.

> 
> There is no requirement that an iommu driver implement identity, this
> is all an optimization. If it doesn't work in a lot of cases can we
> just remove it? It would simplify a lot..

No objection. Identity domains are useful if the use case for virtio-iommu
is assigning devices to guest userspace, in which case you'd want to
bypass the IOMMU for the devices that are still owned by the kernel. But
VMMs have at least two alternatives to identity-mapping (BYPASS feature
and tailored topology).

Thanks,
Jean


* Re: [PATCH 1/5] iommu/virtio: Break out bypass identity support into a global static
  2025-02-21 11:35   ` Jean-Philippe Brucker
@ 2025-02-24 19:37     ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2025-02-24 19:37 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, Joerg Roedel, Robin Murphy, virtualization, Will Deacon,
	Eric Auger, patches

On Fri, Feb 21, 2025 at 11:35:27AM +0000, Jean-Philippe Brucker wrote:
> > +static int viommu_attach_identity_domain(struct iommu_domain *domain,
> > +					 struct device *dev)
> > +{
> > +	int ret = 0;
> > +	struct virtio_iommu_req_attach req;
> > +	struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);
> > +	struct viommu_domain *vdomain = to_viommu_domain(domain);
> > +
> > +	req = (struct virtio_iommu_req_attach) {
> > +		.head.type	= VIRTIO_IOMMU_T_ATTACH,
> > +		.domain		= cpu_to_le32(vdev->viommu->identity_domain_id),
> > +		.flags          = cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS),
> > +	};
> > +
> > +	ret = viommu_send_attach_req(vdev->viommu, dev, &req);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (vdev->vdomain) {
> > +		vdev->vdomain->nr_endpoints--;
> > +		vdomain->nr_endpoints++;
> > +		vdev->vdomain = vdomain;
> 
> These two need to be unconditional

Whoops, yes.
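
For reference, a minimal sketch of the corrected accounting (my reading of
the fix; only the decrement of the old domain's count stays conditional on
there being a prior attachment):

	/* Drop the endpoint from its previous domain, if any */
	if (vdev->vdomain)
		vdev->vdomain->nr_endpoints--;
	/* These two now run unconditionally */
	vdomain->nr_endpoints++;
	vdev->vdomain = vdomain;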

> > +static struct viommu_domain viommu_identity_domain = {
> > +	.domain = { .type = IOMMU_DOMAIN_IDENTITY,
> > +		    .ops =
> > +			    &(const struct iommu_domain_ops){
> > +				    .attach_dev = viommu_attach_identity_domain,
> > +			    } }
> > +};
> 
> nit: how about
> 
> 	static struct viommu_domain viommu_identity_domain = {
> 		.domain = {
> 			.type = IOMMU_DOMAIN_IDENTITY,
> 			.ops = &(const struct iommu_domain_ops) {
> 				.attach_dev = viommu_attach_identity_domain,
> 			},
> 		},
> 	};

Done

> > +	/* Reserve an ID to use as the bypass domain */
> > +	if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
> > +		viommu->identity_domain_id = viommu->first_domain;
> > +		viommu->first_domain++;
> > +	} else {
> > +		/*
> > +		 * Assume the VMM is sensible and it either supports bypass on
> > +		 * all instances or no instances.
> > +		 */
> 
> Maybe also a WARN_ON(!viommu_ops.identity_domain) above?

It starts working in the following patch, because the core will call
viommu_domain_alloc_identity(), which can make a correct
per-device/per-viommu determination of bypass support.

Let me fold this into the next patch to make that clearer:

@@ -998,7 +998,7 @@ static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
        iommu_dma_get_resv_regions(dev, head);
 }
 
-static struct iommu_ops viommu_ops;
+static const struct iommu_ops viommu_ops;
 static struct virtio_driver virtio_iommu_drv;
 
 static int viommu_match_node(struct device *dev, const void *data)
@@ -1086,8 +1086,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap)
        }
 }
 
-static struct iommu_ops viommu_ops = {
-       .identity_domain        = &viommu_identity_domain.domain,
+static const struct iommu_ops viommu_ops = {
        .capable                = viommu_capable,
        .domain_alloc_identity  = viommu_domain_alloc_identity,
        .domain_alloc_paging    = viommu_domain_alloc_paging,
@@ -1216,12 +1215,6 @@ static int viommu_probe(struct virtio_device *vdev)
        if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
                viommu->identity_domain_id = viommu->first_domain;
                viommu->first_domain++;
-       } else {
-               /*
-                * Assume the VMM is sensible and it either supports bypass on
-                * all instances or no instances.
-                */
-               viommu_ops.identity_domain = NULL;
        }
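
For illustration only (not the final patch), a per-instance
viommu_domain_alloc_identity() as wired up by the diff above could look
roughly like the sketch below; the feature check mirrors the existing
probe logic, and the fallback error code is an assumption:

	static struct iommu_domain *
	viommu_domain_alloc_identity(struct device *dev)
	{
		struct viommu_endpoint *vdev = dev_iommu_priv_get(dev);

		/*
		 * Only hand out the global identity domain when this
		 * viommu instance negotiated VIRTIO_IOMMU_F_BYPASS_CONFIG.
		 */
		if (!virtio_has_feature(vdev->viommu->vdev,
					VIRTIO_IOMMU_F_BYPASS_CONFIG))
			return ERR_PTR(-EOPNOTSUPP);

		return &viommu_identity_domain.domain;
	}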

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging()
  2025-02-21 11:42     ` Jean-Philippe Brucker
@ 2025-02-24 19:39       ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2025-02-24 19:39 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Jacob Pan, iommu, Joerg Roedel, Robin Murphy, virtualization,
	Will Deacon, Eric Auger, patches

On Fri, Feb 21, 2025 at 11:42:06AM +0000, Jean-Philippe Brucker wrote:
> > There is no requirement that an iommu driver implement identity; this
> > is all an optimization. If it doesn't work in a lot of cases, can we
> > just remove it? It would simplify a lot.
> 
> No objection.  Identity domains are useful if the use case of virtio-iommu
> is assigning devices to guest userspace, in which case you'd want to
> bypass the IOMMU for the devices that are still owned by the kernel. But
> VMMs have at least two alternatives to identity mapping (the BYPASS
> feature and a tailored topology).

Since Robin pointed out a few other drivers that would like to use the
domain_op, let's leave it as is.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

Thread overview: 37+ messages
2025-02-07 14:46 [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging() Jason Gunthorpe
2025-02-07 14:46 ` [PATCH 1/5] iommu/virtio: Break out bypass identity support into a global static Jason Gunthorpe
2025-02-12  0:43   ` Jacob Pan
2025-02-18 18:21     ` Jason Gunthorpe
2025-02-21 11:35   ` Jean-Philippe Brucker
2025-02-24 19:37     ` Jason Gunthorpe
2025-02-07 14:46 ` [PATCH 2/5] iommu: Add domain_alloc_identity() Jason Gunthorpe
2025-02-12 13:56   ` Robin Murphy
2025-02-12 14:03     ` Jason Gunthorpe
2025-02-12 14:16       ` Robin Murphy
2025-02-12 14:45         ` Jason Gunthorpe
2025-02-07 14:46 ` [PATCH 3/5] iommu/virtio: Move to domain_alloc_paging() Jason Gunthorpe
2025-02-12 19:22   ` Jacob Pan
2025-02-12 23:30     ` Jason Gunthorpe
2025-02-13  5:47       ` Jacob Pan
2025-02-18 20:01         ` Jason Gunthorpe
     [not found]       ` <67ad876d.170a0220.3c21dc.85ceSMTPIN_ADDED_BROKEN@mx.google.com>
2025-02-13  9:46         ` Jean-Philippe Brucker
2025-02-13 17:03           ` Yu Zhang
2025-02-13 18:09             ` Jean-Philippe Brucker
2025-02-19  9:39               ` Yu Zhang
2025-02-19 10:35                 ` Jean-Philippe Brucker
2025-02-19 11:11                   ` Yu Zhang
2025-02-19 11:57                     ` Jean-Philippe Brucker
2025-02-19 13:10                   ` Yi Liu
2025-02-20  2:58                   ` Baolu Lu
2025-02-20  3:44                     ` Yu Zhang
2025-02-07 14:46 ` [PATCH 4/5] iommu: Do not call domain_alloc() in iommu_sva_domain_alloc() Jason Gunthorpe
2025-02-07 14:46 ` [PATCH 5/5] iommu: Hide ops.domain_alloc behind CONFIG_FSL_PAMU Jason Gunthorpe
2025-02-12  0:41 ` [PATCH 0/5] Convert virtio-iommu to domain_alloc_paging() Jacob Pan
2025-02-12 12:50   ` Jason Gunthorpe
2025-02-12 18:50     ` Jacob Pan
2025-02-12 20:10       ` Robin Murphy
2025-02-21 11:42     ` Jean-Philippe Brucker
2025-02-24 19:39       ` Jason Gunthorpe
     [not found] ` <67abee53.170a0220.154671.ae28SMTPIN_ADDED_BROKEN@mx.google.com>
2025-02-12 11:58   ` Jean-Philippe Brucker
2025-02-12 17:05     ` Jacob Pan
     [not found]     ` <67acd4e2.630a0220.365aab.e098SMTPIN_ADDED_BROKEN@mx.google.com>
2025-02-12 19:16       ` Jean-Philippe Brucker
