All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jacob Pan <jacob.pan@linux.microsoft.com>
To: Yi Liu <yi.l.liu@intel.com>
Cc: <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Alex Williamson <alex@shazbot.org>,
	Joerg Roedel <joro@8bytes.org>,
	Mostafa Saleh <smostafa@google.com>,
	David Matlack <dmatlack@google.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	Saurabh Sengar <ssengar@linux.microsoft.com>,
	<skhawaja@google.com>, <pasha.tatashin@soleen.com>,
	Will Deacon <will@kernel.org>,
	Baolu Lu <baolu.lu@linux.intel.com>,
	jacob.pan@linux.microsoft.com
Subject: Re: [PATCH v5 4/9] iommufd: Allow binding to a noiommu device
Date: Wed, 20 May 2026 08:54:59 -0700	[thread overview]
Message-ID: <20260520085459.000067ac@linux.microsoft.com> (raw)
In-Reply-To: <b9e3a3ef-086c-4284-94ce-8e7fafc22d85@intel.com>

Hi Yi,

On Wed, 20 May 2026 15:20:06 +0800
Yi Liu <yi.l.liu@intel.com> wrote:

> On 5/12/26 02:41, Jacob Pan wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > Allow iommufd to bind devices without an IOMMU (noiommu mode) by
> > creating a dummy IOMMU group for such devices and skipping hwpt
> > operations.
> > 
> > This enables noiommu devices to operate through the same iommufd
> > API as IOMMU- capable devices.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > v5:
> >     - simplify logic and rename iommufd_device_is_noiommu (Kevin,
> > Yi)
> >     - use a helper iommufd_bind_noiommu instead of open coding
> > (Kevin)
> >     - move IOMMU cap check under iommufd_bind_iommu() (Yi)
> >     - reword comments for partial init (Yi)
> >     - misc minor clean up
> > v4:
> >     - Update the description of the module parameter (Alex)
> > v3:
> >     - Consolidate into fewer patches
> > ---
> >   drivers/iommu/iommufd/device.c | 148
> > ++++++++++++++++++++++++--------- 1 file changed, 109
> > insertions(+), 39 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommufd/device.c
> > b/drivers/iommu/iommufd/device.c index d03076fcf3c2..4d75720432cc
> > 100644 --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -23,6 +23,16 @@ struct iommufd_attach {
> >   	struct xarray device_array;
> >   };
> >   
> > +/*
> > + * A noiommu device has no IOMMU driver attached regardless of
> > whether it
> > + * enters via the cdev path (no iommu_group) or the group path
> > (fake
> > + * noiommu iommu_group). In both cases dev->iommu is NULL.  
> 
> Are you trying to identify the case where both group and cdev
> interfaces coexist? In such a scenario, the group path creates a fake
> iommu group while the cdev path creates a dummy igroup. Both
> operations need to be performed, but checking iommu_group might
> interfere with the cdev noiommu path logic.
> 
> Is that why you're checking idev->dev->iommu here instead? If so,
> could you please make this rationale more explicit in the comment?
> 
> At first glance, the comment is quite confusing. It's unclear why a
> cdev-path-only helper function needs to reference the group path at
> all. The last sentence in particular seems to suggest this helper
> covers both paths, which adds to the confusion.
> 
Yes, Baolu also commented on this asking why not use
device_iommu_mapped. I will update the comment as follows:
 /*
- * A noiommu device has no IOMMU driver attached regardless of whether
it
- * enters via the cdev path (no iommu_group) or the group path (fake
- * noiommu iommu_group). In both cases dev->iommu is NULL.
+ * Detect a noiommu device for the cdev path. We check dev->iommu
rather than
+ * using device_iommu_mapped() (which checks dev->iommu_group) because
when
+ * both group and cdev interfaces coexist, the group path assigns a
fake
+ * noiommu iommu_group to the device. That would cause
device_iommu_mapped()
+ * to return true and hide the noiommu case from the cdev path.
dev->iommu is
+ * reliably NULL when no IOMMU driver is managing the device.
  */
 static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
 {

> > + */
> > +static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
> > +{
> > +	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) &&
> > !idev->dev->iommu; +}
> > +
> >   static void iommufd_group_release(struct kref *kref)
> >   {
> >   	struct iommufd_group *igroup =
> > @@ -30,9 +40,11 @@ static void iommufd_group_release(struct kref
> > *kref) 
> >   	WARN_ON(!xa_empty(&igroup->pasid_attach));
> >   
> > -	xa_cmpxchg(&igroup->ictx->groups,
> > iommu_group_id(igroup->group), igroup,
> > -		   NULL, GFP_KERNEL);
> > -	iommu_group_put(igroup->group);
> > +	if (igroup->group) {
> > +		xa_cmpxchg(&igroup->ictx->groups,
> > iommu_group_id(igroup->group),
> > +			   igroup, NULL, GFP_KERNEL);
> > +		iommu_group_put(igroup->group);
> > +	}
> >   	mutex_destroy(&igroup->lock);
> >   	kfree(igroup);
> >   }
> > @@ -204,32 +216,19 @@ void iommufd_device_destroy(struct
> > iommufd_object *obj) struct iommufd_device *idev =
> >   		container_of(obj, struct iommufd_device, obj);
> >   
> > -	iommu_device_release_dma_owner(idev->dev);
> > +	if (!idev->igroup)
> > +		return;
> > +	if (!iommufd_device_is_noiommu(idev))
> > +		iommu_device_release_dma_owner(idev->dev);
> >   	iommufd_put_group(idev->igroup);
> >   	if (!iommufd_selftest_is_mock_dev(idev->dev))
> >   		iommufd_ctx_put(idev->ictx);
> >   }
> >   
> > -/**
> > - * iommufd_device_bind - Bind a physical device to an iommu fd
> > - * @ictx: iommufd file descriptor
> > - * @dev: Pointer to a physical device struct
> > - * @id: Output ID number to return to userspace for this device
> > - *
> > - * A successful bind establishes an ownership over the device and
> > returns
> > - * struct iommufd_device pointer, otherwise returns error pointer.
> > - *
> > - * A driver using this API must set driver_managed_dma and must
> > not touch
> > - * the device until this routine succeeds and establishes
> > ownership.
> > - *
> > - * Binding a PCI device places the entire RID under iommufd
> > control.
> > - *
> > - * The caller must undo this with iommufd_device_unbind()
> > - */
> > -struct iommufd_device *iommufd_device_bind(struct iommufd_ctx
> > *ictx,
> > -					   struct device *dev, u32
> > *id) +static int iommufd_bind_iommu(struct iommufd_device *idev)
> >   {
> > -	struct iommufd_device *idev;
> > +	struct iommufd_ctx *ictx = idev->ictx;
> > +	struct device *dev = idev->dev;
> >   	struct iommufd_group *igroup;
> >   	int rc;
> >   
> > @@ -238,11 +237,11 @@ struct iommufd_device
> > *iommufd_device_bind(struct iommufd_ctx *ictx,
> >   	 * to restore cache coherency.
> >   	 */
> >   	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
> > -		return ERR_PTR(-EINVAL);
> > +		return -EINVAL;
> >   
> >   	igroup = iommufd_get_group(ictx, dev);
> >   	if (IS_ERR(igroup))
> > -		return ERR_CAST(igroup);
> > +		return PTR_ERR(igroup);
> >   
> >   	/*
> >   	 * For historical compat with VFIO the insecure interrupt
> > path is @@ -268,21 +267,80 @@ struct iommufd_device
> > *iommufd_device_bind(struct iommufd_ctx *ictx, if (rc)
> >   		goto out_group_put;
> >   
> > +	/* igroup refcount moves into iommufd_device */
> > +	idev->igroup = igroup;
> > +	idev->enforce_cache_coherency =
> > +		device_iommu_capable(dev,
> > IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
> > +	return 0;
> > +
> > +out_group_put:
> > +	iommufd_put_group(igroup);
> > +	return rc;
> > +}
> > +
> > +/*
> > + * Noiommu devices have no real IOMMU group. Create a dummy igroup
> > so that
> > + * internal code paths that expect idev->igroup to be present
> > still work.
> > + * A NULL igroup->group distinguishes this from a real
> > IOMMU-backed group.
> > + */
> > +static int iommufd_bind_noiommu(struct iommufd_device *idev)
> > +{
> > +	struct iommufd_group *igroup;
> > +
> > +	igroup = iommufd_alloc_group(idev->ictx, NULL);
> > +	if (IS_ERR(igroup))
> > +		return PTR_ERR(igroup);
> > +	idev->igroup = igroup;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * iommufd_device_bind - Bind a physical device to an iommu fd
> > + * @ictx: iommufd file descriptor
> > + * @dev: Pointer to a physical device struct
> > + * @id: Output ID number to return to userspace for this device
> > + *
> > + * A successful bind establishes an ownership over the device and
> > returns
> > + * struct iommufd_device pointer, otherwise returns error pointer.
> > + *
> > + * A driver using this API must set driver_managed_dma and must
> > not touch
> > + * the device until this routine succeeds and establishes
> > ownership.
> > + *
> > + * Binding a PCI device places the entire RID under iommufd
> > control.
> > + *
> > + * The caller must undo this with iommufd_device_unbind()
> > + */
> > +struct iommufd_device *iommufd_device_bind(struct iommufd_ctx
> > *ictx,
> > +					   struct device *dev, u32
> > *id) +{
> > +	struct iommufd_device *idev;
> > +	int rc;
> > +
> >   	idev = iommufd_object_alloc(ictx, idev,
> > IOMMUFD_OBJ_DEVICE);
> > -	if (IS_ERR(idev)) {
> > -		rc = PTR_ERR(idev);
> > -		goto out_release_owner;
> > -	}
> > +	if (IS_ERR(idev))
> > +		return idev;
> > +
> >   	idev->ictx = ictx;
> > +	idev->dev = dev;
> > +
> > +	if (!iommufd_device_is_noiommu(idev)) {
> > +		rc = iommufd_bind_iommu(idev);
> > +		if (rc)
> > +			goto err_out;
> > +	} else {
> > +		rc = iommufd_bind_noiommu(idev);
> > +		if (rc)
> > +			goto err_out;
> > +	}  
> 
> how about moving the 'if (rc)'out of the two branches?
> 
Sounds good!

> > +
> > +	/*
> > +	 * Take a ctx reference after bind succeeds. This must
> > happen here
> > +	 * so that iommufd_device_destroy() can handle partial
> > initialization
> > +	 */  
> 
> s/after bind succeeds/after bind_iommu/noiommu() succeeds. or more
> explicit, "idev->igroup is set." :)
> 
> Other parts LGTM, feel free to add my r-b tag after solving the above
> nits.
> 
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
will do.
Thanks a lot!

> 
> >   	if (!iommufd_selftest_is_mock_dev(dev))
> >   		iommufd_ctx_get(ictx);
> > -	idev->dev = dev;
> > -	idev->enforce_cache_coherency =
> > -		device_iommu_capable(dev,
> > IOMMU_CAP_ENFORCE_CACHE_COHERENCY); /* The calling driver is a user
> > until iommufd_device_unbind() */ refcount_inc(&idev->obj.users);
> > -	/* igroup refcount moves into iommufd_device */
> > -	idev->igroup = igroup;
> >   
> >   	/*
> >   	 * If the caller fails after this success it must call
> > @@ -294,11 +352,14 @@ struct iommufd_device
> > *iommufd_device_bind(struct iommufd_ctx *ictx, *id = idev->obj.id;
> >   	return idev;
> >   
> > -out_release_owner:
> > -	iommu_device_release_dma_owner(dev);
> > -out_group_put:
> > -	iommufd_put_group(igroup);
> > +err_out:
> > +	/*
> > +	 * iommufd_device_destroy() handles partially initialized
> > idev,
> > +	 * so iommufd_object_abort_and_destroy() is safe to call
> > here.
> > +	 */
> > +	iommufd_object_abort_and_destroy(ictx, &idev->obj);
> >   	return ERR_PTR(rc);
> > +
> >   }
> >   EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
> >   
> > @@ -512,6 +573,9 @@ static int iommufd_hwpt_attach_device(struct
> > iommufd_hw_pagetable *hwpt, struct iommufd_attach_handle *handle;
> >   	int rc;
> >   
> > +	if (iommufd_device_is_noiommu(idev))
> > +		return 0;
> > +
> >   	if (!iommufd_hwpt_compatible_device(hwpt, idev))
> >   		return -EINVAL;
> >   
> > @@ -559,6 +623,9 @@ static void iommufd_hwpt_detach_device(struct
> > iommufd_hw_pagetable *hwpt, {
> >   	struct iommufd_attach_handle *handle;
> >   
> > +	if (iommufd_device_is_noiommu(idev))
> > +		return;
> > +
> >   	handle = iommufd_device_get_attach_handle(idev, pasid);
> >   	if (pasid == IOMMU_NO_PASID)
> >   		iommu_detach_group_handle(hwpt->domain,
> > idev->igroup->group); @@ -577,6 +644,9 @@ static int
> > iommufd_hwpt_replace_device(struct iommufd_device *idev, struct
> > iommufd_attach_handle *handle, *old_handle; int rc;
> >   
> > +	if (iommufd_device_is_noiommu(idev))
> > +		return 0;
> > +
> >   	if (!iommufd_hwpt_compatible_device(hwpt, idev))
> >   		return -EINVAL;
> >   
> > @@ -652,7 +722,7 @@ int iommufd_hw_pagetable_attach(struct
> > iommufd_hw_pagetable *hwpt, goto err_release_devid;
> >   	}
> >   
> > -	if (attach_resv) {
> > +	if (attach_resv && !iommufd_device_is_noiommu(idev)) {
> >   		rc = iommufd_device_attach_reserved_iova(idev,
> > hwpt_paging); if (rc)
> >   			goto err_release_devid;  


  reply	other threads:[~2026-05-20 15:55 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-11 18:41 ` [PATCH v5 1/9] vfio: Rename VFIO_NOIOMMU to VFIO_GROUP_NOIOMMU Jacob Pan
2026-05-19 23:34   ` Jason Gunthorpe
2026-05-11 18:41 ` [PATCH v5 2/9] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-05-13  6:58   ` Baolu Lu
2026-05-13 21:30     ` Jacob Pan
2026-05-13 19:18   ` Samiullah Khawaja
2026-05-20  7:19   ` Yi Liu
2026-05-20 16:15     ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 3/9] iommufd: Move igroup allocation to a function Jacob Pan
2026-05-13  7:18   ` Baolu Lu
2026-05-11 18:41 ` [PATCH v5 4/9] iommufd: Allow binding to a noiommu device Jacob Pan
2026-05-13  7:37   ` Baolu Lu
2026-05-13 22:08     ` Jacob Pan
2026-05-14  6:51       ` Baolu Lu
2026-05-19 21:25         ` Jacob Pan
2026-05-20  7:20   ` Yi Liu
2026-05-20 15:54     ` Jacob Pan [this message]
2026-05-21  3:27       ` Yi Liu
2026-05-11 18:41 ` [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
2026-05-11 18:58   ` Jacob Pan
2026-05-13  7:53   ` Baolu Lu
2026-05-13 12:22     ` Jason Gunthorpe
2026-05-13 22:20       ` Jacob Pan
2026-05-13 23:26         ` Jason Gunthorpe
2026-05-20  7:20   ` Yi Liu
2026-05-20  7:31     ` Yi Liu
2026-05-20 14:22     ` Jason Gunthorpe
2026-05-20 14:39       ` Yi Liu
2026-05-20 17:02     ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 6/9] vfio/group: Add VFIO_CDEV_NOIOMMU Kconfig and tolerate NULL group Jacob Pan
2026-05-20  3:45   ` Alex Williamson
2026-05-20 17:08     ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
2026-05-19 23:40   ` Jason Gunthorpe
2026-05-20  2:56     ` Jacob Pan
2026-05-20  3:46   ` Alex Williamson
2026-05-20  7:20     ` Yi Liu
2026-05-20 18:15       ` Jacob Pan
2026-05-21  3:25         ` Yi Liu
2026-05-21 16:49       ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 8/9] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
2026-05-11 18:41 ` [PATCH v5 9/9] Documentation: Update VFIO NOIOMMU mode Jacob Pan
2026-05-20  7:20   ` Yi Liu
2026-05-20 16:26     ` Jacob Pan
2026-05-21  3:24       ` Yi Liu
2026-05-19 18:01 ` [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jason Gunthorpe
2026-05-19 21:03   ` Jacob Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260520085459.000067ac@linux.microsoft.com \
    --to=jacob.pan@linux.microsoft.com \
    --cc=alex@shazbot.org \
    --cc=baolu.lu@linux.intel.com \
    --cc=dmatlack@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=robin.murphy@arm.com \
    --cc=skhawaja@google.com \
    --cc=smostafa@google.com \
    --cc=ssengar@linux.microsoft.com \
    --cc=will@kernel.org \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.