Linux IOMMU Development
From: Alex Williamson <alex@shazbot.org>
To: Jacob Pan <jacob.pan@linux.microsoft.com>
Cc: linux-kernel@vger.kernel.org,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Jason Gunthorpe <jgg@nvidia.com>, Joerg Roedel <joro@8bytes.org>,
	Mostafa Saleh <smostafa@google.com>,
	David Matlack <dmatlack@google.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	"Tian, Kevin" <kevin.tian@intel.com>, Yi Liu <yi.l.liu@intel.com>,
	Saurabh Sengar <ssengar@linux.microsoft.com>,
	skhawaja@google.com, pasha.tatashin@soleen.com,
	Will Deacon <will@kernel.org>,
	Baolu Lu <baolu.lu@linux.intel.com>,
	alex@shazbot.org
Subject: Re: [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd
Date: Tue, 19 May 2026 21:46:13 -0600
Message-ID: <20260519214613.167e8d5b@shazbot.org>
In-Reply-To: <20260511184116.3687392-8-jacob.pan@linux.microsoft.com>

On Mon, 11 May 2026 11:41:12 -0700
Jacob Pan <jacob.pan@linux.microsoft.com> wrote:

> Now that devices under noiommu mode can bind with IOMMUFD and perform
> IOAS operations, lift restrictions on cdev from VFIO side.
> 
> Remove the vfio_device_is_group_noiommu() early returns in
> vfio_df_iommufd_bind() and vfio_df_iommufd_unbind() so that both
> group and cdev noiommu devices go through the standard iommufd bind
> path. This is safe because iommufd_device_bind() now handles noiommu
> devices via its own iommufd_device_is_noiommu() check.
> 
> Add CAP_SYS_RAWIO checks for cdev open and bind under noiommu to
> maintain security parity with the group noiommu path.
> 
> No IOMMU cdevs are explicitly named with noiommu prefix. e.g.
> 
>     /dev/vfio/
>     |-- devices
>     |   `-- noiommu-vfio0
>     `-- vfio
> 
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> v5:
>    - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
>      and its dependencies
>    - Add comment to explain vfio_noiommu conditional definition (Alex)
>    - Removed early return for group noiommu in bind/unbind
>    - Use consistent wording referring to VFIO noiommu mode (Kevin)
>    - Update unsafe_noiommu Kconfig help text (Kevin)
>    - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
> v4:
>    - Remove early return in iommufd_bind for noiommu (Alex)
> v3:
>    - Consolidate into fewer patches
> v2:
>    - removed unnecessary device->noiommu set in
>      iommufd_vfio_compat_ioas_get_id()
> ---
>  drivers/vfio/Kconfig       |  3 +--
>  drivers/vfio/device_cdev.c | 10 ++++++++++
>  drivers/vfio/iommufd.c     |  7 -------
>  drivers/vfio/vfio.h        | 22 ++++++++++++++--------
>  drivers/vfio/vfio_main.c   | 25 ++++++++++++++++++++-----
>  include/linux/vfio.h       |  1 +
>  6 files changed, 46 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index b1b1633412a9..b1a260b6054c 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
>  	  The VFIO device cdev is another way for userspace to get device
>  	  access. Userspace gets device fd by opening device cdev under
>  	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
> -	  to set up secure DMA context for device access.  This interface does
> -	  not support noiommu.
> +	  to set up secure DMA context for device access.
>  
>  	  If you don't know what to do here, say N.
>  
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 54abf312cf04..46a808244398 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	struct vfio_device_file *df;
>  	int ret;
>  
> +	if (device->noiommu && !capable(CAP_SYS_RAWIO))
> +		return -EPERM;
> +
>  	/* Paired with the put in vfio_device_fops_release() */
>  	if (!vfio_device_try_get_registration(device))
>  		return -ENODEV;
> @@ -110,6 +113,13 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
>  	if (df->group)
>  		return -EINVAL;
>  
> +	/*
> +	 * CAP_SYS_RAWIO is already checked at cdev open, recheck here
> +	 * in case the fd was passed to a less privileged process.
> +	 */
> +	if (device->noiommu && !capable(CAP_SYS_RAWIO))
> +		return -EPERM;
> +
>  	ret = vfio_device_block_group(device);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 39079ab27f92..bc80056c74d3 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -25,10 +25,6 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
>  
>  	lockdep_assert_held(&vdev->dev_set->lock);
>  
> -	/* Returns 0 to permit device opening under noiommu mode */
> -	if (vfio_device_is_group_noiommu(vdev))
> -		return 0;
> -
>  	return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
>  }
>  
> @@ -58,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
>  
>  	lockdep_assert_held(&vdev->dev_set->lock);
>  
> -	if (vfio_device_is_group_noiommu(vdev))
> -		return;
> -
>  	if (vdev->ops->unbind_iommufd)
>  		vdev->ops->unbind_iommufd(vdev);
>  }
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 602623cacfc0..ac79b1a2fce9 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -36,7 +36,7 @@ vfio_allocate_device_file(struct vfio_device *device);
>  
>  extern const struct file_operations vfio_device_fops;
>  
> -#ifdef CONFIG_VFIO_GROUP_NOIOMMU
> +#if IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) || IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)

Have you considered what happens when these are y/n or n/y?

I think in the former case (GROUP_NOIOMMU=y, CDEV_NOIOMMU=n) we can
create cdev devices for group-noiommu devices that are not labeled
noiommu and that skip the CAP_SYS_RAWIO test, but will fail to bind.
In the latter case (n/y), I think we fail to set up an iommufd_device
and unbind will segfault.

We really don't need to support setting GROUP vs CDEV NOIOMMU
independently; the suggestion was to try to keep NOIOMMU from depending
on VFIO_GROUP.  We can do that other ways though, and I think we can do
it without the rename in patch 1, which will inevitably result in some
lost config options for NOIOMMU on upgrade.

The Kconfig may get messy, perhaps something like:

config VFIO_NOIOMMU
	bool "VFIO No-IOMMU support"
	depends on VFIO_GROUP || VFIO_DEVICE_CDEV
	depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
	depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
	select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV

Sorry if the previous suggestion sent us astray, but the subtleties of
independent support look tricky.  Thanks,

Alex

>  extern bool vfio_noiommu __read_mostly;
>  #else
>  enum { vfio_noiommu = false };
> @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
>  
>  static inline int vfio_device_add(struct vfio_device *device)
>  {
> -	/* cdev does not support noiommu device */
> -	if (vfio_device_is_group_noiommu(device))
> -		return device_add(&device->device);
>  	vfio_init_device_cdev(device);
>  	return cdev_device_add(&device->cdev, &device->device);
>  }
>  
>  static inline void vfio_device_del(struct vfio_device *device)
>  {
> -	if (vfio_device_is_group_noiommu(device))
> -		device_del(&device->device);
> -	else
> -		cdev_device_del(&device->cdev, &device->device);
> +	cdev_device_del(&device->cdev, &device->device);
>  }
>  
>  int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> @@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void)
>  }
>  #endif /* CONFIG_VFIO_DEVICE_CDEV */
>  
> +#if IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)
> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
> +{
> +	return vdev->noiommu;
> +}
> +#else
> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
> +{
> +	return false;
> +}
> +#endif
> +
>  #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
>  int __init vfio_virqfd_init(void);
>  void vfio_virqfd_exit(void);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 4d940ce6f114..1ba0f282d746 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -54,7 +54,7 @@ static struct vfio {
>  	int				fs_count;
>  } vfio;
>  
> -#ifdef CONFIG_VFIO_GROUP_NOIOMMU
> +#if IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) || IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)
>  bool vfio_noiommu __read_mostly;
>  module_param_named(enable_unsafe_noiommu_mode,
>  		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
> @@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
>  	return ret;
>  }
>  
> +static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
> +{
> +	if (IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU) && vfio_noiommu && !device->dev->iommu) {
> +		device->noiommu = true;
> +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +		dev_warn(device->dev,
> +			 "Adding kernel taint for vfio-noiommu cdev on device\n");
> +	}
> +
> +	/* Just to be safe, expose to user explicitly noiommu cdev node */
> +	return dev_set_name(&device->device, "%svfio%d",
> +		     device->noiommu ? "noiommu-" : "", device->index);
> +}
> +
>  static int __vfio_register_dev(struct vfio_device *device,
>  			       enum vfio_group_type type)
>  {
> @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *device,
>  	if (!device->dev_set)
>  		vfio_assign_device_set(device, device);
>  
> -	ret = dev_set_name(&device->device, "vfio%d", device->index);
> +	ret = vfio_device_set_group(device, type);
>  	if (ret)
>  		return ret;
>  
> -	ret = vfio_device_set_group(device, type);
> +	ret = vfio_device_set_noiommu_and_name(device);
>  	if (ret)
> -		return ret;
> +		goto err_out;
>  
>  	/*
>  	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
>  	 * restore cache coherency. It has to be checked here because it is only
>  	 * valid for cases where we are using iommu groups.
>  	 */
> -	if (type == VFIO_IOMMU && !vfio_device_is_group_noiommu(device) &&
> +	if (type == VFIO_IOMMU && !(vfio_device_is_group_noiommu(device) ||
> +				    vfio_device_is_cdev_noiommu(device)) &&
>  	    !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
>  		ret = -EINVAL;
>  		goto err_out;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 31b826efba00..45f08986359e 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -74,6 +74,7 @@ struct vfio_device {
>  	u8 iommufd_attached:1;
>  #endif
>  	u8 cdev_opened:1;
> +	u8 noiommu:1;
>  	/*
>  	 * debug_root is a static property of the vfio_device
>  	 * which must be set prior to registering the vfio_device.

