From: Jacob Pan <jacob.pan@linux.microsoft.com>
To: Yi Liu <yi.l.liu@intel.com>
Cc: <linux-kernel@vger.kernel.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Jason Gunthorpe <jgg@nvidia.com>,
Alex Williamson <alex@shazbot.org>,
Joerg Roedel <joro@8bytes.org>,
Mostafa Saleh <smostafa@google.com>,
David Matlack <dmatlack@google.com>,
Robin Murphy <robin.murphy@arm.com>,
Nicolin Chen <nicolinc@nvidia.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
Baolu Lu <baolu.lu@linux.intel.com>,
Saurabh Sengar <ssengar@linux.microsoft.com>,
<skhawaja@google.com>, <pasha.tatashin@soleen.com>,
Will Deacon <will@kernel.org>,
jacob.pan@linux.microsoft.com
Subject: Re: [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
Date: Sat, 23 May 2026 15:01:47 -0700 [thread overview]
Message-ID: <20260523150147.00001c38@linux.microsoft.com> (raw)
In-Reply-To: <e431af3d-b19a-45d2-a58d-e400d43141f0@intel.com>
Hi Yi,
On Fri, 22 May 2026 17:19:41 +0800
Yi Liu <yi.l.liu@intel.com> wrote:
> On 5/22/26 06:11, Jacob Pan wrote:
> > Now that devices under noiommu mode can bind with IOMMUFD and
> > perform IOAS operations, lift restrictions on cdev from VFIO side.
> > Use cases are documented in Documentation/driver-api/vfio.rst
> >
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > v6:
> > - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and
> > group. Use Kconfig dependency to restrict usages and avoid null
> > group checks. (Alex & Yi)
> > - Add CAP_SYS_RAWIO checks for cdev open to maintain security
> > parity with the group noiommu path. (Alex)
> > v5:
> > - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> > and its dependencies
> > - Add comment to explain vfio_noiommu conditional definition
> > (Alex)
> > - Removed early return for group noiommu in bind/unbind
> > - Use consistent wording referring to VFIO noiommu mode (Kevin)
> > - Update unsafe_noiommu Kconfig help text (Kevin)
> > - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
> > v4:
> > - Remove early return in iommufd_bind for noiommu (Alex)
> > v3:
> > - Consolidate into fewer patches
> > v2:
> > - removed unnecessary device->noiommu set in
> > iommufd_vfio_compat_ioas_get_id()
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > drivers/vfio/Kconfig | 8 +++++---
> > drivers/vfio/device_cdev.c | 3 +++
> > drivers/vfio/iommufd.c | 6 +++---
> > drivers/vfio/vfio.h | 20 +++++++++++++-------
> > drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
> > include/linux/vfio.h | 1 +
> > 6 files changed, 44 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index ceae52fd7586..d3d8fef2855c 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> > The VFIO device cdev is another way for userspace to
> > get device access. Userspace gets device fd by opening device cdev
> > under /dev/vfio/devices/vfioX, and then bind the device fd with an
> > iommufd
> > - to set up secure DMA context for device access. This
> > interface does
> > - not support noiommu.
> > + to set up secure DMA context for device access.
>
> if noiommu, it's unsafe DMA. :)
yes, here I just want to remove "This interface does not support
noiommu.".
>
> > If you don't know what to do here, say N.
> >
> > @@ -62,7 +61,10 @@ endif
> >
> > config VFIO_NOIOMMU
> > bool "VFIO No-IOMMU support"
> > - depends on VFIO_GROUP
> > + depends on VFIO_GROUP || VFIO_DEVICE_CDEV
> > + depends on !VFIO_GROUP || VFIO_CONTAINER ||
> > IOMMUFD_VFIO_CONTAINER
> > + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> > + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
> > help
> > VFIO is built on the ability to isolate devices using
> > the IOMMU. Only with an IOMMU can userspace access to DMA capable
> > devices be diff --git a/drivers/vfio/device_cdev.c
> > b/drivers/vfio/device_cdev.c index 54abf312cf04..4e2c1e4fc1f8 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode
> > *inode, struct file *filep) struct vfio_device_file *df;
> > int ret;
> >
> > + if (device->noiommu && !capable(CAP_SYS_RAWIO))
> > + return -EPERM;
> > +
> > /* Paired with the put in vfio_device_fops_release() */
> > if (!vfio_device_try_get_registration(device))
> > return -ENODEV;
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index a38d262c6028..d4f2e2a0f2f3 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file
> > *df)
> > lockdep_assert_held(&vdev->dev_set->lock);
> >
> > - /* Returns 0 to permit device opening under noiommu mode */
> > - if (vfio_device_is_noiommu(vdev))
> > + /* Group noiommu via iommufd compat needs no device
> > binding */
> > + if (df->group && vfio_device_is_noiommu(vdev))
>
> seems like vfio_device_is_noiommu() implies group path, then no need
> to use df->group.
>
df->group is needed because only the legacy VFIO group/iommufd-compat
noiommu path should skip real iommufd device binding.
For df->group == NULL, the fd is a VFIO cdev fd. That path uses
VFIO_DEVICE_BIND_IOMMUFD and later VFIO_DEVICE_ATTACH_IOMMUFD_PT. Even
in noiommu cdev mode, bind must still call:
vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
so vdev->iommufd_device can get initialized. If the check were only:
if (vfio_device_is_noiommu(vdev))
return 0;
then cdev noiommu bind would falsely “succeed” without setting
vdev->iommufd_device. Later VFIO_DEVICE_ATTACH_IOMMUFD_PT calls
vfio_iommufd_physical_attach_ioas(), hits:
if (WARN_ON(!vdev->iommufd_device))
return -EINVAL;
In the noiommu test, you will get:
185.870670] ------------[ cut here ]------------
[ 185.871952] WARNING: drivers/vfio/iommufd.c:157 at
vfio_iommufd_physical_attach_ioas+0x3f/0x50, CPU#0:
vfio-noiommu-pc/157[ 185.875010] Modules linked in:[ 185.875882] CPU:
0 UID: 0 PID: 157 Comm: vfio-noiommu-pc Tainted: G U W
7.1.0-rc1+ #20 PREEMPT[ 185.878637] Tainted: [U]=USER, [W]=WARN[
185.879711] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014[ 185.882913]
RIP: 0010:vfio_iommufd_physical_attach_ioas+0x3f/0x50[ 185.884624]
Code: 89 f2 31 f6 f6 83 50 04 00 00 01 75 16 e8 d9 aa c6 ff 85 c0 75 07
80 8b 50 04 00 00 01 5b c3 cc cc cc cc e8 43 ab c6 ff eb e8 <0f> 0b
b80[ 185.889701] RSP: 0018:ffa000000062fd88 EFLAGS: 00010246[
185.891161] RAX: ffffffff81f59ee0 RBX: ff1100010c43b800 RCX:
0000000000000000[ 185.893141] RDX: ff1100010c708040 RSI:
ffa000000062fda0 RDI: 0000000000000000[ 185.895127] RBP:
ff1100010c43b800 R08: ff1100010c7c12b0 R09: 0000000000000000[
185.897119] R10: 0000000000000000 R11: 0000000000000000 R12:
00007ffec4c2f720[ 185.899102] R13: ffa000000062fda0 R14:
ff11000103bd40d0 R15: ff1100010c43b800[ 185.901075] FS:
0000000028d69380(0000) GS:ff110004e4a8d000(0000)
knlGS:0000000000000000[ 185.903284] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033[ 185.904888] CR2: 0000000028d73988 CR3:
0000000103507002 CR4: 0000000000f73ef0[ 185.906853] PKRU: 55555554[
185.907636] Call Trace:[ 185.908373] <TASK>[ 185.908932]
vfio_df_ioctl_attach_pt+0xc7/0x170[ 185.910085]
vfio_device_fops_unl_ioctl+0x49b/0xa50[ 185.911322] ?
file_tty_write.isra.0+0x202/0x320[ 185.912507]
__x64_sys_ioctl+0x425/0xa30[ 185.913502] do_syscall_64+0x5e/0xf80[
185.914444] ? irqentry_exit+0x3b/0x5e0[ 185.915414]
entry_SYSCALL_64_after_hwframe+0x76/0x7e[ 185.916701] RIP:
0033:0x434a4d[ 185.917498] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0
48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8
10 00 00 00 0f 05 <89> c2 3d0[ 185.922052] RSP: 002b:00007ffec4c2f6b0
EFLAGS: 00000246 ORIG_RAX: 0000000000000010[ 185.923785] RAX:
ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000434a4d[
185.925398] RDX: 00007ffec4c2f720 RSI: 0000000000003b77 RDI:
0000000000000004[ 185.927007] RBP: 00007ffec4c2f700 R08:
0000000000000064 R09: 0000000000000000[ 185.928611] R10:
0000000000000000 R11: 0000000000000246 R12: 00007ffec4c30918[
185.930211] R13: 00007ffec4c30940 R14: 00000000004cf868 R15:
0000000000000001[ 185.931758] </TASK>[ 185.932258] ---[ end trace
0000000000000000 ]---Failed to attach pt to device
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> vdev->group->type == VFIO_NO_IOMMU;
> }
>
> > return 0;
> >
> > return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
> > @@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct
> > vfio_device_file *df)
> > lockdep_assert_held(&vdev->dev_set->lock);
> >
> > - if (vfio_device_is_noiommu(vdev))
> > + if (df->group && vfio_device_is_noiommu(vdev))
> > return;
> >
> > if (vdev->ops->unbind_iommufd)
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index e4b72e79b7e3..6f0a2dfc8a00 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device
> > *device);
> > static inline int vfio_device_add(struct vfio_device *device)
> > {
> > - /* cdev does not support noiommu device */
> > - if (vfio_device_is_noiommu(device))
> > - return device_add(&device->device);
> > vfio_init_device_cdev(device);
> > return cdev_device_add(&device->cdev, &device->device);
> > }
> >
> > static inline void vfio_device_del(struct vfio_device *device)
> > {
> > - if (vfio_device_is_noiommu(device))
> > - device_del(&device->device);
> > - else
> > - cdev_device_del(&device->cdev, &device->device);
> > + cdev_device_del(&device->cdev, &device->device);
> > }
> >
> > int vfio_device_fops_cdev_open(struct inode *inode, struct file
> > *filep); @@ -420,6 +414,18 @@ static inline void
> > vfio_cdev_cleanup(void) }
> > #endif /* CONFIG_VFIO_DEVICE_CDEV */
> >
> > +#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
> > +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device
> > *vdev) +{
> > + return vdev->noiommu;
> > +}
> > +#else
> > +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device
> > *vdev) +{
> > + return false;
> > +}
> > +#endif
> > +
> > #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
> > int __init vfio_virqfd_init(void);
> > void vfio_virqfd_exit(void);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 6222376ab6ab..84381c500623 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device
> > *device, struct device *dev, return ret;
> > }
> >
> > +static int vfio_device_set_noiommu_and_name(struct vfio_device
> > *device) +{
> > + if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu &&
> > !device->dev->iommu) {
> > + device->noiommu = true;
> > + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > + dev_warn(device->dev,
> > + "Adding kernel taint for vfio-noiommu
> > cdev on device\n");
> > + }
> > +
> > + /* Just to be safe, expose to user explicitly noiommu cdev
> > node */
> > + return dev_set_name(&device->device, "%svfio%d",
> > + device->noiommu ? "noiommu-" : "",
> > device->index); +}
> > +
> > static int __vfio_register_dev(struct vfio_device *device,
> > enum vfio_group_type type)
> > {
> > @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct
> > vfio_device *device, if (!device->dev_set)
> > vfio_assign_device_set(device, device);
> >
> > - ret = dev_set_name(&device->device, "vfio%d",
> > device->index);
> > + ret = vfio_device_set_group(device, type);
> > if (ret)
> > return ret;
> >
> > - ret = vfio_device_set_group(device, type);
> > + ret = vfio_device_set_noiommu_and_name(device);
>
> the order of dev_set_name and vfio_device_set_group() are swapped, any
> special reason?
The ordering was intentional in an earlier version where the cdev
noiommu check depended on device->group. With the current check using
!device->dev->iommu, the ordering is no longer strictly required for
that test.
I kept vfio_device_set_group() first because the rest of registration
already treats group setup as the first VFIO state to unwind, and this
lets the existing err_out path handle failures after group assignment,
including dev_set_name(). I can restore the old order if you prefer,
since it is not functionally required anymore.
> > if (ret)
> > - return ret;
> > + goto err_out;
> >
> > /*
> > * VFIO always sets IOMMU_CACHE because we offer no way
> > for userspace to
> > * restore cache coherency. It has to be checked here
> > because it is only
> > * valid for cases where we are using iommu groups.
> > */
> > - if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device)
> > &&
> > + if (type == VFIO_IOMMU && !(vfio_device_is_noiommu(device)
> > ||
> > +
> > vfio_device_is_cdev_noiommu(device)) &&
>
> now, the group path and cdev path have their own is_noiommu helper,
> can the two helpers be consolidated?
>
They could be consolidated mechanically, but I feel they are checking
different things it is more clear to keep them separate?
> > !device_iommu_capable(device->dev,
> > IOMMU_CAP_CACHE_COHERENCY)) { ret = -EINVAL;
> > goto err_out;
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 31b826efba00..45f08986359e 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -74,6 +74,7 @@ struct vfio_device {
> > u8 iommufd_attached:1;
> > #endif
> > u8 cdev_opened:1;
> > + u8 noiommu:1;
> > /*
> > * debug_root is a static property of the vfio_device
> > * which must be set prior to registering the
> > vfio_device.
next prev parent reply other threads:[~2026-05-23 22:01 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-21 22:11 ` [PATCH v6 1/7] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-05-21 22:11 ` [PATCH v6 2/7] iommufd: Move igroup allocation to a function Jacob Pan
2026-05-22 6:00 ` Baolu Lu
2026-05-21 22:11 ` [PATCH v6 3/7] iommufd: Allow binding to a noiommu device Jacob Pan
2026-05-22 6:01 ` Baolu Lu
2026-05-21 22:11 ` [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
2026-05-22 9:22 ` Yi Liu
2026-05-21 22:11 ` [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
2026-05-22 9:19 ` Yi Liu
2026-05-23 22:01 ` Jacob Pan [this message]
2026-05-25 6:29 ` Yi Liu
2026-05-28 18:52 ` Jacob Pan
2026-05-29 7:27 ` Yi Liu
2026-05-21 22:11 ` [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
2026-05-21 22:39 ` David Matlack
2026-06-03 0:13 ` Jacob Pan
2026-05-21 22:11 ` [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode Jacob Pan
2026-05-22 9:42 ` Yi Liu
2026-05-23 3:42 ` Jacob Pan
2026-05-25 6:29 ` Yi Liu
2026-05-25 8:30 ` [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Tian, Kevin
2026-05-26 15:32 ` Jacob Pan
2026-05-26 17:57 ` Alex Williamson
2026-05-27 22:34 ` Jacob Pan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260523150147.00001c38@linux.microsoft.com \
--to=jacob.pan@linux.microsoft.com \
--cc=alex@shazbot.org \
--cc=baolu.lu@linux.intel.com \
--cc=dmatlack@google.com \
--cc=iommu@lists.linux.dev \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nicolinc@nvidia.com \
--cc=pasha.tatashin@soleen.com \
--cc=robin.murphy@arm.com \
--cc=skhawaja@google.com \
--cc=smostafa@google.com \
--cc=ssengar@linux.microsoft.com \
--cc=will@kernel.org \
--cc=yi.l.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox