* [PATCH v8 0/6] iommufd: Enable noiommu mode for cdev
@ 2026-06-03 22:02 Jacob Pan
2026-06-03 22:02 ` [PATCH v8 1/6] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Jacob Pan @ 2026-06-03 22:02 UTC
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
VFIO's unsafe_noiommu_mode has long provided a way for userspace drivers
to operate on platforms lacking a hardware IOMMU. Today, IOMMUFD also
supports No-IOMMU mode for group-based devices under vfio_compat mode.
However, IOMMUFD's native character device (cdev) does not yet support
No-IOMMU mode, which is the purpose of this patch.
In summary, we have:
|-------------------------+------+---------------|
| Device access mode      | VFIO | IOMMUFD       |
|-------------------------+------+---------------|
| group /dev/vfio/$GROUP  | Yes  | Yes           |
|-------------------------+------+---------------|
| cdev /dev/vfio/devices/ | No   | This patch    |
|-------------------------+------+---------------|
Beyond enabling cdev for IOMMUFD, this patch also addresses the following
deficiencies in the current No-IOMMU mode, as pointed out by Jason [1]:
- Devices operating under No-IOMMU mode are limited to device-level UAPI
access, without container or IOAS-level capabilities. Consequently,
user-space drivers lack structured mechanisms for page pinning and often
resort to mlock(), which is less robust than pin_user_pages() used for
devices backed by a physical IOMMU. For example, mlock() does not prevent
page migration.
- There is no architectural mechanism for obtaining physical addresses for
DMA. As a workaround, user-space drivers frequently rely on /proc/pagemap
tricks or hardcoded values.
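For readers unfamiliar with the /proc/pagemap workaround mentioned above,
here is a minimal userspace sketch of what such drivers typically do. The
helper names (pagemap_entry_to_pa, virt_to_phys_pagemap) are made up for
illustration; the bit layout (PFN in bits 0-54, "present" in bit 63) follows
the kernel's pagemap documentation. Note that PFNs read back as zero without
CAP_SYS_ADMIN, and nothing here pins the page, so the returned address can go
stale at any time, which is exactly the fragility this series tries to remove.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PRESENT  (1ULL << 63)		/* page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)	/* bits 0-54 hold the PFN */

/* Decode one 64-bit pagemap entry; returns 0 on success, -1 if unusable. */
int pagemap_entry_to_pa(uint64_t entry, uint64_t vaddr, uint64_t page_size,
			uint64_t *pa)
{
	uint64_t pfn = entry & PAGEMAP_PFN_MASK;

	/* A zero PFN usually means the caller lacks CAP_SYS_ADMIN. */
	if (!(entry & PAGEMAP_PRESENT) || !pfn)
		return -1;
	*pa = pfn * page_size + (vaddr % page_size);
	return 0;
}

/* Look up the pagemap entry that covers vaddr in the calling process. */
int virt_to_phys_pagemap(const void *vaddr, uint64_t *pa)
{
	long psz = sysconf(_SC_PAGESIZE);
	uint64_t va = (uint64_t)(uintptr_t)vaddr;
	uint64_t page_size, entry;
	ssize_t n;
	int fd;

	if (psz <= 0)
		return -1;
	page_size = (uint64_t)psz;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;
	/* One 8-byte entry per virtual page, indexed by virtual page number. */
	if (lseek(fd, (off_t)(va / page_size * sizeof(entry)), SEEK_SET) < 0) {
		close(fd);
		return -1;
	}
	n = read(fd, &entry, sizeof(entry));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;
	return pagemap_entry_to_pa(entry, va, page_size, pa);
}
```

A driver would call virt_to_phys_pagemap() on an mlock()ed buffer and program
the result into the device, with no guarantee that the page stays put.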
By allowing noiommu device access to IOMMUFD IOAS and HWPT objects, this
patch brings No-IOMMU mode closer to full citizenship within the IOMMU
subsystem. In addition to addressing the two deficiencies mentioned above,
the expectation is that it will also enable No-IOMMU devices to seamlessly
participate in live update sessions via KHO [2].
Furthermore, these devices will use the IOMMUFD-based ownership checking model for
VFIO_DEVICE_PCI_HOT_RESET, eliminating the need for an iommufd_access object
as required in a previous attempt [3].
ChangeLog (details in each patch):
v8:
- Guard noiommu for vdevice viommu alloc (Kevin)
v7:
- Handle Sashiko reviews.
- Dropped selftest for now, will submit separately for v7.2 to use
new lib helpers
v6: Undo CDEV-GROUP NOIOMMU split, use Kconfig to restrict unwanted
combo.
V5:
- Split CONFIG_VFIO_NOIOMMU into CONFIG_VFIO_GROUP_NOIOMMU and
CONFIG_VFIO_CDEV_NOIOMMU so cdev noiommu is independent of
VFIO_GROUP (Alex)
- Add CAP_SYS_RAWIO check for cdev open and bind under noiommu,
security parity with group noiommu (Alex)
- Add IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) guard in
iommufd_device_is_noiommu() to prevent noiommu bind when feature
is disabled
- Add prep patch to tolerate NULL group for cdev noiommu devices
when CONFIG_VFIO_GROUP_NOIOMMU is not set [7/9]
- Rename IOCTL to IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA to be more
specific (Kevin)
- Simplify iommufd_device_is_noiommu, use iommufd_bind_noiommu
helper (Kevin, Yi)
- Move IOMMU cap check under iommufd_bind_iommu() (Yi)
- Fix next_iova exceeding iopt_area_last_iova in GET_PA (Alex)
- Fix const hwpt, copyright date, typo in moved comment (Kevin)
- Add Reviewed-by tags
- Squash noiommu cdev selftest fix into selftest patch
- Drop DSA selftest patch
- Details in each patch changelog.
V4:
- Fix various corner cases pointed out by Sashiko.
Details in each patch changelog.
V3:
- Improve error handling [3/10] (Mostafa)
- Simplify vfio_device_is_noiommu logic and merged in [6/10] (Mostafa)
- Add comment to explain the design difference over the legacy noiommu
VFIO code.[1/10]
V2:
- Fix build dependency by adding IOMMU_SUPPORT in [8/11]
- Add an optimization to scan beyond the first page for a contiguous
physical address range and return its length instead of a single
page.[4/11]
Since RFC[4]:
- Abandoned dummy iommu driver approach as patch 1-3 absorbed the
changes into iommufd.
[1] https://lore.kernel.org/linux-iommu/20250603175403.GA407344@nvidia.com/
[2] https://lore.kernel.org/linux-pci/20251027134430.00007e46@linux.microsoft.com/
[3] https://lore.kernel.org/kvm/20230522115751.326947-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/linux-iommu/20251201173012.18371-1-jacob.pan@linux.microsoft.com/
Jacob Pan (3):
iommufd: Add an ioctl to query PA from IOVA for noiommu mode
vfio: Enable cdev noiommu mode under iommufd
Documentation: Update VFIO NOIOMMU mode
Jason Gunthorpe (3):
iommufd: Support a HWPT without an iommu driver for noiommu
iommufd: Move igroup allocation to a function
iommufd: Allow binding to a noiommu device
Documentation/driver-api/vfio.rst | 81 +++++++++-
drivers/iommu/iommufd/Kconfig | 12 ++
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 197 +++++++++++++++++-------
drivers/iommu/iommufd/hw_pagetable.c | 19 ++-
drivers/iommu/iommufd/hwpt_noiommu.c | 105 +++++++++++++
drivers/iommu/iommufd/io_pagetable.c | 80 ++++++++++
drivers/iommu/iommufd/ioas.c | 33 ++++
drivers/iommu/iommufd/iommufd_private.h | 30 ++++
drivers/iommu/iommufd/main.c | 4 +
drivers/iommu/iommufd/viommu.c | 14 +-
drivers/vfio/Kconfig | 7 +-
drivers/vfio/device_cdev.c | 3 +
drivers/vfio/iommufd.c | 12 +-
drivers/vfio/vfio.h | 23 ++-
drivers/vfio/vfio_main.c | 26 +++-
include/linux/vfio.h | 1 +
include/uapi/linux/iommufd.h | 27 ++++
18 files changed, 590 insertions(+), 85 deletions(-)
create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
--
2.43.0
* [PATCH v8 1/6] iommufd: Support a HWPT without an iommu driver for noiommu
From: Jacob Pan @ 2026-06-03 22:02 UTC
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
From: Jason Gunthorpe <jgg@nvidia.com>
Create just a little part of a real iommu driver, enough to
slot in under the dev_iommu_ops() and allow iommufd to call
domain_alloc_paging_flags() and fail everything else.
This allows explicitly creating a HWPT under an IOAS.
A new Kconfig option IOMMUFD_NOIOMMU is introduced to differentiate
from the VFIO group/container based noiommu mode.
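The "implement one callback, fail everything else" idea can be sketched in
plain userspace C as below. This is only an illustration under simplified
assumptions: struct ops, struct device, get_ops() and the other names are
hypothetical stand-ins, and the selection test here keys off a NULL ops
pointer, whereas the patch itself checks for a NULL igroup->group.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct domain {
	int sw_only;
};

struct ops {
	int (*domain_alloc_paging)(struct domain *dom);
	int (*hw_info)(void);		/* left NULL by the noiommu ops */
};

struct device {
	const struct ops *iommu_ops;	/* NULL when no IOMMU driver is bound */
};

static int noiommu_alloc_paging(struct domain *dom)
{
	dom->sw_only = 1;
	return 0;
}

/* Only the allocation callback is populated; everything else is absent. */
const struct ops noiommu_ops = {
	.domain_alloc_paging = noiommu_alloc_paging,
};

/* Pick the stub ops when the device has no real IOMMU driver. */
const struct ops *get_ops(const struct device *dev)
{
	return dev->iommu_ops ? dev->iommu_ops : &noiommu_ops;
}

/* Callers treat a missing callback as "not supported". */
int query_hw_info(const struct device *dev)
{
	const struct ops *ops = get_ops(dev);

	return ops->hw_info ? ops->hw_info() : -EOPNOTSUPP;
}

int alloc_domain(const struct device *dev, struct domain *dom)
{
	const struct ops *ops = get_ops(dev);

	if (!ops->domain_alloc_paging)
		return -EOPNOTSUPP;
	return ops->domain_alloc_paging(dom);
}
```

The point of the pattern is that existing callers keep a single code path:
they look up an ops table as usual and already handle missing callbacks.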
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
V8:
- Guard vIOMMU and vDevice allocation paths for noiommu (Sashiko)
v7:
- Drain no-IOMMU generic-PT freelist (Sashiko)
- Import generic-PT IOMMU namespace (Sashiko)
v6: (Yi)
- Sort includes alphabetically (iommu.h after generic_pt/iommu.h)
- Fix comment: s/mock page table/SW-only page table/ to avoid confusion
with selftest mock
- Rewrite noiommu_amdv1_ops comment: explain why AMDV1 format is chosen
(multi-page size options), remove references to group-container mode distinction
v5:
- Use the new IOMMUFD_NOIOMMU Kconfig instead of VFIO_NOIOMMU
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Copyright date fix (Kevin)
v4:
- Make iommufd_noiommu_ops const
v3:
- Add comment to explain the design difference over the
legacy noiommu VFIO code.
---
drivers/iommu/iommufd/Kconfig | 12 +++
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/hw_pagetable.c | 19 ++++-
drivers/iommu/iommufd/hwpt_noiommu.c | 105 ++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 12 +++
drivers/iommu/iommufd/main.c | 1 +
drivers/iommu/iommufd/viommu.c | 14 +++-
7 files changed, 158 insertions(+), 6 deletions(-)
create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 455bac0351f2..6c3bea83631b 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -16,6 +16,18 @@ config IOMMUFD
If you don't know what to do here, say N.
if IOMMUFD
+config IOMMUFD_NOIOMMU
+ bool
+ depends on !GENERIC_ATOMIC64 # IOMMU_PT_AMDV1 requires cmpxchg64
+ select GENERIC_PT
+ select IOMMU_PT
+ select IOMMU_PT_AMDV1
+ help
+ Provides a SW-only IO page table for devices without hardware
+ IOMMU backing. This uses the AMDV1 page table format for
+ IOVA-to-PA lookups only, not for hardware DMA translation.
+ To be selected by VFIO_NOIOMMU when VFIO_DEVICE_CDEV is enabled.
+
config IOMMUFD_VFIO_CONTAINER
bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
depends on VFIO_GROUP && !VFIO_CONTAINER
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..67207914bb6e 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -10,6 +10,7 @@ iommufd-y := \
vfio_compat.o \
viommu.o
+iommufd-$(CONFIG_IOMMUFD_NOIOMMU) += hwpt_noiommu.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c9..8f95c75d47f3 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"
+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
+{
+ if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+ return &iommufd_noiommu_ops;
+ if (WARN_ON_ONCE(!idev->dev->iommu))
+ return NULL;
+ return dev_iommu_ops(idev->dev);
+}
+
static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->domain)
@@ -114,11 +123,13 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
IOMMU_HWPT_FAULT_ID_VALID |
IOMMU_HWPT_ALLOC_PASID;
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_paging *hwpt_paging;
struct iommufd_hw_pagetable *hwpt;
int rc;
+ if (!ops)
+ return ERR_PTR(-ENODEV);
lockdep_assert_held(&ioas->mutex);
if ((flags || user_data) && !ops->domain_alloc_paging_flags)
@@ -229,7 +240,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
struct iommufd_device *idev, u32 flags,
const struct iommu_user_data *user_data)
{
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
@@ -389,10 +400,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
hwpt = &hwpt_nested->common;
} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
struct iommufd_hwpt_nested *hwpt_nested;
+ struct iommu_device *iommu_dev;
struct iommufd_viommu *viommu;
viommu = container_of(pt_obj, struct iommufd_viommu, obj);
- if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+ iommu_dev = iommufd_device_get_iommu_dev(idev);
+ if (!iommu_dev || viommu->iommu_dev != iommu_dev) {
rc = -EINVAL;
goto out_unlock;
}
diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
new file mode 100644
index 000000000000..9b8b5eb71491
--- /dev/null
+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/generic_pt/iommu.h>
+#include <linux/iommu.h>
+#include "../iommu-pages.h"
+#include "iommufd_private.h"
+
+static const struct iommu_domain_ops noiommu_amdv1_ops;
+
+struct noiommu_domain {
+ union {
+ struct iommu_domain domain;
+ struct pt_iommu_amdv1 amdv1;
+ };
+ spinlock_t lock;
+};
+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
+
+static void noiommu_change_top(struct pt_iommu *iommu_table,
+ phys_addr_t top_paddr, unsigned int top_level)
+{
+}
+
+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
+{
+ struct noiommu_domain *domain =
+ container_of(iommupt, struct noiommu_domain, amdv1.iommu);
+
+ return &domain->lock;
+}
+
+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
+ .get_top_lock = noiommu_get_top_lock,
+ .change_top = noiommu_change_top,
+};
+
+static struct iommu_domain *
+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ struct pt_iommu_amdv1_cfg cfg = {};
+ struct noiommu_domain *dom;
+ int rc;
+
+ if (flags || user_data)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ cfg.common.hw_max_vasz_lg2 = 64;
+ cfg.common.hw_max_oasz_lg2 = 52;
+ cfg.starting_level = 2;
+ cfg.common.features =
+ (BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
+ BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
+
+ dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+ if (!dom)
+ return ERR_PTR(-ENOMEM);
+
+ spin_lock_init(&dom->lock);
+ dom->amdv1.iommu.nid = NUMA_NO_NODE;
+ dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
+ dom->domain.ops = &noiommu_amdv1_ops;
+
+ /* Use SW-only page table which is based on AMDV1 */
+ rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
+ if (rc) {
+ kfree(dom);
+ return ERR_PTR(rc);
+ }
+
+ return &dom->domain;
+}
+
+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
+{
+ struct noiommu_domain *domain =
+ container_of(iommu_domain, struct noiommu_domain, domain);
+
+ pt_iommu_deinit(&domain->amdv1.iommu);
+ kfree(domain);
+}
+
+static void noiommu_iotlb_sync(struct iommu_domain *domain,
+ struct iommu_iotlb_gather *gather)
+{
+ iommu_put_pages_list(&gather->freelist);
+}
+
+/*
+ * Domain ops for iommufd no-IOMMU mode. Uses AMDV1 format as a
+ * SW-only IOPT because it has the best multi-page size options
+ * of all the formats. IOVAs serve only for IOVA-to-PA lookups,
+ * not for hardware DMA translation.
+ */
+static const struct iommu_domain_ops noiommu_amdv1_ops = {
+ IOMMU_PT_DOMAIN_OPS(amdv1),
+ .iotlb_sync = noiommu_iotlb_sync,
+ .free = noiommu_domain_free,
+};
+
+const struct iommu_ops iommufd_noiommu_ops = {
+ .domain_alloc_paging_flags = noiommu_alloc_paging_flags,
+};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9..c8ed612e896a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
refcount_dec(&hwpt->obj.users);
}
+extern const struct iommu_ops iommufd_noiommu_ops;
+
struct iommufd_attach;
struct iommufd_group {
@@ -501,6 +503,16 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
struct iommufd_device, obj);
}
+static inline struct iommu_device *
+iommufd_device_get_iommu_dev(struct iommufd_device *idev)
+{
+ if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+ return NULL;
+ if (WARN_ON_ONCE(!idev->dev->iommu))
+ return NULL;
+ return __iommu_get_iommu_dev(idev->dev);
+}
+
void iommufd_device_pre_destroy(struct iommufd_object *obj);
void iommufd_device_destroy(struct iommufd_object *obj);
int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..f6ae60bd3f70 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -804,5 +804,6 @@ MODULE_ALIAS("devname:vfio/vfio");
MODULE_IMPORT_NS("IOMMUFD_INTERNAL");
MODULE_IMPORT_NS("IOMMUFD");
MODULE_IMPORT_NS("DMA_BUF");
+MODULE_IMPORT_NS("GENERIC_PT_IOMMU");
MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
MODULE_LICENSE("GPL");
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 4081deda9b33..b51f67fdf4e3 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -25,6 +25,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
struct iommufd_hwpt_paging *hwpt_paging;
struct iommufd_viommu *viommu;
struct iommufd_device *idev;
+ struct iommu_device *iommu_dev;
const struct iommu_ops *ops;
size_t viommu_size;
int rc;
@@ -36,7 +37,12 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
if (IS_ERR(idev))
return PTR_ERR(idev);
- ops = dev_iommu_ops(idev->dev);
+ iommu_dev = iommufd_device_get_iommu_dev(idev);
+ if (!iommu_dev) {
+ rc = -EOPNOTSUPP;
+ goto out_put_idev;
+ }
+ ops = iommu_dev->ops;
if (!ops->get_viommu_size || !ops->viommu_init) {
rc = -EOPNOTSUPP;
goto out_put_idev;
@@ -87,7 +93,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
* pluggable IOMMU instance (if exists) is responsible for refcounting
* on its own.
*/
- viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
+ viommu->iommu_dev = iommu_dev;
rc = ops->viommu_init(viommu, hwpt_paging->common.domain,
user_data.len ? &user_data : NULL);
@@ -146,6 +152,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
struct iommufd_vdevice *vdev, *curr;
size_t vdev_size = sizeof(*vdev);
struct iommufd_viommu *viommu;
+ struct iommu_device *iommu_dev;
struct iommufd_device *idev;
u64 virt_id = cmd->virt_id;
int rc = 0;
@@ -164,7 +171,8 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
goto out_put_viommu;
}
- if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+ iommu_dev = iommufd_device_get_iommu_dev(idev);
+ if (!iommu_dev || viommu->iommu_dev != iommu_dev) {
rc = -EINVAL;
goto out_put_idev;
}
--
2.43.0
* [PATCH v8 2/6] iommufd: Move igroup allocation to a function
From: Jacob Pan @ 2026-06-03 22:02 UTC
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
From: Jason Gunthorpe <jgg@nvidia.com>
Move the igroup allocation into a helper function so it can be reused by
the next patch, which allows binding to a noiommu device.
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v5:
- Add NULL group to the error handling path of
iommufd_group_setup_msi()
v3:
- New patch
---
drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 170a7005f0bc..d03076fcf3c2 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
return kref_get_unless_zero(&igroup->ref);
}
+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
+ struct iommu_group *group)
+{
+ struct iommufd_group *new_igroup;
+
+ new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+ if (!new_igroup)
+ return ERR_PTR(-ENOMEM);
+
+ kref_init(&new_igroup->ref);
+ mutex_init(&new_igroup->lock);
+ xa_init(&new_igroup->pasid_attach);
+ new_igroup->sw_msi_start = PHYS_ADDR_MAX;
+ /* group reference moves into new_igroup */
+ new_igroup->group = group;
+
+ /*
+ * The ictx is not additionally refcounted here because all objects using
+ * an igroup must put it before their destroy completes.
+ */
+ new_igroup->ictx = ictx;
+ return new_igroup;
+}
+
/*
* iommufd needs to store some more data for each iommu_group, we keep a
* parallel xarray indexed by iommu_group id to hold this instead of putting it
@@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
}
xa_unlock(&ictx->groups);
- new_igroup = kzalloc_obj(*new_igroup);
- if (!new_igroup) {
+ new_igroup = iommufd_alloc_group(ictx, group);
+ if (IS_ERR(new_igroup)) {
iommu_group_put(group);
- return ERR_PTR(-ENOMEM);
+ return new_igroup;
}
- kref_init(&new_igroup->ref);
- mutex_init(&new_igroup->lock);
- xa_init(&new_igroup->pasid_attach);
- new_igroup->sw_msi_start = PHYS_ADDR_MAX;
- /* group reference moves into new_igroup */
- new_igroup->group = group;
-
- /*
- * The ictx is not additionally refcounted here becase all objects using
- * an igroup must put it before their destroy completes.
- */
- new_igroup->ictx = ictx;
-
/*
* We dropped the lock so igroup is invalid. NULL is a safe and likely
* value to assume for the xa_cmpxchg algorithm.
--
2.43.0
* [PATCH v8 3/6] iommufd: Allow binding to a noiommu device
From: Jacob Pan @ 2026-06-03 22:02 UTC
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
From: Jason Gunthorpe <jgg@nvidia.com>
Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
a dummy igroup for such devices and skipping hwpt operations.
This enables noiommu devices to operate through the same iommufd API as
IOMMU-capable devices.
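As a rough userspace illustration of the "dummy igroup, skip hardware
operations" approach, consider the sketch below. All types and function names
are hypothetical stand-ins, and the noiommu test is simplified to a NULL
group pointer (the patch uses iommufd_device_is_noiommu(), which checks
dev->iommu, for the attach/detach/replace paths).

```c
#include <errno.h>
#include <stdlib.h>

struct hw_group {			/* stand-in for struct iommu_group */
	int id;
};

struct igroup {
	struct hw_group *group;		/* NULL marks a noiommu device */
};

struct idev {
	struct igroup *igroup;
	int attached;			/* number of real hardware attaches */
};

/* Always allocate an igroup so later code never sees a NULL idev->igroup. */
int bind_device(struct idev *idev, struct hw_group *group)
{
	struct igroup *ig = calloc(1, sizeof(*ig));

	if (!ig)
		return -ENOMEM;
	ig->group = group;
	idev->igroup = ig;
	return 0;
}

/* Attach succeeds for noiommu devices but touches no hardware state. */
int attach_device(struct idev *idev)
{
	if (!idev->igroup)
		return -EINVAL;		/* not bound */
	if (!idev->igroup->group)
		return 0;		/* noiommu: nothing to program */
	idev->attached++;
	return 0;
}

void unbind_device(struct idev *idev)
{
	free(idev->igroup);
	idev->igroup = NULL;
}
```

Keeping the igroup object present in both cases means only the functions
that actually talk to an IOMMU driver need an early return.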
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v7:
- Block get hw info for noiommu
v6:
- Expand iommufd_device_is_noiommu() comment to explain why dev->iommu
is checked instead of device_iommu_mapped() (Yi & Baolu)
- Simplify bind error handling by factoring out duplicated rc check (Yi)
v5:
- simplify logic and rename iommufd_device_is_noiommu (Kevin, Yi)
- use a helper iommufd_bind_noiommu instead of open coding (Kevin)
- move IOMMU cap check under iommufd_bind_iommu() (Yi)
- reword comments for partial init (Yi)
- misc minor clean up
v4:
- Update the description of the module parameter (Alex)
v3:
- Consolidate into fewer patches
---
drivers/iommu/iommufd/device.c | 154 ++++++++++++++++++++++++---------
1 file changed, 115 insertions(+), 39 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d03076fcf3c2..670349ff65ea 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,6 +23,19 @@ struct iommufd_attach {
struct xarray device_array;
};
+/*
+ * Detect a noiommu device for the cdev path. We check dev->iommu rather than
+ * using device_iommu_mapped() (which checks dev->iommu_group) because when
+ * both group and cdev interfaces coexist, the group path assigns a fake
+ * noiommu iommu_group to the device. That would cause device_iommu_mapped()
+ * to return true and hide the noiommu case from the cdev path. dev->iommu is
+ * reliably NULL when no IOMMU driver is managing the device.
+ */
+static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
+{
+ return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
+}
+
static void iommufd_group_release(struct kref *kref)
{
struct iommufd_group *igroup =
@@ -30,9 +43,11 @@ static void iommufd_group_release(struct kref *kref)
WARN_ON(!xa_empty(&igroup->pasid_attach));
- xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
- NULL, GFP_KERNEL);
- iommu_group_put(igroup->group);
+ if (igroup->group) {
+ xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
+ igroup, NULL, GFP_KERNEL);
+ iommu_group_put(igroup->group);
+ }
mutex_destroy(&igroup->lock);
kfree(igroup);
}
@@ -204,32 +219,20 @@ void iommufd_device_destroy(struct iommufd_object *obj)
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);
- iommu_device_release_dma_owner(idev->dev);
+ /* igroup is NULL when destroy called during bind error cleanup */
+ if (!idev->igroup)
+ return;
+ if (!iommufd_device_is_noiommu(idev))
+ iommu_device_release_dma_owner(idev->dev);
iommufd_put_group(idev->igroup);
if (!iommufd_selftest_is_mock_dev(idev->dev))
iommufd_ctx_put(idev->ictx);
}
-/**
- * iommufd_device_bind - Bind a physical device to an iommu fd
- * @ictx: iommufd file descriptor
- * @dev: Pointer to a physical device struct
- * @id: Output ID number to return to userspace for this device
- *
- * A successful bind establishes an ownership over the device and returns
- * struct iommufd_device pointer, otherwise returns error pointer.
- *
- * A driver using this API must set driver_managed_dma and must not touch
- * the device until this routine succeeds and establishes ownership.
- *
- * Binding a PCI device places the entire RID under iommufd control.
- *
- * The caller must undo this with iommufd_device_unbind()
- */
-struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
- struct device *dev, u32 *id)
+static int iommufd_bind_iommu(struct iommufd_device *idev)
{
- struct iommufd_device *idev;
+ struct iommufd_ctx *ictx = idev->ictx;
+ struct device *dev = idev->dev;
struct iommufd_group *igroup;
int rc;
@@ -238,11 +241,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
* to restore cache coherency.
*/
if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
igroup = iommufd_get_group(ictx, dev);
if (IS_ERR(igroup))
- return ERR_CAST(igroup);
+ return PTR_ERR(igroup);
/*
* For historical compat with VFIO the insecure interrupt path is
@@ -268,21 +271,77 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
if (rc)
goto out_group_put;
+ /* igroup refcount moves into iommufd_device */
+ idev->igroup = igroup;
+ idev->enforce_cache_coherency =
+ device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+ return 0;
+
+out_group_put:
+ iommufd_put_group(igroup);
+ return rc;
+}
+
+/*
+ * Noiommu devices have no real IOMMU group. Create a dummy igroup so that
+ * internal code paths that expect idev->igroup to be present still work.
+ * A NULL igroup->group distinguishes this from a real IOMMU-backed group.
+ */
+static int iommufd_bind_noiommu(struct iommufd_device *idev)
+{
+ struct iommufd_group *igroup;
+
+ igroup = iommufd_alloc_group(idev->ictx, NULL);
+ if (IS_ERR(igroup))
+ return PTR_ERR(igroup);
+ idev->igroup = igroup;
+ return 0;
+}
+
+/**
+ * iommufd_device_bind - Bind a physical device to an iommu fd
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ * @id: Output ID number to return to userspace for this device
+ *
+ * A successful bind establishes an ownership over the device and returns
+ * struct iommufd_device pointer, otherwise returns error pointer.
+ *
+ * A driver using this API must set driver_managed_dma and must not touch
+ * the device until this routine succeeds and establishes ownership.
+ *
+ * Binding a PCI device places the entire RID under iommufd control.
+ *
+ * The caller must undo this with iommufd_device_unbind()
+ */
+struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
+ struct device *dev, u32 *id)
+{
+ struct iommufd_device *idev;
+ int rc;
+
idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
- if (IS_ERR(idev)) {
- rc = PTR_ERR(idev);
- goto out_release_owner;
- }
+ if (IS_ERR(idev))
+ return idev;
+
idev->ictx = ictx;
+ idev->dev = dev;
+
+ if (!iommufd_device_is_noiommu(idev))
+ rc = iommufd_bind_iommu(idev);
+ else
+ rc = iommufd_bind_noiommu(idev);
+ if (rc)
+ goto err_out;
+
+ /*
+ * Take a ctx reference after bind succeeds. This must happen here
+ * so that iommufd_device_destroy() can handle partial initialization
+ */
if (!iommufd_selftest_is_mock_dev(dev))
iommufd_ctx_get(ictx);
- idev->dev = dev;
- idev->enforce_cache_coherency =
- device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
/* The calling driver is a user until iommufd_device_unbind() */
refcount_inc(&idev->obj.users);
- /* igroup refcount moves into iommufd_device */
- idev->igroup = igroup;
/*
* If the caller fails after this success it must call
@@ -294,11 +353,14 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
*id = idev->obj.id;
return idev;
-out_release_owner:
- iommu_device_release_dma_owner(dev);
-out_group_put:
- iommufd_put_group(igroup);
+err_out:
+ /*
+ * iommufd_device_destroy() handles partially initialized idev,
+ * so iommufd_object_abort_and_destroy() is safe to call here.
+ */
+ iommufd_object_abort_and_destroy(ictx, &idev->obj);
return ERR_PTR(rc);
+
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
@@ -512,6 +574,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
struct iommufd_attach_handle *handle;
int rc;
+ if (iommufd_device_is_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -559,6 +624,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
{
struct iommufd_attach_handle *handle;
+ if (iommufd_device_is_noiommu(idev))
+ return;
+
handle = iommufd_device_get_attach_handle(idev, pasid);
if (pasid == IOMMU_NO_PASID)
iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
@@ -577,6 +645,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
struct iommufd_attach_handle *handle, *old_handle;
int rc;
+ if (iommufd_device_is_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -652,7 +723,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
goto err_release_devid;
}
- if (attach_resv) {
+ if (attach_resv && !iommufd_device_is_noiommu(idev)) {
rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
if (rc)
goto err_release_devid;
@@ -1585,6 +1656,11 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
if (IS_ERR(idev))
return PTR_ERR(idev);
+ if (iommufd_device_is_noiommu(idev)) {
+ rc = -EOPNOTSUPP;
+ goto out_put;
+ }
+
ops = dev_iommu_ops(idev->dev);
if (ops->hw_info) {
data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
--
2.43.0
* [PATCH v8 4/6] iommufd: Add an ioctl to query PA from IOVA for noiommu mode
From: Jacob Pan @ 2026-06-03 22:02 UTC
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
To support no-IOMMU mode where userspace drivers perform unsafe DMA
using physical addresses, introduce a new API to retrieve the
physical address of a user-allocated DMA buffer that has been mapped to
an IOVA via IOMMU_IOAS_MAP. The mapping is backed by SW-only I/O page
tables maintained by the GENERIC_PT framework.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v8:
- Fix comment on start IOVA range (Kevin)
v7:
- Fix commit message (Yi)
- Avoid duplicated tmp_length setting (Yi)
- Handle race with dma-buf revoke pages (Sashiko)
v6:
- Limit search length (Baolu, Jason)
v5:
- Fix next_iova exceeding iopt_area_last_iova (Alex)
- Rename the ioctl to be more specific to NOIOMMU, i.e.
IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA (Kevin)
- Add header stubs for iopt_get_phys()
v4:
- Fix ioctl return type (Yi Liu)
- Fix get_pa comment
---
drivers/iommu/iommufd/io_pagetable.c | 80 +++++++++++++++++++++++++
drivers/iommu/iommufd/ioas.c | 33 ++++++++++
drivers/iommu/iommufd/iommufd_private.h | 18 ++++++
drivers/iommu/iommufd/main.c | 3 +
include/uapi/linux/iommufd.h | 27 +++++++++
5 files changed, 161 insertions(+)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 24d4917105d9..667c2d07e08b 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -859,6 +859,86 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
}
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length)
+{
+ struct iopt_area *area;
+ struct iopt_pages *pages;
+ u64 max_length = *length;
+ u64 tmp_length = 0;
+ u64 tmp_paddr = 0;
+ int rc = 0;
+
+ down_read(&iopt->iova_rwsem);
+ area = iopt_area_iter_first(iopt, iova, iova);
+ if (!area || !area->pages) {
+ rc = -ENOENT;
+ goto unlock_exit;
+ }
+
+ pages = area->pages;
+ mutex_lock(&pages->mutex);
+ if (iopt_dmabuf_revoked(pages)) {
+ rc = -EINVAL;
+ goto unlock_pages;
+ }
+
+ if (!area->storage_domain ||
+ area->storage_domain->owner != &iommufd_noiommu_ops) {
+ rc = -EOPNOTSUPP;
+ goto unlock_pages;
+ }
+
+ *paddr = iommu_iova_to_phys(area->storage_domain, iova);
+ if (!*paddr) {
+ rc = -EINVAL;
+ goto unlock_pages;
+ }
+
+ tmp_length = PAGE_SIZE - offset_in_page(iova);
+ tmp_paddr = *paddr;
+ /*
+ * Scan the domain for the contiguous physical address length so that
+ * userspace search can be optimized for fewer ioctls. A max_length of
+ * 0 means no limit.
+ */
+ while (iova < iopt_area_last_iova(area)) {
+ unsigned long next_iova;
+ u64 next_paddr;
+
+ if (max_length && tmp_length >= max_length)
+ break;
+
+ if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
+ break;
+
+ if (next_iova > iopt_area_last_iova(area))
+ break;
+
+ next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
+
+ if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
+ break;
+
+ iova = next_iova;
+ tmp_paddr += PAGE_SIZE;
+ tmp_length += PAGE_SIZE;
+ }
+
+ if (max_length && tmp_length > max_length)
+ tmp_length = max_length;
+ *length = tmp_length;
+
+unlock_pages:
+ mutex_unlock(&pages->mutex);
+unlock_exit:
+ up_read(&iopt->iova_rwsem);
+
+ return rc;
+}
+#endif
+
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
{
/* If the IOVAs are empty then unmap all succeeds */
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index fed06c2b728e..ad1c3031f6a9 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -375,6 +375,39 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
return rc;
}
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_ioas_noiommu_get_pa *cmd = ucmd->cmd;
+ struct iommufd_ioas *ioas;
+ int rc;
+
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
+ if (cmd->flags || cmd->__reserved)
+ return -EOPNOTSUPP;
+
+ if (cmd->iova >= ULONG_MAX)
+ return -EOVERFLOW;
+
+ ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+ if (IS_ERR(ioas))
+ return PTR_ERR(ioas);
+
+ rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
+ &cmd->length);
+ if (rc)
+ goto out_put;
+
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+ iommufd_put_object(ucmd->ictx, &ioas->obj);
+
+ return rc;
+}
+#endif
+
static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
struct xarray *ioas_list)
{
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index c8ed612e896a..15909ba75c18 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
unsigned long length, unsigned long *unmapped);
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length);
+#else
+static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova,
+ u64 *paddr, u64 *length)
+{
+ return -EOPNOTSUPP;
+}
+#endif
int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
struct iommu_domain *domain,
@@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd);
+#else
+static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+ return -EOPNOTSUPP;
+}
+#endif
int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index f6ae60bd3f70..a4668995269c 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -424,6 +424,7 @@ union ucmd_buffer {
struct iommu_ioas_alloc alloc;
struct iommu_ioas_allow_iovas allow_iovas;
struct iommu_ioas_copy ioas_copy;
+ struct iommu_ioas_noiommu_get_pa noiommu_get_pa;
struct iommu_ioas_iova_ranges iova_ranges;
struct iommu_ioas_map map;
struct iommu_ioas_unmap unmap;
@@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova),
IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
struct iommu_ioas_map_file, iova),
+ IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa,
+ struct iommu_ioas_noiommu_get_pa, out_phys),
IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
length),
IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e998dfbd6960..552bc5c096b4 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+ IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA = 0x95,
};
/**
@@ -219,6 +220,32 @@ struct iommu_ioas_map {
};
#define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
+/**
+ * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA)
+ * @size: sizeof(struct iommu_ioas_noiommu_get_pa)
+ * @flags: Reserved, must be 0 for now
+ * @ioas_id: IOAS ID to query IOVA to PA mapping from
+ * @__reserved: Must be 0
+ * @iova: IOVA to query
+ * @length: On input, maximum number of bytes to scan for contiguity (0 means
+ * no limit). On output, actual number of contiguous bytes starting
+ * from out_phys.
+ * @out_phys: Output physical address the IOVA maps to
+ *
+ * Query the physical address backing an IOVA range. The beginning of the
+ * range must be mapped already. For noiommu devices doing unsafe DMA only.
+ */
+struct iommu_ioas_noiommu_get_pa {
+ __u32 size;
+ __u32 flags;
+ __u32 ioas_id;
+ __u32 __reserved;
+ __aligned_u64 iova;
+ __aligned_u64 length;
+ __aligned_u64 out_phys;
+};
+#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA)
+
/**
* struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
* @size: sizeof(struct iommu_ioas_map_file)
--
2.43.0
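The contiguity scan in iopt_get_phys() above can be sketched as a small userspace model. This is not kernel code: page_pa[] is a hypothetical stand-in for iommu_iova_to_phys() on each page of a single area, the area is assumed to be page aligned, and the sketch_ names and the fixed 4 KiB page size are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for the model */

/*
 * Userspace model of the contiguity scan, not kernel code.
 * page_pa[i] stands in for iommu_iova_to_phys() on page i of one area
 * (0 means "no translation"). offset is the byte offset of the queried
 * IOVA from the start of the area; max_length == 0 means no limit.
 * Returns the contiguous length in bytes, or 0 if the first page has no
 * translation (the kernel code returns -EINVAL in that case).
 */
static uint64_t sketch_contig_len(const uint64_t *page_pa, size_t npages,
				  uint64_t offset, uint64_t max_length,
				  uint64_t *paddr)
{
	size_t idx = offset / SKETCH_PAGE_SIZE;
	uint64_t in_page = offset % SKETCH_PAGE_SIZE;
	uint64_t len;

	if (idx >= npages || !page_pa[idx])
		return 0;

	*paddr = page_pa[idx] + in_page;
	len = SKETCH_PAGE_SIZE - in_page;	/* remainder of the first page */

	while (idx + 1 < npages) {
		if (max_length && len >= max_length)
			break;
		/* Stop at a hole or at the first discontiguous page. */
		if (!page_pa[idx + 1] ||
		    page_pa[idx + 1] != page_pa[idx] + SKETCH_PAGE_SIZE)
			break;
		idx++;
		len += SKETCH_PAGE_SIZE;
	}

	/* The scan advances a page at a time, so clamp to the caller's limit. */
	if (max_length && len > max_length)
		len = max_length;
	return len;
}
```

As in the patch, the limit is only checked at page granularity inside the loop, which is why the final clamp is needed.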
* [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd
2026-06-03 22:02 [PATCH v8 0/6] iommufd: Enable noiommu mode for cdev Jacob Pan
` (3 preceding siblings ...)
2026-06-03 22:02 ` [PATCH v8 4/6] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
@ 2026-06-03 22:02 ` Jacob Pan
2026-06-08 23:19 ` Alex Williamson
2026-06-03 22:02 ` [PATCH v8 6/6] Documentation: Update VFIO NOIOMMU mode Jacob Pan
5 siblings, 1 reply; 11+ messages in thread
From: Jacob Pan @ 2026-06-03 22:02 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
Now that devices under noiommu mode can bind with IOMMUFD and perform
IOAS operations, lift the cdev restrictions on the VFIO side.
Use cases are documented in Documentation/driver-api/vfio.rst.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v8:
- Fix warning message (Kevin)
v7:
- Avoid treating emulated device as noiommu device (Sashiko)
- Keep platforms w/ GENERIC_ATOMIC64 to use VFIO group noiommu as
before (Sashiko)
- Restore order of group & cdev init for noiommu (Yi)
- Consolidate noiommu helper for cdev & group (Yi)
v6:
- Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and group.
Use Kconfig dependency to restrict usages and avoid null group
checks. (Alex & Yi)
- Add CAP_SYS_RAWIO checks for cdev open to maintain security parity
with the group noiommu path. (Alex)
v5:
- Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
and its dependencies
- Add comment to explain vfio_noiommu conditional definition (Alex)
- Removed early return for group noiommu in bind/unbind
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Update unsafe_noiommu Kconfig help text (Kevin)
- Change dev_warn to dev_info for noiommu enabling msg (Kevin)
v4:
- Remove early return in iommufd_bind for noiommu (Alex)
v3:
- Consolidate into fewer patches
v2:
- removed unnecessary device->noiommu set in
iommufd_vfio_compat_ioas_get_id()
---
drivers/vfio/Kconfig | 7 ++++---
drivers/vfio/device_cdev.c | 3 +++
drivers/vfio/iommufd.c | 12 ++++++++----
drivers/vfio/vfio.h | 23 +++++++++--------------
drivers/vfio/vfio_main.c | 26 +++++++++++++++++++++++++-
include/linux/vfio.h | 1 +
6 files changed, 50 insertions(+), 22 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index ceae52fd7586..b9d6e1c22aed 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
The VFIO device cdev is another way for userspace to get device
access. Userspace gets device fd by opening device cdev under
/dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
- to set up secure DMA context for device access. This interface does
- not support noiommu.
+ to set up secure DMA context for device access.
If you don't know what to do here, say N.
@@ -62,7 +61,9 @@ endif
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
- depends on VFIO_GROUP
+ depends on VFIO_GROUP || (VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64)
+ depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
+ select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64
help
VFIO is built on the ability to isolate devices using the IOMMU.
Only with an IOMMU can userspace access to DMA capable devices be
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 54abf312cf04..5ca14979b56e 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
struct vfio_device_file *df;
int ret;
+ if (vfio_device_is_noiommu(device) && !capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
/* Paired with the put in vfio_device_fops_release() */
if (!vfio_device_try_get_registration(device))
return -ENODEV;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a38d262c6028..e9893d34d07b 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- /* Returns 0 to permit device opening under noiommu mode */
- if (vfio_device_is_noiommu(vdev))
+ /* Group noiommu via iommufd compat needs no device binding */
+ if (df->group && vfio_device_is_noiommu(vdev))
return 0;
return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
@@ -40,7 +40,11 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
lockdep_assert_held(&vdev->dev_set->lock);
- /* compat noiommu does not need to do ioas attach */
+ /*
+ * Compat noiommu does not need to do ioas attach. This helper is
+ * only called from the legacy group/iommufd compat path, so no
+ * explicit df->group check is needed.
+ */
if (vfio_device_is_noiommu(vdev))
return 0;
@@ -58,7 +62,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- if (vfio_device_is_noiommu(vdev))
+ if (df->group && vfio_device_is_noiommu(vdev))
return;
if (vdev->ops->unbind_iommufd)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e4b72e79b7e3..7728bc99b63d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -112,11 +112,6 @@ bool vfio_device_has_container(struct vfio_device *device);
int __init vfio_group_init(void);
void vfio_group_cleanup(void);
-static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
-{
- return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
- vdev->group->type == VFIO_NO_IOMMU;
-}
#else
struct vfio_group;
@@ -188,11 +183,17 @@ static inline void vfio_group_cleanup(void)
{
}
+#endif /* CONFIG_VFIO_GROUP */
+
static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
- return false;
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
+ if (vdev->group && vdev->group->type == VFIO_NO_IOMMU)
+ return true;
+#endif
+
+ return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vdev->noiommu;
}
-#endif /* CONFIG_VFIO_GROUP */
#if IS_ENABLED(CONFIG_VFIO_CONTAINER)
/**
@@ -358,19 +359,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
static inline int vfio_device_add(struct vfio_device *device)
{
- /* cdev does not support noiommu device */
- if (vfio_device_is_noiommu(device))
- return device_add(&device->device);
vfio_init_device_cdev(device);
return cdev_device_add(&device->cdev, &device->device);
}
static inline void vfio_device_del(struct vfio_device *device)
{
- if (vfio_device_is_noiommu(device))
- device_del(&device->device);
- else
- cdev_device_del(&device->cdev, &device->device);
+ cdev_device_del(&device->cdev, &device->device);
}
int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6222376ab6ab..fc8a50941aac 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -321,6 +321,24 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
return ret;
}
+static int vfio_device_set_noiommu_and_name(struct vfio_device *device, enum vfio_group_type type)
+{
+ if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vfio_noiommu &&
+ !device->dev->iommu && type == VFIO_IOMMU)
+ device->noiommu = true;
+
+ /*
+ * device->noiommu records no-IOMMU support for the standalone cdev
+ * interface. VFIO_NOIOMMU enables both group and cdev no-IOMMU; when
+ * cdev no-IOMMU is available, device->noiommu is set before
+ * vfio_device_set_group(), so the cdev is named noiommu-vfio%d up
+ * front. There cannot be a combination of a plain vfio%d cdev name and
+ * a no-IOMMU group because VFIO_NOIOMMU selects IOMMUFD_NOIOMMU.
+ */
+ return dev_set_name(&device->device, "%svfio%d",
+ device->noiommu ? "noiommu-" : "", device->index);
+}
+
static int __vfio_register_dev(struct vfio_device *device,
enum vfio_group_type type)
{
@@ -340,7 +358,7 @@ static int __vfio_register_dev(struct vfio_device *device,
if (!device->dev_set)
vfio_assign_device_set(device, device);
- ret = dev_set_name(&device->device, "vfio%d", device->index);
+ ret = vfio_device_set_noiommu_and_name(device, type);
if (ret)
return ret;
@@ -348,6 +366,12 @@ static int __vfio_register_dev(struct vfio_device *device,
if (ret)
return ret;
+ if (vfio_device_is_noiommu(device) && IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU)) {
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+ dev_warn(device->dev,
+ "Adding kernel taint for vfio-noiommu cdev\n");
+ }
+
/*
* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
* restore cache coherency. It has to be checked here because it is only
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 31b826efba00..45f08986359e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -74,6 +74,7 @@ struct vfio_device {
u8 iommufd_attached:1;
#endif
u8 cdev_opened:1;
+ u8 noiommu:1;
/*
* debug_root is a static property of the vfio_device
* which must be set prior to registering the vfio_device.
--
2.43.0
* [PATCH v8 6/6] Documentation: Update VFIO NOIOMMU mode
2026-06-03 22:02 [PATCH v8 0/6] iommufd: Enable noiommu mode for cdev Jacob Pan
` (4 preceding siblings ...)
2026-06-03 22:02 ` [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
@ 2026-06-03 22:02 ` Jacob Pan
5 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-06-03 22:02 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
Document the NOIOMMU mode with newly added cdev support under iommufd.
Cc: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v8:
- Remove the reference to the selftest.
v7:
- Added Kconfig matrix
v6:
- Generalize device node names (noiommu-vfioX, noiommu-Y) in the tree
example (Yi)
- Clarify table column descriptions for Yes/No meanings (Yi)
---
Documentation/driver-api/vfio.rst | 81 ++++++++++++++++++++++++++++++-
1 file changed, 79 insertions(+), 2 deletions(-)
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 2a21a42c9386..bf0632a43bc6 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -275,8 +275,6 @@ in a VFIO group.
With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX where
"X" is the number allocated uniquely by VFIO for registered devices.
-cdev interface does not support noiommu devices, so user should use
-the legacy group interface if noiommu is wanted.
The cdev only works with IOMMUFD. Both VFIO drivers and applications
must adapt to the new cdev security model which requires using
@@ -370,6 +368,85 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
/* Other device operations as stated in "VFIO Usage Example" */
+VFIO NOIOMMU mode
+-------------------------------------------------------------------------------
+VFIO also supports a no-IOMMU mode, intended for use cases where userspace
+drivers perform unsafe DMA without physical IOMMU protection. This mode is
+controlled by the module parameter:
+
+/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+Upon enabling this mode, with an assigned device, the user will be presented
+with a VFIO group and device file, e.g.::
+
+ /dev/vfio/
+ |-- devices
+ | `-- noiommu-vfioX /* VFIO device cdev */
+ |-- noiommu-Y /* VFIO group */
+ `-- vfio
+
+The capabilities vary depending on the device programming interface and kernel
+configuration used. The following table summarizes the differences ("Yes" means
+the UAPI is accessible and functional in noiommu mode, "No" means the UAPI is
+not supported):
+
++-------------------+---------------------+----------------------+
+| Feature | VFIO group | VFIO device cdev |
++===================+=====================+======================+
+| VFIO device UAPI | Yes | Yes |
++-------------------+---------------------+----------------------+
+| VFIO container | No | No |
++-------------------+---------------------+----------------------+
+| IOMMUFD IOAS | No | Yes* |
++-------------------+---------------------+----------------------+
+
+Note that the VFIO container case includes IOMMUFD provided VFIO compatibility
+interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER is
+enabled.
+
+* IOMMUFD UAPI is available for VFIO device cdev to pin and map user memory with
+ the ability to retrieve physical addresses for DMA command submission.
+
+Kconfig Support Matrix
+^^^^^^^^^^^^^^^^^^^^^^
+
+The visibility of CONFIG_VFIO_NOIOMMU depends on the combination of
+CONFIG_VFIO_GROUP, CONFIG_VFIO_DEVICE_CDEV, and whether a container backend
+(CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER) is configured. The
+Kconfig dependencies enforce the following constraints:
+
+- At least one access path (group or cdev) must be available.
+- If VFIO_GROUP is enabled, a container backend is required; otherwise the
+ group node would be unusable in noiommu mode.
+
+The resulting support matrix:
+
++------+-------+-----------+------+---------+---------------------------+
+| Case | GROUP | Container | CDEV | NOIOMMU | Notes |
++======+=======+===========+======+=========+===========================+
+| 1 | y | y | n | yes | Group noiommu works |
++------+-------+-----------+------+---------+---------------------------+
+| 2 | y | n | n | no | Blocked - no container |
++------+-------+-----------+------+---------+---------------------------+
+| 3 | y | y | y | yes | Both paths work |
++------+-------+-----------+------+---------+---------------------------+
+| 4 | y | n | y | no | Blocked - no container |
++------+-------+-----------+------+---------+---------------------------+
+| 5 | n | - | y | yes | Cdev-only works |
++------+-------+-----------+------+---------+---------------------------+
+| 6 | n | - | n | no | No access path |
++------+-------+-----------+------+---------+---------------------------+
+
+Container = CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER (either
+suffices). Case 4 is intentionally blocked: allowing NOIOMMU with GROUP
+enabled but no container would create unusable group nodes. Users who want
+cdev-only noiommu should set CONFIG_VFIO_GROUP=n (case 5).
+
+A new IOMMUFD ioctl IOMMU_IOAS_NOIOMMU_GET_PA is added to retrieve the physical
+address for a given IOVA. Although there is no physical DMA remapping hardware,
+IOMMU_IOAS_MAP_FIXED_IOVA is still used to establish IOVA-to-PA mappings in the
+software page table for later IOMMU_IOAS_NOIOMMU_GET_PA lookups.
+
VFIO User API
-------------------------------------------------------------------------------
--
2.43.0
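For readers who want to see the lookup from the userspace side, here is a minimal sketch of preparing and issuing the query described in the documentation above. The struct is a local mirror of the layout proposed in patch 4/6, and the command number 0x95 is copied from that diff. The ';' ioctl type is assumed to match IOMMUFD_TYPE in the current uAPI header. All sketch_ names are hypothetical, and none of this is in a released header yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of the struct proposed in patch 4/6 (assumption). */
struct sketch_noiommu_get_pa {
	uint32_t size;
	uint32_t flags;		/* must be 0 */
	uint32_t ioas_id;
	uint32_t reserved;	/* must be 0 */
	uint64_t iova;
	uint64_t length;	/* in: scan limit (0 = none), out: contiguous bytes */
	uint64_t out_phys;
};
static_assert(sizeof(struct sketch_noiommu_get_pa) == 40,
	      "layout should match the proposed uAPI struct");

/* Values taken from the patch; the type character is an assumption. */
#define SKETCH_IOMMUFD_TYPE ';'
#define SKETCH_IOAS_NOIOMMU_GET_PA _IO(SKETCH_IOMMUFD_TYPE, 0x95)

/* Fill a request; zeroing first keeps flags and the reserved field at 0. */
static void sketch_get_pa_init(struct sketch_noiommu_get_pa *cmd,
			       uint32_t ioas_id, uint64_t iova,
			       uint64_t max_len)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->size = sizeof(*cmd);
	cmd->ioas_id = ioas_id;
	cmd->iova = iova;
	cmd->length = max_len;
}

/* Returns 0 and fills *pa and *len, or a negative errno value. */
static int sketch_get_pa(int iommufd, uint32_t ioas_id, uint64_t iova,
			 uint64_t max_len, uint64_t *pa, uint64_t *len)
{
	struct sketch_noiommu_get_pa cmd;

	sketch_get_pa_init(&cmd, ioas_id, iova, max_len);
	if (ioctl(iommufd, SKETCH_IOAS_NOIOMMU_GET_PA, &cmd))
		return -errno;

	*pa = cmd.out_phys;
	*len = cmd.length;
	return 0;
}
```

A driver would typically call this in a loop, advancing iova by the returned length, to build a list of physically contiguous segments for a buffer that was mapped earlier. Per the patch, the caller needs CAP_SYS_RAWIO.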
* Re: [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd
2026-06-03 22:02 ` [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
@ 2026-06-08 23:19 ` Alex Williamson
2026-06-09 18:50 ` Jacob Pan
0 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2026-06-08 23:19 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, alex
On Wed, 3 Jun 2026 15:02:10 -0700
Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
> Now that devices under noiommu mode can bind with IOMMUFD and perform
> IOAS operations, lift restrictions on cdev from VFIO side.
> Use cases are documented in Documentation/driver-api/vfio.rst
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> v8:
> - Fix warning message (Kevin)
> v7:
> - Avoid treating emulated device as noiommu device (Sashiko)
> - Keep platforms w/ GENERIC_ATOMIC64 to use VFIO group noiommu as
> before (Sashiko)
> - Restore order of group & cdev init for noiommu (Yi)
> - Consolidate noiommu helper for cdev & group (Yi)
> v6:
> - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and group.
> Use Kconfig dependency to restrict usages and avoid null group
> checks. (Alex & Yi)
> - Add CAP_SYS_RAWIO checks for cdev open to maintain security parity
> with the group noiommu path. (Alex)
> v5:
> - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> and its dependencies
> - Add comment to explain vfio_noiommu conditional definition (Alex)
> - Removed early return for group noiommu in bind/unbind
> - Use consistent wording referring to VFIO noiommu mode (Kevin)
> - Update unsafe_noiommu Kconfig help text (Kevin)
> - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
> v4:
> - Remove early return in iommufd_bind for noiommu (Alex)
> v3:
> - Consolidate into fewer patches
> v2:
> - removed unnecessary device->noiommu set in
> iommufd_vfio_compat_ioas_get_id()
>
> ---
> drivers/vfio/Kconfig | 7 ++++---
> drivers/vfio/device_cdev.c | 3 +++
> drivers/vfio/iommufd.c | 12 ++++++++----
> drivers/vfio/vfio.h | 23 +++++++++--------------
> drivers/vfio/vfio_main.c | 26 +++++++++++++++++++++++++-
> include/linux/vfio.h | 1 +
> 6 files changed, 50 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index ceae52fd7586..b9d6e1c22aed 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> The VFIO device cdev is another way for userspace to get device
> access. Userspace gets device fd by opening device cdev under
> /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
> - to set up secure DMA context for device access. This interface does
> - not support noiommu.
> + to set up secure DMA context for device access.
>
> If you don't know what to do here, say N.
>
> @@ -62,7 +61,9 @@ endif
>
> config VFIO_NOIOMMU
> bool "VFIO No-IOMMU support"
> - depends on VFIO_GROUP
> + depends on VFIO_GROUP || (VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64)
> + depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
> + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64
Sashiko is warning about this and it seems real. If the config were
something like this:
CONFIG_GENERIC_ATOMIC64=y
CONFIG_VFIO=y
CONFIG_VFIO_GROUP=y
CONFIG_VFIO_CONTAINER=y
CONFIG_VFIO_DEVICE_CDEV=y
The result is:
# => CONFIG_VFIO_NOIOMMU=y
# => CONFIG_IOMMUFD_NOIOMMU is not set
Which can result in:
/dev/vfio/
├── devices/
│ └── vfio0
└── noiommu-0
The cdev exists without the noiommu- prefix.
Something like this might work:
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
depends on VFIO_GROUP || (VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64)
+ depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
- select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64
+ select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
help
VFIO is built on the ability to isolate devices using the IOMMU.
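To make the difference concrete, the two sets of Kconfig conditions can be modelled as plain boolean logic. This is only a sketch: "container" folds VFIO_CONTAINER and IOMMUFD_VFIO_CONTAINER into one flag, the model assumes VFIO_NOIOMMU is set to y whenever its dependencies allow it, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One flag per Kconfig symbol relevant to VFIO_NOIOMMU (model only). */
struct sketch_cfg {
	bool group;		/* VFIO_GROUP */
	bool container;		/* VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER */
	bool cdev;		/* VFIO_DEVICE_CDEV */
	bool generic_atomic64;	/* GENERIC_ATOMIC64 */
};

/* Dependencies as posted in v8. */
static bool v8_noiommu_allowed(struct sketch_cfg c)
{
	return (c.group || (c.cdev && !c.generic_atomic64)) &&
	       (!c.group || c.container);
}

/* v8: select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64 */
static bool v8_selects_iommufd_noiommu(struct sketch_cfg c)
{
	return v8_noiommu_allowed(c) && c.cdev && !c.generic_atomic64;
}

/* Dependencies with the extra "!VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64". */
static bool alt_noiommu_allowed(struct sketch_cfg c)
{
	return (c.group || (c.cdev && !c.generic_atomic64)) &&
	       (!c.cdev || !c.generic_atomic64) &&
	       (!c.group || c.container);
}

/* Alternative: select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV */
static bool alt_selects_iommufd_noiommu(struct sketch_cfg c)
{
	return alt_noiommu_allowed(c) && c.cdev;
}
```

With group, container, cdev and GENERIC_ATOMIC64 all set, the v8 conditions allow VFIO_NOIOMMU without selecting IOMMUFD_NOIOMMU, which is the plain vfio0 cdev next to a noiommu-0 group shown above. The alternative conditions refuse VFIO_NOIOMMU for that combination.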
> help
> VFIO is built on the ability to isolate devices using the IOMMU.
> Only with an IOMMU can userspace access to DMA capable devices be
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 54abf312cf04..5ca14979b56e 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> struct vfio_device_file *df;
> int ret;
>
> + if (vfio_device_is_noiommu(device) && !capable(CAP_SYS_RAWIO))
> + return -EPERM;
> +
Sashiko also notes a use-after-free issue here that seems real; we
likely need a vfio_device_try_get_registration() before this check, with
a put on the error path. Thanks,
Alex
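The ordering suggested above can be illustrated with a toy userspace model. It is not the VFIO code: the struct, the reference counter and the sketch_ helpers are hypothetical stand-ins for the device registration reference, and has_rawio stands in for capable(CAP_SYS_RAWIO).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a registered device with a registration refcount. */
struct sketch_device {
	int refs;
	bool registered;
	bool noiommu;
};

/* Models a try-get: fails once the device is no longer registered. */
static bool sketch_try_get(struct sketch_device *d)
{
	if (!d->registered)
		return false;
	d->refs++;
	return true;
}

static void sketch_put(struct sketch_device *d)
{
	d->refs--;
}

/*
 * Take the reference first, so the noiommu state is only read while the
 * device is pinned, and drop it again on the permission failure path.
 */
static int sketch_cdev_open(struct sketch_device *d, bool has_rawio)
{
	if (!sketch_try_get(d))
		return -ENODEV;

	if (d->noiommu && !has_rawio) {
		sketch_put(d);
		return -EPERM;
	}

	return 0;	/* reference is kept until release */
}
```

The point of the model is only the ordering: the check happens after the get, and every failure after the get is paired with a put.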
> /* Paired with the put in vfio_device_fops_release() */
> if (!vfio_device_try_get_registration(device))
> return -ENODEV;
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index a38d262c6028..e9893d34d07b 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
>
> lockdep_assert_held(&vdev->dev_set->lock);
>
> - /* Returns 0 to permit device opening under noiommu mode */
> - if (vfio_device_is_noiommu(vdev))
> + /* Group noiommu via iommufd compat needs no device binding */
> + if (df->group && vfio_device_is_noiommu(vdev))
> return 0;
>
> return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
> @@ -40,7 +40,11 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
>
> lockdep_assert_held(&vdev->dev_set->lock);
>
> - /* compat noiommu does not need to do ioas attach */
> + /*
> + * Compat noiommu does not need to do ioas attach. This helper is
> + * only called from the legacy group/iommufd compat path, so no
> + * explicit df->group check is needed.
> + */
> if (vfio_device_is_noiommu(vdev))
> return 0;
>
> @@ -58,7 +62,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
>
> lockdep_assert_held(&vdev->dev_set->lock);
>
> - if (vfio_device_is_noiommu(vdev))
> + if (df->group && vfio_device_is_noiommu(vdev))
> return;
>
> if (vdev->ops->unbind_iommufd)
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index e4b72e79b7e3..7728bc99b63d 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -112,11 +112,6 @@ bool vfio_device_has_container(struct vfio_device *device);
> int __init vfio_group_init(void);
> void vfio_group_cleanup(void);
>
> -static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> -{
> - return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> - vdev->group->type == VFIO_NO_IOMMU;
> -}
> #else
> struct vfio_group;
>
> @@ -188,11 +183,17 @@ static inline void vfio_group_cleanup(void)
> {
> }
>
> +#endif /* CONFIG_VFIO_GROUP */
> +
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> - return false;
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
> + if (vdev->group && vdev->group->type == VFIO_NO_IOMMU)
> + return true;
> +#endif
> +
> + return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vdev->noiommu;
> }
> -#endif /* CONFIG_VFIO_GROUP */
>
> #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
> /**
> @@ -358,19 +359,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
>
> static inline int vfio_device_add(struct vfio_device *device)
> {
> - /* cdev does not support noiommu device */
> - if (vfio_device_is_noiommu(device))
> - return device_add(&device->device);
> vfio_init_device_cdev(device);
> return cdev_device_add(&device->cdev, &device->device);
> }
>
> static inline void vfio_device_del(struct vfio_device *device)
> {
> - if (vfio_device_is_noiommu(device))
> - device_del(&device->device);
> - else
> - cdev_device_del(&device->cdev, &device->device);
> + cdev_device_del(&device->cdev, &device->device);
> }
>
> int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 6222376ab6ab..fc8a50941aac 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -321,6 +321,24 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> return ret;
> }
>
> +static int vfio_device_set_noiommu_and_name(struct vfio_device *device, enum vfio_group_type type)
> +{
> + if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vfio_noiommu &&
> + !device->dev->iommu && type == VFIO_IOMMU)
> + device->noiommu = true;
> +
> + /*
> + * device->noiommu records no-IOMMU support for the standalone cdev
> + * interface. VFIO_NOIOMMU enables both group and cdev no-IOMMU; when
> + * cdev no-IOMMU is available, device->noiommu is set before
> + * vfio_device_set_group(), so the cdev is named noiommu-vfio%d up
> + * front. There cannot be a combination of a plain vfio%d cdev name and
> + * a no-IOMMU group because VFIO_NOIOMMU selects IOMMUFD_NOIOMMU.
> + */
> + return dev_set_name(&device->device, "%svfio%d",
> + device->noiommu ? "noiommu-" : "", device->index);
> +}
> +
> static int __vfio_register_dev(struct vfio_device *device,
> enum vfio_group_type type)
> {
> @@ -340,7 +358,7 @@ static int __vfio_register_dev(struct vfio_device *device,
> if (!device->dev_set)
> vfio_assign_device_set(device, device);
>
> - ret = dev_set_name(&device->device, "vfio%d", device->index);
> + ret = vfio_device_set_noiommu_and_name(device, type);
> if (ret)
> return ret;
>
> @@ -348,6 +366,12 @@ static int __vfio_register_dev(struct vfio_device *device,
> if (ret)
> return ret;
>
> + if (vfio_device_is_noiommu(device) && IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU)) {
> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> + dev_warn(device->dev,
> + "Adding kernel taint for vfio-noiommu cdev\n");
> + }
> +
> /*
> * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> * restore cache coherency. It has to be checked here because it is only
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 31b826efba00..45f08986359e 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -74,6 +74,7 @@ struct vfio_device {
> u8 iommufd_attached:1;
> #endif
> u8 cdev_opened:1;
> + u8 noiommu:1;
> /*
> * debug_root is a static property of the vfio_device
> * which must be set prior to registering the vfio_device.
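The naming rule in vfio_device_set_noiommu_and_name() above can be exercised outside the kernel with a small model. This is only a sketch under stated assumptions: model_cdev_noiommu() and model_cdev_name() are hypothetical user-space stand-ins, and the boolean parameters replace IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU), the vfio_noiommu module parameter, device->dev->iommu and the type == VFIO_IOMMU check from the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the condition that sets device->noiommu in the
 * patch. All four inputs are plain booleans here; in the kernel they come
 * from Kconfig, a module parameter and the device itself.
 */
static bool model_cdev_noiommu(bool iommufd_noiommu_cfg, bool vfio_noiommu_param,
			       bool has_iommu, bool type_is_vfio_iommu)
{
	return iommufd_noiommu_cfg && vfio_noiommu_param &&
	       !has_iommu && type_is_vfio_iommu;
}

/* Same format string as the dev_set_name() call in the patch. */
static int model_cdev_name(char *buf, size_t len, bool noiommu, int index)
{
	return snprintf(buf, len, "%svfio%d", noiommu ? "noiommu-" : "", index);
}
```

With IOMMUFD_NOIOMMU modelled as disabled, the name falls back to plain vfio%d even when every other input indicates a no-IOMMU device, which is the case the review below is concerned with.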
* Re: [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd
2026-06-08 23:19 ` Alex Williamson
@ 2026-06-09 18:50 ` Jacob Pan
2026-06-09 20:07 ` Alex Williamson
0 siblings, 1 reply; 11+ messages in thread
From: Jacob Pan @ 2026-06-09 18:50 UTC (permalink / raw)
To: Alex Williamson
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, jacob.pan
Hi Alex,
On Mon, 8 Jun 2026 17:19:56 -0600
Alex Williamson <alex@shazbot.org> wrote:
> On Wed, 3 Jun 2026 15:02:10 -0700
> Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
>
> > Now that devices under noiommu mode can bind with IOMMUFD and
> > perform IOAS operations, lift restrictions on cdev from VFIO side.
> > Use cases are documented in Documentation/driver-api/vfio.rst
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > v8:
> > - Fix warning message (Kevin)
> > v7:
> > - Avoid treating emulated device as noiommu device (Sashiko)
> > - Keep platforms w/ GENERIC_ATOMIC64 to use VFIO group noiommu as
> > before (Sashiko)
> > - Restore order of group & cdev init for noiommu (Yi)
> > - Consolidate noiommu helper for cdev & group (Yi)
> > v6:
> > - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and
> > group. Use Kconfig dependency to restrict usages and avoid null
> > group checks. (Alex & Yi)
> > - Add CAP_SYS_RAWIO checks for cdev open to maintain security
> > parity with the group noiommu path. (Alex)
> > v5:
> > - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> > and its dependencies
> > - Add comment to explain vfio_noiommu conditional definition
> > (Alex)
> > - Removed early return for group noiommu in bind/unbind
> > - Use consistent wording referring to VFIO noiommu mode (Kevin)
> > - Update unsafe_noiommu Kconfig help text (Kevin)
> > - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
> > v4:
> > - Remove early return in iommufd_bind for noiommu (Alex)
> > v3:
> > - Consolidate into fewer patches
> > v2:
> > - removed unnecessary device->noiommu set in
> > iommufd_vfio_compat_ioas_get_id()
> >
> > ---
> > drivers/vfio/Kconfig | 7 ++++---
> > drivers/vfio/device_cdev.c | 3 +++
> > drivers/vfio/iommufd.c | 12 ++++++++----
> > drivers/vfio/vfio.h | 23 +++++++++--------------
> > drivers/vfio/vfio_main.c | 26 +++++++++++++++++++++++++-
> > include/linux/vfio.h | 1 +
> > 6 files changed, 50 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index ceae52fd7586..b9d6e1c22aed 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> > The VFIO device cdev is another way for userspace to get
> > device access. Userspace gets device fd by opening device cdev under
> > /dev/vfio/devices/vfioX, and then bind the device fd
> > with an iommufd
> > - to set up secure DMA context for device access. This
> > interface does
> > - not support noiommu.
> > + to set up secure DMA context for device access.
> >
> > If you don't know what to do here, say N.
> >
> > @@ -62,7 +61,9 @@ endif
> >
> > config VFIO_NOIOMMU
> > bool "VFIO No-IOMMU support"
> > - depends on VFIO_GROUP
> > + depends on VFIO_GROUP || (VFIO_DEVICE_CDEV &&
> > !GENERIC_ATOMIC64)
> > + depends on !VFIO_GROUP || VFIO_CONTAINER ||
> > IOMMUFD_VFIO_CONTAINER
> > + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV &&
> > !GENERIC_ATOMIC64
>
> Sashiko is warning about this and it seems real, if the config were
> something like this:
>
> CONFIG_GENERIC_ATOMIC64=y
> CONFIG_VFIO=y
> CONFIG_VFIO_GROUP=y
> CONFIG_VFIO_CONTAINER=y
> CONFIG_VFIO_DEVICE_CDEV=y
>
> The result is:
>
> # => CONFIG_VFIO_NOIOMMU=y
> # => CONFIG_IOMMUFD_NOIOMMU is not set
>
> Which can result in:
>
> /dev/vfio/
> ├── devices/
> │ └── vfio0
> └── noiommu-0
>
> The cdev exists without the noiommu- prefix.
>
Indeed, I thought about this, which is why I put this comment in the code:
"There cannot be a combination of a plain vfio%d cdev name and
a no-IOMMU group because VFIO_NOIOMMU selects IOMMUFD_NOIOMMU."
But I missed the select logic.
> Something like this might work
>
> config VFIO_NOIOMMU
> bool "VFIO No-IOMMU support"
> depends on VFIO_GROUP || (VFIO_DEVICE_CDEV &&
> !GENERIC_ATOMIC64)
> + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> depends on !VFIO_GROUP || VFIO_CONTAINER ||
> IOMMUFD_VFIO_CONTAINER
> - select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV &&
> !GENERIC_ATOMIC64
> + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
> help
> VFIO is built on the ability to isolate devices using
> the IOMMU.
>
This will work, but it disables VFIO_NOIOMMU for configs with
VFIO_DEVICE_CDEV=y and GENERIC_ATOMIC64=y, even though the legacy group
noiommu path still works there. That can break existing distro configs
which enable both VFIO_GROUP and VFIO_DEVICE_CDEV, right?
How about adding a code change to skip noiommu cdev registration if
IOMMUFD_NOIOMMU is not enabled? I.e.:
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -359,13 +359,21 @@ void vfio_init_device_cdev(struct vfio_device *device);

 static inline int vfio_device_add(struct vfio_device *device)
 {
+	if (vfio_device_is_noiommu(device) &&
+	    !IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))
+		return device_add(&device->device);
+
 	vfio_init_device_cdev(device);
 	return cdev_device_add(&device->cdev, &device->device);
 }

 static inline void vfio_device_del(struct vfio_device *device)
 {
-	cdev_device_del(&device->cdev, &device->device);
+	if (vfio_device_is_noiommu(device) &&
+	    !IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))
+		device_del(&device->device);
+	else
+		cdev_device_del(&device->cdev, &device->device);
 }

I will also update the documentation to state this behavior:
"The cdev noiommu path requires CONFIG_GENERIC_ATOMIC64=n. When
CONFIG_VFIO_GROUP=y, CONFIG_VFIO_DEVICE_CDEV=y, and
CONFIG_GENERIC_ATOMIC64=y, CONFIG_VFIO_NOIOMMU remains selectable for
the group path, but no noiommu device cdev is registered. Cdev-only
noiommu is not selectable on those platforms."
> > help
> > VFIO is built on the ability to isolate devices using
> > the IOMMU. Only with an IOMMU can userspace access to DMA capable
> > devices be diff --git a/drivers/vfio/device_cdev.c
> > b/drivers/vfio/device_cdev.c index 54abf312cf04..5ca14979b56e 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode
> > *inode, struct file *filep) struct vfio_device_file *df;
> > int ret;
> >
> > + if (vfio_device_is_noiommu(device) &&
> > !capable(CAP_SYS_RAWIO))
> > + return -EPERM;
> > +
>
> Sashiko also notes a use-after-free issue here that seems real, we
> likely need a vfio_device_try_get_registration() before with put on
> error. Thanks,
>
Right, I will move it after vfio_device_try_get_registration().
> Alex
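The Kconfig interaction discussed in this message can be checked with a small truth-table model. This is a user-space sketch, not kernel code: struct vfio_cfg and every function name below are hypothetical, the model covers only the "depends on" and "select" expressions quoted in this thread, and it ignores any dependencies that IOMMUFD_NOIOMMU may have of its own.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the relevant Kconfig symbols. */
struct vfio_cfg {
	bool group;             /* CONFIG_VFIO_GROUP */
	bool container;         /* CONFIG_VFIO_CONTAINER */
	bool iommufd_container; /* CONFIG_IOMMUFD_VFIO_CONTAINER */
	bool cdev;              /* CONFIG_VFIO_DEVICE_CDEV */
	bool generic_atomic64;  /* CONFIG_GENERIC_ATOMIC64 */
};

/* v8 as posted: both "depends on" lines of VFIO_NOIOMMU. */
static bool v8_noiommu_allowed(const struct vfio_cfg *c)
{
	bool dep1 = c->group || (c->cdev && !c->generic_atomic64);
	bool dep2 = !c->group || c->container || c->iommufd_container;

	return dep1 && dep2;
}

/* v8 as posted: the condition attached to "select IOMMUFD_NOIOMMU". */
static bool v8_selects_iommufd_noiommu(const struct vfio_cfg *c)
{
	return c->cdev && !c->generic_atomic64;
}

/* Alternative from the review: one additional "depends on" line. */
static bool alt_noiommu_allowed(const struct vfio_cfg *c)
{
	return v8_noiommu_allowed(c) && (!c->cdev || !c->generic_atomic64);
}

/* Alternative from the review: simplified select condition. */
static bool alt_selects_iommufd_noiommu(const struct vfio_cfg *c)
{
	return c->cdev;
}

/*
 * The reported problem: VFIO_NOIOMMU can be enabled and the cdev is built,
 * but IOMMUFD_NOIOMMU is not selected, so the cdev gets a plain name.
 */
static bool v8_plain_cdev_possible(const struct vfio_cfg *c)
{
	return v8_noiommu_allowed(c) && c->cdev &&
	       !v8_selects_iommufd_noiommu(c);
}

static bool alt_plain_cdev_possible(const struct vfio_cfg *c)
{
	return alt_noiommu_allowed(c) && c->cdev &&
	       !alt_selects_iommufd_noiommu(c);
}

/* Print one line summarising both variants for a given configuration. */
void report(const struct vfio_cfg *c)
{
	printf("v8: allowed=%d plain-cdev=%d | alt: allowed=%d plain-cdev=%d\n",
	       v8_noiommu_allowed(c), v8_plain_cdev_possible(c),
	       alt_noiommu_allowed(c), alt_plain_cdev_possible(c));
}
```

Feeding in the configuration from the review (GROUP, CONTAINER, DEVICE_CDEV and GENERIC_ATOMIC64 all set) shows the trade-off: the posted expressions allow VFIO_NOIOMMU without selecting IOMMUFD_NOIOMMU, while the alternative closes that gap by making VFIO_NOIOMMU unavailable for that configuration altogether.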
* Re: [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd
2026-06-09 18:50 ` Jacob Pan
@ 2026-06-09 20:07 ` Alex Williamson
2026-06-09 21:11 ` Jacob Pan
0 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2026-06-09 20:07 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, alex
On Tue, 9 Jun 2026 11:50:58 -0700
Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
> Hi Alex,
>
> On Mon, 8 Jun 2026 17:19:56 -0600
> Alex Williamson <alex@shazbot.org> wrote:
>
> > On Wed, 3 Jun 2026 15:02:10 -0700
> > Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
> >
> > > Now that devices under noiommu mode can bind with IOMMUFD and
> > > perform IOAS operations, lift restrictions on cdev from VFIO side.
> > > Use cases are documented in Documentation/driver-api/vfio.rst
> > >
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > > ---
> > > v8:
> > > - Fix warning message (Kevin)
> > > v7:
> > > - Avoid treating emulated device as noiommu device (Sashiko)
> > > - Keep platforms w/ GENERIC_ATOMIC64 to use VFIO group noiommu as
> > > before (Sashiko)
> > > - Restore order of group & cdev init for noiommu (Yi)
> > > - Consolidate noiommu helper for cdev & group (Yi)
> > > v6:
> > > - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and
> > > group. Use Kconfig dependency to restrict usages and avoid null
> > > group checks. (Alex & Yi)
> > > - Add CAP_SYS_RAWIO checks for cdev open to maintain security
> > > parity with the group noiommu path. (Alex)
> > > v5:
> > > - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> > > and its dependencies
> > > - Add comment to explain vfio_noiommu conditional definition
> > > (Alex)
> > > - Removed early return for group noiommu in bind/unbind
> > > - Use consistent wording referring to VFIO noiommu mode (Kevin)
> > > - Update unsafe_noiommu Kconfig help text (Kevin)
> > > - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
> > > v4:
> > > - Remove early return in iommufd_bind for noiommu (Alex)
> > > v3:
> > > - Consolidate into fewer patches
> > > v2:
> > > - removed unnecessary device->noiommu set in
> > > iommufd_vfio_compat_ioas_get_id()
> > >
> > > ---
> > > drivers/vfio/Kconfig | 7 ++++---
> > > drivers/vfio/device_cdev.c | 3 +++
> > > drivers/vfio/iommufd.c | 12 ++++++++----
> > > drivers/vfio/vfio.h | 23 +++++++++--------------
> > > drivers/vfio/vfio_main.c | 26 +++++++++++++++++++++++++-
> > > include/linux/vfio.h | 1 +
> > > 6 files changed, 50 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > > index ceae52fd7586..b9d6e1c22aed 100644
> > > --- a/drivers/vfio/Kconfig
> > > +++ b/drivers/vfio/Kconfig
> > > @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> > > The VFIO device cdev is another way for userspace to get
> > > device access. Userspace gets device fd by opening device cdev under
> > > /dev/vfio/devices/vfioX, and then bind the device fd
> > > with an iommufd
> > > - to set up secure DMA context for device access. This
> > > interface does
> > > - not support noiommu.
> > > + to set up secure DMA context for device access.
> > >
> > > If you don't know what to do here, say N.
> > >
> > > @@ -62,7 +61,9 @@ endif
> > >
> > > config VFIO_NOIOMMU
> > > bool "VFIO No-IOMMU support"
> > > - depends on VFIO_GROUP
> > > + depends on VFIO_GROUP || (VFIO_DEVICE_CDEV &&
> > > !GENERIC_ATOMIC64)
> > > + depends on !VFIO_GROUP || VFIO_CONTAINER ||
> > > IOMMUFD_VFIO_CONTAINER
> > > + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV &&
> > > !GENERIC_ATOMIC64
> >
> > Sashiko is warning about this and it seems real, if the config were
> > something like this:
> >
> > CONFIG_GENERIC_ATOMIC64=y
> > CONFIG_VFIO=y
> > CONFIG_VFIO_GROUP=y
> > CONFIG_VFIO_CONTAINER=y
> > CONFIG_VFIO_DEVICE_CDEV=y
> >
> > The result is:
> >
> > # => CONFIG_VFIO_NOIOMMU=y
> > # => CONFIG_IOMMUFD_NOIOMMU is not set
> >
> > Which can result in:
> >
> > /dev/vfio/
> > ├── devices/
> > │ └── vfio0
> > └── noiommu-0
> >
> > The cdev exists without the noiommu- prefix.
> >
> Indeed, I thought about this which is why I put this comment in the code
> "There cannot be a combination of a plain vfio%d cdev name and
> a no-IOMMU group because VFIO_NOIOMMU selects IOMMUFD_NOIOMMU."
> But I missed the select logic.
>
> > Something like this might work
> >
> > config VFIO_NOIOMMU
> > bool "VFIO No-IOMMU support"
> > depends on VFIO_GROUP || (VFIO_DEVICE_CDEV &&
> > !GENERIC_ATOMIC64)
> > + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> > depends on !VFIO_GROUP || VFIO_CONTAINER ||
> > IOMMUFD_VFIO_CONTAINER
> > - select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV &&
> > !GENERIC_ATOMIC64
> > + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
> > help
> > VFIO is built on the ability to isolate devices using
> > the IOMMU.
> >
>
> This will work, but it disables VFIO_NOIOMMU for configs with
> VFIO_DEVICE_CDEV=y and GENERIC_ATOMIC64=y, even though the legacy group
> noiommu path still works there. That can break existing distro configs
> which enable both VFIO_GROUP and VFIO_DEVICE_CDEV, right?
>
> How about add code change to skip noiommu cdev registeration if
> IOMMUFD_NOIOMMU is not enabled? i.e.
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -359,13 +359,21 @@ void vfio_init_device_cdev(struct vfio_device
> *device);
>
> static inline int vfio_device_add(struct vfio_device *device)
> {
> + if (vfio_device_is_noiommu(device) &&
> + !IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))
> + return device_add(&device->device);
> +
> vfio_init_device_cdev(device);
> return cdev_device_add(&device->cdev, &device->device);
> }
>
> static inline void vfio_device_del(struct vfio_device *device)
> {
> - cdev_device_del(&device->cdev, &device->device);
> + if (vfio_device_is_noiommu(device) &&
> + !IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))
> + device_del(&device->device);
> + else
> + cdev_device_del(&device->cdev, &device->device);
> }
> I will also update the documentation to state this behavior:
>
> "The cdev noiommu path requires CONFIG_GENERIC_ATOMIC64=n. When
> CONFIG_VFIO_GROUP=y, CONFIG_VFIO_DEVICE_CDEV=y, and
> CONFIG_GENERIC_ATOMIC64=y, CONFIG_VFIO_NOIOMMU remains selectable for
> the group path, but no noiommu device cdev is registered. Cdev-only
> noiommu is not selectable on those platforms."
I suspect that the set of platforms that set GENERIC_ATOMIC64 and the
set of platforms where we care about distro config compatibility (or
even the existence of a distro) are very nearly disjoint. That said,
your solution is better.
One check though, it looks like cdev_device_{add,del} already degrade
to device_{add,del} when device->devt == 0, so we could maybe simplify
by making vfio_init_device_cdev() conditional and the rest falls out
automatically. That also avoids the device->group traversal to check
noiommu on the del path. Thanks,
Alex
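The simplification suggested above can be illustrated with a user-space model. This is a sketch under assumptions, not the kernel implementation: struct model_dev and all model_* functions are hypothetical stand-ins, the devt value is an arbitrary non-zero placeholder for what MKDEV() would produce, and the only kernel behaviour relied upon is that cdev_device_add() and cdev_device_del() skip their cdev step when devt is 0.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct device / struct cdev state. */
struct model_dev {
	unsigned int devt;  /* 0 means no char device number was assigned */
	bool cdev_added;    /* the cdev_add() step would have run */
	bool device_added;  /* the device_add() step would have run */
};

/* Models cdev_device_add(): the cdev step only runs when devt != 0. */
static int model_cdev_device_add(struct model_dev *d)
{
	if (d->devt)
		d->cdev_added = true;
	d->device_added = true;
	return 0;
}

/* Models cdev_device_del(): mirror image of the add path. */
static void model_cdev_device_del(struct model_dev *d)
{
	d->device_added = false;
	if (d->devt)
		d->cdev_added = false;
}

/*
 * Models a conditional vfio_init_device_cdev(): for a no-IOMMU device
 * without IOMMUFD_NOIOMMU support, leave devt at 0 so no cdev is created.
 */
static void model_init_device_cdev(struct model_dev *d, bool is_noiommu,
				   bool iommufd_noiommu, unsigned int index)
{
	if (is_noiommu && !iommufd_noiommu)
		return;
	d->devt = 1000 + index; /* placeholder, always non-zero */
}

/* Models vfio_device_add() with no special casing of its own. */
static int model_vfio_device_add(struct model_dev *d, bool is_noiommu,
				 bool iommufd_noiommu, unsigned int index)
{
	model_init_device_cdev(d, is_noiommu, iommufd_noiommu, index);
	return model_cdev_device_add(d);
}
```

The point of the design is that only the init step needs to know about no-IOMMU: the add and del paths key off devt alone, so the del path never has to look at device->group.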
* Re: [PATCH v8 5/6] vfio: Enable cdev noiommu mode under iommufd
2026-06-09 20:07 ` Alex Williamson
@ 2026-06-09 21:11 ` Jacob Pan
0 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-06-09 21:11 UTC (permalink / raw)
To: Alex Williamson
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Joerg Roedel, Mostafa Saleh, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, jacob.pan
Hi Alex,
On Tue, 9 Jun 2026 14:07:57 -0600
Alex Williamson <alex@shazbot.org> wrote:
> On Tue, 9 Jun 2026 11:50:58 -0700
> Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
>
> > Hi Alex,
> >
> > On Mon, 8 Jun 2026 17:19:56 -0600
> > Alex Williamson <alex@shazbot.org> wrote:
> >
> > > On Wed, 3 Jun 2026 15:02:10 -0700
> > > Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
> > >
> > > > Now that devices under noiommu mode can bind with IOMMUFD and
> > > > perform IOAS operations, lift restrictions on cdev from VFIO
> > > > side. Use cases are documented in
> > > > Documentation/driver-api/vfio.rst
> > > >
> > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > > > ---
> > > > v8:
> > > > - Fix warning message (Kevin)
> > > > v7:
> > > > - Avoid treating emulated device as noiommu device (Sashiko)
> > > > - Keep platforms w/ GENERIC_ATOMIC64 to use VFIO group
> > > > noiommu as before (Sashiko)
> > > > - Restore order of group & cdev init for noiommu (Yi)
> > > > - Consolidate noiommu helper for cdev & group (Yi)
> > > > v6:
> > > > - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev
> > > > and group. Use Kconfig dependency to restrict usages and avoid
> > > > null group checks. (Alex & Yi)
> > > > - Add CAP_SYS_RAWIO checks for cdev open to maintain security
> > > > parity with the group noiommu path. (Alex)
> > > > v5:
> > > > - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> > > > and its dependencies
> > > > - Add comment to explain vfio_noiommu conditional definition
> > > > (Alex)
> > > > - Removed early return for group noiommu in bind/unbind
> > > > - Use consistent wording referring to VFIO noiommu mode
> > > > (Kevin)
> > > > - Update unsafe_noiommu Kconfig help text (Kevin)
> > > > - Change dev_warn to dev_info for noiommu enabling msg
> > > > (Kevin) v4:
> > > > - Remove early return in iommufd_bind for noiommu (Alex)
> > > > v3:
> > > > - Consolidate into fewer patches
> > > > v2:
> > > > - removed unnecessary device->noiommu set in
> > > > iommufd_vfio_compat_ioas_get_id()
> > > >
> > > > ---
> > > > drivers/vfio/Kconfig | 7 ++++---
> > > > drivers/vfio/device_cdev.c | 3 +++
> > > > drivers/vfio/iommufd.c | 12 ++++++++----
> > > > drivers/vfio/vfio.h | 23 +++++++++--------------
> > > > drivers/vfio/vfio_main.c | 26 +++++++++++++++++++++++++-
> > > > include/linux/vfio.h | 1 +
> > > > 6 files changed, 50 insertions(+), 22 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > > > index ceae52fd7586..b9d6e1c22aed 100644
> > > > --- a/drivers/vfio/Kconfig
> > > > +++ b/drivers/vfio/Kconfig
> > > > @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> > > > The VFIO device cdev is another way for userspace to
> > > > get device access. Userspace gets device fd by opening device
> > > > cdev under /dev/vfio/devices/vfioX, and then bind the device fd
> > > > with an iommufd
> > > > - to set up secure DMA context for device access. This
> > > > interface does
> > > > - not support noiommu.
> > > > + to set up secure DMA context for device access.
> > > >
> > > > If you don't know what to do here, say N.
> > > >
> > > > @@ -62,7 +61,9 @@ endif
> > > >
> > > > config VFIO_NOIOMMU
> > > > bool "VFIO No-IOMMU support"
> > > > - depends on VFIO_GROUP
> > > > + depends on VFIO_GROUP || (VFIO_DEVICE_CDEV &&
> > > > !GENERIC_ATOMIC64)
> > > > + depends on !VFIO_GROUP || VFIO_CONTAINER ||
> > > > IOMMUFD_VFIO_CONTAINER
> > > > + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV &&
> > > > !GENERIC_ATOMIC64
> > >
> > > Sashiko is warning about this and it seems real, if the config
> > > were something like this:
> > >
> > > CONFIG_GENERIC_ATOMIC64=y
> > > CONFIG_VFIO=y
> > > CONFIG_VFIO_GROUP=y
> > > CONFIG_VFIO_CONTAINER=y
> > > CONFIG_VFIO_DEVICE_CDEV=y
> > >
> > > The result is:
> > >
> > > # => CONFIG_VFIO_NOIOMMU=y
> > > # => CONFIG_IOMMUFD_NOIOMMU is not set
> > >
> > > Which can result in:
> > >
> > > /dev/vfio/
> > > ├── devices/
> > > │ └── vfio0
> > > └── noiommu-0
> > >
> > > The cdev exists without the noiommu- prefix.
> > >
> > Indeed, I thought about this which is why I put this comment in the
> > code "There cannot be a combination of a plain vfio%d cdev name and
> > a no-IOMMU group because VFIO_NOIOMMU selects IOMMUFD_NOIOMMU."
> > But I missed the select logic.
> >
> > > Something like this might work
> > >
> > > config VFIO_NOIOMMU
> > > bool "VFIO No-IOMMU support"
> > > depends on VFIO_GROUP || (VFIO_DEVICE_CDEV &&
> > > !GENERIC_ATOMIC64)
> > > + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> > > depends on !VFIO_GROUP || VFIO_CONTAINER ||
> > > IOMMUFD_VFIO_CONTAINER
> > > - select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV &&
> > > !GENERIC_ATOMIC64
> > > + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
> > > help
> > > VFIO is built on the ability to isolate devices using
> > > the IOMMU.
> > >
> >
> > This will work, but it disables VFIO_NOIOMMU for configs with
> > VFIO_DEVICE_CDEV=y and GENERIC_ATOMIC64=y, even though the legacy
> > group noiommu path still works there. That can break existing
> > distro configs which enable both VFIO_GROUP and VFIO_DEVICE_CDEV,
> > right?
> >
> > How about add code change to skip noiommu cdev registeration if
> > IOMMUFD_NOIOMMU is not enabled? i.e.
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -359,13 +359,21 @@ void vfio_init_device_cdev(struct vfio_device
> > *device);
> >
> > static inline int vfio_device_add(struct vfio_device *device)
> > {
> > + if (vfio_device_is_noiommu(device) &&
> > + !IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))
> > + return device_add(&device->device);
> > +
> > vfio_init_device_cdev(device);
> > return cdev_device_add(&device->cdev, &device->device);
> > }
> >
> > static inline void vfio_device_del(struct vfio_device *device)
> > {
> > - cdev_device_del(&device->cdev, &device->device);
> > + if (vfio_device_is_noiommu(device) &&
> > + !IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))
> > + device_del(&device->device);
> > + else
> > + cdev_device_del(&device->cdev, &device->device);
> > }
> > I will also update the documentation to state this behavior:
> >
> > "The cdev noiommu path requires CONFIG_GENERIC_ATOMIC64=n. When
> > CONFIG_VFIO_GROUP=y, CONFIG_VFIO_DEVICE_CDEV=y, and
> > CONFIG_GENERIC_ATOMIC64=y, CONFIG_VFIO_NOIOMMU remains selectable
> > for the group path, but no noiommu device cdev is registered.
> > Cdev-only noiommu is not selectable on those platforms."
>
> I suspect that the Venn diagram of the set of platforms that set
> GENERIC_ATOMIC64 and the set of platforms we care about distro config
> compatibility (or even the existence of a distro) is pretty nearly
> disjoint. That said, your solution is better.
>
> One check though, it looks like cdev_device_{add,del} already degrade
> to device_{add,del} when device->devt == 0, so we could maybe simplify
> by making vfio_init_device_cdev() conditional and the rest falls out
> automatically. That also avoids the device->group traversal to check
> noiommu on the del path. Thanks,
>
Indeed, this is much simpler. I will do the following, as you suggested. Thanks.
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -11,6 +11,10 @@ static dev_t device_devt;

 void vfio_init_device_cdev(struct vfio_device *device)
 {
+	if (vfio_device_is_noiommu(device) &&
+	    !IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU))
+		return;
+