* [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev
@ 2026-05-11 18:41 Jacob Pan
2026-05-11 18:41 ` [PATCH v5 1/9] vfio: Rename VFIO_NOIOMMU to VFIO_GROUP_NOIOMMU Jacob Pan
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
VFIO's unsafe_noiommu_mode has long provided a way for userspace drivers
to operate on platforms lacking a hardware IOMMU. Today, IOMMUFD also
supports No-IOMMU mode for group-based devices under vfio_compat mode.
However, IOMMUFD's native character device (cdev) does not yet support
No-IOMMU mode; adding that support is the purpose of this series.
In summary, we have:
|-------------------------+------+---------------|
| Device access mode | VFIO | IOMMUFD |
|-------------------------+------+---------------|
| group /dev/vfio/$GROUP | Yes | Yes |
|-------------------------+------+---------------|
| cdev /dev/vfio/devices/ | No | This patch |
|-------------------------+------+---------------|
Beyond enabling cdev access under IOMMUFD, this series also addresses the
following deficiencies in the current No-IOMMU mode, pointed out by Jason[1]:
- Devices operating under No-IOMMU mode are limited to device-level UAPI
access, without container or IOAS-level capabilities. Consequently,
user-space drivers lack structured mechanisms for page pinning and often
resort to mlock(), which is less robust than pin_user_pages() used for
devices backed by a physical IOMMU. For example, mlock() does not prevent
page migration.
- There is no architectural mechanism for obtaining physical addresses for
DMA. As a workaround, user-space drivers frequently rely on /proc/pagemap
tricks or hardcoded values.
By allowing noiommu device access to IOMMUFD IOAS and HWPT objects, this
series brings No-IOMMU mode closer to full citizenship within the IOMMU
subsystem. In addition to addressing the two deficiencies mentioned above,
the expectation is that it will also enable No-IOMMU devices to seamlessly
participate in live update sessions via KHO [2].
Furthermore, these devices will use the IOMMUFD-based ownership checking model for
VFIO_DEVICE_PCI_HOT_RESET, eliminating the need for an iommufd_access object
as required in a previous attempt [3].
ChangeLog:
V5:
- Split CONFIG_VFIO_NOIOMMU into CONFIG_VFIO_GROUP_NOIOMMU and
CONFIG_VFIO_CDEV_NOIOMMU so cdev noiommu is independent of
VFIO_GROUP (Alex)
- Add CAP_SYS_RAWIO check for cdev open and bind under noiommu,
security parity with group noiommu (Alex)
- Add IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) guard in
iommufd_device_is_noiommu() to prevent noiommu bind when feature
is disabled
- Add prep patch to tolerate NULL group for cdev noiommu devices
when CONFIG_VFIO_GROUP_NOIOMMU is not set [7/9]
- Rename IOCTL to IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA to be more
specific (Kevin)
- Simplify iommufd_device_is_noiommu, use iommufd_bind_noiommu
helper (Kevin, Yi)
- Move IOMMU cap check under iommufd_bind_iommu() (Yi)
- Fix next_iova exceeding iopt_area_last_iova in GET_PA (Alex)
- Fix const hwpt, copyright date, typo in moved comment (Kevin)
- Add Reviewed-by tags
- Squash noiommu cdev selftest fix into selftest patch
- Drop DSA selftest patch
- Details in each patch changelog.
V4:
- Fix various corner cases pointed out by Sashiko
Details in each patch changelog.
V3:
- Improve error handling [3/10] (Mostafa)
- Simplify vfio_device_is_noiommu logic and merge into [6/10] (Mostafa)
- Add comment to explain the design difference from the legacy noiommu
VFIO code [1/10]
V2:
- Fix build dependency by adding IOMMU_SUPPORT in [8/11]
- Add an optimization to scan beyond the first page for a contiguous
physical address range and return its length instead of a single
page [4/11]
Since RFC[4]:
- Abandon the dummy iommu driver approach; patches 1-3 absorb the
changes into iommufd.
[1] https://lore.kernel.org/linux-iommu/20250603175403.GA407344@nvidia.com/
[2] https://lore.kernel.org/linux-pci/20251027134430.00007e46@linux.microsoft.com/
[3] https://lore.kernel.org/kvm/20230522115751.326947-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/linux-iommu/20251201173012.18371-1-jacob.pan@linux.microsoft.com/
Future cleanup: consolidate all CONFIG_IOMMUFD_NOIOMMU code
(iopt_get_phys, iommufd_ioas_noiommu_get_pa, iommufd_noiommu_ops) into
hwpt_noiommu.c to eliminate #ifdef guards from ioas.c and io_pagetable.c.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
--
2.43.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v5 1/9] vfio: Rename VFIO_NOIOMMU to VFIO_GROUP_NOIOMMU
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 2/9] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
In preparation for adding cdev-based noiommu support under iommufd,
rename CONFIG_VFIO_NOIOMMU to CONFIG_VFIO_GROUP_NOIOMMU to clearly
scope it to the legacy group/container path. Also rename the helper
vfio_device_is_noiommu() to vfio_device_is_group_noiommu() to match.
Add an explicit dependency on VFIO_CONTAINER or IOMMUFD_VFIO_CONTAINER
since the group-based noiommu path is only meaningful when container
support is enabled.
This is a pure rename with no functional change, laying the groundwork
for a separate VFIO_CDEV_NOIOMMU config that enables noiommu mode
through the iommufd cdev interface.
Link: https://lore.kernel.org/linux-iommu/20260416144915.4fe38481@shazbot.org/
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/iommu/iommufd/vfio_compat.c | 4 ++--
drivers/vfio/Kconfig | 6 +++---
drivers/vfio/container.c | 6 +++---
drivers/vfio/group.c | 4 ++--
drivers/vfio/iommufd.c | 6 +++---
drivers/vfio/vfio.h | 12 ++++++------
drivers/vfio/vfio_main.c | 4 ++--
7 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index acb48cdd3b00..51f4870ec2b3 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -286,7 +286,7 @@ static int iommufd_vfio_check_extension(struct iommufd_ctx *ictx,
return !ictx->no_iommu_mode;
case VFIO_NOIOMMU_IOMMU:
- return IS_ENABLED(CONFIG_VFIO_NOIOMMU);
+ return IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU);
case VFIO_DMA_CC_IOMMU:
return iommufd_vfio_cc_iommu(ictx);
@@ -318,7 +318,7 @@ static int iommufd_vfio_set_iommu(struct iommufd_ctx *ictx, unsigned long type)
* other ioctls. We let them keep working but they mostly fail since no
* IOAS should exist.
*/
- if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU &&
+ if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU &&
no_iommu_mode) {
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index ceae52fd7586..39939be2908e 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -60,9 +60,9 @@ config VFIO_IOMMU_SPAPR_TCE
default VFIO
endif
-config VFIO_NOIOMMU
- bool "VFIO No-IOMMU support"
- depends on VFIO_GROUP
+config VFIO_GROUP_NOIOMMU
+ bool "VFIO group No-IOMMU support"
+ depends on VFIO_GROUP && (VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER)
help
VFIO is built on the ability to isolate devices using the IOMMU.
Only with an IOMMU can userspace access to DMA capable devices be
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 003281dbf8bc..9b8cdc5317d8 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -80,7 +80,7 @@ static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
static bool vfio_iommu_driver_allowed(struct vfio_container *container,
const struct vfio_iommu_driver *driver)
{
- if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+ if (!IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU))
return true;
return container->noiommu == (driver->ops == &vfio_noiommu_ops);
}
@@ -583,7 +583,7 @@ int __init vfio_container_init(void)
return ret;
}
- if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
+ if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU)) {
ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
if (ret)
goto err_misc;
@@ -597,7 +597,7 @@ int __init vfio_container_init(void)
void vfio_container_cleanup(void)
{
- if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+ if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU))
vfio_unregister_iommu_driver(&vfio_noiommu_ops);
misc_deregister(&vfio_dev);
mutex_destroy(&vfio.iommu_drivers_lock);
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index b2299e5bc6df..5b9329df04e5 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -137,7 +137,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
iommufd = iommufd_ctx_from_file(fd_file(f));
if (!IS_ERR(iommufd)) {
- if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
+ if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) &&
group->type == VFIO_NO_IOMMU)
ret = iommufd_vfio_compat_set_no_iommu(iommufd);
else
@@ -190,7 +190,7 @@ static int vfio_df_group_open(struct vfio_device_file *df)
vfio_device_group_get_kvm_safe(device);
df->iommufd = device->group->iommufd;
- if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
+ if (df->iommufd && vfio_device_is_group_noiommu(device) && device->open_count == 0) {
/*
* Require no compat ioas to be assigned to proceed. The basic
* statement is that the user cannot have done something that
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a38d262c6028..39079ab27f92 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -26,7 +26,7 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
/* Returns 0 to permit device opening under noiommu mode */
- if (vfio_device_is_noiommu(vdev))
+ if (vfio_device_is_group_noiommu(vdev))
return 0;
return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
@@ -41,7 +41,7 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
lockdep_assert_held(&vdev->dev_set->lock);
/* compat noiommu does not need to do ioas attach */
- if (vfio_device_is_noiommu(vdev))
+ if (vfio_device_is_group_noiommu(vdev))
return 0;
ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
@@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- if (vfio_device_is_noiommu(vdev))
+ if (vfio_device_is_group_noiommu(vdev))
return;
if (vdev->ops->unbind_iommufd)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e4b72e79b7e3..602623cacfc0 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -36,7 +36,7 @@ vfio_allocate_device_file(struct vfio_device *device);
extern const struct file_operations vfio_device_fops;
-#ifdef CONFIG_VFIO_NOIOMMU
+#ifdef CONFIG_VFIO_GROUP_NOIOMMU
extern bool vfio_noiommu __read_mostly;
#else
enum { vfio_noiommu = false };
@@ -112,9 +112,9 @@ bool vfio_device_has_container(struct vfio_device *device);
int __init vfio_group_init(void);
void vfio_group_cleanup(void);
-static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
+static inline bool vfio_device_is_group_noiommu(struct vfio_device *vdev)
{
- return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
+ return IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) &&
vdev->group->type == VFIO_NO_IOMMU;
}
#else
@@ -188,7 +188,7 @@ static inline void vfio_group_cleanup(void)
{
}
-static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
+static inline bool vfio_device_is_group_noiommu(struct vfio_device *vdev)
{
return false;
}
@@ -359,7 +359,7 @@ void vfio_init_device_cdev(struct vfio_device *device);
static inline int vfio_device_add(struct vfio_device *device)
{
/* cdev does not support noiommu device */
- if (vfio_device_is_noiommu(device))
+ if (vfio_device_is_group_noiommu(device))
return device_add(&device->device);
vfio_init_device_cdev(device);
return cdev_device_add(&device->cdev, &device->device);
@@ -367,7 +367,7 @@ static inline int vfio_device_add(struct vfio_device *device)
static inline void vfio_device_del(struct vfio_device *device)
{
- if (vfio_device_is_noiommu(device))
+ if (vfio_device_is_group_noiommu(device))
device_del(&device->device);
else
cdev_device_del(&device->cdev, &device->device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6222376ab6ab..4d940ce6f114 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -54,7 +54,7 @@ static struct vfio {
int fs_count;
} vfio;
-#ifdef CONFIG_VFIO_NOIOMMU
+#ifdef CONFIG_VFIO_GROUP_NOIOMMU
bool vfio_noiommu __read_mostly;
module_param_named(enable_unsafe_noiommu_mode,
vfio_noiommu, bool, S_IRUGO | S_IWUSR);
@@ -353,7 +353,7 @@ static int __vfio_register_dev(struct vfio_device *device,
* restore cache coherency. It has to be checked here because it is only
* valid for cases where we are using iommu groups.
*/
- if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
+ if (type == VFIO_IOMMU && !vfio_device_is_group_noiommu(device) &&
!device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
ret = -EINVAL;
goto err_out;
--
2.43.0
* [PATCH v5 2/9] iommufd: Support a HWPT without an iommu driver for noiommu
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-11 18:41 ` [PATCH v5 1/9] vfio: Rename VFIO_NOIOMMU to VFIO_GROUP_NOIOMMU Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 3/9] iommufd: Move igroup allocation to a function Jacob Pan
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
From: Jason Gunthorpe <jgg@nvidia.com>
Create just a small part of a real iommu driver, enough to slot in
under dev_iommu_ops() and allow iommufd to call
domain_alloc_paging_flags() while failing everything else.
This allows explicitly creating a HWPT under an IOAS.
A new Kconfig option IOMMUFD_NOIOMMU is introduced to differentiate
from the VFIO group/container based noiommu mode.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v5:
- Use the new IOMMUFD_NOIOMMU Kconfig instead of VFIO_NOIOMMU
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Copyright date fix (Kevin)
v4:
- Make iommufd_noiommu_ops const
v3:
- Add comment to explain the design difference from the
legacy noiommu VFIO code.
---
drivers/iommu/iommufd/Kconfig | 13 +++
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/hw_pagetable.c | 15 +++-
drivers/iommu/iommufd/hwpt_noiommu.c | 102 ++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 2 +
5 files changed, 131 insertions(+), 2 deletions(-)
create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 455bac0351f2..74d6ea5b5b3b 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -16,6 +16,19 @@ config IOMMUFD
If you don't know what to do here, say N.
if IOMMUFD
+config IOMMUFD_NOIOMMU
+ bool
+ depends on !GENERIC_ATOMIC64 # IOMMU_PT_AMDV1 requires cmpxchg64
+ select GENERIC_PT
+ select IOMMU_PT
+ select IOMMU_PT_AMDV1
+ help
+ Provides a SW-only IO page table for devices without hardware
+ IOMMU backing. This uses the AMDV1 page table format for
+ IOVA-to-PA lookups only, not for hardware DMA translation.
+
+ Selected by VFIO_CDEV_NOIOMMU. Not intended to be enabled directly.
+
config IOMMUFD_VFIO_CONTAINER
bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
depends on VFIO_GROUP && !VFIO_CONTAINER
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..67207914bb6e 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -10,6 +10,7 @@ iommufd-y := \
vfio_compat.o \
viommu.o
+iommufd-$(CONFIG_IOMMUFD_NOIOMMU) += hwpt_noiommu.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c9..0ae14cd3fc72 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"
+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
+{
+ if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+ return &iommufd_noiommu_ops;
+ if (WARN_ON_ONCE(!idev->dev->iommu))
+ return NULL;
+ return dev_iommu_ops(idev->dev);
+}
+
static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->domain)
@@ -114,11 +123,13 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
IOMMU_HWPT_FAULT_ID_VALID |
IOMMU_HWPT_ALLOC_PASID;
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_paging *hwpt_paging;
struct iommufd_hw_pagetable *hwpt;
int rc;
+ if (!ops)
+ return ERR_PTR(-ENODEV);
lockdep_assert_held(&ioas->mutex);
if ((flags || user_data) && !ops->domain_alloc_paging_flags)
@@ -229,7 +240,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
struct iommufd_device *idev, u32 flags,
const struct iommu_user_data *user_data)
{
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
new file mode 100644
index 000000000000..b1efc4bca880
--- /dev/null
+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/iommu.h>
+#include <linux/generic_pt/iommu.h>
+#include "iommufd_private.h"
+
+static const struct iommu_domain_ops noiommu_amdv1_ops;
+
+struct noiommu_domain {
+ union {
+ struct iommu_domain domain;
+ struct pt_iommu_amdv1 amdv1;
+ };
+ spinlock_t lock;
+};
+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
+
+static void noiommu_change_top(struct pt_iommu *iommu_table,
+ phys_addr_t top_paddr, unsigned int top_level)
+{
+}
+
+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
+{
+ struct noiommu_domain *domain =
+ container_of(iommupt, struct noiommu_domain, amdv1.iommu);
+
+ return &domain->lock;
+}
+
+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
+ .get_top_lock = noiommu_get_top_lock,
+ .change_top = noiommu_change_top,
+};
+
+static struct iommu_domain *
+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ struct pt_iommu_amdv1_cfg cfg = {};
+ struct noiommu_domain *dom;
+ int rc;
+
+ if (flags || user_data)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ cfg.common.hw_max_vasz_lg2 = 64;
+ cfg.common.hw_max_oasz_lg2 = 52;
+ cfg.starting_level = 2;
+ cfg.common.features =
+ (BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
+ BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
+
+ dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+ if (!dom)
+ return ERR_PTR(-ENOMEM);
+
+ spin_lock_init(&dom->lock);
+ dom->amdv1.iommu.nid = NUMA_NO_NODE;
+ dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
+ dom->domain.ops = &noiommu_amdv1_ops;
+
+ /* Use mock page table which is based on AMDV1 */
+ rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
+ if (rc) {
+ kfree(dom);
+ return ERR_PTR(rc);
+ }
+
+ return &dom->domain;
+}
+
+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
+{
+ struct noiommu_domain *domain =
+ container_of(iommu_domain, struct noiommu_domain, domain);
+
+ pt_iommu_deinit(&domain->amdv1.iommu);
+ kfree(domain);
+}
+
+/*
+ * AMDV1 is used as a SW-only page table for no-IOMMU mode, similar to the
+ * iommufd selftest mock page table.
+ * Unlike the VFIO group-container based no-IOMMU mode, where no container
+ * level APIs are supported, this allows IOAS and hwpt objects to exist
+ * without hardware IOMMU support. IOVAs are used only for IOVA-to-PA
+ * lookups not for hardware translation in DMA.
+ *
+ * This is only used with iommufd and cdev-based interfaces and does not
+ * apply to the VFIO group-container based noiommu mode.
+ */
+static const struct iommu_domain_ops noiommu_amdv1_ops = {
+ IOMMU_PT_DOMAIN_OPS(amdv1),
+ .free = noiommu_domain_free,
+};
+
+const struct iommu_ops iommufd_noiommu_ops = {
+ .domain_alloc_paging_flags = noiommu_alloc_paging_flags,
+};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9..2682b5baa6e9 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
refcount_dec(&hwpt->obj.users);
}
+extern const struct iommu_ops iommufd_noiommu_ops;
+
struct iommufd_attach;
struct iommufd_group {
--
2.43.0
* [PATCH v5 3/9] iommufd: Move igroup allocation to a function
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-11 18:41 ` [PATCH v5 1/9] vfio: Rename VFIO_NOIOMMU to VFIO_GROUP_NOIOMMU Jacob Pan
2026-05-11 18:41 ` [PATCH v5 2/9] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 4/9] iommufd: Allow binding to a noiommu device Jacob Pan
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
From: Jason Gunthorpe <jgg@nvidia.com>
So it can be reused in the next patch, which allows binding to a noiommu
device.
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v5:
- Add NULL group to the error handling path of
iommufd_group_setup_msi()
v3:
- New patch
---
drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 170a7005f0bc..d03076fcf3c2 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
return kref_get_unless_zero(&igroup->ref);
}
+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
+ struct iommu_group *group)
+{
+ struct iommufd_group *new_igroup;
+
+ new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+ if (!new_igroup)
+ return ERR_PTR(-ENOMEM);
+
+ kref_init(&new_igroup->ref);
+ mutex_init(&new_igroup->lock);
+ xa_init(&new_igroup->pasid_attach);
+ new_igroup->sw_msi_start = PHYS_ADDR_MAX;
+ /* group reference moves into new_igroup */
+ new_igroup->group = group;
+
+ /*
+ * The ictx is not additionally refcounted here because all objects using
+ * an igroup must put it before their destroy completes.
+ */
+ new_igroup->ictx = ictx;
+ return new_igroup;
+}
+
/*
* iommufd needs to store some more data for each iommu_group, we keep a
* parallel xarray indexed by iommu_group id to hold this instead of putting it
@@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
}
xa_unlock(&ictx->groups);
- new_igroup = kzalloc_obj(*new_igroup);
- if (!new_igroup) {
+ new_igroup = iommufd_alloc_group(ictx, group);
+ if (IS_ERR(new_igroup)) {
iommu_group_put(group);
- return ERR_PTR(-ENOMEM);
+ return new_igroup;
}
- kref_init(&new_igroup->ref);
- mutex_init(&new_igroup->lock);
- xa_init(&new_igroup->pasid_attach);
- new_igroup->sw_msi_start = PHYS_ADDR_MAX;
- /* group reference moves into new_igroup */
- new_igroup->group = group;
-
- /*
- * The ictx is not additionally refcounted here becase all objects using
- * an igroup must put it before their destroy completes.
- */
- new_igroup->ictx = ictx;
-
/*
* We dropped the lock so igroup is invalid. NULL is a safe and likely
* value to assume for the xa_cmpxchg algorithm.
--
2.43.0
* [PATCH v5 4/9] iommufd: Allow binding to a noiommu device
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
` (2 preceding siblings ...)
2026-05-11 18:41 ` [PATCH v5 3/9] iommufd: Move igroup allocation to a function Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
From: Jason Gunthorpe <jgg@nvidia.com>
Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
a dummy IOMMU group for such devices and skipping hwpt operations.
This enables noiommu devices to operate through the same iommufd API as
IOMMU-capable devices.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v5:
- simplify logic and rename iommufd_device_is_noiommu (Kevin, Yi)
- use a helper iommufd_bind_noiommu instead of open coding (Kevin)
- move IOMMU cap check under iommufd_bind_iommu() (Yi)
- reword comments for partial init (Yi)
- misc minor clean up
v4:
- Update the description of the module parameter (Alex)
v3:
- Consolidate into fewer patches
---
drivers/iommu/iommufd/device.c | 148 ++++++++++++++++++++++++---------
1 file changed, 109 insertions(+), 39 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d03076fcf3c2..4d75720432cc 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,6 +23,16 @@ struct iommufd_attach {
struct xarray device_array;
};
+/*
+ * A noiommu device has no IOMMU driver attached regardless of whether it
+ * enters via the cdev path (no iommu_group) or the group path (fake
+ * noiommu iommu_group). In both cases dev->iommu is NULL.
+ */
+static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
+{
+ return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
+}
+
static void iommufd_group_release(struct kref *kref)
{
struct iommufd_group *igroup =
@@ -30,9 +40,11 @@ static void iommufd_group_release(struct kref *kref)
WARN_ON(!xa_empty(&igroup->pasid_attach));
- xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
- NULL, GFP_KERNEL);
- iommu_group_put(igroup->group);
+ if (igroup->group) {
+ xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
+ igroup, NULL, GFP_KERNEL);
+ iommu_group_put(igroup->group);
+ }
mutex_destroy(&igroup->lock);
kfree(igroup);
}
@@ -204,32 +216,19 @@ void iommufd_device_destroy(struct iommufd_object *obj)
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);
- iommu_device_release_dma_owner(idev->dev);
+ if (!idev->igroup)
+ return;
+ if (!iommufd_device_is_noiommu(idev))
+ iommu_device_release_dma_owner(idev->dev);
iommufd_put_group(idev->igroup);
if (!iommufd_selftest_is_mock_dev(idev->dev))
iommufd_ctx_put(idev->ictx);
}
-/**
- * iommufd_device_bind - Bind a physical device to an iommu fd
- * @ictx: iommufd file descriptor
- * @dev: Pointer to a physical device struct
- * @id: Output ID number to return to userspace for this device
- *
- * A successful bind establishes an ownership over the device and returns
- * struct iommufd_device pointer, otherwise returns error pointer.
- *
- * A driver using this API must set driver_managed_dma and must not touch
- * the device until this routine succeeds and establishes ownership.
- *
- * Binding a PCI device places the entire RID under iommufd control.
- *
- * The caller must undo this with iommufd_device_unbind()
- */
-struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
- struct device *dev, u32 *id)
+static int iommufd_bind_iommu(struct iommufd_device *idev)
{
- struct iommufd_device *idev;
+ struct iommufd_ctx *ictx = idev->ictx;
+ struct device *dev = idev->dev;
struct iommufd_group *igroup;
int rc;
@@ -238,11 +237,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
* to restore cache coherency.
*/
if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
igroup = iommufd_get_group(ictx, dev);
if (IS_ERR(igroup))
- return ERR_CAST(igroup);
+ return PTR_ERR(igroup);
/*
* For historical compat with VFIO the insecure interrupt path is
@@ -268,21 +267,80 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
if (rc)
goto out_group_put;
+ /* igroup refcount moves into iommufd_device */
+ idev->igroup = igroup;
+ idev->enforce_cache_coherency =
+ device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+ return 0;
+
+out_group_put:
+ iommufd_put_group(igroup);
+ return rc;
+}
+
+/*
+ * Noiommu devices have no real IOMMU group. Create a dummy igroup so that
+ * internal code paths that expect idev->igroup to be present still work.
+ * A NULL igroup->group distinguishes this from a real IOMMU-backed group.
+ */
+static int iommufd_bind_noiommu(struct iommufd_device *idev)
+{
+ struct iommufd_group *igroup;
+
+ igroup = iommufd_alloc_group(idev->ictx, NULL);
+ if (IS_ERR(igroup))
+ return PTR_ERR(igroup);
+ idev->igroup = igroup;
+ return 0;
+}
+
+/**
+ * iommufd_device_bind - Bind a physical device to an iommu fd
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ * @id: Output ID number to return to userspace for this device
+ *
+ * A successful bind establishes an ownership over the device and returns
+ * struct iommufd_device pointer, otherwise returns error pointer.
+ *
+ * A driver using this API must set driver_managed_dma and must not touch
+ * the device until this routine succeeds and establishes ownership.
+ *
+ * Binding a PCI device places the entire RID under iommufd control.
+ *
+ * The caller must undo this with iommufd_device_unbind()
+ */
+struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
+ struct device *dev, u32 *id)
+{
+ struct iommufd_device *idev;
+ int rc;
+
idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
- if (IS_ERR(idev)) {
- rc = PTR_ERR(idev);
- goto out_release_owner;
- }
+ if (IS_ERR(idev))
+ return idev;
+
idev->ictx = ictx;
+ idev->dev = dev;
+
+ if (!iommufd_device_is_noiommu(idev)) {
+ rc = iommufd_bind_iommu(idev);
+ if (rc)
+ goto err_out;
+ } else {
+ rc = iommufd_bind_noiommu(idev);
+ if (rc)
+ goto err_out;
+ }
+
+ /*
+ * Take a ctx reference after bind succeeds. This must happen here
+ * so that iommufd_device_destroy() can handle partial initialization
+ */
if (!iommufd_selftest_is_mock_dev(dev))
iommufd_ctx_get(ictx);
- idev->dev = dev;
- idev->enforce_cache_coherency =
- device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
/* The calling driver is a user until iommufd_device_unbind() */
refcount_inc(&idev->obj.users);
- /* igroup refcount moves into iommufd_device */
- idev->igroup = igroup;
/*
* If the caller fails after this success it must call
@@ -294,11 +352,14 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
*id = idev->obj.id;
return idev;
-out_release_owner:
- iommu_device_release_dma_owner(dev);
-out_group_put:
- iommufd_put_group(igroup);
+err_out:
+ /*
+ * iommufd_device_destroy() handles partially initialized idev,
+ * so iommufd_object_abort_and_destroy() is safe to call here.
+ */
+ iommufd_object_abort_and_destroy(ictx, &idev->obj);
return ERR_PTR(rc);
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
@@ -512,6 +573,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
struct iommufd_attach_handle *handle;
int rc;
+ if (iommufd_device_is_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -559,6 +623,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
{
struct iommufd_attach_handle *handle;
+ if (iommufd_device_is_noiommu(idev))
+ return;
+
handle = iommufd_device_get_attach_handle(idev, pasid);
if (pasid == IOMMU_NO_PASID)
iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
@@ -577,6 +644,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
struct iommufd_attach_handle *handle, *old_handle;
int rc;
+ if (iommufd_device_is_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -652,7 +722,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
goto err_release_devid;
}
- if (attach_resv) {
+ if (attach_resv && !iommufd_device_is_noiommu(idev)) {
rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
if (rc)
goto err_release_devid;
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
` (3 preceding siblings ...)
2026-05-11 18:41 ` [PATCH v5 4/9] iommufd: Allow binding to a noiommu device Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:58 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 6/9] vfio/group: Add VFIO_CDEV_NOIOMMU Kconfig and tolerate NULL group Jacob Pan
` (3 subsequent siblings)
8 siblings, 1 reply; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
To support no-IOMMU mode where userspace drivers perform unsafe DMA
using physical addresses, introduce a new API to retrieve the
physical address of a user-allocated DMA buffer that has been mapped to
an IOVA via IOAS. The mapping is backed by SW-only I/O page tables
maintained by the generic IOMMUPT framework.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v5:
- Add header stubs for iopt_get_phys() and
iommufd_ioas_noiommu_get_pa() to avoid ifdef at call sites (Kevin)
v4:
- Fix ioctl return type (Yi Liu)
v2:
- New patch
---
drivers/iommu/iommufd/io_pagetable.c | 62 +++++++++++++++++++++++++
drivers/iommu/iommufd/ioas.c | 30 ++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 18 +++++++
drivers/iommu/iommufd/main.c | 3 ++
include/uapi/linux/iommufd.h | 25 ++++++++++
5 files changed, 138 insertions(+)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 24d4917105d9..1ee7c8e6408c 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -859,6 +859,68 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
}
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length)
+{
+ struct iopt_area *area;
+ u64 tmp_length = 0;
+ u64 tmp_paddr = 0;
+ int rc = 0;
+
+ down_read(&iopt->iova_rwsem);
+ area = iopt_area_iter_first(iopt, iova, iova);
+ if (!area || !area->pages) {
+ rc = -ENOENT;
+ goto unlock_exit;
+ }
+
+ if (!area->storage_domain ||
+ area->storage_domain->owner != &iommufd_noiommu_ops) {
+ rc = -EOPNOTSUPP;
+ goto unlock_exit;
+ }
+
+ *paddr = iommu_iova_to_phys(area->storage_domain, iova);
+ if (!*paddr) {
+ rc = -EINVAL;
+ goto unlock_exit;
+ }
+
+ tmp_length = PAGE_SIZE - offset_in_page(iova);
+ tmp_paddr = *paddr;
+ /*
+ * Scan the domain for the contiguous physical address length so that
+ * userspace search can be optimized for fewer ioctls.
+ */
+ while (iova < iopt_area_last_iova(area)) {
+ unsigned long next_iova;
+ u64 next_paddr;
+
+ if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
+ break;
+
+ if (next_iova > iopt_area_last_iova(area))
+ break;
+
+ next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
+
+ if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
+ break;
+
+ iova = next_iova;
+ tmp_paddr += PAGE_SIZE;
+ tmp_length += PAGE_SIZE;
+ }
+ *length = tmp_length;
+
+unlock_exit:
+ up_read(&iopt->iova_rwsem);
+
+ return rc;
+}
+#endif
+
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
{
/* If the IOVAs are empty then unmap all succeeds */
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index fed06c2b728e..666440e32c9e 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -375,6 +375,36 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
return rc;
}
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_ioas_noiommu_get_pa *cmd = ucmd->cmd;
+ struct iommufd_ioas *ioas;
+ int rc;
+
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
+ if (cmd->flags || cmd->__reserved)
+ return -EOPNOTSUPP;
+
+ ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+ if (IS_ERR(ioas))
+ return PTR_ERR(ioas);
+
+ rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
+ &cmd->out_length);
+ if (rc)
+ goto out_put;
+
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+ iommufd_put_object(ucmd->ictx, &ioas->obj);
+
+ return rc;
+}
+#endif
+
static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
struct xarray *ioas_list)
{
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2682b5baa6e9..13f1506d8066 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
unsigned long length, unsigned long *unmapped);
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length);
+#else
+static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova,
+ u64 *paddr, u64 *length)
+{
+ return -EOPNOTSUPP;
+}
+#endif
int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
struct iommu_domain *domain,
@@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd);
+#else
+static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+ return -EOPNOTSUPP;
+}
+#endif
int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..3b4192d70570 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -424,6 +424,7 @@ union ucmd_buffer {
struct iommu_ioas_alloc alloc;
struct iommu_ioas_allow_iovas allow_iovas;
struct iommu_ioas_copy ioas_copy;
+ struct iommu_ioas_noiommu_get_pa noiommu_get_pa;
struct iommu_ioas_iova_ranges iova_ranges;
struct iommu_ioas_map map;
struct iommu_ioas_unmap unmap;
@@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova),
IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
struct iommu_ioas_map_file, iova),
+ IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa,
+ struct iommu_ioas_noiommu_get_pa, out_phys),
IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
length),
IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e998dfbd6960..7df366d161f1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+ IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA = 0x95,
};
/**
@@ -219,6 +220,30 @@ struct iommu_ioas_map {
};
#define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
+/**
+ * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA)
+ * @size: sizeof(struct iommu_ioas_noiommu_get_pa)
+ * @flags: Reserved, must be 0 for now
+ * @ioas_id: IOAS ID to query IOVA to PA mapping from
+ * @__reserved: Must be 0
+ * @iova: IOVA to query
+ * @out_length: Number of physically contiguous bytes starting at @out_phys
+ * @out_phys: Output physical address the IOVA maps to
+ *
+ * Query the physical address backing @iova, which must already be mapped,
+ * along with the length of the physically contiguous region starting there.
+ * For noiommu devices doing unsafe DMA only.
+ */
+struct iommu_ioas_noiommu_get_pa {
+ __u32 size;
+ __u32 flags;
+ __u32 ioas_id;
+ __u32 __reserved;
+ __aligned_u64 iova;
+ __aligned_u64 out_length;
+ __aligned_u64 out_phys;
+};
+#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA)
+
/**
* struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
* @size: sizeof(struct iommu_ioas_map_file)
--
2.43.0
* [PATCH v5 6/9] vfio/group: Add VFIO_CDEV_NOIOMMU Kconfig and tolerate NULL group
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
` (4 preceding siblings ...)
2026-05-11 18:41 ` [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
Add a new CONFIG_VFIO_CDEV_NOIOMMU option, independent of
CONFIG_VFIO_GROUP, to support noiommu mode via the cdev interface.
Since CONFIG_VFIO_GROUP can be enabled while CONFIG_VFIO_GROUP_NOIOMMU
is not, guard the noiommu group allocation in vfio_group_find_or_alloc()
with IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) to prevent creating spurious
/dev/vfio/noiommu-N group files when only cdev noiommu is configured.
For cdev noiommu devices that have no group, let vfio_device_set_group()
return success with a NULL group pointer and add null guards in group
functions that may be called during device lifecycle. These guards are
contained within group.c and are dead code for IOMMU-enabled devices
where device->group is always non-NULL.
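The IS_ENABLED() guard used above relies on Kconfig defining CONFIG_* symbols to 1 when set and leaving them undefined otherwise. A self-contained sketch of the preprocessor trick, simplified to the built-in case and mirroring include/linux/kconfig.h, shows how an unset option evaluates to a compile-time 0, letting the compiler discard the guarded branch:

```c
#include <assert.h>

/* Simplified version of the kernel's IS_ENABLED() from kconfig.h.
 * If the option macro is defined to 1, __ARG_PLACEHOLDER_##val pastes
 * to "0," and __take_second_arg() picks the 1; for an undefined option
 * the paste produces a junk token and the trailing 0 is picked. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) ___is_defined(option)

/* Pretend .config enabled the group option but not the cdev one. */
#define CONFIG_VFIO_GROUP_NOIOMMU 1
/* CONFIG_VFIO_CDEV_NOIOMMU deliberately left undefined */
```

Because the result is an integer constant expression, `if (IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) && ...)` compiles down to dead code when the option is off, while still type-checking both branches.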
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/vfio/Kconfig | 17 +++++++++++++++++
drivers/vfio/group.c | 31 +++++++++++++++++++++++++++++--
2 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 39939be2908e..b1b1633412a9 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -75,6 +75,23 @@ config VFIO_GROUP_NOIOMMU
If you don't know what to do here, say N.
+config VFIO_CDEV_NOIOMMU
+ bool "VFIO cdev No-IOMMU support"
+ depends on VFIO_DEVICE_CDEV
+ select IOMMUFD_NOIOMMU
+ help
+ VFIO cdev no-IOMMU mode enables device access via the cdev
+ interface without hardware IOMMU backing. This relies on
+ IOMMUFD_NOIOMMU to provide a SW-only IO page table for
+ IOVA-to-PA lookups.
+
+ Use of this mode will result in an unsupportable kernel and
+ will therefore taint the kernel. Device assignment to virtual
+ machines is also not possible with this mode since there is
+ no IOMMU to provide DMA translation.
+
+ If you don't know what to do here, say N.
+
config VFIO_VIRQFD
bool
select EVENTFD
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 5b9329df04e5..c8a75ee28f20 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -386,6 +386,9 @@ int vfio_device_block_group(struct vfio_device *device)
struct vfio_group *group = device->group;
int ret = 0;
+ if (!group)
+ return 0;
+
mutex_lock(&group->group_lock);
if (group->opened_file) {
ret = -EBUSY;
@@ -403,6 +406,9 @@ void vfio_device_unblock_group(struct vfio_device *device)
{
struct vfio_group *group = device->group;
+ if (!group)
+ return;
+
mutex_lock(&group->group_lock);
group->cdev_device_open_cnt--;
mutex_unlock(&group->group_lock);
@@ -641,7 +647,8 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
struct vfio_group *group;
iommu_group = iommu_group_get(dev);
- if (!iommu_group && vfio_noiommu) {
+ if (!iommu_group && IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) &&
+ vfio_noiommu) {
/*
* With noiommu enabled, create an IOMMU group for devices that
* don't already have one, implying no IOMMU hardware/driver
@@ -686,8 +693,19 @@ int vfio_device_set_group(struct vfio_device *device,
else
group = vfio_noiommu_group_alloc(device->dev, type);
- if (IS_ERR(group))
+ if (IS_ERR(group)) {
+ /*
+ * Cdev noiommu devices don't need a vfio_group. When
+ * CONFIG_VFIO_GROUP_NOIOMMU is not set, the group alloc
+ * above returns -EINVAL for devices without an IOMMU.
+ * That's fine; a NULL group is expected and iommufd
+ * handles these devices directly.
+ */
+ if (IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU) &&
+ vfio_noiommu && !device->dev->iommu)
+ return 0;
return PTR_ERR(group);
+ }
/* Our reference on group is moved to the device */
device->group = group;
@@ -699,6 +717,9 @@ void vfio_device_remove_group(struct vfio_device *device)
struct vfio_group *group = device->group;
struct iommu_group *iommu_group;
+ if (!group)
+ return;
+
if (group->type == VFIO_NO_IOMMU || group->type == VFIO_EMULATED_IOMMU)
iommu_group_remove_device(device->dev);
@@ -742,6 +763,8 @@ void vfio_device_remove_group(struct vfio_device *device)
void vfio_device_group_register(struct vfio_device *device)
{
+ if (!device->group)
+ return;
mutex_lock(&device->group->device_lock);
list_add(&device->group_next, &device->group->device_list);
mutex_unlock(&device->group->device_lock);
@@ -749,6 +772,8 @@ void vfio_device_group_register(struct vfio_device *device)
void vfio_device_group_unregister(struct vfio_device *device)
{
+ if (!device->group)
+ return;
mutex_lock(&device->group->device_lock);
list_del(&device->group_next);
mutex_unlock(&device->group->device_lock);
@@ -786,6 +811,8 @@ void vfio_device_group_unuse_iommu(struct vfio_device *device)
bool vfio_device_has_container(struct vfio_device *device)
{
+ if (!device->group)
+ return false;
return device->group->container;
}
--
2.43.0
* [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
` (5 preceding siblings ...)
2026-05-11 18:41 ` [PATCH v5 6/9] vfio/group: Add VFIO_CDEV_NOIOMMU Kconfig and tolerate NULL group Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 8/9] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
2026-05-11 18:41 ` [PATCH v5 9/9] Documentation: Update VFIO NOIOMMU mode Jacob Pan
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
Now that devices under noiommu mode can bind with IOMMUFD and perform
IOAS operations, lift restrictions on cdev from VFIO side.
Remove the vfio_device_is_group_noiommu() early returns in
vfio_df_iommufd_bind() and vfio_df_iommufd_unbind() so that both
group and cdev noiommu devices go through the standard iommufd bind
path. This is safe because iommufd_device_bind() now handles noiommu
devices via its own iommufd_device_is_noiommu() check.
Add CAP_SYS_RAWIO checks for cdev open and bind under noiommu to
maintain security parity with the group noiommu path.
Noiommu cdevs are explicitly named with a noiommu- prefix so the unsafe
mode is visible to userspace, e.g.:
/dev/vfio/
|-- devices
| `-- noiommu-vfio0
`-- vfio
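Userspace that enumerates /dev/vfio/devices/ therefore has to accept both name forms. A minimal matcher can look like this (the helper name is hypothetical, not part of this series):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: accept both "vfioN" and "noiommu-vfioN" cdev
 * names, rejecting anything else found in /dev/vfio/devices/. */
static bool is_vfio_cdev_name(const char *name)
{
	if (!strncmp(name, "noiommu-", 8))
		name += 8;	/* strip the prefix, then match as usual */
	return !strncmp(name, "vfio", 4) && name[4] >= '0' && name[4] <= '9';
}
```

This mirrors the strncmp() check the selftest library gains later in this series, with an extra digit check to filter out directory entries that merely start with "vfio".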
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v5:
- Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
and its dependencies
- Add comment to explain vfio_noiommu conditional definition (Alex)
- Removed early return for group noiommu in bind/unbind
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Update unsafe_noiommu Kconfig help text (Kevin)
- Change dev_warn to dev_info for noiommu enabling msg (Kevin)
v4:
- Remove early return in iommufd_bind for noiommu (Alex)
v3:
- Consolidate into fewer patches
v2:
- removed unnecessary device->noiommu set in
iommufd_vfio_compat_ioas_get_id()
---
drivers/vfio/Kconfig | 3 +--
drivers/vfio/device_cdev.c | 10 ++++++++++
drivers/vfio/iommufd.c | 7 -------
drivers/vfio/vfio.h | 22 ++++++++++++++--------
drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++-----
include/linux/vfio.h | 1 +
6 files changed, 46 insertions(+), 22 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index b1b1633412a9..b1a260b6054c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
The VFIO device cdev is another way for userspace to get device
access. Userspace gets device fd by opening device cdev under
/dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
- to set up secure DMA context for device access. This interface does
- not support noiommu.
+ to set up secure DMA context for device access.
If you don't know what to do here, say N.
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 54abf312cf04..46a808244398 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
struct vfio_device_file *df;
int ret;
+ if (device->noiommu && !capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
/* Paired with the put in vfio_device_fops_release() */
if (!vfio_device_try_get_registration(device))
return -ENODEV;
@@ -110,6 +113,13 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
if (df->group)
return -EINVAL;
+ /*
+ * CAP_SYS_RAWIO is already checked at cdev open, recheck here
+ * in case the fd was passed to a less privileged process.
+ */
+ if (device->noiommu && !capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
ret = vfio_device_block_group(device);
if (ret)
return ret;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 39079ab27f92..bc80056c74d3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -25,10 +25,6 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- /* Returns 0 to permit device opening under noiommu mode */
- if (vfio_device_is_group_noiommu(vdev))
- return 0;
-
return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
}
@@ -58,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- if (vfio_device_is_group_noiommu(vdev))
- return;
-
if (vdev->ops->unbind_iommufd)
vdev->ops->unbind_iommufd(vdev);
}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 602623cacfc0..ac79b1a2fce9 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -36,7 +36,7 @@ vfio_allocate_device_file(struct vfio_device *device);
extern const struct file_operations vfio_device_fops;
-#ifdef CONFIG_VFIO_GROUP_NOIOMMU
+#if IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) || IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)
extern bool vfio_noiommu __read_mostly;
#else
enum { vfio_noiommu = false };
@@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
static inline int vfio_device_add(struct vfio_device *device)
{
- /* cdev does not support noiommu device */
- if (vfio_device_is_group_noiommu(device))
- return device_add(&device->device);
vfio_init_device_cdev(device);
return cdev_device_add(&device->cdev, &device->device);
}
static inline void vfio_device_del(struct vfio_device *device)
{
- if (vfio_device_is_group_noiommu(device))
- device_del(&device->device);
- else
- cdev_device_del(&device->cdev, &device->device);
+ cdev_device_del(&device->cdev, &device->device);
}
int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
@@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void)
}
#endif /* CONFIG_VFIO_DEVICE_CDEV */
+#if IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)
+static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
+{
+ return vdev->noiommu;
+}
+#else
+static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
+{
+ return false;
+}
+#endif
+
#if IS_ENABLED(CONFIG_VFIO_VIRQFD)
int __init vfio_virqfd_init(void);
void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 4d940ce6f114..1ba0f282d746 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -54,7 +54,7 @@ static struct vfio {
int fs_count;
} vfio;
-#ifdef CONFIG_VFIO_GROUP_NOIOMMU
+#if IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) || IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)
bool vfio_noiommu __read_mostly;
module_param_named(enable_unsafe_noiommu_mode,
vfio_noiommu, bool, S_IRUGO | S_IWUSR);
@@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
return ret;
}
+static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
+{
+ if (IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU) && vfio_noiommu && !device->dev->iommu) {
+ device->noiommu = true;
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+ dev_warn(device->dev,
+ "Adding kernel taint for vfio-noiommu cdev on device\n");
+ }
+
+ /* Make noiommu mode explicit to userspace in the cdev node name */
+ return dev_set_name(&device->device, "%svfio%d",
+ device->noiommu ? "noiommu-" : "", device->index);
+}
+
static int __vfio_register_dev(struct vfio_device *device,
enum vfio_group_type type)
{
@@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *device,
if (!device->dev_set)
vfio_assign_device_set(device, device);
- ret = dev_set_name(&device->device, "vfio%d", device->index);
+ ret = vfio_device_set_group(device, type);
if (ret)
return ret;
- ret = vfio_device_set_group(device, type);
+ ret = vfio_device_set_noiommu_and_name(device);
if (ret)
- return ret;
+ goto err_out;
/*
* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
* restore cache coherency. It has to be checked here because it is only
* valid for cases where we are using iommu groups.
*/
- if (type == VFIO_IOMMU && !vfio_device_is_group_noiommu(device) &&
+ if (type == VFIO_IOMMU && !(vfio_device_is_group_noiommu(device) ||
+ vfio_device_is_cdev_noiommu(device)) &&
!device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
ret = -EINVAL;
goto err_out;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 31b826efba00..45f08986359e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -74,6 +74,7 @@ struct vfio_device {
u8 iommufd_attached:1;
#endif
u8 cdev_opened:1;
+ u8 noiommu:1;
/*
* debug_root is a static property of the vfio_device
* which must be set prior to registering the vfio_device.
--
2.43.0
* [PATCH v5 8/9] selftests/vfio: Add iommufd noiommu mode selftest for cdev
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
` (6 preceding siblings ...)
2026-05-11 18:41 ` [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 9/9] Documentation: Update VFIO NOIOMMU mode Jacob Pan
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
Add a comprehensive selftest for VFIO device operations with iommufd in
noiommu mode. The tests cover:
- Device binding to iommufd
- IOAS (I/O Address Space) allocation and mapping with a dummy IOVA
- Retrieving the PA for a dummy IOVA
- Device attach/detach operations
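Since the selftest declares the ioctl payload as a local anonymous struct, it is worth sanity-checking the layout it assumes for struct iommu_ioas_noiommu_get_pa from patch 5: four 32-bit words followed by three naturally aligned 64-bit words, 40 bytes with no implicit padding. A standalone mirror using stdint types in place of __u32/__aligned_u64:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standalone mirror of struct iommu_ioas_noiommu_get_pa (patch 5/9). */
struct ioas_noiommu_get_pa {
	uint32_t size;		/* sizeof(struct ...), set by userspace */
	uint32_t flags;		/* must be 0 */
	uint32_t ioas_id;	/* IOAS to query */
	uint32_t reserved;	/* must be 0 */
	uint64_t iova;		/* input: IOVA to look up */
	uint64_t out_length;	/* output: contiguous bytes from out_phys */
	uint64_t out_phys;	/* output: physical address */
};
```

If a local copy drifts from the uapi header, the kernel's size check in the ioctl dispatch will reject the call, so keeping this layout assertion near the test helps catch mismatches at build time.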
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v4:
- squash DSA specific selftest changes
v2:
- New selftest for generic noiommu bind/unbind
---
tools/testing/selftests/vfio/Makefile | 1 +
.../lib/include/libvfio/vfio_pci_device.h | 16 +
.../selftests/vfio/lib/vfio_pci_device.c | 5 +-
.../vfio/vfio_iommufd_noiommu_test.c | 567 ++++++++++++++++++
4 files changed, 587 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 0684932d91bf..c9c02fdfd946 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -9,6 +9,7 @@ CFLAGS = $(KHDR_INCLUDES)
TEST_GEN_PROGS += vfio_dma_mapping_test
TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
TEST_GEN_PROGS += vfio_iommufd_setup_test
+TEST_GEN_PROGS += vfio_iommufd_noiommu_test
TEST_GEN_PROGS += vfio_pci_device_test
TEST_GEN_PROGS += vfio_pci_device_init_perf_test
TEST_GEN_PROGS += vfio_pci_driver_test
diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
index 2858885a89bb..6218c91776b3 100644
--- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
+++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
@@ -122,4 +122,20 @@ static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
const char *vfio_pci_get_cdev_path(const char *bdf);
+static inline bool vfio_pci_noiommu_mode_enabled(void)
+{
+ char buf[8] = {};
+ int fd, n;
+
+ fd = open("/sys/module/vfio/parameters/enable_unsafe_noiommu_mode",
+ O_RDONLY);
+ if (fd < 0)
+ return false;
+
+ n = read(fd, buf, sizeof(buf) - 1);
+ close(fd);
+
+ return n > 0 && buf[0] == 'Y';
+}
+
#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index fc75e04ef010..1a91658e812d 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -308,8 +308,9 @@ const char *vfio_pci_get_cdev_path(const char *bdf)
VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n", dir_path);
while ((entry = readdir(dir)) != NULL) {
- /* Find the file that starts with "vfio" */
- if (strncmp("vfio", entry->d_name, 4))
+ /* Find the file that starts with "vfio" or "noiommu-vfio" */
+ if (strncmp("vfio", entry->d_name, 4) &&
+ strncmp("noiommu-vfio", entry->d_name, 12))
continue;
snprintf(cdev_path, PATH_MAX, "/dev/vfio/devices/%s", entry->d_name);
diff --git a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
new file mode 100644
index 000000000000..2df7cf40daff
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
@@ -0,0 +1,567 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VFIO iommufd NoIOMMU Mode Selftest
+ *
+ * Tests VFIO device operations with iommufd in noiommu mode, including:
+ * - Device binding to iommufd
+ * - IOAS (I/O Address Space) allocation and management
+ * - Device attach/detach to IOAS
+ * - Memory mapping in IOAS
+ * - Device info queries and reset
+ */
+
+#include <linux/limits.h>
+#include <linux/vfio.h>
+#include <linux/iommufd.h>
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <dirent.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <libvfio.h>
+#include "kselftest_harness.h"
+
+static const char iommu_dev_path[] = "/dev/iommu";
+static const char *cdev_path;
+
+static char *vfio_noiommu_get_device_id(const char *bdf)
+{
+ char *path = NULL;
+ char *vfio_id = NULL;
+ struct dirent *dentry;
+ DIR *dp;
+
+ if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev", bdf) < 0)
+ return NULL;
+
+ dp = opendir(path);
+ if (!dp) {
+ free(path);
+ return NULL;
+ }
+
+ while ((dentry = readdir(dp)) != NULL) {
+ if (strncmp("noiommu-vfio", dentry->d_name, 12) == 0) {
+ vfio_id = strdup(dentry->d_name);
+ break;
+ }
+ }
+
+ closedir(dp);
+ free(path);
+ return vfio_id;
+}
+
+static char *vfio_noiommu_get_cdev_path(const char *bdf)
+{
+ char *vfio_id = vfio_noiommu_get_device_id(bdf);
+ char *cdev = NULL;
+
+ if (vfio_id) {
+ if (asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id) < 0)
+ cdev = NULL;
+ free(vfio_id);
+ }
+ }
+ return cdev;
+}
+
+static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd)
+{
+ struct vfio_device_bind_iommufd bind_args = {
+ .argsz = sizeof(bind_args),
+ .iommufd = iommufd,
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_args);
+}
+
+static int vfio_device_get_info_ioctl(int cdev_fd,
+ struct vfio_device_info *info)
+{
+ info->argsz = sizeof(*info);
+ return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info);
+}
+
+static int vfio_device_ioas_alloc_ioctl(int iommufd,
+ struct iommu_ioas_alloc *alloc_args)
+{
+ alloc_args->size = sizeof(*alloc_args);
+ alloc_args->flags = 0;
+ return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args);
+}
+
+static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32 pt_id)
+{
+ struct vfio_device_attach_iommufd_pt attach_args = {
+ .argsz = sizeof(attach_args),
+ .pt_id = pt_id,
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_args);
+}
+
+static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd)
+{
+ struct vfio_device_detach_iommufd_pt detach_args = {
+ .argsz = sizeof(detach_args),
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_args);
+}
+
+static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t index,
+ struct vfio_region_info *info)
+{
+ info->argsz = sizeof(*info);
+ info->index = index;
+ return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info);
+}
+
+static int vfio_device_reset_ioctl(int cdev_fd)
+{
+ return ioctl(cdev_fd, VFIO_DEVICE_RESET);
+}
+
+static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+ size_t length, bool hugepages)
+{
+ struct iommu_ioas_map map_args = {
+ .size = sizeof(map_args),
+ .ioas_id = ioas_id,
+ .iova = iova,
+ .length = length,
+ .flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_FIXED_IOVA,
+ };
+ void *pages;
+ int ret;
+
+ /* Allocate test pages */
+ if (hugepages)
+ pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+ else
+ pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (pages == MAP_FAILED) {
+ printf("mmap failed for length 0x%lx\n", (unsigned long)length);
+ return -ENOMEM;
+ }
+
+ /* Set up page pointer for mapping */
+ map_args.user_va = (uintptr_t)pages;
+
+ printf(" ioas_map_pages: ioas_id=%u, iova=0x%lx, length=0x%lx, user_va=%p\n",
+ ioas_id, (unsigned long)iova, (unsigned long)length, pages);
+
+ /* Map into IOAS */
+ ret = ioctl(iommufd, IOMMU_IOAS_MAP, &map_args);
+ if (ret != 0)
+ printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret, strerror(errno));
+ else
+ printf(" IOMMU_IOAS_MAP succeeded, IOVA=0x%lx\n", (unsigned long)map_args.iova);
+
+ munmap(pages, length);
+ return ret;
+}
+
+static int ioas_unmap_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+ size_t length)
+{
+ struct iommu_ioas_unmap unmap_args = {
+ .size = sizeof(unmap_args),
+ .ioas_id = ioas_id,
+ .iova = iova,
+ .length = length,
+ };
+
+ return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args);
+}
+
+static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id)
+{
+ struct iommu_destroy destroy_args = {
+ .size = sizeof(destroy_args),
+ .id = ioas_id,
+ };
+
+ return ioctl(iommufd, IOMMU_DESTROY, &destroy_args);
+}
+
+static int ioas_noiommu_get_pa_ioctl(int iommufd, uint32_t ioas_id, uint64_t iova,
+ uint64_t *phys_out, uint64_t *length_out)
+{
+ struct {
+ __u32 size;
+ __u32 flags;
+ __u32 ioas_id;
+ __u32 __reserved;
+ __u64 iova;
+ __u64 out_length;
+ __u64 out_phys;
+ } get_pa = {
+ .size = sizeof(get_pa),
+ .flags = 0,
+ .ioas_id = ioas_id,
+ .iova = iova,
+ };
+
+ printf(" ioas_noiommu_get_pa_ioctl: ioas_id=%u, iova=0x%lx\n",
+ ioas_id, (unsigned long)iova);
+
+ if (ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, &get_pa) != 0) {
+ printf(" IOMMU_IOAS_NOIOMMU_GET_PA failed: %s (errno=%d)\n",
+ strerror(errno), errno);
+ return -1;
+ }
+
+ printf(" IOMMU_IOAS_NOIOMMU_GET_PA succeeded: PA=0x%lx, length=0x%lx\n",
+ (unsigned long)get_pa.out_phys, (unsigned long)get_pa.out_length);
+
+ if (phys_out)
+ *phys_out = get_pa.out_phys;
+ if (length_out)
+ *length_out = get_pa.out_length;
+
+ return 0;
+}
+
+FIXTURE(vfio_noiommu) {
+ int cdev_fd;
+ int iommufd;
+};
+
+FIXTURE_SETUP(vfio_noiommu)
+{
+ ASSERT_LE(0, (self->cdev_fd = open(cdev_path, O_RDWR, 0)));
+ ASSERT_LE(0, (self->iommufd = open(iommu_dev_path, O_RDWR, 0)));
+}
+
+FIXTURE_TEARDOWN(vfio_noiommu)
+{
+ if (self->cdev_fd >= 0)
+ close(self->cdev_fd);
+ if (self->iommufd >= 0)
+ close(self->iommufd);
+}
+
+/*
+ * Test: Device cdev can be opened
+ */
+TEST_F(vfio_noiommu, device_cdev_open)
+{
+ ASSERT_LE(0, self->cdev_fd);
+}
+
+/*
+ * Test: Device can be bound to iommufd
+ */
+TEST_F(vfio_noiommu, device_bind_iommufd)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+}
+
+/*
+ * Test: Device info can be queried after binding
+ */
+TEST_F(vfio_noiommu, device_get_info_after_bind)
+{
+ struct vfio_device_info info;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+ ASSERT_NE(0, info.argsz);
+}
+
+/*
+ * Test: Getting device info fails without bind
+ */
+TEST_F(vfio_noiommu, device_get_info_without_bind_fails)
+{
+ struct vfio_device_info info;
+
+ ASSERT_NE(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+}
+
+/*
+ * Test: Binding with invalid iommufd fails
+ */
+TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails)
+{
+ ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, -2));
+}
+
+/*
+ * Test: Cannot bind twice to same device
+ */
+TEST_F(vfio_noiommu, device_repeated_bind_fails)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+}
+
+/*
+ * Test: IOAS can be allocated
+ */
+TEST_F(vfio_noiommu, ioas_alloc)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_NE(0, alloc_args.out_ioas_id);
+}
+
+/*
+ * Test: IOAS can be destroyed
+ */
+TEST_F(vfio_noiommu, ioas_destroy)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd,
+ alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Device can attach to IOAS after binding
+ */
+TEST_F(vfio_noiommu, device_attach_to_ioas)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Attaching to invalid IOAS fails
+ */
+TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_NE(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ UINT32_MAX));
+}
+
+/*
+ * Test: Device can detach from IOAS
+ */
+TEST_F(vfio_noiommu, device_detach_from_ioas)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+ ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Full lifecycle - bind, attach, detach, reset
+ */
+TEST_F(vfio_noiommu, device_lifecycle)
+{
+ struct iommu_ioas_alloc alloc_args;
+ struct vfio_device_info info;
+
+ /* Bind device to iommufd */
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+
+ /* Allocate IOAS */
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ /* Attach device to IOAS */
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ /* Query device info */
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+
+ /* Detach device from IOAS */
+ ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+
+ /* Reset device */
+ ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Get region info
+ */
+TEST_F(vfio_noiommu, device_get_region_info)
+{
+ struct vfio_device_info dev_info;
+ struct vfio_region_info region_info;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info));
+
+ /* Try to get first region info if device has regions */
+ if (dev_info.num_regions > 0) {
+ ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0,
+ &region_info));
+ ASSERT_NE(0, region_info.argsz);
+ }
+}
+
+TEST_F(vfio_noiommu, device_reset)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+TEST_F(vfio_noiommu, ioas_map_pages)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x10000;
+ int i;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ printf("Page size: %ld bytes\n", page_size);
+ /* Test mapping regions of different sizes: 1, 2, 4, 8 pages */
+ for (i = 0; i < 4; i++) {
+ size_t map_size = page_size * (1 << i); /* 1, 2, 4, 8 pages */
+ uint64_t test_iova = iova + (i * 0x100000);
+
+ /* Attempt to map each region (may fail if not supported) */
+ ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ test_iova, map_size, false);
+ }
+}
+
+TEST_F(vfio_noiommu, multiple_ioas_alloc)
+{
+ struct iommu_ioas_alloc alloc1, alloc2;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2));
+ ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id);
+}
+
+/*
+ * Test: Query physical address for IOVA
+ * Tests IOMMU_IOAS_NOIOMMU_GET_PA ioctl to translate IOVA to physical address
+ * Note: Device must be attached to IOAS for PA query to work
+ */
+#define NR_PAGES 32
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_mapped)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x200000;
+ uint64_t phys = 0;
+ uint64_t length = 0;
+ int ret;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ /*
+ * Map pages into an arbitrary IOAS, used as a cookie for lookup.
+ * Use hugepages to test contiguous PA. Make sure hugepages are
+ * available, e.g. echo 64 > /proc/sys/vm/nr_hugepages
+ */
+ ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ iova, page_size * NR_PAGES, true);
+ if (ret != 0)
+ return;
+
+ /* Query the physical address for the mapped dummy IOVA */
+ ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ iova, &phys, &length);
+
+ if (ret == 0) {
+ /* If we got a result, verify it's valid */
+ ASSERT_NE(0, phys);
+ ASSERT_GE((uint64_t)page_size * NR_PAGES, length);
+ }
+
+ /*
+ * Query with a non-page-aligned IOVA. The returned length must
+ * not exceed the actual contiguous range starting from that
+ * offset, i.e. it must be reduced by the sub-page offset.
+ */
+ phys = 0;
+ length = 0;
+ ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ iova + 0x80, &phys, &length);
+ if (ret == 0) {
+ ASSERT_NE(0, phys);
+ /* Length must account for the sub-page offset */
+ ASSERT_GE((uint64_t)page_size * NR_PAGES - 0x80, length);
+ /* Must not overshoot into the next page boundary */
+ ASSERT_EQ(0, (phys + length) % page_size);
+ }
+}
+
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_unmapped_fails)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ /* Try to retrieve unmapped IOVA (should fail) */
+ ASSERT_NE(0, ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ 0x10000, NULL, NULL));
+}
+
+int main(int argc, char *argv[])
+{
+ const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);
+ char *cdev = NULL;
+
+ if (!device_bdf) {
+ ksft_print_msg("No device BDF provided\n");
+ return KSFT_SKIP;
+ }
+
+ cdev = vfio_noiommu_get_cdev_path(device_bdf);
+ if (!cdev) {
+ ksft_print_msg("Could not find cdev for device %s\n",
+ device_bdf);
+ return KSFT_SKIP;
+ }
+
+ cdev_path = cdev;
+ ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path,
+ device_bdf);
+
+ return test_harness_run(argc, argv);
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v5 9/9] Documentation: Update VFIO NOIOMMU mode
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
` (7 preceding siblings ...)
2026-05-11 18:41 ` [PATCH v5 8/9] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
@ 2026-05-11 18:41 ` Jacob Pan
8 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:41 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
Document the NOIOMMU mode with newly added cdev support under iommufd.
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
Documentation/driver-api/vfio.rst | 46 +++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 2a21a42c9386..d97017d80b98 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -275,8 +275,6 @@ in a VFIO group.
With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX where
"X" is the number allocated uniquely by VFIO for registered devices.
-cdev interface does not support noiommu devices, so user should use
-the legacy group interface if noiommu is wanted.
The cdev only works with IOMMUFD. Both VFIO drivers and applications
must adapt to the new cdev security model which requires using
@@ -370,6 +368,50 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
/* Other device operations as stated in "VFIO Usage Example" */
+VFIO NOIOMMU mode
+-------------------------------------------------------------------------------
+VFIO also supports a no-IOMMU mode, intended for use cases where unsafe DMA
+is performed by userspace drivers without physical IOMMU protection. This
+mode is controlled by the module parameter:
+
+/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+Upon enabling this mode, with an assigned device, the user will be presented
+with a VFIO group and device file, e.g.::
+
+ /dev/vfio/
+ |-- devices
+ | `-- noiommu-vfio0 /* VFIO device cdev */
+ |-- noiommu-0 /* VFIO group */
+ `-- vfio
+
+The capabilities vary depending on the device programming interface and kernel
+configuration used. The following table summarizes the differences:
+
++-------------------+---------------------+----------------------+
+| Feature | VFIO group | VFIO device cdev |
++===================+=====================+======================+
+| VFIO device UAPI | Yes | Yes |
++-------------------+---------------------+----------------------+
+| VFIO container | No | No |
++-------------------+---------------------+----------------------+
+| IOMMUFD IOAS | No | Yes* |
++-------------------+---------------------+----------------------+
+
+Note that the VFIO container case includes the IOMMUFD-provided VFIO
+compatibility interface when either CONFIG_VFIO_CONTAINER or
+CONFIG_IOMMUFD_VFIO_CONTAINER is enabled.
+
+* IOMMUFD UAPI is available for VFIO device cdev to pin and map user memory with
+ the ability to retrieve physical addresses for DMA command submission.
+
+A new IOMMUFD ioctl IOMMU_IOAS_NOIOMMU_GET_PA is added to retrieve the physical
+address for a given IOVA. Although there is no physical DMA remapping hardware,
+IOMMU_IOAS_MAP_FIXED_IOVA is still used to establish IOVA-to-PA mappings in the
+software page table for later IOMMU_IOAS_NOIOMMU_GET_PA lookups.
+tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c provides an example of
+using this ioctl in no-IOMMU mode.
+
VFIO User API
-------------------------------------------------------------------------------
--
2.43.0
* Re: [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode
2026-05-11 18:41 ` [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
@ 2026-05-11 18:58 ` Jacob Pan
0 siblings, 0 replies; 11+ messages in thread
From: Jacob Pan @ 2026-05-11 18:58 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Baolu Lu,
jacob.pan
On Mon, 11 May 2026 11:41:10 -0700
Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
> To support no-IOMMU mode where userspace drivers perform unsafe DMA
> using physical addresses, introduce a new API to retrieve the
> physical address of a user-allocated DMA buffer that has been mapped
> to an IOVA via IOAS. The mapping is backed by SW-only I/O page tables
> maintained by the generic IOMMUPT framework.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> v5:
> - Add header stubs for iopt_get_phys() and
> iommufd_ioas_noiommu_get_pa() to avoid ifdef at call sites
> (Kevin) v4:
> - Fix ioctl return type (Yi Liu)
This is not the correct change log, I have made a mistake. The correct
one should be:
- Fix next_iova exceeds iopt_area_last_iova (Alex)
- Rename IOCTL more specific to NOIOMMU, i.e.
IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA (Kevin)
Sorry about that.
> v2:
> - New patch
> ---
> drivers/iommu/iommufd/io_pagetable.c | 62 +++++++++++++++++++++++++
> drivers/iommu/iommufd/ioas.c | 30 ++++++++++++
> drivers/iommu/iommufd/iommufd_private.h | 18 +++++++
> drivers/iommu/iommufd/main.c | 3 ++
> include/uapi/linux/iommufd.h | 25 ++++++++++
> 5 files changed, 138 insertions(+)
>
> diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
> index 24d4917105d9..1ee7c8e6408c 100644
> --- a/drivers/iommu/iommufd/io_pagetable.c
> +++ b/drivers/iommu/iommufd/io_pagetable.c
> @@ -859,6 +859,68 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
> return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
> }
>
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
> + u64 *length)
> +{
> + struct iopt_area *area;
> + u64 tmp_length = 0;
> + u64 tmp_paddr = 0;
> + int rc = 0;
> +
> + down_read(&iopt->iova_rwsem);
> + area = iopt_area_iter_first(iopt, iova, iova);
> + if (!area || !area->pages) {
> + rc = -ENOENT;
> + goto unlock_exit;
> + }
> +
> + if (!area->storage_domain ||
> + area->storage_domain->owner != &iommufd_noiommu_ops) {
> + rc = -EOPNOTSUPP;
> + goto unlock_exit;
> + }
> +
> + *paddr = iommu_iova_to_phys(area->storage_domain, iova);
> + if (!*paddr) {
> + rc = -EINVAL;
> + goto unlock_exit;
> + }
> +
> + tmp_length = PAGE_SIZE - offset_in_page(iova);
> + tmp_paddr = *paddr;
> + /*
> + * Scan the domain for the contiguous physical address length so that
> + * userspace search can be optimized for fewer ioctls.
> + */
> + while (iova < iopt_area_last_iova(area)) {
> + unsigned long next_iova;
> + u64 next_paddr;
> +
> + if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
> + break;
> +
> + if (next_iova > iopt_area_last_iova(area))
> + break;
> +
> + next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
> +
> + if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
> + break;
> +
> + iova = next_iova;
> + tmp_paddr += PAGE_SIZE;
> + tmp_length += PAGE_SIZE;
> + }
> + *length = tmp_length;
> +
> +unlock_exit:
> + up_read(&iopt->iova_rwsem);
> +
> + return rc;
> +}
> +#endif
> +
> int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
> {
> /* If the IOVAs are empty then unmap all succeeds */
> diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
> index fed06c2b728e..666440e32c9e 100644
> --- a/drivers/iommu/iommufd/ioas.c
> +++ b/drivers/iommu/iommufd/ioas.c
> @@ -375,6 +375,36 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
> return rc;
> }
>
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
> +{
> + struct iommu_ioas_noiommu_get_pa *cmd = ucmd->cmd;
> + struct iommufd_ioas *ioas;
> + int rc;
> +
> + if (!capable(CAP_SYS_RAWIO))
> + return -EPERM;
> +
> + if (cmd->flags || cmd->__reserved)
> + return -EOPNOTSUPP;
> +
> + ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
> + if (IS_ERR(ioas))
> + return PTR_ERR(ioas);
> +
> + rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
> + &cmd->out_length);
> + if (rc)
> + goto out_put;
> +
> + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +out_put:
> + iommufd_put_object(ucmd->ictx, &ioas->obj);
> +
> + return rc;
> +}
> +#endif
> +
> static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
> struct xarray *ioas_list)
> {
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 2682b5baa6e9..13f1506d8066 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
> int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
> unsigned long length, unsigned long *unmapped);
> int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
> + u64 *length);
> +#else
> +static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova,
> + u64 *paddr, u64 *length)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif
>
> int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
> struct iommu_domain *domain,
> @@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
> int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
> int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
> int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd);
> +#else
> +static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif
> int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
> int iommufd_option_rlimit_mode(struct iommu_option *cmd,
> struct iommufd_ctx *ictx);
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 8c6d43601afb..3b4192d70570 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -424,6 +424,7 @@ union ucmd_buffer {
> struct iommu_ioas_alloc alloc;
> struct iommu_ioas_allow_iovas allow_iovas;
> struct iommu_ioas_copy ioas_copy;
> + struct iommu_ioas_noiommu_get_pa noiommu_get_pa;
> struct iommu_ioas_iova_ranges iova_ranges;
> struct iommu_ioas_map map;
> struct iommu_ioas_unmap unmap;
> @@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
> IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova),
> IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file, struct iommu_ioas_map_file, iova),
> + IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa,
> + struct iommu_ioas_noiommu_get_pa, out_phys),
> IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap, length),
> IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index e998dfbd6960..7df366d161f1 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -57,6 +57,7 @@ enum {
> IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
> IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
> IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
> + IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA = 0x95,
> };
>
> /**
> @@ -219,6 +220,30 @@ struct iommu_ioas_map {
> };
> #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
>
> +/**
> + * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA)
> + * @size: sizeof(struct iommu_ioas_noiommu_get_pa)
> + * @flags: Reserved, must be 0 for now
> + * @ioas_id: IOAS ID to query IOVA to PA mapping from
> + * @__reserved: Must be 0
> + * @iova: IOVA to query
> + * @out_length: Number of bytes contiguous physical address starting from phys
> + * @out_phys: Output physical address the IOVA maps to
> + *
> + * Query the physical address backing an IOVA range. The entire range must be
> + * mapped already. For noiommu devices doing unsafe DMA only.
> + */
> +struct iommu_ioas_noiommu_get_pa {
> + __u32 size;
> + __u32 flags;
> + __u32 ioas_id;
> + __u32 __reserved;
> + __aligned_u64 iova;
> + __aligned_u64 out_length;
> + __aligned_u64 out_phys;
> +};
> +#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA)
> +
> /**
> * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
> * @size: sizeof(struct iommu_ioas_map_file)
Thread overview: 11+ messages
2026-05-11 18:41 [PATCH v5 0/9] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-11 18:41 ` [PATCH v5 1/9] vfio: Rename VFIO_NOIOMMU to VFIO_GROUP_NOIOMMU Jacob Pan
2026-05-11 18:41 ` [PATCH v5 2/9] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-05-11 18:41 ` [PATCH v5 3/9] iommufd: Move igroup allocation to a function Jacob Pan
2026-05-11 18:41 ` [PATCH v5 4/9] iommufd: Allow binding to a noiommu device Jacob Pan
2026-05-11 18:41 ` [PATCH v5 5/9] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
2026-05-11 18:58 ` Jacob Pan
2026-05-11 18:41 ` [PATCH v5 6/9] vfio/group: Add VFIO_CDEV_NOIOMMU Kconfig and tolerate NULL group Jacob Pan
2026-05-11 18:41 ` [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
2026-05-11 18:41 ` [PATCH v5 8/9] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
2026-05-11 18:41 ` [PATCH v5 9/9] Documentation: Update VFIO NOIOMMU mode Jacob Pan