* [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev
@ 2026-05-21 22:11 Jacob Pan
2026-05-21 22:11 ` [PATCH v6 1/7] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
` (7 more replies)
0 siblings, 8 replies; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
VFIO's unsafe_noiommu_mode has long provided a way for userspace drivers
to operate on platforms lacking a hardware IOMMU. Today, IOMMUFD also
supports No-IOMMU mode for group-based devices under vfio_compat mode.
However, IOMMUFD's native character device (cdev) does not yet support
No-IOMMU mode; adding that support is the purpose of this series.
In summary, we have:
|-------------------------+------+---------------|
| Device access mode | VFIO | IOMMUFD |
|-------------------------+------+---------------|
| group /dev/vfio/$GROUP | Yes | Yes |
|-------------------------+------+---------------|
| cdev /dev/vfio/devices/ | No | This patch |
|-------------------------+------+---------------|
Beyond enabling cdev for IOMMUFD, this series also addresses the following
deficiencies in the current No-IOMMU mode, as suggested by Jason [1]:
- Devices operating under No-IOMMU mode are limited to device-level UAPI
access, without container or IOAS-level capabilities. Consequently,
user-space drivers lack structured mechanisms for page pinning and often
resort to mlock(), which is less robust than pin_user_pages() used for
devices backed by a physical IOMMU. For example, mlock() does not prevent
page migration.
- There is no architectural mechanism for obtaining physical addresses for
DMA. As a workaround, user-space drivers frequently rely on /proc/pagemap
tricks or hardcoded values.
By allowing noiommu device access to IOMMUFD IOAS and HWPT objects, this
patch brings No-IOMMU mode closer to full citizenship within the IOMMU
subsystem. In addition to addressing the two deficiencies mentioned above,
the expectation is that it will also enable No-IOMMU devices to seamlessly
participate in live update sessions via KHO [2].
Furthermore, these devices will use the IOMMUFD-based ownership checking model for
VFIO_DEVICE_PCI_HOT_RESET, eliminating the need for an iommufd_access object
as required in a previous attempt [3].
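As an illustration of how a user-space driver might consume the new query
interface instead of /proc/pagemap, here is a minimal sketch. It is not part
of the series: build_segs(), get_pa_fn, mock_get_pa() and struct dma_seg are
hypothetical names, and the callback merely stands in for the GET_PA ioctl
added in patch 4 (in: IOVA and a length cap, out: physical address and the
physically contiguous length). The mock uses an invented layout in which
physical memory is contiguous only in two-page pairs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One physically contiguous piece of a DMA buffer (hypothetical type). */
struct dma_seg {
	uint64_t paddr;
	uint64_t len;
};

/*
 * Hypothetical stand-in for the GET_PA ioctl: *length is a cap on input
 * (0 means no cap) and the contiguous byte count on output.
 */
typedef int (*get_pa_fn)(void *ctx, uint64_t iova, uint64_t *paddr,
			 uint64_t *length);

/* Mock backend: physical memory is contiguous only in two-page pairs. */
static int mock_get_pa(void *ctx, uint64_t iova, uint64_t *paddr,
		       uint64_t *length)
{
	uint64_t page = iova >> 12;
	uint64_t off = iova & 0xfff;
	uint64_t run = (2 - page % 2) * 0x1000 - off;

	(void)ctx;
	*paddr = 0x100000 + (page / 2) * 0x10000 + (page % 2) * 0x1000 + off;
	if (*length && run > *length)
		run = *length;
	*length = run;
	return 0;
}

/* Walk [iova, iova + size) and record one segment per contiguous run. */
static int build_segs(get_pa_fn get_pa, void *ctx, uint64_t iova,
		      uint64_t size, struct dma_seg *segs, size_t max_segs,
		      size_t *nsegs)
{
	size_t n = 0;

	assert(get_pa && segs && nsegs);
	while (size) {
		uint64_t paddr = 0;
		uint64_t len = size;	/* cap the query at what is left */
		int rc;

		if (n == max_segs)
			return -ENOSPC;
		rc = get_pa(ctx, iova, &paddr, &len);
		if (rc)
			return rc;
		if (!len || len > size)
			return -EIO;	/* backend broke the contract */
		segs[n].paddr = paddr;
		segs[n].len = len;
		n++;
		iova += len;
		size -= len;
	}
	*nsegs = n;
	return 0;
}
```

Because each query returns the whole contiguous run, the number of calls
scales with the number of physical fragments rather than the number of pages.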
ChangeLog:
V6:
- Drop the VFIO_IOMMU rename patch
- Revert to a unified VFIO_NOIOMMU Kconfig for both cdev and group.
Use Kconfig dependencies to restrict usage and avoid NULL group
checks. (Alex & Yi)
- Add CAP_SYS_RAWIO checks for cdev open to maintain security parity
with the group noiommu path. (Alex)
- Updated documentation with Kconfig usage matrix
- Added max length limit to get_pa ioctl (Baolu & Jason)
V5:
- Split CONFIG_VFIO_NOIOMMU into CONFIG_VFIO_GROUP_NOIOMMU and
CONFIG_VFIO_CDEV_NOIOMMU so cdev noiommu is independent of
VFIO_GROUP (Alex)
- Add CAP_SYS_RAWIO check for cdev open and bind under noiommu,
security parity with group noiommu (Alex)
- Add IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) guard in
iommufd_device_is_noiommu() to prevent noiommu bind when feature
is disabled
- Add prep patch to tolerate NULL group for cdev noiommu devices
when CONFIG_VFIO_GROUP_NOIOMMU is not set [7/9]
- Rename IOCTL to IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA to be more
specific (Kevin)
- Simplify iommufd_device_is_noiommu, use iommufd_bind_noiommu
helper (Kevin, Yi)
- Move IOMMU cap check under iommufd_bind_iommu() (Yi)
- Fix next_iova exceeding iopt_area_last_iova in GET_PA (Alex)
- Fix const hwpt, copyright date, typo in moved comment (Kevin)
- Add Reviewed-by tags
- Squash noiommu cdev selftest fix into selftest patch
- Drop DSA selftest patch
- Details in each patch changelog.
V4:
- Fix various corner cases pointed out by Sashiko.
Details in each patch changelog.
V3:
- Improve error handling [3/10] (Mostafa)
- Simplify vfio_device_is_noiommu logic and merged in [6/10] (Mostafa)
- Add comment to explain the design difference over the legacy noiommu
VFIO code.[1/10]
V2:
- Fix build dependency by adding IOMMU_SUPPORT in [8/11]
- Add an optimization to scan beyond the first page for a contiguous
physical address range and return its length instead of a single
page.[4/11]
Since RFC[4]:
- Abandon the dummy iommu driver approach; patches 1-3 absorb the
changes into iommufd.
[1] https://lore.kernel.org/linux-iommu/20250603175403.GA407344@nvidia.com/
[2] https://lore.kernel.org/linux-pci/20251027134430.00007e46@linux.microsoft.com/
[3] https://lore.kernel.org/kvm/20230522115751.326947-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/linux-iommu/20251201173012.18371-1-jacob.pan@linux.microsoft.com/
Future cleanup: consolidate all CONFIG_IOMMUFD_NOIOMMU code
(iopt_get_phys, iommufd_ioas_noiommu_get_pa, iommufd_noiommu_ops) into
hwpt_noiommu.c to eliminate #ifdef guards from ioas.c and io_pagetable.c.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Jacob Pan (4):
iommufd: Add an ioctl to query PA from IOVA for noiommu mode
vfio: Enable cdev noiommu mode under iommufd
selftests/vfio: Add iommufd noiommu mode selftest for cdev
Documentation: Update VFIO NOIOMMU mode
Jason Gunthorpe (3):
iommufd: Support a HWPT without an iommu driver for noiommu
iommufd: Move igroup allocation to a function
iommufd: Allow binding to a noiommu device
Documentation/driver-api/vfio.rst | 83 ++-
drivers/iommu/iommufd/Kconfig | 12 +
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 192 +++--
drivers/iommu/iommufd/hw_pagetable.c | 15 +-
drivers/iommu/iommufd/hwpt_noiommu.c | 97 +++
drivers/iommu/iommufd/io_pagetable.c | 72 ++
drivers/iommu/iommufd/ioas.c | 30 +
drivers/iommu/iommufd/iommufd_private.h | 20 +
drivers/iommu/iommufd/main.c | 3 +
drivers/vfio/Kconfig | 8 +-
drivers/vfio/device_cdev.c | 3 +
drivers/vfio/iommufd.c | 6 +-
drivers/vfio/vfio.h | 20 +-
drivers/vfio/vfio_main.c | 23 +-
include/linux/vfio.h | 1 +
include/uapi/linux/iommufd.h | 27 +
tools/testing/selftests/vfio/Makefile | 1 +
.../lib/include/libvfio/vfio_pci_device.h | 16 +
.../selftests/vfio/lib/vfio_pci_device.c | 5 +-
.../vfio/vfio_iommufd_noiommu_test.c | 664 ++++++++++++++++++
21 files changed, 1221 insertions(+), 78 deletions(-)
create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
--
2.43.0
* [PATCH v6 1/7] iommufd: Support a HWPT without an iommu driver for noiommu
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
@ 2026-05-21 22:11 ` Jacob Pan
2026-05-21 22:11 ` [PATCH v6 2/7] iommufd: Move igroup allocation to a function Jacob Pan
` (6 subsequent siblings)
7 siblings, 0 replies; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
From: Jason Gunthorpe <jgg@nvidia.com>
Create just a little part of a real iommu driver, enough to
slot in under the dev_iommu_ops() and allow iommufd to call
domain_alloc_paging_flags() and fail everything else.
This allows explicitly creating a HWPT under an IOAS.
A new Kconfig option IOMMUFD_NOIOMMU is introduced to differentiate
from the VFIO group/container based noiommu mode.
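The "just a little part of a real iommu driver" idea can be modelled outside
the kernel as an ops table with a single callback populated. The sketch below
is illustrative only: every model_* name is hypothetical, and has_group stands
in for the idev->igroup->group test in get_iommu_ops(). Callers treat a
missing callback as "not supported", so everything except paging-domain
allocation fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_domain {
	int id;
};

/* Cut-down ops table; a NULL callback means the operation is unsupported. */
struct model_ops {
	struct model_domain *(*alloc_paging)(unsigned int flags);
	struct model_domain *(*alloc_nested)(unsigned int flags);
};

static struct model_domain model_noiommu_domain = { .id = 1 };

/* The only callback the noiommu ops provide; no flags are accepted. */
static struct model_domain *model_noiommu_alloc_paging(unsigned int flags)
{
	if (flags)
		return NULL;
	return &model_noiommu_domain;
}

static const struct model_ops model_noiommu_ops = {
	.alloc_paging = model_noiommu_alloc_paging,
};

/* No group: use the SW-only ops. Otherwise use the driver's ops. */
static const struct model_ops *model_get_ops(const struct model_ops *drv_ops,
					     int has_group)
{
	if (!has_group)
		return &model_noiommu_ops;
	return drv_ops;
}

static int model_alloc_paging(const struct model_ops *ops, unsigned int flags,
			      struct model_domain **out)
{
	assert(out);
	if (!ops)
		return -ENODEV;
	if (!ops->alloc_paging)
		return -EOPNOTSUPP;
	*out = ops->alloc_paging(flags);
	return *out ? 0 : -EOPNOTSUPP;
}

static int model_alloc_nested(const struct model_ops *ops, unsigned int flags,
			      struct model_domain **out)
{
	assert(out);
	if (!ops)
		return -ENODEV;
	if (!ops->alloc_nested)
		return -EOPNOTSUPP;
	*out = ops->alloc_nested(flags);
	return *out ? 0 : -EOPNOTSUPP;
}
```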
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v6: (Yi)
- Sort includes alphabetically (iommu.h after generic_pt/iommu.h)
- Fix comment: s/mock page table/SW-only page table/ to avoid confusion
with selftest mock
- Rewrite noiommu_amdv1_ops comment: explain why AMDV1 format is chosen
(multi-page size options), remove references to group-container mode distinction
v5:
- Use the new IOMMUFD_NOIOMMU Kconfig instead of VFIO_NOIOMMU
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Copyright date fix (Kevin)
v4:
- Make iommufd_noiommu_ops const
v3:
- Add comment to explain the design difference over the
legacy noiommu VFIO code.
---
drivers/iommu/iommufd/Kconfig | 12 +++
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/hw_pagetable.c | 15 +++-
drivers/iommu/iommufd/hwpt_noiommu.c | 97 +++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 2 +
5 files changed, 125 insertions(+), 2 deletions(-)
create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 455bac0351f2..6c3bea83631b 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -16,6 +16,18 @@ config IOMMUFD
If you don't know what to do here, say N.
if IOMMUFD
+config IOMMUFD_NOIOMMU
+ bool
+ depends on !GENERIC_ATOMIC64 # IOMMU_PT_AMDV1 requires cmpxchg64
+ select GENERIC_PT
+ select IOMMU_PT
+ select IOMMU_PT_AMDV1
+ help
+ Provides a SW-only IO page table for devices without hardware
+ IOMMU backing. This uses the AMDV1 page table format for
+ IOVA-to-PA lookups only, not for hardware DMA translation.
+ To be selected by VFIO_NOIOMMU when VFIO_DEVICE_CDEV is enabled.
+
config IOMMUFD_VFIO_CONTAINER
bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
depends on VFIO_GROUP && !VFIO_CONTAINER
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..67207914bb6e 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -10,6 +10,7 @@ iommufd-y := \
vfio_compat.o \
viommu.o
+iommufd-$(CONFIG_IOMMUFD_NOIOMMU) += hwpt_noiommu.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c9..0ae14cd3fc72 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"
+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
+{
+ if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->igroup->group)
+ return &iommufd_noiommu_ops;
+ if (WARN_ON_ONCE(!idev->dev->iommu))
+ return NULL;
+ return dev_iommu_ops(idev->dev);
+}
+
static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->domain)
@@ -114,11 +123,13 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
IOMMU_HWPT_FAULT_ID_VALID |
IOMMU_HWPT_ALLOC_PASID;
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_paging *hwpt_paging;
struct iommufd_hw_pagetable *hwpt;
int rc;
+ if (!ops)
+ return ERR_PTR(-ENODEV);
lockdep_assert_held(&ioas->mutex);
if ((flags || user_data) && !ops->domain_alloc_paging_flags)
@@ -229,7 +240,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
struct iommufd_device *idev, u32 flags,
const struct iommu_user_data *user_data)
{
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
new file mode 100644
index 000000000000..62a44f4b9164
--- /dev/null
+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/generic_pt/iommu.h>
+#include <linux/iommu.h>
+#include "iommufd_private.h"
+
+static const struct iommu_domain_ops noiommu_amdv1_ops;
+
+struct noiommu_domain {
+ union {
+ struct iommu_domain domain;
+ struct pt_iommu_amdv1 amdv1;
+ };
+ spinlock_t lock;
+};
+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
+
+static void noiommu_change_top(struct pt_iommu *iommu_table,
+ phys_addr_t top_paddr, unsigned int top_level)
+{
+}
+
+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
+{
+ struct noiommu_domain *domain =
+ container_of(iommupt, struct noiommu_domain, amdv1.iommu);
+
+ return &domain->lock;
+}
+
+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
+ .get_top_lock = noiommu_get_top_lock,
+ .change_top = noiommu_change_top,
+};
+
+static struct iommu_domain *
+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ struct pt_iommu_amdv1_cfg cfg = {};
+ struct noiommu_domain *dom;
+ int rc;
+
+ if (flags || user_data)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ cfg.common.hw_max_vasz_lg2 = 64;
+ cfg.common.hw_max_oasz_lg2 = 52;
+ cfg.starting_level = 2;
+ cfg.common.features =
+ (BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
+ BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
+
+ dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+ if (!dom)
+ return ERR_PTR(-ENOMEM);
+
+ spin_lock_init(&dom->lock);
+ dom->amdv1.iommu.nid = NUMA_NO_NODE;
+ dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
+ dom->domain.ops = &noiommu_amdv1_ops;
+
+ /* Use SW-only page table which is based on AMDV1 */
+ rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
+ if (rc) {
+ kfree(dom);
+ return ERR_PTR(rc);
+ }
+
+ return &dom->domain;
+}
+
+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
+{
+ struct noiommu_domain *domain =
+ container_of(iommu_domain, struct noiommu_domain, domain);
+
+ pt_iommu_deinit(&domain->amdv1.iommu);
+ kfree(domain);
+}
+
+/*
+ * Domain ops for iommufd no-IOMMU mode. Uses AMDV1 format as a
+ * SW-only IOPT because it has the best multi-page size options
+ * of all the formats. IOVAs serve only for IOVA-to-PA lookups,
+ * not for hardware DMA translation.
+ */
+static const struct iommu_domain_ops noiommu_amdv1_ops = {
+ IOMMU_PT_DOMAIN_OPS(amdv1),
+ .free = noiommu_domain_free,
+};
+
+const struct iommu_ops iommufd_noiommu_ops = {
+ .domain_alloc_paging_flags = noiommu_alloc_paging_flags,
+};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9..2682b5baa6e9 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
refcount_dec(&hwpt->obj.users);
}
+extern const struct iommu_ops iommufd_noiommu_ops;
+
struct iommufd_attach;
struct iommufd_group {
--
2.43.0
* [PATCH v6 2/7] iommufd: Move igroup allocation to a function
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-21 22:11 ` [PATCH v6 1/7] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
@ 2026-05-21 22:11 ` Jacob Pan
2026-05-22 6:00 ` Baolu Lu
2026-05-21 22:11 ` [PATCH v6 3/7] iommufd: Allow binding to a noiommu device Jacob Pan
` (5 subsequent siblings)
7 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
From: Jason Gunthorpe <jgg@nvidia.com>
Move igroup allocation into a helper so it can be reused by the next patch,
which allows binding to a noiommu device.
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v5:
- Add NULL group to the error handling path of
iommufd_group_setup_msi()
v3:
- New patch
---
drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 170a7005f0bc..d03076fcf3c2 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
return kref_get_unless_zero(&igroup->ref);
}
+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
+ struct iommu_group *group)
+{
+ struct iommufd_group *new_igroup;
+
+ new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+ if (!new_igroup)
+ return ERR_PTR(-ENOMEM);
+
+ kref_init(&new_igroup->ref);
+ mutex_init(&new_igroup->lock);
+ xa_init(&new_igroup->pasid_attach);
+ new_igroup->sw_msi_start = PHYS_ADDR_MAX;
+ /* group reference moves into new_igroup */
+ new_igroup->group = group;
+
+ /*
+ * The ictx is not additionally refcounted here because all objects using
+ * an igroup must put it before their destroy completes.
+ */
+ new_igroup->ictx = ictx;
+ return new_igroup;
+}
+
/*
* iommufd needs to store some more data for each iommu_group, we keep a
* parallel xarray indexed by iommu_group id to hold this instead of putting it
@@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
}
xa_unlock(&ictx->groups);
- new_igroup = kzalloc_obj(*new_igroup);
- if (!new_igroup) {
+ new_igroup = iommufd_alloc_group(ictx, group);
+ if (IS_ERR(new_igroup)) {
iommu_group_put(group);
- return ERR_PTR(-ENOMEM);
+ return new_igroup;
}
- kref_init(&new_igroup->ref);
- mutex_init(&new_igroup->lock);
- xa_init(&new_igroup->pasid_attach);
- new_igroup->sw_msi_start = PHYS_ADDR_MAX;
- /* group reference moves into new_igroup */
- new_igroup->group = group;
-
- /*
- * The ictx is not additionally refcounted here becase all objects using
- * an igroup must put it before their destroy completes.
- */
- new_igroup->ictx = ictx;
-
/*
* We dropped the lock so igroup is invalid. NULL is a safe and likely
* value to assume for the xa_cmpxchg algorithm.
--
2.43.0
* [PATCH v6 3/7] iommufd: Allow binding to a noiommu device
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-21 22:11 ` [PATCH v6 1/7] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-05-21 22:11 ` [PATCH v6 2/7] iommufd: Move igroup allocation to a function Jacob Pan
@ 2026-05-21 22:11 ` Jacob Pan
2026-05-22 6:01 ` Baolu Lu
2026-05-21 22:11 ` [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
` (4 subsequent siblings)
7 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
From: Jason Gunthorpe <jgg@nvidia.com>
Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
a dummy IOMMU group for such devices and skipping hwpt operations.
This enables noiommu devices to operate through the same iommufd API as
IOMMU-capable devices.
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v6:
- Expand iommufd_device_is_noiommu() comment to explain why dev->iommu
is checked instead of device_iommu_mapped() (Yi & Baolu)
- Simplify bind error handling by factoring out duplicated rc check (Yi)
v5:
- simplify logic and rename iommufd_device_is_noiommu (Kevin, Yi)
- use a helper iommufd_bind_noiommu instead of open coding (Kevin)
- move IOMMU cap check under iommufd_bind_iommu() (Yi)
- reword comments for partial init (Yi)
- misc minor clean up
v4:
- Update the description of the module parameter (Alex)
v3:
- Consolidate into fewer patches
- Address Baolu's comment
---
drivers/iommu/iommufd/device.c | 149 ++++++++++++++++++++++++---------
1 file changed, 110 insertions(+), 39 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d03076fcf3c2..ff7f7bff5058 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,6 +23,19 @@ struct iommufd_attach {
struct xarray device_array;
};
+/*
+ * Detect a noiommu device for the cdev path. We check dev->iommu rather than
+ * using device_iommu_mapped() (which checks dev->iommu_group) because when
+ * both group and cdev interfaces coexist, the group path assigns a fake
+ * noiommu iommu_group to the device. That would cause device_iommu_mapped()
+ * to return true and hide the noiommu case from the cdev path. dev->iommu is
+ * reliably NULL when no IOMMU driver is managing the device.
+ */
+static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
+{
+ return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
+}
+
static void iommufd_group_release(struct kref *kref)
{
struct iommufd_group *igroup =
@@ -30,9 +43,11 @@ static void iommufd_group_release(struct kref *kref)
WARN_ON(!xa_empty(&igroup->pasid_attach));
- xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
- NULL, GFP_KERNEL);
- iommu_group_put(igroup->group);
+ if (igroup->group) {
+ xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
+ igroup, NULL, GFP_KERNEL);
+ iommu_group_put(igroup->group);
+ }
mutex_destroy(&igroup->lock);
kfree(igroup);
}
@@ -204,32 +219,20 @@ void iommufd_device_destroy(struct iommufd_object *obj)
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);
- iommu_device_release_dma_owner(idev->dev);
+ /* igroup is NULL when destroy called during bind error cleanup */
+ if (!idev->igroup)
+ return;
+ if (!iommufd_device_is_noiommu(idev))
+ iommu_device_release_dma_owner(idev->dev);
iommufd_put_group(idev->igroup);
if (!iommufd_selftest_is_mock_dev(idev->dev))
iommufd_ctx_put(idev->ictx);
}
-/**
- * iommufd_device_bind - Bind a physical device to an iommu fd
- * @ictx: iommufd file descriptor
- * @dev: Pointer to a physical device struct
- * @id: Output ID number to return to userspace for this device
- *
- * A successful bind establishes an ownership over the device and returns
- * struct iommufd_device pointer, otherwise returns error pointer.
- *
- * A driver using this API must set driver_managed_dma and must not touch
- * the device until this routine succeeds and establishes ownership.
- *
- * Binding a PCI device places the entire RID under iommufd control.
- *
- * The caller must undo this with iommufd_device_unbind()
- */
-struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
- struct device *dev, u32 *id)
+static int iommufd_bind_iommu(struct iommufd_device *idev)
{
- struct iommufd_device *idev;
+ struct iommufd_ctx *ictx = idev->ictx;
+ struct device *dev = idev->dev;
struct iommufd_group *igroup;
int rc;
@@ -238,11 +241,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
* to restore cache coherency.
*/
if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
igroup = iommufd_get_group(ictx, dev);
if (IS_ERR(igroup))
- return ERR_CAST(igroup);
+ return PTR_ERR(igroup);
/*
* For historical compat with VFIO the insecure interrupt path is
@@ -268,21 +271,77 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
if (rc)
goto out_group_put;
+ /* igroup refcount moves into iommufd_device */
+ idev->igroup = igroup;
+ idev->enforce_cache_coherency =
+ device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+ return 0;
+
+out_group_put:
+ iommufd_put_group(igroup);
+ return rc;
+}
+
+/*
+ * Noiommu devices have no real IOMMU group. Create a dummy igroup so that
+ * internal code paths that expect idev->igroup to be present still work.
+ * A NULL igroup->group distinguishes this from a real IOMMU-backed group.
+ */
+static int iommufd_bind_noiommu(struct iommufd_device *idev)
+{
+ struct iommufd_group *igroup;
+
+ igroup = iommufd_alloc_group(idev->ictx, NULL);
+ if (IS_ERR(igroup))
+ return PTR_ERR(igroup);
+ idev->igroup = igroup;
+ return 0;
+}
+
+/**
+ * iommufd_device_bind - Bind a physical device to an iommu fd
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ * @id: Output ID number to return to userspace for this device
+ *
+ * A successful bind establishes an ownership over the device and returns
+ * struct iommufd_device pointer, otherwise returns error pointer.
+ *
+ * A driver using this API must set driver_managed_dma and must not touch
+ * the device until this routine succeeds and establishes ownership.
+ *
+ * Binding a PCI device places the entire RID under iommufd control.
+ *
+ * The caller must undo this with iommufd_device_unbind()
+ */
+struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
+ struct device *dev, u32 *id)
+{
+ struct iommufd_device *idev;
+ int rc;
+
idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
- if (IS_ERR(idev)) {
- rc = PTR_ERR(idev);
- goto out_release_owner;
- }
+ if (IS_ERR(idev))
+ return idev;
+
idev->ictx = ictx;
+ idev->dev = dev;
+
+ if (!iommufd_device_is_noiommu(idev))
+ rc = iommufd_bind_iommu(idev);
+ else
+ rc = iommufd_bind_noiommu(idev);
+ if (rc)
+ goto err_out;
+
+ /*
+ * Take a ctx reference after bind succeeds. This must happen here
+ * so that iommufd_device_destroy() can handle partial initialization
+ */
if (!iommufd_selftest_is_mock_dev(dev))
iommufd_ctx_get(ictx);
- idev->dev = dev;
- idev->enforce_cache_coherency =
- device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
/* The calling driver is a user until iommufd_device_unbind() */
refcount_inc(&idev->obj.users);
- /* igroup refcount moves into iommufd_device */
- idev->igroup = igroup;
/*
* If the caller fails after this success it must call
@@ -294,11 +353,14 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
*id = idev->obj.id;
return idev;
-out_release_owner:
- iommu_device_release_dma_owner(dev);
-out_group_put:
- iommufd_put_group(igroup);
+err_out:
+ /*
+ * iommufd_device_destroy() handles partially initialized idev,
+ * so iommufd_object_abort_and_destroy() is safe to call here.
+ */
+ iommufd_object_abort_and_destroy(ictx, &idev->obj);
return ERR_PTR(rc);
+
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
@@ -512,6 +574,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
struct iommufd_attach_handle *handle;
int rc;
+ if (iommufd_device_is_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -559,6 +624,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
{
struct iommufd_attach_handle *handle;
+ if (iommufd_device_is_noiommu(idev))
+ return;
+
handle = iommufd_device_get_attach_handle(idev, pasid);
if (pasid == IOMMU_NO_PASID)
iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
@@ -577,6 +645,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
struct iommufd_attach_handle *handle, *old_handle;
int rc;
+ if (iommufd_device_is_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -652,7 +723,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
goto err_release_devid;
}
- if (attach_resv) {
+ if (attach_resv && !iommufd_device_is_noiommu(idev)) {
rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
if (rc)
goto err_release_devid;
--
2.43.0
* [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
` (2 preceding siblings ...)
2026-05-21 22:11 ` [PATCH v6 3/7] iommufd: Allow binding to a noiommu device Jacob Pan
@ 2026-05-21 22:11 ` Jacob Pan
2026-05-22 9:22 ` Yi Liu
2026-05-21 22:11 ` [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
` (3 subsequent siblings)
7 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
To support no-IOMMU mode where userspace drivers perform unsafe DMA
using physical addresses, introduce a new API to retrieve the
physical address of a user-allocated DMA buffer that has been mapped to
an IOVA via IOAS. The mapping is backed by SW-only I/O page tables
maintained by the generic IOMMUPT framework.
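The lookup can be pictured with a small user-space model. This is a sketch
under stated assumptions, not the kernel code: struct model_table,
model_iova_to_phys() and model_get_phys() are hypothetical, a flat PFN array
stands in for the SW-only page table, and the table is small enough that
iova plus a page cannot overflow. It follows the same steps as
iopt_get_phys(): translate the first IOVA, then extend the run page by page
while the physical addresses stay adjacent, honouring an optional length cap
(0 means no cap).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for the page table: pfn[i] backs IOVA page i. */
struct model_table {
	const uint64_t *pfn;	/* a PFN of 0 means "not mapped" */
	size_t npages;
};

static uint64_t model_iova_to_phys(const struct model_table *t, uint64_t iova)
{
	uint64_t idx = iova / MODEL_PAGE_SIZE;

	if (idx >= t->npages || !t->pfn[idx])
		return 0;
	return t->pfn[idx] * MODEL_PAGE_SIZE + iova % MODEL_PAGE_SIZE;
}

/*
 * On entry *length is the cap (0 = no cap). On success returns 0 and sets
 * *paddr to the translation of iova and *length to the number of physically
 * contiguous bytes starting there.
 */
static int model_get_phys(const struct model_table *t, uint64_t iova,
			  uint64_t *paddr, uint64_t *length)
{
	uint64_t max_length, last_iova, cur, run;

	assert(t && paddr && length);
	max_length = *length;

	cur = model_iova_to_phys(t, iova);
	if (!cur)
		return -1;
	*paddr = cur;
	last_iova = t->npages * MODEL_PAGE_SIZE - 1;

	/* Bytes left in the first (possibly partial) page. */
	run = MODEL_PAGE_SIZE - iova % MODEL_PAGE_SIZE;

	while (!max_length || run < max_length) {
		uint64_t next_iova = iova + MODEL_PAGE_SIZE;
		uint64_t next;

		if (next_iova > last_iova)
			break;
		next = model_iova_to_phys(t, next_iova);
		if (!next || next != cur + MODEL_PAGE_SIZE)
			break;	/* hole or physical discontinuity */
		iova = next_iova;
		cur = next;
		run += MODEL_PAGE_SIZE;
	}

	if (max_length && run > max_length)
		run = max_length;
	*length = run;
	return 0;
}
```

With PFNs {100, 101, 102, 200} and an IOVA 0x10 bytes into the first page,
the run covers the first three pages minus the 0x10-byte offset and stops at
the jump to PFN 200.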
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v6:
- Limit search length (Baolu, Jason)
v5:
- Fix next_iova exceeding iopt_area_last_iova (Alex)
- Rename the ioctl to be more specific to NOIOMMU, i.e.
IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA (Kevin)
- Add header stubs for iopt_get_phys()
v4:
- Fix ioctl return type (Yi Liu)
---
drivers/iommu/iommufd/io_pagetable.c | 72 +++++++++++++++++++++++++
drivers/iommu/iommufd/ioas.c | 30 +++++++++++
drivers/iommu/iommufd/iommufd_private.h | 18 +++++++
drivers/iommu/iommufd/main.c | 3 ++
include/uapi/linux/iommufd.h | 27 ++++++++++
5 files changed, 150 insertions(+)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 24d4917105d9..4369447e2125 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -859,6 +859,78 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
}
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length)
+{
+ struct iopt_area *area;
+ u64 max_length = *length;
+ u64 tmp_length = 0;
+ u64 tmp_paddr = 0;
+ int rc = 0;
+
+ down_read(&iopt->iova_rwsem);
+ area = iopt_area_iter_first(iopt, iova, iova);
+ if (!area || !area->pages) {
+ rc = -ENOENT;
+ goto unlock_exit;
+ }
+
+ if (!area->storage_domain ||
+ area->storage_domain->owner != &iommufd_noiommu_ops) {
+ rc = -EOPNOTSUPP;
+ goto unlock_exit;
+ }
+
+ *paddr = iommu_iova_to_phys(area->storage_domain, iova);
+ if (!*paddr) {
+ rc = -EINVAL;
+ goto unlock_exit;
+ }
+
+ tmp_length = PAGE_SIZE - offset_in_page(iova);
+ tmp_paddr = *paddr;
+ /*
+ * Scan the domain for the contiguous physical address length so that
+ * userspace search can be optimized for fewer ioctls. A max_length of
+ * 0 means no limit.
+ */
+ while (iova < iopt_area_last_iova(area)) {
+ unsigned long next_iova;
+ u64 next_paddr;
+
+ if (max_length && tmp_length >= max_length) {
+ tmp_length = max_length;
+ break;
+ }
+
+ if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
+ break;
+
+ if (next_iova > iopt_area_last_iova(area))
+ break;
+
+ next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
+
+ if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
+ break;
+
+ iova = next_iova;
+ tmp_paddr += PAGE_SIZE;
+ tmp_length += PAGE_SIZE;
+ }
+
+ if (max_length && tmp_length > max_length)
+ tmp_length = max_length;
+ *length = tmp_length;
+
+unlock_exit:
+ up_read(&iopt->iova_rwsem);
+
+ return rc;
+}
+#endif
+
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
{
/* If the IOVAs are empty then unmap all succeeds */
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index fed06c2b728e..82bbc0c2357e 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -375,6 +375,36 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
return rc;
}
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_ioas_noiommu_get_pa *cmd = ucmd->cmd;
+ struct iommufd_ioas *ioas;
+ int rc;
+
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
+ if (cmd->flags || cmd->__reserved)
+ return -EOPNOTSUPP;
+
+ ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+ if (IS_ERR(ioas))
+ return PTR_ERR(ioas);
+
+ rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
+ &cmd->length);
+ if (rc)
+ goto out_put;
+
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+ iommufd_put_object(ucmd->ictx, &ioas->obj);
+
+ return rc;
+}
+#endif
+
static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
struct xarray *ioas_list)
{
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2682b5baa6e9..13f1506d8066 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
unsigned long length, unsigned long *unmapped);
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length);
+#else
+static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova,
+ u64 *paddr, u64 *length)
+{
+ return -EOPNOTSUPP;
+}
+#endif
int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
struct iommu_domain *domain,
@@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+#ifdef CONFIG_IOMMUFD_NOIOMMU
+int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd);
+#else
+static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
+{
+ return -EOPNOTSUPP;
+}
+#endif
int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..3b4192d70570 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -424,6 +424,7 @@ union ucmd_buffer {
struct iommu_ioas_alloc alloc;
struct iommu_ioas_allow_iovas allow_iovas;
struct iommu_ioas_copy ioas_copy;
+ struct iommu_ioas_noiommu_get_pa noiommu_get_pa;
struct iommu_ioas_iova_ranges iova_ranges;
struct iommu_ioas_map map;
struct iommu_ioas_unmap unmap;
@@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova),
IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
struct iommu_ioas_map_file, iova),
+ IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa,
+ struct iommu_ioas_noiommu_get_pa, out_phys),
IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
length),
IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e998dfbd6960..26b4998439e8 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+ IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA = 0x95,
};
/**
@@ -219,6 +220,32 @@ struct iommu_ioas_map {
};
#define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
+/**
+ * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA)
+ * @size: sizeof(struct iommu_ioas_noiommu_get_pa)
+ * @flags: Reserved, must be 0 for now
+ * @ioas_id: IOAS ID to query IOVA to PA mapping from
+ * @__reserved: Must be 0
+ * @iova: IOVA to query
+ * @length: On input, maximum number of bytes to scan for contiguity (0 means
+ * no limit). On output, actual number of contiguous bytes starting
+ * from out_phys.
+ * @out_phys: Output physical address the IOVA maps to
+ *
+ * Query the physical address backing an IOVA range. The entire range must be
+ * mapped already. For noiommu devices doing unsafe DMA only.
+ */
+struct iommu_ioas_noiommu_get_pa {
+ __u32 size;
+ __u32 flags;
+ __u32 ioas_id;
+ __u32 __reserved;
+ __aligned_u64 iova;
+ __aligned_u64 length;
+ __aligned_u64 out_phys;
+};
+#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA)
+
/**
* struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
* @size: sizeof(struct iommu_ioas_map_file)
--
2.43.0
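
For readers following the contiguity scan in iopt_get_phys() above, here is a
minimal userspace sketch of the same loop. It is not the kernel code: the page
table is a hypothetical flat array (page_pa), mock_iova_to_phys() stands in for
iommu_iova_to_phys(), and PAGE_SIZE, AREA_BASE and AREA_LAST are assumed values
chosen for illustration only. Locking, the storage_domain owner check and the
overflow check are left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL		/* assumption: 4 KiB pages */
#define NR_AREA_PAGES 8ULL
#define AREA_BASE 0x10000ULL		/* hypothetical first IOVA of the area */
#define AREA_LAST (AREA_BASE + NR_AREA_PAGES * PAGE_SIZE - 1)

/* Hypothetical per-page physical addresses; 0 means "not mapped". */
static const uint64_t page_pa[NR_AREA_PAGES] = {
	0x100000, 0x101000, 0x102000,	/* three contiguous pages */
	0x200000, 0x201000,		/* two contiguous pages */
	0,				/* hole */
	0x300000, 0x301000,
};

/* Stand-in for iommu_iova_to_phys(): keeps the sub-page offset. */
static uint64_t mock_iova_to_phys(uint64_t iova)
{
	uint64_t pa;

	if (iova < AREA_BASE || iova > AREA_LAST)
		return 0;
	pa = page_pa[(iova - AREA_BASE) / PAGE_SIZE];
	return pa ? pa + (iova % PAGE_SIZE) : 0;
}

/*
 * Same shape as the scan in the patch: *length is the cap on input
 * (0 means no limit) and the contiguous byte count on output.
 */
static int sketch_get_phys(uint64_t iova, uint64_t *paddr, uint64_t *length)
{
	uint64_t max_length, len, pa;

	assert(paddr && length);
	max_length = *length;

	pa = mock_iova_to_phys(iova);
	if (!pa)
		return -1;
	*paddr = pa;

	/* The first chunk runs from iova to the end of its page. */
	len = PAGE_SIZE - (iova % PAGE_SIZE);
	while (iova < AREA_LAST) {
		uint64_t next_iova, next_pa;

		if (max_length && len >= max_length)
			break;
		next_iova = iova + PAGE_SIZE;
		if (next_iova > AREA_LAST)
			break;
		next_pa = mock_iova_to_phys(next_iova);
		/* Stop at a hole or at the first physical discontinuity. */
		if (!next_pa || next_pa != pa + PAGE_SIZE)
			break;
		iova = next_iova;
		pa = next_pa;
		len += PAGE_SIZE;
	}

	if (max_length && len > max_length)
		len = max_length;
	*length = len;
	return 0;
}
```

With this table, a query at AREA_BASE with *length == 0 reports three pages,
and the same query at AREA_BASE + 0x80 reports three pages minus 0x80 bytes,
which is the sub-page behaviour the selftest in patch 6/7 checks.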
* [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
` (3 preceding siblings ...)
2026-05-21 22:11 ` [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
@ 2026-05-21 22:11 ` Jacob Pan
2026-05-22 9:19 ` Yi Liu
2026-05-21 22:11 ` [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
` (2 subsequent siblings)
7 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
Now that devices under noiommu mode can bind with IOMMUFD and perform
IOAS operations, lift the restrictions on cdev from the VFIO side.
Use cases are documented in Documentation/driver-api/vfio.rst.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v6:
- Revert to a unified VFIO_NOIOMMU Kconfig for both cdev and group.
Use Kconfig dependencies to restrict usage and avoid null group
checks. (Alex & Yi)
- Add CAP_SYS_RAWIO checks for cdev open to maintain security parity
with the group noiommu path. (Alex)
v5:
- Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
and its dependencies
- Add comment to explain vfio_noiommu conditional definition (Alex)
- Removed early return for group noiommu in bind/unbind
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Update unsafe_noiommu Kconfig help text (Kevin)
- Change dev_warn to dev_info for noiommu enabling msg (Kevin)
v4:
- Remove early return in iommufd_bind for noiommu (Alex)
v3:
- Consolidate into fewer patches
v2:
- removed unnecessary device->noiommu set in
iommufd_vfio_compat_ioas_get_id()
drivers/vfio/Kconfig | 8 +++++---
drivers/vfio/device_cdev.c | 3 +++
drivers/vfio/iommufd.c | 6 +++---
drivers/vfio/vfio.h | 20 +++++++++++++-------
drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
include/linux/vfio.h | 1 +
6 files changed, 44 insertions(+), 17 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index ceae52fd7586..d3d8fef2855c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
The VFIO device cdev is another way for userspace to get device
access. Userspace gets device fd by opening device cdev under
/dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
- to set up secure DMA context for device access. This interface does
- not support noiommu.
+ to set up secure DMA context for device access.
If you don't know what to do here, say N.
@@ -62,7 +61,10 @@ endif
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
- depends on VFIO_GROUP
+ depends on VFIO_GROUP || VFIO_DEVICE_CDEV
+ depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
+ depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
+ select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
help
VFIO is built on the ability to isolate devices using the IOMMU.
Only with an IOMMU can userspace access to DMA capable devices be
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 54abf312cf04..4e2c1e4fc1f8 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
struct vfio_device_file *df;
int ret;
+ if (device->noiommu && !capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
/* Paired with the put in vfio_device_fops_release() */
if (!vfio_device_try_get_registration(device))
return -ENODEV;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a38d262c6028..d4f2e2a0f2f3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- /* Returns 0 to permit device opening under noiommu mode */
- if (vfio_device_is_noiommu(vdev))
+ /* Group noiommu via iommufd compat needs no device binding */
+ if (df->group && vfio_device_is_noiommu(vdev))
return 0;
return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
@@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- if (vfio_device_is_noiommu(vdev))
+ if (df->group && vfio_device_is_noiommu(vdev))
return;
if (vdev->ops->unbind_iommufd)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e4b72e79b7e3..6f0a2dfc8a00 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
static inline int vfio_device_add(struct vfio_device *device)
{
- /* cdev does not support noiommu device */
- if (vfio_device_is_noiommu(device))
- return device_add(&device->device);
vfio_init_device_cdev(device);
return cdev_device_add(&device->cdev, &device->device);
}
static inline void vfio_device_del(struct vfio_device *device)
{
- if (vfio_device_is_noiommu(device))
- device_del(&device->device);
- else
- cdev_device_del(&device->cdev, &device->device);
+ cdev_device_del(&device->cdev, &device->device);
}
int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
@@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void)
}
#endif /* CONFIG_VFIO_DEVICE_CDEV */
+#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
+static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
+{
+ return vdev->noiommu;
+}
+#else
+static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
+{
+ return false;
+}
+#endif
+
#if IS_ENABLED(CONFIG_VFIO_VIRQFD)
int __init vfio_virqfd_init(void);
void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6222376ab6ab..84381c500623 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
return ret;
}
+static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
+{
+ if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu && !device->dev->iommu) {
+ device->noiommu = true;
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+ dev_warn(device->dev,
+ "Adding kernel taint for vfio-noiommu cdev on device\n");
+ }
+
+ /* To be safe, expose the noiommu cdev node to userspace under an explicit name */
+ return dev_set_name(&device->device, "%svfio%d",
+ device->noiommu ? "noiommu-" : "", device->index);
+}
+
static int __vfio_register_dev(struct vfio_device *device,
enum vfio_group_type type)
{
@@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *device,
if (!device->dev_set)
vfio_assign_device_set(device, device);
- ret = dev_set_name(&device->device, "vfio%d", device->index);
+ ret = vfio_device_set_group(device, type);
if (ret)
return ret;
- ret = vfio_device_set_group(device, type);
+ ret = vfio_device_set_noiommu_and_name(device);
if (ret)
- return ret;
+ goto err_out;
/*
* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
* restore cache coherency. It has to be checked here because it is only
* valid for cases where we are using iommu groups.
*/
- if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
+ if (type == VFIO_IOMMU && !(vfio_device_is_noiommu(device) ||
+ vfio_device_is_cdev_noiommu(device)) &&
!device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
ret = -EINVAL;
goto err_out;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 31b826efba00..45f08986359e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -74,6 +74,7 @@ struct vfio_device {
u8 iommufd_attached:1;
#endif
u8 cdev_opened:1;
+ u8 noiommu:1;
/*
* debug_root is a static property of the vfio_device
* which must be set prior to registering the vfio_device.
--
2.43.0
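
As an illustration of the naming change above, the sketch below shows how the
"%svfio%d" format produces the "noiommu-" prefixed node name, and how a
userspace scanner can accept both forms with the same strncmp() prefix checks
that patch 6/7 uses. The helper names (format_cdev_name, is_vfio_cdev_entry,
is_noiommu_cdev_entry) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same format string as the patch passes to dev_set_name(). */
static int format_cdev_name(char *buf, size_t len, bool noiommu, int index)
{
	assert(buf && len);
	return snprintf(buf, len, "%svfio%d", noiommu ? "noiommu-" : "", index);
}

/* Accept either a regular or a noiommu cdev directory entry. */
static bool is_vfio_cdev_entry(const char *name)
{
	return !strncmp("vfio", name, 4) ||
	       !strncmp("noiommu-vfio", name, 12);
}

/* Accept only noiommu cdev directory entries. */
static bool is_noiommu_cdev_entry(const char *name)
{
	return !strncmp("noiommu-vfio", name, 12);
}
```

A scanner that only wants noiommu devices can use the stricter check, while a
generic one keeps working for both kinds of node.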
* [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
` (4 preceding siblings ...)
2026-05-21 22:11 ` [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
@ 2026-05-21 22:11 ` Jacob Pan
2026-05-21 22:39 ` David Matlack
2026-05-21 22:11 ` [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode Jacob Pan
2026-05-25 8:30 ` [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Tian, Kevin
7 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
Add a comprehensive selftest for VFIO device operations with iommufd in
noiommu mode. Tests cover:
- Device binding to iommufd
- IOAS (I/O Address Space) allocation and mapping with a dummy IOVA
- Retrieving the PA from a dummy IOVA
- Device attach/detach operations as usual
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v6:
- Add test cases for get_pa length limit
v4:
- squash DSA specific selftest changes
v2:
- New selftest for generic noiommu bind/unbind
---
tools/testing/selftests/vfio/Makefile | 1 +
.../lib/include/libvfio/vfio_pci_device.h | 16 +
.../selftests/vfio/lib/vfio_pci_device.c | 5 +-
.../vfio/vfio_iommufd_noiommu_test.c | 664 ++++++++++++++++++
4 files changed, 684 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 0684932d91bf..c9c02fdfd946 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -9,6 +9,7 @@ CFLAGS = $(KHDR_INCLUDES)
TEST_GEN_PROGS += vfio_dma_mapping_test
TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
TEST_GEN_PROGS += vfio_iommufd_setup_test
+TEST_GEN_PROGS += vfio_iommufd_noiommu_test
TEST_GEN_PROGS += vfio_pci_device_test
TEST_GEN_PROGS += vfio_pci_device_init_perf_test
TEST_GEN_PROGS += vfio_pci_driver_test
diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
index 2858885a89bb..6218c91776b3 100644
--- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
+++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
@@ -122,4 +122,20 @@ static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
const char *vfio_pci_get_cdev_path(const char *bdf);
+static inline bool vfio_pci_noiommu_mode_enabled(void)
+{
+ char buf[8] = {};
+ int fd, n;
+
+ fd = open("/sys/module/vfio/parameters/enable_unsafe_noiommu_mode",
+ O_RDONLY);
+ if (fd < 0)
+ return false;
+
+ n = read(fd, buf, sizeof(buf) - 1);
+ close(fd);
+
+ return n > 0 && buf[0] == 'Y';
+}
+
#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index fc75e04ef010..1a91658e812d 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -308,8 +308,9 @@ const char *vfio_pci_get_cdev_path(const char *bdf)
VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n", dir_path);
while ((entry = readdir(dir)) != NULL) {
- /* Find the file that starts with "vfio" */
- if (strncmp("vfio", entry->d_name, 4))
+ /* Find the file that starts with "vfio" or "noiommu-vfio" */
+ if (strncmp("vfio", entry->d_name, 4) &&
+ strncmp("noiommu-vfio", entry->d_name, 12))
continue;
snprintf(cdev_path, PATH_MAX, "/dev/vfio/devices/%s", entry->d_name);
diff --git a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
new file mode 100644
index 000000000000..d91b505fc60d
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
@@ -0,0 +1,664 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VFIO iommufd NoIOMMU Mode Selftest
+ *
+ * Tests VFIO device operations with iommufd in noiommu mode, including:
+ * - Device binding to iommufd
+ * - IOAS (I/O Address Space) allocation and management
+ * - Device attach/detach to IOAS
+ * - Memory mapping in IOAS
+ * - Device info queries and reset
+ */
+
+#include <linux/limits.h>
+#include <linux/vfio.h>
+#include <linux/iommufd.h>
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <dirent.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <libvfio.h>
+#include "kselftest_harness.h"
+
+static const char iommu_dev_path[] = "/dev/iommu";
+static const char *cdev_path;
+
+static char *vfio_noiommu_get_device_id(const char *bdf)
+{
+ char *path = NULL;
+ char *vfio_id = NULL;
+ struct dirent *dentry;
+ DIR *dp;
+
+ if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev", bdf) < 0)
+ return NULL;
+
+ dp = opendir(path);
+ if (!dp) {
+ free(path);
+ return NULL;
+ }
+
+ while ((dentry = readdir(dp)) != NULL) {
+ if (strncmp("noiommu-vfio", dentry->d_name, 12) == 0) {
+ vfio_id = strdup(dentry->d_name);
+ break;
+ }
+ }
+
+ closedir(dp);
+ free(path);
+ return vfio_id;
+}
+
+static char *vfio_noiommu_get_cdev_path(const char *bdf)
+{
+ char *vfio_id = vfio_noiommu_get_device_id(bdf);
+ char *cdev = NULL;
+
+ if (vfio_id) {
+ asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id);
+ free(vfio_id);
+ }
+ return cdev;
+}
+
+static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd)
+{
+ struct vfio_device_bind_iommufd bind_args = {
+ .argsz = sizeof(bind_args),
+ .iommufd = iommufd,
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_args);
+}
+
+static int vfio_device_get_info_ioctl(int cdev_fd,
+ struct vfio_device_info *info)
+{
+ info->argsz = sizeof(*info);
+ return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info);
+}
+
+static int vfio_device_ioas_alloc_ioctl(int iommufd,
+ struct iommu_ioas_alloc *alloc_args)
+{
+ alloc_args->size = sizeof(*alloc_args);
+ alloc_args->flags = 0;
+ return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args);
+}
+
+static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32 pt_id)
+{
+ struct vfio_device_attach_iommufd_pt attach_args = {
+ .argsz = sizeof(attach_args),
+ .pt_id = pt_id,
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_args);
+}
+
+static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd)
+{
+ struct vfio_device_detach_iommufd_pt detach_args = {
+ .argsz = sizeof(detach_args),
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_args);
+}
+
+static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t index,
+ struct vfio_region_info *info)
+{
+ info->argsz = sizeof(*info);
+ info->index = index;
+ return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info);
+}
+
+static int vfio_device_reset_ioctl(int cdev_fd)
+{
+ return ioctl(cdev_fd, VFIO_DEVICE_RESET);
+}
+
+static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+ size_t length, bool hugepages)
+{
+ struct iommu_ioas_map map_args = {
+ .size = sizeof(map_args),
+ .ioas_id = ioas_id,
+ .iova = iova,
+ .length = length,
+ .flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_FIXED_IOVA,
+ };
+ void *pages;
+ int ret;
+
+ /* Allocate test pages */
+ if (hugepages)
+ pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+ else
+ pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (pages == MAP_FAILED) {
+ printf("mmap failed for length 0x%lx\n", (unsigned long)length);
+ return -ENOMEM;
+ }
+
+ /* Set up page pointer for mapping */
+ map_args.user_va = (uintptr_t)pages;
+
+ printf(" ioas_map_pages: ioas_id=%u, iova=0x%lx, length=0x%lx, user_va=%p\n",
+ ioas_id, (unsigned long)iova, (unsigned long)length, pages);
+
+ /* Map into IOAS */
+ ret = ioctl(iommufd, IOMMU_IOAS_MAP, &map_args);
+ if (ret != 0)
+ printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret, strerror(errno));
+ else
+ printf(" IOMMU_IOAS_MAP succeeded, IOVA=0x%lx\n", (unsigned long)map_args.iova);
+
+ munmap(pages, length);
+ return ret;
+}
+
+static int ioas_unmap_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+ size_t length)
+{
+ struct iommu_ioas_unmap unmap_args = {
+ .size = sizeof(unmap_args),
+ .ioas_id = ioas_id,
+ .iova = iova,
+ .length = length,
+ };
+
+ return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args);
+}
+
+static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id)
+{
+ struct iommu_destroy destroy_args = {
+ .size = sizeof(destroy_args),
+ .id = ioas_id,
+ };
+
+ return ioctl(iommufd, IOMMU_DESTROY, &destroy_args);
+}
+
+static int ioas_noiommu_get_pa_ioctl_len(int iommufd, uint32_t ioas_id,
+ uint64_t iova, uint64_t max_length,
+ uint64_t *phys_out, uint64_t *length_out)
+{
+ struct iommu_ioas_noiommu_get_pa get_pa = {
+ .size = sizeof(get_pa),
+ .flags = 0,
+ .ioas_id = ioas_id,
+ .iova = iova,
+ .length = max_length,
+ };
+
+ printf(" ioas_noiommu_get_pa_ioctl: ioas_id=%u, iova=0x%lx, max_length=0x%lx\n",
+ ioas_id, (unsigned long)iova, (unsigned long)max_length);
+
+ if (ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, &get_pa) != 0) {
+ printf(" IOMMU_IOAS_NOIOMMU_GET_PA failed: %s (errno=%d)\n",
+ strerror(errno), errno);
+ return -1;
+ }
+
+ printf(" IOMMU_IOAS_NOIOMMU_GET_PA succeeded: PA=0x%lx, length=0x%lx\n",
+ (unsigned long)get_pa.out_phys, (unsigned long)get_pa.length);
+
+ if (phys_out)
+ *phys_out = get_pa.out_phys;
+ if (length_out)
+ *length_out = get_pa.length;
+
+ return 0;
+}
+
+static int ioas_noiommu_get_pa_ioctl(int iommufd, uint32_t ioas_id, uint64_t iova,
+ uint64_t *phys_out, uint64_t *length_out)
+{
+ return ioas_noiommu_get_pa_ioctl_len(iommufd, ioas_id, iova, 0,
+ phys_out, length_out);
+}
+
+FIXTURE(vfio_noiommu) {
+ int cdev_fd;
+ int iommufd;
+};
+
+FIXTURE_SETUP(vfio_noiommu)
+{
+ ASSERT_LE(0, (self->cdev_fd = open(cdev_path, O_RDWR, 0)));
+ ASSERT_LE(0, (self->iommufd = open(iommu_dev_path, O_RDWR, 0)));
+}
+
+FIXTURE_TEARDOWN(vfio_noiommu)
+{
+ if (self->cdev_fd >= 0)
+ close(self->cdev_fd);
+ if (self->iommufd >= 0)
+ close(self->iommufd);
+}
+
+/*
+ * Test: Device cdev can be opened
+ */
+TEST_F(vfio_noiommu, device_cdev_open)
+{
+ ASSERT_LE(0, self->cdev_fd);
+}
+
+/*
+ * Test: Device can be bound to iommufd
+ */
+TEST_F(vfio_noiommu, device_bind_iommufd)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+}
+
+/*
+ * Test: Device info can be queried after binding
+ */
+TEST_F(vfio_noiommu, device_get_info_after_bind)
+{
+ struct vfio_device_info info;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+ ASSERT_NE(0, info.argsz);
+}
+
+/*
+ * Test: Getting device info fails without bind
+ */
+TEST_F(vfio_noiommu, device_get_info_without_bind_fails)
+{
+ struct vfio_device_info info;
+
+ ASSERT_NE(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+}
+
+/*
+ * Test: Binding with invalid iommufd fails
+ */
+TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails)
+{
+ ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, -2));
+}
+
+/*
+ * Test: Cannot bind twice to same device
+ */
+TEST_F(vfio_noiommu, device_repeated_bind_fails)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+}
+
+/*
+ * Test: IOAS can be allocated
+ */
+TEST_F(vfio_noiommu, ioas_alloc)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_NE(0, alloc_args.out_ioas_id);
+}
+
+/*
+ * Test: IOAS can be destroyed
+ */
+TEST_F(vfio_noiommu, ioas_destroy)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd,
+ alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Device can attach to IOAS after binding
+ */
+TEST_F(vfio_noiommu, device_attach_to_ioas)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Attaching to invalid IOAS fails
+ */
+TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_NE(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ UINT32_MAX));
+}
+
+/*
+ * Test: Device can detach from IOAS
+ */
+TEST_F(vfio_noiommu, device_detach_from_ioas)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+ ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Full lifecycle - bind, attach, detach, reset
+ */
+TEST_F(vfio_noiommu, device_lifecycle)
+{
+ struct iommu_ioas_alloc alloc_args;
+ struct vfio_device_info info;
+
+ /* Bind device to iommufd */
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+
+ /* Allocate IOAS */
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ /* Attach device to IOAS */
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ /* Query device info */
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+
+ /* Detach device from IOAS */
+ ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+
+ /* Reset device */
+ ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Get region info
+ */
+TEST_F(vfio_noiommu, device_get_region_info)
+{
+ struct vfio_device_info dev_info;
+ struct vfio_region_info region_info;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info));
+
+ /* Try to get first region info if device has regions */
+ if (dev_info.num_regions > 0) {
+ ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0,
+ &region_info));
+ ASSERT_NE(0, region_info.argsz);
+ }
+}
+
+TEST_F(vfio_noiommu, device_reset)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+TEST_F(vfio_noiommu, ioas_map_pages)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x10000;
+ int i;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ printf("Page size: %ld bytes\n", page_size);
+ /* Test mapping regions of different sizes: 1, 2, 4, 8 pages */
+ for (i = 0; i < 4; i++) {
+ size_t map_size = page_size * (1 << i); /* 1, 2, 4, 8 pages */
+ uint64_t test_iova = iova + (i * 0x100000);
+
+ /* Attempt to map each region (may fail if not supported) */
+ ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ test_iova, map_size, false);
+ }
+}
+
+TEST_F(vfio_noiommu, multiple_ioas_alloc)
+{
+ struct iommu_ioas_alloc alloc1, alloc2;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2));
+ ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id);
+}
+
+/*
+ * Test: Query physical address for IOVA
+ * Tests IOMMU_IOAS_NOIOMMU_GET_PA ioctl to translate IOVA to physical address
+ * Note: Device must be attached to IOAS for PA query to work
+ */
+#define NR_PAGES 32
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_mapped)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x200000;
+ uint64_t phys = 0;
+ uint64_t length = 0;
+ int ret;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ /*
+ * Map pages at an arbitrary IOVA, used as a cookie for lookup.
+ * Use hugepages to test contiguous PA. Make sure hugepages are
+ * available, e.g. echo 64 > /proc/sys/vm/nr_hugepages
+ */
+ ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ iova, page_size * NR_PAGES, true);
+ if (ret != 0)
+ return;
+
+ /* Query the physical address for the mapped dummy IOVA */
+ ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ iova, &phys, &length);
+
+ if (ret == 0) {
+ /* If we got a result, verify it's valid */
+ ASSERT_NE(0, phys);
+ ASSERT_GE((uint64_t)page_size * NR_PAGES, length);
+ }
+
+ /*
+ * Query with a non-page-aligned IOVA. The returned length must
+ * not exceed the actual contiguous range starting from that
+ * offset, i.e. it must be reduced by the sub-page offset.
+ */
+ phys = 0;
+ length = 0;
+ ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ iova + 0x80, &phys, &length);
+ if (ret == 0) {
+ ASSERT_NE(0, phys);
+ /* Length must account for the sub-page offset */
+ ASSERT_GE((uint64_t)page_size * NR_PAGES - 0x80, length);
+ ASSERT_LE(length, (uint64_t)page_size * NR_PAGES - 0x80);
+ /* Must not overshoot into the next page boundary */
+ ASSERT_EQ(0, (phys + length) % page_size);
+ }
+}
+
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_unmapped_fails)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ /* Try to retrieve unmapped IOVA (should fail) */
+ ASSERT_NE(0, ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ 0x10000, NULL, NULL));
+}
+
+/*
+ * Test: length == 0 means no limit (backward compat default)
+ */
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_zero_no_limit)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x200000;
+ uint64_t phys_nolimit = 0, phys_zero = 0;
+ uint64_t len_nolimit = 0, len_zero = 0;
+ int ret;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ iova, page_size * NR_PAGES, true);
+ if (ret != 0)
+ return;
+
+ /* Query with length=0 (no limit, default behavior) */
+ ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
+ iova, 0, &phys_zero, &len_zero);
+ if (ret != 0)
+ return;
+
+ /* Query with the wrapper (also passes 0) — must match */
+ ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ iova, &phys_nolimit, &len_nolimit);
+ ASSERT_EQ(0, ret);
+ ASSERT_EQ(phys_zero, phys_nolimit);
+ ASSERT_EQ(len_zero, len_nolimit);
+}
+
+/*
+ * Test: length caps the returned contiguous range
+ */
+TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_capped)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x200000;
+ uint64_t phys = 0;
+ uint64_t len_full = 0, len_capped = 0;
+ uint64_t cap;
+ int ret;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ iova, page_size * NR_PAGES, true);
+ if (ret != 0)
+ return;
+
+ /* First get the full uncapped length */
+ ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ iova, &phys, &len_full);
+ if (ret != 0)
+ return;
+
+ ASSERT_NE(0, phys);
+ ASSERT_NE(0, len_full);
+
+ /* Cap to a single page — returned length must not exceed it */
+ cap = page_size;
+ ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
+ iova, cap, &phys, &len_capped);
+ ASSERT_EQ(0, ret);
+ ASSERT_LE(len_capped, cap);
+ ASSERT_NE(0, len_capped);
+
+ /*
+ * If full length was larger than one page, confirm capping works.
+ * Otherwise the mapping wasn't contiguous enough to test.
+ */
+ if (len_full > cap)
+ ASSERT_GT(len_full, len_capped);
+
+ /* Cap to a very large value — should return the same as uncapped */
+ ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
+ iova, UINT64_MAX, &phys, &len_capped);
+ ASSERT_EQ(0, ret);
+ ASSERT_EQ(len_full, len_capped);
+}
+
+int main(int argc, char *argv[])
+{
+ const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);
+ char *cdev = NULL;
+
+ if (!device_bdf) {
+ ksft_print_msg("No device BDF provided\n");
+ return KSFT_SKIP;
+ }
+
+ cdev = vfio_noiommu_get_cdev_path(device_bdf);
+ if (!cdev) {
+ ksft_print_msg("Could not find cdev for device %s\n",
+ device_bdf);
+ return KSFT_SKIP;
+ }
+
+ cdev_path = cdev;
+ ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path,
+ device_bdf);
+
+ return test_harness_run(argc, argv);
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
` (5 preceding siblings ...)
2026-05-21 22:11 ` [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
@ 2026-05-21 22:11 ` Jacob Pan
2026-05-22 9:42 ` Yi Liu
2026-05-25 8:30 ` [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Tian, Kevin
7 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-21 22:11 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan
Document the NOIOMMU mode with newly added cdev support under iommufd.
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v6:
- Generalize device node names (noiommu-vfioX, noiommu-Y) in the tree
example (Yi)
- Clarify table column descriptions for Yes/No meanings (Yi)
---
Documentation/driver-api/vfio.rst | 83 ++++++++++++++++++++++++++++++-
1 file changed, 81 insertions(+), 2 deletions(-)
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 2a21a42c9386..739576a22de6 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -275,8 +275,6 @@ in a VFIO group.
With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX where
"X" is the number allocated uniquely by VFIO for registered devices.
-cdev interface does not support noiommu devices, so user should use
-the legacy group interface if noiommu is wanted.
The cdev only works with IOMMUFD. Both VFIO drivers and applications
must adapt to the new cdev security model which requires using
@@ -370,6 +368,87 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
/* Other device operations as stated in "VFIO Usage Example" */
+VFIO NOIOMMU mode
+-------------------------------------------------------------------------------
+VFIO also supports a no-IOMMU mode, intended for use cases where userspace
+drivers perform unsafe DMA without physical IOMMU protection. This mode is
+controlled by the module parameter:
+
+/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+Upon enabling this mode, with an assigned device, the user will be presented
+with a VFIO group and device file, e.g.::
+
+ /dev/vfio/
+ |-- devices
+ | `-- noiommu-vfioX /* VFIO device cdev */
+ |-- noiommu-Y /* VFIO group */
+ `-- vfio
+
+The capabilities vary depending on the device programming interface and kernel
+configuration used. The following table summarizes the differences ("Yes" means
+the UAPI is accessible and functional in noiommu mode, "No" means the UAPI is
+not supported):
+
++-------------------+---------------------+----------------------+
+| Feature | VFIO group | VFIO device cdev |
++===================+=====================+======================+
+| VFIO device UAPI | Yes | Yes |
++-------------------+---------------------+----------------------+
+| VFIO container | No | No |
++-------------------+---------------------+----------------------+
+| IOMMUFD IOAS | No | Yes* |
++-------------------+---------------------+----------------------+
+
+Note that the VFIO container case includes IOMMUFD provided VFIO compatibility
+interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER is
+enabled.
+
+* IOMMUFD UAPI is available for VFIO device cdev to pin and map user memory with
+ the ability to retrieve physical addresses for DMA command submission.
+
+Kconfig Support Matrix
+^^^^^^^^^^^^^^^^^^^^^^
+
+The visibility of CONFIG_VFIO_NOIOMMU depends on the combination of
+CONFIG_VFIO_GROUP, CONFIG_VFIO_DEVICE_CDEV, and whether a container backend
+(CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER) is configured. The
+Kconfig dependencies enforce the following constraints:
+
+- At least one access path (group or cdev) must be available.
+- If VFIO_GROUP is enabled, a container backend is required; otherwise the
+ group node would be unusable in noiommu mode.
+
+The resulting support matrix:
+
++------+-------+-----------+------+---------+---------------------------+
+| Case | GROUP | Container | CDEV | NOIOMMU | Notes |
++======+=======+===========+======+=========+===========================+
+| 1 | y | y | n | yes | Group noiommu works |
++------+-------+-----------+------+---------+---------------------------+
+| 2 | y | n | n | no | Blocked - no container |
++------+-------+-----------+------+---------+---------------------------+
+| 3 | y | y | y | yes | Both paths work |
++------+-------+-----------+------+---------+---------------------------+
+| 4 | y | n | y | no | Blocked - no container |
++------+-------+-----------+------+---------+---------------------------+
+| 5 | n | - | y | yes | Cdev-only works |
++------+-------+-----------+------+---------+---------------------------+
+| 6 | n | - | n | no | No access path |
++------+-------+-----------+------+---------+---------------------------+
+
+Container = CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER (either
+suffices). Case 4 is intentionally blocked: allowing NOIOMMU with GROUP
+enabled but no container would create unusable group nodes. Users who want
+cdev-only noiommu should set CONFIG_VFIO_GROUP=n (case 5).
+
+A new IOMMUFD ioctl IOMMU_IOAS_NOIOMMU_GET_PA is added to retrieve the physical
+address for a given IOVA. Although there is no physical DMA remapping hardware,
+IOMMU_IOAS_MAP_FIXED_IOVA is still used to establish IOVA-to-PA mappings in the
+software page table for later IOMMU_IOAS_NOIOMMU_GET_PA lookups.
+tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c provides an example of
+using this ioctl in no-IOMMU mode.
+
VFIO User API
-------------------------------------------------------------------------------
--
2.43.0
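
The two Kconfig constraints listed in the documentation above (at least one
access path, and a container backend whenever GROUP is enabled) can be
checked against the six-case matrix with a small truth-table model. This is
only an illustrative sketch: the real dependency is expressed in Kconfig, and
the function name noiommu_selectable() is hypothetical, not part of the
series.

```c
#include <stdbool.h>

/*
 * Illustrative model of the documented constraints, not kernel code.
 * group/container/cdev stand for CONFIG_VFIO_GROUP, a container backend
 * (CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER), and
 * CONFIG_VFIO_DEVICE_CDEV respectively.
 */
static bool noiommu_selectable(bool group, bool container, bool cdev)
{
	/* At least one access path (group or cdev) must be available. */
	bool has_access_path = group || cdev;
	/* If the group path is enabled, it needs a container backend. */
	bool group_usable = !group || container;

	return has_access_path && group_usable;
}
```

Evaluating the model for cases 1 to 6 gives yes, no, yes, no, yes, no, which
matches the NOIOMMU column of the table.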
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev
2026-05-21 22:11 ` [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
@ 2026-05-21 22:39 ` David Matlack
2026-06-03 0:13 ` Jacob Pan
0 siblings, 1 reply; 25+ messages in thread
From: David Matlack @ 2026-05-21 22:39 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon
On 2026-05-21 03:11 PM, Jacob Pan wrote:
For the shortlog, please use "vfio: selftests: ..."
> Add comprehensive selftest for VFIO device operations with iommufd in
> noiommu mode. Tests cover:
> - Device binding to iommufd
> - IOAS (I/O Address Space) allocation, mapping with dummy IOVA
> - Retrieve PA from dummy IOVA
> - Device attach/detach operations as usual
High level feedback: Can you use the library for all the standard setup
and ioas mapping instead of reimplementing it in this test?
iommu = iommu_init(MODE_IOMMUFD);
device = vfio_pci_device_init(iommu, bdf);
__iommu_map(...);
__iommu_unmap(...);
iommu_cleanup(iommu);
vfio_pci_device_cleanup(device);
If not, what are the gaps? It would be useful to fill in those gaps so
that it is easier to use VFIO selftests with noiommu setups.
>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> v6:
> - Add test cases for get_pa length limit
> v4:
> - squash DSA specific selftest changes
> v2:
> - New selftest for generic noiommu bind/unbind
> ---
> tools/testing/selftests/vfio/Makefile | 1 +
> .../lib/include/libvfio/vfio_pci_device.h | 16 +
> .../selftests/vfio/lib/vfio_pci_device.c | 5 +-
> .../vfio/vfio_iommufd_noiommu_test.c | 664 ++++++++++++++++++
> 4 files changed, 684 insertions(+), 2 deletions(-)
> create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
>
> diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
> index 0684932d91bf..c9c02fdfd946 100644
> --- a/tools/testing/selftests/vfio/Makefile
> +++ b/tools/testing/selftests/vfio/Makefile
> @@ -9,6 +9,7 @@ CFLAGS = $(KHDR_INCLUDES)
> TEST_GEN_PROGS += vfio_dma_mapping_test
> TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
> TEST_GEN_PROGS += vfio_iommufd_setup_test
> +TEST_GEN_PROGS += vfio_iommufd_noiommu_test
> TEST_GEN_PROGS += vfio_pci_device_test
> TEST_GEN_PROGS += vfio_pci_device_init_perf_test
> TEST_GEN_PROGS += vfio_pci_driver_test
> diff --git a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
> index 2858885a89bb..6218c91776b3 100644
> --- a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
> +++ b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
> @@ -122,4 +122,20 @@ static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
>
> const char *vfio_pci_get_cdev_path(const char *bdf);
>
> +static inline bool vfio_pci_noiommu_mode_enabled(void)
> +{
> + char buf[8] = {};
> + int fd, n;
> +
> + fd = open("/sys/module/vfio/parameters/enable_unsafe_noiommu_mode",
> + O_RDONLY);
Can you rebase on top of the latest changes Alex merged for 7.2? It
introduces the sysfs library from Raghu. Please add a helper there for
reading module parameters in a precursor patch.
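
A rough sketch of what such a helper could look like, assuming it only needs
to handle boolean parameters (which the kernel prints as 'Y' or 'N'). The
names sysfs_read_bool_file() and module_param_enabled() are hypothetical and
are not taken from the sysfs library mentioned above:

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a boolean sysfs-style file and return true
 * only if its first character is 'Y'. Unreadable files count as false.
 */
static bool sysfs_read_bool_file(const char *path)
{
	char buf[8] = {0};
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return false;

	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);

	return n > 0 && buf[0] == 'Y';
}

/* Hypothetical wrapper: build /sys/module/<module>/parameters/<param>. */
static bool module_param_enabled(const char *module, const char *param)
{
	char path[256];
	int len;

	len = snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s",
		       module, param);
	if (len < 0 || (size_t)len >= sizeof(path))
		return false;

	return sysfs_read_bool_file(path);
}
```

With something like this, the noiommu check would reduce to
module_param_enabled("vfio", "enable_unsafe_noiommu_mode").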
> + if (fd < 0)
> + return false;
> +
> + n = read(fd, buf, sizeof(buf) - 1);
> + close(fd);
> +
> + return n > 0 && buf[0] == 'Y';
> +}
> +
> #endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */
> diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
> index fc75e04ef010..1a91658e812d 100644
> --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
> +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
> @@ -308,8 +308,9 @@ const char *vfio_pci_get_cdev_path(const char *bdf)
> VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n", dir_path);
>
> while ((entry = readdir(dir)) != NULL) {
> - /* Find the file that starts with "vfio" */
> - if (strncmp("vfio", entry->d_name, 4))
> + /* Find the file that starts with "vfio" or "noiommu-vfio" */
> + if (strncmp("vfio", entry->d_name, 4) &&
> + strncmp("noiommu-vfio", entry->d_name, 12))
> continue;
>
> snprintf(cdev_path, PATH_MAX, "/dev/vfio/devices/%s", entry->d_name);
> diff --git a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
> new file mode 100644
> index 000000000000..d91b505fc60d
> --- /dev/null
> +++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
> @@ -0,0 +1,664 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * VFIO iommufd NoIOMMU Mode Selftest
> + *
> + * Tests VFIO device operations with iommufd in noiommu mode, including:
> + * - Device binding to iommufd
> + * - IOAS (I/O Address Space) allocation and management
> + * - Device attach/detach to IOAS
> + * - Memory mapping in IOAS
> + * - Device info queries and reset
> + */
> +
> +#include <linux/limits.h>
> +#include <linux/vfio.h>
> +#include <linux/iommufd.h>
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <dirent.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +#include <errno.h>
> +
> +#include <libvfio.h>
> +#include "kselftest_harness.h"
> +
> +static const char iommu_dev_path[] = "/dev/iommu";
I don't see why this needs to be global variables.
> +static const char *cdev_path;
> +
> +static char *vfio_noiommu_get_device_id(const char *bdf)
> +{
> + char *path = NULL;
> + char *vfio_id = NULL;
> + struct dirent *dentry;
> + DIR *dp;
> +
> + if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev", bdf) < 0)
> + return NULL;
> +
> + dp = opendir(path);
> + if (!dp) {
> + free(path);
> + return NULL;
> + }
> +
> + while ((dentry = readdir(dp)) != NULL) {
> + if (strncmp("noiommu-vfio", dentry->d_name, 12) == 0) {
> + vfio_id = strdup(dentry->d_name);
> + break;
> + }
> + }
> +
> + closedir(dp);
> + free(path);
> + return vfio_id;
> +}
> +
> +static char *vfio_noiommu_get_cdev_path(const char *bdf)
> +{
> + char *vfio_id = vfio_noiommu_get_device_id(bdf);
> + char *cdev = NULL;
> +
> + if (vfio_id) {
> + asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id);
> + free(vfio_id);
> + }
> + return cdev;
> +}
Can we put this in the library and find a way to share code with
vfio_pci_get_cdev_path()?
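
One possible shape for the shared piece, as a sketch only: a single name
matcher that both lookups could call while walking the vfio-dev directory.
The function name vfio_cdev_name_match() and its noiommu out-parameter are
hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a vfio-dev directory entry names a
 * VFIO cdev ("vfioX" or "noiommu-vfioX"). If noiommu is non-NULL, it is
 * set to whether the entry carries the "noiommu-" prefix.
 */
static bool vfio_cdev_name_match(const char *name, bool *noiommu)
{
	static const char noiommu_prefix[] = "noiommu-vfio";
	static const char prefix[] = "vfio";

	if (!strncmp(name, noiommu_prefix, sizeof(noiommu_prefix) - 1)) {
		if (noiommu)
			*noiommu = true;
		return true;
	}

	if (!strncmp(name, prefix, sizeof(prefix) - 1)) {
		if (noiommu)
			*noiommu = false;
		return true;
	}

	return false;
}
```

The readdir() loop in the library could then use this for the match, and a
noiommu-only caller could additionally require *noiommu to be true.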
> +
> +static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd)
> +{
> + struct vfio_device_bind_iommufd bind_args = {
> + .argsz = sizeof(bind_args),
> + .iommufd = iommufd,
> + };
> +
> + return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_args);
> +}
Please add the ioctl wrappers to the library so they can be used by
other tests or library code in the future.
VFIO device ioctls can go in vfio_pci_device.c and iommufd ioctls can go
in iommu.c.
> +
> +static int vfio_device_get_info_ioctl(int cdev_fd,
> + struct vfio_device_info *info)
> +{
> + info->argsz = sizeof(*info);
> + return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info);
> +}
> +
> +static int vfio_device_ioas_alloc_ioctl(int iommufd,
> + struct iommu_ioas_alloc *alloc_args)
> +{
> + alloc_args->size = sizeof(*alloc_args);
> + alloc_args->flags = 0;
> + return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args);
> +}
> +
> +static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32 pt_id)
> +{
> + struct vfio_device_attach_iommufd_pt attach_args = {
> + .argsz = sizeof(attach_args),
> + .pt_id = pt_id,
> + };
> +
> + return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_args);
> +}
> +
> +static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd)
> +{
> + struct vfio_device_detach_iommufd_pt detach_args = {
> + .argsz = sizeof(detach_args),
> + };
> +
> + return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_args);
> +}
> +
> +static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t index,
> + struct vfio_region_info *info)
> +{
> + info->argsz = sizeof(*info);
> + info->index = index;
> + return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info);
> +}
> +
> +static int vfio_device_reset_ioctl(int cdev_fd)
> +{
> + return ioctl(cdev_fd, VFIO_DEVICE_RESET);
> +}
> +
> +static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
> + size_t length, bool hugepages)
> +{
> + struct iommu_ioas_map map_args = {
> + .size = sizeof(map_args),
> + .ioas_id = ioas_id,
> + .iova = iova,
> + .length = length,
> + .flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_FIXED_IOVA,
> + };
> + void *pages;
> + int ret;
> +
> + /* Allocate test pages */
> + if (hugepages)
> + pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
> + MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
> + else
> + pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
> + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> + if (pages == MAP_FAILED) {
> + printf("mmap failed for length 0x%lx\n", (unsigned long)length);
> + return -ENOMEM;
> + }
> +
> + /* Set up page pointer for mapping */
> + map_args.user_va = (uintptr_t)pages;
> +
> + printf(" ioas_map_pages: ioas_id=%u, iova=0x%lx, length=0x%lx, user_va=%p\n",
> + ioas_id, (unsigned long)iova, (unsigned long)length, pages);
> +
> + /* Map into IOAS */
> + ret = ioctl(iommufd, IOMMU_IOAS_MAP, &map_args);
> + if (ret != 0)
> + printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret, strerror(errno));
> + else
> + printf(" IOMMU_IOAS_MAP succeeded, IOVA=0x%lx\n", (unsigned long)map_args.iova);
> +
> + munmap(pages, length);
> + return ret;
> +}
> +
> +static int ioas_unmap_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
> + size_t length)
> +{
> + struct iommu_ioas_unmap unmap_args = {
> + .size = sizeof(unmap_args),
> + .ioas_id = ioas_id,
> + .iova = iova,
> + .length = length,
> + };
> +
> + return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args);
> +}
> +
> +static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id)
> +{
> + struct iommu_destroy destroy_args = {
> + .size = sizeof(destroy_args),
> + .id = ioas_id,
> + };
> +
> + return ioctl(iommufd, IOMMU_DESTROY, &destroy_args);
> +}
> +
> +static int ioas_noiommu_get_pa_ioctl_len(int iommufd, uint32_t ioas_id,
> + uint64_t iova, uint64_t max_length,
> + uint64_t *phys_out, uint64_t *length_out)
> +{
> + struct iommu_ioas_noiommu_get_pa get_pa = {
> + .size = sizeof(get_pa),
> + .flags = 0,
> + .ioas_id = ioas_id,
> + .iova = iova,
> + .length = max_length,
> + };
> +
> + printf(" ioas_noiommu_get_pa_ioctl: ioas_id=%u, iova=0x%lx, max_length=0x%lx\n",
> + ioas_id, (unsigned long)iova, (unsigned long)max_length);
> +
> + if (ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, &get_pa) != 0) {
> + printf(" IOMMU_IOAS_NOIOMMU_GET_PA failed: %s (errno=%d)\n",
> + strerror(errno), errno);
> + return -1;
> + }
> +
> + printf(" IOMMU_IOAS_NOIOMMU_GET_PA succeeded: PA=0x%lx, length=0x%lx\n",
> + (unsigned long)get_pa.out_phys, (unsigned long)get_pa.length);
> +
> + if (phys_out)
> + *phys_out = get_pa.out_phys;
> + if (length_out)
> + *length_out = get_pa.length;
> +
> + return 0;
> +}
> +
> +static int ioas_noiommu_get_pa_ioctl(int iommufd, uint32_t ioas_id, uint64_t iova,
> + uint64_t *phys_out, uint64_t *length_out)
> +{
> + return ioas_noiommu_get_pa_ioctl_len(iommufd, ioas_id, iova, 0,
> + phys_out, length_out);
> +}
> +
> +FIXTURE(vfio_noiommu) {
> + int cdev_fd;
> + int iommufd;
> +};
> +
> +FIXTURE_SETUP(vfio_noiommu)
> +{
> + ASSERT_LE(0, (self->cdev_fd = open(cdev_path, O_RDWR, 0)));
> + ASSERT_LE(0, (self->iommufd = open(iommu_dev_path, O_RDWR, 0)));
> +}
> +
> +FIXTURE_TEARDOWN(vfio_noiommu)
> +{
> + if (self->cdev_fd >= 0)
> + close(self->cdev_fd);
> + if (self->iommufd >= 0)
> + close(self->iommufd);
> +}
> +
> +/*
> + * Test: Device cdev can be opened
> + */
> +TEST_F(vfio_noiommu, device_cdev_open)
> +{
> + ASSERT_LE(0, self->cdev_fd);
> +}
This is already tested by the FIXTURE_SETUP(). No need for a TEST_F().
> +
> +/*
> + * Test: Device can be bound to iommufd
> + */
> +TEST_F(vfio_noiommu, device_bind_iommufd)
> +{
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> +}
> +
> +/*
> + * Test: Device info can be queried after binding
> + */
> +TEST_F(vfio_noiommu, device_get_info_after_bind)
> +{
> + struct vfio_device_info info;
> +
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
> + ASSERT_NE(0, info.argsz);
> +}
> +
> +/*
> + * Test: Getting device info fails without bind
> + */
> +TEST_F(vfio_noiommu, device_get_info_without_bind_fails)
> +{
> + struct vfio_device_info info;
> +
> + ASSERT_NE(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
> +}
> +
> +/*
> + * Test: Binding with invalid iommufd fails
> + */
> +TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails)
> +{
> + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, -2));
> +}
Are all these tests really specific to noiommu?
> +
> +/*
> + * Test: Cannot bind twice to same device
> + */
> +TEST_F(vfio_noiommu, device_repeated_bind_fails)
> +{
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> +}
> +
> +/*
> + * Test: IOAS can be allocated
> + */
> +TEST_F(vfio_noiommu, ioas_alloc)
> +{
> + struct iommu_ioas_alloc alloc_args;
> +
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> + ASSERT_NE(0, alloc_args.out_ioas_id);
> +}
> +
> +/*
> + * Test: IOAS can be destroyed
> + */
> +TEST_F(vfio_noiommu, ioas_destroy)
> +{
> + struct iommu_ioas_alloc alloc_args;
> +
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> + ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd,
> + alloc_args.out_ioas_id));
> +}
> +
> +/*
> + * Test: Device can attach to IOAS after binding
> + */
> +TEST_F(vfio_noiommu, device_attach_to_ioas)
> +{
> + struct iommu_ioas_alloc alloc_args;
> +
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> + alloc_args.out_ioas_id));
> +}
> +
> +/*
> + * Test: Attaching to invalid IOAS fails
> + */
> +TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails)
> +{
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_NE(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> + UINT32_MAX));
> +}
> +
> +/*
> + * Test: Device can detach from IOAS
> + */
> +TEST_F(vfio_noiommu, device_detach_from_ioas)
> +{
> + struct iommu_ioas_alloc alloc_args;
> +
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> + alloc_args.out_ioas_id));
> + ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
> +}
> +
> +/*
> + * Test: Full lifecycle - bind, attach, detach, reset
> + */
> +TEST_F(vfio_noiommu, device_lifecycle)
> +{
> + struct iommu_ioas_alloc alloc_args;
> + struct vfio_device_info info;
> +
> + /* Bind device to iommufd */
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> +
> + /* Allocate IOAS */
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> +
> + /* Attach device to IOAS */
> + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> + alloc_args.out_ioas_id));
> +
> + /* Query device info */
> + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
> +
> + /* Detach device from IOAS */
> + ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
> +
> + /* Reset device */
> + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
> +}
> +
> +/*
> + * Test: Get region info
> + */
> +TEST_F(vfio_noiommu, device_get_region_info)
> +{
> + struct vfio_device_info dev_info;
> + struct vfio_region_info region_info;
> +
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info));
> +
> + /* Try to get first region info if device has regions */
> + if (dev_info.num_regions > 0) {
> + ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0,
> + &region_info));
> + ASSERT_NE(0, region_info.argsz);
> + }
> +}
> +
> +TEST_F(vfio_noiommu, device_reset)
> +{
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
> +}
> +
> +TEST_F(vfio_noiommu, ioas_map_pages)
> +{
> + struct iommu_ioas_alloc alloc_args;
> + long page_size = sysconf(_SC_PAGESIZE);
> + uint64_t iova = 0x10000;
> + int i;
> +
> + ASSERT_GT(page_size, 0);
> +
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> +
> + printf("Page size: %ld bytes\n", page_size);
> + /* Test mapping regions of different sizes: 1, 2, 4, 8 pages */
> + for (i = 0; i < 4; i++) {
> + size_t map_size = page_size * (1 << i); /* 1, 2, 4, 8 pages */
> + uint64_t test_iova = iova + (i * 0x100000);
> +
> + /* Attempt to map each region (may fail if not supported) */
> + ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> + test_iova, map_size, false);
> + }
> +}
> +
> +TEST_F(vfio_noiommu, multiple_ioas_alloc)
> +{
> + struct iommu_ioas_alloc alloc1, alloc2;
> +
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1));
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2));
> + ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id);
> +}
> +
> +/*
> + * Test: Query physical address for IOVA
> + * Tests IOMMU_IOAS_NOIOMMU_GET_PA ioctl to translate IOVA to physical address
> + * Note: Device must be attached to IOAS for PA query to work
> + */
> +#define NR_PAGES 32
> +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_mapped)
> +{
> + struct iommu_ioas_alloc alloc_args;
> + long page_size = sysconf(_SC_PAGESIZE);
> + uint64_t iova = 0x200000;
> + uint64_t phys = 0;
> + uint64_t length = 0;
> + int ret;
> +
> + ASSERT_GT(page_size, 0);
> +
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> +
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> +
> + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> + alloc_args.out_ioas_id));
> +
> + /*
> + * Map a page into an arbitrary IOAS, used as a cookie for lookup.
> + * Use hugepages to test contiguous PA. Make sure hugepages are
> + * available. e.g. echo 64 > /proc/sys/vm/nr_hugepages
> + */
> + ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> + iova, page_size * NR_PAGES, true);
> + if (ret != 0)
> + return;
> +
> + /* Query the physical address for the mapped dummy IOVA */
> + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> + iova, &phys, &length);
> +
> + if (ret == 0) {
> + /* If we got a result, verify it's valid */
> + ASSERT_NE(0, phys);
> + ASSERT_GE((uint64_t)page_size * NR_PAGES, length);
> + }
> +
> + /*
> + * Query with a non-page-aligned IOVA. The returned length must
> + * not exceed the actual contiguous range starting from that
> + * offset, i.e. it must be reduced by the sub-page offset.
> + */
> + phys = 0;
> + length = 0;
> + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> + iova + 0x80, &phys, &length);
> + if (ret == 0) {
> + ASSERT_NE(0, phys);
> + /* Length must account for the sub-page offset */
> + ASSERT_GE((uint64_t)page_size * NR_PAGES - 0x80, length);
> + ASSERT_LE(length, (uint64_t)page_size * NR_PAGES - 0x80);
> + /* Must not overshoot into the next page boundary */
> + ASSERT_EQ(0, (phys + length) % page_size);
> + }
> +}
> +
> +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_unmapped_fails)
> +{
> + struct iommu_ioas_alloc alloc_args;
> +
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> + &alloc_args));
> +
> + /* Try to retrieve unmapped IOVA (should fail) */
> + ASSERT_NE(0, ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> + 0x10000, NULL, NULL));
> +}
> +
> +/*
> + * Test: length == 0 means no limit (backward compat default)
> + */
> +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_zero_no_limit)
> +{
> + struct iommu_ioas_alloc alloc_args;
> + long page_size = sysconf(_SC_PAGESIZE);
> + uint64_t iova = 0x200000;
> + uint64_t phys_nolimit = 0, phys_zero = 0;
> + uint64_t len_nolimit = 0, len_zero = 0;
> + int ret;
> +
> + ASSERT_GT(page_size, 0);
> +
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
> + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> + alloc_args.out_ioas_id));
> +
> + ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> + iova, page_size * NR_PAGES, true);
> + if (ret != 0)
> + return;
> +
> + /* Query with length=0 (no limit, default behavior) */
> + ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
> + iova, 0, &phys_zero, &len_zero);
> + if (ret != 0)
> + return;
> +
> + /* Query with the wrapper (also passes 0) — must match */
> + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> + iova, &phys_nolimit, &len_nolimit);
> + ASSERT_EQ(0, ret);
> + ASSERT_EQ(phys_zero, phys_nolimit);
> + ASSERT_EQ(len_zero, len_nolimit);
> +}
> +
> +/*
> + * Test: length caps the returned contiguous range
> + */
> +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_capped)
> +{
> + struct iommu_ioas_alloc alloc_args;
> + long page_size = sysconf(_SC_PAGESIZE);
> + uint64_t iova = 0x200000;
> + uint64_t phys = 0;
> + uint64_t len_full = 0, len_capped = 0;
> + uint64_t cap;
> + int ret;
> +
> + ASSERT_GT(page_size, 0);
> +
> + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> + self->iommufd));
> + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
> + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> + alloc_args.out_ioas_id));
> +
> + ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> + iova, page_size * NR_PAGES, true);
> + if (ret != 0)
> + return;
> +
> + /* First get the full uncapped length */
> + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> + iova, &phys, &len_full);
> + if (ret != 0)
> + return;
> +
> + ASSERT_NE(0, phys);
> + ASSERT_NE(0, len_full);
> +
> + /* Cap to a single page — returned length must not exceed it */
> + cap = page_size;
> + ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
> + iova, cap, &phys, &len_capped);
> + ASSERT_EQ(0, ret);
> + ASSERT_LE(len_capped, cap);
> + ASSERT_NE(0, len_capped);
> +
> + /*
> + * If full length was larger than one page, confirm capping works.
> + * Otherwise the mapping wasn't contiguous enough to test.
> + */
> + if (len_full > cap)
> + ASSERT_GT(len_full, len_capped);
> +
> + /* Cap to a very large value — should return the same as uncapped */
> + ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
> + iova, UINT64_MAX, &phys, &len_capped);
> + ASSERT_EQ(0, ret);
> + ASSERT_EQ(len_full, len_capped);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);
> + char *cdev = NULL;
> +
> + if (!device_bdf) {
> + ksft_print_msg("No device BDF provided\n");
> + return KSFT_SKIP;
> + }
vfio_selftests_get_bdf() already handles exiting with KSFT_SKIP if it
can't find a BDF.
> +
> + cdev = vfio_noiommu_get_cdev_path(device_bdf);
> + if (!cdev) {
> + ksft_print_msg("Could not find cdev for device %s\n",
> + device_bdf);
nit: "Could not find noiommu cdev for ..."
> + return KSFT_SKIP;
> + }
> +
> + cdev_path = cdev;
> + ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path,
> + device_bdf);
> +
> + return test_harness_run(argc, argv);
> +}
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 2/7] iommufd: Move igroup allocation to a function
2026-05-21 22:11 ` [PATCH v6 2/7] iommufd: Move igroup allocation to a function Jacob Pan
@ 2026-05-22 6:00 ` Baolu Lu
0 siblings, 0 replies; 25+ messages in thread
From: Baolu Lu @ 2026-05-22 6:00 UTC (permalink / raw)
To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon
On 5/22/26 06:11, Jacob Pan wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> So it can be reused in the next patch which allows binding to noiommu
> device.
>
> Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> v5:
> - Add NULL group to the error handling path of
> iommufd_group_setup_msi()
> v3:
> - New patch
> ---
> drivers/iommu/iommufd/device.c | 43 +++++++++++++++++++++-------------
> 1 file changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 170a7005f0bc..d03076fcf3c2 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -56,6 +56,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
> return kref_get_unless_zero(&igroup->ref);
> }
>
> +static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
> + struct iommu_group *group)
> +{
> + struct iommufd_group *new_igroup;
> +
> + new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
nit: I am still wondering why kzalloc_obj() was replaced with kzalloc()
here. Is it a rebase issue or something that was left out of the commit
message?
Otherwise, I am fine with it,
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> + if (!new_igroup)
> + return ERR_PTR(-ENOMEM);
> +
> + kref_init(&new_igroup->ref);
> + mutex_init(&new_igroup->lock);
> + xa_init(&new_igroup->pasid_attach);
> + new_igroup->sw_msi_start = PHYS_ADDR_MAX;
> + /* group reference moves into new_igroup */
> + new_igroup->group = group;
> +
> + /*
> + * The ictx is not additionally refcounted here because all objects using
> + * an igroup must put it before their destroy completes.
> + */
> + new_igroup->ictx = ictx;
> + return new_igroup;
> +}
> +
> /*
> * iommufd needs to store some more data for each iommu_group, we keep a
> * parallel xarray indexed by iommu_group id to hold this instead of putting it
> @@ -87,25 +111,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
> }
> xa_unlock(&ictx->groups);
>
> - new_igroup = kzalloc_obj(*new_igroup);
> - if (!new_igroup) {
> + new_igroup = iommufd_alloc_group(ictx, group);
> + if (IS_ERR(new_igroup)) {
> iommu_group_put(group);
> - return ERR_PTR(-ENOMEM);
> + return new_igroup;
> }
>
> - kref_init(&new_igroup->ref);
> - mutex_init(&new_igroup->lock);
> - xa_init(&new_igroup->pasid_attach);
> - new_igroup->sw_msi_start = PHYS_ADDR_MAX;
> - /* group reference moves into new_igroup */
> - new_igroup->group = group;
> -
> - /*
> - * The ictx is not additionally refcounted here becase all objects using
> - * an igroup must put it before their destroy completes.
> - */
> - new_igroup->ictx = ictx;
> -
> /*
> * We dropped the lock so igroup is invalid. NULL is a safe and likely
> * value to assume for the xa_cmpxchg algorithm.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 3/7] iommufd: Allow binding to a noiommu device
2026-05-21 22:11 ` [PATCH v6 3/7] iommufd: Allow binding to a noiommu device Jacob Pan
@ 2026-05-22 6:01 ` Baolu Lu
0 siblings, 0 replies; 25+ messages in thread
From: Baolu Lu @ 2026-05-22 6:01 UTC (permalink / raw)
To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Yi Liu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon
On 5/22/26 06:11, Jacob Pan wrote:
> From: Jason Gunthorpe<jgg@nvidia.com>
>
> Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
> a dummy IOMMU group for such devices and skipping hwpt operations.
>
> This enables noiommu devices to operate through the same iommufd API as IOMMU-
> capable devices.
>
> Reviewed-by: Yi Liu<yi.l.liu@intel.com>
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> Signed-off-by: Jacob Pan<jacob.pan@linux.microsoft.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
2026-05-21 22:11 ` [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
@ 2026-05-22 9:19 ` Yi Liu
2026-05-23 22:01 ` Jacob Pan
0 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2026-05-22 9:19 UTC (permalink / raw)
To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon
On 5/22/26 06:11, Jacob Pan wrote:
> Now that devices under noiommu mode can bind with IOMMUFD and perform
> IOAS operations, lift restrictions on cdev from VFIO side.
> Use cases are documented in Documentation/driver-api/vfio.rst
>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> v6:
> - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and group.
> Use Kconfig dependency to restrict usages and avoid null group
> checks. (Alex & Yi)
> - Add CAP_SYS_RAWIO checks for cdev open to maintain security parity
> with the group noiommu path. (Alex)
> v5:
> - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> and its dependencies
> - Add comment to explain vfio_noiommu conditional definition (Alex)
> - Removed early return for group noiommu in bind/unbind
> - Use consistent wording referring to VFIO noiommu mode (Kevin)
> - Update unsafe_noiommu Kconfig help text (Kevin)
> - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
> v4:
> - Remove early return in iommufd_bind for noiommu (Alex)
> v3:
> - Consolidate into fewer patches
> v2:
> - removed unnecessary device->noiommu set in
> iommufd_vfio_compat_ioas_get_id()
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> drivers/vfio/Kconfig | 8 +++++---
> drivers/vfio/device_cdev.c | 3 +++
> drivers/vfio/iommufd.c | 6 +++---
> drivers/vfio/vfio.h | 20 +++++++++++++-------
> drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
> include/linux/vfio.h | 1 +
> 6 files changed, 44 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index ceae52fd7586..d3d8fef2855c 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> The VFIO device cdev is another way for userspace to get device
> access. Userspace gets device fd by opening device cdev under
> /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
> - to set up secure DMA context for device access. This interface does
> - not support noiommu.
> + to set up secure DMA context for device access.
if noiommu, it's unsafe DMA. :)
> If you don't know what to do here, say N.
>
> @@ -62,7 +61,10 @@ endif
>
> config VFIO_NOIOMMU
> bool "VFIO No-IOMMU support"
> - depends on VFIO_GROUP
> + depends on VFIO_GROUP || VFIO_DEVICE_CDEV
> + depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
> + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
> help
> VFIO is built on the ability to isolate devices using the IOMMU.
> Only with an IOMMU can userspace access to DMA capable devices be
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 54abf312cf04..4e2c1e4fc1f8 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> struct vfio_device_file *df;
> int ret;
>
> + if (device->noiommu && !capable(CAP_SYS_RAWIO))
> + return -EPERM;
> +
> /* Paired with the put in vfio_device_fops_release() */
> if (!vfio_device_try_get_registration(device))
> return -ENODEV;
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index a38d262c6028..d4f2e2a0f2f3 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
>
> lockdep_assert_held(&vdev->dev_set->lock);
>
> - /* Returns 0 to permit device opening under noiommu mode */
> - if (vfio_device_is_noiommu(vdev))
> + /* Group noiommu via iommufd compat needs no device binding */
> + if (df->group && vfio_device_is_noiommu(vdev))
It seems vfio_device_is_noiommu() already implies the group path, so there
is no need to check df->group:
static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
vdev->group->type == VFIO_NO_IOMMU;
}
> return 0;
>
> return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
> @@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
>
> lockdep_assert_held(&vdev->dev_set->lock);
>
> - if (vfio_device_is_noiommu(vdev))
> + if (df->group && vfio_device_is_noiommu(vdev))
> return;
>
> if (vdev->ops->unbind_iommufd)
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index e4b72e79b7e3..6f0a2dfc8a00 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
>
> static inline int vfio_device_add(struct vfio_device *device)
> {
> - /* cdev does not support noiommu device */
> - if (vfio_device_is_noiommu(device))
> - return device_add(&device->device);
> vfio_init_device_cdev(device);
> return cdev_device_add(&device->cdev, &device->device);
> }
>
> static inline void vfio_device_del(struct vfio_device *device)
> {
> - if (vfio_device_is_noiommu(device))
> - device_del(&device->device);
> - else
> - cdev_device_del(&device->cdev, &device->device);
> + cdev_device_del(&device->cdev, &device->device);
> }
>
> int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> @@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void)
> }
> #endif /* CONFIG_VFIO_DEVICE_CDEV */
>
> +#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
> +{
> + return vdev->noiommu;
> +}
> +#else
> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
> +{
> + return false;
> +}
> +#endif
> +
> #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
> int __init vfio_virqfd_init(void);
> void vfio_virqfd_exit(void);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 6222376ab6ab..84381c500623 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> return ret;
> }
>
> +static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
> +{
> + if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu && !device->dev->iommu) {
> + device->noiommu = true;
> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> + dev_warn(device->dev,
> + "Adding kernel taint for vfio-noiommu cdev on device\n");
> + }
> +
> + /* Just to be safe, expose to user explicitly noiommu cdev node */
> + return dev_set_name(&device->device, "%svfio%d",
> + device->noiommu ? "noiommu-" : "", device->index);
> +}
> +
> static int __vfio_register_dev(struct vfio_device *device,
> enum vfio_group_type type)
> {
> @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *device,
> if (!device->dev_set)
> vfio_assign_device_set(device, device);
>
> - ret = dev_set_name(&device->device, "vfio%d", device->index);
> + ret = vfio_device_set_group(device, type);
> if (ret)
> return ret;
>
> - ret = vfio_device_set_group(device, type);
> + ret = vfio_device_set_noiommu_and_name(device);
The order of dev_set_name() and vfio_device_set_group() is swapped. Any
special reason?
> if (ret)
> - return ret;
> + goto err_out;
>
> /*
> * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> * restore cache coherency. It has to be checked here because it is only
> * valid for cases where we are using iommu groups.
> */
> - if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> + if (type == VFIO_IOMMU && !(vfio_device_is_noiommu(device) ||
> + vfio_device_is_cdev_noiommu(device)) &&
Now the group path and the cdev path each have their own is_noiommu helper.
Can the two helpers be consolidated?
> !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
> ret = -EINVAL;
> goto err_out;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 31b826efba00..45f08986359e 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -74,6 +74,7 @@ struct vfio_device {
> u8 iommufd_attached:1;
> #endif
> u8 cdev_opened:1;
> + u8 noiommu:1;
> /*
> * debug_root is a static property of the vfio_device
> * which must be set prior to registering the vfio_device.
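To illustrate the consolidation question above, here is a minimal userspace sketch, not kernel code. The `sketch_*` types and names are hypothetical stand-ins for struct vfio_device and its group, and the sketch assumes a single helper could test both the per-device noiommu bit and the group type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock types: simplified stand-ins, not the kernel definitions. */
enum sketch_group_type { SKETCH_IOMMU, SKETCH_NO_IOMMU };

struct sketch_group {
	enum sketch_group_type type;
};

struct sketch_device {
	struct sketch_group *group;	/* NULL when no group exists (cdev-only) */
	bool noiommu;			/* per-device bit for the cdev path */
};

/*
 * One helper covering both paths: true if the cdev noiommu bit is set,
 * or if the device sits in a no-IOMMU group.
 */
static bool sketch_device_is_noiommu(const struct sketch_device *vdev)
{
	if (vdev->noiommu)
		return true;
	return vdev->group && vdev->group->type == SKETCH_NO_IOMMU;
}
```

Whether this fits the real code depends on how vdev->group behaves when CONFIG_VFIO_GROUP=n, which the sketch only models with a NULL pointer.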
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode
2026-05-21 22:11 ` [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
@ 2026-05-22 9:22 ` Yi Liu
0 siblings, 0 replies; 25+ messages in thread
From: Yi Liu @ 2026-05-22 9:22 UTC (permalink / raw)
To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon
On 5/22/26 06:11, Jacob Pan wrote:
> To support no-IOMMU mode where userspace drivers perform unsafe DMA
> using physical addresses, introduce a new API to retrieve the
> physical address of a user-allocated DMA buffer that has been mapped to
> an IOVA via IOAS. The mapping is backed by SW-only I/O page tables
nit: s/via IOAS/via IOMMU_IOAS_MAP/
> maintained by the generic IOMMUPT framework.
>
> Reviewed-by: Lu Baolu<baolu.lu@linux.intel.com>
> Suggested-by: Jason Gunthorpe<jgg@nvidia.com>
> Co-developed-by: Jason Gunthorpe<jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> Signed-off-by: Jacob Pan<jacob.pan@linux.microsoft.com>
> ---
> v6:
> - Limit search length (Baolu, Jason)
> v5:
> - Fix next_iova exceeds iopt_area_last_iova (Alex)
> - Rename IOCTL more specific to NOIOMMU, i.e.
> IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA (Kevin)
> - Add header stubs for iopt_get_phys()
> v4:
> - Fix ioctl return type (Yi Liu)
> ---
> drivers/iommu/iommufd/io_pagetable.c | 72 +++++++++++++++++++++++++
> drivers/iommu/iommufd/ioas.c | 30 +++++++++++
> drivers/iommu/iommufd/iommufd_private.h | 18 +++++++
> drivers/iommu/iommufd/main.c | 3 ++
> include/uapi/linux/iommufd.h | 27 ++++++++++
> 5 files changed, 150 insertions(+)
>
> diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
> index 24d4917105d9..4369447e2125 100644
> --- a/drivers/iommu/iommufd/io_pagetable.c
> +++ b/drivers/iommu/iommufd/io_pagetable.c
> @@ -859,6 +859,78 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
> return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
> }
>
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
> + u64 *length)
> +{
> + struct iopt_area *area;
> + u64 max_length = *length;
> + u64 tmp_length = 0;
> + u64 tmp_paddr = 0;
> + int rc = 0;
> +
> + down_read(&iopt->iova_rwsem);
> + area = iopt_area_iter_first(iopt, iova, iova);
> + if (!area || !area->pages) {
> + rc = -ENOENT;
> + goto unlock_exit;
> + }
> +
> + if (!area->storage_domain ||
> + area->storage_domain->owner != &iommufd_noiommu_ops) {
> + rc = -EOPNOTSUPP;
> + goto unlock_exit;
> + }
> +
> + *paddr = iommu_iova_to_phys(area->storage_domain, iova);
> + if (!*paddr) {
> + rc = -EINVAL;
> + goto unlock_exit;
> + }
> +
> + tmp_length = PAGE_SIZE - offset_in_page(iova);
> + tmp_paddr = *paddr;
> + /*
> + * Scan the domain for the contiguous physical address length so that
> + * userspace search can be optimized for fewer ioctls. A max_length of
> + * 0 means no limit.
> + */
> + while (iova < iopt_area_last_iova(area)) {
> + unsigned long next_iova;
> + u64 next_paddr;
> +
> + if (max_length && tmp_length >= max_length) {
> + tmp_length = max_length;
nit: does this assignment duplicate the clamp after the loop?
> + break;
> + }
> +
> + if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
> + break;
> +
> + if (next_iova > iopt_area_last_iova(area))
> + break;
> +
> + next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
> +
> + if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
> + break;
> +
> + iova = next_iova;
> + tmp_paddr += PAGE_SIZE;
> + tmp_length += PAGE_SIZE;
> + }
> +
> + if (max_length && tmp_length > max_length)
> + tmp_length = max_length;
> + *length = tmp_length;
> +
> +unlock_exit:
> + up_read(&iopt->iova_rwsem);
> +
> + return rc;
> +}
> +#endif
> +
> int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
> {
> /* If the IOVAs are empty then unmap all succeeds */
> diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
> index fed06c2b728e..82bbc0c2357e 100644
> --- a/drivers/iommu/iommufd/ioas.c
> +++ b/drivers/iommu/iommufd/ioas.c
> @@ -375,6 +375,36 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
> return rc;
> }
>
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
> +{
> + struct iommu_ioas_noiommu_get_pa *cmd = ucmd->cmd;
> + struct iommufd_ioas *ioas;
> + int rc;
> +
> + if (!capable(CAP_SYS_RAWIO))
> + return -EPERM;
> +
> + if (cmd->flags || cmd->__reserved)
> + return -EOPNOTSUPP;
> +
> + ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
> + if (IS_ERR(ioas))
> + return PTR_ERR(ioas);
> +
> + rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
> + &cmd->length);
> + if (rc)
> + goto out_put;
> +
> + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +out_put:
> + iommufd_put_object(ucmd->ictx, &ioas->obj);
> +
> + return rc;
> +}
> +#endif
> +
> static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
> struct xarray *ioas_list)
> {
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 2682b5baa6e9..13f1506d8066 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -118,6 +118,16 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
> int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
> unsigned long length, unsigned long *unmapped);
> int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
> + u64 *length);
> +#else
> +static inline int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova,
> + u64 *paddr, u64 *length)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif
>
> int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
> struct iommu_domain *domain,
> @@ -346,6 +356,14 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
> int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
> int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
> int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
> +#ifdef CONFIG_IOMMUFD_NOIOMMU
> +int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd);
> +#else
> +static inline int iommufd_ioas_noiommu_get_pa(struct iommufd_ucmd *ucmd)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif
> int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
> int iommufd_option_rlimit_mode(struct iommu_option *cmd,
> struct iommufd_ctx *ictx);
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 8c6d43601afb..3b4192d70570 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -424,6 +424,7 @@ union ucmd_buffer {
> struct iommu_ioas_alloc alloc;
> struct iommu_ioas_allow_iovas allow_iovas;
> struct iommu_ioas_copy ioas_copy;
> + struct iommu_ioas_noiommu_get_pa noiommu_get_pa;
> struct iommu_ioas_iova_ranges iova_ranges;
> struct iommu_ioas_map map;
> struct iommu_ioas_unmap unmap;
> @@ -482,6 +483,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
> IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map, iova),
> IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
> struct iommu_ioas_map_file, iova),
> + IOCTL_OP(IOMMU_IOAS_NOIOMMU_GET_PA, iommufd_ioas_noiommu_get_pa, struct iommu_ioas_noiommu_get_pa,
> + out_phys),
> IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
> length),
> IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index e998dfbd6960..26b4998439e8 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -57,6 +57,7 @@ enum {
> IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
> IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
> IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
> + IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA = 0x95,
> };
>
> /**
> @@ -219,6 +220,32 @@ struct iommu_ioas_map {
> };
> #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
>
> +/**
> + * struct iommu_ioas_noiommu_get_pa - ioctl(IOMMU_IOAS_NOIOMMU_GET_PA)
> + * @size: sizeof(struct iommu_ioas_noiommu_get_pa)
> + * @flags: Reserved, must be 0 for now
> + * @ioas_id: IOAS ID to query IOVA to PA mapping from
> + * @__reserved: Must be 0
> + * @iova: IOVA to query
> + * @length: On input, maximum number of bytes to scan for contiguity (0 means
> + * no limit). On output, actual number of contiguous bytes starting
> + * from out_phys.
> + * @out_phys: Output physical address the IOVA maps to
> + *
> + * Query the physical address backing an IOVA range. The entire range must be
> + * mapped already. For noiommu devices doing unsafe DMA only.
> + */
> +struct iommu_ioas_noiommu_get_pa {
> + __u32 size;
> + __u32 flags;
> + __u32 ioas_id;
> + __u32 __reserved;
> + __aligned_u64 iova;
> + __aligned_u64 length;
> + __aligned_u64 out_phys;
> +};
> +#define IOMMU_IOAS_NOIOMMU_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA)
> +
> /**
> * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
> * @size: sizeof(struct iommu_ioas_map_file)
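For the clamp nit in iopt_get_phys() above, a minimal userspace sketch of the contiguity scan may help. It is not the kernel implementation: `sketch_area`, `sketch_iova_to_phys()` and `sketch_get_phys()` are hypothetical names, the page table is mocked as an array of per-page physical addresses (0 meaning unmapped), and a 4 KiB page size is assumed. It stops the loop once the cap is reached and clamps only once, after the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Mock of one mapped area: phys[i] backs first_iova + i * SKETCH_PAGE_SIZE. */
struct sketch_area {
	uint64_t first_iova;	/* page-aligned start of the area */
	size_t nr_pages;
	const uint64_t *phys;	/* page-aligned physical addresses, 0 = unmapped */
};

/* Mock translation: returns 0 when the IOVA is not mapped. */
static uint64_t sketch_iova_to_phys(const struct sketch_area *a, uint64_t iova)
{
	uint64_t idx;

	if (iova < a->first_iova)
		return 0;
	idx = (iova - a->first_iova) / SKETCH_PAGE_SIZE;
	if (idx >= a->nr_pages || !a->phys[idx])
		return 0;
	return a->phys[idx] + (iova % SKETCH_PAGE_SIZE);
}

/*
 * Report the physical address for iova and the number of physically
 * contiguous bytes that follow it. max_length == 0 means no limit.
 */
static int sketch_get_phys(const struct sketch_area *a, uint64_t iova,
			   uint64_t max_length, uint64_t *paddr,
			   uint64_t *length)
{
	uint64_t last_iova = a->first_iova + a->nr_pages * SKETCH_PAGE_SIZE - 1;
	uint64_t cur = sketch_iova_to_phys(a, iova);
	uint64_t run;

	if (!cur)
		return -1;
	*paddr = cur;
	run = SKETCH_PAGE_SIZE - (iova % SKETCH_PAGE_SIZE);

	/* Stop scanning as soon as the cap is reached; no clamp in here. */
	while (!max_length || run < max_length) {
		uint64_t next_iova = iova + SKETCH_PAGE_SIZE;
		uint64_t next;

		if (next_iova < iova || next_iova > last_iova)
			break;
		next = sketch_iova_to_phys(a, next_iova);
		if (!next || next != cur + SKETCH_PAGE_SIZE)
			break;
		iova = next_iova;
		cur = next;
		run += SKETCH_PAGE_SIZE;
	}

	/* Single clamp: also covers a cap smaller than the first page. */
	if (max_length && run > max_length)
		run = max_length;
	*length = run;
	return 0;
}
```

In this shape the clamp after the loop is the only place the cap is applied, which is the behaviour the selftest ioas_noiommu_get_pa_length_capped checks from userspace.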
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode
2026-05-21 22:11 ` [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode Jacob Pan
@ 2026-05-22 9:42 ` Yi Liu
2026-05-23 3:42 ` Jacob Pan
0 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2026-05-22 9:42 UTC (permalink / raw)
To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu
Cc: Saurabh Sengar, skhawaja, pasha.tatashin, Will Deacon
On 5/22/26 06:11, Jacob Pan wrote:
> Document the NOIOMMU mode with newly added cdev support under iommufd.
>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> v6:
> - Generalize device node names (noiommu-vfioX, noiommu-Y) in the tree
> example (Yi)
> - Clarify table column descriptions for Yes/No meanings (Yi)
> ---
> Documentation/driver-api/vfio.rst | 83 ++++++++++++++++++++++++++++++-
> 1 file changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index 2a21a42c9386..739576a22de6 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -275,8 +275,6 @@ in a VFIO group.
> With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> by directly opening a character device /dev/vfio/devices/vfioX where
> "X" is the number allocated uniquely by VFIO for registered devices.
> -cdev interface does not support noiommu devices, so user should use
> -the legacy group interface if noiommu is wanted.
>
> The cdev only works with IOMMUFD. Both VFIO drivers and applications
> must adapt to the new cdev security model which requires using
> @@ -370,6 +368,87 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
>
> /* Other device operations as stated in "VFIO Usage Example" */
>
> +VFIO NOIOMMU mode
> +-------------------------------------------------------------------------------
> +VFIO also supports a no-IOMMU mode, intended for usages where unsafe DMA can
> +be performed by userspace drivers w/o physical IOMMU protection. This mode
> +is controlled by the parameter:
> +
> +/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
> +
> +Upon enabling this mode, with an assigned device, the user will be presented
> +with a VFIO group and device file, e.g.::
> +
> + /dev/vfio/
> + |-- devices
> + | `-- noiommu-vfioX /* VFIO device cdev */
> + |-- noiommu-Y /* VFIO group */
> + `-- vfio
> +
> +The capabilities vary depending on the device programming interface and kernel
> +configuration used. The following table summarizes the differences ("Yes" means
> +the UAPI is accessible and functional in noiommu mode, "No" means the UAPI is
> +not supported):
> +
> ++-------------------+---------------------+----------------------+
> +| Feature           | VFIO group          | VFIO device cdev     |
> ++===================+=====================+======================+
> +| VFIO device UAPI  | Yes                 | Yes                  |
> ++-------------------+---------------------+----------------------+
> +| VFIO container    | No                  | No                   |
> ++-------------------+---------------------+----------------------+
> +| IOMMUFD IOAS      | No                  | Yes*                 |
> ++-------------------+---------------------+----------------------+
> +
> +Note that the VFIO container case includes IOMMUFD provided VFIO compatibility
> +interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER is
> +enabled.
> +
> +* IOMMUFD UAPI is available for VFIO device cdev to pin and map user memory with
> + the ability to retrieve physical addresses for DMA command submission.
> +
> +Kconfig Support Matrix
> +^^^^^^^^^^^^^^^^^^^^^^
> +
> +The visibility of CONFIG_VFIO_NOIOMMU depends on the combination of
> +CONFIG_VFIO_GROUP, CONFIG_VFIO_DEVICE_CDEV, and whether a container backend
> +(CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER) is configured. The
> +Kconfig dependencies enforce the following constraints:
> +
> +- At least one access path (group or cdev) must be available.
> +- If VFIO_GROUP is enabled, a container backend is required; otherwise the
> + group node would be unusable in noiommu mode.
> +
> +The resulting support matrix:
> +
> ++------+-------+-----------+------+---------+---------------------------+
> +| Case | GROUP | Container | CDEV | NOIOMMU | Notes                     |
> ++======+=======+===========+======+=========+===========================+
> +| 1    | y     | y         | n    | yes     | Group noiommu works       |
> ++------+-------+-----------+------+---------+---------------------------+
> +| 2    | y     | n         | n    | no      | Blocked - no container    |
> ++------+-------+-----------+------+---------+---------------------------+
> +| 3    | y     | y         | y    | yes     | Both paths work           |
> ++------+-------+-----------+------+---------+---------------------------+
> +| 4    | y     | n         | y    | no      | Blocked - no container    |
> ++------+-------+-----------+------+---------+---------------------------+
> +| 5    | n     | -         | y    | yes     | Cdev-only works           |
> ++------+-------+-----------+------+---------+---------------------------+
> +| 6    | n     | -         | n    | no      | No access path            |
> ++------+-------+-----------+------+---------+---------------------------+
> +
Does "Blocked" mean no access path or no chance to compile? :)
> +Container = CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER (either
> +suffices). Case 4 is intentionally blocked: allowing NOIOMMU with GROUP
> +enabled but no container would create unusable group nodes. Users who want
> +cdev-only noiommu should set CONFIG_VFIO_GROUP=n (case 5).
> +
> +A new IOMMUFD ioctl IOMMU_IOAS_NOIOMMU_GET_PA is added to retrieve the physical
> +address for a given IOVA. Although there is no physical DMA remapping hardware,
> +IOMMU_IOAS_MAP_FIXED_IOVA is still used to establish IOVA-to-PA mappings in the
> +software page table for later IOMMU_IOAS_NOIOMMU_GET_PA lookups.
> +tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c provides an example of
> +using this ioctl in no-IOMMU mode.
> +
> VFIO User API
> -------------------------------------------------------------------------------
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode
2026-05-22 9:42 ` Yi Liu
@ 2026-05-23 3:42 ` Jacob Pan
2026-05-25 6:29 ` Yi Liu
0 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-23 3:42 UTC (permalink / raw)
To: Yi Liu
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, jacob.pan
Hi Yi,
On Fri, 22 May 2026 17:42:42 +0800
Yi Liu <yi.l.liu@intel.com> wrote:
> > ++------+-------+-----------+------+---------+---------------------------+
> > +| Case | GROUP | Container | CDEV | NOIOMMU | Notes                     |
> > ++======+=======+===========+======+=========+===========================+
> > +| 1    | y     | y         | n    | yes     | Group noiommu works       |
> > ++------+-------+-----------+------+---------+---------------------------+
> > +| 2    | y     | n         | n    | no      | Blocked - no container    |
> > ++------+-------+-----------+------+---------+---------------------------+
> > +| 3    | y     | y         | y    | yes     | Both paths work           |
> > ++------+-------+-----------+------+---------+---------------------------+
> > +| 4    | y     | n         | y    | no      | Blocked - no container    |
> > ++------+-------+-----------+------+---------+---------------------------+
> > +| 5    | n     | -         | y    | yes     | Cdev-only works           |
> > ++------+-------+-----------+------+---------+---------------------------+
> > +| 6    | n     | -         | n    | no      | No access path            |
> > ++------+-------+-----------+------+---------+---------------------------+
> > +
>
> Does "Blocked" mean no access path or no chance to compile? :)
By “Blocked”, I mean Kconfig prevents that combination from being
selected, so it is not buildable as such; consequently there is no
access path at runtime.
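As a rough illustration of that gating, the two "depends on" lines from patch 5/7 can be modelled as a truth function. This is only a sketch: `sketch_noiommu_selectable()` is a hypothetical name, "container" stands for CONFIG_VFIO_CONTAINER || CONFIG_IOMMUFD_VFIO_CONTAINER, and the !GENERIC_ATOMIC64 dependency for cdev is deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Models:
 *   depends on VFIO_GROUP || VFIO_DEVICE_CDEV
 *   depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
 * Returns true when VFIO_NOIOMMU would be selectable.
 */
static bool sketch_noiommu_selectable(bool group, bool container, bool cdev)
{
	bool has_access_path = group || cdev;	/* group or cdev node exists */
	bool group_usable = !group || container;	/* group needs a container */

	return has_access_path && group_usable;
}
```

Evaluating it over the six rows of the table gives the NOIOMMU column: cases 2 and 4 fail on the container dependency, case 6 fails because there is no access path.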
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
2026-05-22 9:19 ` Yi Liu
@ 2026-05-23 22:01 ` Jacob Pan
2026-05-25 6:29 ` Yi Liu
0 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-23 22:01 UTC (permalink / raw)
To: Yi Liu
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, jacob.pan
Hi Yi,
On Fri, 22 May 2026 17:19:41 +0800
Yi Liu <yi.l.liu@intel.com> wrote:
> On 5/22/26 06:11, Jacob Pan wrote:
> > Now that devices under noiommu mode can bind with IOMMUFD and
> > perform IOAS operations, lift restrictions on cdev from VFIO side.
> > Use cases are documented in Documentation/driver-api/vfio.rst
> >
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > v6:
> > - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and
> > group. Use Kconfig dependency to restrict usages and avoid null
> > group checks. (Alex & Yi)
> > - Add CAP_SYS_RAWIO checks for cdev open to maintain security
> > parity with the group noiommu path. (Alex)
> > v5:
> > - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> > and its dependencies
> > - Add comment to explain vfio_noiommu conditional definition
> > (Alex)
> > - Removed early return for group noiommu in bind/unbind
> > - Use consistent wording referring to VFIO noiommu mode (Kevin)
> > - Update unsafe_noiommu Kconfig help text (Kevin)
> > - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
> > v4:
> > - Remove early return in iommufd_bind for noiommu (Alex)
> > v3:
> > - Consolidate into fewer patches
> > v2:
> > - removed unnecessary device->noiommu set in
> > iommufd_vfio_compat_ioas_get_id()
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > drivers/vfio/Kconfig | 8 +++++---
> > drivers/vfio/device_cdev.c | 3 +++
> > drivers/vfio/iommufd.c | 6 +++---
> > drivers/vfio/vfio.h | 20 +++++++++++++-------
> > drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
> > include/linux/vfio.h | 1 +
> > 6 files changed, 44 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index ceae52fd7586..d3d8fef2855c 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> > The VFIO device cdev is another way for userspace to get device
> > access. Userspace gets device fd by opening device cdev under
> > /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
> > - to set up secure DMA context for device access. This interface does
> > - not support noiommu.
> > + to set up secure DMA context for device access.
>
> if noiommu, it's unsafe DMA. :)
yes, here I just want to remove "This interface does not support
noiommu.".
>
> > If you don't know what to do here, say N.
> >
> > @@ -62,7 +61,10 @@ endif
> >
> > config VFIO_NOIOMMU
> > bool "VFIO No-IOMMU support"
> > - depends on VFIO_GROUP
> > + depends on VFIO_GROUP || VFIO_DEVICE_CDEV
> > + depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
> > + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> > + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
> > help
> > VFIO is built on the ability to isolate devices using the IOMMU.
> > Only with an IOMMU can userspace access to DMA capable devices be
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 54abf312cf04..4e2c1e4fc1f8 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> > struct vfio_device_file *df;
> > int ret;
> >
> > + if (device->noiommu && !capable(CAP_SYS_RAWIO))
> > + return -EPERM;
> > +
> > /* Paired with the put in vfio_device_fops_release() */
> > if (!vfio_device_try_get_registration(device))
> > return -ENODEV;
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index a38d262c6028..d4f2e2a0f2f3 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
> > lockdep_assert_held(&vdev->dev_set->lock);
> >
> > - /* Returns 0 to permit device opening under noiommu mode */
> > - if (vfio_device_is_noiommu(vdev))
> > + /* Group noiommu via iommufd compat needs no device binding */
> > + if (df->group && vfio_device_is_noiommu(vdev))
>
> seems like vfio_device_is_noiommu() implies group path, then no need
> to use df->group.
>
df->group is needed because only the legacy VFIO group/iommufd-compat
noiommu path should skip real iommufd device binding.

For df->group == NULL, the fd is a VFIO cdev fd. That path uses
VFIO_DEVICE_BIND_IOMMUFD and later VFIO_DEVICE_ATTACH_IOMMUFD_PT. Even
in noiommu cdev mode, bind must still call:

    vdev->ops->bind_iommufd(vdev, ictx, &df->devid);

so vdev->iommufd_device can get initialized. If the check were only:

    if (vfio_device_is_noiommu(vdev))
        return 0;

then cdev noiommu bind would falsely “succeed” without setting
vdev->iommufd_device. Later VFIO_DEVICE_ATTACH_IOMMUFD_PT calls
vfio_iommufd_physical_attach_ioas(), hits:

    if (WARN_ON(!vdev->iommufd_device))
        return -EINVAL;

In the noiommu test, you will get:
[ 185.870670] ------------[ cut here ]------------
[ 185.871952] WARNING: drivers/vfio/iommufd.c:157 at vfio_iommufd_physical_attach_ioas+0x3f/0x50, CPU#0: vfio-noiommu-pc/157
[ 185.875010] Modules linked in:
[ 185.875882] CPU: 0 UID: 0 PID: 157 Comm: vfio-noiommu-pc Tainted: G U W 7.1.0-rc1+ #20 PREEMPT
[ 185.878637] Tainted: [U]=USER, [W]=WARN
[ 185.879711] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 185.882913] RIP: 0010:vfio_iommufd_physical_attach_ioas+0x3f/0x50
[ 185.884624] Code: 89 f2 31 f6 f6 83 50 04 00 00 01 75 16 e8 d9 aa c6 ff 85 c0 75 07 80 8b 50 04 00 00 01 5b c3 cc cc cc cc e8 43 ab c6 ff eb e8 <0f> 0b b80
[ 185.889701] RSP: 0018:ffa000000062fd88 EFLAGS: 00010246
[ 185.891161] RAX: ffffffff81f59ee0 RBX: ff1100010c43b800 RCX: 0000000000000000
[ 185.893141] RDX: ff1100010c708040 RSI: ffa000000062fda0 RDI: 0000000000000000
[ 185.895127] RBP: ff1100010c43b800 R08: ff1100010c7c12b0 R09: 0000000000000000
[ 185.897119] R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffec4c2f720
[ 185.899102] R13: ffa000000062fda0 R14: ff11000103bd40d0 R15: ff1100010c43b800
[ 185.901075] FS: 0000000028d69380(0000) GS:ff110004e4a8d000(0000) knlGS:0000000000000000
[ 185.903284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 185.904888] CR2: 0000000028d73988 CR3: 0000000103507002 CR4: 0000000000f73ef0
[ 185.906853] PKRU: 55555554
[ 185.907636] Call Trace:
[ 185.908373] <TASK>
[ 185.908932] vfio_df_ioctl_attach_pt+0xc7/0x170
[ 185.910085] vfio_device_fops_unl_ioctl+0x49b/0xa50
[ 185.911322] ? file_tty_write.isra.0+0x202/0x320
[ 185.912507] __x64_sys_ioctl+0x425/0xa30
[ 185.913502] do_syscall_64+0x5e/0xf80
[ 185.914444] ? irqentry_exit+0x3b/0x5e0
[ 185.915414] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 185.916701] RIP: 0033:0x434a4d
[ 185.917498] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d0
[ 185.922052] RSP: 002b:00007ffec4c2f6b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 185.923785] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000434a4d
[ 185.925398] RDX: 00007ffec4c2f720 RSI: 0000000000003b77 RDI: 0000000000000004
[ 185.927007] RBP: 00007ffec4c2f700 R08: 0000000000000064 R09: 0000000000000000
[ 185.928611] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffec4c30918
[ 185.930211] R13: 00007ffec4c30940 R14: 00000000004cf868 R15: 0000000000000001
[ 185.931758] </TASK>
[ 185.932258] ---[ end trace 0000000000000000 ]---
Failed to attach pt to device
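
The failure mode above can be sketched with a small userspace model.
This is only an illustration under stated assumptions: the mock_*
types and functions are hypothetical stand-ins for vfio_device,
vfio_device_file, ops->bind_iommufd() and the attach guard, not the
real kernel definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct mock_vdev {
	bool group_noiommu;   /* models vfio_device_is_noiommu(vdev) */
	void *iommufd_device; /* only set by a real bind */
};

struct mock_df {
	struct mock_vdev *vdev;
	void *group;          /* non-NULL when opened via the group path */
};

static int mock_token;

/* Models vdev->ops->bind_iommufd(): records the iommufd device. */
static int mock_bind_iommufd(struct mock_vdev *vdev)
{
	vdev->iommufd_device = &mock_token;
	return 0;
}

/* Models the check in the patched vfio_df_iommufd_bind(). */
static int mock_df_iommufd_bind(struct mock_df *df)
{
	/* Only the group/compat noiommu path skips the real bind. */
	if (df->group && df->vdev->group_noiommu)
		return 0;
	return mock_bind_iommufd(df->vdev);
}

/* Models the WARN_ON(!vdev->iommufd_device) guard in the attach path. */
static int mock_attach_ioas(struct mock_vdev *vdev)
{
	return vdev->iommufd_device ? 0 : -EINVAL;
}
```

In this model a cdev open (df->group == NULL) still reaches the real
bind, so a later attach succeeds. Dropping the df->group test would
make the cdev case return early and the attach would then fail with
-EINVAL, matching the warning in the trace.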
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> vdev->group->type == VFIO_NO_IOMMU;
> }
>
> > return 0;
> >
> > return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
> > @@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
> > lockdep_assert_held(&vdev->dev_set->lock);
> >
> > - if (vfio_device_is_noiommu(vdev))
> > + if (df->group && vfio_device_is_noiommu(vdev))
> > return;
> >
> > if (vdev->ops->unbind_iommufd)
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index e4b72e79b7e3..6f0a2dfc8a00 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
> > static inline int vfio_device_add(struct vfio_device *device)
> > {
> > - /* cdev does not support noiommu device */
> > - if (vfio_device_is_noiommu(device))
> > - return device_add(&device->device);
> > vfio_init_device_cdev(device);
> > return cdev_device_add(&device->cdev, &device->device);
> > }
> >
> > static inline void vfio_device_del(struct vfio_device *device)
> > {
> > - if (vfio_device_is_noiommu(device))
> > - device_del(&device->device);
> > - else
> > - cdev_device_del(&device->cdev, &device->device);
> > + cdev_device_del(&device->cdev, &device->device);
> > }
> >
> > int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> > @@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void)
> > }
> > #endif /* CONFIG_VFIO_DEVICE_CDEV */
> >
> > +#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
> > +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
> > +{
> > + return vdev->noiommu;
> > +}
> > +#else
> > +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
> > +{
> > + return false;
> > +}
> > +#endif
> > +
> > #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
> > int __init vfio_virqfd_init(void);
> > void vfio_virqfd_exit(void);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 6222376ab6ab..84381c500623 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> > return ret;
> > }
> >
> > +static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
> > +{
> > + if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu && !device->dev->iommu) {
> > + device->noiommu = true;
> > + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > + dev_warn(device->dev,
> > + "Adding kernel taint for vfio-noiommu cdev on device\n");
> > + }
> > +
> > + /* Just to be safe, expose to user explicitly noiommu cdev node */
> > + return dev_set_name(&device->device, "%svfio%d",
> > + device->noiommu ? "noiommu-" : "", device->index);
> > +}
> > +
> > static int __vfio_register_dev(struct vfio_device *device,
> > enum vfio_group_type type)
> > {
> > @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *device,
> > if (!device->dev_set)
> > vfio_assign_device_set(device, device);
> >
> > - ret = dev_set_name(&device->device, "vfio%d", device->index);
> > + ret = vfio_device_set_group(device, type);
> > if (ret)
> > return ret;
> >
> > - ret = vfio_device_set_group(device, type);
> > + ret = vfio_device_set_noiommu_and_name(device);
>
> the order of dev_set_name and vfio_device_set_group() are swapped, any
> special reason?
The ordering was intentional in an earlier version where the cdev
noiommu check depended on device->group. With the current check using
!device->dev->iommu, the ordering is no longer strictly required for
that test.
I kept vfio_device_set_group() first because the rest of registration
already treats group setup as the first VFIO state to unwind, and this
lets the existing err_out path handle failures after group assignment,
including dev_set_name(). I can restore the old order if you prefer,
since it is not functionally required anymore.
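
A minimal sketch of the unwind reasoning, using a userspace mock. The
mock_* names are hypothetical; mock_set_group() and mock_set_name()
only stand in for vfio_device_set_group() and the naming step, and the
err_out body is an assumed model of the group teardown.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mock of the registration state; not kernel code. */
struct mock_device {
	bool has_group;  /* models state set up by the group step */
	bool name_fails; /* forces the naming step to fail */
};

static int mock_set_group(struct mock_device *d)
{
	d->has_group = true;
	return 0;
}

static int mock_set_name(struct mock_device *d)
{
	return d->name_fails ? -ENOMEM : 0;
}

/* Group first, then name: a naming failure must release the group. */
static int mock_register(struct mock_device *d)
{
	int ret;

	ret = mock_set_group(d);
	if (ret)
		return ret;   /* nothing to unwind yet */

	ret = mock_set_name(d);
	if (ret)
		goto err_out; /* group was assigned, so unwind it */

	return 0;

err_out:
	d->has_group = false; /* models the group teardown in err_out */
	return ret;
}
```

With the original order (name first, then group), a naming failure can
return directly because no group has been assigned yet.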
> > if (ret)
> > - return ret;
> > + goto err_out;
> >
> > /*
> > * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> > * restore cache coherency. It has to be checked here because it is only
> > * valid for cases where we are using iommu groups.
> > */
> > - if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> > + if (type == VFIO_IOMMU && !(vfio_device_is_noiommu(device) ||
> > + vfio_device_is_cdev_noiommu(device)) &&
>
> now, the group path and cdev path have their own is_noiommu helper,
> can the two helpers be consolidated?
>
They could be consolidated mechanically, but I feel they are checking
different things, so isn't it clearer to keep them separate?
> > !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
> > ret = -EINVAL;
> > goto err_out;
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 31b826efba00..45f08986359e 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -74,6 +74,7 @@ struct vfio_device {
> > u8 iommufd_attached:1;
> > #endif
> > u8 cdev_opened:1;
> > + u8 noiommu:1;
> > /*
> > * debug_root is a static property of the vfio_device
> > * which must be set prior to registering the vfio_device.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode
2026-05-23 3:42 ` Jacob Pan
@ 2026-05-25 6:29 ` Yi Liu
0 siblings, 0 replies; 25+ messages in thread
From: Yi Liu @ 2026-05-25 6:29 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon
On 5/23/26 11:42, Jacob Pan wrote:
> Hi Yi,
>
> On Fri, 22 May 2026 17:42:42 +0800
> Yi Liu <yi.l.liu@intel.com> wrote:
>
>>> ++------+-------+-----------+------+---------+---------------------------+
>>> +| Case | GROUP | Container | CDEV | NOIOMMU | Notes                     |
>>> ++======+=======+===========+======+=========+===========================+
>>> +| 1    | y     | y         | n    | yes     | Group noiommu works       |
>>> ++------+-------+-----------+------+---------+---------------------------+
>>> +| 2    | y     | n         | n    | no      | Blocked - no container    |
>>> ++------+-------+-----------+------+---------+---------------------------+
>>> +| 3    | y     | y         | y    | yes     | Both paths work           |
>>> ++------+-------+-----------+------+---------+---------------------------+
>>> +| 4    | y     | n         | y    | no      | Blocked - no container    |
>>> ++------+-------+-----------+------+---------+---------------------------+
>>> +| 5    | n     | -         | y    | yes     | Cdev-only works           |
>>> ++------+-------+-----------+------+---------+---------------------------+
>>> +| 6    | n     | -         | n    | no      | No access path            |
>>> ++------+-------+-----------+------+---------+---------------------------+
>>> +
>>
>> Does "Blocked" mean no access path or no chance to compile? :)
> By “Blocked”, I mean Kconfig prevents that combination from being
> selected, so it is not buildable as such; consequently there is no
> access path at runtime.
got it. :)
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
2026-05-23 22:01 ` Jacob Pan
@ 2026-05-25 6:29 ` Yi Liu
2026-05-28 18:52 ` Jacob Pan
0 siblings, 1 reply; 25+ messages in thread
From: Yi Liu @ 2026-05-25 6:29 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon
Hi Jacob,
On 5/24/26 06:01, Jacob Pan wrote:
> Hi Yi,
>
> On Fri, 22 May 2026 17:19:41 +0800
> Yi Liu <yi.l.liu@intel.com> wrote:
>
>> On 5/22/26 06:11, Jacob Pan wrote:
>>> Now that devices under noiommu mode can bind with IOMMUFD and
>>> perform IOAS operations, lift restrictions on cdev from VFIO side.
>>> Use cases are documented in Documentation/driver-api/vfio.rst
>>>
>>> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
>>> ---
>>> v6:
>>> - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and
>>> group. Use Kconfig dependency to restrict usages and avoid null
>>> group checks. (Alex & Yi)
>>> - Add CAP_SYS_RAWIO checks for cdev open to maintain security
>>> parity with the group noiommu path. (Alex)
>>> v5:
>>> - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
>>> and its dependencies
>>> - Add comment to explain vfio_noiommu conditional definition
>>> (Alex)
>>> - Removed early return for group noiommu in bind/unbind
>>> - Use consistent wording referring to VFIO noiommu mode (Kevin)
>>> - Update unsafe_noiommu Kconfig help text (Kevin)
>>> - Change dev_warn to dev_info for noiommu enabling msg (Kevin)
>>> v4:
>>> - Remove early return in iommufd_bind for noiommu (Alex)
>>> v3:
>>> - Consolidate into fewer patches
>>> v2:
>>> - removed unnecessary device->noiommu set in
>>> iommufd_vfio_compat_ioas_get_id()
>>> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
>>> ---
>>> drivers/vfio/Kconfig | 8 +++++---
>>> drivers/vfio/device_cdev.c | 3 +++
>>> drivers/vfio/iommufd.c | 6 +++---
>>> drivers/vfio/vfio.h | 20 +++++++++++++-------
>>> drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
>>> include/linux/vfio.h | 1 +
>>> 6 files changed, 44 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
>>> index ceae52fd7586..d3d8fef2855c 100644
>>> --- a/drivers/vfio/Kconfig
>>> +++ b/drivers/vfio/Kconfig
>>> @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
>>> The VFIO device cdev is another way for userspace to get device
>>> access. Userspace gets device fd by opening device cdev under
>>> /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
>>> - to set up secure DMA context for device access. This interface does
>>> - not support noiommu.
>>> + to set up secure DMA context for device access.
>>
>> if noiommu, it's unsafe DMA. :)
> yes, here I just want to remove "This interface does not support
> noiommu.".
>>
>>> If you don't know what to do here, say N.
>>>
>>> @@ -62,7 +61,10 @@ endif
>>>
>>> config VFIO_NOIOMMU
>>> bool "VFIO No-IOMMU support"
>>> - depends on VFIO_GROUP
>>> + depends on VFIO_GROUP || VFIO_DEVICE_CDEV
>>> + depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
>>> + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
>>> + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
>>> help
>>> VFIO is built on the ability to isolate devices using the IOMMU.
>>> Only with an IOMMU can userspace access to DMA capable devices be
>>> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
>>> index 54abf312cf04..4e2c1e4fc1f8 100644
>>> --- a/drivers/vfio/device_cdev.c
>>> +++ b/drivers/vfio/device_cdev.c
>>> @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>>> struct vfio_device_file *df;
>>> int ret;
>>>
>>> + if (device->noiommu && !capable(CAP_SYS_RAWIO))
>>> + return -EPERM;
>>> +
>>> /* Paired with the put in vfio_device_fops_release() */
>>> if (!vfio_device_try_get_registration(device))
>>> return -ENODEV;
>>> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
>>> index a38d262c6028..d4f2e2a0f2f3 100644
>>> --- a/drivers/vfio/iommufd.c
>>> +++ b/drivers/vfio/iommufd.c
>>> @@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
>>> lockdep_assert_held(&vdev->dev_set->lock);
>>>
>>> - /* Returns 0 to permit device opening under noiommu mode */
>>> - if (vfio_device_is_noiommu(vdev))
>>> + /* Group noiommu via iommufd compat needs no device binding */
>>> + if (df->group && vfio_device_is_noiommu(vdev))
>>
>> seems like vfio_device_is_noiommu() implies group path, then no need
>> to use df->group.
>>
> df->group is needed because only the legacy VFIO group/iommufd-compat
> noiommu path should skip real iommufd device binding.
Got it. The bind is skipped only when the device is opened via the
group path; otherwise it goes ahead and binds to iommufd. BTW, the
noiommu check in vfio_iommufd_compat_attach_ioas() is left without the
df->group test. I know this helper is only used on the group path, which
is why df->group is not added there, but can we add a note for it?
> For df->group == NULL, the fd is a VFIO cdev fd. That path uses
> VFIO_DEVICE_BIND_IOMMUFD and later VFIO_DEVICE_ATTACH_IOMMUFD_PT. Even
> in noiommu cdev mode, bind must still call:
>
> vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
>
> so vdev->iommufd_device can get initialized. If the check were only:
>
> if (vfio_device_is_noiommu(vdev))
> return 0;
> then cdev noiommu bind would falsely “succeed” without setting
> vdev->iommufd_device. Later VFIO_DEVICE_ATTACH_IOMMUFD_PT calls
> vfio_iommufd_physical_attach_ioas(), hits:
>
> if (WARN_ON(!vdev->iommufd_device))
> return -EINVAL;
>
> In the noiommu test, you will get:
> [ 185.870670] ------------[ cut here ]------------
> [ 185.871952] WARNING: drivers/vfio/iommufd.c:157 at vfio_iommufd_physical_attach_ioas+0x3f/0x50, CPU#0: vfio-noiommu-pc/157
> [ 185.875010] Modules linked in:
> [ 185.875882] CPU: 0 UID: 0 PID: 157 Comm: vfio-noiommu-pc Tainted: G U W 7.1.0-rc1+ #20 PREEMPT
> [ 185.878637] Tainted: [U]=USER, [W]=WARN
> [ 185.879711] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
> [ 185.882913] RIP: 0010:vfio_iommufd_physical_attach_ioas+0x3f/0x50
> [ 185.884624] Code: 89 f2 31 f6 f6 83 50 04 00 00 01 75 16 e8 d9 aa c6 ff 85 c0 75 07 80 8b 50 04 00 00 01 5b c3 cc cc cc cc e8 43 ab c6 ff eb e8 <0f> 0b b80
> [ 185.889701] RSP: 0018:ffa000000062fd88 EFLAGS: 00010246
> [ 185.891161] RAX: ffffffff81f59ee0 RBX: ff1100010c43b800 RCX: 0000000000000000
> [ 185.893141] RDX: ff1100010c708040 RSI: ffa000000062fda0 RDI: 0000000000000000
> [ 185.895127] RBP: ff1100010c43b800 R08: ff1100010c7c12b0 R09: 0000000000000000
> [ 185.897119] R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffec4c2f720
> [ 185.899102] R13: ffa000000062fda0 R14: ff11000103bd40d0 R15: ff1100010c43b800
> [ 185.901075] FS: 0000000028d69380(0000) GS:ff110004e4a8d000(0000) knlGS:0000000000000000
> [ 185.903284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 185.904888] CR2: 0000000028d73988 CR3: 0000000103507002 CR4: 0000000000f73ef0
> [ 185.906853] PKRU: 55555554
> [ 185.907636] Call Trace:
> [ 185.908373] <TASK>
> [ 185.908932] vfio_df_ioctl_attach_pt+0xc7/0x170
> [ 185.910085] vfio_device_fops_unl_ioctl+0x49b/0xa50
> [ 185.911322] ? file_tty_write.isra.0+0x202/0x320
> [ 185.912507] __x64_sys_ioctl+0x425/0xa30
> [ 185.913502] do_syscall_64+0x5e/0xf80
> [ 185.914444] ? irqentry_exit+0x3b/0x5e0
> [ 185.915414] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 185.916701] RIP: 0033:0x434a4d
> [ 185.917498] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d0
> [ 185.922052] RSP: 002b:00007ffec4c2f6b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 185.923785] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000434a4d
> [ 185.925398] RDX: 00007ffec4c2f720 RSI: 0000000000003b77 RDI: 0000000000000004
> [ 185.927007] RBP: 00007ffec4c2f700 R08: 0000000000000064 R09: 0000000000000000
> [ 185.928611] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffec4c30918
> [ 185.930211] R13: 00007ffec4c30940 R14: 00000000004cf868 R15: 0000000000000001
> [ 185.931758] </TASK>
> [ 185.932258] ---[ end trace 0000000000000000 ]---
> Failed to attach pt to device
>
>> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
>> {
>> return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
>> vdev->group->type == VFIO_NO_IOMMU;
>> }
>>
>>> return 0;
>>>
>>> return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
>>> @@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
>>> lockdep_assert_held(&vdev->dev_set->lock);
>>>
>>> - if (vfio_device_is_noiommu(vdev))
>>> + if (df->group && vfio_device_is_noiommu(vdev))
>>> return;
>>>
>>> if (vdev->ops->unbind_iommufd)
>>> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
>>> index e4b72e79b7e3..6f0a2dfc8a00 100644
>>> --- a/drivers/vfio/vfio.h
>>> +++ b/drivers/vfio/vfio.h
>>> @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
>>> static inline int vfio_device_add(struct vfio_device *device)
>>> {
>>> - /* cdev does not support noiommu device */
>>> - if (vfio_device_is_noiommu(device))
>>> - return device_add(&device->device);
>>> vfio_init_device_cdev(device);
>>> return cdev_device_add(&device->cdev, &device->device);
>>> }
>>>
>>> static inline void vfio_device_del(struct vfio_device *device)
>>> {
>>> - if (vfio_device_is_noiommu(device))
>>> - device_del(&device->device);
>>> - else
>>> - cdev_device_del(&device->cdev, &device->device);
>>> + cdev_device_del(&device->cdev, &device->device);
>>> }
>>>
>>> int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
>>> @@ -420,6 +414,18 @@ static inline void vfio_cdev_cleanup(void)
>>> }
>>> #endif /* CONFIG_VFIO_DEVICE_CDEV */
>>>
>>> +#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
>>> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
>>> +{
>>> + return vdev->noiommu;
>>> +}
>>> +#else
>>> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device *vdev)
>>> +{
>>> + return false;
>>> +}
>>> +#endif
>>> +
>>> #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
>>> int __init vfio_virqfd_init(void);
>>> void vfio_virqfd_exit(void);
>>> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
>>> index 6222376ab6ab..84381c500623 100644
>>> --- a/drivers/vfio/vfio_main.c
>>> +++ b/drivers/vfio/vfio_main.c
>>> @@ -321,6 +321,20 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
>>> return ret;
>>> }
>>>
>>> +static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
>>> +{
>>> + if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu && !device->dev->iommu) {
>>> + device->noiommu = true;
>>> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>>> + dev_warn(device->dev,
>>> + "Adding kernel taint for vfio-noiommu cdev on device\n");
>>> + }
>>> +
>>> + /* Just to be safe, expose to user explicitly noiommu cdev node */
>>> + return dev_set_name(&device->device, "%svfio%d",
>>> + device->noiommu ? "noiommu-" : "", device->index);
>>> +}
>>> +
>>> static int __vfio_register_dev(struct vfio_device *device,
>>> enum vfio_group_type type)
>>> {
>>> @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct vfio_device *device,
>>> if (!device->dev_set)
>>> vfio_assign_device_set(device, device);
>>>
>>> - ret = dev_set_name(&device->device, "vfio%d", device->index);
>>> + ret = vfio_device_set_group(device, type);
>>> if (ret)
>>> return ret;
>>>
>>> - ret = vfio_device_set_group(device, type);
>>> + ret = vfio_device_set_noiommu_and_name(device);
>>
>> the order of dev_set_name and vfio_device_set_group() are swapped, any
>> special reason?
> The ordering was intentional in an earlier version where the cdev
> noiommu check depended on device->group. With the current check using
> !device->dev->iommu, the ordering is no longer strictly required for
> that test.
>
> I kept vfio_device_set_group() first because the rest of registration
> already treats group setup as the first VFIO state to unwind, and this
> lets the existing err_out path handle failures after group assignment,
> including dev_set_name(). I can restore the old order if you prefer,
> since it is not functionally required anymore.
I think it's better to keep the original order if there is no
functional requirement for the swap anymore.
>>> if (ret)
>>> - return ret;
>>> + goto err_out;
>>>
>>> /*
>>> * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
>>> * restore cache coherency. It has to be checked here because it is only
>>> * valid for cases where we are using iommu groups.
>>> */
>>> - if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
>>> + if (type == VFIO_IOMMU && !(vfio_device_is_noiommu(device) ||
>>> + vfio_device_is_cdev_noiommu(device)) &&
>>
>> now, the group path and cdev path have their own is_noiommu helper,
>> can the two helpers be consolidated?
>>
> They could be consolidated mechanically, but I feel they are checking
> different things, so isn't it clearer to keep them separate?
IMHO, they are both actually checking whether the device is noiommu. I
looked at the current users of vfio_device_is_noiommu(), listed below.
#1 is entirely specific to the group path. #2, #3 and #4 are in the
common path and identify group noiommu; they also imply which open path
was used (group path?). #5 and #6 are the helper definitions themselves.
#7 and #8 are going to be dropped. And #9 purely checks the noiommu
attribute. So I'm wondering if using !dev->iommu is a good choice. It
should be able to cover both the group and cdev paths. Let me know if
this is not workable.
#  line  filename / context / line
1   193  drivers/vfio/group.c <<vfio_df_group_open>>
         if (df->iommufd && vfio_device_is_noiommu(device) &&
             device->open_count == 0) {
2    29  drivers/vfio/iommufd.c <<vfio_df_iommufd_bind>>
         if (vfio_device_is_noiommu(vdev))
3    44  drivers/vfio/iommufd.c <<vfio_iommufd_compat_attach_ioas>>
         if (vfio_device_is_noiommu(vdev))
4    61  drivers/vfio/iommufd.c <<vfio_df_iommufd_unbind>>
         if (vfio_device_is_noiommu(vdev))
5   115  drivers/vfio/vfio.h <<vfio_device_is_noiommu>>
         static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
6   191  drivers/vfio/vfio.h <<vfio_device_is_noiommu>>
         static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
7   362  drivers/vfio/vfio.h <<vfio_device_add>>
         if (vfio_device_is_noiommu(device))
8   370  drivers/vfio/vfio.h <<vfio_device_del>>
         if (vfio_device_is_noiommu(device))
9   356  drivers/vfio/vfio_main.c <<__vfio_register_dev>>
         if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
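
A rough sketch of what a single check based on !dev->iommu could look
like, as a userspace model only. The mock_* types and
mock_device_is_noiommu() are hypothetical, and noiommu_enabled stands
in for the vfio_noiommu module parameter; whether this really covers
every group-path caller in the list above is the open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields used by the check are modeled. */
struct mock_dev {
	void *iommu;          /* models struct device::iommu, NULL if absent */
};

struct mock_vfio_device {
	struct mock_dev *dev;
};

/*
 * One predicate for both open paths: noiommu mode is enabled and the
 * physical device has no IOMMU state attached.
 */
static bool mock_device_is_noiommu(const struct mock_vfio_device *vdev,
				   bool noiommu_enabled)
{
	return noiommu_enabled && !vdev->dev->iommu;
}
```

The predicate does not look at how the fd was opened, so callers such
as the bind path would still need df->group to tell the group open
from the cdev open.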
>>> !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
>>> ret = -EINVAL;
>>> goto err_out;
>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>>> index 31b826efba00..45f08986359e 100644
>>> --- a/include/linux/vfio.h
>>> +++ b/include/linux/vfio.h
>>> @@ -74,6 +74,7 @@ struct vfio_device {
>>> u8 iommufd_attached:1;
>>> #endif
>>> u8 cdev_opened:1;
>>> + u8 noiommu:1;
>>> /*
>>> * debug_root is a static property of the vfio_device
>>> * which must be set prior to registering the vfio_device.
>
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
` (6 preceding siblings ...)
2026-05-21 22:11 ` [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode Jacob Pan
@ 2026-05-25 8:30 ` Tian, Kevin
2026-05-26 15:32 ` Jacob Pan
7 siblings, 1 reply; 25+ messages in thread
From: Tian, Kevin @ 2026-05-25 8:30 UTC (permalink / raw)
To: Jacob Pan, linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh,
David Matlack, Robin Murphy, Nicolin Chen, Liu, Yi L, Baolu Lu
Cc: Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com,
Will Deacon
Could you address the findings from Sashiko?
https://sashiko.dev/#/patchset/20260521221155.1375144-1-jacob.pan%40linux.microsoft.com
> From: Jacob Pan <jacob.pan@linux.microsoft.com>
> Sent: Friday, May 22, 2026 6:12 AM
>
> VFIO's unsafe_noiommu_mode has long provided a way for userspace drivers
> to operate on platforms lacking a hardware IOMMU. Today, IOMMUFD also
> supports No-IOMMU mode for group-based devices under vfio_compat mode.
> However, IOMMUFD's native character device (cdev) does not yet support
> No-IOMMU mode, which is the purpose of this patch.
>
> In summary, we have:
>
> |-------------------------+------+---------------|
> | Device access mode | VFIO | IOMMUFD |
> |-------------------------+------+---------------|
> | group /dev/vfio/$GROUP | Yes | Yes |
> |-------------------------+------+---------------|
> | cdev /dev/vfio/devices/ | No | This patch |
> |-------------------------+------+---------------|
>
> Beyond enabling cdev for IOMMUFD, this patch also addresses the following
> deficiencies in the current No-IOMMU mode suggested by Jason[1]:
> - Devices operating under No-IOMMU mode are limited to device-level UAPI
> access, without container or IOAS-level capabilities. Consequently,
> user-space drivers lack structured mechanisms for page pinning and often
> resort to mlock(), which is less robust than pin_user_pages() used for
> devices backed by a physical IOMMU. For example, mlock() does not prevent
> page migration.
> - There is no architectural mechanism for obtaining physical addresses for
> DMA. As a workaround, user-space drivers frequently rely on /proc/pagemap
> tricks or hardcoded values.
>
> By allowing noiommu device access to IOMMUFD IOAS and HWPT objects, this
> patch brings No-IOMMU mode closer to full citizenship within the IOMMU
> subsystem. In addition to addressing the two deficiencies mentioned above,
> the expectation is that it will also enable No-IOMMU devices to seamlessly
> participate in live update sessions via KHO [2].
>
> Furthermore, these devices will use the IOMMUFD-based ownership checking
> model for VFIO_DEVICE_PCI_HOT_RESET, eliminating the need for an
> iommufd_access object as required in a previous attempt [3].
>
> ChangeLog:
> V6:
> - Delete rename VFIO_IOMMU patch
> - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev and group.
> Use Kconfig dependency to restrict usages and avoid null group
> checks. (Alex & Yi)
> - Add CAP_SYS_RAWIO checks for cdev open to maintain security parity
> with the group noiommu path. (Alex)
> - Updated documentation with Kconfig usage matrix
> - Added max length limit to get_pa ioctl (Baolu & Jason)
> V5:
> - Split CONFIG_VFIO_NOIOMMU into CONFIG_VFIO_GROUP_NOIOMMU
> and
> CONFIG_VFIO_CDEV_NOIOMMU so cdev noiommu is independent of
> VFIO_GROUP (Alex)
> - Add CAP_SYS_RAWIO check for cdev open and bind under noiommu,
> security parity with group noiommu (Alex)
> - Add IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) guard in
> iommufd_device_is_noiommu() to prevent noiommu bind when feature
> is disabled
> - Add prep patch to tolerate NULL group for cdev noiommu devices
> when CONFIG_VFIO_GROUP_NOIOMMU is not set [7/9]
> - Rename IOCTL to IOMMUFD_CMD_IOAS_NOIOMMU_GET_PA to be more
> specific (Kevin)
> - Simplify iommufd_device_is_noiommu, use iommufd_bind_noiommu
> helper (Kevin, Yi)
> - Move IOMMU cap check under iommufd_bind_iommu() (Yi)
> - Fix next_iova exceeding iopt_area_last_iova in GET_PA (Alex)
> - Fix const hwpt, copyright date, typo in moved comment (Kevin)
> - Add Reviewed-by tags
> - Squash noiommu cdev selftest fix into selftest patch
> - Drop DSA selftest patch
> - Details in each patch changelog.
>
> V4:
> - Fix various corner cases pointed out by (Sashiko)
> Details in each patch changelog.
>
> V3:
> - Improve error handling [3/10] (Mostafa)
> - Simplify vfio_device_is_noiommu logic and merged in [6/10] (Mostafa)
> - Add comment to explain the design difference over the legacy noiommu
> VFIO code.[1/10]
>
> V2:
> - Fix build dependency by adding IOMMU_SUPPORT in [8/11]
> - Add an optimization to scan beyond the first page for a contiguous
> physical address range and return its length instead of a single
> page.[4/11]
>
> Since RFC[4]:
> - Abandoned dummy iommu driver approach as patch 1-3 absorbed the
> changes into iommufd.
>
> [1] https://lore.kernel.org/linux-
> iommu/20250603175403.GA407344@nvidia.com/
> [2] https://lore.kernel.org/linux-
> pci/20251027134430.00007e46@linux.microsoft.com/
> [3] https://lore.kernel.org/kvm/20230522115751.326947-1-
> yi.l.liu@intel.com/
> [4] https://lore.kernel.org/linux-iommu/20251201173012.18371-1-
> jacob.pan@linux.microsoft.com/
>
> Future cleanup: consolidate all CONFIG_IOMMUFD_NOIOMMU code
> (iopt_get_phys, iommufd_ioas_noiommu_get_pa, iommufd_noiommu_ops)
> into
> hwpt_noiommu.c to eliminate #ifdef guards from ioas.c and io_pagetable.c.
>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
>
>
> Jacob Pan (4):
> iommufd: Add an ioctl to query PA from IOVA for noiommu mode
> vfio: Enable cdev noiommu mode under iommufd
> selftests/vfio: Add iommufd noiommu mode selftest for cdev
> Documentation: Update VFIO NOIOMMU mode
>
> Jason Gunthorpe (3):
> iommufd: Support a HWPT without an iommu driver for noiommu
> iommufd: Move igroup allocation to a function
> iommufd: Allow binding to a noiommu device
>
> Documentation/driver-api/vfio.rst | 83 ++-
> drivers/iommu/iommufd/Kconfig | 12 +
> drivers/iommu/iommufd/Makefile | 1 +
> drivers/iommu/iommufd/device.c | 192 +++--
> drivers/iommu/iommufd/hw_pagetable.c | 15 +-
> drivers/iommu/iommufd/hwpt_noiommu.c | 97 +++
> drivers/iommu/iommufd/io_pagetable.c | 72 ++
> drivers/iommu/iommufd/ioas.c | 30 +
> drivers/iommu/iommufd/iommufd_private.h | 20 +
> drivers/iommu/iommufd/main.c | 3 +
> drivers/vfio/Kconfig | 8 +-
> drivers/vfio/device_cdev.c | 3 +
> drivers/vfio/iommufd.c | 6 +-
> drivers/vfio/vfio.h | 20 +-
> drivers/vfio/vfio_main.c | 23 +-
> include/linux/vfio.h | 1 +
> include/uapi/linux/iommufd.h | 27 +
> tools/testing/selftests/vfio/Makefile | 1 +
> .../lib/include/libvfio/vfio_pci_device.h | 16 +
> .../selftests/vfio/lib/vfio_pci_device.c | 5 +-
> .../vfio/vfio_iommufd_noiommu_test.c | 664 ++++++++++++++++++
> 21 files changed, 1221 insertions(+), 78 deletions(-)
> create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
> create mode 100644
> tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
>
> --
> 2.43.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev
2026-05-25 8:30 ` [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Tian, Kevin
@ 2026-05-26 15:32 ` Jacob Pan
2026-05-26 17:57 ` Alex Williamson
0 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-26 15:32 UTC (permalink / raw)
To: Tian, Kevin
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
Jason Gunthorpe, Alex Williamson, Joerg Roedel, Mostafa Saleh,
David Matlack, Robin Murphy, Nicolin Chen, Liu, Yi L, Baolu Lu,
Saurabh Sengar, skhawaja@google.com, pasha.tatashin@soleen.com,
Will Deacon, jacob.pan
Hi Kevin,
On Mon, 25 May 2026 08:30:12 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:
> Could you address the findings from Sashiko?
>
> https://sashiko.dev/#/patchset/20260521221155.1375144-1-jacob.pan%40linux.microsoft.com
>
I have gone over my Sashiko review setup, but there are lots of
false positives, like the one below, which we already discussed in an
earlier version. Is there a specific concern?
e.g.
> +static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
> +{
> +	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
> +}
Does dynamically evaluating dev->iommu here allow the noiommu state to
flip during the device's lifetime?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev
2026-05-26 15:32 ` Jacob Pan
@ 2026-05-26 17:57 ` Alex Williamson
2026-05-27 22:34 ` Jacob Pan
0 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2026-05-26 17:57 UTC (permalink / raw)
To: Jacob Pan
Cc: Tian, Kevin, linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
Jason Gunthorpe, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Liu, Yi L, Baolu Lu, Saurabh Sengar,
skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon, alex
On Tue, 26 May 2026 08:32:37 -0700
Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
> Hi Kevin,
>
> On Mon, 25 May 2026 08:30:12 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
> > Could you address the findings from Sashiko?
> >
> > https://sashiko.dev/#/patchset/20260521221155.1375144-1-jacob.pan%40linux.microsoft.com
> >
> I have gone over my Sashiko review setup, but there are lots of
> false positives, like the one below, which we already discussed in an
> earlier version. Is there a specific concern?
>
> e.g.
> > +static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
> > +{
> > +	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
> > +}
> Does dynamically evaluating dev->iommu here allow the noiommu state to
> flip during the device's lifetime?
Yes, that one is at best a theoretical issue, but the next two, NULL
pointer dereferences if the user passes a noiommu device fd through an
unexpected iommufd interface, appear quite real.
We're also still struggling with the Kconfig in patch 5, this Sashiko
comment is valid:
>> @@ -62,7 +61,10 @@ endif
>>
>> config VFIO_NOIOMMU
>> bool "VFIO No-IOMMU support"
>> - depends on VFIO_GROUP
>> + depends on VFIO_GROUP || VFIO_DEVICE_CDEV
>> + depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
>> + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
>
> Does this disable VFIO_NOIOMMU completely for legacy group users on
> architectures using generic atomic64 implementations?
>
> On architectures like 32-bit ARM or x86, !GENERIC_ATOMIC64 evaluates
> to false. If a distribution enables VFIO_DEVICE_CDEV, this dependency
> resolves to false, silently breaking backwards compatibility and
> depriving legacy group-based users of noiommu support.
That's true, the question is whether we care (I'd prefer to) and, if so,
whether we should block the relevant interfaces from working through
iommufd rather than disallowing the Kconfig option.
The next issue regarding classifying emulated IOMMU devices as no-iommu
also appears valid, mdev devices like kvmgt for example.
The next issue raises a valid concern whether the dev_warn() should be
a dev_warn_once().
The sysfs naming comment is invalid, we intentionally name noiommu
devices uniquely to force userspace opt-in.
In patch 6, adding dead code is a valid comment, the unchecked asprintf
does seem to be an outlier in selftest code, the unmap comment may be a
false positive, as is the hugepage size, but the masked return comments
could arguably be skips or asserts. There are potentially a couple
remaining nits and style issues noted.
Overall, more signal than noise afaict. Thanks,
Alex
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev
2026-05-26 17:57 ` Alex Williamson
@ 2026-05-27 22:34 ` Jacob Pan
0 siblings, 0 replies; 25+ messages in thread
From: Jacob Pan @ 2026-05-27 22:34 UTC (permalink / raw)
To: Alex Williamson
Cc: Tian, Kevin, linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
Jason Gunthorpe, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Liu, Yi L, Baolu Lu, Saurabh Sengar,
skhawaja@google.com, pasha.tatashin@soleen.com, Will Deacon,
jacob.pan
Hi Alex,
On Tue, 26 May 2026 11:57:09 -0600
Alex Williamson <alex@shazbot.org> wrote:
> On Tue, 26 May 2026 08:32:37 -0700
> Jacob Pan <jacob.pan@linux.microsoft.com> wrote:
>
> > Hi Kevin,
> >
> > On Mon, 25 May 2026 08:30:12 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > Could you address the findings from Sashiko?
> > >
> > > https://sashiko.dev/#/patchset/20260521221155.1375144-1-jacob.pan%40linux.microsoft.com
> > >
> > I have gone over my Sashiko review setup, but there are lots of
> > false positives, like the one below, which we already discussed in an
> > earlier version. Is there a specific concern?
> >
> > e.g.
> > > +static bool iommufd_device_is_noiommu(struct iommufd_device *idev)
> > > +{
> > > +	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && !idev->dev->iommu;
> > > +}
> > Does dynamically evaluating dev->iommu here allow the noiommu state
> > to flip during the device's lifetime?
>
> Yes, that one is at best a theoretical issue, but the next two, NULL
> pointer dereferences if the user passes a noiommu device fd through an
> unexpected iommufd interface, appear quite real.
>
> We're also still struggling with the Kconfig in patch 5, this Sashiko
> comment is valid:
>
> >> @@ -62,7 +61,10 @@ endif
> >>
> >> config VFIO_NOIOMMU
> >> bool "VFIO No-IOMMU support"
> >> - depends on VFIO_GROUP
> >> + depends on VFIO_GROUP || VFIO_DEVICE_CDEV
> >> + depends on !VFIO_GROUP || VFIO_CONTAINER ||
> >> IOMMUFD_VFIO_CONTAINER
> >> + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> >
> > Does this disable VFIO_NOIOMMU completely for legacy group users on
> > architectures using generic atomic64 implementations?
> >
> > On architectures like 32-bit ARM or x86, !GENERIC_ATOMIC64 evaluates
> > to false. If a distribution enables VFIO_DEVICE_CDEV, this
> > dependency resolves to false, silently breaking backwards
> > compatibility and depriving legacy group-based users of noiommu
> > support.
>
> That's true, the question is whether we care (I'd prefer to) and, if so,
> whether we should block the relevant interfaces from working through
> iommufd rather than disallowing the Kconfig option.
>
Yes, I think we should preserve the legacy VFIO_GROUP/VFIO_CONTAINER
noiommu path and only disable the IOMMUFD/cdev noiommu support on
GENERIC_ATOMIC64 platforms.
How about:
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index d3d8fef2855c..b9d6e1c22aed 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -61,10 +61,9 @@ endif
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
- depends on VFIO_GROUP || VFIO_DEVICE_CDEV
+ depends on VFIO_GROUP || (VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64)
depends on !VFIO_GROUP || VFIO_CONTAINER || IOMMUFD_VFIO_CONTAINER
- depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
- select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
+ select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV && !GENERIC_ATOMIC64
With this, VFIO_NOIOMMU remains selectable for legacy group/container users
on GENERIC_ATOMIC64 architectures, e.g. on ARM I can have
CONFIG_VFIO_DEVICE_CDEV=y
CONFIG_VFIO_GROUP=y
CONFIG_VFIO_CONTAINER=y
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO_NOIOMMU=y
CONFIG_GENERIC_ATOMIC64=y
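To double-check the dependency logic, the two sets of "depends on" lines
can be modeled as boolean functions in plain C. This is only a sketch:
struct kcfg, the function names and arm_example are made up for
illustration, and IOMMUFD_VFIO_CONTAINER is assumed to be unset in the ARM
config above because VFIO_CONTAINER=y.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the Kconfig symbols involved; not a kernel type. */
struct kcfg {
	bool vfio_group;
	bool vfio_device_cdev;
	bool vfio_container;
	bool iommufd_vfio_container;
	bool generic_atomic64;
};

/* The three "depends on" lines of VFIO_NOIOMMU as posted in v6 patch 5. */
bool noiommu_selectable_v6(const struct kcfg *c)
{
	return (c->vfio_group || c->vfio_device_cdev) &&
	       (!c->vfio_group || c->vfio_container ||
		c->iommufd_vfio_container) &&
	       (!c->vfio_device_cdev || !c->generic_atomic64);
}

/* The two "depends on" lines from the follow-up diff above. */
bool noiommu_selectable_proposed(const struct kcfg *c)
{
	return (c->vfio_group ||
		(c->vfio_device_cdev && !c->generic_atomic64)) &&
	       (!c->vfio_group || c->vfio_container ||
		c->iommufd_vfio_container);
}

/* "select IOMMUFD_NOIOMMU if ..." from the follow-up diff. */
bool iommufd_noiommu_selected_proposed(const struct kcfg *c)
{
	return noiommu_selectable_proposed(c) &&
	       c->vfio_device_cdev && !c->generic_atomic64;
}

/* The ARM configuration listed above (IOMMUFD_VFIO_CONTAINER assumed off). */
const struct kcfg arm_example = {
	.vfio_group = true,
	.vfio_device_cdev = true,
	.vfio_container = true,
	.iommufd_vfio_container = false,
	.generic_atomic64 = true,
};
```

For arm_example, noiommu_selectable_v6() is false, whereas
noiommu_selectable_proposed() is true and
iommufd_noiommu_selected_proposed() is false: the legacy group path stays
available and only the iommufd/cdev part is left out.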
Also, gate this on iommufd:
static int vfio_device_set_noiommu_and_name(struct vfio_device *device, enum vfio_group_type type)
{
- if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu &&
+ if (IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vfio_noiommu &&
> The next issue regarding classifying emulated IOMMU devices as
> no-iommu also appears valid, mdev devices like kvmgt for example.
>
I will fix this by adding vfio_group_type check.
-static int vfio_device_set_noiommu_and_name(struct vfio_device *device)
+static int vfio_device_set_noiommu_and_name(struct vfio_device *device, enum vfio_group_type type)
{
- if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu && !device->dev->iommu) {
+ if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu &&
+ !device->dev->iommu && type == VFIO_IOMMU) {
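The effect of the extra type check can be illustrated with a small
userspace model. This is a sketch under assumptions, not the kernel code:
struct model_dev and the classify_*() helpers are hypothetical, the enum is
reproduced from memory of drivers/vfio/vfio.h, and the IS_ENABLED() part of
the condition is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reproduced for illustration; assumed to match the kernel's enum. */
enum vfio_group_type {
	VFIO_IOMMU,		/* physical device behind a real IOMMU API */
	VFIO_EMULATED_IOMMU,	/* mdev-style device, e.g. kvmgt */
	VFIO_NO_IOMMU,		/* legacy group noiommu */
};

/* Hypothetical stand-in for struct device; only the iommu pointer matters. */
struct model_dev {
	void *iommu;
};

/* v6 condition: anything without dev->iommu is treated as noiommu. */
bool classify_v6(bool vfio_noiommu, const struct model_dev *dev)
{
	return vfio_noiommu && !dev->iommu;
}

/* Proposed condition: only devices registered as VFIO_IOMMU qualify. */
bool classify_fixed(bool vfio_noiommu, const struct model_dev *dev,
		    enum vfio_group_type type)
{
	return vfio_noiommu && !dev->iommu && type == VFIO_IOMMU;
}
```

An emulated-IOMMU device has no dev->iommu either, so classify_v6() marks
it as noiommu while classify_fixed() does not.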
> The next issue raises a valid concern whether the dev_warn() should be
> a dev_warn_once().
>
I think dev_warn() is appropriate here, matching the existing group
path. The warning is per device per path. Using dev_warn_once() would
suppress warnings for later devices at the same callsite, which would
hide which devices were exposed through the unsafe noiommu path.
For example, with both group and cdev enabled today we get:
vfio-pci 0000:01:00.0: Adding kernel taint for vfio-noiommu group on device
vfio-pci 0000:01:00.0: Adding kernel taint for vfio-noiommu cdev on device
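The difference between the two macros can be sketched in userspace. This
model assumes the "once" variant is keyed on a single static flag per
callsite, which is how I understand the kernel's *_once() helpers to work;
the function names and any device addresses passed in are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Models dev_warn(): every device reaching the callsite is reported. */
int register_all_warn(const char *const *bdf, int n)
{
	int emitted = 0;

	for (int i = 0; i < n; i++) {
		printf("vfio-pci %s: Adding kernel taint for vfio-noiommu cdev on device\n",
		       bdf[i]);
		emitted++;
	}
	return emitted;
}

/* Models dev_warn_once(): one static flag shared by all devices. */
int register_all_warn_once(const char *const *bdf, int n)
{
	static bool already_done;	/* per callsite, not per device */
	int emitted = 0;

	for (int i = 0; i < n; i++) {
		if (already_done)
			continue;
		already_done = true;
		printf("vfio-pci %s: Adding kernel taint for vfio-noiommu cdev on device\n",
		       bdf[i]);
		emitted++;
	}
	return emitted;
}
```

With three devices, the first helper reports all three, while the second
reports only the first one and stays silent on any later call.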
> The sysfs naming comment is invalid, we intentionally name noiommu
> devices uniquely to force userspace opt-in.
>
> In patch 6, adding dead code is a valid comment, the unchecked
> asprintf does seem to be an outlier in selftest code, the unmap
> comment may be a false positive, as is the hugepage size, but the
> masked return comments could arguably be skips or asserts. There are
> potentially a couple remaining nits and style issues noted.
>
Yeah, I will fix the test code, based on David's feedback as well.
> Overall, more signal than noise afaict. Thanks,
>
> Alex
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
2026-05-25 6:29 ` Yi Liu
@ 2026-05-28 18:52 ` Jacob Pan
2026-05-29 7:27 ` Yi Liu
0 siblings, 1 reply; 25+ messages in thread
From: Jacob Pan @ 2026-05-28 18:52 UTC (permalink / raw)
To: Yi Liu
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, jacob.pan
Hi Yi,
On Mon, 25 May 2026 14:29:31 +0800
Yi Liu <yi.l.liu@intel.com> wrote:
> Hi Jacob,
>
> On 5/24/26 06:01, Jacob Pan wrote:
> > Hi Yi,
> >
> > On Fri, 22 May 2026 17:19:41 +0800
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >
> >> On 5/22/26 06:11, Jacob Pan wrote:
> >>> Now that devices under noiommu mode can bind with IOMMUFD and
> >>> perform IOAS operations, lift restrictions on cdev from VFIO side.
> >>> Use cases are documented in Documentation/driver-api/vfio.rst
> >>>
> >>> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> >>> ---
> >>> v6:
> >>> - Revert back to unified VFIO_NOIOMMU Kconfig for both cdev
> >>> and group. Use Kconfig dependency to restrict usages and avoid
> >>> null group checks. (Alex & Yi)
> >>> - Add CAP_SYS_RAWIO checks for cdev open to maintain security
> >>> parity with the group noiommu path. (Alex)
> >>> v5:
> >>> - Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
> >>> and its dependencies
> >>> - Add comment to explain vfio_noiommu conditional definition
> >>> (Alex)
> >>> - Removed early return for group noiommu in bind/unbind
> >>> - Use consistent wording referring to VFIO noiommu mode
> >>> (Kevin)
> >>> - Update unsafe_noiommu Kconfig help text (Kevin)
> >>> - Change dev_warn to dev_info for noiommu enabling msg
> >>> (Kevin) v4:
> >>> - Remove early return in iommufd_bind for noiommu (Alex)
> >>> v3:
> >>> - Consolidate into fewer patches
> >>> v2:
> >>> - removed unnecessary device->noiommu set in
> >>> iommufd_vfio_compat_ioas_get_id()
> >>> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> >>> ---
> >>> drivers/vfio/Kconfig | 8 +++++---
> >>> drivers/vfio/device_cdev.c | 3 +++
> >>> drivers/vfio/iommufd.c | 6 +++---
> >>> drivers/vfio/vfio.h | 20 +++++++++++++-------
> >>> drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
> >>> include/linux/vfio.h | 1 +
> >>> 6 files changed, 44 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> >>> index ceae52fd7586..d3d8fef2855c 100644
> >>> --- a/drivers/vfio/Kconfig
> >>> +++ b/drivers/vfio/Kconfig
> >>> @@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
> >>> The VFIO device cdev is another way for userspace to
> >>> get device access. Userspace gets device fd by opening device cdev
> >>> under /dev/vfio/devices/vfioX, and then bind the device fd with an
> >>> iommufd
> >>> - to set up secure DMA context for device access. This
> >>> interface does
> >>> - not support noiommu.
> >>> + to set up secure DMA context for device access.
> >>
> >> if noiommu, it's unsafe DMA. :)
> > yes, here I just want to remove "This interface does not support
> > noiommu.".
> >>
> >>> If you don't know what to do here, say N.
> >>>
> >>> @@ -62,7 +61,10 @@ endif
> >>>
> >>> config VFIO_NOIOMMU
> >>> bool "VFIO No-IOMMU support"
> >>> - depends on VFIO_GROUP
> >>> + depends on VFIO_GROUP || VFIO_DEVICE_CDEV
> >>> + depends on !VFIO_GROUP || VFIO_CONTAINER ||
> >>> IOMMUFD_VFIO_CONTAINER
> >>> + depends on !VFIO_DEVICE_CDEV || !GENERIC_ATOMIC64
> >>> + select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
> >>> help
> >>> VFIO is built on the ability to isolate devices using
> >>> the IOMMU. Only with an IOMMU can userspace access to DMA capable
> >>> devices be diff --git a/drivers/vfio/device_cdev.c
> >>> b/drivers/vfio/device_cdev.c index 54abf312cf04..4e2c1e4fc1f8
> >>> 100644 --- a/drivers/vfio/device_cdev.c
> >>> +++ b/drivers/vfio/device_cdev.c
> >>> @@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode
> >>> *inode, struct file *filep) struct vfio_device_file *df;
> >>> int ret;
> >>>
> >>> + if (device->noiommu && !capable(CAP_SYS_RAWIO))
> >>> + return -EPERM;
> >>> +
> >>> /* Paired with the put in vfio_device_fops_release() */
> >>> if (!vfio_device_try_get_registration(device))
> >>> return -ENODEV;
> >>> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> >>> index a38d262c6028..d4f2e2a0f2f3 100644
> >>> --- a/drivers/vfio/iommufd.c
> >>> +++ b/drivers/vfio/iommufd.c
> >>> @@ -25,8 +25,8 @@ int vfio_df_iommufd_bind(struct vfio_device_file
> >>> *df)
> >>> lockdep_assert_held(&vdev->dev_set->lock);
> >>>
> >>> - /* Returns 0 to permit device opening under noiommu mode
> >>> */
> >>> - if (vfio_device_is_noiommu(vdev))
> >>> + /* Group noiommu via iommufd compat needs no device
> >>> binding */
> >>> + if (df->group && vfio_device_is_noiommu(vdev))
> >>
> >> seems like vfio_device_is_noiommu() implies group path, then no
> >> need to use df->group.
> >>
> > df->group is needed because only the legacy VFIO
> > group/iommufd-compat noiommu path should skip real iommufd device
> > binding.
>
> got it. It should be opened via group path. Otherwise, it should go
> ahead to bind iommufd. BTW. the noiommu check in
> vfio_iommufd_compat_attach_ioas() is skipped. I know this helper is
> only for the group path, so the df->group is not added, can we add
> a note for it?
>
Sure, I will add a comment:
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index d4f2e2a0f2f3..e9893d34d07b 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -40,7 +40,11 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
lockdep_assert_held(&vdev->dev_set->lock);
- /* compat noiommu does not need to do ioas attach */
+ /*
+ * Compat noiommu does not need to do ioas attach. This helper is
+ * only called from the legacy group/iommufd compat path, so no
+ * explicit df->group check is needed.
+ */
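For reference, the reason the df->group check is needed in the bind path
but not in this compat helper can be shown with a small userspace model.
It is a sketch only: struct model_vdev, struct model_df, model_bind() and
model_attach() are hypothetical stand-ins for vfio_df_iommufd_bind() and
vfio_iommufd_physical_attach_ioas(), and the noiommu flag stands in for
vfio_device_is_noiommu().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfio_device. */
struct model_vdev {
	bool noiommu;
	void *iommufd_device;	/* set by a real bind, NULL otherwise */
};

/* Hypothetical stand-in for struct vfio_device_file. */
struct model_df {
	bool via_group;		/* true when opened through the group path */
	struct model_vdev *vdev;
};

int model_bound_token;	/* dummy object representing a bound iommufd device */

/* check_group selects the v6 condition (true) or the old one (false). */
int model_bind(struct model_df *df, bool check_group)
{
	bool skip = check_group ? (df->via_group && df->vdev->noiommu)
				: df->vdev->noiommu;

	if (skip)
		return 0;	/* "succeeds" without binding anything */

	df->vdev->iommufd_device = &model_bound_token;
	return 0;
}

/* Mirrors the WARN_ON(!vdev->iommufd_device) check on attach. */
int model_attach(const struct model_df *df)
{
	if (!df->vdev->iommufd_device)
		return -EINVAL;
	return 0;
}
```

For a cdev noiommu fd (via_group false), model_bind() without the group
check returns 0 but leaves iommufd_device NULL, so model_attach() fails
with -EINVAL; with the group check the bind is real and the attach
succeeds.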
> > For df->group == NULL, the fd is a VFIO cdev fd. That path uses
> > VFIO_DEVICE_BIND_IOMMUFD and later VFIO_DEVICE_ATTACH_IOMMUFD_PT.
> > Even in noiommu cdev mode, bind must still call:
> >
> > vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
> >
> > so vdev->iommufd_device can get initialized. If the check were only:
> >
> > if (vfio_device_is_noiommu(vdev))
> > return 0;
> > then cdev noiommu bind would falsely “succeed” without setting
> > vdev->iommufd_device. Later VFIO_DEVICE_ATTACH_IOMMUFD_PT calls
> > vfio_iommufd_physical_attach_ioas(), hits:
> >
> > if (WARN_ON(!vdev->iommufd_device))
> > return -EINVAL;
> >
> > In the noiommu test, you will get:
> > [ 185.870670] ------------[ cut here ]------------
> > [ 185.871952] WARNING: drivers/vfio/iommufd.c:157 at vfio_iommufd_physical_attach_ioas+0x3f/0x50, CPU#0: vfio-noiommu-pc/157
> > [ 185.875010] Modules linked in:
> > [ 185.875882] CPU: 0 UID: 0 PID: 157 Comm: vfio-noiommu-pc Tainted: G U W 7.1.0-rc1+ #20 PREEMPT
> > [ 185.878637] Tainted: [U]=USER, [W]=WARN
> > [ 185.879711] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
> > [ 185.882913] RIP: 0010:vfio_iommufd_physical_attach_ioas+0x3f/0x50
> > [ 185.884624] Code: 89 f2 31 f6 f6 83 50 04 00 00 01 75 16 e8 d9 aa c6 ff 85 c0 75 07 80 8b 50 04 00 00 01 5b c3 cc cc cc cc e8 43 ab c6 ff eb e8 <0f> 0b b80
> > [ 185.889701] RSP: 0018:ffa000000062fd88 EFLAGS: 00010246
> > [ 185.891161] RAX: ffffffff81f59ee0 RBX: ff1100010c43b800 RCX: 0000000000000000
> > [ 185.893141] RDX: ff1100010c708040 RSI: ffa000000062fda0 RDI: 0000000000000000
> > [ 185.895127] RBP: ff1100010c43b800 R08: ff1100010c7c12b0 R09: 0000000000000000
> > [ 185.897119] R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffec4c2f720
> > [ 185.899102] R13: ffa000000062fda0 R14: ff11000103bd40d0 R15: ff1100010c43b800
> > [ 185.901075] FS: 0000000028d69380(0000) GS:ff110004e4a8d000(0000) knlGS:0000000000000000
> > [ 185.903284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 185.904888] CR2: 0000000028d73988 CR3: 0000000103507002 CR4: 0000000000f73ef0
> > [ 185.906853] PKRU: 55555554
> > [ 185.907636] Call Trace:
> > [ 185.908373] <TASK>
> > [ 185.908932] vfio_df_ioctl_attach_pt+0xc7/0x170
> > [ 185.910085] vfio_device_fops_unl_ioctl+0x49b/0xa50
> > [ 185.911322] ? file_tty_write.isra.0+0x202/0x320
> > [ 185.912507] __x64_sys_ioctl+0x425/0xa30
> > [ 185.913502] do_syscall_64+0x5e/0xf80
> > [ 185.914444] ? irqentry_exit+0x3b/0x5e0
> > [ 185.915414] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 185.916701] RIP: 0033:0x434a4d
> > [ 185.917498] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d0
> > [ 185.922052] RSP: 002b:00007ffec4c2f6b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [ 185.923785] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000434a4d
> > [ 185.925398] RDX: 00007ffec4c2f720 RSI: 0000000000003b77 RDI: 0000000000000004
> > [ 185.927007] RBP: 00007ffec4c2f700 R08: 0000000000000064 R09: 0000000000000000
> > [ 185.928611] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffec4c30918
> > [ 185.930211] R13: 00007ffec4c30940 R14: 00000000004cf868 R15: 0000000000000001
> > [ 185.931758] </TASK>
> > [ 185.932258] ---[ end trace 0000000000000000 ]---
> > Failed to attach pt to device
> >> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> >> {
> >> return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> >> vdev->group->type == VFIO_NO_IOMMU;
> >> }
> >>
> >>> return 0;
> >>>
> >>> return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
> >>> @@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct
> >>> vfio_device_file *df)
> >>> lockdep_assert_held(&vdev->dev_set->lock);
> >>>
> >>> - if (vfio_device_is_noiommu(vdev))
> >>> + if (df->group && vfio_device_is_noiommu(vdev))
> >>> return;
> >>>
> >>> if (vdev->ops->unbind_iommufd)
> >>> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> >>> index e4b72e79b7e3..6f0a2dfc8a00 100644
> >>> --- a/drivers/vfio/vfio.h
> >>> +++ b/drivers/vfio/vfio.h
> >>> @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct
> >>> vfio_device *device);
> >>> static inline int vfio_device_add(struct vfio_device *device)
> >>> {
> >>> - /* cdev does not support noiommu device */
> >>> - if (vfio_device_is_noiommu(device))
> >>> - return device_add(&device->device);
> >>> vfio_init_device_cdev(device);
> >>> return cdev_device_add(&device->cdev, &device->device);
> >>> }
> >>>
> >>> static inline void vfio_device_del(struct vfio_device *device)
> >>> {
> >>> - if (vfio_device_is_noiommu(device))
> >>> - device_del(&device->device);
> >>> - else
> >>> - cdev_device_del(&device->cdev, &device->device);
> >>> + cdev_device_del(&device->cdev, &device->device);
> >>> }
> >>>
> >>> int vfio_device_fops_cdev_open(struct inode *inode, struct file
> >>> *filep); @@ -420,6 +414,18 @@ static inline void
> >>> vfio_cdev_cleanup(void) }
> >>> #endif /* CONFIG_VFIO_DEVICE_CDEV */
> >>>
> >>> +#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
> >>> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device
> >>> *vdev) +{
> >>> + return vdev->noiommu;
> >>> +}
> >>> +#else
> >>> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device
> >>> *vdev) +{
> >>> + return false;
> >>> +}
> >>> +#endif
> >>> +
> >>> #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
> >>> int __init vfio_virqfd_init(void);
> >>> void vfio_virqfd_exit(void);
> >>> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> >>> index 6222376ab6ab..84381c500623 100644
> >>> --- a/drivers/vfio/vfio_main.c
> >>> +++ b/drivers/vfio/vfio_main.c
> >>> @@ -321,6 +321,20 @@ static int vfio_init_device(struct
> >>> vfio_device *device, struct device *dev, return ret;
> >>> }
> >>>
> >>> +static int vfio_device_set_noiommu_and_name(struct vfio_device
> >>> *device) +{
> >>> + if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu
> >>> && !device->dev->iommu) {
> >>> + device->noiommu = true;
> >>> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> >>> + dev_warn(device->dev,
> >>> + "Adding kernel taint for vfio-noiommu
> >>> cdev on device\n");
> >>> + }
> >>> +
> >>> + /* Just to be safe, expose to user explicitly noiommu
> >>> cdev node */
> >>> + return dev_set_name(&device->device, "%svfio%d",
> >>> + device->noiommu ? "noiommu-" : "",
> >>> device->index); +}
> >>> +
> >>> static int __vfio_register_dev(struct vfio_device *device,
> >>> enum vfio_group_type type)
> >>> {
> >>> @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct
> >>> vfio_device *device, if (!device->dev_set)
> >>> vfio_assign_device_set(device, device);
> >>>
> >>> - ret = dev_set_name(&device->device, "vfio%d",
> >>> device->index);
> >>> + ret = vfio_device_set_group(device, type);
> >>> if (ret)
> >>> return ret;
> >>>
> >>> - ret = vfio_device_set_group(device, type);
> >>> + ret = vfio_device_set_noiommu_and_name(device);
> >>
> >> the order of dev_set_name and vfio_device_set_group() are swapped,
> >> any special reason?
> > The ordering was intentional in an earlier version where the cdev
> > noiommu check depended on device->group. With the current check
> > using !device->dev->iommu, the ordering is no longer strictly
> > required for that test.
> >
> > I kept vfio_device_set_group() first because the rest of
> > registration already treats group setup as the first VFIO state to
> > unwind, and this lets the existing err_out path handle failures
> > after group assignment, including dev_set_name(). I can restore the
> > old order if you prefer, since it is not functionally required
> > anymore.
>
> I think it's better to keep the original order if no functional
> requirement anymore.
Will do.
>
> >>> if (ret)
> >>> - return ret;
> >>> + goto err_out;
> >>>
> >>> /*
> >>> * VFIO always sets IOMMU_CACHE because we offer no way
> >>> for userspace to
> >>> * restore cache coherency. It has to be checked here
> >>> because it is only
> >>> * valid for cases where we are using iommu groups.
> >>> */
> >>> - if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device)
> >>> &&
> >>> + if (type == VFIO_IOMMU &&
> >>> !(vfio_device_is_noiommu(device) ||
> >>> +
> >>> vfio_device_is_cdev_noiommu(device)) &&
> >>
> >> now, the group path and cdev path have their own is_noiommu helper,
> >> can the two helpers be consolidated?
> >>
> > They could be consolidated mechanically, but I feel they are
> > checking different things it is more clear to keep them separate?
>
> IMHO. They are actually checking if the device is noiommu. I found the
> current usage of vfio_device_is_noiommu(). #1 is totally specific for
> group path. #2, #3 and #4 are for the common path to identify the
> noiommu of group. It also implies the info of the open path (group
> path?). #7 and #8 is going to be dropped. And 9 is totally for
> checking noiommu attribute. So I'm wondering if using !dev->iommu is
> a good choice. Should be able to cover both group and cdev path. Let
> me know if this is not workable.
I'm not sure I follow -- do you mean using !dev->iommu as the common
no-IOMMU check?

I don't think that should be the helper condition directly. !dev->iommu
is the low-level device state, but VFIO no-IOMMU is a VFIO mode. For
example, VFIO_EMULATED_IOMMU devices may also not have dev->iommu, but
they should not be treated as VFIO_NO_IOMMU.

What I can do is consolidate the helpers around the VFIO state instead:
legacy group no-IOMMU is represented by group->type, while the cdev
path uses vdev->noiommu:

static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
#if IS_ENABLED(CONFIG_VFIO_GROUP)
	if (vdev->group && vdev->group->type == VFIO_NO_IOMMU)
		return true;
#endif

	return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vdev->noiommu;
}
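
To make the intended behaviour concrete, here is a rough userspace sketch of
the truth table the consolidated helper is meant to give. It is not kernel
code: the struct and enum definitions are cut-down mocks that keep only the
fields the helper reads, the MOCK_* macros stand in for IS_ENABLED() and are
assumed to be on, and the mock_ prefix marks the function as an illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the fields the helper reads. */
enum vfio_group_type { VFIO_IOMMU, VFIO_EMULATED_IOMMU, VFIO_NO_IOMMU };

struct vfio_group {
	enum vfio_group_type type;
};

struct vfio_device {
	struct vfio_group *group;	/* NULL when no group is assigned */
	bool noiommu;			/* cdev no-IOMMU flag set at registration */
};

/* Assumed stand-ins for IS_ENABLED(CONFIG_...); both treated as enabled. */
#define MOCK_VFIO_GROUP_ENABLED		1
#define MOCK_IOMMUFD_NOIOMMU_ENABLED	1

static bool mock_vfio_device_is_noiommu(const struct vfio_device *vdev)
{
	assert(vdev != NULL);

	/* Legacy group path: the group type carries the no-IOMMU state. */
	if (MOCK_VFIO_GROUP_ENABLED && vdev->group &&
	    vdev->group->type == VFIO_NO_IOMMU)
		return true;

	/* cdev path: the per-device flag carries the no-IOMMU state. */
	return MOCK_IOMMUFD_NOIOMMU_ENABLED && vdev->noiommu;
}
```

With these mocks, a device in a VFIO_NO_IOMMU group and a group-less device
with noiommu set both report true, while a VFIO_EMULATED_IOMMU device with the
flag clear reports false, which is the distinction !dev->iommu alone cannot
make.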
>
> # line filename / context / line
> 1 193 drivers/vfio/group.c <<vfio_df_group_open>>
> if (df->iommufd && vfio_device_is_noiommu(device) &&
> device->open_count == 0) {
> 2 29 drivers/vfio/iommufd.c <<vfio_df_iommufd_bind>>
> if (vfio_device_is_noiommu(vdev))
> 3 44 drivers/vfio/iommufd.c
> <<vfio_iommufd_compat_attach_ioas>> if (vfio_device_is_noiommu(vdev))
> 4 61 drivers/vfio/iommufd.c <<vfio_df_iommufd_unbind>>
> if (vfio_device_is_noiommu(vdev))
> 5 115 drivers/vfio/vfio.h <<vfio_device_is_noiommu>>
> static inline bool vfio_device_is_noiommu(struct
> vfio_device *vdev)
> 6 191 drivers/vfio/vfio.h <<vfio_device_is_noiommu>>
> static inline bool vfio_device_is_noiommu(struct
> vfio_device *vdev)
> 7 362 drivers/vfio/vfio.h <<vfio_device_add>>
> if (vfio_device_is_noiommu(device))
> 8 370 drivers/vfio/vfio.h <<vfio_device_del>>
> if (vfio_device_is_noiommu(device))
> 9 356 drivers/vfio/vfio_main.c <<__vfio_register_dev>>
> if (type == VFIO_IOMMU &&
> !vfio_device_is_noiommu(device) &&
>
> >>> !device_iommu_capable(device->dev,
> >>> IOMMU_CAP_CACHE_COHERENCY)) { ret = -EINVAL;
> >>> goto err_out;
> >>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> >>> index 31b826efba00..45f08986359e 100644
> >>> --- a/include/linux/vfio.h
> >>> +++ b/include/linux/vfio.h
> >>> @@ -74,6 +74,7 @@ struct vfio_device {
> >>> u8 iommufd_attached:1;
> >>> #endif
> >>> u8 cdev_opened:1;
> >>> + u8 noiommu:1;
> >>> /*
> >>> * debug_root is a static property of the vfio_device
> >>> * which must be set prior to registering the
> >>> vfio_device.
> >
>
> Regards,
> Yi Liu
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd
2026-05-28 18:52 ` Jacob Pan
@ 2026-05-29 7:27 ` Yi Liu
0 siblings, 0 replies; 25+ messages in thread
From: Yi Liu @ 2026-05-29 7:27 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, David Matlack,
Robin Murphy, Nicolin Chen, Tian, Kevin, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon
On 5/29/26 02:52, Jacob Pan wrote:
>>>> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
>>>> {
>>>> return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
>>>> vdev->group->type == VFIO_NO_IOMMU;
>>>> }
>>>>
>>>>> return 0;
>>>>>
>>>>> return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
>>>>> @@ -58,7 +58,7 @@ void vfio_df_iommufd_unbind(struct
>>>>> vfio_device_file *df)
>>>>> lockdep_assert_held(&vdev->dev_set->lock);
>>>>>
>>>>> - if (vfio_device_is_noiommu(vdev))
>>>>> + if (df->group && vfio_device_is_noiommu(vdev))
>>>>> return;
>>>>>
>>>>> if (vdev->ops->unbind_iommufd)
>>>>> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
>>>>> index e4b72e79b7e3..6f0a2dfc8a00 100644
>>>>> --- a/drivers/vfio/vfio.h
>>>>> +++ b/drivers/vfio/vfio.h
>>>>> @@ -358,19 +358,13 @@ void vfio_init_device_cdev(struct
>>>>> vfio_device *device);
>>>>> static inline int vfio_device_add(struct vfio_device *device)
>>>>> {
>>>>> - /* cdev does not support noiommu device */
>>>>> - if (vfio_device_is_noiommu(device))
>>>>> - return device_add(&device->device);
>>>>> vfio_init_device_cdev(device);
>>>>> return cdev_device_add(&device->cdev, &device->device);
>>>>> }
>>>>>
>>>>> static inline void vfio_device_del(struct vfio_device *device)
>>>>> {
>>>>> - if (vfio_device_is_noiommu(device))
>>>>> - device_del(&device->device);
>>>>> - else
>>>>> - cdev_device_del(&device->cdev, &device->device);
>>>>> + cdev_device_del(&device->cdev, &device->device);
>>>>> }
>>>>>
>>>>> int vfio_device_fops_cdev_open(struct inode *inode, struct file
>>>>> *filep); @@ -420,6 +414,18 @@ static inline void
>>>>> vfio_cdev_cleanup(void) }
>>>>> #endif /* CONFIG_VFIO_DEVICE_CDEV */
>>>>>
>>>>> +#if IS_ENABLED(CONFIG_VFIO_NOIOMMU)
>>>>> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device
>>>>> *vdev) +{
>>>>> + return vdev->noiommu;
>>>>> +}
>>>>> +#else
>>>>> +static inline bool vfio_device_is_cdev_noiommu(struct vfio_device
>>>>> *vdev) +{
>>>>> + return false;
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>> #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
>>>>> int __init vfio_virqfd_init(void);
>>>>> void vfio_virqfd_exit(void);
>>>>> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
>>>>> index 6222376ab6ab..84381c500623 100644
>>>>> --- a/drivers/vfio/vfio_main.c
>>>>> +++ b/drivers/vfio/vfio_main.c
>>>>> @@ -321,6 +321,20 @@ static int vfio_init_device(struct
>>>>> vfio_device *device, struct device *dev, return ret;
>>>>> }
>>>>>
>>>>> +static int vfio_device_set_noiommu_and_name(struct vfio_device
>>>>> *device) +{
>>>>> + if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && vfio_noiommu
>>>>> && !device->dev->iommu) {
>>>>> + device->noiommu = true;
>>>>> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>>>>> + dev_warn(device->dev,
>>>>> + "Adding kernel taint for vfio-noiommu
>>>>> cdev on device\n");
>>>>> + }
>>>>> +
>>>>> + /* Just to be safe, expose to user explicitly noiommu
>>>>> cdev node */
>>>>> + return dev_set_name(&device->device, "%svfio%d",
>>>>> + device->noiommu ? "noiommu-" : "",
>>>>> device->index); +}
>>>>> +
>>>>> static int __vfio_register_dev(struct vfio_device *device,
>>>>> enum vfio_group_type type)
>>>>> {
>>>>> @@ -340,20 +354,21 @@ static int __vfio_register_dev(struct
>>>>> vfio_device *device, if (!device->dev_set)
>>>>> vfio_assign_device_set(device, device);
>>>>>
>>>>> - ret = dev_set_name(&device->device, "vfio%d",
>>>>> device->index);
>>>>> + ret = vfio_device_set_group(device, type);
>>>>> if (ret)
>>>>> return ret;
>>>>>
>>>>> - ret = vfio_device_set_group(device, type);
>>>>> + ret = vfio_device_set_noiommu_and_name(device);
>>>>
>>>> the order of dev_set_name and vfio_device_set_group() are swapped,
>>>> any special reason?
>>> The ordering was intentional in an earlier version where the cdev
>>> noiommu check depended on device->group. With the current check
>>> using !device->dev->iommu, the ordering is no longer strictly
>>> required for that test.
>>>
>>> I kept vfio_device_set_group() first because the rest of
>>> registration already treats group setup as the first VFIO state to
>>> unwind, and this lets the existing err_out path handle failures
>>> after group assignment, including dev_set_name(). I can restore the
>>> old order if you prefer, since it is not functionally required
>>> anymore.
>>
>> I think it's better to keep the original order if no functional
>> requirement anymore.
> will do,
>
>>
>>>>> if (ret)
>>>>> - return ret;
>>>>> + goto err_out;
>>>>>
>>>>> /*
>>>>> * VFIO always sets IOMMU_CACHE because we offer no way
>>>>> for userspace to
>>>>> * restore cache coherency. It has to be checked here
>>>>> because it is only
>>>>> * valid for cases where we are using iommu groups.
>>>>> */
>>>>> - if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device)
>>>>> &&
>>>>> + if (type == VFIO_IOMMU &&
>>>>> !(vfio_device_is_noiommu(device) ||
>>>>> +
>>>>> vfio_device_is_cdev_noiommu(device)) &&
>>>>
>>>> now, the group path and cdev path have their own is_noiommu helper,
>>>> can the two helpers be consolidated?
>>>>
>>> They could be consolidated mechanically, but I feel they are
>>> checking different things it is more clear to keep them separate?
>>
>> IMHO. They are actually checking if the device is noiommu. I found the
>> current usage of vfio_device_is_noiommu(). #1 is totally specific for
>> group path. #2, #3 and #4 are for the common path to identify the
>> noiommu of group. It also implies the info of the open path (group
>> path?). #7 and #8 is going to be dropped. And 9 is totally for
>> checking noiommu attribute. So I'm wondering if using !dev->iommu is
>> a good choice. Should be able to cover both group and cdev path. Let
>> me know if this is not workable.
> I'm not sure I follow -- do you mean using !dev->iommu as the common
> no-IOMMU check?
>
> I don't think that should be the helper condition directly. !dev->iommu
> is the low-level device state, but VFIO no-IOMMU is a VFIO mode. For
> example, VFIO_EMULATED_IOMMU devices may also not have dev->iommu, but
> they should not be treated as VFIO_NO_IOMMU.
you are right. :)
> What I can do is consolidate the helpers around the VFIO state instead:
> legacy group no-IOMMU is represented by group->type, while the cdev
> path uses vdev->noiommu
>
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> #if IS_ENABLED(CONFIG_VFIO_GROUP)
> if (vdev->group && vdev->group->type == VFIO_NO_IOMMU)
> return true;
> #endif
>
> return IS_ENABLED(CONFIG_IOMMUFD_NOIOMMU) && vdev->noiommu;
> }
looks good. :)
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev
2026-05-21 22:39 ` David Matlack
@ 2026-06-03 0:13 ` Jacob Pan
0 siblings, 0 replies; 25+ messages in thread
From: Jacob Pan @ 2026-06-03 0:13 UTC (permalink / raw)
To: David Matlack
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, Mostafa Saleh, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, Baolu Lu, Saurabh Sengar,
skhawaja, pasha.tatashin, Will Deacon, jacob.pan
Hi David,
On Thu, 21 May 2026 22:39:34 +0000
David Matlack <dmatlack@google.com> wrote:
> On 2026-05-21 03:11 PM, Jacob Pan wrote:
>
> For the shortlog, please use "vfio: selftests: ..."
>
> > Add comprehensive selftest for VFIO device operations with iommufd
> > in noiommu mode. Tests cover:
> > - Device binding to iommufd
> > - IOAS (I/O Address Space) allocation, mapping with dummy IOVA
> > - Retrieve PA from dummy IOVA
> > - Device attach/detach operations as usual
>
> High level feedback: Can you use the library for all the standard
> setup and ioas mapping instead of reimplementing it in this test?
>
> iommu = iommu_init(MODE_IOMMUFD);
> device = vfio_pci_device_init(iommu, bdf);
>
> __iommu_map(...);
> > __iommu_unmap(...);
>
> iommu_cleanup(iommu);
> vfio_pci_device_cleanup(device);
>
> If not, what are the gaps? It would be useful to fill in those gaps so
> that it is easier to use VFIO selftests with noiommu setups.
>
I will use the library. There is no gap; the test was written a while
ago and has not caught up with the current VFIO selftest helpers.
> >
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > v6:
> > - Add test cases for get_pa length limit
> > v4:
> > - squash DSA specific selftest changes
> > v2:
> > - New selftest for generic noiommu bind/unbind
> > ---
> > tools/testing/selftests/vfio/Makefile | 1 +
> > .../lib/include/libvfio/vfio_pci_device.h | 16 +
> > .../selftests/vfio/lib/vfio_pci_device.c | 5 +-
> > .../vfio/vfio_iommufd_noiommu_test.c | 664 ++++++++++++++++++
> > 4 files changed, 684 insertions(+), 2 deletions(-)
> > create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
> >
> > diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
> > index 0684932d91bf..c9c02fdfd946 100644
> > --- a/tools/testing/selftests/vfio/Makefile
> > +++ b/tools/testing/selftests/vfio/Makefile
> > @@ -9,6 +9,7 @@ CFLAGS = $(KHDR_INCLUDES)
> > TEST_GEN_PROGS += vfio_dma_mapping_test
> > TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
> > TEST_GEN_PROGS += vfio_iommufd_setup_test
> > +TEST_GEN_PROGS += vfio_iommufd_noiommu_test
> > TEST_GEN_PROGS += vfio_pci_device_test
> > TEST_GEN_PROGS += vfio_pci_device_init_perf_test
> > TEST_GEN_PROGS += vfio_pci_driver_test
> > diff --git
> > a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
> > b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
> > index 2858885a89bb..6218c91776b3 100644 ---
> > a/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
> > +++
> > b/tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
> > @@ -122,4 +122,20 @@ static inline bool
> > vfio_pci_device_match(struct vfio_pci_device *device, const char
> > *vfio_pci_get_cdev_path(const char *bdf); +static inline bool
> > vfio_pci_noiommu_mode_enabled(void) +{
> > + char buf[8] = {};
> > + int fd, n;
> > +
> > + fd =
> > open("/sys/module/vfio/parameters/enable_unsafe_noiommu_mode",
> > + O_RDONLY);
>
> Can you rebase on top of the latest changes Alex merged for 7.2? It
> introduces the sysfs library from Raghu. Please add a helper there for
> reading module parameters in a precursor patch.
>
OK. Alternatively, if the timing works better, I can wait for 7.2 and
then submit this selftest separately?
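
For reference, a minimal standalone sketch of what such a module-parameter
reader could look like. This does not use the sysfs library mentioned above,
whose interface is not shown in this thread; read_bool_param() is a
hypothetical name, and the only fact relied on is that a bool module parameter
reads back as 'Y' or 'N'.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: return true if the file at @path starts with 'Y'.
 * A missing or unreadable file is treated as "not enabled".
 */
static bool read_bool_param(const char *path)
{
	char buf[8] = { 0 };
	ssize_t n;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return false;

	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);

	return n > 0 && buf[0] == 'Y';
}
```

A caller would pass the full parameter path, for example
/sys/module/vfio/parameters/enable_unsafe_noiommu_mode as in the quoted patch.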
> > + if (fd < 0)
> > + return false;
> > +
> > + n = read(fd, buf, sizeof(buf) - 1);
> > + close(fd);
> > +
> > + return n > 0 && buf[0] == 'Y';
> > +}
> > +
> > #endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */
> > diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
> > b/tools/testing/selftests/vfio/lib/vfio_pci_device.c index
> > fc75e04ef010..1a91658e812d 100644 ---
> > a/tools/testing/selftests/vfio/lib/vfio_pci_device.c +++
> > b/tools/testing/selftests/vfio/lib/vfio_pci_device.c @@ -308,8
> > +308,9 @@ const char *vfio_pci_get_cdev_path(const char *bdf)
> > VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n",
> > dir_path); while ((entry = readdir(dir)) != NULL) {
> > - /* Find the file that starts with "vfio" */
> > - if (strncmp("vfio", entry->d_name, 4))
> > + /* Find the file that starts with "vfio" or
> > "noiommu-vfio" */
> > + if (strncmp("vfio", entry->d_name, 4) &&
> > + strncmp("noiommu-vfio", entry->d_name, 12))
> > continue;
> >
> > snprintf(cdev_path, PATH_MAX,
> > "/dev/vfio/devices/%s", entry->d_name); diff --git
> > a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
> > b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c new file
> > mode 100644 index 000000000000..d91b505fc60d --- /dev/null
> > +++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
> > @@ -0,0 +1,664 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * VFIO iommufd NoIOMMU Mode Selftest
> > + *
> > + * Tests VFIO device operations with iommufd in noiommu mode,
> > including:
> > + * - Device binding to iommufd
> > + * - IOAS (I/O Address Space) allocation and management
> > + * - Device attach/detach to IOAS
> > + * - Memory mapping in IOAS
> > + * - Device info queries and reset
> > + */
> > +
> > +#include <linux/limits.h>
> > +#include <linux/vfio.h>
> > +#include <linux/iommufd.h>
> > +
> > +#include <stdint.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <dirent.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/mman.h>
> > +#include <unistd.h>
> > +#include <errno.h>
> > +
> > +#include <libvfio.h>
> > +#include "kselftest_harness.h"
> > +
> > +static const char iommu_dev_path[] = "/dev/iommu";
>
> I don't see why this needs to be global variables.
>
> > +static const char *cdev_path;
> > +
> > +static char *vfio_noiommu_get_device_id(const char *bdf)
> > +{
> > + char *path = NULL;
> > + char *vfio_id = NULL;
> > + struct dirent *dentry;
> > + DIR *dp;
> > +
> > + if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev",
> > bdf) < 0)
> > + return NULL;
> > +
> > + dp = opendir(path);
> > + if (!dp) {
> > + free(path);
> > + return NULL;
> > + }
> > +
> > + while ((dentry = readdir(dp)) != NULL) {
> > + if (strncmp("noiommu-vfio", dentry->d_name, 12) ==
> > 0) {
> > + vfio_id = strdup(dentry->d_name);
> > + break;
> > + }
> > + }
> > +
> > + closedir(dp);
> > + free(path);
> > + return vfio_id;
> > +}
> > +
> > +static char *vfio_noiommu_get_cdev_path(const char *bdf)
> > +{
> > + char *vfio_id = vfio_noiommu_get_device_id(bdf);
> > + char *cdev = NULL;
> > +
> > + if (vfio_id) {
> > + asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id);
> > + free(vfio_id);
> > + }
> > + return cdev;
> > +}
>
> Can we put this in the library and find a way to share code with
> vfio_pci_get_cdev_path()?
>
I will delete this and merge it into the library: vfio_pci_get_cdev_path()
will scan /sys/bus/pci/devices/<bdf>/vfio-dev/, accept both vfio* and
noiommu-vfio* entries, and return /dev/vfio/devices/<name>.
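
Roughly, the name check could be factored out as below. This is only a sketch
of the matching step, not the final library code: vfio_cdev_name_matches() is
a hypothetical name, and the directory scan and path formatting stay as they
are in vfio_pci_get_cdev_path().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if a vfio-dev directory entry name starts with
 * one of the prefixes used for VFIO cdev nodes.
 */
static bool vfio_cdev_name_matches(const char *name)
{
	static const char * const prefixes[] = { "vfio", "noiommu-vfio" };
	size_t i;

	assert(name != NULL);

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		if (!strncmp(prefixes[i], name, strlen(prefixes[i])))
			return true;
	}

	return false;
}
```

Keeping the prefixes in one table means the readdir() loop only needs a single
call, and entries such as "." and ".." are skipped because they match neither
prefix.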
> > +
> > +static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd)
> > +{
> > + struct vfio_device_bind_iommufd bind_args = {
> > + .argsz = sizeof(bind_args),
> > + .iommufd = iommufd,
> > + };
> > +
> > + return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD,
> > &bind_args); +}
>
> Please add the ioctl wrappers to the library so they can be used by
> other tests or library code in the future.
>
> VFIO device ioctls can go in vfio_pci_device.c and iommufd ioctls can
> go in iommu.c.
>
ok, will do.
> > +
> > +static int vfio_device_get_info_ioctl(int cdev_fd,
> > + struct vfio_device_info
> > *info) +{
> > + info->argsz = sizeof(*info);
> > + return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info);
> > +}
> > +
> > +static int vfio_device_ioas_alloc_ioctl(int iommufd,
> > + struct iommu_ioas_alloc
> > *alloc_args) +{
> > + alloc_args->size = sizeof(*alloc_args);
> > + alloc_args->flags = 0;
> > + return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args);
> > +}
> > +
> > +static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32
> > pt_id) +{
> > + struct vfio_device_attach_iommufd_pt attach_args = {
> > + .argsz = sizeof(attach_args),
> > + .pt_id = pt_id,
> > + };
> > +
> > + return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT,
> > &attach_args); +}
> > +
> > +static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd)
> > +{
> > + struct vfio_device_detach_iommufd_pt detach_args = {
> > + .argsz = sizeof(detach_args),
> > + };
> > +
> > + return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT,
> > &detach_args); +}
> > +
> > +static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t
> > index,
> > + struct
> > vfio_region_info *info) +{
> > + info->argsz = sizeof(*info);
> > + info->index = index;
> > + return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info);
> > +}
> > +
> > +static int vfio_device_reset_ioctl(int cdev_fd)
> > +{
> > + return ioctl(cdev_fd, VFIO_DEVICE_RESET);
> > +}
> > +
> > +static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t
> > iova,
> > + size_t length, bool hugepages)
> > +{
> > + struct iommu_ioas_map map_args = {
> > + .size = sizeof(map_args),
> > + .ioas_id = ioas_id,
> > + .iova = iova,
> > + .length = length,
> > + .flags = IOMMU_IOAS_MAP_READABLE |
> > IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_FIXED_IOVA,
> > + };
> > + void *pages;
> > + int ret;
> > +
> > + /* Allocate test pages */
> > + if (hugepages)
> > + pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
> > + MAP_PRIVATE | MAP_ANONYMOUS |
> > MAP_HUGETLB, -1, 0);
> > + else
> > + pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
> > + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > + if (pages == MAP_FAILED) {
> > + printf("mmap failed for length 0x%lx\n", (unsigned
> > long)length);
> > + return -ENOMEM;
> > + }
> > +
> > + /* Set up page pointer for mapping */
> > + map_args.user_va = (uintptr_t)pages;
> > +
> > + printf(" ioas_map_pages: ioas_id=%u, iova=0x%lx,
> > length=0x%lx, user_va=%p\n",
> > + ioas_id, (unsigned long)iova, (unsigned
> > long)length, pages); +
> > + /* Map into IOAS */
> > + ret = ioctl(iommufd, IOMMU_IOAS_MAP, &map_args);
> > + if (ret != 0)
> > + printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret,
> > strerror(errno));
> > + else
> > + printf(" IOMMU_IOAS_MAP succeeded, IOVA=0x%lx\n",
> > (unsigned long)map_args.iova); +
> > + munmap(pages, length);
> > + return ret;
> > +}
> > +
> > +static int ioas_unmap_pages(int iommufd, uint32_t ioas_id,
> > uint64_t iova,
> > + size_t length)
> > +{
> > + struct iommu_ioas_unmap unmap_args = {
> > + .size = sizeof(unmap_args),
> > + .ioas_id = ioas_id,
> > + .iova = iova,
> > + .length = length,
> > + };
> > +
> > + return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args);
> > +}
> > +
> > +static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id)
> > +{
> > + struct iommu_destroy destroy_args = {
> > + .size = sizeof(destroy_args),
> > + .id = ioas_id,
> > + };
> > +
> > + return ioctl(iommufd, IOMMU_DESTROY, &destroy_args);
> > +}
> > +
> > +static int ioas_noiommu_get_pa_ioctl_len(int iommufd, uint32_t
> > ioas_id,
> > + uint64_t iova, uint64_t max_length,
> > + uint64_t *phys_out, uint64_t
> > *length_out) +{
> > + struct iommu_ioas_noiommu_get_pa get_pa = {
> > + .size = sizeof(get_pa),
> > + .flags = 0,
> > + .ioas_id = ioas_id,
> > + .iova = iova,
> > + .length = max_length,
> > + };
> > +
> > + printf(" ioas_noiommu_get_pa_ioctl: ioas_id=%u,
> > iova=0x%lx, max_length=0x%lx\n",
> > + ioas_id, (unsigned long)iova, (unsigned
> > long)max_length); +
> > + if (ioctl(iommufd, IOMMU_IOAS_NOIOMMU_GET_PA, &get_pa) !=
> > 0) {
> > + printf(" IOMMU_IOAS_NOIOMMU_GET_PA failed: %s
> > (errno=%d)\n",
> > + strerror(errno), errno);
> > + return -1;
> > + }
> > +
> > + printf(" IOMMU_IOAS_NOIOMMU_GET_PA succeeded: PA=0x%lx,
> > length=0x%lx\n",
> > + (unsigned long)get_pa.out_phys, (unsigned
> > long)get_pa.length); +
> > + if (phys_out)
> > + *phys_out = get_pa.out_phys;
> > + if (length_out)
> > + *length_out = get_pa.length;
> > +
> > + return 0;
> > +}
> > +
> > +static int ioas_noiommu_get_pa_ioctl(int iommufd, uint32_t
> > ioas_id, uint64_t iova,
> > + uint64_t *phys_out, uint64_t
> > *length_out) +{
> > + return ioas_noiommu_get_pa_ioctl_len(iommufd, ioas_id,
> > iova, 0,
> > + phys_out, length_out);
> > +}
> > +
> > +FIXTURE(vfio_noiommu) {
> > + int cdev_fd;
> > + int iommufd;
> > +};
> > +
> > +FIXTURE_SETUP(vfio_noiommu)
> > +{
> > + ASSERT_LE(0, (self->cdev_fd = open(cdev_path, O_RDWR, 0)));
> > + ASSERT_LE(0, (self->iommufd = open(iommu_dev_path, O_RDWR,
> > 0))); +}
> > +
> > +FIXTURE_TEARDOWN(vfio_noiommu)
> > +{
> > + if (self->cdev_fd >= 0)
> > + close(self->cdev_fd);
> > + if (self->iommufd >= 0)
> > + close(self->iommufd);
> > +}
> > +
> > +/*
> > + * Test: Device cdev can be opened
> > + */
> > +TEST_F(vfio_noiommu, device_cdev_open)
> > +{
> > + ASSERT_LE(0, self->cdev_fd);
> > +}
>
> This is already tested by the FIXTURE_SETUP(). No need for a TEST_F().
>
will remove
> > +
> > +/*
> > + * Test: Device can be bound to iommufd
> > + */
> > +TEST_F(vfio_noiommu, device_bind_iommufd)
> > +{
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd)); +}
> > +
> > +/*
> > + * Test: Device info can be queried after binding
> > + */
> > +TEST_F(vfio_noiommu, device_get_info_after_bind)
> > +{
> > + struct vfio_device_info info;
> > +
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd));
> > + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd,
> > &info));
> > + ASSERT_NE(0, info.argsz);
> > +}
> > +
> > +/*
> > + * Test: Getting device info fails without bind
> > + */
> > +TEST_F(vfio_noiommu, device_get_info_without_bind_fails)
> > +{
> > + struct vfio_device_info info;
> > +
> > + ASSERT_NE(0, vfio_device_get_info_ioctl(self->cdev_fd,
> > &info)); +}
> > +
> > +/*
> > + * Test: Binding with invalid iommufd fails
> > + */
> > +TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails)
> > +{
> > + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > -2)); +}
>
> Are all these tests really specific to noiommu?
>
No, I will remove this and the other duplicated tests.
> > +
> > +/*
> > + * Test: Cannot bind twice to same device
> > + */
> > +TEST_F(vfio_noiommu, device_repeated_bind_fails)
> > +{
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd));
> > + ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd)); +}
> > +
> > +/*
> > + * Test: IOAS can be allocated
> > + */
> > +TEST_F(vfio_noiommu, ioas_alloc)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > +
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > + ASSERT_NE(0, alloc_args.out_ioas_id);
> > +}
> > +
> > +/*
> > + * Test: IOAS can be destroyed
> > + */
> > +TEST_F(vfio_noiommu, ioas_destroy)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > +
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > + ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd,
> > + alloc_args.out_ioas_id));
> > +}
> > +
> > +/*
> > + * Test: Device can attach to IOAS after binding
> > + */
> > +TEST_F(vfio_noiommu, device_attach_to_ioas)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > +
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd));
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > + ASSERT_EQ(0,
> > vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> > +
> > alloc_args.out_ioas_id)); +}
> > +
> > +/*
> > + * Test: Attaching to invalid IOAS fails
> > + */
> > +TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails)
> > +{
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd));
> > + ASSERT_NE(0,
> > vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> > +
> > UINT32_MAX)); +}
> > +
> > +/*
> > + * Test: Device can detach from IOAS
> > + */
> > +TEST_F(vfio_noiommu, device_detach_from_ioas)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > +
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd));
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > + ASSERT_EQ(0,
> > vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> > +
> > alloc_args.out_ioas_id));
> > + ASSERT_EQ(0,
> > vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd)); +}
> > +
> > +/*
> > + * Test: Full lifecycle - bind, attach, detach, reset
> > + */
> > +TEST_F(vfio_noiommu, device_lifecycle)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > + struct vfio_device_info info;
> > +
> > + /* Bind device to iommufd */
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > +
> > self->iommufd)); +
> > + /* Allocate IOAS */
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > +
> > + /* Attach device to IOAS */
> > + ASSERT_EQ(0,
> > vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> > +
> > alloc_args.out_ioas_id)); +
> > + /* Query device info */
> > + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd,
> > &info)); +
> > + /* Detach device from IOAS */
> > + ASSERT_EQ(0,
> > vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd)); +
> > + /* Reset device */
> > + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
> > +}
> > +
> > +/*
> > + * Test: Get region info
> > + */
> > +TEST_F(vfio_noiommu, device_get_region_info)
> > +{
> > + struct vfio_device_info dev_info;
> > + struct vfio_region_info region_info;
> > +
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > + self->iommufd));
> > + ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info));
> > +
> > + /* Try to get first region info if device has regions */
> > + if (dev_info.num_regions > 0) {
> > + ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0,
> > + &region_info));
> > + ASSERT_NE(0, region_info.argsz);
> > + }
> > +}
> > +
> > +TEST_F(vfio_noiommu, device_reset)
> > +{
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > + self->iommufd));
> > + ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
> > +}
> > +
> > +TEST_F(vfio_noiommu, ioas_map_pages)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > + long page_size = sysconf(_SC_PAGESIZE);
> > + uint64_t iova = 0x10000;
> > + int i;
> > +
> > + ASSERT_GT(page_size, 0);
> > +
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > +
> > + printf("Page size: %ld bytes\n", page_size);
> > + /* Test mapping regions of different sizes: 1, 2, 4, 8 pages */
> > + for (i = 0; i < 4; i++) {
> > + size_t map_size = page_size * (1 << i); /* 1, 2, 4, 8 pages */
> > + uint64_t test_iova = iova + (i * 0x100000);
> > +
> > + /* Attempt to map each region (may fail if not supported) */
> > + ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> > + test_iova, map_size, false);
> > + }
> > +}
> > +
> > +TEST_F(vfio_noiommu, multiple_ioas_alloc)
> > +{
> > + struct iommu_ioas_alloc alloc1, alloc2;
> > +
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1));
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2));
> > + ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id);
> > +}
> > +
> > +/*
> > + * Test: Query physical address for IOVA
> > + * Tests IOMMU_IOAS_NOIOMMU_GET_PA ioctl to translate IOVA to physical address
> > + * Note: Device must be attached to IOAS for PA query to work
> > + */
> > +#define NR_PAGES 32
> > +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_mapped)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > + long page_size = sysconf(_SC_PAGESIZE);
> > + uint64_t iova = 0x200000;
> > + uint64_t phys = 0;
> > + uint64_t length = 0;
> > + int ret;
> > +
> > + ASSERT_GT(page_size, 0);
> > +
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > + self->iommufd));
> > +
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > +
> > + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> > + alloc_args.out_ioas_id));
> > +
> > + /*
> > + * Map a page into an arbitrary IOAS, used as a cookie for lookup.
> > + * Use hugepages to test contiguous PA. Make sure hugepages are
> > + * available. e.g. echo 64 > /proc/sys/vm/nr_hugepages
> > + */
> > + ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> > + iova, page_size * NR_PAGES, true);
> > + if (ret != 0)
> > + return;
> > +
> > + /* Query the physical address for the mapped dummy IOVA */
> > + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> > + iova, &phys, &length);
> > +
> > + if (ret == 0) {
> > + /* If we got a result, verify it's valid */
> > + ASSERT_NE(0, phys);
> > + ASSERT_GE((uint64_t)page_size * NR_PAGES, length);
> > + }
> > +
> > + /*
> > + * Query with a non-page-aligned IOVA. The returned length must
> > + * not exceed the actual contiguous range starting from that
> > + * offset, i.e. it must be reduced by the sub-page offset.
> > + */
> > + phys = 0;
> > + length = 0;
> > + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> > + iova + 0x80, &phys, &length);
> > + if (ret == 0) {
> > + ASSERT_NE(0, phys);
> > + /* Length must account for the sub-page offset */
> > + ASSERT_GE((uint64_t)page_size * NR_PAGES - 0x80, length);
> > + ASSERT_LE(length, (uint64_t)page_size * NR_PAGES - 0x80);
> > + /* Must not overshoot into the next page boundary */
> > + ASSERT_EQ(0, (phys + length) % page_size);
> > + }
> > +}
> > +
> > +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_unmapped_fails)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > +
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
> > + &alloc_args));
> > +
> > + /* Try to retrieve unmapped IOVA (should fail) */
> > + ASSERT_NE(0, ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> > + 0x10000, NULL, NULL));
> > +}
> > +
> > +/*
> > + * Test: length == 0 means no limit (backward compat default)
> > + */
> > +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_zero_no_limit)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > + long page_size = sysconf(_SC_PAGESIZE);
> > + uint64_t iova = 0x200000;
> > + uint64_t phys_nolimit = 0, phys_zero = 0;
> > + uint64_t len_nolimit = 0, len_zero = 0;
> > + int ret;
> > +
> > + ASSERT_GT(page_size, 0);
> > +
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > + self->iommufd));
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
> > + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> > + alloc_args.out_ioas_id));
> > +
> > + ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> > + iova, page_size * NR_PAGES, true);
> > + if (ret != 0)
> > + return;
> > +
> > + /* Query with length=0 (no limit, default behavior) */
> > + ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
> > + iova, 0, &phys_zero, &len_zero);
> > + if (ret != 0)
> > + return;
> > +
> > + /* Query with the wrapper (also passes 0) — must match */
> > + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> > + iova, &phys_nolimit, &len_nolimit);
> > + ASSERT_EQ(0, ret);
> > + ASSERT_EQ(phys_zero, phys_nolimit);
> > + ASSERT_EQ(len_zero, len_nolimit);
> > +}
> > +
> > +/*
> > + * Test: length caps the returned contiguous range
> > + */
> > +TEST_F(vfio_noiommu, ioas_noiommu_get_pa_length_capped)
> > +{
> > + struct iommu_ioas_alloc alloc_args;
> > + long page_size = sysconf(_SC_PAGESIZE);
> > + uint64_t iova = 0x200000;
> > + uint64_t phys = 0;
> > + uint64_t len_full = 0, len_capped = 0;
> > + uint64_t cap;
> > + int ret;
> > +
> > + ASSERT_GT(page_size, 0);
> > +
> > + ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
> > + self->iommufd));
> > + ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc_args));
> > + ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
> > + alloc_args.out_ioas_id));
> > +
> > + ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
> > + iova, page_size * NR_PAGES, true);
> > + if (ret != 0)
> > + return;
> > +
> > + /* First get the full uncapped length */
> > + ret = ioas_noiommu_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
> > + iova, &phys, &len_full);
> > + if (ret != 0)
> > + return;
> > +
> > + ASSERT_NE(0, phys);
> > + ASSERT_NE(0, len_full);
> > +
> > + /* Cap to a single page — returned length must not exceed
> > it */
> > + cap = page_size;
> > + ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
> > + iova, cap, &phys, &len_capped);
> > + ASSERT_EQ(0, ret);
> > + ASSERT_LE(len_capped, cap);
> > + ASSERT_NE(0, len_capped);
> > +
> > + /*
> > + * If full length was larger than one page, confirm capping works.
> > + * Otherwise the mapping wasn't contiguous enough to test.
> > + */
> > + if (len_full > cap)
> > + ASSERT_GT(len_full, len_capped);
> > +
> > + /* Cap to a very large value — should return the same as
> > uncapped */
> > + ret = ioas_noiommu_get_pa_ioctl_len(self->iommufd, alloc_args.out_ioas_id,
> > + iova, UINT64_MAX, &phys, &len_capped);
> > + ASSERT_EQ(0, ret);
> > + ASSERT_EQ(len_full, len_capped);
> > +}
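
For readers following the length checks in the get_pa tests above, the bound they assert can be sketched as a small helper. This is only an illustration, not code from the patch: max_contig_len() is a hypothetical name, and it assumes the semantics the tests themselves state, namely that a cap of 0 means "no limit" and that a query starting at a sub-page offset can report at most the bytes remaining in the mapping.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: upper bound on the length
 * a PA query may report when it starts `offset` bytes into a mapping of
 * `map_size` bytes. A `cap` of 0 is treated as "no limit", mirroring
 * the behaviour the selftests expect.
 */
static uint64_t max_contig_len(uint64_t map_size, uint64_t offset,
			       uint64_t cap)
{
	uint64_t remaining;

	/* A query at or past the end of the mapping has nothing to report. */
	if (offset >= map_size)
		return 0;

	remaining = map_size - offset;

	/* A non-zero cap smaller than the remainder limits the result. */
	if (cap != 0 && cap < remaining)
		return cap;

	return remaining;
}
```

With 4 KiB pages and NR_PAGES = 32, a query at iova + 0x80 is bounded by 32 * 4096 - 0x80 bytes, and a cap of UINT64_MAX gives the same bound as no cap at all. The real ioctl may return less if the backing pages are not physically contiguous.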
> > +
> > +int main(int argc, char *argv[])
> > +{
> > + const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);
> > + char *cdev = NULL;
> > +
> > + if (!device_bdf) {
> > + ksft_print_msg("No device BDF provided\n");
> > + return KSFT_SKIP;
> > + }
>
> vfio_selftests_get_bdf() already handles exiting with KSFT_SKIP if it
> can't find a BDF.
>
Will remove.
> > +
> > + cdev = vfio_noiommu_get_cdev_path(device_bdf);
> > + if (!cdev) {
> > + ksft_print_msg("Could not find cdev for device %s\n",
> > + device_bdf);
>
> nit: "Could not find noiommu cdev for ..."
>
Not needed, since the no-IOMMU-specific helper will be removed :)
> > + return KSFT_SKIP;
> > + }
> > +
> > + cdev_path = cdev;
> > + ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path,
> > + device_bdf);
> > +
> > + return test_harness_run(argc, argv);
> > +}
> > --
> > 2.43.0
> >
end of thread, other threads:[~2026-06-03 0:14 UTC | newest]
Thread overview: 25+ messages
2026-05-21 22:11 [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-05-21 22:11 ` [PATCH v6 1/7] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-05-21 22:11 ` [PATCH v6 2/7] iommufd: Move igroup allocation to a function Jacob Pan
2026-05-22 6:00 ` Baolu Lu
2026-05-21 22:11 ` [PATCH v6 3/7] iommufd: Allow binding to a noiommu device Jacob Pan
2026-05-22 6:01 ` Baolu Lu
2026-05-21 22:11 ` [PATCH v6 4/7] iommufd: Add an ioctl to query PA from IOVA for noiommu mode Jacob Pan
2026-05-22 9:22 ` Yi Liu
2026-05-21 22:11 ` [PATCH v6 5/7] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
2026-05-22 9:19 ` Yi Liu
2026-05-23 22:01 ` Jacob Pan
2026-05-25 6:29 ` Yi Liu
2026-05-28 18:52 ` Jacob Pan
2026-05-29 7:27 ` Yi Liu
2026-05-21 22:11 ` [PATCH v6 6/7] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
2026-05-21 22:39 ` David Matlack
2026-06-03 0:13 ` Jacob Pan
2026-05-21 22:11 ` [PATCH v6 7/7] Documentation: Update VFIO NOIOMMU mode Jacob Pan
2026-05-22 9:42 ` Yi Liu
2026-05-23 3:42 ` Jacob Pan
2026-05-25 6:29 ` Yi Liu
2026-05-25 8:30 ` [PATCH v6 0/7] iommufd: Enable noiommu mode for cdev Tian, Kevin
2026-05-26 15:32 ` Jacob Pan
2026-05-26 17:57 ` Alex Williamson
2026-05-27 22:34 ` Jacob Pan