* [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev
@ 2026-03-12 15:56 Jacob Pan
2026-03-12 15:56 ` [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
` (10 more replies)
0 siblings, 11 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
VFIO's unsafe_noiommu_mode has long provided a way for userspace drivers
to operate on platforms lacking a hardware IOMMU. Today, IOMMUFD also
supports No-IOMMU mode for group-based devices under vfio_compat mode.
However, IOMMUFD's native character device (cdev) interface does not yet
support No-IOMMU mode; adding that support is the purpose of this series.
In summary, we have:
|-------------------------+------+---------------|
| Device access mode | VFIO | IOMMUFD |
|-------------------------+------+---------------|
| group /dev/vfio/$GROUP | Yes | Yes |
|-------------------------+------+---------------|
| cdev /dev/vfio/devices/ | No | This patch |
|-------------------------+------+---------------|
Beyond enabling cdev for IOMMUFD, this series also addresses the following
deficiencies in the current No-IOMMU mode, as suggested by Jason [1]:
- Devices operating under No-IOMMU mode are limited to device-level UAPI
access, without container or IOAS-level capabilities. Consequently,
user-space drivers lack structured mechanisms for page pinning and often
resort to mlock(), which is less robust than pin_user_pages() used for
devices backed by a physical IOMMU. For example, mlock() does not prevent
page migration.
- There is no architectural mechanism for obtaining physical addresses for
DMA. As a workaround, user-space drivers frequently rely on /proc/pagemap
tricks or hardcoded values.
By allowing noiommu device access to IOMMUFD IOAS and HWPT objects, this
series brings No-IOMMU mode closer to full citizenship within the IOMMU
subsystem. In addition to addressing the two deficiencies mentioned above,
the expectation is that it will also enable No-IOMMU devices to seamlessly
participate in live update sessions via KHO [2].
Furthermore, these devices will use the IOMMUFD-based ownership checking model for
VFIO_DEVICE_PCI_HOT_RESET, eliminating the need for an iommufd_access object
as required in a previous attempt [3].
ChangeLog:
V2:
- Fix build dependency by adding IOMMU_SUPPORT in [8/11]
- Add an optimization that scans beyond the first page for a contiguous
physical address range and returns its length instead of a single page. [4/11]
Since RFC[4]:
- Abandoned the dummy iommu driver approach; patches 1-3 absorb the
changes into iommufd.
[1] https://lore.kernel.org/linux-iommu/20250603175403.GA407344@nvidia.com/
[2] https://lore.kernel.org/linux-pci/20251027134430.00007e46@linux.microsoft.com/
[3] https://lore.kernel.org/kvm/20230522115751.326947-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/linux-iommu/20251201173012.18371-1-jacob.pan@linux.microsoft.com/
Thanks,
Jacob
Jacob Pan (8):
iommufd: Add an ioctl IOMMU_IOAS_GET_PA to query PA from IOVA
vfio: Allow null group for noiommu without containers
vfio: Introduce and set noiommu flag on vfio_device
vfio: Update noiommu device detection logic for cdev
vfio: Enable cdev noiommu mode under iommufd
vfio:selftest: Handle VFIO noiommu cdev
selftests/vfio: Add iommufd noiommu mode selftest for cdev
Doc: Update VFIO NOIOMMU mode
Jason Gunthorpe (3):
iommufd: Support a HWPT without an iommu driver for noiommu
iommufd: Move igroup allocation to a function
iommufd: Allow binding to a noiommu device
Documentation/driver-api/vfio.rst | 44 +-
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 161 +++--
drivers/iommu/iommufd/hw_pagetable.c | 11 +-
drivers/iommu/iommufd/hwpt_noiommu.c | 91 +++
drivers/iommu/iommufd/io_pagetable.c | 60 ++
drivers/iommu/iommufd/ioas.c | 28 +
drivers/iommu/iommufd/iommufd_private.h | 5 +
drivers/iommu/iommufd/main.c | 3 +
drivers/vfio/Kconfig | 7 +-
drivers/vfio/group.c | 35 +-
drivers/vfio/iommufd.c | 7 -
drivers/vfio/vfio.h | 34 +-
drivers/vfio/vfio_main.c | 22 +-
include/linux/vfio.h | 10 +
include/uapi/linux/iommufd.h | 25 +
tools/testing/selftests/vfio/Makefile | 1 +
.../selftests/vfio/lib/vfio_pci_device.c | 25 +-
.../vfio/vfio_iommufd_noiommu_test.c | 549 ++++++++++++++++++
19 files changed, 1027 insertions(+), 92 deletions(-)
create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
--
2.34.1
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-18 18:38 ` Samiullah Khawaja
2026-03-22 9:24 ` Mostafa Saleh
2026-03-12 15:56 ` [PATCH V2 02/11] iommufd: Move igroup allocation to a function Jacob Pan
` (9 subsequent siblings)
10 siblings, 2 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
From: Jason Gunthorpe <jgg@nvidia.com>
Create just a small part of a real iommu driver, enough to slot in
under dev_iommu_ops(), allow iommufd to call
domain_alloc_paging_flags(), and fail everything else.
This allows explicitly creating a HWPT under an IOAS.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/hw_pagetable.c | 11 ++-
drivers/iommu/iommufd/hwpt_noiommu.c | 91 +++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 2 +
4 files changed, 103 insertions(+), 2 deletions(-)
create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 71d692c9a8f4..2b1a020b14a6 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -10,6 +10,7 @@ iommufd-y := \
vfio_compat.o \
viommu.o
+iommufd-$(CONFIG_VFIO_NOIOMMU) += hwpt_noiommu.o
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c9..37316d77277d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,13 @@
#include "../iommu-priv.h"
#include "iommufd_private.h"
+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
+{
+ if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && !idev->igroup->group)
+ return &iommufd_noiommu_ops;
+ return dev_iommu_ops(idev->dev);
+}
+
static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->domain)
@@ -114,7 +121,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
IOMMU_HWPT_FAULT_ID_VALID |
IOMMU_HWPT_ALLOC_PASID;
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_paging *hwpt_paging;
struct iommufd_hw_pagetable *hwpt;
int rc;
@@ -229,7 +236,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
struct iommufd_device *idev, u32 flags,
const struct iommu_user_data *user_data)
{
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+ const struct iommu_ops *ops = get_iommu_ops(idev);
struct iommufd_hwpt_nested *hwpt_nested;
struct iommufd_hw_pagetable *hwpt;
int rc;
diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
new file mode 100644
index 000000000000..0aa99f581ca3
--- /dev/null
+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/iommu.h>
+#include <linux/generic_pt/iommu.h>
+#include "iommufd_private.h"
+
+static const struct iommu_domain_ops noiommu_amdv1_ops;
+
+struct noiommu_domain {
+ union {
+ struct iommu_domain domain;
+ struct pt_iommu_amdv1 amdv1;
+ };
+ spinlock_t lock;
+};
+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
+
+static void noiommu_change_top(struct pt_iommu *iommu_table,
+ phys_addr_t top_paddr, unsigned int top_level)
+{
+}
+
+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
+{
+ struct noiommu_domain *domain =
+ container_of(iommupt, struct noiommu_domain, amdv1.iommu);
+
+ return &domain->lock;
+}
+
+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
+ .get_top_lock = noiommu_get_top_lock,
+ .change_top = noiommu_change_top,
+};
+
+static struct iommu_domain *
+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ struct pt_iommu_amdv1_cfg cfg = {};
+ struct noiommu_domain *dom;
+ int rc;
+
+ if (flags || user_data)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ cfg.common.hw_max_vasz_lg2 = 64;
+ cfg.common.hw_max_oasz_lg2 = 52;
+ cfg.starting_level = 2;
+ cfg.common.features =
+ (BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
+ BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
+
+ dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+ if (!dom)
+ return ERR_PTR(-ENOMEM);
+
+ spin_lock_init(&dom->lock);
+ dom->amdv1.iommu.nid = NUMA_NO_NODE;
+ dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
+ dom->domain.ops = &noiommu_amdv1_ops;
+
+ /* Use mock page table which is based on AMDV1 */
+ rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
+ if (rc) {
+ kfree(dom);
+ return ERR_PTR(rc);
+ }
+
+ return &dom->domain;
+}
+
+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
+{
+ struct noiommu_domain *domain =
+ container_of(iommu_domain, struct noiommu_domain, domain);
+
+ pt_iommu_deinit(&domain->amdv1.iommu);
+ kfree(domain);
+}
+
+static const struct iommu_domain_ops noiommu_amdv1_ops = {
+ IOMMU_PT_DOMAIN_OPS(amdv1),
+ .free = noiommu_domain_free,
+};
+
+struct iommu_ops iommufd_noiommu_ops = {
+ .domain_alloc_paging_flags = noiommu_alloc_paging_flags,
+};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9..9c18c5eb1899 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
refcount_dec(&hwpt->obj.users);
}
+extern struct iommu_ops iommufd_noiommu_ops;
+
struct iommufd_attach;
struct iommufd_group {
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V2 02/11] iommufd: Move igroup allocation to a function
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-03-12 15:56 ` [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-18 18:39 ` Samiullah Khawaja
` (2 more replies)
2026-03-12 15:56 ` [PATCH V2 03/11] iommufd: Allow binding to a noiommu device Jacob Pan
` (8 subsequent siblings)
10 siblings, 3 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
From: Jason Gunthorpe <jgg@nvidia.com>
So it can be reused in the next patch, which allows binding to a noiommu
device.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/iommu/iommufd/device.c | 48 +++++++++++++++++++++-------------
1 file changed, 30 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 344d620cdecc..54d73016468f 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -30,8 +30,9 @@ static void iommufd_group_release(struct kref *kref)
WARN_ON(!xa_empty(&igroup->pasid_attach));
- xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
- NULL, GFP_KERNEL);
+ if (igroup->group)
+ xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
+ igroup, NULL, GFP_KERNEL);
iommu_group_put(igroup->group);
mutex_destroy(&igroup->lock);
kfree(igroup);
@@ -56,6 +57,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
return kref_get_unless_zero(&igroup->ref);
}
+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
+ struct iommu_group *group)
+{
+ struct iommufd_group *new_igroup;
+
+ new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+ if (!new_igroup)
+ return ERR_PTR(-ENOMEM);
+
+ kref_init(&new_igroup->ref);
+ mutex_init(&new_igroup->lock);
+ xa_init(&new_igroup->pasid_attach);
+ new_igroup->sw_msi_start = PHYS_ADDR_MAX;
+ /* group reference moves into new_igroup */
+ new_igroup->group = group;
+
+ /*
+ * The ictx is not additionally refcounted here because all objects using
+ * an igroup must put it before their destroy completes.
+ */
+ new_igroup->ictx = ictx;
+ return new_igroup;
+}
+
/*
* iommufd needs to store some more data for each iommu_group, we keep a
* parallel xarray indexed by iommu_group id to hold this instead of putting it
@@ -87,25 +112,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
}
xa_unlock(&ictx->groups);
- new_igroup = kzalloc_obj(*new_igroup);
- if (!new_igroup) {
+ new_igroup = iommufd_alloc_group(ictx, group);
+ if (IS_ERR(new_igroup)) {
iommu_group_put(group);
- return ERR_PTR(-ENOMEM);
+ return new_igroup;
}
- kref_init(&new_igroup->ref);
- mutex_init(&new_igroup->lock);
- xa_init(&new_igroup->pasid_attach);
- new_igroup->sw_msi_start = PHYS_ADDR_MAX;
- /* group reference moves into new_igroup */
- new_igroup->group = group;
-
- /*
- * The ictx is not additionally refcounted here becase all objects using
- * an igroup must put it before their destroy completes.
- */
- new_igroup->ictx = ictx;
-
/*
* We dropped the lock so igroup is invalid. NULL is a safe and likely
* value to assume for the xa_cmpxchg algorithm.
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V2 03/11] iommufd: Allow binding to a noiommu device
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-03-12 15:56 ` [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-03-12 15:56 ` [PATCH V2 02/11] iommufd: Move igroup allocation to a function Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-22 9:54 ` Mostafa Saleh
2026-03-12 15:56 ` [PATCH V2 04/11] iommufd: Add an ioctl IOMMU_IOAS_GET_PA to query PA from IOVA Jacob Pan
` (7 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
From: Jason Gunthorpe <jgg@nvidia.com>
Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
a dummy IOMMU group for such devices and skipping hwpt operations. This
enables noiommu devices to operate through the same iommufd API as
IOMMU-capable devices.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/iommu/iommufd/device.c | 113 ++++++++++++++++++++++-----------
1 file changed, 76 insertions(+), 37 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 54d73016468f..c38d3efa3d6f 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,6 +23,11 @@ struct iommufd_attach {
struct xarray device_array;
};
+static bool is_vfio_noiommu(struct iommufd_device *idev)
+{
+ return !device_iommu_mapped(idev->dev) || !idev->dev->iommu;
+}
+
static void iommufd_group_release(struct kref *kref)
{
struct iommufd_group *igroup =
@@ -205,32 +210,17 @@ void iommufd_device_destroy(struct iommufd_object *obj)
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);
- iommu_device_release_dma_owner(idev->dev);
+ if (!is_vfio_noiommu(idev))
+ iommu_device_release_dma_owner(idev->dev);
iommufd_put_group(idev->igroup);
if (!iommufd_selftest_is_mock_dev(idev->dev))
iommufd_ctx_put(idev->ictx);
}
-/**
- * iommufd_device_bind - Bind a physical device to an iommu fd
- * @ictx: iommufd file descriptor
- * @dev: Pointer to a physical device struct
- * @id: Output ID number to return to userspace for this device
- *
- * A successful bind establishes an ownership over the device and returns
- * struct iommufd_device pointer, otherwise returns error pointer.
- *
- * A driver using this API must set driver_managed_dma and must not touch
- * the device until this routine succeeds and establishes ownership.
- *
- * Binding a PCI device places the entire RID under iommufd control.
- *
- * The caller must undo this with iommufd_device_unbind()
- */
-struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
- struct device *dev, u32 *id)
+static int iommufd_bind_iommu(struct iommufd_device *idev)
{
- struct iommufd_device *idev;
+ struct iommufd_ctx *ictx = idev->ictx;
+ struct device *dev = idev->dev;
struct iommufd_group *igroup;
int rc;
@@ -239,11 +229,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
* to restore cache coherency.
*/
if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
- igroup = iommufd_get_group(ictx, dev);
+ igroup = iommufd_get_group(idev->ictx, dev);
if (IS_ERR(igroup))
- return ERR_CAST(igroup);
+ return PTR_ERR(igroup);
/*
* For historical compat with VFIO the insecure interrupt path is
@@ -269,21 +259,66 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
if (rc)
goto out_group_put;
+ /* igroup refcount moves into iommufd_device */
+ idev->igroup = igroup;
+ return 0;
+
+out_group_put:
+ iommufd_put_group(igroup);
+ return rc;
+}
+
+/**
+ * iommufd_device_bind - Bind a physical device to an iommu fd
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ * @id: Output ID number to return to userspace for this device
+ *
+ * A successful bind establishes an ownership over the device and returns
+ * struct iommufd_device pointer, otherwise returns error pointer.
+ *
+ * A driver using this API must set driver_managed_dma and must not touch
+ * the device until this routine succeeds and establishes ownership.
+ *
+ * Binding a PCI device places the entire RID under iommufd control.
+ *
+ * The caller must undo this with iommufd_device_unbind()
+ */
+struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
+ struct device *dev, u32 *id)
+{
+ struct iommufd_device *idev;
+ int rc;
+
idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
- if (IS_ERR(idev)) {
- rc = PTR_ERR(idev);
- goto out_release_owner;
- }
+ if (IS_ERR(idev))
+ return idev;
idev->ictx = ictx;
- if (!iommufd_selftest_is_mock_dev(dev))
- iommufd_ctx_get(ictx);
idev->dev = dev;
idev->enforce_cache_coherency =
device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
+ if (!is_vfio_noiommu(idev)) {
+ rc = iommufd_bind_iommu(idev);
+ if (rc)
+ return ERR_PTR(rc);
+ } else {
+ struct iommufd_group *igroup;
+
+ /*
+ * Create a dummy igroup; lots of stuff expects this igroup to be
+ * present, but a NULL igroup->group is OK
+ */
+ igroup = iommufd_alloc_group(ictx, NULL);
+ if (IS_ERR(igroup))
+ return ERR_CAST(igroup);
+ idev->igroup = igroup;
+ }
+
+ if (!iommufd_selftest_is_mock_dev(dev))
+ iommufd_ctx_get(ictx);
/* The calling driver is a user until iommufd_device_unbind() */
refcount_inc(&idev->obj.users);
- /* igroup refcount moves into iommufd_device */
- idev->igroup = igroup;
/*
* If the caller fails after this success it must call
@@ -295,11 +330,6 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
*id = idev->obj.id;
return idev;
-out_release_owner:
- iommu_device_release_dma_owner(dev);
-out_group_put:
- iommufd_put_group(igroup);
- return ERR_PTR(rc);
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
@@ -513,6 +543,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
struct iommufd_attach_handle *handle;
int rc;
+ if (is_vfio_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -560,6 +593,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
{
struct iommufd_attach_handle *handle;
+ if (is_vfio_noiommu(idev))
+ return;
+
handle = iommufd_device_get_attach_handle(idev, pasid);
if (pasid == IOMMU_NO_PASID)
iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
@@ -578,6 +614,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
struct iommufd_attach_handle *handle, *old_handle;
int rc;
+ if (is_vfio_noiommu(idev))
+ return 0;
+
if (!iommufd_hwpt_compatible_device(hwpt, idev))
return -EINVAL;
@@ -653,7 +692,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
goto err_release_devid;
}
- if (attach_resv) {
+ if (attach_resv && !is_vfio_noiommu(idev)) {
rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
if (rc)
goto err_release_devid;
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V2 04/11] iommufd: Add an ioctl IOMMU_IOAS_GET_PA to query PA from IOVA
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (2 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 03/11] iommufd: Allow binding to a noiommu device Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-12 15:56 ` [PATCH V2 05/11] vfio: Allow null group for noiommu without containers Jacob Pan
` (6 subsequent siblings)
10 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
To support no-IOMMU mode, where userspace drivers perform unsafe DMA
using physical addresses, introduce a new API to retrieve the physical
address of a user-allocated DMA buffer that has been mapped to an IOVA
via an IOAS. The mapping is backed by mock I/O page tables maintained by
the generic IOMMUPT framework.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
v2:
- Scan the contiguous physical-address span beyond the first page and return its length.
---
drivers/iommu/iommufd/io_pagetable.c | 60 +++++++++++++++++++++++++
drivers/iommu/iommufd/ioas.c | 28 ++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 3 ++
drivers/iommu/iommufd/main.c | 3 ++
include/uapi/linux/iommufd.h | 25 +++++++++++
5 files changed, 119 insertions(+)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index ee003bb2f647..5372fac6077b 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -849,6 +849,66 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
return iopt_unmap_iova_range(iopt, iova, iova_last, unmapped);
}
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length)
+{
+ struct iopt_area *area;
+ u64 tmp_length = 0;
+ u64 tmp_paddr = 0;
+ int rc = 0;
+
+ if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+ return -EOPNOTSUPP;
+
+ down_read(&iopt->iova_rwsem);
+ area = iopt_area_iter_first(iopt, iova, iova);
+ if (!area || !area->pages) {
+ rc = -ENOENT;
+ goto unlock_exit;
+ }
+
+ if (!area->storage_domain ||
+ area->storage_domain->owner != &iommufd_noiommu_ops) {
+ rc = -EOPNOTSUPP;
+ goto unlock_exit;
+ }
+
+ *paddr = iommu_iova_to_phys(area->storage_domain, iova);
+ if (!*paddr) {
+ rc = -EINVAL;
+ goto unlock_exit;
+ }
+
+ tmp_length = PAGE_SIZE;
+ tmp_paddr = *paddr;
+ /*
+ * Scan the domain for the contiguous physical address length so that
+ * userspace search can be optimized for fewer ioctls.
+ */
+ while (iova < iopt_area_last_iova(area)) {
+ unsigned long next_iova;
+ u64 next_paddr;
+
+ if (check_add_overflow(iova, PAGE_SIZE, &next_iova))
+ break;
+
+ next_paddr = iommu_iova_to_phys(area->storage_domain, next_iova);
+
+ if (!next_paddr || next_paddr != tmp_paddr + PAGE_SIZE)
+ break;
+
+ iova = next_iova;
+ tmp_paddr += PAGE_SIZE;
+ tmp_length += PAGE_SIZE;
+ }
+ *length = tmp_length;
+
+unlock_exit:
+ up_read(&iopt->iova_rwsem);
+
+ return rc;
+}
+
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped)
{
/* If the IOVAs are empty then unmap all succeeds */
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index fed06c2b728e..a724f380a465 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -375,6 +375,34 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
return rc;
}
+int iommufd_ioas_get_pa(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_ioas_get_pa *cmd = ucmd->cmd;
+ struct iommufd_ioas *ioas;
+ int rc;
+
+ if (cmd->flags || cmd->__reserved)
+ return -EOPNOTSUPP;
+
+ if (!cmd->iova || cmd->iova >= ULONG_MAX)
+ return -EINVAL;
+
+ ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+ if (IS_ERR(ioas))
+ return PTR_ERR(ioas);
+
+ rc = iopt_get_phys(&ioas->iopt, cmd->iova, &cmd->out_phys,
+ &cmd->out_length);
+ if (rc)
+ goto out_put;
+
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+ iommufd_put_object(ucmd->ictx, &ioas->obj);
+
+ return rc;
+}
+
static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
struct xarray *ioas_list)
{
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 9c18c5eb1899..3302c6a1f99e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -118,6 +118,8 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
unsigned long length, unsigned long *unmapped);
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+int iopt_get_phys(struct io_pagetable *iopt, unsigned long iova, u64 *paddr,
+ u64 *length);
int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
struct iommu_domain *domain,
@@ -346,6 +348,7 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
+int iommufd_ioas_get_pa(struct iommufd_ucmd *ucmd);
int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8c6d43601afb..ebae01ed947d 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -432,6 +432,7 @@ union ucmd_buffer {
struct iommu_veventq_alloc veventq;
struct iommu_vfio_ioas vfio_ioas;
struct iommu_viommu_alloc viommu;
+ struct iommu_ioas_get_pa get_pa;
#ifdef CONFIG_IOMMUFD_TEST
struct iommu_test_cmd test;
#endif
@@ -484,6 +485,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
struct iommu_ioas_map_file, iova),
IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
length),
+ IOCTL_OP(IOMMU_IOAS_GET_PA, iommufd_ioas_get_pa, struct iommu_ioas_get_pa,
+ out_phys),
IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
struct iommu_vdevice_alloc, virt_id),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dafbc552d37..9afe0a1b11a0 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -57,6 +57,7 @@ enum {
IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
+ IOMMUFD_CMD_IOAS_GET_PA = 0x95,
};
/**
@@ -219,6 +220,30 @@ struct iommu_ioas_map {
};
#define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
+/**
+ * struct iommu_ioas_get_pa - ioctl(IOMMU_IOAS_GET_PA)
+ * @size: sizeof(struct iommu_ioas_get_pa)
+ * @flags: Reserved, must be 0 for now
+ * @ioas_id: IOAS ID to query IOVA to PA mapping from
+ * @__reserved: Must be 0
+ * @iova: IOVA to query
+ * @out_length: Bytes of contiguous physical address space starting at @out_phys
+ * @out_phys: Output physical address the IOVA maps to
+ *
+ * Query the physical address backing an IOVA range. The entire range must be
+ * mapped already. For noiommu devices doing unsafe DMA only.
+ */
+struct iommu_ioas_get_pa {
+ __u32 size;
+ __u32 flags;
+ __u32 ioas_id;
+ __u32 __reserved;
+ __aligned_u64 iova;
+ __aligned_u64 out_length;
+ __aligned_u64 out_phys;
+};
+#define IOMMU_IOAS_GET_PA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_GET_PA)
+
/**
* struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
* @size: sizeof(struct iommu_ioas_map_file)
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V2 05/11] vfio: Allow null group for noiommu without containers
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (3 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 04/11] iommufd: Add an ioctl IOMMU_IOAS_GET_PA to query PA from IOVA Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-22 9:59 ` Mostafa Saleh
2026-03-12 15:56 ` [PATCH V2 06/11] vfio: Introduce and set noiommu flag on vfio_device Jacob Pan
` (5 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
When noiommu mode is enabled for the VFIO cdev interface, with neither a
VFIO container nor an IOMMUFD-provided compatibility container in use,
there is no need to create a dummy group. Update the group operations to
tolerate a null group pointer.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/vfio/group.c | 14 ++++++++++++++
drivers/vfio/vfio.h | 17 +++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 4f15016d2a5f..98f2a4f2ebff 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -381,6 +381,9 @@ int vfio_device_block_group(struct vfio_device *device)
struct vfio_group *group = device->group;
int ret = 0;
+ if (vfio_null_group_allowed() && !group)
+ return 0;
+
mutex_lock(&group->group_lock);
if (group->opened_file) {
ret = -EBUSY;
@@ -398,6 +401,9 @@ void vfio_device_unblock_group(struct vfio_device *device)
{
struct vfio_group *group = device->group;
+ if (vfio_null_group_allowed() && !group)
+ return;
+
mutex_lock(&group->group_lock);
group->cdev_device_open_cnt--;
mutex_unlock(&group->group_lock);
@@ -589,6 +595,14 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
struct vfio_group *group;
int ret;
+ /*
+ * With noiommu enabled under cdev interface only, there is no need to
+ * create a vfio_group if the group based containers are not enabled.
+ * The cdev interface is exclusively used for iommufd.
+ */
+ if (vfio_null_group_allowed())
+ return NULL;
+
iommu_group = iommu_group_alloc();
if (IS_ERR(iommu_group))
return ERR_CAST(iommu_group);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 50128da18bca..838c08077ce2 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -113,6 +113,18 @@ bool vfio_device_has_container(struct vfio_device *device);
int __init vfio_group_init(void);
void vfio_group_cleanup(void);
+/*
+ * With noiommu enabled and no containers are supported, allow devices that
+ * don't have a dummy group.
+ */
+static inline bool vfio_null_group_allowed(void)
+{
+ if (vfio_noiommu && (!IS_ENABLED(CONFIG_VFIO_CONTAINER) && !IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)))
+ return true;
+
+ return false;
+}
+
static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
@@ -189,6 +201,11 @@ static inline void vfio_group_cleanup(void)
{
}
+static inline bool vfio_null_group_allowed(void)
+{
+ return false;
+}
+
static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
return false;
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH V2 06/11] vfio: Introduce and set noiommu flag on vfio_device
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (4 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 05/11] vfio: Allow null group for noiommu without containers Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-22 10:02 ` Mostafa Saleh
2026-03-12 15:56 ` [PATCH V2 07/11] vfio: Update noiommu device detection logic for cdev Jacob Pan
` (4 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
When a VFIO device is added to a noiommu group, set the noiommu flag on
the vfio_device structure to indicate that the device operates in
noiommu mode.
Also update function signatures to pass vfio_device instead of struct
device, which gives direct access to the noiommu flag.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/vfio/group.c | 21 +++++++++++----------
include/linux/vfio.h | 1 +
2 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 98f2a4f2ebff..6f98c57de9e0 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -588,7 +588,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
return ret;
}
-static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
+static struct vfio_group *vfio_noiommu_group_alloc(struct vfio_device *vdev,
enum vfio_group_type type)
{
struct iommu_group *iommu_group;
@@ -610,7 +610,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
ret = iommu_group_set_name(iommu_group, "vfio-noiommu");
if (ret)
goto out_put_group;
- ret = iommu_group_add_device(iommu_group, dev);
+ ret = iommu_group_add_device(iommu_group, vdev->dev);
if (ret)
goto out_put_group;
@@ -625,7 +625,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
return group;
out_remove_device:
- iommu_group_remove_device(dev);
+ iommu_group_remove_device(vdev->dev);
out_put_group:
iommu_group_put(iommu_group);
return ERR_PTR(ret);
@@ -646,23 +646,24 @@ static bool vfio_group_has_device(struct vfio_group *group, struct device *dev)
return false;
}
-static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
+static struct vfio_group *vfio_group_find_or_alloc(struct vfio_device *vdev)
{
struct iommu_group *iommu_group;
struct vfio_group *group;
- iommu_group = iommu_group_get(dev);
+ iommu_group = iommu_group_get(vdev->dev);
if (!iommu_group && vfio_noiommu) {
+ vdev->noiommu = 1;
/*
* With noiommu enabled, create an IOMMU group for devices that
* don't already have one, implying no IOMMU hardware/driver
* exists. Taint the kernel because we're about to give a DMA
* capable device to a user without IOMMU protection.
*/
- group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
+ group = vfio_noiommu_group_alloc(vdev, VFIO_NO_IOMMU);
if (!IS_ERR(group)) {
add_taint(TAINT_USER, LOCKDEP_STILL_OK);
- dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
+ dev_warn(vdev->dev, "Adding kernel taint for vfio-noiommu group on device\n");
}
return group;
}
@@ -673,7 +674,7 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
mutex_lock(&vfio.group_lock);
group = vfio_group_find_from_iommu(iommu_group);
if (group) {
- if (WARN_ON(vfio_group_has_device(group, dev)))
+ if (WARN_ON(vfio_group_has_device(group, vdev->dev)))
group = ERR_PTR(-EINVAL);
else
refcount_inc(&group->drivers);
@@ -693,9 +694,9 @@ int vfio_device_set_group(struct vfio_device *device,
struct vfio_group *group;
if (type == VFIO_IOMMU)
- group = vfio_group_find_or_alloc(device->dev);
+ group = vfio_group_find_or_alloc(device);
else
- group = vfio_noiommu_group_alloc(device->dev, type);
+ group = vfio_noiommu_group_alloc(device, type);
if (IS_ERR(group))
return PTR_ERR(group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e90859956514..844d14839f96 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -72,6 +72,7 @@ struct vfio_device {
u8 iommufd_attached:1;
#endif
u8 cdev_opened:1;
+ u8 noiommu:1;
#ifdef CONFIG_DEBUG_FS
/*
* debug_root is a static property of the vfio_device
--
2.34.1
* [PATCH V2 07/11] vfio: Update noiommu device detection logic for cdev
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (5 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 06/11] vfio: Introduce and set noiommu flag on vfio_device Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-22 10:04 ` Mostafa Saleh
2026-03-12 15:56 ` [PATCH V2 08/11] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
` (3 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
Rework vfio_device_is_noiommu() to derive noiommu mode from the device,
the group type, and the kernel configuration.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/vfio/vfio.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 838c08077ce2..c5541967ef9b 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -127,8 +127,13 @@ static inline bool vfio_null_group_allowed(void)
static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
- return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
- vdev->group->type == VFIO_NO_IOMMU;
+ if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+ return false;
+
+ if (vfio_null_group_allowed())
+ return vdev->noiommu;
+
+ return vdev->group->type == VFIO_NO_IOMMU;
}
#else
struct vfio_group;
--
2.34.1
* [PATCH V2 08/11] vfio: Enable cdev noiommu mode under iommufd
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (6 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 07/11] vfio: Update noiommu device detection logic for cdev Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-14 8:09 ` kernel test robot
2026-03-12 15:56 ` [PATCH V2 09/11] vfio:selftest: Handle VFIO noiommu cdev Jacob Pan
` (2 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
Now that devices in noiommu mode can bind to IOMMUFD and perform IOAS
operations, lift the cdev restrictions on the VFIO side.
No-IOMMU cdevs are explicitly named with a noiommu- prefix, e.g.
/dev/vfio/
|-- 7
|-- devices
| `-- noiommu-vfio0
`-- vfio
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v2:
- Fix build dependency on IOMMU_SUPPORT
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/vfio/Kconfig | 7 +++++--
drivers/vfio/iommufd.c | 7 -------
drivers/vfio/vfio.h | 8 +-------
drivers/vfio/vfio_main.c | 22 +++++++++++++++++++---
include/linux/vfio.h | 9 +++++++++
5 files changed, 34 insertions(+), 19 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index ceae52fd7586..78feca3d0c8b 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
The VFIO device cdev is another way for userspace to get device
access. Userspace gets device fd by opening device cdev under
/dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
- to set up secure DMA context for device access. This interface does
- not support noiommu.
+ to set up secure DMA context for device access.
If you don't know what to do here, say N.
@@ -63,6 +62,10 @@ endif
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
depends on VFIO_GROUP
+ select GENERIC_PT
+ select IOMMU_PT
+ select IOMMU_PT_AMDV1
+ depends on IOMMU_SUPPORT
help
VFIO is built on the ability to isolate devices using the IOMMU.
Only with an IOMMU can userspace access to DMA capable devices be
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a38d262c6028..26c9c3068c77 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -25,10 +25,6 @@ int vfio_df_iommufd_bind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- /* Returns 0 to permit device opening under noiommu mode */
- if (vfio_device_is_noiommu(vdev))
- return 0;
-
return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
}
@@ -58,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- if (vfio_device_is_noiommu(vdev))
- return;
-
if (vdev->ops->unbind_iommufd)
vdev->ops->unbind_iommufd(vdev);
}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index c5541967ef9b..f6262f2cc7a6 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -381,19 +381,13 @@ void vfio_init_device_cdev(struct vfio_device *device);
static inline int vfio_device_add(struct vfio_device *device)
{
- /* cdev does not support noiommu device */
- if (vfio_device_is_noiommu(device))
- return device_add(&device->device);
vfio_init_device_cdev(device);
return cdev_device_add(&device->cdev, &device->device);
}
static inline void vfio_device_del(struct vfio_device *device)
{
- if (vfio_device_is_noiommu(device))
- device_del(&device->device);
- else
- cdev_device_del(&device->cdev, &device->device);
+ cdev_device_del(&device->cdev, &device->device);
}
int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 742477546b15..099d9b1ade4c 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -331,13 +331,15 @@ static int __vfio_register_dev(struct vfio_device *device,
if (!device->dev_set)
vfio_assign_device_set(device, device);
- ret = dev_set_name(&device->device, "vfio%d", device->index);
+ ret = vfio_device_set_group(device, type);
if (ret)
return ret;
- ret = vfio_device_set_group(device, type);
+ /* Just to be safe, expose to user explicitly noiommu cdev node */
+ ret = dev_set_name(&device->device, "%svfio%d",
+ device->noiommu ? "noiommu-" : "", device->index);
if (ret)
- return ret;
+ goto err_out;
/*
* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
@@ -357,6 +359,10 @@ static int __vfio_register_dev(struct vfio_device *device,
/* Refcounting can't start until the driver calls register */
refcount_set(&device->refcount, 1);
+ /* noiommu device w/o container may have NULL group */
+ if (vfio_device_is_noiommu(device) && !vfio_device_has_group(device))
+ return 0;
+
vfio_device_group_register(device);
vfio_device_debugfs_init(device);
@@ -391,6 +397,16 @@ void vfio_unregister_group_dev(struct vfio_device *device)
bool interrupted = false;
long rc;
+ /*
+ * For noiommu devices without a container, thus no dummy group,
+ * simply delete and unregister to balance refcount.
+ */
+ if (device->noiommu && !vfio_device_has_group(device)) {
+ vfio_device_del(device);
+ vfio_device_put_registration(device);
+ return;
+ }
+
/*
* Prevent new device opened by userspace via the
* VFIO_GROUP_GET_DEVICE_FD in the group path.
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 844d14839f96..775bd4f6bae9 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -327,6 +327,10 @@ struct iommu_group *vfio_file_iommu_group(struct file *file);
#if IS_ENABLED(CONFIG_VFIO_GROUP)
bool vfio_file_is_group(struct file *file);
bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
+static inline bool vfio_device_has_group(struct vfio_device *device)
+{
+ return device->group;
+}
#else
static inline bool vfio_file_is_group(struct file *file)
{
@@ -337,6 +341,11 @@ static inline bool vfio_file_has_dev(struct file *file, struct vfio_device *devi
{
return false;
}
+
+static inline bool vfio_device_has_group(struct vfio_device *device)
+{
+ return false;
+}
#endif
bool vfio_file_is_valid(struct file *file);
bool vfio_file_enforced_coherent(struct file *file);
--
2.34.1
* [PATCH V2 09/11] vfio:selftest: Handle VFIO noiommu cdev
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (7 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 08/11] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-12 15:56 ` [PATCH V2 10/11] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
2026-03-12 15:56 ` [PATCH V2 11/11] Doc: Update VFIO NOIOMMU mode Jacob Pan
10 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
In unsafe no-IOMMU DMA mode, the VFIO device nodes are prefixed with
noiommu-, e.g.
/dev/vfio/
|-- devices
| `-- noiommu-vfio0
|-- noiommu-0
`-- vfio
Let VFIO tests, such as the luo kexec test, accommodate the noiommu
device files.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
.../selftests/vfio/lib/vfio_pci_device.c | 25 +++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index 4e5871f1ebc3..15ddeb634a8d 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -290,6 +290,24 @@ static void vfio_pci_device_setup(struct vfio_pci_device *device)
device->msi_eventfds[i] = -1;
}
+
+static int is_unsafe_noiommu_mode_enabled(void)
+{
+ const char *path = "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode";
+ FILE *f;
+ int c;
+
+ f = fopen(path, "re");
+ if (!f)
+ return 0;
+
+ c = fgetc(f);
+ fclose(f);
+ if (c == 'Y' || c == 'y')
+ return 1;
+ return 0;
+}
+
const char *vfio_pci_get_cdev_path(const char *bdf)
{
char dir_path[PATH_MAX];
@@ -306,8 +324,11 @@ const char *vfio_pci_get_cdev_path(const char *bdf)
VFIO_ASSERT_NOT_NULL(dir, "Failed to open directory %s\n", dir_path);
while ((entry = readdir(dir)) != NULL) {
- /* Find the file that starts with "vfio" */
- if (strncmp("vfio", entry->d_name, 4))
+ /* Find the file that starts with "noiommu-vfio" or "vfio" */
+ if (is_unsafe_noiommu_mode_enabled()) {
+ if (strncmp("noiommu-vfio", entry->d_name, strlen("noiommu-vfio")))
+ continue;
+ } else if (strncmp("vfio", entry->d_name, 4))
continue;
snprintf(cdev_path, PATH_MAX, "/dev/vfio/devices/%s", entry->d_name);
--
2.34.1
* [PATCH V2 10/11] selftests/vfio: Add iommufd noiommu mode selftest for cdev
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (8 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 09/11] vfio:selftest: Handle VFIO noiommu cdev Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-12 15:56 ` [PATCH V2 11/11] Doc: Update VFIO NOIOMMU mode Jacob Pan
10 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
Add a comprehensive selftest for VFIO device operations with iommufd in
noiommu mode. The tests cover:
- Device binding to iommufd
- IOAS (I/O Address Space) allocation and mapping with a dummy IOVA
- Retrieving the PA from the dummy IOVA
- Device attach/detach operations as usual
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
v2:
- Use huge page ioas map to test GET_PA searching for contiguous PA
range.
---
tools/testing/selftests/vfio/Makefile | 1 +
.../vfio/vfio_iommufd_noiommu_test.c | 549 ++++++++++++++++++
2 files changed, 550 insertions(+)
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
index 8e90e409e91d..90f41d8ce3c7 100644
--- a/tools/testing/selftests/vfio/Makefile
+++ b/tools/testing/selftests/vfio/Makefile
@@ -9,6 +9,7 @@ CFLAGS = $(KHDR_INCLUDES)
TEST_GEN_PROGS += vfio_dma_mapping_test
TEST_GEN_PROGS += vfio_dma_mapping_mmio_test
TEST_GEN_PROGS += vfio_iommufd_setup_test
+TEST_GEN_PROGS += vfio_iommufd_noiommu_test
TEST_GEN_PROGS += vfio_pci_device_test
TEST_GEN_PROGS += vfio_pci_device_init_perf_test
TEST_GEN_PROGS += vfio_pci_driver_test
diff --git a/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
new file mode 100644
index 000000000000..c4e4fcd09342
--- /dev/null
+++ b/tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c
@@ -0,0 +1,549 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VFIO iommufd NoIOMMU Mode Selftest
+ *
+ * Tests VFIO device operations with iommufd in noiommu mode, including:
+ * - Device binding to iommufd
+ * - IOAS (I/O Address Space) allocation and management
+ * - Device attach/detach to IOAS
+ * - Memory mapping in IOAS
+ * - Device info queries and reset
+ */
+
+#include <linux/limits.h>
+#include <linux/vfio.h>
+#include <linux/iommufd.h>
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <dirent.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <libvfio.h>
+#include "kselftest_harness.h"
+
+static const char iommu_dev_path[] = "/dev/iommu";
+static const char *cdev_path;
+
+static char *vfio_noiommu_get_device_id(const char *bdf)
+{
+ char *path = NULL;
+ char *vfio_id = NULL;
+ struct dirent *dentry;
+ DIR *dp;
+
+ if (asprintf(&path, "/sys/bus/pci/devices/%s/vfio-dev", bdf) < 0)
+ return NULL;
+
+ dp = opendir(path);
+ if (!dp) {
+ free(path);
+ return NULL;
+ }
+
+ while ((dentry = readdir(dp)) != NULL) {
+ if (strncmp("noiommu-vfio", dentry->d_name, 12) == 0) {
+ vfio_id = strdup(dentry->d_name);
+ break;
+ }
+ }
+
+ closedir(dp);
+ free(path);
+ return vfio_id;
+}
+
+static char *vfio_noiommu_get_cdev_path(const char *bdf)
+{
+ char *vfio_id = vfio_noiommu_get_device_id(bdf);
+ char *cdev = NULL;
+
+ if (vfio_id) {
+ asprintf(&cdev, "/dev/vfio/devices/%s", vfio_id);
+ free(vfio_id);
+ }
+ return cdev;
+}
+
+static int vfio_device_bind_iommufd_ioctl(int cdev_fd, int iommufd)
+{
+ struct vfio_device_bind_iommufd bind_args = {
+ .argsz = sizeof(bind_args),
+ .iommufd = iommufd,
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind_args);
+}
+
+static int vfio_device_get_info_ioctl(int cdev_fd,
+ struct vfio_device_info *info)
+{
+ info->argsz = sizeof(*info);
+ return ioctl(cdev_fd, VFIO_DEVICE_GET_INFO, info);
+}
+
+static int vfio_device_ioas_alloc_ioctl(int iommufd,
+ struct iommu_ioas_alloc *alloc_args)
+{
+ alloc_args->size = sizeof(*alloc_args);
+ alloc_args->flags = 0;
+ return ioctl(iommufd, IOMMU_IOAS_ALLOC, alloc_args);
+}
+
+static int vfio_device_attach_iommufd_pt_ioctl(int cdev_fd, u32 pt_id)
+{
+ struct vfio_device_attach_iommufd_pt attach_args = {
+ .argsz = sizeof(attach_args),
+ .pt_id = pt_id,
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_args);
+}
+
+static int vfio_device_detach_iommufd_pt_ioctl(int cdev_fd)
+{
+ struct vfio_device_detach_iommufd_pt detach_args = {
+ .argsz = sizeof(detach_args),
+ };
+
+ return ioctl(cdev_fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_args);
+}
+
+static int vfio_device_get_region_info_ioctl(int cdev_fd, uint32_t index,
+ struct vfio_region_info *info)
+{
+ info->argsz = sizeof(*info);
+ info->index = index;
+ return ioctl(cdev_fd, VFIO_DEVICE_GET_REGION_INFO, info);
+}
+
+static int vfio_device_reset_ioctl(int cdev_fd)
+{
+ return ioctl(cdev_fd, VFIO_DEVICE_RESET);
+}
+
+static int ioas_map_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+ size_t length, bool hugepages)
+{
+ struct iommu_ioas_map map_args = {
+ .size = sizeof(map_args),
+ .ioas_id = ioas_id,
+ .iova = iova,
+ .length = length,
+ .flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_FIXED_IOVA,
+ };
+ void *pages;
+ int ret;
+
+ /* Allocate test pages */
+ if (hugepages)
+ pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+ else
+ pages = mmap(NULL, length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (pages == MAP_FAILED) {
+ printf("mmap failed for length 0x%lx\n", (unsigned long)length);
+ return -ENOMEM;
+ }
+
+ /* Set up page pointer for mapping */
+ map_args.user_va = (uintptr_t)pages;
+
+ printf(" ioas_map_pages: ioas_id=%u, iova=0x%lx, length=0x%lx, user_va=%p\n",
+ ioas_id, (unsigned long)iova, (unsigned long)length, pages);
+
+ /* Map into IOAS */
+ ret = ioctl(iommufd, IOMMU_IOAS_MAP, &map_args);
+ if (ret != 0)
+ printf(" IOMMU_IOAS_MAP failed: %d (%s)\n", ret, strerror(errno));
+ else
+ printf(" IOMMU_IOAS_MAP succeeded, IOVA=0x%lx\n", (unsigned long)map_args.iova);
+
+ munmap(pages, length);
+ return ret;
+}
+
+static int ioas_unmap_pages(int iommufd, uint32_t ioas_id, uint64_t iova,
+ size_t length)
+{
+ struct iommu_ioas_unmap unmap_args = {
+ .size = sizeof(unmap_args),
+ .ioas_id = ioas_id,
+ .iova = iova,
+ .length = length,
+ };
+
+ return ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap_args);
+}
+
+static int ioas_destroy_ioctl(int iommufd, uint32_t ioas_id)
+{
+ struct iommu_destroy destroy_args = {
+ .size = sizeof(destroy_args),
+ .id = ioas_id,
+ };
+
+ return ioctl(iommufd, IOMMU_DESTROY, &destroy_args);
+}
+
+static int ioas_get_pa_ioctl(int iommufd, uint32_t ioas_id, uint64_t iova,
+ uint64_t *phys_out, uint64_t *length_out)
+{
+ struct {
+ __u32 size;
+ __u32 flags;
+ __u32 ioas_id;
+ __u32 __reserved;
+ __u64 iova;
+ __u64 out_length;
+ __u64 out_phys;
+ } get_pa = {
+ .size = sizeof(get_pa),
+ .flags = 0,
+ .ioas_id = ioas_id,
+ .iova = iova,
+ };
+
+ printf(" ioas_get_pa_ioctl: ioas_id=%u, iova=0x%lx\n",
+ ioas_id, (unsigned long)iova);
+
+ if (ioctl(iommufd, IOMMU_IOAS_GET_PA, &get_pa) != 0) {
+ printf(" IOMMU_IOAS_GET_PA failed: %s (errno=%d)\n",
+ strerror(errno), errno);
+ return -1;
+ }
+
+ printf(" IOMMU_IOAS_GET_PA succeeded: PA=0x%lx, length=0x%lx\n",
+ (unsigned long)get_pa.out_phys, (unsigned long)get_pa.out_length);
+
+ if (phys_out)
+ *phys_out = get_pa.out_phys;
+ if (length_out)
+ *length_out = get_pa.out_length;
+
+ return 0;
+}
+
+FIXTURE(vfio_noiommu) {
+ int cdev_fd;
+ int iommufd;
+};
+
+FIXTURE_SETUP(vfio_noiommu)
+{
+ ASSERT_LE(0, (self->cdev_fd = open(cdev_path, O_RDWR, 0)));
+ ASSERT_LE(0, (self->iommufd = open(iommu_dev_path, O_RDWR, 0)));
+}
+
+FIXTURE_TEARDOWN(vfio_noiommu)
+{
+ if (self->cdev_fd >= 0)
+ close(self->cdev_fd);
+ if (self->iommufd >= 0)
+ close(self->iommufd);
+}
+
+/*
+ * Test: Device cdev can be opened
+ */
+TEST_F(vfio_noiommu, device_cdev_open)
+{
+ ASSERT_LE(0, self->cdev_fd);
+}
+
+/*
+ * Test: Device can be bound to iommufd
+ */
+TEST_F(vfio_noiommu, device_bind_iommufd)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+}
+
+/*
+ * Test: Device info can be queried after binding
+ */
+TEST_F(vfio_noiommu, device_get_info_after_bind)
+{
+ struct vfio_device_info info;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+ ASSERT_NE(0, info.argsz);
+}
+
+/*
+ * Test: Getting device info fails without bind
+ */
+TEST_F(vfio_noiommu, device_get_info_without_bind_fails)
+{
+ struct vfio_device_info info;
+
+ ASSERT_NE(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+}
+
+/*
+ * Test: Binding with invalid iommufd fails
+ */
+TEST_F(vfio_noiommu, device_bind_bad_iommufd_fails)
+{
+ ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd, -2));
+}
+
+/*
+ * Test: Cannot bind twice to same device
+ */
+TEST_F(vfio_noiommu, device_repeated_bind_fails)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_NE(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+}
+
+/*
+ * Test: IOAS can be allocated
+ */
+TEST_F(vfio_noiommu, ioas_alloc)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_NE(0, alloc_args.out_ioas_id);
+}
+
+/*
+ * Test: IOAS can be destroyed
+ */
+TEST_F(vfio_noiommu, ioas_destroy)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, ioas_destroy_ioctl(self->iommufd,
+ alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Device can attach to IOAS after binding
+ */
+TEST_F(vfio_noiommu, device_attach_to_ioas)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+}
+
+/*
+ * Test: Attaching to invalid IOAS fails
+ */
+TEST_F(vfio_noiommu, device_attach_invalid_ioas_fails)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_NE(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ UINT32_MAX));
+}
+
+/*
+ * Test: Device can detach from IOAS
+ */
+TEST_F(vfio_noiommu, device_detach_from_ioas)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+ ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Full lifecycle - bind, attach, detach, reset
+ */
+TEST_F(vfio_noiommu, device_lifecycle)
+{
+ struct iommu_ioas_alloc alloc_args;
+ struct vfio_device_info info;
+
+ /* Bind device to iommufd */
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+
+ /* Allocate IOAS */
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ /* Attach device to IOAS */
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ /* Query device info */
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &info));
+
+ /* Detach device from IOAS */
+ ASSERT_EQ(0, vfio_device_detach_iommufd_pt_ioctl(self->cdev_fd));
+
+ /* Reset device */
+ ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+/*
+ * Test: Get region info
+ */
+TEST_F(vfio_noiommu, device_get_region_info)
+{
+ struct vfio_device_info dev_info;
+ struct vfio_region_info region_info;
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_get_info_ioctl(self->cdev_fd, &dev_info));
+
+ /* Try to get first region info if device has regions */
+ if (dev_info.num_regions > 0) {
+ ASSERT_EQ(0, vfio_device_get_region_info_ioctl(self->cdev_fd, 0,
+ ®ion_info));
+ ASSERT_NE(0, region_info.argsz);
+ }
+}
+
+TEST_F(vfio_noiommu, device_reset)
+{
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+ ASSERT_EQ(0, vfio_device_reset_ioctl(self->cdev_fd));
+}
+
+TEST_F(vfio_noiommu, ioas_map_pages)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x10000;
+ int i;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ printf("Page size: %ld bytes\n", page_size);
+ /* Test mapping regions of different sizes: 1, 2, 4, 8 pages */
+ for (i = 0; i < 4; i++) {
+ size_t map_size = page_size * (1 << i); /* 1, 2, 4, 8 pages */
+ uint64_t test_iova = iova + (i * 0x100000);
+
+ /* Attempt to map each region (may fail if not supported) */
+ ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ test_iova, map_size, false);
+ }
+}
+
+TEST_F(vfio_noiommu, multiple_ioas_alloc)
+{
+ struct iommu_ioas_alloc alloc1, alloc2;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc1));
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd, &alloc2));
+ ASSERT_NE(alloc1.out_ioas_id, alloc2.out_ioas_id);
+}
+
+/*
+ * Test: Query physical address for IOVA
+ * Tests IOMMU_IOAS_GET_PA ioctl to translate IOVA to physical address
+ * Note: Device must be attached to IOAS for PA query to work
+ */
+#define NR_PAGES 32
+TEST_F(vfio_noiommu, ioas_get_pa_mapped)
+{
+ struct iommu_ioas_alloc alloc_args;
+ long page_size = sysconf(_SC_PAGESIZE);
+ uint64_t iova = 0x200000;
+ uint64_t phys = 0;
+ uint64_t length = 0;
+ int ret;
+
+ ASSERT_GT(page_size, 0);
+
+ ASSERT_EQ(0, vfio_device_bind_iommufd_ioctl(self->cdev_fd,
+ self->iommufd));
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ ASSERT_EQ(0, vfio_device_attach_iommufd_pt_ioctl(self->cdev_fd,
+ alloc_args.out_ioas_id));
+
+ /*
+ * Map a page into an arbitrary IOAS, used as a cookie for lookup.
+ * Use hugepages to test contiguous PA. Make sure hugepages are
+ * available. e.g. echo 64 > /proc/sys/vm/nr_hugepages
+ */
+ ret = ioas_map_pages(self->iommufd, alloc_args.out_ioas_id,
+ iova, page_size * NR_PAGES, true);
+ if (ret != 0)
+ return;
+
+ /* Query the physical address for the mapped dummy IOVA */
+ ret = ioas_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ iova, &phys, &length);
+
+ if (ret == 0) {
+ /* If we got a result, verify it's valid */
+ ASSERT_NE(0, phys);
+ ASSERT_GE((uint64_t)page_size * NR_PAGES, length);
+ }
+}
+
+TEST_F(vfio_noiommu, ioas_get_pa_unmapped_fails)
+{
+ struct iommu_ioas_alloc alloc_args;
+
+ ASSERT_EQ(0, vfio_device_ioas_alloc_ioctl(self->iommufd,
+ &alloc_args));
+
+ /* Try to retrieve unmapped IOVA (should fail) */
+ ASSERT_NE(0, ioas_get_pa_ioctl(self->iommufd, alloc_args.out_ioas_id,
+ 0x10000, NULL, NULL));
+}
+
+int main(int argc, char *argv[])
+{
+ const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);
+ char *cdev = NULL;
+
+ if (!device_bdf) {
+ ksft_print_msg("No device BDF provided\n");
+ return KSFT_SKIP;
+ }
+
+ cdev = vfio_noiommu_get_cdev_path(device_bdf);
+ if (!cdev) {
+ ksft_print_msg("Could not find cdev for device %s\n",
+ device_bdf);
+ return KSFT_SKIP;
+ }
+
+ cdev_path = cdev;
+ ksft_print_msg("Using cdev device %s for BDF %s\n", cdev_path,
+ device_bdf);
+
+ return test_harness_run(argc, argv);
+}
--
2.34.1
* [PATCH V2 11/11] Doc: Update VFIO NOIOMMU mode
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
` (9 preceding siblings ...)
2026-03-12 15:56 ` [PATCH V2 10/11] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
@ 2026-03-12 15:56 ` Jacob Pan
2026-03-13 17:48 ` kernel test robot
10 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2026-03-12 15:56 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: skhawaja, pasha.tatashin, Will Deacon, Jacob Pan, Baolu Lu
Document the NOIOMMU mode with newly added cdev support under iommufd.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
Documentation/driver-api/vfio.rst | 44 +++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 2a21a42c9386..d1ee13dc6e98 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -275,8 +275,6 @@ in a VFIO group.
With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX where
"X" is the number allocated uniquely by VFIO for registered devices.
-cdev interface does not support noiommu devices, so user should use
-the legacy group interface if noiommu is wanted.
The cdev only works with IOMMUFD. Both VFIO drivers and applications
must adapt to the new cdev security model which requires using
@@ -370,6 +368,48 @@ IOMMUFD IOAS/HWPT to enable userspace DMA::
/* Other device operations as stated in "VFIO Usage Example" */
+VFIO NOIOMMU mode
+-------------------------------------------------------------------------------
+VFIO also supports a no-IOMMU mode, intended for use cases where unsafe DMA is
+performed by userspace drivers without physical IOMMU protection. This mode
+is controlled by the module parameter:
+
+/sys/module/vfio/parameters/enable_unsafe_noiommu_mode
+
+Upon enabling this mode with an assigned device, the user is presented with
+a VFIO group and device file, e.g.:
+
+/dev/vfio/
+|-- devices
+| `-- noiommu-vfio0 /* VFIO device cdev */
+|-- noiommu-0 /* VFIO group */
+`-- vfio
+
+The capabilities vary depending on the device programming interface and kernel
+configuration used. The following table summarizes the differences:
+
++-------------------+---------------------+---------------------+
+| Feature | VFIO group | VFIO device cdev |
++===================+=====================+=====================+
+| VFIO device UAPI | Yes | Yes |
++-------------------+---------------------+---------------------+
+| VFIO container | No | No |
++-------------------+---------------------+---------------------+
+| IOMMUFD IOAS | No | Yes* |
++-------------------+---------------------+---------------------+
+Note that the VFIO container case includes IOMMUFD provided VFIO compatibility
+interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER is
+enabled.
+
+* IOMMUFD UAPI is available for the VFIO device cdev to pin and map user memory,
+with the ability to retrieve physical addresses for DMA command submission.
+
+A new IOMMUFD ioctl, IOMMU_IOAS_GET_PA, is added to retrieve the physical
+address for a given user virtual address. Note that the IOMMU_IOAS_MAP_FIXED_IOVA
+flag is ignored in no-IOMMU mode since there is no physical DMA remapping
+hardware. tools/testing/selftests/vfio/vfio_iommufd_noiommu_test.c provides an
+example of using this ioctl in no-IOMMU mode.
+
VFIO User API
-------------------------------------------------------------------------------
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH V2 11/11] Doc: Update VFIO NOIOMMU mode
2026-03-12 15:56 ` [PATCH V2 11/11] Doc: Update VFIO NOIOMMU mode Jacob Pan
@ 2026-03-13 17:48 ` kernel test robot
0 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2026-03-13 17:48 UTC (permalink / raw)
To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: oe-kbuild-all, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
Hi Jacob,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on next-20260313]
[cannot apply to awilliam-vfio/next awilliam-vfio/for-linus v6.16-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Jacob-Pan/iommufd-Support-a-HWPT-without-an-iommu-driver-for-noiommu/20260313-182818
base: linus/master
patch link: https://lore.kernel.org/r/20260312155637.376854-12-jacob.pan%40linux.microsoft.com
patch subject: [PATCH V2 11/11] Doc: Update VFIO NOIOMMU mode
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260313/202603131832.GiQt3WWE-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603131832.GiQt3WWE-lkp@intel.com/
All warnings (new ones prefixed by >>):
Ring Design
----------- [docutils]
WARNING: ./include/linux/usb/typec_altmode.h:44 struct member 'priority' not described in 'typec_altmode'
WARNING: ./include/linux/usb/typec_altmode.h:44 struct member 'mode_selection' not described in 'typec_altmode'
>> Documentation/driver-api/vfio.rst:382: WARNING: Inline substitution_reference start-string without end-string. [docutils]
>> Documentation/driver-api/vfio.rst:382: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils]
>> Documentation/driver-api/vfio.rst:382: WARNING: Inline emphasis start-string without end-string. [docutils]
>> Documentation/driver-api/vfio.rst:382: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils]
Documentation/driver-api/vfio.rst:392: ERROR: Malformed table.
Right border not aligned or missing.
--
+-------------------+---------------------+---------------------+
| VFIO container | No | No |
+-------------------+---------------------+---------------------+
| IOMMUFD IOAS | No | Yes* |
+-------------------+---------------------+---------------------+ [docutils]
>> Documentation/driver-api/vfio.rst:400: WARNING: Blank line required after table. [docutils]
>> Documentation/driver-api/vfio.rst:405: WARNING: Bullet list ends without a blank line; unexpected unindent. [docutils]
WARNING: ./include/linux/virtio.h:188 struct member 'map' not described in 'virtio_device'
WARNING: ./include/linux/virtio.h:188 struct member 'VIRTIO_DECLARE_FEATURES(features' not described in 'virtio_device'
WARNING: ./include/linux/virtio.h:188 struct member 'vmap' not described in 'virtio_device'
Documentation/gpu/amdgpu/display/display-manager:47: ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c:61: ERROR: Unexpected section title.
vim +382 Documentation/driver-api/vfio.rst
381
> 382 /dev/vfio/
383 |-- devices
384 | `-- noiommu-vfio0 /* VFIO device cdev */
385 |-- noiommu-0 /* VFIO group */
386 `-- vfio
387
388 The capabilities vary depending on the device programming interface and kernel
389 configuration used. The following table summarizes the differences:
390
391 +-------------------+---------------------+---------------------+
392 | Feature | VFIO group | VFIO device cdev |
393 +===================+=====================+=====================+
394 | VFIO device UAPI | Yes | Yes |
395 +-------------------+---------------------+---------------------+
396 | VFIO container | No | No |
397 +-------------------+---------------------+---------------------+
398 | IOMMUFD IOAS | No | Yes* |
399 +-------------------+---------------------+---------------------+
> 400 Note that the VFIO container case includes IOMMUFD provided VFIO compatibility
401 interfaces when either CONFIG_VFIO_CONTAINER or CONFIG_IOMMUFD_VFIO_CONTAINER is
402 enabled.
403
404 * IOMMUFD UAPI is available for VFIO device cdev to pin and map user memory with
> 405 the ability to retrieve physical addresses for DMA command submission.
406
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
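For reference, the docutils warnings above all trace back to three issues in the added section: the directory tree is not marked as a literal block (so the `*` and `/* */` characters are parsed as inline markup), the grid table is not followed by a blank line, and the footnote is a bare `*` line. A possible reStructuredText fix is sketched below; this is only an illustration of how the warnings could be silenced, not the author's actual follow-up patch:

```rst
Upon enabling this mode, with an assigned device, the user will be presented
with a VFIO group and device file, e.g.::

  /dev/vfio/
  |-- devices
  |   `-- noiommu-vfio0   /* VFIO device cdev */
  |-- noiommu-0           /* VFIO group */
  `-- vfio

+-------------------+---------------------+---------------------+
| Feature           | VFIO group          | VFIO device cdev    |
+===================+=====================+=====================+
| VFIO device UAPI  | Yes                 | Yes                 |
+-------------------+---------------------+---------------------+
| VFIO container    | No                  | No                  |
+-------------------+---------------------+---------------------+
| IOMMUFD IOAS      | No                  | Yes [*]_            |
+-------------------+---------------------+---------------------+

.. [*] IOMMUFD UAPI is available for the VFIO device cdev to pin and map
   user memory, with the ability to retrieve physical addresses for DMA
   command submission.
```

The `::` introduces a literal block (fixing the inline-markup warnings on line 382), the blank line after the table fixes the "Blank line required after table" warning, and the `[*]_` auto-symbol footnote replaces the bare `*` bullet that triggered the unexpected-unindent warning.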
* Re: [PATCH V2 08/11] vfio: Enable cdev noiommu mode under iommufd
2026-03-12 15:56 ` [PATCH V2 08/11] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
@ 2026-03-14 8:09 ` kernel test robot
0 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2026-03-14 8:09 UTC (permalink / raw)
To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu
Cc: oe-kbuild-all, skhawaja, pasha.tatashin, Will Deacon, Jacob Pan,
Baolu Lu
Hi Jacob,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v7.0-rc3 next-20260311]
[cannot apply to awilliam-vfio/next awilliam-vfio/for-linus]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Jacob-Pan/iommufd-Support-a-HWPT-without-an-iommu-driver-for-noiommu/20260313-182818
base: linus/master
patch link: https://lore.kernel.org/r/20260312155637.376854-9-jacob.pan%40linux.microsoft.com
patch subject: [PATCH V2 08/11] vfio: Enable cdev noiommu mode under iommufd
config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20260314/202603141655.DaMKZr7a-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260314/202603141655.DaMKZr7a-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603141655.DaMKZr7a-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/iommu/generic_pt/fmt/iommu_template.h:36,
from drivers/iommu/generic_pt/fmt/iommu_mock.c:10:
drivers/iommu/generic_pt/fmt/amdv1.h: In function 'amdv1pt_install_table':
>> drivers/iommu/generic_pt/fmt/amdv1.h:255:16: error: implicit declaration of function 'pt_table_install64'; did you mean 'pt_table_install32'? [-Wimplicit-function-declaration]
255 | return pt_table_install64(pts, entry);
| ^~~~~~~~~~~~~~~~~~
| pt_table_install32
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for IOMMU_PT_AMDV1
Depends on [n]: GENERIC_PT [=y] && IOMMU_PT [=m] && !GENERIC_ATOMIC64 [=y]
Selected by [m]:
- VFIO_NOIOMMU [=y] && VFIO [=m] && VFIO_GROUP [=y] && IOMMU_SUPPORT [=y]
vim +255 drivers/iommu/generic_pt/fmt/amdv1.h
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 236
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 237 static inline bool amdv1pt_install_table(struct pt_state *pts,
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 238 pt_oaddr_t table_pa,
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 239 const struct pt_write_attrs *attrs)
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 240 {
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 241 u64 entry;
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 242
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 243 /*
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 244 * IR and IW are ANDed from the table levels along with the PTE. We
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 245 * always control permissions from the PTE, so always set IR and IW for
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 246 * tables.
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 247 */
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 248 entry = AMDV1PT_FMT_PR |
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 249 FIELD_PREP(AMDV1PT_FMT_NEXT_LEVEL, pts->level) |
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 250 FIELD_PREP(AMDV1PT_FMT_OA,
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 251 log2_div(table_pa, PT_GRANULE_LG2SZ)) |
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 252 AMDV1PT_FMT_IR | AMDV1PT_FMT_IW;
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 253 if (pts_feature(pts, PT_FEAT_AMDV1_ENCRYPT_TABLES))
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 254 entry = __sme_set(entry);
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 @255 return pt_table_install64(pts, entry);
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 256 }
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 257 #define pt_install_table amdv1pt_install_table
879ced2bab1ba9 Jason Gunthorpe 2025-11-04 258
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
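The Kconfig warning in the report above shows VFIO_NOIOMMU selecting IOMMU_PT_AMDV1 without honoring its !GENERIC_ATOMIC64 dependency, which on m68k leaves pt_table_install64() undefined. One way the dependency could be carried, sketched from the symbols in the robot output (the exact fix is up to the series author):

```kconfig
config VFIO_NOIOMMU
	bool "VFIO No-IOMMU support"
	# pt_table_install64() in the AMDV1 page table format requires
	# native 64-bit atomics, so the noiommu HWPT cannot be built when
	# the architecture falls back to GENERIC_ATOMIC64 (e.g. m68k).
	depends on VFIO && VFIO_GROUP && IOMMU_SUPPORT && !GENERIC_ATOMIC64
	select IOMMU_PT_AMDV1
```

Making the `depends on` line mirror the selected symbol's own dependencies avoids the "unmet direct dependencies" warning, at the cost of hiding the option on architectures without 64-bit atomics.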
* Re: [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu
2026-03-12 15:56 ` [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
@ 2026-03-18 18:38 ` Samiullah Khawaja
2026-03-23 13:17 ` Jason Gunthorpe
2026-03-22 9:24 ` Mostafa Saleh
1 sibling, 1 reply; 31+ messages in thread
From: Samiullah Khawaja @ 2026-03-18 18:38 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, pasha.tatashin, Will Deacon,
Baolu Lu
On Thu, Mar 12, 2026 at 08:56:27AM -0700, Jacob Pan wrote:
>From: Jason Gunthorpe <jgg@nvidia.com>
>
>Create just a little part of a real iommu driver, enough to
>slot in under the dev_iommu_ops() and allow iommufd to call
>domain_alloc_paging_flags() and fail everything else.
>
>This allows explicitly creating a HWPT under an IOAS.
>
>Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
>---
> drivers/iommu/iommufd/Makefile | 1 +
> drivers/iommu/iommufd/hw_pagetable.c | 11 ++-
> drivers/iommu/iommufd/hwpt_noiommu.c | 91 +++++++++++++++++++++++++
> drivers/iommu/iommufd/iommufd_private.h | 2 +
> 4 files changed, 103 insertions(+), 2 deletions(-)
> create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
>
>diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
>index 71d692c9a8f4..2b1a020b14a6 100644
>--- a/drivers/iommu/iommufd/Makefile
>+++ b/drivers/iommu/iommufd/Makefile
>@@ -10,6 +10,7 @@ iommufd-y := \
> vfio_compat.o \
> viommu.o
>
>+iommufd-$(CONFIG_VFIO_NOIOMMU) += hwpt_noiommu.o
> iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
>
> obj-$(CONFIG_IOMMUFD) += iommufd.o
>diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
>index fe789c2dc0c9..37316d77277d 100644
>--- a/drivers/iommu/iommufd/hw_pagetable.c
>+++ b/drivers/iommu/iommufd/hw_pagetable.c
>@@ -8,6 +8,13 @@
> #include "../iommu-priv.h"
> #include "iommufd_private.h"
>
>+static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
>+{
>+ if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && !idev->igroup->group)
>+ return &iommufd_noiommu_ops;
>+ return dev_iommu_ops(idev->dev);
>+}
>+
> static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
> {
> if (hwpt->domain)
>@@ -114,7 +121,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
> IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> IOMMU_HWPT_FAULT_ID_VALID |
> IOMMU_HWPT_ALLOC_PASID;
>- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
>+ const struct iommu_ops *ops = get_iommu_ops(idev);
> struct iommufd_hwpt_paging *hwpt_paging;
> struct iommufd_hw_pagetable *hwpt;
> int rc;
>@@ -229,7 +236,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
> struct iommufd_device *idev, u32 flags,
> const struct iommu_user_data *user_data)
> {
>- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
>+ const struct iommu_ops *ops = get_iommu_ops(idev);
> struct iommufd_hwpt_nested *hwpt_nested;
> struct iommufd_hw_pagetable *hwpt;
> int rc;
>diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
>new file mode 100644
>index 000000000000..0aa99f581ca3
>--- /dev/null
>+++ b/drivers/iommu/iommufd/hwpt_noiommu.c
>@@ -0,0 +1,91 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+/*
>+ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
>+ */
>+#include <linux/iommu.h>
>+#include <linux/generic_pt/iommu.h>
>+#include "iommufd_private.h"
>+
>+static const struct iommu_domain_ops noiommu_amdv1_ops;
>+
>+struct noiommu_domain {
>+ union {
>+ struct iommu_domain domain;
>+ struct pt_iommu_amdv1 amdv1;
>+ };
>+ spinlock_t lock;
>+};
>+PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
>+
>+static void noiommu_change_top(struct pt_iommu *iommu_table,
>+ phys_addr_t top_paddr, unsigned int top_level)
>+{
>+}
>+
>+static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
>+{
>+ struct noiommu_domain *domain =
>+ container_of(iommupt, struct noiommu_domain, amdv1.iommu);
>+
>+ return &domain->lock;
>+}
>+
>+static const struct pt_iommu_driver_ops noiommu_driver_ops = {
>+ .get_top_lock = noiommu_get_top_lock,
>+ .change_top = noiommu_change_top,
>+};
>+
>+static struct iommu_domain *
>+noiommu_alloc_paging_flags(struct device *dev, u32 flags,
>+ const struct iommu_user_data *user_data)
>+{
>+ struct pt_iommu_amdv1_cfg cfg = {};
>+ struct noiommu_domain *dom;
>+ int rc;
>+
>+ if (flags || user_data)
>+ return ERR_PTR(-EOPNOTSUPP);
>+
>+ cfg.common.hw_max_vasz_lg2 = 64;
>+ cfg.common.hw_max_oasz_lg2 = 52;
>+ cfg.starting_level = 2;
>+ cfg.common.features =
>+ (BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
>+ BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
>+
>+ dom = kzalloc(sizeof(*dom), GFP_KERNEL);
>+ if (!dom)
>+ return ERR_PTR(-ENOMEM);
>+
>+ spin_lock_init(&dom->lock);
>+ dom->amdv1.iommu.nid = NUMA_NO_NODE;
>+ dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
>+ dom->domain.ops = &noiommu_amdv1_ops;
>+
>+ /* Use mock page table which is based on AMDV1 */
>+ rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
>+ if (rc) {
>+ kfree(dom);
>+ return ERR_PTR(rc);
>+ }
>+
>+ return &dom->domain;
>+}
>+
>+static void noiommu_domain_free(struct iommu_domain *iommu_domain)
>+{
>+ struct noiommu_domain *domain =
>+ container_of(iommu_domain, struct noiommu_domain, domain);
>+
>+ pt_iommu_deinit(&domain->amdv1.iommu);
>+ kfree(domain);
>+}
>+
>+static const struct iommu_domain_ops noiommu_amdv1_ops = {
>+ IOMMU_PT_DOMAIN_OPS(amdv1),
I understand that this fits really well into the iommufd/hwpt
construction, but do we need page tables for this at all, given that the
iova-to-phys information should already be available in the IOPT of the
IOAS? Since the get_pa() function introduced in a later patch is only
used for noiommu use cases, could it use the IOPT to get the physical
addresses instead?
>+ .free = noiommu_domain_free,
>+};
>+
>+struct iommu_ops iommufd_noiommu_ops = {
>+ .domain_alloc_paging_flags = noiommu_alloc_paging_flags,
>+};
>diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
>index 6ac1965199e9..9c18c5eb1899 100644
>--- a/drivers/iommu/iommufd/iommufd_private.h
>+++ b/drivers/iommu/iommufd/iommufd_private.h
>@@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
> refcount_dec(&hwpt->obj.users);
> }
>
>+extern struct iommu_ops iommufd_noiommu_ops;
>+
> struct iommufd_attach;
>
> struct iommufd_group {
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 02/11] iommufd: Move igroup allocation to a function
2026-03-12 15:56 ` [PATCH V2 02/11] iommufd: Move igroup allocation to a function Jacob Pan
@ 2026-03-18 18:39 ` Samiullah Khawaja
2026-03-22 9:41 ` Mostafa Saleh
2026-03-23 16:46 ` Samiullah Khawaja
2 siblings, 0 replies; 31+ messages in thread
From: Samiullah Khawaja @ 2026-03-18 18:39 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, pasha.tatashin, Will Deacon,
Baolu Lu
On Thu, Mar 12, 2026 at 08:56:28AM -0700, Jacob Pan wrote:
>From: Jason Gunthorpe <jgg@nvidia.com>
>
>So it can be reused in the next patch, which allows binding to a noiommu
>device.
>
>Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
>---
> drivers/iommu/iommufd/device.c | 48 +++++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 18 deletions(-)
>
>diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
>index 344d620cdecc..54d73016468f 100644
>--- a/drivers/iommu/iommufd/device.c
>+++ b/drivers/iommu/iommufd/device.c
>@@ -30,8 +30,9 @@ static void iommufd_group_release(struct kref *kref)
>
> WARN_ON(!xa_empty(&igroup->pasid_attach));
>
>- xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
>- NULL, GFP_KERNEL);
>+ if (igroup->group)
>+ xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
>+ igroup, NULL, GFP_KERNEL);
> iommu_group_put(igroup->group);
> mutex_destroy(&igroup->lock);
> kfree(igroup);
>@@ -56,6 +57,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
> return kref_get_unless_zero(&igroup->ref);
> }
>
>+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
>+ struct iommu_group *group)
>+{
>+ struct iommufd_group *new_igroup;
>+
>+ new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
>+ if (!new_igroup)
>+ return ERR_PTR(-ENOMEM);
>+
>+ kref_init(&new_igroup->ref);
>+ mutex_init(&new_igroup->lock);
>+ xa_init(&new_igroup->pasid_attach);
>+ new_igroup->sw_msi_start = PHYS_ADDR_MAX;
>+ /* group reference moves into new_igroup */
>+ new_igroup->group = group;
>+
>+ /*
>+ * The ictx is not additionally refcounted here because all objects using
>+ * an igroup must put it before their destroy completes.
>+ */
>+ new_igroup->ictx = ictx;
>+ return new_igroup;
>+}
>+
> /*
> * iommufd needs to store some more data for each iommu_group, we keep a
> * parallel xarray indexed by iommu_group id to hold this instead of putting it
>@@ -87,25 +112,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
> }
> xa_unlock(&ictx->groups);
>
>- new_igroup = kzalloc_obj(*new_igroup);
>- if (!new_igroup) {
>+ new_igroup = iommufd_alloc_group(ictx, group);
>+ if (IS_ERR(new_igroup)) {
> iommu_group_put(group);
>- return ERR_PTR(-ENOMEM);
>+ return new_igroup;
> }
>
>- kref_init(&new_igroup->ref);
>- mutex_init(&new_igroup->lock);
>- xa_init(&new_igroup->pasid_attach);
>- new_igroup->sw_msi_start = PHYS_ADDR_MAX;
>- /* group reference moves into new_igroup */
>- new_igroup->group = group;
>-
>- /*
>- * The ictx is not additionally refcounted here because all objects using
>- * an igroup must put it before their destroy completes.
>- */
>- new_igroup->ictx = ictx;
>-
> /*
> * We dropped the lock so igroup is invalid. NULL is a safe and likely
> * value to assume for the xa_cmpxchg algorithm.
>--
>2.34.1
>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Thanks,
Sami
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu
2026-03-12 15:56 ` [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-03-18 18:38 ` Samiullah Khawaja
@ 2026-03-22 9:24 ` Mostafa Saleh
2026-03-23 21:11 ` Jacob Pan
1 sibling, 1 reply; 31+ messages in thread
From: Mostafa Saleh @ 2026-03-22 9:24 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
On Thu, Mar 12, 2026 at 08:56:27AM -0700, Jacob Pan wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> Create just a little part of a real iommu driver, enough to
> slot in under the dev_iommu_ops() and allow iommufd to call
> domain_alloc_paging_flags() and fail everything else.
>
> This allows explicitly creating a HWPT under an IOAS.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> drivers/iommu/iommufd/Makefile | 1 +
> drivers/iommu/iommufd/hw_pagetable.c | 11 ++-
> drivers/iommu/iommufd/hwpt_noiommu.c | 91 +++++++++++++++++++++++++
> drivers/iommu/iommufd/iommufd_private.h | 2 +
> 4 files changed, 103 insertions(+), 2 deletions(-)
> create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
>
> diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
> index 71d692c9a8f4..2b1a020b14a6 100644
> --- a/drivers/iommu/iommufd/Makefile
> +++ b/drivers/iommu/iommufd/Makefile
> @@ -10,6 +10,7 @@ iommufd-y := \
> vfio_compat.o \
> viommu.o
>
> +iommufd-$(CONFIG_VFIO_NOIOMMU) += hwpt_noiommu.o
> iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
>
> obj-$(CONFIG_IOMMUFD) += iommufd.o
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index fe789c2dc0c9..37316d77277d 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -8,6 +8,13 @@
> #include "../iommu-priv.h"
> #include "iommufd_private.h"
>
> +static const struct iommu_ops *get_iommu_ops(struct iommufd_device *idev)
> +{
> + if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && !idev->igroup->group)
> + return &iommufd_noiommu_ops;
> + return dev_iommu_ops(idev->dev);
> +}
> +
> static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
> {
> if (hwpt->domain)
> @@ -114,7 +121,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
> IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> IOMMU_HWPT_FAULT_ID_VALID |
> IOMMU_HWPT_ALLOC_PASID;
> - const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> + const struct iommu_ops *ops = get_iommu_ops(idev);
> struct iommufd_hwpt_paging *hwpt_paging;
> struct iommufd_hw_pagetable *hwpt;
> int rc;
> @@ -229,7 +236,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
> struct iommufd_device *idev, u32 flags,
> const struct iommu_user_data *user_data)
> {
> - const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> + const struct iommu_ops *ops = get_iommu_ops(idev);
> struct iommufd_hwpt_nested *hwpt_nested;
> struct iommufd_hw_pagetable *hwpt;
> int rc;
> diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c b/drivers/iommu/iommufd/hwpt_noiommu.c
> new file mode 100644
> index 000000000000..0aa99f581ca3
> --- /dev/null
> +++ b/drivers/iommu/iommufd/hwpt_noiommu.c
> @@ -0,0 +1,91 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
> + */
> +#include <linux/iommu.h>
> +#include <linux/generic_pt/iommu.h>
> +#include "iommufd_private.h"
> +
> +static const struct iommu_domain_ops noiommu_amdv1_ops;
> +
> +struct noiommu_domain {
> + union {
> + struct iommu_domain domain;
> + struct pt_iommu_amdv1 amdv1;
> + };
> + spinlock_t lock;
> +};
> +PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
> +
> +static void noiommu_change_top(struct pt_iommu *iommu_table,
> + phys_addr_t top_paddr, unsigned int top_level)
> +{
> +}
> +
> +static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
> +{
> + struct noiommu_domain *domain =
> + container_of(iommupt, struct noiommu_domain, amdv1.iommu);
> +
> + return &domain->lock;
> +}
> +
> +static const struct pt_iommu_driver_ops noiommu_driver_ops = {
> + .get_top_lock = noiommu_get_top_lock,
> + .change_top = noiommu_change_top,
> +};
> +
> +static struct iommu_domain *
> +noiommu_alloc_paging_flags(struct device *dev, u32 flags,
> + const struct iommu_user_data *user_data)
> +{
> + struct pt_iommu_amdv1_cfg cfg = {};
> + struct noiommu_domain *dom;
> + int rc;
> +
> + if (flags || user_data)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + cfg.common.hw_max_vasz_lg2 = 64;
> + cfg.common.hw_max_oasz_lg2 = 52;
> + cfg.starting_level = 2;
> + cfg.common.features =
> + (BIT(PT_FEAT_DYNAMIC_TOP) | BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
> + BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
> +
> + dom = kzalloc(sizeof(*dom), GFP_KERNEL);
> + if (!dom)
> + return ERR_PTR(-ENOMEM);
> +
> + spin_lock_init(&dom->lock);
> + dom->amdv1.iommu.nid = NUMA_NO_NODE;
> + dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
> + dom->domain.ops = &noiommu_amdv1_ops;
> +
> + /* Use mock page table which is based on AMDV1 */
> + rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
> + if (rc) {
> + kfree(dom);
> + return ERR_PTR(rc);
> + }
> +
> + return &dom->domain;
> +}
> +
> +static void noiommu_domain_free(struct iommu_domain *iommu_domain)
> +{
> + struct noiommu_domain *domain =
> + container_of(iommu_domain, struct noiommu_domain, domain);
> +
> + pt_iommu_deinit(&domain->amdv1.iommu);
> + kfree(domain);
> +}
> +
> +static const struct iommu_domain_ops noiommu_amdv1_ops = {
> + IOMMU_PT_DOMAIN_OPS(amdv1),
I see the appeal of reusing an existing page table implementation to
keep track of IOVAs, which (as far as I understand) are later used as
tokens for DMA-pinned pages, but please at least add a paragraph about
that, as it is not immediately clear and it is a different design
from the legacy noiommu VFIO code.
Thanks,
Mostafa
> + .free = noiommu_domain_free,
> +};
> +
> +struct iommu_ops iommufd_noiommu_ops = {
> + .domain_alloc_paging_flags = noiommu_alloc_paging_flags,
> +};
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 6ac1965199e9..9c18c5eb1899 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -464,6 +464,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
> refcount_dec(&hwpt->obj.users);
> }
>
> +extern struct iommu_ops iommufd_noiommu_ops;
> +
> struct iommufd_attach;
>
> struct iommufd_group {
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 02/11] iommufd: Move igroup allocation to a function
2026-03-12 15:56 ` [PATCH V2 02/11] iommufd: Move igroup allocation to a function Jacob Pan
2026-03-18 18:39 ` Samiullah Khawaja
@ 2026-03-22 9:41 ` Mostafa Saleh
2026-03-23 22:51 ` Jacob Pan
2026-03-23 16:46 ` Samiullah Khawaja
2 siblings, 1 reply; 31+ messages in thread
From: Mostafa Saleh @ 2026-03-22 9:41 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
On Thu, Mar 12, 2026 at 08:56:28AM -0700, Jacob Pan wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> So it can be reused in the next patch, which allows binding to a noiommu
> device.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> drivers/iommu/iommufd/device.c | 48 +++++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 344d620cdecc..54d73016468f 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -30,8 +30,9 @@ static void iommufd_group_release(struct kref *kref)
>
> WARN_ON(!xa_empty(&igroup->pasid_attach));
>
> - xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
> - NULL, GFP_KERNEL);
> + if (igroup->group)
> + xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
> + igroup, NULL, GFP_KERNEL);
Is that a separate fix, or does it perhaps belong in the next patch,
which makes it possible to have NULL groups?
Thanks,
Mostafa
> iommu_group_put(igroup->group);
> mutex_destroy(&igroup->lock);
> kfree(igroup);
> @@ -56,6 +57,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
> return kref_get_unless_zero(&igroup->ref);
> }
>
> +static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
> + struct iommu_group *group)
> +{
> + struct iommufd_group *new_igroup;
> +
> + new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
> + if (!new_igroup)
> + return ERR_PTR(-ENOMEM);
> +
> + kref_init(&new_igroup->ref);
> + mutex_init(&new_igroup->lock);
> + xa_init(&new_igroup->pasid_attach);
> + new_igroup->sw_msi_start = PHYS_ADDR_MAX;
> + /* group reference moves into new_igroup */
> + new_igroup->group = group;
> +
> + /*
> + * The ictx is not additionally refcounted here becase all objects using
> + * an igroup must put it before their destroy completes.
> + */
> + new_igroup->ictx = ictx;
> + return new_igroup;
> +}
> +
> /*
> * iommufd needs to store some more data for each iommu_group, we keep a
> * parallel xarray indexed by iommu_group id to hold this instead of putting it
> @@ -87,25 +112,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
> }
> xa_unlock(&ictx->groups);
>
> - new_igroup = kzalloc_obj(*new_igroup);
> - if (!new_igroup) {
> + new_igroup = iommufd_alloc_group(ictx, group);
> + if (IS_ERR(new_igroup)) {
> iommu_group_put(group);
> - return ERR_PTR(-ENOMEM);
> + return new_igroup;
> }
>
> - kref_init(&new_igroup->ref);
> - mutex_init(&new_igroup->lock);
> - xa_init(&new_igroup->pasid_attach);
> - new_igroup->sw_msi_start = PHYS_ADDR_MAX;
> - /* group reference moves into new_igroup */
> - new_igroup->group = group;
> -
> - /*
> - * The ictx is not additionally refcounted here becase all objects using
> - * an igroup must put it before their destroy completes.
> - */
> - new_igroup->ictx = ictx;
> -
> /*
> * We dropped the lock so igroup is invalid. NULL is a safe and likely
> * value to assume for the xa_cmpxchg algorithm.
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 03/11] iommufd: Allow binding to a noiommu device
2026-03-12 15:56 ` [PATCH V2 03/11] iommufd: Allow binding to a noiommu device Jacob Pan
@ 2026-03-22 9:54 ` Mostafa Saleh
2026-03-23 13:20 ` Jason Gunthorpe
2026-03-24 19:13 ` Jacob Pan
0 siblings, 2 replies; 31+ messages in thread
From: Mostafa Saleh @ 2026-03-22 9:54 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
On Thu, Mar 12, 2026 at 08:56:29AM -0700, Jacob Pan wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> Allow iommufd to bind devices without an IOMMU (noiommu mode) by creating
> a dummy IOMMU group for such devices and skipping hwpt operations.
>
> This enables noiommu devices to operate through the same iommufd API as IOMMU-
> capable devices.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> drivers/iommu/iommufd/device.c | 113 ++++++++++++++++++++++-----------
> 1 file changed, 76 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 54d73016468f..c38d3efa3d6f 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -23,6 +23,11 @@ struct iommufd_attach {
> struct xarray device_array;
> };
>
> +static bool is_vfio_noiommu(struct iommufd_device *idev)
> +{
> + return !device_iommu_mapped(idev->dev) || !idev->dev->iommu;
Does this need to check for CONFIG_VFIO_NOIOMMU, and maybe the module
param enable_unsafe_noiommu_mode, similar to the legacy implementation?
> +}
> +
> static void iommufd_group_release(struct kref *kref)
> {
> struct iommufd_group *igroup =
> @@ -205,32 +210,17 @@ void iommufd_device_destroy(struct iommufd_object *obj)
> struct iommufd_device *idev =
> container_of(obj, struct iommufd_device, obj);
>
> - iommu_device_release_dma_owner(idev->dev);
> + if (!is_vfio_noiommu(idev))
> + iommu_device_release_dma_owner(idev->dev);
> iommufd_put_group(idev->igroup);
> if (!iommufd_selftest_is_mock_dev(idev->dev))
> iommufd_ctx_put(idev->ictx);
> }
>
> -/**
> - * iommufd_device_bind - Bind a physical device to an iommu fd
> - * @ictx: iommufd file descriptor
> - * @dev: Pointer to a physical device struct
> - * @id: Output ID number to return to userspace for this device
> - *
> - * A successful bind establishes an ownership over the device and returns
> - * struct iommufd_device pointer, otherwise returns error pointer.
> - *
> - * A driver using this API must set driver_managed_dma and must not touch
> - * the device until this routine succeeds and establishes ownership.
> - *
> - * Binding a PCI device places the entire RID under iommufd control.
> - *
> - * The caller must undo this with iommufd_device_unbind()
> - */
> -struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> - struct device *dev, u32 *id)
> +static int iommufd_bind_iommu(struct iommufd_device *idev)
> {
> - struct iommufd_device *idev;
> + struct iommufd_ctx *ictx = idev->ictx;
> + struct device *dev = idev->dev;
> struct iommufd_group *igroup;
> int rc;
>
> @@ -239,11 +229,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> * to restore cache coherency.
> */
> if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
> - return ERR_PTR(-EINVAL);
> + return -EINVAL;
>
> - igroup = iommufd_get_group(ictx, dev);
> + igroup = iommufd_get_group(idev->ictx, dev);
> if (IS_ERR(igroup))
> - return ERR_CAST(igroup);
> + return PTR_ERR(igroup);
>
> /*
> * For historical compat with VFIO the insecure interrupt path is
> @@ -269,21 +259,66 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> if (rc)
> goto out_group_put;
>
> + /* igroup refcount moves into iommufd_device */
> + idev->igroup = igroup;
> + return 0;
> +
> +out_group_put:
> + iommufd_put_group(igroup);
> + return rc;
> +}
> +
> +/**
> + * iommufd_device_bind - Bind a physical device to an iommu fd
> + * @ictx: iommufd file descriptor
> + * @dev: Pointer to a physical device struct
> + * @id: Output ID number to return to userspace for this device
> + *
> + * A successful bind establishes an ownership over the device and returns
> + * struct iommufd_device pointer, otherwise returns error pointer.
> + *
> + * A driver using this API must set driver_managed_dma and must not touch
> + * the device until this routine succeeds and establishes ownership.
> + *
> + * Binding a PCI device places the entire RID under iommufd control.
> + *
> + * The caller must undo this with iommufd_device_unbind()
> + */
> +struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> + struct device *dev, u32 *id)
> +{
> + struct iommufd_device *idev;
> + int rc;
> +
> idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
The code below introduces new error cases; do those need to be cleaned
up by calling iommufd_object_abort_and_destroy()?
Thanks,
Mostafa
> - if (IS_ERR(idev)) {
> - rc = PTR_ERR(idev);
> - goto out_release_owner;
> - }
> + if (IS_ERR(idev))
> + return idev;
> idev->ictx = ictx;
> - if (!iommufd_selftest_is_mock_dev(dev))
> - iommufd_ctx_get(ictx);
> idev->dev = dev;
> idev->enforce_cache_coherency =
> device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
> +
> + if (!is_vfio_noiommu(idev)) {
> + rc = iommufd_bind_iommu(idev);
> + if (rc)
> + return ERR_PTR(rc);
> + } else {
> + struct iommufd_group *igroup;
> +
> + /*
> + * Create a dummy igroup, lots of stuff expects ths igroup to be
> + * present, but a NULL igroup->group is OK
> + */
> + igroup = iommufd_alloc_group(ictx, NULL);
> + if (IS_ERR(igroup))
> + return ERR_CAST(igroup);
> + idev->igroup = igroup;
> + }
> +
> + if (!iommufd_selftest_is_mock_dev(dev))
> + iommufd_ctx_get(ictx);
> /* The calling driver is a user until iommufd_device_unbind() */
> refcount_inc(&idev->obj.users);
> - /* igroup refcount moves into iommufd_device */
> - idev->igroup = igroup;
>
> /*
> * If the caller fails after this success it must call
> @@ -295,11 +330,6 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> *id = idev->obj.id;
> return idev;
>
> -out_release_owner:
> - iommu_device_release_dma_owner(dev);
> -out_group_put:
> - iommufd_put_group(igroup);
> - return ERR_PTR(rc);
> }
> EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD");
>
> @@ -513,6 +543,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
> struct iommufd_attach_handle *handle;
> int rc;
>
> + if (is_vfio_noiommu(idev))
> + return 0;
> +
> if (!iommufd_hwpt_compatible_device(hwpt, idev))
> return -EINVAL;
>
> @@ -560,6 +593,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
> {
> struct iommufd_attach_handle *handle;
>
> + if (is_vfio_noiommu(idev))
> + return;
> +
> handle = iommufd_device_get_attach_handle(idev, pasid);
> if (pasid == IOMMU_NO_PASID)
> iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> @@ -578,6 +614,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
> struct iommufd_attach_handle *handle, *old_handle;
> int rc;
>
> + if (is_vfio_noiommu(idev))
> + return 0;
> +
> if (!iommufd_hwpt_compatible_device(hwpt, idev))
> return -EINVAL;
>
> @@ -653,7 +692,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
> goto err_release_devid;
> }
>
> - if (attach_resv) {
> + if (attach_resv && !is_vfio_noiommu(idev)) {
> rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
> if (rc)
> goto err_release_devid;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 05/11] vfio: Allow null group for noiommu without containers
2026-03-12 15:56 ` [PATCH V2 05/11] vfio: Allow null group for noiommu without containers Jacob Pan
@ 2026-03-22 9:59 ` Mostafa Saleh
2026-03-23 13:21 ` Jason Gunthorpe
0 siblings, 1 reply; 31+ messages in thread
From: Mostafa Saleh @ 2026-03-22 9:59 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
On Thu, Mar 12, 2026 at 08:56:31AM -0700, Jacob Pan wrote:
> In case of noiommu mode is enabled for VFIO cdev without VFIO container
> nor IOMMUFD provided compatibility container, there is no need to
> create a dummy group. Update the group operations to tolerate null group
> pointer.
>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> drivers/vfio/group.c | 14 ++++++++++++++
> drivers/vfio/vfio.h | 17 +++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 4f15016d2a5f..98f2a4f2ebff 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -381,6 +381,9 @@ int vfio_device_block_group(struct vfio_device *device)
> struct vfio_group *group = device->group;
> int ret = 0;
>
> + if (vfio_null_group_allowed() && !group)
> + return 0;
> +
> mutex_lock(&group->group_lock);
> if (group->opened_file) {
> ret = -EBUSY;
> @@ -398,6 +401,9 @@ void vfio_device_unblock_group(struct vfio_device *device)
> {
> struct vfio_group *group = device->group;
>
> + if (vfio_null_group_allowed() && !group)
> + return;
> +
> mutex_lock(&group->group_lock);
> group->cdev_device_open_cnt--;
> mutex_unlock(&group->group_lock);
> @@ -589,6 +595,14 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
> struct vfio_group *group;
> int ret;
>
> + /*
> + * With noiommu enabled under cdev interface only, there is no need to
> + * create a vfio_group if the group based containers are not enabled.
> + * The cdev interface is exclusively used for iommufd.
> + */
> + if (vfio_null_group_allowed())
> + return NULL;
> +
Now vfio_device_set_group() can return NULL when called from
__vfio_register_dev(), where the error path calls
vfio_device_remove_group(), which I believe would break.
But is that really needed? I feel like this optimization is not worth
the extra effort of adding those checks and the possibility of missing
some. What do you think?
Thanks,
Mostafa
> iommu_group = iommu_group_alloc();
> if (IS_ERR(iommu_group))
> return ERR_CAST(iommu_group);
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 50128da18bca..838c08077ce2 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -113,6 +113,18 @@ bool vfio_device_has_container(struct vfio_device *device);
> int __init vfio_group_init(void);
> void vfio_group_cleanup(void);
>
> +/*
> + * With noiommu enabled and no containers are supported, allow devices that
> + * don't have a dummy group.
> + */
> +static inline bool vfio_null_group_allowed(void)
> +{
> + if (vfio_noiommu && (!IS_ENABLED(CONFIG_VFIO_CONTAINER) && !IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)))
> + return true;
> +
> + return false;
> +}
> +
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> @@ -189,6 +201,11 @@ static inline void vfio_group_cleanup(void)
> {
> }
>
> +static inline bool vfio_null_group_allowed(void)
> +{
> + return false;
> +}
> +
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> return false;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 06/11] vfio: Introduce and set noiommu flag on vfio_device
2026-03-12 15:56 ` [PATCH V2 06/11] vfio: Introduce and set noiommu flag on vfio_device Jacob Pan
@ 2026-03-22 10:02 ` Mostafa Saleh
0 siblings, 0 replies; 31+ messages in thread
From: Mostafa Saleh @ 2026-03-22 10:02 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
On Thu, Mar 12, 2026 at 08:56:32AM -0700, Jacob Pan wrote:
> When a VFIO device is added to a noiommu group, set the noiommu flag on
> the vfio_device structure to indicate that the device operates in
> noiommu mode.
>
> Also update function signatures to pass vfio_device instead of device,
> which has the direct access to the noiommu flag.
>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Thanks,
Mostafa
> ---
> drivers/vfio/group.c | 21 +++++++++++----------
> include/linux/vfio.h | 1 +
> 2 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 98f2a4f2ebff..6f98c57de9e0 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -588,7 +588,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> return ret;
> }
>
> -static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
> +static struct vfio_group *vfio_noiommu_group_alloc(struct vfio_device *vdev,
> enum vfio_group_type type)
> {
> struct iommu_group *iommu_group;
> @@ -610,7 +610,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
> ret = iommu_group_set_name(iommu_group, "vfio-noiommu");
> if (ret)
> goto out_put_group;
> - ret = iommu_group_add_device(iommu_group, dev);
> + ret = iommu_group_add_device(iommu_group, vdev->dev);
> if (ret)
> goto out_put_group;
>
> @@ -625,7 +625,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
> return group;
>
> out_remove_device:
> - iommu_group_remove_device(dev);
> + iommu_group_remove_device(vdev->dev);
> out_put_group:
> iommu_group_put(iommu_group);
> return ERR_PTR(ret);
> @@ -646,23 +646,24 @@ static bool vfio_group_has_device(struct vfio_group *group, struct device *dev)
> return false;
> }
>
> -static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> +static struct vfio_group *vfio_group_find_or_alloc(struct vfio_device *vdev)
> {
> struct iommu_group *iommu_group;
> struct vfio_group *group;
>
> - iommu_group = iommu_group_get(dev);
> + iommu_group = iommu_group_get(vdev->dev);
> if (!iommu_group && vfio_noiommu) {
> + vdev->noiommu = 1;
> /*
> * With noiommu enabled, create an IOMMU group for devices that
> * don't already have one, implying no IOMMU hardware/driver
> * exists. Taint the kernel because we're about to give a DMA
> * capable device to a user without IOMMU protection.
> */
> - group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> + group = vfio_noiommu_group_alloc(vdev, VFIO_NO_IOMMU);
> if (!IS_ERR(group)) {
> add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> - dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
> + dev_warn(vdev->dev, "Adding kernel taint for vfio-noiommu group on device\n");
> }
> return group;
> }
> @@ -673,7 +674,7 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> mutex_lock(&vfio.group_lock);
> group = vfio_group_find_from_iommu(iommu_group);
> if (group) {
> - if (WARN_ON(vfio_group_has_device(group, dev)))
> + if (WARN_ON(vfio_group_has_device(group, vdev->dev)))
> group = ERR_PTR(-EINVAL);
> else
> refcount_inc(&group->drivers);
> @@ -693,9 +694,9 @@ int vfio_device_set_group(struct vfio_device *device,
> struct vfio_group *group;
>
> if (type == VFIO_IOMMU)
> - group = vfio_group_find_or_alloc(device->dev);
> + group = vfio_group_find_or_alloc(device);
> else
> - group = vfio_noiommu_group_alloc(device->dev, type);
> + group = vfio_noiommu_group_alloc(device, type);
>
> if (IS_ERR(group))
> return PTR_ERR(group);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index e90859956514..844d14839f96 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -72,6 +72,7 @@ struct vfio_device {
> u8 iommufd_attached:1;
> #endif
> u8 cdev_opened:1;
> + u8 noiommu:1;
> #ifdef CONFIG_DEBUG_FS
> /*
> * debug_root is a static property of the vfio_device
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 07/11] vfio: Update noiommu device detection logic for cdev
2026-03-12 15:56 ` [PATCH V2 07/11] vfio: Update noiommu device detection logic for cdev Jacob Pan
@ 2026-03-22 10:04 ` Mostafa Saleh
0 siblings, 0 replies; 31+ messages in thread
From: Mostafa Saleh @ 2026-03-22 10:04 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
On Thu, Mar 12, 2026 at 08:56:33AM -0700, Jacob Pan wrote:
> Rework vfio_device_is_noiommu() to derive noiommu mode based on device,
> group type, and configurations.
>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> ---
> drivers/vfio/vfio.h | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 838c08077ce2..c5541967ef9b 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -127,8 +127,13 @@ static inline bool vfio_null_group_allowed(void)
>
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> - return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> - vdev->group->type == VFIO_NO_IOMMU;
> + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
> + return false;
> +
> + if (vfio_null_group_allowed())
> + return vdev->noiommu;
> +
> + return vdev->group->type == VFIO_NO_IOMMU;
I see that noiommu is set in both cases; can this just be simplified to:
return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && vdev->noiommu;
Thanks,
Mostafa
> }
> #else
> struct vfio_group;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu
2026-03-18 18:38 ` Samiullah Khawaja
@ 2026-03-23 13:17 ` Jason Gunthorpe
2026-03-24 17:42 ` Samiullah Khawaja
0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2026-03-23 13:17 UTC (permalink / raw)
To: Samiullah Khawaja
Cc: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Alex Williamson,
Joerg Roedel, David Matlack, Robin Murphy, Nicolin Chen,
Tian, Kevin, Yi Liu, pasha.tatashin, Will Deacon, Baolu Lu
On Wed, Mar 18, 2026 at 06:38:14PM +0000, Samiullah Khawaja wrote:
> > +static const struct iommu_domain_ops noiommu_amdv1_ops = {
> > + IOMMU_PT_DOMAIN_OPS(amdv1),
>
> I understand that this fits in really well into the iommufd/hwpt
> construction, but do we need page tables for this as all the
> iova-to-phys information should be available in the IOPT in IOAS?
Yes we do! That is the whole point.
In iommufd once you pin the memory the phys is stored in only two
possible ways:
1) Inside an xarray if an access is used
2) Inside at least one iommu_domain
That's it. So to fit noiommu into this scheme, and have it rely on the
existing pinning, we either have to make it use an access or make it
use an iommu_domain -> a real one that can store phys.
Maybe a comment is helpful, but using the domain like this to store
the pinned phys has been the vfio design from day 1..
> get_pa() function introduced in the later patch is only used for noiommu
> use-cases, it can use the IOPT to get the physical addresses?
No.
Jason
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 03/11] iommufd: Allow binding to a noiommu device
2026-03-22 9:54 ` Mostafa Saleh
@ 2026-03-23 13:20 ` Jason Gunthorpe
2026-03-24 19:13 ` Jacob Pan
1 sibling, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2026-03-23 13:20 UTC (permalink / raw)
To: Mostafa Saleh
Cc: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Alex Williamson,
Joerg Roedel, David Matlack, Robin Murphy, Nicolin Chen,
Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin, Will Deacon,
Baolu Lu
On Sun, Mar 22, 2026 at 09:54:15AM +0000, Mostafa Saleh wrote:
> > +struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> > + struct device *dev, u32 *id)
> > +{
> > + struct iommufd_device *idev;
> > + int rc;
> > +
> > idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE);
>
> The next code introduces new error cases, do that need to be cleaned in
> that case by calling iommufd_object_abort_and_destroy()?
It should probably use iommufd_object_alloc_ucmd() so the core code
manages the lifecycle?
Jason
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 05/11] vfio: Allow null group for noiommu without containers
2026-03-22 9:59 ` Mostafa Saleh
@ 2026-03-23 13:21 ` Jason Gunthorpe
0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2026-03-23 13:21 UTC (permalink / raw)
To: Mostafa Saleh
Cc: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Alex Williamson,
Joerg Roedel, David Matlack, Robin Murphy, Nicolin Chen,
Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin, Will Deacon,
Baolu Lu
On Sun, Mar 22, 2026 at 09:59:15AM +0000, Mostafa Saleh wrote:
> But is that really needed? I feel like this optimization is not worth
> the extra effort of adding those checks and the possibility of missing
> some. What do you think?
I don't think it is an optimization; this is cleaning up the uapi-visible
stuff to not require these hacks anymore.
Jason
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 02/11] iommufd: Move igroup allocation to a function
2026-03-12 15:56 ` [PATCH V2 02/11] iommufd: Move igroup allocation to a function Jacob Pan
2026-03-18 18:39 ` Samiullah Khawaja
2026-03-22 9:41 ` Mostafa Saleh
@ 2026-03-23 16:46 ` Samiullah Khawaja
2 siblings, 0 replies; 31+ messages in thread
From: Samiullah Khawaja @ 2026-03-23 16:46 UTC (permalink / raw)
To: Jacob Pan
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, pasha.tatashin, Will Deacon,
Baolu Lu
On Thu, Mar 12, 2026 at 08:56:28AM -0700, Jacob Pan wrote:
>From: Jason Gunthorpe <jgg@nvidia.com>
>
>So it can be reused in the next patch which allows binding to noiommu
>device.
>
>Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
>---
> drivers/iommu/iommufd/device.c | 48 +++++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 18 deletions(-)
>
>diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
>index 344d620cdecc..54d73016468f 100644
>--- a/drivers/iommu/iommufd/device.c
>+++ b/drivers/iommu/iommufd/device.c
>@@ -30,8 +30,9 @@ static void iommufd_group_release(struct kref *kref)
>
> WARN_ON(!xa_empty(&igroup->pasid_attach));
>
>- xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
>- NULL, GFP_KERNEL);
>+ if (igroup->group)
>+ xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
>+ igroup, NULL, GFP_KERNEL);
> iommu_group_put(igroup->group);
> mutex_destroy(&igroup->lock);
> kfree(igroup);
>@@ -56,6 +57,30 @@ static bool iommufd_group_try_get(struct iommufd_group *igroup,
> return kref_get_unless_zero(&igroup->ref);
> }
>
>+static struct iommufd_group *iommufd_alloc_group(struct iommufd_ctx *ictx,
>+ struct iommu_group *group)
>+{
>+ struct iommufd_group *new_igroup;
>+
>+ new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
>+ if (!new_igroup)
>+ return ERR_PTR(-ENOMEM);
>+
>+ kref_init(&new_igroup->ref);
>+ mutex_init(&new_igroup->lock);
>+ xa_init(&new_igroup->pasid_attach);
>+ new_igroup->sw_msi_start = PHYS_ADDR_MAX;
>+ /* group reference moves into new_igroup */
>+ new_igroup->group = group;
>+
>+ /*
>+ * The ictx is not additionally refcounted here becase all objects using
>+ * an igroup must put it before their destroy completes.
>+ */
>+ new_igroup->ictx = ictx;
>+ return new_igroup;
>+}
>+
> /*
> * iommufd needs to store some more data for each iommu_group, we keep a
> * parallel xarray indexed by iommu_group id to hold this instead of putting it
>@@ -87,25 +112,12 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
> }
> xa_unlock(&ictx->groups);
>
>- new_igroup = kzalloc_obj(*new_igroup);
>- if (!new_igroup) {
>+ new_igroup = iommufd_alloc_group(ictx, group);
>+ if (IS_ERR(new_igroup)) {
> iommu_group_put(group);
>- return ERR_PTR(-ENOMEM);
>+ return new_igroup;
> }
>
>- kref_init(&new_igroup->ref);
>- mutex_init(&new_igroup->lock);
>- xa_init(&new_igroup->pasid_attach);
>- new_igroup->sw_msi_start = PHYS_ADDR_MAX;
>- /* group reference moves into new_igroup */
>- new_igroup->group = group;
>-
>- /*
>- * The ictx is not additionally refcounted here becase all objects using
>- * an igroup must put it before their destroy completes.
>- */
>- new_igroup->ictx = ictx;
>-
> /*
> * We dropped the lock so igroup is invalid. NULL is a safe and likely
> * value to assume for the xa_cmpxchg algorithm.
>--
>2.34.1
>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu
2026-03-22 9:24 ` Mostafa Saleh
@ 2026-03-23 21:11 ` Jacob Pan
2026-03-23 22:10 ` Jason Gunthorpe
0 siblings, 1 reply; 31+ messages in thread
From: Jacob Pan @ 2026-03-23 21:11 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
Hi Mostafa,
On Sun, 22 Mar 2026 09:24:37 +0000
Mostafa Saleh <smostafa@google.com> wrote:
> On Thu, Mar 12, 2026 at 08:56:27AM -0700, Jacob Pan wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> >
> > Create just a little part of a real iommu driver, enough to
> > slot in under the dev_iommu_ops() and allow iommufd to call
> > domain_alloc_paging_flags() and fail everything else.
> >
> > This allows explicitly creating a HWPT under an IOAS.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > drivers/iommu/iommufd/Makefile | 1 +
> > drivers/iommu/iommufd/hw_pagetable.c | 11 ++-
> > drivers/iommu/iommufd/hwpt_noiommu.c | 91
> > +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h |
> > 2 + 4 files changed, 103 insertions(+), 2 deletions(-)
> > create mode 100644 drivers/iommu/iommufd/hwpt_noiommu.c
> >
> > diff --git a/drivers/iommu/iommufd/Makefile
> > b/drivers/iommu/iommufd/Makefile index 71d692c9a8f4..2b1a020b14a6
> > 100644 --- a/drivers/iommu/iommufd/Makefile
> > +++ b/drivers/iommu/iommufd/Makefile
> > @@ -10,6 +10,7 @@ iommufd-y := \
> > vfio_compat.o \
> > viommu.o
> >
> > +iommufd-$(CONFIG_VFIO_NOIOMMU) += hwpt_noiommu.o
> > iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
> >
> > obj-$(CONFIG_IOMMUFD) += iommufd.o
> > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> > b/drivers/iommu/iommufd/hw_pagetable.c index
> > fe789c2dc0c9..37316d77277d 100644 ---
> > a/drivers/iommu/iommufd/hw_pagetable.c +++
> > b/drivers/iommu/iommufd/hw_pagetable.c @@ -8,6 +8,13 @@
> > #include "../iommu-priv.h"
> > #include "iommufd_private.h"
> >
> > +static const struct iommu_ops *get_iommu_ops(struct iommufd_device
> > *idev) +{
> > + if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> > !idev->igroup->group)
> > + return &iommufd_noiommu_ops;
> > + return dev_iommu_ops(idev->dev);
> > +}
> > +
> > static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable
> > *hwpt) {
> > if (hwpt->domain)
> > @@ -114,7 +121,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx
> > *ictx, struct iommufd_ioas *ioas, IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> > IOMMU_HWPT_FAULT_ID_VALID |
> > IOMMU_HWPT_ALLOC_PASID;
> > - const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> > + const struct iommu_ops *ops = get_iommu_ops(idev);
> > struct iommufd_hwpt_paging *hwpt_paging;
> > struct iommufd_hw_pagetable *hwpt;
> > int rc;
> > @@ -229,7 +236,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx
> > *ictx, struct iommufd_device *idev, u32 flags,
> > const struct iommu_user_data *user_data)
> > {
> > - const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> > + const struct iommu_ops *ops = get_iommu_ops(idev);
> > struct iommufd_hwpt_nested *hwpt_nested;
> > struct iommufd_hw_pagetable *hwpt;
> > int rc;
> > diff --git a/drivers/iommu/iommufd/hwpt_noiommu.c
> > b/drivers/iommu/iommufd/hwpt_noiommu.c new file mode 100644
> > index 000000000000..0aa99f581ca3
> > --- /dev/null
> > +++ b/drivers/iommu/iommufd/hwpt_noiommu.c
> > @@ -0,0 +1,91 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
> > + */
> > +#include <linux/iommu.h>
> > +#include <linux/generic_pt/iommu.h>
> > +#include "iommufd_private.h"
> > +
> > +static const struct iommu_domain_ops noiommu_amdv1_ops;
> > +
> > +struct noiommu_domain {
> > + union {
> > + struct iommu_domain domain;
> > + struct pt_iommu_amdv1 amdv1;
> > + };
> > + spinlock_t lock;
> > +};
> > +PT_IOMMU_CHECK_DOMAIN(struct noiommu_domain, amdv1.iommu, domain);
> > +
> > +static void noiommu_change_top(struct pt_iommu *iommu_table,
> > + phys_addr_t top_paddr, unsigned int
> > top_level) +{
> > +}
> > +
> > +static spinlock_t *noiommu_get_top_lock(struct pt_iommu *iommupt)
> > +{
> > + struct noiommu_domain *domain =
> > + container_of(iommupt, struct noiommu_domain,
> > amdv1.iommu); +
> > + return &domain->lock;
> > +}
> > +
> > +static const struct pt_iommu_driver_ops noiommu_driver_ops = {
> > + .get_top_lock = noiommu_get_top_lock,
> > + .change_top = noiommu_change_top,
> > +};
> > +
> > +static struct iommu_domain *
> > +noiommu_alloc_paging_flags(struct device *dev, u32 flags,
> > + const struct iommu_user_data *user_data)
> > +{
> > + struct pt_iommu_amdv1_cfg cfg = {};
> > + struct noiommu_domain *dom;
> > + int rc;
> > +
> > + if (flags || user_data)
> > + return ERR_PTR(-EOPNOTSUPP);
> > +
> > + cfg.common.hw_max_vasz_lg2 = 64;
> > + cfg.common.hw_max_oasz_lg2 = 52;
> > + cfg.starting_level = 2;
> > + cfg.common.features =
> > + (BIT(PT_FEAT_DYNAMIC_TOP) |
> > BIT(PT_FEAT_AMDV1_ENCRYPT_TABLES) |
> > + BIT(PT_FEAT_AMDV1_FORCE_COHERENCE));
> > +
> > + dom = kzalloc(sizeof(*dom), GFP_KERNEL);
> > + if (!dom)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + spin_lock_init(&dom->lock);
> > + dom->amdv1.iommu.nid = NUMA_NO_NODE;
> > + dom->amdv1.iommu.driver_ops = &noiommu_driver_ops;
> > + dom->domain.ops = &noiommu_amdv1_ops;
> > +
> > + /* Use mock page table which is based on AMDV1 */
> > + rc = pt_iommu_amdv1_init(&dom->amdv1, &cfg, GFP_KERNEL);
> > + if (rc) {
> > + kfree(dom);
> > + return ERR_PTR(rc);
> > + }
> > +
> > + return &dom->domain;
> > +}
> > +
> > +static void noiommu_domain_free(struct iommu_domain *iommu_domain)
> > +{
> > + struct noiommu_domain *domain =
> > + container_of(iommu_domain, struct noiommu_domain, domain);
> > +
> > + pt_iommu_deinit(&domain->amdv1.iommu);
> > + kfree(domain);
> > +}
> > +
> > +static const struct iommu_domain_ops noiommu_amdv1_ops = {
> > + IOMMU_PT_DOMAIN_OPS(amdv1),
>
> I see the appeal of re-using an existing page table implementation to
> keep track of iovas which -as far as I understand- are used as tokens
> for DMA pinned pages later, but maybe at least add some paragraph
> about that, as it is not immediately clear and that's a different
> design from the legacy noiommu VFIO code.
>
Indeed it is a little confusing that we reuse the same VFIO noiommu
knobs but with an extended set of features. The legacy VFIO noiommu mode
does not support container/IOAS-level APIs, so there was no need for
domain ops. I also tried to explain the new design in the doc patch
[11/11], which summarizes the API differences between legacy VFIO
noiommu mode and this new mode under iommufd.
+-------------------+---------------------+---------------------+
| Feature | VFIO group | VFIO device cdev |
+===================+=====================+=====================+
| VFIO device UAPI | Yes | Yes |
+-------------------+---------------------+---------------------+
| VFIO container | No | No |
+-------------------+---------------------+---------------------+
| IOMMUFD IOAS | No | Yes* |
+-------------------+---------------------+---------------------+
How about adding the following comments:
@@ -81,6 +81,17 @@ static void noiommu_domain_free(struct iommu_domain *iommu_domain)
kfree(domain);
}
+/*
+ * AMDV1 is used as a dummy page table for no-IOMMU mode, similar to the
+ * iommufd selftest mock page table.
+ * Unlike legacy VFIO no-IOMMU mode, where no container-level APIs are
+ * supported, this allows IOAS and hwpt objects to exist without hardware
+ * IOMMU support. IOVAs are used only for IOVA-to-PA lookups, not for
+ * hardware DMA translation.
+ *
+ * This is only used with iommufd and cdev-based interfaces and does not
+ * apply to legacy VFIO group-container based noiommu mode.
+ */
static const struct iommu_domain_ops noiommu_amdv1_ops = {
IOMMU_PT_DOMAIN_OPS(amdv1),
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu
2026-03-23 21:11 ` Jacob Pan
@ 2026-03-23 22:10 ` Jason Gunthorpe
0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2026-03-23 22:10 UTC (permalink / raw)
To: Jacob Pan
Cc: Mostafa Saleh, linux-kernel, iommu@lists.linux.dev,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
On Mon, Mar 23, 2026 at 02:11:32PM -0700, Jacob Pan wrote:
> +/*
> + * AMDV1 is used as a dummy page table for no-IOMMU mode, similar to the
> + * iommufd selftest mock page table.
> + * Unlike legacy VFIO no-IOMMU mode, where no container-level APIs are
> + * supported, this allows IOAS and hwpt objects to exist without hardware
> + * IOMMU support. IOVAs are used only for IOVA-to-PA lookups, not for
> + * hardware DMA translation.
> + *
> + * This is only used with iommufd and cdev-based interfaces and does not
> + * apply to legacy VFIO group-container based noiommu mode.
> + */
> static const struct iommu_domain_ops noiommu_amdv1_ops = {
That seems clear
Jason
* Re: [PATCH V2 02/11] iommufd: Move igroup allocation to a function
2026-03-22 9:41 ` Mostafa Saleh
@ 2026-03-23 22:51 ` Jacob Pan
0 siblings, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-23 22:51 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
Hi Mostafa,
On Sun, 22 Mar 2026 09:41:17 +0000
Mostafa Saleh <smostafa@google.com> wrote:
> On Thu, Mar 12, 2026 at 08:56:28AM -0700, Jacob Pan wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> >
> > So it can be reused in the next patch which allows binding to
> > noiommu device.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > drivers/iommu/iommufd/device.c | 48 +++++++++++++++++++++-------------
> > 1 file changed, 30 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 344d620cdecc..54d73016468f 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -30,8 +30,9 @@ static void iommufd_group_release(struct kref *kref)
> > WARN_ON(!xa_empty(&igroup->pasid_attach));
> >
> > - xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
> > - NULL, GFP_KERNEL);
> > + if (igroup->group)
> > + xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group),
> > + igroup, NULL, GFP_KERNEL);
>
> Is that a separate fix or perhaps belongs to the next patch making
> it possible to have NULL groups.
Yes, I agree. NULL groups are not possible at this point in the series,
so this check should be merged into the next patch.
* Re: [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu
2026-03-23 13:17 ` Jason Gunthorpe
@ 2026-03-24 17:42 ` Samiullah Khawaja
0 siblings, 0 replies; 31+ messages in thread
From: Samiullah Khawaja @ 2026-03-24 17:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Alex Williamson,
Joerg Roedel, David Matlack, Robin Murphy, Nicolin Chen,
Tian, Kevin, Yi Liu, pasha.tatashin, Will Deacon, Baolu Lu
On Mon, Mar 23, 2026 at 10:17:14AM -0300, Jason Gunthorpe wrote:
>On Wed, Mar 18, 2026 at 06:38:14PM +0000, Samiullah Khawaja wrote:
>
>> > +static const struct iommu_domain_ops noiommu_amdv1_ops = {
>> > + IOMMU_PT_DOMAIN_OPS(amdv1),
>>
>> I understand that this fits really well into the iommufd/hwpt
>> construction, but do we need page tables for this, as all the
>> iova-to-phys information should be available in the IOPT in the IOAS?
>
>Yes we do! That is the whole point.
>
>In iommufd once you pin the memory the phys is stored in only two
>possible ways:
>
>1) Inside an xarray if an access is used
>2) Inside at least one iommu_domain
>
>That's it. So to fit noiommu into this scheme, and have it rely on the
>existing pinning, we either have to make it use an access or make it
>use an iommu_domain -> a real one that can store phys.
Thanks for the explanation.
I missed the part where once pinning is done, the pfns are only
available in those two places.
>
>Maybe a comment is helpful, but using the domain like this to store
>the pinned phys has been the vfio design from day 1..
>
>> get_pa() function introduced in the later patch is only used for noiommu
>> use-cases, it can use the IOPT to get the physical addresses?
>
>No.
>
>Jason
* Re: [PATCH V2 03/11] iommufd: Allow binding to a noiommu device
2026-03-22 9:54 ` Mostafa Saleh
2026-03-23 13:20 ` Jason Gunthorpe
@ 2026-03-24 19:13 ` Jacob Pan
1 sibling, 0 replies; 31+ messages in thread
From: Jacob Pan @ 2026-03-24 19:13 UTC (permalink / raw)
To: Mostafa Saleh
Cc: linux-kernel, iommu@lists.linux.dev, Jason Gunthorpe,
Alex Williamson, Joerg Roedel, David Matlack, Robin Murphy,
Nicolin Chen, Tian, Kevin, Yi Liu, skhawaja, pasha.tatashin,
Will Deacon, Baolu Lu
Hi Mostafa,
On Sun, 22 Mar 2026 09:54:15 +0000
Mostafa Saleh <smostafa@google.com> wrote:
> On Thu, Mar 12, 2026 at 08:56:29AM -0700, Jacob Pan wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> >
> > Allow iommufd to bind devices without an IOMMU (noiommu mode) by
> > creating a dummy IOMMU group for such devices and skipping hwpt
> > operations.
> >
> > This enables noiommu devices to operate through the same iommufd
> > API as IOMMU- capable devices.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> > ---
> > drivers/iommu/iommufd/device.c | 113 ++++++++++++++++++++++-----------
> > 1 file changed, 76 insertions(+), 37 deletions(-)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 54d73016468f..c38d3efa3d6f 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -23,6 +23,11 @@ struct iommufd_attach {
> > struct xarray device_array;
> > };
> >
> > +static bool is_vfio_noiommu(struct iommufd_device *idev)
> > +{
> > + return !device_iommu_mapped(idev->dev) || !idev->dev->iommu;
>
> Does this need to check for CONFIG_VFIO_NOIOMMU, and maybe the module
> param enable_unsafe_noiommu_mode, similar to the legacy implementation?
>
Checking for CONFIG_VFIO_NOIOMMU is not needed since none of the
conditions here depend on it. I felt it is cleaner this way, not tying
iommufd-private code to vfio.
I guess we could do something like below, but IMHO it is not necessary.
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -23,11 +23,6 @@ struct iommufd_attach {
struct xarray device_array;
};
-static bool is_vfio_noiommu(struct iommufd_device *idev)
-{
- return !device_iommu_mapped(idev->dev) || !idev->dev->iommu;
-}
-
static void iommufd_group_release(struct kref *kref)
{
struct iommufd_group *igroup =
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3302c6a1f99e..cba5550e3f2b 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -711,6 +711,18 @@ iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
 			    struct iommufd_vdevice, obj);
 }
+#ifdef CONFIG_VFIO_NOIOMMU
+static inline bool is_vfio_noiommu(struct iommufd_device *idev)
+{
+ return !device_iommu_mapped(idev->dev) || !idev->dev->iommu;
+}
+#else
+static inline bool is_vfio_noiommu(struct iommufd_device *idev)
+{
+ return false;
+}
+#endif
+
end of thread, other threads:[~2026-03-24 19:13 UTC | newest]
Thread overview: 31+ messages
2026-03-12 15:56 [PATCH V2 00/11] iommufd: Enable noiommu mode for cdev Jacob Pan
2026-03-12 15:56 ` [PATCH V2 01/11] iommufd: Support a HWPT without an iommu driver for noiommu Jacob Pan
2026-03-18 18:38 ` Samiullah Khawaja
2026-03-23 13:17 ` Jason Gunthorpe
2026-03-24 17:42 ` Samiullah Khawaja
2026-03-22 9:24 ` Mostafa Saleh
2026-03-23 21:11 ` Jacob Pan
2026-03-23 22:10 ` Jason Gunthorpe
2026-03-12 15:56 ` [PATCH V2 02/11] iommufd: Move igroup allocation to a function Jacob Pan
2026-03-18 18:39 ` Samiullah Khawaja
2026-03-22 9:41 ` Mostafa Saleh
2026-03-23 22:51 ` Jacob Pan
2026-03-23 16:46 ` Samiullah Khawaja
2026-03-12 15:56 ` [PATCH V2 03/11] iommufd: Allow binding to a noiommu device Jacob Pan
2026-03-22 9:54 ` Mostafa Saleh
2026-03-23 13:20 ` Jason Gunthorpe
2026-03-24 19:13 ` Jacob Pan
2026-03-12 15:56 ` [PATCH V2 04/11] iommufd: Add an ioctl IOMMU_IOAS_GET_PA to query PA from IOVA Jacob Pan
2026-03-12 15:56 ` [PATCH V2 05/11] vfio: Allow null group for noiommu without containers Jacob Pan
2026-03-22 9:59 ` Mostafa Saleh
2026-03-23 13:21 ` Jason Gunthorpe
2026-03-12 15:56 ` [PATCH V2 06/11] vfio: Introduce and set noiommu flag on vfio_device Jacob Pan
2026-03-22 10:02 ` Mostafa Saleh
2026-03-12 15:56 ` [PATCH V2 07/11] vfio: Update noiommu device detection logic for cdev Jacob Pan
2026-03-22 10:04 ` Mostafa Saleh
2026-03-12 15:56 ` [PATCH V2 08/11] vfio: Enable cdev noiommu mode under iommufd Jacob Pan
2026-03-14 8:09 ` kernel test robot
2026-03-12 15:56 ` [PATCH V2 09/11] vfio:selftest: Handle VFIO noiommu cdev Jacob Pan
2026-03-12 15:56 ` [PATCH V2 10/11] selftests/vfio: Add iommufd noiommu mode selftest for cdev Jacob Pan
2026-03-12 15:56 ` [PATCH V2 11/11] Doc: Update VFIO NOIOMMU mode Jacob Pan
2026-03-13 17:48 ` kernel test robot