* [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
Hi,
We already support first-stage translation for passthrough devices,
backed by nested translation in the host, but only for PASID_0.
Structure VTDAddressSpace includes elements that suit emulated devices
and passthrough devices without PASID, e.g., an address space and
several memory regions, and it is protected by the vtd iommu lock. For
a passthrough device with PASID, all of these are useless and become a
burden.
When one device uses many PASIDs, the AS and MRs are all registered
with the memory core and impact the performance of the whole system.
So instead of using VTDAddressSpace to cache the pasid entry for each
pasid of a passthrough device, we define a lightweight structure,
VTDAccelPASIDCacheEntry, with only the elements necessary for each
pasid. We will use this struct as a parameter when binding to and
unbinding from a nested hwpt, to record the currently bound nested
hwpt, and eventually for PRQ support. It is also designed to support
PASID_0.
The potential full definition of VTDAccelPASIDCacheEntry may look like:
typedef struct VTDAccelPASIDCacheEntry {
    VTDHostIOMMUDevice *vtd_hiod;
    VTDPASIDEntry pasid_entry;
    uint32_t pasid;
    uint32_t fs_hwpt_id;
    uint32_t fault_id;
    int fault_fd;
    QLIST_HEAD(, VTDPRQEntry) vtd_prq_list;
    IOMMUPRINotifier pri_notifier_entry;
    IOMMUPRINotifier *pri_notifier;
    QLIST_ENTRY(VTDAccelPASIDCacheEntry) next;
} VTDAccelPASIDCacheEntry;
GIT branch: https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_pasid
PATCH01-06: Preparatory work
PATCH07-10: Handle PASID entry addition and removal
PATCH11-12: Support pasid binding and unbinding
PATCH13-14: Add PASID related check and enable PASID for passthrough device
This patchset depends on a kernel feature enhancement[1] to work.
Tests:
Tested with a DSA device whose driver uses 2 PASIDs by default.
Thanks
Zhenzhong
[1] https://lore.kernel.org/all/20260205023405.41583-1-zhenzhong.duan@intel.com/
Changelog:
v2:
- move the check "s->pasid > PCI_EXT_CAP_PASID_MAX_WIDTH" to patch5 (Clement)
- move #include "hw/core/iommu.h" before #include "hw/core/qdev.h" (Liuyi)
- polish the comment about @pasid parameter (Liuyi)
- s/pe/pasid_entry, s/as_it/hiod_it, s/vtd_find_add_pc/vtd_accel_fill_pc (Liuyi)
- s/VTDACCELPASIDCacheEntry/VTDAccelPASIDCacheEntry (Liuyi)
- add explanation in code about PASID removal before addition (Liuyi)
- polish the comment about scope of VTDAccelPASIDCacheEntry vs VTDPASIDCacheEntry (Liuyi)
- add an optimization to bypass PASID entry addition for PASID selective pc_inv_dsc (Liuyi)
v1:
- use naming pattern "XXX_SET_THENAME" same as smmu (Clement)
- fix s->pasid check (Clement)
RFCv2:
- extend attach/detach_hwpt() instead of introducing new callbacks (Shameer)
- Define IOMMU_NO_PASID for device attachment without pasid (Nicolin)
- update vtd_destroy_old_fs_hwpt()'s parameter for naming consistency (Clement)
- check pasid bits size to be no more than 20 bits (Clement)
- initialize local variable max_pasid_log2 to 0 (Cédric)
Zhenzhong Duan (14):
vfio/iommufd: Extend attach/detach_hwpt callback implementations with
pasid
iommufd: Extend attach/detach_hwpt callbacks to support pasid
vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID
flag
intel_iommu: Create the nested hwpt with IOMMU_HWPT_ALLOC_PASID flag
intel_iommu: Change pasid property from bool to uint8
intel_iommu: Export some functions
intel_iommu_accel: Handle PASID entry addition for pc_inv_dsc request
intel_iommu_accel: Handle PASID entry removal for pc_inv_dsc request
intel_iommu_accel: Bypass PASID entry addition for just deleted entry
intel_iommu_accel: Handle PASID entry removal for system reset
intel_iommu_accel: Support pasid binding/unbinding and PIOTLB flushing
intel_iommu_accel: drop _lock suffix in
vtd_flush_host_piotlb_all_locked()
intel_iommu_accel: Add pasid bits size check
intel_iommu: Expose flag VIOMMU_FLAG_PASID_SUPPORTED when configured
hw/i386/intel_iommu_accel.h | 34 ++-
hw/i386/intel_iommu_internal.h | 43 +++-
include/hw/core/iommu.h | 2 +
include/hw/i386/intel_iommu.h | 4 +-
include/hw/vfio/vfio-device.h | 1 +
include/system/iommufd.h | 16 +-
backends/iommufd.c | 9 +-
hw/arm/smmuv3-accel.c | 12 +-
hw/i386/intel_iommu.c | 83 +++----
hw/i386/intel_iommu_accel.c | 420 +++++++++++++++++++++++++++------
hw/vfio/device.c | 11 +
hw/vfio/iommufd.c | 56 +++--
hw/vfio/trace-events | 4 +-
13 files changed, 524 insertions(+), 171 deletions(-)
--
2.47.3
* [PATCH v2 01/14] vfio/iommufd: Extend attach/detach_hwpt callback implementations with pasid
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
To attach with a pasid, the pasid must be passed in together with the
VFIO_DEVICE_ATTACH_PASID flag.
Define IOMMU_NO_PASID to represent device attachment without a pasid,
the same as in the kernel.
Detachment is implemented similarly, using VFIO_DEVICE_DETACH_PASID.
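For readers skimming the diff, the flag selection can be sketched on its
own. This is only an illustration, not the patch code: struct attach_args
and make_attach_args() are hypothetical names, and the two macro values
are stand-ins (the real definitions come from the kernel's <linux/vfio.h>
and from include/hw/core/iommu.h in this patch).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for this sketch only. */
#define IOMMU_NO_PASID            0u
#define VFIO_DEVICE_ATTACH_PASID  (1u << 0)

/* Hypothetical mirror of the fields the patch fills in. */
struct attach_args {
    uint32_t flags;
    uint32_t pasid;
    uint32_t pt_id;
};

/* Set the PASID flag only when a real pasid is requested. */
static struct attach_args make_attach_args(uint32_t pasid, uint32_t pt_id)
{
    struct attach_args args = {
        .flags = pasid == IOMMU_NO_PASID ? 0 : VFIO_DEVICE_ATTACH_PASID,
        .pasid = pasid,
        .pt_id = pt_id,
    };
    return args;
}

/* Usage example: device-level attach vs. attach of pasid 5. */
static void attach_args_selfcheck(void)
{
    assert(make_attach_args(IOMMU_NO_PASID, 1).flags == 0);
    assert(make_attach_args(5, 1).flags == VFIO_DEVICE_ATTACH_PASID);
}
```

Existing callers keep their behaviour by passing IOMMU_NO_PASID, which
leaves flags at 0 exactly as before.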
Suggested-by: Shameer Kolothum Thodi <skolothumtho@nvidia.com>
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
---
include/hw/core/iommu.h | 2 ++
hw/vfio/iommufd.c | 44 +++++++++++++++++++++++++----------------
hw/vfio/trace-events | 4 ++--
3 files changed, 31 insertions(+), 19 deletions(-)
diff --git a/include/hw/core/iommu.h b/include/hw/core/iommu.h
index 86af315c15..bfcd511013 100644
--- a/include/hw/core/iommu.h
+++ b/include/hw/core/iommu.h
@@ -28,4 +28,6 @@ enum host_iommu_quirks {
HOST_IOMMU_QUIRK_NESTING_PARENT_BYPASS_RO = BIT_ULL(0),
};
+#define IOMMU_NO_PASID 0
+
#endif /* HW_IOMMU_H */
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 3e33dfbb35..93f1e61a8c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -20,6 +20,7 @@
#include "trace.h"
#include "qapi/error.h"
#include "system/iommufd.h"
+#include "hw/core/iommu.h"
#include "hw/core/qdev.h"
#include "hw/vfio/vfio-cpr.h"
#include "system/reset.h"
@@ -305,43 +306,48 @@ out:
return ret;
}
-static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
- Error **errp)
+static int iommufd_cdev_pasid_attach_ioas_hwpt(VFIODevice *vbasedev,
+ uint32_t pasid, uint32_t id,
+ Error **errp)
{
int iommufd = vbasedev->iommufd->fd;
struct vfio_device_attach_iommufd_pt attach_data = {
.argsz = sizeof(attach_data),
- .flags = 0,
+ .flags = pasid == IOMMU_NO_PASID ? 0 : VFIO_DEVICE_ATTACH_PASID,
+ .pasid = pasid,
.pt_id = id,
};
/* Attach device to an IOAS or hwpt within iommufd */
if (ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data)) {
error_setg_errno(errp, errno,
- "[iommufd=%d] error attach %s (%d) to id=%d",
- iommufd, vbasedev->name, vbasedev->fd, id);
+ "[iommufd=%d] error attach %s (%d) pasid %d to id=%d",
+ iommufd, vbasedev->name, vbasedev->fd, pasid, id);
return -errno;
}
- trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
- vbasedev->fd, id);
+ trace_iommufd_cdev_pasid_attach_ioas_hwpt(iommufd, vbasedev->name,
+ vbasedev->fd, pasid, id);
return 0;
}
-static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
+static bool iommufd_cdev_pasid_detach_ioas_hwpt(VFIODevice *vbasedev,
+ uint32_t pasid, Error **errp)
{
int iommufd = vbasedev->iommufd->fd;
struct vfio_device_detach_iommufd_pt detach_data = {
.argsz = sizeof(detach_data),
- .flags = 0,
+ .flags = pasid == IOMMU_NO_PASID ? 0 : VFIO_DEVICE_DETACH_PASID,
+ .pasid = pasid,
};
if (ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data)) {
- error_setg_errno(errp, errno, "detach %s failed", vbasedev->name);
+ error_setg_errno(errp, errno, "detach %s pasid %d failed",
+ vbasedev->name, pasid);
return false;
}
- trace_iommufd_cdev_detach_ioas_hwpt(iommufd, vbasedev->name);
+ trace_iommufd_cdev_pasid_detach_ioas_hwpt(iommufd, vbasedev->name, pasid);
return true;
}
@@ -362,7 +368,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
/* Try to find a domain */
QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
if (!cpr_is_incoming()) {
- ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ ret = iommufd_cdev_pasid_attach_ioas_hwpt(vbasedev, IOMMU_NO_PASID,
+ hwpt->hwpt_id, errp);
} else if (vbasedev->cpr.hwpt_id == hwpt->hwpt_id) {
ret = 0;
} else {
@@ -435,7 +442,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
return false;
}
- ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt_id, errp);
+ ret = iommufd_cdev_pasid_attach_ioas_hwpt(vbasedev, IOMMU_NO_PASID, hwpt_id,
+ errp);
if (ret) {
iommufd_backend_free_id(container->be, hwpt_id);
return false;
@@ -488,7 +496,8 @@ static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
/* If CPR, we are already attached to ioas_id. */
return cpr_is_incoming() ||
- !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
+ !iommufd_cdev_pasid_attach_ioas_hwpt(vbasedev, IOMMU_NO_PASID,
+ container->ioas_id, errp);
}
static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
@@ -496,7 +505,7 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
{
Error *err = NULL;
- if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
+ if (!iommufd_cdev_pasid_detach_ioas_hwpt(vbasedev, IOMMU_NO_PASID, &err)) {
error_report_err(err);
}
@@ -922,7 +931,8 @@ host_iommu_device_iommufd_vfio_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
{
VFIODevice *vbasedev = HOST_IOMMU_DEVICE(idev)->agent;
- return !iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt_id, errp);
+ return !iommufd_cdev_pasid_attach_ioas_hwpt(vbasedev, IOMMU_NO_PASID,
+ hwpt_id, errp);
}
static bool
@@ -931,7 +941,7 @@ host_iommu_device_iommufd_vfio_detach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
{
VFIODevice *vbasedev = HOST_IOMMU_DEVICE(idev)->agent;
- return iommufd_cdev_detach_ioas_hwpt(vbasedev, errp);
+ return iommufd_cdev_pasid_detach_ioas_hwpt(vbasedev, IOMMU_NO_PASID, errp);
}
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 846e3625c5..764a3e4855 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -182,8 +182,8 @@ vfio_vmstate_change_prepare(const char *name, int running, const char *reason, c
iommufd_cdev_connect_and_bind(int iommufd, const char *name, int devfd, int devid) " [iommufd=%d] Successfully bound device %s (fd=%d): output devid=%d"
iommufd_cdev_getfd(const char *dev, int devfd) " %s (fd=%d)"
-iommufd_cdev_attach_ioas_hwpt(int iommufd, const char *name, int devfd, int id) " [iommufd=%d] Successfully attached device %s (%d) to id=%d"
-iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name) " [iommufd=%d] Successfully detached %s"
+iommufd_cdev_pasid_attach_ioas_hwpt(int iommufd, const char *name, int devfd, uint32_t pasid, int id) " [iommufd=%d] Successfully attached device %s (%d) pasid %d to id=%d"
+iommufd_cdev_pasid_detach_ioas_hwpt(int iommufd, const char *name, uint32_t pasid) " [iommufd=%d] Successfully detached %s pasid %d"
iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
--
2.47.3
* [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan, qemu-arm
Add a pasid parameter to the attach_hwpt/detach_hwpt callbacks; same for
the two wrappers and their call sites.
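The shape of the change, a pasid argument threaded through a class
callback and its wrapper, can be sketched in isolation. Everything named
Sketch*/sketch_*/toy_* below is hypothetical and stands in for the QOM
class and the VFIO backend; the IOMMU_NO_PASID value is a stand-in for
the definition added in patch 01.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOMMU_NO_PASID 0u   /* stand-in for this sketch */

typedef struct SketchDev SketchDev;

/* Hypothetical class with the extended callback signatures. */
typedef struct SketchDevClass {
    bool (*attach_hwpt)(SketchDev *dev, uint32_t pasid, uint32_t hwpt_id);
    bool (*detach_hwpt)(SketchDev *dev, uint32_t pasid);
} SketchDevClass;

struct SketchDev {
    const SketchDevClass *klass;
    uint32_t attached_pasid;   /* last pasid seen by the backend */
    uint32_t attached_hwpt;    /* last hwpt id seen, 0 if none */
};

/* The wrappers only forward the pasid to the class callback. */
static bool sketch_attach_hwpt(SketchDev *dev, uint32_t pasid,
                               uint32_t hwpt_id)
{
    assert(dev->klass->attach_hwpt);
    return dev->klass->attach_hwpt(dev, pasid, hwpt_id);
}

static bool sketch_detach_hwpt(SketchDev *dev, uint32_t pasid)
{
    assert(dev->klass->detach_hwpt);
    return dev->klass->detach_hwpt(dev, pasid);
}

/* Toy backend that just records what it was asked to do. */
static bool toy_attach(SketchDev *dev, uint32_t pasid, uint32_t hwpt_id)
{
    dev->attached_pasid = pasid;
    dev->attached_hwpt = hwpt_id;
    return true;
}

static bool toy_detach(SketchDev *dev, uint32_t pasid)
{
    if (dev->attached_hwpt == 0 || dev->attached_pasid != pasid) {
        return false;
    }
    dev->attached_hwpt = 0;
    return true;
}

static const SketchDevClass toy_class = { toy_attach, toy_detach };

/* Usage example: a non-PASID caller passes IOMMU_NO_PASID. */
static bool sketch_demo(void)
{
    SketchDev dev = { .klass = &toy_class };

    return sketch_attach_hwpt(&dev, IOMMU_NO_PASID, 42) &&
           dev.attached_pasid == IOMMU_NO_PASID &&
           !sketch_detach_hwpt(&dev, 7) &&
           sketch_detach_hwpt(&dev, IOMMU_NO_PASID);
}
```

Call sites that do not care about PASID, such as the smmuv3-accel ones
in this patch, simply pass IOMMU_NO_PASID.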
Suggested-by: Shameer Kolothum Thodi <skolothumtho@nvidia.com>
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
---
include/system/iommufd.h | 16 +++++++++++-----
backends/iommufd.c | 9 +++++----
hw/arm/smmuv3-accel.c | 12 ++++++++----
hw/i386/intel_iommu_accel.c | 8 +++++---
hw/vfio/iommufd.c | 10 +++++-----
5 files changed, 34 insertions(+), 21 deletions(-)
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 7062944fe6..45a9e87cb0 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -138,14 +138,16 @@ struct HostIOMMUDeviceIOMMUFDClass {
*
* @idev: host IOMMU device backed by IOMMUFD backend.
*
+ * @pasid: target pasid of attach.
+ *
* @hwpt_id: ID of IOMMUFD hardware page table.
*
* @errp: pass an Error out when attachment fails.
*
* Returns: true on success, false on failure.
*/
- bool (*attach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, uint32_t hwpt_id,
- Error **errp);
+ bool (*attach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, uint32_t pasid,
+ uint32_t hwpt_id, Error **errp);
/**
* @detach_hwpt: detach host IOMMU device from IOMMUFD hardware page table.
* VFIO and VDPA device can have different implementation.
@@ -154,15 +156,19 @@ struct HostIOMMUDeviceIOMMUFDClass {
*
* @idev: host IOMMU device backed by IOMMUFD backend.
*
+ * @pasid: target pasid of detach.
+ *
* @errp: pass an Error out when attachment fails.
*
* Returns: true on success, false on failure.
*/
- bool (*detach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, Error **errp);
+ bool (*detach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, uint32_t pasid,
+ Error **errp);
};
bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
- uint32_t hwpt_id, Error **errp);
-bool host_iommu_device_iommufd_detach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
+ uint32_t pasid, uint32_t hwpt_id,
Error **errp);
+bool host_iommu_device_iommufd_detach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
+ uint32_t pasid, Error **errp);
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index e1fee16acf..ab612e4874 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -539,23 +539,24 @@ bool iommufd_backend_alloc_veventq(IOMMUFDBackend *be, uint32_t viommu_id,
}
bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
- uint32_t hwpt_id, Error **errp)
+ uint32_t pasid, uint32_t hwpt_id,
+ Error **errp)
{
HostIOMMUDeviceIOMMUFDClass *idevc =
HOST_IOMMU_DEVICE_IOMMUFD_GET_CLASS(idev);
g_assert(idevc->attach_hwpt);
- return idevc->attach_hwpt(idev, hwpt_id, errp);
+ return idevc->attach_hwpt(idev, pasid, hwpt_id, errp);
}
bool host_iommu_device_iommufd_detach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
- Error **errp)
+ uint32_t pasid, Error **errp)
{
HostIOMMUDeviceIOMMUFDClass *idevc =
HOST_IOMMU_DEVICE_IOMMUFD_GET_CLASS(idev);
g_assert(idevc->detach_hwpt);
- return idevc->detach_hwpt(idev, errp);
+ return idevc->detach_hwpt(idev, pasid, errp);
}
static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 65c2f44880..0af6b3296d 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -300,7 +300,8 @@ bool smmuv3_accel_install_ste(SMMUv3State *s, SMMUDevice *sdev, int sid,
return false;
}
- if (!host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp)) {
+ if (!host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID, hwpt_id,
+ errp)) {
if (s1_hwpt) {
iommufd_backend_free_id(idev->iommufd, s1_hwpt->hwpt_id);
g_free(s1_hwpt);
@@ -575,7 +576,8 @@ smmuv3_accel_alloc_viommu(SMMUv3State *s, HostIOMMUDeviceIOMMUFD *idev,
/* Attach a HWPT based on SMMUv3 GBPA.ABORT value */
hwpt_id = smmuv3_accel_gbpa_hwpt(s, accel);
- if (!host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp)) {
+ if (!host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID, hwpt_id,
+ errp)) {
goto free_veventq;
}
return true;
@@ -665,7 +667,8 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
idev = accel_dev->idev;
accel = accel_dev->s_accel;
/* Re-attach the default s2 hwpt id */
- if (!host_iommu_device_iommufd_attach_hwpt(idev, idev->hwpt_id, NULL)) {
+ if (!host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID,
+ idev->hwpt_id, NULL)) {
error_report("Unable to attach the default HW pagetable: idev devid "
"0x%x", idev->devid);
}
@@ -879,7 +882,8 @@ bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp)
hwpt_id = smmuv3_accel_gbpa_hwpt(s, accel);
QLIST_FOREACH(accel_dev, &accel->device_list, next) {
- if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev, hwpt_id,
+ if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
+ IOMMU_NO_PASID, hwpt_id,
&local_err)) {
error_append_hint(&local_err, "Failed to attach GBPA hwpt %u for "
"idev devid %u", hwpt_id, accel_dev->idev->devid);
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index 67d54849f2..45c08c8f6f 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -121,7 +121,8 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
}
}
- ret = host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp);
+ ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID, hwpt_id,
+ errp);
trace_vtd_device_attach_hwpt(idev->devid, vtd_as->pasid, hwpt_id, ret);
if (ret) {
/* Destroy old fs_hwpt if it's a replacement */
@@ -145,7 +146,7 @@ static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
bool ret;
if (s->dmar_enabled && s->root_scalable) {
- ret = host_iommu_device_iommufd_detach_hwpt(idev, errp);
+ ret = host_iommu_device_iommufd_detach_hwpt(idev, IOMMU_NO_PASID, errp);
trace_vtd_device_detach_hwpt(idev->devid, pasid, ret);
} else {
/*
@@ -153,7 +154,8 @@ static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
* we fallback to the default HWPT which contains shadow page table.
* So guest DMA could still work.
*/
- ret = host_iommu_device_iommufd_attach_hwpt(idev, idev->hwpt_id, errp);
+ ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID,
+ idev->hwpt_id, errp);
trace_vtd_device_reattach_def_hwpt(idev->devid, pasid, idev->hwpt_id,
ret);
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 93f1e61a8c..e822039858 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -927,21 +927,21 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, const void *data)
static bool
host_iommu_device_iommufd_vfio_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
- uint32_t hwpt_id, Error **errp)
+ uint32_t pasid, uint32_t hwpt_id,
+ Error **errp)
{
VFIODevice *vbasedev = HOST_IOMMU_DEVICE(idev)->agent;
- return !iommufd_cdev_pasid_attach_ioas_hwpt(vbasedev, IOMMU_NO_PASID,
- hwpt_id, errp);
+ return !iommufd_cdev_pasid_attach_ioas_hwpt(vbasedev, pasid, hwpt_id, errp);
}
static bool
host_iommu_device_iommufd_vfio_detach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
- Error **errp)
+ uint32_t pasid, Error **errp)
{
VFIODevice *vbasedev = HOST_IOMMU_DEVICE(idev)->agent;
- return iommufd_cdev_pasid_detach_ioas_hwpt(vbasedev, IOMMU_NO_PASID, errp);
+ return iommufd_cdev_pasid_detach_ioas_hwpt(vbasedev, pasid, errp);
}
static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
--
2.47.3
* [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
When both the device and the vIOMMU have PASID enabled, the guest may
set up PASID-attached translation.
We need to create the nesting parent hwpt with the
IOMMU_HWPT_ALLOC_PASID flag because, according to the uAPI, any domain
attached to the non-PASID part of the device must also be flagged;
otherwise attaching a PASID will be blocked.
Introduce a vfio_device_get_viommu_flags_pasid_supported() helper to
facilitate this implementation.
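The decision itself is small and can be sketched standalone. This is an
illustration under assumptions: hwpt_alloc_flags() is a hypothetical
helper, base_flags represents whatever flags the earlier code has already
chosen, and the macro value is a stand-in for the kernel's definition in
<linux/iommufd.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in value for this sketch only. */
#define IOMMU_HWPT_ALLOC_PASID (1u << 3)

/*
 * Add the PASID flag only when the host device reports a non-zero
 * PASID width and the vIOMMU advertises PASID support.
 */
static uint32_t hwpt_alloc_flags(uint32_t base_flags, uint8_t max_pasid_log2,
                                 bool viommu_pasid_supported)
{
    uint32_t flags = base_flags;

    if (max_pasid_log2 && viommu_pasid_supported) {
        flags |= IOMMU_HWPT_ALLOC_PASID;
    }
    return flags;
}

/* Usage example: both conditions must hold for the flag to be set. */
static void hwpt_alloc_flags_selfcheck(void)
{
    assert(hwpt_alloc_flags(0, 20, true) == IOMMU_HWPT_ALLOC_PASID);
    assert(hwpt_alloc_flags(0, 20, false) == 0);
    assert(hwpt_alloc_flags(0, 0, true) == 0);
}
```

Initializing max_pasid_log2 to 0, as the patch does, keeps the flag
clear when the device info query leaves the value unset.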
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
---
include/hw/vfio/vfio-device.h | 1 +
hw/vfio/device.c | 11 +++++++++++
hw/vfio/iommufd.c | 8 +++++++-
3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 828a31c006..dd0355eb3d 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -268,6 +268,7 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainer *bcontainer,
void vfio_device_unprepare(VFIODevice *vbasedev);
bool vfio_device_get_viommu_flags_want_nesting(VFIODevice *vbasedev);
+bool vfio_device_get_viommu_flags_pasid_supported(VFIODevice *vbasedev);
bool vfio_device_get_host_iommu_quirk_bypass_ro(VFIODevice *vbasedev,
uint32_t type, void *caps,
uint32_t size);
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 973fc35b59..b15ca6ef0a 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -533,6 +533,17 @@ bool vfio_device_get_viommu_flags_want_nesting(VFIODevice *vbasedev)
return false;
}
+bool vfio_device_get_viommu_flags_pasid_supported(VFIODevice *vbasedev)
+{
+ VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
+
+ if (vdev) {
+ return !!(pci_device_get_viommu_flags(PCI_DEVICE(vdev)) &
+ VIOMMU_FLAG_PASID_SUPPORTED);
+ }
+ return false;
+}
+
bool vfio_device_get_host_iommu_quirk_bypass_ro(VFIODevice *vbasedev,
uint32_t type, void *caps,
uint32_t size)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index e822039858..dce4e4ce72 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -363,6 +363,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
VendorCaps caps;
VFIOIOASHwpt *hwpt;
uint32_t hwpt_id;
+ uint8_t max_pasid_log2 = 0;
int ret;
/* Try to find a domain */
@@ -408,7 +409,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
*/
if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
&type, &caps, sizeof(caps), &hw_caps,
- NULL, errp)) {
+ &max_pasid_log2, errp)) {
return false;
}
@@ -430,6 +431,11 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
}
}
+ if (max_pasid_log2 &&
+ vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
+ flags |= IOMMU_HWPT_ALLOC_PASID;
+ }
+
if (cpr_is_incoming()) {
hwpt_id = vbasedev->cpr.hwpt_id;
goto skip_alloc;
--
2.47.3
* [PATCH v2 04/14] intel_iommu: Create the nested hwpt with IOMMU_HWPT_ALLOC_PASID flag
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
When pasid is enabled, any hwpt attached to the device, whether to its
non-PASID part or to a PASID, must be flagged with
IOMMU_HWPT_ALLOC_PASID, or else the attachment fails.
Also change vtd_destroy_old_fs_hwpt() to take a 'VTDHostIOMMUDevice *'
for naming consistency.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clement Mathieu--Drif <clement.mathieu--drif@bull.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
---
hw/i386/intel_iommu_accel.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index 45c08c8f6f..c2757f3bcd 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -69,11 +69,13 @@ VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace *as)
return NULL;
}
-static bool vtd_create_fs_hwpt(HostIOMMUDeviceIOMMUFD *idev,
+static bool vtd_create_fs_hwpt(VTDHostIOMMUDevice *vtd_hiod,
VTDPASIDEntry *pe, uint32_t *fs_hwpt_id,
Error **errp)
{
+ HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
struct iommu_hwpt_vtd_s1 vtd = {};
+ uint32_t flags = vtd_hiod->iommu_state->pasid ? IOMMU_HWPT_ALLOC_PASID : 0;
vtd.flags = (VTD_SM_PASID_ENTRY_SRE(pe) ? IOMMU_VTD_S1_SRE : 0) |
(VTD_SM_PASID_ENTRY_WPE(pe) ? IOMMU_VTD_S1_WPE : 0) |
@@ -82,13 +84,15 @@ static bool vtd_create_fs_hwpt(HostIOMMUDeviceIOMMUFD *idev,
vtd.pgtbl_addr = (uint64_t)vtd_pe_get_fspt_base(pe);
return iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid, idev->hwpt_id,
- 0, IOMMU_HWPT_DATA_VTD_S1, sizeof(vtd),
- &vtd, fs_hwpt_id, errp);
+ flags, IOMMU_HWPT_DATA_VTD_S1,
+ sizeof(vtd), &vtd, fs_hwpt_id, errp);
}
-static void vtd_destroy_old_fs_hwpt(HostIOMMUDeviceIOMMUFD *idev,
+static void vtd_destroy_old_fs_hwpt(VTDHostIOMMUDevice *vtd_hiod,
VTDAddressSpace *vtd_as)
{
+ HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
+
if (!vtd_as->fs_hwpt_id) {
return;
}
@@ -116,7 +120,7 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
}
if (vtd_pe_pgtt_is_fst(pe)) {
- if (!vtd_create_fs_hwpt(idev, pe, &hwpt_id, errp)) {
+ if (!vtd_create_fs_hwpt(vtd_hiod, pe, &hwpt_id, errp)) {
return false;
}
}
@@ -126,7 +130,7 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
trace_vtd_device_attach_hwpt(idev->devid, vtd_as->pasid, hwpt_id, ret);
if (ret) {
/* Destroy old fs_hwpt if it's a replacement */
- vtd_destroy_old_fs_hwpt(idev, vtd_as);
+ vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_as);
if (vtd_pe_pgtt_is_fst(pe)) {
vtd_as->fs_hwpt_id = hwpt_id;
}
@@ -161,7 +165,7 @@ static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
}
if (ret) {
- vtd_destroy_old_fs_hwpt(idev, vtd_as);
+ vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_as);
}
return ret;
--
2.47.3
* [PATCH v2 05/14] intel_iommu: Change pasid property from bool to uint8
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
'x-pasid-mode' is a bool property, so we would need an extra 'pss'
property to represent the supported PASID size. Because no device in
QEMU supports the pasid capability yet, no guest could use the pasid
feature until now, and 'x-pasid-mode' has had no effect.
So instead of adding an extra 'pss' property, we can use a single
'pasid' property of uint8 type to represent both whether pasid is
supported and the PASID bits size. A value of N > 0 means pasid is
supported and N - 1 is the value of the PSS field in the ECAP register.
The PASID bits size must also be no more than 20 bits, according to the
PCI spec.
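As a worked example of the encoding: with pasid=20, PSS is 19 and is
deposited into ECAP bits 39:35, next to the PASID bit at position 40.
The sketch below is standalone and hedged: deposit64() is a local
re-implementation written for this example, and pasid_width_valid(),
ecap_for_pasid() and the ECAP_*/PASID_MAX_WIDTH names are hypothetical
stand-ins for the macros used in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PASID_MAX_WIDTH  20            /* stand-in for PCI_EXT_CAP_PASID_MAX_WIDTH */
#define ECAP_PASID       (1ULL << 40)  /* same bit as VTD_ECAP_PASID in the diff */
#define ECAP_PSS_SHIFT   35
#define ECAP_PSS_LENGTH  5

/* Local bitfield deposit helper, written for this sketch. */
static uint64_t deposit64(uint64_t value, int start, int length,
                          uint64_t field)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Reject widths above the PCI limit, as vtd_decide_config() does. */
static bool pasid_width_valid(uint8_t pasid_bits)
{
    return pasid_bits <= PASID_MAX_WIDTH;
}

/* ECAP bits contributed by a valid 'pasid' property value N. */
static uint64_t ecap_for_pasid(uint8_t pasid_bits)
{
    uint64_t ecap = 0;

    if (pasid_bits) {
        /* PSS holds N - 1 */
        ecap = deposit64(ecap, ECAP_PSS_SHIFT, ECAP_PSS_LENGTH,
                         pasid_bits - 1);
        ecap |= ECAP_PASID;
    }
    return ecap;
}
```

With this encoding, pasid=0 leaves both the PASID bit and the PSS field
clear, so the default matches the old 'x-pasid-mode=off' behaviour.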
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 2 +-
include/hw/i386/intel_iommu.h | 2 +-
hw/i386/intel_iommu.c | 11 +++++++++--
3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 11a53aa369..db4f186a3e 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -195,7 +195,7 @@
#define VTD_ECAP_MHMV (15ULL << 20)
#define VTD_ECAP_SRS (1ULL << 31)
#define VTD_ECAP_NWFS (1ULL << 33)
-#define VTD_ECAP_PSS (7ULL << 35) /* limit: MemTxAttrs::pid */
+#define VTD_ECAP_SET_PSS(x, v) ((x)->ecap = deposit64((x)->ecap, 35, 5, v))
#define VTD_ECAP_PASID (1ULL << 40)
#define VTD_ECAP_PDS (1ULL << 42)
#define VTD_ECAP_SMTS (1ULL << 43)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index e44ce31841..95c76015e4 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -314,7 +314,7 @@ struct IntelIOMMUState {
bool intr_eime; /* Extended interrupt mode enabled */
OnOffAuto intr_eim; /* Toggle for EIM cabability */
uint8_t aw_bits; /* Host/IOVA address width (in bits) */
- bool pasid; /* Whether to support PASID */
+ uint8_t pasid; /* PASID supported in bits, 0 if not */
bool fs1gp; /* First Stage 1-GByte Page Support */
/* Transient Mapping, Reserved(0) since VTD spec revision 3.2 */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f395fa248c..a7b676cd13 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4203,7 +4203,7 @@ static const Property vtd_properties[] = {
DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
DEFINE_PROP_BOOL("x-flts", IntelIOMMUState, fsts, FALSE),
DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
- DEFINE_PROP_BOOL("x-pasid-mode", IntelIOMMUState, pasid, false),
+ DEFINE_PROP_UINT8("pasid", IntelIOMMUState, pasid, 0),
DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, false),
DEFINE_PROP_BOOL("stale-tm", IntelIOMMUState, stale_tm, false),
DEFINE_PROP_BOOL("fs1gp", IntelIOMMUState, fs1gp, true),
@@ -5042,7 +5042,8 @@ static void vtd_cap_init(IntelIOMMUState *s)
}
if (s->pasid) {
- s->ecap |= VTD_ECAP_PASID | VTD_ECAP_PSS;
+ VTD_ECAP_SET_PSS(s, s->pasid - 1);
+ s->ecap |= VTD_ECAP_PASID;
}
}
@@ -5583,6 +5584,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
return false;
}
+ if (s->pasid > PCI_EXT_CAP_PASID_MAX_WIDTH) {
+ error_setg(errp, "PASID width %d, exceed Max PASID Width %d allowed "
+ "in PCI spec", s->pasid, PCI_EXT_CAP_PASID_MAX_WIDTH);
+ return false;
+ }
+
if (s->svm) {
if (!x86_iommu->dt_supported) {
error_setg(errp, "Need to set device IOTLB for svm");
--
2.47.3
* [PATCH v2 06/14] intel_iommu: Export some functions
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan, Clement Mathieu--Drif
Export some functions for use by the accel code. Inline functions and
macros are moved to the internal header file so that the accel code in
the following patches can access them.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
---
hw/i386/intel_iommu_internal.h | 31 +++++++++++++++++++++++++
hw/i386/intel_iommu.c | 42 ++++++++--------------------------
2 files changed, 40 insertions(+), 33 deletions(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index db4f186a3e..c7e107fe87 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -620,6 +620,12 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 0xffffffffffe00000ULL
#define VTD_SM_CONTEXT_ENTRY_PRE 0x10ULL
+/* context entry operations */
+#define VTD_CE_GET_PASID_DIR_TABLE(ce) \
+ ((ce)->val[0] & VTD_PASID_DIR_BASE_ADDR_MASK)
+#define VTD_CE_GET_PRE(ce) \
+ ((ce)->val[0] & VTD_SM_CONTEXT_ENTRY_PRE)
+
typedef struct VTDPASIDCacheInfo {
uint8_t type;
uint16_t did;
@@ -746,4 +752,29 @@ static inline bool vtd_pe_pgtt_is_fst(VTDPASIDEntry *pe)
{
return (VTD_SM_PASID_ENTRY_PGTT(pe) == VTD_SM_PASID_ENTRY_FST);
}
+
+static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
+{
+ return pdire->val & 1;
+}
+
+static inline bool vtd_pe_present(VTDPASIDEntry *pe)
+{
+ return pe->val[0] & VTD_PASID_ENTRY_P;
+}
+
+static inline int vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
+{
+ return memcmp(p1, p2, sizeof(*p1));
+}
+
+int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base, uint32_t pasid,
+ VTDPASIDDirEntry *pdire);
+int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, uint32_t pasid,
+ dma_addr_t addr, VTDPASIDEntry *pe);
+int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
+ uint8_t devfn, VTDContextEntry *ce);
+int vtd_ce_get_pasid_entry(IntelIOMMUState *s, VTDContextEntry *ce,
+ VTDPASIDEntry *pe, uint32_t pasid);
+VTDAddressSpace *vtd_get_as_by_sid(IntelIOMMUState *s, uint16_t sid);
#endif
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a7b676cd13..b5d18ae321 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -42,12 +42,6 @@
#include "migration/vmstate.h"
#include "trace.h"
-/* context entry operations */
-#define VTD_CE_GET_PASID_DIR_TABLE(ce) \
- ((ce)->val[0] & VTD_PASID_DIR_BASE_ADDR_MASK)
-#define VTD_CE_GET_PRE(ce) \
- ((ce)->val[0] & VTD_SM_CONTEXT_ENTRY_PRE)
-
/*
* Paging mode for first-stage translation (VTD spec Figure 9-6)
* 00: 4-level paging, 01: 5-level paging
@@ -831,18 +825,12 @@ static inline bool vtd_pe_type_check(IntelIOMMUState *s, VTDPASIDEntry *pe)
}
}
-static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
-{
- return pdire->val & 1;
-}
-
/**
* Caller of this function should check present bit if wants
* to use pdir entry for further usage except for fpd bit check.
*/
-static int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base,
- uint32_t pasid,
- VTDPASIDDirEntry *pdire)
+int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base, uint32_t pasid,
+ VTDPASIDDirEntry *pdire)
{
uint32_t index;
dma_addr_t addr, entry_size;
@@ -860,15 +848,8 @@ static int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base,
return 0;
}
-static inline bool vtd_pe_present(VTDPASIDEntry *pe)
-{
- return pe->val[0] & VTD_PASID_ENTRY_P;
-}
-
-static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
- uint32_t pasid,
- dma_addr_t addr,
- VTDPASIDEntry *pe)
+int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, uint32_t pasid,
+ dma_addr_t addr, VTDPASIDEntry *pe)
{
uint8_t pgtt;
uint32_t index;
@@ -954,8 +935,8 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s,
return 0;
}
-static int vtd_ce_get_pasid_entry(IntelIOMMUState *s, VTDContextEntry *ce,
- VTDPASIDEntry *pe, uint32_t pasid)
+int vtd_ce_get_pasid_entry(IntelIOMMUState *s, VTDContextEntry *ce,
+ VTDPASIDEntry *pe, uint32_t pasid)
{
dma_addr_t pasid_dir_base;
@@ -1526,8 +1507,8 @@ static int vtd_ce_pasid_0_check(IntelIOMMUState *s, VTDContextEntry *ce)
}
/* Map a device to its corresponding domain (context-entry) */
-static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
- uint8_t devfn, VTDContextEntry *ce)
+int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
+ uint8_t devfn, VTDContextEntry *ce)
{
VTDRootEntry re;
int ret_fr;
@@ -1909,7 +1890,7 @@ static VTDAddressSpace *vtd_get_as_by_sid_and_pasid(IntelIOMMUState *s,
vtd_find_as_by_sid_and_pasid, &key);
}
-static VTDAddressSpace *vtd_get_as_by_sid(IntelIOMMUState *s, uint16_t sid)
+VTDAddressSpace *vtd_get_as_by_sid(IntelIOMMUState *s, uint16_t sid)
{
return vtd_get_as_by_sid_and_pasid(s, sid, PCI_NO_PASID);
}
@@ -3133,11 +3114,6 @@ static inline int vtd_dev_get_pe_from_pasid(VTDAddressSpace *vtd_as,
return vtd_ce_get_pasid_entry(s, &ce, pe, vtd_as->pasid);
}
-static int vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
-{
- return memcmp(p1, p2, sizeof(*p1));
-}
-
/* Update or invalidate pasid cache based on the pasid entry in guest memory. */
static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
gpointer user_data)
--
2.47.3
* [PATCH v2 07/14] intel_iommu_accel: Handle PASID entry addition for pc_inv_dsc request
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (5 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 06/14] intel_iommu: Export some functions Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-27 4:30 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 08/14] intel_iommu_accel: Handle PASID entry removal " Zhenzhong Duan
` (7 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
VTDAddressSpace includes elements suited to emulated devices and to
passthrough devices without PASID, e.g., the address space and the
various memory regions, and it is protected by the vtd iommu lock. For
a passthrough device with PASID, all of these are unnecessary and
become a burden.
When one device uses many PASIDs, the AS and MRs are all registered
with the memory core, which impacts the performance of the whole system.
So instead of using VTDAddressSpace to cache the pasid entry for each
pasid of a passthrough device, we define a lightweight structure,
VTDAccelPASIDCacheEntry, with only the elements each pasid needs. We
will use this struct as a parameter when binding to or unbinding from a
nested hwpt, and to record the currently bound nested hwpt. It is also
designed to support PASID_0.
VTDAccelPASIDCacheEntry is designed to be used only in
hw/i386/intel_iommu_accel.c; similarly, VTDPASIDCacheEntry should be
used only in hw/i386/intel_iommu.c.
When the guest creates new PASID entries, QEMU captures the pc_inv_dsc
(pasid cache invalidation) request, walks each pasid of each
passthrough device looking for valid pasid entries, and creates a new
VTDAccelPASIDCacheEntry for every valid entry that is not cached yet.
PASID_0 of a passthrough device still needs to register MRs in case the
guest does not operate in scalable mode. So for PASID_0, we have both
VTDPASIDCacheEntry and VTDAccelPASIDCacheEntry.
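As a reading aid for the table walk described above, here is a minimal
standalone sketch of how a PASID range [start, end) can be split at
64-entry leaf-table boundaries. The mask arithmetic follows
vtd_sm_pasid_table_walk() in this patch; the names TABLE_ENTRY_NUM,
chunk_end() and count_chunks() are hypothetical and exist only for
illustration, and no guest memory is read.

```c
#include <assert.h>
#include <stdint.h>

/* Entries per leaf PASID table (VTD_PASID_TABLE_ENTRY_NUM in the patch). */
#define TABLE_ENTRY_NUM 64u

/*
 * Hypothetical helper: first PASID past the leaf table that contains
 * @pasid, clamped so it never goes beyond @end.
 */
static uint32_t chunk_end(uint32_t pasid, uint32_t end)
{
    uint32_t next = (pasid + TABLE_ENTRY_NUM) & ~(TABLE_ENTRY_NUM - 1);

    return next < end ? next : end;
}

/*
 * Hypothetical helper: number of leaf tables a walk over [start, end)
 * would visit, i.e. how many directory entries would be looked up.
 */
static unsigned count_chunks(uint32_t start, uint32_t end)
{
    unsigned n = 0;

    assert(start <= end);
    for (uint32_t pasid = start; pasid < end; pasid = chunk_end(pasid, end)) {
        n++;
    }
    return n;
}
```

With this splitting, a range such as [60, 70) touches two leaf tables
(PASIDs 60..63 and 64..69), so the directory entry is fetched once per
leaf table rather than once per PASID.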
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_accel.h | 13 +++
hw/i386/intel_iommu_internal.h | 8 ++
hw/i386/intel_iommu.c | 3 +
hw/i386/intel_iommu_accel.c | 170 +++++++++++++++++++++++++++++++++
4 files changed, 194 insertions(+)
diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
index e5f0b077b4..c5981a23bf 100644
--- a/hw/i386/intel_iommu_accel.h
+++ b/hw/i386/intel_iommu_accel.h
@@ -12,6 +12,13 @@
#define HW_I386_INTEL_IOMMU_ACCEL_H
#include CONFIG_DEVICES
+typedef struct VTDAccelPASIDCacheEntry {
+ VTDHostIOMMUDevice *vtd_hiod;
+ VTDPASIDEntry pasid_entry;
+ uint32_t pasid;
+ QLIST_ENTRY(VTDAccelPASIDCacheEntry) next;
+} VTDAccelPASIDCacheEntry;
+
#ifdef CONFIG_VTD_ACCEL
bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
Error **errp);
@@ -20,6 +27,7 @@ bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as, Error **errp);
void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
uint32_t pasid, hwaddr addr,
uint64_t npages, bool ih);
+void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info);
void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops);
#else
static inline bool vtd_check_hiod_accel(IntelIOMMUState *s,
@@ -49,6 +57,11 @@ static inline void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s,
{
}
+static inline void vtd_pasid_cache_sync_accel(IntelIOMMUState *s,
+ VTDPASIDCacheInfo *pc_info)
+{
+}
+
static inline void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops)
{
}
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index c7e107fe87..d5f212ded9 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -616,6 +616,7 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_CTX_ENTRY_SCALABLE_SIZE 32
#define PASID_0 0
+#define VTD_SM_CONTEXT_ENTRY_PDTS(x) extract64((x)->val[0], 9, 3)
#define VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(aw) (0x1e0ULL | ~VTD_HAW_MASK(aw))
#define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 0xffffffffffe00000ULL
#define VTD_SM_CONTEXT_ENTRY_PRE 0x10ULL
@@ -646,6 +647,7 @@ typedef struct VTDPIOTLBInvInfo {
#define VTD_PASID_DIR_BITS_MASK (0x3fffULL)
#define VTD_PASID_DIR_INDEX(pasid) (((pasid) >> 6) & VTD_PASID_DIR_BITS_MASK)
#define VTD_PASID_DIR_FPD (1ULL << 1) /* Fault Processing Disable */
+#define VTD_PASID_TABLE_ENTRY_NUM (1ULL << 6)
#define VTD_PASID_TABLE_BITS_MASK (0x3fULL)
#define VTD_PASID_TABLE_INDEX(pasid) ((pasid) & VTD_PASID_TABLE_BITS_MASK)
#define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */
@@ -711,6 +713,7 @@ typedef struct VTDHostIOMMUDevice {
PCIBus *bus;
uint8_t devfn;
HostIOMMUDevice *hiod;
+ QLIST_HEAD(, VTDAccelPASIDCacheEntry) pasid_cache_list;
} VTDHostIOMMUDevice;
/*
@@ -768,6 +771,11 @@ static inline int vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
return memcmp(p1, p2, sizeof(*p1));
}
+static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
+{
+ return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce) + 7);
+}
+
int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base, uint32_t pasid,
VTDPASIDDirEntry *pdire);
int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, uint32_t pasid,
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b5d18ae321..451ede7530 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3202,6 +3202,8 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
g_hash_table_foreach(s->vtd_address_spaces, vtd_pasid_cache_sync_locked,
pc_info);
vtd_iommu_unlock(s);
+
+ vtd_pasid_cache_sync_accel(s, pc_info);
}
static void vtd_replay_pasid_bindings_all(IntelIOMMUState *s)
@@ -4759,6 +4761,7 @@ static bool vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
vtd_hiod->devfn = (uint8_t)devfn;
vtd_hiod->iommu_state = s;
vtd_hiod->hiod = hiod;
+ QLIST_INIT(&vtd_hiod->pasid_cache_list);
if (!vtd_check_hiod(s, vtd_hiod, errp)) {
g_free(vtd_hiod);
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index c2757f3bcd..32d8ab0ef9 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -257,6 +257,176 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
vtd_flush_host_piotlb_locked, &piotlb_info);
}
+static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
+ VTDPASIDEntry *pe)
+{
+ VTDAccelPASIDCacheEntry *vtd_pce;
+
+ QLIST_FOREACH(vtd_pce, &vtd_hiod->pasid_cache_list, next) {
+ if (vtd_pce->pasid == pasid) {
+ if (vtd_pasid_entry_compare(pe, &vtd_pce->pasid_entry)) {
+ vtd_pce->pasid_entry = *pe;
+ }
+ return;
+ }
+ }
+
+ vtd_pce = g_malloc0(sizeof(VTDAccelPASIDCacheEntry));
+ vtd_pce->vtd_hiod = vtd_hiod;
+ vtd_pce->pasid = pasid;
+ vtd_pce->pasid_entry = *pe;
+ QLIST_INSERT_HEAD(&vtd_hiod->pasid_cache_list, vtd_pce, next);
+}
+
+/*
+ * This function walks the PASID range [start, end) in a single PASID
+ * table for entries matching @info type/did, and creates a
+ * VTDAccelPASIDCacheEntry for each one that does not exist yet.
+ */
+static void vtd_sm_pasid_table_walk_one(VTDHostIOMMUDevice *vtd_hiod,
+ dma_addr_t pt_base,
+ int start,
+ int end,
+ VTDPASIDCacheInfo *info)
+{
+ IntelIOMMUState *s = vtd_hiod->iommu_state;
+ VTDPASIDEntry pe;
+ int pasid;
+
+ for (pasid = start; pasid < end; pasid++) {
+ if (vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe) ||
+ !vtd_pe_present(&pe)) {
+ continue;
+ }
+
+ if ((info->type == VTD_INV_DESC_PASIDC_G_DSI ||
+ info->type == VTD_INV_DESC_PASIDC_G_PASID_SI) &&
+ (info->did != VTD_SM_PASID_ENTRY_DID(&pe))) {
+ /*
+ * VTD_INV_DESC_PASIDC_G_DSI and VTD_INV_DESC_PASIDC_G_PASID_SI
+ * require a domain id check. If the domain id does not match,
+ * go to the next pasid.
+ */
+ continue;
+ }
+
+ vtd_accel_fill_pc(vtd_hiod, pasid, &pe);
+ }
+}
+
+/*
+ * In VT-d scalable mode translation, PASID dir + PASID table is used.
+ * This function aims at looping over a range of PASIDs in the given
+ * two level table to identify the pasid config in guest.
+ */
+static void vtd_sm_pasid_table_walk(VTDHostIOMMUDevice *vtd_hiod,
+ dma_addr_t pdt_base,
+ int start, int end,
+ VTDPASIDCacheInfo *info)
+{
+ VTDPASIDDirEntry pdire;
+ int pasid = start;
+ int pasid_next;
+ dma_addr_t pt_base;
+
+ while (pasid < end) {
+ pasid_next = (pasid + VTD_PASID_TABLE_ENTRY_NUM) &
+ ~(VTD_PASID_TABLE_ENTRY_NUM - 1);
+ pasid_next = pasid_next < end ? pasid_next : end;
+
+ if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire)
+ && vtd_pdire_present(&pdire)) {
+ pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK;
+ vtd_sm_pasid_table_walk_one(vtd_hiod, pt_base, pasid, pasid_next,
+ info);
+ }
+ pasid = pasid_next;
+ }
+}
+
+static void vtd_replay_pasid_bind_for_dev(VTDHostIOMMUDevice *vtd_hiod,
+ int start, int end,
+ VTDPASIDCacheInfo *pc_info)
+{
+ IntelIOMMUState *s = vtd_hiod->iommu_state;
+ VTDContextEntry ce;
+ int dev_max_pasid = 1 << vtd_hiod->hiod->caps.max_pasid_log2;
+
+ if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_hiod->bus),
+ vtd_hiod->devfn, &ce)) {
+ VTDPASIDCacheInfo walk_info = *pc_info;
+ uint32_t ce_max_pasid = vtd_sm_ce_get_pdt_entry_num(&ce) *
+ VTD_PASID_TABLE_ENTRY_NUM;
+
+ end = MIN(end, MIN(dev_max_pasid, ce_max_pasid));
+
+ vtd_sm_pasid_table_walk(vtd_hiod, VTD_CE_GET_PASID_DIR_TABLE(&ce),
+ start, end, &walk_info);
+ }
+}
+
+/*
+ * This function replays the guest pasid bindings by walking the two-level
+ * guest PASID table. For each valid pasid entry, it creates a
+ * VTDAccelPASIDCacheEntry dynamically if one does not exist yet. This entry
+ * holds info specific to a pasid.
+ */
+void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
+{
+ int start = PASID_0, end = 1 << s->pasid;
+ VTDHostIOMMUDevice *vtd_hiod;
+ GHashTableIter hiod_it;
+
+ if (!s->fsts) {
+ return;
+ }
+
+ /*
+ * VTDPASIDCacheInfo honors PCI pasid but VTDAccelPASIDCacheEntry honors
+ * iommu pasid
+ */
+ if (pc_info->pasid == PCI_NO_PASID) {
+ pc_info->pasid = PASID_0;
+ }
+
+ switch (pc_info->type) {
+ case VTD_INV_DESC_PASIDC_G_PASID_SI:
+ start = pc_info->pasid;
+ end = pc_info->pasid + 1;
+ /* fall through */
+ case VTD_INV_DESC_PASIDC_G_DSI:
+ /*
+ * loop all assigned devices, do domain id check in
+ * vtd_sm_pasid_table_walk_one() after get pasid entry.
+ */
+ break;
+ case VTD_INV_DESC_PASIDC_G_GLOBAL:
+ /* loop all assigned devices */
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ /*
+ * In this replay, one only needs to care about the devices which are
+ * backed by host IOMMU. Those devices have a corresponding vtd_hiod
+ * in s->vtd_host_iommu_dev. For devices not backed by host IOMMU, it
+ * is not necessary to replay the bindings since their cache should be
+ * created in the future DMA address translation.
+ *
+ * VTD translation callback never accesses vtd_hiod and its corresponding
+ * cached pasid entry, so no iommu lock needed here.
+ */
+ g_hash_table_iter_init(&hiod_it, s->vtd_host_iommu_dev);
+ while (g_hash_table_iter_next(&hiod_it, NULL, (void **)&vtd_hiod)) {
+ if (!object_dynamic_cast(OBJECT(vtd_hiod->hiod),
+ TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
+ continue;
+ }
+ vtd_replay_pasid_bind_for_dev(vtd_hiod, start, end, pc_info);
+ }
+}
+
static uint64_t vtd_get_host_iommu_quirks(uint32_t type,
void *caps, uint32_t size)
{
--
2.47.3
* [PATCH v2 08/14] intel_iommu_accel: Handle PASID entry removal for pc_inv_dsc request
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (6 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 07/14] intel_iommu_accel: Handle PASID entry addition for pc_inv_dsc request Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-27 4:31 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 09/14] intel_iommu_accel: Bypass PASID entry addition for just deleted entry Zhenzhong Duan
` (6 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
When the guest deletes PASID entries, QEMU captures the pasid cache
invalidation request and walks pasid_cache_list of each passthrough
device to find stale VTDAccelPASIDCacheEntry instances and delete them.
This happens before PASID entry addition, because a newly added entry
should never be removed.
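The removal pass can be pictured with the following minimal sketch. It
is not the patch's implementation: the patch uses QLIST_FOREACH_SAFE
and decides staleness by re-reading the guest PASID entry, whereas this
sketch assumes a plain singly linked list and a hypothetical
still_valid_fn predicate. The names entry, push(), prune_stale() and
is_even() are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache entry: only the PASID and the list linkage. */
struct entry {
    uint32_t pasid;
    struct entry *next;
};

/* Hypothetical stand-in for "the guest still has a valid PASID entry". */
typedef bool (*still_valid_fn)(uint32_t pasid);

/* Insert a new entry at the head of the list. */
static void push(struct entry **head, uint32_t pasid)
{
    struct entry *e = malloc(sizeof(*e));

    assert(e != NULL);
    e->pasid = pasid;
    e->next = *head;
    *head = e;
}

/*
 * Remove every entry the predicate rejects. Walking through a pointer to
 * the link makes it safe to free the current entry during iteration.
 * Returns the number of entries removed.
 */
static unsigned prune_stale(struct entry **head, still_valid_fn valid)
{
    unsigned removed = 0;
    struct entry **link = head;

    while (*link) {
        struct entry *e = *link;

        if (!valid(e->pasid)) {
            *link = e->next;   /* unlink before freeing */
            free(e);
            removed++;
        } else {
            link = &e->next;
        }
    }
    return removed;
}

/* Example predicate: treat even PASIDs as still valid. */
static bool is_even(uint32_t pasid)
{
    return (pasid & 1) == 0;
}
```

Running the prune before any addition pass means the addition pass never
has to consider entries that are about to be dropped.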
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_accel.c | 75 +++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index 32d8ab0ef9..c1285ce331 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -16,6 +16,28 @@
#include "hw/pci/pci_bus.h"
#include "trace.h"
+static inline int vtd_hiod_get_pe_from_pasid(VTDAccelPASIDCacheEntry *vtd_pce,
+ VTDPASIDEntry *pe)
+{
+ VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
+ IntelIOMMUState *s = vtd_hiod->iommu_state;
+ uint32_t pasid = vtd_pce->pasid;
+ VTDContextEntry ce;
+ int ret;
+
+ if (!s->dmar_enabled || !s->root_scalable) {
+ return -VTD_FR_RTADDR_INV_TTM;
+ }
+
+ ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_hiod->bus),
+ vtd_hiod->devfn, &ce);
+ if (ret) {
+ return ret;
+ }
+
+ return vtd_ce_get_pasid_entry(s, &ce, pe, pasid);
+}
+
bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
Error **errp)
{
@@ -257,6 +279,52 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
vtd_flush_host_piotlb_locked, &piotlb_info);
}
+static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
+ VTDPASIDCacheInfo *pc_info)
+{
+ VTDPASIDEntry pe;
+ uint16_t did;
+
+ /*
+ * VTD_INV_DESC_PASIDC_G_DSI and VTD_INV_DESC_PASIDC_G_PASID_SI require
+ * DID check. If DID doesn't match the value in cache or memory, then
+ * it's not a pasid entry we want to invalidate.
+ */
+ switch (pc_info->type) {
+ case VTD_INV_DESC_PASIDC_G_PASID_SI:
+ if (pc_info->pasid != vtd_pce->pasid) {
+ return;
+ }
+ /* Fall through */
+ case VTD_INV_DESC_PASIDC_G_DSI:
+ did = VTD_SM_PASID_ENTRY_DID(&vtd_pce->pasid_entry);
+ if (pc_info->did != did) {
+ return;
+ }
+ }
+
+ if (vtd_hiod_get_pe_from_pasid(vtd_pce, &pe)) {
+ /*
+ * No valid pasid entry in guest memory. e.g. pasid entry was modified
+ * to be either all-zero or non-present. Either case means existing
+ * pasid cache should be invalidated.
+ */
+ QLIST_REMOVE(vtd_pce, next);
+ g_free(vtd_pce);
+ }
+}
+
+/* Delete invalid pasid cache entry from pasid_cache_list */
+static void vtd_pasid_cache_invalidate(VTDHostIOMMUDevice *vtd_hiod,
+ VTDPASIDCacheInfo *pc_info)
+{
+ VTDAccelPASIDCacheEntry *vtd_pce, *next;
+
+ QLIST_FOREACH_SAFE(vtd_pce, &vtd_hiod->pasid_cache_list, next, next) {
+ vtd_pasid_cache_invalidate_one(vtd_pce, pc_info);
+ }
+}
+
static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
VTDPASIDEntry *pe)
{
@@ -423,6 +491,13 @@ void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
continue;
}
+
+ /*
+ * PASID entry removal is handled before addition intentionally,
+ * because it's unnecessary to iterate on an entry that will be
+ * removed.
+ */
+ vtd_pasid_cache_invalidate(vtd_hiod, pc_info);
vtd_replay_pasid_bind_for_dev(vtd_hiod, start, end, pc_info);
}
}
--
2.47.3
* [PATCH v2 09/14] intel_iommu_accel: Bypass PASID entry addition for just deleted entry
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (7 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 08/14] intel_iommu_accel: Handle PASID entry removal " Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-26 9:11 ` [PATCH v2 10/14] intel_iommu_accel: Handle PASID entry removal for system reset Zhenzhong Duan
` (5 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
For a pc_inv_dsc invalidation of type VTD_INV_DESC_PASIDC_G_PASID_SI, if
a pasid entry has just been removed, it cannot also be a new entry to
add. So calling vtd_replay_pasid_bind_for_dev() is unnecessary.
Introduce a new field, accel_pce_deleted, in VTDPASIDCacheInfo to mark
this case and bypass the replay.
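The set-then-consume use of the flag can be sketched in isolation as
below. This is an illustration under assumptions, not the patch code:
inv_type, inv_info, note_removed() and should_replay() are hypothetical
names standing in for the invalidation type, VTDPASIDCacheInfo, the
removal path and the caller's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical invalidation granularities. */
enum inv_type { INV_GLOBAL, INV_DOMAIN, INV_PASID };

/* Hypothetical request descriptor carrying the bypass flag. */
struct inv_info {
    enum inv_type type;
    bool deleted;   /* set by the removal pass, consumed by the caller */
};

/* Removal pass: record a deletion only for PASID-selective requests. */
static void note_removed(struct inv_info *info)
{
    assert(info != NULL);
    if (info->type == INV_PASID) {
        info->deleted = true;
    }
}

/*
 * Caller side: returns true when the addition (replay) pass should run.
 * The flag is cleared when consumed so it does not leak to the next device.
 */
static bool should_replay(struct inv_info *info)
{
    assert(info != NULL);
    if (info->deleted) {
        info->deleted = false;
        return false;
    }
    return true;
}
```

Clearing the flag at the point of use matters because the same request
descriptor is reused for every device in the loop.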
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 1 +
hw/i386/intel_iommu_accel.c | 11 ++++++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index d5f212ded9..f3cb6cff1c 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -631,6 +631,7 @@ typedef struct VTDPASIDCacheInfo {
uint8_t type;
uint16_t did;
uint32_t pasid;
+ bool accel_pce_deleted;
} VTDPASIDCacheInfo;
typedef struct VTDPIOTLBInvInfo {
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index c1285ce331..1e27c0feb8 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -311,6 +311,10 @@ static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
*/
QLIST_REMOVE(vtd_pce, next);
g_free(vtd_pce);
+
+ if (pc_info->type == VTD_INV_DESC_PASIDC_G_PASID_SI) {
+ pc_info->accel_pce_deleted = true;
+ }
}
}
@@ -498,7 +502,12 @@ void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
* removed.
*/
vtd_pasid_cache_invalidate(vtd_hiod, pc_info);
- vtd_replay_pasid_bind_for_dev(vtd_hiod, start, end, pc_info);
+
+ if (pc_info->accel_pce_deleted) {
+ pc_info->accel_pce_deleted = false;
+ } else {
+ vtd_replay_pasid_bind_for_dev(vtd_hiod, start, end, pc_info);
+ }
}
}
--
2.47.3
* [PATCH v2 10/14] intel_iommu_accel: Handle PASID entry removal for system reset
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (8 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 09/14] intel_iommu_accel: Bypass PASID entry addition for just deleted entry Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-27 4:32 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 11/14] intel_iommu_accel: Support pasid binding/unbinding and PIOTLB flushing Zhenzhong Duan
` (4 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
On a system-level reset, DMA translation is turned off, so all PASID
entries become stale and should be deleted.
The vtd_hiod list is never accessed without the BQL, so there is no need
to guard it with the iommu lock.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_accel.h | 5 +++++
hw/i386/intel_iommu.c | 2 ++
hw/i386/intel_iommu_accel.c | 13 +++++++++++++
3 files changed, 20 insertions(+)
diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
index c5981a23bf..1fb7ca0af6 100644
--- a/hw/i386/intel_iommu_accel.h
+++ b/hw/i386/intel_iommu_accel.h
@@ -28,6 +28,7 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
uint32_t pasid, hwaddr addr,
uint64_t npages, bool ih);
void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info);
+void vtd_pasid_cache_reset_accel(IntelIOMMUState *s);
void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops);
#else
static inline bool vtd_check_hiod_accel(IntelIOMMUState *s,
@@ -62,6 +63,10 @@ static inline void vtd_pasid_cache_sync_accel(IntelIOMMUState *s,
{
}
+static inline void vtd_pasid_cache_reset_accel(IntelIOMMUState *s)
+{
+}
+
static inline void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops)
{
}
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 451ede7530..b022f3cb9e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -391,6 +391,8 @@ static void vtd_reset_caches(IntelIOMMUState *s)
vtd_reset_context_cache_locked(s);
vtd_pasid_cache_reset_locked(s);
vtd_iommu_unlock(s);
+
+ vtd_pasid_cache_reset_accel(s);
}
static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index 1e27c0feb8..e9e67eb1a0 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -511,6 +511,19 @@ void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
}
}
+/* Fake a global pasid cache invalidation to remove all pasid cache entries */
+void vtd_pasid_cache_reset_accel(IntelIOMMUState *s)
+{
+ VTDPASIDCacheInfo pc_info = { .type = VTD_INV_DESC_PASIDC_G_GLOBAL };
+ VTDHostIOMMUDevice *vtd_hiod;
+ GHashTableIter as_it;
+
+ g_hash_table_iter_init(&as_it, s->vtd_host_iommu_dev);
+ while (g_hash_table_iter_next(&as_it, NULL, (void **)&vtd_hiod)) {
+ vtd_pasid_cache_invalidate(vtd_hiod, &pc_info);
+ }
+}
+
static uint64_t vtd_get_host_iommu_quirks(uint32_t type,
void *caps, uint32_t size)
{
--
2.47.3
* [PATCH v2 11/14] intel_iommu_accel: Support pasid binding/unbinding and PIOTLB flushing
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (9 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 10/14] intel_iommu_accel: Handle PASID entry removal for system reset Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-27 4:32 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 12/14] intel_iommu_accel: drop _lock suffix in vtd_flush_host_piotlb_all_locked() Zhenzhong Duan
` (3 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
We have just switched to VTDAccelPASIDCacheEntry for caching the pasid
entries of passthrough devices, so the binding/unbinding and PIOTLB
flushing functions also need to switch to the same structure.
After the switch, we can remove the accel-related code from
vtd_pasid_cache_[reset/sync]_locked() to make intel_iommu.c cleaner.
The VTDAddressSpace of PASID_0 is still useful, as VTD supports a legacy
mode which needs a shadow page table instead of a nested page table.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_accel.h | 2 +-
include/hw/i386/intel_iommu.h | 2 -
hw/i386/intel_iommu.c | 17 +----
hw/i386/intel_iommu_accel.c | 125 +++++++++++++++++-----------------
4 files changed, 64 insertions(+), 82 deletions(-)
diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
index 1fb7ca0af6..c72856a8ff 100644
--- a/hw/i386/intel_iommu_accel.h
+++ b/hw/i386/intel_iommu_accel.h
@@ -16,6 +16,7 @@ typedef struct VTDAccelPASIDCacheEntry {
VTDHostIOMMUDevice *vtd_hiod;
VTDPASIDEntry pasid_entry;
uint32_t pasid;
+ uint32_t fs_hwpt_id;
QLIST_ENTRY(VTDAccelPASIDCacheEntry) next;
} VTDAccelPASIDCacheEntry;
@@ -23,7 +24,6 @@ typedef struct VTDAccelPASIDCacheEntry {
bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
Error **errp);
VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace *as);
-bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as, Error **errp);
void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
uint32_t pasid, hwaddr addr,
uint64_t npages, bool ih);
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 95c76015e4..1842ba5840 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -154,8 +154,6 @@ struct VTDAddressSpace {
* with the guest IOMMU pgtables for a device.
*/
IOVATree *iova_tree;
-
- uint32_t fs_hwpt_id;
};
struct VTDIOTLBEntry {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b022f3cb9e..f53642a611 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -86,8 +86,6 @@ static void vtd_pasid_cache_reset_locked(IntelIOMMUState *s)
VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
if (pc_entry->valid) {
pc_entry->valid = false;
- /* It's fatal to get failure during reset */
- vtd_propagate_guest_pasid(vtd_as, &error_fatal);
}
}
}
@@ -3126,8 +3124,6 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
VTDPASIDEntry pe;
IOMMUNotifier *n;
uint16_t did;
- const char *err_prefix = "Attaching to HWPT failed: ";
- Error *local_err = NULL;
if (vtd_dev_get_pe_from_pasid(vtd_as, &pe)) {
if (!pc_entry->valid) {
@@ -3148,9 +3144,6 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
vtd_address_space_unmap(vtd_as, n);
}
vtd_switch_address_space(vtd_as);
-
- err_prefix = "Detaching from HWPT failed: ";
- goto do_bind_unbind;
}
/*
@@ -3178,20 +3171,12 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
if (!pc_entry->valid) {
pc_entry->pasid_entry = pe;
pc_entry->valid = true;
- } else if (vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
- err_prefix = "Replacing HWPT attachment failed: ";
- } else {
+ } else if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
return;
}
vtd_switch_address_space(vtd_as);
vtd_address_space_sync(vtd_as);
-
-do_bind_unbind:
- /* TODO: Fault event injection into guest, report error to QEMU for now */
- if (!vtd_propagate_guest_pasid(vtd_as, &local_err)) {
- error_reportf_err(local_err, "%s", err_prefix);
- }
}
static void vtd_pasid_cache_sync(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index e9e67eb1a0..26543489fb 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -111,23 +111,24 @@ static bool vtd_create_fs_hwpt(VTDHostIOMMUDevice *vtd_hiod,
}
static void vtd_destroy_old_fs_hwpt(VTDHostIOMMUDevice *vtd_hiod,
- VTDAddressSpace *vtd_as)
+ VTDAccelPASIDCacheEntry *vtd_pce)
{
HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
- if (!vtd_as->fs_hwpt_id) {
+ if (!vtd_pce->fs_hwpt_id) {
return;
}
- iommufd_backend_free_id(idev->iommufd, vtd_as->fs_hwpt_id);
- vtd_as->fs_hwpt_id = 0;
+ iommufd_backend_free_id(idev->iommufd, vtd_pce->fs_hwpt_id);
+ vtd_pce->fs_hwpt_id = 0;
}
-static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
- VTDAddressSpace *vtd_as, Error **errp)
+static bool vtd_device_attach_iommufd(VTDAccelPASIDCacheEntry *vtd_pce,
+ Error **errp)
{
+ VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
- VTDPASIDEntry *pe = &vtd_as->pasid_cache_entry.pasid_entry;
- uint32_t hwpt_id = idev->hwpt_id;
+ VTDPASIDEntry *pe = &vtd_pce->pasid_entry;
+ uint32_t hwpt_id = idev->hwpt_id, pasid = vtd_pce->pasid;
bool ret;
/*
@@ -147,14 +148,13 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
}
}
- ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID, hwpt_id,
- errp);
- trace_vtd_device_attach_hwpt(idev->devid, vtd_as->pasid, hwpt_id, ret);
+ ret = host_iommu_device_iommufd_attach_hwpt(idev, pasid, hwpt_id, errp);
+ trace_vtd_device_attach_hwpt(idev->devid, pasid, hwpt_id, ret);
if (ret) {
/* Destroy old fs_hwpt if it's a replacement */
- vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_as);
+ vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_pce);
if (vtd_pe_pgtt_is_fst(pe)) {
- vtd_as->fs_hwpt_id = hwpt_id;
+ vtd_pce->fs_hwpt_id = hwpt_id;
}
} else if (vtd_pe_pgtt_is_fst(pe)) {
iommufd_backend_free_id(idev->iommufd, hwpt_id);
@@ -163,16 +163,17 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
return ret;
}
-static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
- VTDAddressSpace *vtd_as, Error **errp)
+static bool vtd_device_detach_iommufd(VTDAccelPASIDCacheEntry *vtd_pce,
+ Error **errp)
{
+ VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
- IntelIOMMUState *s = vtd_as->iommu_state;
- uint32_t pasid = vtd_as->pasid;
+ IntelIOMMUState *s = vtd_hiod->iommu_state;
+ uint32_t pasid = vtd_pce->pasid;
bool ret;
- if (s->dmar_enabled && s->root_scalable) {
- ret = host_iommu_device_iommufd_detach_hwpt(idev, IOMMU_NO_PASID, errp);
+ if (pasid != IOMMU_NO_PASID || (s->dmar_enabled && s->root_scalable)) {
+ ret = host_iommu_device_iommufd_detach_hwpt(idev, pasid, errp);
trace_vtd_device_detach_hwpt(idev->devid, pasid, ret);
} else {
/*
@@ -180,72 +181,47 @@ static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
* we fallback to the default HWPT which contains shadow page table.
* So guest DMA could still work.
*/
- ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID,
+ ret = host_iommu_device_iommufd_attach_hwpt(idev, pasid,
idev->hwpt_id, errp);
trace_vtd_device_reattach_def_hwpt(idev->devid, pasid, idev->hwpt_id,
ret);
}
if (ret) {
- vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_as);
+ vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_pce);
}
return ret;
}
-bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as, Error **errp)
-{
- VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
- VTDHostIOMMUDevice *vtd_hiod = vtd_find_hiod_iommufd(vtd_as);
-
- /* Ignore emulated device or legacy VFIO backed device */
- if (!vtd_as->iommu_state->fsts || !vtd_hiod) {
- return true;
- }
-
- if (pc_entry->valid) {
- return vtd_device_attach_iommufd(vtd_hiod, vtd_as, errp);
- }
-
- return vtd_device_detach_iommufd(vtd_hiod, vtd_as, errp);
-}
-
/*
- * This function is a loop function for the s->vtd_address_spaces
- * list with VTDPIOTLBInvInfo as execution filter. It propagates
- * the piotlb invalidation to host.
+ * This function is a loop function for the s->vtd_host_iommu_dev
+ * and vtd_hiod->pasid_cache_list lists with VTDPIOTLBInvInfo as
+ * execution filter. It propagates the piotlb invalidation to host.
*/
-static void vtd_flush_host_piotlb_locked(gpointer key, gpointer value,
- gpointer user_data)
+static void vtd_flush_host_piotlb(VTDAccelPASIDCacheEntry *vtd_pce,
+ VTDPIOTLBInvInfo *piotlb_info)
{
- VTDPIOTLBInvInfo *piotlb_info = user_data;
- VTDAddressSpace *vtd_as = value;
- VTDHostIOMMUDevice *vtd_hiod = vtd_find_hiod_iommufd(vtd_as);
- VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
+ VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
+ VTDPASIDEntry *pe = &vtd_pce->pasid_entry;
uint16_t did;
- if (!vtd_hiod) {
- return;
- }
-
- assert(vtd_as->pasid == PCI_NO_PASID);
-
/* Nothing to do if there is no first stage HWPT attached */
- if (!pc_entry->valid ||
- !vtd_pe_pgtt_is_fst(&pc_entry->pasid_entry)) {
+ if (!vtd_pe_pgtt_is_fst(pe)) {
return;
}
- did = VTD_SM_PASID_ENTRY_DID(&pc_entry->pasid_entry);
+ did = VTD_SM_PASID_ENTRY_DID(pe);
- if (piotlb_info->domain_id == did && piotlb_info->pasid == PASID_0) {
+ if (piotlb_info->domain_id == did && piotlb_info->pasid == vtd_pce->pasid) {
HostIOMMUDeviceIOMMUFD *idev =
HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
uint32_t entry_num = 1; /* Only implement one request for simplicity */
Error *local_err = NULL;
struct iommu_hwpt_vtd_s1_invalidate *cache = piotlb_info->inv_data;
- if (!iommufd_backend_invalidate_cache(idev->iommufd, vtd_as->fs_hwpt_id,
+ if (!iommufd_backend_invalidate_cache(idev->iommufd,
+ vtd_pce->fs_hwpt_id,
IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
sizeof(*cache), &entry_num, cache,
&local_err)) {
@@ -261,6 +237,8 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
{
struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 };
VTDPIOTLBInvInfo piotlb_info;
+ VTDHostIOMMUDevice *vtd_hiod;
+ GHashTableIter as_it;
cache_info.addr = addr;
cache_info.npages = npages;
@@ -271,12 +249,19 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
piotlb_info.inv_data = &cache_info;
/*
- * Go through each vtd_as instance in s->vtd_address_spaces, find out
- * affected host devices which need host piotlb invalidation. Piotlb
- * invalidation should check pasid cache per architecture point of view.
+ * Go through each vtd_pce in vtd_hiod->pasid_cache_list for each host
+ * device, find out affected host device pasid which need host piotlb
+ * invalidation. Piotlb invalidation should check pasid cache per
+ * architecture point of view.
*/
- g_hash_table_foreach(s->vtd_address_spaces,
- vtd_flush_host_piotlb_locked, &piotlb_info);
+ g_hash_table_iter_init(&as_it, s->vtd_host_iommu_dev);
+ while (g_hash_table_iter_next(&as_it, NULL, (void **)&vtd_hiod)) {
+ VTDAccelPASIDCacheEntry *vtd_pce;
+
+ QLIST_FOREACH(vtd_pce, &vtd_hiod->pasid_cache_list, next) {
+ vtd_flush_host_piotlb(vtd_pce, &piotlb_info);
+ }
+ }
}
static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
@@ -284,6 +269,7 @@ static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
{
VTDPASIDEntry pe;
uint16_t did;
+ Error *local_err = NULL;
/*
* VTD_INV_DESC_PASIDC_G_DSI and VTD_INV_DESC_PASIDC_G_PASID_SI require
@@ -309,6 +295,9 @@ static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
* to be either all-zero or non-present. Either case means existing
* pasid cache should be invalidated.
*/
+ if (!vtd_device_detach_iommufd(vtd_pce, &local_err)) {
+ error_reportf_err(local_err, "%s", "Detaching from HWPT failed: ");
+ }
QLIST_REMOVE(vtd_pce, next);
g_free(vtd_pce);
@@ -333,11 +322,17 @@ static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
VTDPASIDEntry *pe)
{
VTDAccelPASIDCacheEntry *vtd_pce;
+ Error *local_err = NULL;
QLIST_FOREACH(vtd_pce, &vtd_hiod->pasid_cache_list, next) {
if (vtd_pce->pasid == pasid) {
if (vtd_pasid_entry_compare(pe, &vtd_pce->pasid_entry)) {
vtd_pce->pasid_entry = *pe;
+
+ if (!vtd_device_attach_iommufd(vtd_pce, &local_err)) {
+ error_reportf_err(local_err, "%s",
+ "Replacing HWPT attachment failed: ");
+ }
}
return;
}
@@ -348,6 +343,10 @@ static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
vtd_pce->pasid = pasid;
vtd_pce->pasid_entry = *pe;
QLIST_INSERT_HEAD(&vtd_hiod->pasid_cache_list, vtd_pce, next);
+
+ if (!vtd_device_attach_iommufd(vtd_pce, &local_err)) {
+ error_reportf_err(local_err, "%s", "Attaching to HWPT failed: ");
+ }
}
/*
--
2.47.3
* [PATCH v2 12/14] intel_iommu_accel: drop _lock suffix in vtd_flush_host_piotlb_all_locked()
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (10 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 11/14] intel_iommu_accel: Support pasid binding/unbinding and PIOTLB flushing Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-26 9:11 ` [PATCH v2 13/14] intel_iommu_accel: Add pasid bits size check Zhenzhong Duan
` (2 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
In order to support PASID, we have switched from looping over vtd_as to
looping over vtd_hiod. vtd_hiod represents a host passthrough device and is
never dereferenced without the BQL, so we don't need the extra iommu lock
to protect it.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_accel.h | 14 +++++++-------
hw/i386/intel_iommu.c | 7 ++++---
hw/i386/intel_iommu_accel.c | 6 +++---
3 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
index c72856a8ff..45a12e0292 100644
--- a/hw/i386/intel_iommu_accel.h
+++ b/hw/i386/intel_iommu_accel.h
@@ -24,9 +24,9 @@ typedef struct VTDAccelPASIDCacheEntry {
bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
Error **errp);
VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace *as);
-void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
- uint32_t pasid, hwaddr addr,
- uint64_t npages, bool ih);
+void vtd_flush_host_piotlb_all_accel(IntelIOMMUState *s, uint16_t domain_id,
+ uint32_t pasid, hwaddr addr,
+ uint64_t npages, bool ih);
void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info);
void vtd_pasid_cache_reset_accel(IntelIOMMUState *s);
void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops);
@@ -51,10 +51,10 @@ static inline bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as,
return true;
}
-static inline void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s,
- uint16_t domain_id,
- uint32_t pasid, hwaddr addr,
- uint64_t npages, bool ih)
+static inline void vtd_flush_host_piotlb_all_accel(IntelIOMMUState *s,
+ uint16_t domain_id,
+ uint32_t pasid, hwaddr addr,
+ uint64_t npages, bool ih)
{
}
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f53642a611..5d0184fa0d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3011,11 +3011,11 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
info.domain_id = domain_id;
info.pasid = pasid;
+ vtd_flush_host_piotlb_all_accel(s, domain_id, pasid, 0, (uint64_t)-1,
+ false);
vtd_iommu_lock(s);
g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid,
&info);
- vtd_flush_host_piotlb_all_locked(s, domain_id, pasid, 0, (uint64_t)-1,
- false);
vtd_iommu_unlock(s);
QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
@@ -3045,10 +3045,11 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
info.addr = addr;
info.mask = ~((1 << am) - 1);
+ vtd_flush_host_piotlb_all_accel(s, domain_id, pasid, addr, 1 << am, ih);
+
vtd_iommu_lock(s);
g_hash_table_foreach_remove(s->iotlb,
vtd_hash_remove_by_page_piotlb, &info);
- vtd_flush_host_piotlb_all_locked(s, domain_id, pasid, addr, 1 << am, ih);
vtd_iommu_unlock(s);
vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index 26543489fb..2fd26690b9 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -231,9 +231,9 @@ static void vtd_flush_host_piotlb(VTDAccelPASIDCacheEntry *vtd_pce,
}
}
-void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
- uint32_t pasid, hwaddr addr,
- uint64_t npages, bool ih)
+void vtd_flush_host_piotlb_all_accel(IntelIOMMUState *s, uint16_t domain_id,
+ uint32_t pasid, hwaddr addr,
+ uint64_t npages, bool ih)
{
struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 };
VTDPIOTLBInvInfo piotlb_info;
--
2.47.3
* [PATCH v2 13/14] intel_iommu_accel: Add pasid bits size check
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (11 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 12/14] intel_iommu_accel: drop _lock suffix in vtd_flush_host_piotlb_all_locked() Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-27 4:32 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 14/14] intel_iommu: Expose flag VIOMMU_FLAG_PASID_SUPPORTED when configured Zhenzhong Duan
2026-03-27 3:58 ` [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Hao, Xudong
14 siblings, 1 reply; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
If the vIOMMU's PASID bits size is bigger than the host IOMMU's, the host
could fail to emulate all bindings made in the guest. Add a check to fail
device plug early.
The PASID bits size should also be no more than 20 bits according to the
PCI spec.
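For illustration, the comparison can be sketched standalone as below. This
is a minimal sketch under stated assumptions, not the patch's code:
sketch_extract64() is a local stand-in for QEMU's extract64(), and
sketch_host_pasid_bits() and sketch_pasid_bits_ok() are hypothetical names.
It assumes, as the hunk in this patch does, that ECAP.PSS sits in bits 39:35
and encodes the supported PASID width minus one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU's extract64(): 'length' bits starting at 'start'. */
static uint64_t sketch_extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Assumption: ECAP.PSS (bits 39:35) holds "supported PASID bits - 1". */
static uint8_t sketch_host_pasid_bits(uint64_t host_ecap)
{
    return (uint8_t)(sketch_extract64(host_ecap, 35, 5) + 1);
}

/*
 * Hypothetical helper: returns true when device plug may proceed.
 * The check is skipped when the host device reports no PASID support
 * (host_max_pasid_log2 == 0), matching the condition in the patch.
 */
static bool sketch_pasid_bits_ok(uint8_t guest_pasid_bits,
                                 uint8_t host_max_pasid_log2,
                                 uint64_t host_ecap)
{
    if (!host_max_pasid_log2) {
        return true;
    }
    return guest_pasid_bits <= sketch_host_pasid_bits(host_ecap);
}
```

With a host ECAP whose PSS field is 15, the host supports 16 PASID bits, so
a vIOMMU configured with 20 bits would be rejected at plug time while one
configured with 16 bits would pass.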
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 1 +
hw/i386/intel_iommu_accel.c | 8 ++++++++
2 files changed, 9 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f3cb6cff1c..d11064b527 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -196,6 +196,7 @@
#define VTD_ECAP_SRS (1ULL << 31)
#define VTD_ECAP_NWFS (1ULL << 33)
#define VTD_ECAP_SET_PSS(x, v) ((x)->ecap = deposit64((x)->ecap, 35, 5, v))
+#define VTD_ECAP_PSS(ecap) extract64(ecap, 35, 5)
#define VTD_ECAP_PASID (1ULL << 40)
#define VTD_ECAP_PDS (1ULL << 42)
#define VTD_ECAP_SMTS (1ULL << 43)
diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
index 2fd26690b9..e73695ff83 100644
--- a/hw/i386/intel_iommu_accel.c
+++ b/hw/i386/intel_iommu_accel.c
@@ -44,6 +44,7 @@ bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
HostIOMMUDevice *hiod = vtd_hiod->hiod;
struct HostIOMMUDeviceCaps *caps = &hiod->caps;
struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
+ uint8_t hpasid = VTD_ECAP_PSS(vtd->ecap_reg) + 1;
PCIBus *bus = vtd_hiod->bus;
PCIDevice *pdev = bus->devices[vtd_hiod->devfn];
@@ -64,6 +65,13 @@ bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
return false;
}
+ /* Only do the check when the host device supports PASIDs */
+ if (caps->max_pasid_log2 && s->pasid > hpasid) {
+ error_setg(errp, "PASID bits size %d > host IOMMU PASID bits size %d",
+ s->pasid, hpasid);
+ return false;
+ }
+
if (pci_device_get_iommu_bus_devfn(pdev, &bus, NULL, NULL)) {
error_setg(errp, "Host device downstream to a PCI bridge is "
"unsupported when x-flts=on");
--
2.47.3
* [PATCH v2 14/14] intel_iommu: Expose flag VIOMMU_FLAG_PASID_SUPPORTED when configured
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (12 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 13/14] intel_iommu_accel: Add pasid bits size check Zhenzhong Duan
@ 2026-03-26 9:11 ` Zhenzhong Duan
2026-03-27 3:58 ` [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Hao, Xudong
14 siblings, 0 replies; 38+ messages in thread
From: Zhenzhong Duan @ 2026-03-26 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao, Zhenzhong Duan
A VFIO device checks the VIOMMU_FLAG_PASID_SUPPORTED flag before exposing
the PASID capability. Without the flag, the guest cannot enable PASID on
the device even if the vIOMMU's pasid is configured.
This is the final knob to enable PASID.
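As a rough sketch of how the flag word is composed and consumed: the bit
positions below are placeholders chosen for illustration, and
sketch_viommu_flags() and sketch_expose_pasid_cap() are hypothetical names,
not QEMU functions. Only the shape of the logic is taken from the hunk in
this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions; the real values live in QEMU's headers. */
#define SKETCH_FLAG_WANT_NESTING_PARENT (1ULL << 0)
#define SKETCH_FLAG_PASID_SUPPORTED     (1ULL << 1)

/* Mirrors the shape of vtd_get_viommu_flags(): each knob adds one flag. */
static uint64_t sketch_viommu_flags(bool fsts, uint8_t pasid_bits)
{
    uint64_t flags;

    flags = fsts ? SKETCH_FLAG_WANT_NESTING_PARENT : 0;
    flags |= pasid_bits ? SKETCH_FLAG_PASID_SUPPORTED : 0;

    return flags;
}

/* Consumer side (assumed): expose the PASID capability only if flagged. */
static bool sketch_expose_pasid_cap(uint64_t viommu_flags)
{
    return (viommu_flags & SKETCH_FLAG_PASID_SUPPORTED) != 0;
}
```

The point of the sketch is that a non-zero pasid width on the vIOMMU is what
turns the flag on, independently of whether first stage translation is
requested.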
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5d0184fa0d..96b4102ab9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4795,6 +4795,7 @@ static uint64_t vtd_get_viommu_flags(void *opaque)
uint64_t flags;
flags = s->fsts ? VIOMMU_FLAG_WANT_NESTING_PARENT : 0;
+ flags |= s->pasid ? VIOMMU_FLAG_PASID_SUPPORTED : 0;
return flags;
}
--
2.47.3
* Re: [PATCH v2 01/14] vfio/iommufd: Extend attach/detach_hwpt callback implementations with pasid
2026-03-26 9:11 ` [PATCH v2 01/14] vfio/iommufd: Extend attach/detach_hwpt callback implementations with pasid Zhenzhong Duan
@ 2026-03-26 22:04 ` Nicolin Chen
0 siblings, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2026-03-26 22:04 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, alex, clg, eric.auger, mst, jasowang, jgg,
skolothumtho, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, xudong.hao
On Thu, Mar 26, 2026 at 05:11:15AM -0400, Zhenzhong Duan wrote:
> For attachment with pasid, pasid together with flag VFIO_DEVICE_ATTACH_PASID
> should be passed in.
>
> Define IOMMU_NO_PASID to represent device attachment without pasid same as
> in kernel.
>
> The implementation is similar for detachment.
>
> Suggested-by: Shameer Kolothum Thodi <skolothumtho@nvidia.com>
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
* Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
2026-03-26 9:11 ` [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid Zhenzhong Duan
@ 2026-03-26 22:18 ` Nicolin Chen
2026-03-27 2:32 ` Duan, Zhenzhong
2026-03-27 4:29 ` Yi Liu
1 sibling, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2026-03-26 22:18 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, alex, clg, eric.auger, mst, jasowang, jgg,
skolothumtho, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, xudong.hao, qemu-arm
On Thu, Mar 26, 2026 at 05:11:16AM -0400, Zhenzhong Duan wrote:
> @@ -138,14 +138,16 @@ struct HostIOMMUDeviceIOMMUFDClass {
> *
> * @idev: host IOMMU device backed by IOMMUFD backend.
Not commenting against this patch, but I just found the "host IOMMU
device" and the "HostIOMMUDeviceIOMMUFD" a bit ambiguous. It's not
an "IOMMU device" right? Perhaps somebody can help me understand :)
> + * @pasid: target pasid of attach.
> + *
How about "target pasid of the device to be attached"?
The uAPI doc uses the wording "pasid of this device", which makes
it clearer.
> * @hwpt_id: ID of IOMMUFD hardware page table.
> *
> * @errp: pass an Error out when attachment fails.
> *
> * Returns: true on success, false on failure.
> */
> - bool (*attach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, uint32_t hwpt_id,
> - Error **errp);
> + bool (*attach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, uint32_t pasid,
> + uint32_t hwpt_id, Error **errp);
> /**
> * @detach_hwpt: detach host IOMMU device from IOMMUFD hardware page table.
> * VFIO and VDPA device can have different implementation.
> @@ -154,15 +156,19 @@ struct HostIOMMUDeviceIOMMUFDClass {
> *
> * @idev: host IOMMU device backed by IOMMUFD backend.
> *
> + * @pasid: target pasid of detach.
Ditto
Otherwise,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
* Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
2026-03-26 9:11 ` [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag Zhenzhong Duan
@ 2026-03-26 22:53 ` Nicolin Chen
2026-03-27 2:29 ` Duan, Zhenzhong
2026-03-27 4:29 ` Yi Liu
1 sibling, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2026-03-26 22:53 UTC (permalink / raw)
To: Zhenzhong Duan, skolothumtho
Cc: qemu-devel, alex, clg, eric.auger, mst, jasowang, jgg,
joao.m.martins, clement.mathieu--drif, kevin.tian, yi.l.liu,
xudong.hao
On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
> @@ -430,6 +431,11 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> }
> }
>
> + if (max_pasid_log2 &&
> + vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
> + flags |= IOMMU_HWPT_ALLOC_PASID;
> + }
This would set it to:
IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
which isn't supported on ARM :-/
The VIOMMU_FLAG_PASID_SUPPORTED flag is to allow PCI core to expose
PASID cap. It feels like we need a different VIOMMU flag for VT-d,
e.g. VIOMMU_FLAG_WANT_PASID_ATTACH?
Thanks
Nicolin
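One way to picture the suggestion above, as a rough sketch only:
VIOMMU_FLAG_WANT_PASID_ATTACH is just a proposal in this thread and does not
exist, every bit position below is a placeholder, and sk_hwpt_alloc_flags()
is a hypothetical name. The idea is that the HWPT allocation flag is keyed
off a dedicated "want PASID attach" request rather than off the flag that
lets the PCI core expose the PASID capability.

```c
#include <stdint.h>

/* Placeholder vIOMMU flag bits, for illustration only. */
#define SK_VIOMMU_WANT_NESTING_PARENT (1ULL << 0)
#define SK_VIOMMU_PASID_SUPPORTED     (1ULL << 1)
#define SK_VIOMMU_WANT_PASID_ATTACH   (1ULL << 2) /* proposed, hypothetical */

/* Placeholder HWPT allocation flag bits, for illustration only. */
#define SK_HWPT_ALLOC_NEST_PARENT (1U << 0)
#define SK_HWPT_ALLOC_PASID       (1U << 3)

/*
 * Pick HWPT allocation flags. PASID_SUPPORTED alone never adds
 * SK_HWPT_ALLOC_PASID; only the dedicated request does, and only when
 * the host device reports PASID support (max_pasid_log2 != 0).
 */
static uint32_t sk_hwpt_alloc_flags(uint64_t viommu_flags,
                                    uint8_t max_pasid_log2)
{
    uint32_t flags = 0;

    if (viommu_flags & SK_VIOMMU_WANT_NESTING_PARENT) {
        flags |= SK_HWPT_ALLOC_NEST_PARENT;
    }
    if (max_pasid_log2 && (viommu_flags & SK_VIOMMU_WANT_PASID_ATTACH)) {
        flags |= SK_HWPT_ALLOC_PASID;
    }

    return flags;
}
```

Under this split, a vIOMMU that only sets the PASID-supported flag would keep
allocating a plain nesting parent, while one that also sets the proposed
flag would get the combined allocation flags.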
* RE: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
2026-03-26 22:53 ` Nicolin Chen
@ 2026-03-27 2:29 ` Duan, Zhenzhong
2026-03-27 4:08 ` Nicolin Chen
0 siblings, 1 reply; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 2:29 UTC (permalink / raw)
To: Nicolin Chen, skolothumtho@nvidia.com
Cc: qemu-devel@nongnu.org, alex@shazbot.org, clg@redhat.com,
eric.auger@redhat.com, mst@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, joao.m.martins@oracle.com,
clement.mathieu--drif@bull.com, Tian, Kevin, Liu, Yi L,
Hao, Xudong
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>IOMMU_HWPT_ALLOC_PASID flag
>
>On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
>> @@ -430,6 +431,11 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> }
>> }
>>
>> + if (max_pasid_log2 &&
>> + vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
>> + flags |= IOMMU_HWPT_ALLOC_PASID;
>> + }
>
>This would set it to:
> IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
>which isn't supported on ARM :-/
I am a bit confused. If the SMMU supports dirty tracking, flags would be
set to IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT, and
as far as I can see arm_smmu_domain_alloc_paging_flags() returns
-EOPNOTSUPP for that combination. So how does the SMMU work in this case?
>
>The VIOMMU_FLAG_PASID_SUPPORTED flag is to allow PCI core to expose
>PASID cap. It feels like we need a different VIOMMU flag for VT-d,
>e.g. VIOMMU_FLAG_WANT_PASID_ATTACH?
>
>Thanks
>Nicolin
* RE: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
2026-03-26 22:18 ` Nicolin Chen
@ 2026-03-27 2:32 ` Duan, Zhenzhong
2026-03-27 3:48 ` Nicolin Chen
0 siblings, 1 reply; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 2:32 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-devel@nongnu.org, alex@shazbot.org, clg@redhat.com,
eric.auger@redhat.com, mst@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Liu, Yi L, Hao, Xudong, qemu-arm@nongnu.org
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to
>support pasid
>
>On Thu, Mar 26, 2026 at 05:11:16AM -0400, Zhenzhong Duan wrote:
>> @@ -138,14 +138,16 @@ struct HostIOMMUDeviceIOMMUFDClass {
>> *
>> * @idev: host IOMMU device backed by IOMMUFD backend.
>
>Not commenting against this patch, but I just found the "host IOMMU
>device" and the "HostIOMMUDeviceIOMMUFD" a bit ambiguous. It's not
>an "IOMMU device" right? Perhaps somebody can help me understand :)
A host device under host IOMMU?
>
>> + * @pasid: target pasid of attach.
>> + *
>
>How about "target pasid of the device to be attached"?
Looks fine, will do.
Thanks
Zhenzhong
>
>The uAPI doc uses the wording "pasid of this device", which makes
>it clearer.
>
>> * @hwpt_id: ID of IOMMUFD hardware page table.
>> *
>> * @errp: pass an Error out when attachment fails.
>> *
>> * Returns: true on success, false on failure.
>> */
>> - bool (*attach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, uint32_t hwpt_id,
>> - Error **errp);
>> + bool (*attach_hwpt)(HostIOMMUDeviceIOMMUFD *idev, uint32_t pasid,
>> + uint32_t hwpt_id, Error **errp);
>> /**
>> * @detach_hwpt: detach host IOMMU device from IOMMUFD hardware
>page table.
>> * VFIO and VDPA device can have different implementation.
>> @@ -154,15 +156,19 @@ struct HostIOMMUDeviceIOMMUFDClass {
>> *
>> * @idev: host IOMMU device backed by IOMMUFD backend.
>> *
>> + * @pasid: target pasid of detach.
>
>Ditto
>
>Otherwise,
>
>Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
* Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
2026-03-27 2:32 ` Duan, Zhenzhong
@ 2026-03-27 3:48 ` Nicolin Chen
2026-03-27 6:44 ` Duan, Zhenzhong
0 siblings, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2026-03-27 3:48 UTC (permalink / raw)
To: Duan, Zhenzhong
Cc: qemu-devel@nongnu.org, alex@shazbot.org, clg@redhat.com,
eric.auger@redhat.com, mst@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Liu, Yi L, Hao, Xudong, qemu-arm@nongnu.org
On Fri, Mar 27, 2026 at 02:32:57AM +0000, Duan, Zhenzhong wrote:
> >-----Original Message-----
> >From: Nicolin Chen <nicolinc@nvidia.com>
> >Subject: Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to
> >support pasid
> >
> >On Thu, Mar 26, 2026 at 05:11:16AM -0400, Zhenzhong Duan wrote:
> >> @@ -138,14 +138,16 @@ struct HostIOMMUDeviceIOMMUFDClass {
> >> *
> >> * @idev: host IOMMU device backed by IOMMUFD backend.
> >
> >Not commenting against this patch, but I just found the "host IOMMU
> >device" and the "HostIOMMUDeviceIOMMUFD" a bit ambiguous. It's not
> >an "IOMMU device" right? Perhaps somebody can help me understand :)
>
> A host device under host IOMMU?
"host device" would make sense, not "host IOMMU device", right?
Nicolin
* RE: [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
` (13 preceding siblings ...)
2026-03-26 9:11 ` [PATCH v2 14/14] intel_iommu: Expose flag VIOMMU_FLAG_PASID_SUPPORTED when configured Zhenzhong Duan
@ 2026-03-27 3:58 ` Hao, Xudong
14 siblings, 0 replies; 38+ messages in thread
From: Hao, Xudong @ 2026-03-27 3:58 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex@shazbot.org, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, jasowang@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Liu, Yi L
Tested-by: Xudong Hao <xudong.hao@intel.com>
DSA PF assignment to VM with vIOMMU scalable mode, dmatest passed in VM.
* Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
2026-03-27 2:29 ` Duan, Zhenzhong
@ 2026-03-27 4:08 ` Nicolin Chen
2026-03-27 6:58 ` Duan, Zhenzhong
0 siblings, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2026-03-27 4:08 UTC (permalink / raw)
To: Duan, Zhenzhong
Cc: skolothumtho@nvidia.com, qemu-devel@nongnu.org, alex@shazbot.org,
clg@redhat.com, eric.auger@redhat.com, mst@redhat.com,
jasowang@redhat.com, jgg@nvidia.com, joao.m.martins@oracle.com,
clement.mathieu--drif@bull.com, Tian, Kevin, Liu, Yi L,
Hao, Xudong
On Fri, Mar 27, 2026 at 02:29:20AM +0000, Duan, Zhenzhong wrote:
> >-----Original Message-----
> >From: Nicolin Chen <nicolinc@nvidia.com>
> >Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
> >IOMMU_HWPT_ALLOC_PASID flag
> >
> >On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
> >> @@ -430,6 +431,11 @@ static bool
> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> >> }
> >> }
> >>
> >> + if (max_pasid_log2 &&
> >> + vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
> >> + flags |= IOMMU_HWPT_ALLOC_PASID;
> >> + }
> >
> >This would set it to:
> > IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
> >which isn't supported on ARM :-/
>
> I am a bit confused, if smmu supports dirty tracking, flags would be
> set to IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT,
> in arm_smmu_domain_alloc_paging_flags(), I see it will return -EOPNOTSUPP.
> So how did smmu work in this case?
You make a good point. I almost forgot that we need to do something about
that dirty tracking flag. This is currently broken.
For NVIDIA, the current-generation Grace CPU doesn't support dirty
tracking, so our QEMU VMs don't set that flag. We are just lucky here.
It would still trigger -EOPNOTSUPP on an ARM CPU that does support dirty
tracking, as you mentioned.
For pasid attachment, however, ARM doesn't need it: a regular pasid=0
attach already carries the pointer to a stage-1 PASID table.
Nicolin
* Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
2026-03-26 9:11 ` [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid Zhenzhong Duan
2026-03-26 22:18 ` Nicolin Chen
@ 2026-03-27 4:29 ` Yi Liu
2026-03-27 6:45 ` Duan, Zhenzhong
1 sibling, 1 reply; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:29 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao,
qemu-arm
On 3/26/26 17:11, Zhenzhong Duan wrote:
> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
> index 67d54849f2..45c08c8f6f 100644
> --- a/hw/i386/intel_iommu_accel.c
> +++ b/hw/i386/intel_iommu_accel.c
> @@ -121,7 +121,8 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> }
> }
>
> - ret = host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp);
> + ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID, hwpt_id,
> + errp);
> trace_vtd_device_attach_hwpt(idev->devid, vtd_as->pasid, hwpt_id, ret);
The trace appears to use the wrong pasid. Could you make it use
IOMMU_NO_PASID as well? The same applies to the chunks below.
> if (ret) {
> /* Destroy old fs_hwpt if it's a replacement */
> @@ -145,7 +146,7 @@ static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> bool ret;
>
> if (s->dmar_enabled && s->root_scalable) {
> - ret = host_iommu_device_iommufd_detach_hwpt(idev, errp);
> + ret = host_iommu_device_iommufd_detach_hwpt(idev, IOMMU_NO_PASID, errp);
> trace_vtd_device_detach_hwpt(idev->devid, pasid, ret);
> } else {
> /*
> @@ -153,7 +154,8 @@ static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> * we fallback to the default HWPT which contains shadow page table.
> * So guest DMA could still work.
> */
> - ret = host_iommu_device_iommufd_attach_hwpt(idev, idev->hwpt_id, errp);
> + ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID,
> + idev->hwpt_id, errp);
> trace_vtd_device_reattach_def_hwpt(idev->devid, pasid, idev->hwpt_id,
> ret);
Regards,
Yi Liu
* Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
2026-03-26 9:11 ` [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag Zhenzhong Duan
2026-03-26 22:53 ` Nicolin Chen
@ 2026-03-27 4:29 ` Yi Liu
2026-03-27 7:26 ` Duan, Zhenzhong
1 sibling, 1 reply; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:29 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao
On 3/26/26 17:11, Zhenzhong Duan wrote:
> When both device and vIOMMU have PASID enabled, then guest may setup
> pasid attached translation.
>
> We need to create the nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID
> flag because according to uAPI, any domain attached to the non-PASID
> part of the device must also be flagged, otherwise attaching a PASID
> will blocked.
Echoing my earlier comment on the commit message:
https://lore.kernel.org/qemu-devel/a33c785a-ab94-4dc2-85eb-10b7d288f661@intel.com/
* Re: [PATCH v2 05/14] intel_iommu: Change pasid property from bool to uint8
2026-03-26 9:11 ` [PATCH v2 05/14] intel_iommu: Change pasid property from bool to uint8 Zhenzhong Duan
@ 2026-03-27 4:30 ` Yi Liu
2026-03-27 7:41 ` Duan, Zhenzhong
0 siblings, 1 reply; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:30 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao
On 3/26/26 17:11, Zhenzhong Duan wrote:
> 'x-pasid-mode' is a bool property, we need an extra 'pss' property to
> represent PASID size supported. Because there is no any device in QEMU
> supporting pasid capability yet, no guest could use the pasid feature
> until now, 'x-pasid-mode' takes no effect.
>
> So instead of an extra 'pss' property we can use a single 'pasid'
> property of uint8 type to represent if pasid is supported and the PASID
> bits size. A value of N > 0 means pasid is supported and N - 1 is the
> value in PSS field in ECAP register.
>
> PASID bits size should also be no more than 20 bits according to PCI spec.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_internal.h | 2 +-
> include/hw/i386/intel_iommu.h | 2 +-
> hw/i386/intel_iommu.c | 11 +++++++++--
> 3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 11a53aa369..db4f186a3e 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -195,7 +195,7 @@
> #define VTD_ECAP_MHMV (15ULL << 20)
> #define VTD_ECAP_SRS (1ULL << 31)
> #define VTD_ECAP_NWFS (1ULL << 33)
> -#define VTD_ECAP_PSS (7ULL << 35) /* limit: MemTxAttrs::pid */
> +#define VTD_ECAP_SET_PSS(x, v) ((x)->ecap = deposit64((x)->ecap, 35, 5, v))
Does this change still respect the limit noted in the removed comment,
"/* limit: MemTxAttrs::pid */"?
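For context, here is a minimal standalone sketch of the encoding described
in the commit message, where a property value N > 0 is stored as PSS = N - 1
in ECAP bits 39:35. sketch_deposit64() and sketch_ecap_set_pasid() are
hypothetical stand-ins written only for illustration (the former mimics how
deposit64() is used in the macro above); they are not code from this series.
Under this scheme the old fixed value (7ULL << 35) corresponds to pasid=8.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the deposit64() use in VTD_ECAP_SET_PSS():
 * write the low `length` bits of `fieldval` into `value` at bit `start`.
 * Assumes 0 < length < 64 and start + length <= 64.
 */
static uint64_t sketch_deposit64(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    uint64_t mask = (~0ULL >> (64 - length)) << start;

    return (value & ~mask) | ((fieldval << start) & mask);
}

/*
 * Encode a "pasid" property value (width in bits, 0 means disabled)
 * into an ECAP value: PSS lives in bits 39:35, PASID support in bit 40.
 */
static uint64_t sketch_ecap_set_pasid(uint64_t ecap, uint8_t pasid_bits)
{
    if (pasid_bits == 0) {
        return ecap;                                /* PASID not exposed */
    }
    ecap = sketch_deposit64(ecap, 35, 5, (uint64_t)pasid_bits - 1);
    return ecap | (1ULL << 40);                     /* VTD_ECAP_PASID */
}
```

With pasid=20 the PSS field holds 19, which still fits in the 5-bit field,
so the width check against PCI_EXT_CAP_PASID_MAX_WIDTH further down is what
actually bounds the value.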
> #define VTD_ECAP_PASID (1ULL << 40)
> #define VTD_ECAP_PDS (1ULL << 42)
> #define VTD_ECAP_SMTS (1ULL << 43)
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index e44ce31841..95c76015e4 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -314,7 +314,7 @@ struct IntelIOMMUState {
> bool intr_eime; /* Extended interrupt mode enabled */
> OnOffAuto intr_eim; /* Toggle for EIM cabability */
> uint8_t aw_bits; /* Host/IOVA address width (in bits) */
> - bool pasid; /* Whether to support PASID */
> + uint8_t pasid; /* PASID supported in bits, 0 if not */
> bool fs1gp; /* First Stage 1-GByte Page Support */
>
> /* Transient Mapping, Reserved(0) since VTD spec revision 3.2 */
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index f395fa248c..a7b676cd13 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -4203,7 +4203,7 @@ static const Property vtd_properties[] = {
> DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
> DEFINE_PROP_BOOL("x-flts", IntelIOMMUState, fsts, FALSE),
> DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
> - DEFINE_PROP_BOOL("x-pasid-mode", IntelIOMMUState, pasid, false),
> + DEFINE_PROP_UINT8("pasid", IntelIOMMUState, pasid, 0),
> DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, false),
> DEFINE_PROP_BOOL("stale-tm", IntelIOMMUState, stale_tm, false),
> DEFINE_PROP_BOOL("fs1gp", IntelIOMMUState, fs1gp, true),
> @@ -5042,7 +5042,8 @@ static void vtd_cap_init(IntelIOMMUState *s)
> }
>
> if (s->pasid) {
> - s->ecap |= VTD_ECAP_PASID | VTD_ECAP_PSS;
> + VTD_ECAP_SET_PSS(s, s->pasid - 1);
> + s->ecap |= VTD_ECAP_PASID;
> }
> }
>
> @@ -5583,6 +5584,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> return false;
> }
>
> + if (s->pasid > PCI_EXT_CAP_PASID_MAX_WIDTH) {
> + error_setg(errp, "PASID width %d, exceed Max PASID Width %d allowed "
> + "in PCI spec", s->pasid, PCI_EXT_CAP_PASID_MAX_WIDTH);
s/exceed/exceeds/
> + return false;
> + }
> +
> if (s->svm) {
> if (!x86_iommu->dt_supported) {
> error_setg(errp, "Need to set device IOTLB for svm");
* Re: [PATCH v2 07/14] intel_iommu_accel: Handle PASID entry addition for pc_inv_dsc request
2026-03-26 9:11 ` [PATCH v2 07/14] intel_iommu_accel: Handle PASID entry addition for pc_inv_dsc request Zhenzhong Duan
@ 2026-03-27 4:30 ` Yi Liu
2026-03-27 8:08 ` Duan, Zhenzhong
0 siblings, 1 reply; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:30 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao
On 3/26/26 17:11, Zhenzhong Duan wrote:
> Structure VTDAddressSpace includes some elements suitable for emulated
> device and passthrough device without PASID, e.g., address space,
> different memory regions, etc, it is also protected by vtd iommu lock,
> all these are useless and become a burden for passthrough device with
> PASID.
>
> When there are lots of PASIDs used in one device, the AS and MRs are
> all registered to memory core and impact the whole system performance.
>
> So instead of using VTDAddressSpace to cache pasid entry for each pasid
> of a passthrough device, we define a light weight structure
> VTDAccelPASIDCacheEntry with only necessary elements for each pasid. We
> will use this struct as a parameter to conduct binding/unbinding to
> nested hwpt and to record the current bound nested hwpt. It's also
> designed to support PASID_0.
>
> VTDAccelPASIDCacheEntry is designed to only be used in intel_iommu_accel.c,
> similarly VTDPASIDCacheEntry should only be used in hw/i386/intel_iommu.c
>
> When guest creates new PASID entries, QEMU will capture the pc_inv_dsc
> (pasid cache invalidation) request, walk through each pasid in each
> passthrough device for valid pasid entries, create a new
> VTDAccelPASIDCacheEntry if not existing yet.
>
> PASID_0 of passthrough device still need to register MRs in case guest
> does not operate in scalable mode. So for PASID_0, we have both
> VTDAPASIDCacheEntry and VTDAccelPASIDCacheEntry.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Perhaps add a Co-developed-by tag along with it; otherwise this S-o-b tag
looks odd. The same applies to other commits in this series. :)
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_accel.h | 13 +++
> hw/i386/intel_iommu_internal.h | 8 ++
> hw/i386/intel_iommu.c | 3 +
> hw/i386/intel_iommu_accel.c | 170 +++++++++++++++++++++++++++++++++
> 4 files changed, 194 insertions(+)
>
> diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
> index e5f0b077b4..c5981a23bf 100644
> --- a/hw/i386/intel_iommu_accel.h
> +++ b/hw/i386/intel_iommu_accel.h
> @@ -12,6 +12,13 @@
> #define HW_I386_INTEL_IOMMU_ACCEL_H
> #include CONFIG_DEVICES
>
> +typedef struct VTDAccelPASIDCacheEntry {
> + VTDHostIOMMUDevice *vtd_hiod;
> + VTDPASIDEntry pasid_entry;
> + uint32_t pasid;
> + QLIST_ENTRY(VTDAccelPASIDCacheEntry) next;
> +} VTDAccelPASIDCacheEntry;
> +
> #ifdef CONFIG_VTD_ACCEL
> bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
> Error **errp);
> @@ -20,6 +27,7 @@ bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as, Error **errp);
> void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
> uint32_t pasid, hwaddr addr,
> uint64_t npages, bool ih);
> +void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info);
> void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops);
> #else
> static inline bool vtd_check_hiod_accel(IntelIOMMUState *s,
> @@ -49,6 +57,11 @@ static inline void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s,
> {
> }
>
> +static inline void vtd_pasid_cache_sync_accel(IntelIOMMUState *s,
> + VTDPASIDCacheInfo *pc_info)
> +{
> +}
> +
> static inline void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops)
> {
> }
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index c7e107fe87..d5f212ded9 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -616,6 +616,7 @@ typedef struct VTDRootEntry VTDRootEntry;
> #define VTD_CTX_ENTRY_SCALABLE_SIZE 32
>
> #define PASID_0 0
> +#define VTD_SM_CONTEXT_ENTRY_PDTS(x) extract64((x)->val[0], 9, 3)
> #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(aw) (0x1e0ULL | ~VTD_HAW_MASK(aw))
> #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 0xffffffffffe00000ULL
> #define VTD_SM_CONTEXT_ENTRY_PRE 0x10ULL
> @@ -646,6 +647,7 @@ typedef struct VTDPIOTLBInvInfo {
> #define VTD_PASID_DIR_BITS_MASK (0x3fffULL)
> #define VTD_PASID_DIR_INDEX(pasid) (((pasid) >> 6) & VTD_PASID_DIR_BITS_MASK)
> #define VTD_PASID_DIR_FPD (1ULL << 1) /* Fault Processing Disable */
> +#define VTD_PASID_TABLE_ENTRY_NUM (1ULL << 6)
> #define VTD_PASID_TABLE_BITS_MASK (0x3fULL)
> #define VTD_PASID_TABLE_INDEX(pasid) ((pasid) & VTD_PASID_TABLE_BITS_MASK)
> #define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */
> @@ -711,6 +713,7 @@ typedef struct VTDHostIOMMUDevice {
> PCIBus *bus;
> uint8_t devfn;
> HostIOMMUDevice *hiod;
> + QLIST_HEAD(, VTDAccelPASIDCacheEntry) pasid_cache_list;
> } VTDHostIOMMUDevice;
>
> /*
> @@ -768,6 +771,11 @@ static inline int vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
> return memcmp(p1, p2, sizeof(*p1));
> }
>
> +static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
> +{
> + return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce) + 7);
> +}
> +
> int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base, uint32_t pasid,
> VTDPASIDDirEntry *pdire);
> int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, uint32_t pasid,
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index b5d18ae321..451ede7530 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3202,6 +3202,8 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
> g_hash_table_foreach(s->vtd_address_spaces, vtd_pasid_cache_sync_locked,
> pc_info);
> vtd_iommu_unlock(s);
> +
> + vtd_pasid_cache_sync_accel(s, pc_info);
> }
>
> static void vtd_replay_pasid_bindings_all(IntelIOMMUState *s)
> @@ -4759,6 +4761,7 @@ static bool vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> vtd_hiod->devfn = (uint8_t)devfn;
> vtd_hiod->iommu_state = s;
> vtd_hiod->hiod = hiod;
> + QLIST_INIT(&vtd_hiod->pasid_cache_list);
>
> if (!vtd_check_hiod(s, vtd_hiod, errp)) {
> g_free(vtd_hiod);
> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
> index c2757f3bcd..32d8ab0ef9 100644
> --- a/hw/i386/intel_iommu_accel.c
> +++ b/hw/i386/intel_iommu_accel.c
> @@ -257,6 +257,176 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
> vtd_flush_host_piotlb_locked, &piotlb_info);
> }
>
> +static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
> + VTDPASIDEntry *pe)
> +{
> + VTDAccelPASIDCacheEntry *vtd_pce;
> +
> + QLIST_FOREACH(vtd_pce, &vtd_hiod->pasid_cache_list, next) {
> + if (vtd_pce->pasid == pasid) {
> + if (vtd_pasid_entry_compare(pe, &vtd_pce->pasid_entry)) {
> + vtd_pce->pasid_entry = *pe;
> + }
> + return;
> + }
> + }
> +
> + vtd_pce = g_malloc0(sizeof(VTDAccelPASIDCacheEntry));
> + vtd_pce->vtd_hiod = vtd_hiod;
> + vtd_pce->pasid = pasid;
> + vtd_pce->pasid_entry = *pe;
> + QLIST_INSERT_HEAD(&vtd_hiod->pasid_cache_list, vtd_pce, next);
> +}
> +
> +/*
> + * This function walks over PASID range within [start, end) in a single
> + * PASID table for entries matching @info type/did, then create
> + * VTDAccelPASIDCacheEntry if not exist yet.
> + */
> +static void vtd_sm_pasid_table_walk_one(VTDHostIOMMUDevice *vtd_hiod,
> + dma_addr_t pt_base,
> + int start,
> + int end,
> + VTDPASIDCacheInfo *info)
> +{
> + IntelIOMMUState *s = vtd_hiod->iommu_state;
> + VTDPASIDEntry pe;
> + int pasid;
> +
> + for (pasid = start; pasid < end; pasid++) {
> + if (vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe) ||
> + !vtd_pe_present(&pe)) {
> + continue;
> + }
> +
> + if ((info->type == VTD_INV_DESC_PASIDC_G_DSI ||
> + info->type == VTD_INV_DESC_PASIDC_G_PASID_SI) &&
> + (info->did != VTD_SM_PASID_ENTRY_DID(&pe))) {
> + /*
> + * VTD_PASID_CACHE_DOMSI and VTD_PASID_CACHE_PASIDSI
> + * requires domain id check. If domain id check fail,
> + * go to next pasid.
> + */
> + continue;
> + }
> +
> + vtd_accel_fill_pc(vtd_hiod, pasid, &pe);
> + }
> +}
> +
> +/*
> + * In VT-d scalable mode translation, PASID dir + PASID table is used.
> + * This function aims at looping over a range of PASIDs in the given
> + * two level table to identify the pasid config in guest.
> + */
> +static void vtd_sm_pasid_table_walk(VTDHostIOMMUDevice *vtd_hiod,
> + dma_addr_t pdt_base,
> + int start, int end,
> + VTDPASIDCacheInfo *info)
> +{
> + VTDPASIDDirEntry pdire;
> + int pasid = start;
> + int pasid_next;
> + dma_addr_t pt_base;
> +
> + while (pasid < end) {
> + pasid_next = (pasid + VTD_PASID_TABLE_ENTRY_NUM) &
> + ~(VTD_PASID_TABLE_ENTRY_NUM - 1);
> + pasid_next = pasid_next < end ? pasid_next : end;
> +
> + if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire)
> + && vtd_pdire_present(&pdire)) {
> + pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK;
> + vtd_sm_pasid_table_walk_one(vtd_hiod, pt_base, pasid, pasid_next,
> + info);
> + }
> + pasid = pasid_next;
> + }
> +}
> +
> +static void vtd_replay_pasid_bind_for_dev(VTDHostIOMMUDevice *vtd_hiod,
> + int start, int end,
> + VTDPASIDCacheInfo *pc_info)
s/vtd_replay_pasid_bind_for_dev/vtd_accel_replay_pasid_bind_for_dev/
> +{
> + IntelIOMMUState *s = vtd_hiod->iommu_state;
> + VTDContextEntry ce;
> + int dev_max_pasid = 1 << vtd_hiod->hiod->caps.max_pasid_log2;
> +
> + if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_hiod->bus),
> + vtd_hiod->devfn, &ce)) {
> + VTDPASIDCacheInfo walk_info = *pc_info;
> + uint32_t ce_max_pasid = vtd_sm_ce_get_pdt_entry_num(&ce) *
> + VTD_PASID_TABLE_ENTRY_NUM;
> +
> + end = MIN(end, MIN(dev_max_pasid, ce_max_pasid));
> +
> + vtd_sm_pasid_table_walk(vtd_hiod, VTD_CE_GET_PASID_DIR_TABLE(&ce),
> + start, end, &walk_info);
> + }
> +}
> +
> +/*
> + * This function replays the guest pasid bindings by walking the two level
> + * guest PASID table. For each valid pasid entry, it creates an entry
> + * VTDAccelPASIDCacheEntry dynamically if not exist yet. This entry holds
> + * info specific to a pasid
> + */
> +void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
I think it would be nicer to name it vtd_accel_pasid_cache_sync().
> +{
> + int start = PASID_0, end = 1 << s->pasid;
> + VTDHostIOMMUDevice *vtd_hiod;
> + GHashTableIter hiod_it;
> +
> + if (!s->fsts) {
> + return;
> + }
> +
> + /*
> + * VTDPASIDCacheInfo honors PCI pasid but VTDAccelPASIDCacheEntry honors
> + * iommu pasid
> + */
> + if (pc_info->pasid == PCI_NO_PASID) {
> + pc_info->pasid = PASID_0;
> + }
> +
> + switch (pc_info->type) {
> + case VTD_INV_DESC_PASIDC_G_PASID_SI:
> + start = pc_info->pasid;
> + end = pc_info->pasid + 1;
> + /* fall through */
> + case VTD_INV_DESC_PASIDC_G_DSI:
> + /*
> + * loop all assigned devices, do domain id check in
> + * vtd_sm_pasid_table_walk_one() after get pasid entry.
> + */
> + break;
> + case VTD_INV_DESC_PASIDC_G_GLOBAL:
> + /* loop all assigned devices */
> + break;
> + default:
> + g_assert_not_reached();
> + }
> +
> + /*
> + * In this replay, one only needs to care about the devices which are
> + * backed by host IOMMU. Those devices have a corresponding vtd_hiod
> + * in s->vtd_host_iommu_dev. For devices not backed by host IOMMU, it
> + * is not necessary to replay the bindings since their cache should be
> + * created in the future DMA address translation.
The words above can be dropped since this code lives in the _accel.c file.
Instead, I think you can add something like "Loop all the vtd_hiod
instances to sync the "pasid cache" per the guest pasid configuration."
> + *
> + * VTD translation callback never accesses vtd_hiod and its corresponding
> + * cached pasid entry, so no iommu lock needed here.
> + */
> + g_hash_table_iter_init(&hiod_it, s->vtd_host_iommu_dev);
> + while (g_hash_table_iter_next(&hiod_it, NULL, (void **)&vtd_hiod)) {
> + if (!object_dynamic_cast(OBJECT(vtd_hiod->hiod),
> + TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
> + continue;
> + }
> + vtd_replay_pasid_bind_for_dev(vtd_hiod, start, end, pc_info);
> + }
> +}
> +
> static uint64_t vtd_get_host_iommu_quirks(uint32_t type,
> void *caps, uint32_t size)
> {
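To illustrate the range handling in the walk quoted above, here is a small
standalone sketch of two steps: clamping the end of the PASID range to what
both the device and the context entry allow, and splitting [start, end) into
chunks that never cross a 64-entry leaf table boundary. All sketch_* names
are hypothetical and exist only for this example; the arithmetic follows
vtd_sm_ce_get_pdt_entry_num() and the pasid_next computation in the patch.

```c
#include <stdint.h>

#define SKETCH_PASID_TABLE_ENTRY_NUM 64   /* 1 << 6 PASIDs per leaf table */

/* Number of PASID directory entries for a PDTS field value: 2^(PDTS + 7). */
static uint32_t sketch_pdt_entry_num(uint32_t pdts)
{
    return 1U << (pdts + 7);
}

/* Clamp `end` to the smaller of the device limit and the context entry limit. */
static int sketch_clamp_end(int end, uint32_t dev_max_pasid_log2, uint32_t pdts)
{
    uint32_t dev_max = 1U << dev_max_pasid_log2;
    uint32_t ce_max = sketch_pdt_entry_num(pdts) * SKETCH_PASID_TABLE_ENTRY_NUM;
    uint32_t limit = dev_max < ce_max ? dev_max : ce_max;

    return (uint32_t)end < limit ? end : (int)limit;
}

/* End of the chunk starting at `pasid`: next 64-aligned boundary, capped at `end`. */
static int sketch_next_chunk_end(int pasid, int end)
{
    int next = (pasid + SKETCH_PASID_TABLE_ENTRY_NUM) &
               ~(SKETCH_PASID_TABLE_ENTRY_NUM - 1);

    return next < end ? next : end;
}

/* Count how many leaf tables a walk over [start, end) would visit. */
static int sketch_count_chunks(int start, int end)
{
    int n = 0;

    for (int pasid = start; pasid < end;
         pasid = sketch_next_chunk_end(pasid, end)) {
        n++;
    }
    return n;
}
```

For example, a walk over [60, 130) visits three leaf tables, covering
[60, 64), [64, 128) and [128, 130), so each directory entry is read once
per chunk rather than once per PASID.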
* Re: [PATCH v2 08/14] intel_iommu_accel: Handle PASID entry removal for pc_inv_dsc request
2026-03-26 9:11 ` [PATCH v2 08/14] intel_iommu_accel: Handle PASID entry removal " Zhenzhong Duan
@ 2026-03-27 4:31 ` Yi Liu
0 siblings, 0 replies; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:31 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao
On 3/26/26 17:11, Zhenzhong Duan wrote:
> When guest deletes PASID entries, QEMU will capture the pasid cache
> invalidation request, walk through pasid_cache_list in each passthrough
> device to find stale VTDAccelPASIDCacheEntry and delete them.
> This happen before the PASID entry addition, because a new added entry
> should never be removed.
drop above line.
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_accel.c | 75 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 75 insertions(+)
>
> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
> index 32d8ab0ef9..c1285ce331 100644
> --- a/hw/i386/intel_iommu_accel.c
> +++ b/hw/i386/intel_iommu_accel.c
> @@ -16,6 +16,28 @@
> #include "hw/pci/pci_bus.h"
> #include "trace.h"
>
> +static inline int vtd_hiod_get_pe_from_pasid(VTDAccelPASIDCacheEntry *vtd_pce,
> + VTDPASIDEntry *pe)
> +{
> + VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
> + IntelIOMMUState *s = vtd_hiod->iommu_state;
> + uint32_t pasid = vtd_pce->pasid;
> + VTDContextEntry ce;
> + int ret;
> +
> + if (!s->dmar_enabled || !s->root_scalable) {
> + return -VTD_FR_RTADDR_INV_TTM;
> + }
> +
> + ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_hiod->bus),
> + vtd_hiod->devfn, &ce);
> + if (ret) {
> + return ret;
> + }
> +
> + return vtd_ce_get_pasid_entry(s, &ce, pe, pasid);
> +}
> +
> bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
> Error **errp)
> {
> @@ -257,6 +279,52 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
> vtd_flush_host_piotlb_locked, &piotlb_info);
> }
>
> +static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
> + VTDPASIDCacheInfo *pc_info)
> +{
> + VTDPASIDEntry pe;
> + uint16_t did;
> +
> + /*
> + * VTD_INV_DESC_PASIDC_G_DSI and VTD_INV_DESC_PASIDC_G_PASID_SI require
> + * DID check. If DID doesn't match the value in cache or memory, then
> + * it's not a pasid entry we want to invalidate.
> + */
> + switch (pc_info->type) {
> + case VTD_INV_DESC_PASIDC_G_PASID_SI:
> + if (pc_info->pasid != vtd_pce->pasid) {
> + return;
> + }
> + /* Fall through */
> + case VTD_INV_DESC_PASIDC_G_DSI:
> + did = VTD_SM_PASID_ENTRY_DID(&vtd_pce->pasid_entry);
> + if (pc_info->did != did) {
> + return;
> + }
> + }
> +
> + if (vtd_hiod_get_pe_from_pasid(vtd_pce, &pe)) {
> + /*
> + * No valid pasid entry in guest memory. e.g. pasid entry was modified
> + * to be either all-zero or non-present. Either case means existing
> + * pasid cache should be invalidated.
> + */
> + QLIST_REMOVE(vtd_pce, next);
> + g_free(vtd_pce);
Could you wrap the two lines above into a helper placed near
vtd_accel_fill_pc()? There are no other callers, but it would be more
readable. :)
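A rough standalone sketch of the shape I have in mind, pairing an "add" with
a "remove" helper. It uses a hand-rolled list instead of the QLIST macros so
that it compiles on its own; struct sketch_pce, sketch_pce_add() and
sketch_pce_remove() are hypothetical names for illustration, not a proposal
for the actual identifiers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-down cache entry with an intrusive doubly linked list. */
struct sketch_pce {
    uint32_t pasid;
    struct sketch_pce *next;
    struct sketch_pce **pprev;   /* address of the pointer that points at us */
};

/* Allocate an entry and link it at the head of the list. */
static struct sketch_pce *sketch_pce_add(struct sketch_pce **head,
                                         uint32_t pasid)
{
    struct sketch_pce *pce = calloc(1, sizeof(*pce));

    if (!pce) {
        return NULL;
    }
    pce->pasid = pasid;
    pce->next = *head;
    if (*head) {
        (*head)->pprev = &pce->next;
    }
    pce->pprev = head;
    *head = pce;
    return pce;
}

/* Counterpart of the add helper: unlink the entry and free it in one place. */
static void sketch_pce_remove(struct sketch_pce *pce)
{
    if (pce->next) {
        pce->next->pprev = pce->pprev;
    }
    *pce->pprev = pce->next;
    free(pce);
}
```

Keeping unlink and free together would also give later patches a single
place to release any per-entry resources.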
> + }
> +}
> +
> +/* Delete invalid pasid cache entry from pasid_cache_list */
The above comment is not really necessary.
> +static void vtd_pasid_cache_invalidate(VTDHostIOMMUDevice *vtd_hiod,
> + VTDPASIDCacheInfo *pc_info)
> +{
> + VTDAccelPASIDCacheEntry *vtd_pce, *next;
> +
> + QLIST_FOREACH_SAFE(vtd_pce, &vtd_hiod->pasid_cache_list, next, next) {
> + vtd_pasid_cache_invalidate_one(vtd_pce, pc_info);
> + }
> +}
> +
> static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
> VTDPASIDEntry *pe)
> {
> @@ -423,6 +491,13 @@ void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
> TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
> continue;
> }
> +
> + /*
> + * PASID entry removal is handled before addition intentionally,
> + * because it's unnecessary to iterate on an entry that will be
> + * removed.
> + */
> + vtd_pasid_cache_invalidate(vtd_hiod, pc_info);
s/vtd_pasid_cache_invalidate/vtd_accel_pasid_cache_invalidate/
> vtd_replay_pasid_bind_for_dev(vtd_hiod, start, end, pc_info);
> }
> }
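As a reading aid for the granularity handling in
vtd_pasid_cache_invalidate_one(), the filter can be sketched as a standalone
predicate: a global request covers every cached entry, a domain-selective
request needs a DID match, and a PASID-selective request needs both a PASID
and a DID match. The enum, structs and sketch_inv_covers() below are
hypothetical stand-ins for illustration only. Note that in the patch a
covered entry is then re-read from guest memory and removed only if no valid
pasid entry is found there; that second step is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VTD_INV_DESC_PASIDC_G_* granularities. */
enum sketch_granularity {
    SKETCH_G_DSI,        /* domain-selective */
    SKETCH_G_PASID_SI,   /* PASID-selective within a domain */
    SKETCH_G_GLOBAL,     /* global */
};

struct sketch_inv_info {
    enum sketch_granularity type;
    uint16_t did;
    uint32_t pasid;
};

struct sketch_cache_entry {
    uint32_t pasid;
    uint16_t did;        /* DID taken from the cached pasid entry */
};

/* Return true if the invalidation request covers this cached entry. */
static bool sketch_inv_covers(const struct sketch_inv_info *info,
                              const struct sketch_cache_entry *e)
{
    switch (info->type) {
    case SKETCH_G_PASID_SI:
        return info->pasid == e->pasid && info->did == e->did;
    case SKETCH_G_DSI:
        return info->did == e->did;
    case SKETCH_G_GLOBAL:
        return true;
    }
    return false;
}
```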
* Re: [PATCH v2 10/14] intel_iommu_accel: Handle PASID entry removal for system reset
2026-03-26 9:11 ` [PATCH v2 10/14] intel_iommu_accel: Handle PASID entry removal for system reset Zhenzhong Duan
@ 2026-03-27 4:32 ` Yi Liu
0 siblings, 0 replies; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:32 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao
On 3/26/26 17:11, Zhenzhong Duan wrote:
> When system level reset, DMA translation is turned off, all PASID
> entries become stale and should be deleted.
>
> vtd_hiod list is never accessed without BQL, so no need to guard with
> iommu lock.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_accel.h | 5 +++++
> hw/i386/intel_iommu.c | 2 ++
> hw/i386/intel_iommu_accel.c | 13 +++++++++++++
> 3 files changed, 20 insertions(+)
>
> diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
> index c5981a23bf..1fb7ca0af6 100644
> --- a/hw/i386/intel_iommu_accel.h
> +++ b/hw/i386/intel_iommu_accel.h
> @@ -28,6 +28,7 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
> uint32_t pasid, hwaddr addr,
> uint64_t npages, bool ih);
> void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info);
> +void vtd_pasid_cache_reset_accel(IntelIOMMUState *s);
> void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops);
> #else
> static inline bool vtd_check_hiod_accel(IntelIOMMUState *s,
> @@ -62,6 +63,10 @@ static inline void vtd_pasid_cache_sync_accel(IntelIOMMUState *s,
> {
> }
>
> +static inline void vtd_pasid_cache_reset_accel(IntelIOMMUState *s)
> +{
> +}
> +
> static inline void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops)
> {
> }
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 451ede7530..b022f3cb9e 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -391,6 +391,8 @@ static void vtd_reset_caches(IntelIOMMUState *s)
> vtd_reset_context_cache_locked(s);
> vtd_pasid_cache_reset_locked(s);
> vtd_iommu_unlock(s);
> +
> + vtd_pasid_cache_reset_accel(s);
> }
>
> static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
> index 1e27c0feb8..e9e67eb1a0 100644
> --- a/hw/i386/intel_iommu_accel.c
> +++ b/hw/i386/intel_iommu_accel.c
> @@ -511,6 +511,19 @@ void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
> }
> }
>
> +/* Fake a gloal pasid cache invalidation to remove all pasid cache entries */
> +void vtd_pasid_cache_reset_accel(IntelIOMMUState *s)
> +{
> + VTDPASIDCacheInfo pc_info = { .type = VTD_INV_DESC_PASIDC_G_GLOBAL };
> + VTDHostIOMMUDevice *vtd_hiod;
> + GHashTableIter as_it;
> +
> + g_hash_table_iter_init(&as_it, s->vtd_host_iommu_dev);
> + while (g_hash_table_iter_next(&as_it, NULL, (void **)&vtd_hiod)) {
s/as_it/hiod_it/
The rest LGTM.
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
> + vtd_pasid_cache_invalidate(vtd_hiod, &pc_info);
> + }
> +}
> +
> static uint64_t vtd_get_host_iommu_quirks(uint32_t type,
> void *caps, uint32_t size)
> {
* Re: [PATCH v2 11/14] intel_iommu_accel: Support pasid binding/unbinding and PIOTLB flushing
2026-03-26 9:11 ` [PATCH v2 11/14] intel_iommu_accel: Support pasid binding/unbinding and PIOTLB flushing Zhenzhong Duan
@ 2026-03-27 4:32 ` Yi Liu
0 siblings, 0 replies; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:32 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao
On 3/26/26 17:11, Zhenzhong Duan wrote:
> We just switched to use VTDAccelPASIDCacheEntry to cache pasid entry of
> passthrough device, also need to switch the binding/unbinding and PIOTLB
> flushing functions to use the same structure.
>
> After the switching, we could remove accel related code from
> vtd_pasid_cache_[reset/sync]_locked() to make intel_iommu.c cleaner.
>
> The VTDAddressSpace of PASID_0 is still useful as VTD supports a legacy
> mode which needs shadow page table instead of nested page table.
This patch does quite a few things, but I don't have a good idea for how
to split it, so it would be good to have a clear description. Here is a
suggestion, FYI.
Subject:
intel_iommu: Switch to VTDAccelPASIDCacheEntry for PASID bind/unbind and
PIOTLB invalidation
Commit message:
This patch switches from VTDAddressSpace to VTDAccelPASIDCacheEntry for
handling PASID bind/unbind operations and PIOTLB invalidations in
passthrough scenarios. VTDAccelPASIDCacheEntry was introduced to cache
PASID entries for passthrough devices and is now ready to propagate
PASID bind/unbind operations and PIOTLB invalidations to the host.
Unlike the previous approach, VTDAccelPASIDCacheEntry supports both
PASID_0 (rid_pasid) and other valid PASIDs, so this switch drops the
PASID_0 limitations that existed in the prior PASID bind/unbind and
PIOTLB invalidation path. For PASID_0 of passthrough devices,
VTDAddressSpace continues to handle shadow page modifications to the
host, but no longer manages PASID bind/unbind operations or PIOTLB
invalidations for passthrough scenarios.
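To make the new invalidation path easier to picture, here is a minimal, self-contained sketch of the nested walk (every host device, then every cached PASID entry on that device) with the domain-id/PASID filter. All type and function names here (pasid_entry_sketch, host_dev_sketch, flush_matching) are hypothetical stand-ins, not the QEMU structures, and the sketch only counts matches instead of calling into iommufd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-PASID cache entry of one host device. */
struct pasid_entry_sketch {
    uint32_t pasid;        /* 0 plays the role of the rid_pasid / no-PASID case */
    uint16_t domain_id;    /* domain id taken from the cached PASID entry */
    bool first_stage;      /* true if the entry uses first stage translation */
    uint32_t fs_hwpt_id;   /* id of the nested HWPT that would be flushed */
    struct pasid_entry_sketch *next;
};

/* Hypothetical stand-in for a host device with its PASID cache list. */
struct host_dev_sketch {
    struct pasid_entry_sketch *pasid_list;
    struct host_dev_sketch *next;
};

/*
 * Walk every device and every cached PASID entry, and count the entries
 * that a PIOTLB invalidation for (domain_id, pasid) would have to be
 * propagated to. Entries without first stage translation are skipped.
 */
static unsigned flush_matching(const struct host_dev_sketch *devs,
                               uint16_t domain_id, uint32_t pasid)
{
    unsigned hits = 0;

    for (const struct host_dev_sketch *dev = devs; dev; dev = dev->next) {
        for (const struct pasid_entry_sketch *pce = dev->pasid_list; pce;
             pce = pce->next) {
            /* PASIDs are at most 20 bits wide */
            assert(pce->pasid < (1u << 20));

            if (!pce->first_stage) {
                continue;
            }
            if (pce->domain_id == domain_id && pce->pasid == pasid) {
                hits++; /* a real implementation would flush fs_hwpt_id here */
            }
        }
    }
    return hits;
}
```

The point of the sketch is that PASID 0 needs no special case: it is just another entry in the per-device list and goes through the same filter as any other PASID.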
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_accel.h | 2 +-
> include/hw/i386/intel_iommu.h | 2 -
> hw/i386/intel_iommu.c | 17 +----
> hw/i386/intel_iommu_accel.c | 125 +++++++++++++++++-----------------
> 4 files changed, 64 insertions(+), 82 deletions(-)
>
> diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
> index 1fb7ca0af6..c72856a8ff 100644
> --- a/hw/i386/intel_iommu_accel.h
> +++ b/hw/i386/intel_iommu_accel.h
> @@ -16,6 +16,7 @@ typedef struct VTDAccelPASIDCacheEntry {
> VTDHostIOMMUDevice *vtd_hiod;
> VTDPASIDEntry pasid_entry;
> uint32_t pasid;
> + uint32_t fs_hwpt_id;
> QLIST_ENTRY(VTDAccelPASIDCacheEntry) next;
> } VTDAccelPASIDCacheEntry;
>
> @@ -23,7 +24,6 @@ typedef struct VTDAccelPASIDCacheEntry {
> bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
> Error **errp);
> VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace *as);
> -bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as, Error **errp);
> void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
> uint32_t pasid, hwaddr addr,
> uint64_t npages, bool ih);
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 95c76015e4..1842ba5840 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -154,8 +154,6 @@ struct VTDAddressSpace {
> * with the guest IOMMU pgtables for a device.
> */
> IOVATree *iova_tree;
> -
> - uint32_t fs_hwpt_id;
> };
>
> struct VTDIOTLBEntry {
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index b022f3cb9e..f53642a611 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -86,8 +86,6 @@ static void vtd_pasid_cache_reset_locked(IntelIOMMUState *s)
> VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
> if (pc_entry->valid) {
> pc_entry->valid = false;
> - /* It's fatal to get failure during reset */
> - vtd_propagate_guest_pasid(vtd_as, &error_fatal);
> }
> }
> }
> @@ -3126,8 +3124,6 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
> VTDPASIDEntry pe;
> IOMMUNotifier *n;
> uint16_t did;
> - const char *err_prefix = "Attaching to HWPT failed: ";
> - Error *local_err = NULL;
>
> if (vtd_dev_get_pe_from_pasid(vtd_as, &pe)) {
> if (!pc_entry->valid) {
> @@ -3148,9 +3144,6 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
> vtd_address_space_unmap(vtd_as, n);
> }
> vtd_switch_address_space(vtd_as);
> -
> - err_prefix = "Detaching from HWPT failed: ";
> - goto do_bind_unbind;
> }
>
> /*
> @@ -3178,20 +3171,12 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
> if (!pc_entry->valid) {
> pc_entry->pasid_entry = pe;
> pc_entry->valid = true;
> - } else if (vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
> - err_prefix = "Replacing HWPT attachment failed: ";
> - } else {
> + } else if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) {
> return;
> }
>
> vtd_switch_address_space(vtd_as);
> vtd_address_space_sync(vtd_as);
> -
> -do_bind_unbind:
> - /* TODO: Fault event injection into guest, report error to QEMU for now */
> - if (!vtd_propagate_guest_pasid(vtd_as, &local_err)) {
> - error_reportf_err(local_err, "%s", err_prefix);
> - }
> }
>
> static void vtd_pasid_cache_sync(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
> index e9e67eb1a0..26543489fb 100644
> --- a/hw/i386/intel_iommu_accel.c
> +++ b/hw/i386/intel_iommu_accel.c
> @@ -111,23 +111,24 @@ static bool vtd_create_fs_hwpt(VTDHostIOMMUDevice *vtd_hiod,
> }
>
> static void vtd_destroy_old_fs_hwpt(VTDHostIOMMUDevice *vtd_hiod,
vtd_hiod can be retrieved from vtd_pce?
> - VTDAddressSpace *vtd_as)
> + VTDAccelPASIDCacheEntry *vtd_pce)
> {
> HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
>
> - if (!vtd_as->fs_hwpt_id) {
> + if (!vtd_pce->fs_hwpt_id) {
> return;
> }
> - iommufd_backend_free_id(idev->iommufd, vtd_as->fs_hwpt_id);
> - vtd_as->fs_hwpt_id = 0;
> + iommufd_backend_free_id(idev->iommufd, vtd_pce->fs_hwpt_id);
> + vtd_pce->fs_hwpt_id = 0;
> }
>
> -static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> - VTDAddressSpace *vtd_as, Error **errp)
> +static bool vtd_device_attach_iommufd(VTDAccelPASIDCacheEntry *vtd_pce,
> + Error **errp)
> {
> + VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
> HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
> - VTDPASIDEntry *pe = &vtd_as->pasid_cache_entry.pasid_entry;
> - uint32_t hwpt_id = idev->hwpt_id;
> + VTDPASIDEntry *pe = &vtd_pce->pasid_entry;
> + uint32_t hwpt_id = idev->hwpt_id, pasid = vtd_pce->pasid;
> bool ret;
>
> /*
> @@ -147,14 +148,13 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> }
> }
>
> - ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID, hwpt_id,
> - errp);
> - trace_vtd_device_attach_hwpt(idev->devid, vtd_as->pasid, hwpt_id, ret);
> + ret = host_iommu_device_iommufd_attach_hwpt(idev, pasid, hwpt_id, errp);
> + trace_vtd_device_attach_hwpt(idev->devid, pasid, hwpt_id, ret);
> if (ret) {
> /* Destroy old fs_hwpt if it's a replacement */
> - vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_as);
> + vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_pce);
> if (vtd_pe_pgtt_is_fst(pe)) {
> - vtd_as->fs_hwpt_id = hwpt_id;
> + vtd_pce->fs_hwpt_id = hwpt_id;
> }
> } else if (vtd_pe_pgtt_is_fst(pe)) {
> iommufd_backend_free_id(idev->iommufd, hwpt_id);
> @@ -163,16 +163,17 @@ static bool vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> return ret;
> }
>
> -static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> - VTDAddressSpace *vtd_as, Error **errp)
> +static bool vtd_device_detach_iommufd(VTDAccelPASIDCacheEntry *vtd_pce,
> + Error **errp)
> {
> + VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
> HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
> - IntelIOMMUState *s = vtd_as->iommu_state;
> - uint32_t pasid = vtd_as->pasid;
> + IntelIOMMUState *s = vtd_hiod->iommu_state;
> + uint32_t pasid = vtd_pce->pasid;
> bool ret;
>
> - if (s->dmar_enabled && s->root_scalable) {
> - ret = host_iommu_device_iommufd_detach_hwpt(idev, IOMMU_NO_PASID, errp);
> + if (pasid != IOMMU_NO_PASID || (s->dmar_enabled && s->root_scalable)) {
> + ret = host_iommu_device_iommufd_detach_hwpt(idev, pasid, errp);
> trace_vtd_device_detach_hwpt(idev->devid, pasid, ret);
> } else {
> /*
> @@ -180,72 +181,47 @@ static bool vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
> * we fallback to the default HWPT which contains shadow page table.
> * So guest DMA could still work.
> */
> - ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID,
> + ret = host_iommu_device_iommufd_attach_hwpt(idev, pasid,
> idev->hwpt_id, errp);
> trace_vtd_device_reattach_def_hwpt(idev->devid, pasid, idev->hwpt_id,
> ret);
> }
>
> if (ret) {
> - vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_as);
> + vtd_destroy_old_fs_hwpt(vtd_hiod, vtd_pce);
> }
>
> return ret;
> }
>
> -bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as, Error **errp)
> -{
> - VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
> - VTDHostIOMMUDevice *vtd_hiod = vtd_find_hiod_iommufd(vtd_as);
> -
> - /* Ignore emulated device or legacy VFIO backed device */
> - if (!vtd_as->iommu_state->fsts || !vtd_hiod) {
> - return true;
> - }
> -
> - if (pc_entry->valid) {
> - return vtd_device_attach_iommufd(vtd_hiod, vtd_as, errp);
> - }
> -
> - return vtd_device_detach_iommufd(vtd_hiod, vtd_as, errp);
> -}
> -
> /*
> - * This function is a loop function for the s->vtd_address_spaces
> - * list with VTDPIOTLBInvInfo as execution filter. It propagates
> - * the piotlb invalidation to host.
> + * This function is a loop function for the s->vtd_host_iommu_dev
> + * and vtd_hiod->pasid_cache_list lists with VTDPIOTLBInvInfo as
> + * execution filter. It propagates the piotlb invalidation to host.
> */
> -static void vtd_flush_host_piotlb_locked(gpointer key, gpointer value,
> - gpointer user_data)
> +static void vtd_flush_host_piotlb(VTDAccelPASIDCacheEntry *vtd_pce,
> + VTDPIOTLBInvInfo *piotlb_info)
> {
> - VTDPIOTLBInvInfo *piotlb_info = user_data;
> - VTDAddressSpace *vtd_as = value;
> - VTDHostIOMMUDevice *vtd_hiod = vtd_find_hiod_iommufd(vtd_as);
> - VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
> + VTDHostIOMMUDevice *vtd_hiod = vtd_pce->vtd_hiod;
> + VTDPASIDEntry *pe = &vtd_pce->pasid_entry;
> uint16_t did;
>
> - if (!vtd_hiod) {
> - return;
> - }
> -
> - assert(vtd_as->pasid == PCI_NO_PASID);
> -
> /* Nothing to do if there is no first stage HWPT attached */
> - if (!pc_entry->valid ||
> - !vtd_pe_pgtt_is_fst(&pc_entry->pasid_entry)) {
> + if (!vtd_pe_pgtt_is_fst(pe)) {
> return;
> }
>
> - did = VTD_SM_PASID_ENTRY_DID(&pc_entry->pasid_entry);
> + did = VTD_SM_PASID_ENTRY_DID(pe);
>
> - if (piotlb_info->domain_id == did && piotlb_info->pasid == PASID_0) {
> + if (piotlb_info->domain_id == did && piotlb_info->pasid == vtd_pce->pasid) {
Have you considered using IOMMU_NO_PASID instead of PASID_0 before
this? When reading this change, I'm wondering why it changes
PASID_0 to vtd_pce->pasid while other parts of this patch change
IOMMU_NO_PASID to vtd_pce->pasid. I think we already have
consensus that IOMMU_NO_PASID is 0, so you could add a patch that
switches from PASID_0 to IOMMU_NO_PASID.
> HostIOMMUDeviceIOMMUFD *idev =
> HOST_IOMMU_DEVICE_IOMMUFD(vtd_hiod->hiod);
> uint32_t entry_num = 1; /* Only implement one request for simplicity */
> Error *local_err = NULL;
> struct iommu_hwpt_vtd_s1_invalidate *cache = piotlb_info->inv_data;
>
> - if (!iommufd_backend_invalidate_cache(idev->iommufd, vtd_as->fs_hwpt_id,
> + if (!iommufd_backend_invalidate_cache(idev->iommufd,
> + vtd_pce->fs_hwpt_id,
> IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
> sizeof(*cache), &entry_num, cache,
> &local_err)) {
> @@ -261,6 +237,8 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
> {
> struct iommu_hwpt_vtd_s1_invalidate cache_info = { 0 };
> VTDPIOTLBInvInfo piotlb_info;
> + VTDHostIOMMUDevice *vtd_hiod;
> + GHashTableIter as_it;
s/as_it/hiod_it/
> cache_info.addr = addr;
> cache_info.npages = npages;
> @@ -271,12 +249,19 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
> piotlb_info.inv_data = &cache_info;
>
> /*
> - * Go through each vtd_as instance in s->vtd_address_spaces, find out
> - * affected host devices which need host piotlb invalidation. Piotlb
> - * invalidation should check pasid cache per architecture point of view.
> + * Go through each vtd_pce in vtd_hiod->pasid_cache_list for each host
> + * device, find out affected host device pasid which need host piotlb
> + * invalidation. Piotlb invalidation should check pasid cache per
> + * architecture point of view.
> */
> - g_hash_table_foreach(s->vtd_address_spaces,
> - vtd_flush_host_piotlb_locked, &piotlb_info);
> + g_hash_table_iter_init(&as_it, s->vtd_host_iommu_dev);
> + while (g_hash_table_iter_next(&as_it, NULL, (void **)&vtd_hiod)) {
> + VTDAccelPASIDCacheEntry *vtd_pce;
> +
> + QLIST_FOREACH(vtd_pce, &vtd_hiod->pasid_cache_list, next) {
> + vtd_flush_host_piotlb(vtd_pce, &piotlb_info);
> + }
> + }
> }
>
> static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
> @@ -284,6 +269,7 @@ static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
> {
> VTDPASIDEntry pe;
> uint16_t did;
> + Error *local_err = NULL;
>
> /*
> * VTD_INV_DESC_PASIDC_G_DSI and VTD_INV_DESC_PASIDC_G_PASID_SI require
> @@ -309,6 +295,9 @@ static void vtd_pasid_cache_invalidate_one(VTDAccelPASIDCacheEntry *vtd_pce,
> * to be either all-zero or non-present. Either case means existing
> * pasid cache should be invalidated.
> */
> + if (!vtd_device_detach_iommufd(vtd_pce, &local_err)) {
> + error_reportf_err(local_err, "%s", "Detaching from HWPT failed: ");
> + }
> QLIST_REMOVE(vtd_pce, next);
> g_free(vtd_pce);
>
> @@ -333,11 +322,17 @@ static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
> VTDPASIDEntry *pe)
> {
> VTDAccelPASIDCacheEntry *vtd_pce;
> + Error *local_err = NULL;
>
> QLIST_FOREACH(vtd_pce, &vtd_hiod->pasid_cache_list, next) {
> if (vtd_pce->pasid == pasid) {
> if (vtd_pasid_entry_compare(pe, &vtd_pce->pasid_entry)) {
> vtd_pce->pasid_entry = *pe;
> +
> + if (!vtd_device_attach_iommufd(vtd_pce, &local_err)) {
> + error_reportf_err(local_err, "%s",
> + "Replacing HWPT attachment failed: ");
> + }
> }
> return;
> }
> @@ -348,6 +343,10 @@ static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
> vtd_pce->pasid = pasid;
> vtd_pce->pasid_entry = *pe;
> QLIST_INSERT_HEAD(&vtd_hiod->pasid_cache_list, vtd_pce, next);
> +
> + if (!vtd_device_attach_iommufd(vtd_pce, &local_err)) {
> + error_reportf_err(local_err, "%s", "Attaching to HWPT failed: ");
> + }
> }
>
> /*
* Re: [PATCH v2 13/14] intel_iommu_accel: Add pasid bits size check
2026-03-26 9:11 ` [PATCH v2 13/14] intel_iommu_accel: Add pasid bits size check Zhenzhong Duan
@ 2026-03-27 4:32 ` Yi Liu
0 siblings, 0 replies; 38+ messages in thread
From: Yi Liu @ 2026-03-27 4:32 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex, clg, eric.auger, mst, jasowang, jgg, nicolinc, skolothumtho,
joao.m.martins, clement.mathieu--drif, kevin.tian, xudong.hao
On 3/26/26 17:11, Zhenzhong Duan wrote:
> If pasid bits size is bigger than host side, host could fail to emulate
> all bindings in guest. Add a check to fail device plug early.
>
> Pasid bits size should also be no more than 20 bits according to PCI spec.
This has been enforced in the prior patch, right? If so, just drop it. Or
you may merge them into one patch.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_internal.h | 1 +
> hw/i386/intel_iommu_accel.c | 8 ++++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index f3cb6cff1c..d11064b527 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -196,6 +196,7 @@
> #define VTD_ECAP_SRS (1ULL << 31)
> #define VTD_ECAP_NWFS (1ULL << 33)
> #define VTD_ECAP_SET_PSS(x, v) ((x)->ecap = deposit64((x)->ecap, 35, 5, v))
> +#define VTD_ECAP_PSS(ecap) extract64(ecap, 35, 5)
Perhaps VTD_ECAP_GET_PSS. I'm also wondering if we should have a set
of macros to get other caps as well. For example, "s->ecap & VTD_ECAP_SMTS"
could become "VTD_ECAP_GET_SMTS(s->ecap, VTD_ECAP_SMTS)".
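As a rough illustration of what such getters could look like, here is a self-contained sketch. The helper names (field_get, field_set, ecap_get_pss, ecap_has_smts and so on) are made up for this example; field_get/field_set only mimic what extract64()/deposit64() are used for in the patch and are not the QEMU functions. The bit positions (PSS at bits 35..39, SMTS at bit 43) and the "width is PSS + 1" convention are taken from the patches in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ECAP_PSS_SHIFT 35           /* PSS field starts at bit 35 */
#define ECAP_PSS_LEN   5            /* and is 5 bits wide */
#define ECAP_SMTS      (1ULL << 43)

/* Hypothetical stand-in for a bit-field extract helper. */
static inline uint64_t field_get(uint64_t reg, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    return (reg >> shift) & ((1ULL << len) - 1);
}

/* Hypothetical stand-in for a bit-field deposit helper. */
static inline uint64_t field_set(uint64_t reg, unsigned shift, unsigned len,
                                 uint64_t val)
{
    uint64_t mask;

    assert(len > 0 && len < 64 && shift + len <= 64);
    mask = ((1ULL << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Getter for a multi-bit capability field. */
static inline uint8_t ecap_get_pss(uint64_t ecap)
{
    return (uint8_t)field_get(ecap, ECAP_PSS_SHIFT, ECAP_PSS_LEN);
}

/* Getter for a single-bit capability. */
static inline bool ecap_has_smts(uint64_t ecap)
{
    return (ecap & ECAP_SMTS) != 0;
}

/* PSS stores "width - 1"; a width of 0 (no PASID) is never encoded. */
static inline uint64_t ecap_set_pasid_width(uint64_t ecap, uint8_t width)
{
    assert(width >= 1 && width <= 20);
    return field_set(ecap, ECAP_PSS_SHIFT, ECAP_PSS_LEN, width - 1);
}

static inline uint8_t ecap_pasid_width(uint64_t ecap)
{
    return ecap_get_pss(ecap) + 1;
}

/* The guest width must not exceed what the host IOMMU reports. */
static inline bool guest_pasid_width_fits(uint8_t guest_width,
                                          uint64_t host_ecap)
{
    return guest_width <= ecap_pasid_width(host_ecap);
}
```

Note that the check in the patch is additionally gated on caps->max_pasid_log2 being non-zero; the sketch leaves that gate out to keep the encode/decode round trip in focus.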
> #define VTD_ECAP_PASID (1ULL << 40)
> #define VTD_ECAP_PDS (1ULL << 42)
> #define VTD_ECAP_SMTS (1ULL << 43)
> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
> index 2fd26690b9..e73695ff83 100644
> --- a/hw/i386/intel_iommu_accel.c
> +++ b/hw/i386/intel_iommu_accel.c
> @@ -44,6 +44,7 @@ bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
> HostIOMMUDevice *hiod = vtd_hiod->hiod;
> struct HostIOMMUDeviceCaps *caps = &hiod->caps;
> struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
> + uint8_t hpasid = VTD_ECAP_PSS(vtd->ecap_reg) + 1;
> PCIBus *bus = vtd_hiod->bus;
> PCIDevice *pdev = bus->devices[vtd_hiod->devfn];
>
> @@ -64,6 +65,13 @@ bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
> return false;
> }
>
> + /* Only do the check when host device support PASIDs */
> + if (caps->max_pasid_log2 && s->pasid > hpasid) {
> + error_setg(errp, "PASID bits size %d > host IOMMU PASID bits size %d",
> + s->pasid, hpasid);
> + return false;
> + }
> +
> if (pci_device_get_iommu_bus_devfn(pdev, &bus, NULL, NULL)) {
> error_setg(errp, "Host device downstream to a PCI bridge is "
> "unsupported when x-flts=on");
* RE: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
2026-03-27 3:48 ` Nicolin Chen
@ 2026-03-27 6:44 ` Duan, Zhenzhong
2026-03-27 9:34 ` Cédric Le Goater
0 siblings, 1 reply; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 6:44 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-devel@nongnu.org, alex@shazbot.org, clg@redhat.com,
eric.auger@redhat.com, mst@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Liu, Yi L, Hao, Xudong, qemu-arm@nongnu.org
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to
>support pasid
>
>On Fri, Mar 27, 2026 at 02:32:57AM +0000, Duan, Zhenzhong wrote:
>> >-----Original Message-----
>> >From: Nicolin Chen <nicolinc@nvidia.com>
>> >Subject: Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks
>to
>> >support pasid
>> >
>> >On Thu, Mar 26, 2026 at 05:11:16AM -0400, Zhenzhong Duan wrote:
>> >> @@ -138,14 +138,16 @@ struct HostIOMMUDeviceIOMMUFDClass {
>> >> *
>> >> * @idev: host IOMMU device backed by IOMMUFD backend.
>> >
>> >Not commenting against this patch, but I just found the "host IOMMU
>> >device" and the "HostIOMMUDeviceIOMMUFD" a bit ambiguous. It's not
>> >an "IOMMU device" right? Perhaps somebody can help me understand :)
>>
>> A host device under host IOMMU?
>
>"host device" would make sense, not "host IOMMU device", right?
We want to emphasize that it's a "host device" backed by a "host IOMMU";
"host device" alone is not enough, I think.
Thanks
Zhenzhong
* RE: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
2026-03-27 4:29 ` Yi Liu
@ 2026-03-27 6:45 ` Duan, Zhenzhong
0 siblings, 0 replies; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 6:45 UTC (permalink / raw)
To: Liu, Yi L, qemu-devel@nongnu.org
Cc: alex@shazbot.org, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, jasowang@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Hao, Xudong, qemu-arm@nongnu.org
>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to
>support pasid
>
>On 3/26/26 17:11, Zhenzhong Duan wrote:
>
>> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
>> index 67d54849f2..45c08c8f6f 100644
>> --- a/hw/i386/intel_iommu_accel.c
>> +++ b/hw/i386/intel_iommu_accel.c
>> @@ -121,7 +121,8 @@ static bool
>vtd_device_attach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
>> }
>> }
>>
>> - ret = host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp);
>> + ret = host_iommu_device_iommufd_attach_hwpt(idev, IOMMU_NO_PASID,
>hwpt_id,
>> + errp);
>> trace_vtd_device_attach_hwpt(idev->devid, vtd_as->pasid, hwpt_id, ret);
>
>The trace looks to use the wrong pasid. could you make it use
>IOMMU_NO_PASID as well? Same to the below chunks.
OK, will do. In fact vtd_as->pasid always equals IOMMU_NO_PASID here because we don't support pasid yet.
Thanks
Zhenzhong
>
>> if (ret) {
>> /* Destroy old fs_hwpt if it's a replacement */
>> @@ -145,7 +146,7 @@ static bool
>vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
>> bool ret;
>>
>> if (s->dmar_enabled && s->root_scalable) {
>> - ret = host_iommu_device_iommufd_detach_hwpt(idev, errp);
>> + ret = host_iommu_device_iommufd_detach_hwpt(idev,
>IOMMU_NO_PASID, errp);
>> trace_vtd_device_detach_hwpt(idev->devid, pasid, ret);
>> } else {
>> /*
>> @@ -153,7 +154,8 @@ static bool
>vtd_device_detach_iommufd(VTDHostIOMMUDevice *vtd_hiod,
>> * we fallback to the default HWPT which contains shadow page table.
>> * So guest DMA could still work.
>> */
>> - ret = host_iommu_device_iommufd_attach_hwpt(idev, idev->hwpt_id,
>errp);
>> + ret = host_iommu_device_iommufd_attach_hwpt(idev,
>IOMMU_NO_PASID,
>> + idev->hwpt_id, errp);
>> trace_vtd_device_reattach_def_hwpt(idev->devid, pasid, idev->hwpt_id,
>> ret);
>
>Regards,
>Yi Liu
* RE: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
2026-03-27 4:08 ` Nicolin Chen
@ 2026-03-27 6:58 ` Duan, Zhenzhong
0 siblings, 0 replies; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 6:58 UTC (permalink / raw)
To: Nicolin Chen
Cc: skolothumtho@nvidia.com, qemu-devel@nongnu.org, alex@shazbot.org,
clg@redhat.com, eric.auger@redhat.com, mst@redhat.com,
jasowang@redhat.com, jgg@nvidia.com, joao.m.martins@oracle.com,
clement.mathieu--drif@bull.com, Tian, Kevin, Liu, Yi L,
Hao, Xudong
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>IOMMU_HWPT_ALLOC_PASID flag
>
>On Fri, Mar 27, 2026 at 02:29:20AM +0000, Duan, Zhenzhong wrote:
>> >-----Original Message-----
>> >From: Nicolin Chen <nicolinc@nvidia.com>
>> >Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>> >IOMMU_HWPT_ALLOC_PASID flag
>> >
>> >On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
>> >> @@ -430,6 +431,11 @@ static bool
>> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> >> }
>> >> }
>> >>
>> >> + if (max_pasid_log2 &&
>> >> + vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
>> >> + flags |= IOMMU_HWPT_ALLOC_PASID;
>> >> + }
>> >
>> >This would set it to:
>> > IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
>> >which isn't supported on ARM :-/
>>
>> I am a bit confused, if smmu supports dirty tracking, flags would be
>> set to IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
>IOMMU_HWPT_ALLOC_NEST_PARENT,
>> in arm_smmu_domain_alloc_paging_flags(), I see it will return -EOPNOTSUPP.
>> So how did smmu work in this case?
>
>You hit a point. I almost forgot we need to do something with that
>dirty tracking flag. This is currently broken..
>
>For NVIDIA, the current generation Grace CPU doesn't support dirty
>tracking. So, our QEMU VMs don't set that flag. This is just lucky
>for us. Yet, it would trigger -EOPNOTSUPP on ARM CPU that supports,
>as you mentioned.
>
>For pasid attachment however, ARM doesn't need it: regular pasid=0
>attach already has the pointer to a stage-1 PASID table.
Clear, will add VIOMMU_FLAG_WANT_PASID_ATTACH and check it instead.
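A minimal sketch of the gating I have in mind, assuming the vIOMMU exposes a "wants PASID attach" flag. The constants below are stand-ins defined only for this sketch (the real values live in the kernel uAPI and QEMU headers), and hwpt_alloc_flags() is a hypothetical helper, not an existing function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for this sketch only, not copied from any header. */
#define SK_HWPT_ALLOC_NEST_PARENT    (1u << 0)
#define SK_HWPT_ALLOC_PASID          (1u << 3)
#define SK_VIOMMU_WANT_PASID_ATTACH  (1ULL << 0) /* hypothetical flag bit */

/*
 * Add the PASID allocation flag only when the host device supports PASIDs
 * (max_pasid_log2 != 0) and the vIOMMU asks for PASID attachment. A vIOMMU
 * that reaches its PASID table through the regular pasid=0 attach leaves
 * the flag clear, so the nest parent flags are unchanged for it.
 */
static uint32_t hwpt_alloc_flags(uint32_t base_flags, uint8_t max_pasid_log2,
                                 uint64_t viommu_flags)
{
    uint32_t flags = base_flags;

    assert(max_pasid_log2 <= 20); /* PASID width is at most 20 bits */

    if (max_pasid_log2 && (viommu_flags & SK_VIOMMU_WANT_PASID_ATTACH)) {
        flags |= SK_HWPT_ALLOC_PASID;
    }
    return flags;
}
```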
Thanks
Zhenzhong
* RE: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
2026-03-27 4:29 ` Yi Liu
@ 2026-03-27 7:26 ` Duan, Zhenzhong
0 siblings, 0 replies; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 7:26 UTC (permalink / raw)
To: Liu, Yi L, qemu-devel@nongnu.org
Cc: alex@shazbot.org, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, jasowang@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Hao, Xudong
>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>IOMMU_HWPT_ALLOC_PASID flag
>
>On 3/26/26 17:11, Zhenzhong Duan wrote:
>> When both device and vIOMMU have PASID enabled, then guest may setup
>> pasid attached translation.
>>
>> We need to create the nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID
>> flag because according to uAPI, any domain attached to the non-PASID
>> part of the device must also be flagged, otherwise attaching a PASID
>> will blocked.
>
>echo the comment on the commit message.
>
>https://lore.kernel.org/qemu-devel/a33c785a-ab94-4dc2-85eb-
>10b7d288f661@intel.com/
Sorry, it seems I missed it. I will replace the paragraph above with the text below:
"VFIO needs to be aware of potential pasid
usage and should attach the non-pasid part of pasid-capable device to
hwpt flagged with IOMMU_HWPT_ALLOC_PASID."
Thanks
Zhenzhong
* RE: [PATCH v2 05/14] intel_iommu: Change pasid property from bool to uint8
2026-03-27 4:30 ` Yi Liu
@ 2026-03-27 7:41 ` Duan, Zhenzhong
0 siblings, 0 replies; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 7:41 UTC (permalink / raw)
To: Liu, Yi L, qemu-devel@nongnu.org
Cc: alex@shazbot.org, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, jasowang@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Hao, Xudong
>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 05/14] intel_iommu: Change pasid property from bool to
>uint8
>
>On 3/26/26 17:11, Zhenzhong Duan wrote:
>> 'x-pasid-mode' is a bool property, we need an extra 'pss' property to
>> represent PASID size supported. Because there is no any device in QEMU
>> supporting pasid capability yet, no guest could use the pasid feature
>> until now, 'x-pasid-mode' takes no effect.
>>
>> So instead of an extra 'pss' property we can use a single 'pasid'
>> property of uint8 type to represent if pasid is supported and the PASID
>> bits size. A value of N > 0 means pasid is supported and N - 1 is the
>> value in PSS field in ECAP register.
>>
>> PASID bits size should also be no more than 20 bits according to PCI spec.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/i386/intel_iommu_internal.h | 2 +-
>> include/hw/i386/intel_iommu.h | 2 +-
>> hw/i386/intel_iommu.c | 11 +++++++++--
>> 3 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
>> index 11a53aa369..db4f186a3e 100644
>> --- a/hw/i386/intel_iommu_internal.h
>> +++ b/hw/i386/intel_iommu_internal.h
>> @@ -195,7 +195,7 @@
>> #define VTD_ECAP_MHMV (15ULL << 20)
>> #define VTD_ECAP_SRS (1ULL << 31)
>> #define VTD_ECAP_NWFS (1ULL << 33)
>> -#define VTD_ECAP_PSS (7ULL << 35) /* limit: MemTxAttrs::pid */
>> +#define VTD_ECAP_SET_PSS(x, v) ((x)->ecap = deposit64((x)->ecap, 35, 5, v))
>
>does this change still meet the limit commented by "* limit:
>MemTxAttrs::pid */"?
VTD doesn't use MemTxAttrs::pid. Clement asked the same question, see my reply at
https://lore.kernel.org/qemu-devel/IA3PR11MB9136B8106B0070F03CF9C32B9261A@IA3PR11MB9136.namprd11.prod.outlook.com/
>
>> #define VTD_ECAP_PASID (1ULL << 40)
>> #define VTD_ECAP_PDS (1ULL << 42)
>> #define VTD_ECAP_SMTS (1ULL << 43)
>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>> index e44ce31841..95c76015e4 100644
>> --- a/include/hw/i386/intel_iommu.h
>> +++ b/include/hw/i386/intel_iommu.h
>> @@ -314,7 +314,7 @@ struct IntelIOMMUState {
>> bool intr_eime; /* Extended interrupt mode enabled */
>> OnOffAuto intr_eim; /* Toggle for EIM cabability */
>> uint8_t aw_bits; /* Host/IOVA address width (in bits) */
>> - bool pasid; /* Whether to support PASID */
>> + uint8_t pasid; /* PASID supported in bits, 0 if not */
>> bool fs1gp; /* First Stage 1-GByte Page Support */
>>
>> /* Transient Mapping, Reserved(0) since VTD spec revision 3.2 */
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index f395fa248c..a7b676cd13 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -4203,7 +4203,7 @@ static const Property vtd_properties[] = {
>> DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode,
>FALSE),
>> DEFINE_PROP_BOOL("x-flts", IntelIOMMUState, fsts, FALSE),
>> DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control,
>false),
>> - DEFINE_PROP_BOOL("x-pasid-mode", IntelIOMMUState, pasid, false),
>> + DEFINE_PROP_UINT8("pasid", IntelIOMMUState, pasid, 0),
>> DEFINE_PROP_BOOL("svm", IntelIOMMUState, svm, false),
>> DEFINE_PROP_BOOL("stale-tm", IntelIOMMUState, stale_tm, false),
>> DEFINE_PROP_BOOL("fs1gp", IntelIOMMUState, fs1gp, true),
>> @@ -5042,7 +5042,8 @@ static void vtd_cap_init(IntelIOMMUState *s)
>> }
>>
>> if (s->pasid) {
>> - s->ecap |= VTD_ECAP_PASID | VTD_ECAP_PSS;
>> + VTD_ECAP_SET_PSS(s, s->pasid - 1);
>> + s->ecap |= VTD_ECAP_PASID;
>> }
>> }
>>
>> @@ -5583,6 +5584,12 @@ static bool vtd_decide_config(IntelIOMMUState *s,
>Error **errp)
>> return false;
>> }
>>
>> + if (s->pasid > PCI_EXT_CAP_PASID_MAX_WIDTH) {
>> + error_setg(errp, "PASID width %d, exceed Max PASID Width %d allowed "
>> + "in PCI spec", s->pasid, PCI_EXT_CAP_PASID_MAX_WIDTH);
>
>s/exceed/exceeds/
Will do.
Thanks
Zhenzhong
* RE: [PATCH v2 07/14] intel_iommu_accel: Handle PASID entry addition for pc_inv_dsc request
2026-03-27 4:30 ` Yi Liu
@ 2026-03-27 8:08 ` Duan, Zhenzhong
0 siblings, 0 replies; 38+ messages in thread
From: Duan, Zhenzhong @ 2026-03-27 8:08 UTC (permalink / raw)
To: Liu, Yi L, qemu-devel@nongnu.org
Cc: alex@shazbot.org, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, jasowang@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, skolothumtho@nvidia.com,
joao.m.martins@oracle.com, clement.mathieu--drif@bull.com,
Tian, Kevin, Hao, Xudong
>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 07/14] intel_iommu_accel: Handle PASID entry addition for
>pc_inv_dsc request
>
>On 3/26/26 17:11, Zhenzhong Duan wrote:
>> Structure VTDAddressSpace includes some elements suitable for emulated
>> device and passthrough device without PASID, e.g., address space,
>> different memory regions, etc, it is also protected by vtd iommu lock,
>> all these are useless and become a burden for passthrough device with
>> PASID.
>>
>> When there are lots of PASIDs used in one device, the AS and MRs are
>> all registered to memory core and impact the whole system performance.
>>
>> So instead of using VTDAddressSpace to cache pasid entry for each pasid
>> of a passthrough device, we define a light weight structure
>> VTDAccelPASIDCacheEntry with only necessary elements for each pasid. We
>> will use this struct as a parameter to conduct binding/unbinding to
>> nested hwpt and to record the current bound nested hwpt. It's also
>> designed to support PASID_0.
>>
>> VTDAccelPASIDCacheEntry is designed to only be used in intel_iommu_accel.c,
>> similarly VTDPASIDCacheEntry should only be used in hw/i386/intel_iommu.c
>>
>> When guest creates new PASID entries, QEMU will capture the pc_inv_dsc
>> (pasid cache invalidation) request, walk through each pasid in each
>> passthrough device for valid pasid entries, create a new
>> VTDAccelPASIDCacheEntry if not existing yet.
>>
>> PASID_0 of passthrough device still need to register MRs in case guest
>> does not operate in scalable mode. So for PASID_0, we have both
>> VTDAPASIDCacheEntry and VTDAccelPASIDCacheEntry.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>
>perhaps togather with a co-d-b tag, otherwise this s-o-b tag is strange.
>same to other commits in this series. :)
Sure, will do😊
>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/i386/intel_iommu_accel.h | 13 +++
>> hw/i386/intel_iommu_internal.h | 8 ++
>> hw/i386/intel_iommu.c | 3 +
>> hw/i386/intel_iommu_accel.c | 170 +++++++++++++++++++++++++++++++++
>> 4 files changed, 194 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu_accel.h b/hw/i386/intel_iommu_accel.h
>> index e5f0b077b4..c5981a23bf 100644
>> --- a/hw/i386/intel_iommu_accel.h
>> +++ b/hw/i386/intel_iommu_accel.h
>> @@ -12,6 +12,13 @@
>> #define HW_I386_INTEL_IOMMU_ACCEL_H
>> #include CONFIG_DEVICES
>>
>> +typedef struct VTDAccelPASIDCacheEntry {
>> + VTDHostIOMMUDevice *vtd_hiod;
>> + VTDPASIDEntry pasid_entry;
>> + uint32_t pasid;
>> + QLIST_ENTRY(VTDAccelPASIDCacheEntry) next;
>> +} VTDAccelPASIDCacheEntry;
>> +
>> #ifdef CONFIG_VTD_ACCEL
>> bool vtd_check_hiod_accel(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hiod,
>> Error **errp);
>> @@ -20,6 +27,7 @@ bool vtd_propagate_guest_pasid(VTDAddressSpace *vtd_as, Error **errp);
>> void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
>> uint32_t pasid, hwaddr addr,
>> uint64_t npages, bool ih);
>> +void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info);
>> void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops);
>> #else
>> static inline bool vtd_check_hiod_accel(IntelIOMMUState *s,
>> @@ -49,6 +57,11 @@ static inline void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s,
>> {
>> }
>>
>> +static inline void vtd_pasid_cache_sync_accel(IntelIOMMUState *s,
>> + VTDPASIDCacheInfo *pc_info)
>> +{
>> +}
>> +
>> static inline void vtd_iommu_ops_update_accel(PCIIOMMUOps *ops)
>> {
>> }
>> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
>> index c7e107fe87..d5f212ded9 100644
>> --- a/hw/i386/intel_iommu_internal.h
>> +++ b/hw/i386/intel_iommu_internal.h
>> @@ -616,6 +616,7 @@ typedef struct VTDRootEntry VTDRootEntry;
>> #define VTD_CTX_ENTRY_SCALABLE_SIZE 32
>>
>> #define PASID_0 0
>> +#define VTD_SM_CONTEXT_ENTRY_PDTS(x) extract64((x)->val[0], 9, 3)
>> #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(aw) (0x1e0ULL | ~VTD_HAW_MASK(aw))
>> #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 0xffffffffffe00000ULL
>> #define VTD_SM_CONTEXT_ENTRY_PRE 0x10ULL
>> @@ -646,6 +647,7 @@ typedef struct VTDPIOTLBInvInfo {
>> #define VTD_PASID_DIR_BITS_MASK (0x3fffULL)
>> #define VTD_PASID_DIR_INDEX(pasid) (((pasid) >> 6) & VTD_PASID_DIR_BITS_MASK)
>> #define VTD_PASID_DIR_FPD (1ULL << 1) /* Fault Processing Disable */
>> +#define VTD_PASID_TABLE_ENTRY_NUM (1ULL << 6)
>> #define VTD_PASID_TABLE_BITS_MASK (0x3fULL)
>> #define VTD_PASID_TABLE_INDEX(pasid) ((pasid) & VTD_PASID_TABLE_BITS_MASK)
>> #define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */
>> @@ -711,6 +713,7 @@ typedef struct VTDHostIOMMUDevice {
>> PCIBus *bus;
>> uint8_t devfn;
>> HostIOMMUDevice *hiod;
>> + QLIST_HEAD(, VTDAccelPASIDCacheEntry) pasid_cache_list;
>> } VTDHostIOMMUDevice;
>>
>> /*
>> @@ -768,6 +771,11 @@ static inline int vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2)
>> return memcmp(p1, p2, sizeof(*p1));
>> }
>>
>> +static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
>> +{
>> + return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce) + 7);
>> +}
>> +
>> int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base, uint32_t pasid,
>> VTDPASIDDirEntry *pdire);
>> int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s, uint32_t pasid,
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index b5d18ae321..451ede7530 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -3202,6 +3202,8 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
>> g_hash_table_foreach(s->vtd_address_spaces, vtd_pasid_cache_sync_locked,
>> pc_info);
>> vtd_iommu_unlock(s);
>> +
>> + vtd_pasid_cache_sync_accel(s, pc_info);
>> }
>>
>> static void vtd_replay_pasid_bindings_all(IntelIOMMUState *s)
>> @@ -4759,6 +4761,7 @@ static bool vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>> vtd_hiod->devfn = (uint8_t)devfn;
>> vtd_hiod->iommu_state = s;
>> vtd_hiod->hiod = hiod;
>> + QLIST_INIT(&vtd_hiod->pasid_cache_list);
>>
>> if (!vtd_check_hiod(s, vtd_hiod, errp)) {
>> g_free(vtd_hiod);
>> diff --git a/hw/i386/intel_iommu_accel.c b/hw/i386/intel_iommu_accel.c
>> index c2757f3bcd..32d8ab0ef9 100644
>> --- a/hw/i386/intel_iommu_accel.c
>> +++ b/hw/i386/intel_iommu_accel.c
>> @@ -257,6 +257,176 @@ void vtd_flush_host_piotlb_all_locked(IntelIOMMUState *s, uint16_t domain_id,
>> vtd_flush_host_piotlb_locked, &piotlb_info);
>> }
>>
>> +static void vtd_accel_fill_pc(VTDHostIOMMUDevice *vtd_hiod, uint32_t pasid,
>> + VTDPASIDEntry *pe)
>> +{
>> + VTDAccelPASIDCacheEntry *vtd_pce;
>> +
>> + QLIST_FOREACH(vtd_pce, &vtd_hiod->pasid_cache_list, next) {
>> + if (vtd_pce->pasid == pasid) {
>> + if (vtd_pasid_entry_compare(pe, &vtd_pce->pasid_entry)) {
>> + vtd_pce->pasid_entry = *pe;
>> + }
>> + return;
>> + }
>> + }
>> +
>> + vtd_pce = g_malloc0(sizeof(VTDAccelPASIDCacheEntry));
>> + vtd_pce->vtd_hiod = vtd_hiod;
>> + vtd_pce->pasid = pasid;
>> + vtd_pce->pasid_entry = *pe;
>> + QLIST_INSERT_HEAD(&vtd_hiod->pasid_cache_list, vtd_pce, next);
>> +}
>> +
>> +/*
>> + * This function walks over PASID range within [start, end) in a single
>> + * PASID table for entries matching @info type/did, then create
>> + * VTDAccelPASIDCacheEntry if not exist yet.
>> + */
>> +static void vtd_sm_pasid_table_walk_one(VTDHostIOMMUDevice *vtd_hiod,
>> + dma_addr_t pt_base,
>> + int start,
>> + int end,
>> + VTDPASIDCacheInfo *info)
>> +{
>> + IntelIOMMUState *s = vtd_hiod->iommu_state;
>> + VTDPASIDEntry pe;
>> + int pasid;
>> +
>> + for (pasid = start; pasid < end; pasid++) {
>> + if (vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe) ||
>> + !vtd_pe_present(&pe)) {
>> + continue;
>> + }
>> +
>> + if ((info->type == VTD_INV_DESC_PASIDC_G_DSI ||
>> + info->type == VTD_INV_DESC_PASIDC_G_PASID_SI) &&
>> + (info->did != VTD_SM_PASID_ENTRY_DID(&pe))) {
>> + /*
>> + * VTD_PASID_CACHE_DOMSI and VTD_PASID_CACHE_PASIDSI
>> + * requires domain id check. If domain id check fail,
>> + * go to next pasid.
>> + */
>> + continue;
>> + }
>> +
>> + vtd_accel_fill_pc(vtd_hiod, pasid, &pe);
>> + }
>> +}
>> +
>> +/*
>> + * In VT-d scalable mode translation, PASID dir + PASID table is used.
>> + * This function aims at looping over a range of PASIDs in the given
>> + * two level table to identify the pasid config in guest.
>> + */
>> +static void vtd_sm_pasid_table_walk(VTDHostIOMMUDevice *vtd_hiod,
>> + dma_addr_t pdt_base,
>> + int start, int end,
>> + VTDPASIDCacheInfo *info)
>> +{
>> + VTDPASIDDirEntry pdire;
>> + int pasid = start;
>> + int pasid_next;
>> + dma_addr_t pt_base;
>> +
>> + while (pasid < end) {
>> + pasid_next = (pasid + VTD_PASID_TABLE_ENTRY_NUM) &
>> + ~(VTD_PASID_TABLE_ENTRY_NUM - 1);
>> + pasid_next = pasid_next < end ? pasid_next : end;
>> +
>> + if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire)
>> + && vtd_pdire_present(&pdire)) {
>> + pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK;
>> + vtd_sm_pasid_table_walk_one(vtd_hiod, pt_base, pasid, pasid_next,
>> + info);
>> + }
>> + pasid = pasid_next;
>> + }
>> +}
>> +
>> +static void vtd_replay_pasid_bind_for_dev(VTDHostIOMMUDevice *vtd_hiod,
>> + int start, int end,
>> + VTDPASIDCacheInfo *pc_info)
>
>s/vtd_replay_pasid_bind_for_dev/vtd_accel_replay_pasid_bind_for_dev/
An implicit rule is to use vtd_accel_* for all external functions, but for local functions,
we don't force it. But I'm fine with vtd_accel_replay_pasid_bind_for_dev if you prefer.
>
>> +{
>> + IntelIOMMUState *s = vtd_hiod->iommu_state;
>> + VTDContextEntry ce;
>> + int dev_max_pasid = 1 << vtd_hiod->hiod->caps.max_pasid_log2;
>> +
>> + if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_hiod->bus),
>> + vtd_hiod->devfn, &ce)) {
>> + VTDPASIDCacheInfo walk_info = *pc_info;
>> + uint32_t ce_max_pasid = vtd_sm_ce_get_pdt_entry_num(&ce) *
>> + VTD_PASID_TABLE_ENTRY_NUM;
>> +
>> + end = MIN(end, MIN(dev_max_pasid, ce_max_pasid));
>> +
>> + vtd_sm_pasid_table_walk(vtd_hiod, VTD_CE_GET_PASID_DIR_TABLE(&ce),
>> + start, end, &walk_info);
>> + }
>> +}
>> +
>> +/*
>> + * This function replays the guest pasid bindings by walking the two level
>> + * guest PASID table. For each valid pasid entry, it creates an entry
>> + * VTDAccelPASIDCacheEntry dynamically if not exist yet. This entry holds
>> + * info specific to a pasid
>> + */
>> +void vtd_pasid_cache_sync_accel(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
>
>I think it's nice to name it as vtd_accel_pasid_cache_sync().
Sure, will do, this is an external function.
>
>> +{
>> + int start = PASID_0, end = 1 << s->pasid;
>> + VTDHostIOMMUDevice *vtd_hiod;
>> + GHashTableIter hiod_it;
>> +
>> + if (!s->fsts) {
>> + return;
>> + }
>> +
>> + /*
>> + * VTDPASIDCacheInfo honors PCI pasid but VTDAccelPASIDCacheEntry honors
>> + * iommu pasid
>> + */
>> + if (pc_info->pasid == PCI_NO_PASID) {
>> + pc_info->pasid = PASID_0;
>> + }
>> +
>> + switch (pc_info->type) {
>> + case VTD_INV_DESC_PASIDC_G_PASID_SI:
>> + start = pc_info->pasid;
>> + end = pc_info->pasid + 1;
>> + /* fall through */
>> + case VTD_INV_DESC_PASIDC_G_DSI:
>> + /*
>> + * loop all assigned devices, do domain id check in
>> + * vtd_sm_pasid_table_walk_one() after get pasid entry.
>> + */
>> + break;
>> + case VTD_INV_DESC_PASIDC_G_GLOBAL:
>> + /* loop all assigned devices */
>> + break;
>> + default:
>> + g_assert_not_reached();
>> + }
>> +
>> + /*
>> + * In this replay, one only needs to care about the devices which are
>> + * backed by host IOMMU. Those devices have a corresponding vtd_hiod
>> + * in s->vtd_host_iommu_dev. For devices not backed by host IOMMU, it
>> + * is not necessary to replay the bindings since their cache should be
>> + * created in the future DMA address translation.
>
>The above words can be dropped since this code lives in _accel.c.
>Instead, I think you can add words like "Loop all the vtd_hiod
>instances to sync the "pasid cache" per the guest pasid configuration."
Makes sense, will do.
Thanks
Zhenzhong
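
For readers following the range handling in vtd_replay_pasid_bind_for_dev()
and vtd_sm_pasid_table_walk() quoted above, here is a minimal standalone
sketch of the arithmetic only. The constants and formulas are taken from the
quoted diff (64 entries per PASID leaf table, 2^(PDTS + 7) directory entries).
The helper names pdt_entry_num(), clamp_end() and next_chunk() are made up for
illustration and are not part of the patch, and no guest memory is touched.

```c
#include <stdint.h>

/* Mirrors VTD_PASID_TABLE_ENTRY_NUM (1ULL << 6) from the quoted diff. */
#define PASID_TABLE_ENTRY_NUM 64u

/*
 * Number of PASID directory entries for a 3-bit PDTS field,
 * i.e. 2^(PDTS + 7), as in vtd_sm_ce_get_pdt_entry_num().
 */
static uint32_t pdt_entry_num(uint32_t pdts)
{
    return 1u << (pdts + 7);
}

/*
 * Hypothetical helper: clamp the requested end of the walk to what both
 * the device (2^max_pasid_log2) and the context entry (directory entries
 * times 64 leaf entries) can address.
 */
static uint32_t clamp_end(uint32_t end, uint32_t max_pasid_log2, uint32_t pdts)
{
    uint32_t dev_max = 1u << max_pasid_log2;
    uint32_t ce_max = pdt_entry_num(pdts) * PASID_TABLE_ENTRY_NUM;
    uint32_t limit = dev_max < ce_max ? dev_max : ce_max;

    return end < limit ? end : limit;
}

/*
 * Hypothetical helper: first PASID covered by the next leaf table after
 * 'pasid', clamped to 'end'. One directory entry is looked up per chunk.
 */
static uint32_t next_chunk(uint32_t pasid, uint32_t end)
{
    uint32_t next = (pasid + PASID_TABLE_ENTRY_NUM) &
                    ~(PASID_TABLE_ENTRY_NUM - 1);

    return next < end ? next : end;
}
```

For example, with max_pasid_log2 = 10 and PDTS = 0, clamp_end(1 << 20, 10, 0)
gives 1024, and a walk starting at PASID 100 visits [100, 128), then
[128, 192), and so on in 64-entry steps until it reaches 1024.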
* Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid
2026-03-27 6:44 ` Duan, Zhenzhong
@ 2026-03-27 9:34 ` Cédric Le Goater
0 siblings, 0 replies; 38+ messages in thread
From: Cédric Le Goater @ 2026-03-27 9:34 UTC (permalink / raw)
To: Duan, Zhenzhong, Nicolin Chen
Cc: qemu-devel@nongnu.org, alex@shazbot.org, eric.auger@redhat.com,
mst@redhat.com, jasowang@redhat.com, jgg@nvidia.com,
skolothumtho@nvidia.com, joao.m.martins@oracle.com,
clement.mathieu--drif@bull.com, Tian, Kevin, Liu, Yi L,
Hao, Xudong, qemu-arm@nongnu.org
On 3/27/26 07:44, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Subject: Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to
>> support pasid
>>
>> On Fri, Mar 27, 2026 at 02:32:57AM +0000, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>> Subject: Re: [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks
>> to
>>>> support pasid
>>>>
>>>> On Thu, Mar 26, 2026 at 05:11:16AM -0400, Zhenzhong Duan wrote:
>>>>> @@ -138,14 +138,16 @@ struct HostIOMMUDeviceIOMMUFDClass {
>>>>> *
>>>>> * @idev: host IOMMU device backed by IOMMUFD backend.
>>>>
>>>> Not commenting against this patch, but I just found the "host IOMMU
>>>> device" and the "HostIOMMUDeviceIOMMUFD" a bit ambiguous. It's not
>>>> an "IOMMU device" right? Perhaps somebody can help me understand :)
>>>
>>> A host device under host IOMMU?
>>
>> "host device" would make sense, not "host IOMMU device", right?
>
> We want to emphasize that it's a "host device" backed by "host IOMMU",
> "host device" is not enough, I think.
Yes. These are related to the Host IOMMU device backends:
- VFIO IOMMU Type1, a.k.a. the legacy backend
- IOMMUFD
There are other implementations. VFIO IOMMU Type1 is versioned and
a ppc flavor exists.
Thanks,
C.
Thread overview: 38+ messages
2026-03-26 9:11 [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Zhenzhong Duan
2026-03-26 9:11 ` [PATCH v2 01/14] vfio/iommufd: Extend attach/detach_hwpt callback implementations with pasid Zhenzhong Duan
2026-03-26 22:04 ` Nicolin Chen
2026-03-26 9:11 ` [PATCH v2 02/14] iommufd: Extend attach/detach_hwpt callbacks to support pasid Zhenzhong Duan
2026-03-26 22:18 ` Nicolin Chen
2026-03-27 2:32 ` Duan, Zhenzhong
2026-03-27 3:48 ` Nicolin Chen
2026-03-27 6:44 ` Duan, Zhenzhong
2026-03-27 9:34 ` Cédric Le Goater
2026-03-27 4:29 ` Yi Liu
2026-03-27 6:45 ` Duan, Zhenzhong
2026-03-26 9:11 ` [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag Zhenzhong Duan
2026-03-26 22:53 ` Nicolin Chen
2026-03-27 2:29 ` Duan, Zhenzhong
2026-03-27 4:08 ` Nicolin Chen
2026-03-27 6:58 ` Duan, Zhenzhong
2026-03-27 4:29 ` Yi Liu
2026-03-27 7:26 ` Duan, Zhenzhong
2026-03-26 9:11 ` [PATCH v2 04/14] intel_iommu: Create the nested " Zhenzhong Duan
2026-03-26 9:11 ` [PATCH v2 05/14] intel_iommu: Change pasid property from bool to uint8 Zhenzhong Duan
2026-03-27 4:30 ` Yi Liu
2026-03-27 7:41 ` Duan, Zhenzhong
2026-03-26 9:11 ` [PATCH v2 06/14] intel_iommu: Export some functions Zhenzhong Duan
2026-03-26 9:11 ` [PATCH v2 07/14] intel_iommu_accel: Handle PASID entry addition for pc_inv_dsc request Zhenzhong Duan
2026-03-27 4:30 ` Yi Liu
2026-03-27 8:08 ` Duan, Zhenzhong
2026-03-26 9:11 ` [PATCH v2 08/14] intel_iommu_accel: Handle PASID entry removal " Zhenzhong Duan
2026-03-27 4:31 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 09/14] intel_iommu_accel: Bypass PASID entry addition for just deleted entry Zhenzhong Duan
2026-03-26 9:11 ` [PATCH v2 10/14] intel_iommu_accel: Handle PASID entry removal for system reset Zhenzhong Duan
2026-03-27 4:32 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 11/14] intel_iommu_accel: Support pasid binding/unbinding and PIOTLB flushing Zhenzhong Duan
2026-03-27 4:32 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 12/14] intel_iommu_accel: drop _lock suffix in vtd_flush_host_piotlb_all_locked() Zhenzhong Duan
2026-03-26 9:11 ` [PATCH v2 13/14] intel_iommu_accel: Add pasid bits size check Zhenzhong Duan
2026-03-27 4:32 ` Yi Liu
2026-03-26 9:11 ` [PATCH v2 14/14] intel_iommu: Expose flag VIOMMU_FLAG_PASID_SUPPORTED when configured Zhenzhong Duan
2026-03-27 3:58 ` [PATCH v2 00/14] intel_iommu: Enable PASID support for passthrough device Hao, Xudong