* [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
@ 2025-11-20 13:21 Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 01/33] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum
` (33 more replies)
0 siblings, 34 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Hi,
Changes from v5:
https://lore.kernel.org/qemu-devel/20251031105005.24618-1-skolothumtho@nvidia.com/
- Addressed feedback from v5 and picked up R-by tags. Thanks to all!
- The previously split out _DSM fix mini-series is now accepted [0].
- Improved documentation about the rationale behind the design choice of
returning an address space aliased to the system address space for
vfio-pci endpoint devices (patch #10).
- Added error propagation support for smmuv3_cmdq_consume() (patch #13).
- Updated vSTE-based HWPT installation to check the SMMU-enabled case
(patch #14).
- Introduced an optional callback to PCIIOMMUOps to retrieve the MSI
doorbell GPA directly, allowing us to avoid unsafe page table walks for
MSI translation in accelerated SMMUv3 cases (patch #16).
- GBPA-based vSTE update depends on Nicolin's kernel patch [1].
- The VFIO/IOMMUFD changes depend on Zhenzhong's patches 4/5/8 from the
pass-through support series [2].
PATCH organization:
1–26: Enables accelerated SMMUv3 with features based on the default QEMU
SMMUv3, including IORT RMR-based MSI support.
27–29: Adds options for specifying RIL, ATS, and OAS features.
30–33: Adds PASID support, including VFIO changes.
Tests:
Performed basic sanity tests on an NVIDIA GRACE platform with GPU device
assignments. A CUDA test application was used to verify the SVA use case.
Further tests are always welcome.
E.g. QEMU command line:
qemu-system-aarch64 -machine virt,gic-version=3,highmem-mmio-size=2T \
-cpu host -smp cpus=4 -m size=16G,slots=2,maxmem=66G -nographic \
-bios QEMU_EFI.fd -object iommufd,id=iommufd0 -enable-kvm \
-object memory-backend-ram,size=8G,id=m0 \
-object memory-backend-ram,size=8G,id=m1 \
-numa node,memdev=m0,cpus=0-3,nodeid=0 -numa node,memdev=m1,nodeid=1 \
-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 -numa node,nodeid=5 \
-numa node,nodeid=6 -numa node,nodeid=7 -numa node,nodeid=8 -numa node,nodeid=9 \
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
-device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.0,accel=on,ats=on,ril=off,pasid=on,oas=48 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=512G \
-device vfio-pci,host=0019:06:00.0,rombar=0,id=dev0,iommufd=iommufd0,bus=pcie.port1 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
...
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
-device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
-device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.1,accel=on,ats=on,ril=off,pasid=on \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2,pref64-reserve=512G \
-device vfio-pci,host=0018:06:00.0,rombar=0,id=dev1,iommufd=iommufd0,bus=pcie.port2 \
-device virtio-blk-device,drive=fs \
-drive file=image.qcow2,index=0,media=disk,format=qcow2,if=none,id=fs \
-net none \
-nographic
A complete branch can be found here,
https://github.com/shamiali2008/qemu-master master-smmuv3-accel-v6
Please take a look and let me know your feedback.
Thanks,
Shameer
[0] https://lore.kernel.org/qemu-devel/20251022080639.243965-1-skolothumtho@nvidia.com/
[1] https://lore.kernel.org/linux-iommu/20251103172755.2026145-1-nicolinc@nvidia.com/
[2] https://lore.kernel.org/qemu-devel/20251117093729.1121324-1-zhenzhong.duan@intel.com/
Details from RFCv3 Cover letter:
-------------------------------
https://lore.kernel.org/qemu-devel/20250714155941.22176-1-shameerali.kolothum.thodi@huawei.com/
This patch series introduces initial support for a user-creatable,
accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.
This is based on the user-creatable SMMUv3 device series [0].
Why this is needed:
On ARM, to enable vfio-pci pass-through devices in a VM, the host SMMUv3
must be set up in nested translation mode (Stage 1 + Stage 2), with
Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by the host.
This series introduces an optional accel property for the SMMUv3 device,
indicating that the guest will try to leverage host SMMUv3 features for
acceleration. By default, enabling accel configures the host SMMUv3 in
nested mode to support vfio-pci pass-through.
This new accelerated, user-creatable SMMUv3 device lets you:
- Set up a VM with multiple SMMUv3s, each tied to a different physical SMMUv3
on the host. Typically, you’d have multiple PCIe PXB root complexes in the
VM (one per virtual NUMA node), and each of them can have its own SMMUv3.
This setup mirrors the host's layout, where each NUMA node has its own
SMMUv3, and helps build VMs that are more closely aligned with the host's
NUMA topology.
- Benefit from the host–guest SMMUv3 association, which reduces invalidation
broadcasts and lookups for devices behind different physical SMMUv3s.
- Simplify handling of host SMMUv3s with differing feature sets.
- Lay the groundwork for additional capabilities like vCMDQ support.
-------------------------------
Eric Auger (2):
hw/pci-host/gpex: Allow to generate preserve boot config DSM #5
hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested
binding
Nicolin Chen (4):
backends/iommufd: Introduce iommufd_backend_alloc_viommu
backends/iommufd: Introduce iommufd_backend_alloc_vdev
hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
hw/arm/smmuv3-accel: Add nested vSTE install/uninstall support
Shameer Kolothum (26):
hw/arm/smmu-common: Factor out common helper functions and export
hw/arm/smmu-common: Make iommu ops part of SMMUState
hw/arm/smmuv3-accel: Introduce smmuv3 accel device
hw/arm/smmuv3-accel: Initialize shared system address space
hw/pci/pci: Move pci_init_bus_master() after adding device to bus
hw/pci/pci: Add optional supports_address_space() callback
hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header
hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints
with iommufd
hw/arm/smmuv3: Implement get_viommu_cap() callback
hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller
hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt
hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA
directly
hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA
hw/arm/smmuv3-accel: Add support to issue invalidation cmd to host
hw/arm/smmuv3: Initialize ID registers early during realize()
hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate
hw/arm/virt: Set PCI preserve_config for accel SMMUv3
tests/qtest/bios-tables-test: Prepare for IORT revision upgrade
tests/qtest/bios-tables-test: Update IORT blobs after revision upgrade
hw/arm/smmuv3: Add accel property for SMMUv3 device
hw/arm/smmuv3-accel: Add a property to specify RIL support
hw/arm/smmuv3-accel: Add support for ATS
hw/arm/smmuv3-accel: Add property to specify OAS bits
backends/iommufd: Retrieve PASID width from
iommufd_backend_get_device_info()
Extend get_cap() callback to support PASID
hw/arm/smmuv3-accel: Add support for PASID enable
Yi Liu (1):
vfio: Synthesize vPASID capability to VM
backends/iommufd.c | 82 +-
backends/trace-events | 2 +
hw/arm/Kconfig | 5 +
hw/arm/meson.build | 3 +-
hw/arm/smmu-common.c | 51 +-
hw/arm/smmuv3-accel.c | 756 ++++++++++++++++++
hw/arm/smmuv3-accel.h | 86 ++
hw/arm/smmuv3-internal.h | 27 +-
hw/arm/smmuv3.c | 206 ++++-
hw/arm/trace-events | 6 +
hw/arm/virt-acpi-build.c | 127 ++-
hw/arm/virt.c | 30 +
hw/i386/intel_iommu.c | 8 +-
hw/pci-bridge/pci_expander_bridge.c | 1 -
hw/pci-host/gpex-acpi.c | 29 +-
hw/pci/pci.c | 43 +-
hw/vfio/container-legacy.c | 8 +-
hw/vfio/iommufd.c | 7 +-
hw/vfio/pci.c | 38 +
include/hw/arm/smmu-common.h | 7 +
include/hw/arm/smmuv3.h | 10 +
include/hw/arm/virt.h | 1 +
include/hw/iommu.h | 1 +
include/hw/pci-host/gpex.h | 1 +
include/hw/pci/pci.h | 34 +
include/hw/pci/pci_bridge.h | 1 +
include/system/host_iommu_device.h | 21 +-
include/system/iommufd.h | 29 +-
target/arm/kvm.c | 18 +-
tests/data/acpi/aarch64/virt/IORT | Bin 128 -> 128 bytes
tests/data/acpi/aarch64/virt/IORT.its_off | Bin 172 -> 172 bytes
tests/data/acpi/aarch64/virt/IORT.smmuv3-dev | Bin 364 -> 364 bytes
.../data/acpi/aarch64/virt/IORT.smmuv3-legacy | Bin 276 -> 276 bytes
33 files changed, 1536 insertions(+), 102 deletions(-)
create mode 100644 hw/arm/smmuv3-accel.c
create mode 100644 hw/arm/smmuv3-accel.h
--
2.43.0
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v6 01/33] backends/iommufd: Introduce iommufd_backend_alloc_viommu
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 02/33] backends/iommufd: Introduce iommufd_backend_alloc_vdev Shameer Kolothum
` (32 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
From: Nicolin Chen <nicolinc@nvidia.com>
Add a helper to allocate a viommu object.
Also introduce a struct IOMMUFDViommu that can be used later by vendor
IOMMU implementations.
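For illustration, a minimal caller sketch (hypothetical helper; assumes a
valid iommufd dev_id and a nesting-parent S2 hwpt_id already set up by the
VFIO core, and the system/iommufd.h declarations added below):

    /* Hypothetical caller: allocate a vIOMMU and record it in an IOMMUFDViommu */
    static bool example_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
                                     uint32_t s2_hwpt_id, IOMMUFDViommu *viommu,
                                     Error **errp)
    {
        uint32_t viommu_id;

        if (!iommufd_backend_alloc_viommu(be, dev_id,
                                          IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
                                          s2_hwpt_id, &viommu_id, errp)) {
            return false;
        }
        viommu->iommufd = be;
        viommu->s2_hwpt_id = s2_hwpt_id;
        viommu->viommu_id = viommu_id;
        return true;
    }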
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
backends/iommufd.c | 26 ++++++++++++++++++++++++++
backends/trace-events | 1 +
include/system/iommufd.h | 14 ++++++++++++++
3 files changed, 41 insertions(+)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index fdfb7c9d67..3d4a4ae736 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -446,6 +446,32 @@ bool iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t id,
return !ret;
}
+bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t viommu_type, uint32_t hwpt_id,
+ uint32_t *out_viommu_id, Error **errp)
+{
+ int ret;
+ struct iommu_viommu_alloc alloc_viommu = {
+ .size = sizeof(alloc_viommu),
+ .type = viommu_type,
+ .dev_id = dev_id,
+ .hwpt_id = hwpt_id,
+ };
+
+ ret = ioctl(be->fd, IOMMU_VIOMMU_ALLOC, &alloc_viommu);
+
+ trace_iommufd_backend_alloc_viommu(be->fd, dev_id, viommu_type, hwpt_id,
+ alloc_viommu.out_viommu_id, ret);
+ if (ret) {
+ error_setg_errno(errp, errno, "IOMMU_VIOMMU_ALLOC failed");
+ return false;
+ }
+
+ g_assert(out_viommu_id);
+ *out_viommu_id = alloc_viommu.out_viommu_id;
+ return true;
+}
+
bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
uint32_t hwpt_id, Error **errp)
{
diff --git a/backends/trace-events b/backends/trace-events
index 56132d3fd2..01c2d9bde9 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -21,3 +21,4 @@ iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%
iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
iommufd_backend_invalidate_cache(int iommufd, uint32_t id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_viommu(int iommufd, uint32_t dev_id, uint32_t type, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d dev_id=%u type=%u hwpt_id=%u viommu_id=%u (%d)"
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index a659f36a20..11b8413c3f 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -38,6 +38,16 @@ struct IOMMUFDBackend {
/*< public >*/
};
+/*
+ * Virtual IOMMU object that represents physical IOMMU's virtualization
+ * support
+ */
+typedef struct IOMMUFDViommu {
+ IOMMUFDBackend *iommufd;
+ uint32_t s2_hwpt_id; /* ID of stage 2 HWPT */
+ uint32_t viommu_id; /* virtual IOMMU ID of allocated object */
+} IOMMUFDViommu;
+
bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
void iommufd_backend_disconnect(IOMMUFDBackend *be);
@@ -59,6 +69,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
uint32_t data_type, uint32_t data_len,
void *data_ptr, uint32_t *out_hwpt,
Error **errp);
+bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t viommu_type, uint32_t hwpt_id,
+                                  uint32_t *out_viommu_id, Error **errp);
+
bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
bool start, Error **errp);
bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 02/33] backends/iommufd: Introduce iommufd_backend_alloc_vdev
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 01/33] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 03/33] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum
` (31 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
From: Nicolin Chen <nicolinc@nvidia.com>
Add a helper to allocate an iommufd device's virtual device (in user
space) for a given vIOMMU instance.
While at it, introduce a struct IOMMUFDVdev for later use by vendor
IOMMU implementations.
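For illustration, a minimal caller sketch (hypothetical helper; for SMMUv3
the virt_id would be the guest stream ID, i.e. the vSID):

    /* Hypothetical caller: bind dev_id to a vIOMMU via a vDEVICE object */
    static bool example_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
                                   IOMMUFDViommu *viommu, uint32_t virt_id,
                                   IOMMUFDVdev *vdev, Error **errp)
    {
        uint32_t vdev_id;

        if (!iommufd_backend_alloc_vdev(be, dev_id, viommu->viommu_id,
                                        virt_id, &vdev_id, errp)) {
            return false;
        }
        vdev->vdevice_id = vdev_id;
        vdev->virt_id = virt_id;
        return true;
    }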
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
backends/iommufd.c | 27 +++++++++++++++++++++++++++
backends/trace-events | 1 +
include/system/iommufd.h | 12 ++++++++++++
3 files changed, 40 insertions(+)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 3d4a4ae736..e68a2c934f 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -472,6 +472,33 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
return true;
}
+bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t viommu_id, uint64_t virt_id,
+ uint32_t *out_vdev_id, Error **errp)
+{
+ int ret;
+ struct iommu_vdevice_alloc alloc_vdev = {
+ .size = sizeof(alloc_vdev),
+ .viommu_id = viommu_id,
+ .dev_id = dev_id,
+ .virt_id = virt_id,
+ };
+
+ ret = ioctl(be->fd, IOMMU_VDEVICE_ALLOC, &alloc_vdev);
+
+ trace_iommufd_backend_alloc_vdev(be->fd, dev_id, viommu_id, virt_id,
+ alloc_vdev.out_vdevice_id, ret);
+
+ if (ret) {
+ error_setg_errno(errp, errno, "IOMMU_VDEVICE_ALLOC failed");
+ return false;
+ }
+
+ g_assert(out_vdev_id);
+ *out_vdev_id = alloc_vdev.out_vdevice_id;
+ return true;
+}
+
bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
uint32_t hwpt_id, Error **errp)
{
diff --git a/backends/trace-events b/backends/trace-events
index 01c2d9bde9..8408dc8701 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -22,3 +22,4 @@ iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) "
iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
iommufd_backend_invalidate_cache(int iommufd, uint32_t id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
iommufd_backend_alloc_viommu(int iommufd, uint32_t dev_id, uint32_t type, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d dev_id=%u type=%u hwpt_id=%u viommu_id=%u (%d)"
+iommufd_backend_alloc_vdev(int iommufd, uint32_t dev_id, uint32_t viommu_id, uint64_t virt_id, uint32_t vdev_id, int ret) " iommufd=%d dev_id=%u viommu_id=%u virt_id=0x%"PRIx64" vdev_id=%u (%d)"
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 11b8413c3f..41e216c677 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -48,6 +48,14 @@ typedef struct IOMMUFDViommu {
uint32_t viommu_id; /* virtual IOMMU ID of allocated object */
} IOMMUFDViommu;
+/*
+ * Virtual device object for a physical device bound to a vIOMMU.
+ */
+typedef struct IOMMUFDVdev {
+ uint32_t vdevice_id; /* object handle for vDevice */
+ uint32_t virt_id; /* virtual device ID */
+} IOMMUFDVdev;
+
bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
void iommufd_backend_disconnect(IOMMUFDBackend *be);
@@ -73,6 +81,10 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
uint32_t viommu_type, uint32_t hwpt_id,
                                   uint32_t *out_viommu_id, Error **errp);
+bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t viommu_id, uint64_t virt_id,
+ uint32_t *out_vdev_id, Error **errp);
+
bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
bool start, Error **errp);
bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 03/33] hw/arm/smmu-common: Factor out common helper functions and export
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 01/33] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 02/33] backends/iommufd: Introduce iommufd_backend_alloc_vdev Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 04/33] hw/arm/smmu-common: Make iommu ops part of SMMUState Shameer Kolothum
` (30 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Factor out common helper functions and export them. Subsequent patches for
SMMUv3 accel support will make use of them.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmu-common.c | 44 +++++++++++++++++++++---------------
include/hw/arm/smmu-common.h | 6 +++++
2 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 62a7612184..59d6147ec9 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -847,12 +847,24 @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
return NULL;
}
-static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
+void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev, PCIBus *bus, int devfn)
{
- SMMUState *s = opaque;
- SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
- SMMUDevice *sdev;
static unsigned int index;
+ g_autofree char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn,
+ index++);
+ sdev->smmu = s;
+ sdev->bus = bus;
+ sdev->devfn = devfn;
+
+ memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
+ s->mrtypename, OBJECT(s), name, UINT64_MAX);
+ address_space_init(&sdev->as, MEMORY_REGION(&sdev->iommu), name);
+ trace_smmu_add_mr(name);
+}
+
+SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus)
+{
+ SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
if (!sbus) {
sbus = g_malloc0(sizeof(SMMUPciBus) +
@@ -861,23 +873,19 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
g_hash_table_insert(s->smmu_pcibus_by_busptr, bus, sbus);
}
+ return sbus;
+}
+
+static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
+{
+ SMMUState *s = opaque;
+ SMMUPciBus *sbus = smmu_get_sbus(s, bus);
+ SMMUDevice *sdev;
+
sdev = sbus->pbdev[devfn];
if (!sdev) {
- char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
-
sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
-
- sdev->smmu = s;
- sdev->bus = bus;
- sdev->devfn = devfn;
-
- memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
- s->mrtypename,
- OBJECT(s), name, UINT64_MAX);
- address_space_init(&sdev->as,
- MEMORY_REGION(&sdev->iommu), name);
- trace_smmu_add_mr(name);
- g_free(name);
+ smmu_init_sdev(s, sdev, bus, devfn);
}
return &sdev->as;
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 80d0fecfde..d307ddd952 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -180,6 +180,12 @@ OBJECT_DECLARE_TYPE(SMMUState, SMMUBaseClass, ARM_SMMU)
/* Return the SMMUPciBus handle associated to a PCI bus number */
SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num);
+/* Return the SMMUPciBus handle associated to a PCI bus */
+SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus);
+
+/* Initialize SMMUDevice handle associated to a SMMUPciBus */
+void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev, PCIBus *bus, int devfn);
+
/* Return the stream ID of an SMMU device */
static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 04/33] hw/arm/smmu-common: Make iommu ops part of SMMUState
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (2 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 03/33] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 05/33] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum
` (29 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Make iommu ops part of SMMUState and set it to the current default smmu_ops.
No functional change intended. This will allow the SMMUv3 accel
implementation to set different iommu ops later.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmu-common.c | 7 +++++--
include/hw/arm/smmu-common.h | 1 +
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 59d6147ec9..4d6516443e 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -952,6 +952,9 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
return;
}
+ if (!s->iommu_ops) {
+ s->iommu_ops = &smmu_ops;
+ }
/*
* We only allow default PCIe Root Complex(pcie.0) or pxb-pcie based extra
* root complexes to be associated with SMMU.
@@ -971,9 +974,9 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
}
if (s->smmu_per_bus) {
- pci_setup_iommu_per_bus(pci_bus, &smmu_ops, s);
+ pci_setup_iommu_per_bus(pci_bus, s->iommu_ops, s);
} else {
- pci_setup_iommu(pci_bus, &smmu_ops, s);
+ pci_setup_iommu(pci_bus, s->iommu_ops, s);
}
return;
}
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index d307ddd952..eebf2f49e2 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -162,6 +162,7 @@ struct SMMUState {
uint8_t bus_num;
PCIBus *primary_bus;
bool smmu_per_bus; /* SMMU is specific to the primary_bus */
+ const PCIIOMMUOps *iommu_ops;
};
struct SMMUBaseClass {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 05/33] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (3 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 04/33] hw/arm/smmu-common: Make iommu ops part of SMMUState Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 06/33] hw/arm/smmuv3-accel: Initialize shared system address space Shameer Kolothum
` (28 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Set up dedicated PCIIOMMUOps for the accel SMMUv3, since it will need
different callback handling in upcoming patches. This also adds a
CONFIG_ARM_SMMUV3_ACCEL build option so the feature can be disabled
at compile time. Because we now include CONFIG_DEVICES in the header to
check for ARM_SMMUV3_ACCEL, the meson file entry for smmuv3.c needs to
be changed to arm_ss.add.
The “accel” property isn’t user-visible yet; it will be introduced in
a later patch once all the supporting pieces are ready.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/Kconfig | 5 ++++
hw/arm/meson.build | 3 ++-
hw/arm/smmuv3-accel.c | 59 +++++++++++++++++++++++++++++++++++++++++
hw/arm/smmuv3-accel.h | 27 +++++++++++++++++++
hw/arm/smmuv3.c | 5 ++++
include/hw/arm/smmuv3.h | 3 +++
6 files changed, 101 insertions(+), 1 deletion(-)
create mode 100644 hw/arm/smmuv3-accel.c
create mode 100644 hw/arm/smmuv3-accel.h
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 0cdeb60f1f..702b79a02b 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -12,6 +12,7 @@ config ARM_VIRT
select ARM_GIC
select ACPI
select ARM_SMMUV3
+ select ARM_SMMUV3_ACCEL
select GPIO_KEY
select DEVICE_TREE
select FW_CFG_DMA
@@ -629,6 +630,10 @@ config FSL_IMX8MP_EVK
config ARM_SMMUV3
bool
+config ARM_SMMUV3_ACCEL
+ bool
+ depends on ARM_SMMUV3 && IOMMUFD
+
config FSL_IMX6UL
bool
default y
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index aeaf654790..c250487e64 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -84,7 +84,8 @@ arm_common_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
arm_common_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP', if_true: files('fsl-imx8mp.c'))
arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP_EVK', if_true: files('imx8mp-evk.c'))
-arm_common_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
+arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
+arm_ss.add(when: 'CONFIG_ARM_SMMUV3_ACCEL', if_true: files('smmuv3-accel.c'))
arm_common_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
arm_common_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
arm_common_ss.add(when: 'CONFIG_XEN', if_true: files(
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
new file mode 100644
index 0000000000..99ef0db8c4
--- /dev/null
+++ b/hw/arm/smmuv3-accel.c
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
+ * Copyright (C) 2025 NVIDIA
+ * Written by Nicolin Chen, Shameer Kolothum
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/arm/smmuv3.h"
+#include "smmuv3-accel.h"
+
+static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
+ PCIBus *bus, int devfn)
+{
+ SMMUDevice *sdev = sbus->pbdev[devfn];
+ SMMUv3AccelDevice *accel_dev;
+
+ if (sdev) {
+ return container_of(sdev, SMMUv3AccelDevice, sdev);
+ }
+
+ accel_dev = g_new0(SMMUv3AccelDevice, 1);
+ sdev = &accel_dev->sdev;
+
+ sbus->pbdev[devfn] = sdev;
+ smmu_init_sdev(bs, sdev, bus, devfn);
+ return accel_dev;
+}
+
+/*
+ * Find or add an address space for the given PCI device.
+ *
+ * If a device matching @bus and @devfn already exists, return its
+ * corresponding address space. Otherwise, create a new device entry
+ * and initialize address space for it.
+ */
+static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
+ int devfn)
+{
+ SMMUState *bs = opaque;
+ SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
+ SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
+ SMMUDevice *sdev = &accel_dev->sdev;
+
+ return &sdev->as;
+}
+
+static const PCIIOMMUOps smmuv3_accel_ops = {
+ .get_address_space = smmuv3_accel_find_add_as,
+};
+
+void smmuv3_accel_init(SMMUv3State *s)
+{
+ SMMUState *bs = ARM_SMMU(s);
+
+ bs->iommu_ops = &smmuv3_accel_ops;
+}
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
new file mode 100644
index 0000000000..0dc6b00d35
--- /dev/null
+++ b/hw/arm/smmuv3-accel.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
+ * Copyright (C) 2025 NVIDIA
+ * Written by Nicolin Chen, Shameer Kolothum
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_ARM_SMMUV3_ACCEL_H
+#define HW_ARM_SMMUV3_ACCEL_H
+
+#include "hw/arm/smmu-common.h"
+#include CONFIG_DEVICES
+
+typedef struct SMMUv3AccelDevice {
+ SMMUDevice sdev;
+} SMMUv3AccelDevice;
+
+#ifdef CONFIG_ARM_SMMUV3_ACCEL
+void smmuv3_accel_init(SMMUv3State *s);
+#else
+static inline void smmuv3_accel_init(SMMUv3State *s)
+{
+}
+#endif
+
+#endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index bcf8af8dc7..ef991cb7d8 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -32,6 +32,7 @@
#include "qapi/error.h"
#include "hw/arm/smmuv3.h"
+#include "smmuv3-accel.h"
#include "smmuv3-internal.h"
#include "smmu-internal.h"
@@ -1882,6 +1883,10 @@ static void smmu_realize(DeviceState *d, Error **errp)
SysBusDevice *dev = SYS_BUS_DEVICE(d);
Error *local_err = NULL;
+ if (s->accel) {
+ smmuv3_accel_init(s);
+ }
+
c->parent_realize(d, &local_err);
if (local_err) {
error_propagate(errp, local_err);
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index d183a62766..bb7076286b 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -63,6 +63,9 @@ struct SMMUv3State {
qemu_irq irq[4];
QemuMutex mutex;
char *stage;
+
+    /* SMMU has HW accelerator support for nested S1 + S2 */
+ bool accel;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 06/33] hw/arm/smmuv3-accel: Initialize shared system address space
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (4 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 05/33] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 07/33] hw/pci/pci: Move pci_init_bus_master() after adding device to bus Shameer Kolothum
` (27 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
To support accelerated SMMUv3 instances, introduce a shared system-wide
AddressSpace (shared_as_sysmem) that aliases the global system memory.
This shared AddressSpace will be used in a subsequent patch for all
vfio-pci devices behind all accelerated SMMUv3 instances within a VM.
Sharing a single system AddressSpace ensures that all devices behind
accelerated SMMUv3s use the same system address space pointer. This
allows VFIO/iommufd to reuse a single IOAS ID in iommufd_cdev_attach(),
enabling the Stage-2 page tables to be shared within the VM rather than
duplicated for each SMMUv3 instance.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 99ef0db8c4..b2eded743e 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -11,6 +11,14 @@
#include "hw/arm/smmuv3.h"
#include "smmuv3-accel.h"
+/*
+ * The root region aliases the global system memory, and shared_as_sysmem
+ * provides a shared Address Space referencing it. This Address Space is used
+ * by all vfio-pci devices behind all accelerated SMMUv3 instances within a VM.
+ */
+static MemoryRegion root, sysmem;
+static AddressSpace *shared_as_sysmem;
+
static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
PCIBus *bus, int devfn)
{
@@ -51,9 +59,27 @@ static const PCIIOMMUOps smmuv3_accel_ops = {
.get_address_space = smmuv3_accel_find_add_as,
};
+static void smmuv3_accel_as_init(SMMUv3State *s)
+{
+
+ if (shared_as_sysmem) {
+ return;
+ }
+
+ memory_region_init(&root, OBJECT(s), "root", UINT64_MAX);
+ memory_region_init_alias(&sysmem, OBJECT(s), "smmuv3-accel-sysmem",
+ get_system_memory(), 0,
+ memory_region_size(get_system_memory()));
+ memory_region_add_subregion(&root, 0, &sysmem);
+
+ shared_as_sysmem = g_new0(AddressSpace, 1);
+ address_space_init(shared_as_sysmem, &root, "smmuv3-accel-as-sysmem");
+}
+
void smmuv3_accel_init(SMMUv3State *s)
{
SMMUState *bs = ARM_SMMU(s);
bs->iommu_ops = &smmuv3_accel_ops;
+ smmuv3_accel_as_init(s);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 07/33] hw/pci/pci: Move pci_init_bus_master() after adding device to bus
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (5 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 06/33] hw/arm/smmuv3-accel: Initialize shared system address space Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 20:44 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback Shameer Kolothum
` (26 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Michael S . Tsirkin
During PCI hotplug, in do_pci_register_device(), pci_init_bus_master()
is called before storing the pci_dev pointer in bus->devices[devfn].
This causes a problem if pci_init_bus_master() (via its
get_address_space() callback) attempts to retrieve the device using
pci_find_device(), since the PCI device is not yet visible on the bus.
Fix this by moving the pci_init_bus_master() call to after the device
has been added to bus->devices[devfn].
This prepares for a subsequent patch where the accel SMMUv3
get_address_space() callback retrieves the pci_dev to identify the
attached device type.
No functional change intended.
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/pci/pci.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8b62044a8e..af32ab4adb 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1381,9 +1381,6 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
pci_dev->bus_master_as.max_bounce_buffer_size =
pci_dev->max_bounce_buffer_size;
- if (phase_check(PHASE_MACHINE_READY)) {
- pci_init_bus_master(pci_dev);
- }
pci_dev->irq_state = 0;
pci_config_alloc(pci_dev);
@@ -1427,6 +1424,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
pci_dev->config_write = config_write;
bus->devices[devfn] = pci_dev;
pci_dev->version_id = 2; /* Current pci device vmstate version */
+ if (phase_check(PHASE_MACHINE_READY)) {
+ pci_init_bus_master(pci_dev);
+ }
return pci_dev;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (6 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 07/33] hw/pci/pci: Move pci_init_bus_master() after adding device to bus Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 20:51 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 09/33] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header Shameer Kolothum
` (25 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Michael S . Tsirkin
Introduce an optional supports_address_space() callback in PCIIOMMUOps to
allow a vIOMMU implementation to reject devices that should not be attached
to it.
Currently, get_address_space() is the first and mandatory callback into the
vIOMMU layer, which always returns an address space. For certain setups, such
as hardware accelerated vIOMMUs (e.g. ARM SMMUv3 with accel=on), attaching
emulated endpoint devices is undesirable as it may impact the behavior or
performance of VFIO passthrough devices, for example, by triggering
unnecessary invalidations on the host IOMMU.
The new callback allows a vIOMMU to check and reject unsupported devices
early during PCI device registration.
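For illustration only, a sketch of how a vIOMMU might wire up the new hook
(hypothetical my_viommu_* names and an illustrative vfio-pci-only policy;
patch #10 implements the real check for the accelerated SMMUv3; assumes
hw/pci/pci.h, hw/vfio/pci.h and qapi/error.h):

    /* Sketch: reject any endpoint that is not a vfio-pci device */
    static bool my_viommu_supports_as(PCIBus *bus, void *opaque, int devfn,
                                      Error **errp)
    {
        PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);

        if (pdev && !object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI)) {
            error_setg(errp, "device not supported by this vIOMMU");
            return false;
        }
        return true;
    }

    /* Placeholder for the vIOMMU's existing address space lookup */
    static AddressSpace *my_viommu_get_as(PCIBus *bus, void *opaque, int devfn)
    {
        return &address_space_memory;
    }

    static const PCIIOMMUOps my_viommu_ops = {
        .supports_address_space = my_viommu_supports_as,
        .get_address_space = my_viommu_get_as,   /* existing mandatory callback */
    };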
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/pci/pci.c | 20 ++++++++++++++++++++
include/hw/pci/pci.h | 17 +++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index af32ab4adb..55647a6928 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -135,6 +135,21 @@ static void pci_set_master(PCIDevice *d, bool enable)
d->is_master = enable; /* cache the status */
}
+static bool
+pci_device_supports_iommu_address_space(PCIDevice *dev, Error **errp)
+{
+ PCIBus *bus;
+ PCIBus *iommu_bus;
+ int devfn;
+
+ pci_device_get_iommu_bus_devfn(dev, &iommu_bus, &bus, &devfn);
+ if (iommu_bus && iommu_bus->iommu_ops->supports_address_space) {
+ return iommu_bus->iommu_ops->supports_address_space(bus,
+ iommu_bus->iommu_opaque, devfn, errp);
+ }
+ return true;
+}
+
static void pci_init_bus_master(PCIDevice *pci_dev)
{
AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
@@ -1424,6 +1439,11 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
pci_dev->config_write = config_write;
bus->devices[devfn] = pci_dev;
pci_dev->version_id = 2; /* Current pci device vmstate version */
+ if (!pci_device_supports_iommu_address_space(pci_dev, errp)) {
+ do_pci_unregister_device(pci_dev);
+ bus->devices[devfn] = NULL;
+ return NULL;
+ }
if (phase_check(PHASE_MACHINE_READY)) {
pci_init_bus_master(pci_dev);
}
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index a3ca54859c..dd1c4483a2 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -417,6 +417,23 @@ typedef struct IOMMUPRINotifier {
* framework for a set of devices on a PCI bus.
*/
typedef struct PCIIOMMUOps {
+ /**
+ * @supports_address_space: Optional pre-check to determine if a PCI
+ * device can have an IOMMU address space.
+ *
+ * @bus: the #PCIBus being accessed.
+ *
+ * @opaque: the data passed to pci_setup_iommu().
+ *
+ * @devfn: device and function number.
+ *
+     * @errp: pass an Error out only when returning false
+ *
+ * Returns: true if the device can be associated with an IOMMU address
+ * space, false otherwise with errp set.
+ */
+ bool (*supports_address_space)(PCIBus *bus, void *opaque, int devfn,
+ Error **errp);
/**
* @get_address_space: get the address space for a set of devices
* on a PCI bus.
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 09/33] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (7 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 20:52 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 10/33] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum
` (24 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Move the TYPE_PXB_PCIE_DEV definition to the header so that it can be
referenced by other code in a subsequent patch.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/pci-bridge/pci_expander_bridge.c | 1 -
include/hw/pci/pci_bridge.h | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index 1bcceddbc4..a8eb2d2426 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -48,7 +48,6 @@ struct PXBBus {
char bus_path[8];
};
-#define TYPE_PXB_PCIE_DEV "pxb-pcie"
OBJECT_DECLARE_SIMPLE_TYPE(PXBPCIEDev, PXB_PCIE_DEV)
static GList *pxb_dev_list;
diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index a055fd8d32..b61360b900 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -106,6 +106,7 @@ typedef struct PXBPCIEDev {
#define TYPE_PXB_PCIE_BUS "pxb-pcie-bus"
#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
+#define TYPE_PXB_PCIE_DEV "pxb-pcie"
#define TYPE_PXB_DEV "pxb"
OBJECT_DECLARE_SIMPLE_TYPE(PXBDev, PXB_DEV)
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 10/33] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (8 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 09/33] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 11/33] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum
` (23 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Accelerated SMMUv3 is only meaningful when a device can leverage the host
SMMUv3 in nested mode (S1+S2 translation). To keep the model consistent
and correct, this mode is restricted to vfio-pci endpoint devices using
the iommufd backend.
Non-endpoint emulated devices such as PCIe root ports and bridges are also
permitted so that vfio-pci devices can be attached downstream. All other
device types are unsupported in accelerated mode.
Implement supports_address_space() callback to reject all such unsupported
devices.
This restriction also avoids complications with IOTLB invalidations. Some
TLBI commands (e.g. CMD_TLBI_NH_ASID) lack an associated SID, making it
difficult to trace the originating device. Allowing emulated endpoints
would require invalidating both QEMU’s software IOTLB and the host’s
hardware IOTLB, which can significantly degrade performance.
A key design choice is the address space returned for accelerated vfio-pci
endpoints. VFIO core has a container that manages an HWPT. By default, it
allocates a stage-1 normal HWPT, unless vIOMMU requests for a nesting
parent HWPT for accelerated cases.
VFIO core adds a listener for that HWPT and sets up a handler
vfio_container_region_add() where it checks the memory region.
- If the region is a non-IOMMU-translated one (system address space), VFIO
treats it as RAM and handles all stage-2 mappings for the core-allocated
nesting parent HWPT.
- If the region is an IOMMU address space, VFIO instead enables IOTLB
notifier handling and translation replay, skipping the RAM listener and
therefore not installing stage-2 mappings.
For accelerated SMMUv3, correct operation requires the S1+S2 nesting
model, and therefore VFIO must take the "system address space" path so
that stage-2 mappings are properly built. Returning an alias of the
system address space ensures this happens. Returning the IOMMU address
space would omit stage-2 mapping and break nested translation.
Another option considered was forcing a pre-registration path using
vfio_prereg_listener() to set up stage-2 mappings, but this requires
changes in VFIO core and was not adopted. Returning an alias of the
system address space keeps the design aligned with existing VFIO/iommufd
nesting flows and avoids the need for cross-subsystem changes.
In summary:
- vfio-pci devices (with iommufd as backend) return an address space
aliased to the system address space.
- bridges and root ports return the IOMMU address space.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 77 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 76 insertions(+), 1 deletion(-)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index b2eded743e..2fcd301322 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -7,8 +7,13 @@
*/
#include "qemu/osdep.h"
+#include "qemu/error-report.h"
#include "hw/arm/smmuv3.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci-host/gpex.h"
+#include "hw/vfio/pci.h"
+
#include "smmuv3-accel.h"
/*
@@ -37,6 +42,48 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
return accel_dev;
}
+/*
+ * Only allow PCIe bridges, pxb-pcie roots, and GPEX roots so vfio-pci
+ * endpoints can sit downstream. Accelerated SMMUv3 requires a vfio-pci
+ * endpoint using the iommufd backend; all other device types are rejected.
+ * This avoids supporting emulated endpoints, which would complicate IOTLB
+ * invalidation and hurt performance.
+ */
+static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
+{
+
+ if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
+ object_dynamic_cast(OBJECT(pdev), TYPE_PXB_PCIE_DEV) ||
+ object_dynamic_cast(OBJECT(pdev), TYPE_GPEX_ROOT_DEVICE)) {
+ return true;
+ } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI))) {
+ *vfio_pci = true;
+ if (object_property_get_link(OBJECT(pdev), "iommufd", NULL)) {
+ return true;
+ }
+ }
+ return false;
+}
+
+static bool smmuv3_accel_supports_as(PCIBus *bus, void *opaque, int devfn,
+ Error **errp)
+{
+ PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
+ bool vfio_pci = false;
+
+ if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
+ if (vfio_pci) {
+ error_setg(errp, "vfio-pci endpoint devices without an iommufd "
+ "backend not allowed when using arm-smmuv3,accel=on");
+
+ } else {
+ error_setg(errp, "Emulated endpoint devices are not allowed when "
+ "using arm-smmuv3,accel=on");
+ }
+ return false;
+ }
+ return true;
+}
/*
* Find or add an address space for the given PCI device.
*
@@ -47,15 +94,43 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
int devfn)
{
+ PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
SMMUState *bs = opaque;
SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
SMMUDevice *sdev = &accel_dev->sdev;
+ bool vfio_pci = false;
- return &sdev->as;
+ if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
+ /* Should never be here: supports_address_space() filters these out */
+ g_assert_not_reached();
+ }
+
+ /*
+ * In the accelerated mode, a vfio-pci device attached via the iommufd
+ * backend must remain in the system address space. Such a device is
+ * always translated by its physical SMMU (using either a stage-2-only
+ * STE or a nested STE), where the parent stage-2 page table is allocated
+ * by the VFIO core to back the system address space.
+ *
+ * Return the shared_as_sysmem aliased to the global system memory in this
+ * case. Sharing address_space_memory also allows devices under different
+ * vSMMU instances in the same VM to reuse a single nesting parent HWPT in
+ * the VFIO core.
+ *
+ * For non-endpoint emulated devices such as PCIe root ports and bridges,
+ * which may use the normal emulated translation path and software IOTLBs,
+ * return the SMMU's IOMMU address space.
+ */
+ if (vfio_pci) {
+ return shared_as_sysmem;
+ } else {
+ return &sdev->as;
+ }
}
static const PCIIOMMUOps smmuv3_accel_ops = {
+ .supports_address_space = smmuv3_accel_supports_as,
.get_address_space = smmuv3_accel_find_add_as,
};
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 11/33] hw/arm/smmuv3: Implement get_viommu_cap() callback
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (9 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 10/33] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 12/33] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum
` (22 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
For accelerated SMMUv3, we need nested parent domain creation. Add the
callback support so that VFIO can create a nested parent.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 2fcd301322..bd4a7dbde1 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -10,6 +10,7 @@
#include "qemu/error-report.h"
#include "hw/arm/smmuv3.h"
+#include "hw/iommu.h"
#include "hw/pci/pci_bridge.h"
#include "hw/pci-host/gpex.h"
#include "hw/vfio/pci.h"
@@ -129,9 +130,21 @@ static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
}
}
+static uint64_t smmuv3_accel_get_viommu_flags(void *opaque)
+{
+ /*
+     * Return VIOMMU_FLAG_WANT_NESTING_PARENT to tell the VFIO core to create
+     * a nesting parent, which is required for accelerated SMMUv3 support.
+     * Real HW nested support should be reported by the host SMMUv3; if it is
+     * not, the nesting parent allocation will fail in the VFIO core anyway.
+ */
+ return VIOMMU_FLAG_WANT_NESTING_PARENT;
+}
+
static const PCIIOMMUOps smmuv3_accel_ops = {
.supports_address_space = smmuv3_accel_supports_as,
.get_address_space = smmuv3_accel_find_add_as,
+ .get_viommu_flags = smmuv3_accel_get_viommu_flags,
};
static void smmuv3_accel_as_init(SMMUv3State *s)
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 12/33] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (10 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 11/33] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 13/33] hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller Shameer Kolothum
` (21 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
From: Nicolin Chen <nicolinc@nvidia.com>
Implement the VFIO/PCI callbacks to attach and detach a HostIOMMUDevice
to/from a vSMMUv3 when accel=on:
- set_iommu_device(): attach a HostIOMMUDevice to a vIOMMU
- unset_iommu_device(): detach and release associated resources
In SMMUv3 accel=on mode, the guest SMMUv3 is backed by the host SMMUv3 via
IOMMUFD. A vIOMMU object (created via IOMMU_VIOMMU_ALLOC) provides a per-VM,
security-isolated handle to the physical SMMUv3. Without a vIOMMU, the
vSMMUv3 cannot relay guest operations to the host hardware nor maintain
isolation across VMs or devices. Therefore, set_iommu_device() allocates
a vIOMMU object if one does not already exist.
There are two main points to consider in this implementation:
1) VFIO core allocates and attaches an S2 HWPT that acts as the nesting
parent for nested HWPTs (IOMMU_DOMAIN_NESTED). This parent HWPT will
be shared across multiple vSMMU instances within a VM.
2) A device cannot attach directly to a vIOMMU. Instead, it attaches
through a proxy nested HWPT (IOMMU_DOMAIN_NESTED). Based on the STE
configuration, there are three types of nested HWPTs: bypass, abort,
and translate.
- The bypass and abort proxy HWPTs are pre-allocated. When the SMMUv3
operates in global abort or bypass mode, as controlled by the GBPA
register, or the guest issues a vSTE for bypass or abort, we attach these
pre-allocated nested HWPTs.
- The translate HWPT requires a vDEVICE to be allocated first, since
invalidations and events depend on a valid vSID.
- The vDEVICE allocation and attach operations for vSTE-based HWPTs
are implemented in subsequent patches.
In summary, a device placed behind a vSMMU instance must have a vSID for
a translate vSTE. The bypass and abort vSTEs are pre-allocated as proxy
nested HWPTs and are attached based on the GBPA register. The core-managed
nesting parent S2 HWPT is used as the parent for all the nested
HWPTs and is intended to be shared across vSMMU instances within the
same VM.
set_iommu_device():
- Reuse an existing vIOMMU for the same physical SMMU if available.
If not, allocate a new one using the nesting parent S2 HWPT.
- Pre-allocate two proxy nested HWPTs (bypass and abort) under the
vIOMMU and install one based on the GBPA.ABORT value.
- Add the device to the vIOMMU’s device list.
unset_iommu_device():
- Re-attach the device to the nesting parent S2 HWPT.
- Remove the device from the vIOMMU’s device list.
- If the list is empty, free the proxy HWPTs (bypass and abort)
and release the vIOMMU object (see the sketch below).
Introduce struct SMMUv3AccelState, representing an accelerated SMMUv3
instance backed by an iommufd vIOMMU object, and storing the bypass and
abort proxy HWPT IDs.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 154 +++++++++++++++++++++++++++++++++++++++
hw/arm/smmuv3-accel.h | 16 ++++
hw/arm/smmuv3-internal.h | 3 +
hw/arm/trace-events | 4 +
include/hw/arm/smmuv3.h | 1 +
5 files changed, 178 insertions(+)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index bd4a7dbde1..4dd56a8e65 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -8,6 +8,7 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "trace.h"
#include "hw/arm/smmuv3.h"
#include "hw/iommu.h"
@@ -15,6 +16,7 @@
#include "hw/pci-host/gpex.h"
#include "hw/vfio/pci.h"
+#include "smmuv3-internal.h"
#include "smmuv3-accel.h"
/*
@@ -43,6 +45,156 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
return accel_dev;
}
+static uint32_t smmuv3_accel_gbpa_hwpt(SMMUv3State *s, SMMUv3AccelState *accel)
+{
+ return FIELD_EX32(s->gbpa, GBPA, ABORT) ?
+ accel->abort_hwpt_id : accel->bypass_hwpt_id;
+}
+
+static bool
+smmuv3_accel_alloc_viommu(SMMUv3State *s, HostIOMMUDeviceIOMMUFD *idev,
+ Error **errp)
+{
+ struct iommu_hwpt_arm_smmuv3 bypass_data = {
+ .ste = { SMMU_STE_CFG_BYPASS | SMMU_STE_VALID, 0x0ULL },
+ };
+ struct iommu_hwpt_arm_smmuv3 abort_data = {
+ .ste = { SMMU_STE_VALID, 0x0ULL },
+ };
+ uint32_t s2_hwpt_id = idev->hwpt_id;
+ uint32_t viommu_id, hwpt_id;
+ SMMUv3AccelState *accel;
+
+ if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
+ IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
+ s2_hwpt_id, &viommu_id, errp)) {
+ return false;
+ }
+
+ accel = g_new0(SMMUv3AccelState, 1);
+ accel->viommu.viommu_id = viommu_id;
+ accel->viommu.s2_hwpt_id = s2_hwpt_id;
+ accel->viommu.iommufd = idev->iommufd;
+
+ /*
+ * Pre-allocate HWPTs for S1 bypass and abort cases. These will be attached
+ * later for guest STEs or GBPAs that require bypass or abort configuration.
+ */
+ if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid, viommu_id,
+ 0, IOMMU_HWPT_DATA_ARM_SMMUV3,
+ sizeof(abort_data), &abort_data,
+ &accel->abort_hwpt_id, errp)) {
+ goto free_viommu;
+ }
+
+ if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid, viommu_id,
+ 0, IOMMU_HWPT_DATA_ARM_SMMUV3,
+ sizeof(bypass_data), &bypass_data,
+ &accel->bypass_hwpt_id, errp)) {
+ goto free_abort_hwpt;
+ }
+
+ /* Attach a HWPT based on SMMUv3 GBPA.ABORT value */
+ hwpt_id = smmuv3_accel_gbpa_hwpt(s, accel);
+ if (!host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp)) {
+ goto free_bypass_hwpt;
+ }
+ s->s_accel = accel;
+ return true;
+
+free_bypass_hwpt:
+ iommufd_backend_free_id(idev->iommufd, accel->bypass_hwpt_id);
+free_abort_hwpt:
+ iommufd_backend_free_id(idev->iommufd, accel->abort_hwpt_id);
+free_viommu:
+ iommufd_backend_free_id(idev->iommufd, accel->viommu.viommu_id);
+ g_free(accel);
+ return false;
+}
+
+static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
+ HostIOMMUDevice *hiod, Error **errp)
+{
+ HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(hiod);
+ SMMUState *bs = opaque;
+ SMMUv3State *s = ARM_SMMUV3(bs);
+ SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
+ SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
+
+ if (!idev) {
+ return true;
+ }
+
+ if (accel_dev->idev) {
+ if (accel_dev->idev != idev) {
+ error_setg(errp, "Device already has an associated idev 0x%x",
+ idev->devid);
+ return false;
+ }
+ return true;
+ }
+
+ if (s->s_accel) {
+ goto done;
+ }
+
+ if (!smmuv3_accel_alloc_viommu(s, idev, errp)) {
+ error_append_hint(errp, "Unable to alloc vIOMMU: idev devid 0x%x: ",
+ idev->devid);
+ return false;
+ }
+
+done:
+ accel_dev->idev = idev;
+ accel_dev->s_accel = s->s_accel;
+ QLIST_INSERT_HEAD(&s->s_accel->device_list, accel_dev, next);
+ trace_smmuv3_accel_set_iommu_device(devfn, idev->devid);
+ return true;
+}
+
+static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
+ int devfn)
+{
+ SMMUState *bs = opaque;
+ SMMUv3State *s = ARM_SMMUV3(bs);
+ SMMUPciBus *sbus = g_hash_table_lookup(bs->smmu_pcibus_by_busptr, bus);
+ HostIOMMUDeviceIOMMUFD *idev;
+ SMMUv3AccelDevice *accel_dev;
+ SMMUv3AccelState *accel;
+ SMMUDevice *sdev;
+
+ if (!sbus) {
+ return;
+ }
+
+ sdev = sbus->pbdev[devfn];
+ if (!sdev) {
+ return;
+ }
+
+ accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+ idev = accel_dev->idev;
+ accel = accel_dev->s_accel;
+ /* Re-attach the default s2 hwpt id */
+ if (!host_iommu_device_iommufd_attach_hwpt(idev, idev->hwpt_id, NULL)) {
+ error_report("Unable to attach the default HW pagetable: idev devid "
+ "0x%x", idev->devid);
+ }
+
+ accel_dev->idev = NULL;
+ accel_dev->s_accel = NULL;
+ QLIST_REMOVE(accel_dev, next);
+ trace_smmuv3_accel_unset_iommu_device(devfn, idev->devid);
+
+ if (QLIST_EMPTY(&accel->device_list)) {
+ iommufd_backend_free_id(accel->viommu.iommufd, accel->bypass_hwpt_id);
+ iommufd_backend_free_id(accel->viommu.iommufd, accel->abort_hwpt_id);
+ iommufd_backend_free_id(accel->viommu.iommufd, accel->viommu.viommu_id);
+ g_free(accel);
+ s->s_accel = NULL;
+ }
+}
+
/*
* Only allow PCIe bridges, pxb-pcie roots, and GPEX roots so vfio-pci
* endpoints can sit downstream. Accelerated SMMUv3 requires a vfio-pci
@@ -145,6 +297,8 @@ static const PCIIOMMUOps smmuv3_accel_ops = {
.supports_address_space = smmuv3_accel_supports_as,
.get_address_space = smmuv3_accel_find_add_as,
.get_viommu_flags = smmuv3_accel_get_viommu_flags,
+ .set_iommu_device = smmuv3_accel_set_iommu_device,
+ .unset_iommu_device = smmuv3_accel_unset_iommu_device,
};
static void smmuv3_accel_as_init(SMMUv3State *s)
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 0dc6b00d35..c72605caab 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -10,10 +10,26 @@
#define HW_ARM_SMMUV3_ACCEL_H
#include "hw/arm/smmu-common.h"
+#include "system/iommufd.h"
+#include <linux/iommufd.h>
#include CONFIG_DEVICES
+/*
+ * Represents an accelerated SMMU instance backed by an iommufd vIOMMU object.
+ * Holds bypass and abort proxy HWPT IDs used for device attachment.
+ */
+typedef struct SMMUv3AccelState {
+ IOMMUFDViommu viommu;
+ uint32_t bypass_hwpt_id;
+ uint32_t abort_hwpt_id;
+ QLIST_HEAD(, SMMUv3AccelDevice) device_list;
+} SMMUv3AccelState;
+
typedef struct SMMUv3AccelDevice {
SMMUDevice sdev;
+ HostIOMMUDeviceIOMMUFD *idev;
+ QLIST_ENTRY(SMMUv3AccelDevice) next;
+ SMMUv3AccelState *s_accel;
} SMMUv3AccelDevice;
#ifdef CONFIG_ARM_SMMUV3_ACCEL
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index b6b7399347..81212a58f1 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -583,6 +583,9 @@ typedef struct CD {
((extract64((x)->word[7], 0, 16) << 32) | \
((x)->word[6] & 0xfffffff0))
+#define SMMU_STE_VALID (1ULL << 0)
+#define SMMU_STE_CFG_BYPASS (1ULL << 3)
+
static inline int oas2bits(int oas_field)
{
switch (oas_field) {
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index f3386bd7ae..2aaa0c40c7 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -66,6 +66,10 @@ smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s
smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
smmu_reset_exit(void) ""
+#smmuv3-accel.c
+smmuv3_accel_set_iommu_device(int devfn, uint32_t devid) "devfn=0x%x (idev devid=0x%x)"
+smmuv3_accel_unset_iommu_device(int devfn, uint32_t devid) "devfn=0x%x (idev devid=0x%x)"
+
# strongarm.c
strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
strongarm_ssp_read_underrun(void) "SSP rx underrun"
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index bb7076286b..e54ece2d38 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -66,6 +66,7 @@ struct SMMUv3State {
/* SMMU has HW accelerator support for nested S1 + s2 */
bool accel;
+ struct SMMUv3AccelState *s_accel;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 13/33] hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (11 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 12/33] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 20:59 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 14/33] hw/arm/smmuv3-accel: Add nested vSTE install/uninstall support Shameer Kolothum
` (20 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
smmuv3_cmdq_consume() is updated to return detailed errors via errp.
Although this is currently a no-op, it prepares the ground for accel
SMMUv3-specific command handling, where proper error reporting will be
useful.
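The errp plumbing follows the usual QEMU Error convention; a tiny generic
example of the idiom (the type and helper names here are placeholders):

    static void write_handler(SomeState *s)
    {
        Error *local_err = NULL;

        consume(s, &local_err);          /* may fill local_err via error_setg() */
        if (local_err) {
            error_report_err(local_err); /* report and free the error */
        }
    }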
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3.c | 67 +++++++++++++++++++++++++++----------------------
1 file changed, 37 insertions(+), 30 deletions(-)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index ef991cb7d8..374ae08baa 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1279,7 +1279,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, SMMUStage stage)
}
}
-static int smmuv3_cmdq_consume(SMMUv3State *s)
+static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
{
SMMUState *bs = ARM_SMMU(s);
SMMUCmdError cmd_error = SMMU_CERROR_NONE;
@@ -1547,42 +1547,44 @@ static MemTxResult smmu_writell(SMMUv3State *s, hwaddr offset,
static MemTxResult smmu_writel(SMMUv3State *s, hwaddr offset,
uint64_t data, MemTxAttrs attrs)
{
+ Error *local_err = NULL;
+
switch (offset) {
case A_CR0:
s->cr[0] = data;
s->cr0ack = data & ~SMMU_CR0_RESERVED;
/* in case the command queue has been enabled */
- smmuv3_cmdq_consume(s);
- return MEMTX_OK;
+ smmuv3_cmdq_consume(s, &local_err);
+ break;
case A_CR1:
s->cr[1] = data;
- return MEMTX_OK;
+ break;
case A_CR2:
s->cr[2] = data;
- return MEMTX_OK;
+ break;
case A_IRQ_CTRL:
s->irq_ctrl = data;
- return MEMTX_OK;
+ break;
case A_GERRORN:
smmuv3_write_gerrorn(s, data);
/*
* By acknowledging the CMDQ_ERR, SW may notify cmds can
* be processed again
*/
- smmuv3_cmdq_consume(s);
- return MEMTX_OK;
+ smmuv3_cmdq_consume(s, &local_err);
+ break;
case A_GERROR_IRQ_CFG0: /* 64b */
s->gerror_irq_cfg0 = deposit64(s->gerror_irq_cfg0, 0, 32, data);
- return MEMTX_OK;
+ break;
case A_GERROR_IRQ_CFG0 + 4:
s->gerror_irq_cfg0 = deposit64(s->gerror_irq_cfg0, 32, 32, data);
- return MEMTX_OK;
+ break;
case A_GERROR_IRQ_CFG1:
s->gerror_irq_cfg1 = data;
- return MEMTX_OK;
+ break;
case A_GERROR_IRQ_CFG2:
s->gerror_irq_cfg2 = data;
- return MEMTX_OK;
+ break;
case A_GBPA:
/*
* If UPDATE is not set, the write is ignored. This is the only
@@ -1592,71 +1594,76 @@ static MemTxResult smmu_writel(SMMUv3State *s, hwaddr offset,
/* Ignore update bit as write is synchronous. */
s->gbpa = data & ~R_GBPA_UPDATE_MASK;
}
- return MEMTX_OK;
+ break;
case A_STRTAB_BASE: /* 64b */
s->strtab_base = deposit64(s->strtab_base, 0, 32, data);
- return MEMTX_OK;
+ break;
case A_STRTAB_BASE + 4:
s->strtab_base = deposit64(s->strtab_base, 32, 32, data);
- return MEMTX_OK;
+ break;
case A_STRTAB_BASE_CFG:
s->strtab_base_cfg = data;
if (FIELD_EX32(data, STRTAB_BASE_CFG, FMT) == 1) {
s->sid_split = FIELD_EX32(data, STRTAB_BASE_CFG, SPLIT);
s->features |= SMMU_FEATURE_2LVL_STE;
}
- return MEMTX_OK;
+ break;
case A_CMDQ_BASE: /* 64b */
s->cmdq.base = deposit64(s->cmdq.base, 0, 32, data);
s->cmdq.log2size = extract64(s->cmdq.base, 0, 5);
if (s->cmdq.log2size > SMMU_CMDQS) {
s->cmdq.log2size = SMMU_CMDQS;
}
- return MEMTX_OK;
+ break;
case A_CMDQ_BASE + 4: /* 64b */
s->cmdq.base = deposit64(s->cmdq.base, 32, 32, data);
- return MEMTX_OK;
+ break;
case A_CMDQ_PROD:
s->cmdq.prod = data;
- smmuv3_cmdq_consume(s);
- return MEMTX_OK;
+ smmuv3_cmdq_consume(s, &local_err);
+ break;
case A_CMDQ_CONS:
s->cmdq.cons = data;
- return MEMTX_OK;
+ break;
case A_EVENTQ_BASE: /* 64b */
s->eventq.base = deposit64(s->eventq.base, 0, 32, data);
s->eventq.log2size = extract64(s->eventq.base, 0, 5);
if (s->eventq.log2size > SMMU_EVENTQS) {
s->eventq.log2size = SMMU_EVENTQS;
}
- return MEMTX_OK;
+ break;
case A_EVENTQ_BASE + 4:
s->eventq.base = deposit64(s->eventq.base, 32, 32, data);
- return MEMTX_OK;
+ break;
case A_EVENTQ_PROD:
s->eventq.prod = data;
- return MEMTX_OK;
+ break;
case A_EVENTQ_CONS:
s->eventq.cons = data;
- return MEMTX_OK;
+ break;
case A_EVENTQ_IRQ_CFG0: /* 64b */
s->eventq_irq_cfg0 = deposit64(s->eventq_irq_cfg0, 0, 32, data);
- return MEMTX_OK;
+ break;
case A_EVENTQ_IRQ_CFG0 + 4:
s->eventq_irq_cfg0 = deposit64(s->eventq_irq_cfg0, 32, 32, data);
- return MEMTX_OK;
+ break;
case A_EVENTQ_IRQ_CFG1:
s->eventq_irq_cfg1 = data;
- return MEMTX_OK;
+ break;
case A_EVENTQ_IRQ_CFG2:
s->eventq_irq_cfg2 = data;
- return MEMTX_OK;
+ break;
default:
qemu_log_mask(LOG_UNIMP,
"%s Unexpected 32-bit access to 0x%"PRIx64" (WI)\n",
__func__, offset);
- return MEMTX_OK;
+ break;
}
+
+ if (local_err) {
+ error_report_err(local_err);
+ }
+ return MEMTX_OK;
}
static MemTxResult smmu_write_mmio(void *opaque, hwaddr offset, uint64_t data,
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 14/33] hw/arm/smmuv3-accel: Add nested vSTE install/uninstall support
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (12 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 13/33] hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 15/33] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt Shameer Kolothum
` (19 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
From: Nicolin Chen <nicolinc@nvidia.com>
A device placed behind a vSMMU instance must have corresponding vSTEs
(bypass, abort, or translate) installed. The bypass and abort proxy nested
HWPTs are pre-allocated.
For the translate HWPT, a vDEVICE object is allocated and associated with the
vIOMMU for each guest device. This allows the host kernel to establish a
virtual SID to physical SID mapping, which is required for handling
invalidations and event reporting.
The translate HWPT is allocated based on the guest STE configuration and
attached to the device when the guest issues SMMU_CMD_CFGI_STE or
SMMU_CMD_CFGI_STE_RANGE, provided the STE enables S1 translation.
If the guest STE is invalid or S1 translation is disabled, the device is
attached to one of the pre-allocated ABORT or BYPASS HWPTs instead.
While at it, export smmu_find_ste() for use here.
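As a quick reference for the cases above, a small sketch of how the guest
STE.Config value (STE word 0, bits [3:1]) selects the HWPT; the helper name
and strings are only illustrative:

    static const char *hwpt_kind_for_ste(bool valid, unsigned config)
    {
        if (!valid || !(config & 0x4)) {    /* invalid STE or abort */
            return "pre-allocated abort proxy HWPT";
        }
        if (config == 0x4) {                /* bypass */
            return "pre-allocated bypass proxy HWPT";
        }
        if (config == 0x5) {                /* stage-1 translate */
            return "newly allocated translate HWPT (needs a vDEVICE/vSID)";
        }
        return "unsupported";
    }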
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 197 +++++++++++++++++++++++++++++++++++++++
hw/arm/smmuv3-accel.h | 22 +++++
hw/arm/smmuv3-internal.h | 20 ++++
hw/arm/smmuv3.c | 11 ++-
hw/arm/trace-events | 2 +
5 files changed, 250 insertions(+), 2 deletions(-)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 4dd56a8e65..2e42d2d484 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -51,6 +51,188 @@ static uint32_t smmuv3_accel_gbpa_hwpt(SMMUv3State *s, SMMUv3AccelState *accel)
accel->abort_hwpt_id : accel->bypass_hwpt_id;
}
+static bool
+smmuv3_accel_alloc_vdev(SMMUv3AccelDevice *accel_dev, int sid, Error **errp)
+{
+ SMMUv3AccelState *accel = accel_dev->s_accel;
+ HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
+ IOMMUFDVdev *vdev = accel_dev->vdev;
+ uint32_t vdevice_id;
+
+ if (!idev || vdev) {
+ return true;
+ }
+
+ if (!iommufd_backend_alloc_vdev(idev->iommufd, idev->devid,
+ accel->viommu.viommu_id, sid,
+ &vdevice_id, errp)) {
+ return false;
+ }
+
+ vdev = g_new(IOMMUFDVdev, 1);
+ vdev->vdevice_id = vdevice_id;
+ vdev->virt_id = sid;
+ accel_dev->vdev = vdev;
+ return true;
+}
+
+static SMMUS1Hwpt *
+smmuv3_accel_dev_alloc_translate(SMMUv3AccelDevice *accel_dev, STE *ste,
+ Error **errp)
+{
+ uint64_t ste_0 = (uint64_t)ste->word[0] | (uint64_t)ste->word[1] << 32;
+ uint64_t ste_1 = (uint64_t)ste->word[2] | (uint64_t)ste->word[3] << 32;
+ HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
+ SMMUv3AccelState *accel = accel_dev->s_accel;
+ struct iommu_hwpt_arm_smmuv3 nested_data = {
+ .ste = {
+ cpu_to_le64(ste_0 & STE0_MASK),
+ cpu_to_le64(ste_1 & STE1_MASK),
+ },
+ };
+ uint32_t hwpt_id = 0, flags = 0;
+ SMMUS1Hwpt *s1_hwpt;
+
+ if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
+ accel->viommu.viommu_id, flags,
+ IOMMU_HWPT_DATA_ARM_SMMUV3,
+ sizeof(nested_data), &nested_data,
+ &hwpt_id, errp)) {
+ return NULL;
+ }
+
+ s1_hwpt = g_new0(SMMUS1Hwpt, 1);
+ s1_hwpt->hwpt_id = hwpt_id;
+ trace_smmuv3_accel_translate_ste(accel_dev->vdev->virt_id, hwpt_id,
+ nested_data.ste[1], nested_data.ste[0]);
+ return s1_hwpt;
+}
+
+bool smmuv3_accel_install_ste(SMMUv3State *s, SMMUDevice *sdev, int sid,
+ Error **errp)
+{
+ SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
+ .inval_ste_allowed = true};
+ SMMUv3AccelState *accel = s->s_accel;
+ SMMUv3AccelDevice *accel_dev;
+ HostIOMMUDeviceIOMMUFD *idev;
+ uint32_t config, hwpt_id = 0;
+ SMMUS1Hwpt *s1_hwpt = NULL;
+ const char *type;
+ STE ste;
+
+ if (!accel) {
+ return true;
+ }
+
+ accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+ if (!accel_dev->s_accel) {
+ return true;
+ }
+
+ idev = accel_dev->idev;
+ if (!smmuv3_accel_alloc_vdev(accel_dev, sid, errp)) {
+ return false;
+ }
+
+ if (smmu_find_ste(sdev->smmu, sid, &ste, &event)) {
+ /* No STE found, nothing to install */
+ return true;
+ }
+
+ /*
+ * Install the STE based on SMMU enabled/config:
+ * - attach a pre-allocated HWPT for abort/bypass
+ * - or a new HWPT for translate STE
+ *
+ * Note: The vdev remains associated with accel_dev even if HWPT
+ * attach/alloc fails, since the Guest-Host SID mapping stays
+ * valid as long as the device is behind the accelerated SMMUv3.
+ */
+ if (!smmu_enabled(s)) {
+ hwpt_id = smmuv3_accel_gbpa_hwpt(s, accel);
+ } else {
+ config = STE_CONFIG(&ste);
+
+ if (!STE_VALID(&ste) || STE_CFG_ABORT(config)) {
+ hwpt_id = accel->abort_hwpt_id;
+ } else if (STE_CFG_BYPASS(config)) {
+ hwpt_id = accel->bypass_hwpt_id;
+ } else if (STE_CFG_S1_TRANSLATE(config)) {
+ s1_hwpt = smmuv3_accel_dev_alloc_translate(accel_dev, &ste, errp);
+ if (!s1_hwpt) {
+ return false;
+ }
+ hwpt_id = s1_hwpt->hwpt_id;
+ }
+ }
+
+ if (!hwpt_id) {
+ error_setg(errp, "Invalid STE config for sid 0x%x",
+ smmu_get_sid(&accel_dev->sdev));
+ return false;
+ }
+
+ if (!host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, errp)) {
+ if (s1_hwpt) {
+ iommufd_backend_free_id(idev->iommufd, s1_hwpt->hwpt_id);
+ g_free(s1_hwpt);
+ }
+ return false;
+ }
+
+ /* Free the previous s1_hwpt */
+ if (accel_dev->s1_hwpt) {
+ iommufd_backend_free_id(idev->iommufd, accel_dev->s1_hwpt->hwpt_id);
+ g_free(accel_dev->s1_hwpt);
+ }
+
+ accel_dev->s1_hwpt = s1_hwpt;
+ if (hwpt_id == accel->abort_hwpt_id) {
+ type = "abort";
+ } else if (hwpt_id == accel->bypass_hwpt_id) {
+ type = "bypass";
+ } else {
+ type = "translate";
+ }
+
+ trace_smmuv3_accel_install_ste(sid, type, hwpt_id);
+ return true;
+}
+
+bool smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
+ Error **errp)
+{
+ SMMUv3AccelState *accel = s->s_accel;
+ SMMUv3AccelDevice *accel_dev;
+ Error *local_err = NULL;
+ bool all_ok = true;
+
+ if (!accel) {
+ return true;
+ }
+
+ QLIST_FOREACH(accel_dev, &accel->device_list, next) {
+ uint32_t sid = smmu_get_sid(&accel_dev->sdev);
+
+ if (sid >= range->start && sid <= range->end) {
+ if (!smmuv3_accel_install_ste(s, &accel_dev->sdev,
+ sid, &local_err)) {
+ error_append_hint(&local_err, "Device 0x%x: Failed to install "
+ "STE\n", sid);
+ error_report_err(local_err);
+ local_err = NULL;
+ all_ok = false;
+ }
+ }
+ }
+
+ if (!all_ok) {
+ error_setg(errp, "Failed to install all STEs properly");
+ }
+ return all_ok;
+}
+
static bool
smmuv3_accel_alloc_viommu(SMMUv3State *s, HostIOMMUDeviceIOMMUFD *idev,
Error **errp)
@@ -161,6 +343,7 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
HostIOMMUDeviceIOMMUFD *idev;
SMMUv3AccelDevice *accel_dev;
SMMUv3AccelState *accel;
+ IOMMUFDVdev *vdev;
SMMUDevice *sdev;
if (!sbus) {
@@ -181,6 +364,20 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
"0x%x", idev->devid);
}
+ if (accel_dev->s1_hwpt) {
+ iommufd_backend_free_id(accel_dev->idev->iommufd,
+ accel_dev->s1_hwpt->hwpt_id);
+ g_free(accel_dev->s1_hwpt);
+ accel_dev->s1_hwpt = NULL;
+ }
+
+ vdev = accel_dev->vdev;
+ if (vdev) {
+ iommufd_backend_free_id(accel->viommu.iommufd, vdev->vdevice_id);
+ g_free(vdev);
+ accel_dev->vdev = NULL;
+ }
+
accel_dev->idev = NULL;
accel_dev->s_accel = NULL;
QLIST_REMOVE(accel_dev, next);
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index c72605caab..ae896cfa8b 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -25,19 +25,41 @@ typedef struct SMMUv3AccelState {
QLIST_HEAD(, SMMUv3AccelDevice) device_list;
} SMMUv3AccelState;
+typedef struct SMMUS1Hwpt {
+ uint32_t hwpt_id;
+} SMMUS1Hwpt;
+
typedef struct SMMUv3AccelDevice {
SMMUDevice sdev;
HostIOMMUDeviceIOMMUFD *idev;
+ SMMUS1Hwpt *s1_hwpt;
+ IOMMUFDVdev *vdev;
QLIST_ENTRY(SMMUv3AccelDevice) next;
SMMUv3AccelState *s_accel;
} SMMUv3AccelDevice;
#ifdef CONFIG_ARM_SMMUV3_ACCEL
void smmuv3_accel_init(SMMUv3State *s);
+bool smmuv3_accel_install_ste(SMMUv3State *s, SMMUDevice *sdev, int sid,
+ Error **errp);
+bool smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
+ Error **errp);
#else
static inline void smmuv3_accel_init(SMMUv3State *s)
{
}
+static inline bool
+smmuv3_accel_install_ste(SMMUv3State *s, SMMUDevice *sdev, int sid,
+ Error **errp)
+{
+ return true;
+}
+static inline bool
+smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
+ Error **errp)
+{
+ return true;
+}
#endif
#endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 81212a58f1..a76e4e2484 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -547,6 +547,8 @@ typedef struct CD {
uint32_t word[16];
} CD;
+int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste, SMMUEventInfo *event);
+
/* STE fields */
#define STE_VALID(x) extract32((x)->word[0], 0, 1)
@@ -556,6 +558,7 @@ typedef struct CD {
#define STE_CFG_S2_ENABLED(config) (config & 0x2)
#define STE_CFG_ABORT(config) (!(config & 0x4))
#define STE_CFG_BYPASS(config) (config == 0x4)
+#define STE_CFG_S1_TRANSLATE(config) (config == 0x5)
#define STE_S1FMT(x) extract32((x)->word[0], 4 , 2)
#define STE_S1CDMAX(x) extract32((x)->word[1], 27, 5)
@@ -586,6 +589,23 @@ typedef struct CD {
#define SMMU_STE_VALID (1ULL << 0)
#define SMMU_STE_CFG_BYPASS (1ULL << 3)
+#define STE0_V MAKE_64BIT_MASK(0, 1)
+#define STE0_CONFIG MAKE_64BIT_MASK(1, 3)
+#define STE0_S1FMT MAKE_64BIT_MASK(4, 2)
+#define STE0_CTXPTR MAKE_64BIT_MASK(6, 50)
+#define STE0_S1CDMAX MAKE_64BIT_MASK(59, 5)
+#define STE0_MASK (STE0_S1CDMAX | STE0_CTXPTR | STE0_S1FMT | STE0_CONFIG | \
+ STE0_V)
+
+#define STE1_S1DSS MAKE_64BIT_MASK(0, 2)
+#define STE1_S1CIR MAKE_64BIT_MASK(2, 2)
+#define STE1_S1COR MAKE_64BIT_MASK(4, 2)
+#define STE1_S1CSH MAKE_64BIT_MASK(6, 2)
+#define STE1_S1STALLD MAKE_64BIT_MASK(27, 1)
+#define STE1_EATS MAKE_64BIT_MASK(28, 2)
+#define STE1_MASK (STE1_EATS | STE1_S1STALLD | STE1_S1CSH | STE1_S1COR | \
+ STE1_S1CIR | STE1_S1DSS)
+
static inline int oas2bits(int oas_field)
{
switch (oas_field) {
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 374ae08baa..bfb41b8866 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -630,8 +630,7 @@ bad_ste:
* Supports linear and 2-level stream table
* Return 0 on success, -EINVAL otherwise
*/
-static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
- SMMUEventInfo *event)
+int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste, SMMUEventInfo *event)
{
dma_addr_t addr, strtab_base;
uint32_t log2size;
@@ -1341,6 +1340,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
}
trace_smmuv3_cmdq_cfgi_ste(sid);
+ if (!smmuv3_accel_install_ste(s, sdev, sid, errp)) {
+ cmd_error = SMMU_CERROR_ILL;
+ break;
+ }
smmuv3_flush_config(sdev);
break;
@@ -1361,6 +1364,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
sid_range.end = sid_range.start + mask;
trace_smmuv3_cmdq_cfgi_ste_range(sid_range.start, sid_range.end);
+ if (!smmuv3_accel_install_ste_range(s, &sid_range, errp)) {
+ cmd_error = SMMU_CERROR_ILL;
+ break;
+ }
smmu_configs_inv_sid_range(bs, sid_range);
break;
}
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 2aaa0c40c7..8135c0c734 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -69,6 +69,8 @@ smmu_reset_exit(void) ""
#smmuv3-accel.c
smmuv3_accel_set_iommu_device(int devfn, uint32_t devid) "devfn=0x%x (idev devid=0x%x)"
smmuv3_accel_unset_iommu_device(int devfn, uint32_t devid) "devfn=0x%x (idev devid=0x%x)"
+smmuv3_accel_translate_ste(uint32_t vsid, uint32_t hwpt_id, uint64_t ste_1, uint64_t ste_0) "vSID=0x%x hwpt_id=0x%x ste=%"PRIx64":%"PRIx64
+smmuv3_accel_install_ste(uint32_t vsid, const char * type, uint32_t hwpt_id) "vSID=0x%x ste type=%s hwpt_id=0x%x"
# strongarm.c
strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 15/33] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (13 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 14/33] hw/arm/smmuv3-accel: Add nested vSTE install/uninstall support Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 21:03 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly Shameer Kolothum
` (18 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On guest reboot or on GBPA update, attach a nested HWPT based on the
GBPA.ABORT bit, which either aborts all incoming transactions or bypasses
them.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 36 ++++++++++++++++++++++++++++++++++++
hw/arm/smmuv3-accel.h | 9 +++++++++
hw/arm/smmuv3.c | 2 ++
3 files changed, 47 insertions(+)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 2e42d2d484..65b577f49a 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -498,6 +498,42 @@ static const PCIIOMMUOps smmuv3_accel_ops = {
.unset_iommu_device = smmuv3_accel_unset_iommu_device,
};
+/* Based on SMMUv3 GBPA.ABORT configuration, attach a corresponding HWPT */
+bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp)
+{
+ SMMUv3AccelState *accel = s->s_accel;
+ SMMUv3AccelDevice *accel_dev;
+ Error *local_err = NULL;
+ bool all_ok = true;
+ uint32_t hwpt_id;
+
+ if (!accel) {
+ return true;
+ }
+
+ hwpt_id = smmuv3_accel_gbpa_hwpt(s, accel);
+ QLIST_FOREACH(accel_dev, &accel->device_list, next) {
+ if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev, hwpt_id,
+ &local_err)) {
+ error_append_hint(&local_err, "Failed to attach GBPA hwpt %u for "
+ "idev devid %u", hwpt_id, accel_dev->idev->devid);
+ error_report_err(local_err);
+ local_err = NULL;
+ all_ok = false;
+ }
+ }
+ if (!all_ok) {
+ error_setg(errp, "Failed to attach all GBPA based HWPTs properly");
+ }
+ return all_ok;
+}
+
+void smmuv3_accel_reset(SMMUv3State *s)
+{
+ /* Attach a HWPT based on GBPA reset value */
+ smmuv3_accel_attach_gbpa_hwpt(s, NULL);
+}
+
static void smmuv3_accel_as_init(SMMUv3State *s)
{
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index ae896cfa8b..2d2d005658 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -44,6 +44,8 @@ bool smmuv3_accel_install_ste(SMMUv3State *s, SMMUDevice *sdev, int sid,
Error **errp);
bool smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
Error **errp);
+bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp);
+void smmuv3_accel_reset(SMMUv3State *s);
#else
static inline void smmuv3_accel_init(SMMUv3State *s)
{
@@ -60,6 +62,13 @@ smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
{
return true;
}
+static inline bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp)
+{
+ return true;
+}
+static inline void smmuv3_accel_reset(SMMUv3State *s)
+{
+}
#endif
#endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index bfb41b8866..42c60b1ec8 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1600,6 +1600,7 @@ static MemTxResult smmu_writel(SMMUv3State *s, hwaddr offset,
if (data & R_GBPA_UPDATE_MASK) {
/* Ignore update bit as write is synchronous. */
s->gbpa = data & ~R_GBPA_UPDATE_MASK;
+ smmuv3_accel_attach_gbpa_hwpt(s, &local_err);
}
break;
case A_STRTAB_BASE: /* 64b */
@@ -1887,6 +1888,7 @@ static void smmu_reset_exit(Object *obj, ResetType type)
}
smmuv3_init_regs(s);
+ smmuv3_accel_reset(s);
}
static void smmu_realize(DeviceState *d, Error **errp)
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (14 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 15/33] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 21:05 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA Shameer Kolothum
` (17 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Michael S . Tsirkin
For certain vIOMMU implementations, such as SMMUv3 in accelerated mode,
the translation tables are programmed directly into the physical SMMUv3
in a nested configuration. While QEMU knows where the guest tables live,
safely walking them in software would require trapping and ordering all
guest invalidations on every command queue. Without this, QEMU could race
with guest updates and walk stale or freed page tables.
This constraint is fundamental to the design of HW-accelerated vSMMU when
used with downstream vfio-pci endpoint devices, where QEMU must never walk
guest translation tables and must rely on the physical SMMU for
translation. Future accelerated vSMMU features, such as virtual CMDQ, will
also prevent trapping invalidations, reinforcing this restriction.
For vfio-pci endpoints behind such a vSMMU, the only translation QEMU
needs is for the MSI doorbell used when setting up KVM MSI route tables.
Instead of attempting a software walk, introduce an optional vIOMMU
callback that returns the MSI doorbell GPA directly.
kvm_arch_fixup_msi_route() uses this callback when available and ignores
the guest-provided IOVA in that case.
If the vIOMMU does not implement the callback, we fall back to the
existing IOMMU-based address space translation path.
This ensures correct MSI routing for accelerated SMMUv3 + VFIO passthrough
while avoiding unsafe software walks of guest translation tables.
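As a minimal sketch, a vIOMMU that knows its MSI doorbell up front would wire
the new optional op like this (the names and the doorbell value are
illustrative; the SMMUv3 implementation follows in the next patch):

    static uint64_t my_viommu_get_msi_direct_gpa(PCIBus *bus, void *opaque,
                                                 int devfn)
    {
        /* e.g. the GITS_TRANSLATER page on the arm virt machine */
        return 0x08090040ULL;
    }

    static const PCIIOMMUOps my_viommu_ops = {
        /* ... the vIOMMU's other callbacks ... */
        .get_msi_direct_gpa = my_viommu_get_msi_direct_gpa,
    };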
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/pci/pci.c | 17 +++++++++++++++++
include/hw/pci/pci.h | 17 +++++++++++++++++
target/arm/kvm.c | 18 +++++++++++++++++-
3 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 55647a6928..201583603f 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2979,6 +2979,23 @@ bool pci_device_get_iommu_bus_devfn(PCIDevice *dev, PCIBus **piommu_bus,
return aliased;
}
+bool pci_device_iommu_msi_direct_gpa(PCIDevice *dev, hwaddr *out_doorbell)
+{
+ PCIBus *bus;
+ PCIBus *iommu_bus;
+ int devfn;
+
+ pci_device_get_iommu_bus_devfn(dev, &iommu_bus, &bus, &devfn);
+ if (iommu_bus) {
+ if (iommu_bus->iommu_ops->get_msi_direct_gpa) {
+ *out_doorbell = iommu_bus->iommu_ops->get_msi_direct_gpa(bus,
+ iommu_bus->iommu_opaque, devfn);
+ return true;
+ }
+ }
+ return false;
+}
+
AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
{
PCIBus *bus;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index dd1c4483a2..0964049044 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -664,6 +664,22 @@ typedef struct PCIIOMMUOps {
uint32_t pasid, bool priv_req, bool exec_req,
hwaddr addr, bool lpig, uint16_t prgi, bool is_read,
bool is_write);
+ /**
+ * @get_msi_direct_gpa: get the guest physical address of MSI doorbell
+ * for the device on a PCI bus.
+ *
+ * Optional callback. If implemented, it must return a valid guest
+ * physical address for the MSI doorbell.
+ *
+ * @bus: the #PCIBus being accessed.
+ *
+ * @opaque: the data passed to pci_setup_iommu().
+ *
+ * @devfn: device and function number
+ *
+ * Returns: the guest physical address of the MSI doorbell.
+ */
+ uint64_t (*get_msi_direct_gpa)(PCIBus *bus, void *opaque, int devfn);
} PCIIOMMUOps;
bool pci_device_get_iommu_bus_devfn(PCIDevice *dev, PCIBus **piommu_bus,
@@ -672,6 +688,7 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
Error **errp);
void pci_device_unset_iommu_device(PCIDevice *dev);
+bool pci_device_iommu_msi_direct_gpa(PCIDevice *dev, hwaddr *out_doorbell);
/**
* pci_device_get_viommu_flags: get vIOMMU flags.
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 0d57081e69..2372de6a6e 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1620,26 +1620,42 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
return 0;
}
+ /*
+ * We do have an IOMMU address space, but for some vIOMMU implementations
+ * (e.g. accelerated SMMUv3) the translation tables are programmed into
+ * the physical SMMUv3 in the host (nested S1=guest, S2=host). QEMU cannot
+ * walk these tables in a safe way, so in that case we obtain the MSI
+ * doorbell GPA directly from the vIOMMU backend and ignore the gIOVA
+ * @address.
+ */
+ if (pci_device_iommu_msi_direct_gpa(dev, &doorbell_gpa)) {
+ goto set_doorbell;
+ }
+
/* MSI doorbell address is translated by an IOMMU */
- RCU_READ_LOCK_GUARD();
+ rcu_read_lock();
mr = address_space_translate(as, address, &xlat, &len, true,
MEMTXATTRS_UNSPECIFIED);
if (!mr) {
+ rcu_read_unlock();
return 1;
}
mrs = memory_region_find(mr, xlat, 1);
if (!mrs.mr) {
+ rcu_read_unlock();
return 1;
}
doorbell_gpa = mrs.offset_within_address_space;
memory_region_unref(mrs.mr);
+ rcu_read_unlock();
+set_doorbell:
route->u.msi.address_lo = doorbell_gpa;
route->u.msi.address_hi = doorbell_gpa >> 32;
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (15 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 21:21 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 18/33] hw/arm/smmuv3-accel: Add support to issue invalidation cmd to host Shameer Kolothum
` (16 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Accelerated SMMUv3 instances rely on the physical SMMUv3 for nested
translation (Guest Stage-1, Host Stage-2). In this mode the guest’s
Stage-1 tables are programmed directly into hardware, and QEMU should
not attempt to walk them for translation since doing so is not reliably
safe. For vfio-pci endpoints behind such a vSMMU, the only translation
QEMU is responsible for is the MSI doorbell used during KVM MSI setup.
Add a device property to carry the MSI doorbell GPA from the virt
machine, and expose it through a new get_msi_direct_gpa PCIIOMMUOp.
kvm_arch_fixup_msi_route() can then use this GPA directly instead of
attempting a software walk of guest translation tables.
This enables correct MSI routing with accelerated SMMUv3 while avoiding
unsafe accesses to page tables.
For meaningful use of vfio-pci devices with accelerated SMMUv3, both KVM
and a kernel irqchip are required. Enforce this requirement when accel=on
is selected.
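Purely as a sketch, the doorbell GPA handed to the vSMMU resolves to the GIC's
MSI write register; the offsets below match the virt.c hunk, and the helper
name is illustrative:

    static uint64_t msi_doorbell_gpa(bool have_its, uint64_t its_base,
                                     uint64_t v2m_base)
    {
        if (have_its) {
            /* GITS_TRANSLATER sits at +0x40 in the ITS translation page */
            return its_base + 0x10000 + 0x40;
        }
        /* GICv2M: MSI_SETSPI_NS sits at +0x40 in the v2m frame */
        return v2m_base + 0x40;
    }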
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 10 ++++++++++
hw/arm/smmuv3.c | 2 ++
hw/arm/virt.c | 22 ++++++++++++++++++++++
include/hw/arm/smmuv3.h | 1 +
4 files changed, 35 insertions(+)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 65b577f49a..8f7c0cda05 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -392,6 +392,15 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
}
}
+static uint64_t smmuv3_accel_get_msi_gpa(PCIBus *bus, void *opaque, int devfn)
+{
+ SMMUState *bs = opaque;
+ SMMUv3State *s = ARM_SMMUV3(bs);
+
+ g_assert(s->msi_gpa);
+ return s->msi_gpa;
+}
+
/*
* Only allow PCIe bridges, pxb-pcie roots, and GPEX roots so vfio-pci
* endpoints can sit downstream. Accelerated SMMUv3 requires a vfio-pci
@@ -496,6 +505,7 @@ static const PCIIOMMUOps smmuv3_accel_ops = {
.get_viommu_flags = smmuv3_accel_get_viommu_flags,
.set_iommu_device = smmuv3_accel_set_iommu_device,
.unset_iommu_device = smmuv3_accel_unset_iommu_device,
+ .get_msi_direct_gpa = smmuv3_accel_get_msi_gpa,
};
/* Based on SMUUv3 GPBA.ABORT configuration, attach a corresponding HWPT */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 42c60b1ec8..f02e3ee46c 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1998,6 +1998,8 @@ static const Property smmuv3_properties[] = {
* Defaults to stage 1
*/
DEFINE_PROP_STRING("stage", SMMUv3State, stage),
+ /* GPA of MSI doorbell, for SMMUv3 accel use. */
+ DEFINE_PROP_UINT64("msi-gpa", SMMUv3State, msi_gpa, 0),
};
static void smmuv3_instance_init(Object *obj)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 25fb2bab56..ea3231543a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3052,6 +3052,14 @@ static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
/* The new SMMUv3 device is specific to the PCI bus */
object_property_set_bool(OBJECT(dev), "smmu_per_bus", true, NULL);
}
+ if (object_property_find(OBJECT(dev), "accel") &&
+ object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
+ if (!kvm_enabled() || !kvm_irqchip_in_kernel()) {
+ error_setg(errp, "SMMUv3 accel=on requires KVM with "
+ "kernel-irqchip=on support");
+ return;
+ }
+ }
}
}
@@ -3088,6 +3096,20 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
}
create_smmuv3_dev_dtb(vms, dev, bus);
+ if (object_property_find(OBJECT(dev), "accel") &&
+ object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
+ hwaddr db_start;
+
+ if (vms->msi_controller == VIRT_MSI_CTRL_ITS) {
+ /* GITS_TRANSLATER page + offset */
+ db_start = base_memmap[VIRT_GIC_ITS].base + 0x10000 + 0x40;
+ } else {
+ /* MSI_SETSPI_NS page + offset */
+ db_start = base_memmap[VIRT_GIC_V2M].base + 0x40;
+ }
+ object_property_set_uint(OBJECT(dev), "msi-gpa", db_start,
+ &error_abort);
+ }
}
}
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index e54ece2d38..5616a8a2be 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -67,6 +67,7 @@ struct SMMUv3State {
/* SMMU has HW accelerator support for nested S1 + s2 */
bool accel;
struct SMMUv3AccelState *s_accel;
+ uint64_t msi_gpa;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 18/33] hw/arm/smmuv3-accel: Add support to issue invalidation cmd to host
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (16 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 19/33] hw/arm/smmuv3: Initialize ID registers early during realize() Shameer Kolothum
` (15 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Provide a helper and use it to issue invalidation commands to the host SMMUv3.
We only issue one command at a time for now.
Support for batching of commands will be added later, after analysing the
impact.
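As a purely illustrative sketch (not part of this patch), batching would reuse
the same backend call with entry_num greater than one, after collecting several
commands from the guest queue into a local array:

    Cmd cmds[16];
    uint32_t entry_num = n;   /* n = number of commands collected (placeholder) */

    if (!iommufd_backend_invalidate_cache(accel->viommu.iommufd,
                                          accel->viommu.viommu_id,
                                          IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
                                          sizeof(Cmd), &entry_num, cmds, errp)) {
        return false;
    }
    /* entry_num is updated to the number of entries the host processed */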
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 36 ++++++++++++++++++++++++++++++++++++
hw/arm/smmuv3-accel.h | 8 ++++++++
hw/arm/smmuv3.c | 16 ++++++++++++++++
3 files changed, 60 insertions(+)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 8f7c0cda05..a7291e75f1 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -233,6 +233,42 @@ bool smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
return all_ok;
}
+/*
+ * This issues the invalidation cmd to the host SMMUv3.
+ *
+ * sdev is non-NULL for SID based invalidations (e.g. CFGI_CD), and NULL for
+ * non SID invalidations such as SMMU_CMD_TLBI_NH_ASID and SMMU_CMD_TLBI_NH_VA.
+ */
+bool smmuv3_accel_issue_inv_cmd(SMMUv3State *bs, void *cmd, SMMUDevice *sdev,
+ Error **errp)
+{
+ SMMUv3State *s = ARM_SMMUV3(bs);
+ SMMUv3AccelState *accel = s->s_accel;
+ uint32_t entry_num = 1;
+
+ /*
+ * No SMMUv3AccelState means no VFIO/IOMMUFD devices, nothing to
+ * invalidate.
+ */
+ if (!accel) {
+ return true;
+ }
+
+ /*
+ * SID based invalidations (e.g. CFGI_CD) apply only to vfio-pci endpoints
+ * with a valid vIOMMU vdev.
+ */
+ if (sdev && !container_of(sdev, SMMUv3AccelDevice, sdev)->vdev) {
+ return true;
+ }
+
+ /* Single command (entry_num = 1); no need to check returned entry_num */
+ return iommufd_backend_invalidate_cache(
+ accel->viommu.iommufd, accel->viommu.viommu_id,
+ IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+ sizeof(Cmd), &entry_num, cmd, errp);
+}
+
static bool
smmuv3_accel_alloc_viommu(SMMUv3State *s, HostIOMMUDeviceIOMMUFD *idev,
Error **errp)
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 2d2d005658..7186817264 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -45,6 +45,8 @@ bool smmuv3_accel_install_ste(SMMUv3State *s, SMMUDevice *sdev, int sid,
bool smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
Error **errp);
bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp);
+bool smmuv3_accel_issue_inv_cmd(SMMUv3State *s, void *cmd, SMMUDevice *sdev,
+ Error **errp);
void smmuv3_accel_reset(SMMUv3State *s);
#else
static inline void smmuv3_accel_init(SMMUv3State *s)
@@ -66,6 +68,12 @@ static inline bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp)
{
return true;
}
+static inline bool
+smmuv3_accel_issue_inv_cmd(SMMUv3State *s, void *cmd, SMMUDevice *sdev,
+ Error **errp)
+{
+ return true;
+}
static inline void smmuv3_accel_reset(SMMUv3State *s)
{
}
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index f02e3ee46c..513da966a4 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1388,6 +1388,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
trace_smmuv3_cmdq_cfgi_cd(sid);
smmuv3_flush_config(sdev);
+ if (!smmuv3_accel_issue_inv_cmd(s, &cmd, sdev, errp)) {
+ cmd_error = SMMU_CERROR_ILL;
+ break;
+ }
break;
}
case SMMU_CMD_TLBI_NH_ASID:
@@ -1411,6 +1415,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
trace_smmuv3_cmdq_tlbi_nh_asid(asid);
smmu_inv_notifiers_all(&s->smmu_state);
smmu_iotlb_inv_asid_vmid(bs, asid, vmid);
+ if (!smmuv3_accel_issue_inv_cmd(s, &cmd, NULL, errp)) {
+ cmd_error = SMMU_CERROR_ILL;
+ break;
+ }
break;
}
case SMMU_CMD_TLBI_NH_ALL:
@@ -1438,6 +1446,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
trace_smmuv3_cmdq_tlbi_nsnh();
smmu_inv_notifiers_all(&s->smmu_state);
smmu_iotlb_inv_all(bs);
+ if (!smmuv3_accel_issue_inv_cmd(s, &cmd, NULL, errp)) {
+ cmd_error = SMMU_CERROR_ILL;
+ break;
+ }
break;
case SMMU_CMD_TLBI_NH_VAA:
case SMMU_CMD_TLBI_NH_VA:
@@ -1446,6 +1458,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
break;
}
smmuv3_range_inval(bs, &cmd, SMMU_STAGE_1);
+ if (!smmuv3_accel_issue_inv_cmd(s, &cmd, NULL, errp)) {
+ cmd_error = SMMU_CERROR_ILL;
+ break;
+ }
break;
case SMMU_CMD_TLBI_S12_VMALL:
{
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 19/33] hw/arm/smmuv3: Initialize ID registers early during realize()
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (17 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 18/33] hw/arm/smmuv3-accel: Add support to issue invalidation cmd to host Shameer Kolothum
@ 2025-11-20 13:21 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate Shameer Kolothum
` (14 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:21 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Factor out ID register init into smmuv3_init_id_regs() and call it from
realize(). This ensures the ID registers are initialized early for use in the
accelerated SMMUv3 path and will be utilized in a subsequent patch.
Other registers remain initialized in smmuv3_reset().
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 513da966a4..dba5abc8d3 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -258,7 +258,12 @@ void smmuv3_record_event(SMMUv3State *s, SMMUEventInfo *info)
info->recorded = true;
}
-static void smmuv3_init_regs(SMMUv3State *s)
+/*
+ * Called during realize(), as the ID registers will be accessed early in the
+ * SMMUv3 accel path for feature compatibility checks. The remaining registers
+ * are initialized later in smmuv3_reset().
+ */
+static void smmuv3_init_id_regs(SMMUv3State *s)
{
/* Based on sys property, the stages supported in smmu will be advertised.*/
if (s->stage && !strcmp("2", s->stage)) {
@@ -298,7 +303,11 @@ static void smmuv3_init_regs(SMMUv3State *s)
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, 1);
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, 1);
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, 1);
+ s->aidr = 0x1;
+}
+static void smmuv3_reset(SMMUv3State *s)
+{
s->cmdq.base = deposit64(s->cmdq.base, 0, 5, SMMU_CMDQS);
s->cmdq.prod = 0;
s->cmdq.cons = 0;
@@ -310,7 +319,6 @@ static void smmuv3_init_regs(SMMUv3State *s)
s->features = 0;
s->sid_split = 0;
- s->aidr = 0x1;
s->cr[0] = 0;
s->cr0ack = 0;
s->irq_ctrl = 0;
@@ -1903,7 +1911,7 @@ static void smmu_reset_exit(Object *obj, ResetType type)
c->parent_phases.exit(obj, type);
}
- smmuv3_init_regs(s);
+ smmuv3_reset(s);
smmuv3_accel_reset(s);
}
@@ -1935,6 +1943,7 @@ static void smmu_realize(DeviceState *d, Error **errp)
sysbus_init_mmio(dev, &sys->iomem);
smmu_init_irq(s, dev);
+ smmuv3_init_id_regs(s);
}
static const VMStateDescription vmstate_smmuv3_queue = {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (18 preceding siblings ...)
2025-11-20 13:21 ` [PATCH v6 19/33] hw/arm/smmuv3: Initialize ID registers early during realize() Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 21:27 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 21/33] hw/pci-host/gpex: Allow to generate preserve boot config DSM #5 Shameer Kolothum
` (13 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Just before the device gets attached to the SMMUv3, make sure the QEMU SMMUv3
features are compatible with the host SMMUv3.
Not all fields in the host SMMUv3 IDR registers are meaningful for userspace.
Only the following fields can be used:
- IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16, TTF
- IDR1: SIDSIZE, SSIDSIZE
- IDR3: BBML, RIL
- IDR5: VAX, GRAN64K, GRAN16K, GRAN4K
For now, the check is to make sure the features are in sync to enable
basic accelerated SMMUv3 support. AIDR is not checked, as hardware
implementations often provide a mix of architecture features regardless
of the revision reported in AIDR.
Note that the SSIDSIZE check will be added later, when support for PASID is
introduced.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 95 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index a7291e75f1..aae7840c40 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -27,6 +27,93 @@
static MemoryRegion root, sysmem;
static AddressSpace *shared_as_sysmem;
+static bool
+smmuv3_accel_check_hw_compatible(SMMUv3State *s,
+ struct iommu_hw_info_arm_smmuv3 *info,
+ Error **errp)
+{
+ /* QEMU SMMUv3 supports both linear and 2-level stream tables */
+ if (FIELD_EX32(info->idr[0], IDR0, STLEVEL) !=
+ FIELD_EX32(s->idr[0], IDR0, STLEVEL)) {
+ error_setg(errp, "Host SMMUv3 differs in Stream Table format");
+ return false;
+ }
+
+ /* QEMU SMMUv3 supports only little-endian translation table walks */
+ if (FIELD_EX32(info->idr[0], IDR0, TTENDIAN) >
+ FIELD_EX32(s->idr[0], IDR0, TTENDIAN)) {
+ error_setg(errp, "Host SMMUv3 doesn't support Little-endian "
+ "translation table");
+ return false;
+ }
+
+ /* QEMU SMMUv3 supports only AArch64 translation table format */
+ if (FIELD_EX32(info->idr[0], IDR0, TTF) <
+ FIELD_EX32(s->idr[0], IDR0, TTF)) {
+ error_setg(errp, "Host SMMUv3 doesn't support AArch64 translation "
+ "table format");
+ return false;
+ }
+
+ /* QEMU SMMUv3 supports SIDSIZE 16 */
+ if (FIELD_EX32(info->idr[1], IDR1, SIDSIZE) <
+ FIELD_EX32(s->idr[1], IDR1, SIDSIZE)) {
+ error_setg(errp, "Host SMMUv3 SIDSIZE not compatible");
+ return false;
+ }
+
+ /* QEMU SMMUv3 supports Range Invalidation by default */
+ if (FIELD_EX32(info->idr[3], IDR3, RIL) !=
+ FIELD_EX32(s->idr[3], IDR3, RIL)) {
+ error_setg(errp, "Host SMMUv3 doesn't support Range Invalidation");
+ return false;
+ }
+
+ /* QEMU SMMUv3 supports GRAN4K/GRAN16K/GRAN64K translation granules */
+ if (FIELD_EX32(info->idr[5], IDR5, GRAN4K) !=
+ FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
+ error_setg(errp, "Host SMMUv3 doesn't support 4K translation granule");
+ return false;
+ }
+ if (FIELD_EX32(info->idr[5], IDR5, GRAN16K) !=
+ FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
+ error_setg(errp, "Host SMMUv3 doesn't support 16K translation granule");
+ return false;
+ }
+ if (FIELD_EX32(info->idr[5], IDR5, GRAN64K) !=
+ FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
+ error_setg(errp, "Host SMMUv3 doesn't support 64K translation granule");
+ return false;
+ }
+
+ return true;
+}
+
+static bool
+smmuv3_accel_hw_compatible(SMMUv3State *s, HostIOMMUDeviceIOMMUFD *idev,
+ Error **errp)
+{
+ struct iommu_hw_info_arm_smmuv3 info;
+ uint32_t data_type;
+ uint64_t caps;
+
+ if (!iommufd_backend_get_device_info(idev->iommufd, idev->devid, &data_type,
+ &info, sizeof(info), &caps, errp)) {
+ return false;
+ }
+
+ if (data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
+ error_setg(errp, "Wrong data type (%d) for Host SMMUv3 device info",
+ data_type);
+ return false;
+ }
+
+ if (!smmuv3_accel_check_hw_compatible(s, &info, errp)) {
+ return false;
+ }
+ return true;
+}
+
static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
PCIBus *bus, int devfn)
{
@@ -352,6 +439,14 @@ static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
return true;
}
+ /*
+ * Check the host SMMUv3 associated with the dev is compatible with the
+ * QEMU SMMUv3 accel.
+ */
+ if (!smmuv3_accel_hw_compatible(s, idev, errp)) {
+ return false;
+ }
+
if (s->s_accel) {
goto done;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 21/33] hw/pci-host/gpex: Allow to generate preserve boot config DSM #5
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (19 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 22/33] hw/arm/virt: Set PCI preserve_config for accel SMMUv3 Shameer Kolothum
` (12 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Michael S . Tsirkin
From: Eric Auger <eric.auger@redhat.com>
Add a 'preserve_config' field in struct GPEXConfig and, if set, generate
the _DSM function #5 for preserving PCI boot configurations.
This will be used for SMMUv3 accel=on support in a subsequent patch. When
SMMUv3 acceleration (accel=on) is enabled, QEMU exposes IORT Reserved
Memory Region (RMR) nodes to support MSI doorbell translations. As per
the Arm IORT specification, using IORT RMRs mandates the presence of
_DSM function #5 so that the OS retains the firmware-assigned PCI
configuration. Hence, this patch adds conditional support for generating
_DSM #5.
According to the ACPI Specification, Revision 6.6, Section 9.1.1 -
“_DSM (Device Specific Method)”,
"
If Function Index is zero, the return is a buffer containing one bit for
each function index, starting with zero. Bit 0 indicates whether there
is support for any functions other than function 0 for the specified
UUID and Revision ID. If set to zero, no functions are supported (other
than function zero) for the specified UUID and Revision ID. If set to
one, at least one additional function is supported. For all other bits
in the buffer, a bit is set to zero to indicate if that function index
is not supported for the specific UUID and Revision ID. (For example,
bit 1 set to 0 indicates that function index 1 is not supported for the
specific UUID and Revision ID.)
"
Please refer to the PCI Firmware Specification, Revision 3.3, Section 4.6.5 —
"_DSM for Preserving PCI Boot Configurations" for Function 5 of the _DSM
method.
Also, while at it, move the byte_list declaration to the top of the
function for clarity.
At the moment, _DSM #5 generation is not yet enabled, as nothing sets
preserve_config.
The resulting AML when preserve_config=true is:
Method (_DSM, 4, NotSerialized)
{
If ((Arg0 == ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d")))
{
If ((Arg2 == Zero))
{
Return (Buffer (One)
{
0x21
})
}
If ((Arg2 == 0x05))
{
Return (Zero)
}
}
...
}
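For reference, the 0x21 returned for function index 0 is just the arithmetic
behind byte_list[0] in the patch below:

    (1 << 0) | (1 << 5) = 0x01 | 0x20 = 0x21

i.e. bit 0 (at least one function other than 0 is supported) and bit 5
(function 5 is supported) are set.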
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
[Shameer: Removed possible duplicate _DSM creations]
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
Previously, QEMU reverted an attempt to enable DSM #5 because it caused a
regression,
https://lore.kernel.org/all/20210724185234.GA2265457@roeck-us.net/.
However, in this series, we enable it selectively, only when SMMUv3 is in
accelerator mode. The devices involved in the earlier regression are not
expected in accelerated SMMUv3 use cases.
---
hw/pci-host/gpex-acpi.c | 29 +++++++++++++++++++++++------
include/hw/pci-host/gpex.h | 1 +
2 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index 4587baeb78..d9820f9b41 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -51,10 +51,11 @@ static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t irq,
}
}
-static Aml *build_pci_host_bridge_dsm_method(void)
+static Aml *build_pci_host_bridge_dsm_method(bool preserve_config)
{
Aml *method = aml_method("_DSM", 4, AML_NOTSERIALIZED);
Aml *UUID, *ifctx, *ifctx1, *buf;
+ uint8_t byte_list[1] = {0};
/* PCI Firmware Specification 3.0
* 4.6.1. _DSM for PCI Express Slot Information
@@ -64,10 +65,23 @@ static Aml *build_pci_host_bridge_dsm_method(void)
UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
ifctx = aml_if(aml_equal(aml_arg(0), UUID));
ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
- uint8_t byte_list[1] = {0};
+ if (preserve_config) {
+ /* support functions other than 0, specifically function 5 */
+ byte_list[0] = 0x21;
+ }
buf = aml_buffer(1, byte_list);
aml_append(ifctx1, aml_return(buf));
aml_append(ifctx, ifctx1);
+ if (preserve_config) {
+ Aml *ifctx2 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
+ /*
+ * 0 - The operating system must not ignore the PCI configuration that
+ * firmware has done at boot time.
+ */
+ aml_append(ifctx2, aml_return(aml_int(0)));
+ aml_append(ifctx, ifctx2);
+ }
+
aml_append(method, ifctx);
byte_list[0] = 0;
@@ -77,12 +91,13 @@ static Aml *build_pci_host_bridge_dsm_method(void)
}
static void acpi_dsdt_add_host_bridge_methods(Aml *dev,
- bool enable_native_pcie_hotplug)
+ bool enable_native_pcie_hotplug,
+ bool preserve_config)
{
/* Declare an _OSC (OS Control Handoff) method */
aml_append(dev,
build_pci_host_bridge_osc_method(enable_native_pcie_hotplug));
- aml_append(dev, build_pci_host_bridge_dsm_method());
+ aml_append(dev, build_pci_host_bridge_dsm_method(preserve_config));
}
void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
@@ -152,7 +167,8 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
build_cxl_osc_method(dev);
} else {
/* pxb bridges do not have ACPI PCI Hot-plug enabled */
- acpi_dsdt_add_host_bridge_methods(dev, true);
+ acpi_dsdt_add_host_bridge_methods(dev, true,
+ cfg->preserve_config);
}
aml_append(scope, dev);
@@ -227,7 +243,8 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
}
aml_append(dev, aml_name_decl("_CRS", rbuf));
- acpi_dsdt_add_host_bridge_methods(dev, cfg->pci_native_hotplug);
+ acpi_dsdt_add_host_bridge_methods(dev, cfg->pci_native_hotplug,
+ cfg->preserve_config);
Aml *dev_res0 = aml_device("%s", "RES0");
aml_append(dev_res0, aml_name_decl("_HID", aml_string("PNP0C02")));
diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index feaf827474..7eea16e728 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -46,6 +46,7 @@ struct GPEXConfig {
int irq;
PCIBus *bus;
bool pci_native_hotplug;
+ bool preserve_config;
};
typedef struct GPEXIrq GPEXIrq;
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 22/33] hw/arm/virt: Set PCI preserve_config for accel SMMUv3
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (20 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 21/33] hw/pci-host/gpex: Allow to generate preserve boot config DSM #5 Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 23/33] tests/qtest/bios-tables-test: Prepare for IORT revision upgrade Shameer Kolothum
` (11 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Introduce a new pci_preserve_config field in the virt machine state which
allows the generation of _DSM #5. This field is only set if an accel SMMUv3
is instantiated.
In a subsequent patch, SMMUv3 accel mode will make use of IORT RMR nodes
to enable nested translation of MSI doorbell addresses. IORT RMR requires
_DSM #5 to be implemented for the PCI host bridge so that the guest kernel
preserves the PCI boot configuration.
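A rough sketch of the plumbing added here (function names from the diff
below):

    virt_machine_device_plug_cb()        /* accel=on SMMUv3 plugged */
        vms->pci_preserve_config = true;

    acpi_dsdt_add_pci()                  /* DSDT build time */
        cfg.preserve_config = true;      /* consumed by gpex-acpi (_DSM #5) */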
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/virt-acpi-build.c | 8 ++++++++
hw/arm/virt.c | 1 +
include/hw/arm/virt.h | 1 +
3 files changed, 10 insertions(+)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 200e2a1da7..f7264fe8d8 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -164,6 +164,14 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
.pci_native_hotplug = !acpi_pcihp,
};
+ /*
+ * Accel SMMU requires RMRs for MSI 1-1 mapping, which require _DSM
+ * function 5 (_DSM for Preserving PCI Boot Configurations).
+ */
+ if (vms->pci_preserve_config) {
+ cfg.preserve_config = true;
+ }
+
if (vms->highmem_mmio) {
cfg.mmio64 = memmap[VIRT_HIGH_PCIE_MMIO];
}
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea3231543a..8503879c3d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3109,6 +3109,7 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
}
object_property_set_uint(OBJECT(dev), "msi-gpa", db_start,
&error_abort);
+ vms->pci_preserve_config = true;
}
}
}
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index c77a33f6df..efbc1758c5 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -180,6 +180,7 @@ struct VirtMachineState {
bool ns_el2_virt_timer_irq;
CXLState cxl_devices_state;
bool legacy_smmuv3_present;
+ bool pci_preserve_config;
};
#define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 23/33] tests/qtest/bios-tables-test: Prepare for IORT revision upgrade
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (21 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 22/33] hw/arm/virt: Set PCI preserve_config for accel SMMUv3 Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 24/33] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum
` (10 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
A subsequent patch will upgrade the IORT revision to 5 to add support
for IORT RMR nodes.
Add the affected IORT blobs to the allowed-diff list for the bios-tables
tests.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
tests/qtest/bios-tables-test-allowed-diff.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..3279638ad0 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
/* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/aarch64/virt/IORT",
+"tests/data/acpi/aarch64/virt/IORT.its_off",
+"tests/data/acpi/aarch64/virt/IORT.smmuv3-legacy",
+"tests/data/acpi/aarch64/virt/IORT.smmuv3-dev",
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 24/33] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (22 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 23/33] tests/qtest/bios-tables-test: Prepare for IORT revision upgrade Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 25/33] tests/qtest/bios-tables-test: Update IORT blobs after revision upgrade Shameer Kolothum
` (9 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Jean-Philippe Brucker
From: Eric Auger <eric.auger@redhat.com>
To handle SMMUv3 accel=on mode (which configures the host SMMUv3 in nested
mode), it is practical to expose to the guest reserved memory regions
(RMRs) covering the IOVAs used by the host kernel to map physical MSI
doorbells.
Those IOVAs belong to [0x8000000, 0x8100000], matching the MSI_IOVA_BASE and
MSI_IOVA_LENGTH definitions in the kernel arm-smmu-v3 driver. This is the
window used to allocate IOVAs matching physical MSI doorbells.
With those RMRs, the guest is forced to use a flat mapping for this range.
Hence the assigned device is programmed with one IOVA from this range.
Stage 1, owned by the guest, has a flat mapping for this IOVA. Stage 2,
owned by the VMM, then enforces a mapping from this IOVA to the physical
MSI doorbell.
The creation of those RMR nodes is only relevant if nested stage SMMU is
in use, along with VFIO. As VFIO devices can be hotplugged, all RMRs need
to be created in advance.
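Schematically (a sketch of the composed translation, not code from this
patch):

    guest programs the device's MSI address with an IOVA in [0x8000000, 0x8100000)
    Stage 1 (guest, flat mapping forced by the RMR):  IOVA -> IPA == IOVA
    Stage 2 (host kernel / VMM):                      IPA  -> physical MSI doorbell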
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/virt-acpi-build.c | 111 ++++++++++++++++++++++++++++++++++++---
1 file changed, 103 insertions(+), 8 deletions(-)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f7264fe8d8..7a7b2e62c1 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -257,6 +257,29 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms)
#define ROOT_COMPLEX_ENTRY_SIZE 36
#define IORT_NODE_OFFSET 48
+#define IORT_RMR_NUM_ID_MAPPINGS 1
+#define IORT_RMR_NUM_MEM_RANGE_DESC 1
+#define IORT_RMR_COMMON_HEADER_SIZE 28
+#define IORT_RMR_MEM_RANGE_DESC_SIZE 20
+
+/*
+ * IORT RMR flags:
+ * Bit[0] = 0 Disallow remapping of reserved ranges
+ * Bit[1] = 0 Unprivileged access
+ * Bits[9:2] = 0x00 Device nGnRnE memory
+ */
+#define IORT_RMR_FLAGS 0
+
+/*
+ * MSI doorbell IOVA window used by the host kernel SMMUv3 driver.
+ * Described in IORT RMR nodes to reserve the IOVA range where the host
+ * kernel maps physical MSI doorbells for devices. This ensures guests
+ * preserve a flat mapping for the MSI doorbell in nested SMMUv3 (accel=on)
+ * configurations.
+ */
+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
/*
* Append an ID mapping entry as described by "Table 4 ID mapping format" in
* "IO Remapping Table System Software on ARM Platforms", Chapter 3.
@@ -265,7 +288,8 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms)
* Note that @id_count gets internally subtracted by one, following the spec.
*/
static void build_iort_id_mapping(GArray *table_data, uint32_t input_base,
- uint32_t id_count, uint32_t out_ref)
+ uint32_t id_count, uint32_t out_ref,
+ uint32_t flags)
{
build_append_int_noprefix(table_data, input_base, 4); /* Input base */
/* Number of IDs - The number of IDs in the range minus one */
@@ -273,7 +297,7 @@ static void build_iort_id_mapping(GArray *table_data, uint32_t input_base,
build_append_int_noprefix(table_data, input_base, 4); /* Output base */
build_append_int_noprefix(table_data, out_ref, 4); /* Output Reference */
/* Flags */
- build_append_int_noprefix(table_data, 0 /* Single mapping (disabled) */, 4);
+ build_append_int_noprefix(table_data, flags, 4);
}
struct AcpiIortIdMapping {
@@ -321,6 +345,7 @@ typedef struct AcpiIortSMMUv3Dev {
GArray *rc_smmu_idmaps;
/* Offset of the SMMUv3 IORT Node relative to the start of the IORT */
size_t offset;
+ bool accel;
} AcpiIortSMMUv3Dev;
/*
@@ -375,6 +400,9 @@ static int iort_smmuv3_devices(Object *obj, void *opaque)
}
bus = PCI_BUS(object_property_get_link(obj, "primary-bus", &error_abort));
+ if (object_property_find(obj, "accel")) {
+ sdev.accel = object_property_get_bool(obj, "accel", &error_abort);
+ }
pbus = PLATFORM_BUS_DEVICE(vms->platform_bus_dev);
sbdev = SYS_BUS_DEVICE(obj);
sdev.base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
@@ -448,10 +476,69 @@ static void create_rc_its_idmaps(GArray *its_idmaps, GArray *smmuv3_devs)
}
}
+static void
+build_iort_rmr_nodes(GArray *table_data, GArray *smmuv3_devices, uint32_t *id)
+{
+ AcpiIortSMMUv3Dev *sdev;
+ AcpiIortIdMapping *idmap;
+ int i;
+
+ for (i = 0; i < smmuv3_devices->len; i++) {
+ uint16_t rmr_len;
+ int bdf;
+
+ sdev = &g_array_index(smmuv3_devices, AcpiIortSMMUv3Dev, i);
+ if (!sdev->accel) {
+ continue;
+ }
+
+ /*
+ * Spec reference: Arm IO Remapping Table (IORT), ARM DEN 0049E.d,
+ * Section 3.1.1.5 "Reserved Memory Range node"
+ */
+ idmap = &g_array_index(sdev->rc_smmu_idmaps, AcpiIortIdMapping, 0);
+ bdf = idmap->input_base;
+ rmr_len = IORT_RMR_COMMON_HEADER_SIZE
+ + (IORT_RMR_NUM_ID_MAPPINGS * ID_MAPPING_ENTRY_SIZE)
+ + (IORT_RMR_NUM_MEM_RANGE_DESC * IORT_RMR_MEM_RANGE_DESC_SIZE);
+
+ /* Table 18 Reserved Memory Range Node */
+ build_append_int_noprefix(table_data, 6 /* RMR */, 1); /* Type */
+ /* Length */
+ build_append_int_noprefix(table_data, rmr_len, 2);
+ build_append_int_noprefix(table_data, 3, 1); /* Revision */
+ build_append_int_noprefix(table_data, (*id)++, 4); /* Identifier */
+ /* Number of ID mappings */
+ build_append_int_noprefix(table_data, IORT_RMR_NUM_ID_MAPPINGS, 4);
+ /* Reference to ID Array */
+ build_append_int_noprefix(table_data, IORT_RMR_COMMON_HEADER_SIZE, 4);
+
+ /* RMR specific data */
+
+ /* Flags */
+ build_append_int_noprefix(table_data, IORT_RMR_FLAGS, 4);
+ /* Number of Memory Range Descriptors */
+ build_append_int_noprefix(table_data, IORT_RMR_NUM_MEM_RANGE_DESC, 4);
+ /* Reference to Memory Range Descriptors */
+ build_append_int_noprefix(table_data, IORT_RMR_COMMON_HEADER_SIZE +
+ (IORT_RMR_NUM_ID_MAPPINGS * ID_MAPPING_ENTRY_SIZE), 4);
+ build_iort_id_mapping(table_data, bdf, idmap->id_count, sdev->offset,
+ 1);
+
+ /* Table 19 Memory Range Descriptor */
+
+ /* Physical Range offset */
+ build_append_int_noprefix(table_data, MSI_IOVA_BASE, 8);
+ /* Physical Range length */
+ build_append_int_noprefix(table_data, MSI_IOVA_LENGTH, 8);
+ build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+ }
+}
+
/*
* Input Output Remapping Table (IORT)
* Conforms to "IO Remapping Table System Software on ARM Platforms",
- * Document number: ARM DEN 0049E.b, Feb 2021
+ * Document number: ARM DEN 0049E.d, Feb 2022
*/
static void
build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -465,7 +552,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
GArray *smmuv3_devs = g_array_new(false, true, sizeof(AcpiIortSMMUv3Dev));
GArray *rc_its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
- AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id,
+ AcpiTable table = { .sig = "IORT", .rev = 5, .oem_id = vms->oem_id,
.oem_table_id = vms->oem_table_id };
/* Table 2 The IORT */
acpi_table_begin(&table, table_data);
@@ -491,6 +578,13 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
nb_nodes++; /* ITS */
rc_mapping_count += rc_its_idmaps->len;
}
+ /* Calculate RMR nodes required. One per SMMUv3 with accelerated mode */
+ for (i = 0; i < num_smmus; i++) {
+ sdev = &g_array_index(smmuv3_devs, AcpiIortSMMUv3Dev, i);
+ if (sdev->accel) {
+ nb_nodes++;
+ }
+ }
} else {
if (vms->its) {
nb_nodes = 2; /* RC and ITS */
@@ -563,7 +657,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
/* Array of ID mappings */
if (smmu_mapping_count) {
/* Output IORT node is the ITS Group node (the first node). */
- build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET);
+ build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET, 0);
}
}
@@ -615,7 +709,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
AcpiIortIdMapping, j);
/* Output IORT node is the SMMUv3 node. */
build_iort_id_mapping(table_data, range->input_base,
- range->id_count, sdev->offset);
+ range->id_count, sdev->offset, 0);
}
}
@@ -628,7 +722,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
range = &g_array_index(rc_its_idmaps, AcpiIortIdMapping, i);
/* Output IORT node is the ITS Group node (the first node). */
build_iort_id_mapping(table_data, range->input_base,
- range->id_count, IORT_NODE_OFFSET);
+ range->id_count, IORT_NODE_OFFSET, 0);
}
}
} else {
@@ -637,9 +731,10 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
* SMMU: RC -> ITS.
* Output IORT node is the ITS Group node (the first node).
*/
- build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET);
+ build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET, 0);
}
+ build_iort_rmr_nodes(table_data, smmuv3_devs, &id);
acpi_table_end(linker, &table);
g_array_free(rc_its_idmaps, true);
for (i = 0; i < num_smmus; i++) {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 25/33] tests/qtest/bios-tables-test: Update IORT blobs after revision upgrade
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (23 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 24/33] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 26/33] hw/arm/smmuv3: Add accel property for SMMUv3 device Shameer Kolothum
` (8 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Update the reference IORT blobs after revision upgrade for RMR node
support. This affects the aarch64 'virt' IORT tests.
IORT diff is the same for all the tests:
/*
* Intel ACPI Component Architecture
* AML/ASL+ Disassembler version 20230628 (64-bit version)
* Copyright (c) 2000 - 2023 Intel Corporation
*
- * Disassembly of tests/data/acpi/aarch64/virt/IORT, Mon Oct 20 14:42:41 2025
+ * Disassembly of /tmp/aml-B4ZRE3, Mon Oct 20 14:42:41 2025
*
* ACPI Data Table [IORT]
*
* Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue (in hex)
*/
[000h 0000 004h] Signature : "IORT" [IO Remapping Table]
[004h 0004 004h] Table Length : 00000080
-[008h 0008 001h] Revision : 03
-[009h 0009 001h] Checksum : B3
+[008h 0008 001h] Revision : 05
+[009h 0009 001h] Checksum : B1
[00Ah 0010 006h] Oem ID : "BOCHS "
[010h 0016 008h] Oem Table ID : "BXPC "
[018h 0024 004h] Oem Revision : 00000001
[01Ch 0028 004h] Asl Compiler ID : "BXPC"
[020h 0032 004h] Asl Compiler Revision : 00000001
...
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
tests/data/acpi/aarch64/virt/IORT | Bin 128 -> 128 bytes
tests/data/acpi/aarch64/virt/IORT.its_off | Bin 172 -> 172 bytes
tests/data/acpi/aarch64/virt/IORT.smmuv3-dev | Bin 364 -> 364 bytes
tests/data/acpi/aarch64/virt/IORT.smmuv3-legacy | Bin 276 -> 276 bytes
tests/qtest/bios-tables-test-allowed-diff.h | 4 ----
5 files changed, 4 deletions(-)
diff --git a/tests/data/acpi/aarch64/virt/IORT b/tests/data/acpi/aarch64/virt/IORT
index 7efd0ce8a6b3928efa7e1373f688ab4c5f50543b..a234aae4c2d04668d34313836d32ca20e19c0880 100644
GIT binary patch
delta 18
ZcmZo*Y+&T_^bZPYU|?Wi-8hk}3;-#Q1d;#%
delta 18
ZcmZo*Y+&T_^bZPYU|?Wi-aL`33;-#O1d;#%
diff --git a/tests/data/acpi/aarch64/virt/IORT.its_off b/tests/data/acpi/aarch64/virt/IORT.its_off
index c10da4e61dd00e7eb062558a2735d49ca0b20620..0cf52b52f671637bf4dbc9e0fc80c3c73d0b01d3 100644
GIT binary patch
delta 18
ZcmZ3(xQ3C-(?2L=4FdxM>(q%{ivTdM1ttIh
delta 18
ZcmZ3(xQ3C-(?2L=4FdxM^Yn>aivTdK1ttIh
diff --git a/tests/data/acpi/aarch64/virt/IORT.smmuv3-dev b/tests/data/acpi/aarch64/virt/IORT.smmuv3-dev
index 67be268f62afbf2d9459540984da5e9340afdaaa..43a15fe2bf6cc650ffcbceff86919ea892928c0e 100644
GIT binary patch
delta 19
acmaFE^oEJc(?2LAhmnDS^~6T5Bt`%|fCYU3
delta 19
acmaFE^oEJc(?2LAhmnDS`P4?PBt`%|eg%C1
diff --git a/tests/data/acpi/aarch64/virt/IORT.smmuv3-legacy b/tests/data/acpi/aarch64/virt/IORT.smmuv3-legacy
index 41981a449fc306b80cccd87ddec3c593a8d72c07..5779d0e225a62b9cd70bebbacb7fd1e519c9e3c4 100644
GIT binary patch
delta 19
acmbQjG=+)F(?2Lggpq-P)oUXc7b5^FiUXej
delta 19
acmbQjG=+)F(?2Lggpq-P*=Hjc7b5^Fhy$Mh
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 3279638ad0..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,5 +1 @@
/* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/aarch64/virt/IORT",
-"tests/data/acpi/aarch64/virt/IORT.its_off",
-"tests/data/acpi/aarch64/virt/IORT.smmuv3-legacy",
-"tests/data/acpi/aarch64/virt/IORT.smmuv3-dev",
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 26/33] hw/arm/smmuv3: Add accel property for SMMUv3 device
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (24 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 25/33] tests/qtest/bios-tables-test: Update IORT blobs after revision upgrade Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support Shameer Kolothum
` (7 subsequent siblings)
33 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Introduce an “accel” property to enable accelerator mode.
Live migration is currently unsupported when accelerator mode is enabled,
so a migration blocker is added.
Because this mode relies on IORT RMR for MSI support, accelerator mode is
not supported for device tree boot.
Also, in the accelerated SMMUv3 case, the host SMMUv3 is configured in nested
mode (S1 + S2), and the guest owns the Stage-1 page table. Therefore, we
expose only Stage-1 to the guest to ensure it uses the correct page table
format.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3.c | 26 ++++++++++++++++++++++++++
hw/arm/virt-acpi-build.c | 4 +---
hw/arm/virt.c | 15 +++++++++++----
include/hw/arm/smmuv3.h | 1 +
4 files changed, 39 insertions(+), 7 deletions(-)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index dba5abc8d3..8352dd5757 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -20,6 +20,7 @@
#include "qemu/bitops.h"
#include "hw/irq.h"
#include "hw/sysbus.h"
+#include "migration/blocker.h"
#include "migration/vmstate.h"
#include "hw/qdev-properties.h"
#include "hw/qdev-core.h"
@@ -1915,6 +1916,17 @@ static void smmu_reset_exit(Object *obj, ResetType type)
smmuv3_accel_reset(s);
}
+static bool smmu_validate_property(SMMUv3State *s, Error **errp)
+{
+#ifndef CONFIG_ARM_SMMUV3_ACCEL
+ if (s->accel) {
+ error_setg(errp, "accel=on support not compiled in");
+ return false;
+ }
+#endif
+ return true;
+}
+
static void smmu_realize(DeviceState *d, Error **errp)
{
SMMUState *sys = ARM_SMMU(d);
@@ -1923,8 +1935,17 @@ static void smmu_realize(DeviceState *d, Error **errp)
SysBusDevice *dev = SYS_BUS_DEVICE(d);
Error *local_err = NULL;
+ if (!smmu_validate_property(s, errp)) {
+ return;
+ }
+
if (s->accel) {
smmuv3_accel_init(s);
+ error_setg(&s->migration_blocker, "Migration not supported with SMMUv3 "
+ "accelerator mode enabled");
+ if (migrate_add_blocker(&s->migration_blocker, errp) < 0) {
+ return;
+ }
}
c->parent_realize(d, &local_err);
@@ -2023,6 +2044,7 @@ static const Property smmuv3_properties[] = {
* Defaults to stage 1
*/
DEFINE_PROP_STRING("stage", SMMUv3State, stage),
+ DEFINE_PROP_BOOL("accel", SMMUv3State, accel, false),
/* GPA of MSI doorbell, for SMMUv3 accel use. */
DEFINE_PROP_UINT64("msi-gpa", SMMUv3State, msi_gpa, 0),
};
@@ -2046,6 +2068,10 @@ static void smmuv3_class_init(ObjectClass *klass, const void *data)
device_class_set_props(dc, smmuv3_properties);
dc->hotpluggable = false;
dc->user_creatable = true;
+
+ object_class_property_set_description(klass, "accel",
+ "Enable SMMUv3 accelerator support. Allows host SMMUv3 to be "
+ "configured in nested mode for vfio-pci dev assignment");
}
static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 7a7b2e62c1..fd78c39317 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -400,9 +400,7 @@ static int iort_smmuv3_devices(Object *obj, void *opaque)
}
bus = PCI_BUS(object_property_get_link(obj, "primary-bus", &error_abort));
- if (object_property_find(obj, "accel")) {
- sdev.accel = object_property_get_bool(obj, "accel", &error_abort);
- }
+ sdev.accel = object_property_get_bool(obj, "accel", &error_abort);
pbus = PLATFORM_BUS_DEVICE(vms->platform_bus_dev);
sbdev = SYS_BUS_DEVICE(obj);
sdev.base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8503879c3d..51b15aef37 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3052,13 +3052,21 @@ static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
/* The new SMMUv3 device is specific to the PCI bus */
object_property_set_bool(OBJECT(dev), "smmu_per_bus", true, NULL);
}
- if (object_property_find(OBJECT(dev), "accel") &&
- object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
+ if (object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
+ char *stage;
+
if (!kvm_enabled() || !kvm_irqchip_in_kernel()) {
error_setg(errp, "SMMUv3 accel=on requires KVM with "
"kernel-irqchip=on support");
return;
}
+ stage = object_property_get_str(OBJECT(dev), "stage", &error_fatal);
+ /* If no stage specified, SMMUv3 will default to stage 1 */
+ if (*stage && strcmp("1", stage)) {
+ error_setg(errp, "Only stage1 is supported for SMMUV3 with "
+ "accel=on");
+ return;
+ }
}
}
}
@@ -3096,8 +3104,7 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
}
create_smmuv3_dev_dtb(vms, dev, bus);
- if (object_property_find(OBJECT(dev), "accel") &&
- object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
+ if (object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
hwaddr db_start;
if (vms->msi_controller == VIRT_MSI_CTRL_ITS) {
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index 5616a8a2be..9c39acd5ca 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -68,6 +68,7 @@ struct SMMUv3State {
bool accel;
struct SMMUv3AccelState *s_accel;
uint64_t msi_gpa;
+ Error *migration_blocker;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (25 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 26/33] hw/arm/smmuv3: Add accel property for SMMUv3 device Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 21:34 ` Nicolin Chen via
2025-11-20 13:22 ` [PATCH v6 28/33] hw/arm/smmuv3-accel: Add support for ATS Shameer Kolothum
` (6 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Currently, QEMU SMMUv3 advertises RIL (Range Invalidation) support by
default. But when accelerated mode is enabled, the RIL setting has to be
compatible with what the host SMMUv3 supports.
Add a property so that the user can specify this.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 14 ++++++++++++--
hw/arm/smmuv3-accel.h | 4 ++++
hw/arm/smmuv3.c | 12 ++++++++++++
include/hw/arm/smmuv3.h | 1 +
4 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index aae7840c40..b6429c8b42 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -62,8 +62,8 @@ smmuv3_accel_check_hw_compatible(SMMUv3State *s,
return false;
}
- /* QEMU SMMUv3 supports Range Invalidation by default */
- if (FIELD_EX32(info->idr[3], IDR3, RIL) !=
+ /* User can disable QEMU SMMUv3 Range Invalidation support */
+ if (FIELD_EX32(info->idr[3], IDR3, RIL) <
FIELD_EX32(s->idr[3], IDR3, RIL)) {
error_setg(errp, "Host SMMUv3 doesn't support Range Invalidation");
return false;
@@ -639,6 +639,16 @@ static const PCIIOMMUOps smmuv3_accel_ops = {
.get_msi_direct_gpa = smmuv3_accel_get_msi_gpa,
};
+void smmuv3_accel_idr_override(SMMUv3State *s)
+{
+ if (!s->accel) {
+ return;
+ }
+
+ /* By default QEMU SMMUv3 has RIL. Update IDR3 if user has disabled it */
+ s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, s->ril);
+}
+
/* Based on SMUUv3 GPBA.ABORT configuration, attach a corresponding HWPT */
bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp)
{
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 7186817264..2f2904d86b 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -47,6 +47,7 @@ bool smmuv3_accel_install_ste_range(SMMUv3State *s, SMMUSIDRange *range,
bool smmuv3_accel_attach_gbpa_hwpt(SMMUv3State *s, Error **errp);
bool smmuv3_accel_issue_inv_cmd(SMMUv3State *s, void *cmd, SMMUDevice *sdev,
Error **errp);
+void smmuv3_accel_idr_override(SMMUv3State *s);
void smmuv3_accel_reset(SMMUv3State *s);
#else
static inline void smmuv3_accel_init(SMMUv3State *s)
@@ -74,6 +75,9 @@ smmuv3_accel_issue_inv_cmd(SMMUv3State *s, void *cmd, SMMUDevice *sdev,
{
return true;
}
+static inline void smmuv3_accel_idr_override(SMMUv3State *s)
+{
+}
static inline void smmuv3_accel_reset(SMMUv3State *s)
{
}
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 8352dd5757..296afbe503 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -305,6 +305,7 @@ static void smmuv3_init_id_regs(SMMUv3State *s)
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, 1);
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, 1);
s->aidr = 0x1;
+ smmuv3_accel_idr_override(s);
}
static void smmuv3_reset(SMMUv3State *s)
@@ -1924,6 +1925,13 @@ static bool smmu_validate_property(SMMUv3State *s, Error **errp)
return false;
}
#endif
+ if (!s->accel) {
+ if (!s->ril) {
+ error_setg(errp, "ril can only be disabled if accel=on");
+ return false;
+ }
+ return true;
+ }
return true;
}
@@ -2047,6 +2055,8 @@ static const Property smmuv3_properties[] = {
DEFINE_PROP_BOOL("accel", SMMUv3State, accel, false),
/* GPA of MSI doorbell, for SMMUv3 accel use. */
DEFINE_PROP_UINT64("msi-gpa", SMMUv3State, msi_gpa, 0),
+ /* RIL can be turned off for accel cases */
+ DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
};
static void smmuv3_instance_init(Object *obj)
@@ -2072,6 +2082,8 @@ static void smmuv3_class_init(ObjectClass *klass, const void *data)
object_class_property_set_description(klass, "accel",
"Enable SMMUv3 accelerator support. Allows host SMMUv3 to be "
"configured in nested mode for vfio-pci dev assignment");
+ object_class_property_set_description(klass, "ril",
+ "Disable range invalidation support (for accel=on)");
}
static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index 9c39acd5ca..533a2182e8 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -69,6 +69,7 @@ struct SMMUv3State {
struct SMMUv3AccelState *s_accel;
uint64_t msi_gpa;
Error *migration_blocker;
+ bool ril;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 28/33] hw/arm/smmuv3-accel: Add support for ATS
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (26 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 21:40 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 29/33] hw/arm/smmuv3-accel: Add property to specify OAS bits Shameer Kolothum
` (5 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
QEMU SMMUv3 does not enable ATS (Address Translation Services) by default.
When accelerated mode is enabled and the host SMMUv3 supports ATS, it can
be useful to report ATS capability to the guest so it can take advantage
of it if the device also supports ATS.
Note: ATS support cannot be reliably detected from the host SMMUv3 IDR
registers alone, as firmware ACPI IORT tables may override them. The
user must therefore ensure the host supports ATS before enabling it.
The ATS support enabled here is only relevant for vfio-pci endpoints,
as SMMUv3 accelerated mode does not support emulated endpoint devices.
QEMU’s SMMUv3 implementation still lacks support for handling ATS
translation requests, which would be required for emulated endpoints.
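For context, the IORT Root Complex node's "ATS Attribute" field (set from
ats_needed in the diff below) uses bit 0 to indicate whether ATS is supported
behind that root complex; it stays 0 when no accelerated SMMUv3 has ats=on.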
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 3 +++
hw/arm/smmuv3.c | 21 ++++++++++++++++++++-
hw/arm/virt-acpi-build.c | 10 ++++++++--
include/hw/arm/smmuv3.h | 1 +
4 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index b6429c8b42..73c7ce586a 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -647,6 +647,9 @@ void smmuv3_accel_idr_override(SMMUv3State *s)
/* By default QEMU SMMUv3 has RIL. Update IDR3 if user has disabled it */
s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, s->ril);
+
+ /* QEMU SMMUv3 has no ATS by default. Advertise it if the property opts in */
+ s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ATS, s->ats);
}
/* Based on SMUUv3 GPBA.ABORT configuration, attach a corresponding HWPT */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 296afbe503..ad476146f6 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1498,13 +1498,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
*/
smmuv3_range_inval(bs, &cmd, SMMU_STAGE_2);
break;
+ case SMMU_CMD_ATC_INV:
+ {
+ SMMUDevice *sdev = smmu_find_sdev(bs, CMD_SID(&cmd));
+
+ if (!sdev) {
+ break;
+ }
+
+ if (!smmuv3_accel_issue_inv_cmd(s, &cmd, sdev, errp)) {
+ cmd_error = SMMU_CERROR_ILL;
+ break;
+ }
+ break;
+ }
case SMMU_CMD_TLBI_EL3_ALL:
case SMMU_CMD_TLBI_EL3_VA:
case SMMU_CMD_TLBI_EL2_ALL:
case SMMU_CMD_TLBI_EL2_ASID:
case SMMU_CMD_TLBI_EL2_VA:
case SMMU_CMD_TLBI_EL2_VAA:
- case SMMU_CMD_ATC_INV:
case SMMU_CMD_PRI_RESP:
case SMMU_CMD_RESUME:
case SMMU_CMD_STALL_TERM:
@@ -1930,6 +1941,10 @@ static bool smmu_validate_property(SMMUv3State *s, Error **errp)
error_setg(errp, "ril can only be disabled if accel=on");
return false;
}
+ if (s->ats) {
+ error_setg(errp, "ats can only be enabled if accel=on");
+ return false;
+ }
return true;
}
return true;
@@ -2057,6 +2072,7 @@ static const Property smmuv3_properties[] = {
DEFINE_PROP_UINT64("msi-gpa", SMMUv3State, msi_gpa, 0),
/* RIL can be turned off for accel cases */
DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
+ DEFINE_PROP_BOOL("ats", SMMUv3State, ats, false),
};
static void smmuv3_instance_init(Object *obj)
@@ -2084,6 +2100,9 @@ static void smmuv3_class_init(ObjectClass *klass, const void *data)
"configured in nested mode for vfio-pci dev assignment");
object_class_property_set_description(klass, "ril",
"Disable range invalidation support (for accel=on)");
+ object_class_property_set_description(klass, "ats",
+ "Enable/disable ATS support (for accel=on). Please ensure host "
+ "platform has ATS support before enabling this");
}
static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index fd78c39317..1e3779991e 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -346,6 +346,7 @@ typedef struct AcpiIortSMMUv3Dev {
/* Offset of the SMMUv3 IORT Node relative to the start of the IORT */
size_t offset;
bool accel;
+ bool ats;
} AcpiIortSMMUv3Dev;
/*
@@ -401,6 +402,7 @@ static int iort_smmuv3_devices(Object *obj, void *opaque)
bus = PCI_BUS(object_property_get_link(obj, "primary-bus", &error_abort));
sdev.accel = object_property_get_bool(obj, "accel", &error_abort);
+ sdev.ats = object_property_get_bool(obj, "ats", &error_abort);
pbus = PLATFORM_BUS_DEVICE(vms->platform_bus_dev);
sbdev = SYS_BUS_DEVICE(obj);
sdev.base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
@@ -544,6 +546,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
int i, nb_nodes, rc_mapping_count;
AcpiIortSMMUv3Dev *sdev;
size_t node_size;
+ bool ats_needed = false;
int num_smmus = 0;
uint32_t id = 0;
int rc_smmu_idmaps_len = 0;
@@ -579,6 +582,9 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
/* Calculate RMR nodes required. One per SMMUv3 with accelerated mode */
for (i = 0; i < num_smmus; i++) {
sdev = &g_array_index(smmuv3_devs, AcpiIortSMMUv3Dev, i);
+ if (sdev->ats) {
+ ats_needed = true;
+ }
if (sdev->accel) {
nb_nodes++;
}
@@ -678,8 +684,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
build_append_int_noprefix(table_data, 0, 2); /* Reserved */
/* Table 15 Memory Access Flags */
build_append_int_noprefix(table_data, 0x3 /* CCA = CPM = DACS = 1 */, 1);
-
- build_append_int_noprefix(table_data, 0, 4); /* ATS Attribute */
+ /* ATS Attribute */
+ build_append_int_noprefix(table_data, ats_needed, 4);
/* MCFG pci_segment */
build_append_int_noprefix(table_data, 0, 4); /* PCI Segment number */
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index 533a2182e8..242d6429ed 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -70,6 +70,7 @@ struct SMMUv3State {
uint64_t msi_gpa;
Error *migration_blocker;
bool ril;
+ bool ats;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 29/33] hw/arm/smmuv3-accel: Add property to specify OAS bits
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (27 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 28/33] hw/arm/smmuv3-accel: Add support for ATS Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 21:47 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 30/33] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info() Shameer Kolothum
` (4 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
QEMU SMMUv3 currently sets the output address size (OAS) to 44 bits.
With accelerator mode enabled, a device may use SVA, where CPU page tables
are shared with the SMMU, requiring an OAS at least as large as the
CPU’s output address size. A user option is added to configure this.
However, the OAS value advertised by the virtual SMMU must remain
compatible with the capabilities of the host SMMUv3. In accelerated
mode, the host SMMU performs stage-2 translation and must be able to
consume the intermediate physical addresses (IPA) produced by stage-1.
The OAS exposed by the virtual SMMU defines the maximum IPA width that
stage-1 translations may generate. For AArch64 implementations, the
maximum usable IPA size on the host SMMU is determined by its own OAS.
Check that the configured OAS does not exceed what the host SMMU
can safely support.
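For reference, the SMMU IDR5.OAS encodings relevant to this patch (also
reflected in the smmuv3_oas_bits() helper added below) are:

    0b100 (4) -> 44-bit OAS (current QEMU default, SMMU_IDR5_OAS_44)
    0b101 (5) -> 48-bit OAS (new optional value, SMMU_IDR5_OAS_48)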
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 20 ++++++++++++++++++++
hw/arm/smmuv3-internal.h | 3 ++-
hw/arm/smmuv3.c | 16 +++++++++++++++-
include/hw/arm/smmuv3.h | 1 +
4 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 73c7ce586a..35a94c720a 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -27,6 +27,12 @@
static MemoryRegion root, sysmem;
static AddressSpace *shared_as_sysmem;
+static int smmuv3_oas_bits(uint32_t oas)
+{
+ static const int map[] = { 32, 36, 40, 42, 44, 48, 52, 56 };
+ return (oas < ARRAY_SIZE(map)) ? map[oas] : -EINVAL;
+}
+
static bool
smmuv3_accel_check_hw_compatible(SMMUv3State *s,
struct iommu_hw_info_arm_smmuv3 *info,
@@ -68,6 +74,15 @@ smmuv3_accel_check_hw_compatible(SMMUv3State *s,
error_setg(errp, "Host SMMUv3 doesn't support Range Invalidation");
return false;
}
+ /* Check OAS value opted is compatible with Host SMMUv3 IPA */
+ if (FIELD_EX32(info->idr[5], IDR5, OAS) <
+ FIELD_EX32(s->idr[5], IDR5, OAS)) {
+ error_setg(errp, "Host SMMUv3 supports only %d-bit IPA, but the vSMMU "
+ "OAS implies %d-bit IPA",
+ smmuv3_oas_bits(FIELD_EX32(info->idr[5], IDR5, OAS)),
+ smmuv3_oas_bits(FIELD_EX32(s->idr[5], IDR5, OAS)));
+ return false;
+ }
/* QEMU SMMUv3 supports GRAN4K/GRAN16K/GRAN64K translation granules */
if (FIELD_EX32(info->idr[5], IDR5, GRAN4K) !=
@@ -650,6 +665,11 @@ void smmuv3_accel_idr_override(SMMUv3State *s)
/* QEMU SMMUv3 has no ATS. Advertise ATS if opt-on by property */
s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ATS, s->ats);
+
+ /* Advertise 48-bit OAS in IDR5 when requested (default is 44 bits). */
+ if (s->oas == 48) {
+ s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS_48);
+ }
}
/* Based on SMUUv3 GPBA.ABORT configuration, attach a corresponding HWPT */
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index a76e4e2484..0f44a4e1d3 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -111,7 +111,8 @@ REG32(IDR5, 0x14)
FIELD(IDR5, VAX, 10, 2);
FIELD(IDR5, STALL_MAX, 16, 16);
-#define SMMU_IDR5_OAS 4
+#define SMMU_IDR5_OAS_44 4
+#define SMMU_IDR5_OAS_48 5
REG32(IIDR, 0x18)
REG32(AIDR, 0x1c)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index ad476146f6..a7bd4eeb77 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -299,7 +299,8 @@ static void smmuv3_init_id_regs(SMMUv3State *s)
s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, 1);
s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, 2);
- s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS); /* 44 bits */
+ /* OAS: 44 bits */
+ s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS_44);
/* 4K, 16K and 64K granule support */
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, 1);
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, 1);
@@ -1945,8 +1946,17 @@ static bool smmu_validate_property(SMMUv3State *s, Error **errp)
error_setg(errp, "ats can only be enabled if accel=on");
return false;
}
+ if (s->oas != 44) {
+ error_setg(errp, "OAS can only be set to 44 bits if accel=off");
+ return false;
+ }
return true;
}
+
+ if (s->oas != 44 && s->oas != 48) {
+ error_setg(errp, "OAS can only be set to 44 or 48 bits");
+ return false;
+ }
return true;
}
@@ -2073,6 +2083,7 @@ static const Property smmuv3_properties[] = {
/* RIL can be turned off for accel cases */
DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
DEFINE_PROP_BOOL("ats", SMMUv3State, ats, false),
+ DEFINE_PROP_UINT8("oas", SMMUv3State, oas, 44),
};
static void smmuv3_instance_init(Object *obj)
@@ -2103,6 +2114,9 @@ static void smmuv3_class_init(ObjectClass *klass, const void *data)
object_class_property_set_description(klass, "ats",
"Enable/disable ATS support (for accel=on). Please ensure host "
"platform has ATS support before enabling this");
+ object_class_property_set_description(klass, "oas",
+ "Specify Output Address Size (for accel =on). Supported values "
+ "are 44 or 48 bits. Defaults to 44 bits");
}
static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index 242d6429ed..d488a39cd0 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -71,6 +71,7 @@ struct SMMUv3State {
Error *migration_blocker;
bool ril;
bool ats;
+ uint8_t oas;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 30/33] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info()
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (28 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 29/33] hw/arm/smmuv3-accel: Add property to specify OAS bits Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 21:50 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 31/33] Extend get_cap() callback to support PASID Shameer Kolothum
` (3 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Retrieve PASID width from iommufd_backend_get_device_info() and store it
in HostIOMMUDeviceCaps for later use.
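For reference, max_pasid_log2 is the PASID width in bits reported by
IOMMU_GET_HW_INFO; e.g. a value of 20 (the PCIe maximum) means PASIDs up to
2^20 - 1 = 0xFFFFF are usable.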
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
backends/iommufd.c | 6 +++++-
hw/arm/smmuv3-accel.c | 3 ++-
hw/vfio/iommufd.c | 7 +++++--
include/system/host_iommu_device.h | 3 +++
include/system/iommufd.h | 3 ++-
5 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index e68a2c934f..6381f9664b 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -388,7 +388,8 @@ bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- uint64_t *caps, Error **errp)
+ uint64_t *caps, uint8_t *max_pasid_log2,
+ Error **errp)
{
struct iommu_hw_info info = {
.size = sizeof(info),
@@ -407,6 +408,9 @@ bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
g_assert(caps);
*caps = info.out_capabilities;
+ if (max_pasid_log2) {
+ *max_pasid_log2 = info.out_max_pasid_log2;
+ }
return true;
}
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 35a94c720a..254d29ee2d 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -113,7 +113,8 @@ smmuv3_accel_hw_compatible(SMMUv3State *s, HostIOMMUDeviceIOMMUFD *idev,
uint64_t caps;
if (!iommufd_backend_get_device_info(idev->iommufd, idev->devid, &data_type,
- &info, sizeof(info), &caps, errp)) {
+ &info, sizeof(info), &caps, NULL,
+ errp)) {
return false;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index ca8a6b7029..bbe944d7cc 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -353,7 +353,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
* instead.
*/
if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
- &type, NULL, 0, &hw_caps, errp)) {
+ &type, NULL, 0, &hw_caps, NULL,
+ errp)) {
return false;
}
@@ -889,19 +890,21 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
HostIOMMUDeviceCaps *caps = &hiod->caps;
VendorCaps *vendor_caps = &caps->vendor_caps;
enum iommu_hw_info_type type;
+ uint8_t max_pasid_log2;
uint64_t hw_caps;
hiod->agent = opaque;
if (!iommufd_backend_get_device_info(vdev->iommufd, vdev->devid, &type,
vendor_caps, sizeof(*vendor_caps),
- &hw_caps, errp)) {
+ &hw_caps, &max_pasid_log2, errp)) {
return false;
}
hiod->name = g_strdup(vdev->name);
caps->type = type;
caps->hw_caps = hw_caps;
+ caps->max_pasid_log2 = max_pasid_log2;
idev = HOST_IOMMU_DEVICE_IOMMUFD(hiod);
idev->iommufd = vdev->iommufd;
diff --git a/include/system/host_iommu_device.h b/include/system/host_iommu_device.h
index ab849a4a82..bfb2b60478 100644
--- a/include/system/host_iommu_device.h
+++ b/include/system/host_iommu_device.h
@@ -30,6 +30,8 @@ typedef union VendorCaps {
* @hw_caps: host platform IOMMU capabilities (e.g. on IOMMUFD this represents
* the @out_capabilities value returned from IOMMU_GET_HW_INFO ioctl)
*
+ * @max_pasid_log2: width of PASIDs supported by host IOMMU device
+ *
* @vendor_caps: host platform IOMMU vendor specific capabilities (e.g. on
* IOMMUFD this represents a user-space buffer filled by kernel
* with host IOMMU @type specific hardware information data)
@@ -37,6 +39,7 @@ typedef union VendorCaps {
typedef struct HostIOMMUDeviceCaps {
uint32_t type;
uint64_t hw_caps;
+ uint8_t max_pasid_log2;
VendorCaps vendor_caps;
} HostIOMMUDeviceCaps;
#endif
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 41e216c677..aa78bf1e1d 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -71,7 +71,8 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
hwaddr iova, uint64_t size);
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
- uint64_t *caps, Error **errp);
+ uint64_t *caps, uint8_t *max_pasid_log2,
+ Error **errp);
bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
uint32_t pt_id, uint32_t flags,
uint32_t data_type, uint32_t data_len,
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 31/33] Extend get_cap() callback to support PASID
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (29 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 30/33] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info() Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 21:56 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM Shameer Kolothum
` (2 subsequent siblings)
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
Modify the get_cap() callback so that it returns the capability value via
an output uint64_t parameter. Also add support for querying the generic
IOMMU HW capability info and max_pasid_log2 (PASID width).
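A caller can then query, for example, the PASID width roughly as follows
(illustrative sketch only, not part of this patch):

    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
    uint64_t max_pasid_log2;

    if (hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
                       &max_pasid_log2, errp) < 0) {
        return false;
    }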
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
backends/iommufd.c | 23 ++++++++++++++++++-----
hw/i386/intel_iommu.c | 8 +++++---
hw/vfio/container-legacy.c | 8 ++++++--
include/system/host_iommu_device.h | 18 ++++++++++++------
4 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 6381f9664b..718d63f5cf 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -523,19 +523,32 @@ bool host_iommu_device_iommufd_detach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
return idevc->detach_hwpt(idev, errp);
}
-static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap, Error **errp)
+static int hiod_iommufd_get_cap(HostIOMMUDevice *hiod, int cap_id,
+ uint64_t *out_cap, Error **errp)
{
HostIOMMUDeviceCaps *caps = &hiod->caps;
- switch (cap) {
+ g_assert(out_cap);
+
+ switch (cap_id) {
case HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE:
- return caps->type;
+ *out_cap = caps->type;
+ break;
case HOST_IOMMU_DEVICE_CAP_AW_BITS:
- return vfio_device_get_aw_bits(hiod->agent);
+ *out_cap = vfio_device_get_aw_bits(hiod->agent);
+ break;
+ case HOST_IOMMU_DEVICE_CAP_GENERIC_HW:
+ *out_cap = caps->hw_caps;
+ break;
+ case HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2:
+ *out_cap = caps->max_pasid_log2;
+ break;
default:
- error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
+ *out_cap = 0;
+ error_setg(errp, "%s: unsupported capability %x", hiod->name, cap_id);
return -EINVAL;
}
+ return 0;
}
static void hiod_iommufd_class_init(ObjectClass *oc, const void *data)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 78b142ccea..d5c131a814 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4605,6 +4605,7 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
Error **errp)
{
HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
+ uint64_t out_cap;
int ret;
if (!hiodc->get_cap) {
@@ -4613,12 +4614,13 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
}
/* Common checks */
- ret = hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_AW_BITS, errp);
+ ret = hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_AW_BITS, &out_cap, errp);
if (ret < 0) {
return false;
}
- if (s->aw_bits > ret) {
- error_setg(errp, "aw-bits %d > host aw-bits %d", s->aw_bits, ret);
+ if (s->aw_bits > out_cap) {
+ error_setg(errp, "aw-bits %d > host aw-bits 0x%" PRIx64, s->aw_bits,
+ out_cap);
return false;
}
diff --git a/hw/vfio/container-legacy.c b/hw/vfio/container-legacy.c
index 32c260b345..1acf063762 100644
--- a/hw/vfio/container-legacy.c
+++ b/hw/vfio/container-legacy.c
@@ -1203,15 +1203,19 @@ static bool hiod_legacy_vfio_realize(HostIOMMUDevice *hiod, void *opaque,
}
static int hiod_legacy_vfio_get_cap(HostIOMMUDevice *hiod, int cap,
- Error **errp)
+ uint64_t *out_cap, Error **errp)
{
+ g_assert(out_cap);
+
switch (cap) {
case HOST_IOMMU_DEVICE_CAP_AW_BITS:
- return vfio_device_get_aw_bits(hiod->agent);
+ *out_cap = vfio_device_get_aw_bits(hiod->agent);
+ break;
default:
error_setg(errp, "%s: unsupported capability %x", hiod->name, cap);
return -EINVAL;
}
+ return 0;
}
static GList *
diff --git a/include/system/host_iommu_device.h b/include/system/host_iommu_device.h
index bfb2b60478..4e891e5225 100644
--- a/include/system/host_iommu_device.h
+++ b/include/system/host_iommu_device.h
@@ -88,19 +88,21 @@ struct HostIOMMUDeviceClass {
* @get_cap: check if a host IOMMU device capability is supported.
*
* Optional callback, if not implemented, hint not supporting query
- * of @cap.
+ * of @cap_id.
*
* @hiod: pointer to a host IOMMU device instance.
*
- * @cap: capability to check.
+ * @cap_id: capability to check.
+ *
+ * @out_cap: 0 if a @cap_id is unsupported or else the capability
+ * value for @cap_id.
*
* @errp: pass an Error out when fails to query capability.
*
- * Returns: <0 on failure, 0 if a @cap is unsupported, or else
- * 1 or some positive value for some special @cap,
- * i.e., HOST_IOMMU_DEVICE_CAP_AW_BITS.
+ * Returns: <0 if @cap_id is not supported, 0 on success.
*/
- int (*get_cap)(HostIOMMUDevice *hiod, int cap, Error **errp);
+ int (*get_cap)(HostIOMMUDevice *hiod, int cap_id, uint64_t *out_cap,
+ Error **errp);
/**
* @get_iova_ranges: Return the list of usable iova_ranges along with
* @hiod Host IOMMU device
@@ -123,6 +125,10 @@ struct HostIOMMUDeviceClass {
*/
#define HOST_IOMMU_DEVICE_CAP_IOMMU_TYPE 0
#define HOST_IOMMU_DEVICE_CAP_AW_BITS 1
+/* Generic IOMMU HW capability info */
+#define HOST_IOMMU_DEVICE_CAP_GENERIC_HW 2
+/* PASID width */
+#define HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2 3
#define HOST_IOMMU_DEVICE_CAP_AW_BITS_MAX 64
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (30 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 31/33] Extend get_cap() callback to support PASID Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 21:59 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable Shameer Kolothum
2025-11-20 17:06 ` [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
From: Yi Liu <yi.l.liu@intel.com>
If the user wants to expose the PASID capability in the vIOMMU, then VFIO
also needs to report the PASID cap for the device, provided the underlying
hardware supports it as well.
As a start, this chooses to put the vPASID cap in the last 8 bytes of the
vconfig space. This choice is made in the hope that it does not conflict
with any existing cap or hidden registers. For devices that have hidden
registers, the user should figure out a proper offset for the vPASID cap.
This may require an option for the user to configure it; we leave that as
a future extension.
There are more discussions on the mechanism of finding the proper offset.
https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
Since we add a check to ensure the vIOMMU supports PASID, only devices
under those vIOMMUs can synthesize the vPASID capability. This gives
users control over which devices expose vPASID.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/vfio/pci.c | 38 ++++++++++++++++++++++++++++++++++++++
include/hw/iommu.h | 1 +
2 files changed, 39 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8b8bc5a421..e11e39d667 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -24,6 +24,7 @@
#include <sys/ioctl.h>
#include "hw/hw.h"
+#include "hw/iommu.h"
#include "hw/pci/msi.h"
#include "hw/pci/msix.h"
#include "hw/pci/pci_bridge.h"
@@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
{
+ HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
+ HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
PCIDevice *pdev = PCI_DEVICE(vdev);
+ uint64_t max_pasid_log2 = 0;
+ bool pasid_cap_added = false;
+ uint64_t hw_caps;
uint32_t header;
uint16_t cap_id, next, size;
uint8_t cap_ver;
@@ -2578,12 +2584,44 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
pcie_add_capability(pdev, cap_id, cap_ver, next, size);
}
break;
+ /*
+ * VFIO kernel does not expose the PASID CAP today. We may synthesize
+ * one later through IOMMUFD APIs. If VFIO ever starts exposing it,
+ * record its presence here so we do not create a duplicate CAP.
+ */
+ case PCI_EXT_CAP_ID_PASID:
+ pasid_cap_added = true;
+ /* fallthrough */
default:
pcie_add_capability(pdev, cap_id, cap_ver, next, size);
}
}
+#ifdef CONFIG_IOMMUFD
+ /* Try to retrieve PASID CAP through IOMMUFD APIs */
+ if (!pasid_cap_added && hiodc && hiodc->get_cap) {
+ hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
+ hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
+ &max_pasid_log2, NULL);
+ }
+
+ /*
+ * If supported, adds the PASID capability in the end of the PCIe config
+ * space. TODO: Add option for enabling pasid at a safe offset.
+ */
+ if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
+ VIOMMU_FLAG_PASID_SUPPORTED)) {
+ bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);
+ bool priv_mod = (hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV);
+
+ pcie_pasid_init(pdev, PCIE_CONFIG_SPACE_SIZE - PCI_EXT_CAP_PASID_SIZEOF,
+ max_pasid_log2, exec_perm, priv_mod);
+ /* PASID capability is fully emulated by QEMU */
+ memset(vdev->emulated_config_bits + pdev->exp.pasid_cap, 0xff, 8);
+ }
+#endif
+
/* Cleanup chain head ID if necessary */
if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
diff --git a/include/hw/iommu.h b/include/hw/iommu.h
index 9b8bb94fc2..9635770bee 100644
--- a/include/hw/iommu.h
+++ b/include/hw/iommu.h
@@ -20,6 +20,7 @@
enum viommu_flags {
/* vIOMMU needs nesting parent HWPT to create nested HWPT */
VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
+ VIOMMU_FLAG_PASID_SUPPORTED = BIT_ULL(1),
};
#endif /* HW_IOMMU_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (31 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM Shameer Kolothum
@ 2025-11-20 13:22 ` Shameer Kolothum
2025-11-20 22:09 ` Nicolin Chen
2025-11-20 17:06 ` [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen
33 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-20 13:22 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
QEMU SMMUv3 currently forces SSID (Substream ID) to zero. One key use case
for accelerated mode is Shared Virtual Addressing (SVA), which requires
SSID support so the guest can maintain multiple context descriptors per
substream ID.
Provide an option for the user to enable PASID support. An SSIDSIZE of 16
is currently used as the default.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/arm/smmuv3-accel.c | 23 ++++++++++++++++++++++-
hw/arm/smmuv3-internal.h | 1 +
hw/arm/smmuv3.c | 13 +++++++++++--
include/hw/arm/smmuv3.h | 1 +
4 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 254d29ee2d..dc0f61e841 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -67,6 +67,12 @@ smmuv3_accel_check_hw_compatible(SMMUv3State *s,
error_setg(errp, "Host SMMUv3 SIDSIZE not compatible");
return false;
}
+ /* If user enables PASID support(pasid=on), QEMU sets SSIDSIZE to 16 */
+ if (FIELD_EX32(info->idr[1], IDR1, SSIDSIZE) <
+ FIELD_EX32(s->idr[1], IDR1, SSIDSIZE)) {
+ error_setg(errp, "Host SMMUv3 SSIDSIZE not compatible");
+ return false;
+ }
/* User can disable QEMU SMMUv3 Range Invalidation support */
if (FIELD_EX32(info->idr[3], IDR3, RIL) >
@@ -643,7 +649,14 @@ static uint64_t smmuv3_accel_get_viommu_flags(void *opaque)
* The real HW nested support should be reported from host SMMUv3 and if
* it doesn't, the nesting parent allocation will fail anyway in VFIO core.
*/
- return VIOMMU_FLAG_WANT_NESTING_PARENT;
+ uint64_t flags = VIOMMU_FLAG_WANT_NESTING_PARENT;
+ SMMUState *bs = opaque;
+ SMMUv3State *s = ARM_SMMUV3(bs);
+
+ if (s->pasid) {
+ flags |= VIOMMU_FLAG_PASID_SUPPORTED;
+ }
+ return flags;
}
static const PCIIOMMUOps smmuv3_accel_ops = {
@@ -671,6 +684,14 @@ void smmuv3_accel_idr_override(SMMUv3State *s)
if (s->oas == 48) {
s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS_48);
}
+
+ /*
+ * By default QEMU SMMUv3 has no PASID(SSID) support. Update IDR1 if user
+ * has enabled it.
+ */
+ if (s->pasid) {
+ s->idr[1] = FIELD_DP32(s->idr[1], IDR1, SSIDSIZE, SMMU_IDR1_SSIDSIZE);
+ }
}
/* Based on SMUUv3 GPBA.ABORT configuration, attach a corresponding HWPT */
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 0f44a4e1d3..e45aad27f7 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -81,6 +81,7 @@ REG32(IDR1, 0x4)
FIELD(IDR1, ECMDQ, 31, 1)
#define SMMU_IDR1_SIDSIZE 16
+#define SMMU_IDR1_SSIDSIZE 16
#define SMMU_CMDQS 19
#define SMMU_EVENTQS 19
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index a7bd4eeb77..763f069a35 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -611,9 +611,11 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
}
}
- if (STE_S1CDMAX(ste) != 0) {
+ /* If pasid not enabled, can't support multiple CDs */
+ if (!s->pasid && STE_S1CDMAX(ste) != 0) {
qemu_log_mask(LOG_UNIMP,
- "SMMUv3 does not support multiple context descriptors yet\n");
+ "SMMUv3: multiple S1 context descriptors require PASID support. "
+ "Enable PASID with pasid=on (supported only with accel=on)\n");
goto bad_ste;
}
@@ -1950,6 +1952,10 @@ static bool smmu_validate_property(SMMUv3State *s, Error **errp)
error_setg(errp, "OAS can only be set to 44 bits if accel=off");
return false;
}
+ if (s->pasid) {
+ error_setg(errp, "pasid can only be enabled if accel=on");
+ return false;
+ }
return true;
}
@@ -2084,6 +2090,7 @@ static const Property smmuv3_properties[] = {
DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
DEFINE_PROP_BOOL("ats", SMMUv3State, ats, false),
DEFINE_PROP_UINT8("oas", SMMUv3State, oas, 44),
+ DEFINE_PROP_BOOL("pasid", SMMUv3State, pasid, false),
};
static void smmuv3_instance_init(Object *obj)
@@ -2117,6 +2124,8 @@ static void smmuv3_class_init(ObjectClass *klass, const void *data)
object_class_property_set_description(klass, "oas",
"Specify Output Address Size (for accel =on). Supported values "
"are 44 or 48 bits. Defaults to 44 bits");
+ object_class_property_set_description(klass, "pasid",
+ "Enable/disable PASID support (for accel=on)");
}
static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index d488a39cd0..2d4970fe19 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -72,6 +72,7 @@ struct SMMUv3State {
bool ril;
bool ats;
uint8_t oas;
+ bool pasid;
};
typedef enum {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
` (32 preceding siblings ...)
2025-11-20 13:22 ` [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable Shameer Kolothum
@ 2025-11-20 17:06 ` Nicolin Chen
33 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 17:06 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:21:40PM +0000, Shameer Kolothum wrote:
> https://github.com/shamiali2008/qemu-master master-smmuv3-accel-v6
I did a quick sanity test with two of my VM setups, passing through a GPU
and an MLX device on NVIDIA Grace. Everything looks normal to me except:
> - GBPA-based vSTE update depends on Nicolin's kernel patch [1].
This now becomes a hard requirement that errors out when the kernel
doesn't have this change, whereas v5 just gave a warning. So make sure
to apply that to the kernel tree for testing.
b4 am https://lore.kernel.org/linux-iommu/20251103172755.2026145-1-nicolinc@nvidia.com/
Not sure if we should put a hard requirement on a kernel change that
isn't merged yet..
With that,
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 07/33] hw/pci/pci: Move pci_init_bus_master() after adding device to bus
2025-11-20 13:21 ` [PATCH v6 07/33] hw/pci/pci: Move pci_init_bus_master() after adding device to bus Shameer Kolothum
@ 2025-11-20 20:44 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 20:44 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Michael S . Tsirkin
On Thu, Nov 20, 2025 at 01:21:47PM +0000, Shameer Kolothum wrote:
> During PCI hotplug, in do_pci_register_device(), pci_init_bus_master()
> is called before storing the pci_dev pointer in bus->devices[devfn].
>
> This causes a problem if pci_init_bus_master() (via its
> get_address_space() callback) attempts to retrieve the device using
> pci_find_device(), since the PCI device is not yet visible on the bus.
>
> Fix this by moving the pci_init_bus_master() call to after the device
> has been added to bus->devices[devfn].
>
> This prepares for a subsequent patch where the accel SMMUv3
> get_address_space() callback retrieves the pci_dev to identify the
> attached device type.
>
> No functional change intended.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback
2025-11-20 13:21 ` [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback Shameer Kolothum
@ 2025-11-20 20:51 ` Nicolin Chen
2025-11-21 10:38 ` Shameer Kolothum
0 siblings, 1 reply; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 20:51 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Michael S . Tsirkin
On Thu, Nov 20, 2025 at 01:21:48PM +0000, Shameer Kolothum wrote:
> Introduce an optional supports_address_space() callback in PCIIOMMUOps to
"supports_address_space" sounds a bit to wide to me than its
indication to supporting an IOMMU address space specifically,
since the "system address space" being used in this series is
a legit address space as well.
With that being said, I think we are fine for now, given the
API docs has clarified it. If someone shares the same concern,
we can rename it later.
> allow a vIOMMU implementation to reject devices that should not be attached
> to it.
>
> Currently, get_address_space() is the first and mandatory callback into the
> vIOMMU layer, which always returns an address space. For certain setups, such
> as hardware accelerated vIOMMUs (e.g. ARM SMMUv3 with accel=on), attaching
> emulated endpoint devices is undesirable as it may impact the behavior or
> performance of VFIO passthrough devices, for example, by triggering
> unnecessary invalidations on the host IOMMU.
>
> The new callback allows a vIOMMU to check and reject unsupported devices
> early during PCI device registration.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 09/33] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header
2025-11-20 13:21 ` [PATCH v6 09/33] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header Shameer Kolothum
@ 2025-11-20 20:52 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 20:52 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:21:49PM +0000, Shameer Kolothum wrote:
> Move the TYPE_PXB_PCIE_DEV definition to header so that it can be
> referenced by other code in subsequent patch.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 13/33] hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller
2025-11-20 13:21 ` [PATCH v6 13/33] hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller Shameer Kolothum
@ 2025-11-20 20:59 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 20:59 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:21:53PM +0000, Shameer Kolothum wrote:
> smmuv3_cmdq_consume() is updated to return detailed errors via errp.
>
> Although this is currently a no-op, it prepares the ground for accel
> SMMUv3 specific command handling where proper error reporting will be
> useful.
>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 15/33] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt
2025-11-20 13:21 ` [PATCH v6 15/33] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt Shameer Kolothum
@ 2025-11-20 21:03 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:03 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:21:55PM +0000, Shameer Kolothum wrote:
> On guest reboot or on GBPA update, attach a nested HWPT based on the
> GPBA.ABORT bit which either aborts all incoming transactions or bypasses
> them.
>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly
2025-11-20 13:21 ` [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly Shameer Kolothum
@ 2025-11-20 21:05 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:05 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju,
Michael S . Tsirkin
On Thu, Nov 20, 2025 at 01:21:56PM +0000, Shameer Kolothum wrote:
> For certain vIOMMU implementations, such as SMMUv3 in accelerated mode,
> the translation tables are programmed directly into the physical SMMUv3
> in a nested configuration. While QEMU knows where the guest tables live,
> safely walking them in software would require trapping and ordering all
> guest invalidations on every command queue. Without this, QEMU could race
> with guest updates and walk stale or freed page tables.
>
> This constraint is fundamental to the design of HW-accelerated vSMMU when
> used with downstream vfio-pci endpoint devices, where QEMU must never walk
> guest translation tables and must rely on the physical SMMU for
> translation. Future accelerated vSMMU features, such as virtual CMDQ, will
> also prevent trapping invalidations, reinforcing this restriction.
>
> For vfio-pci endpoints behind such a vSMMU, the only translation QEMU
> needs is for the MSI doorbell used when setting up KVM MSI route tables.
> Instead of attempting a software walk, introduce an optional vIOMMU
> callback that returns the MSI doorbell GPA directly.
>
> kvm_arch_fixup_msi_route() uses this callback when available and ignores
> the guest provided IOVA in that case.
>
> If the vIOMMU does not implement the callback, we fall back to the
> existing IOMMU based address space translation path.
>
> This ensures correct MSI routing for accelerated SMMUv3 + VFIO passthrough
> while avoiding unsafe software walks of guest translation tables.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA
2025-11-20 13:21 ` [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA Shameer Kolothum
@ 2025-11-20 21:21 ` Nicolin Chen
2025-11-21 9:57 ` Shameer Kolothum
0 siblings, 1 reply; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:21 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:21:57PM +0000, Shameer Kolothum wrote:
> Accelerated SMMUv3 instances rely on the physical SMMUv3 for nested
> translation (Guest Stage-1, Host Stage-2). In this mode the guest’s
> Stage-1 tables are programmed directly into hardware, and QEMU should
> not attempt to walk them for translation since doing so is not reliably
> safe. For vfio-pci endpoints behind such a vSMMU, the only translation
> QEMU is responsible for is the MSI doorbell used during KVM MSI setup.
>
> Add a device property to carry the MSI doorbell GPA from the virt
> machine, and expose it through a new get_msi_direct_gpa PCIIOMMUOp.
> kvm_arch_fixup_msi_route() can then use this GPA directly instead of
> attempting a software walk of guest translation tables.
>
> This enables correct MSI routing with accelerated SMMUv3 while avoiding
> unsafe accesses to page tables.
>
> For meaningful use of vfio-pci devices with accelerated SMMUv3, both KVM
> and a kernel irqchip are required. Enforce this requirement when accel=on
> is selected.
>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Nits:
> +++ b/hw/arm/virt.c
> @@ -3052,6 +3052,14 @@ static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> /* The new SMMUv3 device is specific to the PCI bus */
> object_property_set_bool(OBJECT(dev), "smmu_per_bus", true, NULL);
> }
> + if (object_property_find(OBJECT(dev), "accel") &&
> + object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
Do we need object_property_find()? A later patch seems to drop it.
Perhaps we shouldn't add it in the first place?
> @@ -3088,6 +3096,20 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> }
>
> create_smmuv3_dev_dtb(vms, dev, bus);
> + if (object_property_find(OBJECT(dev), "accel") &&
> + object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
Ditto
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate
2025-11-20 13:22 ` [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate Shameer Kolothum
@ 2025-11-20 21:27 ` Nicolin Chen
2025-11-20 21:30 ` Nicolin Chen
0 siblings, 1 reply; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:27 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:00PM +0000, Shameer Kolothum wrote:
> + /* QEMU SMMUv3 supports Range Invalidation by default */
> + if (FIELD_EX32(info->idr[3], IDR3, RIL) !=
> + FIELD_EX32(s->idr[3], IDR3, RIL)) {
> + error_setg(errp, "Host SMMUv3 doesn't support Range Invalidation");
> + return false;
If host reports info->ril=1 while VM sets s->ril=0, it could have
worked. But this would reject the case.
I think it should be:
if (FIELD_EX32(info->idr[3], IDR3, RIL) <
FIELD_EX32(s->idr[3], IDR3, RIL)) {
?
Nicolin
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate
2025-11-20 21:27 ` Nicolin Chen
@ 2025-11-20 21:30 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:30 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:27:25PM -0800, Nicolin Chen wrote:
> On Thu, Nov 20, 2025 at 01:22:00PM +0000, Shameer Kolothum wrote:
> > + /* QEMU SMMUv3 supports Range Invalidation by default */
> > + if (FIELD_EX32(info->idr[3], IDR3, RIL) !=
> > + FIELD_EX32(s->idr[3], IDR3, RIL)) {
> > + error_setg(errp, "Host SMMUv3 doesn't support Range Invalidation");
> > + return false;
>
> If host reports info->ril=1 while VM sets s->ril=0, it could have
> worked. But this would reject the case.
>
> I think it should be:
> if (FIELD_EX32(info->idr[3], IDR3, RIL) <
> FIELD_EX32(s->idr[3], IDR3, RIL)) {
> ?
Never mind. I realized that you are doing that in a follow-up patch :)
Nicolin
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support
2025-11-20 13:22 ` [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support Shameer Kolothum
@ 2025-11-20 21:34 ` Nicolin Chen via
2025-11-21 10:04 ` Shameer Kolothum
0 siblings, 1 reply; 61+ messages in thread
From: Nicolin Chen via @ 2025-11-20 21:34 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:07PM +0000, Shameer Kolothum wrote:
> Currently QEMU SMMUv3 has RIL support by default. But if accelerated mode
> is enabled, RIL has to be compatible with host SMMUv3 support.
>
> Add a property so that the user can specify this.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> ---
> hw/arm/smmuv3-accel.c | 14 ++++++++++++--
> hw/arm/smmuv3-accel.h | 4 ++++
> hw/arm/smmuv3.c | 12 ++++++++++++
> include/hw/arm/smmuv3.h | 1 +
> 4 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index aae7840c40..b6429c8b42 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -62,8 +62,8 @@ smmuv3_accel_check_hw_compatible(SMMUv3State *s,
> return false;
> }
>
> - /* QEMU SMMUv3 supports Range Invalidation by default */
> - if (FIELD_EX32(info->idr[3], IDR3, RIL) !=
> + /* User can disable QEMU SMMUv3 Range Invalidation support */
> + if (FIELD_EX32(info->idr[3], IDR3, RIL) >
> FIELD_EX32(s->idr[3], IDR3, RIL)) {
When (host) info->idr = 1 > (VM) s->idr = 0, it should work?
So, should it be "<" instead?
Otherwise,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 28/33] hw/arm/smmuv3-accel: Add support for ATS
2025-11-20 13:22 ` [PATCH v6 28/33] hw/arm/smmuv3-accel: Add support for ATS Shameer Kolothum
@ 2025-11-20 21:40 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:40 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:08PM +0000, Shameer Kolothum wrote:
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 296afbe503..ad476146f6 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1498,13 +1498,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s, Error **errp)
> */
> smmuv3_range_inval(bs, &cmd, SMMU_STAGE_2);
> break;
> + case SMMU_CMD_ATC_INV:
> + SMMUDevice *sdev = smmu_find_sdev(bs, CMD_SID(&cmd));
> +
> + if (!sdev) {
> + break;
> + }
Should we do:
if (!sdev || !s->ats) {
trace_smmuv3_unhandled_cmd(type);
break;
}
?
Otherwise,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 29/33] hw/arm/smmuv3-accel: Add property to specify OAS bits
2025-11-20 13:22 ` [PATCH v6 29/33] hw/arm/smmuv3-accel: Add property to specify OAS bits Shameer Kolothum
@ 2025-11-20 21:47 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:47 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:09PM +0000, Shameer Kolothum wrote:
> QEMU SMMUv3 currently sets the output address size (OAS) to 44 bits.
> With accelerator mode enabled, a device may use SVA, where CPU page tables
> are shared with the SMMU, requiring an OAS at least as large as the
> CPU’s output address size. A user option is added to configure this.
>
> However, the OAS value advertised by the virtual SMMU must remain
> compatible with the capabilities of the host SMMUv3. In accelerated
> mode, the host SMMU performs stage-2 translation and must be able to
> consume the intermediate physical addresses (IPA) produced by stage-1.
>
> The OAS exposed by the virtual SMMU defines the maximum IPA width that
> stage-1 translations may generate. For AArch64 implementations, the
> maximum usable IPA size on the host SMMU is determined by its own OAS.
> Check that the configured OAS does not exceed what the host SMMU
> can safely support.
>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> ---
> hw/arm/smmuv3-accel.c | 20 ++++++++++++++++++++
> hw/arm/smmuv3-internal.h | 3 ++-
> hw/arm/smmuv3.c | 16 +++++++++++++++-
> include/hw/arm/smmuv3.h | 1 +
> 4 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 73c7ce586a..35a94c720a 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -27,6 +27,12 @@
> static MemoryRegion root, sysmem;
> static AddressSpace *shared_as_sysmem;
>
> +static int smmuv3_oas_bits(uint32_t oas)
> +{
> + static const int map[] = { 32, 36, 40, 42, 44, 48, 52, 56 };
> +    return (oas < ARRAY_SIZE(map)) ? map[oas] : -EINVAL;
We should probably just:
g_assert(oas < ARRAY_SIZE(map));
-EINVAL is useless anyway in the caller that prints it.
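Something along these lines, just a sketch keeping the patch's lookup table:
    static int smmuv3_oas_bits(uint32_t oas)
    {
        static const int map[] = { 32, 36, 40, 42, 44, 48, 52, 56 };
        /* IDR5.OAS is a 3-bit field, so any index >= 8 is a programming bug */
        g_assert(oas < ARRAY_SIZE(map));
        return map[oas];
    }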
Otherwise,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 30/33] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info()
2025-11-20 13:22 ` [PATCH v6 30/33] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info() Shameer Kolothum
@ 2025-11-20 21:50 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:50 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:10PM +0000, Shameer Kolothum wrote:
> Retrieve PASID width from iommufd_backend_get_device_info() and store it
> in HostIOMMUDeviceCaps for later use.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 31/33] Extend get_cap() callback to support PASID
2025-11-20 13:22 ` [PATCH v6 31/33] Extend get_cap() callback to support PASID Shameer Kolothum
@ 2025-11-20 21:56 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:56 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:11PM +0000, Shameer Kolothum wrote:
> Modify the get_cap() callback so that it returns the capability via an
> output uint64_t parameter. Also add support for querying the generic
> IOMMU HW capability info and max_pasid_log2 (PASID width).
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM
2025-11-20 13:22 ` [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM Shameer Kolothum
@ 2025-11-20 21:59 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 21:59 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:12PM +0000, Shameer Kolothum wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> If the user wants to expose the PASID capability in the vIOMMU, then VFIO
> also needs to report the PASID cap for the device, provided the underlying
> hardware supports it as well.
>
> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> vconfig space. This choice is made in the hope that it does not conflict
> with any existing cap or hidden registers. For devices that have hidden
> registers, the user should figure out a proper offset for the vPASID cap.
> This may require an option for the user to configure it; we leave that as
> a future extension.
> There are more discussions on the mechanism of finding the proper offset.
>
> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
>
> Since we add a check to ensure the vIOMMU supports PASID, only devices
> under those vIOMMUs can synthesize the vPASID capability. This gives
> users control over which devices expose vPASID.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable
2025-11-20 13:22 ` [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable Shameer Kolothum
@ 2025-11-20 22:09 ` Nicolin Chen
2025-11-21 10:22 ` Shameer Kolothum
0 siblings, 1 reply; 61+ messages in thread
From: Nicolin Chen @ 2025-11-20 22:09 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
berrange, nathanc, mochs, smostafa, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, zhenzhong.duan, yi.l.liu, kjaju
On Thu, Nov 20, 2025 at 01:22:13PM +0000, Shameer Kolothum wrote:
> +++ b/hw/arm/smmuv3-accel.c
> @@ -67,6 +67,12 @@ smmuv3_accel_check_hw_compatible(SMMUv3State *s,
> error_setg(errp, "Host SMMUv3 SIDSIZE not compatible");
> return false;
> }
> + /* If user enables PASID support(pasid=on), QEMU sets SSIDSIZE to 16 */
> + if (FIELD_EX32(info->idr[1], IDR1, SSIDSIZE) <
> + FIELD_EX32(s->idr[1], IDR1, SSIDSIZE)) {
> + error_setg(errp, "Host SMMUv3 SSIDSIZE not compatible");
> + return false;
> + }
I think we can print the values: host vs VM. And at SIDSIZE above
as well.
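For example (sketch only, reusing the FIELD_EX32 accessors from the patch):
    error_setg(errp, "Host SMMUv3 SSIDSIZE (%u) < vSMMU SSIDSIZE (%u)",
               FIELD_EX32(info->idr[1], IDR1, SSIDSIZE),
               FIELD_EX32(s->idr[1], IDR1, SSIDSIZE));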
> @@ -2084,6 +2090,7 @@ static const Property smmuv3_properties[] = {
> DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
> DEFINE_PROP_BOOL("ats", SMMUv3State, ats, false),
> DEFINE_PROP_UINT8("oas", SMMUv3State, oas, 44),
> + DEFINE_PROP_BOOL("pasid", SMMUv3State, pasid, false),
> };
Instead of doing a boolean "pasid", perhaps ssidsize and sidsize
should be configurable. Then, user can follow the not-compatible
print to set correct SSIDSIZE and SIDSIZE.
They can also choose to set a higher value if underlying SMMU HW
supports that.
Otherwise,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA
2025-11-20 21:21 ` Nicolin Chen
@ 2025-11-21 9:57 ` Shameer Kolothum
2025-11-21 17:56 ` Nicolin Chen
0 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-21 9:57 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: 20 November 2025 21:22
> To: Shameer Kolothum <skolothumtho@nvidia.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> <jgg@nvidia.com>; ddutile@redhat.com; berrange@redhat.com; Nathan
> Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> Krishnakant Jaju <kjaju@nvidia.com>
> Subject: Re: [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a
> direct MSI doorbell GPA
>
> On Thu, Nov 20, 2025 at 01:21:57PM +0000, Shameer Kolothum wrote:
> > Accelerated SMMUv3 instances rely on the physical SMMUv3 for nested
> > translation (Guest Stage-1, Host Stage-2). In this mode the guest’s
> > Stage-1 tables are programmed directly into hardware, and QEMU should
> > not attempt to walk them for translation since doing so is not reliably
> > safe. For vfio-pci endpoints behind such a vSMMU, the only translation
> > QEMU is responsible for is the MSI doorbell used during KVM MSI setup.
> >
> > Add a device property to carry the MSI doorbell GPA from the virt
> > machine, and expose it through a new get_msi_direct_gpa PCIIOMMUOp.
> > kvm_arch_fixup_msi_route() can then use this GPA directly instead of
> > attempting a software walk of guest translation tables.
> >
> > This enables correct MSI routing with accelerated SMMUv3 while avoiding
> > unsafe accesses to page tables.
> >
> > For meaningful use of vfio-pci devices with accelerated SMMUv3, both KVM
> > and a kernel irqchip are required. Enforce this requirement when accel=on
> > is selected.
> >
> > Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
>
> Nits:
>
> > +++ b/hw/arm/virt.c
> > @@ -3052,6 +3052,14 @@ static void
> virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > /* The new SMMUv3 device is specific to the PCI bus */
> > object_property_set_bool(OBJECT(dev), "smmu_per_bus", true,
> NULL);
> > }
> > + if (object_property_find(OBJECT(dev), "accel") &&
> > + object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
>
> Do we need object_property_find()? A later patch seems to drop it.
> Perhaps we shouldn't add it in the first place?
We need that at this stage, as we haven't added the "accel" property yet,
and "make check" tests would fail without it.
We remove it once we introduce the "accel" property later.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support
2025-11-20 21:34 ` Nicolin Chen via
@ 2025-11-21 10:04 ` Shameer Kolothum
0 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-21 10:04 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: 20 November 2025 21:35
> To: Shameer Kolothum <skolothumtho@nvidia.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> <jgg@nvidia.com>; ddutile@redhat.com; berrange@redhat.com; Nathan
> Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> Krishnakant Jaju <kjaju@nvidia.com>
> Subject: Re: [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to
> specify RIL support
>
> On Thu, Nov 20, 2025 at 01:22:07PM +0000, Shameer Kolothum wrote:
> > Currently QEMU SMMUv3 has RIL support by default. But if accelerated
> > mode is enabled, RIL has to be compatible with host SMMUv3 support.
> >
> > Add a property so that the user can specify this.
> >
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> > ---
> > hw/arm/smmuv3-accel.c | 14 ++++++++++++--
> > hw/arm/smmuv3-accel.h | 4 ++++
> > hw/arm/smmuv3.c | 12 ++++++++++++
> > include/hw/arm/smmuv3.h | 1 +
> > 4 files changed, 29 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c index
> > aae7840c40..b6429c8b42 100644
> > --- a/hw/arm/smmuv3-accel.c
> > +++ b/hw/arm/smmuv3-accel.c
> > @@ -62,8 +62,8 @@ smmuv3_accel_check_hw_compatible(SMMUv3State
> *s,
> > return false;
> > }
> >
> > - /* QEMU SMMUv3 supports Range Invalidation by default */
> > - if (FIELD_EX32(info->idr[3], IDR3, RIL) !=
> > + /* User can disable QEMU SMMUv3 Range Invalidation support */
> > + if (FIELD_EX32(info->idr[3], IDR3, RIL) >
> > FIELD_EX32(s->idr[3], IDR3, RIL)) {
>
> When (host) info->idr = 1 > (VM) s->idr = 0, it should work?
Yes, that was my intention.
> So, should it be "<" instead?
And got it wrong 😊. Will correct.
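i.e. the check would become (matching the earlier RIL suggestion):
    if (FIELD_EX32(info->idr[3], IDR3, RIL) <
        FIELD_EX32(s->idr[3], IDR3, RIL)) {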
Thanks,
Shameer
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable
2025-11-20 22:09 ` Nicolin Chen
@ 2025-11-21 10:22 ` Shameer Kolothum
2025-11-21 17:50 ` Nicolin Chen
0 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-21 10:22 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: 20 November 2025 22:10
> To: Shameer Kolothum <skolothumtho@nvidia.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> <jgg@nvidia.com>; ddutile@redhat.com; berrange@redhat.com; Nathan
> Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> Krishnakant Jaju <kjaju@nvidia.com>
> Subject: Re: [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID
> enable
>
> On Thu, Nov 20, 2025 at 01:22:13PM +0000, Shameer Kolothum wrote:
> > +++ b/hw/arm/smmuv3-accel.c
> > @@ -67,6 +67,12 @@ smmuv3_accel_check_hw_compatible(SMMUv3State
> *s,
> > error_setg(errp, "Host SMMUv3 SIDSIZE not compatible");
> > return false;
> > }
> > + /* If user enables PASID support(pasid=on), QEMU sets SSIDSIZE to 16 */
> > + if (FIELD_EX32(info->idr[1], IDR1, SSIDSIZE) <
> > + FIELD_EX32(s->idr[1], IDR1, SSIDSIZE)) {
> > + error_setg(errp, "Host SMMUv3 SSIDSIZE not compatible");
> > + return false;
> > + }
>
> I think we can print the values: host vs VM. And at SIDSIZE above
> as well.
>
> > @@ -2084,6 +2090,7 @@ static const Property smmuv3_properties[] = {
> > DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
> > DEFINE_PROP_BOOL("ats", SMMUv3State, ats, false),
> > DEFINE_PROP_UINT8("oas", SMMUv3State, oas, 44),
> > + DEFINE_PROP_BOOL("pasid", SMMUv3State, pasid, false),
> > };
>
> Instead of doing a boolean "pasid", perhaps ssidsize and sidsize
> should be configurable. Then, user can follow the not-compatible
> print to set correct SSIDSIZE and SIDSIZE.
Do we really need that? Currently both are set to 16, which means 64K
values are supported. I think we can make it configurable when a
use case with a >64K requirement comes up.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback
2025-11-20 20:51 ` Nicolin Chen
@ 2025-11-21 10:38 ` Shameer Kolothum
2025-11-21 17:28 ` Nicolin Chen
0 siblings, 1 reply; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-21 10:38 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju,
Michael S . Tsirkin
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: 20 November 2025 20:51
> To: Shameer Kolothum <skolothumtho@nvidia.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> <jgg@nvidia.com>; ddutile@redhat.com; berrange@redhat.com; Nathan
> Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> Krishnakant Jaju <kjaju@nvidia.com>; Michael S . Tsirkin <mst@redhat.com>
> Subject: Re: [PATCH v6 08/33] hw/pci/pci: Add optional
> supports_address_space() callback
>
> On Thu, Nov 20, 2025 at 01:21:48PM +0000, Shameer Kolothum wrote:
> > Introduce an optional supports_address_space() callback in PCIIOMMUOps
> to
>
> "supports_address_space" sounds a bit to wide to me than its
> indication to supporting an IOMMU address space specifically,
> since the "system address space" being used in this series is
> a legit address space as well.
>
> With that being said, I think we are fine for now, given the
> API docs has clarified it. If someone shares the same concern,
> we can rename it later.
The intent here is just to let the vIOMMU decide whether a device should
be associated with its address_space before we call get_address_space().
If the check passes, the vIOMMU must provide the actual address_space
through the get_address_space() callback.
Sure. Open to suggestions here.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback
2025-11-21 10:38 ` Shameer Kolothum
@ 2025-11-21 17:28 ` Nicolin Chen
2025-11-21 17:32 ` Shameer Kolothum
0 siblings, 1 reply; 61+ messages in thread
From: Nicolin Chen @ 2025-11-21 17:28 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju,
Michael S . Tsirkin
On Fri, Nov 21, 2025 at 02:38:06AM -0800, Shameer Kolothum wrote:
> > -----Original Message-----
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: 20 November 2025 20:51
> > To: Shameer Kolothum <skolothumtho@nvidia.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> > <jgg@nvidia.com>; ddutile@redhat.com; berrange@redhat.com; Nathan
> > Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> > smostafa@google.com; wangzhou1@hisilicon.com;
> > jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> > zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> > Krishnakant Jaju <kjaju@nvidia.com>; Michael S . Tsirkin <mst@redhat.com>
> > Subject: Re: [PATCH v6 08/33] hw/pci/pci: Add optional
> > supports_address_space() callback
> >
> > On Thu, Nov 20, 2025 at 01:21:48PM +0000, Shameer Kolothum wrote:
> > > Introduce an optional supports_address_space() callback in PCIIOMMUOps
> > to
> >
> > "supports_address_space" sounds a bit to wide to me than its
> > indication to supporting an IOMMU address space specifically,
> > since the "system address space" being used in this series is
> > a legit address space as well.
> >
> > With that being said, I think we are fine for now, given the
> > API docs has clarified it. If someone shares the same concern,
> > we can rename it later.
>
> The intent here is just to let the vIOMMU decide whether a device should
> be associated with its address_space before we call get_address_space().
> If the check passes, the vIOMMU must provide the actual address_space
> through the get_address_space() callback.
The naming makes sense now. Yet, the API doc is a bit confusing..
Why does it say "device can have an IOMMU address space"? If a device
only has a system address space (i.e. it doesn't support an IOMMU
address space), it still returns true, right?
Nicolin
^ permalink raw reply [flat|nested] 61+ messages in thread
* RE: [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback
2025-11-21 17:28 ` Nicolin Chen
@ 2025-11-21 17:32 ` Shameer Kolothum
0 siblings, 0 replies; 61+ messages in thread
From: Shameer Kolothum @ 2025-11-21 17:32 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju,
Michael S . Tsirkin
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: 21 November 2025 17:28
> To: Shameer Kolothum <skolothumtho@nvidia.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> <jgg@nvidia.com>; ddutile@redhat.com; berrange@redhat.com; Nathan
> Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> Krishnakant Jaju <kjaju@nvidia.com>; Michael S . Tsirkin <mst@redhat.com>
> Subject: Re: [PATCH v6 08/33] hw/pci/pci: Add optional
> supports_address_space() callback
>
> On Fri, Nov 21, 2025 at 02:38:06AM -0800, Shameer Kolothum wrote:
> > > -----Original Message-----
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: 20 November 2025 20:51
> > > To: Shameer Kolothum <skolothumtho@nvidia.com>
> > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > > eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> > > <jgg@nvidia.com>; ddutile@redhat.com; berrange@redhat.com; Nathan
> > > Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> > > smostafa@google.com; wangzhou1@hisilicon.com;
> > > jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> > > zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> > > Krishnakant Jaju <kjaju@nvidia.com>; Michael S . Tsirkin
> <mst@redhat.com>
> > > Subject: Re: [PATCH v6 08/33] hw/pci/pci: Add optional
> > > supports_address_space() callback
> > >
> > > On Thu, Nov 20, 2025 at 01:21:48PM +0000, Shameer Kolothum wrote:
> > > > Introduce an optional supports_address_space() callback in
> PCIIOMMUOps
> > > to
> > >
> > > "supports_address_space" sounds a bit to wide to me than its
> > > indication to supporting an IOMMU address space specifically,
> > > since the "system address space" being used in this series is
> > > a legit address space as well.
> > >
> > > With that being said, I think we are fine for now, given the
> > > API docs has clarified it. If someone shares the same concern,
> > > we can rename it later.
> >
> > The intent here is just to let the vIOMMU decide whether a device should
> > be associated with its address_space before we call get_address_space().
> > If the check passes, the vIOMMU must provide the actual address_space
> > through the get_address_space() callback.
>
> The naming makes sense now. Yet, the API doc is a bit confusing..
>
> Why does it say "device can have an IOMMU address space"? If a device
> only has a system address space (i.e. it doesn't support an IOMMU
> address space), it still returns true, right?
Yes. I think the API doc wording requires tightening. Will do.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable
2025-11-21 10:22 ` Shameer Kolothum
@ 2025-11-21 17:50 ` Nicolin Chen
2025-11-21 18:36 ` Nicolin Chen
2025-11-21 18:44 ` Jason Gunthorpe
0 siblings, 2 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-21 17:50 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju
On Fri, Nov 21, 2025 at 02:22:21AM -0800, Shameer Kolothum wrote:
> > > @@ -2084,6 +2090,7 @@ static const Property smmuv3_properties[] = {
> > > DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
> > > DEFINE_PROP_BOOL("ats", SMMUv3State, ats, false),
> > > DEFINE_PROP_UINT8("oas", SMMUv3State, oas, 44),
> > > + DEFINE_PROP_BOOL("pasid", SMMUv3State, pasid, false),
> > > };
> >
> > Instead of doing a boolean "pasid", perhaps ssidsize and sidsize
> > should be configurable. Then, the user can follow the not-compatible
> > print to set the correct SSIDSIZE and SIDSIZE.
>
> Do we really need that? Currently both are set to 16, which means 64K
> values are supported. I think we can make them configurable when any
> use case with a >64K requirement comes up.
For the upper boundary, we have SoCs with SSIDSIZE=0x14, i.e. 20. I
am not sure how user space would use this range, but I feel it is
better not to cap it. And SIDSIZE=16 is probably more than enough
given that QEMU only has one PCI bus domain.
For the lower boundary, the SMMUv3 spec defines:
SSIDSIZE, bits [10:6]
Max bits of SubstreamID.
Valid range 0 to 20 inclusive, 0 meaning no substreams are supported.
and
SIDSIZE, bits [5:0]
Max bits of StreamID.
This value is between 0 and 32 inclusive.
Note: 0 is a legal value. In this case the SMMU supports one stream.
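For reference, the two fields quoted above can be decoded with plain shift/mask
arithmetic. A minimal, self-contained sketch (the register and helper names are
illustrative; the layout is the one quoted above, i.e. SMMU_IDR1 in the spec):

#include <stdint.h>
#include <stdio.h>

/* SSIDSIZE: bits [10:6]; SIDSIZE: bits [5:0] of the ID register quoted above. */
static unsigned idr_ssidsize(uint32_t idr) { return (idr >> 6) & 0x1f; }
static unsigned idr_sidsize(uint32_t idr)  { return idr & 0x3f; }

int main(void)
{
    uint32_t idr = (20u << 6) | 16u;    /* e.g. SSIDSIZE=20 (0x14), SIDSIZE=16 */

    printf("SubstreamID bits: %u -> up to %u substreams\n",
           idr_ssidsize(idr), 1u << idr_ssidsize(idr));
    printf("StreamID bits:    %u\n", idr_sidsize(idr));
    return 0;
}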
We apply a hard requirement that the host value must be >= the VM
value. This might not work for hardware that has smaller numbers.
Yes, we may add an SIDSIZE property when somebody actually wants it.
But the "bool pasid" would become useless once we add an SSIDSIZE.
So, I think it's nicer to define a "uint32 ssidsize" in the first
place, which also aligns the QEMU parameter with the HW naming.
Nicolin
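A rough sketch of the direction suggested above, replacing the boolean "pasid"
property with a configurable "ssidsize" and enforcing the host >= VM rule. The
property name follows this thread, but the struct field, helper names, and
wiring are assumptions for illustration, not the actual patch:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/qdev-properties.h"
#include "hw/arm/smmuv3.h"

/* Assumes SMMUv3State grows a "uint8_t ssidsize" field (hypothetical). */
static const Property smmuv3_example_properties[] = {
    DEFINE_PROP_UINT8("ssidsize", SMMUv3State, ssidsize, 0), /* 0: no substreams */
};

/* Host value must be >= the VM value, per the hard requirement above. */
static bool smmuv3_example_check_ssidsize(uint8_t vm_ssidsize,
                                          uint8_t host_ssidsize, Error **errp)
{
    if (vm_ssidsize > 20) {
        error_setg(errp, "ssidsize %u exceeds the architectural maximum of 20",
                   vm_ssidsize);
        return false;
    }
    if (vm_ssidsize > host_ssidsize) {
        error_setg(errp, "host SSIDSIZE (%u) < requested ssidsize (%u)",
                   host_ssidsize, vm_ssidsize);
        return false;
    }
    return true;
}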
* Re: [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA
2025-11-21 9:57 ` Shameer Kolothum
@ 2025-11-21 17:56 ` Nicolin Chen
0 siblings, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-21 17:56 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju
On Fri, Nov 21, 2025 at 01:57:37AM -0800, Shameer Kolothum wrote:
> > On Thu, Nov 20, 2025 at 01:21:57PM +0000, Shameer Kolothum wrote:
> > > Accelerated SMMUv3 instances rely on the physical SMMUv3 for nested
> > > translation (Guest Stage-1, Host Stage-2). In this mode the guest’s
> > > Stage-1 tables are programmed directly into hardware, and QEMU should
> > > not attempt to walk them for translation since doing so is not reliably
> > > safe. For vfio-pci endpoints behind such a vSMMU, the only translation
> > > QEMU is responsible for is the MSI doorbell used during KVM MSI setup.
> > >
> > > Add a device property to carry the MSI doorbell GPA from the virt
> > > machine, and expose it through a new get_msi_direct_gpa PCIIOMMUOp.
> > > kvm_arch_fixup_msi_route() can then use this GPA directly instead of
> > > attempting a software walk of guest translation tables.
> > >
> > > This enables correct MSI routing with accelerated SMMUv3 while avoiding
> > > unsafe accesses to page tables.
> > >
> > > For meaningful use of vfio-pci devices with accelerated SMMUv3, both KVM
> > > and a kernel irqchip are required. Enforce this requirement when accel=on
> > > is selected.
> > >
> > > Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> >
> > Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> >
> > Nits:
> >
> > > +++ b/hw/arm/virt.c
> > > @@ -3052,6 +3052,14 @@ static void
> > virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > > /* The new SMMUv3 device is specific to the PCI bus */
> > > object_property_set_bool(OBJECT(dev), "smmu_per_bus", true,
> > NULL);
> > > }
> > > + if (object_property_find(OBJECT(dev), "accel") &&
> > > + object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
> >
> > Do we need object_property_find()? A later patch seems to drop it.
> > Perhaps we shouldn't add it in the first place?
>
> We need that at this stage because we haven't added the "accel" property
> yet, and the "make check" tests would fail without it.
>
> We remove it once we introduce the "accel" property later.
Hmm, I assume object_property_get_bool() would return false when
"accel" is not available yet? No?
Nicolin
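A rough sketch of the flow described in the commit message above:
kvm_arch_fixup_msi_route() asks the vIOMMU for the doorbell GPA directly and
only falls back to a software translation otherwise. The
pci_device_msi_direct_gpa() helper and both signatures below are assumptions
for illustration, not the actual patch:

#include "qemu/osdep.h"
#include "hw/pci/pci.h"

/*
 * Hypothetical helper: returns true and fills *gpa when the vIOMMU
 * (here, the accelerated SMMUv3) exposes the MSI doorbell GPA directly.
 */
bool pci_device_msi_direct_gpa(PCIDevice *dev, uint64_t *gpa);

/* Existing software-walk path, abstracted for this sketch. */
int example_translate_msi_iova(PCIDevice *dev, uint64_t iova, uint64_t *gpa);

static int example_fixup_msi_route(PCIDevice *dev, uint64_t msi_iova,
                                   uint64_t *doorbell_gpa)
{
    uint64_t gpa;

    if (pci_device_msi_direct_gpa(dev, &gpa)) {
        /* Accelerated case: the virt machine supplied the doorbell GPA. */
        *doorbell_gpa = gpa;
        return 0;
    }

    /* Non-accelerated case: translate the MSI IOVA via the vIOMMU model. */
    return example_translate_msi_iova(dev, msi_iova, doorbell_gpa);
}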
* Re: [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable
2025-11-21 17:50 ` Nicolin Chen
@ 2025-11-21 18:36 ` Nicolin Chen
2025-11-21 18:44 ` Jason Gunthorpe
1 sibling, 0 replies; 61+ messages in thread
From: Nicolin Chen @ 2025-11-21 18:36 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, Jason Gunthorpe, ddutile@redhat.com,
berrange@redhat.com, Nathan Chen, Matt Ochs, smostafa@google.com,
wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com, yi.l.liu@intel.com, Krishnakant Jaju
On Fri, Nov 21, 2025 at 09:50:53AM -0800, Nicolin Chen wrote:
> So, I think it's nicer to define "uint32 ssidsize" in the first
Correction: "u8 ssidsize".
* Re: [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable
2025-11-21 17:50 ` Nicolin Chen
2025-11-21 18:36 ` Nicolin Chen
@ 2025-11-21 18:44 ` Jason Gunthorpe
1 sibling, 0 replies; 61+ messages in thread
From: Jason Gunthorpe @ 2025-11-21 18:44 UTC (permalink / raw)
To: Nicolin Chen
Cc: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
eric.auger@redhat.com, peter.maydell@linaro.org,
ddutile@redhat.com, berrange@redhat.com, Nathan Chen, Matt Ochs,
smostafa@google.com, wangzhou1@hisilicon.com,
jiangkunkun@huawei.com, jonathan.cameron@huawei.com,
zhangfei.gao@linaro.org, zhenzhong.duan@intel.com,
yi.l.liu@intel.com, Krishnakant Jaju
On Fri, Nov 21, 2025 at 09:50:50AM -0800, Nicolin Chen wrote:
> On Fri, Nov 21, 2025 at 02:22:21AM -0800, Shameer Kolothum wrote:
> > > > @@ -2084,6 +2090,7 @@ static const Property smmuv3_properties[] = {
> > > > DEFINE_PROP_BOOL("ril", SMMUv3State, ril, true),
> > > > DEFINE_PROP_BOOL("ats", SMMUv3State, ats, false),
> > > > DEFINE_PROP_UINT8("oas", SMMUv3State, oas, 44),
> > > > + DEFINE_PROP_BOOL("pasid", SMMUv3State, pasid, false),
> > > > };
> > >
> > > Instead of doing a boolean "pasid", perhaps ssidsize and sidsize
> > > should be configurable. Then, the user can follow the not-compatible
> > > print to set the correct SSIDSIZE and SIDSIZE.
> >
> > Do we really need that? Currently both are set to 16, which means 64K
> > values are supported. I think we can make them configurable when any
> > use case with a >64K requirement comes up.
>
> For the upper boundary, we have SoCs with SSIDSIZE=0x14, i.e. 20. I
> am not sure how user space would use this range, but I feel it is
> better not to cap it. And SIDSIZE=16 is probably more than enough
> given that QEMU only has one PCI bus domain.
Yeah, it should be ssidsize, not pasid.
The use case for these values is exactly defining an SMMU instance type
so that it can migrate between different physical HW, so long as the
physical HW can implement the instance type.
Thus you broadly want to make the iidrs configurable in the exact
spec language of the iidrs, IMHO.
Jason
Thread overview: 61+ messages
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 01/33] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 02/33] backends/iommufd: Introduce iommufd_backend_alloc_vdev Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 03/33] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 04/33] hw/arm/smmu-common: Make iommu ops part of SMMUState Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 05/33] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 06/33] hw/arm/smmuv3-accel: Initialize shared system address space Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 07/33] hw/pci/pci: Move pci_init_bus_master() after adding device to bus Shameer Kolothum
2025-11-20 20:44 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback Shameer Kolothum
2025-11-20 20:51 ` Nicolin Chen
2025-11-21 10:38 ` Shameer Kolothum
2025-11-21 17:28 ` Nicolin Chen
2025-11-21 17:32 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 09/33] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header Shameer Kolothum
2025-11-20 20:52 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 10/33] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 11/33] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 12/33] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 13/33] hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller Shameer Kolothum
2025-11-20 20:59 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 14/33] hw/arm/smmuv3-accel: Add nested vSTE install/uninstall support Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 15/33] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt Shameer Kolothum
2025-11-20 21:03 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly Shameer Kolothum
2025-11-20 21:05 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA Shameer Kolothum
2025-11-20 21:21 ` Nicolin Chen
2025-11-21 9:57 ` Shameer Kolothum
2025-11-21 17:56 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 18/33] hw/arm/smmuv3-accel: Add support to issue invalidation cmd to host Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 19/33] hw/arm/smmuv3: Initialize ID registers early during realize() Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate Shameer Kolothum
2025-11-20 21:27 ` Nicolin Chen
2025-11-20 21:30 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 21/33] hw/pci-host/gpex: Allow to generate preserve boot config DSM #5 Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 22/33] hw/arm/virt: Set PCI preserve_config for accel SMMUv3 Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 23/33] tests/qtest/bios-tables-test: Prepare for IORT revison upgrade Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 24/33] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 25/33] tests/qtest/bios-tables-test: Update IORT blobs after revision upgrade Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 26/33] hw/arm/smmuv3: Add accel property for SMMUv3 device Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support Shameer Kolothum
2025-11-20 21:34 ` Nicolin Chen
2025-11-21 10:04 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 28/33] hw/arm/smmuv3-accel: Add support for ATS Shameer Kolothum
2025-11-20 21:40 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 29/33] hw/arm/smmuv3-accel: Add property to specify OAS bits Shameer Kolothum
2025-11-20 21:47 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 30/33] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info() Shameer Kolothum
2025-11-20 21:50 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 31/33] Extend get_cap() callback to support PASID Shameer Kolothum
2025-11-20 21:56 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM Shameer Kolothum
2025-11-20 21:59 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable Shameer Kolothum
2025-11-20 22:09 ` Nicolin Chen
2025-11-21 10:22 ` Shameer Kolothum
2025-11-21 17:50 ` Nicolin Chen
2025-11-21 18:36 ` Nicolin Chen
2025-11-21 18:44 ` Jason Gunthorpe
2025-11-20 17:06 ` [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen