[RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
@ 2025-07-14 15:59 Shameer Kolothum via
  2025-07-14 15:59 ` [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
                   ` (16 more replies)
  0 siblings, 17 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

Hi All,

This patch series introduces initial support for a user-creatable,
accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.

This is based on the user-creatable SMMUv3 device series [0].

Why this is needed:

On ARM, to enable vfio-pci pass-through devices in a VM, the host SMMUv3
must be set up in nested translation mode (Stage 1 + Stage 2), with
Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by the host.

This series introduces an optional accel property for the SMMUv3 device,
indicating that the guest will try to leverage host SMMUv3 features for
acceleration. By default, enabling accel configures the host SMMUv3 in
nested mode to support vfio-pci pass-through.

This new accelerated, user-creatable SMMUv3 device lets you:

 -Set up a VM with multiple SMMUv3s, each tied to a different physical SMMUv3
  on the host. Typically, you’d have multiple PCIe PXB root complexes in the
  VM (one per virtual NUMA node), and each of them can have its own SMMUv3.
  This setup mirrors the host's layout, where each NUMA node has its own
  SMMUv3, and helps build VMs that are more aligned with the host's NUMA
  topology.

 -The host–guest SMMUv3 association results in reduced invalidation broadcasts
  and lookups for devices behind different physical SMMUv3s.

 -Simplifies handling of host SMMUv3s with differing feature sets.

 -Lays the groundwork for additional capabilities like vCMDQ support.

Changes from RFCv2[1] and key points in RFCv3:

 -Unlike RFCv2, there is no arm-smmuv3-accel device now. The accelerated
  mode is enabled using -device arm-smmuv3,accel=on.

 -When accel=on is specified, the SMMUv3 will allow only vfio-pci endpoint
  devices and any non-endpoint devices like PCI bridges and root ports used
  to plug in the vfio-pci. See patch#6 

 -I have tried to keep this RFC simple and basic so we can focus on the
  structure of this new accelerated support. That means there is no support
  for ATS, PASID, or PRI. Only vfio-pci devices that don’t require these
  features will work.

 -Some clarity is still needed on the final approach to handle MSI translation.
  Hence, RMR support (which is required for this) is not included yet, but
  available in the git branch provided below for testing.
 
 -At least one vfio-pci device must currently be cold-plugged to a PCIe root
  complex associated with arm-smmuv3,accel=on. This is required to:
  1. associate a guest SMMUv3 with a host SMMUv3
  2. retrieve the host SMMUv3 feature registers for guest export
  This still needs discussion, as there were concerns previously about this
  approach and it also breaks hotplug/unplug scenarios. See patch#14

 -This version does not yet support host SMMUv3 fault handling or other event
  notifications. These will be addressed in a future patch series.

Branch for testing:

This is based on v8 of the SMMUv3 device series and has dependency on the Intel
series here [3].

https://github.com/hisilicon/qemu/tree/smmuv3-dev-v8-accel-rfcv3


Tested on a HiSilicon platform with multiple SMMUv3s.

./qemu-system-aarch64 \
  -machine virt,accel=kvm,gic-version=3 \
  -object iommufd,id=iommufd0 \
  -bios QEMU_EFI \
  -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
  -device virtio-blk-device,drive=fs \
  -drive if=none,file=ubuntu.img,id=fs \
  -kernel Image \
  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
  -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
  -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-reserve=1K \
  -device vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
  -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
  -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
  -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
  -fsdev local,id=p9fs,path=p9root,security_model=mapped \
  -net none \
  -nographic
  

Guest output:
  
root@ubuntu:/# dmesg |grep smmu
 arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
 arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008305)
 arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
 arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
 arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
 arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008305)
 arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
 arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
 arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x0
 arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit (features 0x00008305)
 arm-smmu-v3 arm-smmu-v3.2.auto: allocated 65536 entries for cmdq
 arm-smmu-v3 arm-smmu-v3.2.auto: allocated 32768 entries for evtq
root@ubuntu:/# 

root@ubuntu:/# lspci -tv
-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc Virtio filesystem
 +-[0000:02]---00.0-[03]----00.0  Huawei Technologies Co., Ltd. Device a22e
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0  Huawei Technologies Co., Ltd. Device a251
             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge
root@ubuntu:/# 

root@ubuntu:/# 
root@ubuntu:/# dmesg |grep Adding
 hns3 0000:03:00.0: Adding to iommu group 0
 hisi_zip 0000:00:01.0: Adding to iommu group 1
 pcieport 0000:20:00.0: Adding to iommu group 2
 pcieport 0000:02:00.0: Adding to iommu group 3
 virtio-pci 0000:21:00.0: Adding to iommu group 4

Further tests are always welcome.

Please take a look and let me know your feedback.

Thanks,
Shameer

[0] https://lore.kernel.org/qemu-devel/20250711084749.18300-1-shameerali.kolothum.thodi@huawei.com/
[1] https://lore.kernel.org/qemu-devel/20250311141045.66620-1-shameerali.kolothum.thodi@huawei.com/
[2] https://lore.kernel.org/qemu-devel/20250708110601.633308-1-zhenzhong.duan@intel.com/

Nicolin Chen (8):
  backends/iommufd: Introduce iommufd_backend_alloc_viommu
  backends/iommufd: Introduce iommufd_vdev_alloc
  hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache
    invalidations
  hw/arm/smmuv3: Forward invalidation commands to hw
  Read and validate host SMMUv3 feature bits

Shameer Kolothum (7):
  hw/arm/smmu-common: Factor out common helper functions and export
  hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper
  hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints
    with iommufd
  hw/arm/smmuv3: Implement get_viommu_cap() callback
  hw/pci/pci: Introduce optional get_msi_address_space() callback.
  hw/arm/smmu-common: Add accel property for SMMU dev

 backends/iommufd.c                  |  51 +++
 backends/trace-events               |   2 +
 hw/arm/meson.build                  |   3 +-
 hw/arm/smmu-common.c                |  70 ++-
 hw/arm/smmuv3-accel.c               | 631 ++++++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h               |  93 ++++
 hw/arm/smmuv3-internal.h            |  27 ++
 hw/arm/smmuv3.c                     |  44 +-
 hw/arm/trace-events                 |   5 +
 hw/arm/virt.c                       |  12 +
 hw/pci-bridge/pci_expander_bridge.c |   1 -
 hw/pci/pci.c                        |  19 +
 include/hw/arm/smmu-common.h        |  10 +
 include/hw/arm/smmuv3.h             |   1 +
 include/hw/pci/pci.h                |  16 +
 include/hw/pci/pci_bridge.h         |   1 +
 include/system/iommufd.h            |  19 +
 target/arm/kvm.c                    |   2 +-
 18 files changed, 981 insertions(+), 26 deletions(-)
 create mode 100644 hw/arm/smmuv3-accel.c
 create mode 100644 hw/arm/smmuv3-accel.h

-- 
2.34.1



^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 16:22   ` Nicolin Chen
  2025-07-15  9:14   ` Jonathan Cameron via
  2025-07-14 15:59 ` [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Add a helper to allocate a viommu object.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 backends/iommufd.c       | 25 +++++++++++++++++++++++++
 backends/trace-events    |  1 +
 include/system/iommufd.h |  4 ++++
 3 files changed, 30 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2a33c7ab0b..f3b95ee321 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -446,6 +446,31 @@ bool iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t id,
     return !ret;
 }
 
+bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
+                                  uint32_t viommu_type, uint32_t hwpt_id,
+                                  uint32_t *out_viommu_id, Error **errp)
+{
+    int ret, fd = be->fd;
+    struct iommu_viommu_alloc alloc_viommu = {
+        .size = sizeof(alloc_viommu),
+        .type = viommu_type,
+        .dev_id = dev_id,
+        .hwpt_id = hwpt_id,
+    };
+
+    ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &alloc_viommu);
+
+    trace_iommufd_backend_alloc_viommu(fd, viommu_type, dev_id, hwpt_id,
+                                       alloc_viommu.out_viommu_id, ret);
+    if (ret) {
+        error_setg_errno(errp, errno, "IOMMU_VIOMMU_ALLOC failed");
+        return false;
+    }
+
+    *out_viommu_id = alloc_viommu.out_viommu_id;
+    return true;
+}
+
 bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
                                            uint32_t hwpt_id, Error **errp)
 {
diff --git a/backends/trace-events b/backends/trace-events
index 56132d3fd2..2294af2cc8 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -21,3 +21,4 @@ iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%
 iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
 iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
 iommufd_backend_invalidate_cache(int iommufd, uint32_t id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_viommu(int iommufd, uint32_t type, uint32_t dev_id, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d type=%u dev_id=%u hwpt_id=%u viommu_id=%u (%d)"
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index c9c72ffc45..9acdb20032 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -59,6 +59,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
                                 uint32_t data_type, uint32_t data_len,
                                 void *data_ptr, uint32_t *out_hwpt,
                                 Error **errp);
+bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
+                                  uint32_t viommu_type, uint32_t hwpt_id,
+                                  uint32_t *out_hwpt, Error **errp);
+
 bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
                                         bool start, Error **errp);
 bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
  2025-07-14 15:59 ` [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 16:27   ` Nicolin Chen
  2025-07-15  9:19   ` Jonathan Cameron via
  2025-07-14 15:59 ` [RFC PATCH v3 03/15] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Add a helper to allocate an iommufd device's virtual device (in the user
space) per a viommu instance.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 backends/iommufd.c       | 26 ++++++++++++++++++++++++++
 backends/trace-events    |  1 +
 include/system/iommufd.h |  4 ++++
 3 files changed, 31 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index f3b95ee321..0cafa1a4b7 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -471,6 +471,32 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
     return true;
 }
 
+bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
+                                uint32_t viommu_id, uint64_t virt_id,
+                                uint32_t *out_vdev_id, Error **errp)
+{
+    int ret, fd = be->fd;
+    struct iommu_vdevice_alloc alloc_vdev = {
+        .size = sizeof(alloc_vdev),
+        .viommu_id = viommu_id,
+        .dev_id = dev_id,
+        .virt_id = virt_id,
+    };
+
+    ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &alloc_vdev);
+
+    trace_iommufd_backend_alloc_vdev(fd, dev_id, viommu_id, virt_id,
+                                     alloc_vdev.out_vdevice_id, ret);
+
+    if (ret) {
+        error_setg_errno(errp, errno, "IOMMU_VDEVICE_ALLOC failed");
+        return false;
+    }
+
+    *out_vdev_id = alloc_vdev.out_vdevice_id;
+    return true;
+}
+
 bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
                                            uint32_t hwpt_id, Error **errp)
 {
diff --git a/backends/trace-events b/backends/trace-events
index 2294af2cc8..14399da111 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -22,3 +22,4 @@ iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) "
 iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
 iommufd_backend_invalidate_cache(int iommufd, uint32_t id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
 iommufd_backend_alloc_viommu(int iommufd, uint32_t type, uint32_t dev_id, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d type=%u dev_id=%u hwpt_id=%u viommu_id=%u (%d)"
+iommufd_backend_alloc_vdev(int iommufd, uint32_t dev_id, uint32_t viommu_id, uint64_t virt_id, uint32_t vdev_id, int ret) " iommufd=%d dev_id=%u viommu_id=%u virt_id=0x%"PRIx64" vdev_id=%u (%d)"
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 9acdb20032..6ab3ba3cb6 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -63,6 +63,10 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
                                   uint32_t viommu_type, uint32_t hwpt_id,
                                   uint32_t *out_hwpt, Error **errp);
 
+bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
+                                uint32_t viommu_id, uint64_t virt_id,
+                                uint32_t *out_vdev_id, Error **errp);
+
 bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
                                         bool start, Error **errp);
 bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 03/15] hw/arm/smmu-common: Factor out common helper functions and export
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
  2025-07-14 15:59 ` [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
  2025-07-14 15:59 ` [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-15  9:27   ` Jonathan Cameron via
  2025-07-14 15:59 ` [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper Shameer Kolothum via
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

Subsequent patches for smmuv3 accel support will make use of this.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmu-common.c         | 48 ++++++++++++++++++++++--------------
 include/hw/arm/smmu-common.h |  6 +++++
 2 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index ab920717cf..0f1a06cec2 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -847,12 +847,28 @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
     return NULL;
 }
 
-static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
+void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev,
+                    PCIBus *bus, int devfn)
 {
-    SMMUState *s = opaque;
-    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
-    SMMUDevice *sdev;
     static unsigned int index;
+    char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
+
+    sdev->smmu = s;
+    sdev->bus = bus;
+    sdev->devfn = devfn;
+
+    memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
+                             s->mrtypename,
+                             OBJECT(s), name, UINT64_MAX);
+    address_space_init(&sdev->as,
+                       MEMORY_REGION(&sdev->iommu), name);
+    trace_smmu_add_mr(name);
+    g_free(name);
+}
+
+SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus)
+{
+    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
 
     if (!sbus) {
         sbus = g_malloc0(sizeof(SMMUPciBus) +
@@ -861,23 +877,19 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
         g_hash_table_insert(s->smmu_pcibus_by_busptr, bus, sbus);
     }
 
+    return sbus;
+}
+
+static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
+{
+    SMMUDevice *sdev;
+    SMMUState *s = opaque;
+    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
+
     sdev = sbus->pbdev[devfn];
     if (!sdev) {
-        char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
-
         sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
-
-        sdev->smmu = s;
-        sdev->bus = bus;
-        sdev->devfn = devfn;
-
-        memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
-                                 s->mrtypename,
-                                 OBJECT(s), name, UINT64_MAX);
-        address_space_init(&sdev->as,
-                           MEMORY_REGION(&sdev->iommu), name);
-        trace_smmu_add_mr(name);
-        g_free(name);
+        smmu_init_sdev(s, sdev, bus, devfn);
     }
 
     return &sdev->as;
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 80d0fecfde..c6f899e403 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -180,6 +180,12 @@ OBJECT_DECLARE_TYPE(SMMUState, SMMUBaseClass, ARM_SMMU)
 /* Return the SMMUPciBus handle associated to a PCI bus number */
 SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num);
 
+/* Return the SMMUPciBus handle associated to a PCI bus */
+SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus);
+
+/* Initialize SMMUDevice handle associated to a SMMUPCIBus */
+void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev, PCIBus *bus, int devfn);
+
 /* Return the stream ID of an SMMU device */
 static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (2 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 03/15] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 16:38   ` Nicolin Chen via
                     ` (2 more replies)
  2025-07-14 15:59 ` [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum via
                   ` (12 subsequent siblings)
  16 siblings, 3 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

Allows to retrieve the PCIIOMMUOps based on the SMMU type. This will be
useful when we add support for accelerated SMMUV3 in subsequent patches
as that requires a different set of callbacks for iommu ops.

No special handling is required for now and returns the default ops
in base SMMU Class.

No functional changes intended.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmu-common.c         | 17 +++++++++++++++--
 include/hw/arm/smmu-common.h |  1 +
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 0f1a06cec2..3a1080773a 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -934,6 +934,16 @@ void smmu_inv_notifiers_all(SMMUState *s)
     }
 }
 
+static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState *s)
+{
+    SMMUBaseClass *sbc;
+
+    sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
+    assert(sbc->iommu_ops);
+
+    return sbc->iommu_ops;
+}
+
 static void smmu_base_realize(DeviceState *dev, Error **errp)
 {
     SMMUState *s = ARM_SMMU(dev);
@@ -962,6 +972,7 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
      */
     if (pci_bus_is_express(pci_bus) && pci_bus_is_root(pci_bus) &&
         object_dynamic_cast(OBJECT(pci_bus)->parent, TYPE_PCI_HOST_BRIDGE)) {
+        const PCIIOMMUOps  *iommu_ops;
         /*
          * This condition matches either the default pcie.0, pxb-pcie, or
          * pxb-cxl. For both pxb-pcie and pxb-cxl, parent_dev will be set.
@@ -974,10 +985,11 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
             }
         }
 
+        iommu_ops = smmu_iommu_ops_by_type(s);
         if (s->smmu_per_bus) {
-            pci_setup_iommu_per_bus(pci_bus, &smmu_ops, s);
+            pci_setup_iommu_per_bus(pci_bus, iommu_ops, s);
         } else {
-            pci_setup_iommu(pci_bus, &smmu_ops, s);
+            pci_setup_iommu(pci_bus, iommu_ops, s);
         }
         return;
     }
@@ -1018,6 +1030,7 @@ static void smmu_base_class_init(ObjectClass *klass, const void *data)
     device_class_set_parent_realize(dc, smmu_base_realize,
                                     &sbc->parent_realize);
     rc->phases.exit = smmu_base_reset_exit;
+    sbc->iommu_ops = &smmu_ops;
 }
 
 static const TypeInfo smmu_base_info = {
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index c6f899e403..eb94623555 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -171,6 +171,7 @@ struct SMMUBaseClass {
     /*< public >*/
 
     DeviceRealize parent_realize;
+    const PCIIOMMUOps *iommu_ops;
 
 };
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (3 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 17:23   ` Nicolin Chen
                     ` (2 more replies)
  2025-07-14 15:59 ` [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum via
                   ` (11 subsequent siblings)
  16 siblings, 3 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

Also setup specific PCIIOMMUOps for accel SMMUv3 as accel
SMMUv3 will have different handling for those ops callbacks
in subsequent patches.

The "accel" property is not yet added, so users cannot set it at this
point. It will be introduced in a subsequent patch once the necessary
support is in place.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/meson.build           |  3 +-
 hw/arm/smmu-common.c         |  6 +++-
 hw/arm/smmuv3-accel.c        | 66 ++++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h        | 19 +++++++++++
 include/hw/arm/smmu-common.h |  3 ++
 5 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 hw/arm/smmuv3-accel.c
 create mode 100644 hw/arm/smmuv3-accel.h

diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index dc68391305..6126eb1b64 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -61,7 +61,8 @@ arm_common_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
 arm_common_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
 arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP', if_true: files('fsl-imx8mp.c'))
 arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP_EVK', if_true: files('imx8mp-evk.c'))
-arm_common_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
+arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
+arm_ss.add(when: ['CONFIG_ARM_SMMUV3', 'CONFIG_IOMMUFD'], if_true: files('smmuv3-accel.c'))
 arm_common_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
 arm_common_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
 arm_ss.add(when: 'CONFIG_XEN', if_true: files(
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 3a1080773a..6a58f574d3 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -938,7 +938,11 @@ static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState *s)
 {
     SMMUBaseClass *sbc;
 
-    sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
+    if (s->accel) {
+        sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMUV3_ACCEL));
+    } else {
+        sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
+    }
     assert(sbc->iommu_ops);
 
     return sbc->iommu_ops;
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
new file mode 100644
index 0000000000..2eac9c6ff4
--- /dev/null
+++ b/hw/arm/smmuv3-accel.c
@@ -0,0 +1,66 @@
+/*
+ * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
+ * Copyright (C) 2025 NVIDIA
+ * Written by Nicolin Chen, Shameer Kolothum
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/arm/smmuv3.h"
+#include "smmuv3-accel.h"
+
+static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
+                                                PCIBus *bus, int devfn)
+{
+    SMMUDevice *sdev = sbus->pbdev[devfn];
+    SMMUv3AccelDevice *accel_dev;
+
+    if (sdev) {
+        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+    } else {
+        accel_dev = g_new0(SMMUv3AccelDevice, 1);
+        sdev = &accel_dev->sdev;
+
+        sbus->pbdev[devfn] = sdev;
+        smmu_init_sdev(bs, sdev, bus, devfn);
+    }
+
+    return accel_dev;
+}
+
+static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
+                                              int devfn)
+{
+    SMMUState *bs = opaque;
+    SMMUPciBus *sbus;
+    SMMUv3AccelDevice *accel_dev;
+    SMMUDevice *sdev;
+
+    sbus = smmu_get_sbus(bs, bus);
+    accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
+    sdev = &accel_dev->sdev;
+
+    return &sdev->as;
+}
+
+static const PCIIOMMUOps smmuv3_accel_ops = {
+    .get_address_space = smmuv3_accel_find_add_as,
+};
+
+static void smmuv3_accel_class_init(ObjectClass *oc, const void *data)
+{
+    SMMUBaseClass *sbc = ARM_SMMU_CLASS(oc);
+
+    sbc->iommu_ops = &smmuv3_accel_ops;
+}
+
+static const TypeInfo types[] = {
+    {
+        .name = TYPE_ARM_SMMUV3_ACCEL,
+        .parent = TYPE_ARM_SMMUV3,
+        .class_init = smmuv3_accel_class_init,
+    }
+};
+DEFINE_TYPES(types)
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
new file mode 100644
index 0000000000..4cf30b1291
--- /dev/null
+++ b/hw/arm/smmuv3-accel.h
@@ -0,0 +1,19 @@
+/*
+ * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
+ * Copyright (C) 2025 NVIDIA
+ * Written by Nicolin Chen, Shameer Kolothum
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_ARM_SMMUV3_ACCEL_H
+#define HW_ARM_SMMUV3_ACCEL_H
+
+#include "hw/arm/smmu-common.h"
+#include CONFIG_DEVICES
+
+typedef struct SMMUv3AccelDevice {
+    SMMUDevice  sdev;
+} SMMUv3AccelDevice;
+
+#endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index eb94623555..c459d24427 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -162,6 +162,7 @@ struct SMMUState {
     uint8_t bus_num;
     PCIBus *primary_bus;
     bool smmu_per_bus; /* SMMU is specific to the primary_bus */
+    bool accel; /* SMMU has accelerator support */
 };
 
 struct SMMUBaseClass {
@@ -178,6 +179,8 @@ struct SMMUBaseClass {
 #define TYPE_ARM_SMMU "arm-smmu"
 OBJECT_DECLARE_TYPE(SMMUState, SMMUBaseClass, ARM_SMMU)
 
+#define TYPE_ARM_SMMUV3_ACCEL "arm-smmuv3-accel"
+
 /* Return the SMMUPciBus handle associated to a PCI bus number */
 SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (4 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 18:18   ` Nicolin Chen
                     ` (3 more replies)
  2025-07-14 15:59 ` [RFC PATCH v3 07/15] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum via
                   ` (10 subsequent siblings)
  16 siblings, 4 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

Accelerated SMMUv3 is only useful when the device can take advantage of
the host's SMMUv3 in nested mode. To keep things simple and correct, we
only allow this feature for vfio-pci endpoint devices that use the iommufd
backend. We also allow non-endpoint emulated devices like PCI bridges and
root ports, so that users can plug in these vfio-pci devices.

Another reason for this limit is to avoid problems with IOTLB
invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an associated
SID, making it difficult to trace the originating device. If we allowed
emulated endpoint devices, QEMU would have to invalidate both its own
software IOTLB and the host's hardware IOTLB, which could slow things
down.

Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
translation (S1+S2), their get_address_space() callback must return the
system address space to enable correct S2 mappings of guest RAM.

So in short:
 - vfio-pci devices return the system address space
 - bridges and root ports return the IOMMU address space

Note: On ARM, MSI doorbell addresses are also translated via SMMUv3.
Hence, if a vfio-pci device is behind the SMMuv3 with translation enabled,
it must return the IOMMU address space for MSI. Support for this will be
added in a follow-up patch.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c               | 50 ++++++++++++++++++++++++++++-
 hw/arm/smmuv3-accel.h               | 15 +++++++++
 hw/arm/smmuv3.c                     |  4 +++
 hw/pci-bridge/pci_expander_bridge.c |  1 -
 include/hw/arm/smmuv3.h             |  1 +
 include/hw/pci/pci_bridge.h         |  1 +
 6 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 2eac9c6ff4..0b0ddb03e2 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -7,13 +7,19 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 
 #include "hw/arm/smmuv3.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci-host/gpex.h"
+#include "hw/vfio/pci.h"
+
 #include "smmuv3-accel.h"
 
 static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
                                                 PCIBus *bus, int devfn)
 {
+    SMMUv3State *s = ARM_SMMUV3(bs);
     SMMUDevice *sdev = sbus->pbdev[devfn];
     SMMUv3AccelDevice *accel_dev;
 
@@ -25,30 +31,72 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
 
         sbus->pbdev[devfn] = sdev;
         smmu_init_sdev(bs, sdev, bus, devfn);
+        address_space_init(&accel_dev->as_sysmem, &s->s_accel->root,
+                           "smmuv3-accel-sysmem");
     }
 
     return accel_dev;
 }
 
+static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
+{
+
+    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
+        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||
+        object_dynamic_cast(OBJECT(pdev), "gpex-root")) {
+        return true;
+    } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI) &&
+        object_property_find(OBJECT(pdev), "iommufd"))) {
+        *vfio_pci = true;
+        return true;
+    }
+    return false;
+}
+
 static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
                                               int devfn)
 {
+    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
     SMMUState *bs = opaque;
+    bool vfio_pci = false;
     SMMUPciBus *sbus;
     SMMUv3AccelDevice *accel_dev;
     SMMUDevice *sdev;
 
+    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
+        error_report("Device(%s) not allowed. Only PCIe root complex devices "
+                     "or PCI bridge devices or vfio-pci endpoint devices with "
+                     "iommufd as backend is allowed with arm-smmuv3,accel=on",
+                     pdev->name);
+        exit(1);
+    }
     sbus = smmu_get_sbus(bs, bus);
     accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
     sdev = &accel_dev->sdev;
 
-    return &sdev->as;
+    if (vfio_pci) {
+        return &accel_dev->as_sysmem;
+    } else {
+        return &sdev->as;
+    }
 }
 
 static const PCIIOMMUOps smmuv3_accel_ops = {
     .get_address_space = smmuv3_accel_find_add_as,
 };
 
+void smmuv3_accel_init(SMMUv3State *s)
+{
+    SMMUv3AccelState *s_accel;
+
+    s->s_accel = s_accel = g_new0(SMMUv3AccelState, 1);
+    memory_region_init(&s_accel->root, OBJECT(s), "root", UINT64_MAX);
+    memory_region_init_alias(&s_accel->sysmem, OBJECT(s),
+                             "smmuv3-accel-sysmem", get_system_memory(), 0,
+                             memory_region_size(get_system_memory()));
+    memory_region_add_subregion(&s_accel->root, 0, &s_accel->sysmem);
+}
+
 static void smmuv3_accel_class_init(ObjectClass *oc, const void *data)
 {
     SMMUBaseClass *sbc = ARM_SMMU_CLASS(oc);
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 4cf30b1291..2cd343103f 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -9,11 +9,26 @@
 #ifndef HW_ARM_SMMUV3_ACCEL_H
 #define HW_ARM_SMMUV3_ACCEL_H
 
+#include "hw/arm/smmuv3.h"
 #include "hw/arm/smmu-common.h"
 #include CONFIG_DEVICES
 
 typedef struct SMMUv3AccelDevice {
     SMMUDevice  sdev;
+    AddressSpace as_sysmem;
 } SMMUv3AccelDevice;
 
+typedef struct SMMUv3AccelState {
+    MemoryRegion root;
+    MemoryRegion sysmem;
+} SMMUv3AccelState;
+
+#if defined(CONFIG_ARM_SMMUV3) && defined(CONFIG_IOMMUFD)
+void smmuv3_accel_init(SMMUv3State *s);
+#else
+static inline void smmuv3_accel_init(SMMUv3State *d)
+{
+}
+#endif
+
 #endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index bcf8af8dc7..2f5a8157dd 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -32,6 +32,7 @@
 #include "qapi/error.h"
 
 #include "hw/arm/smmuv3.h"
+#include "smmuv3-accel.h"
 #include "smmuv3-internal.h"
 #include "smmu-internal.h"
 
@@ -1898,6 +1899,9 @@ static void smmu_realize(DeviceState *d, Error **errp)
     sysbus_init_mmio(dev, &sys->iomem);
 
     smmu_init_irq(s, dev);
+    if (sys->accel) {
+        smmuv3_accel_init(s);
+    }
 }
 
 static const VMStateDescription vmstate_smmuv3_queue = {
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index 1bcceddbc4..a8eb2d2426 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -48,7 +48,6 @@ struct PXBBus {
     char bus_path[8];
 };
 
-#define TYPE_PXB_PCIE_DEV "pxb-pcie"
 OBJECT_DECLARE_SIMPLE_TYPE(PXBPCIEDev, PXB_PCIE_DEV)
 
 static GList *pxb_dev_list;
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index d183a62766..3bdb92391a 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -63,6 +63,7 @@ struct SMMUv3State {
     qemu_irq     irq[4];
     QemuMutex mutex;
     char *stage;
+    struct SMMUv3AccelState  *s_accel;
 };
 
 typedef enum {
diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index a055fd8d32..b61360b900 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -106,6 +106,7 @@ typedef struct PXBPCIEDev {
 
 #define TYPE_PXB_PCIE_BUS "pxb-pcie-bus"
 #define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
+#define TYPE_PXB_PCIE_DEV "pxb-pcie"
 #define TYPE_PXB_DEV "pxb"
 OBJECT_DECLARE_SIMPLE_TYPE(PXBDev, PXB_DEV)
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 07/15] hw/arm/smmuv3: Implement get_viommu_cap() callback
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (5 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 18:31   ` Nicolin Chen
  2025-07-14 15:59 ` [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

For accelerated SMMUv3, we need nested parent domain creation. Add the
callback support so that VFIO can create a nested parent.

Since 'accel=on' for SMMUv3 requires the guest SMMUv3 to be configured
in Stage 1 mode, ensure that the 'stage' property is explicitly set to
Stage 1.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c | 15 +++++++++++++++
 hw/arm/virt.c         | 12 ++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 0b0ddb03e2..66cd4f5ece 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -10,6 +10,7 @@
 #include "qemu/error-report.h"
 
 #include "hw/arm/smmuv3.h"
+#include "hw/iommu.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci-host/gpex.h"
 #include "hw/vfio/pci.h"
@@ -81,8 +82,22 @@ static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
     }
 }
 
+static uint64_t smmuv3_accel_get_viommu_cap(void *opaque)
+{
+    /*
+     * Accelerated smmuv3 support only allowes Guest S1
+     * configuration. Hence report VIOMMU_CAP_STAGE1
+     * so that VFIO can create nested parent domain.
+     * The real nested support should be reported from host
+     * SMMUv3 and if it doesn't, the nested parent allocation
+     * will fail anyway.
+     */
+    return VIOMMU_CAP_STAGE1;
+}
+
 static const PCIIOMMUOps smmuv3_accel_ops = {
     .get_address_space = smmuv3_accel_find_add_as,
+    .get_viommu_cap = smmuv3_accel_get_viommu_cap,
 };
 
 void smmuv3_accel_init(SMMUv3State *s)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 22393cf39e..fdb47eda6a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3053,6 +3053,18 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                 return;
             }
 
+            if (object_property_get_bool(OBJECT(dev), "accel", &error_abort)) {
+                char *stage;
+
+                stage = object_property_get_str(OBJECT(dev), "stage",
+                                                &error_fatal);
+                if (*stage && strcmp("1", stage)) {
+                    error_setg(errp, "Only stage1 is supported for SMMUV3 with "
+                               "accel=on");
+                    return;
+                }
+            }
+
             create_smmuv3_dev_dtb(vms, dev, bus);
         }
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (6 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 07/15] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 19:11   ` Nicolin Chen
  2025-07-15 10:29   ` Jonathan Cameron via
  2025-07-14 15:59 ` [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Implement a set_iommu_device callback:
 -If found an existing viommu reuse that.
   (Devices behind the same physical SMMU should share an S2 HWPT)
 -Else,
    Allocate a viommu with the nested parent S2 hwpt allocated by VFIO.
    Allocate bypass and abort hwpt.
 -And add the dev to viommu device list

Also add an unset_iommu_device to unwind/cleanup above.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c    | 154 +++++++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h    |  20 +++++
 hw/arm/trace-events      |   4 +
 include/system/iommufd.h |   6 ++
 4 files changed, 184 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 66cd4f5ece..fe90d48675 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -7,6 +7,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "qemu/error-report.h"
 
 #include "hw/arm/smmuv3.h"
@@ -17,6 +18,9 @@
 
 #include "smmuv3-accel.h"
 
+#define SMMU_STE_VALID      (1ULL << 0)
+#define SMMU_STE_CFG_BYPASS (1ULL << 3)
+
 static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
                                                 PCIBus *bus, int devfn)
 {
@@ -39,6 +43,154 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
     return accel_dev;
 }
 
+static bool
+smmuv3_accel_dev_alloc_viommu(SMMUv3AccelDevice *accel_dev,
+                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)
+{
+    struct iommu_hwpt_arm_smmuv3 bypass_data = {
+        .ste = { SMMU_STE_CFG_BYPASS | SMMU_STE_VALID, 0x0ULL },
+    };
+    struct iommu_hwpt_arm_smmuv3 abort_data = {
+        .ste = { SMMU_STE_VALID, 0x0ULL },
+    };
+    SMMUDevice *sdev = &accel_dev->sdev;
+    SMMUState *bs = sdev->smmu;
+    SMMUv3State *s = ARM_SMMUV3(bs);
+    SMMUv3AccelState *s_accel = s->s_accel;
+    uint32_t s2_hwpt_id = idev->hwpt_id;
+    SMMUS2Hwpt *s2_hwpt;
+    SMMUViommu *viommu;
+    uint32_t viommu_id;
+
+    if (s_accel->viommu) {
+        accel_dev->viommu = s_accel->viommu;
+        return true;
+    }
+
+    if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
+                                      IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
+                                      s2_hwpt_id, &viommu_id, errp)) {
+        return false;
+    }
+
+    viommu = g_new0(SMMUViommu, 1);
+    viommu->core.viommu_id = viommu_id;
+    viommu->core.s2_hwpt_id = s2_hwpt_id;
+    viommu->core.iommufd = idev->iommufd;
+
+    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
+                                    viommu->core.viommu_id, 0,
+                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
+                                    sizeof(abort_data), &abort_data,
+                                    &viommu->abort_hwpt_id, errp)) {
+        goto free_viommu;
+    }
+
+    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
+                                    viommu->core.viommu_id, 0,
+                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
+                                    sizeof(bypass_data), &bypass_data,
+                                    &viommu->bypass_hwpt_id, errp)) {
+        goto free_abort_hwpt;
+    }
+
+    s2_hwpt = g_new(SMMUS2Hwpt, 1);
+    s2_hwpt->iommufd = idev->iommufd;
+    s2_hwpt->hwpt_id = s2_hwpt_id;
+
+    viommu->iommufd = idev->iommufd;
+    viommu->s2_hwpt = s2_hwpt;
+
+    s_accel->viommu = viommu;
+    accel_dev->viommu = viommu;
+    return true;
+
+free_abort_hwpt:
+    iommufd_backend_free_id(idev->iommufd, viommu->abort_hwpt_id);
+free_viommu:
+    iommufd_backend_free_id(idev->iommufd, viommu->core.viommu_id);
+    g_free(viommu);
+    return false;
+}
+
+static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
+                                          HostIOMMUDevice *hiod, Error **errp)
+{
+    HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(hiod);
+    SMMUState *bs = opaque;
+    SMMUv3State *s = ARM_SMMUV3(bs);
+    SMMUv3AccelState *s_accel = s->s_accel;
+    SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
+    SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
+    SMMUDevice *sdev = &accel_dev->sdev;
+
+    if (!idev) {
+        return true;
+    }
+
+    if (accel_dev->idev) {
+        if (accel_dev->idev != idev) {
+            error_report("Device 0x%x already has an associated idev",
+                         smmu_get_sid(sdev));
+            return false;
+        } else {
+            return true;
+        }
+    }
+
+    if (!smmuv3_accel_dev_alloc_viommu(accel_dev, idev, errp)) {
+        error_report("Device 0x%x: Unable to alloc viommu", smmu_get_sid(sdev));
+        return false;
+    }
+
+    accel_dev->idev = idev;
+    QLIST_INSERT_HEAD(&s_accel->viommu->device_list, accel_dev, next);
+    trace_smmuv3_accel_set_iommu_device(devfn, smmu_get_sid(sdev));
+    return true;
+}
+
+static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
+                                            int devfn)
+{
+    SMMUState *bs = opaque;
+    SMMUv3State *s = ARM_SMMUV3(bs);
+    SMMUPciBus *sbus = g_hash_table_lookup(bs->smmu_pcibus_by_busptr, bus);
+    SMMUv3AccelDevice *accel_dev;
+    SMMUViommu *viommu;
+    SMMUDevice *sdev;
+
+    if (!sbus) {
+        return;
+    }
+
+    sdev = sbus->pbdev[devfn];
+    if (!sdev) {
+        return;
+    }
+
+    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+    if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
+                                               accel_dev->idev->hwpt_id,
+                                               NULL)) {
+        error_report("Unable to attach dev to the default HW pagetable");
+    }
+
+    accel_dev->idev = NULL;
+    QLIST_REMOVE(accel_dev, next);
+    trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
+
+    viommu = s->s_accel->viommu;
+    if (QLIST_EMPTY(&viommu->device_list)) {
+        iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
+        iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
+        iommufd_backend_free_id(viommu->iommufd, viommu->core.viommu_id);
+        iommufd_backend_free_id(viommu->iommufd, viommu->s2_hwpt->hwpt_id);
+        g_free(viommu->s2_hwpt);
+        g_free(viommu);
+        s->s_accel->viommu = NULL;
+    }
+}
+
 static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
 {
 
@@ -98,6 +250,8 @@ static uint64_t smmuv3_accel_get_viommu_cap(void *opaque)
 static const PCIIOMMUOps smmuv3_accel_ops = {
     .get_address_space = smmuv3_accel_find_add_as,
     .get_viommu_cap = smmuv3_accel_get_viommu_cap,
+    .set_iommu_device = smmuv3_accel_set_iommu_device,
+    .unset_iommu_device = smmuv3_accel_unset_iommu_device,
 };
 
 void smmuv3_accel_init(SMMUv3State *s)
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 2cd343103f..55a6a353fc 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -11,16 +11,36 @@
 
 #include "hw/arm/smmuv3.h"
 #include "hw/arm/smmu-common.h"
+#include "system/iommufd.h"
+#include <linux/iommufd.h>
 #include CONFIG_DEVICES
 
+typedef struct SMMUS2Hwpt {
+    IOMMUFDBackend *iommufd;
+    uint32_t hwpt_id;
+} SMMUS2Hwpt;
+
+typedef struct SMMUViommu {
+    IOMMUFDBackend *iommufd;
+    IOMMUFDViommu core;
+    SMMUS2Hwpt *s2_hwpt;
+    uint32_t bypass_hwpt_id;
+    uint32_t abort_hwpt_id;
+    QLIST_HEAD(, SMMUv3AccelDevice) device_list;
+} SMMUViommu;
+
 typedef struct SMMUv3AccelDevice {
     SMMUDevice  sdev;
     AddressSpace as_sysmem;
+    HostIOMMUDeviceIOMMUFD *idev;
+    SMMUViommu *viommu;
+    QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
 
 typedef struct SMMUv3AccelState {
     MemoryRegion root;
     MemoryRegion sysmem;
+    SMMUViommu *viommu;
 } SMMUv3AccelState;
 
 #if defined(CONFIG_ARM_SMMUV3) && defined(CONFIG_IOMMUFD)
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index f3386bd7ae..c4537ca1d6 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -66,6 +66,10 @@ smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s
 smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
 smmu_reset_exit(void) ""
 
+#smmuv3-accel.c
+smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
+smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
+
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
 strongarm_ssp_read_underrun(void) "SSP rx underrun"
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 6ab3ba3cb6..b7ad2cf10c 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -38,6 +38,12 @@ struct IOMMUFDBackend {
     /*< public >*/
 };
 
+typedef struct IOMMUFDViommu {
+    IOMMUFDBackend *iommufd;
+    uint32_t s2_hwpt_id;
+    uint32_t viommu_id;
+} IOMMUFDViommu;
+
 bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
 void iommufd_backend_disconnect(IOMMUFDBackend *be);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (7 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 19:37   ` Nicolin Chen
  2025-07-15 23:12   ` Nicolin Chen
  2025-07-14 15:59 ` [RFC PATCH v3 10/15] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
                   ` (7 subsequent siblings)
  16 siblings, 2 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Allocates a s1 HWPT for the Guest s1 stage and attaches that
to the dev. This will be invoked when Guest issues
SMMU_CMD_CFGI_STE/STE_RANGE.

While at it, we are also exporting both smmu_find_ste() and
smmuv3_flush_config() from smmuv3.c for use here.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c    | 130 +++++++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h    |  17 +++++
 hw/arm/smmuv3-internal.h |   4 ++
 hw/arm/smmuv3.c          |   8 ++-
 hw/arm/trace-events      |   1 +
 5 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index fe90d48675..74bf20cfaf 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -18,9 +18,139 @@
 
 #include "smmuv3-accel.h"
 
+#include "smmuv3-internal.h"
+
 #define SMMU_STE_VALID      (1ULL << 0)
 #define SMMU_STE_CFG_BYPASS (1ULL << 3)
 
+static void
+smmuv3_accel_dev_uninstall_nested_ste(SMMUv3AccelDevice *accel_dev, bool abort)
+{
+    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
+    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
+    uint32_t hwpt_id;
+
+    if (!s1_hwpt || !accel_dev->viommu) {
+        return;
+    }
+
+    if (abort) {
+        hwpt_id = accel_dev->viommu->abort_hwpt_id;
+    } else {
+        hwpt_id = accel_dev->viommu->bypass_hwpt_id;
+    }
+
+    host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, &error_abort);
+    iommufd_backend_free_id(s1_hwpt->iommufd, s1_hwpt->hwpt_id);
+    accel_dev->s1_hwpt = NULL;
+    g_free(s1_hwpt);
+}
+
+static int
+smmuv3_accel_dev_install_nested_ste(SMMUv3AccelDevice *accel_dev,
+                                    uint32_t data_type, uint32_t data_len,
+                                    void *data)
+{
+    SMMUViommu *viommu = accel_dev->viommu;
+    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
+    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
+    uint32_t flags = 0;
+
+    if (!idev || !viommu) {
+        return -ENOENT;
+    }
+
+    if (s1_hwpt) {
+        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, true);
+    }
+
+    s1_hwpt = g_new0(SMMUS1Hwpt, 1);
+    s1_hwpt->iommufd = idev->iommufd;
+    iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
+                               viommu->core.viommu_id, flags, data_type,
+                               data_len, data, &s1_hwpt->hwpt_id, &error_abort);
+    host_iommu_device_iommufd_attach_hwpt(idev, s1_hwpt->hwpt_id, &error_abort);
+    accel_dev->s1_hwpt = s1_hwpt;
+    return 0;
+}
+
+void smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid)
+{
+    SMMUv3AccelDevice *accel_dev;
+    SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
+                           .inval_ste_allowed = true};
+    struct iommu_hwpt_arm_smmuv3 nested_data = {};
+    uint32_t config;
+    STE ste;
+    int ret;
+
+    if (!bs->accel) {
+        return;
+    }
+
+    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+    if (!accel_dev->viommu) {
+        return;
+    }
+
+    ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
+    if (ret) {
+        error_report("failed to find STE for sid 0x%x", sid);
+        return;
+    }
+
+    config = STE_CONFIG(&ste);
+    if (!STE_VALID(&ste) || !STE_CFG_S1_ENABLED(config)) {
+        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, STE_CFG_ABORT(config));
+        smmuv3_flush_config(sdev);
+        return;
+    }
+
+    nested_data.ste[0] = (uint64_t)ste.word[0] | (uint64_t)ste.word[1] << 32;
+    nested_data.ste[1] = (uint64_t)ste.word[2] | (uint64_t)ste.word[3] << 32;
+    /* V | CONFIG | S1FMT | S1CTXPTR | S1CDMAX */
+    nested_data.ste[0] &= 0xf80fffffffffffffULL;
+    /* S1DSS | S1CIR | S1COR | S1CSH | S1STALLD | EATS */
+    nested_data.ste[1] &= 0x380000ffULL;
+    ret = smmuv3_accel_dev_install_nested_ste(accel_dev,
+                                              IOMMU_HWPT_DATA_ARM_SMMUV3,
+                                              sizeof(nested_data),
+                                              &nested_data);
+    if (ret) {
+        error_report("Unable to install nested STE=%16LX:%16LX, sid=0x%x,"
+                      "ret=%d", nested_data.ste[1], nested_data.ste[0],
+                      sid, ret);
+    }
+
+    trace_smmuv3_accel_install_nested_ste(sid, nested_data.ste[1],
+                                          nested_data.ste[0]);
+}
+
+static void
+smmuv3_accel_ste_range(gpointer key, gpointer value, gpointer user_data)
+{
+    SMMUDevice *sdev = (SMMUDevice *)key;
+    uint32_t sid = smmu_get_sid(sdev);
+    SMMUSIDRange *sid_range = (SMMUSIDRange *)user_data;
+
+    if (sid >= sid_range->start && sid <= sid_range->end) {
+        SMMUv3State *s = sdev->smmu;
+        SMMUState *bs = &s->smmu_state;
+
+        smmuv3_accel_install_nested_ste(bs, sdev, sid);
+    }
+}
+
+void
+smmuv3_accel_install_nested_ste_range(SMMUState *bs, SMMUSIDRange *range)
+{
+    if (!bs->accel) {
+        return;
+    }
+
+    g_hash_table_foreach(bs->configs, smmuv3_accel_ste_range, range);
+}
+
 static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
                                                 PCIBus *bus, int devfn)
 {
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 55a6a353fc..06e81b630d 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -29,10 +29,16 @@ typedef struct SMMUViommu {
     QLIST_HEAD(, SMMUv3AccelDevice) device_list;
 } SMMUViommu;
 
+typedef struct SMMUS1Hwpt {
+    IOMMUFDBackend *iommufd;
+    uint32_t hwpt_id;
+} SMMUS1Hwpt;
+
 typedef struct SMMUv3AccelDevice {
     SMMUDevice  sdev;
     AddressSpace as_sysmem;
     HostIOMMUDeviceIOMMUFD *idev;
+    SMMUS1Hwpt  *s1_hwpt;
     SMMUViommu *viommu;
     QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
@@ -45,10 +51,21 @@ typedef struct SMMUv3AccelState {
 
 #if defined(CONFIG_ARM_SMMUV3) && defined(CONFIG_IOMMUFD)
 void smmuv3_accel_init(SMMUv3State *s);
+void smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid);
+void smmuv3_accel_install_nested_ste_range(SMMUState *bs,
+                                           SMMUSIDRange *range);
 #else
 static inline void smmuv3_accel_init(SMMUv3State *d)
 {
 }
+static inline void
+smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid)
+{
+}
+static inline void
+smmuv3_accel_install_nested_ste_range(SMMUState *bs, SMMUSIDRange *range)
+{
+}
 #endif
 
 #endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index b6b7399347..738061c6ad 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -547,6 +547,10 @@ typedef struct CD {
     uint32_t word[16];
 } CD;
 
+int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
+                  SMMUEventInfo *event);
+void smmuv3_flush_config(SMMUDevice *sdev);
+
 /* STE fields */
 
 #define STE_VALID(x)   extract32((x)->word[0], 0, 1)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2f5a8157dd..c94bfe6564 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -630,8 +630,8 @@ bad_ste:
  * Supports linear and 2-level stream table
  * Return 0 on success, -EINVAL otherwise
  */
-static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
-                         SMMUEventInfo *event)
+int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
+                  SMMUEventInfo *event)
 {
     dma_addr_t addr, strtab_base;
     uint32_t log2size;
@@ -900,7 +900,7 @@ static SMMUTransCfg *smmuv3_get_config(SMMUDevice *sdev, SMMUEventInfo *event)
     return cfg;
 }
 
-static void smmuv3_flush_config(SMMUDevice *sdev)
+void smmuv3_flush_config(SMMUDevice *sdev)
 {
     SMMUv3State *s = sdev->smmu;
     SMMUState *bc = &s->smmu_state;
@@ -1342,6 +1342,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 
             trace_smmuv3_cmdq_cfgi_ste(sid);
             smmuv3_flush_config(sdev);
+            smmuv3_accel_install_nested_ste(bs, sdev, sid);
 
             break;
         }
@@ -1361,6 +1362,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
             sid_range.end = sid_range.start + mask;
 
             trace_smmuv3_cmdq_cfgi_ste_range(sid_range.start, sid_range.end);
+            smmuv3_accel_install_nested_ste_range(bs, &sid_range);
             smmu_configs_inv_sid_range(bs, sid_range);
             break;
         }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index c4537ca1d6..7d232ca17c 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -69,6 +69,7 @@ smmu_reset_exit(void) ""
 #smmuv3-accel.c
 smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
 smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
+smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
 
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 10/15] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (8 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 19:43   ` Nicolin Chen
  2025-07-14 15:59 ` [RFC PATCH v3 11/15] hw/pci/pci: Introduce optional get_msi_address_space() callback Shameer Kolothum via
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Allocate and associate a vDEVICE object for the Guest device
with the vIOMMU. This will help the kernel to do the
vSID --> sid translation whenever required (eg: device specific
invalidations).

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c    | 25 +++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h    |  1 +
 include/system/iommufd.h |  5 +++++
 3 files changed, 31 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 74bf20cfaf..f1584dd775 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -93,6 +93,23 @@ void smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid)
         return;
     }
 
+    if (!accel_dev->vdev && accel_dev->idev) {
+        IOMMUFDVdev *vdev;
+        uint32_t vdev_id;
+        SMMUViommu *viommu = accel_dev->viommu;
+
+        iommufd_backend_alloc_vdev(viommu->core.iommufd, accel_dev->idev->devid,
+                                   viommu->core.viommu_id, sid, &vdev_id,
+                                   &error_abort);
+        vdev = g_new(IOMMUFDVdev, 1);
+        vdev->vdev_id = vdev_id;
+        vdev->dev_id = sid;
+        accel_dev->vdev = vdev;
+        host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
+                                              accel_dev->viommu->bypass_hwpt_id,
+                                              &error_abort);
+    }
+
     ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
     if (ret) {
         error_report("failed to find STE for sid 0x%x", sid);
@@ -287,6 +304,7 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
     SMMUPciBus *sbus = g_hash_table_lookup(bs->smmu_pcibus_by_busptr, bus);
     SMMUv3AccelDevice *accel_dev;
     SMMUViommu *viommu;
+    IOMMUFDVdev *vdev;
     SMMUDevice *sdev;
 
     if (!sbus) {
@@ -310,6 +328,13 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
     trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
 
     viommu = s->s_accel->viommu;
+    vdev = accel_dev->vdev;
+    if (vdev) {
+        iommufd_backend_free_id(viommu->iommufd, vdev->vdev_id);
+        g_free(vdev);
+        accel_dev->vdev = NULL;
+    }
+
     if (QLIST_EMPTY(&viommu->device_list)) {
         iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
         iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 06e81b630d..21028e60c8 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -40,6 +40,7 @@ typedef struct SMMUv3AccelDevice {
     HostIOMMUDeviceIOMMUFD *idev;
     SMMUS1Hwpt  *s1_hwpt;
     SMMUViommu *viommu;
+    IOMMUFDVdev  *vdev;
     QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
 
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index b7ad2cf10c..8de559d448 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -44,6 +44,11 @@ typedef struct IOMMUFDViommu {
     uint32_t viommu_id;
 } IOMMUFDViommu;
 
+typedef struct IOMMUFDVdev {
+    uint32_t vdev_id;
+    uint32_t dev_id;
+} IOMMUFDVdev;
+
 bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
 void iommufd_backend_disconnect(IOMMUFDBackend *be);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 11/15] hw/pci/pci: Introduce optional get_msi_address_space() callback.
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (9 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 10/15] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 19:50   ` Nicolin Chen
  2025-07-14 15:59 ` [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

On ARM, when a device is behind an IOMMU, its MSI doorbell address is
subject to translation by the IOMMU. This behavior affects vfio-pci
passthrough devices assigned to guests using an accelerated SMMUv3.

In this setup, we configure the host SMMUv3 in nested mode, where
VFIO sets up the Stage-2 (S2) mappings for guest RAM, while the guest
controls Stage-1 (S1). To allow VFIO to correctly configure S2 mappings,
we currently return the system address space via the get_address_space()
callback for vfio-pci devices.

However, QEMU/KVM also uses this same callback path when resolving the
address space for MSI doorbells:

kvm_irqchip_add_msi_route()
  kvm_arch_fixup_msi_route()
    pci_device_iommu_address_space()

This leads to problems when MSI doorbells need to be translated.

To fix this, introduce an optional get_msi_address_space() callback.
In the SMMUv3 accelerated case, this callback returns the IOMMU address
space if the guest has set up S1 translations for the vfio-pci device.
Otherwise, it returns the system address space.

Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c | 25 +++++++++++++++++++++++++
 hw/pci/pci.c          | 19 +++++++++++++++++++
 include/hw/pci/pci.h  | 16 ++++++++++++++++
 target/arm/kvm.c      |  2 +-
 4 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index f1584dd775..04c665ccf5 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -346,6 +346,30 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
     }
 }
 
+static AddressSpace *smmuv3_accel_find_msi_as(PCIBus *bus, void *opaque,
+                                                  int devfn)
+{
+    SMMUState *bs = opaque;
+    SMMUPciBus *sbus;
+    SMMUv3AccelDevice *accel_dev;
+    SMMUDevice *sdev;
+
+    sbus = smmu_get_sbus(bs, bus);
+    accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
+    sdev = &accel_dev->sdev;
+
+    /*
+     * If the assigned vfio-pci dev has S1 translation enabled by
+     * Guest, return IOMMU address space for MSI translation.
+     * Otherwise, return system address space.
+     */
+    if (accel_dev->s1_hwpt) {
+        return &sdev->as;
+    } else {
+        return &accel_dev->as_sysmem;
+    }
+}
+
 static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
 {
 
@@ -407,6 +431,7 @@ static const PCIIOMMUOps smmuv3_accel_ops = {
     .get_viommu_cap = smmuv3_accel_get_viommu_cap,
     .set_iommu_device = smmuv3_accel_set_iommu_device,
     .unset_iommu_device = smmuv3_accel_unset_iommu_device,
+    .get_msi_address_space = smmuv3_accel_find_msi_as,
 };
 
 void smmuv3_accel_init(SMMUv3State *s)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 13de0e2809..404aeb643d 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2957,6 +2957,25 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     return &address_space_memory;
 }
 
+AddressSpace *pci_device_iommu_msi_address_space(PCIDevice *dev)
+{
+    PCIBus *bus;
+    PCIBus *iommu_bus;
+    int devfn;
+
+    pci_device_get_iommu_bus_devfn(dev, &iommu_bus, &bus, &devfn);
+    if (iommu_bus) {
+        if (iommu_bus->iommu_ops->get_msi_address_space) {
+            return iommu_bus->iommu_ops->get_msi_address_space(bus,
+                                 iommu_bus->iommu_opaque, devfn);
+        } else {
+            return iommu_bus->iommu_ops->get_address_space(bus,
+                                 iommu_bus->iommu_opaque, devfn);
+        }
+    }
+    return &address_space_memory;
+}
+
 int pci_iommu_init_iotlb_notifier(PCIDevice *dev, IOMMUNotifier *n,
                                   IOMMUNotify fn, void *opaque)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d1d43e9fb9..55138c406e 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -639,12 +639,28 @@ typedef struct PCIIOMMUOps {
                             uint32_t pasid, bool priv_req, bool exec_req,
                             hwaddr addr, bool lpig, uint16_t prgi, bool is_read,
                             bool is_write);
+    /**
+     * @get_msi_address_space: get the address space for MSI doorbell address
+     * for devices
+     *
+     * Optional callback which returns a pointer to an #AddressSpace. This
+     * is required if MSI doorbell also gets translated through IOMMU(eg: ARM)
+     *
+     * @bus: the #PCIBus being accessed.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * @devfn: device and function number
+     */
+    AddressSpace * (*get_msi_address_space)(PCIBus *bus, void *opaque,
+                                            int devfn);
 } PCIIOMMUOps;
 
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp);
 void pci_device_unset_iommu_device(PCIDevice *dev);
+AddressSpace *pci_device_iommu_msi_address_space(PCIDevice *dev);
 
 /**
  * pci_device_get_viommu_cap: get vIOMMU capabilities.
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 6672344855..c78d0d59bb 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1535,7 +1535,7 @@ int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level)
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
                              uint64_t address, uint32_t data, PCIDevice *dev)
 {
-    AddressSpace *as = pci_device_iommu_address_space(dev);
+    AddressSpace *as = pci_device_iommu_msi_address_space(dev);
     hwaddr xlat, len, doorbell_gpa;
     MemoryRegionSection mrs;
     MemoryRegion *mr;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (10 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 11/15] hw/pci/pci: Introduce optional get_msi_address_space() callback Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 19:55   ` Nicolin Chen
  2025-07-15 10:39   ` Jonathan Cameron via
  2025-07-14 15:59 ` [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
                   ` (4 subsequent siblings)
  16 siblings, 2 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Helpers will batch the commands and issue at once to host SMMUv3.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c    | 65 ++++++++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h    | 16 ++++++++++
 hw/arm/smmuv3-internal.h | 12 ++++++++
 3 files changed, 93 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 04c665ccf5..1298b4f6d0 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -168,6 +168,71 @@ smmuv3_accel_install_nested_ste_range(SMMUState *bs, SMMUSIDRange *range)
     g_hash_table_foreach(bs->configs, smmuv3_accel_ste_range, range);
 }
 
+/* Update batch->ncmds to the number of execute cmds */
+bool smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
+{
+    SMMUv3State *s = ARM_SMMUV3(bs);
+    SMMUv3AccelState *s_accel = s->s_accel;
+    uint32_t total = batch->ncmds;
+    IOMMUFDViommu *viommu_core;
+    int ret;
+
+    if (!bs->accel) {
+        return true;
+    }
+
+    if (!s_accel->viommu) {
+        return true;
+    }
+
+    viommu_core = &s_accel->viommu->core;
+    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
+                                           viommu_core->viommu_id,
+                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+                                           sizeof(Cmd), &batch->ncmds,
+                                           batch->cmds, NULL);
+    if (!ret || total != batch->ncmds) {
+        error_report("%s failed: ret=%d, total=%d, done=%d",
+                      __func__, ret, total, batch->ncmds);
+        return ret;
+    }
+
+    batch->ncmds = 0;
+    return ret;
+}
+
+/*
+ * Note: sdev can be NULL for certain invalidation commands
+ * e.g., SMMU_CMD_TLBI_NH_ASID, SMMU_CMD_TLBI_NH_VA etc.
+ */
+void smmuv3_accel_batch_cmd(SMMUState *bs, SMMUDevice *sdev,
+                           SMMUCommandBatch *batch, Cmd *cmd,
+                           uint32_t *cons)
+{
+    if (!bs->accel) {
+        return;
+    }
+
+   /*
+    * We may end up here for any emulated PCI bridge or root port type
+    * devices. The batching of commands only matters for vfio-pci endpoint
+    * devices with Guest S1 translation enabled. Hence check that, if
+    * sdev is available.
+    */
+    if (sdev) {
+        SMMUv3AccelDevice *accel_dev;
+        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+
+        if (!accel_dev->s1_hwpt) {
+            return;
+        }
+    }
+
+    batch->cmds[batch->ncmds] = *cmd;
+    batch->cons[batch->ncmds++] = *cons;
+    return;
+}
+
 static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
                                                 PCIBus *bus, int devfn)
 {
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index 21028e60c8..d06c9664ba 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -13,6 +13,7 @@
 #include "hw/arm/smmu-common.h"
 #include "system/iommufd.h"
 #include <linux/iommufd.h>
+#include "smmuv3-internal.h"
 #include CONFIG_DEVICES
 
 typedef struct SMMUS2Hwpt {
@@ -55,6 +56,10 @@ void smmuv3_accel_init(SMMUv3State *s);
 void smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid);
 void smmuv3_accel_install_nested_ste_range(SMMUState *bs,
                                            SMMUSIDRange *range);
+bool smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch);
+void smmuv3_accel_batch_cmd(SMMUState *bs, SMMUDevice *sdev,
+                           SMMUCommandBatch *batch, struct Cmd *cmd,
+                           uint32_t *cons);
 #else
 static inline void smmuv3_accel_init(SMMUv3State *d)
 {
@@ -67,6 +72,17 @@ static inline void
 smmuv3_accel_install_nested_ste_range(SMMUState *bs, SMMUSIDRange *range)
 {
 }
+static inline bool smmuv3_accel_issue_cmd_batch(SMMUState *bs,
+                                               SMMUCommandBatch *batch)
+{
+    return true;
+}
+static inline void smmuv3_accel_batch_cmd(SMMUState *bs, SMMUDevice *sdev,
+                                          SMMUCommandBatch *batch,
+                                          struct Cmd *cmd, uint32_t *cons)
+{
+    return;
+}
 #endif
 
 #endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 738061c6ad..8cb6a9238a 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -547,6 +547,18 @@ typedef struct CD {
     uint32_t word[16];
 } CD;
 
+/*
+ * SMMUCommandBatch - batch of invalidation commands for accel smmuv3
+ * @cmds: Pointer to list of commands
+ * @cons: Pointer to list of CONS corresponding to the commands
+ * @ncmds: Number of cmds in the batch
+ */
+typedef struct SMMUCommandBatch {
+    struct Cmd *cmds;
+    uint32_t *cons;
+    uint32_t ncmds;
+} SMMUCommandBatch;
+
 int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
                   SMMUEventInfo *event);
 void smmuv3_flush_config(SMMUDevice *sdev);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (11 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-15 10:46   ` Jonathan Cameron via
  2025-07-14 15:59 ` [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits Shameer Kolothum via
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Use the provided smmuv3-accel helper functions to issue the
invalidation commands to host SMMUv3.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-internal.h | 11 +++++++++++
 hw/arm/smmuv3.c          | 28 ++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 8cb6a9238a..f3aeaf6375 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -233,6 +233,17 @@ static inline bool smmuv3_gerror_irq_enabled(SMMUv3State *s)
 #define Q_CONS_WRAP(q) (((q)->cons & WRAP_MASK(q)) >> (q)->log2size)
 #define Q_PROD_WRAP(q) (((q)->prod & WRAP_MASK(q)) >> (q)->log2size)
 
+static inline int smmuv3_q_ncmds(SMMUQueue *q)
+{
+    uint32_t prod = Q_PROD(q);
+    uint32_t cons = Q_CONS(q);
+
+    if (Q_PROD_WRAP(q) == Q_CONS_WRAP(q))
+        return prod - cons;
+    else
+        return WRAP_MASK(q) - cons + prod;
+}
+
 static inline bool smmuv3_q_full(SMMUQueue *q)
 {
     return ((q->cons ^ q->prod) & WRAP_INDEX_MASK(q)) == WRAP_MASK(q);
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index c94bfe6564..97ecca0764 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1285,10 +1285,17 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
     SMMUCmdError cmd_error = SMMU_CERROR_NONE;
     SMMUQueue *q = &s->cmdq;
     SMMUCommandType type = 0;
+    SMMUCommandBatch batch = {};
+    uint32_t ncmds;
 
     if (!smmuv3_cmdq_enabled(s)) {
         return 0;
     }
+
+    ncmds = smmuv3_q_ncmds(q);
+    batch.cmds = g_new0(Cmd, ncmds);
+    batch.cons = g_new0(uint32_t, ncmds);
+
     /*
      * some commands depend on register values, typically CR0. In case those
      * register values change while handling the command, spec says it
@@ -1383,6 +1390,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 
             trace_smmuv3_cmdq_cfgi_cd(sid);
             smmuv3_flush_config(sdev);
+            smmuv3_accel_batch_cmd(sdev->smmu, sdev, &batch, &cmd, &q->cons);
             break;
         }
         case SMMU_CMD_TLBI_NH_ASID:
@@ -1406,6 +1414,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
             trace_smmuv3_cmdq_tlbi_nh_asid(asid);
             smmu_inv_notifiers_all(&s->smmu_state);
             smmu_iotlb_inv_asid_vmid(bs, asid, vmid);
+            smmuv3_accel_batch_cmd(bs, NULL, &batch, &cmd, &q->cons);
             break;
         }
         case SMMU_CMD_TLBI_NH_ALL:
@@ -1433,6 +1442,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
             trace_smmuv3_cmdq_tlbi_nsnh();
             smmu_inv_notifiers_all(&s->smmu_state);
             smmu_iotlb_inv_all(bs);
+            smmuv3_accel_batch_cmd(bs, NULL, &batch, &cmd, &q->cons);
             break;
         case SMMU_CMD_TLBI_NH_VAA:
         case SMMU_CMD_TLBI_NH_VA:
@@ -1441,6 +1451,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
                 break;
             }
             smmuv3_range_inval(bs, &cmd, SMMU_STAGE_1);
+            smmuv3_accel_batch_cmd(bs, NULL, &batch, &cmd, &q->cons);
             break;
         case SMMU_CMD_TLBI_S12_VMALL:
         {
@@ -1499,12 +1510,29 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
         queue_cons_incr(q);
     }
 
+    qemu_mutex_lock(&s->mutex);
+    if (!cmd_error && batch.ncmds) {
+        if (!smmuv3_accel_issue_cmd_batch(bs, &batch)) {
+            if (batch.ncmds) {
+                q->cons = batch.cons[batch.ncmds - 1];
+            } else {
+                q->cons = batch.cons[0]; /* FIXME: Check */
+            }
+            qemu_log_mask(LOG_GUEST_ERROR, "Illegal command type: %d\n",
+                          CMD_TYPE(&batch.cmds[batch.ncmds]));
+            cmd_error = SMMU_CERROR_ILL;
+        }
+    }
+    qemu_mutex_unlock(&s->mutex);
+
     if (cmd_error) {
         trace_smmuv3_cmdq_consume_error(smmu_cmd_string(type), cmd_error);
         smmu_write_cmdq_err(s, cmd_error);
         smmuv3_trigger_irq(s, SMMU_IRQ_GERROR, R_GERROR_CMDQ_ERR_MASK);
     }
 
+    g_free(batch.cmds);
+    g_free(batch.cons);
     trace_smmuv3_cmdq_consume_out(Q_PROD(q), Q_CONS(q),
                                   Q_PROD_WRAP(q), Q_CONS_WRAP(q));
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (12 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 20:04   ` Nicolin Chen via
                     ` (3 more replies)
  2025-07-14 15:59 ` [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev Shameer Kolothum via
                   ` (2 subsequent siblings)
  16 siblings, 4 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

From: Nicolin Chen <nicolinc@nvidia.com>

Not all fields in the SMMU IDR registers are meaningful for userspace.
Only the following fields can be used:

  - IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16, TTF  
  - IDR1: SIDSIZE, SSIDSIZE  
  - IDR3: BBML, RIL  
  - IDR5: VAX, GRAN64K, GRAN16K, GRAN4K

Use the relevant fields from these to check whether the host and emulated
SMMUv3 features are sufficiently aligned to enable accelerated SMMUv3
support.

To retrieve this information from the host, at least one vfio-pci device
must be assigned with "arm-smmuv3,accel=on" usage. Add a check to enforce
this.

Note:

ATS, PASID, and PRI features are currently not supported. Only devices
that do not require or make use of these features are expected to work.

Also, requiring at least one vfio-pci device to be cold-plugged
complicates hot-unplug and replug scenarios. For example, if all devices
behind the vSMMUv3 are unplugged after the guest boots, and a new device
is later hot-plugged into the same PCI bus, there is no guarantee that
the underlying host SMMUv3 will expose the same feature set as the one
originally used when the vSMMU was initialized.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c | 103 ++++++++++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-accel.h |   5 ++
 hw/arm/smmuv3.c       |   4 ++
 hw/arm/trace-events   |   2 +-
 4 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 1298b4f6d0..3b2f45bd88 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -23,6 +23,109 @@
 #define SMMU_STE_VALID      (1ULL << 0)
 #define SMMU_STE_CFG_BYPASS (1ULL << 3)
 
+static int
+smmuv3_accel_host_hw_info(SMMUv3AccelDevice *accel_dev, uint32_t *data_type,
+                          uint32_t data_len, void *data)
+{
+    uint64_t caps;
+
+    if (!accel_dev || !accel_dev->idev) {
+        return -ENOENT;
+    }
+
+    return !iommufd_backend_get_device_info(accel_dev->idev->iommufd,
+                                            accel_dev->idev->devid,
+                                            data_type, data,
+                                            data_len, &caps, NULL);
+}
+
+void smmuv3_accel_init_regs(SMMUv3State *s)
+{
+    SMMUv3AccelState *s_accel = s->s_accel;
+    SMMUv3AccelDevice *accel_dev;
+    uint32_t data_type;
+    uint32_t val;
+    int ret;
+
+    if (s_accel->info.idr[0]) {
+        /* We already got this */
+        return;
+    }
+
+    if (!s_accel->viommu || QLIST_EMPTY(&s_accel->viommu->device_list)) {
+        error_report("For arm-smmuv3,accel=on case, atleast one cold-plugged "
+                     "vfio-pci dev needs to be assigned");
+        goto out_err;
+    }
+
+    accel_dev = QLIST_FIRST(&s_accel->viommu->device_list);
+    ret = smmuv3_accel_host_hw_info(accel_dev, &data_type,
+                                    sizeof(s_accel->info), &s_accel->info);
+    if (ret) {
+        error_report("Failed to get Host SMMU device info");
+        goto out_err;
+    }
+
+    if (data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
+        error_report("Wrong data type (%d) for Host SMMU device info",
+                     data_type);
+        goto out_err;
+    }
+
+    trace_smmuv3_accel_host_hw_info(s_accel->info.idr[0], s_accel->info.idr[1],
+                                    s_accel->info.idr[3], s_accel->info.idr[5]);
+    /*
+     * QEMU SMMUv3 supports both linear and 2-level stream tables. If host
+     * SMMUv3 supports only linear stream table, report that to Guest.
+     */
+    val = FIELD_EX32(s_accel->info.idr[0], IDR0, STLEVEL);
+    if (val < FIELD_EX32(s->idr[0], IDR0, STLEVEL)) {
+        s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STLEVEL, val);
+    }
+
+    /*
+     * QEMU SMMUv3 supports little-endian support for translation table walks.
+     * If host SMMUv3 supports only big-endian, report error.
+     */
+    val = FIELD_EX32(s_accel->info.idr[0], IDR0, TTENDIAN);
+    if (val > FIELD_EX32(s->idr[0], IDR0, TTENDIAN)) {
+        error_report("Host SUUMU device translation table walk endianess "
+                     "not supported");
+        goto out_err;
+    }
+
+    /*
+     * QEMU SMMUv3 supports AArch64 Translation table format.
+     * If host SMMUv3 supports only AArch32, report error.
+     */
+    val = FIELD_EX32(s_accel->info.idr[0], IDR0, TTF);
+    if (val < FIELD_EX32(s->idr[0], IDR0, TTF)) {
+        error_report("Host SMMU device Translation table format not supported");
+        goto out_err;
+    }
+
+    /*
+     * QEMU SMMUv3 supports 4K/16K/64K translation granules. If host SMMUv3
+     * does't support any of these, report the supported ones only to Guest.
+     */
+    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
+    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
+        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
+    }
+    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
+    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
+        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
+    }
+    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
+    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
+        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
+    }
+    return;
+
+out_err:
+    exit(1);
+}
+
 static void
 smmuv3_accel_dev_uninstall_nested_ste(SMMUv3AccelDevice *accel_dev, bool abort)
 {
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index d06c9664ba..e1e99598b4 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -49,6 +49,7 @@ typedef struct SMMUv3AccelState {
     MemoryRegion root;
     MemoryRegion sysmem;
     SMMUViommu *viommu;
+    struct iommu_hw_info_arm_smmuv3 info;
 } SMMUv3AccelState;
 
 #if defined(CONFIG_ARM_SMMUV3) && defined(CONFIG_IOMMUFD)
@@ -60,6 +61,7 @@ bool smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch);
 void smmuv3_accel_batch_cmd(SMMUState *bs, SMMUDevice *sdev,
                            SMMUCommandBatch *batch, struct Cmd *cmd,
                            uint32_t *cons);
+void smmuv3_accel_init_regs(SMMUv3State *s);
 #else
 static inline void smmuv3_accel_init(SMMUv3State *d)
 {
@@ -83,6 +85,9 @@ static inline void smmuv3_accel_batch_cmd(SMMUState *bs, SMMUDevice *sdev,
 {
     return;
 }
+static inline void smmuv3_accel_init_regs(SMMUv3State *s)
+{
+}
 #endif
 
 #endif /* HW_ARM_SMMUV3_ACCEL_H */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 97ecca0764..100e3c8929 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1894,6 +1894,7 @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
  */
 static void smmu_reset_exit(Object *obj, ResetType type)
 {
+    SMMUState *sys = ARM_SMMU(obj);
     SMMUv3State *s = ARM_SMMUV3(obj);
     SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
 
@@ -1903,6 +1904,9 @@ static void smmu_reset_exit(Object *obj, ResetType type)
     }
 
     smmuv3_init_regs(s);
+    if (sys->accel) {
+        smmuv3_accel_init_regs(s);
+    }
 }
 
 static void smmu_realize(DeviceState *d, Error **errp)
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 7d232ca17c..37ecab10a0 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -70,7 +70,7 @@ smmu_reset_exit(void) ""
 smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
 smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
 smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
-
+smmuv3_accel_host_hw_info(uint32_t idr0, uint32_t idr1, uint32_t idr3, uint32_t idr5) "idr0=0x%x idr1=0x%x idr3=0x%x idr5=0x%x"
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
 strongarm_ssp_read_underrun(void) "SSP rx underrun"
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (13 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits Shameer Kolothum via
@ 2025-07-14 15:59 ` Shameer Kolothum via
  2025-07-14 20:00   ` Nicolin Chen
  2025-07-15 10:49   ` Jonathan Cameron via
  2025-07-14 16:14 ` [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen via
  2025-07-15 10:46 ` Duan, Zhenzhong
  16 siblings, 2 replies; 79+ messages in thread
From: Shameer Kolothum via @ 2025-07-14 15:59 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum

Now user can set "accel=on". Have fun!

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmu-common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 6a58f574d3..3e8783670a 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -1022,6 +1022,7 @@ static const Property smmu_dev_properties[] = {
     DEFINE_PROP_BOOL("smmu_per_bus", SMMUState, smmu_per_bus, false),
     DEFINE_PROP_LINK("primary-bus", SMMUState, primary_bus,
                      TYPE_PCI_BUS, PCIBus *),
+    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),
 };
 
 static void smmu_base_class_init(ObjectClass *klass, const void *data)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (14 preceding siblings ...)
  2025-07-14 15:59 ` [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev Shameer Kolothum via
@ 2025-07-14 16:14 ` Nicolin Chen via
  2025-07-14 20:22   ` Nicolin Chen via
  2025-07-15 10:46 ` Duan, Zhenzhong
  16 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen via @ 2025-07-14 16:14 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

Hi Shameer,

Thank you for sending the v3.

On Mon, Jul 14, 2025 at 04:59:26PM +0100, Shameer Kolothum wrote:
> Branch for testing:
[...]
> Tested on a HiSilicon platform with multiple SMMUv3s.
> 
> ./qemu-system-aarch64 \
>   -machine virt,accel=kvm,gic-version=3 \
>   -object iommufd,id=iommufd0 \
>   -bios QEMU_EFI \
>   -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
>   -device virtio-blk-device,drive=fs \
>   -drive if=none,file=ubuntu.img,id=fs \
>   -kernel Image \
>   -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
>   -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
>   -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
>   -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
>   -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-reserve=1K \
>   -device vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
>   -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
>   -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
>   -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
>   -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
>   -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
>   -fsdev local,id=p9fs,path=p9root,security_model=mapped \
>   -net none \
>   -nographic
 
I am looking for that "branch for testing" to try some tests on my
side, but couldn't find one. Would you please share a Github link?

Thanks!
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu
  2025-07-14 15:59 ` [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
@ 2025-07-14 16:22   ` Nicolin Chen
  2025-07-15  9:14   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 16:22 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:27PM +0100, Shameer Kolothum wrote:
> +bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
> +                                  uint32_t viommu_type, uint32_t hwpt_id,
> +                                  uint32_t *out_viommu_id, Error **errp)
> +{
> +    int ret, fd = be->fd;
> +    struct iommu_viommu_alloc alloc_viommu = {
> +        .size = sizeof(alloc_viommu),
> +        .type = viommu_type,
> +        .dev_id = dev_id,
> +        .hwpt_id = hwpt_id,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &alloc_viommu);
> +
> +    trace_iommufd_backend_alloc_viommu(fd, viommu_type, dev_id, hwpt_id,
> +                                       alloc_viommu.out_viommu_id, ret);

Let's do "dev_id, viommu_type, hwpt_id, ..." following the sequence
of the inputs from the function.

> +    if (ret) {
> +        error_setg_errno(errp, errno, "IOMMU_VIOMMU_ALLOC failed");
> +        return false;
> +    }
> +
> +    *out_viommu_id = alloc_viommu.out_viommu_id;

Let's add a g_assert(out_viommu_id) in front of this line.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc
  2025-07-14 15:59 ` [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
@ 2025-07-14 16:27   ` Nicolin Chen
  2025-07-15  9:19   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 16:27 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:28PM +0100, Shameer Kolothum wrote:
> +bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
> +                                uint32_t viommu_id, uint64_t virt_id,
> +                                uint32_t *out_vdev_id, Error **errp)
> +{
> +    int ret, fd = be->fd;
> +    struct iommu_vdevice_alloc alloc_vdev = {
> +        .size = sizeof(alloc_vdev),
> +        .viommu_id = viommu_id,
> +        .dev_id = dev_id,
> +        .virt_id = virt_id,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &alloc_vdev);
> +
> +    trace_iommufd_backend_alloc_vdev(fd, dev_id, viommu_id, virt_id,
> +                                     alloc_vdev.out_vdevice_id, ret);
> +
> +    if (ret) {
> +        error_setg_errno(errp, errno, "IOMMU_VDEVICE_ALLOC failed");
> +        return false;
> +    }
> +
> +    *out_vdev_id = alloc_vdev.out_vdevice_id;

g_assert(out_vdev_id);

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper
  2025-07-14 15:59 ` [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper Shameer Kolothum via
@ 2025-07-14 16:38   ` Nicolin Chen via
  2025-07-15  9:30   ` Jonathan Cameron via
  2025-09-04  7:55   ` Eric Auger
  2 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen via @ 2025-07-14 16:38 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:30PM +0100, Shameer Kolothum wrote:
> Allows to retrieve the PCIIOMMUOps based on the SMMU type. This will be
> useful when we add support for accelerated SMMUV3 in subsequent patches
> as that requires a different set of callbacks for iommu ops.
> 
> No special handling is required for now and returns the default ops
> in base SMMU Class.
> 
> No functional changes intended.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

> +static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState *s)
> +{
> +    SMMUBaseClass *sbc;
> +
> +    sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
> +    assert(sbc->iommu_ops);

I have an impression that QEMU uses the glib version more, so it
could be g_assert(). But I do see a lots of assert() also in the
existing code. So, just a note here, not a strong suggestion.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-14 15:59 ` [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum via
@ 2025-07-14 17:23   ` Nicolin Chen
  2025-09-04 14:33     ` Eric Auger
  2025-07-15  9:45   ` Jonathan Cameron via
  2025-07-15 10:48   ` Duan, Zhenzhong
  2 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 17:23 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:31PM +0100, Shameer Kolothum wrote:
> Also setup specific PCIIOMMUOps for accel SMMUv3 as accel
> SMMUv3 will have different handling for those ops callbacks
> in subsequent patches.
> 
> The "accel" property is not yet added, so users cannot set it at this
> point. It will be introduced in a subsequent patch once the necessary
> support is in place.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Overall the patch looks good to me,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

with some nits:

> @@ -61,7 +61,8 @@ arm_common_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
>  arm_common_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
>  arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP', if_true: files('fsl-imx8mp.c'))
>  arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP_EVK', if_true: files('imx8mp-evk.c'))
> -arm_common_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
> +arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
> +arm_ss.add(when: ['CONFIG_ARM_SMMUV3', 'CONFIG_IOMMUFD'], if_true: files('smmuv3-accel.c'))

Wondering why "arm_common_ss" is changed to "arm_ss"?

> +static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
> +                                                PCIBus *bus, int devfn)

There seems to be an extra space in the 2nd line.

> +{
> +    SMMUDevice *sdev = sbus->pbdev[devfn];
> +    SMMUv3AccelDevice *accel_dev;
> +
> +    if (sdev) {
> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +    } else {
> +        accel_dev = g_new0(SMMUv3AccelDevice, 1);
> +        sdev = &accel_dev->sdev;
> +
> +        sbus->pbdev[devfn] = sdev;
> +        smmu_init_sdev(bs, sdev, bus, devfn);
> +    }

Could just:
    if (sdev) {
        return container_of(sdev, SMMUv3AccelDevice, sdev);
    }

Then, no extra indentations for the rest of the code.

> +
> +    return accel_dev;
> +}
> +
> +static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
> +                                              int devfn)
> +{
> +    SMMUState *bs = opaque;
> +    SMMUPciBus *sbus;
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUDevice *sdev;
> +
> +    sbus = smmu_get_sbus(bs, bus);
> +    accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
> +    sdev = &accel_dev->sdev;

Maybe just:

+    SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
+    SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
+    SMMUDevice *sdev = &accel_dev->sdev;

?

> +typedef struct SMMUv3AccelDevice {
> +    SMMUDevice  sdev;

Let's drop the extra space in between.

> +} SMMUv3AccelDevice;
> +
> +#endif /* HW_ARM_SMMUV3_ACCEL_H */
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index eb94623555..c459d24427 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -162,6 +162,7 @@ struct SMMUState {
>      uint8_t bus_num;
>      PCIBus *primary_bus;
>      bool smmu_per_bus; /* SMMU is specific to the primary_bus */
> +    bool accel; /* SMMU has accelerator support */

How about:
"SMMU is in the HW-accelerated mode for stage-1 translation"
?

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-14 15:59 ` [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum via
@ 2025-07-14 18:18   ` Nicolin Chen
  2025-07-15  9:51   ` Jonathan Cameron via
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 18:18 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:32PM +0100, Shameer Kolothum wrote:
> Accelerated SMMUv3 is only useful when the device can take advantage of
> the host's SMMUv3 in nested mode. To keep things simple and correct, we
> only allow this feature for vfio-pci endpoint devices that use the iommufd
> backend. We also allow non-endpoint emulated devices like PCI bridges and
> root ports, so that users can plug in these vfio-pci devices.
> 
> Another reason for this limit is to avoid problems with IOTLB
> invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an associated
> SID, making it difficult to trace the originating device. If we allowed
> emulated endpoint devices, QEMU would have to invalidate both its own
> software IOTLB and the host's hardware IOTLB, which could slow things
> down.

Change looks good to me,

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

Some nits:

> Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
> translation (S1+S2), their get_address_space() callback must return the
> system address space to enable correct S2 mappings of guest RAM.
>
> So in short:
>  - vfio-pci devices return the system address space
>  - bridges and root ports return the IOMMU address space

I think we can spare a few more words here and in the code too:
(a) A vfio-pci device in an accelerated mode doesn't need any of
    the features from the iommu address space, since translation
    and IOTLB maintenance will be handled by the real hardware.
(b) In the HW accelerated mode, the VFIO core will allocate the
    S2 nesting parent HWPT on top of a core-managed IOAS for S2
    mappings. So, returning the system address space allows to
    take advantage of that.
(c) The reason why bridges and root ports can't use the system
    address space.

Feel free to organize these in your preferred words.

> Note: On ARM, MSI doorbell addresses are also translated via SMMUv3.
> Hence, if a vfio-pci device is behind the SMMuv3 with translation enabled,

s/SMMuv3/SMMUv3
  
> +static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
> +{
> +
> +    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
> +        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||

"TYPE_PXB_PCIE_DEV" since we moved it to the header in this patch.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 07/15] hw/arm/smmuv3: Implement get_viommu_cap() callback
  2025-07-14 15:59 ` [RFC PATCH v3 07/15] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum via
@ 2025-07-14 18:31   ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 18:31 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:33PM +0100, Shameer Kolothum wrote:
> For accelerated SMMUv3, we need nested parent domain creation. Add the
> callback support so that VFIO can create a nested parent.
> 
> Since 'accel=on' for SMMUv3 requires the guest SMMUv3 to be configured
> in Stage 1 mode, ensure that the 'stage' property is explicitly set to
> Stage 1.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

> @@ -81,8 +82,22 @@ static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
>      }
>  }
>  
> +static uint64_t smmuv3_accel_get_viommu_cap(void *opaque)
> +{
> +    /*
> +     * Accelerated smmuv3 support only allowes Guest S1

s/allowes/allows

> +     * configuration. Hence report VIOMMU_CAP_STAGE1
> +     * so that VFIO can create nested parent domain.

Aligning with the kernel uAPI docs:

s/nested/a nesting

> +     * The real nested support should be reported from host

"The real HW nested stage-1 translation must be supported by the .."

> +     * SMMUv3 and if it doesn't, the nested parent allocation

s/nested/nesting

> +     * will fail anyway.
> +     */

And I think the lines are wrapped a bit too early. Should QEMU allow
up-to-80 characters?

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-07-14 15:59 ` [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
@ 2025-07-14 19:11   ` Nicolin Chen
  2025-07-15 10:29   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 19:11 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:34PM +0100, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Implement a set_iommu_device callback:
>  -If found an existing viommu reuse that.
>    (Devices behind the same physical SMMU should share an S2 HWPT)
>  -Else,
>     Allocate a viommu with the nested parent S2 hwpt allocated by VFIO.

s/nested/nesting

>     Allocate bypass and abort hwpt.

Let's spare some words explaining why they are needed:

iommufd provides a vIOMMU model for nested translation support, where its
object encapsulates an S2 nesting parent HWPT. In this mode, devices can't
attach to the S2 HWPT directly, bypassing the iommufd vIOMMU object. Given
that a vIOMMU object isn't directly attachable, allocate two proxy nested
HWPTs (bypass and abort) for devices to attach.

> @@ -7,6 +7,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "trace.h"
>  #include "qemu/error-report.h"

Will look nicer in alphabetical order

>  static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
>                                                  PCIBus *bus, int devfn)

There seems to be an extra space in the 2nd line's indentations.

> +static bool
> +smmuv3_accel_dev_alloc_viommu(SMMUv3AccelDevice *accel_dev,
> +                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)

Ditto

> +{
> +    struct iommu_hwpt_arm_smmuv3 bypass_data = {
> +        .ste = { SMMU_STE_CFG_BYPASS | SMMU_STE_VALID, 0x0ULL },
> +    };
> +    struct iommu_hwpt_arm_smmuv3 abort_data = {
> +        .ste = { SMMU_STE_VALID, 0x0ULL },
> +    };
> +    SMMUDevice *sdev = &accel_dev->sdev;
> +    SMMUState *bs = sdev->smmu;
> +    SMMUv3State *s = ARM_SMMUV3(bs);
> +    SMMUv3AccelState *s_accel = s->s_accel;
> +    uint32_t s2_hwpt_id = idev->hwpt_id;
> +    SMMUS2Hwpt *s2_hwpt;
> +    SMMUViommu *viommu;
> +    uint32_t viommu_id;
> +
> +    if (s_accel->viommu) {
> +        accel_dev->viommu = s_accel->viommu;
> +        return true;
> +    }
> +
> +    if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
> +                                      IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
> +                                      s2_hwpt_id, &viommu_id, errp)) {
> +        return false;
> +    }
> +
> +    viommu = g_new0(SMMUViommu, 1);
> +    viommu->core.viommu_id = viommu_id;
> +    viommu->core.s2_hwpt_id = s2_hwpt_id;
> +    viommu->core.iommufd = idev->iommufd;
> +
> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                                    viommu->core.viommu_id, 0,
> +                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                    sizeof(abort_data), &abort_data,
> +                                    &viommu->abort_hwpt_id, errp)) {
> +        goto free_viommu;
> +    }
> +
> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                                    viommu->core.viommu_id, 0,
> +                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                    sizeof(bypass_data), &bypass_data,
> +                                    &viommu->bypass_hwpt_id, errp)) {
> +        goto free_abort_hwpt;
> +    }
> +
> +    s2_hwpt = g_new(SMMUS2Hwpt, 1);
> +    s2_hwpt->iommufd = idev->iommufd;
> +    s2_hwpt->hwpt_id = s2_hwpt_id;

s2_hwpt is core allocated now, so maybe we don't need this object.

> +
> +    viommu->iommufd = idev->iommufd;
> +    viommu->s2_hwpt = s2_hwpt;
> +
> +    s_accel->viommu = viommu;
> +    accel_dev->viommu = viommu;
> +    return true;
> +
> +free_abort_hwpt:
> +    iommufd_backend_free_id(idev->iommufd, viommu->abort_hwpt_id);
> +free_viommu:
> +    iommufd_backend_free_id(idev->iommufd, viommu->core.viommu_id);
> +    g_free(viommu);
> +    return false;
> +}
> +
> +static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> +                                          HostIOMMUDevice *hiod, Error **errp)
> +{
> +    HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(hiod);
> +    SMMUState *bs = opaque;
> +    SMMUv3State *s = ARM_SMMUV3(bs);
> +    SMMUv3AccelState *s_accel = s->s_accel;
> +    SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
> +    SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
> +    SMMUDevice *sdev = &accel_dev->sdev;
> +
> +    if (!idev) {
> +        return true;
> +    }
> +
> +    if (accel_dev->idev) {
> +        if (accel_dev->idev != idev) {
> +            error_report("Device 0x%x already has an associated idev",
> +                         smmu_get_sid(sdev));
> +            return false;
> +        } else {
> +            return true;
> +        }
> +    }
> +
> +    if (!smmuv3_accel_dev_alloc_viommu(accel_dev, idev, errp)) {
> +        error_report("Device 0x%x: Unable to alloc viommu", smmu_get_sid(sdev));
> +        return false;
> +    }
> +
> +    accel_dev->idev = idev;
> +    QLIST_INSERT_HEAD(&s_accel->viommu->device_list, accel_dev, next);
> +    trace_smmuv3_accel_set_iommu_device(devfn, smmu_get_sid(sdev));

Since we need three direct copies of smmu_get_sid(), it'd be
cleaner to have a local variable?
+    uint16_t sid = smmu_get_sid(sdev);

Or should it have a validation of the sdev pointer like the
unset() does?

> +    return true;
> +}
> +
> +static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
> +                                            int devfn)
> +{
> +    SMMUState *bs = opaque;
> +    SMMUv3State *s = ARM_SMMUV3(bs);
> +    SMMUPciBus *sbus = g_hash_table_lookup(bs->smmu_pcibus_by_busptr, bus);
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUViommu *viommu;
> +    SMMUDevice *sdev;
> +
> +    if (!sbus) {
> +        return;
> +    }
> +
> +    sdev = sbus->pbdev[devfn];
> +    if (!sdev) {
> +        return;
> +    }
> +
> +    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +    if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
> +                                               accel_dev->idev->hwpt_id,
> +                                               NULL)) {
> +        error_report("Unable to attach dev to the default HW pagetable");
> +    }
> +
> +    accel_dev->idev = NULL;
> +    QLIST_REMOVE(accel_dev, next);
> +    trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
> +
> +    viommu = s->s_accel->viommu;
> +    if (QLIST_EMPTY(&viommu->device_list)) {
> +        iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
> +        iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
> +        iommufd_backend_free_id(viommu->iommufd, viommu->core.viommu_id);
> +        iommufd_backend_free_id(viommu->iommufd, viommu->s2_hwpt->hwpt_id);
> +        g_free(viommu->s2_hwpt);

s2_hwpt should be core-managed, so its id should not be free-ed
here at least.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-07-14 15:59 ` [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
@ 2025-07-14 19:37   ` Nicolin Chen
  2025-07-15 23:12   ` Nicolin Chen
  1 sibling, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 19:37 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:35PM +0100, Shameer Kolothum wrote:
> +static int
> +smmuv3_accel_dev_install_nested_ste(SMMUv3AccelDevice *accel_dev,
> +                                    uint32_t data_type, uint32_t data_len,
> +                                    void *data)
> +{
> +    SMMUViommu *viommu = accel_dev->viommu;
> +    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
> +    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
> +    uint32_t flags = 0;
> +
> +    if (!idev || !viommu) {
> +        return -ENOENT;
> +    }
> +
> +    if (s1_hwpt) {
> +        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, true);
> +    }
> +
> +    s1_hwpt = g_new0(SMMUS1Hwpt, 1);
> +    s1_hwpt->iommufd = idev->iommufd;
> +    iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                               viommu->core.viommu_id, flags, data_type,
> +                               data_len, data, &s1_hwpt->hwpt_id, &error_abort);

Let's check the return value.

> +    host_iommu_device_iommufd_attach_hwpt(idev, s1_hwpt->hwpt_id, &error_abort);
> +    accel_dev->s1_hwpt = s1_hwpt;
> +    return 0;
> +}
> +
> +void smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid)
> +{
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
> +                           .inval_ste_allowed = true};
> +    struct iommu_hwpt_arm_smmuv3 nested_data = {};
> +    uint32_t config;
> +    STE ste;
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return;
> +    }
> +
> +    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +    if (!accel_dev->viommu) {
> +        return;
> +    }
> +
> +    ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
> +    if (ret) {
> +        error_report("failed to find STE for sid 0x%x", sid);
> +        return;
> +    }
> +
> +    config = STE_CONFIG(&ste);
> +    if (!STE_VALID(&ste) || !STE_CFG_S1_ENABLED(config)) {
> +        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, STE_CFG_ABORT(config));
> +        smmuv3_flush_config(sdev);
> +        return;
> +    }
> +
> +    nested_data.ste[0] = (uint64_t)ste.word[0] | (uint64_t)ste.word[1] << 32;
> +    nested_data.ste[1] = (uint64_t)ste.word[2] | (uint64_t)ste.word[3] << 32;
> +    /* V | CONFIG | S1FMT | S1CTXPTR | S1CDMAX */
> +    nested_data.ste[0] &= 0xf80fffffffffffffULL;
> +    /* S1DSS | S1CIR | S1COR | S1CSH | S1STALLD | EATS */
> +    nested_data.ste[1] &= 0x380000ffULL;

Likely we need to make sure that values here are little endians, in
alignment with the kernel uABI.

> +    ret = smmuv3_accel_dev_install_nested_ste(accel_dev,
> +                                              IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                              sizeof(nested_data),
> +                                              &nested_data);
> +    if (ret) {
> +        error_report("Unable to install nested STE=%16LX:%16LX, sid=0x%x,"
> +                      "ret=%d", nested_data.ste[1], nested_data.ste[0],
> +                      sid, ret);
> +    }
> +
> +    trace_smmuv3_accel_install_nested_ste(sid, nested_data.ste[1],
> +                                          nested_data.ste[0]);
> +}
> +
> +static void
> +smmuv3_accel_ste_range(gpointer key, gpointer value, gpointer user_data)
> +{
> +    SMMUDevice *sdev = (SMMUDevice *)key;
> +    uint32_t sid = smmu_get_sid(sdev);
> +    SMMUSIDRange *sid_range = (SMMUSIDRange *)user_data;
> +
> +    if (sid >= sid_range->start && sid <= sid_range->end) {
> +        SMMUv3State *s = sdev->smmu;
> +        SMMUState *bs = &s->smmu_state;

Can we use ARM_SMMU and ARM_SMMUV3 macros?

> +
> +        smmuv3_accel_install_nested_ste(bs, sdev, sid);
> +    }
> +}
> +
> +void
> +smmuv3_accel_install_nested_ste_range(SMMUState *bs, SMMUSIDRange *range)

Fits in one line.

>  typedef struct SMMUv3AccelDevice {
>      SMMUDevice  sdev;
>      AddressSpace as_sysmem;
>      HostIOMMUDeviceIOMMUFD *idev;
> +    SMMUS1Hwpt  *s1_hwpt;

No need of an extra space.

>      SMMUViommu *viommu;
>      QLIST_ENTRY(SMMUv3AccelDevice) next;
>  } SMMUv3AccelDevice;
> @@ -45,10 +51,21 @@ typedef struct SMMUv3AccelState {
>  
>  #if defined(CONFIG_ARM_SMMUV3) && defined(CONFIG_IOMMUFD)
>  void smmuv3_accel_init(SMMUv3State *s);
> +void smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid);
> +void smmuv3_accel_install_nested_ste_range(SMMUState *bs,
> +                                           SMMUSIDRange *range);

Fits in one line.

> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index b6b7399347..738061c6ad 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -547,6 +547,10 @@ typedef struct CD {
>      uint32_t word[16];
>  } CD;
>  
> +int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
> +                  SMMUEventInfo *event);

Ditto

> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 2f5a8157dd..c94bfe6564 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -630,8 +630,8 @@ bad_ste:
>   * Supports linear and 2-level stream table
>   * Return 0 on success, -EINVAL otherwise
>   */
> -static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
> -                         SMMUEventInfo *event)
> +int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
> +                  SMMUEventInfo *event)

Ditto

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 10/15] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  2025-07-14 15:59 ` [RFC PATCH v3 10/15] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
@ 2025-07-14 19:43   ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 19:43 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:36PM +0100, Shameer Kolothum wrote:
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 74bf20cfaf..f1584dd775 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -93,6 +93,23 @@ void smmuv3_accel_install_nested_ste(SMMUState *bs, SMMUDevice *sdev, int sid)
>          return;
>      }
>  
> +    if (!accel_dev->vdev && accel_dev->idev) {
> +        IOMMUFDVdev *vdev;
> +        uint32_t vdev_id;
> +        SMMUViommu *viommu = accel_dev->viommu;

Can we put the viommu line at the top of these three?

> +
> +        iommufd_backend_alloc_vdev(viommu->core.iommufd, accel_dev->idev->devid,
> +                                   viommu->core.viommu_id, sid, &vdev_id,
> +                                   &error_abort);

Let's check ret.

> diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
> index 06e81b630d..21028e60c8 100644
> --- a/hw/arm/smmuv3-accel.h
> +++ b/hw/arm/smmuv3-accel.h
> @@ -40,6 +40,7 @@ typedef struct SMMUv3AccelDevice {
>      HostIOMMUDeviceIOMMUFD *idev;
>      SMMUS1Hwpt  *s1_hwpt;
>      SMMUViommu *viommu;
> +    IOMMUFDVdev  *vdev;

No need of extra space.

>      QLIST_ENTRY(SMMUv3AccelDevice) next;
>  } SMMUv3AccelDevice;
>  
> diff --git a/include/system/iommufd.h b/include/system/iommufd.h
> index b7ad2cf10c..8de559d448 100644
> --- a/include/system/iommufd.h
> +++ b/include/system/iommufd.h
> @@ -44,6 +44,11 @@ typedef struct IOMMUFDViommu {
>      uint32_t viommu_id;
>  } IOMMUFDViommu;
>  
> +typedef struct IOMMUFDVdev {
> +    uint32_t vdev_id;
> +    uint32_t dev_id;
> +} IOMMUFDVdev;

This adds to the core header. Maybe it can be done with the patch
that adds iommufd_backend_alloc_vdev()?

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 11/15] hw/pci/pci: Introduce optional get_msi_address_space() callback.
  2025-07-14 15:59 ` [RFC PATCH v3 11/15] hw/pci/pci: Introduce optional get_msi_address_space() callback Shameer Kolothum via
@ 2025-07-14 19:50   ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 19:50 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:37PM +0100, Shameer Kolothum wrote:
> On ARM, when a device is behind an IOMMU, its MSI doorbell address is
> subject to translation by the IOMMU. This behavior affects vfio-pci
> passthrough devices assigned to guests using an accelerated SMMUv3.
> 
> In this setup, we configure the host SMMUv3 in nested mode, where
> VFIO sets up the Stage-2 (S2) mappings for guest RAM, while the guest
> controls Stage-1 (S1). To allow VFIO to correctly configure S2 mappings,
> we currently return the system address space via the get_address_space()
> callback for vfio-pci devices.
> 
> However, QEMU/KVM also uses this same callback path when resolving the
> address space for MSI doorbells:
> 
> kvm_irqchip_add_msi_route()
>   kvm_arch_fixup_msi_route()
>     pci_device_iommu_address_space()
> 
> This leads to problems when MSI doorbells need to be translated.
> 
> To fix this, introduce an optional get_msi_address_space() callback.
> In the SMMUv3 accelerated case, this callback returns the IOMMU address
> space if the guest has set up S1 translations for the vfio-pci device.
> Otherwise, it returns the system address space.
> 
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c | 25 +++++++++++++++++++++++++
>  hw/pci/pci.c          | 19 +++++++++++++++++++
>  include/hw/pci/pci.h  | 16 ++++++++++++++++
>  target/arm/kvm.c      |  2 +-

I think we need to separate core changes and smmu changes, like how
pci_device_set/unset_iommu_device were introduced.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-07-14 15:59 ` [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
@ 2025-07-14 19:55   ` Nicolin Chen
  2025-07-15 10:39   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 19:55 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:38PM +0100, Shameer Kolothum wrote:
> diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
> index 21028e60c8..d06c9664ba 100644
> --- a/hw/arm/smmuv3-accel.h
> +++ b/hw/arm/smmuv3-accel.h
> @@ -13,6 +13,7 @@
>  #include "hw/arm/smmu-common.h"
>  #include "system/iommufd.h"
>  #include <linux/iommufd.h>
> +#include "smmuv3-internal.h"

Let's organize in alphabetical order.

> +static inline void smmuv3_accel_batch_cmd(SMMUState *bs, SMMUDevice *sdev,
> +                                          SMMUCommandBatch *batch,
> +                                          struct Cmd *cmd, uint32_t *cons)
> +{
> +    return;

Leave it blank since void?

Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev
  2025-07-14 15:59 ` [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev Shameer Kolothum via
@ 2025-07-14 20:00   ` Nicolin Chen
  2025-07-15 10:49   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-14 20:00 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:41PM +0100, Shameer Kolothum wrote:
> Now user can set "accel=on". Have fun!
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
 
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-14 15:59 ` [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits Shameer Kolothum via
@ 2025-07-14 20:04   ` Nicolin Chen via
  2025-07-14 20:24     ` Nicolin Chen via
  2025-07-15 10:48   ` Jonathan Cameron via
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen via @ 2025-07-14 20:04 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:40PM +0100, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Not all fields in the SMMU IDR registers are meaningful for userspace.
> Only the following fields can be used:
> 
>   - IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16, TTF  
>   - IDR1: SIDSIZE, SSIDSIZE  
>   - IDR3: BBML, RIL  
>   - IDR5: VAX, GRAN64K, GRAN16K, GRAN4K
> 
> Use the relevant fields from these to check whether the host and emulated
> SMMUv3 features are sufficiently aligned to enable accelerated SMMUv3
> support.
> 
> To retrieve this information from the host, at least one vfio-pci device
> must be assigned with "arm-smmuv3,accel=on" usage. Add a check to enforce
> this.
> 
> Note:
> 
> ATS, PASID, and PRI features are currently not supported. Only devices
> that do not require or make use of these features are expected to work.

Can we support ATS/PASID at least? I need to double check intel's
series, but I somehow recall that there is a PASID cap support in
the VFIO level, so VM could actually report ATS/PASID caps?

The invalidation part could forward ATC_INV command too, as kernel
supports that.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-07-14 16:14 ` [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen via
@ 2025-07-14 20:22   ` Nicolin Chen via
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen via @ 2025-07-14 20:22 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 09:14:17AM -0700, Nicolin Chen wrote:
> Hi Shameer,
> 
> Thank you for sending the v3.
> 
> On Mon, Jul 14, 2025 at 04:59:26PM +0100, Shameer Kolothum wrote:
> > Branch for testing:
> [...]
> > Tested on a HiSilicon platform with multiple SMMUv3s.
> > 
> > ./qemu-system-aarch64 \
> >   -machine virt,accel=kvm,gic-version=3 \
> >   -object iommufd,id=iommufd0 \
> >   -bios QEMU_EFI \
> >   -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
> >   -device virtio-blk-device,drive=fs \
> >   -drive if=none,file=ubuntu.img,id=fs \
> >   -kernel Image \
> >   -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
> >   -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
> >   -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
> >   -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
> >   -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-reserve=1K \
> >   -device vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
> >   -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> >   -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
> >   -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
> >   -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
> >   -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
> >   -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> >   -net none \
> >   -nographic
>  
> I am looking for that "branch for testing" to try some tests on my
> side, but couldn't find one. Would you please share a Github link?

Oops. I found the link. Never mind.

Cheers
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-14 20:04   ` Nicolin Chen via
@ 2025-07-14 20:24     ` Nicolin Chen via
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen via @ 2025-07-14 20:24 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 01:04:02PM -0700, Nicolin Chen wrote:
> On Mon, Jul 14, 2025 at 04:59:40PM +0100, Shameer Kolothum wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > 
> > Not all fields in the SMMU IDR registers are meaningful for userspace.
> > Only the following fields can be used:
> > 
> >   - IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16, TTF  
> >   - IDR1: SIDSIZE, SSIDSIZE  
> >   - IDR3: BBML, RIL  
> >   - IDR5: VAX, GRAN64K, GRAN16K, GRAN4K
> > 
> > Use the relevant fields from these to check whether the host and emulated
> > SMMUv3 features are sufficiently aligned to enable accelerated SMMUv3
> > support.
> > 
> > To retrieve this information from the host, at least one vfio-pci device
> > must be assigned with "arm-smmuv3,accel=on" usage. Add a check to enforce
> > this.
> > 
> > Note:
> > 
> > ATS, PASID, and PRI features are currently not supported. Only devices
> > that do not require or make use of these features are expected to work.
> 
> Can we support ATS/PASID at least? I need to double check intel's
> series, but I somehow recall that there is a PASID cap support in
> the VFIO level, so VM could actually report ATS/PASID caps?
> 
> The invalidation part could forward ATC_INV command too, as kernel
> supports that.

Also, the Subject is missing prefix "hw/arm/smmuv3-accel:".

Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu
  2025-07-14 15:59 ` [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
  2025-07-14 16:22   ` Nicolin Chen
@ 2025-07-15  9:14   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15  9:14 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:27 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Add a helper to allocate a viommu object.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

One trivial comment inline. Feel free to ignore.

> ---
>  backends/iommufd.c       | 25 +++++++++++++++++++++++++
>  backends/trace-events    |  1 +
>  include/system/iommufd.h |  4 ++++
>  3 files changed, 30 insertions(+)
> 
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 2a33c7ab0b..f3b95ee321 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -446,6 +446,31 @@ bool iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t id,
>      return !ret;
>  }
>  
> +bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
> +                                  uint32_t viommu_type, uint32_t hwpt_id,
> +                                  uint32_t *out_viommu_id, Error **errp)
> +{
> +    int ret, fd = be->fd;

Not sure the fd local variable is worth bothering with given be->fd is
very short and you only have a couple of users.

> +    struct iommu_viommu_alloc alloc_viommu = {
> +        .size = sizeof(alloc_viommu),
> +        .type = viommu_type,
> +        .dev_id = dev_id,
> +        .hwpt_id = hwpt_id,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &alloc_viommu);
> +
> +    trace_iommufd_backend_alloc_viommu(fd, viommu_type, dev_id, hwpt_id,
> +                                       alloc_viommu.out_viommu_id, ret);
> +    if (ret) {
> +        error_setg_errno(errp, errno, "IOMMU_VIOMMU_ALLOC failed");
> +        return false;
> +    }
> +
> +    *out_viommu_id = alloc_viommu.out_viommu_id;
> +    return true;
> +}


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc
  2025-07-14 15:59 ` [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
  2025-07-14 16:27   ` Nicolin Chen
@ 2025-07-15  9:19   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15  9:19 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:28 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Add a helper to allocate an iommufd device's virtual device (in the user
> space) per a viommu instance.
Same trivial suggestion as in patch 1.  Also feel free to ignore.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 03/15] hw/arm/smmu-common: Factor out common helper functions and export
  2025-07-14 15:59 ` [RFC PATCH v3 03/15] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
@ 2025-07-15  9:27   ` Jonathan Cameron via
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15  9:27 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:29 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> Subsequent patches for smmuv3 accel support will make use of this.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Various trivial things inline.  In general looks fine.

J

> ---
>  hw/arm/smmu-common.c         | 48 ++++++++++++++++++++++--------------
>  include/hw/arm/smmu-common.h |  6 +++++
>  2 files changed, 36 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index ab920717cf..0f1a06cec2 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -847,12 +847,28 @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
>      return NULL;
>  }
>  
> -static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
> +void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev,
> +                    PCIBus *bus, int devfn)

It's trivial Tuesday.  Fits on one line.  Maybe fine to keep it like this if you
are going to modify this in later patches and this reduces the churn.


>  {
> -    SMMUState *s = opaque;
> -    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
> -    SMMUDevice *sdev;
>      static unsigned int index;
> +    char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
> +
> +    sdev->smmu = s;
> +    sdev->bus = bus;
> +    sdev->devfn = devfn;
> +
> +    memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
> +                             s->mrtypename,
> +                             OBJECT(s), name, UINT64_MAX);
Wrap was odd on original code, might as well tidy it up though as have a
little more width now.

    memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
                             s->mrtypename, OBJECT(s), name, UINT64_MAX);

> +    address_space_init(&sdev->as,
> +                       MEMORY_REGION(&sdev->iommu), name);
And this one.

    address_space_init(&sdev->as, MEMORY_REGION(&sdev->iommu), name);

> +    trace_smmu_add_mr(name);
> +    g_free(name);

Use g_autofree perhaps.

> +}
> +
> +SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus)
> +{
> +    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
>  
>      if (!sbus) {
>          sbus = g_malloc0(sizeof(SMMUPciBus) +
> @@ -861,23 +877,19 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>          g_hash_table_insert(s->smmu_pcibus_by_busptr, bus, sbus);
>      }
>  
> +    return sbus;
> +}
> +
> +static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
> +{
> +    SMMUDevice *sdev;

Why have this first?  Original order had it last.  I don't really care but
some sort of system is nicer than none.

> +    SMMUState *s = opaque;
> +    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> +
>      sdev = sbus->pbdev[devfn];
>      if (!sdev) {
> -        char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
> -
>          sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
> -
> -        sdev->smmu = s;
> -        sdev->bus = bus;
> -        sdev->devfn = devfn;
> -
> -        memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
> -                                 s->mrtypename,
> -                                 OBJECT(s), name, UINT64_MAX);
> -        address_space_init(&sdev->as,
> -                           MEMORY_REGION(&sdev->iommu), name);
> -        trace_smmu_add_mr(name);
> -        g_free(name);
> +        smmu_init_sdev(s, sdev, bus, devfn);
>      }
>  
>      return &sdev->as;
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index 80d0fecfde..c6f899e403 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -180,6 +180,12 @@ OBJECT_DECLARE_TYPE(SMMUState, SMMUBaseClass, ARM_SMMU)
>  /* Return the SMMUPciBus handle associated to a PCI bus number */
>  SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num);
>  
> +/* Return the SMMUPciBus handle associated to a PCI bus */
> +SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus);
> +
> +/* Initialize SMMUDevice handle associated to a SMMUPCIBus */
> +void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev, PCIBus *bus, int devfn);
> +
>  /* Return the stream ID of an SMMU device */
>  static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
>  {



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper
  2025-07-14 15:59 ` [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper Shameer Kolothum via
  2025-07-14 16:38   ` Nicolin Chen via
@ 2025-07-15  9:30   ` Jonathan Cameron via
  2025-09-04  7:55   ` Eric Auger
  2 siblings, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15  9:30 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:30 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> Allows to retrieve the PCIIOMMUOps based on the SMMU type. This will be
> useful when we add support for accelerated SMMUV3 in subsequent patches
> as that requires a different set of callbacks for iommu ops.
> 
> No special handling is required for now and returns the default ops
> in base SMMU Class.
> 
> No functional changes intended.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Trivial inline.

> ---
>  hw/arm/smmu-common.c         | 17 +++++++++++++++--
>  include/hw/arm/smmu-common.h |  1 +
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 0f1a06cec2..3a1080773a 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -934,6 +934,16 @@ void smmu_inv_notifiers_all(SMMUState *s)
>      }
>  }
>  
> +static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState *s)
> +{
> +    SMMUBaseClass *sbc;
> +
> +    sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
> +    assert(sbc->iommu_ops);
> +
> +    return sbc->iommu_ops;
> +}
> +
>  static void smmu_base_realize(DeviceState *dev, Error **errp)
>  {
>      SMMUState *s = ARM_SMMU(dev);
> @@ -962,6 +972,7 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
>       */
>      if (pci_bus_is_express(pci_bus) && pci_bus_is_root(pci_bus) &&
>          object_dynamic_cast(OBJECT(pci_bus)->parent, TYPE_PCI_HOST_BRIDGE)) {
> +        const PCIIOMMUOps  *iommu_ops;
Bonus space.

>          /*
>           * This condition matches either the default pcie.0, pxb-pcie, or
>           * pxb-cxl. For both pxb-pcie and pxb-cxl, parent_dev will be set.
> @@ -974,10 +985,11 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
>              }
>          }
>  
> +        iommu_ops = smmu_iommu_ops_by_type(s);
>          if (s->smmu_per_bus) {
> -            pci_setup_iommu_per_bus(pci_bus, &smmu_ops, s);
> +            pci_setup_iommu_per_bus(pci_bus, iommu_ops, s);
>          } else {
> -            pci_setup_iommu(pci_bus, &smmu_ops, s);
> +            pci_setup_iommu(pci_bus, iommu_ops, s);
>          }
>          return;
>      }
> @@ -1018,6 +1030,7 @@ static void smmu_base_class_init(ObjectClass *klass, const void *data)
>      device_class_set_parent_realize(dc, smmu_base_realize,
>                                      &sbc->parent_realize);
>      rc->phases.exit = smmu_base_reset_exit;
> +    sbc->iommu_ops = &smmu_ops;
>  }
>  
>  static const TypeInfo smmu_base_info = {
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index c6f899e403..eb94623555 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -171,6 +171,7 @@ struct SMMUBaseClass {
>      /*< public >*/
>  
>      DeviceRealize parent_realize;
> +    const PCIIOMMUOps *iommu_ops;
>  
>  };
>  



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-14 15:59 ` [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum via
  2025-07-14 17:23   ` Nicolin Chen
@ 2025-07-15  9:45   ` Jonathan Cameron via
  2025-07-15 10:48   ` Duan, Zhenzhong
  2 siblings, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15  9:45 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:31 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> Also setup specific PCIIOMMUOps for accel SMMUv3 as accel
> SMMUv3 will have different handling for those ops callbacks
> in subsequent patches.
> 
> The "accel" property is not yet added, so users cannot set it at this
> point. It will be introduced in a subsequent patch once the necessary
> support is in place.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> new file mode 100644
> index 0000000000..2eac9c6ff4
> --- /dev/null
> +++ b/hw/arm/smmuv3-accel.c
> @@ -0,0 +1,66 @@
> +/*
> + * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
> + * Copyright (C) 2025 NVIDIA
> + * Written by Nicolin Chen, Shameer Kolothum
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "hw/arm/smmuv3.h"
> +#include "smmuv3-accel.h"
> +
> +static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
> +                                                PCIBus *bus, int devfn)
> +{
> +    SMMUDevice *sdev = sbus->pbdev[devfn];
> +    SMMUv3AccelDevice *accel_dev;
> +
> +    if (sdev) {
> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);

Nicolin made good point on early return being nicer here.

> +    } else {
> +        accel_dev = g_new0(SMMUv3AccelDevice, 1);
> +        sdev = &accel_dev->sdev;
> +
> +        sbus->pbdev[devfn] = sdev;
> +        smmu_init_sdev(bs, sdev, bus, devfn);
> +    }
> +
> +    return accel_dev;
> +}
> +
> +static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
> +                                              int devfn)
> +{
> +    SMMUState *bs = opaque;

Why bs? (other than for giggles)  If that is standard naming already then
fair enough.

> +    SMMUPciBus *sbus;
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUDevice *sdev;

Maybe tidy up the ordering to some scheme. 

> +
> +    sbus = smmu_get_sbus(bs, bus);
> +    accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
> +    sdev = &accel_dev->sdev;
> +
> +    return &sdev->as;

Not a lot of point in having local sdev unless you add more
stuff here that uses it later.

> +}

> diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
> new file mode 100644
> index 0000000000..4cf30b1291
> --- /dev/null
> +++ b/hw/arm/smmuv3-accel.h
> @@ -0,0 +1,19 @@
> +/*
> + * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
> + * Copyright (C) 2025 NVIDIA
> + * Written by Nicolin Chen, Shameer Kolothum
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_ARM_SMMUV3_ACCEL_H
> +#define HW_ARM_SMMUV3_ACCEL_H
> +
> +#include "hw/arm/smmu-common.h"
> +#include CONFIG_DEVICES
> +
> +typedef struct SMMUv3AccelDevice {
> +    SMMUDevice  sdev;

Bonus space.

> +} SMMUv3AccelDevice;
> +
> +#endif /* HW_ARM_SMMUV3_ACCEL_H */




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-14 15:59 ` [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum via
  2025-07-14 18:18   ` Nicolin Chen
@ 2025-07-15  9:51   ` Jonathan Cameron via
  2025-07-15 10:53   ` Duan, Zhenzhong
  2025-08-06  0:55   ` Nicolin Chen
  3 siblings, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15  9:51 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:32 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> Accelerated SMMUv3 is only useful when the device can take advantage of
> the host's SMMUv3 in nested mode. To keep things simple and correct, we
> only allow this feature for vfio-pci endpoint devices that use the iommufd
> backend. We also allow non-endpoint emulated devices like PCI bridges and
> root ports, so that users can plug in these vfio-pci devices.
> 
> Another reason for this limit is to avoid problems with IOTLB
> invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an associated
> SID, making it difficult to trace the originating device. If we allowed
> emulated endpoint devices, QEMU would have to invalidate both its own
> software IOTLB and the host's hardware IOTLB, which could slow things
> down.
> 
> Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
> translation (S1+S2), their get_address_space() callback must return the
> system address space to enable correct S2 mappings of guest RAM.
> 
> So in short:
>  - vfio-pci devices return the system address space
>  - bridges and root ports return the IOMMU address space
> 
> Note: On ARM, MSI doorbell addresses are also translated via SMMUv3.
> Hence, if a vfio-pci device is behind the SMMuv3 with translation enabled,
> it must return the IOMMU address space for MSI. Support for this will be
> added in a follow-up patch.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c               | 50 ++++++++++++++++++++++++++++-
>  hw/arm/smmuv3-accel.h               | 15 +++++++++
>  hw/arm/smmuv3.c                     |  4 +++
>  hw/pci-bridge/pci_expander_bridge.c |  1 -
>  include/hw/arm/smmuv3.h             |  1 +
>  include/hw/pci/pci_bridge.h         |  1 +
>  6 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 2eac9c6ff4..0b0ddb03e2 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -7,13 +7,19 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  
>  #include "hw/arm/smmuv3.h"
> +#include "hw/pci/pci_bridge.h"
> +#include "hw/pci-host/gpex.h"
> +#include "hw/vfio/pci.h"
> +
>  #include "smmuv3-accel.h"
>  
>  static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
>                                                  PCIBus *bus, int devfn)
>  {
> +    SMMUv3State *s = ARM_SMMUV3(bs);
>      SMMUDevice *sdev = sbus->pbdev[devfn];
>      SMMUv3AccelDevice *accel_dev;
>  
> @@ -25,30 +31,72 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
>  
>          sbus->pbdev[devfn] = sdev;
>          smmu_init_sdev(bs, sdev, bus, devfn);
> +        address_space_init(&accel_dev->as_sysmem, &s->s_accel->root,
> +                           "smmuv3-accel-sysmem");
>      }
>  
>      return accel_dev;
>  }
>  
> +static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
> +{
> +
> +    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
> +        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||
> +        object_dynamic_cast(OBJECT(pdev), "gpex-root")) {

Include gpex.h and TYPE_GPEX_ROOT_DEVICE
TYPE_IOMMUFD_BACKEND in iommufd.h

etc.



> +        return true;
> +    } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI) &&
> +        object_property_find(OBJECT(pdev), "iommufd"))) {
> +        *vfio_pci = true;
> +        return true;
> +    }
> +    return false;
> +}
> +
>  static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
>                                                int devfn)
>  {
> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>      SMMUState *bs = opaque;
> +    bool vfio_pci = false;
>      SMMUPciBus *sbus;
>      SMMUv3AccelDevice *accel_dev;
>      SMMUDevice *sdev;
>  
> +    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
> +        error_report("Device(%s) not allowed. Only PCIe root complex devices "
> +                     "or PCI bridge devices or vfio-pci endpoint devices with "
> +                     "iommufd as backend is allowed with arm-smmuv3,accel=on",
> +                     pdev->name);
> +        exit(1);
> +    }
>      sbus = smmu_get_sbus(bs, bus);
>      accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
>      sdev = &accel_dev->sdev;
>  
> -    return &sdev->as;
> +    if (vfio_pci) {
> +        return &accel_dev->as_sysmem;
> +    } else {
> +        return &sdev->as;
> +    }
>  }
>  
>  static const PCIIOMMUOps smmuv3_accel_ops = {
>      .get_address_space = smmuv3_accel_find_add_as,
>  };
>  
> +void smmuv3_accel_init(SMMUv3State *s)
> +{
> +    SMMUv3AccelState *s_accel;
> +
> +    s->s_accel = s_accel = g_new0(SMMUv3AccelState, 1);
> +    memory_region_init(&s_accel->root, OBJECT(s), "root", UINT64_MAX);
> +    memory_region_init_alias(&s_accel->sysmem, OBJECT(s),
> +                             "smmuv3-accel-sysmem", get_system_memory(), 0,
> +                             memory_region_size(get_system_memory()));
> +    memory_region_add_subregion(&s_accel->root, 0, &s_accel->sysmem);
> +}
> +
>  static void smmuv3_accel_class_init(ObjectClass *oc, const void *data)
>  {
>      SMMUBaseClass *sbc = ARM_SMMU_CLASS(oc);

> diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
> index d183a62766..3bdb92391a 100644
> --- a/include/hw/arm/smmuv3.h
> +++ b/include/hw/arm/smmuv3.h
> @@ -63,6 +63,7 @@ struct SMMUv3State {
>      qemu_irq     irq[4];
>      QemuMutex mutex;
>      char *stage;
> +    struct SMMUv3AccelState  *s_accel;

bonus space.

>  };
>  
>  typedef enum {



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-07-14 15:59 ` [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
  2025-07-14 19:11   ` Nicolin Chen
@ 2025-07-15 10:29   ` Jonathan Cameron via
  2025-07-15 17:01     ` Nicolin Chen
  1 sibling, 1 reply; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15 10:29 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:34 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Implement a set_iommu_device callback:
>  -If found an existing viommu reuse that.
>    (Devices behind the same physical SMMU should share an S2 HWPT)
>  -Else,
>     Allocate a viommu with the nested parent S2 hwpt allocated by VFIO.
>     Allocate bypass and abort hwpt.
>  -And add the dev to viommu device list
> 
> Also add an unset_iommu_device to unwind/cleanup above.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>


One questions inline plus trivial stuff.  I'm not yet up to speed with
all the iommufd stuff so this is rather superficial for now.

> +static bool
> +smmuv3_accel_dev_alloc_viommu(SMMUv3AccelDevice *accel_dev,
> +                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> +{
> +    struct iommu_hwpt_arm_smmuv3 bypass_data = {
> +        .ste = { SMMU_STE_CFG_BYPASS | SMMU_STE_VALID, 0x0ULL },
> +    };
> +    struct iommu_hwpt_arm_smmuv3 abort_data = {
> +        .ste = { SMMU_STE_VALID, 0x0ULL },
> +    };
> +    SMMUDevice *sdev = &accel_dev->sdev;
> +    SMMUState *bs = sdev->smmu;
> +    SMMUv3State *s = ARM_SMMUV3(bs);
> +    SMMUv3AccelState *s_accel = s->s_accel;
> +    uint32_t s2_hwpt_id = idev->hwpt_id;
> +    SMMUS2Hwpt *s2_hwpt;
> +    SMMUViommu *viommu;
> +    uint32_t viommu_id;
> +
> +    if (s_accel->viommu) {
> +        accel_dev->viommu = s_accel->viommu;
> +        return true;
> +    }
> +
> +    if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
> +                                      IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
> +                                      s2_hwpt_id, &viommu_id, errp)) {
> +        return false;
> +    }
> +
> +    viommu = g_new0(SMMUViommu, 1);
> +    viommu->core.viommu_id = viommu_id;
> +    viommu->core.s2_hwpt_id = s2_hwpt_id;
> +    viommu->core.iommufd = idev->iommufd;
> +
> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                                    viommu->core.viommu_id, 0,
> +                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                    sizeof(abort_data), &abort_data,
> +                                    &viommu->abort_hwpt_id, errp)) {
> +        goto free_viommu;
> +    }
> +
> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                                    viommu->core.viommu_id, 0,
> +                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                    sizeof(bypass_data), &bypass_data,
> +                                    &viommu->bypass_hwpt_id, errp)) {
> +        goto free_abort_hwpt;
> +    }
> +
> +    s2_hwpt = g_new(SMMUS2Hwpt, 1);
> +    s2_hwpt->iommufd = idev->iommufd;
> +    s2_hwpt->hwpt_id = s2_hwpt_id;
> +
> +    viommu->iommufd = idev->iommufd;
> +    viommu->s2_hwpt = s2_hwpt;
> +
> +    s_accel->viommu = viommu;
> +    accel_dev->viommu = viommu;
> +    return true;
> +
> +free_abort_hwpt:
> +    iommufd_backend_free_id(idev->iommufd, viommu->abort_hwpt_id);
> +free_viommu:
> +    iommufd_backend_free_id(idev->iommufd, viommu->core.viommu_id);
> +    g_free(viommu);

No unwinding of iommufd_backened_alloc_viommu?
Looks like we just leak it until destruction of the fd. 

Maybe add a comment for those like me who aren't all that familiar with
this stuff and see an alloc with no matching free.


> +    return false;
> +}
> +
> +static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> +                                          HostIOMMUDevice *hiod, Error **errp)
> +{
> +    HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(hiod);
> +    SMMUState *bs = opaque;
> +    SMMUv3State *s = ARM_SMMUV3(bs);
> +    SMMUv3AccelState *s_accel = s->s_accel;
> +    SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
> +    SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
> +    SMMUDevice *sdev = &accel_dev->sdev;
> +
> +    if (!idev) {
> +        return true;
> +    }
> +
> +    if (accel_dev->idev) {
> +        if (accel_dev->idev != idev) {
> +            error_report("Device 0x%x already has an associated idev",
> +                         smmu_get_sid(sdev));
> +            return false;
> +        } else {

No need for else as other path already returned.

> +            return true;
> +        }
> +    }
> +
> +    if (!smmuv3_accel_dev_alloc_viommu(accel_dev, idev, errp)) {
> +        error_report("Device 0x%x: Unable to alloc viommu", smmu_get_sid(sdev));
> +        return false;
> +    }
> +
> +    accel_dev->idev = idev;
> +    QLIST_INSERT_HEAD(&s_accel->viommu->device_list, accel_dev, next);
> +    trace_smmuv3_accel_set_iommu_device(devfn, smmu_get_sid(sdev));
> +    return true;
> +}


> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index f3386bd7ae..c4537ca1d6 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -66,6 +66,10 @@ smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s
>  smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
>  smmu_reset_exit(void) ""
>  
> +#smmuv3-accel.c
> +smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
> +smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
bracket?



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-07-14 15:59 ` [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
  2025-07-14 19:55   ` Nicolin Chen
@ 2025-07-15 10:39   ` Jonathan Cameron via
  2025-07-15 17:07     ` Nicolin Chen
  1 sibling, 1 reply; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15 10:39 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:38 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Helpers will batch the commands and issue at once to host SMMUv3.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c    | 65 ++++++++++++++++++++++++++++++++++++++++
>  hw/arm/smmuv3-accel.h    | 16 ++++++++++
>  hw/arm/smmuv3-internal.h | 12 ++++++++
>  3 files changed, 93 insertions(+)
> 
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 04c665ccf5..1298b4f6d0 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -168,6 +168,71 @@ smmuv3_accel_install_nested_ste_range(SMMUState *bs, SMMUSIDRange *range)
>      g_hash_table_foreach(bs->configs, smmuv3_accel_ste_range, range);
>  }
>  
> +/* Update batch->ncmds to the number of execute cmds */

Not obvious what the return value here means. Maybe a comment?

> +bool smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
> +{
> +    SMMUv3State *s = ARM_SMMUV3(bs);
> +    SMMUv3AccelState *s_accel = s->s_accel;
> +    uint32_t total = batch->ncmds;
> +    IOMMUFDViommu *viommu_core;
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return true;
> +    }
> +
> +    if (!s_accel->viommu) {
> +        return true;
> +    }
> +
> +    viommu_core = &s_accel->viommu->core;
> +    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
> +                                           viommu_core->viommu_id,
> +                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
> +                                           sizeof(Cmd), &batch->ncmds,
> +                                           batch->cmds, NULL);
> +    if (!ret || total != batch->ncmds) {
> +        error_report("%s failed: ret=%d, total=%d, done=%d",
> +                      __func__, ret, total, batch->ncmds);
> +        return ret;

This is reporting an error either way but returning success for the second
condition which looks odd.  Add a comment if intended.

> +    }
> +
> +    batch->ncmds = 0;
> +    return ret;

return true; given I think we know it's true if we get here?

> +}
> +
> +/*
> + * Note: sdev can be NULL for certain invalidation commands
> + * e.g., SMMU_CMD_TLBI_NH_ASID, SMMU_CMD_TLBI_NH_VA etc.
> + */
> +void smmuv3_accel_batch_cmd(SMMUState *bs, SMMUDevice *sdev,
> +                           SMMUCommandBatch *batch, Cmd *cmd,
> +                           uint32_t *cons)
> +{
> +    if (!bs->accel) {
> +        return;
> +    }
> +
> +   /*
> +    * We may end up here for any emulated PCI bridge or root port type
> +    * devices. The batching of commands only matters for vfio-pci endpoint
> +    * devices with Guest S1 translation enabled. Hence check that, if
> +    * sdev is available.
> +    */
> +    if (sdev) {
> +        SMMUv3AccelDevice *accel_dev;
> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +
> +        if (!accel_dev->s1_hwpt) {
> +            return;
> +        }
> +    }
> +
> +    batch->cmds[batch->ncmds] = *cmd;
> +    batch->cons[batch->ncmds++] = *cons;
> +    return;
Drop this trailing return.

> +}


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (15 preceding siblings ...)
  2025-07-14 16:14 ` [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen via
@ 2025-07-15 10:46 ` Duan, Zhenzhong
  2025-07-16  7:27   ` Shameerali Kolothum Thodi via
  16 siblings, 1 reply; 79+ messages in thread
From: Duan, Zhenzhong @ 2025-07-15 10:46 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, ddutile@redhat.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	linuxarm@huawei.com, wangzhou1@hisilicon.com,
	jiangkunkun@huawei.com, jonathan.cameron@huawei.com,
	zhangfei.gao@linaro.org, shameerkolothum@gmail.com

Hi Shameer,

>-----Original Message-----
>From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>Subject: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable
>accelerated SMMUv3
>
>Hi All,
>
>This patch series introduces initial support for a user-creatable,
>accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.
>
>This is based on the user-creatable SMMUv3 device series [0].
>
>Why this is needed:
>
>On ARM, to enable vfio-pci pass-through devices in a VM, the host SMMUv3
>must be set up in nested translation mode (Stage 1 + Stage 2), with
>Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by the host.
>
>This series introduces an optional accel property for the SMMUv3 device,
>indicating that the guest will try to leverage host SMMUv3 features for
>acceleration. By default, enabling accel configures the host SMMUv3 in
>nested mode to support vfio-pci pass-through.
>
>This new accelerated, user-creatable SMMUv3 device lets you:
>
> -Set up a VM with multiple SMMUv3s, each tied to a different physical
>SMMUv3
>  on the host. Typically, you’d have multiple PCIe PXB root complexes in the
>  VM (one per virtual NUMA node), and each of them can have its own
>SMMUv3.
>  This setup mirrors the host's layout, where each NUMA node has its own
>  SMMUv3, and helps build VMs that are more aligned with the host's NUMA
>  topology.

Is it a must to mirror the host layout?
Does this mirror include smmuv3.0 which linked to pcie.0?
Do we have to create same number of smmuv3 as host smmuv3 for guest?
What happen if we don't mirror correctly, e.g., vfio device linked to smmuv3.0
in guest while in host it linked to smmuv3.1?
>
> -The host–guest SMMUv3 association results in reduced invalidation
>broadcasts
>  and lookups for devices behind different physical SMMUv3s.
>
> -Simplifies handling of host SMMUv3s with differing feature sets.
>
> -Lays the groundwork for additional capabilities like vCMDQ support.
>
>Changes from RFCv2[1] and key points in RFCv3:
>
> -Unlike RFCv2, there is no arm-smmuv3-accel device now. The accelerated
>  mode is enabled using -device arm-smmuv3,accel=on.
>
> -When accel=on is specified, the SMMUv3 will allow only vfio-pci endpoint
>  devices and any non-endpoint devices like PCI bridges and root ports used
>  to plug in the vfio-pci. See patch#6
>
> -I have tried to keep this RFC simple and basic so we can focus on the
>  structure of this new accelerated support. That means there is no support
>  for ATS, PASID, or PRI. Only vfio-pci devices that don’t require these
>  features will work.
>
> -Some clarity is still needed on the final approach to handle MSI translation.
>  Hence, RMR support (which is required for this) is not included yet, but
>  available in the git branch provided below for testing.
>
> -At least one vfio-pci device must currently be cold-plugged to a PCIe root
>  complex associated with arm-smmuv3,accel=on. This is required to:
>  1. associate a guest SMMUv3 with a host SMMUv3
>  2. retrieve the host SMMUv3 feature registers for guest export
>  This still needs discussion, as there were concerns previously about this
>  approach and it also breaks hotplug/unplug scenarios. See patch#14
>
> -This version does not yet support host SMMUv3 fault handling or other
>event
>  notifications. These will be addressed in a future patch series.
>
>Branch for testing:
>
>This is based on v8 of the SMMUv3 device series and has dependency on the
>Intel
>series here [3].
>
>https://github.com/hisilicon/qemu/tree/smmuv3-dev-v8-accel-rfcv3
>
>
>Tested on a HiSilicon platform with multiple SMMUv3s.
>
>./qemu-system-aarch64 \
>  -machine virt,accel=kvm,gic-version=3 \
>  -object iommufd,id=iommufd0 \
>  -bios QEMU_EFI \
>  -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
>  -device virtio-blk-device,drive=fs \
>  -drive if=none,file=ubuntu.img,id=fs \
>  -kernel Image \
>  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \

Here accel=on, so only vfio device is allowed on pcie.0?

>  -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
>  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
>  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
>  -device
>pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-res
>erve=1K \
>  -device
>vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
>  -append "rdinit=init console=ttyAMA0 root=/dev/vda rw
>earlycon=pl011,0x9000000" \
>  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
>  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
>  -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
>  -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
>  -fsdev local,id=p9fs,path=p9root,security_model=mapped \
>  -net none \
>  -nographic
>
>
>Guest output:
>
>root@ubuntu:/# dmesg |grep smmu
> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features
>0x00008305)
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features
>0x00008305)
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit (features
>0x00008305)
> arm-smmu-v3 arm-smmu-v3.2.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.2.auto: allocated 32768 entries for evtq
>root@ubuntu:/#
>
>root@ubuntu:/# lspci -tv
>-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc Virtio filesystem
> +-[0000:02]---00.0-[03]----00.0  Huawei Technologies Co., Ltd. Device
>a22e
> \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>             +-01.0  Huawei Technologies Co., Ltd. Device a251
>             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge

Are these all the devices in this guest config?
Will not qemu create some default devices implicitly even if we don't ask them in cmdline?

Thanks
Zhenzhong


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-07-14 15:59 ` [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
@ 2025-07-15 10:46   ` Jonathan Cameron via
  2025-07-15 17:22     ` Nicolin Chen
  0 siblings, 1 reply; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15 10:46 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:39 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Use the provided smmuv3-accel helper functions to issue the
> invalidation commands to host SMMUv3.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-internal.h | 11 +++++++++++
>  hw/arm/smmuv3.c          | 28 ++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 8cb6a9238a..f3aeaf6375 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -233,6 +233,17 @@ static inline bool smmuv3_gerror_irq_enabled(SMMUv3State *s)
>  #define Q_CONS_WRAP(q) (((q)->cons & WRAP_MASK(q)) >> (q)->log2size)
>  #define Q_PROD_WRAP(q) (((q)->prod & WRAP_MASK(q)) >> (q)->log2size)
>  
> +static inline int smmuv3_q_ncmds(SMMUQueue *q)
> +{
> +    uint32_t prod = Q_PROD(q);
> +    uint32_t cons = Q_CONS(q);
> +
> +    if (Q_PROD_WRAP(q) == Q_CONS_WRAP(q))
> +        return prod - cons;
> +    else

Else doesn't add anything, also, it's qemu so {}

> +        return WRAP_MASK(q) - cons + prod;
> +}
> +
>  static inline bool smmuv3_q_full(SMMUQueue *q)
>  {
>      return ((q->cons ^ q->prod) & WRAP_INDEX_MASK(q)) == WRAP_MASK(q);
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index c94bfe6564..97ecca0764 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1285,10 +1285,17 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>      SMMUCmdError cmd_error = SMMU_CERROR_NONE;
>      SMMUQueue *q = &s->cmdq;
>      SMMUCommandType type = 0;
> +    SMMUCommandBatch batch = {};
> +    uint32_t ncmds;
>  
>      if (!smmuv3_cmdq_enabled(s)) {
>          return 0;
>      }
> +
> +    ncmds = smmuv3_q_ncmds(q);
> +    batch.cmds = g_new0(Cmd, ncmds);
> +    batch.cons = g_new0(uint32_t, ncmds);

Where is batch.ncmds set?  It is cleared but I'm missing it being set to anything.

> +

> +    qemu_mutex_lock(&s->mutex);
> +    if (!cmd_error && batch.ncmds) {
> +        if (!smmuv3_accel_issue_cmd_batch(bs, &batch)) {
> +            if (batch.ncmds) {
> +                q->cons = batch.cons[batch.ncmds - 1];
> +            } else {
> +                q->cons = batch.cons[0]; /* FIXME: Check */
> +            }

Totally non obvious that a return of false from the issue call means
illegal command type.  Maybe that will be obvious form comments
requested in previous patch review.


> +            qemu_log_mask(LOG_GUEST_ERROR, "Illegal command type: %d\n",
> +                          CMD_TYPE(&batch.cmds[batch.ncmds]));
> +            cmd_error = SMMU_CERROR_ILL;
> +        }
> +    }
> +    qemu_mutex_unlock(&s->mutex);
> +
>      if (cmd_error) {
>          trace_smmuv3_cmdq_consume_error(smmu_cmd_string(type), cmd_error);
>          smmu_write_cmdq_err(s, cmd_error);
>          smmuv3_trigger_irq(s, SMMU_IRQ_GERROR, R_GERROR_CMDQ_ERR_MASK);
>      }
>  
> +    g_free(batch.cmds);
> +    g_free(batch.cons);
>      trace_smmuv3_cmdq_consume_out(Q_PROD(q), Q_CONS(q),
>                                    Q_PROD_WRAP(q), Q_CONS_WRAP(q));
>  



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-14 15:59 ` [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum via
  2025-07-14 17:23   ` Nicolin Chen
  2025-07-15  9:45   ` Jonathan Cameron via
@ 2025-07-15 10:48   ` Duan, Zhenzhong
  2025-07-15 17:29     ` Nicolin Chen
  2 siblings, 1 reply; 79+ messages in thread
From: Duan, Zhenzhong @ 2025-07-15 10:48 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, ddutile@redhat.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	linuxarm@huawei.com, wangzhou1@hisilicon.com,
	jiangkunkun@huawei.com, jonathan.cameron@huawei.com,
	zhangfei.gao@linaro.org, shameerkolothum@gmail.com



>-----Original Message-----
>From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>Subject: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3
>accel device
>
>Also setup specific PCIIOMMUOps for accel SMMUv3 as accel
>SMMUv3 will have different handling for those ops callbacks
>in subsequent patches.
>
>The "accel" property is not yet added, so users cannot set it at this
>point. It will be introduced in a subsequent patch once the necessary
>support is in place.
>
>Signed-off-by: Shameer Kolothum
><shameerali.kolothum.thodi@huawei.com>
>---
> hw/arm/meson.build           |  3 +-
> hw/arm/smmu-common.c         |  6 +++-
> hw/arm/smmuv3-accel.c        | 66
>++++++++++++++++++++++++++++++++++++
> hw/arm/smmuv3-accel.h        | 19 +++++++++++
> include/hw/arm/smmu-common.h |  3 ++
> 5 files changed, 95 insertions(+), 2 deletions(-)
> create mode 100644 hw/arm/smmuv3-accel.c
> create mode 100644 hw/arm/smmuv3-accel.h
>
>diff --git a/hw/arm/meson.build b/hw/arm/meson.build
>index dc68391305..6126eb1b64 100644
>--- a/hw/arm/meson.build
>+++ b/hw/arm/meson.build
>@@ -61,7 +61,8 @@ arm_common_ss.add(when: 'CONFIG_ARMSSE', if_true:
>files('armsse.c'))
> arm_common_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c',
>'mcimx7d-sabre.c'))
> arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP', if_true:
>files('fsl-imx8mp.c'))
> arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP_EVK', if_true:
>files('imx8mp-evk.c'))
>-arm_common_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true:
>files('smmuv3.c'))
>+arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
>+arm_ss.add(when: ['CONFIG_ARM_SMMUV3', 'CONFIG_IOMMUFD'], if_true:
>files('smmuv3-accel.c'))
> arm_common_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true:
>files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
> arm_common_ss.add(when: 'CONFIG_NRF51_SOC', if_true:
>files('nrf51_soc.c'))
> arm_ss.add(when: 'CONFIG_XEN', if_true: files(
>diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
>index 3a1080773a..6a58f574d3 100644
>--- a/hw/arm/smmu-common.c
>+++ b/hw/arm/smmu-common.c
>@@ -938,7 +938,11 @@ static const PCIIOMMUOps
>*smmu_iommu_ops_by_type(SMMUState *s)
> {
>     SMMUBaseClass *sbc;
>
>-    sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
>+    if (s->accel) {
>+        sbc =
>ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMUV3_ACCEL));
>+    } else {
>+        sbc =
>ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
>+    }
>     assert(sbc->iommu_ops);
>
>     return sbc->iommu_ops;
>diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>new file mode 100644
>index 0000000000..2eac9c6ff4
>--- /dev/null
>+++ b/hw/arm/smmuv3-accel.c
>@@ -0,0 +1,66 @@
>+/*
>+ * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
>+ * Copyright (C) 2025 NVIDIA
>+ * Written by Nicolin Chen, Shameer Kolothum
>+ *
>+ * SPDX-License-Identifier: GPL-2.0-or-later
>+ */
>+
>+#include "qemu/osdep.h"
>+
>+#include "hw/arm/smmuv3.h"
>+#include "smmuv3-accel.h"
>+
>+static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs,
>SMMUPciBus *sbus,
>+                                                PCIBus *bus, int
>devfn)
>+{
>+    SMMUDevice *sdev = sbus->pbdev[devfn];
>+    SMMUv3AccelDevice *accel_dev;
>+
>+    if (sdev) {
>+        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
>+    } else {
>+        accel_dev = g_new0(SMMUv3AccelDevice, 1);
>+        sdev = &accel_dev->sdev;
>+
>+        sbus->pbdev[devfn] = sdev;
>+        smmu_init_sdev(bs, sdev, bus, devfn);
>+    }
>+
>+    return accel_dev;
>+}
>+
>+static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void
>*opaque,
>+                                              int devfn)
>+{
>+    SMMUState *bs = opaque;
>+    SMMUPciBus *sbus;
>+    SMMUv3AccelDevice *accel_dev;
>+    SMMUDevice *sdev;
>+
>+    sbus = smmu_get_sbus(bs, bus);
>+    accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
>+    sdev = &accel_dev->sdev;
>+
>+    return &sdev->as;
>+}
>+
>+static const PCIIOMMUOps smmuv3_accel_ops = {
>+    .get_address_space = smmuv3_accel_find_add_as,
>+};
>+
>+static void smmuv3_accel_class_init(ObjectClass *oc, const void *data)
>+{
>+    SMMUBaseClass *sbc = ARM_SMMU_CLASS(oc);
>+
>+    sbc->iommu_ops = &smmuv3_accel_ops;
>+}
>+
>+static const TypeInfo types[] = {
>+    {
>+        .name = TYPE_ARM_SMMUV3_ACCEL,
>+        .parent = TYPE_ARM_SMMUV3,
>+        .class_init = smmuv3_accel_class_init,
>+    }

In cover-letter, I see "-device arm-smmuv3", so where is above accel device
created so we could use smmuv3_accel_ops?



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-14 15:59 ` [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits Shameer Kolothum via
  2025-07-14 20:04   ` Nicolin Chen via
@ 2025-07-15 10:48   ` Jonathan Cameron via
  2025-07-16  2:57   ` Nicolin Chen via
  2025-07-22 17:42   ` Nicolin Chen
  3 siblings, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15 10:48 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:40 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Not all fields in the SMMU IDR registers are meaningful for userspace.
> Only the following fields can be used:
> 
>   - IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16, TTF  
>   - IDR1: SIDSIZE, SSIDSIZE  
>   - IDR3: BBML, RIL  
>   - IDR5: VAX, GRAN64K, GRAN16K, GRAN4K
> 
> Use the relevant fields from these to check whether the host and emulated
> SMMUv3 features are sufficiently aligned to enable accelerated SMMUv3
> support.
> 
> To retrieve this information from the host, at least one vfio-pci device
> must be assigned with "arm-smmuv3,accel=on" usage. Add a check to enforce
> this.
> 
> Note:
> 
> ATS, PASID, and PRI features are currently not supported. Only devices
> that do not require or make use of these features are expected to work.
> 
> Also, requiring at least one vfio-pci device to be cold-plugged
> complicates hot-unplug and replug scenarios. For example, if all devices
> behind the vSMMUv3 are unplugged after the guest boots, and a new device
> is later hot-plugged into the same PCI bus, there is no guarantee that
> the underlying host SMMUv3 will expose the same feature set as the one
> originally used when the vSMMU was initialized.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
=
> +
> +void smmuv3_accel_init_regs(SMMUv3State *s)
> +{
> +    SMMUv3AccelState *s_accel = s->s_accel;
> +    SMMUv3AccelDevice *accel_dev;
> +    uint32_t data_type;
> +    uint32_t val;
> +    int ret;
> +
> +    if (s_accel->info.idr[0]) {
> +        /* We already got this */
> +        return;
> +    }
> +
> +    if (!s_accel->viommu || QLIST_EMPTY(&s_accel->viommu->device_list)) {
> +        error_report("For arm-smmuv3,accel=on case, atleast one cold-plugged "
> +                     "vfio-pci dev needs to be assigned");
> +        goto out_err;
> +    }
> +
> +    accel_dev = QLIST_FIRST(&s_accel->viommu->device_list);
> +    ret = smmuv3_accel_host_hw_info(accel_dev, &data_type,
> +                                    sizeof(s_accel->info), &s_accel->info);
> +    if (ret) {
> +        error_report("Failed to get Host SMMU device info");
> +        goto out_err;
> +    }
> +
> +    if (data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
> +        error_report("Wrong data type (%d) for Host SMMU device info",
> +                     data_type);
> +        goto out_err;
> +    }
> +
> +    trace_smmuv3_accel_host_hw_info(s_accel->info.idr[0], s_accel->info.idr[1],
> +                                    s_accel->info.idr[3], s_accel->info.idr[5]);
> +    /*
> +     * QEMU SMMUv3 supports both linear and 2-level stream tables. If host
> +     * SMMUv3 supports only linear stream table, report that to Guest.
> +     */
> +    val = FIELD_EX32(s_accel->info.idr[0], IDR0, STLEVEL);
> +    if (val < FIELD_EX32(s->idr[0], IDR0, STLEVEL)) {
> +        s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STLEVEL, val);
> +    }
> +
> +    /*
> +     * QEMU SMMUv3 supports little-endian support for translation table walks.
> +     * If host SMMUv3 supports only big-endian, report error.
> +     */
> +    val = FIELD_EX32(s_accel->info.idr[0], IDR0, TTENDIAN);
> +    if (val > FIELD_EX32(s->idr[0], IDR0, TTENDIAN)) {
> +        error_report("Host SUUMU device translation table walk endianess "
> +                     "not supported");
> +        goto out_err;
> +    }
> +
> +    /*
> +     * QEMU SMMUv3 supports AArch64 Translation table format.
> +     * If host SMMUv3 supports only AArch32, report error.
> +     */
> +    val = FIELD_EX32(s_accel->info.idr[0], IDR0, TTF);
> +    if (val < FIELD_EX32(s->idr[0], IDR0, TTF)) {
> +        error_report("Host SMMU device Translation table format not supported");
> +        goto out_err;
> +    }
> +
> +    /*
> +     * QEMU SMMUv3 supports 4K/16K/64K translation granules. If host SMMUv3
> +     * does't support any of these, report the supported ones only to Guest.
> +     */
> +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
> +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
> +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> +    }
> +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
> +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
> +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> +    }
> +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
> +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
> +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> +    }
> +    return;
> +
> +out_err:
> +    exit(1);

Maybe just do this at each error path rather than goto?
Makes it clear that the result is brutal.


> +}
> +
>  static void
>  smmuv3_accel_dev_uninstall_nested_ste(SMMUv3AccelDevice *accel_dev, bool abort)
>  {


> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index 7d232ca17c..37ecab10a0 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -70,7 +70,7 @@ smmu_reset_exit(void) ""
>  smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
>  smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
>  smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
> -

Stray

> +smmuv3_accel_host_hw_info(uint32_t idr0, uint32_t idr1, uint32_t idr3, uint32_t idr5) "idr0=0x%x idr1=0x%x idr3=0x%x idr5=0x%x"
>  # strongarm.c
>  strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
>  strongarm_ssp_read_underrun(void) "SSP rx underrun"



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev
  2025-07-14 15:59 ` [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev Shameer Kolothum via
  2025-07-14 20:00   ` Nicolin Chen
@ 2025-07-15 10:49   ` Jonathan Cameron via
  1 sibling, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-15 10:49 UTC (permalink / raw)
  To: Shameer Kolothum, linuxarm
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, 14 Jul 2025 16:59:41 +0100
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> Now user can set "accel=on". Have fun!
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Hard to argue with this one ;)

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  hw/arm/smmu-common.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 6a58f574d3..3e8783670a 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -1022,6 +1022,7 @@ static const Property smmu_dev_properties[] = {
>      DEFINE_PROP_BOOL("smmu_per_bus", SMMUState, smmu_per_bus, false),
>      DEFINE_PROP_LINK("primary-bus", SMMUState, primary_bus,
>                       TYPE_PCI_BUS, PCIBus *),
> +    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),
>  };
>  
>  static void smmu_base_class_init(ObjectClass *klass, const void *data)



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-14 15:59 ` [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum via
  2025-07-14 18:18   ` Nicolin Chen
  2025-07-15  9:51   ` Jonathan Cameron via
@ 2025-07-15 10:53   ` Duan, Zhenzhong
  2025-07-15 17:59     ` Nicolin Chen
  2025-08-06  0:55   ` Nicolin Chen
  3 siblings, 1 reply; 79+ messages in thread
From: Duan, Zhenzhong @ 2025-07-15 10:53 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, ddutile@redhat.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	linuxarm@huawei.com, wangzhou1@hisilicon.com,
	jiangkunkun@huawei.com, jonathan.cameron@huawei.com,
	zhangfei.gao@linaro.org, shameerkolothum@gmail.com



>-----Original Message-----
>From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>Subject: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated
>SMMUv3 to vfio-pci endpoints with iommufd
>
>Accelerated SMMUv3 is only useful when the device can take advantage of
>the host's SMMUv3 in nested mode. To keep things simple and correct, we
>only allow this feature for vfio-pci endpoint devices that use the iommufd
>backend. We also allow non-endpoint emulated devices like PCI bridges and
>root ports, so that users can plug in these vfio-pci devices.
>
>Another reason for this limit is to avoid problems with IOTLB
>invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an associated
>SID, making it difficult to trace the originating device. If we allowed
>emulated endpoint devices, QEMU would have to invalidate both its own
>software IOTLB and the host's hardware IOTLB, which could slow things
>down.
>
>Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
>translation (S1+S2), their get_address_space() callback must return the
>system address space to enable correct S2 mappings of guest RAM.
>
>So in short:
> - vfio-pci devices return the system address space
> - bridges and root ports return the IOMMU address space
>
>Note: On ARM, MSI doorbell addresses are also translated via SMMUv3.

So the translation result is a doorbell addr(gpa) for guest?
IIUC, there should be a mapping between guest doorbell addr(gpa) to host
doorbell addr(hpa) in stage2 page table? Where is this mapping setup?

>Hence, if a vfio-pci device is behind the SMMuv3 with translation enabled,
>it must return the IOMMU address space for MSI. Support for this will be
>added in a follow-up patch.
>
>Signed-off-by: Shameer Kolothum
><shameerali.kolothum.thodi@huawei.com>
>---
> hw/arm/smmuv3-accel.c               | 50
>++++++++++++++++++++++++++++-
> hw/arm/smmuv3-accel.h               | 15 +++++++++
> hw/arm/smmuv3.c                     |  4 +++
> hw/pci-bridge/pci_expander_bridge.c |  1 -
> include/hw/arm/smmuv3.h             |  1 +
> include/hw/pci/pci_bridge.h         |  1 +
> 6 files changed, 70 insertions(+), 2 deletions(-)
>
>diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>index 2eac9c6ff4..0b0ddb03e2 100644
>--- a/hw/arm/smmuv3-accel.c
>+++ b/hw/arm/smmuv3-accel.c
>@@ -7,13 +7,19 @@
>  */
>
> #include "qemu/osdep.h"
>+#include "qemu/error-report.h"
>
> #include "hw/arm/smmuv3.h"
>+#include "hw/pci/pci_bridge.h"
>+#include "hw/pci-host/gpex.h"
>+#include "hw/vfio/pci.h"
>+
> #include "smmuv3-accel.h"
>
> static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs,
>SMMUPciBus *sbus,
>                                                 PCIBus *bus, int
>devfn)
> {
>+    SMMUv3State *s = ARM_SMMUV3(bs);
>     SMMUDevice *sdev = sbus->pbdev[devfn];
>     SMMUv3AccelDevice *accel_dev;
>
>@@ -25,30 +31,72 @@ static SMMUv3AccelDevice
>*smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
>
>         sbus->pbdev[devfn] = sdev;
>         smmu_init_sdev(bs, sdev, bus, devfn);
>+        address_space_init(&accel_dev->as_sysmem, &s->s_accel->root,
>+                           "smmuv3-accel-sysmem");
>     }
>
>     return accel_dev;
> }
>
>+static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
>+{
>+
>+    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
>+        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||
>+        object_dynamic_cast(OBJECT(pdev), "gpex-root")) {
>+        return true;
>+    } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI) &&
>+        object_property_find(OBJECT(pdev), "iommufd"))) {

Will this always return true?

>+        *vfio_pci = true;
>+        return true;
>+    }
>+    return false;
>+}
>+
> static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void
>*opaque,
>                                               int devfn)
> {
>+    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>     SMMUState *bs = opaque;
>+    bool vfio_pci = false;
>     SMMUPciBus *sbus;
>     SMMUv3AccelDevice *accel_dev;
>     SMMUDevice *sdev;
>
>+    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
>+        error_report("Device(%s) not allowed. Only PCIe root complex
>devices "
>+                     "or PCI bridge devices or vfio-pci endpoint devices
>with "
>+                     "iommufd as backend is allowed with
>arm-smmuv3,accel=on",
>+                     pdev->name);
>+        exit(1);

Seems aggressive for a hotplug, could we fail hotplug instead of kill QEMU?

Thanks
Zhenzhong

>+    }
>     sbus = smmu_get_sbus(bs, bus);
>     accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
>     sdev = &accel_dev->sdev;
>
>-    return &sdev->as;
>+    if (vfio_pci) {
>+        return &accel_dev->as_sysmem;
>+    } else {
>+        return &sdev->as;
>+    }
> }
>
> static const PCIIOMMUOps smmuv3_accel_ops = {
>     .get_address_space = smmuv3_accel_find_add_as,
> };
>
>+void smmuv3_accel_init(SMMUv3State *s)
>+{
>+    SMMUv3AccelState *s_accel;
>+
>+    s->s_accel = s_accel = g_new0(SMMUv3AccelState, 1);
>+    memory_region_init(&s_accel->root, OBJECT(s), "root", UINT64_MAX);
>+    memory_region_init_alias(&s_accel->sysmem, OBJECT(s),
>+                             "smmuv3-accel-sysmem",
>get_system_memory(), 0,
>+
>memory_region_size(get_system_memory()));
>+    memory_region_add_subregion(&s_accel->root, 0,
>&s_accel->sysmem);
>+}
>+
> static void smmuv3_accel_class_init(ObjectClass *oc, const void *data)
> {
>     SMMUBaseClass *sbc = ARM_SMMU_CLASS(oc);
>diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
>index 4cf30b1291..2cd343103f 100644
>--- a/hw/arm/smmuv3-accel.h
>+++ b/hw/arm/smmuv3-accel.h
>@@ -9,11 +9,26 @@
> #ifndef HW_ARM_SMMUV3_ACCEL_H
> #define HW_ARM_SMMUV3_ACCEL_H
>
>+#include "hw/arm/smmuv3.h"
> #include "hw/arm/smmu-common.h"
> #include CONFIG_DEVICES
>
> typedef struct SMMUv3AccelDevice {
>     SMMUDevice  sdev;
>+    AddressSpace as_sysmem;
> } SMMUv3AccelDevice;
>
>+typedef struct SMMUv3AccelState {
>+    MemoryRegion root;
>+    MemoryRegion sysmem;
>+} SMMUv3AccelState;
>+
>+#if defined(CONFIG_ARM_SMMUV3) && defined(CONFIG_IOMMUFD)
>+void smmuv3_accel_init(SMMUv3State *s);
>+#else
>+static inline void smmuv3_accel_init(SMMUv3State *d)
>+{
>+}
>+#endif
>+
> #endif /* HW_ARM_SMMUV3_ACCEL_H */
>diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>index bcf8af8dc7..2f5a8157dd 100644
>--- a/hw/arm/smmuv3.c
>+++ b/hw/arm/smmuv3.c
>@@ -32,6 +32,7 @@
> #include "qapi/error.h"
>
> #include "hw/arm/smmuv3.h"
>+#include "smmuv3-accel.h"
> #include "smmuv3-internal.h"
> #include "smmu-internal.h"
>
>@@ -1898,6 +1899,9 @@ static void smmu_realize(DeviceState *d, Error
>**errp)
>     sysbus_init_mmio(dev, &sys->iomem);
>
>     smmu_init_irq(s, dev);
>+    if (sys->accel) {
>+        smmuv3_accel_init(s);
>+    }
> }
>
> static const VMStateDescription vmstate_smmuv3_queue = {
>diff --git a/hw/pci-bridge/pci_expander_bridge.c
>b/hw/pci-bridge/pci_expander_bridge.c
>index 1bcceddbc4..a8eb2d2426 100644
>--- a/hw/pci-bridge/pci_expander_bridge.c
>+++ b/hw/pci-bridge/pci_expander_bridge.c
>@@ -48,7 +48,6 @@ struct PXBBus {
>     char bus_path[8];
> };
>
>-#define TYPE_PXB_PCIE_DEV "pxb-pcie"
> OBJECT_DECLARE_SIMPLE_TYPE(PXBPCIEDev, PXB_PCIE_DEV)
>
> static GList *pxb_dev_list;
>diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
>index d183a62766..3bdb92391a 100644
>--- a/include/hw/arm/smmuv3.h
>+++ b/include/hw/arm/smmuv3.h
>@@ -63,6 +63,7 @@ struct SMMUv3State {
>     qemu_irq     irq[4];
>     QemuMutex mutex;
>     char *stage;
>+    struct SMMUv3AccelState  *s_accel;
> };
>
> typedef enum {
>diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
>index a055fd8d32..b61360b900 100644
>--- a/include/hw/pci/pci_bridge.h
>+++ b/include/hw/pci/pci_bridge.h
>@@ -106,6 +106,7 @@ typedef struct PXBPCIEDev {
>
> #define TYPE_PXB_PCIE_BUS "pxb-pcie-bus"
> #define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
>+#define TYPE_PXB_PCIE_DEV "pxb-pcie"
> #define TYPE_PXB_DEV "pxb"
> OBJECT_DECLARE_SIMPLE_TYPE(PXBDev, PXB_DEV)
>
>--
>2.34.1



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-07-15 10:29   ` Jonathan Cameron via
@ 2025-07-15 17:01     ` Nicolin Chen
  2025-07-16  9:33       ` Jonathan Cameron via
  0 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-15 17:01 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shameer Kolothum, linuxarm, qemu-arm, qemu-devel, eric.auger,
	peter.maydell, jgg, ddutile, berrange, nathanc, mochs, smostafa,
	wangzhou1, jiangkunkun, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Tue, Jul 15, 2025 at 11:29:41AM +0100, Jonathan Cameron wrote:
> > +    if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
> > +                                      IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
> > +                                      s2_hwpt_id, &viommu_id, errp)) {
> > +        return false;
> > +    }

[...]

> > +free_abort_hwpt:
> > +    iommufd_backend_free_id(idev->iommufd, viommu->abort_hwpt_id);
> > +free_viommu:
> > +    iommufd_backend_free_id(idev->iommufd, viommu->core.viommu_id);
> > +    g_free(viommu);
> 
> No unwinding of iommufd_backened_alloc_viommu?
> Looks like we just leak it until destruction of the fd. 
> 
> Maybe add a comment for those like me who aren't all that familiar with
> this stuff and see an alloc with no matching free.

Those iommufd_backend_free_id calls are the reverts. An iommufd
object is free-ed using its object id, i.e. the "viommu_id" and
"abort_hwpt_id" in the lines.

Adding comments to every single iommufd_backened_free_id() call
isn't optimal, IMHO, as that function would be invoked across
different vIOMMU files and even the vfio/iommufd core files.

Perhaps QEMU should wrap it up with a helper, E.g.

static inline void iommufd_backend_free(int iommufd, int obj_id)
{
	iommufd_backend_free_id(iommufd, obj_id);
}

if it helps readability?

Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-07-15 10:39   ` Jonathan Cameron via
@ 2025-07-15 17:07     ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-15 17:07 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shameer Kolothum, linuxarm, qemu-arm, qemu-devel, eric.auger,
	peter.maydell, jgg, ddutile, berrange, nathanc, mochs, smostafa,
	wangzhou1, jiangkunkun, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Tue, Jul 15, 2025 at 11:39:54AM +0100, Jonathan Cameron wrote:

> > +/* Update batch->ncmds to the number of execute cmds */
> 
> Not obvious what the return value here means. Maybe a comment?

I think it would be clear once we fix the typo in the current one:
s/execute/executed

That being said, if something else is still preferred, we can add:
Return is true if all cmds are issued correctly. False otherwise.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-07-15 10:46   ` Jonathan Cameron via
@ 2025-07-15 17:22     ` Nicolin Chen
  2025-07-16  7:32       ` Shameerali Kolothum Thodi via
  0 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-15 17:22 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shameer Kolothum, linuxarm, qemu-arm, qemu-devel, eric.auger,
	peter.maydell, jgg, ddutile, berrange, nathanc, mochs, smostafa,
	wangzhou1, jiangkunkun, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Tue, Jul 15, 2025 at 11:46:09AM +0100, Jonathan Cameron wrote:
> >      SMMUCmdError cmd_error = SMMU_CERROR_NONE;
> >      SMMUQueue *q = &s->cmdq;
> >      SMMUCommandType type = 0;
> > +    SMMUCommandBatch batch = {};
> > +    uint32_t ncmds;
> >  
> >      if (!smmuv3_cmdq_enabled(s)) {
> >          return 0;
> >      }
> > +
> > +    ncmds = smmuv3_q_ncmds(q);
> > +    batch.cmds = g_new0(Cmd, ncmds);
> > +    batch.cons = g_new0(uint32_t, ncmds);
> 
> Where is batch.ncmds set?  It is cleared but I'm missing it being set to anything.

smmuv3_accel_batch_cmd() internally sets that, every time it's
invoked to add a new command in the batch.

Shameer, let's add some comments explaining the batch function.

> > +
> 
> > +    qemu_mutex_lock(&s->mutex);
> > +    if (!cmd_error && batch.ncmds) {
> > +        if (!smmuv3_accel_issue_cmd_batch(bs, &batch)) {
> > +            if (batch.ncmds) {
> > +                q->cons = batch.cons[batch.ncmds - 1];
> > +            } else {
> > +                q->cons = batch.cons[0]; /* FIXME: Check */
> > +            }
> 
> Totally non obvious that a return of false from the issue call means
> illegal command type.  Maybe that will be obvious form comments
> requested in previous patch review.

That's a good point. Shameer, I think we need some fine-grinding
here, validating the return value from the ioctl, for which the
kernel will only return -EIO or -ETIMEOUT on failure, indicating
either an SMMU_CERROR_ILL or an SMMU_CERROR_ATC_INV_SYNC.
 
> > +            qemu_log_mask(LOG_GUEST_ERROR, "Illegal command type: %d\n",
> > +                          CMD_TYPE(&batch.cmds[batch.ncmds]));
> > +            cmd_error = SMMU_CERROR_ILL;

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-15 10:48   ` Duan, Zhenzhong
@ 2025-07-15 17:29     ` Nicolin Chen
  2025-07-16  3:38       ` Duan, Zhenzhong
  0 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-15 17:29 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, linuxarm@huawei.com,
	wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
	jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com

On Tue, Jul 15, 2025 at 10:48:31AM +0000, Duan, Zhenzhong wrote:
> >+static const TypeInfo types[] = {
> >+    {
> >+        .name = TYPE_ARM_SMMUV3_ACCEL,
> >+        .parent = TYPE_ARM_SMMUV3,
> >+        .class_init = smmuv3_accel_class_init,
> >+    }
> 
> In cover-letter, I see "-device arm-smmuv3", so where is above accel device
> created so we could use smmuv3_accel_ops?

The smmu-common.c is the shared file between accel and non-accel
instances. It has a module property:
    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),

where it directs to different iommu_ops:
937 static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState *s)                                                                                                                                                                                                                                                                                          
938 {                                                                                                                                                                                                                                                                                                                                                       
939     SMMUBaseClass *sbc;                                                                                                                                                                                                                                                                                                                                 
940                                                                                                                                                                                                                                                                                                                                                         
941     if (s->accel) {                                                                                                                                                                                                                                                                                                                                     
942         sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMUV3_ACCEL));                                                                                                                                                                                                                                                                              
943     } else {                                                                                                                                                                                                                                                                                                                                            
944         sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));                                                                                                                                                                                                                                                                                      
945     }                                                                                                                                                                                                                                                                                                                                                   
946     assert(sbc->iommu_ops);                                                                                                                                                                                                                                                                                                                             
947                                                                                                                                                                                                                                                                                                                                                         
948     return sbc->iommu_ops;                                                                                                                                                                                                                                                                                                                              
949 }   

Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-15 10:53   ` Duan, Zhenzhong
@ 2025-07-15 17:59     ` Nicolin Chen
  2025-07-16  6:26       ` Duan, Zhenzhong
  2025-07-16  8:06       ` Shameerali Kolothum Thodi via
  0 siblings, 2 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-15 17:59 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, linuxarm@huawei.com,
	wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
	jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com

On Tue, Jul 15, 2025 at 10:53:50AM +0000, Duan, Zhenzhong wrote:
> 
> 
> >-----Original Message-----
> >From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> >Subject: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated
> >SMMUv3 to vfio-pci endpoints with iommufd
> >
> >Accelerated SMMUv3 is only useful when the device can take advantage of
> >the host's SMMUv3 in nested mode. To keep things simple and correct, we
> >only allow this feature for vfio-pci endpoint devices that use the iommufd
> >backend. We also allow non-endpoint emulated devices like PCI bridges and
> >root ports, so that users can plug in these vfio-pci devices.
> >
> >Another reason for this limit is to avoid problems with IOTLB
> >invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an associated
> >SID, making it difficult to trace the originating device. If we allowed
> >emulated endpoint devices, QEMU would have to invalidate both its own
> >software IOTLB and the host's hardware IOTLB, which could slow things
> >down.
> >
> >Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
> >translation (S1+S2), their get_address_space() callback must return the
> >system address space to enable correct S2 mappings of guest RAM.
> >
> >So in short:
> > - vfio-pci devices return the system address space
> > - bridges and root ports return the IOMMU address space
> >
> >Note: On ARM, MSI doorbell addresses are also translated via SMMUv3.
> 
> So the translation result is a doorbell addr(gpa) for guest?
> IIUC, there should be a mapping between guest doorbell addr(gpa) to host
> doorbell addr(hpa) in stage2 page table? Where is this mapping setup?

Yes and yes.

On ARM, MSI is behind IOMMU. When 2-stage translation is enabled,
it goes through two stages as you understood.

There are a few ways to implement this, though the current kernel
only supports one solution, which is a hard-coded RMR (reserved
memory region).

The solution sets up a RMR region in the ACPI's IORT, which maps
the stage1 linearly, i.e. gIOVA=gPA.

The gPA=>hPA mappings in the stage-2 are done by the kernel that
polls an IOMMU_RESV_SW_MSI region defined in the kernel driver.

It's not the ideal solution, but it's the simplest to implement.

There are other ways to support this like a true 2-stage mapping
but they are still on the way.

For more details, please refer to this:
https://lore.kernel.org/all/cover.1740014950.git.nicolinc@nvidia.com/

> >+static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool *vfio_pci)
> >+{
> >+
> >+    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
> >+        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||
> >+        object_dynamic_cast(OBJECT(pdev), "gpex-root")) {
> >+        return true;
> >+    } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI) &&
> >+        object_property_find(OBJECT(pdev), "iommufd"))) {
> 
> Will this always return true?

It won't if a vfio-pci device doesn't have the "iommufd" property?

> >+        *vfio_pci = true;
> >+        return true;
> >+    }
> >+    return false;

Then, it returns "false" here.

> > static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void
> >*opaque,
> >                                               int devfn)
> > {
> >+    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> >     SMMUState *bs = opaque;
> >+    bool vfio_pci = false;
> >     SMMUPciBus *sbus;
> >     SMMUv3AccelDevice *accel_dev;
> >     SMMUDevice *sdev;
> >
> >+    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
> >+        error_report("Device(%s) not allowed. Only PCIe root complex
> >devices "
> >+                     "or PCI bridge devices or vfio-pci endpoint devices
> >with "
> >+                     "iommufd as backend is allowed with
> >arm-smmuv3,accel=on",
> >+                     pdev->name);
> >+        exit(1);
> 
> Seems aggressive for a hotplug, could we fail hotplug instead of kill QEMU?

Hotplug will unlikely be supported well, as it would introduce
too much complication.

With iommufd, a vIOMMU object is allocated per device (vfio). If
the device fd (cdev) is not yet given to the QEMU. It isn't able
to allocate a vIOMMU object when creating a VM.

While a vIOMMU object can be allocated at a later stage once the
device is hotplugged. But things like IORT mappings aren't able
to get refreshed since the OS is likely already booted. Even an
IOMMU capability sync via the hw_info ioctl will be difficult to
do at the runtime post the guest iommu driver's initialization.

I am not 100% sure. But I think Intel model could have a similar
problem if the guest boots with zero cold-plugged device and then
hot-plugs a PASID-capable device at the runtime, when the guest-
level IOMMU driver is already inited?

FWIW, Shameer's cover-letter has the following line:
 "At least one vfio-pci device must currently be cold-plugged to
  a PCIe root complex associated with arm-smmuv3,accel=on."

Perhaps there should be a similar highlight in this smmuv3-accel
file as well (@Shameer).

Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-07-14 15:59 ` [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
  2025-07-14 19:37   ` Nicolin Chen
@ 2025-07-15 23:12   ` Nicolin Chen
  2025-07-16  8:36     ` Shameerali Kolothum Thodi via
  1 sibling, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-15 23:12 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:35PM +0100, Shameer Kolothum wrote:
> +static void
> +smmuv3_accel_ste_range(gpointer key, gpointer value, gpointer user_data)
> +{
> +    SMMUDevice *sdev = (SMMUDevice *)key;
> +    uint32_t sid = smmu_get_sid(sdev);
> +    SMMUSIDRange *sid_range = (SMMUSIDRange *)user_data;
> +
> +    if (sid >= sid_range->start && sid <= sid_range->end) {
> +        SMMUv3State *s = sdev->smmu;
> +        SMMUState *bs = &s->smmu_state;
> +
> +        smmuv3_accel_install_nested_ste(bs, sdev, sid);
> +    }
> +}
> +
> +void
> +smmuv3_accel_install_nested_ste_range(SMMUState *bs, SMMUSIDRange *range)
> +{
> +    if (!bs->accel) {
> +        return;
> +    }
> +
> +    g_hash_table_foreach(bs->configs, smmuv3_accel_ste_range, range);

This will not work correctly?

The bs->configs is a cache that gets an entry inserted to when a
config is fetched via smmuv3_get_config(), which gets invoked by
smmuv3_notify_iova() and smmuv3_translate() only.

But CMDQ_OP_CFGI_ALL can actually happen very early, e.g. Linux
driver does that in the probe() right after SMMU CMDQ is enabled,
at which point neither smmuv3_notify_iova nor smmuv3_translate
could ever get invoked, meaning that the g_hash_table is empty.

Without the acceleration, this foreach works because vSMMU does
not need to do anything since the cache is indeed empty.

But, with accel, it must call smmuv3_accel_install_nested_ste().

So, I think this should foreach the viommu->device_list instead.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-14 15:59 ` [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits Shameer Kolothum via
  2025-07-14 20:04   ` Nicolin Chen via
  2025-07-15 10:48   ` Jonathan Cameron via
@ 2025-07-16  2:57   ` Nicolin Chen via
  2025-07-16 10:26     ` Shameerali Kolothum Thodi via
  2025-07-16 11:51     ` Jason Gunthorpe
  2025-07-22 17:42   ` Nicolin Chen
  3 siblings, 2 replies; 79+ messages in thread
From: Nicolin Chen via @ 2025-07-16  2:57 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:40PM +0100, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Not all fields in the SMMU IDR registers are meaningful for userspace.
> Only the following fields can be used:
> 
>   - IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16, TTF  
>   - IDR1: SIDSIZE, SSIDSIZE  
>   - IDR3: BBML, RIL  
>   - IDR5: VAX, GRAN64K, GRAN16K, GRAN4K

But half of these fields are not validated in the patch :-/

My vSMMU didn't work until I added entries like SIDSIZE, SSIDSIZE,
TERM_MODEL, STALL_MODEL, and RIL.

I think IDR5.OAS should be also added in the list. Maybe we should
update the kernel uAPI meanwhile.

> +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
> +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
> +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> +    }
> +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
> +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
> +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> +    }
> +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
> +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
> +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);

Unless there is some conflicts between the QEMU emulation and the
SMMU HW, I think we should probably just override these fields to
the HW values, instead of running comparisons. The justification
could be that these fields are unlikely going to be controlled by
the QEMU but supported directly by the real HW.

For example, if HW supports SSIDSIZE=5, there seems to be no good
reason to limit it to SSIDSIZE=4? Even if the default SSIDSIZE in
the smmuv3_init_regs() is 4.

> @@ -1903,6 +1904,9 @@ static void smmu_reset_exit(Object *obj, ResetType type)
>      }
>  
>      smmuv3_init_regs(s);
> +    if (sys->accel) {
> +        smmuv3_accel_init_regs(s);
> +    }

I feel that we should likely do an if-else instead, i.e.

    if (sys->accel) {
        smmuv3_accel_init_regs(s);
    } else {
        smmuv3_init_regs(s);
    }

The smmuv3_init_regs() enables certain bits that really should be
set by the returned IDRs from hw_info in smmuv3_accel_init_regs().

Doing an overriding call can potentially give us some trouble in
the future if there are new bits being introduced and enabled in
smmuv3_init_regs() but missed in smmuv3_accel_init_regs().

So, it can be simpler in the long run if smmuv3_accel_init_regs()
initializes in its own way, IMHO.

Thanks
Nicolin 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-15 17:29     ` Nicolin Chen
@ 2025-07-16  3:38       ` Duan, Zhenzhong
  2025-07-16  9:27         ` Shameerali Kolothum Thodi via
  0 siblings, 1 reply; 79+ messages in thread
From: Duan, Zhenzhong @ 2025-07-16  3:38 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, linuxarm@huawei.com,
	wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
	jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com



>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce
>smmuv3 accel device
>
>On Tue, Jul 15, 2025 at 10:48:31AM +0000, Duan, Zhenzhong wrote:
>> >+static const TypeInfo types[] = {
>> >+    {
>> >+        .name = TYPE_ARM_SMMUV3_ACCEL,
>> >+        .parent = TYPE_ARM_SMMUV3,
>> >+        .class_init = smmuv3_accel_class_init,
>> >+    }
>>
>> In cover-letter, I see "-device arm-smmuv3", so where is above accel device
>> created so we could use smmuv3_accel_ops?
>
>The smmu-common.c is the shared file between accel and non-accel
>instances. It has a module property:
>    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),

It looks we expose a new TYPE_ARM_SMMUV3_ACCEL type device just for exporting accel iommu_ops?
What about returning accel iommu_ops directly in smmu_iommu_ops_by_type() and drop the new type?

>
>where it directs to different iommu_ops:
>937 static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState *s)
>938 {
>939     SMMUBaseClass *sbc;
>940
>941     if (s->accel) {
>942         sbc =
>ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMUV3_ACCEL));
>943     } else {
>944         sbc =
>ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
>945     }
>946     assert(sbc->iommu_ops);
>947
>948     return sbc->iommu_ops;
>949 }
>
>Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-15 17:59     ` Nicolin Chen
@ 2025-07-16  6:26       ` Duan, Zhenzhong
  2025-07-16  9:34         ` Shameerali Kolothum Thodi via
  2025-07-16  8:06       ` Shameerali Kolothum Thodi via
  1 sibling, 1 reply; 79+ messages in thread
From: Duan, Zhenzhong @ 2025-07-16  6:26 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, linuxarm@huawei.com,
	wangzhou1@hisilicon.com, jiangkunkun@huawei.com,
	jonathan.cameron@huawei.com, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com



>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict
>accelerated SMMUv3 to vfio-pci endpoints with iommufd
>
>On Tue, Jul 15, 2025 at 10:53:50AM +0000, Duan, Zhenzhong wrote:
>>
>>
>> >-----Original Message-----
>> >From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>> >Subject: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict
>accelerated
>> >SMMUv3 to vfio-pci endpoints with iommufd
>> >
>> >Accelerated SMMUv3 is only useful when the device can take advantage of
>> >the host's SMMUv3 in nested mode. To keep things simple and correct, we
>> >only allow this feature for vfio-pci endpoint devices that use the iommufd
>> >backend. We also allow non-endpoint emulated devices like PCI bridges
>and
>> >root ports, so that users can plug in these vfio-pci devices.
>> >
>> >Another reason for this limit is to avoid problems with IOTLB
>> >invalidations. Some commands (e.g., CMD_TLBI_NH_ASID) lack an
>associated
>> >SID, making it difficult to trace the originating device. If we allowed
>> >emulated endpoint devices, QEMU would have to invalidate both its own
>> >software IOTLB and the host's hardware IOTLB, which could slow things
>> >down.
>> >
>> >Since vfio-pci devices in nested mode rely on the host SMMUv3's nested
>> >translation (S1+S2), their get_address_space() callback must return the
>> >system address space to enable correct S2 mappings of guest RAM.
>> >
>> >So in short:
>> > - vfio-pci devices return the system address space
>> > - bridges and root ports return the IOMMU address space
>> >
>> >Note: On ARM, MSI doorbell addresses are also translated via SMMUv3.
>>
>> So the translation result is a doorbell addr(gpa) for guest?
>> IIUC, there should be a mapping between guest doorbell addr(gpa) to host
>> doorbell addr(hpa) in stage2 page table? Where is this mapping setup?
>
>Yes and yes.
>
>On ARM, MSI is behind IOMMU. When 2-stage translation is enabled,
>it goes through two stages as you understood.
>
>There are a few ways to implement this, though the current kernel
>only supports one solution, which is a hard-coded RMR (reserved
>memory region).
>
>The solution sets up a RMR region in the ACPI's IORT, which maps
>the stage1 linearly, i.e. gIOVA=gPA.
>
>The gPA=>hPA mappings in the stage-2 are done by the kernel that
>polls an IOMMU_RESV_SW_MSI region defined in the kernel driver.
>
>It's not the ideal solution, but it's the simplest to implement.
>
>There are other ways to support this like a true 2-stage mapping
>but they are still on the way.
>
>For more details, please refer to this:
>https://lore.kernel.org/all/cover.1740014950.git.nicolinc@nvidia.com/

Thanks for the link, it helps much for understanding arm smmu arch.

>
>> >+static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool
>*vfio_pci)
>> >+{
>> >+
>> >+    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
>> >+        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||
>> >+        object_dynamic_cast(OBJECT(pdev), "gpex-root")) {
>> >+        return true;
>> >+    } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI) &&
>> >+        object_property_find(OBJECT(pdev), "iommufd"))) {
>>
>> Will this always return true?
>
>It won't if a vfio-pci device doesn't have the "iommufd" property?

IIUC, iommufd property is always there, just value not filled for legacy container case.
What about checking VFIOPCIDevice.vbasedev.iommufd?

>
>> >+        *vfio_pci = true;
>> >+        return true;
>> >+    }
>> >+    return false;
>
>Then, it returns "false" here.
>
>> > static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void
>> >*opaque,
>> >                                               int devfn)
>> > {
>> >+    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>> >     SMMUState *bs = opaque;
>> >+    bool vfio_pci = false;
>> >     SMMUPciBus *sbus;
>> >     SMMUv3AccelDevice *accel_dev;
>> >     SMMUDevice *sdev;
>> >
>> >+    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
>> >+        error_report("Device(%s) not allowed. Only PCIe root complex
>> >devices "
>> >+                     "or PCI bridge devices or vfio-pci endpoint
>devices
>> >with "
>> >+                     "iommufd as backend is allowed with
>> >arm-smmuv3,accel=on",
>> >+                     pdev->name);
>> >+        exit(1);
>>
>> Seems aggressive for a hotplug, could we fail hotplug instead of kill QEMU?
>
>Hotplug will unlikely be supported well, as it would introduce
>too much complication.
>
>With iommufd, a vIOMMU object is allocated per device (vfio). If
>the device fd (cdev) is not yet given to the QEMU. It isn't able
>to allocate a vIOMMU object when creating a VM.
>
>While a vIOMMU object can be allocated at a later stage once the
>device is hotplugged. But things like IORT mappings aren't able
>to get refreshed since the OS is likely already booted. Even an
>IOMMU capability sync via the hw_info ioctl will be difficult to
>do at the runtime post the guest iommu driver's initialization.
>
>I am not 100% sure. But I think Intel model could have a similar
>problem if the guest boots with zero cold-plugged device and then
>hot-plugs a PASID-capable device at the runtime, when the guest-
>level IOMMU driver is already inited?

For vtd we define a property for each capability we care about.
When hotplug a device, we get hw_info through ioctl and compare
host's capability with virtual vtd's property setting, if incompatible,
we fail the hotplug.

In old implementation we sync host iommu caps into virtual vtd's cap,
but that's Naked by maintainer. The suggested way is to define property
for each capability we care and do compatibility check.

There is a "pasid" property in virtual vtd, only when it's true, the PASID-capable
device can work with pasid.

Zhenzhong

>
>FWIW, Shameer's cover-letter has the following line:
> "At least one vfio-pci device must currently be cold-plugged to
>  a PCIe root complex associated with arm-smmuv3,accel=on."
>
>Perhaps there should be a similar highlight in this smmuv3-accel
>file as well (@Shameer).
>
>Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-07-15 10:46 ` Duan, Zhenzhong
@ 2025-07-16  7:27   ` Shameerali Kolothum Thodi via
  0 siblings, 0 replies; 79+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-16  7:27 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, ddutile@redhat.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
	zhangfei.gao@linaro.org, shameerkolothum@gmail.com



> -----Original Message-----
> From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
> Sent: Tuesday, July 15, 2025 11:46 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; berrange@redhat.com;
> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> shameerkolothum@gmail.com
> Subject: RE: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-
> creatable accelerated SMMUv3
> 
> Hi Shameer,
> 
> >-----Original Message-----
> >From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> >Subject: [RFC PATCH v3 00/15] hw/arm/virt: Add support for
> >user-creatable accelerated SMMUv3
> >
> >Hi All,
> >
> >This patch series introduces initial support for a user-creatable,
> >accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.
> >
> >This is based on the user-creatable SMMUv3 device series [0].
> >
> >Why this is needed:
> >
> >On ARM, to enable vfio-pci pass-through devices in a VM, the host
> >SMMUv3 must be set up in nested translation mode (Stage 1 + Stage 2),
> >with Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by the
> host.
> >
> >This series introduces an optional accel property for the SMMUv3
> >device, indicating that the guest will try to leverage host SMMUv3
> >features for acceleration. By default, enabling accel configures the
> >host SMMUv3 in nested mode to support vfio-pci pass-through.
> >
> >This new accelerated, user-creatable SMMUv3 device lets you:
> >
> > -Set up a VM with multiple SMMUv3s, each tied to a different physical
> >SMMUv3
> >  on the host. Typically, you’d have multiple PCIe PXB root complexes
> >in the
> >  VM (one per virtual NUMA node), and each of them can have its own
> >SMMUv3.
> >  This setup mirrors the host's layout, where each NUMA node has its
> >own
> >  SMMUv3, and helps build VMs that are more aligned with the host's
> >NUMA
> >  topology.
> 
> Is it a must to mirror the host layout?
> Does this mirror include smmuv3.0 which linked to pcie.0?
> Do we have to create same number of smmuv3 as host smmuv3 for guest?
> What happen if we don't mirror correctly, e.g., vfio device linked to
> smmuv3.0 in guest while in host it linked to smmuv3.1?

It is not a must to mirror the host layout. But NUMA alignment will help you
achieve better performance when you have PCI pass-through devices assigned
to VM. Normally in a HW system each PCIe root complex and associated IOMMU 
will be associated with a particular NUMA node. So if you don't align correctly the 
the memory access won't be optimal.

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt#sect-Virtualization_Tuning_Optimization_Guide-NUMA-Node_Locality_for_PCI

Thanks,
Shameer

> >
> > -The host–guest SMMUv3 association results in reduced invalidation
> >broadcasts
> >  and lookups for devices behind different physical SMMUv3s.
> >
> > -Simplifies handling of host SMMUv3s with differing feature sets.
> >
> > -Lays the groundwork for additional capabilities like vCMDQ support.
> >
> >Changes from RFCv2[1] and key points in RFCv3:
> >
> > -Unlike RFCv2, there is no arm-smmuv3-accel device now. The
> >accelerated
> >  mode is enabled using -device arm-smmuv3,accel=on.
> >
> > -When accel=on is specified, the SMMUv3 will allow only vfio-pci
> > endpoint
> >  devices and any non-endpoint devices like PCI bridges and root ports
> > used
> >  to plug in the vfio-pci. See patch#6
> >
> > -I have tried to keep this RFC simple and basic so we can focus on the
> >  structure of this new accelerated support. That means there is no
> > support
> >  for ATS, PASID, or PRI. Only vfio-pci devices that don’t require
> > these
> >  features will work.
> >
> > -Some clarity is still needed on the final approach to handle MSI
> translation.
> >  Hence, RMR support (which is required for this) is not included yet,
> > but
> >  available in the git branch provided below for testing.
> >
> > -At least one vfio-pci device must currently be cold-plugged to a PCIe
> >root
> >  complex associated with arm-smmuv3,accel=on. This is required to:
> >  1. associate a guest SMMUv3 with a host SMMUv3
> >  2. retrieve the host SMMUv3 feature registers for guest export
> >  This still needs discussion, as there were concerns previously about
> >this
> >  approach and it also breaks hotplug/unplug scenarios. See patch#14
> >
> > -This version does not yet support host SMMUv3 fault handling or other
> >event
> >  notifications. These will be addressed in a future patch series.
> >
> >Branch for testing:
> >
> >This is based on v8 of the SMMUv3 device series and has dependency on
> >the Intel series here [3].
> >
> >https://github.com/hisilicon/qemu/tree/smmuv3-dev-v8-accel-rfcv3
> >
> >
> >Tested on a HiSilicon platform with multiple SMMUv3s.
> >
> >./qemu-system-aarch64 \
> >  -machine virt,accel=kvm,gic-version=3 \
> >  -object iommufd,id=iommufd0 \
> >  -bios QEMU_EFI \
> >  -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
> >  -device virtio-blk-device,drive=fs \
> >  -drive if=none,file=ubuntu.img,id=fs \
> >  -kernel Image \
> >  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \
> 
> Here accel=on, so only vfio device is allowed on pcie.0?
> 
> >  -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
> >  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
> >  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
> >  -device
> >pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io
> >-res
> >erve=1K \
> >  -device
> >vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
> >  -append "rdinit=init console=ttyAMA0 root=/dev/vda rw
> >earlycon=pl011,0x9000000" \
> >  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
> >  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
> >  -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
> >  -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
> >  -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> >  -net none \
> >  -nographic
> >
> >
> >Guest output:
> >
> >root@ubuntu:/# dmesg |grep smmu
> > arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features
> >0x00008305)
> > arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> > arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features
> >0x00008305)
> > arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
> > arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x0
> > arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit (features
> >0x00008305)
> > arm-smmu-v3 arm-smmu-v3.2.auto: allocated 65536 entries for cmdq
> > arm-smmu-v3 arm-smmu-v3.2.auto: allocated 32768 entries for evtq
> >root@ubuntu:/#
> >
> >root@ubuntu:/# lspci -tv
> >-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc Virtio filesystem
> > +-[0000:02]---00.0-[03]----00.0  Huawei Technologies Co., Ltd. Device
> >a22e
> > \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
> >             +-01.0  Huawei Technologies Co., Ltd. Device a251
> >             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
> >             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge
> 
> Are these all the devices in this guest config?
> Will not qemu create some default devices implicitly even if we don't ask
> them in cmdline?
> 
> Thanks
> Zhenzhong


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-07-15 17:22     ` Nicolin Chen
@ 2025-07-16  7:32       ` Shameerali Kolothum Thodi via
  0 siblings, 0 replies; 79+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-16  7:32 UTC (permalink / raw)
  To: Nicolin Chen, Jonathan Cameron
  Cc: Linuxarm, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Wangzhou (B), jiangkunkun,
	zhangfei.gao@linaro.org, zhenzhong.duan@intel.com,
	shameerkolothum@gmail.com



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, July 15, 2025 6:23 PM
> To: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Linuxarm
> <linuxarm@huawei.com>; qemu-arm@nongnu.org; qemu-
> devel@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
> jgg@nvidia.com; ddutile@redhat.com; berrange@redhat.com;
> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com;
> Wangzhou (B) <wangzhou1@hisilicon.com>; jiangkunkun
> <jiangkunkun@huawei.com>; zhangfei.gao@linaro.org;
> zhenzhong.duan@intel.com; shameerkolothum@gmail.com
> Subject: Re: [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation
> commands to hw
> 
> On Tue, Jul 15, 2025 at 11:46:09AM +0100, Jonathan Cameron wrote:
> > >      SMMUCmdError cmd_error = SMMU_CERROR_NONE;
> > >      SMMUQueue *q = &s->cmdq;
> > >      SMMUCommandType type = 0;
> > > +    SMMUCommandBatch batch = {};
> > > +    uint32_t ncmds;
> > >
> > >      if (!smmuv3_cmdq_enabled(s)) {
> > >          return 0;
> > >      }
> > > +
> > > +    ncmds = smmuv3_q_ncmds(q);
> > > +    batch.cmds = g_new0(Cmd, ncmds);
> > > +    batch.cons = g_new0(uint32_t, ncmds);
> >
> > Where is batch.ncmds set?  It is cleared but I'm missing it being set to
> anything.
> 
> smmuv3_accel_batch_cmd() internally sets that, every time it's
> invoked to add a new command in the batch.
> 
> Shameer, let's add some comments explaining the batch function.

Yes. Will add.

> 
> > > +
> >
> > > +    qemu_mutex_lock(&s->mutex);
> > > +    if (!cmd_error && batch.ncmds) {
> > > +        if (!smmuv3_accel_issue_cmd_batch(bs, &batch)) {
> > > +            if (batch.ncmds) {
> > > +                q->cons = batch.cons[batch.ncmds - 1];
> > > +            } else {
> > > +                q->cons = batch.cons[0]; /* FIXME: Check */
> > > +            }
> >
> > Totally non obvious that a return of false from the issue call means
> > illegal command type.  Maybe that will be obvious form comments
> > requested in previous patch review.
> 
> That's a good point. Shameer, I think we need some fine-grinding
> here, validating the return value from the ioctl, for which the
> kernel will only return -EIO or -ETIMEOUT on failure, indicating
> either an SMMU_CERROR_ILL or an SMMU_CERROR_ATC_INV_SYNC.

Yeah. I was not sure on this. Also on setting current cons pointer in case IOCTL 
return for some reason other than attempting the CMD. I will double check
this.
 
Thanks,
Shameer

> > > +            qemu_log_mask(LOG_GUEST_ERROR, "Illegal command type:
> %d\n",
> > > +                          CMD_TYPE(&batch.cmds[batch.ncmds]));
> > > +            cmd_error = SMMU_CERROR_ILL;
> 
> Thanks
> Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-15 17:59     ` Nicolin Chen
  2025-07-16  6:26       ` Duan, Zhenzhong
@ 2025-07-16  8:06       ` Shameerali Kolothum Thodi via
  1 sibling, 0 replies; 79+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-16  8:06 UTC (permalink / raw)
  To: Nicolin Chen, Duan, Zhenzhong
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, July 15, 2025 6:59 PM
> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; jgg@nvidia.com; ddutile@redhat.com;
> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; shameerkolothum@gmail.com
> Subject: Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict
> accelerated SMMUv3 to vfio-pci endpoints with iommufd

...

> > >+    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
> > >+        error_report("Device(%s) not allowed. Only PCIe root complex
> > >devices "
> > >+                     "or PCI bridge devices or vfio-pci endpoint devices
> > >with "
> > >+                     "iommufd as backend is allowed with
> > >arm-smmuv3,accel=on",
> > >+                     pdev->name);
> > >+        exit(1);
> >
> > Seems aggressive for a hotplug, could we fail hotplug instead of kill
> QEMU?

That's right. I will try to see whether it is possible to do a dev->hotplugged
check here.
 
> Hotplug will unlikely be supported well, as it would introduce
> too much complication.
> 
> With iommufd, a vIOMMU object is allocated per device (vfio). If
> the device fd (cdev) is not yet given to the QEMU. It isn't able
> to allocate a vIOMMU object when creating a VM.
> 
> While a vIOMMU object can be allocated at a later stage once the
> device is hotplugged. But things like IORT mappings aren't able
> to get refreshed since the OS is likely already booted.

Why do we need IORT mappings to be refreshed during hotplug?
AFAICS, the mappings are created per host bridge Ids. And how is this
different from a host machine doing hotplug?

 Even an
> IOMMU capability sync via the hw_info ioctl will be difficult to
> do at the runtime post the guest iommu driver's initialization.

We had some discussion on this "at least one vfio-pci" restriction
for accelerated mode previously here.
https://lore.kernel.org/qemu-devel/Z6TtCLQ35UI12T77@redhat.com/#t

I am not sure we reached any consensus on that. The 3 different approaches
discussed were,

1. The current one used here. At least one cold plugged vfio-pci device
   so that  we can retrieve the host SMMUV3 HW_INFO as per current
  IOMMUFD APIs.

2. A new IOMMUFD API to retrieve HW_INFO without a device. 

3. A fully specified vSMMUv3 through Qemu command line so that we
   don't need HW_INFO from kernel.

We're going with option one for now, but completely blocking hotplug
because of it  feels a bit too restrictive to me.

The real issue (for now), as I see it, is that we need some way to remember
the Guest SMMUv3 <-> Host SMMUv3 mapping after the guest has booted.
That way, even if all devices tied to a Guest SMMUv3 get hot-unplugged,
QEMU can still block attaching a device that belongs to a different Host
SMMUv3.

Thanks,
Shameer

> I am not 100% sure. But I think Intel model could have a similar
> problem if the guest boots with zero cold-plugged device and then
> hot-plugs a PASID-capable device at the runtime, when the guest-
> level IOMMU driver is already inited?
> 
> FWIW, Shameer's cover-letter has the following line:
>  "At least one vfio-pci device must currently be cold-plugged to
>   a PCIe root complex associated with arm-smmuv3,accel=on."
> 
> Perhaps there should be a similar highlight in this smmuv3-accel
> file as well (@Shameer).
> 
> Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-07-15 23:12   ` Nicolin Chen
@ 2025-07-16  8:36     ` Shameerali Kolothum Thodi via
  2025-07-16 18:17       ` Nicolin Chen
  0 siblings, 1 reply; 79+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-16  8:36 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	zhenzhong.duan@intel.com, shameerkolothum@gmail.com



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, July 16, 2025 12:13 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> zhenzhong.duan@intel.com; shameerkolothum@gmail.com
> Subject: Re: [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested
> STE install/uninstall support
> 
> On Mon, Jul 14, 2025 at 04:59:35PM +0100, Shameer Kolothum wrote:
> > +static void
> > +smmuv3_accel_ste_range(gpointer key, gpointer value, gpointer
> user_data)
> > +{
> > +    SMMUDevice *sdev = (SMMUDevice *)key;
> > +    uint32_t sid = smmu_get_sid(sdev);
> > +    SMMUSIDRange *sid_range = (SMMUSIDRange *)user_data;
> > +
> > +    if (sid >= sid_range->start && sid <= sid_range->end) {
> > +        SMMUv3State *s = sdev->smmu;
> > +        SMMUState *bs = &s->smmu_state;
> > +
> > +        smmuv3_accel_install_nested_ste(bs, sdev, sid);
> > +    }
> > +}
> > +
> > +void
> > +smmuv3_accel_install_nested_ste_range(SMMUState *bs,
> SMMUSIDRange *range)
> > +{
> > +    if (!bs->accel) {
> > +        return;
> > +    }
> > +
> > +    g_hash_table_foreach(bs->configs, smmuv3_accel_ste_range, range);
> 
> This will not work correctly?
> 
> The bs->configs is a cache that gets an entry inserted to when a
> config is fetched via smmuv3_get_config(), which gets invoked by
> smmuv3_notify_iova() and smmuv3_translate() only.
> 
> But CMDQ_OP_CFGI_ALL can actually happen very early, e.g. Linux
> driver does that in the probe() right after SMMU CMDQ is enabled,
> at which point neither smmuv3_notify_iova nor smmuv3_translate
> could ever get invoked, meaning that the g_hash_table is empty.
> 
> Without the acceleration, this foreach works because vSMMU does
> not need to do anything since the cache is indeed empty.
> 
> But, with accel, it must call smmuv3_accel_install_nested_ste().

Ok. The only place I can see CMDQ_OP_CFGI_ALL get invoked by Linux
kernel is during arm_smmu_device_reset() and that is to clear all.
But I am not sure we will have any valid STEs at that time. Just curious,
are you seeing any issues with this at the moment?
 
> So, I think this should foreach the viommu->device_list instead.

But agree. Using device_list is more appropriate unless we cache the 
configs during each install_netsed_ste() path.

Thanks,
Shameer 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-16  3:38       ` Duan, Zhenzhong
@ 2025-07-16  9:27         ` Shameerali Kolothum Thodi via
  2025-09-04 14:31           ` Eric Auger
  0 siblings, 1 reply; 79+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-16  9:27 UTC (permalink / raw)
  To: Duan, Zhenzhong, Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com



> -----Original Message-----
> From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
> Sent: Wednesday, July 16, 2025 4:39 AM
> To: Nicolin Chen <nicolinc@nvidia.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; jgg@nvidia.com; ddutile@redhat.com;
> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; shameerkolothum@gmail.com
> Subject: RE: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce
> smmuv3 accel device
> 
> 
> 
> >-----Original Message-----
> >From: Nicolin Chen <nicolinc@nvidia.com>
> >Subject: Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce
> >smmuv3 accel device
> >
> >On Tue, Jul 15, 2025 at 10:48:31AM +0000, Duan, Zhenzhong wrote:
> >> >+static const TypeInfo types[] = {
> >> >+    {
> >> >+        .name = TYPE_ARM_SMMUV3_ACCEL,
> >> >+        .parent = TYPE_ARM_SMMUV3,
> >> >+        .class_init = smmuv3_accel_class_init,
> >> >+    }
> >>
> >> In cover-letter, I see "-device arm-smmuv3", so where is above accel
> >> device created so we could use smmuv3_accel_ops?
> >
> >The smmu-common.c is the shared file between accel and non-accel
> >instances. It has a module property:
> >    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),
> 
> It looks we expose a new TYPE_ARM_SMMUV3_ACCEL type device just for
> exporting accel iommu_ops?
> What about returning accel iommu_ops directly in
> smmu_iommu_ops_by_type() and drop the new type?

We are not creating any new device here. Its just a Class object of different type.
I had a different approach previously and Eric suggested to try this as there
are examples in VFIO/IOMMUFD for something like this.

https://lore.kernel.org/qemu-devel/1105d100-dd1e-4aca-b518-50f903967416@redhat.com/

Thanks,
Shameer

> >
> >where it directs to different iommu_ops:
> >937 static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState
> *s)
> >938 {
> >939     SMMUBaseClass *sbc;
> >940
> >941     if (s->accel) {
> >942         sbc =
> >ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMUV3_ACCEL));
> >943     } else {
> >944         sbc =
> >ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
> >945     }
> >946     assert(sbc->iommu_ops);
> >947
> >948     return sbc->iommu_ops;
> >949 }
> >
> >Nicolin



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-07-15 17:01     ` Nicolin Chen
@ 2025-07-16  9:33       ` Jonathan Cameron via
  0 siblings, 0 replies; 79+ messages in thread
From: Jonathan Cameron via @ 2025-07-16  9:33 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, linuxarm, qemu-arm, qemu-devel, eric.auger,
	peter.maydell, jgg, ddutile, berrange, nathanc, mochs, smostafa,
	wangzhou1, jiangkunkun, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Tue, 15 Jul 2025 10:01:21 -0700
Nicolin Chen <nicolinc@nvidia.com> wrote:

> On Tue, Jul 15, 2025 at 11:29:41AM +0100, Jonathan Cameron wrote:
> > > +    if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
> > > +                                      IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
> > > +                                      s2_hwpt_id, &viommu_id, errp)) {
> > > +        return false;
> > > +    }  
> 
> [...]
> 
> > > +free_abort_hwpt:
> > > +    iommufd_backend_free_id(idev->iommufd, viommu->abort_hwpt_id);
> > > +free_viommu:
> > > +    iommufd_backend_free_id(idev->iommufd, viommu->core.viommu_id);
> > > +    g_free(viommu);  
> > 
> > No unwinding of iommufd_backened_alloc_viommu?
> > Looks like we just leak it until destruction of the fd. 
> > 
> > Maybe add a comment for those like me who aren't all that familiar with
> > this stuff and see an alloc with no matching free.  
> 
> Those iommufd_backend_free_id calls are the reverts. An iommufd
> object is free-ed using its object id, i.e. the "viommu_id" and
> "abort_hwpt_id" in the lines.

Ah.  I confused IDs and thought this was unwinding the two
iommufd_backend_alloc_hwpt() calls but of course the second one doesn't
need unwinding in the error path as it is the last call so if it fails
nothing to unwind.

So feel free to ignore this comment. 

> 
> Adding comments to every single iommufd_backened_free_id() call
> isn't optimal, IMHO, as that function would be invoked across
> different vIOMMU files and even the vfio/iommufd core files.
> 
> Perhaps QEMU should wrap it up with a helper, E.g.
> 
> static inline void iommufd_backend_free(int iommufd, int obj_id)
> {
> 	iommufd_backend_free_id(iommufd, obj_id);
> }
> 
> if it helps readability?

That would then leave us with iommufd_backend_alloc_hwpt() being
unwound by a direct iommufd_backend_free_id() which is equally
odd - so let's just keep them all being odd.

Jonathan

> 
> Nicolin



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-16  6:26       ` Duan, Zhenzhong
@ 2025-07-16  9:34         ` Shameerali Kolothum Thodi via
  2025-07-16 10:32           ` Duan, Zhenzhong
  2025-07-16 17:51           ` Nicolin Chen
  0 siblings, 2 replies; 79+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-16  9:34 UTC (permalink / raw)
  To: Duan, Zhenzhong, Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com



> -----Original Message-----
> From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
> Sent: Wednesday, July 16, 2025 7:26 AM
> To: Nicolin Chen <nicolinc@nvidia.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; jgg@nvidia.com; ddutile@redhat.com;
> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; shameerkolothum@gmail.com
> Subject: RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict
> accelerated SMMUv3 to vfio-pci endpoints with iommufd
> 

...

> >> >+static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool
> >*vfio_pci)
> >> >+{
> >> >+
> >> >+    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
> >> >+        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||
> >> >+        object_dynamic_cast(OBJECT(pdev), "gpex-root")) {
> >> >+        return true;
> >> >+    } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI) &&
> >> >+        object_property_find(OBJECT(pdev), "iommufd"))) {
> >>
> >> Will this always return true?
> >
> >It won't if a vfio-pci device doesn't have the "iommufd" property?
> 
> IIUC, iommufd property is always there, just value not filled for legacy
> container case.
> What about checking VFIOPCIDevice.vbasedev.iommufd?

That's right. The property is always there. But instead of accessing VFIOPCIDevice
in SMMUv3 code, I think we can use object_property_get_link(obj, "iommufd", &error_abort)
instead?
 
> >
> >> >+        *vfio_pci = true;
> >> >+        return true;
> >> >+    }
> >> >+    return false;
> >
> >Then, it returns "false" here.
> >
> >> > static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void
> >> >*opaque,
> >> >                                               int devfn)
> >> > {
> >> >+    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> >> >     SMMUState *bs = opaque;
> >> >+    bool vfio_pci = false;
> >> >     SMMUPciBus *sbus;
> >> >     SMMUv3AccelDevice *accel_dev;
> >> >     SMMUDevice *sdev;
> >> >
> >> >+    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
> >> >+        error_report("Device(%s) not allowed. Only PCIe root complex
> >> >devices "
> >> >+                     "or PCI bridge devices or vfio-pci endpoint
> >devices
> >> >with "
> >> >+                     "iommufd as backend is allowed with
> >> >arm-smmuv3,accel=on",
> >> >+                     pdev->name);
> >> >+        exit(1);
> >>
> >> Seems aggressive for a hotplug, could we fail hotplug instead of kill
> QEMU?
> >
> >Hotplug will unlikely be supported well, as it would introduce
> >too much complication.
> >
> >With iommufd, a vIOMMU object is allocated per device (vfio). If
> >the device fd (cdev) is not yet given to the QEMU. It isn't able
> >to allocate a vIOMMU object when creating a VM.
> >
> >While a vIOMMU object can be allocated at a later stage once the
> >device is hotplugged. But things like IORT mappings aren't able
> >to get refreshed since the OS is likely already booted. Even an
> >IOMMU capability sync via the hw_info ioctl will be difficult to
> >do at the runtime post the guest iommu driver's initialization.
> >
> >I am not 100% sure. But I think Intel model could have a similar
> >problem if the guest boots with zero cold-plugged device and then
> >hot-plugs a PASID-capable device at the runtime, when the guest-
> >level IOMMU driver is already inited?
> 
> For vtd we define a property for each capability we care about.
> When hotplug a device, we get hw_info through ioctl and compare
> host's capability with virtual vtd's property setting, if incompatible,
> we fail the hotplug.
> 
> In old implementation we sync host iommu caps into virtual vtd's cap,
> but that's Naked by maintainer. The suggested way is to define property
> for each capability we care and do compatibility check.
> 
> There is a "pasid" property in virtual vtd, only when it's true, the PASID-
> capable
> device can work with pasid.

Thanks for this information. I think probably we need to take a look at this as
this doesn't have a dependency on cold-plug device to be present for SMMUv3.
Will go through intel vtd implementation.

Thanks,
Shameer




^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16  2:57   ` Nicolin Chen via
@ 2025-07-16 10:26     ` Shameerali Kolothum Thodi via
  2025-07-16 18:37       ` Nicolin Chen
  2025-07-16 11:51     ` Jason Gunthorpe
  1 sibling, 1 reply; 79+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-07-16 10:26 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	zhenzhong.duan@intel.com, shameerkolothum@gmail.com



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, July 16, 2025 3:58 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> zhenzhong.duan@intel.com; shameerkolothum@gmail.com
> Subject: Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature
> bits
> 
> On Mon, Jul 14, 2025 at 04:59:40PM +0100, Shameer Kolothum wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> >
> > Not all fields in the SMMU IDR registers are meaningful for userspace.
> > Only the following fields can be used:
> >
> >   - IDR0: ST_LEVEL, TERM_MODEL, STALL_MODEL, TTENDIAN, CD2L, ASID16,
> TTF
> >   - IDR1: SIDSIZE, SSIDSIZE
> >   - IDR3: BBML, RIL
> >   - IDR5: VAX, GRAN64K, GRAN16K, GRAN4K
> 
> But half of these fields are not validated in the patch :-/

That’s why I said " Use the relevant fields from these to check.." .
But sorry, I was conservative ☹ and not sure the SSIDSIZE/STALL mattered
for non pasid cases.

> My vSMMU didn't work until I added entries like SIDSIZE, SSIDSIZE,
> TERM_MODEL, STALL_MODEL, and RIL.

How come your vSMMU not working? Or you meant the assigned
dev is not working?

The emulation supports SIDSIZE = 16 and RIL. Could you please
share the difference between these values w.r.t host SMMUv3.

> I think IDR5.OAS should be also added in the list. Maybe we should
> update the kernel uAPI meanwhile.

Ok.
 
> > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
> > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
> > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> > +    }
> > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
> > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
> > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> > +    }
> > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
> > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
> > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> 
> Unless there is some conflicts between the QEMU emulation and the
> SMMU HW, I think we should probably just override these fields to
> the HW values, instead of running comparisons. The justification
> could be that these fields are unlikely going to be controlled by
> the QEMU but supported directly by the real HW.
> 
> For example, if HW supports SSIDSIZE=5, there seems to be no good
> reason to limit it to SSIDSIZE=4? Even if the default SSIDSIZE in
> the smmuv3_init_regs() is 4.
> 
> > @@ -1903,6 +1904,9 @@ static void smmu_reset_exit(Object *obj,
> ResetType type)
> >      }
> >
> >      smmuv3_init_regs(s);
> > +    if (sys->accel) {
> > +        smmuv3_accel_init_regs(s);
> > +    }
> 
> I feel that we should likely do an if-else instead, i.e.
> 
>     if (sys->accel) {
>         smmuv3_accel_init_regs(s);
>     } else {
>         smmuv3_init_regs(s);
>     }
> 
> The smmuv3_init_regs() enables certain bits that really should be
> set by the returned IDRs from hw_info in smmuv3_accel_init_regs().
> 
> Doing an overriding call can potentially give us some trouble in
> the future if there are new bits being introduced and enabled in
> smmuv3_init_regs() but missed in smmuv3_accel_init_regs().
> 
> So, it can be simpler in the long run if smmuv3_accel_init_regs()
> initializes in its own way, IMHO.

Ok. Are you suggesting we simply override the IDR values from Host?
I don't think that is a good idea as it is not just the IDR values that
determines the host features. And we had a discussion on this 
in v2 and the suggestion was " vmm should not be copying IDR fields
blindly..."

https://lore.kernel.org/qemu-devel/Z+VNA+hFu0LJn19l@nvidia.com/

Probably we should take a look at Intel vtd implementation mentioned
by Zhenzhong in the other thread where it looks like there seems to be
a property for each capability they care about.

Probably something like,
-device arm-smmuv3,accel=on,pasid_cap=on,

And then enabling all features related to pasid and on later when
we retrieve the HW_INFO on device plug, compare and fail if not?

But I think on ARM, we still we have limitations in knowing the actual
host supported features through IDR. In that case, we can only assume
that user is making an informed decision while enabling these features.

Thanks,
Shameer



^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-16  9:34         ` Shameerali Kolothum Thodi via
@ 2025-07-16 10:32           ` Duan, Zhenzhong
  2025-07-16 17:51           ` Nicolin Chen
  1 sibling, 0 replies; 79+ messages in thread
From: Duan, Zhenzhong @ 2025-07-16 10:32 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com



>-----Original Message-----
>From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>Subject: RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict
>accelerated SMMUv3 to vfio-pci endpoints with iommufd
>
>
>
>> -----Original Message-----
>> From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> Sent: Wednesday, July 16, 2025 7:26 AM
>> To: Nicolin Chen <nicolinc@nvidia.com>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; eric.auger@redhat.com;
>> peter.maydell@linaro.org; jgg@nvidia.com; ddutile@redhat.com;
>> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
>> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org; shameerkolothum@gmail.com
>> Subject: RE: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict
>> accelerated SMMUv3 to vfio-pci endpoints with iommufd
>>
>
>...
>
>> >> >+static bool smmuv3_accel_pdev_allowed(PCIDevice *pdev, bool
>> >*vfio_pci)
>> >> >+{
>> >> >+
>> >> >+    if (object_dynamic_cast(OBJECT(pdev), TYPE_PCI_BRIDGE) ||
>> >> >+        object_dynamic_cast(OBJECT(pdev), "pxb-pcie") ||
>> >> >+        object_dynamic_cast(OBJECT(pdev), "gpex-root")) {
>> >> >+        return true;
>> >> >+    } else if ((object_dynamic_cast(OBJECT(pdev), TYPE_VFIO_PCI)
>&&
>> >> >+        object_property_find(OBJECT(pdev), "iommufd"))) {
>> >>
>> >> Will this always return true?
>> >
>> >It won't if a vfio-pci device doesn't have the "iommufd" property?
>>
>> IIUC, iommufd property is always there, just value not filled for legacy
>> container case.
>> What about checking VFIOPCIDevice.vbasedev.iommufd?
>
>That's right. The property is always there. But instead of accessing
>VFIOPCIDevice
>in SMMUv3 code, I think we can use object_property_get_link(obj, "iommufd",
>&error_abort)
>instead?

Yes, looks better.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16  2:57   ` Nicolin Chen via
  2025-07-16 10:26     ` Shameerali Kolothum Thodi via
@ 2025-07-16 11:51     ` Jason Gunthorpe
  2025-07-16 17:35       ` Nicolin Chen
  1 sibling, 1 reply; 79+ messages in thread
From: Jason Gunthorpe @ 2025-07-16 11:51 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Tue, Jul 15, 2025 at 07:57:57PM -0700, Nicolin Chen wrote:
> > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
> > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
> > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> > +    }
> > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
> > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
> > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> > +    }
> > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
> > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
> > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> 
> Unless there is some conflicts between the QEMU emulation and the
> SMMU HW, I think we should probably just override these fields to
> the HW values,

The qemu model should be fully independent of the underlying HW, it
should not override from HW.

It should check if the underlying supports the model and fail if it
doesn't.

Jason


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16 11:51     ` Jason Gunthorpe
@ 2025-07-16 17:35       ` Nicolin Chen
  2025-07-16 17:45         ` Jason Gunthorpe
  0 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-16 17:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Wed, Jul 16, 2025 at 08:51:23AM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 15, 2025 at 07:57:57PM -0700, Nicolin Chen wrote:
> > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
> > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
> > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> > > +    }
> > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
> > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
> > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> > > +    }
> > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
> > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
> > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> > 
> > Unless there is some conflicts between the QEMU emulation and the
> > SMMU HW, I think we should probably just override these fields to
> > the HW values,
> 
> The qemu model should be fully independent of the underlying HW, it
> should not override from HW.
> 
> It should check if the underlying supports the model and fail if it
> doesn't.

For every bit? If there is a conflict at a certain field (e.g.
VMM only supports little endian while HW supports big endian),
it must fail.

But here, I mean for these specific fields such as GRANxK and
RIL (range-based invalidation), we should override them with
the HW values. Otherwise, the guest OS seeing RIL for example
will issue TLBI commands that the host can't support. Right?

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16 17:35       ` Nicolin Chen
@ 2025-07-16 17:45         ` Jason Gunthorpe
  2025-07-16 18:09           ` Nicolin Chen
  0 siblings, 1 reply; 79+ messages in thread
From: Jason Gunthorpe @ 2025-07-16 17:45 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Wed, Jul 16, 2025 at 10:35:25AM -0700, Nicolin Chen wrote:
> On Wed, Jul 16, 2025 at 08:51:23AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jul 15, 2025 at 07:57:57PM -0700, Nicolin Chen wrote:
> > > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
> > > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
> > > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> > > > +    }
> > > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
> > > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
> > > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> > > > +    }
> > > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
> > > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
> > > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> > > 
> > > Unless there is some conflicts between the QEMU emulation and the
> > > SMMU HW, I think we should probably just override these fields to
> > > the HW values,
> > 
> > The qemu model should be fully independent of the underlying HW, it
> > should not override from HW.
> > 
> > It should check if the underlying supports the model and fail if it
> > doesn't.
> 
> For every bit? If there is a conflict at a certain field (e.g.
> VMM only supports little endian while HW supports big endian),
> it must fail.

Yes every bit.

> But here, I mean for these specific fields such as GRANxK and
> RIL (range-based invalidation), we should override them with
> the HW values. Otherwise, the guest OS seeing RIL for example
> will issue TLBI commands that the host can't support. Right?

No.

If the SMMU model does not include RIL then RIL is not available to
the guest.

If the SMMU model only supports GRAN4K, then the guest only uses 4k.

This exactness is critical for live migration. We cannot have the IDRs
change during live migration.

So there should be some built in models in qemu that define exactly
what kind of SMMU you get, and things like if 4k/16k/64k or RIL are
included in that model or not should be command line parameters/etc
like everything else in qemu..

Jason


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-16  9:34         ` Shameerali Kolothum Thodi via
  2025-07-16 10:32           ` Duan, Zhenzhong
@ 2025-07-16 17:51           ` Nicolin Chen
  2025-07-16 18:21             ` Nicolin Chen
  1 sibling, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-16 17:51 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Duan, Zhenzhong, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com

On Wed, Jul 16, 2025 at 09:34:04AM +0000, Shameerali Kolothum Thodi wrote:
> > >> Seems aggressive for a hotplug, could we fail hotplug instead of kill
> > QEMU?
> > >
> > >Hotplug will unlikely be supported well, as it would introduce
> > >too much complication.
> > >
> > >With iommufd, a vIOMMU object is allocated per device (vfio). If
> > >the device fd (cdev) is not yet given to the QEMU. It isn't able
> > >to allocate a vIOMMU object when creating a VM.
> > >
> > >While a vIOMMU object can be allocated at a later stage once the
> > >device is hotplugged. But things like IORT mappings aren't able
> > >to get refreshed since the OS is likely already booted. Even an
> > >IOMMU capability sync via the hw_info ioctl will be difficult to
> > >do at the runtime post the guest iommu driver's initialization.
> > >
> > >I am not 100% sure. But I think Intel model could have a similar
> > >problem if the guest boots with zero cold-plugged device and then
> > >hot-plugs a PASID-capable device at the runtime, when the guest-
> > >level IOMMU driver is already inited?
> > 
> > For vtd we define a property for each capability we care about.
> > When hotplug a device, we get hw_info through ioctl and compare
> > host's capability with virtual vtd's property setting, if incompatible,
> > we fail the hotplug.
> > 
> > In old implementation we sync host iommu caps into virtual vtd's cap,
> > but that's Naked by maintainer. The suggested way is to define property
> > for each capability we care and do compatibility check.
> > 
> > There is a "pasid" property in virtual vtd, only when it's true, the PASID-
> > capable
> > device can work with pasid.
> 
> Thanks for this information. I think probably we need to take a look at this as
> this doesn't have a dependency on cold-plug device to be present for SMMUv3.
> Will go through intel vtd implementation.

I see. A compatibility test sounds promising.

It still feels tricky when dealing with multi vSMMU instances, if
some instances don't have a cold-plug device to poll hw_info. We
would need to pre-define all the feature bits. Then, run the test
on every hotplug device attached later to the vSMMU instance.

Maybe we could do something wise:
The sysfs node provides all the IOMMU nodes. So, we could compare
the node names to see if they are likely symmetric or not. Nodes
sharing the same naming pattern are more likely created by the
same IOMMU driver. So, as a speculation, a vSMMU instance with no
coldplug device could borrow the bits from a vSMMU instance with
a device?

Sure, individual IOMMU instances could differ in specific fields
despite using the same node name. This would unfortunately lead
to hotplug failure upon the compatibility check.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16 17:45         ` Jason Gunthorpe
@ 2025-07-16 18:09           ` Nicolin Chen
  2025-07-16 18:42             ` Jason Gunthorpe
  0 siblings, 1 reply; 79+ messages in thread
From: Nicolin Chen @ 2025-07-16 18:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Wed, Jul 16, 2025 at 02:45:06PM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 16, 2025 at 10:35:25AM -0700, Nicolin Chen wrote:
> > On Wed, Jul 16, 2025 at 08:51:23AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Jul 15, 2025 at 07:57:57PM -0700, Nicolin Chen wrote:
> > > > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN4K);
> > > > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN4K)) {
> > > > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> > > > > +    }
> > > > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN16K);
> > > > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN16K)) {
> > > > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> > > > > +    }
> > > > > +    val = FIELD_EX32(s_accel->info.idr[5], IDR5, GRAN64K);
> > > > > +    if (val < FIELD_EX32(s->idr[5], IDR5, GRAN64K)) {
> > > > > +        s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> > > > 
> > > > Unless there is some conflicts between the QEMU emulation and the
> > > > SMMU HW, I think we should probably just override these fields to
> > > > the HW values,
> > > 
> > > The qemu model should be fully independent of the underlying HW, it
> > > should not override from HW.
> > > 
> > > It should check if the underlying supports the model and fail if it
> > > doesn't.
> > 
> > For every bit? If there is a conflict at a certain field (e.g.
> > VMM only supports little endian while HW supports big endian),
> > it must fail.
> 
> Yes every bit.
> 
> > But here, I mean for these specific fields such as GRANxK and
> > RIL (range-based invalidation), we should override them with
> > the HW values. Otherwise, the guest OS seeing RIL for example
> > will issue TLBI commands that the host can't support. Right?
> 
> No.
> 
> If the SMMU model does not include RIL then RIL is not available to
> the guest.
> 
> If the SMMU model only supports GRAN4K, then the guest only uses 4k.
> 
> This exactness is critical for live migration. We cannot have the IDRs
> change during live migration.
> 
> So there should be some built in models in qemu that define exactly
> what kind of SMMU you get, and things like if 4k/16k/64k or RIL are
> included in that model or not should be command line parameters/etc
> like everything else in qemu..

OK. I see your point. That will leads to a very long list of
parameters.

So, a vSMMU model is defined following the parameters in the
command line. A device (and its attaching SMMU HW) that's not
compatibile should just fail the cold-plug at the beginning.

Then, it shouldn't run into any problem that I encountered.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-07-16  8:36     ` Shameerali Kolothum Thodi via
@ 2025-07-16 18:17       ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-16 18:17 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	zhenzhong.duan@intel.com, shameerkolothum@gmail.com

On Wed, Jul 16, 2025 at 08:36:38AM +0000, Shameerali Kolothum Thodi wrote:
> > > +    g_hash_table_foreach(bs->configs, smmuv3_accel_ste_range, range);
> > 
> > This will not work correctly?
> > 
> > The bs->configs is a cache that gets an entry inserted to when a
> > config is fetched via smmuv3_get_config(), which gets invoked by
> > smmuv3_notify_iova() and smmuv3_translate() only.
> > 
> > But CMDQ_OP_CFGI_ALL can actually happen very early, e.g. Linux
> > driver does that in the probe() right after SMMU CMDQ is enabled,
> > at which point neither smmuv3_notify_iova nor smmuv3_translate
> > could ever get invoked, meaning that the g_hash_table is empty.
> > 
> > Without the acceleration, this foreach works because vSMMU does
> > not need to do anything since the cache is indeed empty.
> > 
> > But, with accel, it must call smmuv3_accel_install_nested_ste().
> 
> Ok. The only place I can see CMDQ_OP_CFGI_ALL get invoked by Linux
> kernel is during arm_smmu_device_reset() and that is to clear all.
> But I am not sure we will have any valid STEs at that time. Just curious,
> are you seeing any issues with this at the moment?

I recall that (not for this series) I hit some issue with a guest
having "iommu.passthrough=y" string in its bootcmd. The guest OS
initialized all SIDs to a Config.Bypass mode accordingly. But that
was not handled correctly by QEMU so the host was not getting any
request to program a stage-1 bypass STE to the HW.

So, I think there would be a similar issue here.

Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-16 17:51           ` Nicolin Chen
@ 2025-07-16 18:21             ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-16 18:21 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Duan, Zhenzhong, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com

On Wed, Jul 16, 2025 at 10:51:23AM -0700, Nicolin Chen wrote:
> On Wed, Jul 16, 2025 at 09:34:04AM +0000, Shameerali Kolothum Thodi wrote:
> > > >> Seems aggressive for a hotplug, could we fail hotplug instead of kill
> > > QEMU?
> > > >
> > > >Hotplug will unlikely be supported well, as it would introduce
> > > >too much complication.
> > > >
> > > >With iommufd, a vIOMMU object is allocated per device (vfio). If
> > > >the device fd (cdev) is not yet given to the QEMU. It isn't able
> > > >to allocate a vIOMMU object when creating a VM.
> > > >
> > > >While a vIOMMU object can be allocated at a later stage once the
> > > >device is hotplugged. But things like IORT mappings aren't able
> > > >to get refreshed since the OS is likely already booted. Even an
> > > >IOMMU capability sync via the hw_info ioctl will be difficult to
> > > >do at the runtime post the guest iommu driver's initialization.
> > > >
> > > >I am not 100% sure. But I think Intel model could have a similar
> > > >problem if the guest boots with zero cold-plugged device and then
> > > >hot-plugs a PASID-capable device at the runtime, when the guest-
> > > >level IOMMU driver is already inited?
> > > 
> > > For vtd we define a property for each capability we care about.
> > > When hotplug a device, we get hw_info through ioctl and compare
> > > host's capability with virtual vtd's property setting, if incompatible,
> > > we fail the hotplug.
> > > 
> > > In old implementation we sync host iommu caps into virtual vtd's cap,
> > > but that's Naked by maintainer. The suggested way is to define property
> > > for each capability we care and do compatibility check.
> > > 
> > > There is a "pasid" property in virtual vtd, only when it's true, the PASID-
> > > capable
> > > device can work with pasid.
> > 
> > Thanks for this information. I think probably we need to take a look at this as
> > this doesn't have a dependency on cold-plug device to be present for SMMUv3.
> > Will go through intel vtd implementation.
> 
> I see. A compatibility test sounds promising.
> 
> It still feels tricky when dealing with multi vSMMU instances, if
> some instances don't have a cold-plug device to poll hw_info. We
> would need to pre-define all the feature bits. Then, run the test
> on every hotplug device attached later to the vSMMU instance.
> 
> Maybe we could do something wise:
> The sysfs node provides all the IOMMU nodes. So, we could compare
> the node names to see if they are likely symmetric or not. Nodes
> sharing the same naming pattern are more likely created by the
> same IOMMU driver. So, as a speculation, a vSMMU instance with no
> coldplug device could borrow the bits from a vSMMU instance with
> a device?
> 
> Sure, individual IOMMU instances could differ in specific fields
> despite using the same node name. This would unfortunately lead
> to hotplug failure upon the compatibility check.

Hmm, forget what I said here. Each vSMMU instance should be pre
defined with a list of parameters. So, we will need to run the
compatibility test not only for hotplug devices, but coldplug
ones too.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16 10:26     ` Shameerali Kolothum Thodi via
@ 2025-07-16 18:37       ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-16 18:37 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	zhenzhong.duan@intel.com, shameerkolothum@gmail.com

On Wed, Jul 16, 2025 at 10:26:21AM +0000, Shameerali Kolothum Thodi wrote:
> > On Mon, Jul 14, 2025 at 04:59:40PM +0100, Shameer Kolothum wrote:
> > My vSMMU didn't work until I added entries like SIDSIZE, SSIDSIZE,
> > TERM_MODEL, STALL_MODEL, and RIL.
> 
> How come your vSMMU not working? Or you meant the assigned
> dev is not working?

The "dev" (behind the vSMMU) running in the guest I mean.

> The emulation supports SIDSIZE = 16 and RIL. Could you please
> share the difference between these values w.r.t host SMMUv3.

My hardware doesn't support RIL while the VMM sets RIL.

There are other conflicts like STALL_MODEL that affected
the final two-stage STE in the host too.

> Probably we should take a look at Intel vtd implementation mentioned
> by Zhenzhong in the other thread where it looks like there seems to be
> a property for each capability they care about.
> 
> Probably something like,
> -device arm-smmuv3,accel=on,pasid_cap=on,
> 
> And then enabling all features related to pasid and on later when
> we retrieve the HW_INFO on device plug, compare and fail if not?
> 
> But I think on ARM, we still we have limitations in knowing the actual
> host supported features through IDR. In that case, we can only assume
> that user is making an informed decision while enabling these features.

Yes, I think you are right about this approach.

Maybe a "subversion" parameter could mask away quite a few bits
like RIL BBML. But the tricky thing is that user might want a
customization to those individual bits, because it has to match
with the HW values to use the device correctly.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16 18:09           ` Nicolin Chen
@ 2025-07-16 18:42             ` Jason Gunthorpe
  2025-07-16 18:53               ` Nicolin Chen
  0 siblings, 1 reply; 79+ messages in thread
From: Jason Gunthorpe @ 2025-07-16 18:42 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Wed, Jul 16, 2025 at 11:09:45AM -0700, Nicolin Chen wrote:
> OK. I see your point. That will leads to a very long list of
> parameters.

I would have some useful prebaked ones. Realistically there are not
that many combinations of HW capabilities that are
interesting/exist.

> So, a vSMMU model is defined following the parameters in the
> command line. A device (and its attaching SMMU HW) that's not
> compatibile should just fail the cold-plug at the beginning.

Yes

And if you want to do hotplug the SMMU is already fully defined so you
don't need to discover anything at VM startup time.

Jason


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-16 18:42             ` Jason Gunthorpe
@ 2025-07-16 18:53               ` Nicolin Chen
  0 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-16 18:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Wed, Jul 16, 2025 at 03:42:39PM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 16, 2025 at 11:09:45AM -0700, Nicolin Chen wrote:
> > OK. I see your point. That will leads to a very long list of
> > parameters.
> 
> I would have some useful prebaked ones. Realistically there are not
> that many combinations of HW capabilities that are
> interesting/exist.

Maybe starting with a configurable subversion could be a good one.

But I suspect there can be some case where somebody wants certain
bits to be off on top of a subversion prebake..

> > So, a vSMMU model is defined following the parameters in the
> > command line. A device (and its attaching SMMU HW) that's not
> > compatibile should just fail the cold-plug at the beginning.
> 
> Yes
> 
> And if you want to do hotplug the SMMU is already fully defined so you
> don't need to discover anything at VM startup time.

Yea, basically every device (whether hotplug or coldplug) should
run hw_info to make sure it's compatible to the predefined vSMMU,
not the other way around that I expected.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits
  2025-07-14 15:59 ` [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits Shameer Kolothum via
                     ` (2 preceding siblings ...)
  2025-07-16  2:57   ` Nicolin Chen via
@ 2025-07-22 17:42   ` Nicolin Chen
  3 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-07-22 17:42 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

On Mon, Jul 14, 2025 at 04:59:40PM +0100, Shameer Kolothum wrote:
> +void smmuv3_accel_init_regs(SMMUv3State *s)
> +{
> +    SMMUv3AccelState *s_accel = s->s_accel;
> +    SMMUv3AccelDevice *accel_dev;
> +    uint32_t data_type;
> +    uint32_t val;
> +    int ret;
> +
> +    if (s_accel->info.idr[0]) {
> +        /* We already got this */
> +        return;

We can avoid duplicated HW_INFO ioctls but probably shouldn't return
here, but just goto ..

> +    }
> +
> +    if (!s_accel->viommu || QLIST_EMPTY(&s_accel->viommu->device_list)) {
> +        error_report("For arm-smmuv3,accel=on case, atleast one cold-plugged "
> +                     "vfio-pci dev needs to be assigned");
> +        goto out_err;
> +    }
> +
> +    accel_dev = QLIST_FIRST(&s_accel->viommu->device_list);
> +    ret = smmuv3_accel_host_hw_info(accel_dev, &data_type,
> +                                    sizeof(s_accel->info), &s_accel->info);
> +    if (ret) {
> +        error_report("Failed to get Host SMMU device info");
> +        goto out_err;
> +    }
> +
> +    if (data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
> +        error_report("Wrong data type (%d) for Host SMMU device info",
> +                     data_type);
> +        goto out_err;
> +    }

.. likely here:

> +    trace_smmuv3_accel_host_hw_info(s_accel->info.idr[0], s_accel->info.idr[1],
> +                                    s_accel->info.idr[3], s_accel->info.idr[5]);

The following register initializations shouldn't be skipped.

Otherwise, caps would not be the same after a system reboot.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd
  2025-07-14 15:59 ` [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum via
                     ` (2 preceding siblings ...)
  2025-07-15 10:53   ` Duan, Zhenzhong
@ 2025-08-06  0:55   ` Nicolin Chen
  3 siblings, 0 replies; 79+ messages in thread
From: Nicolin Chen @ 2025-08-06  0:55 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao, zhenzhong.duan,
	shameerkolothum

Hi Shameer,

On Mon, Jul 14, 2025 at 04:59:32PM +0100, Shameer Kolothum wrote:
> @@ -25,30 +31,72 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
>  
>          sbus->pbdev[devfn] = sdev;
>          smmu_init_sdev(bs, sdev, bus, devfn);
> +        address_space_init(&accel_dev->as_sysmem, &s->s_accel->root,
> +                           "smmuv3-accel-sysmem");
>      }
>  
>      return accel_dev;
>  }
[..]
>  static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
>                                                int devfn)
>  {
> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>      SMMUState *bs = opaque;
> +    bool vfio_pci = false;
>      SMMUPciBus *sbus;
>      SMMUv3AccelDevice *accel_dev;
>      SMMUDevice *sdev;
>  
> +    if (pdev && !smmuv3_accel_pdev_allowed(pdev, &vfio_pci)) {
> +        error_report("Device(%s) not allowed. Only PCIe root complex devices "
> +                     "or PCI bridge devices or vfio-pci endpoint devices with "
> +                     "iommufd as backend is allowed with arm-smmuv3,accel=on",
> +                     pdev->name);
> +        exit(1);
> +    }
>      sbus = smmu_get_sbus(bs, bus);
>      accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
>      sdev = &accel_dev->sdev;
>  
> -    return &sdev->as;
> +    if (vfio_pci) {
> +        return &accel_dev->as_sysmem;

I found a new problem here that we initialize new as_sysmem per
VFIO device. So, sdevs would return own individual AS pointers
here at this get_address_space function, although they point to
the same system address space.

Since address space pointers are returned differently for VFIO
devices, this fails to reuse ioas_id in iommufd_cdev_attach(),
and ends up with allocating a new ioas for each device.

I think we can try the following change to make sure all accel
linked VFIO devices would share the same ioas_id, though I am
not sure if SMMUBaseClass is the right place to go. Please take
a look.

Then, once kernel is patched to share S2 hwpt across vSMMUs,
iommufd_cdev_attach() will go further to reuse the S2 HWPT in
the same container.

Thanks
Nicolin

---
 hw/arm/smmuv3-accel.c        | 14 ++++++++++----
 hw/arm/smmuv3-accel.h        |  2 +-
 include/hw/arm/smmu-common.h |  2 ++
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index c6dc50cf45..405b72095f 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -370,13 +370,19 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
     if (sdev) {
         accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
     } else {
+        SMMUBaseClass *sbc = ARM_SMMU_GET_CLASS(s);
+
         accel_dev = g_new0(SMMUv3AccelDevice, 1);
         sdev = &accel_dev->sdev;
 
         sbus->pbdev[devfn] = sdev;
         smmu_init_sdev(bs, sdev, bus, devfn);
-        address_space_init(&accel_dev->as_sysmem, &s->s_accel->root,
-                           "smmuv3-accel-sysmem");
+        if (!sbc->as_sysmem) {
+            sbc->as_sysmem = g_new0(AddressSpace, 1);
+            address_space_init(sbc->as_sysmem, &s->s_accel->root,
+                               "smmuv3-accel-sysmem");
+        }
+        accel_dev->as_sysmem = sbc->as_sysmem;
     }
 
     return accel_dev;
@@ -558,7 +564,7 @@ static AddressSpace *smmuv3_accel_find_msi_as(PCIBus *bus, void *opaque,
     if (accel_dev->s1_hwpt) {
         return &sdev->as;
     } else {
-        return &accel_dev->as_sysmem;
+        return accel_dev->as_sysmem;
     }
 }
 
@@ -599,7 +605,7 @@ static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
     sdev = &accel_dev->sdev;
 
     if (vfio_pci) {
-        return &accel_dev->as_sysmem;
+        return accel_dev->as_sysmem;
     } else {
         return &sdev->as;
     }
diff --git a/hw/arm/smmuv3-accel.h b/hw/arm/smmuv3-accel.h
index e1e99598b4..9faa0818d7 100644
--- a/hw/arm/smmuv3-accel.h
+++ b/hw/arm/smmuv3-accel.h
@@ -37,7 +37,7 @@ typedef struct SMMUS1Hwpt {
 
 typedef struct SMMUv3AccelDevice {
     SMMUDevice  sdev;
-    AddressSpace as_sysmem;
+    AddressSpace *as_sysmem;
     HostIOMMUDeviceIOMMUFD *idev;
     SMMUS1Hwpt  *s1_hwpt;
     SMMUViommu *viommu;
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index c459d24427..9bb9554435 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -168,6 +168,8 @@ struct SMMUState {
 struct SMMUBaseClass {
     /* <private> */
     SysBusDeviceClass parent_class;
+    /* FIXME is there any better shared place for different vSMMU instances? */
+    AddressSpace *as_sysmem;
 
     /*< public >*/
 
-- 


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper
  2025-07-14 15:59 ` [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper Shameer Kolothum via
  2025-07-14 16:38   ` Nicolin Chen via
  2025-07-15  9:30   ` Jonathan Cameron via
@ 2025-09-04  7:55   ` Eric Auger
  2 siblings, 0 replies; 79+ messages in thread
From: Eric Auger @ 2025-09-04  7:55 UTC (permalink / raw)
  To: qemu-arm, qemu-devel, skolothumtho
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao, zhenzhong.duan, shameerkolothum



On 7/14/25 5:59 PM, Shameer Kolothum wrote:
> Allows to retrieve the PCIIOMMUOps based on the SMMU type. This will be
> useful when we add support for accelerated SMMUV3 in subsequent patches
> as that requires a different set of callbacks for iommu ops.
>
> No special handling is required for now and returns the default ops
> in base SMMU Class.
>
> No functional changes intended.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> ---
>  hw/arm/smmu-common.c         | 17 +++++++++++++++--
>  include/hw/arm/smmu-common.h |  1 +
>  2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 0f1a06cec2..3a1080773a 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -934,6 +934,16 @@ void smmu_inv_notifiers_all(SMMUState *s)
>      }
>  }
>  
> +static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState *s)
> +{
> +    SMMUBaseClass *sbc;
> +
> +    sbc = ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
> +    assert(sbc->iommu_ops);
> +
> +    return sbc->iommu_ops;
> +}
> +
>  static void smmu_base_realize(DeviceState *dev, Error **errp)
>  {
>      SMMUState *s = ARM_SMMU(dev);
> @@ -962,6 +972,7 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
>       */
>      if (pci_bus_is_express(pci_bus) && pci_bus_is_root(pci_bus) &&
>          object_dynamic_cast(OBJECT(pci_bus)->parent, TYPE_PCI_HOST_BRIDGE)) {
> +        const PCIIOMMUOps  *iommu_ops;
>          /*
>           * This condition matches either the default pcie.0, pxb-pcie, or
>           * pxb-cxl. For both pxb-pcie and pxb-cxl, parent_dev will be set.
> @@ -974,10 +985,11 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
>              }
>          }
>  
> +        iommu_ops = smmu_iommu_ops_by_type(s);
>          if (s->smmu_per_bus) {
> -            pci_setup_iommu_per_bus(pci_bus, &smmu_ops, s);
> +            pci_setup_iommu_per_bus(pci_bus, iommu_ops, s);
>          } else {
> -            pci_setup_iommu(pci_bus, &smmu_ops, s);
> +            pci_setup_iommu(pci_bus, iommu_ops, s);
>          }
>          return;
>      }
> @@ -1018,6 +1030,7 @@ static void smmu_base_class_init(ObjectClass *klass, const void *data)
>      device_class_set_parent_realize(dc, smmu_base_realize,
>                                      &sbc->parent_realize);
>      rc->phases.exit = smmu_base_reset_exit;
> +    sbc->iommu_ops = &smmu_ops;
>  }
>  
>  static const TypeInfo smmu_base_info = {
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index c6f899e403..eb94623555 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -171,6 +171,7 @@ struct SMMUBaseClass {
>      /*< public >*/
>  
>      DeviceRealize parent_realize;
> +    const PCIIOMMUOps *iommu_ops;
>  
>  };
>  



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-16  9:27         ` Shameerali Kolothum Thodi via
@ 2025-09-04 14:31           ` Eric Auger
  0 siblings, 0 replies; 79+ messages in thread
From: Eric Auger @ 2025-09-04 14:31 UTC (permalink / raw)
  To: Duan, Zhenzhong, Nicolin Chen, skolothumtho
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org,
	shameerkolothum@gmail.com

Hi Shameer,

On 7/16/25 11:27 AM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> Sent: Wednesday, July 16, 2025 4:39 AM
>> To: Nicolin Chen <nicolinc@nvidia.com>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; eric.auger@redhat.com;
>> peter.maydell@linaro.org; jgg@nvidia.com; ddutile@redhat.com;
>> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
>> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org; shameerkolothum@gmail.com
>> Subject: RE: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce
>> smmuv3 accel device
>>
>>
>>
>>> -----Original Message-----
>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>> Subject: Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce
>>> smmuv3 accel device
>>>
>>> On Tue, Jul 15, 2025 at 10:48:31AM +0000, Duan, Zhenzhong wrote:
>>>>> +static const TypeInfo types[] = {
>>>>> +    {
>>>>> +        .name = TYPE_ARM_SMMUV3_ACCEL,
>>>>> +        .parent = TYPE_ARM_SMMUV3,
>>>>> +        .class_init = smmuv3_accel_class_init,
>>>>> +    }
>>>> In cover-letter, I see "-device arm-smmuv3", so where is above accel
>>>> device created so we could use smmuv3_accel_ops?
>>> The smmu-common.c is the shared file between accel and non-accel
>>> instances. It has a module property:
>>>    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),
>> It looks we expose a new TYPE_ARM_SMMUV3_ACCEL type device just for
>> exporting accel iommu_ops?
>> What about returning accel iommu_ops directly in
>> smmu_iommu_ops_by_type() and drop the new type?
> We are not creating any new device here. Its just a Class object of different type.
> I had a different approach previously and Eric suggested to try this as there
> are examples in VFIO/IOMMUFD for something like this.
>
> https://lore.kernel.org/qemu-devel/1105d100-dd1e-4aca-b518-50f903967416@redhat.com/
Actually I pointed out that usually we don't add methods in states as it
was done in v2 (in SMMUState) but rather in classes hence my suggestion
to use a class instead. Now what looks strange is your class does not
implement any method ;-)

docs/devel/qom.rst says "The #ObjectClass typically holds a table of function pointers
for the virtual methods implemented by this type."

Sorry if I was unclear.

Now if you don't need any "accel" specific methods besides the PCIIOMMUOps which have a specific struct, I am not sure we need a class anymore. or then you direct embed the PCIIOMMUOps Struct in the method?

Also it's true that's weird that the actual object is never instantiated and may also appear in qom object list. This is not the case for the example I gave, ie. the VFIOContainerBase.

I don't know if anyone has a better/more elegant idea?

Thanks

Eric

>
> Thanks,
> Shameer
>
>>> where it directs to different iommu_ops:
>>> 937 static const PCIIOMMUOps *smmu_iommu_ops_by_type(SMMUState
>> *s)
>>> 938 {
>>> 939     SMMUBaseClass *sbc;
>>> 940
>>> 941     if (s->accel) {
>>> 942         sbc =
>>> ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMUV3_ACCEL));
>>> 943     } else {
>>> 944         sbc =
>>> ARM_SMMU_CLASS(object_class_by_name(TYPE_ARM_SMMU));
>>> 945     }
>>> 946     assert(sbc->iommu_ops);
>>> 947
>>> 948     return sbc->iommu_ops;
>>> 949 }
>>>
>>> Nicolin



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device
  2025-07-14 17:23   ` Nicolin Chen
@ 2025-09-04 14:33     ` Eric Auger
  0 siblings, 0 replies; 79+ messages in thread
From: Eric Auger @ 2025-09-04 14:33 UTC (permalink / raw)
  To: Nicolin Chen, skolothumtho
  Cc: qemu-arm, qemu-devel, peter.maydell, jgg, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, zhenzhong.duan, shameerkolothum



On 7/14/25 7:23 PM, Nicolin Chen wrote:
> On Mon, Jul 14, 2025 at 04:59:31PM +0100, Shameer Kolothum wrote:
>> Also setup specific PCIIOMMUOps for accel SMMUv3 as accel
>> SMMUv3 will have different handling for those ops callbacks
>> in subsequent patches.
>>
>> The "accel" property is not yet added, so users cannot set it at this
>> point. It will be introduced in a subsequent patch once the necessary
>> support is in place.
>>
>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Overall the patch looks good to me,
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
>
> with some nits:
>
>> @@ -61,7 +61,8 @@ arm_common_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
>>  arm_common_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
>>  arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP', if_true: files('fsl-imx8mp.c'))
>>  arm_common_ss.add(when: 'CONFIG_FSL_IMX8MP_EVK', if_true: files('imx8mp-evk.c'))
>> -arm_common_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
>> +arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
>> +arm_ss.add(when: ['CONFIG_ARM_SMMUV3', 'CONFIG_IOMMUFD'], if_true: files('smmuv3-accel.c'))
> Wondering why "arm_common_ss" is changed to "arm_ss"?
Indeed why did you need to change that?

Thanks

Eric
>
>> +static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *bs, SMMUPciBus *sbus,
>> +                                                PCIBus *bus, int devfn)
> There seems to be an extra space in the 2nd line.
>
>> +{
>> +    SMMUDevice *sdev = sbus->pbdev[devfn];
>> +    SMMUv3AccelDevice *accel_dev;
>> +
>> +    if (sdev) {
>> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
>> +    } else {
>> +        accel_dev = g_new0(SMMUv3AccelDevice, 1);
>> +        sdev = &accel_dev->sdev;
>> +
>> +        sbus->pbdev[devfn] = sdev;
>> +        smmu_init_sdev(bs, sdev, bus, devfn);
>> +    }
> Could just:
>     if (sdev) {
>         return container_of(sdev, SMMUv3AccelDevice, sdev);
>     }
>
> Then, no extra indentations for the rest of the code.
>
>> +
>> +    return accel_dev;
>> +}
>> +
>> +static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
>> +                                              int devfn)
>> +{
>> +    SMMUState *bs = opaque;
>> +    SMMUPciBus *sbus;
>> +    SMMUv3AccelDevice *accel_dev;
>> +    SMMUDevice *sdev;
>> +
>> +    sbus = smmu_get_sbus(bs, bus);
>> +    accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
>> +    sdev = &accel_dev->sdev;
> Maybe just:
>
> +    SMMUPciBus *sbus = smmu_get_sbus(bs, bus);
> +    SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(bs, sbus, bus, devfn);
> +    SMMUDevice *sdev = &accel_dev->sdev;
>
> ?
>
>> +typedef struct SMMUv3AccelDevice {
>> +    SMMUDevice  sdev;
> Let's drop the extra space in between.
>
>> +} SMMUv3AccelDevice;
>> +
>> +#endif /* HW_ARM_SMMUV3_ACCEL_H */
>> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
>> index eb94623555..c459d24427 100644
>> --- a/include/hw/arm/smmu-common.h
>> +++ b/include/hw/arm/smmu-common.h
>> @@ -162,6 +162,7 @@ struct SMMUState {
>>      uint8_t bus_num;
>>      PCIBus *primary_bus;
>>      bool smmu_per_bus; /* SMMU is specific to the primary_bus */
>> +    bool accel; /* SMMU has accelerator support */
> How about:
> "SMMU is in the HW-accelerated mode for stage-1 translation"
> ?
>
> Thanks
> Nicolin
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2025-09-04 14:34 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-14 15:59 [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
2025-07-14 15:59 ` [RFC PATCH v3 01/15] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
2025-07-14 16:22   ` Nicolin Chen
2025-07-15  9:14   ` Jonathan Cameron via
2025-07-14 15:59 ` [RFC PATCH v3 02/15] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
2025-07-14 16:27   ` Nicolin Chen
2025-07-15  9:19   ` Jonathan Cameron via
2025-07-14 15:59 ` [RFC PATCH v3 03/15] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
2025-07-15  9:27   ` Jonathan Cameron via
2025-07-14 15:59 ` [RFC PATCH v3 04/15] hw/arm/smmu-common: Introduce smmu_iommu_ops_by_type() helper Shameer Kolothum via
2025-07-14 16:38   ` Nicolin Chen via
2025-07-15  9:30   ` Jonathan Cameron via
2025-09-04  7:55   ` Eric Auger
2025-07-14 15:59 ` [RFC PATCH v3 05/15] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum via
2025-07-14 17:23   ` Nicolin Chen
2025-09-04 14:33     ` Eric Auger
2025-07-15  9:45   ` Jonathan Cameron via
2025-07-15 10:48   ` Duan, Zhenzhong
2025-07-15 17:29     ` Nicolin Chen
2025-07-16  3:38       ` Duan, Zhenzhong
2025-07-16  9:27         ` Shameerali Kolothum Thodi via
2025-09-04 14:31           ` Eric Auger
2025-07-14 15:59 ` [RFC PATCH v3 06/15] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum via
2025-07-14 18:18   ` Nicolin Chen
2025-07-15  9:51   ` Jonathan Cameron via
2025-07-15 10:53   ` Duan, Zhenzhong
2025-07-15 17:59     ` Nicolin Chen
2025-07-16  6:26       ` Duan, Zhenzhong
2025-07-16  9:34         ` Shameerali Kolothum Thodi via
2025-07-16 10:32           ` Duan, Zhenzhong
2025-07-16 17:51           ` Nicolin Chen
2025-07-16 18:21             ` Nicolin Chen
2025-07-16  8:06       ` Shameerali Kolothum Thodi via
2025-08-06  0:55   ` Nicolin Chen
2025-07-14 15:59 ` [RFC PATCH v3 07/15] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum via
2025-07-14 18:31   ` Nicolin Chen
2025-07-14 15:59 ` [RFC PATCH v3 08/15] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
2025-07-14 19:11   ` Nicolin Chen
2025-07-15 10:29   ` Jonathan Cameron via
2025-07-15 17:01     ` Nicolin Chen
2025-07-16  9:33       ` Jonathan Cameron via
2025-07-14 15:59 ` [RFC PATCH v3 09/15] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
2025-07-14 19:37   ` Nicolin Chen
2025-07-15 23:12   ` Nicolin Chen
2025-07-16  8:36     ` Shameerali Kolothum Thodi via
2025-07-16 18:17       ` Nicolin Chen
2025-07-14 15:59 ` [RFC PATCH v3 10/15] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
2025-07-14 19:43   ` Nicolin Chen
2025-07-14 15:59 ` [RFC PATCH v3 11/15] hw/pci/pci: Introduce optional get_msi_address_space() callback Shameer Kolothum via
2025-07-14 19:50   ` Nicolin Chen
2025-07-14 15:59 ` [RFC PATCH v3 12/15] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
2025-07-14 19:55   ` Nicolin Chen
2025-07-15 10:39   ` Jonathan Cameron via
2025-07-15 17:07     ` Nicolin Chen
2025-07-14 15:59 ` [RFC PATCH v3 13/15] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
2025-07-15 10:46   ` Jonathan Cameron via
2025-07-15 17:22     ` Nicolin Chen
2025-07-16  7:32       ` Shameerali Kolothum Thodi via
2025-07-14 15:59 ` [RFC PATCH v3 14/15] Read and validate host SMMUv3 feature bits Shameer Kolothum via
2025-07-14 20:04   ` Nicolin Chen via
2025-07-14 20:24     ` Nicolin Chen via
2025-07-15 10:48   ` Jonathan Cameron via
2025-07-16  2:57   ` Nicolin Chen via
2025-07-16 10:26     ` Shameerali Kolothum Thodi via
2025-07-16 18:37       ` Nicolin Chen
2025-07-16 11:51     ` Jason Gunthorpe
2025-07-16 17:35       ` Nicolin Chen
2025-07-16 17:45         ` Jason Gunthorpe
2025-07-16 18:09           ` Nicolin Chen
2025-07-16 18:42             ` Jason Gunthorpe
2025-07-16 18:53               ` Nicolin Chen
2025-07-22 17:42   ` Nicolin Chen
2025-07-14 15:59 ` [RFC PATCH v3 15/15] hw/arm/smmu-common: Add accel property for SMMU dev Shameer Kolothum via
2025-07-14 20:00   ` Nicolin Chen
2025-07-15 10:49   ` Jonathan Cameron via
2025-07-14 16:14 ` [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen via
2025-07-14 20:22   ` Nicolin Chen via
2025-07-15 10:46 ` Duan, Zhenzhong
2025-07-16  7:27   ` Shameerali Kolothum Thodi via

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).