qemu-devel.nongnu.org archive mirror
* [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
@ 2025-03-11 14:10 Shameer Kolothum via
  2025-03-11 14:10 ` [RFC PATCH v2 01/20] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
                   ` (21 more replies)
  0 siblings, 22 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Hi All,

This patch series introduces initial support for a user-creatable
accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.

Why this is needed:

Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
machine and does not support configuring the host SMMUv3 in nested
mode. This limitation prevents its use with vfio-pci passthrough
devices.

The new pluggable smmuv3-accel device enables host SMMUv3 configuration
with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
host) via the new IOMMUFD APIs. Additionally, it allows multiple 
accelerated vSMMUv3 instances for guests running on hosts with multiple
physical SMMUv3s.

This brings the following benefits:
-Reduces invalidation broadcasts and lookups for devices behind multiple
 physical SMMUv3s.
-Simplifies handling of host SMMUv3s with differing feature sets.
-Lays the groundwork for additional capabilities like vCMDQ support.


Changes from RFCv1[0]:

Thanks to everyone who provided feedback on RFCv1!

-The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
 to better reflect its role in using the host's physical SMMUv3 for page
 table setup and cache invalidations.
-Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
-Merges patches from Nicolin’s GitHub repository that add accelerated
 functionality for page table setup and cache invalidations[1]. I have
 modified these a bit, hopefully without breaking anything.
-Incorporates various fixes and improvements based on RFCv1 feedback.
-Adds support for vfio-pci hotplug with smmuv3-accel.

Note: IORT RMR patches for MSI setup are currently excluded as we may
adopt a different approach for MSI handling in the future [2].

This series also depends on the common iommufd/vfio patches from
Zhenzhong's series[3].

ToDos:

-At least one vfio-pci device must currently be cold-plugged to a
 pxb-pcie bus associated with the arm-smmuv3-accel. This is required both
 to associate a vSMMUv3 with a host SMMUv3 and to retrieve the host
 SMMUv3 IDR registers for guest export. Future updates will remove this
 restriction by adding the necessary kernel support. Please find the
 discussion here[4].
-This version does not yet support host SMMUv3 fault handling or
 other event notifications. These will be addressed in a
 future patch series.


The complete branch can be found here:
https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext

I have done basic sanity testing on a HiSilicon platform using the kernel
branch here:
https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2

Usage example:

On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
devices and HNS VF devices are behind different host SMMUv3s. So for a
Guest, specify two arm-smmuv3-accel devices, each behind a pxb-pcie, as below:


./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
-cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
-bios QEMU_EFI.fd \
-object iommufd,id=iommufd0 \
-device virtio-blk-device,drive=fs \
-drive if=none,file=rootfs.qcow2,id=fs \
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
-device arm-smmuv3-accel,bus=pcie.1 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
-device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
-device arm-smmuv3-accel,bus=pcie.2 \
-device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
-kernel Image \
-append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
-device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
-fsdev local,id=p9fs,path=p9root,security_model=mapped \
-net none \
-nographic

The Guest will boot with two SMMUv3s:
...
arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq

With a PCI topology like below:

[root@localhost ~]# lspci -tv
-+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
 |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           \-03.0  Virtio: Virtio filesystem
 +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
 |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
 \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)

Further tests are always welcome.

Please take a look and let me know your feedback!

Thanks,
Shameer

[0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
[1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
[2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
[3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
[4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/

Nicolin Chen (11):
  backends/iommufd: Introduce iommufd_backend_alloc_viommu
  backends/iommufd: Introduce iommufd_vdev_alloc
  hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed
  hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache
    invalidations
  hw/arm/smmuv3: Forward invalidation commands to hw
  hw/arm/smmuv3-accel: Read host SMMUv3 device info
  hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3

Shameer Kolothum (9):
  hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel
    device
  hw/arm/virt: Add support for smmuv3-accel
  hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  hw/arm/smmu-common: Factor out common helper functions and export
  hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  hw/arm/smmuv3-accel: Provide get_address_space callback
  hw/arm/smmuv3: Install nested ste for CFGI_STE
  hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
  hw/arm/smmuv3-accel: Enable smmuv3-accel creation

 backends/iommufd.c            |  51 +++
 backends/trace-events         |   2 +
 hw/arm/Kconfig                |   5 +
 hw/arm/meson.build            |   1 +
 hw/arm/smmu-common.c          |  95 +++++-
 hw/arm/smmuv3-accel.c         | 616 ++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-internal.h      |  54 +++
 hw/arm/smmuv3.c               |  80 ++++-
 hw/arm/trace-events           |   6 +
 hw/arm/virt-acpi-build.c      | 113 ++++++-
 hw/arm/virt.c                 |  12 +
 hw/core/sysbus-fdt.c          |   1 +
 include/hw/arm/smmu-common.h  |  14 +
 include/hw/arm/smmuv3-accel.h |  75 +++++
 include/hw/arm/virt.h         |   1 +
 include/system/iommufd.h      |  14 +
 16 files changed, 1101 insertions(+), 39 deletions(-)
 create mode 100644 hw/arm/smmuv3-accel.c
 create mode 100644 include/hw/arm/smmuv3-accel.h

-- 
2.34.1




* [RFC PATCH v2 01/20] backends/iommufd: Introduce iommufd_backend_alloc_viommu
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-12 15:20   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 02/20] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Add a helper to allocate a viommu object.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
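Editorial note: the helper below follows a fill-struct, issue-ioctl,
report-result pattern. The following is only a minimal sketch of that
pattern under stated assumptions, not the code from this patch. Every
demo_* name is hypothetical, the struct models only the fields the patch
sets, and a mock function stands in for ioctl(fd, IOMMU_VIOMMU_ALLOC, ...)
so the sketch runs without iommufd. The real helper reports failures
through Error **errp rather than an int.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct iommu_viommu_alloc;
 * only the fields the patch touches are modelled. */
struct demo_viommu_alloc {
    uint32_t size;
    uint32_t type;
    uint32_t dev_id;
    uint32_t hwpt_id;
    uint32_t out_viommu_id;
};

/* Hypothetical replacement for the real ioctl() call. */
typedef int (*demo_ioctl_fn)(struct demo_viommu_alloc *arg);

/* Mock "kernel": rejects hwpt_id 0, otherwise hands out a fixed ID. */
static int demo_mock_ioctl(struct demo_viommu_alloc *arg)
{
    if (arg->size != sizeof(*arg) || arg->hwpt_id == 0) {
        errno = EINVAL;
        return -1;
    }
    arg->out_viommu_id = 42;
    return 0;
}

/* Returns true and stores the new ID on success; on failure returns
 * false, stores errno in *out_errno and leaves *out_viommu_id alone. */
static bool demo_alloc_viommu(demo_ioctl_fn do_ioctl, uint32_t dev_id,
                              uint32_t viommu_type, uint32_t hwpt_id,
                              uint32_t *out_viommu_id, int *out_errno)
{
    struct demo_viommu_alloc alloc = {
        .size = sizeof(alloc),
        .type = viommu_type,
        .dev_id = dev_id,
        .hwpt_id = hwpt_id,
    };

    assert(out_viommu_id != NULL && out_errno != NULL);

    if (do_ioctl(&alloc)) {
        *out_errno = errno;
        return false;
    }

    *out_viommu_id = alloc.out_viommu_id;
    return true;
}
```

The output parameter is written only on success, so a caller can keep a
previously valid ID when the allocation fails.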
 backends/iommufd.c       | 25 +++++++++++++++++++++++++
 backends/trace-events    |  1 +
 include/system/iommufd.h |  4 ++++
 3 files changed, 30 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 3c23caef96..3fac08c96e 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -341,6 +341,31 @@ int iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t hwpt_id,
     return ret;
 }
 
+bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
+                                  uint32_t viommu_type, uint32_t hwpt_id,
+                                  uint32_t *out_viommu_id, Error **errp)
+{
+    int ret, fd = be->fd;
+    struct iommu_viommu_alloc alloc_viommu = {
+        .size = sizeof(alloc_viommu),
+        .type = viommu_type,
+        .dev_id = dev_id,
+        .hwpt_id = hwpt_id,
+    };
+
+    ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &alloc_viommu);
+
+    trace_iommufd_backend_alloc_viommu(fd, viommu_type, dev_id, hwpt_id,
+                                       alloc_viommu.out_viommu_id, ret);
+    if (ret) {
+        error_setg_errno(errp, errno, "IOMMU_VIOMMU_ALLOC failed");
+        return false;
+    }
+
+    *out_viommu_id = alloc_viommu.out_viommu_id;
+    return true;
+}
+
 bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
                                            uint32_t hwpt_id, Error **errp)
 {
diff --git a/backends/trace-events b/backends/trace-events
index 5a23db6c8a..a835827540 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -19,3 +19,4 @@ iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%
 iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
 iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
 iommufd_backend_invalidate_cache(int iommufd, uint32_t hwpt_id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d hwpt_id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_viommu(int iommufd, uint32_t type, uint32_t dev_id, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d type=%u dev_id=%u hwpt_id=%u viommu_id=%u (%d)"
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index b93421ac7c..7e5507f2db 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -55,6 +55,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
                                 uint32_t data_type, uint32_t data_len,
                                 void *data_ptr, uint32_t *out_hwpt,
                                 Error **errp);
+bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
+                                  uint32_t viommu_type, uint32_t hwpt_id,
+                                  uint32_t *out_hwpt, Error **errp);
+
 bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
                                         bool start, Error **errp);
 bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
-- 
2.34.1




* [RFC PATCH v2 02/20] backends/iommufd: Introduce iommufd_vdev_alloc
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
  2025-03-11 14:10 ` [RFC PATCH v2 01/20] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-12 15:25   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device Shameer Kolothum via
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Add a helper to allocate a virtual device object (the device as seen in
user space) for an iommufd device, per viommu instance.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 backends/iommufd.c       | 26 ++++++++++++++++++++++++++
 backends/trace-events    |  1 +
 include/system/iommufd.h |  4 ++++
 3 files changed, 31 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 3fac08c96e..3511dd32ab 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -366,6 +366,32 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
     return true;
 }
 
+bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
+                                uint32_t viommu_id, uint64_t virt_id,
+                                uint32_t *out_vdev_id, Error **errp)
+{
+    int ret, fd = be->fd;
+    struct iommu_vdevice_alloc alloc_vdev = {
+        .size = sizeof(alloc_vdev),
+        .viommu_id = viommu_id,
+        .dev_id = dev_id,
+        .virt_id = virt_id,
+    };
+
+    ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &alloc_vdev);
+
+    trace_iommufd_backend_alloc_vdev(fd, dev_id, viommu_id, virt_id,
+                                     alloc_vdev.out_vdevice_id, ret);
+
+    if (ret) {
+        error_setg_errno(errp, errno, "IOMMU_VDEVICE_ALLOC failed");
+        return false;
+    }
+
+    *out_vdev_id = alloc_vdev.out_vdevice_id;
+    return true;
+}
+
 bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
                                            uint32_t hwpt_id, Error **errp)
 {
diff --git a/backends/trace-events b/backends/trace-events
index a835827540..86c8f89e8a 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -20,3 +20,4 @@ iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) "
 iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
 iommufd_backend_invalidate_cache(int iommufd, uint32_t hwpt_id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d hwpt_id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
 iommufd_backend_alloc_viommu(int iommufd, uint32_t type, uint32_t dev_id, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d type=%u dev_id=%u hwpt_id=%u viommu_id=%u (%d)"
+iommufd_backend_alloc_vdev(int iommufd, uint32_t dev_id, uint32_t viommu_id, uint64_t virt_id, uint32_t vdev_id, int ret) " iommufd=%d dev_id=%u viommu_id=%u virt_id=0x%"PRIx64" vdev_id=%u (%d)"
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 7e5507f2db..53920bae5b 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -59,6 +59,10 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
                                   uint32_t viommu_type, uint32_t hwpt_id,
                                   uint32_t *out_hwpt, Error **errp);
 
+bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
+                                uint32_t viommu_id, uint64_t virt_id,
+                                uint32_t *out_vdev_id, Error **errp);
+
 bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
                                         bool start, Error **errp);
 bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
-- 
2.34.1




* [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
  2025-03-11 14:10 ` [RFC PATCH v2 01/20] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
  2025-03-11 14:10 ` [RFC PATCH v2 02/20] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-11 20:13   ` Nicolin Chen
  2025-03-12 15:15   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel Shameer Kolothum via
                   ` (18 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
device. In order to support vfio-pci device assignment with a Guest
SMMUv3, the physical SMMUv3 has to be configured in nested (S1+S2)
mode, with the Guest owning the S1 page tables. Subsequent patches will
add support for smmuv3-accel to provide this.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
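Editorial note: the new device overrides realize, sets the "accel"
property, and then chains to the parent's realize. The following is a
minimal plain-C sketch of that chaining, assuming a much simpler object
model than QOM. All Demo* and demo_* names are hypothetical and only
illustrate the ordering (flag first, parent realize second).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoDevice DemoDevice;
typedef struct DemoClass DemoClass;
typedef bool (*DemoRealize)(DemoDevice *dev);

struct DemoClass {
    DemoRealize realize;         /* slot the core would call */
    DemoRealize parent_realize;  /* parent's realize, saved by the subclass */
};

struct DemoDevice {
    const DemoClass *klass;
    bool accel;                  /* mirrors the new "accel" property */
    bool base_realized;
    bool accel_seen_by_base;     /* what the parent observed at realize time */
};

/* Stand-in for the parent (SMMUv3) realize function. */
static bool demo_base_realize(DemoDevice *dev)
{
    dev->accel_seen_by_base = dev->accel;
    dev->base_realized = true;
    return true;
}

/* Subclass realize: set the flag first, then chain to the parent. */
static bool demo_accel_realize(DemoDevice *dev)
{
    assert(dev->klass && dev->klass->parent_realize);
    dev->accel = true;
    return dev->klass->parent_realize(dev);
}

/* Save the current realize as parent_realize and install the child's. */
static void demo_set_parent_realize(DemoClass *c, DemoRealize child)
{
    c->parent_realize = c->realize;
    c->realize = child;
}
```

Because the flag is set before chaining, the parent's realize already
sees accel == true, which is what lets shared code branch on it.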
 hw/arm/Kconfig                |  5 ++++
 hw/arm/meson.build            |  1 +
 hw/arm/smmu-common.c          |  1 +
 hw/arm/smmuv3-accel.c         | 51 +++++++++++++++++++++++++++++++++++
 include/hw/arm/smmu-common.h  |  3 +++
 include/hw/arm/smmuv3-accel.h | 31 +++++++++++++++++++++
 6 files changed, 92 insertions(+)
 create mode 100644 hw/arm/smmuv3-accel.c
 create mode 100644 include/hw/arm/smmuv3-accel.h

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 504841ccab..f889842dd8 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -14,6 +14,7 @@ config ARM_VIRT
     select ARM_GIC
     select ACPI
     select ARM_SMMUV3
+    select ARM_SMMUV3_ACCEL
     select GPIO_KEY
     select DEVICE_TREE
     select FW_CFG_DMA
@@ -596,6 +597,10 @@ config FSL_IMX7
 config ARM_SMMUV3
     bool
 
+config ARM_SMMUV3_ACCEL
+    select ARM_SMMUV3
+    bool
+
 config FSL_IMX6UL
     bool
     default y
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 465c757f97..e8593363b0 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -55,6 +55,7 @@ arm_ss.add(when: 'CONFIG_MUSCA', if_true: files('musca.c'))
 arm_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
 arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
+arm_ss.add(when: 'CONFIG_ARM_SMMUV3_ACCEL', if_true: files('smmuv3-accel.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
 arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
 arm_ss.add(when: 'CONFIG_XEN', if_true: files(
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 8c1b407b82..f5caf1665c 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -943,6 +943,7 @@ static const Property smmu_dev_properties[] = {
     DEFINE_PROP_UINT8("bus_num", SMMUState, bus_num, 0),
     DEFINE_PROP_LINK("primary-bus", SMMUState, primary_bus,
                      TYPE_PCI_BUS, PCIBus *),
+    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),
 };
 
 static void smmu_base_class_init(ObjectClass *klass, void *data)
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
new file mode 100644
index 0000000000..c327661636
--- /dev/null
+++ b/hw/arm/smmuv3-accel.c
@@ -0,0 +1,51 @@
+/*
+ * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
+ * Copyright (C) 2025 NVIDIA
+ * Written by Nicolin Chen, Shameer Kolothum
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/arm/smmuv3-accel.h"
+
+static void smmu_accel_realize(DeviceState *d, Error **errp)
+{
+    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(d);
+    SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_GET_CLASS(s_accel);
+    SysBusDevice *dev = SYS_BUS_DEVICE(d);
+    Error *local_err = NULL;
+
+    object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
+    c->parent_realize(d, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_CLASS(klass);
+
+    device_class_set_parent_realize(dc, smmu_accel_realize,
+                                    &c->parent_realize);
+    dc->hotpluggable = false;
+}
+
+static const TypeInfo smmuv3_accel_type_info = {
+    .name          = TYPE_ARM_SMMUV3_ACCEL,
+    .parent        = TYPE_ARM_SMMUV3,
+    .instance_size = sizeof(SMMUv3AccelState),
+    .class_size    = sizeof(SMMUv3AccelClass),
+    .class_init    = smmuv3_accel_class_init,
+};
+
+static void smmuv3_accel_register_types(void)
+{
+    type_register_static(&smmuv3_accel_type_info);
+}
+
+type_init(smmuv3_accel_register_types)
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index d1a4a64551..b5c63cfd5d 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -157,6 +157,9 @@ struct SMMUState {
     QLIST_HEAD(, SMMUDevice) devices_with_notifiers;
     uint8_t bus_num;
     PCIBus *primary_bus;
+
+    /* For smmuv3-accel */
+    bool accel;
 };
 
 struct SMMUBaseClass {
diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
new file mode 100644
index 0000000000..56fe376bf4
--- /dev/null
+++ b/include/hw/arm/smmuv3-accel.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
+ * Copyright (C) 2025 NVIDIA
+ * Written by Nicolin Chen, Shameer Kolothum
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_ARM_SMMUV3_ACCEL_H
+#define HW_ARM_SMMUV3_ACCEL_H
+
+#include "hw/arm/smmu-common.h"
+#include "hw/arm/smmuv3.h"
+#include "qom/object.h"
+
+#define TYPE_ARM_SMMUV3_ACCEL   "arm-smmuv3-accel"
+OBJECT_DECLARE_TYPE(SMMUv3AccelState, SMMUv3AccelClass, ARM_SMMUV3_ACCEL)
+
+struct SMMUv3AccelState {
+    SMMUv3State smmuv3_state;
+};
+
+struct SMMUv3AccelClass {
+    /*< private >*/
+    SMMUv3Class smmuv3_class;
+    /*< public >*/
+
+    DeviceRealize parent_realize;
+};
+
+#endif /* HW_ARM_SMMUV3_ACCEL_H */
-- 
2.34.1




* [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (2 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-11 20:22   ` Nicolin Chen
                     ` (2 more replies)
  2025-03-11 14:10 ` [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus Shameer Kolothum via
                   ` (17 subsequent siblings)
  21 siblings, 3 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Allow smmuv3-accel to be cold-plugged to the virt machine if the
machine-wide SMMUv3 (iommu=smmuv3) is not specified.

No FDT support is added for now.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
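Editorial note: the plug-time check in the virt.c hunk can be modelled
as a small state transition. This is a sketch under assumptions, not the
QEMU code: the DEMO_* enum and demo_plug_smmuv3_accel() are hypothetical
names, and a plain string stands in for Error **errp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of VirtIOMMUType after this patch. */
typedef enum DemoIOMMUType {
    DEMO_IOMMU_NONE,
    DEMO_IOMMU_SMMUV3,
    DEMO_IOMMU_SMMUV3_ACCEL,
    DEMO_IOMMU_VIRTIO,
} DemoIOMMUType;

/*
 * Model of plugging one smmuv3-accel device.
 * Returns false and sets *errmsg if the machine-wide SMMUv3 is in use.
 * Otherwise records that the machine now uses accelerated SMMUv3s; as in
 * the hunk, any value other than SMMUV3 is simply overwritten.
 */
static bool demo_plug_smmuv3_accel(DemoIOMMUType *iommu, const char **errmsg)
{
    assert(iommu != NULL && errmsg != NULL);

    if (*iommu == DEMO_IOMMU_SMMUV3) {
        *errmsg = "iommu=smmuv3 is already specified";
        return false;
    }

    *iommu = DEMO_IOMMU_SMMUV3_ACCEL;
    return true;
}
```

Plugging a second smmuv3-accel device leaves the state unchanged, which
is how multiple instances can coexist in this model.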
 hw/arm/virt.c         | 12 ++++++++++++
 hw/core/sysbus-fdt.c  |  1 +
 include/hw/arm/virt.h |  1 +
 3 files changed, 14 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4a5a9666e9..84a323da55 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -73,6 +73,7 @@
 #include "qobject/qlist.h"
 #include "standard-headers/linux/input.h"
 #include "hw/arm/smmuv3.h"
+#include "hw/arm/smmuv3-accel.h"
 #include "hw/acpi/acpi.h"
 #include "target/arm/cpu-qom.h"
 #include "target/arm/internals.h"
@@ -2911,6 +2912,16 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
             platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
                                      SYS_BUS_DEVICE(dev));
         }
+        if (object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_ACCEL)) {
+            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
+                error_setg(errp,
+                           "iommu=smmuv3 is already specified. can't create smmuv3-accel dev");
+                return;
+            }
+            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
+                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
+            }
+        }
     }
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
@@ -3120,6 +3131,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
+    machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ARM_SMMUV3_ACCEL);
 #ifdef CONFIG_TPM
     machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
 #endif
diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c
index 774c0aed41..c8502ad830 100644
--- a/hw/core/sysbus-fdt.c
+++ b/hw/core/sysbus-fdt.c
@@ -489,6 +489,7 @@ static const BindingEntry bindings[] = {
 #ifdef CONFIG_LINUX
     TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node),
     TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
+    TYPE_BINDING("arm-smmuv3-accel", no_fdt_node),
     VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
 #endif
 #ifdef CONFIG_TPM
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index c8e94e6aed..849d1cd5b5 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -92,6 +92,7 @@ enum {
 typedef enum VirtIOMMUType {
     VIRT_IOMMU_NONE,
     VIRT_IOMMU_SMMUV3,
+    VIRT_IOMMU_SMMUV3_ACCEL,
     VIRT_IOMMU_VIRTIO,
 } VirtIOMMUType;
 
-- 
2.34.1




* [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (3 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-12 16:07   ` Eric Auger
  2025-03-18 22:12   ` Donald Dutile
  2025-03-11 14:10 ` [RFC PATCH v2 06/20] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

The user must associate a pxb-pcie root bus with smmuv3-accel;
that bus is then set as the primary-bus for the SMMU device.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
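Editorial note: the association in this patch is done by name. The
object tree is walked, and a pxb-pcie bus whose name equals the name of
the device's parent bus becomes the primary bus. The following is a
minimal sketch of that matching over a flat array, assuming hypothetical
DemoBus and demo_find_primary_bus() names in place of the QOM walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of a bus in the object tree. */
typedef struct DemoBus {
    const char *name;
    bool is_pxb_pcie;   /* stands in for the "pxb-pcie-bus" type check */
} DemoBus;

/*
 * Return the pxb-pcie bus whose name equals parent_bus_name, or NULL.
 * The walk never stops early, so the last match wins, as it would when
 * a foreach callback always returns 0 and re-sets the link each time.
 */
static const DemoBus *demo_find_primary_bus(const DemoBus *buses, size_t n,
                                            const char *parent_bus_name)
{
    const DemoBus *match = NULL;
    size_t i;

    assert(buses != NULL || n == 0);

    if (!parent_bus_name) {
        return NULL;
    }

    for (i = 0; i < n; i++) {
        if (buses[i].is_pxb_pcie &&
            !strcmp(buses[i].name, parent_bus_name)) {
            match = &buses[i];
        }
    }
    return match;
}
```

With the cover letter's example, a device given bus=pcie.2 would match
the pxb-pcie bus with id=pcie.2, while the root bus pcie.0 is never
selected because it is not a pxb-pcie bus.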
 hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index c327661636..1471b65374 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -9,6 +9,21 @@
 #include "qemu/osdep.h"
 
 #include "hw/arm/smmuv3-accel.h"
+#include "hw/pci/pci_bridge.h"
+
+static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
+{
+    DeviceState *d = opaque;
+
+    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
+        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
+        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus->name)) {
+            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
+                                     &error_abort);
+        }
+    }
+    return 0;
+}
 
 static void smmu_accel_realize(DeviceState *d, Error **errp)
 {
@@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
     SysBusDevice *dev = SYS_BUS_DEVICE(d);
     Error *local_err = NULL;
 
+    object_child_foreach_recursive(object_get_root(),
+                                   smmuv3_accel_pxb_pcie_bus, d);
+
     object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
     c->parent_realize(d, &local_err);
     if (local_err) {
@@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
     device_class_set_parent_realize(dc, smmu_accel_realize,
                                     &c->parent_realize);
     dc->hotpluggable = false;
+    dc->bus_type = TYPE_PCIE_BUS;
 }
 
 static const TypeInfo smmuv3_accel_type_info = {
-- 
2.34.1




* [RFC PATCH v2 06/20] hw/arm/smmu-common: Factor out common helper functions and export
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (4 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-12 16:12   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps Shameer Kolothum via
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Factor smmu_init_sdev() and smmu_get_sbus() out of smmu_find_add_as()
and export them. Subsequent patches for smmuv3-accel will make use of
these helpers.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
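Editorial note: the refactoring splits a lookup-or-create step for the
per-bus structure from the per-device initialisation. The following is a
minimal sketch of that split, assuming a small fixed-size table in place
of the GHashTable. All Demo* and demo_* names are hypothetical, and the
device initialisation is reduced to recording devfn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_BUSES 4
#define DEMO_DEVFN_MAX 256

typedef struct DemoDev {
    int devfn;
} DemoDev;

typedef struct DemoSBus {
    const void *bus;                  /* key: the bus pointer */
    DemoDev *pbdev[DEMO_DEVFN_MAX];   /* per-devfn devices, created lazily */
} DemoSBus;

typedef struct DemoState {
    DemoSBus *by_bus[DEMO_MAX_BUSES];
} DemoState;

/* Lookup-or-create, the role smmu_get_sbus() plays after the split. */
static DemoSBus *demo_get_sbus(DemoState *s, const void *bus)
{
    size_t i, free_slot = DEMO_MAX_BUSES;

    assert(s != NULL);

    for (i = 0; i < DEMO_MAX_BUSES; i++) {
        if (s->by_bus[i] && s->by_bus[i]->bus == bus) {
            return s->by_bus[i];
        }
        if (!s->by_bus[i] && free_slot == DEMO_MAX_BUSES) {
            free_slot = i;
        }
    }
    if (free_slot == DEMO_MAX_BUSES) {
        return NULL;                  /* table full */
    }
    s->by_bus[free_slot] = calloc(1, sizeof(DemoSBus));
    if (s->by_bus[free_slot]) {
        s->by_bus[free_slot]->bus = bus;
    }
    return s->by_bus[free_slot];
}

/* Caller shaped like smmu_find_add_as(): get the bus, then the device. */
static DemoDev *demo_find_add_dev(DemoState *s, const void *bus, int devfn)
{
    DemoSBus *sbus = demo_get_sbus(s, bus);

    if (!sbus || devfn < 0 || devfn >= DEMO_DEVFN_MAX) {
        return NULL;
    }
    if (!sbus->pbdev[devfn]) {
        DemoDev *d = calloc(1, sizeof(*d));
        if (!d) {
            return NULL;
        }
        d->devfn = devfn;             /* stands in for smmu_init_sdev() */
        sbus->pbdev[devfn] = d;
    }
    return sbus->pbdev[devfn];
}
```

Keeping the two steps separate lets another caller reuse the bus lookup
while allocating a larger, device-specific structure of its own.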
 hw/arm/smmu-common.c         | 48 ++++++++++++++++++++++--------------
 include/hw/arm/smmu-common.h |  6 +++++
 2 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index f5caf1665c..83c0693f5a 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -826,12 +826,28 @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
     return NULL;
 }
 
-static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
+void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev,
+                    PCIBus *bus, int devfn)
 {
-    SMMUState *s = opaque;
-    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
-    SMMUDevice *sdev;
     static unsigned int index;
+    char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
+
+    sdev->smmu = s;
+    sdev->bus = bus;
+    sdev->devfn = devfn;
+
+    memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
+                             s->mrtypename,
+                             OBJECT(s), name, UINT64_MAX);
+    address_space_init(&sdev->as,
+                       MEMORY_REGION(&sdev->iommu), name);
+    trace_smmu_add_mr(name);
+    g_free(name);
+}
+
+SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus)
+{
+    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
 
     if (!sbus) {
         sbus = g_malloc0(sizeof(SMMUPciBus) +
@@ -840,23 +856,19 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
         g_hash_table_insert(s->smmu_pcibus_by_busptr, bus, sbus);
     }
 
+    return sbus;
+}
+
+static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
+{
+    SMMUDevice *sdev;
+    SMMUState *s = opaque;
+    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
+
     sdev = sbus->pbdev[devfn];
     if (!sdev) {
-        char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
-
         sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
-
-        sdev->smmu = s;
-        sdev->bus = bus;
-        sdev->devfn = devfn;
-
-        memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
-                                 s->mrtypename,
-                                 OBJECT(s), name, UINT64_MAX);
-        address_space_init(&sdev->as,
-                           MEMORY_REGION(&sdev->iommu), name);
-        trace_smmu_add_mr(name);
-        g_free(name);
+        smmu_init_sdev(s, sdev, bus, devfn);
     }
 
     return &sdev->as;
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index b5c63cfd5d..80ff2ef6aa 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -178,6 +178,12 @@ OBJECT_DECLARE_TYPE(SMMUState, SMMUBaseClass, ARM_SMMU)
 /* Return the SMMUPciBus handle associated to a PCI bus number */
 SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num);
 
+/* Return the SMMUPciBus handle associated to a PCI bus */
+SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus);
+
+/* Initialize SMMUDevice handle associated to a SMMUPCIBus */
+void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev, PCIBus *bus, int devfn);
+
 /* Return the stream ID of an SMMU device */
 static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
 {
-- 
2.34.1




* [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (5 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 06/20] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-12 16:23   ` Eric Auger
  2025-03-12 17:10   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback Shameer Kolothum via
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Subsequent smmuv3-accel patches will provide these callbacks.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
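The dispatch pattern used here can be sketched in isolation as follows. This
is only an illustration: State, dispatch_get_as, dispatch_set_dev and
accel_get_as are hypothetical names, and the integer "address space" is a
placeholder. It assumes, as in the diff, that hooks are optional and are only
consulted when the accel flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SMMUState: a flag plus optional hooks. */
typedef struct State {
    bool accel;
    int  (*get_as)(struct State *s, int devfn);    /* may be NULL */
    bool (*set_dev)(struct State *s, int devfn);   /* may be NULL */
} State;

/* Default path used when no accelerated hook is installed. */
static int default_get_as(State *s, int devfn)
{
    (void)s;
    return devfn;               /* placeholder "address space" id */
}

static int dispatch_get_as(State *s, int devfn)
{
    if (s->accel && s->get_as) {
        return s->get_as(s, devfn);
    }
    return default_get_as(s, devfn);
}

static bool dispatch_set_dev(State *s, int devfn)
{
    if (s->accel && s->set_dev) {
        return s->set_dev(s, devfn);
    }
    return false;               /* no hook installed: report failure */
}

/* Example accelerated hook. */
static int accel_get_as(State *s, int devfn)
{
    assert(s->accel);
    return 1000 + devfn;
}
```

Note that the set path returns false when no hook is present, so a
non-accelerated instance rejects host IOMMU devices rather than silently
accepting them.
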
 hw/arm/smmu-common.c         | 27 +++++++++++++++++++++++++++
 include/hw/arm/smmu-common.h |  5 +++++
 2 files changed, 32 insertions(+)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 83c0693f5a..9fd455baa0 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -865,6 +865,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
     SMMUState *s = opaque;
     SMMUPciBus *sbus = smmu_get_sbus(s, bus);
 
+    if (s->accel && s->get_address_space) {
+        return s->get_address_space(bus, opaque, devfn);
+    }
+
     sdev = sbus->pbdev[devfn];
     if (!sdev) {
         sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
@@ -874,8 +878,31 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
     return &sdev->as;
 }
 
+static bool smmu_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
+                                      HostIOMMUDevice *hiod, Error **errp)
+{
+    SMMUState *s = opaque;
+
+    if (s->accel && s->set_iommu_device) {
+        return s->set_iommu_device(bus, opaque, devfn, hiod, errp);
+    }
+
+    return false;
+}
+
+static void smmu_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
+{
+    SMMUState *s = opaque;
+
+    if (s->accel && s->unset_iommu_device) {
+        s->unset_iommu_device(bus, opaque, devfn);
+    }
+}
+
 static const PCIIOMMUOps smmu_ops = {
     .get_address_space = smmu_find_add_as,
+    .set_iommu_device = smmu_dev_set_iommu_device,
+    .unset_iommu_device = smmu_dev_unset_iommu_device,
 };
 
 SMMUDevice *smmu_find_sdev(SMMUState *s, uint32_t sid)
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 80ff2ef6aa..7b05640167 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -160,6 +160,11 @@ struct SMMUState {
 
     /* For smmuv3-accel */
     bool accel;
+
+    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
+    bool (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
+                             HostIOMMUDevice *dev, Error **errp);
+    void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
 };
 
 struct SMMUBaseClass {
-- 
2.34.1




* [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (6 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-11 20:50   ` Nicolin Chen
  2025-03-12 17:14   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
                   ` (13 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Also introduce a struct SMMUv3AccelDevice to hold accelerator specific
device info. This will be populated accordingly in subsequent patches.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
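The embedding trick behind SMMUv3AccelDevice can be sketched as below. The
types (BaseDev, AccelDev), the slots[] array and the simplified container_of
macro are assumptions for illustration; QEMU's own container_of adds type
checking. The sketch relies on every slot having been allocated as the larger
structure, which is why the accelerated path has to own the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified container_of: recover the outer struct from a member pointer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define NUM_DEVFN 256

typedef struct BaseDev {
    int devfn;
} BaseDev;

/* Hypothetical accelerated device: the base object is embedded by value. */
typedef struct AccelDev {
    BaseDev base;
    int     extra;          /* accelerator-specific state */
} AccelDev;

/* Base-typed slots, similar in spirit to sbus->pbdev[]. */
static BaseDev *slots[NUM_DEVFN];

static AccelDev *accel_get_dev(int devfn)
{
    BaseDev *base = slots[devfn];
    AccelDev *adev;

    if (base) {
        /* Valid only because this function allocated the slot below. */
        adev = container_of(base, AccelDev, base);
    } else {
        adev = calloc(1, sizeof(*adev));
        assert(adev);
        adev->base.devfn = devfn;
        slots[devfn] = &adev->base;
    }
    return adev;
}
```

Common code keeps working with BaseDev pointers, while accelerated code can
get back to its private state without a second lookup table.
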
 hw/arm/smmuv3-accel.c         | 36 +++++++++++++++++++++++++++++++++++
 include/hw/arm/smmuv3-accel.h |  4 ++++
 2 files changed, 40 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 1471b65374..6610ebe4be 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -11,6 +11,40 @@
 #include "hw/arm/smmuv3-accel.h"
 #include "hw/pci/pci_bridge.h"
 
+static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
+                                                PCIBus *bus, int devfn)
+{
+    SMMUDevice *sdev = sbus->pbdev[devfn];
+    SMMUv3AccelDevice *accel_dev;
+
+    if (sdev) {
+        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+    } else {
+        accel_dev = g_new0(SMMUv3AccelDevice, 1);
+        sdev = &accel_dev->sdev;
+
+        sbus->pbdev[devfn] = sdev;
+        smmu_init_sdev(s, sdev, bus, devfn);
+    }
+
+    return accel_dev;
+}
+
+static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
+                                              int devfn)
+{
+    SMMUState *s = opaque;
+    SMMUPciBus *sbus;
+    SMMUv3AccelDevice *accel_dev;
+    SMMUDevice *sdev;
+
+    sbus = smmu_get_sbus(s, bus);
+    accel_dev = smmuv3_accel_get_dev(s, sbus, bus, devfn);
+    sdev = &accel_dev->sdev;
+
+    return &sdev->as;
+}
+
 static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
 {
     DeviceState *d = opaque;
@@ -30,6 +64,7 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
     SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(d);
     SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_GET_CLASS(s_accel);
     SysBusDevice *dev = SYS_BUS_DEVICE(d);
+    SMMUState *bs = ARM_SMMU(d);
     Error *local_err = NULL;
 
     object_child_foreach_recursive(object_get_root(),
@@ -41,6 +76,7 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
         error_propagate(errp, local_err);
         return;
     }
+    bs->get_address_space = smmuv3_accel_find_add_as;
 }
 
 static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
index 56fe376bf4..86c0523063 100644
--- a/include/hw/arm/smmuv3-accel.h
+++ b/include/hw/arm/smmuv3-accel.h
@@ -16,6 +16,10 @@
 #define TYPE_ARM_SMMUV3_ACCEL   "arm-smmuv3-accel"
 OBJECT_DECLARE_TYPE(SMMUv3AccelState, SMMUv3AccelClass, ARM_SMMUV3_ACCEL)
 
+typedef struct SMMUv3AccelDevice {
+    SMMUDevice  sdev;
+} SMMUv3AccelDevice;
+
 struct SMMUv3AccelState {
     SMMUv3State smmuv3_state;
 };
-- 
2.34.1




* [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (7 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-11 21:07   ` Nicolin Chen
  2025-03-12 12:52   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 10/20] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
                   ` (12 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Implement a set_iommu_device callback that:
 - Finds an existing S2 HWPT to test attach(), or allocates a new one.
   (Devices behind the same physical SMMU should share an S2 HWPT.)
 - Attaches the device to the S2 HWPT.
 - Allocates a vIOMMU with the returned S2 HWPT.
 - Allocates bypass and abort HWPTs and attaches the bypass HWPT.
 - Adds the device to the vIOMMU's device list.

Also add an unset_iommu_device callback that performs the corresponding
cleanup.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
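The staged setup and its error unwinding can be sketched without any IOMMUFD
dependency. Everything below (Ctx, acquire, release, setup, held_count and the
RES_* steps) is hypothetical and only models the ordering: each step is
acquired in sequence and a failure releases the earlier ones in reverse. In
the real code the "detach" step re-attaches the device to its IOAS instead of
freeing an object.

```c
#include <assert.h>
#include <stdbool.h>

enum { RES_S2, RES_ATTACH, RES_VIOMMU, RES_ABORT, RES_BYPASS, RES_MAX };

/* Hypothetical tracker: which steps currently hold a resource. */
typedef struct Ctx {
    bool held[RES_MAX];
    int  fail_at;           /* step that should fail, or -1 for none */
} Ctx;

static bool acquire(Ctx *c, int res)
{
    if (c->fail_at == res) {
        return false;
    }
    c->held[res] = true;
    return true;
}

static void release(Ctx *c, int res)
{
    assert(c->held[res]);   /* never release what was not acquired */
    c->held[res] = false;
}

/* Acquire in order; on failure unwind in reverse via fall-through labels. */
static bool setup(Ctx *c)
{
    if (!acquire(c, RES_S2)) {
        return false;
    }
    if (!acquire(c, RES_ATTACH)) {
        goto free_s2;
    }
    if (!acquire(c, RES_VIOMMU)) {
        goto detach;
    }
    if (!acquire(c, RES_ABORT)) {
        goto free_viommu;
    }
    if (!acquire(c, RES_BYPASS)) {
        goto free_abort;
    }
    return true;

free_abort:
    release(c, RES_ABORT);
free_viommu:
    release(c, RES_VIOMMU);
detach:
    release(c, RES_ATTACH);
free_s2:
    release(c, RES_S2);
    return false;
}

static int held_count(const Ctx *c)
{
    int n = 0;

    for (int i = 0; i < RES_MAX; i++) {
        n += c->held[i] ? 1 : 0;
    }
    return n;
}
```

Injecting a failure at each step and checking that held_count() drops back to
zero is a cheap way to validate that the labels are in the right order.
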
 hw/arm/meson.build            |   2 +-
 hw/arm/smmuv3-accel.c         | 183 ++++++++++++++++++++++++++++++++++
 hw/arm/trace-events           |   4 +
 include/hw/arm/smmuv3-accel.h |  23 +++++
 include/system/iommufd.h      |   6 ++
 5 files changed, 217 insertions(+), 1 deletion(-)

diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index e8593363b0..dd41a86619 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -55,7 +55,7 @@ arm_ss.add(when: 'CONFIG_MUSCA', if_true: files('musca.c'))
 arm_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
 arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
-arm_ss.add(when: 'CONFIG_ARM_SMMUV3_ACCEL', if_true: files('smmuv3-accel.c'))
+arm_ss.add(when: ['CONFIG_ARM_SMMUV3_ACCEL', 'CONFIG_IOMMUFD'], if_true: files('smmuv3-accel.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
 arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
 arm_ss.add(when: 'CONFIG_XEN', if_true: files(
diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 6610ebe4be..1c696649d5 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -7,6 +7,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "trace.h"
+#include "qemu/error-report.h"
 
 #include "hw/arm/smmuv3-accel.h"
 #include "hw/pci/pci_bridge.h"
@@ -30,6 +32,185 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
     return accel_dev;
 }
 
+static bool
+smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
+                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)
+{
+    struct iommu_hwpt_arm_smmuv3 bypass_data = {
+        .ste = { 0x9ULL, 0x0ULL },
+    };
+    struct iommu_hwpt_arm_smmuv3 abort_data = {
+        .ste = { 0x1ULL, 0x0ULL },
+    };
+    SMMUDevice *sdev = &accel_dev->sdev;
+    SMMUState *s = sdev->smmu;
+    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
+    SMMUS2Hwpt *s2_hwpt;
+    SMMUViommu *viommu;
+    uint32_t s2_hwpt_id;
+    uint32_t viommu_id;
+
+    if (s_accel->viommu) {
+        accel_dev->viommu = s_accel->viommu;
+        return host_iommu_device_iommufd_attach_hwpt(
+                       idev, s_accel->viommu->s2_hwpt->hwpt_id, errp);
+    }
+
+    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid, idev->ioas_id,
+                                    IOMMU_HWPT_ALLOC_NEST_PARENT,
+                                    IOMMU_HWPT_DATA_NONE, 0, NULL,
+                                    &s2_hwpt_id, errp)) {
+        return false;
+    }
+
+    /* Attach to S2 for MSI cookie */
+    if (!host_iommu_device_iommufd_attach_hwpt(idev, s2_hwpt_id, errp)) {
+        goto free_s2_hwpt;
+    }
+
+    if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
+                                      IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
+                                      s2_hwpt_id, &viommu_id, errp)) {
+        goto detach_s2_hwpt;
+    }
+
+    viommu = g_new0(SMMUViommu, 1);
+    viommu->core.viommu_id = viommu_id;
+    viommu->core.s2_hwpt_id = s2_hwpt_id;
+    viommu->core.iommufd = idev->iommufd;
+
+    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
+                                    viommu->core.viommu_id, 0,
+                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
+                                    sizeof(abort_data), &abort_data,
+                                    &viommu->abort_hwpt_id, errp)) {
+        error_report("failed to allocate an abort pagetable");
+        goto free_viommu;
+    }
+
+    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
+                                    viommu->core.viommu_id, 0,
+                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
+                                    sizeof(bypass_data), &bypass_data,
+                                    &viommu->bypass_hwpt_id, errp)) {
+        error_report("failed to allocate a bypass pagetable");
+        goto free_abort_hwpt;
+    }
+
+    /*
+     * Attach the bypass STE which means S1 bypass and S2 translate.
+     * This is to make sure that the vIOMMU object is now associated
+     * with the device and has this STE installed in the host SMMUV3.
+     */
+    if (!host_iommu_device_iommufd_attach_hwpt(
+                idev, viommu->bypass_hwpt_id, errp)) {
+        error_report("failed to attach the bypass pagetable");
+        goto free_bypass_hwpt;
+    }
+
+    s2_hwpt = g_new0(SMMUS2Hwpt, 1);
+    s2_hwpt->iommufd = idev->iommufd;
+    s2_hwpt->hwpt_id = s2_hwpt_id;
+    s2_hwpt->ioas_id = idev->ioas_id;
+
+    viommu->iommufd = idev->iommufd;
+    viommu->s2_hwpt = s2_hwpt;
+
+    s_accel->viommu = viommu;
+    accel_dev->viommu = viommu;
+    return true;
+
+free_bypass_hwpt:
+    iommufd_backend_free_id(idev->iommufd, viommu->bypass_hwpt_id);
+free_abort_hwpt:
+    iommufd_backend_free_id(idev->iommufd, viommu->abort_hwpt_id);
+free_viommu:
+    iommufd_backend_free_id(idev->iommufd, viommu->core.viommu_id);
+    g_free(viommu);
+detach_s2_hwpt:
+    host_iommu_device_iommufd_attach_hwpt(idev, idev->ioas_id, errp);
+free_s2_hwpt:
+    iommufd_backend_free_id(idev->iommufd, s2_hwpt_id);
+    return false;
+}
+
+static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
+                                          HostIOMMUDevice *hiod, Error **errp)
+{
+    HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(hiod);
+    SMMUState *s = opaque;
+    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
+    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
+    SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(s, sbus, bus, devfn);
+    SMMUDevice *sdev = &accel_dev->sdev;
+
+    if (!idev) {
+        return true;
+    }
+
+    if (accel_dev->idev) {
+        if (accel_dev->idev != idev) {
+            error_report("Device 0x%x already has an associated idev",
+                         smmu_get_sid(sdev));
+            return false;
+        } else {
+            return true;
+        }
+    }
+
+    if (!smmuv3_accel_dev_attach_viommu(accel_dev, idev, errp)) {
+        error_report("Unable to attach viommu");
+        return false;
+    }
+
+    accel_dev->idev = idev;
+    QLIST_INSERT_HEAD(&s_accel->viommu->device_list, accel_dev, next);
+    trace_smmuv3_accel_set_iommu_device(devfn, smmu_get_sid(sdev));
+    return true;
+}
+
+static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
+                                            int devfn)
+{
+    SMMUDevice *sdev;
+    SMMUv3AccelDevice *accel_dev;
+    SMMUViommu *viommu;
+    SMMUState *s = opaque;
+    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
+    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
+
+    if (!sbus) {
+        return;
+    }
+
+    sdev = sbus->pbdev[devfn];
+    if (!sdev) {
+        return;
+    }
+
+    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+    if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
+                                               accel_dev->idev->ioas_id,
+                                               NULL)) {
+        error_report("Unable to attach dev to the default HW pagetable");
+    }
+
+
+    accel_dev->idev = NULL;
+    QLIST_REMOVE(accel_dev, next);
+    trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
+
+    viommu = s_accel->viommu;
+    if (QLIST_EMPTY(&viommu->device_list)) {
+        iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
+        iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
+        iommufd_backend_free_id(viommu->iommufd, viommu->core.viommu_id);
+        iommufd_backend_free_id(viommu->iommufd, viommu->s2_hwpt->hwpt_id);
+        g_free(viommu->s2_hwpt);
+        g_free(viommu);
+        s_accel->viommu = NULL;
+    }
+}
 static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
                                               int devfn)
 {
@@ -77,6 +258,8 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
         return;
     }
     bs->get_address_space = smmuv3_accel_find_add_as;
+    bs->set_iommu_device = smmuv3_accel_set_iommu_device;
+    bs->unset_iommu_device = smmuv3_accel_unset_iommu_device;
 }
 
 static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 7790db780e..17960794bf 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -58,6 +58,10 @@ smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s
 smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
 smmu_reset_exit(void) ""
 
+#smmuv3-accel.c
+smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
+smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
+
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
 strongarm_ssp_read_underrun(void) "SSP rx underrun"
diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
index 86c0523063..aca6838dca 100644
--- a/include/hw/arm/smmuv3-accel.h
+++ b/include/hw/arm/smmuv3-accel.h
@@ -12,16 +12,39 @@
 #include "hw/arm/smmu-common.h"
 #include "hw/arm/smmuv3.h"
 #include "qom/object.h"
+#include "system/iommufd.h"
+
+#include <linux/iommufd.h>
 
 #define TYPE_ARM_SMMUV3_ACCEL   "arm-smmuv3-accel"
 OBJECT_DECLARE_TYPE(SMMUv3AccelState, SMMUv3AccelClass, ARM_SMMUV3_ACCEL)
 
+typedef struct SMMUS2Hwpt {
+    IOMMUFDBackend *iommufd;
+    uint32_t hwpt_id;
+    uint32_t ioas_id;
+} SMMUS2Hwpt;
+
+typedef struct SMMUViommu {
+    IOMMUFDBackend *iommufd;
+    IOMMUFDViommu core;
+    SMMUS2Hwpt *s2_hwpt;
+    uint32_t bypass_hwpt_id;
+    uint32_t abort_hwpt_id;
+    QLIST_HEAD(, SMMUv3AccelDevice) device_list;
+    QLIST_ENTRY(SMMUViommu) next;
+} SMMUViommu;
+
 typedef struct SMMUv3AccelDevice {
     SMMUDevice  sdev;
+    HostIOMMUDeviceIOMMUFD *idev;
+    SMMUViommu *viommu;
+    QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
 
 struct SMMUv3AccelState {
     SMMUv3State smmuv3_state;
+    SMMUViommu *viommu;
 };
 
 struct SMMUv3AccelClass {
diff --git a/include/system/iommufd.h b/include/system/iommufd.h
index 53920bae5b..9c106ea078 100644
--- a/include/system/iommufd.h
+++ b/include/system/iommufd.h
@@ -37,6 +37,12 @@ struct IOMMUFDBackend {
     /*< public >*/
 };
 
+typedef struct IOMMUFDViommu {
+    IOMMUFDBackend *iommufd;
+    uint32_t s2_hwpt_id;
+    uint32_t viommu_id;
+} IOMMUFDViommu;
+
 bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
 void iommufd_backend_disconnect(IOMMUFDBackend *be);
 
-- 
2.34.1




* [RFC PATCH v2 10/20] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (8 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-25 18:08   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Allocate an S1 HWPT for the guest stage 1 and attach it to the
device. This will be invoked in a subsequent patch when the guest
issues SMMU_CMD_CFGI_STE.

While at it, export both smmu_find_ste() and smmuv3_flush_config()
from smmuv3.c for use here.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
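The STE packing and masking done in smmuv3_accel_install_nested_ste() can be
checked with a small standalone sketch. The two masks and the 0x9/0x1
constants are copied from this series; the helper names (pack_qword,
build_nested_ste, ste_valid, ste_config) are hypothetical. The decode helpers
assume the SMMUv3 STE layout, where V is bit 0 and Config is bits [3:1] of the
first qword.

```c
#include <assert.h>
#include <stdint.h>

/* Masks copied from the patch: fields passed to the host for nesting. */
#define STE0_NESTED_MASK 0xf80fffffffffffffULL
#define STE1_NESTED_MASK 0x380000ffULL

/* First qword of the constant STEs used when no guest S1 is installed. */
#define STE0_ABORT  0x1ULL      /* V=1, Config=0b000 */
#define STE0_BYPASS 0x9ULL      /* V=1, Config=0b100 */

/* Combine two 32-bit STE words into one little-endian 64-bit qword. */
static uint64_t pack_qword(uint32_t lo, uint32_t hi)
{
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Build the two qwords handed to the kernel from the first four words. */
static void build_nested_ste(const uint32_t word[4], uint64_t out[2])
{
    assert(word && out);
    out[0] = pack_qword(word[0], word[1]) & STE0_NESTED_MASK;
    out[1] = pack_qword(word[2], word[3]) & STE1_NESTED_MASK;
}

static unsigned ste_valid(uint64_t q0)
{
    return (unsigned)(q0 & 0x1);
}

static unsigned ste_config(uint64_t q0)
{
    return (unsigned)((q0 >> 1) & 0x7);
}
```

Feeding all-ones words through build_nested_ste() returns exactly the two
masks, which makes it easy to see which guest-controlled bits are forwarded.
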
 hw/arm/smmuv3-accel.c         | 111 ++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-internal.h      |  13 ++++
 hw/arm/smmuv3.c               |   5 +-
 hw/arm/trace-events           |   1 +
 include/hw/arm/smmuv3-accel.h |   6 ++
 5 files changed, 133 insertions(+), 3 deletions(-)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 1c696649d5..d3a5cf9551 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -13,6 +13,8 @@
 #include "hw/arm/smmuv3-accel.h"
 #include "hw/pci/pci_bridge.h"
 
+#include "smmuv3-internal.h"
+
 static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
                                                 PCIBus *bus, int devfn)
 {
@@ -32,6 +34,115 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
     return accel_dev;
 }
 
+static void
+smmuv3_accel_dev_uninstall_nested_ste(SMMUv3AccelDevice *accel_dev, bool abort)
+{
+    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
+    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
+    uint32_t hwpt_id;
+
+    if (!s1_hwpt || !accel_dev->viommu) {
+        return;
+    }
+
+    if (abort) {
+        hwpt_id = accel_dev->viommu->abort_hwpt_id;
+    } else {
+        hwpt_id = accel_dev->viommu->bypass_hwpt_id;
+    }
+
+    host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, &error_abort);
+    iommufd_backend_free_id(s1_hwpt->iommufd, s1_hwpt->hwpt_id);
+    accel_dev->s1_hwpt = NULL;
+    g_free(s1_hwpt);
+}
+
+static int
+smmuv3_accel_dev_install_nested_ste(SMMUv3AccelDevice *accel_dev,
+                                    uint32_t data_type, uint32_t data_len,
+                                    void *data)
+{
+    SMMUViommu *viommu = accel_dev->viommu;
+    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
+    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
+
+    if (!idev || !viommu) {
+        return -ENOENT;
+    }
+
+    if (s1_hwpt) {
+        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, false);
+    }
+
+    s1_hwpt = g_new0(SMMUS1Hwpt, 1);
+    if (!s1_hwpt) {
+        return -ENOMEM;
+    }
+
+    s1_hwpt->iommufd = idev->iommufd;
+    iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
+                               viommu->core.viommu_id, 0, data_type, data_len,
+                               data, &s1_hwpt->hwpt_id, &error_abort);
+    host_iommu_device_iommufd_attach_hwpt(idev, s1_hwpt->hwpt_id, &error_abort);
+    accel_dev->s1_hwpt = s1_hwpt;
+    return 0;
+}
+
+void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
+{
+    SMMUv3AccelDevice *accel_dev;
+    SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
+                           .inval_ste_allowed = true};
+    struct iommu_hwpt_arm_smmuv3 nested_data = {};
+    SMMUv3State *s = sdev->smmu;
+    SMMUState *bs = &s->smmu_state;
+    uint32_t config;
+    STE ste;
+    int ret;
+
+    if (!bs->accel) {
+        return;
+    }
+
+    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+    if (!accel_dev->viommu) {
+        return;
+    }
+
+    ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
+    if (ret) {
+        /*
+         * For a 2-level Stream Table, the level-2 table might not be ready
+         * until the device gets inserted to the stream table. Ignore this.
+         */
+        return;
+    }
+
+    config = STE_CONFIG(&ste);
+    if (!STE_VALID(&ste) || !STE_CFG_S1_ENABLED(config)) {
+        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, STE_CFG_ABORT(config));
+        smmuv3_flush_config(sdev);
+        return;
+    }
+
+    nested_data.ste[0] = (uint64_t)ste.word[0] | (uint64_t)ste.word[1] << 32;
+    nested_data.ste[1] = (uint64_t)ste.word[2] | (uint64_t)ste.word[3] << 32;
+    /* V | CONFIG | S1FMT | S1CTXPTR | S1CDMAX */
+    nested_data.ste[0] &= 0xf80fffffffffffffULL;
+    /* S1DSS | S1CIR | S1COR | S1CSH | S1STALLD | EATS */
+    nested_data.ste[1] &= 0x380000ffULL;
+    ret = smmuv3_accel_dev_install_nested_ste(accel_dev,
+                                              IOMMU_HWPT_DATA_ARM_SMMUV3,
+                                              sizeof(nested_data),
+                                              &nested_data);
+    if (ret) {
+        error_report("Unable to install nested STE=%016"PRIx64":%016"PRIx64
+                     ", ret=%d", nested_data.ste[1], nested_data.ste[0], ret);
+    }
+    trace_smmuv3_accel_install_nested_ste(sid, nested_data.ste[1],
+                                          nested_data.ste[0]);
+}
+
 static bool
 smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
                                HostIOMMUDeviceIOMMUFD *idev, Error **errp)
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index b6b7399347..46c8bcae14 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -24,6 +24,8 @@
 #include "hw/registerfields.h"
 #include "hw/arm/smmu-common.h"
 
+#include CONFIG_DEVICES
+
 typedef enum SMMUTranslationStatus {
     SMMU_TRANS_DISABLE,
     SMMU_TRANS_ABORT,
@@ -547,6 +549,17 @@ typedef struct CD {
     uint32_t word[16];
 } CD;
 
+int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
+                  SMMUEventInfo *event);
+void smmuv3_flush_config(SMMUDevice *sdev);
+
+#if defined(CONFIG_ARM_SMMUV3_ACCEL) && defined(CONFIG_IOMMUFD)
+void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid);
+#else
+static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
+{
+}
+#endif
 /* STE fields */
 
 #define STE_VALID(x)   extract32((x)->word[0], 0, 1)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index b49a59b64c..ea63731d61 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -628,8 +628,7 @@ bad_ste:
  * Supports linear and 2-level stream table
  * Return 0 on success, -EINVAL otherwise
  */
-static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
-                         SMMUEventInfo *event)
+int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste, SMMUEventInfo *event)
 {
     dma_addr_t addr, strtab_base;
     uint32_t log2size;
@@ -898,7 +897,7 @@ static SMMUTransCfg *smmuv3_get_config(SMMUDevice *sdev, SMMUEventInfo *event)
     return cfg;
 }
 
-static void smmuv3_flush_config(SMMUDevice *sdev)
+void smmuv3_flush_config(SMMUDevice *sdev)
 {
     SMMUv3State *s = sdev->smmu;
     SMMUState *bc = &s->smmu_state;
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 17960794bf..cd2eac31c2 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -61,6 +61,7 @@ smmu_reset_exit(void) ""
 #smmuv3-accel.c
 smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
 smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
+smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
 
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
index aca6838dca..d6b0b1ca30 100644
--- a/include/hw/arm/smmuv3-accel.h
+++ b/include/hw/arm/smmuv3-accel.h
@@ -35,9 +35,15 @@ typedef struct SMMUViommu {
     QLIST_ENTRY(SMMUViommu) next;
 } SMMUViommu;
 
+typedef struct SMMUS1Hwpt {
+    IOMMUFDBackend *iommufd;
+    uint32_t hwpt_id;
+} SMMUS1Hwpt;
+
 typedef struct SMMUv3AccelDevice {
     SMMUDevice  sdev;
     HostIOMMUDeviceIOMMUFD *idev;
+    SMMUS1Hwpt  *s1_hwpt;
     SMMUViommu *viommu;
     QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
-- 
2.34.1




* [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (9 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 10/20] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-18 23:30   ` Donald Dutile
  2025-03-25 18:13   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 12/20] hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed Shameer Kolothum via
                   ` (10 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Allocate a vDEVICE object for the guest device and associate it
with the vIOMMU. This lets the kernel perform the vSID --> SID
translation whenever required (e.g. for device-specific
invalidations).

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
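For readers new to the vDEVICE concept, the vSID --> SID lookup described
above can be pictured with a minimal, self-contained sketch. The VsidMap
type and the vsid_map_*() helpers are hypothetical names used only for
illustration; they are not QEMU or iommufd APIs, and the kernel's actual
data structure may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VSID_MAP_MAX 8  /* arbitrary capacity for this illustration */

/* One registered vDEVICE: guest-visible StreamID -> host StreamID. */
typedef struct VsidEntry {
    uint32_t vsid;
    uint32_t sid;
    bool used;
} VsidEntry;

typedef struct VsidMap {
    VsidEntry entries[VSID_MAP_MAX];
} VsidMap;

/* Register a mapping; fails on a duplicate vSID or a full table. */
bool vsid_map_add(VsidMap *map, uint32_t vsid, uint32_t sid)
{
    VsidEntry *free_slot = NULL;

    assert(map != NULL);
    for (size_t i = 0; i < VSID_MAP_MAX; i++) {
        VsidEntry *e = &map->entries[i];

        if (e->used && e->vsid == vsid) {
            return false;
        }
        if (!e->used && !free_slot) {
            free_slot = e;
        }
    }
    if (!free_slot) {
        return false;
    }
    free_slot->vsid = vsid;
    free_slot->sid = sid;
    free_slot->used = true;
    return true;
}

/* Translate a vSID to the host SID; returns false if it is unknown. */
bool vsid_map_lookup(const VsidMap *map, uint32_t vsid, uint32_t *sid)
{
    assert(map != NULL && sid != NULL);
    for (size_t i = 0; i < VSID_MAP_MAX; i++) {
        const VsidEntry *e = &map->entries[i];

        if (e->used && e->vsid == vsid) {
            *sid = e->sid;
            return true;
        }
    }
    return false;
}

/* Drop a mapping, e.g. when the device is detached. */
bool vsid_map_remove(VsidMap *map, uint32_t vsid)
{
    assert(map != NULL);
    for (size_t i = 0; i < VSID_MAP_MAX; i++) {
        VsidEntry *e = &map->entries[i];

        if (e->used && e->vsid == vsid) {
            e->used = false;
            return true;
        }
    }
    return false;
}
```

In this picture, a device-specific invalidation carrying a vSID would be
rewritten with the SID returned by vsid_map_lookup() before it reaches the
hardware; an unknown vSID is rejected.
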
 hw/arm/smmuv3-accel.c         | 22 ++++++++++++++++++++++
 include/hw/arm/smmuv3-accel.h |  6 ++++++
 2 files changed, 28 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index d3a5cf9551..056bd23b2e 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -109,6 +109,20 @@ void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
         return;
     }
 
+    if (!accel_dev->vdev && accel_dev->idev) {
+        SMMUVdev *vdev;
+        uint32_t vdev_id;
+        SMMUViommu *viommu = accel_dev->viommu;
+
+        iommufd_backend_alloc_vdev(viommu->core.iommufd, accel_dev->idev->devid,
+                                   viommu->core.viommu_id, sid, &vdev_id,
+                                   &error_abort);
+        vdev = g_new0(SMMUVdev, 1);
+        vdev->vdev_id = vdev_id;
+        vdev->sid = sid;
+        accel_dev->vdev = vdev;
+    }
+
     ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
     if (ret) {
         /*
@@ -283,6 +297,7 @@ static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
 static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
                                             int devfn)
 {
+    SMMUVdev *vdev;
     SMMUDevice *sdev;
     SMMUv3AccelDevice *accel_dev;
     SMMUViommu *viommu;
@@ -312,6 +327,13 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
     trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
 
     viommu = s_accel->viommu;
+    vdev = accel_dev->vdev;
+    if (vdev) {
+        iommufd_backend_free_id(viommu->iommufd, vdev->vdev_id);
+        g_free(vdev);
+        accel_dev->vdev = NULL;
+    }
+
     if (QLIST_EMPTY(&viommu->device_list)) {
         iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
         iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
index d6b0b1ca30..54b217ab4f 100644
--- a/include/hw/arm/smmuv3-accel.h
+++ b/include/hw/arm/smmuv3-accel.h
@@ -35,6 +35,11 @@ typedef struct SMMUViommu {
     QLIST_ENTRY(SMMUViommu) next;
 } SMMUViommu;
 
+typedef struct SMMUVdev {
+    uint32_t vdev_id;
+    uint32_t sid;
+} SMMUVdev;
+
 typedef struct SMMUS1Hwpt {
     IOMMUFDBackend *iommufd;
     uint32_t hwpt_id;
@@ -45,6 +50,7 @@ typedef struct SMMUv3AccelDevice {
     HostIOMMUDeviceIOMMUFD *idev;
     SMMUS1Hwpt  *s1_hwpt;
     SMMUViommu *viommu;
+    SMMUVdev   *vdev;
     QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
 
-- 
2.34.1




* [RFC PATCH v2 12/20] hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (10 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-25 18:47   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

When nested translation is enabled, the two translation stages
operate on two different address spaces: stage-1 in the iommu as
and stage-2 in the system as.

If a device attached to the vSMMU doesn't enable stage-1 translation,
e.g. its vSTE is set to Config=Bypass, the system as should be returned
so that QEMU can set up system memory mappings in the stage-2 page table.
This is crucial for an iommufd-enabled VFIO device, as the VFIO core
code would otherwise register an iommu notifier and replay the address
space, which should be bypassed in this nested translation case.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
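The selection rule described above can be restated as a small standalone
sketch. AddrSpace, AccelDev and accel_dev_pick_as() are hypothetical
stand-ins (not the QEMU AddressSpace or SMMUv3AccelDevice types); the two
boolean flags model "the device has an iommufd property" and "a stage-1
HWPT is installed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space; only the identity matters. */
typedef struct AddrSpace {
    const char *name;
} AddrSpace;

typedef struct AccelDev {
    AddrSpace as_iommu;   /* stage-1 (guest IOVA) address space */
    AddrSpace as_sysmem;  /* alias of system memory, used for stage-2 only */
    bool has_iommufd;     /* device is backed by iommufd */
    bool has_s1_hwpt;     /* guest has enabled stage-1 translation */
} AccelDev;

/*
 * Return the system address space when the device uses stage-2 only,
 * otherwise the per-device iommu address space.
 */
const AddrSpace *accel_dev_pick_as(const AccelDev *dev)
{
    assert(dev != NULL);
    if (dev->has_iommufd && !dev->has_s1_hwpt) {
        return &dev->as_sysmem;
    }
    return &dev->as_iommu;
}
```

An emulated (non-iommufd) device always gets the iommu address space in
this sketch, which matches the "else" branch of the diff below.
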
 hw/arm/smmuv3-accel.c         | 22 +++++++++++++++++++++-
 include/hw/arm/smmuv3-accel.h |  3 +++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 056bd23b2e..76134d106a 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -18,6 +18,7 @@
 static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
                                                 PCIBus *bus, int devfn)
 {
+    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
     SMMUDevice *sdev = sbus->pbdev[devfn];
     SMMUv3AccelDevice *accel_dev;
 
@@ -29,6 +30,8 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
 
         sbus->pbdev[devfn] = sdev;
         smmu_init_sdev(s, sdev, bus, devfn);
+        address_space_init(&accel_dev->as_sysmem, &s_accel->root,
+                           "smmuv3-accel-sysmem");
     }
 
     return accel_dev;
@@ -351,12 +354,23 @@ static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
     SMMUPciBus *sbus;
     SMMUv3AccelDevice *accel_dev;
     SMMUDevice *sdev;
+    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
+    bool has_iommufd = false;
+
+    if (pdev) {
+        has_iommufd = object_property_find(OBJECT(pdev), "iommufd");
+    }
 
     sbus = smmu_get_sbus(s, bus);
     accel_dev = smmuv3_accel_get_dev(s, sbus, bus, devfn);
     sdev = &accel_dev->sdev;
 
-    return &sdev->as;
+    /* Return the system as if the device uses stage-2 only */
+    if (has_iommufd && !accel_dev->s1_hwpt) {
+        return &accel_dev->as_sysmem;
+    } else {
+        return &sdev->as;
+    }
 }
 
 static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
@@ -390,6 +404,12 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
         error_propagate(errp, local_err);
         return;
     }
+
+    memory_region_init(&s_accel->root, OBJECT(s_accel), "root", UINT64_MAX);
+    memory_region_init_alias(&s_accel->sysmem, OBJECT(s_accel),
+                             "smmuv3-accel-sysmem", get_system_memory(), 0,
+                             memory_region_size(get_system_memory()));
+    memory_region_add_subregion(&s_accel->root, 0, &s_accel->sysmem);
     bs->get_address_space = smmuv3_accel_find_add_as;
     bs->set_iommu_device = smmuv3_accel_set_iommu_device;
     bs->unset_iommu_device = smmuv3_accel_unset_iommu_device;
diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
index 54b217ab4f..58e68534c0 100644
--- a/include/hw/arm/smmuv3-accel.h
+++ b/include/hw/arm/smmuv3-accel.h
@@ -51,12 +51,15 @@ typedef struct SMMUv3AccelDevice {
     SMMUS1Hwpt  *s1_hwpt;
     SMMUViommu *viommu;
     SMMUVdev   *vdev;
+    AddressSpace as_sysmem;
     QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
 
 struct SMMUv3AccelState {
     SMMUv3State smmuv3_state;
     SMMUViommu *viommu;
+    MemoryRegion root;
+    MemoryRegion sysmem;
 };
 
 struct SMMUv3AccelClass {
-- 
2.34.1




* [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (11 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 12/20] hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-19  1:31   ` Donald Dutile
                     ` (2 more replies)
  2025-03-11 14:10 ` [RFC PATCH v2 14/20] hw/arm/smmuv3: Install nested ste for CFGI_STE Shameer Kolothum via
                   ` (8 subsequent siblings)
  21 siblings, 3 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Introduce an SMMUCommandBatch and some helpers to batch and issue the
commands. For now, TLBI commands and device cache commands are kept in
separate batches to avoid errata on certain versions of SMMUs. Later, the
IIDR register should be checked to detect whether the underlying SMMU hw
has such an erratum.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
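The batching policy described above (flush whenever the command class
changes between TLBI and device cache) can be sketched in isolation as
follows. Batch, batch_add() and batch_issue() are hypothetical names, the
fixed capacity is an assumption of the sketch, and "issuing" is modelled by
counters instead of an ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH_MAX 16  /* arbitrary capacity for this illustration */

typedef struct Batch {
    uint64_t cmds[BATCH_MAX];
    uint32_t ncmds;     /* commands currently queued */
    bool dev_cache;     /* class of the queued commands */
    uint32_t flushes;   /* how many times a batch was issued */
    uint32_t issued;    /* total number of commands issued */
} Batch;

/* Pretend to hand the queued commands to the host and reset the batch. */
void batch_issue(Batch *b)
{
    assert(b->ncmds <= BATCH_MAX);
    if (!b->ncmds) {
        return;
    }
    b->flushes++;
    b->issued += b->ncmds;
    b->ncmds = 0;
}

/*
 * Queue one command. TLBI and device cache commands are never mixed in
 * one batch, so a change of class (or a full batch) forces a flush first.
 */
void batch_add(Batch *b, uint64_t cmd, bool dev_cache)
{
    if (b->ncmds && (dev_cache != b->dev_cache || b->ncmds == BATCH_MAX)) {
        batch_issue(b);
    }
    b->dev_cache = dev_cache;
    b->cmds[b->ncmds++] = cmd;
}
```

With this policy, a stream of three TLBI commands, one device cache
command and one more TLBI command results in three separate submissions
once the final batch_issue() is called.
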
 hw/arm/smmuv3-accel.c    | 69 ++++++++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-internal.h | 29 +++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 76134d106a..09be838d22 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -160,6 +160,75 @@ void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
                                           nested_data.ste[0]);
 }
 
+/* Update batch->ncmds to the number of executed cmds */
+int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
+{
+    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(bs);
+    uint32_t total = batch->ncmds;
+    IOMMUFDViommu *viommu_core;
+    int ret;
+
+    if (!bs->accel) {
+        return 0;
+    }
+
+    if (!s_accel->viommu) {
+        return 0;
+    }
+    viommu_core = &s_accel->viommu->core;
+    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
+                                           viommu_core->viommu_id,
+                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
+                                           sizeof(Cmd), &batch->ncmds,
+                                           batch->cmds);
+    if (total != batch->ncmds) {
+        error_report("%s failed: ret=%d, total=%d, done=%d",
+                      __func__, ret, total, batch->ncmds);
+        return ret;
+    }
+
+    batch->ncmds = 0;
+    batch->dev_cache = false;
+    return ret;
+}
+
+int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
+                            SMMUCommandBatch *batch, Cmd *cmd,
+                            uint32_t *cons, bool dev_cache)
+{
+    int ret;
+
+    if (!bs->accel) {
+        return 0;
+    }
+
+    if (sdev) {
+        SMMUv3AccelDevice *accel_dev;
+        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
+        if (!accel_dev->s1_hwpt) {
+            return 0;
+        }
+    }
+
+    /*
+     * Currently separate out dev_cache and hwpt for safety, which might
+     * not be necessary if underlying HW SMMU does not have the errata.
+     *
+     * TODO check IIDR register values read from hw_info.
+     */
+    if (batch->ncmds && (dev_cache != batch->dev_cache)) {
+        ret = smmuv3_accel_issue_cmd_batch(bs, batch);
+        if (ret) {
+            *cons = batch->cons[batch->ncmds];
+            return ret;
+        }
+    }
+    batch->dev_cache = dev_cache;
+    batch->cmds[batch->ncmds] = *cmd;
+    batch->cons[batch->ncmds++] = *cons;
+    return 0;
+}
+
 static bool
 smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
                                HostIOMMUDeviceIOMMUFD *idev, Error **errp)
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 46c8bcae14..4602ae6728 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -549,13 +549,42 @@ typedef struct CD {
     uint32_t word[16];
 } CD;
 
+/**
+ * SMMUCommandBatch - batch of invalidation commands for smmuv3-accel
+ * @cmds: Pointer to list of commands
+ * @cons: Pointer to list of CONS corresponding to the commands
+ * @ncmds: Total ncmds in the batch
+ * @dev_cache: Issue to a device cache
+ */
+typedef struct SMMUCommandBatch {
+    Cmd *cmds;
+    uint32_t *cons;
+    uint32_t ncmds;
+    bool dev_cache;
+} SMMUCommandBatch;
+
 int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
                   SMMUEventInfo *event);
 void smmuv3_flush_config(SMMUDevice *sdev);
 
 #if defined(CONFIG_ARM_SMMUV3_ACCEL) && defined(CONFIG_IOMMUFD)
+int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch);
+int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
+                            SMMUCommandBatch *batch, Cmd *cmd,
+                            uint32_t *cons, bool dev_cache);
 void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid);
 #else
+static inline int smmuv3_accel_issue_cmd_batch(SMMUState *bs,
+                                               SMMUCommandBatch *batch)
+{
+    return 0;
+}
+static inline int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
+                                          SMMUCommandBatch *batch, Cmd *cmd,
+                                          uint32_t *cons, bool dev_cache)
+{
+    return 0;
+}
 static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
 {
 }
-- 
2.34.1




* [RFC PATCH v2 14/20] hw/arm/smmuv3: Install nested ste for CFGI_STE
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (12 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-26 13:39   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Use smmuv3_accel_install_nested_ste(), provided by smmuv3-accel, for CFGI_STE.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index ea63731d61..83159db1d4 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1286,6 +1286,7 @@ smmuv3_invalidate_ste(gpointer key, gpointer value, gpointer user_data)
     if (sid < sid_range->start || sid > sid_range->end) {
         return false;
     }
+    smmuv3_accel_install_nested_ste(sdev, sid);
     trace_smmuv3_config_cache_inv(sid);
     return true;
 }
@@ -1353,6 +1354,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 
             trace_smmuv3_cmdq_cfgi_ste(sid);
             smmuv3_flush_config(sdev);
+            smmuv3_accel_install_nested_ste(sdev, sid);
 
             break;
         }
-- 
2.34.1




* [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (13 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 14/20] hw/arm/smmuv3: Install nested ste for CFGI_STE Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-26 14:16   ` Eric Auger
  2025-03-26 14:18   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info Shameer Kolothum via
                   ` (6 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Use the smmuv3-accel helper functions to issue invalidation
commands to the physical SMMUv3.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
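The batch arrays in this patch are sized from the number of commands
pending in the command queue. As a rough illustration of that computation,
here is a standalone sketch that assumes PROD and CONS each hold an index
in their low log2size bits plus a single wrap bit at bit log2size.
queue_pending() is a hypothetical helper, not the function added by the
diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Number of entries between CONS and PROD in a circular queue of
 * 2^log2size entries, where bit log2size of each pointer is a wrap flag.
 */
uint32_t queue_pending(uint32_t prod, uint32_t cons, uint8_t log2size)
{
    assert(log2size < 31);

    uint32_t size = 1u << log2size;       /* also the wrap-bit mask */
    uint32_t idx_mask = size - 1;
    uint32_t p = prod & idx_mask;
    uint32_t c = cons & idx_mask;
    bool same_wrap = ((prod ^ cons) & size) == 0;

    /* Same wrap: PROD is ahead of CONS. Different wrap: PROD wrapped. */
    return same_wrap ? p - c : size - c + p;
}
```

For an 8-entry queue, equal pointers give 0 pending entries, while equal
indexes with differing wrap bits give 8, i.e. a full queue.
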
 hw/arm/smmuv3-internal.h | 11 ++++++++
 hw/arm/smmuv3.c          | 58 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 4602ae6728..546f8faac0 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -235,6 +235,17 @@ static inline bool smmuv3_gerror_irq_enabled(SMMUv3State *s)
 #define Q_CONS_WRAP(q) (((q)->cons & WRAP_MASK(q)) >> (q)->log2size)
 #define Q_PROD_WRAP(q) (((q)->prod & WRAP_MASK(q)) >> (q)->log2size)
 
+static inline int smmuv3_q_ncmds(SMMUQueue *q)
+{
+        uint32_t prod = Q_PROD(q);
+        uint32_t cons = Q_CONS(q);
+
+        if (Q_PROD_WRAP(q) == Q_CONS_WRAP(q))
+                return prod - cons;
+        else
+                return WRAP_MASK(q) - cons + prod;
+}
+
 static inline bool smmuv3_q_full(SMMUQueue *q)
 {
     return ((q->cons ^ q->prod) & WRAP_INDEX_MASK(q)) == WRAP_MASK(q);
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 83159db1d4..e0f225d0df 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1297,10 +1297,18 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
     SMMUCmdError cmd_error = SMMU_CERROR_NONE;
     SMMUQueue *q = &s->cmdq;
     SMMUCommandType type = 0;
+    SMMUCommandBatch batch = {};
+    uint32_t ncmds = 0;
+
 
     if (!smmuv3_cmdq_enabled(s)) {
         return 0;
     }
+
+    ncmds = smmuv3_q_ncmds(q);
+    batch.cmds = g_new0(Cmd, ncmds);
+    batch.cons = g_new0(uint32_t, ncmds);
+
     /*
      * some commands depend on register values, typically CR0. In case those
      * register values change while handling the command, spec says it
@@ -1395,6 +1403,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 
             trace_smmuv3_cmdq_cfgi_cd(sid);
             smmuv3_flush_config(sdev);
+
+            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
+                                        &q->cons, true)) {
+                cmd_error = SMMU_CERROR_ILL;
+                break;
+            }
+
             break;
         }
         case SMMU_CMD_TLBI_NH_ASID:
@@ -1418,6 +1433,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
             trace_smmuv3_cmdq_tlbi_nh_asid(asid);
             smmu_inv_notifiers_all(&s->smmu_state);
             smmu_iotlb_inv_asid_vmid(bs, asid, vmid);
+
+            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
+                                        false)) {
+                cmd_error = SMMU_CERROR_ILL;
+                break;
+            }
+
             break;
         }
         case SMMU_CMD_TLBI_NH_ALL:
@@ -1445,6 +1467,12 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
             trace_smmuv3_cmdq_tlbi_nsnh();
             smmu_inv_notifiers_all(&s->smmu_state);
             smmu_iotlb_inv_all(bs);
+
+            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
+                                        false)) {
+                cmd_error = SMMU_CERROR_ILL;
+                break;
+            }
             break;
         case SMMU_CMD_TLBI_NH_VAA:
         case SMMU_CMD_TLBI_NH_VA:
@@ -1453,7 +1481,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
                 break;
             }
             smmuv3_range_inval(bs, &cmd, SMMU_STAGE_1);
+
+            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
+                                        false)) {
+                cmd_error = SMMU_CERROR_ILL;
+                break;
+            }
+            break;
+        case SMMU_CMD_ATC_INV:
+        {
+            SMMUDevice *sdev = smmu_find_sdev(bs, CMD_SID(&cmd));
+
+            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
+                                        &q->cons, true)) {
+                cmd_error = SMMU_CERROR_ILL;
+                break;
+            }
             break;
+        }
         case SMMU_CMD_TLBI_S12_VMALL:
         {
             int vmid = CMD_VMID(&cmd);
@@ -1485,7 +1530,6 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
         case SMMU_CMD_TLBI_EL2_ASID:
         case SMMU_CMD_TLBI_EL2_VA:
         case SMMU_CMD_TLBI_EL2_VAA:
-        case SMMU_CMD_ATC_INV:
         case SMMU_CMD_PRI_RESP:
         case SMMU_CMD_RESUME:
         case SMMU_CMD_STALL_TERM:
@@ -1511,12 +1555,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
         queue_cons_incr(q);
     }
 
+    qemu_mutex_lock(&s->mutex);
+    if (!cmd_error && batch.ncmds) {
+        if (smmuv3_accel_issue_cmd_batch(bs, &batch)) {
+            q->cons = batch.cons[batch.ncmds];
+            cmd_error = SMMU_CERROR_ILL;
+        }
+    }
+    qemu_mutex_unlock(&s->mutex);
+
     if (cmd_error) {
         trace_smmuv3_cmdq_consume_error(smmu_cmd_string(type), cmd_error);
         smmu_write_cmdq_err(s, cmd_error);
         smmuv3_trigger_irq(s, SMMU_IRQ_GERROR, R_GERROR_CMDQ_ERR_MASK);
     }
 
+    g_free(batch.cmds);
+    g_free(batch.cons);
+
     trace_smmuv3_cmdq_consume_out(Q_PROD(q), Q_CONS(q),
                                   Q_PROD_WRAP(q), Q_CONS_WRAP(q));
 
-- 
2.34.1




* [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (14 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-19  2:45   ` Donald Dutile
  2025-03-26 14:57   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD Shameer Kolothum via
                   ` (5 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

Read the underlying SMMUv3 device info and set the corresponding IDR
bits. For now, this requires at least one cold-plugged vfio-pci device
associated with the smmuv3-accel instance, so fail if none is
available.

ToDo: This requirement will be relaxed in the future once support is
added in the kernel.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
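The repeated extract/deposit pairs in the diff below all follow one
pattern: copy a single bit field from the host register value into the
emulated register and leave every other bit alone. A minimal sketch of
that pattern, with hypothetical helper names (field_extract(),
field_deposit(), field_copy()) and explicit shift/length arguments instead
of QEMU's FIELD_EX32/FIELD_DP32 macros:

```c
#include <assert.h>
#include <stdint.h>

/* Mask of 'len' low bits; len must be in 1..32. */
static uint32_t low_mask(unsigned len)
{
    return len == 32 ? UINT32_MAX : ((1u << len) - 1);
}

/* Read the 'len'-bit field that starts at bit 'shift'. */
uint32_t field_extract(uint32_t reg, unsigned shift, unsigned len)
{
    assert(len >= 1 && len <= 32 && shift <= 32 - len);
    return (reg >> shift) & low_mask(len);
}

/* Return 'reg' with the field replaced by 'val'; other bits are kept. */
uint32_t field_deposit(uint32_t reg, unsigned shift, unsigned len,
                       uint32_t val)
{
    assert(len >= 1 && len <= 32 && shift <= 32 - len);
    uint32_t mask = low_mask(len) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}

/* Copy one field from 'src' (host value) into 'dst' (emulated value). */
uint32_t field_copy(uint32_t dst, uint32_t src, unsigned shift, unsigned len)
{
    return field_deposit(dst, shift, len, field_extract(src, shift, len));
}
```

A table of (register index, shift, length) entries driving field_copy() in
a loop would be one way to shorten the long run of assignments, at the
cost of no longer using the named register fields.
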
 hw/arm/smmuv3-accel.c         | 104 ++++++++++++++++++++++++++++++++++
 hw/arm/trace-events           |   1 +
 include/hw/arm/smmuv3-accel.h |   2 +
 3 files changed, 107 insertions(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index 09be838d22..fb08e1d66b 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -15,6 +15,96 @@
 
 #include "smmuv3-internal.h"
 
+static int
+smmuv3_accel_dev_get_info(SMMUv3AccelDevice *accel_dev, uint32_t *data_type,
+                          uint32_t data_len, void *data)
+{
+    uint64_t caps;
+
+    if (!accel_dev || !accel_dev->idev) {
+        return -ENOENT;
+    }
+
+    return !iommufd_backend_get_device_info(accel_dev->idev->iommufd,
+                                            accel_dev->idev->devid,
+                                            data_type, data,
+                                            data_len, &caps, NULL);
+}
+
+static void smmuv3_accel_init_regs(SMMUv3AccelState *s_accel)
+{
+    SMMUv3State *s = ARM_SMMUV3(s_accel);
+    SMMUv3AccelDevice *accel_dev;
+    uint32_t data_type;
+    uint32_t val;
+    int ret;
+
+    if (!s_accel->viommu || QLIST_EMPTY(&s_accel->viommu->device_list)) {
+        error_report("At least one cold-plugged vfio-pci is required for smmuv3-accel!");
+        exit(1);
+    }
+
+    accel_dev = QLIST_FIRST(&s_accel->viommu->device_list);
+    if (accel_dev->info.idr[0]) {
+        info_report("reusing the previous hw_info");
+        goto out;
+    }
+
+    ret = smmuv3_accel_dev_get_info(accel_dev, &data_type,
+                                    sizeof(accel_dev->info), &accel_dev->info);
+    if (ret) {
+        error_report("failed to get SMMU device info");
+        return;
+    }
+
+    if (data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
+        error_report("Wrong data type (%d)!", data_type);
+        return;
+    }
+
+out:
+    trace_smmuv3_accel_get_device_info(accel_dev->info.idr[0],
+                                       accel_dev->info.idr[1],
+                                       accel_dev->info.idr[3],
+                                       accel_dev->info.idr[5]);
+
+    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, BTM);
+    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, BTM, val);
+    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, ATS);
+    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ATS, val);
+    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, ASID16);
+    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ASID16, val);
+    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, TERM_MODEL);
+    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, TERM_MODEL, val);
+    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, STALL_MODEL);
+    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STALL_MODEL, val);
+    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, STLEVEL);
+    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STLEVEL, val);
+
+    val = FIELD_EX32(accel_dev->info.idr[1], IDR1, SIDSIZE);
+    s->idr[1] = FIELD_DP32(s->idr[1], IDR1, SIDSIZE, val);
+    val = FIELD_EX32(accel_dev->info.idr[1], IDR1, SSIDSIZE);
+    s->idr[1] = FIELD_DP32(s->idr[1], IDR1, SSIDSIZE, val);
+
+    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, HAD);
+    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, HAD, val);
+    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, RIL);
+    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, val);
+    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, BBML);
+    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, val);
+
+    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN4K);
+    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
+    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN16K);
+    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
+    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN64K);
+    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
+    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, OAS);
+    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, val);
+
+    /* FIXME check iidr and aidr registers too */
+}
+
 static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
                                                 PCIBus *bus, int devfn)
 {
@@ -484,11 +574,25 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
     bs->unset_iommu_device = smmuv3_accel_unset_iommu_device;
 }
 
+static void smmuv3_accel_reset_hold(Object *obj, ResetType type)
+{
+    SMMUv3AccelState *s = ARM_SMMUV3_ACCEL(obj);
+    SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_GET_CLASS(s);
+
+    if (c->parent_phases.hold) {
+        c->parent_phases.hold(obj, type);
+    }
+    smmuv3_accel_init_regs(s);
+}
+
 static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
     SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_CLASS(klass);
 
+    resettable_class_set_parent_phases(rc, NULL, smmuv3_accel_reset_hold, NULL,
+                                       &c->parent_phases);
     device_class_set_parent_realize(dc, smmu_accel_realize,
                                     &c->parent_realize);
     dc->hotpluggable = false;
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index cd2eac31c2..c7a7e58291 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -62,6 +62,7 @@ smmu_reset_exit(void) ""
 smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
 smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
 smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
+smmuv3_accel_get_device_info(uint32_t idr0, uint32_t idr1, uint32_t idr3, uint32_t idr5) "idr0=0x%x idr1=0x%x idr3=0x%x idr5=0x%x"
 
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
index 58e68534c0..9e30d7d351 100644
--- a/include/hw/arm/smmuv3-accel.h
+++ b/include/hw/arm/smmuv3-accel.h
@@ -52,6 +52,7 @@ typedef struct SMMUv3AccelDevice {
     SMMUViommu *viommu;
     SMMUVdev   *vdev;
     AddressSpace as_sysmem;
+    struct iommu_hw_info_arm_smmuv3 info;
     QLIST_ENTRY(SMMUv3AccelDevice) next;
 } SMMUv3AccelDevice;
 
@@ -68,6 +69,7 @@ struct SMMUv3AccelClass {
     /*< public >*/
 
     DeviceRealize parent_realize;
+    ResettablePhases parent_phases;
 };
 
 #endif /* HW_ARM_SMMUV3_ACCEL_H */
-- 
2.34.1




* [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (15 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-26 17:18   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3 Shameer Kolothum via
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

With nested translation, the underlying HW may support the STE S1CDMAX
and S1STALLD fields. Allow them according to the idr registers as updated
after the hw_info ioctl.

When substreams are enabled (S1CDMax != 0), the S1DSS field determines
the behavior of a transaction that arrives without a SubstreamID.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
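As a reading aid for the S1DSS handling, here is a standalone sketch of
the decode as the commit message describes it. Note one deliberate
difference from the diff below: the sketch consults S1DSS only when
S1CDMax != 0, following the commit message. The S1Action enum and
s1dss_action() are hypothetical names, and reporting the reserved encoding
0b11 as invalid is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef enum S1Action {
    S1_ACTION_TRANSLATE,  /* use the CD of substream 0 / the only CD */
    S1_ACTION_ABORT,      /* S1DSS = 0b00, Terminate */
    S1_ACTION_BYPASS,     /* S1DSS = 0b01, Bypass stage 1 */
    S1_ACTION_INVALID,    /* S1DSS = 0b11, reserved encoding */
} S1Action;

/*
 * Decide how a transaction without a SubstreamID is handled.
 * 'ste_word2' is STE word[2], whose bits [1:0] hold S1DSS;
 * 's1cdmax' is the 5-bit S1CDMax field.
 */
S1Action s1dss_action(uint32_t ste_word2, uint32_t s1cdmax)
{
    uint32_t s1dss = ste_word2 & 0x3;

    assert(s1cdmax < 32);
    if (s1cdmax == 0) {
        /* Substreams disabled: S1DSS is not consulted. */
        return S1_ACTION_TRANSLATE;
    }

    switch (s1dss) {
    case 0x0:
        return S1_ACTION_ABORT;
    case 0x1:
        return S1_ACTION_BYPASS;
    case 0x2:
        return S1_ACTION_TRANSLATE;
    default:
        return S1_ACTION_INVALID;
    }
}
```

Mapping the abort and bypass results onto cfg->aborted and cfg->bypassed
gives the same outcome as the two new checks in decode_ste() whenever
substreams are enabled.
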
 hw/arm/smmuv3-internal.h |  1 +
 hw/arm/smmuv3.c          | 15 +++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 546f8faac0..530284a9c0 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -612,6 +612,7 @@ static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
 
 #define STE_S1FMT(x)       extract32((x)->word[0], 4 , 2)
 #define STE_S1CDMAX(x)     extract32((x)->word[1], 27, 5)
+#define STE_S1DSS(x)       extract32((x)->word[2], 0,  2)
 #define STE_S1STALLD(x)    extract32((x)->word[2], 27, 1)
 #define STE_EATS(x)        extract32((x)->word[2], 28, 2)
 #define STE_STRW(x)        extract32((x)->word[2], 30, 2)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index e0f225d0df..e8a6c50056 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -561,6 +561,16 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
 
     decode_ste_config(cfg, config);
 
+    /* S1DSS.Terminate is same as Config.abort for default stream */
+    if (STE_CFG_S1_ENABLED(config) && STE_S1DSS(ste) == 0) {
+        cfg->aborted = true;
+    }
+
+    /* S1DSS.Bypass is same as Config.bypass for default stream */
+    if (STE_CFG_S1_ENABLED(config) && STE_S1DSS(ste) == 0x1) {
+        cfg->bypassed = true;
+    }
+
     if (cfg->aborted || cfg->bypassed) {
         return 0;
     }
@@ -598,13 +608,14 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
         }
     }
 
-    if (STE_S1CDMAX(ste) != 0) {
+    if (!FIELD_EX32(s->idr[1], IDR1, SSIDSIZE) && STE_S1CDMAX(ste) != 0) {
         qemu_log_mask(LOG_UNIMP,
                       "SMMUv3 does not support multiple context descriptors yet\n");
         goto bad_ste;
     }
 
-    if (STE_S1STALLD(ste)) {
+    /* STALL_MODEL being 0b01 means "stall is not supported" */
+    if ((FIELD_EX32(s->idr[0], IDR0, STALL_MODEL) & 0x1) && STE_S1STALLD(ste)) {
         qemu_log_mask(LOG_UNIMP,
                       "SMMUv3 S1 stalling fault model not allowed yet\n");
         goto bad_ste;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 145+ messages in thread
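
The checks this patch relaxes can be sketched in isolation as below. This is
only an illustration, assuming the S1DSS encodings from the Arm SMMUv3
specification (0b00 Terminate, 0b01 Bypass, 0b10 Substream0). The names
extract_bits32, classify_s1dss, s1cdmax_allowed and s1stalld_allowed are
hypothetical helpers, not QEMU API, and the field values are passed in
already extracted so that no STE bit positions are assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's extract32(): 'length' bits from 'start'.
 * Valid for 1 <= length <= 32 and start + length <= 32. */
static inline uint32_t extract_bits32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

/* What S1DSS asks for on a transaction that carries no substream ID. */
enum s1dss_action {
    S1DSS_TERMINATE,  /* 0b00: abort the transaction */
    S1DSS_BYPASS,     /* 0b01: bypass stage 1 */
    S1DSS_SSID0,      /* 0b10: translate using substream 0 */
    S1DSS_RESERVED    /* 0b11 */
};

static enum s1dss_action classify_s1dss(uint32_t s1dss)
{
    switch (s1dss & 0x3) {
    case 0:
        return S1DSS_TERMINATE;
    case 1:
        return S1DSS_BYPASS;
    case 2:
        return S1DSS_SSID0;
    default:
        return S1DSS_RESERVED;
    }
}

/* A non-zero S1CDMax is accepted only when IDR1.SSIDSIZE is non-zero. */
static bool s1cdmax_allowed(uint32_t ssidsize, uint32_t s1cdmax)
{
    return ssidsize != 0 || s1cdmax == 0;
}

/* S1STALLD is rejected when bit 0 of IDR0.STALL_MODEL is set, which is
 * the condition the patch tests. */
static bool s1stalld_allowed(uint32_t stall_model, uint32_t s1stalld)
{
    return !((stall_model & 0x1) && s1stalld);
}
```

With these helpers, an STE whose S1CDMax is non-zero is only rejected when
the (host-derived) SSIDSIZE is zero, which mirrors the intent stated in the
commit message.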

* [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (16 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-19  2:52   ` Donald Dutile
  2025-03-26 17:40   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes Shameer Kolothum via
                   ` (3 subsequent siblings)
  21 siblings, 2 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

From: Nicolin Chen <nicolinc@nvidia.com>

If a vSMMU is configured as an accelerated one, the HW IOTLB will be used
and all cache invalidations should go to the HW IOTLB too, rather than to
the emulated iotlb. In this case, an iommu notifier isn't registered,
as the devices behind an SMMUv3-accel would stay in the system address
space for stage-2 mappings.

However, the KVM code still requests an iommu address space to translate
an MSI doorbell gIOVA via get_msi_address_space() and translate().

Since an SMMUv3-accel doesn't register an iommu notifier to flush the
emulated IOTLB, bypass the emulated IOTLB and always walk the guest-level
IO page table.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmu-common.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 9fd455baa0..fd10df8866 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -77,6 +77,17 @@ static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs,
     uint8_t level = 4 - (inputsize - 4) / stride;
     SMMUTLBEntry *entry = NULL;
 
+    /*
+     * Stage-1 translation with an accel SMMU generally uses the HW IOTLB.
+     * However, KVM still requests an iommu address space for an MSI fixup by
+     * looking up the stage-1 page table. Make sure we don't go through the
+     * emulated pathway so that the emulated iotlb never needs invalidation.
+     */
+
+    if (bs->accel) {
+        return NULL;
+    }
+
     while (level <= 3) {
         uint64_t subpage_size = 1ULL << level_shift(level, tt->granule_sz);
         uint64_t mask = subpage_size - 1;
@@ -142,6 +153,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
     SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
     uint8_t tg = (new->granule - 10) / 2;
 
+    /*
+     * Stage-1 translation with an accel SMMU generally uses the HW IOTLB.
+     * However, KVM still requests an iommu address space for an MSI fixup by
+     * looking up the stage-1 page table. Make sure we don't go through the
+     * emulated pathway so that the emulated iotlb never needs invalidation.
+     */
+    if (bs->accel) {
+        return;
+    }
+
     if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
         smmu_iotlb_inv_all(bs);
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 145+ messages in thread
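
The effect of the two early returns can be shown with a toy software TLB.
This is a rough sketch under simplifying assumptions: struct toy_smmu,
toy_lookup and toy_insert are hypothetical and much simpler than QEMU's
SMMUState and its hash-table IOTLB. The only point is that, with the accel
flag set, nothing is ever cached and every lookup misses, so the caller
always falls back to a page table walk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TLB_SIZE 4

struct toy_entry {
    uint64_t iova;
    uint64_t pa;
    bool valid;
};

struct toy_smmu {
    bool accel;                          /* true: HW IOTLB is in charge */
    struct toy_entry tlb[TOY_TLB_SIZE];  /* direct-mapped toy cache */
};

/* Pick a slot from the 4K page number of the IOVA. */
static size_t toy_slot(uint64_t iova)
{
    return (size_t)((iova >> 12) % TOY_TLB_SIZE);
}

/* Returns the cached entry, or NULL on a miss. Always misses for accel. */
static struct toy_entry *toy_lookup(struct toy_smmu *s, uint64_t iova)
{
    struct toy_entry *e;

    if (s->accel) {
        return NULL;
    }
    e = &s->tlb[toy_slot(iova)];
    return (e->valid && e->iova == iova) ? e : NULL;
}

/* Caches a translation. A no-op for accel, so nothing needs invalidation. */
static void toy_insert(struct toy_smmu *s, uint64_t iova, uint64_t pa)
{
    struct toy_entry *e;

    if (s->accel) {
        return;
    }
    e = &s->tlb[toy_slot(iova)];
    e->iova = iova;
    e->pa = pa;
    e->valid = true;
}
```

Skipping both the lookup and the insert is what keeps the emulated cache
permanently empty. Skipping only one of them would either leave stale
entries behind or make the inserts pointless.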

* [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (17 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3 Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-26 18:14   ` Eric Auger
  2025-03-11 14:10 ` [RFC PATCH v2 20/20] hw/arm/smmuv3-accel: Enable smmuv3-accel creation Shameer Kolothum via
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Now that we can have multiple user-creatable smmuv3-accel devices,
each associated with a different PCI bus, update the IORT ID mappings
accordingly.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/virt-acpi-build.c | 113 +++++++++++++++++++++++++++++++++------
 1 file changed, 97 insertions(+), 16 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 3ac8f8e178..c232850e36 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -43,6 +43,7 @@
 #include "hw/acpi/generic_event_device.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/hmat.h"
+#include "hw/arm/smmuv3-accel.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
@@ -233,6 +234,51 @@ struct AcpiIortIdMapping {
 };
 typedef struct AcpiIortIdMapping AcpiIortIdMapping;
 
+struct SMMUv3Accel {
+    int irq;
+    hwaddr base;
+    AcpiIortIdMapping smmu_idmap;
+};
+typedef struct SMMUv3Accel SMMUv3Accel;
+
+static int smmuv3_accel_idmap_compare(gconstpointer a, gconstpointer b)
+{
+    SMMUv3Accel *accel_a = (SMMUv3Accel *)a;
+    SMMUv3Accel *accel_b = (SMMUv3Accel *)b;
+
+    return accel_a->smmu_idmap.input_base - accel_b->smmu_idmap.input_base;
+}
+
+static int get_smmuv3_accel(Object *obj, void *opaque)
+{
+    GArray *s_accel_blob = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_ARM_SMMUV3_ACCEL)) {
+        PCIBus *bus = (PCIBus *) object_property_get_link(obj, "primary-bus",
+                                                          &error_abort);
+        if (bus && !pci_bus_bypass_iommu(bus)) {
+            SMMUv3Accel accel;
+            int min_bus, max_bus;
+            VirtMachineState *v = VIRT_MACHINE(qdev_get_machine());
+            PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(v->platform_bus_dev);
+            SysBusDevice *sbdev = SYS_BUS_DEVICE(obj);
+            hwaddr base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
+            int irq = platform_bus_get_irqn(pbus, sbdev, 0);
+
+            base += v->memmap[VIRT_PLATFORM_BUS].base;
+            irq += v->irqmap[VIRT_PLATFORM_BUS];
+
+            pci_bus_range(bus, &min_bus, &max_bus);
+            accel.smmu_idmap.input_base = min_bus << 8;
+            accel.smmu_idmap.id_count = (max_bus - min_bus + 1) << 8;
+            accel.base = base;
+            accel.irq = irq + ARM_SPI_BASE;
+            g_array_append_val(s_accel_blob, accel);
+        }
+    }
+    return 0;
+}
+
 /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
 static int
 iort_host_bridges(Object *obj, void *opaque)
@@ -275,30 +321,51 @@ static void
 build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 {
     int i, nb_nodes, rc_mapping_count;
-    size_t node_size, smmu_offset = 0;
+    size_t node_size, *smmu_offset = NULL;
     AcpiIortIdMapping *idmap;
+    SMMUv3Accel *accel;
+    int num_smmus = 0;
     uint32_t id = 0;
     GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
     GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
+    GArray *smmuv3_accel = g_array_new(false, true, sizeof(SMMUv3Accel));
 
     AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id,
                         .oem_table_id = vms->oem_table_id };
     /* Table 2 The IORT */
     acpi_table_begin(&table, table_data);
 
-    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
-        AcpiIortIdMapping next_range = {0};
-
+    nb_nodes = 2; /* RC, ITS */
+    if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
+        object_child_foreach_recursive(object_get_root(),
+                                       get_smmuv3_accel, smmuv3_accel);
+        /* Sort the smmuv3-accel by smmu idmap input_base */
+        g_array_sort(smmuv3_accel, smmuv3_accel_idmap_compare);
+
+        /* Fill smmu idmap from sorted accel array */
+        for (i = 0; i < smmuv3_accel->len; i++) {
+            accel = &g_array_index(smmuv3_accel, SMMUv3Accel, i);
+            g_array_append_val(smmu_idmaps, accel->smmu_idmap);
+        }
+        num_smmus = smmuv3_accel->len;
+    } else if (vms->iommu == VIRT_IOMMU_SMMUV3) {
         object_child_foreach_recursive(object_get_root(),
                                        iort_host_bridges, smmu_idmaps);
 
         /* Sort the smmu idmap by input_base */
         g_array_sort(smmu_idmaps, iort_idmap_compare);
+        num_smmus = 1;
+    }
 
-        /*
-         * Split the whole RIDs by mapping from RC to SMMU,
-         * build the ID mapping from RC to ITS directly.
-         */
+    /*
+     * Split the whole RIDs by mapping from RC to SMMU,
+     * build the ID mapping from RC to ITS directly.
+     */
+    if (num_smmus) {
+        AcpiIortIdMapping next_range = {0};
+
+        smmu_offset = g_new0(size_t, num_smmus);
+        nb_nodes += num_smmus;
         for (i = 0; i < smmu_idmaps->len; i++) {
             idmap = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
 
@@ -316,10 +383,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
             g_array_append_val(its_idmaps, next_range);
         }
 
-        nb_nodes = 3; /* RC, ITS, SMMUv3 */
         rc_mapping_count = smmu_idmaps->len + its_idmaps->len;
     } else {
-        nb_nodes = 2; /* RC, ITS */
         rc_mapping_count = 1;
     }
     /* Number of IORT Nodes */
@@ -341,10 +406,19 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     /* GIC ITS Identifier Array */
     build_append_int_noprefix(table_data, 0 /* MADT translation_id */, 4);
 
-    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
-        int irq =  vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
+    for (i = 0; i < num_smmus; i++) {
+        hwaddr base;
+        int irq;
+        if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
+            accel = &g_array_index(smmuv3_accel, SMMUv3Accel, i);
+            base = accel->base;
+            irq = accel->irq;
+        } else {
+            base = vms->memmap[VIRT_SMMU].base;
+            irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
+        }
 
-        smmu_offset = table_data->len - table.table_offset;
+        smmu_offset[i] = table_data->len - table.table_offset;
         /* Table 9 SMMUv3 Format */
         build_append_int_noprefix(table_data, 4 /* SMMUv3 */, 1); /* Type */
         node_size =  SMMU_V3_ENTRY_SIZE + ID_MAPPING_ENTRY_SIZE;
@@ -355,7 +429,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         /* Reference to ID Array */
         build_append_int_noprefix(table_data, SMMU_V3_ENTRY_SIZE, 4);
         /* Base address */
-        build_append_int_noprefix(table_data, vms->memmap[VIRT_SMMU].base, 8);
+        build_append_int_noprefix(table_data, base, 8);
         /* Flags */
         build_append_int_noprefix(table_data, 1 /* COHACC Override */, 4);
         build_append_int_noprefix(table_data, 0, 4); /* Reserved */
@@ -404,15 +478,22 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     build_append_int_noprefix(table_data, 0, 3); /* Reserved */
 
     /* Output Reference */
-    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
+    if (num_smmus) {
         AcpiIortIdMapping *range;
+        size_t offset;
 
         /* translated RIDs connect to SMMUv3 node: RC -> SMMUv3 -> ITS */
         for (i = 0; i < smmu_idmaps->len; i++) {
+            if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
+                offset = smmu_offset[i];
+            } else {
+                offset = smmu_offset[0];
+            }
+
             range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
             /* output IORT node is the smmuv3 node */
             build_iort_id_mapping(table_data, range->input_base,
-                                  range->id_count, smmu_offset);
+                                  range->id_count, offset);
         }
 
         /* bypassed RIDs connect to ITS group node directly: RC -> ITS */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 145+ messages in thread
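
The ID mapping arithmetic used in get_smmuv3_accel() can be checked in
isolation with a small sketch. It assumes the usual PCI requester ID layout
(bus number in bits 15:8, devfn in bits 7:0). The names struct sketch_idmap,
sketch_bus_range_to_idmap and sketch_sort_idmaps are hypothetical and only
mirror the shape of AcpiIortIdMapping. The comparator below compares
explicitly instead of subtracting, which is a deliberate deviation from the
patch to avoid relying on unsigned-to-int conversion.

```c
#include <stdint.h>
#include <stdlib.h>

struct sketch_idmap {
    uint32_t input_base;  /* first requester ID covered */
    uint32_t id_count;    /* number of requester IDs covered */
};

/* Each bus contributes 256 requester IDs (one per devfn). */
static struct sketch_idmap sketch_bus_range_to_idmap(int min_bus, int max_bus)
{
    struct sketch_idmap m;

    m.input_base = (uint32_t)min_bus << 8;
    m.id_count = (uint32_t)(max_bus - min_bus + 1) << 8;
    return m;
}

/* Order mappings by ascending input_base. */
static int sketch_idmap_compare(const void *a, const void *b)
{
    const struct sketch_idmap *ma = a;
    const struct sketch_idmap *mb = b;

    return (ma->input_base > mb->input_base) -
           (ma->input_base < mb->input_base);
}

static void sketch_sort_idmaps(struct sketch_idmap *maps, size_t n)
{
    qsort(maps, n, sizeof(*maps), sketch_idmap_compare);
}
```

For example, a root bus covering bus numbers 1 to 2 yields input_base 0x100
and id_count 0x200. Sorting by input_base is what lets build_iort() walk the
SMMU mappings in order and fill the gaps between them with direct RC to ITS
mappings.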

* [RFC PATCH v2 20/20] hw/arm/smmuv3-accel: Enable smmuv3-accel creation
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (18 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes Shameer Kolothum via
@ 2025-03-11 14:10 ` Shameer Kolothum via
  2025-03-19 16:40 ` [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Philippe Mathieu-Daudé
  2025-03-25 14:42 ` Eric Auger
  21 siblings, 0 replies; 145+ messages in thread
From: Shameer Kolothum via @ 2025-03-11 14:10 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmuv3-accel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
index fb08e1d66b..812f8e358f 100644
--- a/hw/arm/smmuv3-accel.c
+++ b/hw/arm/smmuv3-accel.c
@@ -595,6 +595,7 @@ static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
                                        &c->parent_phases);
     device_class_set_parent_realize(dc, smmu_accel_realize,
                                     &c->parent_realize);
+    dc->user_creatable = true;
     dc->hotpluggable = false;
     dc->bus_type = TYPE_PCIE_BUS;
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-11 14:10 ` [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device Shameer Kolothum via
@ 2025-03-11 20:13   ` Nicolin Chen
  2025-03-12 15:15   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-11 20:13 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 11, 2025 at 02:10:28PM +0000, Shameer Kolothum wrote:
> +/*
> + * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
> + * Copyright (C) 2025 NVIDIA

+ * Copyright (C) 2025 NVIDIA CORPORATION & AFFILIATES

> + * Written by Nicolin Chen, Shameer Kolothum

(Thanks for adding my name!)

>  struct SMMUBaseClass {
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> new file mode 100644
> index 0000000000..56fe376bf4
> --- /dev/null
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -0,0 +1,31 @@
> +/*
> + * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
> + * Copyright (C) 2025 NVIDIA

Ditto

> + * Written by Nicolin Chen, Shameer Kolothum
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_ARM_SMMUV3_ACCEL_H
> +#define HW_ARM_SMMUV3_ACCEL_H
> +
> +#include "hw/arm/smmu-common.h"
> +#include "hw/arm/smmuv3.h"

> +#include "qom/object.h"

smmuv3.h seems to include smmu-common.h and object.h already.

Nicolin


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-11 14:10 ` [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel Shameer Kolothum via
@ 2025-03-11 20:22   ` Nicolin Chen
  2025-03-12  9:44     ` Shameerali Kolothum Thodi via
  2025-03-12 15:36   ` Eric Auger
  2025-03-18 22:49   ` Donald Dutile
  2 siblings, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-11 20:22 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 11, 2025 at 02:10:29PM +0000, Shameer Kolothum wrote:
> Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3
> is not specified.
> 
> No FDT support is added for now.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/virt.c         | 12 ++++++++++++
>  hw/core/sysbus-fdt.c  |  1 +
>  include/hw/arm/virt.h |  1 +
>  3 files changed, 14 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 4a5a9666e9..84a323da55 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -73,6 +73,7 @@
>  #include "qobject/qlist.h"
>  #include "standard-headers/linux/input.h"
>  #include "hw/arm/smmuv3.h"
> +#include "hw/arm/smmuv3-accel.h"

smmuv3-accel.h included smmuv3.h in the patch prior.

> @@ -2911,6 +2912,16 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>              platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
>                                       SYS_BUS_DEVICE(dev));
>          }
> +        if (object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_ACCEL)) {
> +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> +                error_setg(errp,
> +                           "iommu=smmuv3 is already specified. can't create smmuv3-accel dev");
> +                return;
> +            }
> +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
> +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
> +            }

Looks like it is to support TYPE_VIRTIO_IOMMU_PCI?

Just asking: should SMMUV3_ACCEL work with that?

Nicolin


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback
  2025-03-11 14:10 ` [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback Shameer Kolothum via
@ 2025-03-11 20:50   ` Nicolin Chen
  2025-03-12 17:14   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-11 20:50 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 11, 2025 at 02:10:33PM +0000, Shameer Kolothum wrote:
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index 56fe376bf4..86c0523063 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -16,6 +16,10 @@
>  #define TYPE_ARM_SMMUV3_ACCEL   "arm-smmuv3-accel"
>  OBJECT_DECLARE_TYPE(SMMUv3AccelState, SMMUv3AccelClass, ARM_SMMUV3_ACCEL)
>  
> +typedef struct SMMUv3AccelDevice {
> +    SMMUDevice  sdev;

nit: there are two spaces?

Nicolin


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-03-11 14:10 ` [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
@ 2025-03-11 21:07   ` Nicolin Chen
  2025-03-17  8:38     ` Shameerali Kolothum Thodi via
  2025-03-12 12:52   ` Eric Auger
  1 sibling, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-11 21:07 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 11, 2025 at 02:10:34PM +0000, Shameer Kolothum wrote:
> @@ -30,6 +32,185 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>      return accel_dev;
>  }
>  
> +static bool
> +smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
> +                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)

With vEVENTQ v9, vDEVICE (vSID) is required to attach a device
to a proxy NESTED hwpt (applicable to bypass/abort HWPTs too).
So, host_iommu_device_iommufd_attach_hwpt() would fail in this
function because vSID isn't ready at this stage. So all those
calls should be moved out of the function, and then this should
likely be "smmuv3_accel_dev_alloc_viommu"?

That being said, I don't know when QEMU actually prepares a BDF
number for a vfio-pci device. The only place where I see it is
ready is when the guest-level SMMU installs the Stream Table, i.e.
in smmuv3_accel_install_nested_ste().

> +{
> +    struct iommu_hwpt_arm_smmuv3 bypass_data = {
> +        .ste = { 0x9ULL, 0x0ULL },
> +    };
> +    struct iommu_hwpt_arm_smmuv3 abort_data = {
> +        .ste = { 0x1ULL, 0x0ULL },
> +    };
> +    SMMUDevice *sdev = &accel_dev->sdev;
> +    SMMUState *s = sdev->smmu;
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
> +    SMMUS2Hwpt *s2_hwpt;
> +    SMMUViommu *viommu;
> +    uint32_t s2_hwpt_id;
> +    uint32_t viommu_id;
> +
> +    if (s_accel->viommu) {
> +        accel_dev->viommu = s_accel->viommu;
> +        return host_iommu_device_iommufd_attach_hwpt(
> +                       idev, s_accel->viommu->s2_hwpt->hwpt_id, errp);

Yeah, this is my bad. We shouldn't attach a device to s2_hwpt,
since eventually s2_hwpt would be a shared hwpt across SMMUs.

> +    /* Attach to S2 for MSI cookie */
> +    if (!host_iommu_device_iommufd_attach_hwpt(idev, s2_hwpt_id, errp)) {
> +        goto free_s2_hwpt;
> +    }

With the merged sw_msi series, we don't need this anymore.

> +    /*
> +     * Attach the bypass STE which means S1 bypass and S2 translate.
> +     * This is to make sure that the vIOMMU object is now associated
> +     * with the device and has this STE installed in the host SMMUV3.
> +     */
> +    if (!host_iommu_device_iommufd_attach_hwpt(
> +                idev, viommu->bypass_hwpt_id, errp)) {
> +        error_report("failed to attach the bypass pagetable");
> +        goto free_bypass_hwpt;
> +    }

Ditto. We have to postpone this until vdevice is allocated.

> +static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
> +                                            int devfn)
> +{
> +    SMMUDevice *sdev;
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUViommu *viommu;
> +    SMMUState *s = opaque;
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
> +    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
> +
> +    if (!sbus) {
> +        return;
> +    }
> +
> +    sdev = sbus->pbdev[devfn];
> +    if (!sdev) {
> +        return;
> +    }
> +
> +    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +    if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
> +                                               accel_dev->idev->ioas_id,
> +                                               NULL)) {
> +        error_report("Unable to attach dev to the default HW pagetable");
> +    }
> +
> +

Could drop the extra line.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 145+ messages in thread
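
The two magic STE values quoted in this review (0x9ULL for bypass, 0x1ULL
for abort) can be derived from named fields. The sketch below assumes the
Arm SMMUv3 STE layout for the first 64-bit word (V in bit 0, Config in bits
3:1, with Config 0b000 meaning abort and 0b100 meaning bypass). The SKETCH_
macro names are hypothetical and are not the defines QEMU or the kernel
uAPI actually provide.

```c
#include <stdint.h>

/* STE word 0: valid bit and Config field (assumed layout, see lead-in). */
#define SKETCH_STE_V              (1ULL << 0)
#define SKETCH_STE_CONFIG_SHIFT   1
#define SKETCH_STE_CONFIG_ABORT   0x0ULL  /* 0b000 */
#define SKETCH_STE_CONFIG_BYPASS  0x4ULL  /* 0b100 */

#define SKETCH_STE0_ABORT \
    (SKETCH_STE_V | (SKETCH_STE_CONFIG_ABORT << SKETCH_STE_CONFIG_SHIFT))
#define SKETCH_STE0_BYPASS \
    (SKETCH_STE_V | (SKETCH_STE_CONFIG_BYPASS << SKETCH_STE_CONFIG_SHIFT))

/* Compile-time check that the named fields reproduce the literals. */
_Static_assert(SKETCH_STE0_ABORT == 0x1ULL, "abort STE word 0");
_Static_assert(SKETCH_STE0_BYPASS == 0x9ULL, "bypass STE word 0");

/* Build word 0 of a valid STE for an arbitrary 3-bit Config value. */
static inline uint64_t sketch_ste0(uint64_t config)
{
    return SKETCH_STE_V | ((config & 0x7) << SKETCH_STE_CONFIG_SHIFT);
}
```

Named fields like these would make the initializers of bypass_data and
abort_data self-describing without changing the values passed to the
kernel.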

* RE: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-11 20:22   ` Nicolin Chen
@ 2025-03-12  9:44     ` Shameerali Kolothum Thodi via
  0 siblings, 0 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-12  9:44 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, March 11, 2025 8:23 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
> accel
> 
> On Tue, Mar 11, 2025 at 02:10:29PM +0000, Shameer Kolothum wrote:
> > Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3
> > is not specified.
> >
> > No FDT support is added for now.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  hw/arm/virt.c         | 12 ++++++++++++
> >  hw/core/sysbus-fdt.c  |  1 +
> >  include/hw/arm/virt.h |  1 +
> >  3 files changed, 14 insertions(+)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 4a5a9666e9..84a323da55 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -73,6 +73,7 @@
> >  #include "qobject/qlist.h"
> >  #include "standard-headers/linux/input.h"
> >  #include "hw/arm/smmuv3.h"
> > +#include "hw/arm/smmuv3-accel.h"
> 
> smmuv3-accel.h included smmuv3.h in the patch prior.
> 
> > @@ -2911,6 +2912,16 @@ static void
> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >              platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
> >platform_bus_dev),
> >                                       SYS_BUS_DEVICE(dev));
> >          }
> > +        if (object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_ACCEL))
> {
> > +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> > +                error_setg(errp,
> > +                           "iommu=smmuv3 is already specified. can't create
> smmuv3-accel dev");
> > +                return;
> > +            }
> > +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
> > +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
> > +            }
> 
> Looks like it is to support TYPE_VIRTIO_IOMMU_PCI?
>
> Just asking: should SMMUV3_ACCEL work with that?

Hmm, that's true. It will conflict with virtio-iommu. I will add
a blocker if both are specified.

Thanks,
Shameer


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-03-11 14:10 ` [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
  2025-03-11 21:07   ` Nicolin Chen
@ 2025-03-12 12:52   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-12 12:52 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi Shameer,


On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Implement a set_iommu_device callback:
>  -Find an existing S2 hwpt to test attach() or allocate a new one
>    (Devices behind the same physical SMMU should share an S2 HWPT.)
>  -Attach the device to the S2 hwp
>  -Allocate a viommu with the returned s2 hwpt.
>  -Allocate bypass and abort hwpt and attach bypass hwpt.
>  -and add it to its device list
>
> Also add an unset_iommu_device doing the opposite cleanup routine.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/meson.build            |   2 +-
>  hw/arm/smmuv3-accel.c         | 183 ++++++++++++++++++++++++++++++++++
>  hw/arm/trace-events           |   4 +
>  include/hw/arm/smmuv3-accel.h |  23 +++++
>  include/system/iommufd.h      |   6 ++
>  5 files changed, 217 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/meson.build b/hw/arm/meson.build
> index e8593363b0..dd41a86619 100644
> --- a/hw/arm/meson.build
> +++ b/hw/arm/meson.build
> @@ -55,7 +55,7 @@ arm_ss.add(when: 'CONFIG_MUSCA', if_true: files('musca.c'))
>  arm_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
>  arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
>  arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
> -arm_ss.add(when: 'CONFIG_ARM_SMMUV3_ACCEL', if_true: files('smmuv3-accel.c'))
> +arm_ss.add(when: ['CONFIG_ARM_SMMUV3_ACCEL', 'CONFIG_IOMMUFD'], if_true: files('smmuv3-accel.c'))
I guess we could set this from patch 3 onwards?
>  arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
>  arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
>  arm_ss.add(when: 'CONFIG_XEN', if_true: files(
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 6610ebe4be..1c696649d5 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -7,6 +7,8 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "trace.h"
> +#include "qemu/error-report.h"
>  
>  #include "hw/arm/smmuv3-accel.h"
>  #include "hw/pci/pci_bridge.h"
> @@ -30,6 +32,185 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>      return accel_dev;
>  }
>  
> +static bool
> +smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
> +                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> +{
> +    struct iommu_hwpt_arm_smmuv3 bypass_data = {
> +        .ste = { 0x9ULL, 0x0ULL },
I would suggest using defines.
> +    };
> +    struct iommu_hwpt_arm_smmuv3 abort_data = {
> +        .ste = { 0x1ULL, 0x0ULL },
same
> +    };
> +    SMMUDevice *sdev = &accel_dev->sdev;
> +    SMMUState *s = sdev->smmu;
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
> +    SMMUS2Hwpt *s2_hwpt;
> +    SMMUViommu *viommu;
> +    uint32_t s2_hwpt_id;
> +    uint32_t viommu_id;
> +
> +    if (s_accel->viommu) {
> +        accel_dev->viommu = s_accel->viommu;
> +        return host_iommu_device_iommufd_attach_hwpt(
> +                       idev, s_accel->viommu->s2_hwpt->hwpt_id, errp);
> +    }
> +
> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid, idev->ioas_id,
> +                                    IOMMU_HWPT_ALLOC_NEST_PARENT,
> +                                    IOMMU_HWPT_DATA_NONE, 0, NULL,
> +                                    &s2_hwpt_id, errp)) {
> +        return false;
> +    }
> +
> +    /* Attach to S2 for MSI cookie */
> +    if (!host_iommu_device_iommufd_attach_hwpt(idev, s2_hwpt_id, errp)) {
> +        goto free_s2_hwpt;
> +    }
> +
> +    if (!iommufd_backend_alloc_viommu(idev->iommufd, idev->devid,
> +                                      IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
> +                                      s2_hwpt_id, &viommu_id, errp)) {
> +        goto detach_s2_hwpt;
> +    }
> +
> +    viommu = g_new0(SMMUViommu, 1);
> +    viommu->core.viommu_id = viommu_id;
> +    viommu->core.s2_hwpt_id = s2_hwpt_id;
> +    viommu->core.iommufd = idev->iommufd;
> +
> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                                    viommu->core.viommu_id, 0,
> +                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                    sizeof(abort_data), &abort_data,
> +                                    &viommu->abort_hwpt_id, errp)) {
> +        error_report("failed to allocate an abort pagetable");
Is that error_report needed, as we already have the error handle (errp)?
> +        goto free_viommu;
> +    }
> +
> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                                    viommu->core.viommu_id, 0,
> +                                    IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                    sizeof(bypass_data), &bypass_data,
> +                                    &viommu->bypass_hwpt_id, errp)) {
> +        error_report("failed to allocate a bypass pagetable");
same
> +        goto free_abort_hwpt;
> +    }
> +
> +    /*
> +     * Attach the bypass STE which means S1 bypass and S2 translate.
> +     * This is to make sure that the vIOMMU object is now associated
> +     * with the device and has this STE installed in the host SMMUV3.
> +     */
> +    if (!host_iommu_device_iommufd_attach_hwpt(
> +                idev, viommu->bypass_hwpt_id, errp)) {
> +        error_report("failed to attach the bypass pagetable");
same
If you prefer, you can add an error "hint".
> +        goto free_bypass_hwpt;
> +    }
> +
> +    s2_hwpt = g_new0(SMMUS2Hwpt, 1);
> +    s2_hwpt->iommufd = idev->iommufd;
> +    s2_hwpt->hwpt_id = s2_hwpt_id;
> +    s2_hwpt->ioas_id = idev->ioas_id;
> +
> +    viommu->iommufd = idev->iommufd;
> +    viommu->s2_hwpt = s2_hwpt;
> +
> +    s_accel->viommu = viommu;
> +    accel_dev->viommu = viommu;
> +    return true;
> +
> +free_bypass_hwpt:
> +    iommufd_backend_free_id(idev->iommufd, viommu->bypass_hwpt_id);
> +free_abort_hwpt:
> +    iommufd_backend_free_id(idev->iommufd, viommu->abort_hwpt_id);
> +free_viommu:
> +    iommufd_backend_free_id(idev->iommufd, viommu->core.viommu_id);
> +    g_free(viommu);
> +detach_s2_hwpt:
> +    host_iommu_device_iommufd_attach_hwpt(idev, accel_dev->idev->ioas_id, errp);
This should be detach.

Eric
> +free_s2_hwpt:
> +    iommufd_backend_free_id(idev->iommufd, s2_hwpt_id);
> +    return false;
> +}
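
To illustrate the unwinding pattern discussed above: each error label should undo only the step that succeeded just before it, in reverse order of setup. The following is a minimal standalone sketch, not QEMU code; `setup()`, `undo()`, `undo_log`, and the `fail_at` parameter are hypothetical names used only to make the ordering observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical record of the undo operations that ran, in order. */
static char undo_log[64];

static void undo(const char *what)
{
    /* Guard the fixed-size log: name + ';' + terminating NUL must fit. */
    assert(strlen(undo_log) + strlen(what) + 2 <= sizeof(undo_log));
    strcat(undo_log, what);
    strcat(undo_log, ";");
}

/*
 * Four setup steps, mirroring the shape of the reviewed function:
 *   1. alloc S2 hwpt  2. attach S2  3. alloc viommu  4. alloc nested hwpt
 * 'fail_at' (1..4) makes that step fail; 0 means every step succeeds.
 */
static bool setup(int fail_at)
{
    undo_log[0] = '\0';

    if (fail_at == 1) {
        return false;        /* nothing was acquired yet */
    }
    if (fail_at == 2) {
        goto free_s2;        /* only step 1 succeeded */
    }
    if (fail_at == 3) {
        goto detach_s2;      /* steps 1 and 2 succeeded */
    }
    if (fail_at == 4) {
        goto free_viommu;    /* steps 1 to 3 succeeded */
    }
    return true;

free_viommu:
    undo("free_viommu");
detach_s2:
    undo("detach_s2");       /* the inverse of step 2, not a second attach */
free_s2:
    undo("free_s2");
    return false;
}
```

Calling `setup(4)` leaves `undo_log` holding the three undo steps newest-first, while `setup(2)` only records `free_s2`.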
> +
> +static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> +                                          HostIOMMUDevice *hiod, Error **errp)
> +{
> +    HostIOMMUDeviceIOMMUFD *idev = HOST_IOMMU_DEVICE_IOMMUFD(hiod);
> +    SMMUState *s = opaque;
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
> +    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> +    SMMUv3AccelDevice *accel_dev = smmuv3_accel_get_dev(s, sbus, bus, devfn);
> +    SMMUDevice *sdev = &accel_dev->sdev;
> +
> +    if (!idev) {
> +        return true;
> +    }
> +
> +    if (accel_dev->idev) {
> +        if (accel_dev->idev != idev) {
> +            error_report("Device 0x%x already ha an associated idev",
> +                         smmu_get_sid(sdev));
> +            return false;
> +        } else {
> +            return true;
> +        }
> +    }
> +
> +    if (!smmuv3_accel_dev_attach_viommu(accel_dev, idev, errp)) {
> +        error_report("Unable to attach viommu");
> +        return false;
> +    }
> +
> +    accel_dev->idev = idev;
> +    QLIST_INSERT_HEAD(&s_accel->viommu->device_list, accel_dev, next);
> +    trace_smmuv3_accel_set_iommu_device(devfn, smmu_get_sid(sdev));
> +    return true;
> +}
> +
> +static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
> +                                            int devfn)
> +{
> +    SMMUDevice *sdev;
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUViommu *viommu;
> +    SMMUState *s = opaque;
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
> +    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
> +
> +    if (!sbus) {
> +        return;
> +    }
> +
> +    sdev = sbus->pbdev[devfn];
> +    if (!sdev) {
> +        return;
> +    }
> +
> +    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +    if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
> +                                               accel_dev->idev->ioas_id,
> +                                               NULL)) {
> +        error_report("Unable to attach dev to the default HW pagetable");
> +    }
> +
> +
> +    accel_dev->idev = NULL;
> +    QLIST_REMOVE(accel_dev, next);
> +    trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
> +
> +    viommu = s_accel->viommu;
> +    if (QLIST_EMPTY(&viommu->device_list)) {
> +        iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
> +        iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
> +        iommufd_backend_free_id(viommu->iommufd, viommu->core.viommu_id);
> +        iommufd_backend_free_id(viommu->iommufd, viommu->s2_hwpt->hwpt_id);
> +        g_free(viommu->s2_hwpt);
> +        g_free(viommu);
> +        s_accel->viommu = NULL;
> +    }
> +}
>  static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
>                                                int devfn)
>  {
> @@ -77,6 +258,8 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>          return;
>      }
>      bs->get_address_space = smmuv3_accel_find_add_as;
> +    bs->set_iommu_device = smmuv3_accel_set_iommu_device;
> +    bs->unset_iommu_device = smmuv3_accel_unset_iommu_device;
>  }
>  
>  static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index 7790db780e..17960794bf 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -58,6 +58,10 @@ smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s
>  smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
>  smmu_reset_exit(void) ""
>  
> +#smmuv3-accel.c
> +smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
> +smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
> +
>  # strongarm.c
>  strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
>  strongarm_ssp_read_underrun(void) "SSP rx underrun"
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index 86c0523063..aca6838dca 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -12,16 +12,39 @@
>  #include "hw/arm/smmu-common.h"
>  #include "hw/arm/smmuv3.h"
>  #include "qom/object.h"
> +#include "system/iommufd.h"
> +
> +#include <linux/iommufd.h>
>  
>  #define TYPE_ARM_SMMUV3_ACCEL   "arm-smmuv3-accel"
>  OBJECT_DECLARE_TYPE(SMMUv3AccelState, SMMUv3AccelClass, ARM_SMMUV3_ACCEL)
>  
> +typedef struct SMMUS2Hwpt {
> +    IOMMUFDBackend *iommufd;
> +    uint32_t hwpt_id;
> +    uint32_t ioas_id;
> +} SMMUS2Hwpt;
> +
> +typedef struct SMMUViommu {
> +    IOMMUFDBackend *iommufd;
> +    IOMMUFDViommu core;
> +    SMMUS2Hwpt *s2_hwpt;
> +    uint32_t bypass_hwpt_id;
> +    uint32_t abort_hwpt_id;
> +    QLIST_HEAD(, SMMUv3AccelDevice) device_list;
> +    QLIST_ENTRY(SMMUViommu) next;
> +} SMMUViommu;
> +
>  typedef struct SMMUv3AccelDevice {
>      SMMUDevice  sdev;
> +    HostIOMMUDeviceIOMMUFD *idev;
> +    SMMUViommu *viommu;
> +    QLIST_ENTRY(SMMUv3AccelDevice) next;
>  } SMMUv3AccelDevice;
>  
>  struct SMMUv3AccelState {
>      SMMUv3State smmuv3_state;
> +    SMMUViommu *viommu;
>  };
>  
>  struct SMMUv3AccelClass {
> diff --git a/include/system/iommufd.h b/include/system/iommufd.h
> index 53920bae5b..9c106ea078 100644
> --- a/include/system/iommufd.h
> +++ b/include/system/iommufd.h
> @@ -37,6 +37,12 @@ struct IOMMUFDBackend {
>      /*< public >*/
>  };
>  
> +typedef struct IOMMUFDViommu {
> +    IOMMUFDBackend *iommufd;
> +    uint32_t s2_hwpt_id;
> +    uint32_t viommu_id;
> +} IOMMUFDViommu;
> +
>  bool iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
>  void iommufd_backend_disconnect(IOMMUFDBackend *be);
>  
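
As a sketch of the "use defines" suggestion for the two STE literals above: in the SMMUv3 STE format, bit 0 of word 0 is the Valid bit and bits [3:1] are the Config field, so 0x1 is "valid, abort" and 0x9 is "valid, bypass". The macro and helper names below are hypothetical (they are not the existing QEMU or kernel definitions) and only show how the magic numbers decompose.

```c
#include <assert.h>
#include <stdint.h>

/* STE word 0: V is bit 0, Config is bits [3:1]. Names are illustrative. */
#define ACCEL_STE_VALID           (1ULL << 0)
#define ACCEL_STE_CONFIG_SHIFT    1
#define ACCEL_STE_CONFIG_MASK     0x7ULL
#define ACCEL_STE_CONFIG_ABORT    0x0ULL   /* Config = 0b000 */
#define ACCEL_STE_CONFIG_BYPASS   0x4ULL   /* Config = 0b100 */

#define ACCEL_STE0_ABORT \
    (ACCEL_STE_VALID | (ACCEL_STE_CONFIG_ABORT << ACCEL_STE_CONFIG_SHIFT))
#define ACCEL_STE0_BYPASS \
    (ACCEL_STE_VALID | (ACCEL_STE_CONFIG_BYPASS << ACCEL_STE_CONFIG_SHIFT))

/* The named values must match the literals used in the patch. */
static_assert(ACCEL_STE0_ABORT == 0x1ULL, "abort STE word 0 must be 0x1");
static_assert(ACCEL_STE0_BYPASS == 0x9ULL, "bypass STE word 0 must be 0x9");

/* Extract the Config field from STE word 0. */
static inline uint64_t accel_ste0_config(uint64_t ste0)
{
    return (ste0 >> ACCEL_STE_CONFIG_SHIFT) & ACCEL_STE_CONFIG_MASK;
}
```

With such defines, the initializers would read `.ste = { ACCEL_STE0_BYPASS, 0x0ULL }` and `.ste = { ACCEL_STE0_ABORT, 0x0ULL }` instead of bare constants.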



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-11 14:10 ` [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device Shameer Kolothum via
  2025-03-11 20:13   ` Nicolin Chen
@ 2025-03-12 15:15   ` Eric Auger
  2025-03-17 17:54     ` Nicolin Chen
  1 sibling, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 15:15 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi Shameer,


On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
> device. In order to support vfio-pci dev assignment with a Guest
guest
> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
nested (S1+S2)
> mode, with Guest owning the S1 page tables. Subsequent patches will
the guest
> add support for smmuv3-accel to provide this.
Can't this -accel smmu also work with emulated devices? Do we want
exclusive usage?

I would also document in the commit msg that a new property is added in
the parent SMMU (accel).
Will this device be migratable? Do we need a migration blocker?
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/Kconfig                |  5 ++++
>  hw/arm/meson.build            |  1 +
>  hw/arm/smmu-common.c          |  1 +
>  hw/arm/smmuv3-accel.c         | 51 +++++++++++++++++++++++++++++++++++
>  include/hw/arm/smmu-common.h  |  3 +++
>  include/hw/arm/smmuv3-accel.h | 31 +++++++++++++++++++++
>  6 files changed, 92 insertions(+)
>  create mode 100644 hw/arm/smmuv3-accel.c
>  create mode 100644 include/hw/arm/smmuv3-accel.h
>
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 504841ccab..f889842dd8 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -14,6 +14,7 @@ config ARM_VIRT
>      select ARM_GIC
>      select ACPI
>      select ARM_SMMUV3
> +    select ARM_SMMUV3_ACCEL
>      select GPIO_KEY
>      select DEVICE_TREE
>      select FW_CFG_DMA
> @@ -596,6 +597,10 @@ config FSL_IMX7
>  config ARM_SMMUV3
>      bool
>  
> +config ARM_SMMUV3_ACCEL
> +    select ARM_SMMUV3
> +    bool
> +
>  config FSL_IMX6UL
>      bool
>      default y
> diff --git a/hw/arm/meson.build b/hw/arm/meson.build
> index 465c757f97..e8593363b0 100644
> --- a/hw/arm/meson.build
> +++ b/hw/arm/meson.build
> @@ -55,6 +55,7 @@ arm_ss.add(when: 'CONFIG_MUSCA', if_true: files('musca.c'))
>  arm_ss.add(when: 'CONFIG_ARMSSE', if_true: files('armsse.c'))
>  arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: files('fsl-imx7.c', 'mcimx7d-sabre.c'))
>  arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
> +arm_ss.add(when: 'CONFIG_ARM_SMMUV3_ACCEL', if_true: files('smmuv3-accel.c'))
>  arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
>  arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
>  arm_ss.add(when: 'CONFIG_XEN', if_true: files(
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 8c1b407b82..f5caf1665c 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -943,6 +943,7 @@ static const Property smmu_dev_properties[] = {
>      DEFINE_PROP_UINT8("bus_num", SMMUState, bus_num, 0),
>      DEFINE_PROP_LINK("primary-bus", SMMUState, primary_bus,
>                       TYPE_PCI_BUS, PCIBus *),
> +    DEFINE_PROP_BOOL("accel", SMMUState, accel, false),
>  };
>  
>  static void smmu_base_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> new file mode 100644
> index 0000000000..c327661636
> --- /dev/null
> +++ b/hw/arm/smmuv3-accel.c
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
> + * Copyright (C) 2025 NVIDIA
> + * Written by Nicolin Chen, Shameer Kolothum
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "hw/arm/smmuv3-accel.h"
> +
> +static void smmu_accel_realize(DeviceState *d, Error **errp)
> +{
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(d);
> +    SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_GET_CLASS(s_accel);
> +    SysBusDevice *dev = SYS_BUS_DEVICE(d);
> +    Error *local_err = NULL;
> +
> +    object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
You shouldn't need dev; simply use OBJECT(d).
> +    c->parent_realize(d, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
> +
> +static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_CLASS(klass);
> +
> +    device_class_set_parent_realize(dc, smmu_accel_realize,
> +                                    &c->parent_realize);
> +    dc->hotpluggable = false;
> +}
> +
> +static const TypeInfo smmuv3_accel_type_info = {
> +    .name          = TYPE_ARM_SMMUV3_ACCEL,
> +    .parent        = TYPE_ARM_SMMUV3,
> +    .instance_size = sizeof(SMMUv3AccelState),
> +    .class_size    = sizeof(SMMUv3AccelClass),
> +    .class_init    = smmuv3_accel_class_init,
> +};
> +
> +static void smmuv3_accel_register_types(void)
> +{
> +    type_register_static(&smmuv3_accel_type_info);
> +}
> +
> +type_init(smmuv3_accel_register_types)
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index d1a4a64551..b5c63cfd5d 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -157,6 +157,9 @@ struct SMMUState {
>      QLIST_HEAD(, SMMUDevice) devices_with_notifiers;
>      uint8_t bus_num;
>      PCIBus *primary_bus;
> +
> +    /* For smmuv3-accel */
> +    bool accel;
>  };
>  
>  struct SMMUBaseClass {
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> new file mode 100644
> index 0000000000..56fe376bf4
> --- /dev/null
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -0,0 +1,31 @@
> +/*
> + * Copyright (c) 2025 Huawei Technologies R & D (UK) Ltd
> + * Copyright (C) 2025 NVIDIA
> + * Written by Nicolin Chen, Shameer Kolothum
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_ARM_SMMUV3_ACCEL_H
> +#define HW_ARM_SMMUV3_ACCEL_H
> +
> +#include "hw/arm/smmu-common.h"
> +#include "hw/arm/smmuv3.h"
> +#include "qom/object.h"
> +
> +#define TYPE_ARM_SMMUV3_ACCEL   "arm-smmuv3-accel"
> +OBJECT_DECLARE_TYPE(SMMUv3AccelState, SMMUv3AccelClass, ARM_SMMUV3_ACCEL)
> +
> +struct SMMUv3AccelState {
> +    SMMUv3State smmuv3_state;
> +};
> +
> +struct SMMUv3AccelClass {
> +    /*< private >*/
> +    SMMUv3Class smmuv3_class;
> +    /*< public >*/
> +
> +    DeviceRealize parent_realize;
> +};
> +
> +#endif /* HW_ARM_SMMUV3_ACCEL_H */
Thanks

Eric



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 01/20] backends/iommufd: Introduce iommufd_backend_alloc_viommu
  2025-03-11 14:10 ` [RFC PATCH v2 01/20] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
@ 2025-03-12 15:20   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-12 15:20 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi,


On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Add a helper to allocate a viommu object.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  backends/iommufd.c       | 25 +++++++++++++++++++++++++
>  backends/trace-events    |  1 +
>  include/system/iommufd.h |  4 ++++
>  3 files changed, 30 insertions(+)
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 3c23caef96..3fac08c96e 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -341,6 +341,31 @@ int iommufd_backend_invalidate_cache(IOMMUFDBackend *be, uint32_t hwpt_id,
>      return ret;
>  }
>  
> +bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
> +                                  uint32_t viommu_type, uint32_t hwpt_id,
> +                                  uint32_t *out_viommu_id, Error **errp)
> +{
> +    int ret, fd = be->fd;
> +    struct iommu_viommu_alloc alloc_viommu = {
> +        .size = sizeof(alloc_viommu),
> +        .type = viommu_type,
> +        .dev_id = dev_id,
> +        .hwpt_id = hwpt_id,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &alloc_viommu);
> +
> +    trace_iommufd_backend_alloc_viommu(fd, viommu_type, dev_id, hwpt_id,
> +                                       alloc_viommu.out_viommu_id, ret);
> +    if (ret) {
> +        error_setg_errno(errp, errno, "IOMMU_VIOMMU_ALLOC failed");
> +        return false;
> +    }
> +
> +    *out_viommu_id = alloc_viommu.out_viommu_id;
> +    return true;
> +}
> +
>  bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
>                                             uint32_t hwpt_id, Error **errp)
>  {
> diff --git a/backends/trace-events b/backends/trace-events
> index 5a23db6c8a..a835827540 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -19,3 +19,4 @@ iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%
>  iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) " iommufd=%d hwpt=%u enable=%d (%d)"
>  iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
>  iommufd_backend_invalidate_cache(int iommufd, uint32_t hwpt_id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d hwpt_id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
> +iommufd_backend_alloc_viommu(int iommufd, uint32_t type, uint32_t dev_id, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d type=%u dev_id=%u hwpt_id=%u viommu_id=%u (%d)"
> diff --git a/include/system/iommufd.h b/include/system/iommufd.h
> index b93421ac7c..7e5507f2db 100644
> --- a/include/system/iommufd.h
> +++ b/include/system/iommufd.h
> @@ -55,6 +55,10 @@ bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
>                                  uint32_t data_type, uint32_t data_len,
>                                  void *data_ptr, uint32_t *out_hwpt,
>                                  Error **errp);
> +bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
> +                                  uint32_t viommu_type, uint32_t hwpt_id,
> +                                  uint32_t *out_hwpt, Error **errp);
> +
>  bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>                                          bool start, Error **errp);
>  bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 02/20] backends/iommufd: Introduce iommufd_vdev_alloc
  2025-03-11 14:10 ` [RFC PATCH v2 02/20] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
@ 2025-03-12 15:25   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-12 15:25 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao


Hi Shameer,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Add a helper to allocate an iommufd device's virtual device (in the user
> space) per a viommu instance.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  backends/iommufd.c       | 26 ++++++++++++++++++++++++++
>  backends/trace-events    |  1 +
>  include/system/iommufd.h |  4 ++++
>  3 files changed, 31 insertions(+)
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 3fac08c96e..3511dd32ab 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -366,6 +366,32 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
>      return true;
>  }
>  
> +bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
> +                                uint32_t viommu_id, uint64_t virt_id,
> +                                uint32_t *out_vdev_id, Error **errp)
> +{
> +    int ret, fd = be->fd;
> +    struct iommu_vdevice_alloc alloc_vdev = {
> +        .size = sizeof(alloc_vdev),
> +        .viommu_id = viommu_id,
> +        .dev_id = dev_id,
> +        .virt_id = virt_id,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &alloc_vdev);
> +
> +    trace_iommufd_backend_alloc_vdev(fd, dev_id, viommu_id, virt_id,
> +                                     alloc_vdev.out_vdevice_id, ret);
> +
> +    if (ret) {
> +        error_setg_errno(errp, errno, "IOMMU_VDEVICE_ALLOC failed");
> +        return false;
> +    }
> +
> +    *out_vdev_id = alloc_vdev.out_vdevice_id;
> +    return true;
> +}
> +
>  bool host_iommu_device_iommufd_attach_hwpt(HostIOMMUDeviceIOMMUFD *idev,
>                                             uint32_t hwpt_id, Error **errp)
>  {
> diff --git a/backends/trace-events b/backends/trace-events
> index a835827540..86c8f89e8a 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -20,3 +20,4 @@ iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) "
>  iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%u iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
>  iommufd_backend_invalidate_cache(int iommufd, uint32_t hwpt_id, uint32_t data_type, uint32_t entry_len, uint32_t entry_num, uint32_t done_num, uint64_t data_ptr, int ret) " iommufd=%d hwpt_id=%u data_type=%u entry_len=%u entry_num=%u done_num=%u data_ptr=0x%"PRIx64" (%d)"
>  iommufd_backend_alloc_viommu(int iommufd, uint32_t type, uint32_t dev_id, uint32_t hwpt_id, uint32_t viommu_id, int ret) " iommufd=%d type=%u dev_id=%u hwpt_id=%u viommu_id=%u (%d)"
> +iommufd_backend_alloc_vdev(int iommufd, uint32_t dev_id, uint32_t viommu_id, uint64_t virt_id, uint32_t vdev_id, int ret) " iommufd=%d dev_id=%u viommu_id=%u virt_id=0x%"PRIx64" vdev_id=%u (%d)"
> diff --git a/include/system/iommufd.h b/include/system/iommufd.h
> index 7e5507f2db..53920bae5b 100644
> --- a/include/system/iommufd.h
> +++ b/include/system/iommufd.h
> @@ -59,6 +59,10 @@ bool iommufd_backend_alloc_viommu(IOMMUFDBackend *be, uint32_t dev_id,
>                                    uint32_t viommu_type, uint32_t hwpt_id,
>                                    uint32_t *out_hwpt, Error **errp);
>  
> +bool iommufd_backend_alloc_vdev(IOMMUFDBackend *be, uint32_t dev_id,
> +                                uint32_t viommu_id, uint64_t virt_id,
> +                                uint32_t *out_vdev_id, Error **errp);
> +
>  bool iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
>                                          bool start, Error **errp);
>  bool iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-11 14:10 ` [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel Shameer Kolothum via
  2025-03-11 20:22   ` Nicolin Chen
@ 2025-03-12 15:36   ` Eric Auger
  2025-03-12 15:46     ` Shameerali Kolothum Thodi via
  2025-03-18 22:49   ` Donald Dutile
  2 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 15:36 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi Shameer,


On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3
> is not specified.
>
> No FDT support is added for now.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/virt.c         | 12 ++++++++++++
>  hw/core/sysbus-fdt.c  |  1 +
>  include/hw/arm/virt.h |  1 +
>  3 files changed, 14 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 4a5a9666e9..84a323da55 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -73,6 +73,7 @@
>  #include "qobject/qlist.h"
>  #include "standard-headers/linux/input.h"
>  #include "hw/arm/smmuv3.h"
> +#include "hw/arm/smmuv3-accel.h"
>  #include "hw/acpi/acpi.h"
>  #include "target/arm/cpu-qom.h"
>  #include "target/arm/internals.h"
> @@ -2911,6 +2912,16 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>              platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
>                                       SYS_BUS_DEVICE(dev));
>          }
> +        if (object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_ACCEL)) {
> +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
maybe just check whether it is != VIRT_IOMMU_NONE?
> +                error_setg(errp,
> +                           "iommu=smmuv3 is already specified. can't create smmuv3-accel dev");
I would clearly state "iommu=smmuv3 virt machine option is already set"
and use an error hint to say that the two are not compatible.
> +                return;
> +            }
> +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
> +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;

I know there were quite a lot of discussions on the 1st multi
instantiation series related to the way we instantiate that device, and
maybe I missed some blockers, but why wouldn't we allow the instantiation
of the legacy smmu device with -device too? I think this would be
simpler for libvirt, and we would somehow deprecate the machine option
method. Would it be a problem if you were to use -device smmu,accel
or something similar?
> +            }
> +        }
>      }
>  
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> @@ -3120,6 +3131,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
>      machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
>      machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
> +    machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ARM_SMMUV3_ACCEL);
>  #ifdef CONFIG_TPM
>      machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
>  #endif
> diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c
> index 774c0aed41..c8502ad830 100644
> --- a/hw/core/sysbus-fdt.c
> +++ b/hw/core/sysbus-fdt.c
> @@ -489,6 +489,7 @@ static const BindingEntry bindings[] = {
>  #ifdef CONFIG_LINUX
>      TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node),
>      TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
> +    TYPE_BINDING("arm-smmuv3-accel", no_fdt_node),
use the define instead.

To me, this patch should be moved to the end of the series, when the
device is fully functional.
>      VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
>  #endif
>  #ifdef CONFIG_TPM
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index c8e94e6aed..849d1cd5b5 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -92,6 +92,7 @@ enum {
>  typedef enum VirtIOMMUType {
>      VIRT_IOMMU_NONE,
>      VIRT_IOMMU_SMMUV3,
> +    VIRT_IOMMU_SMMUV3_ACCEL,
>      VIRT_IOMMU_VIRTIO,
>  } VirtIOMMUType;
>  
Thanks

Eric
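
One way to read the check discussed above, as a standalone sketch: since the plug callback sets the machine's IOMMU type to VIRT_IOMMU_SMMUV3_ACCEL on the first accel device, a plain `!= VIRT_IOMMU_NONE` test would also reject a second accel instance. The sketch below therefore accepts NONE and SMMUV3_ACCEL and rejects everything else. The enum is copied from the quoted patch; `accel_plug_allowed()` and `plug_accel()` are hypothetical helpers, not functions from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the quoted include/hw/arm/virt.h hunk. */
typedef enum VirtIOMMUType {
    VIRT_IOMMU_NONE,
    VIRT_IOMMU_SMMUV3,
    VIRT_IOMMU_SMMUV3_ACCEL,
    VIRT_IOMMU_VIRTIO,
} VirtIOMMUType;

/* Hypothetical: may an smmuv3-accel device be plugged in this state? */
static bool accel_plug_allowed(VirtIOMMUType cur)
{
    return cur == VIRT_IOMMU_NONE || cur == VIRT_IOMMU_SMMUV3_ACCEL;
}

/* Hypothetical: model of the plug path; updates the machine-wide type. */
static bool plug_accel(VirtIOMMUType *iommu)
{
    assert(iommu);
    if (!accel_plug_allowed(*iommu)) {
        return false;   /* a machine-wide IOMMU option is already set */
    }
    *iommu = VIRT_IOMMU_SMMUV3_ACCEL;
    return true;
}
```

Under this reading, two consecutive `plug_accel()` calls on a machine that starts at VIRT_IOMMU_NONE both succeed, while a machine started with iommu=smmuv3 or a virtio-iommu rejects the device.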



^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-12 15:36   ` Eric Auger
@ 2025-03-12 15:46     ` Shameerali Kolothum Thodi via
  2025-03-12 16:13       ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-12 15:46 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Eric,

> -----Original Message-----
> From: qemu-devel-
> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org> On
> Behalf Of Eric Auger
> Sent: Wednesday, March 12, 2025 3:36 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
> accel
> 
> Hi Shameer,
> 
> 
> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3
> > is not specified.
> >
> > No FDT support is added for now.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  hw/arm/virt.c         | 12 ++++++++++++
> >  hw/core/sysbus-fdt.c  |  1 +
> >  include/hw/arm/virt.h |  1 +
> >  3 files changed, 14 insertions(+)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 4a5a9666e9..84a323da55 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -73,6 +73,7 @@
> >  #include "qobject/qlist.h"
> >  #include "standard-headers/linux/input.h"
> >  #include "hw/arm/smmuv3.h"
> > +#include "hw/arm/smmuv3-accel.h"
> >  #include "hw/acpi/acpi.h"
> >  #include "target/arm/cpu-qom.h"
> >  #include "target/arm/internals.h"
> > @@ -2911,6 +2912,16 @@ static void
> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >              platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
> >platform_bus_dev),
> >                                       SYS_BUS_DEVICE(dev));
> >          }
> > +        if (object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_ACCEL))
> {
> > +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> maybe just check whether it is != VIRT_IOMMU_NONE?
> > +                error_setg(errp,
> > +                           "iommu=smmuv3 is already specified. can't create
> smmuv3-accel dev");
> I would clearly state "iommu=smmuv3 virt machine option is alreadt set"
> and use an error hint to say both are not compatible.
> > +                return;
> > +            }
> > +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
> > +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
> 
> I know there were quite a lot of dicussions on the 1st multi
> instantiation series related to the way we instanatiate that device and
> maybe I missed some blockers but why wouldn't we allow the instantiation
> of the legacy smmu device with -device too. I think this would be
> simpler for libvirt and we would somehow deprecate the machine option
> method? would that make a problem if you were to use -device smmu,accel
> or something alike?

Thanks for taking a look. I am only replying to this point for now. Yes, there
were discussions around that, but I was not sure we had concluded on deprecating
the machine option. So, if I understand you correctly, the idea is:

-device smmuv3 would instantiate the current machine-wide smmuv3, and
-device smmuv3,accel would instantiate this device?

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-11 14:10 ` [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus Shameer Kolothum via
@ 2025-03-12 16:07   ` Eric Auger
  2025-03-12 16:34     ` Shameerali Kolothum Thodi via
  2025-03-18 22:12   ` Donald Dutile
  1 sibling, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 16:07 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi Shameer,


On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> User must associate a pxb-pcie root bus to smmuv3-accel
> and that is set as the primary-bus for the smmu dev.
Why do we require a pxb-pcie root bus? Why can't the pci.0 root bus be used
for simpler use cases (i.e. I just want to pass through a NIC in
accelerated mode)? Or maybe pci.0 is also called a pxb-pcie root bus?

Besides, why do we restrict it to being plugged on a root bus? I know that
at this point we always plug to pci.0, but with the new -device option it
would be possible to plug it anywhere in the PCIe hierarchy. At the SoC
level, can't an SMMU be plugged anywhere, protecting just a few RIDs?
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index c327661636..1471b65374 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -9,6 +9,21 @@
>  #include "qemu/osdep.h"
>  
>  #include "hw/arm/smmuv3-accel.h"
> +#include "hw/pci/pci_bridge.h"
> +
> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
> +{
> +    DeviceState *d = opaque;
> +
> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus->name)) {
> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> +                                     &error_abort);
If you want to stop the recursive search, I think you need to return
something != 0 here.

I don't really understand why we don't simply set the primary-bus to
the <bus> given in -device arm-smmuv3-accel,bus=<bus>. Or maybe enforce
that this bus is an actual root bus, if we really need that?
> +        }
> +    }
> +    return 0;
> +}
>  
>  static void smmu_accel_realize(DeviceState *d, Error **errp)
>  {
> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>      SysBusDevice *dev = SYS_BUS_DEVICE(d);
>      Error *local_err = NULL;
>  
> +    object_child_foreach_recursive(object_get_root(),
> +                                   smmuv3_accel_pxb_pcie_bus, d);
> +
>      object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
>      c->parent_realize(d, &local_err);
>      if (local_err) {
> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
>      device_class_set_parent_realize(dc, smmu_accel_realize,
>                                      &c->parent_realize);
>      dc->hotpluggable = false;
> +    dc->bus_type = TYPE_PCIE_BUS;
Shouldn't this belong to 3/20? It is not really related to the primary_bus
setting.

Thanks,
Eric
>  }
>  
>  static const TypeInfo smmuv3_accel_type_info = {
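The remark above about returning a non-zero value can be illustrated with a small standalone model of a "visit until the callback says stop" tree walk. This is only a sketch: `Node`, `Search`, `node_foreach_recursive` and `match_bus` are hypothetical names invented for the example, not QEMU types or functions, and the walk merely imitates the stop-on-non-zero convention discussed in the review.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object tree; not QEMU's QOM types. */
typedef struct Node {
    const char *name;
    struct Node **children;
    size_t nr_children;
} Node;

typedef int (*node_fn)(Node *n, void *opaque);

/*
 * Visit every descendant of root depth-first. The walk stops as soon as
 * fn returns non-zero, and that value is propagated to the caller.
 */
static int node_foreach_recursive(Node *root, node_fn fn, void *opaque)
{
    for (size_t i = 0; i < root->nr_children; i++) {
        Node *child = root->children[i];
        int ret = fn(child, opaque);

        if (ret) {
            return ret;
        }
        ret = node_foreach_recursive(child, fn, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Search state shared with the callback through the opaque pointer. */
typedef struct Search {
    const char *wanted;   /* bus name we are looking for */
    Node *found;          /* set once a match is seen */
    unsigned visited;     /* number of nodes the callback was called on */
} Search;

/* Returns 1 on a match so that the walk ends early; 0 keeps it going. */
static int match_bus(Node *n, void *opaque)
{
    Search *s = opaque;

    s->visited++;
    if (strcmp(n->name, s->wanted) == 0) {
        s->found = n;
        return 1;
    }
    return 0;
}
```

If `match_bus` returned 0 unconditionally, the match would still be recorded, but every remaining node would be visited as well, which is the behaviour the review comment points at.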



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 06/20] hw/arm/smmu-common: Factor out common helper functions and export
  2025-03-11 14:10 ` [RFC PATCH v2 06/20] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
@ 2025-03-12 16:12   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-12 16:12 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao




On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Subsequent patches for smmuv3-accel will make use of this
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  hw/arm/smmu-common.c         | 48 ++++++++++++++++++++++--------------
>  include/hw/arm/smmu-common.h |  6 +++++
>  2 files changed, 36 insertions(+), 18 deletions(-)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index f5caf1665c..83c0693f5a 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -826,12 +826,28 @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
>      return NULL;
>  }
>  
> -static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
> +void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev,
> +                    PCIBus *bus, int devfn)
>  {
> -    SMMUState *s = opaque;
> -    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
> -    SMMUDevice *sdev;
>      static unsigned int index;
> +    char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
> +
> +    sdev->smmu = s;
> +    sdev->bus = bus;
> +    sdev->devfn = devfn;
> +
> +    memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
> +                             s->mrtypename,
> +                             OBJECT(s), name, UINT64_MAX);
> +    address_space_init(&sdev->as,
> +                       MEMORY_REGION(&sdev->iommu), name);
> +    trace_smmu_add_mr(name);
> +    g_free(name);
> +}
> +
> +SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus)
> +{
> +    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
>  
>      if (!sbus) {
>          sbus = g_malloc0(sizeof(SMMUPciBus) +
> @@ -840,23 +856,19 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>          g_hash_table_insert(s->smmu_pcibus_by_busptr, bus, sbus);
>      }
>  
> +    return sbus;
> +}
> +
> +static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
> +{
> +    SMMUDevice *sdev;
> +    SMMUState *s = opaque;
> +    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> +
>      sdev = sbus->pbdev[devfn];
>      if (!sdev) {
> -        char *name = g_strdup_printf("%s-%d-%d", s->mrtypename, devfn, index++);
> -
>          sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
> -
> -        sdev->smmu = s;
> -        sdev->bus = bus;
> -        sdev->devfn = devfn;
> -
> -        memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
> -                                 s->mrtypename,
> -                                 OBJECT(s), name, UINT64_MAX);
> -        address_space_init(&sdev->as,
> -                           MEMORY_REGION(&sdev->iommu), name);
> -        trace_smmu_add_mr(name);
> -        g_free(name);
> +        smmu_init_sdev(s, sdev, bus, devfn);
>      }
>  
>      return &sdev->as;
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index b5c63cfd5d..80ff2ef6aa 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -178,6 +178,12 @@ OBJECT_DECLARE_TYPE(SMMUState, SMMUBaseClass, ARM_SMMU)
>  /* Return the SMMUPciBus handle associated to a PCI bus number */
>  SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num);
>  
> +/* Return the SMMUPciBus handle associated to a PCI bus */
> +SMMUPciBus *smmu_get_sbus(SMMUState *s, PCIBus *bus);
> +
> +/* Initialize SMMUDevice handle associated to a SMMUPCIBus */
> +void smmu_init_sdev(SMMUState *s, SMMUDevice *sdev, PCIBus *bus, int devfn);
> +
>  /* Return the stream ID of an SMMU device */
>  static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
>  {



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-12 15:46     ` Shameerali Kolothum Thodi via
@ 2025-03-12 16:13       ` Eric Auger
  2025-03-12 16:22         ` Shameerali Kolothum Thodi via
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 16:13 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Shameer,


On 3/12/25 4:46 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: qemu-devel-
>> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
>> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org> On
>> Behalf Of Eric Auger
>> Sent: Wednesday, March 12, 2025 3:36 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
>> accel
>>
>> Hi Shameer,
>>
>>
>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>> Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3
>>> is not specified.
>>>
>>> No FDT support is added for now.
>>>
>>> Signed-off-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@huawei.com>
>>> ---
>>>  hw/arm/virt.c         | 12 ++++++++++++
>>>  hw/core/sysbus-fdt.c  |  1 +
>>>  include/hw/arm/virt.h |  1 +
>>>  3 files changed, 14 insertions(+)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 4a5a9666e9..84a323da55 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -73,6 +73,7 @@
>>>  #include "qobject/qlist.h"
>>>  #include "standard-headers/linux/input.h"
>>>  #include "hw/arm/smmuv3.h"
>>> +#include "hw/arm/smmuv3-accel.h"
>>>  #include "hw/acpi/acpi.h"
>>>  #include "target/arm/cpu-qom.h"
>>>  #include "target/arm/internals.h"
>>> @@ -2911,6 +2912,16 @@ static void
>> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>>>              platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
>>> platform_bus_dev),
>>>                                       SYS_BUS_DEVICE(dev));
>>>          }
>>> +        if (object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_ACCEL))
>> {
>>> +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
>> maybe just check whether it is != VIRT_IOMMU_NONE?
>>> +                error_setg(errp,
>>> +                           "iommu=smmuv3 is already specified. can't create
>> smmuv3-accel dev");
>> I would clearly state "iommu=smmuv3 virt machine option is alreadt set"
>> and use an error hint to say both are not compatible.
>>> +                return;
>>> +            }
>>> +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
>>> +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
>> I know there were quite a lot of dicussions on the 1st multi
>> instantiation series related to the way we instanatiate that device and
>> maybe I missed some blockers but why wouldn't we allow the instantiation
>> of the legacy smmu device with -device too. I think this would be
>> simpler for libvirt and we would somehow deprecate the machine option
>> method? would that make a problem if you were to use -device smmu,accel
>> or something alike?
> Thanks for taking a look. I am just jumping on this one for now.  Yes, there
> were discussions around that. But I was not sure we concluded on deprecating
> the machine option. So if I get you correctly the idea is,
>
> if we have, 
> -device smmuv3 it will instantiate the current machine wide smmuv3 and for
> -device smmuv3,accel this device?
yes that would be my preference.

Eric
>
> Thanks,
> Shameer



^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-12 16:13       ` Eric Auger
@ 2025-03-12 16:22         ` Shameerali Kolothum Thodi via
  2025-03-12 16:27           ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-12 16:22 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: qemu-devel-
> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org> On
> Behalf Of Eric Auger
> Sent: Wednesday, March 12, 2025 4:13 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
> accel
> 
> Hi Shameer,
> 
> 
> On 3/12/25 4:46 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: qemu-devel-
> >> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
> >> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org>
> On
> >> Behalf Of Eric Auger
> >> Sent: Wednesday, March 12, 2025 3:36 PM
> >> To: Shameerali Kolothum Thodi
> >> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org
> >> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> >> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> >> mochs@nvidia.com; smostafa@google.com; Linuxarm
> >> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> >> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> >> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> >> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for
> >> smmuv3- accel
> >>
> >> Hi Shameer,
> >>
> >>
> >> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> >>> Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3 is
> >>> not specified.
> >>>
> >>> No FDT support is added for now.
> >>>
> >>> Signed-off-by: Shameer Kolothum
> >> <shameerali.kolothum.thodi@huawei.com>
> >>> ---
> >>>  hw/arm/virt.c         | 12 ++++++++++++
> >>>  hw/core/sysbus-fdt.c  |  1 +
> >>>  include/hw/arm/virt.h |  1 +
> >>>  3 files changed, 14 insertions(+)
> >>>
> >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c index
> >>> 4a5a9666e9..84a323da55 100644
> >>> --- a/hw/arm/virt.c
> >>> +++ b/hw/arm/virt.c
> >>> @@ -73,6 +73,7 @@
> >>>  #include "qobject/qlist.h"
> >>>  #include "standard-headers/linux/input.h"
> >>>  #include "hw/arm/smmuv3.h"
> >>> +#include "hw/arm/smmuv3-accel.h"
> >>>  #include "hw/acpi/acpi.h"
> >>>  #include "target/arm/cpu-qom.h"
> >>>  #include "target/arm/internals.h"
> >>> @@ -2911,6 +2912,16 @@ static void
> >> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >>>              platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
> >>> platform_bus_dev),
> >>>                                       SYS_BUS_DEVICE(dev));
> >>>          }
> >>> +        if (object_dynamic_cast(OBJECT(dev),
> >>> + TYPE_ARM_SMMUV3_ACCEL))
> >> {
> >>> +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> >> maybe just check whether it is != VIRT_IOMMU_NONE?
> >>> +                error_setg(errp,
> >>> +                           "iommu=smmuv3 is already specified.
> >>> + can't create
> >> smmuv3-accel dev");
> >> I would clearly state "iommu=smmuv3 virt machine option is alreadt set"
> >> and use an error hint to say both are not compatible.
> >>> +                return;
> >>> +            }
> >>> +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
> >>> +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
> >> I know there were quite a lot of dicussions on the 1st multi
> >> instantiation series related to the way we instanatiate that device
> >> and maybe I missed some blockers but why wouldn't we allow the
> >> instantiation of the legacy smmu device with -device too. I think
> >> this would be simpler for libvirt and we would somehow deprecate the
> >> machine option method? would that make a problem if you were to use
> >> -device smmu,accel or something alike?
> > Thanks for taking a look. I am just jumping on this one for now.  Yes,
> > there were discussions around that. But I was not sure we concluded on
> > deprecating the machine option. So if I get you correctly the idea is,
> >
> > if we have,
> > -device smmuv3 it will instantiate the current machine wide smmuv3 and
> > for -device smmuv3,accel this device?
> yes that would be my preference.

Ok. I will look into that in my next respin. A quick question: does the QEMU
device model support a differentiation like the above easily, or do we have
to manage it with properties?

Is there an existing device implementation that already does something like
this? Please let me know.

Thanks,
Shameer


 


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  2025-03-11 14:10 ` [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps Shameer Kolothum via
@ 2025-03-12 16:23   ` Eric Auger
  2025-03-13  8:09     ` Shameerali Kolothum Thodi via
  2025-03-12 17:10   ` Eric Auger
  1 sibling, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 16:23 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao




On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Subsequently smmuv3-accel will provide these callbacks
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmu-common.c         | 27 +++++++++++++++++++++++++++
>  include/hw/arm/smmu-common.h |  5 +++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 83c0693f5a..9fd455baa0 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -865,6 +865,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>      SMMUState *s = opaque;
>      SMMUPciBus *sbus = smmu_get_sbus(s, bus);
>  
> +    if (s->accel && s->get_address_space) {
> +        return s->get_address_space(bus, opaque, devfn);
> +    }
> +
Why do we require that new call site? This needs to be documented in the
commit msg, especially because we don't know what this cb will do.
>      sdev = sbus->pbdev[devfn];
>      if (!sdev) {
>          sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
> @@ -874,8 +878,31 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>      return &sdev->as;
>  }
>  
> +static bool smmu_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> +                                      HostIOMMUDevice *hiod, Error **errp)
> +{
> +    SMMUState *s = opaque;
> +
> +    if (s->accel && s->set_iommu_device) {
> +        return s->set_iommu_device(bus, opaque, devfn, hiod, errp);
> +    }
> +
> +    return false;
> +}
> +
> +static void smmu_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
> +{
> +    SMMUState *s = opaque;
> +
> +    if (s->accel && s->unset_iommu_device) {
> +        s->unset_iommu_device(bus, opaque, devfn);
> +    }
> +}
> +
>  static const PCIIOMMUOps smmu_ops = {
>      .get_address_space = smmu_find_add_as,
> +    .set_iommu_device = smmu_dev_set_iommu_device,
> +    .unset_iommu_device = smmu_dev_unset_iommu_device,
>  };
>  
>  SMMUDevice *smmu_find_sdev(SMMUState *s, uint32_t sid)
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index 80ff2ef6aa..7b05640167 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -160,6 +160,11 @@ struct SMMUState {
>  
>      /* For smmuv3-accel */
>      bool accel;
> +
> +    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
> +    bool (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
> +                             HostIOMMUDevice *dev, Error **errp);
> +    void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
I think this should be exposed by a class and only implemented in the
smmuv3 accel device. Adding those callbacks directly to the State does not
look like the standard way.

Eric
>  };
>  
>  struct SMMUBaseClass {
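The suggestion above, keeping the hooks in a class rather than in per-instance state, can be sketched in plain C as follows. This is a simplified illustration under stated assumptions: `Smmu`, `SmmuClass`, `base_class`, `accel_class` and the helper functions are hypothetical names for this example only, and the signatures are deliberately reduced compared with the ones in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Smmu Smmu;

/* Hypothetical class structure: one copy shared by all instances of a type. */
typedef struct SmmuClass {
    bool (*set_iommu_device)(Smmu *s, int devfn);
    void (*unset_iommu_device)(Smmu *s, int devfn);
} SmmuClass;

/* Instance state holds data only, plus a pointer to its class. */
struct Smmu {
    const SmmuClass *klass;
    int attached;   /* number of host devices currently attached */
};

/* Accelerated variant: the only type that implements the hooks. */
static bool accel_set(Smmu *s, int devfn)
{
    (void)devfn;
    s->attached++;
    return true;
}

static void accel_unset(Smmu *s, int devfn)
{
    (void)devfn;
    s->attached--;
}

/* The base type leaves the hooks unset; the accel type fills them in. */
static const SmmuClass base_class = { NULL, NULL };
static const SmmuClass accel_class = { accel_set, accel_unset };

/* Common code dispatches through the class, not through instance fields. */
static bool smmu_set_iommu_device(Smmu *s, int devfn)
{
    if (s->klass->set_iommu_device) {
        return s->klass->set_iommu_device(s, devfn);
    }
    return false;
}

static void smmu_unset_iommu_device(Smmu *s, int devfn)
{
    if (s->klass->unset_iommu_device) {
        s->klass->unset_iommu_device(s, devfn);
    }
}
```

With this layout the common code no longer needs a separate per-instance flag to decide whether a hook exists; the presence of the method in the class carries that information.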



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-12 16:22         ` Shameerali Kolothum Thodi via
@ 2025-03-12 16:27           ` Eric Auger
  2025-03-12 17:34             ` Shameerali Kolothum Thodi via
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 16:27 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org




On 3/12/25 5:22 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: qemu-devel-
>> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
>> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org> On
>> Behalf Of Eric Auger
>> Sent: Wednesday, March 12, 2025 4:13 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
>> accel
>>
>> Hi Shameer,
>>
>>
>> On 3/12/25 4:46 PM, Shameerali Kolothum Thodi wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: qemu-devel-
>>>> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
>>>> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org>
>> On
>>>> Behalf Of Eric Auger
>>>> Sent: Wednesday, March 12, 2025 3:36 PM
>>>> To: Shameerali Kolothum Thodi
>>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org
>>>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>>>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>>> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for
>>>> smmuv3- accel
>>>>
>>>> Hi Shameer,
>>>>
>>>>
>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>> Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3 is
>>>>> not specified.
>>>>>
>>>>> No FDT support is added for now.
>>>>>
>>>>> Signed-off-by: Shameer Kolothum
>>>> <shameerali.kolothum.thodi@huawei.com>
>>>>> ---
>>>>>  hw/arm/virt.c         | 12 ++++++++++++
>>>>>  hw/core/sysbus-fdt.c  |  1 +
>>>>>  include/hw/arm/virt.h |  1 +
>>>>>  3 files changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c index
>>>>> 4a5a9666e9..84a323da55 100644
>>>>> --- a/hw/arm/virt.c
>>>>> +++ b/hw/arm/virt.c
>>>>> @@ -73,6 +73,7 @@
>>>>>  #include "qobject/qlist.h"
>>>>>  #include "standard-headers/linux/input.h"
>>>>>  #include "hw/arm/smmuv3.h"
>>>>> +#include "hw/arm/smmuv3-accel.h"
>>>>>  #include "hw/acpi/acpi.h"
>>>>>  #include "target/arm/cpu-qom.h"
>>>>>  #include "target/arm/internals.h"
>>>>> @@ -2911,6 +2912,16 @@ static void
>>>> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>>>>>              platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
>>>>> platform_bus_dev),
>>>>>                                       SYS_BUS_DEVICE(dev));
>>>>>          }
>>>>> +        if (object_dynamic_cast(OBJECT(dev),
>>>>> + TYPE_ARM_SMMUV3_ACCEL))
>>>> {
>>>>> +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
>>>> maybe just check whether it is != VIRT_IOMMU_NONE?
>>>>> +                error_setg(errp,
>>>>> +                           "iommu=smmuv3 is already specified.
>>>>> + can't create
>>>> smmuv3-accel dev");
>>>> I would clearly state "iommu=smmuv3 virt machine option is alreadt set"
>>>> and use an error hint to say both are not compatible.
>>>>> +                return;
>>>>> +            }
>>>>> +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
>>>>> +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
>>>> I know there were quite a lot of dicussions on the 1st multi
>>>> instantiation series related to the way we instanatiate that device
>>>> and maybe I missed some blockers but why wouldn't we allow the
>>>> instantiation of the legacy smmu device with -device too. I think
>>>> this would be simpler for libvirt and we would somehow deprecate the
>>>> machine option method? would that make a problem if you were to use
>>>> -device smmu,accel or something alike?
>>> Thanks for taking a look. I am just jumping on this one for now.  Yes,
>>> there were discussions around that. But I was not sure we concluded on
>>> deprecating the machine option. So if I get you correctly the idea is,
>>>
>>> if we have,
>>> -device smmuv3 it will instantiate the current machine wide smmuv3 and
>>> for -device smmuv3,accel this device?
>> yes that would be my preference.
> Ok. I will look into that in my next respin. A quick question. Does qemu
> DEVICE model support the differentiation like above easily? Or we have
> to manage it with properties?
Not sure I understand your question. I meant it can be a boolean
device property (DEFINE_PROP_BOOL): smmuv3,accel=on

No?

Eric
>
> Any example device implementation like above already there?
> Please let me know.
>
> Thanks,
> Shameer
>
>
>  
>
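To make the "one device type, one boolean property" idea from this reply concrete, here is a minimal standalone sketch of how a single `accel` flag could select between the two behaviours. It does not use QEMU's property machinery; `SmmuOpts` and `parse_device_arg` are hypothetical names, and the accepted syntax (`driver[,accel=on|off]`) is an assumption made for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical option block; models a boolean device property. */
typedef struct SmmuOpts {
    bool accel;   /* defaults to off, i.e. the plain emulated device */
} SmmuOpts;

/*
 * Parse "driver" or "driver,accel=on|off".
 * Returns false if the option after the comma is not recognised.
 */
static bool parse_device_arg(const char *arg, SmmuOpts *out)
{
    const char *comma = strchr(arg, ',');
    const char *opt;

    out->accel = false;
    if (!comma) {
        return true;            /* no options: keep the default */
    }
    opt = comma + 1;
    if (strcmp(opt, "accel=on") == 0) {
        out->accel = true;
        return true;
    }
    if (strcmp(opt, "accel=off") == 0) {
        return true;
    }
    return false;               /* unknown key or value */
}
```

The point of the sketch is that both variants share one type name and differ only in a property value, rather than requiring two separately named device types.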



^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-12 16:07   ` Eric Auger
@ 2025-03-12 16:34     ` Shameerali Kolothum Thodi via
  2025-03-12 16:39       ` Daniel P. Berrangé
  2025-03-12 16:42       ` Eric Auger
  0 siblings, 2 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-12 16:34 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, March 12, 2025 4:08 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> pcie bus
> 
> Hi Shameer,
> 
> 
> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > User must associate a pxb-pcie root bus to smmuv3-accel
> > and that is set as the primary-bus for the smmu dev.
> why do we require a pxb-pcie root bus? why can't pci.0 root bus be used
> for simpler use cases (ie. I just want to passthough a NIC in
> accelerated mode). Or may pci.0 is also called a pax-pcie root bus?

The idea was that, since pcie.0 is the default RC with virt, we leave it for
any emulated devices we want to attach and use pxb-pcie based RCs for vfio-pci.

> 
> Besides, why do we put the constraint to plug on a root bus. I know that
> at this point we always plug to pci.0 but with the new -device option it
> would be possible to plug it anywhere in the pcie hierarchy. At SOC
> level can't an SMMU be plugged anywhere protecting just a few RIDs?

In my understanding, normally (or at least in the most common cases) it is
attached to root complexes. Also, IORT mappings are at the root complex level,
right?

To 
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> > index c327661636..1471b65374 100644
> > --- a/hw/arm/smmuv3-accel.c
> > +++ b/hw/arm/smmuv3-accel.c
> > @@ -9,6 +9,21 @@
> >  #include "qemu/osdep.h"
> >
> >  #include "hw/arm/smmuv3-accel.h"
> > +#include "hw/pci/pci_bridge.h"
> > +
> > +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
> > +{
> > +    DeviceState *d = opaque;
> > +
> > +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
> > +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
> > +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus-
> >name)) {
> > +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> > +                                     &error_abort);
> if you want to stop the recursive search I think you need to return
> something != 0 here.

Ok.
 
> I don't really understand why we don't simply set the primary-bus to
> <bus> where -device arm-smmuv3-accel, bus=<bus>? or maybe enforce that
> this bus is an actual root bus if we really need that?

The primary-bus here is actually a property of the parent SMMU device.
This device now uses the

-device arm-smmuv3-accel,bus=<bus> format.


> > +        }
> > +    }
> > +    return 0;
> > +}
> >
> >  static void smmu_accel_realize(DeviceState *d, Error **errp)
> >  {
> > @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error
> **errp)
> >      SysBusDevice *dev = SYS_BUS_DEVICE(d);
> >      Error *local_err = NULL;
> >
> > +    object_child_foreach_recursive(object_get_root(),
> > +                                   smmuv3_accel_pxb_pcie_bus, d);
> > +
> >      object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
> >      c->parent_realize(d, &local_err);
> >      if (local_err) {
> > @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass
> *klass, void *data)
> >      device_class_set_parent_realize(dc, smmu_accel_realize,
> >                                      &c->parent_realize);
> >      dc->hotpluggable = false;
> > +    dc->bus_type = TYPE_PCIE_BUS;
> shouldn't it below to 3/20? It is not really related to primary_bus
> setting?

This is to set the bus property of this smmuv3-accel device. As mentioned 
above primary-bus is the property of parent TYPE_ARM_SMMU device.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-12 16:34     ` Shameerali Kolothum Thodi via
@ 2025-03-12 16:39       ` Daniel P. Berrangé
  2025-03-12 17:28         ` Shameerali Kolothum Thodi via
  2025-03-12 16:42       ` Eric Auger
  1 sibling, 1 reply; 145+ messages in thread
From: Daniel P. Berrangé @ 2025-03-12 16:39 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

On Wed, Mar 12, 2025 at 04:34:18PM +0000, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
> > -----Original Message-----
> > From: Eric Auger <eric.auger@redhat.com>
> > Sent: Wednesday, March 12, 2025 4:08 PM
> > To: Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> > qemu-devel@nongnu.org
> > Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> > ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> > mochs@nvidia.com; smostafa@google.com; Linuxarm
> > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> > pcie bus
> > 
> > Hi Shameer,
> > 
> > 
> > On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > > User must associate a pxb-pcie root bus to smmuv3-accel
> > > and that is set as the primary-bus for the smmu dev.
> > why do we require a pxb-pcie root bus? why can't pci.0 root bus be used
> > for simpler use cases (ie. I just want to passthough a NIC in
> > accelerated mode). Or may pci.0 is also called a pax-pcie root bus?
> 
> The idea was since pcie.0 is the default RC with virt, leave that to cases where
> we want to attach any emulated devices and use pxb-pcie based RCs for vfio-pci.

The majority of management applications will never do anything other
than a flat PCI(e) topology by default. Some might enable pxb-pcie as
an option, but plenty won't ever support it. If you want to maximise
the potential usefulness of smmuv3-accel, and it is technically
viable, it would be worth permitting attachment to the root bus as an
alternative to the pxb.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-12 16:34     ` Shameerali Kolothum Thodi via
  2025-03-12 16:39       ` Daniel P. Berrangé
@ 2025-03-12 16:42       ` Eric Auger
  2025-03-13  8:22         ` Shameerali Kolothum Thodi via
  1 sibling, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 16:42 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org




On 3/12/25 5:34 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, March 12, 2025 4:08 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>> pcie bus
>>
>> Hi Shameer,
>>
>>
>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>> and that is set as the primary-bus for the smmu dev.
>> why do we require a pxb-pcie root bus? why can't pci.0 root bus be used
>> for simpler use cases (ie. I just want to passthough a NIC in
>> accelerated mode). Or may pci.0 is also called a pax-pcie root bus?
> The idea was since pcie.0 is the default RC with virt, leave that to cases where
> we want to attach any emulated devices and use pxb-pcie based RCs for vfio-pci.
Yes, but for simpler use cases you may not want the extra pain of
instantiating a pxb-pcie device. Actually, libvirt does not instantiate it
by default.
>
>> Besides, why do we put the constraint to plug on a root bus. I know that
>> at this point we always plug to pci.0 but with the new -device option it
>> would be possible to plug it anywhere in the pcie hierarchy. At SOC
>> level can't an SMMU be plugged anywhere protecting just a few RIDs?
> In my understanding normally(or atleast in the most common cases) it is attached 
> to root complexes. Also IORT mappings are at the root complex level, right?
Yes, I agree that the IORT describes ID mappings between RC and SMMU, but
the actual ID mappings allow you to be much more precise and state that
a given SMMU only translates a few RIDs within that RID space. If you
force the device bus to be a root bus, you can't model that anymore.
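
As a rough illustration of what "only a few RIDs" means, here is a
simplified sketch of range-based ID mapping. The IdMapping struct and the
map_rid() and pci_rid() helpers are hypothetical and do not reproduce the
exact IORT table encoding; they only assume that a mapping covers a
contiguous input range of requester IDs and rebases it onto an output
(stream ID) range owned by one SMMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified ID mapping: requester IDs in
 * [input_base, input_base + count) are translated by one SMMU.
 */
typedef struct IdMapping {
    uint32_t input_base;  /* first RID covered by this mapping */
    uint32_t count;       /* number of RIDs covered */
    uint32_t output_base; /* stream ID assigned to input_base */
    int smmu_index;       /* which SMMU instance handles the range */
} IdMapping;

/* Build a PCI requester ID from bus/device/function numbers. */
static uint32_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)bus << 8) | ((uint32_t)(dev & 0x1f) << 3) |
           (uint32_t)(fn & 0x7);
}

/*
 * Look up @rid. Returns the index of the SMMU that translates it and
 * stores the stream ID in @sid, or returns -1 if no mapping covers
 * the RID (i.e. the device is not behind any SMMU in this model).
 */
static int map_rid(const IdMapping *maps, size_t n, uint32_t rid,
                   uint32_t *sid)
{
    assert(sid != NULL);

    for (size_t i = 0; i < n; i++) {
        const IdMapping *m = &maps[i];

        if (rid >= m->input_base && rid - m->input_base < m->count) {
            *sid = m->output_base + (rid - m->input_base);
            return m->smmu_index;
        }
    }
    return -1;
}
```

A mapping such as { pci_rid(0, 1, 0), 8, 0x100, 0 } would cover only the
eight functions of device 00:01, leaving every other RID on the same root
bus untranslated.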

Eric
>
> To 
>>> Signed-off-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@huawei.com>
>>> ---
>>>  hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>>>  1 file changed, 19 insertions(+)
>>>
>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>>> index c327661636..1471b65374 100644
>>> --- a/hw/arm/smmuv3-accel.c
>>> +++ b/hw/arm/smmuv3-accel.c
>>> @@ -9,6 +9,21 @@
>>>  #include "qemu/osdep.h"
>>>
>>>  #include "hw/arm/smmuv3-accel.h"
>>> +#include "hw/pci/pci_bridge.h"
>>> +
>>> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
>>> +{
>>> +    DeviceState *d = opaque;
>>> +
>>> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
>>> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
>>> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus-
>>> name)) {
>>> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
>>> +                                     &error_abort);
>> if you want to stop the recursive search I think you need to return
>> something != 0 here.
> Ok.
>  
>> I don't really understand why we don't simply set the primary-bus to
>> <bus> where -device arm-smmuv3-accel, bus=<bus>? or maybe enforce that
>> this bus is an actual root bus if we really need that?
> The primary-bus here is actually the property of the parent SMMU device.
> This one now has,
>
> -device arm-smmuv3-accel, bus format.
>
>
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>>
>>>  static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>  {
>>> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error
>> **errp)
>>>      SysBusDevice *dev = SYS_BUS_DEVICE(d);
>>>      Error *local_err = NULL;
>>>
>>> +    object_child_foreach_recursive(object_get_root(),
>>> +                                   smmuv3_accel_pxb_pcie_bus, d);
>>> +
>>>      object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
>>>      c->parent_realize(d, &local_err);
>>>      if (local_err) {
>>> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass
>> *klass, void *data)
>>>      device_class_set_parent_realize(dc, smmu_accel_realize,
>>>                                      &c->parent_realize);
>>>      dc->hotpluggable = false;
>>> +    dc->bus_type = TYPE_PCIE_BUS;
>> shouldn't it below to 3/20? It is not really related to primary_bus
>> setting?
> This is to set the bus property of this smmuv3-accel device. As mentioned 
> above primary-bus is the property of parent TYPE_ARM_SMMU device.
>
> Thanks,
> Shameer



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  2025-03-11 14:10 ` [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps Shameer Kolothum via
  2025-03-12 16:23   ` Eric Auger
@ 2025-03-12 17:10   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-12 17:10 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao




On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Subsequently smmuv3-accel will provide these callbacks
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmu-common.c         | 27 +++++++++++++++++++++++++++
>  include/hw/arm/smmu-common.h |  5 +++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 83c0693f5a..9fd455baa0 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -865,6 +865,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>      SMMUState *s = opaque;
>      SMMUPciBus *sbus = smmu_get_sbus(s, bus);
>  
> +    if (s->accel && s->get_address_space) {
> +        return s->get_address_space(bus, opaque, devfn);
> +    }
After reading the subsequent patch, the code below should be another
implementation of this specific cb, for the non-accelerated SMMU. So in that
patch you could introduce the implementation for the non-accelerated SMMU.

Eric
> +
>      sdev = sbus->pbdev[devfn];
>      if (!sdev) {
>          sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
> @@ -874,8 +878,31 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>      return &sdev->as;
>  }
>  
> +static bool smmu_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> +                                      HostIOMMUDevice *hiod, Error **errp)
> +{
> +    SMMUState *s = opaque;
> +
> +    if (s->accel && s->set_iommu_device) {
> +        return s->set_iommu_device(bus, opaque, devfn, hiod, errp);
> +    }
> +
> +    return false;
> +}
> +
> +static void smmu_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
> +{
> +    SMMUState *s = opaque;
> +
> +    if (s->accel && s->unset_iommu_device) {
> +        s->unset_iommu_device(bus, opaque, devfn);
> +    }
> +}
> +
>  static const PCIIOMMUOps smmu_ops = {
>      .get_address_space = smmu_find_add_as,
> +    .set_iommu_device = smmu_dev_set_iommu_device,
> +    .unset_iommu_device = smmu_dev_unset_iommu_device,
>  };
>  
>  SMMUDevice *smmu_find_sdev(SMMUState *s, uint32_t sid)
> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index 80ff2ef6aa..7b05640167 100644
> --- a/include/hw/arm/smmu-common.h
> +++ b/include/hw/arm/smmu-common.h
> @@ -160,6 +160,11 @@ struct SMMUState {
>  
>      /* For smmuv3-accel */
>      bool accel;
> +
> +    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
> +    bool (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
> +                             HostIOMMUDevice *dev, Error **errp);
> +    void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
>  };
>  
>  struct SMMUBaseClass {



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback
  2025-03-11 14:10 ` [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback Shameer Kolothum via
  2025-03-11 20:50   ` Nicolin Chen
@ 2025-03-12 17:14   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-12 17:14 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao


Hi Shameer,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Also introduce a struct SMMUv3AccelDevice to hold accelerator specific
> device info. This will be populated accordingly in subsequent patches.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c         | 36 +++++++++++++++++++++++++++++++++++
>  include/hw/arm/smmuv3-accel.h |  4 ++++
>  2 files changed, 40 insertions(+)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 1471b65374..6610ebe4be 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -11,6 +11,40 @@
>  #include "hw/arm/smmuv3-accel.h"
>  #include "hw/pci/pci_bridge.h"
>  
> +static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
> +                                                PCIBus *bus, int devfn)
> +{
> +    SMMUDevice *sdev = sbus->pbdev[devfn];
> +    SMMUv3AccelDevice *accel_dev;
> +
> +    if (sdev) {
> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +    } else {
> +        accel_dev = g_new0(SMMUv3AccelDevice, 1);
> +        sdev = &accel_dev->sdev;
> +
> +        sbus->pbdev[devfn] = sdev;
> +        smmu_init_sdev(s, sdev, bus, devfn);
> +    }
> +
> +    return accel_dev;
> +}
> +
> +static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
> +                                              int devfn)
If you reimplement the ops for the accelerated smmu why did you need to
add:

+++ b/hw/arm/smmu-common.c
@@ -865,6 +865,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
     SMMUState *s = opaque;
     SMMUPciBus *sbus = smmu_get_sbus(s, bus);
 
+    if (s->accel && s->get_address_space) {
+        return s->get_address_space(bus, opaque, devfn);
+    }

in 7/20?

Eric

> +{
> +    SMMUState *s = opaque;
> +    SMMUPciBus *sbus;
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUDevice *sdev;
> +
> +    sbus = smmu_get_sbus(s, bus);
> +    accel_dev = smmuv3_accel_get_dev(s, sbus, bus, devfn);
> +    sdev = &accel_dev->sdev;
> +
> +    return &sdev->as;
> +}
> +
>  static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
>  {
>      DeviceState *d = opaque;
> @@ -30,6 +64,7 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>      SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(d);
>      SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_GET_CLASS(s_accel);
>      SysBusDevice *dev = SYS_BUS_DEVICE(d);
> +    SMMUState *bs = ARM_SMMU(d);
>      Error *local_err = NULL;
>  
>      object_child_foreach_recursive(object_get_root(),
> @@ -41,6 +76,7 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>          error_propagate(errp, local_err);
>          return;
>      }
> +    bs->get_address_space = smmuv3_accel_find_add_as;
>  }
>  
>  static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index 56fe376bf4..86c0523063 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -16,6 +16,10 @@
>  #define TYPE_ARM_SMMUV3_ACCEL   "arm-smmuv3-accel"
>  OBJECT_DECLARE_TYPE(SMMUv3AccelState, SMMUv3AccelClass, ARM_SMMUV3_ACCEL)
>  
> +typedef struct SMMUv3AccelDevice {
> +    SMMUDevice  sdev;
> +} SMMUv3AccelDevice;
> +
>  struct SMMUv3AccelState {
>      SMMUv3State smmuv3_state;
>  };



^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-12 16:39       ` Daniel P. Berrangé
@ 2025-03-12 17:28         ` Shameerali Kolothum Thodi via
  2025-03-13 15:21           ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-12 17:28 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Wednesday, March 12, 2025 4:39 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: eric.auger@redhat.com; qemu-arm@nongnu.org; qemu-
> devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> pcie bus
> 
> On Wed, Mar 12, 2025 at 04:34:18PM +0000, Shameerali Kolothum Thodi
> wrote:
> > Hi Eric,
> >
> > > -----Original Message-----
> > > From: Eric Auger <eric.auger@redhat.com>
> > > Sent: Wednesday, March 12, 2025 4:08 PM
> > > To: Shameerali Kolothum Thodi
> > > <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> > > qemu-devel@nongnu.org
> > > Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> > > ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> > > mochs@nvidia.com; smostafa@google.com; Linuxarm
> > > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > > Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
> pxb-
> > > pcie bus
> > >
> > > Hi Shameer,
> > >
> > >
> > > On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > > > User must associate a pxb-pcie root bus to smmuv3-accel
> > > > and that is set as the primary-bus for the smmu dev.
> > > why do we require a pxb-pcie root bus? why can't pci.0 root bus be used
> > > for simpler use cases (ie. I just want to passthough a NIC in
> > > accelerated mode). Or may pci.0 is also called a pax-pcie root bus?
> >
> > The idea was since pcie.0 is the default RC with virt, leave that to cases
> where
> > we want to attach any emulated devices and use pxb-pcie based RCs for
> vfio-pci.
> 
> The majority of management applications will never do anything other
> than a flat PCI(e) topology by default. Some might enable pxb-pcie as
> an optional but plenty won't ever support it. If you want to maximise
> the potential usefulness of the ssmmuv3-accel, and it is technically
> viable, it would be worth permitting choice of attachment to the root
> bus as an alteranative to the pxb.

Ok. I will look into this. Though I am not sure whether, with one smmuv3-accel
attached to pcie.0, we can still have additional smmuv3-accel instances on
pxb-pcie. It looks like pxb-pcie will be plugged into pcie.0, and if that is
the case the IORT mappings will be difficult, I guess. I need to double check.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-12 16:27           ` Eric Auger
@ 2025-03-12 17:34             ` Shameerali Kolothum Thodi via
  2025-03-12 18:30               ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-12 17:34 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, March 12, 2025 4:28 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
> accel

> >> Hi Shameer,
> >>>> I know there were quite a lot of dicussions on the 1st multi
> >>>> instantiation series related to the way we instanatiate that device
> >>>> and maybe I missed some blockers but why wouldn't we allow the
> >>>> instantiation of the legacy smmu device with -device too. I think
> >>>> this would be simpler for libvirt and we would somehow deprecate
> >>>> the machine option method? would that make a problem if you were
> to
> >>>> use -device smmu,accel or something alike?
> >>> Thanks for taking a look. I am just jumping on this one for now.
> >>> Yes, there were discussions around that. But I was not sure we
> >>> concluded on deprecating the machine option. So if I get you
> >>> correctly the idea is,
> >>>
> >>> if we have,
> >>> -device smmuv3 it will instantiate the current machine wide smmuv3
> >>> and for -device smmuv3,accel this device?
> >> yes that would be my preference.
> > Ok. I will look into that in my next respin. A quick question. Does
> > qemu DEVICE model support the differentiation like above easily? Or we
> > have to manage it with properties?
> Not sure if I understand you question. I meant it can be a boolean device
> property (DEFINE_PROP_BOOL) smmuv3,accel=on
> 
> No?

Right. My query was more about whether there is any hidden QEMU magic that
keeps device instantiation similar to what we have at the moment, even though
we name both devices "smmuv3".
 
That way I could keep much of the code rather than checking the "accel"
property in the SMMUv3 code and redirecting calls. But it looks like there isn't.

Thanks,
Shameer





^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-12 17:34             ` Shameerali Kolothum Thodi via
@ 2025-03-12 18:30               ` Eric Auger
  2025-03-13  8:26                 ` Shameerali Kolothum Thodi via
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-12 18:30 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Shameer,


On 3/12/25 6:34 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, March 12, 2025 4:28 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
>> accel
>>>> Hi Shameer,
>>>>>> I know there were quite a lot of dicussions on the 1st multi
>>>>>> instantiation series related to the way we instanatiate that device
>>>>>> and maybe I missed some blockers but why wouldn't we allow the
>>>>>> instantiation of the legacy smmu device with -device too. I think
>>>>>> this would be simpler for libvirt and we would somehow deprecate
>>>>>> the machine option method? would that make a problem if you were
>> to
>>>>>> use -device smmu,accel or something alike?
>>>>> Thanks for taking a look. I am just jumping on this one for now.
>>>>> Yes, there were discussions around that. But I was not sure we
>>>>> concluded on deprecating the machine option. So if I get you
>>>>> correctly the idea is,
>>>>>
>>>>> if we have,
>>>>> -device smmuv3 it will instantiate the current machine wide smmuv3
>>>>> and for -device smmuv3,accel this device?
>>>> yes that would be my preference.
>>> Ok. I will look into that in my next respin. A quick question. Does
>>> qemu DEVICE model support the differentiation like above easily? Or we
>>> have to manage it with properties?
>> Not sure if I understand you question. I meant it can be a boolean device
>> property (DEFINE_PROP_BOOL) smmuv3,accel=on
>>
>> No?
> Right. My query was more about any hidden Qemu magic to have device instantiation
> similar to what we have at the moment even though we name both devices "smmuv3".
>  
> That way I can keep much of the code rather than checking "accel" property
> in SMMUv3 code and redirecting calls. But looks like not. 
I don't think there is any such trick.

Having the legacy device (without accel) only instantiable with the virt
machine option and the new accelerated one only instantiable with a
-device option looks strange to me. By the way they model the same
device so I think it makes more sense to use the same device with an
option.

Also, do you see anything that would prevent the acceleration-enhanced
device from being able to translate emulated devices as well? Ideally
the smmu device should react differently depending on the device being
translated. I think it worked with the original implementation, as far
as I remember.

Thanks

Eric
>
> Thanks,
> Shameer
>
>
>
>



^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  2025-03-12 16:23   ` Eric Auger
@ 2025-03-13  8:09     ` Shameerali Kolothum Thodi via
  2025-03-17 16:52       ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-13  8:09 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, March 12, 2025 4:24 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce
> callbacks for PCIIOMMUOps
> 
> 
> 
> 
> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > Subsequently smmuv3-accel will provide these callbacks
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  hw/arm/smmu-common.c         | 27 +++++++++++++++++++++++++++
> >  include/hw/arm/smmu-common.h |  5 +++++
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c index
> > 83c0693f5a..9fd455baa0 100644
> > --- a/hw/arm/smmu-common.c
> > +++ b/hw/arm/smmu-common.c
> > @@ -865,6 +865,10 @@ static AddressSpace *smmu_find_add_as(PCIBus
> *bus, void *opaque, int devfn)
> >      SMMUState *s = opaque;
> >      SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> >
> > +    if (s->accel && s->get_address_space) {
> > +        return s->get_address_space(bus, opaque, devfn);
> > +    }
> > +
> why do we require that new call site? This needs to be documented in the
> commit msg esp. because we don't know what this cb will do.

Currently, this is where the SMMUDevice sdev is first allocated. For the
smmuv3-accel case we are introducing a new SMMUv3AccelDevice accel_dev to hold
additional accel-specific information, and the above cb is used to allocate it.
The same applies to the other callbacks.
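
The embedding pattern can be sketched in isolation as follows. This is a
toy model only: BaseDev, AccelDev, the viommu_id field and accel_get_dev()
are hypothetical stand-ins for SMMUDevice, SMMUv3AccelDevice and the
lookup helper in patch 8, and the container_of macro is defined locally so
the snippet builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define NR_DEVFN 256

/* Stand-in for the common per-device state. */
typedef struct BaseDev {
    int devfn;
} BaseDev;

/* Stand-in for the accel variant: the base state plus extra fields. */
typedef struct AccelDev {
    BaseDev base;  /* embedded, so common code can keep using BaseDev * */
    int viommu_id; /* hypothetical accel-specific field */
} AccelDev;

/*
 * Return the AccelDev for @devfn, allocating it on first use. The table
 * stores BaseDev pointers so that common code is unaware of AccelDev.
 */
static AccelDev *accel_get_dev(BaseDev **table, int devfn)
{
    BaseDev *base;
    AccelDev *accel;

    assert(devfn >= 0 && devfn < NR_DEVFN);

    base = table[devfn];
    if (base) {
        /* Already allocated: step back from the member to the container. */
        return container_of(base, AccelDev, base);
    }

    accel = calloc(1, sizeof(*accel));
    if (!accel) {
        return NULL;
    }
    accel->base.devfn = devfn;
    accel->viommu_id = -1; /* nothing attached yet */
    table[devfn] = &accel->base;
    return accel;
}
```

This only works if every entry in the table was allocated as an AccelDev.
If the common path had allocated a bare BaseDev for the same slot first,
container_of would yield a pointer to memory that is not an AccelDev,
which is why the first allocation has to go through the accel-specific
path.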

Another way of avoiding the callbacks would be to move the pci_setup_iommu(bus, ops)
call from smmu-common.c to smmuv3/smmuv3-accel and handle it directly there.

Or is there a better idea?
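
A minimal sketch of that alternative, assuming each variant registers its
own constant ops table instead of the common code testing an accel flag.
Everything here (Bus, IommuOps, bus_setup_iommu() and the emulated_/accel_
helpers) is a hypothetical toy model rather than the PCIIOMMUOps API
itself:

```c
#include <assert.h>
#include <stddef.h>

enum AsKind { AS_EMULATED, AS_ACCEL };

/* Toy ops table; set_iommu_device is optional and may be NULL. */
typedef struct IommuOps {
    enum AsKind (*get_address_space)(void *opaque, int devfn);
    int (*set_iommu_device)(void *opaque, int devfn);
} IommuOps;

typedef struct Bus {
    const IommuOps *ops;
    void *opaque;
} Bus;

/* Each device variant calls this once with its own ops table. */
static void bus_setup_iommu(Bus *bus, const IommuOps *ops, void *opaque)
{
    assert(ops != NULL && ops->get_address_space != NULL);
    bus->ops = ops;
    bus->opaque = opaque;
}

static enum AsKind bus_get_address_space(Bus *bus, int devfn)
{
    return bus->ops->get_address_space(bus->opaque, devfn);
}

/* Returns 0 when the variant does not implement the optional hook. */
static int bus_set_iommu_device(Bus *bus, int devfn)
{
    if (!bus->ops->set_iommu_device) {
        return 0;
    }
    return bus->ops->set_iommu_device(bus->opaque, devfn);
}

/* Emulated variant: only the mandatory callback. */
static enum AsKind emulated_get_as(void *opaque, int devfn)
{
    (void)opaque;
    (void)devfn;
    return AS_EMULATED;
}

/* Accel variant: its own address-space lookup plus the optional hook. */
static enum AsKind accel_get_as(void *opaque, int devfn)
{
    (void)opaque;
    (void)devfn;
    return AS_ACCEL;
}

static int accel_set_dev(void *opaque, int devfn)
{
    (void)opaque;
    (void)devfn;
    return 1;
}

static const IommuOps emulated_ops = {
    .get_address_space = emulated_get_as,
};

static const IommuOps accel_ops = {
    .get_address_space = accel_get_as,
    .set_iommu_device = accel_set_dev,
};
```

The trade-off in this model is that the common code no longer needs any
"if (s->accel && ...)" dispatch, at the cost of each variant owning the
registration call.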

> >      sdev = sbus->pbdev[devfn];
> >      if (!sdev) {
> >          sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1); @@ -874,8
> > +878,31 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void
> *opaque, int devfn)
> >      return &sdev->as;
> >  }
> >
> > +static bool smmu_dev_set_iommu_device(PCIBus *bus, void *opaque,
> int devfn,
> > +                                      HostIOMMUDevice *hiod, Error
> > +**errp) {
> > +    SMMUState *s = opaque;
> > +
> > +    if (s->accel && s->set_iommu_device) {
> > +        return s->set_iommu_device(bus, opaque, devfn, hiod, errp);
> > +    }
> > +
> > +    return false;
> > +}
> > +
> > +static void smmu_dev_unset_iommu_device(PCIBus *bus, void *opaque,
> > +int devfn) {
> > +    SMMUState *s = opaque;
> > +
> > +    if (s->accel && s->unset_iommu_device) {
> > +        s->unset_iommu_device(bus, opaque, devfn);
> > +    }
> > +}
> > +
> >  static const PCIIOMMUOps smmu_ops = {
> >      .get_address_space = smmu_find_add_as,
> > +    .set_iommu_device = smmu_dev_set_iommu_device,
> > +    .unset_iommu_device = smmu_dev_unset_iommu_device,
> >  };
> >
> >  SMMUDevice *smmu_find_sdev(SMMUState *s, uint32_t sid) diff --git
> > a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
> index
> > 80ff2ef6aa..7b05640167 100644
> > --- a/include/hw/arm/smmu-common.h
> > +++ b/include/hw/arm/smmu-common.h
> > @@ -160,6 +160,11 @@ struct SMMUState {
> >
> >      /* For smmuv3-accel */
> >      bool accel;
> > +
> > +    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int
> devfn);
> > +    bool (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
> > +                             HostIOMMUDevice *dev, Error **errp);
> > +    void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
> I think this should be exposed by a class and only implemented in the
> smmuv3 accel device. Adding those cbs directly in the State looks not the
> std way.

Ok. You mean we can directly place PCIIOMMUOps *ops here then?

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-12 16:42       ` Eric Auger
@ 2025-03-13  8:22         ` Shameerali Kolothum Thodi via
  2025-03-17 16:57           ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-13  8:22 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, March 12, 2025 4:42 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> pcie bus
> 
> 
> 
> 
> On 3/12/25 5:34 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger <eric.auger@redhat.com>
> >> Sent: Wednesday, March 12, 2025 4:08 PM
> >> To: Shameerali Kolothum Thodi
> >> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org
> >> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> >> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> >> mochs@nvidia.com; smostafa@google.com; Linuxarm
> >> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> >> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> >> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> >> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
> pxb-
> >> pcie bus
> >>
> >> Hi Shameer,
> >>
> >>
> >> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> >>> User must associate a pxb-pcie root bus to smmuv3-accel
> >>> and that is set as the primary-bus for the smmu dev.
> >> why do we require a pxb-pcie root bus? why can't pci.0 root bus be used
> >> for simpler use cases (ie. I just want to passthough a NIC in
> >> accelerated mode). Or may pci.0 is also called a pax-pcie root bus?
> > The idea was since pcie.0 is the default RC with virt, leave that to cases
> where
> > we want to attach any emulated devices and use pxb-pcie based RCs for
> vfio-pci.
> yes but for simpler use case you may not want the extra pain to
> instantiate a pxb-pcie device. Actually libvirt does not instantiate it
> by default.
> >
> >> Besides, why do we put the constraint to plug on a root bus. I know that
> >> at this point we always plug to pci.0 but with the new -device option it
> >> would be possible to plug it anywhere in the pcie hierarchy. At SOC
> >> level can't an SMMU be plugged anywhere protecting just a few RIDs?
> > In my understanding normally(or atleast in the most common cases) it is
> attached
> > to root complexes. Also IORT mappings are at the root complex level,
> right?
> Yes I do agree the IORT describes ID mappings between RC and SMMU but
> the actual ID mappings allow you to be much more precise and state that
> a given SMMU only translates few RIDs within that RID space. If you
> force the device bus to be a root bus you can't model that anymore.
>

Do we really need to support that? What if the user then has another smmuv3-accel
on the associated upstream buses/RC as well? I am not sure how to handle that.
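
To make the ID-mapping point quoted above concrete, here is a minimal
sketch in plain C. It is not the ACPI IORT table layout and not QEMU code;
SketchIdMap, sketch_rid() and sketch_map_rid() are hypothetical names, and
num_ids is a plain count (as far as I recall, the real IORT field stores the
count minus one). It only shows how a mapping can cover a sub-range of the
RID space rather than everything behind a root complex:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ID mapping; not the ACPI IORT layout. */
typedef struct SketchIdMap {
    uint32_t input_base;   /* first RID covered by this mapping */
    uint32_t num_ids;      /* number of RIDs covered (plain count) */
    uint32_t output_base;  /* StreamID assigned to input_base */
} SketchIdMap;

/* PCI requester ID: bus number in bits [15:8], devfn in bits [7:0]. */
static uint32_t sketch_rid(uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)bus << 8) | devfn;
}

/* Translate a RID to a StreamID; return false if the RID is not covered. */
static bool sketch_map_rid(const SketchIdMap *map, uint32_t rid, uint32_t *sid)
{
    assert(map != NULL && sid != NULL);

    if (rid < map->input_base || rid - map->input_base >= map->num_ids) {
        return false;
    }
    *sid = map->output_base + (rid - map->input_base);
    return true;
}
```

With input_base = 0x100 and num_ids = 0x100, only devices on bus 1 are
translated; RIDs on any other bus fall outside the mapping.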
 
Thanks,
Shameer

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-12 18:30               ` Eric Auger
@ 2025-03-13  8:26                 ` Shameerali Kolothum Thodi via
  2025-03-13 15:22                   ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-13  8:26 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, March 12, 2025 6:31 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
> accel
> 
> Hi Shameer,
> 
> 
> >>>>> Thanks for taking a look. I am just jumping on this one for now.
> >>>>> Yes, there were discussions around that. But I was not sure we
> >>>>> concluded on deprecating the machine option. So if I get you
> >>>>> correctly the idea is,
> >>>>>
> >>>>> if we have,
> >>>>> -device smmuv3 it will instantiate the current machine wide smmuv3
> >>>>> and for -device smmuv3,accel this device?
> >>>> yes that would be my preference.
> >>> Ok. I will look into that in my next respin. A quick question. Does
> >>> qemu DEVICE model support the differentiation like above easily? Or
> we
> >>> have to manage it with properties?
> >> Not sure if I understand you question. I meant it can be a boolean
> device
> >> property (DEFINE_PROP_BOOL) smmuv3,accel=on
> >>
> >> No?
> > Right. My query was more about any hidden Qemu magic to have device
> instantiation
> > similar to what we have at the moment even though we name both
> devices "smmuv3".
> >
> > That way I can keep much of the code rather than checking "accel"
> property
> > in SMMUv3 code and redirecting calls. But looks like not.
> I don't think there is any such a trick.
> 
> Having the legacy device (without accel) only instantiable with the virt
> machine option and the new accelerated one only instantiable with a
> -device option looks strange to me. By the way they model the same
> device so I think it makes more sense to use the same device with an
> option.

Ok. Will address that in the next respin.

> Also do you see anything that would prevent the acceleration enhanced
> device from being able to translate emulated devices as well. Ideally
> the smmu device should react differently depending on the device which
> is translated. I think it worked with the original implementation as far
> as I remember.

Yes, smmuv3-accel works with emulated devices as well. Currently the only
limitation is that at least one vfio-pci device must be cold-plugged, as
mentioned in the cover letter. Hopefully we will be able to resolve that
restriction soon.

Thanks,
Shameer


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-12 17:28         ` Shameerali Kolothum Thodi via
@ 2025-03-13 15:21           ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-13 15:21 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Daniel P. Berrangé
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org


Hi Shameer,

On 3/12/25 6:28 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Daniel P. Berrangé <berrange@redhat.com>
>> Sent: Wednesday, March 12, 2025 4:39 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: eric.auger@redhat.com; qemu-arm@nongnu.org; qemu-
>> devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
>> nicolinc@nvidia.com; ddutile@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>> pcie bus
>>
>> On Wed, Mar 12, 2025 at 04:34:18PM +0000, Shameerali Kolothum Thodi
>> wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>> Sent: Wednesday, March 12, 2025 4:08 PM
>>>> To: Shameerali Kolothum Thodi
>>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org
>>>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>>>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
>> pxb-
>>>> pcie bus
>>>>
>>>> Hi Shameer,
>>>>
>>>>
>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>>>> and that is set as the primary-bus for the smmu dev.
>>>> why do we require a pxb-pcie root bus? why can't pci.0 root bus be used
>>>> for simpler use cases (ie. I just want to passthough a NIC in
>>>> accelerated mode). Or may pci.0 is also called a pax-pcie root bus?
>>> The idea was since pcie.0 is the default RC with virt, leave that to cases
>> where
>>> we want to attach any emulated devices and use pxb-pcie based RCs for
>> vfio-pci.
>>
>> The majority of management applications will never do anything other
>> than a flat PCI(e) topology by default. Some might enable pxb-pcie as
>> an option but plenty won't ever support it. If you want to maximise
>> the potential usefulness of smmuv3-accel, and it is technically
>> viable, it would be worth permitting choice of attachment to the root
>> bus as an alternative to the pxb.
> Ok. I will look into this. Though I am not sure when we have smmuv3-accel
> to pcie.0 we can still have additional smmuv3-accel with pxb-pcie or not.
> It looks like pxb-pcie will be plugged into pcie.0. And if that is the case
> IORT mappings will be difficult I guess. I need to double check.

Indeed it makes things more difficult in terms of id mapping but I think
it would bring some benefits to be able to plug the accel smmu on pci.0 too.

Some logic should be there already, because you can bypass the SMMU on a
given pxb while it is enabled on pci.0. See:

[PATCH v5 0/9] IOMMU: Add support for IOMMU Bypass Feature
https://lore.kernel.org/all/1625748919-52456-1-git-send-email-wangxingang5@huawei.com/

Eric

>
> Thanks,
> Shameer



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-13  8:26                 ` Shameerali Kolothum Thodi via
@ 2025-03-13 15:22                   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-13 15:22 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org




On 3/13/25 9:26 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, March 12, 2025 6:31 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
>> accel
>>
>> Hi Shameer,
>>
>>
>>>>>>> Thanks for taking a look. I am just jumping on this one for now.
>>>>>>> Yes, there were discussions around that. But I was not sure we
>>>>>>> concluded on deprecating the machine option. So if I get you
>>>>>>> correctly the idea is,
>>>>>>>
>>>>>>> if we have,
>>>>>>> -device smmuv3 it will instantiate the current machine wide smmuv3
>>>>>>> and for -device smmuv3,accel this device?
>>>>>> yes that would be my preference.
>>>>> Ok. I will look into that in my next respin. A quick question. Does
>>>>> qemu DEVICE model support the differentiation like above easily? Or
>> we
>>>>> have to manage it with properties?
>>>> Not sure if I understand you question. I meant it can be a boolean
>> device
>>>> property (DEFINE_PROP_BOOL) smmuv3,accel=on
>>>>
>>>> No?
>>> Right. My query was more about any hidden Qemu magic to have device
>> instantiation
>>> similar to what we have at the moment even though we name both
>> devices "smmuv3".
>>> That way I can keep much of the code rather than checking "accel"
>> property
>>> in SMMUv3 code and redirecting calls. But looks like not.
>> I don't think there is any such a trick.
>>
>> Having the legacy device (without accel) only instantiable with the virt
>> machine option and the new accelerated one only instantiable with a
>> -device option looks strange to me. By the way they model the same
>> device so I think it makes more sense to use the same device with an
>> option.
> Ok. Will address that in the next respin.
>
>> Also do you see anything that would prevent the acceleration enhanced
>> device from being able to translate emulated devices as well. Ideally
>> the smmu device should react differently depending on the device which
>> is translated. I think it worked with the original implementation as far
>> as I remember.
> Yes, smmuv3-accel works with emulated devices as well. Currently the only
> limitation is, we should have at least one vfio-pci dev cold plugged as mentioned
> in the cover letter. Hopefully we will be able to resolve that restriction soon.

OK thanks

Eric
>
> Thanks,
> Shameer
>



^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-03-11 21:07   ` Nicolin Chen
@ 2025-03-17  8:38     ` Shameerali Kolothum Thodi via
  2025-03-17 18:19       ` Nicolin Chen
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-17  8:38 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

Hi Nicolin,

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, March 11, 2025 9:08 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add
> set/unset_iommu_device callback
> 
> On Tue, Mar 11, 2025 at 02:10:34PM +0000, Shameer Kolothum wrote:
> > @@ -30,6 +32,185 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
> >      return accel_dev;
> >  }
> >
> > +static bool
> > +smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
> > +                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> 
> With vEVENTQ v9, vDEVICE (vSID) is required to attach a device
> to a proxy NESTED hwpt (applicable to bypass/abort HWPTs too).
> So, host_iommu_device_iommufd_attach_hwpt() would fail in this
> function because vSID isn't ready at this stage. So all those
> calls should be moved out of the function, then this should be
> likely "smmuv3_accel_dev_alloc_viommu"?
> 
> That being said, I don't know when QEMU actually prepare a BDF
> number for a vfio-pci device. The only place that I see it is
> ready is at guest-level SMMU installing the Stream Table, i.e.
> in smmuv3_accel_install_nested_ste().
> 
> > +{
> > +    struct iommu_hwpt_arm_smmuv3 bypass_data = {
> > +        .ste = { 0x9ULL, 0x0ULL },
> > +    };
> > +    struct iommu_hwpt_arm_smmuv3 abort_data = {
> > +        .ste = { 0x1ULL, 0x0ULL },
> > +    };
> > +    SMMUDevice *sdev = &accel_dev->sdev;
> > +    SMMUState *s = sdev->smmu;
> > +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
> > +    SMMUS2Hwpt *s2_hwpt;
> > +    SMMUViommu *viommu;
> > +    uint32_t s2_hwpt_id;
> > +    uint32_t viommu_id;
> > +
> > +    if (s_accel->viommu) {
> > +        accel_dev->viommu = s_accel->viommu;
> > +        return host_iommu_device_iommufd_attach_hwpt(
> > +                       idev, s_accel->viommu->s2_hwpt->hwpt_id, errp);
> 
> Yea, here is my bad. We shouldn't attach a device to s2_hwpt,
> since eventually s2_hwpt would be a shared hwpt across SMMUs.
> 
> > +    /* Attach to S2 for MSI cookie */
> > +    if (!host_iommu_device_iommufd_attach_hwpt(idev, s2_hwpt_id, errp)) {
> > +        goto free_s2_hwpt;
> > +    }
> 
> With the merged sw_msi series, we don't need this anymore.
> 
> > +    /*
> > +     * Attach the bypass STE which means S1 bypass and S2 translate.
> > +     * This is to make sure that the vIOMMU object is now associated
> > +     * with the device and has this STE installed in the host SMMUV3.
> > +     */
> > +    if (!host_iommu_device_iommufd_attach_hwpt(
> > +                idev, viommu->bypass_hwpt_id, errp)) {
> > +        error_report("failed to attach the bypass pagetable");
> > +        goto free_bypass_hwpt;
> > +    }
> 
> Ditto. We have to postpone this until vdevice is allocated.

Ok. I will take a look based on the vEVENTQ v9 series.
I guess this QEMU branch of yours is more representative of the changes
described above?
https://github.com/nicolinc/qemu/tree/wip/for_iommufd_veventq-v9

Right?

Thanks,
Shameer

> > +static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
> > +                                            int devfn)
> > +{
> > +    SMMUDevice *sdev;
> > +    SMMUv3AccelDevice *accel_dev;
> > +    SMMUViommu *viommu;
> > +    SMMUState *s = opaque;
> > +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
> > +    SMMUPciBus *sbus = g_hash_table_lookup(s->smmu_pcibus_by_busptr, bus);
> > +
> > +    if (!sbus) {
> > +        return;
> > +    }
> > +
> > +    sdev = sbus->pbdev[devfn];
> > +    if (!sdev) {
> > +        return;
> > +    }
> > +
> > +    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> > +    if (!host_iommu_device_iommufd_attach_hwpt(accel_dev->idev,
> > +                                               accel_dev->idev->ioas_id,
> > +                                               NULL)) {
> > +        error_report("Unable to attach dev to the default HW pagetable");
> > +    }
> > +
> > +
> 
> Could drop the extra line.
> 
> Thanks
> Nicolin


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  2025-03-13  8:09     ` Shameerali Kolothum Thodi via
@ 2025-03-17 16:52       ` Eric Auger
  2025-03-18  9:47         ` Shameerali Kolothum Thodi via
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-17 16:52 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org


Hi Shameer,

On 3/13/25 9:09 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, March 12, 2025 4:24 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce
>> callbacks for PCIIOMMUOps
>>
>>
>>
>>
>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>> Subsequently smmuv3-accel will provide these callbacks
>>>
>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>> ---
>>>  hw/arm/smmu-common.c         | 27 +++++++++++++++++++++++++++
>>>  include/hw/arm/smmu-common.h |  5 +++++
>>>  2 files changed, 32 insertions(+)
>>>
>>> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
>>> index 83c0693f5a..9fd455baa0 100644
>>> --- a/hw/arm/smmu-common.c
>>> +++ b/hw/arm/smmu-common.c
>>> @@ -865,6 +865,10 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>>>      SMMUState *s = opaque;
>>>      SMMUPciBus *sbus = smmu_get_sbus(s, bus);
>>>
>>> +    if (s->accel && s->get_address_space) {
>>> +        return s->get_address_space(bus, opaque, devfn);
>>> +    }
>>> +
>> why do we require that new call site? This needs to be documented in the
>> commit msg esp. because we don't know what this cb will do.
> Currently, this is where the first time SMMUDevice sdev is allocated. And for smmuv3-accel
> cases we are introducing a new SMMUv3AccelDevice accel_dev for holding additional
> accel specific information. In order to do that the above cb is used. Same for other callbacks
> as well.
>
> Another way of avoiding the callbacks would be to  move the pci_setup_iommu(bus, ops) 
> call from the smmu-common.c to smmuv3/smmuv3-accel and handle it directly there.
>
> Or is there a better idea?
>
>>>      sdev = sbus->pbdev[devfn];
>>>      if (!sdev) {
>>>          sdev = sbus->pbdev[devfn] = g_new0(SMMUDevice, 1);
>>> @@ -874,8 +878,31 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
>>>      return &sdev->as;
>>>  }
>>>
>>> +static bool smmu_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>>> +                                      HostIOMMUDevice *hiod, Error **errp)
>>> +{
>>> +    SMMUState *s = opaque;
>>> +
>>> +    if (s->accel && s->set_iommu_device) {
>>> +        return s->set_iommu_device(bus, opaque, devfn, hiod, errp);
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +static void smmu_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
>>> +{
>>> +    SMMUState *s = opaque;
>>> +
>>> +    if (s->accel && s->unset_iommu_device) {
>>> +        s->unset_iommu_device(bus, opaque, devfn);
>>> +    }
>>> +}
>>> +
>>>  static const PCIIOMMUOps smmu_ops = {
>>>      .get_address_space = smmu_find_add_as,
>>> +    .set_iommu_device = smmu_dev_set_iommu_device,
>>> +    .unset_iommu_device = smmu_dev_unset_iommu_device,
>>>  };
>>>
>>>  SMMUDevice *smmu_find_sdev(SMMUState *s, uint32_t sid)
>>> diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
>>> index 80ff2ef6aa..7b05640167 100644
>>> --- a/include/hw/arm/smmu-common.h
>>> +++ b/include/hw/arm/smmu-common.h
>>> @@ -160,6 +160,11 @@ struct SMMUState {
>>>
>>>      /* For smmuv3-accel */
>>>      bool accel;
>>> +
>>> +    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
>>> +    bool (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
>>> +                             HostIOMMUDevice *dev, Error **errp);
>>> +    void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
>> I think this should be exposed by a class and only implemented in the
>> smmuv3 accel device. Adding those cbs directly in the State looks not the
>> std way.
> Ok. You mean we can directly place a PCIIOMMUOps *ops here then?
When I first skimmed through the series I assumed you would use 2
separate devices, in which case that would use 2 different
implementations of the same class. You may have a look at
docs/devel/qom.rst and at how methods and classes are described there.

Now as I commented earlier I think the end user shall instantiate the
same device for non accel and accel. I would advocate for passing an
option telling whether we want accel modality. Then it rather looks like
what was done for vfio device with either legacy or iommufd backend.

Depending on whether the iommufd option is passed, you select the right
class implementation; see vfio_attach_device() in hw/vfio/common.c:


    const VFIOIOMMUClass *ops =
        VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));

    if (vbasedev->iommufd) {
        ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
    }

I would do something similar to select the right ops depending on
the passed option.
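
As a rough illustration of that pattern outside of QOM, here is a minimal
standalone C sketch. Everything in it is hypothetical (SketchSmmuOps,
sketch_select_ops() and the two placeholder callbacks are not QEMU
symbols); it only shows one device picking between two implementations of
the same method table based on a boolean option, instead of storing the
callbacks in the state and checking a flag at every call site:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a class: a table of methods. */
typedef struct SketchSmmuOps {
    const char *name;
    bool (*set_iommu_device)(int devfn);
} SketchSmmuOps;

/* Purely emulated case: there is no host IOMMU device to bind. */
static bool sketch_emulated_set_iommu_device(int devfn)
{
    (void)devfn;
    return false;
}

/* Accelerated case: placeholder for the host-backed setup. */
static bool sketch_accel_set_iommu_device(int devfn)
{
    return devfn >= 0 && devfn < 256;
}

static const SketchSmmuOps sketch_emulated_ops = {
    .name = "emulated",
    .set_iommu_device = sketch_emulated_set_iommu_device,
};

static const SketchSmmuOps sketch_accel_ops = {
    .name = "accel",
    .set_iommu_device = sketch_accel_set_iommu_device,
};

/* Pick the implementation once, from the user-visible option. */
static const SketchSmmuOps *sketch_select_ops(bool accel)
{
    const SketchSmmuOps *ops = accel ? &sketch_accel_ops : &sketch_emulated_ops;

    assert(ops->set_iommu_device != NULL);
    return ops;
}
```

The common code would then call through the selected table and never test
the accel flag itself.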

I hope this helps

Eric


>
> Thanks,
> Shameer



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-13  8:22         ` Shameerali Kolothum Thodi via
@ 2025-03-17 16:57           ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-17 16:57 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org




On 3/13/25 9:22 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, March 12, 2025 4:42 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>> pcie bus
>>
>>
>>
>>
>> On 3/12/25 5:34 PM, Shameerali Kolothum Thodi wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>> Sent: Wednesday, March 12, 2025 4:08 PM
>>>> To: Shameerali Kolothum Thodi
>>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org
>>>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>>>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
>> pxb-
>>>> pcie bus
>>>>
>>>> Hi Shameer,
>>>>
>>>>
>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>>>> and that is set as the primary-bus for the smmu dev.
>>>> why do we require a pxb-pcie root bus? why can't pci.0 root bus be used
>>>> for simpler use cases (ie. I just want to passthough a NIC in
>>>> accelerated mode). Or may pci.0 is also called a pax-pcie root bus?
>>> The idea was since pcie.0 is the default RC with virt, leave that to cases
>> where
>>> we want to attach any emulated devices and use pxb-pcie based RCs for
>> vfio-pci.
>> yes but for simpler use case you may not want the extra pain to
>> instantiate a pxb-pcie device. Actually libvirt does not instantiate it
>> by default.
>>>> Besides, why do we put the constraint to plug on a root bus. I know that
>>>> at this point we always plug to pci.0 but with the new -device option it
>>>> would be possible to plug it anywhere in the pcie hierarchy. At SOC
>>>> level can't an SMMU be plugged anywhere protecting just a few RIDs?
>>> In my understanding normally(or atleast in the most common cases) it is
>> attached
>>> to root complexes. Also IORT mappings are at the root complex level,
>> right?
>> Yes I do agree the IORT describes ID mappings between RC and SMMU but
>> the actual ID mappings allow you to be much more precise and state that
>> a given SMMU only translates few RIDs within that RID space. If you
>> force the device bus to be a root bus you can't model that anymore.
>>
> Do we really need to support that? What if the user then has another smmuv3-accel
> in the associated upstream buses/RC as well? Not sure how to handle that.
Well, I agree we would need to reject that kind of config. Maybe we can
relax this requirement and connect the smmu to any root bus (including
pci.0). Then this SMMU would translate all the input RIDs. This
is not as flexible as what the IORT allows but maybe that's enough for
our use cases.
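
A minimal sketch of such a check, in standalone C with hypothetical names
(SketchBus and sketch_smmu_attach() are not QEMU symbols). It assumes a
toy model where a bus with no parent counts as a root bus, so both pci.0
and a pxb-pcie root bus would qualify, and it rejects a second SMMU on a
bus that already has one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus model: parent is NULL for a root bus. */
typedef struct SketchBus {
    const struct SketchBus *parent;
    bool has_smmu;          /* an SMMU is already associated with this bus */
} SketchBus;

/* Associate an SMMU with a bus; return false if the config is rejected. */
static bool sketch_smmu_attach(SketchBus *bus)
{
    assert(bus != NULL);

    if (bus->parent != NULL) {
        return false;       /* only root buses are accepted */
    }
    if (bus->has_smmu) {
        return false;       /* one SMMU per root bus */
    }
    bus->has_smmu = true;
    return true;
}
```

Under these assumptions the SMMU translates every RID behind the root bus
it is attached to, which is the simplification discussed above.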

Thanks

Eric
>  
> Thanks,
> Shameer



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-12 15:15   ` Eric Auger
@ 2025-03-17 17:54     ` Nicolin Chen
  2025-03-17 18:07       ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-17 17:54 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
> > device. In order to support vfio-pci dev assignment with a Guest
> guest
> > SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
> nested (s1+s2)
> > mode, with Guest owning the S1 page tables. Subsequent patches will
> the guest
> > add support for smmuv3-accel to provide this.
>
> Can't this -accel smmu also works with emulated devices? Do we want an
> exclusive usage?

Is there any benefit from emulated devices working in the HW-
accelerated nested translation mode?

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 17:54     ` Nicolin Chen
@ 2025-03-17 18:07       ` Eric Auger
  2025-03-17 19:10         ` Nicolin Chen
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-17 18:07 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao




On 3/17/25 6:54 PM, Nicolin Chen wrote:
> On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
>>> device. In order to support vfio-pci dev assignment with a Guest
>> guest
>>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
>> nested (s1+s2)
>>> mode, with Guest owning the S1 page tables. Subsequent patches will
>> the guest
>>> add support for smmuv3-accel to provide this.
>> Can't this -accel smmu also works with emulated devices? Do we want an
>> exclusive usage?
> Is there any benefit from emulated devices working in the HW-
> accelerated nested translation mode?

Not really, but do we have any justification for using a different device
name in accel mode? I am not even sure that the accel option is really
needed. Ideally the QEMU device should be able to detect that it is
protecting a VFIO device, in which case it would check whether nested is
supported by the host SMMU and then automatically turn on accel mode?

I gave the example of the vfio device, which has a different class
implementation depending on whether the iommufd option is set.

Thanks

Eric

>
> Thanks
> Nicolin
>




* Re: [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  2025-03-17  8:38     ` Shameerali Kolothum Thodi via
@ 2025-03-17 18:19       ` Nicolin Chen
  0 siblings, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-17 18:19 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

On Mon, Mar 17, 2025 at 08:38:23AM +0000, Shameerali Kolothum Thodi wrote:
> Hi Nicolin,
> 
> > -----Original Message-----
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Tuesday, March 11, 2025 9:08 PM
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> > mochs@nvidia.com; smostafa@google.com; Linuxarm
> > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > Subject: Re: [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add
> > set/unset_iommu_device callback
> > 
> > On Tue, Mar 11, 2025 at 02:10:34PM +0000, Shameer Kolothum wrote:
> > > @@ -30,6 +32,185 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
> > >      return accel_dev;
> > >  }
> > >
> > > +static bool
> > > +smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
> > > +                               HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> > 
> > With vEVENTQ v9, vDEVICE (vSID) is required to attach a device
> > to a proxy NESTED hwpt (applicable to bypass/abort HWPTs too).
> > So, host_iommu_device_iommufd_attach_hwpt() would fail in this
> > function because vSID isn't ready at this stage. So all those
> > calls should be moved out of the function, then this should be
> > likely "smmuv3_accel_dev_alloc_viommu"?
> > 
> > That being said, I don't know when QEMU actually prepare a BDF
> > number for a vfio-pci device. The only place that I see it is
> > ready is at guest-level SMMU installing the Stream Table, i.e.
> > in smmuv3_accel_install_nested_ste().

> > > +    /*
> > > +     * Attach the bypass STE which means S1 bypass and S2 translate.
> > > +     * This is to make sure that the vIOMMU object is now associated
> > > +     * with the device and has this STE installed in the host SMMUV3.
> > > +     */
> > > +    if (!host_iommu_device_iommufd_attach_hwpt(
> > > +                idev, viommu->bypass_hwpt_id, errp)) {
> > > +        error_report("failed to attach the bypass pagetable");
> > > +        goto free_bypass_hwpt;
> > > +    }
> > 
> > Ditto. We have to postpone this until vdevice is allocated.
> 
> Ok.  I will take a look based on the  vEVENTQ v9 series.
> I guess this Qemu branch of yours is a more representative of the changes described
> above?
> https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v9/

Yes. 

Mainly this change:
https://github.com/nicolinc/qemu/commit/d8f496eaf528f1c397f2374a999b8b23fd55c75b

Thanks
Nicolin



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 18:07       ` Eric Auger
@ 2025-03-17 19:10         ` Nicolin Chen
  2025-03-17 19:24           ` Jason Gunthorpe
  2025-03-19 16:45           ` Eric Auger
  0 siblings, 2 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-17 19:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Mon, Mar 17, 2025 at 07:07:52PM +0100, Eric Auger wrote:
> On 3/17/25 6:54 PM, Nicolin Chen wrote:
> > On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
> >> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> >>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
> >>> device. In order to support vfio-pci dev assignment with a Guest
> >> guest
> >>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
> >> nested (s1+s2)
> >>> mode, with Guest owning the S1 page tables. Subsequent patches will
> >> the guest
> >>> add support for smmuv3-accel to provide this.
> >> Can't this -accel smmu also works with emulated devices? Do we want an
> >> exclusive usage?
> > Is there any benefit from emulated devices working in the HW-
> > accelerated nested translation mode?
> 
> Not really but do we have any justification for using different device
> name in accel mode? I am not even sure that accel option is really
> needed. Ideally the qemu device should be able to detect it is
> protecting a VFIO device, in which case it shall check whether nested is
> supported by host SMMU and then automatically turn accel mode?
> 
> I gave the example of the vfio device which has different class
> implementration depending on the iommufd option being set or not.

Do you mean that we should just create a regular smmuv3 device and
let a VFIO device turn on this smmuv3's accel mode depending on
its LEGACY/IOMMUFD class?

Another question: how does an emulated device work with a vSMMUv3?
I could imagine that all the accel steps would be bypassed since
!sdev->idev. Yet the emulated iotlb would cache its translations,
so we would need to flush the iotlb, which will increase complexity
as the TLBI command dispatching function will need to be aware of which
ASID belongs to an emulated device and which to a vfio device.

Thanks
Nicolin



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 19:10         ` Nicolin Chen
@ 2025-03-17 19:24           ` Jason Gunthorpe
  2025-03-17 20:19             ` Nicolin Chen
  2025-03-18 21:42             ` Donald Dutile
  2025-03-19 16:45           ` Eric Auger
  1 sibling, 2 replies; 145+ messages in thread
From: Jason Gunthorpe @ 2025-03-17 19:24 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Eric Auger, Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
> Another question: how does an emulated device work with a vSMMUv3?
> I could imagine that all the accel steps would be bypassed since
> !sdev->idev. Yet, the emulated iotlb should cache its translation
> so we will need to flush the iotlb, which will increase complexity
> as the TLBI command dispatching function will need to be aware what
> ASID is for emulated device and what is for vfio device..

I think you should block it. We already expect different vSMMUs
depending on the physical SMMU under the PCI device, so it makes sense
that a SW VFIO device would have its own, non-accelerated, vSMMU
model in the guest.

Jason



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 19:24           ` Jason Gunthorpe
@ 2025-03-17 20:19             ` Nicolin Chen
  2025-03-18  9:50               ` Shameerali Kolothum Thodi via
  2025-03-18 18:31               ` Eric Auger
  2025-03-18 21:42             ` Donald Dutile
  1 sibling, 2 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-17 20:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Eric Auger, Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
> > Another question: how does an emulated device work with a vSMMUv3?
> > I could imagine that all the accel steps would be bypassed since
> > !sdev->idev. Yet, the emulated iotlb should cache its translation
> > so we will need to flush the iotlb, which will increase complexity
> > as the TLBI command dispatching function will need to be aware what
> > ASID is for emulated device and what is for vfio device..
> 
> I think you should block it. We already expect different vSMMU's
> depending on the physical SMMU under the PCI device, it makes sense
> that a SW VFIO device would have it's own, non-accelerated, vSMMU
> model in the guest.

Yea, I agree and it'd be cleaner for an implementation separating
them.

In my mind, the general idea of "accel=on" is also to keep things
in a more efficient way: passthrough devices go to HW-accelerated
vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
bypassed (PCIE0).

Though I do see the point from the QEMU perspective that a user may want
to start a VM with a HW-accelerated vSMMU for one passthrough device
using a simple setup, without caring about the routing via the command line.

Thanks
Nicolin



* RE: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  2025-03-17 16:52       ` Eric Auger
@ 2025-03-18  9:47         ` Shameerali Kolothum Thodi via
  0 siblings, 0 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-18  9:47 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Monday, March 17, 2025 4:52 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce
> callbacks for PCIIOMMUOps


> Hi Shameer,
> 
> On 3/13/25 9:09 AM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >>>      bool accel;
> >>> +
> >>> +    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
> >>> +    bool (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
> >>> +                             HostIOMMUDevice *dev, Error **errp);
> >>> +    void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
> >> I think this should be exposed by a class and only implemented in the
> >> smmuv3 accel device. Adding those cbs directly in the State looks not the
> >> std way.
> > Ok. You mean we can directly place  PCIIOMMUOps *ops here then?
> When I first skimmed through the series I assumed you would use 2
> seperate devices, in which case that would use 2 different
> implementations of the same class. You may have a look at
> docs/devel/qom.rst and Methods and class there.
> 
> Now as I commented earlier I think the end user shall instantiate the
> same device for non accel and accel. I would advocate for passing an
> option telling whether we want accel modality. Then it rather looks like
> what was done for vfio device with either legacy or iommufd backend.
> 
> depending on whether the iommufd option is passed you select the right
> class implementation:
> see hw/vfio/common.c and vfio_attach_device
> 
> 
>     const VFIOIOMMUClass *ops =
>         VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
> 
>     if (vbasedev->iommufd) {
>         ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>     }
> 
> I would doing something similar for selecting the right ops depending on
> the passed option.
> 
> I hope this helps

Thanks Eric. I will take a look.
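
For reference, the "select the ops from an option" pattern suggested above
could look roughly like the sketch below. This is only an illustration under
stated assumptions: it is plain C, not QOM code, and VSmmuOps,
vsmmu_select_ops() and the accel_* helpers are hypothetical names that do not
exist in QEMU. The real implementation would presumably go through class
lookup as in vfio_attach_device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table; stands in for a QOM class with virtual methods. */
typedef struct VSmmuOps {
    const char *name;
    bool (*set_iommu_device)(int devfn);
    void (*unset_iommu_device)(int devfn);
} VSmmuOps;

/* Placeholder accel callbacks; a real version would talk to IOMMUFD. */
static bool accel_set_iommu_device(int devfn) { (void)devfn; return true; }
static void accel_unset_iommu_device(int devfn) { (void)devfn; }

/* The emulated model has no host IOMMU device to manage. */
static const VSmmuOps emulated_ops = { "emulated", NULL, NULL };
static const VSmmuOps accel_ops = {
    "accel", accel_set_iommu_device, accel_unset_iommu_device
};

/* Default to the emulated ops and switch only when accel was requested. */
static const VSmmuOps *vsmmu_select_ops(bool accel)
{
    const VSmmuOps *ops = &emulated_ops;

    if (accel) {
        ops = &accel_ops;
    }
    assert(ops->name != NULL);
    return ops;
}
```

The point of the shape is that the user instantiates one device and the
option only decides which implementation sits behind it.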

Shameer


* RE: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 20:19             ` Nicolin Chen
@ 2025-03-18  9:50               ` Shameerali Kolothum Thodi via
  2025-03-18 18:31               ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-18  9:50 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Eric Auger, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, ddutile@redhat.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
	zhangfei.gao@linaro.org



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Monday, March 17, 2025 8:19 PM
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Eric Auger <eric.auger@redhat.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; peter.maydell@linaro.org; ddutile@redhat.com;
> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial
> infrastructure for smmuv3-accel device
> 
> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
> > On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
> > > > Another question: how does an emulated device work with a vSMMUv3?
> > > I could imagine that all the accel steps would be bypassed since
> > > !sdev->idev. Yet, the emulated iotlb should cache its translation
> > > so we will need to flush the iotlb, which will increase complexity
> > > as the TLBI command dispatching function will need to be aware what
> > > ASID is for emulated device and what is for vfio device..
> >
> > I think you should block it. We already expect different vSMMU's
> > depending on the physical SMMU under the PCI device, it makes sense
> > that a SW VFIO device would have it's own, non-accelerated, vSMMU
> > model in the guest.
> 
> Yea, I agree and it'd be cleaner for an implementation separating
> them.
> 
> In my mind, the general idea of "accel=on" is also to keep things
> in a more efficient way: passthrough devices go to HW-accelerated
> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
> bypassed (PCIE0).
> 
> Though I do see the point from QEMU prospective that user may want
> to start a VM with HW-accelerated vSMMU for one passthrough device
> using a simple setup without caring about the routing via command.

For now we don't use the iotlb for accel cases with emulated devices. So we
can probably document/warn the user about possible performance degradation if
they attach such a device, rather than blocking it.

Thanks,
Shameer
 



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 20:19             ` Nicolin Chen
  2025-03-18  9:50               ` Shameerali Kolothum Thodi via
@ 2025-03-18 18:31               ` Eric Auger
  2025-03-18 19:13                 ` Nicolin Chen
  2025-03-19  0:31                 ` Jason Gunthorpe
  1 sibling, 2 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-18 18:31 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, ddutile,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

Hi,


On 3/17/25 9:19 PM, Nicolin Chen wrote:
> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
>> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>>> Another question: how does an emulated device work with a vSMMUv3?
>>> I could imagine that all the accel steps would be bypassed since
>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>> so we will need to flush the iotlb, which will increase complexity
>>> as the TLBI command dispatching function will need to be aware what
>>> ASID is for emulated device and what is for vfio device..
>> I think you should block it. We already expect different vSMMU's
>> depending on the physical SMMU under the PCI device, it makes sense
>> that a SW VFIO device would have it's own, non-accelerated, vSMMU
>> model in the guest.
> Yea, I agree and it'd be cleaner for an implementation separating
> them.
>
> In my mind, the general idea of "accel=on" is also to keep things
> in a more efficient way: passthrough devices go to HW-accelerated
> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
> bypassed (PCIE0).
Originally a specific SMMU device was needed to opt in to the MSI reserved
region ACPI IORT description, which is not needed if you don't rely on
S1+S2. However, if we don't rely on this trick, this was not even needed
with the legacy integration
(https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.auger@redhat.com/).

Nevertheless, I don't think anything prevents the accel-enabled
device from also working with virtio/vhost devices, for instance, unless
you unplug the existing infra. The translation and invalidation should
just use different control paths (explicit translation requests,
invalidation notifications towards vhost, ...).

Again, what legitimizes having different QEMU devices for the same
IP? I understand that it simplifies the implementation, but I am not sure
this is a good reason. Nevertheless, it is worth challenging. What is the
plan for intel iommu? Will we have 2 devices, the legacy device and one
for nested?

Thanks

Eric


>
> Though I do see the point from QEMU prospective that user may want
> to start a VM with HW-accelerated vSMMU for one passthrough device
> using a simple setup without caring about the routing via command.
>
> Thanks
> Nicolin
>




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-18 18:31               ` Eric Auger
@ 2025-03-18 19:13                 ` Nicolin Chen
  2025-03-18 21:22                   ` Donald Dutile
  2025-03-19  0:31                 ` Jason Gunthorpe
  1 sibling, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-18 19:13 UTC (permalink / raw)
  To: Eric Auger
  Cc: Jason Gunthorpe, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, ddutile, berrange, nathanc, mochs, smostafa,
	linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
> On 3/17/25 9:19 PM, Nicolin Chen wrote:
> > On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
> >> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
> >>> Another question: how does an emulated device work with a vSMMUv3?
> >>> I could imagine that all the accel steps would be bypassed since
> >>> !sdev->idev. Yet, the emulated iotlb should cache its translation
> >>> so we will need to flush the iotlb, which will increase complexity
> >>> as the TLBI command dispatching function will need to be aware what
> >>> ASID is for emulated device and what is for vfio device..
> >> I think you should block it. We already expect different vSMMU's
> >> depending on the physical SMMU under the PCI device, it makes sense
> >> that a SW VFIO device would have it's own, non-accelerated, vSMMU
> >> model in the guest.
> > Yea, I agree and it'd be cleaner for an implementation separating
> > them.
> >
> > In my mind, the general idea of "accel=on" is also to keep things
> > in a more efficient way: passthrough devices go to HW-accelerated
> > vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
> > bypassed (PCIE0).

> Originally a specific SMMU device was needed to opt in for MSI reserved
> region ACPI IORT description which are not needed if you don't rely on
> S1+S2. However if we don't rely on this trick this was not even needed
> with legacy integration
> (https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.auger@redhat.com/).
> 
> Nevertheless I don't think anything prevents the acceleration granted
> device from also working with virtio/vhost devices for instance unless
> you unplug the existing infra. The translation and invalidation just
> should use different control paths (explicit translation requests,
> invalidation notifications towards vhost, ...).

smmuv3_translate() is per sdev, so it's easy.

Invalidation is done via commands, which could be tricky:
a) Broadcast command
b) ASID validation -- we'll need to keep track of a list of ASIDs
   for vfio device to compare the ASID in each per-ASID command,
   potentially by trapping all CFGI_CD(_ALL) commands? Note that
   each vfio device may have multiple ASIDs (for multiple CDs).
Either a or b above will have some validation efficiency impact.
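
To make (b) a bit more concrete, here is a rough standalone sketch of the
bookkeeping it implies. Nothing here is QEMU code: AsidTracker,
asid_tracker_mark_vfio() and tlbi_route() are made-up names, and the sketch
assumes the ASIDs of vfio devices were already learned by trapping
CFGI_CD(_ALL). It also ignores ASID removal and the case where one ASID is
shared by both kinds of device, which a real version would have to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ASID_SPACE 65536 /* SMMUv3 ASIDs are at most 16 bits wide */

/* Hypothetical per-vSMMU state: one bit per ASID used by a vfio device. */
typedef struct {
    uint8_t vfio_asids[ASID_SPACE / 8];
} AsidTracker;

typedef enum {
    TLBI_TO_EMULATED_IOTLB, /* flush QEMU's software iotlb */
    TLBI_TO_HOST            /* forward the invalidation to the host SMMU */
} TlbiTarget;

static void asid_tracker_init(AsidTracker *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Record an ASID found in a CD that belongs to a vfio device. */
static void asid_tracker_mark_vfio(AsidTracker *t, uint16_t asid)
{
    t->vfio_asids[asid / 8] |= (uint8_t)(1u << (asid % 8));
}

/* Decide where a per-ASID TLBI command has to be dispatched. */
static TlbiTarget tlbi_route(const AsidTracker *t, uint16_t asid)
{
    bool is_vfio = (t->vfio_asids[asid / 8] >> (asid % 8)) & 1u;

    return is_vfio ? TLBI_TO_HOST : TLBI_TO_EMULATED_IOTLB;
}
```

Even with a bitmap the lookup sits on every per-ASID command, and the
tracking itself depends on trapping the CD configuration commands.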

> Again, what does legitimate to have different qemu devices for the same
> IP? I understand that it simplifies the implementation but I am not sure
> this is a good reason. Nevertheless it worth challenging. What is the
> plan for intel iommu? Will we have 2 devices, the legacy device and one
> for nested?

Hmm, it seems that there are two different topics:
1. Use one SMMU device model (source code file; "iommu=" string)
   for both an emulated vSMMU and an HW-accelerated vSMMU.
2. Allow one vSMMU instance to work with both an emulated device
   and a passthrough device.
And I get that you want both 1 and 2.

I'm totally okay with 1, yet see no compelling benefit from 2 for
the increased complexity in the invalidation routine.

And another question about the mixed device attachment. Let's say
we have in the host:
  VFIO passthrough dev0 -> pSMMU0
  VFIO passthrough dev1 -> pSMMU1
Should we allow emulated devices to be flexibly plugged?
  dev0 -> vSMMU0 /* Hard requirement */
  dev1 -> vSMMU1 /* Hard requirement */
  emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
  emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
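
Expressed as a check, and purely as a sketch under the assumption that the
"soft requirement" rows are allowed (which is exactly the open question),
the table above would be something like the following. VSmmu, Dev and
attach_allowed() are hypothetical names, not anything in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_PSMMU (-1) /* emulated devices sit behind no physical SMMU */

/* Hypothetical model: each vSMMU instance is backed by one pSMMU index. */
typedef struct {
    int backing_psmmu;
} VSmmu;

typedef struct {
    bool passthrough; /* true for a VFIO passthrough device */
    int host_psmmu;   /* pSMMU index in the host, or NO_PSMMU */
} Dev;

static bool attach_allowed(const Dev *dev, const VSmmu *vsmmu)
{
    assert(dev != NULL && vsmmu != NULL);

    if (dev->passthrough) {
        /* Hard requirement: the vSMMU must mirror the device's pSMMU. */
        return vsmmu->backing_psmmu == dev->host_psmmu;
    }
    /* Soft requirement: an emulated device may sit behind any vSMMU. */
    return true;
}
```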

Thanks
Nicolin



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-18 19:13                 ` Nicolin Chen
@ 2025-03-18 21:22                   ` Donald Dutile
  2025-03-19  0:23                     ` Jason Gunthorpe
  2025-03-19 17:04                     ` Eric Auger
  0 siblings, 2 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-18 21:22 UTC (permalink / raw)
  To: Nicolin Chen, Eric Auger
  Cc: Jason Gunthorpe, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/18/25 3:13 PM, Nicolin Chen wrote:
> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
>> On 3/17/25 9:19 PM, Nicolin Chen wrote:
>>> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
>>>> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>>>>> Another question: how does an emulated device work with a vSMMUv3?
>>>>> I could imagine that all the accel steps would be bypassed since
>>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>>> so we will need to flush the iotlb, which will increase complexity
>>>>> as the TLBI command dispatching function will need to be aware what
>>>>> ASID is for emulated device and what is for vfio device..
>>>> I think you should block it. We already expect different vSMMU's
>>>> depending on the physical SMMU under the PCI device, it makes sense
>>>> that a SW VFIO device would have it's own, non-accelerated, vSMMU
>>>> model in the guest.
>>> Yea, I agree and it'd be cleaner for an implementation separating
>>> them.
>>>
>>> In my mind, the general idea of "accel=on" is also to keep things
>>> in a more efficient way: passthrough devices go to HW-accelerated
>>> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
>>> bypassed (PCIE0).
> 
>> Originally a specific SMMU device was needed to opt in for MSI reserved
>> region ACPI IORT description which are not needed if you don't rely on
>> S1+S2. However if we don't rely on this trick this was not even needed
>> with legacy integration
>> (https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.auger@redhat.com/).
>>
>> Nevertheless I don't think anything prevents the acceleration granted
>> device from also working with virtio/vhost devices for instance unless
>> you unplug the existing infra. The translation and invalidation just
>> should use different control paths (explicit translation requests,
>> invalidation notifications towards vhost, ...).
> 
> smmuv3_translate() is per sdev, so it's easy.
> 
> Invalidation is done via commands, which could be tricky:
> a) Broadcast command
> b) ASID validation -- we'll need to keep track of a list of ASIDs
>     for vfio device to compare the ASID in each per-ASID command,
>     potentially by trapping all CFGI_CD(_ALL) commands? Note that
>     each vfio device may have multiple ASIDs (for multiple CDs).
> Either a or b above will have some validation efficiency impact.
> 
>> Again, what does legitimate to have different qemu devices for the same
>> IP? I understand that it simplifies the implementation but I am not sure
>> this is a good reason. Nevertheless it worth challenging. What is the
>> plan for intel iommu? Will we have 2 devices, the legacy device and one
>> for nested?
> 
> Hmm, it seems that there are two different topics:
> 1. Use one SMMU device model (source code file; "iommu=" string)
>     for both an emulated vSMMU and an HW-accelerated vSMMU.
> 2. Allow one vSMMU instance to work with both an emulated device
>     and a passthrough device.
> And I get that you want both 1 and 2.
> 
> I'm totally okay with 1, yet see no compelling benefit from 2 for
> the increased complexity in the invalidation routine.
> 
> And another question about the mixed device attachment. Let's say
> we have in the host:
>    VFIO passthrough dev0 -> pSMMU0
>    VFIO passthrough dev1 -> pSMMU1
> Should we allow emulated devices to be flexibly plugged?
>    dev0 -> vSMMU0 /* Hard requirement */
>    dev1 -> vSMMU1 /* Hard requirement */
>    emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
>    emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
> 
> Thanks
> Nicolin
> 
I agree w/Jason & Nicolin: different vSMMUs for pass-through devices than emulated, & vice-versa.
Not mixing... because... of the next agreement:

I agree with Eric that 'accel' isn't needed -- this should be ascertained from the pSMMU that a physical device is attached to.
Now... how does vfio(?; why not qemu?) layer determine that? -- where are SMMUv3 'accel' features exposed either: a) in the device struct (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't find anything under either on my g-h system, but would appreciate a ptr if there is.
and like Eric, although 'accel' is better than the original 'nested', it's non-obvious what accel feature(s) are being turned on, or not.
In fact, if broken accel hw occurs ('if' -> 'when'), how should it be turned off? ... if the info is in the kernel, a kernel boot-param will be needed;
if in sysfs, writing 0 to an enable(disable) knob may be an alternative as well.
Bottom line: we need (a) a way to ascertain the accel feature and (b) a way to disable it when it is broken,
so qemu's smmuv3 spec will 'just work'.
[This may also help when migrating from a machine that has accel working to one that does not.]

... and when an emulated device is assigned a vSMMU, there are no accel features ... unless we have tunables like batch iotlb invalidation for perf reasons, which can be viewed as an 'accel' option.




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 19:24           ` Jason Gunthorpe
  2025-03-17 20:19             ` Nicolin Chen
@ 2025-03-18 21:42             ` Donald Dutile
  1 sibling, 0 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-18 21:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Nicolin Chen
  Cc: Eric Auger, Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/17/25 3:24 PM, Jason Gunthorpe wrote:
> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>> Another question: how does an emulated device work with a vSMMUv3?
>> I could imagine that all the accel steps would be bypassed since
>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>> so we will need to flush the iotlb, which will increase complexity
>> as the TLBI command dispatching function will need to be aware what
>> ASID is for emulated device and what is for vfio device..
> 
> I think you should block it. We already expect different vSMMU's
... and when you say 'block', you mean qemu prints out a helpful message
like "Mixing emulated/virtual devices and physical devices on a single SMMUv3 is not allowed.
       Specify separate smmuv3 objects for each type of device; multiple smmuv3 objects may
       be required for each physical device if they are attached to different smmuv3's in the host system."

Or would that be an allowed qemu machine definition, but the 'block' would be a warning like:
  "Mixing emulated/virtual devices and physical devices on a single SMMUv3 is not recommended for
   performance reasons.  To yield optimal performance, place physical devices on separate SMMUv3 objects
   than emulated/virtual device SMMUv3 objects."
... and in this case, the physical devices would not use the accel features of an smmuv3, but still be 'functional'.
This may be desired for a machine definition that wants to be used on different hosts that may not have the
(same) accel feature(s).
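
To make the two readings of 'block' concrete, here is a minimal sketch in plain C.
It is not QEMU code: VSmmu, MixPolicy, AttachResult and vsmmu_attach_check() are all
hypothetical names made up for illustration, and the only assumption is that a vSMMU
keeps a count of the emulated and physical devices attached to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device classes; these are not QEMU types. */
typedef enum { DEV_EMULATED, DEV_PHYSICAL } DevKind;

/* Hypothetical policy: reject mixing outright, or allow it with a warning. */
typedef enum { MIX_POLICY_BLOCK, MIX_POLICY_WARN } MixPolicy;

typedef enum { ATTACH_OK, ATTACH_OK_NO_ACCEL, ATTACH_REJECTED } AttachResult;

typedef struct {
    size_t n_emulated;   /* emulated/virtual devices behind this vSMMU */
    size_t n_physical;   /* passthrough devices behind this vSMMU */
    bool accel_active;   /* true only while physical devices are alone here */
} VSmmu;

/* Decide whether a device of the given kind may join this vSMMU. */
AttachResult vsmmu_attach_check(VSmmu *s, DevKind kind, MixPolicy policy)
{
    bool mixes;

    assert(s != NULL);
    mixes = (kind == DEV_EMULATED) ? s->n_physical > 0 : s->n_emulated > 0;

    if (mixes && policy == MIX_POLICY_BLOCK) {
        /* Option 1: hard error; the caller prints the "not allowed" text. */
        return ATTACH_REJECTED;
    }

    if (kind == DEV_EMULATED) {
        s->n_emulated++;
    } else {
        s->n_physical++;
    }

    /* Option 2: stay functional, but drop accel once the kinds are mixed. */
    s->accel_active = (s->n_physical > 0 && s->n_emulated == 0);
    return mixes ? ATTACH_OK_NO_ACCEL : ATTACH_OK;
}
```

With MIX_POLICY_BLOCK the second kind of device is refused and the vSMMU is left
unchanged; with MIX_POLICY_WARN it is accepted and accel_active goes false, which is
the 'functional but not accelerated' case described above.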



> depending on the physical SMMU under the PCI device, it makes sense
> that a SW VFIO device would have it's own, non-accelerated, vSMMU
> model in the guest.
> 
> Jason
> 




* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-11 14:10 ` [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus Shameer Kolothum via
  2025-03-12 16:07   ` Eric Auger
@ 2025-03-18 22:12   ` Donald Dutile
  2025-03-19  9:26     ` Shameerali Kolothum Thodi via
  1 sibling, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-18 22:12 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, berrange, nathanc,
	mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Shameer,

Hi!

On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> User must associate a pxb-pcie root bus to smmuv3-accel
> and that is set as the primary-bus for the smmu dev.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
> 
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index c327661636..1471b65374 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -9,6 +9,21 @@
>   #include "qemu/osdep.h"
>   
>   #include "hw/arm/smmuv3-accel.h"
> +#include "hw/pci/pci_bridge.h"
> +
> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
> +{
> +    DeviceState *d = opaque;
> +
> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus->name)) {
> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> +                                     &error_abort);
> +        }
> +    }
> +    return 0;
> +}
>   
>   static void smmu_accel_realize(DeviceState *d, Error **errp)
>   {
> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>       SysBusDevice *dev = SYS_BUS_DEVICE(d);
>       Error *local_err = NULL;
>   
> +    object_child_foreach_recursive(object_get_root(),
> +                                   smmuv3_accel_pxb_pcie_bus, d);
> +
>       object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
>       c->parent_realize(d, &local_err);
>       if (local_err) {
> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
>       device_class_set_parent_realize(dc, smmu_accel_realize,
>                                       &c->parent_realize);
>       dc->hotpluggable = false;
> +    dc->bus_type = TYPE_PCIE_BUS;
>   }
>   
>   static const TypeInfo smmuv3_accel_type_info = {

I am not seeing the need for a pxb-pcie bus (switch) to be introduced for each 'accel'.
Isn't the IORT able to define different SMMUs for different RIDs? If so, isn't that sufficient
to define an SMMU<->RID association without introducing a pxb-pcie?
And again, I'm not sure how that improves/enables the device<->SMMU associativity.
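
For reference, the RID-to-SMMU association the question is about can be sketched as a
lookup over IORT-style ID mappings. This is a simplified illustration, not QEMU's IORT
builder: IdMapping and rid_to_smmu() are hypothetical names, and output_ref is a plain
index standing in for the node reference that a real IORT ID mapping carries. The
"count minus one" encoding of the range length follows the IORT ID mapping format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one IORT ID mapping entry. */
typedef struct {
    uint32_t input_base;   /* first RID covered by this mapping */
    uint32_t id_count;     /* number of IDs in the range, minus one */
    uint32_t output_base;  /* StreamID that input_base maps to */
    uint32_t output_ref;   /* hypothetical: index of the target SMMU node */
} IdMapping;

/* Find which SMMU, and which StreamID, a given RID resolves to. */
bool rid_to_smmu(const IdMapping *maps, size_t n, uint32_t rid,
                 uint32_t *smmu, uint32_t *sid)
{
    assert(maps != NULL || n == 0);
    assert(smmu != NULL && sid != NULL);

    for (size_t i = 0; i < n; i++) {
        const IdMapping *m = &maps[i];

        if (rid >= m->input_base && rid - m->input_base <= m->id_count) {
            *smmu = m->output_ref;
            *sid = m->output_base + (rid - m->input_base);
            return true;
        }
    }
    return false;  /* RID is not behind any SMMU described here */
}
```

Two mappings that each cover one bus number's worth of RIDs (256 IDs) are enough to
route the devices on those buses to two different SMMU nodes in this model.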

Feel free to enlighten me where I may have mis-read/interpreted the IORT & SMMUv3 specs.

Thanks,
- Don




* Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-11 14:10 ` [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel Shameer Kolothum via
  2025-03-11 20:22   ` Nicolin Chen
  2025-03-12 15:36   ` Eric Auger
@ 2025-03-18 22:49   ` Donald Dutile
  2025-03-19  9:28     ` Shameerali Kolothum Thodi via
  2 siblings, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-18 22:49 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, berrange, nathanc,
	mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao


Doesn't this commit become moot, if accel becomes an smmuv3 option vs separate device object altogether,
dynamically added if a pdev is attached to a host SMMUv3 that has accel feature(s)?

Blocking w/virtio-iommu falls under the same situation mentioned in 03/20 wrt mixing emulated & physical devices on the same smmuv3.

- Don

On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> Allow cold-plug smmuv3-accel to virt If the machine wide smmuv3
> is not specified.
> 
> No FDT support is added for now.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   hw/arm/virt.c         | 12 ++++++++++++
>   hw/core/sysbus-fdt.c  |  1 +
>   include/hw/arm/virt.h |  1 +
>   3 files changed, 14 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 4a5a9666e9..84a323da55 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -73,6 +73,7 @@
>   #include "qobject/qlist.h"
>   #include "standard-headers/linux/input.h"
>   #include "hw/arm/smmuv3.h"
> +#include "hw/arm/smmuv3-accel.h"
>   #include "hw/acpi/acpi.h"
>   #include "target/arm/cpu-qom.h"
>   #include "target/arm/internals.h"
> @@ -2911,6 +2912,16 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>               platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
>                                        SYS_BUS_DEVICE(dev));
>           }
> +        if (object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_ACCEL)) {
> +            if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> +                error_setg(errp,
> +                           "iommu=smmuv3 is already specified. can't create smmuv3-accel dev");
> +                return;
> +            }
> +            if (vms->iommu != VIRT_IOMMU_SMMUV3_ACCEL) {
> +                vms->iommu = VIRT_IOMMU_SMMUV3_ACCEL;
> +            }
> +        }
>       }
>   
>       if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> @@ -3120,6 +3131,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>       machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
>       machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
>       machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
> +    machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ARM_SMMUV3_ACCEL);
>   #ifdef CONFIG_TPM
>       machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
>   #endif
> diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c
> index 774c0aed41..c8502ad830 100644
> --- a/hw/core/sysbus-fdt.c
> +++ b/hw/core/sysbus-fdt.c
> @@ -489,6 +489,7 @@ static const BindingEntry bindings[] = {
>   #ifdef CONFIG_LINUX
>       TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node),
>       TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
> +    TYPE_BINDING("arm-smmuv3-accel", no_fdt_node),
>       VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
>   #endif
>   #ifdef CONFIG_TPM
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index c8e94e6aed..849d1cd5b5 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -92,6 +92,7 @@ enum {
>   typedef enum VirtIOMMUType {
>       VIRT_IOMMU_NONE,
>       VIRT_IOMMU_SMMUV3,
> +    VIRT_IOMMU_SMMUV3_ACCEL,
>       VIRT_IOMMU_VIRTIO,
>   } VirtIOMMUType;
>   




* Re: [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  2025-03-11 14:10 ` [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
@ 2025-03-18 23:30   ` Donald Dutile
  2025-03-25 18:13   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-18 23:30 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, berrange, nathanc,
	mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Shameer,

Hi,

On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Allocate and associate a vDEVICE object for the Guest device
> with the vIOMMU. This will help the kernel to do the
> vSID --> sid translation whenever required (eg: device specific
> invalidations).
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   hw/arm/smmuv3-accel.c         | 22 ++++++++++++++++++++++
>   include/hw/arm/smmuv3-accel.h |  6 ++++++
>   2 files changed, 28 insertions(+)
> 
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index d3a5cf9551..056bd23b2e 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -109,6 +109,20 @@ void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>           return;
>       }
>   
> +    if (!accel_dev->vdev && accel_dev->idev) {
> +        SMMUVdev *vdev;
> +        uint32_t vdev_id;
> +        SMMUViommu *viommu = accel_dev->viommu;
> +
> +        iommufd_backend_alloc_vdev(viommu->core.iommufd, accel_dev->idev->devid,
> +                                   viommu->core.viommu_id, sid, &vdev_id,
> +                                   &error_abort);
> +        vdev = g_new0(SMMUVdev, 1);
> +        vdev->vdev_id = vdev_id;
> +        vdev->sid = sid;
> +        accel_dev->vdev = vdev;
> +    }
> +
>       ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
>       if (ret) {
>           /*
> @@ -283,6 +297,7 @@ static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>   static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
>                                               int devfn)
>   {
> +    SMMUVdev *vdev;
>       SMMUDevice *sdev;
>       SMMUv3AccelDevice *accel_dev;
>       SMMUViommu *viommu;
> @@ -312,6 +327,13 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
>       trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
>   
>       viommu = s_accel->viommu;
> +    vdev = accel_dev->vdev;
> +    if (vdev) {
> +        iommufd_backend_free_id(viommu->iommufd, vdev->vdev_id);
> +        g_free(vdev);
> +        accel_dev->vdev = NULL;
> +    }
> +
>       if (QLIST_EMPTY(&viommu->device_list)) {
>           iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
>           iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index d6b0b1ca30..54b217ab4f 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -35,6 +35,11 @@ typedef struct SMMUViommu {
>       QLIST_ENTRY(SMMUViommu) next;
>   } SMMUViommu;
>   
> +typedef struct SMMUVdev {
> +    uint32_t vdev_id;
> +    uint32_t sid;
> +} SMMUVdev;
> +
Shouldn't this be 'IOMMUFDVdev'? It's not an SMMU (v)dev, it's an IOMMUFD/vIOMMU vDEVICE for this SMMU.


>   typedef struct SMMUS1Hwpt {
>       IOMMUFDBackend *iommufd;
>       uint32_t hwpt_id;
> @@ -45,6 +50,7 @@ typedef struct SMMUv3AccelDevice {
>       HostIOMMUDeviceIOMMUFD *idev;
>       SMMUS1Hwpt  *s1_hwpt;
>       SMMUViommu *viommu;
> +    SMMUVdev   *vdev;
>       QLIST_ENTRY(SMMUv3AccelDevice) next;
>   } SMMUv3AccelDevice;
>   




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-18 21:22                   ` Donald Dutile
@ 2025-03-19  0:23                     ` Jason Gunthorpe
  2025-03-19  2:15                       ` Donald Dutile
  2025-03-19 17:00                       ` Eric Auger
  2025-03-19 17:04                     ` Eric Auger
  1 sibling, 2 replies; 145+ messages in thread
From: Jason Gunthorpe @ 2025-03-19  0:23 UTC (permalink / raw)
  To: Donald Dutile
  Cc: Nicolin Chen, Eric Auger, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 18, 2025 at 05:22:51PM -0400, Donald Dutile wrote:

> I agree with Eric that 'accel' isn't needed -- this should be
> ascertained from the pSMMU that a physical device is attached to.

I seem to remember the point was made that we don't actually know if
accel is possible, or desired, especially in the case of hotplug.

The accelerated mode has a number of limitations that the software
mode does not have. I think it does make sense that the user would
deliberately choose to use a more restrictive operating mode and then
would have to meet the requirements - eg by creating the required
number and configuration of vSMMUs.

> Now... how does vfio(?; why not qemu?) layer determine that? --
> where are SMMUv3 'accel' features exposed either: a) in the device
> struct (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't
> find anything under either on my g-h system, but would appreciate a
> ptr if there is.

I think it is not discoverable yet other than through
try-and-fail. Discoverability would probably be some bits in an
iommufd GET_INFO ioctl or something like that.

> and like Eric, although 'accel' is better than the
> original 'nested', it's non-obvious what accel feature(s) are being
> turned on, or not.

There is really only one accel feature - direct HW usage of the IO
Page table in the guest (no shadowing).

A secondary addon would be direct HW usage of an invalidation queue in
the guest.

> kernel boot-param will be needed; if in sysfs, a write to 0 an
> enable(disable) it maybe an alternative as well.  Bottom line: we
> need a way to (a) ascertain the accel feature (b) a way to disable
> it when it is broken, so qemu's smmuv3 spec will 'just work'.  

You'd turn it off by not asking qemu to use it, that is sort of the
reasoning behind the command line opt in for accel or not.

Jason



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-18 18:31               ` Eric Auger
  2025-03-18 19:13                 ` Nicolin Chen
@ 2025-03-19  0:31                 ` Jason Gunthorpe
  2025-03-19  5:27                   ` Nicolin Chen
  2025-03-24 14:08                   ` Eric Auger
  1 sibling, 2 replies; 145+ messages in thread
From: Jason Gunthorpe @ 2025-03-19  0:31 UTC (permalink / raw)
  To: Eric Auger
  Cc: Nicolin Chen, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, ddutile, berrange, nathanc, mochs, smostafa,
	linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
> Nevertheless I don't think anything prevents the acceleration granted
> device from also working with virtio/vhost devices for instance unless
> you unplug the existing infra.

If the accel mode is using something like vcmdq then it is not
possible to work since the invalidations won't even be trapped.

Even in the case where we trap the invalidations it sure is
complicated.. invalidation is done by ASID which is not obviously
related to any specific device. An ASID could be hidden inside a CD
table that is being HW accessed and also inside a CD table that is SW
accessed. The VMM has no way to know what is going on so you'd end up
forced to replicate all the ASID invalidations. :\

It just doesn't seem worthwhile to try to make it all work.
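
A toy model of the replication problem described above, for illustration only. It is
not QEMU or kernel code: MixedVSmmu, IotlbEntry and tlbi_by_asid() are hypothetical,
and the host side is reduced to a counter. The point is that a trapped invalidate-by-
ASID carries no device, so a vSMMU serving both kinds of device has to act on both
paths every time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SW_IOTLB_SIZE 8

/* One cached translation in the emulated (software) IOTLB. */
typedef struct {
    uint16_t asid;
    uint64_t iova;
    bool valid;
} IotlbEntry;

/* Hypothetical vSMMU serving both emulated and passthrough devices. */
typedef struct {
    IotlbEntry sw_iotlb[SW_IOTLB_SIZE];
    unsigned host_invalidations;  /* stands in for commands sent to the host */
} MixedVSmmu;

/* Handle a trapped TLBI-by-ASID; returns how many SW entries were dropped. */
unsigned tlbi_by_asid(MixedVSmmu *s, uint16_t asid)
{
    unsigned dropped = 0;

    assert(s != NULL);

    /* Software path: the ASID may sit in a CD table walked by emulation. */
    for (size_t i = 0; i < SW_IOTLB_SIZE; i++) {
        if (s->sw_iotlb[i].valid && s->sw_iotlb[i].asid == asid) {
            s->sw_iotlb[i].valid = false;
            dropped++;
        }
    }

    /*
     * Hardware path: the same ASID may also sit in a CD table the physical
     * SMMU walks directly, and the VMM cannot tell, so it forwards always.
     */
    s->host_invalidations++;

    return dropped;
}
```

Even an ASID that matches nothing in the software IOTLB still costs a host
invalidation in this model, which is the replication overhead being pointed out.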

I'd suggest arranging to share some of the SMMUv3 emulation code,
maybe with a library/headerfile or something, but I think it does make
sense they would be different implementations given how completely
different they should be.

Jason



* Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-11 14:10 ` [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
@ 2025-03-19  1:31   ` Donald Dutile
  2025-03-19  9:48     ` Shameerali Kolothum Thodi via
  2025-03-26 13:38   ` Eric Auger
  2025-03-26 13:59   ` Eric Auger
  2 siblings, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-19  1:31 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, berrange, nathanc,
	mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Shameer,

Hi,


On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Inroduce an SMMUCommandBatch and some helpers to batch and issue the
   ^^^^^^^^ Introduce
> commands.  Currently separate out TLBI commands and device cache commands
> to avoid some errata on certain versions of SMMUs. Later it should check
> IIDR register to detect if underlying SMMU hw has such an erratum.
Where is all this info about 'certain versions of SMMUs' documented, and what does
'check IIDR register' have to do with detecting whether the 'underlying SMMU hw has such an erratum'
-- which IIDR (& bits)? Or are we talking about rsvd SMMU_IDR<> registers?


And can't these helpers be used for emulated smmuv3 as well as accelerated?

Thanks,
- Don
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   hw/arm/smmuv3-accel.c    | 69 ++++++++++++++++++++++++++++++++++++++++
>   hw/arm/smmuv3-internal.h | 29 +++++++++++++++++
>   2 files changed, 98 insertions(+)
> 
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 76134d106a..09be838d22 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -160,6 +160,75 @@ void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>                                             nested_data.ste[0]);
>   }
>   
> +/* Update batch->ncmds to the number of execute cmds */
> +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
> +{
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(bs);
> +    uint32_t total = batch->ncmds;
> +    IOMMUFDViommu *viommu_core;
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return 0;
> +    }
> +
> +    if (!s_accel->viommu) {
> +        return 0;
> +    }
> +    viommu_core = &s_accel->viommu->core;
> +    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
> +                                           viommu_core->viommu_id,
> +                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
> +                                           sizeof(Cmd), &batch->ncmds,
> +                                           batch->cmds);
> +    if (total != batch->ncmds) {
> +        error_report("%s failed: ret=%d, total=%d, done=%d",
> +                      __func__, ret, total, batch->ncmds);
> +        return ret;
> +    }
> +
> +    batch->ncmds = 0;
> +    batch->dev_cache = false;
> +    return ret;
> +}
> +
> +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> +                            SMMUCommandBatch *batch, Cmd *cmd,
> +                            uint32_t *cons, bool dev_cache)
> +{
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return 0;
> +    }
> +
> +    if (sdev) {
> +        SMMUv3AccelDevice *accel_dev;
> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +        if (!accel_dev->s1_hwpt) {
> +            return 0;
> +        }
> +    }
> +
> +    /*
> +     * Currently separate out dev_cache and hwpt for safety, which might
> +     * not be necessary if underlying HW SMMU does not have the errata.
> +     *
> +     * TODO check IIDR register values read from hw_info.
> +     */
> +    if (batch->ncmds && (dev_cache != batch->dev_cache)) {
> +        ret = smmuv3_accel_issue_cmd_batch(bs, batch);
> +        if (ret) {
> +            *cons = batch->cons[batch->ncmds];
> +            return ret;
> +        }
> +    }
> +    batch->dev_cache = dev_cache;
> +    batch->cmds[batch->ncmds] = *cmd;
> +    batch->cons[batch->ncmds++] = *cons;
> +    return 0;
> +}
> +
>   static bool
>   smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
>                                  HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 46c8bcae14..4602ae6728 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -549,13 +549,42 @@ typedef struct CD {
>       uint32_t word[16];
>   } CD;
>   
> +/**
> + * SMMUCommandBatch - batch of invalidation commands for smmuv3-accel
> + * @cmds: Pointer to list of commands
> + * @cons: Pointer to list of CONS corresponding to the commands
> + * @ncmds: Total ncmds in the batch
> + * @dev_cache: Issue to a device cache
> + */
> +typedef struct SMMUCommandBatch {
> +    Cmd *cmds;
> +    uint32_t *cons;
> +    uint32_t ncmds;
> +    bool dev_cache;
> +} SMMUCommandBatch;
> +
>   int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
>                     SMMUEventInfo *event);
>   void smmuv3_flush_config(SMMUDevice *sdev);
>   
>   #if defined(CONFIG_ARM_SMMUV3_ACCEL) && defined(CONFIG_IOMMUFD)
> +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch);
> +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> +                            SMMUCommandBatch *batch, Cmd *cmd,
> +                            uint32_t *cons, bool dev_cache);
>   void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid);
>   #else
> +static inline int smmuv3_accel_issue_cmd_batch(SMMUState *bs,
> +                                               SMMUCommandBatch *batch)
> +{
> +    return 0;
> +}
> +static inline int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> +                                          SMMUCommandBatch *batch, Cmd *cmd,
> +                                          uint32_t *cons, bool dev_cache)
> +{
> +    return 0;
> +}
>   static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>   {
>   }




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19  0:23                     ` Jason Gunthorpe
@ 2025-03-19  2:15                       ` Donald Dutile
  2025-03-19 17:00                       ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-19  2:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Eric Auger, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao

Jason,
Hey!

On 3/18/25 8:23 PM, Jason Gunthorpe wrote:
> On Tue, Mar 18, 2025 at 05:22:51PM -0400, Donald Dutile wrote:
> 
>> I agree with Eric that 'accel' isn't needed -- this should be
>> ascertained from the pSMMU that a physical device is attached to.
> 
> I seem to remember the point was made that we don't actually know if
> accel is possible, or desired, especially in the case of hotplug.
> 
In the case of hw-passthrough hot-plug, what isn't known?
a) domain:b:d.f is known
b) thus its hierarchy and SMMUv3 association in the host is known
c) thus, if the (accel) features of the SMMUv3 were exposed (known),
    then the proper setup (separate SMMUv3 vs system-wide-emulated SMMUv3;
    association of (allocated/configured) vSMMUv3 to pSMMUv3) would be known/made

What else is missing?

> The accelerated mode has a number of limitations that the software
> mode does not have. I think it does make sense that the user would
> deliberately choose to use a more restrictive operating mode and then
> would have to meet the requirements - eg by creating the required
> number and configuration of vSMMUs.
> 
At a qemu-cmd level, yes, the user has to specify the right number & config of smmuv3's, but
libvirt, if it had the above info, could auto-generate the right number
of smmuv3's (stages, accel-features, etc.) ... just as it does today in
creating the right number of pcie buses, RPs, etc. from simple(r)
device specs into more complete qemu configs.

>> Now... how does vfio(?; why not qemu?) layer determine that? --
>> where are SMMUv3 'accel' features exposed either: a) in the device
>> struct (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't
>> find anything under either on my g-h system, but would appreciate a
>> ptr if there is.
> 
> I think it is not discoverable yet other thatn through
> try-and-fail. Discoverability would probably be some bits in an
> iommufd GET_INFO ioctl or something like that.
> 
I don't see how iommufd would 'get-info' the needed info any better
than any other interface/subsystem.  ...

>> and like Eric, although 'accel' is better than the
>> original 'nested', it's non-obvious what accel feature(s) are being
>> turned on, or not.
> 
> There are really only one accel feature - direct HW usage of the IO
> Page table in the guest (no shadowing).
> 
> A secondary addon would be direct HW usage of an invalidation queue in
> the guest.
> 
and, if architected correctly, even in (device-specific) sw-provided tables,
it could be 'formatted' in a way that it was discoverable by the appropriate layers
(libvirt, qemu).
Once discoverable, this whole separate accel device -- which is really an
attribute of an SMMUv3 -- can be generalized, and reduced, to a much
smaller, simpler, sw footprint, with the concept of callbacks (as the series
uses) to enable hw accelerators to perform the shadow-ops that fully-emulated
smmuv3 would have to do.
  
>> kernel boot-param will be needed; if in sysfs, a write to 0 an
>> enable(disable) it maybe an alternative as well.  Bottom line: we
>> need a way to (a) ascertain the accel feature (b) a way to disable
>> it when it is broken, so qemu's smmuv3 spec will 'just work'.
> 
> You'd turned it off by not asking qemu to use it, that is sort of the
> reasoning behind the command line opt in for accel or not.
It would make machine-level definitions far more portable if the
working/non-working, and the one-accel, or two-accel, or three-accel, or ...
features were dynamically determined  vs a static (qemu) machine config, that would have
to be manipulated each time it ran on a different machine.

e.g., cluster sw scans servers for machines with device-X.
       create VMs, assigning some/all of device-X to a VM via its own smmuv3. done.
       Now, if the smmuv3 features were exposed all the way up to userspace,
       then one could argue the cluster sw could scan for those features and add it
       to the accel=x,y,z option of the smmuv3 associated with an assigned device.
       potato/po-tah-toe cluster sw or libvirt or qemu or <something-else> scans/reads
        ... discoverability of the features has to be done by
       (a) a computer, or (b) an error-prone human.
      ... all that AI gone to waste ... ;-)

- Don

> 
> Jason
> 




* Re: [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info
  2025-03-11 14:10 ` [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info Shameer Kolothum via
@ 2025-03-19  2:45   ` Donald Dutile
  2025-03-26 14:57   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-19  2:45 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, berrange, nathanc,
	mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao

Shameer,

Hey,


On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Read the underlying SMMUv3 device info and set corresponding IDR
> bits. We need at least one cold-plugged vfio-pci dev associated
> with the smmuv3-accel instance to do this now.  Hence fail if it
> is not available.
> 
> ToDo: The above requirement will be relaxed in future when we add
> support in the kernel.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   hw/arm/smmuv3-accel.c         | 104 ++++++++++++++++++++++++++++++++++
>   hw/arm/trace-events           |   1 +
>   include/hw/arm/smmuv3-accel.h |   2 +
>   3 files changed, 107 insertions(+)
> 
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 09be838d22..fb08e1d66b 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -15,6 +15,96 @@
>   
>   #include "smmuv3-internal.h"
>   
> +static int
> +smmuv3_accel_dev_get_info(SMMUv3AccelDevice *accel_dev, uint32_t *data_type,
> +                          uint32_t data_len, void *data)
> +{
> +    uint64_t caps;
> +
> +    if (!accel_dev || !accel_dev->idev) {
> +        return -ENOENT;
> +    }
> +
> +    return !iommufd_backend_get_device_info(accel_dev->idev->iommufd,
> +                                            accel_dev->idev->devid,
> +                                            data_type, data,
> +                                            data_len, &caps, NULL);
> +}
> +
> +static void smmuv3_accel_init_regs(SMMUv3AccelState *s_accel)
> +{
> +    SMMUv3State *s = ARM_SMMUV3(s_accel);
> +    SMMUv3AccelDevice *accel_dev;
> +    uint32_t data_type;
> +    uint32_t val;
> +    int ret;
> +
> +    if (!s_accel->viommu || QLIST_EMPTY(&s_accel->viommu->device_list)) {
> +        error_report("At least one cold-plugged vfio-pci is required for smmuv3-accel!");
> +        exit(1);
> +    }
> +
> +    accel_dev = QLIST_FIRST(&s_accel->viommu->device_list);
> +    if (accel_dev->info.idr[0]) {
> +        info_report("reusing the previous hw_info");
> +        goto out;
> +    }
> +
> +    ret = smmuv3_accel_dev_get_info(accel_dev, &data_type,
> +                                    sizeof(accel_dev->info), &accel_dev->info);
> +    if (ret) {
> +        error_report("failed to get SMMU device info");
> +        return;
> +    }
> +
> +    if (data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
> +        error_report("Wrong data type (%d)!", data_type);
> +        return;
> +    }
> +
> +out:
> +    trace_smmuv3_accel_get_device_info(accel_dev->info.idr[0],
> +                                       accel_dev->info.idr[1],
> +                                       accel_dev->info.idr[3],
> +                                       accel_dev->info.idr[5]);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, BTM);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, BTM, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, ATS);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ATS, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, ASID16);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ASID16, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, TERM_MODEL);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, TERM_MODEL, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, STALL_MODEL);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STALL_MODEL, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, STLEVEL);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STLEVEL, val);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[1], IDR1, SIDSIZE);
> +    s->idr[1] = FIELD_DP32(s->idr[1], IDR1, SIDSIZE, val);
> +    val = FIELD_EX32(accel_dev->info.idr[1], IDR1, SSIDSIZE);
> +    s->idr[1] = FIELD_DP32(s->idr[1], IDR1, SSIDSIZE, val);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, HAD);
> +    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, HAD, val);
> +    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, RIL);
> +    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, val);
> +    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, BBML);
> +    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, val);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN4K);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN16K);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN64K);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, OAS);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, val);
> +
> +    /* FIXME check iidr and aidr registrs too */
> +}
> +
>   static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>                                                   PCIBus *bus, int devfn)
>   {
> @@ -484,11 +574,25 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>       bs->unset_iommu_device = smmuv3_accel_unset_iommu_device;
>   }
>   
> +static void smmuv3_accel_reset_hold(Object *obj, ResetType type)
> +{
> +    SMMUv3AccelState *s = ARM_SMMUV3_ACCEL(obj);
> +    SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_GET_CLASS(s);
> +
> +    if (c->parent_phases.hold) {
> +        c->parent_phases.hold(obj, type);
> +    }
> +    smmuv3_accel_init_regs(s);
> +}
The reset has to be moved from the hold phase to the exit phase;
Eric recently posted a fix for this issue upstream.

... and if accel were just a feature of the common smmuv3 support, this reset wouldn't be needed.
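
A standalone sketch of the ordering concern, not QEMU code: assuming the parent SMMUv3 reset ends up in the exit phase, an override left in the hold phase runs first and is then overwritten by the parent. `Dev`, `ResetPhases`, `parent_reset`, `accel_init_regs` and the register values are all hypothetical stand-ins for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device with a single ID register. */
typedef struct Dev {
    uint32_t idr0;
} Dev;

typedef void (*PhaseFn)(Dev *d);

/* Hypothetical hook table modelling the enter/hold/exit reset phases. */
typedef struct ResetPhases {
    PhaseFn enter;
    PhaseFn hold;
    PhaseFn exit;
} ResetPhases;

enum { EMULATED_IDR0 = 0x1, HOST_IDR0 = 0x2 };

/* Parent (emulated SMMU) reset: writes the default register value. */
static void parent_reset(Dev *d)
{
    d->idr0 = EMULATED_IDR0;
}

/* Child (accel) step: overrides the register with a host-derived value. */
static void accel_init_regs(Dev *d)
{
    d->idr0 = HOST_IDR0;
}

/* Child exit hook: chain to the parent first, then apply the override. */
static void accel_exit(Dev *d)
{
    parent_reset(d);
    accel_init_regs(d);
}

/* Run the phases in order, skipping hooks that are not set. */
static void run_reset(const ResetPhases *p, Dev *d)
{
    if (p->enter) {
        p->enter(d);
    }
    if (p->hold) {
        p->hold(d);
    }
    if (p->exit) {
        p->exit(d);
    }
}

/* Override in hold while the parent resets in exit: the override is lost. */
static const ResetPhases hold_variant = { NULL, accel_init_regs, parent_reset };

/* Override in exit, after chaining to the parent: the override survives. */
static const ResetPhases exit_variant = { NULL, NULL, accel_exit };
```

With `hold_variant` the register ends up with the emulated default; with `exit_variant` it keeps the host-derived value.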

> +
>   static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
>   {
>       DeviceClass *dc = DEVICE_CLASS(klass);
> +    ResettableClass *rc = RESETTABLE_CLASS(klass);
>       SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_CLASS(klass);
>   
> +    resettable_class_set_parent_phases(rc, NULL, smmuv3_accel_reset_hold, NULL,
> +                                       &c->parent_phases);
>       device_class_set_parent_realize(dc, smmu_accel_realize,
>                                       &c->parent_realize);
>       dc->hotpluggable = false;
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index cd2eac31c2..c7a7e58291 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -62,6 +62,7 @@ smmu_reset_exit(void) ""
>   smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
>   smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
>   smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
> +smmuv3_accel_get_device_info(uint32_t idr0, uint32_t idr1, uint32_t idr3, uint32_t idr5) "idr0=0x%x idr1=0x%x idr3=0x%x idr5=0x%x"
>   
>   # strongarm.c
>   strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index 58e68534c0..9e30d7d351 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -52,6 +52,7 @@ typedef struct SMMUv3AccelDevice {
>       SMMUViommu *viommu;
>       SMMUVdev   *vdev;
>       AddressSpace as_sysmem;
> +    struct iommu_hw_info_arm_smmuv3 info;
>       QLIST_ENTRY(SMMUv3AccelDevice) next;
>   } SMMUv3AccelDevice;
>   
> @@ -68,6 +69,7 @@ struct SMMUv3AccelClass {
>       /*< public >*/
>   
>       DeviceRealize parent_realize;
> +    ResettablePhases parent_phases;
>   };
>   
>   #endif /* HW_ARM_SMMUV3_ACCEL_H */

In general, I would move this common code to the front of the patch series ... it is just gathering registers, capabilities, etc.
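
The register gathering quoted above repeats one extract/deposit pair per field. A possible table-driven shape for such common code is sketched below in plain C, without QEMU's `FIELD_EX32`/`FIELD_DP32` macros. The helper names, the `FieldDesc` type and the bit positions in `inherited_fields` are assumptions for illustration only, not the real SMMUv3 register layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract `length` bits starting at bit `shift` (requires 1 <= length <= 31). */
static inline uint32_t field_extract32(uint32_t reg, unsigned shift,
                                       unsigned length)
{
    return (reg >> shift) & ((1u << length) - 1u);
}

/* Return `reg` with the field replaced by the low `length` bits of `val`. */
static inline uint32_t field_deposit32(uint32_t reg, unsigned shift,
                                       unsigned length, uint32_t val)
{
    uint32_t mask = ((1u << length) - 1u) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}

/* Hypothetical field descriptor; the positions below are made up. */
typedef struct FieldDesc {
    unsigned shift;
    unsigned length;
} FieldDesc;

static const FieldDesc inherited_fields[] = {
    { 10, 1 },  /* e.g. a one-bit capability flag */
    { 24, 2 },  /* e.g. a two-bit model selector */
};

/* Copy every listed field from `host` into `guest`, leaving other bits alone. */
static uint32_t inherit_fields(uint32_t guest, uint32_t host)
{
    size_t n = sizeof(inherited_fields) / sizeof(inherited_fields[0]);

    for (size_t i = 0; i < n; i++) {
        const FieldDesc *f = &inherited_fields[i];
        uint32_t val = field_extract32(host, f->shift, f->length);

        guest = field_deposit32(guest, f->shift, f->length, val);
    }
    return guest;
}
```

One table per ID register would keep the list of inherited fields in a single place, which may be easier to review than the long sequence of pairs.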

- Don



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3
  2025-03-11 14:10 ` [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3 Shameer Kolothum via
@ 2025-03-19  2:52   ` Donald Dutile
  2025-03-26 17:40   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-19  2:52 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, berrange, nathanc,
	mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao



On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> If a vSMMU is configured as a accelerated one, HW IOTLB will be used
> and all cache invalidation should be done to the HW IOTLB too, v.s.
> the emulated iotlb. In this case, an iommu notifier isn't registered,
> as the devices behind a SMMUv3-accel would stay in the system address
> space for stage-2 mappings.
> 
> However, the KVM code still requests an iommu address space to translate
> an MSI doorbell gIOVA via get_msi_address_space() and translate().
> 
> Since a SMMUv3-accel doesn't register an iommu notifier to flush emulated
> iotlb, bypass the emulated IOTLB and always walk through the guest-level
> IO page table.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   hw/arm/smmu-common.c | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
> 
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 9fd455baa0..fd10df8866 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -77,6 +77,17 @@ static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs,
>       uint8_t level = 4 - (inputsize - 4) / stride;
>       SMMUTLBEntry *entry = NULL;
>   
> +    /*
> +     * Stage-1 translation with a accel SMMU in general uses HW IOTLB. However,
> +     * KVM still requests for an iommu address space for an MSI fixup by looking
> +     * up stage-1 page table. Make sure we don't go through the emulated pathway
> +     * so that the emulated iotlb will not need any invalidation.
> +     */
> +
> +    if (bs->accel) {
> +        return NULL;
> +    }
> +
>       while (level <= 3) {
>           uint64_t subpage_size = 1ULL << level_shift(level, tt->granule_sz);
>           uint64_t mask = subpage_size - 1;
> @@ -142,6 +153,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
>       SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
>       uint8_t tg = (new->granule - 10) / 2;
>   
> +    /*
> +     * Stage-1 translation with a accel SMMU in general uses HW IOTLB. However,
> +     * KVM still requests for an iommu address space for an MSI fixup by looking
> +     * up stage-1 page table. Make sure we don't go through the emulated pathway
> +     * so that the emulated iotlb will not need any invalidation.
> +     */
> +    if (bs->accel) {
> +        return;
> +    }
> +
>       if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
>           smmu_iotlb_inv_all(bs);
>       }

Ah! ... if 'accel', skip emulated code since hw handling it... in common smmu code... I like it! :)
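
A toy model of that bypass, as I read the patch: with the accel flag set, lookup always misses and insert is a no-op, so every translation walks the page table and the emulated cache never holds anything that would need invalidating. `SmmuModel`, `TlbEntry`, `walk()` and the direct-mapped cache are hypothetical simplifications, not the QEMU data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4

typedef struct TlbEntry {
    uint64_t iova;
    uint64_t pa;
    bool valid;
} TlbEntry;

typedef struct SmmuModel {
    bool accel;                  /* true: HW IOTLB in use, skip emulated cache */
    TlbEntry cache[CACHE_SLOTS]; /* tiny direct-mapped emulated IOTLB */
    unsigned walks;              /* number of page-table walks performed */
} SmmuModel;

/* Return the cached entry for `iova`, or NULL on a miss or in accel mode. */
static TlbEntry *cache_lookup(SmmuModel *s, uint64_t iova)
{
    TlbEntry *e = &s->cache[iova % CACHE_SLOTS];

    if (s->accel) {
        return NULL;
    }
    return (e->valid && e->iova == iova) ? e : NULL;
}

/* Store a translation, unless the emulated cache is bypassed. */
static void cache_insert(SmmuModel *s, uint64_t iova, uint64_t pa)
{
    TlbEntry *e = &s->cache[iova % CACHE_SLOTS];

    if (s->accel) {
        return;
    }
    e->iova = iova;
    e->pa = pa;
    e->valid = true;
}

/* Stand-in for a guest page-table walk: a fixed offset mapping. */
static uint64_t walk(SmmuModel *s, uint64_t iova)
{
    s->walks++;
    return iova + 0x1000;
}

/* Translate `iova`, consulting the emulated cache only when not in accel mode. */
static uint64_t translate(SmmuModel *s, uint64_t iova)
{
    TlbEntry *e = cache_lookup(s, iova);
    uint64_t pa;

    if (e) {
        return e->pa;
    }
    pa = walk(s, iova);
    cache_insert(s, iova, pa);
    return pa;
}
```

Translating the same address twice costs one walk in the emulated case and two walks in the accel case, which seems acceptable for the occasional MSI doorbell lookup.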
- Don



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19  0:31                 ` Jason Gunthorpe
@ 2025-03-19  5:27                   ` Nicolin Chen
  2025-03-24 14:08                   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-19  5:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Eric Auger, Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 18, 2025 at 09:31:35PM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
> > Nevertheless I don't think anything prevents the acceleration granted
> > device from also working with virtio/vhost devices for instance unless
> > you unplug the existing infra.
> 
> If the accel mode is using something like vcmdq then it is not
> possible to work since the invalidations won't even be trapped.

Yea, I totally forgot that.. All the invalidation commands that
belong to emulated devices would be issued to VCMDQ (HW), while
those vSIDs wouldn't be supported by the HW for CFGI_CD/ATC_INV,
which will trigger errors/timeouts.

Thanks
Nicolin


^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-18 22:12   ` Donald Dutile
@ 2025-03-19  9:26     ` Shameerali Kolothum Thodi via
  2025-03-19 16:21       ` Donald Dutile
  2025-03-20 17:02       ` Nicolin Chen
  0 siblings, 2 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-19  9:26 UTC (permalink / raw)
  To: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Don,

> -----Original Message-----
> From: Donald Dutile <ddutile@redhat.com>
> Sent: Tuesday, March 18, 2025 10:12 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> pcie bus
> 
> Shameer,
> 
> Hi!
> 
> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> > User must associate a pxb-pcie root bus to smmuv3-accel
> > and that is set as the primary-bus for the smmu dev.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >   hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
> >   1 file changed, 19 insertions(+)
> >
> > diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> > index c327661636..1471b65374 100644
> > --- a/hw/arm/smmuv3-accel.c
> > +++ b/hw/arm/smmuv3-accel.c
> > @@ -9,6 +9,21 @@
> >   #include "qemu/osdep.h"
> >
> >   #include "hw/arm/smmuv3-accel.h"
> > +#include "hw/pci/pci_bridge.h"
> > +
> > +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
> > +{
> > +    DeviceState *d = opaque;
> > +
> > +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
> > +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
> > +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus-
> >name)) {
> > +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> > +                                     &error_abort);
> > +        }
> > +    }
> > +    return 0;
> > +}
> >
> >   static void smmu_accel_realize(DeviceState *d, Error **errp)
> >   {
> > @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error
> **errp)
> >       SysBusDevice *dev = SYS_BUS_DEVICE(d);
> >       Error *local_err = NULL;
> >
> > +    object_child_foreach_recursive(object_get_root(),
> > +                                   smmuv3_accel_pxb_pcie_bus, d);
> > +
> >       object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
> >       c->parent_realize(d, &local_err);
> >       if (local_err) {
> > @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass
> *klass, void *data)
> >       device_class_set_parent_realize(dc, smmu_accel_realize,
> >                                       &c->parent_realize);
> >       dc->hotpluggable = false;
> > +    dc->bus_type = TYPE_PCIE_BUS;
> >   }
> >
> >   static const TypeInfo smmuv3_accel_type_info = {
> 
> I am not seeing the need for a pxb-pcie bus(switch) introduced for each
> 'accel'.
> Isn't the IORT able to define different SMMUs for different RIDs?   if so,
> itsn't that sufficient
> to associate (define) an SMMU<->RID association without introducing a
> pxb-pcie?
> and again, I'm not sure how that improves/enables the device<->SMMU
> associativity?

Thanks for taking a look at the series. As discussed elsewhere in this thread (with
Eric), in the physical world (or at least in the most common cases) an SMMUv3
is attached to a PCIe Root Complex, and the IORT spec describes the
association through ID mappings between an RC node and an SMMUv3 node.

If my understanding is correct, in QEMU only pxb-pcie allows you to add
extra root complexes, even though it is still plugged into a parent (pcie.0). I.e., for all
devices downstream it acts as a root complex, but it is still plugged into the parent pcie.0.
This allows us to add/describe multiple "smmuv3-accel" devices, each associated with an RC.

Having said that, the current code only allows pxb-pcie root complexes and excludes
pcie.0. The idea behind this was that the user can use pcie.0 with a non-accel SMMUv3
for any emulated devices, avoiding the performance bottlenecks we are
discussing for the emulated dev + smmuv3-accel cases. But based on the feedback from
Eric and Daniel, I will relax that restriction and allow association with pcie.0.
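
To make the RC-to-SMMUv3 ID mapping idea concrete, here is a simplified sketch of resolving a PCI requester ID to an SMMU and a stream ID through a table of ranges. The `IdMapping` struct is not the binary IORT layout (it stores a plain count and an index instead of the table's encoded fields), and the bus ranges in `example_maps` are assumptions loosely based on two pxb-pcie bridges at bus numbers 1 and 8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified ID mapping: input RID range -> output stream ID range + SMMU. */
typedef struct IdMapping {
    uint32_t input_base;   /* first requester ID covered */
    uint32_t id_count;     /* number of requester IDs covered */
    uint32_t output_base;  /* stream ID assigned to input_base */
    int smmu_index;        /* which SMMU the range is routed to */
} IdMapping;

/* Requester ID as bus number in bits 15:8 and devfn in bits 7:0. */
static inline uint32_t pci_rid(uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)bus << 8) | devfn;
}

/* Find the mapping covering `rid`; on success report the SMMU and stream ID. */
static bool map_rid(const IdMapping *maps, size_t n, uint32_t rid,
                    int *smmu_index, uint32_t *sid)
{
    for (size_t i = 0; i < n; i++) {
        const IdMapping *m = &maps[i];

        if (rid >= m->input_base && rid - m->input_base < m->id_count) {
            *smmu_index = m->smmu_index;
            *sid = m->output_base + (rid - m->input_base);
            return true;
        }
    }
    return false;
}

/* Illustrative ranges only: buses 1-7 behind SMMU 0, buses 8-9 behind SMMU 1. */
static const IdMapping example_maps[] = {
    { 0x0100, 0x0700, 0x0100, 0 },
    { 0x0800, 0x0200, 0x0800, 1 },
};
```

In this sketch a device on bus 0 matches no range, which corresponds to the current restriction of not covering pcie.0; relaxing it would amount to adding a range for bus 0.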

Thanks,
Shameer








 

>>> to root complexes.
> Feel free to enlighten me where I may have mis-read/interpreted the IORT
> & SMMUv3 specs.
> 
> Thanks,
> - Don
> 


^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel
  2025-03-18 22:49   ` Donald Dutile
@ 2025-03-19  9:28     ` Shameerali Kolothum Thodi via
  0 siblings, 0 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-19  9:28 UTC (permalink / raw)
  To: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Donald Dutile <ddutile@redhat.com>
> Sent: Tuesday, March 18, 2025 10:50 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-
> accel
> 
> 
> Doesn't this commit become moot, if accel becomes an smmuv3 option vs
> separate device object altogether, dynamically added if a pdev is attached
> to a host SMMUv3 that has accel feature(s)?
> 
> Blocking w/virtio-iommu falls under the same situation mentioned in 03/20
> wrt mixing emulated & physical devices on the same smmuv3.

Yes, this patch might change once we move to the "-device smmuv3,accel=on" version.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-19  1:31   ` Donald Dutile
@ 2025-03-19  9:48     ` Shameerali Kolothum Thodi via
  2025-03-19 16:24       ` Donald Dutile
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-19  9:48 UTC (permalink / raw)
  To: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Donald Dutile <ddutile@redhat.com>
> Sent: Wednesday, March 19, 2025 1:31 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers
> to batch and issue cache invalidations
> 
> Shameer,
> 
> Hi,
> 
> 
> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> >
> > Inroduce an SMMUCommandBatch and some helpers to batch and issue
> the
>    ^^^^^^^^ Introduce
> > commands.  Currently separate out TLBI commands and device cache
> > commands to avoid some errata on certain versions of SMMUs. Later it
> > should check IIDR register to detect if underlying SMMU hw has such an
> erratum.
> Where is all this info about 'certain versions of SMMUs' and 'check IIDR
> register' has something to do with 'underlying SMMU hw such an erratum',
> -- which IIDR (& bits)? or are we talking about rsvd SMMU_IDR<> registers?

I guess the batching has constraints on some platforms; IIRC, this was discussed
somewhere in a kernel thread.

Nicolin, could you please provide some background on this?

> 
> And can't these helpers be used for emulated smmuv3 as well as
> accelerated?

It could be, I guess, but there is no benefit in terms of performance. It may make the
code look nicer. I will take a look and do it if it doesn't need many changes in the emulated path.
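
For reference, the batching idea reduced to a standalone sketch: commands are queued into two separate batches (TLBI versus device-cache commands) so the two kinds are never issued together, and each batch is flushed when full or on sync. `Batcher`, `CmdBatch`, `BATCH_MAX` and the counters are hypothetical; the actual helpers in the patch may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 4

typedef enum CmdKind {
    CMD_TLBI,       /* TLB invalidation */
    CMD_DEV_CACHE,  /* device cache invalidation */
} CmdKind;

typedef struct Cmd {
    CmdKind kind;
    uint64_t arg;
} Cmd;

typedef struct CmdBatch {
    Cmd cmds[BATCH_MAX];
    size_t ncmds;    /* commands currently queued */
    size_t flushes;  /* how many times the batch was issued */
    size_t issued;   /* total commands issued so far */
} CmdBatch;

/* Issue everything queued in one go; a no-op on an empty batch. */
static void batch_flush(CmdBatch *b)
{
    if (b->ncmds == 0) {
        return;
    }
    /* A real implementation would pass cmds[0..ncmds) to the host here. */
    b->issued += b->ncmds;
    b->flushes++;
    b->ncmds = 0;
}

/* Queue one command, flushing first if the batch is already full. */
static void batch_add(CmdBatch *b, Cmd c)
{
    if (b->ncmds == BATCH_MAX) {
        batch_flush(b);
    }
    b->cmds[b->ncmds++] = c;
}

/* One batch per command kind, so the kinds are never mixed in one issue. */
typedef struct Batcher {
    CmdBatch tlbi;
    CmdBatch dev;
} Batcher;

static void batcher_queue(Batcher *q, Cmd c)
{
    batch_add(c.kind == CMD_TLBI ? &q->tlbi : &q->dev, c);
}

/* Flush both batches, TLBI first. */
static void batcher_sync(Batcher *q)
{
    batch_flush(&q->tlbi);
    batch_flush(&q->dev);
}
```

The emulated path could call the same queue/sync helpers with a flush that applies the invalidations to the emulated IOTLB, which would be the code-sharing benefit even without a performance gain.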

Thanks,
Shameer



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-19  9:26     ` Shameerali Kolothum Thodi via
@ 2025-03-19 16:21       ` Donald Dutile
  2025-03-19 18:21         ` Eric Auger
  2025-03-20 17:02       ` Nicolin Chen
  1 sibling, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-19 16:21 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



On 3/19/25 5:26 AM, Shameerali Kolothum Thodi wrote:
> Hi Don,
> 
Hey!

>> -----Original Message-----
>> From: Donald Dutile <ddutile@redhat.com>
>> Sent: Tuesday, March 18, 2025 10:12 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>> pcie bus
>>
>> Shameer,
>>
>> Hi!
>>
>> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>> and that is set as the primary-bus for the smmu dev.
>>>
>>> Signed-off-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@huawei.com>
>>> ---
>>>    hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>>>    1 file changed, 19 insertions(+)
>>>
>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>>> index c327661636..1471b65374 100644
>>> --- a/hw/arm/smmuv3-accel.c
>>> +++ b/hw/arm/smmuv3-accel.c
>>> @@ -9,6 +9,21 @@
>>>    #include "qemu/osdep.h"
>>>
>>>    #include "hw/arm/smmuv3-accel.h"
>>> +#include "hw/pci/pci_bridge.h"
>>> +
>>> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
>>> +{
>>> +    DeviceState *d = opaque;
>>> +
>>> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
>>> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
>>> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus-
>>> name)) {
>>> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
>>> +                                     &error_abort);
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>>>
>>>    static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>    {
>>> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error
>> **errp)
>>>        SysBusDevice *dev = SYS_BUS_DEVICE(d);
>>>        Error *local_err = NULL;
>>>
>>> +    object_child_foreach_recursive(object_get_root(),
>>> +                                   smmuv3_accel_pxb_pcie_bus, d);
>>> +
>>>        object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
>>>        c->parent_realize(d, &local_err);
>>>        if (local_err) {
>>> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass
>> *klass, void *data)
>>>        device_class_set_parent_realize(dc, smmu_accel_realize,
>>>                                        &c->parent_realize);
>>>        dc->hotpluggable = false;
>>> +    dc->bus_type = TYPE_PCIE_BUS;
>>>    }
>>>
>>>    static const TypeInfo smmuv3_accel_type_info = {
>>
>> I am not seeing the need for a pxb-pcie bus(switch) introduced for each
>> 'accel'.
>> Isn't the IORT able to define different SMMUs for different RIDs?   if so,
>> itsn't that sufficient
>> to associate (define) an SMMU<->RID association without introducing a
>> pxb-pcie?
>> and again, I'm not sure how that improves/enables the device<->SMMU
>> associativity?
> 
> Thanks for taking a look at the series. As discussed elsewhere in this thread(with
> Eric), normally in physical world (or atleast in the most common cases) SMMUv3
> is attached to PCIe Root Complex and if you take a look at the IORT spec, it describes
> association of ID mappings between a RC node and SMMUV3 node.
> 
> And if my understanding is correct, in Qemu, only pxb-pcie allows you to add
> extra root complexes even though it is still plugged to parent(pcie.0). ie, for all
> devices downstream it acts as a root complex but still plugged into a parent pcie.0.
> This allows us to add/describe multiple "smmuv3-accel" each associated with a RC.
> 
I find the QEMU statements a bit unclear here as well.
I looked at the hotplug statement(s) in docs/pcie.txt, as I figured that's where dynamic
IORT changes would be needed as well.  There, it says you can hot-add PCIe devices to RPs,
but one has to define/add RPs to the machine model for that plug-in.

Libvirt could auto-add the needed RPs to do dynamic smmuv3 additions,
if I understand correctly how libvirt does that today for PCIe devices (/me looks at danpb for feedback).

> Having said that,  current code only allows pxb-pcie root complexes avoiding
> the pcie.0. The idea behind this was, user can use pcie.0 with a non accel SMMUv3
> for any emulated devices avoiding the performance bottlenecks we are
> discussing for emulated dev+smmuv3-accel cases. But based on the feedback from
> Eric and Daniel I will relax that restriction and will allow association with pcie.0.
> 
So, I think this isn't a restriction that this smmuv3 feature should enforce;
lack of a proper RP or pxb-pcie will yield an invalid config issue/error, and
the machine definition will be modified to meet the needs for IORT.

> Thanks,
> Shameer
> 
> 
> 
> 
> 
> 
> 
> 
>   
> 
>>>> to root complexes.
>> Feel free to enlighten me where I may have mis-read/interpreted the IORT
>> & SMMUv3 specs.
>>
>> Thanks,
>> - Don
>>
> 



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-19  9:48     ` Shameerali Kolothum Thodi via
@ 2025-03-19 16:24       ` Donald Dutile
  2025-03-19 16:48         ` Nicolin Chen
  0 siblings, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-19 16:24 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



On 3/19/25 5:48 AM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Donald Dutile <ddutile@redhat.com>
>> Sent: Wednesday, March 19, 2025 1:31 AM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers
>> to batch and issue cache invalidations
>>
>> Shameer,
>>
>> Hi,
>>
>>
>> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>
>>> Inroduce an SMMUCommandBatch and some helpers to batch and issue
>> the
>>     ^^^^^^^^ Introduce
>>> commands.  Currently separate out TLBI commands and device cache
>>> commands to avoid some errata on certain versions of SMMUs. Later it
>>> should check IIDR register to detect if underlying SMMU hw has such an
>> erratum.
>> Where is all this info about 'certain versions of SMMUs' and 'check IIDR
>> register' has something to do with 'underlying SMMU hw such an erratum',
>> -- which IIDR (& bits)? or are we talking about rsvd SMMU_IDR<> registers?
> 
> I guess the batching has constraints on some platforms, IIRC, this was discussed
> somewhere in a kernel thread.
> 
> Nicolin, could you please provide some background on this.
> 
A lore link if it's discussed upstream, thanks.

>>
>> And can't these helpers be used for emulated smmuv3 as well as
>> accelerated?
> 
> Could be I guess. But no benefit in terms of performance. May be will make
> code look nicer. I will take a look if not much of changes in the emulated path.
> 
Thanks for looking into it.  The push is to use common code path(s) to invoke (or not)
an accel path.

> Thanks,
> Shameer
> 
> 



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (19 preceding siblings ...)
  2025-03-11 14:10 ` [RFC PATCH v2 20/20] hw/arm/smmuv3-accel: Enable smmuv3-accel creation Shameer Kolothum via
@ 2025-03-19 16:40 ` Philippe Mathieu-Daudé
  2025-03-19 17:13   ` Eric Auger
  2025-03-25 14:42 ` Eric Auger
  21 siblings, 1 reply; 145+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-03-19 16:40 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, berrange,
	nathanc, mochs, smostafa, linuxarm, wangzhou1, jiangkunkun,
	jonathan.cameron, zhangfei.gao, Cédric Le Goater

Hi,

On 11/3/25 15:10, Shameer Kolothum via wrote:
> Hi All,
> 
> This patch series introduces initial support for a user-creatable
> accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.

I'm a bit confused by the design here. Why are we introducing this as
some device while it is a core component of the bus topology (here PCI)?

Is it because this device is inspired by how x86 IOMMUs are wired?

> Why this is needed:
> 
> Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
> machine and does not support configuring the host SMMUv3 in nested
> mode.This limitation prevents its use with vfio-pci passthrough
> devices.
> 
> The new pluggable smmuv3-accel device enables host SMMUv3 configuration
> with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
> host) via the new IOMMUFD APIs. Additionally, it allows multiple
> accelerated vSMMUv3 instances for guests running on hosts with multiple
> physical SMMUv3s.
> 
> This will benefit in:
> -Reduced invalidation broadcasts and lookups for devices behind multiple
>   physical SMMUv3s.
> -Simplifies handling of host SMMUv3s with differing feature sets.
> -Lays the groundwork for additional capabilities like vCMDQ support.
> 
> 
> Changes from RFCv1[0]:
> 
> Thanks to everyone who provided feedback on RFCv1!.
> 
> –The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
>   to better reflect its role in using the host's physical SMMUv3 for page
>   table setup and cache invalidations.
> -Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
> -Merges patches from Nicolin’s GitHub repository that add accelerated
>   functionalityi for page table setup and cache invalidations[1]. I have
>   modified these a bit, but hopefully has not broken anything.
> -Incorporates various fixes and improvements based on RFCv1 feedback.
> –Adds support for vfio-pci hotplug with smmuv3-accel.
> 
> Note: IORT RMR patches for MSI setup are currently excluded as we may
> adopt a different approach for MSI handling in the future [2].
> 
> Also this has dependency on the common iommufd/vfio patches from
> Zhenzhong's series here[3]
> 
> ToDos:
> 
> –At least one vfio-pci device must currently be cold-plugged to a
>   pxb-pcie bus associated with the arm-smmuv3-accel. This is required both
>   to associate a vSMMUv3 with a host SMMUv3 and also needed to
>   retrieve the host SMMUv3 IDR registers for guest export.
>   Future updates will remove this restriction by adding the
>   necessary kernel support.
>   Please find the discussion here[4]
> -This version does not yet support host SMMUv3 fault handling or
>   other event notifications. These will be addressed in a
>   future patch series.
> 
> 
> The complete branch can be found here:
> https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
> 
> I have done basic sanity testing on a Hisilicon Platform using the kernel
> branch here:
> https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
> 
> Usage Eg:
> 
> On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different host SMMUv3s. So for a
> Guest, specify two arm-smmuv3-accel devices each behind a pxb-pcie as below,
> 
> 
> ./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
> -cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
> -bios QEMU_EFI.fd \
> -object iommufd,id=iommufd0 \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.1 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.2 \
> -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
> -kernel Image \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> -net none \
> -nographic
> 
> Guest will boot with two SMMUv3s,
> ...
> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
> 
> With a pci topology like below,
> 
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>   |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>   |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>   |           \-03.0  Virtio: Virtio filesystem
>   +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>   |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>   \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> 
> Further tests are always welcome.
> 
> Please take a look and let me know your feedback!
> 
> Thanks,
> Shameer
> 
> [0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
> [1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
> [2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
> [3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
> [4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-17 19:10         ` Nicolin Chen
  2025-03-17 19:24           ` Jason Gunthorpe
@ 2025-03-19 16:45           ` Eric Auger
  2025-03-19 16:53             ` Shameerali Kolothum Thodi via
  2025-03-19 17:14             ` Nicolin Chen
  1 sibling, 2 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-19 16:45 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao




On 3/17/25 8:10 PM, Nicolin Chen wrote:
> On Mon, Mar 17, 2025 at 07:07:52PM +0100, Eric Auger wrote:
>> On 3/17/25 6:54 PM, Nicolin Chen wrote:
>>> On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
>>>>> device. In order to support vfio-pci dev assignment with a Guest
>>>> guest
>>>>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
>>>> nested (s1+s2)
>>>>> mode, with Guest owning the S1 page tables. Subsequent patches will
>>>> the guest
>>>>> add support for smmuv3-accel to provide this.
>>>> Can't this -accel smmu also works with emulated devices? Do we want an
>>>> exclusive usage?
>>> Is there any benefit from emulated devices working in the HW-
>>> accelerated nested translation mode?
>> Not really but do we have any justification for using different device
>> name in accel mode? I am not even sure that accel option is really
>> needed. Ideally the qemu device should be able to detect it is
>> protecting a VFIO device, in which case it shall check whether nested is
>> supported by host SMMU and then automatically turn accel mode?
>>
>> I gave the example of the vfio device which has different class
>> implementration depending on the iommufd option being set or not.
> Do you mean that we should just create a regular smmuv3 device and
> let a VFIO device to turn on this smmuv3's accel mode depending on
> its LEGACY/IOMMUFD class?

No, this is not what I meant. I gave an example where, depending on an
option passed to the VFIO device, you choose one class implementation or
the other.
>
> Another question: how does an emulated device work with a vSMMUv3?
I don't get your question. vSMMUv3 currently only works with emulated
devices. Did you mean accelerated SMMUv3?
> I could imagine that all the accel steps would be bypassed since
> !sdev->idev. Yet, the emulated iotlb should cache its translation
> so we will need to flush the iotlb, which will increase complexity
> as the TLBI command dispatching function will need to be aware what
> ASID is for emulated device and what is for vfio device..
I don't get the issue. For an emulated device you go through the usual
translate path, which indeed caches configs and translations. If the
guest invalidates something, you know the SID and you find the entries
in the cache that are tagged by this SID.

If you have an accelerated device (i.e. sdev->idev is set) you don't
exercise that path. On invalidation you detect that the SID matches a
VFIO device and propagate the invalidation to the host instead. So on
invalidation you should be able to detect pretty easily whether you need
to flush the emulated caches or propagate the invalidation. Am I missing
some additional problem?

I am not saying we should support emulated devices and VFIO devices in
the same guest iommu group. But I don't see why we couldn't easily plug
the accelerated logic into the current logic for emulation/vhost,
without requiring a different QEMU device.
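
A minimal sketch of the SID-based dispatch described above, written in
Python for brevity. The class, its fields and the queue of forwarded
commands are hypothetical bookkeeping invented for illustration; they are
not taken from the QEMU SMMUv3 code. It only covers commands that carry a
StreamID (CFGI_STE is used as the example).

```python
from dataclasses import dataclass, field


@dataclass
class VSmmu:
    # SIDs of VFIO (accelerated) devices behind this vSMMU (hypothetical).
    vfio_sids: set = field(default_factory=set)
    # Emulated config cache, keyed by SID (hypothetical).
    config_cache: dict = field(default_factory=dict)
    # Invalidations that would be forwarded to the host SMMU (hypothetical).
    host_queue: list = field(default_factory=list)

    def invalidate_sid(self, sid):
        """Route a SID-tagged invalidation such as CFGI_STE."""
        if sid in self.vfio_sids:
            # Accelerated device: forward to the host, skip emulated caches.
            self.host_queue.append(("CFGI_STE", sid))
            return "host"
        # Emulated device: drop the cached entry tagged by this SID.
        self.config_cache.pop(sid, None)
        return "emulated"


smmu = VSmmu(vfio_sids={0x100})
smmu.config_cache[0x18] = "cached config for an emulated device"
route_vfio = smmu.invalidate_sid(0x100)
route_emu = smmu.invalidate_sid(0x18)
```

With this model the routing decision is a single set lookup per command,
as long as the command actually carries a SID.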

Thanks

Eric
>
> Thanks
> Nicolin
>




* Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-19 16:24       ` Donald Dutile
@ 2025-03-19 16:48         ` Nicolin Chen
  0 siblings, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-19 16:48 UTC (permalink / raw)
  To: Donald Dutile
  Cc: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org, eric.auger@redhat.com,
	peter.maydell@linaro.org, jgg@nvidia.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
	zhangfei.gao@linaro.org

On Wed, Mar 19, 2025 at 12:24:32PM -0400, Donald Dutile wrote:
> > > On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > 
> > > > Inroduce an SMMUCommandBatch and some helpers to batch and issue
> > > the
> > >     ^^^^^^^^ Introduce
> > > > commands.  Currently separate out TLBI commands and device cache
> > > > commands to avoid some errata on certain versions of SMMUs. Later it
> > > > should check IIDR register to detect if underlying SMMU hw has such an
> > > erratum.
> > > Where is all this info about 'certain versions of SMMUs' and 'check IIDR
> > > register' has something to do with 'underlying SMMU hw such an erratum',
> > > -- which IIDR (& bits)? or are we talking about rsvd SMMU_IDR<> registers?
> > 
> > I guess the batching has constraints on some platforms, IIRC, this was discussed
> > somewhere in a kernel thread.
> > 
> > Nicolin, could you please provide some background on this.
> > 
> A lore link if it's discussed upstream, thanks.

https://lore.kernel.org/all/696da78d32bb4491f898f11b0bb4d850a8aa7c6a.1683731256.git.robin.murphy@arm.com/

IIRC, some of them forbid certain command sequences, such as mixing
leaf TLBI commands with non-leaf TLBI commands, or mixing device
commands with TLBI commands.

Currently, the kernel masks away ARM_SMMU_FEAT_NESTING on the
affected SMMU versions/subversions. So I think we are fine for
now, though it probably doesn't hurt to check IIDR?
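
To illustrate the separation the patch description talks about, here is a
small Python sketch that splits a command stream so that TLBI commands and
device commands never share a batch. The function name, the string-based
classification and the batch size are assumptions made for the example;
the real helper in the series works on SMMU command structures, and the
leaf/non-leaf restriction is not modelled here.

```python
def split_batches(cmds, max_batch=4):
    """Group commands into batches that never mix TLBI and device commands.

    cmds is a list of command names; anything starting with "TLBI" is
    treated as a TLBI command, everything else as a device command.
    """
    batches = []
    current, current_kind = [], None
    for cmd in cmds:
        kind = "tlbi" if cmd.startswith("TLBI") else "dev"
        # Close the batch when the command class changes or it is full.
        if current and (kind != current_kind or len(current) == max_batch):
            batches.append(current)
            current = []
        current.append(cmd)
        current_kind = kind
    if current:
        batches.append(current)
    return batches


batches = split_batches(
    ["TLBI_NH_VA", "TLBI_NH_ASID", "ATC_INV", "TLBI_NH_VA"]
)
```

Each resulting batch would then be issued to the host on its own.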

Thanks
Nicolin



* RE: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 16:45           ` Eric Auger
@ 2025-03-19 16:53             ` Shameerali Kolothum Thodi via
  2025-03-19 17:26               ` Eric Auger
  2025-03-19 17:14             ` Nicolin Chen
  1 sibling, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-19 16:53 UTC (permalink / raw)
  To: eric.auger@redhat.com, Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, March 19, 2025 4:46 PM
> To: Nicolin Chen <nicolinc@nvidia.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial
> infrastructure for smmuv3-accel device
> >>> Is there any benefit from emulated devices working in the HW-
> >>> accelerated nested translation mode?
> >> Not really but do we have any justification for using different device
> >> name in accel mode? I am not even sure that accel option is really
> >> needed. Ideally the qemu device should be able to detect it is
> >> protecting a VFIO device, in which case it shall check whether nested is
> >> supported by host SMMU and then automatically turn accel mode?
> >>
> >> I gave the example of the vfio device which has different class
> >> implementration depending on the iommufd option being set or not.
> > Do you mean that we should just create a regular smmuv3 device and
> > let a VFIO device to turn on this smmuv3's accel mode depending on
> > its LEGACY/IOMMUFD class?
> 
> no this is not what I meant. I gave an example where depending on an
> option passed to thye VFIO device you choose one class implement or the
> other.
> >
> > Another question: how does an emulated device work with a vSMMUv3?
> I don't get your question. vSMMUv3 currently only works with emulated
> devices. Did you mean accelerated SMMUv3?
> > I could imagine that all the accel steps would be bypassed since
> > !sdev->idev. Yet, the emulated iotlb should cache its translation
> > so we will need to flush the iotlb, which will increase complexity
> > as the TLBI command dispatching function will need to be aware what
> > ASID is for emulated device and what is for vfio device..
> I don't get the issue. For emulated device you go through the usual
> translate path which indeed caches configs and translations. In case the
> guest invalidates something, you know the SID and you find the entries
> in the cache that are tagged by this SID.

You do not always get a SID, e.g. CMD_TLBI_NH_ASID.
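
A rough Python sketch of what that implies: because an ASID-tagged TLBI
carries no SID, the vSMMU would need its own ASID bookkeeping to decide
where the invalidation goes. The tracker below records ASIDs when a
context descriptor is configured for a SID (for instance by trapping
CFGI_CD, as suggested elsewhere in this thread). All names are
hypothetical, and the conservative fallback for unknown ASIDs is an
assumption of this sketch.

```python
from collections import defaultdict


class AsidTracker:
    def __init__(self, vfio_sids):
        self.vfio_sids = set(vfio_sids)
        # asid -> set of targets ("host" and/or "emulated")
        self.asid_owners = defaultdict(set)

    def on_cfgi_cd(self, sid, asid):
        """Record which side uses this ASID when a CD is (re)configured."""
        owner = "host" if sid in self.vfio_sids else "emulated"
        self.asid_owners[asid].add(owner)

    def route_tlbi_nh_asid(self, asid):
        """Return where a CMD_TLBI_NH_ASID for this ASID must be applied."""
        # Unknown ASID: be conservative and invalidate everywhere.
        return set(self.asid_owners.get(asid, {"host", "emulated"}))


tracker = AsidTracker(vfio_sids={0x100})
tracker.on_cfgi_cd(sid=0x100, asid=5)   # VFIO device uses ASID 5
tracker.on_cfgi_cd(sid=0x18, asid=7)    # emulated device uses ASID 7
tracker.on_cfgi_cd(sid=0x18, asid=5)    # ASID 5 is now shared
```

The extra lookup on every per-ASID command is the cost being debated.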

Thanks,
Shameer


* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19  0:23                     ` Jason Gunthorpe
  2025-03-19  2:15                       ` Donald Dutile
@ 2025-03-19 17:00                       ` Eric Auger
  2025-03-19 17:12                         ` Shameerali Kolothum Thodi via
  2025-03-21  0:55                         ` Donald Dutile
  1 sibling, 2 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-19 17:00 UTC (permalink / raw)
  To: Jason Gunthorpe, Donald Dutile
  Cc: Nicolin Chen, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao

Hi,


On 3/19/25 1:23 AM, Jason Gunthorpe wrote:
> On Tue, Mar 18, 2025 at 05:22:51PM -0400, Donald Dutile wrote:
>
>> I agree with Eric that 'accel' isn't needed -- this should be
>> ascertained from the pSMMU that a physical device is attached to.
> I seem to remember the point was made that we don't actually know if
> accel is possible, or desired, especially in the case of hotplug.
That's why I think it would be better if we could instantiate a single
type of device that can do both accel and non-accel mode.
Maybe that would come at the price of always enforcing MSI resv regions
on the guest to ensure MSI nesting is possible.

>
> The accelerated mode has a number of limitations that the software
> mode does not have. I think it does make sense that the user would
> deliberately choose to use a more restrictive operating mode and then
> would have to meet the requirements - eg by creating the required
> number and configuration of vSMMUs.
To avoid any misunderstanding, I am not pushing for having a single vSMMU
instance. I advocate for having several instances, each somehow
specialized for VFIO devices or emulated devices. Maybe we can opt in
with accel=on, but the default could be auto (the property can be
AUTO_ON_OFF), where the code detects whether a VFIO device is translated.
If incompatible devices are translated by the same vSMMU instance, I
guess it could be detected and would fail.

What I am pushing for is to have a single type of QEMU device which can
do both accel and non-accel.
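
A small Python sketch of how such a tri-state property could resolve,
under the assumptions stated in this mail: auto turns accel on when a VFIO
device sits behind the instance, and mixing VFIO and emulated devices
behind one instance is rejected. The enum and function are illustrative
only and are not the QEMU property implementation.

```python
from enum import Enum


class OnOffAuto(Enum):
    AUTO = "auto"
    ON = "on"
    OFF = "off"


def resolve_accel(prop, device_kinds):
    """Return True if the vSMMU instance should run in accel mode.

    device_kinds lists the devices behind this instance, each either
    "vfio" or "emulated" (hypothetical labels for this sketch).
    """
    kinds = set(device_kinds)
    if kinds == {"vfio", "emulated"}:
        # Incompatible devices behind the same vSMMU instance.
        raise ValueError("VFIO and emulated devices share one vSMMU")
    if prop is OnOffAuto.AUTO:
        return "vfio" in kinds
    return prop is OnOffAuto.ON


auto_vfio = resolve_accel(OnOffAuto.AUTO, ["vfio", "vfio"])
auto_emulated = resolve_accel(OnOffAuto.AUTO, ["emulated"])
forced_on = resolve_accel(OnOffAuto.ON, [])
```

Note that the auto case can only be resolved once the devices behind the
instance are known, which is where hotplug becomes a question.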
> In general I advocate for having several vSMMU instances, each of them
>
>> Now... how does vfio(?; why not qemu?) layer determine that? --
>> where are SMMUv3 'accel' features exposed either: a) in the device
>> struct (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't
>> find anything under either on my g-h system, but would appreciate a
>> ptr if there is.
> I think it is not discoverable yet other thatn through
> try-and-fail. Discoverability would probably be some bits in an
> iommufd GET_INFO ioctl or something like that.
Yeah, but at least we can easily detect whether a VFIO device is being
translated by a vSMMU instance, in which case there is no other choice
than to turn accel on.

Thanks

Eric
>
>> and like Eric, although 'accel' is better than the
>> original 'nested', it's non-obvious what accel feature(s) are being
>> turned on, or not.
> There are really only one accel feature - direct HW usage of the IO
> Page table in the guest (no shadowing).
>
> A secondary addon would be direct HW usage of an invalidation queue in
> the guest.
>
>> kernel boot-param will be needed; if in sysfs, a write to 0 an
>> enable(disable) it maybe an alternative as well.  Bottom line: we
>> need a way to (a) ascertain the accel feature (b) a way to disable
>> it when it is broken, so qemu's smmuv3 spec will 'just work'.  
> You'd turned it off by not asking qemu to use it, that is sort of the
> reasoning behind the command line opt in for accel or not.
>
> Jason
>




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-18 21:22                   ` Donald Dutile
  2025-03-19  0:23                     ` Jason Gunthorpe
@ 2025-03-19 17:04                     ` Eric Auger
  2025-03-21  0:54                       ` Donald Dutile
  1 sibling, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-19 17:04 UTC (permalink / raw)
  To: Donald Dutile, Nicolin Chen
  Cc: Jason Gunthorpe, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao




On 3/18/25 10:22 PM, Donald Dutile wrote:
>
>
> On 3/18/25 3:13 PM, Nicolin Chen wrote:
>> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
>>> On 3/17/25 9:19 PM, Nicolin Chen wrote:
>>>> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
>>>>> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>>>>>> Another question: how does an emulated device work with a vSMMUv3?
>>>>>> I could imagine that all the accel steps would be bypassed since
>>>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>>>> so we will need to flush the iotlb, which will increase complexity
>>>>>> as the TLBI command dispatching function will need to be aware what
>>>>>> ASID is for emulated device and what is for vfio device..
>>>>> I think you should block it. We already expect different vSMMU's
>>>>> depending on the physical SMMU under the PCI device, it makes sense
>>>>> that a SW VFIO device would have it's own, non-accelerated, vSMMU
>>>>> model in the guest.
>>>> Yea, I agree and it'd be cleaner for an implementation separating
>>>> them.
>>>>
>>>> In my mind, the general idea of "accel=on" is also to keep things
>>>> in a more efficient way: passthrough devices go to HW-accelerated
>>>> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
>>>> bypassed (PCIE0).
>>
>>> Originally a specific SMMU device was needed to opt in for MSI reserved
>>> region ACPI IORT description which are not needed if you don't rely on
>>> S1+S2. However if we don't rely on this trick this was not even needed
>>> with legacy integration
>>> (https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.auger@redhat.com/).
>>>
>>>
>>> Nevertheless I don't think anything prevents the acceleration granted
>>> device from also working with virtio/vhost devices for instance unless
>>> you unplug the existing infra. The translation and invalidation just
>>> should use different control paths (explicit translation requests,
>>> invalidation notifications towards vhost, ...).
>>
>> smmuv3_translate() is per sdev, so it's easy.
>>
>> Invalidation is done via commands, which could be tricky:
>> a) Broadcast command
>> b) ASID validation -- we'll need to keep track of a list of ASIDs
>>     for vfio device to compare the ASID in each per-ASID command,
>>     potentially by trapping all CFGI_CD(_ALL) commands? Note that
>>     each vfio device may have multiple ASIDs (for multiple CDs).
>> Either a or b above will have some validation efficiency impact.
>>
>>> Again, what does legitimate to have different qemu devices for the same
>>> IP? I understand that it simplifies the implementation but I am not
>>> sure
>>> this is a good reason. Nevertheless it worth challenging. What is the
>>> plan for intel iommu? Will we have 2 devices, the legacy device and one
>>> for nested?
>>
>> Hmm, it seems that there are two different topics:
>> 1. Use one SMMU device model (source code file; "iommu=" string)
>>     for both an emulated vSMMU and an HW-accelerated vSMMU.
>> 2. Allow one vSMMU instance to work with both an emulated device
>>     and a passthrough device.
>> And I get that you want both 1 and 2.
>>
>> I'm totally okay with 1, yet see no compelling benefit from 2 for
>> the increased complexity in the invalidation routine.
>>
>> And another question about the mixed device attachment. Let's say
>> we have in the host:
>>    VFIO passthrough dev0 -> pSMMU0
>>    VFIO passthrough dev1 -> pSMMU1
>> Should we allow emulated devices to be flexibly plugged?
>>    dev0 -> vSMMU0 /* Hard requirement */
>>    dev1 -> vSMMU1 /* Hard requirement */
>>    emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
>>    emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
>>
>> Thanks
>> Nicolin
>>
> I agree w/Jason & Nicolin: different vSMMUs for pass-through devices
> than emulated, & vice-versa.
> Not mixing... because... of the next agreement:
You need to clarify what you mean by different vSMMUs: are you talking
about different instances or different QEMU device types?
>
> I agree with Eric that 'accel' isn't needed -- this should be
> ascertained from the pSMMU that a physical device is attached to.
We can simply use an AUTO_ON_OFF property and by default choose the AUTO
value. That would close the debate ;-)

Eric
> Now... how does vfio(?; why not qemu?) layer determine that? -- where
> are SMMUv3 'accel' features exposed either: a) in the device struct
> (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't find
> anything under either on my g-h system, but would appreciate a ptr if
> there is.
> and like Eric, although 'accel' is better than the original 'nested',
> it's non-obvious what accel feature(s) are being turned on, or not.
> In fact, if broken accel hw occurs ('if' -> 'when'), how should it be
> turned off? ... if info in the kernel, a kernel boot-param will be
> needed;
> if in sysfs, a write to 0 an enable(disable) it maybe an alternative
> as well.
> Bottom line: we need a way to (a) ascertain the accel feature (b) a
> way to disable it when it is broken,
> so qemu's smmuv3 spec will 'just work'.
> [This may also help when migrating from a machine that has accel
> working to one that does not.[
>
> ... and when an emulated device is assigned a vSMMU, there are no
> accel features ... unless we have tunables like batch iotlb
> invalidation for perf reasons, which can be viewed as an 'accel' option.
>




* RE: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 17:00                       ` Eric Auger
@ 2025-03-19 17:12                         ` Shameerali Kolothum Thodi via
  2025-03-19 17:38                           ` Eric Auger
  2025-03-21  0:55                         ` Donald Dutile
  1 sibling, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-19 17:12 UTC (permalink / raw)
  To: eric.auger@redhat.com, Jason Gunthorpe, Donald Dutile
  Cc: Nicolin Chen, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, March 19, 2025 5:01 PM
> To: Jason Gunthorpe <jgg@nvidia.com>; Donald Dutile
> <ddutile@redhat.com>
> Cc: Nicolin Chen <nicolinc@nvidia.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; peter.maydell@linaro.org;
> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial
> infrastructure for smmuv3-accel device
> 
> Hi,
> 
> 
> On 3/19/25 1:23 AM, Jason Gunthorpe wrote:
> > On Tue, Mar 18, 2025 at 05:22:51PM -0400, Donald Dutile wrote:
> >
> >> I agree with Eric that 'accel' isn't needed -- this should be
> >> ascertained from the pSMMU that a physical device is attached to.
> > I seem to remember the point was made that we don't actually know if
> > accel is possible, or desired, especially in the case of hotplug.
> that's why I think it would be better if we could instantiate a single
> type of device that can do both accel and non accel mode.
> Maybe that would be at the price of always enforcing MSI resv regions on
> guest to assure MSI nesting is possible.
> 
> >
> > The accelerated mode has a number of limitations that the software
> > mode does not have. I think it does make sense that the user would
> > deliberately choose to use a more restrictive operating mode and then
> > would have to meet the requirements - eg by creating the required
> > number and configuration of vSMMUs.
> To avoid any misunderstanding I am not pushing for have a single vSMMU
> instance. I advocate for having several instances, each somehow
> specialized for VFIO devices or emulated devices. Maybe we can opt-in
> with accel=on but the default could be auto (the property can be
> AUTO_ON_OFF) where the code detects if a VFIO device is translated.In
> case incompatible devices are translated into a same vSMMU instance I
> guess it could be detected and will fail.
> 
> What I am pusshing for is to have a single type of QEMU device which can
> do both accel and non accel.
> > In general I advocate for having several vSMMU instances, each of them
> >
> >> Now... how does vfio(?; why not qemu?) layer determine that? --
> >> where are SMMUv3 'accel' features exposed either: a) in the device
> >> struct (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't
> >> find anything under either on my g-h system, but would appreciate a
> >> ptr if there is.
> > I think it is not discoverable yet other thatn through
> > try-and-fail. Discoverability would probably be some bits in an
> > iommufd GET_INFO ioctl or something like that.
> yeah but at least we can easily detect if a VFIO device is beeing
> translated by a vSMMU instance in which case there is no other choice to
> turn accel on.

I am not sure how you can handle hotplug in such a case. For example, what
if the smmuv3 dev starts with an emulated device and a vfio dev is plugged
later? In the "accel" case the feature bits (IIDR) are queried from the
host SMMUv3 and presented to the vSMMU (see patch #16). We can't do this
once the Guest has booted.

Also, Daniel previously commented on RFCv1 that he would like to have an
explicit vSMMU<-->pSMMU association on the QEMU command line.
https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/

Though we are not there yet (at the moment a cold-plugged VFIO dev is
still required), auto detection of accel is not the right approach if we
want an explicit association on the QEMU command line.

Thanks,
Shameer


* Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-03-19 16:40 ` [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Philippe Mathieu-Daudé
@ 2025-03-19 17:13   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-19 17:13 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Shameer Kolothum, qemu-arm,
	qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao, Cédric Le Goater


Hi Philippe,

On 3/19/25 5:40 PM, Philippe Mathieu-Daudé wrote:
> Hi,
>
> On 11/3/25 15:10, Shameer Kolothum via wrote:
>> Hi All,
>>
>> This patch series introduces initial support for a user-creatable
>> accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.
>
> I'm a bit confused by the design here. Why are we introducing this as
> some device while it is a core component of the bus topology (here PCI)?

At the moment the SMMU is machine-wide and is opted into with a machine
option.

However, there is a need to be able to instantiate several of them to
match the physical implementation, and there is a need to define which
bus topology each instance is translating, hence the idea to attach it
to a bus.

At the ACPI level, the IORT table allows us to define precisely which
RIDs are translated by each SMMU instance, and this is something we fail
to model with the machine-wide option.
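
As an illustration of per-instance RID ranges, here is a Python sketch
that maps a PCI requester ID to an SMMU instance. The bus numbers follow
the lspci output in the cover letter (buses 01-03 behind the first pxb-pcie,
08-09 behind the second); the table layout and the names smmu0/smmu1 are
assumptions for the example and do not reproduce the actual IORT encoding.

```python
def rid(bus, dev, fn):
    """PCI requester ID: bus[15:8], device[7:3], function[2:0]."""
    return (bus << 8) | (dev << 3) | fn


# (first RID, last RID, SMMU instance), one entry per pxb-pcie hierarchy.
ID_MAPPINGS = [
    (rid(0x01, 0, 0), rid(0x03, 31, 7), "smmu0"),
    (rid(0x08, 0, 0), rid(0x09, 31, 7), "smmu1"),
]


def smmu_for(requester_id):
    """Return the SMMU instance translating this RID, or None if bypassed."""
    for first, last, smmu in ID_MAPPINGS:
        if first <= requester_id <= last:
            return smmu
    return None


hns_vf = smmu_for(rid(0x02, 0, 0))     # 02:00.0, behind the first pxb-pcie
zip_vf = smmu_for(rid(0x09, 0, 0))     # 09:00.0, behind the second pxb-pcie
virtio_fs = smmu_for(rid(0x00, 3, 0))  # 00:03.0 on pcie.0
```

In this sketch a device on pcie.0 matches no range, i.e. it is not
translated by either accelerated instance.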

Eric
>
> Is is because this device is inspired on how x86 IOMMUs are wired?
>
>> Why this is needed:
>>
>> Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
>> machine and does not support configuring the host SMMUv3 in nested
>> mode.This limitation prevents its use with vfio-pci passthrough
>> devices.
>>
>> The new pluggable smmuv3-accel device enables host SMMUv3 configuration
>> with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
>> host) via the new IOMMUFD APIs. Additionally, it allows multiple
>> accelerated vSMMUv3 instances for guests running on hosts with multiple
>> physical SMMUv3s.
>>
>> This will benefit in:
>> -Reduced invalidation broadcasts and lookups for devices behind multiple
>>   physical SMMUv3s.
>> -Simplifies handling of host SMMUv3s with differing feature sets.
>> -Lays the groundwork for additional capabilities like vCMDQ support.
>>
>>
>> Changes from RFCv1[0]:
>>
>> Thanks to everyone who provided feedback on RFCv1!.
>>
>> –The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
>>   to better reflect its role in using the host's physical SMMUv3 for
>> page
>>   table setup and cache invalidations.
>> -Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
>> -Merges patches from Nicolin’s GitHub repository that add accelerated
>>   functionalityi for page table setup and cache invalidations[1]. I have
>>   modified these a bit, but hopefully has not broken anything.
>> -Incorporates various fixes and improvements based on RFCv1 feedback.
>> –Adds support for vfio-pci hotplug with smmuv3-accel.
>>
>> Note: IORT RMR patches for MSI setup are currently excluded as we may
>> adopt a different approach for MSI handling in the future [2].
>>
>> Also this has dependency on the common iommufd/vfio patches from
>> Zhenzhong's series here[3]
>>
>> ToDos:
>>
>> –At least one vfio-pci device must currently be cold-plugged to a
>>   pxb-pcie bus associated with the arm-smmuv3-accel. This is required
>> both
>>   to associate a vSMMUv3 with a host SMMUv3 and also needed to
>>   retrieve the host SMMUv3 IDR registers for guest export.
>>   Future updates will remove this restriction by adding the
>>   necessary kernel support.
>>   Please find the discussion here[4]
>> -This version does not yet support host SMMUv3 fault handling or
>>   other event notifications. These will be addressed in a
>>   future patch series.
>>
>>
>> The complete branch can be found here:
>> https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
>>
>> I have done basic sanity testing on a Hisilicon Platform using the
>> kernel
>> branch here:
>> https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
>>
>> Usage Eg:
>>
>> On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
>> devices and HNS VF devices are behind different host SMMUv3s. So for a
>> Guest, specify two arm-smmuv3-accel devices each behind a pxb-pcie as
>> below,
>>
>>
>> ./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
>> -cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
>> -bios QEMU_EFI.fd \
>> -object iommufd,id=iommufd0 \
>> -device virtio-blk-device,drive=fs \
>> -drive if=none,file=rootfs.qcow2,id=fs \
>> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
>> -device arm-smmuv3-accel,bus=pcie.1 \
>> -device
>> pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K
>> \
>> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>> -device
>> pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K
>> \
>> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
>> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
>> -device arm-smmuv3-accel,bus=pcie.2 \
>> -device
>> pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K
>> \
>> -device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
>> -kernel Image \
>> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw
>> earlycon=pl011,0x9000000" \
>> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
>> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
>> -net none \
>> -nographic
>>
>> Guest will boot with two SMMUv3s,
>> ...
>> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
>> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features
>> 0x00008325)
>> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
>> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
>> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
>> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features
>> 0x00008325)
>> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
>> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>>
>> With a pci topology like below,
>>
>> [root@localhost ~]# lspci -tv
>> -+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>>   |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>>   |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>>   |           \-03.0  Virtio: Virtio filesystem
>>   +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>>   |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>>   \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
>>
>> Further tests are always welcome.
>>
>> Please take a look and let me know your feedback!
>>
>> Thanks,
>> Shameer
>>
>> [0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
>> [1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
>> [2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
>> [3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
>> [4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/
>




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 16:45           ` Eric Auger
  2025-03-19 16:53             ` Shameerali Kolothum Thodi via
@ 2025-03-19 17:14             ` Nicolin Chen
  2025-03-19 18:09               ` Eric Auger
  1 sibling, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-19 17:14 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 19, 2025 at 05:45:51PM +0100, Eric Auger wrote:
> 
> 
> 
> On 3/17/25 8:10 PM, Nicolin Chen wrote:
> > On Mon, Mar 17, 2025 at 07:07:52PM +0100, Eric Auger wrote:
> >> On 3/17/25 6:54 PM, Nicolin Chen wrote:
> >>> On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
> >>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> >>>>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
> >>>>> device. In order to support vfio-pci dev assignment with a Guest
> >>>> guest
> >>>>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
> >>>> nested (s1+s2)
> >>>>> mode, with Guest owning the S1 page tables. Subsequent patches will
> >>>> the guest
> >>>>> add support for smmuv3-accel to provide this.
> >>>> Can't this -accel smmu also work with emulated devices? Do we want an
> >>>> exclusive usage?
> >>> Is there any benefit from emulated devices working in the HW-
> >>> accelerated nested translation mode?
> >> Not really but do we have any justification for using different device
> >> name in accel mode? I am not even sure that accel option is really
> >> needed. Ideally the qemu device should be able to detect it is
> >> protecting a VFIO device, in which case it shall check whether nested is
> >> supported by host SMMU and then automatically turn accel mode?
> >>
> >> I gave the example of the vfio device, which has a different class
> >> implementation depending on whether the iommufd option is set.
> > Do you mean that we should just create a regular smmuv3 device and
> > let a VFIO device to turn on this smmuv3's accel mode depending on
> > its LEGACY/IOMMUFD class?
> 
> No, this is not what I meant. I gave an example where, depending on an
> option passed to the VFIO device, you choose one class implementation or
> the other.

Option means something like this:
	-device smmuv3,accel=on
instead of
	-device "smmuv3-accel"
?

Yea, I think that's good.

> > Another question: how does an emulated device work with a vSMMUv3?

> I don't get your question. vSMMUv3 currently only works with emulated
> devices. Did you mean accelerated SMMUv3?

Yea. If "accel=on", how does an emulated device work with that?

> > I could imagine that all the accel steps would be bypassed since
> > !sdev->idev. Yet, the emulated iotlb should cache its translation
> > so we will need to flush the iotlb, which will increase complexity
> > as the TLBI command dispatching function will need to be aware what
> > ASID is for emulated device and what is for vfio device..

> I don't get the issue. For emulated device you go through the usual
> translate path which indeed caches configs and translations. In case the
> guest invalidates something, you know the SID and you find the entries
> in the cache that are tagged by this SID.
> 
> In case you have an accelerated device (i.e. if sdev->idev) you don't
> exercise that path. On invalidation you detect that the SID matches a
> VFIO device and propagate the invalidations to the host instead. On
> invalidation you should be able to detect pretty easily whether you need
> to flush the emulated caches or propagate the invalidations. Am I missing
> some additional problem?
> 
> I am not saying we should support emulated devices and VFIO devices in
> the same guest iommu group. But I don't see why we couldn't easily plug
> the accelerated logic into the current logic for emulation/vhost without
> requiring a different qemu device.

Hmm, feels like I fundamentally misunderstood your point.
 a) We implement the device model with the same piece of code but
    only provide an option "accel=on/off" to switch mode. And both
    passthrough devices and emulated devices can attach to the same
    "accel=on" device.
 b) We implement the device model with the same piece of code but
    only provide an option "accel=on/off" to switch mode. Then, a
    passthrough device can attach to an "accel=on" device, but an
    emulated device can only attach to an "accel=off" SMMU device.

I was thinking that you want case (a). But actually you were just
talking about case (b)? I think (b) is totally fine.

We certainly can't do case (a): not all TLBI commands give an "SID"
field (so we would have to broadcast, i.e. the underlying SMMU HW would
run commands that were meant for emulated devices only); in the case of
vCMDQ, commands for emulated devices would be issued to real HW and
trigger HW errors.

Thanks
Nicolin



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 16:53             ` Shameerali Kolothum Thodi via
@ 2025-03-19 17:26               ` Eric Auger
  2025-03-19 17:34                 ` Jason Gunthorpe
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-19 17:26 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

Hi Shameer,


On 3/19/25 5:53 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, March 19, 2025 4:46 PM
>> To: Nicolin Chen <nicolinc@nvidia.com>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial
>> infrastructure for smmuv3-accel device
>>>>> Is there any benefit from emulated devices working in the HW-
>>>>> accelerated nested translation mode?
>>>> Not really but do we have any justification for using different device
>>>> name in accel mode? I am not even sure that accel option is really
>>>> needed. Ideally the qemu device should be able to detect it is
>>>> protecting a VFIO device, in which case it shall check whether nested is
>>>> supported by host SMMU and then automatically turn accel mode?
>>>>
>>>> I gave the example of the vfio device, which has a different class
>>>> implementation depending on whether the iommufd option is set.
>>> Do you mean that we should just create a regular smmuv3 device and
>>> let a VFIO device to turn on this smmuv3's accel mode depending on
>>> its LEGACY/IOMMUFD class?
>> No, this is not what I meant. I gave an example where, depending on an
>> option passed to the VFIO device, you choose one class implementation or
>> the other.
>>> Another question: how does an emulated device work with a vSMMUv3?
>> I don't get your question. vSMMUv3 currently only works with emulated
>> devices. Did you mean accelerated SMMUv3?
>>> I could imagine that all the accel steps would be bypassed since
>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>> so we will need to flush the iotlb, which will increase complexity
>>> as the TLBI command dispatching function will need to be aware what
>>> ASID is for emulated device and what is for vfio device..
>> I don't get the issue. For emulated device you go through the usual
>> translate path which indeed caches configs and translations. In case the
>> guest invalidates something, you know the SID and you find the entries
>> in the cache that are tagged by this SID.
> You don't always get a SID, e.g. CMD_TLBI_NH_ASID

Effectively, with ASID invalidation you potentially need to do both qemu
IOTLB invalidation and host invalidation propagation.
But this code is already in place and used in vhost mode:

            smmu_inv_notifiers_all(&s->smmu_state);
            smmu_iotlb_inv_asid_vmid(bs, asid, vmid);

But as stated before, in VFIO accel mode the cache is not filled, so I
don't expect a huge penalty.

Besides, we can also disable the qemu caches if it turns out accel mode is
in use, no?

Eric


>
> Thanks,
> Shameer




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 17:26               ` Eric Auger
@ 2025-03-19 17:34                 ` Jason Gunthorpe
  2025-03-19 17:41                   ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Jason Gunthorpe @ 2025-03-19 17:34 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameerali Kolothum Thodi, Nicolin Chen, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org, peter.maydell@linaro.org,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

On Wed, Mar 19, 2025 at 06:26:48PM +0100, Eric Auger wrote:
> Effectively with ASID invalidation you potentially need to do both qemu
> IOTLB invalidation and host invalidation propagation.
> but this code is already in place in the code and used in vhost mode:

Let's not forget the focus here, the point of the accel mode is to
run fast, especially fast invalidation.

Doing a bunch of extra stuff on hot paths just to support mixing
virtual devices with physical doesn't seem like a great direction..

Jason



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 17:12                         ` Shameerali Kolothum Thodi via
@ 2025-03-19 17:38                           ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-19 17:38 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Jason Gunthorpe, Donald Dutile
  Cc: Nicolin Chen, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org




On 3/19/25 6:12 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, March 19, 2025 5:01 PM
>> To: Jason Gunthorpe <jgg@nvidia.com>; Donald Dutile
>> <ddutile@redhat.com>
>> Cc: Nicolin Chen <nicolinc@nvidia.com>; Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; peter.maydell@linaro.org;
>> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
>> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial
>> infrastructure for smmuv3-accel device
>>
>> Hi,
>>
>>
>> On 3/19/25 1:23 AM, Jason Gunthorpe wrote:
>>> On Tue, Mar 18, 2025 at 05:22:51PM -0400, Donald Dutile wrote:
>>>
>>>> I agree with Eric that 'accel' isn't needed -- this should be
>>>> ascertained from the pSMMU that a physical device is attached to.
>>> I seem to remember the point was made that we don't actually know if
>>> accel is possible, or desired, especially in the case of hotplug.
>> that's why I think it would be better if we could instantiate a single
>> type of device that can do both accel and non accel mode.
>> Maybe that would be at the price of always enforcing MSI resv regions on
>> guest to assure MSI nesting is possible.
>>
>>> The accelerated mode has a number of limitations that the software
>>> mode does not have. I think it does make sense that the user would
>>> deliberately choose to use a more restrictive operating mode and then
>>> would have to meet the requirements - eg by creating the required
>>> number and configuration of vSMMUs.
>> To avoid any misunderstanding, I am not pushing to have a single vSMMU
>> instance. I advocate for having several instances, each somehow
>> specialized for VFIO devices or emulated devices. Maybe we can opt in
>> with accel=on but the default could be auto (the property can be
>> AUTO_ON_OFF) where the code detects if a VFIO device is translated. In
>> case incompatible devices are translated by the same vSMMU instance I
>> guess it could be detected and will fail.
>>
>> What I am pushing for is to have a single type of QEMU device which can
>> do both accel and non accel.
>>> In general I advocate for having several vSMMU instances, each of them
>>>
>>>> Now... how does vfio(?; why not qemu?) layer determine that? --
>>>> where are SMMUv3 'accel' features exposed either: a) in the device
>>>> struct (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't
>>>> find anything under either on my g-h system, but would appreciate a
>>>> ptr if there is.
>>> I think it is not discoverable yet other than through
>>> try-and-fail. Discoverability would probably be some bits in an
>>> iommufd GET_INFO ioctl or something like that.
>> Yeah, but at least we can easily detect if a VFIO device is being
>> translated by a vSMMU instance, in which case there is no other choice
>> than to turn accel on.
> Not sure how you can handle hotplug in such a case. For example, what if the smmuv3
> dev starts with an emulated device and we later try to plug a vfio dev? In case of "accel"
> the feature bits (IIDR) are queried from the host SMMUv3 and presented to
> the vSMMU (see patch #16). We can't do this once the Guest is booted.
If accel=auto and the smmu is attached to a bus where only emulated devices
are plugged, then at cold start accel=false and it effectively becomes
impossible to hotplug a vfio device.
If accel=auto and the smmu is attached to a bus where a VFIO-PCI device is
cold-plugged, we end up with accel=on forced.
Otherwise you always have the possibility to opt in to accel with accel=true,
just like intel_iommu has the caching_mode option.
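
A minimal sketch of that resolution, assuming a tri-state property that
is resolved once at cold start. AccelProp, resolve_accel() and
can_hotplug_vfio() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state property, in the spirit of an on/off/auto knob. */
typedef enum {
    ACCEL_AUTO,
    ACCEL_ON,
    ACCEL_OFF,
} AccelProp;

/*
 * Resolve the property once, at cold start. In auto mode the result
 * depends on whether a VFIO device is cold-plugged behind this vSMMU.
 */
static bool resolve_accel(AccelProp prop, bool vfio_coldplugged)
{
    assert(prop == ACCEL_AUTO || prop == ACCEL_ON || prop == ACCEL_OFF);

    switch (prop) {
    case ACCEL_ON:
        return true;
    case ACCEL_OFF:
        return false;
    default: /* ACCEL_AUTO */
        return vfio_coldplugged;
    }
}

/* Once resolved, a VFIO device can only be hotplugged if accel is on. */
static bool can_hotplug_vfio(bool accel_resolved)
{
    return accel_resolved;
}
```

The resolved value is fixed for the lifetime of the guest, which is why
auto with only emulated devices at cold start rules out later VFIO hotplug.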
>
> Also Daniel previously commented on RFCv1 that he would like to have explicit
> vSMMU<-->pSMMU association in Qemu command line. 
> https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/
tbh I did not understand why this explicit setting was needed and why it
can't be inferred from the HostIOMMUDevice. But I need to read this back.
>
> Though we are not there yet without a cold-plugged VFIO dev at the moment,
> having auto detection of accel is not the right approach if we want an explicit
> association in Qemu command line.
Maybe we shall not focus too much on auto detection at the moment.

Eric
>
> Thanks,
> Shameer




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 17:34                 ` Jason Gunthorpe
@ 2025-03-19 17:41                   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-19 17:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameerali Kolothum Thodi, Nicolin Chen, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org, peter.maydell@linaro.org,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org




On 3/19/25 6:34 PM, Jason Gunthorpe wrote:
> On Wed, Mar 19, 2025 at 06:26:48PM +0100, Eric Auger wrote:
>> Effectively with ASID invalidation you potentially need to do both qemu
>> IOTLB invalidation and host invalidation propagation.
>> but this code is already in place in the code and used in vhost mode:
> Let's not forget the focus here, the point of the accel mode is to
> run fast, especially fast invalidation.
>
> Doing a bunch of extra stuff on hot paths just to support mixing
> virtual devices with physical doesn't seem like a great direction..
Fair enough. Then let's disable the internal caches if we are in accel mode.

Eric
>
> Jason
>




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 17:14             ` Nicolin Chen
@ 2025-03-19 18:09               ` Eric Auger
  2025-03-19 18:34                 ` Nicolin Chen
  2025-03-21  1:26                 ` Donald Dutile
  0 siblings, 2 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-19 18:09 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

Hi Nicolin,


On 3/19/25 6:14 PM, Nicolin Chen wrote:
> On Wed, Mar 19, 2025 at 05:45:51PM +0100, Eric Auger wrote:
>>
>>
>> On 3/17/25 8:10 PM, Nicolin Chen wrote:
>>> On Mon, Mar 17, 2025 at 07:07:52PM +0100, Eric Auger wrote:
>>>> On 3/17/25 6:54 PM, Nicolin Chen wrote:
>>>>> On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
>>>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>>>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
>>>>>>> device. In order to support vfio-pci dev assignment with a Guest
>>>>>> guest
>>>>>>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
>>>>>> nested (s1+s2)
>>>>>>> mode, with Guest owning the S1 page tables. Subsequent patches will
>>>>>> the guest
>>>>>>> add support for smmuv3-accel to provide this.
>>>>>> Can't this -accel smmu also work with emulated devices? Do we want an
>>>>>> exclusive usage?
>>>>> Is there any benefit from emulated devices working in the HW-
>>>>> accelerated nested translation mode?
>>>> Not really but do we have any justification for using different device
>>>> name in accel mode? I am not even sure that accel option is really
>>>> needed. Ideally the qemu device should be able to detect it is
>>>> protecting a VFIO device, in which case it shall check whether nested is
>>>> supported by host SMMU and then automatically turn accel mode?
>>>>
>>>> I gave the example of the vfio device, which has a different class
>>>> implementation depending on whether the iommufd option is set.
>>> Do you mean that we should just create a regular smmuv3 device and
>>> let a VFIO device to turn on this smmuv3's accel mode depending on
>>> its LEGACY/IOMMUFD class?
>> No, this is not what I meant. I gave an example where, depending on an
>> option passed to the VFIO device, you choose one class implementation or
>> the other.
> Option means something like this:
> 	-device smmuv3,accel=on
> instead of
> 	-device "smmuv3-accel"
> ?
>
> Yea, I think that's good.
Yeah, actually that's a big debate for not much. From an implementation
pov that should not change much. The only doubt I have is that if we need
to conditionally expose the MSI RESV regions, it is easier to do if we
detect we have a smmuv3-accel. What the option allows is the auto mode.
>
>>> Another question: how does an emulated device work with a vSMMUv3?
>> I don't get your question. vSMMUv3 currently only works with emulated
>> devices. Did you mean accelerated SMMUv3?
> Yea. If "accel=on", how does an emulated device work with that?
>
>>> I could imagine that all the accel steps would be bypassed since
>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>> so we will need to flush the iotlb, which will increase complexity
>>> as the TLBI command dispatching function will need to be aware what
>>> ASID is for emulated device and what is for vfio device..
>> I don't get the issue. For emulated device you go through the usual
>> translate path which indeed caches configs and translations. In case the
>> guest invalidates something, you know the SID and you find the entries
>> in the cache that are tagged by this SID.
>>
>> In case you have an accelerated device (i.e. if sdev->idev) you don't
>> exercise that path. On invalidation you detect that the SID matches a
>> VFIO device and propagate the invalidations to the host instead. On
>> invalidation you should be able to detect pretty easily whether you need
>> to flush the emulated caches or propagate the invalidations. Am I missing
>> some additional problem?
>>
>> I am not saying we should support emulated devices and VFIO devices in
>> the same guest iommu group. But I don't see why we couldn't easily plug
>> the accelerated logic into the current logic for emulation/vhost without
>> requiring a different qemu device.
> Hmm, feels like I fundamentally misunderstood your point.
>  a) We implement the device model with the same piece of code but
>     only provide an option "accel=on/off" to switch mode. And both
>     passthrough devices and emulated devices can attach to the same
>     "accel=on" device.
I think we all agree we don't want that use case in general. However,
I was effectively questioning why it couldn't work, maybe at the expense
of some perf degradation.
>  b) We implement the device model with the same piece of code but
>     only provide an option "accel=on/off" to switch mode. Then, a
>     passthrough device can attach to an "accel=on" device, but an
>     emulated device can only attach to an "accel=off" SMMU device.
>
> I was thinking that you want case (a). But actually you were just
> talking about case (b)? I think (b) is totally fine.
>
> We certainly can't do case (a): not all TLBI commands give an "SID"
> field (so would have to broadcast, i.e. underlying SMMU HW would run
> commands that were supposed for emulated devices only); in case of
> vCMDQ, commands for emulated devices would be issued to real HW and
I am still confused about that. For instance, if the guest sends an
NH_ASID, NH_VA invalidation and it happens that both the emulated device
and the VFIO device share the same cd.asid (same guest iommu domain, which
practically should not happen), why shouldn't we propagate the
invalidation to the host? Does the problem come from the usage of vCMDQ,
or would you foresee the same problem with a generic physical SMMU?

Thanks

Eric
> trigger HW errors.
>
> Thanks
> Nicolin
>




* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-19 16:21       ` Donald Dutile
@ 2025-03-19 18:21         ` Eric Auger
  2025-03-21  0:59           ` Donald Dutile
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-19 18:21 UTC (permalink / raw)
  To: Donald Dutile, Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

Hi Don,


On 3/19/25 5:21 PM, Donald Dutile wrote:
>
>
> On 3/19/25 5:26 AM, Shameerali Kolothum Thodi wrote:
>> Hi Don,
>>
> Hey!
>
>>> -----Original Message-----
>>> From: Donald Dutile <ddutile@redhat.com>
>>> Sent: Tuesday, March 18, 2025 10:12 PM
>>> To: Shameerali Kolothum Thodi
>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>>> qemu-devel@nongnu.org
>>> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>>> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>>> pcie bus
>>>
>>> Shameer,
>>>
>>> Hi!
>>>
>>> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
>>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>>> and that is set as the primary-bus for the smmu dev.
>>>>
>>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>>> ---
>>>>    hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>>>>    1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>>>> index c327661636..1471b65374 100644
>>>> --- a/hw/arm/smmuv3-accel.c
>>>> +++ b/hw/arm/smmuv3-accel.c
>>>> @@ -9,6 +9,21 @@
>>>>    #include "qemu/osdep.h"
>>>>
>>>>    #include "hw/arm/smmuv3-accel.h"
>>>> +#include "hw/pci/pci_bridge.h"
>>>> +
>>>> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
>>>> +{
>>>> +    DeviceState *d = opaque;
>>>> +
>>>> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
>>>> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
>>>> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus->name)) {
>>>> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
>>>> +                                     &error_abort);
>>>> +        }
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>>
>>>>    static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>>    {
>>>> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>>        SysBusDevice *dev = SYS_BUS_DEVICE(d);
>>>>        Error *local_err = NULL;
>>>>
>>>> +    object_child_foreach_recursive(object_get_root(),
>>>> +                                   smmuv3_accel_pxb_pcie_bus, d);
>>>> +
>>>>        object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
>>>>        c->parent_realize(d, &local_err);
>>>>        if (local_err) {
>>>> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
>>>>        device_class_set_parent_realize(dc, smmu_accel_realize,
>>>>                                        &c->parent_realize);
>>>>        dc->hotpluggable = false;
>>>> +    dc->bus_type = TYPE_PCIE_BUS;
>>>>    }
>>>>
>>>>    static const TypeInfo smmuv3_accel_type_info = {
>>>
>>> I am not seeing the need for a pxb-pcie bus (switch) introduced for each
>>> 'accel'.
>>> Isn't the IORT able to define different SMMUs for different RIDs? If so,
>>> isn't that sufficient to associate (define) an SMMU<->RID association
>>> without introducing a pxb-pcie?
>>> And again, I'm not sure how that improves/enables the device<->SMMU
>>> associativity?
>>
>> Thanks for taking a look at the series. As discussed elsewhere in this
>> thread (with Eric), normally in the physical world (or at least in the
>> most common cases) SMMUv3 is attached to a PCIe Root Complex, and if you
>> take a look at the IORT spec, it describes the association of ID mappings
>> between an RC node and an SMMUv3 node.
>>
>> And if my understanding is correct, in Qemu only pxb-pcie allows you to
>> add extra root complexes, even though it is still plugged into the parent
>> (pcie.0), i.e. for all devices downstream it acts as a root complex but is
>> still plugged into a parent pcie.0. This allows us to add/describe
>> multiple "smmuv3-accel" devices, each associated with an RC.
>>
> I find the qemu statements a bit unclear here as well.
> I looked at the hot plug statement(s) in docs/pcie.txt, as I figured
> that's where dynamic IORT changes would be needed as well. There, it says
> you can hot-add PCIe devices to RPs; one has to define/add RPs to the
> machine model for that plug-in.
>
> Using libvirt, it could auto-add the needed RPs to do dynamic smmuv3
> additions,
I am not sure I understand your statement here. We don't want "dynamic"
SMMUv3 instantiation. SMMUv3 is a platform device which is supposed to
be coldplugged on a pre-existing PCIe hierarchy. The SMMUv3 device is
not something that is meant to be hotplugged or hotunplugged.
To me, we hijack the bus= property to provide information about the IORT
IDMAP.
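
For illustration, this is roughly the information bus= ends up
providing: the range of requester IDs behind one root bus. It is a
sketch only; RidRange and rid_range_for_buses() are hypothetical names,
and the bus range owned by a given root complex is an input here rather
than something this code discovers. The only fixed fact used is that a
PCI RID is (bus << 8) | devfn.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the RIDs routed to one vSMMU. */
typedef struct {
    uint32_t input_base; /* first RID covered */
    uint32_t id_count;   /* number of RIDs covered */
} RidRange;

/*
 * Compute the RID range for all devices on buses min_bus..max_bus.
 * Each bus contributes 256 RIDs (one per devfn value).
 */
static RidRange rid_range_for_buses(uint8_t min_bus, uint8_t max_bus)
{
    RidRange r;

    assert(min_bus <= max_bus);
    r.input_base = (uint32_t)min_bus << 8;
    r.id_count = ((uint32_t)max_bus - min_bus + 1) << 8;
    return r;
}
```

An ID mapping built from such a range then points the root complex node
at the SMMUv3 node that was given the matching bus= value.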

Thanks

Eric
> if I understand how libvirt does that today for pcie devices now (/me
> looks at danpb for feedback).
>
>> Having said that, the current code only allows pxb-pcie root complexes,
>> avoiding pcie.0. The idea behind this was that the user can use pcie.0
>> with a non-accel SMMUv3 for any emulated devices, avoiding the performance
>> bottlenecks we are discussing for emulated dev+smmuv3-accel cases. But
>> based on the feedback from Eric and Daniel I will relax that restriction
>> and will allow association with pcie.0.
>>
> So, I think this isn't a restriction that this smmuv3 feature should
> enforce; lack of a proper RP or pxb-pcie will yield an invalid config
> issue/error, and the machine definition will be modified to meet the
> needs for IORT.
>
>> Thanks,
>> Shameer
>>
>>>>> to root complexes.
>>> Feel free to enlighten me where I may have mis-read/interpreted the
>>> IORT
>>> & SMMUv3 specs.
>>>
>>> Thanks,
>>> - Don
>>>
>>
>




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 18:09               ` Eric Auger
@ 2025-03-19 18:34                 ` Nicolin Chen
  2025-03-24 14:46                   ` Eric Auger
  2025-03-21  1:26                 ` Donald Dutile
  1 sibling, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-19 18:34 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 19, 2025 at 07:09:33PM +0100, Eric Auger wrote:
> > Option means something like this:
> > 	-device smmuv3,accel=on
> > instead of
> > 	-device "smmuv3-accel"
> > ?
> >
> > Yea, I think that's good.

> Yeah, actually that's a big debate for not much. From an implementation
> pov that should not change much. The only doubt I have is that if we need
> to conditionally expose the MSI RESV regions, it is easier to do if we
> detect we have a smmuv3-accel. What the option allows is the auto mode.

Mind elaborating your doubt about the MSI RESV region?

Do you mean how VMS code should tag "accel=on" option and generate
RMR nodes in the IORT table?

> > We certainly can't do case (a): not all TLBI commands give an "SID"
> > field (so would have to broadcast, i.e. underlying SMMU HW would run
> > commands that were supposed for emulated devices only); in case of
> > vCMDQ, commands for emulated devices would be issued to real HW and

> I am still confused about that. For instance if the guest sends an
> NH_ASID, NH_VA invalidation and it happens both the emulated device and
> VFIO-device share the same cd.asid (same guest iommu domain, which
> practically should not happen) why shouldn't we propagate the
> invalidation to the host. Does the problem come from the usage of vCMDQ
> or would you foresee the same problem with a generic physical SMMU?

The host (HW) would end up executing commands that were issued for
emulated devices, which impacts performance.

With vCMDQ, QEMU cannot trap the command queue because all invalidation
commands will be issued to HW directly from the guest kernel driver.
This includes TLBI and ATC_INV commands. It's probably okay to run
TLBI commands with vCMDQ (again, with a perf impact), while ATC_INV
commands would result in "unknown SID" errors or outright ATC_INV
timeouts.
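To make the dispatch problem concrete, here is a rough sketch of what
a trapping (non-vCMDQ) vSMMU would have to decide per invalidation
command. Everything in it is hypothetical and simplified: the CmdKind
classes, the VsmmuState bookkeeping and route_cmd() are illustration
only, not QEMU's actual code. It assumes per-ASID tracking for VFIO
devices (learned by trapping CFGI_CD) and that an ASID is never shared
between an emulated and a VFIO device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified command classes; not QEMU's real definitions. */
typedef enum { CMD_KIND_TLBI_ASID, CMD_KIND_TLBI_ALL, CMD_KIND_ATC_INV } CmdKind;

/* Where a trapped command would have to go. */
typedef enum { ROUTE_EMULATED, ROUTE_HOST, ROUTE_BOTH, ROUTE_ERROR } Route;

typedef struct {
    CmdKind kind;
    uint32_t sid;   /* meaningful only for CMD_KIND_ATC_INV */
    uint16_t asid;  /* meaningful only for CMD_KIND_TLBI_ASID */
} InvCmd;

/* Assumed bookkeeping kept by one vSMMU instance. */
typedef struct {
    const uint32_t *vfio_sids;   size_t nr_vfio_sids;
    const uint32_t *emu_sids;    size_t nr_emu_sids;
    const uint16_t *vfio_asids;  size_t nr_vfio_asids;
} VsmmuState;

static bool has_sid(const uint32_t *v, size_t n, uint32_t sid)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i] == sid) {
            return true;
        }
    }
    return false;
}

static bool has_asid(const uint16_t *v, size_t n, uint16_t asid)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i] == asid) {
            return true;
        }
    }
    return false;
}

Route route_cmd(const VsmmuState *s, const InvCmd *cmd)
{
    assert(s && cmd);

    switch (cmd->kind) {
    case CMD_KIND_ATC_INV:
        /* Carries a SID, so the target device is known. */
        if (has_sid(s->vfio_sids, s->nr_vfio_sids, cmd->sid)) {
            return ROUTE_HOST;
        }
        if (has_sid(s->emu_sids, s->nr_emu_sids, cmd->sid)) {
            return ROUTE_EMULATED;
        }
        return ROUTE_ERROR; /* unknown SID */
    case CMD_KIND_TLBI_ASID:
        /* No SID: only the tracked ASID list can tell the two apart. */
        return has_asid(s->vfio_asids, s->nr_vfio_asids, cmd->asid)
               ? ROUTE_HOST : ROUTE_EMULATED;
    case CMD_KIND_TLBI_ALL:
        /* Neither SID nor ASID: has to be broadcast to both sides. */
        return ROUTE_BOTH;
    }
    return ROUTE_ERROR;
}
```

With vCMDQ there is no place to run a function like route_cmd() at
all, since the guest writes the queue that the HW consumes.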

Thanks
Nicolin



* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-19  9:26     ` Shameerali Kolothum Thodi via
  2025-03-19 16:21       ` Donald Dutile
@ 2025-03-20 17:02       ` Nicolin Chen
  2025-03-24  8:19         ` Shameerali Kolothum Thodi via
  1 sibling, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-20 17:02 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

On Wed, Mar 19, 2025 at 09:26:29AM +0000, Shameerali Kolothum Thodi wrote:
> Having said that,  current code only allows pxb-pcie root complexes avoiding
> the pcie.0. The idea behind this was, user can use pcie.0 with a non accel SMMUv3
> for any emulated devices avoiding the performance bottlenecks we are
> discussing for emulated dev+smmuv3-accel cases. But based on the feedback from
> Eric and Daniel I will relax that restriction and will allow association with pcie.0.

Just want a clarification here..

If VM has a passthrough device only:
 attach it to PCIE.0 <=> vSMMU0 (accel=on)
If VM has an emulated device and a passthrough device:
 attach the emulated device to PCIE.0 <=> vSMMU bypass (or accel=off?)
 attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)
?

Thanks
Nicolin



* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 17:04                     ` Eric Auger
@ 2025-03-21  0:54                       ` Donald Dutile
  2025-03-24 14:52                         ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-21  0:54 UTC (permalink / raw)
  To: eric.auger, Nicolin Chen
  Cc: Jason Gunthorpe, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/19/25 1:04 PM, Eric Auger wrote:
> 
> 
> 
> On 3/18/25 10:22 PM, Donald Dutile wrote:
>>
>>
>> On 3/18/25 3:13 PM, Nicolin Chen wrote:
>>> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
>>>> On 3/17/25 9:19 PM, Nicolin Chen wrote:
>>>>> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
>>>>>> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>>>>>>> Another question: how does an emulated device work with a vSMMUv3?
>>>>>>> I could imagine that all the accel steps would be bypassed since
>>>>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>>>>> so we will need to flush the iotlb, which will increase complexity
>>>>>>> as the TLBI command dispatching function will need to be aware what
>>>>>>> ASID is for emulated device and what is for vfio device..
>>>>>> I think you should block it. We already expect different vSMMU's
>>>>>> depending on the physical SMMU under the PCI device, it makes sense
>>>>>> that a SW VFIO device would have it's own, non-accelerated, vSMMU
>>>>>> model in the guest.
>>>>> Yea, I agree and it'd be cleaner for an implementation separating
>>>>> them.
>>>>>
>>>>> In my mind, the general idea of "accel=on" is also to keep things
>>>>> in a more efficient way: passthrough devices go to HW-accelerated
>>>>> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
>>>>> bypassed (PCIE0).
>>>
>>>> Originally a specific SMMU device was needed to opt in for MSI reserved
>>>> region ACPI IORT description which are not needed if you don't rely on
>>>> S1+S2. However if we don't rely on this trick this was not even needed
>>>> with legacy integration
>>>> (https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.auger@redhat.com/).
>>>>
>>>>
>>>> Nevertheless I don't think anything prevents the acceleration granted
>>>> device from also working with virtio/vhost devices for instance unless
>>>> you unplug the existing infra. The translation and invalidation just
>>>> should use different control paths (explicit translation requests,
>>>> invalidation notifications towards vhost, ...).
>>>
>>> smmuv3_translate() is per sdev, so it's easy.
>>>
>>> Invalidation is done via commands, which could be tricky:
>>> a) Broadcast command
>>> b) ASID validation -- we'll need to keep track of a list of ASIDs
>>>      for vfio device to compare the ASID in each per-ASID command,
>>>      potentially by trapping all CFGI_CD(_ALL) commands? Note that
>>>      each vfio device may have multiple ASIDs (for multiple CDs).
>>> Either a or b above will have some validation efficiency impact.
>>>
>>>> Again, what does legitimate to have different qemu devices for the same
>>>> IP? I understand that it simplifies the implementation but I am not
>>>> sure
>>>> this is a good reason. Nevertheless it worth challenging. What is the
>>>> plan for intel iommu? Will we have 2 devices, the legacy device and one
>>>> for nested?
>>>
>>> Hmm, it seems that there are two different topics:
>>> 1. Use one SMMU device model (source code file; "iommu=" string)
>>>      for both an emulated vSMMU and an HW-accelerated vSMMU.
>>> 2. Allow one vSMMU instance to work with both an emulated device
>>>      and a passthrough device.
>>> And I get that you want both 1 and 2.
>>>
>>> I'm totally okay with 1, yet see no compelling benefit from 2 for
>>> the increased complexity in the invalidation routine.
>>>
>>> And another question about the mixed device attachment. Let's say
>>> we have in the host:
>>>     VFIO passthrough dev0 -> pSMMU0
>>>     VFIO passthrough dev1 -> pSMMU1
>>> Should we allow emulated devices to be flexibly plugged?
>>>     dev0 -> vSMMU0 /* Hard requirement */
>>>     dev1 -> vSMMU1 /* Hard requirement */
>>>     emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
>>>     emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
>>>
>>> Thanks
>>> Nicolin
>>>
>> I agree w/Jason & Nicolin: different vSMMUs for pass-through devices
>> than emulated, & vice-versa.
>> Not mixing... because... of the next agreement:
> you need to clarify what you mean by different vSMMUs: are you taking
> about different instances or different qemu device types?
Both. A device that needs the hw-accel feature has to use an smmu that has that feature;
an emulated device can use such an smmu, but as mentioned in other threads,
if you start with all emulated devices on one smmu and then hot-plug an (assigned) device,
it needs another smmu that has hw-accel features.
Keeping them split makes it easier at config time, and it may enable the code to be simpler...
but the other half of my brain wants common code paths with accel/emulate branches, though
a different smmu instance will likely simplify the smmu-(accel-)specific lookups.

>>
>> I agree with Eric that 'accel' isn't needed -- this should be
>> ascertained from the pSMMU that a physical device is attached to.
> we can simply use an AUTO_ON_OFF property and by default choose AUTO
> value. That would close the debate ;-)
> 
Preaching to the choir... yes.

> Eric
>> Now... how does vfio(?; why not qemu?) layer determine that? -- where
>> are SMMUv3 'accel' features exposed either: a) in the device struct
>> (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't find
>> anything under either on my g-h system, but would appreciate a ptr if
>> there is.
>> and like Eric, although 'accel' is better than the original 'nested',
>> it's non-obvious what accel feature(s) are being turned on, or not.
>> In fact, if broken accel hw occurs ('if' -> 'when'), how should it be
>> turned off? ... if info in the kernel, a kernel boot-param will be
>> needed;
>> if in sysfs, a write to 0 an enable(disable) it maybe an alternative
>> as well.
>> Bottom line: we need a way to (a) ascertain the accel feature (b) a
>> way to disable it when it is broken,
>> so qemu's smmuv3 spec will 'just work'.
>> [This may also help when migrating from a machine that has accel
>> working to one that does not.]
>>
>> ... and when an emulated device is assigned a vSMMU, there are no
>> accel features ... unless we have tunables like batch iotlb
>> invalidation for perf reasons, which can be viewed as an 'accel' option.
>>
> 




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 17:00                       ` Eric Auger
  2025-03-19 17:12                         ` Shameerali Kolothum Thodi via
@ 2025-03-21  0:55                         ` Donald Dutile
  1 sibling, 0 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-21  0:55 UTC (permalink / raw)
  To: eric.auger, Jason Gunthorpe
  Cc: Nicolin Chen, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/19/25 1:00 PM, Eric Auger wrote:
> Hi,
> 
> 
> On 3/19/25 1:23 AM, Jason Gunthorpe wrote:
>> On Tue, Mar 18, 2025 at 05:22:51PM -0400, Donald Dutile wrote:
>>
>>> I agree with Eric that 'accel' isn't needed -- this should be
>>> ascertained from the pSMMU that a physical device is attached to.
>> I seem to remember the point was made that we don't actually know if
>> accel is possible, or desired, especially in the case of hotplug.
> that's why I think it would be better if we could instantiate a single
> type of device that can do both accel and non accel mode.
> Maybe that would be at the price of always enforcing MSI resv regions on
> guest to assure MSI nesting is possible.
> 
>>
>> The accelerated mode has a number of limitations that the software
>> mode does not have. I think it does make sense that the user would
>> deliberately choose to use a more restrictive operating mode and then
>> would have to meet the requirements - eg by creating the required
>> number and configuration of vSMMUs.
> To avoid any misunderstanding I am not pushing for having a single vSMMU
> instance. I advocate for having several instances, each somehow
> specialized for VFIO devices or emulated devices. Maybe we can opt-in
> with accel=on but the default could be auto (the property can be
> AUTO_ON_OFF) where the code detects if a VFIO device is translated. In
> case incompatible devices are translated by the same vSMMU instance I
> guess it could be detected and will fail.
> 
> What I am pushing for is to have a single type of QEMU device which can
> do both accel and non accel.
+1 !
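
A rough sketch of how such a tri-state property could be resolved at
realize time, just to show the idea. All names here (AccelProp,
AccelMode, resolve_accel) are made up for illustration and are not
existing QEMU symbols. It assumes the rules discussed in this thread:
VFIO devices need accel, emulated devices stay on a non-accel
instance, and a mix on one instance is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for an on/off/auto device property. */
typedef enum { ACCEL_AUTO, ACCEL_ON, ACCEL_OFF } AccelProp;

/* Outcome of resolving the property for one vSMMU instance. */
typedef enum { MODE_EMULATED, MODE_ACCEL, MODE_INVALID } AccelMode;

AccelMode resolve_accel(AccelProp prop, int nr_vfio, int nr_emulated,
                        bool host_nested)
{
    assert(nr_vfio >= 0 && nr_emulated >= 0);

    /* Assumed policy: no mixing of VFIO and emulated devices. */
    if (nr_vfio > 0 && nr_emulated > 0) {
        return MODE_INVALID;
    }

    switch (prop) {
    case ACCEL_OFF:
        /* A VFIO device cannot sit behind a purely emulated vSMMU. */
        return nr_vfio > 0 ? MODE_INVALID : MODE_EMULATED;
    case ACCEL_ON:
        /* Explicit opt-in: needs host nesting, and no emulated devices. */
        if (!host_nested || nr_emulated > 0) {
            return MODE_INVALID;
        }
        return MODE_ACCEL;
    case ACCEL_AUTO:
        /* Auto: turn accel on only when a VFIO device is translated. */
        if (nr_vfio == 0) {
            return MODE_EMULATED;
        }
        return host_nested ? MODE_ACCEL : MODE_INVALID;
    }
    return MODE_INVALID;
}
```

The host_nested input is the part that is not discoverable today
other than by try-and-fail, so in practice it would be the result of
attempting the nested setup rather than a capability query.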

>> In general I advocate for having several vSMMU instances, each of them
>>
>>> Now... how does vfio(?; why not qemu?) layer determine that? --
>>> where are SMMUv3 'accel' features exposed either: a) in the device
>>> struct (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't
>>> find anything under either on my g-h system, but would appreciate a
>>> ptr if there is.
>> I think it is not discoverable yet other than through
>> try-and-fail. Discoverability would probably be some bits in an
>> iommufd GET_INFO ioctl or something like that.
> yeah but at least we can easily detect if a VFIO device is being
> translated by a vSMMU instance in which case there is no other choice than to
> turn accel on.
> 
> Thanks
> 
> Eric
>>
>>> and like Eric, although 'accel' is better than the
>>> original 'nested', it's non-obvious what accel feature(s) are being
>>> turned on, or not.
>> There is really only one accel feature - direct HW usage of the IO
>> Page table in the guest (no shadowing).
>>
>> A secondary addon would be direct HW usage of an invalidation queue in
>> the guest.
>>
>>> kernel boot-param will be needed; if in sysfs, a write to 0 an
>>> enable(disable) it maybe an alternative as well.  Bottom line: we
>>> need a way to (a) ascertain the accel feature (b) a way to disable
>>> it when it is broken, so qemu's smmuv3 spec will 'just work'.
>> You'd turn it off by not asking qemu to use it, that is sort of the
>> reasoning behind the command line opt in for accel or not.
>>
>> Jason
>>
> 




* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-19 18:21         ` Eric Auger
@ 2025-03-21  0:59           ` Donald Dutile
  2025-03-24 14:56             ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-21  0:59 UTC (permalink / raw)
  To: eric.auger, Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



On 3/19/25 2:21 PM, Eric Auger wrote:
> Hi Don,
> 
> 
> On 3/19/25 5:21 PM, Donald Dutile wrote:
>>
>>
>> On 3/19/25 5:26 AM, Shameerali Kolothum Thodi wrote:
>>> Hi Don,
>>>
>> Hey!
>>
>>>> -----Original Message-----
>>>> From: Donald Dutile <ddutile@redhat.com>
>>>> Sent: Tuesday, March 18, 2025 10:12 PM
>>>> To: Shameerali Kolothum Thodi
>>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org
>>>> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>>>> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>>>> pcie bus
>>>>
>>>> Shameer,
>>>>
>>>> Hi!
>>>>
>>>> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
>>>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>>>> and that is set as the primary-bus for the smmu dev.
>>>>>
>>>>> Signed-off-by: Shameer Kolothum
>>>> <shameerali.kolothum.thodi@huawei.com>
>>>>> ---
>>>>>     hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>>>>>     1 file changed, 19 insertions(+)
>>>>>
>>>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>>>>> index c327661636..1471b65374 100644
>>>>> --- a/hw/arm/smmuv3-accel.c
>>>>> +++ b/hw/arm/smmuv3-accel.c
>>>>> @@ -9,6 +9,21 @@
>>>>>     #include "qemu/osdep.h"
>>>>>
>>>>>     #include "hw/arm/smmuv3-accel.h"
>>>>> +#include "hw/pci/pci_bridge.h"
>>>>> +
>>>>> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
>>>>> +{
>>>>> +    DeviceState *d = opaque;
>>>>> +
>>>>> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
>>>>> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
>>>>> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus->name)) {
>>>>> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
>>>>> +                                     &error_abort);
>>>>> +        }
>>>>> +    }
>>>>> +    return 0;
>>>>> +}
>>>>>
>>>>>     static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>>>     {
>>>>> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>>>         SysBusDevice *dev = SYS_BUS_DEVICE(d);
>>>>>         Error *local_err = NULL;
>>>>>
>>>>> +    object_child_foreach_recursive(object_get_root(),
>>>>> +                                   smmuv3_accel_pxb_pcie_bus, d);
>>>>> +
>>>>>         object_property_set_bool(OBJECT(dev), "accel", true, &error_abort);
>>>>>         c->parent_realize(d, &local_err);
>>>>>         if (local_err) {
>>>>> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
>>>>>         device_class_set_parent_realize(dc, smmu_accel_realize,
>>>>>                                         &c->parent_realize);
>>>>>         dc->hotpluggable = false;
>>>>> +    dc->bus_type = TYPE_PCIE_BUS;
>>>>>     }
>>>>>
>>>>>     static const TypeInfo smmuv3_accel_type_info = {
>>>>
>>>> I am not seeing the need for a pxb-pcie bus(switch) introduced for each
>>>> 'accel'.
>>>> Isn't the IORT able to define different SMMUs for different RIDs?
>>>> if so,
>>>> isn't that sufficient
>>>> to associate (define) an SMMU<->RID association without introducing a
>>>> pxb-pcie?
>>>> and again, I'm not sure how that improves/enables the device<->SMMU
>>>> associativity?
>>>
>>> Thanks for taking a look at the series. As discussed elsewhere in
>>> this thread(with
>>> Eric), normally in physical world (or at least in the most common
>>> cases) SMMUv3
>>> is attached to PCIe Root Complex and if you take a look at the IORT
>>> spec, it describes
>>> association of ID mappings between a RC node and SMMUV3 node.
>>>
>>> And if my understanding is correct, in Qemu, only pxb-pcie allows you
>>> to add
>>> extra root complexes even though it is still plugged to
>>> parent(pcie.0). ie, for all
>>> devices downstream it acts as a root complex but still plugged into a
>>> parent pcie.0.
>>> This allows us to add/describe multiple "smmuv3-accel" each
>>> associated with a RC.
>>>
>> I find the qemu statements a bit unclear here as well.
>> I looked at the hot plug statement(s) in docs/pcie.txt, as I figured
>> that's where dynamic
>> IORT changes would be needed as well.  There, it says you can hot-add
>> PCIe devices to RPs,
>> one has to define/add RP's to the machine model for that plug-in.
>>
>> Using libvirt, it could auto-add the needed RPs to do dynamic smmuv3
>> additions,
> I am not sure I understand your statement here. we don't want "dynamic"
> SMMUv3 instantiation. SMMUv3 is a platform device which is supposed to
> be coldplugged on a pre-existing PCIe hierarchy. The SMMUv3 device is
> not something that is meant to be hotplugged or hotunplugged.
> To me we hijack the bus= property to provide information about the IORT
> IDMAP
> 
Dynamic in the sense that if one adds smmuv3 for multiple devices,
libvirt will dynamically figure out how to instantiate one, two, three... smmu's
in the machine at cold boot.
If you want a machine to be able to hot-plug a device that would require another smmu,
then the config, and smmu, would have to be explicitly stated; as is done today for
hot-plug PCIe if the simple machine that libvirt would make is not sufficient to
hot-add a PCIe device.

> Thanks
> 
> Eric
>> if I understand how libvirt does that today for pcie devices now (/me
>> looks at danpb for feedback).
>>
>>> Having said that,  current code only allows pxb-pcie root complexes
>>> avoiding
>>> the pcie.0. The idea behind this was, user can use pcie.0 with a non
>>> accel SMMUv3
>>> for any emulated devices avoiding the performance bottlenecks we are
>>> discussing for emulated dev+smmuv3-accel cases. But based on the
>>> feedback from
>>> Eric and Daniel I will relax that restriction and will allow
>>> association with pcie.0.
>>>
>> So, I think this isn't a restriction that this smmuv3 feature should
>> enforce;
>> lack of a proper RP or pxb-pcie will yield an invalid config
>> issue/error, and
>> the machine definition will be modified to meet the needs for IORT.
>>
>>> Thanks,
>>> Shameer
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   
>>>>>> to root complexes.
>>>> Feel free to enlighten me where I may have mis-read/interpreted the
>>>> IORT
>>>> & SMMUv3 specs.
>>>>
>>>> Thanks,
>>>> - Don
>>>>
>>>
>>
> 




* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 18:09               ` Eric Auger
  2025-03-19 18:34                 ` Nicolin Chen
@ 2025-03-21  1:26                 ` Donald Dutile
  2025-03-24 14:59                   ` Eric Auger
  1 sibling, 1 reply; 145+ messages in thread
From: Donald Dutile @ 2025-03-21  1:26 UTC (permalink / raw)
  To: eric.auger, Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/19/25 2:09 PM, Eric Auger wrote:
> Hi Nicolin,
> 
> 
> On 3/19/25 6:14 PM, Nicolin Chen wrote:
>> On Wed, Mar 19, 2025 at 05:45:51PM +0100, Eric Auger wrote:
>>>
>>>
>>> On 3/17/25 8:10 PM, Nicolin Chen wrote:
>>>> On Mon, Mar 17, 2025 at 07:07:52PM +0100, Eric Auger wrote:
>>>>> On 3/17/25 6:54 PM, Nicolin Chen wrote:
>>>>>> On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
>>>>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>>>>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-accel
>>>>>>>> device. In order to support vfio-pci dev assignment with a Guest
>>>>>>> guest
>>>>>>>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
>>>>>>> nested (s1+s2)
>>>>>>>> mode, with Guest owning the S1 page tables. Subsequent patches will
>>>>>>> the guest
>>>>>>>> add support for smmuv3-accel to provide this.
>>>>>>> Can't this -accel smmu also works with emulated devices? Do we want an
>>>>>>> exclusive usage?
>>>>>> Is there any benefit from emulated devices working in the HW-
>>>>>> accelerated nested translation mode?
>>>>> Not really but do we have any justification for using different device
>>>>> name in accel mode? I am not even sure that accel option is really
>>>>> needed. Ideally the qemu device should be able to detect it is
>>>>> protecting a VFIO device, in which case it shall check whether nested is
>>>>> supported by host SMMU and then automatically turn accel mode?
>>>>>
>>>>> I gave the example of the vfio device which has different class
>>>>> implementation depending on the iommufd option being set or not.
>>>> Do you mean that we should just create a regular smmuv3 device and
>>>> let a VFIO device to turn on this smmuv3's accel mode depending on
>>>> its LEGACY/IOMMUFD class?
>>> no this is not what I meant. I gave an example where depending on an
>>> option passed to the VFIO device you choose one class implementation or the
>>> other.
>> Option means something like this:
>> 	-device smmuv3,accel=on
>> instead of
>> 	-device "smmuv3-accel"
>> ?
>>
>> Yea, I think that's good.
> Yeah actually that's a big debate for not much. From an implementation
> pov that shall not change much. The only doubt I have is if we need to
> conditionally expose the MSI RESV regions it is easier to do if we
> detect we have a smmuv3-accel. what the option allows is the auto mode.
>>
>>>> Another question: how does an emulated device work with a vSMMUv3?
>>> I don't get your question. vSMMUv3 currently only works with emulated
>>> devices. Did you mean accelerated SMMUv3?
>> Yea. If "accel=on", how does an emulated device work with that?
>>
>>>> I could imagine that all the accel steps would be bypassed since
>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>> so we will need to flush the iotlb, which will increase complexity
>>>> as the TLBI command dispatching function will need to be aware what
>>>> ASID is for emulated device and what is for vfio device..
>>> I don't get the issue. For emulated device you go through the usual
>>> translate path which indeed caches configs and translations. In case the
>>> guest invalidates something, you know the SID and you find the entries
>>> in the cache that are tagged by this SID.
>>>
>>> In case you have an accelerated device (indeed if sdev->idev) you don't
>>> exercise that path. On invalidation you detect the SID matches a VFIO
>>> device, propagate the invalidations to the host instead. on the
>>> invalidation you should be able to detect pretty easily if you need to
>>> flush the emulated caches or propagate the invalidations. Do I miss some
>>> extra problematic?
>>>
>>> I do not say we should support emulated devices and VFIO devices in the
>>> same guest iommu group. But I don't see why we couldn't easily plug the
>>> accelerated logic in the current logical for emulation/vhost and do not
>>> require a different qemu device.
>> Hmm, feels like I fundamentally misunderstood your point.
>>   a) We implement the device model with the same piece of code but
>>      only provide an option "accel=on/off" to switch mode. And both
>>      passthrough devices and emulated devices can attach to the same
>>      "accel=on" device.
> I think we all agree we don't want that use case in general. However
> effectively I was questioning why it couldn't work maybe at the expense
> of some perf degradation.
>>   b) We implement the device model with the same piece of code but
>>      only provide an option "accel=on/off" to switch mode. Then, an
>>      passthrough device can attach to an "accel=on" device, but an
>>      emulated device can only attach to an "accel=off" SMMU device.
>>
>> I was thinking that you want case (a). But actually you were just
>> talking about case (b)? I think (b) is totally fine.
>>
>> We certainly can't do case (a): not all TLBI commands gives an "SID"
>> field (so would have to broadcast, i.e. underlying SMMU HW would run
>> commands that were supposed for emulated devices only); in case of
>> vCMDQ, commands for emulated devices would be issued to real HW and
> I am still confused about that. For instance if the guest sends an
> NH_ASID, NH_VA invalidation and it happens both the emulated device and
> VFIO-device share the same cd.asid (same guest iommu domain, which
> practically should not happen) why shouldn't we propagate the
it can't ... on ARM ... PCIe only, no shared iommu domain between devices.

Isn't this another reason (perf) why emulated devices & physical devices should
be on different vSMMUs ... so that one can distinguish how deep (to hw)
or how wide (a broadcast) actions like TLBI are implemented, or how they impact other devices?


> invalidation to the host. Does the problem come from the usage of vCMDQ
> or would you foresee the same problem with a generic physical SMMU?
> 
> Thanks
> 
> Eric
>> trigger HW errors.
>>
>> Thanks
>> Nicolin
>>
> 




* RE: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-20 17:02       ` Nicolin Chen
@ 2025-03-24  8:19         ` Shameerali Kolothum Thodi via
  2025-03-24 13:13           ` Eric Auger
  2025-03-24 15:50           ` Nicolin Chen
  0 siblings, 2 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-24  8:19 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, March 20, 2025 5:03 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Donald Dutile <ddutile@redhat.com>; qemu-arm@nongnu.org; qemu-
> devel@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
> jgg@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> pcie bus
> 
> On Wed, Mar 19, 2025 at 09:26:29AM +0000, Shameerali Kolothum Thodi
> wrote:
> > Having said that,  current code only allows pxb-pcie root complexes
> avoiding
> > the pcie.0. The idea behind this was, user can use pcie.0 with a non accel
> SMMUv3
> > for any emulated devices avoiding the performance bottlenecks we are
> > discussing for emulated dev+smmuv3-accel cases. But based on the
> feedback from
> > Eric and Daniel I will relax that restriction and will allow association with
> pcie.0.
> 
> Just want a clarification here..
> 
> If VM has a passthrough device only:
>  attach it to PCIE.0 <=> vSMMU0 (accel=on)

Yes. Basically support accel=on to pcie.0 as well.

> If VM has an emulated device and a passthrough device:
>  attach the emulated device to PCIE.0 <=> vSMMU bypass (or accel=off?)
>  attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)

This can be the other way around as well,
i.e.
pass-through on pcie.0 (accel=on) and emulated on any other pxb-pcie with accel=off.

I think the way bus numbers are allocated in Qemu for pcie.0 and pxb-pcie allows
us to support this in IORT ID maps.
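
As a rough illustration of the bus-number point: if each root bus owns
a fixed, non-overlapping bus range, each vSMMU can get its own
requester-ID window. The sketch below is only that, a sketch; RidRange
and both helpers are made-up names, and the bus ranges in the usage
note are example values, not what QEMU actually assigns. It relies
only on the standard PCI requester ID layout (bus in bits 15:8, devfn
in bits 7:0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One contiguous window of PCI requester IDs (hypothetical helper type). */
typedef struct {
    uint32_t input_base; /* first RID in the window */
    uint32_t id_count;   /* number of RIDs in the window */
} RidRange;

/* RID = (bus << 8) | devfn, so each bus contributes 256 IDs. */
RidRange rid_range_for_buses(uint8_t bus_min, uint8_t bus_max)
{
    assert(bus_min <= bus_max);

    RidRange r = {
        .input_base = (uint32_t)bus_min << 8,
        .id_count = ((uint32_t)bus_max - bus_min + 1) << 8,
    };
    return r;
}

/* Two vSMMUs must not claim the same requester IDs. */
bool rid_ranges_overlap(RidRange a, RidRange b)
{
    return a.input_base < b.input_base + b.id_count &&
           b.input_base < a.input_base + a.id_count;
}
```

For example, with pcie.0 covering buses 0x00-0x0f and a pxb-pcie
covering 0x10-0x1f, the two windows start at RID 0x0000 and 0x1000,
hold 0x1000 IDs each, and do not overlap.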

Thanks,
Shameer



* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24  8:19         ` Shameerali Kolothum Thodi via
@ 2025-03-24 13:13           ` Eric Auger
  2025-03-24 13:55             ` Shameerali Kolothum Thodi via
  2025-03-24 16:01             ` Nicolin Chen
  2025-03-24 15:50           ` Nicolin Chen
  1 sibling, 2 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-24 13:13 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Nicolin Chen
  Cc: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
	zhangfei.gao@linaro.org

Hi Shameer,

On 3/24/25 9:19 AM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Thursday, March 20, 2025 5:03 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: Donald Dutile <ddutile@redhat.com>; qemu-arm@nongnu.org; qemu-
>> devel@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
>> jgg@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>> pcie bus
>>
>> On Wed, Mar 19, 2025 at 09:26:29AM +0000, Shameerali Kolothum Thodi
>> wrote:
>>> Having said that,  current code only allows pxb-pcie root complexes
>> avoiding
>>> the pcie.0. The idea behind this was, user can use pcie.0 with a non accel
>> SMMUv3
>>> for any emulated devices avoiding the performance bottlenecks we are
>>> discussing for emulated dev+smmuv3-accel cases. But based on the
>> feedback from
>>> Eric and Daniel I will relax that restriction and will allow association with
>> pcie.0.
>>
>> Just want a clarification here..
>>
>> If VM has a passthrough device only:
>>  attach it to PCIE.0 <=> vSMMU0 (accel=on)
> Yes. Basically support accel=on to pcie.0 as well.

agreed we shall be able to instantiate the accelerated SMMU on pcie.0 too.
>
>> If VM has an emulated device and a passthrough device:
>>  attach the emulated device to PCIE.0 <=> vSMMU bypass (or accel=off?)
>>  attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)
> This can be other way around as well:
> ie, 
> pass-through to pcie.0(accel=on) and emulated to any other pxb-pcie with accel = off.
+1
>
> I think the way bus numbers are allocated in Qemu for pcie.0 and pxb-pcie allows
> us to support this in IORT ID maps.
One trouble we may get into is possible bus reordering by the guest. I
don't know the details but I remember that in certain conditions the
guest can reorder the bus numbers.

Besides, what I don't get in the above discussion about whether the
accelerated mode can also support emulated devices is this: if you use
the originally suggested hierarchy (pxb-pcie + root port + VFIO device),
you eventually get two devices on the guest side protected by the SMMU
instance: the root port and the VFIO device. They end up in different
iommu groups. So there is already a mix of emulated and VFIO devices.

Thanks

Eric
>
> Thanks,
> Shameer
>



^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24 13:13           ` Eric Auger
@ 2025-03-24 13:55             ` Shameerali Kolothum Thodi via
  2025-03-24 15:34               ` Eric Auger
  2025-03-24 16:01             ` Nicolin Chen
  1 sibling, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-24 13:55 UTC (permalink / raw)
  To: eric.auger@redhat.com, Nicolin Chen
  Cc: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
	zhangfei.gao@linaro.org

Hi Eric,

> -----Original Message-----
> From: qemu-devel-
> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org> On
> Behalf Of Eric Auger
> Sent: Monday, March 24, 2025 1:13 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen
> <nicolinc@nvidia.com>
> Cc: Donald Dutile <ddutile@redhat.com>; qemu-arm@nongnu.org; qemu-
> devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> pcie bus
> 
> Hi Shameer,
> 
> On 3/24/25 9:19 AM, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Nicolin Chen <nicolinc@nvidia.com>
> >> Sent: Thursday, March 20, 2025 5:03 PM
> >> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> >> Cc: Donald Dutile <ddutile@redhat.com>; qemu-arm@nongnu.org;
> qemu-
> >> devel@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
> >> jgg@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
> >> mochs@nvidia.com; smostafa@google.com; Linuxarm
> >> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> >> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> >> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> >> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
> pxb-
> >> pcie bus
> >>
> >> On Wed, Mar 19, 2025 at 09:26:29AM +0000, Shameerali Kolothum Thodi
> >> wrote:
> >>> Having said that,  current code only allows pxb-pcie root complexes
> >> avoiding
> >>> the pcie.0. The idea behind this was, user can use pcie.0 with a non
> accel
> >> SMMUv3
> >>> for any emulated devices avoiding the performance bottlenecks we are
> >>> discussing for emulated dev+smmuv3-accel cases. But based on the
> >> feedback from
> >>> Eric and Daniel I will relax that restriction and will allow association
> with
> >> pcie.0.
> >>
> >> Just want a clarification here..
> >>
> >> If VM has a passthrough device only:
> >>  attach it to PCIE.0 <=> vSMMU0 (accel=on)
> > Yes. Basically support accel=on to pcie.0 as well.
> 
> agreed we shall be able to instantiate the accelerated SMMU on pcie.0 too.
> >
> >> If VM has an emulated device and a passthrough device:
> >>  attach the emulated device to PCIE.0 <=> vSMMU bypass (or accel=off?)
> >>  attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)
> > This can be other way around as well:
> > ie,
> > pass-through to pcie.0(accel=on) and emulated to any other pxb-pcie with
> accel = off.
> +1
> >
> > I think the way bus numbers are allocated in Qemu for pcie.0 and pxb-
> pcie allows
> > us to support this in IORT ID maps.
> One trouble we may get into is possible bus reordering by the guest. I
> don't know the details but I remember that in certain conditions the
> guest can reorder the bus numbers.

Yeah, Guest kernel can re-enumerate PCIe. I will check.
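
For reference, a minimal sketch of why re-enumeration matters here. This
is not code from the series: RidRange and the helper names are made up
for illustration, and the exact encoding of the IORT ID mapping fields is
not modelled. The ID maps are expressed over PCI requester IDs, which
embed the bus number, so a range computed from the bus numbers QEMU
assigns only stays valid while the guest keeps those bus numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a contiguous range of PCI requester IDs. */
typedef struct RidRange {
    uint32_t base;   /* first requester ID covered */
    uint32_t count;  /* number of requester IDs covered */
} RidRange;

/* A requester ID is bus[15:8] | device[7:3] | function[2:0]. */
static inline uint16_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Requester IDs spanned by buses [bus_min, bus_max], inclusive. */
static inline RidRange rid_range_for_buses(uint8_t bus_min, uint8_t bus_max)
{
    assert(bus_min <= bus_max);
    RidRange r = {
        .base  = (uint32_t)bus_min << 8,
        .count = ((uint32_t)bus_max - bus_min + 1) << 8,
    };
    return r;
}

/* True if the requester ID falls inside the range. */
static inline int rid_in_range(RidRange r, uint16_t rid)
{
    return rid >= r.base && rid - r.base < r.count;
}
```

If, say, pcie.0 were described with buses 0x00-0x7f and a pxb-pcie with
0x80-0xff (numbers picked only for the example), a device at 80:01.0 maps
to the second range. If the guest renumbers that bus, the same device
would present a requester ID outside the range the table was built for.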
 
> Besides, what I don't get in the above discussion about whether the
> accelerated mode can also support emulated devices is this: if you use
> the originally suggested hierarchy (pxb-pcie + root port + VFIO device),
> you eventually get two devices on the guest side protected by the SMMU
> instance: the root port and the VFIO device. They end up in different
> iommu groups. So there is already a mix of emulated and VFIO devices.

True. But I guess the activity associated with the root port (invalidations
etc.) will be very minimal (or nil?) compared to a virtio device.

Thanks,
Shameer




^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19  0:31                 ` Jason Gunthorpe
  2025-03-19  5:27                   ` Nicolin Chen
@ 2025-03-24 14:08                   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-24 14:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, ddutile, berrange, nathanc, mochs, smostafa,
	linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao

Hi Jason,

On 3/19/25 1:31 AM, Jason Gunthorpe wrote:
> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
>> Nevertheless I don't think anything prevents the acceleration granted
>> device from also working with virtio/vhost devices for instance unless
>> you unplug the existing infra.
> If the accel mode is using something like vcmdq then it is not
> possible to work since the invalidations won't even be trapped.

I acknowledge I was more focused on the case without vcmdq, which was
addressed in the past, and now I see the problem better.

>
> Even in the case where we trap the invalidations it sure is
> complicated.. invalidation is done by ASID which is not obviously
> related to any specific device. An ASID could be hidden inside a CD
> table that is being HW accessed and also inside a CD table that is SW
> accessed. The VMM has no way to know what is going on so you'd end up
> forced to replicate all the ASID invalidations. :\
Nevertheless I think we should also support the case without vcmdq
(currently supported in this series). That one looks more compatible
with emulated devices, although less optimized.
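
To make the trapped (non-vcmdq) case concrete, here is a rough sketch of
the routing decision. It is not taken from the series: the InvRoute type
and the function names are hypothetical, and only a subset of commands is
listed. Commands that carry a StreamID can be steered to one side, while
ASID-scoped TLBI commands carry no StreamID and have to be replicated to
both the emulated caches and the host.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of SMMUv3 commands relevant to this discussion. */
typedef enum {
    CMD_CFGI_STE,      /* carries a StreamID */
    CMD_CFGI_CD,       /* carries a StreamID */
    CMD_ATC_INV,       /* carries a StreamID */
    CMD_TLBI_NH_ASID,  /* ASID-scoped, no StreamID */
    CMD_TLBI_NH_VA,    /* ASID + address, no StreamID */
} InvCmd;

/* Hypothetical routing flags; they can be OR-ed together. */
typedef enum {
    ROUTE_EMULATED = 1 << 0,  /* flush QEMU's emulated config/IOTLB caches */
    ROUTE_HOST     = 1 << 1,  /* forward the invalidation to the host SMMU */
} InvRoute;

static inline bool cmd_has_sid(InvCmd cmd)
{
    switch (cmd) {
    case CMD_CFGI_STE:
    case CMD_CFGI_CD:
    case CMD_ATC_INV:
        return true;
    default:
        return false;
    }
}

/*
 * sid_is_vfio is only meaningful for commands that carry a StreamID.
 * Returns a bitmask of InvRoute flags.
 */
static inline int route_invalidation(InvCmd cmd, bool sid_is_vfio)
{
    assert(cmd <= CMD_TLBI_NH_VA);
    if (cmd_has_sid(cmd)) {
        return sid_is_vfio ? ROUTE_HOST : ROUTE_EMULATED;
    }
    /* No StreamID: the VMM cannot tell who uses the ASID, so replicate. */
    return ROUTE_EMULATED | ROUTE_HOST;
}
```

The replicated branch is where the "less optimized" cost comes from: the
host runs TLBI commands that may only concern emulated devices. With
vcmdq nothing is trapped, so no such routing is possible at all.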

>
> It just doesn't seem worthwhile to try to make it all work.
>
> I'd suggest arranging to share some of the SMMUv3 emulation code,
> maybe with a library/headerfile or something, but I think it does make
> sense they would be different implementations given how completely
> different they should be.
I agree we can do our utmost to separate the implementations. I am more
concerned about having libvirt guess what kind of device it should use.

On x86, libvirt needs to use -device intel-iommu,caching-mode=on if one
wants to protect a VFIO device. So this looks similar to adding
accel=on on ARM.

Eric
>
> Jason
>



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-19 18:34                 ` Nicolin Chen
@ 2025-03-24 14:46                   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-24 14:46 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/19/25 7:34 PM, Nicolin Chen wrote:
> On Wed, Mar 19, 2025 at 07:09:33PM +0100, Eric Auger wrote:
>>> Option means something like this:
>>> 	-device smmuv3,accel=on
>>> instead of
>>> 	-device "smmuv3-accel"
>>> ?
>>>
>>> Yea, I think that's good.
>> Yeah actually that's a big debate for not much. From an implementation
>> pov that shall not change much. The only doubt I have is if we need to
>> conditionally expose the MSI RESV regions it is easier to do if we
>> detect we have a smmuv3-accel. what the option allows is the auto mode.
> Mind elaborating your doubt about the MSI RESV region?
>
> Do you mean how VMS code should tag "accel=on" option and generate
> RMR nodes in the IORT table?
Yes, that was my point. Earlier we detected whether a "nested-smmu" was
part of the object hierarchy. Now we do the same with the smmu type and
check whether accel=on. I guess we can retrieve the property value, but
this is worth testing.
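
As a sketch of how a tri-state option could resolve before deciding on
the RMR nodes (an illustration under assumptions, not the series' code:
AccelOpt, AccelResult and both functions are hypothetical names, and the
two boolean inputs stand in for whatever detection QEMU would actually
perform):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state user option for the SMMUv3 device. */
typedef enum { ACCEL_AUTO, ACCEL_ON, ACCEL_OFF } AccelOpt;

/* Hypothetical outcome of resolving the option at realize time. */
typedef enum {
    ACCEL_RESULT_OFF,
    ACCEL_RESULT_ON,
    ACCEL_RESULT_ERROR,  /* user asked for accel but the host cannot nest */
} AccelResult;

static inline AccelResult resolve_accel(AccelOpt opt, bool has_vfio_dev,
                                        bool host_nesting)
{
    assert(opt == ACCEL_AUTO || opt == ACCEL_ON || opt == ACCEL_OFF);
    switch (opt) {
    case ACCEL_OFF:
        return ACCEL_RESULT_OFF;
    case ACCEL_ON:
        return host_nesting ? ACCEL_RESULT_ON : ACCEL_RESULT_ERROR;
    case ACCEL_AUTO:
    default:
        /* Auto: accelerate only when a VFIO device needs it and the host can. */
        return (has_vfio_dev && host_nesting) ? ACCEL_RESULT_ON
                                              : ACCEL_RESULT_OFF;
    }
}

/* The MSI reserved-region (RMR) description is only wanted in accel mode. */
static inline bool needs_msi_rmr(AccelResult r)
{
    return r == ACCEL_RESULT_ON;
}
```

The point of the sketch is that the IORT code would key off the resolved
value rather than the device type name, which is the part that needs
testing.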
>
>>> We certainly can't do case (a): not all TLBI commands gives an "SID"
>>> field (so would have to broadcast, i.e. underlying SMMU HW would run
>>> commands that were supposed for emulated devices only); in case of
>>> vCMDQ, commands for emulated devices would be issued to real HW and
>> I am still confused about that. For instance if the guest sends an
>> NH_ASID, NH_VA invalidation and it happens both the emulated device and
>> VFIO-device share the same cd.asid (same guest iommu domain, which
>> practically should not happen) why shouldn't we propagate the
>> invalidation to the host. Does the problem come from the usage of vCMDQ
>> or would you foresee the same problem with a generic physical SMMU?
> Host (HW) would end up with executing commands that were issued for
> emulated devices, which impacts performance.
>
> With vCMDQ, QEMU cannot trap command queue because all invalidation
> commands will be issued to HW directly from the guest kernel driver.
> This includes TLBI and ATC_INV commands. It's probably okay to run
> TLBI commands with vCMDQ (again perf impact), while ATC_INV commands
> would result in "unknown SID" errors or directly ATC_INV timeouts.
OK understood. Thanks and sorry for the misunderstanding

Eric
>
> Thanks
> Nicolin
>



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-21  0:54                       ` Donald Dutile
@ 2025-03-24 14:52                         ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-24 14:52 UTC (permalink / raw)
  To: Donald Dutile, Nicolin Chen
  Cc: Jason Gunthorpe, Shameer Kolothum, qemu-arm, qemu-devel,
	peter.maydell, berrange, nathanc, mochs, smostafa, linuxarm,
	wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/21/25 1:54 AM, Donald Dutile wrote:
>
>
> On 3/19/25 1:04 PM, Eric Auger wrote:
>>
>>
>>
>> On 3/18/25 10:22 PM, Donald Dutile wrote:
>>>
>>>
>>> On 3/18/25 3:13 PM, Nicolin Chen wrote:
>>>> On Tue, Mar 18, 2025 at 07:31:36PM +0100, Eric Auger wrote:
>>>>> On 3/17/25 9:19 PM, Nicolin Chen wrote:
>>>>>> On Mon, Mar 17, 2025 at 04:24:53PM -0300, Jason Gunthorpe wrote:
>>>>>>> On Mon, Mar 17, 2025 at 12:10:19PM -0700, Nicolin Chen wrote:
>>>>>>>> Another question: how does an emulated device work with a vSMMUv3?
>>>>>>>> I could imagine that all the accel steps would be bypassed since
>>>>>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>>>>>> so we will need to flush the iotlb, which will increase complexity
>>>>>>>> as the TLBI command dispatching function will need to be aware
>>>>>>>> what
>>>>>>>> ASID is for emulated device and what is for vfio device..
>>>>>>> I think you should block it. We already expect different vSMMU's
>>>>>>> depending on the physical SMMU under the PCI device, it makes sense
>>>>>>> that a SW VFIO device would have it's own, non-accelerated, vSMMU
>>>>>>> model in the guest.
>>>>>> Yea, I agree and it'd be cleaner for an implementation separating
>>>>>> them.
>>>>>>
>>>>>> In my mind, the general idea of "accel=on" is also to keep things
>>>>>> in a more efficient way: passthrough devices go to HW-accelerated
>>>>>> vSMMUs (separated PCIE buses), while emulated ones go to a vSMMU-
>>>>>> bypassed (PCIE0).
>>>>
>>>>> Originally a specific SMMU device was needed to opt in for MSI
>>>>> reserved
>>>>> region ACPI IORT description which are not needed if you don't
>>>>> rely on
>>>>> S1+S2. However if we don't rely on this trick this was not even
>>>>> needed
>>>>> with legacy integration
>>>>> (https://patchwork.kernel.org/project/qemu-devel/cover/20180921081819.9203-1-eric.auger@redhat.com/).
>>>>>
>>>>>
>>>>>
>>>>> Nevertheless I don't think anything prevents the acceleration granted
>>>>> device from also working with virtio/vhost devices for instance
>>>>> unless
>>>>> you unplug the existing infra. The translation and invalidation just
>>>>> should use different control paths (explicit translation requests,
>>>>> invalidation notifications towards vhost, ...).
>>>>
>>>> smmuv3_translate() is per sdev, so it's easy.
>>>>
>>>> Invalidation is done via commands, which could be tricky:
>>>> a) Broadcast command
>>>> b) ASID validation -- we'll need to keep track of a list of ASIDs
>>>>      for vfio device to compare the ASID in each per-ASID command,
>>>>      potentially by trapping all CFGI_CD(_ALL) commands? Note that
>>>>      each vfio device may have multiple ASIDs (for multiple CDs).
>>>> Either a or b above will have some validation efficiency impact.
>>>>
>>>>> Again, what does legitimate to have different qemu devices for the
>>>>> same
>>>>> IP? I understand that it simplifies the implementation but I am not
>>>>> sure
>>>>> this is a good reason. Nevertheless it worth challenging. What is the
>>>>> plan for intel iommu? Will we have 2 devices, the legacy device
>>>>> and one
>>>>> for nested?
>>>>
>>>> Hmm, it seems that there are two different topics:
>>>> 1. Use one SMMU device model (source code file; "iommu=" string)
>>>>      for both an emulated vSMMU and an HW-accelerated vSMMU.
>>>> 2. Allow one vSMMU instance to work with both an emulated device
>>>>      and a passthrough device.
>>>> And I get that you want both 1 and 2.
>>>>
>>>> I'm totally okay with 1, yet see no compelling benefit from 2 for
>>>> the increased complexity in the invalidation routine.
>>>>
>>>> And another question about the mixed device attachment. Let's say
>>>> we have in the host:
>>>>     VFIO passthrough dev0 -> pSMMU0
>>>>     VFIO passthrough dev1 -> pSMMU1
>>>> Should we allow emulated devices to be flexibly plugged?
>>>>     dev0 -> vSMMU0 /* Hard requirement */
>>>>     dev1 -> vSMMU1 /* Hard requirement */
>>>>     emu0 -> vSMMU0 /* Soft requirement; can be vSMMU1 also */
>>>>     emu1 -> vSMMU1 /* Soft requirement; can be vSMMU0 also */
>>>>
>>>> Thanks
>>>> Nicolin
>>>>
>>> I agree w/Jason & Nicolin: different vSMMUs for pass-through devices
>>> than emulated, & vice-versa.
>>> Not mixing... because... of the next agreement:
>> you need to clarify what you mean by different vSMMUs: are you taking
>> about different instances or different qemu device types?
> Both. a device needed to use hw-accel feature has to use an smmu that
> has that feature;
> an emulated device can use such an smmu, but as mentioned in other
> threads,
> if you start with all emulated in one smmu, if you hot-plug a
> (assigned) device,
> it needs another smmu that has hw-accel features.
> Keeping them split makes it easier at config time, and it may enable
> the code to be simpler...
> but the other half of my brain wants common code paths with
> accel/emulate branches but
> a different smmu instance will likely simplify the smmu-(accel-)specific
> lookups.

Yes, I think we agree on the fact that several smmu instances are needed,
especially for matching the underlying HW topology and for having
separate protection for emulated and host devices (esp. with vCMD queues).

Eric
>
>>>
>>> I agree with Eric that 'accel' isn't needed -- this should be
>>> ascertained from the pSMMU that a physical device is attached to.
>> we can simply use an AUTO_ON_OFF property and by default choose AUTO
>> value. That would close the debate ;-)
>>
> Preaching to the choir... yes.
>
>> Eric
>>> Now... how does vfio(?; why not qemu?) layer determine that? -- where
>>> are SMMUv3 'accel' features exposed either: a) in the device struct
>>> (for the smmuv3) or (b) somewhere under sysfs? ... I couldn't find
>>> anything under either on my g-h system, but would appreciate a ptr if
>>> there is.
>>> and like Eric, although 'accel' is better than the original 'nested',
>>> it's non-obvious what accel feature(s) are being turned on, or not.
>>> In fact, if broken accel hw occurs ('if' -> 'when'), how should it be
>>> turned off? ... if info in the kernel, a kernel boot-param will be
>>> needed;
>>> if in sysfs, a write to 0 an enable(disable) it maybe an alternative
>>> as well.
>>> Bottom line: we need a way to (a) ascertain the accel feature (b) a
>>> way to disable it when it is broken,
>>> so qemu's smmuv3 spec will 'just work'.
>>> [This may also help when migrating from a machine that has accel
>>> working to one that does not.]
>>>
>>> ... and when an emulated device is assigned a vSMMU, there are no
>>> accel features ... unless we have tunables like batch iotlb
>>> invalidation for perf reasons, which can be viewed as an 'accel'
>>> option.
>>>
>>
>



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-21  0:59           ` Donald Dutile
@ 2025-03-24 14:56             ` Eric Auger
  2025-03-24 15:02               ` Daniel P. Berrangé
  2025-03-24 21:43               ` Donald Dutile
  0 siblings, 2 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-24 14:56 UTC (permalink / raw)
  To: Donald Dutile, Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



On 3/21/25 1:59 AM, Donald Dutile wrote:
>
>
> On 3/19/25 2:21 PM, Eric Auger wrote:
>> Hi Don,
>>
>>
>> On 3/19/25 5:21 PM, Donald Dutile wrote:
>>>
>>>
>>> On 3/19/25 5:26 AM, Shameerali Kolothum Thodi wrote:
>>>> Hi Don,
>>>>
>>> Hey!
>>>
>>>>> -----Original Message-----
>>>>> From: Donald Dutile <ddutile@redhat.com>
>>>>> Sent: Tuesday, March 18, 2025 10:12 PM
>>>>> To: Shameerali Kolothum Thodi
>>>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>>>>> qemu-devel@nongnu.org
>>>>> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>>>>> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
>>>>> pxb-
>>>>> pcie bus
>>>>>
>>>>> Shameer,
>>>>>
>>>>> Hi!
>>>>>
>>>>> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
>>>>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>>>>> and that is set as the primary-bus for the smmu dev.
>>>>>>
>>>>>> Signed-off-by: Shameer Kolothum
>>>>> <shameerali.kolothum.thodi@huawei.com>
>>>>>> ---
>>>>>>     hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>>>>>>     1 file changed, 19 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>>>>>> index c327661636..1471b65374 100644
>>>>>> --- a/hw/arm/smmuv3-accel.c
>>>>>> +++ b/hw/arm/smmuv3-accel.c
>>>>>> @@ -9,6 +9,21 @@
>>>>>>     #include "qemu/osdep.h"
>>>>>>
>>>>>>     #include "hw/arm/smmuv3-accel.h"
>>>>>> +#include "hw/pci/pci_bridge.h"
>>>>>> +
>>>>>> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
>>>>>> +{
>>>>>> +    DeviceState *d = opaque;
>>>>>> +
>>>>>> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
>>>>>> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
>>>>>> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus-
>>>>>> name)) {
>>>>>> +            object_property_set_link(OBJECT(d), "primary-bus",
>>>>>> OBJECT(bus),
>>>>>> +                                     &error_abort);
>>>>>> +        }
>>>>>> +    }
>>>>>> +    return 0;
>>>>>> +}
>>>>>>
>>>>>>     static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>>>>     {
>>>>>> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d,
>>>>>> Error
>>>>> **errp)
>>>>>>         SysBusDevice *dev = SYS_BUS_DEVICE(d);
>>>>>>         Error *local_err = NULL;
>>>>>>
>>>>>> +    object_child_foreach_recursive(object_get_root(),
>>>>>> +                                   smmuv3_accel_pxb_pcie_bus, d);
>>>>>> +
>>>>>>         object_property_set_bool(OBJECT(dev), "accel", true,
>>>>>> &error_abort);
>>>>>>         c->parent_realize(d, &local_err);
>>>>>>         if (local_err) {
>>>>>> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass
>>>>> *klass, void *data)
>>>>>>         device_class_set_parent_realize(dc, smmu_accel_realize,
>>>>>>                                         &c->parent_realize);
>>>>>>         dc->hotpluggable = false;
>>>>>> +    dc->bus_type = TYPE_PCIE_BUS;
>>>>>>     }
>>>>>>
>>>>>>     static const TypeInfo smmuv3_accel_type_info = {
>>>>>
>>>>> I am not seeing the need for a pxb-pcie bus(switch) introduced for
>>>>> each
>>>>> 'accel'.
>>>>> Isn't the IORT able to define different SMMUs for different RIDs?
>>>>> if so,
>>>>> isn't that sufficient
>>>>> to associate (define) an SMMU<->RID association without introducing a
>>>>> pxb-pcie?
>>>>> and again, I'm not sure how that improves/enables the device<->SMMU
>>>>> associativity?
>>>>
>>>> Thanks for taking a look at the series. As discussed elsewhere in
>>>> this thread(with
>>>> Eric), normally in physical world (or at least in the most common
>>>> cases) SMMUv3
>>>> is attached to PCIe Root Complex and if you take a look at the IORT
>>>> spec, it describes
>>>> association of ID mappings between a RC node and SMMUV3 node.
>>>>
>>>> And if my understanding is correct, in Qemu, only pxb-pcie allows you
>>>> to add
>>>> extra root complexes even though it is still plugged to
>>>> parent(pcie.0). ie, for all
>>>> devices downstream it acts as a root complex but still plugged into a
>>>> parent pcie.0.
>>>> This allows us to add/describe multiple "smmuv3-accel" each
>>>> associated with a RC.
>>>>
>>> I find the qemu statements a bit unclear here as well.
>>> I looked at the hot plug statement(s) in docs/pcie.txt, as I figured
>>> that's where dynamic
>>> IORT changes would be needed as well.  There, it says you can hot-add
>>> PCIe devices to RPs,
>>> one has to define/add RP's to the machine model for that plug-in.
>>>
>>> Using libvirt, it could auto-add the needed RPs to do dynamic smmuv3
>>> additions,
>> I am not sure I understand your statement here. we don't want "dynamic"
>> SMMUv3 instantiation. SMMUv3 is a platform device which is supposed to
>> be coldplugged on a pre-existing PCIe hierarchy. The SMMUv3 device is
>> not something that is meant to be hotplugged or hotunplugged.
>> To me we hijack the bus= property to provide information about the IORT
>> IDMAP
>>
> Dynamic in the sense that if one adds smmuv3 for multiple devices,
> libvirt will dynamically figure out how to instantiate one, two,
> three... smmu's
> in the machine at cold boot.
> If you want a machine to be able to hot-plug a device that would
> require another smmu,
> then the config, and smmu, would have to be explicitly stated; as is
> done today for
> hot-plug PCIe if the simple machine that libvirt would make is not
> sufficient to
> hot-add a PCIe device.

Hum, this will need to be discussed with the libvirt guys, but I am not sure
they will be inclined to support this kind of policy, esp. because vIOMMU
is a pretty marginal use case as of now. They do automatic instantiation
for pcie and usb controllers, but I am not sure they will take care of the
vIOMMU tbh.

Eric
>
>> Thanks
>>
>> Eric
>>> if I understand how libvirt does that today for pcie devices now (/me
>>> looks at danpb for feedback).
>>>
>>>> Having said that,  current code only allows pxb-pcie root complexes
>>>> avoiding
>>>> the pcie.0. The idea behind this was, user can use pcie.0 with a non
>>>> accel SMMUv3
>>>> for any emulated devices avoiding the performance bottlenecks we are
>>>> discussing for emulated dev+smmuv3-accel cases. But based on the
>>>> feedback from
>>>> Eric and Daniel I will relax that restriction and will allow
>>>> association with pcie.0.
>>>>
>>> So, I think this isn't a restriction that this smmuv3 feature should
>>> enforce;
>>> lack of a proper RP or pxb-pcie will yield an invalid config
>>> issue/error, and
>>> the machine definition will be modified to meet the needs for IORT.
>>>
>>>> Thanks,
>>>> Shameer
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  
>>>>>>> to root complexes.
>>>>> Feel free to enlighten me where I may have mis-read/interpreted the
>>>>> IORT
>>>>> & SMMUv3 specs.
>>>>>
>>>>> Thanks,
>>>>> - Don
>>>>>
>>>>
>>>
>>
>



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device
  2025-03-21  1:26                 ` Donald Dutile
@ 2025-03-24 14:59                   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-24 14:59 UTC (permalink / raw)
  To: Donald Dutile, Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/21/25 2:26 AM, Donald Dutile wrote:
>
>
> On 3/19/25 2:09 PM, Eric Auger wrote:
>> Hi Nicolin,
>>
>>
>> On 3/19/25 6:14 PM, Nicolin Chen wrote:
>>> On Wed, Mar 19, 2025 at 05:45:51PM +0100, Eric Auger wrote:
>>>>
>>>>
>>>> On 3/17/25 8:10 PM, Nicolin Chen wrote:
>>>>> On Mon, Mar 17, 2025 at 07:07:52PM +0100, Eric Auger wrote:
>>>>>> On 3/17/25 6:54 PM, Nicolin Chen wrote:
>>>>>>> On Wed, Mar 12, 2025 at 04:15:10PM +0100, Eric Auger wrote:
>>>>>>>> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
>>>>>>>>> Based on SMMUv3 as a parent device, add a user-creatable
>>>>>>>>> smmuv3-accel
>>>>>>>>> device. In order to support vfio-pci dev assignment with a Guest
>>>>>>>> guest
>>>>>>>>> SMMUv3, the physical SMMUv3 has to be configured in nested(S1+s2)
>>>>>>>> nested (s1+s2)
>>>>>>>>> mode, with Guest owning the S1 page tables. Subsequent patches
>>>>>>>>> will
>>>>>>>> the guest
>>>>>>>>> add support for smmuv3-accel to provide this.
>>>>>>>> Can't this -accel smmu also works with emulated devices? Do we
>>>>>>>> want an
>>>>>>>> exclusive usage?
>>>>>>> Is there any benefit from emulated devices working in the HW-
>>>>>>> accelerated nested translation mode?
>>>>>> Not really but do we have any justification for using different
>>>>>> device
>>>>>> name in accel mode? I am not even sure that accel option is really
>>>>>> needed. Ideally the qemu device should be able to detect it is
>>>>>> protecting a VFIO device, in which case it shall check whether
>>>>>> nested is
>>>>>> supported by host SMMU and then automatically turn accel mode?
>>>>>>
>>>>>> I gave the example of the vfio device which has different class
>>>>>> implementation depending on the iommufd option being set or not.
>>>>> Do you mean that we should just create a regular smmuv3 device and
>>>>> let a VFIO device to turn on this smmuv3's accel mode depending on
>>>>> its LEGACY/IOMMUFD class?
>>>> no this is not what I meant. I gave an example where depending on an
>>>> option passed to the VFIO device you choose one class implement or
>>>> the
>>>> other.
>>> Option means something like this:
>>>     -device smmuv3,accel=on
>>> instead of
>>>     -device "smmuv3-accel"
>>> ?
>>>
>>> Yea, I think that's good.
>> Yeah actually that's a big debate for not much. From an implementation
>> pov that shall not change much. The only doubt I have is if we need to
>> conditionally expose the MSI RESV regions it is easier to do if we
>> detect we have a smmuv3-accel. what the option allows is the auto mode.
>>>
>>>>> Another question: how does an emulated device work with a vSMMUv3?
>>>> I don't get your question. vSMMUv3 currently only works with emulated
>>>> devices. Did you mean accelerated SMMUv3?
>>> Yea. If "accel=on", how does an emulated device work with that?
>>>
>>>>> I could imagine that all the accel steps would be bypassed since
>>>>> !sdev->idev. Yet, the emulated iotlb should cache its translation
>>>>> so we will need to flush the iotlb, which will increase complexity
>>>>> as the TLBI command dispatching function will need to be aware what
>>>>> ASID is for emulated device and what is for vfio device..
>>>> I don't get the issue. For emulated device you go through the usual
>>>> translate path which indeed caches configs and translations. In
>>>> case the
>>>> guest invalidates something, you know the SID and you find the entries
>>>> in the cache that are tagged by this SID.
>>>>
>>>> In case you have an accelerated device (indeed if sdev->idev) you
>>>> don't
>>>> exercise that path. On invalidation you detect the SID matches a VFIO
>>>> device, propagate the invalidations to the host instead. on the
>>>> invalidation you should be able to detect pretty easily if you need to
>>>> flush the emulated caches or propagate the invalidations. Do I miss
>>>> some
>>>> extra problematic?
>>>>
>>>> I do not say we should support emulated devices and VFIO devices in
>>>> the
>>>> same guest iommu group. But I don't see why we couldn't easily plug
>>>> the
>>>> accelerated logic in the current logic for emulation/vhost and do
>>>> not
>>>> require a different qemu device.
>>> Hmm, feels like I fundamentally misunderstood your point.
>>>   a) We implement the device model with the same piece of code but
>>>      only provide an option "accel=on/off" to switch mode. And both
>>>      passthrough devices and emulated devices can attach to the same
>>>      "accel=on" device.
>> I think we all agree we don't want that use case in general. However
>> effectively I was questioning why it couldn't work maybe at the expense
>> of some perf degradation.
>>>   b) We implement the device model with the same piece of code but
>>>      only provide an option "accel=on/off" to switch mode. Then, an
>>>      passthrough device can attach to an "accel=on" device, but an
>>>      emulated device can only attach to an "accel=off" SMMU device.
>>>
>>> I was thinking that you want case (a). But actually you were just
>>> talking about case (b)? I think (b) is totally fine.
>>>
>>> We certainly can't do case (a): not all TLBI commands gives an "SID"
>>> field (so would have to broadcast, i.e. underlying SMMU HW would run
>>> commands that were supposed for emulated devices only); in case of
>>> vCMDQ, commands for emulated devices would be issued to real HW and
>> I am still confused about that. For instance if the guest sends an
>> NH_ASID, NH_VA invalidation and it happens both the emulated device and
>> VFIO-device share the same cd.asid (same guest iommu domain, which
>> practically should not happen) why shouldn't we propagate the
> it can't ... on ARM ... PCIe only, no shared iommu domain btwn devices.
yeah I agree this generally happens behind a PCIe to PCI bridge.
>
> Isn't this another reason (perf) why emulated devices & physical
> devices should
> be on different vSMMU's ... so it can be distinguished on how deep (to
> hw)
> or how wide(a broadcast) actions like TLBI is implemented, or impacts
> other devices ?
To me the actual issue is vCMDQ. Here we have a blocker. Otherwise, if
you don't have vCMDQ, you can still propagate invalidations using the
proper notifier (VFIO or vhost). This used to work.
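
To make the SID point above concrete, here is a minimal C sketch of which commands can be routed to a single device and which cannot. It assumes the opcode values given in the SMMUv3 architecture specification; the enum names and the cmd_has_sid() helper are hypothetical and are not taken from QEMU or from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode values as assumed from the SMMUv3 architecture specification. */
enum smmu_cmd_opcode {
    CMD_CFGI_STE     = 0x03, /* invalidate one stream table entry */
    CMD_CFGI_CD      = 0x05, /* invalidate one context descriptor */
    CMD_CFGI_CD_ALL  = 0x06, /* invalidate all CDs of one stream  */
    CMD_TLBI_NH_ALL  = 0x10, /* TLB invalidate, all non-hyp       */
    CMD_TLBI_NH_ASID = 0x11, /* TLB invalidate by ASID            */
    CMD_TLBI_NH_VA   = 0x12, /* TLB invalidate by ASID + VA       */
    CMD_TLBI_NH_VAA  = 0x13, /* TLB invalidate by VA, all ASIDs   */
    CMD_ATC_INV      = 0x40, /* ATS cache invalidate              */
};

/*
 * Hypothetical helper: returns true when the command carries a StreamID,
 * so it can be directed at one device. TLBI commands are keyed by
 * ASID/VMID/address only, which is why they cannot be filtered per
 * device and end up reaching the hardware for every device that shares
 * the vSMMU.
 */
static bool cmd_has_sid(uint8_t opcode)
{
    switch (opcode) {
    case CMD_CFGI_STE:
    case CMD_CFGI_CD:
    case CMD_CFGI_CD_ALL:
    case CMD_ATC_INV:
        return true;
    default:
        return false;
    }
}
```

With such a check, a CFGI_STE or ATC_INV can be matched against the device it names, while an NH_ASID or NH_VA invalidation gives no such hint.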

Eric
>
>
>> invalidation to the host. Does the problem come from the usage of vCMDQ
>> or would you foresee the same problem with a generic physical SMMU?
>>
>> Thanks
>>
>> Eric
>>> trigger HW errors.
>>>
>>> Thanks
>>> Nicolin
>>>
>>
>




* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24 14:56             ` Eric Auger
@ 2025-03-24 15:02               ` Daniel P. Berrangé
  2025-03-24 21:43               ` Donald Dutile
  1 sibling, 0 replies; 145+ messages in thread
From: Daniel P. Berrangé @ 2025-03-24 15:02 UTC (permalink / raw)
  To: Eric Auger
  Cc: Donald Dutile, Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org, peter.maydell@linaro.org, jgg@nvidia.com,
	nicolinc@nvidia.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

On Mon, Mar 24, 2025 at 03:56:12PM +0100, Eric Auger wrote:
> 
> 
> On 3/21/25 1:59 AM, Donald Dutile wrote:
> >
> >
> > On 3/19/25 2:21 PM, Eric Auger wrote:
> >> Hi Don,
> >>
> >>
> >> On 3/19/25 5:21 PM, Donald Dutile wrote:
> >>>
> >>>
> >>> On 3/19/25 5:26 AM, Shameerali Kolothum Thodi wrote:
> >>>> Hi Don,
> >>>>
> >>> Hey!
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Donald Dutile <ddutile@redhat.com>
> >>>>> Sent: Tuesday, March 18, 2025 10:12 PM
> >>>>> To: Shameerali Kolothum Thodi
> >>>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> >>>>> qemu-devel@nongnu.org
> >>>>> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> >>>>> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
> >>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
> >>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> >>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> >>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> >>>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
> >>>>> pxb-
> >>>>> pcie bus
> >>>>>
> >>>>> Shameer,
> >>>>>
> >>>>> Hi!
> >>>>>
> >>>>> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> >>>>>> User must associate a pxb-pcie root bus to smmuv3-accel
> >>>>>> and that is set as the primary-bus for the smmu dev.
> >>>>>>
> >>>>>> Signed-off-by: Shameer Kolothum
> >>>>> <shameerali.kolothum.thodi@huawei.com>
> >>>>>> ---
> >>>>>>     hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
> >>>>>>     1 file changed, 19 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> >>>>>> index c327661636..1471b65374 100644
> >>>>>> --- a/hw/arm/smmuv3-accel.c
> >>>>>> +++ b/hw/arm/smmuv3-accel.c
> >>>>>> @@ -9,6 +9,21 @@
> >>>>>>     #include "qemu/osdep.h"
> >>>>>>
> >>>>>>     #include "hw/arm/smmuv3-accel.h"
> >>>>>> +#include "hw/pci/pci_bridge.h"
> >>>>>> +
> >>>>>> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
> >>>>>> +{
> >>>>>> +    DeviceState *d = opaque;
> >>>>>> +
> >>>>>> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
> >>>>>> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
> >>>>>> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus->name)) {
> >>>>>> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> >>>>>> +                                     &error_abort);
> >>>>>> +        }
> >>>>>> +    }
> >>>>>> +    return 0;
> >>>>>> +}
> >>>>>>
> >>>>>>     static void smmu_accel_realize(DeviceState *d, Error **errp)
> >>>>>>     {
> >>>>>> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
> >>>>>>         SysBusDevice *dev = SYS_BUS_DEVICE(d);
> >>>>>>         Error *local_err = NULL;
> >>>>>>
> >>>>>> +    object_child_foreach_recursive(object_get_root(),
> >>>>>> +                                   smmuv3_accel_pxb_pcie_bus, d);
> >>>>>> +
> >>>>>>         object_property_set_bool(OBJECT(dev), "accel", true,
> >>>>>> &error_abort);
> >>>>>>         c->parent_realize(d, &local_err);
> >>>>>>         if (local_err) {
> >>>>>> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass
> >>>>> *klass, void *data)
> >>>>>>         device_class_set_parent_realize(dc, smmu_accel_realize,
> >>>>>>                                         &c->parent_realize);
> >>>>>>         dc->hotpluggable = false;
> >>>>>> +    dc->bus_type = TYPE_PCIE_BUS;
> >>>>>>     }
> >>>>>>
> >>>>>>     static const TypeInfo smmuv3_accel_type_info = {
> >>>>>
> >>>>> I am not seeing the need for a pxb-pcie bus(switch) introduced for
> >>>>> each
> >>>>> 'accel'.
> >>>>> Isn't the IORT able to define different SMMUs for different RIDs?
> >>>>> If so, isn't that sufficient to associate (define) an SMMU<->RID
> >>>>> association without introducing a pxb-pcie?
> >>>>> And again, I'm not sure how that improves/enables the device<->SMMU
> >>>>> associativity?
> >>>>
> >>>> Thanks for taking a look at the series. As discussed elsewhere in
> >>>> this thread (with Eric), normally in the physical world (or at least
> >>>> in the most common cases) SMMUv3 is attached to a PCIe Root Complex,
> >>>> and if you take a look at the IORT spec, it describes the association
> >>>> of ID mappings between an RC node and an SMMUv3 node.
> >>>>
> >>>> And if my understanding is correct, in QEMU, only pxb-pcie allows you
> >>>> to add extra root complexes even though it is still plugged into a
> >>>> parent (pcie.0), i.e., for all devices downstream it acts as a root
> >>>> complex but is still plugged into a parent pcie.0.
> >>>> This allows us to add/describe multiple "smmuv3-accel" devices, each
> >>>> associated with an RC.
> >>>>
> >>> I find the qemu statements a bit unclear here as well.
> >>> I looked at the hot plug statement(s) in docs/pcie.txt, as I figured
> >>> that's where dynamic
> >>> IORT changes would be needed as well.  There, it says you can hot-add
> >>> PCIe devices to RPs,
> >>> one has to define/add RP's to the machine model for that plug-in.
> >>>
> >>> Using libvirt, it could auto-add the needed RPs to do dynamic smmuv3
> >>> additions,
> >> I am not sure I understand your statement here. we don't want "dynamic"
> >> SMMUv3 instantiation. SMMUv3 is a platform device which is supposed to
> >> be coldplugged on a pre-existing PCIe hierarchy. The SMMUv3 device is
> >> not something that is meant to be hotplugged or hotunplugged.
> >> To me we hijack the bus= property to provide information about the IORT
> >> IDMAP
> >>
> > Dynamic in the sense that if one adds smmuv3 for multiple devices,
> > libvirt will dynamically figure out how to instantiate one, two,
> > three... smmu's
> > in the machine at cold boot.
> > If you want a machine to be able to hot-plug a device that would
> > require another smmu, then the config, and smmu, would have to be
> > explicitly stated, as is done today for hot-plug PCIe if the simple
> > machine that libvirt would make is not sufficient to hot-add a PCIe
> > device.
> 
> Hum this will need to be discussed with libvirt guys but I am not sure
> they will be inclined to support such kind of policy, esp because vIOMMU
> is a pretty marginal use case as of now. They do automatic instantiation
> for pcie, usb controllers but I am not sure they will take care of the
> vIOMMU tbh

Honestly I've lost track of what's going on in this thread design-wise.

As a general precedent though, the PCI(e) hierarchies libvirt auto-creates
are very flat - no PXBs for example, no association of host/guest NUMA,
etc. Libvirt does as little as possible in order to get PCI devices
working and anything even slightly "fancy" is left up to the mgmt app
to define. IIRC we have a few scenarios where IOMMUs get auto-added,
but mostly we expect the mgmt app to define them.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24 13:55             ` Shameerali Kolothum Thodi via
@ 2025-03-24 15:34               ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-24 15:34 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Nicolin Chen
  Cc: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
	zhangfei.gao@linaro.org



On 3/24/25 2:55 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: qemu-devel-
>> bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org <qemu-
>> devel-bounces+shameerali.kolothum.thodi=huawei.com@nongnu.org> On
>> Behalf Of Eric Auger
>> Sent: Monday, March 24, 2025 1:13 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen
>> <nicolinc@nvidia.com>
>> Cc: Donald Dutile <ddutile@redhat.com>; qemu-arm@nongnu.org; qemu-
>> devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
>> berrange@redhat.com; nathanc@nvidia.com; mochs@nvidia.com;
>> smostafa@google.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
>> pcie bus
>>
>> Hi Shameer,
>>
>> On 3/24/25 9:19 AM, Shameerali Kolothum Thodi wrote:
>>>> -----Original Message-----
>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>> Sent: Thursday, March 20, 2025 5:03 PM
>>>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>
>>>> Cc: Donald Dutile <ddutile@redhat.com>; qemu-arm@nongnu.org;
>> qemu-
>>>> devel@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
>>>> jgg@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
>> pxb-
>>>> pcie bus
>>>>
>>>> On Wed, Mar 19, 2025 at 09:26:29AM +0000, Shameerali Kolothum Thodi
>>>> wrote:
>>>>> Having said that,  current code only allows pxb-pcie root complexes
>>>> avoiding
>>>>> the pcie.0. The idea behind this was, user can use pcie.0 with a non
>> accel
>>>> SMMUv3
>>>>> for any emulated devices avoiding the performance bottlenecks we are
>>>>> discussing for emulated dev+smmuv3-accel cases. But based on the
>>>> feedback from
>>>>> Eric and Daniel I will relax that restriction and will allow association
>> with
>>>> pcie.0.
>>>>
>>>> Just want a clarification here..
>>>>
>>>> If VM has a passthrough device only:
>>>>  attach it to PCIE.0 <=> vSMMU0 (accel=on)
>>> Yes. Basically support accel=on to pcie.0 as well.
>> agreed we shall be able to instantiate the accelerated SMMU on pcie.0 too.
>>>> If VM has an emulated device and a passthrough device:
>>>>  attach the emulated device to PCIE.0 <=> vSMMU bypass (or accel=off?)
>>>>  attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)
>>> This can be other way around as well:
>>> ie,
>>> pass-through to pcie.0(accel=on) and emulated to any other pxb-pcie with
>> accel = off.
>> +1
>>> I think the way bus numbers are allocated in Qemu for pcie.0 and pxb-
>> pcie allows
>>> us to support this in IORT ID maps.
>> One trouble we may get into is possible bus reordering by the guest. I
>> don't know the details but I remember that in certain conditions the
>> guest can reorder the bus numbers.
> Yeah, Guest kernel can re-enumerate PCIe. I will check.
>  
>> Besides what I don't get in the above discussion, related to whether the
>> accelerated mode can also support emulated devices, is that if you use
>> the originally suggested hierarchy (pxb-pcie + root port + VFIO device)
>> you eventually get on guest side 2 devices protected by the SMMU
>> instance: the root port and the VFIO device. They end up in different
>> iommu groups. So there is already a mix of emulated and VFIO device.
> True. But I guess the root port associated activity(invalidations etc) will be
> very minimal(or nil?) compared to a virtio device.
Agreed. I just meant that discriminating between devices that can bring
trouble and the others may require some caution.

Eric
>
> Thanks,
> Shameer
>
>
>




* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24  8:19         ` Shameerali Kolothum Thodi via
  2025-03-24 13:13           ` Eric Auger
@ 2025-03-24 15:50           ` Nicolin Chen
  1 sibling, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-24 15:50 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

On Mon, Mar 24, 2025 at 08:19:27AM +0000, Shameerali Kolothum Thodi wrote:
> > If VM has an emulated device and a passthrough device:
> >  attach the emulated device to PCIE.0 <=> vSMMU bypass (or accel=off?)
> >  attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)
> 
> This can be other way around as well:
> ie, 
> pass-through to pcie.0(accel=on) and emulated to any other pxb-pcie with accel = off.
> 
> I think the way bus numbers are allocated in Qemu for pcie.0 and pxb-pcie allows
> us to support this in IORT ID maps.

That sounds fine. I picked pcie.0 for emulated devices simply to
keep the pxb-pcie design for multi-vSMMU cases.

I think libvirt could still choose to keep it simple, although
on the QEMU side we have to keep the flexibility.

Thanks
Nicolin



* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24 13:13           ` Eric Auger
  2025-03-24 13:55             ` Shameerali Kolothum Thodi via
@ 2025-03-24 16:01             ` Nicolin Chen
  2025-03-24 16:06               ` Shameerali Kolothum Thodi via
  1 sibling, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-24 16:01 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameerali Kolothum Thodi, Donald Dutile, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org, peter.maydell@linaro.org, jgg@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

On Mon, Mar 24, 2025 at 02:13:20PM +0100, Eric Auger wrote:
> >> If VM has an emulated device and a passthrough device:
> >>  attach the emulated device to PCIE.0 <=> vSMMU bypass (or accel=off?)
> >>  attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)
> > This can be other way around as well:
> > ie, 
> > pass-through to pcie.0(accel=on) and emulated to any other pxb-pcie with accel = off.
> +1
> >
> > I think the way bus numbers are allocated in Qemu for pcie.0 and pxb-pcie allows
> > us to support this in IORT ID maps.
> One trouble we may get into is possible bus reordering by the guest. I
> don't know the details but I remember that in certain conditions the
> guest can reorder the bus numbers.

Hmm, that sounds troublesome. IORT mappings are done using the bus
number, which is fixed to a vSMMU. Can we disable that reordering?
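
To illustrate why bus renumbering matters here, a minimal C sketch of how a requester ID is matched against IORT-style ID mappings. The struct and function names are hypothetical and simplified (this is not QEMU's IORT builder); the bus ranges used in the usage note are assumptions for illustration only. The sketch relies on two general facts: a PCI requester ID is (bus << 8) | devfn, and an IORT ID mapping stores the number of IDs in the range minus one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one IORT ID mapping. */
struct id_map {
    uint32_t input_base; /* first requester ID covered by the range   */
    uint32_t id_count;   /* number of IDs in the range minus one      */
    int      smmu_index; /* which vSMMU the range is routed to        */
};

/* A PCI requester ID is the bus number followed by devfn. */
static uint32_t pci_rid(uint8_t bus, uint8_t devfn)
{
    return ((uint32_t)bus << 8) | devfn;
}

/* Build a mapping covering every devfn on buses min_bus..max_bus. */
static struct id_map map_for_bus_range(uint8_t min_bus, uint8_t max_bus,
                                       int smmu_index)
{
    assert(min_bus <= max_bus);
    struct id_map m = {
        .input_base = pci_rid(min_bus, 0),
        .id_count   = pci_rid(max_bus, 0xff) - pci_rid(min_bus, 0),
        .smmu_index = smmu_index,
    };
    return m;
}

/* Return the vSMMU index that owns rid, or -1 if no mapping covers it. */
static int smmu_for_rid(const struct id_map *maps, int n, uint32_t rid)
{
    for (int i = 0; i < n; i++) {
        if (rid >= maps[i].input_base &&
            rid - maps[i].input_base <= maps[i].id_count) {
            return maps[i].smmu_index;
        }
    }
    return -1;
}
```

For example, assuming one pxb-pcie owns buses 1..7 and another owns buses 8..15, a device at bus 2 resolves to the first vSMMU. If the guest moved that device to bus 9, the same lookup would resolve to the second vSMMU, which is the mismatch being discussed.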

> Besides what I don't get in the above discussion, related to whether the
> accelerated mode can also support emulated devices, is that if you use
> the originally suggested hierarchy (pxb-pcie + root port + VFIO device)
> you eventually get on guest side 2 devices protected by the SMMU
> instance: the root port and the VFIO device. They end up in different
> iommu groups. So there is already a mix of emulated and VFIO device.

Strictly speaking, yes, that's a mix. Maybe we should say emulated
endpoints and passthrough endpoints?

Thanks
Nicolin



* RE: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24 16:01             ` Nicolin Chen
@ 2025-03-24 16:06               ` Shameerali Kolothum Thodi via
  0 siblings, 0 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-24 16:06 UTC (permalink / raw)
  To: Nicolin Chen, Eric Auger
  Cc: Donald Dutile, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, berrange@redhat.com,
	nathanc@nvidia.com, mochs@nvidia.com, smostafa@google.com,
	Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
	zhangfei.gao@linaro.org



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Monday, March 24, 2025 4:02 PM
> To: Eric Auger <eric.auger@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Donald Dutile
> <ddutile@redhat.com>; qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> peter.maydell@linaro.org; jgg@nvidia.com; berrange@redhat.com;
> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-
> pcie bus
> 
> On Mon, Mar 24, 2025 at 02:13:20PM +0100, Eric Auger wrote:
> > >> If VM has an emulated device and a passthrough device:
> > >>  attach the emulated device to PCIE.0 <=> vSMMU bypass (or
> accel=off?)
> > >>  attach the passthrough device to pxb-pcie <=> vSMMU0 (accel=on)
> > > This can be other way around as well:
> > > ie,
> > > pass-through to pcie.0(accel=on) and emulated to any other pxb-pcie
> with accel = off.
> > +1
> > >
> > > I think the way bus numbers are allocated in Qemu for pcie.0 and pxb-
> pcie allows
> > > us to support this in IORT ID maps.
> > One trouble we may get into is possible bus reordering by the guest. I
> > don't know the details but I remember that in certain conditions the
> > guest can reorder the bus numbers.
> 
> Hmm, that sounds troublesome. IORT mappings are done using the bus
> number, which is fixed to a vSMMU. Can we disable that reordering?

DSM #5 is actually a way to do that. But I don't think we need it, as the
host kernel will also have the same issues with IORT if re-enumeration
happens. I think the iommu_fwspec mechanism is meant to take care of this.
I need to double check though.
 
Thanks,
Shameer



* Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  2025-03-24 14:56             ` Eric Auger
  2025-03-24 15:02               ` Daniel P. Berrangé
@ 2025-03-24 21:43               ` Donald Dutile
  1 sibling, 0 replies; 145+ messages in thread
From: Donald Dutile @ 2025-03-24 21:43 UTC (permalink / raw)
  To: eric.auger, Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



On 3/24/25 10:56 AM, Eric Auger wrote:
> 
> 
> On 3/21/25 1:59 AM, Donald Dutile wrote:
>>
>>
>> On 3/19/25 2:21 PM, Eric Auger wrote:
>>> Hi Don,
>>>
>>>
>>> On 3/19/25 5:21 PM, Donald Dutile wrote:
>>>>
>>>>
>>>> On 3/19/25 5:26 AM, Shameerali Kolothum Thodi wrote:
>>>>> Hi Don,
>>>>>
>>>> Hey!
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Donald Dutile <ddutile@redhat.com>
>>>>>> Sent: Tuesday, March 18, 2025 10:12 PM
>>>>>> To: Shameerali Kolothum Thodi
>>>>>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>>>>>> qemu-devel@nongnu.org
>>>>>> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>>>>>> nicolinc@nvidia.com; berrange@redhat.com; nathanc@nvidia.com;
>>>>>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>>>>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>>>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>>>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>>>>> Subject: Re: [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a
>>>>>> pxb-
>>>>>> pcie bus
>>>>>>
>>>>>> Shameer,
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> On 3/11/25 10:10 AM, Shameer Kolothum wrote:
>>>>>>> User must associate a pxb-pcie root bus to smmuv3-accel
>>>>>>> and that is set as the primary-bus for the smmu dev.
>>>>>>>
>>>>>>> Signed-off-by: Shameer Kolothum
>>>>>> <shameerali.kolothum.thodi@huawei.com>
>>>>>>> ---
>>>>>>>      hw/arm/smmuv3-accel.c | 19 +++++++++++++++++++
>>>>>>>      1 file changed, 19 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
>>>>>>> index c327661636..1471b65374 100644
>>>>>>> --- a/hw/arm/smmuv3-accel.c
>>>>>>> +++ b/hw/arm/smmuv3-accel.c
>>>>>>> @@ -9,6 +9,21 @@
>>>>>>>      #include "qemu/osdep.h"
>>>>>>>
>>>>>>>      #include "hw/arm/smmuv3-accel.h"
>>>>>>> +#include "hw/pci/pci_bridge.h"
>>>>>>> +
>>>>>>> +static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
>>>>>>> +{
>>>>>>> +    DeviceState *d = opaque;
>>>>>>> +
>>>>>>> +    if (object_dynamic_cast(obj, "pxb-pcie-bus")) {
>>>>>>> +        PCIBus *bus = PCI_HOST_BRIDGE(obj->parent)->bus;
>>>>>>> +        if (d->parent_bus && !strcmp(bus->qbus.name, d->parent_bus->name)) {
>>>>>>> +            object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
>>>>>>> +                                     &error_abort);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>>
>>>>>>>      static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>>>>>      {
>>>>>>> @@ -17,6 +32,9 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>>>>>>>          SysBusDevice *dev = SYS_BUS_DEVICE(d);
>>>>>>>          Error *local_err = NULL;
>>>>>>>
>>>>>>> +    object_child_foreach_recursive(object_get_root(),
>>>>>>> +                                   smmuv3_accel_pxb_pcie_bus, d);
>>>>>>> +
>>>>>>>          object_property_set_bool(OBJECT(dev), "accel", true,
>>>>>>> &error_abort);
>>>>>>>          c->parent_realize(d, &local_err);
>>>>>>>          if (local_err) {
>>>>>>> @@ -33,6 +51,7 @@ static void smmuv3_accel_class_init(ObjectClass
>>>>>> *klass, void *data)
>>>>>>>          device_class_set_parent_realize(dc, smmu_accel_realize,
>>>>>>>                                          &c->parent_realize);
>>>>>>>          dc->hotpluggable = false;
>>>>>>> +    dc->bus_type = TYPE_PCIE_BUS;
>>>>>>>      }
>>>>>>>
>>>>>>>      static const TypeInfo smmuv3_accel_type_info = {
>>>>>>
>>>>>> I am not seeing the need for a pxb-pcie bus(switch) introduced for
>>>>>> each
>>>>>> 'accel'.
>>>>>> Isn't the IORT able to define different SMMUs for different RIDs?
>>>>>> If so, isn't that sufficient to associate (define) an SMMU<->RID
>>>>>> association without introducing a pxb-pcie?
>>>>>> And again, I'm not sure how that improves/enables the device<->SMMU
>>>>>> associativity?
>>>>>
>>>>> Thanks for taking a look at the series. As discussed elsewhere in
>>>>> this thread (with Eric), normally in the physical world (or at least
>>>>> in the most common cases) SMMUv3 is attached to a PCIe Root Complex,
>>>>> and if you take a look at the IORT spec, it describes the association
>>>>> of ID mappings between an RC node and an SMMUv3 node.
>>>>>
>>>>> And if my understanding is correct, in QEMU, only pxb-pcie allows you
>>>>> to add extra root complexes even though it is still plugged into a
>>>>> parent (pcie.0), i.e., for all devices downstream it acts as a root
>>>>> complex but is still plugged into a parent pcie.0.
>>>>> This allows us to add/describe multiple "smmuv3-accel" devices, each
>>>>> associated with an RC.
>>>>>
>>>> I find the qemu statements a bit unclear here as well.
>>>> I looked at the hot plug statement(s) in docs/pcie.txt, as I figured
>>>> that's where dynamic
>>>> IORT changes would be needed as well.  There, it says you can hot-add
>>>> PCIe devices to RPs,
>>>> one has to define/add RP's to the machine model for that plug-in.
>>>>
>>>> Using libvirt, it could auto-add the needed RPs to do dynamic smmuv3
>>>> additions,
>>> I am not sure I understand your statement here. we don't want "dynamic"
>>> SMMUv3 instantiation. SMMUv3 is a platform device which is supposed to
>>> be coldplugged on a pre-existing PCIe hierarchy. The SMMUv3 device is
>>> not something that is meant to be hotplugged or hotunplugged.
>>> To me we hijack the bus= property to provide information about the IORT
>>> IDMAP
>>>
>> Dynamic in the sense that if one adds smmuv3 for multiple devices,
>> libvirt will dynamically figure out how to instantiate one, two,
>> three... smmu's
>> in the machine at cold boot.
>> If you want a machine to be able to hot-plug a device that would
>> require another smmu, then the config, and smmu, would have to be
>> explicitly stated, as is done today for hot-plug PCIe if the simple
>> machine that libvirt would make is not sufficient to hot-add a PCIe
>> device.
> 
> Hum this will need to be discussed with libvirt guys but I am not sure
> they will be inclined to support such kind of policy, esp because vIOMMU
> is a pretty marginal use case as of now. They do automatic instantiation
> for pcie, usb controllers but I am not sure they will take care of the
> vIOMMU tbh
> 
> Eric

A discussion w/libvirt developers would be prudent.
I don't think it's that complicated, and there are lots of parallels to
device-assigned PCIe devices & virtio devices, though possibly for
different reasons: for PCI(e) assigned devices, one needs to add
(hw-centric) RPs and PCIe buses; virtio devices can share a(n emulated)
PCI bus.

For smmu: assigned devices can be attached to an smmu, to which libvirt
can add accel=auto, on a separate smmu from the one used by virtio devices
(which share that smmu). Each assigned device can have a unique smmu-id,
like assigned PCI(e) devices have a unique pci-id (PCIe is
one-device/one-bus, of course), but the assigned devices may actually use
the same smmu (physically, even though virtually defined as separate).
If we end up with too many smmu's b/c we have successfully exploited their
advanced features for assigned devices in guests, I'll consider that a
win! ;-)
Seriously, I'm sure we can figure out how to improve the libvirt
smmu/iommu assignment of (assigned) devices to (virtual) smmu's/iommu's
with a bit more hw-tree lookup code.
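
As a rough idea of what such lookup code would compute, here is a minimal C sketch that groups assigned devices by the host SMMU they sit behind and counts how many accelerated vSMMUs that would call for. The struct, the function name and the host SMMU labels in the usage note are hypothetical placeholders; how a management layer actually discovers the device-to-SMMU relationship is not shown and is an assumption left open here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one assigned device and the host SMMU behind it. */
struct assigned_dev {
    const char *host_bdf;  /* host PCI address of the assigned device */
    const char *host_smmu; /* placeholder label for the host SMMU     */
};

/*
 * Count the distinct host SMMUs referenced by the device list.
 * A simple O(n^2) scan is enough for the handful of devices a guest
 * typically has assigned.
 */
static size_t count_host_smmus(const struct assigned_dev *devs, size_t n)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        /* Has an earlier device already named this host SMMU? */
        for (size_t j = 0; j < i; j++) {
            if (strcmp(devs[i].host_smmu, devs[j].host_smmu) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            distinct++;
        }
    }
    return distinct;
}
```

With the three VFs from the cover letter's example (two under 0000:7d and one under 0000:75, labelled here as sitting behind "smmu-a" and "smmu-b" respectively), the count would be two, matching the two arm-smmuv3-accel instances used there.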

I look forward to the discussion with the libvirt developers.

>>
>>> Thanks
>>>
>>> Eric
>>>> if I understand how libvirt does that today for pcie devices now (/me
>>>> looks at danpb for feedback).
>>>>
>>>>> Having said that,  current code only allows pxb-pcie root complexes
>>>>> avoiding
>>>>> the pcie.0. The idea behind this was, user can use pcie.0 with a non
>>>>> accel SMMUv3
>>>>> for any emulated devices avoiding the performance bottlenecks we are
>>>>> discussing for emulated dev+smmuv3-accel cases. But based on the
>>>>> feedback from
>>>>> Eric and Daniel I will relax that restriction and will allow
>>>>> association with pcie.0.
>>>>>
>>>> So, I think this isn't a restriction that this smmuv3 feature should
>>>> enforce;
>>>> lack of a proper RP or pxb-pcie will yield an invalid config
>>>> issue/error, and
>>>> the machine definition will be modified to meet the needs for IORT.
>>>>
>>>>> Thanks,
>>>>> Shameer
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   
>>>>>>>> to root complexes.
>>>>>> Feel free to enlighten me where I may have mis-read/interpreted the
>>>>>> IORT
>>>>>> & SMMUv3 specs.
>>>>>>
>>>>>> Thanks,
>>>>>> - Don
>>>>>>
>>>>>
>>>>
>>>
>>
> 




* Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
                   ` (20 preceding siblings ...)
  2025-03-19 16:40 ` [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Philippe Mathieu-Daudé
@ 2025-03-25 14:42 ` Eric Auger
  2025-03-25 15:43   ` Shameerali Kolothum Thodi via
  21 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-25 14:42 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Hi All,
>
> This patch series introduces initial support for a user-creatable
> accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.
>
> Why this is needed:
>
> Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
> machine and does not support configuring the host SMMUv3 in nested
> mode. This limitation prevents its use with vfio-pci passthrough
> devices.
>
> The new pluggable smmuv3-accel device enables host SMMUv3 configuration
> with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
> host) via the new IOMMUFD APIs. Additionally, it allows multiple 
> accelerated vSMMUv3 instances for guests running on hosts with multiple
> physical SMMUv3s.
>
> This will benefit in:
> -Reduced invalidation broadcasts and lookups for devices behind multiple
>  physical SMMUv3s.
> -Simplifies handling of host SMMUv3s with differing feature sets.
> -Lays the groundwork for additional capabilities like vCMDQ support.
>
>
> Changes from RFCv1[0]:
>
> Thanks to everyone who provided feedback on RFCv1!
>
> –The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
>  to better reflect its role in using the host's physical SMMUv3 for page
>  table setup and cache invalidations.
> -Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
> -Merges patches from Nicolin’s GitHub repository that add accelerated
>  functionality for page table setup and cache invalidations[1]. I have
>  modified these a bit, but hopefully have not broken anything.
> -Incorporates various fixes and improvements based on RFCv1 feedback.
> –Adds support for vfio-pci hotplug with smmuv3-accel.
>
> Note: IORT RMR patches for MSI setup are currently excluded as we may
> adopt a different approach for MSI handling in the future [2].
>
> Also this has dependency on the common iommufd/vfio patches from
> Zhenzhong's series here[3]
>
> ToDos:
>
> –At least one vfio-pci device must currently be cold-plugged to a
>  pxb-pcie bus associated with the arm-smmuv3-accel. This is required both
>  to associate a vSMMUv3 with a host SMMUv3 and also needed to
>  retrieve the host SMMUv3 IDR registers for guest export.
>  Future updates will remove this restriction by adding the
>  necessary kernel support.
>  Please find the discussion here[4]
> -This version does not yet support host SMMUv3 fault handling or
>  other event notifications. These will be addressed in a
>  future patch series.
>
>
> The complete branch can be found here:
> https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
>
> I have done basic sanity testing on a Hisilicon Platform using the kernel
> branch here:
> https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
>
> Usage Eg:
>
> On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different host SMMUv3s. So for a
> Guest, specify two arm-smmuv3-accel devices each behind a pxb-pcie as below,
>
>
> ./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
> -cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
> -bios QEMU_EFI.fd \
> -object iommufd,id=iommufd0 \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.1 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.2 \
> -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
> -kernel Image \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> -net none \
> -nographic
>
> Guest will boot with two SMMUv3s,
> ...
> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>
> With a pci topology like below,
>
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>  |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>  |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>  |           \-03.0  Virtio: Virtio filesystem
>  +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)

For the record I tested the series with host VFIO device and a
virtio-blk-pci device put behind the same pxb-pcie/smmu protection and
it works just fine

-+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
 |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0-[01]--
             +-01.1-[02]--
             \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge

This shows that, without the vcmdq feature, nothing blocks the same
smmu device from protecting both accelerated and emulated devices.

Thanks

Eric
>
> Further tests are always welcome.
>
> Please take a look and let me know your feedback!
>
> Thanks,
> Shameer
>
> [0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
> [1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
> [2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
> [3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
> [4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/
>
> Nicolin Chen (11):
>   backends/iommufd: Introduce iommufd_backend_alloc_viommu
>   backends/iommufd: Introduce iommufd_vdev_alloc
>   hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
>   hw/arm/smmuv3-accel: Support nested STE install/uninstall support
>   hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
>   hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed
>   hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache
>     invalidations
>   hw/arm/smmuv3: Forward invalidation commands to hw
>   hw/arm/smmuv3-accel: Read host SMMUv3 device info
>   hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
>   hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3
>
> Shameer Kolothum (9):
>   hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel
>     device
>   hw/arm/virt: Add support for smmuv3-accel
>   hw/arm/smmuv3-accel: Associate a pxb-pcie bus
>   hw/arm/smmu-common: Factor out common helper functions and export
>   hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
>   hw/arm/smmuv3-accel: Provide get_address_space callback
>   hw/arm/smmuv3: Install nested ste for CFGI_STE
>   hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
>   hw/arm/smmuv3-accel: Enable smmuv3-accel creation
>
>  backends/iommufd.c            |  51 +++
>  backends/trace-events         |   2 +
>  hw/arm/Kconfig                |   5 +
>  hw/arm/meson.build            |   1 +
>  hw/arm/smmu-common.c          |  95 +++++-
>  hw/arm/smmuv3-accel.c         | 616 ++++++++++++++++++++++++++++++++++
>  hw/arm/smmuv3-internal.h      |  54 +++
>  hw/arm/smmuv3.c               |  80 ++++-
>  hw/arm/trace-events           |   6 +
>  hw/arm/virt-acpi-build.c      | 113 ++++++-
>  hw/arm/virt.c                 |  12 +
>  hw/core/sysbus-fdt.c          |   1 +
>  include/hw/arm/smmu-common.h  |  14 +
>  include/hw/arm/smmuv3-accel.h |  75 +++++
>  include/hw/arm/virt.h         |   1 +
>  include/system/iommufd.h      |  14 +
>  16 files changed, 1101 insertions(+), 39 deletions(-)
>  create mode 100644 hw/arm/smmuv3-accel.c
>  create mode 100644 include/hw/arm/smmuv3-accel.h
>




* RE: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-03-25 14:42 ` Eric Auger
@ 2025-03-25 15:43   ` Shameerali Kolothum Thodi via
  2025-03-25 18:26     ` Nicolin Chen via
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-25 15:43 UTC (permalink / raw)
  To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
  Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
	ddutile@redhat.com, berrange@redhat.com, nathanc@nvidia.com,
	mochs@nvidia.com, smostafa@google.com, Linuxarm, Wangzhou (B),
	jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org

Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Tuesday, March 25, 2025 2:43 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-
> creatable accelerated SMMUv3
> 


> For the record I tested the series with host VFIO device and a
> virtio-blk-pci device put behind the same pxb-pcie/smmu protection and
> it works just fine
> 
> -+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
>  |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
>  \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>              +-01.0-[01]--
>              +-01.1-[02]--
>              \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
> 
> This shows that without vcmdq feature there is no blocker having the
> same smmu device protecting both accelerated and emulated devices.

Thanks for giving it a spin. Yes, it currently supports the above. 

At the moment we are not using the IOTLB for the emulated dev in a
config like the above. Have you checked performance for either the
emulated or the vfio dev with this config? The light tests I have done
show performance degradation for the emulated dev compared to the default
SMMUv3 (iommu=smmuv3).

And if the emulated dev issues _TLBI_NH_ASID, the code currently will propagate
that down to host SMMUv3. This will affect the vfio dev as well.

So the question is whether we want to allow this (assuming the user is
educated) or block such a config, since the user has the option of using
a non-accel smmuv3 for emulated devices.

Thanks,
Shameer




* Re: [RFC PATCH v2 10/20] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-03-11 14:10 ` [RFC PATCH v2 10/20] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
@ 2025-03-25 18:08   ` Eric Auger
  2025-03-25 19:33     ` Nicolin Chen
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-25 18:08 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Allocates a s1 HWPT for the Guest s1 stage and attaches that
> to the dev. This will be invoked in a subsequent patch when
> Guest issues SMMU_CMD_CFGI_STE.
CMD_CFGI_STE ...
or CMD_CFGI_STE_RANGE
>
> While at it, we are also exporting both smmu_find_ste() and
> smmuv3_flush_config() from smmuv3.c for use here.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c         | 111 ++++++++++++++++++++++++++++++++++
>  hw/arm/smmuv3-internal.h      |  13 ++++
>  hw/arm/smmuv3.c               |   5 +-
>  hw/arm/trace-events           |   1 +
>  include/hw/arm/smmuv3-accel.h |   6 ++
>  5 files changed, 133 insertions(+), 3 deletions(-)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 1c696649d5..d3a5cf9551 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -13,6 +13,8 @@
>  #include "hw/arm/smmuv3-accel.h"
>  #include "hw/pci/pci_bridge.h"
>  
> +#include "smmuv3-internal.h"
> +
>  static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>                                                  PCIBus *bus, int devfn)
>  {
> @@ -32,6 +34,115 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>      return accel_dev;
>  }
>  
> +static void
> +smmuv3_accel_dev_uninstall_nested_ste(SMMUv3AccelDevice *accel_dev, bool abort)
> +{
> +    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
> +    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
> +    uint32_t hwpt_id;
> +
> +    if (!s1_hwpt || !accel_dev->viommu) {
> +        return;
> +    }
> +
> +    if (abort) {
> +        hwpt_id = accel_dev->viommu->abort_hwpt_id;
> +    } else {
> +        hwpt_id = accel_dev->viommu->bypass_hwpt_id;
> +    }
> +
> +    host_iommu_device_iommufd_attach_hwpt(idev, hwpt_id, &error_abort);
> +    iommufd_backend_free_id(s1_hwpt->iommufd, s1_hwpt->hwpt_id);
> +    accel_dev->s1_hwpt = NULL;
> +    g_free(s1_hwpt);
> +}
> +
> +static int
> +smmuv3_accel_dev_install_nested_ste(SMMUv3AccelDevice *accel_dev,
> +                                    uint32_t data_type, uint32_t data_len,
> +                                    void *data)
> +{
> +    SMMUViommu *viommu = accel_dev->viommu;
> +    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
> +    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
> +
> +    if (!idev || !viommu) {
> +        return -ENOENT;
> +    }
> +
> +    if (s1_hwpt) {
> +        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, false);
why do you choose abort = false?
> +    }
> +
> +    s1_hwpt = g_new0(SMMUS1Hwpt, 1);
> +    if (!s1_hwpt) {
no need to test the result. From the GLib documentation:

"If any call to allocate memory using functions g_new(), g_new0(),
g_renew(), g_malloc(), g_malloc0(), g_malloc0_n(), g_realloc() and
g_realloc_n() fails, the application is terminated. This also means
that there is no need to check if the call succeeded."

https://docs.gtk.org/glib/memory.html#title
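
To illustrate the point, here is a minimal standalone sketch of the
"allocate or terminate" contract. It does not use GLib: demo_new0() is a
hypothetical stand-in built on calloc()/abort(), and DemoS1Hwpt only
mirrors the shape of SMMUS1Hwpt from the patch. With such an allocator the
caller has no NULL check and no -ENOMEM path.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the SMMUS1Hwpt struct in the patch. */
typedef struct DemoS1Hwpt {
    void *iommufd;
    unsigned int hwpt_id;
} DemoS1Hwpt;

/*
 * Hypothetical helper that mimics the documented g_new0() contract:
 * return zeroed memory or terminate the process. This is not GLib code.
 */
static void *demo_new0(size_t n, size_t size)
{
    void *p = calloc(n, size);

    if (!p) {
        fprintf(stderr, "demo_new0: out of memory\n");
        abort();
    }
    return p;
}

/* The caller never sees NULL, so there is nothing to test here. */
static DemoS1Hwpt *demo_alloc_s1_hwpt(unsigned int hwpt_id)
{
    DemoS1Hwpt *h = demo_new0(1, sizeof(*h));

    h->hwpt_id = hwpt_id;
    return h;
}
```

In the patch this would amount to dropping the "if (!s1_hwpt)" branch
after g_new0().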

> +        return -ENOMEM;
> +    }
> +
> +    s1_hwpt->iommufd = idev->iommufd;
> +    iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
> +                               viommu->core.viommu_id, 0, data_type, data_len,
> +                               data, &s1_hwpt->hwpt_id, &error_abort);
> +    host_iommu_device_iommufd_attach_hwpt(idev, s1_hwpt->hwpt_id, &error_abort);
> +    accel_dev->s1_hwpt = s1_hwpt;
> +    return 0;
> +}
> +
> +void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
> +{
> +    SMMUv3AccelDevice *accel_dev;
> +    SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
> +                           .inval_ste_allowed = true};
> +    struct iommu_hwpt_arm_smmuv3 nested_data = {};
> +    SMMUv3State *s = sdev->smmu;
> +    SMMUState *bs = &s->smmu_state;
> +    uint32_t config;
> +    STE ste;
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return;
> +    }
> +
> +    accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +    if (!accel_dev->viommu) {
> +        return;
> +    }
> +
> +    ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
> +    if (ret) {
> +        /*
> +         * For a 2-level Stream Table, the level-2 table might not be ready
> +         * until the device gets inserted to the stream table. Ignore this.
> +         */
I am confused by the above comment. Can you please describe the
circumstances in which this happens and why this should be an error?
> +        return;
> +    }
> +
> +    config = STE_CONFIG(&ste);
> +    if (!STE_VALID(&ste) || !STE_CFG_S1_ENABLED(config)) {
You fully bypass the logic of smmuv3_get_config/decode_config. Couldn't
you reuse it? Originally we used the s1ctxptr and disabled/bypassed info.
> +        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, STE_CFG_ABORT(config));
> +        smmuv3_flush_config(sdev);
> +        return;
> +    }
> +
> +    nested_data.ste[0] = (uint64_t)ste.word[0] | (uint64_t)ste.word[1] << 32;
> +    nested_data.ste[1] = (uint64_t)ste.word[2] | (uint64_t)ste.word[3] << 32;
> +    /* V | CONFIG | S1FMT | S1CTXPTR | S1CDMAX */
> +    nested_data.ste[0] &= 0xf80fffffffffffffULL;
> +    /* S1DSS | S1CIR | S1COR | S1CSH | S1STALLD | EATS */
> +    nested_data.ste[1] &= 0x380000ffULL;
> +    ret = smmuv3_accel_dev_install_nested_ste(accel_dev,
> +                                              IOMMU_HWPT_DATA_ARM_SMMUV3,
> +                                              sizeof(nested_data),
> +                                              &nested_data);
> +    if (ret) {
> +        error_report("Unable to install nested STE=%16LX:%16LX, ret=%d",
> +                     nested_data.ste[1], nested_data.ste[0], ret);
also print the SID
> +    }
> +    trace_smmuv3_accel_install_nested_ste(sid, nested_data.ste[1],
> +                                          nested_data.ste[0]);
> +}
> +
>  static bool
>  smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
>                                 HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index b6b7399347..46c8bcae14 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -24,6 +24,8 @@
>  #include "hw/registerfields.h"
>  #include "hw/arm/smmu-common.h"
>  
> +#include CONFIG_DEVICES
> +
>  typedef enum SMMUTranslationStatus {
>      SMMU_TRANS_DISABLE,
>      SMMU_TRANS_ABORT,
> @@ -547,6 +549,17 @@ typedef struct CD {
>      uint32_t word[16];
>  } CD;
>  
> +int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
> +                  SMMUEventInfo *event);
> +void smmuv3_flush_config(SMMUDevice *sdev);
> +
> +#if defined(CONFIG_ARM_SMMUV3_ACCEL) && defined(CONFIG_IOMMUFD)
> +void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid);
> +#else
> +static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
> +{
> +}
> +#endif
>  /* STE fields */
>  
>  #define STE_VALID(x)   extract32((x)->word[0], 0, 1)
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index b49a59b64c..ea63731d61 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -628,8 +628,7 @@ bad_ste:
>   * Supports linear and 2-level stream table
>   * Return 0 on success, -EINVAL otherwise
>   */
> -static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
> -                         SMMUEventInfo *event)
> +int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste, SMMUEventInfo *event)
>  {
>      dma_addr_t addr, strtab_base;
>      uint32_t log2size;
> @@ -898,7 +897,7 @@ static SMMUTransCfg *smmuv3_get_config(SMMUDevice *sdev, SMMUEventInfo *event)
>      return cfg;
>  }
>  
> -static void smmuv3_flush_config(SMMUDevice *sdev)
> +void smmuv3_flush_config(SMMUDevice *sdev)
>  {
>      SMMUv3State *s = sdev->smmu;
>      SMMUState *bc = &s->smmu_state;
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index 17960794bf..cd2eac31c2 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -61,6 +61,7 @@ smmu_reset_exit(void) ""
>  #smmuv3-accel.c
>  smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
>  smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
> +smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
>  
>  # strongarm.c
>  strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index aca6838dca..d6b0b1ca30 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -35,9 +35,15 @@ typedef struct SMMUViommu {
>      QLIST_ENTRY(SMMUViommu) next;
>  } SMMUViommu;
>  
> +typedef struct SMMUS1Hwpt {
> +    IOMMUFDBackend *iommufd;
> +    uint32_t hwpt_id;
> +} SMMUS1Hwpt;
> +
>  typedef struct SMMUv3AccelDevice {
>      SMMUDevice  sdev;
>      HostIOMMUDeviceIOMMUFD *idev;
> +    SMMUS1Hwpt  *s1_hwpt;
>      SMMUViommu *viommu;
>      QLIST_ENTRY(SMMUv3AccelDevice) next;
>  } SMMUv3AccelDevice;
Thanks

Eric
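
For readers following the review, the packing and masking done in
smmuv3_accel_install_nested_ste() can be sketched standalone as below.
The two mask values are copied from the patch; DemoSTE and
demo_pack_nested_ste() are hypothetical names used only for this
illustration, and the sketch makes no claim about which bits belong to
which STE field beyond the comments in the patch itself.

```c
#include <stdint.h>

/* Mask values copied from the patch under review. */
#define DEMO_NESTED_STE0_MASK 0xf80fffffffffffffULL
#define DEMO_NESTED_STE1_MASK 0x380000ffULL

/* Hypothetical stand-in for QEMU's STE type (array of 32-bit words). */
typedef struct DemoSTE {
    uint32_t word[16];
} DemoSTE;

/*
 * Combine the first four 32-bit STE words into two 64-bit values
 * (low word in bits [31:0], high word in bits [63:32]) and keep only
 * the bits selected by the masks from the patch.
 */
static void demo_pack_nested_ste(const DemoSTE *ste, uint64_t out[2])
{
    out[0] = (uint64_t)ste->word[0] | (uint64_t)ste->word[1] << 32;
    out[1] = (uint64_t)ste->word[2] | (uint64_t)ste->word[3] << 32;

    out[0] &= DEMO_NESTED_STE0_MASK;
    out[1] &= DEMO_NESTED_STE1_MASK;
}
```

Feeding all-ones words into this helper returns exactly the two masks,
which is a quick way to check that the word order and shifts are as
intended.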




* Re: [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  2025-03-11 14:10 ` [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
  2025-03-18 23:30   ` Donald Dutile
@ 2025-03-25 18:13   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-25 18:13 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Allocate and associate a vDEVICE object for the Guest device
> with the vIOMMU. This will help the kernel to do the
> vSID --> sid translation whenever required (eg: device specific
s/sid/SID
> invalidations).
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c         | 22 ++++++++++++++++++++++
>  include/hw/arm/smmuv3-accel.h |  6 ++++++
>  2 files changed, 28 insertions(+)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index d3a5cf9551..056bd23b2e 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -109,6 +109,20 @@ void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>          return;
>      }
>  
> +    if (!accel_dev->vdev && accel_dev->idev) {
> +        SMMUVdev *vdev;
> +        uint32_t vdev_id;
> +        SMMUViommu *viommu = accel_dev->viommu;
> +
> +        iommufd_backend_alloc_vdev(viommu->core.iommufd, accel_dev->idev->devid,
> +                                   viommu->core.viommu_id, sid, &vdev_id,
> +                                   &error_abort);
> +        vdev = g_new0(SMMUVdev, 1);
nit: no need to use 0 variant if you set all fields
> +        vdev->vdev_id = vdev_id;
> +        vdev->sid = sid;
> +        accel_dev->vdev = vdev;
> +    }
> +
>      ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
>      if (ret) {
>          /*
> @@ -283,6 +297,7 @@ static bool smmuv3_accel_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
>  static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
>                                              int devfn)
>  {
> +    SMMUVdev *vdev;
>      SMMUDevice *sdev;
>      SMMUv3AccelDevice *accel_dev;
>      SMMUViommu *viommu;
> @@ -312,6 +327,13 @@ static void smmuv3_accel_unset_iommu_device(PCIBus *bus, void *opaque,
>      trace_smmuv3_accel_unset_iommu_device(devfn, smmu_get_sid(sdev));
>  
>      viommu = s_accel->viommu;
> +    vdev = accel_dev->vdev;
> +    if (vdev) {
> +        iommufd_backend_free_id(viommu->iommufd, vdev->vdev_id);
> +        g_free(vdev);
> +        accel_dev->vdev = NULL;
> +    }
> +
>      if (QLIST_EMPTY(&viommu->device_list)) {
>          iommufd_backend_free_id(viommu->iommufd, viommu->bypass_hwpt_id);
>          iommufd_backend_free_id(viommu->iommufd, viommu->abort_hwpt_id);
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index d6b0b1ca30..54b217ab4f 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -35,6 +35,11 @@ typedef struct SMMUViommu {
>      QLIST_ENTRY(SMMUViommu) next;
>  } SMMUViommu;
>  
> +typedef struct SMMUVdev {
> +    uint32_t vdev_id;
> +    uint32_t sid;
> +} SMMUVdev;
> +
>  typedef struct SMMUS1Hwpt {
>      IOMMUFDBackend *iommufd;
>      uint32_t hwpt_id;
> @@ -45,6 +50,7 @@ typedef struct SMMUv3AccelDevice {
>      HostIOMMUDeviceIOMMUFD *idev;
>      SMMUS1Hwpt  *s1_hwpt;
>      SMMUViommu *viommu;
> +    SMMUVdev   *vdev;
>      QLIST_ENTRY(SMMUv3AccelDevice) next;
>  } SMMUv3AccelDevice;
>  




* Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-03-25 15:43   ` Shameerali Kolothum Thodi via
@ 2025-03-25 18:26     ` Nicolin Chen via
  2025-03-25 18:52       ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Nicolin Chen via @ 2025-03-25 18:26 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

On Tue, Mar 25, 2025 at 03:43:29PM +0000, Shameerali Kolothum Thodi wrote:
> > For the record I tested the series with host VFIO device and a
> > virtio-blk-pci device put behind the same pxb-pcie/smmu protection and
> > it works just fine
> > 
> > -+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
> >  |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
> >  \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
> >              +-01.0-[01]--
> >              +-01.1-[02]--
> >              \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
> > 
> > This shows that without vcmdq feature there is no blocker having the
> > same smmu device protecting both accelerated and emulated devices.
> 
> Thanks for giving it a spin. Yes, it currently supports the above. 
> 
> At the moment we are not using the IOTLB for the emulated dev for a
> config like above.  Have you checked performance for either emulated or
> vfio dev with the above config? Whatever light tests I have done it shows
> performance degradation for emulated dev compared to the default
> SMMUv3(iommu=smmuv3). 
> 
> And if the emulated dev issues _TLBI_NH_ASID, the code currently will propagate
> that down to host SMMUv3. This will affect the vfio dev as well.

VA too. Only commands with an SID field can be simply excluded.
I think we should be concerned that the underlying SMMU CMDQ HW
has very limited command execution capacity, so wasting command
cycles is not ideal, as it could impact the host OS (and other
VMs too).

Thanks
Nicolin
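
A rough sketch of the filtering problem described above, under stated
assumptions: the DemoCmdType values are illustrative classes, not the
SMMUv3 opcode encoding, and all demo_* names are hypothetical. It only
shows why SID-carrying commands can be dropped for emulated devices
while ASID/VA-keyed invalidations cannot be attributed to a device and
so end up forwarded to the host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative command classes; NOT the SMMUv3 opcode values. */
typedef enum {
    DEMO_CMD_CFGI_STE,      /* carries an SID */
    DEMO_CMD_ATC_INV,       /* carries an SID */
    DEMO_CMD_TLBI_NH_ASID,  /* keyed by ASID, no SID */
    DEMO_CMD_TLBI_NH_VA     /* keyed by address/ASID, no SID */
} DemoCmdType;

typedef struct DemoCmd {
    DemoCmdType type;
    uint32_t sid;           /* meaningful only for SID-carrying commands */
} DemoCmd;

static bool demo_cmd_has_sid(DemoCmdType type)
{
    return type == DEMO_CMD_CFGI_STE || type == DEMO_CMD_ATC_INV;
}

static bool demo_sid_is_accel(const uint32_t *accel_sids, size_t n,
                              uint32_t sid)
{
    for (size_t i = 0; i < n; i++) {
        if (accel_sids[i] == sid) {
            return true;
        }
    }
    return false;
}

/* Decide whether a guest command is forwarded to the host SMMUv3. */
static bool demo_should_forward(const DemoCmd *cmd,
                                const uint32_t *accel_sids, size_t n)
{
    if (demo_cmd_has_sid(cmd->type)) {
        /* SID known: forward only for accelerated (vfio) devices. */
        return demo_sid_is_accel(accel_sids, n, cmd->sid);
    }
    /* No SID: the issuing device is unknown, so forward conservatively. */
    return true;
}
```

With a mixed setup, every ASID- or VA-based invalidation issued on behalf
of an emulated device takes the last branch and consumes host command
queue cycles.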



* Re: [RFC PATCH v2 12/20] hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed
  2025-03-11 14:10 ` [RFC PATCH v2 12/20] hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed Shameer Kolothum via
@ 2025-03-25 18:47   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-25 18:47 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao



On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> When nested translation is enabled, there are 2-stage translations
> occurring in two different address spaces: stage-1 in the iommu as,
> while stage-2 in the system as.
>
> If a device attached to the vSMMU doesn't enable stage-1 translation,
> e.g. vSTE sets to Config=Bypass, the system as should be returned,
> so QEMU can set up system memory mappings onto the stage-2 page table.
> This is crucial for an iommufd enabled VFIO device as the VFIO core
> code would register an iommu notifier and replay the address space
> which should be bypassed for this nested translation case.

I would suggest taking inspiration from 90519b90539 ("virtio-iommu: Add
bypass mode support to assigned device") or the similar patches on vtd
(558e0024a428 "intel_iommu: allow dynamic switch of IOMMU region" +
4b519ef1de9a "intel-iommu: optimize nodmar memory regions"), i.e. use the
same terminology and techniques/objects (switch_address_space).
To me this is not related to HW acceleration but rather to VFIO in general.
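
A rough standalone sketch of the switch_address_space idea, as I
understand it: each device keeps both a bypass region (an alias of system
memory) and a translated region in one address space, and a helper
enables exactly one of them when the bypass state changes. All Demo*
names are hypothetical and plain booleans stand in for QEMU memory
regions; this is not taken from the cited commits.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a memory region with an enable flag. */
typedef struct DemoRegion {
    const char *name;
    bool enabled;
} DemoRegion;

/* Per-device address space holding both views. */
typedef struct DemoDevAS {
    DemoRegion bypass_mr;   /* alias of system memory */
    DemoRegion iommu_mr;    /* translated (stage-1) view */
} DemoDevAS;

/* Enable exactly one view depending on whether stage-1 is bypassed. */
static void demo_switch_address_space(DemoDevAS *as, bool s1_bypassed)
{
    as->bypass_mr.enabled = s1_bypassed;
    as->iommu_mr.enabled = !s1_bypassed;
}

/* Name of the view that currently backs the device's address space. */
static const char *demo_active_region(const DemoDevAS *as)
{
    return as->bypass_mr.enabled ? as->bypass_mr.name : as->iommu_mr.name;
}
```

Compared with returning a different AddressSpace pointer from the
get_address_space callback, this keeps a single address space per device
and only changes which region inside it is visible.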
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c         | 22 +++++++++++++++++++++-
>  include/hw/arm/smmuv3-accel.h |  3 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 056bd23b2e..76134d106a 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -18,6 +18,7 @@
>  static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>                                                  PCIBus *bus, int devfn)
>  {
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(s);
>      SMMUDevice *sdev = sbus->pbdev[devfn];
>      SMMUv3AccelDevice *accel_dev;
>  
> @@ -29,6 +30,8 @@ static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>  
>          sbus->pbdev[devfn] = sdev;
>          smmu_init_sdev(s, sdev, bus, devfn);
> +        address_space_init(&accel_dev->as_sysmem, &s_accel->root,
> +                           "smmuv3-accel-sysmem");
>      }
>  
>      return accel_dev;
> @@ -351,12 +354,23 @@ static AddressSpace *smmuv3_accel_find_add_as(PCIBus *bus, void *opaque,
>      SMMUPciBus *sbus;
>      SMMUv3AccelDevice *accel_dev;
>      SMMUDevice *sdev;
> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> +    bool has_iommufd = false;
> +
> +    if (pdev) {
> +        has_iommufd = object_property_find(OBJECT(pdev), "iommufd");
> +    }
Referring to the discussion we had on how to set MSI RESV regions at
virt-acpi-build level depending on whether the SMMU is accelerated, I
think we can use exactly the above method (checking accel property).
>  
>      sbus = smmu_get_sbus(s, bus);
>      accel_dev = smmuv3_accel_get_dev(s, sbus, bus, devfn);
>      sdev = &accel_dev->sdev;
>  
> -    return &sdev->as;
> +    /* Return the system as if the device uses stage-2 only */
> +    if (has_iommufd && !accel_dev->s1_hwpt) {
> +        return &accel_dev->as_sysmem;
> +    } else {
> +        return &sdev->as;
> +    }
>  }
>  
>  static int smmuv3_accel_pxb_pcie_bus(Object *obj, void *opaque)
> @@ -390,6 +404,12 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>          error_propagate(errp, local_err);
>          return;
>      }
> +
> +    memory_region_init(&s_accel->root, OBJECT(s_accel), "root", UINT64_MAX);
> +    memory_region_init_alias(&s_accel->sysmem, OBJECT(s_accel),
> +                             "smmuv3-accel-sysmem", get_system_memory(), 0,
> +                             memory_region_size(get_system_memory()));
> +    memory_region_add_subregion(&s_accel->root, 0, &s_accel->sysmem);
>      bs->get_address_space = smmuv3_accel_find_add_as;
>      bs->set_iommu_device = smmuv3_accel_set_iommu_device;
>      bs->unset_iommu_device = smmuv3_accel_unset_iommu_device;
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index 54b217ab4f..58e68534c0 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -51,12 +51,15 @@ typedef struct SMMUv3AccelDevice {
>      SMMUS1Hwpt  *s1_hwpt;
>      SMMUViommu *viommu;
>      SMMUVdev   *vdev;
> +    AddressSpace as_sysmem;
>      QLIST_ENTRY(SMMUv3AccelDevice) next;
>  } SMMUv3AccelDevice;
>  
>  struct SMMUv3AccelState {
>      SMMUv3State smmuv3_state;
>      SMMUViommu *viommu;
> +    MemoryRegion root;
> +    MemoryRegion sysmem;
>  };
>  
>  struct SMMUv3AccelClass {




* Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
  2025-03-25 18:26     ` Nicolin Chen via
@ 2025-03-25 18:52       ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-25 18:52 UTC (permalink / raw)
  To: Nicolin Chen, Shameerali Kolothum Thodi
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

Hi Shameer, Nicolin,

On 3/25/25 7:26 PM, Nicolin Chen wrote:
> On Tue, Mar 25, 2025 at 03:43:29PM +0000, Shameerali Kolothum Thodi wrote:
>>> For the record I tested the series with host VFIO device and a
>>> virtio-blk-pci device put behind the same pxb-pcie/smmu protection and
>>> it works just fine
>>>
>>> -+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
>>>  |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
>>>  \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>>>              +-01.0-[01]--
>>>              +-01.1-[02]--
>>>              \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>>>
>>> This shows that without vcmdq feature there is no blocker having the
>>> same smmu device protecting both accelerated and emulated devices.
>> Thanks for giving it a spin. Yes, it currently supports the above. 
>>
>> At the moment we are not using the IOTLB for the emulated dev for a
>> config like above.  Have you checked performance for either emulated or
>> vfio dev with the above config? Whatever light tests I have done it shows
>> performance degradation for emulated dev compared to the default
>> SMMUv3(iommu=smmuv3). 
No, I have not checked yet. Again, I do not advocate for this kind of mix,
but I wanted to check that it still works conceptually.

Thanks

Eric
>>
>> And if the emulated dev issues _TLBI_NH_ASID, the code currently will propagate
>> that down to host SMMUv3. This will affect the vfio dev as well.
> VA too. Only commands with an SID field can be simply excluded.
> I think we should be concerned that the underlying SMMU CMDQ HW
> has a very limited command executing power, so wasting command
> cycles doesn't feel very ideal as it could impact the host OS
> (and other VMs too).
>
> Thanks
> Nicolin
>




* Re: [RFC PATCH v2 10/20] hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  2025-03-25 18:08   ` Eric Auger
@ 2025-03-25 19:33     ` Nicolin Chen
  0 siblings, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-25 19:33 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Tue, Mar 25, 2025 at 07:08:29PM +0100, Eric Auger wrote:
> > +static int
> > +smmuv3_accel_dev_install_nested_ste(SMMUv3AccelDevice *accel_dev,
> > +                                    uint32_t data_type, uint32_t data_len,
> > +                                    void *data)
> > +{
> > +    SMMUViommu *viommu = accel_dev->viommu;
> > +    SMMUS1Hwpt *s1_hwpt = accel_dev->s1_hwpt;
> > +    HostIOMMUDeviceIOMMUFD *idev = accel_dev->idev;
> > +
> > +    if (!idev || !viommu) {
> > +        return -ENOENT;
> > +    }
> > +
> > +    if (s1_hwpt) {
> > +        smmuv3_accel_dev_uninstall_nested_ste(accel_dev, false);

> why do you choose abort = false?

There is no particular reason. This is on the path where the guest is
updating the STE, so we want to park the device in some default state
in the meantime. Maybe ABORT would be a better choice? I didn't give
this deeper thought, to be honest. Good question :)

> > +    ret = smmu_find_ste(sdev->smmu, sid, &ste, &event);
> > +    if (ret) {
> > +        /*
> > +         * For a 2-level Stream Table, the level-2 table might not be ready
> > +         * until the device gets inserted to the stream table. Ignore this.
> > +         */

> I am confused by the above comment. Please can you describe the
> circumstances when this happens and why this should be an error

I added this in one of the early versions, and I don't recall
exactly what was going on... Most likely I saw smmu_find_ste() return
an error while the guest OS was booting and initializing the Stream
Table; later it installed the stage-1 STE and smmu_find_ste()
started to return the STE.

I think we can drop this comment until we hit this again.

> > +        return;
> > +    }
> > +
> > +    config = STE_CONFIG(&ste);
> > +    if (!STE_VALID(&ste) || !STE_CFG_S1_ENABLED(config)) {

> you fully bypass the logic of smmuv3_get_config/decode_config. Couldn't
> you reuse it. Originally we used the s1ctxptr and disabled/bypassed info.

We likely can, though we don't need the CD part in decode_config
so we might need to split those functions to reuse.

Thanks
Nicolin



* Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-11 14:10 ` [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
  2025-03-19  1:31   ` Donald Dutile
@ 2025-03-26 13:38   ` Eric Auger
  2025-03-26 19:16     ` Nicolin Chen
  2025-03-26 13:59   ` Eric Auger
  2 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-26 13:38 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi Shameer, Nicolin,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Introduce an SMMUCommandBatch and some helpers to batch and issue the
> commands.  Currently separate out TLBI commands and device cache commands
Can you specify which "device cache commands" you are talking about?
> to avoid some errata on certain versions of SMMUs. Later it should check
It would be worth giving more details about this erratum here.
> IIDR register to detect if underlying SMMU hw has such an erratum.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c    | 69 ++++++++++++++++++++++++++++++++++++++++
>  hw/arm/smmuv3-internal.h | 29 +++++++++++++++++
>  2 files changed, 98 insertions(+)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 76134d106a..09be838d22 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -160,6 +160,75 @@ void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>                                            nested_data.ste[0]);
>  }
>  
> +/* Update batch->ncmds to the number of execute cmds */
> +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
> +{
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(bs);
> +    uint32_t total = batch->ncmds;
> +    IOMMUFDViommu *viommu_core;
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return 0;
> +    }
> +
> +    if (!s_accel->viommu) {
> +        return 0;
> +    }
> +    viommu_core = &s_accel->viommu->core;
> +    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
> +                                           viommu_core->viommu_id,
> +                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
> +                                           sizeof(Cmd), &batch->ncmds,
> +                                           batch->cmds);
> +    if (total != batch->ncmds) {
> +        error_report("%s failed: ret=%d, total=%d, done=%d",
> +                      __func__, ret, total, batch->ncmds);
Some commands may have been executed (batch->ncmds != 0). Is the
batch->cmds array updated accordingly? I don't see any mention of that
in the kernel doc. Do you need to report a cmd_error as we do for some
other cmds?
> +        return ret;
> +    }
> +
> +    batch->ncmds = 0;
> +    batch->dev_cache = false;
> +    return ret;
> +}
> +
> +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
I was confused by the name. The helper adds a single Cmd to the batch,
right? So batch_cmd would be better suited.
> +                            SMMUCommandBatch *batch, Cmd *cmd,
> +                            uint32_t *cons, bool dev_cache)
> +{
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return 0;
> +    }
> +
> +    if (sdev) {
> +        SMMUv3AccelDevice *accel_dev;
> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +        if (!accel_dev->s1_hwpt) {
Can this happen? If so, can you add a comment describing under which
condition?
> +            return 0;
> +        }
> +    }
> +
> +    /*
> +     * Currently separate out dev_cache and hwpt for safety, which might
> +     * not be necessary if underlying HW SMMU does not have the errata.
> +     *
> +     * TODO check IIDR register values read from hw_info.
> +     */
> +    if (batch->ncmds && (dev_cache != batch->dev_cache)) {
> +        ret = smmuv3_accel_issue_cmd_batch(bs, batch);
> +        if (ret) {
> +            *cons = batch->cons[batch->ncmds];
> +            return ret;
> +        }
> +    }
> +    batch->dev_cache = dev_cache;
> +    batch->cmds[batch->ncmds] = *cmd;
> +    batch->cons[batch->ncmds++] = *cons;
> +    return 0;
> +}
> +
>  static bool
>  smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
>                                 HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 46c8bcae14..4602ae6728 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -549,13 +549,42 @@ typedef struct CD {
>      uint32_t word[16];
>  } CD;
>  
> +/**
> + * SMMUCommandBatch - batch of invalidation commands for smmuv3-accel
> + * @cmds: Pointer to list of commands
> + * @cons: Pointer to list of CONS corresponding to the commands
> + * @ncmds: Total ncmds in the batch
number of commands
> + * @dev_cache: Issue to a device cache
indicate whether the invalidation command batch targets device cache?
> + */
> +typedef struct SMMUCommandBatch {
> +    Cmd *cmds;
> +    uint32_t *cons;
> +    uint32_t ncmds;
> +    bool dev_cache;
> +} SMMUCommandBatch;
> +
>  int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
>                    SMMUEventInfo *event);
>  void smmuv3_flush_config(SMMUDevice *sdev);
>  
>  #if defined(CONFIG_ARM_SMMUV3_ACCEL) && defined(CONFIG_IOMMUFD)
> +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch);
> +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> +                            SMMUCommandBatch *batch, Cmd *cmd,
> +                            uint32_t *cons, bool dev_cache);
>  void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid);
>  #else
> +static inline int smmuv3_accel_issue_cmd_batch(SMMUState *bs,
> +                                               SMMUCommandBatch *batch)
> +{
> +    return 0;
> +}
> +static inline int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> +                                          SMMUCommandBatch *batch, Cmd *cmd,
> +                                          uint32_t *cons, bool dev_cache)
> +{
> +    return 0;
> +}
>  static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>  {
>  }
Thanks

Eric
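
Note on the partial-completion question above: a minimal sketch, assuming
a backend that overwrites the entry count with the number of commands it
actually handled (which is what the comment above
smmuv3_accel_issue_cmd_batch() in the patch implies). DemoCmd, DemoBatch,
fake_invalidate() and demo_issue_batch() are hypothetical names used only
for illustration; they are not QEMU or kernel APIs. Treating a short count
with a zero return value as an error is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's Cmd and SMMUCommandBatch. */
typedef struct { uint64_t word[2]; } DemoCmd;

typedef struct {
    DemoCmd  *cmds;   /* queued commands */
    uint32_t *cons;   /* guest CONS index recorded for each command */
    uint32_t  ncmds;  /* number of queued commands */
} DemoBatch;

/*
 * Hypothetical backend: handles at most `limit` entries, then fails.
 * On return, *n holds the number of entries actually handled.
 */
static int fake_invalidate(uint32_t limit, uint32_t *n)
{
    if (*n > limit) {
        *n = limit;
        return -5; /* stand-in for -EIO */
    }
    return 0;
}

/*
 * Issue the batch. On failure, store in *failed_cons the CONS index of
 * the first command that was not executed, and leave batch->ncmds equal
 * to the number of commands that did complete.
 */
static int demo_issue_batch(DemoBatch *b, uint32_t limit,
                            uint32_t *failed_cons)
{
    uint32_t total = b->ncmds;
    int ret = fake_invalidate(limit, &b->ncmds);

    if (b->ncmds != total) {
        /* b->ncmds < total here, so the index stays within the array. */
        *failed_cons = b->cons[b->ncmds];
        return ret ? ret : -1;
    }
    b->ncmds = 0;
    return 0;
}
```

In this model the cmds array itself is never rewritten; only the count
changes, so the caller recovers the failing position from cons[ncmds].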




* Re: [RFC PATCH v2 14/20] hw/arm/smmuv3: Install nested ste for CFGI_STE
  2025-03-11 14:10 ` [RFC PATCH v2 14/20] hw/arm/smmuv3: Install nested ste for CFGI_STE Shameer Kolothum via
@ 2025-03-26 13:39   ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-26 13:39 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Make use of smmuv3_accel provided _install_nested_ste() for CFGI_STE.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index ea63731d61..83159db1d4 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1286,6 +1286,7 @@ smmuv3_invalidate_ste(gpointer key, gpointer value, gpointer user_data)
>      if (sid < sid_range->start || sid > sid_range->end) {
>          return false;
>      }
> +    smmuv3_accel_install_nested_ste(sdev, sid);
>      trace_smmuv3_config_cache_inv(sid);
>      return true;
>  }
> @@ -1353,6 +1354,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>  
>              trace_smmuv3_cmdq_cfgi_ste(sid);
>              smmuv3_flush_config(sdev);
> +            smmuv3_accel_install_nested_ste(sdev, sid);
>  
>              break;
>          }
Given the small code diff, I would merge this into the patch that
introduces smmuv3_accel_install_nested_ste().


Eric




* Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-11 14:10 ` [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
  2025-03-19  1:31   ` Donald Dutile
  2025-03-26 13:38   ` Eric Auger
@ 2025-03-26 13:59   ` Eric Auger
  2 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-26 13:59 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao



On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Introduce an SMMUCommandBatch and some helpers to batch and issue the
> commands.  Currently separate out TLBI commands and device cache commands
> to avoid some errata on certain versions of SMMUs. Later it should check
> IIDR register to detect if underlying SMMU hw has such an erratum.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c    | 69 ++++++++++++++++++++++++++++++++++++++++
>  hw/arm/smmuv3-internal.h | 29 +++++++++++++++++
>  2 files changed, 98 insertions(+)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 76134d106a..09be838d22 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -160,6 +160,75 @@ void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>                                            nested_data.ste[0]);
>  }
>  
> +/* Update batch->ncmds to the number of execute cmds */
> +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
> +{
> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(bs);
> +    uint32_t total = batch->ncmds;
> +    IOMMUFDViommu *viommu_core;
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return 0;
> +    }
> +
> +    if (!s_accel->viommu) {
> +        return 0;
> +    }
> +    viommu_core = &s_accel->viommu->core;
> +    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
> +                                           viommu_core->viommu_id,
> +                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
> +                                           sizeof(Cmd), &batch->ncmds,
> +                                           batch->cmds);
> +    if (total != batch->ncmds) {
> +        error_report("%s failed: ret=%d, total=%d, done=%d",
> +                      __func__, ret, total, batch->ncmds);
> +        return ret;
> +    }
> +
> +    batch->ncmds = 0;
> +    batch->dev_cache = false;
> +    return ret;
> +}
> +
> +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
I think you should document that sdev can be NULL, and also when this
helper should be called with sdev != NULL.

Thanks

Eric
> +                            SMMUCommandBatch *batch, Cmd *cmd,
> +                            uint32_t *cons, bool dev_cache)
> +{
> +    int ret;
> +
> +    if (!bs->accel) {
> +        return 0;
> +    }
> +
> +    if (sdev) {
> +        SMMUv3AccelDevice *accel_dev;
> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> +        if (!accel_dev->s1_hwpt) {
> +            return 0;
> +        }
> +    }
> +
> +    /*
> +     * Currently separate out dev_cache and hwpt for safety, which might
> +     * not be necessary if underlying HW SMMU does not have the errata.
> +     *
> +     * TODO check IIDR register values read from hw_info.
> +     */
> +    if (batch->ncmds && (dev_cache != batch->dev_cache)) {
> +        ret = smmuv3_accel_issue_cmd_batch(bs, batch);
> +        if (ret) {
> +            *cons = batch->cons[batch->ncmds];
> +            return ret;
> +        }
> +    }
> +    batch->dev_cache = dev_cache;
> +    batch->cmds[batch->ncmds] = *cmd;
> +    batch->cons[batch->ncmds++] = *cons;
> +    return 0;
> +}
> +
>  static bool
>  smmuv3_accel_dev_attach_viommu(SMMUv3AccelDevice *accel_dev,
>                                 HostIOMMUDeviceIOMMUFD *idev, Error **errp)
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 46c8bcae14..4602ae6728 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -549,13 +549,42 @@ typedef struct CD {
>      uint32_t word[16];
>  } CD;
>  
> +/**
> + * SMMUCommandBatch - batch of invalidation commands for smmuv3-accel
> + * @cmds: Pointer to list of commands
> + * @cons: Pointer to list of CONS corresponding to the commands
> + * @ncmds: Total ncmds in the batch
> + * @dev_cache: Issue to a device cache
> + */
> +typedef struct SMMUCommandBatch {
> +    Cmd *cmds;
> +    uint32_t *cons;
> +    uint32_t ncmds;
> +    bool dev_cache;
> +} SMMUCommandBatch;
> +
>  int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste,
>                    SMMUEventInfo *event);
>  void smmuv3_flush_config(SMMUDevice *sdev);
>  
>  #if defined(CONFIG_ARM_SMMUV3_ACCEL) && defined(CONFIG_IOMMUFD)
> +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch);
> +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> +                            SMMUCommandBatch *batch, Cmd *cmd,
> +                            uint32_t *cons, bool dev_cache);
>  void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid);
>  #else
> +static inline int smmuv3_accel_issue_cmd_batch(SMMUState *bs,
> +                                               SMMUCommandBatch *batch)
> +{
> +    return 0;
> +}
> +static inline int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> +                                          SMMUCommandBatch *batch, Cmd *cmd,
> +                                          uint32_t *cons, bool dev_cache)
> +{
> +    return 0;
> +}
>  static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>  {
>  }




* Re: [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-03-11 14:10 ` [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
@ 2025-03-26 14:16   ` Eric Auger
  2025-03-26 19:27     ` Nicolin Chen
  2025-03-26 14:18   ` Eric Auger
  1 sibling, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-26 14:16 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Use the provided smmuv3-accel helper functions to issue the
> command to physical SMMUv3.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-internal.h | 11 ++++++++
>  hw/arm/smmuv3.c          | 58 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 4602ae6728..546f8faac0 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -235,6 +235,17 @@ static inline bool smmuv3_gerror_irq_enabled(SMMUv3State *s)
>  #define Q_CONS_WRAP(q) (((q)->cons & WRAP_MASK(q)) >> (q)->log2size)
>  #define Q_PROD_WRAP(q) (((q)->prod & WRAP_MASK(q)) >> (q)->log2size)
>  
> +static inline int smmuv3_q_ncmds(SMMUQueue *q)
> +{
> +        uint32_t prod = Q_PROD(q);
> +        uint32_t cons = Q_CONS(q);
> +
> +        if (Q_PROD_WRAP(q) == Q_CONS_WRAP(q))
> +                return prod - cons;
> +        else
> +                return WRAP_MASK(q) - cons + prod;
> +}
> +
>  static inline bool smmuv3_q_full(SMMUQueue *q)
>  {
>      return ((q->cons ^ q->prod) & WRAP_INDEX_MASK(q)) == WRAP_MASK(q);
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 83159db1d4..e0f225d0df 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1297,10 +1297,18 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>      SMMUCmdError cmd_error = SMMU_CERROR_NONE;
>      SMMUQueue *q = &s->cmdq;
>      SMMUCommandType type = 0;
> +    SMMUCommandBatch batch = {};
> +    uint32_t ncmds = 0;
This initialization looks unnecessary.
> +
>  
>      if (!smmuv3_cmdq_enabled(s)) {
>          return 0;
>      }
> +
> +    ncmds = smmuv3_q_ncmds(q);
> +    batch.cmds = g_new0(Cmd, ncmds);
> +    batch.cons = g_new0(uint32_t, ncmds);
> +
>      /*
>       * some commands depend on register values, typically CR0. In case those
>       * register values change while handling the command, spec says it
> @@ -1395,6 +1403,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>  
>              trace_smmuv3_cmdq_cfgi_cd(sid);
>              smmuv3_flush_config(sdev);
> +
> +            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
> +                                        &q->cons, true)) {
> +                cmd_error = SMMU_CERROR_ILL;
I understand that you collect all batchable commands together (those
sharing the same dev_cache property) and that the batch is executed
either when the cache target changes or at the very end of the queue
consumption. Since you don't batch all kinds of commands, isn't there a
risk of sending commands out of order?

Eric
> +                break;
> +            }
> +
>              break;
>          }
>          case SMMU_CMD_TLBI_NH_ASID:
> @@ -1418,6 +1433,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>              trace_smmuv3_cmdq_tlbi_nh_asid(asid);
>              smmu_inv_notifiers_all(&s->smmu_state);
>              smmu_iotlb_inv_asid_vmid(bs, asid, vmid);
> +
> +            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
> +                                        false)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
> +
>              break;
>          }
>          case SMMU_CMD_TLBI_NH_ALL:
> @@ -1445,6 +1467,12 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>              trace_smmuv3_cmdq_tlbi_nsnh();
>              smmu_inv_notifiers_all(&s->smmu_state);
>              smmu_iotlb_inv_all(bs);
> +
> +            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
> +                                        false)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
>              break;
>          case SMMU_CMD_TLBI_NH_VAA:
>          case SMMU_CMD_TLBI_NH_VA:
> @@ -1453,7 +1481,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>                  break;
>              }
>              smmuv3_range_inval(bs, &cmd, SMMU_STAGE_1);
> +
> +            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
> +                                        false)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
> +            break;
> +        case SMMU_CMD_ATC_INV:
> +        {
> +            SMMUDevice *sdev = smmu_find_sdev(bs, CMD_SID(&cmd));
> +
> +            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
> +                                        &q->cons, true)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
>              break;
> +        }
>          case SMMU_CMD_TLBI_S12_VMALL:
>          {
>              int vmid = CMD_VMID(&cmd);
> @@ -1485,7 +1530,6 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>          case SMMU_CMD_TLBI_EL2_ASID:
>          case SMMU_CMD_TLBI_EL2_VA:
>          case SMMU_CMD_TLBI_EL2_VAA:
> -        case SMMU_CMD_ATC_INV:
>          case SMMU_CMD_PRI_RESP:
>          case SMMU_CMD_RESUME:
>          case SMMU_CMD_STALL_TERM:
> @@ -1511,12 +1555,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>          queue_cons_incr(q);
>      }
>  
> +    qemu_mutex_lock(&s->mutex);
> +    if (!cmd_error && batch.ncmds) {
> +        if (smmuv3_accel_issue_cmd_batch(bs, &batch)) {
> +            q->cons = batch.cons[batch.ncmds];
> +            cmd_error = SMMU_CERROR_ILL;
> +        }
> +    }
> +    qemu_mutex_unlock(&s->mutex);
> +
>      if (cmd_error) {
>          trace_smmuv3_cmdq_consume_error(smmu_cmd_string(type), cmd_error);
>          smmu_write_cmdq_err(s, cmd_error);
>          smmuv3_trigger_irq(s, SMMU_IRQ_GERROR, R_GERROR_CMDQ_ERR_MASK);
>      }
>  
> +    g_free(batch.cmds);
> +    g_free(batch.cons);
> +
>      trace_smmuv3_cmdq_consume_out(Q_PROD(q), Q_CONS(q),
>                                    Q_PROD_WRAP(q), Q_CONS_WRAP(q));
>  
Thanks

Eric
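
Note on the ordering concern above: a minimal model, assuming a queue
where some commands are batched and others reach the host immediately.
DemoQueue, demo_flush(), demo_batch() and demo_immediate() are
hypothetical names for illustration only. Whether any command in this
series actually behaves like demo_immediate() is exactly the open
question; the sketch only shows what reordering would look like and how
flushing before an immediate command avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX 16

/* Hypothetical model: command ids in the order the host sees them. */
typedef struct {
    int    issued[DEMO_MAX];   /* host-visible order */
    size_t nissued;
    int    pending[DEMO_MAX];  /* queued, not yet sent */
    size_t npending;
    bool   dev_cache;          /* category of the pending commands */
} DemoQueue;

/* Send every pending command to the host, oldest first. */
static void demo_flush(DemoQueue *q)
{
    for (size_t i = 0; i < q->npending; i++) {
        assert(q->nissued < DEMO_MAX);
        q->issued[q->nissued++] = q->pending[i];
    }
    q->npending = 0;
}

/* Queue a batchable command, flushing first if the category changes. */
static void demo_batch(DemoQueue *q, int id, bool dev_cache)
{
    if (q->npending && q->dev_cache != dev_cache) {
        demo_flush(q);
    }
    assert(q->npending < DEMO_MAX);
    q->dev_cache = dev_cache;
    q->pending[q->npending++] = id;
}

/*
 * A command that bypasses the batch and reaches the host at once.
 * With flush_first == false it can overtake older pending commands.
 */
static void demo_immediate(DemoQueue *q, int id, bool flush_first)
{
    if (flush_first) {
        demo_flush(q);
    }
    assert(q->nissued < DEMO_MAX);
    q->issued[q->nissued++] = id;
}
```

Queuing command 1, then issuing command 2 with flush_first == false,
leaves the host seeing 2 before 1; with flush_first == true the guest
order is preserved.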




* Re: [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-03-11 14:10 ` [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
  2025-03-26 14:16   ` Eric Auger
@ 2025-03-26 14:18   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-26 14:18 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao



On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Use the provided smmuv3-accel helper functions to issue the
> command to physical SMMUv3.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-internal.h | 11 ++++++++
>  hw/arm/smmuv3.c          | 58 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 4602ae6728..546f8faac0 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -235,6 +235,17 @@ static inline bool smmuv3_gerror_irq_enabled(SMMUv3State *s)
>  #define Q_CONS_WRAP(q) (((q)->cons & WRAP_MASK(q)) >> (q)->log2size)
>  #define Q_PROD_WRAP(q) (((q)->prod & WRAP_MASK(q)) >> (q)->log2size)
>  
> +static inline int smmuv3_q_ncmds(SMMUQueue *q)
> +{
> +        uint32_t prod = Q_PROD(q);
> +        uint32_t cons = Q_CONS(q);
> +
> +        if (Q_PROD_WRAP(q) == Q_CONS_WRAP(q))
> +                return prod - cons;
> +        else
> +                return WRAP_MASK(q) - cons + prod;
> +}
> +
>  static inline bool smmuv3_q_full(SMMUQueue *q)
>  {
>      return ((q->cons ^ q->prod) & WRAP_INDEX_MASK(q)) == WRAP_MASK(q);
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 83159db1d4..e0f225d0df 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1297,10 +1297,18 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>      SMMUCmdError cmd_error = SMMU_CERROR_NONE;
>      SMMUQueue *q = &s->cmdq;
>      SMMUCommandType type = 0;
> +    SMMUCommandBatch batch = {};
> +    uint32_t ncmds = 0;
> +
>  
>      if (!smmuv3_cmdq_enabled(s)) {
>          return 0;
>      }
> +
> +    ncmds = smmuv3_q_ncmds(q);
> +    batch.cmds = g_new0(Cmd, ncmds);
> +    batch.cons = g_new0(uint32_t, ncmds);
> +
>      /*
>       * some commands depend on register values, typically CR0. In case those
>       * register values change while handling the command, spec says it
> @@ -1395,6 +1403,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>  
>              trace_smmuv3_cmdq_cfgi_cd(sid);
>              smmuv3_flush_config(sdev);
> +
> +            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
> +                                        &q->cons, true)) {
> +                cmd_error = SMMU_CERROR_ILL;
OK so now I see you record the error. You can ignore the previous comment.
> +                break;
> +            }
> +
>              break;
>          }
>          case SMMU_CMD_TLBI_NH_ASID:
> @@ -1418,6 +1433,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>              trace_smmuv3_cmdq_tlbi_nh_asid(asid);
>              smmu_inv_notifiers_all(&s->smmu_state);
>              smmu_iotlb_inv_asid_vmid(bs, asid, vmid);
> +
> +            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
> +                                        false)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
> +
>              break;
>          }
>          case SMMU_CMD_TLBI_NH_ALL:
> @@ -1445,6 +1467,12 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>              trace_smmuv3_cmdq_tlbi_nsnh();
>              smmu_inv_notifiers_all(&s->smmu_state);
>              smmu_iotlb_inv_all(bs);
> +
> +            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
> +                                        false)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
>              break;
>          case SMMU_CMD_TLBI_NH_VAA:
>          case SMMU_CMD_TLBI_NH_VA:
> @@ -1453,7 +1481,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>                  break;
>              }
>              smmuv3_range_inval(bs, &cmd, SMMU_STAGE_1);
> +
> +            if (smmuv3_accel_batch_cmds(bs, NULL, &batch, &cmd, &q->cons,
> +                                        false)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
> +            break;
> +        case SMMU_CMD_ATC_INV:
To me the code below should go in a separate patch, as it introduces
support for a new cmd. It should also be properly documented in the
commit msg.
> +        {
> +            SMMUDevice *sdev = smmu_find_sdev(bs, CMD_SID(&cmd));
> +
> +            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
> +                                        &q->cons, true)) {
> +                cmd_error = SMMU_CERROR_ILL;
> +                break;
> +            }
>              break;
> +        }
>          case SMMU_CMD_TLBI_S12_VMALL:
>          {
>              int vmid = CMD_VMID(&cmd);
> @@ -1485,7 +1530,6 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>          case SMMU_CMD_TLBI_EL2_ASID:
>          case SMMU_CMD_TLBI_EL2_VA:
>          case SMMU_CMD_TLBI_EL2_VAA:
> -        case SMMU_CMD_ATC_INV:
>          case SMMU_CMD_PRI_RESP:
>          case SMMU_CMD_RESUME:
>          case SMMU_CMD_STALL_TERM:
> @@ -1511,12 +1555,24 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>          queue_cons_incr(q);
>      }
>  
> +    qemu_mutex_lock(&s->mutex);
> +    if (!cmd_error && batch.ncmds) {
> +        if (smmuv3_accel_issue_cmd_batch(bs, &batch)) {
> +            q->cons = batch.cons[batch.ncmds];
> +            cmd_error = SMMU_CERROR_ILL;
> +        }
> +    }
> +    qemu_mutex_unlock(&s->mutex);
> +
>      if (cmd_error) {
>          trace_smmuv3_cmdq_consume_error(smmu_cmd_string(type), cmd_error);
>          smmu_write_cmdq_err(s, cmd_error);
>          smmuv3_trigger_irq(s, SMMU_IRQ_GERROR, R_GERROR_CMDQ_ERR_MASK);
>      }
>  
> +    g_free(batch.cmds);
> +    g_free(batch.cons);
> +
>      trace_smmuv3_cmdq_consume_out(Q_PROD(q), Q_CONS(q),
>                                    Q_PROD_WRAP(q), Q_CONS_WRAP(q));
>  
Eric
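
Note on smmuv3_q_ncmds() quoted above, which sizes the batch arrays: a
standalone sketch of the same occupancy computation, assuming PROD and
CONS each hold an index in the low log2size bits plus one wrap bit above
it, and that WRAP_MASK(q) is numerically equal to the queue size
(1 << log2size). DemoRing and the demo_* helpers are hypothetical names,
not the QEMU macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue: prod/cons hold an index plus one wrap bit. */
typedef struct {
    uint32_t prod;
    uint32_t cons;
    uint8_t  log2size;
} DemoRing;

static uint32_t demo_size(const DemoRing *r)
{
    return 1u << r->log2size;
}

static uint32_t demo_index(const DemoRing *r, uint32_t v)
{
    return v & (demo_size(r) - 1);
}

static uint32_t demo_wrap(const DemoRing *r, uint32_t v)
{
    return (v >> r->log2size) & 1u;
}

/* Number of entries between cons and prod. */
static uint32_t demo_pending(const DemoRing *r)
{
    uint32_t prod = demo_index(r, r->prod);
    uint32_t cons = demo_index(r, r->cons);

    if (demo_wrap(r, r->prod) == demo_wrap(r, r->cons)) {
        /* Same lap: prod is at or ahead of cons. */
        return prod - cons;
    }
    /* prod is one lap ahead: entries up to the end, plus those after 0. */
    return demo_size(r) - cons + prod;
}
```

With an 8-entry queue, prod at index 1 on the next lap and cons at
index 6 give 8 - 6 + 1 = 3 pending entries, matching the
WRAP_MASK(q) - cons + prod branch in the patch under the assumption
stated above.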




* Re: [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info
  2025-03-11 14:10 ` [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info Shameer Kolothum via
  2025-03-19  2:45   ` Donald Dutile
@ 2025-03-26 14:57   ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-26 14:57 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao



On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Read the underlying SMMUv3 device info and set corresponding IDR
> bits. We need at least one cold-plugged vfio-pci dev associated
> with the smmuv3-accel instance to do this now.  Hence fail if it
> is not available.
>
> ToDo: The above requirement will be relaxed in future when we add
> support in the kernel.
Can you give more details about what is missing?
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-accel.c         | 104 ++++++++++++++++++++++++++++++++++
>  hw/arm/trace-events           |   1 +
>  include/hw/arm/smmuv3-accel.h |   2 +
>  3 files changed, 107 insertions(+)
>
> diff --git a/hw/arm/smmuv3-accel.c b/hw/arm/smmuv3-accel.c
> index 09be838d22..fb08e1d66b 100644
> --- a/hw/arm/smmuv3-accel.c
> +++ b/hw/arm/smmuv3-accel.c
> @@ -15,6 +15,96 @@
>  
>  #include "smmuv3-internal.h"
>  
> +static int
> +smmuv3_accel_dev_get_info(SMMUv3AccelDevice *accel_dev, uint32_t *data_type,
> +                          uint32_t data_len, void *data)
> +{
> +    uint64_t caps;
> +
> +    if (!accel_dev || !accel_dev->idev) {
> +        return -ENOENT;
> +    }
> +
> +    return !iommufd_backend_get_device_info(accel_dev->idev->iommufd,
> +                                            accel_dev->idev->devid,
> +                                            data_type, data,
> +                                            data_len, &caps, NULL);
> +}
> +
> +static void smmuv3_accel_init_regs(SMMUv3AccelState *s_accel)
> +{
> +    SMMUv3State *s = ARM_SMMUV3(s_accel);
> +    SMMUv3AccelDevice *accel_dev;
> +    uint32_t data_type;
> +    uint32_t val;
> +    int ret;
> +
> +    if (!s_accel->viommu || QLIST_EMPTY(&s_accel->viommu->device_list)) {
> +        error_report("At least one cold-plugged vfio-pci is required for smmuv3-accel!");
> +        exit(1);
> +    }
> +
> +    accel_dev = QLIST_FIRST(&s_accel->viommu->device_list);
> +    if (accel_dev->info.idr[0]) {
> +        info_report("reusing the previous hw_info");
> +        goto out;
> +    }
> +
> +    ret = smmuv3_accel_dev_get_info(accel_dev, &data_type,
> +                                    sizeof(accel_dev->info), &accel_dev->info);
> +    if (ret) {
> +        error_report("failed to get SMMU device info");
> +        return;
> +    }
> +
> +    if (data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
> +        error_report("Wrong data type (%d)!", data_type);
> +        return;
> +    }
> +
> +out:
> +    trace_smmuv3_accel_get_device_info(accel_dev->info.idr[0],
> +                                       accel_dev->info.idr[1],
> +                                       accel_dev->info.idr[3],
> +                                       accel_dev->info.idr[5]);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, BTM);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, BTM, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, ATS);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ATS, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, ASID16);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, ASID16, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, TERM_MODEL);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, TERM_MODEL, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, STALL_MODEL);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STALL_MODEL, val);
> +    val = FIELD_EX32(accel_dev->info.idr[0], IDR0, STLEVEL);
> +    s->idr[0] = FIELD_DP32(s->idr[0], IDR0, STLEVEL, val);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[1], IDR1, SIDSIZE);
> +    s->idr[1] = FIELD_DP32(s->idr[1], IDR1, SIDSIZE, val);
> +    val = FIELD_EX32(accel_dev->info.idr[1], IDR1, SSIDSIZE);
> +    s->idr[1] = FIELD_DP32(s->idr[1], IDR1, SSIDSIZE, val);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, HAD);
> +    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, HAD, val);
> +    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, RIL);
> +    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, val);
> +    val = FIELD_EX32(accel_dev->info.idr[3], IDR3, BBML);
> +    s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, val);
> +
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN4K);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, val);
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN16K);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, val);
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, GRAN64K);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, val);
> +    val = FIELD_EX32(accel_dev->info.idr[5], IDR5, OAS);
> +    s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, val);
Are all those ID regs mandated? I would suggest having props with
default values that can be overridden. Once we get a VFIO device plugged
we could check whether there is an incompatibility.
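
To make the suggestion a bit more concrete, here is a rough, untested
sketch of the kind of check I have in mind. It is standalone C rather
than QEMU code: VSmmuFeatures and vsmmu_features_compatible() are
hypothetical names, the field list is only a subset, and the comparison
rules (the guest may not advertise more than the host provides) are an
assumption that would need checking field by field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of the features a vSMMU advertises. */
typedef struct {
    bool    ats;       /* IDR0.ATS */
    bool    ril;       /* IDR3.RIL */
    uint8_t ssidsize;  /* IDR1.SSIDSIZE, SubstreamID bits */
    uint8_t oas;       /* IDR5.OAS encoding; larger value, larger size */
} VSmmuFeatures;

/*
 * Compare the property-configured (guest-visible) features with what
 * the host reports once a VFIO device is plugged. Returns true when the
 * guest view is a subset of the host view; otherwise names the first
 * offending field in err.
 */
static bool vsmmu_features_compatible(const VSmmuFeatures *guest,
                                      const VSmmuFeatures *host,
                                      char *err, size_t errlen)
{
    const char *why = NULL;

    assert(guest && host && err && errlen > 0);

    if (guest->ats && !host->ats) {
        why = "ATS";
    } else if (guest->ril && !host->ril) {
        why = "RIL";
    } else if (guest->ssidsize > host->ssidsize) {
        why = "SSIDSIZE";
    } else if (guest->oas > host->oas) {
        why = "OAS";
    }

    if (why) {
        snprintf(err, errlen, "host SMMUv3 cannot provide %s", why);
        return false;
    }
    err[0] = '\0';
    return true;
}
```

With something like this, the defaults stay fixed from the guest's
point of view, and plugging a device behind an incompatible host SMMUv3
would fail with a clear message instead of silently changing the IDRs.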

> +
> +    /* FIXME check iidr and aidr registrs too */
nit: use capital letters for the register names (IIDR, AIDR); "registrs" is also a typo

> +}
> +
>  static SMMUv3AccelDevice *smmuv3_accel_get_dev(SMMUState *s, SMMUPciBus *sbus,
>                                                  PCIBus *bus, int devfn)
>  {
> @@ -484,11 +574,25 @@ static void smmu_accel_realize(DeviceState *d, Error **errp)
>      bs->unset_iommu_device = smmuv3_accel_unset_iommu_device;
>  }
>  
> +static void smmuv3_accel_reset_hold(Object *obj, ResetType type)
> +{
> +    SMMUv3AccelState *s = ARM_SMMUV3_ACCEL(obj);
> +    SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_GET_CLASS(s);
> +
> +    if (c->parent_phases.hold) {
> +        c->parent_phases.hold(obj, type);
> +    }
> +    smmuv3_accel_init_regs(s);
> +}
> +
>  static void smmuv3_accel_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> +    ResettableClass *rc = RESETTABLE_CLASS(klass);
>      SMMUv3AccelClass *c = ARM_SMMUV3_ACCEL_CLASS(klass);
>  
> +    resettable_class_set_parent_phases(rc, NULL, smmuv3_accel_reset_hold, NULL,
> +                                       &c->parent_phases);
As Don mentioned, this should be the exit phase now anyway.

Eric
>      device_class_set_parent_realize(dc, smmu_accel_realize,
>                                      &c->parent_realize);
>      dc->hotpluggable = false;
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index cd2eac31c2..c7a7e58291 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -62,6 +62,7 @@ smmu_reset_exit(void) ""
>  smmuv3_accel_set_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x)"
>  smmuv3_accel_unset_iommu_device(int devfn, uint32_t sid) "devfn=0x%x (sid=0x%x"
>  smmuv3_accel_install_nested_ste(uint32_t sid, uint64_t ste_1, uint64_t ste_0) "sid=%d ste=%"PRIx64":%"PRIx64
> +smmuv3_accel_get_device_info(uint32_t idr0, uint32_t idr1, uint32_t idr3, uint32_t idr5) "idr0=0x%x idr1=0x%x idr3=0x%x idr5=0x%x"
>  
>  # strongarm.c
>  strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
> diff --git a/include/hw/arm/smmuv3-accel.h b/include/hw/arm/smmuv3-accel.h
> index 58e68534c0..9e30d7d351 100644
> --- a/include/hw/arm/smmuv3-accel.h
> +++ b/include/hw/arm/smmuv3-accel.h
> @@ -52,6 +52,7 @@ typedef struct SMMUv3AccelDevice {
>      SMMUViommu *viommu;
>      SMMUVdev   *vdev;
>      AddressSpace as_sysmem;
> +    struct iommu_hw_info_arm_smmuv3 info;
>      QLIST_ENTRY(SMMUv3AccelDevice) next;
>  } SMMUv3AccelDevice;
>  
> @@ -68,6 +69,7 @@ struct SMMUv3AccelClass {
>      /*< public >*/
>  
>      DeviceRealize parent_realize;
> +    ResettablePhases parent_phases;
>  };
>  
>  #endif /* HW_ARM_SMMUV3_ACCEL_H */




* Re: [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  2025-03-11 14:10 ` [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD Shameer Kolothum via
@ 2025-03-26 17:18   ` Eric Auger
  2025-03-26 19:46     ` Nicolin Chen
  2025-03-27 13:05     ` Jason Gunthorpe
  0 siblings, 2 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-26 17:18 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao

Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> With nested translation, the underlying HW could support those two fields.
> Allow them according to the updated idr registers after the hw_info ioctl.
s/idr/IDR
>
> When substreams are enabled (S1CDMax != 0), S1DSS field determines
> the behavior of a transaction.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmuv3-internal.h |  1 +
>  hw/arm/smmuv3.c          | 15 +++++++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 546f8faac0..530284a9c0 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -612,6 +612,7 @@ static inline void smmuv3_accel_install_nested_ste(SMMUDevice *sdev, int sid)
>  
>  #define STE_S1FMT(x)       extract32((x)->word[0], 4 , 2)
>  #define STE_S1CDMAX(x)     extract32((x)->word[1], 27, 5)
> +#define STE_S1DSS(x)       extract32((x)->word[2], 0,  2)
>  #define STE_S1STALLD(x)    extract32((x)->word[2], 27, 1)
>  #define STE_EATS(x)        extract32((x)->word[2], 28, 2)
>  #define STE_STRW(x)        extract32((x)->word[2], 30, 2)
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index e0f225d0df..e8a6c50056 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -561,6 +561,16 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
>  
>      decode_ste_config(cfg, config);
>  
> +      /* S1DSS.Terminate is same as Config.abort for default stream */

S1DSS. Termination

> +    if (STE_CFG_S1_ENABLED(config) && STE_S1DSS(ste) == 0) {
> +        cfg->aborted = true;
The spec says:
"
When substreams are enabled (STE.S1CDMax != 0), this field determines
the behavior of a transaction or
translation request that arrives without an associated substream:
"
So I understand you should also check STE.S1CDMax. Also, how do we check
that the incoming transaction arrives without a substream?

In general, shouldn't we add support for substreams in the emulated
code too?

The spec also says that in that case you should record an
F_STREAM_DISABLED event.
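
Roughly, I would expect the decision to look like the untested sketch
below. It is standalone C, not a proposal for decode_ste() as such: the
enum and function names are hypothetical, and it only models a request
that arrives without a substream. Treating the reserved encoding 0b11
like Terminate is my assumption and should be checked against the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of evaluating STE.S1DSS for one request. */
typedef enum {
    S1DSS_ACT_IGNORED,  /* field not consulted, normal handling applies */
    S1DSS_ACT_ABORT,    /* abort and record F_STREAM_DISABLED */
    S1DSS_ACT_BYPASS,   /* skip stage 1 for this request */
    S1DSS_ACT_USE_CD0,  /* translate using the CD at index 0 */
} S1DssAction;

static S1DssAction s1dss_action(bool s1_enabled, uint32_t s1cdmax,
                                uint32_t s1dss, bool has_substream)
{
    assert(s1dss <= 3);  /* S1DSS is a 2-bit field */

    /*
     * S1DSS only matters when stage 1 is enabled, substreams are
     * enabled (S1CDMax != 0) and the request carries no substream.
     */
    if (!s1_enabled || s1cdmax == 0 || has_substream) {
        return S1DSS_ACT_IGNORED;
    }

    switch (s1dss) {
    case 1:
        return S1DSS_ACT_BYPASS;
    case 2:
        return S1DSS_ACT_USE_CD0;
    default:
        /* 0b00 Terminate; reserved 0b11 handled the same (assumption) */
        return S1DSS_ACT_ABORT;
    }
}
```

The point is that the S1CDMax and "no substream" conditions gate the
whole thing, and that the Terminate case is an abort plus an event, not
only cfg->aborted = true.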

> +    }
> +
> +    /* S1DSS.Bypass is same as Config.bypass for default stream */
S1DSS. Bypass
> +    if (STE_CFG_S1_ENABLED(config) && STE_S1DSS(ste) == 0x1) {
> +        cfg->bypassed = true;
> +    }
> +
>      if (cfg->aborted || cfg->bypassed) {
>          return 0;
>      }
> @@ -598,13 +608,14 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
>          }
>      }
>  
> -    if (STE_S1CDMAX(ste) != 0) {
> +    if (!FIELD_EX32(s->idr[1], IDR1, SSIDSIZE) && STE_S1CDMAX(ste) != 0) {
>          qemu_log_mask(LOG_UNIMP,
>                        "SMMUv3 does not support multiple context descriptors yet\n");
The log message should be different because it becomes a guest error:
qemu_log_mask(LOG_GUEST_ERROR, "invalid S1CDMAX as SSIDSIZE==0") or
something along those lines.

>          goto bad_ste;
>      }
>  
> -    if (STE_S1STALLD(ste)) {
> +    /* STALL_MODEL being 0b01 means "stall is not supported" */
> +    if ((FIELD_EX32(s->idr[0], IDR0, STALL_MODEL) & 0x1) && STE_S1STALLD(ste)) {
>          qemu_log_mask(LOG_UNIMP,
>                        "SMMUv3 S1 stalling fault model not allowed yet\n");
Same here.

Again, I think we need to understand the consequences of having more
comprehensive support for SSID. This also holds for all the IDR fields
that may be inherited from the HW and that we don't support yet in the
emulation code.

Thanks

Eric
>          goto bad_ste;




* Re: [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3
  2025-03-11 14:10 ` [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3 Shameer Kolothum via
  2025-03-19  2:52   ` Donald Dutile
@ 2025-03-26 17:40   ` Eric Auger
  2025-03-26 19:57     ` Nicolin Chen
  1 sibling, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-26 17:40 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao



On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> If a vSMMU is configured as a accelerated one, HW IOTLB will be used
> and all cache invalidation should be done to the HW IOTLB too, v.s.
> the emulated iotlb. In this case, an iommu notifier isn't registered,
> as the devices behind a SMMUv3-accel would stay in the system address
> space for stage-2 mappings.
>
> However, the KVM code still requests an iommu address space to translate
> an MSI doorbell gIOVA via get_msi_address_space() and translate().
If we use a flat MSI mapping, can't we get rid of that problem?

Sorry, but I don't really understand the problem here. Could you
please elaborate?

Thanks

Eric
>
> Since a SMMUv3-accel doesn't register an iommu notifier to flush emulated
> iotlb, bypass the emulated IOTLB and always walk through the guest-level
> IO page table.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/smmu-common.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 9fd455baa0..fd10df8866 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -77,6 +77,17 @@ static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs,
>      uint8_t level = 4 - (inputsize - 4) / stride;
>      SMMUTLBEntry *entry = NULL;
>  
> +    /*
> +     * Stage-1 translation with a accel SMMU in general uses HW IOTLB. However,
> +     * KVM still requests for an iommu address space for an MSI fixup by looking
> +     * up stage-1 page table. Make sure we don't go through the emulated pathway
> +     * so that the emulated iotlb will not need any invalidation.
> +     */
> +
> +    if (bs->accel) {
> +        return NULL;
> +    }
> +
>      while (level <= 3) {
>          uint64_t subpage_size = 1ULL << level_shift(level, tt->granule_sz);
>          uint64_t mask = subpage_size - 1;
> @@ -142,6 +153,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
>      SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
>      uint8_t tg = (new->granule - 10) / 2;
>  
> +    /*
> +     * Stage-1 translation with a accel SMMU in general uses HW IOTLB. However,
> +     * KVM still requests for an iommu address space for an MSI fixup by looking
> +     * up stage-1 page table. Make sure we don't go through the emulated pathway
> +     * so that the emulated iotlb will not need any invalidation.
> +     */
> +    if (bs->accel) {
> +        return;
> +    }
> +
>      if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
>          smmu_iotlb_inv_all(bs);
>      }




* Re: [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
  2025-03-11 14:10 ` [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes Shameer Kolothum via
@ 2025-03-26 18:14   ` Eric Auger
  2025-03-26 18:50     ` Nicolin Chen
  0 siblings, 1 reply; 145+ messages in thread
From: Eric Auger @ 2025-03-26 18:14 UTC (permalink / raw)
  To: Shameer Kolothum, qemu-arm, qemu-devel
  Cc: peter.maydell, jgg, nicolinc, ddutile, berrange, nathanc, mochs,
	smostafa, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
	zhangfei.gao



On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Now that we can have multiple user-creatable smmuv3-accel devices,
> each associated with different pci buses, update IORT ID mappings
> accordingly.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  hw/arm/virt-acpi-build.c | 113 +++++++++++++++++++++++++++++++++------
>  1 file changed, 97 insertions(+), 16 deletions(-)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 3ac8f8e178..c232850e36 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -43,6 +43,7 @@
>  #include "hw/acpi/generic_event_device.h"
>  #include "hw/acpi/tpm.h"
>  #include "hw/acpi/hmat.h"
> +#include "hw/arm/smmuv3-accel.h"
>  #include "hw/pci/pcie_host.h"
>  #include "hw/pci/pci.h"
>  #include "hw/pci/pci_bus.h"
> @@ -233,6 +234,51 @@ struct AcpiIortIdMapping {
>  };
>  typedef struct AcpiIortIdMapping AcpiIortIdMapping;
>  
> +struct SMMUv3Accel {
> +    int irq;
> +    hwaddr base;
> +    AcpiIortIdMapping smmu_idmap;
> +};
> +typedef struct SMMUv3Accel SMMUv3Accel;
> +
> +static int smmuv3_accel_idmap_compare(gconstpointer a, gconstpointer b)
> +{
> +    SMMUv3Accel *accel_a = (SMMUv3Accel *)a;
> +    SMMUv3Accel *accel_b = (SMMUv3Accel *)b;
> +
> +    return accel_a->smmu_idmap.input_base - accel_b->smmu_idmap.input_base;
> +}
> +
> +static int get_smmuv3_accel(Object *obj, void *opaque)
> +{
> +    GArray *s_accel_blob = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_ARM_SMMUV3_ACCEL)) {
> +        PCIBus *bus = (PCIBus *) object_property_get_link(obj, "primary-bus",
> +                                                          &error_abort);
> +        if (bus && !pci_bus_bypass_iommu(bus)) {
> +            SMMUv3Accel accel;
> +            int min_bus, max_bus;
> +            VirtMachineState *v = VIRT_MACHINE(qdev_get_machine());
> +            PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(v->platform_bus_dev);
> +            SysBusDevice *sbdev = SYS_BUS_DEVICE(obj);
> +            hwaddr base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
> +            int irq = platform_bus_get_irqn(pbus, sbdev, 0);
> +
> +            base += v->memmap[VIRT_PLATFORM_BUS].base;
> +            irq += v->irqmap[VIRT_PLATFORM_BUS];
> +
> +            pci_bus_range(bus, &min_bus, &max_bus);
> +            accel.smmu_idmap.input_base = min_bus << 8;
> +            accel.smmu_idmap.id_count = (max_bus - min_bus + 1) << 8;
> +            accel.base = base;
> +            accel.irq = irq + ARM_SPI_BASE;
> +            g_array_append_val(s_accel_blob, accel);
> +        }
> +    }
> +    return 0;
> +}
> +
>  /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
>  static int
>  iort_host_bridges(Object *obj, void *opaque)
> @@ -275,30 +321,51 @@ static void
>  build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>  {
>      int i, nb_nodes, rc_mapping_count;
> -    size_t node_size, smmu_offset = 0;
> +    size_t node_size, *smmu_offset = NULL;
>      AcpiIortIdMapping *idmap;
> +    SMMUv3Accel *accel;
> +    int num_smmus = 0;
>      uint32_t id = 0;
>      GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
>      GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
> +    GArray *smmuv3_accel = g_array_new(false, true, sizeof(SMMUv3Accel));
>  
>      AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id,
>                          .oem_table_id = vms->oem_table_id };
>      /* Table 2 The IORT */
>      acpi_table_begin(&table, table_data);
>  
> -    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> -        AcpiIortIdMapping next_range = {0};
> -
> +    nb_nodes = 2; /* RC, ITS */
> +    if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
> +        object_child_foreach_recursive(object_get_root(),
> +                                       get_smmuv3_accel, smmuv3_accel);
> +        /* Sort the smmuv3-accel by smmu idmap input_base */
> +        g_array_sort(smmuv3_accel, smmuv3_accel_idmap_compare);
> +
> +        /*  Fill smmu idmap from sorted accel array */
> +        for (i = 0; i < smmuv3_accel->len; i++) {
> +            accel = &g_array_index(smmuv3_accel, SMMUv3Accel, i);
> +            g_array_append_val(smmu_idmaps, accel->smmu_idmap);
> +        }
> +        num_smmus = smmuv3_accel->len;
> +    } else if (vms->iommu == VIRT_IOMMU_SMMUV3) {
>          object_child_foreach_recursive(object_get_root(),
>                                         iort_host_bridges, smmu_idmaps);
>  
>          /* Sort the smmu idmap by input_base */
>          g_array_sort(smmu_idmaps, iort_idmap_compare);
> +        num_smmus = 1;
> +    }
>  
> -        /*
> -         * Split the whole RIDs by mapping from RC to SMMU,
> -         * build the ID mapping from RC to ITS directly.
> -         */
> +    /*
> +     * Split the whole RIDs by mapping from RC to SMMU,
> +     * build the ID mapping from RC to ITS directly.
> +     */
> +    if (num_smmus) {
> +        AcpiIortIdMapping next_range = {0};
> +
> +        smmu_offset = g_new0(size_t, num_smmus);
> +        nb_nodes += num_smmus;
>          for (i = 0; i < smmu_idmaps->len; i++) {
>              idmap = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
>  
> @@ -316,10 +383,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>              g_array_append_val(its_idmaps, next_range);
>          }
>  
> -        nb_nodes = 3; /* RC, ITS, SMMUv3 */
>          rc_mapping_count = smmu_idmaps->len + its_idmaps->len;
>      } else {
> -        nb_nodes = 2; /* RC, ITS */
>          rc_mapping_count = 1;
>      }
>      /* Number of IORT Nodes */
> @@ -341,10 +406,19 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      /* GIC ITS Identifier Array */
>      build_append_int_noprefix(table_data, 0 /* MADT translation_id */, 4);
>  
> -    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> -        int irq =  vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> +    for (i = 0; i < num_smmus; i++) {
> +        hwaddr base;
> +        int irq;
> +        if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
> +            accel = &g_array_index(smmuv3_accel, SMMUv3Accel, i);
> +            base = accel->base;
> +            irq = accel->irq;
> +        } else {
> +            base = vms->memmap[VIRT_SMMU].base;
> +            irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> +        }
>  
> -        smmu_offset = table_data->len - table.table_offset;
> +        smmu_offset[i] = table_data->len - table.table_offset;
>          /* Table 9 SMMUv3 Format */
>          build_append_int_noprefix(table_data, 4 /* SMMUv3 */, 1); /* Type */
>          node_size =  SMMU_V3_ENTRY_SIZE + ID_MAPPING_ENTRY_SIZE;
> @@ -355,7 +429,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>          /* Reference to ID Array */
>          build_append_int_noprefix(table_data, SMMU_V3_ENTRY_SIZE, 4);
>          /* Base address */
> -        build_append_int_noprefix(table_data, vms->memmap[VIRT_SMMU].base, 8);
> +        build_append_int_noprefix(table_data, base, 8);
>          /* Flags */
>          build_append_int_noprefix(table_data, 1 /* COHACC Override */, 4);
>          build_append_int_noprefix(table_data, 0, 4); /* Reserved */
> @@ -404,15 +478,22 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      build_append_int_noprefix(table_data, 0, 3); /* Reserved */
>  
>      /* Output Reference */
> -    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> +    if (num_smmus) {
>          AcpiIortIdMapping *range;
> +        size_t offset;
>  
>          /* translated RIDs connect to SMMUv3 node: RC -> SMMUv3 -> ITS */
>          for (i = 0; i < smmu_idmaps->len; i++) {
> +            if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
> +                offset = smmu_offset[i];
> +            } else {
> +                offset = smmu_offset[0];
Maybe we can also use the smmu_offset array for non-accel mode and get
rid of this.
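
One possible shape, as an untested standalone sketch: let each ID
mapping carry the index of the SMMU node it belongs to, so the lookup
is the same in both modes. IdMapSketch, smmu_idx and
idmap_output_offset() are hypothetical names and are not part of
AcpiIortIdMapping today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID mapping that remembers its owning SMMU node. */
typedef struct {
    uint32_t input_base;
    uint32_t id_count;
    size_t   smmu_idx;   /* index into smmu_offset[] */
} IdMapSketch;

/*
 * Return the IORT offset of the SMMUv3 node that a mapping outputs to.
 * Non-accel mode has a single node, so every mapping uses index 0;
 * accel mode uses one index per smmuv3-accel instance.
 */
static size_t idmap_output_offset(const IdMapSketch *map,
                                  const size_t *smmu_offset,
                                  size_t num_smmus)
{
    assert(map && smmu_offset);
    assert(map->smmu_idx < num_smmus);
    return smmu_offset[map->smmu_idx];
}
```

That would also cope with the non-accel case where smmu_idmaps->len is
larger than num_smmus, since several mappings can point at index 0.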

Nevertheless it looks pretty good to me already.

Eric
> +            }
> +
>              range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
>              /* output IORT node is the smmuv3 node */
>              build_iort_id_mapping(table_data, range->input_base,
> -                                  range->id_count, smmu_offset);
> +                                  range->id_count, offset);
>          }
>  
>          /* bypassed RIDs connect to ITS group node directly: RC -> ITS */




* Re: [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
  2025-03-26 18:14   ` Eric Auger
@ 2025-03-26 18:50     ` Nicolin Chen
  2025-03-27  9:26       ` Shameerali Kolothum Thodi via
  0 siblings, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-26 18:50 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 26, 2025 at 07:14:31PM +0100, Eric Auger wrote:
> 
> 
> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > Now that we can have multiple user-creatable smmuv3-accel devices,
> > each associated with different pci buses, update IORT ID mappings
> > accordingly.
> >
> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  hw/arm/virt-acpi-build.c | 113 +++++++++++++++++++++++++++++++++------
> >  1 file changed, 97 insertions(+), 16 deletions(-)
> >
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 3ac8f8e178..c232850e36 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -43,6 +43,7 @@
> >  #include "hw/acpi/generic_event_device.h"
> >  #include "hw/acpi/tpm.h"
> >  #include "hw/acpi/hmat.h"
> > +#include "hw/arm/smmuv3-accel.h"
> >  #include "hw/pci/pcie_host.h"
> >  #include "hw/pci/pci.h"
> >  #include "hw/pci/pci_bus.h"
> > @@ -233,6 +234,51 @@ struct AcpiIortIdMapping {
> >  };
> >  typedef struct AcpiIortIdMapping AcpiIortIdMapping;
> >  
> > +struct SMMUv3Accel {
> > +    int irq;
> > +    hwaddr base;
> > +    AcpiIortIdMapping smmu_idmap;
> > +};
> > +typedef struct SMMUv3Accel SMMUv3Accel;
> > +
> > +static int smmuv3_accel_idmap_compare(gconstpointer a, gconstpointer b)
> > +{
> > +    SMMUv3Accel *accel_a = (SMMUv3Accel *)a;
> > +    SMMUv3Accel *accel_b = (SMMUv3Accel *)b;
> > +
> > +    return accel_a->smmu_idmap.input_base - accel_b->smmu_idmap.input_base;
> > +}
> > +
> > +static int get_smmuv3_accel(Object *obj, void *opaque)
> > +{
> > +    GArray *s_accel_blob = opaque;
> > +
> > +    if (object_dynamic_cast(obj, TYPE_ARM_SMMUV3_ACCEL)) {
> > +        PCIBus *bus = (PCIBus *) object_property_get_link(obj, "primary-bus",
> > +                                                          &error_abort);
> > +        if (bus && !pci_bus_bypass_iommu(bus)) {
> > +            SMMUv3Accel accel;
> > +            int min_bus, max_bus;
> > +            VirtMachineState *v = VIRT_MACHINE(qdev_get_machine());
> > +            PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(v->platform_bus_dev);
> > +            SysBusDevice *sbdev = SYS_BUS_DEVICE(obj);
> > +            hwaddr base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
> > +            int irq = platform_bus_get_irqn(pbus, sbdev, 0);
> > +
> > +            base += v->memmap[VIRT_PLATFORM_BUS].base;
> > +            irq += v->irqmap[VIRT_PLATFORM_BUS];
> > +
> > +            pci_bus_range(bus, &min_bus, &max_bus);
> > +            accel.smmu_idmap.input_base = min_bus << 8;
> > +            accel.smmu_idmap.id_count = (max_bus - min_bus + 1) << 8;
> > +            accel.base = base;
> > +            accel.irq = irq + ARM_SPI_BASE;
> > +            g_array_append_val(s_accel_blob, accel);
> > +        }
> > +    }
> > +    return 0;
> > +}
> > +
> >  /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
> >  static int
> >  iort_host_bridges(Object *obj, void *opaque)
> > @@ -275,30 +321,51 @@ static void
> >  build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >  {
> >      int i, nb_nodes, rc_mapping_count;
> > -    size_t node_size, smmu_offset = 0;
> > +    size_t node_size, *smmu_offset = NULL;
> >      AcpiIortIdMapping *idmap;
> > +    SMMUv3Accel *accel;
> > +    int num_smmus = 0;
> >      uint32_t id = 0;
> >      GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
> >      GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
> > +    GArray *smmuv3_accel = g_array_new(false, true, sizeof(SMMUv3Accel));
> >  
> >      AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id,
> >                          .oem_table_id = vms->oem_table_id };
> >      /* Table 2 The IORT */
> >      acpi_table_begin(&table, table_data);
> >  
> > -    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> > -        AcpiIortIdMapping next_range = {0};
> > -
> > +    nb_nodes = 2; /* RC, ITS */
> > +    if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
> > +        object_child_foreach_recursive(object_get_root(),
> > +                                       get_smmuv3_accel, smmuv3_accel);
> > +        /* Sort the smmuv3-accel by smmu idmap input_base */
> > +        g_array_sort(smmuv3_accel, smmuv3_accel_idmap_compare);
> > +
> > +        /*  Fill smmu idmap from sorted accel array */
> > +        for (i = 0; i < smmuv3_accel->len; i++) {
> > +            accel = &g_array_index(smmuv3_accel, SMMUv3Accel, i);
> > +            g_array_append_val(smmu_idmaps, accel->smmu_idmap);
> > +        }
> > +        num_smmus = smmuv3_accel->len;
> > +    } else if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> >          object_child_foreach_recursive(object_get_root(),
> >                                         iort_host_bridges, smmu_idmaps);
> >  
> >          /* Sort the smmu idmap by input_base */
> >          g_array_sort(smmu_idmaps, iort_idmap_compare);
> > +        num_smmus = 1;
> > +    }
> >  
> > -        /*
> > -         * Split the whole RIDs by mapping from RC to SMMU,
> > -         * build the ID mapping from RC to ITS directly.
> > -         */
> > +    /*
> > +     * Split the whole RIDs by mapping from RC to SMMU,
> > +     * build the ID mapping from RC to ITS directly.
> > +     */
> > +    if (num_smmus) {
> > +        AcpiIortIdMapping next_range = {0};
> > +
> > +        smmu_offset = g_new0(size_t, num_smmus);
> > +        nb_nodes += num_smmus;
> >          for (i = 0; i < smmu_idmaps->len; i++) {
> >              idmap = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
> >  
> > @@ -316,10 +383,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >              g_array_append_val(its_idmaps, next_range);
> >          }
> >  
> > -        nb_nodes = 3; /* RC, ITS, SMMUv3 */
> >          rc_mapping_count = smmu_idmaps->len + its_idmaps->len;
> >      } else {
> > -        nb_nodes = 2; /* RC, ITS */
> >          rc_mapping_count = 1;
> >      }
> >      /* Number of IORT Nodes */
> > @@ -341,10 +406,19 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >      /* GIC ITS Identifier Array */
> >      build_append_int_noprefix(table_data, 0 /* MADT translation_id */, 4);
> >  
> > -    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> > -        int irq =  vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> > +    for (i = 0; i < num_smmus; i++) {
> > +        hwaddr base;
> > +        int irq;
> > +        if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
> > +            accel = &g_array_index(smmuv3_accel, SMMUv3Accel, i);
> > +            base = accel->base;
> > +            irq = accel->irq;
> > +        } else {
> > +            base = vms->memmap[VIRT_SMMU].base;
> > +            irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> > +        }
> >  
> > -        smmu_offset = table_data->len - table.table_offset;
> > +        smmu_offset[i] = table_data->len - table.table_offset;
> >          /* Table 9 SMMUv3 Format */
> >          build_append_int_noprefix(table_data, 4 /* SMMUv3 */, 1); /* Type */
> >          node_size =  SMMU_V3_ENTRY_SIZE + ID_MAPPING_ENTRY_SIZE;
> > @@ -355,7 +429,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >          /* Reference to ID Array */
> >          build_append_int_noprefix(table_data, SMMU_V3_ENTRY_SIZE, 4);
> >          /* Base address */
> > -        build_append_int_noprefix(table_data, vms->memmap[VIRT_SMMU].base, 8);
> > +        build_append_int_noprefix(table_data, base, 8);
> >          /* Flags */
> >          build_append_int_noprefix(table_data, 1 /* COHACC Override */, 4);
> >          build_append_int_noprefix(table_data, 0, 4); /* Reserved */
> > @@ -404,15 +478,22 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >      build_append_int_noprefix(table_data, 0, 3); /* Reserved */
> >  
> >      /* Output Reference */
> > -    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> > +    if (num_smmus) {
> >          AcpiIortIdMapping *range;
> > +        size_t offset;
> >  
> >          /* translated RIDs connect to SMMUv3 node: RC -> SMMUv3 -> ITS */
> >          for (i = 0; i < smmu_idmaps->len; i++) {
> > +            if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
> > +                offset = smmu_offset[i];
> > +            } else {
> > +                offset = smmu_offset[0];

> maybe we can also use smmu_offset array for non accel mode and get rid
> of this.

I recall that my previous version did combine the two modes, i.e.
non-accel mode only uses smmu_offset[0]. Perhaps Shameer found
some mismatch between smmu_idmaps->len and num_smmus?

Nicolin



* Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-26 13:38   ` Eric Auger
@ 2025-03-26 19:16     ` Nicolin Chen
  2025-03-27  7:46       ` Shameerali Kolothum Thodi via
  2025-03-27  8:00       ` Eric Auger
  0 siblings, 2 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-26 19:16 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 26, 2025 at 02:38:04PM +0100, Eric Auger wrote:
> > +/* Update batch->ncmds to the number of execute cmds */
> > +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
> > +{
> > +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(bs);
> > +    uint32_t total = batch->ncmds;
> > +    IOMMUFDViommu *viommu_core;
> > +    int ret;
> > +
> > +    if (!bs->accel) {
> > +        return 0;
> > +    }
> > +
> > +    if (!s_accel->viommu) {
> > +        return 0;
> > +    }
> > +    viommu_core = &s_accel->viommu->core;
> > +    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
> > +                                           viommu_core->viommu_id,
> > +                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
> > +                                           sizeof(Cmd), &batch->ncmds,
> > +                                           batch->cmds);
> > +    if (total != batch->ncmds) {
> > +        error_report("%s failed: ret=%d, total=%d, done=%d",
> > +                      __func__, ret, total, batch->ncmds);
> some commands may have been executed (batch->ncmds !=0). Is the
> batch_cmds array updated accordingly? In the kernel doc I don't see any
> mention of that.

The uAPI kdoc of ioctl(IOMMU_HWPT_INVALIDATE) mentions:
 * @entry_num: Input the number of cache invalidation requests in the array.
 *             Output the number of requests successfully handled by kernel.

> Do you need to report a cmd_error as we do for some
> other cmds?

Yes, we do. And we did (in this patch)? cons would be updated:
+    if (batch->ncmds && (dev_cache != batch->dev_cache)) {
+        ret = smmuv3_accel_issue_cmd_batch(bs, batch);
+        if (ret) {
+            *cons = batch->cons[batch->ncmds];
+            return ret;
+        }
+    }
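
(For illustration only: a minimal, self-contained sketch of the in/out
entry_num behaviour and the cons fix-up quoted above. DemoBatch,
demo_invalidate() and demo_issue() are hypothetical stand-ins, not the
iommufd or QEMU API, and the "kernel" here simply stops after `limit`
entries.)

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BATCH_MAX 8

/* Hypothetical stand-in for SMMUCommandBatch; only the fields used here. */
typedef struct {
    uint32_t cons[DEMO_BATCH_MAX]; /* queue CONS index recorded per command */
    uint32_t ncmds;                /* in: commands queued; out: commands handled */
} DemoBatch;

/*
 * Hypothetical stand-in for the invalidation ioctl: handles at most
 * `limit` entries, writes the handled count back through entry_num and
 * returns non-zero if it stopped early.
 */
int demo_invalidate(uint32_t *entry_num, uint32_t limit)
{
    if (*entry_num > limit) {
        *entry_num = limit;
        return -1;
    }
    return 0;
}

/* Issue the batch; on failure point *cons at the first unhandled command. */
int demo_issue(DemoBatch *batch, uint32_t limit, uint32_t *cons)
{
    int ret;

    assert(batch->ncmds <= DEMO_BATCH_MAX);
    ret = demo_invalidate(&batch->ncmds, limit);
    if (ret) {
        /* ncmds now counts handled entries, so it indexes the first failure. */
        *cons = batch->cons[batch->ncmds];
        return ret;
    }
    batch->ncmds = 0;
    return 0;
}
```

With four queued commands whose recorded CONS values are 10..13 and a
"kernel" that handles only two, demo_issue() returns non-zero, leaves
ncmds at 2 and sets *cons to 12. The command array itself is not touched
in this sketch.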

> > +        return ret;
> > +    }
> > +
> > +    batch->ncmds = 0;
> > +    batch->dev_cache = false;
> > +    return ret;
> > +}
> > +
> > +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
> I was confused by the name. The helper adds a single Cmd to the batch,
> right? so batch_cmd would better suited.

Yea, it could be "smmuv3_accel_batch_cmd".

> > +                            SMMUCommandBatch *batch, Cmd *cmd,
> > +                            uint32_t *cons, bool dev_cache)
> > +{
> > +    int ret;
> > +
> > +    if (!bs->accel) {
> > +        return 0;
> > +    }
> > +
> > +    if (sdev) {
> > +        SMMUv3AccelDevice *accel_dev;
> > +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
> > +        if (!accel_dev->s1_hwpt) {

> can it happen? in the positive can you add some comment to describe in
> which condition?

I recall this is for device cache specifically (i.e. CFGI_CD[_ALL]
and ATC_INV) that I had in smmuv3_cmdq_consume(). Perhaps it gets
here because Shameer separated the accel code from the non-accel
smmuv3 file.

This condition is to check if the device is attached to an accel
HWPT, particularly to exclude commands being issued for emulated
devices. Surely, if a device isn't attached to an accel stage-1
HWPT any more, we probably shouldn't forward the commands to the
kernel? Though I start to suspect that we might need a lock for
accel_dev->s1_hwpt?

> > +/**
> > + * SMMUCommandBatch - batch of invalidation commands for smmuv3-accel
> > + * @cmds: Pointer to list of commands
> > + * @cons: Pointer to list of CONS corresponding to the commands
> > + * @ncmds: Total ncmds in the batch

> number of commands

OK.

> > + * @dev_cache: Issue to a device cache

> indicate whether the invalidation command batch targets device cache?

Maybe "invalidation command batch targeting device cache or TLB".

Thanks
Nicolin



* Re: [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-03-26 14:16   ` Eric Auger
@ 2025-03-26 19:27     ` Nicolin Chen
  2025-03-27  8:03       ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-26 19:27 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 26, 2025 at 03:16:18PM +0100, Eric Auger wrote:
> > @@ -1395,6 +1403,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
> >  
> >              trace_smmuv3_cmdq_cfgi_cd(sid);
> >              smmuv3_flush_config(sdev);
> > +
> > +            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
> > +                                        &q->cons, true)) {
> > +                cmd_error = SMMU_CERROR_ILL;
> I understand you collect all batchable commands all together (those
> sharing the same dev_cache prop) and the batch is executed either when
> the cache target changes or at the very end of the queue consumption.
> Since you don't batch all kinds of commands don't you have a risk to
> send commands out of order?

Yes, that could happen. But would it have some real risk?

This practice assumes that the guest OS groups each batch with a
proper CMD_SYNC, as Linux does, which reduces the number of
ioctls. If we can think of some real risk when the guest OS
doesn't, then yes, I think we would have to flush the batch
whenever a non-accel command appears in between.
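
(For illustration only: a minimal sketch of that stricter flush policy.
All names are hypothetical and the command classes are not the real
SMMUv3 opcodes; the flush just counts how often the batch would be
issued.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command classes; not the real SMMUv3 opcodes. */
typedef enum {
    DEMO_CMD_TLB,       /* batchable, targets the TLB */
    DEMO_CMD_DEV_CACHE, /* batchable, targets a device cache */
    DEMO_CMD_OTHER,     /* handled by emulation only, never batched */
} DemoCmdClass;

typedef struct {
    size_t ncmds;    /* commands currently pending in the batch */
    bool dev_cache;  /* cache target shared by the pending commands */
    size_t flushes;  /* how many times the batch was issued */
} DemoBatcher;

/* Issue whatever is pending; a real version would call into the kernel. */
void demo_flush(DemoBatcher *b)
{
    assert(b != NULL);
    if (b->ncmds) {
        b->flushes++;
        b->ncmds = 0;
    }
}

/*
 * Feed one command. The batch is flushed when the cache target changes
 * and, in this stricter variant, also before any non-batched command, so
 * pending invalidations are issued before a later command is processed.
 */
void demo_consume(DemoBatcher *b, DemoCmdClass cls)
{
    bool dev_cache;

    assert(b != NULL);
    if (cls == DEMO_CMD_OTHER) {
        demo_flush(b);
        return;
    }

    dev_cache = (cls == DEMO_CMD_DEV_CACHE);
    if (b->ncmds && dev_cache != b->dev_cache) {
        demo_flush(b);
    }
    b->dev_cache = dev_cache;
    b->ncmds++;
}
```

The cost of the stricter variant is one extra flush per interleaved
non-batched command, so a guest that already groups its invalidations
before CMD_SYNC would see no additional ioctls.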

Thanks
Nicolin



* Re: [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  2025-03-26 17:18   ` Eric Auger
@ 2025-03-26 19:46     ` Nicolin Chen
  2025-03-27  7:54       ` Shameerali Kolothum Thodi via
  2025-03-27 13:05     ` Jason Gunthorpe
  1 sibling, 1 reply; 145+ messages in thread
From: Nicolin Chen @ 2025-03-26 19:46 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 26, 2025 at 06:18:49PM +0100, Eric Auger wrote:
> > @@ -561,6 +561,16 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
> >  
> >      decode_ste_config(cfg, config);
> >  
> > +      /* S1DSS.Terminate is same as Config.abort for default stream */
> 
> S1DSS. Termination

The spec uses "Terminate":
"• 0b00: Terminate. An abort is reported to the device and the
                     F_STREAM_DISABLED event is recorded."

> 
> > +    if (STE_CFG_S1_ENABLED(config) && STE_S1DSS(ste) == 0) {
> > +        cfg->aborted = true;
> The spec says:
> "
> When substreams are enabled (STE.S1CDMax != 0), this field determines
> the behavior of a transaction or
> translation request that arrives without an associated substream:
> "
> So I understand you should also check STE.S1CDMax.

Yea, that's missing, as spec also says:
"If Config[0] == 0 (stage 1 disabled) or S1CDMax == 0 (substreams
 disabled) or SMMU_IDR1.SSIDSIZE == 0 (substreams unsupported),
 this field is IGNORED."
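
(For illustration only: a minimal sketch of how the three "IGNORED"
conditions could gate the S1DSS decode for a transaction without a
substream ID. The names are hypothetical. Only 0b00 and 0b01 are
confirmed by the text and patch in this thread; the 0b10 and 0b11
meanings are my reading of the spec and should be treated as an
assumption.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outcome for a transaction that arrives without a substream ID. */
typedef enum {
    DEMO_S1DSS_IGNORED,   /* field does not apply, use the normal path */
    DEMO_S1DSS_TERMINATE, /* 0b00: abort, F_STREAM_DISABLED recorded */
    DEMO_S1DSS_BYPASS,    /* 0b01: stage 1 is bypassed */
    DEMO_S1DSS_SSID0,     /* 0b10: handled as substream 0 (assumption) */
    DEMO_S1DSS_RESERVED,  /* 0b11 (assumption) */
} DemoS1dssAction;

DemoS1dssAction demo_s1dss_action(bool s1_enabled, uint32_t s1cdmax,
                                  uint32_t ssidsize, uint32_t s1dss)
{
    assert(s1dss <= 0x3); /* S1DSS is a 2-bit field */

    /* The three conditions under which the field is ignored. */
    if (!s1_enabled || s1cdmax == 0 || ssidsize == 0) {
        return DEMO_S1DSS_IGNORED;
    }

    switch (s1dss) {
    case 0x0:
        return DEMO_S1DSS_TERMINATE;
    case 0x1:
        return DEMO_S1DSS_BYPASS;
    case 0x2:
        return DEMO_S1DSS_SSID0;
    default:
        return DEMO_S1DSS_RESERVED;
    }
}
```

For example, demo_s1dss_action(true, 0, 5, 0) yields DEMO_S1DSS_IGNORED
because S1CDMax == 0, whereas the same call with S1CDMax == 1 yields
DEMO_S1DSS_TERMINATE.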

> Also how do we check
> that the incoming transaction arrives without a substream?
> 
> In general shouldn't we add the support of substreams in the emulated
> code too?
> 
> the spec also says that in that case you should record a
> F_STREAM_DISABLED event.

Yea, that's seemingly missing too, for value 0b10.

> > +    }
> > +
> > +    /* S1DSS.Bypass is same as Config.bypass for default stream */

> S1DSS. Bypass

Perhaps, instead of FIELD.VALUE format, we should do:
	FIELD=VALUE?
e.g. S1DSS=Terminate (0b00) and S1DSS=Bypass (0b01).

> > +    if (STE_CFG_S1_ENABLED(config) && STE_S1DSS(ste) == 0x1) {
> > +        cfg->bypassed = true;
> > +    }
> > +
> >      if (cfg->aborted || cfg->bypassed) {
> >          return 0;
> >      }
> > @@ -598,13 +608,14 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
> >          }
> >      }
> >  
> > -    if (STE_S1CDMAX(ste) != 0) {
> > +    if (!FIELD_EX32(s->idr[1], IDR1, SSIDSIZE) && STE_S1CDMAX(ste) != 0) {
> >          qemu_log_mask(LOG_UNIMP,
> >                        "SMMUv3 does not support multiple context descriptors yet\n");
> the log message should be different because it becomes a guest error:
> qemu_log_mask(LOG_GUEST_ERROR, "invalid S1CDMAX as SSIDSIZE==0") or
> something alike
> 
> >          goto bad_ste;
> >      }
> >  
> > -    if (STE_S1STALLD(ste)) {
> > +    /* STALL_MODEL being 0b01 means "stall is not supported" */
> > +    if ((FIELD_EX32(s->idr[0], IDR0, STALL_MODEL) & 0x1) && STE_S1STALLD(ste)) {
> >          qemu_log_mask(LOG_UNIMP,
> >                        "SMMUv3 S1 stalling fault model not allowed yet\n");
> same here.
> 
> Again I think we need to understand the consequence of having a more
> comprehensive support of SSID. This also holds with old the IDR fields
> that may be inherited from the HW and we don't support yet in the
> emulation code

To support guest-level SVA, it must support SSID. We can keep
SSIDSIZE=0 in an emulated SMMU. Would you elaborate on the concern
with doing so?

Thanks
Nicolin



* Re: [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3
  2025-03-26 17:40   ` Eric Auger
@ 2025-03-26 19:57     ` Nicolin Chen
  0 siblings, 0 replies; 145+ messages in thread
From: Nicolin Chen @ 2025-03-26 19:57 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 26, 2025 at 06:40:10PM +0100, Eric Auger wrote:
> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> >
> > If a vSMMU is configured as an accelerated one, the HW IOTLB will be
> > used and all cache invalidation should be done to the HW IOTLB too, vs.
> > the emulated iotlb. In this case, an iommu notifier isn't registered,
> > as the devices behind a SMMUv3-accel would stay in the system address
> > space for stage-2 mappings.
> >
> > However, the KVM code still requests an iommu address space to translate
> > an MSI doorbell gIOVA via get_msi_address_space() and translate().
> In case we use flat MSI mapping, can't we get rid of that problem?
> 
> Sorry, but I don't really understand the problem here. Could you please
> elaborate?

With RMR, the HW is doing flat mapping for stage-1, but the guest
isn't doing a 1:1 mapping.

The guest maps a gIOVA to the IPA of the vITS page (IIRC, 0x8090000),
while the PCI HW is programmed with the RMR IOVA (0x8000000).

The translation part works well with the flat mapping alone, while
the vIRQ injection part (done by KVM) has to update the vITS page.

The details are in kvm_arch_fixup_msi_route() that uses the iommu
address space to translate the gIOVA (being programmed to the guest
level PCI) to the IPA of the vITS page.

Thanks
Nicolin



* RE: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-26 19:16     ` Nicolin Chen
@ 2025-03-27  7:46       ` Shameerali Kolothum Thodi via
  2025-03-27  8:00       ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-27  7:46 UTC (permalink / raw)
  To: Nicolin Chen, Eric Auger
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 26, 2025 7:16 PM
> To: Eric Auger <eric.auger@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers
> to batch and issue cache invalidations
> 
 
> > can it happen? in the positive can you add some comment to describe in
> > which condition?
> 
> I recall this is for device cache specifically (i.e. CFGI_CD[_ALL]
> and ATC_INV) that I had in smmuv3_cmdq_consume(). Perhaps it gets
> here because Shameer separated the accel code from the non-accel
> smmuv3 file.

Yes. It is because I moved the code around.
> 
> This condition is to check if the device is attached to an accel
> HWPT, particularly to exclude commands being issued for emulated
> devices. Surely, if a device isn't attached to an accel stage-1
> HWPT any more, we probably shouldn't forward the commands to the
> kernel? Though I start to suspect that we might need a lock for
> accel_dev->s1_hwpt?

I will double-check whether we require this or not.

Thanks,
Shameer



* RE: [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  2025-03-26 19:46     ` Nicolin Chen
@ 2025-03-27  7:54       ` Shameerali Kolothum Thodi via
  2025-03-27  9:11         ` Eric Auger
  0 siblings, 1 reply; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-27  7:54 UTC (permalink / raw)
  To: Nicolin Chen, Eric Auger
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 26, 2025 7:47 PM
> To: Eric Auger <eric.auger@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for
> STE_S1CDMAX and STE_S1STALLD
> 

> > Again I think we need to understand the consequence of having a more
> > comprehensive support of SSID. This also holds with old the IDR fields
> > that may be inherited from the HW and we don't support yet in the
> > emulation code
> 
> To support guest-level SVA, it must support SSID. We can keep the
> SSIDSIZE=0 in an emulated SMMU. Would you elaborate the concern of
> doing so?
> 
Regarding adding support for SSID/SVA in the emulation code, that support also depends on
the device PRI/IOPF feature. Do we have any emulated devices that can make use of
this? I would say we can add that support later if there are any real use cases for it.

Thanks,
Shameer 


* Re: [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations
  2025-03-26 19:16     ` Nicolin Chen
  2025-03-27  7:46       ` Shameerali Kolothum Thodi via
@ 2025-03-27  8:00       ` Eric Auger
  1 sibling, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-27  8:00 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/26/25 8:16 PM, Nicolin Chen wrote:
> On Wed, Mar 26, 2025 at 02:38:04PM +0100, Eric Auger wrote:
>>> +/* Update batch->ncmds to the number of execute cmds */
>>> +int smmuv3_accel_issue_cmd_batch(SMMUState *bs, SMMUCommandBatch *batch)
>>> +{
>>> +    SMMUv3AccelState *s_accel = ARM_SMMUV3_ACCEL(bs);
>>> +    uint32_t total = batch->ncmds;
>>> +    IOMMUFDViommu *viommu_core;
>>> +    int ret;
>>> +
>>> +    if (!bs->accel) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (!s_accel->viommu) {
>>> +        return 0;
>>> +    }
>>> +    viommu_core = &s_accel->viommu->core;
>>> +    ret = iommufd_backend_invalidate_cache(viommu_core->iommufd,
>>> +                                           viommu_core->viommu_id,
>>> +                                           IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3,
>>> +                                           sizeof(Cmd), &batch->ncmds,
>>> +                                           batch->cmds);
>>> +    if (total != batch->ncmds) {
>>> +        error_report("%s failed: ret=%d, total=%d, done=%d",
>>> +                      __func__, ret, total, batch->ncmds);
>> some commands may have been executed (batch->ncmds !=0). Is the
>> batch_cmds array updated accordingly? In the kernel doc I don't see any
>> mention of that.
> The uAPI kdoc of ioctl(IOMMU_HWPT_INVALIDATE) mentions:
>  * @entry_num: Input the number of cache invalidation requests in the array.
>  *             Output the number of requests successfully handled by kernel.
I was rather talking about the array of cmds itself, but I guess it is
left unchanged. I don't know whether we are supposed to retry sending the
failed cmds or not.
>
>> Do you need to report a cmd_error as we do for some
>> other cmds?
> Yes, we do. And we did (in this patch)? cons would be updated:
> +    if (batch->ncmds && (dev_cache != batch->dev_cache)) {
> +        ret = smmuv3_accel_issue_cmd_batch(bs, batch);
> +        if (ret) {
> +            *cons = batch->cons[batch->ncmds];
> +            return ret;
cons is updated but the error is not logged in this patch.
> +        }
> +    }
>
>>> +        return ret;
>>> +    }
>>> +
>>> +    batch->ncmds = 0;
>>> +    batch->dev_cache = false;
>>> +    return ret;
>>> +}
>>> +
>>> +int smmuv3_accel_batch_cmds(SMMUState *bs, SMMUDevice *sdev,
>> I was confused by the name. The helper adds a single Cmd to the batch,
>> right? so batch_cmd would better suited.
> Yea, it could be "smmuv3_accel_batch_cmd".
>
>>> +                            SMMUCommandBatch *batch, Cmd *cmd,
>>> +                            uint32_t *cons, bool dev_cache)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (!bs->accel) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (sdev) {
>>> +        SMMUv3AccelDevice *accel_dev;
>>> +        accel_dev = container_of(sdev, SMMUv3AccelDevice, sdev);
>>> +        if (!accel_dev->s1_hwpt) {
>> can it happen? in the positive can you add some comment to describe in
>> which condition?
> I recall this is for device cache specifically (i.e. CFGI_CD[_ALL]
> and ATC_INV) that I had in smmuv3_cmdq_consume(). Perhaps it gets
> here because Shameer separated the accel code from the non-accel
> smmuv3 file.
>
> This condition is to check if the device is attached to an accel
> HWPT, particularly to exclude commands being issued for emulated
> devices. Surely, if a device isn't attached to an accel stage-1
> HWPT any more, we probably shouldn't forward the commands to the
> kernel? Though I start to suspect that we might need a lock for
> accel_dev->s1_hwpt?
Yes, it is worth digging into this and adding a comment.

Thanks

Eric
>
>>> +/**
>>> + * SMMUCommandBatch - batch of invalidation commands for smmuv3-accel
>>> + * @cmds: Pointer to list of commands
>>> + * @cons: Pointer to list of CONS corresponding to the commands
>>> + * @ncmds: Total ncmds in the batch
>> number of commands
> OK.
>
>>> + * @dev_cache: Issue to a device cache
>> indicate whether the invalidation command batch targets device cache?
> Maybe "invalidation command batch targeting device cache or TLB".
>
> Thanks
> Nicolin
>




* Re: [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw
  2025-03-26 19:27     ` Nicolin Chen
@ 2025-03-27  8:03       ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-27  8:03 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao



On 3/26/25 8:27 PM, Nicolin Chen wrote:
> On Wed, Mar 26, 2025 at 03:16:18PM +0100, Eric Auger wrote:
>>> @@ -1395,6 +1403,13 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>>>  
>>>              trace_smmuv3_cmdq_cfgi_cd(sid);
>>>              smmuv3_flush_config(sdev);
>>> +
>>> +            if (smmuv3_accel_batch_cmds(sdev->smmu, sdev, &batch, &cmd,
>>> +                                        &q->cons, true)) {
>>> +                cmd_error = SMMU_CERROR_ILL;
>> I understand you collect all batchable commands all together (those
>> sharing the same dev_cache prop) and the batch is executed either when
>> the cache target changes or at the very end of the queue consumption.
>> Since you don't batch all kinds of commands don't you have a risk to
>> send commands out of order?
> Yes, that could happen. But would it have some real risk?

OK. I don't know but this needs to be studied.

Eric
>
> This practice has an assumption that the guest OS would group
> each batch with a proper CMD_SYNC like Linux does. So it could
> reduce the amount of ioctls. If we can think of some real risk
> when the guest OS doesn't, yes, I think we would have to flush
> the batch if any non-accel command appear in-between.
>
> Thanks
> Nicolin
>




* Re: [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  2025-03-27  7:54       ` Shameerali Kolothum Thodi via
@ 2025-03-27  9:11         ` Eric Auger
  0 siblings, 0 replies; 145+ messages in thread
From: Eric Auger @ 2025-03-27  9:11 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Nicolin Chen
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org

Hi,

On 3/27/25 8:54 AM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Wednesday, March 26, 2025 7:47 PM
>> To: Eric Auger <eric.auger@redhat.com>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for
>> STE_S1CDMAX and STE_S1STALLD
>>
>>> Again I think we need to understand the consequence of having a more
>>> comprehensive support of SSID. This also holds with old the IDR fields
>>> that may be inherited from the HW and we don't support yet in the
>>> emulation code
>> To support guest-level SVA, it must support SSID. We can keep the
>> SSIDSIZE=0 in an emulated SMMU. Would you elaborate the concern of
>> doing so?
I just want to make sure we dissociate both accel and emulated paths and
we do not advertise SSID in one mode while we do not fully support it.
>>
> Regarding adding support for SSID/SVA in emulation code, the support also depends on
> device PRI/IOPF feature as well. Do we have any emulated devices that can make use
> this? I would say we can add that support later if there is any real use cases for that.

x86 may be ahead of us in this area. Maybe this was tested by Zhenzhong
when contributing emulation for S1 support in intel_iommu?

Eric
>
> Thanks,
> Shameer 




* RE: [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
  2025-03-26 18:50     ` Nicolin Chen
@ 2025-03-27  9:26       ` Shameerali Kolothum Thodi via
  0 siblings, 0 replies; 145+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-27  9:26 UTC (permalink / raw)
  To: Nicolin Chen, Eric Auger
  Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
	berrange@redhat.com, nathanc@nvidia.com, mochs@nvidia.com,
	smostafa@google.com, Linuxarm, Wangzhou (B), jiangkunkun,
	Jonathan Cameron, zhangfei.gao@linaro.org



> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 26, 2025 6:51 PM
> To: Eric Auger <eric.auger@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with
> multiple smmuv3-accel nodes
> 

> > >          for (i = 0; i < smmu_idmaps->len; i++) {
> > > +            if (vms->iommu == VIRT_IOMMU_SMMUV3_ACCEL) {
> > > +                offset = smmu_offset[i];
> > > +            } else {
> > > +                offset = smmu_offset[0];
> 
> > maybe we can also use smmu_offset array for non accel mode and get rid
> > of this.
> 
> I recall that my previous version does combine two modes, i.e.
> non-accel mode only uses smmu_offset[0]. Perhaps Shameer found some
> mismatch between smmu_idmaps->len and num_smmus?

Perhaps I did 😊. I think it was for a case where there were multiple host bridges
associated with iommu=smmuv3. I will revisit this to see if it can be simplified.

By the way, thanks to both of you (and others, of course!) for going through the series.
I will consolidate the comments and rework the series soon.

Thanks,
Shameer



* Re: [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  2025-03-26 17:18   ` Eric Auger
  2025-03-26 19:46     ` Nicolin Chen
@ 2025-03-27 13:05     ` Jason Gunthorpe
  1 sibling, 0 replies; 145+ messages in thread
From: Jason Gunthorpe @ 2025-03-27 13:05 UTC (permalink / raw)
  To: Eric Auger
  Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, nicolinc,
	ddutile, berrange, nathanc, mochs, smostafa, linuxarm, wangzhou1,
	jiangkunkun, jonathan.cameron, zhangfei.gao

On Wed, Mar 26, 2025 at 06:18:49PM +0100, Eric Auger wrote:
> Again I think we need to understand the consequence of having a more
> comprehensive support of SSID. This also holds with old the IDR fields
> that may be inherited from the HW and we don't support yet in the
> emulation code

To be very clear, and this is in one of the uapi header comments, the
vmm should not be copying IDR fields blindly. It should refer to the
physical HW IDR to build the virtual one only for bits which make
sense and match its own paravirtualization capabilities.
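
(For illustration only: a minimal sketch of building one virtual IDR
field from the host value instead of copying the register. The function
and macro names are hypothetical, and the SSIDSIZE position at bits
[10:6] of SMMU_IDR1 is taken from my reading of the SMMUv3 spec, so
treat it as an assumption.)

```c
#include <assert.h>
#include <stdint.h>

/* SMMU_IDR1.SSIDSIZE at bits [10:6] (assumed from the SMMUv3 spec). */
#define DEMO_IDR1_SSIDSIZE_SHIFT 6
#define DEMO_IDR1_SSIDSIZE_MASK  (0x1fu << DEMO_IDR1_SSIDSIZE_SHIFT)

/*
 * Build the guest-visible IDR1 from the VMM's own baseline value.
 * Only SSIDSIZE is derived from the host register, capped at what the
 * VMM itself can support; every other host bit is deliberately dropped.
 */
uint32_t demo_build_vidr1(uint32_t vmm_idr1, uint32_t host_idr1,
                          uint32_t vmm_max_ssidsize)
{
    uint32_t host_ssidsize =
        (host_idr1 & DEMO_IDR1_SSIDSIZE_MASK) >> DEMO_IDR1_SSIDSIZE_SHIFT;
    uint32_t ssidsize = host_ssidsize < vmm_max_ssidsize ?
                        host_ssidsize : vmm_max_ssidsize;

    assert(vmm_max_ssidsize <= 0x1f); /* SSIDSIZE is a 5-bit field */
    return (vmm_idr1 & ~DEMO_IDR1_SSIDSIZE_MASK) |
           (ssidsize << DEMO_IDR1_SSIDSIZE_SHIFT);
}
```

Passing vmm_max_ssidsize = 0 keeps SSIDSIZE at 0 in the virtual
register regardless of what the host reports, which corresponds to the
emulated case discussed in this thread.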

Also, I don't think any of the emulation SW in qemu can use a pasid,
so isn't it OK to just ignore the non-zero SSIDs in the CD table, and
continue to advertise a vPCI device without a PASID cap?

Jason



end of thread, other threads:[~2025-03-27 13:06 UTC | newest]

Thread overview: 145+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-11 14:10 [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum via
2025-03-11 14:10 ` [RFC PATCH v2 01/20] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum via
2025-03-12 15:20   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 02/20] backends/iommufd: Introduce iommufd_vdev_alloc Shameer Kolothum via
2025-03-12 15:25   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 03/20] hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel device Shameer Kolothum via
2025-03-11 20:13   ` Nicolin Chen
2025-03-12 15:15   ` Eric Auger
2025-03-17 17:54     ` Nicolin Chen
2025-03-17 18:07       ` Eric Auger
2025-03-17 19:10         ` Nicolin Chen
2025-03-17 19:24           ` Jason Gunthorpe
2025-03-17 20:19             ` Nicolin Chen
2025-03-18  9:50               ` Shameerali Kolothum Thodi via
2025-03-18 18:31               ` Eric Auger
2025-03-18 19:13                 ` Nicolin Chen
2025-03-18 21:22                   ` Donald Dutile
2025-03-19  0:23                     ` Jason Gunthorpe
2025-03-19  2:15                       ` Donald Dutile
2025-03-19 17:00                       ` Eric Auger
2025-03-19 17:12                         ` Shameerali Kolothum Thodi via
2025-03-19 17:38                           ` Eric Auger
2025-03-21  0:55                         ` Donald Dutile
2025-03-19 17:04                     ` Eric Auger
2025-03-21  0:54                       ` Donald Dutile
2025-03-24 14:52                         ` Eric Auger
2025-03-19  0:31                 ` Jason Gunthorpe
2025-03-19  5:27                   ` Nicolin Chen
2025-03-24 14:08                   ` Eric Auger
2025-03-18 21:42             ` Donald Dutile
2025-03-19 16:45           ` Eric Auger
2025-03-19 16:53             ` Shameerali Kolothum Thodi via
2025-03-19 17:26               ` Eric Auger
2025-03-19 17:34                 ` Jason Gunthorpe
2025-03-19 17:41                   ` Eric Auger
2025-03-19 17:14             ` Nicolin Chen
2025-03-19 18:09               ` Eric Auger
2025-03-19 18:34                 ` Nicolin Chen
2025-03-24 14:46                   ` Eric Auger
2025-03-21  1:26                 ` Donald Dutile
2025-03-24 14:59                   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 04/20] hw/arm/virt: Add support for smmuv3-accel Shameer Kolothum via
2025-03-11 20:22   ` Nicolin Chen
2025-03-12  9:44     ` Shameerali Kolothum Thodi via
2025-03-12 15:36   ` Eric Auger
2025-03-12 15:46     ` Shameerali Kolothum Thodi via
2025-03-12 16:13       ` Eric Auger
2025-03-12 16:22         ` Shameerali Kolothum Thodi via
2025-03-12 16:27           ` Eric Auger
2025-03-12 17:34             ` Shameerali Kolothum Thodi via
2025-03-12 18:30               ` Eric Auger
2025-03-13  8:26                 ` Shameerali Kolothum Thodi via
2025-03-13 15:22                   ` Eric Auger
2025-03-18 22:49   ` Donald Dutile
2025-03-19  9:28     ` Shameerali Kolothum Thodi via
2025-03-11 14:10 ` [RFC PATCH v2 05/20] hw/arm/smmuv3-accel: Associate a pxb-pcie bus Shameer Kolothum via
2025-03-12 16:07   ` Eric Auger
2025-03-12 16:34     ` Shameerali Kolothum Thodi via
2025-03-12 16:39       ` Daniel P. Berrangé
2025-03-12 17:28         ` Shameerali Kolothum Thodi via
2025-03-13 15:21           ` Eric Auger
2025-03-12 16:42       ` Eric Auger
2025-03-13  8:22         ` Shameerali Kolothum Thodi via
2025-03-17 16:57           ` Eric Auger
2025-03-18 22:12   ` Donald Dutile
2025-03-19  9:26     ` Shameerali Kolothum Thodi via
2025-03-19 16:21       ` Donald Dutile
2025-03-19 18:21         ` Eric Auger
2025-03-21  0:59           ` Donald Dutile
2025-03-24 14:56             ` Eric Auger
2025-03-24 15:02               ` Daniel P. Berrangé
2025-03-24 21:43               ` Donald Dutile
2025-03-20 17:02       ` Nicolin Chen
2025-03-24  8:19         ` Shameerali Kolothum Thodi via
2025-03-24 13:13           ` Eric Auger
2025-03-24 13:55             ` Shameerali Kolothum Thodi via
2025-03-24 15:34               ` Eric Auger
2025-03-24 16:01             ` Nicolin Chen
2025-03-24 16:06               ` Shameerali Kolothum Thodi via
2025-03-24 15:50           ` Nicolin Chen
2025-03-11 14:10 ` [RFC PATCH v2 06/20] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum via
2025-03-12 16:12   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 07/20] hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps Shameer Kolothum via
2025-03-12 16:23   ` Eric Auger
2025-03-13  8:09     ` Shameerali Kolothum Thodi via
2025-03-17 16:52       ` Eric Auger
2025-03-18  9:47         ` Shameerali Kolothum Thodi via
2025-03-12 17:10   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 08/20] hw/arm/smmuv3-accel: Provide get_address_space callback Shameer Kolothum via
2025-03-11 20:50   ` Nicolin Chen
2025-03-12 17:14   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 09/20] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum via
2025-03-11 21:07   ` Nicolin Chen
2025-03-17  8:38     ` Shameerali Kolothum Thodi via
2025-03-17 18:19       ` Nicolin Chen
2025-03-12 12:52   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 10/20] hw/arm/smmuv3-accel: Support nested STE install/uninstall support Shameer Kolothum via
2025-03-25 18:08   ` Eric Auger
2025-03-25 19:33     ` Nicolin Chen
2025-03-11 14:10 ` [RFC PATCH v2 11/20] hw/arm/smmuv3-accel: Allocate a vDEVICE object for device Shameer Kolothum via
2025-03-18 23:30   ` Donald Dutile
2025-03-25 18:13   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 12/20] hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed Shameer Kolothum via
2025-03-25 18:47   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 13/20] hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache invalidations Shameer Kolothum via
2025-03-19  1:31   ` Donald Dutile
2025-03-19  9:48     ` Shameerali Kolothum Thodi via
2025-03-19 16:24       ` Donald Dutile
2025-03-19 16:48         ` Nicolin Chen
2025-03-26 13:38   ` Eric Auger
2025-03-26 19:16     ` Nicolin Chen
2025-03-27  7:46       ` Shameerali Kolothum Thodi via
2025-03-27  8:00       ` Eric Auger
2025-03-26 13:59   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 14/20] hw/arm/smmuv3: Install nested ste for CFGI_STE Shameer Kolothum via
2025-03-26 13:39   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 15/20] hw/arm/smmuv3: Forward invalidation commands to hw Shameer Kolothum via
2025-03-26 14:16   ` Eric Auger
2025-03-26 19:27     ` Nicolin Chen
2025-03-27  8:03       ` Eric Auger
2025-03-26 14:18   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 16/20] hw/arm/smmuv3-accel: Read host SMMUv3 device info Shameer Kolothum via
2025-03-19  2:45   ` Donald Dutile
2025-03-26 14:57   ` Eric Auger
2025-03-11 14:10 ` [RFC PATCH v2 17/20] hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD Shameer Kolothum via
2025-03-26 17:18   ` Eric Auger
2025-03-26 19:46     ` Nicolin Chen
2025-03-27  7:54       ` Shameerali Kolothum Thodi via
2025-03-27  9:11         ` Eric Auger
2025-03-27 13:05     ` Jason Gunthorpe
2025-03-11 14:10 ` [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3 Shameer Kolothum via
2025-03-19  2:52   ` Donald Dutile
2025-03-26 17:40   ` Eric Auger
2025-03-26 19:57     ` Nicolin Chen
2025-03-11 14:10 ` [RFC PATCH v2 19/20] hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes Shameer Kolothum via
2025-03-26 18:14   ` Eric Auger
2025-03-26 18:50     ` Nicolin Chen
2025-03-27  9:26       ` Shameerali Kolothum Thodi via
2025-03-11 14:10 ` [RFC PATCH v2 20/20] hw/arm/smmuv3-accel: Enable smmuv3-accel creation Shameer Kolothum via
2025-03-19 16:40 ` [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Philippe Mathieu-Daudé
2025-03-19 17:13   ` Eric Auger
2025-03-25 14:42 ` Eric Auger
2025-03-25 15:43   ` Shameerali Kolothum Thodi via
2025-03-25 18:26     ` Nicolin Chen via
2025-03-25 18:52       ` Eric Auger
