* [RFC PATCH 1/5] hw/arm/virt: Add an SMMU_IO_LEN macro
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
@ 2024-11-08 12:52 ` Shameer Kolothum via
2024-11-13 16:48 ` Eric Auger
2024-11-08 12:52 ` [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device Shameer Kolothum via
` (9 subsequent siblings)
10 siblings, 1 reply; 150+ messages in thread
From: Shameer Kolothum via @ 2024-11-08 12:52 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, linuxarm,
wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
From: Nicolin Chen <nicolinc@nvidia.com>
A following patch will add a new MMIO region for nested SMMU instances.
This macro will be repeatedly used to set offsets and MMIO sizes in both
virt and virt-acpi-build.
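As a purely illustrative sketch of the second consumer (hypothetical code: the
build_smmuv3_node() helper is made up, and VIRT_SMMU_NESTED/smmu_nested_count
only appear later in this series), per-instance layout in virt-acpi-build could
derive every base address from the same macro:

    for (int i = 0; i < vms->smmu_nested_count; i++) {
        /* one SMMU_IO_LEN-sized window and one IRQ group per instance */
        hwaddr base = vms->memmap[VIRT_SMMU_NESTED].base + i * SMMU_IO_LEN;
        int irq = vms->irqmap[VIRT_SMMU_NESTED] + i * NUM_SMMU_IRQS;

        build_smmuv3_node(table_data, base, SMMU_IO_LEN, irq);
    }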
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
hw/arm/virt.c | 2 +-
include/hw/arm/virt.h | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 719e83e6a1..780bcff77c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,7 +174,7 @@ static const MemMapEntry base_memmap[] = {
[VIRT_FW_CFG] = { 0x09020000, 0x00000018 },
[VIRT_GPIO] = { 0x09030000, 0x00001000 },
[VIRT_UART1] = { 0x09040000, 0x00001000 },
- [VIRT_SMMU] = { 0x09050000, 0x00020000 },
+ [VIRT_SMMU] = { 0x09050000, SMMU_IO_LEN },
[VIRT_PCDIMM_ACPI] = { 0x09070000, MEMORY_HOTPLUG_IO_LEN },
[VIRT_ACPI_GED] = { 0x09080000, ACPI_GED_EVT_SEL_LEN },
[VIRT_NVDIMM_ACPI] = { 0x09090000, NVDIMM_ACPI_IO_LEN},
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ab961bb6a9..46f48fe561 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -47,6 +47,9 @@
/* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
#define PVTIME_SIZE_PER_CPU 64
+/* MMIO region size for SMMUv3 */
+#define SMMU_IO_LEN 0x20000
+
enum {
VIRT_FLASH,
VIRT_MEM,
--
2.34.1
* Re: [RFC PATCH 1/5] hw/arm/virt: Add an SMMU_IO_LEN macro
2024-11-08 12:52 ` [RFC PATCH 1/5] hw/arm/virt: Add an SMMU_IO_LEN macro Shameer Kolothum via
@ 2024-11-13 16:48 ` Eric Auger
0 siblings, 0 replies; 150+ messages in thread
From: Eric Auger @ 2024-11-13 16:48 UTC (permalink / raw)
To: Shameer Kolothum, qemu-arm, qemu-devel
Cc: peter.maydell, jgg, nicolinc, ddutile, linuxarm, wangzhou1,
jiangkunkun, jonathan.cameron, zhangfei.gao
Hi,
On 11/8/24 13:52, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> A following patch will add a new MMIO region for nested SMMU instances.
Nit: "Add a new ..." is generally preferred, I think.
>
> This macro will be repeatedly used to set offsets and MMIO sizes in both
> virt and virt-acpi-build.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> hw/arm/virt.c | 2 +-
> include/hw/arm/virt.h | 3 +++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 719e83e6a1..780bcff77c 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -174,7 +174,7 @@ static const MemMapEntry base_memmap[] = {
> [VIRT_FW_CFG] = { 0x09020000, 0x00000018 },
> [VIRT_GPIO] = { 0x09030000, 0x00001000 },
> [VIRT_UART1] = { 0x09040000, 0x00001000 },
> - [VIRT_SMMU] = { 0x09050000, 0x00020000 },
> + [VIRT_SMMU] = { 0x09050000, SMMU_IO_LEN },
> [VIRT_PCDIMM_ACPI] = { 0x09070000, MEMORY_HOTPLUG_IO_LEN },
> [VIRT_ACPI_GED] = { 0x09080000, ACPI_GED_EVT_SEL_LEN },
> [VIRT_NVDIMM_ACPI] = { 0x09090000, NVDIMM_ACPI_IO_LEN},
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index ab961bb6a9..46f48fe561 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -47,6 +47,9 @@
> /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
> #define PVTIME_SIZE_PER_CPU 64
>
> +/* MMIO region size for SMMUv3 */
> +#define SMMU_IO_LEN 0x20000
> +
> enum {
> VIRT_FLASH,
> VIRT_MEM,
* [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
2024-11-08 12:52 ` [RFC PATCH 1/5] hw/arm/virt: Add an SMMU_IO_LEN macro Shameer Kolothum via
@ 2024-11-08 12:52 ` Shameer Kolothum via
2024-11-13 17:12 ` Eric Auger
2024-11-13 18:00 ` Eric Auger
2024-11-08 12:52 ` [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a " Shameer Kolothum via
` (8 subsequent siblings)
10 siblings, 2 replies; 150+ messages in thread
From: Shameer Kolothum via @ 2024-11-08 12:52 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, linuxarm,
wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
Based on SMMUv3 as a parent device, add a user-creatable
smmuv3-nested device. Subsequent patches will add support to
specify a PCI bus for this device.
Currently only supported for "virt", so hook up the sysbus mem & irq
for that as well.
No FDT support is added for now.
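To illustrate the carving done in virt_machine_device_plug_cb() below: instance 0
is mapped at 0x0b000000 with IRQs 200..203 from the irqmap, instance 1 at
0x0b020000 (0x0b000000 + SMMU_IO_LEN) with IRQs 204..207, and so on; 128
instances of SMMU_IO_LEN (0x20000) exactly fill the 0x01000000 VIRT_SMMU_NESTED
window and consume 512 IRQs starting at 200. A hypothetical invocation with two
instances (the bus= property only arrives with a later patch in this series)
might look like:

    qemu-system-aarch64 -M virt,gic-version=3 -enable-kvm -cpu host \
        ...
        -device arm-smmuv3-nested,id=smmu0 \
        -device arm-smmuv3-nested,id=smmu1 \
        ...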
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
hw/arm/smmuv3.c | 34 ++++++++++++++++++++++++++++++++++
hw/arm/virt.c | 31 +++++++++++++++++++++++++++++--
hw/core/sysbus-fdt.c | 1 +
include/hw/arm/smmuv3.h | 15 +++++++++++++++
include/hw/arm/virt.h | 6 ++++++
5 files changed, 85 insertions(+), 2 deletions(-)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2101031a8f..0033eb8125 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -2201,6 +2201,19 @@ static void smmu_realize(DeviceState *d, Error **errp)
smmu_init_irq(s, dev);
}
+static void smmu_nested_realize(DeviceState *d, Error **errp)
+{
+ SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
+ SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
+ Error *local_err = NULL;
+
+ c->parent_realize(d, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+}
+
static const VMStateDescription vmstate_smmuv3_queue = {
.name = "smmuv3_queue",
.version_id = 1,
@@ -2299,6 +2312,18 @@ static void smmuv3_class_init(ObjectClass *klass, void *data)
device_class_set_props(dc, smmuv3_properties);
}
+static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+ SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_CLASS(klass);
+
+ dc->vmsd = &vmstate_smmuv3;
+ device_class_set_parent_realize(dc, smmu_nested_realize,
+ &c->parent_realize);
+ dc->user_creatable = true;
+ dc->hotpluggable = false;
+}
+
static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
IOMMUNotifierFlag old,
IOMMUNotifierFlag new,
@@ -2337,6 +2362,14 @@ static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
imrc->notify_flag_changed = smmuv3_notify_flag_changed;
}
+static const TypeInfo smmuv3_nested_type_info = {
+ .name = TYPE_ARM_SMMUV3_NESTED,
+ .parent = TYPE_ARM_SMMUV3,
+ .instance_size = sizeof(SMMUv3NestedState),
+ .class_size = sizeof(SMMUv3NestedClass),
+ .class_init = smmuv3_nested_class_init,
+};
+
static const TypeInfo smmuv3_type_info = {
.name = TYPE_ARM_SMMUV3,
.parent = TYPE_ARM_SMMU,
@@ -2355,6 +2388,7 @@ static const TypeInfo smmuv3_iommu_memory_region_info = {
static void smmuv3_register_types(void)
{
type_register(&smmuv3_type_info);
+ type_register(&smmuv3_nested_type_info);
type_register(&smmuv3_iommu_memory_region_info);
}
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 780bcff77c..38075f9ab2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
[VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
[VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
[VIRT_MMIO] = { 0x0a000000, 0x00000200 },
+ [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
/* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
[VIRT_PLATFORM_BUS] = { 0x0c000000, 0x02000000 },
[VIRT_SECURE_MEM] = { 0x0e000000, 0x01000000 },
@@ -226,6 +227,7 @@ static const int a15irqmap[] = {
[VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
[VIRT_SMMU] = 74, /* ...to 74 + NUM_SMMU_IRQS - 1 */
[VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
+ [VIRT_SMMU_NESTED] = 200,
};
static void create_randomness(MachineState *ms, const char *node)
@@ -2883,10 +2885,34 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
DeviceState *dev, Error **errp)
{
VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
+ MachineClass *mc = MACHINE_GET_CLASS(vms);
- if (vms->platform_bus_dev) {
- MachineClass *mc = MACHINE_GET_CLASS(vms);
+ /* For smmuv3-nested devices we need to set the mem & irq */
+ if (device_is_dynamic_sysbus(mc, dev) &&
+ object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_NESTED)) {
+ hwaddr base = vms->memmap[VIRT_SMMU_NESTED].base;
+ int irq = vms->irqmap[VIRT_SMMU_NESTED];
+
+ if (vms->smmu_nested_count >= MAX_SMMU_NESTED) {
+ error_setg(errp, "smmuv3-nested max count reached!");
+ return;
+ }
+
+ base += (vms->smmu_nested_count * SMMU_IO_LEN);
+ irq += (vms->smmu_nested_count * NUM_SMMU_IRQS);
+ sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
+ for (int i = 0; i < 4; i++) {
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
+ qdev_get_gpio_in(vms->gic, irq + i));
+ }
+ if (vms->iommu != VIRT_IOMMU_SMMUV3_NESTED) {
+ vms->iommu = VIRT_IOMMU_SMMUV3_NESTED;
+ }
+ vms->smmu_nested_count++;
+ }
+
+ if (vms->platform_bus_dev) {
if (device_is_dynamic_sysbus(mc, dev)) {
platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
SYS_BUS_DEVICE(dev));
@@ -3067,6 +3093,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
+ machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ARM_SMMUV3_NESTED);
#ifdef CONFIG_TPM
machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
#endif
diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c
index eebcd28f9a..0f0d0b3e58 100644
--- a/hw/core/sysbus-fdt.c
+++ b/hw/core/sysbus-fdt.c
@@ -489,6 +489,7 @@ static const BindingEntry bindings[] = {
#ifdef CONFIG_LINUX
TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node),
TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
+ TYPE_BINDING("arm-smmuv3-nested", no_fdt_node),
VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
#endif
#ifdef CONFIG_TPM
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index d183a62766..87e628be7a 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -84,6 +84,21 @@ struct SMMUv3Class {
#define TYPE_ARM_SMMUV3 "arm-smmuv3"
OBJECT_DECLARE_TYPE(SMMUv3State, SMMUv3Class, ARM_SMMUV3)
+#define TYPE_ARM_SMMUV3_NESTED "arm-smmuv3-nested"
+OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass, ARM_SMMUV3_NESTED)
+
+struct SMMUv3NestedState {
+ SMMUv3State smmuv3_state;
+};
+
+struct SMMUv3NestedClass {
+ /*< private >*/
+ SMMUv3Class smmuv3_class;
+ /*< public >*/
+
+ DeviceRealize parent_realize;
+};
+
#define STAGE1_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S1P)
#define STAGE2_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S2P)
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 46f48fe561..50e47a4ef3 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -50,6 +50,9 @@
/* MMIO region size for SMMUv3 */
#define SMMU_IO_LEN 0x20000
+/* Max supported nested SMMUv3 */
+#define MAX_SMMU_NESTED 128
+
enum {
VIRT_FLASH,
VIRT_MEM,
@@ -62,6 +65,7 @@ enum {
VIRT_GIC_ITS,
VIRT_GIC_REDIST,
VIRT_SMMU,
+ VIRT_SMMU_NESTED,
VIRT_UART0,
VIRT_MMIO,
VIRT_RTC,
@@ -92,6 +96,7 @@ enum {
typedef enum VirtIOMMUType {
VIRT_IOMMU_NONE,
VIRT_IOMMU_SMMUV3,
+ VIRT_IOMMU_SMMUV3_NESTED,
VIRT_IOMMU_VIRTIO,
} VirtIOMMUType;
@@ -155,6 +160,7 @@ struct VirtMachineState {
bool mte;
bool dtb_randomness;
bool second_ns_uart_present;
+ int smmu_nested_count;
OnOffAuto acpi;
VirtGICType gic_version;
VirtIOMMUType iommu;
--
2.34.1
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-08 12:52 ` [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device Shameer Kolothum via
@ 2024-11-13 17:12 ` Eric Auger
2024-11-13 18:05 ` Nicolin Chen
2024-11-14 8:20 ` Shameerali Kolothum Thodi via
2024-11-13 18:00 ` Eric Auger
1 sibling, 2 replies; 150+ messages in thread
From: Eric Auger @ 2024-11-13 17:12 UTC (permalink / raw)
To: Shameer Kolothum, qemu-arm, qemu-devel
Cc: peter.maydell, jgg, nicolinc, ddutile, linuxarm, wangzhou1,
jiangkunkun, jonathan.cameron, zhangfei.gao
Hi Shameer,
On 11/8/24 13:52, Shameer Kolothum wrote:
> Based on SMMUv3 as a parent device, add a user-creatable
> smmuv3-nested device. Subsequent patches will add support to
> specify a PCI bus for this device.
>
> Currently only supported for "virt", so hook up the sybus mem & irq
> for that as well.
>
> No FDT support is added for now.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> hw/arm/smmuv3.c | 34 ++++++++++++++++++++++++++++++++++
> hw/arm/virt.c | 31 +++++++++++++++++++++++++++++--
> hw/core/sysbus-fdt.c | 1 +
> include/hw/arm/smmuv3.h | 15 +++++++++++++++
> include/hw/arm/virt.h | 6 ++++++
> 5 files changed, 85 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 2101031a8f..0033eb8125 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -2201,6 +2201,19 @@ static void smmu_realize(DeviceState *d, Error **errp)
> smmu_init_irq(s, dev);
> }
>
> +static void smmu_nested_realize(DeviceState *d, Error **errp)
> +{
> + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
nit: s/s_nested/ns or just s?
> + SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
> + Error *local_err = NULL;
> +
> + c->parent_realize(d, &local_err);
I think it is safe to use errp directly here.
> + if (local_err) {
> + error_propagate(errp, local_err);
> + return;
> + }
> +}
> +
> static const VMStateDescription vmstate_smmuv3_queue = {
> .name = "smmuv3_queue",
> .version_id = 1,
> @@ -2299,6 +2312,18 @@ static void smmuv3_class_init(ObjectClass *klass, void *data)
> device_class_set_props(dc, smmuv3_properties);
> }
>
> +static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
> +{
> + DeviceClass *dc = DEVICE_CLASS(klass);
> + SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_CLASS(klass);
> +
> + dc->vmsd = &vmstate_smmuv3;
> + device_class_set_parent_realize(dc, smmu_nested_realize,
> + &c->parent_realize);
> + dc->user_creatable = true;
> + dc->hotpluggable = false;
> +}
> +
> static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
> IOMMUNotifierFlag old,
> IOMMUNotifierFlag new,
> @@ -2337,6 +2362,14 @@ static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
> imrc->notify_flag_changed = smmuv3_notify_flag_changed;
> }
>
> +static const TypeInfo smmuv3_nested_type_info = {
> + .name = TYPE_ARM_SMMUV3_NESTED,
> + .parent = TYPE_ARM_SMMUV3,
> + .instance_size = sizeof(SMMUv3NestedState),
> + .class_size = sizeof(SMMUv3NestedClass),
> + .class_init = smmuv3_nested_class_init,
> +};
> +
> static const TypeInfo smmuv3_type_info = {
> .name = TYPE_ARM_SMMUV3,
> .parent = TYPE_ARM_SMMU,
> @@ -2355,6 +2388,7 @@ static const TypeInfo smmuv3_iommu_memory_region_info = {
> static void smmuv3_register_types(void)
> {
> type_register(&smmuv3_type_info);
> + type_register(&smmuv3_nested_type_info);
> type_register(&smmuv3_iommu_memory_region_info);
> }
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 780bcff77c..38075f9ab2 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
> [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
> [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
> [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
> + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
I agree with Mostafa that the _NESTED terminology may not be the best
choice.
The motivation behind that multi-instance attempt, as introduced in
https://lore.kernel.org/all/ZEcT%2F7erkhHDaNvD@Asurada-Nvidia/
was:
- SMMUs with different feature bits
- support of VCMDQ HW extension for SMMU CMDQ
- need for separate S1 invalidation paths
If I understand correctly this is mostly wanted for VCMDQ handling? If
this is correct we may indicate that somehow in the terminology.
If I understand correctly VCMDQ terminology is NVIDIA-specific while
ECMDQ is the baseline (?).
> /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
> [VIRT_PLATFORM_BUS] = { 0x0c000000, 0x02000000 },
> [VIRT_SECURE_MEM] = { 0x0e000000, 0x01000000 },
> @@ -226,6 +227,7 @@ static const int a15irqmap[] = {
> [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
> [VIRT_SMMU] = 74, /* ...to 74 + NUM_SMMU_IRQS - 1 */
> [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
> + [VIRT_SMMU_NESTED] = 200,
What is the max number of IRQs expected to be consumed? Worth a comment for
the next interrupt user.
> };
>
> static void create_randomness(MachineState *ms, const char *node)
> @@ -2883,10 +2885,34 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> DeviceState *dev, Error **errp)
> {
> VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> + MachineClass *mc = MACHINE_GET_CLASS(vms);
>
> - if (vms->platform_bus_dev) {
> - MachineClass *mc = MACHINE_GET_CLASS(vms);
> + /* For smmuv3-nested devices we need to set the mem & irq */
> + if (device_is_dynamic_sysbus(mc, dev) &&
> + object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_NESTED)) {
Why did you choose not to use the PLATFORM BUS infra, which does that kind
of binding automatically (it also provisions dedicated MMIO regions and
IRQs)? At least you would need to justify this in the commit msg, I think.
> + hwaddr base = vms->memmap[VIRT_SMMU_NESTED].base;
> + int irq = vms->irqmap[VIRT_SMMU_NESTED];
> +
> + if (vms->smmu_nested_count >= MAX_SMMU_NESTED) {
> + error_setg(errp, "smmuv3-nested max count reached!");
> + return;
> + }
> +
> + base += (vms->smmu_nested_count * SMMU_IO_LEN);
> + irq += (vms->smmu_nested_count * NUM_SMMU_IRQS);
>
> + sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
> + for (int i = 0; i < 4; i++) {
> + sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
> + qdev_get_gpio_in(vms->gic, irq + i));
> + }
> + if (vms->iommu != VIRT_IOMMU_SMMUV3_NESTED) {
> + vms->iommu = VIRT_IOMMU_SMMUV3_NESTED;
> + }
> + vms->smmu_nested_count++;
This kind of check would definitely not be integrated in the platform bus,
but it could be introduced generically in the framework, or special-cased
after platform_bus_link_device().
> + }
> +
> + if (vms->platform_bus_dev) {
> if (device_is_dynamic_sysbus(mc, dev)) {
> platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
> SYS_BUS_DEVICE(dev));
> @@ -3067,6 +3093,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
> + machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ARM_SMMUV3_NESTED);
> #ifdef CONFIG_TPM
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
> #endif
> diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c
> index eebcd28f9a..0f0d0b3e58 100644
> --- a/hw/core/sysbus-fdt.c
> +++ b/hw/core/sysbus-fdt.c
> @@ -489,6 +489,7 @@ static const BindingEntry bindings[] = {
> #ifdef CONFIG_LINUX
> TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node),
> TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
> + TYPE_BINDING("arm-smmuv3-nested", no_fdt_node),
> VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
> #endif
> #ifdef CONFIG_TPM
> diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
> index d183a62766..87e628be7a 100644
> --- a/include/hw/arm/smmuv3.h
> +++ b/include/hw/arm/smmuv3.h
> @@ -84,6 +84,21 @@ struct SMMUv3Class {
> #define TYPE_ARM_SMMUV3 "arm-smmuv3"
> OBJECT_DECLARE_TYPE(SMMUv3State, SMMUv3Class, ARM_SMMUV3)
>
> +#define TYPE_ARM_SMMUV3_NESTED "arm-smmuv3-nested"
> +OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass, ARM_SMMUV3_NESTED)
> +
> +struct SMMUv3NestedState {
> + SMMUv3State smmuv3_state;
> +};
> +
> +struct SMMUv3NestedClass {
> + /*< private >*/
> + SMMUv3Class smmuv3_class;
> + /*< public >*/
> +
> + DeviceRealize parent_realize;
> +};
> +
> #define STAGE1_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S1P)
> #define STAGE2_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S2P)
>
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 46f48fe561..50e47a4ef3 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -50,6 +50,9 @@
> /* MMIO region size for SMMUv3 */
> #define SMMU_IO_LEN 0x20000
>
> +/* Max supported nested SMMUv3 */
> +#define MAX_SMMU_NESTED 128
Ouch, that many?!
> +
> enum {
> VIRT_FLASH,
> VIRT_MEM,
> @@ -62,6 +65,7 @@ enum {
> VIRT_GIC_ITS,
> VIRT_GIC_REDIST,
> VIRT_SMMU,
> + VIRT_SMMU_NESTED,
> VIRT_UART0,
> VIRT_MMIO,
> VIRT_RTC,
> @@ -92,6 +96,7 @@ enum {
> typedef enum VirtIOMMUType {
> VIRT_IOMMU_NONE,
> VIRT_IOMMU_SMMUV3,
> + VIRT_IOMMU_SMMUV3_NESTED,
> VIRT_IOMMU_VIRTIO,
> } VirtIOMMUType;
>
> @@ -155,6 +160,7 @@ struct VirtMachineState {
> bool mte;
> bool dtb_randomness;
> bool second_ns_uart_present;
> + int smmu_nested_count;
> OnOffAuto acpi;
> VirtGICType gic_version;
> VirtIOMMUType iommu;
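Purely as an illustration of the two nits above, the realize hook with both
suggestions applied might shrink to something like this (a sketch, not a
replacement patch):

    static void smmu_nested_realize(DeviceState *d, Error **errp)
    {
        SMMUv3NestedState *s = ARM_SMMUV3_NESTED(d);
        SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s);

        /* nothing runs after the parent realize, so errp can be passed through */
        c->parent_realize(d, errp);
    }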
Thanks
Eric
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-13 17:12 ` Eric Auger
@ 2024-11-13 18:05 ` Nicolin Chen
2024-11-26 18:28 ` Donald Dutile
2024-11-14 8:20 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-11-13 18:05 UTC (permalink / raw)
To: Eric Auger
Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
ddutile, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
Hi Eric,
On Wed, Nov 13, 2024 at 06:12:15PM +0100, Eric Auger wrote:
> On 11/8/24 13:52, Shameer Kolothum wrote:
> > @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
> > [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
> > [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
> > [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
> > + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
> I agree with Mostafa that the _NESTED terminology may not be the best
> choice.
> The motivation behind that multi-instance attempt, as introduced in
> https://lore.kernel.org/all/ZEcT%2F7erkhHDaNvD@Asurada-Nvidia/
> was:
> - SMMUs with different feature bits
> - support of VCMDQ HW extension for SMMU CMDQ
> - need for separate S1 invalidation paths
>
> If I understand correctly this is mostly wanted for VCMDQ handling? if
> this is correct we may indicate that somehow in the terminology.
>
> If I understand correctly VCMDQ terminology is NVidia specific while
> ECMDQ is the baseline (?).
VCMDQ makes a multi-vSMMU-instance design a hard requirement, yet
point (3), the separate invalidation paths, also matters. Jason
suggested that in the base case the VMM create multiple vSMMU instances,
as the kernel doc mentions here:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/userspace-api/iommufd.rst#n84
W.r.t naming, maybe something related to "hardware-accelerated"?
Thanks
Nicolin
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-13 18:05 ` Nicolin Chen
@ 2024-11-26 18:28 ` Donald Dutile
2024-11-27 10:21 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Donald Dutile @ 2024-11-26 18:28 UTC (permalink / raw)
To: Nicolin Chen, Eric Auger
Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
On 11/13/24 1:05 PM, Nicolin Chen wrote:
> Hi Eric,
>
> On Wed, Nov 13, 2024 at 06:12:15PM +0100, Eric Auger wrote:
>> On 11/8/24 13:52, Shameer Kolothum wrote:
>>> @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
>>> [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
>>> [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
>>> [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
>>> + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
>
>> I agree with Mostafa that the _NESTED terminology may not be the best
>> choice.
>> The motivation behind that multi-instance attempt, as introduced in
>> https://lore.kernel.org/all/ZEcT%2F7erkhHDaNvD@Asurada-Nvidia/
>> was:
>> - SMMUs with different feature bits
>> - support of VCMDQ HW extension for SMMU CMDQ
>> - need for separate S1 invalidation paths
>>
>> If I understand correctly this is mostly wanted for VCMDQ handling? if
>> this is correct we may indicate that somehow in the terminology.
>>
>> If I understand correctly VCMDQ terminology is NVidia specific while
>> ECMDQ is the baseline (?).
>
> VCMDQ makes a multi-vSMMU-instance design a hard requirement, yet
> the point (3) for separate invalidation paths also matters. Jason
> suggested VMM in base case to create multi vSMMU instances as the
> kernel doc mentioned here:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/userspace-api/iommufd.rst#n84
>
> W.r.t naming, maybe something related to "hardware-accelerated"?
>
Given that 'accel' has been used for hw-acceleration elsewhere, that seems like a reasonable 'mode'.
But, it needs a parameter to state what is being accelerated.
i.e., the more global 'accel=kvm' has 'kvm'.
For SMMUv3, the NVIDIA-specific vCMDQ, it needs a parameter to state that
specifically, since I'm concluding from reading the SMMUv3 version G.a spec
that ECMDQ was added to be able to assign an ECMDQ to a VM, and let the VM do
CMDQ-driven invalidations via a mechanism similar to assigned PCI-device MMIO
space in a VM.
So, how should the QEMU invocation select what parts to 'accel' in the vSMMUv3
given to the VM? ... and given the history of hw-based virt-acceleration, I can
only guess more SMMUv3 accel tweaks will be found/desired/implemented.
So, given there is an NVIDIA-specific ECMDQ-like queue, but a different one, the
accel parameter chosen has to consider 'name-space collision', i.e., accel=nv-vcmdq
and accel=ecmdq, unless sw can be made to smartly probe and determine the underlying
diffs and provide equivalent functionality, in which case a simpler 'accel=vcmdq'
could be used.
Finally, wrt libvirt, how does it know/tell what can and should be used?
For ECMDQ, something under sysfs for an SMMUv3 could expose its presence/capability/availability
(tag for use/alloc'd for a VM), or an ioctl/cdev i/f to the SMMUv3.
But how does one know today that there's NVIDIA-vCMDQ support on its SMMUv3? -- is it
exposed in sysfs, ioctl, cdev?
... and all needs to be per-instance ....
... libvirt (or any other VMM orchestrator) will need to determine compatibility for
live migration. e.g., can one live migrate an accel=nv-vcmdq-based VM to a host with
accel=ecmdq support? only nv-vcmdq? what if there are version diffs of nv-vcmdq over time?
-- apologies, but I don't know the minute details of nv-vcmdq to determine if that's unlikely or not.
Once the qemu-smmuv3-api is defined, with the recognition of what libvirt (or any other VMM) needs to probe/check/use for hw-accelerated features,
I think it'll be more straight-fwd to implement, and (clearly) understand from a qemu command line. :)
Thanks,
- Don
> Thanks
> Nicolin
>
* RE: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-26 18:28 ` Donald Dutile
@ 2024-11-27 10:21 ` Shameerali Kolothum Thodi via
2024-11-27 16:00 ` Jason Gunthorpe
2024-11-28 4:29 ` Donald Dutile
0 siblings, 2 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-27 10:21 UTC (permalink / raw)
To: Donald Dutile, Nicolin Chen, Eric Auger
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, Linuxarm, Wangzhou (B),
jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Donald Dutile <ddutile@redhat.com>
> Sent: Tuesday, November 26, 2024 6:29 PM
> To: Nicolin Chen <nicolinc@nvidia.com>; Eric Auger
> <eric.auger@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
> SMMUv3 Nested device
>
>
>
> On 11/13/24 1:05 PM, Nicolin Chen wrote:
> > Hi Eric,
> >
> > On Wed, Nov 13, 2024 at 06:12:15PM +0100, Eric Auger wrote:
> >> On 11/8/24 13:52, Shameer Kolothum wrote:
> >>> @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
> >>> [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
> >>> [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
> >>> [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
> >>> + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
> >
> >> I agree with Mostafa that the _NESTED terminology may not be the best
> >> choice.
> >> The motivation behind that multi-instance attempt, as introduced in
> >> https://lore.kernel.org/all/ZEcT%2F7erkhHDaNvD@Asurada-Nvidia/
> >> was:
> >> - SMMUs with different feature bits
> >> - support of VCMDQ HW extension for SMMU CMDQ
> >> - need for separate S1 invalidation paths
> >>
> >> If I understand correctly this is mostly wanted for VCMDQ handling? if
> >> this is correct we may indicate that somehow in the terminology.
> >>
> >> If I understand correctly VCMDQ terminology is NVidia specific while
> >> ECMDQ is the baseline (?).
> >
> > VCMDQ makes a multi-vSMMU-instance design a hard requirement, yet
> > the point (3) for separate invalidation paths also matters. Jason
> > suggested VMM in base case to create multi vSMMU instances as the
> > kernel doc mentioned here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-
> next.git/tree/Documentation/userspace-api/iommufd.rst#n84
> >
> > W.r.t naming, maybe something related to "hardware-accelerated"?
> >
> Given that 'accel' has been used for hw-acceleration elsewhere, that seems
> like a reasonable 'mode'.
> But, it needs a paramater to state was is being accelerated.
> i.e., the more global 'accel=kvm' has 'kvm'.
I was thinking more like calling this hw-accelerated nested SMMUv3 emulation
'smmuv3-accel'. This avoids confusion with the already existing
'iommu=smmuv3', which also has nested emulation support.
i.e.,
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
>
> For SMMUv3, NVIDIA-specific vCMDQ, it needs a parameter to state that
> specifically,
> since I'm concluding from reading the SMMUv3 version G.a spec, that
> ECMDQ was added
> to be able to assign an ECMDQ to a VM,
Not sure the intention of ECMDQ as per that specification is to assign
it to a VM. I think the main idea behind it is to have one Command Queue
per host CPU to eliminate lock contention while submitting commands
to the SMMU.
AFAIK it is not safe to assign one of the ECMDQs to a guest yet. I think there
is no way you can associate a VMID with an ECMDQ. So there is no plan to
support ARM ECMDQ now.
NVIDIA VCMDQ is a completely vendor specific one. Perhaps ARM may come
up with an assignable CMDQ in future though.
> and let the VM do CMDQ driven
> invalidations via
> a similar mechanism as assigned PCI-device mmio space in a VM.
> So, how should the QEMU invocation select what parts to 'accel' in the
> vSMMUv3 given
> to the VM? ... and given the history of hw-based, virt-acceleration, I can
> only guess
> more SMMUv3 accel tweaks will be found/desired/implemented.
>
> So, given there is an NVIDIA-specific/like ECMDQ, but different, the accel
> parameter
> chosen has to consider 'name-space collision', i.e., accel=nv-vcmdq and
> accel=ecmdq,
> unless sw can be made to smartly probe and determine the underlying
> diffs, and have
> equivalent functionality, in which case, a simpler 'accel=vcmdq' could be
> used.
>
Yep. Probably we could abstract that from the user and handle it within
Qemu when the kernel reports the capability based on physical SMMUv3.
> Finally, wrt libvirt, how does it know/tell what can and should be used?
> For ECMDQ, something under sysfs for an SMMUv3 could expose its
> presence/capability/availability
> (tag for use/alloc'd for a VM), or an ioctl/cdev i/f to the SMMUv3.
> But how does one know today that there's NVIDIA-vCMDQ support on its
> SMMUv3? -- is it
> exposed in sysfs, ioctl, cdev?
I think the capability will be reported through an IOCTL. Nicolin?
> ... and all needs to be per-instance ....
> ... libvirt (or any other VMM orchestrator) will need to determine
> compatibility for
> live migration. e.g., can one live migrate an accel=nv-vcmdq-based VM to
> a host with
> accel=ecmdq support? only nv-vcmdq? what if there are version diffs of
> nv-vcmdq over time?
> -- apologies, but I don't know the minute details of nv-vcmdq to
> determine if that's unlikely or not.
Yes. This requires more thought. But our first aim is to get the basic
smmuv3-accel support in.
Thanks,
Shameer
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-27 10:21 ` Shameerali Kolothum Thodi via
@ 2024-11-27 16:00 ` Jason Gunthorpe
2024-11-27 16:05 ` Eric Auger
2024-11-27 23:03 ` Donald Dutile
2024-11-28 4:29 ` Donald Dutile
1 sibling, 2 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2024-11-27 16:00 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Donald Dutile, Nicolin Chen, Eric Auger, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On Wed, Nov 27, 2024 at 10:21:24AM +0000, Shameerali Kolothum Thodi wrote:
> > For SMMUv3, NVIDIA-specific vCMDQ, it needs a parameter to state that
> > specifically,
> > since I'm concluding from reading the SMMUv3 version G.a spec, that
> > ECMDQ was added
> > to be able to assign an ECMDQ to a VM,
>
> Not sure the intention of ECMDQ as per that specification is to assign
> it to a VM. I think the main idea behind it is to have one Command Queue
> per host CPU to eliminate lock contention while submitting commands
> to SMMU.
Right
> AFAIK it is not safe to assign one of the ECMDQ to guest yet. I think there is no
> way you can associate a VMID with ECMDQ. So there is no plan to
> support ARM ECMDQ now.
Yep
> NVIDIA VCMDQ is a completely vendor specific one. Perhaps ARM may come
> up with an assignable CMDQ in future though.
Yes, it is easy to imagine an ECMDQ extension that provides the same HW
features that VCMDQ has in future. I hope ARM will develop one.
> > ... and all needs to be per-instance ....
> > ... libvirt (or any other VMM orchestrator) will need to determine
> > compatibility for
> > live migration. e.g., can one live migrate an accel=nv-vcmdq-based VM to
> > a host with
> > accel=ecmdq support? only nv-vcmdq? what if there are version diffs of
> > nv-vcmdq over time?
> > -- apologies, but I don't know the minute details of nv-vcmdq to
> > determine if that's unlikely or not.
>
> Yes. This require more thought. But our first aim is get the basic smmuv3-accel
> support.
Yeah, there is no live migration support yet in the SMMU QEMU driver,
AFAIK?
When it gets done the supported options will have to be considered
Jason
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-27 16:00 ` Jason Gunthorpe
@ 2024-11-27 16:05 ` Eric Auger
2024-11-28 3:25 ` Zhangfei Gao
2024-11-27 23:03 ` Donald Dutile
1 sibling, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-27 16:05 UTC (permalink / raw)
To: Jason Gunthorpe, Shameerali Kolothum Thodi
Cc: Donald Dutile, Nicolin Chen, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On 11/27/24 17:00, Jason Gunthorpe wrote:
> On Wed, Nov 27, 2024 at 10:21:24AM +0000, Shameerali Kolothum Thodi wrote:
>>> For SMMUv3, NVIDIA-specific vCMDQ, it needs a parameter to state that
>>> specifically,
>>> since I'm concluding from reading the SMMUv3 version G.a spec, that
>>> ECMDQ was added
>>> to be able to assign an ECMDQ to a VM,
>> Not sure the intention of ECMDQ as per that specification is to assign
>> it to a VM. I think the main idea behind it is to have one Command Queue
>> per host CPU to eliminate lock contention while submitting commands
>> to SMMU.
> Right
>
>> AFAIK it is not safe to assign one of the ECMDQ to guest yet. I think there is no
>> way you can associate a VMID with ECMDQ. So there is no plan to
>> support ARM ECMDQ now.
> Yep
>
>> NVIDIA VCMDQ is a completely vendor specific one. Perhaps ARM may come
>> up with an assignable CMDQ in future though.
> Yes, it is easy to imagine an ECMDQ extension that provides the same HW
> features that VCMDQ has in future. I hope ARM will develop one.
>
>>> ... and all needs to be per-instance ....
>>> ... libvirt (or any other VMM orchestrator) will need to determine
>>> compatibility for
>>> live migration. e.g., can one live migrate an accel=nv-vcmdq-based VM to
>>> a host with
>>> accel=ecmdq support? only nv-vcmdq? what if there are version diffs of
>>> nv-vcmdq over time?
>>> -- apologies, but I don't know the minute details of nv-vcmdq to
>>> determine if that's unlikely or not.
>> Yes. This require more thought. But our first aim is get the basic smmuv3-accel
>> support.
> Yeah, there is no live migration support yet in the SMMU qmeu driver,
> AFAIK?
the non accelerated SMMU QEMU device does support migration.
Eric
>
> When it gets done the supported options will have to be considered
>
> Jason
>
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-27 16:05 ` Eric Auger
@ 2024-11-28 3:25 ` Zhangfei Gao
2024-11-28 8:06 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Zhangfei Gao @ 2024-11-28 3:25 UTC (permalink / raw)
To: eric.auger
Cc: Jason Gunthorpe, Shameerali Kolothum Thodi, Donald Dutile,
Nicolin Chen, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron
Hi, Eric
On Thu, 28 Nov 2024 at 00:06, Eric Auger <eric.auger@redhat.com> wrote:
> > Yeah, there is no live migration support yet in the SMMU qmeu driver,
> > AFAIK?
> the non accelerated SMMU QEMU device does support migration.
Could you clarify more about this?
Migration is not supported if using a vIOMMU (the SMMU QEMU device), is it?
Thanks
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 3:25 ` Zhangfei Gao
@ 2024-11-28 8:06 ` Eric Auger
2024-11-28 8:28 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-28 8:06 UTC (permalink / raw)
To: Zhangfei Gao
Cc: Jason Gunthorpe, Shameerali Kolothum Thodi, Donald Dutile,
Nicolin Chen, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron
On 11/28/24 04:25, Zhangfei Gao wrote:
> Hi, Eric
>
> On Thu, 28 Nov 2024 at 00:06, Eric Auger <eric.auger@redhat.com> wrote:
>
>>> Yeah, there is no live migration support yet in the SMMU qmeu driver,
>>> AFAIK?
>> the non accelerated SMMU QEMU device does support migration.
> Could you clarify more about this?
> The migration is not supported if using viommu (SMMU QEMU device), isn't it?
No, this is not correct. The current QEMU SMMU device *does* support
migration (see its VMStateDescription), as does the QEMU virtio-iommu device.
So for instance if you run a guest with smmuv3 and protected virtio-pci
devices, this is supposed to be migratable. If it does not work, this is a
bug and it should be fixed ;-)
Thanks
Eric
>
> Thanks
>
* RE: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 8:06 ` Eric Auger
@ 2024-11-28 8:28 ` Shameerali Kolothum Thodi via
2024-11-28 8:41 ` Eric Auger
2024-11-28 12:52 ` Jason Gunthorpe
0 siblings, 2 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-28 8:28 UTC (permalink / raw)
To: eric.auger@redhat.com, Zhangfei Gao
Cc: Jason Gunthorpe, Donald Dutile, Nicolin Chen, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, November 28, 2024 8:07 AM
> To: Zhangfei Gao <zhangfei.gao@linaro.org>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Donald Dutile
> <ddutile@redhat.com>; Nicolin Chen <nicolinc@nvidia.com>; qemu-
> arm@nongnu.org; qemu-devel@nongnu.org; peter.maydell@linaro.org;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>
> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
> SMMUv3 Nested device
>
>
>
> On 11/28/24 04:25, Zhangfei Gao wrote:
> > Hi, Eric
> >
> > On Thu, 28 Nov 2024 at 00:06, Eric Auger <eric.auger@redhat.com> wrote:
> >
> >>> Yeah, there is no live migration support yet in the SMMU qmeu driver,
> >>> AFAIK?
> >> the non accelerated SMMU QEMU device does support migration.
> > Could you clarify more about this?
> > The migration is not supported if using viommu (SMMU QEMU device),
> isn't it?
> No this is not correct. Current QEMU SMMU device *does* support
> migration (see VMStateDescription) as well as qemu virtio-iommu device.
> so for instance if you run a guest with smmuv3 and protected virtio-pci
> devices this is supposed to be migratable. If it does not work this is
> bug and this should be fixed ;-)
I think, if I am right, Zhangfei was testing with a vfio-pci device assigned
on his vSVA branch. But migration with a VFIO device is currently explicitly
blocked if a vIOMMU is present.
I think Joao is working on it here[1].
But we may require additional support when we have vSVA, to handle any
in-flight page faults gracefully.
Thanks,
Shameer
1. https://lore.kernel.org/all/20230622214845.3980-1-joao.m.martins@oracle.com/
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 8:28 ` Shameerali Kolothum Thodi via
@ 2024-11-28 8:41 ` Eric Auger
2024-11-28 12:52 ` Jason Gunthorpe
1 sibling, 0 replies; 150+ messages in thread
From: Eric Auger @ 2024-11-28 8:41 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Zhangfei Gao
Cc: Jason Gunthorpe, Donald Dutile, Nicolin Chen, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron
Hi Shameer,
On 11/28/24 09:28, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Thursday, November 28, 2024 8:07 AM
>> To: Zhangfei Gao <zhangfei.gao@linaro.org>
>> Cc: Jason Gunthorpe <jgg@nvidia.com>; Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; Donald Dutile
>> <ddutile@redhat.com>; Nicolin Chen <nicolinc@nvidia.com>; qemu-
>> arm@nongnu.org; qemu-devel@nongnu.org; peter.maydell@linaro.org;
>> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>
>> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
>> SMMUv3 Nested device
>>
>>
>>
>> On 11/28/24 04:25, Zhangfei Gao wrote:
>>> Hi, Eric
>>>
>>> On Thu, 28 Nov 2024 at 00:06, Eric Auger <eric.auger@redhat.com> wrote:
>>>
>>>>> Yeah, there is no live migration support yet in the SMMU qmeu driver,
>>>>> AFAIK?
>>>> the non accelerated SMMU QEMU device does support migration.
>>> Could you clarify more about this?
>>> The migration is not supported if using viommu (SMMU QEMU device),
>> isn't it?
>> No this is not correct. Current QEMU SMMU device *does* support
>> migration (see VMStateDescription) as well as qemu virtio-iommu device.
>> so for instance if you run a guest with smmuv3 and protected virtio-pci
>> devices this is supposed to be migratable. If it does not work this is
>> bug and this should be fixed ;-)
> I think if I am right Zhangfei was testing with vfio-pci device assigned on his vSVA
> branch. But migration with vfio device is currently explicitly blocked if vIOMMU is
> present.
Definitely, I was talking about vSMMU/VFIO migration, which is not upstream.
>
> I think Joao is working on it here[1].
>
> But we may require additional support when we have vSVA to handle any
> in-flight page fault handling gracefully.
>
> Thanks,
> Shameer
> 1. https://lore.kernel.org/all/20230622214845.3980-1-joao.m.martins@oracle.com/
Thanks
Eric
>
>
>
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 8:28 ` Shameerali Kolothum Thodi via
2024-11-28 8:41 ` Eric Auger
@ 2024-11-28 12:52 ` Jason Gunthorpe
1 sibling, 0 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2024-11-28 12:52 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: eric.auger@redhat.com, Zhangfei Gao, Donald Dutile, Nicolin Chen,
qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron
On Thu, Nov 28, 2024 at 08:28:15AM +0000, Shameerali Kolothum Thodi wrote:
> I think if I am right Zhangfei was testing with vfio-pci device assigned on his vSVA
> branch. But migration with vfio device is currently explicitly blocked if vIOMMU is
> present.
>
> I think Joao is working on it here[1].
Right, this is what I was thinking of. What use is smmu migration
support if VFIO side blocks any useful configuration?
Jason
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-27 16:00 ` Jason Gunthorpe
2024-11-27 16:05 ` Eric Auger
@ 2024-11-27 23:03 ` Donald Dutile
2024-11-28 12:51 ` Jason Gunthorpe
1 sibling, 1 reply; 150+ messages in thread
From: Donald Dutile @ 2024-11-27 23:03 UTC (permalink / raw)
To: Jason Gunthorpe, Shameerali Kolothum Thodi
Cc: Nicolin Chen, Eric Auger, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On 11/27/24 11:00 AM, Jason Gunthorpe wrote:
> On Wed, Nov 27, 2024 at 10:21:24AM +0000, Shameerali Kolothum Thodi wrote:
>>> For SMMUv3, NVIDIA-specific vCMDQ, it needs a parameter to state that
>>> specifically,
>>> since I'm concluding from reading the SMMUv3 version G.a spec, that
>>> ECMDQ was added
>>> to be able to assign an ECMDQ to a VM,
>>
>> Not sure the intention of ECMDQ as per that specification is to assign
>> it to a VM. I think the main idea behind it is to have one Command Queue
>> per host CPU to eliminate lock contention while submitting commands
>> to SMMU.
>
> Right
>
>> AFAIK it is not safe to assign one of the ECMDQ to guest yet. I think there is no
>> way you can associate a VMID with ECMDQ. So there is no plan to
>> support ARM ECMDQ now.
>
> Yep
>
'Yet' ...
The association would be done via the VMM -- no different than what associates
an assigned device to a VM today -- no hw-level (VM-)ID needed; it is a matter of
exposing it to the VM, or not, or of mapping the (virtual) CMDQ to the
mapped/associated ECMDQ.
They are purposely mapped 64K apart from each other, enabling page-level protection,
which I doubt is a per-CPU requirement for lock contention avoidance (large-cache-block
spacing would be sufficient, even 4k; the ECMDQ registers are spaced 64k apart, the
largest ARM page size).
Summary: let's not assume this can't happen and then choose a cmdline that prevents it.
>> NVIDIA VCMDQ is a completely vendor specific one. Perhaps ARM may come
>> up with an assignable CMDQ in future though.
>
> Yes, it is easy to imagine an ECMDQ extension that provides the same HW
> features that VCMDQ has in future. I hope ARM will develop one.
>
Right, so how do we know which op is being "accel"'d wrt smmuv3?
>>> ... and all needs to be per-instance ....
>>> ... libvirt (or any other VMM orchestrator) will need to determine
>>> compatibility for
>>> live migration. e.g., can one live migrate an accel=nv-vcmdq-based VM to
>>> a host with
>>> accel=ecmdq support? only nv-vcmdq? what if there are version diffs of
>>> nv-vcmdq over time?
>>> -- apologies, but I don't know the minute details of nv-vcmdq to
>>> determine if that's unlikely or not.
>>
>> Yes. This require more thought. But our first aim is get the basic smmuv3-accel
>> support.
>
> Yeah, there is no live migration support yet in the SMMU qmeu driver,
> AFAIK?
>
> When it gets done the supported options will have to be considered
>
> Jason
>
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-27 23:03 ` Donald Dutile
@ 2024-11-28 12:51 ` Jason Gunthorpe
0 siblings, 0 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2024-11-28 12:51 UTC (permalink / raw)
To: Donald Dutile
Cc: Shameerali Kolothum Thodi, Nicolin Chen, Eric Auger,
qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Wed, Nov 27, 2024 at 06:03:23PM -0500, Donald Dutile wrote:
> The association would be done via the VMM -- no different then what associates
> an assigned device to a VM today -- no hw-level (VM-)ID needed; a matter of exposing
> it to the VM, or not; or mapping the (virtual) CMDQ to the mapped/associated ECMDQ.
> They are purposedly mapped 64K apart from each other, enabling page-level protection,
> which I doubt is a per-CPU req for lock contention avoidance (large-cache-block
> spaced would be sufficient, even 4k; it's 64k spaced btwn ECMDQ regs .. the largest
> ARM page size.
There are commands a VM could stuff on the ECMDQ that would harm the
hypervisor. Without VMID isolation you can't use ECMDQ like this
today.
Jason
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-27 10:21 ` Shameerali Kolothum Thodi via
2024-11-27 16:00 ` Jason Gunthorpe
@ 2024-11-28 4:29 ` Donald Dutile
2024-11-28 4:44 ` Nicolin Chen
2024-11-28 8:17 ` Shameerali Kolothum Thodi via
1 sibling, 2 replies; 150+ messages in thread
From: Donald Dutile @ 2024-11-28 4:29 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Nicolin Chen, Eric Auger
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, Linuxarm, Wangzhou (B),
jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org
On 11/27/24 5:21 AM, Shameerali Kolothum Thodi wrote:
>
>
>> -----Original Message-----
>> From: Donald Dutile <ddutile@redhat.com>
>> Sent: Tuesday, November 26, 2024 6:29 PM
>> To: Nicolin Chen <nicolinc@nvidia.com>; Eric Auger
>> <eric.auger@redhat.com>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
>> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
>> SMMUv3 Nested device
>>
>>
>>
>> On 11/13/24 1:05 PM, Nicolin Chen wrote:
>>> Hi Eric,
>>>
>>> On Wed, Nov 13, 2024 at 06:12:15PM +0100, Eric Auger wrote:
>>>> On 11/8/24 13:52, Shameer Kolothum wrote:
>>>>> @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
>>>>> [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
>>>>> [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
>>>>> [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
>>>>> + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
>>>
>>>> I agree with Mostafa that the _NESTED terminology may not be the best
>>>> choice.
>>>> The motivation behind that multi-instance attempt, as introduced in
>>>> https://lore.kernel.org/all/ZEcT%2F7erkhHDaNvD@Asurada-Nvidia/
>>>> was:
>>>> - SMMUs with different feature bits
>>>> - support of VCMDQ HW extension for SMMU CMDQ
>>>> - need for separate S1 invalidation paths
>>>>
>>>> If I understand correctly this is mostly wanted for VCMDQ handling? if
>>>> this is correct we may indicate that somehow in the terminology.
>>>>
>>>> If I understand correctly VCMDQ terminology is NVidia specific while
>>>> ECMDQ is the baseline (?).
>>>
>>> VCMDQ makes a multi-vSMMU-instance design a hard requirement, yet
>>> the point (3) for separate invalidation paths also matters. Jason
>>> suggested VMM in base case to create multi vSMMU instances as the
>>> kernel doc mentioned here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-
>> next.git/tree/Documentation/userspace-api/iommufd.rst#n84
>>>
>>> W.r.t naming, maybe something related to "hardware-accelerated"?
>>>
>> Given that 'accel' has been used for hw-acceleration elsewhere, that seems
>> like a reasonable 'mode'.
>> But, it needs a paramater to state was is being accelerated.
>> i.e., the more global 'accel=kvm' has 'kvm'.
>
> I was thinking more like calling this hw accelerated nested SMMUv3 emulation
> as 'smmuv3-accel'. This avoids confusion with the already existing
> 'iommu=smmuv3' that also has a nested emulation support.
>
> ie,
> -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
>
I -think- you are saying below that we have to think a bit more about this
device tagging. I'm thinking more like
-device arm-smmuv3,accel=<vcmdq>,id=smmu1,bus=pcie.1 \
>>
>> For SMMUv3, NVIDIA-specific vCMDQ, it needs a parameter to state that
>> specifically,
>> since I'm concluding from reading the SMMUv3 version G.a spec, that
>> ECMDQ was added
>> to be able to assign an ECMDQ to a VM,
>
> Not sure the intention of ECMDQ as per that specification is to assign
> it to a VM. I think the main idea behind it is to have one Command Queue
> per host CPU to eliminate lock contention while submitting commands
> to SMMU.
>
> AFAIK it is not safe to assign one of the ECMDQ to guest yet. I think there is no
> way you can associate a VMID with ECMDQ. So there is no plan to
> support ARM ECMDQ now.
>
> NVIDIA VCMDQ is a completely vendor specific one. Perhaps ARM may come
> up with an assignable CMDQ in future though.
>
> and let the VM do CMDQ driven
>> invalidations via
>> a similar mechanism as assigned PCI-device mmio space in a VM.
>> So, how should the QEMU invocation select what parts to 'accel' in the
>> vSMMUv3 given
>> to the VM? ... and given the history of hw-based, virt-acceleration, I can
>> only guess
>> more SMMUv3 accel tweaks will be found/desired/implemented.
>>
>> So, given there is an NVIDIA-specific/like ECMDQ, but different, the accel
>> parameter
>> chosen has to consider 'name-space collision', i.e., accel=nv-vcmdq and
>> accel=ecmdq,
>> unless sw can be made to smartly probe and determine the underlying
>> diffs, and have
>> equivalent functionality, in which case, a simpler 'accel=vcmdq' could be
>> used.
>>
>
> Yep. Probably we could abstract that from the user and handle it within
> Qemu when the kernel reports the capability based on physical SMMUv3.
>
>> Finally, wrt libvirt, how does it know/tell what can and should be used?
>> For ECMDQ, something under sysfs for an SMMUv3 could expose its
>> presence/capability/availability
>> (tag for use/alloc'd for a VM), or an ioctl/cdev i/f to the SMMUv3.
>> But how does one know today that there's NVIDIA-vCMDQ support on its
>> SMMUv3? -- is it
>> exposed in sysfs, ioctl, cdev?
>
> I think the capability will be reported through a IOCTL. Nicolin ?
>
>> ... and all needs to be per-instance ....
>> ... libvirt (or any other VMM orchestrator) will need to determine
>> compatibility for
>> live migration. e.g., can one live migrate an accel=nv-vcmdq-based VM to
>> a host with
>> accel=ecmdq support? only nv-vcmdq? what if there are version diffs of
>> nv-vcmdq over time?
>> -- apologies, but I don't know the minute details of nv-vcmdq to
>> determine if that's unlikely or not.
>
> Yes. This requires more thought. But our first aim is to get the basic smmuv3-accel
> support.
>
> Thanks,
> Shameer
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 4:29 ` Donald Dutile
@ 2024-11-28 4:44 ` Nicolin Chen
2024-11-28 12:54 ` Jason Gunthorpe
2024-11-28 8:17 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-11-28 4:44 UTC (permalink / raw)
To: Donald Dutile
Cc: Shameerali Kolothum Thodi, Eric Auger, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, jgg@nvidia.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On Wed, Nov 27, 2024 at 11:29:06PM -0500, Donald Dutile wrote:
> On 11/27/24 5:21 AM, Shameerali Kolothum Thodi wrote:
> > > > W.r.t naming, maybe something related to "hardware-accelerated"?
> > > >
> > > Given that 'accel' has been used for hw-acceleration elsewhere, that seems
> > > like a reasonable 'mode'.
> > > But, it needs a parameter to state what is being accelerated.
> > > i.e., the more global 'accel=kvm' has 'kvm'.
> >
> > I was thinking more like calling this hw accelerated nested SMMUv3 emulation
> > as 'smmuv3-accel'. This avoids confusion with the already existing
> > 'iommu=smmuv3' that also has a nested emulation support.
> >
> > ie,
> > -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
> >
..
> I -think- you are saying below, that we have to think a bit more about this
> device tagging. I'm thinking more like
> - device arm-smmuv3,accel=<vcmdq>,id=smmu1,bus=pcie.1 \
I wonder if we really need a "vcmdq" enabling/disabling option?
Jason's suggested approach for a vIOMMU allocation is to retry-
on-failure, so my draft patches allocate a TEGRA241_CMDQV type
of vIOMMU first, and then fall back to a regular SMMUV3 type if
it fails. So, a host that doesn't have a VCMDQ capability could
still work with the fallback/default pathway.
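For reference, the fallback roughly looks like this at the iommufd uapi level
(a sketch only: IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV and the exact struct layout
come from the draft kernel series, so the final uapi may differ):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    static uint32_t viommu_alloc_with_fallback(int iommufd, uint32_t dev_id,
                                               uint32_t s2_hwpt_id)
    {
        struct iommu_viommu_alloc alloc = {
            .size = sizeof(alloc),
            .type = IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV, /* accelerated type first */
            .dev_id = dev_id,
            .hwpt_id = s2_hwpt_id,
        };

        if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &alloc)) {
            /* Host has no VCMDQ: retry with the plain vSMMUv3 type */
            alloc.type = IOMMU_VIOMMU_TYPE_ARM_SMMUV3;
            if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &alloc)) {
                return 0;   /* neither type could be allocated */
            }
        }
        return alloc.out_viommu_id;
    }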
Otherwise, we'd need to expose some sysfs node as you mentioned
in the other reply, for libvirt to set or hide a "vcmdq" option.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 4:44 ` Nicolin Chen
@ 2024-11-28 12:54 ` Jason Gunthorpe
2024-11-28 18:22 ` Nicolin Chen
2024-12-02 18:53 ` Donald Dutile
0 siblings, 2 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2024-11-28 12:54 UTC (permalink / raw)
To: Nicolin Chen
Cc: Donald Dutile, Shameerali Kolothum Thodi, Eric Auger,
qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Wed, Nov 27, 2024 at 08:44:47PM -0800, Nicolin Chen wrote:
> On Wed, Nov 27, 2024 at 11:29:06PM -0500, Donald Dutile wrote:
> > On 11/27/24 5:21 AM, Shameerali Kolothum Thodi wrote:
> > > > > W.r.t naming, maybe something related to "hardware-accelerated"?
> > > > >
> > > > Given that 'accel' has been used for hw-acceleration elsewhere, that seems
> > > > like a reasonable 'mode'.
> > > > But, it needs a parameter to state what is being accelerated.
> > > > i.e., the more global 'accel=kvm' has 'kvm'.
> > >
> > > I was thinking more like calling this hw accelerated nested SMMUv3 emulation
> > > as 'smmuv3-accel'. This avoids confusion with the already existing
> > > 'iommu=smmuv3' that also has a nested emulation support.
> > >
> > > ie,
> > > -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
> > >
> ..
> > I -think- you are saying below, that we have to think a bit more about this
> > device tagging. I'm thinking more like
> > - device arm-smmuv3,accel=<vcmdq>,id=smmu1,bus=pcie.1 \
>
> I wonder if we really need a "vcmdq" enabling/disabling option?
>
> Jason's suggested approach for a vIOMMU allocation is to retry-
> on-failure, so my draft patches allocate a TEGRA241_CMDQV type
> of vIOMMU first, and then fall back to a regular SMMUV3 type if
> it fails. So, a host that doesn't have a VCMDQ capability could
> still work with the fallback/default pathway.
It needs to be configurable so the VM can be configured in a
consistent way across nodes.

Autodetection of host features is nice for experimenting, but scale
deployments should precisely specify every detail about the VM and not
rely on host detection. Otherwise the VM instance type will be
ill-defined.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 12:54 ` Jason Gunthorpe
@ 2024-11-28 18:22 ` Nicolin Chen
2024-12-02 18:53 ` Donald Dutile
1 sibling, 0 replies; 150+ messages in thread
From: Nicolin Chen @ 2024-11-28 18:22 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Donald Dutile, Shameerali Kolothum Thodi, Eric Auger,
qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Thu, Nov 28, 2024 at 08:54:26AM -0400, Jason Gunthorpe wrote:
> On Wed, Nov 27, 2024 at 08:44:47PM -0800, Nicolin Chen wrote:
> > On Wed, Nov 27, 2024 at 11:29:06PM -0500, Donald Dutile wrote:
> > > On 11/27/24 5:21 AM, Shameerali Kolothum Thodi wrote:
> > > > > > W.r.t naming, maybe something related to "hardware-accelerated"?
> > > > > >
> > > > > Given that 'accel' has been used for hw-acceleration elsewhere, that seems
> > > > > like a reasonable 'mode'.
> > > > > But, it needs a parameter to state what is being accelerated.
> > > > > i.e., the more global 'accel=kvm' has 'kvm'.
> > > >
> > > > I was thinking more like calling this hw accelerated nested SMMUv3 emulation
> > > > as 'smmuv3-accel'. This avoids confusion with the already existing
> > > > 'iommu=smmuv3' that also has a nested emulation support.
> > > >
> > > > ie,
> > > > -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
> > > >
> > ..
> > > I -think- you are saying below, that we have to think a bit more about this
> > > device tagging. I'm thinking more like
> > > - device arm-smmuv3,accel=<vcmdq>,id=smmu1,bus=pcie.1 \
> >
> > I wonder if we really need a "vcmdq" enabling/disabling option?
> >
> > Jason's suggested approach for a vIOMMU allocation is to retry-
> > on-failure, so my draft patches allocate a TEGRA241_CMDQV type
> > of vIOMMU first, and then fall back to a regular SMMUV3 type if
> > it fails. So, a host that doesn't have a VCMDQ capability could
> > still work with the fallback/default pathway.
>
> It needs to be configurable so the VM can be configured in a
> consistent way across nodes.
>
> Autodetection of host features is nice for experimenting, but scale
> deployments should precisely specify every detail about the VM and not
> rely on host detection. Otherwise the VM instance type will be
> ill-defined.
In that case, we'd need to expose a vcmdq capability somewhere.
We already do that for the vIOMMU via hw_info. Should we keep that consistent?
Otherwise, some sysfs nodes (easier for libvirt) could do the
job too: num_available_vintfs, max_vcmdqs_per_vintf, etc.
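For illustration, the hw_info route could look roughly like this; note the
capability bit below is purely hypothetical (defined locally here), since
today IOMMU_GET_HW_INFO only returns the SMMUv3 IDR/IIDR/AIDR registers:

    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    /* Hypothetical capability bit -- NOT part of the current uapi. */
    #define HYPOTHETICAL_IOMMU_HW_CAP_VCMDQ (1ULL << 63)

    static bool dev_has_vcmdq(int iommufd, uint32_t dev_id)
    {
        struct iommu_hw_info_arm_smmuv3 smmu = {};
        struct iommu_hw_info cmd = {
            .size = sizeof(cmd),
            .dev_id = dev_id,
            .data_len = sizeof(smmu),
            .data_uptr = (uintptr_t)&smmu,
        };

        if (ioctl(iommufd, IOMMU_GET_HW_INFO, &cmd) ||
            cmd.out_data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
            return false;
        }
        return !!(cmd.out_capabilities & HYPOTHETICAL_IOMMU_HW_CAP_VCMDQ);
    }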
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 12:54 ` Jason Gunthorpe
2024-11-28 18:22 ` Nicolin Chen
@ 2024-12-02 18:53 ` Donald Dutile
1 sibling, 0 replies; 150+ messages in thread
From: Donald Dutile @ 2024-12-02 18:53 UTC (permalink / raw)
To: Jason Gunthorpe, Nicolin Chen
Cc: Shameerali Kolothum Thodi, Eric Auger, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On 11/28/24 7:54 AM, Jason Gunthorpe wrote:
> On Wed, Nov 27, 2024 at 08:44:47PM -0800, Nicolin Chen wrote:
>> On Wed, Nov 27, 2024 at 11:29:06PM -0500, Donald Dutile wrote:
>>> On 11/27/24 5:21 AM, Shameerali Kolothum Thodi wrote:
>>>>>> W.r.t naming, maybe something related to "hardware-accelerated"?
>>>>>>
>>>>> Given that 'accel' has been used for hw-acceleration elsewhere, that seems
>>>>> like a reasonable 'mode'.
>>>>> But, it needs a parameter to state what is being accelerated.
>>>>> i.e., the more global 'accel=kvm' has 'kvm'.
>>>>
>>>> I was thinking more like calling this hw accelerated nested SMMUv3 emulation
>>>> as 'smmuv3-accel'. This avoids confusion with the already existing
>>>> 'iommu=smmuv3' that also has a nested emulation support.
>>>>
>>>> ie,
>>>> -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
>>>>
>> ..
>>> I -think- you are saying below, that we have to think a bit more about this
>>> device tagging. I'm thinking more like
>>> - device arm-smmuv3,accel=<vcmdq>,id=smmu1,bus=pcie.1 \
>>
>> I wonder if we really need a "vcmdq" enabling/disabling option?
>>
>> Jason's suggested approach for a vIOMMU allocation is to retry-
>> on-failure, so my draft patches allocate a TEGRA241_CMDQV type
>> of vIOMMU first, and then fall back to a regular SMMUV3 type if
>> it fails. So, a host that doesn't have a VCMDQ capability could
>> still work with the fallback/default pathway.
>
> It needs to be configurable so the VM can be configured in a
> consistent way across nodes.
>
Yes.
To expound further, one wants to be able to define 'acceptable'
VM criteria, so libvirt (or OpenStack?) can find and generate the list
of 'acceptable nodes', typically a priori, that can match the
acceptance criteria.
Conversely, if one specifies a set of systems that one wants to be able
to migrate across, then libvirt needs to find and select/set the features|attributes
that enable the VM to migrate in a compatible way.
> Autodetection of host features is nice for experimenting, but scale
> deployments should precisely specify every detail about the VM and not
> rely on host detection. Otherwise the VM instance type will be
> ill-defined.
>
> Jason
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-28 4:29 ` Donald Dutile
2024-11-28 4:44 ` Nicolin Chen
@ 2024-11-28 8:17 ` Shameerali Kolothum Thodi via
1 sibling, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-28 8:17 UTC (permalink / raw)
To: Donald Dutile, Nicolin Chen, Eric Auger
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, Linuxarm, Wangzhou (B),
jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Donald Dutile <ddutile@redhat.com>
> Sent: Thursday, November 28, 2024 4:29 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen
> <nicolinc@nvidia.com>; Eric Auger <eric.auger@redhat.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> peter.maydell@linaro.org; jgg@nvidia.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
> SMMUv3 Nested device
>
>
> >>> W.r.t naming, maybe something related to "hardware-accelerated"?
> >>>
> >> Given that 'accel' has been used for hw-acceleration elsewhere, that
> seems
> >> like a reasonable 'mode'.
> >> But, it needs a parameter to state what is being accelerated.
> >> i.e., the more global 'accel=kvm' has 'kvm'.
> >
> > I was thinking more like calling this hw accelerated nested SMMUv3
> emulation
> > as 'smmuv3-accel'. This avoids confusion with the already existing
> > 'iommu=smmuv3' that also has a nested emulation support.
> >
> > ie,
> > -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
> >
> I -think- you are saying below, that we have to think a bit more about this
> device tagging. I'm thinking more like
> - device arm-smmuv3,accel=<vcmdq>,id=smmu1,bus=pcie.1 \
Ok. But I think the initial suggestion to call this something other than arm-smmuv3
came from the fact that it makes use of the physical SMMUv3 nested stage support. This is
required for vfio-pci assignment. So I used "accel" in that context. That is what I
mean by the basic functionality of this SMMUv3 device. If we need any additional accelerated
feature support, then that can be provided as "properties" on top of this. Like,
- device arm-smmuv3-accel,id=smmu1,bus=pcie.1,vcmdq=on \
Or maybe, as Nicolin suggested (without an explicit "vcmdq"), probe for vCMDQ
support transparently and fall back to basic support if not available.
I prefer the first one, which gives an option to turn it off if required. But I don't have any
strong opinion either way.
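Just to make that shape concrete, a minimal sketch (SMMUv3AccelState and the
vcmdq field are illustrative names, not code from this series):

    static Property smmuv3_accel_properties[] = {
        DEFINE_PROP_STRING("bus", SMMUv3AccelState, pci_bus),
        DEFINE_PROP_BOOL("vcmdq", SMMUv3AccelState, vcmdq, false),
        DEFINE_PROP_END_OF_LIST(),
    };

Realize could then fail with a clear error if vcmdq=on is requested on a host
whose SMMUv3 has no VCMDQ, instead of silently falling back.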
Thanks,
Shameer.
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-13 17:12 ` Eric Auger
2024-11-13 18:05 ` Nicolin Chen
@ 2024-11-14 8:20 ` Shameerali Kolothum Thodi via
2024-11-14 8:41 ` Eric Auger
2024-11-15 22:32 ` Nicolin Chen
1 sibling, 2 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-14 8:20 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Eric,
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, November 13, 2024 5:12 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
> SMMUv3 Nested device
>
> Hi Shameer,
> On 11/8/24 13:52, Shameer Kolothum wrote:
> > Based on SMMUv3 as a parent device, add a user-creatable smmuv3-
> nested
> > device. Subsequent patches will add support to specify a PCI bus for
> > this device.
> >
> > Currently only supported for "virt", so hook up the sysbus mem & irq
> > for that as well.
> >
> > No FDT support is added for now.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> > hw/arm/smmuv3.c | 34 ++++++++++++++++++++++++++++++++++
> > hw/arm/virt.c | 31 +++++++++++++++++++++++++++++--
> > hw/core/sysbus-fdt.c | 1 +
> > include/hw/arm/smmuv3.h | 15 +++++++++++++++
> > include/hw/arm/virt.h | 6 ++++++
> > 5 files changed, 85 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c index
> > 2101031a8f..0033eb8125 100644
> > --- a/hw/arm/smmuv3.c
> > +++ b/hw/arm/smmuv3.c
> > @@ -2201,6 +2201,19 @@ static void smmu_realize(DeviceState *d, Error
> **errp)
> > smmu_init_irq(s, dev);
> > }
> >
> > +static void smmu_nested_realize(DeviceState *d, Error **errp) {
> > + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> nit: s/s_nested/ns or just s?
> > + SMMUv3NestedClass *c =
> ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
> > + Error *local_err = NULL;
> > +
> > + c->parent_realize(d, &local_err);
> I think it is safe to use errp directly here.
Ok.
> > + if (local_err) {
> > + error_propagate(errp, local_err);
> > + return;
> > + }
> > +}
> > +
> > static const VMStateDescription vmstate_smmuv3_queue = {
> > .name = "smmuv3_queue",
> > .version_id = 1,
> > @@ -2299,6 +2312,18 @@ static void smmuv3_class_init(ObjectClass
> *klass, void *data)
> > device_class_set_props(dc, smmuv3_properties); }
> >
> > +static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
> > +{
> > + DeviceClass *dc = DEVICE_CLASS(klass);
> > + SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_CLASS(klass);
> > +
> > + dc->vmsd = &vmstate_smmuv3;
> > + device_class_set_parent_realize(dc, smmu_nested_realize,
> > + &c->parent_realize);
> > + dc->user_creatable = true;
> > + dc->hotpluggable = false;
> > +}
> > +
> > static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
> > IOMMUNotifierFlag old,
> > IOMMUNotifierFlag new, @@
> > -2337,6 +2362,14 @@ static void
> smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
> > imrc->notify_flag_changed = smmuv3_notify_flag_changed; }
> >
> > +static const TypeInfo smmuv3_nested_type_info = {
> > + .name = TYPE_ARM_SMMUV3_NESTED,
> > + .parent = TYPE_ARM_SMMUV3,
> > + .instance_size = sizeof(SMMUv3NestedState),
> > + .class_size = sizeof(SMMUv3NestedClass),
> > + .class_init = smmuv3_nested_class_init,
> > +};
> > +
> > static const TypeInfo smmuv3_type_info = {
> > .name = TYPE_ARM_SMMUV3,
> > .parent = TYPE_ARM_SMMU,
> > @@ -2355,6 +2388,7 @@ static const TypeInfo
> > smmuv3_iommu_memory_region_info = { static void
> > smmuv3_register_types(void) {
> > type_register(&smmuv3_type_info);
> > + type_register(&smmuv3_nested_type_info);
> > type_register(&smmuv3_iommu_memory_region_info);
> > }
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c index
> > 780bcff77c..38075f9ab2 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
> > [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
> > [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
> > [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
> > + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
> I agree with Mostafa that the _NESTED terminology may not be the best
> choice.
Yes. Agree.
> The motivation behind that multi-instance attempt, as introduced in
> https://lore.kernel.org/all/ZEcT%2F7erkhHDaNvD@Asurada-Nvidia/
> was:
> - SMMUs with different feature bits
> - support of VCMDQ HW extension for SMMU CMDQ
> - need for separate S1 invalidation paths
>
> If I understand correctly this is mostly wanted for VCMDQ handling? if this
> is correct we may indicate that somehow in the terminology.
>
Not just for VCMDQ, but it benefits when we have multiple physical SMMUv3
instances as well.
> If I understand correctly VCMDQ terminology is NVidia specific while ECMDQ
> is the baseline (?).
Yes, VCMDQ is NVIDIA specific. And ECMDQ is ARM SMMUv3, but I don't think we
can associate an ECMDQ with a virtual SMMUv3.
> > /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that
> size */
> > [VIRT_PLATFORM_BUS] = { 0x0c000000, 0x02000000 },
> > [VIRT_SECURE_MEM] = { 0x0e000000, 0x01000000 },
> > @@ -226,6 +227,7 @@ static const int a15irqmap[] = {
> > [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
> > [VIRT_SMMU] = 74, /* ...to 74 + NUM_SMMU_IRQS - 1 */
> > [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS
> > -1 */
> > + [VIRT_SMMU_NESTED] = 200,
> What is the max number of IRQs expected to be consumed? Worth a comment for the
> next interrupt user.
Depends on how many we plan to support at max, and each requires a minimum of 4. I will
update with a comment here.
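For example, assuming NUM_SMMU_IRQS stays at 4 per instance, a cap of 32
instances would reserve 32 * 4 = 128 SPIs (200 ... 327), while the current
cap of 128 instances would need 512 SPIs (200 ... 711).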
> > };
> >
> > static void create_randomness(MachineState *ms, const char *node) @@
> > -2883,10 +2885,34 @@ static void
> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> > DeviceState *dev, Error
> > **errp) {
> > VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> > + MachineClass *mc = MACHINE_GET_CLASS(vms);
> >
> > - if (vms->platform_bus_dev) {
> > - MachineClass *mc = MACHINE_GET_CLASS(vms);
> > + /* For smmuv3-nested devices we need to set the mem & irq */
> > + if (device_is_dynamic_sysbus(mc, dev) &&
> > + object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_NESTED)) {
> why did you choose not to use the PLATFORM BUS infra, which does that
> kind of binding automatically (it also provisions dedicated MMIOs and
> IRQs)? At least you would need to justify it in the commit msg, I think
Because I was not sure how to do this binding otherwise. I couldn't find
any precedent for a dynamic platform bus device binding
MMIOs/IRQs (maybe I didn't look hard enough). I mentioned it in the cover letter.
Could you please give me some pointers/example for this? I will also
take another look.
> > + hwaddr base = vms->memmap[VIRT_SMMU_NESTED].base;
> > + int irq = vms->irqmap[VIRT_SMMU_NESTED];
> > +
> > + if (vms->smmu_nested_count >= MAX_SMMU_NESTED) {
> > + error_setg(errp, "smmuv3-nested max count reached!");
> > + return;
> > + }
> > +
> > + base += (vms->smmu_nested_count * SMMU_IO_LEN);
> > + irq += (vms->smmu_nested_count * NUM_SMMU_IRQS);
> >
> > + sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
> > + for (int i = 0; i < 4; i++) {
> > + sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
> > + qdev_get_gpio_in(vms->gic, irq + i));
> > + }
> > + if (vms->iommu != VIRT_IOMMU_SMMUV3_NESTED) {
> > + vms->iommu = VIRT_IOMMU_SMMUV3_NESTED;
> > + }
> > + vms->smmu_nested_count++;
> this kind of check would definitely not be integrated in the platform bus, but
> it could be introduced generically in the framework, or special-cased
> after platform_bus_link_device
Ok. So I assume there is a better way to link the MMIOs/IRQs as you mentioned
above and we can add another helper to track this count as well.
> > + }
> > +
> > + if (vms->platform_bus_dev) {
> > if (device_is_dynamic_sysbus(mc, dev)) {
> > platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
> >platform_bus_dev),
> > SYS_BUS_DEVICE(dev)); @@ -3067,6
> > +3093,7 @@ static void virt_machine_class_init(ObjectClass *oc, void
> *data)
> > machine_class_allow_dynamic_sysbus_dev(mc,
> TYPE_VFIO_AMD_XGBE);
> > machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
> > machine_class_allow_dynamic_sysbus_dev(mc,
> TYPE_VFIO_PLATFORM);
> > + machine_class_allow_dynamic_sysbus_dev(mc,
> > + TYPE_ARM_SMMUV3_NESTED);
> > #ifdef CONFIG_TPM
> > machine_class_allow_dynamic_sysbus_dev(mc,
> TYPE_TPM_TIS_SYSBUS);
> > #endif diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c index
> > eebcd28f9a..0f0d0b3e58 100644
> > --- a/hw/core/sysbus-fdt.c
> > +++ b/hw/core/sysbus-fdt.c
> > @@ -489,6 +489,7 @@ static const BindingEntry bindings[] = { #ifdef
> > CONFIG_LINUX
> > TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC,
> add_calxeda_midway_xgmac_fdt_node),
> > TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
> > + TYPE_BINDING("arm-smmuv3-nested", no_fdt_node),
> > VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a",
> > add_amd_xgbe_fdt_node), #endif #ifdef CONFIG_TPM diff --git
> > a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h index
> > d183a62766..87e628be7a 100644
> > --- a/include/hw/arm/smmuv3.h
> > +++ b/include/hw/arm/smmuv3.h
> > @@ -84,6 +84,21 @@ struct SMMUv3Class {
> > #define TYPE_ARM_SMMUV3 "arm-smmuv3"
> > OBJECT_DECLARE_TYPE(SMMUv3State, SMMUv3Class, ARM_SMMUV3)
> >
> > +#define TYPE_ARM_SMMUV3_NESTED "arm-smmuv3-nested"
> > +OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass,
> > +ARM_SMMUV3_NESTED)
> > +
> > +struct SMMUv3NestedState {
> > + SMMUv3State smmuv3_state;
> > +};
> > +
> > +struct SMMUv3NestedClass {
> > + /*< private >*/
> > + SMMUv3Class smmuv3_class;
> > + /*< public >*/
> > +
> > + DeviceRealize parent_realize;
> > +};
> > +
> > #define STAGE1_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S1P)
> > #define STAGE2_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S2P)
> >
> > diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index
> > 46f48fe561..50e47a4ef3 100644
> > --- a/include/hw/arm/virt.h
> > +++ b/include/hw/arm/virt.h
> > @@ -50,6 +50,9 @@
> > /* MMIO region size for SMMUv3 */
> > #define SMMU_IO_LEN 0x20000
> >
> > +/* Max supported nested SMMUv3 */
> > +#define MAX_SMMU_NESTED 128
> Ouch, that many?!
😊. I just came up with the max we can support with the allocated MMIO space.
We do have systems at present with 8 physical SMMUv3s.
Probably 16/32 would be a better number, I guess.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-14 8:20 ` Shameerali Kolothum Thodi via
@ 2024-11-14 8:41 ` Eric Auger
2024-11-14 13:27 ` Shameerali Kolothum Thodi via
2024-11-15 22:32 ` Nicolin Chen
1 sibling, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-14 8:41 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Shameer,
On 11/14/24 09:20, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, November 13, 2024 5:12 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
>> SMMUv3 Nested device
>>
>> Hi Shameer,
>> On 11/8/24 13:52, Shameer Kolothum wrote:
>>> Based on SMMUv3 as a parent device, add a user-creatable smmuv3-
>> nested
>>> device. Subsequent patches will add support to specify a PCI bus for
>>> this device.
>>>
>>> Currently only supported for "virt", so hook up the sysbus mem & irq
>>> for that as well.
>>>
>>> No FDT support is added for now.
>>>
>>> Signed-off-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@huawei.com>
>>> ---
>>> hw/arm/smmuv3.c | 34 ++++++++++++++++++++++++++++++++++
>>> hw/arm/virt.c | 31 +++++++++++++++++++++++++++++--
>>> hw/core/sysbus-fdt.c | 1 +
>>> include/hw/arm/smmuv3.h | 15 +++++++++++++++
>>> include/hw/arm/virt.h | 6 ++++++
>>> 5 files changed, 85 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c index
>>> 2101031a8f..0033eb8125 100644
>>> --- a/hw/arm/smmuv3.c
>>> +++ b/hw/arm/smmuv3.c
>>> @@ -2201,6 +2201,19 @@ static void smmu_realize(DeviceState *d, Error
>> **errp)
>>> smmu_init_irq(s, dev);
>>> }
>>>
>>> +static void smmu_nested_realize(DeviceState *d, Error **errp) {
>>> + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
>> nit: s/s_nested/ns or just s?
>>> + SMMUv3NestedClass *c =
>> ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
>>> + Error *local_err = NULL;
>>> +
>>> + c->parent_realize(d, &local_err);
>> I think it is safe to use errp directly here.
> Ok.
>
>>> + if (local_err) {
>>> + error_propagate(errp, local_err);
>>> + return;
>>> + }
>>> +}
>>> +
>>> static const VMStateDescription vmstate_smmuv3_queue = {
>>> .name = "smmuv3_queue",
>>> .version_id = 1,
>>> @@ -2299,6 +2312,18 @@ static void smmuv3_class_init(ObjectClass
>> *klass, void *data)
>>> device_class_set_props(dc, smmuv3_properties); }
>>>
>>> +static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
>>> +{
>>> + DeviceClass *dc = DEVICE_CLASS(klass);
>>> + SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_CLASS(klass);
>>> +
>>> + dc->vmsd = &vmstate_smmuv3;
>>> + device_class_set_parent_realize(dc, smmu_nested_realize,
>>> + &c->parent_realize);
>>> + dc->user_creatable = true;
>>> + dc->hotpluggable = false;
>>> +}
>>> +
>>> static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
>>> IOMMUNotifierFlag old,
>>> IOMMUNotifierFlag new, @@
>>> -2337,6 +2362,14 @@ static void
>> smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
>>> imrc->notify_flag_changed = smmuv3_notify_flag_changed; }
>>>
>>> +static const TypeInfo smmuv3_nested_type_info = {
>>> + .name = TYPE_ARM_SMMUV3_NESTED,
>>> + .parent = TYPE_ARM_SMMUV3,
>>> + .instance_size = sizeof(SMMUv3NestedState),
>>> + .class_size = sizeof(SMMUv3NestedClass),
>>> + .class_init = smmuv3_nested_class_init,
>>> +};
>>> +
>>> static const TypeInfo smmuv3_type_info = {
>>> .name = TYPE_ARM_SMMUV3,
>>> .parent = TYPE_ARM_SMMU,
>>> @@ -2355,6 +2388,7 @@ static const TypeInfo
>>> smmuv3_iommu_memory_region_info = { static void
>>> smmuv3_register_types(void) {
>>> type_register(&smmuv3_type_info);
>>> + type_register(&smmuv3_nested_type_info);
>>> type_register(&smmuv3_iommu_memory_region_info);
>>> }
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c index
>>> 780bcff77c..38075f9ab2 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
>>> [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
>>> [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
>>> [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
>>> + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
>> I agree with Mostafa that the _NESTED terminology may not be the best
>> choice.
> Yes. Agree.
Nicolin's suggestion to use a reference to HW acceleration looks
sensible to me.
>
>> The motivation behind that multi-instance attempt, as introduced in
>> https://lore.kernel.org/all/ZEcT%2F7erkhHDaNvD@Asurada-Nvidia/
>> was:
>> - SMMUs with different feature bits
>> - support of VCMDQ HW extension for SMMU CMDQ
>> - need for separate S1 invalidation paths
>>
>> If I understand correctly this is mostly wanted for VCMDQ handling? if this
>> is correct we may indicate that somehow in the terminology.
>>
> Not just for VCMDQ, but it benefits when we have multiple physical SMMUv3
> instances as well.
>
>> If I understand correctly VCMDQ terminology is NVidia specific while ECMDQ
>> is the baseline (?).
> Yes, VCMDQ is NVIDIA specific. And ECMDQ is ARM SMMUv3, but don’t think we
> can associate ECMDQ with a virtual SMMUv3.
ok
>
>>> /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that
>> size */
>>> [VIRT_PLATFORM_BUS] = { 0x0c000000, 0x02000000 },
>>> [VIRT_SECURE_MEM] = { 0x0e000000, 0x01000000 },
>>> @@ -226,6 +227,7 @@ static const int a15irqmap[] = {
>>> [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
>>> [VIRT_SMMU] = 74, /* ...to 74 + NUM_SMMU_IRQS - 1 */
>>> [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS
>>> -1 */
>>> + [VIRT_SMMU_NESTED] = 200,
>> What is the max number of IRQs expected to be consumed? Worth a comment for the
>> next interrupt user.
> Depends on how many we plan to support max and each requires minimum 4. I will
> update with a comment here.
>
>>> };
>>>
>>> static void create_randomness(MachineState *ms, const char *node) @@
>>> -2883,10 +2885,34 @@ static void
>> virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>>> DeviceState *dev, Error
>>> **errp) {
>>> VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
>>> + MachineClass *mc = MACHINE_GET_CLASS(vms);
>>>
>>> - if (vms->platform_bus_dev) {
>>> - MachineClass *mc = MACHINE_GET_CLASS(vms);
>>> + /* For smmuv3-nested devices we need to set the mem & irq */
>>> + if (device_is_dynamic_sysbus(mc, dev) &&
>>> + object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_NESTED)) {
>> why did you choose not using the PLATFORM BUS infra which does that
>> kind of binding automatically (also it provisions for dedicated MMIOs and
>> IRQs). At least you would need to justify in the commit msg I think
> Because I was not sure how to do this binding otherwise. I couldn't find
> any such precedence for a dynamic platform bus dev binding
> MMIOs/IRQs(May be I didn't look hard). I mentioned it in cover letter.
>
> Could you please give me some pointers/example for this? I will also
> take another look.
vfio-platform uses such automatic binding (however, you must check that
the platform bus MMIO and IRQ space is large enough for your needs).
The binding is transparently handled by:
    if (vms->platform_bus_dev) {
        if (device_is_dynamic_sysbus(mc, dev)) {
            platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
                                     SYS_BUS_DEVICE(dev));
        }
    }
>
>>> + hwaddr base = vms->memmap[VIRT_SMMU_NESTED].base;
>>> + int irq = vms->irqmap[VIRT_SMMU_NESTED];
>>> +
>>> + if (vms->smmu_nested_count >= MAX_SMMU_NESTED) {
>>> + error_setg(errp, "smmuv3-nested max count reached!");
>>> + return;
>>> + }
>>> +
>>> + base += (vms->smmu_nested_count * SMMU_IO_LEN);
>>> + irq += (vms->smmu_nested_count * NUM_SMMU_IRQS);
>>>
>>> + sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
>>> + for (int i = 0; i < 4; i++) {
>>> + sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
>>> + qdev_get_gpio_in(vms->gic, irq + i));
>>> + }
>>> + if (vms->iommu != VIRT_IOMMU_SMMUV3_NESTED) {
>>> + vms->iommu = VIRT_IOMMU_SMMUV3_NESTED;
>>> + }
>>> + vms->smmu_nested_count++;
>> this kind of check would definitively not integrated in the platform bus but
>> this could be introduced generically in the framework though or special
>> cased after the platform_bus_link_device
> Ok. So I assume there is a better way to link the MMIOs/IRQs as you mentioned
> above and we can add another helper to track this count as well.
>
>>> + }
>>> +
>>> + if (vms->platform_bus_dev) {
>>> if (device_is_dynamic_sysbus(mc, dev)) {
>>> platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
>>> platform_bus_dev),
>>> SYS_BUS_DEVICE(dev)); @@ -3067,6
>>> +3093,7 @@ static void virt_machine_class_init(ObjectClass *oc, void
>> *data)
>>> machine_class_allow_dynamic_sysbus_dev(mc,
>> TYPE_VFIO_AMD_XGBE);
>>> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
>>> machine_class_allow_dynamic_sysbus_dev(mc,
>> TYPE_VFIO_PLATFORM);
>>> + machine_class_allow_dynamic_sysbus_dev(mc,
>>> + TYPE_ARM_SMMUV3_NESTED);
>>> #ifdef CONFIG_TPM
>>> machine_class_allow_dynamic_sysbus_dev(mc,
>> TYPE_TPM_TIS_SYSBUS);
>>> #endif diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c index
>>> eebcd28f9a..0f0d0b3e58 100644
>>> --- a/hw/core/sysbus-fdt.c
>>> +++ b/hw/core/sysbus-fdt.c
>>> @@ -489,6 +489,7 @@ static const BindingEntry bindings[] = { #ifdef
>>> CONFIG_LINUX
>>> TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC,
>> add_calxeda_midway_xgmac_fdt_node),
>>> TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
>>> + TYPE_BINDING("arm-smmuv3-nested", no_fdt_node),
>>> VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a",
>>> add_amd_xgbe_fdt_node), #endif #ifdef CONFIG_TPM diff --git
>>> a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h index
>>> d183a62766..87e628be7a 100644
>>> --- a/include/hw/arm/smmuv3.h
>>> +++ b/include/hw/arm/smmuv3.h
>>> @@ -84,6 +84,21 @@ struct SMMUv3Class {
>>> #define TYPE_ARM_SMMUV3 "arm-smmuv3"
>>> OBJECT_DECLARE_TYPE(SMMUv3State, SMMUv3Class, ARM_SMMUV3)
>>>
>>> +#define TYPE_ARM_SMMUV3_NESTED "arm-smmuv3-nested"
>>> +OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass,
>>> +ARM_SMMUV3_NESTED)
>>> +
>>> +struct SMMUv3NestedState {
>>> + SMMUv3State smmuv3_state;
>>> +};
>>> +
>>> +struct SMMUv3NestedClass {
>>> + /*< private >*/
>>> + SMMUv3Class smmuv3_class;
>>> + /*< public >*/
>>> +
>>> + DeviceRealize parent_realize;
>>> +};
>>> +
>>> #define STAGE1_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S1P)
>>> #define STAGE2_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S2P)
>>>
>>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index
>>> 46f48fe561..50e47a4ef3 100644
>>> --- a/include/hw/arm/virt.h
>>> +++ b/include/hw/arm/virt.h
>>> @@ -50,6 +50,9 @@
>>> /* MMIO region size for SMMUv3 */
>>> #define SMMU_IO_LEN 0x20000
>>>
>>> +/* Max supported nested SMMUv3 */
>>> +#define MAX_SMMU_NESTED 128
>> Ouch, that many?!
> 😊. I just came up with the max we can support the allocated MMIO space.
> We do have systems at present which has 8 physical SMMUv3s at the moment.
> Probably 16/32 would be a better number I guess.
OK thanks
Eric
>
> Thanks,
> Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-14 8:41 ` Eric Auger
@ 2024-11-14 13:27 ` Shameerali Kolothum Thodi via
0 siblings, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-14 13:27 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Eric,
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, November 14, 2024 8:42 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for
> SMMUv3 Nested device
> >> why did you choose not using the PLATFORM BUS infra which does that
> >> kind of binding automatically (also it provisions for dedicated MMIOs
> and
> >> IRQs). At least you would need to justify in the commit msg I think
> > Because I was not sure how to do this binding otherwise. I couldn't find
> > any such precedence for a dynamic platform bus dev binding
> > MMIOs/IRQs(May be I didn't look hard). I mentioned it in cover letter.
> >
> > Could you please give me some pointers/example for this? I will also
> > take another look.
> vfio platform users such automatic binding (however you must check that
> vfio platform bus mmio and irq space is large enough for your needs).
>
> the binding is transparently handled by
> if (vms->platform_bus_dev) {
> if (device_is_dynamic_sysbus(mc, dev)) {
>
> platform_bus_link_device(PLATFORM_BUS_DEVICE(vms-
> >platform_bus_dev),
> SYS_BUS_DEVICE(dev));
> }
> }
Ah, I see. I missed that it does that transparently. And we can use
platform_bus_get_mmio_addr() to retrieve it for ACPI/DT, similar to what the TPM
device does.
So we don't need specific entries for this device in memmap/irqmap.
I will give it a try.
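For reference, recovering the base for ACPI/DT could then look roughly like
this, mirroring the TPM handling in hw/arm/virt-acpi-build.c (the helper name
is made up):

    static hwaddr smmuv3_accel_mmio_base(VirtMachineState *vms,
                                         SysBusDevice *sbdev)
    {
        PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(vms->platform_bus_dev);

        /* platform_bus_get_mmio_addr() returns the offset of MMIO region 0
         * within the platform bus window, once the device has been linked. */
        return vms->memmap[VIRT_PLATFORM_BUS].base +
               platform_bus_get_mmio_addr(pbus, sbdev, 0);
    }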
Thanks,
Shameer.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-14 8:20 ` Shameerali Kolothum Thodi via
2024-11-14 8:41 ` Eric Auger
@ 2024-11-15 22:32 ` Nicolin Chen
1 sibling, 0 replies; 150+ messages in thread
From: Nicolin Chen @ 2024-11-15 22:32 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On Thu, Nov 14, 2024 at 08:20:08AM +0000, Shameerali Kolothum Thodi wrote:
> > > diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index
> > > 46f48fe561..50e47a4ef3 100644
> > > --- a/include/hw/arm/virt.h
> > > +++ b/include/hw/arm/virt.h
> > > @@ -50,6 +50,9 @@
> > > /* MMIO region size for SMMUv3 */
> > > #define SMMU_IO_LEN 0x20000
> > >
> > > +/* Max supported nested SMMUv3 */
> > > +#define MAX_SMMU_NESTED 128
> > Ouch, that many?!
>
> 😊. I just came up with the max we can support the allocated MMIO space.
> We do have systems at present which has 8 physical SMMUv3s at the moment.
> Probably 16/32 would be a better number I guess.
FWIW, we have systems with 20 physical SMMUs at the moment :)
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
2024-11-08 12:52 ` [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device Shameer Kolothum via
2024-11-13 17:12 ` Eric Auger
@ 2024-11-13 18:00 ` Eric Auger
1 sibling, 0 replies; 150+ messages in thread
From: Eric Auger @ 2024-11-13 18:00 UTC (permalink / raw)
To: Shameer Kolothum, qemu-arm, qemu-devel
Cc: peter.maydell, jgg, nicolinc, ddutile, linuxarm, wangzhou1,
jiangkunkun, jonathan.cameron, zhangfei.gao
On 11/8/24 13:52, Shameer Kolothum wrote:
> Based on SMMUv3 as a parent device, add a user-creatable
> smmuv3-nested device. Subsequent patches will add support to
> specify a PCI bus for this device.
>
> Currently only supported for "virt", so hook up the sysbus mem & irq
> for that as well.
>
> No FDT support is added for now.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> hw/arm/smmuv3.c | 34 ++++++++++++++++++++++++++++++++++
> hw/arm/virt.c | 31 +++++++++++++++++++++++++++++--
> hw/core/sysbus-fdt.c | 1 +
> include/hw/arm/smmuv3.h | 15 +++++++++++++++
> include/hw/arm/virt.h | 6 ++++++
> 5 files changed, 85 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 2101031a8f..0033eb8125 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -2201,6 +2201,19 @@ static void smmu_realize(DeviceState *d, Error **errp)
> smmu_init_irq(s, dev);
> }
>
> +static void smmu_nested_realize(DeviceState *d, Error **errp)
> +{
> + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> + SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
> + Error *local_err = NULL;
> +
> + c->parent_realize(d, &local_err);
> + if (local_err) {
> + error_propagate(errp, local_err);
> + return;
> + }
> +}
> +
> static const VMStateDescription vmstate_smmuv3_queue = {
> .name = "smmuv3_queue",
> .version_id = 1,
> @@ -2299,6 +2312,18 @@ static void smmuv3_class_init(ObjectClass *klass, void *data)
> device_class_set_props(dc, smmuv3_properties);
> }
>
> +static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
> +{
> + DeviceClass *dc = DEVICE_CLASS(klass);
> + SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_CLASS(klass);
> +
> + dc->vmsd = &vmstate_smmuv3;
> + device_class_set_parent_realize(dc, smmu_nested_realize,
> + &c->parent_realize);
> + dc->user_creatable = true;
this may be set at the very end of the series eventually.
Eric
> + dc->hotpluggable = false;
> +}
> +
> static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
> IOMMUNotifierFlag old,
> IOMMUNotifierFlag new,
> @@ -2337,6 +2362,14 @@ static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
> imrc->notify_flag_changed = smmuv3_notify_flag_changed;
> }
>
> +static const TypeInfo smmuv3_nested_type_info = {
> + .name = TYPE_ARM_SMMUV3_NESTED,
> + .parent = TYPE_ARM_SMMUV3,
> + .instance_size = sizeof(SMMUv3NestedState),
> + .class_size = sizeof(SMMUv3NestedClass),
> + .class_init = smmuv3_nested_class_init,
> +};
> +
> static const TypeInfo smmuv3_type_info = {
> .name = TYPE_ARM_SMMUV3,
> .parent = TYPE_ARM_SMMU,
> @@ -2355,6 +2388,7 @@ static const TypeInfo smmuv3_iommu_memory_region_info = {
> static void smmuv3_register_types(void)
> {
> type_register(&smmuv3_type_info);
> + type_register(&smmuv3_nested_type_info);
> type_register(&smmuv3_iommu_memory_region_info);
> }
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 780bcff77c..38075f9ab2 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -181,6 +181,7 @@ static const MemMapEntry base_memmap[] = {
> [VIRT_PVTIME] = { 0x090a0000, 0x00010000 },
> [VIRT_SECURE_GPIO] = { 0x090b0000, 0x00001000 },
> [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
> + [VIRT_SMMU_NESTED] = { 0x0b000000, 0x01000000 },
> /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
> [VIRT_PLATFORM_BUS] = { 0x0c000000, 0x02000000 },
> [VIRT_SECURE_MEM] = { 0x0e000000, 0x01000000 },
> @@ -226,6 +227,7 @@ static const int a15irqmap[] = {
> [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
> [VIRT_SMMU] = 74, /* ...to 74 + NUM_SMMU_IRQS - 1 */
> [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
> + [VIRT_SMMU_NESTED] = 200,
> };
>
> static void create_randomness(MachineState *ms, const char *node)
> @@ -2883,10 +2885,34 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> DeviceState *dev, Error **errp)
> {
> VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
> + MachineClass *mc = MACHINE_GET_CLASS(vms);
>
> - if (vms->platform_bus_dev) {
> - MachineClass *mc = MACHINE_GET_CLASS(vms);
> + /* For smmuv3-nested devices we need to set the mem & irq */
> + if (device_is_dynamic_sysbus(mc, dev) &&
> + object_dynamic_cast(OBJECT(dev), TYPE_ARM_SMMUV3_NESTED)) {
> + hwaddr base = vms->memmap[VIRT_SMMU_NESTED].base;
> + int irq = vms->irqmap[VIRT_SMMU_NESTED];
> +
> + if (vms->smmu_nested_count >= MAX_SMMU_NESTED) {
> + error_setg(errp, "smmuv3-nested max count reached!");
> + return;
> + }
> +
> + base += (vms->smmu_nested_count * SMMU_IO_LEN);
> + irq += (vms->smmu_nested_count * NUM_SMMU_IRQS);
>
> + sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
> + for (int i = 0; i < 4; i++) {
> + sysbus_connect_irq(SYS_BUS_DEVICE(dev), i,
> + qdev_get_gpio_in(vms->gic, irq + i));
> + }
> + if (vms->iommu != VIRT_IOMMU_SMMUV3_NESTED) {
> + vms->iommu = VIRT_IOMMU_SMMUV3_NESTED;
> + }
> + vms->smmu_nested_count++;
> + }
> +
> + if (vms->platform_bus_dev) {
> if (device_is_dynamic_sysbus(mc, dev)) {
> platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
> SYS_BUS_DEVICE(dev));
> @@ -3067,6 +3093,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_AMD_XGBE);
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_VFIO_PLATFORM);
> + machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ARM_SMMUV3_NESTED);
> #ifdef CONFIG_TPM
> machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
> #endif
> diff --git a/hw/core/sysbus-fdt.c b/hw/core/sysbus-fdt.c
> index eebcd28f9a..0f0d0b3e58 100644
> --- a/hw/core/sysbus-fdt.c
> +++ b/hw/core/sysbus-fdt.c
> @@ -489,6 +489,7 @@ static const BindingEntry bindings[] = {
> #ifdef CONFIG_LINUX
> TYPE_BINDING(TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node),
> TYPE_BINDING(TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node),
> + TYPE_BINDING("arm-smmuv3-nested", no_fdt_node),
> VFIO_PLATFORM_BINDING("amd,xgbe-seattle-v1a", add_amd_xgbe_fdt_node),
> #endif
> #ifdef CONFIG_TPM
> diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
> index d183a62766..87e628be7a 100644
> --- a/include/hw/arm/smmuv3.h
> +++ b/include/hw/arm/smmuv3.h
> @@ -84,6 +84,21 @@ struct SMMUv3Class {
> #define TYPE_ARM_SMMUV3 "arm-smmuv3"
> OBJECT_DECLARE_TYPE(SMMUv3State, SMMUv3Class, ARM_SMMUV3)
>
> +#define TYPE_ARM_SMMUV3_NESTED "arm-smmuv3-nested"
> +OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass, ARM_SMMUV3_NESTED)
> +
> +struct SMMUv3NestedState {
> + SMMUv3State smmuv3_state;
> +};
> +
> +struct SMMUv3NestedClass {
> + /*< private >*/
> + SMMUv3Class smmuv3_class;
> + /*< public >*/
> +
> + DeviceRealize parent_realize;
> +};
> +
> #define STAGE1_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S1P)
> #define STAGE2_SUPPORTED(s) FIELD_EX32(s->idr[0], IDR0, S2P)
>
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 46f48fe561..50e47a4ef3 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -50,6 +50,9 @@
> /* MMIO region size for SMMUv3 */
> #define SMMU_IO_LEN 0x20000
>
> +/* Max supported nested SMMUv3 */
> +#define MAX_SMMU_NESTED 128
> +
> enum {
> VIRT_FLASH,
> VIRT_MEM,
> @@ -62,6 +65,7 @@ enum {
> VIRT_GIC_ITS,
> VIRT_GIC_REDIST,
> VIRT_SMMU,
> + VIRT_SMMU_NESTED,
> VIRT_UART0,
> VIRT_MMIO,
> VIRT_RTC,
> @@ -92,6 +96,7 @@ enum {
> typedef enum VirtIOMMUType {
> VIRT_IOMMU_NONE,
> VIRT_IOMMU_SMMUV3,
> + VIRT_IOMMU_SMMUV3_NESTED,
> VIRT_IOMMU_VIRTIO,
> } VirtIOMMUType;
>
> @@ -155,6 +160,7 @@ struct VirtMachineState {
> bool mte;
> bool dtb_randomness;
> bool second_ns_uart_present;
> + int smmu_nested_count;
> OnOffAuto acpi;
> VirtGICType gic_version;
> VirtIOMMUType iommu;
^ permalink raw reply [flat|nested] 150+ messages in thread
* [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
2024-11-08 12:52 ` [RFC PATCH 1/5] hw/arm/virt: Add an SMMU_IO_LEN macro Shameer Kolothum via
2024-11-08 12:52 ` [RFC PATCH 2/5] hw/arm/smmuv3: Add initial support for SMMUv3 Nested device Shameer Kolothum via
@ 2024-11-08 12:52 ` Shameer Kolothum via
2024-11-13 17:58 ` Eric Auger
2025-01-30 16:29 ` Daniel P. Berrangé
2024-11-08 12:52 ` [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes Shameer Kolothum via
` (7 subsequent siblings)
10 siblings, 2 replies; 150+ messages in thread
From: Shameer Kolothum via @ 2024-11-08 12:52 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, linuxarm,
wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
Subsequent patches will add IORT modifications to get this working.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
hw/arm/smmuv3.c | 27 +++++++++++++++++++++++++++
include/hw/arm/smmuv3.h | 2 ++
2 files changed, 29 insertions(+)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 0033eb8125..9b0a776769 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -24,6 +24,7 @@
#include "hw/qdev-properties.h"
#include "hw/qdev-core.h"
#include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
#include "cpu.h"
#include "trace.h"
#include "qemu/log.h"
@@ -2201,12 +2202,32 @@ static void smmu_realize(DeviceState *d, Error **errp)
smmu_init_irq(s, dev);
}
+static int smmuv3_nested_pci_host_bridge(Object *obj, void *opaque)
+{
+ DeviceState *d = opaque;
+ SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
+
+ if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
+ PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
+ if (s_nested->pci_bus && !strcmp(bus->qbus.name, s_nested->pci_bus)) {
+ object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
+ &error_abort);
+ }
+ }
+ return 0;
+}
+
static void smmu_nested_realize(DeviceState *d, Error **errp)
{
SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
+ SysBusDevice *dev = SYS_BUS_DEVICE(d);
Error *local_err = NULL;
+ object_child_foreach_recursive(object_get_root(),
+ smmuv3_nested_pci_host_bridge, d);
+ object_property_set_bool(OBJECT(dev), "nested", true, &error_abort);
+
c->parent_realize(d, &local_err);
if (local_err) {
error_propagate(errp, local_err);
@@ -2293,6 +2314,11 @@ static Property smmuv3_properties[] = {
DEFINE_PROP_END_OF_LIST()
};
+static Property smmuv3_nested_properties[] = {
+ DEFINE_PROP_STRING("pci-bus", SMMUv3NestedState, pci_bus),
+ DEFINE_PROP_END_OF_LIST()
+};
+
static void smmuv3_instance_init(Object *obj)
{
/* Nothing much to do here as of now */
@@ -2320,6 +2346,7 @@ static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_smmuv3;
device_class_set_parent_realize(dc, smmu_nested_realize,
&c->parent_realize);
+ device_class_set_props(dc, smmuv3_nested_properties);
dc->user_creatable = true;
dc->hotpluggable = false;
}
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index 87e628be7a..96513fce56 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -89,6 +89,8 @@ OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass, ARM_SMMUV3_NESTED)
struct SMMUv3NestedState {
SMMUv3State smmuv3_state;
+
+ char *pci_bus;
};
struct SMMUv3NestedClass {
--
2.34.1
^ permalink raw reply related [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
2024-11-08 12:52 ` [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a " Shameer Kolothum via
@ 2024-11-13 17:58 ` Eric Auger
2024-11-14 8:30 ` Shameerali Kolothum Thodi via
2025-01-30 16:29 ` Daniel P. Berrangé
1 sibling, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-13 17:58 UTC (permalink / raw)
To: Shameer Kolothum, qemu-arm, qemu-devel
Cc: peter.maydell, jgg, nicolinc, ddutile, linuxarm, wangzhou1,
jiangkunkun, jonathan.cameron, zhangfei.gao
Hi Shameer,
On 11/8/24 13:52, Shameer Kolothum wrote:
> Subsequent patches will add IORT modifications to get this working.
add a proper commit msg once non-RFC ;-)
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> hw/arm/smmuv3.c | 27 +++++++++++++++++++++++++++
> include/hw/arm/smmuv3.h | 2 ++
> 2 files changed, 29 insertions(+)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 0033eb8125..9b0a776769 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -24,6 +24,7 @@
> #include "hw/qdev-properties.h"
> #include "hw/qdev-core.h"
> #include "hw/pci/pci.h"
> +#include "hw/pci/pci_bridge.h"
> #include "cpu.h"
> #include "trace.h"
> #include "qemu/log.h"
> @@ -2201,12 +2202,32 @@ static void smmu_realize(DeviceState *d, Error **errp)
> smmu_init_irq(s, dev);
> }
>
> +static int smmuv3_nested_pci_host_bridge(Object *obj, void *opaque)
> +{
> + DeviceState *d = opaque;
> + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> +
> + if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> + PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> + if (s_nested->pci_bus && !strcmp(bus->qbus.name, s_nested->pci_bus)) {
> + object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> + &error_abort);
> + }
> + }
> + return 0;
> +}
> +
> static void smmu_nested_realize(DeviceState *d, Error **errp)
> {
> SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
> + SysBusDevice *dev = SYS_BUS_DEVICE(d);
> Error *local_err = NULL;
>
> + object_child_foreach_recursive(object_get_root(),
> + smmuv3_nested_pci_host_bridge, d);
By using a different opaque struct pointer you could properly use the errp
and fail nicely if the bus is not found (avoiding error_abort).
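Something along these lines, perhaps (sketch only, reusing the names from the
patch):

    typedef struct SMMUv3NestedBusFind {
        SMMUv3NestedState *s;
        PCIBus *bus;                 /* set when the named bus is found */
    } SMMUv3NestedBusFind;

    static int smmuv3_nested_find_bus(Object *obj, void *opaque)
    {
        SMMUv3NestedBusFind *find = opaque;

        if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
            PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
            if (find->s->pci_bus && !strcmp(bus->qbus.name, find->s->pci_bus)) {
                find->bus = bus;
            }
        }
        return 0;
    }

and in smmu_nested_realize():

        SMMUv3NestedBusFind find = { .s = s_nested };

        object_child_foreach_recursive(object_get_root(),
                                       smmuv3_nested_find_bus, &find);
        if (!find.bus) {
            error_setg(errp, "pci-bus '%s' not found", s_nested->pci_bus);
            return;
        }
        object_property_set_link(OBJECT(d), "primary-bus",
                                 OBJECT(find.bus), errp);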
> + object_property_set_bool(OBJECT(dev), "nested", true, &error_abort);
why do you need that nested property, as the SMMU is already typed
differently?
> +
> c->parent_realize(d, &local_err);
> if (local_err) {
> error_propagate(errp, local_err);
> @@ -2293,6 +2314,11 @@ static Property smmuv3_properties[] = {
> DEFINE_PROP_END_OF_LIST()
> };
>
> +static Property smmuv3_nested_properties[] = {
> + DEFINE_PROP_STRING("pci-bus", SMMUv3NestedState, pci_bus),
nit: maybe we can use the "bus" name instead of pci-bus
> + DEFINE_PROP_END_OF_LIST()
> +};
> +
> static void smmuv3_instance_init(Object *obj)
> {
> /* Nothing much to do here as of now */
> @@ -2320,6 +2346,7 @@ static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
> dc->vmsd = &vmstate_smmuv3;
> device_class_set_parent_realize(dc, smmu_nested_realize,
> &c->parent_realize);
> + device_class_set_props(dc, smmuv3_nested_properties);
> dc->user_creatable = true;
> dc->hotpluggable = false;
> }
> diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
> index 87e628be7a..96513fce56 100644
> --- a/include/hw/arm/smmuv3.h
> +++ b/include/hw/arm/smmuv3.h
> @@ -89,6 +89,8 @@ OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass, ARM_SMMUV3_NESTED)
>
> struct SMMUv3NestedState {
> SMMUv3State smmuv3_state;
> +
> + char *pci_bus;
> };
>
> struct SMMUv3NestedClass {
Thanks
Eric
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
2024-11-13 17:58 ` Eric Auger
@ 2024-11-14 8:30 ` Shameerali Kolothum Thodi via
0 siblings, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-14 8:30 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, November 13, 2024 5:59 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a
> SMMUv3 Nested device
>
> Hi Shameer,
>
> On 11/8/24 13:52, Shameer Kolothum wrote:
> > Subsequent patches will add IORT modifications to get this working.
> add a proper commit msg once non RFC ;-)
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> > hw/arm/smmuv3.c | 27 +++++++++++++++++++++++++++
> > include/hw/arm/smmuv3.h | 2 ++
> > 2 files changed, 29 insertions(+)
> >
> > diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> > index 0033eb8125..9b0a776769 100644
> > --- a/hw/arm/smmuv3.c
> > +++ b/hw/arm/smmuv3.c
> > @@ -24,6 +24,7 @@
> > #include "hw/qdev-properties.h"
> > #include "hw/qdev-core.h"
> > #include "hw/pci/pci.h"
> > +#include "hw/pci/pci_bridge.h"
> > #include "cpu.h"
> > #include "trace.h"
> > #include "qemu/log.h"
> > @@ -2201,12 +2202,32 @@ static void smmu_realize(DeviceState *d,
> Error **errp)
> > smmu_init_irq(s, dev);
> > }
> >
> > +static int smmuv3_nested_pci_host_bridge(Object *obj, void *opaque)
> > +{
> > + DeviceState *d = opaque;
> > + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> > +
> > + if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> > + PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> > + if (s_nested->pci_bus && !strcmp(bus->qbus.name, s_nested->pci_bus)) {
> > + object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> > + &error_abort);
> > + }
> > + }
> > + return 0;
> > +}
> > +
> > static void smmu_nested_realize(DeviceState *d, Error **errp)
> > {
> > SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> > SMMUv3NestedClass *c =
> ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
> > + SysBusDevice *dev = SYS_BUS_DEVICE(d);
> > Error *local_err = NULL;
> >
> > + object_child_foreach_recursive(object_get_root(),
> > + smmuv3_nested_pci_host_bridge, d);
> Using a different opaque struct pointer you may properly use the errp
> and nicely fail if the bus is not found (avoid using error_abort).
Ok.
> > + object_property_set_bool(OBJECT(dev), "nested", true, &error_abort);
> why do you need that nested property as the SMMU is already type'd
> differently.
I think it is because there are previous patches in Nicolin's branch that used this
"nested" property to differentiate the address space. I will check and update.
> > +
> > c->parent_realize(d, &local_err);
> > if (local_err) {
> > error_propagate(errp, local_err);
> > @@ -2293,6 +2314,11 @@ static Property smmuv3_properties[] = {
> > DEFINE_PROP_END_OF_LIST()
> > };
> >
> > +static Property smmuv3_nested_properties[] = {
> > + DEFINE_PROP_STRING("pci-bus", SMMUv3NestedState, pci_bus),
> nit: maybe we can use the "bus" name instead of pci-bus
Ok.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
2024-11-08 12:52 ` [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a " Shameer Kolothum via
2024-11-13 17:58 ` Eric Auger
@ 2025-01-30 16:29 ` Daniel P. Berrangé
2025-01-30 18:19 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-01-30 16:29 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
ddutile, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Fri, Nov 08, 2024 at 12:52:40PM +0000, Shameer Kolothum via wrote:
> Subsequent patches will add IORT modifications to get this working.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> hw/arm/smmuv3.c | 27 +++++++++++++++++++++++++++
> include/hw/arm/smmuv3.h | 2 ++
> 2 files changed, 29 insertions(+)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 0033eb8125..9b0a776769 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -24,6 +24,7 @@
> #include "hw/qdev-properties.h"
> #include "hw/qdev-core.h"
> #include "hw/pci/pci.h"
> +#include "hw/pci/pci_bridge.h"
> #include "cpu.h"
> #include "trace.h"
> #include "qemu/log.h"
> @@ -2201,12 +2202,32 @@ static void smmu_realize(DeviceState *d, Error **errp)
> smmu_init_irq(s, dev);
> }
>
> +static int smmuv3_nested_pci_host_bridge(Object *obj, void *opaque)
> +{
> + DeviceState *d = opaque;
> + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> +
> + if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> + PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> + if (s_nested->pci_bus && !strcmp(bus->qbus.name, s_nested->pci_bus)) {
> + object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> + &error_abort);
Is the SMMUv3Nested useful if no 'primary-bus' is set?
If not, then the 'realize' method ought to validate that 'pci-bus' is not
NULL and raise an error if it is.
After object_child_foreach_recursive returns, 'realize' should
also validate that 'primary-bus' has been set, and raise an error
if not set, to detect typos in the 'pci-bus' property.
> + }
> + }
> + return 0;
> +}
> +
> static void smmu_nested_realize(DeviceState *d, Error **errp)
> {
> SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s_nested);
> + SysBusDevice *dev = SYS_BUS_DEVICE(d);
> Error *local_err = NULL;
>
> + object_child_foreach_recursive(object_get_root(),
> + smmuv3_nested_pci_host_bridge, d);
> + object_property_set_bool(OBJECT(dev), "nested", true, &error_abort);
> +
> c->parent_realize(d, &local_err);
> if (local_err) {
> error_propagate(errp, local_err);
> @@ -2293,6 +2314,11 @@ static Property smmuv3_properties[] = {
> DEFINE_PROP_END_OF_LIST()
> };
>
> +static Property smmuv3_nested_properties[] = {
> + DEFINE_PROP_STRING("pci-bus", SMMUv3NestedState, pci_bus),
> + DEFINE_PROP_END_OF_LIST()
> +};
> +
> static void smmuv3_instance_init(Object *obj)
> {
> /* Nothing much to do here as of now */
> @@ -2320,6 +2346,7 @@ static void smmuv3_nested_class_init(ObjectClass *klass, void *data)
> dc->vmsd = &vmstate_smmuv3;
> device_class_set_parent_realize(dc, smmu_nested_realize,
> &c->parent_realize);
> + device_class_set_props(dc, smmuv3_nested_properties);
> dc->user_creatable = true;
> dc->hotpluggable = false;
> }
> diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
> index 87e628be7a..96513fce56 100644
> --- a/include/hw/arm/smmuv3.h
> +++ b/include/hw/arm/smmuv3.h
> @@ -89,6 +89,8 @@ OBJECT_DECLARE_TYPE(SMMUv3NestedState, SMMUv3NestedClass, ARM_SMMUV3_NESTED)
>
> struct SMMUv3NestedState {
> SMMUv3State smmuv3_state;
> +
> + char *pci_bus;
> };
>
> struct SMMUv3NestedClass {
> --
> 2.34.1
>
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
2025-01-30 16:29 ` Daniel P. Berrangé
@ 2025-01-30 18:19 ` Shameerali Kolothum Thodi via
0 siblings, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-01-30 18:19 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, January 30, 2025 4:30 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a
> SMMUv3 Nested device
>
> On Fri, Nov 08, 2024 at 12:52:40PM +0000, Shameer Kolothum via wrote:
> > Subsequent patches will add IORT modifications to get this working.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> > hw/arm/smmuv3.c | 27 +++++++++++++++++++++++++++
> > include/hw/arm/smmuv3.h | 2 ++
> > 2 files changed, 29 insertions(+)
> >
> > diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> > index 0033eb8125..9b0a776769 100644
> > --- a/hw/arm/smmuv3.c
> > +++ b/hw/arm/smmuv3.c
> > @@ -24,6 +24,7 @@
> > #include "hw/qdev-properties.h"
> > #include "hw/qdev-core.h"
> > #include "hw/pci/pci.h"
> > +#include "hw/pci/pci_bridge.h"
> > #include "cpu.h"
> > #include "trace.h"
> > #include "qemu/log.h"
> > @@ -2201,12 +2202,32 @@ static void smmu_realize(DeviceState *d,
> Error **errp)
> > smmu_init_irq(s, dev);
> > }
> >
> > +static int smmuv3_nested_pci_host_bridge(Object *obj, void *opaque)
> > +{
> > + DeviceState *d = opaque;
> > + SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
> > +
> > + if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> > + PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> > + if (s_nested->pci_bus && !strcmp(bus->qbus.name, s_nested->pci_bus)) {
> > + object_property_set_link(OBJECT(d), "primary-bus", OBJECT(bus),
> > + &error_abort);
>
> Is the SMMUv3Nested useful if no 'primary-bus' is set ?
>
> If not, then the 'realize' method ought to validate 'pci-bus' is not
> NULL and and raise an error if NULL.
>
> After object_child_foreach_recursive returns, 'realize' should
> also validate that 'primary-bus' has been set, and raise an error
> it not set, to detect typos in the 'pci-bus' property.
I think that gets checked in the parent->realize():
smmu_base_realize()
    if (s->primary_bus) {
        pci_setup_iommu(s->primary_bus, &smmu_ops, s);
    } else {
        error_setg(errp, "SMMU is not attached to any PCI bus!");
    }
I will double check if that covers all the corner cases or not.
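
For reference, a rough sketch of the extra checks Daniel describes, layered
on top of the RFC's realize (the error messages are illustrative and the
"nested" property handling from the RFC is elided):

static void smmu_nested_realize(DeviceState *d, Error **errp)
{
    SMMUv3NestedState *s_nested = ARM_SMMUV3_NESTED(d);
    SMMUv3NestedClass *c = ARM_SMMUV3_NESTED_GET_CLASS(s_nested);

    /* Fail early if the user never set the property at all */
    if (!s_nested->pci_bus) {
        error_setg(errp, "arm-smmuv3-nested: 'pci-bus' property must be set");
        return;
    }

    object_child_foreach_recursive(object_get_root(),
                                   smmuv3_nested_pci_host_bridge, d);

    /* Catch typos in 'pci-bus': no host bridge matched the given name */
    if (!object_property_get_link(OBJECT(d), "primary-bus", NULL)) {
        error_setg(errp, "arm-smmuv3-nested: PCI bus '%s' not found",
                   s_nested->pci_bus);
        return;
    }

    /* ... "nested" property handling from the RFC goes here ... */
    c->parent_realize(d, errp);
}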
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (2 preceding siblings ...)
2024-11-08 12:52 ` [RFC PATCH 3/5] hw/arm/smmuv3: Associate a pci bus with a " Shameer Kolothum via
@ 2024-11-08 12:52 ` Shameer Kolothum via
2024-11-18 10:01 ` Eric Auger
2024-11-08 12:52 ` [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum via
` (6 subsequent siblings)
10 siblings, 1 reply; 150+ messages in thread
From: Shameer Kolothum via @ 2024-11-08 12:52 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, linuxarm,
wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
From: Nicolin Chen <nicolinc@nvidia.com>
Now that we can have multiple user-creatable smmuv3-nested
devices, each associated with different pci buses, update
IORT ID mappings accordingly.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
hw/arm/virt-acpi-build.c | 34 ++++++++++++++++++++++++----------
include/hw/arm/virt.h | 6 ++++++
2 files changed, 30 insertions(+), 10 deletions(-)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e10cad86dd..ec4cdfb2d7 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -276,8 +276,10 @@ static void
build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
{
int i, nb_nodes, rc_mapping_count;
- size_t node_size, smmu_offset = 0;
+ size_t node_size, *smmu_offset;
AcpiIortIdMapping *idmap;
+ hwaddr base;
+ int irq, num_smmus = 0;
uint32_t id = 0;
GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
@@ -287,7 +289,21 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
/* Table 2 The IORT */
acpi_table_begin(&table, table_data);
- if (vms->iommu == VIRT_IOMMU_SMMUV3) {
+ if (vms->smmu_nested_count) {
+ irq = vms->irqmap[VIRT_SMMU_NESTED] + ARM_SPI_BASE;
+ base = vms->memmap[VIRT_SMMU_NESTED].base;
+ num_smmus = vms->smmu_nested_count;
+ } else if (virt_has_smmuv3(vms)) {
+ irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
+ base = vms->memmap[VIRT_SMMU].base;
+ num_smmus = 1;
+ }
+
+ smmu_offset = g_new0(size_t, num_smmus);
+ nb_nodes = 2; /* RC, ITS */
+ nb_nodes += num_smmus; /* SMMU nodes */
+
+ if (virt_has_smmuv3(vms)) {
AcpiIortIdMapping next_range = {0};
object_child_foreach_recursive(object_get_root(),
@@ -317,10 +333,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
g_array_append_val(its_idmaps, next_range);
}
- nb_nodes = 3; /* RC, ITS, SMMUv3 */
rc_mapping_count = smmu_idmaps->len + its_idmaps->len;
} else {
- nb_nodes = 2; /* RC, ITS */
rc_mapping_count = 1;
}
/* Number of IORT Nodes */
@@ -342,10 +356,9 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
/* GIC ITS Identifier Array */
build_append_int_noprefix(table_data, 0 /* MADT translation_id */, 4);
- if (vms->iommu == VIRT_IOMMU_SMMUV3) {
- int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
+ for (i = 0; i < num_smmus; i++) {
+ smmu_offset[i] = table_data->len - table.table_offset;
- smmu_offset = table_data->len - table.table_offset;
/* Table 9 SMMUv3 Format */
build_append_int_noprefix(table_data, 4 /* SMMUv3 */, 1); /* Type */
node_size = SMMU_V3_ENTRY_SIZE + ID_MAPPING_ENTRY_SIZE;
@@ -356,7 +369,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
/* Reference to ID Array */
build_append_int_noprefix(table_data, SMMU_V3_ENTRY_SIZE, 4);
/* Base address */
- build_append_int_noprefix(table_data, vms->memmap[VIRT_SMMU].base, 8);
+ build_append_int_noprefix(table_data, base + (i * SMMU_IO_LEN), 8);
/* Flags */
build_append_int_noprefix(table_data, 1 /* COHACC Override */, 4);
build_append_int_noprefix(table_data, 0, 4); /* Reserved */
@@ -367,6 +380,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
build_append_int_noprefix(table_data, irq + 1, 4); /* PRI */
build_append_int_noprefix(table_data, irq + 3, 4); /* GERR */
build_append_int_noprefix(table_data, irq + 2, 4); /* Sync */
+ irq += NUM_SMMU_IRQS;
build_append_int_noprefix(table_data, 0, 4); /* Proximity domain */
/* DeviceID mapping index (ignored since interrupts are GSIV based) */
build_append_int_noprefix(table_data, 0, 4);
@@ -405,7 +419,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
build_append_int_noprefix(table_data, 0, 3); /* Reserved */
/* Output Reference */
- if (vms->iommu == VIRT_IOMMU_SMMUV3) {
+ if (virt_has_smmuv3(vms)) {
AcpiIortIdMapping *range;
/* translated RIDs connect to SMMUv3 node: RC -> SMMUv3 -> ITS */
@@ -413,7 +427,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
/* output IORT node is the smmuv3 node */
build_iort_id_mapping(table_data, range->input_base,
- range->id_count, smmu_offset);
+ range->id_count, smmu_offset[i]);
}
/* bypassed RIDs connect to ITS group node directly: RC -> ITS */
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 50e47a4ef3..304ab134ae 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -219,4 +219,10 @@ static inline int virt_gicv3_redist_region_count(VirtMachineState *vms)
vms->highmem_redists) ? 2 : 1;
}
+static inline bool virt_has_smmuv3(const VirtMachineState *vms)
+{
+ return vms->iommu == VIRT_IOMMU_SMMUV3 ||
+ vms->iommu == VIRT_IOMMU_SMMUV3_NESTED;
+}
+
#endif /* QEMU_ARM_VIRT_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-08 12:52 ` [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes Shameer Kolothum via
@ 2024-11-18 10:01 ` Eric Auger
2024-11-18 11:44 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-18 10:01 UTC (permalink / raw)
To: Shameer Kolothum, qemu-arm, qemu-devel
Cc: peter.maydell, jgg, nicolinc, ddutile, linuxarm, wangzhou1,
jiangkunkun, jonathan.cameron, zhangfei.gao
Hi Shameer,
On 11/8/24 13:52, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> Now that we can have multiple user-creatable smmuv3-nested
> devices, each associated with different pci buses, update
> IORT ID mappings accordingly.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> hw/arm/virt-acpi-build.c | 34 ++++++++++++++++++++++++----------
> include/hw/arm/virt.h | 6 ++++++
> 2 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index e10cad86dd..ec4cdfb2d7 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -276,8 +276,10 @@ static void
> build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> {
> int i, nb_nodes, rc_mapping_count;
> - size_t node_size, smmu_offset = 0;
> + size_t node_size, *smmu_offset;
> AcpiIortIdMapping *idmap;
> + hwaddr base;
> + int irq, num_smmus = 0;
> uint32_t id = 0;
> GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
> GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
> @@ -287,7 +289,21 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> /* Table 2 The IORT */
> acpi_table_begin(&table, table_data);
>
> - if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> + if (vms->smmu_nested_count) {
> + irq = vms->irqmap[VIRT_SMMU_NESTED] + ARM_SPI_BASE;
> + base = vms->memmap[VIRT_SMMU_NESTED].base;
> + num_smmus = vms->smmu_nested_count;
> + } else if (virt_has_smmuv3(vms)) {
> + irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> + base = vms->memmap[VIRT_SMMU].base;
> + num_smmus = 1;
> + }
> +
> + smmu_offset = g_new0(size_t, num_smmus);
> + nb_nodes = 2; /* RC, ITS */
> + nb_nodes += num_smmus; /* SMMU nodes */
> +
> + if (virt_has_smmuv3(vms)) {
> AcpiIortIdMapping next_range = {0};
>
> object_child_foreach_recursive(object_get_root(),
> @@ -317,10 +333,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> g_array_append_val(its_idmaps, next_range);
> }
>
> - nb_nodes = 3; /* RC, ITS, SMMUv3 */
> rc_mapping_count = smmu_idmaps->len + its_idmaps->len;
> } else {
> - nb_nodes = 2; /* RC, ITS */
> rc_mapping_count = 1;
> }
> /* Number of IORT Nodes */
> @@ -342,10 +356,9 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> /* GIC ITS Identifier Array */
> build_append_int_noprefix(table_data, 0 /* MADT translation_id */, 4);
>
> - if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> - int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> + for (i = 0; i < num_smmus; i++) {
> + smmu_offset[i] = table_data->len - table.table_offset;
>
I would have expected changes in the smmu idmap as well. If a given
SMMU instance now protects a given bus hierarchy, shouldn't it be
reflected in a differentiated SMMU idmap for each of them (RID subset of
SMMU->pci-bus mapping to a specific IORT SMMU node)? How is it done
currently?
Thanks
Eric
> - smmu_offset = table_data->len - table.table_offset;
> /* Table 9 SMMUv3 Format */
> build_append_int_noprefix(table_data, 4 /* SMMUv3 */, 1); /* Type */
> node_size = SMMU_V3_ENTRY_SIZE + ID_MAPPING_ENTRY_SIZE;
> @@ -356,7 +369,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> /* Reference to ID Array */
> build_append_int_noprefix(table_data, SMMU_V3_ENTRY_SIZE, 4);
> /* Base address */
> - build_append_int_noprefix(table_data, vms->memmap[VIRT_SMMU].base, 8);
> + build_append_int_noprefix(table_data, base + (i * SMMU_IO_LEN), 8);
> /* Flags */
> build_append_int_noprefix(table_data, 1 /* COHACC Override */, 4);
> build_append_int_noprefix(table_data, 0, 4); /* Reserved */
> @@ -367,6 +380,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> build_append_int_noprefix(table_data, irq + 1, 4); /* PRI */
> build_append_int_noprefix(table_data, irq + 3, 4); /* GERR */
> build_append_int_noprefix(table_data, irq + 2, 4); /* Sync */
> + irq += NUM_SMMU_IRQS;
> build_append_int_noprefix(table_data, 0, 4); /* Proximity domain */
> /* DeviceID mapping index (ignored since interrupts are GSIV based) */
> build_append_int_noprefix(table_data, 0, 4);
> @@ -405,7 +419,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> build_append_int_noprefix(table_data, 0, 3); /* Reserved */
>
> /* Output Reference */
> - if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> + if (virt_has_smmuv3(vms)) {
> AcpiIortIdMapping *range;
>
> /* translated RIDs connect to SMMUv3 node: RC -> SMMUv3 -> ITS */
> @@ -413,7 +427,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
> /* output IORT node is the smmuv3 node */
> build_iort_id_mapping(table_data, range->input_base,
> - range->id_count, smmu_offset);
> + range->id_count, smmu_offset[i]);
> }
>
> /* bypassed RIDs connect to ITS group node directly: RC -> ITS */
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 50e47a4ef3..304ab134ae 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -219,4 +219,10 @@ static inline int virt_gicv3_redist_region_count(VirtMachineState *vms)
> vms->highmem_redists) ? 2 : 1;
> }
>
> +static inline bool virt_has_smmuv3(const VirtMachineState *vms)
> +{
> + return vms->iommu == VIRT_IOMMU_SMMUV3 ||
> + vms->iommu == VIRT_IOMMU_SMMUV3_NESTED;
> +}
> +
> #endif /* QEMU_ARM_VIRT_H */
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-18 10:01 ` Eric Auger
@ 2024-11-18 11:44 ` Shameerali Kolothum Thodi via
2024-11-18 13:45 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-18 11:44 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Monday, November 18, 2024 10:02 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> multiple SMMU nodes
>
> > /* Number of IORT Nodes */
> > @@ -342,10 +356,9 @@ build_iort(GArray *table_data, BIOSLinker
> *linker, VirtMachineState *vms)
> > /* GIC ITS Identifier Array */
> > build_append_int_noprefix(table_data, 0 /* MADT translation_id */,
> 4);
> >
> > - if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> > - int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> > + for (i = 0; i < num_smmus; i++) {
> > + smmu_offset[i] = table_data->len - table.table_offset;
> >
> I would have expected changes in the smmu idmap has well. If a given
> SMMU instance now protects a given bus hierarchy shouldn't it be
> reflected in a differentiated SMMU idmap for each of them (RID subset of
> SMMU->pci-bus mapping to a specific IORT SMMU node)? How is it done
> currently?
I thought that smmu_idmaps would be handled by this?
object_child_foreach_recursive(object_get_root(),
                               iort_host_bridges, smmu_idmaps);
But it is possible that there is a bug in this IORT generation here, as I am
not able to hot add devices. It looks like the pciehp interrupt is not
generated/received for some reason. Nicolin[0] suspects that the min/max bus
range in iort_host_bridges() may not leave enough ranges for hot add later.
Cold plugging devices to different SMMUv3/pcie-pxb seems to be alright.
I will debug that soon.
Thanks,
Shameer
[0] https://lore.kernel.org/qemu-devel/ZzPd1F%2FUA2MKMbwl@Asurada-Nvidia/
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-18 11:44 ` Shameerali Kolothum Thodi via
@ 2024-11-18 13:45 ` Eric Auger
2024-11-18 15:00 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-18 13:45 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Shameer,
On 11/18/24 12:44, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Monday, November 18, 2024 10:02 AM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
>> multiple SMMU nodes
>>
>>> /* Number of IORT Nodes */
>>> @@ -342,10 +356,9 @@ build_iort(GArray *table_data, BIOSLinker
>> *linker, VirtMachineState *vms)
>>> /* GIC ITS Identifier Array */
>>> build_append_int_noprefix(table_data, 0 /* MADT translation_id */,
>> 4);
>>> - if (vms->iommu == VIRT_IOMMU_SMMUV3) {
>>> - int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
>>> + for (i = 0; i < num_smmus; i++) {
>>> + smmu_offset[i] = table_data->len - table.table_offset;
>>>
>> I would have expected changes in the smmu idmap has well. If a given
>> SMMU instance now protects a given bus hierarchy shouldn't it be
>> reflected in a differentiated SMMU idmap for each of them (RID subset of
>> SMMU->pci-bus mapping to a specific IORT SMMU node)? How is it done
>> currently?
> I thought that smmu_idmaps will be handled by this ?
>
> object_child_foreach_recursive(object_get_root(),
> iort_host_bridges, smmu_idmaps);
To me this traverses the QEMU object hierarchy to find all host bridges
and for each of them builds an idmap array (smmu_idmaps mapping this RC
RID range to this SMMU). But as far as I can tell those idmaps will be
assigned to *all* SMMU instances, leading to a wrong IORT description
because all SMMUs will be protecting all devices. You should only retain
the idmaps which correspond to the pci_bus a given vSMMU is attached to.
Then each SMMU will protect a distinct PCIe subtree, which does not seem
to be the case today. At least that's my current understanding.
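
A rough sketch of that filtering (the context struct and function name are
made up for illustration, and the exact id_count convention should follow
the existing build_iort_id_mapping() usage):

typedef struct SMMUIdMapCtx {
    const char *smmu_bus;   /* name of the bus this vSMMU is attached to */
    GArray *smmu_idmaps;    /* RC->SMMU idmaps owned by this vSMMU */
} SMMUIdMapCtx;

static int iort_smmu_host_bridges(Object *obj, void *opaque)
{
    SMMUIdMapCtx *ctx = opaque;

    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;

        if (bus && !pci_bus_bypass_iommu(bus) &&
            !strcmp(bus->qbus.name, ctx->smmu_bus)) {
            int min_bus, max_bus;
            AcpiIortIdMapping idmap;

            pci_bus_range(bus, &min_bus, &max_bus);
            idmap.input_base = min_bus << 8;
            idmap.id_count = (max_bus - min_bus + 1) << 8;
            g_array_append_val(ctx->smmu_idmaps, idmap);
        }
    }
    return 0;
}

build_iort() would then run one such walk per vSMMU instance and, in the
Root Complex node, point only these ranges at that instance's
smmu_offset[i], instead of mapping every collected range to every SMMU
node.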
Eric
>
> But it is possible that, there is a bug in this IORT generation here as I am not
> able to hot add devices. It looks like the pciehp interrupt is not generated/received
> for some reason. Nicolin[0] is suspecting the min/max bus range in
> iort_host_bridges() may not leave enough ranges for hot add later.
>
> Cold plugging devices to different SMMUv3/pcie-pxb seems to be alright.
>
> I will debug that soon.
>
> Thanks,
> Shameer
> [0] https://lore.kernel.org/qemu-devel/ZzPd1F%2FUA2MKMbwl@Asurada-Nvidia/
>
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-18 13:45 ` Eric Auger
@ 2024-11-18 15:00 ` Shameerali Kolothum Thodi via
2024-11-18 18:09 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-18 15:00 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Eric,
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Monday, November 18, 2024 1:46 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> multiple SMMU nodes
>
> >>> /* Number of IORT Nodes */
> >>> @@ -342,10 +356,9 @@ build_iort(GArray *table_data, BIOSLinker
> >> *linker, VirtMachineState *vms)
> >>> /* GIC ITS Identifier Array */
> >>> build_append_int_noprefix(table_data, 0 /* MADT translation_id */,
> >> 4);
> >>> - if (vms->iommu == VIRT_IOMMU_SMMUV3) {
> >>> - int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
> >>> + for (i = 0; i < num_smmus; i++) {
> >>> + smmu_offset[i] = table_data->len - table.table_offset;
> >>>
> >> I would have expected changes in the smmu idmap has well. If a given
> >> SMMU instance now protects a given bus hierarchy shouldn't it be
> >> reflected in a differentiated SMMU idmap for each of them (RID subset
> of
> >> SMMU->pci-bus mapping to a specific IORT SMMU node)? How is it done
> >> currently?
> > I thought that smmu_idmaps will be handled by this ?
> >
> > object_child_foreach_recursive(object_get_root(),
> > iort_host_bridges, smmu_idmaps);
> to me this traverses the qemu object hierarchy to find all host bridges
> and for each of them builds an idmap array (smmu_idmaps mapping this
> RC
> RID range to this SMMU). But to me those idmaps will be assigned to
> *all* SMMU insteaces leading to a wong IORT description because all
> SMMUs will be protecting all devices. You shall only retain idmaps which
> correspond to the pci_bus a given vSMMU is attached to. Then each SMMU
> will protect a distinct PCIe subtree which does not seem the case today.
> At least that's my current understanding.
Ah, right. I will fix that in the next version.
I think the above won't affect the basic case where I have only one
pcie-pxb/SMMUv3. But even in that case hot add does not seem to work.
I tried hacking the min/max ranges as suspected by Nicolin, but that was
still not enough to get it working. Do you have any hint on why the hot
add (described below) is not working?
Thanks,
Shameer
>
> Eric
>
>
> >
> > But it is possible that, there is a bug in this IORT generation here as I am
> not
> > able to hot add devices. It looks like the pciehp interrupt is not
> generated/received
> > for some reason. Nicolin[0] is suspecting the min/max bus range in
> > iort_host_bridges() may not leave enough ranges for hot add later.
> >
> > Cold plugging devices to different SMMUv3/pcie-pxb seems to be alright.
> >
> > I will debug that soon.
> >
> > Thanks,
> > Shameer
> > [0] https://lore.kernel.org/qemu-devel/ZzPd1F%2FUA2MKMbwl@Asurada-
> Nvidia/
> >
> >
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-18 15:00 ` Shameerali Kolothum Thodi via
@ 2024-11-18 18:09 ` Eric Auger
2024-11-20 14:16 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-18 18:09 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Shameer,
On 11/18/24 16:00, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Monday, November 18, 2024 1:46 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
>> multiple SMMU nodes
>>
>>>>> /* Number of IORT Nodes */
>>>>> @@ -342,10 +356,9 @@ build_iort(GArray *table_data, BIOSLinker
>>>> *linker, VirtMachineState *vms)
>>>>> /* GIC ITS Identifier Array */
>>>>> build_append_int_noprefix(table_data, 0 /* MADT translation_id */,
>>>> 4);
>>>>> - if (vms->iommu == VIRT_IOMMU_SMMUV3) {
>>>>> - int irq = vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
>>>>> + for (i = 0; i < num_smmus; i++) {
>>>>> + smmu_offset[i] = table_data->len - table.table_offset;
>>>>>
>>>> I would have expected changes in the smmu idmap has well. If a given
>>>> SMMU instance now protects a given bus hierarchy shouldn't it be
>>>> reflected in a differentiated SMMU idmap for each of them (RID subset
>> of
>>>> SMMU->pci-bus mapping to a specific IORT SMMU node)? How is it done
>>>> currently?
>>> I thought that smmu_idmaps will be handled by this ?
>>>
>>> object_child_foreach_recursive(object_get_root(),
>>> iort_host_bridges, smmu_idmaps);
>> to me this traverses the qemu object hierarchy to find all host bridges
>> and for each of them builds an idmap array (smmu_idmaps mapping this
>> RC
>> RID range to this SMMU). But to me those idmaps will be assigned to
>> *all* SMMU insteaces leading to a wong IORT description because all
>> SMMUs will be protecting all devices. You shall only retain idmaps which
>> correspond to the pci_bus a given vSMMU is attached to. Then each SMMU
>> will protect a distinct PCIe subtree which does not seem the case today.
>> At least that's my current understanding.
> Ah..right. I will fix that in next version.
>
> I think the above won't affect the basic case where I have only one
> pcie-pxb/SMMUv3. But even in that case hot add seems not working.
>
> I tried hacking the min/max ranges as suspected by Nicolin. But still not enough to
> get it working. Do you have any hint on why the hot add(described below) is not
> working?
Hmm, I thought the duplicate idmap could be the cause. Otherwise I have no
clue. I would advise fixing it first.
Eric
>
> Thanks,
> Shameer
>
>> Eric
>>
>>
>>> But it is possible that, there is a bug in this IORT generation here as I am
>> not
>>> able to hot add devices. It looks like the pciehp interrupt is not
>> generated/received
>>> for some reason. Nicolin[0] is suspecting the min/max bus range in
>>> iort_host_bridges() may not leave enough ranges for hot add later.
>>>
>>> Cold plugging devices to different SMMUv3/pcie-pxb seems to be alright.
>>>
>>> I will debug that soon.
>>>
>>> Thanks,
>>> Shameer
>>> [0] https://lore.kernel.org/qemu-devel/ZzPd1F%2FUA2MKMbwl@Asurada-
>> Nvidia/
>>>
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-18 18:09 ` Eric Auger
@ 2024-11-20 14:16 ` Shameerali Kolothum Thodi via
2024-11-20 16:10 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-20 14:16 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Monday, November 18, 2024 6:10 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> multiple SMMU nodes
[...]
> > I think the above won't affect the basic case where I have only one
> > pcie-pxb/SMMUv3. But even in that case hot add seems not working.
> >
> > I tried hacking the min/max ranges as suspected by Nicolin. But still not
> enough to
> > get it working. Do you have any hint on why the hot add(described
> below) is not
> > working?
> Hum thought the duplicate idmap could be the cause. Otherwise I have no
> clue. I would advice to fix it first.
I think I have an idea why the hot add was not working.
When we have the PCIe topology as something like below,
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2 \
-device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
...
The current IORT generation includes the pcie-root-port dev ids also
in the SMMUv3 node idmaps.
Hence, when Guest kernel loads, pcieport is also behind the SMMUv3.
[ 1.466670] pcieport 0000:64:00.0: Adding to iommu group 1
...
[ 1.448205] pcieport 0000:64:01.0: Adding to iommu group 2
So when we do a hot add,
device_add vfio-pci,host=0000:75:00.1,bus=pcie.port1,iommufd=iommufd0
The Qemu hotplug event handler tries to inject an IRQ to the Guest pcieport
by retrieving the MSI address it is configured with.
hotplug_event_notify()
msix_prepare_message(): [address: 0xfffff040]
msix_notify()
The ITS address retrieved here is actually the SMMUv3-translated IOVA,
not the Guest PA, so the Guest never sees/receives the interrupt.
I did hack the IORT code to exclude the pcie-root-port dev ids from the SMMUv3
node idmaps and the hot add seems to work fine.
Looks like we need to find all the pcie-root-port dev ids associated with a
SMMUv3/pxb-pcie and exclude them from SMMUv3 node idmaps to get
the hot add working.
I am not sure, though, whether this will create any other issues with the
IOMMU isolation criteria (ACS etc.), especially if we want to access the
device in Guest user space (I hope not).
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-20 14:16 ` Shameerali Kolothum Thodi via
@ 2024-11-20 16:10 ` Eric Auger
2024-11-20 16:26 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-20 16:10 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Shameer,
On 11/20/24 15:16, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Monday, November 18, 2024 6:10 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
>> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
>> multiple SMMU nodes
> [...]
>
>>> I think the above won't affect the basic case where I have only one
>>> pcie-pxb/SMMUv3. But even in that case hot add seems not working.
>>>
>>> I tried hacking the min/max ranges as suspected by Nicolin. But still not
>> enough to
>>> get it working. Do you have any hint on why the hot add(described
>> below) is not
>>> working?
>> Hum thought the duplicate idmap could be the cause. Otherwise I have no
>> clue. I would advice to fix it first.
> I think I have an idea why the hot add was not working.
>
> When we have the PCIe topology as something like below,
>
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> ...
>
> The current IORT generation includes the pcie-root-port dev ids also
> in the SMMUv3 node idmaps.
>
> Hence, when Guest kernel loads, pcieport is also behind the SMMUv3.
>
> [ 1.466670] pcieport 0000:64:00.0: Adding to iommu group 1
> ...
> [ 1.448205] pcieport 0000:64:01.0: Adding to iommu group 2
But it should be the same without multi-instantiation, no? I would have
expected this to be the normal behaviour. Have you tested hot-plug without
the series lately? Do you see the same problem?
Thanks
Eric
>
>
> So when we do a hot add,
> device_add vfio-pci,host=0000:75:00.1,bus=pcie.port1,iommufd=iommufd0
>
> The Qemu hotplug event handler tries to inject an IRQ to the Guest pcieport
> by retrieving the MSI address it is configured with.
>
> hotplug_event_notify()
> msix_prepare_message(): [address: 0xfffff040]
> msix_notify()
>
> The ITS address retrieved here is actually the SMMUv3 translated iova addr,
> not the Guest PA. So Guest never sees/receives the interrupt.
>
> I did hack the IORT code to exclude the pcie-root-port dev ids from the SMMUv3
> node idmaps and the hot add seems to work fine.
>
> Looks like we need to find all the pcie-root-port dev ids associated with a
> SMMUv3/pxb-pcie and exclude them from SMMUv3 node idmaps to get
> the hot add working.
>
> I am not sure though this will create any other issues in IOMMU isolation criteria
> (ACS etc,), especially if we want to access the device in Guest user space( I hope
> not).
>
> Thanks,
> Shameer
>
>
>
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-20 16:10 ` Eric Auger
@ 2024-11-20 16:26 ` Shameerali Kolothum Thodi via
2024-11-21 9:46 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-20 16:26 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Eric,
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, November 20, 2024 4:11 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> multiple SMMU nodes
>
> Hi Shameer,
>
> On 11/20/24 15:16, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Eric Auger <eric.auger@redhat.com>
> >> Sent: Monday, November 18, 2024 6:10 PM
> >> To: Shameerali Kolothum Thodi
> >> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org
> >> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> >> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> >> Jonathan Cameron <jonathan.cameron@huawei.com>;
> >> zhangfei.gao@linaro.org
> >> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> >> multiple SMMU nodes
> > [...]
> >
> >>> I think the above won't affect the basic case where I have only one
> >>> pcie-pxb/SMMUv3. But even in that case hot add seems not working.
> >>>
> >>> I tried hacking the min/max ranges as suspected by Nicolin. But still not
> >> enough to
> >>> get it working. Do you have any hint on why the hot add(described
> >> below) is not
> >>> working?
> >> Hum thought the duplicate idmap could be the cause. Otherwise I have
> no
> >> clue. I would advice to fix it first.
> > I think I have an idea why the hot add was not working.
> >
> > When we have the PCIe topology as something like below,
> >
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2 \
> > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > ...
> >
> > The current IORT generation includes the pcie-root-port dev ids also
> > in the SMMUv3 node idmaps.
> >
> > Hence, when Guest kernel loads, pcieport is also behind the SMMUv3.
> >
> > [ 1.466670] pcieport 0000:64:00.0: Adding to iommu group 1
> > ...
> > [ 1.448205] pcieport 0000:64:01.0: Adding to iommu group 2
>
> But it should be the same without multi-instantiation, no? I would have
> expected this as normal. Has you tested hot-plug without the series
> laterly? Do you have the same pb?
That is a good question. I will give it a try soon and update.
Thanks,
Shameer.
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-20 16:26 ` Shameerali Kolothum Thodi via
@ 2024-11-21 9:46 ` Shameerali Kolothum Thodi via
2024-12-10 20:48 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-21 9:46 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Eric,
> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: Wednesday, November 20, 2024 4:26 PM
> To: 'eric.auger@redhat.com' <eric.auger@redhat.com>; qemu-
> arm@nongnu.org; qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> multiple SMMU nodes
>
> > > I think I have an idea why the hot add was not working.
> > >
> > > When we have the PCIe topology as something like below,
> > >
> > > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > > > -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2 \
> > > > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > > > ...
> > >
> > > The current IORT generation includes the pcie-root-port dev ids also
> > > in the SMMUv3 node idmaps.
> > >
> > > Hence, when Guest kernel loads, pcieport is also behind the SMMUv3.
> > >
> > > [ 1.466670] pcieport 0000:64:00.0: Adding to iommu group 1
> > > ...
> > > [ 1.448205] pcieport 0000:64:01.0: Adding to iommu group 2
> >
> > But it should be the same without multi-instantiation, no? I would
> > have expected this as normal. Has you tested hot-plug without the
> > series laterly? Do you have the same pb?
>
> That is a good question. I will give it a try soon and update.
I tried hot add with the current SMMUv3 (iommu=smmuv3) and hot add
works when I add a virtio dev to a pcie-root-port connected to a pxb-pcie.
And now I think I know (hopefully) the reason why it is not working in
the smmuv3-nested case. I think the root cause is this commit here,
(series: " cover-letter: Add HW accelerated nesting support for arm SMMUv3")
https://github.com/hisilicon/qemu/commit/9b21f28595cef7b1100ae130974605f357ef75d3
This changes the way address space is returned for the devices.
static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
{
    SMMUState *s = opaque;
    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
    SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn);

    /* Return the system as if the device uses stage-2 only */
    if (s->nested && !sdev->s1_hwpt) {
        return &sdev->as_sysmem;
    } else {
        return &sdev->as;
    }
}
If we have entries in the SMMUv3 idmap for bus:devfn, then I think we should
return IOMMU address space here. But the logic above returns sysmem
address space for anything other than vfio/iommufd devices.
The hot add works when I hacked the logic to return IOMMU address space
for pcie root port devices.
Could you please take a look at the commit above and let me know if this
indeed could be the problem?
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-11-21 9:46 ` Shameerali Kolothum Thodi via
@ 2024-12-10 20:48 ` Nicolin Chen
2024-12-11 15:21 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-12-10 20:48 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On Thu, Nov 21, 2024 at 09:46:16AM +0000, Shameerali Kolothum Thodi wrote:
> Hi Eric,
>
> > -----Original Message-----
> > From: Shameerali Kolothum Thodi
> > Sent: Wednesday, November 20, 2024 4:26 PM
> > To: 'eric.auger@redhat.com' <eric.auger@redhat.com>; qemu-
> > arm@nongnu.org; qemu-devel@nongnu.org
> > Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> > ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> > Jonathan Cameron <jonathan.cameron@huawei.com>;
> > zhangfei.gao@linaro.org
> > Subject: RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> > multiple SMMU nodes
> >
> > > > I think I have an idea why the hot add was not working.
> > > >
> > > > When we have the PCIe topology as something like below,
> > > >
> > > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > > > -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2 \
> > > > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > > > ...
> > > >
> > > > The current IORT generation includes the pcie-root-port dev ids also
> > > > in the SMMUv3 node idmaps.
> > > >
> > > > Hence, when Guest kernel loads, pcieport is also behind the SMMUv3.
> > > >
> > > > [ 1.466670] pcieport 0000:64:00.0: Adding to iommu group 1
> > > > ...
> > > > [ 1.448205] pcieport 0000:64:01.0: Adding to iommu group 2
> > >
> > > But it should be the same without multi-instantiation, no? I would
> > > have expected this as normal. Has you tested hot-plug without the
> > > series laterly? Do you have the same pb?
> >
> > That is a good question. I will give it a try soon and update.
>
> I tried hot add with the current SMMUv3(iommu=smmuv3) and hot add
> works when I added a virtio dev to pcie-root-port connected to a pxb-pcie.
>
> And now I think I know(hopefully) the reason why it is not working with
> smmuv3-nested case. I think the root cause is this commit here,
>
> (series: " cover-letter: Add HW accelerated nesting support for arm SMMUv3")
> This changes the way address space is returned for the devices.
>
> static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
> {
> SMMUState *s = opaque;
> SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn);
>
> /* Return the system as if the device uses stage-2 only */
> if (s->nested && !sdev->s1_hwpt) {
> return &sdev->as_sysmem;
> } else {
> return &sdev->as;
> }
> }
>
> If we have entries in the SMMUv3 idmap for bus:devfn, then I think we should
> return IOMMU address space here. But the logic above returns sysmem
> address space for anything other than vfio/iommufd devices.
>
> The hot add works when I hacked the logic to return IOMMU address space
> for pcie root port devices.
That is to bypass the "if (memory_region_is_iommu(section->mr))"
in vfio_listener_region_add(), when the device gets initially
attached to the default container.
Once a device reaches to the pci_device_set_iommu_device() call,
it should be attached to an IDENTIY/bypass proxy s1_hwpt, so the
smmu_find_add_as() will return the iommu as.
So, the fact that your hack is working means the hotplug routine
is likely missing a pci_device_set_iommu_device() call, IMHO, or
probably it should do pci_device_iommu_address_space() after the
device finishes pci_device_set_iommu_device() instead..
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-12-10 20:48 ` Nicolin Chen
@ 2024-12-11 15:21 ` Shameerali Kolothum Thodi via
2024-12-13 0:28 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-12-11 15:21 UTC (permalink / raw)
To: Nicolin Chen
Cc: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
Hi Nicolin,
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, December 10, 2024 8:48 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: eric.auger@redhat.com; qemu-arm@nongnu.org; qemu-
> devel@nongnu.org; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> multiple SMMU nodes
>
> > And now I think I know(hopefully) the reason why it is not working with
> > smmuv3-nested case. I think the root cause is this commit here,
> >
> > (series: " cover-letter: Add HW accelerated nesting support for arm
> SMMUv3")
>
> > This changes the way address space is returned for the devices.
> >
> > static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int
> devfn)
> > {
> > SMMUState *s = opaque;
> > SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> > SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn);
> >
> > /* Return the system as if the device uses stage-2 only */
> > if (s->nested && !sdev->s1_hwpt) {
> > return &sdev->as_sysmem;
> > } else {
> > return &sdev->as;
> > }
> > }
> >
> > If we have entries in the SMMUv3 idmap for bus:devfn, then I think we
> should
> > return IOMMU address space here. But the logic above returns sysmem
> > address space for anything other than vfio/iommufd devices.
> >
> > The hot add works when I hacked the logic to return IOMMU address
> space
> > for pcie root port devices.
> That is to bypass the "if (memory_region_is_iommu(section->mr))"
> in vfio_listener_region_add(), when the device gets initially
> attached to the default container.
Right.
> Once a device reaches to the pci_device_set_iommu_device() call,
> it should be attached to an IDENTIY/bypass proxy s1_hwpt, so the
> smmu_find_add_as() will return the iommu as.
Agree. The above situation you explained is perfectly fine with vfio-pci dev.
> So, the fact that your hack is working means the hotplug routine
> is likely missing a pci_device_set_iommu_device() call, IMHO, or
> probably it should do pci_device_iommu_address_space() after the
> device finishes pci_device_set_iommu_device() instead..
The problem is not with the hot added vfio-pci dev but with the
pcie-root-port device. When we hot add a vfio-pci to a root port,
Qemu will inject an interrupt for the Guest root port device and
that kick starts the vfio-pci device add process. This involves writing
to the MSI address the Guest kernel configures for the root port dev.
As per the current logic, the root port dev will have sysmem address
space and in IORT we have root port dev id in smmu idmap. This
will not work as Guest kernel configures a translated IOVA for MSI.
I think we have discussed this issue of returning different address
spaces before here[0]. But that was in a different context though.
The hack mentioned in [0] actually works for this case as well, where
we add an extra check to see the dev is vfio-pci or not. But I am not
sure that is the best way to handle this.
Another option is to exclude all the root port devices from the IORT idmap.
But that does not look ideal to me, as the root port actually sits behind an
SMMUv3 in this case.
Please let me know if you have any ideas.
Thanks,
Shameer
[0] https://lore.kernel.org/linux-iommu/02f3fbc5145d4449b3313eb802ecfa2c@huawei.com/
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
2024-12-11 15:21 ` Shameerali Kolothum Thodi via
@ 2024-12-13 0:28 ` Nicolin Chen
0 siblings, 0 replies; 150+ messages in thread
From: Nicolin Chen @ 2024-12-13 0:28 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: eric.auger@redhat.com, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On Wed, Dec 11, 2024 at 03:21:37PM +0000, Shameerali Kolothum Thodi wrote:
> > Once a device reaches to the pci_device_set_iommu_device() call,
> > it should be attached to an IDENTIY/bypass proxy s1_hwpt, so the
> > smmu_find_add_as() will return the iommu as.
>
> Agree. The above situation you explained is perfectly fine with vfio-pci dev.
>
> > So, the fact that your hack is working means the hotplug routine
> > is likely missing a pci_device_set_iommu_device() call, IMHO, or
> > probably it should do pci_device_iommu_address_space() after the
> > device finishes pci_device_set_iommu_device() instead..
>
> The problem is not with the hot added vfio-pci dev but with the
> pcie-root-port device. When we hot add a vfio-pci to a root port,
> Qemu will inject an interrupt for the Guest root port device and
> that kick starts the vfio-pci device add process. This involves writing
> to the MSI address the Guest kernel configures for the root port dev.
>
> As per the current logic, the root port dev will have sysmem address
> space and in IORT we have root port dev id in smmu idmap. This
> will not work as Guest kernel configures a translated IOVA for MSI.
>
> I think we have discussed this issue of returning different address
> spaces before here[0]. But that was in a different context though.
> The hack mentioned in [0] actually works for this case as well, where
> we add an extra check to see the dev is vfio-pci or not. But I am not
> sure that is the best way to handle this.
>
> Another option is to exclude all the root port devices from IORT idmap.
> But that looks not an ideal one to me as it actually sits behind an SMMUv3
> in this case.
>
> Please let me know if you have any ideas.
Oh... I completely forgot that...
So, we need to make sure the sdev/PCIDevice is a passthrough dev
that will go through the set_iommu_device callback. Otherwise,
just return the iommu address space.
Perhaps we could set a flag during vfio_realize() in PCIDevice *
pdev, so later we could cast the sdev to pdev and recheck that.
Or, we could do something like your approach:
-----------------------------------------------------------------
@@ -896,9 +896,11 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
SMMUState *s = opaque;
SMMUPciBus *sbus = smmu_get_sbus(s, bus);
SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn);
+ PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
+ bool has_iommufd = !!object_property_find(OBJECT(pdev), "iommufd");
/* Return the system as if the device uses stage-2 only */
- if (s->nested && !sdev->s1_hwpt) {
+ if (s->nested && !sdev->s1_hwpt && has_iommufd) {
return &sdev->as_sysmem;
} else {
return &sdev->as;
-----------------------------------------------------------------
vfio-pci might not guarantee that it has an "iommufd" property, so
checking for the property explicitly might be nicer.
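For illustration only, a rough variant of the same check with a NULL
guard on pci_find_device() (in case the devfn isn't populated yet);
an untested sketch, not a tested implementation:
-----------------------------------------------------------------
static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
{
    SMMUState *s = opaque;
    SMMUPciBus *sbus = smmu_get_sbus(s, bus);
    SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn);
    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
    /* Guard against a not-yet-realized devfn before looking at pdev */
    bool has_iommufd = pdev && object_property_find(OBJECT(pdev), "iommufd");

    /* Only iommufd-backed passthrough devices take the stage-2-only path */
    if (s->nested && !sdev->s1_hwpt && has_iommufd) {
        return &sdev->as_sysmem;
    }
    return &sdev->as;
}
-----------------------------------------------------------------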
Thanks
Nic
^ permalink raw reply [flat|nested] 150+ messages in thread
* [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (3 preceding siblings ...)
2024-11-08 12:52 ` [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes Shameer Kolothum via
@ 2024-11-08 12:52 ` Shameer Kolothum via
2024-11-13 18:31 ` Nicolin Chen
2024-12-10 23:01 ` Nicolin Chen
2024-11-12 22:59 ` [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Nicolin Chen
` (5 subsequent siblings)
10 siblings, 2 replies; 150+ messages in thread
From: Shameer Kolothum via @ 2024-11-08 12:52 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Cc: eric.auger, peter.maydell, jgg, nicolinc, ddutile, linuxarm,
wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
From: Eric Auger <eric.auger@redhat.com>
To handle SMMUv3 nested stage support it is practical to
expose the guest to reserved memory regions (RMRs)
covering the IOVAs used by the host kernel to map
physical MSI doorbells.
Those IOVAs belong to [0x8000000, 0x8100000] matching
MSI_IOVA_BASE and MSI_IOVA_LENGTH definitions in kernel
arm-smmu-v3 driver. This is the window used to allocate
IOVAs matching physical MSI doorbells.
With those RMRs, the guest is forced to use a flat mapping
for this range. Hence the assigned device is programmed
with one IOVA from this range. Stage 1, owned by the guest
has a flat mapping for this IOVA. Stage2, owned by the VMM
then enforces a mapping from this IOVA to the physical
MSI doorbell.
The creation of those RMR nodes is only relevant if nested
stage SMMU is in use, along with VFIO. As VFIO devices can be
hotplugged, all RMRs need to be created in advance. Hence
the patch introduces a new arm virt "nested-smmuv3" iommu type.
ARM DEN 0049E.b IORT specification also mandates that when
RMRs are present, the OS must preserve PCIe configuration
performed by the boot FW. So along with the RMR IORT nodes,
a _DSM function #5, as defined by PCI FIRMWARE SPECIFICATION
REVISION 3.3, chapter 4.6.5 is added to PCIe host bridge
and PCIe expander bridge objects.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
hw/arm/virt-acpi-build.c | 77 +++++++++++++++++++++++++++++++++++-----
1 file changed, 68 insertions(+), 9 deletions(-)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ec4cdfb2d7..f327ca59ec 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -132,6 +132,14 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
.bus = vms->bus,
};
+ /*
+ * Nested SMMU requires RMRs for MSI 1-1 mapping, which
+ * require _DSM for Preserving PCI Boot Configurations
+ */
+ if (vms->iommu == VIRT_IOMMU_SMMUV3_NESTED) {
+ cfg.preserve_config = true;
+ }
+
if (vms->highmem_mmio) {
cfg.mmio64 = memmap[VIRT_HIGH_PCIE_MMIO];
}
@@ -216,16 +224,16 @@ static void acpi_dsdt_add_tpm(Aml *scope, VirtMachineState *vms)
*
* Note that @id_count gets internally subtracted by one, following the spec.
*/
-static void build_iort_id_mapping(GArray *table_data, uint32_t input_base,
- uint32_t id_count, uint32_t out_ref)
+static void
+build_iort_id_mapping(GArray *table_data, uint32_t input_base,
+ uint32_t id_count, uint32_t out_ref, uint32_t flags)
{
build_append_int_noprefix(table_data, input_base, 4); /* Input base */
/* Number of IDs - The number of IDs in the range minus one */
build_append_int_noprefix(table_data, id_count - 1, 4);
build_append_int_noprefix(table_data, input_base, 4); /* Output base */
build_append_int_noprefix(table_data, out_ref, 4); /* Output Reference */
- /* Flags */
- build_append_int_noprefix(table_data, 0 /* Single mapping (disabled) */, 4);
+ build_append_int_noprefix(table_data, flags, 4); /* Flags */
}
struct AcpiIortIdMapping {
@@ -267,6 +275,50 @@ static int iort_idmap_compare(gconstpointer a, gconstpointer b)
return idmap_a->input_base - idmap_b->input_base;
}
+static void
+build_iort_rmr_nodes(GArray *table_data, GArray *smmu_idmaps,
+ size_t *smmu_offset, uint32_t *id)
+{
+ AcpiIortIdMapping *range;
+ int i;
+
+ for (i = 0; i < smmu_idmaps->len; i++) {
+ range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
+ int bdf = range->input_base;
+
+ /* Table 18 Reserved Memory Range Node */
+
+ build_append_int_noprefix(table_data, 6 /* RMR */, 1); /* Type */
+ /* Length */
+ build_append_int_noprefix(table_data, 28 + ID_MAPPING_ENTRY_SIZE + 20, 2);
+ build_append_int_noprefix(table_data, 3, 1); /* Revision */
+ build_append_int_noprefix(table_data, *id, 4); /* Identifier */
+ /* Number of ID mappings */
+ build_append_int_noprefix(table_data, 1, 4);
+ /* Reference to ID Array */
+ build_append_int_noprefix(table_data, 28, 4);
+
+ /* RMR specific data */
+
+ /* Flags */
+ build_append_int_noprefix(table_data, 0 /* Disallow remapping */, 4);
+ /* Number of Memory Range Descriptors */
+ build_append_int_noprefix(table_data, 1 , 4);
+ /* Reference to Memory Range Descriptors */
+ build_append_int_noprefix(table_data, 28 + ID_MAPPING_ENTRY_SIZE, 4);
+ build_iort_id_mapping(table_data, bdf, range->id_count, smmu_offset[i], 1);
+
+ /* Table 19 Memory Range Descriptor */
+
+ /* Physical Range offset */
+ build_append_int_noprefix(table_data, 0x8000000, 8);
+ /* Physical Range length */
+ build_append_int_noprefix(table_data, 0x100000, 8);
+ build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+ *id += 1;
+ }
+}
+
/*
* Input Output Remapping Table (IORT)
* Conforms to "IO Remapping Table System Software on ARM Platforms",
@@ -284,7 +336,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
GArray *smmu_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
GArray *its_idmaps = g_array_new(false, true, sizeof(AcpiIortIdMapping));
- AcpiTable table = { .sig = "IORT", .rev = 3, .oem_id = vms->oem_id,
+ AcpiTable table = { .sig = "IORT", .rev = 5, .oem_id = vms->oem_id,
.oem_table_id = vms->oem_table_id };
/* Table 2 The IORT */
acpi_table_begin(&table, table_data);
@@ -325,6 +377,9 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
}
next_range.input_base = idmap->input_base + idmap->id_count;
+ if (vms->iommu == VIRT_IOMMU_SMMUV3_NESTED) {
+ nb_nodes++; /* RMR node per SMMU */
+ }
}
/* Append the last RC -> ITS ID mapping */
@@ -386,7 +441,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
build_append_int_noprefix(table_data, 0, 4);
/* output IORT node is the ITS group node (the first node) */
- build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET);
+ build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET, 0);
}
/* Table 17 Root Complex Node */
@@ -427,7 +482,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
range = &g_array_index(smmu_idmaps, AcpiIortIdMapping, i);
/* output IORT node is the smmuv3 node */
build_iort_id_mapping(table_data, range->input_base,
- range->id_count, smmu_offset[i]);
+ range->id_count, smmu_offset[i], 0);
}
/* bypassed RIDs connect to ITS group node directly: RC -> ITS */
@@ -435,11 +490,15 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
range = &g_array_index(its_idmaps, AcpiIortIdMapping, i);
/* output IORT node is the ITS group node (the first node) */
build_iort_id_mapping(table_data, range->input_base,
- range->id_count, IORT_NODE_OFFSET);
+ range->id_count, IORT_NODE_OFFSET, 0);
}
} else {
/* output IORT node is the ITS group node (the first node) */
- build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET);
+ build_iort_id_mapping(table_data, 0, 0x10000, IORT_NODE_OFFSET, 0);
+ }
+
+ if (vms->iommu == VIRT_IOMMU_SMMUV3_NESTED) {
+ build_iort_rmr_nodes(table_data, smmu_idmaps, smmu_offset, &id);
}
acpi_table_end(linker, &table);
--
2.34.1
^ permalink raw reply related [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-11-08 12:52 ` [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum via
@ 2024-11-13 18:31 ` Nicolin Chen
2024-11-14 8:48 ` Shameerali Kolothum Thodi via
2024-12-10 23:01 ` Nicolin Chen
1 sibling, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-11-13 18:31 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> From: Eric Auger <eric.auger@redhat.com>
>
> To handle SMMUv3 nested stage support it is practical to
> expose the guest with reserved memory regions (RMRs)
> covering the IOVAs used by the host kernel to map
> physical MSI doorbells.
There has been an ongoing solution for MSI alternative:
https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
So, I think we should keep this patch out of this series, instead
put it on top of the testing branch.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-11-13 18:31 ` Nicolin Chen
@ 2024-11-14 8:48 ` Shameerali Kolothum Thodi via
2024-11-14 10:41 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-14 8:48 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, November 13, 2024 6:31 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions
> to handle MSI nested binding
>
> On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> > From: Eric Auger <eric.auger@redhat.com>
> >
> > To handle SMMUv3 nested stage support it is practical to expose the
> > guest with reserved memory regions (RMRs) covering the IOVAs used by
> > the host kernel to map physical MSI doorbells.
>
> There has been an ongoing solution for MSI alternative:
> https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
>
> So, I think we should keep this patch out of this series, instead put it on top
> of the testing branch.
Yes. I think we can then support the DT solution as well.
On that MSI RFC above, have you seen Eric's earlier/initial proposal to bind the Guest MSI in
nested cases? IIRC, it was providing an IOCTL and then creating a mapping in the host.
I think this is the latest on that.
https://lore.kernel.org/linux-iommu/20210411114659.15051-4-eric.auger@redhat.com/
But I am not sure why we then moved to the RMR approach. Eric?
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-11-14 8:48 ` Shameerali Kolothum Thodi via
@ 2024-11-14 10:41 ` Eric Auger
2024-11-15 22:12 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-14 10:41 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
Hi Shameer,
On 11/14/24 09:48, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Wednesday, November 13, 2024 6:31 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions
>> to handle MSI nested binding
>>
>> On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
>>> From: Eric Auger <eric.auger@redhat.com>
>>>
>>> To handle SMMUv3 nested stage support it is practical to expose the
>>> guest with reserved memory regions (RMRs) covering the IOVAs used by
>>> the host kernel to map physical MSI doorbells.
>> There has been an ongoing solution for MSI alternative:
>> https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
>>
>> So, I think we should keep this patch out of this series, instead put it on top
>> of the testing branch.
> Yes. I think then we can support DT solution as well.
>
> On that MSI RFC above, have you seen Eric's earlier/initial proposal to bind the Guest MSI in
> nested cases. IIRC, it was providing an IOCTL and then creating a mapping in the host.
>
> I think this is the latest on that.
> https://lore.kernel.org/linux-iommu/20210411114659.15051-4-eric.auger@redhat.com/
yes this is the latest before I stopped my VFIO integration efforts.
>
> But not sure, why we then moved to RMR approach. Eric?
This was indeed the 1st integration approach. Using RMR instead was
suggested by Jean-Philippe and I considered it simpler (because otherwise
we needed the SET_MSI_BINDING ioctl), so I changed the approach.
Thanks
Eric
>
> Thanks,
> Shameer
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-11-14 10:41 ` Eric Auger
@ 2024-11-15 22:12 ` Nicolin Chen
0 siblings, 0 replies; 150+ messages in thread
From: Nicolin Chen @ 2024-11-15 22:12 UTC (permalink / raw)
To: Eric Auger
Cc: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org, jgg@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Thu, Nov 14, 2024 at 11:41:58AM +0100, Eric Auger wrote:
> Hi Shameer,
>
> On 11/14/24 09:48, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Nicolin Chen <nicolinc@nvidia.com>
> >> Sent: Wednesday, November 13, 2024 6:31 PM
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> >> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> >> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> >> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> >> Jonathan Cameron <jonathan.cameron@huawei.com>;
> >> zhangfei.gao@linaro.org
> >> Subject: Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions
> >> to handle MSI nested binding
> >>
> >> On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> >>> From: Eric Auger <eric.auger@redhat.com>
> >>>
> >>> To handle SMMUv3 nested stage support it is practical to expose the
> >>> guest with reserved memory regions (RMRs) covering the IOVAs used by
> >>> the host kernel to map physical MSI doorbells.
> >> There has been an ongoing solution for MSI alternative:
> >> https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
> >>
> >> So, I think we should keep this patch out of this series, instead put it on top
> >> of the testing branch.
> > Yes. I think then we can support DT solution as well.
> >
> > On that MSI RFC above, have you seen Eric's earlier/initial proposal to bind the Guest MSI in
> > nested cases. IIRC, it was providing an IOCTL and then creating a mapping in the host.
> >
> > I think this is the latest on that.
> > https://lore.kernel.org/linux-iommu/20210411114659.15051-4-eric.auger@redhat.com/
> yes this is the latest before I stopped my VFIO integration efforts.
> >
> > But not sure, why we then moved to RMR approach. Eric?
>
> This was indeed the 1st integration approach. Using RMR instead was
> suggested by Jean-Philippe and I considered it as simpler (because we
> needed the SET_MSI_BINDING iotcl) so I changed the approach.
Oh, I didn't realize Eric had this..
Now, Robin wanted it back (in iommufd though), against the RMR :-/
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-11-08 12:52 ` [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum via
2024-11-13 18:31 ` Nicolin Chen
@ 2024-12-10 23:01 ` Nicolin Chen
2024-12-11 0:48 ` Jason Gunthorpe
1 sibling, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-12-10 23:01 UTC (permalink / raw)
To: eric.auger, ddutile, Shameer Kolothum
Cc: qemu-arm, qemu-devel, peter.maydell, jgg, linuxarm, wangzhou1,
jiangkunkun, jonathan.cameron, zhangfei.gao
Hi Eric/Don/Shameer,
On Fri, Nov 08, 2024 at 12:52:42PM +0000, Shameer Kolothum wrote:
> Those IOVAs belong to [0x8000000, 0x8100000] matching
> MSI_IOVA_BASE and MSI_IOVA_LENGTH definitions in kernel
> arm-smmu-v3 driver. This is the window used to allocate
> IOVAs matching physical MSI doorbells.
[...]
> +static void
> +build_iort_rmr_nodes(GArray *table_data, GArray *smmu_idmaps,
> + size_t *smmu_offset, uint32_t *id)
> +{
[...]
> + /* Physical Range offset */
> + build_append_int_noprefix(table_data, 0x8000000, 8);
> + /* Physical Range length */
> + build_append_int_noprefix(table_data, 0x100000, 8);
> + build_append_int_noprefix(table_data, 0, 4); /* Reserved */
> + *id += 1;
> + }
Jason made some kernel patches for iommufd to do MSI mappings:
https://github.com/jgunthorpe/linux/commits/for-nicolin/
It addresses Robin's remark against a get_msi_mapping_domain API,
so we could likely support an RMR solution as well, if a VMM
chooses to use it vs. a future non-RMR one (mapping the vITS page).
Yet, here we seem to be missing a pathway between the VMM and the kernel
to agree on the MSI window decided by the kernel, as this patch
hard-codes a [0x8000000, 0x8100000) range.
Though I am aware that the sysfs node
"/sys/devices/pci000x/000x/iommu_group/reserved_regions" exposes
the MSI window, it's probably better to have a new iommufd uAPI
to expose the range, so a nested domain eventually might be able
to choose between an RMR flow and a non-RMR flow.
I have been going through the structures shared between QEMU's SMMU code
and the virt/virt-acpi-build code, yet I am having a hard time figuring
out a way to forward the MSI window from the SMMU code to the IORT,
especially after this series changes the "smmu" instance creation
from the virt code to a "-device" string. Any thoughts?
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-12-10 23:01 ` Nicolin Chen
@ 2024-12-11 0:48 ` Jason Gunthorpe
2024-12-11 1:28 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2024-12-11 0:48 UTC (permalink / raw)
To: Nicolin Chen
Cc: eric.auger, ddutile, Shameer Kolothum, qemu-arm, qemu-devel,
peter.maydell, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Tue, Dec 10, 2024 at 03:01:48PM -0800, Nicolin Chen wrote:
> Yet, here we seem to be missing a pathway between VMM and kernel
> to agree on the MSI window decided by the kernel, as this patch
> does the hard coding for a [0x8000000, 0x8100000) range.
I would ideally turn it around and provide that range information to
the kernel and totally ignore the SW_MSI reserved region once
userspace provides it.
The SW_MSI range then becomes something just used "by default".
Haven't thought about exactly which ioctl could do
this.. SET_OPTION(SW_MSI) on the idevice perhaps?
It seems pretty simple to do?
We will eventually need a way for userspace to disable SW_MSI entirely
anyhow.
> I have been going through the structures between QEMU's SMMU code
> and virt/virt-acpi-build code, yet having a hard time to figure
> out a way to forward the MSI window from the SMMU code to IORT,
> especially after this series changes the "smmu" instance creation
> from virt code to "-device" string. Any thought?
You probably have to solve this eventually because when the kernel
supports a non-RMR path the IORT code will need to not create the RMR
too.
Using RMR, or not, and the address to put the SW_MSI, is probably part
of the global machine configuration in qemu.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-12-11 0:48 ` Jason Gunthorpe
@ 2024-12-11 1:28 ` Nicolin Chen
2024-12-11 13:11 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-12-11 1:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: eric.auger, ddutile, Shameer Kolothum, qemu-arm, qemu-devel,
peter.maydell, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Tue, Dec 10, 2024 at 08:48:21PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 10, 2024 at 03:01:48PM -0800, Nicolin Chen wrote:
>
> > Yet, here we seem to be missing a pathway between VMM and kernel
> > to agree on the MSI window decided by the kernel, as this patch
> > does the hard coding for a [0x8000000, 0x8100000) range.
>
> I would ideally turn it around and provide that range information to
> the kernel and totally ignore the SW_MSI reserved region once
> userspace provides it.
Hmm.. that sounds like a uAPI for vITS range..but yes..
> The SW_MSI range then becomes something just used "by default".
>
> Haven't thought about exactly which ioctl could do
> this.. SET_OPTION(SW_MSI) on the idevice perhaps?
>
> It seems pretty simple to do?
That looks like a good interface, given that we are already
making sw_msi_list per ictx.
So, the VMM can GET_OPTION(SW_MSI) for msi_base to extract the
info from the kernel. We'd likely need a second call for its length,
since IOMMU_OPTION only supports one val64 input or output.
> We will eventually need a way for userspace to disable SW_MSI entirely
> anyhow.
> > I have been going through the structures between QEMU's SMMU code
> > and virt/virt-acpi-build code, yet having a hard time to figure
> > out a way to forward the MSI window from the SMMU code to IORT,
> > especially after this series changes the "smmu" instance creation
> > from virt code to "-device" string. Any thought?
>
> You probably have to solve this eventually because when the kernel
> supports a non-RMR path the IORT code will need to not create the RMR
> too.
>
> Using RMR, or not, and the address to put the SW_MSI, is probably part
> of the global machine configuration in qemu.
Yes, either vITS or RMR range is in the global machine code.
So, likely it's not ideal to go with HWPTs.
Thanks!
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-12-11 1:28 ` Nicolin Chen
@ 2024-12-11 13:11 ` Jason Gunthorpe
2024-12-11 17:20 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2024-12-11 13:11 UTC (permalink / raw)
To: Nicolin Chen
Cc: eric.auger, ddutile, Shameer Kolothum, qemu-arm, qemu-devel,
peter.maydell, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Tue, Dec 10, 2024 at 05:28:17PM -0800, Nicolin Chen wrote:
> > I would ideally turn it around and provide that range information to
> > the kernel and totally ignore the SW_MSI reserved region once
> > userspace provides it.
>
> Hmm.. that sounds like a uAPI for vITS range..but yes..
It controls the window that the kernel uses to dynamically map the ITS
pages.
> So, VMM can GET_OPTION(SW_MSI) for msi_base to extract the
> info from kernel. Likely need a second call for its length?
> Since IOMMU_OPTION only supports one val64 input or output.
No, just forget about the kernel's SW_MSI region. The VMM uses this
API and overrides it and iommufd completely ignores SW_MSI.
There is nothing special about the range hard coded into the smmu
driver.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-12-11 13:11 ` Jason Gunthorpe
@ 2024-12-11 17:20 ` Nicolin Chen
2024-12-11 18:01 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-12-11 17:20 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: eric.auger, ddutile, Shameer Kolothum, qemu-arm, qemu-devel,
peter.maydell, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Wed, Dec 11, 2024 at 09:11:12AM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 10, 2024 at 05:28:17PM -0800, Nicolin Chen wrote:
> > > I would ideally turn it around and provide that range information to
> > > the kernel and totally ignore the SW_MSI reserved region once
> > > userspace provides it.
> >
> > Hmm.. that sounds like a uAPI for vITS range..but yes..
>
> It controls the window that the kernel uses to dynamically map the ITS
> pages.
Can we use SET_OPTION for vITS mapping (non-RMR solution) too?
> > So, VMM can GET_OPTION(SW_MSI) for msi_base to extract the
> > info from kernel. Likely need a second call for its length?
> > Since IOMMU_OPTION only supports one val64 input or output.
>
> No, just forget about the kernel's SW_MSI region. The VMM uses this
> API and overrides it and iommufd completely ignores SW_MSI.
>
> There is nothing special about the range hard coded into the smmu
> driver.
OK. We will have SET_OPTION(IOMMU_OPTION_SW_MSI_START) and
SET_OPTION(IOMMU_OPTION_SW_MSI_LAST).
I think we will need some validation of the range too, although
iommufd doesn't have the information about the underlying ITS
driver: what if user space sets the range to a single page, while the
ITS driver requires multiple pages?
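For what it's worth, a rough userspace sketch of what that could look
like, assuming placeholder option IDs IOMMU_OPTION_SW_MSI_START/LAST
that do not exist in the uAPI yet (struct iommu_option, IOMMU_OPTION
and IOMMU_OPTION_OP_SET are the existing iommufd uAPI):
-----------------------------------------------------------------
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int set_sw_msi_window(int iommufd, uint64_t start, uint64_t last)
{
    struct iommu_option opt = {
        .size = sizeof(opt),
        .option_id = IOMMU_OPTION_SW_MSI_START, /* placeholder ID */
        .op = IOMMU_OPTION_OP_SET,
        .object_id = 0,         /* assumed: applies to the whole ictx */
        .val64 = start,
    };

    if (ioctl(iommufd, IOMMU_OPTION, &opt)) {
        return -errno;
    }
    opt.option_id = IOMMU_OPTION_SW_MSI_LAST;   /* placeholder ID */
    opt.val64 = last;
    return ioctl(iommufd, IOMMU_OPTION, &opt) ? -errno : 0;
}
-----------------------------------------------------------------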
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding
2024-12-11 17:20 ` Nicolin Chen
@ 2024-12-11 18:01 ` Jason Gunthorpe
0 siblings, 0 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2024-12-11 18:01 UTC (permalink / raw)
To: Nicolin Chen
Cc: eric.auger, ddutile, Shameer Kolothum, qemu-arm, qemu-devel,
peter.maydell, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Wed, Dec 11, 2024 at 09:20:20AM -0800, Nicolin Chen wrote:
> On Wed, Dec 11, 2024 at 09:11:12AM -0400, Jason Gunthorpe wrote:
> > On Tue, Dec 10, 2024 at 05:28:17PM -0800, Nicolin Chen wrote:
> > > > I would ideally turn it around and provide that range information to
> > > > the kernel and totally ignore the SW_MSI reserved region once
> > > > userspace provides it.
> > >
> > > Hmm.. that sounds like a uAPI for vITS range..but yes..
> >
> > It controls the window that the kernel uses to dynamically map the ITS
> > pages.
>
> Can we use SET_OPTION for vITS mapping (non-RMR solution) too?
There are two parts to the vITS flow:
1) mapping the physical ITS page - I expect this to go through
IOMMUFD_CMD_IOAS_MAP_FILE (a rough sketch follows below)
2) Conveying the MSI addr per-irq - this doesn't feel like set_option
is quite the right fit since it is an array of msis
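For 1), a rough userspace sketch, assuming the IOMMU_IOAS_MAP_FILE
ioctl that IOMMUFD_CMD_IOAS_MAP_FILE refers to; whether the vITS flow
ends up looking like this is only a guess:
-----------------------------------------------------------------
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Map one fd-backed page (e.g. an exposed ITS page) at a fixed IOVA */
static int map_its_page(int iommufd, uint32_t ioas_id, int fd,
                        uint64_t offset, uint64_t iova, uint64_t length)
{
    struct iommu_ioas_map_file map = {
        .size = sizeof(map),
        .flags = IOMMU_IOAS_MAP_FIXED_IOVA | IOMMU_IOAS_MAP_READABLE |
                 IOMMU_IOAS_MAP_WRITEABLE,
        .ioas_id = ioas_id,
        .fd = fd,
        .start = offset,
        .length = length,
        .iova = iova,
    };

    return ioctl(iommufd, IOMMU_IOAS_MAP_FILE, &map) ? -errno : 0;
}
-----------------------------------------------------------------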
>
> > > So, VMM can GET_OPTION(SW_MSI) for msi_base to extract the
> > > info from kernel. Likely need a second call for its length?
> > > Since IOMMU_OPTION only supports one val64 input or output.
> >
> > No, just forget about the kernel's SW_MSI region. The VMM uses this
> > API and overrides it and iommufd completely ignores SW_MSI.
> >
> > There is nothing special about the range hard coded into the smmu
> > driver.
>
> OK. We will have SET_OPTION(IOMMU_OPTION_SW_MSI_START) and
> SET_OPTION(IOMMU_OPTION_SW_MSI_LAST).
Maybe length, but yes
> I think we will need some validation to the range too, although
> iommufd doesn't have the information about the underlying ITS
> driver: what if user space sets range to a page size, while the
> ITS driver requires multiple pages?
Ideally the kernel would detect and fail IRQ setup in these cases.
I suggest enforcing a minimal range of something XXM big at least,
then it won't happen.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (4 preceding siblings ...)
2024-11-08 12:52 ` [RFC PATCH 5/5] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum via
@ 2024-11-12 22:59 ` Nicolin Chen
2024-11-14 7:56 ` Shameerali Kolothum Thodi via
2024-11-20 23:59 ` Nathan Chen
2024-11-13 16:16 ` Mostafa Saleh
` (4 subsequent siblings)
10 siblings, 2 replies; 150+ messages in thread
From: Nicolin Chen @ 2024-11-12 22:59 UTC (permalink / raw)
To: Shameer Kolothum, nathanc
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> Few ToDos to note,
> 1. At present default-bus-bypass-iommu=on should be set when
> arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
> related boot error. Requires fixing.
> 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
> Could be a bug in IORT id mappings.
Do we have enough bus number space for each pxb bus in the IORT?
The bus range is defined by min_/max_bus in iort_host_bridges(),
where the pci_bus_range() function call might not leave enough
space in the range for hotplugs IIRC.
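Roughly, the idmap for each host bridge is derived from that bus range
(a sketch of the relationship, not the actual patch code):
-----------------------------------------------------------------
/*
 * A hot-plugged device whose bus number falls outside [min_bus, max_bus]
 * is not covered by this idmap, so its RID never reaches the SMMU node.
 */
AcpiIortIdMapping idmap = {
    .input_base = min_bus << 8,                 /* first BDF behind the pxb */
    .id_count   = (max_bus - min_bus + 1) << 8, /* number of BDFs covered   */
};
-----------------------------------------------------------------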
> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> -net none \
> -nographic
..
> With a pci topology like below,
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> | +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | \-03.0 Virtio: Virtio filesystem
> +-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> And if you want to add another HNS VF, it should be added to the same SMMUv3
> as of the first HNS dev,
>
> -device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
..
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
Nathan from the NVIDIA side is working on the libvirt support. And he already
did some prototype coding in libvirt that could generate the required
PCI topology. I think he can take these patches for a combined test.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-12 22:59 ` [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Nicolin Chen
@ 2024-11-14 7:56 ` Shameerali Kolothum Thodi via
2024-11-20 23:59 ` Nathan Chen
1 sibling, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-14 7:56 UTC (permalink / raw)
To: Nicolin Chen, nathanc@nvidia.com
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, November 12, 2024 11:00 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; nathanc@nvidia.com
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> > Few ToDos to note,
> > 1. At present default-bus-bypass-iommu=on should be set when
> > arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
> > related boot error. Requires fixing.
> > 2. Hot adding a device is not working at the moment. Looks like pcihp irq
> issue.
> > Could be a bug in IORT id mappings.
>
> Do we have enough bus number space for each pbx bus in IORT?
>
> The bus range is defined by min_/max_bus in hort_host_bridges(),
> where the pci_bus_range() function call might not leave enough
> space in the range for hotplugs IIRC.
Ok. Thanks for the pointer. I will debug that.
> > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-
> iommu=on \
> > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> > -object iommufd,id=iommufd0 \
> > -bios QEMU_EFI.fd \
> > -kernel Image \
> > -device virtio-blk-device,drive=fs \
> > -drive if=none,file=rootfs.qcow2,id=fs \
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw
> earlycon=pl011,0x9000000" \
> > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> > -net none \
> > -nographic
> ..
> > With a pci topology like below,
> > [root@localhost ~]# lspci -tv
> > -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> > | +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
> > | +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> > | \-03.0 Virtio: Virtio filesystem
> > +-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS
> Network Controller (Virtual Function)
> > \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP
> Engine(Virtual Function)
> > [root@localhost ~]#
> >
> > And if you want to add another HNS VF, it should be added to the same
> SMMUv3
> > as of the first HNS dev,
> >
> > -device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
> ..
> > At present Qemu is not doing any extra validation other than the above
> > failure to make sure the user configuration is correct or not. The
> > assumption is libvirt will take care of this.
>
> Nathan from NVIDIA side is working on the libvirt. And he already
> did some prototype coding in libvirt that could generate required
> PCI topology. I think he can take this patches for a combined test.
Cool. That's good to know.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-12 22:59 ` [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Nicolin Chen
2024-11-14 7:56 ` Shameerali Kolothum Thodi via
@ 2024-11-20 23:59 ` Nathan Chen
2024-11-21 10:12 ` Shameerali Kolothum Thodi via
2024-12-12 23:54 ` Nathan Chen
1 sibling, 2 replies; 150+ messages in thread
From: Nathan Chen @ 2024-11-20 23:59 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao,
Nicolin Chen
Hi Shameer,
> Attempt to add the HNS VF to a different SMMUv3 will result in,
>
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
>   Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
>   vfio 0000:7d:02.2:
> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2
>   (38) to id=11: Invalid argument
>
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
Would you be able to elaborate what Qemu is validating with this error
message? I'm not seeing these errors when assigning a GPU's
pcie-root-port to different PXBs (with different associated SMMU nodes).
I launched a VM using my libvirt prototype code + your qemu branch and
noted a few small things:
1. Are there plans to support "-device addr" for arm-smmuv3-nested's
PCIe slot and function like any other device? If not I'll exclude it
from my libvirt prototype.
2. Is "id" for "-device arm-smmuv3-nested" necessary for any sort of
functionality? If so, I'll make a change to my libvirt prototype to
support this. I was able to boot a VM and see a similar VM PCI topology
as your example without specifying "id".
Otherwise, the VM topology looks OK with your qemu branch + my libvirt
prototype.
Also as a heads up, I've added support for auto-inserting PCIe switch
between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
node per libvirt's documentation - "If you intend to plug multiple
devices into a pcie-expander-bus, you must connect a
pcie-switch-upstream-port to the pcie-root-port that is plugged into the
pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
pcie-switch-upstream-port". Future unit-tests should follow this
topology configuration.
Thanks,
Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-20 23:59 ` Nathan Chen
@ 2024-11-21 10:12 ` Shameerali Kolothum Thodi via
2024-11-22 1:41 ` Nathan Chen
2024-12-12 23:54 ` Nathan Chen
1 sibling, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-21 10:12 UTC (permalink / raw)
To: Nathan Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nicolin Chen
Hi Nathan,
> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Wednesday, November 20, 2024 11:59 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nicolin Chen <nicolinc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Shameer,
>
> > Attempt to add the HNS VF to a different SMMUv3 will result in,
> >
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
> >   Unable to attach viommu
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
> >   vfio 0000:7d:02.2:
> > Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2
> >   (38) to id=11: Invalid argument
> >
> > At present Qemu is not doing any extra validation other than the above
> > failure to make sure the user configuration is correct or not. The
> > assumption is libvirt will take care of this.
> Would you be able to elaborate what Qemu is validating with this error
> message? I'm not seeing these errors when assigning a GPU's
> pcie-root-port to different PXBs (with different associated SMMU nodes).
You should see that error when two devices that belong to two
different physical SMMUv3s in the host kernel are assigned to a single
PXB/SMMUv3 for the Guest.
Something like,
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=1 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ --> This device belongs to phys SMMUv3_0
-device vfio-pci,host=0000:75:02.1,bus=pcie.port2,iommufd=iommufd0 \ --> This device belongs to phys SMMUv3_1
So the assumption above is that libvirt will be able to detect which devices belong
to the same physical SMMUv3 and do the assignment for Guests correctly.
> I launched a VM using my libvirt prototype code + your qemu branch and
> noted a few small things:
Thanks for giving this a spin with libvirt.
> 1. Are there plans to support "-device addr" for arm-smmuv3-nested's
> PCIe slot and function like any other device? If not I'll exclude it
> from my libvirt prototype.
Not at the moment. arm-smmuv3-nested does not currently make any use
of the PCI slot and func info specifically. I am not sure how that would be
useful for this though.
> 2. Is "id" for "-device arm-smmuv3-nested" necessary for any sort of
> functionality? If so, I'll make a change to my libvirt prototype to
> support this. I was able to boot a VM and see a similar VM PCI topology
> as your example without specifying "id".
Yes, "id" not used and without it, it will work.
> Otherwise, the VM topology looks OK with your qemu branch + my libvirt
> prototype.
That is good to know.
> Also as a heads up, I've added support for auto-inserting PCIe switch
> between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
> node per libvirt's documentation - "If you intend to plug multiple
> devices into a pcie-expander-bus, you must connect a
> pcie-switch-upstream-port to the pcie-root-port that is plugged into the
> pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
> pcie-switch-upstream-port". Future unit-tests should follow this
> topology configuration.
Ok. Could you please give me an equivalent Qemu command line option,
if possible, for the above case? I am not that familiar with libvirt and I would
also like to test the above scenario.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-21 10:12 ` Shameerali Kolothum Thodi via
@ 2024-11-22 1:41 ` Nathan Chen
2024-11-22 17:38 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Nathan Chen @ 2024-11-22 1:41 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nicolin Chen
>> Also as a heads up, I've added support for auto-inserting PCIe switch
>> between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
>> node per libvirt's documentation - "If you intend to plug multiple
>> devices into a pcie-expander-bus, you must connect a
>> pcie-switch-upstream-port to the pcie-root-port that is plugged into the
>> pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
>> pcie-switch-upstream-port". Future unit-tests should follow this
>> topology configuration.
>
> Ok. Could you please give me an example Qemu equivalent command option,
> if possible, for the above case. I am not that familiar with libvirt
and I would
> also like to test the above scenario.
You can use "-device x3130-upstream" for the upstream switch port, and
"-device xio3130-downstream" for the downstream port:
-device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
-device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0 \
-device x3130-upstream,id=pci.3,bus=pci.2,addr=0x0 \
-device xio3130-downstream,id=pci.4,bus=pci.3,addr=0x0,chassis=17,port=1 \
-device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
-device arm-smmuv3-nested,pci-bus=pci.1
-Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-22 1:41 ` Nathan Chen
@ 2024-11-22 17:38 ` Shameerali Kolothum Thodi via
2024-11-22 18:53 ` Nathan Chen
2024-12-13 11:58 ` Daniel P. Berrangé
0 siblings, 2 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-22 17:38 UTC (permalink / raw)
To: Nathan Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nicolin Chen
> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Friday, November 22, 2024 1:42 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nicolin Chen <nicolinc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> >> Also as a heads up, I've added support for auto-inserting PCIe switch
> >> between the PXB and GPUs in libvirt to attach multiple devices to a
> SMMU
> >> node per libvirt's documentation - "If you intend to plug multiple
> >> devices into a pcie-expander-bus, you must connect a
> >> pcie-switch-upstream-port to the pcie-root-port that is plugged into the
> >> pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
> >> pcie-switch-upstream-port". Future unit-tests should follow this
> >> topology configuration.
> >
> > Ok. Could you please give me an example Qemu equivalent command
> option,
> > if possible, for the above case. I am not that familiar with libvirt
> and I would
> > also like to test the above scenario.
>
> You can use "-device x3130-upstream" for the upstream switch port, and
> "-device xio3130-downstream" for the downstream port:
>
> -device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
> -device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0 \
> -device x3130-upstream,id=pci.3,bus=pci.2,addr=0x0 \
> -device xio3130-
> downstream,id=pci.4,bus=pci.3,addr=0x0,chassis=17,port=1 \
> -device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
> -device arm-smmuv3-nested,pci-bus=pci.1
Thanks. Just wondering why libvirt mandates the use of a pcie-switch for plugging
multiple devices rather than just using pcie-root-ports?
Please let me know if there is any advantage in doing so that you are aware of.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-22 17:38 ` Shameerali Kolothum Thodi via
@ 2024-11-22 18:53 ` Nathan Chen
2025-02-04 14:00 ` Eric Auger
2024-12-13 11:58 ` Daniel P. Berrangé
1 sibling, 1 reply; 150+ messages in thread
From: Nathan Chen @ 2024-11-22 18:53 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nicolin Chen
> >> Also as a heads up, I've added support for auto-inserting PCIe switch
> >> between the PXB and GPUs in libvirt to attach multiple devices to a
> >> SMMU
> >> node per libvirt's documentation - "If you intend to plug multiple
> >> devices into a pcie-expander-bus, you must connect a
> >> pcie-switch-upstream-port to the pcie-root-port that is plugged
> >> into the
> >> pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
> >> pcie-switch-upstream-port". Future unit-tests should follow this
> >> topology configuration.
> >
> > > Ok. Could you please give me an example Qemu equivalent command
> > option,
> > > if possible, for the above case. I am not that familiar with libvirt
> > and I would
> > > also like to test the above scenario.
> >
> > You can use "-device x3130-upstream" for the upstream switch port, and
> > "-device xio3130-downstream" for the downstream port:
> >
> > -device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
> > -device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0 \
> > -device x3130-upstream,id=pci.3,bus=pci.2,addr=0x0 \
> > -device xio3130-
> > downstream,id=pci.4,bus=pci.3,addr=0x0,chassis=17,port=1 \
> > -device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
> > -device arm-smmuv3-nested,pci-bus=pci.1
>
> Thanks. Just wondering why libvirt mandates the use of a pcie-switch for
> plugging multiple devices rather than just using pcie-root-ports?
>
> Please let me know if there is any advantage in doing so that you are
> aware of.
Actually it seems like that documentation I quoted is out of date. That
section of the documentation for pcie-expander-bus was written before a
patch that revised libvirt's pxb to have 32 slots instead of just 1
slot, and it wasn't updated afterwards.
With your branch and my libvirt prototype, I was still able to attach a
passthrough device behind a PCIe switch and see it attached to a vSMMU
in the VM, so I'm not sure if you need to make additional changes to
your solution to support this. But I think we should still support/test
the case where VFIO devices are behind a switch, otherwise we're placing
a limitation on end users who have a use case for it.
-Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-22 18:53 ` Nathan Chen
@ 2025-02-04 14:00 ` Eric Auger
0 siblings, 0 replies; 150+ messages in thread
From: Eric Auger @ 2025-02-04 14:00 UTC (permalink / raw)
To: Nathan Chen, Shameerali Kolothum Thodi
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nicolin Chen
Hi Nathan,
On 11/22/24 7:53 PM, Nathan Chen wrote:
> > >> Also as a heads up, I've added support for auto-inserting PCIe switch
> > >> between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
> > >> node per libvirt's documentation - "If you intend to plug multiple
> > >> devices into a pcie-expander-bus, you must connect a
> > >> pcie-switch-upstream-port to the pcie-root-port that is plugged into the
> > >> pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
> > >> pcie-switch-upstream-port". Future unit-tests should follow this
> > >> topology configuration.
> > >
> > > > Ok. Could you please give me an example Qemu equivalent command
> > > option,
> > > > if possible, for the above case. I am not that familiar with
> libvirt
> > > and I would
> > > > also like to test the above scenario.
> > >
> > > You can use "-device x3130-upstream" for the upstream switch port,
> and
> > > "-device xio3130-downstream" for the downstream port:
> > >
> > > -device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
> > > -device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0 \
> > > -device x3130-upstream,id=pci.3,bus=pci.2,addr=0x0 \
> > > -device xio3130-
> > > downstream,id=pci.4,bus=pci.3,addr=0x0,chassis=17,port=1 \
> > > -device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
> > > -device arm-smmuv3-nested,pci-bus=pci.1
> >
> > Thanks. Just wondering why libvirt mandates the use of a pcie-switch for
> > plugging multiple devices rather than just using pcie-root-ports?
> >
> > Please let me know if there is any advantage in doing so that you are
> > aware of.
>
> Actually it seems like that documentation I quoted is out of date.
> That section of the documentation for pcie-expander-bus was written
> before a patch that revised libvirt's pxb to have 32 slots instead of
> just 1 slot, and it wasn't updated afterwards.
You mean the QEMU documentation in qemu/docs/pcie.txt (esp. the PCI Express
only hierarchy)?
Thanks
Eric
>
> With your branch and my libvirt prototype, I was still able to attach
> a passthrough device behind a PCIe switch and see it attached to a
> vSMMU in the VM, so I'm not sure if you need to make additional
> changes to your solution to support this. But I think we should still
> support/test the case where VFIO devices are behind a switch,
> otherwise we're placing a limitation on end users who have a use case
> for it.
>
> -Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-22 17:38 ` Shameerali Kolothum Thodi via
2024-11-22 18:53 ` Nathan Chen
@ 2024-12-13 11:58 ` Daniel P. Berrangé
2024-12-13 12:43 ` Jason Gunthorpe
1 sibling, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2024-12-13 11:58 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Nathan Chen, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
eric.auger@redhat.com, peter.maydell@linaro.org, jgg@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, Nicolin Chen
On Fri, Nov 22, 2024 at 05:38:54PM +0000, Shameerali Kolothum Thodi via wrote:
>
>
> > -----Original Message-----
> > From: Nathan Chen <nathanc@nvidia.com>
> > Sent: Friday, November 22, 2024 1:42 AM
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> > Jonathan Cameron <jonathan.cameron@huawei.com>;
> > zhangfei.gao@linaro.org; Nicolin Chen <nicolinc@nvidia.com>
> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> > nested SMMUv3
> >
> > >> Also as a heads up, I've added support for auto-inserting PCIe switch
> > >> between the PXB and GPUs in libvirt to attach multiple devices to a
> > SMMU
> > >> node per libvirt's documentation - "If you intend to plug multiple
> > >> devices into a pcie-expander-bus, you must connect a
> > >> pcie-switch-upstream-port to the pcie-root-port that is plugged into the
> > >> pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
> > >> pcie-switch-upstream-port". Future unit-tests should follow this
> > >> topology configuration.
> > >
> > > Ok. Could you please give me an example Qemu equivalent command
> > option,
> > > if possible, for the above case. I am not that familiar with libvirt
> > and I would
> > > also like to test the above scenario.
> >
> > You can use "-device x3130-upstream" for the upstream switch port, and
> > "-device xio3130-downstream" for the downstream port:
> >
> > -device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
> > -device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0 \
> > -device x3130-upstream,id=pci.3,bus=pci.2,addr=0x0 \
> > -device xio3130-
> > downstream,id=pci.4,bus=pci.3,addr=0x0,chassis=17,port=1 \
> > -device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
> > -device arm-smmuv3-nested,pci-bus=pci.1
>
> Thanks. Just wondering why libvirt mandates usage of pcie-switch for multiple
> device plugging rather than just using pcie-root-ports?
Libvirt does not require use of pcie-switch. It supports them, but in the
absence of app-requested configs, libvirt will always just populate
pcie-root-port devices. Switches are something that has to be explicitly
asked for, and I don't see much need to do that.
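For illustration, a switch-less version of the earlier example would look
roughly like this (IDs and the chassis number are just placeholders):
-device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
-device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0,chassis=17 \
-device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.2,addr=0x0 \
-device arm-smmuv3-nested,pci-bus=pci.1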
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 11:58 ` Daniel P. Berrangé
@ 2024-12-13 12:43 ` Jason Gunthorpe
0 siblings, 0 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2024-12-13 12:43 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Shameerali Kolothum Thodi, Nathan Chen, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nicolin Chen
On Fri, Dec 13, 2024 at 11:58:02AM +0000, Daniel P. Berrangé wrote:
> Libvirt does not rquire use of pcie-switch. It supports them, but in the
> absence of app requested configs, libvirt will always just populate
> pcie-root-port devices. switches are something that has to be explicitly
> asked for, and I don't see much need todo that.
If you are assigning all VFIO devices within a multi-device iommu
group, there are good reasons to show the switch, and the switch has to
reflect certain ACS properties. We have some systems like this.
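For anyone wanting to check whether that applies on a given host, the group
membership is visible in sysfs (the BDF below is just an example; more than
one entry means a multi-device group):
ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/
0000:01:00.0  0000:01:00.1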
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-20 23:59 ` Nathan Chen
2024-11-21 10:12 ` Shameerali Kolothum Thodi via
@ 2024-12-12 23:54 ` Nathan Chen
2024-12-13 1:01 ` Nathan Chen
1 sibling, 1 reply; 150+ messages in thread
From: Nathan Chen @ 2024-12-12 23:54 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao,
Nicolin Chen
Hi Shameer,
Could you share the branch/version of the boot firmware file
"QEMU_EFI.fd" from your example, and where you retrieved it from? I've
been encountering PCI host bridge resource conflicts whenever assigning
more than one passthrough device to a multi-vSMMU VM, booting with the
boot firmware provided by qemu-efi-aarch64 version 2024.02-2. This
prevents the VM from booting, eventually dropping into the UEFI shell
with an error message indicating DMA mapping failed for the passthrough
devices.
Thanks,
Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-12 23:54 ` Nathan Chen
@ 2024-12-13 1:01 ` Nathan Chen
2024-12-16 9:31 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Nathan Chen @ 2024-12-13 1:01 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao,
Nicolin Chen
> with an error message indicating DMA mapping failed for the
> passthrough devices.
A correction - the message indicates UEFI failed to find a mapping for
the boot partition ("map: no mapping found"), not that DMA mapping
failed. But earlier EDK debug logs still show PCI host bridge resource
conflicts for the passthrough devices that seem related to the VM boot
failure.
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 1:01 ` Nathan Chen
@ 2024-12-16 9:31 ` Shameerali Kolothum Thodi via
2025-01-25 2:43 ` Nathan Chen
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-12-16 9:31 UTC (permalink / raw)
To: Nathan Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nicolin Chen
Hi Nathan,
> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Friday, December 13, 2024 1:02 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nicolin Chen <nicolinc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
>
> >with an error message indicating DMA mapping failed for the
> passthrough >devices.
>
> A correction - the message indicates UEFI failed to find a mapping for
> the boot partition ("map: no mapping found"), not that DMA mapping
> failed. But earlier EDK debug logs still show PCI host bridge resource
> conflicts for the passthrough devices that seem related to the VM boot
> failure.
I have tried a 2023 version of the EFI which works. And for more recent tests I am
using one built directly from,
https://github.com/tianocore/edk2.git master
Commit: 0f3867fa6ef0 ("UefiPayloadPkg/UefiPayloadEntry: Fix PT protection
in 5 level paging")
With both, I don’t remember seeing any boot failure and the above UEFI-related
"map: no mapping found" error. But the Guest kernel at times
complains about pci bridge window memory assignment failures.
...
pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed to assign
pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
...
But Guest still boots and worked fine so far.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-16 9:31 ` Shameerali Kolothum Thodi via
@ 2025-01-25 2:43 ` Nathan Chen
2025-01-27 15:26 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Nathan Chen @ 2025-01-25 2:43 UTC (permalink / raw)
To: Shameer Kolothum
Cc: ddutile, eric.auger, jgg, jiangkunkun, jonathan.cameron, linuxarm,
nathanc, nicolinc, peter.maydell, qemu-arm, wangzhou1,
zhangfei.gao, qemu-devel@nongnu.org
>> >with an error message indicating DMA mapping failed for the
>> passthrough >devices.
>>
>> A correction - the message indicates UEFI failed to find a mapping for
>> the boot partition ("map: no mapping found"), not that DMA mapping
>> failed. But earlier EDK debug logs still show PCI host bridge resource
>> conflicts for the passthrough devices that seem related to the VM boot
>> failure.
>
> I have tried a 2023 version EFI which works. And for more recent tests I am
> using a one built directly from,
> https://github.com/tianocore/edk2.git master
>
> Commit: 0f3867fa6ef0("UefiPayloadPkg/UefiPayloadEntry: Fix PT protection
> in 5 level paging"
>
> With both, I don’t remember seeing any boot failure and the above UEFI
> related "map: no mapping found" error. But the Guest kernel at times
> complaints about pci bridge window memory assignment failures.
> ...
> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed to assign
> pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
> ...
>
> But Guest still boots and worked fine so far.
Hi Shameer,
Just letting you know I resolved this by increasing the MMIO region size
in hw/arm/virt.c to support passing through GPUs with large BAR regions
(VIRT_HIGH_PCIE_MMIO). Thanks for taking a look.
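For reference, a minimal sketch of that kind of change in hw/arm/virt.c,
assuming the current 512 GiB default for the second PCIe window (the new
size is only illustrative, not the exact patch):
 static MemMapEntry extended_memmap[] = {
     ...
     /* Second PCIe window */
-    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
+    [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 1 * TiB },
 };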
Thanks,
Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-25 2:43 ` Nathan Chen
@ 2025-01-27 15:26 ` Shameerali Kolothum Thodi via
2025-01-27 23:35 ` Nathan Chen
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-01-27 15:26 UTC (permalink / raw)
To: Nathan Chen
Cc: ddutile@redhat.com, eric.auger@redhat.com, jgg@nvidia.com,
jiangkunkun, Jonathan Cameron, Linuxarm, nicolinc@nvidia.com,
peter.maydell@linaro.org, qemu-arm@nongnu.org, Wangzhou (B),
zhangfei.gao@linaro.org, qemu-devel@nongnu.org
> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Saturday, January 25, 2025 2:44 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: ddutile@redhat.com; eric.auger@redhat.com; jgg@nvidia.com;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> nathanc@nvidia.com; nicolinc@nvidia.com; peter.maydell@linaro.org;
> qemu-arm@nongnu.org; Wangzhou (B) <wangzhou1@hisilicon.com>;
> zhangfei.gao@linaro.org; qemu-devel@nongnu.org
> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> >> >with an error message indicating DMA mapping failed for the
> >> passthrough >devices.
> >>
> >> A correction - the message indicates UEFI failed to find a mapping for
> >> the boot partition ("map: no mapping found"), not that DMA mapping
> >> failed. But earlier EDK debug logs still show PCI host bridge resource
> >> conflicts for the passthrough devices that seem related to the VM boot
> >> failure.
> >
> > I have tried a 2023 version EFI which works. And for more recent tests I
> am
> > using a one built directly from,
> > https://github.com/tianocore/edk2.git master
> >
> > Commit: 0f3867fa6ef0("UefiPayloadPkg/UefiPayloadEntry: Fix PT
> protection
> > in 5 level paging"
> >
> > With both, I don’t remember seeing any boot failure and the above UEFI
> > related "map: no mapping found" error. But the Guest kernel at times
> > complaints about pci bridge window memory assignment failures.
> > ...
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't
> assign; no space
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed
> to assign
> > pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
> > ...
> >
> > But Guest still boots and worked fine so far.
>
> Hi Shameer,
>
> Just letting you know I resolved this by increasing the MMIO region size
> in hw/arm/virt.c to support passing through GPUs with large BAR regions
> (VIRT_HIGH_PCIE_MMIO). Thanks for taking a look.
>
Ok. Thanks for that. Does that mean maybe an optional property to specify
the size of VIRT_HIGH_PCIE_MMIO is worth adding?
And for the PCI bridge window specific errors that I mentioned above,
>>pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
adding ""mem-reserve=X" and "io-reserve=X" to pcie-root-port helps.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-27 15:26 ` Shameerali Kolothum Thodi via
@ 2025-01-27 23:35 ` Nathan Chen
0 siblings, 0 replies; 150+ messages in thread
From: Nathan Chen @ 2025-01-27 23:35 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: ddutile@redhat.com, eric.auger@redhat.com, jgg@nvidia.com,
jiangkunkun, Jonathan Cameron, Linuxarm, nicolinc@nvidia.com,
peter.maydell@linaro.org, qemu-arm@nongnu.org, Wangzhou (B),
zhangfei.gao@linaro.org, qemu-devel@nongnu.org, mochs
>>>> >with an error message indicating DMA mapping failed for the
>>>> passthrough >devices.
>>>>
>>>> A correction - the message indicates UEFI failed to find a mapping for
>>>> the boot partition ("map: no mapping found"), not that DMA mapping
>>>> failed. But earlier EDK debug logs still show PCI host bridge resource
>>>> conflicts for the passthrough devices that seem related to the VM boot
>>>> failure.
>>>
>>> I have tried a 2023 version EFI which works. And for more recent tests I
>> am
>>> using a one built directly from,
>>> https://github.com/tianocore/edk2.git master
>>>
>>> Commit: 0f3867fa6ef0("UefiPayloadPkg/UefiPayloadEntry: Fix PT
>> protection
>>> in 5 level paging"
>>>
>>> With both, I don’t remember seeing any boot failure and the above UEFI
>>> related "map: no mapping found" error. But the Guest kernel at times
>>> complaints about pci bridge window memory assignment failures.
>>> ...
>>> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't
>> assign; no space
>>> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed
>> to assign
>>> pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
>>> ...
>>>
>>> But Guest still boots and worked fine so far.
>>
>> Hi Shameer,
>>
>> Just letting you know I resolved this by increasing the MMIO region size
>> in hw/arm/virt.c to support passing through GPUs with large BAR regions
>> (VIRT_HIGH_PCIE_MMIO). Thanks for taking a look.
>>
>
> Ok. Thanks for that. Does that mean may be an optional property to specify
> the size for VIRT_HIGH_PCIE_MMIO is worth adding?
Yes, and actually we have a patch ready for the configurable highmem
region size. Matt Ochs will send it out in the next day or so and CC you
on the submission.
> adding ""mem-reserve=X" and "io-reserve=X" to pcie-root-port helps
Ok, good to know - I'll keep that in mind for future testing.
Thanks,
Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (5 preceding siblings ...)
2024-11-12 22:59 ` [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Nicolin Chen
@ 2024-11-13 16:16 ` Mostafa Saleh
2024-11-14 8:01 ` Shameerali Kolothum Thodi via
2024-11-13 21:42 ` Nicolin Chen
` (3 subsequent siblings)
10 siblings, 1 reply; 150+ messages in thread
From: Mostafa Saleh @ 2024-11-13 16:16 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
ddutile, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
Hi Shameer,
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> Hi,
>
> This series adds initial support for a user-creatable "arm-smmuv3-nested"
> device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> and cannot support multiple SMMUv3s.
>
I had a quick look at the SMMUv3 files. As SMMUv3 now supports nested
translation emulation, would it make sense to rename this? AFAIU,
this is about a virtual (stage-1) SMMUv3 that is emulated for the guest.
Including vSMMU or virt in the name would help distinguish the code, as a
new function such as smmu_nested_realize() otherwise looks confusing.
Thanks,
Mostafa
> In order to support vfio-pci dev assignment with vSMMUv3, the physical
> SMMUv3 has to be configured in nested mode. Having a pluggable
> "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> running on a host with multiple physical SMMUv3s. A few benefits of doing
> this are,
>
> 1. Avoid invalidation broadcast or lookup in case devices are behind
> multiple phys SMMUv3s.
> 2. Makes it easy to handle phys SMMUv3s that differ in features.
> 3. Easy to handle future requirements such as vCMDQ support.
>
> This is based on discussions/suggestions received for a previous RFC by
> Nicolin here[0].
>
> This series includes,
> -Adds support for "arm-smmuv3-nested" device. At present only virt is
> supported and is using _plug_cb() callback to hook the sysbus mem
> and irq (Not sure this has any negative repercussions). Patch #3.
> -Provides a way to associate a pci-bus(pxb-pcie) to the above device.
> Patch #3.
> -The last patch is adding RMR support for MSI doorbell handling. Patch #5.
> This may change in future[1].
>
> This RFC is for initial discussion/test purposes only and includes patches
> that are only relevant for adding the "arm-smmuv3-nested" support. For the
> complete branch please find,
> https://github.com/hisilicon/qemu/tree/private-smmuv3-nested-dev-rfc-v1
>
> Few ToDos to note,
> 1. At present default-bus-bypass-iommu=on should be set when
> arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
> related boot error. Requires fixing.
> 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
> Could be a bug in IORT id mappings.
> 3. The above branch doesn't support vSVA yet.
>
> Hopefully this is helpful in taking the discussion forward. Please take a
> look and let me know.
>
> How to use it(Eg:):
>
> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
> specify two smmuv3-nested devices each behind a pxb-pcie as below,
>
> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> -net none \
> -nographic
>
> Guest will boot with two SMMuv3s,
> [ 1.608130] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> [ 1.609655] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
> [ 1.612475] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> [ 1.614444] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> [ 1.617451] arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> [ 1.618842] arm-smmu-v3 arm-smmu-v3.1.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
> [ 1.621366] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> [ 1.623225] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>
> With a pci topology like below,
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> | +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | \-03.0 Virtio: Virtio filesystem
> +-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> And if you want to add another HNS VF, it should be added to the same SMMUv3
> as of the first HNS dev,
>
> -device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
>
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> | +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | \-03.0 Virtio: Virtio filesystem
> +-[0000:08]-+-00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> | \-01.0-[0a]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> Attempt to add the HNS VF to a different SMMUv3 will result in,
>
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
>
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
>
> Thanks,
> Shameer
> [0] https://lore.kernel.org/qemu-devel/cover.1719361174.git.nicolinc@nvidia.com/
> [1] https://lore.kernel.org/linux-iommu/ZrVN05VylFq8lK4q@Asurada-Nvidia/
>
> Eric Auger (1):
> hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested
> binding
>
> Nicolin Chen (2):
> hw/arm/virt: Add an SMMU_IO_LEN macro
> hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
>
> Shameer Kolothum (2):
> hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
> hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
>
> hw/arm/smmuv3.c | 61 ++++++++++++++++++++++
> hw/arm/virt-acpi-build.c | 109 ++++++++++++++++++++++++++++++++-------
> hw/arm/virt.c | 33 ++++++++++--
> hw/core/sysbus-fdt.c | 1 +
> include/hw/arm/smmuv3.h | 17 ++++++
> include/hw/arm/virt.h | 15 ++++++
> 6 files changed, 215 insertions(+), 21 deletions(-)
>
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-13 16:16 ` Mostafa Saleh
@ 2024-11-14 8:01 ` Shameerali Kolothum Thodi via
2024-11-14 11:49 ` Mostafa Saleh
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-14 8:01 UTC (permalink / raw)
To: Mostafa Saleh
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Mostafa,
> -----Original Message-----
> From: Mostafa Saleh <smostafa@google.com>
> Sent: Wednesday, November 13, 2024 4:17 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Shameer,
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > Hi,
> >
> > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per
> machine
> > and cannot support multiple SMMUv3s.
> >
>
> I had a quick look at the SMMUv3 files, as now SMMUv3 supports nested
> translation emulation, would it make sense to rename this? As AFAIU,
> this is about virt (stage-1) SMMUv3 that is emulated to a guest.
> Including vSMMU or virt would help distinguish the code, as now
> some new function as smmu_nested_realize() looks confusing.
Yes. I have noticed that. We need to call it something else to avoid the
confusion. Not sure including "virt" is a good idea as it may indicate virt
machine. Probably "acc" as Nicolin suggested to indicate hw accelerated.
I will think about a better one. Open to suggestions.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-14 8:01 ` Shameerali Kolothum Thodi via
@ 2024-11-14 11:49 ` Mostafa Saleh
0 siblings, 0 replies; 150+ messages in thread
From: Mostafa Saleh @ 2024-11-14 11:49 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Shameer,
On Thu, Nov 14, 2024 at 08:01:28AM +0000, Shameerali Kolothum Thodi wrote:
> Hi Mostafa,
>
> > -----Original Message-----
> > From: Mostafa Saleh <smostafa@google.com>
> > Sent: Wednesday, November 13, 2024 4:17 PM
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> > nested SMMUv3
> >
> > Hi Shameer,
> >
> > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > Hi,
> > >
> > > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per
> > machine
> > > and cannot support multiple SMMUv3s.
> > >
> >
> > I had a quick look at the SMMUv3 files, as now SMMUv3 supports nested
> > translation emulation, would it make sense to rename this? As AFAIU,
> > this is about virt (stage-1) SMMUv3 that is emulated to a guest.
> > Including vSMMU or virt would help distinguish the code, as now
> > some new function as smmu_nested_realize() looks confusing.
>
> Yes. I have noticed that. We need to call it something else to avoid the
> confusion. Not sure including "virt" is a good idea as it may indicate virt
> machine. Probably "acc" as Nicolin suggested to indicate hw accelerated.
> I will think about a better one. Open to suggestions.
"acc" sounds good to me, also if possible we can have smmuv3-acc.c where
it has all the specific logic, and the main file just calls into it.
Thanks,
Mostafa
>
> Thanks,
> Shameer
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (6 preceding siblings ...)
2024-11-13 16:16 ` Mostafa Saleh
@ 2024-11-13 21:42 ` Nicolin Chen
2024-11-14 9:11 ` Shameerali Kolothum Thodi via
2024-11-18 10:50 ` Eric Auger
` (2 subsequent siblings)
10 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2024-11-13 21:42 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, ddutile,
linuxarm, wangzhou1, jiangkunkun, jonathan.cameron, zhangfei.gao
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> This RFC is for initial discussion/test purposes only and includes patches
> that are only relevant for adding the "arm-smmuv3-nested" support. For the
> complete branch please find,
> https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-rfc-v1/
I guess the QEMU branch above pairs with this (vIOMMU v6)?
https://github.com/nicolinc/iommufd/commits/smmuv3_nesting-with-rmr
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-13 21:42 ` Nicolin Chen
@ 2024-11-14 9:11 ` Shameerali Kolothum Thodi via
0 siblings, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-11-14 9:11 UTC (permalink / raw)
To: Nicolin Chen
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, November 13, 2024 9:43 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> > This RFC is for initial discussion/test purposes only and includes
> > patches that are only relevant for adding the "arm-smmuv3-nested"
> > support. For the complete branch please find,
> > https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-
> rf
> > c-v1/
>
> I guess the QEMU branch above pairs with this (vIOMMU v6)?
> https://github.com/nicolinc/iommufd/commits/smmuv3_nesting-with-rmr
I actually based it on top of a kernel branch that Zhangfei is keeping for his verification tests.
https://github.com/Linaro/linux-kernel-uadk/commits/6.12-wip-10.26/
But yes, it indeed looks like it is based on the branch you mentioned above.
Thanks,
Shameer.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (7 preceding siblings ...)
2024-11-13 21:42 ` Nicolin Chen
@ 2024-11-18 10:50 ` Eric Auger
2025-01-30 16:41 ` Daniel P. Berrangé
2024-12-13 12:00 ` Daniel P. Berrangé
2025-01-30 16:00 ` Daniel P. Berrangé
10 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2024-11-18 10:50 UTC (permalink / raw)
To: Shameer Kolothum, qemu-arm, qemu-devel
Cc: peter.maydell, jgg, nicolinc, ddutile, linuxarm, wangzhou1,
jiangkunkun, jonathan.cameron, zhangfei.gao, Andrea Bolognani
Hi Shameer,
On 11/8/24 13:52, Shameer Kolothum wrote:
> Hi,
>
> This series adds initial support for a user-creatable "arm-smmuv3-nested"
> device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> and cannot support multiple SMMUv3s.
>
> In order to support vfio-pci dev assignment with vSMMUv3, the physical
> SMMUv3 has to be configured in nested mode. Having a pluggable
> "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> running on a host with multiple physical SMMUv3s. A few benefits of doing
> this are,
>
> 1. Avoid invalidation broadcast or lookup in case devices are behind
> multiple phys SMMUv3s.
> 2. Makes it easy to handle phys SMMUv3s that differ in features.
> 3. Easy to handle future requirements such as vCMDQ support.
>
> This is based on discussions/suggestions received for a previous RFC by
> Nicolin here[0].
>
> This series includes,
> -Adds support for "arm-smmuv3-nested" device. At present only virt is
> supported and is using _plug_cb() callback to hook the sysbus mem
> and irq (Not sure this has any negative repercussions). Patch #3.
> -Provides a way to associate a pci-bus(pxb-pcie) to the above device.
> Patch #3.
> -The last patch is adding RMR support for MSI doorbell handling. Patch #5.
> This may change in future[1].
>
> This RFC is for initial discussion/test purposes only and includes patches
> that are only relevant for adding the "arm-smmuv3-nested" support. For the
> complete branch please find,
> https://github.com/hisilicon/qemu/tree/private-smmuv3-nested-dev-rfc-v1
>
> Few ToDos to note,
> 1. At present default-bus-bypass-iommu=on should be set when
> arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
> related boot error. Requires fixing.
> 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
> Could be a bug in IORT id mappings.
> 3. The above branch doesn't support vSVA yet.
>
> Hopefully this is helpful in taking the discussion forward. Please take a
> look and let me know.
>
> How to use it(Eg:):
>
> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
> specify two smmuv3-nested devices each behind a pxb-pcie as below,
>
> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
This kind of instantiation matches what I had in mind. It is
questionable whether the legacy SMMU shouldn't be migrated to that mode
too (instead of using a machine option setting), depending on Peter's
feedback and also comments from the libvirt folks. Adding Andrea in the loop.
Thanks
Eric
> -net none \
> -nographic
>
> Guest will boot with two SMMuv3s,
> [ 1.608130] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> [ 1.609655] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
> [ 1.612475] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> [ 1.614444] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> [ 1.617451] arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> [ 1.618842] arm-smmu-v3 arm-smmu-v3.1.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
> [ 1.621366] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> [ 1.623225] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>
> With a pci topology like below,
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> | +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | \-03.0 Virtio: Virtio filesystem
> +-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> And if you want to add another HNS VF, it should be added to the same SMMUv3
> as of the first HNS dev,
>
> -device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
>
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> | +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> | \-03.0 Virtio: Virtio filesystem
> +-[0000:08]-+-00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> | \-01.0-[0a]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> Attempt to add the HNS VF to a different SMMUv3 will result in,
>
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
>
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
>
> Thanks,
> Shameer
> [0] https://lore.kernel.org/qemu-devel/cover.1719361174.git.nicolinc@nvidia.com/
> [1] https://lore.kernel.org/linux-iommu/ZrVN05VylFq8lK4q@Asurada-Nvidia/
>
> Eric Auger (1):
> hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested
> binding
>
> Nicolin Chen (2):
> hw/arm/virt: Add an SMMU_IO_LEN macro
> hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
>
> Shameer Kolothum (2):
> hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
> hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
>
> hw/arm/smmuv3.c | 61 ++++++++++++++++++++++
> hw/arm/virt-acpi-build.c | 109 ++++++++++++++++++++++++++++++++-------
> hw/arm/virt.c | 33 ++++++++++--
> hw/core/sysbus-fdt.c | 1 +
> include/hw/arm/smmuv3.h | 17 ++++++
> include/hw/arm/virt.h | 15 ++++++
> 6 files changed, 215 insertions(+), 21 deletions(-)
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-18 10:50 ` Eric Auger
@ 2025-01-30 16:41 ` Daniel P. Berrangé
0 siblings, 0 replies; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-01-30 16:41 UTC (permalink / raw)
To: Eric Auger
Cc: Shameer Kolothum, qemu-arm, qemu-devel, peter.maydell, jgg,
nicolinc, ddutile, linuxarm, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao, Andrea Bolognani
On Mon, Nov 18, 2024 at 11:50:46AM +0100, Eric Auger wrote:
> Hi Shameer,
>
> On 11/8/24 13:52, Shameer Kolothum wrote:
> > Hi,
> >
> > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> > and cannot support multiple SMMUv3s.
> >
> > In order to support vfio-pci dev assignment with vSMMUv3, the physical
> > SMMUv3 has to be configured in nested mode. Having a pluggable
> > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> > running on a host with multiple physical SMMUv3s. A few benefits of doing
> > this are,
> >
> > 1. Avoid invalidation broadcast or lookup in case devices are behind
> > multiple phys SMMUv3s.
> > 2. Makes it easy to handle phys SMMUv3s that differ in features.
> > 3. Easy to handle future requirements such as vCMDQ support.
> >
> > This is based on discussions/suggestions received for a previous RFC by
> > Nicolin here[0].
> >
> > This series includes,
> > -Adds support for "arm-smmuv3-nested" device. At present only virt is
> > supported and is using _plug_cb() callback to hook the sysbus mem
> > and irq (Not sure this has any negative repercussions). Patch #3.
> > -Provides a way to associate a pci-bus(pxb-pcie) to the above device.
> > Patch #3.
> > -The last patch is adding RMR support for MSI doorbell handling. Patch #5.
> > This may change in future[1].
> >
> > This RFC is for initial discussion/test purposes only and includes patches
> > that are only relevant for adding the "arm-smmuv3-nested" support. For the
> > complete branch please find,
> > https://github.com/hisilicon/qemu/tree/private-smmuv3-nested-dev-rfc-v1
> >
> > Few ToDos to note,
> > 1. At present default-bus-bypass-iommu=on should be set when
> > arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
> > related boot error. Requires fixing.
> > 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
> > Could be a bug in IORT id mappings.
> > 3. The above branch doesn't support vSVA yet.
> >
> > Hopefully this is helpful in taking the discussion forward. Please take a
> > look and let me know.
> >
> > How to use it(Eg:):
> >
> > On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
> > devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
> > specify two smmuv3-nested devices each behind a pxb-pcie as below,
> >
> > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> > -object iommufd,id=iommufd0 \
> > -bios QEMU_EFI.fd \
> > -kernel Image \
> > -device virtio-blk-device,drive=fs \
> > -drive if=none,file=rootfs.qcow2,id=fs \
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> This kind of instantiation matches what I had in mind. It is
> questionable whether the legacy SMMU shouldn't be migrated to that mode
> too (instead of using a machine option setting), depending on Peter's
> feedbacks and also comments from Libvirt guys. Adding Andrea in the loop.
Yeah, looking at the current config I'm pretty surprised to see it
configured with '-machine virt,iommu=smmuv3', where 'smmuv3' is a
type name. This is effectively a back-door reinvention of the '-device'
arg.
I think it'd make more sense to deprecate the 'iommu' property
on the machine, and allow '-device smmuv3,pci-bus=pcie.0' to
associate the IOMMU with the PCI root bus, so we have consistent
approaches for all SMMU impls.
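i.e. roughly (the device-level syntax below is only the shape being proposed
here, not something QEMU accepts today):
today:     -machine virt,iommu=smmuv3
proposed:  -device smmuv3,pci-bus=pcie.0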
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (8 preceding siblings ...)
2024-11-18 10:50 ` Eric Auger
@ 2024-12-13 12:00 ` Daniel P. Berrangé
2024-12-13 12:46 ` Jason Gunthorpe
2025-01-30 16:00 ` Daniel P. Berrangé
10 siblings, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2024-12-13 12:00 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
ddutile, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> Hi,
>
> This series adds initial support for a user-creatable "arm-smmuv3-nested"
> device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> and cannot support multiple SMMUv3s.
>
> In order to support vfio-pci dev assignment with vSMMUv3, the physical
> SMMUv3 has to be configured in nested mode. Having a pluggable
> "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> running on a host with multiple physical SMMUv3s. A few benefits of doing
> this are,
I'm not very familiar with arm, but from this description I'm not
really seeing how "nesting" is involved here. You're only talking
about the host and 1 L1 guest, no L2 guest.
Also what is the relation between the physical SMMUv3 and the guest
SMMUv3 that's referenced ? Is this in fact some form of host device
passthrough rather than nesting ?
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 12:00 ` Daniel P. Berrangé
@ 2024-12-13 12:46 ` Jason Gunthorpe
2024-12-13 13:19 ` Daniel P. Berrangé
2024-12-13 13:33 ` Peter Maydell
0 siblings, 2 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2024-12-13 12:46 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
nicolinc, ddutile, linuxarm, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao
On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote:
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > Hi,
> >
> > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> > and cannot support multiple SMMUv3s.
> >
> > In order to support vfio-pci dev assignment with vSMMUv3, the physical
> > SMMUv3 has to be configured in nested mode. Having a pluggable
> > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> > running on a host with multiple physical SMMUv3s. A few benefits of doing
> > this are,
>
> I'm not very familiar with arm, but from this description I'm not
> really seeing how "nesting" is involved here. You're only talking
> about the host and 1 L1 guest, no L2 guest.
nesting is the term the iommu side is using to refer to 2-dimensional
paging, i.e. a guest page table on top of a hypervisor page
table.
Nothing to do with vm nesting.
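Concretely, for the SMMU case it is the standard two-stage walk (rough
sketch, nothing specific to this series):
  device DMA address (IOVA)
    -- stage 1: guest-owned page tables --> guest PA
    -- stage 2: host-owned page tables  --> host PA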
> Also what is the relation between the physical SMMUv3 and the guest
> SMMUv3 that's referenced ? Is this in fact some form of host device
> passthrough rather than nesting ?
It is an acceleration feature: the iommu HW does more work instead of
the software emulating things. Similar to how the 2d paging option in
KVM is an acceleration feature.
All of the iommu series on vfio are creating paravirtualized iommu
models inside the VM. They access various levels of HW acceleration to
speed up the paravirtualization.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 12:46 ` Jason Gunthorpe
@ 2024-12-13 13:19 ` Daniel P. Berrangé
2024-12-16 9:38 ` Shameerali Kolothum Thodi via
2024-12-17 18:36 ` Donald Dutile
2024-12-13 13:33 ` Peter Maydell
1 sibling, 2 replies; 150+ messages in thread
From: Daniel P. Berrangé @ 2024-12-13 13:19 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
nicolinc, ddutile, linuxarm, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao
On Fri, Dec 13, 2024 at 08:46:42AM -0400, Jason Gunthorpe wrote:
> On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote:
> > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > Hi,
> > >
> > > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> > > and cannot support multiple SMMUv3s.
> > >
> > > In order to support vfio-pci dev assignment with vSMMUv3, the physical
> > > SMMUv3 has to be configured in nested mode. Having a pluggable
> > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> > > running on a host with multiple physical SMMUv3s. A few benefits of doing
> > > this are,
> >
> > I'm not very familiar with arm, but from this description I'm not
> > really seeing how "nesting" is involved here. You're only talking
> > about the host and 1 L1 guest, no L2 guest.
>
> nesting is the term the iommu side is using to refer to the 2
> dimensional paging, ie a guest page table on top of a hypervisor page
> table.
>
> Nothing to do with vm nesting.
Ok, that naming is destined to cause confusion for many, given the
commonly understood use of 'nesting' in the context of VMs...
>
> > Also what is the relation between the physical SMMUv3 and the guest
> > SMMUv3 that's referenced ? Is this in fact some form of host device
> > passthrough rather than nesting ?
>
> It is an acceeleration feature, the iommu HW does more work instead of
> the software emulating things. Similar to how the 2d paging option in
> KVM is an acceleration feature.
>
> All of the iommu series on vfio are creating paravirtualized iommu
> models inside the VM. They access various levels of HW acceleration to
> speed up the paravirtualization.
... describing it as a HW accelerated iommu makes it significantly clearer
to me what this proposal is about. Perhaps the device is better named as
"arm-smmuv3-accel" ?
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 13:19 ` Daniel P. Berrangé
@ 2024-12-16 9:38 ` Shameerali Kolothum Thodi via
2024-12-17 18:36 ` Donald Dutile
1 sibling, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-12-16 9:38 UTC (permalink / raw)
To: Daniel P. Berrangé, Jason Gunthorpe
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Friday, December 13, 2024 1:20 PM
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Dec 13, 2024 at 08:46:42AM -0400, Jason Gunthorpe wrote:
> > On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote:
> > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > > Hi,
> > > >
> > > > This series adds initial support for a user-creatable "arm-smmuv3-
> nested"
> > > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per
> machine
> > > > and cannot support multiple SMMUv3s.
> > > >
> > > > In order to support vfio-pci dev assignment with vSMMUv3, the
> physical
> > > > SMMUv3 has to be configured in nested mode. Having a pluggable
> > > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3
> for Guests
> > > > running on a host with multiple physical SMMUv3s. A few benefits of
> doing
> > > > this are,
> > >
> > > I'm not very familiar with arm, but from this description I'm not
> > > really seeing how "nesting" is involved here. You're only talking
> > > about the host and 1 L1 guest, no L2 guest.
> >
> > nesting is the term the iommu side is using to refer to the 2
> > dimensional paging, ie a guest page table on top of a hypervisor page
> > table.
> >
> > Nothing to do with vm nesting.
>
> Ok, that naming is destined to cause confusion for many, given the
> commonly understood use of 'nesting' in the context of VMs...
>
> >
> > > Also what is the relation between the physical SMMUv3 and the guest
> > > SMMUv3 that's referenced ? Is this in fact some form of host device
> > > passthrough rather than nesting ?
> >
> > It is an acceeleration feature, the iommu HW does more work instead of
> > the software emulating things. Similar to how the 2d paging option in
> > KVM is an acceleration feature.
> >
> > All of the iommu series on vfio are creating paravirtualized iommu
> > models inside the VM. They access various levels of HW acceleration to
> > speed up the paravirtualization.
>
> ... describing it as a HW accelerated iommu makes it significantly clearer
> to me what this proposal is about. Perhaps the device is better named as
> "arm-smmuv3-accel" ?
Agree. There were similar previous comments from reviewers that the current smmuv3
already has emulated stage 1 and stage 2 support and refers to that as "nested"
in code. So this will be renamed as above.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 13:19 ` Daniel P. Berrangé
2024-12-16 9:38 ` Shameerali Kolothum Thodi via
@ 2024-12-17 18:36 ` Donald Dutile
1 sibling, 0 replies; 150+ messages in thread
From: Donald Dutile @ 2024-12-17 18:36 UTC (permalink / raw)
To: Daniel P. Berrangé, Jason Gunthorpe
Cc: Shameer Kolothum, qemu-arm, qemu-devel, eric.auger, peter.maydell,
nicolinc, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On 12/13/24 8:19 AM, Daniel P. Berrangé wrote:
> On Fri, Dec 13, 2024 at 08:46:42AM -0400, Jason Gunthorpe wrote:
>> On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote:
>>> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
>>>> Hi,
>>>>
>>>> This series adds initial support for a user-creatable "arm-smmuv3-nested"
>>>> device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
>>>> and cannot support multiple SMMUv3s.
>>>>
>>>> In order to support vfio-pci dev assignment with vSMMUv3, the physical
>>>> SMMUv3 has to be configured in nested mode. Having a pluggable
>>>> "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
>>>> running on a host with multiple physical SMMUv3s. A few benefits of doing
>>>> this are,
>>>
>>> I'm not very familiar with arm, but from this description I'm not
>>> really seeing how "nesting" is involved here. You're only talking
>>> about the host and 1 L1 guest, no L2 guest.
>>
>> nesting is the term the iommu side is using to refer to the 2
>> dimensional paging, ie a guest page table on top of a hypervisor page
>> table.
>>
>> Nothing to do with vm nesting.
>
> Ok, that naming is destined to cause confusion for many, given the
> commonly understood use of 'nesting' in the context of VMs...
>
>>
>>> Also what is the relation between the physical SMMUv3 and the guest
>>> SMMUv3 that's referenced ? Is this in fact some form of host device
>>> passthrough rather than nesting ?
>>
>> It is an acceleration feature, the iommu HW does more work instead of
>> the software emulating things. Similar to how the 2d paging option in
>> KVM is an acceleration feature.
>>
>> All of the iommu series on vfio are creating paravirtualized iommu
>> models inside the VM. They access various levels of HW acceleration to
>> speed up the paravirtualization.
>
> ... describing it as a HW accelerated iommu makes it significantly clearer
> to me what this proposal is about. Perhaps the device is better named as
> "arm-smmuv3-accel" ?
>
I'm having deja-vu! ;-)
Thanks for echoing my earlier statements in this patch series about the use of 'nested'
and the better fit of 'accel' in these circumstances.
Even 'accel' on an 'arm-smmuv3' is a bit of a hammer, as there can be multiple accel features
&/or implementations... I would like to see 'accel' as a parameter to 'arm-smmuv3', and not
a complete namespace unto itself, so we can do things like 'accel=cmdvq', 'accel=2-level', ...
and for libvirt's sanity, a way to get those hw features from sysfs for
(possible) migration-compatibility testing.
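I.e., on the -device line that might look something like the following (purely
hypothetical syntax, just to illustrate -- none of these properties exist today):

# hypothetical per-feature selection, for illustration only
-device arm-smmuv3,accel=cmdvq
-device arm-smmuv3,accel=2-level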
>
> With regards,
> Daniel
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 12:46 ` Jason Gunthorpe
2024-12-13 13:19 ` Daniel P. Berrangé
@ 2024-12-13 13:33 ` Peter Maydell
2024-12-16 10:01 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Peter Maydell @ 2024-12-13 13:33 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Daniel P. Berrangé, Shameer Kolothum, qemu-arm, qemu-devel,
eric.auger, nicolinc, ddutile, linuxarm, wangzhou1, jiangkunkun,
jonathan.cameron, zhangfei.gao
On Fri, 13 Dec 2024 at 12:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote:
> > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > Hi,
> > >
> > > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> > > and cannot support multiple SMMUv3s.
> > >
> > > In order to support vfio-pci dev assignment with vSMMUv3, the physical
> > > SMMUv3 has to be configured in nested mode. Having a pluggable
> > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> > > running on a host with multiple physical SMMUv3s. A few benefits of doing
> > > this are,
> >
> > I'm not very familiar with arm, but from this description I'm not
> > really seeing how "nesting" is involved here. You're only talking
> > about the host and 1 L1 guest, no L2 guest.
>
> nesting is the term the iommu side is using to refer to the 2
> dimensional paging, ie a guest page table on top of a hypervisor page
> table.
Isn't that more usually called "two stage" paging? Calling
that "nesting" seems like it is going to be massively confusing...
Also, how does it relate to what this series seems to be
doing, where we provide the guest with two separate SMMUs?
(Are those two SMMUs "nested" in the sense that one is sitting
behind the other?)
thanks
-- PMM
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-13 13:33 ` Peter Maydell
@ 2024-12-16 10:01 ` Shameerali Kolothum Thodi via
2025-01-09 4:45 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2024-12-16 10:01 UTC (permalink / raw)
To: Peter Maydell, Jason Gunthorpe
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Friday, December 13, 2024 1:33 PM
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, 13 Dec 2024 at 12:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote:
> > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > > Hi,
> > > >
> > > > This series adds initial support for a user-creatable "arm-smmuv3-
> nested"
> > > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per
> machine
> > > > and cannot support multiple SMMUv3s.
> > > >
> > > > In order to support vfio-pci dev assignment with vSMMUv3, the
> physical
> > > > SMMUv3 has to be configured in nested mode. Having a pluggable
> > > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3
> for Guests
> > > > running on a host with multiple physical SMMUv3s. A few benefits of
> doing
> > > > this are,
> > >
> > > I'm not very familiar with arm, but from this description I'm not
> > > really seeing how "nesting" is involved here. You're only talking
> > > about the host and 1 L1 guest, no L2 guest.
> >
> > nesting is the term the iommu side is using to refer to the 2
> > dimensional paging, ie a guest page table on top of a hypervisor page
> > table.
>
> Isn't that more usually called "two stage" paging? Calling
> that "nesting" seems like it is going to be massively confusing...
Yes. This will be renamed in future revisions as arm-smmuv3-accel.
>
> Also, how does it relate to what this series seems to be
> doing, where we provide the guest with two separate SMMUs?
> (Are those two SMMUs "nested" in the sense that one is sitting
> behind the other?)
I don't think it requires two SMMUs in the Guest. The nested or "two
stage" model means the stage 1 page table is owned by the Guest and stage 2
by the host, and this is achieved through the IOCTLs provided by IOMMUFD.
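Very roughly, that split maps onto the iommufd uAPI along these lines (a
simplified sketch using the names in Linux uapi/linux/iommufd.h, not code from
this series; error handling and the actual device-attach path are omitted):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Sketch only: allocate a host-owned stage-2 "nesting parent" hwpt, then a
 * guest-owned stage-1 hwpt (carrying the guest STE) nested on top of it. */
static int alloc_nested_hwpts(int iommufd, uint32_t dev_id, uint32_t ioas_id,
                              const uint64_t guest_ste[2], uint32_t *out_s1_hwpt)
{
    struct iommu_hwpt_alloc s2 = {
        .size   = sizeof(s2),
        .flags  = IOMMU_HWPT_ALLOC_NEST_PARENT, /* host-owned stage 2 (GPA -> PA) */
        .dev_id = dev_id,
        .pt_id  = ioas_id,
    };
    if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &s2)) {
        return -1;
    }

    struct iommu_hwpt_arm_smmuv3 ste_data = {
        .ste = { guest_ste[0], guest_ste[1] },  /* guest-owned stage-1 config */
    };
    struct iommu_hwpt_alloc s1 = {
        .size      = sizeof(s1),
        .dev_id    = dev_id,
        .pt_id     = s2.out_hwpt_id,            /* nested on top of the stage-2 hwpt */
        .data_type = IOMMU_HWPT_DATA_ARM_SMMUV3,
        .data_len  = sizeof(ste_data),
        .data_uptr = (uintptr_t)&ste_data,
    };
    if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &s1)) {
        return -1;
    }

    *out_s1_hwpt = s1.out_hwpt_id;  /* the device is then attached to this hwpt */
    return 0;
}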
There is a precursor to this series where hw-accelerated two-stage
support is added to the Qemu SMMUv3 code.
Please see the complete branch here,
https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-rfc-v1/
And the patches prior to this commit add that support:
4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
SMMUv3")
Nicolin is soon going to send out those for review. Or I can include
those in this series so that it gives a complete picture. Nicolin?
Hope this clarifies any confusion.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-12-16 10:01 ` Shameerali Kolothum Thodi via
@ 2025-01-09 4:45 ` Nicolin Chen
2025-01-11 4:05 ` Donald Dutile
2025-01-31 16:54 ` Eric Auger
0 siblings, 2 replies; 150+ messages in thread
From: Nicolin Chen @ 2025-01-09 4:45 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, eric.auger@redhat.com,
ddutile@redhat.com
Cc: Peter Maydell, Jason Gunthorpe, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
> And patches prior to this commit adds that support:
> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> SMMUv3")
>
> Nicolin is soon going to send out those for review. Or I can include
> those in this series so that it gives a complete picture. Nicolin?
Just found that I forgot to reply to this one... sorry
I asked Don/Eric to take over that vSMMU series:
https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
(The majority of my effort has been still on the kernel side:
previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
Don/Eric, is there any update from your side?
I think it's also a good time to align with each other so we
can take our next step in the new year :)
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-09 4:45 ` Nicolin Chen
@ 2025-01-11 4:05 ` Donald Dutile
2025-01-23 4:10 ` Nicolin Chen
2025-01-31 16:54 ` Eric Auger
1 sibling, 1 reply; 150+ messages in thread
From: Donald Dutile @ 2025-01-11 4:05 UTC (permalink / raw)
To: Nicolin Chen, Shameerali Kolothum Thodi, eric.auger@redhat.com
Cc: Peter Maydell, Jason Gunthorpe, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
Nicolin,
Hi!
On 1/8/25 11:45 PM, Nicolin Chen wrote:
> On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
>> And patches prior to this commit adds that support:
>> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>> SMMUv3")
>>
>> Nicolin is soon going to send out those for review. Or I can include
>> those in this series so that it gives a complete picture. Nicolin?
>
> Just found that I forgot to reply this one...sorry
>
> I asked Don/Eric to take over that vSMMU series:
> https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> (The majority of my effort has been still on the kernel side:
> previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>
> Don/Eric, is there any update from your side?
>
Apologies for delayed response, been at customer site, and haven't been keeping up w/biz email.
Eric is probably waiting for me to get back and chat as well.
Will look to reply early next week.
- Don
> I think it's also a good time to align with each other so we
> can take our next step in the new year :)
>
> Thanks
> Nicolin
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-11 4:05 ` Donald Dutile
@ 2025-01-23 4:10 ` Nicolin Chen
2025-01-23 8:28 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2025-01-23 4:10 UTC (permalink / raw)
To: Donald Dutile
Cc: Shameerali Kolothum Thodi, eric.auger@redhat.com, Peter Maydell,
Jason Gunthorpe, Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Don,
On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
> On 1/8/25 11:45 PM, Nicolin Chen wrote:
> > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
> > > And patches prior to this commit adds that support:
> > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> > > SMMUv3")
> > >
> > > Nicolin is soon going to send out those for review. Or I can include
> > > those in this series so that it gives a complete picture. Nicolin?
> >
> > Just found that I forgot to reply this one...sorry
> >
> > I asked Don/Eric to take over that vSMMU series:
> > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> > (The majority of my effort has been still on the kernel side:
> > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
> >
> > Don/Eric, is there any update from your side?
> >
> Apologies for delayed response, been at customer site, and haven't been keeping up w/biz email.
> Eric is probably waiting for me to get back and chat as well.
> Will look to reply early next week.
I wonder if we can make some progress in Feb? If so, we can start
to wrap up the iommufd uAPI patches for HWPT, which were part of
Intel's series but never got sent, since their emulated series is
seemingly still pending?
One detail for the uAPI patches is to decide how the vIOMMU code will
interact with those backend APIs. Hopefully, you and Eric
have something in mind :)
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-23 4:10 ` Nicolin Chen
@ 2025-01-23 8:28 ` Shameerali Kolothum Thodi via
2025-01-23 8:40 ` Nicolin Chen
2025-01-23 11:07 ` Duan, Zhenzhong
0 siblings, 2 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-01-23 8:28 UTC (permalink / raw)
To: Nicolin Chen, Donald Dutile
Cc: eric.auger@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org,
zhenzhong.duan@intel.com
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, January 23, 2025 4:10 AM
> To: Donald Dutile <ddutile@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; eric.auger@redhat.com; Peter
> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Don,
>
> On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
> > On 1/8/25 11:45 PM, Nicolin Chen wrote:
> > > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi
> wrote:
> > > > And patches prior to this commit adds that support:
> > > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> > > > SMMUv3")
> > > >
> > > > Nicolin is soon going to send out those for review. Or I can include
> > > > those in this series so that it gives a complete picture. Nicolin?
> > >
> > > Just found that I forgot to reply this one...sorry
> > >
> > > I asked Don/Eric to take over that vSMMU series:
> > > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> > > (The majority of my effort has been still on the kernel side:
> > > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
> > >
> > > Don/Eric, is there any update from your side?
> > >
> > Apologies for delayed response, been at customer site, and haven't been
> keeping up w/biz email.
> > Eric is probably waiting for me to get back and chat as well.
> > Will look to reply early next week.
>
> I wonder if we can make some progress in Feb? If so, we can start
> to wrap up the iommufd uAPI patches for HWPT, which was a part of
> intel's series but never got sent since their emulated series is
> seemingly still pending?
I think these are the 5 patches that we require from the Intel pass-through series,
vfio/iommufd: Implement [at|de]tach_hwpt handlers
vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler
HostIOMMUDevice: Introduce realize_late callback
vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD
backends/iommufd: Add helpers for invalidating user-managed HWPT
See the commits from here,
https://github.com/hisilicon/qemu/commit/bbdc65af38fa5723f1bd9b026e292730901f57b5
[CC Zhenzhong]
Hi Zhenzhong,
Just wondering what your plans are for the above patches. If it makes sense and you
are fine with it, I think it would be a good idea for one of us to pick those up from that
series and send them out separately so that they can get some review and we can take them forward.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-23 8:28 ` Shameerali Kolothum Thodi via
@ 2025-01-23 8:40 ` Nicolin Chen
2025-01-23 11:07 ` Duan, Zhenzhong
1 sibling, 0 replies; 150+ messages in thread
From: Nicolin Chen @ 2025-01-23 8:40 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, zhenzhong.duan@intel.com
Cc: Donald Dutile, eric.auger@redhat.com, Peter Maydell,
Jason Gunthorpe, Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Thu, Jan 23, 2025 at 08:28:34AM +0000, Shameerali Kolothum Thodi wrote:
> > -----Original Message-----
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > I wonder if we can make some progress in Feb? If so, we can start
> > to wrap up the iommufd uAPI patches for HWPT, which was a part of
> > intel's series but never got sent since their emulated series is
> > seemingly still pending?
>
> I think these are the 5 patches that we require from Intel pass-through series,
>
> vfio/iommufd: Implement [at|de]tach_hwpt handlers
> vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler
> HostIOMMUDevice: Introduce realize_late callback
> vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD
> backends/iommufd: Add helpers for invalidating user-managed HWPT
> Hi Zhenzhong,
>
> Just wondering what your plans are for the above patches. If it make sense and you
> are fine with it, I think it is a good idea one of us can pick up those from that series
> and sent out separately so that it can get some review and take it forward.
+1
These uAPI/backend patches can be sent in a smaller series to
get reviewed prior to the intel/arm series. They can then merge with
whichever of the intel/arm series runs faster at the end of
the day :)
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-23 8:28 ` Shameerali Kolothum Thodi via
2025-01-23 8:40 ` Nicolin Chen
@ 2025-01-23 11:07 ` Duan, Zhenzhong
2025-02-17 9:17 ` Duan, Zhenzhong
1 sibling, 1 reply; 150+ messages in thread
From: Duan, Zhenzhong @ 2025-01-23 11:07 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Nicolin Chen, Donald Dutile
Cc: eric.auger@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Shameer,
>-----Original Message-----
>From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested
>SMMUv3
>
>
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Thursday, January 23, 2025 4:10 AM
>> To: Donald Dutile <ddutile@redhat.com>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; eric.auger@redhat.com; Peter
>> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
>> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
>> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> Hi Don,
>>
>> On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
>> > On 1/8/25 11:45 PM, Nicolin Chen wrote:
>> > > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi
>> wrote:
>> > > > And patches prior to this commit adds that support:
>> > > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>> > > > SMMUv3")
>> > > >
>> > > > Nicolin is soon going to send out those for review. Or I can include
>> > > > those in this series so that it gives a complete picture. Nicolin?
>> > >
>> > > Just found that I forgot to reply this one...sorry
>> > >
>> > > I asked Don/Eric to take over that vSMMU series:
>> > > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
>> > > (The majority of my effort has been still on the kernel side:
>> > > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>> > >
>> > > Don/Eric, is there any update from your side?
>> > >
>> > Apologies for delayed response, been at customer site, and haven't been
>> keeping up w/biz email.
>> > Eric is probably waiting for me to get back and chat as well.
>> > Will look to reply early next week.
>>
>> I wonder if we can make some progress in Feb? If so, we can start
>> to wrap up the iommufd uAPI patches for HWPT, which was a part of
>> intel's series but never got sent since their emulated series is
>> seemingly still pending?
>
>I think these are the 5 patches that we require from Intel pass-through series,
>
>vfio/iommufd: Implement [at|de]tach_hwpt handlers
>vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler
>HostIOMMUDevice: Introduce realize_late callback
>vfio/iommufd: Add properties and handlers to
>TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>backends/iommufd: Add helpers for invalidating user-managed HWPT
>
>See the commits from here,
>https://github.com/hisilicon/qemu/commit/bbdc65af38fa5723f1bd9b026e29273
>0901f57b5
>
>[CC Zhenzhong]
>
>Hi Zhenzhong,
>
>Just wondering what your plans are for the above patches. If it make sense and
>you
>are fine with it, I think it is a good idea one of us can pick up those from that
>series
>and sent out separately so that it can get some review and take it forward.
The emulated series is merged. I plan to send the Intel pass-through series after
the Chinese festival vacation, but that will be at least half a month from now. So feel free to
pick the patches you need and send them out for comments.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-23 11:07 ` Duan, Zhenzhong
@ 2025-02-17 9:17 ` Duan, Zhenzhong
2025-02-18 6:52 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Duan, Zhenzhong @ 2025-02-17 9:17 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Nicolin Chen, Donald Dutile
Cc: eric.auger@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, Peng, Chao P
Hi Shameer, Nicolin,
>-----Original Message-----
>From: Duan, Zhenzhong
>Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested
>SMMUv3
>
>Hi Shameer,
>
>>-----Original Message-----
>>From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>nested
>>SMMUv3
>>
>>
>>
>>> -----Original Message-----
>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>> Sent: Thursday, January 23, 2025 4:10 AM
>>> To: Donald Dutile <ddutile@redhat.com>
>>> Cc: Shameerali Kolothum Thodi
>>> <shameerali.kolothum.thodi@huawei.com>; eric.auger@redhat.com; Peter
>>> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
>>> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>>> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
>>> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>>> zhangfei.gao@linaro.org
>>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>>> nested SMMUv3
>>>
>>> Hi Don,
>>>
>>> On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
>>> > On 1/8/25 11:45 PM, Nicolin Chen wrote:
>>> > > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi
>>> wrote:
>>> > > > And patches prior to this commit adds that support:
>>> > > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>>> > > > SMMUv3")
>>> > > >
>>> > > > Nicolin is soon going to send out those for review. Or I can include
>>> > > > those in this series so that it gives a complete picture. Nicolin?
>>> > >
>>> > > Just found that I forgot to reply this one...sorry
>>> > >
>>> > > I asked Don/Eric to take over that vSMMU series:
>>> > > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
>>> > > (The majority of my effort has been still on the kernel side:
>>> > > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>>> > >
>>> > > Don/Eric, is there any update from your side?
>>> > >
>>> > Apologies for delayed response, been at customer site, and haven't been
>>> keeping up w/biz email.
>>> > Eric is probably waiting for me to get back and chat as well.
>>> > Will look to reply early next week.
>>>
>>> I wonder if we can make some progress in Feb? If so, we can start
>>> to wrap up the iommufd uAPI patches for HWPT, which was a part of
>>> intel's series but never got sent since their emulated series is
>>> seemingly still pending?
>>
>>I think these are the 5 patches that we require from Intel pass-through series,
>>
>>vfio/iommufd: Implement [at|de]tach_hwpt handlers
>>vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler
>>HostIOMMUDevice: Introduce realize_late callback
>>vfio/iommufd: Add properties and handlers to
>>TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>backends/iommufd: Add helpers for invalidating user-managed HWPT
>>
>>See the commits from here,
>>https://github.com/hisilicon/qemu/commit/bbdc65af38fa5723f1bd9b026e2927
>3
>>0901f57b5
>>
>>[CC Zhenzhong]
>>
>>Hi Zhenzhong,
>>
>>Just wondering what your plans are for the above patches. If it make sense and
>>you
>>are fine with it, I think it is a good idea one of us can pick up those from that
>>series
>>and sent out separately so that it can get some review and take it forward.
>
>Emulated series is merged, I plan to send Intel pass-through series after
>Chinese festival vacation, but at least half a month later. So feel free to
>pick those patches you need and send for comments.
I plan to send the vtd nesting series out this week and want to ask about the status
of the "1) HWPT uAPI patches in backends/iommufd.c" series.
If you have already sent it out, I will rebase and drop those patches to avoid duplicate
review effort in the community. Or I can send them in the vtd nesting series if you have not yet.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-17 9:17 ` Duan, Zhenzhong
@ 2025-02-18 6:52 ` Shameerali Kolothum Thodi via
2025-03-06 17:59 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-18 6:52 UTC (permalink / raw)
To: Duan, Zhenzhong, Nicolin Chen, Donald Dutile
Cc: eric.auger@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, Peng, Chao P
Hi Zhenzhong,
> -----Original Message-----
> From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
> Sent: Monday, February 17, 2025 9:17 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen
> <nicolinc@nvidia.com>; Donald Dutile <ddutile@redhat.com>
> Cc: eric.auger@redhat.com; Peter Maydell <peter.maydell@linaro.org>;
> Jason Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé
> <berrange@redhat.com>; qemu-arm@nongnu.org; qemu-
> devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Peng, Chao P <chao.p.peng@intel.com>
> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Shameer, Nicolin,
>
[...]
> >>Hi Zhenzhong,
> >>
> >>Just wondering what your plans are for the above patches. If it make
> sense and
> >>you
> >>are fine with it, I think it is a good idea one of us can pick up those from
> that
> >>series
> >>and sent out separately so that it can get some review and take it
> forward.
> >
> >Emulated series is merged, I plan to send Intel pass-through series after
> >Chinese festival vacation, but at least half a month later. So feel free to
> >pick those patches you need and send for comments.
>
> I plan to send vtd nesting series out this week and want to ask about status
> of "1) HWPT uAPI patches in backends/iommufd.c" series.
>
> If you had sent it out, I will do a rebase and bypass them to avoid duplicate
> review effort in community. Or I can send them in vtd nesting series if you
> not yet.
No. It is not sent out yet. Please include it in your vtd nesting series. Thanks.
I am currently working on refactoring the SMMUv3 accel series and the
"Add HW accelerated nesting support for arm SMMUv3" series
from Nicolin.
Thanks,
Shameer.
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-18 6:52 ` Shameerali Kolothum Thodi via
@ 2025-03-06 17:59 ` Eric Auger
2025-03-06 18:27 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2025-03-06 17:59 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Duan, Zhenzhong, Nicolin Chen,
Donald Dutile
Cc: Peter Maydell, Jason Gunthorpe, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Peng, Chao P
Hi Shameer,
On 2/18/25 7:52 AM, Shameerali Kolothum Thodi wrote:
> Hi Zhenzhong,
>
>> -----Original Message-----
>> From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>> Sent: Monday, February 17, 2025 9:17 AM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen
>> <nicolinc@nvidia.com>; Donald Dutile <ddutile@redhat.com>
>> Cc: eric.auger@redhat.com; Peter Maydell <peter.maydell@linaro.org>;
>> Jason Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé
>> <berrange@redhat.com>; qemu-arm@nongnu.org; qemu-
>> devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org; Peng, Chao P <chao.p.peng@intel.com>
>> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> Hi Shameer, Nicolin,
>>
> [...]
>
>>>> Hi Zhenzhong,
>>>>
>>>> Just wondering what your plans are for the above patches. If it make
>> sense and
>>>> you
>>>> are fine with it, I think it is a good idea one of us can pick up those from
>> that
>>>> series
>>>> and sent out separately so that it can get some review and take it
>> forward.
>>> Emulated series is merged, I plan to send Intel pass-through series after
>>> Chinese festival vacation, but at least half a month later. So feel free to
>>> pick those patches you need and send for comments.
>> I plan to send vtd nesting series out this week and want to ask about status
>> of "1) HWPT uAPI patches in backends/iommufd.c" series.
>>
>> If you had sent it out, I will do a rebase and bypass them to avoid duplicate
>> review effort in community. Or I can send them in vtd nesting series if you
>> not yet.
> No. It is not send out yet. Please include it in your vtd nesting series. Thanks.
>
> I am currently working on refactoring the SMMUv3 accel series and the
> "Add HW accelerated nesting support for arm SMMUv3" series
So, will you send "Add HW accelerated nesting support for arm SMMUv3", or
do you want me to do it? Thanks, Eric
> from Nicolin.
>
> Thanks,
> Shameer.
>
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-03-06 17:59 ` Eric Auger
@ 2025-03-06 18:27 ` Shameerali Kolothum Thodi via
2025-03-06 18:40 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-06 18:27 UTC (permalink / raw)
To: eric.auger@redhat.com, Duan, Zhenzhong, Nicolin Chen,
Donald Dutile
Cc: Peter Maydell, Jason Gunthorpe, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Peng, Chao P
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, March 6, 2025 6:00 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Duan, Zhenzhong
> <zhenzhong.duan@intel.com>; Nicolin Chen <nicolinc@nvidia.com>;
> Donald Dutile <ddutile@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>; Jason Gunthorpe
> <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>; qemu-
> arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org; Peng, Chao P
> <chao.p.peng@intel.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Shammeer,
>
Hi Eric,
> >
> > I am currently working on refactoring the SMMUv3 accel series and the
> > "Add HW accelerated nesting support for arm SMMUv3" series
> so will you send "Add HW accelerated nesting support for arm SMMUv3" or
> do you want me to do it? Thanks Eric
Yes. I am on it. Hopefully I will be able to send out everything next week.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-03-06 18:27 ` Shameerali Kolothum Thodi via
@ 2025-03-06 18:40 ` Eric Auger
0 siblings, 0 replies; 150+ messages in thread
From: Eric Auger @ 2025-03-06 18:40 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Duan, Zhenzhong, Nicolin Chen,
Donald Dutile
Cc: Peter Maydell, Jason Gunthorpe, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Peng, Chao P
Hi Shameer,
On 3/6/25 7:27 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Thursday, March 6, 2025 6:00 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; Duan, Zhenzhong
>> <zhenzhong.duan@intel.com>; Nicolin Chen <nicolinc@nvidia.com>;
>> Donald Dutile <ddutile@redhat.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>; Jason Gunthorpe
>> <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>; qemu-
>> arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org; Peng, Chao P
>> <chao.p.peng@intel.com>
>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> Hi Shammeer,
>>
> Hi Eric,
>
>>> I am currently working on refactoring the SMMUv3 accel series and the
>>> "Add HW accelerated nesting support for arm SMMUv3" series
>> so will you send "Add HW accelerated nesting support for arm SMMUv3" or
>> do you want me to do it? Thanks Eric
> Yes. I am on it. Hopefully I will be able to send out everything next week.
Sure. No pressure. I will continue reviewing Zhenzhong's series then.
Looking forward to seeing your respin.
Eric
>
> Thanks,
> Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-09 4:45 ` Nicolin Chen
2025-01-11 4:05 ` Donald Dutile
@ 2025-01-31 16:54 ` Eric Auger
2025-02-03 18:50 ` Nicolin Chen
1 sibling, 1 reply; 150+ messages in thread
From: Eric Auger @ 2025-01-31 16:54 UTC (permalink / raw)
To: Nicolin Chen, Shameerali Kolothum Thodi, ddutile@redhat.com
Cc: Peter Maydell, Jason Gunthorpe, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org
Hi Nicolin,
On 1/9/25 5:45 AM, Nicolin Chen wrote:
> On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
>> And patches prior to this commit adds that support:
>> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>> SMMUv3")
>>
>> Nicolin is soon going to send out those for review. Or I can include
>> those in this series so that it gives a complete picture. Nicolin?
> Just found that I forgot to reply this one...sorry
>
> I asked Don/Eric to take over that vSMMU series:
> https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> (The majority of my effort has been still on the kernel side:
> previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>
> Don/Eric, is there any update from your side?
To be honest we have not progressed much so far. On my end I can
dedicate some cycles now. I am currently trying to understand how and what
subset I can respin and which test setup can be used. I will come back
to you next week.
Eric
>
> I think it's also a good time to align with each other so we
> can take our next step in the new year :)
>
> Thanks
> Nicolin
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 16:54 ` Eric Auger
@ 2025-02-03 18:50 ` Nicolin Chen
2025-02-04 17:49 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2025-02-03 18:50 UTC (permalink / raw)
To: Eric Auger
Cc: Shameerali Kolothum Thodi, ddutile@redhat.com, Peter Maydell,
Jason Gunthorpe, Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Fri, Jan 31, 2025 at 05:54:56PM +0100, Eric Auger wrote:
> On 1/9/25 5:45 AM, Nicolin Chen wrote:
> > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
> >> And patches prior to this commit adds that support:
> >> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> >> SMMUv3")
> >>
> >> Nicolin is soon going to send out those for review. Or I can include
> >> those in this series so that it gives a complete picture. Nicolin?
> > Just found that I forgot to reply this one...sorry
> >
> > I asked Don/Eric to take over that vSMMU series:
> > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> > (The majority of my effort has been still on the kernel side:
> > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
> >
> > Don/Eric, is there any update from your side?
> To be honest we have not much progressed so far. On my end I can
> dedicate some cycles now. I currently try to understand how and what
> subset I can respin and which test setup can be used. I will come back
> to you next week.
In summary, we will have the following series:
1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/
2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over)
4) Shameer's work on "-device" in ARM virt.c
5) vEVENTQ for fault injection (if time is right, squash into 2/3)
Perhaps 3/4 would come in a different order, or maybe 4 could be split
into a few patches changing "-device" (sent before 3) and then a
few other patches adding multi-vSMMU support (sent after 3).
My latest QEMU branch for reference:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v6
It hasn't integrated Shameer's and Nathan's work yet, though.
For testing, use this kernel branch:
https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v6-with-rmr
I think we'd need to build a shared branch by integrating the latest
series in the list above.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-03 18:50 ` Nicolin Chen
@ 2025-02-04 17:49 ` Eric Auger
2025-02-05 0:08 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Eric Auger @ 2025-02-04 17:49 UTC (permalink / raw)
To: Nicolin Chen
Cc: Shameerali Kolothum Thodi, ddutile@redhat.com, Peter Maydell,
Jason Gunthorpe, Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Nicolin, Shameer,
On 2/3/25 7:50 PM, Nicolin Chen wrote:
> On Fri, Jan 31, 2025 at 05:54:56PM +0100, Eric Auger wrote:
>> On 1/9/25 5:45 AM, Nicolin Chen wrote:
>>> On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
>>>> And patches prior to this commit adds that support:
>>>> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>>>> SMMUv3")
>>>>
>>>> Nicolin is soon going to send out those for review. Or I can include
>>>> those in this series so that it gives a complete picture. Nicolin?
>>> Just found that I forgot to reply this one...sorry
>>>
>>> I asked Don/Eric to take over that vSMMU series:
>>> https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
>>> (The majority of my effort has been still on the kernel side:
>>> previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>>>
>>> Don/Eric, is there any update from your side?
>> To be honest we have not much progressed so far. On my end I can
>> dedicate some cycles now. I currently try to understand how and what
>> subset I can respin and which test setup can be used. I will come back
>> to you next week.
> In summary, we will have the following series:
> 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
> https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/
> 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
for 1 and 2, are you talking about the "Add VIOMMU infrastructure support"
series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1?
Sorry, I may instead need to refer to the NVidia or Intel branch, but I am not sure
about the latest ones.
> 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over)
We can start sending it upstream assuming we have a decent test environment.
However in
https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.com/
Shameer suggested he may include it in his SMMU multi instance series.
What do you both prefer?
Eric
> 4) Shameer's work on "-device" in ARM virt.c
> 5) vEVENTQ for fault injection (if time is right, squash into 2/3)
>
> Perhaps, 3/4 would come in a different order, or maybe 4 could split
> into a few patches changing "-device" (sending before 3) and then a
> few other patches adding multi-vSMMU support (sending after 3).
>
> My latest QEMU branch for reference:
> https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v6
> It hasn't integrated Shameer's and Nathan's work though..
> For testing, use this kernel branch:
> https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v6-with-rmr
>
> I think we'd need to build a shared branch by integrating the latest
> series in the list above.
>
> Thanks
> Nicolin
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-04 17:49 ` Eric Auger
@ 2025-02-05 0:08 ` Nicolin Chen
2025-02-05 10:43 ` Shameerali Kolothum Thodi via
` (2 more replies)
0 siblings, 3 replies; 150+ messages in thread
From: Nicolin Chen @ 2025-02-05 0:08 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Eric Auger
Cc: ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote:
> > In summary, we will have the following series:
> > 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
> > https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/
> > 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
> for 1 and 2, are you taking about the "Add VIOMMU infrastructure support
> " series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1.
> Sorry I may instead refer to NVidia or Intel's branch but I am not sure
> about the last ones.
That "vIOMMU infrastructure" is for 2, yes.
For 1, it's inside Intel's series:
"cover-letter: intel_iommu: Enable stage-1 translation for passthrough device"
So, we need to extract those patches out and send them separately.
> > 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over)
> We can start sending it upstream assuming we have a decent test environment.
>
> However in
> https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.com/
>
> Shameer suggested he may include it in his SMMU multi instance series.
> What do you both prefer?
Sure, I think it's good to include those patches, though I believe
we need to build a new shared branch as Shameer's branch might not
reflect the latest kernel uAPI header.
Here is a new branch on top of latest master tree (v9.2.50):
https://github.com/nicolinc/qemu/commits/wip/for_shameer_02042025
I took the HWPT patches from Zhenzhong's series and rebased all related
changes from my tree. I did some sanity testing and it should work with RMR.
Shameer, would you please try this branch and then integrate your
series on top of the following series?
cover-letter: Add HW accelerated nesting support for arm SMMUv3
cover-letter: Add vIOMMU-based nesting infrastructure support
cover-letter: Add HWPT-based nesting infrastructure support
Basically, just replace my old multi-instance series with yours, to
create a shared branch for all of us.
Eric, perhaps you can start to look at these series. Even the
first two iommufd series are still a bit of a rough integration :)
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-05 0:08 ` Nicolin Chen
@ 2025-02-05 10:43 ` Shameerali Kolothum Thodi via
2025-02-05 12:35 ` Eric Auger
2025-02-06 10:34 ` Shameerali Kolothum Thodi via
2 siblings, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-05 10:43 UTC (permalink / raw)
To: Nicolin Chen, Eric Auger
Cc: ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, February 5, 2025 12:09 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Eric Auger
> <eric.auger@redhat.com>
> Cc: ddutile@redhat.com; Peter Maydell <peter.maydell@linaro.org>; Jason
> Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote:
> > > In summary, we will have the following series:
> > > 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
> > > https://lore.kernel.org/qemu-
> devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.nam
> prd11.prod.outlook.com/
> > > 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
>
> > for 1 and 2, are you taking about the "Add VIOMMU infrastructure
> support
> > " series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1.
> > Sorry I may instead refer to NVidia or Intel's branch but I am not sure
> > about the last ones.
>
> That "vIOMMU infrastructure" is for 2, yes.
>
> For 1, it's inside the Intel's series:
> "cover-letter: intel_iommu: Enable stage-1 translation for passthrough
> device"
>
> So, we need to extract them out and make it separately..
>
> > > 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take
> over)
> > We can start sending it upstream assuming we have a decent test
> environment.
> >
> > However in
> >
> https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.c
> om/
> >
> > Shameer suggested he may include it in his SMMU multi instance series.
> > What do you both prefer?
>
> Sure, I think it's good to include those patches, though I believe
> we need to build a new shared branch as Shameer's branch might not
> reflect the latest kernel uAPI header.
>
> Here is a new branch on top of latest master tree (v9.2.50):
> https://github.com/nicolinc/qemu/commits/wip/for_shameer_02042025
>
> I took HWPT patches from Zhenzhong's series and rebased all related
> changes from my tree. I did some sanity and it should work with RMR.
>
> Shameer, would you please try this branch and then integrate your
> series on top of the following series?
> cover-letter: Add HW accelerated nesting support for arm SMMUv3
> cover-letter: Add vIOMMU-based nesting infrastructure support
> cover-letter: Add HWPT-based nesting infrastructure support
> Basically, just replace my old multi-instance series with yours, to
> create a shared branch for all of us.
Ok. I will take a look at that and rebase.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-05 0:08 ` Nicolin Chen
2025-02-05 10:43 ` Shameerali Kolothum Thodi via
@ 2025-02-05 12:35 ` Eric Auger
2025-02-06 10:34 ` Shameerali Kolothum Thodi via
2 siblings, 0 replies; 150+ messages in thread
From: Eric Auger @ 2025-02-05 12:35 UTC (permalink / raw)
To: Nicolin Chen, Shameerali Kolothum Thodi
Cc: ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Nicolin,
On 2/5/25 1:08 AM, Nicolin Chen wrote:
> On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote:
>>> In summary, we will have the following series:
>>> 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
>>> https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/
>>> 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
>> for 1 and 2, are you taking about the "Add VIOMMU infrastructure support
>> " series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1.
>> Sorry I may instead refer to NVidia or Intel's branch but I am not sure
>> about the last ones.
> That "vIOMMU infrastructure" is for 2, yes.
>
> For 1, it's inside the Intel's series:
> "cover-letter: intel_iommu: Enable stage-1 translation for passthrough device"
>
> So, we need to extract them out and make it separately..
OK
>
>>> 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over)
>> We can start sending it upstream assuming we have a decent test environment.
>>
>> However in
>> https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.com/
>>
>> Shameer suggested he may include it in his SMMU multi instance series.
>> What do you both prefer?
> Sure, I think it's good to include those patches, though I believe
> we need to build a new shared branch as Shameer's branch might not
> reflect the latest kernel uAPI header.
>
> Here is a new branch on top of latest master tree (v9.2.50):
> https://github.com/nicolinc/qemu/commits/wip/for_shameer_02042025
>
> I took HWPT patches from Zhenzhong's series and rebased all related
> changes from my tree. I did some sanity and it should work with RMR.
>
> Shameer, would you please try this branch and then integrate your
> series on top of the following series?
> cover-letter: Add HW accelerated nesting support for arm SMMUv3
> cover-letter: Add vIOMMU-based nesting infrastructure support
> cover-letter: Add HWPT-based nesting infrastructure support
> Basically, just replace my old multi-instance series with yours, to
> create a shared branch for all of us.
>
> Eric, perhaps you can start to look at the these series. Even the
> first two iommufd series are a bit of rough integrations :)
OK I am starting this week
Eric
>
> Thanks
> Nicolin
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-05 0:08 ` Nicolin Chen
2025-02-05 10:43 ` Shameerali Kolothum Thodi via
2025-02-05 12:35 ` Eric Auger
@ 2025-02-06 10:34 ` Shameerali Kolothum Thodi via
2025-02-06 18:58 ` Nicolin Chen
2 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-06 10:34 UTC (permalink / raw)
To: Nicolin Chen, Eric Auger
Cc: ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Nicolin,
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, February 5, 2025 12:09 AM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Eric Auger
> <eric.auger@redhat.com>
> Cc: ddutile@redhat.com; Peter Maydell <peter.maydell@linaro.org>; Jason
> Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>;
> qemu-arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote:
> > > In summary, we will have the following series:
> > > 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
> > > https://lore.kernel.org/qemu-
> devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.nam
> prd11.prod.outlook.com/
> > > 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
>
> > for 1 and 2, are you taking about the "Add VIOMMU infrastructure
> support
> > " series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1.
> > Sorry I may instead refer to NVidia or Intel's branch but I am not sure
> > about the last ones.
>
> That "vIOMMU infrastructure" is for 2, yes.
>
> For 1, it's inside the Intel's series:
> "cover-letter: intel_iommu: Enable stage-1 translation for passthrough
> device"
>
> So, we need to extract them out and make it separately..
>
> > > 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take
> over)
> > We can start sending it upstream assuming we have a decent test
> environment.
> >
> > However in
> >
> https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.c
> om/
> >
> > Shameer suggested he may include it in his SMMU multi instance series.
> > What do you both prefer?
>
> Sure, I think it's good to include those patches,
One piece of feedback I received on my series was to rename "arm-smmuv3-nested"
to "arm-smmuv3-accel", possibly rename function names to include "accel" as well,
and move those functions to a separate "smmuv3-accel.c" file. I suppose that applies to
the "Add HW accelerated nesting support for arm SMMUv3" series as well.
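(Just a hypothetical sketch to illustrate the naming/layout, not final code --
SMMUv3AccelState and smmuv3_accel_class_init are made-up names here:)

/* hw/arm/smmuv3-accel.c -- hypothetical sketch for illustration only */
static const TypeInfo smmuv3_accel_type_info = {
    .name          = "arm-smmuv3-accel",        /* was "arm-smmuv3-nested" */
    .parent        = TYPE_ARM_SMMUV3,           /* reuse the emulated SMMUv3 as base */
    .instance_size = sizeof(SMMUv3AccelState),  /* hypothetical per-instance state */
    .class_init    = smmuv3_accel_class_init,   /* hypothetical class init */
};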
Is that fine with you?
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 10:34 ` Shameerali Kolothum Thodi via
@ 2025-02-06 18:58 ` Nicolin Chen
2025-03-03 15:21 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2025-02-06 18:58 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Eric Auger, ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Thu, Feb 06, 2025 at 10:34:15AM +0000, Shameerali Kolothum Thodi wrote:
> > -----Original Message-----
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote:
> > > However in
> > >
> > > Shameer suggested he may include it in his SMMU multi instance series.
> > > What do you both prefer?
> >
> > Sure, I think it's good to include those patches,
>
> One of the feedback I received on my series was to rename "arm-smmuv3-nested"
> to "arm-smmuv3-accel" and possibly rename function names to include "accel' as well
> and move those functions to a separate "smmuv3-accel.c" file. I suppose that applies to
> the " Add HW accelerated nesting support for arm SMMUv3" series as well.
>
> Is that fine with you?
Oh, no problem. If you want to rename the whole thing, please feel
free. I do see the naming conflict between the "nested" stage and
the "nested" HW feature, which are both supported by the vSMMU now.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 18:58 ` Nicolin Chen
@ 2025-03-03 15:21 ` Shameerali Kolothum Thodi via
2025-03-03 17:04 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-03 15:21 UTC (permalink / raw)
To: Nicolin Chen
Cc: Eric Auger, ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Nicolin,
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 6, 2025 6:58 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Eric Auger <eric.auger@redhat.com>; ddutile@redhat.com; Peter
> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
[..]
> > One of the feedback I received on my series was to rename "arm-smmuv3-
> nested"
> > to "arm-smmuv3-accel" and possibly rename function names to include
> "accel' as well
> > and move those functions to a separate "smmuv3-accel.c" file. I suppose
> that applies to
> > the " Add HW accelerated nesting support for arm SMMUv3" series as
> well.
> >
> > Is that fine with you?
>
> Oh, no problem. If you want to rename the whole thing, please feel
> free. I do see the naming conflict between the "nested" stage and
> the "nested" HW feature, which are both supported by the vSMMU now.
I am working on the above now and have a quick question for you 😊.
Looking at the smmu_dev_attach_viommu() fn here[0],
it appears to do the following:
1. Alloc a s2_hwpt if not allocated already and attach it.
2. Allocate abort and bypass hwpt
3. Attach bypass hwpt.
I didn't get why we are doing step 3 here. To me it looks like,
when we attach the s2_hwpt (i.e. the nested parent domain attach),
the kernel will do,
arm_smmu_attach_dev()
arm_smmu_make_s2_domain_ste()
It appears through step 3, we achieve the same thing again.
Or it is possible I missed something obvious here.
Please let me know.
Thanks,
Shameer
[0] https://github.com/nicolinc/qemu/blob/wip/for_shameer_02042025/hw/arm/smmu-common.c#L910C13-L910C35
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-03-03 15:21 ` Shameerali Kolothum Thodi via
@ 2025-03-03 17:04 ` Nicolin Chen
2025-03-04 9:30 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2025-03-03 17:04 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Eric Auger, ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Mon, Mar 03, 2025 at 03:21:57PM +0000, Shameerali Kolothum Thodi wrote:
> I am working on the above now and have quick question to you😊.
>
> Looking at the smmu_dev_attach_viommu() fn here[0],
> it appears to do the following:
>
> 1. Alloc a s2_hwpt if not allocated already and attach it.
> 2. Allocate abort and bypass hwpt
> 3. Attach bypass hwpt.
>
> I didn't get why we are doing the step 3 here. To me it looks like,
> when we attach the s2_hwpt(ie, the nested parent domain attach),
> the kernel will do,
>
> arm_smmu_attach_dev()
> arm_smmu_make_s2_domain_ste()
>
> It appears through step 3, we achieve the same thing again.
>
> Or it is possible I missed something obvious here.
Because a device cannot attach to a vIOMMU object directly, but
only via a proxy hwpt_nested. So, this bypass hwpt gives us the
port to associate the device to the vIOMMU, before a vDEVICE or
a "translate" hwpt_nested is allocated.
Currently it's the same because an S2 parent hwpt holds a VMID,
so we could just attach the device to the S2 hwpt for the same
STE configuration as attaching the device to the proxy bypass
hwpt. Yet, this will change in the future after letting vIOMMU
objects hold their own VMIDs to share a common S2 parent hwpt
that won't have a VMID, i.e. arm_smmu_make_s2_domain_ste() will
need the vIOMMU object to get the VMID for STE.
I should have added a few lines of comments there :)
Thanks
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-03-03 17:04 ` Nicolin Chen
@ 2025-03-04 9:30 ` Shameerali Kolothum Thodi via
0 siblings, 0 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-03-04 9:30 UTC (permalink / raw)
To: Nicolin Chen
Cc: Eric Auger, ddutile@redhat.com, Peter Maydell, Jason Gunthorpe,
Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Monday, March 3, 2025 5:05 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Eric Auger <eric.auger@redhat.com>; ddutile@redhat.com; Peter
> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Mon, Mar 03, 2025 at 03:21:57PM +0000, Shameerali Kolothum Thodi
> wrote:
> > I am working on the above now and have quick question to you😊.
> >
> > Looking at the smmu_dev_attach_viommu() fn here[0],
> > it appears to do the following:
> >
> > 1. Alloc a s2_hwpt if not allocated already and attach it.
> > 2. Allocate abort and bypass hwpt
> > 3. Attach bypass hwpt.
> >
> > I didn't get why we are doing the step 3 here. To me it looks like,
> > when we attach the s2_hwpt(ie, the nested parent domain attach),
> > the kernel will do,
> >
> > arm_smmu_attach_dev()
> > arm_smmu_make_s2_domain_ste()
> >
> > It appears through step 3, we achieve the same thing again.
> >
> > Or it is possible I missed something obvious here.
>
> Because a device cannot attach to a vIOMMU object directly, but
> only via a proxy hwpt_nested. So, this bypass hwpt gives us the
> port to associate the device to the vIOMMU, before a vDEVICE or
> a "translate" hwpt_nested is allocated.
>
> Currently it's the same because an S2 parent hwpt holds a VMID,
> so we could just attach the device to the S2 hwpt for the same
> STE configuration as attaching the device to the proxy bypass
> hwpt. Yet, this will change in the future after letting vIOMMU
> objects hold their own VMIDs to share a common S2 parent hwpt
> that won't have a VMID, i.e. arm_smmu_make_s2_domain_ste() will
> need the vIOMMU object to get the VMID for STE.
>
> I should have added a few lines of comments there :)
Ok. Thanks for the explanation. I will keep it then and add a few comments
to make it clear.
Do you have an initial implementation of the above, with the vIOMMU object
holding the VMIDs to share? Actually I do have a dependency on that for
my KVM pinned VMID series[0], where it was suggested that the VMID
should be associated with a vIOMMU object rather than the IOMMUFD
context I used in there.
And Jason mentioned the work involved to do that here[1]. I'd appreciate it
if you could share whether any progress has been made on that, so that I can
try to rebase that KVM pinned VMID series on top of it and give it a try.
Thanks,
Shameer
[0] https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/
[1] https://lore.kernel.org/linux-arm-kernel/20241129150628.GG1253388@nvidia.com/
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2024-11-08 12:52 [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3 Shameer Kolothum via
` (9 preceding siblings ...)
2024-12-13 12:00 ` Daniel P. Berrangé
@ 2025-01-30 16:00 ` Daniel P. Berrangé
2025-01-30 18:09 ` Shameerali Kolothum Thodi via
10 siblings, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-01-30 16:00 UTC (permalink / raw)
To: Shameer Kolothum
Cc: qemu-arm, qemu-devel, eric.auger, peter.maydell, jgg, nicolinc,
ddutile, linuxarm, wangzhou1, jiangkunkun, jonathan.cameron,
zhangfei.gao
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> How to use it(Eg:):
>
> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
> specify two smmuv3-nested devices each behind a pxb-pcie as below,
>
> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> -net none \
> -nographic
Above you say the host has 2 SMMUv3 devices, and you've created 2 SMMUv3
guest devices to match.
The various emails in this thread & the libvirt thread indicate that each
guest SMMUv3 is associated with a host SMMUv3, but I don't see any
property on the command line for 'arm-smmuv3-nested' that tells it which
host SMMUv3 it is to be associated with.
How does this association work?
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-30 16:00 ` Daniel P. Berrangé
@ 2025-01-30 18:09 ` Shameerali Kolothum Thodi via
2025-01-31 9:33 ` Shameerali Kolothum Thodi via
2025-01-31 21:41 ` Daniel P. Berrangé
0 siblings, 2 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-01-30 18:09 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
Hi Daniel,
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, January 30, 2025 4:00 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > How to use it(Eg:):
> >
> > On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP
> VF
> > devices and HNS VF devices are behind different SMMUv3s. So for a
> Guest,
> > specify two smmuv3-nested devices each behind a pxb-pcie as below,
> >
> > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-
> iommu=on \
> > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> > -object iommufd,id=iommufd0 \
> > -bios QEMU_EFI.fd \
> > -kernel Image \
> > -device virtio-blk-device,drive=fs \
> > -drive if=none,file=rootfs.qcow2,id=fs \
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw
> earlycon=pl011,0x9000000" \
> > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> > -net none \
> > -nographic
>
> Above you say the host has 2 SMMUv3 devices, and you've created 2
> SMMUv3
> guest devices to match.
>
> The various emails in this thread & libvirt thread, indicate that each
> guest SMMUv3 is associated with a host SMMUv3, but I don't see any
> property on the command line for 'arm-ssmv3-nested' that tells it which
> host eSMMUv3 it is to be associated with.
>
> How does this association work ?
You are right. The association is not very obvious in Qemu. The association
and checking are done implicitly by the kernel at the moment. I will try to
explain it here.
Each "arm-smmuv3-nested" instance, when the first device gets attached
to it, will create an S2 HWPT and a corresponding SMMUv3 domain in the kernel
SMMUv3 driver. This domain holds a pointer to the physical
SMMUv3 that the device belongs to, and any other device which belongs to
the same physical SMMUv3 can share this S2 domain.
If a device that belongs to a different physical SMMUv3 gets attached to
the above domain, the HWPT attach will eventually fail, as the physical
SMMUv3 recorded in the domain will not match that of the device:
https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c#L2860
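Paraphrasing the check behind that link as a self-contained sketch (not the
actual kernel source), it boils down to:
#include <errno.h>
/* Each S2 domain remembers the physical SMMU that finalised it; a later
 * attach from a device behind a different physical SMMU is rejected,
 * which is what Qemu then sees as "Invalid argument". */
struct sketch_smmu_device { const void *smmu; };
struct sketch_smmu_domain { const void *smmu; };
static int sketch_attach_check(struct sketch_smmu_domain *dom,
                               struct sketch_smmu_device *dev)
{
    if (!dom->smmu) {
        dom->smmu = dev->smmu;      /* first attach finalises the domain */
    } else if (dom->smmu != dev->smmu) {
        return -EINVAL;
    }
    return 0;
}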
And as I mentioned in the cover letter, Qemu will report,
"
Attempt to add the HNS VF to a different SMMUv3 will result in,
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
At present Qemu is not doing any extra validation other than the above
failure to make sure the user configuration is correct or not. The
assumption is libvirt will take care of this.
"
So in summary, if libvirt gets it wrong, Qemu will fail with an error.
If a more explicit association is required, some help from the kernel is needed
to identify the physical SMMUv3 associated with the device.
Jason/Nicolin, any thoughts on this?
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-30 18:09 ` Shameerali Kolothum Thodi via
@ 2025-01-31 9:33 ` Shameerali Kolothum Thodi via
2025-01-31 10:07 ` Eric Auger
2025-01-31 14:24 ` Jason Gunthorpe
2025-01-31 21:41 ` Daniel P. Berrangé
1 sibling, 2 replies; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-01-31 9:33 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, Nathan Chen
> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: Thursday, January 30, 2025 6:09 PM
> To: 'Daniel P. Berrangé' <berrange@redhat.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Daniel,
>
> > -----Original Message-----
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > Sent: Thursday, January 30, 2025 4:00 PM
> > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> > nested SMMUv3
> >
> > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > How to use it(Eg:):
> > >
> > > On a HiSilicon platform that has multiple physical SMMUv3s, the ACC
> ZIP
> > VF
> > > devices and HNS VF devices are behind different SMMUv3s. So for a
> > Guest,
> > > specify two smmuv3-nested devices each behind a pxb-pcie as below,
> > >
> > > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-
> > iommu=on \
> > > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> > > -object iommufd,id=iommufd0 \
> > > -bios QEMU_EFI.fd \
> > > -kernel Image \
> > > -device virtio-blk-device,drive=fs \
> > > -drive if=none,file=rootfs.qcow2,id=fs \
> > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> > > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> > > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> > > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> > > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> > > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw
> > earlycon=pl011,0x9000000" \
> > > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> > > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> > > -net none \
> > > -nographic
> >
> > Above you say the host has 2 SMMUv3 devices, and you've created 2
> > SMMUv3
> > guest devices to match.
> >
> > The various emails in this thread & libvirt thread, indicate that each
> > guest SMMUv3 is associated with a host SMMUv3, but I don't see any
> > property on the command line for 'arm-ssmv3-nested' that tells it which
> > host eSMMUv3 it is to be associated with.
> >
> > How does this association work ?
>
> You are right. The association is not very obvious in Qemu. The association
> and checking is done implicitly by kernel at the moment. I will try to
> explain
> it here.
>
> Each "arm-smmuv3-nested" instance, when the first device gets attached
> to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel
> SMMUv3 driver. This domain will have a pointer representing the physical
> SMMUv3 that the device belongs. And any other device which belongs to
> the same physical SMMUv3 can share this S2 domain.
>
> If a device that belongs to a different physical SMMUv3 gets attached to
> the above domain, the HWPT attach will eventually fail as the physical
> smmuv3 in the domains will have a mismatch,
> https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm-
> smmu-v3/arm-smmu-v3.c#L2860
>
> And as I mentioned in cover letter, Qemu will report,
>
> "
> Attempt to add the HNS VF to a different SMMUv3 will result in,
>
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
> Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio
> 0000:7d:02.2:
> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38)
> to id=11: Invalid argument
>
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
> "
> So in summary, if the libvirt gets it wrong, Qemu will fail with error.
>
> If a more explicit association is required, some help from kernel is required
> to identify the physical SMMUv3 associated with the device.
Thinking about this again: to have an explicit association in the Qemu command
line between the vSMMUv3 and the phys SMMUv3,
we can possibly add something like,
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device arm-smmuv3-accel,bus=pcie.1,phys-smmuv3=smmu3.0x0000000100000000 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
-device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2,phys-smmuv3=smmu3.0x0000000200000000 \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
etc.
And Qemu does some checking to make sure that the device is indeed associated
with the specified phys-smmuv3. This can be done by going through the sysfs path,
which is what I guess libvirt is currently doing to populate the topology. So basically
Qemu is just replicating that to validate it again.
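For illustration, the sysfs check could be something along these lines (a
sketch only: the helper name is made up and the "iommu" symlink layout is an
assumption on my side, not verified against the sysfs ABI docs):
#include <glib.h>
/* Hypothetical helper: resolve the physical SMMU behind a VFIO device by
 * following its "iommu" symlink in sysfs and compare it against the
 * phys-smmuv3 property, e.g. "smmu3.0x0000000100000000". */
static gboolean dev_matches_phys_smmu(const char *bdf, const char *phys_smmu)
{
    g_autofree char *link = g_strdup_printf("/sys/bus/pci/devices/%s/iommu",
                                            bdf);
    g_autofree char *target = g_file_read_link(link, NULL);
    return target && g_str_has_suffix(target, phys_smmu);
}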
Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys
SMMUv3 base address, which can avoid going through the sysfs.
The only difference between the current approach (the kernel failing the attach implicitly)
and the above is that Qemu can validate the inputs and maybe report a better
error message than just saying "Unable to attach viommu/: Invalid argument".
If the command line looks Ok, I will go with the sysfs path validation method first in my
next respin.
Please let me know.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 9:33 ` Shameerali Kolothum Thodi via
@ 2025-01-31 10:07 ` Eric Auger
2025-01-31 14:24 ` Jason Gunthorpe
1 sibling, 0 replies; 150+ messages in thread
From: Eric Auger @ 2025-01-31 10:07 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, Nathan Chen
Hi Shameer,
On 1/31/25 10:33 AM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Shameerali Kolothum Thodi
>> Sent: Thursday, January 30, 2025 6:09 PM
>> To: 'Daniel P. Berrangé' <berrange@redhat.com>
>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> Hi Daniel,
>>
>>> -----Original Message-----
>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>> Sent: Thursday, January 30, 2025 4:00 PM
>>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>
>>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>>> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>>> nested SMMUv3
>>>
>>> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
>>>> How to use it(Eg:):
>>>>
>>>> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC
>> ZIP
>>> VF
>>>> devices and HNS VF devices are behind different SMMUv3s. So for a
>>> Guest,
>>>> specify two smmuv3-nested devices each behind a pxb-pcie as below,
>>>>
>>>> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-
>>> iommu=on \
>>>> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
>>>> -object iommufd,id=iommufd0 \
>>>> -bios QEMU_EFI.fd \
>>>> -kernel Image \
>>>> -device virtio-blk-device,drive=fs \
>>>> -drive if=none,file=rootfs.qcow2,id=fs \
>>>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
>>>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
>>>> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
>>>> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>>>> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
>>>> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
>>>> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
>>>> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
>>>> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw
>>> earlycon=pl011,0x9000000" \
>>>> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
>>>> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
>>>> -net none \
>>>> -nographic
>>> Above you say the host has 2 SMMUv3 devices, and you've created 2
>>> SMMUv3
>>> guest devices to match.
>>>
>>> The various emails in this thread & libvirt thread, indicate that each
>>> guest SMMUv3 is associated with a host SMMUv3, but I don't see any
>>> property on the command line for 'arm-ssmv3-nested' that tells it which
>>> host eSMMUv3 it is to be associated with.
>>>
>>> How does this association work ?
>> You are right. The association is not very obvious in Qemu. The association
>> and checking is done implicitly by kernel at the moment. I will try to
>> explain
>> it here.
>>
>> Each "arm-smmuv3-nested" instance, when the first device gets attached
>> to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel
>> SMMUv3 driver. This domain will have a pointer representing the physical
>> SMMUv3 that the device belongs. And any other device which belongs to
>> the same physical SMMUv3 can share this S2 domain.
>>
>> If a device that belongs to a different physical SMMUv3 gets attached to
>> the above domain, the HWPT attach will eventually fail as the physical
>> smmuv3 in the domains will have a mismatch,
>> https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm-
>> smmu-v3/arm-smmu-v3.c#L2860
>>
>> And as I mentioned in cover letter, Qemu will report,
>>
>> "
>> Attempt to add the HNS VF to a different SMMUv3 will result in,
>>
>> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
>> Unable to attach viommu
>> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio
>> 0000:7d:02.2:
>> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38)
>> to id=11: Invalid argument
>>
>> At present Qemu is not doing any extra validation other than the above
>> failure to make sure the user configuration is correct or not. The
>> assumption is libvirt will take care of this.
>> "
>> So in summary, if the libvirt gets it wrong, Qemu will fail with error.
>>
>> If a more explicit association is required, some help from kernel is required
>> to identify the physical SMMUv3 associated with the device.
> Again thinking about this, to have an explicit association in the Qemu command
> line between the vSMMUv3 and the phys smmuv3,
>
> We can possibly add something like,
>
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-accel,bus=pcie.1,phys-smmuv3= smmu3.0x0000000100000000 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2, phys-smmuv3= smmu3.0x0000000200000000 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
>
> etc.
>
> And Qemu does some checking to make sure that the device is indeed associated
> with the specified phys-smmuv3. This can be done going through the sysfs path checking
> which is what I guess libvirt is currently doing to populate the topology. So basically
> Qemu is just replicating that to validate again.
>
> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys
> smmuv3 base address which can avoid going through the sysfs.
>
> The only difference between the current approach(kernel failing the attach implicitly)
> and the above is, Qemu can provide a validation of inputs and may be report a better
> error message than just saying " Unable to attach viommu/: Invalid argument".
>
> If the command line looks Ok, I will go with the sysfs path validation method first in my
> next respin.
The command line looks sensible to me. On vfio we use
host=6810000.ethernet. Maybe reuse this instead of phys-smmuv3?
Thanks
Eric
>
> Please let me know.
>
> Thanks,
> Shameer
>
>
>
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 9:33 ` Shameerali Kolothum Thodi via
2025-01-31 10:07 ` Eric Auger
@ 2025-01-31 14:24 ` Jason Gunthorpe
2025-01-31 14:39 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2025-01-31 14:24 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nathan Chen
On Fri, Jan 31, 2025 at 09:33:16AM +0000, Shameerali Kolothum Thodi wrote:
> And Qemu does some checking to make sure that the device is indeed associated
> with the specified phys-smmuv3. This can be done going through the sysfs path checking
> which is what I guess libvirt is currently doing to populate the topology. So basically
> Qemu is just replicating that to validate again.
I would prefer that iommufd users not have to go out to sysfs..
> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys
> smmuv3 base address which can avoid going through the sysfs.
It also doesn't seem great to expose a physical address. But we could
have an 'iommu instance id' that was a unique small integer?
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 14:24 ` Jason Gunthorpe
@ 2025-01-31 14:39 ` Shameerali Kolothum Thodi via
2025-01-31 14:54 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-01-31 14:39 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nathan Chen
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 31, 2025 2:24 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Jan 31, 2025 at 09:33:16AM +0000, Shameerali Kolothum Thodi
> wrote:
>
> > And Qemu does some checking to make sure that the device is indeed
> associated
> > with the specified phys-smmuv3. This can be done going through the
> sysfs path checking
> > which is what I guess libvirt is currently doing to populate the topology.
> So basically
> > Qemu is just replicating that to validate again.
>
> I would prefer that iommufd users not have to go out to sysfs..
>
> > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> return the phys
> > smmuv3 base address which can avoid going through the sysfs.
>
> It also doesn't seem great to expose a physical address. But we could
> have an 'iommu instance id' that was a unique small integer?
Ok. But how can user space map that to the device?
Something like,
/sys/bus/pci/devices/0000:7d:00.1/iommu/instance.X ?
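Or would it come back through IOMMU_GET_HW_INFO instead? Purely for
illustration (no such field exists today), something like:
#include <linux/types.h>
/* Hypothetical, abridged variant of struct iommu_hw_info from
 * <linux/iommufd.h> with a new out_ field; not an existing uAPI. */
struct iommu_hw_info_example {
    __u32 size;
    __u32 flags;
    __u32 dev_id;
    __u32 data_len;
    __aligned_u64 data_uptr;
    __u32 out_data_type;
    __u32 out_iommu_instance_id;    /* hypothetical: unique small integer */
    /* ...remaining out_* fields unchanged... */
};
Then Qemu could simply compare that integer across all the devices attached
to the same vSMMU instance, without touching sysfs.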
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 14:39 ` Shameerali Kolothum Thodi via
@ 2025-01-31 14:54 ` Jason Gunthorpe
2025-01-31 15:23 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2025-01-31 14:54 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nathan Chen
On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi wrote:
> > > And Qemu does some checking to make sure that the device is indeed
> > associated
> > > with the specified phys-smmuv3. This can be done going through the
> > sysfs path checking
> > > which is what I guess libvirt is currently doing to populate the topology.
> > So basically
> > > Qemu is just replicating that to validate again.
> >
> > I would prefer that iommufd users not have to go out to sysfs..
> >
> > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> > return the phys
> > > smmuv3 base address which can avoid going through the sysfs.
> >
> > It also doesn't seem great to expose a physical address. But we could
> > have an 'iommu instance id' that was a unique small integer?
>
> Ok. But how the user space can map that to the device?
Why does it need to?
libvirt picks some label for the vsmmu instance, it doesn't matter
what the string is.
qemu validates that all of the vsmmu instances are only linked to PCI
devices that have the same iommu ID. This is already happening in the
kernel; it will fail attaches to mismatched instances.
Nothing further is needed?
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 14:54 ` Jason Gunthorpe
@ 2025-01-31 15:23 ` Shameerali Kolothum Thodi via
2025-01-31 16:08 ` Eric Auger
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-01-31 15:23 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Nathan Chen
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 31, 2025 2:54 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
> wrote:
>
> > > > And Qemu does some checking to make sure that the device is indeed
> > > associated
> > > > with the specified phys-smmuv3. This can be done going through the
> > > sysfs path checking
> > > > which is what I guess libvirt is currently doing to populate the
> topology.
> > > So basically
> > > > Qemu is just replicating that to validate again.
> > >
> > > I would prefer that iommufd users not have to go out to sysfs..
> > >
> > > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> > > return the phys
> > > > smmuv3 base address which can avoid going through the sysfs.
> > >
> > > It also doesn't seem great to expose a physical address. But we could
> > > have an 'iommu instance id' that was a unique small integer?
> >
> > Ok. But how the user space can map that to the device?
>
> Why does it need to?
>
> libvirt picks some label for the vsmmu instance, it doesn't matter
> what the string is.
>
> qemu validates that all of the vsmmu instances are only linked to PCI
> device that have the same iommu ID. This is already happening in the
> kernel, it will fail attaches to mismatched instances.
>
> Nothing further is needed?
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
-device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device arm-smmuv3-accel,pci-bus=pcie.2,id=smmuv2 \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
I think it works from a functionality point of view. A particular
instance of arm-smmuv3-accel (say id=smmuv1) can only have devices attached
that belong to the same phys SMMUv3 "iommu instance id".
But I am not sure the concerns from a libvirt/Qemu interface point of view[0]
are addressed. Daniel/Nathan?
Thanks,
Shameer
[0] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/X6R52JRBYDFZ5PSJFR534A655UZ3RHKN/
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 15:23 ` Shameerali Kolothum Thodi via
@ 2025-01-31 16:08 ` Eric Auger
2025-02-05 20:53 ` Nathan Chen
2025-02-06 8:53 ` Daniel P. Berrangé
0 siblings, 2 replies; 150+ messages in thread
From: Eric Auger @ 2025-01-31 16:08 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Jason Gunthorpe
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org,
nicolinc@nvidia.com, ddutile@redhat.com, Linuxarm, Wangzhou (B),
jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org,
Nathan Chen
Hi,
On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Friday, January 31, 2025 2:54 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; eric.auger@redhat.com;
>> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
>> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
>> wrote:
>>
>>>>> And Qemu does some checking to make sure that the device is indeed
>>>> associated
>>>>> with the specified phys-smmuv3. This can be done going through the
>>>> sysfs path checking
>>>>> which is what I guess libvirt is currently doing to populate the
>> topology.
>>>> So basically
>>>>> Qemu is just replicating that to validate again.
>>>> I would prefer that iommufd users not have to go out to sysfs..
>>>>
>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
>>>> return the phys
>>>>> smmuv3 base address which can avoid going through the sysfs.
>>>> It also doesn't seem great to expose a physical address. But we could
>>>> have an 'iommu instance id' that was a unique small integer?
>>> Ok. But how the user space can map that to the device?
>> Why does it need to?
>>
>> libvirt picks some label for the vsmmu instance, it doesn't matter
>> what the string is.
>>
>> qemu validates that all of the vsmmu instances are only linked to PCI
>> device that have the same iommu ID. This is already happening in the
>> kernel, it will fail attaches to mismatched instances.
>>
>> Nothing further is needed?
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
I don't get what is the point of adding such an id if it is not
referenced anywhere?
Eric
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-accel,pci-bus=pcie.2,id=smmuv2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
>
> I think it works from a functionality point of view. A particular
> instance of arm-smmuv3-accel(say id=smmuv1) can only have devices attached
> to the same phys smmuv3 "iommu instance id"
>
> But not sure from a libvirt/Qemu interface point of view[0] the concerns
> are addressed. Daniel/Nathan?
>
> Thanks,
> Shameer
> https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/X6R52JRBYDFZ5PSJFR534A655UZ3RHKN/
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 16:08 ` Eric Auger
@ 2025-02-05 20:53 ` Nathan Chen
2025-02-06 8:54 ` Daniel P. Berrangé
2025-02-06 8:53 ` Daniel P. Berrangé
1 sibling, 1 reply; 150+ messages in thread
From: Nathan Chen @ 2025-02-05 20:53 UTC (permalink / raw)
To: eric.auger
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org,
nicolinc@nvidia.com, ddutile@redhat.com, Linuxarm, Wangzhou (B),
jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org,
Shameer Kolothum, Jason Gunthorpe
On 1/31/2025 8:08 AM, Eric Auger wrote:
>>>>>> And Qemu does some checking to make sure that the device is indeed
>>>>> associated
>>>>>> with the specified phys-smmuv3. This can be done going through the
>>>>> sysfs path checking
>>>>>> which is what I guess libvirt is currently doing to populate the
>>> topology.
>>>>> So basically
>>>>>> Qemu is just replicating that to validate again.
>>>>> I would prefer that iommufd users not have to go out to sysfs..
>>>>>
>>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
>>>>> return the phys
>>>>>> smmuv3 base address which can avoid going through the sysfs.
>>>>> It also doesn't seem great to expose a physical address. But we could
>>>>> have an 'iommu instance id' that was a unique small integer?
>>>> Ok. But how the user space can map that to the device?
>>> Why does it need to?
>>>
>>> libvirt picks some label for the vsmmu instance, it doesn't matter
>>> what the string is.
>>>
>>> qemu validates that all of the vsmmu instances are only linked to PCI
>>> device that have the same iommu ID. This is already happening in the
>>> kernel, it will fail attaches to mismatched instances.
>>>
>>> Nothing further is needed?
>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
>> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
> I don't get what is the point of adding such an id if it is not
> referenced anywhere?
>
> Eric
Daniel mentions that the host-to-guest SMMU pairing must be chosen such
that it makes conceptual sense w.r.t. the guest NUMA to host NUMA
pairing [0]. The current implementation allows for incorrect host to
guest numa node pairings, e.g. pSMMU has affinity to host numa node 0,
but it’s paired with a vSMMU whose guest numa node is pinned to
host numa node 1.
By specifying the host SMMU id, we can explicitly pair a host SMMU with
a guest SMMU associated with the correct PXB NUMA node, vs. implying the
host-to-guest SMMU pairing based on what devices are attached to the
PXB. While it would not completely prevent the incorrect pSMMU/vSMMU
pairing w.r.t. host to guest numa node pairings, specifying the pSMMU id
would make the implications of host to guest numa node pairings more
clear when specifying a vSMMU instance.
From the libvirt discussion with Daniel [1], he also states "libvirt's
goal has always been to make everything that's functionally impacting a
guest device be 100% explicit. So I don't think we should be implying
mappings to the host SMMU in QEMU at all, QEMU must be told what to map
to." Specifying the id would be a means of explicitly specifying host to
guest SMMU mapping instead of implying the mapping.
[0] https://lore.kernel.org/qemu-devel/Z51DmtP83741RAsb@redhat.com/
[1]
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7GDT6RX5LPAJMPP4ZSC4ACME6GVMG236/#X6R52JRBYDFZ5PSJFR534A655UZ3RHKN
Thanks,
Nathan
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-05 20:53 ` Nathan Chen
@ 2025-02-06 8:54 ` Daniel P. Berrangé
0 siblings, 0 replies; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-06 8:54 UTC (permalink / raw)
To: Nathan Chen
Cc: eric.auger, qemu-arm@nongnu.org, qemu-devel@nongnu.org,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, Shameer Kolothum, Jason Gunthorpe
On Wed, Feb 05, 2025 at 12:53:42PM -0800, Nathan Chen wrote:
>
>
> On 1/31/2025 8:08 AM, Eric Auger wrote:
> > > > > > > And Qemu does some checking to make sure that the device is indeed
> > > > > > associated
> > > > > > > with the specified phys-smmuv3. This can be done going through the
> > > > > > sysfs path checking
> > > > > > > which is what I guess libvirt is currently doing to populate the
> > > > topology.
> > > > > > So basically
> > > > > > > Qemu is just replicating that to validate again.
> > > > > > I would prefer that iommufd users not have to go out to sysfs..
> > > > > >
> > > > > > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> > > > > > return the phys
> > > > > > > smmuv3 base address which can avoid going through the sysfs.
> > > > > > It also doesn't seem great to expose a physical address. But we could
> > > > > > have an 'iommu instance id' that was a unique small integer?
> > > > > Ok. But how the user space can map that to the device?
> > > > Why does it need to?
> > > >
> > > > libvirt picks some label for the vsmmu instance, it doesn't matter
> > > > what the string is.
> > > >
> > > > qemu validates that all of the vsmmu instances are only linked to PCI
> > > > device that have the same iommu ID. This is already happening in the
> > > > kernel, it will fail attaches to mismatched instances.
> > > >
> > > > Nothing further is needed?
> > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > > -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
> > I don't get what is the point of adding such an id if it is not
> > referenced anywhere?
> >
> > Eric
>
> Daniel mentions that the host-to-guest SMMU pairing must be chosen such that
> it makes conceptual sense w.r.t. the guest NUMA to host NUMA pairing [0].
> The current implementation allows for incorrect host to guest numa node
> pairings, e.g. pSMMU has affinity to host numa node 0, but it’s paired with
> a vSMMU paired with a guest numa node pinned to host numa node 1.
>
> By specifying the host SMMU id, we can explicitly pair a host SMMU with a
> guest SMMU associated with the correct PXB NUMA node, vs. implying the
> host-to-guest SMMU pairing based on what devices are attached to the PXB.
> While it would not completely prevent the incorrect pSMMU/vSMMU pairing
> w.r.t. host to guest numa node pairings, specifying the pSMMU id would make
> the implications of host to guest numa node pairings more clear when
> specifying a vSMMU instance.
You've not specified any host SMMU id in the above CLI args though,
only the PXB association.
It needs something like
-device arm-smmuv3-accel,bus=pcie.1,id=smmuv1,host-smmu=XXXXX
where 'XXXXX' is some value to identify the host SMMU
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 16:08 ` Eric Auger
2025-02-05 20:53 ` Nathan Chen
@ 2025-02-06 8:53 ` Daniel P. Berrangé
2025-02-06 16:44 ` Eric Auger
1 sibling, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-06 8:53 UTC (permalink / raw)
To: Eric Auger
Cc: Shameerali Kolothum Thodi, Jason Gunthorpe, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org,
nicolinc@nvidia.com, ddutile@redhat.com, Linuxarm, Wangzhou (B),
jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org,
Nathan Chen
On Fri, Jan 31, 2025 at 05:08:28PM +0100, Eric Auger wrote:
> Hi,
>
>
> On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Jason Gunthorpe <jgg@nvidia.com>
> >> Sent: Friday, January 31, 2025 2:54 PM
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> >> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org; eric.auger@redhat.com;
> >> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> >> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> >> Jonathan Cameron <jonathan.cameron@huawei.com>;
> >> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
> >> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> >> nested SMMUv3
> >>
> >> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
> >> wrote:
> >>
> >>>>> And Qemu does some checking to make sure that the device is indeed
> >>>> associated
> >>>>> with the specified phys-smmuv3. This can be done going through the
> >>>> sysfs path checking
> >>>>> which is what I guess libvirt is currently doing to populate the
> >> topology.
> >>>> So basically
> >>>>> Qemu is just replicating that to validate again.
> >>>> I would prefer that iommufd users not have to go out to sysfs..
> >>>>
> >>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> >>>> return the phys
> >>>>> smmuv3 base address which can avoid going through the sysfs.
> >>>> It also doesn't seem great to expose a physical address. But we could
> >>>> have an 'iommu instance id' that was a unique small integer?
> >>> Ok. But how the user space can map that to the device?
> >> Why does it need to?
> >>
> >> libvirt picks some label for the vsmmu instance, it doesn't matter
> >> what the string is.
> >>
> >> qemu validates that all of the vsmmu instances are only linked to PCI
> >> device that have the same iommu ID. This is already happening in the
> >> kernel, it will fail attaches to mismatched instances.
> >>
> >> Nothing further is needed?
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
> I don't get what is the point of adding such an id if it is not
> referenced anywhere?
Every QDev device instance has an 'id' property - if you don't
set one explicitly, QEMU will generate one internally. Libvirt
will always set the 'id' property to avoid the internal auto-
generated IDs, as it wants full knowledge of naming.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 8:53 ` Daniel P. Berrangé
@ 2025-02-06 16:44 ` Eric Auger
0 siblings, 0 replies; 150+ messages in thread
From: Eric Auger @ 2025-02-06 16:44 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Shameerali Kolothum Thodi, Jason Gunthorpe, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, peter.maydell@linaro.org,
nicolinc@nvidia.com, ddutile@redhat.com, Linuxarm, Wangzhou (B),
jiangkunkun, Jonathan Cameron, zhangfei.gao@linaro.org,
Nathan Chen
On 2/6/25 9:53 AM, Daniel P. Berrangé wrote:
> On Fri, Jan 31, 2025 at 05:08:28PM +0100, Eric Auger wrote:
>> Hi,
>>
>>
>> On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote:
>>>> -----Original Message-----
>>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>> Sent: Friday, January 31, 2025 2:54 PM
>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>>> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org; eric.auger@redhat.com;
>>>> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
>>>> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>>>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>>>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>>>> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
>>>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>>>> nested SMMUv3
>>>>
>>>> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
>>>> wrote:
>>>>
>>>>>>> And Qemu does some checking to make sure that the device is indeed
>>>>>> associated
>>>>>>> with the specified phys-smmuv3. This can be done going through the
>>>>>> sysfs path checking
>>>>>>> which is what I guess libvirt is currently doing to populate the
>>>> topology.
>>>>>> So basically
>>>>>>> Qemu is just replicating that to validate again.
>>>>>> I would prefer that iommufd users not have to go out to sysfs..
>>>>>>
>>>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
>>>>>> return the phys
>>>>>>> smmuv3 base address which can avoid going through the sysfs.
>>>>>> It also doesn't seem great to expose a physical address. But we could
>>>>>> have an 'iommu instance id' that was a unique small integer?
>>>>> Ok. But how the user space can map that to the device?
>>>> Why does it need to?
>>>>
>>>> libvirt picks some label for the vsmmu instance, it doesn't matter
>>>> what the string is.
>>>>
>>>> qemu validates that all of the vsmmu instances are only linked to PCI
>>>> device that have the same iommu ID. This is already happening in the
>>>> kernel, it will fail attaches to mismatched instances.
>>>>
>>>> Nothing further is needed?
>>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
>>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
>>> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
>> I don't get what is the point of adding such an id if it is not
>> referenced anywhere?
> Every QDev device instance has an 'id' property - if you don't
> set one explicitly, QEMU will generate one internally. Libvirt
> will always set the 'id' property to avoid the internal auto-
> generated IDs, as it wants full knowledge of naming.
OK thank you for the explanation
Eric
>
> With regards,
> Daniel
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-30 18:09 ` Shameerali Kolothum Thodi via
2025-01-31 9:33 ` Shameerali Kolothum Thodi via
@ 2025-01-31 21:41 ` Daniel P. Berrangé
2025-02-06 10:02 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-01-31 21:41 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org
On Thu, Jan 30, 2025 at 06:09:24PM +0000, Shameerali Kolothum Thodi wrote:
>
> Each "arm-smmuv3-nested" instance, when the first device gets attached
> to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel
> SMMUv3 driver. This domain will have a pointer representing the physical
> SMMUv3 that the device belongs. And any other device which belongs to
> the same physical SMMUv3 can share this S2 domain.
Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s,
C and D, we could end up with A&C and B&D paired, or we could
end up with A&D and B&C paired, depending on whether we plug
the first VFIO device into guest SMMUv3 A or B.
This is bad. Behaviour must not vary depending on the order
in which we create devices.
A guest SMMUv3 is paired to a guest PXB. A guest PXB is liable
to be paired to a guest NUMA node. A guest NUMA node is liable
to be paired to a host NUMA node. The guest/host SMMU pairing
must be chosen such that it makes conceptual sense wrt the
guest PXB NUMA to host NUMA pairing.
If the kernel picks guest<->host SMMU pairings on a first-device
first-paired basis, this can end up with incorrect guest NUMA
configurations.
The mgmt app needs to be able to tell QEMU exactly which
host SMMU to pair with each guest SMMU, and QEMU then needs
to tell the kernel.
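As a purely illustrative sketch (the host-smmu property below is made up
for the sake of example; it is not an existing QEMU option), such an
explicit pairing could look something like:
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1,host-smmu=<host SMMU handle A>
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=<host SMMU handle B>
so that the guest/host relationship is stated up front rather than being
inferred from whichever VFIO device happens to be realized first.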
> And as I mentioned in cover letter, Qemu will report,
>
> "
> Attempt to add the HNS VF to a different SMMUv3 will result in,
>
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
>
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
> "
> So in summary, if the libvirt gets it wrong, Qemu will fail with error.
That's good error checking, and required, but also insufficient
as illustrated above IMHO.
> If a more explicit association is required, some help from kernel is required
> to identify the physical SMMUv3 associated with the device.
Yep, I think SMMUv3 info for devices needs to be exposed to userspace,
as well as a mechanism for QEMU to tell the kernel the SMMU mapping.
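(For reference, the device <-> physical SMMU association is already
visible in sysfs today -- the listing below is illustrative only, with
made-up instance names:
$ ls /sys/class/iommu/
smmu3.0x0000000012345678 smmu3.0x0000000014000000
$ ls /sys/class/iommu/smmu3.0x0000000012345678/devices/
0000:7d:02.1 0000:7d:02.2
though, as noted earlier in the thread, it would be preferable not to
force iommufd users to go through sysfs for this.)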
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-01-31 21:41 ` Daniel P. Berrangé
@ 2025-02-06 10:02 ` Shameerali Kolothum Thodi via
2025-02-06 10:37 ` Daniel P. Berrangé
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-06 10:02 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, nathanc@nvidia.com
Hi Daniel,
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Friday, January 31, 2025 9:42 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Jan 30, 2025 at 06:09:24PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> > Each "arm-smmuv3-nested" instance, when the first device gets attached
> > to it, will create a S2 HWPT and a corresponding SMMUv3 domain in
> kernel
> > SMMUv3 driver. This domain will have a pointer representing the physical
> > SMMUv3 that the device belongs. And any other device which belongs to
> > the same physical SMMUv3 can share this S2 domain.
>
> Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s,
> C and D, we could end up with A&C and B&D paired, or we could
> end up with A&D and B&C paired, depending on whether we plug
> the first VFIO device into guest SMMUv3 A or B.
>
> This is bad. Behaviour must not vary depending on the order
> in which we create devices.
>
> An guest SMMUv3 is paired to a guest PXB. A guest PXB is liable
> to be paired to a guest NUMA node. A guest NUMA node is liable
> to be paired to host NUMA node. The guest/host SMMU pairing
> must be chosen such that it makes conceptual sense wrt to the
> guest PXB NUMA to host NUMA pairing.
>
> If the kernel picks guest<->host SMMU pairings on a first-device
> first-paired basis, this can end up with incorrect guest NUMA
> configurations.
Ok. I am trying to understand how this can happen, as I assume the
Guest PXB numa node is picked based on whatever device we are
attaching to it, i.e. based on which numa_id that device belongs to
on the physical host.
And the physical smmuv3 numa id will be the same as that of the
device it is associated with. Isn't it?
For example I have a system here, that has 8 phys SMMUv3s and numa
assignments on this is something like below,
Phys SMMUv3.0 --> node 0
\..dev1 --> node0
Phys SMMUv3.1 --> node 0
\..dev2 -->node0
Phys SMMUv3.2 --> node 0
Phys SMMUv3.3 --> node 0
Phys SMMUv3.4 --> node 1
Phys SMMUv3.5 --> node 1
\..dev5 --> node1
Phys SMMUv3.6 --> node 1
Phys SMMUv3.7 --> node 1
If I have to assign say dev 1, 2 and 5 to a Guest, we need to specify 3
"arm-smmuv3-accel" instances as they belong to different phys SMMUv3s.
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0 \
-device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=0 \
-device pxb-pcie,id=pcie.3,bus_nr=3,bus=pcie.0,numa_id=1 \
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2 \
-device arm-smmuv3-accel,id=smmuv3,bus=pcie.3 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device pcie-root-port,id=pcie.port2,bus=pcie.3,chassis=2 \
-device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3 \
-device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0 \
-device vfio-pci,host=0000: dev2,bus=pcie.port2,iommufd=iommufd0 \
-device vfio-pci,host=0000: dev5,bus=pcie.port3,iommufd=iommufd0
So I guess even if we don't specify the physical SMMUv3 association
explicitly, the kernel will check that based on the devices the Guest
SMMUv3 is attached to (and hence the Numa association), right?
In other words, how does an explicit association help us here?
Or is it that the Guest PXB numa_id allocation is not always based
on device numa_id?
(May be I am missing something here. Sorry)
Thanks,
Shameer
> The mgmt apps needs to be able to tell QEMU exactly which
> host SMMU to pair with each guest SMMU, and QEMU needs to
> then tell the kernel.
>
> > And as I mentioned in cover letter, Qemu will report,
> >
> > "
> > Attempt to add the HNS VF to a different SMMUv3 will result in,
> >
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
> Unable to attach viommu
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0:
> vfio 0000:7d:02.2:
> > Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2
> (38) to id=11: Invalid argument
> >
> > At present Qemu is not doing any extra validation other than the above
> > failure to make sure the user configuration is correct or not. The
> > assumption is libvirt will take care of this.
> > "
> > So in summary, if the libvirt gets it wrong, Qemu will fail with error.
>
> That's good error checking, and required, but also insufficient
> as illustrated above IMHO.
>
> > If a more explicit association is required, some help from kernel is
> required
> > to identify the physical SMMUv3 associated with the device.
>
> Yep, I think SMMUv3 info for devices needs to be exposed to userspace,
> as well as a mechanism for QEMU to tell the kernel the SMMU mapping.
>
>
> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange
> :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o-
> https://www.instagram.com/dberrange :|
>
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 10:02 ` Shameerali Kolothum Thodi via
@ 2025-02-06 10:37 ` Daniel P. Berrangé
2025-02-06 13:51 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-06 10:37 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 10:02:25AM +0000, Shameerali Kolothum Thodi wrote:
> Hi Daniel,
>
> > -----Original Message-----
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > Sent: Friday, January 31, 2025 9:42 PM
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> > nested SMMUv3
> >
> > On Thu, Jan 30, 2025 at 06:09:24PM +0000, Shameerali Kolothum Thodi
> > wrote:
> > >
> > > Each "arm-smmuv3-nested" instance, when the first device gets attached
> > > to it, will create a S2 HWPT and a corresponding SMMUv3 domain in
> > kernel
> > > SMMUv3 driver. This domain will have a pointer representing the physical
> > > SMMUv3 that the device belongs. And any other device which belongs to
> > > the same physical SMMUv3 can share this S2 domain.
> >
> > Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s,
> > C and D, we could end up with A&C and B&D paired, or we could
> > end up with A&D and B&C paired, depending on whether we plug
> > the first VFIO device into guest SMMUv3 A or B.
> >
> > This is bad. Behaviour must not vary depending on the order
> > in which we create devices.
> >
> > An guest SMMUv3 is paired to a guest PXB. A guest PXB is liable
> > to be paired to a guest NUMA node. A guest NUMA node is liable
> > to be paired to host NUMA node. The guest/host SMMU pairing
> > must be chosen such that it makes conceptual sense wrt to the
> > guest PXB NUMA to host NUMA pairing.
> >
> > If the kernel picks guest<->host SMMU pairings on a first-device
> > first-paired basis, this can end up with incorrect guest NUMA
> > configurations.
>
> Ok. I am trying to understand how this can happen as I assume the
> Guest PXB numa node is picked up by whatever device we are
> attaching to it and based on which numa_id that device belongs to
> in physical host.
>
> And the physical smmuv3 numa id will be the same to that of the
> device numa_id it is associated with. Isn't it?
>
> For example I have a system here, that has 8 phys SMMUv3s and numa
> assignments on this is something like below,
>
> Phys SMMUv3.0 --> node 0
> \..dev1 --> node0
> Phys SMMUv3.1 --> node 0
> \..dev2 -->node0
> Phys SMMUv3.2 --> node 0
> Phys SMMUv3.3 --> node 0
>
> Phys SMMUv3.4 --> node 1
> Phys SMMUv3.5 --> node 1
> \..dev5 --> node1
> Phys SMMUv3.6 --> node 1
> Phys SMMUv3.7 --> node 1
>
>
> If I have to assign say dev 1, 2 and 5 to a Guest, we need to specify 3
> "arm-smmuv3-accel" instances as they belong to different phys SMMUv3s.
>
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0 \
> -device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=0 \
> -device pxb-pcie,id=pcie.3,bus_nr=3,bus=pcie.0,numa_id=1 \
> -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2 \
> -device arm-smmuv3-accel,id=smmuv3,bus=pcie.3 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.3,chassis=2 \
> -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3 \
> -device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0 \
> -device vfio-pci,host=0000: dev2,bus=pcie.port2,iommufd=iommufd0 \
> -device vfio-pci,host=0000: dev5,bus=pcie.port3,iommufd=iommufd0
>
> So I guess even if we don't specify the physical SMMUv3 association
> explicitly, the kernel will check that based on the devices the Guest
> SMMUv3 is attached to (and hence the Numa association), right?
It isn't about checking the devices, it is about the guest SMMU
getting differing host SMMU associations.
> In other words how an explicit association helps us here?
>
> Or is it that the Guest PXB numa_id allocation is not always based
> on device numa_id?
Lets simplify to 2 SMMUs for shorter CLIs.
So to start with we assume physical host with two SMMUs, and
two PCI devices we want to assign
0000:dev1 - associated with host SMMU 1, and host NUMA node 0
0000:dev2 - associated with host SMMU 2, and host NUMA node 1
So now we configure QEMU like this:
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
-device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
-device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0
-device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0
For brevity I'm not going to show the config for host/guest NUMA mappings,
but assume that guest NUMA node 0 has been configured to map to host NUMA
node 0 and guest node 1 to host node 1.
In this order of QEMU CLI args we get
VFIO device 0000:dev1 causes the kernel to associate guest smmuv1 with
host SMMU 1.
VFIO device 0000:dev2 causes the kernel to associate guest smmuv2 with
host SMMU 2.
Now consider we swap the ordering of the VFIO Devices on the QEMU cli
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
-device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
-device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0
-device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0
In this order of QEMU CLI args we get
VFIO device 0000:dev2 causes the kernel to associate guest smmuv1 with
host SMMU 2.
VFIO device 0000:dev1 causes the kernel to associate guest smmuv2 with
host SMMU 1.
This is broken, as now we have inconsistent NUMA mappings between host
and guest. 0000:dev2 is associated with a PXB on NUMA node 1, but
associated with a guest SMMU that was paired with a PXB on NUMA node 0.
This is because the kernel is doing first-come first-matched logic for
mapping guest and host SMMUs, and thus is sensitive to ordering of the
VFIO devices on the CLI. We need to be ordering invariant, which means
libvirt must tell QEMU which host + guest SMMUs to pair together, and
QEMU must in turn tell the kernel.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 10:37 ` Daniel P. Berrangé
@ 2025-02-06 13:51 ` Shameerali Kolothum Thodi via
2025-02-06 14:46 ` Daniel P. Berrangé
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-06 13:51 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, nathanc@nvidia.com
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, February 6, 2025 10:37 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 10:02:25AM +0000, Shameerali Kolothum Thodi
> wrote:
> > Hi Daniel,
> >
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > Sent: Friday, January 31, 2025 9:42 PM
> > > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> > > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-
> creatable
> > > nested SMMUv3
> > >
> > > On Thu, Jan 30, 2025 at 06:09:24PM +0000, Shameerali Kolothum Thodi
> > > wrote:
> > > >
> > > > Each "arm-smmuv3-nested" instance, when the first device gets
> attached
> > > > to it, will create a S2 HWPT and a corresponding SMMUv3 domain in
> > > kernel
> > > > SMMUv3 driver. This domain will have a pointer representing the
> physical
> > > > SMMUv3 that the device belongs. And any other device which belongs
> to
> > > > the same physical SMMUv3 can share this S2 domain.
> > >
> > > Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s,
> > > C and D, we could end up with A&C and B&D paired, or we could
> > > end up with A&D and B&C paired, depending on whether we plug
> > > the first VFIO device into guest SMMUv3 A or B.
> > >
> > > This is bad. Behaviour must not vary depending on the order
> > > in which we create devices.
> > >
> > > An guest SMMUv3 is paired to a guest PXB. A guest PXB is liable
> > > to be paired to a guest NUMA node. A guest NUMA node is liable
> > > to be paired to host NUMA node. The guest/host SMMU pairing
> > > must be chosen such that it makes conceptual sense wrt to the
> > > guest PXB NUMA to host NUMA pairing.
> > >
> > > If the kernel picks guest<->host SMMU pairings on a first-device
> > > first-paired basis, this can end up with incorrect guest NUMA
> > > configurations.
> >
> > Ok. I am trying to understand how this can happen as I assume the
> > Guest PXB numa node is picked up by whatever device we are
> > attaching to it and based on which numa_id that device belongs to
> > in physical host.
> >
> > And the physical smmuv3 numa id will be the same to that of the
> > device numa_id it is associated with. Isn't it?
> >
> > For example I have a system here, that has 8 phys SMMUv3s and numa
> > assignments on this is something like below,
> >
> > Phys SMMUv3.0 --> node 0
> > \..dev1 --> node0
> > Phys SMMUv3.1 --> node 0
> > \..dev2 -->node0
> > Phys SMMUv3.2 --> node 0
> > Phys SMMUv3.3 --> node 0
> >
> > Phys SMMUv3.4 --> node 1
> > Phys SMMUv3.5 --> node 1
> > \..dev5 --> node1
> > Phys SMMUv3.6 --> node 1
> > Phys SMMUv3.7 --> node 1
> >
> >
> > If I have to assign say dev 1, 2 and 5 to a Guest, we need to specify 3
> > "arm-smmuv3-accel" instances as they belong to different phys
> SMMUv3s.
> >
> > -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0 \
> > -device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=0 \
> > -device pxb-pcie,id=pcie.3,bus_nr=3,bus=pcie.0,numa_id=1 \
> > -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2 \
> > -device arm-smmuv3-accel,id=smmuv3,bus=pcie.3 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device pcie-root-port,id=pcie.port2,bus=pcie.3,chassis=2 \
> > -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3 \
> > -device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0 \
> > -device vfio-pci,host=0000: dev2,bus=pcie.port2,iommufd=iommufd0 \
> > -device vfio-pci,host=0000: dev5,bus=pcie.port3,iommufd=iommufd0
> >
> > So I guess even if we don't specify the physical SMMUv3 association
> > explicitly, the kernel will check that based on the devices the Guest
> > SMMUv3 is attached to (and hence the Numa association), right?
>
> It isn't about checking the devices, it is about the guest SMMU
> getting differing host SMMU associations.
>
> > In other words how an explicit association helps us here?
> >
> > Or is it that the Guest PXB numa_id allocation is not always based
> > on device numa_id?
>
> Lets simplify to 2 SMMUs for shorter CLIs.
>
> So to start with we assume physical host with two SMMUs, and
> two PCI devices we want to assign
>
> 0000:dev1 - associated with host SMMU 1, and host NUMA node 0
> 0000:dev2 - associated with host SMMU 2, and host NUMA node 1
>
> So now we configure QEMU like this:
>
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
> -device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
> -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
> -device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0
> -device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0
>
> For brevity I'm not going to show the config for host/guest NUMA
> mappings,
> but assume that guest NUMA node 0 has been configured to map to host
> NUMA
> node 0 and guest node 1 to host node 1.
>
> In this order of QEMU CLI args we get
>
> VFIO device 0000:dev1 causes the kernel to associate guest smmuv1 with
> host SSMU 1.
>
> VFIO device 0000:dev2 causes the kernel to associate guest smmuv2 with
> host SSMU 2.
>
> Now consider we swap the ordering of the VFIO Devices on the QEMU cli
>
>
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
> -device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
> -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
> -device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0
> -device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0
>
> In this order of QEMU CLI args we get
>
> VFIO device 0000:dev2 causes the kernel to associate guest smmuv1 with
> host SSMU 2.
>
> VFIO device 0000:dev1 causes the kernel to associate guest smmuv2 with
> host SSMU 1.
>
> This is broken, as now we have inconsistent NUMA mappings between host
> and guest. 0000:dev2 is associated with a PXB on NUMA node 1, but
> associated with a guest SMMU that was paired with a PXB on NUMA node
> 0.
Hmm.. I don't think just swapping the order will change the association with
the Guest SMMU here. Because we have,
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
During smmuv3-accel realize time, this will result in,
pci_setup_iommu(primary_bus, ops, smmu_state);
And when the vfio dev realization happens,
set_iommu_device()
smmu_dev_set_iommu_device(bus, smmu_state, ,)
--> this is where the guest smmuv3 --> host smmuv3 association is first
established. And attaching any further vfio dev to this Guest SMMU will
only succeed if it belongs to the same phys SMMU.
ie, the Guest SMMU to pci bus association actually makes sure you have the
same Guest SMMU for the device.
smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
0000:dev2 --> pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)
Hence the association of 0000:dev2 to Guest SMMUv2 remains the same.
I hope this is clear. And I am not sure the association will be broken in any
other way unless the Qemu CLI assigns the dev to a different PXB.
Maybe one of my earlier replies caused the confusion that the
ordering of the VFIO devices on the QEMU cli will affect the association.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 13:51 ` Shameerali Kolothum Thodi via
@ 2025-02-06 14:46 ` Daniel P. Berrangé
2025-02-06 15:07 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-06 14:46 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 01:51:15PM +0000, Shameerali Kolothum Thodi wrote:
> Hmm..I don’t think just swapping the order will change the association with
> Guest SMMU here. Because, we have,
>
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
>
> During smmuv3-accel realize time, this will result in,
> pci_setup_iommu(primary_bus, ops, smmu_state);
>
> And when the vfio dev realization happens,
> set_iommu_device()
> smmu_dev_set_iommu_device(bus, smmu_state, ,)
> --> this is where the guest smmuv3-->host smmuv3 association is first
> established. And any further vfio dev to this Guest SMMU will
> only succeeds if it belongs to the same phys SMMU.
>
> ie, the Guest SMMU to pci bus association, actually make sure you have the
> same Guest SMMU for the device.
Ok, so at the time of VFIO device realize, QEMU is telling the kernel
to associate a physical SMMU, and it is doing this with the virtual
SMMU attached to the PXB parenting the VFIO device.
> smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
> 0000:dev2 --> pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)
>
> Hence the association of 0000:dev2 to Guest SMMUv2 remain same.
Yes, I concur the SMMU physical <-> virtual association should
be fixed, as long as the same VFIO device is always added to
the same virtual SMMU.
> I hope this is clear. And I am not sure the association will be broken in any
> other way unless Qemu CLI specify the dev to a different PXB.
Although the ordering is at least predictable, I remain uncomfortable
about the idea of the virtual SMMU association with the physical SMMU
being a side effect of the VFIO device placement.
There is still the open door for admin mis-configuration that will not
be diagnosed. eg consider we attached VFIO device 1 from the host NUMA
node 1 to a PXB associated with host NUMA node 0. As long as that's
the first VFIO device, the kernel will happily associate the physical
and guest SMMUs.
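(In terms of the earlier example, that would be something like the
following -- reusing the placeholder dev names from above:
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
-device vfio-pci,host=0000:dev2,bus=pcie.port1,iommufd=iommufd0
i.e. dev2, which sits behind host SMMU 2 on host NUMA node 1, ends up
behind a guest SMMU/PXB that the guest believes belongs to node 0, and
nothing flags that as a mistake.)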
If we set the physical/guest SMMU relationship directly, then at the
time the VFIO device is plugged, we can diagnose the incorrectly
placed VFIO device, and better reason about behaviour.
I've another question about unplug behaviour..
1. Plug a VFIO device for host SMMU 1 into a PXB with guest SMMU 1.
=> Kernel associates host SMMU 1 and guest SMMU 1 together
2. Unplug this VFIO device
3. Plug a VFIO device for host SMMU 2 into a PXB with guest SMMU 1.
Does the host/guest SMMU 1<-> 1 association remain set after step 2,
implying step 3 will fail ? Or does it get unset, allowing step 3
to succeed, and establish a new mapping host SMMU 2 to guest SMMU 1.
If step 2 does NOT break the association, do we preserve that
across a savevm+loadvm sequence of QEMU. If we don't, then step
3 would fail before the savevm, but succeed after the loadvm.
Explicitly representing the host SMMU association on the guest SMMU
config makes this behaviour unambiguous. The host / guest SMMU
relationship is fixed for the lifetime of the VM and invariant of
whatever VFIO device is (or was previously) plugged.
So I still go back to my general principle that automatic side effects
are an undesirable idea in QEMU configuration. We have a long tradition
of making everything entirely explicit to produce easily predictable
behaviour.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 14:46 ` Daniel P. Berrangé
@ 2025-02-06 15:07 ` Shameerali Kolothum Thodi via
2025-02-06 17:02 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-06 15:07 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, jgg@nvidia.com, nicolinc@nvidia.com,
ddutile@redhat.com, Linuxarm, Wangzhou (B), jiangkunkun,
Jonathan Cameron, zhangfei.gao@linaro.org, nathanc@nvidia.com
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, February 6, 2025 2:47 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 01:51:15PM +0000, Shameerali Kolothum Thodi
> wrote:
> > Hmm..I don’t think just swapping the order will change the association
> with
> > Guest SMMU here. Because, we have,
> >
> > > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> >
> > During smmuv3-accel realize time, this will result in,
> > pci_setup_iommu(primary_bus, ops, smmu_state);
> >
> > And when the vfio dev realization happens,
> > set_iommu_device()
> > smmu_dev_set_iommu_device(bus, smmu_state, ,)
> > --> this is where the guest smmuv3-->host smmuv3 association is first
> > established. And any further vfio dev to this Guest SMMU will
> > only succeeds if it belongs to the same phys SMMU.
> >
> > ie, the Guest SMMU to pci bus association, actually make sure you have
> the
> > same Guest SMMU for the device.
>
> Ok, so at time of VFIO device realize, QEMU is telling the kernel
> to associate a physical SMMU, and its doing this with the virtual
> SMMU attached to PXB parenting the VFIO device.
>
> > smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
> > 0000:dev2 --> pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)
> >
> > Hence the association of 0000:dev2 to Guest SMMUv2 remain same.
>
> Yes, I concur the SMMU physical <-> virtual association should
> be fixed, as long as the same VFIO device is always added to
> the same virtual SMMU.
>
> > I hope this is clear. And I am not sure the association will be broken in
> any
> > other way unless Qemu CLI specify the dev to a different PXB.
>
> Although the ordering is at least predictable, I remain uncomfortable
> about the idea of the virtual SMMU association with the physical SMMU
> being a side effect of the VFIO device placement.
>
> There is still the open door for admin mis-configuration that will not
> be diagnosed. eg consider we attached VFIO device 1 from the host NUMA
> node 1 to a PXB associated with host NUMA node 0. As long as that's
> the first VFIO device, the kernel will happily associate the physical
> and guest SMMUs.
Yes. A mis-configuration can place it on a wrong one.
> If we set the physical/guest SMMU relationship directly, then at the
> time the VFIO device is plugged, we can diagnose the incorrectly
> placed VFIO device, and better reason about behaviour.
Agree.
> I've another question about unplug behaviour..
>
> 1. Plug a VFIO device for host SMMU 1 into a PXB with guest SMMU 1.
> => Kernel associates host SMMU 1 and guest SMMU 1 together
> 2. Unplug this VFIO device
> 3. Plug a VFIO device for host SMMU 2 into a PXB with guest SMMU 1.
>
> Does the host/guest SMMU 1<-> 1 association remain set after step 2,
> implying step 3 will fail ? Or does it get unset, allowing step 3
> to succeed, and establish a new mapping host SMMU 2 to guest SMMU 1.
At the moment the first association is not persistent. So a new mapping
is possible.
> If step 2 does NOT break the association, do we preserve that
> across a savevm+loadvm sequence of QEMU. If we don't, then step
> 3 would fail before the savevm, but succeed after the loadvm.
Right. I haven't attempted migration tests yet. But I agree that an
explicit association is better for migration compatibility. Also,
I am not sure how we would handle the case where the target has a
different phys SMMUv3 <--> dev mapping.
> Explicitly representing the host SMMU association on the guest SMMU
> config makes this behaviour unambiguous. The host / guest SMMU
> relationship is fixed for the lifetime of the VM and invariant of
> whatever VFIO device is (or was previously) plugged.
>
> So I still go back to my general principle that automatic side effects
> are an undesirable idea in QEMU configuration. We have a long tradition
> of making everything entirely explicit to produce easily predictable
> behaviour.
Ok. Convinced 😊. Thanks for explaining.
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 15:07 ` Shameerali Kolothum Thodi via
@ 2025-02-06 17:02 ` Jason Gunthorpe
2025-02-06 17:10 ` Daniel P. Berrangé
0 siblings, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 17:02 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > If we set the physical/guest SMMU relationship directly, then at the
> > time the VFIO device is plugged, we can diagnose the incorrectly
> > placed VFIO device, and better reason about behaviour.
>
> Agree.
Can you just take in a VFIO cdev FD reference on this command line:
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
And that will lock the pSMMU/vSMMU relationship?
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:02 ` Jason Gunthorpe
@ 2025-02-06 17:10 ` Daniel P. Berrangé
2025-02-06 17:46 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-06 17:10 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > > If we set the physical/guest SMMU relationship directly, then at the
> > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > placed VFIO device, and better reason about behaviour.
> >
> > Agree.
>
> Can you just take in a VFIO cdev FD reference on this command line:
>
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
>
> And that will lock the pSMMU/vSMMU relationship?
We shouldn't assume any VFIO device exists in the QEMU config at the time
we realize the virtual smmu. I expect the SMMU may be cold plugged, while
the VFIO devices may be hot plugged arbitrarily later, and we should have
the association initialized when the SMMU is realized.
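(For context, such a later hot plug would typically arrive via the
monitor, e.g. -- sketch only, assuming the iommufd object and root port
were defined up front on the command line:
(qemu) device_add vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0,id=hostdev0
so the virtual SMMU has to be fully usable before any such device shows up.)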
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:10 ` Daniel P. Berrangé
@ 2025-02-06 17:46 ` Jason Gunthorpe
2025-02-06 17:54 ` Daniel P. Berrangé
2025-02-06 17:57 ` Shameerali Kolothum Thodi via
0 siblings, 2 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 17:46 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > > > If we set the physical/guest SMMU relationship directly, then at the
> > > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > > placed VFIO device, and better reason about behaviour.
> > >
> > > Agree.
> >
> > Can you just take in a VFIO cdev FD reference on this command line:
> >
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> >
> > And that will lock the pSMMU/vSMMU relationship?
>
> We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> the VFIO devices may be hot plugged arbitrarly later, and we should have
> the association initialized the SMMU is realized.
This is not supported kernel side, you can't instantiate a vIOMMU
without a VFIO device that uses it. For security.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:46 ` Jason Gunthorpe
@ 2025-02-06 17:54 ` Daniel P. Berrangé
2025-02-06 17:58 ` Jason Gunthorpe
2025-02-06 17:57 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-06 17:54 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 01:46:47PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > > > > If we set the physical/guest SMMU relationship directly, then at the
> > > > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > > > placed VFIO device, and better reason about behaviour.
> > > >
> > > > Agree.
> > >
> > > Can you just take in a VFIO cdev FD reference on this command line:
> > >
> > > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> > >
> > > And that will lock the pSMMU/vSMMU relationship?
> >
> > We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> > we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > the association initialized the SMMU is realized.
>
> This is not supported kernel side, you can't instantiate a vIOMMU
> without a VFIO device that uses it. For security.
What are the security concerns here ?
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:54 ` Daniel P. Berrangé
@ 2025-02-06 17:58 ` Jason Gunthorpe
2025-02-06 18:04 ` Shameerali Kolothum Thodi via
2025-02-06 18:18 ` Daniel P. Berrangé
0 siblings, 2 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 17:58 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 05:54:57PM +0000, Daniel P. Berrangé wrote:
> > > We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> > > we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> > > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > > the association initialized the SMMU is realized.
> >
> > This is not supported kernel side, you can't instantiate a vIOMMU
> > without a VFIO device that uses it. For security.
>
> What are the security concerns here ?
You should not be able to open iommufd and manipulate iommu HW that
you don't have a VFIO descriptor for, including creating physical
vIOMMU resources, allocating command queues and whatever else.
Some kind of hot plug smmu would have to create a vSMMU without any
kernel backing and then later bind it to a kernel implementation.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:58 ` Jason Gunthorpe
@ 2025-02-06 18:04 ` Shameerali Kolothum Thodi via
2025-02-06 18:13 ` Jason Gunthorpe
2025-02-06 18:18 ` Daniel P. Berrangé
1 sibling, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-06 18:04 UTC (permalink / raw)
To: Jason Gunthorpe, Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 5:59 PM
> To: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 05:54:57PM +0000, Daniel P. Berrangé wrote:
> > > > We shouldn't assume any VFIO device exists in the QEMU cnofig at the
> time
> > > > we realize the virtual ssmu. I expect the SMMU may be cold plugged,
> while
> > > > the VFIO devices may be hot plugged arbitrarly later, and we should
> have
> > > > the association initialized the SMMU is realized.
> > >
> > > This is not supported kernel side, you can't instantiate a vIOMMU
> > > without a VFIO device that uses it. For security.
> >
> > What are the security concerns here ?
>
> You should not be able to open iommufd and manipulate iommu HW that
> you don't have a VFIO descriptor for, including creating physical
> vIOMMU resources, allocating command queues and whatever else.
>
> Some kind of hot plug smmu would have to create a vSMMU without any
> kernel backing and then later bind it to a kernel implementation.
Not sure I get the problem with associating a vSMMU with a pSMMU. Something
like the iommu instance id mentioned before,
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1
This can realize the vSMMU without actually creating a vIOMMU in the kernel.
And when the dev gets attached/realized, check (via GET_HW_INFO) whether the
specified iommu instance id matches or not.
Or is the concern here exporting an iommu instance id to user space?
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 18:04 ` Shameerali Kolothum Thodi via
@ 2025-02-06 18:13 ` Jason Gunthorpe
2025-02-06 18:18 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 18:13 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 06:04:57PM +0000, Shameerali Kolothum Thodi wrote:
> > Some kind of hot plug smmu would have to create a vSMMU without any
> > kernel backing and then later bind it to a kernel implementation.
>
> Not sure I get the problem with associating vSMMU with a pSMMU. Something
> like an iommu instance id mentioned before,
>
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1
>
> This can realize the vSMMU without actually creating a vIOMMU in kernel.
> And when the dev gets attached/realized, check (GET_HW_INFO)the specified
> iommu instance id matches or not.
>
> Or the concern here is exporting an iommu instance id to user space?
Philosophically we do not permit any HW access through iommufd without
a VFIO fd to "prove" the process has rights to touch hardware.
We don't have any way to prove the process has rights to touch the
iommu hardware separately from VFIO.
So even if you invent an iommu ID we cannot accept it as a handle to
create viommu in iommufd.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 18:13 ` Jason Gunthorpe
@ 2025-02-06 18:18 ` Shameerali Kolothum Thodi via
2025-02-06 18:22 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-06 18:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 6:13 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 06:04:57PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > Some kind of hot plug smmu would have to create a vSMMU without
> any
> > > kernel backing and then later bind it to a kernel implementation.
> >
> > Not sure I get the problem with associating vSMMU with a pSMMU.
> Something
> > like an iommu instance id mentioned before,
> >
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1
> >
> > This can realize the vSMMU without actually creating a vIOMMU in kernel.
> > And when the dev gets attached/realized, check (GET_HW_INFO)the
> specified
> > iommu instance id matches or not.
> >
> > Or the concern here is exporting an iommu instance id to user space?
>
> Philisophically we do not permit any HW access through iommufd without
> a VFIO fd to "prove" the process has rights to touch hardware.
>
> We don't have any way to prove the process has rights to touch the
> iommu hardware seperately from VFIO.
It is not. Qemu just instantiates a vSMMU and assigns the IOMMU
instance id to it.
>
> So even if you invent an iommu ID we cannot accept it as a handle to
> create viommu in iommufd.
Creating the vIOMMU only happens when the user does a cold/hot plug of
a VFIO device. At that time Qemu checks whether the assigned id matches
whatever the kernel tells it.
Thanks,
Shameer
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 18:18 ` Shameerali Kolothum Thodi via
@ 2025-02-06 18:22 ` Jason Gunthorpe
2025-02-06 20:33 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 18:22 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > So even if you invent an iommu ID we cannot accept it as a handle to
> > create viommu in iommufd.
>
> Creating the vIOMMU only happens when the user does a cold/hot plug of
> a VFIO device. At that time Qemu checks whether the assigned id matches
> with whatever the kernel tell it.
This is not hard up until the guest is started. If you boot a guest
without a backing viommu iommufd object then there will be some more
complexities.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 18:22 ` Jason Gunthorpe
@ 2025-02-06 20:33 ` Nicolin Chen
2025-02-06 20:38 ` Jason Gunthorpe
2025-02-07 10:21 ` Shameerali Kolothum Thodi via
0 siblings, 2 replies; 150+ messages in thread
From: Nicolin Chen @ 2025-02-06 20:33 UTC (permalink / raw)
To: Shameerali Kolothum Thodi, Daniel P. Berrangé,
Jason Gunthorpe
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
>
> > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > create viommu in iommufd.
> >
> > Creating the vIOMMU only happens when the user does a cold/hot plug of
> > a VFIO device. At that time Qemu checks whether the assigned id matches
> > with whatever the kernel tell it.
>
> This is not hard up until the guest is started. If you boot a guest
> without a backing viommu iommufd object then there will be some more
> complexities.
Yea, I imagined that things would be complicated with hotplugs..
On one hand, I got the part that we need some fixed link beforehand
to ease migration/hotplugs.
On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
raises the immediate concern that we cannot even decide the vSMMU's
capabilities, as reflected in its IDR/IIDR registers, without a
coldplugged device -- if we boot a VM (one vSMMU<->pSMMU) with only a
hotplugged device, the IOMMU_GET_HW_INFO cannot be done while the guest
kernel probes the vSMMU instance. So we would have to reset the vSMMU
"HW" after the device hotplug?
Nicolin
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 20:33 ` Nicolin Chen
@ 2025-02-06 20:38 ` Jason Gunthorpe
2025-02-06 20:48 ` Nicolin Chen
2025-02-07 10:21 ` Shameerali Kolothum Thodi via
1 sibling, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 20:38 UTC (permalink / raw)
To: Nicolin Chen
Cc: Shameerali Kolothum Thodi, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> >
> > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > create viommu in iommufd.
> > >
> > > Creating the vIOMMU only happens when the user does a cold/hot plug of
> > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > with whatever the kernel tell it.
> >
> > This is not hard up until the guest is started. If you boot a guest
> > without a backing viommu iommufd object then there will be some more
> > complexities.
>
> Yea, I imagined that things would be complicated with hotplugs..
>
> On one hand, I got the part that we need some fixed link forehand
> to ease migration/hotplugs.
>
> On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> brings the immediate attention that we cannot even decide vSMMU's
> capabilities being reflected in its IDR/IIDR registers, without a
> coldplug device
As Daniel was saying this all has to be specifiable on the command
line.
IMHO if the vSMMU is not fully specified by the time the boot happens
(either explicitly via command line or implicitly by querying the live
HW) then qemu should fail.
Jason
^ permalink raw reply [flat|nested] 150+ messages in thread
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 20:38 ` Jason Gunthorpe
@ 2025-02-06 20:48 ` Nicolin Chen
2025-02-06 21:11 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2025-02-06 20:48 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Shameerali Kolothum Thodi, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > >
> > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > create viommu in iommufd.
> > > >
> > > > Creating the vIOMMU only happens when the user does a cold/hot plug of
> > > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > > with whatever the kernel tell it.
> > >
> > > This is not hard up until the guest is started. If you boot a guest
> > > without a backing viommu iommufd object then there will be some more
> > > complexities.
> >
> > Yea, I imagined that things would be complicated with hotplugs..
> >
> > On one hand, I got the part that we need some fixed link forehand
> > to ease migration/hotplugs.
> >
> > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > brings the immediate attention that we cannot even decide vSMMU's
> > capabilities being reflected in its IDR/IIDR registers, without a
> > coldplug device
>
> As Daniel was saying this all has to be specifiable on the command
> line.
>
> IMHO if the vSMMU is not fully specified by the time the boot happens
> (either explicity via command line or implicitly by querying the live
> HW) then it qemu should fail.
Though that makes sense, that would assume we could only support
the case where a VM has at least one cold-plugged device per vSMMU?
Otherwise, even if we specify which pSMMU a vSMMU maps to via the
command line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO...
Thanks
Nicolin
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 20:48 ` Nicolin Chen
@ 2025-02-06 21:11 ` Jason Gunthorpe
2025-02-06 22:46 ` Nicolin Chen
0 siblings, 1 reply; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 21:11 UTC (permalink / raw)
To: Nicolin Chen
Cc: Shameerali Kolothum Thodi, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 12:48:40PM -0800, Nicolin Chen wrote:
> On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > > >
> > > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > > create viommu in iommufd.
> > > > >
> > > > > Creating the vIOMMU only happens when the user does a cold/hot plug of
> > > > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > > > with whatever the kernel tell it.
> > > >
> > > > This is not hard up until the guest is started. If you boot a guest
> > > > without a backing viommu iommufd object then there will be some more
> > > > complexities.
> > >
> > > Yea, I imagined that things would be complicated with hotplugs..
> > >
> > > On one hand, I got the part that we need some fixed link forehand
> > > to ease migration/hotplugs.
> > >
> > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > > brings the immediate attention that we cannot even decide vSMMU's
> > > capabilities being reflected in its IDR/IIDR registers, without a
> > > coldplug device
> >
> > As Daniel was saying this all has to be specifiable on the command
> > line.
> >
> > IMHO if the vSMMU is not fully specified by the time the boot happens
> > (either explicity via command line or implicitly by querying the live
> > HW) then it qemu should fail.
>
> Though that makes sense, that would assume we could only support
> the case where a VM has at least one cold plug device per vSMMU?
>
> Otherwise, even if we specify vSMMU to which pSMMU via a command
> line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO..
You'd use the command line information and wouldn't need GET_HW_INFO;
it would be complicated
Jason
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 21:11 ` Jason Gunthorpe
@ 2025-02-06 22:46 ` Nicolin Chen
2025-02-07 0:08 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Nicolin Chen @ 2025-02-06 22:46 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Shameerali Kolothum Thodi, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 05:11:13PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 12:48:40PM -0800, Nicolin Chen wrote:
> > On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> > > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > > > >
> > > > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > > > create viommu in iommufd.
> > > > > >
> > > > > > Creating the vIOMMU only happens when the user does a cold/hot plug of
> > > > > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > > > > with whatever the kernel tell it.
> > > > >
> > > > > This is not hard up until the guest is started. If you boot a guest
> > > > > without a backing viommu iommufd object then there will be some more
> > > > > complexities.
> > > >
> > > > Yea, I imagined that things would be complicated with hotplugs..
> > > >
> > > > On one hand, I got the part that we need some fixed link forehand
> > > > to ease migration/hotplugs.
> > > >
> > > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > > > brings the immediate attention that we cannot even decide vSMMU's
> > > > capabilities being reflected in its IDR/IIDR registers, without a
> > > > coldplug device
> > >
> > > As Daniel was saying this all has to be specifiable on the command
> > > line.
> > >
> > > IMHO if the vSMMU is not fully specified by the time the boot happens
> > > (either explicity via command line or implicitly by querying the live
> > > HW) then it qemu should fail.
> >
> > Though that makes sense, that would assume we could only support
> > the case where a VM has at least one cold plug device per vSMMU?
> >
> > Otherwise, even if we specify vSMMU to which pSMMU via a command
> > line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO..
>
> You'd use the command line information and wouldn't need GET_HW_INFO,
> it would be complicated
Do you mean the "-device arm-smmuv3-accel,id=xx" line? This still
won't give us the host IDR/IIDR register values to probe a vSMMU,
unless a VFIO device is assigned to the vSMMU's associated PXB on
that command line?
Nicolin
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 22:46 ` Nicolin Chen
@ 2025-02-07 0:08 ` Jason Gunthorpe
0 siblings, 0 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 0:08 UTC (permalink / raw)
To: Nicolin Chen
Cc: Shameerali Kolothum Thodi, Daniel P. Berrangé,
qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 02:46:42PM -0800, Nicolin Chen wrote:
> > You'd use the command line information and wouldn't need GET_HW_INFO,
> > it would be complicated
>
> Do you mean the "-device arm-smmuv3-accel,id=xx" line? This still
> won't give us the host IDR/IIDR register values to probe a vSMMU,
> unless it has a VFIO device assigned to vSMMU's associated PXB in
> that command line?
Yes, put the IDR registers on the command line too.
Nothing from the host should be copied to the guest without the option
to control it through the command line.
Jason
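To make the two modes concrete, a minimal command-line sketch. The
arm-smmuv3-accel device and its bus= property come from this RFC; the
idrN/iidr properties in the second form do not exist in the posted series
and are hypothetical names, used here only to illustrate the suggestion of
putting the register values under command-line control:

    # "inherit from host": needs an iommufd object and a cold-plugged VFIO device
    -object iommufd,id=iommufd0 \
    -device pxb-pcie,id=pcie.1,bus_nr=2 \
    -device pcie-root-port,id=rp1,bus=pcie.1,chassis=1 \
    -device arm-smmuv3-accel,id=smmu1,bus=pcie.1 \
    -device vfio-pci,host=0000:75:00.1,bus=rp1,iommufd=iommufd0

    # hypothetical "fully specified": no VFIO device needed at boot
    -device arm-smmuv3-accel,id=smmu1,bus=pcie.1,idr0=<val>,...,idr5=<val>,iidr=<val>

In the second form QEMU could realize the vSMMU from the given values alone
and reject a later hotplug whose pSMMU does not match.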
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 20:33 ` Nicolin Chen
2025-02-06 20:38 ` Jason Gunthorpe
@ 2025-02-07 10:21 ` Shameerali Kolothum Thodi via
2025-02-07 10:31 ` Daniel P. Berrangé
1 sibling, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-07 10:21 UTC (permalink / raw)
To: Nicolin Chen, Daniel P. Berrangé, Jason Gunthorpe
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
>
> On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > create viommu in iommufd.
> > >
> > > Creating the vIOMMU only happens when the user does a cold/hot
> plug of
> > > a VFIO device. At that time Qemu checks whether the assigned id
> matches
> > > with whatever the kernel tell it.
> >
> > This is not hard up until the guest is started. If you boot a guest
> > without a backing viommu iommufd object then there will be some more
> > complexities.
>
> Yea, I imagined that things would be complicated with hotplugs..
>
> On one hand, I got the part that we need some fixed link forehand
> to ease migration/hotplugs.
>
> On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> brings the immediate attention that we cannot even decide vSMMU's
> capabilities being reflected in its IDR/IIDR registers, without a
> coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
> hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest
Right. I forgot about the call to smmu_dev_get_info() during the reset.
That means we need at least one dev per Guest SMMU during Guest
boot :(
Thanks,
Shameer
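For reference, the probe being discussed ends up as an IOMMU_GET_HW_INFO
ioctl on the iommufd, and that ioctl needs a dev_id, which only exists once
a VFIO cdev has been bound -- which is exactly why at least one cold-plugged
device is required today. A rough userspace sketch (structure and field
names per the Linux iommufd uapi; check them against your <linux/iommufd.h>):

    #include <err.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    /* dev_id comes from VFIO_DEVICE_BIND_IOMMUFD on an opened device cdev */
    static void probe_smmu_idr(int iommufd, uint32_t dev_id)
    {
        struct iommu_hw_info_arm_smmuv3 smmu = {};
        struct iommu_hw_info cmd = {
            .size = sizeof(cmd),
            .dev_id = dev_id,
            .data_len = sizeof(smmu),
            .data_uptr = (uintptr_t)&smmu,
        };

        if (ioctl(iommufd, IOMMU_GET_HW_INFO, &cmd))
            err(1, "IOMMU_GET_HW_INFO");
        if (cmd.out_data_type == IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
            printf("IDR0=%#x IIDR=%#x AIDR=%#x\n",
                   smmu.idr[0], smmu.iidr, smmu.aidr);
    }

Without a bound device there is simply nothing to pass as dev_id, hence the
point below about decoupling vIOMMU configuration from any specific VFIO
device.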
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-07 10:21 ` Shameerali Kolothum Thodi via
@ 2025-02-07 10:31 ` Daniel P. Berrangé
2025-02-07 12:21 ` Shameerali Kolothum Thodi via
0 siblings, 1 reply; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-07 10:31 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Nicolin Chen, Jason Gunthorpe, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Fri, Feb 07, 2025 at 10:21:17AM +0000, Shameerali Kolothum Thodi wrote:
>
>
> >
> > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi
> > wrote:
> > >
> > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > create viommu in iommufd.
> > > >
> > > > Creating the vIOMMU only happens when the user does a cold/hot
> > plug of
> > > > a VFIO device. At that time Qemu checks whether the assigned id
> > matches
> > > > with whatever the kernel tell it.
> > >
> > > This is not hard up until the guest is started. If you boot a guest
> > > without a backing viommu iommufd object then there will be some more
> > > complexities.
> >
> > Yea, I imagined that things would be complicated with hotplugs..
> >
> > On one hand, I got the part that we need some fixed link forehand
> > to ease migration/hotplugs.
> >
> > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > brings the immediate attention that we cannot even decide vSMMU's
> > capabilities being reflected in its IDR/IIDR registers, without a
> > coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
> > hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest
>
> Right. I forgot about the call to smmu_dev_get_info() during the reset.
> That means we need at least one dev per Guest SMMU during Guest
> boot :(
That's pretty unpleasant as a usage restriction. It sounds like there
needs to be a way to configure & control the vIOMMU independently of
attaching a specific VFIO device.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-07 10:31 ` Daniel P. Berrangé
@ 2025-02-07 12:21 ` Shameerali Kolothum Thodi via
2025-02-07 12:53 ` Jason Gunthorpe
0 siblings, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-07 12:21 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Nicolin Chen, Jason Gunthorpe, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
>
> On Fri, Feb 07, 2025 at 10:21:17AM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > >
> > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum
> Thodi
> > > wrote:
> > > >
> > > > > > So even if you invent an iommu ID we cannot accept it as a handle
> to
> > > > > > create viommu in iommufd.
> > > > >
> > > > > Creating the vIOMMU only happens when the user does a cold/hot
> > > plug of
> > > > > a VFIO device. At that time Qemu checks whether the assigned id
> > > matches
> > > > > with whatever the kernel tell it.
> > > >
> > > > This is not hard up until the guest is started. If you boot a guest
> > > > without a backing viommu iommufd object then there will be some
> more
> > > > complexities.
> > >
> > > Yea, I imagined that things would be complicated with hotplugs..
> > >
> > > On one hand, I got the part that we need some fixed link forehand
> > > to ease migration/hotplugs.
> > >
> > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > > brings the immediate attention that we cannot even decide vSMMU's
> > > capabilities being reflected in its IDR/IIDR registers, without a
> > > coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
> > > hotplug device, the IOMMU_GET_HW_INFO cannot be done during
> guest
> >
> > Right. I forgot about the call to smmu_dev_get_info() during the reset.
> > That means we need at least one dev per Guest SMMU during Guest
> > boot :(
>
> That's pretty unpleasant as a usage restriction. It sounds like there
> needs to be a way to configure & control the vIOMMU independantly of
> attaching a specific VFIO device.
Yes, that would be ideal.
Just wondering whether we can have something like
vfio_register_iommu_driver() for the iommufd subsystem, by which it can
directly access the iommu driver ops (maybe a restricted set).
Not sure about the layering violations and other security issues with that...
Thanks,
Shameer
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-07 12:21 ` Shameerali Kolothum Thodi via
@ 2025-02-07 12:53 ` Jason Gunthorpe
0 siblings, 0 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-07 12:53 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Daniel P. Berrangé, Nicolin Chen, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, ddutile@redhat.com, Linuxarm,
Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Fri, Feb 07, 2025 at 12:21:54PM +0000, Shameerali Kolothum Thodi wrote:
> Just wondering whether we can have something like the
> vfio_register_iommu_driver() for iommufd subsystem by which it can directly
> access iommu drivers ops(may be a restricted set).
I very much want to try hard to avoid that.
AFAICT you do not need a VFIO device, or access to the HW_INFO of the
smmu to start up a SMMU driver.
Yes, you cannot later attach a VFIO device with a pSMMU that
materially differs from vSMMU setup, but that is fine.
QEMU has long had a duality where you can either "inherit from host"
for an easy setup or be "fully specified" and support live
migration/etc.; CPUID is a simple example.
So, what the smmu patches are doing now is "inherit from host" and
that requires a VFIO device to work. I think that is fine.
If you want to do full hotplug then you need to be "fully specified" on
the command line so a working vSMMU can be shown to the guest with no
devices, and no kernel involvement.
Obviously this is a highly advanced operating mode as things like IIDR
and errata need to be considered, but I would guess booting with no
vPCI devices is already abnormal.
Jason
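The CPUID analogy in concrete terms (x86 options shown purely as an
illustration; a vSMMU equivalent of the second form is what is being
proposed and does not exist yet):

    -cpu host                      # "inherit from host": easy, tied to the machine you booted on
    -cpu Skylake-Server,-avx512f   # "fully specified": host-independent, migration/hotplug friendly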
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:58 ` Jason Gunthorpe
2025-02-06 18:04 ` Shameerali Kolothum Thodi via
@ 2025-02-06 18:18 ` Daniel P. Berrangé
1 sibling, 0 replies; 150+ messages in thread
From: Daniel P. Berrangé @ 2025-02-06 18:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Shameerali Kolothum Thodi, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 01:58:43PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 05:54:57PM +0000, Daniel P. Berrangé wrote:
> > > > We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> > > > we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> > > > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > > > the association initialized the SMMU is realized.
> > >
> > > This is not supported kernel side, you can't instantiate a vIOMMU
> > > without a VFIO device that uses it. For security.
> >
> > What are the security concerns here ?
>
> You should not be able to open iommufd and manipulate iommu HW that
> you don't have a VFIO descriptor for, including creating physical
> vIOMMU resources, allocating command queues and whatever else.
>
> Some kind of hot plug smmu would have to create a vSMMU without any
> kernel backing and then later bind it to a kernel implementation.
Ok, so if we give the info about the vSMMU <-> pSMMU binding to
QEMU upfront, it can delay using it until the point where the kernel
accepts it. This at least gives a clear design to applications outside
QEMU, and hides the low-level implementation details inside QEMU.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:46 ` Jason Gunthorpe
2025-02-06 17:54 ` Daniel P. Berrangé
@ 2025-02-06 17:57 ` Shameerali Kolothum Thodi via
2025-02-06 17:59 ` Jason Gunthorpe
1 sibling, 1 reply; 150+ messages in thread
From: Shameerali Kolothum Thodi via @ 2025-02-06 17:57 UTC (permalink / raw)
To: Jason Gunthorpe, Daniel P. Berrangé
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
>
> On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > > > If we set the physical/guest SMMU relationship directly, then at the
> > > > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > > > placed VFIO device, and better reason about behaviour.
> > > >
> > > > Agree.
> > >
> > > Can you just take in a VFIO cdev FD reference on this command line:
> > >
> > > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> > >
> > > And that will lock the pSMMU/vSMMU relationship?
> >
> > We shouldn't assume any VFIO device exists in the QEMU cnofig at the
> time
> > we realize the virtual ssmu. I expect the SMMU may be cold plugged,
> while
> > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > the association initialized the SMMU is realized.
>
> This is not supported kernel side, you can't instantiate a vIOMMU
> without a VFIO device that uses it. For security.
I think that is fine if QEMU knows about the association beforehand. During
vIOMMU instantiation it can cross-check whether the user-specified
pSMMU <-> vSMMU mapping is correct for the device.
Also, how do we do it with multiple VF devices under a pSMMU? Which
cdev fd in that case?
Thanks,
Shameer
* Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
2025-02-06 17:57 ` Shameerali Kolothum Thodi via
@ 2025-02-06 17:59 ` Jason Gunthorpe
0 siblings, 0 replies; 150+ messages in thread
From: Jason Gunthorpe @ 2025-02-06 17:59 UTC (permalink / raw)
To: Shameerali Kolothum Thodi
Cc: Daniel P. Berrangé, qemu-arm@nongnu.org,
qemu-devel@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, nicolinc@nvidia.com, ddutile@redhat.com,
Linuxarm, Wangzhou (B), jiangkunkun, Jonathan Cameron,
zhangfei.gao@linaro.org, nathanc@nvidia.com
On Thu, Feb 06, 2025 at 05:57:38PM +0000, Shameerali Kolothum Thodi wrote:
> Also how do we do it with multiple VF devices under a pSUMMU ? Which
> cdev fd in that case?
It doesn't matter, they are all interchangeable. Creating the VIOMMU
object just requires any vfio device that is attached to the physical
smmu.
Jason
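At the uapi level this is the IOMMU_VIOMMU_ALLOC call, which indeed only
needs some device bound to the pSMMU plus its nesting-parent HWPT. A rough
sketch, assuming the vIOMMU interface that went into Linux 6.12 (check the
field names against your <linux/iommufd.h>):

    #include <err.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    /* dev_id: any VFIO device attached to the physical SMMU;
     * parent_hwpt_id: a HWPT allocated with IOMMU_HWPT_ALLOC_NEST_PARENT */
    static uint32_t viommu_alloc(int iommufd, uint32_t dev_id,
                                 uint32_t parent_hwpt_id)
    {
        struct iommu_viommu_alloc cmd = {
            .size = sizeof(cmd),
            .type = IOMMU_VIOMMU_TYPE_ARM_SMMUV3,
            .dev_id = dev_id,
            .hwpt_id = parent_hwpt_id,
        };

        if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &cmd))
            err(1, "IOMMU_VIOMMU_ALLOC");
        return cmd.out_viommu_id;  /* object the guest SMMU state is tied to */
    }

Which of the interchangeable devices supplies dev_id does not matter; the
resulting vIOMMU object is per-pSMMU, not per-device.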