qemu-devel.nongnu.org archive mirror
* [PATCH 0/2] VIRTIO-IOMMU/VFIO page size related fixes
@ 2023-07-04 11:15 Eric Auger
  2023-07-04 11:15 ` [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment Eric Auger
  2023-07-04 11:15 ` [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask() Eric Auger
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Auger @ 2023-07-04 11:15 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	jean-philippe, zhenzhong.duan
  Cc: alex.williamson, clg, bharat.bhushan, peter.maydell

When assigning a host device and protecting it with the virtio-iommu, we may
end up with qemu crashing with:

    qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
    with mask 0x20010000
    qemu: hardware error: vfio: DMA mapping failed, unable to continue

This happens if the host uses a 64kB page size and thereby constrains
the physical IOMMU to a 64kB granule. By default the qemu virtio-iommu device
uses a 4kB granule and only becomes aware of the 64kB requirement too late,
after machine init, when the VFIO device domain is attached.
virtio_iommu_set_page_size_mask() then fails and this causes
vfio_listener_region_add() to end up in hw_error(). Currently the
granule is global to all domains.
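
For reference, here is how the two masks in the error above decode (just an
illustration of the mismatch, not part of the patches):

    cur_mask = 0xfffffffffffff000; /* virtio-iommu default: any page size >= 4kB */
    new_mask = 0x20010000;         /* reported by VFIO: 64kB pages, 512MB blocks */
    ctz64(cur_mask) = 12;          /* current granule: 4kB  */
    ctz64(new_mask) = 16;          /* needed granule: 64kB  */

The two masks do intersect, but once the machine is ready the 4kB granule is
considered frozen and cannot be changed to the 64kB one the assigned device
requires, hence the error.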

To work around this issue, even though the IOMMU MR may be bypassed, we
transiently enable it at machine init done time so that
vfio_listener_region_add() and virtio_iommu_set_page_size_mask() get called
earlier, before the domain attach. That way the page size requirement can be
taken into account before the guest gets started.

This series also takes the opportunity to clean up some traces that may
confuse the end user.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu-page-size-v1

Eric Auger (2):
  virtio-iommu: Fix 64kB host page size VFIO device assignment
  virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()

 include/hw/virtio/virtio-iommu.h |  2 ++
 hw/virtio/virtio-iommu.c         | 49 +++++++++++++++++++++++---------
 hw/virtio/trace-events           |  1 +
 3 files changed, 38 insertions(+), 14 deletions(-)

-- 
2.38.1




* [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-04 11:15 [PATCH 0/2] VIRTIO-IOMMU/VFIO page size related fixes Eric Auger
@ 2023-07-04 11:15 ` Eric Auger
  2023-07-05  4:52   ` Duan, Zhenzhong
  2023-07-04 11:15 ` [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask() Eric Auger
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Auger @ 2023-07-04 11:15 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	jean-philippe, zhenzhong.duan
  Cc: alex.williamson, clg, bharat.bhushan, peter.maydell

When running on a 64kB page size host and protecting a VFIO device
with the virtio-iommu, qemu crashes with this kind of message:

qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
with mask 0x20010000
qemu: hardware error: vfio: DMA mapping failed, unable to continue

This is due to the fact that the IOMMU MR corresponding to the VFIO device
is enabled very late, on domain attach, after machine init.
The device reports a minimal 64kB page size but it is too late to be
applied. virtio_iommu_set_page_size_mask() fails and this causes
vfio_listener_region_add() to end up in hw_error().

To work around this issue, we transiently enable the IOMMU MR on
machine init to collect the page size requirements and then restore
the bypass state.

Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned device")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/hw/virtio/virtio-iommu.h |  2 ++
 hw/virtio/virtio-iommu.c         | 30 ++++++++++++++++++++++++++++--
 hw/virtio/trace-events           |  1 +
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 2ad5ee320b..a93fc5383e 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -61,6 +61,8 @@ struct VirtIOIOMMU {
     QemuRecMutex mutex;
     GTree *endpoints;
     bool boot_bypass;
+    Notifier machine_done;
+    bool granule_frozen;
 };
 
 #endif
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1cd258135d..1eaf81bab5 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -24,6 +24,7 @@
 #include "hw/virtio/virtio.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
+#include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
@@ -1106,12 +1107,12 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
     }
 
     /*
-     * After the machine is finalized, we can't change the mask anymore. If by
+     * Once the granule is frozen we can't change the mask anymore. If by
      * chance the hotplugged device supports the same granule, we can still
      * accept it. Having a different masks is possible but the guest will use
      * sub-optimal block sizes, so warn about it.
      */
-    if (phase_check(PHASE_MACHINE_READY)) {
+    if (s->granule_frozen) {
         int new_granule = ctz64(new_mask);
         int cur_granule = ctz64(cur_mask);
 
@@ -1146,6 +1147,27 @@ static void virtio_iommu_system_reset(void *opaque)
 
 }
 
+static void virtio_iommu_freeze_granule(Notifier *notifier, void *data)
+{
+    VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);
+    bool boot_bypass = s->config.bypass;
+    int granule;
+
+    /*
+     * Transient IOMMU MR enable to collect page_size_mask requirement
+     * through memory_region_iommu_set_page_size_mask() called by
+     * VFIO region_add() callback
+     */
+    s->config.bypass = 0;
+    virtio_iommu_switch_address_space_all(s);
+    /* restore default */
+    s->config.bypass = boot_bypass;
+    virtio_iommu_switch_address_space_all(s);
+    s->granule_frozen = true;
+    granule = ctz64(s->config.page_size_mask);
+    trace_virtio_iommu_freeze_granule(BIT(granule));
+}
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1189,6 +1211,9 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
     }
 
+    s->machine_done.notify = virtio_iommu_freeze_granule;
+    qemu_add_machine_init_done_notifier(&s->machine_done);
+
     qemu_register_reset(virtio_iommu_system_reset, s);
 }
 
@@ -1198,6 +1223,7 @@ static void virtio_iommu_device_unrealize(DeviceState *dev)
     VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
 
     qemu_unregister_reset(virtio_iommu_system_reset, s);
+    qemu_remove_machine_init_done_notifier(&s->machine_done);
 
     g_hash_table_destroy(s->as_by_busptr);
     if (s->domains) {
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 8f8d05cf9b..68b752e304 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -131,6 +131,7 @@ virtio_iommu_set_page_size_mask(const char *name, uint64_t old, uint64_t new) "m
 virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
 virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
 virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
+virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to 0x%"PRIx64
 
 # virtio-mem.c
 virtio_mem_send_response(uint16_t type) "type=%" PRIu16
-- 
2.38.1




* [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-04 11:15 [PATCH 0/2] VIRTIO-IOMMU/VFIO page size related fixes Eric Auger
  2023-07-04 11:15 ` [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment Eric Auger
@ 2023-07-04 11:15 ` Eric Auger
  2023-07-05  4:55   ` Duan, Zhenzhong
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Auger @ 2023-07-04 11:15 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, mst,
	jean-philippe, zhenzhong.duan
  Cc: alex.williamson, clg, bharat.bhushan, peter.maydell

The current error messages in virtio_iommu_set_page_size_mask()
sound quite similar for different situations and miss the IOMMU
memory region that causes the issue.

Clarify them and rework the comment.

Also remove the trace emitted when the new page_size_mask is not applied as
the current frozen granule is kept. This message is rather confusing
for the end user, and the current granule would have been used by the
driver anyway.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/virtio-iommu.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1eaf81bab5..0d9f7196fe 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1101,29 +1101,24 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
                                           new_mask);
 
     if ((cur_mask & new_mask) == 0) {
-        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
-                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
+        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
+                   " incompatible with currently supported mask 0x%"PRIx64,
+                   mr->parent_obj.name, new_mask, cur_mask);
         return -1;
     }
 
     /*
      * Once the granule is frozen we can't change the mask anymore. If by
      * chance the hotplugged device supports the same granule, we can still
-     * accept it. Having a different masks is possible but the guest will use
-     * sub-optimal block sizes, so warn about it.
+     * accept it.
      */
     if (s->granule_frozen) {
-        int new_granule = ctz64(new_mask);
         int cur_granule = ctz64(cur_mask);
 
-        if (new_granule != cur_granule) {
-            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
-                       " is incompatible with mask 0x%"PRIx64, cur_mask,
-                       new_mask);
+        if (!(BIT(cur_granule) & new_mask)) {
+            error_setg(errp, "virtio-iommu %s does not support frozen granule 0x%"PRIx64,
+                       mr->parent_obj.name, BIT(cur_granule));
             return -1;
-        } else if (new_mask != cur_mask) {
-            warn_report("virtio-iommu page mask 0x%"PRIx64
-                        " does not match 0x%"PRIx64, cur_mask, new_mask);
         }
         return 0;
     }
-- 
2.38.1




* RE: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-04 11:15 ` [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment Eric Auger
@ 2023-07-05  4:52   ` Duan, Zhenzhong
  2023-07-05  6:19     ` Eric Auger
  2023-07-05  8:29     ` Jean-Philippe Brucker
  0 siblings, 2 replies; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-05  4:52 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, jean-philippe@linaro.org
  Cc: alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org

Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Tuesday, July 4, 2023 7:15 PM
>Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>assignment
>
>When running on a 64kB page size host and protecting a VFIO device
>with the virtio-iommu, qemu crashes with this kind of message:
>
>qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
>with mask 0x20010000

Does 0x20010000 mean that only 512MB and 64KB (super page) mappings are
supported by the host IOMMU HW? Is 4KB mapping not supported?

There is a check on the guest kernel side hinting that only 4KB is supported;
with this patch we force viommu->pgsize_bitmap to 0x20010000
and fail the check below? Does this device work in the guest?
Please correct me if I understand wrong.

static int viommu_domain_finalise(struct viommu_endpoint *vdev,
                                  struct iommu_domain *domain)
{
...
        viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
        if (viommu_page_size > PAGE_SIZE) {
                dev_err(vdev->dev,
                        "granule 0x%lx larger than system page size 0x%lx\n",
                        viommu_page_size, PAGE_SIZE);
                return -ENODEV;
        }

Another question: presume 0x20010000 does mean that only 512MB and 64KB
are supported. Is the host hva->hpa mapping ensured to be compatible with at
least 64KB mappings? If the host MMU only supports 4KB mappings, or something
else not compatible with 0x20010000, will vfio_dma_map() fail?

>qemu: hardware error: vfio: DMA mapping failed, unable to continue
>
>This is due to the fact the IOMMU MR corresponding to the VFIO device
>is enabled very late on domain attach, after the machine init.
>The device reports a minimal 64kB page size but it is too late to be
>applied. virtio_iommu_set_page_size_mask() fails and this causes
>vfio_listener_region_add() to end up with hw_error();
>
>To work around this issue, we transiently enable the IOMMU MR on
>machine init to collect the page size requirements and then restore
>the bypass state.
>
>Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned
>device")
>Signed-off-by: Eric Auger <eric.auger@redhat.com>
>---
> include/hw/virtio/virtio-iommu.h |  2 ++
> hw/virtio/virtio-iommu.c         | 30 ++++++++++++++++++++++++++++--
> hw/virtio/trace-events           |  1 +
> 3 files changed, 31 insertions(+), 2 deletions(-)
>
>diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-
>iommu.h
>index 2ad5ee320b..a93fc5383e 100644
>--- a/include/hw/virtio/virtio-iommu.h
>+++ b/include/hw/virtio/virtio-iommu.h
>@@ -61,6 +61,8 @@ struct VirtIOIOMMU {
>     QemuRecMutex mutex;
>     GTree *endpoints;
>     bool boot_bypass;
>+    Notifier machine_done;
>+    bool granule_frozen;
> };
>
> #endif
>diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>index 1cd258135d..1eaf81bab5 100644
>--- a/hw/virtio/virtio-iommu.c
>+++ b/hw/virtio/virtio-iommu.c
>@@ -24,6 +24,7 @@
> #include "hw/virtio/virtio.h"
> #include "sysemu/kvm.h"
> #include "sysemu/reset.h"
>+#include "sysemu/sysemu.h"
> #include "qapi/error.h"
> #include "qemu/error-report.h"
> #include "trace.h"
>@@ -1106,12 +1107,12 @@ static int
>virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>     }
>
>     /*
>-     * After the machine is finalized, we can't change the mask anymore. If by
>+     * Once the granule is frozen we can't change the mask anymore. If by
>      * chance the hotplugged device supports the same granule, we can still
>      * accept it. Having a different masks is possible but the guest will use
>      * sub-optimal block sizes, so warn about it.
>      */
>-    if (phase_check(PHASE_MACHINE_READY)) {
>+    if (s->granule_frozen) {
>         int new_granule = ctz64(new_mask);
>         int cur_granule = ctz64(cur_mask);
>
>@@ -1146,6 +1147,27 @@ static void virtio_iommu_system_reset(void
>*opaque)
>
> }
>
>+static void virtio_iommu_freeze_granule(Notifier *notifier, void *data)
>+{
>+    VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);
>+    bool boot_bypass = s->config.bypass;
>+    int granule;
>+
>+    /*
>+     * Transient IOMMU MR enable to collect page_size_mask requirement
>+     * through memory_region_iommu_set_page_size_mask() called by
>+     * VFIO region_add() callback
>+     */
>+    s->config.bypass = 0;
>+    virtio_iommu_switch_address_space_all(s);
>+    /* restore default */
>+    s->config.bypass = boot_bypass;
>+    virtio_iommu_switch_address_space_all(s);
>+    s->granule_frozen = true;
>+    granule = ctz64(s->config.page_size_mask);
>+    trace_virtio_iommu_freeze_granule(BIT(granule));
>+}

It looks a bit heavy to do all this just to get the page_size_mask from the
host side, but maybe this is the only way with the current interface.

Thanks
Zhenzhong

>+
> static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> {
>     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>@@ -1189,6 +1211,9 @@ static void virtio_iommu_device_realize(DeviceState
>*dev, Error **errp)
>         error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
>     }
>
>+    s->machine_done.notify = virtio_iommu_freeze_granule;
>+    qemu_add_machine_init_done_notifier(&s->machine_done);
>+
>     qemu_register_reset(virtio_iommu_system_reset, s);
> }
>
>@@ -1198,6 +1223,7 @@ static void
>virtio_iommu_device_unrealize(DeviceState *dev)
>     VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>
>     qemu_unregister_reset(virtio_iommu_system_reset, s);
>+    qemu_remove_machine_init_done_notifier(&s->machine_done);
>
>     g_hash_table_destroy(s->as_by_busptr);
>     if (s->domains) {
>diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>index 8f8d05cf9b..68b752e304 100644
>--- a/hw/virtio/trace-events
>+++ b/hw/virtio/trace-events
>@@ -131,6 +131,7 @@ virtio_iommu_set_page_size_mask(const char *name,
>uint64_t old, uint64_t new) "m
> virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
> virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
> virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn,
>bool on) "Device %02x:%02x.%x switching address space (iommu
>enabled=%d)"
>+virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to
>0x%"PRIx64
>
> # virtio-mem.c
> virtio_mem_send_response(uint16_t type) "type=%" PRIu16
>--
>2.38.1




* RE: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-04 11:15 ` [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask() Eric Auger
@ 2023-07-05  4:55   ` Duan, Zhenzhong
  2023-07-05  8:17     ` Duan, Zhenzhong
  0 siblings, 1 reply; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-05  4:55 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, jean-philippe@linaro.org
  Cc: alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org



>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Tuesday, July 4, 2023 7:15 PM
>To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
>devel@nongnu.org; qemu-arm@nongnu.org; mst@redhat.com; jean-
>philippe@linaro.org; Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhap.com;
>bharat.bhushan@nxp.com; peter.maydell@linaro.org
>Subject: [PATCH 2/2] virtio-iommu: Rework the trace in
>virtio_iommu_set_page_size_mask()
>
>The current error messages in virtio_iommu_set_page_size_mask()
>sound quite similar for different situations and miss the IOMMU
>memory region that causes the issue.
>
>Clarify them and rework the comment.
>
>Also remove the trace when the new page_size_mask is not applied as
>the current frozen granule is kept. This message is rather confusing
>for the end user and anyway the current granule would have been used
>by the driver
>
>Signed-off-by: Eric Auger <eric.auger@redhat.com>
>---
> hw/virtio/virtio-iommu.c | 19 +++++++------------
> 1 file changed, 7 insertions(+), 12 deletions(-)
>
>diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>index 1eaf81bab5..0d9f7196fe 100644
>--- a/hw/virtio/virtio-iommu.c
>+++ b/hw/virtio/virtio-iommu.c
>@@ -1101,29 +1101,24 @@ static int
>virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>                                           new_mask);
>
>     if ((cur_mask & new_mask) == 0) {
>-        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>-                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
>+        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
>+                   " incompatible with currently supported mask 0x%"PRIx64,
>+                   mr->parent_obj.name, new_mask, cur_mask);
>         return -1;
>     }
>
>     /*
>      * Once the granule is frozen we can't change the mask anymore. If by
>      * chance the hotplugged device supports the same granule, we can still
>-     * accept it. Having a different masks is possible but the guest will use
>-     * sub-optimal block sizes, so warn about it.
>+     * accept it.
>      */
>     if (s->granule_frozen) {
>-        int new_granule = ctz64(new_mask);
>         int cur_granule = ctz64(cur_mask);
>
>-        if (new_granule != cur_granule) {
>-            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>-                       " is incompatible with mask 0x%"PRIx64, cur_mask,
>-                       new_mask);
>+        if (!(BIT(cur_granule) & new_mask)) {

Good catch.

Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Thanks
Zhenzhong

>+            error_setg(errp, "virtio-iommu %s does not support frozen granule
>0x%"PRIx64,
>+                       mr->parent_obj.name, BIT(cur_granule));
>             return -1;
>-        } else if (new_mask != cur_mask) {
>-            warn_report("virtio-iommu page mask 0x%"PRIx64
>-                        " does not match 0x%"PRIx64, cur_mask, new_mask);
>         }
>         return 0;
>     }
>--
>2.38.1




* Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-05  4:52   ` Duan, Zhenzhong
@ 2023-07-05  6:19     ` Eric Auger
  2023-07-05  8:11       ` Duan, Zhenzhong
  2023-07-05  8:29     ` Jean-Philippe Brucker
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Auger @ 2023-07-05  6:19 UTC (permalink / raw)
  To: Duan, Zhenzhong, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, jean-philippe@linaro.org
  Cc: alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org

Hi Zhenzhong,
On 7/5/23 06:52, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Tuesday, July 4, 2023 7:15 PM
>> Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>> assignment
>>
>> When running on a 64kB page size host and protecting a VFIO device
>> with the virtio-iommu, qemu crashes with this kind of message:
>>
>> qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
>> with mask 0x20010000
> Does 0x20010000 mean only  512MB and 64KB super page mapping is
> supported for host iommu hw? 4KB mapping not supported?
Yes, that's correct. In that case the host has a 64kB page size and 4kB is not
supported.
>
> There is a check in guest kernel side hint only 4KB is supported, with
> this patch we force viommu->pgsize_bitmap to 0x20010000
> and fail below check? Does this device work in guest?
> Please correct me if I understand wrong.
My guest also has 64kB so the check hereafter succeeds. Effectively, in case
your host has a larger page size than your guest, it fails with:
[    2.147031] virtio-pci 0000:00:01.0: granule 0x10000 larger than
system page size 0x1000
[    7.231128] ixgbevf 0000:00:02.0: granule 0x10000 larger than system
page size 0x1000

>
> static int viommu_domain_finalise(struct viommu_endpoint *vdev,
>                                   struct iommu_domain *domain)
> {
> ...
>         viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
>         if (viommu_page_size > PAGE_SIZE) {
>                 dev_err(vdev->dev,
>                         "granule 0x%lx larger than system page size 0x%lx\n",
>                         viommu_page_size, PAGE_SIZE);
>                 return -ENODEV;
>         }
>
> Another question is: Presume 0x20010000 does mean only 512MB and 64KB
> is supported. Is host hva->hpa mapping ensured to be compatible with at least
> 64KB mapping? If host mmu only support 4KB mapping or other non-compatible
> with 0x20010000, will vfio_dma_map() fail?
The page size mask is retrieved with the VFIO_IOMMU_GET_INFO uapi,
which returns the host vfio_iommu_type1 vfio_iommu->pgsize_bitmap. The
latter is initialized to the host PAGE_MASK and later restricted on
vfio_iommu_type1_attach_group through the vfio_update_pgsize_bitmap() calls,

so yes, both the IOMMU and CPU page sizes are guaranteed to be compatible.
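
In case it helps, this is roughly what that query looks like from userspace
(a minimal sketch of the uapi usage, not the actual QEMU code, most error
handling omitted):

    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* returns the container's supported IOVA page size bitmap, 0 on error */
    static unsigned long long container_pgsizes(int container_fd)
    {
        struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

        /* only meaningful once a group is attached and the IOMMU type is set */
        if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info)) {
            return 0;
        }
        return info.iova_pgsizes; /* e.g. 0x20010000 on a 64kB page size host */
    }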

>
>> qemu: hardware error: vfio: DMA mapping failed, unable to continue
>>
>> This is due to the fact the IOMMU MR corresponding to the VFIO device
>> is enabled very late on domain attach, after the machine init.
>> The device reports a minimal 64kB page size but it is too late to be
>> applied. virtio_iommu_set_page_size_mask() fails and this causes
>> vfio_listener_region_add() to end up with hw_error();
>>
>> To work around this issue, we transiently enable the IOMMU MR on
>> machine init to collect the page size requirements and then restore
>> the bypass state.
>>
>> Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned
>> device")
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> include/hw/virtio/virtio-iommu.h |  2 ++
>> hw/virtio/virtio-iommu.c         | 30 ++++++++++++++++++++++++++++--
>> hw/virtio/trace-events           |  1 +
>> 3 files changed, 31 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-
>> iommu.h
>> index 2ad5ee320b..a93fc5383e 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -61,6 +61,8 @@ struct VirtIOIOMMU {
>>     QemuRecMutex mutex;
>>     GTree *endpoints;
>>     bool boot_bypass;
>> +    Notifier machine_done;
>> +    bool granule_frozen;
>> };
>>
>> #endif
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 1cd258135d..1eaf81bab5 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -24,6 +24,7 @@
>> #include "hw/virtio/virtio.h"
>> #include "sysemu/kvm.h"
>> #include "sysemu/reset.h"
>> +#include "sysemu/sysemu.h"
>> #include "qapi/error.h"
>> #include "qemu/error-report.h"
>> #include "trace.h"
>> @@ -1106,12 +1107,12 @@ static int
>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>>     }
>>
>>     /*
>> -     * After the machine is finalized, we can't change the mask anymore. If by
>> +     * Once the granule is frozen we can't change the mask anymore. If by
>>      * chance the hotplugged device supports the same granule, we can still
>>      * accept it. Having a different masks is possible but the guest will use
>>      * sub-optimal block sizes, so warn about it.
>>      */
>> -    if (phase_check(PHASE_MACHINE_READY)) {
>> +    if (s->granule_frozen) {
>>         int new_granule = ctz64(new_mask);
>>         int cur_granule = ctz64(cur_mask);
>>
>> @@ -1146,6 +1147,27 @@ static void virtio_iommu_system_reset(void
>> *opaque)
>>
>> }
>>
>> +static void virtio_iommu_freeze_granule(Notifier *notifier, void *data)
>> +{
>> +    VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);
>> +    bool boot_bypass = s->config.bypass;
>> +    int granule;
>> +
>> +    /*
>> +     * Transient IOMMU MR enable to collect page_size_mask requirement
>> +     * through memory_region_iommu_set_page_size_mask() called by
>> +     * VFIO region_add() callback
>> +     */
>> +    s->config.bypass = 0;
>> +    virtio_iommu_switch_address_space_all(s);
>> +    /* restore default */
>> +    s->config.bypass = boot_bypass;
>> +    virtio_iommu_switch_address_space_all(s);
>> +    s->granule_frozen = true;
>> +    granule = ctz64(s->config.page_size_mask);
>> +    trace_virtio_iommu_freeze_granule(BIT(granule));
>> +}
> It looks a bit heavy here just in order to get page_size_mask from host side.
> But maybe this is the only way with current interface.

The problem comes from the fact that the regions are aliased due to the
bypass and vfio_listener_region_add() does not get a chance to be called
until the actual domain attach. So I do not see any other way than to
transiently enable the region.

At least I could check if boot bypass is set before doing that dance. I
will respin with that.

Thanks

Eric
>
> Thanks
> Zhenzhong
>
>> +
>> static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>> {
>>     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> @@ -1189,6 +1211,9 @@ static void virtio_iommu_device_realize(DeviceState
>> *dev, Error **errp)
>>         error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
>>     }
>>
>> +    s->machine_done.notify = virtio_iommu_freeze_granule;
>> +    qemu_add_machine_init_done_notifier(&s->machine_done);
>> +
>>     qemu_register_reset(virtio_iommu_system_reset, s);
>> }
>>
>> @@ -1198,6 +1223,7 @@ static void
>> virtio_iommu_device_unrealize(DeviceState *dev)
>>     VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>>
>>     qemu_unregister_reset(virtio_iommu_system_reset, s);
>> +    qemu_remove_machine_init_done_notifier(&s->machine_done);
>>
>>     g_hash_table_destroy(s->as_by_busptr);
>>     if (s->domains) {
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>> index 8f8d05cf9b..68b752e304 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -131,6 +131,7 @@ virtio_iommu_set_page_size_mask(const char *name,
>> uint64_t old, uint64_t new) "m
>> virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
>> virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
>> virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn,
>> bool on) "Device %02x:%02x.%x switching address space (iommu
>> enabled=%d)"
>> +virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to
>> 0x%"PRIx64
>>
>> # virtio-mem.c
>> virtio_mem_send_response(uint16_t type) "type=%" PRIu16
>> --
>> 2.38.1




* RE: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-05  6:19     ` Eric Auger
@ 2023-07-05  8:11       ` Duan, Zhenzhong
  0 siblings, 0 replies; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-05  8:11 UTC (permalink / raw)
  To: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, qemu-arm@nongnu.org, mst@redhat.com,
	jean-philippe@linaro.org
  Cc: alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Wednesday, July 5, 2023 2:20 PM
>Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>assignment
>
>Hi Zhenzhong,
>On 7/5/23 06:52, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Sent: Tuesday, July 4, 2023 7:15 PM
>>> Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>>> assignment
>>>
>>> When running on a 64kB page size host and protecting a VFIO device
>>> with the virtio-iommu, qemu crashes with this kind of message:
>>>
>>> qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
>>> with mask 0x20010000
>> Does 0x20010000 mean only  512MB and 64KB super page mapping is
>> supported for host iommu hw? 4KB mapping not supported?
>Yes that's correct. In that case the host has 64kB page and 4kB is not
>supported.
>>
>> There is a check in guest kernel side hint only 4KB is supported, with
>> this patch we force viommu->pgsize_bitmap to 0x20010000
>> and fail below check? Does this device work in guest?
>> Please correct me if I understand wrong.
>my guest also has 64kB so the check hereafter succeeds. effectively in
>case your host has a larger page size than your guest it fails with
>[    2.147031] virtio-pci 0000:00:01.0: granule 0x10000 larger than
>system page size 0x1000
>[    7.231128] ixgbevf 0000:00:02.0: granule 0x10000 larger than system
>page size 0x1000

Oh, I see, I took PAGE_SIZE as 4KB for granted.

>
>>
>> static int viommu_domain_finalise(struct viommu_endpoint *vdev,
>>                                   struct iommu_domain *domain)
>> {
>> ...
>>         viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
>>         if (viommu_page_size > PAGE_SIZE) {
>>                 dev_err(vdev->dev,
>>                         "granule 0x%lx larger than system page size 0x%lx\n",
>>                         viommu_page_size, PAGE_SIZE);
>>                 return -ENODEV;
>>         }
>>
>> Another question is: Presume 0x20010000 does mean only 512MB and 64KB
>> is supported. Is host hva->hpa mapping ensured to be compatible with at
>least
>> 64KB mapping? If host mmu only support 4KB mapping or other non-
>compatible
>> with 0x20010000, will vfio_dma_map() fail?
>the page size mask is retrieved with VFIO_IOMMU_GET_INFO uapi
>which returns host vfio_iommu_type1 vfio_iommu->pgsize_bitmap. This
>latter is initialized to host PAGE_MASK and later restricted on
>vfio_iommu_type1_attach_group though the vfio_update_pgsize_bitmap()
>calls

Understood, thanks for your analysis.

>
>so yes both IOMMU and CPU page size are garanteed to be compatible.
>
>>
>>> qemu: hardware error: vfio: DMA mapping failed, unable to continue
>>>
>>> This is due to the fact the IOMMU MR corresponding to the VFIO device
>>> is enabled very late on domain attach, after the machine init.
>>> The device reports a minimal 64kB page size but it is too late to be
>>> applied. virtio_iommu_set_page_size_mask() fails and this causes
>>> vfio_listener_region_add() to end up with hw_error();
>>>
>>> To work around this issue, we transiently enable the IOMMU MR on
>>> machine init to collect the page size requirements and then restore
>>> the bypass state.
>>>
>>> Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned
>>> device")
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>> include/hw/virtio/virtio-iommu.h |  2 ++
>>> hw/virtio/virtio-iommu.c         | 30 ++++++++++++++++++++++++++++--
>>> hw/virtio/trace-events           |  1 +
>>> 3 files changed, 31 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-
>>> iommu.h
>>> index 2ad5ee320b..a93fc5383e 100644
>>> --- a/include/hw/virtio/virtio-iommu.h
>>> +++ b/include/hw/virtio/virtio-iommu.h
>>> @@ -61,6 +61,8 @@ struct VirtIOIOMMU {
>>>     QemuRecMutex mutex;
>>>     GTree *endpoints;
>>>     bool boot_bypass;
>>> +    Notifier machine_done;
>>> +    bool granule_frozen;
>>> };
>>>
>>> #endif
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>> index 1cd258135d..1eaf81bab5 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -24,6 +24,7 @@
>>> #include "hw/virtio/virtio.h"
>>> #include "sysemu/kvm.h"
>>> #include "sysemu/reset.h"
>>> +#include "sysemu/sysemu.h"
>>> #include "qapi/error.h"
>>> #include "qemu/error-report.h"
>>> #include "trace.h"
>>> @@ -1106,12 +1107,12 @@ static int
>>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>>>     }
>>>
>>>     /*
>>> -     * After the machine is finalized, we can't change the mask anymore. If
>by
>>> +     * Once the granule is frozen we can't change the mask anymore. If by
>>>      * chance the hotplugged device supports the same granule, we can still
>>>      * accept it. Having a different masks is possible but the guest will use
>>>      * sub-optimal block sizes, so warn about it.
>>>      */
>>> -    if (phase_check(PHASE_MACHINE_READY)) {
>>> +    if (s->granule_frozen) {
>>>         int new_granule = ctz64(new_mask);
>>>         int cur_granule = ctz64(cur_mask);
>>>
>>> @@ -1146,6 +1147,27 @@ static void virtio_iommu_system_reset(void
>>> *opaque)
>>>
>>> }
>>>
>>> +static void virtio_iommu_freeze_granule(Notifier *notifier, void *data)
>>> +{
>>> +    VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);
>>> +    bool boot_bypass = s->config.bypass;
>>> +    int granule;
>>> +
>>> +    /*
>>> +     * Transient IOMMU MR enable to collect page_size_mask requirement
>>> +     * through memory_region_iommu_set_page_size_mask() called by
>>> +     * VFIO region_add() callback
>>> +     */
>>> +    s->config.bypass = 0;
>>> +    virtio_iommu_switch_address_space_all(s);
>>> +    /* restore default */
>>> +    s->config.bypass = boot_bypass;
>>> +    virtio_iommu_switch_address_space_all(s);
>>> +    s->granule_frozen = true;
>>> +    granule = ctz64(s->config.page_size_mask);
>>> +    trace_virtio_iommu_freeze_granule(BIT(granule));
>>> +}
>> It looks a bit heavy here just in order to get page_size_mask from host side.
>> But maybe this is the only way with current interface.
>
>the problem comes from the fact the regions are aliased due to the
>bypass and vfio_listener_region_add() does not get a chance to be called
>until the actual domain attach. So I do not see any other way to
>transiently enable the region.
Agree.

>
>At least I could check if boot bypass is set before doing that dance. I
>will respin with that.
Makes sense.

Thanks
Zhenzhong


* RE: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-05  4:55   ` Duan, Zhenzhong
@ 2023-07-05  8:17     ` Duan, Zhenzhong
  2023-07-05 13:16       ` Eric Auger
  0 siblings, 1 reply; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-05  8:17 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, jean-philippe@linaro.org
  Cc: alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org



>-----Original Message-----
>From: Duan, Zhenzhong
>Sent: Wednesday, July 5, 2023 12:56 PM
>Subject: RE: [PATCH 2/2] virtio-iommu: Rework the trace in
>virtio_iommu_set_page_size_mask()
>
>
>
>>-----Original Message-----
>>From: Eric Auger <eric.auger@redhat.com>
>>Sent: Tuesday, July 4, 2023 7:15 PM
>>To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
>>devel@nongnu.org; qemu-arm@nongnu.org; mst@redhat.com; jean-
>>philippe@linaro.org; Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>Cc: alex.williamson@redhat.com; clg@redhap.com;
>bharat.bhushan@nxp.com;
>>peter.maydell@linaro.org
>>Subject: [PATCH 2/2] virtio-iommu: Rework the trace in
>>virtio_iommu_set_page_size_mask()
>>
>>The current error messages in virtio_iommu_set_page_size_mask() sound
>>quite similar for different situations and miss the IOMMU memory region
>>that causes the issue.
>>
>>Clarify them and rework the comment.
>>
>>Also remove the trace when the new page_size_mask is not applied as the
>>current frozen granule is kept. This message is rather confusing for
>>the end user and anyway the current granule would have been used by the
>>driver
>>
>>Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>---
>> hw/virtio/virtio-iommu.c | 19 +++++++------------
>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>
>>diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
>>1eaf81bab5..0d9f7196fe 100644
>>--- a/hw/virtio/virtio-iommu.c
>>+++ b/hw/virtio/virtio-iommu.c
>>@@ -1101,29 +1101,24 @@ static int
>>virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>>                                           new_mask);
>>
>>     if ((cur_mask & new_mask) == 0) {
>>-        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>-                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
>>+        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
>>+                   " incompatible with currently supported mask 0x%"PRIx64,
>>+                   mr->parent_obj.name, new_mask, cur_mask);
>>         return -1;
>>     }
>>
>>     /*
>>      * Once the granule is frozen we can't change the mask anymore. If by
>>      * chance the hotplugged device supports the same granule, we can still
>>-     * accept it. Having a different masks is possible but the guest will use
>>-     * sub-optimal block sizes, so warn about it.
>>+     * accept it.
>>      */
>>     if (s->granule_frozen) {
>>-        int new_granule = ctz64(new_mask);
>>         int cur_granule = ctz64(cur_mask);
>>
>>-        if (new_granule != cur_granule) {
>>-            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>-                       " is incompatible with mask 0x%"PRIx64, cur_mask,
>>-                       new_mask);
>>+        if (!(BIT(cur_granule) & new_mask)) {

Sorry, I read this piece of code again and got a question: if new_mask has
finer granularity than cur_granule, should we allow it to pass even though
BIT(cur_granule) is not set?

Thanks
Zhenzhong

>
>Good catch.
>
>Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>
>Thanks
>Zhenzhong
>
>>+            error_setg(errp, "virtio-iommu %s does not support frozen
>>+ granule
>>0x%"PRIx64,
>>+                       mr->parent_obj.name, BIT(cur_granule));
>>             return -1;
>>-        } else if (new_mask != cur_mask) {
>>-            warn_report("virtio-iommu page mask 0x%"PRIx64
>>-                        " does not match 0x%"PRIx64, cur_mask, new_mask);
>>         }
>>         return 0;
>>     }
>>--
>>2.38.1




* Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-05  4:52   ` Duan, Zhenzhong
  2023-07-05  6:19     ` Eric Auger
@ 2023-07-05  8:29     ` Jean-Philippe Brucker
  2023-07-05 10:13       ` Duan, Zhenzhong
  1 sibling, 1 reply; 18+ messages in thread
From: Jean-Philippe Brucker @ 2023-07-05  8:29 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
	clg@redhap.com, bharat.bhushan@nxp.com, peter.maydell@linaro.org

On Wed, Jul 05, 2023 at 04:52:09AM +0000, Duan, Zhenzhong wrote:
> Hi Eric,
> 
> >-----Original Message-----
> >From: Eric Auger <eric.auger@redhat.com>
> >Sent: Tuesday, July 4, 2023 7:15 PM
> >Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
> >assignment
> >
> >When running on a 64kB page size host and protecting a VFIO device
> >with the virtio-iommu, qemu crashes with this kind of message:
> >
> >qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
> >with mask 0x20010000
> 
> Does 0x20010000 mean only  512MB and 64KB super page mapping is
> supported for host iommu hw? 4KB mapping not supported?

It's not a restriction by the HW IOMMU, but the host kernel. An Arm SMMU
can implement 4KB, 16KB and/or 64KB granules, but the host kernel only
advertises through VFIO the granule corresponding to host PAGE_SIZE. This
restriction is done by arm_lpae_restrict_pgsizes() in order to choose a
page size when a device is driven by the host.
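
To make the numbers concrete (my reading of arm_lpae_restrict_pgsizes(), worth
double-checking against the code):

    /* 64kB granule: 64kB pages and 512MB level-2 blocks */
    0x20010000 == (1ULL << 16) | (1ULL << 29) == SZ_64K | SZ_512M

    /* a 4kB granule would instead give 4kB pages plus 2MB and 1GB blocks */
    0x40201000 == (1ULL << 12) | (1ULL << 21) | (1ULL << 30) == SZ_4K | SZ_2M | SZ_1G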

> 
> There is a check in guest kernel side hint only 4KB is supported, with
> this patch we force viommu->pgsize_bitmap to 0x20010000
> and fail below check? Does this device work in guest?
> Please correct me if I understand wrong.

Right, a guest using 4KB pages under a host that uses 64KB is not
supported, because if the guest attempts to dma_map a 4K page, the IOMMU
cannot create a mapping small enough, the mapping would have to spill over
neighbouring guest memory.
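
As a made-up example of that spill-over: if the guest dma_map()s a single 4kB
page at IOVA 0x1000, a 64kB-granule IOMMU can only map the naturally aligned
64kB range containing it:

    align_down(0x1000, 64K)        = 0x0
    align_up(0x1000 + 0x1000, 64K) = 0x10000   /* 64kB mapped for a 4kB request */

so the device would also gain access to the 60kB of unrelated guest memory
around that page.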

One possible solution would be supporting multiple page granules. If we
added a page granule negotiation through VFIO and virtio-iommu then the
guest could pick the page size it wants. But this requires changes to
Linux UAPI so isn't a priority at the moment, because we're trying to
enable nesting which would support 64K-host/4K-guest as well.

See also the discussion on the patch that introduced the guest check
https://lore.kernel.org/linux-iommu/20200318114047.1518048-1-jean-philippe@linaro.org/

Thanks,
Jean




* RE: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-05  8:29     ` Jean-Philippe Brucker
@ 2023-07-05 10:13       ` Duan, Zhenzhong
  2023-07-05 11:33         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-05 10:13 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
	clg@redhap.com, bharat.bhushan@nxp.com, peter.maydell@linaro.org

>-----Original Message-----
>From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>Sent: Wednesday, July 5, 2023 4:29 PM
>Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>assignment
>
>On Wed, Jul 05, 2023 at 04:52:09AM +0000, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>> >-----Original Message-----
>> >From: Eric Auger <eric.auger@redhat.com>
>> >Sent: Tuesday, July 4, 2023 7:15 PM
>> >Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO
>> >device assignment
>> >
>> >When running on a 64kB page size host and protecting a VFIO device
>> >with the virtio-iommu, qemu crashes with this kind of message:
>> >
>> >qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
>> >with mask 0x20010000
>>
>> Does 0x20010000 mean only  512MB and 64KB super page mapping is
>> supported for host iommu hw? 4KB mapping not supported?
>
>It's not a restriction by the HW IOMMU, but the host kernel. An Arm SMMU
>can implement 4KB, 16KB and/or 64KB granules, but the host kernel only
>advertises through VFIO the granule corresponding to host PAGE_SIZE. This
>restriction is done by arm_lpae_restrict_pgsizes() in order to choose a page
>size when a device is driven by the host.

Just curious why the granules implemented by the Arm SMMU (e.g. 4KB, 16KB or
64KB) are not advertised to VFIO, but rather the arm_lpae_restrict_pgsizes()
restricted ones, e.g. for SZ_4K, (SZ_4K | SZ_2M | SZ_1G).
(SZ_4K | SZ_2M | SZ_1G) does not look like the real hardware granules of the
Arm SMMU.

>
>>
>> There is a check in guest kernel side hint only 4KB is supported, with
>> this patch we force viommu->pgsize_bitmap to 0x20010000 and fail below
>> check? Does this device work in guest?
>> Please correct me if I understand wrong.
>
>Right, a guest using 4KB pages under a host that uses 64KB is not supported,
>because if the guest attempts to dma_map a 4K page, the IOMMU cannot
>create a mapping small enough, the mapping would have to spill over
>neighbouring guest memory.
>
>One possible solution would be supporting multiple page granules. If we
>added a page granule negotiation through VFIO and virtio-iommu then the
>guest could pick the page size it wants. But this requires changes to Linux UAPI
>so isn't a priority at the moment, because we're trying to enable nesting which
>would support 64K-host/4K-guest as well.
>
>See also the discussion on the patch that introduced the guest check
>https://lore.kernel.org/linux-iommu/20200318114047.1518048-1-jean-
>philippe@linaro.org/

Clear, thanks for sharing the history.

Regards
Zhenzhong



* Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-05 10:13       ` Duan, Zhenzhong
@ 2023-07-05 11:33         ` Jean-Philippe Brucker
  2023-07-06  9:00           ` Duan, Zhenzhong
  0 siblings, 1 reply; 18+ messages in thread
From: Jean-Philippe Brucker @ 2023-07-05 11:33 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
	clg@redhap.com, bharat.bhushan@nxp.com, peter.maydell@linaro.org

On Wed, Jul 05, 2023 at 10:13:11AM +0000, Duan, Zhenzhong wrote:
> >-----Original Message-----
> >From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> >Sent: Wednesday, July 5, 2023 4:29 PM
> >Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
> >assignment
> >
> >On Wed, Jul 05, 2023 at 04:52:09AM +0000, Duan, Zhenzhong wrote:
> >> Hi Eric,
> >>
> >> >-----Original Message-----
> >> >From: Eric Auger <eric.auger@redhat.com>
> >> >Sent: Tuesday, July 4, 2023 7:15 PM
> >> >Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO
> >> >device assignment
> >> >
> >> >When running on a 64kB page size host and protecting a VFIO device
> >> >with the virtio-iommu, qemu crashes with this kind of message:
> >> >
> >> >qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
> >> >with mask 0x20010000
> >>
> >> Does 0x20010000 mean only  512MB and 64KB super page mapping is
> >> supported for host iommu hw? 4KB mapping not supported?
> >
> >It's not a restriction by the HW IOMMU, but the host kernel. An Arm SMMU
> >can implement 4KB, 16KB and/or 64KB granules, but the host kernel only
> >advertises through VFIO the granule corresponding to host PAGE_SIZE. This
> >restriction is done by arm_lpae_restrict_pgsizes() in order to choose a page
> >size when a device is driven by the host.
> 
> Just curious why not advertises the Arm SMMU implemented granules to VFIO
> Eg:4KB, 16KB or 64KB granules?

That's possible, but the difficulty is setting up the page table
configuration afterwards. At the moment the host installs the HW page
tables early, when QEMU sets up the VFIO container. That initializes the
page size bitmap because configuring the HW page tables requires picking
one of the supported granules (setting TG0 in the SMMU Context
Descriptor).

If the guest could pick a granule via an ATTACH request, then QEMU would
need to tell the host kernel to install page tables with the desired
granule at that point. That would require a new interface in VFIO to
reconfigure a live container and replace the existing HW page tables
configuration (before ATTACH, the container must already be configured
with working page tables in order to implement boot-bypass, I think).

> But arm_lpae_restrict_pgsizes() restricted ones,
> Eg: for SZ_4K, (SZ_4K | SZ_2M | SZ_1G).
> (SZ_4K | SZ_2M | SZ_1G) looks not real hardware granules of Arm SMMU.

Yes, the granule here is 4K, and other bits only indicate huge page sizes,
so the user can try to optimize large mappings to use fewer TLB entries
where possible.

Thanks,
Jean



* Re: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-05  8:17     ` Duan, Zhenzhong
@ 2023-07-05 13:16       ` Eric Auger
  2023-07-06  8:56         ` Duan, Zhenzhong
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Eric Auger @ 2023-07-05 13:16 UTC (permalink / raw)
  To: Duan, Zhenzhong, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, jean-philippe@linaro.org
  Cc: alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org

Hi Zhenzhong,

On 7/5/23 10:17, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Duan, Zhenzhong
>> Sent: Wednesday, July 5, 2023 12:56 PM
>> Subject: RE: [PATCH 2/2] virtio-iommu: Rework the trace in
>> virtio_iommu_set_page_size_mask()
>>
>>
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Sent: Tuesday, July 4, 2023 7:15 PM
>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
>>> devel@nongnu.org; qemu-arm@nongnu.org; mst@redhat.com; jean-
>>> philippe@linaro.org; Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>> Cc: alex.williamson@redhat.com; clg@redhap.com;
>> bharat.bhushan@nxp.com;
>>> peter.maydell@linaro.org
>>> Subject: [PATCH 2/2] virtio-iommu: Rework the trace in
>>> virtio_iommu_set_page_size_mask()
>>>
>>> The current error messages in virtio_iommu_set_page_size_mask() sound
>>> quite similar for different situations and miss the IOMMU memory region
>>> that causes the issue.
>>>
>>> Clarify them and rework the comment.
>>>
>>> Also remove the trace when the new page_size_mask is not applied as the
>>> current frozen granule is kept. This message is rather confusing for
>>> the end user and anyway the current granule would have been used by the
>>> driver
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>> hw/virtio/virtio-iommu.c | 19 +++++++------------
>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
>>> 1eaf81bab5..0d9f7196fe 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -1101,29 +1101,24 @@ static int
>>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>>>                                           new_mask);
>>>
>>>     if ((cur_mask & new_mask) == 0) {
>>> -        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>> -                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
>>> +        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
>>> +                   " incompatible with currently supported mask 0x%"PRIx64,
>>> +                   mr->parent_obj.name, new_mask, cur_mask);
>>>         return -1;
>>>     }
>>>
>>>     /*
>>>      * Once the granule is frozen we can't change the mask anymore. If by
>>>      * chance the hotplugged device supports the same granule, we can still
>>> -     * accept it. Having a different masks is possible but the guest will use
>>> -     * sub-optimal block sizes, so warn about it.
>>> +     * accept it.
>>>      */
>>>     if (s->granule_frozen) {
>>> -        int new_granule = ctz64(new_mask);
>>>         int cur_granule = ctz64(cur_mask);
>>>
>>> -        if (new_granule != cur_granule) {
>>> -            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>> -                       " is incompatible with mask 0x%"PRIx64, cur_mask,
>>> -                       new_mask);
>>> +        if (!(BIT(cur_granule) & new_mask)) {
> Sorry, I read this piece code again and got a question, if new_mask has finer
> granularity than cur_granule, should we allow it to pass even though
> BIT(cur_granule) is not set?
I think this should work but this is not straightforward to test.
virtio-iommu would use the current granule for map/unmap. In map/unmap
notifiers, this is split into pow2 ranges and cascaded to VFIO through
vfio_dma_map/unmap. The iova and size are aligned with the smaller
supported granule.
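
For what it's worth, the kind of pow2 splitting I have in mind looks roughly
like this (illustrative sketch only, not the actual notifier code; the printf
stands for whatever ends up calling vfio_dma_map()):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Split [addr, addr + size) into power-of-two sized, naturally aligned
     * chunks. Assumes addr and size are already aligned to the supported
     * granule, so every chunk is at least one granule big. */
    static void split_pow2(uint64_t addr, uint64_t size)
    {
        while (size) {
            /* largest power of two allowed by the alignment of addr ... */
            uint64_t chunk = addr ? addr & -addr : UINT64_MAX;
            /* ... capped by the largest power of two not exceeding size */
            uint64_t max_sz = UINT64_C(1) << (63 - __builtin_clzll(size));

            if (chunk > max_sz) {
                chunk = max_sz;
            }
            printf("map 0x%" PRIx64 " size 0x%" PRIx64 "\n", addr, chunk);
            addr += chunk;
            size -= chunk;
        }
    }

e.g. split_pow2(0x1000, 0x5000) yields 4kB@0x1000, 8kB@0x2000 and 8kB@0x4000,
all of them power-of-two sized and naturally aligned.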

Jean, do you share this understanding or do I miss something.

Nevertheless the current code would have rejected that case and nobody
complained at that point ;-)

thanks

Eric

>
> Thanks
> Zhenzhong
>
>> Good catch.
>>
>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>
>> Thanks
>> Zhenzhong
>>
>>> +            error_setg(errp, "virtio-iommu %s does not support frozen
>>> + granule
>>> 0x%"PRIx64,
>>> +                       mr->parent_obj.name, BIT(cur_granule));
>>>             return -1;
>>> -        } else if (new_mask != cur_mask) {
>>> -            warn_report("virtio-iommu page mask 0x%"PRIx64
>>> -                        " does not match 0x%"PRIx64, cur_mask, new_mask);
>>>         }
>>>         return 0;
>>>     }
>>> --
>>> 2.38.1




* RE: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-05 13:16       ` Eric Auger
@ 2023-07-06  8:56         ` Duan, Zhenzhong
  2023-07-06 14:35         ` Jean-Philippe Brucker
  2023-07-06 18:52         ` Michael S. Tsirkin
  2 siblings, 0 replies; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-06  8:56 UTC (permalink / raw)
  To: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, qemu-arm@nongnu.org, mst@redhat.com,
	jean-philippe@linaro.org
  Cc: alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org

Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Wednesday, July 5, 2023 9:17 PM
>Subject: Re: [PATCH 2/2] virtio-iommu: Rework the trace in
>virtio_iommu_set_page_size_mask()
>
>Hi Zhenghong,
>
>On 7/5/23 10:17, Duan, Zhenzhong wrote:
>>
>>> -----Original Message-----
>>> From: Duan, Zhenzhong
>>> Sent: Wednesday, July 5, 2023 12:56 PM
>>> Subject: RE: [PATCH 2/2] virtio-iommu: Rework the trace in
>>> virtio_iommu_set_page_size_mask()
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>> Sent: Tuesday, July 4, 2023 7:15 PM
>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
>>>> devel@nongnu.org; qemu-arm@nongnu.org; mst@redhat.com; jean-
>>>> philippe@linaro.org; Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhap.com;
>>> bharat.bhushan@nxp.com;
>>>> peter.maydell@linaro.org
>>>> Subject: [PATCH 2/2] virtio-iommu: Rework the trace in
>>>> virtio_iommu_set_page_size_mask()
>>>>
>>>> The current error messages in virtio_iommu_set_page_size_mask()
>>>> sound quite similar for different situations and miss the IOMMU
>>>> memory region that causes the issue.
>>>>
>>>> Clarify them and rework the comment.
>>>>
>>>> Also remove the trace when the new page_size_mask is not applied as
>>>> the current frozen granule is kept. This message is rather confusing
>>>> for the end user and anyway the current granule would have been used
>>>> by the driver
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>> hw/virtio/virtio-iommu.c | 19 +++++++------------
>>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>>> index 1eaf81bab5..0d9f7196fe 100644
>>>> --- a/hw/virtio/virtio-iommu.c
>>>> +++ b/hw/virtio/virtio-iommu.c
>>>> @@ -1101,29 +1101,24 @@ static int
>>>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>>>>                                           new_mask);
>>>>
>>>>     if ((cur_mask & new_mask) == 0) {
>>>> -        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>>> -                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
>>>> +        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
>>>> +                   " incompatible with currently supported mask 0x%"PRIx64,
>>>> +                   mr->parent_obj.name, new_mask, cur_mask);
>>>>         return -1;
>>>>     }
>>>>
>>>>     /*
>>>>      * Once the granule is frozen we can't change the mask anymore. If by
>>>>      * chance the hotplugged device supports the same granule, we can still
>>>> -     * accept it. Having a different masks is possible but the guest will use
>>>> -     * sub-optimal block sizes, so warn about it.
>>>> +     * accept it.
>>>>      */
>>>>     if (s->granule_frozen) {
>>>> -        int new_granule = ctz64(new_mask);
>>>>         int cur_granule = ctz64(cur_mask);
>>>>
>>>> -        if (new_granule != cur_granule) {
>>>> -            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>>> -                       " is incompatible with mask 0x%"PRIx64, cur_mask,
>>>> -                       new_mask);
>>>> +        if (!(BIT(cur_granule) & new_mask)) {
>> Sorry, I read this piece of code again and got a question: if new_mask
>> has finer granularity than cur_granule, should we allow it to pass
>> even though
>> BIT(cur_granule) is not set?
>I think this should work but this is not straightforward to test.
>virtio-iommu would use the current granule for map/unmap. In map/unmap
>notifiers, this is split into pow2 ranges and cascaded to VFIO through
>vfio_dma_map/unmap. The iova and size are aligned with the smaller
>supported granule.

I see, I guess you mean a malicious guest which may send arbitrary mapping requests
to the virtio-iommu backend. A well-designed guest would use at least cur_granule
as the granularity of its mappings; even with the pow2 split, cur_granule is
guaranteed at the backend.

Thanks
Zhenzhong

>
>Jean, do you share this understanding, or am I missing something?
>
>Nevertheless the current code would have rejected that case and nobody
>complained at that point ;-)
>
>thanks
>
>Eric
>
>>
>> Thanks
>> Zhenzhong
>>
>>> Good catch.
>>>
>>> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>
>>> Thanks
>>> Zhenzhong
>>>
>>>> +            error_setg(errp, "virtio-iommu %s does not support frozen granule 0x%"PRIx64,
>>>> +                       mr->parent_obj.name, BIT(cur_granule));
>>>>             return -1;
>>>> -        } else if (new_mask != cur_mask) {
>>>> -            warn_report("virtio-iommu page mask 0x%"PRIx64
>>>> -                        " does not match 0x%"PRIx64, cur_mask, new_mask);
>>>>         }
>>>>         return 0;
>>>>     }
>>>> --
>>>> 2.38.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
  2023-07-05 11:33         ` Jean-Philippe Brucker
@ 2023-07-06  9:00           ` Duan, Zhenzhong
  0 siblings, 0 replies; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-06  9:00 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
	clg@redhap.com, bharat.bhushan@nxp.com, peter.maydell@linaro.org



>-----Original Message-----
>From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>Sent: Wednesday, July 5, 2023 7:33 PM
>Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device
>assignment
>
>On Wed, Jul 05, 2023 at 10:13:11AM +0000, Duan, Zhenzhong wrote:
>> >-----Original Message-----
>> >From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>> >Sent: Wednesday, July 5, 2023 4:29 PM
>> >Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO
>> >device assignment
>> >
>> >On Wed, Jul 05, 2023 at 04:52:09AM +0000, Duan, Zhenzhong wrote:
>> >> Hi Eric,
>> >>
>> >> >-----Original Message-----
>> >> >From: Eric Auger <eric.auger@redhat.com>
>> >> >Sent: Tuesday, July 4, 2023 7:15 PM
>> >> >Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO
>> >> >device assignment
>> >> >
>> >> >When running on a 64kB page size host and protecting a VFIO device
>> >> >with the virtio-iommu, qemu crashes with this kind of message:
>> >> >
>> >> >qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is
>> >> >incompatible with mask 0x20010000
>> >>
>> >> Does 0x20010000 mean that only 512MB and 64KB super page mappings are
>> >> supported by the host IOMMU HW? Is 4KB mapping not supported?
>> >
>> >It's not a restriction by the HW IOMMU, but the host kernel. An Arm
>> >SMMU can implement 4KB, 16KB and/or 64KB granules, but the host
>> >kernel only advertises through VFIO the granule corresponding to host
>> >PAGE_SIZE. This restriction is done by arm_lpae_restrict_pgsizes() in
>> >order to choose a page size when a device is driven by the host.
>>
>> Just curious, why not advertise the granules implemented by the Arm SMMU to
>> VFIO, e.g. 4KB, 16KB or 64KB?
>
>That's possible, but the difficulty is setting up the page table configuration
>afterwards. At the moment the host installs the HW page tables early, when
>QEMU sets up the VFIO container. That initializes the page size bitmap
>because configuring the HW page tables requires picking one of the supported
>granules (setting TG0 in the SMMU Context Descriptor).
>
>If the guest could pick a granule via an ATTACH request, then QEMU would
>need to tell the host kernel to install page tables with the desired granule at
>that point. That would require a new interface in VFIO to reconfigure a live
>container and replace the existing HW page tables configuration (before
>ATTACH, the container must already be configured with working page tables
>in order to implement boot-bypass, I think).

Thanks, that's clear, it looks different from x86. For an x86 guest, a 1GB mapping on the
guest side may be shadowed with 4KB mappings on the host side, in my understanding.
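
As an aside, the mask like 0x20010000 quoted earlier is, as far as I
understand, just what the host exposes through the VFIO type1 iova_pgsizes
bitmap. A minimal userspace sketch to read it (container_fd is assumed to be
an already opened and configured VFIO container; its setup is omitted here):

    #include <linux/vfio.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    static int print_host_page_mask(int container_fd)
    {
        struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

        if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info) < 0) {
            perror("VFIO_IOMMU_GET_INFO");
            return -1;
        }
        if (info.flags & VFIO_IOMMU_INFO_PGSIZES) {
            /* e.g. 0x20010000 on a 64kB PAGE_SIZE Arm host */
            printf("host IOMMU page size mask: 0x%llx\n",
                   (unsigned long long)info.iova_pgsizes);
        }
        return 0;
    }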

Regards
Zhenzhong


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-05 13:16       ` Eric Auger
  2023-07-06  8:56         ` Duan, Zhenzhong
@ 2023-07-06 14:35         ` Jean-Philippe Brucker
  2023-07-06 15:38           ` Eric Auger
  2023-07-07  2:34           ` Duan, Zhenzhong
  2023-07-06 18:52         ` Michael S. Tsirkin
  2 siblings, 2 replies; 18+ messages in thread
From: Jean-Philippe Brucker @ 2023-07-06 14:35 UTC (permalink / raw)
  To: Eric Auger
  Cc: Duan, Zhenzhong, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
	clg@redhap.com, bharat.bhushan@nxp.com, peter.maydell@linaro.org

On Wed, Jul 05, 2023 at 03:16:31PM +0200, Eric Auger wrote:
> >>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
> >>> 1eaf81bab5..0d9f7196fe 100644
> >>> --- a/hw/virtio/virtio-iommu.c
> >>> +++ b/hw/virtio/virtio-iommu.c
> >>> @@ -1101,29 +1101,24 @@ static int
> >>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
> >>>                                           new_mask);
> >>>
> >>>     if ((cur_mask & new_mask) == 0) {
> >>> -        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
> >>> -                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
> >>> +        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
> >>> +                   " incompatible with currently supported mask 0x%"PRIx64,
> >>> +                   mr->parent_obj.name, new_mask, cur_mask);
> >>>         return -1;
> >>>     }
> >>>
> >>>     /*
> >>>      * Once the granule is frozen we can't change the mask anymore. If by
> >>>      * chance the hotplugged device supports the same granule, we can still
> >>> -     * accept it. Having a different masks is possible but the guest will use
> >>> -     * sub-optimal block sizes, so warn about it.
> >>> +     * accept it.
> >>>      */
> >>>     if (s->granule_frozen) {
> >>> -        int new_granule = ctz64(new_mask);
> >>>         int cur_granule = ctz64(cur_mask);
> >>>
> >>> -        if (new_granule != cur_granule) {
> >>> -            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
> >>> -                       " is incompatible with mask 0x%"PRIx64, cur_mask,
> >>> -                       new_mask);
> >>> +        if (!(BIT(cur_granule) & new_mask)) {
> > Sorry, I read this piece of code again and got a question: if new_mask has finer
> > granularity than cur_granule, should we allow it to pass even though
> > BIT(cur_granule) is not set?
> I think this should work but this is not straightforward to test.
> virtio-iommu would use the current granule for map/unmap. In map/unmap
> notifiers, this is split into pow2 ranges and cascaded to VFIO through
> vfio_dma_map/unmap. The iova and size are aligned with the smaller
> supported granule.
> 
> Jean, do you share this understanding, or am I missing something?

Yes, I also think that would work. The guest would only issue mappings
with the larger granularity, which can be applied by VFIO with a finer
granule. However I doubt we're going to encounter this case, because
seeing a cur_granule larger than 4k here means that a VFIO device has
already been assigned with a large granule like 64k, and we're trying to
add a new device with 4k. This indicates two HW IOMMUs supporting
different granules in the same system, which seems unlikely.
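
For illustration only, the relaxation discussed above boils down to comparing
the minimum granules rather than requiring the frozen granule bit itself.
A small standalone sketch (the helper names and the hot-plug mask value are
made up for the example):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Check from the patch: the frozen granule bit must be set in new_mask. */
    static bool strict_ok(uint64_t cur_mask, uint64_t new_mask)
    {
        int cur_granule = __builtin_ctzll(cur_mask);
        return new_mask & (UINT64_C(1) << cur_granule);
    }

    /* Relaxed variant: a finer minimum granule is also acceptable, since
     * mappings made at the frozen granule can be split into smaller pages. */
    static bool relaxed_ok(uint64_t cur_mask, uint64_t new_mask)
    {
        return __builtin_ctzll(new_mask) <= __builtin_ctzll(cur_mask);
    }

    int main(void)
    {
        uint64_t frozen  = 0x20010000; /* 64kB + 512MB, as in this thread */
        uint64_t hotplug = 0x40201000; /* hypothetical 4kB/2MB/1GB device */

        printf("strict:  %d\n", strict_ok(frozen, hotplug));  /* 0: rejected */
        printf("relaxed: %d\n", relaxed_ok(frozen, hotplug)); /* 1: accepted */
        return 0;
    }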

Hopefully by the time we actually need this (if ever) we will support
per-endpoint probe properties, which allow informing the guest about
different hardware properties instead of relying on one global property in
the virtio config.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-06 14:35         ` Jean-Philippe Brucker
@ 2023-07-06 15:38           ` Eric Auger
  2023-07-07  2:34           ` Duan, Zhenzhong
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Auger @ 2023-07-06 15:38 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Duan, Zhenzhong, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
	clg@redhap.com, bharat.bhushan@nxp.com, peter.maydell@linaro.org

Hi Jean,

On 7/6/23 16:35, Jean-Philippe Brucker wrote:
> On Wed, Jul 05, 2023 at 03:16:31PM +0200, Eric Auger wrote:
>>>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
>>>>> 1eaf81bab5..0d9f7196fe 100644
>>>>> --- a/hw/virtio/virtio-iommu.c
>>>>> +++ b/hw/virtio/virtio-iommu.c
>>>>> @@ -1101,29 +1101,24 @@ static int
>>>>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>>>>>                                           new_mask);
>>>>>
>>>>>     if ((cur_mask & new_mask) == 0) {
>>>>> -        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>>>> -                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
>>>>> +        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
>>>>> +                   " incompatible with currently supported mask 0x%"PRIx64,
>>>>> +                   mr->parent_obj.name, new_mask, cur_mask);
>>>>>         return -1;
>>>>>     }
>>>>>
>>>>>     /*
>>>>>      * Once the granule is frozen we can't change the mask anymore. If by
>>>>>      * chance the hotplugged device supports the same granule, we can still
>>>>> -     * accept it. Having a different masks is possible but the guest will use
>>>>> -     * sub-optimal block sizes, so warn about it.
>>>>> +     * accept it.
>>>>>      */
>>>>>     if (s->granule_frozen) {
>>>>> -        int new_granule = ctz64(new_mask);
>>>>>         int cur_granule = ctz64(cur_mask);
>>>>>
>>>>> -        if (new_granule != cur_granule) {
>>>>> -            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>>>>> -                       " is incompatible with mask 0x%"PRIx64, cur_mask,
>>>>> -                       new_mask);
>>>>> +        if (!(BIT(cur_granule) & new_mask)) {
>>> Sorry, I read this piece of code again and got a question: if new_mask has finer
>>> granularity than cur_granule, should we allow it to pass even though
>>> BIT(cur_granule) is not set?
>> I think this should work but this is not straightforward to test.
>> virtio-iommu would use the current granule for map/unmap. In map/unmap
>> notifiers, this is split into pow2 ranges and cascaded to VFIO through
>> vfio_dma_map/unmap. The iova and size are aligned with the smaller
>> supported granule.
>>
>> Jean, do you share this understanding, or am I missing something?
> Yes, I also think that would work. The guest would only issue mappings
> with the larger granularity, which can be applied by VFIO with a finer
> granule. However I doubt we're going to encounter this case, because
> seeing a cur_granule larger than 4k here means that a VFIO device has
> already been assigned with a large granule like 64k, and we're trying to
> add a new device with 4k. This indicates two HW IOMMUs supporting
> different granules in the same system, which seems unlikely.

agreed
>
> Hopefully by the time we actually need this (if ever) we will support
> per-endpoint probe properties, which allow informing the guest about
> different hardware properties instead of relying on one global property in
> the virtio config.
yes looking forward to reviewing that.

Thanks

Eric
>
> Thanks,
> Jean
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-05 13:16       ` Eric Auger
  2023-07-06  8:56         ` Duan, Zhenzhong
  2023-07-06 14:35         ` Jean-Philippe Brucker
@ 2023-07-06 18:52         ` Michael S. Tsirkin
  2 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2023-07-06 18:52 UTC (permalink / raw)
  To: Eric Auger
  Cc: Duan, Zhenzhong, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, jean-philippe@linaro.org,
	alex.williamson@redhat.com, clg@redhap.com,
	bharat.bhushan@nxp.com, peter.maydell@linaro.org

On Wed, Jul 05, 2023 at 03:16:31PM +0200, Eric Auger wrote:
> Hi Zhenzhong,
> 
> On 7/5/23 10:17, Duan, Zhenzhong wrote:
> >
> >> -----Original Message-----
> >> From: Duan, Zhenzhong
> >> Sent: Wednesday, July 5, 2023 12:56 PM
> >> Subject: RE: [PATCH 2/2] virtio-iommu: Rework the trace in
> >> virtio_iommu_set_page_size_mask()
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Eric Auger <eric.auger@redhat.com>
> >>> Sent: Tuesday, July 4, 2023 7:15 PM
> >>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com; qemu-
> >>> devel@nongnu.org; qemu-arm@nongnu.org; mst@redhat.com; jean-
> >>> philippe@linaro.org; Duan, Zhenzhong <zhenzhong.duan@intel.com>
> >>> Cc: alex.williamson@redhat.com; clg@redhap.com;
> >> bharat.bhushan@nxp.com;
> >>> peter.maydell@linaro.org
> >>> Subject: [PATCH 2/2] virtio-iommu: Rework the trace in
> >>> virtio_iommu_set_page_size_mask()
> >>>
> >>> The current error messages in virtio_iommu_set_page_size_mask() sound
> >>> quite similar for different situations and miss the IOMMU memory region
> >>> that causes the issue.
> >>>
> >>> Clarify them and rework the comment.
> >>>
> >>> Also remove the trace when the new page_size_mask is not applied as the
> >>> current frozen granule is kept. This message is rather confusing for
> >>> the end user and anyway the current granule would have been used by the
> >>> driver
> >>>
> >>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>> ---
> >>> hw/virtio/virtio-iommu.c | 19 +++++++------------
> >>> 1 file changed, 7 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
> >>> 1eaf81bab5..0d9f7196fe 100644
> >>> --- a/hw/virtio/virtio-iommu.c
> >>> +++ b/hw/virtio/virtio-iommu.c
> >>> @@ -1101,29 +1101,24 @@ static int
> >>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
> >>>                                           new_mask);
> >>>
> >>>     if ((cur_mask & new_mask) == 0) {
> >>> -        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
> >>> -                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
> >>> +        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
> >>> +                   " incompatible with currently supported mask 0x%"PRIx64,
> >>> +                   mr->parent_obj.name, new_mask, cur_mask);
> >>>         return -1;
> >>>     }
> >>>
> >>>     /*
> >>>      * Once the granule is frozen we can't change the mask anymore. If by
> >>>      * chance the hotplugged device supports the same granule, we can still
> >>> -     * accept it. Having a different masks is possible but the guest will use
> >>> -     * sub-optimal block sizes, so warn about it.
> >>> +     * accept it.
> >>>      */
> >>>     if (s->granule_frozen) {
> >>> -        int new_granule = ctz64(new_mask);
> >>>         int cur_granule = ctz64(cur_mask);
> >>>
> >>> -        if (new_granule != cur_granule) {
> >>> -            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
> >>> -                       " is incompatible with mask 0x%"PRIx64, cur_mask,
> >>> -                       new_mask);
> >>> +        if (!(BIT(cur_granule) & new_mask)) {
> > Sorry, I read this piece of code again and got a question: if new_mask has finer
> > granularity than cur_granule, should we allow it to pass even though
> > BIT(cur_granule) is not set?
> I think this should work but this is not straightforward to test.
> virtio-iommu would use the current granule for map/unmap. In map/unmap
> notifiers, this is split into pow2 ranges and cascaded to VFIO through
> vfio_dma_map/unmap. The iova and size are aligned with the smaller
> supported granule.


Maybe add a comment down the road. Not urgent.

> Jean, do you share this understanding, or am I missing something?
> 
> Nevertheless the current code would have rejected that case and nobody
> complained at that point ;-)
> 
> thanks
> 
> Eric
> 
> >
> > Thanks
> > Zhenzhong
> >
> >> Good catch.
> >>
> >> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> >>
> >> Thanks
> >> Zhenzhong
> >>
> >>> +            error_setg(errp, "virtio-iommu %s does not support frozen granule 0x%"PRIx64,
> >>> +                       mr->parent_obj.name, BIT(cur_granule));
> >>>             return -1;
> >>> -        } else if (new_mask != cur_mask) {
> >>> -            warn_report("virtio-iommu page mask 0x%"PRIx64
> >>> -                        " does not match 0x%"PRIx64, cur_mask, new_mask);
> >>>         }
> >>>         return 0;
> >>>     }
> >>> --
> >>> 2.38.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask()
  2023-07-06 14:35         ` Jean-Philippe Brucker
  2023-07-06 15:38           ` Eric Auger
@ 2023-07-07  2:34           ` Duan, Zhenzhong
  1 sibling, 0 replies; 18+ messages in thread
From: Duan, Zhenzhong @ 2023-07-07  2:34 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Eric Auger
  Cc: eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, mst@redhat.com, alex.williamson@redhat.com,
	clg@redhap.com, bharat.bhushan@nxp.com, peter.maydell@linaro.org

>-----Original Message-----
>From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>Sent: Thursday, July 6, 2023 10:36 PM
>Subject: Re: [PATCH 2/2] virtio-iommu: Rework the trace in
>virtio_iommu_set_page_size_mask()
>
>On Wed, Jul 05, 2023 at 03:16:31PM +0200, Eric Auger wrote:
>> >>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> >>> index 1eaf81bab5..0d9f7196fe 100644
>> >>> --- a/hw/virtio/virtio-iommu.c
>> >>> +++ b/hw/virtio/virtio-iommu.c
>> >>> @@ -1101,29 +1101,24 @@ static int
>> >>> virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>> >>>                                           new_mask);
>> >>>
>> >>>     if ((cur_mask & new_mask) == 0) {
>> >>> -        error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>> >>> -                   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
>> >>> +        error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
>> >>> +                   " incompatible with currently supported mask 0x%"PRIx64,
>> >>> +                   mr->parent_obj.name, new_mask, cur_mask);
>> >>>         return -1;
>> >>>     }
>> >>>
>> >>>     /*
>> >>>      * Once the granule is frozen we can't change the mask anymore. If by
>> >>>      * chance the hotplugged device supports the same granule, we can still
>> >>> -     * accept it. Having a different masks is possible but the guest will use
>> >>> -     * sub-optimal block sizes, so warn about it.
>> >>> +     * accept it.
>> >>>      */
>> >>>     if (s->granule_frozen) {
>> >>> -        int new_granule = ctz64(new_mask);
>> >>>         int cur_granule = ctz64(cur_mask);
>> >>>
>> >>> -        if (new_granule != cur_granule) {
>> >>> -            error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
>> >>> -                       " is incompatible with mask 0x%"PRIx64, cur_mask,
>> >>> -                       new_mask);
>> >>> +        if (!(BIT(cur_granule) & new_mask)) {
>> > Sorry, I read this piece of code again and got a question: if new_mask
>> > has finer granularity than cur_granule, should we allow it to pass
>> > even though
>> > BIT(cur_granule) is not set?
>> I think this should work but this is not straightforward to test.
>> virtio-iommu would use the current granule for map/unmap. In map/unmap
>> notifiers, this is split into pow2 ranges and cascaded to VFIO through
>> vfio_dma_map/unmap. The iova and size are aligned with the smaller
>> supported granule.
>>
>> Jean, do you share this understanding, or am I missing something?
>
>Yes, I also think that would work. The guest would only issue mappings with
>the larger granularity, which can be applied by VFIO with a finer granule.
>However I doubt we're going to encounter this case, because seeing a
>cur_granule larger than 4k here means that a VFIO device has already been
>assigned with a large granule like 64k, and we're trying to add a new device
>with 4k. This indicates two HW IOMMUs supporting different granules in the
>same system, which seems unlikely.

Indeed. Another scenario I can think of is migration from a src with a 64k granule
to a dest with 4k, then hotplugging a new device with 4k. It is not clear whether
migrating between hosts with different granules is allowed.

Thanks
Zhenzhong


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-07-07  2:36 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-04 11:15 [PATCH 0/2] VIRTIO-IOMMU/VFIO page size related fixes Eric Auger
2023-07-04 11:15 ` [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment Eric Auger
2023-07-05  4:52   ` Duan, Zhenzhong
2023-07-05  6:19     ` Eric Auger
2023-07-05  8:11       ` Duan, Zhenzhong
2023-07-05  8:29     ` Jean-Philippe Brucker
2023-07-05 10:13       ` Duan, Zhenzhong
2023-07-05 11:33         ` Jean-Philippe Brucker
2023-07-06  9:00           ` Duan, Zhenzhong
2023-07-04 11:15 ` [PATCH 2/2] virtio-iommu: Rework the trace in virtio_iommu_set_page_size_mask() Eric Auger
2023-07-05  4:55   ` Duan, Zhenzhong
2023-07-05  8:17     ` Duan, Zhenzhong
2023-07-05 13:16       ` Eric Auger
2023-07-06  8:56         ` Duan, Zhenzhong
2023-07-06 14:35         ` Jean-Philippe Brucker
2023-07-06 15:38           ` Eric Auger
2023-07-07  2:34           ` Duan, Zhenzhong
2023-07-06 18:52         ` Michael S. Tsirkin
