qemu-devel.nongnu.org archive mirror
* [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
@ 2023-10-11 17:52 Eric Auger
  2023-10-11 17:52 ` [PATCH v3 01/13] memory: Let ReservedRegion use Range Eric Auger
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

This applies on top of vfio-next:
https://github.com/legoater/qemu/, vfio-next branch

On x86, when assigning VFIO-PCI devices protected with virtio-iommu
we encounter the case where the guest tries to map IOVAs beyond 48b
whereas the physical VTD IOMMU only supports 48b. This ends up with
VFIO_MAP_DMA failures at QEMU level because, at kernel level, the
vfio_iommu_iova_dma_valid() check returns false in vfio_dma_do_map().

This is due to the fact that the virtio-iommu currently unconditionally
exposes an IOVA range of 64b through its config input range fields.

This series removes this assumption by retrieving the usable IOVA
regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
a VFIO device is attached. This info is communicated to the
virtio-iommu memory region and transformed into the inverse info,
i.e. the host reserved IOVA regions. The latter are then combined
with the reserved IOVA regions set through the virtio-iommu
reserved-regions property. That way, the guest virtio-iommu driver,
unchanged, is able to probe the whole set of reserved regions and
prevent any IOVA belonging to those ranges from being used, achieving
the original goal.
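
The inversion step described above can be sketched as follows. This is a
standalone simplification, not the series' actual range_inverse_array()
(which operates on GLists); the SimpleRange type, invert_ranges() name and
the 48b/MSI-window numbers in the comments are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t lob, upb; } SimpleRange; /* inclusive bounds */

/*
 * Compute the complement of @nr sorted, non-overlapping usable ranges
 * within [low, high]: everything not usable becomes a reserved range.
 * Assumes all usable ranges lie within [low, high].
 * Returns the number of reserved ranges written to @out.
 */
static int invert_ranges(const SimpleRange *usable, int nr,
                         uint64_t low, uint64_t high, SimpleRange *out)
{
    int n = 0;
    uint64_t next = low;

    for (int i = 0; i < nr; i++) {
        if (usable[i].lob > next) {
            out[n].lob = next;
            out[n].upb = usable[i].lob - 1;
            n++;
        }
        /* may wrap if upb == UINT64_MAX, but then the tail emit is skipped */
        next = usable[i].upb + 1;
    }
    if (nr == 0 || usable[nr - 1].upb < high) {
        out[n].lob = next;
        out[n].upb = high;
        n++;
    }
    return n;
}
```

For instance, with usable host ranges [0x0, 0xFEDFFFFF] and
[0xFEF00000, 0xFFFFFFFFFFFF] (a 48b VTD with an MSI hole), the inverse
over the full 64b space yields two reserved regions: the MSI window and
everything above 48b.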

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/vfio-next-iommu_geometry-v3

History:
v2 -> v3:
- rebase on top of vfio-next (including iommufd prereq)
- take into account that the IOVA range info capability may not be
  offered by an old kernel and use nr_iovas = -1 to encode that [Alex]
- use GList * everywhere instead of arrays (in range_inverse_array),
  with the benefit that it sorts the ranges retrieved from the kernel,
  which are not guaranteed to be sorted. Rework the tests accordingly
  [Alex]
- Make sure the resv_regions GList is built before probe() [Jean]:
  the per-device list is first populated with the property resv
  regions on IOMMUDevice creation and then rebuilt on set_iova_ranges()
- Add a warning if set_iova_ranges() builds a valid list after probe
  was called [Jean]
- Build host windows on top of the IOVA valid ranges if this info can
  be retrieved from the kernel. As many windows are created as there
  are valid ranges
v1 -> v2:
- Remove "[PATCH 12/13] virtio-iommu: Resize memory region according
  to the max iova info", which caused way too much trouble: it
  triggered a coredump in vhost and caused duplication of IOMMU
  notifiers leading to EEXIST vfio_dma_map errors, ... This looks
  like a bad usage of the memory API so I prefer removing it from
  this series. I was therefore also obliged to remove the
  vfio_find_hostwin() check in the IOMMU case.
- Let range_inverse_array() take low/high args instead of hardcoding
  0, UINT64_MAX, which complicated both the algorithm and the tests.
- Move the range function descriptions into the header.
- Check that if set_iova_ranges is called several times, the new resv
  regions are included in the previous ones

Eric Auger (13):
  memory: Let ReservedRegion use Range
  memory: Introduce memory_region_iommu_set_iova_ranges
  vfio: Collect container iova range info
  virtio-iommu: Rename reserved_regions into prop_resv_regions
  range: Make range_compare() public
  util/reserved-region: Add new ReservedRegion helpers
  virtio-iommu: Introduce per IOMMUDevice reserved regions
  range: Introduce range_inverse_array()
  virtio-iommu: Record whether a probe request has been issued
  virtio-iommu: Implement set_iova_ranges() callback
  virtio-iommu: Consolidate host reserved regions and property set ones
  test: Add some tests for range and resv-mem helpers
  vfio: Remove 64-bit IOVA address space assumption

 include/exec/memory.h            |  34 +++-
 include/hw/vfio/vfio-common.h    |   2 +
 include/hw/virtio/virtio-iommu.h |   7 +-
 include/qemu/range.h             |  14 ++
 include/qemu/reserved-region.h   |  32 ++++
 hw/core/qdev-properties-system.c |   9 +-
 hw/vfio/common.c                 |  23 ++-
 hw/vfio/container.c              |  67 ++++++-
 hw/virtio/virtio-iommu-pci.c     |   8 +-
 hw/virtio/virtio-iommu.c         | 155 +++++++++++++--
 system/memory.c                  |  13 ++
 tests/unit/test-resv-mem.c       | 318 +++++++++++++++++++++++++++++++
 util/range.c                     |  61 +++++-
 util/reserved-region.c           |  91 +++++++++
 hw/virtio/trace-events           |   1 +
 tests/unit/meson.build           |   1 +
 util/meson.build                 |   1 +
 17 files changed, 791 insertions(+), 46 deletions(-)
 create mode 100644 include/qemu/reserved-region.h
 create mode 100644 tests/unit/test-resv-mem.c
 create mode 100644 util/reserved-region.c

-- 
2.41.0




* [PATCH v3 01/13] memory: Let ReservedRegion use Range
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

A reserved region is a range tagged with a type. Let's directly use
the Range type, with the prospect of reusing some of the library
helpers shipped with it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>

---

v2 -> v3:
- added Cedric's R-b

v1 -> v2:
- Added Philippe and David's R-b
---
 include/exec/memory.h            | 4 ++--
 hw/core/qdev-properties-system.c | 9 ++++++---
 hw/virtio/virtio-iommu.c         | 6 +++---
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c99842d2fc..0d83f83d6d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -24,6 +24,7 @@
 #include "qemu/bswap.h"
 #include "qemu/queue.h"
 #include "qemu/int128.h"
+#include "qemu/range.h"
 #include "qemu/notify.h"
 #include "qom/object.h"
 #include "qemu/rcu.h"
@@ -79,8 +80,7 @@ extern unsigned int global_dirty_tracking;
 typedef struct MemoryRegionOps MemoryRegionOps;
 
 struct ReservedRegion {
-    hwaddr low;
-    hwaddr high;
+    Range range;
     unsigned type;
 };
 
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 688340610e..07ac2f8f20 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -691,7 +691,7 @@ static void get_reserved_region(Object *obj, Visitor *v, const char *name,
     int rc;
 
     rc = snprintf(buffer, sizeof(buffer), "0x%"PRIx64":0x%"PRIx64":%u",
-                  rr->low, rr->high, rr->type);
+                  range_lob(&rr->range), range_upb(&rr->range), rr->type);
     assert(rc < sizeof(buffer));
 
     visit_type_str(v, name, &p, errp);
@@ -703,6 +703,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
     Property *prop = opaque;
     ReservedRegion *rr = object_field_prop_ptr(obj, prop);
     const char *endptr;
+    uint64_t lob, upb;
     char *str;
     int ret;
 
@@ -710,7 +711,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
         return;
     }
 
-    ret = qemu_strtou64(str, &endptr, 16, &rr->low);
+    ret = qemu_strtou64(str, &endptr, 16, &lob);
     if (ret) {
         error_setg(errp, "start address of '%s'"
                    " must be a hexadecimal integer", name);
@@ -720,7 +721,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
         goto separator_error;
     }
 
-    ret = qemu_strtou64(endptr + 1, &endptr, 16, &rr->high);
+    ret = qemu_strtou64(endptr + 1, &endptr, 16, &upb);
     if (ret) {
         error_setg(errp, "end address of '%s'"
                    " must be a hexadecimal integer", name);
@@ -730,6 +731,8 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
         goto separator_error;
     }
 
+    range_set_bounds(&rr->range, lob, upb);
+
     ret = qemu_strtoui(endptr + 1, &endptr, 10, &rr->type);
     if (ret) {
         error_setg(errp, "type of '%s'"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index be51635895..e5e46e1b55 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -645,8 +645,8 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
         prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
         prop.head.length = cpu_to_le16(length);
         prop.subtype = subtype;
-        prop.start = cpu_to_le64(s->reserved_regions[i].low);
-        prop.end = cpu_to_le64(s->reserved_regions[i].high);
+        prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
+        prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
 
         memcpy(buf, &prop, size);
 
@@ -897,7 +897,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     for (i = 0; i < s->nb_reserved_regions; i++) {
         ReservedRegion *reg = &s->reserved_regions[i];
 
-        if (addr >= reg->low && addr <= reg->high) {
+        if (range_contains(&reg->range, addr)) {
             switch (reg->type) {
             case VIRTIO_IOMMU_RESV_MEM_T_MSI:
                 entry.perm = flag;
-- 
2.41.0




* [PATCH v3 02/13] memory: Introduce memory_region_iommu_set_iova_ranges
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
  2023-10-11 17:52 ` [PATCH v3 01/13] memory: Let ReservedRegion use Range Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-18 22:07   ` Peter Xu
  2023-10-11 17:52 ` [PATCH v3 03/13] vfio: Collect container iova range info Eric Auger
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

This helper will allow conveying information about valid
IOVA ranges to virtual IOMMUs.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- Pass a GList instead
- Added a comment on memory_region_iommu_set_iova_ranges
- Removed R-b due to that change
---
 include/exec/memory.h | 30 ++++++++++++++++++++++++++++++
 system/memory.c       | 13 +++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0d83f83d6d..eb52c75192 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -525,6 +525,24 @@ struct IOMMUMemoryRegionClass {
      int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
                                      uint64_t page_size_mask,
                                      Error **errp);
+    /**
+     * @iommu_set_iova_ranges:
+     *
+     * Propagate information about the usable IOVA ranges for a given IOMMU
+     * memory region. Used for example to propagate host physical device
+     * reserved memory region constraints to the virtual IOMMU.
+     *
+     * Optional method: if this method is not provided, then the default IOVA
+     * aperture is used.
+     *
+     * @iova_ranges: list of ordered IOVA ranges (at least one range)
+     *
+     * Returns 0 on success, or a negative error. In case of failure, the error
+     * object must be created.
+     */
+     int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
+                                  GList *iova_ranges,
+                                  Error **errp);
 };
 
 typedef struct RamDiscardListener RamDiscardListener;
@@ -1852,6 +1870,18 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
                                            uint64_t page_size_mask,
                                            Error **errp);
 
+/**
+ * memory_region_iommu_set_iova_ranges - Set the usable IOVA ranges
+ * for a given IOMMU MR region
+ *
+ * @iommu_mr: IOMMU memory region
+ * @iova_ranges: list of ordered IOVA ranges (at least one range)
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu,
+                                        GList *iova_ranges,
+                                        Error **errp);
+
 /**
  * memory_region_name: get a memory region's name
  *
diff --git a/system/memory.c b/system/memory.c
index fa1c99f9ba..26427780bd 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1909,6 +1909,19 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
     return ret;
 }
 
+int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu_mr,
+                                        GList *iova_ranges,
+                                        Error **errp)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+    int ret = 0;
+
+    if (imrc->iommu_set_iova_ranges) {
+        ret = imrc->iommu_set_iova_ranges(iommu_mr, iova_ranges, errp);
+    }
+    return ret;
+}
+
 int memory_region_register_iommu_notifier(MemoryRegion *mr,
                                           IOMMUNotifier *n, Error **errp)
 {
-- 
2.41.0




* [PATCH v3 03/13] vfio: Collect container iova range info
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
  2023-10-11 17:52 ` [PATCH v3 01/13] memory: Let ReservedRegion use Range Eric Auger
  2023-10-11 17:52 ` [PATCH v3 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-18 19:07   ` Alex Williamson
  2023-10-11 17:52 ` [PATCH v3 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
capability is supported.

This allows propagating the information through the IOMMU MR
set_iova_ranges() callback so that virtual IOMMUs
become aware of those aperture constraints. This is only done if
the info is available and the number of IOVA ranges is greater than
0.

A new vfio_get_info_iova_range helper is introduced, matching
the coding style of the existing vfio_get_info_dma_avail. The
boolean return value isn't used though. The code is aligned
between both.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- Turn nr_iovas into a int initialized to -1
- memory_region_iommu_set_iova_ranges only is called if nr_iovas > 0
- vfio_get_info_iova_range returns a bool to match
  vfio_get_info_dma_avail. Make both pieces of code uniform by using
  !hdr in the check
- rebase on top of vfio-next
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/common.c              |  9 +++++++
 hw/vfio/container.c           | 44 ++++++++++++++++++++++++++++++++---
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7780b9073a..848ff47960 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -99,6 +99,8 @@ typedef struct VFIOContainer {
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
     QLIST_ENTRY(VFIOContainer) next;
     QLIST_HEAD(, VFIODevice) device_list;
+    int nr_iovas;
+    GList *iova_ranges;
 } VFIOContainer;
 
 typedef struct VFIOGuestIOMMU {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5ff5acf1d8..9d804152ba 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -699,6 +699,15 @@ static void vfio_listener_region_add(MemoryListener *listener,
             goto fail;
         }
 
+        if (container->nr_iovas > 0) {
+            ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
+                    container->iova_ranges, &err);
+            if (ret) {
+                g_free(giommu);
+                goto fail;
+            }
+        }
+
         ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
                                                     &err);
         if (ret) {
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index adc467210f..5122ff6d92 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -382,7 +382,7 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
     /* If the capability cannot be found, assume no DMA limiting */
     hdr = vfio_get_iommu_type1_info_cap(info,
                                         VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL);
-    if (hdr == NULL) {
+    if (!hdr) {
         return false;
     }
 
@@ -394,6 +394,33 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
     return true;
 }
 
+static bool vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
+                                     VFIOContainer *container)
+{
+    struct vfio_info_cap_header *hdr;
+    struct vfio_iommu_type1_info_cap_iova_range *cap;
+
+    hdr = vfio_get_iommu_type1_info_cap(info,
+                                        VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
+    if (!hdr) {
+        return false;
+    }
+
+    cap = (void *)hdr;
+
+    container->nr_iovas = cap->nr_iovas;
+    for (int i = 0; i < cap->nr_iovas; i++) {
+        Range *range = g_new(Range, 1);
+
+        range_set_bounds(range, cap->iova_ranges[i].start,
+                         cap->iova_ranges[i].end);
+        container->iova_ranges =
+            range_list_insert(container->iova_ranges, range);
+    }
+
+    return true;
+}
+
 static void vfio_kvm_device_add_group(VFIOGroup *group)
 {
     Error *err = NULL;
@@ -535,6 +562,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
     }
 }
 
+static void vfio_free_container(VFIOContainer *container)
+{
+    g_list_free_full(container->iova_ranges, g_free);
+    g_free(container);
+}
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                                   Error **errp)
 {
@@ -616,6 +649,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->error = NULL;
     container->dirty_pages_supported = false;
     container->dma_max_mappings = 0;
+    container->nr_iovas = -1;
+    container->iova_ranges = NULL;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->hostwin_list);
     QLIST_INIT(&container->vrdl_list);
@@ -652,6 +687,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
             container->dma_max_mappings = 65535;
         }
+
+        vfio_get_info_iova_range(info, container);
+
         vfio_get_iommu_info_migration(container, info);
         g_free(info);
 
@@ -765,7 +803,7 @@ enable_discards_exit:
     vfio_ram_block_discard_disable(container, false);
 
 free_container_exit:
-    g_free(container);
+    vfio_free_container(container);
 
 close_fd_exit:
     close(fd);
@@ -819,7 +857,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
         trace_vfio_disconnect_container(container->fd);
         close(container->fd);
-        g_free(container);
+        vfio_free_container(container);
 
         vfio_put_address_space(space);
     }
-- 
2.41.0




* [PATCH v3 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (2 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 03/13] vfio: Collect container iova range info Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 05/13] range: Make range_compare() public Eric Auger
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Rename the VirtIOIOMMU (nb_)reserved_regions fields with the "prop_"
prefix to highlight that those fields are set through a property, at
machine level. They are IOMMU wide.

A subsequent patch will introduce per-IOMMUDevice reserved regions
that will include both those IOMMU-wide property reserved
regions plus, sometimes, host reserved regions, if the device is
backed by a host device protected by a physical IOMMU. Also change
the nb_ prefix into nr_.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>

---

v2 -> v3:
- added Cedric's R-b
---
 include/hw/virtio/virtio-iommu.h |  4 ++--
 hw/virtio/virtio-iommu-pci.c     |  8 ++++----
 hw/virtio/virtio-iommu.c         | 15 ++++++++-------
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index a93fc5383e..eea4564782 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -55,8 +55,8 @@ struct VirtIOIOMMU {
     GHashTable *as_by_busptr;
     IOMMUPciBus *iommu_pcibus_by_bus_num[PCI_BUS_MAX];
     PCIBus *primary_bus;
-    ReservedRegion *reserved_regions;
-    uint32_t nb_reserved_regions;
+    ReservedRegion *prop_resv_regions;
+    uint32_t nr_prop_resv_regions;
     GTree *domains;
     QemuRecMutex mutex;
     GTree *endpoints;
diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
index 7ef2f9dcdb..9459fbf6ed 100644
--- a/hw/virtio/virtio-iommu-pci.c
+++ b/hw/virtio/virtio-iommu-pci.c
@@ -37,7 +37,7 @@ struct VirtIOIOMMUPCI {
 static Property virtio_iommu_pci_properties[] = {
     DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
     DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
-                      vdev.nb_reserved_regions, vdev.reserved_regions,
+                      vdev.nr_prop_resv_regions, vdev.prop_resv_regions,
                       qdev_prop_reserved_region, ReservedRegion),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -54,9 +54,9 @@ static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
                          "for the virtio-iommu-pci device");
         return;
     }
-    for (int i = 0; i < s->nb_reserved_regions; i++) {
-        if (s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
-            s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
+    for (int i = 0; i < s->nr_prop_resv_regions; i++) {
+        if (s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+            s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
             error_setg(errp, "reserved region %d has an invalid type", i);
             error_append_hint(errp, "Valid values are 0 and 1\n");
             return;
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index e5e46e1b55..979cdb5648 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -631,22 +631,23 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
     size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
     int i;
 
-    total = size * s->nb_reserved_regions;
+    total = size * s->nr_prop_resv_regions;
 
     if (total > free) {
         return -ENOSPC;
     }
 
-    for (i = 0; i < s->nb_reserved_regions; i++) {
-        unsigned subtype = s->reserved_regions[i].type;
+    for (i = 0; i < s->nr_prop_resv_regions; i++) {
+        unsigned subtype = s->prop_resv_regions[i].type;
+        Range *range = &s->prop_resv_regions[i].range;
 
         assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
                subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
         prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
         prop.head.length = cpu_to_le16(length);
         prop.subtype = subtype;
-        prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
-        prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
+        prop.start = cpu_to_le64(range_lob(range));
+        prop.end = cpu_to_le64(range_upb(range));
 
         memcpy(buf, &prop, size);
 
@@ -894,8 +895,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto unlock;
     }
 
-    for (i = 0; i < s->nb_reserved_regions; i++) {
-        ReservedRegion *reg = &s->reserved_regions[i];
+    for (i = 0; i < s->nr_prop_resv_regions; i++) {
+        ReservedRegion *reg = &s->prop_resv_regions[i];
 
         if (range_contains(&reg->range, addr)) {
             switch (reg->type) {
-- 
2.41.0




* [PATCH v3 05/13] range: Make range_compare() public
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (3 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 06/13] util/reserved-region: Add new ReservedRegion helpers Eric Auger
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Let's expose range_compare() in the header so that it can be
reused outside of util/range.c.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

---

v1 -> v2:
- Added Philippe's R-b
---
 include/qemu/range.h | 6 ++++++
 util/range.c         | 6 +-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/qemu/range.h b/include/qemu/range.h
index 7e2b1cc447..aa671da143 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -217,6 +217,12 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
     return !(last2 < first1 || last1 < first2);
 }
 
+/*
+ * Return -1 if @a < @b, 1 @a > @b, and 0 if they touch or overlap.
+ * Both @a and @b must not be empty.
+ */
+int range_compare(Range *a, Range *b);
+
 GList *range_list_insert(GList *list, Range *data);
 
 #endif
diff --git a/util/range.c b/util/range.c
index 098d9d2dc0..782cb8b21c 100644
--- a/util/range.c
+++ b/util/range.c
@@ -20,11 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/range.h"
 
-/*
- * Return -1 if @a < @b, 1 @a > @b, and 0 if they touch or overlap.
- * Both @a and @b must not be empty.
- */
-static inline int range_compare(Range *a, Range *b)
+int range_compare(Range *a, Range *b)
 {
     assert(!range_is_empty(a) && !range_is_empty(b));
 
-- 
2.41.0




* [PATCH v3 06/13] util/reserved-region: Add new ReservedRegion helpers
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (4 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 05/13] range: Make range_compare() public Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 07/13] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Introduce the resv_region_list_insert() helper, which inserts a new
ReservedRegion into a sorted list of reserved regions. In case of
overlap, the new region has higher priority and hides the existing
overlapped segments. If the overlap is partial, new regions are
created for the parts which are not overlapped. The new region has
higher priority independently of the type of the regions.
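
Those overlap rules can be sketched standalone as follows (a simplified
illustration only: a fixed array instead of the GList used by the actual
helper; the Reg type and reg_insert() name are hypothetical):

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t lob, upb; unsigned type; } Reg; /* inclusive bounds */

/*
 * Insert @reg into the sorted array @regs (*nr elements, caller-provided
 * capacity), giving it priority over any overlapped segments: fully
 * covered entries are dropped, partially covered ones are trimmed, and
 * an entry that strictly contains @reg is split in two around it.
 */
static void reg_insert(Reg *regs, int *nr, Reg reg)
{
    Reg out[16];
    int n = 0, pos = 0;

    for (int i = 0; i < *nr; i++) {
        Reg cur = regs[i];

        if (cur.upb < reg.lob || cur.lob > reg.upb) {
            out[n++] = cur;                              /* no overlap, keep */
            continue;
        }
        if (cur.lob < reg.lob) {                         /* left remainder */
            out[n++] = (Reg){ cur.lob, reg.lob - 1, cur.type };
        }
        if (cur.upb > reg.upb) {                         /* right remainder */
            out[n++] = (Reg){ reg.upb + 1, cur.upb, cur.type };
        }
    }
    /* insert @reg at its sorted position */
    while (pos < n && out[pos].lob < reg.lob) {
        pos++;
    }
    memmove(&out[pos + 1], &out[pos], (n - pos) * sizeof(Reg));
    out[pos] = reg;
    n++;
    memcpy(regs, out, n * sizeof(Reg));
    *nr = n;
}
```

For instance, inserting [0x2000, 0x2fff] of type 1 into a list holding
[0x1000, 0x3fff] of type 0 splits the existing entry into three regions,
with the new one in the middle.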

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

---

v2 -> v3:
- use l = prev->next
- Added jean's R-b
---
 include/qemu/reserved-region.h | 32 ++++++++++++
 util/reserved-region.c         | 91 ++++++++++++++++++++++++++++++++++
 util/meson.build               |  1 +
 3 files changed, 124 insertions(+)
 create mode 100644 include/qemu/reserved-region.h
 create mode 100644 util/reserved-region.c

diff --git a/include/qemu/reserved-region.h b/include/qemu/reserved-region.h
new file mode 100644
index 0000000000..8e6f0a97e2
--- /dev/null
+++ b/include/qemu/reserved-region.h
@@ -0,0 +1,32 @@
+/*
+ * QEMU ReservedRegion helpers
+ *
+ * Copyright (c) 2023 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_RESERVED_REGION_H
+#define QEMU_RESERVED_REGION_H
+
+#include "exec/memory.h"
+
+/*
+ * Insert a new region into a sorted list of reserved regions. In case
+ * there is overlap with existing regions, the new added region has
+ * higher priority and replaces the overlapped segment.
+ */
+GList *resv_region_list_insert(GList *list, ReservedRegion *reg);
+
+#endif
diff --git a/util/reserved-region.c b/util/reserved-region.c
new file mode 100644
index 0000000000..18f83eb4c6
--- /dev/null
+++ b/util/reserved-region.c
@@ -0,0 +1,91 @@
+/*
+ * QEMU ReservedRegion helpers
+ *
+ * Copyright (c) 2023 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/range.h"
+#include "qemu/reserved-region.h"
+
+GList *resv_region_list_insert(GList *list, ReservedRegion *reg)
+{
+    ReservedRegion *resv_iter, *new_reg;
+    Range *r = &reg->range;
+    Range *range_iter;
+    GList *l;
+
+    for (l = list; l ; ) {
+        resv_iter = (ReservedRegion *)l->data;
+        range_iter = &resv_iter->range;
+
+        /* Skip all list elements strictly less than range to add */
+        if (range_compare(range_iter, r) < 0) {
+            l = l->next;
+        } else if (range_compare(range_iter, r) > 0) {
+            return g_list_insert_before(list, l, reg);
+        } else { /* there is an overlap */
+            if (range_contains_range(r, range_iter)) {
+                /* new range contains current item, simply remove the latter */
+                GList *prev = l->prev;
+                g_free(l->data);
+                list = g_list_delete_link(list, l);
+                if (prev) {
+                    l = prev->next;
+                } else {
+                    l = list;
+                }
+            } else if (range_contains_range(range_iter, r)) {
+                /* new region is included in the current region */
+                if (range_lob(range_iter) == range_lob(r)) {
+                    /* adjacent on the left side, derives into 2 regions */
+                    range_set_bounds(range_iter, range_upb(r) + 1,
+                                     range_upb(range_iter));
+                    return g_list_insert_before(list, l, reg);
+                } else if (range_upb(range_iter) == range_upb(r)) {
+                    /* adjacent on the right side, derives into 2 regions */
+                    range_set_bounds(range_iter, range_lob(range_iter),
+                                     range_lob(r) - 1);
+                    l = l->next;
+                } else {
+                    uint64_t lob = range_lob(range_iter);
+                    /*
+                     * the new range is in the middle of an existing one,
+                     * split the latter into 3 regions instead
+                     */
+                    range_set_bounds(range_iter, range_upb(r) + 1,
+                                     range_upb(range_iter));
+                    new_reg = g_new0(ReservedRegion, 1);
+                    new_reg->type = resv_iter->type;
+                    range_set_bounds(&new_reg->range,
+                                     lob, range_lob(r) - 1);
+                    list = g_list_insert_before(list, l, new_reg);
+                    return g_list_insert_before(list, l, reg);
+                }
+            } else if (range_lob(r) < range_lob(range_iter)) {
+                range_set_bounds(range_iter, range_upb(r) + 1,
+                                 range_upb(range_iter));
+                return g_list_insert_before(list, l, reg);
+            } else { /* intersection on the upper range */
+                range_set_bounds(range_iter, range_lob(range_iter),
+                                 range_lob(r) - 1);
+                l = l->next;
+            }
+        } /* overlap */
+    }
+    return g_list_append(list, reg);
+}
+
diff --git a/util/meson.build b/util/meson.build
index c4827fd70a..eb677b40c2 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -51,6 +51,7 @@ util_ss.add(files('qdist.c'))
 util_ss.add(files('qht.c'))
 util_ss.add(files('qsp.c'))
 util_ss.add(files('range.c'))
+util_ss.add(files('reserved-region.c'))
 util_ss.add(files('stats64.c'))
 util_ss.add(files('systemd.c'))
 util_ss.add(files('transactions.c'))
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 07/13] virtio-iommu: Introduce per IOMMUDevice reserved regions
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (5 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 06/13] util/reserved-region: Add new ReservedRegion helpers Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 08/13] range: Introduce range_inverse_array() Eric Auger
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

For the time being the per-device reserved regions are just a
duplicate of the IOMMU-wide reserved regions. Subsequent
patches will combine them with host reserved regions, if any.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- do the initialization of the GList when creating the IOMMUDevice
  instead of in virtio_iommu_fill_resv_mem_prop(), which is called
  conditionally in the driver's probe. Also use resv_region_list_insert()
---
 include/hw/virtio/virtio-iommu.h |  1 +
 hw/virtio/virtio-iommu.c         | 37 +++++++++++++++++++++++++-------
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index eea4564782..70b8ace34d 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -39,6 +39,7 @@ typedef struct IOMMUDevice {
     AddressSpace  as;
     MemoryRegion root;          /* The root container of the device */
     MemoryRegion bypass_mr;     /* The alias of shared memory MR */
+    GList *resv_regions;
 } IOMMUDevice;
 
 typedef struct IOMMUPciBus {
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 979cdb5648..0e2370663d 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -26,6 +26,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
 #include "sysemu/sysemu.h"
+#include "qemu/reserved-region.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
@@ -378,6 +379,19 @@ static void virtio_iommu_put_domain(gpointer data)
     g_free(domain);
 }
 
+static void add_prop_resv_regions(IOMMUDevice *sdev)
+{
+    VirtIOIOMMU *s = sdev->viommu;
+    int i;
+
+    for (i = 0; i < s->nr_prop_resv_regions; i++) {
+        ReservedRegion *reg = g_new0(ReservedRegion, 1);
+
+        *reg = s->prop_resv_regions[i];
+        sdev->resv_regions = resv_region_list_insert(sdev->resv_regions, reg);
+    }
+}
+
 static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
                                               int devfn)
 {
@@ -408,6 +422,7 @@ static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
 
         memory_region_init(&sdev->root, OBJECT(s), name, UINT64_MAX);
         address_space_init(&sdev->as, &sdev->root, TYPE_VIRTIO_IOMMU);
+        add_prop_resv_regions(sdev);
 
         /*
          * Build the IOMMU disabled container with aliases to the
@@ -629,17 +644,23 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
 {
     struct virtio_iommu_probe_resv_mem prop = {};
     size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
-    int i;
+    IOMMUDevice *sdev;
+    GList *l;
 
-    total = size * s->nr_prop_resv_regions;
+    sdev = container_of(virtio_iommu_mr(s, ep), IOMMUDevice, iommu_mr);
+    if (!sdev) {
+        return -EINVAL;
+    }
 
+    total = size * g_list_length(sdev->resv_regions);
     if (total > free) {
         return -ENOSPC;
     }
 
-    for (i = 0; i < s->nr_prop_resv_regions; i++) {
-        unsigned subtype = s->prop_resv_regions[i].type;
-        Range *range = &s->prop_resv_regions[i].range;
+    for (l = sdev->resv_regions; l; l = l->next) {
+        ReservedRegion *reg = l->data;
+        unsigned subtype = reg->type;
+        Range *range = &reg->range;
 
         assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
                subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
@@ -857,7 +878,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     bool bypass_allowed;
     int granule;
     bool found;
-    int i;
+    GList *l;
 
     interval.low = addr;
     interval.high = addr + 1;
@@ -895,8 +916,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto unlock;
     }
 
-    for (i = 0; i < s->nr_prop_resv_regions; i++) {
-        ReservedRegion *reg = &s->prop_resv_regions[i];
+    for (l = sdev->resv_regions; l; l = l->next) {
+        ReservedRegion *reg = l->data;
 
         if (range_contains(&reg->range, addr)) {
             switch (reg->type) {
-- 
2.41.0




* [PATCH v3 08/13] range: Introduce range_inverse_array()
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (6 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 07/13] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 09/13] virtio-iommu: Record whether a probe request has been issued Eric Auger
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

This helper inverts a list of sorted regions within a [low, high]
span: the original regions become holes in the output and the
original holes become actual regions covering the rest of the span.
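To illustrate the intended semantics, here is a minimal standalone sketch operating on plain sorted arrays rather than QEMU's Range/GList machinery; `SimpleRange` and `invert_ranges` are hypothetical names introduced for this example only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's Range: inclusive [lob, upb] bounds. */
typedef struct {
    uint64_t lob;
    uint64_t upb;
} SimpleRange;

/*
 * Invert a sorted, non-overlapping array of ranges over [low, high]:
 * each gap between input ranges (clipped to the span) becomes an
 * output range. Returns the number of ranges written to 'out', which
 * must have room for n + 1 entries.
 */
size_t invert_ranges(const SimpleRange *in, size_t n,
                     uint64_t low, uint64_t high, SimpleRange *out)
{
    size_t count = 0;
    uint64_t cursor = low;

    for (size_t i = 0; i < n; i++) {
        if (in[i].upb < cursor) {
            continue;                 /* below the span, nothing to emit */
        }
        if (in[i].lob > high) {
            break;                    /* entirely above the span */
        }
        if (in[i].lob > cursor) {
            out[count].lob = cursor;  /* gap before this range */
            out[count].upb = in[i].lob - 1;
            count++;
        }
        if (in[i].upb >= high) {
            return count;             /* span consumed, avoids upb + 1 overflow */
        }
        cursor = in[i].upb + 1;
    }
    if (cursor <= high) {
        out[count].lob = cursor;      /* trailing gap up to 'high' */
        out[count].upb = high;
        count++;
    }
    return count;
}
```

For instance, inverting the single usable range [0x10000, 0xFFFFFFFFFFFF] over the full 64-bit span yields the two reserved holes [0, 0xFFFF] and [0x1000000000000, UINT64_MAX], matching the behavior the patch's unit tests exercise.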

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- now operate on GLists. Fix the commit msg by mentioning the
  low/high params

v1 -> v2:
- Move range_inverse_array description comment in the header
- Take low/high params
---
 include/qemu/range.h |  8 +++++++
 util/range.c         | 55 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/include/qemu/range.h b/include/qemu/range.h
index aa671da143..205e1da76d 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -225,4 +225,12 @@ int range_compare(Range *a, Range *b);
 
 GList *range_list_insert(GList *list, Range *data);
 
+/*
+ * Invert an array of sorted ranges over the [low, high] span, i.e.
+ * original ranges become holes in the newly allocated inv_ranges
+ */
+void range_inverse_array(GList *in_ranges,
+                         GList **out_ranges,
+                         uint64_t low, uint64_t high);
+
 #endif
diff --git a/util/range.c b/util/range.c
index 782cb8b21c..9605ccfcbe 100644
--- a/util/range.c
+++ b/util/range.c
@@ -66,3 +66,58 @@ GList *range_list_insert(GList *list, Range *data)
 
     return list;
 }
+
+static inline
+GList *append_new_range(GList *list, uint64_t lob, uint64_t upb)
+{
+    Range *new = g_new0(Range, 1);
+
+    range_set_bounds(new, lob, upb);
+    return g_list_append(list, new);
+}
+
+
+void range_inverse_array(GList *in, GList **rev,
+                         uint64_t low, uint64_t high)
+{
+    Range *r, *rn;
+    GList *l = in, *out = *rev;
+
+    for (l = in; l && range_upb(l->data) < low; l = l->next) {
+        continue;
+    }
+
+    if (!l) {
+        out = append_new_range(out, low, high);
+        goto exit;
+    }
+    r = (Range *)l->data;
+
+    /* first range lob is greater than min, insert a first range */
+    if (range_lob(r) > low) {
+        out = append_new_range(out, low, MIN(range_lob(r) - 1, high));
+    }
+
+    /* insert a range in between each original range until we reach high */
+    for (; l->next; l = l->next) {
+        r = (Range *)l->data;
+        rn = (Range *)l->next->data;
+        if (range_lob(r) >= high) {
+            goto exit;
+        }
+        if (range_compare(r, rn)) {
+            out = append_new_range(out, range_upb(r) + 1,
+                                   MIN(range_lob(rn) - 1, high));
+        }
+    }
+
+    /* last range */
+    r = (Range *)l->data;
+
+    /* last range upb is less than max, insert a last range */
+    if (range_upb(r) <  high) {
+        out = append_new_range(out, range_upb(r) + 1, high);
+    }
+exit:
+    *rev = out;
+}
-- 
2.41.0




* [PATCH v3 09/13] virtio-iommu: Record whether a probe request has been issued
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (7 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 08/13] range: Introduce range_inverse_array() Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 10/13] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Add an IOMMUDevice 'probe_done' flag to record that the driver
already issued a probe request on that device.

This will be useful to detect the case where host reserved regions
are notified after the probe, in which case the driver cannot take
them into account.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 include/hw/virtio/virtio-iommu.h |  1 +
 hw/virtio/virtio-iommu.c         | 20 +++++++++++---------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 70b8ace34d..1dd11ae81a 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -40,6 +40,7 @@ typedef struct IOMMUDevice {
     MemoryRegion root;          /* The root container of the device */
     MemoryRegion bypass_mr;     /* The alias of shared memory MR */
     GList *resv_regions;
+    bool probe_done;
 } IOMMUDevice;
 
 typedef struct IOMMUPciBus {
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 0e2370663d..13c3c087fe 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -639,19 +639,13 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
     return ret;
 }
 
-static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
+static ssize_t virtio_iommu_fill_resv_mem_prop(IOMMUDevice *sdev, uint32_t ep,
                                                uint8_t *buf, size_t free)
 {
     struct virtio_iommu_probe_resv_mem prop = {};
     size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
-    IOMMUDevice *sdev;
     GList *l;
 
-    sdev = container_of(virtio_iommu_mr(s, ep), IOMMUDevice, iommu_mr);
-    if (!sdev) {
-        return -EINVAL;
-    }
-
     total = size * g_list_length(sdev->resv_regions);
     if (total > free) {
         return -ENOSPC;
@@ -688,19 +682,27 @@ static int virtio_iommu_probe(VirtIOIOMMU *s,
                               uint8_t *buf)
 {
     uint32_t ep_id = le32_to_cpu(req->endpoint);
+    IOMMUMemoryRegion *iommu_mr = virtio_iommu_mr(s, ep_id);
     size_t free = VIOMMU_PROBE_SIZE;
+    IOMMUDevice *sdev;
     ssize_t count;
 
-    if (!virtio_iommu_mr(s, ep_id)) {
+    if (!iommu_mr) {
         return VIRTIO_IOMMU_S_NOENT;
     }
 
-    count = virtio_iommu_fill_resv_mem_prop(s, ep_id, buf, free);
+    sdev = container_of(iommu_mr, IOMMUDevice, iommu_mr);
+    if (!sdev) {
+        return -EINVAL;
+    }
+
+    count = virtio_iommu_fill_resv_mem_prop(sdev, ep_id, buf, free);
     if (count < 0) {
         return VIRTIO_IOMMU_S_INVAL;
     }
     buf += count;
     free -= count;
+    sdev->probe_done = true;
 
     return VIRTIO_IOMMU_S_OK;
 }
-- 
2.41.0




* [PATCH v3 10/13] virtio-iommu: Implement set_iova_ranges() callback
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (8 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 09/13] virtio-iommu: Record whether a probe request has been issued Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 11/13] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

The implementation populates the list of per-IOMMUDevice
host reserved ranges.

It is forbidden to apply conflicting sets of host IOVA ranges
(implied by different host devices) onto the same IOMMU MR.

In case the callback is called after the probe request has
been issued by the driver, a warning is emitted.
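The acceptance rule can be sketched in isolation: a new set of host reserved ranges is only accepted if every new range is fully contained in one of the already recorded ones. This is a simplified array-based sketch, not QEMU's GList-based code; `SimpleRange` and `resv_set_compatible` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive [lob, upb] range. */
typedef struct {
    uint64_t lob;
    uint64_t upb;
} SimpleRange;

/* Equivalent of range_contains_range(): does 'outer' fully cover 'inner'? */
bool range_covers(const SimpleRange *outer, const SimpleRange *inner)
{
    return outer->lob <= inner->lob && inner->upb <= outer->upb;
}

/*
 * Once a set of host reserved ranges is recorded for an IOMMU MR, a
 * newly proposed set is compatible only if each of its reserved ranges
 * is included in some existing reserved range; otherwise the sets
 * conflict and the callback fails.
 */
bool resv_set_compatible(const SimpleRange *cur, size_t n_cur,
                         const SimpleRange *new_r, size_t n_new)
{
    for (size_t i = 0; i < n_new; i++) {
        bool included = false;

        for (size_t j = 0; j < n_cur; j++) {
            if (range_covers(&cur[j], &new_r[i])) {
                included = true;
                break;
            }
        }
        if (!included) {
            return false;   /* conflicting sets: reject */
        }
    }
    return true;
}
```

In other words, a second device attached to the same IOMMU MR may only shrink the usable IOVA space further inside the holes already carved out, never punch new holes elsewhere.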

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v2 -> v3:
- Use GList instead
- added a warn_report in case the probe has already been issued

v1 -> v2:
- Forbid conflicting sets of host resv regions
---
 include/hw/virtio/virtio-iommu.h |  1 +
 hw/virtio/virtio-iommu.c         | 67 ++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 1dd11ae81a..781ebaea8f 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -40,6 +40,7 @@ typedef struct IOMMUDevice {
     MemoryRegion root;          /* The root container of the device */
     MemoryRegion bypass_mr;     /* The alias of shared memory MR */
     GList *resv_regions;
+    GList *host_resv_ranges;
     bool probe_done;
 } IOMMUDevice;
 
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 13c3c087fe..15aadd6fdd 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -20,6 +20,7 @@
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "qemu/iov.h"
+#include "qemu/range.h"
 #include "exec/target_page.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio.h"
@@ -1155,6 +1156,71 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
     return 0;
 }
 
+/**
+ * virtio_iommu_set_iova_ranges: Conveys the usable IOVA ranges
+ *
+ * The function turns those into reserved ranges. Once some
+ * reserved ranges have been set, new reserved regions cannot be
+ * added outside of the original ones.
+ *
+ * @mr: IOMMU MR
+ * @iova_ranges: list of usable IOVA ranges
+ * @errp: error handle
+ */
+static int virtio_iommu_set_iova_ranges(IOMMUMemoryRegion *mr,
+                                        GList *iova_ranges,
+                                        Error **errp)
+{
+    IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    GList *current_ranges = sdev->host_resv_ranges;
+    GList *l, *tmp, *new_ranges = NULL;
+    int ret = -EINVAL;
+
+    /* check that each new resv region is included in an existing one */
+    if (sdev->host_resv_ranges) {
+        range_inverse_array(iova_ranges,
+                            &new_ranges,
+                            0, UINT64_MAX);
+
+        for (tmp = new_ranges; tmp; tmp = tmp->next) {
+            Range *newr = (Range *)tmp->data;
+            bool included = false;
+
+            for (l = current_ranges; l; l = l->next) {
+                Range * r = (Range *)l->data;
+
+                if (range_contains_range(r, newr)) {
+                    included = true;
+                    break;
+                }
+            }
+            if (!included) {
+                goto error;
+            }
+        }
+        /* all new reserved ranges are included in existing ones */
+        ret = 0;
+        goto out;
+    }
+
+    if (sdev->probe_done) {
+        warn_report("%s: Notified about new host reserved regions after probe",
+                    mr->parent_obj.name);
+    }
+
+    range_inverse_array(iova_ranges,
+                        &sdev->host_resv_ranges,
+                        0, UINT64_MAX);
+
+    return 0;
+error:
+    error_setg(errp, "IOMMU mr=%s Conflicting host reserved ranges set!",
+               mr->parent_obj.name);
+out:
+    g_list_free_full(new_ranges, g_free);
+    return ret;
+}
+
 static void virtio_iommu_system_reset(void *opaque)
 {
     VirtIOIOMMU *s = opaque;
@@ -1450,6 +1516,7 @@ static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
     imrc->replay = virtio_iommu_replay;
     imrc->notify_flag_changed = virtio_iommu_notify_flag_changed;
     imrc->iommu_set_page_size_mask = virtio_iommu_set_page_size_mask;
+    imrc->iommu_set_iova_ranges = virtio_iommu_set_iova_ranges;
 }
 
 static const TypeInfo virtio_iommu_info = {
-- 
2.41.0




* [PATCH v3 11/13] virtio-iommu: Consolidate host reserved regions and property set ones
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (9 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 10/13] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-11 17:52 ` [PATCH v3 12/13] test: Add some tests for range and resv-mem helpers Eric Auger
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Up to now we were exposing through the RESV_MEM probe requests only
the reserved memory regions set through the reserved-regions array
property.

Combine those with the host reserved memory regions, if any. The
latter are tagged as RESERVED: we have no information about them
besides the fact that they cannot be mapped. Reserved regions set by
property have higher priority.
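The resulting priority can be pictured as a lookup: property-set regions override host regions on overlap, host regions are all RESERVED, and anything else is mappable. This is a hedged sketch of those semantics only; `classify_iova`, `SimpleResv`, and `MAPPABLE` are hypothetical names, while the subtype values 0/1 mirror the virtio-iommu spec's RESV_MEM_T_RESERVED/MSI constants:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0 and 1 mirror VIRTIO_IOMMU_RESV_MEM_T_RESERVED/MSI; -1 is ours. */
enum { RESV_T_RESERVED = 0, RESV_T_MSI = 1, MAPPABLE = -1 };

/* Hypothetical reserved region: inclusive bounds plus a subtype. */
typedef struct {
    uint64_t lob;
    uint64_t upb;
    int type;
} SimpleResv;

/*
 * Property regions are inserted last into the per-device list and
 * replace overlapped segments, so an IOVA is classified by the
 * property regions first, then by the host regions (all RESERVED),
 * and is mappable otherwise.
 */
int classify_iova(uint64_t addr,
                  const SimpleResv *prop, size_t n_prop,
                  const SimpleResv *host, size_t n_host)
{
    for (size_t i = 0; i < n_prop; i++) {
        if (addr >= prop[i].lob && addr <= prop[i].upb) {
            return prop[i].type;       /* property region wins on overlap */
        }
    }
    for (size_t i = 0; i < n_host; i++) {
        if (addr >= host[i].lob && addr <= host[i].upb) {
            return RESV_T_RESERVED;    /* host regions carry no subtype info */
        }
    }
    return MAPPABLE;
}
```

So, for example, a machine-set MSI window that overlaps a host reserved range is still reported to the guest as MSI, not as plain RESERVED.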

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/virtio-iommu.c | 36 ++++++++++++++++++++++++++++++++++++
 hw/virtio/trace-events   |  1 +
 2 files changed, 37 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 15aadd6fdd..dede0d41aa 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -21,6 +21,7 @@
 #include "qemu/log.h"
 #include "qemu/iov.h"
 #include "qemu/range.h"
+#include "qemu/reserved-region.h"
 #include "exec/target_page.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio.h"
@@ -1156,6 +1157,40 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
     return 0;
 }
 
+/**
+ * rebuild_resv_regions: rebuild resv regions with both the
+ * info of host resv ranges and property set resv ranges
+ */
+static int rebuild_resv_regions(IOMMUDevice *sdev)
+{
+    GList *l;
+    int i = 0;
+
+    /* free the existing list and rebuild it from scratch */
+    g_list_free_full(sdev->resv_regions, g_free);
+    sdev->resv_regions = NULL;
+
+    /* First add host reserved regions if any, all tagged as RESERVED */
+    for (l = sdev->host_resv_ranges; l; l = l->next) {
+        ReservedRegion *reg = g_new0(ReservedRegion, 1);
+        Range *r = (Range *)l->data;
+
+        reg->type = VIRTIO_IOMMU_RESV_MEM_T_RESERVED;
+        range_set_bounds(&reg->range, range_lob(r), range_upb(r));
+        sdev->resv_regions = resv_region_list_insert(sdev->resv_regions, reg);
+        trace_virtio_iommu_host_resv_regions(sdev->iommu_mr.parent_obj.name, i,
+                                             range_lob(&reg->range),
+                                             range_upb(&reg->range));
+        i++;
+    }
+    /*
+     * then add higher priority reserved regions set by the machine
+     * through properties
+     */
+    add_prop_resv_regions(sdev);
+    return 0;
+}
+
 /**
  * virtio_iommu_set_iova_ranges: Conveys the usable IOVA ranges
  *
@@ -1211,6 +1246,7 @@ static int virtio_iommu_set_iova_ranges(IOMMUMemoryRegion *mr,
     range_inverse_array(iova_ranges,
                         &sdev->host_resv_ranges,
                         0, UINT64_MAX);
+    rebuild_resv_regions(sdev);
 
     return 0;
 error:
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 1cb9027d1e..b49d9c4b0a 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -134,6 +134,7 @@ virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
 virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
 virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
 virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to 0x%"PRIx64
+virtio_iommu_host_resv_regions(const char *name, uint32_t index, uint64_t lob, uint64_t upb) "mr=%s host-resv-reg[%d] = [0x%"PRIx64",0x%"PRIx64"]"
 
 # virtio-mem.c
 virtio_mem_send_response(uint16_t type) "type=%" PRIu16
-- 
2.41.0




* [PATCH v3 12/13] test: Add some tests for range and resv-mem helpers
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (10 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 11/13] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-30  7:48   ` Cédric Le Goater
  2023-10-11 17:52 ` [PATCH v3 13/13] vfio: Remove 64-bit IOVA address space assumption Eric Auger
  2023-10-18 13:37 ` [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Michael S. Tsirkin
  13 siblings, 1 reply; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Add unit tests for both resv_region_list_insert() and
range_inverse_array().

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- conversion to the new GList-based prototypes
---
 tests/unit/test-resv-mem.c | 318 +++++++++++++++++++++++++++++++++++++
 tests/unit/meson.build     |   1 +
 2 files changed, 319 insertions(+)
 create mode 100644 tests/unit/test-resv-mem.c

diff --git a/tests/unit/test-resv-mem.c b/tests/unit/test-resv-mem.c
new file mode 100644
index 0000000000..ea3336c39d
--- /dev/null
+++ b/tests/unit/test-resv-mem.c
@@ -0,0 +1,318 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * reserved-region/range.c unit-tests.
+ *
+ * Copyright (C) 2023, Red Hat, Inc.
+ *
+ * Author: Eric Auger <eric.auger@redhat.com>
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/range.h"
+#include "exec/memory.h"
+#include "qemu/reserved-region.h"
+
+#define DEBUG 0
+
+#if DEBUG
+static void print_ranges(const char *prefix, GList *ranges)
+{
+    GList *l;
+    int i = 0;
+
+    if (!g_list_length(ranges)) {
+        printf("%s is void\n", prefix);
+        return;
+    }
+    for (l = ranges; l; l = l->next) {
+        Range *r = (Range *)l->data;
+
+        printf("%s rev[%i] = [0x%"PRIx64",0x%"PRIx64"]\n",
+               prefix, i, range_lob(r), range_upb(r));
+        i++;
+    }
+}
+#endif
+
+static void compare_ranges(const char *prefix, GList *ranges,
+                           GList *expected)
+{
+    GList *l, *e;
+    int i = 0;
+
+#if DEBUG
+    print_ranges("out", ranges);
+    print_ranges("expected", expected);
+#endif
+    g_assert_cmpint(g_list_length(ranges), ==, g_list_length(expected));
+    for (l = ranges, e = expected; l ; l = l->next, e = e->next) {
+        Range *r = (Range *)l->data;
+        Range *er = (Range *)e->data;
+
+        g_assert_true(range_lob(r) == range_lob(er) &&
+                      range_upb(r) == range_upb(er));
+        i++;
+    }
+}
+
+static GList *insert_sorted_range(GList *list, uint64_t lob, uint64_t upb)
+{
+    Range *new = g_new0(Range, 1);
+
+    range_set_bounds(new, lob, upb);
+    return range_list_insert(list, new);
+}
+
+static void reset(GList **in, GList **out, GList **expected)
+{
+    g_list_free_full(*in, g_free);
+    g_list_free_full(*out, g_free);
+    g_list_free_full(*expected, g_free);
+    *in = NULL;
+    *out = NULL;
+    *expected = NULL;
+}
+
+static void
+run_range_inverse_array(const char *prefix, GList **in, GList **expected,
+                        uint64_t low, uint64_t high)
+{
+    GList *out = NULL;
+    range_inverse_array(*in, &out, low, high);
+    compare_ranges(prefix, out, *expected);
+    reset(in, &out, expected);
+}
+
+static void check_range_reverse_array(void)
+{
+    GList *in = NULL, *expected = NULL;
+
+    /* test 1 */
+
+    in = insert_sorted_range(in, 0x10000, UINT64_MAX);
+    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
+    run_range_inverse_array("test1", &in, &expected, 0x0, UINT64_MAX);
+
+    /* test 2 */
+
+    in = insert_sorted_range(in, 0x10000, 0xFFFFFFFFFFFF);
+    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
+    expected = insert_sorted_range(expected, 0x1000000000000, UINT64_MAX);
+    run_range_inverse_array("test2", &in, &expected, 0x0, UINT64_MAX);
+
+    /* test 3 */
+
+    in = insert_sorted_range(in, 0x0, 0xFFFF);
+    in = insert_sorted_range(in, 0x10000, 0x2FFFF);
+    expected = insert_sorted_range(expected, 0x30000, UINT64_MAX);
+    run_range_inverse_array("test3", &in, &expected, 0x0, UINT64_MAX);
+
+    /* test 4 */
+
+    in = insert_sorted_range(in, 0x50000, 0x5FFFF);
+    in = insert_sorted_range(in, 0x60000, 0xFFFFFFFFFFFF);
+    expected = insert_sorted_range(expected, 0x0, 0x4FFFF);
+    expected = insert_sorted_range(expected, 0x1000000000000, UINT64_MAX);
+    run_range_inverse_array("test4", &in, &expected, 0x0, UINT64_MAX);
+
+    /* test 5 */
+
+    in = insert_sorted_range(in, 0x0, UINT64_MAX);
+    run_range_inverse_array("test5", &in, &expected, 0x0, UINT64_MAX);
+
+    /* test 6 */
+    in = insert_sorted_range(in,  0x10000, 0x1FFFF);
+    in = insert_sorted_range(in,  0x30000, 0x6FFFF);
+    in = insert_sorted_range(in,  0x90000, UINT64_MAX);
+    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
+    expected = insert_sorted_range(expected, 0x20000, 0x2FFFF);
+    expected = insert_sorted_range(expected, 0x70000, 0x8FFFF);
+    run_range_inverse_array("test6", &in, &expected, 0x0, UINT64_MAX);
+}
+
+static void check_range_reverse_array_low_end(void)
+{
+    GList *in = NULL, *expected = NULL;
+
+    /* test 1 */
+    in = insert_sorted_range(in,  0x0, UINT64_MAX);
+    run_range_inverse_array("test1", &in, &expected, 0x10000, 0xFFFFFF);
+
+    /* test 2 */
+
+    in = insert_sorted_range(in,  0x0, 0xFFFF);
+    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
+    expected = insert_sorted_range(expected, 0x40000, 0xFFFFFFFFFFFF);
+    run_range_inverse_array("test2", &in, &expected, 0x40000, 0xFFFFFFFFFFFF);
+
+    /* test 3 */
+    in = insert_sorted_range(in,  0x0, 0xFFFF);
+    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
+    in = insert_sorted_range(in,  0x1000000000000, UINT64_MAX);
+    expected = insert_sorted_range(expected, 0x40000, 0xFFFFFFFFFFFF);
+    run_range_inverse_array("test3", &in, &expected, 0x40000, 0xFFFFFFFFFFFF);
+
+    /* test 4 */
+
+    in = insert_sorted_range(in,  0x0, 0xFFFF);
+    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
+    in = insert_sorted_range(in,  0x1000000000000, UINT64_MAX);
+    expected = insert_sorted_range(expected, 0x30000, 0xFFFFFFFFFFFF);
+    run_range_inverse_array("test4", &in, &expected, 0x20000, 0xFFFFFFFFFFFF);
+
+    /* test 5 */
+
+    in = insert_sorted_range(in,  0x2000, 0xFFFF);
+    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
+    in = insert_sorted_range(in,  0x100000000, 0x1FFFFFFFF);
+    expected = insert_sorted_range(expected, 0x1000, 0x1FFF);
+    expected = insert_sorted_range(expected, 0x10000, 0x1FFFF);
+    expected = insert_sorted_range(expected, 0x30000, 0xFFFFFFFF);
+    expected = insert_sorted_range(expected, 0x200000000, 0xFFFFFFFFFFFF);
+    run_range_inverse_array("test5", &in, &expected, 0x1000, 0xFFFFFFFFFFFF);
+
+    /* test 6 */
+
+    in = insert_sorted_range(in,  0x10000000, 0x1FFFFFFF);
+    in = insert_sorted_range(in,  0x100000000, 0x1FFFFFFFF);
+    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
+    run_range_inverse_array("test6", &in, &expected, 0x0, 0xFFFF);
+}
+
+static ReservedRegion *alloc_resv_mem(unsigned type, uint64_t lob, uint64_t upb)
+{
+    ReservedRegion *r;
+
+    r = g_new0(ReservedRegion, 1);
+    r->type = type;
+    range_set_bounds(&r->range, lob, upb);
+    return r;
+}
+
+static void print_resv_region_list(const char *prefix, GList *list,
+                                   uint32_t expected_length)
+{
+    int i = g_list_length(list);
+
+    g_assert_cmpint(i, ==, expected_length);
+#if DEBUG
+    i = 0;
+    for (GList *l = list; l; l = l->next) {
+        ReservedRegion *r = (ReservedRegion *)l->data;
+        Range *range = &r->range;
+
+        printf("%s item[%d]=[0x%x, 0x%"PRIx64", 0x%"PRIx64"]\n",
+               prefix, i++, r->type, range_lob(range), range_upb(range));
+    }
+#endif
+}
+
+static void free_resv_region(gpointer data)
+{
+    ReservedRegion *reg = (ReservedRegion *)data;
+
+    g_free(reg);
+}
+
+static void check_resv_region_list_insert(void)
+{
+    ReservedRegion *r[10];
+    GList *l = NULL;
+
+    r[0] = alloc_resv_mem(0xA, 0, 0xFFFF);
+    r[1] = alloc_resv_mem(0xA, 0x20000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[0]);
+    l = resv_region_list_insert(l, r[1]);
+    print_resv_region_list("test1", l, 2);
+
+    /* adjacent on left */
+    r[2] = alloc_resv_mem(0xB, 0x0, 0xFFF);
+    l = resv_region_list_insert(l, r[2]);
+    /* adjacent on right */
+    r[3] = alloc_resv_mem(0xC, 0x21000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[3]);
+    print_resv_region_list("test2", l, 4);
+
+    /* exact overlap of D into C */
+    r[4] = alloc_resv_mem(0xD, 0x21000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[4]);
+    print_resv_region_list("test3", l, 4);
+
+    /* in the middle */
+    r[5] = alloc_resv_mem(0xE, 0x22000, 0x23FFF);
+    l = resv_region_list_insert(l, r[5]);
+    print_resv_region_list("test4", l, 6);
+
+    /* overwrites several existing ones */
+    r[6] = alloc_resv_mem(0xF, 0x10000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[6]);
+    print_resv_region_list("test5", l, 3);
+
+    /* contiguous at the end */
+    r[7] = alloc_resv_mem(0x0, 0x30000, 0x40000);
+    l = resv_region_list_insert(l, r[7]);
+    print_resv_region_list("test6", l, 4);
+
+    g_list_free_full(l, free_resv_region);
+    l = NULL;
+
+    r[0] = alloc_resv_mem(0x0, 0x10000, 0x1FFFF);
+    l = resv_region_list_insert(l, r[0]);
+    /* insertion before the 1st item */
+    r[1] = alloc_resv_mem(0x1, 0x0, 0xFF);
+    l = resv_region_list_insert(l, r[1]);
+    print_resv_region_list("test8", l, 2);
+
+    /* collision on the left side */
+    r[2] = alloc_resv_mem(0xA, 0x1200, 0x11FFF);
+    l = resv_region_list_insert(l, r[2]);
+    print_resv_region_list("test9", l, 3);
+
+    /* collision on the right side */
+    r[3] = alloc_resv_mem(0xA, 0x1F000, 0x2FFFF);
+    l = resv_region_list_insert(l, r[3]);
+    print_resv_region_list("test10", l, 4);
+
+    /* override everything */
+    r[4] = alloc_resv_mem(0xF, 0x0, UINT64_MAX);
+    l = resv_region_list_insert(l, r[4]);
+    print_resv_region_list("test11", l, 1);
+
+    g_list_free_full(l, free_resv_region);
+    l = NULL;
+
+    r[0] = alloc_resv_mem(0xF, 0x1000000000000, UINT64_MAX);
+    l = resv_region_list_insert(l, r[0]);
+    print_resv_region_list("test12", l, 1);
+
+    r[1] = alloc_resv_mem(0xA, 0x0, 0xFFFFFFF);
+    l = resv_region_list_insert(l, r[1]);
+    print_resv_region_list("test12", l, 2);
+
+    r[2] = alloc_resv_mem(0xB, 0x100000000, 0x1FFFFFFFF);
+    l = resv_region_list_insert(l, r[2]);
+    print_resv_region_list("test12", l, 3);
+
+    r[3] = alloc_resv_mem(0x0, 0x010000000, 0x2FFFFFFFF);
+    l = resv_region_list_insert(l, r[3]);
+    print_resv_region_list("test12", l, 3);
+
+    g_list_free_full(l, free_resv_region);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/resv-mem/range_reverse_array",
+                    check_range_reverse_array);
+    g_test_add_func("/resv-mem/range_reverse_array_low_end",
+                    check_range_reverse_array_low_end);
+    g_test_add_func("/resv-mem/resv_region_list_insert",
+                    check_resv_region_list_insert);
+
+    g_test_run();
+
+    return 0;
+}
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index f33ae64b8d..e6c51e7a86 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -21,6 +21,7 @@ tests = {
   'test-opts-visitor': [testqapi],
   'test-visitor-serialization': [testqapi],
   'test-bitmap': [],
+  'test-resv-mem': [],
   # all code tested by test-x86-cpuid is inside topology.h
   'test-x86-cpuid': [],
   'test-cutils': [],
-- 
2.41.0
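[Editor's note] The inverse-range computation exercised by the unit tests above can be modeled in a few lines. The following is a standalone Python sketch of the semantics of range_inverse_array() — the real helper is C and operates on a sorted GList of Range structs; the function name and tuple representation here are illustrative only:

```python
def invert_ranges(ranges, low, high):
    """Model of range_inverse_array(): given sorted, non-overlapping
    (lob, upb) inclusive ranges, return their complement clipped to
    the [low, high] window."""
    out = []
    cur = low
    for lob, upb in ranges:
        if upb < low or lob > high:
            continue                      # entirely outside the window
        if lob > cur:
            out.append((cur, lob - 1))    # gap before this range
        if upb >= high:
            return out                    # window fully covered from here
        cur = upb + 1
    if cur <= high:
        out.append((cur, high))           # trailing gap up to high
    return out
```

For instance, the inputs of "test5" in check_range_reverse_array_low_end() above yield exactly the four expected gap ranges.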



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 13/13] vfio: Remove 64-bit IOVA address space assumption
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (11 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 12/13] test: Add some tests for range and resv-mem helpers Eric Auger
@ 2023-10-11 17:52 ` Eric Auger
  2023-10-18 21:42   ` Alex Williamson
  2023-10-18 13:37 ` [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Michael S. Tsirkin
  13 siblings, 1 reply; 27+ messages in thread
From: Eric Auger @ 2023-10-11 17:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

Now we retrieve the usable IOVA ranges from the host,
we now the physical IOMMU aperture and we can remove
the assumption of 64b IOVA space when calling
vfio_host_win_add().

This works fine in general but in case of an IOMMU memory
region this becomes more tricky. For instance the virtio-iommu
MR has a 64b aperture by default. If the physical IOMMU has a
smaller aperture (typically the case for VTD), this means we
would need to resize the IOMMU MR when this latter is linked
to a container. However this happens on vfio_listener_region_add()
when calling the IOMMU MR set_iova_ranges() callback and this
would mean we would have a recursive call the
vfio_listener_region_add(). This looks like a wrong usage of
he memory API causing duplicate IOMMU MR notifier registration
for instance.

Until we find a better solution, make sure the vfio_find_hostwin()
is not called anymore for IOMMU region.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- take into account the avail IOVA range may not be existing
- Create as many VFIOHostDMAWindow as valid IOVA ranges
- rebase on top of vfio-next

I have not found any working solution to the IOMMU MR resizing.
So I can remove this patch or remove the check for IOMMU MR. Maybe
this is an issue which can be handled separately?
---
 hw/vfio/common.c    | 14 +++++++-------
 hw/vfio/container.c | 23 +++++++++++++++++------
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9d804152ba..1447b6fdc4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -654,13 +654,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
         goto fail;
     }
 
-    hostwin = vfio_find_hostwin(container, iova, end);
-    if (!hostwin) {
-        error_setg(&err, "Container %p can't map guest IOVA region"
-                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
-        goto fail;
-    }
-
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
@@ -720,6 +713,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
+    hostwin = vfio_find_hostwin(container, iova, end);
+    if (!hostwin) {
+        error_setg(&err, "Container %p can't map guest IOVA region"
+                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
+        goto fail;
+    }
+
     /* Here we assume that memory_region_is_ram(section->mr)==true */
 
     /*
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 5122ff6d92..eb9d962881 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -693,12 +693,23 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         vfio_get_iommu_info_migration(container, info);
         g_free(info);
 
-        /*
-         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
-         * information to get the actual window extent rather than assume
-         * a 64-bit IOVA address space.
-         */
-        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
+        if (container->nr_iovas == -1) {
+            /*
+             * no available info on usable IOVA ranges,
+             * assume 64b IOVA space
+             */
+            vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
+        } else {
+            GList *l;
+
+            g_assert(container->nr_iovas);
+            for (l = container->iova_ranges; l; l = l->next) {
+                Range *r = l->data;
+
+                vfio_host_win_add(container, range_lob(r), range_upb(r),
+                                  container->pgsizes);
+            }
+        }
 
         break;
     }
-- 
2.41.0
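[Editor's note] The fallback logic in the hunk above — assume a full 64-bit window when the kernel reports no usable IOVA range info, otherwise create one host window per usable range — can be sketched as follows. This is a hypothetical Python model, not QEMU code; find_hostwin mirrors the semantics of vfio_find_hostwin(), namely that a section must fit entirely within a single window:

```python
UINT64_MAX = 2**64 - 1

def build_host_windows(iova_ranges):
    """Model of the patched vfio_connect_container() logic: with no
    usable-range info (None), assume the full 64-bit IOVA space;
    otherwise add one window per usable (lob, upb) range."""
    if iova_ranges is None:     # kernel did not offer the capability
        return [(0, UINT64_MAX)]
    assert iova_ranges          # capability implies at least one range
    return list(iova_ranges)

def find_hostwin(windows, iova, end):
    """Model of vfio_find_hostwin(): the [iova, end] region must fall
    entirely inside one host window; return it, or None on failure."""
    for lo, hi in windows:
        if lo <= iova and end <= hi:
            return (lo, hi)
    return None
```

A mapping that straddles or exceeds every window is rejected, which is what the moved vfio_find_hostwin() check reports as a "can't map guest IOVA region" error.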



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
                   ` (12 preceding siblings ...)
  2023-10-11 17:52 ` [PATCH v3 13/13] vfio: Remove 64-bit IOVA address space assumption Eric Auger
@ 2023-10-18 13:37 ` Michael S. Tsirkin
  2023-10-19  9:07   ` YangHang Liu
  2023-10-19 11:07   ` Cédric Le Goater
  13 siblings, 2 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2023-10-18 13:37 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, pbonzini, peter.maydell, peterx, david, philmd,
	zhenzhong.duan, yi.l.liu

On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
> This applies on top of vfio-next:
> https://github.com/legoater/qemu/, vfio-next branch

virtio things make sense

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

let me know how you want to merge all this.



> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
> we encounter the case where the guest tries to map IOVAs beyond 48b
> whereas the physical VTD IOMMU only supports 48b. This ends up with
> VFIO_MAP_DMA failures at qemu level because at kernel level,
> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
> 
> This is due to the fact the virtio-iommu currently unconditionally
> exposes an IOVA range of 64b through its config input range fields.
> 
> This series removes this assumption by retrieving the usable IOVA
> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
> a VFIO device is attached. This info is communicated to the
> virtio-iommu memory region, transformed into the inverse info, i.e.
> the host reserved IOVA regions. Then the latter are combined with the
> reserved IOVA regions set through the virtio-iommu reserved-regions
> property. That way, the guest virtio-iommu driver, unchanged, is
> able to probe the whole set of reserved regions and prevent any IOVA
> belonging to those ranges from being used, achieving the original goal.
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/vfio-next-iommu_geometry-v3
> 
> History:
> v2 -> v3:
> - rebase on top of vfio-next (including iommufd prereq)
> - take into account IOVA range info capability may not be offered by
>   old kernel and use nr_iovas = -1 to encode that [Alex]
> - use GList * everywhere instead of arrays (in the range_inverse_array)
>   with the benefit that it sorts ranges retrieved from the kernel, which
>   are not guaranteed to be sorted. Rework the tests accordingly [Alex]
> - Make sure resv_regions GList is build before the probe() [Jean]
>   per device list is first populated with prop resv regions on
>   IOMMUDevice creation and then rebuilt on set_iova()
> - Add a warning if set_iova builds a valid list after probe was
>   called [Jean]
> - Build host windows on top of IOVA valid ranges if this info can
>   be retrieved from the kernel. As many windows are created as
>   valid ranges
> v1 -> v2:
> - Remove "[PATCH 12/13] virtio-iommu: Resize memory region according
>   to the max iova info" which causes way too much trouble: it triggers
>   a coredump in vhost, causes duplication of IOMMU notifiers causing
>   EEXIST vfio_dma_map errors, ... This looks like a bad usage of the
>   memory API so I prefer removing this from this series. So I was
>   also obliged to remove the vfio_find_hostwin() check in the case
>   of an IOMMU.
> - Let range_inverse_array() take low/high args instead of hardcoding
>   0, UINT64_MAX which both complexifies the algo and the tests.
> - Move range function description in header.
> - Check that if set_iova_ranges is called several times, new resv
>   regions are included in previous ones
> 
> Eric Auger (13):
>   memory: Let ReservedRegion use Range
>   memory: Introduce memory_region_iommu_set_iova_ranges
>   vfio: Collect container iova range info
>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>   range: Make range_compare() public
>   util/reserved-region: Add new ReservedRegion helpers
>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>   range: Introduce range_inverse_array()
>   virtio-iommu: Record whether a probe request has been issued
>   virtio-iommu: Implement set_iova_ranges() callback
>   virtio-iommu: Consolidate host reserved regions and property set ones
>   test: Add some tests for range and resv-mem helpers
>   vfio: Remove 64-bit IOVA address space assumption
> 
>  include/exec/memory.h            |  34 +++-
>  include/hw/vfio/vfio-common.h    |   2 +
>  include/hw/virtio/virtio-iommu.h |   7 +-
>  include/qemu/range.h             |  14 ++
>  include/qemu/reserved-region.h   |  32 ++++
>  hw/core/qdev-properties-system.c |   9 +-
>  hw/vfio/common.c                 |  23 ++-
>  hw/vfio/container.c              |  67 ++++++-
>  hw/virtio/virtio-iommu-pci.c     |   8 +-
>  hw/virtio/virtio-iommu.c         | 155 +++++++++++++--
>  system/memory.c                  |  13 ++
>  tests/unit/test-resv-mem.c       | 318 +++++++++++++++++++++++++++++++
>  util/range.c                     |  61 +++++-
>  util/reserved-region.c           |  91 +++++++++
>  hw/virtio/trace-events           |   1 +
>  tests/unit/meson.build           |   1 +
>  util/meson.build                 |   1 +
>  17 files changed, 791 insertions(+), 46 deletions(-)
>  create mode 100644 include/qemu/reserved-region.h
>  create mode 100644 tests/unit/test-resv-mem.c
>  create mode 100644 util/reserved-region.c
> 
> -- 
> 2.41.0



^ permalink raw reply	[flat|nested] 27+ messages in thread
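[Editor's note] The consolidation described in the cover letter — host reserved regions combined with the ones set through the reserved-regions property — relies on insert-with-override semantics: a newly inserted region truncates or splits any existing region it overlaps. Below is a standalone Python model of that behaviour, mirroring what the resv_region_list_insert() unit tests in patch 12 exercise; the function name and tuple layout are illustrative, not the QEMU API:

```python
def resv_insert(regions, new):
    """Insert (type, lob, upb) into a sorted list of non-overlapping
    reserved regions; 'new' overrides any overlapping parts of existing
    regions, keeping only their left/right remainders."""
    ntype, nlob, nupb = new
    out = []
    for rtype, lob, upb in regions:
        if upb < nlob or lob > nupb:
            out.append((rtype, lob, upb))       # no overlap, keep as-is
            continue
        if lob < nlob:
            out.append((rtype, lob, nlob - 1))  # keep left remainder
        if upb > nupb:
            out.append((rtype, nupb + 1, upb))  # keep right remainder
    out.append((ntype, nlob, nupb))
    out.sort(key=lambda r: r[1])                # keep the list sorted
    return out

# Demo mirroring the first steps of the patch-12 unit test:
regions = []
for r in [(0xA, 0x0, 0xFFFF), (0xA, 0x20000, 0x2FFFF),
          (0xB, 0x0, 0xFFF), (0xC, 0x21000, 0x2FFFF)]:
    regions = resv_insert(regions, r)
```

After these four inserts the list holds four regions, matching the "test2" expectation; inserting a region covering [0, UINT64_MAX] collapses the list to that single region, matching "test11".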

* Re: [PATCH v3 03/13] vfio: Collect container iova range info
  2023-10-11 17:52 ` [PATCH v3 03/13] vfio: Collect container iova range info Eric Auger
@ 2023-10-18 19:07   ` Alex Williamson
  2023-10-19  6:39     ` Eric Auger
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2023-10-18 19:07 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
	pbonzini, peter.maydell, peterx, david, philmd, zhenzhong.duan,
	yi.l.liu

On Wed, 11 Oct 2023 19:52:19 +0200
Eric Auger <eric.auger@redhat.com> wrote:

> Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> capability is supported.
> 
> This allows propagating the information through the IOMMU MR
> set_iova_ranges() callback so that virtual IOMMUs
> become aware of those aperture constraints. This is only done if
> the info is available and the number of iova ranges is greater than
> 0.
> 
> A new vfio_get_info_iova_range helper is introduced matching
> the coding style of existing vfio_get_info_dma_avail. The
> boolean return value isn't used, though. Code is aligned
> between both.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v2 -> v3:
> - Turn nr_iovas into a int initialized to -1
> - memory_region_iommu_set_iova_ranges only is called if nr_iovas > 0
> - vfio_get_info_iova_range returns a bool to match
>   vfio_get_info_dma_avail. Uniformize both code by using !hdr in
>   the check
> - rebase on top of vfio-next
> ---
>  include/hw/vfio/vfio-common.h |  2 ++
>  hw/vfio/common.c              |  9 +++++++
>  hw/vfio/container.c           | 44 ++++++++++++++++++++++++++++++++---
>  3 files changed, 52 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7780b9073a..848ff47960 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -99,6 +99,8 @@ typedef struct VFIOContainer {
>      QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>      QLIST_ENTRY(VFIOContainer) next;
>      QLIST_HEAD(, VFIODevice) device_list;
> +    int nr_iovas;
> +    GList *iova_ranges;

Nit, nr_iovas seems like it has a pretty weak use case here.  We can
just test iova_ranges != NULL for calling set_iova_ranges.  In patch 13
we can again test against NULL, which I think also negates the need to
assert nr_iovas since the NULL test automatically catches the zero
case.  Otherwise

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

>  } VFIOContainer;
>  
>  typedef struct VFIOGuestIOMMU {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5ff5acf1d8..9d804152ba 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -699,6 +699,15 @@ static void vfio_listener_region_add(MemoryListener *listener,
>              goto fail;
>          }
>  
> +        if (container->nr_iovas > 0) {
> +            ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
> +                    container->iova_ranges, &err);
> +            if (ret) {
> +                g_free(giommu);
> +                goto fail;
> +            }
> +        }
> +
>          ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
>                                                      &err);
>          if (ret) {
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index adc467210f..5122ff6d92 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -382,7 +382,7 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>      /* If the capability cannot be found, assume no DMA limiting */
>      hdr = vfio_get_iommu_type1_info_cap(info,
>                                          VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL);
> -    if (hdr == NULL) {
> +    if (!hdr) {
>          return false;
>      }
>  
> @@ -394,6 +394,33 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>      return true;
>  }
>  
> +static bool vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
> +                                     VFIOContainer *container)
> +{
> +    struct vfio_info_cap_header *hdr;
> +    struct vfio_iommu_type1_info_cap_iova_range *cap;
> +
> +    hdr = vfio_get_iommu_type1_info_cap(info,
> +                                        VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
> +    if (!hdr) {
> +        return false;
> +    }
> +
> +    cap = (void *)hdr;
> +
> +    container->nr_iovas = cap->nr_iovas;
> +    for (int i = 0; i < cap->nr_iovas; i++) {
> +        Range *range = g_new(Range, 1);
> +
> +        range_set_bounds(range, cap->iova_ranges[i].start,
> +                         cap->iova_ranges[i].end);
> +        container->iova_ranges =
> +            range_list_insert(container->iova_ranges, range);
> +    }
> +
> +    return true;
> +}
> +
>  static void vfio_kvm_device_add_group(VFIOGroup *group)
>  {
>      Error *err = NULL;
> @@ -535,6 +562,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>      }
>  }
>  
> +static void vfio_free_container(VFIOContainer *container)
> +{
> +    g_list_free_full(container->iova_ranges, g_free);
> +    g_free(container);
> +}
> +
>  static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>                                    Error **errp)
>  {
> @@ -616,6 +649,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      container->error = NULL;
>      container->dirty_pages_supported = false;
>      container->dma_max_mappings = 0;
> +    container->nr_iovas = -1;
> +    container->iova_ranges = NULL;
>      QLIST_INIT(&container->giommu_list);
>      QLIST_INIT(&container->hostwin_list);
>      QLIST_INIT(&container->vrdl_list);
> @@ -652,6 +687,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
>              container->dma_max_mappings = 65535;
>          }
> +
> +        vfio_get_info_iova_range(info, container);
> +
>          vfio_get_iommu_info_migration(container, info);
>          g_free(info);
>  
> @@ -765,7 +803,7 @@ enable_discards_exit:
>      vfio_ram_block_discard_disable(container, false);
>  
>  free_container_exit:
> -    g_free(container);
> +    vfio_free_container(container);
>  
>  close_fd_exit:
>      close(fd);
> @@ -819,7 +857,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>  
>          trace_vfio_disconnect_container(container->fd);
>          close(container->fd);
> -        g_free(container);
> +        vfio_free_container(container);
>  
>          vfio_put_address_space(space);
>      }
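[Editor's note] For context on the capability lookup in the hunk above: vfio_get_iommu_type1_info_cap() walks the kernel's capability chain, where each entry starts with a vfio_info_cap_header — a 16-bit id, a 16-bit version, and a 32-bit offset to the next entry (0 terminates), offsets being relative to the start of the info buffer. A small Python model of that walk over a synthetic buffer; the header layout and the ids used (DMA_AVAIL = 3, IOVA_RANGE = 1) follow the Linux UAPI, while the helper name is illustrative:

```python
import struct

CAP_HDR = struct.Struct("<HHI")   # __u16 id, __u16 version, __u32 next

def find_cap(buf, first_off, cap_id):
    """Walk a VFIO info capability chain: follow 'next' offsets
    (relative to the start of buf) until the wanted id is found or
    next == 0; return the header offset, or None if absent."""
    off = first_off
    while off:
        cid, _ver, nxt = CAP_HDR.unpack_from(buf, off)
        if cid == cap_id:
            return off
        off = nxt
    return None

# Synthetic info buffer: fake fixed header up to offset 16, then two caps.
buf = bytearray(64)
CAP_HDR.pack_into(buf, 16, 3, 1, 32)   # id 3 (DMA_AVAIL), next cap at 32
CAP_HDR.pack_into(buf, 32, 1, 1, 0)    # id 1 (IOVA_RANGE), end of chain
```

In the real code, a found IOVA_RANGE header is cast to vfio_iommu_type1_info_cap_iova_range, whose nr_iovas ranges are then fed into the container's iova_ranges list.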



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 13/13] vfio: Remove 64-bit IOVA address space assumption
  2023-10-11 17:52 ` [PATCH v3 13/13] vfio: Remove 64-bit IOVA address space assumption Eric Auger
@ 2023-10-18 21:42   ` Alex Williamson
  2023-10-19  6:37     ` Eric Auger
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Williamson @ 2023-10-18 21:42 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
	pbonzini, peter.maydell, peterx, david, philmd, zhenzhong.duan,
	yi.l.liu

On Wed, 11 Oct 2023 19:52:29 +0200
Eric Auger <eric.auger@redhat.com> wrote:

> Now we retrieve the usable IOVA ranges from the host,
> we now the physical IOMMU aperture and we can remove

s/now/use/?

> the assumption of 64b IOVA space when calling
> vfio_host_win_add().
> 
> This works fine in general but in case of an IOMMU memory
> region this becomes more tricky. For instance the virtio-iommu
> MR has a 64b aperture by default. If the physical IOMMU has a
> smaller aperture (typically the case for VTD), this means we
> would need to resize the IOMMU MR when this latter is linked
> to a container. However this happens on vfio_listener_region_add()
> when calling the IOMMU MR set_iova_ranges() callback and this
> would mean we would have a recursive call the
> vfio_listener_region_add(). This looks like a wrong usage of
> he memory API causing duplicate IOMMU MR notifier registration

s/he/the/

> for instance.
> 
> Until we find a better solution, make sure the vfio_find_hostwin()
> is not called anymore for IOMMU region.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v2 -> v3:
> - take into account the avail IOVA range may not be existing
> - Create as many VFIOHostDMAWindow as valid IOVA ranges
> - rebase on top of vfio-next
> 
> I have not found any working solution to the IOMMU MR resizing.
> So I can remove this patch or remove the check for IOMMU MR. Maybe
> this is an issue which can be handled separately?

Am I correct that the only thing we're solving here is the FIXME?

Without this change we assume a 64-bit IOVA address space for the
"hostwin" and the previous patch 03/ already sets the usable IOVA range
for the IOMMU aperture.  Kernel code will error on mappings outside of
the usable IOVA ranges regardless, so at best we're making the failure
occur earlier in QEMU rather than at the time of the DMA mapping.

I think the FIXME comment had assumed the hostwin support would be more
universal, but perhaps we're just doubling down on a relic of SPAPR
support that we can safely ignore, and at some point remove.  It really
seems to serve the same purpose as the new iova_ranges and if it were
worthwhile to fail the range in QEMU before trying to map it in the
kernel, we could test it against iova_ranges directly.

Unless this serves some purpose that I'm not spotting, maybe we should
drop this patch, consider hostwin to be SPAPR specific, and avoid
further entanglements with it here so that it can be more easily
removed.  Thanks,

Alex

> ---
>  hw/vfio/common.c    | 14 +++++++-------
>  hw/vfio/container.c | 23 +++++++++++++++++------
>  2 files changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9d804152ba..1447b6fdc4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -654,13 +654,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          goto fail;
>      }
>  
> -    hostwin = vfio_find_hostwin(container, iova, end);
> -    if (!hostwin) {
> -        error_setg(&err, "Container %p can't map guest IOVA region"
> -                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> -        goto fail;
> -    }
> -
>      memory_region_ref(section->mr);
>  
>      if (memory_region_is_iommu(section->mr)) {
> @@ -720,6 +713,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          return;
>      }
>  
> +    hostwin = vfio_find_hostwin(container, iova, end);
> +    if (!hostwin) {
> +        error_setg(&err, "Container %p can't map guest IOVA region"
> +                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> +        goto fail;
> +    }
> +
>      /* Here we assume that memory_region_is_ram(section->mr)==true */
>  
>      /*
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 5122ff6d92..eb9d962881 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -693,12 +693,23 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          vfio_get_iommu_info_migration(container, info);
>          g_free(info);
>  
> -        /*
> -         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> -         * information to get the actual window extent rather than assume
> -         * a 64-bit IOVA address space.
> -         */
> -        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> +        if (container->nr_iovas == -1) {
> +            /*
> +             * no available info on usable IOVA ranges,
> +             * assume 64b IOVA space
> +             */
> +            vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> +        } else {
> +            GList *l;
> +
> +            g_assert(container->nr_iovas);
> +            for (l = container->iova_ranges; l; l = l->next) {
> +                Range *r = l->data;
> +
> +                vfio_host_win_add(container, range_lob(r), range_upb(r),
> +                                  container->pgsizes);
> +            }
> +        }
>  
>          break;
>      }



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 02/13] memory: Introduce memory_region_iommu_set_iova_ranges
  2023-10-11 17:52 ` [PATCH v3 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
@ 2023-10-18 22:07   ` Peter Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2023-10-18 22:07 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, mst, pbonzini, peter.maydell, david, philmd,
	zhenzhong.duan, yi.l.liu

On Wed, Oct 11, 2023 at 07:52:18PM +0200, Eric Auger wrote:
> This helper will allow conveying information about valid
> IOVA ranges to virtual IOMMUs.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 13/13] vfio: Remove 64-bit IOVA address space assumption
  2023-10-18 21:42   ` Alex Williamson
@ 2023-10-19  6:37     ` Eric Auger
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-19  6:37 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
	pbonzini, peter.maydell, peterx, david, philmd, zhenzhong.duan,
	yi.l.liu

Hi Alex,

On 10/18/23 23:42, Alex Williamson wrote:
> On Wed, 11 Oct 2023 19:52:29 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Now we retrieve the usable IOVA ranges from the host,
>> we now the physical IOMMU aperture and we can remove
> s/now/use/?
>
>> the assumption of 64b IOVA space when calling
>> vfio_host_win_add().
>>
>> This works fine in general but in case of an IOMMU memory
>> region this becomes more tricky. For instance the virtio-iommu
>> MR has a 64b aperture by default. If the physical IOMMU has a
>> smaller aperture (typically the case for VTD), this means we
>> would need to resize the IOMMU MR when this latter is linked
>> to a container. However this happens on vfio_listener_region_add()
>> when calling the IOMMU MR set_iova_ranges() callback and this
>> would mean we would have a recursive call the
>> vfio_listener_region_add(). This looks like a wrong usage of
>> he memory API causing duplicate IOMMU MR notifier registration
> s/he/the/
>
>> for instance.
>>
>> Until we find a better solution, make sure the vfio_find_hostwin()
>> is not called anymore for IOMMU region.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v2 -> v3:
>> - take into account the avail IOVA range may not be existing
>> - Create as many VFIOHostDMAWindow as valid IOVA ranges
>> - rebase on top of vfio-next
>>
>> I have not found any working solution to the IOMMU MR resizing.
>> So I can remove this patch or remove the check for IOMMU MR. Maybe
>> this is an issue which can be handled separately?
> Am I correct that the only thing we're solving here is the FIXME?
>
> Without this change we assume a 64-bit IOVA address space for the
> "hostwin" and the previous patch 03/ already sets the usable IOVA range
> for the IOMMU aperture.  Kernel code will error on mappings outside of
> the usable IOVA ranges regardless, so at best we're making the failure
> occur earlier in QEMU rather than at the time of the DMA mapping.
yes that's what the patch aims at
>
> I think the FIXME comment had assumed the hostwin support would be more
> universal, but perhaps we're just doubling down on a relic of SPAPR
> support that we can safely ignore, and at some point remove.  It really
> seems to serve the same purpose as the new iova_ranges and if it were
> worthwhile to fail the range in QEMU before trying to map it in the
> kernel, we could test it against iova_ranges directly.
>
> Unless this serves some purpose that I'm not spotting, maybe we should
> drop this patch, consider hostwin to be SPAPR specific, and avoid
> further entanglements with it here so that it can be more easily
> removed.  Thanks,
I am fine dropping the patch esp because I failed at resizing the IOMMU
MR on-the-fly and was forced to move vfio_find_hostwin().

Thank you for the review.

Eric
>
> Alex
>
>> ---
>>  hw/vfio/common.c    | 14 +++++++-------
>>  hw/vfio/container.c | 23 +++++++++++++++++------
>>  2 files changed, 24 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 9d804152ba..1447b6fdc4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -654,13 +654,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>          goto fail;
>>      }
>>  
>> -    hostwin = vfio_find_hostwin(container, iova, end);
>> -    if (!hostwin) {
>> -        error_setg(&err, "Container %p can't map guest IOVA region"
>> -                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> -        goto fail;
>> -    }
>> -
>>      memory_region_ref(section->mr);
>>  
>>      if (memory_region_is_iommu(section->mr)) {
>> @@ -720,6 +713,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>          return;
>>      }
>>  
>> +    hostwin = vfio_find_hostwin(container, iova, end);
>> +    if (!hostwin) {
>> +        error_setg(&err, "Container %p can't map guest IOVA region"
>> +                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> +        goto fail;
>> +    }
>> +
>>      /* Here we assume that memory_region_is_ram(section->mr)==true */
>>  
>>      /*
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 5122ff6d92..eb9d962881 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -693,12 +693,23 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>>          vfio_get_iommu_info_migration(container, info);
>>          g_free(info);
>>  
>> -        /*
>> -         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> -         * information to get the actual window extent rather than assume
>> -         * a 64-bit IOVA address space.
>> -         */
>> -        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
>> +        if (container->nr_iovas == -1) {
>> +            /*
>> +             * no available info on usable IOVA ranges,
>> +             * assume 64b IOVA space
>> +             */
>> +            vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
>> +        } else {
>> +            GList *l;
>> +
>> +            g_assert(container->nr_iovas);
>> +            for (l = container->iova_ranges; l; l = l->next) {
>> +                Range *r = l->data;
>> +
>> +                vfio_host_win_add(container, range_lob(r), range_upb(r),
>> +                                  container->pgsizes);
>> +            }
>> +        }
>>  
>>          break;
>>      }



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 03/13] vfio: Collect container iova range info
  2023-10-18 19:07   ` Alex Williamson
@ 2023-10-19  6:39     ` Eric Auger
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-19  6:39 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
	pbonzini, peter.maydell, peterx, david, philmd, zhenzhong.duan,
	yi.l.liu

Hi Alex,

On 10/18/23 21:07, Alex Williamson wrote:
> On Wed, 11 Oct 2023 19:52:19 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> capability is supported.
>>
>> This allows the information to be propagated through the IOMMU MR
>> set_iova_ranges() callback so that virtual IOMMUs
>> become aware of those aperture constraints. This is only done if
>> the info is available and the number of iova ranges is greater
>> than 0.
>>
>> A new vfio_get_info_iova_range helper is introduced, matching
>> the coding style of the existing vfio_get_info_dma_avail. The
>> returned boolean value isn't used, though. The code of both
>> helpers is aligned.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v2 -> v3:
>> - Turn nr_iovas into a int initialized to -1
>> - memory_region_iommu_set_iova_ranges only is called if nr_iovas > 0
>> - vfio_get_info_iova_range returns a bool to match
>>   vfio_get_info_dma_avail. Uniformize both code by using !hdr in
>>   the check
>> - rebase on top of vfio-next
>> ---
>>  include/hw/vfio/vfio-common.h |  2 ++
>>  hw/vfio/common.c              |  9 +++++++
>>  hw/vfio/container.c           | 44 ++++++++++++++++++++++++++++++++---
>>  3 files changed, 52 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 7780b9073a..848ff47960 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -99,6 +99,8 @@ typedef struct VFIOContainer {
>>      QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>>      QLIST_ENTRY(VFIOContainer) next;
>>      QLIST_HEAD(, VFIODevice) device_list;
>> +    int nr_iovas;
>> +    GList *iova_ranges;
> Nit, nr_iovas seems like it has a pretty weak use case here.  We can
> just test iova_ranges != NULL for calling set_iova_ranges.  In patch 13
> we can again test against NULL, which I think also negates the need to
> assert nr_iovas since the NULL test automatically catches the zero
> case.  Otherwise
>
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

Makes sense, all the more so as I am going to drop patch 13. So I will
respin, remove nr_iovas, and just rely on testing iova_ranges.

Thanks!

Eric
>
>>  } VFIOContainer;
>>  
>>  typedef struct VFIOGuestIOMMU {
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5ff5acf1d8..9d804152ba 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -699,6 +699,15 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>              goto fail;
>>          }
>>  
>> +        if (container->nr_iovas > 0) {
>> +            ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
>> +                    container->iova_ranges, &err);
>> +            if (ret) {
>> +                g_free(giommu);
>> +                goto fail;
>> +            }
>> +        }
>> +
>>          ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
>>                                                      &err);
>>          if (ret) {
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index adc467210f..5122ff6d92 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -382,7 +382,7 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>>      /* If the capability cannot be found, assume no DMA limiting */
>>      hdr = vfio_get_iommu_type1_info_cap(info,
>>                                          VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL);
>> -    if (hdr == NULL) {
>> +    if (!hdr) {
>>          return false;
>>      }
>>  
>> @@ -394,6 +394,33 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>>      return true;
>>  }
>>  
>> +static bool vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
>> +                                     VFIOContainer *container)
>> +{
>> +    struct vfio_info_cap_header *hdr;
>> +    struct vfio_iommu_type1_info_cap_iova_range *cap;
>> +
>> +    hdr = vfio_get_iommu_type1_info_cap(info,
>> +                                        VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
>> +    if (!hdr) {
>> +        return false;
>> +    }
>> +
>> +    cap = (void *)hdr;
>> +
>> +    container->nr_iovas = cap->nr_iovas;
>> +    for (int i = 0; i < cap->nr_iovas; i++) {
>> +        Range *range = g_new(Range, 1);
>> +
>> +        range_set_bounds(range, cap->iova_ranges[i].start,
>> +                         cap->iova_ranges[i].end);
>> +        container->iova_ranges =
>> +            range_list_insert(container->iova_ranges, range);
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>  static void vfio_kvm_device_add_group(VFIOGroup *group)
>>  {
>>      Error *err = NULL;
>> @@ -535,6 +562,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>>      }
>>  }
>>  
>> +static void vfio_free_container(VFIOContainer *container)
>> +{
>> +    g_list_free_full(container->iova_ranges, g_free);
>> +    g_free(container);
>> +}
>> +
>>  static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>>                                    Error **errp)
>>  {
>> @@ -616,6 +649,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>>      container->error = NULL;
>>      container->dirty_pages_supported = false;
>>      container->dma_max_mappings = 0;
>> +    container->nr_iovas = -1;
>> +    container->iova_ranges = NULL;
>>      QLIST_INIT(&container->giommu_list);
>>      QLIST_INIT(&container->hostwin_list);
>>      QLIST_INIT(&container->vrdl_list);
>> @@ -652,6 +687,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>>          if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
>>              container->dma_max_mappings = 65535;
>>          }
>> +
>> +        vfio_get_info_iova_range(info, container);
>> +
>>          vfio_get_iommu_info_migration(container, info);
>>          g_free(info);
>>  
>> @@ -765,7 +803,7 @@ enable_discards_exit:
>>      vfio_ram_block_discard_disable(container, false);
>>  
>>  free_container_exit:
>> -    g_free(container);
>> +    vfio_free_container(container);
>>  
>>  close_fd_exit:
>>      close(fd);
>> @@ -819,7 +857,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>>  
>>          trace_vfio_disconnect_container(container->fd);
>>          close(container->fd);
>> -        g_free(container);
>> +        vfio_free_container(container);
>>  
>>          vfio_put_address_space(space);
>>      }




* Re: [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-10-18 13:37 ` [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Michael S. Tsirkin
@ 2023-10-19  9:07   ` YangHang Liu
  2023-10-19  9:08     ` Eric Auger
  2023-10-19 11:07   ` Cédric Le Goater
  1 sibling, 1 reply; 27+ messages in thread
From: YangHang Liu @ 2023-10-19  9:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	clg, jean-philippe, pbonzini, peter.maydell, peterx, david,
	philmd, zhenzhong.duan, yi.l.liu

The original issue I found: after starting a VM which has two ice PFs
and a virtio-iommu device, qemu-kvm and the VM guest dmesg throw lots
of duplicate VFIO_MAP_DMA errors.

After testing with Eric's build, the original issue is gone and the
Tier1 regression test against the ice PF and virtio-iommu device passes
as well.

Tested-by: Yanghang Liu <yanghliu@redhat.com>


On Wed, Oct 18, 2023 at 9:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
> > This applies on top of vfio-next:
> > https://github.com/legoater/qemu/, vfio-next branch
>
> virtio things make sense
>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>
> let me know how you want to merge all this.
>
>
>
> > On x86, when assigning VFIO-PCI devices protected with virtio-iommu
> > we encounter the case where the guest tries to map IOVAs beyond 48b
> > whereas the physical VTD IOMMU only supports 48b. This ends up with
> > VFIO_MAP_DMA failures at qemu level because at kernel level,
> > vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
> >
> > This is due to the fact that the virtio-iommu currently
> > unconditionally exposes an IOVA range of 64b through its config
> > input range fields.
> >
> > This series removes this assumption by retrieving the usable IOVA
> > regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
> > a VFIO device is attached. This info is communicated to the
> > virtio-iommu memory region and transformed into the inverse info,
> > i.e. the host-reserved IOVA regions. The latter are then combined
> > with the reserved IOVA regions set through the virtio-iommu
> > reserved-regions property. That way, the guest virtio-iommu driver,
> > unchanged, is able to probe the whole set of reserved regions and
> > prevent any IOVA belonging to those ranges from being used,
> > achieving the original goal.
> >
> > Best Regards
> >
> > Eric
> >
> > This series can be found at:
> > https://github.com/eauger/qemu/tree/vfio-next-iommu_geometry-v3
> >
> > History:
> > v2 -> v3:
> > - rebase on top of vfio-next (including iommufd prereq)
> > - take into account IOVA range info capability may not be offered by
> >   old kernel and use nr_iovas = -1 to encode that [Alex]
> > - use GList * everywhere instead of arrays (in range_inverse_array)
> >   with the benefit that it sorts the ranges retrieved from the
> >   kernel, which are not guaranteed to be sorted. Rework the tests
> >   accordingly [Alex]
> > - Make sure the resv_regions GList is built before the probe() [Jean]
> >   per device list is first populated with prop resv regions on
> >   IOMMUDevice creation and then rebuilt on set_iova()
> > - Add a warning if set_iova builds a valid list after probe was
> >   called [Jean]
> > - Build host windows on top of IOVA valid ranges if this info can
> >   be retrieved from the kernel. As many windows are created as
> >   valid ranges
> > v1 -> v2:
> > - Remove "[PATCH 12/13] virtio-iommu: Resize memory region according
> >   to the max iova info" which causes way too much trouble: it
> >   triggers a coredump in vhost and causes duplication of IOMMU
> >   notifiers, leading to EEXIST vfio_dma_map errors, ... This looks
> >   like a bad usage of the memory API so I prefer removing it from
> >   this series. I was therefore also obliged to remove the
> >   vfio_find_hostwin() check in the case of an IOMMU.
> > - Let range_inverse_array() take low/high args instead of hardcoding
> >   0, UINT64_MAX, which complicated both the algo and the tests.
> > - Move range function description in header.
> > - Check that if set_iova_ranges is called several times, new resv
> >   regions are included in previous ones
> >
> > Eric Auger (13):
> >   memory: Let ReservedRegion use Range
> >   memory: Introduce memory_region_iommu_set_iova_ranges
> >   vfio: Collect container iova range info
> >   virtio-iommu: Rename reserved_regions into prop_resv_regions
> >   range: Make range_compare() public
> >   util/reserved-region: Add new ReservedRegion helpers
> >   virtio-iommu: Introduce per IOMMUDevice reserved regions
> >   range: Introduce range_inverse_array()
> >   virtio-iommu: Record whether a probe request has been issued
> >   virtio-iommu: Implement set_iova_ranges() callback
> >   virtio-iommu: Consolidate host reserved regions and property set ones
> >   test: Add some tests for range and resv-mem helpers
> >   vfio: Remove 64-bit IOVA address space assumption
> >
> >  include/exec/memory.h            |  34 +++-
> >  include/hw/vfio/vfio-common.h    |   2 +
> >  include/hw/virtio/virtio-iommu.h |   7 +-
> >  include/qemu/range.h             |  14 ++
> >  include/qemu/reserved-region.h   |  32 ++++
> >  hw/core/qdev-properties-system.c |   9 +-
> >  hw/vfio/common.c                 |  23 ++-
> >  hw/vfio/container.c              |  67 ++++++-
> >  hw/virtio/virtio-iommu-pci.c     |   8 +-
> >  hw/virtio/virtio-iommu.c         | 155 +++++++++++++--
> >  system/memory.c                  |  13 ++
> >  tests/unit/test-resv-mem.c       | 318 +++++++++++++++++++++++++++++++
> >  util/range.c                     |  61 +++++-
> >  util/reserved-region.c           |  91 +++++++++
> >  hw/virtio/trace-events           |   1 +
> >  tests/unit/meson.build           |   1 +
> >  util/meson.build                 |   1 +
> >  17 files changed, 791 insertions(+), 46 deletions(-)
> >  create mode 100644 include/qemu/reserved-region.h
> >  create mode 100644 tests/unit/test-resv-mem.c
> >  create mode 100644 util/reserved-region.c
> >
> > --
> > 2.41.0
>
>




* Re: [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-10-19  9:07   ` YangHang Liu
@ 2023-10-19  9:08     ` Eric Auger
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Auger @ 2023-10-19  9:08 UTC (permalink / raw)
  To: YangHang Liu, Michael S. Tsirkin
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
	jean-philippe, pbonzini, peter.maydell, peterx, david, philmd,
	zhenzhong.duan, yi.l.liu

Hi Yanghang,

On 10/19/23 11:07, YangHang Liu wrote:
> The original issue I found: after starting a VM which has two ice PFs
> and a virtio-iommu device, qemu-kvm and the VM guest dmesg throw lots
> of duplicate VFIO_MAP_DMA errors.
>
> After testing with Eric's build, the original issue is gone and the
> Tier1 regression test against the ice PF and virtio-iommu device
> passes as well.
>
> Tested-by: Yanghang Liu <yanghliu@redhat.com>

Thank you for testing!

Eric
>
>
> On Wed, Oct 18, 2023 at 9:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
>>> This applies on top of vfio-next:
>>> https://github.com/legoater/qemu/, vfio-next branch
>> virtio things make sense
>>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> let me know how you want to merge all this.
>>
>>
>>
>>> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
>>> we encounter the case where the guest tries to map IOVAs beyond 48b
>>> whereas the physical VTD IOMMU only supports 48b. This ends up with
>>> VFIO_MAP_DMA failures at qemu level because at kernel level,
>>> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>>>
>>> This is due to the fact that the virtio-iommu currently
>>> unconditionally exposes an IOVA range of 64b through its config
>>> input range fields.
>>>
>>> This series removes this assumption by retrieving the usable IOVA
>>> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
>>> a VFIO device is attached. This info is communicated to the
>>> virtio-iommu memory region and transformed into the inverse info,
>>> i.e. the host-reserved IOVA regions. The latter are then combined
>>> with the reserved IOVA regions set through the virtio-iommu
>>> reserved-regions property. That way, the guest virtio-iommu driver,
>>> unchanged, is able to probe the whole set of reserved regions and
>>> prevent any IOVA belonging to those ranges from being used,
>>> achieving the original goal.
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> This series can be found at:
>>> https://github.com/eauger/qemu/tree/vfio-next-iommu_geometry-v3
>>>
>>> History:
>>> v2 -> v3:
>>> - rebase on top of vfio-next (including iommufd prereq)
>>> - take into account IOVA range info capability may not be offered by
>>>   old kernel and use nr_iovas = -1 to encode that [Alex]
>>> - use GList * everywhere instead of arrays (in range_inverse_array)
>>>   with the benefit that it sorts the ranges retrieved from the
>>>   kernel, which are not guaranteed to be sorted. Rework the tests
>>>   accordingly [Alex]
>>> - Make sure the resv_regions GList is built before the probe() [Jean]
>>>   per device list is first populated with prop resv regions on
>>>   IOMMUDevice creation and then rebuilt on set_iova()
>>> - Add a warning if set_iova builds a valid list after probe was
>>>   called [Jean]
>>> - Build host windows on top of IOVA valid ranges if this info can
>>>   be retrieved from the kernel. As many windows are created as
>>>   valid ranges
>>> v1 -> v2:
>>> - Remove "[PATCH 12/13] virtio-iommu: Resize memory region according
>>>   to the max iova info" which causes way too much trouble: it
>>>   triggers a coredump in vhost and causes duplication of IOMMU
>>>   notifiers, leading to EEXIST vfio_dma_map errors, ... This looks
>>>   like a bad usage of the memory API so I prefer removing it from
>>>   this series. I was therefore also obliged to remove the
>>>   vfio_find_hostwin() check in the case of an IOMMU.
>>> - Let range_inverse_array() take low/high args instead of hardcoding
>>>   0, UINT64_MAX, which complicated both the algo and the tests.
>>> - Move range function description in header.
>>> - Check that if set_iova_ranges is called several times, new resv
>>>   regions are included in previous ones
>>>
>>> Eric Auger (13):
>>>   memory: Let ReservedRegion use Range
>>>   memory: Introduce memory_region_iommu_set_iova_ranges
>>>   vfio: Collect container iova range info
>>>   virtio-iommu: Rename reserved_regions into prop_resv_regions
>>>   range: Make range_compare() public
>>>   util/reserved-region: Add new ReservedRegion helpers
>>>   virtio-iommu: Introduce per IOMMUDevice reserved regions
>>>   range: Introduce range_inverse_array()
>>>   virtio-iommu: Record whether a probe request has been issued
>>>   virtio-iommu: Implement set_iova_ranges() callback
>>>   virtio-iommu: Consolidate host reserved regions and property set ones
>>>   test: Add some tests for range and resv-mem helpers
>>>   vfio: Remove 64-bit IOVA address space assumption
>>>
>>>  include/exec/memory.h            |  34 +++-
>>>  include/hw/vfio/vfio-common.h    |   2 +
>>>  include/hw/virtio/virtio-iommu.h |   7 +-
>>>  include/qemu/range.h             |  14 ++
>>>  include/qemu/reserved-region.h   |  32 ++++
>>>  hw/core/qdev-properties-system.c |   9 +-
>>>  hw/vfio/common.c                 |  23 ++-
>>>  hw/vfio/container.c              |  67 ++++++-
>>>  hw/virtio/virtio-iommu-pci.c     |   8 +-
>>>  hw/virtio/virtio-iommu.c         | 155 +++++++++++++--
>>>  system/memory.c                  |  13 ++
>>>  tests/unit/test-resv-mem.c       | 318 +++++++++++++++++++++++++++++++
>>>  util/range.c                     |  61 +++++-
>>>  util/reserved-region.c           |  91 +++++++++
>>>  hw/virtio/trace-events           |   1 +
>>>  tests/unit/meson.build           |   1 +
>>>  util/meson.build                 |   1 +
>>>  17 files changed, 791 insertions(+), 46 deletions(-)
>>>  create mode 100644 include/qemu/reserved-region.h
>>>  create mode 100644 tests/unit/test-resv-mem.c
>>>  create mode 100644 util/reserved-region.c
>>>
>>> --
>>> 2.41.0
>>




* Re: [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-10-18 13:37 ` [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Michael S. Tsirkin
  2023-10-19  9:07   ` YangHang Liu
@ 2023-10-19 11:07   ` Cédric Le Goater
  2023-10-19 11:20     ` Michael S. Tsirkin
  2023-10-19 13:51     ` Eric Auger
  1 sibling, 2 replies; 27+ messages in thread
From: Cédric Le Goater @ 2023-10-19 11:07 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	jean-philippe, pbonzini, peter.maydell, peterx, david, philmd,
	zhenzhong.duan, yi.l.liu

On 10/18/23 15:37, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
>> This applies on top of vfio-next:
>> https://github.com/legoater/qemu/, vfio-next branch
> 
> virtio things make sense
> 
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> 
> let me know how you want to merge all this.

Michael,

I will grab the series if that's OK.

Thanks,

C.




* Re: [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-10-19 11:07   ` Cédric Le Goater
@ 2023-10-19 11:20     ` Michael S. Tsirkin
  2023-10-19 13:51     ` Eric Auger
  1 sibling, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2023-10-19 11:20 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	jean-philippe, pbonzini, peter.maydell, peterx, david, philmd,
	zhenzhong.duan, yi.l.liu

On Thu, Oct 19, 2023 at 01:07:41PM +0200, Cédric Le Goater wrote:
> On 10/18/23 15:37, Michael S. Tsirkin wrote:
> > On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
> > > This applies on top of vfio-next:
> > > https://github.com/legoater/qemu/, vfio-next branch
> > 
> > virtio things make sense
> > 
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > let me know how you want to merge all this.
> 
> Michael,
> 
> I will grab the series if that's OK.

fine by me

> Thanks,
> 
> C.




* Re: [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-10-19 11:07   ` Cédric Le Goater
  2023-10-19 11:20     ` Michael S. Tsirkin
@ 2023-10-19 13:51     ` Eric Auger
  2023-10-19 17:40       ` Cédric Le Goater
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Auger @ 2023-10-19 13:51 UTC (permalink / raw)
  To: Cédric Le Goater, Michael S. Tsirkin
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	jean-philippe, pbonzini, peter.maydell, peterx, david, philmd,
	zhenzhong.duan, yi.l.liu, YangHang Liu

Hi Cédric,

On 10/19/23 13:07, Cédric Le Goater wrote:
> On 10/18/23 15:37, Michael S. Tsirkin wrote:
>> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
>>> This applies on top of vfio-next:
>>> https://github.com/legoater/qemu/, vfio-next branch
>>
>> virtio things make sense
>>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> let me know how you want to merge all this.
>
> Michael,
>
> I will grab the series if that's OK.

I have just sent a v4 taking into account Alex's suggestions, collecting
Michael's and Alex's R-b, and YangHang's T-b.
This should be ready to go if you don't have any other comments and if
this survives your non-regression tests ;-)

Eric
>
> Thanks,
>
> C.
>




* Re: [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
  2023-10-19 13:51     ` Eric Auger
@ 2023-10-19 17:40       ` Cédric Le Goater
  0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2023-10-19 17:40 UTC (permalink / raw)
  To: eric.auger, Michael S. Tsirkin
  Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	jean-philippe, pbonzini, peter.maydell, peterx, david, philmd,
	zhenzhong.duan, yi.l.liu, YangHang Liu

Hello Eric,

On 10/19/23 15:51, Eric Auger wrote:
> Hi Cédric,
> 
> On 10/19/23 13:07, Cédric Le Goater wrote:
>> On 10/18/23 15:37, Michael S. Tsirkin wrote:
>>> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
>>>> This applies on top of vfio-next:
>>>> https://github.com/legoater/qemu/, vfio-next branch
>>>
>>> virtio things make sense
>>>
>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>> let me know how you want to merge all this.
>>
>> Michael,
>>
>> I will grab the series if that's OK.
> 
> I have just sent a v4 taking into account Alex's suggestions, collecting
> Michael's and Alex's R-b, and YangHang's T-b.
> This should be ready to go if you don't have any other comments and if
> this survives your non-regression tests ;-)

Sure.

I did a simple fix in patch 2 for a documentation breakage. No need
to resend. I should apply it in the morning.

Cheers,

C.




* Re: [PATCH v3 12/13] test: Add some tests for range and resv-mem helpers
  2023-10-11 17:52 ` [PATCH v3 12/13] test: Add some tests for range and resv-mem helpers Eric Auger
@ 2023-10-30  7:48   ` Cédric Le Goater
  0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2023-10-30  7:48 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
	jean-philippe, mst, pbonzini
  Cc: peter.maydell, peterx, david, philmd, zhenzhong.duan, yi.l.liu

On 10/11/23 19:52, Eric Auger wrote:
> Add unit tests for both resv_region_list_insert() and
> range_inverse_array().
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v2 -> v3:
> - conversion to new GList based protos
> ---
>   tests/unit/test-resv-mem.c | 318 +++++++++++++++++++++++++++++++++++++
>   tests/unit/meson.build     |   1 +
>   2 files changed, 319 insertions(+)
>   create mode 100644 tests/unit/test-resv-mem.c
> 
> diff --git a/tests/unit/test-resv-mem.c b/tests/unit/test-resv-mem.c
> new file mode 100644
> index 0000000000..ea3336c39d
> --- /dev/null
> +++ b/tests/unit/test-resv-mem.c
> @@ -0,0 +1,318 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * reserved-region/range.c unit-tests.
> + *
> + * Copyright (C) 2023, Red Hat, Inc.
> + *
> + * Author: Eric Auger <eric.auger@redhat.com>
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/range.h"
> +#include "exec/memory.h"
> +#include "qemu/reserved-region.h"
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +static void print_ranges(const char *prefix, GList *ranges)
> +{
> +    GList *l;
> +    int i = 0;
> +
> +    if (!g_list_length(ranges)) {
> +        printf("%s is void\n", prefix);
> +        return;
> +    }
> +    for (l = ranges; l; l = l->next) {
> +        Range *r = (Range *)l->data;
> +
> +        printf("%s rev[%i] = [0x%"PRIx64",0x%"PRIx64"]\n",
> +               prefix, i, range_lob(r), range_upb(r));
> +        i++;
> +    }
> +}
> +#endif
> +
> +static void compare_ranges(const char *prefix, GList *ranges,
> +                           GList *expected)
> +{
> +    GList *l, *e;
> +    int i = 0;

I dropped this variable. It's unused and clang complained about it.

Thanks,

C.

> +
> +#if DEBUG
> +    print_ranges("out", ranges);
> +    print_ranges("expected", expected);
> +#endif
> +    g_assert_cmpint(g_list_length(ranges), ==, g_list_length(expected));
> +    for (l = ranges, e = expected; l ; l = l->next, e = e->next) {
> +        Range *r = (Range *)l->data;
> +        Range *er = (Range *)e->data;
> +
> +        g_assert_true(range_lob(r) == range_lob(er) &&
> +                      range_upb(r) == range_upb(er));
> +        i++;
> +    }
> +}
> +
> +static GList *insert_sorted_range(GList *list, uint64_t lob, uint64_t upb)
> +{
> +    Range *new = g_new0(Range, 1);
> +
> +    range_set_bounds(new, lob, upb);
> +    return range_list_insert(list, new);
> +}
> +
> +static void reset(GList **in, GList **out, GList **expected)
> +{
> +    g_list_free_full(*in, g_free);
> +    g_list_free_full(*out, g_free);
> +    g_list_free_full(*expected, g_free);
> +    *in = NULL;
> +    *out = NULL;
> +    *expected = NULL;
> +}
> +
> +static void
> +run_range_inverse_array(const char *prefix, GList **in, GList **expected,
> +                        uint64_t low, uint64_t high)
> +{
> +    GList *out = NULL;
> +    range_inverse_array(*in, &out, low, high);
> +    compare_ranges(prefix, out, *expected);
> +    reset(in, &out, expected);
> +}
> +
> +static void check_range_reverse_array(void)
> +{
> +    GList *in = NULL, *expected = NULL;
> +
> +    /* test 1 */
> +
> +    in = insert_sorted_range(in, 0x10000, UINT64_MAX);
> +    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
> +    run_range_inverse_array("test1", &in, &expected, 0x0, UINT64_MAX);
> +
> +    /* test 2 */
> +
> +    in = insert_sorted_range(in, 0x10000, 0xFFFFFFFFFFFF);
> +    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
> +    expected = insert_sorted_range(expected, 0x1000000000000, UINT64_MAX);
> +    run_range_inverse_array("test1", &in, &expected, 0x0, UINT64_MAX);
> +
> +    /* test 3 */
> +
> +    in = insert_sorted_range(in, 0x0, 0xFFFF);
> +    in = insert_sorted_range(in, 0x10000, 0x2FFFF);
> +    expected = insert_sorted_range(expected, 0x30000, UINT64_MAX);
> +    run_range_inverse_array("test1", &in, &expected, 0x0, UINT64_MAX);
> +
> +    /* test 4 */
> +
> +    in = insert_sorted_range(in, 0x50000, 0x5FFFF);
> +    in = insert_sorted_range(in, 0x60000, 0xFFFFFFFFFFFF);
> +    expected = insert_sorted_range(expected, 0x0, 0x4FFFF);
> +    expected = insert_sorted_range(expected, 0x1000000000000, UINT64_MAX);
> +    run_range_inverse_array("test1", &in, &expected, 0x0, UINT64_MAX);
> +
> +    /* test 5 */
> +
> +    in = insert_sorted_range(in, 0x0, UINT64_MAX);
> +    run_range_inverse_array("test1", &in, &expected, 0x0, UINT64_MAX);
> +
> +    /* test 6 */
> +    in = insert_sorted_range(in,  0x10000, 0x1FFFF);
> +    in = insert_sorted_range(in,  0x30000, 0x6FFFF);
> +    in = insert_sorted_range(in,  0x90000, UINT64_MAX);
> +    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
> +    expected = insert_sorted_range(expected, 0x20000, 0x2FFFF);
> +    expected = insert_sorted_range(expected, 0x70000, 0x8FFFF);
> +    run_range_inverse_array("test1", &in, &expected, 0x0, UINT64_MAX);
> +}
> +
> +static void check_range_reverse_array_low_end(void)
> +{
> +    GList *in = NULL, *expected = NULL;
> +
> +    /* test 1 */
> +    in = insert_sorted_range(in,  0x0, UINT64_MAX);
> +    run_range_inverse_array("test1", &in, &expected, 0x10000, 0xFFFFFF);
> +
> +    /* test 2 */
> +
> +    in = insert_sorted_range(in,  0x0, 0xFFFF);
> +    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
> +    expected = insert_sorted_range(expected, 0x40000, 0xFFFFFFFFFFFF);
> +    run_range_inverse_array("test2", &in, &expected, 0x40000, 0xFFFFFFFFFFFF);
> +
> +    /* test 3 */
> +    in = insert_sorted_range(in,  0x0, 0xFFFF);
> +    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
> +    in = insert_sorted_range(in,  0x1000000000000, UINT64_MAX);
> +    expected = insert_sorted_range(expected, 0x40000, 0xFFFFFFFFFFFF);
> +    run_range_inverse_array("test3", &in, &expected, 0x40000, 0xFFFFFFFFFFFF);
> +
> +    /* test 4 */
> +
> +    in = insert_sorted_range(in,  0x0, 0xFFFF);
> +    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
> +    in = insert_sorted_range(in,  0x1000000000000, UINT64_MAX);
> +    expected = insert_sorted_range(expected, 0x30000, 0xFFFFFFFFFFFF);
> +    run_range_inverse_array("test4", &in, &expected, 0x20000, 0xFFFFFFFFFFFF);
> +
> +    /* test 5 */
> +
> +    in = insert_sorted_range(in,  0x2000, 0xFFFF);
> +    in = insert_sorted_range(in,  0x20000, 0x2FFFF);
> +    in = insert_sorted_range(in,  0x100000000, 0x1FFFFFFFF);
> +    expected = insert_sorted_range(expected, 0x1000, 0x1FFF);
> +    expected = insert_sorted_range(expected, 0x10000, 0x1FFFF);
> +    expected = insert_sorted_range(expected, 0x30000, 0xFFFFFFFF);
> +    expected = insert_sorted_range(expected, 0x200000000, 0xFFFFFFFFFFFF);
> +    run_range_inverse_array("test5", &in, &expected, 0x1000, 0xFFFFFFFFFFFF);
> +
> +    /* test 6 */
> +
> +    in = insert_sorted_range(in,  0x10000000, 0x1FFFFFFF);
> +    in = insert_sorted_range(in,  0x100000000, 0x1FFFFFFFF);
> +    expected = insert_sorted_range(expected, 0x0, 0xFFFF);
> +    run_range_inverse_array("test6", &in, &expected, 0x0, 0xFFFF);
> +}
> +
> +static ReservedRegion *alloc_resv_mem(unsigned type, uint64_t lob, uint64_t upb)
> +{
> +    ReservedRegion *r;
> +
> +    r = g_new0(ReservedRegion, 1);
> +    r->type = type;
> +    range_set_bounds(&r->range, lob, upb);
> +    return r;
> +}
> +
> +static void print_resv_region_list(const char *prefix, GList *list,
> +                                   uint32_t expected_length)
> +{
> +    int i = g_list_length(list);
> +
> +    g_assert_cmpint(i, ==, expected_length);
> +#if DEBUG
> +    i = 0;
> +    for (GList *l = list; l; l = l->next) {
> +        ReservedRegion *r = (ReservedRegion *)l->data;
> +        Range *range = &r->range;
> +
> +        printf("%s item[%d]=[0x%x, 0x%"PRIx64", 0x%"PRIx64"]\n",
> +               prefix, i++, r->type, range_lob(range), range_upb(range));
> +    }
> +#endif
> +}
> +
> +static void free_resv_region(gpointer data)
> +{
> +    ReservedRegion *reg = (ReservedRegion *)data;
> +
> +    g_free(reg);
> +}
> +
> +static void check_resv_region_list_insert(void)
> +{
> +    ReservedRegion *r[10];
> +    GList *l = NULL;
> +
> +    r[0] = alloc_resv_mem(0xA, 0, 0xFFFF);
> +    r[1] = alloc_resv_mem(0xA, 0x20000, 0x2FFFF);
> +    l = resv_region_list_insert(l, r[0]);
> +    l = resv_region_list_insert(l, r[1]);
> +    print_resv_region_list("test1", l, 2);
> +
> +    /* adjacent on left */
> +    r[2] = alloc_resv_mem(0xB, 0x0, 0xFFF);
> +    l = resv_region_list_insert(l, r[2]);
> +    /* adjacent on right */
> +    r[3] = alloc_resv_mem(0xC, 0x21000, 0x2FFFF);
> +    l = resv_region_list_insert(l, r[3]);
> +    print_resv_region_list("test2", l, 4);
> +
> +    /* exact overlap of D into C */
> +    r[4] = alloc_resv_mem(0xD, 0x21000, 0x2FFFF);
> +    l = resv_region_list_insert(l, r[4]);
> +    print_resv_region_list("test3", l, 4);
> +
> +    /* in the middle */
> +    r[5] = alloc_resv_mem(0xE, 0x22000, 0x23FFF);
> +    l = resv_region_list_insert(l, r[5]);
> +    print_resv_region_list("test4", l, 6);
> +
> +    /* overwrites several existing ones */
> +    r[6] = alloc_resv_mem(0xF, 0x10000, 0x2FFFF);
> +    l = resv_region_list_insert(l, r[6]);
> +    print_resv_region_list("test5", l, 3);
> +
> +    /* contiguous at the end */
> +    r[7] = alloc_resv_mem(0x0, 0x30000, 0x40000);
> +    l = resv_region_list_insert(l, r[7]);
> +    print_resv_region_list("test6", l, 4);
> +
> +    g_list_free_full(l, free_resv_region);
> +    l = NULL;
> +
> +    r[0] = alloc_resv_mem(0x0, 0x10000, 0x1FFFF);
> +    l = resv_region_list_insert(l, r[0]);
> +    /* insertion before the 1st item */
> +    r[1] = alloc_resv_mem(0x1, 0x0, 0xFF);
> +    l = resv_region_list_insert(l, r[1]);
> +    print_resv_region_list("test8", l, 2);
> +
> +    /* collision on the left side */
> +    r[2] = alloc_resv_mem(0xA, 0x1200, 0x11FFF);
> +    l = resv_region_list_insert(l, r[2]);
> +    print_resv_region_list("test9", l, 3);
> +
> +    /* collision on the right side */
> +    r[3] = alloc_resv_mem(0xA, 0x1F000, 0x2FFFF);
> +    l = resv_region_list_insert(l, r[3]);
> +    print_resv_region_list("test10", l, 4);
> +
> +    /* override everything */
> +    r[4] = alloc_resv_mem(0xF, 0x0, UINT64_MAX);
> +    l = resv_region_list_insert(l, r[4]);
> +    print_resv_region_list("test11", l, 1);
> +
> +    g_list_free_full(l, free_resv_region);
> +    l = NULL;
> +
> +    r[0] = alloc_resv_mem(0xF, 0x1000000000000, UINT64_MAX);
> +    l = resv_region_list_insert(l, r[0]);
> +    print_resv_region_list("test12", l, 1);
> +
> +    r[1] = alloc_resv_mem(0xA, 0x0, 0xFFFFFFF);
> +    l = resv_region_list_insert(l, r[1]);
> +    print_resv_region_list("test13", l, 2);
> +
> +    r[2] = alloc_resv_mem(0xB, 0x100000000, 0x1FFFFFFFF);
> +    l = resv_region_list_insert(l, r[2]);
> +    print_resv_region_list("test14", l, 3);
> +
> +    r[3] = alloc_resv_mem(0x0, 0x010000000, 0x2FFFFFFFF);
> +    l = resv_region_list_insert(l, r[3]);
> +    print_resv_region_list("test15", l, 3);
> +
> +    g_list_free_full(l, free_resv_region);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    g_test_init(&argc, &argv, NULL);
> +
> +    g_test_add_func("/resv-mem/range_reverse_array",
> +                    check_range_reverse_array);
> +    g_test_add_func("/resv-mem/range_reverse_array_low_end",
> +                    check_range_reverse_array_low_end);
> +    g_test_add_func("/resv-mem/resv_region_list_insert",
> +                    check_resv_region_list_insert);
> +
> +    return g_test_run();
> +}
> diff --git a/tests/unit/meson.build b/tests/unit/meson.build
> index f33ae64b8d..e6c51e7a86 100644
> --- a/tests/unit/meson.build
> +++ b/tests/unit/meson.build
> @@ -21,6 +21,7 @@ tests = {
>     'test-opts-visitor': [testqapi],
>     'test-visitor-serialization': [testqapi],
>     'test-bitmap': [],
> +  'test-resv-mem': [],
>     # all code tested by test-x86-cpuid is inside topology.h
>     'test-x86-cpuid': [],
>     'test-cutils': [],




end of thread, other threads:[~2023-10-30  7:49 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-11 17:52 [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
2023-10-11 17:52 ` [PATCH v3 01/13] memory: Let ReservedRegion use Range Eric Auger
2023-10-11 17:52 ` [PATCH v3 02/13] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
2023-10-18 22:07   ` Peter Xu
2023-10-11 17:52 ` [PATCH v3 03/13] vfio: Collect container iova range info Eric Auger
2023-10-18 19:07   ` Alex Williamson
2023-10-19  6:39     ` Eric Auger
2023-10-11 17:52 ` [PATCH v3 04/13] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
2023-10-11 17:52 ` [PATCH v3 05/13] range: Make range_compare() public Eric Auger
2023-10-11 17:52 ` [PATCH v3 06/13] util/reserved-region: Add new ReservedRegion helpers Eric Auger
2023-10-11 17:52 ` [PATCH v3 07/13] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
2023-10-11 17:52 ` [PATCH v3 08/13] range: Introduce range_inverse_array() Eric Auger
2023-10-11 17:52 ` [PATCH v3 09/13] virtio-iommu: Record whether a probe request has been issued Eric Auger
2023-10-11 17:52 ` [PATCH v3 10/13] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
2023-10-11 17:52 ` [PATCH v3 11/13] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
2023-10-11 17:52 ` [PATCH v3 12/13] test: Add some tests for range and resv-mem helpers Eric Auger
2023-10-30  7:48   ` Cédric Le Goater
2023-10-11 17:52 ` [PATCH v3 13/13] vfio: Remove 64-bit IOVA address space assumption Eric Auger
2023-10-18 21:42   ` Alex Williamson
2023-10-19  6:37     ` Eric Auger
2023-10-18 13:37 ` [PATCH v3 00/13] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Michael S. Tsirkin
2023-10-19  9:07   ` YangHang Liu
2023-10-19  9:08     ` Eric Auger
2023-10-19 11:07   ` Cédric Le Goater
2023-10-19 11:20     ` Michael S. Tsirkin
2023-10-19 13:51     ` Eric Auger
2023-10-19 17:40       ` Cédric Le Goater
