* [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
@ 2023-09-13 8:01 Eric Auger
2023-09-13 8:01 ` [PATCH v2 01/12] memory: Let ReservedRegion use Range Eric Auger
` (12 more replies)
0 siblings, 13 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
On x86, when assigning VFIO-PCI devices protected with virtio-iommu
we encounter the case where the guest tries to map IOVAs beyond 48b
whereas the physical VTD IOMMU only supports 48b. This ends up with
VFIO_MAP_DMA failures at qemu level because at kernel level,
vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
This is due to the fact the virtio-iommu currently unconditionally
exposes an IOVA range of 64b through its config input range fields.
This series removes this assumption by retrieving the usable IOVA
regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
a VFIO device is attached. This info is communicated to the
virtio-iommu memory region, transformed into the inversed info, ie.
the host reserved IOVA regions. Then those latter are combined with the
reserved IOVA regions set though the virtio-iommu reserved-regions
property. That way, the guest virtio-iommu driver, unchanged, is
able to probe the whole set of reserved regions and prevent any IOVA
belonging to those ranges from beeing used, achieving the original goal.
Best Regards
Eric
This series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v2
History:
v1 -> v2:
- Remove "[PATCH 12/13] virtio-iommu: Resize memory region according
to the max iova info" which causes way too much trouble: trigger
a coredump in vhost, causes duplication of IOMMU notifiers causing
EEXIST vfio_dma_map errors, ... This looks like a bad usage of the
memory API so I prefer removing this from this series. So I was
also obliged to remove the vfio_find_hostwin() check in the case
of an IOMMU MR.
- Let range_inverse_array() take low/high args instead of hardcoding
0, UINT64_MAX which both complexifies the algo and the tests.
- Move range function description in header.
- Check that if set_iova_ranges is called several times, new resv
regions are included in previous ones
Eric Auger (12):
memory: Let ReservedRegion use Range
memory: Introduce memory_region_iommu_set_iova_ranges
vfio: Collect container iova range info
virtio-iommu: Rename reserved_regions into prop_resv_regions
virtio-iommu: Introduce per IOMMUDevice reserved regions
range: Introduce range_inverse_array()
virtio-iommu: Implement set_iova_ranges() callback
range: Make range_compare() public
util/reserved-region: Add new ReservedRegion helpers
virtio-iommu: Consolidate host reserved regions and property set ones
test: Add some tests for range and resv-mem helpers
vfio: Remove 64-bit IOVA address space assumption
include/exec/memory.h | 30 +++-
include/hw/vfio/vfio-common.h | 2 +
include/hw/virtio/virtio-iommu.h | 7 +-
include/qemu/range.h | 14 ++
include/qemu/reserved-region.h | 32 ++++
hw/core/qdev-properties-system.c | 9 +-
hw/vfio/common.c | 70 +++++++--
hw/virtio/virtio-iommu-pci.c | 8 +-
hw/virtio/virtio-iommu.c | 110 ++++++++++++--
softmmu/memory.c | 15 ++
tests/unit/test-resv-mem.c | 251 +++++++++++++++++++++++++++++++
util/range.c | 51 ++++++-
util/reserved-region.c | 94 ++++++++++++
hw/virtio/trace-events | 1 +
tests/unit/meson.build | 1 +
util/meson.build | 1 +
16 files changed, 655 insertions(+), 41 deletions(-)
create mode 100644 include/qemu/reserved-region.h
create mode 100644 tests/unit/test-resv-mem.c
create mode 100644 util/reserved-region.c
--
2.41.0
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v2 01/12] memory: Let ReservedRegion use Range
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-13 12:43 ` Cédric Le Goater
2023-09-13 8:01 ` [PATCH v2 02/12] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
` (11 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
A reserved region is a range tagged with a type. Let's directly use
the Range type in the prospect to reuse some of the library helpers
shipped with the Range type.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
v1 -> v2:
- Added Philippe and David's R-b
---
include/exec/memory.h | 4 ++--
hw/core/qdev-properties-system.c | 9 ++++++---
hw/virtio/virtio-iommu.c | 6 +++---
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 68284428f8..184cb3a01b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -24,6 +24,7 @@
#include "qemu/bswap.h"
#include "qemu/queue.h"
#include "qemu/int128.h"
+#include "qemu/range.h"
#include "qemu/notify.h"
#include "qom/object.h"
#include "qemu/rcu.h"
@@ -79,8 +80,7 @@ extern unsigned int global_dirty_tracking;
typedef struct MemoryRegionOps MemoryRegionOps;
struct ReservedRegion {
- hwaddr low;
- hwaddr high;
+ Range range;
unsigned type;
};
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 6d5d43eda2..8d486eeefd 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -699,7 +699,7 @@ static void get_reserved_region(Object *obj, Visitor *v, const char *name,
int rc;
rc = snprintf(buffer, sizeof(buffer), "0x%"PRIx64":0x%"PRIx64":%u",
- rr->low, rr->high, rr->type);
+ range_lob(&rr->range), range_upb(&rr->range), rr->type);
assert(rc < sizeof(buffer));
visit_type_str(v, name, &p, errp);
@@ -711,6 +711,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
Property *prop = opaque;
ReservedRegion *rr = object_field_prop_ptr(obj, prop);
const char *endptr;
+ uint64_t lob, upb;
char *str;
int ret;
@@ -718,7 +719,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
return;
}
- ret = qemu_strtou64(str, &endptr, 16, &rr->low);
+ ret = qemu_strtou64(str, &endptr, 16, &lob);
if (ret) {
error_setg(errp, "start address of '%s'"
" must be a hexadecimal integer", name);
@@ -728,7 +729,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
goto separator_error;
}
- ret = qemu_strtou64(endptr + 1, &endptr, 16, &rr->high);
+ ret = qemu_strtou64(endptr + 1, &endptr, 16, &upb);
if (ret) {
error_setg(errp, "end address of '%s'"
" must be a hexadecimal integer", name);
@@ -738,6 +739,8 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
goto separator_error;
}
+ range_set_bounds(&rr->range, lob, upb);
+
ret = qemu_strtoui(endptr + 1, &endptr, 10, &rr->type);
if (ret) {
error_setg(errp, "type of '%s'"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index be51635895..e5e46e1b55 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -645,8 +645,8 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
prop.head.length = cpu_to_le16(length);
prop.subtype = subtype;
- prop.start = cpu_to_le64(s->reserved_regions[i].low);
- prop.end = cpu_to_le64(s->reserved_regions[i].high);
+ prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
+ prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
memcpy(buf, &prop, size);
@@ -897,7 +897,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
for (i = 0; i < s->nb_reserved_regions; i++) {
ReservedRegion *reg = &s->reserved_regions[i];
- if (addr >= reg->low && addr <= reg->high) {
+ if (range_contains(®->range, addr)) {
switch (reg->type) {
case VIRTIO_IOMMU_RESV_MEM_T_MSI:
entry.perm = flag;
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 02/12] memory: Introduce memory_region_iommu_set_iova_ranges
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
2023-09-13 8:01 ` [PATCH v2 01/12] memory: Let ReservedRegion use Range Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-13 12:43 ` Cédric Le Goater
2023-09-13 8:01 ` [PATCH v2 03/12] vfio: Collect container iova range info Eric Auger
` (10 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
This helper will allow to convey information about valid
IOVA ranges to virtual IOMMUS.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
include/exec/memory.h | 26 ++++++++++++++++++++++++++
softmmu/memory.c | 15 +++++++++++++++
2 files changed, 41 insertions(+)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 184cb3a01b..f6fb99dd3f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -519,6 +519,27 @@ struct IOMMUMemoryRegionClass {
int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
uint64_t page_size_mask,
Error **errp);
+ /**
+ * @iommu_set_iova_ranges:
+ *
+ * Propagate information about the usable IOVA ranges for a given IOMMU
+ * memory region. Used for example to propagate host physical device
+ * reserved memory region constraints to the virtual IOMMU.
+ *
+ * Optional method: if this method is not provided, then the default IOVA
+ * aperture is used.
+ *
+ * @nr_ranges: number of IOVA ranges
+ *
+ * @iova_ranges: an array of @nr_ranges usable IOVA ranges
+ *
+ * Returns 0 on success, or a negative error. In case of failure, the error
+ * object must be created.
+ */
+ int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
+ uint32_t nr_ranges,
+ struct Range *iova_ranges,
+ Error **errp);
};
typedef struct RamDiscardListener RamDiscardListener;
@@ -1845,6 +1866,11 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
uint64_t page_size_mask,
Error **errp);
+int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu,
+ uint32_t nr_ranges,
+ struct Range *iova_ranges,
+ Error **errp);
+
/**
* memory_region_name: get a memory region's name
*
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce70..07499457aa 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1905,6 +1905,21 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
return ret;
}
+int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu_mr,
+ uint32_t nr_ranges,
+ struct Range *iova_ranges,
+ Error **errp)
+{
+ IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+ int ret = 0;
+
+ if (imrc->iommu_set_iova_ranges) {
+ ret = imrc->iommu_set_iova_ranges(iommu_mr, nr_ranges,
+ iova_ranges, errp);
+ }
+ return ret;
+}
+
int memory_region_register_iommu_notifier(MemoryRegion *mr,
IOMMUNotifier *n, Error **errp)
{
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 03/12] vfio: Collect container iova range info
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
2023-09-13 8:01 ` [PATCH v2 01/12] memory: Let ReservedRegion use Range Eric Auger
2023-09-13 8:01 ` [PATCH v2 02/12] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-13 12:55 ` Cédric Le Goater
2023-09-19 15:44 ` Alex Williamson
2023-09-13 8:01 ` [PATCH v2 04/12] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
` (9 subsequent siblings)
12 siblings, 2 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
capability is supported.
This allows to propagate the information though the IOMMU MR
set_iova_ranges() callback so that virtual IOMMUs
get aware of those aperture constraints.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
include/hw/vfio/vfio-common.h | 2 ++
hw/vfio/common.c | 45 +++++++++++++++++++++++++++++++++--
2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index da43d27352..74b9b27270 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -98,6 +98,8 @@ typedef struct VFIOContainer {
QLIST_HEAD(, VFIOGroup) group_list;
QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
QLIST_ENTRY(VFIOContainer) next;
+ unsigned nr_iovas;
+ struct vfio_iova_range *iova_ranges;
} VFIOContainer;
typedef struct VFIOGuestIOMMU {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9aac21abb7..26da38de05 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1157,6 +1157,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
goto fail;
}
+ ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
+ container->nr_iovas, (struct Range *)container->iova_ranges,
+ &err);
+ if (ret) {
+ g_free(giommu);
+ goto fail;
+ }
+
ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
&err);
if (ret) {
@@ -1981,6 +1989,29 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
return true;
}
+static void vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
+ unsigned int *nr_iovas,
+ struct vfio_iova_range **iova_ranges)
+{
+ struct vfio_info_cap_header *hdr;
+ struct vfio_iommu_type1_info_cap_iova_range *cap;
+
+ hdr = vfio_get_iommu_type1_info_cap(info,
+ VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
+ if (hdr == NULL) {
+ return;
+ }
+
+ cap = (void *)hdr;
+ *nr_iovas = cap->nr_iovas;
+
+ if (*nr_iovas == 0) {
+ return;
+ }
+ *iova_ranges = g_memdup2(cap->iova_ranges,
+ *nr_iovas * sizeof(struct vfio_iova_range));
+}
+
static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
struct vfio_region_info *info)
{
@@ -2433,6 +2464,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
}
}
+static void vfio_free_container(VFIOContainer *container)
+{
+ g_free(container->iova_ranges);
+ g_free(container);
+}
+
static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
Error **errp)
{
@@ -2550,6 +2587,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
container->dma_max_mappings = 65535;
}
+
+ vfio_get_info_iova_range(info, &container->nr_iovas,
+ &container->iova_ranges);
+
vfio_get_iommu_info_migration(container, info);
g_free(info);
@@ -2663,7 +2704,7 @@ enable_discards_exit:
vfio_ram_block_discard_disable(container, false);
free_container_exit:
- g_free(container);
+ vfio_free_container(container);
close_fd_exit:
close(fd);
@@ -2717,7 +2758,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
trace_vfio_disconnect_container(container->fd);
close(container->fd);
- g_free(container);
+ vfio_free_container(container);
vfio_put_address_space(space);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 04/12] virtio-iommu: Rename reserved_regions into prop_resv_regions
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (2 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 03/12] vfio: Collect container iova range info Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-13 13:01 ` Cédric Le Goater
2023-09-13 8:01 ` [PATCH v2 05/12] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
` (8 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Rename VirtIOIOMMU (nb_)reserved_regions fields with the "prop_" prefix
to highlight those fields are set through a property, at machine level.
They are IOMMU wide.
A subsequent patch will introduce per IOMMUDevice reserved regions
that will include both those IOMMU wide property reserved
regions plus, sometimes, host reserved regions, if the device is
backed by a host device protected by a physical IOMMU. Also change
nb_ prefix by nr_.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
include/hw/virtio/virtio-iommu.h | 4 ++--
hw/virtio/virtio-iommu-pci.c | 8 ++++----
hw/virtio/virtio-iommu.c | 15 ++++++++-------
3 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index a93fc5383e..eea4564782 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -55,8 +55,8 @@ struct VirtIOIOMMU {
GHashTable *as_by_busptr;
IOMMUPciBus *iommu_pcibus_by_bus_num[PCI_BUS_MAX];
PCIBus *primary_bus;
- ReservedRegion *reserved_regions;
- uint32_t nb_reserved_regions;
+ ReservedRegion *prop_resv_regions;
+ uint32_t nr_prop_resv_regions;
GTree *domains;
QemuRecMutex mutex;
GTree *endpoints;
diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
index 7ef2f9dcdb..9459fbf6ed 100644
--- a/hw/virtio/virtio-iommu-pci.c
+++ b/hw/virtio/virtio-iommu-pci.c
@@ -37,7 +37,7 @@ struct VirtIOIOMMUPCI {
static Property virtio_iommu_pci_properties[] = {
DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
- vdev.nb_reserved_regions, vdev.reserved_regions,
+ vdev.nr_prop_resv_regions, vdev.prop_resv_regions,
qdev_prop_reserved_region, ReservedRegion),
DEFINE_PROP_END_OF_LIST(),
};
@@ -54,9 +54,9 @@ static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
"for the virtio-iommu-pci device");
return;
}
- for (int i = 0; i < s->nb_reserved_regions; i++) {
- if (s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
- s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
+ for (int i = 0; i < s->nr_prop_resv_regions; i++) {
+ if (s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+ s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
error_setg(errp, "reserved region %d has an invalid type", i);
error_append_hint(errp, "Valid values are 0 and 1\n");
return;
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index e5e46e1b55..979cdb5648 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -631,22 +631,23 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
int i;
- total = size * s->nb_reserved_regions;
+ total = size * s->nr_prop_resv_regions;
if (total > free) {
return -ENOSPC;
}
- for (i = 0; i < s->nb_reserved_regions; i++) {
- unsigned subtype = s->reserved_regions[i].type;
+ for (i = 0; i < s->nr_prop_resv_regions; i++) {
+ unsigned subtype = s->prop_resv_regions[i].type;
+ Range *range = &s->prop_resv_regions[i].range;
assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
prop.head.length = cpu_to_le16(length);
prop.subtype = subtype;
- prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
- prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
+ prop.start = cpu_to_le64(range_lob(range));
+ prop.end = cpu_to_le64(range_upb(range));
memcpy(buf, &prop, size);
@@ -894,8 +895,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
goto unlock;
}
- for (i = 0; i < s->nb_reserved_regions; i++) {
- ReservedRegion *reg = &s->reserved_regions[i];
+ for (i = 0; i < s->nr_prop_resv_regions; i++) {
+ ReservedRegion *reg = &s->prop_resv_regions[i];
if (range_contains(®->range, addr)) {
switch (reg->type) {
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 05/12] virtio-iommu: Introduce per IOMMUDevice reserved regions
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (3 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 04/12] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-29 15:52 ` Jean-Philippe Brucker
2023-09-13 8:01 ` [PATCH v2 06/12] range: Introduce range_inverse_array() Eric Auger
` (7 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
For the time being the per device reserved regions are
just a duplicate of IOMMU wide reserved regions. Subsequent
patches will combine those with host reserved regions, if any.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
include/hw/virtio/virtio-iommu.h | 1 +
hw/virtio/virtio-iommu.c | 42 ++++++++++++++++++++++++++------
2 files changed, 35 insertions(+), 8 deletions(-)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index eea4564782..70b8ace34d 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -39,6 +39,7 @@ typedef struct IOMMUDevice {
AddressSpace as;
MemoryRegion root; /* The root container of the device */
MemoryRegion bypass_mr; /* The alias of shared memory MR */
+ GList *resv_regions;
} IOMMUDevice;
typedef struct IOMMUPciBus {
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 979cdb5648..ea359b586a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -624,22 +624,48 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
return ret;
}
+static int consolidate_resv_regions(IOMMUDevice *sdev)
+{
+ VirtIOIOMMU *s = sdev->viommu;
+ int i;
+
+ for (i = 0; i < s->nr_prop_resv_regions; i++) {
+ ReservedRegion *reg = g_new0(ReservedRegion, 1);
+
+ *reg = s->prop_resv_regions[i];
+ sdev->resv_regions = g_list_append(sdev->resv_regions, reg);
+ }
+ return 0;
+}
+
static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
uint8_t *buf, size_t free)
{
struct virtio_iommu_probe_resv_mem prop = {};
size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
- int i;
+ IOMMUDevice *sdev;
+ GList *l;
+ int ret;
- total = size * s->nr_prop_resv_regions;
+ sdev = container_of(virtio_iommu_mr(s, ep), IOMMUDevice, iommu_mr);
+ if (!sdev) {
+ return -EINVAL;
+ }
+ ret = consolidate_resv_regions(sdev);
+ if (ret) {
+ return ret;
+ }
+
+ total = size * g_list_length(sdev->resv_regions);
if (total > free) {
return -ENOSPC;
}
- for (i = 0; i < s->nr_prop_resv_regions; i++) {
- unsigned subtype = s->prop_resv_regions[i].type;
- Range *range = &s->prop_resv_regions[i].range;
+ for (l = sdev->resv_regions; l; l = l->next) {
+ ReservedRegion *reg = l->data;
+ unsigned subtype = reg->type;
+ Range *range = ®->range;
assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
@@ -857,7 +883,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
bool bypass_allowed;
int granule;
bool found;
- int i;
+ GList *l;
interval.low = addr;
interval.high = addr + 1;
@@ -895,8 +921,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
goto unlock;
}
- for (i = 0; i < s->nr_prop_resv_regions; i++) {
- ReservedRegion *reg = &s->prop_resv_regions[i];
+ for (l = sdev->resv_regions; l; l = l->next) {
+ ReservedRegion *reg = l->data;
if (range_contains(®->range, addr)) {
switch (reg->type) {
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 06/12] range: Introduce range_inverse_array()
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (4 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 05/12] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-19 16:29 ` Alex Williamson
2023-09-13 8:01 ` [PATCH v2 07/12] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
` (6 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
This helper reverses an array of regions, turning original
regions into holes and original holes into actual regions,
covering the whole UINT64_MAX span.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
v1 -> v2:
- Move range_inverse_array description comment in the header
- Take low/high params
---
include/qemu/range.h | 8 ++++++++
util/range.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 53 insertions(+)
diff --git a/include/qemu/range.h b/include/qemu/range.h
index 7e2b1cc447..2b59e3bf0c 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -219,4 +219,12 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
GList *range_list_insert(GList *list, Range *data);
+/*
+ * Inverse an array of sorted ranges over the [low, high] span, ie.
+ * original ranges becomes holes in the newly allocated inv_ranges
+ */
+void range_inverse_array(uint32_t nr_ranges, Range *ranges,
+ uint32_t *nr_inv_ranges, Range **inv_ranges,
+ uint64_t low, uint64_t high);
+
#endif
diff --git a/util/range.c b/util/range.c
index 098d9d2dc0..4baeb588cc 100644
--- a/util/range.c
+++ b/util/range.c
@@ -70,3 +70,48 @@ GList *range_list_insert(GList *list, Range *data)
return list;
}
+
+void range_inverse_array(uint32_t nr_ranges, Range *ranges,
+ uint32_t *nr_inv_ranges, Range **inv_ranges,
+ uint64_t low, uint64_t high)
+{
+ Range *resv;
+ int i = 0, j = 0;
+
+ resv = g_malloc0_n(nr_ranges + 1, sizeof(Range));
+
+ for (; j < nr_ranges && (range_upb(&ranges[j]) < low); j++) {
+ continue; /* skip all ranges below mon */
+ }
+
+ if (j == nr_ranges) {
+ range_set_bounds(&resv[i++], low, high);
+ goto realloc;
+ }
+
+ /* first range lob is greater than min, insert a first range */
+ if (range_lob(&ranges[j]) > low) {
+ range_set_bounds(&resv[i++], low,
+ MIN(range_lob(&ranges[j]) - 1, high));
+ }
+
+ /* insert a range inbetween each original range until we reach max */
+ for (; j < nr_ranges - 1; j++) {
+ if (range_lob(&ranges[j]) >= high) {
+ goto realloc;
+ }
+ if (range_compare(&ranges[j], &ranges[j + 1])) {
+ range_set_bounds(&resv[i++], range_upb(&ranges[j]) + 1,
+ MIN(range_lob(&ranges[j + 1]) - 1, high));
+ }
+ }
+ /* last range upb is less than max, insert a last range */
+ if (range_upb(&ranges[j]) < high) {
+ range_set_bounds(&resv[i++],
+ range_upb(&ranges[j]) + 1, high);
+ }
+realloc:
+ *nr_inv_ranges = i;
+ resv = g_realloc(resv, i * sizeof(Range));
+ *inv_ranges = resv;
+}
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 07/12] virtio-iommu: Implement set_iova_ranges() callback
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (5 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 06/12] range: Introduce range_inverse_array() Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-29 16:15 ` Jean-Philippe Brucker
2023-09-13 8:01 ` [PATCH v2 08/12] range: Make range_compare() public Eric Auger
` (5 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
The implementation populates the array of per IOMMUDevice
host reserved regions.
It is forbidden to have conflicting sets of host IOVA ranges
to be applied onto the same IOMMU MR (implied by different
host devices).
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
v1 -> v2:
- Forbid conflicting sets of host resv regions
---
include/hw/virtio/virtio-iommu.h | 2 ++
hw/virtio/virtio-iommu.c | 48 ++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 70b8ace34d..31b69c8261 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -40,6 +40,8 @@ typedef struct IOMMUDevice {
MemoryRegion root; /* The root container of the device */
MemoryRegion bypass_mr; /* The alias of shared memory MR */
GList *resv_regions;
+ Range *host_resv_regions;
+ uint32_t nr_host_resv_regions;
} IOMMUDevice;
typedef struct IOMMUPciBus {
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index ea359b586a..ed2df5116f 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -20,6 +20,7 @@
#include "qemu/osdep.h"
#include "qemu/log.h"
#include "qemu/iov.h"
+#include "qemu/range.h"
#include "exec/target_page.h"
#include "hw/qdev-properties.h"
#include "hw/virtio/virtio.h"
@@ -1158,6 +1159,52 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
return 0;
}
+static int virtio_iommu_set_iova_ranges(IOMMUMemoryRegion *mr,
+ uint32_t nr_ranges,
+ struct Range *iova_ranges,
+ Error **errp)
+{
+ IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+ uint32_t nr_host_resv_regions;
+ Range *host_resv_regions;
+ int ret = -EINVAL;
+
+ if (!nr_ranges) {
+ return 0;
+ }
+
+ if (sdev->host_resv_regions) {
+ range_inverse_array(nr_ranges, iova_ranges,
+ &nr_host_resv_regions, &host_resv_regions,
+ 0, UINT64_MAX);
+ if (nr_host_resv_regions != sdev->nr_host_resv_regions) {
+ goto error;
+ }
+ for (int i = 0; i < nr_host_resv_regions; i++) {
+ Range *new = &host_resv_regions[i];
+ Range *existing = &sdev->host_resv_regions[i];
+
+ if (!range_contains_range(existing, new)) {
+ goto error;
+ }
+ }
+ ret = 0;
+ goto out;
+ }
+
+ range_inverse_array(nr_ranges, iova_ranges,
+ &sdev->nr_host_resv_regions, &sdev->host_resv_regions,
+ 0, UINT64_MAX);
+
+ return 0;
+error:
+ error_setg(errp, "IOMMU mr=%s Conflicting host reserved regions set!",
+ mr->parent_obj.name);
+out:
+ g_free(host_resv_regions);
+ return ret;
+}
+
static void virtio_iommu_system_reset(void *opaque)
{
VirtIOIOMMU *s = opaque;
@@ -1453,6 +1500,7 @@ static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
imrc->replay = virtio_iommu_replay;
imrc->notify_flag_changed = virtio_iommu_notify_flag_changed;
imrc->iommu_set_page_size_mask = virtio_iommu_set_page_size_mask;
+ imrc->iommu_set_iova_ranges = virtio_iommu_set_iova_ranges;
}
static const TypeInfo virtio_iommu_info = {
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 08/12] range: Make range_compare() public
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (6 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 07/12] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 09/12] util/reserved-region: Add new ReservedRegion helpers Eric Auger
` (4 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Let's expose range_compare() in the header so that it can be
reused outside of util/range.c
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
v1 -> v2:
- Added Philippe's R-b
---
include/qemu/range.h | 6 ++++++
util/range.c | 6 +-----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/include/qemu/range.h b/include/qemu/range.h
index 2b59e3bf0c..dfe7e58ff9 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -217,6 +217,12 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
return !(last2 < first1 || last1 < first2);
}
+/*
+ * Return -1 if @a < @b, 1 @a > @b, and 0 if they touch or overlap.
+ * Both @a and @b must not be empty.
+ */
+int range_compare(Range *a, Range *b);
+
GList *range_list_insert(GList *list, Range *data);
/*
diff --git a/util/range.c b/util/range.c
index 4baeb588cc..19402e45aa 100644
--- a/util/range.c
+++ b/util/range.c
@@ -20,11 +20,7 @@
#include "qemu/osdep.h"
#include "qemu/range.h"
-/*
- * Return -1 if @a < @b, 1 @a > @b, and 0 if they touch or overlap.
- * Both @a and @b must not be empty.
- */
-static inline int range_compare(Range *a, Range *b)
+int range_compare(Range *a, Range *b)
{
assert(!range_is_empty(a) && !range_is_empty(b));
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 09/12] util/reserved-region: Add new ReservedRegion helpers
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (7 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 08/12] range: Make range_compare() public Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-29 16:16 ` Jean-Philippe Brucker
2023-09-13 8:01 ` [PATCH v2 10/12] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
` (3 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Introduce resv_region_list_insert() helper which inserts
a new ReservedRegion into a sorted list of reserved region.
In case of overlap, the new region has higher priority and
hides the existing overlapped segments. If the overlap is
partial, new regions are created for parts which are not
overlapped. The new region has higher priority independently
on the type of the regions.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
include/qemu/reserved-region.h | 32 ++++++++++++
util/reserved-region.c | 94 ++++++++++++++++++++++++++++++++++
util/meson.build | 1 +
3 files changed, 127 insertions(+)
create mode 100644 include/qemu/reserved-region.h
create mode 100644 util/reserved-region.c
diff --git a/include/qemu/reserved-region.h b/include/qemu/reserved-region.h
new file mode 100644
index 0000000000..8e6f0a97e2
--- /dev/null
+++ b/include/qemu/reserved-region.h
@@ -0,0 +1,32 @@
+/*
+ * QEMU ReservedRegion helpers
+ *
+ * Copyright (c) 2023 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_RESERVED_REGION_H
+#define QEMU_RESERVED_REGION_H
+
+#include "exec/memory.h"
+
+/*
+ * Insert a new region into a sorted list of reserved regions. In case
+ * there is overlap with existing regions, the new added region has
+ * higher priority and replaces the overlapped segment.
+ */
+GList *resv_region_list_insert(GList *list, ReservedRegion *reg);
+
+#endif
diff --git a/util/reserved-region.c b/util/reserved-region.c
new file mode 100644
index 0000000000..bb26a6bed3
--- /dev/null
+++ b/util/reserved-region.c
@@ -0,0 +1,94 @@
+/*
+ * QEMU ReservedRegion helpers
+ *
+ * Copyright (c) 2023 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/range.h"
+#include "qemu/reserved-region.h"
+
+GList *resv_region_list_insert(GList *list, ReservedRegion *reg)
+{
+ ReservedRegion *resv_iter, *new_reg;
+ Range *r = ®->range;
+ Range *range_iter;
+ GList *l;
+
+ for (l = list; l ; ) {
+ resv_iter = (ReservedRegion *)l->data;
+ range_iter = &resv_iter->range;
+
+ /* Skip all list elements strictly less than range to add */
+ if (range_compare(range_iter, r) < 0) {
+ l = l->next;
+ } else if (range_compare(range_iter, r) > 0) {
+ return g_list_insert_before(list, l, reg);
+ } else { /* there is an operlap */
+ if (range_contains_range(r, range_iter)) {
+ /* new range contains current item, simply remove this latter */
+ GList *prev = l->prev;
+ g_free(l->data);
+ list = g_list_delete_link(list, l);
+ if (prev) {
+ l = prev;
+ if (l) {
+ l = l->next;
+ }
+ } else {
+ l = list;
+ }
+ } else if (range_contains_range(range_iter, r)) {
+ /* new region is included in the current region */
+ if (range_lob(range_iter) == range_lob(r)) {
+ /* adjacent on the left side, derives into 2 regions */
+ range_set_bounds(range_iter, range_upb(r) + 1,
+ range_upb(range_iter));
+ return g_list_insert_before(list, l, reg);
+ } else if (range_upb(range_iter) == range_upb(r)) {
+ /* adjacent on the right side, derives into 2 regions */
+ range_set_bounds(range_iter, range_lob(range_iter),
+ range_lob(r) - 1);
+ l = l->next;
+ } else {
+ uint64_t lob = range_lob(range_iter);
+ /*
+ * the new range is in the middle of an existing one,
+ * split this latter into 3 regs instead
+ */
+ range_set_bounds(range_iter, range_upb(r) + 1,
+ range_upb(range_iter));
+ new_reg = g_new0(ReservedRegion, 1);
+ new_reg->type = resv_iter->type;
+ range_set_bounds(&new_reg->range,
+ lob, range_lob(r) - 1);
+ list = g_list_insert_before(list, l, new_reg);
+ return g_list_insert_before(list, l, reg);
+ }
+ } else if (range_lob(r) < range_lob(range_iter)) {
+ range_set_bounds(range_iter, range_upb(r) + 1,
+ range_upb(range_iter));
+ return g_list_insert_before(list, l, reg);
+ } else { /* intersection on the upper range */
+ range_set_bounds(range_iter, range_lob(range_iter),
+ range_lob(r) - 1);
+ l = l->next;
+ }
+ } /* overlap */
+ }
+ return g_list_append(list, reg);
+}
+
diff --git a/util/meson.build b/util/meson.build
index c4827fd70a..eb677b40c2 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -51,6 +51,7 @@ util_ss.add(files('qdist.c'))
util_ss.add(files('qht.c'))
util_ss.add(files('qsp.c'))
util_ss.add(files('range.c'))
+util_ss.add(files('reserved-region.c'))
util_ss.add(files('stats64.c'))
util_ss.add(files('systemd.c'))
util_ss.add(files('transactions.c'))
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 10/12] virtio-iommu: Consolidate host reserved regions and property set ones
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (8 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 09/12] util/reserved-region: Add new ReservedRegion helpers Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 11/12] test: Add some tests for range and resv-mem helpers Eric Auger
` (2 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Up to now we were exposing to the RESV_MEM probe requests the
reserved memory regions set though the reserved-regions array property.
Combine those with the host reserved memory regions if any. Those
latter are tagged as RESERVED. We don't have more information about
them besides then cannot be mapped. Reserved regions set by
property have higher priority.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
hw/virtio/virtio-iommu.c | 17 ++++++++++++++++-
hw/virtio/trace-events | 1 +
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index ed2df5116f..4b22fe7ff1 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -21,6 +21,7 @@
#include "qemu/log.h"
#include "qemu/iov.h"
#include "qemu/range.h"
+#include "qemu/reserved-region.h"
#include "exec/target_page.h"
#include "hw/qdev-properties.h"
#include "hw/virtio/virtio.h"
@@ -630,11 +631,25 @@ static int consolidate_resv_regions(IOMMUDevice *sdev)
VirtIOIOMMU *s = sdev->viommu;
int i;
+ /* First add host reserved regions if any, all tagged as RESERVED */
+ for (i = 0; i < sdev->nr_host_resv_regions; i++) {
+ ReservedRegion *reg = g_new0(ReservedRegion, 1);
+ reg->range = sdev->host_resv_regions[i];
+ reg->type = VIRTIO_IOMMU_RESV_MEM_T_RESERVED;
+ sdev->resv_regions = resv_region_list_insert(sdev->resv_regions, reg);
+ trace_virtio_iommu_host_resv_regions(sdev->iommu_mr.parent_obj.name, i,
+ range_lob(®->range),
+ range_upb(®->range));
+ }
+ /*
+ * then add higher priority reserved regions set through properties by the
+ * machine
+ */
for (i = 0; i < s->nr_prop_resv_regions; i++) {
ReservedRegion *reg = g_new0(ReservedRegion, 1);
*reg = s->prop_resv_regions[i];
- sdev->resv_regions = g_list_append(sdev->resv_regions, reg);
+ sdev->resv_regions = resv_region_list_insert(sdev->resv_regions, reg);
}
return 0;
}
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 7109cf1a3b..796574b0f3 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -134,6 +134,7 @@ virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to 0x%"PRIx64
+virtio_iommu_host_resv_regions(const char *name, uint32_t index, uint64_t lob, uint64_t upb) "mr=%s host-resv-reg[%d] = [0x%"PRIx64",0x%"PRIx64"]"
# virtio-mem.c
virtio_mem_send_response(uint16_t type) "type=%" PRIu16
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 11/12] test: Add some tests for range and resv-mem helpers
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (9 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 10/12] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption Eric Auger
2023-09-26 8:06 ` [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
12 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Add unit tests for both resv_region_list_insert() and
range_inverse_array().
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
tests/unit/test-resv-mem.c | 251 +++++++++++++++++++++++++++++++++++++
tests/unit/meson.build | 1 +
2 files changed, 252 insertions(+)
create mode 100644 tests/unit/test-resv-mem.c
diff --git a/tests/unit/test-resv-mem.c b/tests/unit/test-resv-mem.c
new file mode 100644
index 0000000000..8fc7dbf1dd
--- /dev/null
+++ b/tests/unit/test-resv-mem.c
@@ -0,0 +1,251 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * reserved-region/range.c unit-tests.
+ *
+ * Copyright (C) 2023, Red Hat, Inc.
+ *
+ * Author: Eric Auger <eric.auger@redhat.com>
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/range.h"
+#include "exec/memory.h"
+#include "qemu/reserved-region.h"
+
+#define DEBUG 0
+
+static void print_rev_array(const char *prefix, uint32_t nr_rev,
+ Range *rev, uint32_t expected_nr_rev)
+{
+ g_assert_cmpint(nr_rev, ==, expected_nr_rev);
+#if DEBUG
+ printf("%s nr_rev=%d\n", prefix, nr_rev);
+ for (int i = 0; i < nr_rev; i++) {
+ printf("%s rev[%i] = [0x%"PRIx64",0x%"PRIx64"]\n",
+ prefix, i, range_lob(&rev[i]), range_upb(&rev[i]));
+ }
+#endif
+}
+
+static void check_range_reverse_array(void)
+{
+ Range r[10];
+ Range *rev;
+ uint32_t nr_rev;
+
+ range_set_bounds(&r[0], 0x10000, UINT64_MAX);
+ range_inverse_array(1, r, &nr_rev, &rev, 0, UINT64_MAX);
+ print_rev_array("test1", nr_rev, rev, 1);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x10000, 0xFFFFFFFFFFFF);
+ range_inverse_array(1, r, &nr_rev, &rev, 0, UINT64_MAX);
+ print_rev_array("test2", nr_rev, rev, 2);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x0, 0xFFFF);
+ range_set_bounds(&r[1], 0x10000, 0x2FFFF);
+ range_inverse_array(2, r, &nr_rev, &rev, 0, UINT64_MAX);
+ print_rev_array("test3", nr_rev, rev, 1);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x50000, 0x5FFFF);
+ range_set_bounds(&r[1], 0x60000, 0xFFFFFFFFFFFF);
+ range_inverse_array(2, r, &nr_rev, &rev, 0, UINT64_MAX);
+ print_rev_array("test4", nr_rev, rev, 2);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x0, UINT64_MAX);
+ range_inverse_array(1, r, &nr_rev, &rev, 0, UINT64_MAX);
+ print_rev_array("test5", nr_rev, rev, 0);
+ g_free(rev);
+}
+
+static void check_range_reverse_array_low_end(void)
+{
+ Range r[10];
+ Range *rev;
+ uint32_t nr_rev;
+
+ printf("%s\n", __func__);
+ range_set_bounds(&r[0], 0x0, UINT64_MAX);
+ range_inverse_array(1, r, &nr_rev, &rev, 0x10000, 0xFFFFFF);
+ print_rev_array("test1", nr_rev, rev, 0);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x0, 0xFFFF);
+ range_set_bounds(&r[1], 0x20000, 0x2FFFF);
+ range_inverse_array(2, r, &nr_rev, &rev, 0x40000, 0xFFFFFFFFFFFF);
+ print_rev_array("test2", nr_rev, rev, 1);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x0, 0xFFFF);
+ range_set_bounds(&r[1], 0x20000, 0x2FFFF);
+ range_set_bounds(&r[2], 0x1000000000000, UINT64_MAX);
+ range_inverse_array(3, r, &nr_rev, &rev, 0x40000, 0xFFFFFFFFFFFF);
+ print_rev_array("test3", nr_rev, rev, 1);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x0, 0xFFFF);
+ range_set_bounds(&r[1], 0x10000, 0x2FFFF);
+ range_set_bounds(&r[2], 0x100000000, 0x100000000FFFF);
+ range_inverse_array(3, r, &nr_rev, &rev, 0x20000, 0xFFFFFFFFFFFF);
+ print_rev_array("test4", nr_rev, rev, 1);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x2000, 0xFFFF);
+ range_set_bounds(&r[1], 0x20000, 0x2FFFF);
+ range_set_bounds(&r[2], 0x100000000, 0x1FFFFFFFF);
+ range_inverse_array(3, r, &nr_rev, &rev, 0x1000, 0xFFFFFFFFFFFF);
+ /*
+ * expects [0x1000, 0x2000] [0x10000, 0x1FFFF] [0x30000, 0xFFFFFFFF]
+ * [0x200000000, 0xFFFFFFFFFFFF]
+ */
+ print_rev_array("test4", nr_rev, rev, 4);
+ g_free(rev);
+
+ range_set_bounds(&r[0], 0x10000000 , 0x1FFFFFFF);
+ range_set_bounds(&r[1], 0x100000000, 0x1FFFFFFFF);
+ range_inverse_array(2, r, &nr_rev, &rev, 0x0, 0xFFFF);
+ /* expects [0x0, 0xFFFF] */
+ print_rev_array("test5", nr_rev, rev, 1);
+ g_free(rev);
+}
+
+static ReservedRegion *alloc_resv_mem(unsigned type, uint64_t lob, uint64_t upb)
+{
+ ReservedRegion *r;
+
+ r = g_new0(ReservedRegion, 1);
+ r->type = type;
+ range_set_bounds(&r->range, lob, upb);
+ return r;
+}
+
+static void print_resv_region_list(const char *prefix, GList *list,
+ uint32_t expected_length)
+{
+ int i = g_list_length(list);
+
+ g_assert_cmpint(i, ==, expected_length);
+#if DEBUG
+ i = 0;
+ for (GList *l = list; l; l = l->next) {
+ ReservedRegion *r = (ReservedRegion *)l->data;
+ Range *range = &r->range;
+
+ printf("%s item[%d]=[0x%x, 0x%"PRIx64", 0x%"PRIx64"]\n",
+ prefix, i++, r->type, range_lob(range), range_upb(range));
+ }
+#endif
+}
+
+static void free_resv_region(gpointer data)
+{
+ ReservedRegion *reg = (ReservedRegion *)data;
+
+ g_free(reg);
+}
+
+static void check_resv_region_list_insert(void)
+{
+ ReservedRegion *r[10];
+ GList *l = NULL;
+
+ r[0] = alloc_resv_mem(0xA, 0, 0xFFFF);
+ r[1] = alloc_resv_mem(0xA, 0x20000, 0x2FFFF);
+ l = resv_region_list_insert(l, r[0]);
+ l = resv_region_list_insert(l, r[1]);
+ print_resv_region_list("test1", l, 2);
+
+ /* adjacent on left */
+ r[2] = alloc_resv_mem(0xB, 0x0, 0xFFF);
+ l = resv_region_list_insert(l, r[2]);
+ /* adjacent on right */
+ r[3] = alloc_resv_mem(0xC, 0x21000, 0x2FFFF);
+ l = resv_region_list_insert(l, r[3]);
+ print_resv_region_list("test2", l, 4);
+
+ /* exact overlap of D into C*/
+ r[4] = alloc_resv_mem(0xD, 0x21000, 0x2FFFF);
+ l = resv_region_list_insert(l, r[4]);
+ print_resv_region_list("test3", l, 4);
+
+ /* in the middle */
+ r[5] = alloc_resv_mem(0xE, 0x22000, 0x23FFF);
+ l = resv_region_list_insert(l, r[5]);
+ print_resv_region_list("test4", l, 6);
+
+ /* overwrites several existing ones */
+ r[6] = alloc_resv_mem(0xF, 0x10000, 0x2FFFF);
+ l = resv_region_list_insert(l, r[6]);
+ print_resv_region_list("test5", l, 3);
+
+ /* contiguous at the end */
+ r[7] = alloc_resv_mem(0x0, 0x30000, 0x40000);
+ l = resv_region_list_insert(l, r[7]);
+ print_resv_region_list("test6", l, 4);
+
+ g_list_free_full(l, free_resv_region);
+ l = NULL;
+
+ r[0] = alloc_resv_mem(0x0, 0x10000, 0x1FFFF);
+ l = resv_region_list_insert(l, r[0]);
+ /* insertion before the 1st item */
+ r[1] = alloc_resv_mem(0x1, 0x0, 0xFF);
+ l = resv_region_list_insert(l, r[1]);
+ print_resv_region_list("test8", l, 2);
+
+ /* collision on the left side */
+ r[2] = alloc_resv_mem(0xA, 0x1200, 0x11FFF);
+ l = resv_region_list_insert(l, r[2]);
+ print_resv_region_list("test9", l, 3);
+
+ /* collision on the right side */
+ r[3] = alloc_resv_mem(0xA, 0x1F000, 0x2FFFF);
+ l = resv_region_list_insert(l, r[3]);
+ print_resv_region_list("test10", l, 4);
+
+ /* override everything */
+ r[4] = alloc_resv_mem(0xF, 0x0, UINT64_MAX);
+ l = resv_region_list_insert(l, r[4]);
+ print_resv_region_list("test11", l, 1);
+
+ g_list_free_full(l, free_resv_region);
+ l = NULL;
+
+ r[0] = alloc_resv_mem(0xF, 0x1000000000000, UINT64_MAX);
+ l = resv_region_list_insert(l, r[0]);
+ print_resv_region_list("test12", l, 1);
+
+ r[1] = alloc_resv_mem(0xA, 0x0, 0xFFFFFFF);
+ l = resv_region_list_insert(l, r[1]);
+ print_resv_region_list("test12", l, 2);
+
+ r[2] = alloc_resv_mem(0xB, 0x100000000, 0x1FFFFFFFF);
+ l = resv_region_list_insert(l, r[2]);
+ print_resv_region_list("test12", l, 3);
+
+ r[3] = alloc_resv_mem(0x0, 0x010000000, 0x2FFFFFFFF);
+ l = resv_region_list_insert(l, r[3]);
+ print_resv_region_list("test12", l, 3);
+
+ g_list_free_full(l, free_resv_region);
+}
+
+int main(int argc, char **argv)
+{
+ g_test_init(&argc, &argv, NULL);
+
+ g_test_add_func("/resv-mem/range_reverse_array",
+ check_range_reverse_array);
+ g_test_add_func("/resv-mem/range_reverse_array_low_end",
+ check_range_reverse_array_low_end);
+ g_test_add_func("/resv-mem/resv_region_list_insert",
+ check_resv_region_list_insert);
+
+ g_test_run();
+
+ return 0;
+}
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 0299ef6906..3e196eda7b 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -21,6 +21,7 @@ tests = {
'test-opts-visitor': [testqapi],
'test-visitor-serialization': [testqapi],
'test-bitmap': [],
+ 'test-resv-mem': [],
# all code tested by test-x86-cpuid is inside topology.h
'test-x86-cpuid': [],
'test-cutils': [],
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (10 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 11/12] test: Add some tests for range and resv-mem helpers Eric Auger
@ 2023-09-13 8:01 ` Eric Auger
2023-09-19 17:22 ` Alex Williamson
2023-09-20 20:02 ` Alex Williamson
2023-09-26 8:06 ` [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
12 siblings, 2 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-13 8:01 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, alex.williamson,
clg, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Now we retrieve the usable IOVA ranges from the host,
we now the physical IOMMU aperture and we can remove
the assumption of 64b IOVA space when calling
vfio_host_win_add().
This works fine in general but in case of an IOMMU memory
region this becomes more tricky. For instance the virtio-iommu
MR has a 64b aperture by default. If the physical IOMMU has a
smaller aperture (typically the case for VTD), this means we
would need to resize the IOMMU MR when this latter is linked
to a container. However this happens on vfio_listener_region_add()
when calling the IOMMU MR set_iova_ranges() callback and this
would mean we would have a recursive call the
vfio_listener_region_add(). This looks like a wrong usage of
the memory API causing duplicate IOMMU MR notifier registration
for instance.
Until we find a better solution, make sure the vfio_find_hostwin()
is not called anymore for IOMMU region.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
I have not found any working solution to the IOMMU MR resizing.
So I can remove this patch or remove the check for IOMMU MR. Maybe
this is an issue which can be handled separately?
---
hw/vfio/common.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 26da38de05..40cac1ca91 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
#endif
}
- hostwin = vfio_find_hostwin(container, iova, end);
- if (!hostwin) {
- error_setg(&err, "Container %p can't map guest IOVA region"
- " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
- goto fail;
- }
-
memory_region_ref(section->mr);
if (memory_region_is_iommu(section->mr)) {
@@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
return;
}
+ hostwin = vfio_find_hostwin(container, iova, end);
+ if (!hostwin) {
+ error_setg(&err, "Container %p can't map guest IOVA region"
+ " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
+ goto fail;
+ }
+
+
/* Here we assume that memory_region_is_ram(section->mr)==true */
/*
@@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
vfio_get_iommu_info_migration(container, info);
g_free(info);
- /*
- * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
- * information to get the actual window extent rather than assume
- * a 64-bit IOVA address space.
- */
- vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
+ g_assert(container->nr_iovas);
+ vfio_host_win_add(container, 0,
+ container->iova_ranges[container->nr_iovas - 1].end,
+ container->pgsizes);
break;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v2 01/12] memory: Let ReservedRegion use Range
2023-09-13 8:01 ` [PATCH v2 01/12] memory: Let ReservedRegion use Range Eric Auger
@ 2023-09-13 12:43 ` Cédric Le Goater
0 siblings, 0 replies; 37+ messages in thread
From: Cédric Le Goater @ 2023-09-13 12:43 UTC (permalink / raw)
To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
On 9/13/23 10:01, Eric Auger wrote:
> A reserved region is a range tagged with a type. Let's directly use
> the Range type in the prospect to reuse some of the library helpers
> shipped with the Range type.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
>
> ---
>
> v1 -> v2:
> - Added Philippe and David's R-b
> ---
> include/exec/memory.h | 4 ++--
> hw/core/qdev-properties-system.c | 9 ++++++---
> hw/virtio/virtio-iommu.c | 6 +++---
> 3 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 68284428f8..184cb3a01b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -24,6 +24,7 @@
> #include "qemu/bswap.h"
> #include "qemu/queue.h"
> #include "qemu/int128.h"
> +#include "qemu/range.h"
> #include "qemu/notify.h"
> #include "qom/object.h"
> #include "qemu/rcu.h"
> @@ -79,8 +80,7 @@ extern unsigned int global_dirty_tracking;
> typedef struct MemoryRegionOps MemoryRegionOps;
>
> struct ReservedRegion {
> - hwaddr low;
> - hwaddr high;
> + Range range;
> unsigned type;
> };
>
> diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
> index 6d5d43eda2..8d486eeefd 100644
> --- a/hw/core/qdev-properties-system.c
> +++ b/hw/core/qdev-properties-system.c
> @@ -699,7 +699,7 @@ static void get_reserved_region(Object *obj, Visitor *v, const char *name,
> int rc;
>
> rc = snprintf(buffer, sizeof(buffer), "0x%"PRIx64":0x%"PRIx64":%u",
> - rr->low, rr->high, rr->type);
> + range_lob(&rr->range), range_upb(&rr->range), rr->type);
> assert(rc < sizeof(buffer));
>
> visit_type_str(v, name, &p, errp);
> @@ -711,6 +711,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
> Property *prop = opaque;
> ReservedRegion *rr = object_field_prop_ptr(obj, prop);
> const char *endptr;
> + uint64_t lob, upb;
> char *str;
> int ret;
>
> @@ -718,7 +719,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
> return;
> }
>
> - ret = qemu_strtou64(str, &endptr, 16, &rr->low);
> + ret = qemu_strtou64(str, &endptr, 16, &lob);
> if (ret) {
> error_setg(errp, "start address of '%s'"
> " must be a hexadecimal integer", name);
> @@ -728,7 +729,7 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
> goto separator_error;
> }
>
> - ret = qemu_strtou64(endptr + 1, &endptr, 16, &rr->high);
> + ret = qemu_strtou64(endptr + 1, &endptr, 16, &upb);
> if (ret) {
> error_setg(errp, "end address of '%s'"
> " must be a hexadecimal integer", name);
> @@ -738,6 +739,8 @@ static void set_reserved_region(Object *obj, Visitor *v, const char *name,
> goto separator_error;
> }
>
> + range_set_bounds(&rr->range, lob, upb);
> +
> ret = qemu_strtoui(endptr + 1, &endptr, 10, &rr->type);
> if (ret) {
> error_setg(errp, "type of '%s'"
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index be51635895..e5e46e1b55 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -645,8 +645,8 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
> prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
> prop.head.length = cpu_to_le16(length);
> prop.subtype = subtype;
> - prop.start = cpu_to_le64(s->reserved_regions[i].low);
> - prop.end = cpu_to_le64(s->reserved_regions[i].high);
> + prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
> + prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
>
> memcpy(buf, &prop, size);
>
> @@ -897,7 +897,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> for (i = 0; i < s->nb_reserved_regions; i++) {
> ReservedRegion *reg = &s->reserved_regions[i];
>
> - if (addr >= reg->low && addr <= reg->high) {
> + if (range_contains(®->range, addr)) {
> switch (reg->type) {
> case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> entry.perm = flag;
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 02/12] memory: Introduce memory_region_iommu_set_iova_ranges
2023-09-13 8:01 ` [PATCH v2 02/12] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
@ 2023-09-13 12:43 ` Cédric Le Goater
2023-09-20 7:40 ` Eric Auger
0 siblings, 1 reply; 37+ messages in thread
From: Cédric Le Goater @ 2023-09-13 12:43 UTC (permalink / raw)
To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
On 9/13/23 10:01, Eric Auger wrote:
> This helper will allow to convey information about valid
> IOVA ranges to virtual IOMMUS.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/exec/memory.h | 26 ++++++++++++++++++++++++++
> softmmu/memory.c | 15 +++++++++++++++
> 2 files changed, 41 insertions(+)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 184cb3a01b..f6fb99dd3f 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -519,6 +519,27 @@ struct IOMMUMemoryRegionClass {
> int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
> uint64_t page_size_mask,
> Error **errp);
> + /**
> + * @iommu_set_iova_ranges:
> + *
> + * Propagate information about the usable IOVA ranges for a given IOMMU
> + * memory region. Used for example to propagate host physical device
> + * reserved memory region constraints to the virtual IOMMU.
> + *
> + * Optional method: if this method is not provided, then the default IOVA
> + * aperture is used.
> + *
> + * @nr_ranges: number of IOVA ranges
> + *
> + * @iova_ranges: an array of @nr_ranges usable IOVA ranges
> + *
> + * Returns 0 on success, or a negative error. In case of failure, the error
> + * object must be created.
> + */
> + int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
> + uint32_t nr_ranges,
> + struct Range *iova_ranges,
> + Error **errp);
> };
>
> typedef struct RamDiscardListener RamDiscardListener;
> @@ -1845,6 +1866,11 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
> uint64_t page_size_mask,
> Error **errp);
>
> +int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu,
> + uint32_t nr_ranges,
> + struct Range *iova_ranges,
> + Error **errp);
> +
> /**
> * memory_region_name: get a memory region's name
> *
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 7d9494ce70..07499457aa 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1905,6 +1905,21 @@ int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
> return ret;
> }
>
> +int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu_mr,
> + uint32_t nr_ranges,
> + struct Range *iova_ranges,
> + Error **errp)
> +{
> + IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
> + int ret = 0;
> +
> + if (imrc->iommu_set_iova_ranges) {
> + ret = imrc->iommu_set_iova_ranges(iommu_mr, nr_ranges,
> + iova_ranges, errp);
> + }
> + return ret;
> +}
> +
> int memory_region_register_iommu_notifier(MemoryRegion *mr,
> IOMMUNotifier *n, Error **errp)
> {
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 03/12] vfio: Collect container iova range info
2023-09-13 8:01 ` [PATCH v2 03/12] vfio: Collect container iova range info Eric Auger
@ 2023-09-13 12:55 ` Cédric Le Goater
2023-09-20 7:38 ` Eric Auger
2023-09-19 15:44 ` Alex Williamson
1 sibling, 1 reply; 37+ messages in thread
From: Cédric Le Goater @ 2023-09-13 12:55 UTC (permalink / raw)
To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
On 9/13/23 10:01, Eric Auger wrote:
> Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> capability is supported.
>
> This allows to propagate the information though the IOMMU MR
> set_iova_ranges() callback so that virtual IOMMUs
> get aware of those aperture constraints.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
> include/hw/vfio/vfio-common.h | 2 ++
> hw/vfio/common.c | 45 +++++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index da43d27352..74b9b27270 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -98,6 +98,8 @@ typedef struct VFIOContainer {
> QLIST_HEAD(, VFIOGroup) group_list;
> QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
> QLIST_ENTRY(VFIOContainer) next;
> + unsigned nr_iovas;
> + struct vfio_iova_range *iova_ranges;
> } VFIOContainer;
>
> typedef struct VFIOGuestIOMMU {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9aac21abb7..26da38de05 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1157,6 +1157,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
> goto fail;
> }
>
> + ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
> + container->nr_iovas, (struct Range *)container->iova_ranges,
> + &err);
> + if (ret) {
> + g_free(giommu);
> + goto fail;
> + }
> +
> ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
> &err);
> if (ret) {
> @@ -1981,6 +1989,29 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
> return true;
> }
>
> +static void vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
> + unsigned int *nr_iovas,
> + struct vfio_iova_range **iova_ranges)
I guess there is no point in returning an error since we can not
assign default values.
> +{
> + struct vfio_info_cap_header *hdr;
> + struct vfio_iommu_type1_info_cap_iova_range *cap;
> +
> + hdr = vfio_get_iommu_type1_info_cap(info,
> + VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
> + if (hdr == NULL) {
May be :
if (!hdr) {
> + return;
> + }
> +
> + cap = (void *)hdr;
> + *nr_iovas = cap->nr_iovas;
> +
I would add a trace event with the #iovas.
Thanks,
C.
> + if (*nr_iovas == 0) {
> + return;
> + }
> + *iova_ranges = g_memdup2(cap->iova_ranges,
> + *nr_iovas * sizeof(struct vfio_iova_range));
> +}
> +
> static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
> struct vfio_region_info *info)
> {
> @@ -2433,6 +2464,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
> }
> }
>
> +static void vfio_free_container(VFIOContainer *container)
> +{
> + g_free(container->iova_ranges);
> + g_free(container);
> +}
> +
> static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> Error **errp)
> {
> @@ -2550,6 +2587,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
> container->dma_max_mappings = 65535;
> }
> +
> + vfio_get_info_iova_range(info, &container->nr_iovas,
> + &container->iova_ranges);
> +
> vfio_get_iommu_info_migration(container, info);
> g_free(info);
>
> @@ -2663,7 +2704,7 @@ enable_discards_exit:
> vfio_ram_block_discard_disable(container, false);
>
> free_container_exit:
> - g_free(container);
> + vfio_free_container(container);
>
> close_fd_exit:
> close(fd);
> @@ -2717,7 +2758,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>
> trace_vfio_disconnect_container(container->fd);
> close(container->fd);
> - g_free(container);
> + vfio_free_container(container);
>
> vfio_put_address_space(space);
> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 04/12] virtio-iommu: Rename reserved_regions into prop_resv_regions
2023-09-13 8:01 ` [PATCH v2 04/12] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
@ 2023-09-13 13:01 ` Cédric Le Goater
0 siblings, 0 replies; 37+ messages in thread
From: Cédric Le Goater @ 2023-09-13 13:01 UTC (permalink / raw)
To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, alex.williamson,
jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
On 9/13/23 10:01, Eric Auger wrote:
> Rename VirtIOIOMMU (nb_)reserved_regions fields with the "prop_" prefix
> to highlight those fields are set through a property, at machine level.
> They are IOMMU wide.
>
> A subsequent patch will introduce per IOMMUDevice reserved regions
> that will include both those IOMMU wide property reserved
> regions plus, sometimes, host reserved regions, if the device is
> backed by a host device protected by a physical IOMMU. Also change
> nb_ prefix by nr_.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
> include/hw/virtio/virtio-iommu.h | 4 ++--
> hw/virtio/virtio-iommu-pci.c | 8 ++++----
> hw/virtio/virtio-iommu.c | 15 ++++++++-------
> 3 files changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> index a93fc5383e..eea4564782 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -55,8 +55,8 @@ struct VirtIOIOMMU {
> GHashTable *as_by_busptr;
> IOMMUPciBus *iommu_pcibus_by_bus_num[PCI_BUS_MAX];
> PCIBus *primary_bus;
> - ReservedRegion *reserved_regions;
> - uint32_t nb_reserved_regions;
> + ReservedRegion *prop_resv_regions;
> + uint32_t nr_prop_resv_regions;
> GTree *domains;
> QemuRecMutex mutex;
> GTree *endpoints;
> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> index 7ef2f9dcdb..9459fbf6ed 100644
> --- a/hw/virtio/virtio-iommu-pci.c
> +++ b/hw/virtio/virtio-iommu-pci.c
> @@ -37,7 +37,7 @@ struct VirtIOIOMMUPCI {
> static Property virtio_iommu_pci_properties[] = {
> DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
> DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
> - vdev.nb_reserved_regions, vdev.reserved_regions,
> + vdev.nr_prop_resv_regions, vdev.prop_resv_regions,
> qdev_prop_reserved_region, ReservedRegion),
> DEFINE_PROP_END_OF_LIST(),
> };
> @@ -54,9 +54,9 @@ static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> "for the virtio-iommu-pci device");
> return;
> }
> - for (int i = 0; i < s->nb_reserved_regions; i++) {
> - if (s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> - s->reserved_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
> + for (int i = 0; i < s->nr_prop_resv_regions; i++) {
> + if (s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> + s->prop_resv_regions[i].type != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
> error_setg(errp, "reserved region %d has an invalid type", i);
> error_append_hint(errp, "Valid values are 0 and 1\n");
> return;
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index e5e46e1b55..979cdb5648 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -631,22 +631,23 @@ static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
> size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
> int i;
>
> - total = size * s->nb_reserved_regions;
> + total = size * s->nr_prop_resv_regions;
>
> if (total > free) {
> return -ENOSPC;
> }
>
> - for (i = 0; i < s->nb_reserved_regions; i++) {
> - unsigned subtype = s->reserved_regions[i].type;
> + for (i = 0; i < s->nr_prop_resv_regions; i++) {
> + unsigned subtype = s->prop_resv_regions[i].type;
> + Range *range = &s->prop_resv_regions[i].range;
>
> assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
> subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
> prop.head.type = cpu_to_le16(VIRTIO_IOMMU_PROBE_T_RESV_MEM);
> prop.head.length = cpu_to_le16(length);
> prop.subtype = subtype;
> - prop.start = cpu_to_le64(range_lob(&s->reserved_regions[i].range));
> - prop.end = cpu_to_le64(range_upb(&s->reserved_regions[i].range));
> + prop.start = cpu_to_le64(range_lob(range));
> + prop.end = cpu_to_le64(range_upb(range));
>
> memcpy(buf, &prop, size);
>
> @@ -894,8 +895,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> goto unlock;
> }
>
> - for (i = 0; i < s->nb_reserved_regions; i++) {
> - ReservedRegion *reg = &s->reserved_regions[i];
> + for (i = 0; i < s->nr_prop_resv_regions; i++) {
> + ReservedRegion *reg = &s->prop_resv_regions[i];
>
> if (range_contains(®->range, addr)) {
> switch (reg->type) {
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 03/12] vfio: Collect container iova range info
2023-09-13 8:01 ` [PATCH v2 03/12] vfio: Collect container iova range info Eric Auger
2023-09-13 12:55 ` Cédric Le Goater
@ 2023-09-19 15:44 ` Alex Williamson
2023-09-20 7:15 ` Eric Auger
2023-09-20 7:39 ` Eric Auger
1 sibling, 2 replies; 37+ messages in thread
From: Alex Williamson @ 2023-09-19 15:44 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
On Wed, 13 Sep 2023 10:01:38 +0200
Eric Auger <eric.auger@redhat.com> wrote:
> Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> capability is supported.
>
> This allows to propagate the information though the IOMMU MR
> set_iova_ranges() callback so that virtual IOMMUs
> get aware of those aperture constraints.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
> include/hw/vfio/vfio-common.h | 2 ++
> hw/vfio/common.c | 45 +++++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index da43d27352..74b9b27270 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -98,6 +98,8 @@ typedef struct VFIOContainer {
> QLIST_HEAD(, VFIOGroup) group_list;
> QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
> QLIST_ENTRY(VFIOContainer) next;
> + unsigned nr_iovas;
> + struct vfio_iova_range *iova_ranges;
> } VFIOContainer;
>
> typedef struct VFIOGuestIOMMU {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9aac21abb7..26da38de05 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1157,6 +1157,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
> goto fail;
> }
>
> + ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
> + container->nr_iovas, (struct Range *)container->iova_ranges,
> + &err);
The semantics of calling this with nr_iovas == 0 and the vIOMMU driver
ignoring that it's being told there are no usable iova ranges is rather
strange. Should nr_iovas be initialized to -1 for that or should this
call be conditional on non-zero nr_iovas?
Also, vfio_get_info_iova_range() is only called in the type1 container
path and the IOVA range info capability has only existed since kernel
v5.4. So we need to do something sane even if we don't have the kernel
telling us about the IOVA ranges. I think this precludes the assert in
the final patch of the series or else new QEMU on an old kernel is
broken.
> + if (ret) {
> + g_free(giommu);
> + goto fail;
> + }
> +
> ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
> &err);
> if (ret) {
> @@ -1981,6 +1989,29 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
> return true;
> }
>
> +static void vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
> + unsigned int *nr_iovas,
> + struct vfio_iova_range **iova_ranges)
Just pass the VFIOContainer pointer? Thanks,
Alex
> +{
> + struct vfio_info_cap_header *hdr;
> + struct vfio_iommu_type1_info_cap_iova_range *cap;
> +
> + hdr = vfio_get_iommu_type1_info_cap(info,
> + VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
> + if (hdr == NULL) {
> + return;
> + }
> +
> + cap = (void *)hdr;
> + *nr_iovas = cap->nr_iovas;
> +
> + if (*nr_iovas == 0) {
> + return;
> + }
> + *iova_ranges = g_memdup2(cap->iova_ranges,
> + *nr_iovas * sizeof(struct vfio_iova_range));
> +}
> +
> static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
> struct vfio_region_info *info)
> {
> @@ -2433,6 +2464,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
> }
> }
>
> +static void vfio_free_container(VFIOContainer *container)
> +{
> + g_free(container->iova_ranges);
> + g_free(container);
> +}
> +
> static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> Error **errp)
> {
> @@ -2550,6 +2587,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
> container->dma_max_mappings = 65535;
> }
> +
> + vfio_get_info_iova_range(info, &container->nr_iovas,
> + &container->iova_ranges);
> +
> vfio_get_iommu_info_migration(container, info);
> g_free(info);
>
> @@ -2663,7 +2704,7 @@ enable_discards_exit:
> vfio_ram_block_discard_disable(container, false);
>
> free_container_exit:
> - g_free(container);
> + vfio_free_container(container);
>
> close_fd_exit:
> close(fd);
> @@ -2717,7 +2758,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>
> trace_vfio_disconnect_container(container->fd);
> close(container->fd);
> - g_free(container);
> + vfio_free_container(container);
>
> vfio_put_address_space(space);
> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 06/12] range: Introduce range_inverse_array()
2023-09-13 8:01 ` [PATCH v2 06/12] range: Introduce range_inverse_array() Eric Auger
@ 2023-09-19 16:29 ` Alex Williamson
2023-09-20 7:24 ` Eric Auger
0 siblings, 1 reply; 37+ messages in thread
From: Alex Williamson @ 2023-09-19 16:29 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
On Wed, 13 Sep 2023 10:01:41 +0200
Eric Auger <eric.auger@redhat.com> wrote:
> This helper reverses an array of regions, turning original
> regions into holes and original holes into actual regions,
> covering the whole UINT64_MAX span.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
>
> v1 -> v2:
> - Move range_inverse_array description comment in the header
> - Take low/high params
> ---
> include/qemu/range.h | 8 ++++++++
> util/range.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 53 insertions(+)
>
> diff --git a/include/qemu/range.h b/include/qemu/range.h
> index 7e2b1cc447..2b59e3bf0c 100644
> --- a/include/qemu/range.h
> +++ b/include/qemu/range.h
> @@ -219,4 +219,12 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
>
> GList *range_list_insert(GList *list, Range *data);
>
> +/*
> + * Inverse an array of sorted ranges over the [low, high] span, ie.
> + * original ranges becomes holes in the newly allocated inv_ranges
> + */
> +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
> + uint32_t *nr_inv_ranges, Range **inv_ranges,
> + uint64_t low, uint64_t high);
> +
> #endif
> diff --git a/util/range.c b/util/range.c
> index 098d9d2dc0..4baeb588cc 100644
> --- a/util/range.c
> +++ b/util/range.c
> @@ -70,3 +70,48 @@ GList *range_list_insert(GList *list, Range *data)
>
> return list;
> }
> +
> +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
> + uint32_t *nr_inv_ranges, Range **inv_ranges,
> + uint64_t low, uint64_t high)
Rare be it for me to suggest GLib, but we already appear to have
range_list_insert() making use of GList for an ordered list of Ranges.
Doesn't this function become a lot easier if we take a sorted GList,
walk it to create the inverse, and return a new GList of the inverted
Ranges? Seems the initial sorted GList would be created by making use
of the existing range_list_insert() function. Thanks,
Alex
> +{
> + Range *resv;
> + int i = 0, j = 0;
> +
> + resv = g_malloc0_n(nr_ranges + 1, sizeof(Range));
> +
> + for (; j < nr_ranges && (range_upb(&ranges[j]) < low); j++) {
> + continue; /* skip all ranges below mon */
> + }
> +
> + if (j == nr_ranges) {
> + range_set_bounds(&resv[i++], low, high);
> + goto realloc;
> + }
> +
> + /* first range lob is greater than min, insert a first range */
> + if (range_lob(&ranges[j]) > low) {
> + range_set_bounds(&resv[i++], low,
> + MIN(range_lob(&ranges[j]) - 1, high));
> + }
> +
> + /* insert a range inbetween each original range until we reach max */
> + for (; j < nr_ranges - 1; j++) {
> + if (range_lob(&ranges[j]) >= high) {
> + goto realloc;
> + }
> + if (range_compare(&ranges[j], &ranges[j + 1])) {
> + range_set_bounds(&resv[i++], range_upb(&ranges[j]) + 1,
> + MIN(range_lob(&ranges[j + 1]) - 1, high));
> + }
> + }
> + /* last range upb is less than max, insert a last range */
> + if (range_upb(&ranges[j]) < high) {
> + range_set_bounds(&resv[i++],
> + range_upb(&ranges[j]) + 1, high);
> + }
> +realloc:
> + *nr_inv_ranges = i;
> + resv = g_realloc(resv, i * sizeof(Range));
> + *inv_ranges = resv;
> +}
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption
2023-09-13 8:01 ` [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption Eric Auger
@ 2023-09-19 17:22 ` Alex Williamson
2023-09-20 7:28 ` Eric Auger
` (2 more replies)
2023-09-20 20:02 ` Alex Williamson
1 sibling, 3 replies; 37+ messages in thread
From: Alex Williamson @ 2023-09-19 17:22 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
On Wed, 13 Sep 2023 10:01:47 +0200
Eric Auger <eric.auger@redhat.com> wrote:
> Now we retrieve the usable IOVA ranges from the host,
> we now the physical IOMMU aperture and we can remove
> the assumption of 64b IOVA space when calling
> vfio_host_win_add().
>
> This works fine in general but in case of an IOMMU memory
> region this becomes more tricky. For instance the virtio-iommu
> MR has a 64b aperture by default. If the physical IOMMU has a
> smaller aperture (typically the case for VTD), this means we
> would need to resize the IOMMU MR when this latter is linked
> to a container. However this happens on vfio_listener_region_add()
> when calling the IOMMU MR set_iova_ranges() callback and this
> would mean we would have a recursive call the
> vfio_listener_region_add(). This looks like a wrong usage of
> the memory API causing duplicate IOMMU MR notifier registration
> for instance.
>
> Until we find a better solution, make sure the vfio_find_hostwin()
> is not called anymore for IOMMU region.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
>
> I have not found any working solution to the IOMMU MR resizing.
> So I can remove this patch or remove the check for IOMMU MR. Maybe
> this is an issue which can be handled separately?
> ---
> hw/vfio/common.c | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 26da38de05..40cac1ca91 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
> #endif
> }
>
> - hostwin = vfio_find_hostwin(container, iova, end);
> - if (!hostwin) {
> - error_setg(&err, "Container %p can't map guest IOVA region"
> - " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> - goto fail;
> - }
> -
> memory_region_ref(section->mr);
>
> if (memory_region_is_iommu(section->mr)) {
> @@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
> return;
> }
>
> + hostwin = vfio_find_hostwin(container, iova, end);
> + if (!hostwin) {
> + error_setg(&err, "Container %p can't map guest IOVA region"
> + " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> + goto fail;
> + }
> +
> +
> /* Here we assume that memory_region_is_ram(section->mr)==true */
>
> /*
> @@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> vfio_get_iommu_info_migration(container, info);
> g_free(info);
>
> - /*
> - * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> - * information to get the actual window extent rather than assume
> - * a 64-bit IOVA address space.
> - */
> - vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> + g_assert(container->nr_iovas);
This assert is a problem for older kernels.
> + vfio_host_win_add(container, 0,
> + container->iova_ranges[container->nr_iovas - 1].end,
> + container->pgsizes);
This doesn't address the assumption about the min_iova and adds an
assumption that the kernel provided list is sorted. Thanks,
Alex
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 03/12] vfio: Collect container iova range info
2023-09-19 15:44 ` Alex Williamson
@ 2023-09-20 7:15 ` Eric Auger
2023-09-20 7:39 ` Eric Auger
1 sibling, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-20 7:15 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Alex,
On 9/19/23 17:44, Alex Williamson wrote:
> On Wed, 13 Sep 2023 10:01:38 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> capability is supported.
>>
>> This allows to propagate the information though the IOMMU MR
>> set_iova_ranges() callback so that virtual IOMMUs
>> get aware of those aperture constraints.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> include/hw/vfio/vfio-common.h | 2 ++
>> hw/vfio/common.c | 45 +++++++++++++++++++++++++++++++++--
>> 2 files changed, 45 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index da43d27352..74b9b27270 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -98,6 +98,8 @@ typedef struct VFIOContainer {
>> QLIST_HEAD(, VFIOGroup) group_list;
>> QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>> QLIST_ENTRY(VFIOContainer) next;
>> + unsigned nr_iovas;
>> + struct vfio_iova_range *iova_ranges;
>> } VFIOContainer;
>>
>> typedef struct VFIOGuestIOMMU {
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 9aac21abb7..26da38de05 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1157,6 +1157,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> goto fail;
>> }
>>
>> + ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
>> + container->nr_iovas, (struct Range *)container->iova_ranges,
>> + &err);
> The semantics of calling this with nr_iovas == 0 and the vIOMMU driver
> ignoring that it's being told there are no usable iova ranges is rather
> strange. Should nr_iovas be initialized to -1 for that or should this
> call be conditional on non-zero nr_iovas?
>
> Also, vfio_get_info_iova_range() is only called in the type1 container
> path and the IOVA range info capability has only existed since kernel
> v5.4. So we need to do something sane even if we don't have the kernel
> telling us about the IOVA ranges. I think this precludes the assert in
> the final patch of the series or else new QEMU on an old kernel is
> broken.
Correct. I totally missed that point.
Thanks!
Eric
>
>> + if (ret) {
>> + g_free(giommu);
>> + goto fail;
>> + }
>> +
>> ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
>> &err);
>> if (ret) {
>> @@ -1981,6 +1989,29 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>> return true;
>> }
>>
>> +static void vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
>> + unsigned int *nr_iovas,
>> + struct vfio_iova_range **iova_ranges)
> Just pass the VFIOContainer pointer? Thanks,
>
> Alex
>
>> +{
>> + struct vfio_info_cap_header *hdr;
>> + struct vfio_iommu_type1_info_cap_iova_range *cap;
>> +
>> + hdr = vfio_get_iommu_type1_info_cap(info,
>> + VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
>> + if (hdr == NULL) {
>> + return;
>> + }
>> +
>> + cap = (void *)hdr;
>> + *nr_iovas = cap->nr_iovas;
>> +
>> + if (*nr_iovas == 0) {
>> + return;
>> + }
>> + *iova_ranges = g_memdup2(cap->iova_ranges,
>> + *nr_iovas * sizeof(struct vfio_iova_range));
>> +}
>> +
>> static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
>> struct vfio_region_info *info)
>> {
>> @@ -2433,6 +2464,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>> }
>> }
>>
>> +static void vfio_free_container(VFIOContainer *container)
>> +{
>> + g_free(container->iova_ranges);
>> + g_free(container);
>> +}
>> +
>> static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> Error **errp)
>> {
>> @@ -2550,6 +2587,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
>> container->dma_max_mappings = 65535;
>> }
>> +
>> + vfio_get_info_iova_range(info, &container->nr_iovas,
>> + &container->iova_ranges);
>> +
>> vfio_get_iommu_info_migration(container, info);
>> g_free(info);
>>
>> @@ -2663,7 +2704,7 @@ enable_discards_exit:
>> vfio_ram_block_discard_disable(container, false);
>>
>> free_container_exit:
>> - g_free(container);
>> + vfio_free_container(container);
>>
>> close_fd_exit:
>> close(fd);
>> @@ -2717,7 +2758,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>>
>> trace_vfio_disconnect_container(container->fd);
>> close(container->fd);
>> - g_free(container);
>> + vfio_free_container(container);
>>
>> vfio_put_address_space(space);
>> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 06/12] range: Introduce range_inverse_array()
2023-09-19 16:29 ` Alex Williamson
@ 2023-09-20 7:24 ` Eric Auger
0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-20 7:24 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Alex,
On 9/19/23 18:29, Alex Williamson wrote:
> On Wed, 13 Sep 2023 10:01:41 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> This helper reverses an array of regions, turning original
>> regions into holes and original holes into actual regions,
>> covering the whole UINT64_MAX span.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v1 -> v2:
>> - Move range_inverse_array description comment in the header
>> - Take low/high params
>> ---
>> include/qemu/range.h | 8 ++++++++
>> util/range.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 53 insertions(+)
>>
>> diff --git a/include/qemu/range.h b/include/qemu/range.h
>> index 7e2b1cc447..2b59e3bf0c 100644
>> --- a/include/qemu/range.h
>> +++ b/include/qemu/range.h
>> @@ -219,4 +219,12 @@ static inline int ranges_overlap(uint64_t first1, uint64_t len1,
>>
>> GList *range_list_insert(GList *list, Range *data);
>>
>> +/*
>> + * Inverse an array of sorted ranges over the [low, high] span, ie.
>> + * original ranges becomes holes in the newly allocated inv_ranges
>> + */
>> +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
>> + uint32_t *nr_inv_ranges, Range **inv_ranges,
>> + uint64_t low, uint64_t high);
>> +
>> #endif
>> diff --git a/util/range.c b/util/range.c
>> index 098d9d2dc0..4baeb588cc 100644
>> --- a/util/range.c
>> +++ b/util/range.c
>> @@ -70,3 +70,48 @@ GList *range_list_insert(GList *list, Range *data)
>>
>> return list;
>> }
>> +
>> +void range_inverse_array(uint32_t nr_ranges, Range *ranges,
>> + uint32_t *nr_inv_ranges, Range **inv_ranges,
>> + uint64_t low, uint64_t high)
> Rare be it for me to suggest GLib, but we already appear to have
> range_list_insert() making use of GList for an ordered list of Ranges.
> Doesn't this function become a lot easier if we take a sorted GList,
> walk it to create the inverse, and return a new GList of the inverted
> Ranges? Seems the initial sorted GList would be created by making use
> of the existing range_list_insert() function. Thanks,
I am not sure this greatly simplifies. This definitively removes the
realloc stuff. But to me what complexifies the algo is the low/high
parameter support instead of hardcoding thel to 0/UINT64_MAX (suggestion
by Philippe which I can understand for such helper though). You can see
the diff with
https://lore.kernel.org/all/20230904080451.424731-7-eric.auger@redhat.com/
I will further look at your suggestion though
Eric
>
> Alex
>
>> +{
>> + Range *resv;
>> + int i = 0, j = 0;
>> +
>> + resv = g_malloc0_n(nr_ranges + 1, sizeof(Range));
>> +
>> + for (; j < nr_ranges && (range_upb(&ranges[j]) < low); j++) {
>> + continue; /* skip all ranges below mon */
>> + }
>> +
>> + if (j == nr_ranges) {
>> + range_set_bounds(&resv[i++], low, high);
>> + goto realloc;
>> + }
>> +
>> + /* first range lob is greater than min, insert a first range */
>> + if (range_lob(&ranges[j]) > low) {
>> + range_set_bounds(&resv[i++], low,
>> + MIN(range_lob(&ranges[j]) - 1, high));
>> + }
>> +
>> + /* insert a range inbetween each original range until we reach max */
>> + for (; j < nr_ranges - 1; j++) {
>> + if (range_lob(&ranges[j]) >= high) {
>> + goto realloc;
>> + }
>> + if (range_compare(&ranges[j], &ranges[j + 1])) {
>> + range_set_bounds(&resv[i++], range_upb(&ranges[j]) + 1,
>> + MIN(range_lob(&ranges[j + 1]) - 1, high));
>> + }
>> + }
>> + /* last range upb is less than max, insert a last range */
>> + if (range_upb(&ranges[j]) < high) {
>> + range_set_bounds(&resv[i++],
>> + range_upb(&ranges[j]) + 1, high);
>> + }
>> +realloc:
>> + *nr_inv_ranges = i;
>> + resv = g_realloc(resv, i * sizeof(Range));
>> + *inv_ranges = resv;
>> +}
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption
2023-09-19 17:22 ` Alex Williamson
@ 2023-09-20 7:28 ` Eric Auger
2023-09-26 20:00 ` Eric Auger
2023-10-10 17:16 ` Eric Auger
2 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-20 7:28 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Alex,
On 9/19/23 19:22, Alex Williamson wrote:
> On Wed, 13 Sep 2023 10:01:47 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Now we retrieve the usable IOVA ranges from the host,
>> we now the physical IOMMU aperture and we can remove
>> the assumption of 64b IOVA space when calling
>> vfio_host_win_add().
>>
>> This works fine in general but in case of an IOMMU memory
>> region this becomes more tricky. For instance the virtio-iommu
>> MR has a 64b aperture by default. If the physical IOMMU has a
>> smaller aperture (typically the case for VTD), this means we
>> would need to resize the IOMMU MR when this latter is linked
>> to a container. However this happens on vfio_listener_region_add()
>> when calling the IOMMU MR set_iova_ranges() callback and this
>> would mean we would have a recursive call the
>> vfio_listener_region_add(). This looks like a wrong usage of
>> the memory API causing duplicate IOMMU MR notifier registration
>> for instance.
>>
>> Until we find a better solution, make sure the vfio_find_hostwin()
>> is not called anymore for IOMMU region.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> I have not found any working solution to the IOMMU MR resizing.
>> So I can remove this patch or remove the check for IOMMU MR. Maybe
>> this is an issue which can be handled separately?
>> ---
>> hw/vfio/common.c | 25 ++++++++++++-------------
>> 1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 26da38de05..40cac1ca91 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> #endif
>> }
>>
>> - hostwin = vfio_find_hostwin(container, iova, end);
>> - if (!hostwin) {
>> - error_setg(&err, "Container %p can't map guest IOVA region"
>> - " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> - goto fail;
>> - }
>> -
>> memory_region_ref(section->mr);
>>
>> if (memory_region_is_iommu(section->mr)) {
>> @@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> return;
>> }
>>
>> + hostwin = vfio_find_hostwin(container, iova, end);
>> + if (!hostwin) {
>> + error_setg(&err, "Container %p can't map guest IOVA region"
>> + " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> + goto fail;
>> + }
>> +
>> +
>> /* Here we assume that memory_region_is_ram(section->mr)==true */
>>
>> /*
>> @@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> vfio_get_iommu_info_migration(container, info);
>> g_free(info);
>>
>> - /*
>> - * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> - * information to get the actual window extent rather than assume
>> - * a 64-bit IOVA address space.
>> - */
>> - vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
>> + g_assert(container->nr_iovas);
> This assert is a problem for older kernels.
yes I will fix this.
>
>> + vfio_host_win_add(container, 0,
>> + container->iova_ranges[container->nr_iovas - 1].end,
>> + container->pgsizes);
> This doesn't address the assumption about the min_iova and adds an
> assumption that the kernel provided list is sorted. Thanks,
yup.
Thanks!
Eric
>
> Alex
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 03/12] vfio: Collect container iova range info
2023-09-13 12:55 ` Cédric Le Goater
@ 2023-09-20 7:38 ` Eric Auger
0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-20 7:38 UTC (permalink / raw)
To: Cédric Le Goater, eric.auger.pro, qemu-devel, qemu-arm,
alex.williamson, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Hi Cédric,
On 9/13/23 14:55, Cédric Le Goater wrote:
> On 9/13/23 10:01, Eric Auger wrote:
>> Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> capability is supported.
>>
>> This allows to propagate the information though the IOMMU MR
>> set_iova_ranges() callback so that virtual IOMMUs
>> get aware of those aperture constraints.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> include/hw/vfio/vfio-common.h | 2 ++
>> hw/vfio/common.c | 45 +++++++++++++++++++++++++++++++++--
>> 2 files changed, 45 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h
>> b/include/hw/vfio/vfio-common.h
>> index da43d27352..74b9b27270 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -98,6 +98,8 @@ typedef struct VFIOContainer {
>> QLIST_HEAD(, VFIOGroup) group_list;
>> QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>> QLIST_ENTRY(VFIOContainer) next;
>> + unsigned nr_iovas;
>> + struct vfio_iova_range *iova_ranges;
>> } VFIOContainer;
>> typedef struct VFIOGuestIOMMU {
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 9aac21abb7..26da38de05 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1157,6 +1157,14 @@ static void
>> vfio_listener_region_add(MemoryListener *listener,
>> goto fail;
>> }
>> + ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
>> + container->nr_iovas, (struct Range
>> *)container->iova_ranges,
>> + &err);
>> + if (ret) {
>> + g_free(giommu);
>> + goto fail;
>> + }
>> +
>> ret = memory_region_register_iommu_notifier(section->mr,
>> &giommu->n,
>> &err);
>> if (ret) {
>> @@ -1981,6 +1989,29 @@ bool vfio_get_info_dma_avail(struct
>> vfio_iommu_type1_info *info,
>> return true;
>> }
>> +static void vfio_get_info_iova_range(struct vfio_iommu_type1_info
>> *info,
>> + unsigned int *nr_iovas,
>> + struct vfio_iova_range
>> **iova_ranges)
>
> I guess there is no point in returning an error since we can not
> assign default values.
Actually this will return a boolean depending on the whether the
capability is supported,
as reported by Alex. I should have get inspired of
vfio_get_info_dma_avail()!
>
>> +{
>> + struct vfio_info_cap_header *hdr;
>> + struct vfio_iommu_type1_info_cap_iova_range *cap;
>> +
>> + hdr = vfio_get_iommu_type1_info_cap(info,
>> +
>> VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
>> + if (hdr == NULL) {
>
> May be :
>
> if (!hdr) {
yep. I guess I copy/pasted the vfio_get_info_dma_avail() code here ;-)
>
>
>> + return;
>> + }
>> +
>> + cap = (void *)hdr;
>> + *nr_iovas = cap->nr_iovas;
>> +
>
> I would add a trace event with the #iovas.
I do agree tracing the resv regions is usefyl however I would be tempted
to have this trace point elsewhere, maybe at the place where
vfio_host_win_add is called to trace aperture min/max and in the
set_iova cb. Because here I would need to enumerate the regions to trace
them and usually trace points do not add any code.
Thanks
Eric
>
> Thanks,
>
> C.
>
>
>> + if (*nr_iovas == 0) {
>> + return;
>> + }
>> + *iova_ranges = g_memdup2(cap->iova_ranges,
>> + *nr_iovas * sizeof(struct
>> vfio_iova_range));
>> +}
>> +
>> static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
>> struct vfio_region_info
>> *info)
>> {
>> @@ -2433,6 +2464,12 @@ static void
>> vfio_get_iommu_info_migration(VFIOContainer *container,
>> }
>> }
>> +static void vfio_free_container(VFIOContainer *container)
>> +{
>> + g_free(container->iova_ranges);
>> + g_free(container);
>> +}
>> +
>> static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> Error **errp)
>> {
>> @@ -2550,6 +2587,10 @@ static int vfio_connect_container(VFIOGroup
>> *group, AddressSpace *as,
>> if (!vfio_get_info_dma_avail(info,
>> &container->dma_max_mappings)) {
>> container->dma_max_mappings = 65535;
>> }
>> +
>> + vfio_get_info_iova_range(info, &container->nr_iovas,
>> + &container->iova_ranges);
>> +
>> vfio_get_iommu_info_migration(container, info);
>> g_free(info);
>> @@ -2663,7 +2704,7 @@ enable_discards_exit:
>> vfio_ram_block_discard_disable(container, false);
>> free_container_exit:
>> - g_free(container);
>> + vfio_free_container(container);
>> close_fd_exit:
>> close(fd);
>> @@ -2717,7 +2758,7 @@ static void vfio_disconnect_container(VFIOGroup
>> *group)
>> trace_vfio_disconnect_container(container->fd);
>> close(container->fd);
>> - g_free(container);
>> + vfio_free_container(container);
>> vfio_put_address_space(space);
>> }
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 03/12] vfio: Collect container iova range info
2023-09-19 15:44 ` Alex Williamson
2023-09-20 7:15 ` Eric Auger
@ 2023-09-20 7:39 ` Eric Auger
1 sibling, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-20 7:39 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Alex,
On 9/19/23 17:44, Alex Williamson wrote:
> On Wed, 13 Sep 2023 10:01:38 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Collect iova range information if VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> capability is supported.
>>
>> This allows to propagate the information though the IOMMU MR
>> set_iova_ranges() callback so that virtual IOMMUs
>> get aware of those aperture constraints.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> include/hw/vfio/vfio-common.h | 2 ++
>> hw/vfio/common.c | 45 +++++++++++++++++++++++++++++++++--
>> 2 files changed, 45 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index da43d27352..74b9b27270 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -98,6 +98,8 @@ typedef struct VFIOContainer {
>> QLIST_HEAD(, VFIOGroup) group_list;
>> QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>> QLIST_ENTRY(VFIOContainer) next;
>> + unsigned nr_iovas;
>> + struct vfio_iova_range *iova_ranges;
>> } VFIOContainer;
>>
>> typedef struct VFIOGuestIOMMU {
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 9aac21abb7..26da38de05 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1157,6 +1157,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> goto fail;
>> }
>>
>> + ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
>> + container->nr_iovas, (struct Range *)container->iova_ranges,
>> + &err);
> The semantics of calling this with nr_iovas == 0 and the vIOMMU driver
> ignoring that it's being told there are no usable iova ranges is rather
> strange. Should nr_iovas be initialized to -1 for that or should this
> call be conditional on non-zero nr_iovas?
>
> Also, vfio_get_info_iova_range() is only called in the type1 container
> path and the IOVA range info capability has only existed since kernel
> v5.4. So we need to do something sane even if we don't have the kernel
> telling us about the IOVA ranges. I think this precludes the assert in
> the final patch of the series or else new QEMU on an old kernel is
> broken.
>
>> + if (ret) {
>> + g_free(giommu);
>> + goto fail;
>> + }
>> +
>> ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
>> &err);
>> if (ret) {
>> @@ -1981,6 +1989,29 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>> return true;
>> }
>>
>> +static void vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
>> + unsigned int *nr_iovas,
>> + struct vfio_iova_range **iova_ranges)
> Just pass the VFIOContainer pointer? Thanks,
Sorry I miss this comment: yup.
Eric
>
> Alex
>
>> +{
>> + struct vfio_info_cap_header *hdr;
>> + struct vfio_iommu_type1_info_cap_iova_range *cap;
>> +
>> + hdr = vfio_get_iommu_type1_info_cap(info,
>> + VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
>> + if (hdr == NULL) {
>> + return;
>> + }
>> +
>> + cap = (void *)hdr;
>> + *nr_iovas = cap->nr_iovas;
>> +
>> + if (*nr_iovas == 0) {
>> + return;
>> + }
>> + *iova_ranges = g_memdup2(cap->iova_ranges,
>> + *nr_iovas * sizeof(struct vfio_iova_range));
>> +}
>> +
>> static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
>> struct vfio_region_info *info)
>> {
>> @@ -2433,6 +2464,12 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>> }
>> }
>>
>> +static void vfio_free_container(VFIOContainer *container)
>> +{
>> + g_free(container->iova_ranges);
>> + g_free(container);
>> +}
>> +
>> static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> Error **errp)
>> {
>> @@ -2550,6 +2587,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
>> container->dma_max_mappings = 65535;
>> }
>> +
>> + vfio_get_info_iova_range(info, &container->nr_iovas,
>> + &container->iova_ranges);
>> +
>> vfio_get_iommu_info_migration(container, info);
>> g_free(info);
>>
>> @@ -2663,7 +2704,7 @@ enable_discards_exit:
>> vfio_ram_block_discard_disable(container, false);
>>
>> free_container_exit:
>> - g_free(container);
>> + vfio_free_container(container);
>>
>> close_fd_exit:
>> close(fd);
>> @@ -2717,7 +2758,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>>
>> trace_vfio_disconnect_container(container->fd);
>> close(container->fd);
>> - g_free(container);
>> + vfio_free_container(container);
>>
>> vfio_put_address_space(space);
>> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 02/12] memory: Introduce memory_region_iommu_set_iova_ranges
2023-09-13 12:43 ` Cédric Le Goater
@ 2023-09-20 7:40 ` Eric Auger
0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-20 7:40 UTC (permalink / raw)
To: Cédric Le Goater, eric.auger.pro, qemu-devel, qemu-arm,
alex.williamson, jean-philippe, mst, pbonzini
Cc: peter.maydell, peterx, david, philmd
Hi Cedric,
On 9/13/23 14:43, Cédric Le Goater wrote:
> On 9/13/23 10:01, Eric Auger wrote:
>> This helper will allow to convey information about valid
>> IOVA ranges to virtual IOMMUS.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks!
Eric
>
> Thanks,
>
> C.
>
>> ---
>> include/exec/memory.h | 26 ++++++++++++++++++++++++++
>> softmmu/memory.c | 15 +++++++++++++++
>> 2 files changed, 41 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 184cb3a01b..f6fb99dd3f 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -519,6 +519,27 @@ struct IOMMUMemoryRegionClass {
>> int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
>> uint64_t page_size_mask,
>> Error **errp);
>> + /**
>> + * @iommu_set_iova_ranges:
>> + *
>> + * Propagate information about the usable IOVA ranges for a
>> given IOMMU
>> + * memory region. Used for example to propagate host physical
>> device
>> + * reserved memory region constraints to the virtual IOMMU.
>> + *
>> + * Optional method: if this method is not provided, then the
>> default IOVA
>> + * aperture is used.
>> + *
>> + * @nr_ranges: number of IOVA ranges
>> + *
>> + * @iova_ranges: an array of @nr_ranges usable IOVA ranges
>> + *
>> + * Returns 0 on success, or a negative error. In case of
>> failure, the error
>> + * object must be created.
>> + */
>> + int (*iommu_set_iova_ranges)(IOMMUMemoryRegion *iommu,
>> + uint32_t nr_ranges,
>> + struct Range *iova_ranges,
>> + Error **errp);
>> };
>> typedef struct RamDiscardListener RamDiscardListener;
>> @@ -1845,6 +1866,11 @@ int
>> memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
>> uint64_t page_size_mask,
>> Error **errp);
>> +int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu,
>> + uint32_t nr_ranges,
>> + struct Range *iova_ranges,
>> + Error **errp);
>> +
>> /**
>> * memory_region_name: get a memory region's name
>> *
>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>> index 7d9494ce70..07499457aa 100644
>> --- a/softmmu/memory.c
>> +++ b/softmmu/memory.c
>> @@ -1905,6 +1905,21 @@ int
>> memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
>> return ret;
>> }
>> +int memory_region_iommu_set_iova_ranges(IOMMUMemoryRegion *iommu_mr,
>> + uint32_t nr_ranges,
>> + struct Range *iova_ranges,
>> + Error **errp)
>> +{
>> + IOMMUMemoryRegionClass *imrc =
>> IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
>> + int ret = 0;
>> +
>> + if (imrc->iommu_set_iova_ranges) {
>> + ret = imrc->iommu_set_iova_ranges(iommu_mr, nr_ranges,
>> + iova_ranges, errp);
>> + }
>> + return ret;
>> +}
>> +
>> int memory_region_register_iommu_notifier(MemoryRegion *mr,
>> IOMMUNotifier *n, Error
>> **errp)
>> {
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption
2023-09-13 8:01 ` [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption Eric Auger
2023-09-19 17:22 ` Alex Williamson
@ 2023-09-20 20:02 ` Alex Williamson
2023-10-11 17:32 ` Eric Auger
1 sibling, 1 reply; 37+ messages in thread
From: Alex Williamson @ 2023-09-20 20:02 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
On Wed, 13 Sep 2023 10:01:47 +0200
Eric Auger <eric.auger@redhat.com> wrote:
> Now we retrieve the usable IOVA ranges from the host,
> we now the physical IOMMU aperture and we can remove
> the assumption of 64b IOVA space when calling
> vfio_host_win_add().
>
> This works fine in general but in case of an IOMMU memory
> region this becomes more tricky. For instance the virtio-iommu
> MR has a 64b aperture by default. If the physical IOMMU has a
> smaller aperture (typically the case for VTD), this means we
> would need to resize the IOMMU MR when this latter is linked
> to a container. However this happens on vfio_listener_region_add()
> when calling the IOMMU MR set_iova_ranges() callback and this
> would mean we would have a recursive call the
> vfio_listener_region_add(). This looks like a wrong usage of
> the memory API causing duplicate IOMMU MR notifier registration
> for instance.
>
> Until we find a better solution, make sure the vfio_find_hostwin()
> is not called anymore for IOMMU region.
Thanks for your encouragement to double check this, it does seem like
there are some gaps in the host window support. First I guess I don't
understand why the last chunk here assumes a contiguous range.
Shouldn't we call vfio_host_win_add() for each IOVA range?
But then we have a problem that we don't necessarily get positive
feedback from memory_region_iommu_set_iova_ranges(). Did the vIOMMU
accept the ranges or not? Only one vIOMMU implements the callback.
Should we only call memory_region_iommu_set_iova_ranges() if the range
doesn't align to a host window and should the wrapper return -ENOTSUP
if there is no vIOMMU support to poke holes in the range? Thanks,
Alex
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
>
> I have not found any working solution to the IOMMU MR resizing.
> So I can remove this patch or remove the check for IOMMU MR. Maybe
> this is an issue which can be handled separately?
> ---
> hw/vfio/common.c | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 26da38de05..40cac1ca91 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
> #endif
> }
>
> - hostwin = vfio_find_hostwin(container, iova, end);
> - if (!hostwin) {
> - error_setg(&err, "Container %p can't map guest IOVA region"
> - " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> - goto fail;
> - }
> -
> memory_region_ref(section->mr);
>
> if (memory_region_is_iommu(section->mr)) {
> @@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
> return;
> }
>
> + hostwin = vfio_find_hostwin(container, iova, end);
> + if (!hostwin) {
> + error_setg(&err, "Container %p can't map guest IOVA region"
> + " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> + goto fail;
> + }
> +
> +
> /* Here we assume that memory_region_is_ram(section->mr)==true */
>
> /*
> @@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> vfio_get_iommu_info_migration(container, info);
> g_free(info);
>
> - /*
> - * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> - * information to get the actual window extent rather than assume
> - * a 64-bit IOVA address space.
> - */
> - vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> + g_assert(container->nr_iovas);
> + vfio_host_win_add(container, 0,
> + container->iova_ranges[container->nr_iovas - 1].end,
> + container->pgsizes);
>
> break;
> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
` (11 preceding siblings ...)
2023-09-13 8:01 ` [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption Eric Auger
@ 2023-09-26 8:06 ` YangHang Liu
12 siblings, 0 replies; 37+ messages in thread
From: YangHang Liu @ 2023-09-26 8:06 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg,
jean-philippe, mst, pbonzini, peter.maydell, peterx, david,
philmd
The original issue I found : After starting a VM which has two ice PFs
and a virtio-iommu device, qemu-kvm and VM guest dmesg throw lots of
duplicate VFIO_MAP_DMA errors
After testing with Eric's build, the original issue is gone and the
Tier1 regression test against ice PF and virtio iommu device gets PASS
as well.
Tested-by: Yanghang Liu <yanghliu@redhat.com>
On Wed, Sep 13, 2023 at 4:06 PM Eric Auger <eric.auger@redhat.com> wrote:
>
> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
> we encounter the case where the guest tries to map IOVAs beyond 48b
> whereas the physical VTD IOMMU only supports 48b. This ends up with
> VFIO_MAP_DMA failures at qemu level because at kernel level,
> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>
> This is due to the fact the virtio-iommu currently unconditionally
> exposes an IOVA range of 64b through its config input range fields.
>
> This series removes this assumption by retrieving the usable IOVA
> regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
> a VFIO device is attached. This info is communicated to the
> virtio-iommu memory region, transformed into the inversed info, ie.
> the host reserved IOVA regions. Then those latter are combined with the
> reserved IOVA regions set though the virtio-iommu reserved-regions
> property. That way, the guest virtio-iommu driver, unchanged, is
> able to probe the whole set of reserved regions and prevent any IOVA
> belonging to those ranges from beeing used, achieving the original goal.
>
> Best Regards
>
> Eric
>
> This series can be found at:
> https://github.com/eauger/qemu/tree/virtio-iommu_geometry_v2
>
> History:
> v1 -> v2:
> - Remove "[PATCH 12/13] virtio-iommu: Resize memory region according
> to the max iova info" which causes way too much trouble: trigger
> a coredump in vhost, causes duplication of IOMMU notifiers causing
> EEXIST vfio_dma_map errors, ... This looks like a bad usage of the
> memory API so I prefer removing this from this series. So I was
> also obliged to remove the vfio_find_hostwin() check in the case
> of an IOMMU MR.
> - Let range_inverse_array() take low/high args instead of hardcoding
> 0, UINT64_MAX which both complexifies the algo and the tests.
> - Move range function description in header.
> - Check that if set_iova_ranges is called several times, new resv
> regions are included in previous ones
>
> Eric Auger (12):
> memory: Let ReservedRegion use Range
> memory: Introduce memory_region_iommu_set_iova_ranges
> vfio: Collect container iova range info
> virtio-iommu: Rename reserved_regions into prop_resv_regions
> virtio-iommu: Introduce per IOMMUDevice reserved regions
> range: Introduce range_inverse_array()
> virtio-iommu: Implement set_iova_ranges() callback
> range: Make range_compare() public
> util/reserved-region: Add new ReservedRegion helpers
> virtio-iommu: Consolidate host reserved regions and property set ones
> test: Add some tests for range and resv-mem helpers
> vfio: Remove 64-bit IOVA address space assumption
>
> include/exec/memory.h | 30 +++-
> include/hw/vfio/vfio-common.h | 2 +
> include/hw/virtio/virtio-iommu.h | 7 +-
> include/qemu/range.h | 14 ++
> include/qemu/reserved-region.h | 32 ++++
> hw/core/qdev-properties-system.c | 9 +-
> hw/vfio/common.c | 70 +++++++--
> hw/virtio/virtio-iommu-pci.c | 8 +-
> hw/virtio/virtio-iommu.c | 110 ++++++++++++--
> softmmu/memory.c | 15 ++
> tests/unit/test-resv-mem.c | 251 +++++++++++++++++++++++++++++++
> util/range.c | 51 ++++++-
> util/reserved-region.c | 94 ++++++++++++
> hw/virtio/trace-events | 1 +
> tests/unit/meson.build | 1 +
> util/meson.build | 1 +
> 16 files changed, 655 insertions(+), 41 deletions(-)
> create mode 100644 include/qemu/reserved-region.h
> create mode 100644 tests/unit/test-resv-mem.c
> create mode 100644 util/reserved-region.c
>
> --
> 2.41.0
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption
2023-09-19 17:22 ` Alex Williamson
2023-09-20 7:28 ` Eric Auger
@ 2023-09-26 20:00 ` Eric Auger
2023-10-10 17:16 ` Eric Auger
2 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-09-26 20:00 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Alex,
On 9/19/23 19:22, Alex Williamson wrote:
> On Wed, 13 Sep 2023 10:01:47 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Now we retrieve the usable IOVA ranges from the host,
>> we now the physical IOMMU aperture and we can remove
>> the assumption of 64b IOVA space when calling
>> vfio_host_win_add().
>>
>> This works fine in general but in case of an IOMMU memory
>> region this becomes more tricky. For instance the virtio-iommu
>> MR has a 64b aperture by default. If the physical IOMMU has a
>> smaller aperture (typically the case for VTD), this means we
>> would need to resize the IOMMU MR when this latter is linked
>> to a container. However this happens on vfio_listener_region_add()
>> when calling the IOMMU MR set_iova_ranges() callback and this
>> would mean we would have a recursive call the
>> vfio_listener_region_add(). This looks like a wrong usage of
>> the memory API causing duplicate IOMMU MR notifier registration
>> for instance.
>>
>> Until we find a better solution, make sure the vfio_find_hostwin()
>> is not called anymore for IOMMU region.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> I have not found any working solution to the IOMMU MR resizing.
>> So I can remove this patch or remove the check for IOMMU MR. Maybe
>> this is an issue which can be handled separately?
>> ---
>> hw/vfio/common.c | 25 ++++++++++++-------------
>> 1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 26da38de05..40cac1ca91 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> #endif
>> }
>>
>> - hostwin = vfio_find_hostwin(container, iova, end);
>> - if (!hostwin) {
>> - error_setg(&err, "Container %p can't map guest IOVA region"
>> - " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> - goto fail;
>> - }
>> -
>> memory_region_ref(section->mr);
>>
>> if (memory_region_is_iommu(section->mr)) {
>> @@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> return;
>> }
>>
>> + hostwin = vfio_find_hostwin(container, iova, end);
>> + if (!hostwin) {
>> + error_setg(&err, "Container %p can't map guest IOVA region"
>> + " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> + goto fail;
>> + }
>> +
>> +
>> /* Here we assume that memory_region_is_ram(section->mr)==true */
>>
>> /*
>> @@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> vfio_get_iommu_info_migration(container, info);
>> g_free(info);
>>
>> - /*
>> - * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> - * information to get the actual window extent rather than assume
>> - * a 64-bit IOVA address space.
>> - */
>> - vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
>> + g_assert(container->nr_iovas);
> This assert is a problem for older kernels.
>
>> + vfio_host_win_add(container, 0,
>> + container->iova_ranges[container->nr_iovas - 1].end,
>> + container->pgsizes);
> This doesn't address the assumption about the min_iova and adds an
> assumption that the kernel provided list is sorted. Thanks,
Damned I was effectively thinking the array would be sorted but the uapi
does not document this :-(
thanks!
Eric
>
> Alex
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 05/12] virtio-iommu: Introduce per IOMMUDevice reserved regions
2023-09-13 8:01 ` [PATCH v2 05/12] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
@ 2023-09-29 15:52 ` Jean-Philippe Brucker
2023-10-03 15:48 ` Eric Auger
0 siblings, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2023-09-29 15:52 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Eric,
On Wed, Sep 13, 2023 at 10:01:40AM +0200, Eric Auger wrote:
> For the time being the per device reserved regions are
> just a duplicate of IOMMU wide reserved regions. Subsequent
> patches will combine those with host reserved regions, if any.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
> include/hw/virtio/virtio-iommu.h | 1 +
> hw/virtio/virtio-iommu.c | 42 ++++++++++++++++++++++++++------
> 2 files changed, 35 insertions(+), 8 deletions(-)
>
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> index eea4564782..70b8ace34d 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -39,6 +39,7 @@ typedef struct IOMMUDevice {
> AddressSpace as;
> MemoryRegion root; /* The root container of the device */
> MemoryRegion bypass_mr; /* The alias of shared memory MR */
> + GList *resv_regions;
> } IOMMUDevice;
>
> typedef struct IOMMUPciBus {
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 979cdb5648..ea359b586a 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -624,22 +624,48 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
> return ret;
> }
>
> +static int consolidate_resv_regions(IOMMUDevice *sdev)
> +{
> + VirtIOIOMMU *s = sdev->viommu;
> + int i;
> +
> + for (i = 0; i < s->nr_prop_resv_regions; i++) {
> + ReservedRegion *reg = g_new0(ReservedRegion, 1);
> +
> + *reg = s->prop_resv_regions[i];
> + sdev->resv_regions = g_list_append(sdev->resv_regions, reg);
> + }
> + return 0;
> +}
> +
> static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
> uint8_t *buf, size_t free)
> {
> struct virtio_iommu_probe_resv_mem prop = {};
> size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
> - int i;
> + IOMMUDevice *sdev;
> + GList *l;
> + int ret;
>
> - total = size * s->nr_prop_resv_regions;
> + sdev = container_of(virtio_iommu_mr(s, ep), IOMMUDevice, iommu_mr);
> + if (!sdev) {
> + return -EINVAL;
> + }
>
> + ret = consolidate_resv_regions(sdev);
> + if (ret) {
> + return ret;
> + }
> +
> + total = size * g_list_length(sdev->resv_regions);
> if (total > free) {
> return -ENOSPC;
> }
>
> - for (i = 0; i < s->nr_prop_resv_regions; i++) {
> - unsigned subtype = s->prop_resv_regions[i].type;
> - Range *range = &s->prop_resv_regions[i].range;
> + for (l = sdev->resv_regions; l; l = l->next) {
> + ReservedRegion *reg = l->data;
> + unsigned subtype = reg->type;
> + Range *range = ®->range;
>
> assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
> subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
> @@ -857,7 +883,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> bool bypass_allowed;
> int granule;
> bool found;
> - int i;
> + GList *l;
>
> interval.low = addr;
> interval.high = addr + 1;
> @@ -895,8 +921,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> goto unlock;
> }
>
> - for (i = 0; i < s->nr_prop_resv_regions; i++) {
> - ReservedRegion *reg = &s->prop_resv_regions[i];
> + for (l = sdev->resv_regions; l; l = l->next) {
> + ReservedRegion *reg = l->data;
This means translate() now only takes reserved regions into account after
the guest issues a probe request, which only happens if the guest actually
supports the probe feature. It may be better to build the list earlier
(like when creating the IOMMUDevice), and complete it in
set_iova_ranges(). I guess both could call consolidate() which would
rebuild the whole list, for example
Thanks,
Jean
>
> if (range_contains(®->range, addr)) {
> switch (reg->type) {
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 07/12] virtio-iommu: Implement set_iova_ranges() callback
2023-09-13 8:01 ` [PATCH v2 07/12] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
@ 2023-09-29 16:15 ` Jean-Philippe Brucker
2023-10-10 14:36 ` Eric Auger
0 siblings, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2023-09-29 16:15 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg, mst,
pbonzini, peter.maydell, peterx, david, philmd
On Wed, Sep 13, 2023 at 10:01:42AM +0200, Eric Auger wrote:
> The implementation populates the array of per IOMMUDevice
> host reserved regions.
>
> It is forbidden to have conflicting sets of host IOVA ranges
> to be applied onto the same IOMMU MR (implied by different
> host devices).
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
>
> v1 -> v2:
> - Forbid conflicting sets of host resv regions
> ---
> include/hw/virtio/virtio-iommu.h | 2 ++
> hw/virtio/virtio-iommu.c | 48 ++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> index 70b8ace34d..31b69c8261 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -40,6 +40,8 @@ typedef struct IOMMUDevice {
> MemoryRegion root; /* The root container of the device */
> MemoryRegion bypass_mr; /* The alias of shared memory MR */
> GList *resv_regions;
> + Range *host_resv_regions;
> + uint32_t nr_host_resv_regions;
> } IOMMUDevice;
>
> typedef struct IOMMUPciBus {
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index ea359b586a..ed2df5116f 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -20,6 +20,7 @@
> #include "qemu/osdep.h"
> #include "qemu/log.h"
> #include "qemu/iov.h"
> +#include "qemu/range.h"
> #include "exec/target_page.h"
> #include "hw/qdev-properties.h"
> #include "hw/virtio/virtio.h"
> @@ -1158,6 +1159,52 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
> return 0;
> }
>
> +static int virtio_iommu_set_iova_ranges(IOMMUMemoryRegion *mr,
> + uint32_t nr_ranges,
> + struct Range *iova_ranges,
> + Error **errp)
> +{
> + IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> + uint32_t nr_host_resv_regions;
> + Range *host_resv_regions;
> + int ret = -EINVAL;
> +
> + if (!nr_ranges) {
> + return 0;
> + }
> +
> + if (sdev->host_resv_regions) {
> + range_inverse_array(nr_ranges, iova_ranges,
> + &nr_host_resv_regions, &host_resv_regions,
> + 0, UINT64_MAX);
> + if (nr_host_resv_regions != sdev->nr_host_resv_regions) {
> + goto error;
> + }
> + for (int i = 0; i < nr_host_resv_regions; i++) {
> + Range *new = &host_resv_regions[i];
> + Range *existing = &sdev->host_resv_regions[i];
> +
> + if (!range_contains_range(existing, new)) {
> + goto error;
> + }
> + }
> + ret = 0;
> + goto out;
> + }
> +
> + range_inverse_array(nr_ranges, iova_ranges,
> + &sdev->nr_host_resv_regions, &sdev->host_resv_regions,
> + 0, UINT64_MAX);
Can set_iova_ranges() only be called for the first time before the guest
has had a chance to issue a probe request? Maybe we could add a
sanity-check that the guest hasn't issued a probe request yet, since we
can't notify about updated reserved regions.
I'm probably misremembering because I thought Linux set up IOMMU contexts
(including probe requests) before enabling DMA master in PCI which cause
QEMU VFIO to issue these calls. I'll double check.
Thanks,
Jean
> +
> + return 0;
> +error:
> + error_setg(errp, "IOMMU mr=%s Conflicting host reserved regions set!",
> + mr->parent_obj.name);
> +out:
> + g_free(host_resv_regions);
> + return ret;
> +}
> +
> static void virtio_iommu_system_reset(void *opaque)
> {
> VirtIOIOMMU *s = opaque;
> @@ -1453,6 +1500,7 @@ static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
> imrc->replay = virtio_iommu_replay;
> imrc->notify_flag_changed = virtio_iommu_notify_flag_changed;
> imrc->iommu_set_page_size_mask = virtio_iommu_set_page_size_mask;
> + imrc->iommu_set_iova_ranges = virtio_iommu_set_iova_ranges;
> }
>
> static const TypeInfo virtio_iommu_info = {
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 09/12] util/reserved-region: Add new ReservedRegion helpers
2023-09-13 8:01 ` [PATCH v2 09/12] util/reserved-region: Add new ReservedRegion helpers Eric Auger
@ 2023-09-29 16:16 ` Jean-Philippe Brucker
2023-10-10 13:51 ` Eric Auger
0 siblings, 1 reply; 37+ messages in thread
From: Jean-Philippe Brucker @ 2023-09-29 16:16 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg, mst,
pbonzini, peter.maydell, peterx, david, philmd
On Wed, Sep 13, 2023 at 10:01:44AM +0200, Eric Auger wrote:
> Introduce resv_region_list_insert() helper which inserts
> a new ReservedRegion into a sorted list of reserved region.
> In case of overlap, the new region has higher priority and
> hides the existing overlapped segments. If the overlap is
> partial, new regions are created for parts which are not
> overlapped. The new region has higher priority independently
> on the type of the regions.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
> include/qemu/reserved-region.h | 32 ++++++++++++
> util/reserved-region.c | 94 ++++++++++++++++++++++++++++++++++
> util/meson.build | 1 +
> 3 files changed, 127 insertions(+)
> create mode 100644 include/qemu/reserved-region.h
> create mode 100644 util/reserved-region.c
>
> diff --git a/include/qemu/reserved-region.h b/include/qemu/reserved-region.h
> new file mode 100644
> index 0000000000..8e6f0a97e2
> --- /dev/null
> +++ b/include/qemu/reserved-region.h
> @@ -0,0 +1,32 @@
> +/*
> + * QEMU ReservedRegion helpers
> + *
> + * Copyright (c) 2023 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_RESERVED_REGION_H
> +#define QEMU_RESERVED_REGION_H
> +
> +#include "exec/memory.h"
> +
> +/*
> + * Insert a new region into a sorted list of reserved regions. In case
> + * there is overlap with existing regions, the new added region has
> + * higher priority and replaces the overlapped segment.
> + */
> +GList *resv_region_list_insert(GList *list, ReservedRegion *reg);
> +
> +#endif
> diff --git a/util/reserved-region.c b/util/reserved-region.c
> new file mode 100644
> index 0000000000..bb26a6bed3
> --- /dev/null
> +++ b/util/reserved-region.c
> @@ -0,0 +1,94 @@
> +/*
> + * QEMU ReservedRegion helpers
> + *
> + * Copyright (c) 2023 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/range.h"
> +#include "qemu/reserved-region.h"
> +
> +GList *resv_region_list_insert(GList *list, ReservedRegion *reg)
> +{
> + ReservedRegion *resv_iter, *new_reg;
> + Range *r = ®->range;
> + Range *range_iter;
> + GList *l;
> +
> + for (l = list; l ; ) {
> + resv_iter = (ReservedRegion *)l->data;
> + range_iter = &resv_iter->range;
> +
> + /* Skip all list elements strictly less than range to add */
> + if (range_compare(range_iter, r) < 0) {
> + l = l->next;
> + } else if (range_compare(range_iter, r) > 0) {
> + return g_list_insert_before(list, l, reg);
> + } else { /* there is an operlap */
I guess this could just be 'else if', and the whole block below
unindented. But I don't know if the function would look less or more scary :)
> + if (range_contains_range(r, range_iter)) {
> + /* new range contains current item, simply remove this latter */
> + GList *prev = l->prev;
> + g_free(l->data);
> + list = g_list_delete_link(list, l);
> + if (prev) {
These four lines could just be 'l = prev->next'.
> + l = prev;
> + if (l) {
> + l = l->next;
> + }
> + } else {
> + l = list;
> + }
> + } else if (range_contains_range(range_iter, r)) {
> + /* new region is included in the current region */
> + if (range_lob(range_iter) == range_lob(r)) {
> + /* adjacent on the left side, derives into 2 regions */
> + range_set_bounds(range_iter, range_upb(r) + 1,
> + range_upb(range_iter));
> + return g_list_insert_before(list, l, reg);
> + } else if (range_upb(range_iter) == range_upb(r)) {
> + /* adjacent on the right side, derives into 2 regions */
> + range_set_bounds(range_iter, range_lob(range_iter),
> + range_lob(r) - 1);
> + l = l->next;
> + } else {
> + uint64_t lob = range_lob(range_iter);
> + /*
> + * the new range is in the middle of an existing one,
> + * split this latter into 3 regs instead
> + */
> + range_set_bounds(range_iter, range_upb(r) + 1,
> + range_upb(range_iter));
> + new_reg = g_new0(ReservedRegion, 1);
> + new_reg->type = resv_iter->type;
> + range_set_bounds(&new_reg->range,
> + lob, range_lob(r) - 1);
> + list = g_list_insert_before(list, l, new_reg);
> + return g_list_insert_before(list, l, reg);
> + }
> + } else if (range_lob(r) < range_lob(range_iter)) {
> + range_set_bounds(range_iter, range_upb(r) + 1,
> + range_upb(range_iter));
> + return g_list_insert_before(list, l, reg);
> + } else { /* intersection on the upper range */
> + range_set_bounds(range_iter, range_lob(range_iter),
> + range_lob(r) - 1);
> + l = l->next;
> + }
> + } /* overlap */
> + }
> + return g_list_append(list, reg);
Looks correct overall
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> +}
> +
> diff --git a/util/meson.build b/util/meson.build
> index c4827fd70a..eb677b40c2 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -51,6 +51,7 @@ util_ss.add(files('qdist.c'))
> util_ss.add(files('qht.c'))
> util_ss.add(files('qsp.c'))
> util_ss.add(files('range.c'))
> +util_ss.add(files('reserved-region.c'))
> util_ss.add(files('stats64.c'))
> util_ss.add(files('systemd.c'))
> util_ss.add(files('transactions.c'))
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 05/12] virtio-iommu: Introduce per IOMMUDevice reserved regions
2023-09-29 15:52 ` Jean-Philippe Brucker
@ 2023-10-03 15:48 ` Eric Auger
0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-10-03 15:48 UTC (permalink / raw)
To: Jean-Philippe Brucker
Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Jean,
On 9/29/23 17:52, Jean-Philippe Brucker wrote:
> Hi Eric,
>
> On Wed, Sep 13, 2023 at 10:01:40AM +0200, Eric Auger wrote:
>> For the time being the per device reserved regions are
>> just a duplicate of IOMMU wide reserved regions. Subsequent
>> patches will combine those with host reserved regions, if any.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> include/hw/virtio/virtio-iommu.h | 1 +
>> hw/virtio/virtio-iommu.c | 42 ++++++++++++++++++++++++++------
>> 2 files changed, 35 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>> index eea4564782..70b8ace34d 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -39,6 +39,7 @@ typedef struct IOMMUDevice {
>> AddressSpace as;
>> MemoryRegion root; /* The root container of the device */
>> MemoryRegion bypass_mr; /* The alias of shared memory MR */
>> + GList *resv_regions;
>> } IOMMUDevice;
>>
>> typedef struct IOMMUPciBus {
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 979cdb5648..ea359b586a 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -624,22 +624,48 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>> return ret;
>> }
>>
>> +static int consolidate_resv_regions(IOMMUDevice *sdev)
>> +{
>> + VirtIOIOMMU *s = sdev->viommu;
>> + int i;
>> +
>> + for (i = 0; i < s->nr_prop_resv_regions; i++) {
>> + ReservedRegion *reg = g_new0(ReservedRegion, 1);
>> +
>> + *reg = s->prop_resv_regions[i];
>> + sdev->resv_regions = g_list_append(sdev->resv_regions, reg);
>> + }
>> + return 0;
>> +}
>> +
>> static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
>> uint8_t *buf, size_t free)
>> {
>> struct virtio_iommu_probe_resv_mem prop = {};
>> size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
>> - int i;
>> + IOMMUDevice *sdev;
>> + GList *l;
>> + int ret;
>>
>> - total = size * s->nr_prop_resv_regions;
>> + sdev = container_of(virtio_iommu_mr(s, ep), IOMMUDevice, iommu_mr);
>> + if (!sdev) {
>> + return -EINVAL;
>> + }
>>
>> + ret = consolidate_resv_regions(sdev);
>> + if (ret) {
>> + return ret;
>> + }
>> +
>> + total = size * g_list_length(sdev->resv_regions);
>> if (total > free) {
>> return -ENOSPC;
>> }
>>
>> - for (i = 0; i < s->nr_prop_resv_regions; i++) {
>> - unsigned subtype = s->prop_resv_regions[i].type;
>> - Range *range = &s->prop_resv_regions[i].range;
>> + for (l = sdev->resv_regions; l; l = l->next) {
>> + ReservedRegion *reg = l->data;
>> + unsigned subtype = reg->type;
>> + Range *range = ®->range;
>>
>> assert(subtype == VIRTIO_IOMMU_RESV_MEM_T_RESERVED ||
>> subtype == VIRTIO_IOMMU_RESV_MEM_T_MSI);
>> @@ -857,7 +883,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>> bool bypass_allowed;
>> int granule;
>> bool found;
>> - int i;
>> + GList *l;
>>
>> interval.low = addr;
>> interval.high = addr + 1;
>> @@ -895,8 +921,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>> goto unlock;
>> }
>>
>> - for (i = 0; i < s->nr_prop_resv_regions; i++) {
>> - ReservedRegion *reg = &s->prop_resv_regions[i];
>> + for (l = sdev->resv_regions; l; l = l->next) {
>> + ReservedRegion *reg = l->data;
> This means translate() now only takes reserved regions into account after
> the guest issues a probe request, which only happens if the guest actually
> supports the probe feature. It may be better to build the list earlier
> (like when creating the IOMMUDevice), and complete it in
> set_iova_ranges(). I guess both could call consolidate() which would
> rebuild the whole list, for example
you're totally right, I completely missed that point. Will try to follow
your suggestion.
Thanks!
Eric
>
> Thanks,
> Jean
>
>>
>> if (range_contains(®->range, addr)) {
>> switch (reg->type) {
>> --
>> 2.41.0
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 09/12] util/reserved-region: Add new ReservedRegion helpers
2023-09-29 16:16 ` Jean-Philippe Brucker
@ 2023-10-10 13:51 ` Eric Auger
0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-10-10 13:51 UTC (permalink / raw)
To: Jean-Philippe Brucker
Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Jean,
On 9/29/23 18:16, Jean-Philippe Brucker wrote:
> On Wed, Sep 13, 2023 at 10:01:44AM +0200, Eric Auger wrote:
>> Introduce resv_region_list_insert() helper which inserts
>> a new ReservedRegion into a sorted list of reserved region.
>> In case of overlap, the new region has higher priority and
>> hides the existing overlapped segments. If the overlap is
>> partial, new regions are created for parts which are not
>> overlapped. The new region has higher priority independently
>> on the type of the regions.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> include/qemu/reserved-region.h | 32 ++++++++++++
>> util/reserved-region.c | 94 ++++++++++++++++++++++++++++++++++
>> util/meson.build | 1 +
>> 3 files changed, 127 insertions(+)
>> create mode 100644 include/qemu/reserved-region.h
>> create mode 100644 util/reserved-region.c
>>
>> diff --git a/include/qemu/reserved-region.h b/include/qemu/reserved-region.h
>> new file mode 100644
>> index 0000000000..8e6f0a97e2
>> --- /dev/null
>> +++ b/include/qemu/reserved-region.h
>> @@ -0,0 +1,32 @@
>> +/*
>> + * QEMU ReservedRegion helpers
>> + *
>> + * Copyright (c) 2023 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef QEMU_RESERVED_REGION_H
>> +#define QEMU_RESERVED_REGION_H
>> +
>> +#include "exec/memory.h"
>> +
>> +/*
>> + * Insert a new region into a sorted list of reserved regions. In case
>> + * there is overlap with existing regions, the new added region has
>> + * higher priority and replaces the overlapped segment.
>> + */
>> +GList *resv_region_list_insert(GList *list, ReservedRegion *reg);
>> +
>> +#endif
>> diff --git a/util/reserved-region.c b/util/reserved-region.c
>> new file mode 100644
>> index 0000000000..bb26a6bed3
>> --- /dev/null
>> +++ b/util/reserved-region.c
>> @@ -0,0 +1,94 @@
>> +/*
>> + * QEMU ReservedRegion helpers
>> + *
>> + * Copyright (c) 2023 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2 of the License, or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/range.h"
>> +#include "qemu/reserved-region.h"
>> +
>> +GList *resv_region_list_insert(GList *list, ReservedRegion *reg)
>> +{
>> + ReservedRegion *resv_iter, *new_reg;
>> + Range *r = ®->range;
>> + Range *range_iter;
>> + GList *l;
>> +
>> + for (l = list; l ; ) {
>> + resv_iter = (ReservedRegion *)l->data;
>> + range_iter = &resv_iter->range;
>> +
>> + /* Skip all list elements strictly less than range to add */
>> + if (range_compare(range_iter, r) < 0) {
>> + l = l->next;
>> + } else if (range_compare(range_iter, r) > 0) {
>> + return g_list_insert_before(list, l, reg);
>> + } else { /* there is an operlap */
> I guess this could just be 'else if', and the whole block below
> unindented. But I don't know if the function would look less or more scary :)
yeah if you don't mind I would be inclided to leave it as is. It
isolates the case where we have an overlap and you already reviewed it ;-).
>
>> + if (range_contains_range(r, range_iter)) {
>> + /* new range contains current item, simply remove this latter */
>> + GList *prev = l->prev;
>> + g_free(l->data);
>> + list = g_list_delete_link(list, l);
>> + if (prev) {
> These four lines could just be 'l = prev->next'.
indeed!
>
>> + l = prev;
>> + if (l) {
>> + l = l->next;
>> + }
>> + } else {
>> + l = list;
>> + }
>> + } else if (range_contains_range(range_iter, r)) {
>> + /* new region is included in the current region */
>> + if (range_lob(range_iter) == range_lob(r)) {
>> + /* adjacent on the left side, derives into 2 regions */
>> + range_set_bounds(range_iter, range_upb(r) + 1,
>> + range_upb(range_iter));
>> + return g_list_insert_before(list, l, reg);
>> + } else if (range_upb(range_iter) == range_upb(r)) {
>> + /* adjacent on the right side, derives into 2 regions */
>> + range_set_bounds(range_iter, range_lob(range_iter),
>> + range_lob(r) - 1);
>> + l = l->next;
>> + } else {
>> + uint64_t lob = range_lob(range_iter);
>> + /*
>> + * the new range is in the middle of an existing one,
>> + * split this latter into 3 regs instead
>> + */
>> + range_set_bounds(range_iter, range_upb(r) + 1,
>> + range_upb(range_iter));
>> + new_reg = g_new0(ReservedRegion, 1);
>> + new_reg->type = resv_iter->type;
>> + range_set_bounds(&new_reg->range,
>> + lob, range_lob(r) - 1);
>> + list = g_list_insert_before(list, l, new_reg);
>> + return g_list_insert_before(list, l, reg);
>> + }
>> + } else if (range_lob(r) < range_lob(range_iter)) {
>> + range_set_bounds(range_iter, range_upb(r) + 1,
>> + range_upb(range_iter));
>> + return g_list_insert_before(list, l, reg);
>> + } else { /* intersection on the upper range */
>> + range_set_bounds(range_iter, range_lob(range_iter),
>> + range_lob(r) - 1);
>> + l = l->next;
>> + }
>> + } /* overlap */
>> + }
>> + return g_list_append(list, reg);
> Looks correct overall
>
> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
That must have been a pain, really appreciated!
Eric
>
>> +}
>> +
>> diff --git a/util/meson.build b/util/meson.build
>> index c4827fd70a..eb677b40c2 100644
>> --- a/util/meson.build
>> +++ b/util/meson.build
>> @@ -51,6 +51,7 @@ util_ss.add(files('qdist.c'))
>> util_ss.add(files('qht.c'))
>> util_ss.add(files('qsp.c'))
>> util_ss.add(files('range.c'))
>> +util_ss.add(files('reserved-region.c'))
>> util_ss.add(files('stats64.c'))
>> util_ss.add(files('systemd.c'))
>> util_ss.add(files('transactions.c'))
>> --
>> 2.41.0
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 07/12] virtio-iommu: Implement set_iova_ranges() callback
2023-09-29 16:15 ` Jean-Philippe Brucker
@ 2023-10-10 14:36 ` Eric Auger
0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-10-10 14:36 UTC (permalink / raw)
To: Jean-Philippe Brucker
Cc: eric.auger.pro, qemu-devel, qemu-arm, alex.williamson, clg, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Jean,
On 9/29/23 18:15, Jean-Philippe Brucker wrote:
> On Wed, Sep 13, 2023 at 10:01:42AM +0200, Eric Auger wrote:
>> The implementation populates the array of per IOMMUDevice
>> host reserved regions.
>>
>> It is forbidden to have conflicting sets of host IOVA ranges
>> to be applied onto the same IOMMU MR (implied by different
>> host devices).
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v1 -> v2:
>> - Forbid conflicting sets of host resv regions
>> ---
>> include/hw/virtio/virtio-iommu.h | 2 ++
>> hw/virtio/virtio-iommu.c | 48 ++++++++++++++++++++++++++++++++
>> 2 files changed, 50 insertions(+)
>>
>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>> index 70b8ace34d..31b69c8261 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -40,6 +40,8 @@ typedef struct IOMMUDevice {
>> MemoryRegion root; /* The root container of the device */
>> MemoryRegion bypass_mr; /* The alias of shared memory MR */
>> GList *resv_regions;
>> + Range *host_resv_regions;
>> + uint32_t nr_host_resv_regions;
>> } IOMMUDevice;
>>
>> typedef struct IOMMUPciBus {
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index ea359b586a..ed2df5116f 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -20,6 +20,7 @@
>> #include "qemu/osdep.h"
>> #include "qemu/log.h"
>> #include "qemu/iov.h"
>> +#include "qemu/range.h"
>> #include "exec/target_page.h"
>> #include "hw/qdev-properties.h"
>> #include "hw/virtio/virtio.h"
>> @@ -1158,6 +1159,52 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>> return 0;
>> }
>>
>> +static int virtio_iommu_set_iova_ranges(IOMMUMemoryRegion *mr,
>> + uint32_t nr_ranges,
>> + struct Range *iova_ranges,
>> + Error **errp)
>> +{
>> + IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
>> + uint32_t nr_host_resv_regions;
>> + Range *host_resv_regions;
>> + int ret = -EINVAL;
>> +
>> + if (!nr_ranges) {
>> + return 0;
>> + }
>> +
>> + if (sdev->host_resv_regions) {
>> + range_inverse_array(nr_ranges, iova_ranges,
>> + &nr_host_resv_regions, &host_resv_regions,
>> + 0, UINT64_MAX);
>> + if (nr_host_resv_regions != sdev->nr_host_resv_regions) {
>> + goto error;
>> + }
>> + for (int i = 0; i < nr_host_resv_regions; i++) {
>> + Range *new = &host_resv_regions[i];
>> + Range *existing = &sdev->host_resv_regions[i];
>> +
>> + if (!range_contains_range(existing, new)) {
>> + goto error;
>> + }
>> + }
>> + ret = 0;
>> + goto out;
>> + }
>> +
>> + range_inverse_array(nr_ranges, iova_ranges,
>> + &sdev->nr_host_resv_regions, &sdev->host_resv_regions,
>> + 0, UINT64_MAX);
> Can set_iova_ranges() only be called for the first time before the guest
> has had a chance to issue a probe request? Maybe we could add a
> sanity-check that the guest hasn't issued a probe request yet, since we
> can't notify about updated reserved regions.
I added a warning if the set_iova is called after the probe
Eric
>
> I'm probably misremembering because I thought Linux set up IOMMU contexts
> (including probe requests) before enabling DMA master in PCI which cause
> QEMU VFIO to issue these calls. I'll double check.
>
> Thanks,
> Jean
>
>> +
>> + return 0;
>> +error:
>> + error_setg(errp, "IOMMU mr=%s Conflicting host reserved regions set!",
>> + mr->parent_obj.name);
>> +out:
>> + g_free(host_resv_regions);
>> + return ret;
>> +}
>> +
>> static void virtio_iommu_system_reset(void *opaque)
>> {
>> VirtIOIOMMU *s = opaque;
>> @@ -1453,6 +1500,7 @@ static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
>> imrc->replay = virtio_iommu_replay;
>> imrc->notify_flag_changed = virtio_iommu_notify_flag_changed;
>> imrc->iommu_set_page_size_mask = virtio_iommu_set_page_size_mask;
>> + imrc->iommu_set_iova_ranges = virtio_iommu_set_iova_ranges;
>> }
>>
>> static const TypeInfo virtio_iommu_info = {
>> --
>> 2.41.0
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption
2023-09-19 17:22 ` Alex Williamson
2023-09-20 7:28 ` Eric Auger
2023-09-26 20:00 ` Eric Auger
@ 2023-10-10 17:16 ` Eric Auger
2 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-10-10 17:16 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Alex,
On 9/19/23 19:22, Alex Williamson wrote:
> On Wed, 13 Sep 2023 10:01:47 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Now we retrieve the usable IOVA ranges from the host,
>> we now the physical IOMMU aperture and we can remove
>> the assumption of 64b IOVA space when calling
>> vfio_host_win_add().
>>
>> This works fine in general but in case of an IOMMU memory
>> region this becomes more tricky. For instance the virtio-iommu
>> MR has a 64b aperture by default. If the physical IOMMU has a
>> smaller aperture (typically the case for VTD), this means we
>> would need to resize the IOMMU MR when this latter is linked
>> to a container. However this happens on vfio_listener_region_add()
>> when calling the IOMMU MR set_iova_ranges() callback and this
>> would mean we would have a recursive call the
>> vfio_listener_region_add(). This looks like a wrong usage of
>> the memory API causing duplicate IOMMU MR notifier registration
>> for instance.
>>
>> Until we find a better solution, make sure the vfio_find_hostwin()
>> is not called anymore for IOMMU region.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> I have not found any working solution to the IOMMU MR resizing.
>> So I can remove this patch or remove the check for IOMMU MR. Maybe
>> this is an issue which can be handled separately?
>> ---
>> hw/vfio/common.c | 25 ++++++++++++-------------
>> 1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 26da38de05..40cac1ca91 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> #endif
>> }
>>
>> - hostwin = vfio_find_hostwin(container, iova, end);
>> - if (!hostwin) {
>> - error_setg(&err, "Container %p can't map guest IOVA region"
>> - " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> - goto fail;
>> - }
>> -
>> memory_region_ref(section->mr);
>>
>> if (memory_region_is_iommu(section->mr)) {
>> @@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> return;
>> }
>>
>> + hostwin = vfio_find_hostwin(container, iova, end);
>> + if (!hostwin) {
>> + error_setg(&err, "Container %p can't map guest IOVA region"
>> + " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> + goto fail;
>> + }
>> +
>> +
>> /* Here we assume that memory_region_is_ram(section->mr)==true */
>>
>> /*
>> @@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> vfio_get_iommu_info_migration(container, info);
>> g_free(info);
>>
>> - /*
>> - * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> - * information to get the actual window extent rather than assume
>> - * a 64-bit IOVA address space.
>> - */
>> - vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
>> + g_assert(container->nr_iovas);
> This assert is a problem for older kernels
this will be solved in next version by letting
container->nr_iovas == 1 if the kernel does not support the api, in which case we will assume 64b IOVA
>
>> + vfio_host_win_add(container, 0,
>> + container->iova_ranges[container->nr_iovas - 1].end,
>> + container->pgsizes);
> This doesn't address the assumption about the min_iova and adds an
> assumption that the kernel provided list is sorted. Thanks,
agreed. In the next version, container->iova_ranges will be a sorted GList
Thanks!
Eric
>
> Alex
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption
2023-09-20 20:02 ` Alex Williamson
@ 2023-10-11 17:32 ` Eric Auger
0 siblings, 0 replies; 37+ messages in thread
From: Eric Auger @ 2023-10-11 17:32 UTC (permalink / raw)
To: Alex Williamson
Cc: eric.auger.pro, qemu-devel, qemu-arm, clg, jean-philippe, mst,
pbonzini, peter.maydell, peterx, david, philmd
Hi Alex,
On 9/20/23 22:02, Alex Williamson wrote:
> On Wed, 13 Sep 2023 10:01:47 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
>> Now we retrieve the usable IOVA ranges from the host,
>> we now the physical IOMMU aperture and we can remove
>> the assumption of 64b IOVA space when calling
>> vfio_host_win_add().
>>
>> This works fine in general but in case of an IOMMU memory
>> region this becomes more tricky. For instance the virtio-iommu
>> MR has a 64b aperture by default. If the physical IOMMU has a
>> smaller aperture (typically the case for VTD), this means we
>> would need to resize the IOMMU MR when this latter is linked
>> to a container. However this happens on vfio_listener_region_add()
>> when calling the IOMMU MR set_iova_ranges() callback and this
>> would mean we would have a recursive call the
>> vfio_listener_region_add(). This looks like a wrong usage of
>> the memory API causing duplicate IOMMU MR notifier registration
>> for instance.
>>
>> Until we find a better solution, make sure the vfio_find_hostwin()
>> is not called anymore for IOMMU region.
> Thanks for your encouragement to double check this, it does seem like
> there are some gaps in the host window support. First I guess I don't
> understand why the last chunk here assumes a contiguous range.
> Shouldn't we call vfio_host_win_add() for each IOVA range?
I aknowledge am not very familiar with those VFIOHostDMAWindow's
semantic. If it matches the avail. IOVA windows I am collecting in this
series - which sounds sensible - let's create as many of them as valid
IOVA ranges.
>
> But then we have a problem that we don't necessarily get positive
> feedback from memory_region_iommu_set_iova_ranges(). Did the vIOMMU
if memory_region_iommu_set_iova_ranges() fails,
vfio_listener_region_add() does enter the error handling path and we
will never reach the vfio_connect_container
> accept the ranges or not? Only one vIOMMU implements the callback.
> Should we only call memory_region_iommu_set_iova_ranges() if the range
> doesn't align to a host window and should the wrapper return -ENOTSUP
> if there is no vIOMMU support to poke holes in the range? Thanks,
On one end, there is the capability to retrieve from the host the usable
IOVA ranges.
On the other end, there is capability for the viommu to use the info and
also propagate them to the driver.
Currently if the viommu is not able to use that info,
memory_region_iommu_set_iova_ranges() returns 0 meaning we won't regress
the current code.
Then shall we use the info retrieved from the host to build the
VFIOHostDMAWindow and check the IOVAs even in that case? I guess yes
because anyway mapping IOVAs within reserved regions is wrong. This
would fail latter on on the dma_map().
Besides won't the host windows be built on top of available IOVA regions
instead?
Thanks
Eric
>
> Alex
>
>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> I have not found any working solution to the IOMMU MR resizing.
>> So I can remove this patch or remove the check for IOMMU MR. Maybe
>> this is an issue which can be handled separately?
>> ---
>> hw/vfio/common.c | 25 ++++++++++++-------------
>> 1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 26da38de05..40cac1ca91 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> #endif
>> }
>>
>> - hostwin = vfio_find_hostwin(container, iova, end);
>> - if (!hostwin) {
>> - error_setg(&err, "Container %p can't map guest IOVA region"
>> - " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> - goto fail;
>> - }
>> -
>> memory_region_ref(section->mr);
>>
>> if (memory_region_is_iommu(section->mr)) {
>> @@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> return;
>> }
>>
>> + hostwin = vfio_find_hostwin(container, iova, end);
>> + if (!hostwin) {
>> + error_setg(&err, "Container %p can't map guest IOVA region"
>> + " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
>> + goto fail;
>> + }
>> +
>> +
>> /* Here we assume that memory_region_is_ram(section->mr)==true */
>>
>> /*
>> @@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>> vfio_get_iommu_info_migration(container, info);
>> g_free(info);
>>
>> - /*
>> - * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
>> - * information to get the actual window extent rather than assume
>> - * a 64-bit IOVA address space.
>> - */
>> - vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
>> + g_assert(container->nr_iovas);
>> + vfio_host_win_add(container, 0,
>> + container->iova_ranges[container->nr_iovas - 1].end,
>> + container->pgsizes);
>>
>> break;
>> }
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2023-10-11 17:33 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-13 8:01 [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space Eric Auger
2023-09-13 8:01 ` [PATCH v2 01/12] memory: Let ReservedRegion use Range Eric Auger
2023-09-13 12:43 ` Cédric Le Goater
2023-09-13 8:01 ` [PATCH v2 02/12] memory: Introduce memory_region_iommu_set_iova_ranges Eric Auger
2023-09-13 12:43 ` Cédric Le Goater
2023-09-20 7:40 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 03/12] vfio: Collect container iova range info Eric Auger
2023-09-13 12:55 ` Cédric Le Goater
2023-09-20 7:38 ` Eric Auger
2023-09-19 15:44 ` Alex Williamson
2023-09-20 7:15 ` Eric Auger
2023-09-20 7:39 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 04/12] virtio-iommu: Rename reserved_regions into prop_resv_regions Eric Auger
2023-09-13 13:01 ` Cédric Le Goater
2023-09-13 8:01 ` [PATCH v2 05/12] virtio-iommu: Introduce per IOMMUDevice reserved regions Eric Auger
2023-09-29 15:52 ` Jean-Philippe Brucker
2023-10-03 15:48 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 06/12] range: Introduce range_inverse_array() Eric Auger
2023-09-19 16:29 ` Alex Williamson
2023-09-20 7:24 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 07/12] virtio-iommu: Implement set_iova_ranges() callback Eric Auger
2023-09-29 16:15 ` Jean-Philippe Brucker
2023-10-10 14:36 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 08/12] range: Make range_compare() public Eric Auger
2023-09-13 8:01 ` [PATCH v2 09/12] util/reserved-region: Add new ReservedRegion helpers Eric Auger
2023-09-29 16:16 ` Jean-Philippe Brucker
2023-10-10 13:51 ` Eric Auger
2023-09-13 8:01 ` [PATCH v2 10/12] virtio-iommu: Consolidate host reserved regions and property set ones Eric Auger
2023-09-13 8:01 ` [PATCH v2 11/12] test: Add some tests for range and resv-mem helpers Eric Auger
2023-09-13 8:01 ` [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption Eric Auger
2023-09-19 17:22 ` Alex Williamson
2023-09-20 7:28 ` Eric Auger
2023-09-26 20:00 ` Eric Auger
2023-10-10 17:16 ` Eric Auger
2023-09-20 20:02 ` Alex Williamson
2023-10-11 17:32 ` Eric Auger
2023-09-26 8:06 ` [PATCH v2 00/12] VIRTIO-IOMMU/VFIO: Don't assume 64b IOVA space YangHang Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).