* [PATCH 0/3] Add bypass mode support to assigned device
@ 2022-06-13 6:10 Zhenzhong Duan
2022-06-13 6:10 ` [PATCH 1/3] virtio-iommu: " Zhenzhong Duan
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Zhenzhong Duan @ 2022-06-13 6:10 UTC (permalink / raw)
To: eric.auger
Cc: qemu-devel, mst, jean-philippe, pbonzini, yu.c.zhang,
chuanxiao.dong, tina.zhang
Currently virtio-iommu's logic to support bypass mode works only for
emulated devices. For assigned devices, no GPA -> HPA mapping is set up
in the host IOMMU page table.
The host reports errors like:
[3713481.750944] dmar_fault: 191 callbacks suppressed
[3713481.750953] DMAR: DRHD: handling fault status reg 302
[3713481.750962] DMAR: [DMA Read NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason 0x06] PTE Read access is not set
[3713481.751003] DMAR: DRHD: handling fault status reg 402
[3713481.751007] DMAR: [DMA Read NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason 0x06] PTE Read access is not set
[3713481.751023] DMAR: DRHD: handling fault status reg 502
[3713481.751026] DMAR: [DMA Write NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason 0x05] PTE Write access is not set
[3713481.751072] DMAR: DRHD: handling fault status reg 602
The guest kernel reports errors like:
[ 3.461716] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[ 3.462605] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[ 3.464630] i40e 0000:00:04.0: Adding to iommu group 5
[ 3.482093] i40e 0000:00:04.0: fw 0.0.00000 api 0.0 nvm 0.00 0x176953ce 28.50.1 [8086:37d3] [8086:35d0]
[ 3.484295] i40e 0000:00:04.0: eeprom check failed (-62), Tx/Rx traffic disabled
[ 3.487268] i40e 0000:00:04.0: configure_lan_hmc failed: -49
[ 3.489066] i40e: probe of 0000:00:04.0 failed with error -2
Fix it by adding a switch between the bypass and iommu address spaces, just
like the virtual VT-d implementation, so that in bypass mode the vfio
mapping is set up.
Tested with the four combinations of QEMU's virtio-iommu.boot-bypass=true/false
and the guest kernel's iommu=pt/nopt on an x86_64 platform.
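For a quick picture before the patches: each endpoint gets a root container
with an iommu MR and a bypass MR underneath, and exactly one of the two is
enabled at any time. Below is a condensed, illustrative variant of the
switch helper added by patch 1 (the sketch drops tracing and the bypass
lookup; the helper name here is made up, the fields match the patch):

    /* Illustrative sketch only -- see patch 1 for the real code. */
    static void endpoint_switch_address_space(IOMMUDevice *sdev, bool bypassed)
    {
        /* Turn one sub-region off before turning the other on */
        if (bypassed) {
            memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr), false);
            memory_region_set_enabled(&sdev->bypass_mr, true); /* alias of system memory */
        } else {
            memory_region_set_enabled(&sdev->bypass_mr, false);
            memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr), true);
        }
    }

With the bypass MR enabled, vfio sees plain RAM and establishes the
GPA -> HPA mappings; with the iommu MR enabled, DMA goes through
virtio-iommu translation as before.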
Zhenzhong Duan (3):
virtio-iommu: Add bypass mode support to assigned device
virtio-iommu: Use recursive lock to avoid deadlock
virtio-iommu: Add an assert check in translate routine
hw/virtio/trace-events | 1 +
hw/virtio/virtio-iommu.c | 135 ++++++++++++++++++++++++++++---
include/hw/virtio/virtio-iommu.h | 4 +-
3 files changed, 130 insertions(+), 10 deletions(-)
--
2.25.1
* [PATCH 1/3] virtio-iommu: Add bypass mode support to assigned device
2022-06-13 6:10 [PATCH 0/3] Add bypass mode support to assigned device Zhenzhong Duan
@ 2022-06-13 6:10 ` Zhenzhong Duan
2022-06-23 16:52 ` Eric Auger
2022-06-13 6:10 ` [PATCH 2/3] virtio-iommu: Use recursive lock to avoid deadlock Zhenzhong Duan
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Zhenzhong Duan @ 2022-06-13 6:10 UTC (permalink / raw)
To: eric.auger
Cc: qemu-devel, mst, jean-philippe, pbonzini, yu.c.zhang,
chuanxiao.dong, tina.zhang
Currently assigned devices cannot work in virtio-iommu bypass mode:
the guest driver fails to probe the device due to DMA failures, because
no GPA -> HPA mappings are set up when the VM is created.
Add a root container memory region to hold both the bypass memory region
and the iommu memory region, so that switching between them is supported,
just like the implementation in virtual VT-d.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/virtio/trace-events | 1 +
hw/virtio/virtio-iommu.c | 115 ++++++++++++++++++++++++++++++-
include/hw/virtio/virtio-iommu.h | 2 +
3 files changed, 116 insertions(+), 2 deletions(-)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index ab8e095b73fa..20af2e7ebd78 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -124,6 +124,7 @@ virtio_iommu_remap(const char *name, uint64_t virt_start, uint64_t virt_end, uin
virtio_iommu_set_page_size_mask(const char *name, uint64_t old, uint64_t new) "mr=%s old_mask=0x%"PRIx64" new_mask=0x%"PRIx64
virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
+virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
# virtio-mem.c
virtio_mem_send_response(uint16_t type) "type=%" PRIu16
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 2597e166f939..ff718107ee02 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -69,6 +69,77 @@ static inline uint16_t virtio_iommu_get_bdf(IOMMUDevice *dev)
return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
}
+static bool virtio_iommu_device_bypassed(IOMMUDevice *sdev)
+{
+ uint32_t sid;
+ bool bypassed;
+ VirtIOIOMMU *s = sdev->viommu;
+ VirtIOIOMMUEndpoint *ep;
+
+ sid = virtio_iommu_get_bdf(sdev);
+
+ qemu_mutex_lock(&s->mutex);
+ /* need to check bypass before system reset */
+ if (!s->endpoints) {
+ bypassed = s->config.bypass;
+ goto unlock;
+ }
+
+ ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
+ if (!ep || !ep->domain) {
+ bypassed = s->config.bypass;
+ } else {
+ bypassed = ep->domain->bypass;
+ }
+
+unlock:
+ qemu_mutex_unlock(&s->mutex);
+ return bypassed;
+}
+
+/* Return whether the device is using IOMMU translation. */
+static bool virtio_iommu_switch_address_space(IOMMUDevice *sdev)
+{
+ bool use_remapping;
+
+ assert(sdev);
+
+ use_remapping = !virtio_iommu_device_bypassed(sdev);
+
+ trace_virtio_iommu_switch_address_space(pci_bus_num(sdev->bus),
+ PCI_SLOT(sdev->devfn),
+ PCI_FUNC(sdev->devfn),
+ use_remapping);
+
+ /* Turn off first then on the other */
+ if (use_remapping) {
+ memory_region_set_enabled(&sdev->bypass_mr, false);
+ memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr), true);
+ } else {
+ memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr), false);
+ memory_region_set_enabled(&sdev->bypass_mr, true);
+ }
+
+ return use_remapping;
+}
+
+static void virtio_iommu_switch_address_space_all(VirtIOIOMMU *s)
+{
+ GHashTableIter iter;
+ IOMMUPciBus *iommu_pci_bus;
+ int i;
+
+ g_hash_table_iter_init(&iter, s->as_by_busptr);
+ while (g_hash_table_iter_next(&iter, NULL, (void **)&iommu_pci_bus)) {
+ for (i = 0; i < PCI_DEVFN_MAX; i++) {
+ if (!iommu_pci_bus->pbdev[i]) {
+ continue;
+ }
+ virtio_iommu_switch_address_space(iommu_pci_bus->pbdev[i]);
+ }
+ }
+}
+
/**
* The bus number is used for lookup when SID based operations occur.
* In that case we lazily populate the IOMMUPciBus array from the bus hash
@@ -213,6 +284,7 @@ static gboolean virtio_iommu_notify_map_cb(gpointer key, gpointer value,
static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
{
VirtIOIOMMUDomain *domain = ep->domain;
+ IOMMUDevice *sdev = container_of(ep->iommu_mr, IOMMUDevice, iommu_mr);
if (!ep->domain) {
return;
@@ -221,6 +293,7 @@ static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
ep->iommu_mr);
QLIST_REMOVE(ep, next);
ep->domain = NULL;
+ virtio_iommu_switch_address_space(sdev);
}
static VirtIOIOMMUEndpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
@@ -323,12 +396,39 @@ static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
trace_virtio_iommu_init_iommu_mr(name);
+ memory_region_init(&sdev->root, OBJECT(s), name, UINT64_MAX);
+ address_space_init(&sdev->as, &sdev->root, TYPE_VIRTIO_IOMMU);
+
+ /*
+ * Build the IOMMU disabled container with aliases to the
+ * shared MRs. Note that aliasing to a shared memory region
+ * could help the memory API to detect same FlatViews so we
+ * can have devices to share the same FlatView when in bypass
+ * mode. (either by not configuring virtio-iommu driver or with
+ * "iommu=pt"). It will greatly reduce the total number of
+ * FlatViews of the system hence VM runs faster.
+ */
+ memory_region_init_alias(&sdev->bypass_mr, OBJECT(s),
+ "system", get_system_memory(), 0,
+ memory_region_size(get_system_memory()));
+
memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev->iommu_mr),
TYPE_VIRTIO_IOMMU_MEMORY_REGION,
OBJECT(s), name,
UINT64_MAX);
- address_space_init(&sdev->as,
- MEMORY_REGION(&sdev->iommu_mr), TYPE_VIRTIO_IOMMU);
+
+ /*
+ * Hook both the containers under the root container, we
+ * switch between iommu & bypass MRs by enable/disable
+ * corresponding sub-containers
+ */
+ memory_region_add_subregion_overlap(&sdev->root, 0,
+ MEMORY_REGION(&sdev->iommu_mr),
+ 0);
+ memory_region_add_subregion_overlap(&sdev->root, 0,
+ &sdev->bypass_mr, 0);
+
+ virtio_iommu_switch_address_space(sdev);
g_free(name);
}
return &sdev->as;
@@ -342,6 +442,7 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
uint32_t flags = le32_to_cpu(req->flags);
VirtIOIOMMUDomain *domain;
VirtIOIOMMUEndpoint *ep;
+ IOMMUDevice *sdev;
trace_virtio_iommu_attach(domain_id, ep_id);
@@ -375,6 +476,8 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
ep->domain = domain;
+ sdev = container_of(ep->iommu_mr, IOMMUDevice, iommu_mr);
+ virtio_iommu_switch_address_space(sdev);
/* Replay domain mappings on the associated memory region */
g_tree_foreach(domain->mappings, virtio_iommu_notify_map_cb,
@@ -887,6 +990,7 @@ static void virtio_iommu_set_config(VirtIODevice *vdev,
return;
}
dev_config->bypass = in_config->bypass;
+ virtio_iommu_switch_address_space_all(dev);
}
trace_virtio_iommu_set_config(in_config->bypass);
@@ -1026,6 +1130,8 @@ static void virtio_iommu_system_reset(void *opaque)
* system reset
*/
s->config.bypass = s->boot_bypass;
+ virtio_iommu_switch_address_space_all(s);
+
}
static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
@@ -1041,6 +1147,11 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
virtio_iommu_handle_command);
s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE, NULL);
+ /*
+ * config.bypass is needed to get initial address space early, such as
+ * in vfio realize
+ */
+ s->config.bypass = s->boot_bypass;
s->config.page_size_mask = TARGET_PAGE_MASK;
s->config.input_range.end = UINT64_MAX;
s->config.domain_range.end = UINT32_MAX;
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 84391f844826..102eeefa731d 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -37,6 +37,8 @@ typedef struct IOMMUDevice {
int devfn;
IOMMUMemoryRegion iommu_mr;
AddressSpace as;
+ MemoryRegion root; /* The root container of the device */
+ MemoryRegion bypass_mr; /* The alias of shared memory MR */
} IOMMUDevice;
typedef struct IOMMUPciBus {
--
2.25.1
* [PATCH 2/3] virtio-iommu: Use recursive lock to avoid deadlock
2022-06-13 6:10 [PATCH 0/3] Add bypass mode support to assigned device Zhenzhong Duan
2022-06-13 6:10 ` [PATCH 1/3] virtio-iommu: " Zhenzhong Duan
@ 2022-06-13 6:10 ` Zhenzhong Duan
2022-06-13 6:10 ` [PATCH 3/3] virtio-iommu: Add an assert check in translate routine Zhenzhong Duan
2022-06-23 16:52 ` [PATCH 0/3] Add bypass mode support to assigned device Eric Auger
3 siblings, 0 replies; 8+ messages in thread
From: Zhenzhong Duan @ 2022-06-13 6:10 UTC (permalink / raw)
To: eric.auger
Cc: qemu-devel, mst, jean-philippe, pbonzini, yu.c.zhang,
chuanxiao.dong, tina.zhang
When switching the address space with the mutex lock held, mappings are
replayed for the assigned device, and the replay path takes the same mutex
again, causing a deadlock. Use a recursive lock to avoid it.
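Without the recursive lock, the re-entrancy looks roughly like this
(illustrative call chain derived from the code in patch 1, not an exact
backtrace):

    /*
     * virtio_iommu_handle_command()                  takes s->mutex
     *   -> virtio_iommu_handle_attach()
     *      -> virtio_iommu_switch_address_space()
     *         -> memory_region_set_enabled(&sdev->iommu_mr, true)
     *            -> vfio's listener registers an IOMMU notifier and replays
     *               -> virtio_iommu_replay()         takes s->mutex again
     */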
Also release the mutex resource in the unrealize routine.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/virtio/virtio-iommu.c | 20 +++++++++++---------
include/hw/virtio/virtio-iommu.h | 2 +-
2 files changed, 12 insertions(+), 10 deletions(-)
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index ff718107ee02..73d5bde9d122 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -78,7 +78,7 @@ static bool virtio_iommu_device_bypassed(IOMMUDevice *sdev)
sid = virtio_iommu_get_bdf(sdev);
- qemu_mutex_lock(&s->mutex);
+ qemu_rec_mutex_lock(&s->mutex);
/* need to check bypass before system reset */
if (!s->endpoints) {
bypassed = s->config.bypass;
@@ -93,7 +93,7 @@ static bool virtio_iommu_device_bypassed(IOMMUDevice *sdev)
}
unlock:
- qemu_mutex_unlock(&s->mutex);
+ qemu_rec_mutex_unlock(&s->mutex);
return bypassed;
}
@@ -745,7 +745,7 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
tail.status = VIRTIO_IOMMU_S_DEVERR;
goto out;
}
- qemu_mutex_lock(&s->mutex);
+ qemu_rec_mutex_lock(&s->mutex);
switch (head.type) {
case VIRTIO_IOMMU_T_ATTACH:
tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
@@ -774,7 +774,7 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
default:
tail.status = VIRTIO_IOMMU_S_UNSUPP;
}
- qemu_mutex_unlock(&s->mutex);
+ qemu_rec_mutex_unlock(&s->mutex);
out:
sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
@@ -862,7 +862,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
sid = virtio_iommu_get_bdf(sdev);
trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
- qemu_mutex_lock(&s->mutex);
+ qemu_rec_mutex_lock(&s->mutex);
ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
if (!ep) {
@@ -946,7 +946,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
unlock:
- qemu_mutex_unlock(&s->mutex);
+ qemu_rec_mutex_unlock(&s->mutex);
return entry;
}
@@ -1035,7 +1035,7 @@ static void virtio_iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n)
sid = virtio_iommu_get_bdf(sdev);
- qemu_mutex_lock(&s->mutex);
+ qemu_rec_mutex_lock(&s->mutex);
if (!s->endpoints) {
goto unlock;
@@ -1049,7 +1049,7 @@ static void virtio_iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n)
g_tree_foreach(ep->domain->mappings, virtio_iommu_remap, mr);
unlock:
- qemu_mutex_unlock(&s->mutex);
+ qemu_rec_mutex_unlock(&s->mutex);
}
static int virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
@@ -1167,7 +1167,7 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
virtio_add_feature(&s->features, VIRTIO_IOMMU_F_PROBE);
virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS_CONFIG);
- qemu_mutex_init(&s->mutex);
+ qemu_rec_mutex_init(&s->mutex);
s->as_by_busptr = g_hash_table_new_full(NULL, NULL, NULL, g_free);
@@ -1195,6 +1195,8 @@ static void virtio_iommu_device_unrealize(DeviceState *dev)
g_tree_destroy(s->endpoints);
}
+ qemu_rec_mutex_destroy(&s->mutex);
+
virtio_delete_queue(s->req_vq);
virtio_delete_queue(s->event_vq);
virtio_cleanup(vdev);
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 102eeefa731d..2ad5ee320be9 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -58,7 +58,7 @@ struct VirtIOIOMMU {
ReservedRegion *reserved_regions;
uint32_t nb_reserved_regions;
GTree *domains;
- QemuMutex mutex;
+ QemuRecMutex mutex;
GTree *endpoints;
bool boot_bypass;
};
--
2.25.1
* [PATCH 3/3] virtio-iommu: Add an assert check in translate routine
2022-06-13 6:10 [PATCH 0/3] Add bypass mode support to assigned device Zhenzhong Duan
2022-06-13 6:10 ` [PATCH 1/3] virtio-iommu: " Zhenzhong Duan
2022-06-13 6:10 ` [PATCH 2/3] virtio-iommu: Use recursive lock to avoid deadlock Zhenzhong Duan
@ 2022-06-13 6:10 ` Zhenzhong Duan
2022-06-23 16:52 ` [PATCH 0/3] Add bypass mode support to assigned device Eric Auger
3 siblings, 0 replies; 8+ messages in thread
From: Zhenzhong Duan @ 2022-06-13 6:10 UTC (permalink / raw)
To: eric.auger
Cc: qemu-devel, mst, jean-philippe, pbonzini, yu.c.zhang,
chuanxiao.dong, tina.zhang
With the address space switch supported, DMA access translation only
happens after an endpoint is attached to a non-bypass domain.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/virtio/virtio-iommu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 73d5bde9d122..7c122ab95780 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -865,6 +865,10 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
qemu_rec_mutex_lock(&s->mutex);
ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
+
+ if (bypass_allowed)
+ assert(ep && ep->domain && !ep->domain->bypass);
+
if (!ep) {
if (!bypass_allowed) {
error_report_once("%s sid=%d is not known!!", __func__, sid);
--
2.25.1
* Re: [PATCH 1/3] virtio-iommu: Add bypass mode support to assigned device
2022-06-13 6:10 ` [PATCH 1/3] virtio-iommu: " Zhenzhong Duan
@ 2022-06-23 16:52 ` Eric Auger
2022-06-24 8:28 ` Duan, Zhenzhong
0 siblings, 1 reply; 8+ messages in thread
From: Eric Auger @ 2022-06-23 16:52 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, mst, jean-philippe, pbonzini, yu.c.zhang,
chuanxiao.dong, tina.zhang
Hi,
On 6/13/22 08:10, Zhenzhong Duan wrote:
> Currently assigned devices can not work in virtio-iommu bypass mode.
> Guest driver fails to probe the device due to DMA failure. And the
> reason is because of lacking GPA -> HPA mappings when VM is created.
>
> Add a root container memory region to hold both bypass memory region
> and iommu memory region, so the switch between them is supported
> just like the implementation in virtual VT-d.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/virtio/trace-events | 1 +
> hw/virtio/virtio-iommu.c | 115 ++++++++++++++++++++++++++++++-
> include/hw/virtio/virtio-iommu.h | 2 +
> 3 files changed, 116 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index ab8e095b73fa..20af2e7ebd78 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -124,6 +124,7 @@ virtio_iommu_remap(const char *name, uint64_t virt_start, uint64_t virt_end, uin
> virtio_iommu_set_page_size_mask(const char *name, uint64_t old, uint64_t new) "mr=%s old_mask=0x%"PRIx64" new_mask=0x%"PRIx64
> virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
> virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
> +virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
>
> # virtio-mem.c
> virtio_mem_send_response(uint16_t type) "type=%" PRIu16
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 2597e166f939..ff718107ee02 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -69,6 +69,77 @@ static inline uint16_t virtio_iommu_get_bdf(IOMMUDevice *dev)
> return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
> }
>
> +static bool virtio_iommu_device_bypassed(IOMMUDevice *sdev)
> +{
> + uint32_t sid;
> + bool bypassed;
> + VirtIOIOMMU *s = sdev->viommu;
> + VirtIOIOMMUEndpoint *ep;
> +
> + sid = virtio_iommu_get_bdf(sdev);
> +
> + qemu_mutex_lock(&s->mutex);
> + /* need to check bypass before system reset */
> + if (!s->endpoints) {
> + bypassed = s->config.bypass;
> + goto unlock;
> + }
> +
> + ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
> + if (!ep || !ep->domain) {
> + bypassed = s->config.bypass;
> + } else {
> + bypassed = ep->domain->bypass;
> + }
> +
> +unlock:
> + qemu_mutex_unlock(&s->mutex);
> + return bypassed;
> +}
> +
> +/* Return whether the device is using IOMMU translation. */
> +static bool virtio_iommu_switch_address_space(IOMMUDevice *sdev)
> +{
> + bool use_remapping;
> +
> + assert(sdev);
> +
> + use_remapping = !virtio_iommu_device_bypassed(sdev);
> +
> + trace_virtio_iommu_switch_address_space(pci_bus_num(sdev->bus),
> + PCI_SLOT(sdev->devfn),
> + PCI_FUNC(sdev->devfn),
> + use_remapping);
> +
> + /* Turn off first then on the other */
> + if (use_remapping) {
> + memory_region_set_enabled(&sdev->bypass_mr, false);
> + memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr), true);
> + } else {
> + memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr), false);
> + memory_region_set_enabled(&sdev->bypass_mr, true);
> + }
> +
> + return use_remapping;
> +}
> +
> +static void virtio_iommu_switch_address_space_all(VirtIOIOMMU *s)
> +{
> + GHashTableIter iter;
> + IOMMUPciBus *iommu_pci_bus;
> + int i;
> +
> + g_hash_table_iter_init(&iter, s->as_by_busptr);
> + while (g_hash_table_iter_next(&iter, NULL, (void **)&iommu_pci_bus)) {
> + for (i = 0; i < PCI_DEVFN_MAX; i++) {
> + if (!iommu_pci_bus->pbdev[i]) {
> + continue;
> + }
> + virtio_iommu_switch_address_space(iommu_pci_bus->pbdev[i]);
> + }
> + }
> +}
> +
> /**
> * The bus number is used for lookup when SID based operations occur.
> * In that case we lazily populate the IOMMUPciBus array from the bus hash
> @@ -213,6 +284,7 @@ static gboolean virtio_iommu_notify_map_cb(gpointer key, gpointer value,
> static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
> {
> VirtIOIOMMUDomain *domain = ep->domain;
> + IOMMUDevice *sdev = container_of(ep->iommu_mr, IOMMUDevice, iommu_mr);
>
> if (!ep->domain) {
> return;
> @@ -221,6 +293,7 @@ static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
> ep->iommu_mr);
> QLIST_REMOVE(ep, next);
> ep->domain = NULL;
> + virtio_iommu_switch_address_space(sdev);
> }
>
> static VirtIOIOMMUEndpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> @@ -323,12 +396,39 @@ static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
>
> trace_virtio_iommu_init_iommu_mr(name);
>
> + memory_region_init(&sdev->root, OBJECT(s), name, UINT64_MAX);
> + address_space_init(&sdev->as, &sdev->root, TYPE_VIRTIO_IOMMU);
> +
> + /*
> + * Build the IOMMU disabled container with aliases to the
> + * shared MRs. Note that aliasing to a shared memory region
What do you call the 'disabled container' and the shared MRs?
> + * could help the memory API to detect same FlatViews so we
> + * can have devices to share the same FlatView when in bypass
> + * mode. (either by not configuring virtio-iommu driver or with
> + * "iommu=pt"). It will greatly reduce the total number of
> + * FlatViews of the system hence VM runs faster.
Do you mean that we could have used a shared bypass MR instead of
creating one per device?
> + */
> + memory_region_init_alias(&sdev->bypass_mr, OBJECT(s),
> + "system", get_system_memory(), 0,
> + memory_region_size(get_system_memory()));
> +
> memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev->iommu_mr),
> TYPE_VIRTIO_IOMMU_MEMORY_REGION,
> OBJECT(s), name,
> UINT64_MAX);
> - address_space_init(&sdev->as,
> - MEMORY_REGION(&sdev->iommu_mr), TYPE_VIRTIO_IOMMU);
> +
> + /*
> + * Hook both the containers under the root container, we
did you mean "hook both sub-regions to the root container"?
> + * switch between iommu & bypass MRs by enable/disable
> + * corresponding sub-containers
> + */
> + memory_region_add_subregion_overlap(&sdev->root, 0,
> + MEMORY_REGION(&sdev->iommu_mr),
> + 0);
> + memory_region_add_subregion_overlap(&sdev->root, 0,
> + &sdev->bypass_mr, 0);
> +
> + virtio_iommu_switch_address_space(sdev);
> g_free(name);
> }
> return &sdev->as;
> @@ -342,6 +442,7 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
> uint32_t flags = le32_to_cpu(req->flags);
> VirtIOIOMMUDomain *domain;
> VirtIOIOMMUEndpoint *ep;
> + IOMMUDevice *sdev;
>
> trace_virtio_iommu_attach(domain_id, ep_id);
>
> @@ -375,6 +476,8 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
> QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
>
> ep->domain = domain;
> + sdev = container_of(ep->iommu_mr, IOMMUDevice, iommu_mr);
> + virtio_iommu_switch_address_space(sdev);
>
> /* Replay domain mappings on the associated memory region */
> g_tree_foreach(domain->mappings, virtio_iommu_notify_map_cb,
> @@ -887,6 +990,7 @@ static void virtio_iommu_set_config(VirtIODevice *vdev,
> return;
> }
> dev_config->bypass = in_config->bypass;
> + virtio_iommu_switch_address_space_all(dev);
> }
>
> trace_virtio_iommu_set_config(in_config->bypass);
> @@ -1026,6 +1130,8 @@ static void virtio_iommu_system_reset(void *opaque)
> * system reset
> */
> s->config.bypass = s->boot_bypass;
> + virtio_iommu_switch_address_space_all(s);
> +
> }
>
> static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> @@ -1041,6 +1147,11 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
> virtio_iommu_handle_command);
> s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE, NULL);
>
> + /*
> + * config.bypass is needed to get initial address space early, such as
> + * in vfio realize
> + */
I don't understand this comment, please can you elaborate?
> + s->config.bypass = s->boot_bypass;
> s->config.page_size_mask = TARGET_PAGE_MASK;
> s->config.input_range.end = UINT64_MAX;
> s->config.domain_range.end = UINT32_MAX;
> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
> index 84391f844826..102eeefa731d 100644
> --- a/include/hw/virtio/virtio-iommu.h
> +++ b/include/hw/virtio/virtio-iommu.h
> @@ -37,6 +37,8 @@ typedef struct IOMMUDevice {
> int devfn;
> IOMMUMemoryRegion iommu_mr;
> AddressSpace as;
> + MemoryRegion root; /* The root container of the device */
> + MemoryRegion bypass_mr; /* The alias of shared memory MR */
> } IOMMUDevice;
>
> typedef struct IOMMUPciBus {
Did you test migration? I wonder if we shouldn't call
virtio_iommu_switch_address_space_all(s)
after restoring the endpoints in iommu_post_load() as it is done in intel-iommu
Thanks
Eric
* Re: [PATCH 0/3] Add bypass mode support to assigned device
2022-06-13 6:10 [PATCH 0/3] Add bypass mode support to assigned device Zhenzhong Duan
` (2 preceding siblings ...)
2022-06-13 6:10 ` [PATCH 3/3] virtio-iommu: Add an assert check in translate routine Zhenzhong Duan
@ 2022-06-23 16:52 ` Eric Auger
2022-06-24 7:36 ` Duan, Zhenzhong
3 siblings, 1 reply; 8+ messages in thread
From: Eric Auger @ 2022-06-23 16:52 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, mst, jean-philippe, pbonzini, yu.c.zhang,
chuanxiao.dong, tina.zhang
Hi Duan,
On 6/13/22 08:10, Zhenzhong Duan wrote:
> Currently virtio-iommu's logic to support bypass mode works only for
> emulated device. For assigned device, no GPA -> HPA mapping is setup
> in IOMMU page table.
>
> Host report below error:
> [3713481.750944] dmar_fault: 191 callbacks suppressed
> [3713481.750953] DMAR: DRHD: handling fault status reg 302
> [3713481.750962] DMAR: [DMA Read NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason 0x06] PTE Read access is not set
> [3713481.751003] DMAR: DRHD: handling fault status reg 402
> [3713481.751007] DMAR: [DMA Read NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason 0x06] PTE Read access is not set
> [3713481.751023] DMAR: DRHD: handling fault status reg 502
> [3713481.751026] DMAR: [DMA Write NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason 0x05] PTE Write access is not set
> [3713481.751072] DMAR: DRHD: handling fault status reg 602
>
> Guest kernel report below error:
> [ 3.461716] i40e: Intel(R) Ethernet Connection XL710 Network Driver
> [ 3.462605] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
> [ 3.464630] i40e 0000:00:04.0: Adding to iommu group 5
> [ 3.482093] i40e 0000:00:04.0: fw 0.0.00000 api 0.0 nvm 0.00 0x176953ce 28.50.1 [8086:37d3] [8086:35d0]
> [ 3.484295] i40e 0000:00:04.0: eeprom check failed (-62), Tx/Rx traffic disabled
> [ 3.487268] i40e 0000:00:04.0: configure_lan_hmc failed: -49
> [ 3.489066] i40e: probe of 0000:00:04.0 failed with error -2
>
> Fix it by adding switch beween bypass and iommu address space just like
> the virtual VT-d implementation, so that in bypass mode, vfio mapping
> is setup.
>
> Tested with four combination of qemu's virtio-iommu.boot-bypass=true/false
> with guest kernel's iommu=pt/nopt on x86_64 platform.
I know this has already landed upstream (I was off last week) but I have
a few comments/questions related to the series.
>
> Zhenzhong Duan (3):
> virtio-iommu: Add bypass mode support to assigned device
> virtio-iommu: Use recursive lock to avoid deadlock
This patch may have been squashed into the previous one, as
"virtio-iommu: Add bypass mode support to assigned device" deadlocks.
Eric
> virtio-iommu: Add an assert check in translate routine
>
> hw/virtio/trace-events | 1 +
> hw/virtio/virtio-iommu.c | 135 ++++++++++++++++++++++++++++---
> include/hw/virtio/virtio-iommu.h | 4 +-
> 3 files changed, 130 insertions(+), 10 deletions(-)
>
* RE: [PATCH 0/3] Add bypass mode support to assigned device
2022-06-23 16:52 ` [PATCH 0/3] Add bypass mode support to assigned device Eric Auger
@ 2022-06-24 7:36 ` Duan, Zhenzhong
0 siblings, 0 replies; 8+ messages in thread
From: Duan, Zhenzhong @ 2022-06-24 7:36 UTC (permalink / raw)
To: eric.auger@redhat.com
Cc: qemu-devel@nongnu.org, mst@redhat.com, jean-philippe@linaro.org,
pbonzini@redhat.com, Zhang, Yu C, Dong, Chuanxiao, Zhang, Tina
Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Friday, June 24, 2022 12:52 AM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jean-philippe@linaro.org;
>pbonzini@redhat.com; Zhang, Yu C <yu.c.zhang@intel.com>; Dong,
>Chuanxiao <chuanxiao.dong@intel.com>; Zhang, Tina
><tina.zhang@intel.com>
>Subject: Re: [PATCH 0/3] Add bypass mode support to assigned device
>
>Hi Duan,
>
>On 6/13/22 08:10, Zhenzhong Duan wrote:
>> Currently virtio-iommu's logic to support bypass mode works only for
>> emulated device. For assigned device, no GPA -> HPA mapping is setup
>> in IOMMU page table.
>>
>> Host report below error:
>> [3713481.750944] dmar_fault: 191 callbacks suppressed [3713481.750953]
>> DMAR: DRHD: handling fault status reg 302 [3713481.750962] DMAR: [DMA
>> Read NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault
>> reason 0x06] PTE Read access is not set [3713481.751003] DMAR: DRHD:
>> handling fault status reg 402 [3713481.751007] DMAR: [DMA Read
>> NO_PASID] Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason
>> 0x06] PTE Read access is not set [3713481.751023] DMAR: DRHD: handling
>> fault status reg 502 [3713481.751026] DMAR: [DMA Write NO_PASID]
>> Request device [2f:00.1] fault addr 0x7ebb0000 [fault reason 0x05] PTE
>> Write access is not set [3713481.751072] DMAR: DRHD: handling fault
>> status reg 602
>>
>> Guest kernel report below error:
>> [ 3.461716] i40e: Intel(R) Ethernet Connection XL710 Network Driver
>> [ 3.462605] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
>> [ 3.464630] i40e 0000:00:04.0: Adding to iommu group 5
>> [ 3.482093] i40e 0000:00:04.0: fw 0.0.00000 api 0.0 nvm 0.00 0x176953ce
>28.50.1 [8086:37d3] [8086:35d0]
>> [ 3.484295] i40e 0000:00:04.0: eeprom check failed (-62), Tx/Rx traffic
>disabled
>> [ 3.487268] i40e 0000:00:04.0: configure_lan_hmc failed: -49
>> [ 3.489066] i40e: probe of 0000:00:04.0 failed with error -2
>>
>> Fix it by adding switch beween bypass and iommu address space just
>> like the virtual VT-d implementation, so that in bypass mode, vfio
>> mapping is setup.
>>
>> Tested with four combination of qemu's
>> virtio-iommu.boot-bypass=true/false
>> with guest kernel's iommu=pt/nopt on x86_64 platform.
>
>I know this has already landed uptream (I was off last week) but I have few
>comments/questions related to the series.
Sure😊
>>
>> Zhenzhong Duan (3):
>> virtio-iommu: Add bypass mode support to assigned device
>> virtio-iommu: Use recursive lock to avoid deadlock
>This patch may have been squashed into the previous one, as
>"virtio-iommu: Add bypass mode support to assigned device" deadlocks.
Yes, that may be better. My initial thought was to keep the 1st patch slim,
which may be clearer for maintainers to review.
Thanks
Zhenzhong
* RE: [PATCH 1/3] virtio-iommu: Add bypass mode support to assigned device
2022-06-23 16:52 ` Eric Auger
@ 2022-06-24 8:28 ` Duan, Zhenzhong
0 siblings, 0 replies; 8+ messages in thread
From: Duan, Zhenzhong @ 2022-06-24 8:28 UTC (permalink / raw)
To: eric.auger@redhat.com
Cc: qemu-devel@nongnu.org, mst@redhat.com, jean-philippe@linaro.org,
pbonzini@redhat.com, Zhang, Yu C, Dong, Chuanxiao, Zhang, Tina
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Friday, June 24, 2022 12:52 AM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: qemu-devel@nongnu.org; mst@redhat.com; jean-philippe@linaro.org;
>pbonzini@redhat.com; Zhang, Yu C <yu.c.zhang@intel.com>; Dong,
>Chuanxiao <chuanxiao.dong@intel.com>; Zhang, Tina
><tina.zhang@intel.com>
>Subject: Re: [PATCH 1/3] virtio-iommu: Add bypass mode support to
>assigned device
>
>Hi,
>On 6/13/22 08:10, Zhenzhong Duan wrote:
>> Currently assigned devices can not work in virtio-iommu bypass mode.
>> Guest driver fails to probe the device due to DMA failure. And the
>> reason is because of lacking GPA -> HPA mappings when VM is created.
>>
>> Add a root container memory region to hold both bypass memory region
>> and iommu memory region, so the switch between them is supported just
>> like the implementation in virtual VT-d.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/virtio/trace-events | 1 +
>> hw/virtio/virtio-iommu.c | 115 ++++++++++++++++++++++++++++++-
>> include/hw/virtio/virtio-iommu.h | 2 +
>> 3 files changed, 116 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events index
>> ab8e095b73fa..20af2e7ebd78 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -124,6 +124,7 @@ virtio_iommu_remap(const char *name, uint64_t
>> virt_start, uint64_t virt_end, uin
>> virtio_iommu_set_page_size_mask(const char *name, uint64_t old,
>uint64_t new) "mr=%s old_mask=0x%"PRIx64" new_mask=0x%"PRIx64
>virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
>> virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
>> +virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn,
>bool on) "Device %02x:%02x.%x switching address space (iommu
>enabled=%d)"
>>
>> # virtio-mem.c
>> virtio_mem_send_response(uint16_t type) "type=%" PRIu16 diff --git
>> a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c index
>> 2597e166f939..ff718107ee02 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -69,6 +69,77 @@ static inline uint16_t
>virtio_iommu_get_bdf(IOMMUDevice *dev)
>> return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn); }
>>
>> +static bool virtio_iommu_device_bypassed(IOMMUDevice *sdev) {
>> + uint32_t sid;
>> + bool bypassed;
>> + VirtIOIOMMU *s = sdev->viommu;
>> + VirtIOIOMMUEndpoint *ep;
>> +
>> + sid = virtio_iommu_get_bdf(sdev);
>> +
>> + qemu_mutex_lock(&s->mutex);
>> + /* need to check bypass before system reset */
>> + if (!s->endpoints) {
>> + bypassed = s->config.bypass;
>> + goto unlock;
>> + }
>> +
>> + ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
>> + if (!ep || !ep->domain) {
>> + bypassed = s->config.bypass;
>> + } else {
>> + bypassed = ep->domain->bypass;
>> + }
>> +
>> +unlock:
>> + qemu_mutex_unlock(&s->mutex);
>> + return bypassed;
>> +}
>> +
>> +/* Return whether the device is using IOMMU translation. */ static
>> +bool virtio_iommu_switch_address_space(IOMMUDevice *sdev) {
>> + bool use_remapping;
>> +
>> + assert(sdev);
>> +
>> + use_remapping = !virtio_iommu_device_bypassed(sdev);
>> +
>> + trace_virtio_iommu_switch_address_space(pci_bus_num(sdev->bus),
>> + PCI_SLOT(sdev->devfn),
>> + PCI_FUNC(sdev->devfn),
>> + use_remapping);
>> +
>> + /* Turn off first then on the other */
>> + if (use_remapping) {
>> + memory_region_set_enabled(&sdev->bypass_mr, false);
>> + memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr),
>true);
>> + } else {
>> + memory_region_set_enabled(MEMORY_REGION(&sdev->iommu_mr),
>false);
>> + memory_region_set_enabled(&sdev->bypass_mr, true);
>> + }
>> +
>> + return use_remapping;
>> +}
>> +
>> +static void virtio_iommu_switch_address_space_all(VirtIOIOMMU *s) {
>> + GHashTableIter iter;
>> + IOMMUPciBus *iommu_pci_bus;
>> + int i;
>> +
>> + g_hash_table_iter_init(&iter, s->as_by_busptr);
>> + while (g_hash_table_iter_next(&iter, NULL, (void **)&iommu_pci_bus))
>{
>> + for (i = 0; i < PCI_DEVFN_MAX; i++) {
>> + if (!iommu_pci_bus->pbdev[i]) {
>> + continue;
>> + }
>> + virtio_iommu_switch_address_space(iommu_pci_bus->pbdev[i]);
>> + }
>> + }
>> +}
>> +
>> /**
>> * The bus number is used for lookup when SID based operations occur.
>> * In that case we lazily populate the IOMMUPciBus array from the bus
>> hash @@ -213,6 +284,7 @@ static gboolean
>> virtio_iommu_notify_map_cb(gpointer key, gpointer value, static void
>> virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
>{
>> VirtIOIOMMUDomain *domain = ep->domain;
>> + IOMMUDevice *sdev = container_of(ep->iommu_mr, IOMMUDevice,
>> + iommu_mr);
>>
>> if (!ep->domain) {
>> return;
>> @@ -221,6 +293,7 @@ static void
>virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
>> ep->iommu_mr);
>> QLIST_REMOVE(ep, next);
>> ep->domain = NULL;
>> + virtio_iommu_switch_address_space(sdev);
>> }
>>
>> static VirtIOIOMMUEndpoint *virtio_iommu_get_endpoint(VirtIOIOMMU
>*s,
>> @@ -323,12 +396,39 @@ static AddressSpace
>> *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
>>
>> trace_virtio_iommu_init_iommu_mr(name);
>>
>> + memory_region_init(&sdev->root, OBJECT(s), name, UINT64_MAX);
>> + address_space_init(&sdev->as, &sdev->root,
>> + TYPE_VIRTIO_IOMMU);
>> +
>> + /*
>> + * Build the IOMMU disabled container with aliases to the
>> + * shared MRs. Note that aliasing to a shared memory region
>What do you call the 'disabled container' and the shared MRs?
'IOMMU disabled container' means sdev->bypass_mr.
'Shared MRs' means get_system_memory(), aka system_memory.
Every endpoint's sdev->bypass_mr aliases system_memory, which is why
sdev->bypass_mr is called 'IOMMU disabled' and system_memory the shared MR.
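A picture of the resulting per-endpoint layout (illustrative, matching the
memory regions added by this patch):

    /*
     * sdev->root                      container, size UINT64_MAX
     *   +- sdev->iommu_mr             IOMMU MR, enabled when translating
     *   +- sdev->bypass_mr            alias of get_system_memory() (the
     *                                 shared MR), enabled when bypassed
     *
     * Only one of the two sub-regions is enabled at any given time.
     */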
>
>> + * could help the memory API to detect same FlatViews so we
>> + * can have devices to share the same FlatView when in bypass
>> + * mode. (either by not configuring virtio-iommu driver or with
>> + * "iommu=pt"). It will greatly reduce the total number of
>> + * FlatViews of the system hence VM runs faster.
>
>Do you mean that we could have used a shared bypass MR instead of
>creatingone per device?
I think we still need to create an iommu MR per device, which is enabled when
the endpoint is attached to a DMA domain.
The shared bypass MR is enabled when:
1. the endpoint is not attached to any domain yet and virtio-iommu.boot_bypass=true
2. the endpoint is attached to a bypass domain
>> + */
>> + memory_region_init_alias(&sdev->bypass_mr, OBJECT(s),
>> + "system", get_system_memory(), 0,
>> +
>> + memory_region_size(get_system_memory()));
>> +
>> memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev-
>>iommu_mr),
>> TYPE_VIRTIO_IOMMU_MEMORY_REGION,
>> OBJECT(s), name,
>> UINT64_MAX);
>> - address_space_init(&sdev->as,
>> - MEMORY_REGION(&sdev->iommu_mr),
>TYPE_VIRTIO_IOMMU);
>> +
>> + /*
>> + * Hook both the containers under the root container, we
>did you mean "hook both sub-regions to the root container"?
Yes
>> + * switch between iommu & bypass MRs by enable/disable
>> + * corresponding sub-containers
>> + */
>> + memory_region_add_subregion_overlap(&sdev->root, 0,
>> + MEMORY_REGION(&sdev->iommu_mr),
>> + 0);
>> + memory_region_add_subregion_overlap(&sdev->root, 0,
>> + &sdev->bypass_mr, 0);
>> +
>> + virtio_iommu_switch_address_space(sdev);
>> g_free(name);
>> }
>> return &sdev->as;
>> @@ -342,6 +442,7 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>> uint32_t flags = le32_to_cpu(req->flags);
>> VirtIOIOMMUDomain *domain;
>> VirtIOIOMMUEndpoint *ep;
>> + IOMMUDevice *sdev;
>>
>> trace_virtio_iommu_attach(domain_id, ep_id);
>>
>> @@ -375,6 +476,8 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>> QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
>>
>> ep->domain = domain;
>> + sdev = container_of(ep->iommu_mr, IOMMUDevice, iommu_mr);
>> + virtio_iommu_switch_address_space(sdev);
>>
>> /* Replay domain mappings on the associated memory region */
>> g_tree_foreach(domain->mappings, virtio_iommu_notify_map_cb, @@
>> -887,6 +990,7 @@ static void virtio_iommu_set_config(VirtIODevice *vdev,
>> return;
>> }
>> dev_config->bypass = in_config->bypass;
>> + virtio_iommu_switch_address_space_all(dev);
>> }
>>
>> trace_virtio_iommu_set_config(in_config->bypass);
>> @@ -1026,6 +1130,8 @@ static void virtio_iommu_system_reset(void
>*opaque)
>> * system reset
>> */
>> s->config.bypass = s->boot_bypass;
>> + virtio_iommu_switch_address_space_all(s);
>> +
>> }
>>
>> static void virtio_iommu_device_realize(DeviceState *dev, Error
>> **errp) @@ -1041,6 +1147,11 @@ static void
>virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>> virtio_iommu_handle_command);
>> s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
>> NULL);
>>
>> + /*
>> + * config.bypass is needed to get initial address space early, such as
>> + * in vfio realize
>> + */
>I don't understand this comment, please can you elaborate?
In vfio_realize(), virtio_iommu_find_add_as() is called to get the address space
from virtio-iommu. virtio_iommu_find_add_as() calls virtio_iommu_switch_address_space(),
which needs to check config.bypass to enable the right MR, either the shared
bypass MR or the iommu MR.
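In other words, the ordering at VM creation time is roughly the following
(illustrative; the intermediate frames outside virtio-iommu are assumed):

    /*
     * vfio_realize()
     *   -> pci_device_iommu_address_space()
     *      -> virtio_iommu_find_add_as()
     *         -> virtio_iommu_switch_address_space()
     *            -> virtio_iommu_device_bypassed()   reads s->config.bypass
     *
     * This runs before the guest driver can write config.bypass, hence
     * realize() seeds config.bypass from boot_bypass.
     */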
It could be OK to use s->boot_bypass directly in virtio_iommu_device_bypassed();
then this change could be removed.
>> + s->config.bypass = s->boot_bypass;
>> s->config.page_size_mask = TARGET_PAGE_MASK;
>> s->config.input_range.end = UINT64_MAX;
>> s->config.domain_range.end = UINT32_MAX; diff --git
>> a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>> index 84391f844826..102eeefa731d 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -37,6 +37,8 @@ typedef struct IOMMUDevice {
>> int devfn;
>> IOMMUMemoryRegion iommu_mr;
>> AddressSpace as;
>> + MemoryRegion root; /* The root container of the device */
>> + MemoryRegion bypass_mr; /* The alias of shared memory MR */
>> } IOMMUDevice;
>>
>> typedef struct IOMMUPciBus {
>Did you test migration? I wonder if we shouldn't call
>
>virtio_iommu_switch_address_space_all(s)
>after restoring the endpoints in iommu_post_load() as it is done in intel-
>iommu
Thanks for pointing that out, I missed the migration scenario. After your
suggested change, migration works without a pass-through device. With a
pass-through device I got "VFIO device doesn't support migration". It is not
clear to me whether pass-through device migration is fully supported yet.
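The change I tested is roughly the following (an illustrative sketch, not
the exact hunk; the existing body of iommu_post_load() is assumed):

    static int iommu_post_load(void *opaque, int version_id)
    {
        VirtIOIOMMU *s = opaque;

        /* existing endpoint reconstruction (assumed) */
        g_tree_foreach(s->domains, reconstruct_endpoints, s);
        /* re-evaluate every endpoint's address space with the restored state */
        virtio_iommu_switch_address_space_all(s);
        return 0;
    }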
Zhenzhong