[Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07
@ 2015-10-07 15:42 Alex Williamson
  2015-10-07 15:42 ` [Qemu-devel] [PULL 1/9] hw/vfio/platform: irqfd setup sequence update Alex Williamson
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:42 UTC (permalink / raw)
  To: qemu-devel

Hi Peter,

Per David's suggestion I simply dropped the clang problem patch, it's
apparently unnecessary at this time.  Thanks,

Alex

The following changes since commit c0b520dfb8890294a9f8879f4759172900585995:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2015-10-02 16:59:21 +0100)

are available in the git repository at:


  git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20151007.0

for you to fetch changes up to 508ce5eb00070809f0d19917a1b2960dfcf5a64b:

  vfio: Allow hotplug of containers onto existing guest IOMMU mappings (2015-10-05 12:39:47 -0600)

----------------------------------------------------------------
VFIO updates 2015-10-07

 - Change platform device IRQ setup sequence for compatibility
   with upcoming IRQ forwarding (Eric Auger)
 - Extensions to support vfio-pci devices on spapr-pci-host-bridge
   (David Gibson) [clang problem patch dropped]

----------------------------------------------------------------
David Gibson (6):
      vfio: Remove unneeded union from VFIOContainer
      vfio: Generalize vfio_listener_region_add failure path
      vfio: Check guest IOVA ranges against host IOMMU capabilities
      vfio: Record host IOMMU's available IO page sizes
      memory: Allow replay of IOMMU mapping notifications
      vfio: Allow hotplug of containers onto existing guest IOMMU mappings

Eric Auger (3):
      hw/vfio/platform: irqfd setup sequence update
      hw/vfio/platform: change interrupt/unmask fields into pointer
      hw/vfio/platform: do not set resamplefd for edge-sensitive IRQS

 hw/vfio/common.c                | 140 ++++++++++++++++++++++++----------------
 hw/vfio/platform.c              | 116 ++++++++++++++++++++-------------
 include/exec/memory.h           |  13 ++++
 include/hw/vfio/vfio-common.h   |  23 +++----
 include/hw/vfio/vfio-platform.h |   4 +-
 memory.c                        |  20 ++++++
 trace-events                    |   4 +-
 7 files changed, 204 insertions(+), 116 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 1/9] hw/vfio/platform: irqfd setup sequence update
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
@ 2015-10-07 15:42 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 2/9] hw/vfio/platform: change interrupt/unmask fields into pointer Alex Williamson
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eric Auger

From: Eric Auger <eric.auger@linaro.org>

With current implementation, eventfd VFIO signaling is first set up and
then irqfd is setup, if supported and allowed.

This start sequence causes several issues with IRQ forwarding setup
which, if supported, is transparently attempted on irqfd setup:
IRQ forwarding setup is likely to fail if the IRQ is detected as under
injection into the guest (active at irqchip level or VFIO masked).

This currently always happens because the current sequence explicitly
VFIO-masks the IRQ before setting irqfd.

Even if that masking were removed, we couldn't prevent the case where
the IRQ is under injection into the guest.

So the simpler solution is to remove this 2-step startup and directly
attempt irqfd setup. This is what this patch does.

Also in case the eventfd setup fails, there is no reason to go farther:
let's abort.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/platform.c |   51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a6726cd..d864342 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -310,18 +310,29 @@ static void vfio_platform_eoi(VFIODevice *vbasedev)
 /**
  * vfio_start_eventfd_injection - starts the virtual IRQ injection using
  * user-side handled eventfds
- * @intp: the IRQ struct pointer
+ * @sbdev: the sysbus device handle
+ * @irq: the qemu irq handle
  */
 
-static int vfio_start_eventfd_injection(VFIOINTp *intp)
+static void vfio_start_eventfd_injection(SysBusDevice *sbdev, qemu_irq irq)
 {
     int ret;
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIOINTp *intp;
+
+    QLIST_FOREACH(intp, &vdev->intp_list, next) {
+        if (intp->qemuirq == irq) {
+            break;
+        }
+    }
+    assert(intp);
 
     ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
     if (ret) {
-        error_report("vfio: Error: Failed to pass IRQ fd to the driver: %m");
+        error_report("vfio: failed to start eventfd signaling for IRQ %d: %m",
+                     intp->pin);
+        abort();
     }
-    return ret;
 }
 
 /*
@@ -359,6 +370,15 @@ static int vfio_set_resample_eventfd(VFIOINTp *intp)
     return ret;
 }
 
+/**
+ * vfio_start_irqfd_injection - starts the virtual IRQ injection using
+ * irqfd
+ *
+ * @sbdev: the sysbus device handle
+ * @irq: the qemu irq handle
+ *
+ * In case the irqfd setup fails, we fallback to userspace handled eventfd
+ */
 static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
 {
     VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
@@ -366,7 +386,7 @@ static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
 
     if (!kvm_irqfds_enabled() || !kvm_resamplefds_enabled() ||
         !vdev->irqfd_allowed) {
-        return;
+        goto fail_irqfd;
     }
 
     QLIST_FOREACH(intp, &vdev->intp_list, next) {
@@ -376,13 +396,6 @@ static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
     }
     assert(intp);
 
-    /* Get to a known interrupt state */
-    qemu_set_fd_handler(event_notifier_get_fd(&intp->interrupt),
-                        NULL, NULL, vdev);
-
-    vfio_mask_single_irqindex(&vdev->vbasedev, intp->pin);
-    qemu_set_irq(intp->qemuirq, 0);
-
     if (kvm_irqchip_add_irqfd_notifier(kvm_state, &intp->interrupt,
                                    &intp->unmask, irq) < 0) {
         goto fail_irqfd;
@@ -395,9 +408,6 @@ static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
         goto fail_vfio;
     }
 
-    /* Let's resume injection with irqfd setup */
-    vfio_unmask_single_irqindex(&vdev->vbasedev, intp->pin);
-
     intp->kvm_accel = true;
 
     trace_vfio_platform_start_irqfd_injection(intp->pin,
@@ -406,9 +416,11 @@ static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
     return;
 fail_vfio:
     kvm_irqchip_remove_irqfd_notifier(kvm_state, &intp->interrupt, irq);
+    error_report("vfio: failed to start eventfd signaling for IRQ %d: %m",
+                 intp->pin);
+    abort();
 fail_irqfd:
-    vfio_start_eventfd_injection(intp);
-    vfio_unmask_single_irqindex(&vdev->vbasedev, intp->pin);
+    vfio_start_eventfd_injection(sbdev, irq);
     return;
 }
 
@@ -646,7 +658,6 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
     VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
     SysBusDevice *sbdev = SYS_BUS_DEVICE(dev);
     VFIODevice *vbasedev = &vdev->vbasedev;
-    VFIOINTp *intp;
     int i, ret;
 
     vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
@@ -665,10 +676,6 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
         vfio_map_region(vdev, i);
         sysbus_init_mmio(sbdev, &vdev->regions[i]->mem);
     }
-
-    QLIST_FOREACH(intp, &vdev->intp_list, next) {
-        vfio_start_eventfd_injection(intp);
-    }
 }
 
 static const VMStateDescription vfio_platform_vmstate = {

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 2/9] hw/vfio/platform: change interrupt/unmask fields into pointer
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
  2015-10-07 15:42 ` [Qemu-devel] [PULL 1/9] hw/vfio/platform: irqfd setup sequence update Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 3/9] hw/vfio/platform: do not set resamplefd for edge-sensitive IRQS Alex Williamson
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eric Auger

From: Eric Auger <eric.auger@linaro.org>

unmask EventNotifier might not be initialized in case of edge
sensitive irq. Using EventNotifier pointers make life simpler to
handle the edge-sensitive irqfd setup.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/platform.c              |   35 ++++++++++++++++++++---------------
 include/hw/vfio/vfio-platform.h |    4 ++--
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index d864342..cab1517 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -57,15 +57,20 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev,
     sysbus_init_irq(sbdev, &intp->qemuirq);
 
     /* Get an eventfd for trigger */
-    ret = event_notifier_init(&intp->interrupt, 0);
+    intp->interrupt = g_malloc0(sizeof(EventNotifier));
+    ret = event_notifier_init(intp->interrupt, 0);
     if (ret) {
+        g_free(intp->interrupt);
         g_free(intp);
         error_report("vfio: Error: trigger event_notifier_init failed ");
         return NULL;
     }
     /* Get an eventfd for resample/unmask */
-    ret = event_notifier_init(&intp->unmask, 0);
+    intp->unmask = g_malloc0(sizeof(EventNotifier));
+    ret = event_notifier_init(intp->unmask, 0);
     if (ret) {
+        g_free(intp->interrupt);
+        g_free(intp->unmask);
         g_free(intp);
         error_report("vfio: Error: resamplefd event_notifier_init failed");
         return NULL;
@@ -100,7 +105,7 @@ static int vfio_set_trigger_eventfd(VFIOINTp *intp,
     irq_set->start = 0;
     irq_set->count = 1;
     pfd = (int32_t *)&irq_set->data;
-    *pfd = event_notifier_get_fd(&intp->interrupt);
+    *pfd = event_notifier_get_fd(intp->interrupt);
     qemu_set_fd_handler(*pfd, (IOHandler *)handler, NULL, intp);
     ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
@@ -182,7 +187,7 @@ static void vfio_intp_mmap_enable(void *opaque)
 static void vfio_intp_inject_pending_lockheld(VFIOINTp *intp)
 {
     trace_vfio_platform_intp_inject_pending_lockheld(intp->pin,
-                              event_notifier_get_fd(&intp->interrupt));
+                              event_notifier_get_fd(intp->interrupt));
 
     intp->state = VFIO_IRQ_ACTIVE;
 
@@ -224,18 +229,18 @@ static void vfio_intp_interrupt(VFIOINTp *intp)
         trace_vfio_intp_interrupt_set_pending(intp->pin);
         QSIMPLEQ_INSERT_TAIL(&vdev->pending_intp_queue,
                              intp, pqnext);
-        ret = event_notifier_test_and_clear(&intp->interrupt);
+        ret = event_notifier_test_and_clear(intp->interrupt);
         qemu_mutex_unlock(&vdev->intp_mutex);
         return;
     }
 
     trace_vfio_platform_intp_interrupt(intp->pin,
-                              event_notifier_get_fd(&intp->interrupt));
+                              event_notifier_get_fd(intp->interrupt));
 
-    ret = event_notifier_test_and_clear(&intp->interrupt);
+    ret = event_notifier_test_and_clear(intp->interrupt);
     if (!ret) {
         error_report("Error when clearing fd=%d (ret = %d)",
-                     event_notifier_get_fd(&intp->interrupt), ret);
+                     event_notifier_get_fd(intp->interrupt), ret);
     }
 
     intp->state = VFIO_IRQ_ACTIVE;
@@ -283,7 +288,7 @@ static void vfio_platform_eoi(VFIODevice *vbasedev)
     QLIST_FOREACH(intp, &vdev->intp_list, next) {
         if (intp->state == VFIO_IRQ_ACTIVE) {
             trace_vfio_platform_eoi(intp->pin,
-                                event_notifier_get_fd(&intp->interrupt));
+                                event_notifier_get_fd(intp->interrupt));
             intp->state = VFIO_IRQ_INACTIVE;
 
             /* deassert the virtual IRQ */
@@ -360,7 +365,7 @@ static int vfio_set_resample_eventfd(VFIOINTp *intp)
     irq_set->start = 0;
     irq_set->count = 1;
     pfd = (int32_t *)&irq_set->data;
-    *pfd = event_notifier_get_fd(&intp->unmask);
+    *pfd = event_notifier_get_fd(intp->unmask);
     qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
     ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
     g_free(irq_set);
@@ -396,8 +401,8 @@ static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
     }
     assert(intp);
 
-    if (kvm_irqchip_add_irqfd_notifier(kvm_state, &intp->interrupt,
-                                   &intp->unmask, irq) < 0) {
+    if (kvm_irqchip_add_irqfd_notifier(kvm_state, intp->interrupt,
+                                   intp->unmask, irq) < 0) {
         goto fail_irqfd;
     }
 
@@ -411,11 +416,11 @@ static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
     intp->kvm_accel = true;
 
     trace_vfio_platform_start_irqfd_injection(intp->pin,
-                                     event_notifier_get_fd(&intp->interrupt),
-                                     event_notifier_get_fd(&intp->unmask));
+                                     event_notifier_get_fd(intp->interrupt),
+                                     event_notifier_get_fd(intp->unmask));
     return;
 fail_vfio:
-    kvm_irqchip_remove_irqfd_notifier(kvm_state, &intp->interrupt, irq);
+    kvm_irqchip_remove_irqfd_notifier(kvm_state, intp->interrupt, irq);
     error_report("vfio: failed to start eventfd signaling for IRQ %d: %m",
                  intp->pin);
     abort();
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index c5cf1d7..b468f80 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -34,8 +34,8 @@ enum {
 typedef struct VFIOINTp {
     QLIST_ENTRY(VFIOINTp) next; /* entry for IRQ list */
     QSIMPLEQ_ENTRY(VFIOINTp) pqnext; /* entry for pending IRQ queue */
-    EventNotifier interrupt; /* eventfd triggered on interrupt */
-    EventNotifier unmask; /* eventfd for unmask on QEMU bypass */
+    EventNotifier *interrupt; /* eventfd triggered on interrupt */
+    EventNotifier *unmask; /* eventfd for unmask on QEMU bypass */
     qemu_irq qemuirq;
     struct VFIOPlatformDevice *vdev; /* back pointer to device */
     int state; /* inactive, pending, active */

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 3/9] hw/vfio/platform: do not set resamplefd for edge-sensitive IRQS
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
  2015-10-07 15:42 ` [Qemu-devel] [PULL 1/9] hw/vfio/platform: irqfd setup sequence update Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 2/9] hw/vfio/platform: change interrupt/unmask fields into pointer Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 4/9] vfio: Remove unneeded union from VFIOContainer Alex Williamson
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eric Auger

From: Eric Auger <eric.auger@linaro.org>

In irqfd mode, current code attempts to set a resamplefd whatever
the type of the IRQ. For an edge-sensitive IRQ this attempt fails
and as a consequence, the whole irqfd setup fails and we fall back
to the slow mode. This patch bypasses the resamplefd setting for
non level-sentive IRQs.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/platform.c |   42 +++++++++++++++++++++++++++---------------
 trace-events       |    4 +++-
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index cab1517..5c1156c 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -32,6 +32,11 @@
  * Functions used whatever the injection method
  */
 
+static inline bool vfio_irq_is_automasked(VFIOINTp *intp)
+{
+    return intp->flags & VFIO_IRQ_INFO_AUTOMASKED;
+}
+
 /**
  * vfio_init_intp - allocate, initialize the IRQ struct pointer
  * and add it into the list of IRQs
@@ -65,15 +70,17 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev,
         error_report("vfio: Error: trigger event_notifier_init failed ");
         return NULL;
     }
-    /* Get an eventfd for resample/unmask */
-    intp->unmask = g_malloc0(sizeof(EventNotifier));
-    ret = event_notifier_init(intp->unmask, 0);
-    if (ret) {
-        g_free(intp->interrupt);
-        g_free(intp->unmask);
-        g_free(intp);
-        error_report("vfio: Error: resamplefd event_notifier_init failed");
-        return NULL;
+    if (vfio_irq_is_automasked(intp)) {
+        /* Get an eventfd for resample/unmask */
+        intp->unmask = g_malloc0(sizeof(EventNotifier));
+        ret = event_notifier_init(intp->unmask, 0);
+        if (ret) {
+            g_free(intp->interrupt);
+            g_free(intp->unmask);
+            g_free(intp);
+            error_report("vfio: Error: resamplefd event_notifier_init failed");
+            return NULL;
+        }
     }
 
     QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
@@ -294,7 +301,7 @@ static void vfio_platform_eoi(VFIODevice *vbasedev)
             /* deassert the virtual IRQ */
             qemu_set_irq(intp->qemuirq, 0);
 
-            if (intp->flags & VFIO_IRQ_INFO_AUTOMASKED) {
+            if (vfio_irq_is_automasked(intp)) {
                 /* unmasks the physical level-sensitive IRQ */
                 vfio_unmask_single_irqindex(vbasedev, intp->pin);
             }
@@ -409,15 +416,20 @@ static void vfio_start_irqfd_injection(SysBusDevice *sbdev, qemu_irq irq)
     if (vfio_set_trigger_eventfd(intp, NULL) < 0) {
         goto fail_vfio;
     }
-    if (vfio_set_resample_eventfd(intp) < 0) {
-        goto fail_vfio;
+    if (vfio_irq_is_automasked(intp)) {
+        if (vfio_set_resample_eventfd(intp) < 0) {
+            goto fail_vfio;
+        }
+        trace_vfio_platform_start_level_irqfd_injection(intp->pin,
+                                    event_notifier_get_fd(intp->interrupt),
+                                    event_notifier_get_fd(intp->unmask));
+    } else {
+        trace_vfio_platform_start_edge_irqfd_injection(intp->pin,
+                                    event_notifier_get_fd(intp->interrupt));
     }
 
     intp->kvm_accel = true;
 
-    trace_vfio_platform_start_irqfd_injection(intp->pin,
-                                     event_notifier_get_fd(intp->interrupt),
-                                     event_notifier_get_fd(intp->unmask));
     return;
 fail_vfio:
     kvm_irqchip_remove_irqfd_notifier(kvm_state, intp->interrupt, irq);
diff --git a/trace-events b/trace-events
index 36db793..6790292 100644
--- a/trace-events
+++ b/trace-events
@@ -1624,7 +1624,9 @@ vfio_platform_intp_interrupt(int pin, int fd) "Inject IRQ #%d (fd = %d)"
 vfio_platform_intp_inject_pending_lockheld(int pin, int fd) "Inject pending IRQ #%d (fd = %d)"
 vfio_platform_populate_interrupts(int pin, int count, int flags) "- IRQ index %d: count %d, flags=0x%x"
 vfio_intp_interrupt_set_pending(int index) "irq %d is set PENDING"
-vfio_platform_start_irqfd_injection(int index, int fd, int resamplefd) "IRQ index=%d, fd = %d, resamplefd = %d"
+vfio_platform_start_level_irqfd_injection(int index, int fd, int resamplefd) "IRQ index=%d, fd = %d, resamplefd = %d"
+vfio_platform_start_edge_irqfd_injection(int index, int fd) "IRQ index=%d, fd = %d"
+
 
 #hw/acpi/memory_hotplug.c
 mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 4/9] vfio: Remove unneeded union from VFIOContainer
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
                   ` (2 preceding siblings ...)
  2015-10-07 15:43 ` [Qemu-devel] [PULL 3/9] hw/vfio/platform: do not set resamplefd for edge-sensitive IRQS Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 5/9] vfio: Generalize vfio_listener_region_add failure path Alex Williamson
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Currently the VFIOContainer iommu_data field contains a union with
different information for different host iommu types.  However:
   * It only actually contains information for the x86-like "Type1" iommu
   * Because we have a common listener the Type1 fields are actually used
on all IOMMU types, including the SPAPR TCE type as well

In fact we now have a general structure for the listener which is unlikely
to ever need per-iommu-type information, so this patch removes the union.

In a similar way we can unify the setup of the vfio memory listener in
vfio_connect_container() that is currently split across a switch on iommu
type, but is effectively the same in both cases.

The iommu_data.release pointer was only needed as a cleanup function
which would handle potentially different data in the union.  With the
union gone, it too can be removed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/common.c              |   52 +++++++++++++++--------------------------
 include/hw/vfio/vfio-common.h |   16 ++-----------
 2 files changed, 22 insertions(+), 46 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0d341a3..1545f62 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -315,8 +315,7 @@ out:
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     hwaddr iova, end;
     Int128 llend;
     void *vaddr;
@@ -406,9 +405,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * can gracefully fail.  Runtime, there's not much we can do other
          * than throw a hardware error.
          */
-        if (!container->iommu_data.type1.initialized) {
-            if (!container->iommu_data.type1.error) {
-                container->iommu_data.type1.error = ret;
+        if (!container->initialized) {
+            if (!container->error) {
+                container->error = ret;
             }
         } else {
             hw_error("vfio: DMA mapping failed, unable to continue");
@@ -419,8 +418,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
 static void vfio_listener_region_del(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            iommu_data.type1.listener);
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     hwaddr iova, end;
     int ret;
 
@@ -485,7 +483,7 @@ static const MemoryListener vfio_memory_listener = {
 
 static void vfio_listener_release(VFIOContainer *container)
 {
-    memory_listener_unregister(&container->iommu_data.type1.listener);
+    memory_listener_unregister(&container->listener);
 }
 
 int vfio_mmap_region(Object *obj, VFIORegion *region,
@@ -683,21 +681,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
-        if (container->iommu_data.type1.error) {
-            ret = container->iommu_data.type1.error;
-            error_report("vfio: memory listener initialization failed for container");
-            goto listener_release_exit;
-        }
-
-        container->iommu_data.type1.initialized = true;
-
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
@@ -723,19 +706,24 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
-
-        container->iommu_data.type1.listener = vfio_memory_listener;
-        container->iommu_data.release = vfio_listener_release;
-
-        memory_listener_register(&container->iommu_data.type1.listener,
-                                 container->space->as);
-
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
         goto free_container_exit;
     }
 
+    container->listener = vfio_memory_listener;
+
+    memory_listener_register(&container->listener, container->space->as);
+
+    if (container->error) {
+        ret = container->error;
+        error_report("vfio: memory listener initialization failed for container");
+        goto listener_release_exit;
+    }
+
+    container->initialized = true;
+
     QLIST_INIT(&container->group_list);
     QLIST_INSERT_HEAD(&space->containers, container, next);
 
@@ -774,9 +762,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
         VFIOAddressSpace *space = container->space;
         VFIOGuestIOMMU *giommu, *tmp;
 
-        if (container->iommu_data.release) {
-            container->iommu_data.release(container);
-        }
+        vfio_listener_release(container);
         QLIST_REMOVE(container, next);
 
         QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9b9901f..fbbe6de 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -59,22 +59,12 @@ typedef struct VFIOAddressSpace {
 
 struct VFIOGroup;
 
-typedef struct VFIOType1 {
-    MemoryListener listener;
-    int error;
-    bool initialized;
-} VFIOType1;
-
 typedef struct VFIOContainer {
     VFIOAddressSpace *space;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    struct {
-        /* enable abstraction to support various iommu backends */
-        union {
-            VFIOType1 type1;
-        };
-        void (*release)(struct VFIOContainer *);
-    } iommu_data;
+    MemoryListener listener;
+    int error;
+    bool initialized;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 5/9] vfio: Generalize vfio_listener_region_add failure path
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
                   ` (3 preceding siblings ...)
  2015-10-07 15:43 ` [Qemu-devel] [PULL 4/9] vfio: Remove unneeded union from VFIOContainer Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 6/9] vfio: Check guest IOVA ranges against host IOMMU capabilities Alex Williamson
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier, Thomas Huth, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

If a DMA mapping operation fails in vfio_listener_region_add() it
checks to see if we've already completed initial setup of the
container.  If so it reports an error so the setup code can fail
gracefully, otherwise throws a hw_error().

There are other potential failure cases in vfio_listener_region_add()
which could benefit from the same logic, so move it to its own
fail: block.  Later patches can use this to extend other failure cases
to fail as gracefully as possible under the circumstances.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/common.c |   26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1545f62..95a4850 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -399,19 +399,23 @@ static void vfio_listener_region_add(MemoryListener *listener,
         error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
                      "0x%"HWADDR_PRIx", %p) = %d (%m)",
                      container, iova, end - iova, vaddr, ret);
+        goto fail;
+    }
 
-        /*
-         * On the initfn path, store the first error in the container so we
-         * can gracefully fail.  Runtime, there's not much we can do other
-         * than throw a hardware error.
-         */
-        if (!container->initialized) {
-            if (!container->error) {
-                container->error = ret;
-            }
-        } else {
-            hw_error("vfio: DMA mapping failed, unable to continue");
+    return;
+
+fail:
+    /*
+     * On the initfn path, store the first error in the container so we
+     * can gracefully fail.  Runtime, there's not much we can do other
+     * than throw a hardware error.
+     */
+    if (!container->initialized) {
+        if (!container->error) {
+            container->error = ret;
         }
+    } else {
+        hw_error("vfio: DMA mapping failed, unable to continue");
     }
 }
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 6/9] vfio: Check guest IOVA ranges against host IOMMU capabilities
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
                   ` (4 preceding siblings ...)
  2015-10-07 15:43 ` [Qemu-devel] [PULL 5/9] vfio: Generalize vfio_listener_region_add failure path Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 7/9] vfio: Record host IOMMU's available IO page sizes Alex Williamson
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The current vfio core code assumes that the host IOMMU is capable of
mapping any IOVA the guest wants to use to where we need.  However, real
IOMMUs generally only support translating a certain range of IOVAs (the
"DMA window") not a full 64-bit address space.

The common x86 IOMMUs support a wide enough range that guests are very
unlikely to go beyond it in practice, however the IOMMU used on IBM Power
machines - in the default configuration - supports only a much more limited
IOVA range, usually 0..2GiB.

If the guest attempts to set up an IOVA range that the host IOMMU can't
map, qemu won't report an error until it actually attempts to map a bad
IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e. no guest
visible IOMMU) then this will show up very quickly.  If there is a guest
visible IOMMU, however, the problem might not show up until much later when
the guest actually attempt to DMA with an IOVA the host can't handle.

This patch adds a test so that we will detect earlier if the guest is
attempting to use IOVA ranges that the host IOMMU won't be able to deal
with.

For now, we assume that "Type1" (x86) IOMMUs can support any IOVA, this is
incorrect, but no worse than what we have already.  We can't do better for
now because the Type1 kernel interface doesn't tell us what IOVA range the
IOMMU actually supports.

For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
IOVA range and validate guest IOVA ranges against it, and this patch does
so.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/common.c              |   40 +++++++++++++++++++++++++++++++++++++---
 include/hw/vfio/vfio-common.h |    6 ++++++
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 95a4850..2faf492 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -343,14 +343,22 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (int128_ge(int128_make64(iova), llend)) {
         return;
     }
+    end = int128_get64(llend);
+
+    if ((iova < container->min_iova) || ((end - 1) > container->max_iova)) {
+        error_report("vfio: IOMMU container %p can't map guest IOVA region"
+                     " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
+                     container, iova, end - 1);
+        ret = -EFAULT;
+        goto fail;
+    }
 
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
 
-        trace_vfio_listener_region_add_iommu(iova,
-                    int128_get64(int128_sub(llend, int128_one())));
+        trace_vfio_listener_region_add_iommu(iova, end - 1);
         /*
          * FIXME: We should do some checking to see if the
          * capabilities of the host VFIO IOMMU are adequate to model
@@ -387,7 +395,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     /* Here we assume that memory_region_is_ram(section->mr)==true */
 
-    end = int128_get64(llend);
     vaddr = memory_region_get_ram_ptr(section->mr) +
             section->offset_within_region +
             (iova - section->offset_within_address_space);
@@ -685,7 +692,19 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
+
+        /*
+         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
+         * IOVA whatsoever.  That's not actually true, but the current
+         * kernel interface doesn't tell us what it can map, and the
+         * existing Type1 IOMMUs generally support any IOVA we're
+         * going to actually try in practice.
+         */
+        container->min_iova = 0;
+        container->max_iova = (hwaddr)-1;
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+        struct vfio_iommu_spapr_tce_info info;
+
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
             error_report("vfio: failed to set group container: %m");
@@ -710,6 +729,21 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
             ret = -errno;
             goto free_container_exit;
         }
+
+        /*
+         * This only considers the host IOMMU's 32-bit window.  At
+         * some point we need to add support for the optional 64-bit
+         * window and dynamic windows
+         */
+        info.argsz = sizeof(info);
+        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
+        if (ret) {
+            error_report("vfio: VFIO_IOMMU_SPAPR_TCE_GET_INFO failed: %m");
+            ret = -errno;
+            goto free_container_exit;
+        }
+        container->min_iova = info.dma32_window_start;
+        container->max_iova = container->min_iova + info.dma32_window_size - 1;
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fbbe6de..27a14c0 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -65,6 +65,12 @@ typedef struct VFIOContainer {
     MemoryListener listener;
     int error;
     bool initialized;
+    /*
+     * This assumes the host IOMMU can support only a single
+     * contiguous IOVA window.  We may need to generalize that in
+     * future
+     */
+    hwaddr min_iova, max_iova;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 7/9] vfio: Record host IOMMU's available IO page sizes
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
                   ` (5 preceding siblings ...)
  2015-10-07 15:43 ` [Qemu-devel] [PULL 6/9] vfio: Check guest IOVA ranges against host IOMMU capabilities Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 8/9] memory: Allow replay of IOMMU mapping notifications Alex Williamson
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier, Thomas Huth, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Depending on the host IOMMU type we determine and record the available page
sizes for IOMMU translation.  We'll need this for other validation in
future patches.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/common.c              |   13 +++++++++++++
 include/hw/vfio/vfio-common.h |    1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 2faf492..f666de2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -677,6 +677,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
     if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
         ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
         bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
+        struct vfio_iommu_type1_info info;
 
         ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
         if (ret) {
@@ -702,6 +703,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
          */
         container->min_iova = 0;
         container->max_iova = (hwaddr)-1;
+
+        /* Assume just 4K IOVA page size */
+        container->iova_pgsizes = 0x1000;
+        info.argsz = sizeof(info);
+        ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
+        /* Ignore errors */
+        if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+            container->iova_pgsizes = info.iova_pgsizes;
+        }
     } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
         struct vfio_iommu_spapr_tce_info info;
 
@@ -744,6 +754,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
         }
         container->min_iova = info.dma32_window_start;
         container->max_iova = container->min_iova + info.dma32_window_size - 1;
+
+        /* Assume just 4K IOVA pages for now */
+        container->iova_pgsizes = 0x1000;
     } else {
         error_report("vfio: No available IOMMU models");
         ret = -EINVAL;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 27a14c0..f037f3c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -71,6 +71,7 @@ typedef struct VFIOContainer {
      * future
      */
     hwaddr min_iova, max_iova;
+    uint64_t iova_pgsizes;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_ENTRY(VFIOContainer) next;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 8/9] memory: Allow replay of IOMMU mapping notifications
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
                   ` (6 preceding siblings ...)
  2015-10-07 15:43 ` [Qemu-devel] [PULL 7/9] vfio: Record host IOMMU's available IO page sizes Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-07 15:43 ` [Qemu-devel] [PULL 9/9] vfio: Allow hotplug of containers onto existing guest IOMMU mappings Alex Williamson
  2015-10-08 17:00 ` [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Peter Maydell
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Vivier, Paolo Bonzini, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

When we have guest visible IOMMUs, we allow notifiers to be registered
which will be informed of all changes to IOMMU mappings.  This is used by
vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.

However, unlike with a memory region listener, an iommu notifier won't be
told about any mappings which already exist in the (guest) IOMMU at the
time it is registered.  This can cause problems if hotplugging a VFIO
device onto a guest bus which had existing guest IOMMU mappings, but didn't
previously have an VFIO devices (and hence no host IOMMU mappings).

This adds a memory_region_iommu_replay() function to handle this case.  It
replays any existing mappings in an IOMMU memory region to a specified
notifier.  Because the IOMMU memory region doesn't internally remember the
granularity of the guest IOMMU it has a small hack where the caller must
specify a granularity at which to replay mappings.

If there are finer mappings in the guest IOMMU these will be reported in
the iotlb structures passed to the notifier which it must handle (probably
causing it to flag an error).  This isn't new - the VFIO iommu notifier
must already handle notifications about guest IOMMU mappings too short
for it to represent in the host IOMMU.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 include/exec/memory.h |   13 +++++++++++++
 memory.c              |   20 ++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5baaf48..0f07159 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -583,6 +583,19 @@ void memory_region_notify_iommu(MemoryRegion *mr,
 void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
 
 /**
+ * memory_region_iommu_replay: replay existing IOMMU translations to
+ * a notifier
+ *
+ * @mr: the memory region to observe
+ * @n: the notifier to which to replay iommu mappings
+ * @granularity: Minimum page granularity to replay notifications for
+ * @is_write: Whether to treat the replay as a translate "write"
+ *     through the iommu
+ */
+void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
+                                hwaddr granularity, bool is_write);
+
+/**
  * memory_region_unregister_iommu_notifier: unregister a notifier for
  * changes to IOMMU translation entries.
  *
diff --git a/memory.c b/memory.c
index ef87363..1b03d22 100644
--- a/memory.c
+++ b/memory.c
@@ -1403,6 +1403,26 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
     notifier_list_add(&mr->iommu_notify, n);
 }
 
+void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
+                                hwaddr granularity, bool is_write)
+{
+    hwaddr addr;
+    IOMMUTLBEntry iotlb;
+
+    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
+        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+        if (iotlb.perm != IOMMU_NONE) {
+            n->notify(n, &iotlb);
+        }
+
+        /* if (2^64 - MR size) < granularity, it's possible to get an
+         * infinite loop here.  This should catch such a wraparound */
+        if ((addr + granularity) < addr) {
+            break;
+        }
+    }
+}
+
 void memory_region_unregister_iommu_notifier(Notifier *n)
 {
     notifier_remove(n);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PULL 9/9] vfio: Allow hotplug of containers onto existing guest IOMMU mappings
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
                   ` (7 preceding siblings ...)
  2015-10-07 15:43 ` [Qemu-devel] [PULL 8/9] memory: Allow replay of IOMMU mapping notifications Alex Williamson
@ 2015-10-07 15:43 ` Alex Williamson
  2015-10-08 17:00 ` [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Peter Maydell
  9 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2015-10-07 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

At present the memory listener used by vfio to keep host IOMMU mappings
in sync with the guest memory image assumes that if a guest IOMMU
appears, then it has no existing mappings.

This may not be true if a VFIO device is hotplugged onto a guest bus
which didn't previously include a VFIO device, and which has existing
guest IOMMU mappings.

Therefore, use the memory_region_register_iommu_notifier_replay()
function in order to fix this case, replaying existing guest IOMMU
mappings, bringing the host IOMMU into sync with the guest IOMMU.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/common.c |   23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f666de2..6797208 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -312,6 +312,11 @@ out:
     rcu_read_unlock();
 }
 
+static hwaddr vfio_container_granularity(VFIOContainer *container)
+{
+    return (hwaddr)1 << ctz64(container->iova_pgsizes);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -369,26 +374,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
          * would be the right place to wire that up (tell the KVM
          * device emulation the VFIO iommu handles to use).
          */
-        /*
-         * This assumes that the guest IOMMU is empty of
-         * mappings at this point.
-         *
-         * One way of doing this is:
-         * 1. Avoid sharing IOMMUs between emulated devices or different
-         * IOMMU groups.
-         * 2. Implement VFIO_IOMMU_ENABLE in the host kernel to fail if
-         * there are some mappings in IOMMU.
-         *
-         * VFIO on SPAPR does that. Other IOMMU models may do that different,
-         * they must make sure there are no existing mappings or
-         * loop through existing mappings to map them into VFIO.
-         */
         giommu = g_malloc0(sizeof(*giommu));
         giommu->iommu = section->mr;
         giommu->container = container;
         giommu->n.notify = vfio_iommu_map_notify;
         QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+
         memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+        memory_region_iommu_replay(giommu->iommu, &giommu->n,
+                                   vfio_container_granularity(container),
+                                   false);
 
         return;
     }

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07
  2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
                   ` (8 preceding siblings ...)
  2015-10-07 15:43 ` [Qemu-devel] [PULL 9/9] vfio: Allow hotplug of containers onto existing guest IOMMU mappings Alex Williamson
@ 2015-10-08 17:00 ` Peter Maydell
  9 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2015-10-08 17:00 UTC (permalink / raw)
  To: Alex Williamson; +Cc: QEMU Developers

On 7 October 2015 at 16:42, Alex Williamson <alex.williamson@redhat.com> wrote:
> Hi Peter,
>
> Per David's suggestion I simply dropped the clang problem patch, it's
> apparently unnecessary at this time.  Thanks,
>
> Alex
>
> The following changes since commit c0b520dfb8890294a9f8879f4759172900585995:
>
>   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2015-10-02 16:59:21 +0100)
>
> are available in the git repository at:
>
>
>   git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20151007.0
>
> for you to fetch changes up to 508ce5eb00070809f0d19917a1b2960dfcf5a64b:
>
>   vfio: Allow hotplug of containers onto existing guest IOMMU mappings (2015-10-05 12:39:47 -0600)
>
> ----------------------------------------------------------------
> VFIO updates 2015-10-07
>
>  - Change platform device IRQ setup sequence for compatibility
>    with upcoming IRQ forwarding (Eric Auger)
>  - Extensions to support vfio-pci devices on spapr-pci-host-bridge
>    (David Gibson) [clang problem patch dropped]
>
> ----------------------------------------------------------------

Applied, thanks.

-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-10-08 17:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-07 15:42 [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Alex Williamson
2015-10-07 15:42 ` [Qemu-devel] [PULL 1/9] hw/vfio/platform: irqfd setup sequence update Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 2/9] hw/vfio/platform: change interrupt/unmask fields into pointer Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 3/9] hw/vfio/platform: do not set resamplefd for edge-sensitive IRQS Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 4/9] vfio: Remove unneeded union from VFIOContainer Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 5/9] vfio: Generalize vfio_listener_region_add failure path Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 6/9] vfio: Check guest IOVA ranges against host IOMMU capabilities Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 7/9] vfio: Record host IOMMU's available IO page sizes Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 8/9] memory: Allow replay of IOMMU mapping notifications Alex Williamson
2015-10-07 15:43 ` [Qemu-devel] [PULL 9/9] vfio: Allow hotplug of containers onto existing guest IOMMU mappings Alex Williamson
2015-10-08 17:00 ` [Qemu-devel] [PULL 0/9] VFIO updates for 2015-10-07 Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).