qemu-devel.nongnu.org archive mirror
From: Alex Williamson <alex.williamson@redhat.com>
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Cc: peterx@redhat.com, Cornelia Huck <cohuck@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	kvm@vger.kernel.org, david@redhat.com
Subject: [Qemu-devel] [PATCH v3 3/4] vfio: Inhibit ballooning based on group attachment to a container
Date: Tue,  7 Aug 2018 13:31:24 -0600
Message-ID: <20180807193125.30378-4-alex.williamson@redhat.com>
In-Reply-To: <20180807193125.30378-1-alex.williamson@redhat.com>

We use a VFIOContainer to associate an AddressSpace with one or more
VFIOGroups.  The VFIOContainer represents the DMA context for that
AddressSpace for those VFIOGroups and is kept synchronized with changes
in that AddressSpace via a MemoryListener.  For IOMMU backed devices,
maintaining the DMA context for a VFIOGroup generally involves
pinning a host virtual address in order to create a stable host
physical address and then mapping a translation from the associated
guest physical address to that host physical address into the IOMMU.

While the above keeps the VFIOContainer synchronized with the QEMU
memory API of the VM, memory ballooning occurs outside of that API.
Inflating the memory balloon (i.e., cooperatively capturing pages from
the guest for use by the host) simply uses MADV_DONTNEED to "zap"
pages from QEMU's host virtual address space.  The page pinning and
IOMMU mapping above remain in place, negating the host's ability to
reuse the page, but the host virtual to host physical mapping of the
page is invalidated outside of QEMU's memory API.
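A minimal sketch of the "zap" described above, assuming Linux and a private anonymous mapping (QEMU guest RAM can use other backings). After MADV_DONTNEED the old contents are gone and the next access sees zero-filled memory. A pin held elsewhere, for example by the IOMMU driver, keeps the old host physical page alive, but that page no longer backs this virtual address. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS / MADV_DONTNEED */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: dirty one anonymous page, zap it with MADV_DONTNEED
 * (what balloon inflation does), then check what the address now reads.
 * Returns true if the page reads back as all zeroes after the zap, false
 * on that mismatch or if mmap/madvise fails.
 */
static bool zap_discards_contents(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return false;
    }

    memset(p, 0x5a, page);              /* fault in and dirty the page */

    /* "Inflate": drop the page backing this host virtual address. */
    if (madvise(p, page, MADV_DONTNEED) != 0) {
        munmap(p, page);
        return false;
    }

    /* The old contents are gone; the address now reads as zeroes. */
    bool zeroed = true;
    for (size_t i = 0; i < page; i++) {
        if (p[i] != 0) {
            zeroed = false;
            break;
        }
    }
    munmap(p, page);
    return zeroed;
}
```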

When the balloon is later deflated to cooperatively return pages to
the guest, the guest balloon driver simply frees the page, allowing it
to be reused in the guest, and the next access incurs a page fault.
The page fault maps a new host physical page behind the existing host
virtual address, while the VFIOContainer still maintains the
translation to the original host physical address.  At this point the
guest vCPU and any assigned devices map the same guest physical
address to different host physical addresses.  Badness.
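The sequence above can be illustrated with a toy model of a single guest page. This is a hypothetical simulation, not QEMU or kernel code: `cpu_hpa` stands for the host physical page the vCPU reaches through the MMU, and `iommu_hpa` for the page the IOMMU translation recorded when VFIO pinned and mapped it. The page addresses are made-up values.

```c
#include <assert.h>
#include <stdint.h>

/* One guest physical page as seen by the vCPU and by the IOMMU. */
struct toy_page {
    uint64_t cpu_hpa;     /* current backing page of the host virtual address */
    uint64_t iommu_hpa;   /* backing page captured at pin/map time */
};

static uint64_t next_free_hpa = 0x1000;   /* fake host physical allocator */

/* Page fault: back the host virtual address with a fresh physical page. */
static void toy_fault(struct toy_page *pg)
{
    pg->cpu_hpa = next_free_hpa;
    next_free_hpa += 0x1000;
}

/* VFIO pin + IOMMU map: the IOMMU records the current backing page. */
static void toy_pin_and_map(struct toy_page *pg)
{
    pg->iommu_hpa = pg->cpu_hpa;
}

/* Balloon inflate: MADV_DONTNEED zaps only the CPU-side mapping. */
static void toy_inflate(struct toy_page *pg)
{
    pg->cpu_hpa = 0;      /* no backing page until the next fault */
}

/* Balloon deflate, then the guest touches the page again. */
static void toy_deflate_and_touch(struct toy_page *pg)
{
    if (pg->cpu_hpa == 0) {
        toy_fault(pg);
    }
}

/* Nonzero when the vCPU and the assigned device disagree. */
static int toy_diverged(const struct toy_page *pg)
{
    return pg->cpu_hpa != pg->iommu_hpa;
}

/* Run one inflate/deflate cycle on a pinned page; 1 means divergence. */
static int toy_run_balloon_cycle(void)
{
    struct toy_page pg = { 0, 0 };
    toy_fault(&pg);              /* guest first touches the page */
    toy_pin_and_map(&pg);        /* VFIO pins it and maps GPA -> HPA */
    assert(!toy_diverged(&pg));
    toy_inflate(&pg);            /* balloon inflation zaps the HVA */
    toy_deflate_and_touch(&pg);  /* reuse faults in a new host page */
    return toy_diverged(&pg);
}
```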

The IOMMU typically cannot track this mapping at page granularity
without incurring the inefficiency of using page size mappings
throughout.  Similarly, MMU notifiers in the host kernel provide an
indication for invalidating the mapping on balloon inflation, but not
for updating the mapping when the balloon is deflated.  For these
reasons we assume by default that the mapping of each VFIOGroup into
the VFIOContainer is incompatible with memory ballooning, and we
increment the balloon inhibitor once for each attached VFIOGroup.
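Incrementing the inhibitor per group only works if the inhibitor counts its users. A minimal sketch, assuming a plain atomic counter; the names are hypothetical and this is not presented as QEMU's implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of active inhibit users, e.g. one per attached VFIOGroup. */
static atomic_int balloon_inhibit_count;

/* Take (true) or drop (false) one inhibit reference. */
static void balloon_inhibit(bool state)
{
    if (state) {
        atomic_fetch_add(&balloon_inhibit_count, 1);
    } else {
        int prev = atomic_fetch_sub(&balloon_inhibit_count, 1);
        assert(prev > 0);        /* catch unbalanced releases */
        (void)prev;
    }
}

/* Ballooning stays inhibited while any user holds a reference. */
static bool balloon_is_inhibited(void)
{
    return atomic_load(&balloon_inhibit_count) > 0;
}
```

With a counter, detaching one group cannot re-enable ballooning while another group (or another user) still needs it inhibited.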

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/common.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fb396cf00ac4..a3d758af8d8f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -32,6 +32,7 @@
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
+#include "sysemu/balloon.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "qapi/error.h"
@@ -1044,6 +1045,33 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     space = vfio_get_address_space(as);
 
+    /*
+     * VFIO is currently incompatible with memory ballooning insofar as the
+     * madvise to purge (zap) the page from QEMU's address space does not
+     * interact with the memory API and therefore leaves stale virtual to
+     * physical mappings in the IOMMU if the page was previously pinned.  We
+     * therefore add a balloon inhibit for each group added to a container,
+     * whether the container is used individually or shared.  This provides
+     * us with options to allow devices within a group to opt-in and allow
+     * ballooning, so long as it is done consistently for a group (for instance
+     * if the device is an mdev device where it is known that the host vendor
+     * driver will never pin pages outside of the working set of the guest
+     * driver, which would thus not be ballooning candidates).
+     *
+     * The first opportunity to induce pinning occurs here where we attempt to
+     * attach the group to existing containers within the AddressSpace.  If any
+     * pages are already zapped from the virtual address space, such as from a
+     * previous ballooning opt-in, new pinning will cause valid mappings to be
+     * re-established.  Likewise, when the overall MemoryListener for a new
+     * container is registered, a replay of mappings within the AddressSpace
+     * will occur, re-establishing any previously zapped pages as well.
+     *
+     * NB. Balloon inhibiting does not currently block operation of the
+     * balloon driver or revoke previously pinned pages, it only prevents
+     * calling madvise to modify the virtual mapping of ballooned pages.
+     */
+    qemu_balloon_inhibit(true);
+
     QLIST_FOREACH(container, &space->containers, next) {
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             group->container = container;
@@ -1232,6 +1260,7 @@ close_fd_exit:
     close(fd);
 
 put_space_exit:
+    qemu_balloon_inhibit(false);
     vfio_put_address_space(space);
 
     return ret;
@@ -1352,6 +1381,7 @@ void vfio_put_group(VFIOGroup *group)
         return;
     }
 
+    qemu_balloon_inhibit(false);
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
-- 
2.18.0
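The reference pairing in the diff can be modeled as follows. This is a hypothetical simulation, not QEMU code: `model_connect_group()` takes a reference before trying to attach, as vfio_connect_container() does, and drops it on the put_space_exit error path; `model_put_group()` drops it as vfio_put_group() does. The balloon side only skips the madvise while inhibited, matching the NB in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>

static int inhibit_count;   /* stands in for the shared balloon inhibitor */
static int madvise_calls;   /* zaps the balloon would have issued */

static void model_inhibit(bool state)
{
    inhibit_count += state ? 1 : -1;
    assert(inhibit_count >= 0);
}

/* Like vfio_connect_container(): inhibit first, undo on failure. */
static int model_connect_group(bool attach_fails)
{
    model_inhibit(true);
    if (attach_fails) {
        model_inhibit(false);    /* put_space_exit error path */
        return -1;
    }
    return 0;
}

/* Like vfio_put_group(): drop the group's reference. */
static void model_put_group(void)
{
    model_inhibit(false);
}

/* Inhibition only skips the madvise; it does not stop the guest driver. */
static void model_balloon_page(void)
{
    if (inhibit_count == 0) {
        madvise_calls++;         /* would call madvise(MADV_DONTNEED) */
    }
}
```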

