From: Peter Xu <peterx@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, cohuck@redhat.com,
mst@redhat.com, david@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2 3/4] vfio: Inhibit ballooning based on group attachment to a container
Date: Tue, 7 Aug 2018 21:10:21 +0800
Message-ID: <20180807131021.GF7265@xz-mi>
In-Reply-To: <153299246170.14411.6197545037372422542.stgit@gimli.home>
On Mon, Jul 30, 2018 at 05:14:21PM -0600, Alex Williamson wrote:
> We use a VFIOContainer to associate an AddressSpace to one or more
> VFIOGroups. The VFIOContainer represents the DMA context for that
> AddressSpace for those VFIOGroups and is synchronized to changes in
> that AddressSpace via a MemoryListener. For IOMMU backed devices,
> maintaining the DMA context for a VFIOGroup generally involves
> pinning a host virtual address in order to create a stable host
> physical address and then mapping a translation from the associated
> guest physical address to that host physical address into the IOMMU.
>
> While the above maintains the VFIOContainer synchronized to the QEMU
> memory API of the VM, memory ballooning occurs outside of that API.
> Inflating the memory balloon (ie. cooperatively capturing pages from
> the guest for use by the host) simply uses MADV_DONTNEED to "zap"
> pages from QEMU's host virtual address space. The page pinning and
> IOMMU mapping above remains in place, negating the host's ability to
> reuse the page, but the host virtual to host physical mapping of the
> page is invalidated outside of QEMU's memory API.
>
> When the balloon is later deflated, attempting to cooperatively
> return pages to the guest, the page is simply freed by the guest
> balloon driver, allowing it to be used in the guest and incurring a
> page fault when that occurs. The page fault maps a new host physical
> page backing the existing host virtual address, meanwhile the
> VFIOContainer still maintains the translation to the original host
> physical address. At this point the guest vCPU and any assigned
> devices will map different host physical addresses to the same guest
> physical address. Badness.
>
> The IOMMU typically does not have page level granularity with which
> it can track this mapping without also incurring inefficiencies in
> using page size mappings throughout. MMU notifiers in the host
> kernel also provide indicators for invalidating the mapping on
> balloon inflation, not for updating the mapping when the balloon is
> deflated. For these reasons we assume a default behavior that the
> mapping of each VFIOGroup into the VFIOContainer is incompatible
> with memory ballooning and increment the balloon inhibitor to match
> the attached VFIOGroups.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> hw/vfio/common.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index fb396cf00ac4..4881b691a659 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -32,6 +32,7 @@
> #include "hw/hw.h"
> #include "qemu/error-report.h"
> #include "qemu/range.h"
> +#include "sysemu/balloon.h"
> #include "sysemu/kvm.h"
> #include "trace.h"
> #include "qapi/error.h"
> @@ -1049,6 +1050,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> group->container = container;
> QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> vfio_kvm_device_add_group(group);
> + qemu_balloon_inhibit(true);
[1]
> return 0;
> }
> }
> @@ -1198,6 +1200,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> }
>
> vfio_kvm_device_add_group(group);
> + qemu_balloon_inhibit(true);

AFAIU there is one critical detail here: this qemu_balloon_inhibit()
call must come before the call to:

    memory_listener_register(&container->listener, container->space->as);

Registering the memory listener is the point at which we pin the
pages, so to make sure we never end up with stale pages we must call
qemu_balloon_inhibit() before memory_listener_register() (which is
what this patch does).  However, this ordering is not obvious, so it
might be worth a comment.
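
To make the ordering concern concrete, here is a minimal standalone
sketch, not QEMU code. Every model_* name is hypothetical, and the
assumption that pages get pinned at listener registration time is
simply the claim above, modelled with a flag. It also shows the kind
of comment I have in mind:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not the QEMU functions. */
int model_inhibit_count;               /* models a nested inhibit counter */
bool model_pinned_while_uninhibited;   /* set if pinning raced with the balloon */

/* Models qemu_balloon_inhibit(): true takes a reference, false drops one. */
void model_balloon_inhibit(bool state)
{
    model_inhibit_count += state ? 1 : -1;
    assert(model_inhibit_count >= 0);
}

bool model_balloon_is_inhibited(void)
{
    return model_inhibit_count > 0;
}

/*
 * Models memory_listener_register(): assumed to be the point where
 * pages are pinned.  If the balloon is not inhibited yet, a page could
 * be zapped and re-faulted afterwards, leaving a stale pinning.
 */
void model_listener_register(void)
{
    if (!model_balloon_is_inhibited()) {
        model_pinned_while_uninhibited = true;
    }
}

void model_connect_container(void)
{
    /*
     * Inhibit ballooning *before* registering the listener: the
     * registration is what pins pages, so the balloon must already be
     * unable to zap them by then.
     */
    model_balloon_inhibit(true);
    model_listener_register();
}
```

Swapping the two calls in model_connect_container() would set
model_pinned_while_uninhibited, which is the situation the comment is
meant to guard against.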

Considering this, I wonder whether we could do this per-container
instead of per-group; then we would not need to bother with the extra
group-add paths like [1].
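
Roughly, the per-container idea could look like the sketch below. This
is only a standalone model under my own assumptions (the ModelContainer
type and all model_* functions are made up, and it ignores whatever
else the real group paths need to do), not a proposed patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in QEMU. */
int model_inhibit_count;   /* models a nested inhibit counter */

void model_balloon_inhibit(bool state)
{
    model_inhibit_count += state ? 1 : -1;
    assert(model_inhibit_count >= 0);
}

typedef struct ModelContainer {
    int ngroups;   /* number of groups currently attached */
} ModelContainer;

/* The inhibit is taken once, when the container gets its first group. */
void model_container_add_group(ModelContainer *c)
{
    if (c->ngroups++ == 0) {
        model_balloon_inhibit(true);
    }
}

/* The inhibit is dropped once, when the last group goes away. */
void model_container_del_group(ModelContainer *c)
{
    assert(c->ngroups > 0);
    if (--c->ngroups == 0) {
        model_balloon_inhibit(false);
    }
}
```

With this shape, attaching a second group to an existing container
(the path at [1]) would not need to touch the inhibit counter at all.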

Either way, this patch looks good to me (and it is correct AFAIK), so
I'm leaving my r-b and will let Alex decide:

Reviewed-by: Peter Xu <peterx@redhat.com>
>
> QLIST_INIT(&container->group_list);
> QLIST_INSERT_HEAD(&space->containers, container, next);
> @@ -1222,6 +1225,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> listener_release_exit:
> QLIST_REMOVE(group, container_next);
> QLIST_REMOVE(container, next);
> + qemu_balloon_inhibit(false);
> vfio_kvm_device_del_group(group);
> vfio_listener_release(container);
>
> @@ -1352,6 +1356,7 @@ void vfio_put_group(VFIOGroup *group)
> return;
> }
>
> + qemu_balloon_inhibit(false);
> vfio_kvm_device_del_group(group);
> vfio_disconnect_container(group);
> QLIST_REMOVE(group, next);
>
Regards,
--
Peter Xu