From: David Hildenbrand <david@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>,
dhildenb@redhat.com, imammedo@redhat.com, ehabkost@redhat.com
Cc: mst@redhat.com, pbonzini@redhat.com, qemu-devel@nongnu.org,
qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
Date: Wed, 5 Dec 2018 08:59:06 +0100 [thread overview]
Message-ID: <587369f0-d3aa-3494-33a9-f634dae65213@redhat.com> (raw)
In-Reply-To: <20181205050641.864-5-david@gibson.dropbear.id.au>
On 05.12.18 06:06, David Gibson wrote:
> Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
> discard RAM pages inserted into the balloon. This is basically a Linux
> only interface (MADV_DONTNEED exists on some other platforms, but doesn't
> always have the same semantics). It also doesn't work on hugepages and has
> some other limitations.
>
> It turns out that postcopy also needs to discard chunks of memory, and uses
> a better interface for it: ram_block_discard_range(). It doesn't cover
> every case, but it covers more than going direct to madvise() and this
> gives us a single place to update for more possibilities in future.
>
> There are some subtleties here to maintain the current balloon behaviour:
>
> * For now, we just ignore requests to balloon in a hugepage backed region.
> That matches current behaviour, because MADV_DONTNEED on a hugepage would
> simply fail, and we ignore the error.
>
> * If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
> non-host-page-aligned addresses. These would also fail in madvise(),
> which we then ignored. ram_block_discard_range() error_report()s calls
> on unaligned addresses, so we explicitly check that case to avoid
> spamming the logs.
>
> * We now call ram_block_discard_range() with the *host* page size, whereas
> we previously called madvise() with BALLOON_PAGE_SIZE. Surprisingly,
> this also matches existing behaviour. Although the kernel fails madvise
> on unaligned addresses, it will round unaligned sizes *up* to the host
> page size. Yes, this means that if BALLOON_PAGE_SIZE < guest page size
> we can incorrectly discard more memory than the guest asked us to. I'm
> planning to address that soon.
>
> Errors other than the ones discussed above, will now be reported by
> ram_block_discard_range(), rather than silently ignored, which means we
> have a much better chance of seeing when something is going wrong.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index c3a19aa27d..4435905c87 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
> MemoryRegion *mr, hwaddr offset)
> {
> void *addr = memory_region_get_ram_ptr(mr) + offset;
> + RAMBlock *rb;
> + size_t rb_page_size;
> + ram_addr_t ram_offset;
>
> - qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
> + /* XXX is there a better way to get to the RAMBlock than via a
> + * host address? */
We have qemu_get_ram_block(). That one should work as long as we know
that it is a valid guest ram address. (would have to make it !static)
Then we would only have to pass to this function the original ram_addr_t
handed over by the guest (which looks somewhat cleaner to me than going
via memory regions)
> + rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> + rb_page_size = qemu_ram_pagesize(rb);
> +
> + /* Silently ignore hugepage RAM blocks */
> + if (rb_page_size != getpagesize()) {
> + return;
> + }
> +
> + /* Silently ignore unaligned requests */
> + if (ram_offset & (rb_page_size - 1)) {
> + return;
> + }
> +
> + ram_block_discard_range(rb, ram_offset, rb_page_size);
> + /* We ignore errors from ram_block_discard_range(), because it has
> + * already reported them, and failing to discard a balloon page is
> + * not fatal */
> }
>
> static const char *balloon_stat_names[] = {
>
--
Thanks,
David / dhildenb
next prev parent reply other threads:[~2018-12-05 7:59 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-05 5:06 [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
2018-12-05 5:06 ` [Qemu-devel] [RFCv2 for-4.0 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
2018-12-05 5:06 ` [Qemu-devel] [RFCv2 for-4.0 2/5] virtio-balloon: Corrections to address verification David Gibson
2018-12-05 5:06 ` [Qemu-devel] [RFCv2 for-4.0 3/5] virtio-balloon: Rework ballon_page() interface David Gibson
2018-12-05 5:06 ` [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
2018-12-05 7:59 ` David Hildenbrand [this message]
2018-12-05 22:56 ` David Gibson
2018-12-05 5:06 ` [Qemu-devel] [RFCv2 for-4.0 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=587369f0-d3aa-3494-33a9-f634dae65213@redhat.com \
--to=david@redhat.com \
--cc=david@gibson.dropbear.id.au \
--cc=dhildenb@redhat.com \
--cc=ehabkost@redhat.com \
--cc=imammedo@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).