qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Thomas Huth <thuth@redhat.com>
Cc: drjones@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com,
	jitendra.kolhe@hpe.com, wehuang@redhat.com, amit.shah@redhat.com,
	dgibson@redhat.com
Subject: Re: [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes
Date: Wed, 13 Apr 2016 20:55:39 +0300	[thread overview]
Message-ID: <20160413205524-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <570E8404.10503@redhat.com>

On Wed, Apr 13, 2016 at 07:38:12PM +0200, Thomas Huth wrote:
> On 13.04.2016 19:07, Michael S. Tsirkin wrote:
> > On Wed, Apr 13, 2016 at 04:51:49PM +0200, Thomas Huth wrote:
> >> On 13.04.2016 15:15, Michael S. Tsirkin wrote:
> >>> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote:
> ...
> >>>> Then, there's yet another problem: If the host page size is bigger
> >>>> than the 4k balloon page size, we cannot simply call madvise() on
> >>>> each of the 4k balloon addresses that we get from the guest - since
> >>>> the madvise() always evicts the whole host page, not only a 4k area!
> >>>>
> >>>> So in this case, we've got to track the 4k fragments of a host page
> >>>> and only call madvise(DONTNEED) when all fragments have been collected.
> >>>> This of course only works fine if the guest sends consecutive 4k
> >>>> fragments - which is the case in the most important scenarios that
> >>>> I try to address here (like a ppc64 guest with 64k page size that
> >>>> is running on a ppc64 host with 64k page size). In case the guest
> >>>> uses a page size that is smaller than the host page size, we might
> >>>> need to add some additional logic here later to increase the
> >>>> probability of being able to release memory, but at least the guest
> >>>> should now not crash anymore due to unintentionally evicted pages.
> >> ...
> >>>>  static void virtio_balloon_instance_init(Object *obj)
> >>>> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> >>>> index 35f62ac..04b7c0c 100644
> >>>> --- a/include/hw/virtio/virtio-balloon.h
> >>>> +++ b/include/hw/virtio/virtio-balloon.h
> >>>> @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon {
> >>>>      int64_t stats_last_update;
> >>>>      int64_t stats_poll_interval;
> >>>>      uint32_t host_features;
> >>>> +    void *current_addr;
> >>>> +    unsigned long *fragment_bits;
> >>>> +    int fragment_bits_size;
> >>>>  } VirtIOBalloon;
> >>>>  
> >>>>  #endif
> >>>
> >>> It looks like fragment_bits would have to be migrated.
> >>> Which is a lot of complexity.
> ...
> >>> How about we just skip madvise if host page size is > balloon
> >>> page size, for 2.6?
> >>
> >> That would mean a regression compared to what we have today. Currently,
> >> the ballooning is working OK for 64k guests on a 64k ppc host - rather
> >> by chance than on purpose, but it's working. The guest is always sending
> >> all the 4k fragments of a 64k page, and QEMU is trying to call madvise()
> >> for every one of them, but the kernel is ignoring madvise() on
> >> non-64k-aligned addresses, so we end up with a situation where the
> >> madvise() frees a whole 64k page which is also declared as free by the
> >> guest.
> >>
> >> I think we should either take this patch as it is right now (without
> >> adding extra code for migration) and later update it to the bitmap code
> >> by Jitendra Kolhe, or omit it completely (leaving 4k guests broken) and
> >> fix it properly after the bitmap code has been applied. But disabling
> >> the balloon code for 64k guests on 64k hosts completely does not sound
> >> very appealing to me. What do you think?
> >>
> >>  Thomas
> > 
> > True. As a simple hack - how about disabling madvise when host page
> > size > target page size?
> 
> That could work - but is there a generic way in QEMU to get the current
> page size from a guest (since this might differ from TARGET_PAGE_SIZE)?
> Or would that mean to pollute the virtio-balloon code with ugly #ifdefs?
> 
>  Thomas

let's just use TARGET_PAGE_SIZE; that's the best I can think of.

Thread overview: 18+ messages
2016-04-13 11:52 [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes Thomas Huth
2016-04-13 12:37 ` Andrew Jones
2016-04-13 13:15 ` Michael S. Tsirkin
2016-04-13 14:51   ` Thomas Huth
2016-04-13 17:07     ` Michael S. Tsirkin
2016-04-13 17:38       ` Thomas Huth
2016-04-13 17:55         ` Michael S. Tsirkin [this message]
2016-04-13 18:11           ` Thomas Huth
2016-04-13 18:14             ` Michael S. Tsirkin
2016-04-14  3:45               ` David Gibson
2016-04-13 18:21             ` Andrew Jones
2016-04-14 11:47     ` Dr. David Alan Gilbert
2016-04-14 12:19       ` Thomas Huth
2016-04-14 18:34         ` Dr. David Alan Gilbert
2016-04-15  4:26           ` David Gibson
2016-05-23  6:25     ` Jitendra Kolhe
2016-04-14  3:39   ` David Gibson
2016-04-14  3:37 ` David Gibson
