From: Thomas Huth
To: "Michael S. Tsirkin"
Cc: drjones@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com, jitendra.kolhe@hpe.com, wehuang@redhat.com, amit.shah@redhat.com, dgibson@redhat.com
Date: Wed, 13 Apr 2016 20:11:36 +0200
Subject: Re: [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes
Message-ID: <570E8BD8.9000008@redhat.com>
In-Reply-To: <20160413205524-mutt-send-email-mst@redhat.com>

On 13.04.2016 19:55, Michael S. Tsirkin wrote:
> On Wed, Apr 13, 2016 at 07:38:12PM +0200, Thomas Huth wrote:
>> On 13.04.2016 19:07, Michael S. Tsirkin wrote:
>>> On Wed, Apr 13, 2016 at 04:51:49PM +0200, Thomas Huth wrote:
>>>> On 13.04.2016 15:15, Michael S. Tsirkin wrote:
>>>>> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote:
>> ...
>>>>>> Then, there's yet another problem: if the host page size is bigger
>>>>>> than the 4k balloon page size, we cannot simply call madvise() on
>>>>>> each of the 4k balloon addresses that we get from the guest, since
>>>>>> madvise() always evicts the whole host page, not only a 4k area!
>>>>>>
>>>>>> So in this case, we have to track the 4k fragments of a host page
>>>>>> and only call madvise(DONTNEED) once all fragments have been collected.
>>>>>> This, of course, only works if the guest sends consecutive 4k
>>>>>> fragments, which is the case in the most important scenarios that
>>>>>> I try to address here (like a ppc64 guest with 64k page size that
>>>>>> is running on a ppc64 host with 64k page size). If the guest
>>>>>> uses a page size that is smaller than the host page size, we might
>>>>>> need to add more logic here later to increase the probability
>>>>>> of being able to release memory, but at least the guest should
>>>>>> no longer crash due to unintentionally evicted pages.
>>>> ...
>>>>>>  static void virtio_balloon_instance_init(Object *obj)
>>>>>> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
>>>>>> index 35f62ac..04b7c0c 100644
>>>>>> --- a/include/hw/virtio/virtio-balloon.h
>>>>>> +++ b/include/hw/virtio/virtio-balloon.h
>>>>>> @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon {
>>>>>>      int64_t stats_last_update;
>>>>>>      int64_t stats_poll_interval;
>>>>>>      uint32_t host_features;
>>>>>> +    void *current_addr;
>>>>>> +    unsigned long *fragment_bits;
>>>>>> +    int fragment_bits_size;
>>>>>>  } VirtIOBalloon;
>>>>>>
>>>>>>  #endif
>>>>>
>>>>> It looks like fragment_bits would have to be migrated,
>>>>> which adds a lot of complexity.
>> ...
>>>>> How about we just skip madvise if host page size is > balloon
>>>>> page size, for 2.6?
>>>>
>>>> That would mean a regression compared to what we have today.
>>>> Currently, ballooning works OK for 64k guests on a 64k ppc host,
>>>> more by chance than by design, but it works. The guest always sends
>>>> all the 4k fragments of a 64k page, and QEMU tries to call madvise()
>>>> for every one of them. The kernel rejects madvise() on addresses that
>>>> are not 64k-aligned, so only the call for the first fragment of each
>>>> page takes effect; since the kernel rounds the length up to a full
>>>> page, that call frees the whole 64k page, which the guest has also
>>>> declared as free.
>>>>
>>>> I think we should either take this patch as it is right now (without
>>>> adding extra code for migration) and later update it to use the bitmap
>>>> code by Jitendra Kolhe, or omit it completely (leaving 4k guests on
>>>> 64k hosts broken) and fix it properly after the bitmap code has been
>>>> applied. But completely disabling the balloon code for 64k guests on
>>>> 64k hosts does not sound very appealing to me. What do you think?
>>>
>>> True. As a simple hack, how about disabling madvise when host page
>>> size > target page size?
>>
>> That could work, but is there a generic way in QEMU to get the current
>> page size of a guest (since this might differ from TARGET_PAGE_SIZE)?
>> Or would that mean polluting the virtio-balloon code with ugly #ifdefs?
>
> Let's just use TARGET_PAGE_SIZE; that's the best I can think of.

That won't work, at least not on ppc: TARGET_PAGE_SIZE is always defined
as 4096 there, and the Linux guest kernel then switches the real page size
to 65536 at runtime. So we'd need a way to detect this automatically...

 Thomas