qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>,
	Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH 0/3] Hyper-V Dynamic Memory Protocol driver (hv-balloon)
Date: Mon, 21 Sep 2020 11:10:27 +0200	[thread overview]
Message-ID: <c51f9ebd-5303-9919-1469-c99ff6a461b1@redhat.com> (raw)
In-Reply-To: <cover.1600556526.git.maciej.szmigiero@oracle.com>

On 20.09.20 15:25, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> This series adds a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
> and its protocol definitions.
> Also included is a driver providing backing devices for memory hot-add
> protocols ("haprots").
> 
> A haprot device works like a virtual DIMM stick: it allows inserting
> extra RAM into the guest at run time.
> 
> The main differences from the ACPI-based PC DIMM hotplug are:
> * Notifying the guest about the new memory range is not done via ACPI but
> via a protocol handler that registers with the haprot framework.
> This means that the ACPI DIMM slot limit does not apply.
> 
> * A protocol handler can prevent removal of a haprot device when it is
> still in use by setting its "busy" field.
> 
> * A protocol handler can also register an "unplug" callback so it gets
> notified when an user decides to remove the haprot device.
> This way the protocol handler can inform the guest about this fact and / or
> do its own cleanup.
> 
> The hv-balloon driver is like virtio-balloon on steroids: it allows both
> changing the guest memory allocation via ballooning and inserting extra
> RAM into it by adding haprot virtual DIMM sticks.
> One of advantages of these over ACPI-based PC DIMM hotplug is that such
> memory can be hotplugged in much smaller granularity because the ACPI DIMM
> slot limit does not apply.

Reading further below, it's essentially DIMM-based memory hotplug +
virtio-balloon - except the 256MB DIMM limit. But reading below, I don't
see how you want to avoid the KVM memory slot limit that's in a similar
size (I recall 256*2 due to 2 address spaces). Or avoid VMA limits when
wanting to grow a VM big in very tiny steps over time (e.g., adding 64MB
at a time).

> 
> In contrast with ACPI DIMM hotplug where one can only request to unplug a
> whole DIMM stick this driver allows removing memory from guest in single
> page (4k) units via ballooning.
> Then, once the guest has released the whole memory backed by a haprot
> virtual DIMM stick such device is marked "unused" and can be removed from
> the VM, if one wants so.
> A "HV_BALLOON_HAPROT_UNUSED" QMP event is emitted in this case so the
> software controlling QEMU knows that this operation is now possible.
> 
> The haprot devices are also marked unused after a VM reboot (with a
> corresponding "HV_BALLOON_HAPROT_UNUSED" QMP event).
> They are automatically reinserted (if still present) after the guest
> reconnects to this protocol (a "HV_BALLOON_HAPROT_INUSE" QMP event is then
> emitted).
> 
> For performance reasons, the guest-released memory is tracked in few range
> trees, as a series of (start, count) ranges.
> Each time a new page range is inserted into such tree its neighbors are
> checked as candidates for possible merging with it.
> 
> Besides performance reasons, the Dynamic Memory protocol itself uses page
> ranges as the data structure in its messages, so relevant pages need to be
> merged into such ranges anyway.
> 
> One has to be careful when tracking the guest-released pages, since the
> guest can maliciously report returning pages outside its current address
> space, which later clash with the address range of newly added memory.
> Similarly, the guest can report freeing the same page twice.
> 
> The above design results in much better ballooning performance than when
> using virtio-balloon with the same guest: 230 GB / minute with this driver
> versus 70 GB / minute with virtio-balloon.

I assume these numbers apply with Windows guests only. IIRC Linux
hv_balloon does not support page migration/compaction, while
virtio-balloon does. So you might end up with quite some fragmented
memory with hv_balloon in Linux guests - of course, usually only in
corner cases.

> 
> During a ballooning operation most of time is spent waiting for the guest
> to come up with newly freed page ranges, processing the received ranges on
> the host side (in QEMU / KVM) is nearly instantaneous.
> 
> The unballoon operation is also pretty much instantaneous:
> thanks to the merging of the ballooned out page ranges 200 GB of memory can
> be returned to the guest in about 1 second.
> With virtio-balloon this operation takes about 2.5 minutes.
> 
> These tests were done against a Windows Server 2019 guest running on a
> Xeon E5-2699, after dirtying the whole memory inside guest before each
> balloon operation.
> 
> Using a range tree instead of a bitmap to track the removed memory also
> means that the solution scales well with the guest size: even a 1 TB range
> takes just few bytes of memory.
> Example usage:
> * Add "-device vmbus-bridge,id=vmbus-bridge -device hv-balloon,id=hvb"
>   to the QEMU command line and set "maxmem" value to something large,
>   like 1T.
> 
> * Use QEMU monitor commands to add a haprot virtual DIMM stick, together
>   with its memory backend:
>   object_add memory-backend-ram,id=mem1,size=200G
>   device_add mem-haprot,id=ha1,memdev=mem1
>   The first command is actually the same as for ACPI-based DIMM hotplug.
> 
> * Use the ballooning interface monitor commands to force the guest to give
>   out as much memory as possible:
>   balloon 1

At least under virtio-balloon with Linux, that will pretty sure trigger
a guest crash. Is something like that expected to work with Windows
guests reasonably well?


>   The ballooning interface monitor commands can also be used to resize
>   the guest up and down appropriately.
> 
> * One can check the current guest size by issuing a "info balloon" command.
>   This is useful to know what is happening, since large ballooning or
>   unballooning operations take some time to complete.

So, every time you want to add more memory (after the balloon was
deflated) to a guest, you have to plug a new mem-haprot device, correct?

So your QEMU user has to be well aware of how to balance "balloon" and
"object_add/device_add/object_del_device_del" commands to achieve the
desired guest size.

> 
> * Once the guest releases the whole memory backed by a haprot device
>   (or is restarted) a "HV_BALLOON_HAPROT_UNUSED" QMP event will be
>   generated.
>   The haprot device then can be removed, together with its memory backend:
>   device_del ha1
>   object_del mem1

So, you rely on some external entity to properly shrink a guest again
(e.g., during reboot).

> 
> Future directions:
> * Allow sharing the ballooning QEMU interface between hv-balloon and
>   virtio-balloon drivers.
>   Currently, only one of them can be added to the VM at the same time.

Yeah, that makes sense. Only one at a time.

> 
> * Allow new haport devices to reuse the same address range as the ones
>   that were previously deleted via device_del monitor command without
>   having to restart the VM.
> 
> * Add vmstate / live migration support to the hv-balloon driver.
> 
> * Use haprot device to also add memory via virtio interface (this requires
>   defining a new operation in virtio-balloon protocol and appropriate
>   support from the client virtio-balloon driver in the Linux kernel).

Most probably not the direction we are going to take. We have virtio-mem
for clean, fine-grained, NUMA-aware, paravirtualized memory hot(un)plug
now, and we are well aware of various issues with (base-page size based)
memory ballooning that are fairly impossible to solve (especially in the
context of vfio).

-- 
Thanks,

David / dhildenb



  parent reply	other threads:[~2020-09-21  9:12 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-20 13:25 [PATCH 0/3] Hyper-V Dynamic Memory Protocol driver (hv-balloon) Maciej S. Szmigiero
2020-09-20 13:25 ` [PATCH 1/3] haprot: add a device for memory hot-add protocols Maciej S. Szmigiero
2020-09-20 13:25 ` [PATCH 2/3] Add Hyper-V Dynamic Memory Protocol definitions Maciej S. Szmigiero
2020-09-20 13:25 ` [PATCH 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon) Maciej S. Szmigiero
2020-09-20 14:16 ` [PATCH 0/3] " no-reply
2020-09-21  9:00 ` Igor Mammedov
2020-09-21  9:29   ` David Hildenbrand
2020-09-21  9:10 ` David Hildenbrand [this message]
2020-09-21 22:22   ` Maciej S. Szmigiero
2020-09-22  7:26     ` David Hildenbrand
2020-09-22 23:19       ` Maciej S. Szmigiero
2020-09-23 12:48         ` David Hildenbrand
2020-09-24 22:37           ` Maciej S. Szmigiero
2020-09-25  6:49             ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c51f9ebd-5303-9919-1469-c99ff6a461b1@redhat.com \
    --to=david@redhat.com \
    --cc=armbru@redhat.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mail@maciej.szmigiero.name \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).