qemu-devel.nongnu.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Michael S . Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Thomas Huth" <thuth@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>
Subject: Re: [PATCH][RESEND v3 1/3] hapvdimm: add a virtual DIMM device for memory hot-add protocols
Date: Thu, 2 Mar 2023 10:28:44 +0100	[thread overview]
Message-ID: <e2576d8d-0cdc-55b4-5aaa-f0e8518b25ce@redhat.com> (raw)
In-Reply-To: <9280c056-43eb-08e6-cb63-a7e601cb4700@maciej.szmigiero.name>

On 01.03.23 23:08, Maciej S. Szmigiero wrote:
> On 1.03.2023 18:24, David Hildenbrand wrote:
> (...)
>>> With virtio-mem one can simply have per-node virtio-mem devices.
>>>
>>> 2) I'm not sure what's the overhead of having, let's say, 1 TiB backing
>>> memory device mostly marked madvise(MADV_DONTNEED).
>>> Like, how much memory + swap this setup would actually consume - that's
>>> something I would need to measure.
>>
>> There are some WIP items to improve that (QEMU metadata (e.g., bitmaps), KVM metadata (e.g., per-memslot), Linux metadata (e.g., page tables)).
>> Memory overcommit handling also has to be tackled.
>>
>> So it would be a "shared" problem with virtio-mem and will be sorted out eventually :)
>>
> 
> Yes, but this might take a bit of time, especially if kernel-side changes
> are involved - that's why I will check how this setup works in practice
> in its current shape.

Yes, let me know if you have any questions. I invested a lot of time in 
the past figuring out all of the details and possible 
workarounds/approaches.
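The overhead question quoted above (how much a mostly-MADV_DONTNEED backing region actually costs) can be probed outside of QEMU. Below is a minimal, Linux-only Python sketch of my own (not code from this thread): it maps a private anonymous region the way a RAM backend would, populates it, discards it with MADV_DONTNEED, and reads the process resident set size from /proc/self/statm before and after:

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")

def resident_pages() -> int:
    """Resident set size of this process, in pages (second field of statm)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])

length = 1 << 28  # 256 MiB here; scale up to probe 1 TiB-style setups
# Private anonymous mapping, resembling a memory-backend-ram region.
mm = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

chunk = b"\x01" * (1 << 20)
for _ in range(length >> 20):  # touch every page so it becomes resident
    mm.write(chunk)
populated = resident_pages()

mm.madvise(mmap.MADV_DONTNEED)  # discard: process RSS drops again
discarded = resident_pages()

print(f"resident after populate: {populated} pages")
print(f"resident after DONTNEED: {discarded} pages")
```

The RSS drop only covers the pages themselves; what such a measurement does not show is exactly the metadata discussed above (VMAs, page-table pages, plus QEMU/KVM bitmaps and per-memslot data in the real setup), which the mentioned WIP items target.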

>>> Hyper-V actually "cleans up" the guest memory map on reboot - if the
>>> guest was effectively resized up then on reboot the guest boot memory is
>>> resized up to match that last size.
>>> Similarly, if the guest was ballooned out - that amount of memory is
>>> removed from the boot memory on reboot.
>>
>> Yes, it cleans up, but as I said last time I checked there was this concept of startup vs. minimum vs. maximum, at least for dynamic memory:
>>
>> https://www.fastvue.co/tmgreporter/blog/understanding-hyper-v-dynamic-memory-dynamic-ram/
>>
>> Startup RAM would be whatever you specify for "-m xG". If you go below min, you remove memory via deflation once the guest is up.
> 
> 
> That article was from 2014, so I guess it pertained to Windows 2012 R2.

I remember seeing the same interface when I played with that a couple of 
years ago, but I don't recall which Windows version I was using.

> 
> The memory settings page in more recent Hyper-V versions looks like on
> the screenshot at [1].
> 
> It no longer calls that main memory amount value "Startup RAM", now it's
> just "RAM".
> 
> Despite what one might think the "Enable Dynamic Memory" checkbox does
> *not* control the Dynamic Memory protocol availability or usage - the
> protocol is always available/exported to the guest.
> 
> What the "Enable Dynamic Memory" checkbox controls is some host-side
> heuristics that automatically resize the guest within chosen bounds
> based on some metrics.
> 
> Even if the "Enable Dynamic Memory" checkbox is *not* enabled the guest
> can still be online-resized via Dynamic Memory protocol by simply
> changing the value in the "RAM" field and clicking "Apply".
> 
> At least that's how it works on Windows 2019 with a Linux guest.

Right, I recall that this feature was separately announced as explicit 
VM resizing, not as HV Dynamic Memory. It uses the same underlying 
mechanism, yes, which is why the feature is always exposed to the VMs.

That's most probably when they renamed "Startup RAM" to "RAM", so that 
both features can co-exist and are easier to configure.

> 
>>>
>>> So it's not exactly doing a hot-add after the guest boots.
>>
>> I recall BUG reports in Linux, that we got hv-balloon hot-add requests ~1 minute after Linux booted up, because of the above reason of startup memory [in these BUG reports, memory onlining was disabled and the VM would run out of memory because we hotplugged too much memory]. That's why I remember that this approach once was done.
>>
>> Maybe there are multiple implementations nowadays. At least in QEMU you could choose whatever makes most sense for QEMU.
>>
> 
> Right, it seems that the Hyper-V behavior evolved with time, too.

Yes. One could think of a split approach, that is, we never resize the 
initial RAM size (-m XG) from inside QEMU. Instead, we could have the 
following models:

(1) Basic "Startup RAM" model: always (re)boot Linux with "-m XG". Once
     the VM comes up, we either add memory or request to inflate the
     balloon, to reach the previous guest size. Whenever the VM reboots,
     we first defrag all hv-balloon provided memory ("one contiguous
     chunk") and then "add" that memory to the VM. If the logical VM
     size <= requested size, this hv-balloon memory size would be "0".
     This essentially resembles the "old" HV Dynamic Memory approach.

(2) Extended "Startup RAM" model: Same as (1), but instead of hot-adding
     the RAM after the guest comes up, we simply defrag the
     hv-balloon RAM during reboot ("one contiguous chunk") and expose it
     via e820/SRAT to the guest. Going below "Startup RAM" will still
     require inflation once the guest is up.

(3) External "Resize" model: On reboot, simply shut down the VM and
     notify libvirt. Libvirt will restart the VM with an adjusted
     "Startup RAM".

It's fairly straightforward to extend (1) to achieve (2). That could be 
a sane default for QEMU. Whoever wants (3) can simply let libvirt handle 
it on top without any special handling.
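The size computation shared by models (1) and (2) can be condensed into a toy sketch (helper names are mine, hypothetical, not QEMU code): on reboot, the defragmented hv-balloon chunk is the previous logical VM size minus "Startup RAM", clamped at zero, and any target below "Startup RAM" is reached by inflating the balloon once the guest is up:

```python
def hv_balloon_chunk(startup_ram: int, last_logical_size: int) -> int:
    """Size of the contiguous hv-balloon region to (re)expose on reboot.

    If the guest was previously grown above its startup RAM, the excess is
    provided as one defragmented chunk ("one contiguous chunk"); at or
    below startup RAM, the chunk is empty.
    """
    return max(last_logical_size - startup_ram, 0)

def post_boot_inflation(startup_ram: int, last_logical_size: int) -> int:
    """Amount to inflate after boot when the target is below startup RAM."""
    return max(startup_ram - last_logical_size, 0)

GiB = 1024 ** 3
# Guest grown to 6 GiB with 4 GiB startup RAM: 2 GiB chunk, no inflation.
assert hv_balloon_chunk(4 * GiB, 6 * GiB) == 2 * GiB
assert post_boot_inflation(4 * GiB, 6 * GiB) == 0
# Guest shrunk to 3 GiB: empty chunk, 1 GiB inflated once it is up.
assert hv_balloon_chunk(4 * GiB, 3 * GiB) == 0
assert post_boot_inflation(4 * GiB, 3 * GiB) == 1 * GiB
```

The only difference between (1) and (2) is *when* the chunk becomes visible to the guest: hot-added after boot in (1), exposed via e820/SRAT during reboot in (2).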

An internal "Resize" mode is tricky, especially regarding migration. 
With sufficient motivation and problem solving, one might be able to 
turn (1) or (2) into such a (4) mode. It would just be an implementation 
detail.


Note that I never considered "go below initial RAM" and "resize 
initial RAM" really relevant for virtio-mem. Instead, you choose the 
startup size to be reasonably small (e.g., 4 GiB) and expose memory via 
the virtio-mem devices right at QEMU startup ("requested-size=XG"). The 
same approach could be applied to the hv-balloon model.

One main reason to decide against resizing significantly below 4 GiB 
was, for example, that the lower you go, the more valuable DMA/DMA32 
memory you lose -- memory that no hotplugged memory will provide. So 
using inflation for everything < 4 GiB does not sound too crazy to me, 
and could avoid model (3) altogether. But again, just my thoughts.

-- 
Thanks,

David / dhildenb



Thread overview: 17+ messages
2023-02-24 21:41 [PATCH][RESEND v3 0/3] Hyper-V Dynamic Memory Protocol driver (hv-balloon) Maciej S. Szmigiero
2023-02-24 21:41 ` [PATCH][RESEND v3 1/3] hapvdimm: add a virtual DIMM device for memory hot-add protocols Maciej S. Szmigiero
2023-02-27 15:25   ` David Hildenbrand
2023-02-28 14:14     ` Maciej S. Szmigiero
2023-02-28 15:02       ` David Hildenbrand
2023-02-28 21:27         ` Maciej S. Szmigiero
2023-02-28 22:12           ` David Hildenbrand
2023-03-01 16:26             ` Maciej S. Szmigiero
2023-03-01 17:24               ` David Hildenbrand
2023-03-01 22:08                 ` Maciej S. Szmigiero
2023-03-02  9:28                   ` David Hildenbrand [this message]
2023-02-24 21:41 ` [PATCH][RESEND v3 2/3] Add Hyper-V Dynamic Memory Protocol definitions Maciej S. Szmigiero
2023-02-24 21:41 ` [PATCH][RESEND v3 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon) Maciej S. Szmigiero
2023-02-28 16:18   ` Igor Mammedov
2023-02-28 17:12     ` David Hildenbrand
2023-02-28 17:34   ` Daniel P. Berrangé
2023-02-28 21:24     ` Maciej S. Szmigiero
