From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Stevens <stevensd@chromium.org>
Cc: virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v2] virtio_balloon: add param to skip adjusting pages
Date: Wed, 24 Nov 2021 03:37:56 -0500 [thread overview]
Message-ID: <20211124032622-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAD=HUj5wPYLKJxsjgcnMu_NYQ6eMwmd-VDU0gbWbqgzOPkV6fg@mail.gmail.com>
On Wed, Nov 24, 2021 at 01:55:16PM +0900, David Stevens wrote:
> > >> And if you look at things from the context of
> > >> a specific userspace process, there will be other processes running
> > >> and using memory. So while that statement is true with respect to this
> > >> change, that is also true without this change. The specific details
> > >> might be changed by the proposed parameter, but it wouldn't be
> > >> introducing any fundamentally new behavior to Linux.
> > >>
> > >
> > > Please note that the hyper-v balloon just recently switched to using
> > > adjust_managed_page_count() for proper accounting reasons:
> > >
> > > commit d1df458cbfdb0c3384c03c7fbcb1689bc02a746c
> > > Author: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > Date: Wed Dec 2 17:12:45 2020 +0100
> > >
> > > hv_balloon: do adjust_managed_page_count() when ballooning/un-ballooning
> > >
> > > Unlike virtio_balloon/virtio_mem/xen balloon drivers, Hyper-V balloon driver
> > > does not adjust managed pages count when ballooning/un-ballooning and this leads
> > > to incorrect stats being reported, e.g. unexpected 'free' output.
> > >
> > > Note, the calculation in post_status() seems to remain correct: ballooned out
> > > pages are never 'available' and we manually add dm->num_pages_ballooned to
> > > 'commited'.
> > >
>
> I saw this commit, but it wasn't entirely clear to me what problem it
> was addressing. Is it the issue Michael pointed out on v1 of my
> patch set, where memory in the balloon shouldn't be included in the
> free stat reported to the device? This version of my patch should
> address that specific issue. The managed page count is metadata
> specific to the Linux kernel, so there's no fundamental reason that it
> needs to line up exactly with anything reported via the virtio-balloon
> API.
>
> > >> That approach would require a lot of changes to userspace - probably
> > >> nearly everywhere that uses _SC_PHYS_PAGES or get_phys_pages, or
> > >> anywhere that parses /proc/meminfo. Actually properly using "logically
> > >> offline pages" would require an additional API for monitoring changes
> > >> to the value, and updating to that sort of listener API would not be a
> > >> localized change, especially since most programs do not account for
> > >> memory hotplug and just use the amount of physical memory during
> > >> initialization. Realistically, nearly all of the callers would simply
> > >> add together "logically offline pages" and MemTotal.
> > >
> > > I'd appreciate a more generic approach for user space to figure out the
> > > "initial memory size" in a virtualized environment than adding
> > > some module parameter to virtio-balloon -- if that makes sense.
> > >
> > > MemTotal as is expresses how much memory the buddy currently manages,
> > > for example, excluding early allocations during boot, excluding actually
> > > unplugged memory and excluding logically unplugged memory. Adjusting that
> > > value makes perfect sense for virtio-balloon without deflate-on-oom.
> > >
>
> That's a definition of how MemTotal is implemented, but it's not
> really a specification of the MemTotal API. The closest thing to a
> real specification I can find is "Total usable RAM (i.e., physical RAM
> minus a few reserved bits and the kernel binary code)", from the proc
> man pages. I think there is quite a bit of leeway in changing how
> exactly MemTotal is implemented without violating the (quite vague)
> specification or changing any observable semantics of the API. In
> particular, leaving the pages in the balloon as part of MemTotal is
> essentially indistinguishable from simply having a non-OOM killable
> process locking an equivalent amount of memory. So this proposal isn't
> really introducing any fundamentally new behavior to the Linux kernel.
>
> > >> It's also not clear to me what utility the extra information would
> > >> provide to userspace. If userspace wants to know how much memory is
> > >> available, they should use MemAvailable. If userspace wants to have a
> > >> rough estimate for the maximum amount of memory in the system, they
> > >> would add together MemTotal and "logically offline pages". The value
> > >> of MemTotal with a no-deflate-on-oom virtio-balloon is a value with a
> > >> vague meaning that lies somewhere between the "maximum amount of
> > >> memory" and the "current amount of memory". I don't really see any
> > >> situations where it should clearly be used over one of MemAvailable or
> > >> MemTotal + "logically offline pages".
> > >
> > > The issue is that any application that relies on MemTotal in a virtualized
> > > environment is most probably already suboptimal in some cases. You can
> > > rely on it and actually later someone will unplug (inflate balloon)
> > > memory or plug (deflate balloon) memory. Even MemAvailable is suboptimal
> > > because what about two applications that rely on that information at
> > > the same time?
> > >
> >
> > BTW, the general issue here is that "we don't know what the hypervisor
> > will do".
>
> I do agree that this is a significant problem. I would expand on it a
> bit more, to be "since we don't know what the hypervisor will do, we
> don't know how to treat memory in the balloon". The proposed module
> parameter is more or less a mechanism to allow the system
> administrator to tell the virtio_balloon driver how the hypervisor
> behaves.
Now that you put it that way, it looks more like this should
be a feature bit, not a module parameter.
> And if the hypervisor will give memory back to the guest when
> the guest needs it, then I don't think it's necessary to logically
> unplug the memory.
Ideally we would also pair this with sending a signal to the device
that memory is needed.
> It might be a bit cleaner to explicitly address this in the
> virtio_balloon protocol. We could add a min_num_pages field to the
> balloon config, with semantics along the lines of "The host will
> respond to memory pressure in the guest by deflating the balloon down
> to min_num_pages, unless it would cause system instability in the
> host". Given that feature, I think it would be reasonable to only
> consider min_num_pages as logically unplugged.
Okay. I think I would do it a bit differently though, make num_pages be
the min_num_pages, and add an extra_num_pages field for memory that is
nice to have but ok to drop.
As long as we are here, can we add a page_shift field please
so more than 2^44 bytes can be requested?
> > Maybe "MemMax" actually could make sense, where we expose the maximum
> > "MemTotal" we had so far since we were up and running. So the semantics
> > wouldn't be "maximum possible", because we don't know that, but instead
> > "maximum we had".
>
> Rather than add a new API, I think it is much better to make existing
> APIs behave closer to how they behave in a non-virtualized
> environment. It is true that we could go through and fix a limited
> number of special user space applications, but sysconf(_SC_PHYS_PAGES)
> and /proc/meminfo are not special APIs. Fixing every application that
> uses them is not feasible, especially when taking into account systems
> with closed-source applications (e.g. Android). Also, while MemMax is
> well defined, it has the same issues you brought up earlier -
> specifically, applications don't know whether the hypervisor will
> actually ever provide MemMax again, and they don't know whether MemMax
> is actually the real maximum amount of memory that could be available
> in the future. It's not clear to me that it's significantly better or
> more useful to userspace than simply changing how MemTotal is
> implemented.
>
> -David
Agreed on trying to avoid changing applications; limiting changes
to the device and guest kernel has a lot of value.
--
MST