From: Anthony Liguori <anthony@codemonkey.ws>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Yasunori Goto <y-goto@jp.fujitsu.com>,
Christoph Lameter <clameter@sgi.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Mel Gorman <mel@csn.ul.ie>
Subject: Re: [PATCH RFC] hotplug-memory: refactor online_pages to separate zone growth from page onlining
Date: Sat, 29 Mar 2008 19:26:15 -0500
Message-ID: <47EEDE27.2020809@codemonkey.ws>
In-Reply-To: <47EED683.5030200@goop.org>

Jeremy Fitzhardinge wrote:
> Dave Hansen wrote:
>>
>> To me, it sounds like the only different thing that you want is to make
>> sure that only partial sections are onlined. So, shall we work with the
>> existing interfaces to online partial sections, or will we just disable
>> it entirely when we see Xen?
>>
>
> Well, yes and no.
>
> For the current balloon driver, it doesn't make much sense. It would
> add a fair amount of complexity without any real gain. It's currently
> based around alloc_page/free_page. When it wants to shrink the domain
> and give memory back to the host, it allocates pages, adds the page
> structures to a ballooned pages list, and strips off the backing
> memory and gives it to the host. Growing the domain is the converse:
> it gets pages from the host, pulls page structures off the list, binds
> them together and frees them back to the kernel. If it runs out of
> ballooned page structures, it hotplugs in some memory to add more.
>
> That said, if (partial-)sections were much smaller - say 2-4 meg - and
> page migration/defrag worked reliably, then we could probably do
> without the balloon driver and do it all in terms of memory hot
> plug/unplug. That would give us a general mechanism which could
> either be driven from userspace, and/or have in-kernel
> Xen/kvm/s390/etc policy modules. Aside from small sections, the only
> additional requirement would be an online hook which can actually
> attach backing memory to the pages being onlined, rather than just
> assuming an underlying DIMM as current code does.
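For illustration, the alloc_page/free_page cycle Jeremy describes could be
sketched in user-space C roughly like this. The struct and function names are
stand-ins for the idea only, not the real Xen balloon driver API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for kernel objects; not the actual driver code. */
struct page {
    struct page *next;
    int has_backing;            /* 1 while the host backs this frame */
};

static struct page *ballooned_pages;    /* pages stripped of backing memory */
static int balloon_count;

/* Shrink the domain: "allocate" a page, strip its backing memory,
 * give the memory to the host, and park the page structure. */
static void balloon_out(struct page *pg)
{
    pg->has_backing = 0;
    pg->next = ballooned_pages;
    ballooned_pages = pg;
    balloon_count++;
}

/* Grow the domain: take a frame from the host, pull a parked page
 * structure off the list, bind them together, and return the page
 * so it can be freed back to the kernel. */
static struct page *balloon_in(void)
{
    struct page *pg = ballooned_pages;
    if (!pg)
        return NULL;            /* would hotplug a section to get more */
    ballooned_pages = pg->next;
    pg->has_backing = 1;
    balloon_count--;
    return pg;
}
```

The NULL return is where, in Jeremy's description, the driver falls back to
memory hotplug to add more page structures.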
Ballooning on KVM (and s390) is very much a different beast from Xen.
With Xen, ballooning is very similar to hotplug in that you're adding
and removing physical memory from the guest. The use of alloc_page() to
implement it instead of hotplug is for the reasons Jeremy's outlined
above. Logically though, it's hotplug.
For KVM and s390, ballooning is really a primitive form of guest page
hinting. The host asks the guest to allocate some memory; the guest
allocates what it can and then tells the host which pages it got. The
guest is effectively marking those pages Unused, and the host may then
move them from Up=>Uz, which reduces the resident size of the guest
while the virtual size stays the same. We can enforce limits on the
resident size of the guest via the new cgroup memory controller.
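For reference, capping a guest's resident size with the cgroup memory
controller looks roughly like this. The mount point, group name, limit, and
pid variable are all illustrative:

```shell
# Mount the cgroup memory controller and create a group for the guest.
mount -t cgroup -o memory none /cgroup
mkdir /cgroup/guest0

# Cap the group's resident memory (value illustrative).
echo 1G > /cgroup/guest0/memory.limit_in_bytes

# Move the guest's userspace process (pid illustrative) into the group.
echo $QEMU_PID > /cgroup/guest0/tasks
```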
The guest is free to reclaim those pages at any time it wants without
informing the host. In fact, we plan to utilize this by implementing a
shrinker and OOM handler in the virtio balloon driver.
Hotplug is still useful for us, as it's more efficient to hot-add 1GB
of memory than to start out with an extra 1GB and balloon down. We
wouldn't want to hot-unplug every page we balloon, though, since we
want to be able to reclaim those pages if necessary without the host's
intervention (e.g. in an OOM condition).
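The distinction being drawn here can be sketched as a tiny state machine.
The state names below are illustrative, loosely following guest-page-hinting
terminology, and the functions are not any real driver API:

```c
#include <assert.h>

/* Illustrative page states, loosely after guest page hinting. */
enum gstate { G_USED, G_UNUSED_HINTED };    /* guest's view  */
enum hstate { H_RESIDENT, H_DISCARDED };    /* host's view   */

struct gpage {
    enum gstate g;
    enum hstate h;
};

/* Guest inflates the balloon: hint the page to the host as Unused. */
static void hint_unused(struct gpage *p)
{
    p->g = G_UNUSED_HINTED;
}

/* Host may later drop the backing frame of a hinted page,
 * shrinking the guest's resident size (its virtual size is unchanged). */
static void host_discard(struct gpage *p)
{
    if (p->g == G_UNUSED_HINTED)
        p->h = H_DISCARDED;
}

/* Guest reclaims at will, e.g. from a shrinker or OOM handler;
 * no host round-trip is needed, a fresh frame is simply faulted in. */
static void guest_reclaim(struct gpage *p)
{
    p->g = G_USED;
    p->h = H_RESIDENT;
}
```

The key point is that guest_reclaim() needs no coordination with the host,
which is exactly why hot-unplugging every ballooned page would be undesirable.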
>> For Xen and KVM, how does it get decided that the guest needs more
>> memory? Is this guest or host driven? Both? How is the guest
>> notified? Is guest userspace involved at all?
>
> In Xen, either the host or the guest can set the target size for the
> domain, which is capped by the host-set limit. Aside from possibly
> setting the target size, there's no usermode involvement in managing
> ballooning. The virtio balloon driver is similar, though from a quick
> look it seems to be entirely driven by the host side.
The host support for KVM ballooning is entirely in userspace, but
that's really orthogonal to the discussion at hand.
Regards,
Anthony Liguori
> J