From: Dan Magenheimer <dan.magenheimer@oracle.com>
To: Jan Beulich <JBeulich@suse.com>, "Keir (Xen.org)" <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>, Olaf Hering <olaf@aepfle.de>,
Ian Campbell <Ian.Campbell@citrix.com>,
Konrad Wilk <konrad.wilk@oracle.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
George Shuklin <george.shuklin@gmail.com>,
xen-devel@lists.xen.org, Dario Faggioli <raistlin@linux.it>,
Kurt Hackel <kurt.hackel@oracle.com>,
Zhigang Wang <zhigang.x.wang@oracle.com>
Subject: Re: Proposed new "memory capacity claim" hypercall/feature
Date: Wed, 31 Oct 2012 09:04:47 -0700 (PDT)
Message-ID: <83bb902d-8e49-41cf-ad1e-c07c62d6e5f8@default>
In-Reply-To: <5090EBFE02000078000A59DD@nat28.tlf.novell.com>
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Subject: RE: Proposed new "memory capacity claim" hypercall/feature
>
> >>> On 30.10.12 at 18:13, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
(NOTE TO KEIR: Input from you requested in first stanza below.)
Hi Jan --
Thanks for the continued feedback!
I've slightly re-ordered the email to focus on the problem
(moved tmem-specific discussion to the end).
> As long as the allocation times can get brought down to an
> acceptable level, I continue to not see a need for the extra
> "claim" approach you're proposing. So working on that one (or
> showing that without unreasonable effort this cannot be
> further improved) would be a higher priority thing from my pov
> (without anyone arguing about its usefulness).
Fair enough. I will do some measurement and analysis of this
code. However, let me ask something of you and Keir as well:
Please estimate how long (in usec) you think it is acceptable
to hold the heap_lock. If your limit is very small (as I expect),
doing anything "N" times in a loop with the lock held (for N==2^26
4KB pages, i.e. a 256GB domain) may make the analysis moot; see the
back-of-envelope arithmetic below.
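For concreteness, here's that arithmetic as a tiny standalone C
program. The 100ns/page figure is purely an assumed illustration of
the per-page cost with the heap_lock held, not a measurement:

#include <stdio.h>

int main(void)
{
    unsigned long long domain_bytes = 256ULL << 30;  /* 256GB domain */
    unsigned long long page_bytes = 4ULL << 10;      /* 4KB pages */
    unsigned long long npages = domain_bytes / page_bytes; /* == 2^26 */
    double assumed_ns_per_page = 100.0; /* assumed cost per page
                                           with heap_lock held */

    printf("pages to allocate: %llu (== 2^26)\n", npages);
    printf("lock hold time at %.0f ns/page: %.1f seconds\n",
           assumed_ns_per_page, npages * assumed_ns_per_page / 1e9);
    return 0;
}

Even at 100ns per page that's ~6.7 seconds with the lock held; to get
under, say, 100usec the per-page cost would have to be well under a
nanosecond, which is why I suspect the analysis is moot.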
> But yes, with all the factors you mention brought in, there is
> certainly some improvement needed (whether your "claim"
> proposal is the right thing is another question, not to mention
> that I currently don't see how this would get implemented in
> a consistent way taking several orders of magnitude less time
> to carry out).
OK, I will start on the next step: a proof-of-concept.
I'm envisioning simple arithmetic (rough sketch below), but maybe
you are right and arithmetic will not be sufficient.
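To be concrete about what I mean by "simple arithmetic", here's a
rough standalone sketch of the idea. All names here (claim_pages,
outstanding_claims, d->outstanding_pages) are hypothetical, a pthread
mutex stands in for the heap_lock, and this is only a proof-of-concept
outline, not actual Xen code:

#include <errno.h>
#include <pthread.h>

/* Stand-ins for the allocator's bookkeeping; in the real hypervisor
 * these would live with the heap allocator under heap_lock. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long total_avail_pages;  /* maintained by the allocator */
static unsigned long outstanding_claims; /* sum of all unspent claims;
                                            invariant: never exceeds
                                            total_avail_pages */

struct domain {
    unsigned long outstanding_pages; /* this domain's unspent claim */
};

/* The claim itself is O(1) arithmetic: no page is touched, so the
 * lock is held for a handful of instructions, not 2^26 iterations. */
int claim_pages(struct domain *d, unsigned long pages)
{
    int rc = -ENOMEM;

    pthread_mutex_lock(&heap_lock);
    /* Succeed only if free memory not already promised to others
     * covers the request. */
    if ( pages <= total_avail_pages - outstanding_claims )
    {
        d->outstanding_pages = pages;
        outstanding_claims += pages;
        rc = 0;
    }
    pthread_mutex_unlock(&heap_lock);
    return rc;
}

The allocator would then decrement both d->outstanding_pages and
outstanding_claims as each real allocation "spends" the claim, so
domain creation can proceed at leisure while small allocations from
other guests continue unfrozen.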
> > Suppose you have a huge 256GB machine and you have already launched
> > a 64GB tmem guest "A". The guest is idle for now, so slowly
> > selfballoons down to maybe 4GB. You start to launch another 64GB
> > guest "B" which, as we know, is going to take some time to complete.
> > In the middle of launching "B", "A" suddenly gets very active and
> > needs to balloon up as quickly as possible but it can't balloon fast
> > enough (or at all if "frozen" as suggested) so starts swapping (and,
> > thanks to Linux frontswap, the swapping tries to go to hypervisor/tmem
> > memory). But ballooning and tmem are both blocked and so the
> > guest swaps its poor little butt off even though there's >100GB
> > of free physical memory available.
>
> That's only one side of the overcommit situation you're striving
> to get to work right here: that same self-ballooning guest, after
> sufficiently many more guests got started so that the rest of the memory
> got absorbed by them, would suffer the very same problems in
> the described situation, so it has to be prepared for this case
> anyway.
The tmem design does ensure the guest is prepared for this case
anyway... the guest swaps. And, unlike page-sharing, the guest
determines which pages to swap, not the host, and there is no
possibility of double-paging.
In your scenario, the host memory is truly oversubscribed. This
scenario is ultimately a weakness of virtualization in general;
trying to statistically share an oversubscribed fixed resource
among a number of guests will sometimes cause a performance
degradation, whether the resource is CPU or LAN bandwidth or,
in this case, physical memory. That very generic problem
is, I think, not one any of us can solve. Toolstacks need to
be able to recognize the problem (whether CPU, LAN, or memory)
and act accordingly (report, or auto-migrate).
In my scenario, guest performance is hammered only because of
the unfortunate deficiency in the existing hypervisor memory
allocation mechanisms, namely that small allocations must
be artificially "frozen" until a large allocation can complete.
That specific problem is one I am trying to solve.
BTW, with tmem, some future toolstack might monitor various
available tmem statistics and predict/avoid your scenario.
Dan