xen-devel.lists.xenproject.org archive mirror
From: Tim Deegan <tim@xen.org>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Olaf Hering <olaf@aepfle.de>, Keir Fraser <keir@xen.org>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	George Shuklin <george.shuklin@gmail.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	xen-devel@lists.xen.org, Dario Faggioli <raistlin@linux.it>,
	Jan Beulich <JBeulich@suse.com>,
	Kurt Hackel <kurt.hackel@oracle.com>,
	Zhigang Wang <zhigang.x.wang@oracle.com>
Subject: Re: Proposed new "memory capacity claim" hypercall/feature
Date: Sun, 4 Nov 2012 20:35:32 +0000
Message-ID: <20121104203532.GA11377@ocelot.phlegethon.org>
In-Reply-To: <7481128d-3f65-4cc3-ad96-1d4e9cd25094@default>

At 11:43 -0800 on 04 Nov (1352029386), Dan Magenheimer wrote:
> > From: Keir Fraser [mailto:keir@xen.org]
> > Sent: Friday, November 02, 2012 3:30 AM
> > To: Jan Beulich; Dan Magenheimer
> > Cc: Olaf Hering; Ian Campbell; George Dunlap; Ian Jackson; George Shuklin; Dario Faggioli;
> > xen-devel@lists.xen.org; Konrad Rzeszutek Wilk; Kurt Hackel; Mukesh Rathor; Zhigang Wang; Tim Deegan
> > Subject: Re: Proposed new "memory capacity claim" hypercall/feature
> > 
> > On 02/11/2012 09:01, "Jan Beulich" <JBeulich@suse.com> wrote:
> > 
> > > Plus, if necessary, that loop could be broken up so that only the
> > > initial part of it gets run with the lock held (see c/s
> > > 22135:69e8bb164683 for why the unlock was moved past the
> > > loop). That would make for a shorter lock hold time, but for a
> > > higher allocation latency on large order allocations (due to worse
> > > cache locality).
> > 
> > In fact I believe only the first page needs to have its count_info set to !=
> > PGC_state_free, while the lock is held. That is sufficient to defeat the
> > buddy merging in free_heap_pages(). Similarly, we could hoist most of the
> > first loop in free_heap_pages() outside the lock. There's a lot of scope for
> > optimisation here.
> 
> (sorry for the delayed response)
> 
> Aren't we getting a little sidetracked here?  (Maybe my fault for
> looking at whether this specific loop is fast enough...)
> 
> This loop handles only order=N chunks of RAM.  Speeding up this
> loop and holding the heap_lock here for a shorter period only helps
> the TOCTOU race if the entire domain can be allocated as a
> single order-N allocation.

I think the idea is to speed up allocation so that, even for a large VM,
you can just allocate memory instead of needing a reservation hypercall
(whose only purpose, AIUI, is to give you an immediate answer).

> So unless the code for the _entire_ memory allocation path can
> be optimized so that the heap_lock can be held across _all_ the
> allocations necessary to create an arbitrary-sized domain, for
> any arbitrary state of memory fragmentation, the original
> problem has not been solved.
> 
> Or am I misunderstanding?
> 
> I _think_ the claim hypercall/subop should resolve this, though
> admittedly I have yet to prove (and code) it.

I don't think it solves it - or rather it might solve this _particular_
instance of it but it doesn't solve the bigger problem.  If you have a
set of overcommitted hosts and you want to start a new VM, you need to:

 - (a) decide which of your hosts is the least overcommitted;
 - (b) free up enough memory on that host to build the VM; and
 - (c) build the VM.

The claim hypercall _might_ fix (c) (if it could handle allocations that
need address-width limits or contiguous pages).  But (b) and (a) have
exactly the same problem, unless there is a central arbiter of memory
allocation (or equivalent distributed system).  If you try to start 2
VMs at once,

 - (a) the toolstack will choose to start them both on the same machine,
       even if that's not optimal, and even if it means one creation is
       _bound_ to fail after some delay.
 - (b) the other VMs (and perhaps tmem) start ballooning out enough
       memory to start the new VM.  This can take even longer than
       allocating it since it depends on guest behaviour.  It can fail
       after an arbitrary delay (ditto).
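
To make the race in (a) concrete, here is a minimal sketch of two VM
creations that both choose a host from the same snapshot of free memory
before either one allocates.  The host names, sizes and function names are
hypothetical and purely illustrative; this is a toy model, not how any Xen
toolstack is implemented.

```python
# Hypothetical model: free memory per host in MiB, as seen by the toolstack.
def pick_host(free_mib):
    """Stage (a): return the host with the most free memory."""
    return max(free_mib, key=free_mib.get)


def start_two_vms_racily(free_mib, vm_mib):
    """Both creations consult the same snapshot before either one allocates."""
    snapshot = dict(free_mib)
    # Time of check: both creations see identical, soon-to-be-stale data.
    choices = [pick_host(snapshot), pick_host(snapshot)]
    results = []
    # Time of use: the allocations happen later, one after the other.
    for host in choices:
        if free_mib[host] >= vm_mib:
            free_mib[host] -= vm_mib
            results.append((host, True))
        else:
            results.append((host, False))
    return results


hosts = {"host1": 6144, "host2": 5120}
outcome = start_two_vms_racily(hosts, vm_mib=4096)
print(outcome)
```

In this model both creations pick host1, the second one fails, and host2
is left untouched even though it could have held the second VM.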

If you have a toolstack with enough knowledge and control over memory
allocation to sort out stages (a) and (b) in such a way that there are
no delayed failures, (c) should be trivial.
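
As a rough illustration of what "enough knowledge and control" could mean,
the sketch below models a central arbiter that chooses a host and reserves
the memory in a single step under one lock, so a request either succeeds or
fails immediately.  The class name, method name and sizes are assumptions
made up for this example; it ignores ballooning, address-width limits and
contiguity entirely.

```python
import threading


class MemoryArbiter:
    """Hypothetical central arbiter: pick a host and reserve memory atomically."""

    def __init__(self, free_mib):
        self._free = dict(free_mib)
        self._lock = threading.Lock()

    def reserve(self, vm_mib):
        """Return the chosen host, or None at once if no host can fit the VM."""
        with self._lock:
            # Choosing and reserving happen under the same lock, so no other
            # request can observe the free-memory table in between.
            host = max(self._free, key=self._free.get)
            if self._free[host] < vm_mib:
                return None
            self._free[host] -= vm_mib
            return host


arbiter = MemoryArbiter({"host1": 6144, "host2": 5120})
placements = [arbiter.reserve(4096) for _ in range(3)]
print(placements)
```

Here the first two requests land on different hosts and the third is
refused straight away, rather than failing after some delay part-way
through a build.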

Tim.
