From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Andres Lagar-Cavilla <andreslc@gridcentric.ca>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>,
	"Keir (Xen.org)" <keir@xen.org>,
	Ian Campbell <ian.campbell@citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>, Tim Deegan <tim@xen.org>,
	xen-devel@lists.xen.org,
	Konrad Rzeszutek Wilk <konrad@kernel.org>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions
Date: Fri, 11 Jan 2013 14:08:14 -0500
Message-ID: <20130111190814.GD29020@phenom.dumpdata.com>
In-Reply-To: <F2749668-D894-4DBA-BC48-D5CE8180C27F@gridcentric.ca>

> >>>> Neither is enforcing min==max. This was my argument when previously commenting on this thread. The fact that you have enforcement of a maximum domain allocation gives you an excellent tool to keep a domain's unsupervised growth at bay. The toolstack can choose how fine-grained, how often to be alerted and stall the domain.
> > 
> > That would also do the trick - but there are penalties to it.
> > 
> > If one just wants to launch multiple guests and "freeze" all the other guests
> > from using the balloon driver - that can certainly be done.
> > 
> > But that is a half-way solution (in my mind). Dan's idea is that you wouldn't
> > even need that and can just allocate without having to worry about the other
> > guests at all - b/c you have reserved enough memory in the hypervisor (host) to
> > launch the guest.
> 
> Konrad:
> Ok, what happens when a guest is stalled because it cannot allocate more pages due to existing claims? Exactly the same that happens when it can't grow because it has hit d->max_pages.

But it wouldn't. I am going out on a limb here, b/c I believe this is what the
code does, but I should double-check.

The variables that let the guest balloon up/down would still stay in place - so
the guest should not be impacted by the 'claim'. Meaning you just leave them
alone and let the guest do whatever it wants without influencing it.

If the claim hypercall fails, then yes - you could have this issue.

But there are multiple solutions to the hypercall failing - one is to
try to "squeeze" all the guests to make space; another is to allocate
the guest on another box that has more memory and where the claim
hypercall returns success. Or the toolstack can issue these claim
hypercalls on all the nodes in parallel and pick among the ones that
returned success.
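
A rough sketch of that last option, in Python for brevity. Everything in it
is hypothetical: `Host`, `try_claim`, `release_claim` and `place_guest` are
illustrative names, not Xen or toolstack interfaces, and the claim is
modelled as plain page accounting on each box.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
import threading


@dataclass
class Host:
    """Toy model of one box: free pages plus pages already claimed."""
    name: str
    free_pages: int
    claimed_pages: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def try_claim(self, pages: int) -> bool:
        # Check and reserve in one step, so no other claim can slip in between.
        with self._lock:
            if self.free_pages - self.claimed_pages >= pages:
                self.claimed_pages += pages
                return True
            return False

    def release_claim(self, pages: int) -> None:
        # Give back a claim that is not going to be used.
        with self._lock:
            self.claimed_pages -= pages


def place_guest(hosts, pages):
    """Claim on every host in parallel, keep one winner, release the rest."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        results = list(pool.map(lambda h: h.try_claim(pages), hosts))
    winners = [h for h, ok in zip(hosts, results) if ok]
    if not winners:
        return None  # caller falls back to squeezing guests and retrying
    chosen = winners[0]
    for h in winners[1:]:
        h.release_claim(pages)
    return chosen
```

The hosts that also said yes get their claim released right away, so the
only cost of probing everywhere is a short-lived reservation on each box.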

Perhaps the 'claim' call should be called 'probe_and_claim'?

.. snip..
> >>> That code makes certain assumptions - that the guest will not go up/down
> >>> in the ballooning once the toolstack has decreed how much
> >>> memory the guest should use. It also assumes that the operations
> >>> are semi-atomic - and to make it so as much as it can - it executes
> >>> these operations in serial.
> >>> 
> >>> This goes back to the problem statement - if we try to parallelize
> >>> this we run into the problem that the amount of memory we thought
> >>> was free is no longer accurate. The start of this email has a good
> >>> description of some of the issues.
> >> 
> >> Just set max_pages (bad name...) everywhere as needed to make room. Then kick tmem (everywhere, in parallel) to free memory. Wait until enough is free …. Allocate your domain(s, in parallel). If any vcpus become stalled because a tmem guest driver is trying to allocate beyond max_pages, you need to adjust your allocations. As usual.
> > 
> > 
> > Versus just one "reserve" that would remove the need for most of this.
> > That is - if we can not "reserve" we would fall-back to the mechanism you
> > stated, but if there is enough memory we do not have to do the "wait"
> > game (which on a 1TB takes forever and makes launching guests sometimes
> > take minutes) - and can launch the guest without having to worry
> > about slow-path.
> > .. snip.
> 
> The "wait" could be literally zero in a common case. And if not, because there is not enough free ram, the claim would have failed.
> 

Absolutely. And that is the beauty of it. If it fails then we can
decide to pursue other options knowing that there was no race in finding
the value of free memory at all. The other options could be to
squeeze other guests down and try again; or just decide to claim/allocate
the guest on another host altogether.
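
To make the "no race" point concrete, here is a toy model. It is only a
sketch under my own assumptions: `ToyHeap`, `available`, `claim` and
`allocate_claimed` are made-up names and this is not the hypervisor code.
The point is just that the test and the reservation happen under one lock.

```python
import threading


class ToyHeap:
    """Hypothetical stand-in for the hypervisor's free-page accounting."""

    def __init__(self, free_pages):
        self.free_pages = free_pages
        self.claimed = 0  # pages promised to domains that are still being built
        self._lock = threading.Lock()

    def available(self):
        # Snapshot only; it may be stale by the time the caller acts on it.
        with self._lock:
            return self.free_pages - self.claimed

    def claim(self, pages):
        # Test and reserve under one lock: the answer cannot go stale.
        with self._lock:
            if self.free_pages - self.claimed < pages:
                return False
            self.claimed += pages
            return True

    def allocate_claimed(self, pages):
        # Turn previously claimed pages into real allocations.
        with self._lock:
            assert pages <= self.claimed
            self.claimed -= pages
            self.free_pages -= pages


heap = ToyHeap(free_pages=1000)

# Two builders each sample free memory and both see room for 800 pages.
seen_by_a = heap.available()
seen_by_b = heap.available()

# With a claim, only one of them is told "yes"; the other learns right away
# and can squeeze guests or go to another host.
a_ok = heap.claim(800)
b_ok = heap.claim(800)
```

Without the claim, both builders would start allocating on the strength of
the same snapshot and one of them would fail partway through the build.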


> >>> I believe what Dan is saying is that it is not enabled by default.
> >>> Meaning it does not get executed by /etc/init.d/xencommons and
> >>> as such it never gets run (or does it now?) - unless one knows
> >>> about it - or it is enabled by default in a product. But perhaps
> >>> we are both mistaken? Is it enabled by default now on xen-unstable?
> >> 
> >> I'm a bit lost … what is supposed to be enabled? A sharing daemon? A paging daemon? Neither daemon requires wait queue work, batch allocations, etc. I can't figure out what this portion of the conversation is about.
> > 
> > The xenshared daemon.
> That's not in the tree. Unbeknownst to me. Would appreciate to know more. Or is it a symbolic placeholder in this conversation?

OK, I am confused then. I thought there was now a daemon that would take
care of the PoD and swapping? Perhaps it's called something else?
