Re: [RFC v2][PATCH 1/3] docs: design and intended usage for NUMA-aware ballooning

From: David Vrabel <david.vrabel@citrix.com>
To: Yechen Li <lccycc123@gmail.com>
Cc: ufimtseva@gmail.com, dario.faggioli@citrix.com,
	Ian.Jackson@eu.citrix.com, xen-devel@lists.xen.org,
	Ian.Campbell@eu.citrix.com, JBeulich@suse.com
Subject: Re: [RFC v2][PATCH 1/3] docs: design and intended usage for NUMA-aware ballooning
Date: Mon, 19 Aug 2013 13:58:51 +0100
Message-ID: <5212168B.3090007@citrix.com>
In-Reply-To: <1376626416-12901-1-git-send-email-lccycc123@gmail.com>

On 16/08/13 05:13, Yechen Li wrote:
> 
> +### nodemask VNODE\_TO\_PNODE(int vnode) ###
> +
> +This service is provided by the hypervisor (and wired, if necessary, all the
> +way up to the proper toolstack layer or guest kernel), since it is only Xen
> +that knows both the virtual and the physical topologies.

The physical NUMA topology must not be exposed to guests that have a
virtual NUMA topology -- only the toolstack and Xen should know the
mapping between the two.

A guest cannot make sensible use of a machine topology as it may be
migrated to a host with a different topology.

> +## Description of the problem ##
> +
> +Let us use an example. Let's assume that guest _G_ has a virtual 2 vnodes,
> +and that the memory for vnode #0 and #1 comes from pnode #0 and pnode #2,
> +respectively.
> +
> +Now, the user wants to create a new guest, but the system is under high memory
> +pressure, so he decides to try ballooning _G_ down. He sees that pnode #2 has
> +the best chances to accommodate all the memory for the new guest, which would
> +be really good for performance, if only he can make space there. _G_ is the
> +only domain eating some memory from pnode, #2 but, as said above, not all of
> +its memory comes from there.

It is not clear to me that this is the optimal decision.  What
tools/information will be available that the user can use to make
sensible decisions here?  For example, is the current layout available to
tools?

Remember that the "user" in this example is most often some automated
process and not a human.

> +So, right now, the user has no way to specify that he wants to balloon down
> +_G_ in such a way that he will get as much as possible free pages from pnode
> +#2, rather than from pnode #0. He can ask _G_ to balloon down, but there is
> +no guarantee on from what pnode the memory will be freed.
> +
> +The same applies to the ballooning up case, when the user, for some specific
> +reasons, wants to be sure that it is the memory of some (other) specific pnode
> +that will be used.

I would like to see some real world examples of cases where this is
sensible.

In general, I'm not keen on adding ABIs or interfaces that don't solve
real world problems, particularly if they're easy to misuse and end up
with something that is very suboptimal.

> +## NUMA-aware ballooning ##
> +
> +The new NUMA-aware ballooning logic works as follows.
> +
> +There is room, in libxl\_set\_memory\_target() for two more parameters, in
> +addition to the new memory target:

The xenstore interface should be the primary interface being documented.
The libxl interface is secondary and (probably) a consequence of the
xenstore interface.

> +* _pnid_ -- which is the pnode id of which node the user wants to try get some
> +  free memory on
> +* _nodeexact_ -- which is a bool specifying whether or not, in case it is not
> +  possible to reach the new ballooning target only with memory from pnode
> +  _pnid_, the user is fine with using memory from other pnodes.  
> +  If _nodeexact_ is true, it is possible that the new target is not reached; if
> +  it is false, the new target will (probably) be reached, but it is possible
> +  that some memory is freed on pnodes other than _pnid_.
> +
> +To let the ballooning driver know about these new parameters, a new xenstore
> +key exists in ~/memory/target\_nid. So, for a proper NUMA aware ballooning
> +operation to occur, the user should write the proper values in both the keys:
> +~/memory/target\_nid and ~/memory/target.

If we decide we do need such control, I think the xenstore interface
should look more like:

memory/target

  as before

memory/target-by-nid/0

  target for virtual node 0

...

memory/target-by-nid/N

  target for virtual node N

I think this better reflects the goal, which is an adjusted NUMA layout
for the guest, rather than the steps required to reach it (release P
pages from node N).
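
To make the proposed layout concrete, here is a rough sketch of the
key/value pairs a toolstack might write.  The helper name balloon_keys()
is hypothetical, the values are plain integers in whatever unit
memory/target already uses, and nothing here talks to a real xenstore;
it only builds the paths described above.

```python
from typing import Dict


def balloon_keys(per_vnode_target: Dict[int, int]) -> Dict[str, str]:
    """Build the xenstore keys for the layout sketched above.

    per_vnode_target maps a virtual node id to its target.  The overall
    memory/target is written as the sum, so the two always agree.
    (Hypothetical helper, for illustration only.)
    """
    keys = {"memory/target": str(sum(per_vnode_target.values()))}
    for vnode in sorted(per_vnode_target):
        keys["memory/target-by-nid/%d" % vnode] = str(per_vnode_target[vnode])
    return keys


if __name__ == "__main__":
    for path, value in balloon_keys({0: 512, 1: 256}).items():
        print(path, "=", value)
```

Only virtual node ids appear in the paths, so the guest never sees the
physical topology.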

The balloon driver attempts to reach memory/target, whilst simultaneously
trying to reach the individual node targets.  It should prefer to balloon
up/down on the node that is furthest from its node target.

In cases where target and the sum of the target-by-nid/N values don't agree
(or the per-node keys are not present) the balloon driver should use target,
and balloon up/down evenly across all NUMA nodes.

The libxl interface does not necessarily have to match the xenstore
interface, if that's what the initial tools would prefer.

Finally, a style comment: please avoid using only a single gender-specific
pronoun in documentation/comments (i.e., don't always use he/his etc.).
I prefer to use a singular "they" but you could consider "he or she", or
using "he" for some examples and "she" in others.

David
