From: Dario Faggioli <raistlin@linux.it>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
George Dunlap <George.Dunlap@eu.citrix.com>,
Juergen Gross <juergen.gross@ts.fujitsu.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH 10 of 10 [RFC]] xl: Some automatic NUMA placement documentation
Date: Thu, 12 Apr 2012 12:32:09 +0200
Message-ID: <1334226729.28329.20.camel@Solace>
In-Reply-To: <1334221902.16387.45.camel@zakaz.uk.xensource.com>
On Thu, 2012-04-12 at 10:11 +0100, Ian Campbell wrote:
> On Wed, 2012-04-11 at 14:17 +0100, Dario Faggioli wrote:
> > Add some rationale and usage documentation for the new automatic
> > NUMA placement feature of xl.
> >
> > TODO: * Decide whether we want to have things like "Future Steps/Roadmap"
> > and/or "Performances/Benchmarks Results" here as well.
>
> I think these would be better in the list archives and on the wiki
> respectively.
>
Ok, fine. I already posted the link in this thread and will continue to
do so, as I'll put together a blog post and a wiki page about
benchmarks.
As for future steps/roadmap, let's first see what comes out of this
series... :-)
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> >
> > diff --git a/docs/misc/xl-numa-placement.txt b/docs/misc/xl-numa-placement.txt
> > new file mode 100644
> > --- /dev/null
> > +++ b/docs/misc/xl-numa-placement.txt
>
> It looks like you are using something approximating markdown syntax
> here, so you might as well name this xl-numa-placement.markdown and get
> a .html version etc almost for free.
>
Actually, that was another question I had and forgot to ask, i.e., what
format this file should be in. I sort of took inspiration from
xl-disk-configuration.txt and went for a plain text file, but I can of
course go for full-fledged markdown syntax. Thanks.
> > +Of course, if a domain is known to only run on a subset of the physical
> > +CPUs of the host, it is very easy to turn all its memory accesses into
> > +local ones, by just constructing it's node affinity (in Xen) basing on
>
> ^based
>
Ok, to this and to all the typos/English howlers as well. Thanks a lot for
looking into this! :-)
> > + * `nodes = [ '0', '1' ]` and `cpus = "0"`, with CPU 0 within node 0:
> > + (i.e., cpu affinity subset of node affinity):
> > + domain's vcpus can and will only run on host CPU 0. As node affinity
> > + is being explicitly set to host NUMA nodes 0 and 1 --- which includes
> > + CPU 0 --- all the memory access of the domain will be local;
>
> In this case won't some of (half?) the memory come from node 1 and
> therefore be non-local to cpu 0?
>
Oops, yep, you're right, that's not what I meant to write!
> > +
> > + * `nodes = [ '0', '1' ]` and `cpus = "0, 4", with CPU 0 in node 0 but
> > + CPU 4 in, say, node 2 (i.e., cpu affinity superset of node affinity):
> > + domain's vcpus can run on host CPUs 0 and 4, with CPU 4 not being within
> > + the node affinity (explicitly set to host NUMA nodes 0 and 1). The
> > + (credit) scheduler will try to keep memory accesses local by scheduling
> > + the domain's vcpus on CPU 0, but it may not achieve 100% success;
> > +
> > + * `nodes = [ '0', '1' ]` and `cpus = "4"`, with CPU 4 within, say, node 2
>
> These examples might be a little clearer if you defined up front what
> the nodes and cpus were and then used that for all of them?
>
Good idea, I will do that.
> A bunch of what follows would be good to have in the xl or xl.cfg man
> pages too/instead. (I started with this docs patch so I haven't actually
> looked at the earlier ones yet, perhaps this is already the case)
>
The single patches that introduce the various features try to document
them as well, but not with this level of detail. I'm fine with putting
there whatever you think could fit, just let me know, perhaps in the
comments on those patches, or however you like.
> > +
> > + * "auto": automatic placement by means of a not better specified (xl
> > + implementation dependant) algorithm. It is basically for those
> > + who do want automatic placement, but have no idea what policy
> > + or algorithm would be better... <<Just give me a sane default!>>
> > +
> > + * "ffit": automatic placement via the First Fit algorithm, applied checking
> > + the memory requirement of the domain against the amount of free
> > + memory in the various host NUMA nodes;
> > +
> > + * "bfit": automatic placement via the Best Fit algorithm, applied checking
> > + the memory requirement of the domain against the amount of free
> > + memory in the various host NUMA nodes;
> > +
> > + * "wfit": automatic placement via the Worst Fit algorithm, applied checking
> > + the memory requirement of the domain against the amount of free
> > + memory in the various host NUMA nodes;
> >
> > <snip>
> >
> > + * `nodes_policy="auto"` (or `"ffit"`, `"bfit"`, `"wfit"`) and `nodes=2`:
> > + xl will try fitting the domain on the host NUMA nodes by using the
> > + requested policy and only the number of nodes specified in `nodes=`
> > + (2 in this example).
>
> Number of nodes rather than specifically node 2? This is different to
> the examples in the preceding section?
>
It is. I'll try to clarify things as per your suggestion. However,
talking about syntax, here's what the series allows "nodes" and
"nodes_policy" to be:
 * "nodes=":
     - a list (`[ '0', '3' ]`), and in this case the elements of the
       list are the specific nodes you want to use;
     - an integer (`2`), and in this case that is the _number_ of
       nodes you want to use, with the algorithm free to arbitrarily
       decide which ones to pick;
     - the string `"auto"`, and in this case you tell the algorithm:
       <<please, do whatever you like and make me happy>> :-)
 * "nodes_policy=":
     - the string `"auto"`, the same as above;
     - the strings `"ffit"`, `"bfit"` and `"wfit"`, with the meaning
       reported by the doc in the patch.
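
For readers without the patch documentation at hand, here is a minimal
sketch of how the three fit policies named above differ when choosing by
free memory. It is only an illustration: the function name `pick_node`,
the per-node free-memory dictionary, and the restriction to picking a
single node are assumptions made for this example, not the code in the
series.

```python
from typing import Dict, Optional


def pick_node(free_mem: Dict[int, int], needed: int, policy: str = "ffit") -> Optional[int]:
    """Pick one NUMA node able to hold `needed` units of memory.

    Hypothetical helper: `free_mem` maps node id -> free memory.
    Returns the chosen node id, or None if no single node fits.
    """
    # Only nodes with enough free memory are candidates, in node-id order.
    candidates = [n for n in sorted(free_mem) if free_mem[n] >= needed]
    if not candidates:
        return None
    if policy == "ffit":
        # First Fit: the first node (lowest id) that is large enough.
        return candidates[0]
    if policy == "bfit":
        # Best Fit: the node left with the least free memory afterwards.
        return min(candidates, key=lambda n: free_mem[n] - needed)
    if policy == "wfit":
        # Worst Fit: the node left with the most free memory afterwards.
        return max(candidates, key=lambda n: free_mem[n] - needed)
    raise ValueError(f"unknown policy: {policy!r}")


free = {0: 4096, 1: 2048, 2: 8192}
print(pick_node(free, 2000, "ffit"))
print(pick_node(free, 2000, "bfit"))
print(pick_node(free, 2000, "wfit"))
```

With the sample values, First Fit stops at node 0, Best Fit prefers the
tightest node (1), and Worst Fit prefers the emptiest one (2).
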
There is some overlap, but I wanted to make it possible for one to
write just things like:

    nodes = [ '0', '3' ]

or:

    nodes = "auto"

or:

    nodes_policy = "wfit"
    nodes = 2

without introducing too many different options. On the downside, this
could obviously lead to awkward or nonsensical combinations... I tried
to intercept the worst of them during config file parsing, and can
surely push this further.
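
To make the three accepted forms of `nodes` and the kind of check done at
parse time concrete, here is a rough sketch. Everything in it is
hypothetical: the function `interpret_placement`, the returned
dictionaries, and in particular which combinations get rejected (here, an
explicit list together with a policy) are assumptions for illustration,
not what the series actually implements.

```python
from typing import Any, Dict, Optional

POLICIES = {"auto", "ffit", "bfit", "wfit"}


def interpret_placement(nodes: Any = None, nodes_policy: Optional[str] = None) -> Dict[str, Any]:
    """Classify a (nodes, nodes_policy) pair taken from a guest config."""
    if nodes_policy is not None and nodes_policy not in POLICIES:
        raise ValueError(f"unknown nodes_policy: {nodes_policy!r}")

    if isinstance(nodes, list):
        # A list names the specific nodes to use.
        # Assumption: a policy has nothing left to decide, so reject it.
        if nodes_policy is not None:
            raise ValueError("explicit node list conflicts with nodes_policy")
        return {"mode": "explicit", "nodes": [int(n) for n in nodes]}

    if isinstance(nodes, int) and not isinstance(nodes, bool):
        # An integer is the *number* of nodes; the policy picks which ones.
        if nodes < 1:
            raise ValueError("number of nodes must be positive")
        return {"mode": "count", "count": nodes, "policy": nodes_policy or "auto"}

    if nodes == "auto" or (nodes is None and nodes_policy is not None):
        # Fully automatic placement, optionally with a named policy.
        return {"mode": "auto", "policy": nodes_policy or "auto"}

    if nodes is None:
        # Neither option given: no placement requested.
        return {"mode": "none"}

    raise ValueError(f"unsupported nodes value: {nodes!r}")


print(interpret_placement(nodes=["0", "3"]))
print(interpret_placement(nodes="auto"))
print(interpret_placement(nodes=2, nodes_policy="wfit"))
```

The three calls at the end mirror the three config snippets above; a
stricter or looser set of rejected combinations would only change the
`raise` branches.
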
So the important question here is, besides the fact that I'll try to
clarify things better, do you think the interface is both comprehensive
and clear enough? Or should we think of something different?
Thanks a lot again and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)