From: Dario Faggioli
To: Ian Campbell
Cc: Andre Przywara, Stefano Stabellini, George Dunlap, Juergen Gross,
    Ian Jackson, "xen-devel@lists.xen.org", Jan Beulich
Subject: Re: [PATCH 10 of 10 [RFC]] xl: Some automatic NUMA placement documentation
Date: Thu, 12 Apr 2012 12:32:09 +0200
Message-ID: <1334226729.28329.20.camel@Solace>
In-Reply-To: <1334221902.16387.45.camel@zakaz.uk.xensource.com>

On Thu, 2012-04-12 at 10:11 +0100, Ian Campbell wrote:
> On Wed, 2012-04-11 at 14:17 +0100, Dario Faggioli wrote:
> > Add some rationale and usage documentation for the new automatic
> > NUMA placement feature of xl.
> > 
> > TODO: * Decide whether we want to have things like "Future Steps/Roadmap"
> >         and/or "Performances/Benchmarks Results" here as well.
> 
> I think these would be better in the list archives and on the wiki
> respectively.
> 
Ok, fine. I already posted the link in this thread, and I will keep doing
so as I put together a blog post and a wiki page about the benchmarks. As
for future steps/roadmap, let's first see what comes out of this
series... :-)

> > Signed-off-by: Dario Faggioli
> > 
> > diff --git a/docs/misc/xl-numa-placement.txt b/docs/misc/xl-numa-placement.txt
> > new file mode 100644
> > --- /dev/null
> > +++ b/docs/misc/xl-numa-placement.txt
> 
> It looks like you are using something approximating markdown syntax
> here, so you might as well name this xl-numa-placement.markdown and get
> a .html version etc almost for free.
> 
Actually, that was another question I had and forgot to ask: what format
should this file be in? I took inspiration from xl-disk-configuration.txt
and went for a plain text file, but I can of course switch to full-fledged
markdown syntax. Thanks.

> > +Of course, if a domain is known to only run on a subset of the physical
> > +CPUs of the host, it is very easy to turn all its memory accesses into
> > +local ones, by just constructing it's node affinity (in Xen) basing on
> 
>                                                                  ^based
> 
Ok, to this and to all the other typos/English howlers as well. Thanks a
lot for looking into this! :-)

> > + * `nodes = [ '0', '1' ]` and `cpus = "0"`, with CPU 0 within node 0:
> > +   (i.e., cpu affinity subset of node affinity):
> > +   domain's vcpus can and will only run on host CPU 0. As node affinity
> > +   is being explicitly set to host NUMA nodes 0 and 1 --- which includes
> > +   CPU 0 --- all the memory access of the domain will be local;
> 
> In this case won't some of (half?) the memory come from node 1 and
> therefore be non-local to cpu 0?
> 
Oops, yep, you're right, that's not what I meant to write!

> > +
> > + * `nodes = [ '0', '1' ]` and `cpus = "0, 4", with CPU 0 in node 0 but
> > +   CPU 4 in, say, node 2 (i.e., cpu affinity superset of node affinity):
> > +   domain's vcpus can run on host CPUs 0 and 4, with CPU 4 not being within
> > +   the node affinity (explicitly set to host NUMA nodes 0 and 1).
> > +   The (credit) scheduler will try to keep memory accesses local by scheduling
> > +   the domain's vcpus on CPU 0, but it may not achieve 100% success;
> > +
> > + * `nodes = [ '0', '1' ]` and `cpus = "4"`, with CPU 4 within, say, node 2
> 
> These examples might be a little clearer if you defined up front what
> the nodes and cpus were and then used that for all of them?
> 
Good idea, I will do that.

> A bunch of what follows would be good to have in the xl or xl.cfg man
> pages too/instead. (I started with this docs patch so I haven't actually
> looked at the earlier ones yet, perhaps this is already the case)
> 
The individual patches that introduce the various features try to
document them as well, but not at this level of detail. I'm fine with
putting there whatever you think fits; just let me know, perhaps in your
comments on those patches, or however you prefer.

> > +
> > + * "auto": automatic placement by means of a not better specified (xl
> > +         implementation dependant) algorithm. It is basically for those
> > +         who do want automatic placement, but have no idea what policy
> > +         or algorithm would be better... <>
> > +
> > + * "ffit": automatic placement via the First Fit algorithm, applied checking
> > +         the memory requirement of the domain against the amount of free
> > +         memory in the various host NUMA nodes;
> > +
> > + * "bfit": automatic placement via the Best Fit algorithm, applied checking
> > +         the memory requirement of the domain against the amount of free
> > +         memory in the various host NUMA nodes;
> > +
> > + * "wfit": automatic placement via the Worst Fit algorithm, applied checking
> > +         the memory requirement of the domain against the amount of free
> > +         memory in the various host NUMA nodes;
> > 
> > 
> > + * `nodes_policy="auto"` (or `"ffit"`, `"bfit"`, `"wfit"`) and `nodes=2`:
> > +   xl will try fitting the domain on the host NUMA nodes by using the
> > +   requested policy and only the number of nodes specified in `nodes=`
> > +   (2 in this example).
> 
> Number of nodes rather than specifically node 2? This is different to
> the examples in the preceding section?
> 
It is. I'll try to clarify things as per your suggestion. However, talking
about syntax, here's what the series allows "nodes" and "nodes_policy" to
be:

 * "nodes=":
    - a list (`[ '0', '3' ]`): the elements of the list are the specific
      nodes you want to use;
    - an integer (`2`): that is the _number_ of nodes you want to use,
      with the algorithm free to decide arbitrarily which ones to pick;
    - the string `"auto"`: in this case you tell the algorithm: <> :-)

 * "nodes_policy=":
    - the string `"auto"`, meaning the same as above;
    - the strings `"ffit"`, `"bfit"` and `"wfit"`, with the meaning
      described by the doc in the patch.

There is some overlap, but I wanted to make it possible to write just
things like:

    nodes = [ '0', '3' ]

or:

    nodes = "auto"

or:

    nodes_policy = "wfit"
    nodes = 2

without introducing too many different options. On the downside, this
could obviously lead to awkward or nonsensical combinations...
I tried to catch the worst of those combinations during config file
parsing, and can certainly push this further.

So the important question here is, besides the fact that I'll try to
explain things better: do you think the interface is both comprehensive
and clear enough? Or should we think of something different?

Thanks a lot again and regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel