From: Dario Faggioli
Subject: Re: NUMA-aware VM placement in Xen
Date: Fri, 24 Feb 2012 11:50:45 +0100
Message-ID: <1330080645.5034.96.camel@Abyss>
In-Reply-To: <4F476396.60802@eu.citrix.com>
References: <1330078323.5034.73.camel@Abyss> <4F476396.60802@eu.citrix.com>
To: George Dunlap
Cc: "Tim (Xen.org)", xen-devel, "Keir (Xen.org)", Ian Campbell

On Fri, 2012-02-24 at 10:16 +0000, George Dunlap wrote:
> > Basically, given alloc_domheap_pages uses d->node_affinity for deciding
> > from which node(s) to actually take memory from, I was planning to
> > either use the same mask or build a new one with similar purposes, the
> > problem being _where_ to populate it with the proper nodes.
> > I'm now looking at xc_domain_setmaxmem-->do_domctl(XEN_DOMCTL_max_mem),
> > although I think it's too early, and I'd end up guessing wrt a lot of
> > aspects... But considering xm/xend was doing the same even earlier (at
> > least I think)...
> So the first question is, where should the decision about NUMA placement
> be made, and the second is how that level should implement it.
>
Yes, indeed.

> Doing it at the libxc level I think is not right.
>
Ok, same here. Just to be sure I understood what you're saying: if you
are referring to xc_domain_setmaxmem, then since I'd end up doing the
work in do_domctl, it would actually live in Xen. Either way, it still
wouldn't look the way I wanted it to (see below).
:-(

> It seems to me we
> have two options:
> * Have libxl do the NUMA placement on behalf of the toolstack. In that
> case, the libxl_domain_create_new function should look at the available
> memory, the NUMA layout, &c, and then set d->node_affinity before
> calling xc_hvm_build.
>
This can be done. If I got it correctly, it is more or less what xm/xend
already does.

> * Have the toolstack do it. In this case, you'd be modifying xl to set
> d->node_affinity before calling libxl's domain creation function.
>
I'm not sure I'm getting this right... It seems very similar to the one
above.

> Do those options work? Let me know if I've misunderstood anything.
>
I think they can be implemented. Whether they "work" depends on how we
define "work". :-D

That's why I was struggling to put this in the hypervisor rather than in
the toolstack: I really think it should live there if possible. For
example, it would be nice for the decision to be protected by the proper
locking. I mean, what's the point of checking the amount of free memory
in a node somewhere in (lib)xl if, by the time the actual allocation
happens (in Xen), that value might be completely different (due to
concurrent domain creation, destruction, etc.)?

> Any thoughts one way or the other from anyone?
>
Any ideas on how to put that thing _in_ Xen?

> I'd be tempted to have it be optional -- you can set "numa=auto" and the
> domain creation function will do the simple thing; or you can set
> "numa=manual" and have the toolstack / config file set the nodes
> manually. That would translate pretty well to config files as well --
> more "set the knobs" administrators could set the numa layout in the
> config file manually if they wanted.
>
I agree, and that was already my plan: configurable and per-domain. I
think the config file, which supports cpupools and vcpu-pinning, already
offers almost all the facilities for manually deploying a VM that
reflects a specific NUMA layout.
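
To make the locking concern above concrete, here is a minimal sketch of
the difference between "check free memory, then allocate later" and
"pick a node and reserve in one locked step". It is a toy model in
Python, not Xen or libxl code: NodePool, snapshot and place_and_reserve
are hypothetical names, and the "most free memory wins" rule is only an
assumed stand-in for whatever a "numa=auto" policy would really do.

```python
import threading


class NodePool:
    """Toy model of per-node free memory guarded by one lock (hypothetical)."""

    def __init__(self, free_kb):
        self.free_kb = dict(free_kb)  # node id -> free memory in KiB
        self._lock = threading.Lock()

    def snapshot(self):
        """What a toolstack-side query would see; stale as soon as it returns."""
        with self._lock:
            return dict(self.free_kb)

    def place_and_reserve(self, need_kb):
        """Choose a node and reserve the memory in a single critical section.

        Returns the chosen node id, or None if no single node can hold
        the whole request.
        """
        with self._lock:
            candidates = [n for n, free in self.free_kb.items()
                          if free >= need_kb]
            if not candidates:
                return None
            # Assumed policy: most free memory first, lowest node id on ties.
            node = max(candidates, key=lambda n: (self.free_kb[n], -n))
            self.free_kb[node] -= need_kb
            return node


pool = NodePool({0: 4096, 1: 8192})
stale = pool.snapshot()                 # taken before any placement
first = pool.place_and_reserve(6000)    # fits on node 1 only
second = pool.place_and_reserve(6000)   # no node has 6000 KiB left
```

After the first reservation, the earlier snapshot still reports 8192 KiB
free on node 1, so a decision based on it would pick a node that can no
longer hold the domain. That is the case for making the choice where the
allocation itself is serialised.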

What I was thinking of adding was the "numa=auto" (or whatever) switch,
so that if one does not (want to) specify cpupools or pinning, the VM
still gets a NUMA-sensible placement.

But anyway, there's no problem adding other knobs if they're considered
worthwhile; the problem is the other part! :-P

Thanks and Regards,
Dario

-- 
<> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel