From: Dario Faggioli
Subject: Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
Date: Fri, 19 Oct 2012 01:20:33 +0200
Message-ID: <1350602433.26152.106.camel@Solace>
To: George Dunlap
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, Juergen Gross, Ian Jackson, "xen-devel@lists.xen.org"

On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli wrote:
> > In fact, among placement candidates with the same number of nodes, the
> > closer the various nodes are to each others, the better the performances
> > for a domain placed there.
>
> Looks good overall -- my only worry is the N^2 nature of the
> algorithm. We're already doing some big combinatorial thing to
> generate the candidates, right?
>
It is, with N being the number of nodes. We discussed this thoroughly a
couple of months ago and reached consensus that N will stay below 8 for
the next 5 years (and probably longer). :-)

In any case, if something really unexpected happens and N jumps to
anything bigger than 16, the placement algorithm won't even start, and
we'll never reach this point!
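
For anyone wondering where the N^2 comes from, here is a minimal sketch
(in Python, and _not_ the actual libxl code) of scoring a candidate by
the sum of the distances between each pair of its nodes. The distance
table, the function names `candidate_distance` and `closest`, and the
way ties are broken are all made up for illustration; the patch itself
may well compute and use the distances differently.

```python
from itertools import combinations

# Hypothetical SLIT-style distance table for a 4-node host:
# 10 means "local", larger values mean "further away".
DISTANCES = [
    [10, 20, 30, 30],
    [20, 10, 30, 30],
    [30, 30, 10, 20],
    [30, 30, 20, 10],
]


def candidate_distance(nodes, distances=DISTANCES):
    """Sum the distance over every unordered pair of nodes in a candidate.

    A candidate with k nodes has k*(k-1)/2 pairs, hence the N^2 growth.
    """
    return sum(distances[a][b] for a, b in combinations(sorted(nodes), 2))


def closest(candidates, distances=DISTANCES):
    """Among candidates with the same number of nodes, pick the tightest one."""
    return min(candidates, key=lambda nodes: candidate_distance(nodes, distances))


if __name__ == "__main__":
    two_node_candidates = [(0, 2), (2, 3)]
    for cand in two_node_candidates:
        print(cand, candidate_distance(cand))
    print("best:", closest(two_node_candidates))
```

With N capped at 16, the inner sum never looks at more than 120 pairs
per candidate.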

Moreover, given the numbers we're playing with, I don't think this
specific patch adds much complexity. We already have the function that
counts the number of vCPUs bound to a candidate (as xend did), which
costs Ndoms*Nvcpus, and we're very likely going to have many more
domains than nodes. :-)

> And now we're doing N^2 for each
> candidate?
>
Again, yes, but that turns the cost from Ndoms*Nvcpus into
Ndoms*Nvcpus+Nnodes^2, which is still dominated by the first term. IIRC,
Andre tried to start more than 50 domains with 2 vCPUs each on an 8-node
system, which means 50*2 vs 8*8.

> Suppose we get an ARM system with 4096 cores and 128 NUMA
> nodes? If Xen 4.4 doesn't come out until March 2014, there will still
> be distros using 4.3 through mid-2015.
>
Right, but I really don't think that monster is actually made of 4096
cores arranged in 128 _NUMA_ nodes on which you run a single instance of
the hypervisor. I also recall hearing those numbers and the word "node",
but I really think they referred to a cluster architecture, where "a
node" means something more like "a server", each one running its own
copy of Xen (although they'll all be packed together in the same rack,
talking via some super-fast interconnect). I'm pretty sure I remember
Stefano speculating about the need for some orchestration layer (like
{Cloud,Open}Stack) _within_ those big irons to deal with exactly that...
Stefano, am I talking nonsense? :-D

Finally, allow me to say that the whole placement algorithm already
interacts quite nicely with cpupools. Thus, even in the unlikely event
of an actual 128 NUMA node machine, you can have, say, 16 cpupools with
8 nodes each (or vice versa), and the algorithm will be back to dealing
with _no_more_than_ 8 (or 16) nodes. Yes, right now this would require
someone to manually set up the pools and decide which domain to put
where.
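
To put rough numbers on the two points above (the per-candidate cost and
the effect of splitting the nodes into cpupools), here is a
back-of-the-envelope sketch in Python. The function names are invented,
the even split of nodes across pools is my assumption, and none of this
is libxl code; it only redoes the arithmetic from this mail.

```python
def per_candidate_cost(n_doms, n_vcpus, n_nodes):
    """Return the two terms of the per-candidate cost discussed above.

    vcpu_term: walking every vCPU of every domain (Ndoms * Nvcpus).
    dist_term: visiting every ordered pair of nodes (Nnodes ** 2).
    """
    return n_doms * n_vcpus, n_nodes ** 2


def nodes_per_pool(total_nodes, n_pools):
    """Largest number of nodes in any pool when nodes are split evenly.

    Uses ceiling division, so this is the most nodes the placement
    would have to consider inside a single pool.
    """
    return -(-total_nodes // n_pools)


if __name__ == "__main__":
    # Andre's case: 50 domains, 2 vCPUs each, 8 nodes.
    vcpu_term, dist_term = per_candidate_cost(n_doms=50, n_vcpus=2, n_nodes=8)
    print(vcpu_term, dist_term)  # 100 64

    # The hypothetical 128-node host: whole machine vs. one of 16 pools.
    whole = per_candidate_cost(50, 2, 128)[1]
    pooled = per_candidate_cost(50, 2, nodes_per_pool(128, 16))[1]
    print(whole, pooled)  # 16384 64
```

So on the whole 128-node machine the distance term would indeed take
over, while inside an 8-node pool it is back to 64.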

However, it would be very easy to add, at that point, something that
does this pooling and coarser placement automatically (and quickly). In
fact, we can even think about having it for 4.3, if you really believe
it's necessary.

> I seem to remember having a discussion about this issue already, but I
> can't remember what the outcome was...
>
Yep, we did, and the outcome was just what I tried to summarize
above. :-)

Thanks and Regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)