From: Dario Faggioli <dario.faggioli@citrix.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
Date: Fri, 19 Oct 2012 01:20:33 +0200	[thread overview]
Message-ID: <1350602433.26152.106.camel@Solace> (raw)
In-Reply-To: <CAFLBxZbv4vYoJnSSjPnDQPZdOTqPMp0eoyLvFW8Vgo+jYQR4iw@mail.gmail.com>



On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > In fact, among placement candidates with the same number of nodes, the
> > closer the various nodes are to each other, the better the performance
> > for a domain placed there.
> 
> Looks good overall -- my only worry is the N^2 nature of the
> algorithm.  We're already doing some big combinatorial thing to
> generate the candidates, right?  
>
It is, with N being the number of nodes, something we already discussed
thoroughly a couple of months ago, reaching consensus on the fact that N
will stay below 8 for the next 5 (and probably even more) years. :-)

In any case, if something really unexpected happens and N jumps to
anything bigger than 16, the placement algorithm won't even start, and
we'll never reach this point!
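
Just to make that concrete (the names below are made up for
illustration, they're not the actual libxl identifiers), the guard is
conceptually nothing more than this:

    /* Illustrative sketch of the safety valve: above this threshold
     * the automatic placement is skipped entirely, so the
     * combinatorial candidate generation never even starts. */
    #define PLACEMENT_MAX_NODES 16

    static int numa_placement_feasible(int nr_nodes)
    {
        return nr_nodes <= PLACEMENT_MAX_NODES;
    }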

Moreover, given the numbers we're playing with, I don't think this
specific patch adds much complexity: we already have the function that
counts the number of vCPUs (as xend did) bound to a candidate, which is
Ndoms*Nvcpus, and we're very likely going to have many more domains than
nodes. :-)
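
For clarity, here's a minimal sketch of that counting step (again,
illustrative only: struct dom_info, nr_vcpus_on_candidate and the
per-vCPU node masks are placeholders, not the actual libxl data
structures):

    #include <stdint.h>

    struct dom_info {
        int nr_vcpus;
        const uint64_t *vcpu_nodemask;  /* one node mask per vCPU */
    };

    /* O(Ndoms*Nvcpus): for each domain, count how many of its vCPUs
     * are bound to nodes belonging to the candidate. */
    static int nr_vcpus_on_candidate(const struct dom_info *doms,
                                     int ndoms, uint64_t candidate_mask)
    {
        int count = 0;
        for (int d = 0; d < ndoms; d++)
            for (int v = 0; v < doms[d].nr_vcpus; v++)
                if (doms[d].vcpu_nodemask[v] & candidate_mask)
                    count++;
        return count;
    }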

> And now we're doing N^2 for each
> candidate? 
>
Again, yes, but that turns it from Ndoms*Nvcpus into
Ndoms*Nvcpus + Nnodes^2, which is still dominated by the first term.
IIRC, Andre tried to start more than 50 domains with 2 vCPUs each on an
8-node system, which means 50*2 vs. 8*8.
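
That is, 50*2 = 100 for the existing term against at most 8*8 = 64 for
the new one, per candidate. And the quadratic part itself is just
something like the following (hypothetical names; it assumes a flat
Nnodes*Nnodes distance table, SLIT-style):

    #include <stdint.h>

    /* O(Nnodes^2): sum the distances between every pair of nodes in a
     * candidate, so that among candidates with the same number of
     * nodes, the one whose nodes are closest to each other wins. */
    static unsigned sum_node_distances(const unsigned *distance,
                                       int nr_nodes,
                                       uint64_t candidate_mask)
    {
        unsigned sum = 0;
        for (int i = 0; i < nr_nodes; i++) {
            if (!(candidate_mask & (1ULL << i)))
                continue;
            for (int j = 0; j < nr_nodes; j++)
                if (candidate_mask & (1ULL << j))
                    sum += distance[i * nr_nodes + j];
        }
        return sum;
    }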

> Suppose we get an ARM system with 4096 cores and 128 NUMA
> nodes?  If Xen 4.4 doesn't come out until March 2014, there will still
> be distros using 4.3 through mid-2015.
> 
Right, but I really don't think such a monster would actually be made
of 4096 cores arranged in 128 _NUMA_ nodes all running the same instance
of the hypervisor.

I also recall hearing those numbers and the use of the word "node", but
I really think they referred to a cluster architecture where "a node"
means something more like "a server", each one running its own copy of
Xen (although they'd all be packed together in the same rack, talking
via some super-fast interconnect).
I'm pretty sure I remember Stefano speculating about the need to use
some orchestration layer (like {Cloud,Open}Stack) _within_ those big
irons to deal with exactly that... Stefano, am I talking nonsense? :-D

Finally, allow me to say that the whole placement algorithm already
interacts quite nicely with cpupools. Thus, even in the unlikely event
of an actual 128-NUMA-node machine, you could have, say, 16 cpupools
with 8 nodes each (or vice versa), and the algorithm would be back to
dealing with _no_more_than_ 8 (or 16) nodes. Yes, right now this would
require someone to manually set up the pools and decide which domain to
put where. However, at that point it would be very easy to add something
that does this pooling and the more coarse placing automatically (and
quickly), as sketched below. In fact, we can even think about having it
for 4.3, if you really believe it's necessary.
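
To give an idea of how simple that coarse step could be (completely
made-up code, of course, not anything in the tree), think of something
like:

    /* Hypothetical two-level scheme: pick a cpupool coarsely (here,
     * the one hosting the fewest domains), then let the existing
     * placement algorithm run within that pool, where it never sees
     * more than NODES_PER_POOL nodes, even on a 128-node machine. */
    #define NR_POOLS       16
    #define NODES_PER_POOL 8

    static int pick_pool(const int domains_in_pool[NR_POOLS])
    {
        int best = 0;
        for (int p = 1; p < NR_POOLS; p++)
            if (domains_in_pool[p] < domains_in_pool[best])
                best = p;
        return best; /* then run placement over this pool's nodes */
    }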

> I seem to remember having a discussion about this issue already, but I
> can't remember what the outcome was...
> 
Yep, we did, and the outcome was exactly what I tried to summarize
above. :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

