Re: [PATCH 1 of 3 v4/leftover] libxl: enable automatic placement of guests on NUMA nodes

From: Dario Faggioli <raistlin@linux.it>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH 1 of 3 v4/leftover] libxl: enable automatic placement of guests on NUMA nodes
Date: Wed, 18 Jul 2012 00:15:20 +0200
Message-ID: <1342563320.11794.19.camel@Abyss>
In-Reply-To: <20485.35590.105351.434937@mariner.uk.xensource.com>


On Tue, 2012-07-17 at 16:55 +0100, Ian Jackson wrote: 
> Dario Faggioli writes ("[PATCH 1 of 3 v4/leftover] libxl: enable automatic placement of guests on NUMA nodes"):
> > If a domain does not have a VCPU affinity, try to pin it automatically to some
> > PCPUs. This is done taking into account the NUMA characteristics of the host.
> > In fact, we look for a combination of host's NUMA nodes with enough free memory
> > and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes.
> 
> Thanks for this admirably clear patch.
> 
Thanks to you for looking at it.

> Can you please rewrap your commit messages to around 70 lines ?  Many
> VCSs indent them in the log in some situations, and as you see here
> mail programs indent them when quoting too.
> 
That should have happened in the first place. As I'm reposting, I'll
pay extra attention to that, sorry.

> > +/* Subtract two values and translate the result in [0, 1] */
> > +static double normalized_diff(double a, double b)
> > +{
> > +#define max(a, b) (a > b ? a : b)
> > +    if (!a && a == b)
> > +        return 0.0;
> > +    return (a - b) / max(a, b);
> > +}
> 
> 1. This macro max() should be in libxl_internal.h.
> 2. It should be MAX so people are warned it's a macro
> 3. It should have all the necessary ()s for macro precedence safety
> 
Ok, will do that.
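
For illustration, the three points above could look roughly like this.
It is only a sketch: the exact spelling and placement of the macro in
libxl_internal.h is an assumption, and the assert() is added here just
to document the expected inputs. Note that a macro like this still
evaluates each argument twice, and that for non-negative inputs the
result of normalized_diff() actually lies in [-1, 1].

```c
#include <assert.h>

/*
 * Upper-case name so callers know it is a macro; every argument and the
 * whole expansion are parenthesised for precedence safety.
 * Assumption: in the real tree this would live in libxl_internal.h.
 */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Subtract two non-negative values and scale the result by the larger one. */
static double normalized_diff(double a, double b)
{
    assert(a >= 0.0 && b >= 0.0);   /* sketch-only sanity check */
    if (a == 0.0 && b == 0.0)
        return 0.0;                 /* avoid dividing by zero */
    return (a - b) / MAX(a, b);
}
```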

> > +    double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb);
> > +    double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains);
> > +
> > +    if (c1->nr_nodes != c2->nr_nodes)
> > +        return c1->nr_nodes - c2->nr_nodes;
> > +
> > +    return sign(3*freememkb_diff + nrdomains_diff);
> 
> The reason you need what sign() does is that you need to convert from
> double to int, I guess.
> 
Mostly for that, and to make what's happening even clearer.
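
A minimal sketch of why the helper matters: the weighted sum is a
double that often falls strictly between -1 and 1, and a plain cast to
int would truncate it to 0, making different candidates compare as
equal. The struct below is a hypothetical stand-in holding only the
fields used in the quoted hunk, and the body of sign() is an assumption
about what such a helper would do.

```c
#include <assert.h>

/* Hypothetical stand-in for the candidate type; only the fields used here. */
struct candidate {
    double free_memkb;
    int nr_domains;
    int nr_nodes;
};

static double normalized_diff(double a, double b)
{
    double m = a > b ? a : b;
    if (a == 0.0 && b == 0.0)
        return 0.0;
    return (a - b) / m;
}

/* Map a double to -1, 0 or 1 (assumed behaviour of the helper). */
static int sign(double x)
{
    return (x > 0.0) - (x < 0.0);
}

/* Negative result: c1 is the better candidate; positive: c2 is. */
static int candidate_cmp(const struct candidate *c1,
                         const struct candidate *c2)
{
    double freememkb_diff = normalized_diff(c2->free_memkb, c1->free_memkb);
    double nrdomains_diff = normalized_diff(c1->nr_domains, c2->nr_domains);

    /* Fewer nodes always wins, regardless of the other metrics. */
    if (c1->nr_nodes != c2->nr_nodes)
        return c1->nr_nodes - c2->nr_nodes;

    /* Free memory weighs three times as much as the domain count. */
    return sign(3 * freememkb_diff + nrdomains_diff);
}
```

For instance, a weighted sum of 0.7 gives sign(0.7) == 1, whereas
(int)0.7 is 0.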

> > +
> > +    /*
> > +     * Check if the domain has any CPU affinity. If not, try to build up one.
> > +     * In case numa_place_domain() find at least a suitable candidate, it will
> > +     * affect info->cpumap accordingly; if it does not, it just leaves it
> > +     * as it is. This means (unless some weird error manifests) the subsequent
> > +     * call to libxl_set_vcpuaffinity_all() will do the actual placement,
> > +     * whatever that turns out to be.
> > +     */
> > +    if (libxl_bitmap_is_full(&info->cpumap)) {
> > +        int rc = numa_place_domain(gc, info);
> > +        if (rc)
> > +            return rc;
> > +    }
> 
> I guess it would be preferable to do this only if the bitmap was full
> by default, so that setting the bitmap explicitly to all cpus still
> works.
> 
> I'm not sure that that's essential to have, though.
> 
I was thinking about this just these days, and I think it should be
exactly as you say, since one needs a mechanism for disabling this thing
as a whole. I really don't think it would take too much to put something
together, even as a separate, future patch, if these get checked in.
Thanks.

> > +            /*
> > +             * Conditions are met, we can add this combination to the
> > +             * NUMA placement candidates list. We first make sure there
> > +             * is enough space in there, and then we initialize the new
> > +             * candidate element with the node map corresponding to the
> > +             * combination we are dealing with. Memory allocation for
> > +             * expanding the array that hosts the list happens in chunks
> > +             * equal to the number of NUMA nodes in the system (to
> > +             * avoid allocating memory each and every time we find a
> > +             * new candidate).
> > +             */
> > +            if (*nr_cndts == array_size)
> > +                array_size += nr_nodes;
> > +            GCREALLOC_ARRAY(new_cndts, array_size);
> 
> This part of the algorithm is quadratic in the number of combinations
> divided by the number of nodes.  So the algorithm is
>    O( (  C( nr_nodes, min_nodes ) / min_nodes  )^2 )
> which is quite bad really.
> 
I might well be wrong, but I was thinking of it as something like this:

O( C(nr_nodes,(nr_nodes/2)) * nr_nodes )

That's because the external while() is repeated, at most, nr_nodes times
(if min_nodes=1 and max_nodes=nr_nodes). Each of these steps contains a
for() which visits all the combinations of that size, the maximum number
of which ISTR to be C(nr_nodes,(nr_nodes/2)).

I'm not sure what you meant by putting min_nodes up there in your
formula (min_nodes is likely to be 1 in most cases...), so I can't
get numbers and compare them, but it looked (looks?) a bit less bad to
me... Or did I make some obvious mistake I'm not seeing right now? :-(
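
To put numbers on the bound above, here is a small sketch that counts
combinations exactly, assuming the worst case described (the outer loop
runs for every size from min_nodes to max_nodes, and the inner loop
visits every combination of that size). The function names are made up
for this example and are not part of libxl.

```c
#include <assert.h>
#include <stdint.h>

/* C(n, k), computed iteratively; every intermediate division is exact. */
static uint64_t binomial(unsigned n, unsigned k)
{
    uint64_t r = 1;

    if (k > n)
        return 0;
    if (k > n - k)
        k = n - k;                  /* C(n, k) == C(n, n - k) */
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

/* Exact count of combinations visited over all sizes in the range. */
static uint64_t combinations_visited(unsigned nr_nodes,
                                     unsigned min_nodes,
                                     unsigned max_nodes)
{
    uint64_t total = 0;

    assert(min_nodes >= 1 && max_nodes <= nr_nodes);
    for (unsigned k = min_nodes; k <= max_nodes; k++)
        total += binomial(nr_nodes, k);
    return total;
}

/* The bound from the text: nr_nodes * C(nr_nodes, nr_nodes / 2). */
static uint64_t upper_bound(unsigned nr_nodes)
{
    return (uint64_t)nr_nodes * binomial(nr_nodes, nr_nodes / 2);
}
```

For nr_nodes = 8, combinations_visited(8, 1, 8) gives 255, against a
bound of 560 from upper_bound(8).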

> At the very least this needs to be an exponential allocation, eg
>   +                array_size += nr_nodes + array_size;
> 
> But if you didn't insist on creating the whole list and sorting it,
> you would avoid this allocation entirely, wouldn't you ?
> 
I'll kill the separation between candidate identification and sorting:
that's easy, quick, and will make George happy as well. :-)
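
Roughly, merging the two phases means comparing each new candidate with
the best one seen so far, so no array (and no reallocation) is needed.
The sketch below only illustrates that idea: the struct, the simplified
ordering (fewer nodes first, then more free memory) and all the names
are assumptions made for the example, not the code that will be posted.

```c
#include <assert.h>

/* Hypothetical, simplified candidate: not the structure from the patch. */
struct candidate {
    int nr_nodes;
    double free_memkb;
};

/* Simplified ordering for this sketch; negative means c1 is better. */
static int candidate_cmp(const struct candidate *c1,
                         const struct candidate *c2)
{
    if (c1->nr_nodes != c2->nr_nodes)
        return c1->nr_nodes - c2->nr_nodes;
    return (c1->free_memkb < c2->free_memkb) -
           (c1->free_memkb > c2->free_memkb);
}

struct best_tracker {
    struct candidate best;
    int found;              /* 0 until the first candidate is offered */
};

/* Keep c only if it beats the best candidate seen so far. */
static void tracker_offer(struct best_tracker *t, const struct candidate *c)
{
    assert(t && c);
    if (!t->found || candidate_cmp(c, &t->best) < 0) {
        t->best = *c;
        t->found = 1;
    }
}

/* Usage: offer candidates one at a time, as the combination loop would. */
static struct candidate demo_best(void)
{
    const struct candidate seen[] = {
        { 2, 8192.0 }, { 1, 2048.0 }, { 1, 4096.0 },
    };
    struct best_tracker t = { .found = 0 };

    for (unsigned i = 0; i < sizeof(seen) / sizeof(seen[0]); i++)
        tracker_offer(&t, &seen[i]);
    return t.best;
}
```

Storage stays constant however many combinations are visited; the cost
is one comparison per candidate.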

> Should we be worried that this algorithm will be too slow even if it
> involves just
>   O( C(nr_nodes,min_nodes) )
> iterations ?
> 
I'm commenting about this in the other thread you opened...

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



