Re: [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes
Date: Thu, 21 Jun 2012 17:16:01 +0100	[thread overview]
Message-ID: <4FE348C1.5030407@eu.citrix.com> (raw)
In-Reply-To: <81f18379bb3d4d9397d1.1339779876@Solace>

On 15/06/12 18:04, Dario Faggioli wrote:
> If a domain does not have a VCPU affinity, try to pin it automatically to some
> PCPUs. This is done taking into account the NUMA characteristics of the host.
> In fact, we look for a combination of host's NUMA nodes with enough free memory
> and number of PCPUs for the new domain, and pin it to the VCPUs of those nodes.
>
> Once we know which ones, among all the possible combinations, represents valid
> placement candidates for a domain, use some heuistics for deciding which is the
> best. For instance, smaller candidates are considered to be better, both from
> the domain's point of view (fewer memory spreading among nodes) and from the
> system as a whole point of view (fewer memoy fragmentation).  In case of
> candidates of equal sizes (i.e., with the same number of nodes), the one with
> the greater amount of memory wins, as this is also good for keeping memory
> fragmentation under control.
>
> This all happens internally to libxl, and no API for driving the mechanism is
> provided for now. This matches what xend already does.
>
> Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com>
Overall I think this approach is much better.  I haven't done an 
extremely detailed review, but I do have some comments below.
> +    /*
> +     * Round up and down some of the constraints. For instance, the minimum
> +     * number of cpus a candidate should have must at least be non-negative.
> +     * Regarding the minimum number of NUMA nodes, if not explicitly specified
> +     * (i.e., min_nodes<= 0), we try to figure out a sensible number of nodes
> +     * from where to start generating candidates, if possible (or just start
> +     * from 1 otherwise). The maximum number of nodes should not exceed the
> +     * number of existent NUMA nodes on the host, or the candidate genaration
> +     * won't work properly.
> +     */
> +    min_cpus = min_cpus<= 0 ? 0 : min_cpus;
Wouldn't it just make more sense to specify that "min_cpus" (and other 
parameters) had to be >=0?

In any case, this is a very complicated way of saying "if (min_cpus<0) 
min_cpus = 0;".
> +    if (min_nodes<= 0) {
> +        int cpus_per_node;
> +
> +        cpus_per_node = cpus_per_node_count(tinfo, nr_cpus, ninfo, nr_nodes);
> +        if (cpus_per_node == 0)
> +            min_nodes = 1;
> +        else
> +            min_nodes = (min_cpus + cpus_per_node - 1) / cpus_per_node;
> +    }
Same here; you could just write "if(!min_nodes) {..."
> +    min_nodes = min_nodes>  nr_nodes ? nr_nodes : min_nodes;
> +    if (max_nodes<= 0)
> +        max_nodes = nr_nodes;
> +    else
> +        max_nodes = max_nodes>  nr_nodes ? nr_nodes : max_nodes;
if (max_nodes == 0 || max_nodes > nr_nodes) max_nodes = nr_nodes;
> +
> +/*
> + * The NUMA placement candidates are reordered according to the following
> + * heuristics:
> + *  - candidates involving fewer nodes come first. In case two (or
> + *    more) candidates span the same number of nodes,
> + *  - candidates with greater amount of free memory come first. In
> + *    case two (or more) candidates differ in their amount of free
> + *    memory by less than 10%,
Interesting idea -- sounds pretty reasonable.
> + *  - candidates with fewer domains insisting on them at the time of
> + *    this call come first.
Do you mean "existing"?  I think "assigned to" is probably better.
> + */
> +static int numa_cmpf(const void *v1, const void *v2)
> +{
> +    const libxl__numa_candidate *c1 = (const libxl__numa_candidate*) v1;
> +    const libxl__numa_candidate *c2 = (const libxl__numa_candidate*) v2;
> +    double mem_diff = labs(c1->free_memkb - c2->free_memkb);
> +    double mem_avg = (c1->free_memkb + c2->free_memkb) / 2.0;
> +
> +    if (c1->nr_nodes != c2->nr_nodes)
> +        return c1->nr_nodes - c2->nr_nodes;
> +
> +    if ((mem_diff / mem_avg) * 100.0<  10.0&&
> +        c1->nr_domains != c2->nr_domains)
> +        return c1->nr_domains - c2->nr_domains;
I realize this isn't a hot path, but it seems like moving into FP is 
really unnecessary.  You can just do this:

if ( ((mem_diff * 100) / mem_avg) < 10 ...

One minor note: I personally think it's a lot more readable to put the 
'&&' and '||' on the same line as the next item, rather than the 
previous item; i.e.:
  if ( expression_a
&& expression_b )

One more thing: Is there a reason why you put get_numa_candidates() in 
libxl_internal.h, but not sort_numa_candidates()?  It seems like both or 
neither should go. :-)

That's all I have for now.  I'm OK with the general approach, so here's 
a "weak ack", so if a maintainer is happy with the code, he can check it in:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

I'll try to come back and give a more detailed code review if I get a 
chance.

  -George

next prev parent reply	other threads:[~2012-06-21 16:16 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-15 17:04 [PATCH 00 of 10 v2] Automatic NUMA placement for xl Dario Faggioli
2012-06-15 17:04 ` [PATCH 01 of 10 v2] libxl: fix a typo in the GCREALLOC_ARRAY macro Dario Faggioli
2012-06-21  8:53   ` Ian Campbell
2012-06-26 16:00     ` Ian Campbell
2012-06-26 16:26       ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 02 of 10 v2] libxl: add a new Array type to the IDL Dario Faggioli
2012-06-15 17:04 ` [PATCH 03 of 10 v2] libxl, libxc: introduce libxl_get_numainfo() Dario Faggioli
2012-06-21  9:02   ` Ian Campbell
2012-06-21 10:00     ` Dario Faggioli
2012-06-21 10:21       ` Ian Campbell
2012-06-15 17:04 ` [PATCH 04 of 10 v2] xl: add more NUMA information to `xl info -n' Dario Faggioli
2012-06-21  9:04   ` Ian Campbell
2012-06-15 17:04 ` [PATCH 05 of 10 v2] libxl: rename libxl_cpumap to libxl_bitmap Dario Faggioli
2012-06-21  9:12   ` Ian Campbell
2012-06-21  9:49     ` Dario Faggioli
2012-06-21 10:22       ` Ian Campbell
2012-06-15 17:04 ` [PATCH 06 of 10 v2] libxl: expand the libxl_bitmap API a bit Dario Faggioli
2012-06-21  9:30   ` Ian Campbell
2012-06-21  9:46     ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 07 of 10 v2] libxl: introduce some node map helpers Dario Faggioli
2012-06-21  9:35   ` Ian Campbell
2012-06-21  9:44     ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes Dario Faggioli
2012-06-21 11:40   ` Ian Campbell
2012-06-21 16:34     ` Dario Faggioli
2012-06-22 10:14       ` Ian Campbell
2012-06-26 16:25         ` Dario Faggioli
2012-06-26 16:26           ` Ian Campbell
2012-06-26 17:23             ` Ian Jackson
2012-06-21 16:16   ` George Dunlap [this message]
2012-06-21 16:43     ` Dario Faggioli
2012-06-22 10:05       ` George Dunlap
2012-06-26 11:03         ` Ian Jackson
2012-06-26 15:20           ` Dario Faggioli
2012-06-27  8:15           ` Dario Faggioli
2012-06-28  7:25   ` Zhang, Yang Z
2012-06-28  8:36     ` George Dunlap
2012-06-29  5:38       ` Zhang, Yang Z
2012-06-29  9:46         ` Dario Faggioli
2012-06-28 10:12     ` Dario Faggioli
2012-06-28 12:41       ` Pasi Kärkkäinen
2012-06-28 17:03         ` Dario Faggioli
2012-06-29  5:29           ` Zhang, Yang Z
2012-06-29  9:38             ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 09 of 10 v2] libxl: have NUMA placement deal with cpupools Dario Faggioli
2012-06-21 13:31   ` Ian Campbell
2012-06-21 13:54     ` Dario Faggioli
2012-06-21 13:58       ` Ian Campbell
2012-06-15 17:04 ` [PATCH 10 of 10 v2] Some automatic NUMA placement documentation Dario Faggioli
2012-06-18 15:54   ` Dario Faggioli
2012-06-21 13:38   ` Ian Campbell
2012-06-21 13:57     ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FE348C1.5030407@eu.citrix.com \
    --to=george.dunlap@eu.citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=Stefano.Stabellini@eu.citrix.com \
    --cc=andre.przywara@amd.com \
    --cc=juergen.gross@ts.fujitsu.com \
    --cc=raistlin@linux.it \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.