* [PATCH 0 of 3] Some small NUMA placement improvements
@ 2012-10-16 17:26 Dario Faggioli
2012-10-16 17:26 ` [PATCH 1 of 3] libxl: take node distances into account during NUMA placement Dario Faggioli
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-16 17:26 UTC (permalink / raw)
To: xen-devel; +Cc: Andre Przywara, Ian Jackson, Ian Campbell
Hi everyone,
This series implements some small improvements in how NUMA is dealt with at
the libxl and xl levels, which I had pending in one of my queues.
More specifically, it enables actually using node distance information during
automatic placement, and it allows users to ask for a minimum and a maximum
number of NUMA nodes they want the automatic placement to use.
A little bit less related to _automatic_ placement, it also enhances the syntax
of the "cpus=" config switch (as well as the one of the `xl vcpu-pin' command),
so that full nodes can be specified instead of just single CPUs.
This is all about, on the one hand, continuing to improve the performance of
the NUMA placement mechanism and, on the other, starting to give users some
knobs to control it.
Comments appreciated, as usual. :-)
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-16 17:26 [PATCH 0 of 3] Some small NUMA placement improvements Dario Faggioli
@ 2012-10-16 17:26 ` Dario Faggioli
2012-10-18 15:17 ` George Dunlap
2012-10-16 17:26 ` [PATCH 2 of 3] libxl, xl: user can ask for min and max nodes to use during placement Dario Faggioli
2012-10-16 17:26 ` [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning Dario Faggioli
2 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2012-10-16 17:26 UTC (permalink / raw)
To: xen-devel; +Cc: Andre Przywara, Ian Jackson, Ian Campbell
In fact, among placement candidates with the same number of nodes, the
closer the various nodes are to each other, the better the performance
for a domain placed there.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -105,6 +105,9 @@ out:
* - the number of vcpus runnable on the candidates is considered, and
* candidates with fewer of them are preferred. If two candidate have
* the same number of runnable vcpus,
+ * - the sum of the node distances in the candidates is considered, and
+ * candidates with smaller total distance are preferred. If total
+ * distance is the same for the two candidates,
* - the amount of free memory in the candidates is considered, and the
* candidate with greater amount of it is preferred.
*
@@ -114,6 +117,10 @@ out:
* overloading large (from a memory POV) nodes. That's right the effect
* that counting the vcpus able to run on the nodes tries to prevent.
*
+ * The relative distance within the nodes in the candidates is considered
+ * as the closer the nodes, the better for the domain ending up on the
+ * candidate.
+ *
* Note that this completely ignore the number of nodes each candidate span,
* as the fact that fewer nodes is better is already accounted for in the
* algorithm.
@@ -124,6 +131,9 @@ static int numa_cmpf(const libxl__numa_c
if (c1->nr_vcpus != c2->nr_vcpus)
return c1->nr_vcpus - c2->nr_vcpus;
+ if (c1->dists_sum != c2->dists_sum)
+ return c1->dists_sum - c2->dists_sum;
+
return c2->free_memkb - c1->free_memkb;
}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2732,6 +2732,7 @@ static inline void libxl__ctx_unlock(lib
typedef struct {
int nr_cpus, nr_nodes;
int nr_vcpus;
+ int dists_sum;
uint32_t free_memkb;
libxl_bitmap nodemap;
} libxl__numa_candidate;
diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
--- a/tools/libxl/libxl_numa.c
+++ b/tools/libxl/libxl_numa.c
@@ -218,6 +218,40 @@ static int nodemap_to_nr_vcpus(libxl__gc
return nr_vcpus;
}
+/* Sum the relative distances of nodes in the nodemap to help finding
+ * out which candidate is the "tightest" one. */
+static int nodemap_to_dists_sum(libxl_numainfo *ninfo, libxl_bitmap *nodemap)
+{
+ int tot_dist = 0;
+ int i, j, a = 0, b;
+
+ for (i = 0; i < libxl_bitmap_count_set(nodemap); i++) {
+ while (!libxl_bitmap_test(nodemap, a))
+ a++;
+
+ /* As it is usually non-zero, we do take the latency of
+ * a node to itself into account. */
+ b = a;
+ for (j = 0; j < libxl_bitmap_count_set(nodemap) - i; j++) {
+ while (!libxl_bitmap_test(nodemap, b))
+ b++;
+
+ /*
+ * In most architectures, going from node A to node B costs
+ * exactly as much as going from B to A does. However, let's
+ * not rely on this and consider both contributions, just to
+ * be ready for everything future might reserve for us.
+ */
+ tot_dist += ninfo[a].dists[b];
+ tot_dist += ninfo[b].dists[a];
+ b++;
+ }
+ a++;
+ }
+
+ return tot_dist;
+}
+
/*
* This function tries to figure out if the host has a consistent number
* of cpus along all its NUMA nodes. In fact, if that is the case, we can
@@ -415,6 +449,7 @@ int libxl__get_numa_candidate(libxl__gc
*/
libxl__numa_candidate_put_nodemap(gc, &new_cndt, &nodemap);
new_cndt.nr_vcpus = nodemap_to_nr_vcpus(gc, tinfo, &nodemap);
+ new_cndt.dists_sum = nodemap_to_dists_sum(ninfo, &nodemap);
new_cndt.free_memkb = nodes_free_memkb;
new_cndt.nr_nodes = libxl_bitmap_count_set(&nodemap);
new_cndt.nr_cpus = nodes_cpus;
@@ -430,12 +465,14 @@ int libxl__get_numa_candidate(libxl__gc
LOG(DEBUG, "New best NUMA placement candidate found: "
"nr_nodes=%d, nr_cpus=%d, nr_vcpus=%d, "
- "free_memkb=%"PRIu32"", new_cndt.nr_nodes,
- new_cndt.nr_cpus, new_cndt.nr_vcpus,
+ "dists_sum=%d, free_memkb=%"PRIu32"",
+ new_cndt.nr_nodes, new_cndt.nr_cpus,
+ new_cndt.nr_vcpus, new_cndt.dists_sum,
new_cndt.free_memkb / 1024);
libxl__numa_candidate_put_nodemap(gc, cndt_out, &nodemap);
cndt_out->nr_vcpus = new_cndt.nr_vcpus;
+ cndt_out->dists_sum = new_cndt.dists_sum;
cndt_out->free_memkb = new_cndt.free_memkb;
cndt_out->nr_nodes = new_cndt.nr_nodes;
cndt_out->nr_cpus = new_cndt.nr_cpus;
* [PATCH 2 of 3] libxl, xl: user can ask for min and max nodes to use during placement
2012-10-16 17:26 [PATCH 0 of 3] Some small NUMA placement improvements Dario Faggioli
2012-10-16 17:26 ` [PATCH 1 of 3] libxl: take node distances into account during NUMA placement Dario Faggioli
@ 2012-10-16 17:26 ` Dario Faggioli
2012-10-18 15:21 ` George Dunlap
2012-10-16 17:26 ` [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning Dario Faggioli
2 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2012-10-16 17:26 UTC (permalink / raw)
To: xen-devel; +Cc: Andre Przywara, Ian Jackson, Ian Campbell
The placement algorithm will stick to that, or fail. This is done by
adding support for "minnodes=" and "maxnodes=" in the domain config
file parser.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -133,6 +133,18 @@ the same time, achieving efficient utili
=back
+=item B<minnodes=N>
+
+Tells libxl to place the new domain on at least `N` nodes. This is only
+effective if automatic NUMA placement occurs (i.e., if no `cpus=` option
+is specified).
+
+=item B<maxnodes=M>
+
+Tells libxl to place the new domain on at most `M` nodes. This is only
+effective if automatic NUMA placement occurs (i.e., if no `cpus=` option
+is specified).
+
=head3 CPU Scheduling
=over 4
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -219,6 +219,11 @@ int libxl__domain_build_info_setdefault(
libxl_defbool_setdefault(&b_info->numa_placement, true);
+ if (!b_info->min_nodes)
+ b_info->min_nodes = 0;
+ if (!b_info->max_nodes)
+ b_info->max_nodes = 0;
+
if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT)
b_info->max_memkb = 32 * 1024;
if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT)
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -174,7 +174,8 @@ static int numa_place_domain(libxl__gc *
/* Find the best candidate with enough free memory and at least
* as much pcpus as the domain has vcpus. */
rc = libxl__get_numa_candidate(gc, memkb, info->max_vcpus,
- 0, 0, &cpupool_info.cpumap,
+ info->min_nodes, info->max_nodes,
+ &cpupool_info.cpumap,
numa_cmpf, &candidate, &found);
if (rc)
goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -262,6 +262,8 @@ libxl_domain_build_info = Struct("domain
("avail_vcpus", libxl_bitmap),
("cpumap", libxl_bitmap),
("numa_placement", libxl_defbool),
+ ("min_nodes", integer),
+ ("max_nodes", integer),
("tsc_mode", libxl_tsc_mode),
("max_memkb", MemKB),
("target_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -731,6 +731,11 @@ static void parse_config_data(const char
libxl_defbool_set(&b_info->numa_placement, false);
}
+ if (!xlu_cfg_get_long (config, "minnodes", &l, 0))
+ b_info->min_nodes = l;
+ if (!xlu_cfg_get_long (config, "maxnodes", &l, 0))
+ b_info->max_nodes = l;
+
if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
b_info->max_memkb = l * 1024;
b_info->target_memkb = b_info->max_memkb;
* [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning
2012-10-16 17:26 [PATCH 0 of 3] Some small NUMA placement improvements Dario Faggioli
2012-10-16 17:26 ` [PATCH 1 of 3] libxl: take node distances into account during NUMA placement Dario Faggioli
2012-10-16 17:26 ` [PATCH 2 of 3] libxl, xl: user can ask for min and max nodes to use during placement Dario Faggioli
@ 2012-10-16 17:26 ` Dario Faggioli
2012-10-18 11:23 ` Ian Campbell
2012-10-18 15:30 ` George Dunlap
2 siblings, 2 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-16 17:26 UTC (permalink / raw)
To: xen-devel; +Cc: Andre Przywara, Ian Jackson, Ian Campbell
Making it possible to use something like the following:
* "nodes:0-3": all pCPUs of nodes 0,1,2,3;
* "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3;
* "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 but not pCPU 6;
* ...
In both domain config file and `xl vcpu-pin'.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -109,7 +109,7 @@ some cpus on its own (see below). A C<CP
=over 4
-=item "all"
+=item "all" (or "nodes:all")
To allow all the vcpus of the guest to run on all the cpus on the host.
@@ -117,6 +117,14 @@ To allow all the vcpus of the guest to r
To allow all the vcpus of the guest to run on cpus 0,2,3,5.
+=item "nodes:0-3,^node:2"
+
+To allow all the vcpus of the guest to run on the cpus belonging to
+the NUMA nodes 0,1,3 of the host. Note that it is possible to combine
+this syntax with the one above: something like "1,node:2,^6" is allowed
+and means that all the vcpus of the guest will run on cpu 1 plus all the
+cpus of node 2, but never on cpu 6.
+
=item ["2", "3"] (or [2, 3])
To ask for specific vcpu mapping. That means (in this example), vcpu #0
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -506,31 +506,58 @@ static void split_string_into_string_lis
static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap)
{
- libxl_bitmap exclude_cpumap;
- uint32_t cpuida, cpuidb;
+ libxl_bitmap nodemap, cpu_nodemap;
+ libxl_bitmap exclude_cpumap, exclude_nodemap;
+ uint32_t ida, idb;
char *endptr, *toka, *tokb, *saveptr = NULL;
- int i, rc = 0, rmcpu;
-
- if (!strcmp(cpu, "all")) {
+ int i, rc = 0, isnot, isnode;
+
+ if (!strcmp(cpu, "all") || !strcmp(cpu, "nodes:all")) {
libxl_bitmap_set_any(cpumap);
return 0;
}
- if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) {
+ libxl_bitmap_init(&cpu_nodemap);
+ libxl_bitmap_init(&nodemap);
+ libxl_bitmap_init(&exclude_nodemap);
+ libxl_bitmap_init(&exclude_cpumap);
+
+ rc = libxl_node_bitmap_alloc(ctx, &cpu_nodemap, 0);
+ if (rc) {
+ fprintf(stderr, "Error: Failed to allocate nodemap.\n");
+ goto vcpp_out;
+ }
+ rc = libxl_node_bitmap_alloc(ctx, &nodemap, 0);
+ if (rc) {
+ fprintf(stderr, "Error: Failed to allocate nodemap.\n");
+ goto vcpp_out;
+ }
+ rc = libxl_node_bitmap_alloc(ctx, &exclude_nodemap, 0);
+ if (rc) {
+ fprintf(stderr, "Error: Failed to allocate nodemap.\n");
+ goto vcpp_out;
+ }
+ rc = libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0);
+ if (rc) {
fprintf(stderr, "Error: Failed to allocate cpumap.\n");
- return ENOMEM;
+ goto vcpp_out;
}
for (toka = strtok_r(cpu, ",", &saveptr); toka;
toka = strtok_r(NULL, ",", &saveptr)) {
- rmcpu = 0;
+ isnot = 0; isnode = 0;
if (*toka == '^') {
- /* This (These) Cpu(s) will be removed from the map */
+ /* This (These) Cpu(s)/Node(s) will be removed from the map */
toka++;
- rmcpu = 1;
- }
- /* Extract a valid (range of) cpu(s) */
- cpuida = cpuidb = strtoul(toka, &endptr, 10);
+ isnot = 1;
+ }
+ /* Check if we're dealing with a full node */
+ if (strstr(toka, "node:") == toka || strstr(toka, "nodes:") == toka) {
+ toka += 5 + (toka[4] == 's');
+ isnode = 1;
+ }
+ /* Extract a valid (range of) cpu(s) or node(s) */
+ ida = idb = strtoul(toka, &endptr, 10);
if (endptr == toka) {
fprintf(stderr, "Error: Invalid argument.\n");
rc = EINVAL;
@@ -538,27 +565,48 @@ static int vcpupin_parse(char *cpu, libx
}
if (*endptr == '-') {
tokb = endptr + 1;
- cpuidb = strtoul(tokb, &endptr, 10);
- if (endptr == tokb || cpuida > cpuidb) {
+ idb = strtoul(tokb, &endptr, 10);
+ if (endptr == tokb || ida > idb) {
fprintf(stderr, "Error: Invalid argument.\n");
rc = EINVAL;
goto vcpp_out;
}
}
- while (cpuida <= cpuidb) {
- rmcpu == 0 ? libxl_bitmap_set(cpumap, cpuida) :
- libxl_bitmap_set(&exclude_cpumap, cpuida);
- cpuida++;
- }
- }
-
- /* Clear all the cpus from the removal list */
+ while (ida <= idb) {
+ if (!isnode)
+ isnot == 0 ? libxl_bitmap_set(cpumap, ida) :
+ libxl_bitmap_set(&exclude_cpumap, ida);
+ else
+ isnot == 0 ? libxl_bitmap_set(&nodemap, ida) :
+ libxl_bitmap_set(&exclude_nodemap, ida);
+ ida++;
+ }
+ }
+
+ /* Add the cpus that have been specified via "node:" items */
+ rc = libxl_nodemap_to_cpumap(ctx, &nodemap, &cpu_nodemap);
+ if (rc)
+ goto vcpp_out;
+ libxl_for_each_set_bit(i, cpu_nodemap) {
+ libxl_bitmap_set(cpumap, i);
+ }
+
+ /* Clear all the cpus from the removal cpu and node lists */
libxl_for_each_set_bit(i, exclude_cpumap) {
libxl_bitmap_reset(cpumap, i);
}
+ rc = libxl_nodemap_to_cpumap(ctx, &exclude_nodemap, &cpu_nodemap);
+ if (rc)
+ goto vcpp_out;
+ libxl_for_each_set_bit(i, cpu_nodemap) {
+ libxl_bitmap_reset(cpumap, i);
+ }
vcpp_out:
libxl_bitmap_dispose(&exclude_cpumap);
+ libxl_bitmap_dispose(&exclude_nodemap);
+ libxl_bitmap_dispose(&nodemap);
+ libxl_bitmap_dispose(&cpu_nodemap);
return rc;
}
* Re: [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning
2012-10-16 17:26 ` [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning Dario Faggioli
@ 2012-10-18 11:23 ` Ian Campbell
2012-10-18 13:11 ` Dario Faggioli
2012-10-18 15:30 ` George Dunlap
1 sibling, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2012-10-18 11:23 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Andre Przywara, Ian Jackson, xen-devel@lists.xen.org
On Tue, 2012-10-16 at 18:26 +0100, Dario Faggioli wrote:
> + isnot = 1;
> + }
> + /* Check if we're dealing with a full node */
> + if (strstr(toka, "node:") == toka || strstr(toka, "nodes:") == toka) {
Isn't strstr(FOO, "thing") == FOO just an expensive way of writing
strncmp(FOO, "ting", sizeof(...))?
Ian
* Re: [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning
2012-10-18 11:23 ` Ian Campbell
@ 2012-10-18 13:11 ` Dario Faggioli
2012-10-18 13:15 ` Ian Campbell
0 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2012-10-18 13:11 UTC (permalink / raw)
To: Ian Campbell; +Cc: Andre Przywara, Ian Jackson, xen-devel@lists.xen.org
On Thu, 2012-10-18 at 12:23 +0100, Ian Campbell wrote:
> On Tue, 2012-10-16 at 18:26 +0100, Dario Faggioli wrote:
>
> > + isnot = 1;
> > + }
> > + /* Check if we're dealing with a full node */
> > + if (strstr(toka, "node:") == toka || strstr(toka, "nodes:") == toka) {
>
> Isn't strstr(FOO, "thing") == FOO just an expensive way of writing
> strncmp(FOO, "ting", sizeof(...))?
>
Mmm... what I wanted was to make sure that "thing" not only is part of
FOO, but is also at the very beginning of that same string. That's why
I exploited the fact that strstr() returns a pointer to the beginning of
the substring "thing" within FOO.
Now that you mention it, I think I can actually achieve the same
with strncmp, limiting the comparison to strlen("thing"). If that is
what you meant, the answer is yes, I can do that (and I agree it's
probably better).
Thanks for looking into this,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning
2012-10-18 13:11 ` Dario Faggioli
@ 2012-10-18 13:15 ` Ian Campbell
2012-10-18 13:18 ` Dario Faggioli
0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2012-10-18 13:15 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Andre Przywara, Ian Jackson, xen-devel@lists.xen.org
On Thu, 2012-10-18 at 14:11 +0100, Dario Faggioli wrote:
> On Thu, 2012-10-18 at 12:23 +0100, Ian Campbell wrote:
> > On Tue, 2012-10-16 at 18:26 +0100, Dario Faggioli wrote:
> >
> > > + isnot = 1;
> > > + }
> > > + /* Check if we're dealing with a full node */
> > > + if (strstr(toka, "node:") == toka || strstr(toka, "nodes:") == toka) {
> >
> > Isn't strstr(FOO, "thing") == FOO just an expensive way of writing
> > strncmp(FOO, "ting", sizeof(...))?
> >
> Mmm... what I wanted was to make sure that "thing" not only is part of
> FOO, but that also is the very beginning of that same string. That's why
> I exploited the fact that strstr() returns a pointer to the beginning of
> the substring "thing" within FOO.
>
> Now that you mentioned it, I think that I actually can achieve the same
> with strncmp, limiting the comparison at strlen("thing"). If that was
> what you meant, the answer is yes, I can do that (and I agree it's
> probably better).
Right. With the strstr method, if there are 1000 characters in the input
before the "thing", you've wasted a bunch of time scanning for it just to
throw it away, whereas with str*cmp you would give up after the first
non-matching char.
Ian.
* Re: [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning
2012-10-18 13:15 ` Ian Campbell
@ 2012-10-18 13:18 ` Dario Faggioli
0 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-18 13:18 UTC (permalink / raw)
To: Ian Campbell; +Cc: Andre Przywara, Ian Jackson, xen-devel@lists.xen.org
On Thu, 2012-10-18 at 14:15 +0100, Ian Campbell wrote:
> Right. With the strstr method, if there are 1000 characters in the input
> before the "thing", you've wasted a bunch of time scanning for it just to
> throw it away, whereas with str*cmp you would give up after the first
> non-matching char.
>
Ok, I'll go for it.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-16 17:26 ` [PATCH 1 of 3] libxl: take node distances into account during NUMA placement Dario Faggioli
@ 2012-10-18 15:17 ` George Dunlap
2012-10-18 23:20 ` Dario Faggioli
0 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2012-10-18 15:17 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Andre Przywara, Ian Jackson, Ian Campbell, xen-devel
On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> In fact, among placement candidates with the same number of nodes, the
> closer the various nodes are to each other, the better the performance
> for a domain placed there.
Looks good overall -- my only worry is the N^2 nature of the
algorithm. We're already doing some big combinatorial thing to
generate the candidates, right? And now we're doing N^2 for each
candidate? Suppose we get an ARM system with 4096 cores and 128 NUMA
nodes? If Xen 4.4 doesn't come out until March 2014, there will still
be distros using 4.3 through mid-2015.
I seem to remember having a discussion about this issue already, but I
can't remember what the outcome was...
-George
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -105,6 +105,9 @@ out:
> * - the number of vcpus runnable on the candidates is considered, and
> * candidates with fewer of them are preferred. If two candidate have
> * the same number of runnable vcpus,
> + * - the sum of the node distances in the candidates is considered, and
> + * candidates with smaller total distance are preferred. If total
> + * distance is the same for the two candidates,
> * - the amount of free memory in the candidates is considered, and the
> * candidate with greater amount of it is preferred.
> *
> @@ -114,6 +117,10 @@ out:
> * overloading large (from a memory POV) nodes. That's right the effect
> * that counting the vcpus able to run on the nodes tries to prevent.
> *
> + * The relative distance within the nodes in the candidates is considered
> + * as the closer the nodes, the better for the domain ending up on the
> + * candidate.
> + *
> * Note that this completely ignore the number of nodes each candidate span,
> * as the fact that fewer nodes is better is already accounted for in the
> * algorithm.
> @@ -124,6 +131,9 @@ static int numa_cmpf(const libxl__numa_c
> if (c1->nr_vcpus != c2->nr_vcpus)
> return c1->nr_vcpus - c2->nr_vcpus;
>
> + if (c1->dists_sum != c2->dists_sum)
> + return c1->dists_sum - c2->dists_sum;
> +
> return c2->free_memkb - c1->free_memkb;
> }
>
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2732,6 +2732,7 @@ static inline void libxl__ctx_unlock(lib
> typedef struct {
> int nr_cpus, nr_nodes;
> int nr_vcpus;
> + int dists_sum;
> uint32_t free_memkb;
> libxl_bitmap nodemap;
> } libxl__numa_candidate;
> diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
> --- a/tools/libxl/libxl_numa.c
> +++ b/tools/libxl/libxl_numa.c
> @@ -218,6 +218,40 @@ static int nodemap_to_nr_vcpus(libxl__gc
> return nr_vcpus;
> }
>
> +/* Sum the relative distances of nodes in the nodemap to help finding
> + * out which candidate is the "tightest" one. */
> +static int nodemap_to_dists_sum(libxl_numainfo *ninfo, libxl_bitmap *nodemap)
> +{
> + int tot_dist = 0;
> + int i, j, a = 0, b;
> +
> + for (i = 0; i < libxl_bitmap_count_set(nodemap); i++) {
> + while (!libxl_bitmap_test(nodemap, a))
> + a++;
> +
> + /* As it is usually non-zero, we do take the latency of
> + * a node to itself into account. */
> + b = a;
> + for (j = 0; j < libxl_bitmap_count_set(nodemap) - i; j++) {
> + while (!libxl_bitmap_test(nodemap, b))
> + b++;
> +
> + /*
> + * In most architectures, going from node A to node B costs
> + * exactly as much as going from B to A does. However, let's
> + * not rely on this and consider both contributions, just to
> + * be ready for everything future might reserve for us.
> + */
> + tot_dist += ninfo[a].dists[b];
> + tot_dist += ninfo[b].dists[a];
> + b++;
> + }
> + a++;
> + }
> +
> + return tot_dist;
> +}
> +
> /*
> * This function tries to figure out if the host has a consistent number
> * of cpus along all its NUMA nodes. In fact, if that is the case, we can
> @@ -415,6 +449,7 @@ int libxl__get_numa_candidate(libxl__gc
> */
> libxl__numa_candidate_put_nodemap(gc, &new_cndt, &nodemap);
> new_cndt.nr_vcpus = nodemap_to_nr_vcpus(gc, tinfo, &nodemap);
> + new_cndt.dists_sum = nodemap_to_dists_sum(ninfo, &nodemap);
> new_cndt.free_memkb = nodes_free_memkb;
> new_cndt.nr_nodes = libxl_bitmap_count_set(&nodemap);
> new_cndt.nr_cpus = nodes_cpus;
> @@ -430,12 +465,14 @@ int libxl__get_numa_candidate(libxl__gc
>
> LOG(DEBUG, "New best NUMA placement candidate found: "
> "nr_nodes=%d, nr_cpus=%d, nr_vcpus=%d, "
> - "free_memkb=%"PRIu32"", new_cndt.nr_nodes,
> - new_cndt.nr_cpus, new_cndt.nr_vcpus,
> + "dists_sum=%d, free_memkb=%"PRIu32"",
> + new_cndt.nr_nodes, new_cndt.nr_cpus,
> + new_cndt.nr_vcpus, new_cndt.dists_sum,
> new_cndt.free_memkb / 1024);
>
> libxl__numa_candidate_put_nodemap(gc, cndt_out, &nodemap);
> cndt_out->nr_vcpus = new_cndt.nr_vcpus;
> + cndt_out->dists_sum = new_cndt.dists_sum;
> cndt_out->free_memkb = new_cndt.free_memkb;
> cndt_out->nr_nodes = new_cndt.nr_nodes;
> cndt_out->nr_cpus = new_cndt.nr_cpus;
>
* Re: [PATCH 2 of 3] libxl, xl: user can ask for min and max nodes to use during placement
2012-10-16 17:26 ` [PATCH 2 of 3] libxl, xl: user can ask for min and max nodes to use during placement Dario Faggioli
@ 2012-10-18 15:21 ` George Dunlap
0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2012-10-18 15:21 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Andre Przywara, Ian Jackson, Ian Campbell, xen-devel
On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> The placement algorithm will stick to that, or fail. This is done by
> adding support for "minnodes=" and "maxnodes=" in the domain config
> file parser.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -133,6 +133,18 @@ the same time, achieving efficient utili
>
> =back
>
> +=item B<minnodes=N>
> +
> +Tells libxl to place the new domain on at least `N` nodes. This is only
> +effective if automatic NUMA placement occurs (i.e., if no `cpus=` option
> +is specified).
> +
> +=item B<maxnodes=M>
> +
> +Tells libxl to place the new domain on at most `M` nodes. This is only
> +effective if automatic NUMA placement occurs (i.e., if no `cpus=` option
> +is specified).
> +
> =head3 CPU Scheduling
>
> =over 4
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -219,6 +219,11 @@ int libxl__domain_build_info_setdefault(
>
> libxl_defbool_setdefault(&b_info->numa_placement, true);
>
> + if (!b_info->min_nodes)
> + b_info->min_nodes = 0;
> + if (!b_info->max_nodes)
> + b_info->max_nodes = 0;
> +
> if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT)
> b_info->max_memkb = 32 * 1024;
> if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT)
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -174,7 +174,8 @@ static int numa_place_domain(libxl__gc *
> /* Find the best candidate with enough free memory and at least
> * as much pcpus as the domain has vcpus. */
> rc = libxl__get_numa_candidate(gc, memkb, info->max_vcpus,
> - 0, 0, &cpupool_info.cpumap,
> + info->min_nodes, info->max_nodes,
> + &cpupool_info.cpumap,
> numa_cmpf, &candidate, &found);
> if (rc)
> goto out;
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -262,6 +262,8 @@ libxl_domain_build_info = Struct("domain
> ("avail_vcpus", libxl_bitmap),
> ("cpumap", libxl_bitmap),
> ("numa_placement", libxl_defbool),
> + ("min_nodes", integer),
> + ("max_nodes", integer),
> ("tsc_mode", libxl_tsc_mode),
> ("max_memkb", MemKB),
> ("target_memkb", MemKB),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -731,6 +731,11 @@ static void parse_config_data(const char
> libxl_defbool_set(&b_info->numa_placement, false);
> }
>
> + if (!xlu_cfg_get_long (config, "minnodes", &l, 0))
> + b_info->min_nodes = l;
> + if (!xlu_cfg_get_long (config, "maxnodes", &l, 0))
> + b_info->max_nodes = l;
> +
> if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
> b_info->max_memkb = l * 1024;
> b_info->target_memkb = b_info->max_memkb;
>
* Re: [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning
2012-10-16 17:26 ` [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning Dario Faggioli
2012-10-18 11:23 ` Ian Campbell
@ 2012-10-18 15:30 ` George Dunlap
2012-10-18 22:35 ` Dario Faggioli
1 sibling, 1 reply; 22+ messages in thread
From: George Dunlap @ 2012-10-18 15:30 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Andre Przywara, Ian Jackson, Ian Campbell, xen-devel
On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> Making it possible to use something like the following:
> * "nodes:0-3": all pCPUs of nodes 0,1,2,3;
> * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3;
> * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 but not pCPU 6;
> * ...
>
> In both domain config file and `xl vcpu-pin'.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Is it OK if I give this an "I'm OK with the idea but I'm too lazy to
figure out if the parsing code is doing the right thing" Ack? :-)
-George
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -109,7 +109,7 @@ some cpus on its own (see below). A C<CP
>
> =over 4
>
> -=item "all"
> +=item "all" (or "nodes:all")
>
> To allow all the vcpus of the guest to run on all the cpus on the host.
>
> @@ -117,6 +117,14 @@ To allow all the vcpus of the guest to r
>
> To allow all the vcpus of the guest to run on cpus 0,2,3,5.
>
> +=item "nodes:0-3,^node:2"
> +
> +To allow all the vcpus of the guest to run on the cpus belonging to
> +the NUMA nodes 0,1,3 of the host. Note that it is possible to combine
> +this syntax with the one above: something like "1,node:2,^6" is allowed
> +and means that all the vcpus of the guest will run on cpu 1 plus all the
> +cpus of node 2, but never on cpu 6.
> +
> =item ["2", "3"] (or [2, 3])
>
> To ask for specific vcpu mapping. That means (in this example), vcpu #0
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -506,31 +506,58 @@ static void split_string_into_string_lis
>
> static int vcpupin_parse(char *cpu, libxl_bitmap *cpumap)
> {
> - libxl_bitmap exclude_cpumap;
> - uint32_t cpuida, cpuidb;
> + libxl_bitmap nodemap, cpu_nodemap;
> + libxl_bitmap exclude_cpumap, exclude_nodemap;
> + uint32_t ida, idb;
> char *endptr, *toka, *tokb, *saveptr = NULL;
> - int i, rc = 0, rmcpu;
> -
> - if (!strcmp(cpu, "all")) {
> + int i, rc = 0, isnot, isnode;
> +
> + if (!strcmp(cpu, "all") || !strcmp(cpu, "nodes:all")) {
> libxl_bitmap_set_any(cpumap);
> return 0;
> }
>
> - if (libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0)) {
> + libxl_bitmap_init(&cpu_nodemap);
> + libxl_bitmap_init(&nodemap);
> + libxl_bitmap_init(&exclude_nodemap);
> + libxl_bitmap_init(&exclude_cpumap);
> +
> + rc = libxl_node_bitmap_alloc(ctx, &cpu_nodemap, 0);
> + if (rc) {
> + fprintf(stderr, "Error: Failed to allocate nodemap.\n");
> + goto vcpp_out;
> + }
> + rc = libxl_node_bitmap_alloc(ctx, &nodemap, 0);
> + if (rc) {
> + fprintf(stderr, "Error: Failed to allocate nodemap.\n");
> + goto vcpp_out;
> + }
> + rc = libxl_node_bitmap_alloc(ctx, &exclude_nodemap, 0);
> + if (rc) {
> + fprintf(stderr, "Error: Failed to allocate nodemap.\n");
> + goto vcpp_out;
> + }
> + rc = libxl_cpu_bitmap_alloc(ctx, &exclude_cpumap, 0);
> + if (rc) {
> fprintf(stderr, "Error: Failed to allocate cpumap.\n");
> - return ENOMEM;
> + goto vcpp_out;
> }
>
> for (toka = strtok_r(cpu, ",", &saveptr); toka;
> toka = strtok_r(NULL, ",", &saveptr)) {
> - rmcpu = 0;
> + isnot = 0; isnode = 0;
> if (*toka == '^') {
> - /* This (These) Cpu(s) will be removed from the map */
> + /* This (These) Cpu(s)/Node(s) will be removed from the map */
> toka++;
> - rmcpu = 1;
> - }
> - /* Extract a valid (range of) cpu(s) */
> - cpuida = cpuidb = strtoul(toka, &endptr, 10);
> + isnot = 1;
> + }
> + /* Check if we're dealing with a full node */
> + if (strstr(toka, "node:") == toka || strstr(toka, "nodes:") == toka) {
> + toka += 5 + (toka[4] == 's');
> + isnode = 1;
> + }
> + /* Extract a valid (range of) cpu(s) or node(s) */
> + ida = idb = strtoul(toka, &endptr, 10);
> if (endptr == toka) {
> fprintf(stderr, "Error: Invalid argument.\n");
> rc = EINVAL;
> @@ -538,27 +565,48 @@ static int vcpupin_parse(char *cpu, libx
> }
> if (*endptr == '-') {
> tokb = endptr + 1;
> - cpuidb = strtoul(tokb, &endptr, 10);
> - if (endptr == tokb || cpuida > cpuidb) {
> + idb = strtoul(tokb, &endptr, 10);
> + if (endptr == tokb || ida > idb) {
> fprintf(stderr, "Error: Invalid argument.\n");
> rc = EINVAL;
> goto vcpp_out;
> }
> }
> - while (cpuida <= cpuidb) {
> - rmcpu == 0 ? libxl_bitmap_set(cpumap, cpuida) :
> - libxl_bitmap_set(&exclude_cpumap, cpuida);
> - cpuida++;
> - }
> - }
> -
> - /* Clear all the cpus from the removal list */
> + while (ida <= idb) {
> + if (!isnode)
> + isnot == 0 ? libxl_bitmap_set(cpumap, ida) :
> + libxl_bitmap_set(&exclude_cpumap, ida);
> + else
> + isnot == 0 ? libxl_bitmap_set(&nodemap, ida) :
> + libxl_bitmap_set(&exclude_nodemap, ida);
> + ida++;
> + }
> + }
> +
> + /* Add the cpus that have been specified via "node:" items */
> + rc = libxl_nodemap_to_cpumap(ctx, &nodemap, &cpu_nodemap);
> + if (rc)
> + goto vcpp_out;
> + libxl_for_each_set_bit(i, cpu_nodemap) {
> + libxl_bitmap_set(cpumap, i);
> + }
> +
> + /* Clear all the cpus from the removal cpu and node lists */
> libxl_for_each_set_bit(i, exclude_cpumap) {
> libxl_bitmap_reset(cpumap, i);
> }
> + rc = libxl_nodemap_to_cpumap(ctx, &exclude_nodemap, &cpu_nodemap);
> + if (rc)
> + goto vcpp_out;
> + libxl_for_each_set_bit(i, cpu_nodemap) {
> + libxl_bitmap_reset(cpumap, i);
> + }
>
> vcpp_out:
> libxl_bitmap_dispose(&exclude_cpumap);
> + libxl_bitmap_dispose(&exclude_nodemap);
> + libxl_bitmap_dispose(&nodemap);
> + libxl_bitmap_dispose(&cpu_nodemap);
>
> return rc;
> }
>
* Re: [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning
2012-10-18 15:30 ` George Dunlap
@ 2012-10-18 22:35 ` Dario Faggioli
0 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-18 22:35 UTC (permalink / raw)
To: George Dunlap; +Cc: Andre Przywara, Ian Jackson, Ian Campbell, xen-devel
On Thu, 2012-10-18 at 16:30 +0100, George Dunlap wrote:
> On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > Making it possible to use something like the following:
> > * "nodes:0-3": all pCPUs of nodes 0,1,2,3;
> > * "nodes:0-3,^node:2": all pCPUS of nodes 0,1,3;
> > * "1,nodes:1-2,^6": pCPU 1 plus all pCPUs of nodes 1,2 but not pCPU 6;
> > * ...
> >
> > In both domain config file and `xl vcpu-pin'.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> Is it OK if I give this an "I'm OK with the idea but I'm too lazy to
> figure out if the parsing code is doing the right thing" Ack? :-)
>
Well, it's certainly ok with me... :-)
Actually, the new parsing code is nothing particularly sexy, and the
only relevant bit is the one IanC commented about, i.e., checking for a
"node:" substring at the beginning of each ','-separated element of the
original string.
That's it.
The reason why this requires such a lot of + lines is that I need a
couple of nodemaps in which to store the data, and declaring, initializing,
allocating and freeing them all requires that. :-(
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-18 15:17 ` George Dunlap
@ 2012-10-18 23:20 ` Dario Faggioli
2012-10-19 10:03 ` Ian Jackson
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-18 23:20 UTC (permalink / raw)
To: George Dunlap
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, Juergen Gross,
Ian Jackson, xen-devel@lists.xen.org
On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > In fact, among placement candidates with the same number of nodes, the
> > closer the various nodes are to each other, the better the performance
> > for a domain placed there.
>
> Looks good overall -- my only worry is the N^2 nature of the
> algorithm. We're already doing some big combinatorial thing to
> generate the candidates, right?
>
It is, with N being the number of nodes, which we discussed thoroughly
already a couple of months ago, and reached consensus on the fact that N
will stay less than 8 for the next 5 (but probably even more) years. :-)
In any case, if something really unexpected happens and N jumps to
anything bigger than 16, the placement algorithm won't even start, and
we'll never reach this point!
Moreover, given the numbers we're playing with, I don't think this
specific patch is adding much complexity, as we already have the
function that counts the number of vCPUs (as it was for xend) bound to a
candidate, which is Ndoms*Nvcpus, and we're very likely going to have
many more domains than nodes. :-)
> And now we're doing N^2 for each
> candidate?
>
Again, yes, but that is turning it from Ndoms*Nvcpus to
Ndoms*Nvcpus+Nnodes^2, which is still dominated by the first term. IIRC,
Andre tried to start >50 domains with 2 vCPUs on an 8-node system, which
means 50*2 vs 8*8.
> Suppose we get an ARM system with 4096 cores and 128 NUMA
> nodes? If Xen 4.4 doesn't come out until March 2014, there will still
> be distros using 4.3 through mid-2015.
>
Right, but I really don't think that monster is actually made out of
4096 cores arranged in 128 _NUMA_ nodes on which you run the same
instance of the hypervisor.
I also recall hearing those numbers and the use of the word "node", but I
really think they were rather referring to a cluster architecture where "a
node" means something more like "a server", each one running its own copy
of Xen (although they'll all be packed together in the same rack,
talking via some super-fast interconnect).
I'm pretty sure I remember Stefano speculating about the need to use
some orchestration layer (like {Cloud,Open}Stack) _within_ those big
irons to deal exactly with that... Stefano, am I talking nonsense? :-D
Finally, allow me to say that the whole placement algorithm already
interacts quite nicely with cpupools. Thus, even in the unlikely event
of an actual 128-node NUMA machine, you can have, say, 16 cpupools with
8 nodes each (or vice versa), and the algorithm will be back to dealing
with _no_more_than_ 8 (or 16) nodes. Yes, right now this would require
someone to manually set up the pools and decide which domain to put
where. However, it would be very easy to add, at that point, something
doing this pooling and more coarse-grained placing automatically (and
quickly). In fact, we can even think about having it for 4.3, if you
really believe it's necessary.
> I seem to remember having a discussion about this issue already, but I
> can't remember what the outcome was...
>
Yep, we did, and the outcome was exactly what I tried to summarize
above. :-)
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-18 23:20 ` Dario Faggioli
@ 2012-10-19 10:03 ` Ian Jackson
2012-10-19 10:39 ` Stefano Stabellini
2012-10-19 10:35 ` Stefano Stabellini
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2012-10-19 10:03 UTC (permalink / raw)
To: Dario Faggioli
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, George Dunlap,
Juergen Gross, xen-devel@lists.xen.org
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 1 of 3] libxl: take node distances into account during NUMA placement"):
> On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> > Looks good overall -- my only worry is the N^2 nature of the
> > algorithm. We're already doing some big combinatorial thing to
> > generate the candidates, right?
> >
> It is, with N being the number of nodes, which we discussed thoroughly
> already a couple of months ago, and reached consensus on the fact that N
> will stay less than 8 for the next 5 (but probably even more) years. :-)
No, I don't think we did reach consensus about that. It was asserted
but I dissented. I don't think this is a reasonable assumption.
> In any case, if something really unexpected happens, and N jumps to
> anything bigger than 16, the placement algorithm won't even start, and
> we'll never reach this point!.
And that is the safety catch you put in because I didn't think the
assumption about numbers of cpus was reasonable.
Ian.
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-18 23:20 ` Dario Faggioli
2012-10-19 10:03 ` Ian Jackson
@ 2012-10-19 10:35 ` Stefano Stabellini
2012-10-19 10:50 ` George Dunlap
2012-10-19 14:57 ` Ian Jackson
3 siblings, 0 replies; 22+ messages in thread
From: Stefano Stabellini @ 2012-10-19 10:35 UTC (permalink / raw)
To: Dario Faggioli
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, George Dunlap,
Juergen Gross, Ian Jackson, xen-devel@lists.xen.org
On Fri, 19 Oct 2012, Dario Faggioli wrote:
> > Suppose we get an ARM system with 4096 cores and 128 NUMA
> > nodes? If Xen 4.4 doesn't come out until March 2014, there will still
> > be distros using 4.3 through mid-2015.
> >
> Right, but I really don't think that monster is actually made out of
> 4096 cores arranged in 128 _NUMA_ nodes on which you run the same
> instance of the hypervisor.
In fact it does not. It would be 4096 cores arranged in 1024 nodes, each
node completely independent from the others.
> I also recall hearing the numbers and the use of the word "node", but I
> really think they was rather referred to a cluster architecture where "a
> node" means something more like "a server", each one running their copy
> of Xen (although they'll be packed all together in the same rack,
> talking via some super-fast interconnect).
Exactly right.
> I'm pretty sure I remember Stefano speculating about the need to use
> some orchestration layer (like {Cloud,Open}Stack) _within_ those big
> irons to deal exactly with that... Stefano, am I talking nonsense? :-D
Nope, all true.
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-19 10:03 ` Ian Jackson
@ 2012-10-19 10:39 ` Stefano Stabellini
2012-10-19 10:56 ` Dario Faggioli
0 siblings, 1 reply; 22+ messages in thread
From: Stefano Stabellini @ 2012-10-19 10:39 UTC (permalink / raw)
To: Ian Jackson
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, George Dunlap,
Dario Faggioli, xen-devel@lists.xen.org, Juergen Gross
On Fri, 19 Oct 2012, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 1 of 3] libxl: take node distances into account during NUMA placement"):
> > On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> > > Looks good overall -- my only worry is the N^2 nature of the
> > > algorithm. We're already doing some big combinatorial thing to
> > > generate the candidates, right?
> > >
> > It is, with N being the number of nodes, which we discussed thoroughly
> > already a couple of months ago, and reached consensus on the fact that N
> > will stay less than 8 for the next 5 (but probably even more) years. :-)
>
> No, I don't think we did reach consensus about that. It was asserted
> but I dissented. I don't think this is a reasonable assumption.
When Dario speaks about consensus in terms of hardware that is going to
reach the market, he really means what Intel and AMD have told us.
Sorry, but your opinion doesn't count that much in the matter of cpu
architectures being produced in the near future, unless you have already
started a secret Californian cpu startup that nobody knows about yet
;-)
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-18 23:20 ` Dario Faggioli
2012-10-19 10:03 ` Ian Jackson
2012-10-19 10:35 ` Stefano Stabellini
@ 2012-10-19 10:50 ` George Dunlap
2012-10-19 11:00 ` Dario Faggioli
2012-10-19 14:57 ` Ian Jackson
3 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2012-10-19 10:50 UTC (permalink / raw)
To: Dario Faggioli
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, Juergen Gross,
Ian Jackson, xen-devel@lists.xen.org
On 19/10/12 00:20, Dario Faggioli wrote:
> On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
>> On Tue, Oct 16, 2012 at 6:26 PM, Dario Faggioli
>> <dario.faggioli@citrix.com> wrote:
>>> In fact, among placement candidates with the same number of nodes, the
>>> closer the various nodes are to each other, the better the performance
>>> for a domain placed there.
>> Looks good overall -- my only worry is the N^2 nature of the
>> algorithm. We're already doing some big combinatorial thing to
>> generate the candidates, right?
>>
> It is, with N being the number of nodes, which we discussed thoroughly
> already a couple of months ago, and reached consensus on the fact that N
> will stay less than 8 for the next 5 (but probably even more) years. :-)
>
> In any case, if something really unexpected happens, and N jumps to
> anything bigger than 16, the placement algorithm won't even start, and
> we'll never reach this point!.
Ah, right -- Yes, I think the "safety valve" was the thing I wanted to
have in place.
In that case:
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-19 10:39 ` Stefano Stabellini
@ 2012-10-19 10:56 ` Dario Faggioli
0 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-19 10:56 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Andre Przywara, Ian Campbell, George Dunlap, Juergen Gross,
Ian Jackson, xen-devel@lists.xen.org
On Fri, 2012-10-19 at 11:39 +0100, Stefano Stabellini wrote:
> On Fri, 19 Oct 2012, Ian Jackson wrote:
> > Dario Faggioli writes ("Re: [Xen-devel] [PATCH 1 of 3] libxl: take node distances into account during NUMA placement"):
> > > It is, with N being the number of nodes, which we discussed thoroughly
> > > already a couple of months ago, and reached consensus on the fact that N
> > > will stay less than 8 for the next 5 (but probably even more) years. :-)
> >
> > No, I don't think we did reach consensus about that. It was asserted
> > but I dissented. I don't think this is a reasonable assumption.
>
> When Dario speaks about consensus in terms of hardware that is going to
> reach the market, he really means what Intel and AMD have told us.
>
Indeed.
That being said, I really wanted to avoid re-starting that discussion
so, while trying to summarize it as quickly as possible, I have no problem
admitting that "reach consensus" may not have been the perfect choice of
words. My bad.
The whole point is that the solution we have (and that I'm trying to
keep improving with these patches) is both viable with current and
near-future hardware and does no harm with any crazy, unexpected
breakthrough in CPU design --thanks to the safety catch IanJ suggested.
Moreover, it is easily amendable, for example by taking more advantage of
cpupools (with which the placement algorithm is already integrated to
some extent), in case the latter happens.
This is so true that, as I already said, I've no problem even starting
to think about how to put it together. Maybe not from tomorrow (I'm
quite busy with other stuff :-D), but definitely before 4.3, if we think
it's something we couldn't live without.
> Sorry but your opinion doesn't count that much in the matter of cpu
> architecture being produces in the near future, unless you have already
> started a secret Californian cpu startup, that nobody knows about yet
> ;-)
>
:-D
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-19 10:50 ` George Dunlap
@ 2012-10-19 11:00 ` Dario Faggioli
0 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-19 11:00 UTC (permalink / raw)
To: George Dunlap
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, Juergen Gross,
Ian Jackson, xen-devel@lists.xen.org
[-- Attachment #1.1: Type: text/plain, Size: 808 bytes --]
On Fri, 2012-10-19 at 11:50 +0100, George Dunlap wrote:
> > In any case, if something really unexpected happens, and N jumps to
> > anything bigger than 16, the placement algorithm won't even start, and
> > we'll never reach this point!.
>
> Ah, right -- Yes, I think the "safety valve" was the thing I wanted to
> have in place.
>
Both you and Ian Jackson, yes. :-)
> In that case:
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
Cool. I'll add your Acks and respin. :-)
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-18 23:20 ` Dario Faggioli
` (2 preceding siblings ...)
2012-10-19 10:50 ` George Dunlap
@ 2012-10-19 14:57 ` Ian Jackson
2012-10-19 18:02 ` Dario Faggioli
3 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2012-10-19 14:57 UTC (permalink / raw)
To: Dario Faggioli
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, George Dunlap,
Juergen Gross, xen-devel@lists.xen.org
Dario Faggioli writes ("Re: [Xen-devel] [PATCH 1 of 3] libxl: take node distances into account during NUMA placement"):
> On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> > And now we're doing N^2 for each
> > candidate?
>
> Again, yes, but that is turning it from Ndoms*Nvcpus to
> Ndoms*Nvcpus+Nnodes^2, which is still dominated by the first term. IIRC,
> Andre tried to start >50 domains with 2 vCPUs on a 8 nodes system, which
> means 50*2 vs 8*8.
Are you sure about this calculation? It seems to me that
doing Nnodes^2 work for each candidate multiplies that cost by the number of
candidates, rather than just adding Nnodes^2.
There are in the worst case nearly 2^Nnodes candidates (combinations
of nodes). So the cost is
<= 2^Nnodes * Nnodes^2
Your algorithm runs with up to 16 nodes. Your change here increases
the worst-case cost from
2^16 = 65536
to
2^16 * 16^2 = 16777216
I don't think that's a good idea.
Ian.
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-19 14:57 ` Ian Jackson
@ 2012-10-19 18:02 ` Dario Faggioli
2012-10-21 7:35 ` Dario Faggioli
0 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2012-10-19 18:02 UTC (permalink / raw)
To: Ian Jackson
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, George Dunlap,
Juergen Gross, xen-devel@lists.xen.org
On Fri, 2012-10-19 at 15:57 +0100, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] [PATCH 1 of 3] libxl: take node distances into account during NUMA placement"):
> > On Thu, 2012-10-18 at 16:17 +0100, George Dunlap wrote:
> > > And now we're doing N^2 for each
> > > candidate?
> >
> > Again, yes, but that is turning it from Ndoms*Nvcpus to
> > Ndoms*Nvcpus+Nnodes^2, which is still dominated by the first term. IIRC,
> > Andre tried to start >50 domains with 2 vCPUs on a 8 nodes system, which
> > means 50*2 vs 8*8.
>
> Are you sure about this calculation ?
>
I am. In fact, I think those numbers answer George's question, which was "by
how much are we increasing the complexity of each step?" and not "by how
much are we increasing the overall complexity?".
That of course doesn't mean I don't care about the overall complexity,
just that I don't think this is making things worse than what we have
(which is, btw, the result of the long discussion we had pre 4.2
release).
> It seems to me that
> doing Nnodes^2 for each candidate multiplies the cost by the number of
> candidates, rather than adding Nnodes^2.
>
That is definitely true. Again the key is the difference between talking
about "each candidate", as we were, and total.
> There are in the worst case nearly 2^Nnodes candidates (combinations
> of nodes). So the cost is
> <= 2^Nnodes * Nnodes^2
>
> Your algorithm runs with up to 16 nodes. Your change here increases
> the worst-case cost from
> 2^16 = 65536
> to
> 2^16 * 16^2 = 16777216
>
Ok, I'm fine with the "let's throw numbers" game, so let me throw my
own[*] before proposing a couple of possible strategies. :-D
So, here is the exact calculation of the number of candidates the current
code generates (in the worst case, i.e., when the only suitable candidate
for the domain being created is the very last one evaluated):
combs=0;
N=16;
for k=1:N
  combs=combs+bincoeff(N,k);
endfor;
combs
combs = 65535
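(That is just 2^16 - 1 = 65535, i.e., every possible non-empty subset of
the 16 nodes.)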
Now, the change about distances adds to each step an amount of work that
can be upper bounded by this:
steps=0;
N=16;
for i=0:N
  for j=1:N-i
    steps++;
  endfor;
endfor;
steps
steps = 136
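(Which is just N*(N+1)/2 = 16*17/2 = 136 for N=16.)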
Hence:
combs*steps
ans = 8912760
But this is of course not tight, and a more accurate calculation of the
overall worst-case number of "steps" you're interested in looks more like
this:
combs=0;
N=16;
for k=1:N
  steps=0;
  for i=0:k
    for j=1:k-i
      steps++;
    endfor;
  endfor;
  combs=combs+bincoeff(N,k)*steps;
endfor;
combs
combs = 2490368
So not exactly 16777216, although, don't get me wrong, I agree it is
huge.
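(As a sanity check, that sum has a closed form: sum_k bincoeff(N,k)*k*(k+1)/2
= N*(N+3)*2^(N-3), which for N=16 gives 16*19*8192 = 2490368 as well.)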
So, now that the math has been taken care of, here come the proposals I
was talking about before.
1. For this specific issue, I can try to change the way I use the
   distances matrix and the way I define and compute the measure of how
   distant the nodes in a candidate are from each other, so that it
   becomes at least linear in computation time. It's not immediate,
   mainly because it's a matrix after all, but maybe I can save some
   intermediate results while the candidates are being built and
   amortize the complexity (I have ideas, but nothing precise enough to
   share yet; there is a rough sketch of what I mean right after point
   2). Does that make sense to you?
2. Independently from this specific issue, I think we need something to
   determine where a reasonable limit lies, and whether what we have and
   the changes we will make result in a decent domain creation time or
   not.
   That's something very hard to do, but I was thinking about writing
   some sort of 'simulator', i.e., a standalone program that calls the
   placement function, intercepts the libxl_ calls it makes, and lets it
   run in an artificial scenario of choice (a skeleton of what I have in
   mind is sketched below as well).
   Something like that has already been suggested during one of the last
   rounds of review (although for different purposes), and I thought it
   was a really good idea... Now I think it is something we definitely
   need. :-)
   I know it's not going to reach usec-level precision (hypercalls being
   simulated by function calls, and stuff like that), but I think it
   could be useful... You?
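Just to make point 1 a bit more concrete, here is a minimal sketch of the
kind of "intermediate results" I have in mind (struct and function names
are made up, this is not actual libxl code): if each candidate is grown
one node at a time, only the distances involving the newly added node
need to be summed, so updating the distance measure costs O(k) per
extension instead of O(k^2) per candidate.

#include <stdint.h>

#define MAX_NODES 16   /* current limit of the placement algorithm */

/* Running state for the candidate being built (illustrative only). */
struct cand_acc {
    int nr_nodes;              /* nodes already in the candidate */
    int nodes[MAX_NODES];      /* their ids */
    uint64_t dist_sum;         /* sum of pairwise distances so far */
};

/* Add node 'n' to the candidate: only the distances between 'n' and the
 * nodes already present are accumulated, so no candidate is ever
 * re-scanned from scratch. 'dist' is the flattened NxN distance matrix. */
static void cand_add_node(struct cand_acc *c, const unsigned *dist,
                          int nr_all_nodes, int n)
{
    int i;

    for (i = 0; i < c->nr_nodes; i++)
        c->dist_sum += dist[n * nr_all_nodes + c->nodes[i]];
    c->nodes[c->nr_nodes++] = n;
}

Whether this fits the way candidates are actually generated is exactly
the part I still have to figure out.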
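And here is, very roughly, the kind of standalone harness I mean in point
2. All the names (fake_topology_setup, fake_domains_setup,
placement_find_candidate) are placeholders standing in for the real
placement entry point and for the stubbed-out libxl_ calls, so take it
just as a skeleton:

#include <stdio.h>
#include <time.h>

/* Stubs standing in for the code under test and its environment. */
static void fake_topology_setup(int nr_nodes) { (void)nr_nodes; }
static void fake_domains_setup(int nr_doms, int nr_vcpus)
{ (void)nr_doms; (void)nr_vcpus; }
static void placement_find_candidate(void) { /* real algorithm here */ }

int main(void)
{
    struct timespec t0, t1;
    int nr_nodes, nr_doms;

    for (nr_nodes = 2; nr_nodes <= 16; nr_nodes *= 2) {
        for (nr_doms = 10; nr_doms <= 60; nr_doms += 10) {
            /* Build an artificial scenario: topology plus a set of
             * already running domains with 2 vcpus each. */
            fake_topology_setup(nr_nodes);
            fake_domains_setup(nr_doms, 2);

            clock_gettime(CLOCK_MONOTONIC, &t0);
            placement_find_candidate();
            clock_gettime(CLOCK_MONOTONIC, &t1);

            printf("%2d nodes, %2d domains: %.3f ms\n", nr_nodes, nr_doms,
                   (t1.tv_sec - t0.tv_sec) * 1e3 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e6);
        }
    }
    return 0;
}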
Thanks again for looking into this and Regards,
Dario
[*] copying and pasting the lines into Octave should work.
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
2012-10-19 18:02 ` Dario Faggioli
@ 2012-10-21 7:35 ` Dario Faggioli
0 siblings, 0 replies; 22+ messages in thread
From: Dario Faggioli @ 2012-10-21 7:35 UTC (permalink / raw)
To: Ian Jackson
Cc: Andre Przywara, Ian Campbell, Stefano Stabellini, George Dunlap,
Juergen Gross, xen-devel@lists.xen.org
On Fri, 2012-10-19 at 20:02 +0200, Dario Faggioli wrote:
> I am. In fact, I think those numbers answer George's question, which
> was "by how much are we increasing the complexity of each step?" and
> not "by how much are we increasing the overall complexity?".
>
> That of course doesn't mean I don't care about the overall complexity,
> just that I don't think this is making things worse than what we have
> (which is, btw, the result of the long discussion we had pre 4.2
> release).
>
> > It seems to me that
> > doing Nnodes^2 for each candidate multiplies the cost by the number of
> > candidates, rather than adding Nnodes^2.
> >
> That is definitely true. Again the key is the difference between talking
> about "each candidate", as we were, and total.
>
Which, BTW, makes me think that what you don't like is the fact that, for
each candidate, we go through the list of domains to check which vcpus
are bound to that candidate.
That was requested during review of the automatic placement series, was
itself reviewed, was tested on an 8-node box with 50-60 domains, and
became cs 4165d71479f9.
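(For reference, that per-candidate accounting boils down to something
like the sketch below; the types and names are made up here, it is not
the code from that changeset, but it shows where the Ndoms*Nvcpus term
per candidate comes from.)

#include <stdint.h>

/* Illustrative types only, not libxl's. One bit per NUMA node. */
typedef uint64_t nodemask_t;

struct vcpuinfo { nodemask_t node_affinity; };
struct dominfo  { int nr_vcpus; struct vcpuinfo vcpus[128]; };

/* Count how many existing vcpus can run on the candidate's nodes: this
 * walk over all domains and all their vcpus is what each candidate
 * currently pays (Ndoms * Nvcpus). */
static int vcpus_on_candidate(const struct dominfo *doms, int nr_doms,
                              nodemask_t candidate)
{
    int d, v, count = 0;

    for (d = 0; d < nr_doms; d++)
        for (v = 0; v < doms[d].nr_vcpus; v++)
            if (doms[d].vcpus[v].node_affinity & candidate)
                count++;
    return count;
}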
Anyway, if you want, I think I can change that too. :-)
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
^ permalink raw reply [flat|nested] 22+ messages in thread
Thread overview: 22+ messages
2012-10-16 17:26 [PATCH 0 of 3] Some small NUMA placement improvements Dario Faggioli
2012-10-16 17:26 ` [PATCH 1 of 3] libxl: take node distances into account during NUMA placement Dario Faggioli
2012-10-18 15:17 ` George Dunlap
2012-10-18 23:20 ` Dario Faggioli
2012-10-19 10:03 ` Ian Jackson
2012-10-19 10:39 ` Stefano Stabellini
2012-10-19 10:56 ` Dario Faggioli
2012-10-19 10:35 ` Stefano Stabellini
2012-10-19 10:50 ` George Dunlap
2012-10-19 11:00 ` Dario Faggioli
2012-10-19 14:57 ` Ian Jackson
2012-10-19 18:02 ` Dario Faggioli
2012-10-21 7:35 ` Dario Faggioli
2012-10-16 17:26 ` [PATCH 2 of 3] libxl, xl: user can ask for min and max nodes to use during placement Dario Faggioli
2012-10-18 15:21 ` George Dunlap
2012-10-16 17:26 ` [PATCH 3 of 3] xl: allow for node-wise specification of vcpu pinning Dario Faggioli
2012-10-18 11:23 ` Ian Campbell
2012-10-18 13:11 ` Dario Faggioli
2012-10-18 13:15 ` Ian Campbell
2012-10-18 13:18 ` Dario Faggioli
2012-10-18 15:30 ` George Dunlap
2012-10-18 22:35 ` Dario Faggioli