From: Dario Faggioli <dario.faggioli@citrix.com>
To: xen-devel@lists.xen.org
Cc: Andre Przywara <andre.przywara@amd.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>
Subject: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
Date: Tue, 16 Oct 2012 19:26:26 +0200 [thread overview]
Message-ID: <fcccd3353cc6f336b7b0.1350408386@Solace> (raw)
In-Reply-To: <patchbomb.1350408385@Solace>
In fact, among placement candidates with the same number of nodes, the
closer the various nodes are to each other, the better the performance
for a domain placed there.
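For example (with made-up, ACPI SLIT style distances, where 10 means
local): if dist(0,1)=16 and dist(0,3)=22, the 2-node candidate {0,1}
totals 2*10 + (16+16) + 2*10 = 72, while {0,3} totals
2*10 + (22+22) + 2*10 = 84, so {0,1} is preferred (self distances are
counted, and each pair contributes in both directions, as the code
below does).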
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -105,6 +105,9 @@ out:
* - the number of vcpus runnable on the candidates is considered, and
 *    candidates with fewer of them are preferred. If two candidates have
* the same number of runnable vcpus,
+ * - the sum of the node distances in the candidates is considered, and
+ * candidates with smaller total distance are preferred. If total
+ *    distance is the same for the two candidates,
 * - the amount of free memory in the candidates is considered, and the
 *    candidate with the greater amount of it is preferred.
*
@@ -114,6 +117,10 @@ out:
 * overloading large (from a memory POV) nodes. That's exactly the effect
* that counting the vcpus able to run on the nodes tries to prevent.
*
+ * The relative distances between the nodes of each candidate are also
+ * considered: the closer the nodes, the better for a domain that ends
+ * up being placed on that candidate.
+ *
 * Note that this completely ignores the number of nodes each candidate spans,
* as the fact that fewer nodes is better is already accounted for in the
* algorithm.
@@ -124,6 +131,9 @@ static int numa_cmpf(const libxl__numa_c
if (c1->nr_vcpus != c2->nr_vcpus)
return c1->nr_vcpus - c2->nr_vcpus;
+ if (c1->dists_sum != c2->dists_sum)
+ return c1->dists_sum - c2->dists_sum;
+
return c2->free_memkb - c1->free_memkb;
}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2732,6 +2732,7 @@ static inline void libxl__ctx_unlock(lib
typedef struct {
int nr_cpus, nr_nodes;
int nr_vcpus;
+ int dists_sum;
uint32_t free_memkb;
libxl_bitmap nodemap;
} libxl__numa_candidate;
diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
--- a/tools/libxl/libxl_numa.c
+++ b/tools/libxl/libxl_numa.c
@@ -218,6 +218,40 @@ static int nodemap_to_nr_vcpus(libxl__gc
return nr_vcpus;
}
+/* Sum the relative distances between the nodes in the nodemap, to help
+ * figure out which candidate is the "tightest" one. */
+static int nodemap_to_dists_sum(libxl_numainfo *ninfo, libxl_bitmap *nodemap)
+{
+ int tot_dist = 0;
+ int i, j, a = 0, b;
+
+ for (i = 0; i < libxl_bitmap_count_set(nodemap); i++) {
+ while (!libxl_bitmap_test(nodemap, a))
+ a++;
+
+ /* As it is usually non-zero, we do take the latency of
+         * a node to itself into account. */
+ b = a;
+ for (j = 0; j < libxl_bitmap_count_set(nodemap) - i; j++) {
+ while (!libxl_bitmap_test(nodemap, b))
+ b++;
+
+ /*
+             * On most architectures, going from node A to node B costs
+             * exactly as much as going from B to A. However, let's not
+             * rely on that and consider both contributions, just to be
+             * ready for whatever the future might bring.
+ */
+ tot_dist += ninfo[a].dists[b];
+ tot_dist += ninfo[b].dists[a];
+ b++;
+ }
+ a++;
+ }
+
+ return tot_dist;
+}
+
/*
* This function tries to figure out if the host has a consistent number
* of cpus along all its NUMA nodes. In fact, if that is the case, we can
@@ -415,6 +449,7 @@ int libxl__get_numa_candidate(libxl__gc
*/
libxl__numa_candidate_put_nodemap(gc, &new_cndt, &nodemap);
new_cndt.nr_vcpus = nodemap_to_nr_vcpus(gc, tinfo, &nodemap);
+ new_cndt.dists_sum = nodemap_to_dists_sum(ninfo, &nodemap);
new_cndt.free_memkb = nodes_free_memkb;
new_cndt.nr_nodes = libxl_bitmap_count_set(&nodemap);
new_cndt.nr_cpus = nodes_cpus;
@@ -430,12 +465,14 @@ int libxl__get_numa_candidate(libxl__gc
            LOG(DEBUG, "New best NUMA placement candidate found: "
                       "nr_nodes=%d, nr_cpus=%d, nr_vcpus=%d, "
-                      "free_memkb=%"PRIu32"", new_cndt.nr_nodes,
-                      new_cndt.nr_cpus, new_cndt.nr_vcpus,
+                      "dists_sum=%d, free_memkb=%"PRIu32"",
+                      new_cndt.nr_nodes, new_cndt.nr_cpus,
+                      new_cndt.nr_vcpus, new_cndt.dists_sum,
                       new_cndt.free_memkb / 1024);
libxl__numa_candidate_put_nodemap(gc, cndt_out, &nodemap);
cndt_out->nr_vcpus = new_cndt.nr_vcpus;
+ cndt_out->dists_sum = new_cndt.dists_sum;
cndt_out->free_memkb = new_cndt.free_memkb;
cndt_out->nr_nodes = new_cndt.nr_nodes;
cndt_out->nr_cpus = new_cndt.nr_cpus;
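For illustration, here is a small standalone program (not part of the
patch) that mimics the pair-walk in nodemap_to_dists_sum() with a plain
bitmask instead of a libxl_bitmap, and a made-up, SLIT-style 4-node
distance table; all names and values in it are hypothetical:

    /*
     * Standalone sketch of the nodemap_to_dists_sum() logic above.
     * Build and run with, e.g.:
     *     cc -o dists_sum_demo dists_sum_demo.c && ./dists_sum_demo
     */
    #include <stdio.h>

    #define NR_NODES 4

    /* Hypothetical distance table; 10 is the conventional "local" value. */
    static const int dists[NR_NODES][NR_NODES] = {
        { 10, 16, 21, 22 },
        { 16, 10, 22, 21 },
        { 21, 22, 10, 16 },
        { 22, 21, 16, 10 },
    };

    /*
     * Walk every (unordered) pair of nodes in the map, including each
     * node paired with itself, and add the distance in both directions,
     * exactly as the patch does.
     */
    static int dists_sum(unsigned int nodemap)
    {
        int a, b, tot = 0;

        for (a = 0; a < NR_NODES; a++) {
            if (!(nodemap & (1u << a)))
                continue;
            for (b = a; b < NR_NODES; b++) {
                if (!(nodemap & (1u << b)))
                    continue;
                tot += dists[a][b] + dists[b][a];
            }
        }
        return tot;
    }

    int main(void)
    {
        /* Two 2-node candidates: same nr_nodes, different "tightness". */
        printf("dists_sum({0,1}) = %d\n", dists_sum(0x3)); /* prints 72 */
        printf("dists_sum({0,3}) = %d\n", dists_sum(0x9)); /* prints 84 */
        return 0;
    }

With these numbers, candidate {0,1} totals 72 against 84 for {0,3}; so,
assuming (as the rest of the comparator suggests) that a negative return
from numa_cmpf() means the first candidate is preferred, {0,1} would win
whenever the two candidates tie on runnable vcpus.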