From: Dario Faggioli <dario.faggioli@citrix.com>
To: xen-devel@lists.xen.org
Cc: Andre Przywara <andre.przywara@amd.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>
Subject: [PATCH 1 of 3] libxl: take node distances into account during NUMA placement
Date: Tue, 16 Oct 2012 19:26:26 +0200 [thread overview]
Message-ID: <fcccd3353cc6f336b7b0.1350408386@Solace> (raw)
In-Reply-To: <patchbomb.1350408385@Solace>
In fact, among placement candidates with the same number of nodes, the
closer the various nodes are to each other, the better the performance
for a domain placed there.
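For example (with made-up, ACPI SLIT style distances, where 10 means
local): if dist(0,1)=16 and dist(0,3)=22, the 2-node candidate {0,1}
totals 2*10 + (16+16) + 2*10 = 72, while {0,3} totals
2*10 + (22+22) + 2*10 = 84, so {0,1} is preferred (self distances are
counted, and each pair contributes in both directions, as the code
below does).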
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -105,6 +105,9 @@ out:
* - the number of vcpus runnable on the candidates is considered, and
 *    candidates with fewer of them are preferred. If two candidates have
* the same number of runnable vcpus,
+ * - the sum of the node distances in the candidates is considered, and
+ * candidates with smaller total distance are preferred. If total
+ *    distance is the same for the two candidates,
 * - the amount of free memory in the candidates is considered, and the
 *    candidate with the greater amount of it is preferred.
*
@@ -114,6 +117,10 @@ out:
 * overloading large (from a memory POV) nodes. That's exactly the effect
* that counting the vcpus able to run on the nodes tries to prevent.
*
+ * The relative distances between the nodes of each candidate are also
+ * considered: the closer the nodes, the better for a domain that ends
+ * up being placed on that candidate.
+ *
 * Note that this completely ignores the number of nodes each candidate spans,
* as the fact that fewer nodes is better is already accounted for in the
* algorithm.
@@ -124,6 +131,9 @@ static int numa_cmpf(const libxl__numa_c
if (c1->nr_vcpus != c2->nr_vcpus)
return c1->nr_vcpus - c2->nr_vcpus;
+ if (c1->dists_sum != c2->dists_sum)
+ return c1->dists_sum - c2->dists_sum;
+
return c2->free_memkb - c1->free_memkb;
}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2732,6 +2732,7 @@ static inline void libxl__ctx_unlock(lib
typedef struct {
int nr_cpus, nr_nodes;
int nr_vcpus;
+ int dists_sum;
uint32_t free_memkb;
libxl_bitmap nodemap;
} libxl__numa_candidate;
diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
--- a/tools/libxl/libxl_numa.c
+++ b/tools/libxl/libxl_numa.c
@@ -218,6 +218,40 @@ static int nodemap_to_nr_vcpus(libxl__gc
return nr_vcpus;
}
+/* Sum the relative distances between the nodes in the nodemap, to help
+ * figure out which candidate is the "tightest" one. */
+static int nodemap_to_dists_sum(libxl_numainfo *ninfo, libxl_bitmap *nodemap)
+{
+ int tot_dist = 0;
+ int i, j, a = 0, b;
+
+ for (i = 0; i < libxl_bitmap_count_set(nodemap); i++) {
+ while (!libxl_bitmap_test(nodemap, a))
+ a++;
+
+ /* As it is usually non-zero, we do take the latency of
+         * a node to itself into account. */
+ b = a;
+ for (j = 0; j < libxl_bitmap_count_set(nodemap) - i; j++) {
+ while (!libxl_bitmap_test(nodemap, b))
+ b++;
+
+ /*
+             * On most architectures, going from node A to node B costs
+             * exactly as much as going from B to A. However, let's not
+             * rely on that and consider both contributions, just to be
+             * ready for whatever the future might bring.
+ */
+ tot_dist += ninfo[a].dists[b];
+ tot_dist += ninfo[b].dists[a];
+ b++;
+ }
+ a++;
+ }
+
+ return tot_dist;
+}
+
/*
* This function tries to figure out if the host has a consistent number
* of cpus along all its NUMA nodes. In fact, if that is the case, we can
@@ -415,6 +449,7 @@ int libxl__get_numa_candidate(libxl__gc
*/
libxl__numa_candidate_put_nodemap(gc, &new_cndt, &nodemap);
new_cndt.nr_vcpus = nodemap_to_nr_vcpus(gc, tinfo, &nodemap);
+ new_cndt.dists_sum = nodemap_to_dists_sum(ninfo, &nodemap);
new_cndt.free_memkb = nodes_free_memkb;
new_cndt.nr_nodes = libxl_bitmap_count_set(&nodemap);
new_cndt.nr_cpus = nodes_cpus;
@@ -430,12 +465,14 @@ int libxl__get_numa_candidate(libxl__gc
            LOG(DEBUG, "New best NUMA placement candidate found: "
                       "nr_nodes=%d, nr_cpus=%d, nr_vcpus=%d, "
-                      "free_memkb=%"PRIu32"", new_cndt.nr_nodes,
-                      new_cndt.nr_cpus, new_cndt.nr_vcpus,
+                      "dists_sum=%d, free_memkb=%"PRIu32"",
+                      new_cndt.nr_nodes, new_cndt.nr_cpus,
+                      new_cndt.nr_vcpus, new_cndt.dists_sum,
                       new_cndt.free_memkb / 1024);
libxl__numa_candidate_put_nodemap(gc, cndt_out, &nodemap);
cndt_out->nr_vcpus = new_cndt.nr_vcpus;
+ cndt_out->dists_sum = new_cndt.dists_sum;
cndt_out->free_memkb = new_cndt.free_memkb;
cndt_out->nr_nodes = new_cndt.nr_nodes;
cndt_out->nr_cpus = new_cndt.nr_cpus;
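For illustration, here is a small standalone program (not part of the
patch) that mimics the pair-walk in nodemap_to_dists_sum() with a plain
bitmask instead of a libxl_bitmap, and a made-up, SLIT-style 4-node
distance table; all names and values in it are hypothetical:

    /*
     * Standalone sketch of the nodemap_to_dists_sum() logic above.
     * Build and run with, e.g.:
     *     cc -o dists_sum_demo dists_sum_demo.c && ./dists_sum_demo
     */
    #include <stdio.h>

    #define NR_NODES 4

    /* Hypothetical distance table; 10 is the conventional "local" value. */
    static const int dists[NR_NODES][NR_NODES] = {
        { 10, 16, 21, 22 },
        { 16, 10, 22, 21 },
        { 21, 22, 10, 16 },
        { 22, 21, 16, 10 },
    };

    /*
     * Walk every (unordered) pair of nodes in the map, including each
     * node paired with itself, and add the distance in both directions,
     * exactly as the patch does.
     */
    static int dists_sum(unsigned int nodemap)
    {
        int a, b, tot = 0;

        for (a = 0; a < NR_NODES; a++) {
            if (!(nodemap & (1u << a)))
                continue;
            for (b = a; b < NR_NODES; b++) {
                if (!(nodemap & (1u << b)))
                    continue;
                tot += dists[a][b] + dists[b][a];
            }
        }
        return tot;
    }

    int main(void)
    {
        /* Two 2-node candidates: same nr_nodes, different "tightness". */
        printf("dists_sum({0,1}) = %d\n", dists_sum(0x3)); /* prints 72 */
        printf("dists_sum({0,3}) = %d\n", dists_sum(0x9)); /* prints 84 */
        return 0;
    }

With these numbers, candidate {0,1} totals 72 against 84 for {0,3}; so,
assuming (as the rest of the comparator suggests) that a negative return
from numa_cmpf() means the first candidate is preferred, {0,1} would win
whenever the two candidates tie on runnable vcpus.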