From: riel@redhat.com
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, mgorman@suse.de, chegu_vinod@hp.com,
mingo@kernel.org, efault@gmx.de, vincent.guittot@linaro.org
Subject: [PATCH RFC 2/5] sched,numa: classify the NUMA topology of a system
Date: Wed, 8 Oct 2014 15:37:27 -0400
Message-ID: <1412797050-8903-3-git-send-email-riel@redhat.com>
In-Reply-To: <1412797050-8903-1-git-send-email-riel@redhat.com>
From: Rik van Riel <riel@redhat.com>

Smaller NUMA systems tend to have all NUMA nodes directly connected
to each other. This includes the degenerate case of a system with just
one node, i.e. a non-NUMA system.

Larger systems can have two kinds of NUMA topology, which affects how
tasks and memory should be placed on the system.

On glueless mesh systems, nodes that are not directly connected to
each other will bounce traffic through intermediary nodes. Task groups
can be run closer to each other by moving tasks from a node to an
intermediary node between it and the task's preferred node.

On NUMA systems with backplane controllers, the intermediary hops
are incapable of running programs. This creates "islands" of nodes
that are at an equal distance to anywhere else in the system.

Each kind of topology requires a slightly different placement
algorithm; this patch provides the mechanism to detect the kind
of NUMA topology of a system.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
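Not part of the patch: for readers who want to try the classification
rule outside the kernel, below is a minimal userspace sketch of the same
test. Everything in it is an assumption made for illustration:
classify_topology(), NR_NODES and the three example hop tables are
hypothetical names, and the largest hop count found in the table is used
as n, on the assumption that it plays the role sched_domains_numa_levels
plays in the patch. The tables are hand-written stand-ins for node_hops().

```c
enum numa_topology_type {
	NUMA_DIRECT,
	NUMA_GLUELESS_MESH,
	NUMA_BACKPLANE,
};

#define NR_NODES 4	/* hypothetical fixed node count for the examples */

/* Every node is one hop away from every other node. */
static const int direct_hops[NR_NODES][NR_NODES] = {
	{ 0, 1, 1, 1 }, { 1, 0, 1, 1 }, { 1, 1, 0, 1 }, { 1, 1, 1, 0 },
};

/* Ring 0-1-2-3-0: opposite nodes reach each other through a neighbour. */
static const int mesh_hops[NR_NODES][NR_NODES] = {
	{ 0, 1, 2, 1 }, { 1, 0, 1, 2 }, { 2, 1, 0, 1 }, { 1, 2, 1, 0 },
};

/* Two islands {0,1} and {2,3}; every cross-island path is two hops. */
static const int backplane_hops[NR_NODES][NR_NODES] = {
	{ 0, 1, 2, 2 }, { 1, 0, 2, 2 }, { 2, 2, 0, 1 }, { 2, 2, 1, 0 },
};

static enum numa_topology_type
classify_topology(const int hops[NR_NODES][NR_NODES])
{
	int a, b, c, n = 0;

	/* Assumption: n is the largest hop count between any two nodes. */
	for (a = 0; a < NR_NODES; a++)
		for (b = 0; b < NR_NODES; b++)
			if (hops[a][b] > n)
				n = hops[a][b];

	/* No pair is more than one hop apart: directly connected. */
	if (n <= 1)
		return NUMA_DIRECT;

	for (a = 0; a < NR_NODES; a++) {
		for (b = 0; b < NR_NODES; b++) {
			/* Skip pairs that are not maximally far apart. */
			if (hops[a][b] < n)
				continue;

			/* Is some node c closer than n hops to both a and b? */
			for (c = 0; c < NR_NODES; c++)
				if (hops[a][c] < n && hops[b][c] < n)
					return NUMA_GLUELESS_MESH;

			/* No node sits between the two: backplane. */
			return NUMA_BACKPLANE;
		}
	}

	/* Not reached when n > 1; kept as a safe default. */
	return NUMA_DIRECT;
}
```

As in the patch, only the first maximally distant pair is examined. In
the ring table, nodes 0 and 2 are two hops apart and node 1 is one hop
from both, so the ring counts as a glueless mesh. In the island table no
node is closer than two hops to both node 0 and node 2, so the result is
a backplane.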
include/linux/topology.h | 7 +++++++
kernel/sched/core.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+)
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 33002f4..bf40d46 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -49,6 +49,13 @@
int arch_update_cpu_topology(void);
extern int node_hops(int i, int j);
+enum numa_topology_type {
+ NUMA_DIRECT,
+ NUMA_GLUELESS_MESH,
+ NUMA_BACKPLANE,
+};
+extern enum numa_topology_type sched_numa_topology_type;
+
/* Conform to ACPI 2.0 SLIT distance definitions */
#define LOCAL_DISTANCE 10
#define REMOTE_DISTANCE 20
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0cf501e..1898914 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6075,6 +6075,7 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
#ifdef CONFIG_NUMA
static int sched_domains_numa_levels;
+enum numa_topology_type sched_numa_topology_type;
static int *sched_domains_numa_distance;
static int *sched_domains_numa_hops;
static struct cpumask ***sched_domains_numa_masks;
@@ -6276,6 +6277,56 @@ static bool find_numa_distance(int distance)
return false;
}
+/*
+ * A system can have three types of NUMA topology:
+ * NUMA_DIRECT: all nodes are directly connected, or not a NUMA system
+ * NUMA_GLUELESS_MESH: some nodes reachable through intermediary nodes
+ * NUMA_BACKPLANE: nodes can reach other nodes through a backplane
+ *
+ * The difference between a glueless mesh topology and a backplane
+ * topology lies in whether communication between not directly
+ * connected nodes goes through intermediary nodes (where programs
+ * could run), or through backplane controllers. This affects
+ * placement of programs.
+ *
+ * The type of topology can be discerned with the following tests:
+ * - If the maximum distance between any nodes is 1 hop, the system
+ * is directly connected.
+ * - If for two nodes A and B, located N > 1 hops away from each other,
+ * there is an intermediary node C, which is < N hops away from both
+ * nodes A and B, the system is a glueless mesh.
+ * - Otherwise, the system has a backplane topology.
+ */
+static void init_numa_topology_type(void)
+{
+ int a, b, c, n;
+
+ n = sched_domains_numa_levels;
+
+ if (n <= 1) {
+ sched_numa_topology_type = NUMA_DIRECT;
+ return;
+ }
+
+ for_each_online_node(a) {
+ for_each_online_node(b) {
+ /* Find two nodes furthest removed from each other. */
+ if (node_hops(a, b) < n)
+ continue;
+
+ /* Is there an intermediary node between a and b? */
+ for_each_online_node(c) {
+ if (node_hops(a, c) < n &&
+ node_hops(b, c) < n) {
+ sched_numa_topology_type =
+ NUMA_GLUELESS_MESH;
+ return;
+ }
+ }
+
+ sched_numa_topology_type = NUMA_BACKPLANE;
+ return;
+ }
+ }
+}
+
static void sched_init_numa(void)
{
int next_distance, curr_distance = node_distance(0, 0);
@@ -6425,6 +6476,8 @@ static void sched_init_numa(void)
sched_domain_topology = tl;
sched_domains_numa_levels = level;
+
+ init_numa_topology_type();
}
static void sched_domains_numa_masks_set(int cpu)
--
1.9.3