From: Matt Fleming <matt@codeblueprint.co.uk>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
Matt Fleming <matt@codeblueprint.co.uk>,
"Suthikulpanit, Suravee" <Suravee.Suthikulpanit@amd.com>,
Mel Gorman <mgorman@techsingularity.net>,
"Lendacky, Thomas" <Thomas.Lendacky@amd.com>,
Borislav Petkov <bp@alien8.de>
Subject: [PATCH] sched/topology: Improve load balancing on AMD EPYC
Date: Wed, 5 Jun 2019 16:59:22 +0100
Message-ID: <20190605155922.17153-1-matt@codeblueprint.co.uk>
SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init()
for any sched domains with a NUMA distance greater than 2 hops
(RECLAIM_DISTANCE). The idea is that it's expensive to balance
across domains that far apart.

However, as is rather unfortunately explained in

  commit 32e45ff43eaf ("mm: increase RECLAIM_DISTANCE to 30")

the value for RECLAIM_DISTANCE is based on node distance tables from
2011-era hardware.

Current AMD EPYC machines have the following NUMA node distances:
  node distances:
  node   0   1   2   3   4   5   6   7
    0:  10  16  16  16  32  32  32  32
    1:  16  10  16  16  32  32  32  32
    2:  16  16  10  16  32  32  32  32
    3:  16  16  16  10  32  32  32  32
    4:  32  32  32  32  10  16  16  16
    5:  32  32  32  32  16  10  16  16
    6:  32  32  32  32  16  16  10  16
    7:  32  32  32  32  16  16  16  10
where 2 hops is 32.
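
To make the comparison concrete, here is a minimal sketch in plain userspace C (not kernel code) that collects the distinct distances from the table above and checks each against RECLAIM_DISTANCE. The value 30 is taken from the subject of commit 32e45ff43eaf; the helper names unique_distances() and exceeds_reclaim_distance() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 8
#define RECLAIM_DISTANCE 30 /* value named in the subject of commit 32e45ff43eaf */

/* Node distance table quoted in the commit message. */
static const int epyc_distance[NR_NODES][NR_NODES] = {
	{ 10, 16, 16, 16, 32, 32, 32, 32 },
	{ 16, 10, 16, 16, 32, 32, 32, 32 },
	{ 16, 16, 10, 16, 32, 32, 32, 32 },
	{ 16, 16, 16, 10, 32, 32, 32, 32 },
	{ 32, 32, 32, 32, 10, 16, 16, 16 },
	{ 32, 32, 32, 32, 16, 10, 16, 16 },
	{ 32, 32, 32, 32, 16, 16, 10, 16 },
	{ 32, 32, 32, 32, 16, 16, 16, 10 },
};

/*
 * Hypothetical helper: store the distinct distances in ascending order
 * in out[] (capacity cap) and return how many were found.
 */
static size_t unique_distances(int out[], size_t cap)
{
	size_t count = 0;

	for (size_t i = 0; i < NR_NODES; i++) {
		for (size_t j = 0; j < NR_NODES; j++) {
			int d = epyc_distance[i][j];
			size_t k = 0;

			/* Find the first slot holding a value >= d. */
			while (k < count && out[k] < d)
				k++;
			if (k < count && out[k] == d)
				continue; /* already recorded */

			assert(count < cap);
			/* Shift larger values up and insert d in order. */
			for (size_t m = count; m > k; m--)
				out[m] = out[m - 1];
			out[k] = d;
			count++;
		}
	}
	return count;
}

/* Hypothetical helper mirroring the comparison made in sd_init(). */
static int exceeds_reclaim_distance(int distance)
{
	return distance > RECLAIM_DISTANCE;
}
```

Called with a buffer of NR_NODES * NR_NODES entries, unique_distances() yields three distances for this table (10, 16 and 32), and only the cross-socket distance of 32 is greater than RECLAIM_DISTANCE.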
The result is that the scheduler fails to load balance properly across
NUMA nodes on different sockets -- 2 hops apart.
For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4
(CPUs 32-39) like so,
$ numactl -C 0-7,32-39 ./spinner 16
causes all threads to fork and remain on node 0 until the active
balancer kicks in after a few seconds and forcibly moves some threads
to node 4.
Update the code in sd_init() to account for modern node distances
while maintaining backward-compatible behaviour by respecting
RECLAIM_DISTANCE for distances more than 2 hops.
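
As a rough illustration of the change in behaviour, the following userspace sketch models the flag-stripping condition before and after this patch. It is not the kernel code: the flag bit values are placeholders, the helper names are invented, and the numa_level passed in for a given domain is an assumption, since this message does not state which level the cross-socket domain ends up at.

```c
#include <assert.h>
#include <stdbool.h>

#define RECLAIM_DISTANCE 30 /* value named in the subject of commit 32e45ff43eaf */

/* Placeholder bit values for illustration only, not the kernel's definitions. */
#define SD_BALANCE_EXEC 0x1u
#define SD_BALANCE_FORK 0x2u
#define SD_WAKE_AFFINE  0x4u

/* Condition before the patch: distance alone decides. */
static bool strip_before_patch(int distance)
{
	return distance > RECLAIM_DISTANCE;
}

/* Condition after the patch: distance and topology level must both agree. */
static bool strip_after_patch(int distance, int numa_level)
{
	return distance > RECLAIM_DISTANCE && numa_level > 3;
}

/* Hypothetical helper: clear the three balancing flags when strip is true. */
static unsigned int apply_strip(unsigned int flags, bool strip)
{
	if (strip)
		flags &= ~(SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE);
	return flags;
}
```

With a distance of 32 and an assumed numa_level of 3 or less, strip_before_patch() returns true while strip_after_patch() returns false, so the domain keeps its fork, exec and wake-affine balancing flags. A domain at a higher level whose distance still exceeds RECLAIM_DISTANCE is stripped exactly as before.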
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@amd.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
---
kernel/sched/topology.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f53f89df837d..0eea395f7c6b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1410,7 +1410,18 @@ sd_init(struct sched_domain_topology_level *tl,
 		sd->flags &= ~SD_PREFER_SIBLING;
 		sd->flags |= SD_SERIALIZE;
-		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
+
+		/*
+		 * Strip the following flags for sched domains with a NUMA
+		 * distance greater than the historical 2-hops value
+		 * (RECLAIM_DISTANCE) and where tl->numa_level confirms it
+		 * really is more than 2 hops.
+		 *
+		 * Respecting RECLAIM_DISTANCE means we maintain
+		 * backwards-compatible behaviour.
+		 */
+		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE &&
+		    tl->numa_level > 3) {
 			sd->flags &= ~(SD_BALANCE_EXEC |
 				       SD_BALANCE_FORK |
 				       SD_WAKE_AFFINE);
--
2.13.7