[PATCH 12/26] sched/topology: Fix overlapping sched_group_capacity

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@techsingularity.net>
To: Linux-Stable <stable@vger.kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 12/26] sched/topology: Fix overlapping sched_group_capacity
Date: Thu, 20 Jul 2017 22:21:30 +0100	[thread overview]
Message-ID: <20170720212144.18453-13-mgorman@techsingularity.net> (raw)
In-Reply-To: <20170720212144.18453-1-mgorman@techsingularity.net>

From: Peter Zijlstra <peterz@infradead.org>

commit 1676330ecfa840113a37b25a49afda068380d19c upstream.

When building the overlapping groups we need to attach a consistent
sched_group_capacity structure. That is, all 'identical' sched_group's
should have the _same_ sched_group_capacity.

This can (once again) be demonstrated with a topology like:

  node   0   1   2   3
    0:  10  20  30  20
    1:  20  10  20  30
    2:  30  20  10  20
    3:  20  30  20  10

But we need at least 2 CPUs per node for this to show up, after all,
if there is only one CPU per node, our CPU @i is per definition a
unique CPU that reaches this domain (aka balance-cpu).

Given the above NUMA topo and 2 CPUs per node:

  [] CPU0 attaching sched-domain(s):
  []  domain-0: span=0,4 level=DIE
  []   groups: 0:{ span=0 }, 4:{ span=4 }
  []   domain-1: span=0-1,3-5,7 level=NUMA
  []    groups: 0:{ span=0,4 mask=0,4 cap=2048 }, 1:{ span=1,5 mask=1,5 cap=2048 }, 3:{ span=3,7 mask=3,7 cap=2048 }
  []    domain-2: span=0-7 level=NUMA
  []     groups: 0:{ span=0-1,3-5,7 mask=0,4 cap=6144 }, 2:{ span=1-3,5-7 mask=2,6 cap=6144 }
  [] CPU1 attaching sched-domain(s):
  []  domain-0: span=1,5 level=DIE
  []   groups: 1:{ span=1 }, 5:{ span=5 }
  []   domain-1: span=0-2,4-6 level=NUMA
  []    groups: 1:{ span=1,5 mask=1,5 cap=2048 }, 2:{ span=2,6 mask=2,6 cap=2048 }, 4:{ span=0,4 mask=0,4 cap=2048 }
  []    domain-2: span=0-7 level=NUMA
  []     groups: 1:{ span=0-2,4-6 mask=1,5 cap=6144 }, 3:{ span=0,2-4,6-7 mask=3,7 cap=6144 }

Observe how CPU0-domain1-group0 and CPU1-domain1-group4 are the
'same' but have a different id (0 vs 4).

To fix this, use the group balance CPU to select the SGC. This means
we have to compute the full mask for each CPU and require a second
temporary mask to store the group mask in (it otherwise lives in the
SGC).

The fixed topology looks like:

  [] CPU0 attaching sched-domain(s):
  []  domain-0: span=0,4 level=DIE
  []   groups: 0:{ span=0 }, 4:{ span=4 }
  []   domain-1: span=0-1,3-5,7 level=NUMA
  []    groups: 0:{ span=0,4 mask=0,4 cap=2048 }, 1:{ span=1,5 mask=1,5 cap=2048 }, 3:{ span=3,7 mask=3,7 cap=2048 }
  []    domain-2: span=0-7 level=NUMA
  []     groups: 0:{ span=0-1,3-5,7 mask=0,4 cap=6144 }, 2:{ span=1-3,5-7 mask=2,6 cap=6144 }
  [] CPU1 attaching sched-domain(s):
  []  domain-0: span=1,5 level=DIE
  []   groups: 1:{ span=1 }, 5:{ span=5 }
  []   domain-1: span=0-2,4-6 level=NUMA
  []    groups: 1:{ span=1,5 mask=1,5 cap=2048 }, 2:{ span=2,6 mask=2,6 cap=2048 }, 0:{ span=0,4 mask=0,4 cap=2048 }
  []    domain-2: span=0-7 level=NUMA
  []     groups: 1:{ span=0-2,4-6 mask=1,5 cap=6144 }, 3:{ span=0,2-4,6-7 mask=3,7 cap=6144 }

Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/topology.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 12af4b157928..4f6fa7553d92 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -10,6 +10,7 @@ DEFINE_MUTEX(sched_domains_mutex);
 
 /* Protected by sched_domains_mutex: */
 cpumask_var_t sched_domains_tmpmask;
+cpumask_var_t sched_domains_tmpmask2;
 
 #ifdef CONFIG_SCHED_DEBUG
 
@@ -500,13 +501,16 @@ enum s_alloc {
  * Only CPUs that can arrive at this group should be considered to continue
  * balancing.
  */
-static void build_group_mask(struct sched_domain *sd, struct sched_group *sg)
+static void
+build_group_mask(struct sched_domain *sd, struct sched_group *sg, struct cpumask *mask)
 {
 	const struct cpumask *sg_span = sched_group_cpus(sg);
 	struct sd_data *sdd = sd->private;
 	struct sched_domain *sibling;
 	int i;
 
+	cpumask_clear(mask);
+
 	for_each_cpu(i, sg_span) {
 		sibling = *per_cpu_ptr(sdd->sd, i);
 
@@ -522,11 +526,11 @@ static void build_group_mask(struct sched_domain *sd, struct sched_group *sg)
 		if (!cpumask_equal(sg_span, sched_domain_span(sibling->child)))
 			continue;
 
-		cpumask_set_cpu(i, sched_group_mask(sg));
+		cpumask_set_cpu(i, mask);
 	}
 
 	/* We must not have empty masks here */
-	WARN_ON_ONCE(cpumask_empty(sched_group_mask(sg)));
+	WARN_ON_ONCE(cpumask_empty(mask));
 }
 
 /*
@@ -560,14 +564,19 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 }
 
 static void init_overlap_sched_group(struct sched_domain *sd,
-				     struct sched_group *sg, int cpu)
+				     struct sched_group *sg)
 {
+	struct cpumask *mask = sched_domains_tmpmask2;
 	struct sd_data *sdd = sd->private;
 	struct cpumask *sg_span;
+	int cpu;
+
+	build_group_mask(sd, sg, mask);
+	cpu = cpumask_first_and(sched_group_cpus(sg), mask);
 
 	sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
 	if (atomic_inc_return(&sg->sgc->ref) == 1)
-		build_group_mask(sd, sg);
+		cpumask_copy(sched_group_mask(sg), mask);
 
 	/*
 	 * Initialize sgc->capacity such that even if we mess up the
@@ -619,7 +628,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		sg_span = sched_group_cpus(sg);
 		cpumask_or(covered, covered, sg_span);
 
-		init_overlap_sched_group(sd, sg, i);
+		init_overlap_sched_group(sd, sg);
 
 		if (!first)
 			first = sg;
@@ -1578,6 +1587,7 @@ int sched_init_domains(const struct cpumask *cpu_map)
 	int err;
 
 	zalloc_cpumask_var(&sched_domains_tmpmask, GFP_KERNEL);
+	zalloc_cpumask_var(&sched_domains_tmpmask2, GFP_KERNEL);
 	zalloc_cpumask_var(&fallback_doms, GFP_KERNEL);
 
 	arch_update_cpu_topology();
-- 
2.13.1

next prev parent reply	other threads:[~2017-07-20 21:21 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-20 21:21 [PATCH 00/26] Performance-related backports for 4.12.2 Mel Gorman
2017-07-20 21:21 ` [PATCH 01/26] sched/topology: Refactor function build_overlap_sched_groups() Mel Gorman
2017-07-20 21:21 ` [PATCH 02/26] sched/topology: Fix building of overlapping sched-groups Mel Gorman
2017-07-20 21:21 ` [PATCH 03/26] sched/topology: Simplify build_overlap_sched_groups() Mel Gorman
2017-07-20 21:21 ` [PATCH 04/26] sched/debug: Print the scheduler topology group mask Mel Gorman
2017-07-20 21:21 ` [PATCH 05/26] sched/topology: Verify the first group matches the child domain Mel Gorman
2017-07-20 21:21 ` [PATCH 06/26] sched/topology: Optimize build_group_mask() Mel Gorman
2017-07-20 21:21 ` [PATCH 07/26] sched/topology: Move comment about asymmetric node setups Mel Gorman
2017-07-20 21:21 ` [PATCH 08/26] sched/topology: Remove FORCE_SD_OVERLAP Mel Gorman
2017-07-20 21:21 ` [PATCH 09/26] sched/topology: Fix overlapping sched_group_mask Mel Gorman
2017-07-20 21:21 ` [PATCH 10/26] sched/topology: Small cleanup Mel Gorman
2017-07-20 21:21 ` [PATCH 11/26] sched/topology: Add sched_group_capacity debugging Mel Gorman
2017-07-20 21:21 ` Mel Gorman [this message]
2017-07-20 21:21 ` [PATCH 13/26] sched/topology: Add a few comments Mel Gorman
2017-07-20 21:21 ` [PATCH 14/26] sched/topology: Rewrite get_group() Mel Gorman
2017-07-20 21:21 ` [PATCH 15/26] sched/topology: Simplify sched_group_mask() usage Mel Gorman
2017-07-20 21:21 ` [PATCH 16/26] sched/topology: Rename sched_group_mask() Mel Gorman
2017-07-20 21:21 ` [PATCH 17/26] sched/topology: Rename sched_group_cpus() Mel Gorman
2017-07-20 21:21 ` [PATCH 18/26] vtime, sched/cputime: Remove vtime_account_user() Mel Gorman
2017-07-20 21:21 ` [PATCH 19/26] sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime Mel Gorman
2017-07-20 21:21 ` [PATCH 20/26] sched/cputime: Rename vtime fields Mel Gorman
2017-07-20 21:21 ` [PATCH 21/26] sched/cputime: Move the vtime task fields to their own struct Mel Gorman
2017-07-20 21:21 ` [PATCH 22/26] sched/cputime: Accumulate vtime on top of nsec clocksource Mel Gorman
2017-07-20 21:21 ` [PATCH 23/26] sched/fair: Fix load_balance() affinity redo path Mel Gorman
2017-07-20 21:21 ` [PATCH 24/26] percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch Mel Gorman
2017-07-20 21:21 ` [PATCH 25/26] writeback: rework wb_[dec|inc]_stat family of functions Mel Gorman
2017-07-20 21:21 ` [PATCH 26/26] kernel/fork.c: virtually mapped stacks: do not disable interrupts Mel Gorman
2017-07-24 16:44 ` [PATCH 00/26] Performance-related backports for 4.12.2 Mel Gorman
2017-07-24 23:29   ` Greg KH
2017-07-25  8:14     ` Mel Gorman
2017-07-25 15:21       ` Greg KH

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:12af4b15792 dfblob:4f6fa7553d9 )
 OR (
bs:"[PATCH 12/26] sched/topology: Fix overlapping sched_group_capacity" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170720212144.18453-13-mgorman@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).