All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Linux-Stable <stable@vger.kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 09/26] sched/topology: Fix overlapping sched_group_mask
Date: Thu, 20 Jul 2017 22:21:27 +0100	[thread overview]
Message-ID: <20170720212144.18453-10-mgorman@techsingularity.net> (raw)
In-Reply-To: <20170720212144.18453-1-mgorman@techsingularity.net>

From: Peter Zijlstra <peterz@infradead.org>

commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream.

The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.

The current code gets it wrong, as can be readily demonstrated with a
topology like:

  node   0   1   2   3
    0:  10  20  30  20
    1:  20  10  20  30
    2:  30  20  10  20
    3:  20  30  20  10

Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

This causes sched_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early resulting in
missed load-balance opportunities.

The fixed topology looks like:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

(note: this relies on OVERLAP domains to always have children, this is
 true because the regular topology domains are still here -- this is
 before degenerate trimming)

Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/topology.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 21efacf547d4..09563c1d1d5b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -495,6 +495,9 @@ enum s_alloc {
 /*
  * Build an iteration mask that can exclude certain CPUs from the upwards
  * domain traversal.
+ *
+ * Only CPUs that can arrive at this group should be considered to continue
+ * balancing.
  */
 static void build_group_mask(struct sched_domain *sd, struct sched_group *sg)
 {
@@ -505,11 +508,24 @@ static void build_group_mask(struct sched_domain *sd, struct sched_group *sg)
 
 	for_each_cpu(i, sg_span) {
 		sibling = *per_cpu_ptr(sdd->sd, i);
-		if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
+
+		/*
+		 * Can happen in the asymmetric case, where these siblings are
+		 * unused. The mask will not be empty because those CPUs that
+		 * do have the top domain _should_ span the domain.
+		 */
+		if (!sibling->child)
+			continue;
+
+		/* If we would not end up here, we can't continue from here */
+		if (!cpumask_equal(sg_span, sched_domain_span(sibling->child)))
 			continue;
 
 		cpumask_set_cpu(i, sched_group_mask(sg));
 	}
+
+	/* We must not have empty masks here */
+	WARN_ON_ONCE(cpumask_empty(sched_group_mask(sg)));
 }
 
 /*
-- 
2.13.1

  parent reply	other threads:[~2017-07-20 21:21 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-20 21:21 [PATCH 00/26] Performance-related backports for 4.12.2 Mel Gorman
2017-07-20 21:21 ` [PATCH 01/26] sched/topology: Refactor function build_overlap_sched_groups() Mel Gorman
2017-07-20 21:21 ` [PATCH 02/26] sched/topology: Fix building of overlapping sched-groups Mel Gorman
2017-07-20 21:21 ` [PATCH 03/26] sched/topology: Simplify build_overlap_sched_groups() Mel Gorman
2017-07-20 21:21 ` [PATCH 04/26] sched/debug: Print the scheduler topology group mask Mel Gorman
2017-07-20 21:21 ` [PATCH 05/26] sched/topology: Verify the first group matches the child domain Mel Gorman
2017-07-20 21:21 ` [PATCH 06/26] sched/topology: Optimize build_group_mask() Mel Gorman
2017-07-20 21:21 ` [PATCH 07/26] sched/topology: Move comment about asymmetric node setups Mel Gorman
2017-07-20 21:21 ` [PATCH 08/26] sched/topology: Remove FORCE_SD_OVERLAP Mel Gorman
2017-07-20 21:21 ` Mel Gorman [this message]
2017-07-20 21:21 ` [PATCH 10/26] sched/topology: Small cleanup Mel Gorman
2017-07-20 21:21 ` [PATCH 11/26] sched/topology: Add sched_group_capacity debugging Mel Gorman
2017-07-20 21:21 ` [PATCH 12/26] sched/topology: Fix overlapping sched_group_capacity Mel Gorman
2017-07-20 21:21 ` [PATCH 13/26] sched/topology: Add a few comments Mel Gorman
2017-07-20 21:21 ` [PATCH 14/26] sched/topology: Rewrite get_group() Mel Gorman
2017-07-20 21:21 ` [PATCH 15/26] sched/topology: Simplify sched_group_mask() usage Mel Gorman
2017-07-20 21:21 ` [PATCH 16/26] sched/topology: Rename sched_group_mask() Mel Gorman
2017-07-20 21:21 ` [PATCH 17/26] sched/topology: Rename sched_group_cpus() Mel Gorman
2017-07-20 21:21 ` [PATCH 18/26] vtime, sched/cputime: Remove vtime_account_user() Mel Gorman
2017-07-20 21:21 ` [PATCH 19/26] sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime Mel Gorman
2017-07-20 21:21 ` [PATCH 20/26] sched/cputime: Rename vtime fields Mel Gorman
2017-07-20 21:21 ` [PATCH 21/26] sched/cputime: Move the vtime task fields to their own struct Mel Gorman
2017-07-20 21:21 ` [PATCH 22/26] sched/cputime: Accumulate vtime on top of nsec clocksource Mel Gorman
2017-07-20 21:21 ` [PATCH 23/26] sched/fair: Fix load_balance() affinity redo path Mel Gorman
2017-07-20 21:21 ` [PATCH 24/26] percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch Mel Gorman
2017-07-20 21:21 ` [PATCH 25/26] writeback: rework wb_[dec|inc]_stat family of functions Mel Gorman
2017-07-20 21:21 ` [PATCH 26/26] kernel/fork.c: virtually mapped stacks: do not disable interrupts Mel Gorman
2017-07-24 16:44 ` [PATCH 00/26] Performance-related backports for 4.12.2 Mel Gorman
2017-07-24 23:29   ` Greg KH
2017-07-25  8:14     ` Mel Gorman
2017-07-25 15:21       ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170720212144.18453-10-mgorman@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.