stable.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4.11 81/88] sched/topology: Fix overlapping sched_group_mask
Date: Wed, 19 Jul 2017 12:08:43 +0200
Message-ID: <20170719100833.410590866@linuxfoundation.org>
In-Reply-To: <20170719100820.364094938@linuxfoundation.org>

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream.

The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.

The current code gets it wrong, as can be readily demonstrated with a
topology like:

  node   0   1   2   3
    0:  10  20  30  20
    1:  20  10  20  30
    2:  30  20  10  20
    3:  20  30  20  10

Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

This causes sched_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early resulting in
missed load-balance opportunities.

The fixed topology looks like:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

(note: this relies on OVERLAP domains always having children, which
 holds because the regular topology domains are still present at this
 point -- this runs before degenerate trimming)

Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/sched/topology.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -480,6 +480,9 @@ enum s_alloc {
  * Build an iteration mask that can exclude certain CPUs from the upwards
  * domain traversal.
  *
+ * Only CPUs that can arrive at this group should be considered to continue
+ * balancing.
+ *
  * Asymmetric node setups can result in situations where the domain tree is of
  * unequal depth, make sure to skip domains that already cover the entire
  * range.
@@ -497,11 +500,24 @@ static void build_group_mask(struct sche
 
 	for_each_cpu(i, sg_span) {
 		sibling = *per_cpu_ptr(sdd->sd, i);
-		if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
+
+		/*
+		 * Can happen in the asymmetric case, where these siblings are
+		 * unused. The mask will not be empty because those CPUs that
+		 * do have the top domain _should_ span the domain.
+		 */
+		if (!sibling->child)
+			continue;
+
+		/* If we would not end up here, we can't continue from here */
+		if (!cpumask_equal(sg_span, sched_domain_span(sibling->child)))
 			continue;
 
 		cpumask_set_cpu(i, sched_group_mask(sg));
 	}
+
+	/* We must not have empty masks here */
+	WARN_ON_ONCE(cpumask_empty(sched_group_mask(sg)));
 }
 
 /*
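
For readers who want to check the masks in the changelog by hand, here is a
minimal userspace sketch of the old and new selection rules. It is not kernel
code. It assumes one CPU per node, represents cpumasks as plain bitmaps, and
derives each domain span from the distance table above (distance <= 20 for
domain 0, <= 30 for domain 1). The names mask_t, domain_span(),
group_mask_old(), group_mask_new(), first_cpu() and check_commit_example()
are hypothetical helpers for this illustration only. The sketch also assumes
the balance CPU is the first CPU present in both the group span and the
group mask, and it omits the !sibling->child check because every domain in
this model has a child.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical model: one CPU per node, a cpumask is a small bitmap. */
typedef uint32_t mask_t;

/* Node distance table from the changelog. */
static const int distance[NR_CPUS][NR_CPUS] = {
	{ 10, 20, 30, 20 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 20, 30, 20, 10 },
};

/* Span of @cpu's NUMA domain covering all nodes within @max_dist. */
static mask_t domain_span(int cpu, int max_dist)
{
	mask_t span = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (distance[cpu][i] <= max_dist)
			span |= 1u << i;
	return span;
}

/* Old rule: keep CPU i if it is in its own domain span at this level. */
static mask_t group_mask_old(mask_t sg_span, int max_dist)
{
	mask_t mask = 0;

	for (int i = 0; i < NR_CPUS; i++) {
		if (!(sg_span & (1u << i)))
			continue;
		if (!(domain_span(i, max_dist) & (1u << i)))
			continue;
		mask |= 1u << i;
	}
	return mask;
}

/* New rule: keep CPU i only if its child domain span equals the group span. */
static mask_t group_mask_new(mask_t sg_span, int child_dist)
{
	mask_t mask = 0;

	for (int i = 0; i < NR_CPUS; i++) {
		if (!(sg_span & (1u << i)))
			continue;
		if (domain_span(i, child_dist) != sg_span)
			continue;
		mask |= 1u << i;
	}
	return mask;
}

/* Lowest set bit, or -1 for an empty mask. */
static int first_cpu(mask_t mask)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (mask & (1u << i))
			return i;
	return -1;
}

/* Reproduce the first group of CPU1's domain 1 from the changelog. */
void check_commit_example(void)
{
	mask_t sg_span = domain_span(1, 20);	/* group span 0-2 */
	mask_t old_mask = group_mask_old(sg_span, 30);
	mask_t fixed_mask = group_mask_new(sg_span, 20);

	assert(sg_span == 0x7);				/* CPUs 0-2 */
	assert(old_mask == 0x7);			/* (mask: 0-2) */
	assert(fixed_mask == 0x2);			/* (mask: 1) */
	assert(first_cpu(sg_span & old_mask) == 0);	/* wrong balance CPU */
	assert(first_cpu(sg_span & fixed_mask) == 1);	/* CPU1, as intended */
}
```

Calling check_commit_example() from a test harness should pass silently.
Under the stated assumptions, the old rule picks CPU0 as the balance CPU for
group 0-2, although CPU0's own child domain spans 0-1,3 and so never builds
that group. The new rule picks CPU1.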
