From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Mike Galbraith <efault@gmx.de>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>
Subject: [PATCH 4.4 48/57] sched/topology: Fix overlapping sched_group_mask
Date: Wed, 19 Jul 2017 13:12:54 +0200
Message-ID: <20170719111251.902738816@linuxfoundation.org>
In-Reply-To: <20170719111249.973558472@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream.

The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.

The current code gets it wrong, as can be readily demonstrated with a
topology like:

  node   0   1   2   3
    0:  10  20  30  20
    1:  20  10  20  30
    2:  30  20  10  20
    3:  20  30  20  10

Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

This causes group_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early, resulting in
missed load-balance opportunities.

The fixed topology looks like:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

(Note: this relies on OVERLAP domains always having children. That
holds because the regular topology domains are still present at this
point, which is before degenerate trimming.)
Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/sched/core.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6123,6 +6123,9 @@ enum s_alloc {
  * Build an iteration mask that can exclude certain CPUs from the upwards
  * domain traversal.
  *
+ * Only CPUs that can arrive at this group should be considered to continue
+ * balancing.
+ *
  * Asymmetric node setups can result in situations where the domain tree is of
  * unequal depth, make sure to skip domains that already cover the entire
  * range.
@@ -6141,11 +6144,24 @@ static void build_group_mask(struct sche
 
 	for_each_cpu(i, span) {
 		sibling = *per_cpu_ptr(sdd->sd, i);
-		if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
+
+		/*
+		 * Can happen in the asymmetric case, where these siblings are
+		 * unused. The mask will not be empty because those CPUs that
+		 * do have the top domain _should_ span the domain.
+		 */
+		if (!sibling->child)
+			continue;
+
+		/* If we would not end up here, we can't continue from here */
+		if (!cpumask_equal(sg_span, sched_domain_span(sibling->child)))
 			continue;
 
 		cpumask_set_cpu(i, sched_group_mask(sg));
 	}
+
+	/* We must not have empty masks here */
+	WARN_ON_ONCE(cpumask_empty(sched_group_mask(sg)));
 }
 
 /*