From: Jeffrey Hugo <jhugo@codeaurora.org>
To: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Austin Christ <austinwc@codeaurora.org>,
	Tyler Baicar <tbaicar@codeaurora.org>,
	Jeffrey Hugo <jhugo@codeaurora.org>
Subject: [RFC 1/2] sched/fair: Fix load_balance() affinity redo path
Date: Fri, 12 May 2017 11:01:37 -0600
Message-ID: <1494608498-4538-2-git-send-email-jhugo@codeaurora.org>
In-Reply-To: <1494608498-4538-1-git-send-email-jhugo@codeaurora.org>

If load_balance() fails to migrate any tasks because all tasks were
affined, load_balance() removes the source cpu from consideration and
attempts to redo and balance among the new subset of cpus.

There is a bug in this code path where the algorithm considers all active
cpus in the system (minus the source that was just masked out).  This is
not valid for two reasons: some active cpus may not be in the current
scheduling domain and one of the active cpus is dst_cpu. These cpus should
not be considered, as we cannot pull load from them.

Instead of failing out of load_balance(), we may end up redoing the search
with no valid cpus and incorrectly concluding the domain is balanced.
Additionally, if the group_imbalance flag was just set, it may also be
incorrectly unset, thus the flag will not be seen by other cpus in future
load_balance() runs as that algorithm intends.

Fix the check by removing cpus not in the current domain and the dst_cpu
from consideration, thus limiting the evaluation to valid remaining cpus
from which load might be migrated.
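
To illustrate the difference between the old and new redo conditions, here
is a minimal userspace sketch. It is not the kernel code: mask_t, mask_of(),
redo_old(), redo_new() and redo_example() are hypothetical names, and a
64-bit integer stands in for struct cpumask (bit N set means cpu N is
present). It assumes the balance mask starts out as the set of active cpus,
as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means cpu N is present. */
typedef uint64_t mask_t;

static mask_t mask_of(int cpu)
{
	return (mask_t)1 << cpu;
}

/* Old check: redo whenever any cpu at all remains in the balance mask. */
static bool redo_old(mask_t cpus, int busiest_cpu)
{
	cpus &= ~mask_of(busiest_cpu);
	return cpus != 0;
}

/*
 * New check: redo only if a remaining cpu lies inside the sched domain
 * span and is not dst_cpu, i.e. it could still act as a source of load.
 */
static bool redo_new(mask_t cpus, mask_t sd_span, int dst_cpu, int busiest_cpu)
{
	mask_t candidates = sd_span & ~mask_of(dst_cpu);

	cpus &= ~mask_of(busiest_cpu);
	return (cpus & candidates) != 0;
}

/* Scenario: cpus 0-3 active, domain spans cpus 0-1, dst is cpu 0, busiest is cpu 1. */
static void redo_example(void)
{
	mask_t active = 0xF, span = 0x3;

	/* Old check sees cpus 0, 2 and 3 left over and redoes needlessly. */
	assert(redo_old(active, 1));
	/* New check finds no valid source cpu left in the domain. */
	assert(!redo_new(active, span, 0, 1));
}
```

In this scenario the only cpu we could have pulled from is cpu 1, so once
it is masked out the old condition still loops back to "redo" while the
new condition falls through.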

Signed-off-by: Austin Christ <austinwc@codeaurora.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 kernel/sched/fair.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d711093..8f783ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8219,8 +8219,19 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 		/* All tasks on this runqueue were pinned by CPU affinity */
 		if (unlikely(env.flags & LBF_ALL_PINNED)) {
+			struct cpumask tmp;
+
+			/* Cpumask of all initially possible busiest cpus. */
+			cpumask_copy(&tmp, sched_domain_span(env.sd));
+			cpumask_clear_cpu(env.dst_cpu, &tmp);
+
 			cpumask_clear_cpu(cpu_of(busiest), cpus);
-			if (!cpumask_empty(cpus)) {
+			/*
+			 * Go back to "redo" iff the load-balance cpumask
+			 * contains other potential busiest cpus for the
+			 * current sched domain.
+			 */
+			if (cpumask_intersects(cpus, &tmp)) {
 				env.loop = 0;
 				env.loop_break = sched_nr_migrate_break;
 				goto redo;
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
