All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Brendan Jackman <brendan.jackman@arm.com>
Cc: linux-kernel@vger.kernel.org, Joel Fernandes <joelaf@google.com>,
	Andres Oportus <andresoportus@google.com>,
	Ingo Molnar <mingo@redhat.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group
Date: Mon, 21 Aug 2017 23:14:00 +0200	[thread overview]
Message-ID: <20170821211400.GF32112@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <20170821152128.14418-3-brendan.jackman@arm.com>

On Mon, Aug 21, 2017 at 04:21:28PM +0100, Brendan Jackman wrote:
> The current use of returning NULL from find_idlest_group is broken in
> two cases:
> 
> a1) The local group is not allowed.
> 
>    In this case, we currently do not change this_runnable_load or
>    this_avg_load from its initial value of 0, which means we return
>    NULL regardless of the load of the other, allowed groups. This
>    results in pointlessly continuing the find_idlest_group search
>    within the local group and then returning prev_cpu from
>    select_task_rq_fair.

> b) smp_processor_id() is the "idlest" and != prev_cpu.
> 
>    find_idlest_group also returns NULL when the local group is
>    allowed and is the idlest. The caller then continues the
>    find_idlest_group search at a lower level of the current CPU's
>    sched_domain hierarchy. However new_cpu is not updated. This means
>    the search is pointless and we return prev_cpu from
>    select_task_rq_fair.
> 

I think its much simpler than that.. but its late, so who knows ;-)

Both cases seem predicated on the assumption that we'll return @cpu when
we don't find any idler CPU. Consider, if the local group is the idlest,
we should stick with @cpu and simply proceed with the child domain.

The confusion, and the bugs, seem to have snuck in when we started
considering @prev_cpu, whenever that was. The below is mostly code
movement to put that whole while(sd) loop into its own function.

The effective change is setting @new_cpu = @cpu when we start that loop:

@@ -6023,6 +6023,8 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 		struct sched_group *group;
 		int weight;
 
+		new_cpu = cpu;
+
 		if (!(sd->flags & sd_flag)) {
 			sd = sd->child;
 			continue;




---
 kernel/sched/fair.c | 83 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 35 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1d51c0..3e77265c480a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5588,10 +5588,10 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 }
 
 /*
- * find_idlest_cpu - find the idlest cpu among the cpus in group.
+ * find_idlest_group_cpu - find the idlest cpu among the cpus in group.
  */
 static int
-find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
+find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
 	unsigned long load, min_load = ULONG_MAX;
 	unsigned int min_exit_latency = UINT_MAX;
@@ -5640,6 +5640,50 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }
 
+static int
+find_idlest_cpu(struct sched_domain *sd, struct task_struct *p, int cpu, int sd_flag)
+{
+	struct sched_domain *tmp;
+	int new_cpu = cpu;
+
+	while (sd) {
+		struct sched_group *group;
+		int weight;
+
+		if (!(sd->flags & sd_flag)) {
+			sd = sd->child;
+			continue;
+		}
+
+		group = find_idlest_group(sd, p, cpu, sd_flag);
+		if (!group) {
+			sd = sd->child;
+			continue;
+		}
+
+		new_cpu = find_idlest_group_cpu(group, p, cpu);
+		if (new_cpu == -1 || new_cpu == cpu) {
+			/* Now try balancing at a lower domain level of cpu */
+			sd = sd->child;
+			continue;
+		}
+
+		/* Now try balancing at a lower domain level of new_cpu */
+		cpu = new_cpu;
+		weight = sd->span_weight;
+		sd = NULL;
+		for_each_domain(cpu, tmp) {
+			if (weight <= tmp->span_weight)
+				break;
+			if (tmp->flags & sd_flag)
+				sd = tmp;
+		}
+		/* while loop will break here if sd == NULL */
+	}
+
+	return new_cpu;
+}
+
 /*
  * Implement a for_each_cpu() variant that starts the scan at a given cpu
  * (@start), and wraps around.
@@ -6019,39 +6063,8 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 		if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
 			new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
 
-	} else while (sd) {
-		struct sched_group *group;
-		int weight;
-
-		if (!(sd->flags & sd_flag)) {
-			sd = sd->child;
-			continue;
-		}
-
-		group = find_idlest_group(sd, p, cpu, sd_flag);
-		if (!group) {
-			sd = sd->child;
-			continue;
-		}
-
-		new_cpu = find_idlest_cpu(group, p, cpu);
-		if (new_cpu == -1 || new_cpu == cpu) {
-			/* Now try balancing at a lower domain level of cpu */
-			sd = sd->child;
-			continue;
-		}
-
-		/* Now try balancing at a lower domain level of new_cpu */
-		cpu = new_cpu;
-		weight = sd->span_weight;
-		sd = NULL;
-		for_each_domain(cpu, tmp) {
-			if (weight <= tmp->span_weight)
-				break;
-			if (tmp->flags & sd_flag)
-				sd = tmp;
-		}
-		/* while loop will break here if sd == NULL */
+	} else {
+		new_cpu = find_idlest_cpu(sd, p, cpu, sd_flag);
 	}
 	rcu_read_unlock();
 

  parent reply	other threads:[~2017-08-21 21:14 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-21 15:21 [PATCH 0/2] sched/fair: Tweaks for select_task_rq_fair slowpath Brendan Jackman
2017-08-21 15:21 ` [PATCH 1/2] sched/fair: Remove unnecessary comparison with -1 Brendan Jackman
2017-08-21 17:30   ` Josef Bacik
2017-08-21 15:21 ` [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group Brendan Jackman
2017-08-21 17:26   ` Josef Bacik
2017-08-21 17:59     ` Brendan Jackman
2017-08-21 20:22       ` Peter Zijlstra
2017-08-21 21:14   ` Peter Zijlstra [this message]
2017-08-21 21:23     ` Peter Zijlstra
2017-08-22  4:34     ` Joel Fernandes
2017-08-22 10:39       ` Brendan Jackman
2017-08-22 10:45         ` Brendan Jackman
2017-08-22 11:03         ` Peter Zijlstra
2017-08-22 12:46           ` Brendan Jackman
2017-08-22  7:48   ` Vincent Guittot
2017-08-22 10:41     ` Brendan Jackman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170821211400.GF32112@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=andresoportus@google.com \
    --cc=brendan.jackman@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.