From: Peter Zijlstra <peterz@infradead.org>
To: Brendan Jackman <brendan.jackman@arm.com>
Cc: linux-kernel@vger.kernel.org, Joel Fernandes <joelaf@google.com>,
Andres Oportus <andresoportus@google.com>,
Ingo Molnar <mingo@redhat.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group
Date: Mon, 21 Aug 2017 23:14:00 +0200 [thread overview]
Message-ID: <20170821211400.GF32112@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <20170821152128.14418-3-brendan.jackman@arm.com>
On Mon, Aug 21, 2017 at 04:21:28PM +0100, Brendan Jackman wrote:
> The current use of returning NULL from find_idlest_group is broken in
> two cases:
>
> a1) The local group is not allowed.
>
> In this case, we currently do not change this_runnable_load or
> this_avg_load from its initial value of 0, which means we return
> NULL regardless of the load of the other, allowed groups. This
> results in pointlessly continuing the find_idlest_group search
> within the local group and then returning prev_cpu from
> select_task_rq_fair.
> b) smp_processor_id() is the "idlest" and != prev_cpu.
>
> find_idlest_group also returns NULL when the local group is
> allowed and is the idlest. The caller then continues the
> find_idlest_group search at a lower level of the current CPU's
> sched_domain hierarchy. However new_cpu is not updated. This means
> the search is pointless and we return prev_cpu from
> select_task_rq_fair.
>
I think its much simpler than that.. but its late, so who knows ;-)
Both cases seem predicated on the assumption that we'll return @cpu when
we don't find any idler CPU. Consider, if the local group is the idlest,
we should stick with @cpu and simply proceed with the child domain.
The confusion, and the bugs, seem to have snuck in when we started
considering @prev_cpu, whenever that was. The below is mostly code
movement to put that whole while(sd) loop into its own function.
The effective change is setting @new_cpu = @cpu when we start that loop:
@@ -6023,6 +6023,8 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
struct sched_group *group;
int weight;
+ new_cpu = cpu;
+
if (!(sd->flags & sd_flag)) {
sd = sd->child;
continue;
---
kernel/sched/fair.c | 83 +++++++++++++++++++++++++++++++----------------------
1 file changed, 48 insertions(+), 35 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1d51c0..3e77265c480a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5588,10 +5588,10 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
}
/*
- * find_idlest_cpu - find the idlest cpu among the cpus in group.
+ * find_idlest_group_cpu - find the idlest cpu among the cpus in group.
*/
static int
-find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
+find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
{
unsigned long load, min_load = ULONG_MAX;
unsigned int min_exit_latency = UINT_MAX;
@@ -5640,6 +5640,50 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}
+static int
+find_idlest_cpu(struct sched_domain *sd, struct task_struct *p, int cpu, int sd_flag)
+{
+ struct sched_domain *tmp;
+ int new_cpu = cpu;
+
+ while (sd) {
+ struct sched_group *group;
+ int weight;
+
+ if (!(sd->flags & sd_flag)) {
+ sd = sd->child;
+ continue;
+ }
+
+ group = find_idlest_group(sd, p, cpu, sd_flag);
+ if (!group) {
+ sd = sd->child;
+ continue;
+ }
+
+ new_cpu = find_idlest_group_cpu(group, p, cpu);
+ if (new_cpu == -1 || new_cpu == cpu) {
+ /* Now try balancing at a lower domain level of cpu */
+ sd = sd->child;
+ continue;
+ }
+
+ /* Now try balancing at a lower domain level of new_cpu */
+ cpu = new_cpu;
+ weight = sd->span_weight;
+ sd = NULL;
+ for_each_domain(cpu, tmp) {
+ if (weight <= tmp->span_weight)
+ break;
+ if (tmp->flags & sd_flag)
+ sd = tmp;
+ }
+ /* while loop will break here if sd == NULL */
+ }
+
+ return new_cpu;
+}
+
/*
* Implement a for_each_cpu() variant that starts the scan at a given cpu
* (@start), and wraps around.
@@ -6019,39 +6063,8 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
- } else while (sd) {
- struct sched_group *group;
- int weight;
-
- if (!(sd->flags & sd_flag)) {
- sd = sd->child;
- continue;
- }
-
- group = find_idlest_group(sd, p, cpu, sd_flag);
- if (!group) {
- sd = sd->child;
- continue;
- }
-
- new_cpu = find_idlest_cpu(group, p, cpu);
- if (new_cpu == -1 || new_cpu == cpu) {
- /* Now try balancing at a lower domain level of cpu */
- sd = sd->child;
- continue;
- }
-
- /* Now try balancing at a lower domain level of new_cpu */
- cpu = new_cpu;
- weight = sd->span_weight;
- sd = NULL;
- for_each_domain(cpu, tmp) {
- if (weight <= tmp->span_weight)
- break;
- if (tmp->flags & sd_flag)
- sd = tmp;
- }
- /* while loop will break here if sd == NULL */
+ } else {
+ new_cpu = find_idlest_cpu(sd, p, cpu, sd_flag);
}
rcu_read_unlock();
next prev parent reply other threads:[~2017-08-21 21:14 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-21 15:21 [PATCH 0/2] sched/fair: Tweaks for select_task_rq_fair slowpath Brendan Jackman
2017-08-21 15:21 ` [PATCH 1/2] sched/fair: Remove unnecessary comparison with -1 Brendan Jackman
2017-08-21 17:30 ` Josef Bacik
2017-08-21 15:21 ` [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group Brendan Jackman
2017-08-21 17:26 ` Josef Bacik
2017-08-21 17:59 ` Brendan Jackman
2017-08-21 20:22 ` Peter Zijlstra
2017-08-21 21:14 ` Peter Zijlstra [this message]
2017-08-21 21:23 ` Peter Zijlstra
2017-08-22 4:34 ` Joel Fernandes
2017-08-22 10:39 ` Brendan Jackman
2017-08-22 10:45 ` Brendan Jackman
2017-08-22 11:03 ` Peter Zijlstra
2017-08-22 12:46 ` Brendan Jackman
2017-08-22 7:48 ` Vincent Guittot
2017-08-22 10:41 ` Brendan Jackman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170821211400.GF32112@worktop.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=andresoportus@google.com \
--cc=brendan.jackman@arm.com \
--cc=dietmar.eggemann@arm.com \
--cc=joelaf@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=morten.rasmussen@arm.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox