From: Jason Low <jason.low2@hp.com>
To: Ingo Molnar <mingo@kernel.org>, Jason Low <jason.low2@hp.com>
Cc: peterz@infradead.org, linux-kernel@vger.kernel.org,
daniel.lezcano@linaro.org, alex.shi@linaro.org,
preeti@linux.vnet.ibm.com, efault@gmx.de,
vincent.guittot@linaro.org, morten.rasmussen@arm.com,
aswin@hp.com
Subject: Re: [PATCH 2/2] sched: Fix next_balance logic in rebalance_domains() and idle_balance()
Date: Thu, 08 May 2014 17:49:22 -0700 [thread overview]
Message-ID: <1399596562.2200.7.camel@j-VirtualBox> (raw)
In-Reply-To: <1399587244.2030.59.camel@j-VirtualBox>
On Thu, 2014-05-08 at 15:14 -0700, Jason Low wrote:
> Hi Ingo, Peter,
>
> Were there NULL domains on the test system? If so, I think we can
> address the problem by doing update_next_balance() only if the below
> rcu_dereference_check_sched_domain() returns a non-null domain.
Also, below is a patch which has the change:
---
Subject: [PATCH v2] sched: Fix next_balance logic in rebalance_domains()
v1->v2:
- Address a potential null pointer dereference.
Currently, in idle_balance(), we update rq->next_balance when we pull_tasks.
However, it is also important to update this in the !pulled_tasks case too.
When the CPU is "busy" (the CPU isn't idle), rq->next_balance gets computed
using sd->busy_factor (so we increase the balance interval when the CPU is
busy). However, when the CPU goes idle, rq->next_balance could still be set
to a large value that was computed with the sd->busy_factor.
Thus, we need to also update rq->next_balance in idle_balance() in the cases
where !pulled_tasks too, so that rq->next_balance gets updated without taking
the busy_factor into account when the CPU is about to go idle.
This patch makes rq->next_balance get updated independently of whether or
not we pulled_task. Also, we add logic to ensure that we always traverse
at least 1 of the sched domains to get a proper next_balance value for
updating rq->next_balance.
Additionally, since load_balance() modifies the sd->balance_interval, we
need to re-obtain the sched domain's interval after the call to
load_balance() in rebalance_domains() before we update rq->next_balance.
This patch adds and uses 2 new helper functions, update_next_balance() and
get_sd_balance_interval() to update next_balance and obtain the sched
domain's balance_interval.
Cc: daniel.lezcano@linaro.org
Cc: alex.shi@linaro.org
Cc: preeti@linux.vnet.ibm.com
Cc: efault@gmx.de
Cc: vincent.guittot@linaro.org
Cc: morten.rasmussen@arm.com
Cc: aswin@hp.com
Cc: mingo@kernel.org
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Jason Low <jason.low2@hp.com>
---
kernel/sched/fair.c | 69 ++++++++++++++++++++++++++++++++++-----------------
1 files changed, 46 insertions(+), 23 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28ccf50..57b7390 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6669,17 +6669,44 @@ out:
return ld_moved;
}
+static inline unsigned long
+get_sd_balance_interval(struct sched_domain *sd, int cpu_busy)
+{
+ unsigned long interval = sd->balance_interval;
+
+ if (cpu_busy)
+ interval *= sd->busy_factor;
+
+ /* scale ms to jiffies */
+ interval = msecs_to_jiffies(interval);
+ interval = clamp(interval, 1UL, max_load_balance_interval);
+
+ return interval;
+}
+
+static inline void
+update_next_balance(struct sched_domain *sd, int cpu_busy, unsigned long *next_balance)
+{
+ unsigned long interval, next;
+
+ interval = get_sd_balance_interval(sd, cpu_busy);
+ next = sd->last_balance + interval;
+
+ if (time_after(*next_balance, next))
+ *next_balance = next;
+}
+
/*
* idle_balance is called by schedule() if this_cpu is about to become
* idle. Attempts to pull tasks from other CPUs.
*/
static int idle_balance(struct rq *this_rq)
{
+ unsigned long next_balance = jiffies + HZ;
+ int this_cpu = this_rq->cpu;
struct sched_domain *sd;
int pulled_task = 0;
- unsigned long next_balance = jiffies + HZ;
u64 curr_cost = 0;
- int this_cpu = this_rq->cpu;
idle_enter_fair(this_rq);
@@ -6689,8 +6716,15 @@ static int idle_balance(struct rq *this_rq)
*/
this_rq->idle_stamp = rq_clock(this_rq);
- if (this_rq->avg_idle < sysctl_sched_migration_cost)
+ if (this_rq->avg_idle < sysctl_sched_migration_cost) {
+ rcu_read_lock();
+ sd = rcu_dereference_check_sched_domain(this_rq->sd);
+ if (sd)
+ update_next_balance(sd, 0, &next_balance);
+ rcu_read_unlock();
+
goto out;
+ }
/*
* Drop the rq->lock, but keep IRQ/preempt disabled.
@@ -6700,15 +6734,16 @@ static int idle_balance(struct rq *this_rq)
update_blocked_averages(this_cpu);
rcu_read_lock();
for_each_domain(this_cpu, sd) {
- unsigned long interval;
int continue_balancing = 1;
u64 t0, domain_cost;
if (!(sd->flags & SD_LOAD_BALANCE))
continue;
- if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
+ if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
+ update_next_balance(sd, 0, &next_balance);
break;
+ }
if (sd->flags & SD_BALANCE_NEWIDLE) {
t0 = sched_clock_cpu(this_cpu);
@@ -6724,9 +6759,7 @@ static int idle_balance(struct rq *this_rq)
curr_cost += domain_cost;
}
- interval = msecs_to_jiffies(sd->balance_interval);
- if (time_after(next_balance, sd->last_balance + interval))
- next_balance = sd->last_balance + interval;
+ update_next_balance(sd, 0, &next_balance);
/*
* Stop searching for tasks to pull if there are
@@ -6750,15 +6783,11 @@ static int idle_balance(struct rq *this_rq)
if (this_rq->cfs.h_nr_running && !pulled_task)
pulled_task = 1;
- if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
- /*
- * We are going idle. next_balance may be set based on
- * a busy processor. So reset next_balance.
- */
+out:
+ /* Move the next balance forward */
+ if (time_after(this_rq->next_balance, next_balance))
this_rq->next_balance = next_balance;
- }
-out:
/* Is there a task of a high priority class? */
if (this_rq->nr_running != this_rq->cfs.h_nr_running)
pulled_task = -1;
@@ -7041,16 +7070,9 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
break;
}
- interval = sd->balance_interval;
- if (idle != CPU_IDLE)
- interval *= sd->busy_factor;
-
- /* scale ms to jiffies */
- interval = msecs_to_jiffies(interval);
- interval = clamp(interval, 1UL, max_load_balance_interval);
+ interval = get_sd_balance_interval(sd, idle != CPU_IDLE);
need_serialize = sd->flags & SD_SERIALIZE;
-
if (need_serialize) {
if (!spin_trylock(&balancing))
goto out;
@@ -7066,6 +7088,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
}
sd->last_balance = jiffies;
+ interval = get_sd_balance_interval(sd, idle != CPU_IDLE);
}
if (need_serialize)
spin_unlock(&balancing);
--
1.7.1
next prev parent reply other threads:[~2014-05-09 0:49 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-28 22:45 [PATCH 0/2] sched: Idle load balance fixes Jason Low
2014-04-28 22:45 ` [PATCH 1/2] sched: Fix updating rq->max_idle_balance_cost and rq->next_balance in idle_balance() Jason Low
2014-05-08 10:42 ` [tip:sched/core] sched: Fix updating rq-> max_idle_balance_cost " tip-bot for Jason Low
2014-04-28 22:45 ` [PATCH 2/2] sched: Fix next_balance logic in rebalance_domains() and idle_balance() Jason Low
2014-05-08 16:59 ` Jason Low
2014-05-08 17:38 ` Ingo Molnar
2014-05-08 18:15 ` Jason Low
2014-05-08 22:14 ` Jason Low
2014-05-09 0:49 ` Jason Low [this message]
2014-05-09 9:08 ` Peter Zijlstra
2014-05-09 17:29 ` Jason Low
2014-05-19 13:09 ` [tip:sched/core] " tip-bot for Jason Low
2014-05-22 12:27 ` [tip:sched/core] sched: Fix the rq-> " tip-bot for Jason Low
2014-05-08 18:06 ` [PATCH 2/2] sched: Fix " Peter Zijlstra
2014-04-30 7:54 ` [PATCH 0/2] sched: Idle load balance fixes Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1399596562.2200.7.camel@j-VirtualBox \
--to=jason.low2@hp.com \
--cc=alex.shi@linaro.org \
--cc=aswin@hp.com \
--cc=daniel.lezcano@linaro.org \
--cc=efault@gmx.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=morten.rasmussen@arm.com \
--cc=peterz@infradead.org \
--cc=preeti@linux.vnet.ibm.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox