public inbox for linux-kernel@vger.kernel.org
* [PATCH] sched: Reduce the rate of needless idle load balancing
@ 2014-05-20 20:17 Tim Chen
  2014-05-20 20:51 ` Jason Low
  0 siblings, 1 reply; 14+ messages in thread
From: Tim Chen @ 2014-05-20 20:17 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Andrew Morton, Len Brown, Russ Anderson, Dimitri Sivanich,
	Hedi Berriche, Andi Kleen, Michel Lespinasse, Rik van Riel,
	Peter Hurley, linux-kernel

The current no_hz idle load balancer does load balancing on *all* idle cpus,
even though the time at which a particular idle cpu is due for load
balancing could still be a while in the future.  This introduces a much
higher load balancing rate than is necessary.  The patch changes the
behavior by doing idle load balancing on behalf of an idle cpu only
when that cpu is due for load balancing.

On SGI's systems with over 3000 cores, the cpu responsible for idle balancing
got overwhelmed with idle balancing and introduced a lot of OS noise
to workloads.  This patch fixes the issue.
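
The effect of the change can be sketched in plain userspace C. This is a
minimal illustration under stated assumptions, not kernel code: struct
sketch_rq, sketch_time_after_eq() and sketch_count_balanced() are
hypothetical stand-ins, and the comparison helper only mirrors the
wrap-safe signed-difference idiom of the kernel's time_after_eq(),
without its type checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct rq that matters here. */
struct sketch_rq {
	unsigned long next_balance;	/* tick at which this cpu is next due */
};

/* Wrap-safe "a is at or after b", same idiom as the kernel's time_after_eq(). */
static bool sketch_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Count how many idle runqueues one pass of the balancer would rebalance.
 * only_when_due == false models the old behavior (every idle cpu),
 * only_when_due == true models the patched behavior.
 */
static size_t sketch_count_balanced(const struct sketch_rq *rqs, size_t n,
				    unsigned long now, bool only_when_due)
{
	size_t balanced = 0;

	assert(rqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!only_when_due ||
		    sketch_time_after_eq(now, rqs[i].next_balance))
			balanced++;
	}
	return balanced;
}
```

With next_balance values of 100, 150, 90 and 200 and now == 100, the old
behavior rebalances all four runqueues, while the patched check rebalances
only the two that are due.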

Thanks.

Tim

Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b4c4f3..97132db 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 
 		rq = cpu_rq(balance_cpu);
 
-		raw_spin_lock_irq(&rq->lock);
-		update_rq_clock(rq);
-		update_idle_cpu_load(rq);
-		raw_spin_unlock_irq(&rq->lock);
-
-		rebalance_domains(rq, CPU_IDLE);
+		/*
+		 * If time for next balance is due,
+		 * do the balance.
+		 */
+		if (time_after(jiffies + 1, rq->next_balance)) {
+			raw_spin_lock_irq(&rq->lock);
+			update_rq_clock(rq);
+			update_idle_cpu_load(rq);
+			raw_spin_unlock_irq(&rq->lock);
+			rebalance_domains(rq, CPU_IDLE);
+		}
 
 		if (time_after(this_rq->next_balance, rq->next_balance))
 			this_rq->next_balance = rq->next_balance;
-- 
1.7.11.7




* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 20:17 [PATCH] sched: Reduce the rate of needless idle load balancing Tim Chen
@ 2014-05-20 20:51 ` Jason Low
  2014-05-20 20:58   ` Rik van Riel
  2014-05-20 20:59   ` Tim Chen
  0 siblings, 2 replies; 14+ messages in thread
From: Jason Low @ 2014-05-20 20:51 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9b4c4f3..97132db 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>
>                 rq = cpu_rq(balance_cpu);
>
> -               raw_spin_lock_irq(&rq->lock);
> -               update_rq_clock(rq);
> -               update_idle_cpu_load(rq);
> -               raw_spin_unlock_irq(&rq->lock);
> -
> -               rebalance_domains(rq, CPU_IDLE);
> +               /*
> +                * If time for next balance is due,
> +                * do the balance.
> +                */
> +               if (time_after(jiffies + 1, rq->next_balance)) {

Hi Tim,

If we want to do idle load balancing only when it is due for a
balance, shouldn't the above just be "if (time_after(jiffies,
rq->next_balance))"?

> +                       raw_spin_lock_irq(&rq->lock);
> +                       update_rq_clock(rq);
> +                       update_idle_cpu_load(rq);
> +                       raw_spin_unlock_irq(&rq->lock);
> +                       rebalance_domains(rq, CPU_IDLE);
> +               }
>
>                 if (time_after(this_rq->next_balance, rq->next_balance))
>                         this_rq->next_balance = rq->next_balance;


* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 20:51 ` Jason Low
@ 2014-05-20 20:58   ` Rik van Riel
  2014-05-20 20:59   ` Tim Chen
  1 sibling, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2014-05-20 20:58 UTC (permalink / raw)
  To: Jason Low, Tim Chen
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Peter Hurley, Linux Kernel Mailing List

On 05/20/2014 04:51 PM, Jason Low wrote:
> On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9b4c4f3..97132db 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>>
>>                 rq = cpu_rq(balance_cpu);
>>
>> -               raw_spin_lock_irq(&rq->lock);
>> -               update_rq_clock(rq);
>> -               update_idle_cpu_load(rq);
>> -               raw_spin_unlock_irq(&rq->lock);
>> -
>> -               rebalance_domains(rq, CPU_IDLE);
>> +               /*
>> +                * If time for next balance is due,
>> +                * do the balance.
>> +                */
>> +               if (time_after(jiffies + 1, rq->next_balance)) {
> 
> Hi Tim,
> 
> If we want to do idle load balancing only when it is due for a
> balance, shouldn't the above just be "if (time_after(jiffies,
> rq->next_balance))"?

I was wondering the same.

Everything else gets my

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 20:51 ` Jason Low
  2014-05-20 20:58   ` Rik van Riel
@ 2014-05-20 20:59   ` Tim Chen
  2014-05-20 21:04     ` Tim Chen
  2014-05-20 21:09     ` Jason Low
  1 sibling, 2 replies; 14+ messages in thread
From: Tim Chen @ 2014-05-20 20:59 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
> On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 9b4c4f3..97132db 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> >
> >                 rq = cpu_rq(balance_cpu);
> >
> > -               raw_spin_lock_irq(&rq->lock);
> > -               update_rq_clock(rq);
> > -               update_idle_cpu_load(rq);
> > -               raw_spin_unlock_irq(&rq->lock);
> > -
> > -               rebalance_domains(rq, CPU_IDLE);
> > +               /*
> > +                * If time for next balance is due,
> > +                * do the balance.
> > +                */
> > +               if (time_after(jiffies + 1, rq->next_balance)) {
> 
> Hi Tim,
> 
> If we want to do idle load balancing only when it is due for a
> balance, shouldn't the above just be "if (time_after(jiffies,
> rq->next_balance))"?

If rq->next_balance and jiffies are equal, then the
time_after(jiffies, rq->next_balance) check will be false and
you will not balance.  But you do want to balance in this
case, so jiffies + 1 was used.
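
A minimal userspace sketch of the boundary case, assuming simplified
stand-ins for the kernel macros (the sketch_ names are hypothetical, and
the kernel versions additionally type-check their arguments):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified stand-in for time_after(a, b): true if a is strictly after b. */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* The check as posted in the patch: time_after(jiffies + 1, next_balance). */
static bool sketch_due_plus_one(unsigned long now, unsigned long next_balance)
{
	return sketch_time_after(now + 1, next_balance);
}

/* Check a few boundary cases, including a counter that wraps around. */
static void sketch_check_boundary(void)
{
	assert(!sketch_time_after(100, 100));	/* strict: equal is not "after" */
	assert(sketch_due_plus_one(100, 100));	/* +1 makes the equal case due */
	assert(!sketch_due_plus_one(99, 100));	/* one tick early: not due */
	assert(sketch_due_plus_one(ULONG_MAX, ULONG_MAX)); /* now + 1 wraps to 0 */
}
```

The signed-difference comparison is what keeps the check correct when the
tick counter wraps, which is why the last case still reports "due".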

Tim



* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 20:59   ` Tim Chen
@ 2014-05-20 21:04     ` Tim Chen
  2014-05-21  1:15       ` Joe Perches
  2014-05-20 21:09     ` Jason Low
  1 sibling, 1 reply; 14+ messages in thread
From: Tim Chen @ 2014-05-20 21:04 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, 2014-05-20 at 13:59 -0700, Tim Chen wrote:
> On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
> > On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 9b4c4f3..97132db 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> > >
> > >                 rq = cpu_rq(balance_cpu);
> > >
> > > -               raw_spin_lock_irq(&rq->lock);
> > > -               update_rq_clock(rq);
> > > -               update_idle_cpu_load(rq);
> > > -               raw_spin_unlock_irq(&rq->lock);
> > > -
> > > -               rebalance_domains(rq, CPU_IDLE);
> > > +               /*
> > > +                * If time for next balance is due,
> > > +                * do the balance.
> > > +                */
> > > +               if (time_after(jiffies + 1, rq->next_balance)) {
> > 
> > Hi Tim,
> > 
> > If we want to do idle load balancing only when it is due for a
> > balance, shouldn't the above just be "if (time_after(jiffies,
> > rq->next_balance))"?
> 
> If rq->next_balance and jiffies are equal, then
> time_after(jiffies, rq->next_balance) check will be false and
> you will not do balance.  But actually you want to balance
> for this case so the jiffies+1 was used.

So maybe I should switch the check to 
if (time_before(rq->next_balance, jiffies))

Tim




* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 20:59   ` Tim Chen
  2014-05-20 21:04     ` Tim Chen
@ 2014-05-20 21:09     ` Jason Low
  2014-05-20 21:12       ` Tim Chen
  2014-05-20 21:39       ` Tim Chen
  1 sibling, 2 replies; 14+ messages in thread
From: Jason Low @ 2014-05-20 21:09 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, May 20, 2014 at 1:59 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
>> On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>>
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 9b4c4f3..97132db 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>> >
>> >                 rq = cpu_rq(balance_cpu);
>> >
>> > -               raw_spin_lock_irq(&rq->lock);
>> > -               update_rq_clock(rq);
>> > -               update_idle_cpu_load(rq);
>> > -               raw_spin_unlock_irq(&rq->lock);
>> > -
>> > -               rebalance_domains(rq, CPU_IDLE);
>> > +               /*
>> > +                * If time for next balance is due,
>> > +                * do the balance.
>> > +                */
>> > +               if (time_after(jiffies + 1, rq->next_balance)) {
>>
>> Hi Tim,
>>
>> If we want to do idle load balancing only when it is due for a
>> balance, shouldn't the above just be "if (time_after(jiffies,
>> rq->next_balance))"?
>
> If rq->next_balance and jiffies are equal, then
> time_after(jiffies, rq->next_balance) check will be false and
> you will not do balance.  But actually you want to balance
> for this case so the jiffies+1 was used.

Hi Tim, Rik

Yes, it makes sense that we want to balance if they are equal. We
may also consider using "if (time_after_eq(jiffies,
rq->next_balance))".

Reviewed-by: Jason Low <jason.low2@hp.com>


* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 21:09     ` Jason Low
@ 2014-05-20 21:12       ` Tim Chen
  2014-05-20 21:39       ` Tim Chen
  1 sibling, 0 replies; 14+ messages in thread
From: Tim Chen @ 2014-05-20 21:12 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, 2014-05-20 at 14:09 -0700, Jason Low wrote:
> On Tue, May 20, 2014 at 1:59 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
> >> On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> >>
> >> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> > index 9b4c4f3..97132db 100644
> >> > --- a/kernel/sched/fair.c
> >> > +++ b/kernel/sched/fair.c
> >> > @@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> >> >
> >> >                 rq = cpu_rq(balance_cpu);
> >> >
> >> > -               raw_spin_lock_irq(&rq->lock);
> >> > -               update_rq_clock(rq);
> >> > -               update_idle_cpu_load(rq);
> >> > -               raw_spin_unlock_irq(&rq->lock);
> >> > -
> >> > -               rebalance_domains(rq, CPU_IDLE);
> >> > +               /*
> >> > +                * If time for next balance is due,
> >> > +                * do the balance.
> >> > +                */
> >> > +               if (time_after(jiffies + 1, rq->next_balance)) {
> >>
> >> Hi Tim,
> >>
> >> If we want to do idle load balancing only when it is due for a
> >> balance, shouldn't the above just be "if (time_after(jiffies,
> >> rq->next_balance))"?
> >
> > If rq->next_balance and jiffies are equal, then
> > time_after(jiffies, rq->next_balance) check will be false and
> > you will not do balance.  But actually you want to balance
> > for this case so the jiffies+1 was used.
> 
> Hi Tim, Rik
> 
> Yes, that makes sense that we want to balance if they are equal. We
> may also consider using "if (time_after_eq(jiffies,
> rq->next_balance)".

That sounds good.  Thanks.
> 
> Reviewed-by: Jason Low <jason.low2@hp.com>


Tim




* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 21:09     ` Jason Low
  2014-05-20 21:12       ` Tim Chen
@ 2014-05-20 21:39       ` Tim Chen
  2014-05-21  6:38         ` Peter Zijlstra
  2014-06-05 14:34         ` [tip:sched/core] sched/balancing: " tip-bot for Tim Chen
  1 sibling, 2 replies; 14+ messages in thread
From: Tim Chen @ 2014-05-20 21:39 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, 2014-05-20 at 14:09 -0700, Jason Low wrote:

> Hi Tim, Rik
> 
> Yes, that makes sense that we want to balance if they are equal. We
> may also consider using "if (time_after_eq(jiffies,
> rq->next_balance)".
> 
> Reviewed-by: Jason Low <jason.low2@hp.com>

Jason & Rik,

Thanks for reviewing the patch.  I've updated the patch below as
suggested.

Tim

---

From: Tim Chen <tim.c.chen@linux.intel.com>
Subject: [PATCH v2] sched: Reduce the rate of needless idle load balancing


The current no_hz idle load balancer does load balancing for *all* idle cpus,
even though the time at which a particular idle cpu is due for load
balancing could still be a while in the future.  This introduces a much
higher load balancing rate than is necessary.  The patch changes the
behavior by doing idle load balancing on behalf of an idle cpu only
when that cpu is due for load balancing.

On SGI's systems with over 3000 cores, the cpu responsible for idle balancing
got overwhelmed with idle balancing and introduced a lot of OS noise
to workloads.  This patch fixes the issue.

Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b4c4f3..b826c3a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 
 		rq = cpu_rq(balance_cpu);
 
-		raw_spin_lock_irq(&rq->lock);
-		update_rq_clock(rq);
-		update_idle_cpu_load(rq);
-		raw_spin_unlock_irq(&rq->lock);
-
-		rebalance_domains(rq, CPU_IDLE);
+		/*
+		 * If time for next balance is due,
+		 * do the balance.
+		 */
+		if (time_after_eq(jiffies, rq->next_balance)) {
+			raw_spin_lock_irq(&rq->lock);
+			update_rq_clock(rq);
+			update_idle_cpu_load(rq);
+			raw_spin_unlock_irq(&rq->lock);
+			rebalance_domains(rq, CPU_IDLE);
+		}
 
 		if (time_after(this_rq->next_balance, rq->next_balance))
 			this_rq->next_balance = rq->next_balance;
-- 
1.7.11.7






* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 21:04     ` Tim Chen
@ 2014-05-21  1:15       ` Joe Perches
  2014-05-21 16:37         ` Tim Chen
  0 siblings, 1 reply; 14+ messages in thread
From: Joe Perches @ 2014-05-21  1:15 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, 2014-05-20 at 14:04 -0700, Tim Chen wrote:
> On Tue, 2014-05-20 at 13:59 -0700, Tim Chen wrote:
> > On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
> > > On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
[]
> > > If we want to do idle load balancing only when it is due for a
> > > balance, shouldn't the above just be "if (time_after(jiffies,
> > > rq->next_balance))"?
> > 
> > If rq->next_balance and jiffies are equal, then
> > time_after(jiffies, rq->next_balance) check will be false and
> > you will not do balance.  But actually you want to balance
> > for this case so the jiffies+1 was used.
> 
> So maybe I should switch the check to 
> if (time_before(rq->next_balance, jiffies))

time_after_eq() or time_is_after_eq_jiffies()




* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-20 21:39       ` Tim Chen
@ 2014-05-21  6:38         ` Peter Zijlstra
  2014-06-05 14:34         ` [tip:sched/core] sched/balancing: " tip-bot for Tim Chen
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2014-05-21  6:38 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Ingo Molnar, Andrew Morton, Len Brown, Russ Anderson,
	Dimitri Sivanich, Hedi Berriche, Andi Kleen, Michel Lespinasse,
	Rik van Riel, Peter Hurley, Linux Kernel Mailing List


On Tue, May 20, 2014 at 02:39:27PM -0700, Tim Chen wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> Subject: [PATCH v2] sched: Reduce the rate of needless idle load balancing
> 
> 
> The current no_hz idle load balancer does load balancing for *all* idle cpus,
> even though the time at which a particular idle cpu is due for load
> balancing could still be a while in the future.  This introduces a much
> higher load balancing rate than is necessary.  The patch changes the
> behavior by doing idle load balancing on behalf of an idle cpu only
> when that cpu is due for load balancing.
> 
> On SGI's systems with over 3000 cores, the cpu responsible for idle balancing
> got overwhelmed with idle balancing and introduced a lot of OS noise
> to workloads.  This patch fixes the issue.
> 
> Acked-by: Russ Anderson <rja@sgi.com>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: Jason Low <jason.low2@hp.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---

Thanks!



* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-21  1:15       ` Joe Perches
@ 2014-05-21 16:37         ` Tim Chen
  2014-05-21 18:26           ` Davidlohr Bueso
  0 siblings, 1 reply; 14+ messages in thread
From: Tim Chen @ 2014-05-21 16:37 UTC (permalink / raw)
  To: Joe Perches
  Cc: Jason Low, Ingo Molnar, Peter Zijlstra, Andrew Morton, Len Brown,
	Russ Anderson, Dimitri Sivanich, Hedi Berriche, Andi Kleen,
	Michel Lespinasse, Rik van Riel, Peter Hurley,
	Linux Kernel Mailing List

On Tue, 2014-05-20 at 18:15 -0700, Joe Perches wrote:
> On Tue, 2014-05-20 at 14:04 -0700, Tim Chen wrote:
> > On Tue, 2014-05-20 at 13:59 -0700, Tim Chen wrote:
> > > On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
> > > > On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> []
> > > > If we want to do idle load balancing only when it is due for a
> > > > balance, shouldn't the above just be "if (time_after(jiffies,
> > > > rq->next_balance))"?
> > > 
> > > If rq->next_balance and jiffies are equal, then
> > > time_after(jiffies, rq->next_balance) check will be false and
> > > you will not do balance.  But actually you want to balance
> > > for this case so the jiffies+1 was used.
> > 
> > So maybe I should switch the check to 
> > if (time_before(rq->next_balance, jiffies))
> 
> time_after_eq() or time_is_after_eq_jiffies()
> 
> 

I prefer time_after_eq to keep the code style consistent with the
rest of the code in fair.c.

Tim



* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-21 16:37         ` Tim Chen
@ 2014-05-21 18:26           ` Davidlohr Bueso
  2014-05-21 18:49             ` Tim Chen
  0 siblings, 1 reply; 14+ messages in thread
From: Davidlohr Bueso @ 2014-05-21 18:26 UTC (permalink / raw)
  To: Tim Chen
  Cc: Joe Perches, Jason Low, Ingo Molnar, Peter Zijlstra,
	Andrew Morton, Len Brown, Russ Anderson, Dimitri Sivanich,
	Hedi Berriche, Andi Kleen, Michel Lespinasse, Rik van Riel,
	Peter Hurley, Linux Kernel Mailing List

On Wed, 2014-05-21 at 09:37 -0700, Tim Chen wrote:
> On Tue, 2014-05-20 at 18:15 -0700, Joe Perches wrote:
> > On Tue, 2014-05-20 at 14:04 -0700, Tim Chen wrote:
> > > On Tue, 2014-05-20 at 13:59 -0700, Tim Chen wrote:
> > > > On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
> > > > > On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > []
> > > > > If we want to do idle load balancing only when it is due for a
> > > > > balance, shouldn't the above just be "if (time_after(jiffies,
> > > > > rq->next_balance))"?
> > > > 
> > > > If rq->next_balance and jiffies are equal, then
> > > > time_after(jiffies, rq->next_balance) check will be false and
> > > > you will not do balance.  But actually you want to balance
> > > > for this case so the jiffies+1 was used.
> > > 
> > > So maybe I should switch the check to 
> > > if (time_before(rq->next_balance, jiffies))
> > 
> > time_after_eq() or time_is_after_eq_jiffies()
> > 
> > 
> 
> I prefer time_after_eq to keep the code style consistent with the
> rest of the code in fair.c.

Should all the code be updated then? We should use the existing
interfaces if available.



* Re: [PATCH] sched: Reduce the rate of needless idle load balancing
  2014-05-21 18:26           ` Davidlohr Bueso
@ 2014-05-21 18:49             ` Tim Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Tim Chen @ 2014-05-21 18:49 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Joe Perches, Jason Low, Ingo Molnar, Peter Zijlstra,
	Andrew Morton, Len Brown, Russ Anderson, Dimitri Sivanich,
	Hedi Berriche, Andi Kleen, Michel Lespinasse, Rik van Riel,
	Peter Hurley, Linux Kernel Mailing List

On Wed, 2014-05-21 at 11:26 -0700, Davidlohr Bueso wrote:
> On Wed, 2014-05-21 at 09:37 -0700, Tim Chen wrote:
> > On Tue, 2014-05-20 at 18:15 -0700, Joe Perches wrote:
> > > On Tue, 2014-05-20 at 14:04 -0700, Tim Chen wrote:
> > > > On Tue, 2014-05-20 at 13:59 -0700, Tim Chen wrote:
> > > > > On Tue, 2014-05-20 at 13:51 -0700, Jason Low wrote:
> > > > > > On Tue, May 20, 2014 at 1:17 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > []
> > > > > > If we want to do idle load balancing only when it is due for a
> > > > > > balance, shouldn't the above just be "if (time_after(jiffies,
> > > > > > rq->next_balance))"?
> > > > > 
> > > > > If rq->next_balance and jiffies are equal, then
> > > > > time_after(jiffies, rq->next_balance) check will be false and
> > > > > you will not do balance.  But actually you want to balance
> > > > > for this case so the jiffies+1 was used.
> > > > 
> > > > So maybe I should switch the check to 
> > > > if (time_before(rq->next_balance, jiffies))
> > > 
> > > time_after_eq() or time_is_after_eq_jiffies()
> > > 
> > > 
> > 
> > I prefer time_after_eq to keep the code style consistent with the
> > rest of the code in fair.c.
> 
> Should all the code be updated then? We should use the existing
> interfaces if available.
> 

BTW, if this code were to be updated, a
time_is_before_eq_jiffies(rq->next_balance) check would be the correct
one to use for the patch.  It expands to
time_after_eq(jiffies, rq->next_balance), which is what we want.

So something like:

                if (time_is_before_eq_jiffies(rq->next_balance)) {
                        raw_spin_lock_irq(&rq->lock);
                        update_rq_clock(rq);
                        update_idle_cpu_load(rq);
                        raw_spin_unlock_irq(&rq->lock);
                        rebalance_domains(rq, CPU_IDLE);
                }


But I don't think this change makes the code logic any clearer.
I prefer time_after_eq(jiffies, rq->next_balance), which is more
readable.
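
For reference, the relationship between the two spellings can be sketched
in userspace as follows. The sketch_ names and the sketch_jiffies variable
are hypothetical; the helpers only mirror how the kernel's
time_is_before_eq_jiffies() and time_is_after_eq_jiffies() are defined in
terms of time_after_eq() and time_before_eq(), with type checking omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's global tick counter. */
static unsigned long sketch_jiffies;

/* Wrap-safe "a is at or after b", same idiom as time_after_eq(). */
static bool sketch_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Mirrors time_is_before_eq_jiffies(a): a is at or before "now". */
static bool sketch_time_is_before_eq_jiffies(unsigned long a)
{
	return sketch_time_after_eq(sketch_jiffies, a);
}

/* Mirrors time_is_after_eq_jiffies(a): a is at or after "now". */
static bool sketch_time_is_after_eq_jiffies(unsigned long a)
{
	return sketch_time_after_eq(a, sketch_jiffies);
}

/* Returns true when a runqueue whose deadline is next_balance is due. */
static bool sketch_balance_due(unsigned long now, unsigned long next_balance)
{
	sketch_jiffies = now;
	/* Both spellings must agree, since one expands to the other. */
	assert(sketch_time_is_before_eq_jiffies(next_balance) ==
	       sketch_time_after_eq(now, next_balance));
	return sketch_time_is_before_eq_jiffies(next_balance);
}
```

Note that the similarly named time_is_after_eq_jiffies() tests the
opposite direction: it is true while the deadline has not yet passed, so
it would not be a drop-in replacement for the check in this patch.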

Tim



* [tip:sched/core] sched/balancing: Reduce the rate of needless idle load balancing
  2014-05-20 21:39       ` Tim Chen
  2014-05-21  6:38         ` Peter Zijlstra
@ 2014-06-05 14:34         ` tip-bot for Tim Chen
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot for Tim Chen @ 2014-06-05 14:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, torvalds, peterz, peter, jason.low2, sivanich, riel, akpm,
	tglx, len.brown, linux-kernel, hpa, andi, tim.c.chen, hedi, rja,
	walken

Commit-ID:  ed61bbc69c773465782476c7e5869fa5607fa73a
Gitweb:     http://git.kernel.org/tip/ed61bbc69c773465782476c7e5869fa5607fa73a
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Tue, 20 May 2014 14:39:27 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 5 Jun 2014 11:52:01 +0200

sched/balancing: Reduce the rate of needless idle load balancing

The current no_hz idle load balancer does load balancing for *all* idle cpus,
even though the time at which a particular idle cpu is due for load
balancing could still be a while in the future.  This introduces a much
higher load balancing rate than is necessary.  The patch changes the
behavior by doing idle load balancing on behalf of an idle cpu only
when that cpu is due for load balancing.

On SGI's systems with over 3000 cores, the cpu responsible for idle balancing
got overwhelmed with idle balancing and introduced a lot of OS noise
to workloads.  This patch fixes the issue.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1400621967.2970.280.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b71d8c3..7a0c000 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7193,12 +7193,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 
 		rq = cpu_rq(balance_cpu);
 
-		raw_spin_lock_irq(&rq->lock);
-		update_rq_clock(rq);
-		update_idle_cpu_load(rq);
-		raw_spin_unlock_irq(&rq->lock);
-
-		rebalance_domains(rq, CPU_IDLE);
+		/*
+		 * If time for next balance is due,
+		 * do the balance.
+		 */
+		if (time_after_eq(jiffies, rq->next_balance)) {
+			raw_spin_lock_irq(&rq->lock);
+			update_rq_clock(rq);
+			update_idle_cpu_load(rq);
+			raw_spin_unlock_irq(&rq->lock);
+			rebalance_domains(rq, CPU_IDLE);
+		}
 
 		if (time_after(this_rq->next_balance, rq->next_balance))
 			this_rq->next_balance = rq->next_balance;
