High priority threads causing severe CPU load imbalances

public inbox for linux-kernel@vger.kernel.org

* High priority threads causing severe CPU load imbalances
@ 2010-04-06 13:12 Suresh Jayaraman
  2010-04-06 14:08 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Suresh Jayaraman @ 2010-04-06 13:12 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra

I have a simple test program that accepts the number of threads (pthreads)
to be created as an input. Each of these threads invokes a function that is
just an infinite while loop. The main function, after creating those
threads, goes into an infinite loop itself.

My test machine is a Dual Core AMD Opteron(tm) 860 with 8
sockets (non-HT). I run this test program with number of threads ==
number of CPUs:

   ./loadcpu -t 16

I see 100% CPU utilization on almost all CPUs (via mpstat/htop/vmstat).

When the above threads are running, if I introduce a few high priority
threads by doing:

   nice -n -13 ./loadcpu -t 3

After a short while, I see a few CPUs becoming idle at ~0% utilization
(the number of CPUs becoming idle equals roughly the number of high
priority threads i.e. 3). When I stop the high priority threads, the CPU
utilization comes back to normal i.e. ~100%.
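
For anyone repeating the measurement, here is a minimal sketch of counting
idle CPUs from saved top output. It assumes top's per-CPU summary lines look
like "CpuN : ...%us, ...%id, ..." (the procps format of this era); the
function name count_idle_cpus and the 90% default threshold are made up for
illustration.

```shell
# Count CPUs whose idle percentage is at least the given threshold.
# Reads top-style "CpuN : ...%us, ... NN.N%id, ..." lines from stdin.
# Assumption: fields are comma separated and the idle field ends in "%id".
count_idle_cpus() {
    threshold=${1:-90}
    awk -F',' -v t="$threshold" '
        /^Cpu[0-9]+/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%id$/) {
                    v = $i
                    sub(/%id$/, "", v)      # keep only the number
                    if (v + 0 >= t) n++
                }
            }
        }
        END { print n + 0 }'
}

# Example: one busy CPU and one idle CPU.
printf '%s\n' \
  'Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st' \
  'Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st' |
  count_idle_cpus 90
```

Lines that do not start with "Cpu" (memory, swap, the process table) are
ignored, so a whole batch-mode top capture can be piped in unchanged.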

This is reproducible on the 2.6.32.10 stable kernel with all the recent
SMT fixes (I hope), and I think it would be reproducible on current
upstream as well.

sched_mc_power_savings has always been set to 0.

I spent a while staring at the load balancing and the thread migration
code, but could not figure out why this is happening. Would appreciate
any pointers.

Thanks,

-- 
Suresh Jayaraman


* Re: High priority threads causing severe CPU load imbalances
  2010-04-06 13:12 High priority threads causing severe CPU load imbalances Suresh Jayaraman
@ 2010-04-06 14:08 ` Peter Zijlstra
  2010-04-06 16:35   ` Suresh Jayaraman
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Peter Zijlstra @ 2010-04-06 14:08 UTC (permalink / raw)
  To: Suresh Jayaraman; +Cc: LKML, Ingo Molnar

On Tue, 2010-04-06 at 18:42 +0530, Suresh Jayaraman wrote:
> I have a simple test program that accepts number of threads(pthreads) to
> be created as a input. Each of these threads that gets created invokes a
> function which is just a infinite while loop. The main function after
> creating those threads goes in a infinite loop itself
> 
> My test machine is a Dual Core AMD Opteron(tm) 860 with 8
> sockets(non-HT), I run this test program with number of threads ==
> number of CPUs:
> 
>    ./loadcpu -t 16
> 
> I see 100% CPU utilization on almost all CPUs (via mpstat/htop/vmstat).
> 
> When the above threads are running, if I introduce a few high priority
> threads by doing:
> 
>    nice -n -13 ./loadcpu -t 3
> 
> After a short while, I see a few CPUs becoming idle at ~0% utilization
> (the number of CPUs becoming idle equals roughly the number of high
> priority threads i.e. 3). When I stop the high priority threads, the CPU
> utilization comes back to normal i.e. ~100%.
> 
> This is reproducible on 2.6.32.10 stable kernel with all the recent all
> SMT fixes (I hope) and I think it would be reproducible in current
> upstream as well.

Why bother using -stable for reporting bugs?

> sched_mc_power_savings has been always set to 0.
> 
> I spent a while staring at the load balancing and the thread migration
> code, but could not figure out why this is happening. Would appreciate
> any pointers.

Right, except it's not a severe imbalance as the subject suggests. For
some reason it seems to end up in a semi-stable state that is actually
quite balanced.

for ((i=0; i<8; i++)) do while :; do :; done & done
for ((i=0; i<3; i++)) do while :; do :; done & renice -n -15 -p $! ;
done

gets me:

Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 99.0%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16440840k total,  1073672k used, 15367168k free,   105844k buffers
Swap: 16777212k total,        0k used, 16777212k free,   296504k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4370 root       5 -15  105m  804  304 R 100.1  0.0   0:45.02 bash
 4374 root       5 -15  105m  804  304 R 100.1  0.0   0:44.95 bash
 4372 root       5 -15  105m  804  304 R 99.1  0.0   0:45.00 bash
 4364 root      20   0  105m  804  304 R 51.0  0.0   0:33.06 bash
 4362 root      20   0  105m  800  300 R 50.0  0.0   0:33.17 bash
 4365 root      20   0  105m  804  304 R 50.0  0.0   0:33.75 bash
 4368 root      20   0  105m  804  304 R 50.0  0.0   0:33.32 bash
 4369 root      20   0  105m  804  304 R 50.0  0.0   0:33.38 bash
 4363 root      20   0  105m  804  304 R 49.1  0.0   0:33.65 bash
 4366 root      20   0  105m  804  304 R 49.1  0.0   0:33.29 bash
 4367 root      20   0  105m  804  304 R 49.1  0.0   0:33.54 bash 

So we have the 3 nice -15 loops on a cpu each, the 8 nice 0 loops two to
a cpu (4 cpus in total), and 1 cpu idle. That is actually quite balanced;
'better' would be if those nice 0 loops rotated over the 5 available cpus,
but that would also thrash more caches, I guess.
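
The per-task shares behind that comparison can be worked out with integer
arithmetic. This is only a sketch: share_permille is a hypothetical helper,
and it assumes the busy loops split the given CPUs perfectly evenly, which
the real load balancer does not guarantee.

```shell
# Per-task CPU share in tenths of a percent when `tasks` busy loops are
# spread evenly over `cpus` CPUs, capped at one full CPU per task.
share_permille() {
    tasks=$1
    cpus=$2
    share=$(( cpus * 1000 / tasks ))
    if [ "$share" -gt 1000 ]; then
        share=1000
    fi
    echo "$share"
}

# Observed state: 8 nice 0 loops packed onto 4 CPUs.
echo "packed on 4 cpus:   $(share_permille 8 4)"   # 500, i.e. 50.0% each
# Alternative: the same 8 loops rotating over all 5 remaining CPUs.
echo "rotating on 5 cpus: $(share_permille 8 5)"   # 625, i.e. 62.5% each
```

So rotating would lift each nice 0 loop from 50% to 62.5% of a CPU, at the
price of extra migrations.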

I'm not quite sure what makes the load-balancer end up in this situation,
but I suspect the various imbalance_pct things might have something to do
with it.

It doesn't always end up in this state either. If you only start 2 -15
loops it's a roll of the dice what happens: sometimes it ends up with
the 6 cpus cycling the 2 extra tasks around, sometimes it's 1 cpu idle
with 1 task cycling.

Unexpected, maybe; severe imbalance, no. It would be nice to get a
little more stable behaviour though.





* Re: High priority threads causing severe CPU load imbalances
  2010-04-06 14:08 ` Peter Zijlstra
@ 2010-04-06 16:35   ` Suresh Jayaraman
  2010-04-08 16:15     ` Peter Zijlstra
  2010-04-07  4:42   ` Andy Lutomirski
  2010-04-07  5:46   ` Masayuki Igawa
  2 siblings, 1 reply; 8+ messages in thread
From: Suresh Jayaraman @ 2010-04-06 16:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar

On 04/06/2010 07:38 PM, Peter Zijlstra wrote:
> On Tue, 2010-04-06 at 18:42 +0530, Suresh Jayaraman wrote:
>> I have a simple test program that accepts number of threads(pthreads) to
>> be created as a input. Each of these threads that gets created invokes a
>> function which is just a infinite while loop. The main function after
>> creating those threads goes in a infinite loop itself
>>
>> My test machine is a Dual Core AMD Opteron(tm) 860 with 8
>> sockets(non-HT), I run this test program with number of threads ==
>> number of CPUs:
>>
>>    ./loadcpu -t 16
>>
>> I see 100% CPU utilization on almost all CPUs (via mpstat/htop/vmstat).
>>
>> When the above threads are running, if I introduce a few high priority
>> threads by doing:
>>
>>    nice -n -13 ./loadcpu -t 3
>>
>> After a short while, I see a few CPUs becoming idle at ~0% utilization
>> (the number of CPUs becoming idle equals roughly the number of high
>> priority threads i.e. 3). When I stop the high priority threads, the CPU
>> utilization comes back to normal i.e. ~100%.
>>
>> This is reproducible on 2.6.32.10 stable kernel with all the recent all
>> SMT fixes (I hope) and I think it would be reproducible in current
>> upstream as well.
> 
> Why bother using -stable for reporting bugs?

It was not intentional. It just happened that I first noticed the bug on
a 32.10 kernel.

>> sched_mc_power_savings has been always set to 0.
>>
>> I spent a while staring at the load balancing and the thread migration
>> code, but could not figure out why this is happening. Would appreciate
>> any pointers.
> 
> Right, except its not a severe imbalance as the subject suggests. For
> some reason it seems to end up in a semi-stable state that is actually
> quite balanced.

In my reproduction attempt the number of CPUs becoming idle increased
with the number of high priority threads. For example:

 3 (out of 16) CPUs become idle when there were 3 high priority threads
 5 CPUs become idle when there were 4 high priority threads
 7 CPUs become idle when there were 5 high priority threads (~40%)

But I am also starting to think that some weird combination of normal
priority and high priority threads makes the problem better or worse,
because with 7 or more threads the utilization becomes smoother again.

The increasing number of idle CPUs made me think that it could be severe.


> 
> for ((i=0; i<8; i++)) do while :; do :; done & done
> for ((i=0; i<3; i++)) do while :; do :; done & renice -n -15 -p $! ;
> done
> 
> gets me:
> 
> Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  : 99.0%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu5  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  16440840k total,  1073672k used, 15367168k free,   105844k buffers
> Swap: 16777212k total,        0k used, 16777212k free,   296504k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4370 root       5 -15  105m  804  304 R 100.1  0.0   0:45.02 bash
>  4374 root       5 -15  105m  804  304 R 100.1  0.0   0:44.95 bash
>  4372 root       5 -15  105m  804  304 R 99.1  0.0   0:45.00 bash
>  4364 root      20   0  105m  804  304 R 51.0  0.0   0:33.06 bash
>  4362 root      20   0  105m  800  300 R 50.0  0.0   0:33.17 bash
>  4365 root      20   0  105m  804  304 R 50.0  0.0   0:33.75 bash
>  4368 root      20   0  105m  804  304 R 50.0  0.0   0:33.32 bash
>  4369 root      20   0  105m  804  304 R 50.0  0.0   0:33.38 bash
>  4363 root      20   0  105m  804  304 R 49.1  0.0   0:33.65 bash
>  4366 root      20   0  105m  804  304 R 49.1  0.0   0:33.29 bash
>  4367 root      20   0  105m  804  304 R 49.1  0.0   0:33.54 bash 
> 
> So we have the 3 -15 loops on a cpu each, and the 8 0 loops on 2 cpus
> each, and 1 cpu idle. That is actually quite balanced, 'better' would be
> if those 0 loops would rotate over the 5 available cpus, but that would
> also trash more caches I guess.

Perhaps there is a chance that with more CPUs and a different number of
high priority threads the problem could get worse, as I mentioned above?


Thanks,

-- 
Suresh Jayaraman


* Re: High priority threads causing severe CPU load imbalances
  2010-04-06 14:08 ` Peter Zijlstra
  2010-04-06 16:35   ` Suresh Jayaraman
@ 2010-04-07  4:42   ` Andy Lutomirski
  2010-04-07  7:44     ` Peter Zijlstra
  2010-04-07  5:46   ` Masayuki Igawa
  2 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2010-04-07  4:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Suresh Jayaraman, LKML, Ingo Molnar

Peter Zijlstra wrote:
> On Tue, 2010-04-06 at 18:42 +0530, Suresh Jayaraman wrote:
>> I have a simple test program that accepts number of threads(pthreads) to
>> be created as a input. Each of these threads that gets created invokes a
>> function which is just a infinite while loop. The main function after
>> creating those threads goes in a infinite loop itself
>>
>> My test machine is a Dual Core AMD Opteron(tm) 860 with 8
>> sockets(non-HT), I run this test program with number of threads ==
>> number of CPUs:
>>
>>    ./loadcpu -t 16
>>
>> I see 100% CPU utilization on almost all CPUs (via mpstat/htop/vmstat).
>>
>> When the above threads are running, if I introduce a few high priority
>> threads by doing:
>>
>>    nice -n -13 ./loadcpu -t 3
>>
>> After a short while, I see a few CPUs becoming idle at ~0% utilization
>> (the number of CPUs becoming idle equals roughly the number of high
>> priority threads i.e. 3). When I stop the high priority threads, the CPU
>> utilization comes back to normal i.e. ~100%.
>>
>> This is reproducible on 2.6.32.10 stable kernel with all the recent all
>> SMT fixes (I hope) and I think it would be reproducible in current
>> upstream as well.
> 
> Why bother using -stable for reporting bugs?
> 
>> sched_mc_power_savings has been always set to 0.
>>
>> I spent a while staring at the load balancing and the thread migration
>> code, but could not figure out why this is happening. Would appreciate
>> any pointers.
> 
> Right, except its not a severe imbalance as the subject suggests. For
> some reason it seems to end up in a semi-stable state that is actually
> quite balanced.
> 
> for ((i=0; i<8; i++)) do while :; do :; done & done
> for ((i=0; i<3; i++)) do while :; do :; done & renice -n -15 -p $! ;
> done
> 
> gets me:
> 
> Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  : 99.0%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu5  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  16440840k total,  1073672k used, 15367168k free,   105844k buffers
> Swap: 16777212k total,        0k used, 16777212k free,   296504k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4370 root       5 -15  105m  804  304 R 100.1  0.0   0:45.02 bash
>  4374 root       5 -15  105m  804  304 R 100.1  0.0   0:44.95 bash
>  4372 root       5 -15  105m  804  304 R 99.1  0.0   0:45.00 bash
>  4364 root      20   0  105m  804  304 R 51.0  0.0   0:33.06 bash
>  4362 root      20   0  105m  800  300 R 50.0  0.0   0:33.17 bash
>  4365 root      20   0  105m  804  304 R 50.0  0.0   0:33.75 bash
>  4368 root      20   0  105m  804  304 R 50.0  0.0   0:33.32 bash
>  4369 root      20   0  105m  804  304 R 50.0  0.0   0:33.38 bash
>  4363 root      20   0  105m  804  304 R 49.1  0.0   0:33.65 bash
>  4366 root      20   0  105m  804  304 R 49.1  0.0   0:33.29 bash
>  4367 root      20   0  105m  804  304 R 49.1  0.0   0:33.54 bash 
> 
> So we have the 3 -15 loops on a cpu each, and the 8 0 loops on 2 cpus
> each, and 1 cpu idle. That is actually quite balanced, 'better' would be
> if those 0 loops would rotate over the 5 available cpus, but that would
> also trash more caches I guess.

What's wrong with having the three -15 loops each get a CPU, having six
of the remaining 0 loops get half a CPU, and the last two get their own
CPUs?  That's less fair but strictly better than the current solution,
and nothing bounces.
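
The "strictly better" claim is simple to check. In this sketch the per-task
shares (in percent of one CPU) for the current state are rounded from the
top output quoted above, and those for the alternative follow the proposal;
total_share is just an illustrative helper name.

```shell
# Sum a list of per-task CPU shares, each given in percent of one CPU.
total_share() {
    sum=0
    for s in "$@"; do
        sum=$(( sum + s ))
    done
    echo "$sum"
}

# Current state: three -15 loops at 100%, eight 0 loops at 50%.
echo "current:  $(total_share 100 100 100 50 50 50 50 50 50 50 50)"    # 700
# Proposal: three -15 loops at 100%, six 0 loops at 50%, two at 100%.
echo "proposed: $(total_share 100 100 100 50 50 50 50 50 50 100 100)"  # 800
```

No task gets less than it does today and two get more, which uses the
eighth CPU; the cost is that equal-priority tasks no longer receive equal
shares.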

--Andy


* Re: High priority threads causing severe CPU load imbalances
  2010-04-06 14:08 ` Peter Zijlstra
  2010-04-06 16:35   ` Suresh Jayaraman
  2010-04-07  4:42   ` Andy Lutomirski
@ 2010-04-07  5:46   ` Masayuki Igawa
  2 siblings, 0 replies; 8+ messages in thread
From: Masayuki Igawa @ 2010-04-07  5:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Suresh Jayaraman, LKML, Ingo Molnar

On Tue, 06 Apr 2010 16:08:10 +0200, Peter Zijlstra wrote:

> On Tue, 2010-04-06 at 18:42 +0530, Suresh Jayaraman wrote:
> > I have a simple test program that accepts number of threads(pthreads) to
> > be created as a input. Each of these threads that gets created invokes a
> > function which is just a infinite while loop. The main function after
> > creating those threads goes in a infinite loop itself
> > 
> > My test machine is a Dual Core AMD Opteron(tm) 860 with 8
> > sockets(non-HT), I run this test program with number of threads ==
> > number of CPUs:
> > 
> >    ./loadcpu -t 16
> > 
> > I see 100% CPU utilization on almost all CPUs (via mpstat/htop/vmstat).
> > 
> > When the above threads are running, if I introduce a few high priority
> > threads by doing:
> > 
> >    nice -n -13 ./loadcpu -t 3
> > 
> > After a short while, I see a few CPUs becoming idle at ~0% utilization
> > (the number of CPUs becoming idle equals roughly the number of high
> > priority threads i.e. 3). When I stop the high priority threads, the CPU
> > utilization comes back to normal i.e. ~100%.
> > 
> > This is reproducible on 2.6.32.10 stable kernel with all the recent all
> > SMT fixes (I hope) and I think it would be reproducible in current
> > upstream as well.
> 
> Why bother using -stable for reporting bugs?
> 
> > sched_mc_power_savings has been always set to 0.
> > 
> > I spent a while staring at the load balancing and the thread migration
> > code, but could not figure out why this is happening. Would appreciate
> > any pointers.
> 
> Right, except its not a severe imbalance as the subject suggests. For
> some reason it seems to end up in a semi-stable state that is actually
> quite balanced.
> 
> for ((i=0; i<8; i++)) do while :; do :; done & done
> for ((i=0; i<3; i++)) do while :; do :; done & renice -n -15 -p $! ;
> done
> 
> gets me:
> 
> Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  : 99.0%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu5  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  16440840k total,  1073672k used, 15367168k free,   105844k buffers
> Swap: 16777212k total,        0k used, 16777212k free,   296504k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4370 root       5 -15  105m  804  304 R 100.1  0.0   0:45.02 bash
>  4374 root       5 -15  105m  804  304 R 100.1  0.0   0:44.95 bash
>  4372 root       5 -15  105m  804  304 R 99.1  0.0   0:45.00 bash
>  4364 root      20   0  105m  804  304 R 51.0  0.0   0:33.06 bash
>  4362 root      20   0  105m  800  300 R 50.0  0.0   0:33.17 bash
>  4365 root      20   0  105m  804  304 R 50.0  0.0   0:33.75 bash
>  4368 root      20   0  105m  804  304 R 50.0  0.0   0:33.32 bash
>  4369 root      20   0  105m  804  304 R 50.0  0.0   0:33.38 bash
>  4363 root      20   0  105m  804  304 R 49.1  0.0   0:33.65 bash
>  4366 root      20   0  105m  804  304 R 49.1  0.0   0:33.29 bash
>  4367 root      20   0  105m  804  304 R 49.1  0.0   0:33.54 bash 
> 
> So we have the 3 -15 loops on a cpu each, and the 8 0 loops on 2 cpus
> each, and 1 cpu idle. That is actually quite balanced, 'better' would be
> if those 0 loops would rotate over the 5 available cpus, but that would
> also trash more caches I guess.
> 
> I'm not quite sure what makes the load-balancer end up in this situation
> though, but I suspect the various imbalance_pct things might have
> something to do with it.
> 
> It doesn't always end up in this state either, if you only start 2 -15
> loops its a roll of the dice on what happens, sometimes it ends up with
> the 6 cpus cycling the 2 extra tasks around, sometimes its 1 cpu idle
> with cycling 1 task.
> 
> Unexpected, maybe, severe imbalance, no. Would be nice to get it to be a
> little more stable behaviour though.


I found a similar (maybe the same) problem using the cgroup cpu subsystem, as follows:

My test machine has 2 sockets of quad-core Xeon (non-HT).
# mount -t cgroup -o cpu none /dev/cgroup-cpu/
# mkdir -p /dev/cgroup-cpu/204800 /dev/cgroup-cpu/1024
# echo 204800 > /dev/cgroup-cpu/204800/cpu.shares
# for ((i=0; i<3; i++)) do while :; do :; done & echo $! > /dev/cgroup-cpu/204800/tasks ; done
# for ((i=0; i<5; i++)) do while :; do :; done & echo $! > /dev/cgroup-cpu/1024/tasks ; done


gets me:

Tasks: 190 total,   9 running, 181 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8180292k total,  2430940k used,  5749352k free,   204988k buffers
Swap:        0k total,        0k used,        0k free,  1931820k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND          
30923 root      20   0  5808  540  264 R  100  0.0   2:30.64 3 bash             
30922 root      20   0  5808  540  264 R  100  0.0   2:30.64 2 bash             
30924 root      20   0  5808  540  264 R  100  0.0   2:30.63 6 bash             
30925 root      20   0  5808  540  264 R   42  0.0   1:00.19 7 bash             
30928 root      20   0  5808  540  264 R   41  0.0   0:57.26 5 bash             
30929 root      20   0  5808  540  264 R   40  0.0   0:57.03 7 bash             
30926 root      20   0  5808  540  264 R   39  0.0   0:58.37 7 bash             
30927 root      20   0  5808  540  264 R   39  0.0   0:58.57 5 bash             

I don't expect this behavior.
(I expect all 8 processes to use 100% CPU.)
So I'm investigating this problem.
I suspect the cause is that find_busiest_group() returns, as the busiest
sched_group, a sched_group that contains a high priority process
even though that sched_group has a 100% idle cpu.

IIUC, this problem was caused by the change to the load calculation made by this patch:
---
commit 2dd73a4f09beacadde827a032cf15fd8b1fa3d48
Author: Peter Williams <pwil3058@bigpond.net.au>
Date:   Tue Jun 27 02:54:34 2006 -0700

    [PATCH] sched: implement smpnice
---
This patch changed the load calculation from nr_running to weighted load.
So the scheduler treats a high priority process as if it were many processes in the load calculation.
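
To give a feel for the size of that effect, here is a rough sketch. The
weights are quoted from memory of the kernel's prio_to_weight[] table
(nice 0 = 1024, nice -13 = 18705, nice -15 = 29154), so treat them as an
assumption and check them against the source tree; the helper names are
made up.

```shell
# Load weight for the nice levels used in this thread.
# Assumption: values as recalled from prio_to_weight[]; verify before relying on them.
nice_to_weight() {
    case "$1" in
        0)   echo 1024 ;;
        -13) echo 18705 ;;
        -15) echo 29154 ;;
        *)   echo "unsupported nice level: $1" >&2; return 1 ;;
    esac
}

# How many nice 0 tasks one task at the given nice level counts as when
# load is measured by weight rather than by nr_running (rounded down).
nice0_equivalents() {
    w=$(nice_to_weight "$1") || return 1
    echo $(( w / 1024 ))
}

echo "nice -15 counts as $(nice0_equivalents -15) nice 0 tasks"   # 28
echo "nice -13 counts as $(nice0_equivalents -13) nice 0 tasks"   # 18
# cgroup case: cpu.shares 204800 against a group assumed to keep the default 1024.
echo "share ratio between the two cgroups: $(( 204800 / 1024 ))"  # 200
```

By weight, a CPU running a single high priority loop therefore looks far
busier than a CPU running two nice 0 loops, even though only the latter has
a task to spare.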

I haven't found a solution to this problem yet.
I'll dig further to find one.

Thanks.
-- 
Masayuki Igawa



* Re: High priority threads causing severe CPU load imbalances
  2010-04-07  4:42   ` Andy Lutomirski
@ 2010-04-07  7:44     ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2010-04-07  7:44 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Suresh Jayaraman, LKML, Ingo Molnar

On Wed, 2010-04-07 at 00:42 -0400, Andy Lutomirski wrote:
> That's less fair but strictly better than the current solution, 
> and nothing bounces. 

The fairness thing, that really matters a lot to some people.

I've had enterprise bugs filed over such behaviour as you describe.


* Re: High priority threads causing severe CPU load imbalances
  2010-04-06 16:35   ` Suresh Jayaraman
@ 2010-04-08 16:15     ` Peter Zijlstra
  2010-04-09  2:20       ` Masayuki Igawa
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2010-04-08 16:15 UTC (permalink / raw)
  To: Suresh Jayaraman; +Cc: LKML, Ingo Molnar, Masayuki Igawa

On Tue, 2010-04-06 at 22:05 +0530, Suresh Jayaraman wrote:
> Perhaps there is a chance that with more CPUs, different number of high
> priority threads the problem could get worser as I mentioned above..?

One thing that could be happening (triggered by what Igawa-san said,
although his case is more complicated by involving the cgroup stuff) is
that f_b_g() ends up selecting a group that contains these niced tasks
and then f_b_q() will not find a suitable source queue because each of
them will have but a single runnable task on it, and hence we simply
bail.

We'd somehow have to teach update_*_lb_stats() not to consider groups
where nr_running <= nr_cpus. I don't currently have a patch for that,
but I think that is the direction you might need to look in.
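
The condition described above can be modelled outside the kernel. The
sketch below is not kernel code and not a patch: group_can_donate is a
hypothetical name, and its two arguments stand in for a group's total
number of runnable tasks and its number of CPUs.

```shell
# Succeed (exit status 0) only if the group has more runnable tasks than
# CPUs, i.e. at least one task could be pulled away without idling a CPU
# in that group. Groups with nr_running <= nr_cpus would be skipped.
group_can_donate() {
    nr_running=$1
    nr_cpus=$2
    [ "$nr_running" -gt "$nr_cpus" ]
}

# A 2-CPU group running one niced loop per CPU: heavy by weight, but
# nothing can be pulled from it.
if group_can_donate 2 2; then echo "consider"; else echo "skip"; fi   # skip
# A 2-CPU group running four nice 0 loops: light by weight, but it has spare tasks.
if group_can_donate 4 2; then echo "consider"; else echo "skip"; fi   # consider
```

With such a filter the balancer would never pick the first kind of group
as busiest, so an idle CPU would go on to look at groups that really do
have a task to give.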


* Re: High priority threads causing severe CPU load imbalances
  2010-04-08 16:15     ` Peter Zijlstra
@ 2010-04-09  2:20       ` Masayuki Igawa
  0 siblings, 0 replies; 8+ messages in thread
From: Masayuki Igawa @ 2010-04-09  2:20 UTC (permalink / raw)
  To: peterz; +Cc: sjayaraman, linux-kernel, mingo

From: Peter Zijlstra <peterz@infradead.org>
Subject: Re: High priority threads causing severe CPU load imbalances
Date: Thu, 08 Apr 2010 18:15:44 +0200

> On Tue, 2010-04-06 at 22:05 +0530, Suresh Jayaraman wrote:
>> Perhaps there is a chance that with more CPUs, different number of high
>> priority threads the problem could get worser as I mentioned above..?
> 
> One thing that could be happening (triggered by what Igawa-san said,
> although his case is more complicated by involving the cgroup stuff) is
> that f_b_g() ends up selecting a group that contains these niced tasks
> and then f_b_q() will not find a suitable source queue because all of
> them will have but a single runnable task on it and hence we simply
> bail.
> 
> We'd somehow have to teach update_*_lb_stats() not to consider groups
> where nr_running <= nr_cpus. I don't currently have a patch for that,
> but I think that is the direction you might need to look in.

I made a patch to help me understand load_balance()'s behavior.
This patch reduced the CPU load imbalance, but not perfectly.
---
Cpu0  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 90.1%us,  0.0%sy,  0.0%ni,  9.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 98.7%us,  0.3%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 96.1%us,  1.0%sy,  0.0%ni,  3.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 99.0%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8032460k total,   807628k used,  7224832k free,    30692k buffers
Swap:        0k total,        0k used,        0k free,   347308k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND         
 9872 root      20   0 66128  632  268 R   99  0.0   0:13.69 4 bash            
 9876 root      20   0 66128  632  268 R   99  0.0   0:10.31 2 bash            
 9877 root      20   0 66128  632  268 R   99  0.0   0:10.79 3 bash            
 9871 root      20   0 66128  632  268 R   99  0.0   0:13.70 0 bash            
 9873 root      20   0 66128  632  268 R   99  0.0   0:13.68 1 bash            
 9874 root      20   0 66128  632  268 R   98  0.0   0:10.00 6 bash            
 9875 root      20   0 66128  632  268 R   92  0.0   0:11.22 4 bash            
 9878 root      20   0 66128  632  268 R   91  0.0   0:10.03 7 bash            
---
Also, this patch caused ping-pong load balancing.

This patch regards the sched_group as an idle sched_group
if the local sched_group's cpu is CPU_IDLE.

But the state is not stable because active_load_balance() runs in this situation, IIUC.


I'll investigate more.

===
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5a5ea2c..806be90 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2418,6 +2418,7 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	int i;
 	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
+	int idle_group = 0;
 
 	if (local_group)
 		balance_cpu = group_first_cpu(group);
@@ -2440,6 +2441,12 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 			}
 
 			load = target_load(i, load_idx);
+			/* This group is idle if it has a idle cpu. */
+			if (idle == CPU_IDLE) {
+				idle_group = 1;
+				sgs->group_load = 0;
+				sgs->sum_weighted_load = 0;
+			}
 		} else {
 			load = source_load(i, load_idx);
 			if (load > max_cpu_load)
@@ -2451,6 +2458,10 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 		sgs->group_load += load;
 		sgs->sum_nr_running += rq->nr_running;
 		sgs->sum_weighted_load += weighted_cpuload(i);
+		if (!idle_group) {
+			sgs->group_load += load;
+			sgs->sum_weighted_load += weighted_cpuload(i);
+		}
 
 	}
 
===


Thanks.
-- 
Masayuki Igawa
