public inbox for linux-kernel@vger.kernel.org
* scheduler problems on shutdown
@ 2004-04-09 18:21 Martin J. Bligh
  2004-04-10 15:31 ` Martin J. Bligh
  0 siblings, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2004-04-09 18:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

I get this on shutdown (after "Power Off" ironically).
2.6.5-rc3-mjb2.

Badness in find_busiest_group at kernel/sched.c:1425
Call Trace:
 [<c0117c84>] find_busiest_group+0x64/0x22c
 [<c0118091>] load_balance_newidle+0x21/0x6c
 [<c0118c77>] schedule+0x273/0x644
 [<c011e9c5>] exit_notify+0x609/0x64c
 [<c011ed22>] do_exit+0x31a/0x32c
 [<c0128c7a>] sys_reboot+0x1f2/0x2f8
 [<c0116f50>] wake_up_state+0xc/0x10
 [<c0125c37>] kill_proc_info+0x37/0x4c
 [<c0125d30>] kill_something_info+0xe4/0xec
 [<c01279e8>] sys_kill+0x54/0x5c
 [<c014caa3>] filp_open+0x3b/0x5c
 [<c014ce79>] sys_open+0x59/0x74
 [<c01075f9>] error_code+0x2d/0x38
 [<c0106b8f>] syscall_call+0x7/0xb

Look familiar?


* Re: scheduler problems on shutdown
  2004-04-09 18:21 scheduler problems on shutdown Martin J. Bligh
@ 2004-04-10 15:31 ` Martin J. Bligh
  2004-04-11  9:48   ` Nick Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2004-04-10 15:31 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

> I get this on shutdown (after "Power Off" ironically).
> 2.6.5-rc3-mjb2.
> 
> Badness in find_busiest_group at kernel/sched.c:1425
> Call Trace:
>  [<c0117c84>] find_busiest_group+0x64/0x22c
>  [<c0118091>] load_balance_newidle+0x21/0x6c
>  [<c0118c77>] schedule+0x273/0x644
>  [<c011e9c5>] exit_notify+0x609/0x64c
>  [<c011ed22>] do_exit+0x31a/0x32c
>  [<c0128c7a>] sys_reboot+0x1f2/0x2f8
>  [<c0116f50>] wake_up_state+0xc/0x10
>  [<c0125c37>] kill_proc_info+0x37/0x4c
>  [<c0125d30>] kill_something_info+0xe4/0xec
>  [<c01279e8>] sys_kill+0x54/0x5c
>  [<c014caa3>] filp_open+0x3b/0x5c
>  [<c014ce79>] sys_open+0x59/0x74
>  [<c01075f9>] error_code+0x2d/0x38
>  [<c0106b8f>] syscall_call+0x7/0xb
> 
> Look familiar?

Dunno why the numbers are different, but it's now line 1738 in 2.6.5-mjb1 ...
I wouldn't have thought we'd inserted that much since then. Anyway, it's
this:

                /* Tally up the load of all CPUs in the group */
                cpus_and(tmp, group->cpumask, cpu_online_map);
                WARN_ON(cpus_empty(tmp));

in find_busiest_group. Which makes sense I guess, but is very ugly.

M.



* Re: scheduler problems on shutdown
  2004-04-10 15:31 ` Martin J. Bligh
@ 2004-04-11  9:48   ` Nick Piggin
  2004-04-11 15:11     ` Martin J. Bligh
  0 siblings, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2004-04-11  9:48 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel



Martin J. Bligh wrote:

>>I get this on shutdown (after "Power Off" ironically).
>>2.6.5-rc3-mjb2.
>>
>>Badness in find_busiest_group at kernel/sched.c:1425
>>Call Trace:
>> [<c0117c84>] find_busiest_group+0x64/0x22c
>> [<c0118091>] load_balance_newidle+0x21/0x6c
>> [<c0118c77>] schedule+0x273/0x644
>> [<c011e9c5>] exit_notify+0x609/0x64c
>> [<c011ed22>] do_exit+0x31a/0x32c
>> [<c0128c7a>] sys_reboot+0x1f2/0x2f8
>> [<c0116f50>] wake_up_state+0xc/0x10
>> [<c0125c37>] kill_proc_info+0x37/0x4c
>> [<c0125d30>] kill_something_info+0xe4/0xec
>> [<c01279e8>] sys_kill+0x54/0x5c
>> [<c014caa3>] filp_open+0x3b/0x5c
>> [<c014ce79>] sys_open+0x59/0x74
>> [<c01075f9>] error_code+0x2d/0x38
>> [<c0106b8f>] syscall_call+0x7/0xb
>>
>>Look familiar?
>>
>
>Dunno why the numbers are different, but it's now line 1738 in 2.6.5-mjb1 ...
>I wouldn't have thought we'd inserted that much since then. Anyway, it's
>this:
>
>                /* Tally up the load of all CPUs in the group */
>                cpus_and(tmp, group->cpumask, cpu_online_map);
>                WARN_ON(cpus_empty(tmp));
>
>in find_busiest_group. Which makes sense I guess, but is very ugly.
>
>

I think the WARN_ON can go. You have to make sure the for_each_cpu
loop doesn't get run over an empty mask, though. It shouldn't be in the
latest -mm kernels any more, should it?

It is normal to have an entire group offline with CPU hotplug.
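
For illustration, a rough sketch of the kind of change being suggested
(not an actual diff from this thread; details like the nextgroup label
are assumptions):

                /* Tally up the load of all CPUs in the group */
                cpus_and(tmp, group->cpumask, cpu_online_map);

                /*
                 * With CPU hotplug the whole group may be offline; skip
                 * it quietly instead of warning, so the for_each_cpu
                 * loop below never runs over an empty mask.
                 */
                if (cpus_empty(tmp))
                        goto nextgroup;

                for_each_cpu(i, tmp) {
                        /* tally per-cpu load as before */
                }
                /* ... compute averages and pick the busiest group ... */
nextgroup:
                group = group->next;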




* Re: scheduler problems on shutdown
  2004-04-11  9:48   ` Nick Piggin
@ 2004-04-11 15:11     ` Martin J. Bligh
  2004-04-11 15:24       ` Nick Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2004-04-11 15:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

> I think the WARN_ON can go. You have to make sure the for_each_cpu
> loop doesn't get run over an empty mask, though. It shouldn't be in the
> latest -mm kernels any more, should it?

OK, I'll figure it out. I don't like the latest code, so don't really want
to "upgrade" though.
 
> It is normal to have an entire group offline with CPU hotplug.

Only if we can't figure out how to hotplug groups as well, which would
be a much cleaner way of doing it.

M.



* Re: scheduler problems on shutdown
  2004-04-11 15:11     ` Martin J. Bligh
@ 2004-04-11 15:24       ` Nick Piggin
  2004-04-11 15:42         ` Martin J. Bligh
  0 siblings, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2004-04-11 15:24 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

Martin J. Bligh wrote:
>>I think the WARN_ON can go. You have to make sure the for_each_cpu
>>loop doesn't get run over an empty mask, though. It shouldn't be in the
>>latest -mm kernels any more, should it?
> 
> 
> OK, I'll figure it out. I don't like the latest code, so don't really want
> to "upgrade" though.
>  

Oh? Anything specific?

> 
>>It is normal to have an entire group offline with CPU hotplug.
> 
> 
> Only if we can't figure out how to hotplug groups as well, which would
> be a much cleaner way of doing it.
> 

I think we'll soon want to add a sched domain setup callback
for hotplug that can take care of these things as required.
But for now, the current situation should be OK.
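
For illustration, a rough sketch of what such a hotplug callback might
look like with the 2.6 CPU notifier interface (rebuild_sched_domains()
is a hypothetical helper here, not code from this thread):

#include <linux/cpu.h>
#include <linux/notifier.h>

/* Hypothetical helper that tears down and rebuilds the sched domain /
 * group hierarchy for the current set of online CPUs. */
static void rebuild_sched_domains(void);

static int sched_domain_cpu_callback(struct notifier_block *nb,
                                     unsigned long action, void *hcpu)
{
        switch (action) {
        case CPU_ONLINE:
        case CPU_DEAD:
                /* A CPU came or went: rebuild the domains so no group
                 * is left containing only offline CPUs. */
                rebuild_sched_domains();
                break;
        }
        return NOTIFY_OK;
}

static struct notifier_block sched_domain_nb = {
        .notifier_call = sched_domain_cpu_callback,
};

/* Registered once from scheduler init code:
 *     register_cpu_notifier(&sched_domain_nb);
 */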


* Re: scheduler problems on shutdown
  2004-04-11 15:24       ` Nick Piggin
@ 2004-04-11 15:42         ` Martin J. Bligh
  2004-04-11 16:05           ` Nick Piggin
  0 siblings, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2004-04-11 15:42 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

>> OK, I'll figure it out. I don't like the latest code, so don't really want
>> to "upgrade" though.
> 
> Oh? Anything specific?

balance_on_clone, mostly.

M.



* Re: scheduler problems on shutdown
  2004-04-11 15:42         ` Martin J. Bligh
@ 2004-04-11 16:05           ` Nick Piggin
  0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2004-04-11 16:05 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

Martin J. Bligh wrote:
>>>OK, I'll figure it out. I don't like the latest code, so don't really want
>>>to "upgrade" though.
>>
>>Oh? Anything specific?
> 
> 
> balance_on_clone, mostly.
> 

Oh, that's been removed until more testing/tuning is done.

So far I haven't seen a benchmark with more than a few %
improvement. I thought I was going to be flooded with them
from the HPC guys after the last thread on the subject...


