* 2.6.20 BUG: soft lockup detected on CPU#0!
@ 2007-02-07 23:02 Lukasz Trabinski
2007-02-08 4:47 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Lukasz Trabinski @ 2007-02-07 23:02 UTC (permalink / raw)
To: linux-kernel; +Cc: Bartlomiej Solarz-Niesluchowski
Hello
On 2.6.19 I had about 60 days uptime, on 2.6.20 2 days :(
oceanic:~$ uname -a
Linux oceanic.wsisiz.edu.pl 2.6.20-oceanic #2 SMP Sun Feb 4 21:55:29 CET
2007 x86_64 x86_64 x86_64 GNU/Linux
Feb 7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#0!
Feb 7 22:46:00 oceanic kernel:
Feb 7 22:46:00 oceanic kernel: Call Trace:
Feb 7 22:46:00 oceanic kernel: <IRQ> [<ffffffff80250550>] softlockup_tick+0xdb/0xed
Feb 7 22:46:00 oceanic kernel: [<ffffffff8022ea87>] __do_softirq+0x55/0xc4
Feb 7 22:46:00 oceanic kernel: [<ffffffff8023256c>] update_process_times+0x42/0x68
Feb 7 22:46:00 oceanic kernel: [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55
Feb 7 22:46:00 oceanic kernel: [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68
Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70
Feb 7 22:46:00 oceanic kernel: <EOI> [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9
Feb 7 22:46:00 oceanic kernel: [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9
Feb 7 22:46:00 oceanic kernel: [<ffffffff8023dddd>] hrtimer_try_to_cancel+0x4a/0x53
Feb 7 22:46:00 oceanic kernel: [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16
Feb 7 22:46:00 oceanic kernel: [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16
Feb 7 22:46:00 oceanic kernel: [<ffffffff8022cc66>] do_exit+0x1c8/0x800
Feb 7 22:46:00 oceanic kernel: [<ffffffff8022d31d>] sys_exit_group+0x0/0xe
Feb 7 22:46:00 oceanic kernel: [<ffffffff8020995c>] tracesys+0xdc/0xe1
Feb 7 22:46:00 oceanic kernel:
Feb 7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#1!
Feb 7 22:46:00 oceanic kernel:
Feb 7 22:46:00 oceanic kernel: Call Trace:
Feb 7 22:46:00 oceanic kernel: <IRQ> [<ffffffff80250550>] softlockup_tick+0xdb/0xed
Feb 7 22:46:00 oceanic kernel: [<ffffffff8023256c>] update_process_times+0x42/0x68
Feb 7 22:46:00 oceanic kernel: [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55
Feb 7 22:46:00 oceanic kernel: [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68
Feb 7 22:46:00 oceanic kernel: [<ffffffff802338c8>] __group_send_sig_info+0x35/0x8b
Feb 7 22:46:00 oceanic kernel: [<ffffffff8022e0f1>] it_real_fn+0x0/0x4f
Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70
Feb 7 22:46:00 oceanic kernel: [<ffffffff802164da>] flat_send_IPI_mask+0x0/0x3d
Feb 7 22:46:00 oceanic kernel: [<ffffffff8023dc52>] hrtimer_run_queues+0x105/0x164
Feb 7 22:46:00 oceanic kernel: [<ffffffff80231d0f>] run_timer_softirq+0x21/0x19f
Feb 7 22:46:00 oceanic kernel: [<ffffffff8022eb49>] tasklet_action+0x53/0x9d
Feb 7 22:46:00 oceanic kernel: [<ffffffff8022ea87>] __do_softirq+0x55/0xc4
Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a8fc>] call_softirq+0x1c/0x28
Feb 7 22:46:00 oceanic kernel: [<ffffffff8020c3b1>] do_softirq+0x2c/0x7d
Feb 7 22:46:00 oceanic kernel: [<ffffffff8021431b>] smp_apic_timer_interrupt+0x56/0x68
Feb 7 22:46:00 oceanic kernel: [<ffffffff80208a55>] default_idle+0x0/0x3d
Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70
Feb 7 22:46:00 oceanic kernel: <EOI> [<ffffffff80208a7e>] default_idle+0x29/0x3d
Feb 7 22:46:00 oceanic kernel: [<ffffffff80208ae4>] cpu_idle+0x52/0x71
Feb 7 22:46:00 oceanic kernel: [<ffffffff8057fbbc>] start_secondary+0x465/0x474
--
ŁT
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
2007-02-07 23:02 2.6.20 BUG: soft lockup detected on CPU#0! Lukasz Trabinski
@ 2007-02-08 4:47 ` Andrew Morton
2007-02-08 8:06 ` Ingo Molnar
2007-02-08 15:51 ` Lukasz Trabinski
0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08 4:47 UTC (permalink / raw)
To: Lukasz Trabinski
Cc: linux-kernel, Bartlomiej Solarz-Niesluchowski, Ingo Molnar,
Thomas Gleixner
On Thu, 8 Feb 2007 00:02:10 +0100 (CET) Lukasz Trabinski <lukasz@wsisiz.edu.pl> wrote:
> Hello
>
> On 2.6.19 I had about 60 days uptime, on 2.6.20 2 days :(
>
Did the machine actually fail? Or did it just print these messages and
keep going?
>
> oceanic:~$ uname -a
> Linux oceanic.wsisiz.edu.pl 2.6.20-oceanic #2 SMP Sun Feb 4 21:55:29 CET
> 2007 x86_64 x86_64 x86_64 GNU/Linux
>
> Feb 7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#0!
> Feb 7 22:46:00 oceanic kernel:
> Feb 7 22:46:00 oceanic kernel: Call Trace:
> Feb 7 22:46:00 oceanic kernel: <IRQ> [<ffffffff80250550>] softlockup_tick+0xdb/0xed
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8022ea87>] __do_softirq+0x55/0xc4
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8023256c>] update_process_times+0x42/0x68
> Feb 7 22:46:00 oceanic kernel: [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55
> Feb 7 22:46:00 oceanic kernel: [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70
> Feb 7 22:46:00 oceanic kernel: <EOI> [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9
> Feb 7 22:46:00 oceanic kernel: [<ffffffff804393c3>] _spin_unlock_irqrestore+0x8/0x9
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8023dddd>] hrtimer_try_to_cancel+0x4a/0x53
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8023ddf2>] hrtimer_cancel+0xc/0x16
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8022cc66>] do_exit+0x1c8/0x800
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8022d31d>] sys_exit_group+0x0/0xe
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8020995c>] tracesys+0xdc/0xe1
> Feb 7 22:46:00 oceanic kernel:
> Feb 7 22:46:00 oceanic kernel: BUG: soft lockup detected on CPU#1!
> Feb 7 22:46:00 oceanic kernel:
> Feb 7 22:46:00 oceanic kernel: Call Trace:
> Feb 7 22:46:00 oceanic kernel: <IRQ> [<ffffffff80250550>] softlockup_tick+0xdb/0xed
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8023256c>] update_process_times+0x42/0x68
> Feb 7 22:46:00 oceanic kernel: [<ffffffff80213e36>] smp_local_timer_interrupt+0x34/0x55
> Feb 7 22:46:00 oceanic kernel: [<ffffffff80214316>] smp_apic_timer_interrupt+0x51/0x68
> Feb 7 22:46:00 oceanic kernel: [<ffffffff802338c8>] __group_send_sig_info+0x35/0x8b
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8022e0f1>] it_real_fn+0x0/0x4f
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70
> Feb 7 22:46:00 oceanic kernel: [<ffffffff802164da>] flat_send_IPI_mask+0x0/0x3d
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8023dc52>] hrtimer_run_queues+0x105/0x164
> Feb 7 22:46:00 oceanic kernel: [<ffffffff80231d0f>] run_timer_softirq+0x21/0x19f
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8022eb49>] tasklet_action+0x53/0x9d
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8022ea87>] __do_softirq+0x55/0xc4
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a8fc>] call_softirq+0x1c/0x28
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8020c3b1>] do_softirq+0x2c/0x7d
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8021431b>] smp_apic_timer_interrupt+0x56/0x68
> Feb 7 22:46:00 oceanic kernel: [<ffffffff80208a55>] default_idle+0x0/0x3d
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8020a3a6>] apic_timer_interrupt+0x66/0x70
> Feb 7 22:46:00 oceanic kernel: <EOI> [<ffffffff80208a7e>] default_idle+0x29/0x3d
> Feb 7 22:46:00 oceanic kernel: [<ffffffff80208ae4>] cpu_idle+0x52/0x71
> Feb 7 22:46:00 oceanic kernel: [<ffffffff8057fbbc>] start_secondary+0x465/0x474
>
The softlock detector has a long history of false positives and precious
few true positives, in my experience.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
2007-02-08 4:47 ` Andrew Morton
@ 2007-02-08 8:06 ` Ingo Molnar
2007-02-08 8:15 ` Andrew Morton
2007-02-08 9:16 ` Eric Dumazet
2007-02-08 15:51 ` Lukasz Trabinski
1 sibling, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2007-02-08 8:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Lukasz Trabinski, linux-kernel, Bartlomiej Solarz-Niesluchowski,
Thomas Gleixner
* Andrew Morton <akpm@linux-foundation.org> wrote:
> The softlock detector has a long history of false positives and
> precious few true positives, in my experience.
hm, not so the latest & lamest in my experience. The commit that made it
quite robust was 6687a97d4041f996f725902d2990e5de6ef5cbe5, as of March
2006, and first showed up in 2.6.17. (OTOH, since the merge of lockdep
the main source of soft lockups in the field has been quite severely
reduced. Nevertheless it's still good to have it around, occasionally
there happen other types of soft lockups too, in open-coded loops, etc.)
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
2007-02-08 8:06 ` Ingo Molnar
@ 2007-02-08 8:15 ` Andrew Morton
2007-02-08 9:16 ` Eric Dumazet
1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08 8:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: Lukasz Trabinski, linux-kernel, Bartlomiej Solarz-Niesluchowski,
Thomas Gleixner
On Thu, 8 Feb 2007 09:06:44 +0100 Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > The softlock detector has a long history of false positives and
> > precious few true positives, in my experience.
>
> hm, not so the latest & lamest in my experience. The commit that made it
> quite robust was 6687a97d4041f996f725902d2990e5de6ef5cbe5, as of March
> 2006, and first showed up in 2.6.17. (OTOH, since the merge of lockdep
> the main source of soft lockups in the field has been quite severely
> reduced. Nevertheless it's still good to have it around, occasionally
> there happen other types of soft lockups too, in open-coded loops, etc.)
>
So... what caused Lukasz's lockup?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
2007-02-08 8:06 ` Ingo Molnar
2007-02-08 8:15 ` Andrew Morton
@ 2007-02-08 9:16 ` Eric Dumazet
2007-02-08 9:56 ` Ingo Molnar
1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2007-02-08 9:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Lukasz Trabinski, linux-kernel,
Bartlomiej Solarz-Niesluchowski, Thomas Gleixner
On Thursday 08 February 2007 09:06, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> > The softlock detector has a long history of false positives and
> > precious few true positives, in my experience.
>
> hm, not so the latest & lamest in my experience. The commit that made it
> quite robust was 6687a97d4041f996f725902d2990e5de6ef5cbe5, as of March
> 2006, and first showed up in 2.6.17. (OTOH, since the merge of lockdep
> the main source of soft lockups in the field has been quite severely
> reduced. Nevertheless it's still good to have it around, occasionally
> there happen other types of soft lockups too, in open-coded loops, etc.)
This reminds me the current problem in close_files()
code, where we trigger soft lockup quite regularly.
Is there any chance/interest we can solve the issue Andrew had with this
patch ?
http://lkml.org/lkml/2006/5/2/273
Thank you
Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
2007-02-08 9:16 ` Eric Dumazet
@ 2007-02-08 9:56 ` Ingo Molnar
2007-02-08 10:12 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2007-02-08 9:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Andrew Morton, Lukasz Trabinski, linux-kernel,
Bartlomiej Solarz-Niesluchowski, Thomas Gleixner
* Eric Dumazet <dada1@cosmosbay.com> wrote:
> This reminds me the current problem in close_files()
> code, where we trigger soft lockup quite regularly.
>
> Is there any chance/interest we can solve the issue Andrew had with
> this patch ?
>
> http://lkml.org/lkml/2006/5/2/273
yes - the -rt patch included the patch below for more than 2 years.
(note that this one is even more finegrained)
Ingo
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -431,8 +433,10 @@ static void close_files(struct files_str
while (set) {
if (set & 1) {
struct file * file = xchg(&fdt->fd[i], NULL);
- if (file)
+ if (file) {
filp_close(file, files);
+ cond_resched();
+ }
}
i++;
set >>= 1;
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
2007-02-08 9:56 ` Ingo Molnar
@ 2007-02-08 10:12 ` Andrew Morton
0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2007-02-08 10:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric Dumazet, Lukasz Trabinski, linux-kernel,
Bartlomiej Solarz-Niesluchowski, Thomas Gleixner
On Thu, 8 Feb 2007 10:56:12 +0100 Ingo Molnar <mingo@elte.hu> wrote:
>
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> > This reminds me the current problem in close_files()
> > code, where we trigger soft lockup quite regularly.
> >
> > Is there any chance/interest we can solve the issue Andrew had with
> > this patch ?
> >
> > http://lkml.org/lkml/2006/5/2/273
>
> yes - the -rt patch included the patch below for more than 2 years.
> (note that this one is even more finegrained)
>
> Ingo
>
> Index: linux/kernel/exit.c
> ===================================================================
> --- linux.orig/kernel/exit.c
> +++ linux/kernel/exit.c
> @@ -431,8 +433,10 @@ static void close_files(struct files_str
> while (set) {
> if (set & 1) {
> struct file * file = xchg(&fdt->fd[i], NULL);
> - if (file)
> + if (file) {
> filp_close(file, files);
> + cond_resched();
> + }
> }
> i++;
> set >>= 1;
That doesn't hang like the other patch did on 2.6.17-rc3.
Very mysterious.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.20 BUG: soft lockup detected on CPU#0!
2007-02-08 4:47 ` Andrew Morton
2007-02-08 8:06 ` Ingo Molnar
@ 2007-02-08 15:51 ` Lukasz Trabinski
1 sibling, 0 replies; 8+ messages in thread
From: Lukasz Trabinski @ 2007-02-08 15:51 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, Bartlomiej Solarz-Niesluchowski, Ingo Molnar,
Thomas Gleixner
On Wed, 7 Feb 2007, Andrew Morton wrote:
> On Thu, 8 Feb 2007 00:02:10 +0100 (CET) Lukasz Trabinski <lukasz@wsisiz.edu.pl> wrote:
>
>> Hello
>>
>> On 2.6.19 I had about 60 days uptime, on 2.6.20 2 days :(
>>
>
> Did the machine actually fail? Or did it just print these messages and
> keep going?
Was message from tg3 network driver, but i don't remember it. :(
Machine was totaly locked, only ctrl sysrq boot...
--
ŁT
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-02-08 15:51 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-02-07 23:02 2.6.20 BUG: soft lockup detected on CPU#0! Lukasz Trabinski
2007-02-08 4:47 ` Andrew Morton
2007-02-08 8:06 ` Ingo Molnar
2007-02-08 8:15 ` Andrew Morton
2007-02-08 9:16 ` Eric Dumazet
2007-02-08 9:56 ` Ingo Molnar
2007-02-08 10:12 ` Andrew Morton
2007-02-08 15:51 ` Lukasz Trabinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox