* RE: kernel BUG at kernel/timer.c:370!
@ 2004-03-31 17:16 Craig, Dave
2004-03-31 19:52 ` Andrew Morton
2004-04-01 14:24 ` Flavio Bruno Leitner
0 siblings, 2 replies; 17+ messages in thread
From: Craig, Dave @ 2004-03-31 17:16 UTC (permalink / raw)
To: Andrew Morton, Rafael D'Halleweyn (List); +Cc: linux-kernel
cascade: c1a1d5e0 != c1a0d5e0
hander=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
Call Trace:
[<c012ca73>] cascade+0x79/0xa1
[<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
[<c012d0b3>] run_timer_softirq+0x159/0x1c9
[<c012899d>] do_softirq+0xc9/0xcb
[<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
[<c0108c09>] default_idle+0x0/0x32
[<c010bab2>] apic_timer_interrupt+0x1a/0x20
[<c0108c09>] default_idle+0x0/0x32
[<c0108c36>] default_idle+0x2d/0x32
[<c0108cb4>] cpu_idle+0x3a/0x43
[<c0105000>] rest_init+0x0/0x68
[<c039c89f>] start_kernel+0x1b7/0x209
[<c039c427>] unknown_bootoption+0x0/0x124
Here is the result. I am doing a lot of IPv4 multicast.
Dave
-----Original Message-----
From: Craig, Dave
Sent: Wednesday, March 31, 2004 9:00 AM
To: 'Andrew Morton'; Rafael D'Halleweyn (List)
Cc: linux-kernel@vger.kernel.org
Subject: RE: kernel BUG at kernel/timer.c:370!
I just observed this failure on two separate systems this morning. I
added the patch in the hopes that it will provide some useful
information.
Dave Craig
QUALCOMM Incorporated
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andrew Morton
Sent: Saturday, February 14, 2004 12:22 AM
To: Rafael D'Halleweyn (List)
Cc: linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/timer.c:370!
"Rafael D'Halleweyn (List)" <list@noduck.net> wrote:
>
> I sometimes get the following BUG (transcribed from a digital camera
> snapshot, so it might contain errors). I did not copy the stack trace,
> let me know if you want it.
>
> kernel BUG at kernel/timer.c:370!
> invalid operand: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c01284f8>] Not tainted
> EFLAGS: 00010003
> EIP is at cascade+0x50/0x70
> eax: d0a77724 ebx: d0a77724 ecx: c04aaa28 edx: 0000001c
> esi: c04aab08 edi: c04aa220 ebp: 0000001c esp: c0457e9e
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=c0456000 task=c03d2de0)
> Stack: ...
> Call Trace:
> [<c01289e4>] update_process_times+0x44/0x50
> [<c0128b3f>] run_timer_softirq+0x12f/0x1c0
> [<c0124695>] do_softirq+0x95/0xa0
> [<c010d2fb>] do_IRQ+0xfb/0x130
> [<c010b5e8>] common_interrupt+0x18/0x20
This could be a hardware problem. Or it could be a bug basically anywhere
in the kernel.
Are you using CONFIG_DEBUG_SLAB?
Could you please apply the below patch, wait for the problem to reoccur,
then let us know?
diff -puN kernel/timer.c~a kernel/timer.c
--- 25/kernel/timer.c~a 2004-02-14 00:14:46.000000000 -0800
+++ 25-akpm/kernel/timer.c 2004-02-14 00:20:09.000000000 -0800
@@ -31,6 +31,7 @@
#include <linux/time.h>
#include <linux/jiffies.h>
#include <linux/cpu.h>
+#include <linux/kallsyms.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
@@ -367,7 +368,15 @@ static int cascade(tvec_base_t *base, tv
struct timer_list *tmp;
tmp = list_entry(curr, struct timer_list, entry);
- BUG_ON(tmp->base != base);
+ if (tmp->base != base) {
+ printk("%s: %p != %p\n",
+ __FUNCTION__, tmp->base, base);
+ printk("handler=%p", tmp->function);
+ print_symbol(" (%s)", (unsigned long)tmp->function);
+ printk("\n");
+ dump_stack();
+ tmp->base = base;
+ }
curr = curr->next;
internal_add_timer(base, tmp);
}
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-03-31 17:16 kernel BUG at kernel/timer.c:370! Craig, Dave
@ 2004-03-31 19:52 ` Andrew Morton
2004-04-01 14:24 ` Flavio Bruno Leitner
1 sibling, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2004-03-31 19:52 UTC (permalink / raw)
To: Craig, Dave; +Cc: list, linux-kernel
"Craig, Dave" <dwcraig@qualcomm.com> wrote:
>
> cascade: c1a1d5e0 != c1a0d5e0
> hander=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
> Call Trace:
> [<c012ca73>] cascade+0x79/0xa1
> [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
> [<c012d0b3>] run_timer_softirq+0x159/0x1c9
> [<c012899d>] do_softirq+0xc9/0xcb
> [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
> [<c0108c09>] default_idle+0x0/0x32
> [<c010bab2>] apic_timer_interrupt+0x1a/0x20
> [<c0108c09>] default_idle+0x0/0x32
> [<c0108c36>] default_idle+0x2d/0x32
> [<c0108cb4>] cpu_idle+0x3a/0x43
> [<c0105000>] rest_init+0x0/0x68
> [<c039c89f>] start_kernel+0x1b7/0x209
> [<c039c427>] unknown_bootoption+0x0/0x124
>
> Here is the result. I am doing a lot of IPv4 multicast.
There's only a single bit difference between the expected and actual
timer->base value. So either your machine has flakey memory or the percpu
data area happened to be separated by 64k.
Is the machine SMP? If so can you please run
nm vmlinux | grep __per_cpu
and send the output?
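
The single-bit observation above can be checked mechanically. The following is a minimal C sketch, not kernel code: the helper names differs_by_one_bit() and lowest_set_bit() are made up for illustration. It XORs the two pointer values printed by the debug patch and tests whether exactly one bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if a and b differ in exactly one bit position, else 0. */
int differs_by_one_bit(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;

    /* A value with exactly one bit set is a nonzero power of two. */
    return diff != 0 && (diff & (diff - 1)) == 0;
}

/* Return the index (0 = least significant) of the lowest set bit of v. */
int lowest_set_bit(uint32_t v)
{
    int bit = 0;

    assert(v != 0); /* undefined for zero input */
    while ((v & 1u) == 0) {
        v >>= 1;
        bit++;
    }
    return bit;
}
```

For the values in the report, 0xc1a1d5e0 ^ 0xc1a0d5e0 is 0x00010000: only bit 16 differs, which is the 64k separation mentioned above.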
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-03-31 17:16 kernel BUG at kernel/timer.c:370! Craig, Dave
2004-03-31 19:52 ` Andrew Morton
@ 2004-04-01 14:24 ` Flavio Bruno Leitner
2004-04-01 17:24 ` Flavio Bruno Leitner
1 sibling, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-04-01 14:24 UTC (permalink / raw)
To: Craig, Dave; +Cc: Andrew Morton, Rafael D'Halleweyn (List), linux-kernel
On Wed, Mar 31, 2004 at 09:16:52AM -0800, Craig, Dave wrote:
> cascade: c1a1d5e0 != c1a0d5e0
> hander=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
> Call Trace:
> [<c012ca73>] cascade+0x79/0xa1
> [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
> [<c012d0b3>] run_timer_softirq+0x159/0x1c9
> [<c012899d>] do_softirq+0xc9/0xcb
> [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
> [<c0108c09>] default_idle+0x0/0x32
> [<c010bab2>] apic_timer_interrupt+0x1a/0x20
> [<c0108c09>] default_idle+0x0/0x32
> [<c0108c36>] default_idle+0x2d/0x32
> [<c0108cb4>] cpu_idle+0x3a/0x43
> [<c0105000>] rest_init+0x0/0x68
> [<c039c89f>] start_kernel+0x1b7/0x209
> [<c039c427>] unknown_bootoption+0x0/0x124
>
> Here is the result. I am doing a lot of IPv4 multicast.
Applied the patch, here is the result.
cascade: c040b170 != c040ab00
handler=c040b168 (0xc040b168)
Call Trace:
[<c012741f>] cascade+0x7f/0xb0
[<c0127a3e>] run_timer_softirq+0xee/0x170
[<c0123b15>] do_softirq+0xa5/0xb0
[<c010b625>] do_IRQ+0xe5/0x120
[<c0109a94>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c03b0829>] start_kernel+0x189/0x1e0
[<c03b0540>] unknown_bootoption+0x0/0x120
cascade: c040ab20 != c040ab00
handler=c040ab18 (0xc040ab18)
Call Trace:
[<c012741f>] cascade+0x7f/0xb0
[<c0127a3e>] run_timer_softirq+0xee/0x170
[<c0123b15>] do_softirq+0xa5/0xb0
[<c010b625>] do_IRQ+0xe5/0x120
[<c0109a94>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c03b0829>] start_kernel+0x189/0x1e0
[<c03b0540>] unknown_bootoption+0x0/0x120
--
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-04-01 14:24 ` Flavio Bruno Leitner
@ 2004-04-01 17:24 ` Flavio Bruno Leitner
2004-04-01 18:37 ` Andrew Morton
0 siblings, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-04-01 17:24 UTC (permalink / raw)
To: Craig, Dave; +Cc: Andrew Morton, Rafael D'Halleweyn (List), linux-kernel
Another output with all debug options enabled.
cascade: c03b3128 != c03b28c0
kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
handler=c03b3120 (0xc03b3120)
Call Trace:
[<c01347ef>] cascade+0x7f/0xb0
[<c0135025>] run_timer_softirq+0x315/0x3f0
[<c012fa35>] do_softirq+0xa5/0xb0
[<c010caea>] do_IRQ+0x21a/0x360
[<c012b5bf>] profile_hook+0x1f/0x23
[<c010a934>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c0434829>] start_kernel+0x189/0x1e0
[<c0434540>] unknown_bootoption+0x0/0x120
cascade: c03b2f88 != c03b28c0
handler=c03b2f80 (0xc03b2f80)
Call Trace:
[<c01347ef>] cascade+0x7f/0xb0
[<c0135025>] run_timer_softirq+0x315/0x3f0
[<c012fa35>] do_softirq+0xa5/0xb0
[<c010caea>] do_IRQ+0x21a/0x360
[<c012b5bf>] profile_hook+0x1f/0x23
[<c010a934>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c0434829>] start_kernel+0x189/0x1e0
[<c0434540>] unknown_bootoption+0x0/0x120
cascade: c03b2910 != c03b28c0
handler=c03b2908 (0xc03b2908)
Call Trace:
[<c01347ef>] cascade+0x7f/0xb0
[<c0135025>] run_timer_softirq+0x315/0x3f0
[<c012fa35>] do_softirq+0xa5/0xb0
[<c010caea>] do_IRQ+0x21a/0x360
[<c012b5bf>] profile_hook+0x1f/0x23
[<c010a934>] common_interrupt+0x18/0x20
[<c0107066>] default_idle+0x26/0x40
[<c01070f4>] cpu_idle+0x34/0x40
[<c0434829>] start_kernel+0x189/0x1e0
[<c0434540>] unknown_bootoption+0x0/0x120
--
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-04-01 17:24 ` Flavio Bruno Leitner
@ 2004-04-01 18:37 ` Andrew Morton
2004-04-02 14:42 ` Flavio Bruno Leitner
0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-04-01 18:37 UTC (permalink / raw)
To: Flavio Bruno Leitner; +Cc: dwcraig, list, linux-kernel
Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
>
> cascade: c03b3128 != c03b28c0
> kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
> handler=c03b3120 (0xc03b3120)
> Call Trace:
> [<c01347ef>] cascade+0x7f/0xb0
> [<c0135025>] run_timer_softirq+0x315/0x3f0
> [<c012fa35>] do_softirq+0xa5/0xb0
> [<c010caea>] do_IRQ+0x21a/0x360
> [<c012b5bf>] profile_hook+0x1f/0x23
> [<c010a934>] common_interrupt+0x18/0x20
> [<c0107066>] default_idle+0x26/0x40
> [<c01070f4>] cpu_idle+0x34/0x40
> [<c0434829>] start_kernel+0x189/0x1e0
> [<c0434540>] unknown_bootoption+0x0/0x120
Is the machine SMP?
What was the machine doing at the time?
Can you have a look in System.map, see if you can work out what's at
0xc03b3120?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-04-01 18:37 ` Andrew Morton
@ 2004-04-02 14:42 ` Flavio Bruno Leitner
0 siblings, 0 replies; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-04-02 14:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: dwcraig, list, linux-kernel
On Thu, Apr 01, 2004 at 10:37:18AM -0800, Andrew Morton wrote:
> Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
> >
> > cascade: c03b3128 != c03b28c0
> > kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
> > handler=c03b3120 (0xc03b3120)
> > Call Trace:
> > [<c01347ef>] cascade+0x7f/0xb0
> > [<c0135025>] run_timer_softirq+0x315/0x3f0
> > [<c012fa35>] do_softirq+0xa5/0xb0
> > [<c010caea>] do_IRQ+0x21a/0x360
> > [<c012b5bf>] profile_hook+0x1f/0x23
> > [<c010a934>] common_interrupt+0x18/0x20
> > [<c0107066>] default_idle+0x26/0x40
> > [<c01070f4>] cpu_idle+0x34/0x40
> > [<c0434829>] start_kernel+0x189/0x1e0
> > [<c0434540>] unknown_bootoption+0x0/0x120
>
> Is the machine SMP?
No, it's a simple Pentium II .
> What was the machine doing at the time?
I was running processes such as postfix, pump and ntpd. After you asked this
question, I tried to reproduce it in runlevel 1 (single), but so far I can't.
The next step will be to disable services one by one until I can no longer
reproduce it.
>
> Can you have a look in System.map, see if you can work out what's at
> 0xc03b3120?
c03b3128 => Not found in System.map
c03b28c0 => per_cpu__tvec_bases
c03b3120 => Not found in System.map
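
An exact-match search of System.map fails for any address that lies inside an object rather than at its start. A hedged C sketch of a nearest-preceding-symbol lookup follows; struct sym, nearest_symbol() and offset_from() are hypothetical names for illustration, and the only map entry known from this thread is per_cpu__tvec_bases at c03b28c0, so a full System.map could yield a closer symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One System.map-style entry: start address and symbol name. */
struct sym {
    uint32_t addr;
    const char *name;
};

/*
 * Return the entry with the highest address that is <= addr,
 * or NULL if every entry starts above addr. The table does not
 * need to be sorted; this is a plain linear scan.
 */
const struct sym *nearest_symbol(const struct sym *tab, size_t n, uint32_t addr)
{
    const struct sym *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tab[i].addr <= addr &&
            (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    }
    return best;
}

/* Byte offset of addr from the start of symbol s. */
uint32_t offset_from(const struct sym *s, uint32_t addr)
{
    assert(s != NULL && s->addr <= addr);
    return addr - s->addr;
}
```

With only the entry quoted above, c03b3120 resolves to per_cpu__tvec_bases+0x860 and c03b3128 to per_cpu__tvec_bases+0x868. Whether those offsets fall inside the object depends on its size, which is not given in this thread.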
--
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: kernel BUG at kernel/timer.c:370!
@ 2004-04-01 19:05 Craig, Dave
0 siblings, 0 replies; 17+ messages in thread
From: Craig, Dave @ 2004-04-01 19:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: list, linux-kernel
It could be hardware, but it would be hardware negatively interacting
with the kernel preemption feature. The failure does not occur when
that feature is disabled.
Dave
-----Original Message-----
From: Andrew Morton [mailto:akpm@osdl.org]
Sent: Wednesday, March 31, 2004 2:16 PM
To: Craig, Dave
Cc: list@noduck.net; linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/timer.c:370!
"Craig, Dave" <dwcraig@qualcomm.com> wrote:
>
> Sure thing.
>
> 7ecb001b A __crc___per_cpu_offset
> c033a510 r __kcrctab___per_cpu_offset
> c033c462 r __kstrtab___per_cpu_offset
> c03366c4 r __ksymtab___per_cpu_offset
> c040bd90 A __per_cpu_end
> c040c020 B __per_cpu_offset
> c04090a0 A __per_cpu_start
>
> It is a dual processor and the processors are hyperthreaded.
OK. We're consistently seeing a single-bit difference and there's no
simple power-of-two stride in the things which that pointer points at.
Most likely you have a hardware problem.
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: kernel BUG at kernel/timer.c:370!
@ 2004-03-31 21:39 Craig, Dave
2004-03-31 22:15 ` Andrew Morton
0 siblings, 1 reply; 17+ messages in thread
From: Craig, Dave @ 2004-03-31 21:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: list, linux-kernel
Sure thing.
7ecb001b A __crc___per_cpu_offset
c033a510 r __kcrctab___per_cpu_offset
c033c462 r __kstrtab___per_cpu_offset
c03366c4 r __ksymtab___per_cpu_offset
c040bd90 A __per_cpu_end
c040c020 B __per_cpu_offset
c04090a0 A __per_cpu_start
It is a dual processor and the processors are hyperthreaded.
Dave
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andrew Morton
Sent: Wednesday, March 31, 2004 11:52 AM
To: Craig, Dave
Cc: list@noduck.net; linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/timer.c:370!
"Craig, Dave" <dwcraig@qualcomm.com> wrote:
>
> cascade: c1a1d5e0 != c1a0d5e0
> hander=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
> Call Trace:
> [<c012ca73>] cascade+0x79/0xa1
> [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
> [<c012d0b3>] run_timer_softirq+0x159/0x1c9
> [<c012899d>] do_softirq+0xc9/0xcb
> [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
> [<c0108c09>] default_idle+0x0/0x32
> [<c010bab2>] apic_timer_interrupt+0x1a/0x20
> [<c0108c09>] default_idle+0x0/0x32
> [<c0108c36>] default_idle+0x2d/0x32
> [<c0108cb4>] cpu_idle+0x3a/0x43
> [<c0105000>] rest_init+0x0/0x68
> [<c039c89f>] start_kernel+0x1b7/0x209
> [<c039c427>] unknown_bootoption+0x0/0x124
>
> Here is the result. I am doing a lot of IPv4 multicast.
There's only a single bit difference between the expected and actual
timer->base value. So either your machine has flakey memory or the percpu
data area happened to be separated by 64k.
Is the machine SMP? If so can you please run
nm vmlinux | grep __per_cpu
and send the output?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-03-31 21:39 Craig, Dave
@ 2004-03-31 22:15 ` Andrew Morton
0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2004-03-31 22:15 UTC (permalink / raw)
To: Craig, Dave; +Cc: list, linux-kernel
"Craig, Dave" <dwcraig@qualcomm.com> wrote:
>
> Sure thing.
>
> 7ecb001b A __crc___per_cpu_offset
> c033a510 r __kcrctab___per_cpu_offset
> c033c462 r __kstrtab___per_cpu_offset
> c03366c4 r __ksymtab___per_cpu_offset
> c040bd90 A __per_cpu_end
> c040c020 B __per_cpu_offset
> c04090a0 A __per_cpu_start
>
> It is a dual processor and the processors are hyperthreaded.
OK. We're consistently seeing a single-bit difference and there's no
simple power-of-two stride in the things which that pointer points at.
Most likely you have a hardware problem.
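
The arithmetic behind that reading of the nm output can be sketched in C. This is only a sanity check under stated assumptions: region_size() and is_power_of_two() are illustrative helper names, and how the kernel actually spaces the per-CPU copies at runtime is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the half-open region [start, end). */
uint32_t region_size(uint32_t start, uint32_t end)
{
    assert(end >= start);
    return end - start;
}

/* Return 1 if v is a nonzero power of two, else 0. */
int is_power_of_two(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

Using the quoted symbols, __per_cpu_end - __per_cpu_start is 0xc040bd90 - 0xc04090a0 = 0x2cf0 (11504 bytes), which is not a power of two, whereas the observed pointer difference of 0x10000 is.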
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: kernel BUG at kernel/timer.c:370!
@ 2004-03-31 16:59 Craig, Dave
0 siblings, 0 replies; 17+ messages in thread
From: Craig, Dave @ 2004-03-31 16:59 UTC (permalink / raw)
To: Andrew Morton, Rafael D'Halleweyn (List); +Cc: linux-kernel
I just observed this failure on two separate systems this morning. I
added the patch in the hopes that it will provide some useful
information.
Dave Craig
QUALCOMM Incorporated
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andrew Morton
Sent: Saturday, February 14, 2004 12:22 AM
To: Rafael D'Halleweyn (List)
Cc: linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/timer.c:370!
"Rafael D'Halleweyn (List)" <list@noduck.net> wrote:
>
> I sometimes get the following BUG (transcribed from a digital camera
> snapshot, so it might contain errors). I did not copy the stack trace,
> let me know if you want it.
>
> kernel BUG at kernel/timer.c:370!
> invalid operand: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c01284f8>] Not tainted
> EFLAGS: 00010003
> EIP is at cascade+0x50/0x70
> eax: d0a77724 ebx: d0a77724 ecx: c04aaa28 edx: 0000001c
> esi: c04aab08 edi: c04aa220 ebp: 0000001c esp: c0457e9e
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=c0456000 task=c03d2de0)
> Stack: ...
> Call Trace:
> [<c01289e4>] update_process_times+0x44/0x50
> [<c0128b3f>] run_timer_softirq+0x12f/0x1c0
> [<c0124695>] do_softirq+0x95/0xa0
> [<c010d2fb>] do_IRQ+0xfb/0x130
> [<c010b5e8>] common_interrupt+0x18/0x20
This could be a hardware problem. Or it could be a bug basically anywhere
in the kernel.
Are you using CONFIG_DEBUG_SLAB?
Could you please apply the below patch, wait for the problem to reoccur,
then let us know?
diff -puN kernel/timer.c~a kernel/timer.c
--- 25/kernel/timer.c~a 2004-02-14 00:14:46.000000000 -0800
+++ 25-akpm/kernel/timer.c 2004-02-14 00:20:09.000000000 -0800
@@ -31,6 +31,7 @@
#include <linux/time.h>
#include <linux/jiffies.h>
#include <linux/cpu.h>
+#include <linux/kallsyms.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
@@ -367,7 +368,15 @@ static int cascade(tvec_base_t *base, tv
struct timer_list *tmp;
tmp = list_entry(curr, struct timer_list, entry);
- BUG_ON(tmp->base != base);
+ if (tmp->base != base) {
+ printk("%s: %p != %p\n",
+ __FUNCTION__, tmp->base, base);
+ printk("handler=%p", tmp->function);
+ print_symbol(" (%s)", (unsigned long)tmp->function);
+ printk("\n");
+ dump_stack();
+ tmp->base = base;
+ }
curr = curr->next;
internal_add_timer(base, tmp);
}
_
^ permalink raw reply [flat|nested] 17+ messages in thread
* kernel BUG at kernel/timer.c:370!
@ 2004-03-05 17:40 Flavio Bruno Leitner
2004-03-05 23:06 ` Andrew Morton
0 siblings, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-03-05 17:40 UTC (permalink / raw)
To: linux-kernel
Hello!
My laptop is an Acer TravelMate 630, and somewhere between 2.6.2 and 2.6.3-rc2
it began producing an oops right after boot.
kernel BUG at kernel/timer.c:370!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c0127177>] Not tainted
EFLAGS: 00010006
EIP is at cascade+0x44/0x4e
eax: c03e4368 ebx: c03e02b0 ecx: fffce200 edx: c03e03b0
esi: c03e0398 edi: c03dfa80 ebp: c0387f08 esp: c0387ef4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0386000 task=c0306520)
Stack: c03dfa80 cde229c4 00000000 c03df7a8 c0387f20 c0387f38 c0127732 c03dfa80
c03e0288 00000022 c0387f34 c0387f20 c0387f20 c0308d64 00000001 c03df7a8
0000000a c0387f54 c0123b7c c03df7a8 00000046 00000000 c037da00 c0308d64
Call Trace:
[<c0127732>] run_timer_softirq+0xec/0x16b
[<c0123b7c>] do_softirq+0x98/0x9a
[<c010d2ff>] do_IRQ+0xe4/0x11c
[<c010b974>] common_interrupt+0x18/0x20
[<d08c8257>] acpi_processor_idle+0xe9/0x1e5 [processor]
[<c0105000>] _stext+0x0/0x2a
[<c01090b7>] cpu_idle+0x2f/0x38
[<c038c70a>] start_kernel+0x185/0x1c9
[<c038c44a>] unknow_bootoption+0x0/0x108
Code: 0f 0b 72 01 3b 05 2d c0 eb d4 55 89 e5 56 53 83 ec 04 0f bf
Here is the function:
static int cascade(tvec_base_t *base, tvec_t *tv, int index)
{
/* cascade all the timers from tv up one level */
struct list_head *head, *curr;
head = tv->vec + index;
curr = head->next;
/*
* We are removing _all_ timers from the list, so we don't have to
* detach them individually, just clear the list afterwards.
*/
while (curr != head) {
struct timer_list *tmp;
tmp = list_entry(curr, struct timer_list, entry);
BUG_ON(tmp->base != base);
curr = curr->next;
internal_add_timer(base, tmp);
}
INIT_LIST_HEAD(head);
return index;
}
Any ideas about this one?
Thanks!
--
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-03-05 17:40 Flavio Bruno Leitner
@ 2004-03-05 23:06 ` Andrew Morton
2004-03-11 15:43 ` Flavio Bruno Leitner
0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-03-05 23:06 UTC (permalink / raw)
To: Flavio Bruno Leitner; +Cc: linux-kernel
Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
>
> My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2
> begins returning an oops right after boot.
>
> kernel BUG at kernel/timer.c:370!
Oh fantastic. Something scrogged the timer lists.
I suggest you try stripping your kernel config down to the bare minimum
which is needed to boot, see if that fixes it and if so, start
reintroducing things until you've worked out which driver is causing the
problem.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-03-05 23:06 ` Andrew Morton
@ 2004-03-11 15:43 ` Flavio Bruno Leitner
2004-03-11 21:42 ` Andrew Morton
0 siblings, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-03-11 15:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 981 bytes --]
On Fri, Mar 05, 2004 at 03:06:15PM -0800, Andrew Morton wrote:
> Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
> >
> > My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2
> > begins returning an oops right after boot.
> >
> > kernel BUG at kernel/timer.c:370!
>
> Oh fantastic. Something scrogged the timer lists.
>
> I suggest you try stripping your kernel config down to the bare minimum
> which is needed to boot, see if that fixes it and if so, start
> reintroducing things until you've worked out which driver is causing the
> problem.
Done!
The oops happens when the attached patch is applied: just run ifconfig eth0
down and then ifconfig eth0 <with another ip> up. DHCP always gets the wrong
IP, so my rc.local runs ifconfig down and up. With the patch removed, I can't
reproduce it anymore.
This oops still happens with newer kernels.
Thanks!
--
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
[-- Attachment #2: ifdown-up-oops.patch --]
[-- Type: text/plain, Size: 2341 bytes --]
diff -Nru a/include/linux/inetdevice.h b/include/linux/inetdevice.h
--- a/include/linux/inetdevice.h Fri Apr 11 03:35:44 2003
+++ b/include/linux/inetdevice.h Thu Jan 29 20:57:46 2004
@@ -21,6 +21,7 @@
int medium_id;
int no_xfrm;
int no_policy;
+ int force_igmp_version;
void *sysctl;
};
diff -Nru a/net/ipv4/igmp.c b/net/ipv4/igmp.c
--- a/net/ipv4/igmp.c Sat Jan 24 15:54:51 2004
+++ b/net/ipv4/igmp.c Mon Feb 2 21:43:31 2004
@@ -126,10 +126,14 @@
* contradict to specs provided this delay is small enough.
*/
-#define IGMP_V1_SEEN(in_dev) ((in_dev)->mr_v1_seen && \
- time_before(jiffies, (in_dev)->mr_v1_seen))
-#define IGMP_V2_SEEN(in_dev) ((in_dev)->mr_v2_seen && \
- time_before(jiffies, (in_dev)->mr_v2_seen))
+#define IGMP_V1_SEEN(in_dev) (ipv4_devconf.force_igmp_version == 1 || \
+ (in_dev)->cnf.force_igmp_version == 1 || \
+ ((in_dev)->mr_v1_seen && \
+ time_before(jiffies, (in_dev)->mr_v1_seen)))
+#define IGMP_V2_SEEN(in_dev) (ipv4_devconf.force_igmp_version == 2 || \
+ (in_dev)->cnf.force_igmp_version == 2 || \
+ ((in_dev)->mr_v2_seen && \
+ time_before(jiffies, (in_dev)->mr_v2_seen)))
static void igmpv3_add_delrec(struct in_device *in_dev, struct ip_mc_list *im);
static void igmpv3_del_delrec(struct in_device *in_dev, __u32 multiaddr);
@@ -1063,7 +1067,7 @@
reporter = im->reporter;
igmp_stop_timer(im);
- if (in_dev->dev->flags & IFF_UP) {
+ if (!in_dev->dead) {
if (IGMP_V1_SEEN(in_dev))
goto done;
if (IGMP_V2_SEEN(in_dev)) {
@@ -1094,6 +1098,8 @@
if (im->multiaddr == IGMP_ALL_HOSTS)
return;
+ if (in_dev->dead)
+ return;
if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) {
spin_lock_bh(&im->lock);
igmp_start_timer(im, IGMP_Initial_Report_Delay);
@@ -1167,7 +1173,7 @@
igmpv3_del_delrec(in_dev, im->multiaddr);
#endif
igmp_group_added(im);
- if (in_dev->dev->flags & IFF_UP)
+ if (!in_dev->dead)
ip_rt_multicast_event(in_dev);
out:
return;
@@ -1191,7 +1197,7 @@
write_unlock_bh(&in_dev->lock);
igmp_group_dropped(i);
- if (in_dev->dev->flags & IFF_UP)
+ if (!in_dev->dead)
ip_rt_multicast_event(in_dev);
ip_ma_put(i);
@@ -1266,6 +1272,9 @@
struct ip_mc_list *i;
ASSERT_RTNL();
+
+ /* Deactivate timers */
+ ip_mc_down(in_dev);
write_lock_bh(&in_dev->lock);
while ((i = in_dev->mc_list) != NULL) {
[-- Attachment #3: oops.txt --]
[-- Type: text/plain, Size: 1161 bytes --]
Hello!
My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2
begins returning an oops right after boot.
kernel BUG at kernel/timer.c:370!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c0127177>] Not tainted
EFLAGS: 00010006
EIP is at cascade+0x44/0x4e
eax: c03e4368 ebx: c03e02b0 ecx: fffce200 edx: c03e03b0
esi: c03e0398 edi: c03dfa80 ebp: c0387f08 esp: c0387ef4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0386000 task=c0306520)
Stack: c03dfa80 cde229c4 00000000 c03df7a8 c0387f20 c0387f38 c0127732 c03dfa80
c03e0288 00000022 c0387f34 c0387f20 c0387f20 c0308d64 00000001 c03df7a8
0000000a c0387f54 c0123b7c c03df7a8 00000046 00000000 c037da00 c0308d64
Call Trace:
[<c0127732>] run_timer_softirq+0xec/0x16b
[<c0123b7c>] do_softirq+0x98/0x9a
[<c010d2ff>] do_IRQ+0xe4/0x11c
[<c010b974>] common_interrupt+0x18/0x20
[<d08c8257>] acpi_processor_idle+0xe9/0x1e5 [processor]
[<c0105000>] _stext+0x0/0x2a
[<c01090b7>] cpu_idle+0x2f/0x38
[<c038c70a>] start_kernel+0x185/0x1c9
[<c038c44a>] unknow_bootoption+0x0/0x108
Code: 0f 0b 72 01 3b 05 2d c0 eb d4 55 89 e5 56 53 83 ec 04 0f bf
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-03-11 15:43 ` Flavio Bruno Leitner
@ 2004-03-11 21:42 ` Andrew Morton
2004-03-12 19:11 ` Flavio Bruno Leitner
0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-03-11 21:42 UTC (permalink / raw)
To: Flavio Bruno Leitner; +Cc: linux-kernel
Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
>
> On Fri, Mar 05, 2004 at 03:06:15PM -0800, Andrew Morton wrote:
> > Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
> > >
> > > My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2
> > > begins returning an oops right after boot.
> > >
> > > kernel BUG at kernel/timer.c:370!
> >
> > Oh fantastic. Something scrogged the timer lists.
> >
> > > I suggest you try stripping your kernel config down to the bare minimum
> > which is needed to boot, see if that fixes it and if so, start
> > reintroducing things until you've worked out which driver is causing the
> > problem.
>
> Done!
>
> The oops happens when the patch is applied, just do ifconfig eth0 down
> and ifconfig eth0 <with another ip> up. The dhcp always get wrong ip,
> so my rc.local run ifconfig down and up. Removing the patch, I can't
> reproduce it anymore.
>
Thanks for working that out. Maybe we need to terminate those sysctl
tables. Does this fix it?
---
25-akpm/net/ipv4/devinet.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff -puN net/ipv4/devinet.c~devinet-ctl_table-fix net/ipv4/devinet.c
--- 25/net/ipv4/devinet.c~devinet-ctl_table-fix Thu Mar 11 13:40:38 2004
+++ 25-akpm/net/ipv4/devinet.c Thu Mar 11 13:40:53 2004
@@ -1210,11 +1210,11 @@ int ipv4_doint_and_flush_strategy(ctl_ta
static struct devinet_sysctl_table {
struct ctl_table_header *sysctl_header;
- ctl_table devinet_vars[20];
- ctl_table devinet_dev[2];
- ctl_table devinet_conf_dir[2];
- ctl_table devinet_proto_dir[2];
- ctl_table devinet_root_dir[2];
+ ctl_table devinet_vars[21];
+ ctl_table devinet_dev[3];
+ ctl_table devinet_conf_dir[3];
+ ctl_table devinet_proto_dir[3];
+ ctl_table devinet_root_dir[3];
} devinet_sysctl = {
.devinet_vars = {
{
@@ -1372,6 +1372,7 @@ static struct devinet_sysctl_table {
.proc_handler = &ipv4_doint_and_flush,
.strategy = &ipv4_doint_and_flush_strategy,
},
+ { .ctl_name = 0 }
},
.devinet_dev = {
{
@@ -1380,6 +1381,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_vars,
},
+ { .ctl_name = 0 }
},
.devinet_conf_dir = {
{
@@ -1388,6 +1390,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_dev,
},
+ { .ctl_name = 0 }
},
.devinet_proto_dir = {
{
@@ -1396,6 +1399,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_conf_dir,
},
+ { .ctl_name = 0 }
},
.devinet_root_dir = {
{
@@ -1404,6 +1408,7 @@ static struct devinet_sysctl_table {
.mode = 0555,
.child = devinet_sysctl.devinet_proto_dir,
},
+ { .ctl_name = 0 }
},
};
_
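
Why the terminator matters can be illustrated with a simplified userspace sketch. struct entry and count_entries() below are hypothetical stand-ins, not the kernel's ctl_table or its walker; the assumption is only that a table walker stops at the first entry whose ctl_name is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one sysctl table entry. */
struct entry {
    int ctl_name;          /* 0 marks the end of the table */
    const char *procname;
};

/*
 * Count entries up to (not including) the zero sentinel.
 * The caller must guarantee that a sentinel exists; otherwise
 * the loop reads past the end of the array.
 */
size_t count_entries(const struct entry *tab)
{
    size_t n = 0;

    assert(tab != NULL);
    while (tab[n].ctl_name != 0)
        n++;
    return n;
}
```

In C, an array declared with more slots than initializers has its remaining elements zero-filled, so an array of size N+1 holding N initialized entries always ends in a sentinel. An array of size N with all N slots initialized has none, and a walker like the one sketched here would run off its end.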
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-03-11 21:42 ` Andrew Morton
@ 2004-03-12 19:11 ` Flavio Bruno Leitner
0 siblings, 0 replies; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-03-12 19:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Thu, Mar 11, 2004 at 01:42:21PM -0800, Andrew Morton wrote:
> Thanks for working that out. Maybe we need to terminate those sysctl
> tables. Does this fix it?
No, still the same oops. :(
I tested it on the old kernel where this problem started and on today's
BitKeeper tree.
--
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E 531C BC17 D670 7FF0 A9E0 ]
^ permalink raw reply [flat|nested] 17+ messages in thread
* kernel BUG at kernel/timer.c:370!
@ 2004-02-14 3:33 Rafael D'Halleweyn (List)
2004-02-14 8:21 ` Andrew Morton
0 siblings, 1 reply; 17+ messages in thread
From: Rafael D'Halleweyn (List) @ 2004-02-14 3:33 UTC (permalink / raw)
To: linux-kernel
I sometimes get the following BUG (transcribed from a digital camera
snapshot, so it might contain errors). I did not copy the stack trace,
let me know if you want it.
kernel BUG at kernel/timer.c:370!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c01284f8>] Not tainted
EFLAGS: 00010003
EIP is at cascade+0x50/0x70
eax: d0a77724 ebx: d0a77724 ecx: c04aaa28 edx: 0000001c
esi: c04aab08 edi: c04aa220 ebp: 0000001c esp: c0457e9e
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0456000 task=c03d2de0)
Stack: ...
Call Trace:
[<c01289e4>] update_process_times+0x44/0x50
[<c0128b3f>] run_timer_softirq+0x12f/0x1c0
[<c0124695>] do_softirq+0x95/0xa0
[<c010d2fb>] do_IRQ+0xfb/0x130
[<c010b5e8>] common_interrupt+0x18/0x20
Code: 0f 0b 72 01 92 d1 38 c0 eb d5 8d b4 26 00 00 00 00 8d bc 27
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
--
Rafael D'Halleweyn (List) <list@noduck.net>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel BUG at kernel/timer.c:370!
2004-02-14 3:33 Rafael D'Halleweyn (List)
@ 2004-02-14 8:21 ` Andrew Morton
0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2004-02-14 8:21 UTC (permalink / raw)
To: Rafael D'Halleweyn (List); +Cc: linux-kernel
"Rafael D'Halleweyn (List)" <list@noduck.net> wrote:
>
> I sometimes get the following BUG (transcribed from a digital camera
> snapshot, so it might contain errors). I did not copy the stack trace,
> let me know if you want it.
>
> kernel BUG at kernel/timer.c:370!
> invalid operand: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c01284f8>] Not tainted
> EFLAGS: 00010003
> EIP is at cascade+0x50/0x70
> eax: d0a77724 ebx: d0a77724 ecx: c04aaa28 edx: 0000001c
> esi: c04aab08 edi: c04aa220 ebp: 0000001c esp: c0457e9e
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=c0456000 task=c03d2de0)
> Stack: ...
> Call Trace:
> [<c01289e4>] update_process_times+0x44/0x50
> [<c0128b3f>] run_timer_softirq+0x12f/0x1c0
> [<c0124695>] do_softirq+0x95/0xa0
> [<c010d2fb>] do_IRQ+0xfb/0x130
> [<c010b5e8>] common_interrupt+0x18/0x20
This could be a hardware problem. Or it could be a bug basically anywhere
in the kernel.
Are you using CONFIG_DEBUG_SLAB?
Could you please apply the below patch, wait for the problem to reoccur,
then let us know?
diff -puN kernel/timer.c~a kernel/timer.c
--- 25/kernel/timer.c~a 2004-02-14 00:14:46.000000000 -0800
+++ 25-akpm/kernel/timer.c 2004-02-14 00:20:09.000000000 -0800
@@ -31,6 +31,7 @@
#include <linux/time.h>
#include <linux/jiffies.h>
#include <linux/cpu.h>
+#include <linux/kallsyms.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
@@ -367,7 +368,15 @@ static int cascade(tvec_base_t *base, tv
struct timer_list *tmp;
tmp = list_entry(curr, struct timer_list, entry);
- BUG_ON(tmp->base != base);
+ if (tmp->base != base) {
+ printk("%s: %p != %p\n",
+ __FUNCTION__, tmp->base, base);
+ printk("handler=%p", tmp->function);
+ print_symbol(" (%s)", (unsigned long)tmp->function);
+ printk("\n");
+ dump_stack();
+ tmp->base = base;
+ }
curr = curr->next;
internal_add_timer(base, tmp);
}
_
^ permalink raw reply [flat|nested] 17+ messages in thread