public inbox for linux-kernel@vger.kernel.org
* kernel BUG at kernel/timer.c:370!
@ 2004-02-14  3:33 Rafael D'Halleweyn (List)
  2004-02-14  8:21 ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael D'Halleweyn (List) @ 2004-02-14  3:33 UTC
  To: linux-kernel


I sometimes get the following BUG (transcribed from a digital camera
snapshot, so it might contain errors). I did not copy the stack trace,
let me know if you want it.

kernel BUG at kernel/timer.c:370!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[<c01284f8>]    Not tainted
EFLAGS: 00010003
EIP is at cascade+0x50/0x70
eax: d0a77724   ebx: d0a77724   ecx: c04aaa28   edx: 0000001c
esi: c04aab08   edi: c04aa220   ebp: 0000001c   esp: c0457e9e
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0456000 task=c03d2de0)
Stack: ...
Call Trace:
 [<c01289e4>] update_process_times+0x44/0x50
 [<c0128b3f>] run_timer_softirq+0x12f/0x1c0
 [<c0124695>] do_softirq+0x95/0xa0
 [<c010d2fb>] do_IRQ+0xfb/0x130
 [<c010b5e8>] common_interrupt+0x18/0x20

Code: 0f 0b 72 01 92 d1 38 c0 eb d5 8d b4 26 00 00 00 00 8d bc 27
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

-- 
Rafael D'Halleweyn (List) <list@noduck.net>



* Re: kernel BUG at kernel/timer.c:370!
  2004-02-14  3:33 Rafael D'Halleweyn (List)
@ 2004-02-14  8:21 ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2004-02-14  8:21 UTC
  To: Rafael D'Halleweyn (List); +Cc: linux-kernel

"Rafael D'Halleweyn (List)" <list@noduck.net> wrote:
>
> I sometimes get the following BUG (transcribed from a digital camera
>  snapshot, so it might contain errors). I did not copy the stack trace,
>  let me know if you want it.
> 
>  kernel BUG at kernel/timer.c:370!
>  invalid operand: 0000 [#1]
>  CPU:    0
>  EIP:    0060:[<c01284f8>]    Not tainted
>  EFLAGS: 00010003
>  EIP is at cascade+0x50/0x70
>  eax: d0a77724   ebx: d0a77724   ecx: c04aaa28   edx: 0000001c
>  esi: c04aab08   edi: c04aa220   ebp: 0000001c   esp: c0457e9e
>  ds: 007b   es: 007b   ss: 0068
>  Process swapper (pid: 0, threadinfo=c0456000 task=c03d2de0)
>  Stack: ...
>  Call Trace:
>   [<c01289e4>] update_process_times+0x44/0x50
>   [<c0128b3f>] run_timer_softirq+0x12f/0x1c0
>   [<c0124695>] do_softirq+0x95/0xa0
>   [<c010d2fb>] do_IRQ+0xfb/0x130
>   [<c010b5e8>] common_interrupt+0x18/0x20

This could be a hardware problem.  Or it could be a bug basically anywhere
in the kernel.

Are you using CONFIG_DEBUG_SLAB?

Could you please apply the below patch, wait for the problem to reoccur,
then let us know?

diff -puN kernel/timer.c~a kernel/timer.c
--- 25/kernel/timer.c~a	2004-02-14 00:14:46.000000000 -0800
+++ 25-akpm/kernel/timer.c	2004-02-14 00:20:09.000000000 -0800
@@ -31,6 +31,7 @@
 #include <linux/time.h>
 #include <linux/jiffies.h>
 #include <linux/cpu.h>
+#include <linux/kallsyms.h>
 
 #include <asm/uaccess.h>
 #include <asm/div64.h>
@@ -367,7 +368,15 @@ static int cascade(tvec_base_t *base, tv
 		struct timer_list *tmp;
 
 		tmp = list_entry(curr, struct timer_list, entry);
-		BUG_ON(tmp->base != base);
+		if (tmp->base != base) {
+			printk("%s: %p != %p\n",
+				__FUNCTION__, tmp->base, base);
+			printk("handler=%p", tmp->function);
+			print_symbol(" (%s)", (unsigned long)tmp->function);
+			printk("\n");
+			dump_stack();
+			tmp->base = base;
+		}
 		curr = curr->next;
 		internal_add_timer(base, tmp);
 	}

_
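The debug patch above turns a fatal BUG_ON() into a report-and-repair check: print both base pointers and the handler, then overwrite tmp->base so the cascade can continue. A minimal userspace sketch of that pattern, assuming simplified stand-in types; struct model_base, struct model_timer and fixup_timer_bases() are hypothetical names, not kernel API, and printf() stands in for printk() and print_symbol().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for tvec_base_t and struct timer_list. */
struct model_base {
	int cpu;
};

struct model_timer {
	const char *handler_name;	/* stands in for print_symbol() output */
	struct model_base *base;	/* should point at the owning base */
};

/*
 * Walk the timers being cascaded from 'base'. Instead of crashing on a
 * mismatch (the BUG_ON() behaviour), report it and repair the field, as
 * the debug patch does. Returns how many timers were repaired.
 */
static size_t fixup_timer_bases(struct model_base *base,
				struct model_timer *timers, size_t n)
{
	size_t repaired = 0;

	assert(base != NULL);
	for (size_t i = 0; i < n; i++) {
		struct model_timer *tmp = &timers[i];

		if (tmp->base != base) {
			printf("%s: %p != %p\n", __func__,
			       (void *)tmp->base, (void *)base);
			printf("handler=%s\n", tmp->handler_name);
			tmp->base = base;	/* repair, then keep going */
			repaired++;
		}
	}
	return repaired;
}
```

The repair only keeps the machine alive long enough to log; the useful output is the handler name and the dump_stack() trace, which point at whoever owns the corrupted timer.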




* kernel BUG at kernel/timer.c:370!
@ 2004-03-05 17:40 Flavio Bruno Leitner
  2004-03-05 23:06 ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-03-05 17:40 UTC
  To: linux-kernel


Hello!

My laptop is an Acer TravelMate 630, and somewhere between 2.6.2 and 2.6.3-rc2
it started producing an oops right after boot.

kernel BUG at kernel/timer.c:370!
invalid operand: 0000 [#1]
CPU:	0
EIP:	0060:[<c0127177>]	Not tainted
EFLAGS: 00010006
EIP is at cascade+0x44/0x4e
eax: c03e4368	ebx: c03e02b0	ecx: fffce200	edx: c03e03b0
esi: c03e0398	edi: c03dfa80	ebp: c0387f08	esp: c0387ef4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0386000 task=c0306520)
Stack: c03dfa80 cde229c4 00000000 c03df7a8 c0387f20 c0387f38 c0127732 c03dfa80
       c03e0288 00000022 c0387f34 c0387f20 c0387f20 c0308d64 00000001 c03df7a8
       0000000a c0387f54 c0123b7c c03df7a8 00000046 00000000 c037da00 c0308d64
Call Trace:
  [<c0127732>] run_timer_softirq+0xec/0x16b
  [<c0123b7c>] do_softirq+0x98/0x9a
  [<c010d2ff>] do_IRQ+0xe4/0x11c
  [<c010b974>] common_interrupt+0x18/0x20
  [<d08c8257>] acpi_processor_idle+0xe9/0x1e5 [processor]
  [<c0105000>] _stext+0x0/0x2a
  [<c01090b7>] cpu_idle+0x2f/0x38
  [<c038c70a>] start_kernel+0x185/0x1c9
  [<c038c44a>] unknow_bootoption+0x0/0x108

Code: 0f 0b 72 01 3b 05 2d c0 eb d4 55 89 e5 56 53 83 ec 04 0f bf


Here is the function:
static int cascade(tvec_base_t *base, tvec_t *tv, int index)
{
        /* cascade all the timers from tv up one level */
        struct list_head *head, *curr;

        head = tv->vec + index;
        curr = head->next;
        /*
         * We are removing _all_ timers from the list, so we don't  have to
         * detach them individually, just clear the list afterwards.
         */
        while (curr != head) {
                struct timer_list *tmp;

                tmp = list_entry(curr, struct timer_list, entry);
                BUG_ON(tmp->base != base);
                curr = curr->next;
                internal_add_timer(base, tmp);
        }
        INIT_LIST_HEAD(head);

        return index;
}
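The check inside the loop depends on list_entry(), which recovers the enclosing struct timer_list from the embedded list_head by subtracting the member's offset. A userspace sketch of that mechanism and of the per-bucket walk, using hypothetical stand-in types (list_node, toy_timer, count_foreign_timers) rather than the kernel headers; instead of BUG_ON() it simply counts mismatching timers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, like the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Same idea as the kernel's list_entry()/container_of(). */
#define node_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_base {
	int id;
};

struct toy_timer {
	unsigned long expires;
	struct list_node entry;		/* deliberately not the first member */
	struct toy_base *base;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Walk one bucket the way cascade() does and count timers whose ->base
 * does not match; cascade() would hit BUG_ON() on the first one.
 */
static int count_foreign_timers(struct list_node *head, struct toy_base *base)
{
	int bad = 0;

	assert(head != NULL);
	for (struct list_node *curr = head->next; curr != head; curr = curr->next) {
		struct toy_timer *tmp = node_entry(curr, struct toy_timer, entry);

		if (tmp->base != base)
			bad++;
	}
	return bad;
}
```

Because node_entry() trusts whatever the list pointer says, a stray pointer in the bucket makes the walk interpret unrelated memory as a timer, and the ->base comparison is the first place that shows up.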


Any ideas about this one?
Thanks!


-- 
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E  531C BC17 D670 7FF0 A9E0 ]


* Re: kernel BUG at kernel/timer.c:370!
  2004-03-05 17:40 Flavio Bruno Leitner
@ 2004-03-05 23:06 ` Andrew Morton
  2004-03-11 15:43   ` Flavio Bruno Leitner
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-03-05 23:06 UTC
  To: Flavio Bruno Leitner; +Cc: linux-kernel

Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
>
> My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2 
> begins returning an oops right after boot.
> 
> kernel BUG at kernel/timer.c:370!

Oh fantastic.  Something scrogged the timer lists.

I suggest you try stripping your kernel config down to the bare minimum
which is needed to boot, see if that fixes it and if so, start
reintroducing things until you've worked out which driver is causing the
problem.



* Re: kernel BUG at kernel/timer.c:370!
  2004-03-05 23:06 ` Andrew Morton
@ 2004-03-11 15:43   ` Flavio Bruno Leitner
  2004-03-11 21:42     ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-03-11 15:43 UTC
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

On Fri, Mar 05, 2004 at 03:06:15PM -0800, Andrew Morton wrote:
> Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
> >
> > My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2 
> > begins returning an oops right after boot.
> > 
> > kernel BUG at kernel/timer.c:370!
> 
> Oh fantastic.  Something scrogged the timer lists.
> 
> I suggest you try stripping your kernel config down the the bare minimum
> which is needed to boot, see if that fixes it and if so, start
> reintroducing things until you've worked out which driver is causing the
> problem.

Done!

The oops happens when the attached patch is applied: just run ifconfig eth0 down
and then ifconfig eth0 <another IP> up. DHCP always assigns the wrong IP,
so my rc.local runs ifconfig down and up. With the patch removed, I can't
reproduce it anymore.

This oops still happens with newer kernels.

Thanks!


-- 
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E  531C BC17 D670 7FF0 A9E0 ]

[-- Attachment #2: ifdown-up-oops.patch --]
[-- Type: text/plain, Size: 2341 bytes --]

diff -Nru a/include/linux/inetdevice.h b/include/linux/inetdevice.h
--- a/include/linux/inetdevice.h	Fri Apr 11 03:35:44 2003
+++ b/include/linux/inetdevice.h	Thu Jan 29 20:57:46 2004
@@ -21,6 +21,7 @@
 	int	medium_id;
 	int	no_xfrm;
 	int	no_policy;
+	int	force_igmp_version;
 	void	*sysctl;
 };
 
diff -Nru a/net/ipv4/igmp.c b/net/ipv4/igmp.c
--- a/net/ipv4/igmp.c	Sat Jan 24 15:54:51 2004
+++ b/net/ipv4/igmp.c	Mon Feb  2 21:43:31 2004
@@ -126,10 +126,14 @@
  * contradict to specs provided this delay is small enough.
  */
 
-#define IGMP_V1_SEEN(in_dev) ((in_dev)->mr_v1_seen && \
-		time_before(jiffies, (in_dev)->mr_v1_seen))
-#define IGMP_V2_SEEN(in_dev) ((in_dev)->mr_v2_seen && \
-		time_before(jiffies, (in_dev)->mr_v2_seen))
+#define IGMP_V1_SEEN(in_dev) (ipv4_devconf.force_igmp_version == 1 || \
+		(in_dev)->cnf.force_igmp_version == 1 || \
+		((in_dev)->mr_v1_seen && \
+		time_before(jiffies, (in_dev)->mr_v1_seen)))
+#define IGMP_V2_SEEN(in_dev) (ipv4_devconf.force_igmp_version == 2 || \
+		(in_dev)->cnf.force_igmp_version == 2 || \
+		((in_dev)->mr_v2_seen && \
+		time_before(jiffies, (in_dev)->mr_v2_seen)))
 
 static void igmpv3_add_delrec(struct in_device *in_dev, struct ip_mc_list *im);
 static void igmpv3_del_delrec(struct in_device *in_dev, __u32 multiaddr);
@@ -1063,7 +1067,7 @@
 	reporter = im->reporter;
 	igmp_stop_timer(im);
 
-	if (in_dev->dev->flags & IFF_UP) {
+	if (!in_dev->dead) {
 		if (IGMP_V1_SEEN(in_dev))
 			goto done;
 		if (IGMP_V2_SEEN(in_dev)) {
@@ -1094,6 +1098,8 @@
 	if (im->multiaddr == IGMP_ALL_HOSTS)
 		return;
 
+	if (in_dev->dead)
+		return;
 	if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) {
 		spin_lock_bh(&im->lock);
 		igmp_start_timer(im, IGMP_Initial_Report_Delay);
@@ -1167,7 +1173,7 @@
 	igmpv3_del_delrec(in_dev, im->multiaddr);
 #endif
 	igmp_group_added(im);
-	if (in_dev->dev->flags & IFF_UP)
+	if (!in_dev->dead)
 		ip_rt_multicast_event(in_dev);
 out:
 	return;
@@ -1191,7 +1197,7 @@
 				write_unlock_bh(&in_dev->lock);
 				igmp_group_dropped(i);
 
-				if (in_dev->dev->flags & IFF_UP)
+				if (!in_dev->dead)
 					ip_rt_multicast_event(in_dev);
 
 				ip_ma_put(i);
@@ -1266,6 +1272,9 @@
 	struct ip_mc_list *i;
 
 	ASSERT_RTNL();
+
+	/* Deactivate timers */
+	ip_mc_down(in_dev);
 
 	write_lock_bh(&in_dev->lock);
 	while ((i = in_dev->mc_list) != NULL) {

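The rewritten IGMP_V1_SEEN()/IGMP_V2_SEEN() macros in the attached patch treat version N as in effect if it is forced globally, forced on the device, or a querier of that version was seen recently, where "recently" is the wrap-safe jiffies comparison time_before(). A userspace sketch of that predicate; model_time_before(), struct igmp_model and igmp_version_seen() are hypothetical names, and the comparison uses the usual signed-difference idiom rather than quoting the kernel header.

```c
#include <assert.h>
#include <limits.h>	/* ULONG_MAX, used in the usage example */
#include <stdbool.h>

/* Wrap-safe "a is before b" for a free-running counter like jiffies:
 * take the unsigned difference, then look at its sign. */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the fields the macros read. */
struct igmp_model {
	int global_force;		/* ipv4_devconf.force_igmp_version */
	int dev_force;			/* in_dev->cnf.force_igmp_version */
	unsigned long seen_until;	/* mr_v1_seen / mr_v2_seen, 0 = never */
};

/* Same shape as IGMP_V1_SEEN()/IGMP_V2_SEEN() from the patch. */
static bool igmp_version_seen(const struct igmp_model *m, int version,
			      unsigned long jiffies)
{
	assert(version == 1 || version == 2);
	return m->global_force == version ||
	       m->dev_force == version ||
	       (m->seen_until && model_time_before(jiffies, m->seen_until));
}
```

For example, with seen_until = 10 and jiffies = ULONG_MAX - 5, the deadline lies just past a counter wrap and the predicate is still true, which a plain jiffies < seen_until test would get wrong.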
[-- Attachment #3: oops.txt --]
[-- Type: text/plain, Size: 1161 bytes --]


Hello!

My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2 
begins returning an oops right after boot.

kernel BUG at kernel/timer.c:370!
invalid operand: 0000 [#1]
CPU:	0
EIP:	0060:[<c0127177>]	Not tainted
EFLAGS: 00010006
EIP is at cascade+0x44/0x4e
eax: c03e4368	ebx: c03e02b0	ecx: fffce200	edx: c03e03b0
esi: c03e0398	edi: c03dfa80	ebp: c0387f08	esp: c0387ef4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0386000 task=c0306520)
Stack: c03dfa80 cde229c4 00000000 c03df7a8 c0387f20 c0387f38 c0127732 c03dfa80
       c03e0288 00000022 c0387f34 c0387f20 c0387f20 c0308d64 00000001 c03df7a8
       0000000a c0387f54 c0123b7c c03df7a8 00000046 00000000 c037da00 c0308d64
Call Trace:
  [<c0127732>] run_timer_softirq+0xec/0x16b
  [<c0123b7c>] do_softirq+0x98/0x9a
  [<c010d2ff>] do_IRQ+0xe4/0x11c
  [<c010b974>] common_interrupt+0x18/0x20
  [<d08c8257>] acpi_processor_idle+0xe9/0x1e5 [processor]
  [<c0105000>] _stext+0x0/0x2a
  [<c01090b7>] cpu_idle+0x2f/0x38
  [<c038c70a>] start_kernel+0x185/0x1c9
  [<c038c44a>] unknow_bootoption+0x0/0x108

Code: 0f 0b 72 01 3b 05 2d c0 eb d4 55 89 e5 56 53 83 ec 04 0f bf



* Re: kernel BUG at kernel/timer.c:370!
  2004-03-11 15:43   ` Flavio Bruno Leitner
@ 2004-03-11 21:42     ` Andrew Morton
  2004-03-12 19:11       ` Flavio Bruno Leitner
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-03-11 21:42 UTC
  To: Flavio Bruno Leitner; +Cc: linux-kernel

Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
>
> On Fri, Mar 05, 2004 at 03:06:15PM -0800, Andrew Morton wrote:
> > Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
> > >
> > > My laptop is an Acer TravelMate 630 and somewhere between 2.6.2 and 2.6.3-rc2 
> > > begins returning an oops right after boot.
> > > 
> > > kernel BUG at kernel/timer.c:370!
> > 
> > Oh fantastic.  Something scrogged the timer lists.
> > 
> > I suggest you try stripping your kernel config down the the bare minimum
> > which is needed to boot, see if that fixes it and if so, start
> > reintroducing things until you've worked out which driver is causing the
> > problem.
> 
> Done!
> 
> The oops happens when the patch is applied, just do ifconfig eth0 down 
> and ifconfig eth0 <with another ip>  up. The dhcp always get wrong ip, 
> so my rc.local run ifconfig down and up. Removing the patch, I can't 
> reproduce it anymore.
> 

Thanks for working that out.  Maybe we need to terminate those sysctl
tables.   Does this fix it?

---

 25-akpm/net/ipv4/devinet.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff -puN net/ipv4/devinet.c~devinet-ctl_table-fix net/ipv4/devinet.c
--- 25/net/ipv4/devinet.c~devinet-ctl_table-fix	Thu Mar 11 13:40:38 2004
+++ 25-akpm/net/ipv4/devinet.c	Thu Mar 11 13:40:53 2004
@@ -1210,11 +1210,11 @@ int ipv4_doint_and_flush_strategy(ctl_ta
 
 static struct devinet_sysctl_table {
 	struct ctl_table_header *sysctl_header;
-	ctl_table		devinet_vars[20];
-	ctl_table		devinet_dev[2];
-	ctl_table		devinet_conf_dir[2];
-	ctl_table		devinet_proto_dir[2];
-	ctl_table		devinet_root_dir[2];
+	ctl_table		devinet_vars[21];
+	ctl_table		devinet_dev[3];
+	ctl_table		devinet_conf_dir[3];
+	ctl_table		devinet_proto_dir[3];
+	ctl_table		devinet_root_dir[3];
 } devinet_sysctl = {
 	.devinet_vars = {
 		{
@@ -1372,6 +1372,7 @@ static struct devinet_sysctl_table {
 			.proc_handler	= &ipv4_doint_and_flush,
 			.strategy	= &ipv4_doint_and_flush_strategy,
 		},
+		{ .ctl_name = 0 }
 	},
 	.devinet_dev = {
 		{
@@ -1380,6 +1381,7 @@ static struct devinet_sysctl_table {
 			.mode		= 0555,
 			.child		= devinet_sysctl.devinet_vars,
 		},
+		{ .ctl_name = 0 }
 	},
 	.devinet_conf_dir = {
 	        {
@@ -1388,6 +1390,7 @@ static struct devinet_sysctl_table {
 			.mode		= 0555,
 			.child		= devinet_sysctl.devinet_dev,
 		},
+		{ .ctl_name = 0 }
 	},
 	.devinet_proto_dir = {
 		{
@@ -1396,6 +1399,7 @@ static struct devinet_sysctl_table {
 			.mode		= 0555,
 			.child 		= devinet_sysctl.devinet_conf_dir,
 		},
+		{ .ctl_name = 0 }
 	},
 	.devinet_root_dir = {
 		{
@@ -1404,6 +1408,7 @@ static struct devinet_sysctl_table {
 			.mode		= 0555,
 			.child		= devinet_sysctl.devinet_proto_dir,
 		},
+		{ .ctl_name = 0 }
 	},
 };
 

_
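The patch above grows each ctl_table array by one slot and adds a { .ctl_name = 0 } entry, on the assumption that code walking these tables stops at the first zero ctl_name instead of using the array length. A userspace sketch of why the sentinel matters; struct toy_ctl, count_entries() and example_dir are hypothetical simplifications, not the kernel's ctl_table.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ctl {
	int ctl_name;		/* 0 marks the end of the table */
	const char *procname;
};

/* Count entries the way a sentinel-terminated walker would: it never
 * looks at the array size, only at ctl_name. Without the terminating
 * entry it would read past the end of the array. */
static size_t count_entries(const struct toy_ctl *table)
{
	size_t n = 0;

	assert(table != NULL);
	while (table[n].ctl_name != 0)
		n++;
	return n;
}

/* Sized one larger than the number of real entries, as in the patch. */
static const struct toy_ctl example_dir[3] = {
	{ .ctl_name = 1, .procname = "first" },
	{ .ctl_name = 2, .procname = "second" },
	{ .ctl_name = 0 }	/* terminator */
};
```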



* Re: kernel BUG at kernel/timer.c:370!
  2004-03-11 21:42     ` Andrew Morton
@ 2004-03-12 19:11       ` Flavio Bruno Leitner
  0 siblings, 0 replies; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-03-12 19:11 UTC
  To: Andrew Morton; +Cc: linux-kernel

On Thu, Mar 11, 2004 at 01:42:21PM -0800, Andrew Morton wrote:
> Thanks for working that out.  Maybe we need to terminate those sysctl
> tables.   Does this fix it?

No, still the same oops. :( 
I tested it on the old kernel where this problem started and on today's
BitKeeper tree.

-- 
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E  531C BC17 D670 7FF0 A9E0 ]


* RE: kernel BUG at kernel/timer.c:370!
@ 2004-03-31 16:59 Craig, Dave
  0 siblings, 0 replies; 17+ messages in thread
From: Craig, Dave @ 2004-03-31 16:59 UTC
  To: Andrew Morton, Rafael D'Halleweyn (List); +Cc: linux-kernel

I just observed this failure on two separate systems this morning.  I
added the patch in the hopes that it will provide some useful
information.

	Dave Craig

QUALCOMM Incorporated

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andrew Morton
Sent: Saturday, February 14, 2004 12:22 AM
To: Rafael D'Halleweyn (List)
Cc: linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/timer.c:370!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




* RE: kernel BUG at kernel/timer.c:370!
@ 2004-03-31 17:16 Craig, Dave
  2004-03-31 19:52 ` Andrew Morton
  2004-04-01 14:24 ` Flavio Bruno Leitner
  0 siblings, 2 replies; 17+ messages in thread
From: Craig, Dave @ 2004-03-31 17:16 UTC
  To: Andrew Morton, Rafael D'Halleweyn (List); +Cc: linux-kernel

cascade: c1a1d5e0 != c1a0d5e0
hander=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
Call Trace:
 [<c012ca73>] cascade+0x79/0xa1
 [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
 [<c012d0b3>] run_timer_softirq+0x159/0x1c9
 [<c012899d>] do_softirq+0xc9/0xcb
 [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
 [<c0108c09>] default_idle+0x0/0x32
 [<c010bab2>] apic_timer_interrupt+0x1a/0x20
 [<c0108c09>] default_idle+0x0/0x32
 [<c0108c36>] default_idle+0x2d/0x32
 [<c0108cb4>] cpu_idle+0x3a/0x43
 [<c0105000>] rest_init+0x0/0x68
 [<c039c89f>] start_kernel+0x1b7/0x209
 [<c039c427>] unknown_bootoption+0x0/0x124

Here is the result.  I am doing a lot of IPv4 multicast.

	Dave

-----Original Message-----
From: Craig, Dave 
Sent: Wednesday, March 31, 2004 9:00 AM
To: 'Andrew Morton'; Rafael D'Halleweyn (List)
Cc: linux-kernel@vger.kernel.org
Subject: RE: kernel BUG at kernel/timer.c:370!





* Re: kernel BUG at kernel/timer.c:370!
  2004-03-31 17:16 kernel BUG at kernel/timer.c:370! Craig, Dave
@ 2004-03-31 19:52 ` Andrew Morton
  2004-04-01 14:24 ` Flavio Bruno Leitner
  1 sibling, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2004-03-31 19:52 UTC
  To: Craig, Dave; +Cc: list, linux-kernel

"Craig, Dave" <dwcraig@qualcomm.com> wrote:
>
> cascade: c1a1d5e0 != c1a0d5e0
>  hander=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
>  Call Trace:
>   [<c012ca73>] cascade+0x79/0xa1
>   [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
>   [<c012d0b3>] run_timer_softirq+0x159/0x1c9
>   [<c012899d>] do_softirq+0xc9/0xcb
>   [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
>   [<c0108c09>] default_idle+0x0/0x32
>   [<c010bab2>] apic_timer_interrupt+0x1a/0x20
>   [<c0108c09>] default_idle+0x0/0x32
>   [<c0108c36>] default_idle+0x2d/0x32
>   [<c0108cb4>] cpu_idle+0x3a/0x43
>   [<c0105000>] rest_init+0x0/0x68
>   [<c039c89f>] start_kernel+0x1b7/0x209
>   [<c039c427>] unknown_bootoption+0x0/0x124
> 
>  Here is the result.  I am doing a lot of IPv4 multicast.

There's only a single bit difference between the expected and actual
timer->base value.  So either your machine has flakey memory or the percpu
data area happened to be separated by 64k.
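The single-bit observation can be checked mechanically: XOR the actual and expected base pointers from Dave's report (0xc1a1d5e0 and 0xc1a0d5e0) and count the set bits. A small sketch; bit_diff() is a hypothetical helper, and it uses a portable bit-clearing loop rather than a compiler builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits that differ between two 32-bit values. */
static int bit_diff(uint32_t a, uint32_t b)
{
	uint32_t x = a ^ b;
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

For the reported pair, bit_diff() returns 1 and the XOR is 0x10000 (bit 16), which is exactly the 64k separation mentioned above.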

Is the machine SMP?  If so can you please run

	nm vmlinux | grep __per_cpu

and send the output?


* RE: kernel BUG at kernel/timer.c:370!
@ 2004-03-31 21:39 Craig, Dave
  2004-03-31 22:15 ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Craig, Dave @ 2004-03-31 21:39 UTC
  To: Andrew Morton; +Cc: list, linux-kernel

Sure thing.

7ecb001b A __crc___per_cpu_offset
c033a510 r __kcrctab___per_cpu_offset
c033c462 r __kstrtab___per_cpu_offset
c03366c4 r __ksymtab___per_cpu_offset
c040bd90 A __per_cpu_end
c040c020 B __per_cpu_offset
c04090a0 A __per_cpu_start

It is a dual-processor machine, and the processors are hyperthreaded.

	Dave

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org
[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andrew Morton
Sent: Wednesday, March 31, 2004 11:52 AM
To: Craig, Dave
Cc: list@noduck.net; linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/timer.c:370!




* Re: kernel BUG at kernel/timer.c:370!
  2004-03-31 21:39 Craig, Dave
@ 2004-03-31 22:15 ` Andrew Morton
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2004-03-31 22:15 UTC
  To: Craig, Dave; +Cc: list, linux-kernel

"Craig, Dave" <dwcraig@qualcomm.com> wrote:
>
> Sure thing.
> 
> 7ecb001b A __crc___per_cpu_offset
> c033a510 r __kcrctab___per_cpu_offset
> c033c462 r __kstrtab___per_cpu_offset
> c03366c4 r __ksymtab___per_cpu_offset
> c040bd90 A __per_cpu_end
> c040c020 B __per_cpu_offset
> c04090a0 A __per_cpu_start
> 
> It is a dual processor and the processors are hyperthreaded.

OK.  We're consistently seeing a single-bit difference and there's no
simple power-of-two stride in the things which that pointer points at. 
Most likely you have a hardware problem.
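Dave's nm output allows a rough check of the stride argument: the per-CPU template runs from __per_cpu_start (0xc04090a0) to __per_cpu_end (0xc040bd90), so each CPU's copy spans about 0x2cf0 bytes. A sketch of that arithmetic; it uses the raw symbol difference and ignores any padding the kernel may add between per-CPU copies, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Size of the per-CPU template section, from the linker symbols. */
static uint32_t percpu_span(uint32_t start, uint32_t end)
{
	assert(end >= start);
	return end - start;
}
```

With these numbers the span is 0x2cf0 (11504 bytes), which is not a power of two, and 64 KiB is roughly 5.7 copies, not a whole number, so two per-CPU copies of the same variable would not be expected to sit exactly 0x10000 apart.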


* Re: kernel BUG at kernel/timer.c:370!
  2004-03-31 17:16 kernel BUG at kernel/timer.c:370! Craig, Dave
  2004-03-31 19:52 ` Andrew Morton
@ 2004-04-01 14:24 ` Flavio Bruno Leitner
  2004-04-01 17:24   ` Flavio Bruno Leitner
  1 sibling, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-04-01 14:24 UTC
  To: Craig, Dave; +Cc: Andrew Morton, Rafael D'Halleweyn (List), linux-kernel

On Wed, Mar 31, 2004 at 09:16:52AM -0800, Craig, Dave wrote:
> cascade: c1a1d5e0 != c1a0d5e0
> hander=c028ee8d (igmp_ifc_timer_expire+0x0/0x3e)
> Call Trace:
>  [<c012ca73>] cascade+0x79/0xa1
>  [<c028ee8d>] igmp_ifc_timer_expire+0x0/0x3e
>  [<c012d0b3>] run_timer_softirq+0x159/0x1c9
>  [<c012899d>] do_softirq+0xc9/0xcb
>  [<c0119c46>] smp_apic_timer_interrupt+0xd8/0x140
>  [<c0108c09>] default_idle+0x0/0x32
>  [<c010bab2>] apic_timer_interrupt+0x1a/0x20
>  [<c0108c09>] default_idle+0x0/0x32
>  [<c0108c36>] default_idle+0x2d/0x32
>  [<c0108cb4>] cpu_idle+0x3a/0x43
>  [<c0105000>] rest_init+0x0/0x68
>  [<c039c89f>] start_kernel+0x1b7/0x209
>  [<c039c427>] unknown_bootoption+0x0/0x124
> 
> Here is the result.  I am doing a lot of IPv4 multicast.

I applied the patch; here is the result.
cascade: c040b170 != c040ab00           
handler=c040b168 (0xc040b168)
Call Trace:                  
 [<c012741f>] cascade+0x7f/0xb0
 [<c0127a3e>] run_timer_softirq+0xee/0x170
 [<c0123b15>] do_softirq+0xa5/0xb0        
 [<c010b625>] do_IRQ+0xe5/0x120   
 [<c0109a94>] common_interrupt+0x18/0x20
 [<c0107066>] default_idle+0x26/0x40    
 [<c01070f4>] cpu_idle+0x34/0x40    
 [<c03b0829>] start_kernel+0x189/0x1e0
 [<c03b0540>] unknown_bootoption+0x0/0x120
                                          
cascade: c040ab20 != c040ab00
handler=c040ab18 (0xc040ab18)
Call Trace:                  
 [<c012741f>] cascade+0x7f/0xb0
 [<c0127a3e>] run_timer_softirq+0xee/0x170
 [<c0123b15>] do_softirq+0xa5/0xb0        
 [<c010b625>] do_IRQ+0xe5/0x120   
 [<c0109a94>] common_interrupt+0x18/0x20
 [<c0107066>] default_idle+0x26/0x40    
 [<c01070f4>] cpu_idle+0x34/0x40    
 [<c03b0829>] start_kernel+0x189/0x1e0
 [<c03b0540>] unknown_bootoption+0x0/0x120


-- 
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E  531C BC17 D670 7FF0 A9E0 ]


* Re: kernel BUG at kernel/timer.c:370!
  2004-04-01 14:24 ` Flavio Bruno Leitner
@ 2004-04-01 17:24   ` Flavio Bruno Leitner
  2004-04-01 18:37     ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-04-01 17:24 UTC
  To: Craig, Dave; +Cc: Andrew Morton, Rafael D'Halleweyn (List), linux-kernel


Here is another output, with all debug options enabled.

cascade: c03b3128 != c03b28c0           
kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
handler=c03b3120 (0xc03b3120)                                                              
Call Trace:                  
 [<c01347ef>] cascade+0x7f/0xb0
 [<c0135025>] run_timer_softirq+0x315/0x3f0
 [<c012fa35>] do_softirq+0xa5/0xb0         
 [<c010caea>] do_IRQ+0x21a/0x360  
 [<c012b5bf>] profile_hook+0x1f/0x23
 [<c010a934>] common_interrupt+0x18/0x20
 [<c0107066>] default_idle+0x26/0x40    
 [<c01070f4>] cpu_idle+0x34/0x40    
 [<c0434829>] start_kernel+0x189/0x1e0
 [<c0434540>] unknown_bootoption+0x0/0x120

cascade: c03b2f88 != c03b28c0
handler=c03b2f80 (0xc03b2f80)
Call Trace:                  
 [<c01347ef>] cascade+0x7f/0xb0
 [<c0135025>] run_timer_softirq+0x315/0x3f0
 [<c012fa35>] do_softirq+0xa5/0xb0         
 [<c010caea>] do_IRQ+0x21a/0x360  
 [<c012b5bf>] profile_hook+0x1f/0x23
 [<c010a934>] common_interrupt+0x18/0x20
 [<c0107066>] default_idle+0x26/0x40    
 [<c01070f4>] cpu_idle+0x34/0x40    
 [<c0434829>] start_kernel+0x189/0x1e0
 [<c0434540>] unknown_bootoption+0x0/0x120
                                          
cascade: c03b2910 != c03b28c0
handler=c03b2908 (0xc03b2908)
Call Trace:                  
 [<c01347ef>] cascade+0x7f/0xb0
 [<c0135025>] run_timer_softirq+0x315/0x3f0
 [<c012fa35>] do_softirq+0xa5/0xb0         
 [<c010caea>] do_IRQ+0x21a/0x360  
 [<c012b5bf>] profile_hook+0x1f/0x23
 [<c010a934>] common_interrupt+0x18/0x20
 [<c0107066>] default_idle+0x26/0x40    
 [<c01070f4>] cpu_idle+0x34/0x40    
 [<c0434829>] start_kernel+0x189/0x1e0
 [<c0434540>] unknown_bootoption+0x0/0x120
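In all five reports from Flavio's machine (two in the earlier message, three here), the printed handler is exactly 8 bytes (two 32-bit words on i386) below the bogus tmp->base value. A sketch that checks this from the quoted numbers; it only confirms the arithmetic and does not by itself say what lives at those addresses. The table and common_gap() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (bogus tmp->base, printed handler) pairs from Flavio's reports. */
static const uint32_t reports[][2] = {
	{ 0xc040b170u, 0xc040b168u },
	{ 0xc040ab20u, 0xc040ab18u },
	{ 0xc03b3128u, 0xc03b3120u },
	{ 0xc03b2f88u, 0xc03b2f80u },
	{ 0xc03b2910u, 0xc03b2908u },
};

/* Returns the common (base - handler) distance, or 0 if it varies. */
static uint32_t common_gap(void)
{
	size_t n = sizeof(reports) / sizeof(reports[0]);
	uint32_t gap = reports[0][0] - reports[0][1];

	for (size_t i = 1; i < n; i++)
		if (reports[i][0] - reports[i][1] != gap)
			return 0;
	return gap;
}
```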



-- 
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E  531C BC17 D670 7FF0 A9E0 ]


* Re: kernel BUG at kernel/timer.c:370!
  2004-04-01 17:24   ` Flavio Bruno Leitner
@ 2004-04-01 18:37     ` Andrew Morton
  2004-04-02 14:42       ` Flavio Bruno Leitner
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-04-01 18:37 UTC
  To: Flavio Bruno Leitner; +Cc: dwcraig, list, linux-kernel

Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
>
> cascade: c03b3128 != c03b28c0
>  kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
>  handler=c03b3120 (0xc03b3120)
>  Call Trace:
>   [<c01347ef>] cascade+0x7f/0xb0
>   [<c0135025>] run_timer_softirq+0x315/0x3f0
>   [<c012fa35>] do_softirq+0xa5/0xb0
>   [<c010caea>] do_IRQ+0x21a/0x360
>   [<c012b5bf>] profile_hook+0x1f/0x23
>   [<c010a934>] common_interrupt+0x18/0x20
>   [<c0107066>] default_idle+0x26/0x40
>   [<c01070f4>] cpu_idle+0x34/0x40
>   [<c0434829>] start_kernel+0x189/0x1e0
>   [<c0434540>] unknown_bootoption+0x0/0x120

Is the machine SMP?

What was the machine doing at the time?

Can you have a look in System.map, see if you can work out what's at
0xc03b3120?



* RE: kernel BUG at kernel/timer.c:370!
@ 2004-04-01 19:05 Craig, Dave
  0 siblings, 0 replies; 17+ messages in thread
From: Craig, Dave @ 2004-04-01 19:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: list, linux-kernel

It could be hardware, but it would be hardware negatively interacting
with the kernel preemption feature.  The failure does not occur when
that feature is disabled.

	Dave

-----Original Message-----
From: Andrew Morton [mailto:akpm@osdl.org] 
Sent: Wednesday, March 31, 2004 2:16 PM
To: Craig, Dave
Cc: list@noduck.net; linux-kernel@vger.kernel.org
Subject: Re: kernel BUG at kernel/timer.c:370!

"Craig, Dave" <dwcraig@qualcomm.com> wrote:
>
> Sure thing.
> 
> 7ecb001b A __crc___per_cpu_offset
> c033a510 r __kcrctab___per_cpu_offset
> c033c462 r __kstrtab___per_cpu_offset
> c03366c4 r __ksymtab___per_cpu_offset
> c040bd90 A __per_cpu_end
> c040c020 B __per_cpu_offset
> c04090a0 A __per_cpu_start
> 
> It is a dual processor and the processors are hyperthreaded.

OK.  We're consistently seeing a single-bit difference and there's no
simple power-of-two stride in the things which that pointer points at. 
Most likely you have a hardware problem.
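The single-bit argument above can be stated mechanically: XOR the pointer value the debug check saw with the value it expected, and the two differ by exactly one flipped bit when the result is a nonzero power of two. A minimal C sketch under that reading follows; `differs_by_one_bit` is a hypothetical helper, not a kernel function, and a one-bit difference is only a hint of bad memory, not proof of it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the kernel): returns true when the
 * observed and expected addresses differ in exactly one bit position,
 * the pattern a single flipped memory bit would produce.
 */
static bool differs_by_one_bit(uintptr_t seen, uintptr_t expected)
{
	uintptr_t diff = seen ^ expected;

	/* A nonzero value with no other bits set is a power of two. */
	return diff != 0 && (diff & (diff - 1)) == 0;
}
```

The power-of-two test `(diff & (diff - 1)) == 0` clears the lowest set bit; if nothing remains, only one bit differed.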




* Re: kernel BUG at kernel/timer.c:370!
  2004-04-01 18:37     ` Andrew Morton
@ 2004-04-02 14:42       ` Flavio Bruno Leitner
  0 siblings, 0 replies; 17+ messages in thread
From: Flavio Bruno Leitner @ 2004-04-02 14:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dwcraig, list, linux-kernel

On Thu, Apr 01, 2004 at 10:37:18AM -0800, Andrew Morton wrote:
> Flavio Bruno Leitner <fbl@conectiva.com.br> wrote:
> >
> > cascade: c03b3128 != c03b28c0
> >  kernel/timer.c:296: spin_lock(kernel/timer.c:c03b28c0) already locked by kernel/timer.c/401
> >  handler=c03b3120 (0xc03b3120)
> >  Call Trace:
> >   [<c01347ef>] cascade+0x7f/0xb0
> >   [<c0135025>] run_timer_softirq+0x315/0x3f0
> >   [<c012fa35>] do_softirq+0xa5/0xb0
> >   [<c010caea>] do_IRQ+0x21a/0x360
> >   [<c012b5bf>] profile_hook+0x1f/0x23
> >   [<c010a934>] common_interrupt+0x18/0x20
> >   [<c0107066>] default_idle+0x26/0x40
> >   [<c01070f4>] cpu_idle+0x34/0x40
> >   [<c0434829>] start_kernel+0x189/0x1e0
> >   [<c0434540>] unknown_bootoption+0x0/0x120
> 
> Is the machine SMP?

No, it's a single-processor Pentium II.

> What was the machine doing at the time?

I was running processes like postfix, pump and ntpd. After you asked, I
tried to reproduce it in runlevel 1 (single-user mode), but so far I
haven't been able to. My next step will be to disable the services one by
one until I can no longer reproduce it.


> 
> Can you have a look in System.map, see if you can work out what's at
> 0xc03b3120?

c03b3128 => Not found in System.map
c03b28c0 => per_cpu__tvec_bases
c03b3120 => Not found in System.map
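An exact-match search only finds addresses where a symbol begins, so an address inside an object can show up as "not found". The usual way to place an arbitrary address is to take the System.map entry with the highest start address that is not above it and report the offset into that symbol. A minimal sketch of that lookup, assuming the standard `address type name` line format; `nearest_symbol` is a hypothetical helper, and the result is only meaningful if the address really falls inside the preceding object rather than in padding or an unlisted region.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or binutils API): scan a System.map
 * stream for the symbol with the highest start address <= addr and report
 * the offset of addr into it. Lines look like "c03b28c0 d per_cpu__tvec_bases".
 * Absolute ('A') entries such as __crc_* hold values, not addresses; a more
 * careful tool might skip them.
 * Returns 0 on success, -1 if no symbol starts at or below addr.
 */
static int nearest_symbol(FILE *map, uint64_t addr,
			  char *name, size_t name_len, uint64_t *offset)
{
	char line[256];
	char sym[128];
	char type;
	uint64_t start;
	uint64_t best = 0;
	int found = 0;

	while (fgets(line, sizeof(line), map)) {
		/* Skip lines that do not parse as "address type name". */
		if (sscanf(line, "%" SCNx64 " %c %127s", &start, &type, sym) != 3)
			continue;
		/* Keep the closest symbol that starts at or before addr. */
		if (start <= addr && (!found || start >= best)) {
			best = start;
			snprintf(name, name_len, "%s", sym);
			found = 1;
		}
	}
	if (!found)
		return -1;
	*offset = addr - best;
	return 0;
}
```

Called as `nearest_symbol(f, 0xc03b3120, buf, sizeof buf, &off)` on an open System.map, it would report whichever symbol precedes that address and how far into it the address lies.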


-- 
Flávio Bruno Leitner <fbl@conectiva.com.br>
[ E74B 0BD0 5E05 C385 239E  531C BC17 D670 7FF0 A9E0 ]


Thread overview: 17+ messages
2004-03-31 17:16 kernel BUG at kernel/timer.c:370! Craig, Dave
2004-03-31 19:52 ` Andrew Morton
2004-04-01 14:24 ` Flavio Bruno Leitner
2004-04-01 17:24   ` Flavio Bruno Leitner
2004-04-01 18:37     ` Andrew Morton
2004-04-02 14:42       ` Flavio Bruno Leitner
  -- strict thread matches above, loose matches on Subject: below --
2004-04-01 19:05 Craig, Dave
2004-03-31 21:39 Craig, Dave
2004-03-31 22:15 ` Andrew Morton
2004-03-31 16:59 Craig, Dave
2004-03-05 17:40 Flavio Bruno Leitner
2004-03-05 23:06 ` Andrew Morton
2004-03-11 15:43   ` Flavio Bruno Leitner
2004-03-11 21:42     ` Andrew Morton
2004-03-12 19:11       ` Flavio Bruno Leitner
2004-02-14  3:33 Rafael D'Halleweyn (List)
2004-02-14  8:21 ` Andrew Morton