2.5.46: kernel BUG at kernel/timer.c:333!

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* 2.5.46: kernel BUG at kernel/timer.c:333!
@ 2002-11-09 15:07 Miquel van Smoorenburg
  2002-11-09 18:51 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Miquel van Smoorenburg @ 2002-11-09 15:07 UTC (permalink / raw)
  To: linux-kernel

I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
lots of disk- and network I/O).

The last crash I posted here was a bit garbled. So I tried again,
this one is clean. Crash appears to be timer related ?

kernel BUG at kernel/timer.c:333!
invalid operand: 0000
CPU:    1
EIP:    0060:[<c01227a9>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010083
eax: c1a16140   ebx: c1a1638c     ecx: f70b65e4       edx: c1a16140
esi: c1a15aa0   edi: f70b65e4     ebp: c1a162b0       esp: c1b25db4
ds: 0068   es: 0068   ss: 0068
Stack: c1a16ac0 c1a15aa0 c1b24000 00000001 c0122ca9 c1a15aa0 c1a162b0 c1a16ac0
       00000000 c1b24000 00000001 00000001 c011f7d5 00000000 00000001 c03d9660
       fffffffe 00000020 c042f424 c042f424 c011f4da c03d9660 00000004 00000001
 [<c0122ca9>] run_timer_tasklet+0x5d/0x15c
 [<c011f7d5>] tasklet_hi_action+0x85/0xe0
 [<c011f4da>] do_softirq+0x5a/0xac
 [<c0112adf>] smp_apic_timer_interrupt+0x113/0x124
 [<c0109476>] apic_timer_interrupt+0x1a/0x20
 [<c0135fd5>] shrink_cache+0x185/0x350
 [<c0136850>] shrink_zone+0x80/0x88
 [<c0136a5e>] balance_pgdat+0x9e/0xfc
 [<c0136bc4>] kswapd+0x108/0x112
 [<c0136abc>] kswapd+0x0/0x112
 [<c0118058>] autoremove_wake_function+0x0/0x40
 [<c0106e8d>] kernel_thread_helper+0x5/0xc
Code: 0f 0b 4d 01 ed d2 29 c0 8b 39 8b 41 08 89 c2 2b 56 04 81 fa


>>EIP; c01227a9 <cascade+25/f8>   <=====

>>eax; c1a16140 <END_OF_CODE+158eb98/????>
>>ebx; c1a1638c <END_OF_CODE+158ede4/????>
>>ecx; f70b65e4 <END_OF_CODE+36c2f03c/????>
>>edx; c1a16140 <END_OF_CODE+158eb98/????>
>>esi; c1a15aa0 <END_OF_CODE+158e4f8/????>
>>edi; f70b65e4 <END_OF_CODE+36c2f03c/????>
>>ebp; c1a162b0 <END_OF_CODE+158ed08/????>
>>esp; c1b25db4 <END_OF_CODE+169e80c/????>

Code;  c01227a9 <cascade+25/f8>
00000000 <_EIP>:
Code;  c01227a9 <cascade+25/f8>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c01227ab <cascade+27/f8>
   2:   4d                        dec    %ebp
Code;  c01227ac <cascade+28/f8>
   3:   01 ed                     add    %ebp,%ebp
Code;  c01227ae <cascade+2a/f8>
   5:   d2 29                     shrb   %cl,(%ecx)
Code;  c01227b0 <cascade+2c/f8>
   7:   c0 8b 39 8b 41 08 89      rorb   $0x89,0x8418b39(%ebx)
Code;  c01227b7 <cascade+33/f8>
   e:   c2 2b 56                  ret    $0x562b
Code;  c01227ba <cascade+36/f8>
  11:   04 81                     add    $0x81,%al
Code;  c01227bc <cascade+38/f8>
  13:   fa                        cli    


Perhaps it has something to do with the following debug message
I see during boot:

... bootmessages ...
slab: reap timer started for cpu 0
slab: reap timer started for cpu 1
Starting kswapd
highmem bounce pool size: 64 pages
aio_setup: sizeof(struct page) = 40
[f7f8e040] eventpoll: driver installed.
Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
Call Trace:
 [<c0117be8>] __might_sleep+0x54/0x58
 [<c013578c>] set_shrinker+0x3c/0x7c
 [<c01686b4>] mb_cache_create+0x1c4/0x244
 [<c0168380>] mb_cache_shrink_fn+0x0/0x170
 [<c01050ab>] init+0x47/0x1ac
 [<c0105064>] init+0x0/0x1ac
 [<c0106e8d>] kernel_thread_helper+0x5/0xc

Journalled Block Device driver loaded
Capability LSM initialized
Serial: 8250/16550 driver $Revision: 1.90 $ IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
... continuess booting ...

Mike.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.5.46: kernel BUG at kernel/timer.c:333!
  2002-11-09 15:07 2.5.46: kernel BUG at kernel/timer.c:333! Miquel van Smoorenburg
@ 2002-11-09 18:51 ` Andrew Morton
  2002-11-10 14:32   ` Miquel van Smoorenburg
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2002-11-09 18:51 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-kernel

Miquel van Smoorenburg wrote:
> 
> I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
> lots of disk- and network I/O).
> 
> The last crash I posted here was a bit garbled. So I tried again,
> this one is clean. Crash appears to be timer related ?
> 
> kernel BUG at kernel/timer.c:333!

There are timer fixes in Linus's current tree.  The problem which
they address could cause this BUG.

>..
> Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
> Call Trace:
>  [<c0117be8>] __might_sleep+0x54/0x58
>  [<c013578c>] set_shrinker+0x3c/0x7c
>  [<c01686b4>] mb_cache_create+0x1c4/0x244
>  [<c0168380>] mb_cache_shrink_fn+0x0/0x170
>  [<c01050ab>] init+0x47/0x1ac
>  [<c0105064>] init+0x0/0x1ac
>  [<c0106e8d>] kernel_thread_helper+0x5/0xc

That's different.  A fix for this is in Linus's tree.

So..  Please grab an update from
ftp://ftp.kernel.org/pub/linux/kernel/v2.5/snapshots/
or retest 2.5.47.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.5.46: kernel BUG at kernel/timer.c:333!
  2002-11-09 18:51 ` Andrew Morton
@ 2002-11-10 14:32   ` Miquel van Smoorenburg
  2002-11-10 17:13     ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Miquel van Smoorenburg @ 2002-11-10 14:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

According to Andrew Morton:
> Miquel van Smoorenburg wrote:
> > I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
> > lots of disk- and network I/O).
> > 
> > kernel BUG at kernel/timer.c:333!
> 
> There are timer fixes in Linus's current tree.  The problem which
> they address could cause this BUG.

I've booted 2.5.46bk5 on the machine, and it has been running for over
2 hours with extra heavy diskio. That reliably crashed the machine
in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.

> > Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
> > Call Trace:
> >  [<c0117be8>] __might_sleep+0x54/0x58
> >  [<c013578c>] set_shrinker+0x3c/0x7c
> >  [<c01686b4>] mb_cache_create+0x1c4/0x244
> >  [<c0168380>] mb_cache_shrink_fn+0x0/0x170
> >  [<c01050ab>] init+0x47/0x1ac
> >  [<c0105064>] init+0x0/0x1ac
> >  [<c0106e8d>] kernel_thread_helper+0x5/0xc
> 
> That's different.  A fix for this is in Linus's tree.

Indeed, I don't see that one anymore in -bk5

I'm still seeing the buffer layer error at fs/buffer.c:1623,
though. Happens when a blockdev is close()d. Is a fix for
this in -mm2? Does -mm2 include -bk5 ?  If so I'll put that
on it and keep an eye on it tomorrow, see what happens.

Debug messages I'm still seeing:
(note that I compiled IPv6 into the kernel since we're slowly moving
our network to IPv6 but that it is otherwise unused right now, and
that the previous kernels that crashed on me didn't have IPv6 in it)

Uninitialised timer!
This is just a warning.  Your computer is OK
function=0xc0285748, data=0xf78a6680
Call Trace:
 [<c0122610>] check_timer_failed+0x40/0x54
 [<c0285748>] igmp6_timer_handler+0x0/0x58
 [<c0122b12>] del_timer+0x16/0x84
 [<c02855d8>] igmp6_join_group+0x94/0x124
 [<c028464c>] igmp6_group_added+0xcc/0xd8
 [<c02885ab>] tcp_v6_err+0x3a7/0x63c
 [<c0284a46>] ipv6_dev_mc_inc+0x31e/0x330
 [<c027443e>] addrconf_join_solict+0x3a/0x44
 [<c0275a3f>] addrconf_dad_start+0x13/0x15c
 [<c0275354>] addrconf_add_linklocal+0x28/0x44
 [<c0275407>] addrconf_dev_config+0x97/0xa4
 [<c02754ee>] addrconf_notify+0x52/0xc0
 [<c0126037>] notifier_call_chain+0x1f/0x38
 [<c02259f6>] dev_open+0xa6/0xb0
 [<c0226ca1>] dev_change_flags+0x51/0x104
 [<c025bad0>] devinet_ioctl+0x2bc/0x598
 [<c025e2e7>] inet_ioctl+0x77/0xb4
 [<c021f41b>] sock_ioctl+0x267/0x298
 [<c0153c09>] sys_ioctl+0x22d/0x27c
 [<c01095b1>] error_code+0x2d/0x38
 [<c0108b6f>] syscall_call+0x7/0xb

buffer layer error at fs/buffer.c:1623
Pass this trace through ksymoops for reporting
Call Trace:
 [<c0145753>] __buffer_error+0x33/0x38
 [<c0146e6f>] __block_write_full_page+0x7f/0x3bc
 [<c0147ef9>] block_write_full_page+0x2d/0x9c
 [<c014a414>] blkdev_get_block+0x0/0x48
 [<c014a513>] blkdev_writepage+0xf/0x14
 [<c014a414>] blkdev_get_block+0x0/0x48
 [<c0161857>] mpage_writepages+0x21b/0x398
 [<c014a504>] blkdev_writepage+0x0/0x14
 [<c0137cfb>] __pagevec_free+0x1f/0x28
 [<c0135895>] release_pages+0x171/0x17c
 [<c014b5ad>] generic_writepages+0x11/0x15
 [<c013e068>] do_writepages+0x18/0x30
 [<c012f035>] filemap_fdatawrite+0x51/0x68
 [<c014596f>] sync_blockdev+0x1b/0x3c
 [<c014b3bf>] blkdev_put+0x7b/0x19c
 [<c014b4f2>] blkdev_close+0x12/0x18
 [<c0145410>] __fput+0x30/0xf8
 [<c01453dc>] fput+0x14/0x18
 [<c01437cc>] filp_close+0xb0/0xbc
 [<c014384e>] sys_close+0x76/0xa4
 [<c0108b6f>] syscall_call+0x7/0xb

Thanks,

Mike.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.5.46: kernel BUG at kernel/timer.c:333!
  2002-11-10 14:32   ` Miquel van Smoorenburg
@ 2002-11-10 17:13     ` Andrew Morton
  2002-11-11 10:57       ` Miquel van Smoorenburg
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2002-11-10 17:13 UTC (permalink / raw)
  To: Miquel van Smoorenburg, David S. Miller; +Cc: linux-kernel

Miquel van Smoorenburg wrote:
> 
> According to Andrew Morton:
> > Miquel van Smoorenburg wrote:
> > > I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
> > > lots of disk- and network I/O).
> > >
> > > kernel BUG at kernel/timer.c:333!
> >
> > There are timer fixes in Linus's current tree.  The problem which
> > they address could cause this BUG.
> 
> I've booted 2.5.46bk5 on the machine, and it has been running for over
> 2 hours with extra heavy diskio. That reliably crashed the machine
> in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.

OK, thanks.

> 
> I'm still seeing the buffer layer error at fs/buffer.c:1623,
> though. Happens when a blockdev is close()d. Is a fix for
> this in -mm2? Does -mm2 include -bk5 ?  If so I'll put that
> on it and keep an eye on it tomorrow, see what happens.

This is a blockdev which was under mmap(), yes?  No, I haven't looked at
that yet.  It'll be a matter of just killing the warning.

mmapping a blockdev is a pretty dopey thing to do, btw.  It doesn't
allow the use of highmem, the IO uses tiny BIOs (in fact I think
it uses 512-byte or 1k blocksize too) and there are buffer_heads
all over the place.  You'll get better results from mmapping a
regular file.
 
> Debug messages I'm still seeing:
> (note that I compiled IPv6 into the kernel since we're slowly moving
> our network to IPv6 but that it is otherwise unused right now, and
> that the previous kernels that crashed on me didn't have IPv6 in it)
> 
> Uninitialised timer!
> This is just a warning.  Your computer is OK
> function=0xc0285748, data=0xf78a6680
> Call Trace:
>  [<c0122610>] check_timer_failed+0x40/0x54
>  [<c0285748>] igmp6_timer_handler+0x0/0x58
>  [<c0122b12>] del_timer+0x16/0x84
>  [<c02855d8>] igmp6_join_group+0x94/0x124

I missed one there.


--- 25/net/ipv6/mcast.c~ip6-mcast-timer	Sun Nov 10 09:12:28 2002
+++ 25-akpm/net/ipv6/mcast.c	Sun Nov 10 09:12:44 2002
@@ -296,6 +296,7 @@ int ipv6_dev_mc_inc(struct net_device *d
 	}
 
 	memset(mc, 0, sizeof(struct ifmcaddr6));
+	init_timer(&mc->mca_timer);
 	mc->mca_timer.function = igmp6_timer_handler;
 	mc->mca_timer.data = (unsigned long) mc;
 

_

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.5.46: kernel BUG at kernel/timer.c:333!
  2002-11-10 17:13     ` Andrew Morton
@ 2002-11-11 10:57       ` Miquel van Smoorenburg
  2002-11-11 18:13         ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Miquel van Smoorenburg @ 2002-11-11 10:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

According to Andrew Morton:
> Miquel van Smoorenburg wrote:
> > 
> > I've booted 2.5.46bk5 on the machine, and it has been running for over
> > 2 hours with extra heavy diskio. That reliably crashed the machine
> > in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.
> 
> OK, thanks.

It survived the night and is still up. Looks like it runs slightly
faster than 2.4.20-X.

> This is a blockdev which was under mmap(), yes?  No, I haven't looked at
> that yet.  It'll be a matter of just killing the warning.

OK.

> mmapping a blockdev is a pretty dopey thing to do, btw.  It doesn't
> allow the use of highmem, the IO uses tiny BIOs (in fact I think
> it uses 512-byte or 1k blocksize too) and there are buffer_heads
> all over the place.  You'll get better results from mmapping a
> regular file.

It's just that the news server uses its own 'filesystem'. It does
normal read/write i/o on it, but the allocation bitmap at the
beginning of the 'file' is mmap()ed. Using a regular file means
creating a 160 GB file, the triple indirect blocks will probably
kill performance.

I guess that means I have to resurrect rawfs, then (a filesystem
I wrote for 2.2 that shows partitions as fixed-size files). But
that seems so .. unnecessary.

Mike.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.5.46: kernel BUG at kernel/timer.c:333!
  2002-11-11 10:57       ` Miquel van Smoorenburg
@ 2002-11-11 18:13         ` Andrew Morton
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2002-11-11 18:13 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-kernel

Miquel van Smoorenburg wrote:
> 
> According to Andrew Morton:
> > Miquel van Smoorenburg wrote:
> > >
> > > I've booted 2.5.46bk5 on the machine, and it has been running for over
> > > 2 hours with extra heavy diskio. That reliably crashed the machine
> > > in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.
> >
> > OK, thanks.
> 
> It survived the night and is still up. Looks like it runs slightly
> faster than 2.4.20-X.

Good, thanks.

> ..
> > mmapping a blockdev is a pretty dopey thing to do, btw.  It doesn't
> > allow the use of highmem, the IO uses tiny BIOs (in fact I think
> > it uses 512-byte or 1k blocksize too) and there are buffer_heads
> > all over the place.  You'll get better results from mmapping a
> > regular file.
> 
> It's just that the news server uses its own 'filesystem'. It does
> normal read/write i/o on it, but the allocation bitmap at the
> beginning of the 'file' is mmap()ed. Using a regular file means
> creating a 160 GB file, the triple indirect blocks will probably
> kill performance.

hm, OK.  mmapping just a little bit in that manner isn't so bad I
guess.  But the general read() and write() implementation for blockdevs
is not as efficient as that for, say, ext2 files.

Probably I could turn on the direct-to-bio stuff for blockdevs if there is
no filesystem mounted.  But with the regular open of /dev/hda1 there's no
way to prevent that.

The really scary problem is if some smarty tries to modify a blockdev
with MAP_SHARED while there's a live filesystem mounted.  Even if they
modify "safe" parts of the filesystem, this will wreck the fs if it has
blocksize < pagesize.  I'm not aware of a robust way of preventing this.

> I guess that means I have to resurrect rawfs, then (a filesystem
> I wrote for 2.2 that shows partitions as fixed-size files). But
> that seems so .. unnecessary.
> 

What you're doing now should be OK.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2002-11-11 18:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-11-09 15:07 2.5.46: kernel BUG at kernel/timer.c:333! Miquel van Smoorenburg
2002-11-09 18:51 ` Andrew Morton
2002-11-10 14:32   ` Miquel van Smoorenburg
2002-11-10 17:13     ` Andrew Morton
2002-11-11 10:57       ` Miquel van Smoorenburg
2002-11-11 18:13         ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox