* 2.4.10-pre5: Bug in alloc_pages
@ 2001-09-08 11:04 Nick Piggin
From: Nick Piggin @ 2001-09-08 11:04 UTC
To: Linux-Kernel
Here are a few Oopses which appeared in 2.4.10-pre5 (not in pre4). The first
2 appeared during the startup scripts and the next ones appeared over the
next 20 minutes or so. I'd be happy to try patches. Please CC me.
Nick
kernel BUG at page_alloc.c:204!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012c3b6>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 00000020 ebx: c1008a80 ecx: 00000000 edx: c3f8b920
esi: c1000000 edi: 00001000 ebp: c0209080 esp: c130da34
ds: 0018 es: 0018 ss: 0018
Process squid (pid: 462, stackpage=c130d000)
Stack: c01e7d60 c01e7d53 000000cc 0000022a 00000282 00000000 c0209080
c18cd040
00000001 c1008ac0 c01248e8 c1008ac0 c12cf840 c12cf7e0 c12cf780
c0209080
c02092b8 00000000 c02092b0 c012c5aa 00000001 000001d2 c18cd0d8
00003644
Call Trace: [<c01248e8>] [<c012c5aa>] [<c01249a0>] [<c0125ab0>] [<c0122895>]
[<c01229fe>] [<c014630b>] [<c0125122>] [<c0111a10>] [<c0111ce5>]
[<c0111a10>]
[<c0106e0c>] [<c01db8b0>] [<c0111a10>] [<c0106e0c>] [<c0120018>]
[<c01db8b0>]
[<c014745c>] [<c0148081>] [<c0188fdb>] [<c0112624>] [<c012c5aa>]
[<c0147b10>]
[<c013ad3d>] [<c013afa9>] [<c013bf2b>] [<c0105907>] [<c0106d1b>]
Code: 0f 0b 83 c4 0c 89 d8 e9 a4 fe ff ff 8b 43 18 a9 80 00 00 00
>>EIP; c012c3b6 <rmqueue+196/280> <=====
Trace; c01248e8 <add_to_page_cache_unique+a8/b0>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c01249a0 <read_cluster_nonblocking+b0/100>
Trace; c0125ab0 <filemap_nopage+1d0/400>
Trace; c0122894 <do_no_page+44/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c014630a <update_atime+4a/50>
Trace; c0125122 <do_generic_file_read+232/530>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Trace; c01db8b0 <clear_user+30/40>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Trace; c0120018 <exec_usermodehelper+298/3c0>
Trace; c01db8b0 <clear_user+30/40>
Trace; c014745c <padzero+1c/20>
Trace; c0148080 <load_elf_binary+570/a50>
Trace; c0188fda <submit_bh+3a/80>
Trace; c0112624 <schedule+1f4/3f0>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c0147b10 <load_elf_binary+0/a50>
Trace; c013ad3c <search_binary_handler+6c/190>
Trace; c013afa8 <do_execve+148/200>
Trace; c013bf2a <getname+6a/b0>
Trace; c0105906 <sys_execve+36/60>
Trace; c0106d1a <system_call+32/38>
Code; c012c3b6 <rmqueue+196/280>
00000000 <_EIP>:
Code; c012c3b6 <rmqueue+196/280> <=====
0: 0f 0b ud2a <=====
Code; c012c3b8 <rmqueue+198/280>
2: 83 c4 0c add $0xc,%esp
Code; c012c3ba <rmqueue+19a/280>
5: 89 d8 mov %ebx,%eax
Code; c012c3bc <rmqueue+19c/280>
7: e9 a4 fe ff ff jmp fffffeb0 <_EIP+0xfffffeb0> c012c266
<rmqueue+46/280>
Code; c012c3c2 <rmqueue+1a2/280>
c: 8b 43 18 mov 0x18(%ebx),%eax
Code; c012c3c4 <rmqueue+1a4/280>
f: a9 80 00 00 00 test $0x80,%eax
kernel BUG at page_alloc.c:204!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012c3b6>]
EFLAGS: 00010282
eax: 00000020 ebx: c1008ac0 ecx: 00000005 edx: c3f8b620
esi: c1000000 edi: 00001000 ebp: c0209080 esp: c2225df0
ds: 0018 es: 0018 ss: 0018
Process xfs-xtt (pid: 450, stackpage=c2225000)
Stack: c01e7d60 c01e7d53 000000cc 0000022b 00000286 00000000 c0209080
0044c35c
c0279fa0 c3fbd300 c018bec0 c0279fa0 c3fbd300 0044c35c 00000028
c0209080
c02092b8 00000000 c02092b0 c012c5aa 00000001 000001d2 00000000
c3f8b620
Call Trace: [<c018bec0>] [<c012c5aa>] [<c01227cf>] [<c012292a>] [<c01229fe>]
[<c014630b>] [<c0125122>] [<c0111a10>] [<c0111ce5>] [<c01254eb>]
[<c0125420>]
[<c0131ed3>] [<c0118a4b>] [<c0111a10>] [<c0106e0c>]
Code: 0f 0b 83 c4 0c 89 d8 e9 a4 fe ff ff 8b 43 18 a9 80 00 00 00
>>EIP; c012c3b6 <rmqueue+196/280> <=====
Trace; c018bec0 <start_request+170/220>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c01227ce <do_anonymous_page+3e/c0>
Trace; c012292a <do_no_page+da/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c014630a <update_atime+4a/50>
Trace; c0125122 <do_generic_file_read+232/530>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c01254ea <generic_file_read+6a/80>
Trace; c0125420 <file_read_actor+0/60>
Trace; c0131ed2 <sys_read+82/d0>
Trace; c0118a4a <do_softirq+5a/a0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Code; c012c3b6 <rmqueue+196/280>
00000000 <_EIP>:
Code; c012c3b6 <rmqueue+196/280> <=====
0: 0f 0b ud2a <=====
Code; c012c3b8 <rmqueue+198/280>
2: 83 c4 0c add $0xc,%esp
Code; c012c3ba <rmqueue+19a/280>
5: 89 d8 mov %ebx,%eax
Code; c012c3bc <rmqueue+19c/280>
7: e9 a4 fe ff ff jmp fffffeb0 <_EIP+0xfffffeb0> c012c266
<rmqueue+46/280>
Code; c012c3c2 <rmqueue+1a2/280>
c: 8b 43 18 mov 0x18(%ebx),%eax
Code; c012c3c4 <rmqueue+1a4/280>
f: a9 80 00 00 00 test $0x80,%eax
kernel BUG at page_alloc.c:204!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012c3b6>]
EFLAGS: 00010286
eax: 00000020 ebx: c1008a80 ecx: 00000005 edx: c1f221c0
esi: c1000000 edi: 00001000 ebp: c0209080 esp: c141ddb4
ds: 0018 es: 0018 ss: 0018
Process cc1 (pid: 608, stackpage=c141d000)
Stack: c01e7d60 c01e7d53 000000cc 0000022a 00000286 00000000 c0209080
c1d6f740
c141c000 c1009100 c01248e8 c1009100 c023e840 c023e7e0 c023e780
c0209080
c02092b8 00000000 c02092b0 c012c5aa 00000001 000001d2 c1d6f7d8
0000001f
Call Trace: [<c01248e8>] [<c012c5aa>] [<c01249a0>] [<c0125ab0>] [<c01227cf>]
[<c0122895>] [<c01229fe>] [<c01215ad>] [<c0111a10>] [<c0111ce5>]
[<c0123893>]
[<c01317d1>] [<c01316d2>] [<c013bf2b>] [<c0131a04>] [<c0111a10>]
[<c0106e0c>]
Code: 0f 0b 83 c4 0c 89 d8 e9 a4 fe ff ff 8b 43 18 a9 80 00 00 00
>>EIP; c012c3b6 <rmqueue+196/280> <=====
Trace; c01248e8 <add_to_page_cache_unique+a8/b0>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c01249a0 <read_cluster_nonblocking+b0/100>
Trace; c0125ab0 <filemap_nopage+1d0/400>
Trace; c01227ce <do_anonymous_page+3e/c0>
Trace; c0122894 <do_no_page+44/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c01215ac <zap_page_range+22c/260>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c0123892 <do_munmap+62/2a0>
Trace; c01317d0 <dentry_open+f0/160>
Trace; c01316d2 <filp_open+52/60>
Trace; c013bf2a <getname+6a/b0>
Trace; c0131a04 <sys_open+74/b0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Code; c012c3b6 <rmqueue+196/280>
00000000 <_EIP>:
Code; c012c3b6 <rmqueue+196/280> <=====
0: 0f 0b ud2a <=====
Code; c012c3b8 <rmqueue+198/280>
2: 83 c4 0c add $0xc,%esp
Code; c012c3ba <rmqueue+19a/280>
5: 89 d8 mov %ebx,%eax
Code; c012c3bc <rmqueue+19c/280>
7: e9 a4 fe ff ff jmp fffffeb0 <_EIP+0xfffffeb0> c012c266
<rmqueue+46/280>
Code; c012c3c2 <rmqueue+1a2/280>
c: 8b 43 18 mov 0x18(%ebx),%eax
Code; c012c3c4 <rmqueue+1a4/280>
f: a9 80 00 00 00 test $0x80,%eax
kernel BUG at page_alloc.c:75!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012bf72>]
EFLAGS: 00010282
eax: 0000001f ebx: 00000000 ecx: 00000000 edx: 00000000
esi: c1008a80 edi: 00000007 ebp: 00000001 esp: c3fc1f60
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 4, stackpage=c3fc1000)
Stack: c01e7d60 c01e7d53 0000004b c0145b22 c3fc1f7c 00000001 00000000
c1008a9c
00000000 00000000 c1008a80 00000007 00000001 c012b368 00000002
00000006
000001c0 00000019 00000001 c012b969 000001c0 00000001 000001c0
00000001
Call Trace: [<c0145b22>] [<c012b368>] [<c012b969>] [<c012ba3e>] [<c0105000>]
[<c0105000>] [<c01054d6>] [<c012b9e0>]
Code: 0f 0b 83 c4 0c 8b 1d 6c 0c 26 c0 89 f2 29 da c1 fa 06 3b 15
>>EIP; c012bf72 <__free_pages_ok+42/2f0> <=====
Trace; c0145b22 <prune_icache+92/140>
Trace; c012b368 <page_launder+548/8e0>
Trace; c012b968 <do_try_to_free_pages+48/c0>
Trace; c012ba3e <kswapd+5e/c0>
Trace; c0105000 <_stext+0/0>
Trace; c0105000 <_stext+0/0>
Trace; c01054d6 <kernel_thread+26/30>
Trace; c012b9e0 <kswapd+0/c0>
Code; c012bf72 <__free_pages_ok+42/2f0>
00000000 <_EIP>:
Code; c012bf72 <__free_pages_ok+42/2f0> <=====
0: 0f 0b ud2a <=====
Code; c012bf74 <__free_pages_ok+44/2f0>
2: 83 c4 0c add $0xc,%esp
Code; c012bf76 <__free_pages_ok+46/2f0>
5: 8b 1d 6c 0c 26 c0 mov 0xc0260c6c,%ebx
Code; c012bf7c <__free_pages_ok+4c/2f0>
b: 89 f2 mov %esi,%edx
Code; c012bf7e <__free_pages_ok+4e/2f0>
d: 29 da sub %ebx,%edx
Code; c012bf80 <__free_pages_ok+50/2f0>
f: c1 fa 06 sar $0x6,%edx
Code; c012bf84 <__free_pages_ok+54/2f0>
12: 3b 15 00 00 00 00 cmp 0x0,%edx
kernel BUG at vmscan.c:451!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012ad54>]
EFLAGS: 00010286
eax: 0000001c ebx: c1008a9c ecx: 00000004 edx: c1f223c0
esi: c18cd7d8 edi: c1008a80 ebp: c0209098 esp: c3453df8
ds: 0018 es: 0018 ss: 0018
Process modprobe (pid: 905, stackpage=c3453000)
Stack: c01e7b29 c01e7b20 000001c3 c0209080 c02092b8 00000001 00000001
c012c4e5
c0209080 00000000 c02092bc 00000000 c02092b0 c012c5e0 c02092b0
00000000
00000001 00000001 00000001 000001d2 00000000 c1f223c0 00000001
c27adae0
Call Trace: [<c012c4e5>] [<c012c5e0>] [<c01227cf>] [<c012292a>] [<c01229fe>]
[<c014630b>] [<c0125122>] [<c0111a10>] [<c0111ce5>] [<c01254eb>]
[<c0125420>]
[<c0131ed3>] [<c0150800>] [<c0111a10>] [<c0106e0c>]
Code: 0f 0b 83 c4 0c e9 7c ff ff ff 89 f6 8b 47 18 a9 80 00 00 00
>>EIP; c012ad54 <reclaim_page+3a4/470> <=====
Trace; c012c4e4 <__alloc_pages_limit+44/a0>
Trace; c012c5e0 <__alloc_pages+80/2b0>
Trace; c01227ce <do_anonymous_page+3e/c0>
Trace; c012292a <do_no_page+da/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c014630a <update_atime+4a/50>
Trace; c0125122 <do_generic_file_read+232/530>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c01254ea <generic_file_read+6a/80>
Trace; c0125420 <file_read_actor+0/60>
Trace; c0131ed2 <sys_read+82/d0>
Trace; c0150800 <ext2_file_lseek+0/b0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Code; c012ad54 <reclaim_page+3a4/470>
00000000 <_EIP>:
Code; c012ad54 <reclaim_page+3a4/470> <=====
0: 0f 0b ud2a <=====
Code; c012ad56 <reclaim_page+3a6/470>
2: 83 c4 0c add $0xc,%esp
Code; c012ad58 <reclaim_page+3a8/470>
5: e9 7c ff ff ff jmp ffffff86 <_EIP+0xffffff86> c012acda
<reclaim_page+32a/470>
Code; c012ad5e <reclaim_page+3ae/470>
a: 89 f6 mov %esi,%esi
Code; c012ad60 <reclaim_page+3b0/470>
c: 8b 47 18 mov 0x18(%edi),%eax
Code; c012ad62 <reclaim_page+3b2/470>
f: a9 80 00 00 00 test $0x80,%eax
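
The `>>EIP;` and `Trace;` lines in the dumps above are ksymoops output that maps raw addresses to `symbol+offset/size`. As a rough illustration only (this is not how ksymoops is implemented), the lookup can be sketched in C. The table below is hypothetical: its start addresses are derived from the decoded lines above as address minus offset (for example c012c3b6 - 0x196 = c012c220 for rmqueue), and the names `struct ksym`, `ksym_lookup` and `ksym_format` are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One entry of a hypothetical symbol table: start address, size, name. */
struct ksym {
    unsigned long start;
    unsigned long size;
    const char *name;
};

/*
 * Illustrative table, sorted by start address. Each start address is the
 * decoded address minus its offset, and each size is the value after the
 * slash, both taken from the oops dumps above.
 */
static const struct ksym example_syms[] = {
    { 0xc012ad54UL - 0x3a4, 0x470, "reclaim_page" },    /* c012a9b0 */
    { 0xc012bf72UL - 0x42,  0x2f0, "__free_pages_ok" }, /* c012bf30 */
    { 0xc012c3b6UL - 0x196, 0x280, "rmqueue" },         /* c012c220 */
    { 0xc012c5aaUL - 0x4a,  0x2b0, "__alloc_pages" },   /* c012c560 */
};

/* Return the symbol whose [start, start + size) range contains addr. */
static const struct ksym *ksym_lookup(const struct ksym *tab, size_t n,
                                      unsigned long addr)
{
    size_t lo = 0, hi = n;

    assert(tab != NULL);
    /* Binary search for the first entry with start > addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;                 /* addr is below every symbol */
    if (addr - tab[lo - 1].start >= tab[lo - 1].size)
        return NULL;                 /* addr falls in a gap between symbols */
    return &tab[lo - 1];
}

/* Write "name+offset/size" (hex, as in the dumps) to buf; 0 on success. */
static int ksym_format(const struct ksym *tab, size_t n, unsigned long addr,
                       char *buf, size_t len)
{
    const struct ksym *s = ksym_lookup(tab, n, addr);
    int written;

    if (s == NULL)
        return -1;
    written = snprintf(buf, len, "%s+%lx/%lx", s->name,
                       addr - s->start, s->size);
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

With this table, `ksym_format(example_syms, 4, 0xc012c3b6UL, buf, sizeof buf)` fills `buf` with `rmqueue+196/280`, matching the first `>>EIP;` line.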
* Re: 2.4.10-pre5: Bug in alloc_pages
[not found] <200109081850.f88Io0L01074@penguin.transmeta.com>
@ 2001-09-09 4:52 ` Nick Piggin
From: Nick Piggin @ 2001-09-09 4:52 UTC
To: Linus Torvalds, Linux-Kernel
I assume you have found the cause of this problem - pre6 seems to be
stable. Anyway, if you still want this information, it _is_ repeatable under
pre5 and I have not seen it on any other kernel. pre5 oopsed every time I
booted it (mostly when starting squid), and had some chance of oopsing
every time something memory-intensive was run, e.g. gcc.
It didn't seem to make any difference whether swap was on or off. I tested
this because I saw about 10 similar messages about bad swap pages (corrupted,
invalid, not found, I can't remember) while booting pre5 the first time;
however, they scrolled off the console and the system oopsed, then froze,
before I could record them.
The 5 oopses I sent happened over about 15 minutes; however, I think their
frequency would depend heavily on VM pressure.
Nick
----- Original Message -----
From: "Linus Torvalds" <torvalds@transmeta.com>
Newsgroups: linux.dev.kernel
To: <s3293115@student.anu.edu.au>; "Hugh Dickins" <hugh@veritas.com>;
"Alexander Viro" <viro@math.psu.edu>
Sent: Sunday, September 09, 2001 4:50 AM
Subject: Re: 2.4.10-pre5: Bug in alloc_pages
> [ Background for Al Viro: there's an oops in at least 2.4.10-pre5 that
> triggers in mm/page_alloc.c, line 204 - which implies that we are
> trying to allocate a page off the free list that is already on one of
> the page lists. Which implies rather serious MM corruption ]
>
> In article <000701c13856$14e01fe0$0200a8c0@W2K> you write:
> >Here are a few Oopses which appeared in 2.4.10-pre5 (not in pre4). The first
> >2 appeared during the startup scripts and the next ones appeared over the
> >next 20 minutes or so. I'd be happy to try patches. Please CC me.
>
> Nick, can you do some more debugging for me? The bug is definitely real,
> there's no question about it - I have now seen it myself, but I don't
> seem to be able to reproduce it on my machines. It seems to happen quite
> early for you..
>
> What I'd ask you to do is:
>
> - can you verify that it is repeatable under pre5? Does it happen every
> time, or at least easy to trigger?
>
> - can you try to trigger it some more under pre4? In particular, pre5
> doesn't actually have any MM changes _at_all_, which makes me suspect
> that maybe something in pre5 just made it easier to trigger. This is
> also why I'd like to hear whether it is really repeatable in pre5: if
> it's not repeatable in pre5, maybe you ran pre4 for a longish time
> and just didn't happen to hit it..
>
> I know that the above kind of testing is rather nasty and boring
> (especially as you'd end up having to reboot multiple times), but it
> would really help.. Thanks.
>
> If it really happens only under pre5, and never under pre4, then that is
> very interesting indeed. As mentioned, the pre4->pre5 thing doesn't
> actually change any of the VM code itself, so then there's something
> else going on. The only thing I can imagine right now is:
>
> - the initbootdata handling changed a bit. Does the problem go away if
> you copy 'arch/i386/kernel/setup.c' from pre4 into the pre5 tree?
>
> - Al Viro's FS-layer changes somehow trigger this bug, possibly by
> freeing some inode early. I don't have any real reason to suspect the
> FS changes, except for the fact that with no MM changes, the FS is
> the only other thing that has changed and is fairly intimate with MM
> stuff.
>
> Most of the pre4->pre5 changes are in fact things that I know cannot
> matter, simply because I don't even have them compiled into my kernels.
> Things like bluetooth, ARM, sparc, minix, telephony, framebuffer etc.
> This is why it would be so interesting to make sure that it really _is_
> pre5 only, and never happens in pre4..
>
> Hugh, if it turns out to be possible to trigger on pre4 too, I'm still
> going to blame your swap changes. So please give them a double look just
> in case..
>
> Nick, I don't have any real patches for you to test yet (except the
> suggestion to reverse i386/kernel/setup.c if you can't re-create it on
> pre4), but I'd be very grateful for as much information as you can
> possibly gather.. Things like patterns to how the oopses happen etc.
>
> Thanks,
> Linus
>
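
Linus's background note in the quoted message says the BUG at mm/page_alloc.c line 204 implies a page is being allocated off the free list while it is already on one of the page lists. The toy userspace model below sketches that kind of invariant check. It is not the kernel's code: `struct toy_page`, `struct toy_zone`, the `TOY_PG_*` flag values and `toy_rmqueue` are all hypothetical, and where the kernel would hit BUG() the sketch only counts the violation and returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; these are not the kernel's actual PG_* values. */
enum {
    TOY_PG_ACTIVE   = 1u << 0,  /* page is on an "active" list */
    TOY_PG_INACTIVE = 1u << 1   /* page is on an "inactive" list */
};
#define TOY_PG_ON_LIST (TOY_PG_ACTIVE | TOY_PG_INACTIVE)

struct toy_page {
    unsigned int flags;
    struct toy_page *next_free;  /* link in the zone's free list */
};

struct toy_zone {
    struct toy_page *free_list;
    unsigned long corrupt_hits;  /* how often the invariant was violated */
};

/* Push a page onto the free list without checking its flags. */
static void toy_free_page(struct toy_zone *z, struct toy_page *p)
{
    assert(z != NULL && p != NULL);
    p->next_free = z->free_list;
    z->free_list = p;
}

/*
 * Pop a page from the free list. A free page must not also be marked as
 * sitting on a page list; if it is, the bookkeeping is corrupt. The kernel
 * would BUG() at this point; this model records the hit and returns NULL.
 */
static struct toy_page *toy_rmqueue(struct toy_zone *z)
{
    struct toy_page *p;

    assert(z != NULL);
    p = z->free_list;
    if (p == NULL)
        return NULL;             /* nothing left to allocate */
    z->free_list = p->next_free;
    p->next_free = NULL;
    if (p->flags & TOY_PG_ON_LIST) {
        z->corrupt_hits++;       /* invariant violated */
        return NULL;
    }
    return p;
}
```

In this model the corruption is only detected at allocation time, long after whatever freed the page incorrectly has returned, which is consistent with the oopses showing up in unrelated processes (squid, cc1, modprobe).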