public inbox for linux-kernel@vger.kernel.org
* 2.4.10-pre5: Bug in alloc_pages
@ 2001-09-08 11:04 Nick Piggin
From: Nick Piggin @ 2001-09-08 11:04 UTC
  To: Linux-Kernel

Here are a few Oopses which appeared in 2.4.10-pre5 (not in pre4). The first
2 appeared during the startup scripts and the next ones appeared over the
next 20 minutes or so. I'd be happy to try patches. Please CC me.

Nick

kernel BUG at page_alloc.c:204!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012c3b6>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 00000020   ebx: c1008a80   ecx: 00000000   edx: c3f8b920
esi: c1000000   edi: 00001000   ebp: c0209080   esp: c130da34
ds: 0018   es: 0018   ss: 0018
Process squid (pid: 462, stackpage=c130d000)
Stack: c01e7d60 c01e7d53 000000cc 0000022a 00000282 00000000 c0209080 c18cd040
       00000001 c1008ac0 c01248e8 c1008ac0 c12cf840 c12cf7e0 c12cf780 c0209080
       c02092b8 00000000 c02092b0 c012c5aa 00000001 000001d2 c18cd0d8 00003644
Call Trace: [<c01248e8>] [<c012c5aa>] [<c01249a0>] [<c0125ab0>] [<c0122895>]
   [<c01229fe>] [<c014630b>] [<c0125122>] [<c0111a10>] [<c0111ce5>] [<c0111a10>]
   [<c0106e0c>] [<c01db8b0>] [<c0111a10>] [<c0106e0c>] [<c0120018>] [<c01db8b0>]
   [<c014745c>] [<c0148081>] [<c0188fdb>] [<c0112624>] [<c012c5aa>] [<c0147b10>]
   [<c013ad3d>] [<c013afa9>] [<c013bf2b>] [<c0105907>] [<c0106d1b>]
Code: 0f 0b 83 c4 0c 89 d8 e9 a4 fe ff ff 8b 43 18 a9 80 00 00 00

>>EIP; c012c3b6 <rmqueue+196/280>   <=====
Trace; c01248e8 <add_to_page_cache_unique+a8/b0>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c01249a0 <read_cluster_nonblocking+b0/100>
Trace; c0125ab0 <filemap_nopage+1d0/400>
Trace; c0122894 <do_no_page+44/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c014630a <update_atime+4a/50>
Trace; c0125122 <do_generic_file_read+232/530>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Trace; c01db8b0 <clear_user+30/40>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Trace; c0120018 <exec_usermodehelper+298/3c0>
Trace; c01db8b0 <clear_user+30/40>
Trace; c014745c <padzero+1c/20>
Trace; c0148080 <load_elf_binary+570/a50>
Trace; c0188fda <submit_bh+3a/80>
Trace; c0112624 <schedule+1f4/3f0>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c0147b10 <load_elf_binary+0/a50>
Trace; c013ad3c <search_binary_handler+6c/190>
Trace; c013afa8 <do_execve+148/200>
Trace; c013bf2a <getname+6a/b0>
Trace; c0105906 <sys_execve+36/60>
Trace; c0106d1a <system_call+32/38>
Code;  c012c3b6 <rmqueue+196/280>
00000000 <_EIP>:
Code;  c012c3b6 <rmqueue+196/280>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012c3b8 <rmqueue+198/280>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c012c3ba <rmqueue+19a/280>
   5:   89 d8                     mov    %ebx,%eax
Code;  c012c3bc <rmqueue+19c/280>
   7:   e9 a4 fe ff ff            jmp    fffffeb0 <_EIP+0xfffffeb0> c012c266 <rmqueue+46/280>
Code;  c012c3c2 <rmqueue+1a2/280>
   c:   8b 43 18                  mov    0x18(%ebx),%eax
Code;  c012c3c4 <rmqueue+1a4/280>
   f:   a9 80 00 00 00            test   $0x80,%eax
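
A note on reading the dump above: the `Code:` line begins with the bytes `0f 0b`, which ksymoops disassembles as `ud2a`, a deliberately undefined instruction, and executing it is what produces the "invalid operand" report. The C sketch below checks a raw `Code:` line for that two-byte prefix. The function name `code_starts_with_ud2` is hypothetical; it is not part of ksymoops or the kernel.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if an oops "Code:" line starts with
 * the two bytes 0f 0b (the x86 ud2 opcode), 0 otherwise. */
int code_starts_with_ud2(const char *line)
{
    unsigned int b0, b1;
    const char *p = strstr(line, "Code:");

    if (p == NULL)
        return 0;
    p += strlen("Code:");
    /* %x skips leading whitespace, so "Code: 0f 0b ..." parses directly. */
    if (sscanf(p, "%x %x", &b0, &b1) != 2)
        return 0;
    return b0 == 0x0f && b1 == 0x0b;
}
```

Only the raw `Code:` line is matched; the ksymoops `Code;` lines (with a semicolon) are ignored by this check.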

kernel BUG at page_alloc.c:204!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012c3b6>]
EFLAGS: 00010282
eax: 00000020   ebx: c1008ac0   ecx: 00000005   edx: c3f8b620
esi: c1000000   edi: 00001000   ebp: c0209080   esp: c2225df0
ds: 0018   es: 0018   ss: 0018
Process xfs-xtt (pid: 450, stackpage=c2225000)
Stack: c01e7d60 c01e7d53 000000cc 0000022b 00000286 00000000 c0209080 0044c35c
       c0279fa0 c3fbd300 c018bec0 c0279fa0 c3fbd300 0044c35c 00000028 c0209080
       c02092b8 00000000 c02092b0 c012c5aa 00000001 000001d2 00000000 c3f8b620
Call Trace: [<c018bec0>] [<c012c5aa>] [<c01227cf>] [<c012292a>] [<c01229fe>]
   [<c014630b>] [<c0125122>] [<c0111a10>] [<c0111ce5>] [<c01254eb>] [<c0125420>]
   [<c0131ed3>] [<c0118a4b>] [<c0111a10>] [<c0106e0c>]
Code: 0f 0b 83 c4 0c 89 d8 e9 a4 fe ff ff 8b 43 18 a9 80 00 00 00

>>EIP; c012c3b6 <rmqueue+196/280>   <=====
Trace; c018bec0 <start_request+170/220>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c01227ce <do_anonymous_page+3e/c0>
Trace; c012292a <do_no_page+da/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c014630a <update_atime+4a/50>
Trace; c0125122 <do_generic_file_read+232/530>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c01254ea <generic_file_read+6a/80>
Trace; c0125420 <file_read_actor+0/60>
Trace; c0131ed2 <sys_read+82/d0>
Trace; c0118a4a <do_softirq+5a/a0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Code;  c012c3b6 <rmqueue+196/280>
00000000 <_EIP>:
Code;  c012c3b6 <rmqueue+196/280>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012c3b8 <rmqueue+198/280>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c012c3ba <rmqueue+19a/280>
   5:   89 d8                     mov    %ebx,%eax
Code;  c012c3bc <rmqueue+19c/280>
   7:   e9 a4 fe ff ff            jmp    fffffeb0 <_EIP+0xfffffeb0> c012c266 <rmqueue+46/280>
Code;  c012c3c2 <rmqueue+1a2/280>
   c:   8b 43 18                  mov    0x18(%ebx),%eax
Code;  c012c3c4 <rmqueue+1a4/280>
   f:   a9 80 00 00 00            test   $0x80,%eax

kernel BUG at page_alloc.c:204!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012c3b6>]
EFLAGS: 00010286
eax: 00000020   ebx: c1008a80   ecx: 00000005   edx: c1f221c0
esi: c1000000   edi: 00001000   ebp: c0209080   esp: c141ddb4
ds: 0018   es: 0018   ss: 0018
Process cc1 (pid: 608, stackpage=c141d000)
Stack: c01e7d60 c01e7d53 000000cc 0000022a 00000286 00000000 c0209080 c1d6f740
       c141c000 c1009100 c01248e8 c1009100 c023e840 c023e7e0 c023e780 c0209080
       c02092b8 00000000 c02092b0 c012c5aa 00000001 000001d2 c1d6f7d8 0000001f
Call Trace: [<c01248e8>] [<c012c5aa>] [<c01249a0>] [<c0125ab0>] [<c01227cf>]
   [<c0122895>] [<c01229fe>] [<c01215ad>] [<c0111a10>] [<c0111ce5>] [<c0123893>]
   [<c01317d1>] [<c01316d2>] [<c013bf2b>] [<c0131a04>] [<c0111a10>] [<c0106e0c>]
Code: 0f 0b 83 c4 0c 89 d8 e9 a4 fe ff ff 8b 43 18 a9 80 00 00 00

>>EIP; c012c3b6 <rmqueue+196/280>   <=====
Trace; c01248e8 <add_to_page_cache_unique+a8/b0>
Trace; c012c5aa <__alloc_pages+4a/2b0>
Trace; c01249a0 <read_cluster_nonblocking+b0/100>
Trace; c0125ab0 <filemap_nopage+1d0/400>
Trace; c01227ce <do_anonymous_page+3e/c0>
Trace; c0122894 <do_no_page+44/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c01215ac <zap_page_range+22c/260>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c0123892 <do_munmap+62/2a0>
Trace; c01317d0 <dentry_open+f0/160>
Trace; c01316d2 <filp_open+52/60>
Trace; c013bf2a <getname+6a/b0>
Trace; c0131a04 <sys_open+74/b0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Code;  c012c3b6 <rmqueue+196/280>
00000000 <_EIP>:
Code;  c012c3b6 <rmqueue+196/280>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012c3b8 <rmqueue+198/280>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c012c3ba <rmqueue+19a/280>
   5:   89 d8                     mov    %ebx,%eax
Code;  c012c3bc <rmqueue+19c/280>
   7:   e9 a4 fe ff ff            jmp    fffffeb0 <_EIP+0xfffffeb0> c012c266 <rmqueue+46/280>
Code;  c012c3c2 <rmqueue+1a2/280>
   c:   8b 43 18                  mov    0x18(%ebx),%eax
Code;  c012c3c4 <rmqueue+1a4/280>
   f:   a9 80 00 00 00            test   $0x80,%eax

kernel BUG at page_alloc.c:75!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012bf72>]
EFLAGS: 00010282
eax: 0000001f   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: c1008a80   edi: 00000007   ebp: 00000001   esp: c3fc1f60
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 4, stackpage=c3fc1000)
Stack: c01e7d60 c01e7d53 0000004b c0145b22 c3fc1f7c 00000001 00000000 c1008a9c
       00000000 00000000 c1008a80 00000007 00000001 c012b368 00000002 00000006
       000001c0 00000019 00000001 c012b969 000001c0 00000001 000001c0 00000001
Call Trace: [<c0145b22>] [<c012b368>] [<c012b969>] [<c012ba3e>] [<c0105000>]
   [<c0105000>] [<c01054d6>] [<c012b9e0>]
Code: 0f 0b 83 c4 0c 8b 1d 6c 0c 26 c0 89 f2 29 da c1 fa 06 3b 15

>>EIP; c012bf72 <__free_pages_ok+42/2f0>   <=====
Trace; c0145b22 <prune_icache+92/140>
Trace; c012b368 <page_launder+548/8e0>
Trace; c012b968 <do_try_to_free_pages+48/c0>
Trace; c012ba3e <kswapd+5e/c0>
Trace; c0105000 <_stext+0/0>
Trace; c0105000 <_stext+0/0>
Trace; c01054d6 <kernel_thread+26/30>
Trace; c012b9e0 <kswapd+0/c0>
Code;  c012bf72 <__free_pages_ok+42/2f0>
00000000 <_EIP>:
Code;  c012bf72 <__free_pages_ok+42/2f0>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012bf74 <__free_pages_ok+44/2f0>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c012bf76 <__free_pages_ok+46/2f0>
   5:   8b 1d 6c 0c 26 c0         mov    0xc0260c6c,%ebx
Code;  c012bf7c <__free_pages_ok+4c/2f0>
   b:   89 f2                     mov    %esi,%edx
Code;  c012bf7e <__free_pages_ok+4e/2f0>
   d:   29 da                     sub    %ebx,%edx
Code;  c012bf80 <__free_pages_ok+50/2f0>
   f:   c1 fa 06                  sar    $0x6,%edx
Code;  c012bf84 <__free_pages_ok+54/2f0>
  12:   3b 15 00 00 00 00         cmp    0x0,%edx

kernel BUG at vmscan.c:451!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012ad54>]
EFLAGS: 00010286
eax: 0000001c   ebx: c1008a9c   ecx: 00000004   edx: c1f223c0
esi: c18cd7d8   edi: c1008a80   ebp: c0209098   esp: c3453df8
ds: 0018   es: 0018   ss: 0018
Process modprobe (pid: 905, stackpage=c3453000)
Stack: c01e7b29 c01e7b20 000001c3 c0209080 c02092b8 00000001 00000001 c012c4e5
       c0209080 00000000 c02092bc 00000000 c02092b0 c012c5e0 c02092b0 00000000
       00000001 00000001 00000001 000001d2 00000000 c1f223c0 00000001 c27adae0
Call Trace: [<c012c4e5>] [<c012c5e0>] [<c01227cf>] [<c012292a>] [<c01229fe>]
   [<c014630b>] [<c0125122>] [<c0111a10>] [<c0111ce5>] [<c01254eb>] [<c0125420>]
   [<c0131ed3>] [<c0150800>] [<c0111a10>] [<c0106e0c>]
Code: 0f 0b 83 c4 0c e9 7c ff ff ff 89 f6 8b 47 18 a9 80 00 00 00

>>EIP; c012ad54 <reclaim_page+3a4/470>   <=====
Trace; c012c4e4 <__alloc_pages_limit+44/a0>
Trace; c012c5e0 <__alloc_pages+80/2b0>
Trace; c01227ce <do_anonymous_page+3e/c0>
Trace; c012292a <do_no_page+da/e0>
Trace; c01229fe <handle_mm_fault+ce/e0>
Trace; c014630a <update_atime+4a/50>
Trace; c0125122 <do_generic_file_read+232/530>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0111ce4 <do_page_fault+2d4/4a0>
Trace; c01254ea <generic_file_read+6a/80>
Trace; c0125420 <file_read_actor+0/60>
Trace; c0131ed2 <sys_read+82/d0>
Trace; c0150800 <ext2_file_lseek+0/b0>
Trace; c0111a10 <do_page_fault+0/4a0>
Trace; c0106e0c <error_code+34/3c>
Code;  c012ad54 <reclaim_page+3a4/470>
00000000 <_EIP>:
Code;  c012ad54 <reclaim_page+3a4/470>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012ad56 <reclaim_page+3a6/470>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c012ad58 <reclaim_page+3a8/470>
   5:   e9 7c ff ff ff            jmp    ffffff86 <_EIP+0xffffff86> c012acda <reclaim_page+32a/470>
Code;  c012ad5e <reclaim_page+3ae/470>
   a:   89 f6                     mov    %esi,%esi
Code;  c012ad60 <reclaim_page+3b0/470>
   c:   8b 47 18                  mov    0x18(%edi),%eax
Code;  c012ad62 <reclaim_page+3b2/470>
   f:   a9 80 00 00 00            test   $0x80,%eax
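
One cross-check on the five dumps: in each of them the third word of the `Stack:` line equals the line number in the `kernel BUG at` message (0xcc = 204, 0x4b = 75, 0x1c3 = 451). That is consistent with the BUG path having pushed three printk arguments (format string, file name, line number) just before the trap, and the `add $0xc,%esp` following `ud2a` fits three 4-byte arguments. This reading is inferred from the dumps themselves and has not been checked against the pre5 source. A small C sketch of the comparison, with hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the third 32-bit word from an oops
 * "Stack:" line. Returns 0 on success, -1 if the line does not parse. */
int stack_third_word(const char *line, unsigned long *out)
{
    unsigned long w0, w1, w2;
    const char *p = strstr(line, "Stack:");

    if (p == NULL)
        return -1;
    if (sscanf(p + strlen("Stack:"), "%lx %lx %lx", &w0, &w1, &w2) != 3)
        return -1;
    *out = w2;
    return 0;
}

/* Hypothetical helper: extract NNN from "kernel BUG at file.c:NNN!".
 * Returns -1 if no line number follows the last colon. */
long bug_line_number(const char *msg)
{
    long n;
    const char *colon = strrchr(msg, ':');

    if (colon == NULL || sscanf(colon + 1, "%ld", &n) != 1)
        return -1;
    return n;
}
```

If the two values disagree for some dump, the stack has probably been decoded against the wrong frame or the message and stack come from different oopses.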



* Re: 2.4.10-pre5: Bug in alloc_pages
       [not found] <200109081850.f88Io0L01074@penguin.transmeta.com>
@ 2001-09-09  4:52 ` Nick Piggin
From: Nick Piggin @ 2001-09-09  4:52 UTC
  To: Linus Torvalds, Linux-Kernel

I assume you have found the cause of this problem, since pre6 seems to be
stable. In case you still want this information: it _is_ repeatable under
pre5, and I have not seen it on any other kernel. pre5 oopsed every time I
booted it (mostly when starting squid), and ran some chance of oopsing
whenever something memory-intensive was run, e.g. gcc.

It didn't seem to make any difference whether swap was on or off. I tested
this because I saw about 10 similar messages about bad swap pages (corrupted,
invalid, not found; I can't remember which) while booting pre5 the first
time. Unfortunately they scrolled off the console, and the system oopsed and
then froze before I could record them.

The 5 oopses I sent happened over about 15 minutes, but I think their
frequency would depend heavily on VM pressure.

Nick

----- Original Message -----
From: "Linus Torvalds" <torvalds@transmeta.com>
Newsgroups: linux.dev.kernel
To: <s3293115@student.anu.edu.au>; "Hugh Dickins" <hugh@veritas.com>;
"Alexander Viro" <viro@math.psu.edu>
Sent: Sunday, September 09, 2001 4:50 AM
Subject: Re: 2.4.10-pre5: Bug in alloc_pages


> [ Background for Al Viro: there's an oops in at least 2.4.10-pre5 that
>   triggers in mm/page_alloc.c, line 204 - which implies that we are
>   trying to allocate a page off the free list that is already on one of
>   the page lists. Which implies rather serious MM corruption ]
>
> In article <000701c13856$14e01fe0$0200a8c0@W2K> you write:
> >Here are a few Oopses which appeared in 2.4.10-pre5 (not in pre4). The first
> >2 appeared during the startup scripts and the next ones appeared over the
> >next 20 minutes or so. I'd be happy to try patches. Please CC me.
>
> Nick, can you do some more debugging for me? The bug is definitely real,
> there's no question about it - I have now seen it myself, but I don't
> seem to be able to reproduce it on my machines. It seems to happen quite
> early for you..
>
> What I'd ask you to do is:
>
>  - can you verify that it is repeatable under pre5? Does it happen every
>    time, or at least easy to trigger?
>
>  - can you try to trigger it some more under pre4? In particular, pre5
>    doesn't actually have any MM changes _at_all_, which makes me suspect
>    that maybe something in pre5 just made it easier to trigger. This is
>    also why I'd like to hear whether it is really repeatable in pre5: if
>    it's not repeatable in pre5, maybe you ran pre4 for a longish time
>    and just didn't happen to hit it..
>
> I know that the above kind of testing is rather nasty and boring
> (especially as you'd end up having to reboot multiple times), but it
> would really help.. Thanks.
>
> If it really happens only under pre5, and never under pre4, then that is
> very interesting indeed.  As mentioned, the pre4->pre5 thing doesn't
> actually change any of the VM code itself, so then there's something
> else going on. The only thing I can imagine right now is:
>
>  - the initbootdata handling changed a bit. Does the problem go away if
>    you copy 'arch/i386/kernel/setup.c' from pre4 into the pre5 tree?
>
>  - Al Viro's FS-layer changes somehow trigger this bug, possibly by
>    freeing some inode early. I don't have any real reason to suspect the
>    FS changes, except for the fact that with no MM changes, the FS is
>    the only other thing that has changed and is fairly intimate with MM
>    stuff.
>
> Most of the pre4->pre5 changes are in fact things that I know cannot
> matter, simply because I don't even have them compiled into my kernels.
> Things like bluetooth, ARM, sparc, minix, telephony, framebuffer etc.
> This is why it would be so interesting to make sure that it really _is_
> pre5 only, and never happens in pre4..
>
> Hugh, if it turns out to be possible to trigger on pre4 too, I'm still
> going to blame your swap changes. So please give them a double look just
> in case..
>
> Nick, I don't have any real patches for you to test yet (except the
> suggestion to reverse i386/kernel/setup.c if you can't re-create it on
> pre4), but I'd be very grateful for as much information as you can
> possibly gather.. Things like patterns to how the oopses happen etc.
>
> Thanks,
> Linus
>
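
The background note in the quoted message says the check at page_alloc.c:204 fires when a page being taken off the free list is still on one of the page lists. As a rough user-space model of that invariant (not the kernel's code), the C sketch below refuses to hand out a free-list page whose flags still mark it as listed. The struct, the flag names and their bit values are all assumptions made for illustration; they are not copied from the 2.4.10-pre5 headers.

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative flag bits; names and values are assumptions for this
 * model, not the kernel's real page flags. */
#define MODEL_PG_ACTIVE   (1u << 0)
#define MODEL_PG_INACTIVE (1u << 1)

struct model_page {
    unsigned int flags;
    struct model_page *next_free;
};

/* Returns 1 if the page is still marked as being on a page list. */
int model_page_on_lru(const struct model_page *page)
{
    return (page->flags & (MODEL_PG_ACTIVE | MODEL_PG_INACTIVE)) != 0;
}

/* Pop the head of the free list. Returns NULL if the list is empty or
 * if the head page violates the invariant (where the kernel would BUG). */
struct model_page *model_take_free(struct model_page **free_list)
{
    struct model_page *page = *free_list;

    if (page == NULL)
        return NULL;
    if (model_page_on_lru(page)) {
        fprintf(stderr, "model: free page still on a page list\n");
        return NULL;
    }
    *free_list = page->next_free;
    page->next_free = NULL;
    return page;
}
```

In this model the bad page is left at the head of the list so the corruption stays visible to the caller; the real allocator stops with an oops at that point instead.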


