* kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
@ 2002-07-23 16:56 David F Barrera
2002-07-23 19:24 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: David F Barrera @ 2002-07-23 16:56 UTC (permalink / raw)
To: linux-kernel
I have experienced the following errors while running a test suite (LTP
test suite) on the 2.4.26 kernel. Has anybody seen this problem, and, if
so, is there a patch for it? Thanks.
kernel BUG at page_alloc.c:92!
invalid operand: 0000
CPU: 7
EIP: 0010:[<c0132fae>] Not tainted
EFLAGS: 00010202
eax: 00000020 ebx: 00000009 ecx: c7fd0208 edx: c7fd0208
esi: fe0029fa edi: 00000000 ebp: ddeff009 esp: f6793eb4
ds: 0018 es: 0018 ss: 0018
Process top (pid: 4648, threadinfo=f6792000 task=f7320ce0)
Stack: c7fd0208 00000000 00000009 ddeff000 ddeff000 c011e137 f7320ce0 f5f4d8a0
       bffff9f1 00000009 fe0029fa 00000000 ddeff009 c011e0fe f6792000 ddeff000
       f5f4d8a0 c7fd0208 f5ba83c0 00000000 f5f4d8a0 ddeff000 ddeff000 c015cca3
Call Trace: [<c011e137>] [<c011e0fe>] [<c015cca3>] [<c015d016>] [<c013c9e8>]
   [<c013cb9a>] [<c010700b>]
Code: 0f 0b 5c 00 99 49 2d c0 8b 14 24 8b 42 14 83 e0 40 74 08 0f
mremap01: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
mremap01: page allocation failure. order:0, mode:0x0
mremap01: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
mremap01: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
----------------------------------------------------------------------------------------------------------------------------
Ksymoops output:
ksymoops 2.4.1 on i686 2.5.26. Options used
-v /boot/vmlinux-2.5.26 (specified)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.5.26/ (default)
-m /boot/System.map-2.5.26 (specified)
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Warning (compare_maps): ksyms_base symbol __wake_up_sync_R__ver___wake_up_sync not found in vmlinux. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol idle_cpu_R__ver_idle_cpu not found in vmlinux. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol set_cpus_allowed_R__ver_set_cpus_allowed not found in vmlinux. Ignoring ksyms_base entry
kernel BUG at page_alloc.c:92!
invalid operand: 0000
CPU: 7
EIP: 0010:[<c0132fae>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000020 ebx: 00000009 ecx: c7fd0208 edx: c7fd0208
esi: fe0029fa edi: 00000000 ebp: ddeff009 esp: f6793eb4
ds: 0018 es: 0018 ss: 0018
Stack: c7fd0208 00000000 00000009 ddeff000 ddeff000 c011e137 f7320ce0 f5f4d8a0
       bffff9f1 00000009 fe0029fa 00000000 ddeff009 c011e0fe f6792000 ddeff000
       f5f4d8a0 c7fd0208 f5ba83c0 00000000 f5f4d8a0 ddeff000 ddeff000 c015cca3
Call Trace: [<c011e137>] [<c011e0fe>] [<c015cca3>] [<c015d016>] [<c013c9e8>]
   [<c013cb9a>] [<c010700b>]
Code: 0f 0b 5c 00 99 49 2d c0 8b 14 24 8b 42 14 83 e0 40 74 08 0f
>>EIP; c0132fae <__free_pages_ok+4e/2e0> <=====
Trace; c011e137 <access_process_vm+177/1c0>
Trace; c011e0fe <access_process_vm+13e/1c0>
Trace; c015cca3 <proc_pid_cmdline+63/f0>
Trace; c015d016 <proc_info_read+46/100>
Trace; c013c9e8 <vfs_read+98/110>
Trace; c013cb9a <sys_read+2a/40>
Trace; c010700b <syscall_call+7/b>
Code; c0132fae <__free_pages_ok+4e/2e0>
00000000 <_EIP>:
Code; c0132fae <__free_pages_ok+4e/2e0> <=====
0: 0f 0b ud2a <=====
Code; c0132fb0 <__free_pages_ok+50/2e0>
2: 5c pop %esp
Code; c0132fb1 <__free_pages_ok+51/2e0>
3: 00 99 49 2d c0 8b add %bl,0x8bc02d49(%ecx)
Code; c0132fb7 <__free_pages_ok+57/2e0>
9: 14 24 adc $0x24,%al
Code; c0132fb9 <__free_pages_ok+59/2e0>
b: 8b 42 14 mov 0x14(%edx),%eax
Code; c0132fbc <__free_pages_ok+5c/2e0>
e: 83 e0 40 and $0x40,%eax
Code; c0132fbf <__free_pages_ok+5f/2e0>
11: 74 08 je 1b <_EIP+0x1b> c0132fc9 <__free_pages_ok+69/2e0>
Code; c0132fc1 <__free_pages_ok+61/2e0>
13: 0f 00 00 sldtl (%eax)
4 warnings issued. Results may not be reliable.
David F Barrera
dbarrera@us.ibm.com
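
A note on reading the Code: bytes above. On i386 kernels of this vintage, BUG() appears to be emitted as a ud2 instruction (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name. That would explain why ksymoops turns the bytes after ud2a into meaningless instructions (pop %esp, add, adc). The sketch below is a user-space illustration under that assumption; decode_bug() and struct bug_info are hypothetical names, not part of the kernel or of ksymoops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: what a BUG() site encodes after ud2. */
struct bug_info {
    int      valid;     /* 1 if the bytes start with ud2 (0f 0b) */
    uint16_t line;      /* little-endian 16-bit source line number */
    uint32_t file_ptr;  /* little-endian 32-bit address of the file name */
};

/* Decode the first eight bytes of an oops "Code:" line, assuming the
 * layout ud2 / .word line / .long file. Not a kernel or ksymoops API. */
static struct bug_info decode_bug(const uint8_t *code, size_t len)
{
    struct bug_info info = {0, 0, 0};

    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return info;    /* not a ud2-based BUG() site */

    info.valid = 1;
    info.line = (uint16_t)(code[2] | (code[3] << 8));
    info.file_ptr = (uint32_t)code[4]
                  | ((uint32_t)code[5] << 8)
                  | ((uint32_t)code[6] << 16)
                  | ((uint32_t)code[7] << 24);
    return info;
}
```

Fed the first eight bytes from the oops (0f 0b 5c 00 99 49 2d c0), this gives line 92, which matches "page_alloc.c:92", and a file name pointer of 0xc02d4999.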
* kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
@ 2002-07-23 17:07 David F Barrera
2002-07-23 17:17 ` Rik van Riel
0 siblings, 1 reply; 14+ messages in thread
From: David F Barrera @ 2002-07-23 17:07 UTC (permalink / raw)
To: linux-kernel
I should have stated that the problem is occurring on the 2.5.26 kernel,
NOT 2.4.26. My mistake. Also, the hardware is an IBM eServer 8-way x370
with 12GB of RAM, SCSI drives, 2GB swap. My observation was that not all
memory was being used at the time of the oops and error messages.
I have experienced the following errors while running a test suite (LTP
test suite) on the 2.5.26 kernel. Has anybody seen this problem, and, if
so, is there a patch for it? Please reply directly as I am not yet
subscribed to the mailing list. Thanks.
kernel BUG at page_alloc.c:92!
invalid operand: 0000
CPU: 7
EIP: 0010:[<c0132fae>] Not tainted
EFLAGS: 00010202
eax: 00000020 ebx: 00000009 ecx: c7fd0208 edx: c7fd0208
esi: fe0029fa edi: 00000000 ebp: ddeff009 esp: f6793eb4
ds: 0018 es: 0018 ss: 0018
Process top (pid: 4648, threadinfo=f6792000 task=f7320ce0)
Stack: c7fd0208 00000000 00000009 ddeff000 ddeff000 c011e137 f7320ce0 f5f4d8a0
       bffff9f1 00000009 fe0029fa 00000000 ddeff009 c011e0fe f6792000 ddeff000
       f5f4d8a0 c7fd0208 f5ba83c0 00000000 f5f4d8a0 ddeff000 ddeff000 c015cca3
Call Trace: [<c011e137>] [<c011e0fe>] [<c015cca3>] [<c015d016>] [<c013c9e8>]
   [<c013cb9a>] [<c010700b>]
Code: 0f 0b 5c 00 99 49 2d c0 8b 14 24 8b 42 14 83 e0 40 74 08 0f
mremap01: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
mremap01: page allocation failure. order:0, mode:0x0
mremap01: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
mremap01: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
pdflush: page allocation failure. order:0, mode:0x0
----------------------------------------------------------------------------------------------------------------------------
Ksymoops output:
ksymoops 2.4.1 on i686 2.5.26. Options used
-v /boot/vmlinux-2.5.26 (specified)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.5.26/ (default)
-m /boot/System.map-2.5.26 (specified)
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Warning (compare_maps): ksyms_base symbol __wake_up_sync_R__ver___wake_up_sync not found in vmlinux. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol idle_cpu_R__ver_idle_cpu not found in vmlinux. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol set_cpus_allowed_R__ver_set_cpus_allowed not found in vmlinux. Ignoring ksyms_base entry
kernel BUG at page_alloc.c:92!
invalid operand: 0000
CPU: 7
EIP: 0010:[<c0132fae>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000020 ebx: 00000009 ecx: c7fd0208 edx: c7fd0208
esi: fe0029fa edi: 00000000 ebp: ddeff009 esp: f6793eb4
ds: 0018 es: 0018 ss: 0018
Stack: c7fd0208 00000000 00000009 ddeff000 ddeff000 c011e137 f7320ce0 f5f4d8a0
       bffff9f1 00000009 fe0029fa 00000000 ddeff009 c011e0fe f6792000 ddeff000
       f5f4d8a0 c7fd0208 f5ba83c0 00000000 f5f4d8a0 ddeff000 ddeff000 c015cca3
Call Trace: [<c011e137>] [<c011e0fe>] [<c015cca3>] [<c015d016>] [<c013c9e8>]
   [<c013cb9a>] [<c010700b>]
Code: 0f 0b 5c 00 99 49 2d c0 8b 14 24 8b 42 14 83 e0 40 74 08 0f
>>EIP; c0132fae <__free_pages_ok+4e/2e0> <=====
Trace; c011e137 <access_process_vm+177/1c0>
Trace; c011e0fe <access_process_vm+13e/1c0>
Trace; c015cca3 <proc_pid_cmdline+63/f0>
Trace; c015d016 <proc_info_read+46/100>
Trace; c013c9e8 <vfs_read+98/110>
Trace; c013cb9a <sys_read+2a/40>
Trace; c010700b <syscall_call+7/b>
Code; c0132fae <__free_pages_ok+4e/2e0>
00000000 <_EIP>:
Code; c0132fae <__free_pages_ok+4e/2e0> <=====
0: 0f 0b ud2a <=====
Code; c0132fb0 <__free_pages_ok+50/2e0>
2: 5c pop %esp
Code; c0132fb1 <__free_pages_ok+51/2e0>
3: 00 99 49 2d c0 8b add %bl,0x8bc02d49(%ecx)
Code; c0132fb7 <__free_pages_ok+57/2e0>
9: 14 24 adc $0x24,%al
Code; c0132fb9 <__free_pages_ok+59/2e0>
b: 8b 42 14 mov 0x14(%edx),%eax
Code; c0132fbc <__free_pages_ok+5c/2e0>
e: 83 e0 40 and $0x40,%eax
Code; c0132fbf <__free_pages_ok+5f/2e0>
11: 74 08 je 1b <_EIP+0x1b> c0132fc9 <__free_pages_ok+69/2e0>
Code; c0132fc1 <__free_pages_ok+61/2e0>
13: 0f 00 00 sldtl (%eax)
4 warnings issued. Results may not be reliable.
David F Barrera
dbarrera@us.ibm.com
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-23 17:07 David F Barrera
@ 2002-07-23 17:17 ` Rik van Riel
2002-07-23 18:17 ` Paul Larson
0 siblings, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2002-07-23 17:17 UTC (permalink / raw)
To: David F Barrera; +Cc: linux-kernel
On Tue, 23 Jul 2002, David F Barrera wrote:
> I should have stated that the problem is occurring on the 2.5.26 kernel,
> NOT 2.4.26. My mistake. Also, the hardware is an IBM eServer 8-way
> x370 with 12GB of RAM, SCSI drives, 2GB swap.
Does the attached patch fix it ?
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
===== mm/rmap.c 1.3 vs edited =====
--- 1.3/mm/rmap.c Tue Jul 16 18:46:30 2002
+++ edited/mm/rmap.c Tue Jul 23 14:01:23 2002
@@ -163,7 +163,7 @@
void page_remove_rmap(struct page * page, pte_t * ptep)
{
struct pte_chain * pc, * prev_pc = NULL;
- unsigned long pfn = pte_pfn(*ptep);
+ unsigned long pfn = page_to_pfn(page);
if (!page || !ptep)
BUG();
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-23 17:17 ` Rik van Riel
@ 2002-07-23 18:17 ` Paul Larson
0 siblings, 0 replies; 14+ messages in thread
From: Paul Larson @ 2002-07-23 18:17 UTC (permalink / raw)
To: Rik van Riel; +Cc: David F Barrera, lkml
On Tue, 2002-07-23 at 12:17, Rik van Riel wrote:
> Does the attached patch fix it ?
> ===== mm/rmap.c 1.3 vs edited =====
> --- 1.3/mm/rmap.c Tue Jul 16 18:46:30 2002
> +++ edited/mm/rmap.c Tue Jul 23 14:01:23 2002
I doubt it; his problem was on 2.5.26.
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-23 16:56 kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0 David F Barrera
@ 2002-07-23 19:24 ` Andrew Morton
2002-07-23 20:34 ` Andrea Arcangeli
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2002-07-23 19:24 UTC (permalink / raw)
To: David F Barrera; +Cc: linux-kernel
David F Barrera wrote:
>
> I have experienced the following errors while running a test suite (LTP
> test suite) on the 2.4.26 kernel. Has anybody seen this problem, and, if
> so, is there a patch for it? Thanks.
>
> kernel BUG at page_alloc.c:92!
Could you please replace the put_page(page) in
kernel/ptrace.c:access_process_vm() with page_cache_release(page)
and retest?
Thanks.
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-23 19:24 ` Andrew Morton
@ 2002-07-23 20:34 ` Andrea Arcangeli
2002-07-23 20:47 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Andrea Arcangeli @ 2002-07-23 20:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: David F Barrera, linux-kernel
On Tue, Jul 23, 2002 at 12:24:04PM -0700, Andrew Morton wrote:
> David F Barrera wrote:
> >
> > I have experienced the following errors while running a test suite (LTP
> > test suite) on the 2.4.26 kernel. Has anybody seen this problem, and, if
> > so, is there a patch for it? Thanks.
> >
> > kernel BUG at page_alloc.c:92!
>
> Could you please replace the put_page(page) in
> kernel/ptrace.c:access_process_vm() with page_cache_release(page)
> and retest?
I prefer to drop page_cache_release and to have __free_pages_ok to deal
with the lru pages like it's been fixed in 2.4.
Andrea
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-23 20:34 ` Andrea Arcangeli
@ 2002-07-23 20:47 ` Andrew Morton
2002-07-23 20:56 ` Andrea Arcangeli
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2002-07-23 20:47 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: David F Barrera, linux-kernel
Andrea Arcangeli wrote:
>
> On Tue, Jul 23, 2002 at 12:24:04PM -0700, Andrew Morton wrote:
> > David F Barrera wrote:
> > >
> > > I have experienced the following errors while running a test suite (LTP
> > > test suite) on the 2.4.26 kernel. Has anybody seen this problem, and, if
> > > so, is there a patch for it? Thanks.
> > >
> > > kernel BUG at page_alloc.c:92!
> >
> > Could you please replace the put_page(page) in
> > kernel/ptrace.c:access_process_vm() with page_cache_release(page)
> > and retest?
>
> I prefer to drop page_cache_release and to have __free_pages_ok to deal
> with the lru pages like it's been fixed in 2.4.
That would fix it too. But a __free_pages_ok call from interrupt
context can deadlock the box.
The removal of pages from the LRU is rather a mess. It's getting
better, and we can fix up some more of this if/when pagemap_lru_lock
becomes an interrupt-safe lock.
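
The deadlock concern above can be illustrated with a toy single-CPU model. This is a minimal sketch, not kernel code: struct toy_cpu and the toy_* functions are hypothetical and only mimic the difference between a spin_lock()-style acquire, which leaves interrupts enabled, and a spin_lock_irq()-style acquire, which masks them. If an interrupt arrives while the interrupted code holds the lock and the handler tries to take the same lock, the handler spins forever because the holder cannot run again on that CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; every name here is hypothetical. */
struct toy_cpu {
    bool irqs_enabled;   /* can an interrupt arrive right now? */
    bool lru_lock_held;  /* is the non-recursive lock held by this CPU? */
};

/* spin_lock()-style acquire: interrupts stay enabled while held. */
static void toy_lock(struct toy_cpu *cpu)
{
    cpu->lru_lock_held = true;
}

static void toy_unlock(struct toy_cpu *cpu)
{
    cpu->lru_lock_held = false;
}

/* spin_lock_irq()-style acquire: interrupts are masked while held. */
static void toy_lock_irq(struct toy_cpu *cpu)
{
    cpu->irqs_enabled = false;
    cpu->lru_lock_held = true;
}

static void toy_unlock_irq(struct toy_cpu *cpu)
{
    cpu->lru_lock_held = false;
    cpu->irqs_enabled = true;
}

/* Would an interrupt handler that takes the same lock spin forever?
 * Only if an interrupt can arrive while this CPU already holds it. */
static bool toy_irq_handler_would_deadlock(const struct toy_cpu *cpu)
{
    return cpu->irqs_enabled && cpu->lru_lock_held;
}
```

In this model the plain acquire leaves a window in which the handler would deadlock, while the irq-masking acquire closes it at the cost of keeping interrupts off for the length of the critical section.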
-
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-23 20:47 ` Andrew Morton
@ 2002-07-23 20:56 ` Andrea Arcangeli
2002-07-23 21:16 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Andrea Arcangeli @ 2002-07-23 20:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: David F Barrera, linux-kernel
On Tue, Jul 23, 2002 at 01:47:28PM -0700, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> >
> > On Tue, Jul 23, 2002 at 12:24:04PM -0700, Andrew Morton wrote:
> > > David F Barrera wrote:
> > > >
> > > > I have experienced the following errors while running a test suite (LTP
> > > > test suite) on the 2.4.26 kernel. Has anybody seen this problem, and, if
> > > > so, is there a patch for it? Thanks.
> > > >
> > > > kernel BUG at page_alloc.c:92!
> > >
> > > Could you please replace the put_page(page) in
> > > kernel/ptrace.c:access_process_vm() with page_cache_release(page)
> > > and retest?
> >
> > I prefer to drop page_cache_release and to have __free_pages_ok to deal
> > with the lru pages like it's been fixed in 2.4.
>
> That would fix it too. But a __free_pages_ok call from interrupt
> context can deadlock the box.
I guess you mean it can corrupt the lru list, not necessarily deadlock
the box. That's not the case either, though: see the in_interrupt() check
in my tree in free_pages_ok; only normal context is allowed to play with
pagecache. (async-io isn't in my tree)
>
> The removal of pages from the LRU is rather a mess. It's getting
> better, and we can fix up some more of this if/when pagemap_lru_lock
> becomes an interrupt-safe lock.
That will allow irq to manage pagecache, but the fact that it's not interrupt
safe is really an irq latency feature. The fact that disabling irqs during
the critical section decreases contention on the lock is kind of a hack;
that is true for all spinlocks out there, so by that argument all spinlocks
should be irq safe.
Andrea
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-23 20:56 ` Andrea Arcangeli
@ 2002-07-23 21:16 ` Andrew Morton
0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2002-07-23 21:16 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: David F Barrera, linux-kernel
Andrea Arcangeli wrote:
>
> On Tue, Jul 23, 2002 at 01:47:28PM -0700, Andrew Morton wrote:
> > Andrea Arcangeli wrote:
> > >
> > > On Tue, Jul 23, 2002 at 12:24:04PM -0700, Andrew Morton wrote:
> > > > David F Barrera wrote:
> > > > >
> > > > > I have experienced the following errors while running a test suite (LTP
> > > > > test suite) on the 2.4.26 kernel. Has anybody seen this problem, and, if
> > > > > so, is there a patch for it? Thanks.
> > > > >
> > > > > kernel BUG at page_alloc.c:92!
> > > >
> > > > Could you please replace the put_page(page) in
> > > > kernel/ptrace.c:access_process_vm() with page_cache_release(page)
> > > > and retest?
> > >
> > > I prefer to drop page_cache_release and to have __free_pages_ok to deal
> > > with the lru pages like it's been fixed in 2.4.
> >
> > That would fix it too. But a __free_pages_ok call from interrupt
> > context can deadlock the box.
>
> I guess you mean it can corrupt the lru list, not necessarily deadlock
> the box.
Take the lock from interrupt context and it'll deadlock.
> That's not the case either though, see the in_interrupt() check
> in my tree in free_pages_ok, only normal context is allowed to play with
> pagecache. (async-io isn't in my tree)
Yes, I'm adding the same check to 2.5. It's anon pages as well
as pagecache pages. And unless we have a
BUG_ON(PageLRU(page) && in_interrupt())
in put_page_testzero() then I'm not sure how we can be sure that
aio is the only problem area.
> >
> > The removal of pages from the LRU is rather a mess. It's getting
> > better, and we can fix up some more of this if/when pagemap_lru_lock
> > becomes an interrupt-safe lock.
>
> That will allow irq to manage pagecache, but the fact that it's not interrupt
> safe is really an irq latency feature.
Not sure what you mean by this?
> The fact that disabling irqs during
> the critical section decreases contention on the lock is kind of a hack;
> that is true for all spinlocks out there, so by that argument all spinlocks
> should be irq safe.
Sure. If the lock is heavily used, performance critical and you've
done the work to ensure that maximum hold time is small, it is
well worth doing. Plus we need it for free-from-interrupt-context,
and we may need it for performing LRU list motion within IO completion,
although that's looking a bit remote at present.
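
The BUG_ON(PageLRU(page) && in_interrupt()) idea mentioned above can be sketched in user space. This is only an illustration under stated assumptions: struct toy_page, TOY_PG_LRU and toy_put_page_testzero() are hypothetical stand-ins, the flag value is made up, and the interrupt state is passed in as a parameter rather than read from the CPU. Where the kernel would BUG, the sketch returns -1 so the check can be exercised.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_PG_LRU 0x1u  /* hypothetical flag bit, not the kernel's value */

/* Hypothetical stand-in for the parts of struct page used here. */
struct toy_page {
    atomic_int count;    /* reference count */
    unsigned   flags;    /* page flags */
};

/* Returns -1 where the proposed BUG_ON would fire (an LRU page being
 * released from interrupt context), 1 if the reference count dropped
 * to zero, and 0 otherwise. */
static int toy_put_page_testzero(struct toy_page *page, bool in_interrupt)
{
    if ((page->flags & TOY_PG_LRU) && in_interrupt)
        return -1;

    /* atomic_fetch_sub() returns the old value, so old == 1 means the
     * count has just reached zero. */
    return atomic_fetch_sub(&page->count, 1) == 1;
}
```

Placing the check in front of the decrement means every release of an LRU page from interrupt context is caught, not only the one that happens to drop the last reference.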
-
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
@ 2002-07-24 13:42 David F Barrera
2002-07-24 19:35 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: David F Barrera @ 2002-07-24 13:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: akpm, Andrea Arcangeli, linux-kernel
Andrew,
I tried the change to ptrace.c, but it did not work. I cannot boot the
machine. It gives an oops upon boot.
Unable to handle kernel paging request at virtual address 20203444
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01a19db>] Not tainted
EFLAGS: 00010002
eax: f7942000 ebx: f7942000 ecx: 0000f209 edx: 20203320
esi: 0000f209 edi: 00000046 ebp: c6a574c0 esp: c6aedebc
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, threadinfo=c6aec000 task=c6aea040)
Stack: 00000002 c01b81f6 f7942000 c01b8476 c6a574c0 c01b8ddd c6a574c0 00000009
       00000000 c03ee9a0 46460000 00000000 00000046 0000270f 00000001 c01b920b
       00000046 00000001 c6aedf15 c0112d70 c039ee40 00000001 00000000 c039ee40
Call Trace: [<c01b81f6>] [<c01b8476>] [<c01b8ddd>] [<c01b920b>] [<c0112d70>]
   [<c01b927c>] [<c01092de>] [<c01094e4>] [<c01052f0>] [<c0107957>] [<c01052f0>]
   [<c01052f0>] [<c010531a>] [<c01053c2>] [<c0117058>]
Code: f6 82 24 01 00 00 08 74 26 0f b6 83 25 01 00 00 b9 01 00 00
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
Regards,
David F Barrera
Linux Technology Center, IBM
T/L 678-1375 External 838-1375
dbarrera@us.ibm.com
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-24 13:42 David F Barrera
@ 2002-07-24 19:35 ` Andrew Morton
2002-07-24 19:55 ` Andrea Arcangeli
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2002-07-24 19:35 UTC (permalink / raw)
To: David F Barrera; +Cc: Andrea Arcangeli, linux-kernel
David F Barrera wrote:
>
> Andrew,
>
> I tried the change to ptrace.c, but it did not work. I cannot boot the
> machine. It gives an oops upon boot.
That won't be due to the ptrace change. Suggest you do a clean
build and if the oops is still there, please pass it through ksymoops and
let us know.
And please drop the ptrace.c change and use
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.27/lru-removal.patch
instead.
Thanks.
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-24 19:35 ` Andrew Morton
@ 2002-07-24 19:55 ` Andrea Arcangeli
2002-07-24 20:11 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Andrea Arcangeli @ 2002-07-24 19:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: David F Barrera, linux-kernel
On Wed, Jul 24, 2002 at 12:35:54PM -0700, Andrew Morton wrote:
> And please drop the ptrace.c change and use
> http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.27/lru-removal.patch
> instead.
page_cache_release() can go back to being a #define for __free_page().
Andrea
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
2002-07-24 19:55 ` Andrea Arcangeli
@ 2002-07-24 20:11 ` Andrew Morton
0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2002-07-24 20:11 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: David F Barrera, linux-kernel
Andrea Arcangeli wrote:
>
> On Wed, Jul 24, 2002 at 12:35:54PM -0700, Andrew Morton wrote:
> > And please drop the ptrace.c change and use
> > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.27/lru-removal.patch
> > instead.
>
> page_cache_release() can go back to being a #define for __free_page().
>
Man, it can do a ton more than that. This patch is just a stopgap
to prevent the oops.
page_cache_release() goes out onto the bus for the PageReserved()
test and then immediately goes out onto the bus again to perform the
atomic_dec_and_test(). Plus it tends to do all this inside
a global lock. That PageReserved thing needs to go away.
Seriously, this stuff needs a truck driven through it. See
http://mail.nl.linux.org/linux-mm/2002-07/msg00009.html and things
like pagevec_release(). It still needs quite some work, but the
optimisations which are available here are considerable.
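
The two separate accesses described above (the PageReserved() test, then the atomic decrement) can be made visible with a small user-space model. This is a sketch under assumptions, not the kernel implementation: struct toy_page, TOY_PG_RESERVED and toy_page_touches are hypothetical, the flag value is invented, and the global lock mentioned above is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_PG_RESERVED 0x1u  /* hypothetical bit value */

/* Hypothetical stand-in for the fields of struct page touched here. */
struct toy_page {
    atomic_uint flags;
    atomic_int  count;
};

/* Counts accesses to the shared page structure, as a rough stand-in
 * for the trips "out onto the bus". */
static unsigned toy_page_touches;

/* Returns true when the last reference was dropped. */
static bool toy_page_cache_release(struct toy_page *page)
{
    toy_page_touches++;                  /* 1st access: reserved test */
    if (atomic_load(&page->flags) & TOY_PG_RESERVED)
        return false;                    /* reserved pages are never freed */

    toy_page_touches++;                  /* 2nd access: atomic decrement */
    return atomic_fetch_sub(&page->count, 1) == 1;
}
```

For an ordinary page the model touches the structure twice per release, which is the cost being complained about; dropping the reserved test would leave only the atomic decrement.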
-
* Re: kernel BUG at page_alloc.c:92! & page allocation failure. order:0, mode:0x0
@ 2002-07-25 14:02 David F Barrera
0 siblings, 0 replies; 14+ messages in thread
From: David F Barrera @ 2002-07-25 14:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: akpm, Andrea Arcangeli, linux-kernel
The problem occurs after the ptrace.c change, as I indicated earlier.
Following is the ksymoops output:
ksymoops 2.4.1 on i686 2.4.17-3.1smp64gigmem. Options used
-v /boot/vmlinux-2.5.26 (specified)
-K (specified)
-l /proc/modules (default)
-o /lib/modules/2.5.26/ (specified)
-m /boot/System.map-2.5.26 (specified)
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address 00000014
*pde = 373f2001
Oops: 0000
CPU: 0
EIP: 0010:[<c0133a83>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000000 ebx: 00000000 ecx: c357c058 edx: c03d5e20
esi: 00000000 edi: f73e900a ebp: f73e900a esp: f73fdee0
ds: 0018 es: 0018 ss: 0018
Stack: 0000000a c011e12b c357c058 f73fc000 f73e9000 f75f1ee0 c357c058 f75dcf60
       00000000 f75f1ee0 f73e9000 f73e9000 c015ccd3 c88c2ce0 bfffffa4 f73e9000
       0000000a 00000000 000001d0 f7552950 00000400 c88c2ce0 f73e9000 c015d046
Call Trace: [<c011e12b>] [<c015ccd3>] [<c015d046>] [<c013ca18>] [<c010cd79>]
   [<c013cbca>] [<c010700b>]
Code: 8b 43 14 a9 00 08 00 00 75 24 f0 ff 4b 10 0f 94 c0 84 c0 74
>>EIP; c0133a83 <page_cache_release+3/40> <=====
Trace; c011e12b <access_process_vm+13b/1c0>
Trace; c015ccd3 <proc_pid_cmdline+63/f0>
Trace; c015d046 <proc_info_read+46/100>
Trace; c013ca18 <vfs_read+98/110>
Trace; c010cd79 <sys_mmap2+69/a0>
Trace; c013cbca <sys_read+2a/40>
Trace; c010700b <syscall_call+7/b>
Code; c0133a83 <page_cache_release+3/40>
00000000 <_EIP>:
Code; c0133a83 <page_cache_release+3/40> <=====
0: 8b 43 14 mov 0x14(%ebx),%eax <=====
Code; c0133a86 <page_cache_release+6/40>
3: a9 00 08 00 00 test $0x800,%eax
Code; c0133a8b <page_cache_release+b/40>
8: 75 24 jne 2e <_EIP+0x2e> c0133ab1 <page_cache_release+31/40>
Code; c0133a8d <page_cache_release+d/40>
a: f0 ff 4b 10 lock decl 0x10(%ebx)
Code; c0133a91 <page_cache_release+11/40>
e: 0f 94 c0 sete %al
Code; c0133a94 <page_cache_release+14/40>
11: 84 c0 test %al,%al
Code; c0133a96 <page_cache_release+16/40>
13: 74 00 je 15 <_EIP+0x15> c0133a98 <page_cache_release+18/40>
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
c01b7e5b
*pde = 00104001
Oops: 0002
CPU: 5
EIP: 0010:[<c01b7e5b>] Not tainted
EFLAGS: 00010003
eax: 00000000 ebx: 00000000 ecx: f75d1000 edx: 00000000
esi: 0000f01b edi: 00000001 ebp: c886f4c0 esp: c88c7ec0
ds: 0018 es: 0018 ss: 0018
Stack: 00000001 0000270f c886f4c0 00000000 c01b8e0d c886f4c0 0000001b 00000000
       c03e69a0 01010000 00000000 00000001 0000270f 00000001 c01b923b 00000001
       00000001 c88c7f15 c0112da0 c03994c0 00000001 00000000 c03994c0 00000000
Call Trace: [<c01b8e0d>] [<c01b923b>] [<c0112da0>] [<c01b92ac>] [<c01092de>]
   [<c01094e4>] [<c01052f0>] [<c0107957>] [<c01052f0>] [<c01052f0>] [<c010531a>]
   [<c01053c2>] [<c0117088>]
Code: c6 00 00 8b 91 58 01 00 00 ff 81 5c 01 00 00 8b 44 24 18 88
>>EIP; c01b7e5b <put_queue+2b/60> <=====
Trace; c01b8e0d <handle_scancode+24d/2a0>
Trace; c01b923b <handle_kbd_event+14b/1a0>
Trace; c0112da0 <scheduler_tick+a0/370>
Trace; c01b92ac <keyboard_interrupt+1c/30>
Trace; c01092de <handle_IRQ_event+5e/90>
Trace; c01094e4 <do_IRQ+a4/f0>
Trace; c01052f0 <default_idle+0/40>
Trace; c0107957 <common_interrupt+1f/24>
Trace; c01052f0 <default_idle+0/40>
Trace; c01052f0 <default_idle+0/40>
Trace; c010531a <default_idle+2a/40>
Trace; c01053c2 <cpu_idle+52/70>
Trace; c0117088 <printk+128/140>
Code; c01b7e5b <put_queue+2b/60>
00000000 <_EIP>:
Code; c01b7e5b <put_queue+2b/60> <=====
0: c6 00 00 movb $0x0,(%eax) <=====
Code; c01b7e5e <put_queue+2e/60>
3: 8b 91 58 01 00 00 mov 0x158(%ecx),%edx
Code; c01b7e64 <put_queue+34/60>
9: ff 81 5c 01 00 00 incl 0x15c(%ecx)
Code; c01b7e6a <put_queue+3a/60>
f: 8b 44 24 18 mov 0x18(%esp,1),%eax
Code; c01b7e6e <put_queue+3e/60>
13: 88 00 mov %al,(%eax)
<0>Kernel panic: Aiee, killing interrupt handler!
David F Barrera
Linux Technology Center, IBM
T/L 678-1375 External 838-1375
dbarrera@us.ibm.com
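
A note on reading the two oopses above. In the first, the faulting instruction is mov 0x14(%ebx),%eax with ebx: 00000000, and the reported address is 00000014, so the fault address is simply the base register plus the displacement. That is consistent with page_cache_release() having been handed a NULL page pointer, though the output alone does not prove where the NULL came from. The second oops (movb $0x0,(%eax) with eax: 00000000) follows the same pattern with a displacement of zero. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective address of a base+displacement operand such as
 * "0x14(%ebx)" on a 32-bit machine. */
static uint32_t toy_effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* True when the fault looks like a member access through a NULL
 * pointer: the base register is zero and base + displacement equals
 * the reported fault address. Hypothetical helper, not a ksymoops
 * feature. */
static bool toy_is_null_member_access(uint32_t fault_addr,
                                      uint32_t base, uint32_t disp)
{
    return base == 0 && toy_effective_address(base, disp) == fault_addr;
}
```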