* page allocator regression on nommu
@ 2009-08-31 7:48 Paul Mundt
2009-08-31 10:08 ` Pekka Enberg
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Paul Mundt @ 2009-08-31 7:48 UTC (permalink / raw)
To: Mel Gorman
Cc: Christoph Lameter, KOSAKI Motohiro, Pekka Enberg, Peter Zijlstra,
Nick Piggin, Dave Hansen, Lee Schermerhorn, Andrew Morton,
Linus Torvalds, David Howells, linux-mm, linux-kernel
Hi Mel,
It seems we've managed to trigger a fairly interesting conflict between
the anti-fragmentation disabling code and the nommu region rbtree. I've
bisected it down to:
commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e
Author: Mel Gorman <mel@csn.ul.ie>
Date: Tue Jun 16 15:31:58 2009 -0700
page allocator: move check for disabled anti-fragmentation out of fastpath
On low-memory systems, anti-fragmentation gets disabled as there is
nothing it can do and it would just incur overhead shuffling pages between
lists constantly. Currently the check is made in the free page fast path
for every page. This patch moves it to a slow path. On machines with low
memory, there will be small amount of additional overhead as pages get
shuffled between lists but it should quickly settle.
which causes death on unpacking initramfs on my nommu board. With this
reverted, everything works as expected. Note that this blows up with all of
SLOB/SLUB/SLAB.
I'll continue debugging it, and can post my .config if it will be helpful, but
hopefully you have some suggestions on what to try :-)
---
modprobe: page allocation failure. order:7, mode:0xd0
Stack: (0x0c909d2c to 0x0c90a000)
9d20: 0c0387d2 00000000 00000000 0c107fcc 00000000
9d40: 00000000 0c107fcc 0c2f8960 00000000 00000010 00000000 00000000 000200d0
9d60: 00000000 0c041614 0c9f8cf4 00000000 0c9f8d1c 00000000 00000004 00000000
9d80: 00000007 0c01a582 0c2f8cd0 00000000 00000000 00049000 00000049 00048fff
9da0: 00049fff 00049000 00000007 0c06633a 00000001 0c909f30 0003e950 0c0bad00
9dc0: 0000984a 0c2f8ea0 ffffe000 00000002 00000000 0c065314 0c2f8ea0 0c077f60
9de0: 00049000 0000673c 00000050 000369a0 000019cf 00000004 00004e20 0003e950
9e00: 000481aa 00000001 00004eaa 000049a0 00007fb0 000369a0 00000000 00000000
9e20: 0c909e50 00000000 00000000 0c066790 00000000 0c909f30 00000000 00000000
9e40: 0c2f8ea0 0c909f30 0c909e50 00004eaa 00000000 00000000 00000000 00000000
9e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ea0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ec0: 0c0482ee 00000000 fffffff8 0c2f8ea0 0c104ff8 0c04884e 0c0feb64 00000001
9ee0: 0c048570 ffffe000 0c2f8ea0 00000000 00000000 0c813eb8 0c909f30 0c002e68
9f00: 00000000 00000000 00000000 0c0feb64 0c813eb8 0c909f30 0c960000 0c007972
9f20: ffffff0f 00000071 00000100 0c002e38 0000000b ffffff0f 00000001 0000000b
9f40: 0c0fea64 0c813eb8 0c0feb64 0c2f898c 00000000 ffffe000 0c9f8990 00000000
9f60: 00000000 00000000 00000000 0c909f8c 0c0050ca 0c01e874 40000001 00000000
9f80: 00000000 4795ce40 0000004c 0c002cca 00000000 00000000 00000000 00000000
9fa0: 00000000 00000000 00000000 00000000 00000000 0c9f8990 0c01e7a4 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 0c909f9c 0c002cc4 00000000 40000000 00000000 00000000 00000000 00000000
Call trace:
[<0c041614>] do_mmap_pgoff+0x638/0xa68
[<0c01a582>] flush_itimer_signals+0x36/0x60
[<0c06633a>] load_flat_file+0x32a/0x738
[<0c0bad00>] down_write+0x0/0xc
[<0c065314>] load_elf_fdpic_binary+0x44/0xa8c
[<0c077f60>] memset+0x0/0x60
[<0c066790>] load_flat_binary+0x48/0x2fc
[<0c0482ee>] search_binary_handler+0x4a/0x1fc
[<0c04884e>] do_execve+0x156/0x2bc
[<0c048570>] copy_strings+0x0/0x150
[<0c002e68>] sys_execve+0x30/0x5c
[<0c007972>] syscall_call+0xc/0x10
[<0c002e38>] sys_execve+0x0/0x5c
[<0c0050ca>] kernel_execve+0x6/0xc
[<0c01e874>] ____call_usermodehelper+0xd0/0xfc
[<0c002cca>] kernel_thread_helper+0x6/0x10
[<0c01e7a4>] ____call_usermodehelper+0x0/0xfc
[<0c002cc4>] kernel_thread_helper+0x0/0x10
Mem-Info:
Normal per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:0 unevictable:323 dirty:0 writeback:0 unstable:0
free:2967 slab:0 mapped:0 pagetables:0 bounce:0
Normal free:11868kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:1292kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 267*4kB 268*8kB 251*16kB 145*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 11868kB
323 total pagecache pages
------------[ cut here ]------------
kernel BUG at mm/nommu.c:598!
Kernel BUG: 003e [#1]
Modules linked in:
Pid : 51, Comm: modprobe
CPU : 0 Not tainted (2.6.31-rc7 #2835)
PC is at __put_nommu_region+0xe/0xb0
PR is at do_mmap_pgoff+0x8dc/0xa68
PC : 0c040e7a SP : 0c909d7c SR : 40000001
R0 : 0000001d R1 : 00000000 R2 : 00000000 R3 : 0000000d
R4 : 0c9f8cf4 R5 : 000017de R6 : ffffffff R7 : 00000035
R8 : 0c9f8cf4 R9 : 00000000 R10 : 00000000 R11 : fffffff4
R12 : 0c9f8d1c R13 : 00000000 R14 : 0c9f8cf4
MACH: 0000017e MACL: baf11680 GBR : 00000000 PR : 0c0418b8
Call trace:
[<0c0418b8>] do_mmap_pgoff+0x8dc/0xa68
[<0c01a582>] flush_itimer_signals+0x36/0x60
[<0c06633a>] load_flat_file+0x32a/0x738
[<0c0bad00>] down_write+0x0/0xc
[<0c065314>] load_elf_fdpic_binary+0x44/0xa8c
[<0c077f60>] memset+0x0/0x60
[<0c066790>] load_flat_binary+0x48/0x2fc
[<0c0482ee>] search_binary_handler+0x4a/0x1fc
[<0c04884e>] do_execve+0x156/0x2bc
[<0c048570>] copy_strings+0x0/0x150
[<0c002e68>] sys_execve+0x30/0x5c
[<0c007972>] syscall_call+0xc/0x10
[<0c002e38>] sys_execve+0x0/0x5c
[<0c0050ca>] kernel_execve+0x6/0xc
[<0c01e874>] ____call_usermodehelper+0xd0/0xfc
[<0c002cca>] kernel_thread_helper+0x6/0x10
[<0c01e7a4>] ____call_usermodehelper+0x0/0xfc
[<0c002cc4>] kernel_thread_helper+0x0/0x10
Code:
0c040e74: tst r1, r1
0c040e76: bf.s 0c040e7c
0c040e78: mov r4, r8
->0c040e7a: trapa #62
0c040e7c: stc sr, r1
0c040e7e: mov r1, r0
0c040e80: or #-16, r0
0c040e82: ldc r0, sr
0c040e84: mov r1, r0
Process: modprobe (pid: 51, stack limit = 0c908001)
Stack: (0x0c909d7c to 0x0c90a000)
9d60: 0c0418b8
9d80: 00000007 0c01a582 0c2f8cd0 00000000 00000000 00049000 00000049 00048fff
9da0: 00049fff 00049000 00000007 0c06633a 00000001 0c909f30 0003e950 0c0bad00
9dc0: 0000984a 0c2f8ea0 ffffe000 00000002 00000000 0c065314 0c2f8ea0 0c077f60
9de0: 00049000 0000673c 00000050 000369a0 000019cf 00000004 00004e20 0003e950
9e00: 000481aa 00000001 00004eaa 000049a0 00007fb0 000369a0 00000000 00000000
9e20: 0c909e50 00000000 00000000 0c066790 00000000 0c909f30 00000000 00000000
9e40: 0c2f8ea0 0c909f30 0c909e50 00004eaa 00000000 00000000 00000000 00000000
9e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ea0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9ec0: 0c0482ee 00000000 fffffff8 0c2f8ea0 0c104ff8 0c04884e 0c0feb64 00000001
9ee0: 0c048570 ffffe000 0c2f8ea0 00000000 00000000 0c813eb8 0c909f30 0c002e68
9f00: 00000000 00000000 00000000 0c0feb64 0c813eb8 0c909f30 0c960000 0c007972
9f20: ffffff0f 00000071 00000100 0c002e38 0000000b ffffff0f 00000001 0000000b
9f40: 0c0fea64 0c813eb8 0c0feb64 0c2f898c 00000000 ffffe000 0c9f8990 00000000
9f60: 00000000 00000000 00000000 0c909f8c 0c0050ca 0c01e874 40000001 00000000
9f80: 00000000 4795ce40 0000004c 0c002cca 00000000 00000000 00000000 00000000
9fa0: 00000000 00000000 00000000 00000000 00000000 0c9f8990 0c01e7a4 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 0c909f9c 0c002cc4 00000000 40000000 00000000 00000000 00000000 00000000
---[ end trace 139ce121c98e96c9 ]---
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: page allocator regression on nommu
2009-08-31 7:48 page allocator regression on nommu Paul Mundt
@ 2009-08-31 10:08 ` Pekka Enberg
2009-08-31 10:26 ` Paul Mundt
2009-09-01 13:46 ` David Howells
2009-08-31 10:30 ` Mel Gorman
2009-09-01 13:35 ` David Howells
2 siblings, 2 replies; 13+ messages in thread
From: Pekka Enberg @ 2009-08-31 10:08 UTC (permalink / raw)
To: Paul Mundt, Mel Gorman, Christoph Lameter, KOSAKI Motohiro,
Pekka Enberg, Peter Zijlstra, Nick Piggin, Dave Hansen,
Lee Schermerhorn, Andrew Morton, Linus Torvalds, David Howells,
linux-mm, linux-kernel
Hi Paul,
On Mon, Aug 31, 2009 at 10:48 AM, Paul Mundt<lethal@linux-sh.org> wrote:
> It seems we've managed to trigger a fairly interesting conflict between
> the anti-fragmentation disabling code and the nommu region rbtree. I've
> bisected it down to:
>
> commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e
> Author: Mel Gorman <mel@csn.ul.ie>
> Date: Tue Jun 16 15:31:58 2009 -0700
>
> page allocator: move check for disabled anti-fragmentation out of fastpath
>
> On low-memory systems, anti-fragmentation gets disabled as there is
> nothing it can do and it would just incur overhead shuffling pages between
> lists constantly. Currently the check is made in the free page fast path
> for every page. This patch moves it to a slow path. On machines with low
> memory, there will be small amount of additional overhead as pages get
> shuffled between lists but it should quickly settle.
>
> which causes death on unpacking initramfs on my nommu board. With this
> reverted, everything works as expected. Note that this blows up with all of
> SLOB/SLUB/SLAB.
>
> I'll continue debugging it, and can post my .config if it will be helpful, but
> hopefully you have some suggestions on what to try :-)
>
> ---
>
> modprobe: page allocation failure. order:7, mode:0xd0
OK, so we have order 7 page allocation here...
> Stack: (0x0c909d2c to 0x0c90a000)
> 9d20: 0c0387d2 00000000 00000000 0c107fcc 00000000
> 9d40: 00000000 0c107fcc 0c2f8960 00000000 00000010 00000000 00000000 000200d0
> 9d60: 00000000 0c041614 0c9f8cf4 00000000 0c9f8d1c 00000000 00000004 00000000
> 9d80: 00000007 0c01a582 0c2f8cd0 00000000 00000000 00049000 00000049 00048fff
> 9da0: 00049fff 00049000 00000007 0c06633a 00000001 0c909f30 0003e950 0c0bad00
> 9dc0: 0000984a 0c2f8ea0 ffffe000 00000002 00000000 0c065314 0c2f8ea0 0c077f60
> 9de0: 00049000 0000673c 00000050 000369a0 000019cf 00000004 00004e20 0003e950
> 9e00: 000481aa 00000001 00004eaa 000049a0 00007fb0 000369a0 00000000 00000000
> 9e20: 0c909e50 00000000 00000000 0c066790 00000000 0c909f30 00000000 00000000
> 9e40: 0c2f8ea0 0c909f30 0c909e50 00004eaa 00000000 00000000 00000000 00000000
> 9e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9ea0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9ec0: 0c0482ee 00000000 fffffff8 0c2f8ea0 0c104ff8 0c04884e 0c0feb64 00000001
> 9ee0: 0c048570 ffffe000 0c2f8ea0 00000000 00000000 0c813eb8 0c909f30 0c002e68
> 9f00: 00000000 00000000 00000000 0c0feb64 0c813eb8 0c909f30 0c960000 0c007972
> 9f20: ffffff0f 00000071 00000100 0c002e38 0000000b ffffff0f 00000001 0000000b
> 9f40: 0c0fea64 0c813eb8 0c0feb64 0c2f898c 00000000 ffffe000 0c9f8990 00000000
> 9f60: 00000000 00000000 00000000 0c909f8c 0c0050ca 0c01e874 40000001 00000000
> 9f80: 00000000 4795ce40 0000004c 0c002cca 00000000 00000000 00000000 00000000
> 9fa0: 00000000 00000000 00000000 00000000 00000000 0c9f8990 0c01e7a4 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 0c909f9c 0c002cc4 00000000 40000000 00000000 00000000 00000000 00000000
>
> Call trace:
> [<0c041614>] do_mmap_pgoff+0x638/0xa68
> [<0c01a582>] flush_itimer_signals+0x36/0x60
> [<0c06633a>] load_flat_file+0x32a/0x738
> [<0c0bad00>] down_write+0x0/0xc
> [<0c065314>] load_elf_fdpic_binary+0x44/0xa8c
> [<0c077f60>] memset+0x0/0x60
> [<0c066790>] load_flat_binary+0x48/0x2fc
> [<0c0482ee>] search_binary_handler+0x4a/0x1fc
> [<0c04884e>] do_execve+0x156/0x2bc
> [<0c048570>] copy_strings+0x0/0x150
> [<0c002e68>] sys_execve+0x30/0x5c
> [<0c007972>] syscall_call+0xc/0x10
> [<0c002e38>] sys_execve+0x0/0x5c
> [<0c0050ca>] kernel_execve+0x6/0xc
> [<0c01e874>] ____call_usermodehelper+0xd0/0xfc
> [<0c002cca>] kernel_thread_helper+0x6/0x10
> [<0c01e7a4>] ____call_usermodehelper+0x0/0xfc
> [<0c002cc4>] kernel_thread_helper+0x0/0x10
>
> Mem-Info:
> Normal per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Active_anon:0 active_file:0 inactive_anon:0
> inactive_file:0 unevictable:323 dirty:0 writeback:0 unstable:0
> free:2967 slab:0 mapped:0 pagetables:0 bounce:0
> Normal free:11868kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:1292kB present:16256kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0
> Normal: 267*4kB 268*8kB 251*16kB 145*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 11868kB
...but we seem to be all out of order > 3 pages. I'm not sure why
commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e changes any of this,
though.
> 323 total pagecache pages
> ------------[ cut here ]------------
> kernel BUG at mm/nommu.c:598!
> Kernel BUG: 003e [#1]
> Modules linked in:
>
> Pid : 51, Comm: modprobe
> CPU : 0 Not tainted (2.6.31-rc7 #2835)
>
> PC is at __put_nommu_region+0xe/0xb0
> PR is at do_mmap_pgoff+0x8dc/0xa68
This looks to be a bug in nommu do_mmap_pgoff() error handling. I
guess we shouldn't call __put_nommu_region() if add_nommu_region()
hasn't been called?
> PC : 0c040e7a SP : 0c909d7c SR : 40000001
> R0 : 0000001d R1 : 00000000 R2 : 00000000 R3 : 0000000d
> R4 : 0c9f8cf4 R5 : 000017de R6 : ffffffff R7 : 00000035
> R8 : 0c9f8cf4 R9 : 00000000 R10 : 00000000 R11 : fffffff4
> R12 : 0c9f8d1c R13 : 00000000 R14 : 0c9f8cf4
> MACH: 0000017e MACL: baf11680 GBR : 00000000 PR : 0c0418b8
>
> Call trace:
> [<0c0418b8>] do_mmap_pgoff+0x8dc/0xa68
> [<0c01a582>] flush_itimer_signals+0x36/0x60
> [<0c06633a>] load_flat_file+0x32a/0x738
> [<0c0bad00>] down_write+0x0/0xc
> [<0c065314>] load_elf_fdpic_binary+0x44/0xa8c
> [<0c077f60>] memset+0x0/0x60
> [<0c066790>] load_flat_binary+0x48/0x2fc
> [<0c0482ee>] search_binary_handler+0x4a/0x1fc
> [<0c04884e>] do_execve+0x156/0x2bc
> [<0c048570>] copy_strings+0x0/0x150
> [<0c002e68>] sys_execve+0x30/0x5c
> [<0c007972>] syscall_call+0xc/0x10
> [<0c002e38>] sys_execve+0x0/0x5c
> [<0c0050ca>] kernel_execve+0x6/0xc
> [<0c01e874>] ____call_usermodehelper+0xd0/0xfc
> [<0c002cca>] kernel_thread_helper+0x6/0x10
> [<0c01e7a4>] ____call_usermodehelper+0x0/0xfc
> [<0c002cc4>] kernel_thread_helper+0x0/0x10
>
> Code:
> 0c040e74: tst r1, r1
> 0c040e76: bf.s 0c040e7c
> 0c040e78: mov r4, r8
> ->0c040e7a: trapa #62
> 0c040e7c: stc sr, r1
> 0c040e7e: mov r1, r0
> 0c040e80: or #-16, r0
> 0c040e82: ldc r0, sr
> 0c040e84: mov r1, r0
>
> Process: modprobe (pid: 51, stack limit = 0c908001)
> Stack: (0x0c909d7c to 0x0c90a000)
> 9d60: 0c0418b8
> 9d80: 00000007 0c01a582 0c2f8cd0 00000000 00000000 00049000 00000049 00048fff
> 9da0: 00049fff 00049000 00000007 0c06633a 00000001 0c909f30 0003e950 0c0bad00
> 9dc0: 0000984a 0c2f8ea0 ffffe000 00000002 00000000 0c065314 0c2f8ea0 0c077f60
> 9de0: 00049000 0000673c 00000050 000369a0 000019cf 00000004 00004e20 0003e950
> 9e00: 000481aa 00000001 00004eaa 000049a0 00007fb0 000369a0 00000000 00000000
> 9e20: 0c909e50 00000000 00000000 0c066790 00000000 0c909f30 00000000 00000000
> 9e40: 0c2f8ea0 0c909f30 0c909e50 00004eaa 00000000 00000000 00000000 00000000
> 9e60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9ea0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9ec0: 0c0482ee 00000000 fffffff8 0c2f8ea0 0c104ff8 0c04884e 0c0feb64 00000001
> 9ee0: 0c048570 ffffe000 0c2f8ea0 00000000 00000000 0c813eb8 0c909f30 0c002e68
> 9f00: 00000000 00000000 00000000 0c0feb64 0c813eb8 0c909f30 0c960000 0c007972
> 9f20: ffffff0f 00000071 00000100 0c002e38 0000000b ffffff0f 00000001 0000000b
> 9f40: 0c0fea64 0c813eb8 0c0feb64 0c2f898c 00000000 ffffe000 0c9f8990 00000000
> 9f60: 00000000 00000000 00000000 0c909f8c 0c0050ca 0c01e874 40000001 00000000
> 9f80: 00000000 4795ce40 0000004c 0c002cca 00000000 00000000 00000000 00000000
> 9fa0: 00000000 00000000 00000000 00000000 00000000 0c9f8990 0c01e7a4 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 0c909f9c 0c002cc4 00000000 40000000 00000000 00000000 00000000 00000000
> ---[ end trace 139ce121c98e96c9 ]---
>
* Re: page allocator regression on nommu
2009-08-31 10:08 ` Pekka Enberg
@ 2009-08-31 10:26 ` Paul Mundt
2009-09-01 13:46 ` David Howells
1 sibling, 0 replies; 13+ messages in thread
From: Paul Mundt @ 2009-08-31 10:26 UTC (permalink / raw)
To: Pekka Enberg
Cc: Mel Gorman, Christoph Lameter, KOSAKI Motohiro, Peter Zijlstra,
Nick Piggin, Dave Hansen, Lee Schermerhorn, Andrew Morton,
Linus Torvalds, David Howells, linux-mm, linux-kernel
On Mon, Aug 31, 2009 at 01:08:19PM +0300, Pekka Enberg wrote:
> On Mon, Aug 31, 2009 at 10:48 AM, Paul Mundt<lethal@linux-sh.org> wrote:
> > modprobe: page allocation failure. order:7, mode:0xd0
>
> OK, so we have order 7 page allocation here...
>
[snip]
> > Active_anon:0 active_file:0 inactive_anon:0
> > inactive_file:0 unevictable:323 dirty:0 writeback:0 unstable:0
> > free:2967 slab:0 mapped:0 pagetables:0 bounce:0
> > Normal free:11868kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:1292kB present:16256kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0
> > Normal: 267*4kB 268*8kB 251*16kB 145*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 11868kB
>
> ...but we seem to be all out of order > 3 pages. I'm not sure why
> commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e changes any of this,
> though.
>
Nor am I, but it does. With it reverted, all of the order-7 allocations
succeed just fine. With some debugging printks added:
usbcore: registered new device driver usb
alloc order 7 for 49000: pages 0c21c000
alloc order 7 for 49000: pages 0c21c000
...
While with it applied:
alloc order 7 for 49000:
modprobe: page allocation failure. order:7, mode:0xd0
...
Mem-Info:
Normal per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:0 unevictable:323 dirty:0 writeback:0 unstable:0
free:2911 slab:0 mapped:0 pagetables:0 bounce:0
Normal free:11644kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:1292kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 259*4kB 264*8kB 247*16kB 142*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 11644kB
323 total pagecache pages
4096 pages RAM
662 pages reserved
226 pages shared
288 pages non-shared
0 pages in pagetable cache
-ENOMEM
Allocation of length 299008 from process 50 (modprobe) failed
Normal per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:0 unevictable:323 dirty:0 writeback:0 unstable:0
free:2911 slab:0 mapped:0 pagetables:0 bounce:0
Normal free:11644kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:1292kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 259*4kB 264*8kB 247*16kB 142*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 11644kB
323 total pagecache pages
The "-ENOMEM" line is from a printk() I've placed in the alloc_pages() error path.
> > ------------[ cut here ]------------
> > kernel BUG at mm/nommu.c:598!
> > Kernel BUG: 003e [#1]
> > Modules linked in:
> >
> > Pid : 51, Comm: modprobe
> > CPU : 0 Not tainted (2.6.31-rc7 #2835)
> >
> > PC is at __put_nommu_region+0xe/0xb0
> > PR is at do_mmap_pgoff+0x8dc/0xa68
>
> This looks to be a bug in nommu do_mmap_pgoff() error handling. I
> guess we shouldn't call __put_nommu_region() if add_nommu_region()
> hasn't been called?
>
Yeah, that looks a bit suspect. __put_nommu_region() is safe to be called
without a call to add_nommu_region(), but we happen to trip over the BUG_ON()
in this case because we've never made a single addition to the region tree.
We probably ought to just up_write() and return if nommu_region_tree == RB_ROOT,
which is what I'll do unless David objects.
* Re: page allocator regression on nommu
2009-08-31 7:48 page allocator regression on nommu Paul Mundt
2009-08-31 10:08 ` Pekka Enberg
@ 2009-08-31 10:30 ` Mel Gorman
2009-08-31 10:43 ` Paul Mundt
2009-09-01 13:35 ` David Howells
2 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2009-08-31 10:30 UTC (permalink / raw)
To: Paul Mundt, Christoph Lameter, KOSAKI Motohiro, Pekka Enberg,
Peter Zijlstra, Nick Piggin, Dave Hansen, Lee Schermerhorn,
Andrew Morton, Linus Torvalds, David Howells, linux-mm,
linux-kernel
On Mon, Aug 31, 2009 at 04:48:43PM +0900, Paul Mundt wrote:
> Hi Mel,
>
> It seems we've managed to trigger a fairly interesting conflict between
> the anti-fragmentation disabling code and the nommu region rbtree. I've
> bisected it down to:
>
> commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e
> Author: Mel Gorman <mel@csn.ul.ie>
> Date: Tue Jun 16 15:31:58 2009 -0700
>
> page allocator: move check for disabled anti-fragmentation out of fastpath
>
> On low-memory systems, anti-fragmentation gets disabled as there is
> nothing it can do and it would just incur overhead shuffling pages between
> lists constantly. Currently the check is made in the free page fast path
> for every page. This patch moves it to a slow path. On machines with low
> memory, there will be small amount of additional overhead as pages get
> shuffled between lists but it should quickly settle.
>
> which causes death on unpacking initramfs on my nommu board. With this
> reverted, everything works as expected. Note that this blows up with all of
> SLOB/SLUB/SLAB.
>
> I'll continue debugging it, and can post my .config if it will be helpful, but
> hopefully you have some suggestions on what to try :-)
>
Based on the output you have given me, it would appear the real
underlying cause is that fragmentation caused the allocation to fail.
The following patch might fix the problem.
====
page-allocator: Always change pageblock ownership when anti-fragmentation is disabled
On low-memory systems, anti-fragmentation gets disabled as there is nothing
it can do and it would just incur overhead shuffling pages between lists
constantly. When the system starts up, there is a period of time when
all the pageblocks are marked MOVABLE and the expectation is that they
get marked UNMOVABLE.
However, when MAX_ORDER is a large number, the pageblocks may not change
ownership because the normal criteria do not apply. This can have the
effect of prematurely breaking up too many large contiguous blocks which
can be a problem on NOMMU systems.
This patch causes pageblocks to change ownership every time a fallback
occurs when anti-fragmentation is disabled. This should prevent the
large blocks being prematurely broken up.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d052abb..cfe9a5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -817,7 +817,8 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* agressive about taking ownership of free pages
*/
if (unlikely(current_order >= (pageblock_order >> 1)) ||
- start_migratetype == MIGRATE_RECLAIMABLE) {
+ start_migratetype == MIGRATE_RECLAIMABLE ||
+ page_group_by_mobility_disabled) {
unsigned long pages;
pages = move_freepages_block(zone, page,
start_migratetype);
* Re: page allocator regression on nommu
2009-08-31 10:30 ` Mel Gorman
@ 2009-08-31 10:43 ` Paul Mundt
2009-08-31 10:59 ` Mel Gorman
0 siblings, 1 reply; 13+ messages in thread
From: Paul Mundt @ 2009-08-31 10:43 UTC (permalink / raw)
To: Mel Gorman
Cc: Christoph Lameter, KOSAKI Motohiro, Pekka Enberg, Peter Zijlstra,
Nick Piggin, Dave Hansen, Lee Schermerhorn, Andrew Morton,
Linus Torvalds, David Howells, linux-mm, linux-kernel
On Mon, Aug 31, 2009 at 11:30:56AM +0100, Mel Gorman wrote:
> On Mon, Aug 31, 2009 at 04:48:43PM +0900, Paul Mundt wrote:
> > Hi Mel,
> >
> > It seems we've managed to trigger a fairly interesting conflict between
> > the anti-fragmentation disabling code and the nommu region rbtree. I've
> > bisected it down to:
> >
> > commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e
> > Author: Mel Gorman <mel@csn.ul.ie>
> > Date: Tue Jun 16 15:31:58 2009 -0700
> >
> > page allocator: move check for disabled anti-fragmentation out of fastpath
> >
> > On low-memory systems, anti-fragmentation gets disabled as there is
> > nothing it can do and it would just incur overhead shuffling pages between
> > lists constantly. Currently the check is made in the free page fast path
> > for every page. This patch moves it to a slow path. On machines with low
> > memory, there will be small amount of additional overhead as pages get
> > shuffled between lists but it should quickly settle.
> >
> > which causes death on unpacking initramfs on my nommu board. With this
> > reverted, everything works as expected. Note that this blows up with all of
> > SLOB/SLUB/SLAB.
> >
> > I'll continue debugging it, and can post my .config if it will be helpful, but
> > hopefully you have some suggestions on what to try :-)
> >
>
> Based on the output you have given me, it would appear the real
> underlying cause is that fragmentation caused the allocation to fail.
> The following patch might fix the problem.
>
Unfortunately this has no impact, the same issue occurs.
Note that with 49255c619fbd482d704289b5eb2795f8e3b7ff2e reverted, show_mem()
shows the following:
alloc order 7 for 49000: pages 0c21c000
Mem-Info:
Normal per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:2 inactive_anon:0
inactive_file:320 unevictable:0 dirty:0 writeback:0 unstable:0
free:2782 slab:0 mapped:0 pagetables:0 bounce:0
Normal free:11128kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:8kB inactive_file:1280kB unevictable:0kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB 1*8192kB 0*16384kB 0*32768kB = 11128kB
323 total pagecache pages
4096 pages RAM
662 pages reserved
227 pages shared
289 pages non-shared
0 pages in pagetable cache
while with it applied, consistently:
alloc order 7 for 49000:
modprobe: page allocation failure. order:7, mode:0xd0
...
Mem-Info:
Normal per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:0 unevictable:323 dirty:0 writeback:0 unstable:0
free:2910 slab:0 mapped:0 pagetables:0 bounce:0
Normal free:11640kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:1292kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 252*4kB 245*8kB 238*16kB 152*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 11640kB
323 total pagecache pages
4096 pages RAM
662 pages reserved
226 pages shared
289 pages non-shared
0 pages in pagetable cache
-ENOMEM
* Re: page allocator regression on nommu
2009-08-31 10:43 ` Paul Mundt
@ 2009-08-31 10:59 ` Mel Gorman
2009-09-01 0:46 ` Paul Mundt
0 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2009-08-31 10:59 UTC (permalink / raw)
To: Paul Mundt, Christoph Lameter, KOSAKI Motohiro, Pekka Enberg,
Peter Zijlstra, Nick Piggin, Dave Hansen, Lee Schermerhorn,
Andrew Morton, Linus Torvalds, David Howells, linux-mm,
linux-kernel
On Mon, Aug 31, 2009 at 07:43:15PM +0900, Paul Mundt wrote:
> On Mon, Aug 31, 2009 at 11:30:56AM +0100, Mel Gorman wrote:
> > On Mon, Aug 31, 2009 at 04:48:43PM +0900, Paul Mundt wrote:
> > > Hi Mel,
> > >
> > > It seems we've managed to trigger a fairly interesting conflict between
> > > the anti-fragmentation disabling code and the nommu region rbtree. I've
> > > bisected it down to:
> > >
> > > commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e
> > > Author: Mel Gorman <mel@csn.ul.ie>
> > > Date: Tue Jun 16 15:31:58 2009 -0700
> > >
> > > page allocator: move check for disabled anti-fragmentation out of fastpath
> > >
> > > On low-memory systems, anti-fragmentation gets disabled as there is
> > > nothing it can do and it would just incur overhead shuffling pages between
> > > lists constantly. Currently the check is made in the free page fast path
> > > for every page. This patch moves it to a slow path. On machines with low
> > > memory, there will be small amount of additional overhead as pages get
> > > shuffled between lists but it should quickly settle.
> > >
> > > which causes death on unpacking initramfs on my nommu board. With this
> > > reverted, everything works as expected. Note that this blows up with all of
> > > SLOB/SLUB/SLAB.
> > >
> > > I'll continue debugging it, and can post my .config if it will be helpful, but
> > > hopefully you have some suggestions on what to try :-)
> > >
> >
> > Based on the output you have given me, it would appear the real
> > underlying cause is that fragmentation caused the allocation to fail.
> > The following patch might fix the problem.
> >
> Unfortunately this has no impact, the same issue occurs.
>
What is the output of the following debug patch?
====
page-allocator: Debug per-cpu free
It's possible that pages being freed to the per-cpu list (of 1 page) are
of the wrong type when anti-fragmentation is disabled. It could have the
impact of triggering a fallback earlier than it should happen.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d052abb..a2a11ce 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1042,6 +1042,7 @@ static void free_hot_cold_page(struct page *page, int cold)
pcp = &zone_pcp(zone, get_cpu())->pcp;
set_page_private(page, get_pageblock_migratetype(page));
+ WARN_ON_ONCE(page_group_by_mobility_disabled && page_private(page) != MIGRATE_UNMOVABLE);
local_irq_save(flags);
if (unlikely(wasMlocked))
free_page_mlock(page);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: page allocator regression on nommu
2009-08-31 10:59 ` Mel Gorman
@ 2009-09-01 0:46 ` Paul Mundt
2009-09-01 10:03 ` Mel Gorman
0 siblings, 1 reply; 13+ messages in thread
From: Paul Mundt @ 2009-09-01 0:46 UTC (permalink / raw)
To: Mel Gorman
Cc: Christoph Lameter, KOSAKI Motohiro, Pekka Enberg, Peter Zijlstra,
Nick Piggin, Dave Hansen, Lee Schermerhorn, Andrew Morton,
Linus Torvalds, David Howells, linux-mm, linux-kernel
On Mon, Aug 31, 2009 at 11:59:52AM +0100, Mel Gorman wrote:
> On Mon, Aug 31, 2009 at 07:43:15PM +0900, Paul Mundt wrote:
> > On Mon, Aug 31, 2009 at 11:30:56AM +0100, Mel Gorman wrote:
> > > On Mon, Aug 31, 2009 at 04:48:43PM +0900, Paul Mundt wrote:
> > > > Hi Mel,
> > > >
> > > > It seems we've managed to trigger a fairly interesting conflict between
> > > > the anti-fragmentation disabling code and the nommu region rbtree. I've
> > > > bisected it down to:
> > > >
> > > > commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e
> > > > Author: Mel Gorman <mel@csn.ul.ie>
> > > > Date: Tue Jun 16 15:31:58 2009 -0700
> > > >
> > > > page allocator: move check for disabled anti-fragmentation out of fastpath
> > > >
> > > > On low-memory systems, anti-fragmentation gets disabled as there is
> > > > nothing it can do and it would just incur overhead shuffling pages between
> > > > lists constantly. Currently the check is made in the free page fast path
> > > > for every page. This patch moves it to a slow path. On machines with low
> > > > memory, there will be a small amount of additional overhead as pages get
> > > > shuffled between lists but it should quickly settle.
> > > >
> > > > which causes death on unpacking initramfs on my nommu board. With this
> > > > reverted, everything works as expected. Note that this blows up with all of
> > > > SLOB/SLUB/SLAB.
> > > >
> > > > I'll continue debugging it, and can post my .config if it will be helpful, but
> > > > hopefully you have some suggestions on what to try :-)
> > > >
> > >
> > > Based on the output you have given me, it would appear the real
> > > underlying cause is that fragmentation caused the allocation to fail.
> > > The following patch might fix the problem.
> > >
> > Unfortunately this has no impact, the same issue occurs.
> >
>
> What is the output of the following debug patch?
>
...
Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
------------[ cut here ]------------
Badness at mm/page_alloc.c:1046
Pid : 0, Comm: swapper
CPU : 0 Not tainted (2.6.31-rc7 #2864)
PC is at free_hot_cold_page+0xa0/0x150
PR is at free_hot_cold_page+0x7e/0x150
PC : 0c039804 SP : 0c0f5fa0 SR : 400000f0
R0 : 00000002 R1 : 00000001 R2 : 0c20eb20 R3 : 0c000000
R4 : 00000002 R5 : 00000024 R6 : 00000002 R7 : 0c079260
R8 : 0c103000 R9 : 0c21c360 R10 : 0c102fec R11 : ffffffff
R12 : 00000000 R13 : 0000d000 R14 : 00000000
MACH: 00000008 MACL: 0000000d GBR : 00000000 PR : 0c0397e2
Call trace:
[<0c1093a2>] free_all_bootmem_core+0xda/0x1bc
[<0c106da2>] mem_init+0x22/0xe0
[<0c0112dc>] printk+0x0/0x24
[<0c108f5c>] __alloc_bootmem+0x0/0xc
[<0c104480>] start_kernel+0xe8/0x4b8
[<0c00201c>] _stext+0x1c/0x28
[<0c002000>] _stext+0x0/0x28
Code:
0c0397fe: negc r11, r1
0c039800: tst r1, r1
0c039802: bt 0c039810
->0c039804: trapa #62
0c039806: tst r1, r1
0c039808: bt 0c039810
0c03980a: mov #1, r2
0c03980c: mov.l 0c0398ac <free_hot_cold_page+0x148/0x150>, r1 ! 0c20e9dc <0xc20e9dc>
0c03980e: mov.l r2, @r1
...
* Re: page allocator regression on nommu
2009-09-01 0:46 ` Paul Mundt
@ 2009-09-01 10:03 ` Mel Gorman
2009-09-01 10:20 ` Paul Mundt
0 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2009-09-01 10:03 UTC (permalink / raw)
To: Paul Mundt, Christoph Lameter, KOSAKI Motohiro, Pekka Enberg,
Peter Zijlstra, Nick Piggin, Dave Hansen, Lee Schermerhorn,
Andrew Morton, Linus Torvalds, David Howells, linux-mm,
linux-kernel
On Tue, Sep 01, 2009 at 09:46:27AM +0900, Paul Mundt wrote:
> > What is the output of the following debug patch?
> >
>
> ...
> Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
> ------------[ cut here ]------------
> Badness at mm/page_alloc.c:1046
>
Ok, it looks like ownership was not being taken properly and the first
patch was incomplete. Please try
====
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d052abb..5596880 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -817,13 +815,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
* agressive about taking ownership of free pages
*/
if (unlikely(current_order >= (pageblock_order >> 1)) ||
- start_migratetype == MIGRATE_RECLAIMABLE) {
+ start_migratetype == MIGRATE_RECLAIMABLE ||
+ page_group_by_mobility_disabled) {
unsigned long pages;
pages = move_freepages_block(zone, page,
start_migratetype);
/* Claim the whole block if over half of it is free */
- if (pages >= (1 << (pageblock_order-1)))
+ if (pages >= (1 << (pageblock_order-1)) ||
+ page_group_by_mobility_disabled)
set_pageblock_migratetype(page,
start_migratetype);
--
* Re: page allocator regression on nommu
2009-09-01 10:03 ` Mel Gorman
@ 2009-09-01 10:20 ` Paul Mundt
0 siblings, 0 replies; 13+ messages in thread
From: Paul Mundt @ 2009-09-01 10:20 UTC (permalink / raw)
To: Mel Gorman
Cc: Christoph Lameter, KOSAKI Motohiro, Pekka Enberg, Peter Zijlstra,
Nick Piggin, Dave Hansen, Lee Schermerhorn, Andrew Morton,
Linus Torvalds, David Howells, linux-mm, linux-kernel
On Tue, Sep 01, 2009 at 11:03:56AM +0100, Mel Gorman wrote:
> On Tue, Sep 01, 2009 at 09:46:27AM +0900, Paul Mundt wrote:
> > > What is the output of the following debug patch?
> > >
> >
> > ...
> > Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > ------------[ cut here ]------------
> > Badness at mm/page_alloc.c:1046
> >
>
> Ok, it looks like ownership was not being taken properly and the first
> patch was incomplete. Please try
>
That did the trick, everything looks back to normal now. :-)
Tested-by: Paul Mundt <lethal@linux-sh.org>
* Re: page allocator regression on nommu
2009-08-31 7:48 page allocator regression on nommu Paul Mundt
2009-08-31 10:08 ` Pekka Enberg
2009-08-31 10:30 ` Mel Gorman
@ 2009-09-01 13:35 ` David Howells
2 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2009-09-01 13:35 UTC (permalink / raw)
To: Pekka Enberg
Cc: dhowells, Paul Mundt, Mel Gorman, Christoph Lameter,
KOSAKI Motohiro, Peter Zijlstra, Nick Piggin, Dave Hansen,
Lee Schermerhorn, Andrew Morton, Linus Torvalds, linux-mm,
linux-kernel
Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> This looks to be a bug in nommu do_mmap_pgoff() error handling. I
> guess we shouldn't call __put_nommu_region() if add_nommu_region()
> hasn't been called?
We should, to make sure the region gets cleaned up properly. However, it will
go wrong if do_mmap_shared_file() or do_mmap_private() fail. We should
perhaps call add_nommu_region() before doing the "set up the mapping" chunk -
we hold the region semaphore, so it shouldn't hurt anyone if we then have to
remove it again.
David
* Re: page allocator regression on nommu
2009-08-31 10:08 ` Pekka Enberg
2009-08-31 10:26 ` Paul Mundt
@ 2009-09-01 13:46 ` David Howells
2009-09-01 13:48 ` Pekka Enberg
2009-09-01 14:27 ` Paul Mundt
1 sibling, 2 replies; 13+ messages in thread
From: David Howells @ 2009-09-01 13:46 UTC (permalink / raw)
To: Paul Mundt
Cc: dhowells, Pekka Enberg, Mel Gorman, Christoph Lameter,
KOSAKI Motohiro, Peter Zijlstra, Nick Piggin, Dave Hansen,
Lee Schermerhorn, Andrew Morton, Linus Torvalds, linux-mm,
linux-kernel
Paul Mundt <lethal@linux-sh.org> wrote:
> Yeah, that looks a bit suspect. __put_nommu_region() is safe to be called
> without a call to add_nommu_region(), but we happen to trip over the
> BUG_ON() in this case because we've never made a single addition to the
> region tree.
>
> We probably ought to just up_write() and return if nommu_region_tree ==
> RB_ROOT, which is what I'll do unless David objects.
I think that's the wrong thing to do. I think we're better moving the call to
add_nommu_region() to above the "/* set up the mapping */" comment. We hold
the region semaphore at this point, so the fact that it winds up in the tree
briefly won't cause a race, and it means __put_nommu_region() can be used with
impunity to correctly clean up.
See attached patch.
David
---
From: David Howells <dhowells@redhat.com>
Subject: [PATCH] NOMMU: Fix error handling in do_mmap_pgoff()
Fix the error handling in do_mmap_pgoff(). If do_mmap_shared_file() or
do_mmap_private() fail, we jump to the error_put_region label at which point we
call __put_nommu_region() on the region - but we haven't yet added the region
to the tree, and so __put_nommu_region() may BUG because the region tree is
empty or it may corrupt the region tree.
To get around this, we can afford to add the region to the region tree before
calling do_mmap_shared_file() or do_mmap_private() as we keep nommu_region_sem
write-locked, so no-one can race with us by seeing a transient region.
Signed-off-by: David Howells <dhowells@redhat.com>
---
mm/nommu.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/mm/nommu.c b/mm/nommu.c
index 7466c7a..aabe86c 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1347,6 +1347,7 @@ unsigned long do_mmap_pgoff(struct file *file,
}
vma->vm_region = region;
+ add_nommu_region(region);
/* set up the mapping */
if (file && vma->vm_flags & VM_SHARED)
@@ -1356,8 +1357,6 @@ unsigned long do_mmap_pgoff(struct file *file,
if (ret < 0)
goto error_put_region;
- add_nommu_region(region);
-
/* okay... we have a mapping; now we have to register it */
result = vma->vm_start;
--
* Re: page allocator regression on nommu
2009-09-01 13:46 ` David Howells
@ 2009-09-01 13:48 ` Pekka Enberg
2009-09-01 14:27 ` Paul Mundt
1 sibling, 0 replies; 13+ messages in thread
From: Pekka Enberg @ 2009-09-01 13:48 UTC (permalink / raw)
To: David Howells
Cc: Paul Mundt, Mel Gorman, Christoph Lameter, KOSAKI Motohiro,
Peter Zijlstra, Nick Piggin, Dave Hansen, Lee Schermerhorn,
Andrew Morton, Linus Torvalds, linux-mm, linux-kernel
On Tue, 2009-09-01 at 14:46 +0100, David Howells wrote:
> From: David Howells <dhowells@redhat.com>
> Subject: [PATCH] NOMMU: Fix error handling in do_mmap_pgoff()
>
> Fix the error handling in do_mmap_pgoff(). If do_mmap_shared_file() or
> do_mmap_private() fail, we jump to the error_put_region label at which point we
> call __put_nommu_region() on the region - but we haven't yet added the region
> to the tree, and so __put_nommu_region() may BUG because the region tree is
> empty or it may corrupt the region tree.
>
> To get around this, we can afford to add the region to the region tree before
> calling do_mmap_shared_file() or do_mmap_private() as we keep nommu_region_sem
> write-locked, so no-one can race with us by seeing a transient region.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
Looks sane to me. FWIW:
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Pekka
* Re: page allocator regression on nommu
2009-09-01 13:46 ` David Howells
2009-09-01 13:48 ` Pekka Enberg
@ 2009-09-01 14:27 ` Paul Mundt
1 sibling, 0 replies; 13+ messages in thread
From: Paul Mundt @ 2009-09-01 14:27 UTC (permalink / raw)
To: David Howells
Cc: Pekka Enberg, Mel Gorman, Christoph Lameter, KOSAKI Motohiro,
Peter Zijlstra, Nick Piggin, Dave Hansen, Lee Schermerhorn,
Andrew Morton, Linus Torvalds, linux-mm, linux-kernel
On Tue, Sep 01, 2009 at 02:46:45PM +0100, David Howells wrote:
> Paul Mundt <lethal@linux-sh.org> wrote:
>
> > Yeah, that looks a bit suspect. __put_nommu_region() is safe to be called
> > without a call to add_nommu_region(), but we happen to trip over the
> > BUG_ON() in this case because we've never made a single addition to the
> > region tree.
> >
> > We probably ought to just up_write() and return if nommu_region_tree ==
> > RB_ROOT, which is what I'll do unless David objects.
>
> I think that's the wrong thing to do. I think we're better moving the call to
> add_nommu_region() to above the "/* set up the mapping */" comment. We hold
> the region semaphore at this point, so the fact that it winds up in the tree
> briefly won't cause a race, and it means __put_nommu_region() can be used with
> impunity to correctly clean up.
>
[snip]
> From: David Howells <dhowells@redhat.com>
> Subject: [PATCH] NOMMU: Fix error handling in do_mmap_pgoff()
>
> Fix the error handling in do_mmap_pgoff(). If do_mmap_shared_file() or
> do_mmap_private() fail, we jump to the error_put_region label at which point we
> call __put_nommu_region() on the region - but we haven't yet added the region
> to the tree, and so __put_nommu_region() may BUG because the region tree is
> empty or it may corrupt the region tree.
>
> To get around this, we can afford to add the region to the region tree before
> calling do_mmap_shared_file() or do_mmap_private() as we keep nommu_region_sem
> write-locked, so no-one can race with us by seeing a transient region.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
Agreed, that does look cleaner. After playing around with it a bit, I concede
that the BUG_ON() is definitely worth preserving. :-)
Acked-by: Paul Mundt <lethal@linux-sh.org>