* oom-killer killing even if memory is available?
@ 2009-03-17 9:00 Heiko Carstens
2009-03-17 9:46 ` Andrew Morton
2009-03-17 9:51 ` Nick Piggin
0 siblings, 2 replies; 13+ messages in thread
From: Heiko Carstens @ 2009-03-17 9:00 UTC (permalink / raw)
To: linux-mm
Cc: Mel Gorman, Nick Piggin, Andrew Morton, Martin Schwidefsky,
Andreas Krebbel
Hi all,
the below looks like there is some bug in the memory management code.
Even if there seems to be plenty of memory available the oom-killer
kills processes.
The below happened after 27 days uptime, memory seems to be heavily
fragmented, but there are still larger portions of memory free that
could satisfy an order 2 allocation. Any idea why this fails?
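The buddy-allocator lines in the log below ("DMA: 101853*4kB 7419*8kB ...") list the number of free blocks per order, so the claim can be checked by counting the blocks of order 2 (16kB) and larger. The following is only a rough sketch: the helper names are made up for illustration, and it assumes 4kB pages with the syslog prefix already stripped from the two lines of the first dump.

```python
import re

def parse_buddy_line(line):
    """Return (zone, [free block count per order]) for a 'Zone: N*4kB ...' line."""
    zone, _, rest = line.partition(":")
    # Each "count*sizekB" token is one order, smallest first; "= total" is skipped.
    counts = [int(count) for count, _size in re.findall(r"(\d+)\*(\d+)kB", rest)]
    return zone.strip(), counts

def blocks_at_or_above(counts, order):
    """Number of free blocks that could satisfy an allocation of the given order."""
    return sum(counts[order:])

# Lines copied from the first Mem-Info dump (Mar 16 21:40:40).
lines = [
    "DMA: 101853*4kB 7419*8kB 2*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB = 467692kB",
    "Normal: 28880*4kB 121*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB = 117944kB",
]

zones = dict(parse_buddy_line(line) for line in lines)
for zone, counts in zones.items():
    print(zone, "free blocks of order >= 2:", blocks_at_or_above(counts, 2))
```

Both zones still hold a handful of order-2 or larger blocks, which is what the question is based on.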
[root@t6360003 ~]# uptime
09:33:41 up 27 days, 22:55, 1 user, load average: 0.00, 0.00, 0.00
Mar 16 21:40:40 t6360003 kernel: basename invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
Mar 16 21:40:40 t6360003 kernel: CPU: 0 Not tainted 2.6.28 #1
Mar 16 21:40:40 t6360003 kernel: Process basename (pid: 30555, task: 000000007baa6838, ksp: 0000000063867968)
Mar 16 21:40:40 t6360003 kernel: 0700000084a8c238 0000000063867a90 0000000000000002 0000000000000000
Mar 16 21:40:40 t6360003 kernel: 0000000063867b30 0000000063867aa8 0000000063867aa8 000000000010534e
Mar 16 21:40:40 t6360003 kernel: 0000000000000000 0000000063867968 0000000000000000 000000000000000a
Mar 16 21:40:40 t6360003 kernel: 000000000000000d 0000000000000000 0000000063867a90 0000000063867b08
Mar 16 21:40:40 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000063867a90 0000000063867ae0
Mar 16 21:40:40 t6360003 kernel: Call Trace:
Mar 16 21:40:40 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
Mar 16 21:40:40 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
Mar 16 21:40:40 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
Mar 16 21:40:40 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
Mar 16 21:40:40 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
Mar 16 21:40:40 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
Mar 16 21:40:40 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
Mar 16 21:40:40 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
Mar 16 21:40:40 t6360003 kernel: [<00000000001a802e>] handle_mm_fault+0x262/0x9cc
Mar 16 21:40:40 t6360003 kernel: [<00000000004a1a7a>] do_dat_exception+0x30a/0x41c
Mar 16 21:40:40 t6360003 kernel: [<0000000000115e5c>] sysc_return+0x0/0x8
Mar 16 21:40:40 t6360003 kernel: [<0000004d193bfae0>] 0x4d193bfae0
Mar 16 21:40:40 t6360003 kernel: Mem-Info:
Mar 16 21:40:40 t6360003 kernel: DMA per-cpu:
Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: Normal per-cpu:
Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 30
Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
Mar 16 21:40:40 t6360003 kernel: DMA: 101853*4kB 7419*8kB 2*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB = 467692kB
Mar 16 21:40:40 t6360003 kernel: Normal: 28880*4kB 121*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB = 117944kB
Mar 16 21:40:40 t6360003 kernel: 1688 total pagecache pages
Mar 16 21:40:40 t6360003 kernel: 564 pages in swap cache
Mar 16 21:40:40 t6360003 kernel: Swap cache stats: add 1106206, delete 1105642, find 599107/618721
Mar 16 21:40:40 t6360003 kernel: Free swap = 1959300kB
Mar 16 21:40:40 t6360003 kernel: Total swap = 1999992kB
Mar 16 21:40:40 t6360003 kernel: 1048576 pages RAM
Mar 16 21:40:40 t6360003 kernel: 20255 pages reserved
Mar 16 21:40:40 t6360003 kernel: 10560 pages shared
Mar 16 21:40:40 t6360003 kernel: 878998 pages non-shared
Mar 16 21:40:40 t6360003 kernel: Out of memory: kill process 30502 (cc1) score 3672 or a child
Mar 16 21:40:40 t6360003 kernel: Killed process 30502 (cc1)
Mar 17 01:33:12 t6360003 kernel: sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
Mar 17 01:33:12 t6360003 kernel: CPU: 5 Not tainted 2.6.28 #1
Mar 17 01:33:12 t6360003 kernel: Process sh (pid: 16756, task: 0000000004852738, ksp: 0000000050b7d738)
Mar 17 01:33:12 t6360003 kernel: 07000000fbb28238 0000000050b7d860 0000000000000002 0000000000000000
Mar 17 01:33:12 t6360003 kernel: 0000000050b7d900 0000000050b7d878 0000000050b7d878 000000000010534e
Mar 17 01:33:12 t6360003 kernel: 0000000000000000 0000000050b7d738 0000000000000000 000000000000000a
Mar 17 01:33:12 t6360003 kernel: 000000000000000d 0000000000000000 0000000050b7d860 0000000050b7d8d8
Mar 17 01:33:12 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000050b7d860 0000000050b7d8b0
Mar 17 01:33:12 t6360003 kernel: Call Trace:
Mar 17 01:33:12 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
Mar 17 01:33:12 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
Mar 17 01:33:12 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
Mar 17 01:33:12 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
Mar 17 01:33:12 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
Mar 17 01:33:12 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
Mar 17 01:33:12 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
Mar 17 01:33:12 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
Mar 17 01:33:12 t6360003 kernel: [<00000000001aa8ac>] copy_page_range+0x9ac/0xadc
Mar 17 01:33:12 t6360003 kernel: [<000000000013db32>] dup_mm+0x342/0x604
Mar 17 01:33:12 t6360003 kernel: [<000000000013ef70>] copy_process+0x1118/0x1158
Mar 17 01:33:12 t6360003 kernel: [<000000000013f046>] do_fork+0x96/0x2dc
Mar 17 01:33:12 t6360003 kernel: [<000000000010a402>] sys_clone+0x6a/0x78
Mar 17 01:33:12 t6360003 kernel: [<0000000000115e56>] sysc_noemu+0x10/0x16
Mar 17 01:33:12 t6360003 kernel: [<0000004d1949a152>] 0x4d1949a152
Mar 17 01:33:12 t6360003 kernel: Mem-Info:
Mar 17 01:33:12 t6360003 kernel: DMA per-cpu:
Mar 17 01:33:12 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: Normal per-cpu:
Mar 17 01:33:12 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 30
Mar 17 01:33:12 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Mar 17 01:33:12 t6360003 kernel: Active_anon:1057 active_file:85 inactive_anon:457
Mar 17 01:33:12 t6360003 kernel: inactive_file:163 unevictable:987 dirty:6 writeback:414 unstable:0
Mar 17 01:33:12 t6360003 kernel: free:136683 slab:884736 mapped:832 pagetables:375 bounce:0
Mar 17 01:33:12 t6360003 kernel: DMA free:445420kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:8kB active_file:32kB inactive_file:4kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
Mar 17 01:33:12 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
Mar 17 01:33:12 t6360003 kernel: Normal free:101312kB min:4064kB low:5080kB high:6096kB active_anon:4228kB inactive_anon:1820kB active_file:308kB inactive_file:648kB unevictable:3948kB present:2068480kB pages_scanned:0 all_unreclaimable? no
Mar 17 01:33:12 t6360003 kernel: lowmem_reserve[]: 0 0 0
Mar 17 01:33:12 t6360003 kernel: DMA: 100796*4kB 5166*8kB 5*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB = 445648kB
Mar 17 01:33:12 t6360003 kernel: Normal: 24811*4kB 56*8kB 1*16kB 4*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB = 101500kB
Mar 17 01:33:12 t6360003 kernel: 2265 total pagecache pages
Mar 17 01:33:12 t6360003 kernel: 1197 pages in swap cache
Mar 17 01:33:12 t6360003 kernel: Swap cache stats: add 3336530, delete 3335333, find 2045244/2201205
Mar 17 01:33:12 t6360003 kernel: Free swap = 1971336kB
Mar 17 01:33:12 t6360003 kernel: Total swap = 1999992kB
Mar 17 01:33:12 t6360003 kernel: 1048576 pages RAM
Mar 17 01:33:12 t6360003 kernel: 20255 pages reserved
Mar 17 01:33:12 t6360003 kernel: 9350 pages shared
Mar 17 01:33:12 t6360003 kernel: 888261 pages non-shared
Mar 17 01:33:12 t6360003 kernel: Out of memory: kill process 27449 (rpmbuild) score 3460 or a child
Mar 17 01:33:12 t6360003 kernel: Killed process 27519 (sh)
Mar 17 01:33:13 t6360003 kernel: as invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
Mar 17 01:33:13 t6360003 kernel: CPU: 2 Not tainted 2.6.28 #1
Mar 17 01:33:13 t6360003 kernel: Process as (pid: 16914, task: 0000000084aba338, ksp: 000000000e3c76d8)
Mar 17 01:33:13 t6360003 kernel: 0700000035e74138 000000000e3c7800 0000000000000002 0000000000000000
Mar 17 01:33:13 t6360003 kernel: 000000000e3c78a0 000000000e3c7818 000000000e3c7818 000000000010534e
Mar 17 01:33:13 t6360003 kernel: 0000000000000000 000000000e3c76d8 0000000000000000 000000000000000a
Mar 17 01:33:13 t6360003 kernel: 000000000000000d 0000000000000000 000000000e3c7800 000000000e3c7878
Mar 17 01:33:13 t6360003 kernel: 00000000004a5ab0 000000000010534e 000000000e3c7800 000000000e3c7850
Mar 17 01:33:13 t6360003 kernel: Call Trace:
Mar 17 01:33:13 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
Mar 17 01:33:13 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
Mar 17 01:33:13 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
Mar 17 01:33:13 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
Mar 17 01:33:13 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
Mar 17 01:33:13 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
Mar 17 01:33:13 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
Mar 17 01:33:13 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
Mar 17 01:33:13 t6360003 kernel: [<00000000001a802e>] handle_mm_fault+0x262/0x9cc
Mar 17 01:33:13 t6360003 kernel: [<00000000001a894e>] __get_user_pages+0x1b6/0x574
Mar 17 01:33:13 t6360003 kernel: [<00000000001a8d5a>] get_user_pages+0x4e/0x60
Mar 17 01:33:13 t6360003 kernel: [<00000000001d58c4>] get_arg_page+0x6c/0xe8
Mar 17 01:33:13 t6360003 kernel: [<00000000001d5c3a>] copy_strings+0x1aa/0x290
Mar 17 01:33:13 t6360003 kernel: [<00000000001d5d7e>] copy_strings_kernel+0x5e/0xb0
Mar 17 01:33:13 t6360003 kernel: [<00000000001d78b0>] do_execve+0x1c8/0x254
Mar 17 01:33:13 t6360003 kernel: [<000000000010a2f8>] sys_execve+0x80/0xb8
Mar 17 01:33:13 t6360003 kernel: [<0000000000115e56>] sysc_noemu+0x10/0x16
Mar 17 01:33:13 t6360003 kernel: [<0000004d1949a40c>] 0x4d1949a40c
Mar 17 01:33:13 t6360003 kernel: Mem-Info:
Mar 17 01:33:13 t6360003 kernel: DMA per-cpu:
Mar 17 01:33:13 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: Normal per-cpu:
Mar 17 01:33:13 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 17 01:33:13 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 12
Mar 17 01:33:13 t6360003 kernel: Active_anon:274 active_file:126 inactive_anon:92
Mar 17 01:33:13 t6360003 kernel: inactive_file:110 unevictable:987 dirty:20 writeback:222 unstable:0
Mar 17 01:33:13 t6360003 kernel: free:137753 slab:884727 mapped:901 pagetables:318 bounce:0
Mar 17 01:33:13 t6360003 kernel: DMA free:445604kB min:4064kB low:5080kB high:6096kB active_anon:8kB inactive_anon:0kB active_file:0kB inactive_file:104kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
Mar 17 01:33:13 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
Mar 17 01:33:13 t6360003 kernel: Normal free:105408kB min:4064kB low:5080kB high:6096kB active_anon:1088kB inactive_anon:368kB active_file:504kB inactive_file:336kB unevictable:3948kB present:2068480kB pages_scanned:450 all_unreclaimable? no
Mar 17 01:33:13 t6360003 kernel: lowmem_reserve[]: 0 0 0
Mar 17 01:33:13 t6360003 kernel: DMA: 100811*4kB 5173*8kB 6*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB = 445812kB
Mar 17 01:33:13 t6360003 kernel: Normal: 25877*4kB 62*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB = 105652kB
Mar 17 01:33:13 t6360003 kernel: 1477 total pagecache pages
Mar 17 01:33:13 t6360003 kernel: 398 pages in swap cache
Mar 17 01:33:13 t6360003 kernel: Swap cache stats: add 3343439, delete 3343041, find 2048863/2205415
Mar 17 01:33:13 t6360003 kernel: Free swap = 1960020kB
Mar 17 01:33:13 t6360003 kernel: Total swap = 1999992kB
Mar 17 01:33:13 t6360003 kernel: 1048576 pages RAM
Mar 17 01:33:13 t6360003 kernel: 20255 pages reserved
Mar 17 01:33:13 t6360003 kernel: 9549 pages shared
Mar 17 01:33:13 t6360003 kernel: 887159 pages non-shared
Mar 17 01:33:13 t6360003 kernel: Out of memory: kill process 29305 (make) score 3403 or a child
Mar 17 01:33:13 t6360003 kernel: Killed process 29320 (sh)
Mar 17 01:33:14 t6360003 kernel: sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
Mar 17 01:33:14 t6360003 kernel: CPU: 4 Not tainted 2.6.28 #1
Mar 17 01:33:14 t6360003 kernel: Process sh (pid: 16922, task: 0000000084ab6138, ksp: 0000000060781738)
Mar 17 01:33:14 t6360003 kernel: 070000003d7a0438 0000000060781860 0000000000000002 0000000000000000
Mar 17 01:33:14 t6360003 kernel: 0000000060781900 0000000060781878 0000000060781878 000000000010534e
Mar 17 01:33:14 t6360003 kernel: 0000000000000000 0000000060781738 0000000000000000 000000000000000a
Mar 17 01:33:14 t6360003 kernel: 000000000000000d 0000000000000000 0000000060781860 00000000607818d8
Mar 17 01:33:14 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000060781860 00000000607818b0
Mar 17 01:33:14 t6360003 kernel: Call Trace:
Mar 17 01:33:14 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
Mar 17 01:33:14 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
Mar 17 01:33:14 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
Mar 17 01:33:14 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
Mar 17 01:33:14 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
Mar 17 01:33:14 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
Mar 17 01:33:14 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
Mar 17 01:33:14 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
Mar 17 01:33:14 t6360003 kernel: [<00000000001aa8ac>] copy_page_range+0x9ac/0xadc
Mar 17 01:33:14 t6360003 kernel: [<000000000013db32>] dup_mm+0x342/0x604
Mar 17 01:33:14 t6360003 kernel: [<000000000013ef70>] copy_process+0x1118/0x1158
Mar 17 01:33:14 t6360003 kernel: [<000000000013f046>] do_fork+0x96/0x2dc
Mar 17 01:33:14 t6360003 kernel: [<000000000010a402>] sys_clone+0x6a/0x78
Mar 17 01:33:14 t6360003 kernel: [<0000000000115e56>] sysc_noemu+0x10/0x16
Mar 17 01:33:14 t6360003 kernel: [<0000004d1949a152>] 0x4d1949a152
Mar 17 01:33:14 t6360003 kernel: Mem-Info:
Mar 17 01:33:14 t6360003 kernel: DMA per-cpu:
Mar 17 01:33:14 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: Normal per-cpu:
Mar 17 01:33:14 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 42
Mar 17 01:33:14 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 50
Mar 17 01:33:14 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Mar 17 01:33:14 t6360003 kernel: Active_anon:171 active_file:100 inactive_anon:74
Mar 17 01:33:14 t6360003 kernel: inactive_file:78 unevictable:987 dirty:0 writeback:146 unstable:0
Mar 17 01:33:14 t6360003 kernel: free:137942 slab:884626 mapped:858 pagetables:357 bounce:0
Mar 17 01:33:14 t6360003 kernel: DMA free:445612kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:8kB active_file:104kB inactive_file:0kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
Mar 17 01:33:14 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
Mar 17 01:33:14 t6360003 kernel: Normal free:106156kB min:4064kB low:5080kB high:6096kB active_anon:684kB inactive_anon:288kB active_file:296kB inactive_file:312kB unevictable:3948kB present:2068480kB pages_scanned:544 all_unreclaimable? no
Mar 17 01:33:14 t6360003 kernel: lowmem_reserve[]: 0 0 0
Mar 17 01:33:14 t6360003 kernel: DMA: 100818*4kB 5175*8kB 2*16kB 3*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB = 445888kB
Mar 17 01:33:14 t6360003 kernel: Normal: 25978*4kB 76*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB = 106184kB
Mar 17 01:33:14 t6360003 kernel: 1315 total pagecache pages
Mar 17 01:33:14 t6360003 kernel: 321 pages in swap cache
Mar 17 01:33:14 t6360003 kernel: Swap cache stats: add 3349005, delete 3348684, find 2049544/2206461
Mar 17 01:33:14 t6360003 kernel: Free swap = 1947096kB
Mar 17 01:33:14 t6360003 kernel: Total swap = 1999992kB
Mar 17 01:33:14 t6360003 kernel: 1048576 pages RAM
Mar 17 01:33:14 t6360003 kernel: 20255 pages reserved
Mar 17 01:33:14 t6360003 kernel: 9878 pages shared
Mar 17 01:33:14 t6360003 kernel: 887456 pages non-shared
Mar 17 01:33:14 t6360003 kernel: Out of memory: kill process 16782 (cc1) score 3375 or a child
Mar 17 01:33:14 t6360003 kernel: Killed process 16782 (cc1)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: oom-killer killing even if memory is available?
2009-03-17 9:00 oom-killer killing even if memory is available? Heiko Carstens
@ 2009-03-17 9:46 ` Andrew Morton
2009-03-17 10:17 ` Heiko Carstens
2009-03-20 15:27 ` Mel Gorman
2009-03-17 9:51 ` Nick Piggin
1 sibling, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2009-03-17 9:46 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-mm, Mel Gorman, Nick Piggin, Martin Schwidefsky,
Andreas Krebbel
On Tue, 17 Mar 2009 10:00:49 +0100 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> Hi all,
>
> the below looks like there is some bug in the memory management code.
> Even if there seems to be plenty of memory available the oom-killer
> kills processes.
>
> The below happened after 27 days uptime, memory seems to be heavily
> fragmented, but there are still larger portions of memory free that
> could satisfy an order 2 allocation. Any idea why this fails?
>
> [root@t6360003 ~]# uptime
> 09:33:41 up 27 days, 22:55, 1 user, load average: 0.00, 0.00, 0.00
>
> Mar 16 21:40:40 t6360003 kernel: basename invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
> Mar 16 21:40:40 t6360003 kernel: CPU: 0 Not tainted 2.6.28 #1
> Mar 16 21:40:40 t6360003 kernel: Process basename (pid: 30555, task: 000000007baa6838, ksp: 0000000063867968)
> Mar 16 21:40:40 t6360003 kernel: 0700000084a8c238 0000000063867a90 0000000000000002 0000000000000000
> Mar 16 21:40:40 t6360003 kernel: 0000000063867b30 0000000063867aa8 0000000063867aa8 000000000010534e
> Mar 16 21:40:40 t6360003 kernel: 0000000000000000 0000000063867968 0000000000000000 000000000000000a
> Mar 16 21:40:40 t6360003 kernel: 000000000000000d 0000000000000000 0000000063867a90 0000000063867b08
> Mar 16 21:40:40 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000063867a90 0000000063867ae0
> Mar 16 21:40:40 t6360003 kernel: Call Trace:
> Mar 16 21:40:40 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
> Mar 16 21:40:40 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
> Mar 16 21:40:40 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
> Mar 16 21:40:40 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
> Mar 16 21:40:40 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
> Mar 16 21:40:40 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
> Mar 16 21:40:40 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
> Mar 16 21:40:40 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
> Mar 16 21:40:40 t6360003 kernel: [<00000000001a802e>] handle_mm_fault+0x262/0x9cc
> Mar 16 21:40:40 t6360003 kernel: [<00000000004a1a7a>] do_dat_exception+0x30a/0x41c
> Mar 16 21:40:40 t6360003 kernel: [<0000000000115e5c>] sysc_return+0x0/0x8
> Mar 16 21:40:40 t6360003 kernel: [<0000004d193bfae0>] 0x4d193bfae0
> Mar 16 21:40:40 t6360003 kernel: Mem-Info:
> Mar 16 21:40:40 t6360003 kernel: DMA per-cpu:
> Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: Normal per-cpu:
> Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 30
> Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
The scanner has wrung pretty much all it can out of the reclaimable pages -
the LRUs are nearly empty. There are a few hundred MB free and apparently we
don't have four physically contiguous free pages anywhere. It's
believable.
The question is: where the heck did all your memory go? You have 2GB of
ZONE_NORMAL memory in that machine, but only a tenth of it is visible to
the page reclaim code.
Something must have allocated (and possibly leaked) it.
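To put rough numbers on that question, the page counters from the first Mem-Info dump can be tallied against the "pages RAM" total. This is only an illustrative sketch: it assumes 4kB pages (suggested by the "*4kB" order-0 entries in the log), and the variable names are made up here.

```python
PAGE_KB = 4  # assumption: 4kB pages, as suggested by the "*4kB" order-0 entries

# Page counts copied from the first Mem-Info dump (Mar 16 21:40:40).
counters = {
    "active_anon": 372,
    "active_file": 45,
    "inactive_anon": 154,
    "inactive_file": 152,
    "unevictable": 987,
    "free": 146348,
    "slab": 875833,
    "pagetables": 378,
}
total_ram_pages = 1048576

def share(pages):
    """Percentage of total RAM represented by a page count."""
    return 100.0 * pages / total_ram_pages

# Pages on the reclaimable LRU lists (anon + file, active + inactive).
lru_pages = sum(counters[name] for name in
                ("active_anon", "active_file", "inactive_anon", "inactive_file"))

print("LRU pages: %d (%d kB)" % (lru_pages, lru_pages * PAGE_KB))
for name in ("free", "slab"):
    pages = counters[name]
    print("%s: %d pages, %d MB, %.1f%% of RAM"
          % (name, pages, pages * PAGE_KB // 1024, share(pages)))
```

On these figures the slab counter is by far the largest consumer, so slab usage looks like the first place to check.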
* Re: oom-killer killing even if memory is available?
2009-03-17 9:00 oom-killer killing even if memory is available? Heiko Carstens
2009-03-17 9:46 ` Andrew Morton
@ 2009-03-17 9:51 ` Nick Piggin
2009-03-17 10:11 ` Heiko Carstens
1 sibling, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-03-17 9:51 UTC (permalink / raw)
To: Heiko Carstens, Mel Gorman
Cc: linux-mm, Nick Piggin, Andrew Morton, Martin Schwidefsky,
Andreas Krebbel
On Tuesday 17 March 2009 20:00:49 Heiko Carstens wrote:
> Hi all,
>
> the below looks like there is some bug in the memory management code.
> Even if there seems to be plenty of memory available the oom-killer
> kills processes.
>
> The below happened after 27 days uptime, memory seems to be heavily
> fragmented,
What slab allocator are you using?
> but there are still larger portions of memory free that
> could satisfy an order 2 allocation. Any idea why this fails?
We still keep some watermarks around for higher order pages (for
GFP_ATOMIC, page reclaim, and similar purposes).
Possibly it is being a bit aggressive with the higher orders; when I
added it I just made a guess at a sane function. See
mm/page_alloc.c:zone_watermark_ok(). In particular, the for loop at the
end of the function is the slowpath where it calculates the higher
order watermarks. In the min >>= 1 statement, the 1 could be replaced with 2.
Or we could just keep reserves for 0..PAGE_ALLOC_COSTLY_ORDER and then
give away _any_ free pages for higher orders than that.
That would still seem to just prolong the inevitable, though. Exploding
after 27 days of uptime is rather sad :(
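The effect of that loop on the reported free lists can be tried out with a small model. The sketch below is a simplified Python rendering of the check, not the kernel code: it leaves out the alloc_flags adjustments, takes the free page count from the buddy list in the first dump rather than from the zone counter, assumes 4kB pages, and adds a hypothetical shift parameter to stand in for replacing the 1 in min >>= 1 with 2.

```python
def zone_watermark_ok(nr_free, order, mark, lowmem_reserve=0, shift=1):
    """Simplified model of the higher-order watermark check.

    nr_free[o] is the number of free blocks of order o in the zone;
    mark and lowmem_reserve are in pages. shift=1 mirrors "min >>= 1".
    """
    free_pages = sum(n << o for o, n in enumerate(nr_free)) - (1 << order) + 1
    minimum = mark
    if free_pages <= minimum + lowmem_reserve:
        return False
    for o in range(order):
        # Blocks of this order cannot serve the request, so discount them.
        free_pages -= nr_free[o] << o
        # Require fewer free pages at each higher order.
        minimum >>= shift
        if free_pages <= minimum:
            return False
    return True

KB_PER_PAGE = 4  # assumption: 4kB pages
# Free block counts per order, from the first dump (Mar 16 21:40:40).
dma = [101853, 7419, 2, 2, 1, 0, 1, 1, 0]
normal = [28880, 121, 1, 1, 0, 1, 1, 0, 1]
min_pages = 4064 // KB_PER_PAGE    # "min:4064kB"
high_pages = 6096 // KB_PER_PAGE   # "high:6096kB"

dma_min = zone_watermark_ok(dma, 2, min_pages, lowmem_reserve=2020)
dma_min_shift2 = zone_watermark_ok(dma, 2, min_pages, lowmem_reserve=2020, shift=2)
normal_min = zone_watermark_ok(normal, 2, min_pages)
normal_high = zone_watermark_ok(normal, 2, high_pages)

print("DMA, min mark:", dma_min)
print("DMA, min mark, shift=2:", dma_min_shift2)
print("Normal, min mark:", normal_min)
print("Normal, high mark:", normal_high)
```

In this model the DMA zone fails the order-2 check at the min mark even though a few order-2 blocks are free, and passes once the shift is 2. The Normal zone passes at the min mark but fails at the high mark. Which mark applies depends on the allocation path, so both are shown.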
> [root@t6360003 ~]# uptime
> 09:33:41 up 27 days, 22:55, 1 user, load average: 0.00, 0.00, 0.00
>
> Mar 16 21:40:40 t6360003 kernel: basename invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
order 2, __GFP_WAIT|__GFP_IO|__GFP_FS.
> Mar 16 21:40:40 t6360003 kernel: CPU: 0 Not tainted 2.6.28 #1
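The decoding of gfp_mask=0xd0 given above can be reproduced with a few lines. This is a minimal sketch: the bit values in the table are assumed from include/linux/gfp.h in kernels of that era and should be checked against the header of the kernel in question, and decode_gfp is a made-up helper name.

```python
# Assumed low-order GFP bit values (2.6.28-era include/linux/gfp.h).
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_gfp(mask):
    """Return the flag names set in mask, lowest bit first."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]
    # Report any bits outside the table as a raw hex value.
    unknown = mask & ~sum(GFP_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return "|".join(names)

print(decode_gfp(0xd0))
```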
> Mar 16 21:40:40 t6360003 kernel: Process basename (pid: 30555, task: 000000007baa6838, ksp: 0000000063867968)
> Mar 16 21:40:40 t6360003 kernel: 0700000084a8c238 0000000063867a90 0000000000000002 0000000000000000
> Mar 16 21:40:40 t6360003 kernel: 0000000063867b30 0000000063867aa8 0000000063867aa8 000000000010534e
> Mar 16 21:40:40 t6360003 kernel: 0000000000000000 0000000063867968 0000000000000000 000000000000000a
> Mar 16 21:40:40 t6360003 kernel: 000000000000000d 0000000000000000 0000000063867a90 0000000063867b08
> Mar 16 21:40:40 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000063867a90 0000000063867ae0
> Mar 16 21:40:40 t6360003 kernel: Call Trace:
> Mar 16 21:40:40 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
> Mar 16 21:40:40 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
> Mar 16 21:40:40 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
> Mar 16 21:40:40 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
> Mar 16 21:40:40 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
> Mar 16 21:40:40 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
> Mar 16 21:40:40 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
> Mar 16 21:40:40 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
> Mar 16 21:40:40 t6360003 kernel: [<00000000001a802e>] handle_mm_fault+0x262/0x9cc
> Mar 16 21:40:40 t6360003 kernel: [<00000000004a1a7a>] do_dat_exception+0x30a/0x41c
> Mar 16 21:40:40 t6360003 kernel: [<0000000000115e5c>] sysc_return+0x0/0x8
> Mar 16 21:40:40 t6360003 kernel: [<0000004d193bfae0>] 0x4d193bfae0
> Mar 16 21:40:40 t6360003 kernel: Mem-Info:
> Mar 16 21:40:40 t6360003 kernel: DMA per-cpu:
> Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: Normal per-cpu:
> Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 30
> Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
> Mar 16 21:40:40 t6360003 kernel: DMA: 101853*4kB 7419*8kB 2*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB = 467692kB
> Mar 16 21:40:40 t6360003 kernel: Normal: 28880*4kB 121*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB = 117944kB
> Mar 16 21:40:40 t6360003 kernel: 1688 total pagecache pages
> Mar 16 21:40:40 t6360003 kernel: 564 pages in swap cache
> Mar 16 21:40:40 t6360003 kernel: Swap cache stats: add 1106206, delete 1105642, find 599107/618721
> Mar 16 21:40:40 t6360003 kernel: Free swap = 1959300kB
> Mar 16 21:40:40 t6360003 kernel: Total swap = 1999992kB
> Mar 16 21:40:40 t6360003 kernel: 1048576 pages RAM
> Mar 16 21:40:40 t6360003 kernel: 20255 pages reserved
> Mar 16 21:40:40 t6360003 kernel: 10560 pages shared
> Mar 16 21:40:40 t6360003 kernel: 878998 pages non-shared
> Mar 16 21:40:40 t6360003 kernel: Out of memory: kill process 30502 (cc1) score 3672 or a child
> Mar 16 21:40:40 t6360003 kernel: Killed process 30502 (cc1)
> Mar 17 01:33:12 t6360003 kernel: sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
> Mar 17 01:33:12 t6360003 kernel: CPU: 5 Not tainted 2.6.28 #1
> Mar 17 01:33:12 t6360003 kernel: Process sh (pid: 16756, task: 0000000004852738, ksp: 0000000050b7d738)
> Mar 17 01:33:12 t6360003 kernel: 07000000fbb28238 0000000050b7d860 0000000000000002 0000000000000000
> Mar 17 01:33:12 t6360003 kernel: 0000000050b7d900 0000000050b7d878 0000000050b7d878 000000000010534e
> Mar 17 01:33:12 t6360003 kernel: 0000000000000000 0000000050b7d738 0000000000000000 000000000000000a
> Mar 17 01:33:12 t6360003 kernel: 000000000000000d 0000000000000000 0000000050b7d860 0000000050b7d8d8
> Mar 17 01:33:12 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000050b7d860 0000000050b7d8b0
> Mar 17 01:33:12 t6360003 kernel: Call Trace:
> Mar 17 01:33:12 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
> Mar 17 01:33:12 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
> Mar 17 01:33:12 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
> Mar 17 01:33:12 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
> Mar 17 01:33:12 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
> Mar 17 01:33:12 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
> Mar 17 01:33:12 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
> Mar 17 01:33:12 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
> Mar 17 01:33:12 t6360003 kernel: [<00000000001aa8ac>] copy_page_range+0x9ac/0xadc
> Mar 17 01:33:12 t6360003 kernel: [<000000000013db32>] dup_mm+0x342/0x604
> Mar 17 01:33:12 t6360003 kernel: [<000000000013ef70>] copy_process+0x1118/0x1158
> Mar 17 01:33:12 t6360003 kernel: [<000000000013f046>] do_fork+0x96/0x2dc
> Mar 17 01:33:12 t6360003 kernel: [<000000000010a402>] sys_clone+0x6a/0x78
> Mar 17 01:33:12 t6360003 kernel: [<0000000000115e56>] sysc_noemu+0x10/0x16
> Mar 17 01:33:12 t6360003 kernel: [<0000004d1949a152>] 0x4d1949a152
> Mar 17 01:33:12 t6360003 kernel: Mem-Info:
> Mar 17 01:33:12 t6360003 kernel: DMA per-cpu:
> Mar 17 01:33:12 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: Normal per-cpu:
> Mar 17 01:33:12 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 30
> Mar 17 01:33:12 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:12 t6360003 kernel: Active_anon:1057 active_file:85 inactive_anon:457
> Mar 17 01:33:12 t6360003 kernel: inactive_file:163 unevictable:987 dirty:6 writeback:414 unstable:0
> Mar 17 01:33:12 t6360003 kernel: free:136683 slab:884736 mapped:832 pagetables:375 bounce:0
> Mar 17 01:33:12 t6360003 kernel: DMA free:445420kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:8kB active_file:32kB inactive_file:4kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> Mar 17 01:33:12 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> Mar 17 01:33:12 t6360003 kernel: Normal free:101312kB min:4064kB low:5080kB high:6096kB active_anon:4228kB inactive_anon:1820kB active_file:308kB inactive_file:648kB unevictable:3948kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> Mar 17 01:33:12 t6360003 kernel: lowmem_reserve[]: 0 0 0
> Mar 17 01:33:12 t6360003 kernel: DMA: 100796*4kB 5166*8kB 5*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB = 445648kB
> Mar 17 01:33:12 t6360003 kernel: Normal: 24811*4kB 56*8kB 1*16kB 4*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB = 101500kB
> Mar 17 01:33:12 t6360003 kernel: 2265 total pagecache pages
> Mar 17 01:33:12 t6360003 kernel: 1197 pages in swap cache
> Mar 17 01:33:12 t6360003 kernel: Swap cache stats: add 3336530, delete 3335333, find 2045244/2201205
> Mar 17 01:33:12 t6360003 kernel: Free swap = 1971336kB
> Mar 17 01:33:12 t6360003 kernel: Total swap = 1999992kB
> Mar 17 01:33:12 t6360003 kernel: 1048576 pages RAM
> Mar 17 01:33:12 t6360003 kernel: 20255 pages reserved
> Mar 17 01:33:12 t6360003 kernel: 9350 pages shared
> Mar 17 01:33:12 t6360003 kernel: 888261 pages non-shared
> Mar 17 01:33:12 t6360003 kernel: Out of memory: kill process 27449 (rpmbuild) score 3460 or a child
> Mar 17 01:33:12 t6360003 kernel: Killed process 27519 (sh)
> Mar 17 01:33:13 t6360003 kernel: as invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
> Mar 17 01:33:13 t6360003 kernel: CPU: 2 Not tainted 2.6.28 #1
> Mar 17 01:33:13 t6360003 kernel: Process as (pid: 16914, task: 0000000084aba338, ksp: 000000000e3c76d8)
> Mar 17 01:33:13 t6360003 kernel: 0700000035e74138 000000000e3c7800 0000000000000002 0000000000000000
> Mar 17 01:33:13 t6360003 kernel: 000000000e3c78a0 000000000e3c7818 000000000e3c7818 000000000010534e
> Mar 17 01:33:13 t6360003 kernel: 0000000000000000 000000000e3c76d8 0000000000000000 000000000000000a
> Mar 17 01:33:13 t6360003 kernel: 000000000000000d 0000000000000000 000000000e3c7800 000000000e3c7878
> Mar 17 01:33:13 t6360003 kernel: 00000000004a5ab0 000000000010534e 000000000e3c7800 000000000e3c7850
> Mar 17 01:33:13 t6360003 kernel: Call Trace:
> Mar 17 01:33:13 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
> Mar 17 01:33:13 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
> Mar 17 01:33:13 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
> Mar 17 01:33:13 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
> Mar 17 01:33:13 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
> Mar 17 01:33:13 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
> Mar 17 01:33:13 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
> Mar 17 01:33:13 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
> Mar 17 01:33:13 t6360003 kernel: [<00000000001a802e>] handle_mm_fault+0x262/0x9cc
> Mar 17 01:33:13 t6360003 kernel: [<00000000001a894e>] __get_user_pages+0x1b6/0x574
> Mar 17 01:33:13 t6360003 kernel: [<00000000001a8d5a>] get_user_pages+0x4e/0x60
> Mar 17 01:33:13 t6360003 kernel: [<00000000001d58c4>] get_arg_page+0x6c/0xe8
> Mar 17 01:33:13 t6360003 kernel: [<00000000001d5c3a>] copy_strings+0x1aa/0x290
> Mar 17 01:33:13 t6360003 kernel: [<00000000001d5d7e>] copy_strings_kernel+0x5e/0xb0
> Mar 17 01:33:13 t6360003 kernel: [<00000000001d78b0>] do_execve+0x1c8/0x254
> Mar 17 01:33:13 t6360003 kernel: [<000000000010a2f8>] sys_execve+0x80/0xb8
> Mar 17 01:33:13 t6360003 kernel: [<0000000000115e56>] sysc_noemu+0x10/0x16
> Mar 17 01:33:13 t6360003 kernel: [<0000004d1949a40c>] 0x4d1949a40c
> Mar 17 01:33:13 t6360003 kernel: Mem-Info:
> Mar 17 01:33:13 t6360003 kernel: DMA per-cpu:
> Mar 17 01:33:13 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: Normal per-cpu:
> Mar 17 01:33:13 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:13 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 12
> Mar 17 01:33:13 t6360003 kernel: Active_anon:274 active_file:126 inactive_anon:92
> Mar 17 01:33:13 t6360003 kernel: inactive_file:110 unevictable:987 dirty:20 writeback:222 unstable:0
> Mar 17 01:33:13 t6360003 kernel: free:137753 slab:884727 mapped:901 pagetables:318 bounce:0
> Mar 17 01:33:13 t6360003 kernel: DMA free:445604kB min:4064kB low:5080kB high:6096kB active_anon:8kB inactive_anon:0kB active_file:0kB inactive_file:104kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> Mar 17 01:33:13 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> Mar 17 01:33:13 t6360003 kernel: Normal free:105408kB min:4064kB low:5080kB high:6096kB active_anon:1088kB inactive_anon:368kB active_file:504kB inactive_file:336kB unevictable:3948kB present:2068480kB pages_scanned:450 all_unreclaimable? no
> Mar 17 01:33:13 t6360003 kernel: lowmem_reserve[]: 0 0 0
> Mar 17 01:33:13 t6360003 kernel: DMA: 100811*4kB 5173*8kB 6*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB = 445812kB
> Mar 17 01:33:13 t6360003 kernel: Normal: 25877*4kB 62*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB = 105652kB
> Mar 17 01:33:13 t6360003 kernel: 1477 total pagecache pages
> Mar 17 01:33:13 t6360003 kernel: 398 pages in swap cache
> Mar 17 01:33:13 t6360003 kernel: Swap cache stats: add 3343439, delete 3343041, find 2048863/2205415
> Mar 17 01:33:13 t6360003 kernel: Free swap = 1960020kB
> Mar 17 01:33:13 t6360003 kernel: Total swap = 1999992kB
> Mar 17 01:33:13 t6360003 kernel: 1048576 pages RAM
> Mar 17 01:33:13 t6360003 kernel: 20255 pages reserved
> Mar 17 01:33:13 t6360003 kernel: 9549 pages shared
> Mar 17 01:33:13 t6360003 kernel: 887159 pages non-shared
> Mar 17 01:33:13 t6360003 kernel: Out of memory: kill process 29305 (make) score 3403 or a child
> Mar 17 01:33:13 t6360003 kernel: Killed process 29320 (sh)
> Mar 17 01:33:14 t6360003 kernel: sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
> Mar 17 01:33:14 t6360003 kernel: CPU: 4 Not tainted 2.6.28 #1
> Mar 17 01:33:14 t6360003 kernel: Process sh (pid: 16922, task: 0000000084ab6138, ksp: 0000000060781738)
> Mar 17 01:33:14 t6360003 kernel: 070000003d7a0438 0000000060781860 0000000000000002 0000000000000000
> Mar 17 01:33:14 t6360003 kernel: 0000000060781900 0000000060781878 0000000060781878 000000000010534e
> Mar 17 01:33:14 t6360003 kernel: 0000000000000000 0000000060781738 0000000000000000 000000000000000a
> Mar 17 01:33:14 t6360003 kernel: 000000000000000d 0000000000000000 0000000060781860 00000000607818d8
> Mar 17 01:33:14 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000060781860 00000000607818b0
> Mar 17 01:33:14 t6360003 kernel: Call Trace:
> Mar 17 01:33:14 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
> Mar 17 01:33:14 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
> Mar 17 01:33:14 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
> Mar 17 01:33:14 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
> Mar 17 01:33:14 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
> Mar 17 01:33:14 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
> Mar 17 01:33:14 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
> Mar 17 01:33:14 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
> Mar 17 01:33:14 t6360003 kernel: [<00000000001aa8ac>] copy_page_range+0x9ac/0xadc
> Mar 17 01:33:14 t6360003 kernel: [<000000000013db32>] dup_mm+0x342/0x604
> Mar 17 01:33:14 t6360003 kernel: [<000000000013ef70>] copy_process+0x1118/0x1158
> Mar 17 01:33:14 t6360003 kernel: [<000000000013f046>] do_fork+0x96/0x2dc
> Mar 17 01:33:14 t6360003 kernel: [<000000000010a402>] sys_clone+0x6a/0x78
> Mar 17 01:33:14 t6360003 kernel: [<0000000000115e56>] sysc_noemu+0x10/0x16
> Mar 17 01:33:14 t6360003 kernel: [<0000004d1949a152>] 0x4d1949a152
> Mar 17 01:33:14 t6360003 kernel: Mem-Info:
> Mar 17 01:33:14 t6360003 kernel: DMA per-cpu:
> Mar 17 01:33:14 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: Normal per-cpu:
> Mar 17 01:33:14 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 42
> Mar 17 01:33:14 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 50
> Mar 17 01:33:14 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> Mar 17 01:33:14 t6360003 kernel: Active_anon:171 active_file:100 inactive_anon:74
> Mar 17 01:33:14 t6360003 kernel: inactive_file:78 unevictable:987 dirty:0 writeback:146 unstable:0
> Mar 17 01:33:14 t6360003 kernel: free:137942 slab:884626 mapped:858 pagetables:357 bounce:0
> Mar 17 01:33:14 t6360003 kernel: DMA free:445612kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:8kB active_file:104kB inactive_file:0kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> Mar 17 01:33:14 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> Mar 17 01:33:14 t6360003 kernel: Normal free:106156kB min:4064kB low:5080kB high:6096kB active_anon:684kB inactive_anon:288kB active_file:296kB inactive_file:312kB unevictable:3948kB present:2068480kB pages_scanned:544 all_unreclaimable? no
> Mar 17 01:33:14 t6360003 kernel: lowmem_reserve[]: 0 0 0
> Mar 17 01:33:14 t6360003 kernel: DMA: 100818*4kB 5175*8kB 2*16kB 3*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB = 445888kB
> Mar 17 01:33:14 t6360003 kernel: Normal: 25978*4kB 76*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB = 106184kB
> Mar 17 01:33:14 t6360003 kernel: 1315 total pagecache pages
> Mar 17 01:33:14 t6360003 kernel: 321 pages in swap cache
> Mar 17 01:33:14 t6360003 kernel: Swap cache stats: add 3349005, delete 3348684, find 2049544/2206461
> Mar 17 01:33:14 t6360003 kernel: Free swap = 1947096kB
> Mar 17 01:33:14 t6360003 kernel: Total swap = 1999992kB
> Mar 17 01:33:14 t6360003 kernel: 1048576 pages RAM
> Mar 17 01:33:14 t6360003 kernel: 20255 pages reserved
> Mar 17 01:33:14 t6360003 kernel: 9878 pages shared
> Mar 17 01:33:14 t6360003 kernel: 887456 pages non-shared
> Mar 17 01:33:14 t6360003 kernel: Out of memory: kill process 16782 (cc1) score 3375 or a child
> Mar 17 01:33:14 t6360003 kernel: Killed process 16782 (cc1)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: oom-killer killing even if memory is available?
2009-03-17 9:51 ` Nick Piggin
@ 2009-03-17 10:11 ` Heiko Carstens
0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2009-03-17 10:11 UTC (permalink / raw)
To: Nick Piggin
Cc: Mel Gorman, linux-mm, Nick Piggin, Andrew Morton,
Martin Schwidefsky, Andreas Krebbel
On Tue, 17 Mar 2009 20:51:13 +1100
Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> On Tuesday 17 March 2009 20:00:49 Heiko Carstens wrote:
> > the below looks like there is some bug in the memory management code.
> > Even if there seems to be plenty of memory available the oom-killer
> > kills processes.
> >
> > The below happened after 27 days uptime, memory seems to be heavily
> > fragmented,
>
> What slab allocator are you using?
That was SLAB.
> > but there are still larger portions of memory free that
> > could satisfy an order 2 allocation. Any idea why this fails?
>
> We still keep some watermarks around for higher order pages (for
> GFP_ATOMIC and page reclaim etc purposes).
>
> Possibly it is being a bit aggressive with the higher orders; when I
> added it I just made a guess at a sane function. See
> mm/page_alloc.c:zone_watermark_ok(). In particular, the for loop at the
> end of the function is the slowpath where it is calculating higher
> order watermarks. The min >>= 1 statement, 1 could be replaced with 2.
> Or we could just keep reserves for 0..PAGE_ALLOC_COSTLY_ORDER and then
> give away _any_ free pages for higher orders than that.
>
> Still would seem to just prolong the inevitable? Exploding after 27 days
> of uptime is rather sad :(
Yes, it seems to look more like a memory leak. Hmm..
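For reference, a rough sketch of what the order-aware watermark check described above does with the numbers from the Mar 16 21:40:40 report. This is a simplified Python model, not the kernel code: the function name, the `shift` parameter and the way each lower order's free pages are discounted are assumptions made for illustration. Only the `min >>= 1` step and the idea of replacing 1 with 2 come from the quoted text. Watermarks are converted from kB to 4 kB pages, and the lowmem_reserve[] value is assumed to be in pages.

```python
# Simplified model of an order-aware watermark check (illustrative only).

def kb_to_pages(kb, page_kb=4):
    """Convert a kB figure from the log into 4 kB pages."""
    return kb // page_kb

def watermark_ok(nr_free, order, mark_pages, lowmem_reserve=0, shift=1):
    """Return True if an allocation of `order` would pass the check.

    nr_free[o] is the number of free blocks of order o (one block = 2**o pages).
    """
    free_pages = sum(n << o for o, n in enumerate(nr_free)) - (1 << order) + 1
    min_pages = mark_pages
    if free_pages <= min_pages + lowmem_reserve:
        return False
    for o in range(order):
        # Blocks smaller than the request cannot satisfy it, so discount them.
        free_pages -= nr_free[o] << o
        # Require fewer free pages at each higher order (the "min >>= 1" step).
        min_pages >>= shift
        if free_pages <= min_pages:
            return False
    return True

# Free lists from the Mar 16 21:40:40 report, orders 0..8 (4kB .. 1024kB).
DMA = [101853, 7419, 2, 2, 1, 0, 1, 1, 0]
NORMAL = [28880, 121, 1, 1, 0, 1, 1, 0, 1]
MIN, LOW, HIGH = (kb_to_pages(kb) for kb in (4064, 5080, 6096))
DMA_RESERVE = 2020  # "lowmem_reserve[]: 0 2020 2020", assumed to be pages

if __name__ == "__main__":
    for shift in (1, 2):
        print("shift", shift)
        print("  DMA    min :", watermark_ok(DMA, 2, MIN, DMA_RESERVE, shift))
        print("  Normal min :", watermark_ok(NORMAL, 2, MIN, 0, shift))
        print("  Normal low :", watermark_ok(NORMAL, 2, LOW, 0, shift))
        print("  Normal high:", watermark_ok(NORMAL, 2, HIGH, 0, shift))
```

In this model, with a shift of 1 the order-2 request is rejected in the DMA zone at the min mark and in the Normal zone at the low and high marks, although both zones list free blocks of 16kB and larger. With a shift of 2 all four checks pass.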
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: oom-killer killing even if memory is available?
2009-03-17 9:46 ` Andrew Morton
@ 2009-03-17 10:17 ` Heiko Carstens
2009-03-17 10:28 ` Heiko Carstens
2009-03-20 15:27 ` Mel Gorman
1 sibling, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2009-03-17 10:17 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Mel Gorman, Nick Piggin, Martin Schwidefsky,
Andreas Krebbel
On Tue, 17 Mar 2009 02:46:05 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> > Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> > Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> > Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> > Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> > Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
>
> The scanner has wrung pretty much all it can out of the reclaimable pages -
> the LRUs are nearly empty. There's a few hundred MB free and apparently we
> don't have four physically contiguous free pages anywhere. It's
> believable.
>
> The question is: where the heck did all your memory go? You have 2GB of
> ZONE_NORMAL memory in that machine, but only a tenth of it is visible to
> the page reclaim code.
>
> Something must have allocated (and possibly leaked) it.
Looks like most of the memory went for dentries and inodes.
slabtop output:
Active / Total Objects (% used) : 8172165 / 8326954 (98.1%)
Active / Total Slabs (% used) : 903692 / 903698 (100.0%)
Active / Total Caches (% used) : 91 / 144 (63.2%)
Active / Total Size (% used) : 3251262.44K / 3281384.22K (99.1%)
Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
3960036 3960017 99% 0.59K 660006 6 2640024K inode_cache
4137155 3997581 96% 0.20K 217745 19 870980K dentry
69776 69744 99% 0.80K 17444 4 69776K ext3_inode_cache
96792 92892 95% 0.10K 2616 37 10464K buffer_head
10024 9895 98% 0.54K 1432 7 5728K radix_tree_node
1093 1087 99% 4.00K 1093 1 4372K size-4096
14805 14711 99% 0.25K 987 15 3948K size-256
2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
1416 1416 100% 1.00K 354 4 1416K size-1024
152 152 100% 5.59K 152 1 1216K task_struct
370 359 97% 2.00K 185 2 740K size-2048
9381 4359 46% 0.06K 159 59 636K size-64
8 8 100% 64.00K 8 1 512K size-65536
976 952 97% 0.50K 122 8 488K size-512
177 156 88% 2.25K 59 3 472K sighand_cache
6254 6070 97% 0.07K 118 53 472K sysfs_dir_cache
1335 422 31% 0.25K 89 15 356K filp
1830 1298 70% 0.12K 61 30 244K size-128
1288 1061 82% 0.16K 56 23 224K vm_area_struct
184 160 86% 1.00K 46 4 184K signal_cache
4704 4548 96% 0.03K 42 112 168K size-32
205 178 86% 0.75K 41 5 164K sock_inode_cache
234 234 100% 0.64K 39 6 156K proc_inode_cache
150 143 95% 0.75K 30 5 120K kmem_cache
120 97 80% 1.00K 30 4 120K files_cache
6 6 100% 16.00K 6 1 96K size-16384
720 124 17% 0.12K 24 30 96K pid
140 116 82% 0.53K 20 7 80K idr_layer_cache
30 30 100% 2.11K 10 3 80K blkdev_queue
18 18 100% 4.00K 18 1 72K biovec-256
17 17 100% 4.00K 17 1 68K size-4096(DMA)
68 67 98% 1.00K 17 4 68K RAWv6
2 2 100% 32.00K 2 1 64K size-32768
65 65 100% 0.75K 13 5 52K RAW
48 48 100% 1.00K 12 4 48K mm_struct
40 36 90% 1.00K 10 4 40K bdev_cache
50 19 38% 0.75K 10 5 40K UNIX
400 42 10% 0.09K 10 40 40K journal_head
18 18 100% 2.00K 9 2 36K biovec-128
472 96 20% 0.06K 8 59 32K fs_cache
105 105 100% 0.25K 7 15 28K skbuff_head_cache
210 18 8% 0.12K 7 30 28K bio
24 21 87% 1.00K 6 4 24K size-1024(DMA)
864 489 56% 0.02K 6 144 24K anon_vma
24 18 75% 1.00K 6 4 24K biovec-64
35 32 91% 0.50K 5 7 20K skbuff_fclone_cache
32 18 56% 0.50K 4 8 16K size-512(DMA)
60 26 43% 0.25K 4 15 16K mnt_cache
60 18 30% 0.25K 4 15 16K biovec-16
8 6 75% 1.75K 4 2 16K TCP
8 8 100% 2.00K 4 2 16K rpc_buffers
66 4 6% 0.17K 3 22 12K file_lock_cache
30 20 66% 0.36K 3 10 12K blkdev_requests
12 5 41% 1.00K 3 4 12K UDP
45 4 8% 0.25K 3 15 12K uid_cache
21 13 61% 0.50K 3 7 12K ip6_dst_cache
336 256 76% 0.03K 3 112 12K dm_io
432 256 59% 0.02K 3 144 12K dm_target_io
30 7 23% 0.25K 2 15 8K size-256(DMA)
118 18 15% 0.06K 2 59 8K biovec-4
96 8 8% 0.08K 2 48 8K blkdev_ioc
14 4 28% 0.50K 2 7 8K ip_dst_cache
5 5 100% 1.50K 1 5 8K qdio_q
4 3 75% 1.75K 2 2 8K TCPv6
8 6 75% 1.00K 2 4 8K rpc_inode_cache
14 8 57% 0.50K 2 7 8K rpc_tasks
112 4 3% 0.03K 1 112 4K size-32(DMA)
59 34 57% 0.06K 1 59 4K size-64(DMA)
1 1 100% 4.00K 1 1 4K names_cache
202 18 8% 0.02K 1 202 4K biovec-1
30 2 6% 0.12K 1 30 4K sgpool-8
15 2 13% 0.25K 1 15 4K sgpool-16
8 2 25% 0.50K 1 8 4K sgpool-32
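To put the slabtop figures in proportion, here is a small Python sketch that adds up the CACHE SIZE column for the largest caches and compares it with the "1048576 pages RAM" line of the OOM reports. The row selection and the helper name are made up for illustration, and a 4 kB page size is assumed (as the 4kB buddy lists suggest).

```python
# (name, slabs, cache_size_kb) for the four largest caches in the listing above.
SLAB_ROWS = [
    ("inode_cache", 660006, 2640024),
    ("dentry", 217745, 870980),
    ("ext3_inode_cache", 17444, 69776),
    ("buffer_head", 2616, 10464),
]
RAM_PAGES = 1048576  # "1048576 pages RAM" from the OOM report
PAGE_KB = 4          # assumption: 4 kB pages

def share_of_ram(names, rows=SLAB_ROWS):
    """Fraction of total RAM held by the named slab caches."""
    cache_kb = sum(size_kb for name, _slabs, size_kb in rows if name in names)
    return cache_kb / (RAM_PAGES * PAGE_KB)

if __name__ == "__main__":
    share = share_of_ram({"inode_cache", "dentry"})
    print(f"inode_cache + dentry: {share:.1%} of RAM")
```

By this measure inode_cache and dentry together hold roughly 84% of RAM.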
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: oom-killer killing even if memory is available?
2009-03-17 10:17 ` Heiko Carstens
@ 2009-03-17 10:28 ` Heiko Carstens
2009-03-17 10:49 ` Nick Piggin
0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2009-03-17 10:28 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, linux-mm, Mel Gorman, Nick Piggin,
Martin Schwidefsky, Andreas Krebbel
On Tue, 17 Mar 2009 11:17:38 +0100
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> On Tue, 17 Mar 2009 02:46:05 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> > > Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> > > Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> > > Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> > > Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> > > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> > > Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> > > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
> >
> > The scanner has wrung pretty much all it can out of the reclaimable pages -
> > the LRUs are nearly empty. There's a few hundred MB free and apparently we
> > don't have four physically contiguous free pages anywhere. It's
> > believable.
> >
> > The question is: where the heck did all your memory go? You have 2GB of
> > ZONE_NORMAL memory in that machine, but only a tenth of it is visible to
> > the page reclaim code.
> >
> > Something must have allocated (and possibly leaked) it.
>
> Looks like most of the memory went for dentries and inodes.
> slabtop output:
>
> Active / Total Objects (% used) : 8172165 / 8326954 (98.1%)
> Active / Total Slabs (% used) : 903692 / 903698 (100.0%)
> Active / Total Caches (% used) : 91 / 144 (63.2%)
> Active / Total Size (% used) : 3251262.44K / 3281384.22K (99.1%)
> Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 3960036 3960017 99% 0.59K 660006 6 2640024K inode_cache
> 4137155 3997581 96% 0.20K 217745 19 870980K dentry
> 69776 69744 99% 0.80K 17444 4 69776K ext3_inode_cache
> 96792 92892 95% 0.10K 2616 37 10464K buffer_head
> 10024 9895 98% 0.54K 1432 7 5728K radix_tree_node
> 1093 1087 99% 4.00K 1093 1 4372K size-4096
> 14805 14711 99% 0.25K 987 15 3948K size-256
> 2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
FWIW, after "echo 3 > /proc/sys/vm/drop_caches" it looks like this:
Active / Total Objects (% used) : 7965003 / 8153578 (97.7%)
Active / Total Slabs (% used) : 882511 / 882511 (100.0%)
Active / Total Caches (% used) : 90 / 144 (62.5%)
Active / Total Size (% used) : 3173487.59K / 3211091.64K (98.8%)
Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
3960036 3960007 99% 0.59K 660006 6 2640024K inode_cache
4137155 3962636 95% 0.20K 217745 19 870980K dentry
1097 1097 100% 4.00K 1097 1 4388K size-4096
14805 14667 99% 0.25K 987 15 3948K size-256
2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
1404 1404 100% 1.00K 351 4 1404K size-1024
152 152 100% 5.59K 152 1 1216K task_struct
1302 347 26% 0.54K 186 7 744K radix_tree_node
370 359 97% 2.00K 185 2 740K size-2048
9381 4316 46% 0.06K 159 59 636K size-64
8 8 100% 64.00K 8 1 512K size-65536
So, are we leaking dentries and inodes?
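A quick way to compare the two listings is to look at how many ACTIVE objects each cache lost across the drop_caches write. The Python sketch below uses only the ACTIVE counts shown above; the dictionary layout and the helper name are made up for illustration.

```python
# ACTIVE object counts before and after "echo 3 > /proc/sys/vm/drop_caches".
BEFORE = {"inode_cache": 3960017, "dentry": 3997581, "radix_tree_node": 9895}
AFTER = {"inode_cache": 3960007, "dentry": 3962636, "radix_tree_node": 347}

def freed(name):
    """Number of active objects released from the named cache."""
    return BEFORE[name] - AFTER[name]

def freed_fraction(name):
    """Released objects as a fraction of the count before drop_caches."""
    return freed(name) / BEFORE[name]

if __name__ == "__main__":
    for name in BEFORE:
        print(f"{name:16s} freed {freed(name):6d} ({freed_fraction(name):.2%})")
```

With these numbers drop_caches released about 96% of the radix_tree_node objects, but less than 1% of the dentries and only 10 inode_cache objects, which would fit something still holding references to them.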
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: oom-killer killing even if memory is available?
2009-03-17 10:28 ` Heiko Carstens
@ 2009-03-17 10:49 ` Nick Piggin
2009-03-17 11:39 ` Heiko Carstens
2009-03-20 5:08 ` Wu Fengguang
0 siblings, 2 replies; 13+ messages in thread
From: Nick Piggin @ 2009-03-17 10:49 UTC (permalink / raw)
To: Heiko Carstens, linux-fsdevel
Cc: Andrew Morton, linux-mm, Mel Gorman, Nick Piggin,
Martin Schwidefsky, Andreas Krebbel
On Tuesday 17 March 2009 21:28:42 Heiko Carstens wrote:
> On Tue, 17 Mar 2009 11:17:38 +0100
>
> Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > On Tue, 17 Mar 2009 02:46:05 -0700
> >
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> > > > Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> > > > Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> > > > Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> > > > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> > > > Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> > > > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
> > >
> > > The scanner has wrung pretty much all it can out of the reclaimable
> > > pages - the LRUs are nearly empty. There's a few hundred MB free and
> > > apparently we don't have four physically contiguous free pages
> > > anywhere. It's believable.
> > >
> > > The question is: where the heck did all your memory go? You have 2GB
> > > of ZONE_NORMAL memory in that machine, but only a tenth of it is
> > > visible to the page reclaim code.
> > >
> > > Something must have allocated (and possibly leaked) it.
> >
> > Looks like most of the memory went for dentries and inodes.
> > slabtop output:
> >
> > Active / Total Objects (% used) : 8172165 / 8326954 (98.1%)
> > Active / Total Slabs (% used) : 903692 / 903698 (100.0%)
> > Active / Total Caches (% used) : 91 / 144 (63.2%)
> > Active / Total Size (% used) : 3251262.44K / 3281384.22K (99.1%)
> > Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
> >
> > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > 3960036 3960017 99% 0.59K 660006 6 2640024K inode_cache
> > 4137155 3997581 96% 0.20K 217745 19 870980K dentry
> > 69776 69744 99% 0.80K 17444 4 69776K ext3_inode_cache
> > 96792 92892 95% 0.10K 2616 37 10464K buffer_head
> > 10024 9895 98% 0.54K 1432 7 5728K radix_tree_node
> > 1093 1087 99% 4.00K 1093 1 4372K size-4096
> > 14805 14711 99% 0.25K 987 15 3948K size-256
> > 2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
>
> FWIW, after "echo 3 > /proc/sys/vm/drop_caches" it looks like this:
>
> Active / Total Objects (% used) : 7965003 / 8153578 (97.7%)
> Active / Total Slabs (% used) : 882511 / 882511 (100.0%)
> Active / Total Caches (% used) : 90 / 144 (62.5%)
> Active / Total Size (% used) : 3173487.59K / 3211091.64K (98.8%)
> Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 3960036 3960007 99% 0.59K 660006 6 2640024K inode_cache
> 4137155 3962636 95% 0.20K 217745 19 870980K dentry
> 1097 1097 100% 4.00K 1097 1 4388K size-4096
> 14805 14667 99% 0.25K 987 15 3948K size-256
> 2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
> 1404 1404 100% 1.00K 351 4 1404K size-1024
> 152 152 100% 5.59K 152 1 1216K task_struct
> 1302 347 26% 0.54K 186 7 744K radix_tree_node
> 370 359 97% 2.00K 185 2 740K size-2048
> 9381 4316 46% 0.06K 159 59 636K size-64
> 8 8 100% 64.00K 8 1 512K size-65536
>
> So, are we leaking dentries and inodes?
Yes, probably leaking dentries, which pin inodes. I don't know that slab
leak debugging is going to help you because it won't find what is holding
the refcount.
Cc'ing linux-fsdevel. Which kernel is this? Please post the config as well.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: oom-killer killing even if memory is available?
2009-03-17 10:49 ` Nick Piggin
@ 2009-03-17 11:39 ` Heiko Carstens
2009-03-20 5:08 ` Wu Fengguang
1 sibling, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2009-03-17 11:39 UTC (permalink / raw)
To: Nick Piggin
Cc: linux-fsdevel, Andrew Morton, linux-mm, Mel Gorman, Nick Piggin,
Martin Schwidefsky, Andreas Krebbel
On Tue, 17 Mar 2009 21:49:35 +1100
Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> On Tuesday 17 March 2009 21:28:42 Heiko Carstens wrote:
> > > Looks like most of the memory went for dentries and inodes.
> > > slabtop output:
> > >
> > > Active / Total Objects (% used) : 8172165 / 8326954 (98.1%)
> > > Active / Total Slabs (% used) : 903692 / 903698 (100.0%)
> > > Active / Total Caches (% used) : 91 / 144 (63.2%)
> > > Active / Total Size (% used) : 3251262.44K / 3281384.22K (99.1%)
> > > Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
> > >
> > > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > > 3960036 3960017 99% 0.59K 660006 6 2640024K inode_cache
> > > 4137155 3997581 96% 0.20K 217745 19 870980K dentry
> > > 69776 69744 99% 0.80K 17444 4 69776K ext3_inode_cache
> > > 96792 92892 95% 0.10K 2616 37 10464K buffer_head
> > > 10024 9895 98% 0.54K 1432 7 5728K radix_tree_node
> > > 1093 1087 99% 4.00K 1093 1 4372K size-4096
> > > 14805 14711 99% 0.25K 987 15 3948K size-256
> > > 2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
> >
> > FWIW, after "echo 3 > /proc/sys/vm/drop_caches" it looks like this:
> >
> > Active / Total Objects (% used) : 7965003 / 8153578 (97.7%)
> > Active / Total Slabs (% used) : 882511 / 882511 (100.0%)
> > Active / Total Caches (% used) : 90 / 144 (62.5%)
> > Active / Total Size (% used) : 3173487.59K / 3211091.64K (98.8%)
> > Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
> >
> > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > 3960036 3960007 99% 0.59K 660006 6 2640024K inode_cache
> > 4137155 3962636 95% 0.20K 217745 19 870980K dentry
> > 1097 1097 100% 4.00K 1097 1 4388K size-4096
> > 14805 14667 99% 0.25K 987 15 3948K size-256
> > 2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
> > 1404 1404 100% 1.00K 351 4 1404K size-1024
> > 152 152 100% 5.59K 152 1 1216K task_struct
> > 1302 347 26% 0.54K 186 7 744K radix_tree_node
> > 370 359 97% 2.00K 185 2 740K size-2048
> > 9381 4316 46% 0.06K 159 59 636K size-64
> > 8 8 100% 64.00K 8 1 512K size-65536
> >
> > So, are we leaking dentries and inodes?
>
> Yes, probably leaking dentries, which pin inodes. I don't know that slab
> leak debugging is going to help you because it won't find what is holding
> the refcount.
>
> Cc'ing linux-fsdevel. Which kernel is this? Please send the config as well.
This is a 2.6.28 kernel with some private patches on top, but none of
them touches fs code.
Hmm... if needed we could retry with a plain vanilla 2.6.28.x kernel.
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.28
# Mon Jan 19 12:43:38 2009
#
CONFIG_SCHED_MC=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_BUG=y
CONFIG_NO_IOMEM=y
CONFIG_NO_DMA=y
CONFIG_GENERIC_LOCKBREAK=y
CONFIG_PGSTE=y
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_S390=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_USER_SCHED=y
# CONFIG_CGROUP_SCHED is not set
# CONFIG_SYSFS_DEPRECATED_V2 is not set
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_COMPAT_BRK=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
# CONFIG_MARKERS is not set
CONFIG_OPROFILE=m
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y
#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
CONFIG_DEFAULT_DEADLINE=y
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="deadline"
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_CLASSIC_RCU is not set
# CONFIG_FREEZER is not set
#
# Base setup
#
#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_64BIT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=64
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_AUDIT_ARCH=y
CONFIG_S390_SWITCH_AMODE=y
CONFIG_S390_EXEC_PROTECT=y
#
# Code generation options
#
# CONFIG_MARCH_G5 is not set
# CONFIG_MARCH_Z900 is not set
CONFIG_MARCH_Z990=y
# CONFIG_MARCH_Z9_109 is not set
# CONFIG_MARCH_Z10 is not set
CONFIG_PACK_STACK=y
CONFIG_SMALL_STACK=y
CONFIG_CHECK_STACK=y
CONFIG_STACK_GUARD=512
# CONFIG_WARN_STACK is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
#
# Kernel preemption
#
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
#
# I/O subsystem configuration
#
CONFIG_MACHCHK_WARNING=y
CONFIG_QDIO=y
CONFIG_CHSC_SCH=m
#
# Misc
#
CONFIG_IPL=y
# CONFIG_IPL_TAPE is not set
CONFIG_IPL_VM=y
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_FORCE_MAX_ZONEORDER=9
# CONFIG_PROCESS_DEBUG is not set
CONFIG_PFAULT=y
CONFIG_SHARED_KERNEL=y
CONFIG_CMM=m
CONFIG_CMM_PROC=y
CONFIG_CMM_IUCV=y
CONFIG_PAGE_STATES=y
CONFIG_APPLDATA_BASE=y
CONFIG_APPLDATA_MEM=m
CONFIG_APPLDATA_OS=m
CONFIG_APPLDATA_NET_SUM=m
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
CONFIG_S390_HYPFS_FS=y
CONFIG_KEXEC=y
CONFIG_ZFCPDUMP=y
CONFIG_S390_GUEST=y
CONFIG_KMSG_IDS=y
CONFIG_NET=y
#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=m
CONFIG_XFRM_SUB_POLICY=y
# CONFIG_XFRM_MIGRATE is not set
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_IPCOMP=y
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_IUCV=y
CONFIG_AFIUCV=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=y
CONFIG_NET_IPGRE=y
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=y
CONFIG_INET_XFRM_TUNNEL=y
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET_LRO=m
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=y
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=y
CONFIG_INET6_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
CONFIG_NETLABEL=y
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y
#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CT_ACCT=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_EVENTS=y
# CONFIG_NF_CT_PROTO_DCCP is not set
CONFIG_NF_CT_PROTO_SCTP=m
# CONFIG_NF_CT_PROTO_UDPLITE is not set
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
# CONFIG_NF_CONNTRACK_PPTP is not set
# CONFIG_NF_CONNTRACK_SANE is not set
CONFIG_NF_CONNTRACK_SIP=m
# CONFIG_NF_CONNTRACK_TFTP is not set
CONFIG_NF_CT_NETLINK=m
# CONFIG_NETFILTER_TPROXY is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
# CONFIG_NETFILTER_XT_MATCH_SCTP is not set
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=m
CONFIG_IP_VS=m
# CONFIG_IP_VS_IPV6 is not set
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12
#
# IPVS transport protocol load balancing support
#
# CONFIG_IP_VS_PROTO_TCP is not set
# CONFIG_IP_VS_PROTO_UDP is not set
# CONFIG_IP_VS_PROTO_ESP is not set
# CONFIG_IP_VS_PROTO_AH is not set
#
# IPVS scheduler
#
# CONFIG_IP_VS_RR is not set
# CONFIG_IP_VS_WRR is not set
# CONFIG_IP_VS_LC is not set
# CONFIG_IP_VS_WLC is not set
# CONFIG_IP_VS_LBLC is not set
# CONFIG_IP_VS_LBLCR is not set
# CONFIG_IP_VS_DH is not set
# CONFIG_IP_VS_SH is not set
# CONFIG_IP_VS_SED is not set
# CONFIG_IP_VS_NQ is not set
#
# IPVS application helper
#
#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_CONNTRACK_IPV4=m
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_IP_NF_QUEUE=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_NF_NAT=m
CONFIG_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
# CONFIG_NF_NAT_SNMP_BASIC is not set
CONFIG_NF_NAT_PROTO_SCTP=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
# CONFIG_NF_NAT_TFTP is not set
CONFIG_NF_NAT_AMANDA=m
# CONFIG_NF_NAT_PPTP is not set
CONFIG_NF_NAT_H323=m
CONFIG_NF_NAT_SIP=m
CONFIG_IP_NF_MANGLE=m
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
# CONFIG_IP_NF_SECURITY is not set
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
#
# IPv6: Netfilter Configuration
#
CONFIG_NF_CONNTRACK_IPV6=m
CONFIG_IP6_NF_QUEUE=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_TARGET_LOG=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_RAW=m
# CONFIG_IP6_NF_SECURITY is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
# CONFIG_BRIDGE_EBT_IP6 is not set
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_ULOG=m
CONFIG_BRIDGE_EBT_NFLOG=m
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_MSG is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_HMAC_NONE=y
# CONFIG_SCTP_HMAC_SHA1 is not set
# CONFIG_SCTP_HMAC_MD5 is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_STP=y
CONFIG_BRIDGE=y
CONFIG_VLAN_8021Q=y
# CONFIG_VLAN_8021Q_GVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
CONFIG_LLC2=m
CONFIG_IPX=m
# CONFIG_IPX_INTERN is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_SCHED is not set
CONFIG_NET_CLS_ROUTE=y
#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
CONFIG_CAN=m
CONFIG_CAN_RAW=m
CONFIG_CAN_BCM=m
#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
CONFIG_CAN_DEBUG_DEVICES=y
# CONFIG_AF_RXRPC is not set
# CONFIG_PHONET is not set
CONFIG_FIB_RULES=y
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_PCMCIA is not set
CONFIG_CCW=y
#
# Device Drivers
#
#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
CONFIG_SYS_HYPERVISOR=y
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=40960
CONFIG_BLK_DEV_XIP=y
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
#
# S/390 block device drivers
#
CONFIG_BLK_DEV_XPRAM=m
CONFIG_DCSSBLK=y
CONFIG_DASD=y
CONFIG_DASD_PROFILE=y
CONFIG_DASD_ECKD=y
CONFIG_DASD_FBA=y
CONFIG_DASD_DIAG=y
CONFIG_DASD_EER=y
CONFIG_VIRTIO_BLK=y
CONFIG_MISC_DEVICES=y
# CONFIG_EEPROM_93CX6 is not set
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_C2PORT is not set
#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
# CONFIG_SCSI_DMA is not set
CONFIG_SCSI_TGT=m
CONFIG_SCSI_NETLINK=y
# CONFIG_SCSI_PROC_FS is not set
#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_CHR_DEV_OSST=y
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_SCSI_WAIT_SCAN=m
#
# SCSI Transports
#
# CONFIG_SCSI_SPI_ATTRS is not set
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SAS_LIBSAS_DEBUG=y
CONFIG_SCSI_SRP_ATTRS=m
CONFIG_SCSI_SRP_TGT_ATTRS=y
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_SCSI_DEBUG is not set
CONFIG_ZFCP=y
# CONFIG_SCSI_DH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
# CONFIG_MD_RAID456 is not set
CONFIG_MD_MULTIPATH=m
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_DEBUG is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
# CONFIG_DM_DELAY is not set
CONFIG_DM_UEVENT=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=y
CONFIG_VETH=m
CONFIG_NET_ETHERNET=y
# CONFIG_MII is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NETDEV_1000=y
CONFIG_NETDEV_10000=y
CONFIG_TR=y
# CONFIG_WAN is not set
#
# S/390 network device drivers
#
CONFIG_LCS=m
CONFIG_CTCM=m
CONFIG_NETIUCV=m
CONFIG_SMSGIUCV=m
CONFIG_CLAW=m
CONFIG_QETH=m
CONFIG_QETH_L2=m
CONFIG_QETH_L3=m
CONFIG_QETH_IPV6=y
CONFIG_CCWGROUP=m
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
CONFIG_VIRTIO_NET=y
#
# Character devices
#
CONFIG_DEVKMEM=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_VIRTIO is not set
# CONFIG_R3964 is not set
CONFIG_RAW_DRIVER=m
CONFIG_MAX_RAW_DEVS=256
CONFIG_HANGCHECK_TIMER=m
#
# S/390 character device drivers
#
CONFIG_TN3270=y
CONFIG_TN3270_TTY=y
CONFIG_TN3270_FS=y
CONFIG_TN3270_CONSOLE=y
CONFIG_TN3215=y
CONFIG_TN3215_CONSOLE=y
CONFIG_CCW_CONSOLE=y
CONFIG_SCLP_TTY=y
CONFIG_SCLP_CONSOLE=y
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
CONFIG_SCLP_CPI=m
CONFIG_SCLP_ASYNC=m
CONFIG_S390_TAPE=m
#
# S/390 tape interface support
#
CONFIG_S390_TAPE_BLOCK=y
#
# S/390 tape hardware support
#
CONFIG_S390_TAPE_34XX=m
CONFIG_S390_TAPE_3590=m
CONFIG_VMLOGRDR=m
CONFIG_VMCP=m
CONFIG_MONREADER=m
CONFIG_MONWRITER=m
CONFIG_S390_VMUR=m
# CONFIG_POWER_SUPPLY is not set
CONFIG_THERMAL=y
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
CONFIG_ZVM_WATCHDOG=m
# CONFIG_REGULATOR is not set
CONFIG_MEMSTICK=m
CONFIG_MEMSTICK_DEBUG=y
#
# MemoryStick drivers
#
CONFIG_MEMSTICK_UNSAFE_RESUME=y
CONFIG_MSPRO_BLOCK=m
#
# MemoryStick Host Controller Drivers
#
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_STAGING is not set
#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4_FS is not set
CONFIG_FS_XIP=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=m
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=m
# CONFIG_REISERFS_CHECK is not set
CONFIG_REISERFS_PROC_INFO=y
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
# CONFIG_REISERFS_FS_SECURITY is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
CONFIG_XFS_FS=m
# CONFIG_XFS_QUOTA is not set
CONFIG_XFS_POSIX_ACL=y
# CONFIG_XFS_RT is not set
# CONFIG_XFS_DEBUG is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=m
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
# CONFIG_OCFS2_FS_STATS is not set
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
# CONFIG_OCFS2_COMPAT_JBD is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
CONFIG_FUSE_FS=m
CONFIG_GENERIC_ACL=y
#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y
#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set
#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
# CONFIG_NFSD_V4 is not set
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
# CONFIG_SUNRPC_REGISTER_V4 is not set
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_SPKM3=m
CONFIG_SMB_FS=m
CONFIG_SMB_NLS_DEFAULT=y
CONFIG_SMB_NLS_REMOTE="cp437"
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
# CONFIG_KARMA_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y
#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
CONFIG_HEADERS_CHECK=y
CONFIG_DEBUG_KERNEL=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_KPROBES_SANITY_TEST=y
CONFIG_BACKTRACE_SELF_TEST=m
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
CONFIG_LKDTM=m
# CONFIG_FAULT_INJECTION is not set
CONFIG_LATENCYTOP=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_HAVE_FUNCTION_TRACER=y
#
# Tracers
#
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_CONTEXT_SWITCH_TRACER is not set
# CONFIG_BOOT_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_BUILD_DOCSRC is not set
# CONFIG_DYNAMIC_PRINTK_DEBUG is not set
# CONFIG_SAMPLES is not set
# CONFIG_DEBUG_PAGEALLOC is not set
#
# Security options
#
# CONFIG_KEYS is not set
CONFIG_SECURITY=y
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_NETWORK is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=0
CONFIG_CRYPTO=y
#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=m
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=m
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_SEQIV=m
#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CTR=m
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=m
#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
#
# Digest
#
CONFIG_CRYPTO_CRC32C=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=m
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m
#
# Ciphers
#
CONFIG_CRYPTO_AES=m
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=m
#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
# CONFIG_ZCRYPT_MONOLITHIC is not set
CONFIG_CRYPTO_SHA1_S390=m
CONFIG_CRYPTO_SHA256_S390=m
CONFIG_CRYPTO_SHA512_S390=m
CONFIG_CRYPTO_DES_S390=m
CONFIG_CRYPTO_AES_S390=m
CONFIG_S390_PRNG=m
#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_T10DIF=m
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=m
CONFIG_LZO_DECOMPRESS=m
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_PLIST=y
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_BALLOON=y
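Since the config above is long, here is a small Python sketch for pulling out
the handful of options that matter for slab debugging. It only understands the
two line shapes that appear in this file ("CONFIG_X=value" and
"# CONFIG_X is not set"); the function name and the choice of options to print
are my own assumptions, and the excerpt is copied from the config above.

```python
import re

# The two line shapes found in a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value string, or None for '# ... is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


# Lines copied from the config in this message.
EXCERPT = """\
CONFIG_SLAB=y
# CONFIG_SLUB is not set
CONFIG_SLABINFO=y
# CONFIG_DEBUG_SLAB is not set
CONFIG_DEBUG_FS=y
"""

if __name__ == "__main__":
    opts = parse_kconfig(EXCERPT)
    for name in ("CONFIG_SLAB", "CONFIG_SLABINFO", "CONFIG_DEBUG_SLAB", "CONFIG_DEBUG_FS"):
        print("%-20s %s" % (name, opts.get(name) or "not set"))
```

For this config it shows the SLAB allocator with slabinfo enabled and
CONFIG_DEBUG_SLAB turned off.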
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: oom-killer killing even if memory is available?
2009-03-17 10:49 ` Nick Piggin
2009-03-17 11:39 ` Heiko Carstens
@ 2009-03-20 5:08 ` Wu Fengguang
1 sibling, 0 replies; 13+ messages in thread
From: Wu Fengguang @ 2009-03-20 5:08 UTC (permalink / raw)
To: Nick Piggin
Cc: Heiko Carstens, linux-fsdevel, Andrew Morton, linux-mm,
Mel Gorman, Nick Piggin, Martin Schwidefsky, Andreas Krebbel
[-- Attachment #1: Type: text/plain, Size: 4864 bytes --]
On Tue, Mar 17, 2009 at 09:49:35PM +1100, Nick Piggin wrote:
> On Tuesday 17 March 2009 21:28:42 Heiko Carstens wrote:
> > On Tue, 17 Mar 2009 11:17:38 +0100
> >
> > Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > On Tue, 17 Mar 2009 02:46:05 -0700
> > >
> > > Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > > Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> > > > > Mar 16 21:40:40 t6360003 kernel:  inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> > > > > Mar 16 21:40:40 t6360003 kernel:  free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> > > > > Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> > > > > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> > > > > Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> > > > > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
> > > >
> > > > The scanner has wrung pretty much all it can out of the reclaimable
> > > > pages - the LRUs are nearly empty. There's a few hundred MB free and
> > > > apparently we don't have four physically contiguous free pages
> > > > anywhere. It's believable.
> > > >
> > > > The question is: where the heck did all your memory go? You have 2GB
> > > > of ZONE_NORMAL memory in that machine, but only a tenth of it is
> > > > visible to the page reclaim code.
> > > >
> > > > Something must have allocated (and possibly leaked) it.
> > >
> > > Looks like most of the memory went for dentries and inodes.
> > > slabtop output:
> > >
> > > Active / Total Objects (% used) : 8172165 / 8326954 (98.1%)
> > > Active / Total Slabs (% used) : 903692 / 903698 (100.0%)
> > > Active / Total Caches (% used) : 91 / 144 (63.2%)
> > > Active / Total Size (% used) : 3251262.44K / 3281384.22K (99.1%)
> > > Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
> > >
> > > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > > 3960036 3960017 99% 0.59K 660006 6 2640024K inode_cache
> > > 4137155 3997581 96% 0.20K 217745 19 870980K dentry
> > > 69776 69744 99% 0.80K 17444 4 69776K ext3_inode_cache
> > > 96792 92892 95% 0.10K 2616 37 10464K buffer_head
> > > 10024 9895 98% 0.54K 1432 7 5728K radix_tree_node
> > > 1093 1087 99% 4.00K 1093 1 4372K size-4096
> > > 14805 14711 99% 0.25K 987 15 3948K size-256
> > > 2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
> >
> > FWIW, after "echo 3 > /proc/sys/vm/drop_caches" it looks like this:
> >
> > Active / Total Objects (% used) : 7965003 / 8153578 (97.7%)
> > Active / Total Slabs (% used) : 882511 / 882511 (100.0%)
> > Active / Total Caches (% used) : 90 / 144 (62.5%)
> > Active / Total Size (% used) : 3173487.59K / 3211091.64K (98.8%)
> > Minimum / Average / Maximum Object : 0.02K / 0.39K / 1024.00K
> >
> > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> > 3960036 3960007 99% 0.59K 660006 6 2640024K inode_cache
> > 4137155 3962636 95% 0.20K 217745 19 870980K dentry
> > 1097 1097 100% 4.00K 1097 1 4388K size-4096
> > 14805 14667 99% 0.25K 987 15 3948K size-256
> > 2400 2381 99% 0.80K 480 5 1920K shmem_inode_cache
> > 1404 1404 100% 1.00K 351 4 1404K size-1024
> > 152 152 100% 5.59K 152 1 1216K task_struct
> > 1302 347 26% 0.54K 186 7 744K radix_tree_node
> > 370 359 97% 2.00K 185 2 740K size-2048
> > 9381 4316 46% 0.06K 159 59 636K size-64
> > 8 8 100% 64.00K 8 1 512K size-65536
> >
> > So, are we leaking dentries and inodes?
>
> Yes, probably leaking dentries, which pin inodes. I don't know that slab
> leak debugging is going to help you because it won't find what is holding
> the refcount.
Heiko, what's the output of `lsof`?
The attached filecache patch may also help debugging.
Usage:
# run patched kernel, with CONFIG_PROC_FILECACHE and CONFIG_PROC_FILECACHE_EXTRAS
modprobe filecache
echo ls all > /proc/filecache
cp /proc/filecache filecache-`date +'%F'`
This will dump all the cached inodes with their file name, refcount and creator.
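Once you have the dump, something like the following could summarize it.
This is only a rough sketch and not part of the patch. It assumes the column
layout printed by ii_show() (a tab before the file name, and the
accessed/uid/process columns present only with CONFIG_PROC_FILECACHE_EXTRAS).
The names parse_line(), top_by_refcnt() and count_by_creator() are made up
for illustration.

```python
from collections import Counter


def parse_line(line):
    """Parse one data line of a /proc/filecache listing.

    Returns a dict, or None for blank lines and '#' header lines.
    """
    line = line.rstrip()
    if not line or line.lstrip().startswith("#"):
        return None
    # ii_show() puts a tab between the dev column and the path; fall back
    # to the last space for listings whose tabs were flattened.
    sep = "\t" if "\t" in line else " "
    left, _, path = line.rpartition(sep)
    parts = left.split()
    if len(parts) < 7:
        return None
    ino, size_kb, cached_kb, percent = (int(x) for x in parts[:4])
    # Assumption: refcnt and state are columns 5 and 6 in either order
    # (the code and the proc.txt sample disagree); only refcnt is numeric.
    a, b = parts[4], parts[5]
    refcnt, state = (int(a), b) if a.isdigit() else (int(b), a)
    # Optional extra columns: accessed, uid, then the creating process name
    # (which may itself contain spaces).
    extras = parts[6:-1]
    comm = " ".join(extras[2:]) if len(extras) >= 3 else None
    return {
        "ino": ino, "size_kb": size_kb, "cached_kb": cached_kb,
        "percent": percent, "refcnt": refcnt, "state": state,
        "comm": comm, "dev": parts[-1], "path": path,
    }


def top_by_refcnt(lines, n=10):
    """Return the n parsed entries with the highest dentry refcount."""
    entries = [e for e in map(parse_line, lines) if e]
    return sorted(entries, key=lambda e: e["refcnt"], reverse=True)[:n]


def count_by_creator(lines):
    """Count cached inodes per creating process (needs the extra columns)."""
    return Counter(e["comm"] for e in map(parse_line, lines)
                   if e and e["comm"])
```

For example, count_by_creator(open("filecache-YYYY-MM-DD")).most_common(5)
should show which processes created most of the cached inodes, which may
hint at who is pinning them.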
Thanks,
Fengguang
[-- Attachment #2: filecache-2.6.28.patch --]
[-- Type: text/x-diff, Size: 33812 bytes --]
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -27,6 +27,7 @@ extern unsigned long max_mapnr;
extern unsigned long num_physpages;
extern void * high_memory;
extern int page_cluster;
+extern char * const zone_names[];
#ifdef CONFIG_SYSCTL
extern int sysctl_legacy_va_layout;
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -104,7 +104,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
EXPORT_SYMBOL(totalram_pages);
-static char * const zone_names[MAX_NR_ZONES] = {
+char * const zone_names[MAX_NR_ZONES] = {
#ifdef CONFIG_ZONE_DMA
"DMA",
#endif
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -1943,7 +1943,10 @@ char *__d_path(const struct path *path,
if (dentry == root->dentry && vfsmnt == root->mnt)
break;
- if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+ if (unlikely(!vfsmnt)) {
+ if (IS_ROOT(dentry))
+ break;
+ } else if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
/* Global root? */
if (vfsmnt->mnt_parent == vfsmnt) {
goto global_root;
--- linux-2.6.orig/lib/radix-tree.c
+++ linux-2.6/lib/radix-tree.c
@@ -564,7 +564,6 @@ out:
}
EXPORT_SYMBOL(radix_tree_tag_clear);
-#ifndef __KERNEL__ /* Only the test harness uses this at present */
/**
* radix_tree_tag_get - get a tag on a radix tree node
* @root: radix tree root
@@ -627,7 +626,6 @@ int radix_tree_tag_get(struct radix_tree
}
}
EXPORT_SYMBOL(radix_tree_tag_get);
-#endif
/**
* radix_tree_next_hole - find the next hole (not-present entry)
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -82,6 +82,10 @@ static struct hlist_head *inode_hashtabl
*/
DEFINE_SPINLOCK(inode_lock);
+EXPORT_SYMBOL(inode_in_use);
+EXPORT_SYMBOL(inode_unused);
+EXPORT_SYMBOL(inode_lock);
+
/*
* iprune_mutex provides exclusion between the kswapd or try_to_free_pages
* icache shrinking path, and the umount path. Without this exclusion,
@@ -108,6 +112,14 @@ static void wake_up_inode(struct inode *
wake_up_bit(&inode->i_state, __I_LOCK);
}
+static inline void inode_created_by(struct inode *inode, struct task_struct *task)
+{
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ inode->i_cuid = task->uid;
+ memcpy(inode->i_comm, task->comm, sizeof(task->comm));
+#endif
+}
+
static struct inode *alloc_inode(struct super_block *sb)
{
static const struct address_space_operations empty_aops;
@@ -183,6 +195,7 @@ static struct inode *alloc_inode(struct
}
inode->i_private = NULL;
inode->i_mapping = mapping;
+ inode_created_by(inode, current);
}
return inode;
}
@@ -247,6 +260,8 @@ void __iget(struct inode * inode)
inodes_stat.nr_unused--;
}
+EXPORT_SYMBOL(__iget);
+
/**
* clear_inode - clear an inode
* @inode: inode to clear
@@ -1353,6 +1368,16 @@ void inode_double_unlock(struct inode *i
}
EXPORT_SYMBOL(inode_double_unlock);
+
+struct hlist_head * get_inode_hash_budget(unsigned long index)
+{
+ if (index >= (1 << i_hash_shift))
+ return NULL;
+
+ return inode_hashtable + index;
+}
+EXPORT_SYMBOL_GPL(get_inode_hash_budget);
+
static __initdata unsigned long ihash_entries;
static int __init set_ihash_entries(char *str)
{
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -45,6 +45,9 @@
LIST_HEAD(super_blocks);
DEFINE_SPINLOCK(sb_lock);
+EXPORT_SYMBOL(super_blocks);
+EXPORT_SYMBOL(sb_lock);
+
/**
* alloc_super - create new superblock
* @type: filesystem type superblock should belong to
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -230,6 +230,7 @@ unsigned long shrink_slab(unsigned long
up_read(&shrinker_rwsem);
return ret;
}
+EXPORT_SYMBOL(shrink_slab);
/* Called without lock on whether page is mapped, so answer is unstable */
static inline int page_mapping_inuse(struct page *page)
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -44,6 +44,7 @@ struct address_space swapper_space = {
.i_mmap_nonlinear = LIST_HEAD_INIT(swapper_space.i_mmap_nonlinear),
.backing_dev_info = &swap_backing_dev_info,
};
+EXPORT_SYMBOL_GPL(swapper_space);
#define INC_CACHE_INFO(x) do { swap_cache_info.x++; } while (0)
--- linux-2.6.orig/Documentation/filesystems/proc.txt
+++ linux-2.6/Documentation/filesystems/proc.txt
@@ -266,6 +266,7 @@ Table 1-4: Kernel info in /proc
driver Various drivers grouped here, currently rtc (2.4)
execdomains Execdomains, related to security (2.4)
fb Frame Buffer devices (2.4)
+ filecache Query/drop in-memory file cache
fs File system parameters, currently nfs/exports (2.4)
ide Directory containing info about the IDE subsystem
interrupts Interrupt usage
@@ -456,6 +457,88 @@ varies by architecture and compile optio
> cat /proc/meminfo
+..............................................................................
+
+filecache:
+
+Provides access to the in-memory file cache.
+
+To list an index of all cached files:
+
+ echo ls > /proc/filecache
+ cat /proc/filecache
+
+The output looks like:
+
+ # filecache 1.0
+ # ino size cached cached% state refcnt dev file
+ 1026334 91 92 100 -- 66 03:02(hda2) /lib/ld-2.3.6.so
+ 233608 1242 972 78 -- 66 03:02(hda2) /lib/tls/libc-2.3.6.so
+ 65203 651 476 73 -- 1 03:02(hda2) /bin/bash
+ 1026445 261 160 61 -- 10 03:02(hda2) /lib/libncurses.so.5.5
+ 235427 10 12 100 -- 44 03:02(hda2) /lib/tls/libdl-2.3.6.so
+
+FIELD INTRO
+---------------------------------------------------------------------------
+ino inode number
+size inode size in KB
+cached cached size in KB
+cached% percent of file data cached
+state1 '-' clean; 'd' metadata dirty; 'D' data dirty
+state2 '-' unlocked; 'L' locked, normally indicates file being written out
+refcnt file reference count, it's an in-kernel one, not exactly open count
+dev major:minor numbers in hex, followed by a descriptive device name
+file file path _inside_ the filesystem. There are several special names:
+ '(noname)': the file name is not available
+ '(03:02)': the file is a block device file of major:minor
+ '...(deleted)': the named file has been deleted from the disk
+
+To list the cached pages of a particular file:
+
+ echo /bin/bash > /proc/filecache
+ cat /proc/filecache
+
+ # file /bin/bash
+ # flags R:referenced A:active U:uptodate D:dirty W:writeback M:mmap
+ # idx len state refcnt
+ 0 36 RAU__M 3
+ 36 1 RAU__M 2
+ 37 8 RAU__M 3
+ 45 2 RAU___ 1
+ 47 6 RAU__M 3
+ 53 3 RAU__M 2
+ 56 2 RAU__M 3
+
+FIELD INTRO
+----------------------------------------------------------------------------
+idx page index
+len number of pages which are cached and share the same state
+state page state of the flags listed in line two
+refcnt page reference count
+
+Careful users may notice that the file name to be queried is remembered between
+commands. Internally, the module keeps the file name parameter in a global
+variable, so that a newly opened /proc/filecache file inherits it. However,
+this can cause interference between multiple queriers. The rule to follow is:
+only root may change the file name parameter interactively; normal users must
+access the interface through scripts. Scripts should follow the code example
+below:
+
+ filecache = open("/proc/filecache", "rw");
+ # avoid polluting the global parameter filename
+ filecache.write("set private");
+
+To instruct the kernel to drop clean caches, dentries and inodes from memory,
+causing that memory to become free:
+
+ # drop clean file data cache (i.e. file backed pagecache)
+ echo drop pagecache > /proc/filecache
+
+ # drop clean file metadata cache (i.e. dentries and inodes)
+ echo drop slabcache > /proc/filecache
+
+Note that the drop commands are non-destructive operations and dirty objects
+are not freeable, so the user should run `sync' first.
MemTotal: 16344972 kB
MemFree: 13634064 kB
--- /dev/null
+++ linux-2.6/fs/proc/filecache.c
@@ -0,0 +1,1045 @@
+/*
+ * fs/proc/filecache.c
+ *
+ * Copyright (C) 2006, 2007 Fengguang Wu <wfg@mail.ustc.edu.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/radix-tree.h>
+#include <linux/page-flags.h>
+#include <linux/pagevec.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/writeback.h>
+#include <linux/buffer_head.h>
+#include <linux/parser.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/module.h>
+#include <asm/uaccess.h>
+
+/*
+ * Increase minor version when new columns are added;
+ * Increase major version when existing columns are changed.
+ */
+#define FILECACHE_VERSION "1.0"
+
+/* Internal buffer sizes. The larger the more efficient. */
+#define SBUF_SIZE (128<<10)
+#define IWIN_PAGE_ORDER 3
+#define IWIN_SIZE ((PAGE_SIZE<<IWIN_PAGE_ORDER) / sizeof(struct inode *))
+
+/*
+ * Session management.
+ *
+ * Each opened /proc/filecache file is associated with a session object.
+ * Also there is a global_session that maintains status across open()/close()
+ * (i.e. the lifetime of an opened file), so that a casual user can query the
+ * filecache via _multiple_ simple shell commands like
+ * 'echo cat /bin/bash > /proc/filecache; cat /proc/filecache'.
+ *
+ * session.query_file is the file whose cache info is to be queried.
+ * Its value determines what we get on read():
+ * - NULL: ii_*() called to show the inode index
+ * - filp: pg_*() called to show the page groups of a filp
+ *
+ * session.query_file is
+ * - cloned from global_session.query_file on open();
+ * - updated on write("cat filename");
+ * note that the new file will also be saved in global_session.query_file if
+ * session.private_session is false.
+ */
+
+struct session {
+ /* options */
+ int private_session;
+ unsigned long ls_options;
+ dev_t ls_dev;
+
+ /* parameters */
+ struct file *query_file;
+
+ /* seqfile pos */
+ pgoff_t start_offset;
+ pgoff_t next_offset;
+
+ /* inode at last pos */
+ struct {
+ unsigned long pos;
+ unsigned long state;
+ struct inode *inode;
+ struct inode *pinned_inode;
+ } ipos;
+
+ /* inode window */
+ struct {
+ unsigned long cursor;
+ unsigned long origin;
+ unsigned long size;
+ struct inode **inodes;
+ } iwin;
+};
+
+static struct session global_session;
+
+/*
+ * Session address is stored in proc_file->f_ra.start:
+ * we assume that there will be no readahead for proc_file.
+ */
+static struct session *get_session(struct file *proc_file)
+{
+ return (struct session *)proc_file->f_ra.start;
+}
+
+static void set_session(struct file *proc_file, struct session *s)
+{
+ BUG_ON(proc_file->f_ra.start);
+ proc_file->f_ra.start = (unsigned long)s;
+}
+
+static void update_global_file(struct session *s)
+{
+ if (s->private_session)
+ return;
+
+ if (global_session.query_file)
+ fput(global_session.query_file);
+
+ global_session.query_file = s->query_file;
+
+ if (global_session.query_file)
+ get_file(global_session.query_file);
+}
+
+/*
+ * Cases of the name:
+ * 1) NULL (ls: list the cached files)
+ * s->query_file = global_session.query_file = 0;
+ * 2) "" (new session)
+ * s->query_file = global_session.query_file;
+ * 3) a regular file name (cat newfile)
+ * s->query_file = global_session.query_file = newfile;
+ */
+static int session_update_file(struct session *s, char *name)
+{
+ static DEFINE_MUTEX(mutex); /* protects global_session.query_file */
+ int err = 0;
+
+ mutex_lock(&mutex);
+
+ /*
+ * We are to quit, or to list the cached files.
+ * Reset *.query_file.
+ */
+ if (!name) {
+ if (s->query_file) {
+ fput(s->query_file);
+ s->query_file = NULL;
+ }
+ update_global_file(s);
+ goto out;
+ }
+
+ /*
+ * This is a new session.
+ * Inherit options/parameters from global ones.
+ */
+ if (name[0] == '\0') {
+ *s = global_session;
+ if (s->query_file)
+ get_file(s->query_file);
+ goto out;
+ }
+
+ /*
+ * Open the named file.
+ */
+ if (s->query_file)
+ fput(s->query_file);
+ s->query_file = filp_open(name, O_RDONLY|O_LARGEFILE, 0);
+ if (IS_ERR(s->query_file)) {
+ err = PTR_ERR(s->query_file);
+ s->query_file = NULL;
+ } else
+ update_global_file(s);
+
+out:
+ mutex_unlock(&mutex);
+
+ return err;
+}
+
+static struct session *session_create(void)
+{
+ struct session *s;
+ int err = 0;
+
+ s = kmalloc(sizeof(*s), GFP_KERNEL);
+ if (s)
+ err = session_update_file(s, "");
+ else
+ err = -ENOMEM;
+
+ return err ? ERR_PTR(err) : s;
+}
+
+static void session_release(struct session *s)
+{
+ if (s->ipos.pinned_inode)
+ iput(s->ipos.pinned_inode);
+ if (s->query_file)
+ fput(s->query_file);
+ kfree(s);
+}
+
+
+/*
+ * Listing of cached files.
+ *
+ * Usage:
+ * echo > /proc/filecache # enter listing mode
+ * cat /proc/filecache # get the file listing
+ */
+
+/* code style borrowed from ib_srp.c */
+enum {
+ LS_OPT_ERR = 0,
+ LS_OPT_NOCLEAN = 1 << 0,
+ LS_OPT_NODIRTY = 1 << 1,
+ LS_OPT_NOUNUSED = 1 << 2,
+ LS_OPT_EMPTY = 1 << 3,
+ LS_OPT_ALL = 1 << 4,
+ LS_OPT_DEV = 1 << 5,
+};
+
+static match_table_t ls_opt_tokens = {
+ { LS_OPT_NOCLEAN, "noclean" },
+ { LS_OPT_NODIRTY, "nodirty" },
+ { LS_OPT_NOUNUSED, "nounused" },
+ { LS_OPT_EMPTY, "empty" },
+ { LS_OPT_ALL, "all" },
+ { LS_OPT_DEV, "dev=%s" },
+ { LS_OPT_ERR, NULL }
+};
+
+static int ls_parse_options(const char *buf, struct session *s)
+{
+ substring_t args[MAX_OPT_ARGS];
+ char *options, *sep_opt;
+ char *p;
+ int token;
+ int ret = 0;
+
+ if (!buf)
+ return 0;
+ options = kstrdup(buf, GFP_KERNEL);
+ if (!options)
+ return -ENOMEM;
+
+ s->ls_options = 0;
+ sep_opt = options;
+ while ((p = strsep(&sep_opt, " ")) != NULL) {
+ if (!*p)
+ continue;
+
+ token = match_token(p, ls_opt_tokens, args);
+
+ switch (token) {
+ case LS_OPT_NOCLEAN:
+ case LS_OPT_NODIRTY:
+ case LS_OPT_NOUNUSED:
+ case LS_OPT_EMPTY:
+ case LS_OPT_ALL:
+ s->ls_options |= token;
+ break;
+ case LS_OPT_DEV:
+ p = match_strdup(args);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ if (*p == '/') {
+ struct kstat stat;
+ struct nameidata nd;
+ ret = path_lookup(p, LOOKUP_FOLLOW, &nd);
+ if (!ret)
+ ret = vfs_getattr(nd.path.mnt,
+ nd.path.dentry, &stat);
+ if (!ret)
+ s->ls_dev = stat.rdev;
+ } else
+ s->ls_dev = simple_strtoul(p, NULL, 0);
+ /* printk("%lx %s\n", (long)s->ls_dev, p); */
+ kfree(p);
+ break;
+
+ default:
+ printk(KERN_WARNING "unknown parameter or missing value "
+ "'%s' in ls command\n", p);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+out:
+ kfree(options);
+ return ret;
+}
+
+/*
+ * Add possible filters here.
+ * No permission check: we cannot verify the path's permission anyway.
+ * We simply demand root privilege for accessing /proc/filecache.
+ */
+static int may_show_inode(struct session *s, struct inode *inode)
+{
+ if (!atomic_read(&inode->i_count))
+ return 0;
+ if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+ return 0;
+ if (!inode->i_mapping)
+ return 0;
+
+ if (s->ls_dev && s->ls_dev != inode->i_sb->s_dev)
+ return 0;
+
+ if (s->ls_options & LS_OPT_ALL)
+ return 1;
+
+ if (!(s->ls_options & LS_OPT_EMPTY) && !inode->i_mapping->nrpages)
+ return 0;
+
+ if ((s->ls_options & LS_OPT_NOCLEAN) && !(inode->i_state & I_DIRTY))
+ return 0;
+
+ if ((s->ls_options & LS_OPT_NODIRTY) && (inode->i_state & I_DIRTY))
+ return 0;
+
+ if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+ S_ISLNK(inode->i_mode) || S_ISBLK(inode->i_mode)))
+ return 0;
+
+ return 1;
+}
+
+/*
+ * Full: there is more data following.
+ */
+static int iwin_full(struct session *s)
+{
+ return !s->iwin.cursor ||
+ s->iwin.cursor > s->iwin.origin + s->iwin.size;
+}
+
+static int iwin_push(struct session *s, struct inode *inode)
+{
+ if (!may_show_inode(s, inode))
+ return 0;
+
+ s->iwin.cursor++;
+
+ if (s->iwin.size >= IWIN_SIZE)
+ return 1;
+
+ if (s->iwin.cursor > s->iwin.origin)
+ s->iwin.inodes[s->iwin.size++] = inode;
+ return 0;
+}
+
+/*
+ * Traverse the inode lists in order - newest first.
+ * And fill @s->iwin.inodes with inodes positioned in [@pos, @pos+IWIN_SIZE).
+ */
+static int iwin_fill(struct session *s, unsigned long pos)
+{
+ struct inode *inode;
+ struct super_block *sb;
+
+ s->iwin.origin = pos;
+ s->iwin.cursor = 0;
+ s->iwin.size = 0;
+
+ /*
+ * We have a cursor inode, clean and expected to be unchanged.
+ */
+ if (s->ipos.inode && pos >= s->ipos.pos &&
+ !(s->ipos.state & I_DIRTY) &&
+ s->ipos.state == s->ipos.inode->i_state) {
+ inode = s->ipos.inode;
+ s->iwin.cursor = s->ipos.pos;
+ goto continue_from_saved;
+ }
+
+ if (s->ls_options & LS_OPT_NODIRTY)
+ goto clean_inodes;
+
+ spin_lock(&sb_lock);
+ list_for_each_entry(sb, &super_blocks, s_list) {
+ if (s->ls_dev && s->ls_dev != sb->s_dev)
+ continue;
+
+ list_for_each_entry(inode, &sb->s_dirty, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full_unlock;
+ }
+ list_for_each_entry(inode, &sb->s_io, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full_unlock;
+ }
+ }
+ spin_unlock(&sb_lock);
+
+clean_inodes:
+ list_for_each_entry(inode, &inode_in_use, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full;
+continue_from_saved:
+ ;
+ }
+
+ if (s->ls_options & LS_OPT_NOUNUSED)
+ return 0;
+
+ list_for_each_entry(inode, &inode_unused, i_list) {
+ if (iwin_push(s, inode))
+ goto out_full;
+ }
+
+ return 0;
+
+out_full_unlock:
+ spin_unlock(&sb_lock);
+out_full:
+ return 1;
+}
+
+static struct inode *iwin_inode(struct session *s, unsigned long pos)
+{
+ if ((iwin_full(s) && pos >= s->iwin.origin + s->iwin.size)
+ || pos < s->iwin.origin)
+ iwin_fill(s, pos);
+
+ if (pos >= s->iwin.cursor)
+ return NULL;
+
+ s->ipos.pos = pos;
+ s->ipos.inode = s->iwin.inodes[pos - s->iwin.origin];
+ BUG_ON(!s->ipos.inode);
+ return s->ipos.inode;
+}
+
+static void show_inode(struct seq_file *m, struct inode *inode)
+{
+ char state[] = "--"; /* dirty, locked */
+ struct dentry *dentry;
+ loff_t size = i_size_read(inode);
+ unsigned long nrpages;
+ int percent;
+ int refcnt;
+ int shift;
+
+ if (!size)
+ size++;
+
+ if (inode->i_mapping)
+ nrpages = inode->i_mapping->nrpages;
+ else {
+ nrpages = 0;
+ WARN_ON(1);
+ }
+
+ for (shift = 0; (size >> shift) > ULONG_MAX / 128; shift += 12)
+ ;
+ percent = min(100UL, (((100 * nrpages) >> shift) << PAGE_CACHE_SHIFT) /
+ (unsigned long)(size >> shift));
+
+ if (inode->i_state & (I_DIRTY_DATASYNC|I_DIRTY_PAGES))
+ state[0] = 'D';
+ else if (inode->i_state & I_DIRTY_SYNC)
+ state[0] = 'd';
+
+ if (inode->i_state & I_LOCK)
+ state[1] = 'L';
+
+ refcnt = 0;
+ list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
+ refcnt += atomic_read(&dentry->d_count);
+ }
+
+ seq_printf(m, "%10lu %10llu %8lu %7d ",
+ inode->i_ino,
+ DIV_ROUND_UP(size, 1024),
+ nrpages << (PAGE_CACHE_SHIFT - 10),
+ percent);
+
+ seq_printf(m, "%6d %5s ",
+ refcnt,
+ state);
+
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ seq_printf(m, "%8u %5u %-16s",
+ inode->i_access_count,
+ inode->i_cuid,
+ inode->i_comm);
+#endif
+
+ seq_printf(m, "%02x:%02x(%s)\t",
+ MAJOR(inode->i_sb->s_dev),
+ MINOR(inode->i_sb->s_dev),
+ inode->i_sb->s_id);
+
+ if (list_empty(&inode->i_dentry)) {
+ if (!atomic_read(&inode->i_count))
+ seq_puts(m, "(noname)\n");
+ else
+ seq_printf(m, "(%02x:%02x)\n",
+ imajor(inode), iminor(inode));
+ } else {
+ struct path path = {
+ .mnt = NULL,
+ .dentry = list_entry(inode->i_dentry.next,
+ struct dentry, d_alias)
+ };
+
+ seq_path(m, &path, " \t\n\\");
+ seq_putc(m, '\n');
+ }
+}
+
+static int ii_show(struct seq_file *m, void *v)
+{
+ unsigned long index = *(loff_t *) v;
+ struct session *s = m->private;
+ struct inode *inode;
+
+ if (index == 0) {
+ seq_puts(m, "# filecache " FILECACHE_VERSION "\n");
+ seq_puts(m, "# ino size cached cached% "
+ "refcnt state "
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ "accessed uid process "
+#endif
+ "dev\t\tfile\n");
+ }
+
+ inode = iwin_inode(s,index);
+ show_inode(m, inode);
+
+ return 0;
+}
+
+static void *ii_start(struct seq_file *m, loff_t *pos)
+{
+ struct session *s = m->private;
+
+ s->iwin.size = 0;
+ s->iwin.inodes = (struct inode **)
+ __get_free_pages( GFP_KERNEL, IWIN_PAGE_ORDER);
+ if (!s->iwin.inodes)
+ return NULL;
+
+ spin_lock(&inode_lock);
+
+ return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void *ii_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct session *s = m->private;
+
+ (*pos)++;
+ return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void ii_stop(struct seq_file *m, void *v)
+{
+ struct session *s = m->private;
+ struct inode *inode = s->ipos.inode;
+
+ if (!s->iwin.inodes)
+ return;
+
+ if (inode) {
+ __iget(inode);
+ s->ipos.state = inode->i_state;
+ }
+ spin_unlock(&inode_lock);
+
+ free_pages((unsigned long) s->iwin.inodes, IWIN_PAGE_ORDER);
+ if (s->ipos.pinned_inode)
+ iput(s->ipos.pinned_inode);
+ s->ipos.pinned_inode = inode;
+}
+
+/*
+ * Listing of cached page ranges of a file.
+ *
+ * Usage:
+ * echo 'file name' > /proc/filecache
+ * cat /proc/filecache
+ */
+
+unsigned long page_mask;
+#define PG_MMAP PG_lru /* reuse any non-relevant flag */
+#define PG_BUFFER PG_swapcache /* ditto */
+#define PG_DIRTY PG_error /* ditto */
+#define PG_WRITEBACK PG_buddy /* ditto */
+
+/*
+ * Page state names, prefixed by their abbreviations.
+ */
+struct {
+ unsigned long mask;
+ const char *name;
+ int faked;
+} page_flag [] = {
+ {1 << PG_referenced, "R:referenced", 0},
+ {1 << PG_active, "A:active", 0},
+ {1 << PG_MMAP, "M:mmap", 1},
+
+ {1 << PG_uptodate, "U:uptodate", 0},
+ {1 << PG_dirty, "D:dirty", 0},
+ {1 << PG_writeback, "W:writeback", 0},
+ {1 << PG_reclaim, "X:readahead", 0},
+
+ {1 << PG_private, "P:private", 0},
+ {1 << PG_owner_priv_1, "O:owner", 0},
+
+ {1 << PG_BUFFER, "b:buffer", 1},
+ {1 << PG_DIRTY, "d:dirty", 1},
+ {1 << PG_WRITEBACK, "w:writeback", 1},
+};
+
+static unsigned long page_flags(struct page* page)
+{
+ unsigned long flags;
+ struct address_space *mapping = page_mapping(page);
+
+ flags = page->flags & page_mask;
+
+ if (page_mapped(page))
+ flags |= (1 << PG_MMAP);
+
+ if (page_has_buffers(page))
+ flags |= (1 << PG_BUFFER);
+
+ if (mapping) {
+ if (radix_tree_tag_get(&mapping->page_tree,
+ page_index(page),
+ PAGECACHE_TAG_WRITEBACK))
+ flags |= (1 << PG_WRITEBACK);
+
+ if (radix_tree_tag_get(&mapping->page_tree,
+ page_index(page),
+ PAGECACHE_TAG_DIRTY))
+ flags |= (1 << PG_DIRTY);
+ }
+
+ return flags;
+}
+
+static int pages_similiar(struct page* page0, struct page* page)
+{
+ if (page_count(page0) != page_count(page))
+ return 0;
+
+ if (page_flags(page0) != page_flags(page))
+ return 0;
+
+ return 1;
+}
+
+static void show_range(struct seq_file *m, struct page* page, unsigned long len)
+{
+ int i;
+ unsigned long flags;
+
+ if (!m || !page)
+ return;
+
+ seq_printf(m, "%lu\t%lu\t", page->index, len);
+
+ flags = page_flags(page);
+ for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+ seq_putc(m, (flags & page_flag[i].mask) ?
+ page_flag[i].name[0] : '_');
+
+ seq_printf(m, "\t%d\n", page_count(page));
+}
+
+#define BATCH_LINES 100
+static pgoff_t show_file_cache(struct seq_file *m,
+ struct address_space *mapping, pgoff_t start)
+{
+ int i;
+ int lines = 0;
+ pgoff_t len = 0;
+ struct pagevec pvec;
+ struct page *page;
+ struct page *page0 = NULL;
+
+ for (;;) {
+ pagevec_init(&pvec, 0);
+ pvec.nr = radix_tree_gang_lookup(&mapping->page_tree,
+ (void **)pvec.pages, start + len, PAGEVEC_SIZE);
+
+ if (pvec.nr == 0) {
+ show_range(m, page0, len);
+ start = ULONG_MAX;
+ goto out;
+ }
+
+ if (!page0)
+ page0 = pvec.pages[0];
+
+ for (i = 0; i < pvec.nr; i++) {
+ page = pvec.pages[i];
+
+ if (page->index == start + len &&
+ pages_similiar(page0, page))
+ len++;
+ else {
+ show_range(m, page0, len);
+ page0 = page;
+ start = page->index;
+ len = 1;
+ if (++lines > BATCH_LINES)
+ goto out;
+ }
+ }
+ }
+
+out:
+ return start;
+}
+
+static int pg_show(struct seq_file *m, void *v)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+ pgoff_t offset;
+
+ if (!file)
+ return ii_show(m, v);
+
+ offset = *(loff_t *) v;
+
+ if (!offset) { /* print header */
+ int i;
+
+ seq_puts(m, "# file ");
+ seq_path(m, &file->f_path, " \t\n\\");
+
+ seq_puts(m, "\n# flags");
+ for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+ seq_printf(m, " %s", page_flag[i].name);
+
+ seq_puts(m, "\n# idx\tlen\tstate\t\trefcnt\n");
+ }
+
+ s->start_offset = offset;
+ s->next_offset = show_file_cache(m, file->f_mapping, offset);
+
+ return 0;
+}
+
+static void *file_pos(struct file *file, loff_t *pos)
+{
+ loff_t size = i_size_read(file->f_mapping->host);
+ pgoff_t end = DIV_ROUND_UP(size, PAGE_CACHE_SIZE);
+ pgoff_t offset = *pos;
+
+ return offset < end ? pos : NULL;
+}
+
+static void *pg_start(struct seq_file *m, loff_t *pos)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+ pgoff_t offset = *pos;
+
+ if (!file)
+ return ii_start(m, pos);
+
+ rcu_read_lock();
+
+ if (offset - s->start_offset == 1)
+ *pos = s->next_offset;
+ return file_pos(file, pos);
+}
+
+static void *pg_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+
+ if (!file)
+ return ii_next(m, v, pos);
+
+ *pos = s->next_offset;
+ return file_pos(file, pos);
+}
+
+static void pg_stop(struct seq_file *m, void *v)
+{
+ struct session *s = m->private;
+ struct file *file = s->query_file;
+
+ if (!file)
+ return ii_stop(m, v);
+
+ rcu_read_unlock();
+}
+
+struct seq_operations seq_filecache_op = {
+ .start = pg_start,
+ .next = pg_next,
+ .stop = pg_stop,
+ .show = pg_show,
+};
+
+/*
+ * Implement the manual drop-all-pagecache function
+ */
+
+#define MAX_INODES (PAGE_SIZE / sizeof(struct inode *))
+static int drop_pagecache(void)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct inode *inode;
+ struct inode **inodes;
+ unsigned long i, j, k;
+ int err = 0;
+
+ inodes = (struct inode **)__get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+ if (!inodes)
+ return -ENOMEM;
+
+ for (i = 0; (head = get_inode_hash_budget(i)); i++) {
+ if (hlist_empty(head))
+ continue;
+
+ j = 0;
+ cond_resched();
+
+ /*
+ * Grab some inodes.
+ */
+ spin_lock(&inode_lock);
+ hlist_for_each (node, head) {
+ inode = hlist_entry(node, struct inode, i_hash);
+ if (!atomic_read(&inode->i_count))
+ continue;
+ if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+ continue;
+ if (!inode->i_mapping || !inode->i_mapping->nrpages)
+ continue;
+ __iget(inode);
+ inodes[j++] = inode;
+ if (j >= MAX_INODES)
+ break;
+ }
+ spin_unlock(&inode_lock);
+
+ /*
+ * Free clean pages.
+ */
+ for (k = 0; k < j; k++) {
+ inode = inodes[k];
+ invalidate_mapping_pages(inode->i_mapping, 0, ~1);
+ iput(inode);
+ }
+
+ /*
+ * Simply ignore the remaining inodes.
+ */
+ if (j >= MAX_INODES && !err) {
+ printk(KERN_WARNING
+ "Too many collisions in inode hash table.\n"
+ "Please boot with a larger ihash_entries=XXX.\n");
+ err = -EAGAIN;
+ }
+ }
+
+ free_pages((unsigned long) inodes, IWIN_PAGE_ORDER);
+ return err;
+}
+
+static void drop_slabcache(void)
+{
+ int nr_objects;
+
+ do {
+ nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+ } while (nr_objects > 10);
+}
+
+/*
+ * Proc file operations.
+ */
+
+static int filecache_open(struct inode *inode, struct file *proc_file)
+{
+ struct seq_file *m;
+ struct session *s;
+ unsigned size;
+ char *buf = 0;
+ int ret;
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENOENT;
+
+ s = session_create();
+ if (IS_ERR(s)) {
+ ret = PTR_ERR(s);
+ goto out;
+ }
+ set_session(proc_file, s);
+
+ size = SBUF_SIZE;
+ buf = kmalloc(size, GFP_KERNEL);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = seq_open(proc_file, &seq_filecache_op);
+ if (!ret) {
+ m = proc_file->private_data;
+ m->private = s;
+ m->buf = buf;
+ m->size = size;
+ }
+
+out:
+ if (ret) {
+ kfree(IS_ERR(s) ? NULL : s); /* s may be an ERR_PTR() here */
+ kfree(buf);
+ module_put(THIS_MODULE);
+ }
+ return ret;
+}
+
+static int filecache_release(struct inode *inode, struct file *proc_file)
+{
+ struct session *s = get_session(proc_file);
+ int ret;
+
+ session_release(s);
+ ret = seq_release(inode, proc_file);
+ module_put(THIS_MODULE);
+ return ret;
+}
+
+ssize_t filecache_write(struct file *proc_file, const char __user * buffer,
+ size_t count, loff_t *ppos)
+{
+ struct session *s;
+ char *name;
+ int err = 0;
+
+ if (count >= PATH_MAX + 5)
+ return -ENAMETOOLONG;
+
+ name = kmalloc(count+1, GFP_KERNEL);
+ if (!name)
+ return -ENOMEM;
+
+ if (copy_from_user(name, buffer, count)) {
+ err = -EFAULT;
+ goto out;
+ }
+
+ /* strip the optional newline */
+ if (count && name[count-1] == '\n')
+ name[count-1] = '\0';
+ else
+ name[count] = '\0';
+
+ s = get_session(proc_file);
+ if (!strcmp(name, "set private")) {
+ s->private_session = 1;
+ goto out;
+ }
+
+ if (!strncmp(name, "cat ", 4)) {
+ err = session_update_file(s, name+4);
+ goto out;
+ }
+
+ if (!strncmp(name, "ls", 2)) {
+ err = session_update_file(s, NULL);
+ if (!err)
+ err = ls_parse_options(name+2, s);
+ if (!err && !s->private_session) {
+ global_session.ls_dev = s->ls_dev;
+ global_session.ls_options = s->ls_options;
+ }
+ goto out;
+ }
+
+ if (!strncmp(name, "drop pagecache", 14)) {
+ err = drop_pagecache();
+ goto out;
+ }
+
+ if (!strncmp(name, "drop slabcache", 14)) {
+ drop_slabcache();
+ goto out;
+ }
+
+ /* err = -EINVAL; */
+ err = session_update_file(s, name);
+
+out:
+ kfree(name);
+
+ return err ? err : count;
+}
+
+static struct file_operations proc_filecache_fops = {
+ .owner = THIS_MODULE,
+ .open = filecache_open,
+ .release = filecache_release,
+ .write = filecache_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+};
+
+
+static __init int filecache_init(void)
+{
+ int i;
+ struct proc_dir_entry *entry;
+
+ entry = create_proc_entry("filecache", 0600, NULL);
+ if (entry)
+ entry->proc_fops = &proc_filecache_fops;
+
+ for (page_mask = i = 0; i < ARRAY_SIZE(page_flag); i++)
+ if (!page_flag[i].faked)
+ page_mask |= page_flag[i].mask;
+
+ return 0;
+}
+
+static void filecache_exit(void)
+{
+ remove_proc_entry("filecache", NULL);
+ if (global_session.query_file)
+ fput(global_session.query_file);
+}
+
+MODULE_AUTHOR("Fengguang Wu <wfg@mail.ustc.edu.cn>");
+MODULE_LICENSE("GPL");
+
+module_init(filecache_init);
+module_exit(filecache_exit);
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -685,6 +685,12 @@ struct inode {
void *i_security;
#endif
void *i_private; /* fs or device private pointer */
+
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ unsigned int i_access_count; /* opened how many times? */
+ uid_t i_cuid; /* opened first by which user? */
+ char i_comm[16]; /* opened first by which app? */
+#endif
};
/*
@@ -773,6 +779,13 @@ static inline unsigned imajor(const stru
return MAJOR(inode->i_rdev);
}
+static inline void inode_accessed(struct inode *inode)
+{
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+ inode->i_access_count++;
+#endif
+}
+
extern struct block_device *I_BDEV(struct inode *inode);
struct fown_struct {
@@ -1907,6 +1920,7 @@ extern void remove_inode_hash(struct ino
static inline void insert_inode_hash(struct inode *inode) {
__insert_inode_hash(inode, inode->i_ino);
}
+struct hlist_head * get_inode_hash_budget(unsigned long index);
extern struct file * get_empty_filp(void);
extern void file_move(struct file *f, struct list_head *list);
--- linux-2.6.orig/fs/open.c
+++ linux-2.6/fs/open.c
@@ -828,6 +828,7 @@ static struct file *__dentry_open(struct
goto cleanup_all;
}
+ inode_accessed(inode);
f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
--- linux-2.6.orig/fs/Kconfig
+++ linux-2.6/fs/Kconfig
@@ -750,6 +750,36 @@ config CONFIGFS_FS
Both sysfs and configfs can and should exist together on the
same system. One is not a replacement for the other.
+config PROC_FILECACHE
+ tristate "/proc/filecache support"
+ default m
+ depends on PROC_FS
+ help
+ This option creates a file /proc/filecache which enables one to
+ query/drop the cached files in memory.
+
+ A quick start guide:
+
+ # echo 'ls' > /proc/filecache
+ # head /proc/filecache
+
+ # echo 'cat /bin/bash' > /proc/filecache
+ # head /proc/filecache
+
+ # echo 'drop pagecache' > /proc/filecache
+ # echo 'drop slabcache' > /proc/filecache
+
+ For more details, please check Documentation/filesystems/proc.txt .
+
+ It can be a handy tool for sysadmins and desktop users.
+
+config PROC_FILECACHE_EXTRAS
+ bool "track extra states"
+ default y
+ depends on PROC_FILECACHE
+ help
+ Track extra states that cost a little more time/space.
+
endmenu
menu "Miscellaneous filesystems"
--- linux-2.6.orig/fs/proc/Makefile
+++ linux-2.6/fs/proc/Makefile
@@ -2,7 +2,8 @@
# Makefile for the Linux proc filesystem routines.
#
-obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PROC_FILECACHE) += filecache.o
proc-y := nommu.o task_nommu.o
proc-$(CONFIG_MMU) := mmu.o task_mmu.o
* Re: oom-killer killing even if memory is available?
2009-03-17 9:46 ` Andrew Morton
2009-03-17 10:17 ` Heiko Carstens
@ 2009-03-20 15:27 ` Mel Gorman
2009-03-20 21:02 ` Andrew Morton
1 sibling, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2009-03-20 15:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Heiko Carstens, linux-mm, Nick Piggin, Martin Schwidefsky,
Andreas Krebbel
On Tue, Mar 17, 2009 at 02:46:05AM -0700, Andrew Morton wrote:
> On Tue, 17 Mar 2009 10:00:49 +0100 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > Hi all,
> >
> > the below looks like there is some bug in the memory management code.
> > Even if there seems to be plenty of memory available the oom-killer
> > kills processes.
> >
> > The below happened after 27 days uptime, memory seems to be heavily
> > fragmented, but there are still larger portions of memory free that
> > could satisfy an order 2 allocation. Any idea why this fails?
> >
You are hitting the watermark code for the order-2 allocation in all
likelihood. This looks like a GFP_KERNEL allocation (gfp_mask=0xd0), so
ordinarily it's a bit surprising.
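For readers unfamiliar with that check, here is a minimal user-space sketch
of how an order-aware watermark test can reject an order-2 request even
though many pages are free and some order-2 blocks exist. It is only
modelled loosely on the shape of the kernel's watermark check; struct
zone_model, watermark_ok_model() and the free-list counts are hypothetical
and chosen purely for illustration. Only the min watermark (4064kB, i.e.
1016 pages assuming 4 kB pages) is taken from the dump below.

```c
#include <assert.h>

#define MODEL_MAX_ORDER 11
#define MODEL_MIN_PAGES 1016L /* 4064 kB min watermark, assuming 4 kB pages */

/* Hypothetical stand-in for a zone's free lists: nr_free[o] is the
 * number of free blocks of 2^o contiguous pages. */
struct zone_model {
	unsigned long nr_free[MODEL_MAX_ORDER];
};

/* Made-up, heavily fragmented zone: nearly all free memory is order 0/1 */
static const struct zone_model fragmented_zone = {
	.nr_free = { 20000, 4000, 50, 2 }
};

static long total_free_pages(const struct zone_model *z)
{
	long total = 0;
	int o;

	for (o = 0; o < MODEL_MAX_ORDER; o++)
		total += (long)(z->nr_free[o] << o);
	return total;
}

/* Returns 1 if an allocation of the given order may proceed, else 0. */
static int watermark_ok_model(const struct zone_model *z, int order, long min)
{
	long free_pages;
	int o;

	assert(z != 0 && order >= 0 && order < MODEL_MAX_ORDER);
	free_pages = total_free_pages(z) - (1L << order) + 1;
	if (free_pages <= min)
		return 0;
	for (o = 0; o < order; o++) {
		/* Blocks of order o are too small for the request: discount them */
		free_pages -= (long)(z->nr_free[o] << o);
		/* Require fewer free pages at each higher order */
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}
```

With these made-up numbers the zone has 28216 free pages and 50 free
order-2 blocks, yet watermark_ok_model(&fragmented_zone, 2, MODEL_MIN_PAGES)
returns 0 while the same call with order 0 returns 1: once the order-0 and
order-1 pages are discounted, what remains is below the scaled watermark.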
> > [root@t6360003 ~]# uptime
> > 09:33:41 up 27 days, 22:55, 1 user, load average: 0.00, 0.00, 0.00
> >
> > Mar 16 21:40:40 t6360003 kernel: basename invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
> > Mar 16 21:40:40 t6360003 kernel: CPU: 0 Not tainted 2.6.28 #1
> > Mar 16 21:40:40 t6360003 kernel: Process basename (pid: 30555, task: 000000007baa6838, ksp: 0000000063867968)
> > Mar 16 21:40:40 t6360003 kernel: 0700000084a8c238 0000000063867a90 0000000000000002 0000000000000000
> > Mar 16 21:40:40 t6360003 kernel: 0000000063867b30 0000000063867aa8 0000000063867aa8 000000000010534e
> > Mar 16 21:40:40 t6360003 kernel: 0000000000000000 0000000063867968 0000000000000000 000000000000000a
> > Mar 16 21:40:40 t6360003 kernel: 000000000000000d 0000000000000000 0000000063867a90 0000000063867b08
> > Mar 16 21:40:40 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000063867a90 0000000063867ae0
> > Mar 16 21:40:40 t6360003 kernel: Call Trace:
> > Mar 16 21:40:40 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
> > Mar 16 21:40:40 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
> > Mar 16 21:40:40 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
> > Mar 16 21:40:40 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
> > Mar 16 21:40:40 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
> > Mar 16 21:40:40 t6360003 kernel: [<00000000001a802e>] handle_mm_fault+0x262/0x9cc
> > Mar 16 21:40:40 t6360003 kernel: [<00000000004a1a7a>] do_dat_exception+0x30a/0x41c
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000115e5c>] sysc_return+0x0/0x8
> > Mar 16 21:40:40 t6360003 kernel: [<0000004d193bfae0>] 0x4d193bfae0
> > Mar 16 21:40:40 t6360003 kernel: Mem-Info:
> > Mar 16 21:40:40 t6360003 kernel: DMA per-cpu:
> > Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: Normal per-cpu:
> > Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 30
> > Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> > Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> > Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> > Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> > Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
>
> The scanner has wrung pretty much all it can out of the reclaimable pages -
> the LRUs are nearly empty. There's a few hundred MB free and apparently we
> don't have four physically contiguous free pages anywhere. It's
> believable.
>
> The question is: where the heck did all your memory go? You have 2GB of
> ZONE_NORMAL memory in that machine, but only a tenth of it is visible to
> the page reclaim code.
>
> Something must have allocated (and possibly leaked) it.
>
This looks like a memory leak all right. There used to be a patch that
recorded a stack trace for every page allocation but it was dropped from
-mm ages ago because of a merge conflict. I didn't revive it at the time
because it wasn't of immediate concern.
Should I revive the patch or do we have preferred ways of tracking down
memory leaks these days?
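As a quick sanity check on the Mem-Info dump above, the page counts can be
converted to kB. The sketch below assumes 4 kB pages, which is consistent
with the dump itself (146348 free pages matches the 467728kB + 117664kB
reported free by the two zones). The helper names are made up for
illustration.

```c
#include <assert.h>

#define PAGE_KB 4UL /* assumption: 4 kB pages */

/* Figures copied from the Mem-Info dump quoted above */
static const unsigned long free_pages = 146348;
static const unsigned long slab_pages = 875833;
static const unsigned long dma_free_kb = 467728;
static const unsigned long normal_free_kb = 117664;
static const unsigned long dma_present_kb = 2068480;
static const unsigned long normal_present_kb = 2068480;

static unsigned long pages_to_kb(unsigned long pages)
{
	return pages * PAGE_KB;
}

/* Percentage of total_kb taken up by part_kb, rounded down. */
static unsigned long percent_of(unsigned long part_kb, unsigned long total_kb)
{
	assert(total_kb != 0);
	return part_kb * 100UL / total_kb;
}
```

Under that assumption the slab accounts for 3503332 kB, roughly 84% of the
4136960 kB present in the two zones, while the four LRU lists together
hold only 723 pages.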
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: oom-killer killing even if memory is available?
2009-03-20 15:27 ` Mel Gorman
@ 2009-03-20 21:02 ` Andrew Morton
2009-03-23 11:55 ` Mel Gorman
2009-03-23 14:58 ` Mel Gorman
0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2009-03-20 21:02 UTC (permalink / raw)
To: Mel Gorman
Cc: Heiko Carstens, linux-mm, Nick Piggin, Martin Schwidefsky,
Andreas Krebbel
On Fri, 20 Mar 2009 15:27:00 +0000 Mel Gorman <mel@csn.ul.ie> wrote:
> >
> > Something must have allocated (and possibly leaked) it.
> >
>
> This looks like a memory leak all right. There used to be a patch that
> recorded a stack trace for every page allocation but it was dropped from
> -mm ages ago because of a merge conflict. I didn't revive it at the time
> because it wasn't of immediate concern.
>
> Should I revive the patch or do we have preferred ways of tracking down
> memory leaks these days?
We know that a dentry is getting leaked but afaik we don't know which one
or why.
We could get more info via the page-owner-tracking-leak-detector.patch
approach, or by dumping the info in the cached dentries - I think Wu
Fengguang prepared a patch which does that.
I'm not sure why I dropped page-owner-tracking-leak-detector.patch actually
- it was pretty useful sometimes and afaik we still haven't merged any tool
which duplicates it.
Here's the latest version which I have:
From: Alexander Nyberg <alexn@dsv.su.se>
Introduces CONFIG_PAGE_OWNER that keeps track of the call chain under which a
page was allocated. Includes a user-space helper in
Documentation/page_owner.c to sort the enormous amount of output that this may
give (thanks tridge).
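The helper's approach (sort the blocks by text, merge identical
neighbours, then sort by count) can be sketched as follows. This is a
simplified stand-in and not the code in Documentation/page_owner.c: struct
entry, cull() and the two comparators are illustrative names, and the
comparators use the const void * signature that qsort() expects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct entry {
	const char *txt; /* one stack-trace block */
	int num;         /* how many times it was seen */
};

static int cmp_txt(const void *a, const void *b)
{
	const struct entry *l1 = a, *l2 = b;

	return strcmp(l1->txt, l2->txt);
}

/* Descending by count; written to avoid overflow from plain subtraction. */
static int cmp_num(const void *a, const void *b)
{
	const struct entry *l1 = a, *l2 = b;

	return (l2->num > l1->num) - (l2->num < l1->num);
}

/* Sorts list[] in place, merges duplicate blocks, returns the new length. */
static size_t cull(struct entry *list, size_t n)
{
	size_t i, count = 0;

	assert(list != NULL || n == 0);
	if (n == 0)
		return 0;
	qsort(list, n, sizeof(list[0]), cmp_txt);
	for (i = 0; i < n; i++) {
		if (count && strcmp(list[count - 1].txt, list[i].txt) == 0)
			list[count - 1].num += list[i].num;
		else
			list[count++] = list[i];
	}
	qsort(list, count, sizeof(list[0]), cmp_num);
	return count;
}
```

After cull() the most frequent allocation site is at index 0, which is the
order in which the "N times:" blocks shown below are written out.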
Information available through /proc/page_owner
x86_64 introduces some stack noise in certain call chains so for exact
output use of x86 && CONFIG_FRAME_POINTER is suggested. Tested on x86, x86
&& CONFIG_FRAME_POINTER, x86_64
Output looks like:
4819 times:
Page allocated via order 0, mask 0x50
[0xc012b7b9] find_lock_page+25
[0xc012b8c8] find_or_create_page+152
[0xc0147d74] grow_dev_page+36
[0xc0148164] __find_get_block+84
[0xc0147ebc] __getblk_slow+124
[0xc0148164] __find_get_block+84
[0xc01481e7] __getblk+55
[0xc0185d14] do_readahead+100
We use a custom stack unwinder because using __builtin_return_address([0-7])
causes gcc to generate code that might try to unwind the stack looking for
function return addresses and "fall off" causing early panics if the call
chain is not deep enough. So in that case we could have had a depth of around
3 functions in all traces (I experimented a bit with this).
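To illustrate the idea of a bounded unwinder, here is a user-space sketch
that walks a simulated frame-pointer chain and stops as soon as the next
frame pointer falls outside the stack, instead of "falling off". It is not
the patch's code: the stack is a plain array, array indices stand in for
addresses, walk_frames() and in_stack() are hypothetical names, and the
"frames must move forward" guard is an extra that the patch does not have.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_DEPTH 8

/* A frame pointer is usable only if both the saved-bp slot and the
 * return-address slot behind it lie inside the stack. */
static int in_stack(size_t bp, size_t stack_words)
{
	return stack_words >= 2 && bp <= stack_words - 2;
}

/*
 * stack[bp] holds the caller's frame pointer (as an index into stack[])
 * and stack[bp + 1] holds the return address. Returns the number of
 * return addresses stored in trace[].
 */
static int walk_frames(const unsigned long *stack, size_t stack_words,
		       size_t bp, unsigned long trace[TRACE_DEPTH])
{
	int depth = 0;

	assert(stack != NULL && trace != NULL);
	while (depth < TRACE_DEPTH && in_stack(bp, stack_words)) {
		size_t next = (size_t)stack[bp];

		trace[depth++] = stack[bp + 1];
		/* Frames must move towards the stack base, or we could loop */
		if (next <= bp)
			break;
		bp = next;
	}
	return depth;
}
```

A chain whose last saved frame pointer points outside the array simply
ends the walk with a short trace, which is the behaviour wanted for
shallow call chains.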
From: Dave Hansen <haveblue@us.ibm.com>
make page_owner handle non-contiguous page ranges
From: Alexander Nyberg <alexn@telia.com>
I've cleaned up the __alloc_pages() part to a simple set_page_owner() call.
Signed-off-by: Alexander Nyberg <alexn@dsv.su.se>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-Off-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitu.com>
DESC
Update page->order at an appropriate time when tracking PAGE_OWNER
EDESC
From: mel@skynet.ie (Mel Gorman)
PAGE_OWNER tracks free pages by setting page->order to -1. However, it is
set during __free_pages() which is not the only free path as
__pagevec_free() and free_compound_page() do not go through __free_pages().
This leads to a situation where free pages are visible in /proc/page_owner
which is confusing and might be interpreted as a memory leak.
This patch sets page->order to -1 when PageBuddy is set. It also prints a
warning to the kernel log if a free page is found that does not appear free
to PAGE_OWNER. This should be considered a fix to
page-owner-tracking-leak-detector.patch.
This only applies to -mm as PAGE_OWNER is not in mainline.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
Print out PAGE_OWNER statistics in relation to fragmentation avoidance
EDESC
From: Mel Gorman <mel@csn.ul.ie>
When PAGE_OWNER is set, more information is available of relevance to
fragmentation avoidance. A second line is added to /proc/page_owner showing
the PFN, the pageblock number, the mobility type of the page based on its
allocation flags, whether the allocation is improperly placed and the flags.
A sample entry looks like
Page allocated via order 0, mask 0x1280d2
PFN 7355 Block 7 type 3 Fallback Flags LA
[0xc01528c6] __handle_mm_fault+598
[0xc0320427] do_page_fault+279
[0xc031ed9a] error_code+114
This information can be used to identify pages that are improperly placed. As
the format of PAGE_OWNER data is now different, the comment at the top of
Documentation/page_owner.c is updated with new instructions.
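In that second line, "Block" is the PFN shifted right by the pageblock
order, and "Fallback" is printed when the pageblock's migrate type differs
from the type implied by the allocation flags. A small sketch, assuming a
pageblock order of 10 (1024 pages per block, which is consistent with PFN
7355 landing in block 7 in the sample; the real value depends on the
architecture and configuration). The function names are illustrative.

```c
#include <assert.h>

#define SKETCH_PAGEBLOCK_ORDER 10U /* assumption, see above */

static unsigned long pfn_to_block(unsigned long pfn, unsigned int pageblock_order)
{
	assert(pageblock_order < sizeof(unsigned long) * 8);
	return pfn >> pageblock_order;
}

/* Mirrors the "Fallback" column: the page sits in a block of another type. */
static const char *placement(int blocktype, int pagetype)
{
	return blocktype != pagetype ? "Fallback" : "";
}
```

For the sample entry, pfn_to_block(7355, SKETCH_PAGEBLOCK_ORDER) gives 7.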
As PAGE_OWNER tracks the GFP flags used to allocate the pages,
/proc/pagetypeinfo is enhanced to contain how many mixed blocks exist. The
additional output looks like
Number of mixed blocks Unmovable Reclaimable Movable Reserve
Node 0, zone DMA 0 1 2 1
Node 0, zone Normal 2 11 33 0
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
DESC
Allow PAGE_OWNER to be set on any architecture
EDESC
From: Mel Gorman <mel@csn.ul.ie>
Currently PAGE_OWNER depends on CONFIG_X86. This appears to be due to
pfn_to_page() being called in a manner inappropriate for many memory models
and in the presence of memory holes. This patch ensures that pfn_valid() and
pfn_valid_within() are called at the appropriate places and that the offsets
are correctly updated so that PAGE_OWNER is safe on any architecture.
In situations where CONFIG_HOLES_IN_ZONES is set (IA64 with
VIRTUAL_MEM_MAP), there may be cases where pages allocated within a
MAX_ORDER_NR_PAGES block of pages may not be displayed in /proc/page_owner
if the hole is at the start of the block. Addressing this would be quite
complex, perform slowly and is of no clear benefit.
Once PAGE_OWNER is allowed on all architectures, the statistics for
grouping pages by mobility that declare how many pageblocks contain mixed
page types become optionally available on all arches.
This patch was tested successfully on x86, x86_64, ppc64 and IA64 machines.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix
EDESC
From: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix fix
EDESC
From: mel@skynet.ie (Mel Gorman)
Page-owner-tracking stores a backtrace of an allocation in the struct
page. How the stack trace is generated depends on whether
CONFIG_FRAME_POINTER is set or not. If CONFIG_FRAME_POINTER is set, the
frame pointer must be read using some inline assembler which is not
available for all architectures.
This patch uses the frame pointer where it is available but has a fallback
where it is not.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/page_owner.c | 141 ++++++++++++++++++++++++++++++++++
fs/proc/proc_misc.c | 145 +++++++++++++++++++++++++++++++++++
include/linux/mm_types.h | 5 +
lib/Kconfig.debug | 10 ++
mm/page_alloc.c | 66 +++++++++++++++
mm/vmstat.c | 93 ++++++++++++++++++++++
6 files changed, 460 insertions(+)
diff -puN /dev/null Documentation/page_owner.c
--- /dev/null
+++ a/Documentation/page_owner.c
@@ -0,0 +1,141 @@
+/*
+ * User-space helper to sort the output of /proc/page_owner
+ *
+ * Example use:
+ * cat /proc/page_owner > page_owner_full.txt
+ * grep -v ^PFN page_owner_full.txt > page_owner.txt
+ * ./sort page_owner.txt sorted_page_owner.txt
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+struct block_list {
+ char *txt;
+ int len;
+ int num;
+};
+
+
+static struct block_list *list;
+static int list_size;
+static int max_size;
+
+struct block_list *block_head;
+
+int read_block(char *buf, FILE *fin)
+{
+ int ret = 0;
+ int hit = 0, c;
+ char *curr = buf;
+
+ for (;;) {
+ c = getc(fin); /* keep as int so EOF differs from any data byte */
+ if (c == EOF) return -1;
+ *curr = c;
+ ret++;
+ if (*curr == '\n' && hit == 1)
+ return ret - 1;
+ else if (*curr == '\n')
+ hit = 1;
+ else
+ hit = 0;
+ curr++;
+ }
+}
+
+static int compare_txt(struct block_list *l1, struct block_list *l2)
+{
+ return strcmp(l1->txt, l2->txt);
+}
+
+static int compare_num(struct block_list *l1, struct block_list *l2)
+{
+ return l2->num - l1->num;
+}
+
+static void add_list(char *buf, int len)
+{
+ if (list_size != 0 &&
+ len == list[list_size-1].len &&
+ memcmp(buf, list[list_size-1].txt, len) == 0) {
+ list[list_size-1].num++;
+ return;
+ }
+ if (list_size == max_size) {
+ printf("max_size too small??\n");
+ exit(1);
+ }
+ list[list_size].txt = malloc(len+1);
+ list[list_size].len = len;
+ list[list_size].num = 1;
+ memcpy(list[list_size].txt, buf, len);
+ list[list_size].txt[len] = 0;
+ list_size++;
+ if (list_size % 1000 == 0) {
+ printf("loaded %d\r", list_size);
+ fflush(stdout);
+ }
+}
+
+int main(int argc, char **argv)
+{
+ FILE *fin, *fout;
+ char buf[1024];
+ int ret, i, count;
+ struct block_list *list2;
+ struct stat st;
+
+ fin = fopen(argv[1], "r");
+ fout = fopen(argv[2], "w");
+ if (!fin || !fout) {
+ printf("Usage: ./program <input> <output>\n");
+ perror("open: ");
+ exit(2);
+ }
+
+ fstat(fileno(fin), &st);
+ max_size = st.st_size / 100; /* hack ... */
+
+ list = malloc(max_size * sizeof(*list));
+
+ for(;;) {
+ ret = read_block(buf, fin);
+ if (ret < 0)
+ break;
+
+ buf[ret] = '\0';
+ add_list(buf, ret);
+ }
+
+ printf("loaded %d\n", list_size);
+
+ printf("sorting ....\n");
+
+ qsort(list, list_size, sizeof(list[0]), compare_txt);
+
+ list2 = malloc(sizeof(*list) * list_size);
+
+ printf("culling\n");
+
+ for (i=count=0;i<list_size;i++) {
+ if (count == 0 ||
+ strcmp(list2[count-1].txt, list[i].txt) != 0) {
+ list2[count++] = list[i];
+ } else {
+ list2[count-1].num += list[i].num;
+ }
+ }
+
+ qsort(list2, count, sizeof(list[0]), compare_num);
+
+ for (i=0;i<count;i++) {
+ fprintf(fout, "%d times:\n%s\n", list2[i].num, list2[i].txt);
+ }
+ return 0;
+}
diff -puN fs/proc/proc_misc.c~page-owner-tracking-leak-detector fs/proc/proc_misc.c
--- a/fs/proc/proc_misc.c~page-owner-tracking-leak-detector
+++ a/fs/proc/proc_misc.c
@@ -855,6 +855,140 @@ static struct file_operations proc_kpage
};
#endif /* CONFIG_PROC_PAGE_MONITOR */
+#ifdef CONFIG_PAGE_OWNER
+#include <linux/bootmem.h>
+#include <linux/kallsyms.h>
+static ssize_t
+read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+{
+ unsigned long pfn;
+ struct page *page;
+ char *kbuf, *modname;
+ const char *symname;
+ int ret = 0;
+ char namebuf[128];
+ unsigned long offset = 0, symsize;
+ int i;
+ ssize_t num_written = 0;
+ int blocktype = 0, pagetype = 0;
+
+ page = NULL;
+ pfn = min_low_pfn + *ppos;
+
+ /* Find a valid PFN or the start of a MAX_ORDER_NR_PAGES area */
+ while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
+ pfn++;
+
+ /* Find an allocated page */
+ for (; pfn < max_pfn; pfn++) {
+ /*
+ * If the new page is in a new MAX_ORDER_NR_PAGES area,
+ * validate the area as existing, skip it if not
+ */
+ if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn)) {
+ pfn += MAX_ORDER_NR_PAGES - 1;
+ continue;
+ }
+
+ /* Check for holes within a MAX_ORDER area */
+ if (!pfn_valid_within(pfn))
+ continue;
+
+ page = pfn_to_page(pfn);
+
+ /* Catch situations where free pages have a bad ->order */
+ if (page->order >= 0 && PageBuddy(page))
+ printk(KERN_WARNING
+ "PageOwner info inaccurate for PFN %lu\n",
+ pfn);
+
+ /* Stop search if page is allocated and has trace info */
+ if (page->order >= 0 && page->trace[0])
+ break;
+ }
+
+ if (!pfn_valid(pfn))
+ return 0;
+
+ /* Record the next PFN to read in the file offset */
+ *ppos = (pfn - min_low_pfn) + 1;
+
+ kbuf = kmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ ret = snprintf(kbuf, count, "Page allocated via order %d, mask 0x%x\n",
+ page->order, page->gfp_mask);
+ if (ret >= count) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ /* Print information relevant to grouping pages by mobility */
+ blocktype = get_pageblock_migratetype(page);
+ pagetype = allocflags_to_migratetype(page->gfp_mask);
+ ret += snprintf(kbuf+ret, count-ret,
+ "PFN %lu Block %lu type %d %s "
+ "Flags %s%s%s%s%s%s%s%s%s%s%s%s\n",
+ pfn,
+ pfn >> pageblock_order,
+ blocktype,
+ blocktype != pagetype ? "Fallback" : " ",
+ PageLocked(page) ? "K" : " ",
+ PageError(page) ? "E" : " ",
+ PageReferenced(page) ? "R" : " ",
+ PageUptodate(page) ? "U" : " ",
+ PageDirty(page) ? "D" : " ",
+ PageLRU(page) ? "L" : " ",
+ PageActive(page) ? "A" : " ",
+ PageSlab(page) ? "S" : " ",
+ PageWriteback(page) ? "W" : " ",
+ PageCompound(page) ? "C" : " ",
+ PageSwapCache(page) ? "B" : " ",
+ PageMappedToDisk(page) ? "M" : " ");
+ if (ret >= count) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ num_written = ret;
+
+ for (i = 0; i < 8; i++) {
+ if (!page->trace[i])
+ break;
+ symname = kallsyms_lookup(page->trace[i], &symsize, &offset,
+ &modname, namebuf);
+ ret = snprintf(kbuf + num_written, count - num_written,
+ "[0x%lx] %s+%lu\n",
+ page->trace[i], namebuf, offset);
+ if (ret >= count - num_written) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ num_written += ret;
+ }
+
+ ret = snprintf(kbuf + num_written, count - num_written, "\n");
+ if (ret >= count - num_written) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ num_written += ret;
+ ret = num_written;
+
+ if (copy_to_user(buf, kbuf, ret))
+ ret = -EFAULT;
+out:
+ kfree(kbuf);
+ return ret;
+}
+
+static struct file_operations proc_page_owner_operations = {
+ .read = read_page_owner,
+};
+#endif
+
struct proc_dir_entry *proc_root_kcore;
void __init proc_misc_init(void)
@@ -932,4 +1066,15 @@ void __init proc_misc_init(void)
#ifdef CONFIG_PROC_VMCORE
proc_vmcore = proc_create("vmcore", S_IRUSR, NULL, &proc_vmcore_operations);
#endif
+#ifdef CONFIG_PAGE_OWNER
+ {
+ struct proc_dir_entry *entry;
+ entry = create_proc_entry("page_owner",
+ S_IWUSR | S_IRUGO, NULL);
+ if (entry) {
+ entry->proc_fops = &proc_page_owner_operations;
+ entry->size = 1024;
+ }
+ }
+#endif
}
diff -puN include/linux/mm_types.h~page-owner-tracking-leak-detector include/linux/mm_types.h
--- a/include/linux/mm_types.h~page-owner-tracking-leak-detector
+++ a/include/linux/mm_types.h
@@ -101,6 +101,11 @@ struct page {
#ifdef CONFIG_KMEMCHECK
void *shadow;
#endif
+#ifdef CONFIG_PAGE_OWNER
+ int order;
+ unsigned int gfp_mask;
+ unsigned long trace[8];
+#endif
};
/*
diff -puN lib/Kconfig.debug~page-owner-tracking-leak-detector lib/Kconfig.debug
--- a/lib/Kconfig.debug~page-owner-tracking-leak-detector
+++ a/lib/Kconfig.debug
@@ -66,6 +66,16 @@ config UNUSED_SYMBOLS
you really need it, and what the merge plan to the mainline kernel for
your module is.
+config PAGE_OWNER
+ bool "Track page owner"
+ depends on DEBUG_KERNEL
+ help
+ This keeps track of what call chain is the owner of a page, may
+ help to find bare alloc_page(s) leaks. Eats a fair amount of memory.
+ See Documentation/page_owner.c for user-space helper.
+
+ If unsure, say N.
+
config DEBUG_FS
bool "Debug Filesystem"
depends on SYSFS
diff -puN mm/page_alloc.c~page-owner-tracking-leak-detector mm/page_alloc.c
--- a/mm/page_alloc.c~page-owner-tracking-leak-detector
+++ a/mm/page_alloc.c
@@ -316,6 +316,9 @@ static inline void set_page_order(struct
{
set_page_private(page, order);
__SetPageBuddy(page);
+#ifdef CONFIG_PAGE_OWNER
+ page->order = -1;
+#endif
}
static inline void rmv_page_order(struct page *page)
@@ -1434,6 +1437,62 @@ try_next_zone:
return page;
}
+#ifdef CONFIG_PAGE_OWNER
+static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
+{
+ return p > (void *)tinfo &&
+ p < (void *)tinfo + THREAD_SIZE - 3;
+}
+
+static inline void __stack_trace(struct page *page, unsigned long *stack,
+ unsigned long bp)
+{
+ int i = 0;
+ unsigned long addr;
+ struct thread_info *tinfo = (struct thread_info *)
+ ((unsigned long)stack & (~(THREAD_SIZE - 1)));
+
+ memset(page->trace, 0, sizeof(long) * 8);
+
+#ifdef CONFIG_FRAME_POINTER
+ if (bp) {
+ while (valid_stack_ptr(tinfo, (void *)bp)) {
+ addr = *(unsigned long *)(bp + sizeof(long));
+ page->trace[i] = addr;
+ if (++i >= 8)
+ break;
+ bp = *(unsigned long *)bp;
+ }
+ return;
+ }
+#endif /* CONFIG_FRAME_POINTER */
+ while (valid_stack_ptr(tinfo, stack)) {
+ addr = *stack++;
+ if (__kernel_text_address(addr)) {
+ page->trace[i] = addr;
+ if (++i >= 8)
+ break;
+ }
+ }
+}
+
+static void set_page_owner(struct page *page, unsigned int order,
+ unsigned int gfp_mask)
+{
+ unsigned long address;
+ unsigned long bp = 0;
+#ifdef CONFIG_X86_64
+ asm ("movq %%rbp, %0" : "=r" (bp) : );
+#endif
+#ifdef CONFIG_X86_32
+ asm ("movl %%ebp, %0" : "=r" (bp) : );
+#endif
+ page->order = (int) order;
+ page->gfp_mask = gfp_mask;
+ __stack_trace(page, &address, bp);
+}
+#endif /* CONFIG_PAGE_OWNER */
+
/*
* This is the 'heart' of the zoned buddy allocator.
*/
@@ -1638,6 +1697,10 @@ nopage:
show_mem();
}
got_pg:
+#ifdef CONFIG_PAGE_OWNER
+ if (page)
+ set_page_owner(page, order, gfp_mask);
+#endif
return page;
}
EXPORT_SYMBOL(__alloc_pages_internal);
@@ -2635,6 +2698,9 @@ void __meminit memmap_init_zone(unsigned
if (!is_highmem_idx(zone))
set_page_address(page, __va(pfn << PAGE_SHIFT));
#endif
+#ifdef CONFIG_PAGE_OWNER
+ page->order = -1;
+#endif
}
}
diff -puN mm/vmstat.c~page-owner-tracking-leak-detector mm/vmstat.c
--- a/mm/vmstat.c~page-owner-tracking-leak-detector
+++ a/mm/vmstat.c
@@ -15,6 +15,7 @@
#include <linux/cpu.h>
#include <linux/vmstat.h>
#include <linux/sched.h>
+#include "internal.h"
#ifdef CONFIG_VM_EVENT_COUNTERS
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -560,6 +561,97 @@ static int pagetypeinfo_showblockcount(s
return 0;
}
+#ifdef CONFIG_PAGE_OWNER
+static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
+ pg_data_t *pgdat,
+ struct zone *zone)
+{
+ int mtype, pagetype;
+ unsigned long pfn;
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = start_pfn + zone->spanned_pages;
+ unsigned long count[MIGRATE_TYPES] = { 0, };
+
+ /* Align PFNs to pageblock_nr_pages boundary */
+ pfn = start_pfn & ~(pageblock_nr_pages-1);
+
+ /*
+ * Walk the zone in pageblock_nr_pages steps. If a page block spans
+ * a zone boundary, it will be double counted between zones. This does
+ * not matter as the mixed block count will still be correct
+ */
+ for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
+ struct page *page;
+ unsigned long offset = 0;
+
+ /* Do not read before the zone start, use a valid page */
+ if (pfn < start_pfn)
+ offset = start_pfn - pfn;
+
+ if (!pfn_valid(pfn + offset))
+ continue;
+
+ page = pfn_to_page(pfn + offset);
+ mtype = get_pageblock_migratetype(page);
+
+ /* Check the block for bad migrate types */
+ for (; offset < pageblock_nr_pages; offset++) {
+ /* Do not go past the end of the zone */
+ if (pfn + offset >= end_pfn)
+ break;
+
+ if (!pfn_valid_within(pfn + offset))
+ continue;
+
+ page = pfn_to_page(pfn + offset);
+
+ /* Skip free pages */
+ if (PageBuddy(page)) {
+ offset += (1UL << page_order(page)) - 1UL;
+ continue;
+ }
+ if (page->order < 0)
+ continue;
+
+ pagetype = allocflags_to_migratetype(page->gfp_mask);
+ if (pagetype != mtype) {
+ count[mtype]++;
+ break;
+ }
+
+ /* Move to end of this allocation */
+ offset += (1 << page->order) - 1;
+ }
+ }
+
+ /* Print counts */
+ seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12lu ", count[mtype]);
+ seq_putc(m, '\n');
+}
+#endif /* CONFIG_PAGE_OWNER */
+
+/*
+ * Print out the number of pageblocks for each migratetype that contain pages
+ * of other types. This gives an indication of how well fallbacks are being
+ * contained by rmqueue_fallback(). It requires information from PAGE_OWNER
+ * to determine what is going on
+ */
+static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
+{
+#ifdef CONFIG_PAGE_OWNER
+ int mtype;
+
+ seq_printf(m, "\n%-23s", "Number of mixed blocks ");
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12s ", migratetype_names[mtype]);
+ seq_putc(m, '\n');
+
+ walk_zones_in_node(m, pgdat, pagetypeinfo_showmixedcount_print);
+#endif /* CONFIG_PAGE_OWNER */
+}
+
/*
* This prints out statistics in relation to grouping pages by mobility.
* It is expensive to collect so do not constantly read the file.
@@ -577,6 +669,7 @@ static int pagetypeinfo_show(struct seq_
seq_putc(m, '\n');
pagetypeinfo_showfree(m, pgdat);
pagetypeinfo_showblockcount(m, pgdat);
+ pagetypeinfo_showmixedcount(m, pgdat);
return 0;
}
_
* Re: oom-killer killing even if memory is available?
2009-03-20 21:02 ` Andrew Morton
@ 2009-03-23 11:55 ` Mel Gorman
2009-03-23 14:58 ` Mel Gorman
1 sibling, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2009-03-23 11:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Heiko Carstens, linux-mm, Nick Piggin, Martin Schwidefsky,
Andreas Krebbel
On Fri, Mar 20, 2009 at 02:02:55PM -0700, Andrew Morton wrote:
> On Fri, 20 Mar 2009 15:27:00 +0000 Mel Gorman <mel@csn.ul.ie> wrote:
>
> > >
> > > Something must have allocated (and possibly leaked) it.
> > >
> >
> > This looks like a memory leak all right. There used to be a patch that
> > recorded a stack trace for every page allocation but it was dropped from
> > -mm ages ago because of a merge conflict. I didn't revive it at the time
> > because it wasn't of immediate concern.
> >
> > Should I revive the patch or do we have preferred ways of tracking down
> > memory leaks these days?
>
> We know that a dentry is getting leaked but afaik we don't know which one
> or why.
>
> We could get more info via the page-owner-tracking-leak-detector.patch
> approach, or by dumping the info in the cached dentries - I think Wu
> Fengguang prepared a patch which does that.
>
Looks like it
> I'm not sure why I dropped page-owner-tracking-leak-detector.patch actually
> - it was pretty useful sometimes and afaik we still haven't merged any tool
> which duplicates it.
>
The note I got at the time was "This patch was dropped because procfs
changes broke it".
> Here's the latest version which I have:
>
That matches what I have. I'll check and see can I figure out what broke
with it.
> From: Alexander Nyberg <alexn@dsv.su.se>
>
> Introduces CONFIG_PAGE_OWNER that keeps track of the call chain under which a
> page was allocated. Includes a user-space helper in
> Documentation/page_owner.c to sort the enormous amount of output that this may
> give (thanks tridge).
>
> <SNIP>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: oom-killer killing even if memory is available?
2009-03-20 21:02 ` Andrew Morton
2009-03-23 11:55 ` Mel Gorman
@ 2009-03-23 14:58 ` Mel Gorman
1 sibling, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2009-03-23 14:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Heiko Carstens, linux-mm, Nick Piggin, Martin Schwidefsky,
Andreas Krebbel
On Fri, Mar 20, 2009 at 02:02:55PM -0700, Andrew Morton wrote:
> On Fri, 20 Mar 2009 15:27:00 +0000 Mel Gorman <mel@csn.ul.ie> wrote:
>
> > >
> > > Something must have allocated (and possibly leaked) it.
> > >
> >
> > This looks like a memory leak all right. There used to be a patch that
> > recorded a stack trace for every page allocation but it was dropped from
> > -mm ages ago because of a merge conflict. I didn't revive it at the time
> > because it wasn't of immediate concern.
> >
> > Should I revive the patch or do we have preferred ways of tracking down
> > memory leaks these days?
>
> We know that a dentry is getting leaked but afaik we don't know which one
> or why.
>
> We could get more info via the page-owner-tracking-leak-detector.patch
> approach, or by dumping the info in the cached dentries - I think Wu
> Fengguang prepared a patch which does that.
>
> I'm not sure why I dropped page-owner-tracking-leak-detector.patch actually
> - it was pretty useful sometimes and afaik we still haven't merged any tool
> which duplicates it.
>
> Here's the latest version which I have:
>
Here is a rebased version. It appears to work as advertised based on a
quick test with qemu, and it builds without CONFIG_PROC_PAGEOWNER.
============
From: Alexander Nyberg <alexn@dsv.su.se>
Subject: [PATCH] Introduces CONFIG_PROC_PAGEOWNER that keeps track of the call chain under which a page was allocated
Introduces CONFIG_PROC_PAGEOWNER that keeps track of the call chain
under which a page was allocated. Includes a user-space helper in
Documentation/page_owner.c to sort the enormous amount of output that this
may give (thanks tridge).
Information available through /proc/page_owner
x86_64 introduces some stack noise in certain call chains so for exact
output use of x86 && CONFIG_FRAME_POINTER is suggested. Tested on x86,
x86 && CONFIG_FRAME_POINTER, x86_64
Output looks like:
4819 times:
Page allocated via order 0, mask 0x50
[0xc012b7b9] find_lock_page+25
[0xc012b8c8] find_or_create_page+152
[0xc0147d74] grow_dev_page+36
[0xc0148164] __find_get_block+84
[0xc0147ebc] __getblk_slow+124
[0xc0148164] __find_get_block+84
[0xc01481e7] __getblk+55
[0xc0185d14] do_readahead+100
We use a custom stack unwinder because using __builtin_return_address([0-7])
causes gcc to generate code that might try to unwind the stack looking for
function return addresses and "fall off" causing early panics if the call
chain is not deep enough. So in that case we could have had a depth of
around 3 functions in all traces (I experimented a bit with this).
From: Dave Hansen <haveblue@us.ibm.com>
make page_owner handle non-contiguous page ranges
From: Alexander Nyberg <alexn@telia.com>
I've cleaned up the __alloc_pages() part to a simple set_page_owner() call.
Signed-off-by: Alexander Nyberg <alexn@dsv.su.se>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
DESC
Update page->order at an appropriate time when tracking PAGE_OWNER
EDESC
From: mel@skynet.ie (Mel Gorman)
PAGE_OWNER tracks free pages by setting page->order to -1. However, it is
set during __free_pages() which is not the only free path as
__pagevec_free() and free_compound_page() do not go through __free_pages().
This leads to a situation where free pages are visible in /proc/page_owner
which is confusing and might be interpreted as a memory leak.
This patch sets page->order to -1 when PageBuddy is set. It also prints a
warning to the kernel log if a free page is found that does not appear free
to PAGE_OWNER. This should be considered a fix to
page-owner-tracking-leak-detector.patch.
This only applies to -mm as PAGE_OWNER is not in mainline.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
Print out PAGE_OWNER statistics in relation to fragmentation avoidance
EDESC
From: Mel Gorman <mel@csn.ul.ie>
When PAGE_OWNER is set, more information is available of relevance to
fragmentation avoidance. A second line is added to /proc/page_owner showing
the PFN, the pageblock number, the mobility type of the page based on its
allocation flags, whether the allocation is improperly placed and the flags.
A sample entry looks like
Page allocated via order 0, mask 0x1280d2
PFN 7355 Block 7 type 3 Fallback Flags LA
[0xc01528c6] __handle_mm_fault+598
[0xc0320427] do_page_fault+279
[0xc031ed9a] error_code+114
This information can be used to identify pages that are improperly placed.
As the format of PAGE_OWNER data is now different, the comment at the top
of Documentation/page_owner.c is updated with new instructions.
As PAGE_OWNER tracks the GFP flags used to allocate the pages,
/proc/pagetypeinfo is enhanced to contain how many mixed blocks exist.
The additional output looks like
Number of mixed blocks Unmovable Reclaimable Movable Reserve
Node 0, zone DMA 0 1 2 1
Node 0, zone Normal 2 11 33 0
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
DESC
Allow PAGE_OWNER to be set on any architecture
EDESC
From: Mel Gorman <mel@csn.ul.ie>
Currently PAGE_OWNER depends on CONFIG_X86. This appears to be due to
pfn_to_page() being called in a manner that is inappropriate for many
memory models and in the presence of memory holes. This patch ensures that
pfn_valid() and pfn_valid_within() are called at the appropriate places and
the offsets correctly updated so that PAGE_OWNER is safe on any architecture.
In situations where CONFIG_HOLES_IN_ZONES is set (IA64 with VIRTUAL_MEM_MAP),
there may be cases where pages allocated within a MAX_ORDER_NR_PAGES block
of pages may not be displayed in /proc/page_owner if the hole is at the
start of the block. Addressing this would be quite complex, perform slowly
and is of no clear benefit.
Once PAGE_OWNER is allowed on all architectures, the statistics for grouping
pages by mobility that declare how many pageblocks contain mixed page types
become optionally available on all arches.
This patch was tested successfully on x86, x86_64, ppc64 and IA64 machines.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix
EDESC
From: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
DESC
allow-page_owner-to-be-set-on-any-architecture-fix fix
EDESC
From: mel@skynet.ie (Mel Gorman)
Page-owner-tracking stores a backtrace of an allocation in the
struct page. How the stack trace is generated depends on whether
CONFIG_FRAME_POINTER is set or not. If CONFIG_FRAME_POINTER is set,
the frame pointer must be read using some inline assembler which is not
available for all architectures.
This patch uses the frame pointer where it is available but has a fallback
where it is not.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
From: Mel Gorman <mel@csn.ul.ie>
Rebase on top of procfs changes
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff --git a/Documentation/page_owner.c b/Documentation/page_owner.c
new file mode 100644
index 0000000..9081bd6
--- /dev/null
+++ b/Documentation/page_owner.c
@@ -0,0 +1,144 @@
+/*
+ * User-space helper to sort the output of /proc/page_owner
+ *
+ * Example use:
+ * cat /proc/page_owner > page_owner_full.txt
+ * grep -v ^PFN page_owner_full.txt > page_owner.txt
+ * ./sort page_owner.txt sorted_page_owner.txt
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+struct block_list {
+ char *txt;
+ int len;
+ int num;
+};
+
+
+static struct block_list *list;
+static int list_size;
+static int max_size;
+
+struct block_list *block_head;
+
+int read_block(char *buf, FILE *fin)
+{
+ int c, ret = 0;
+ int hit = 0;
+ char *curr = buf;
+
+ for (;;) {
+ c = getc(fin); /* int: a plain char cannot represent EOF portably */
+ if (c == EOF) return -1;
+ *curr = c;
+ ret++;
+ if (*curr == '\n' && hit == 1)
+ return ret - 1;
+ else if (*curr == '\n')
+ hit = 1;
+ else
+ hit = 0;
+ curr++;
+ }
+}
+
+static int compare_txt(const void *d1, const void *d2)
+{
+ struct block_list *l1 = (struct block_list *)d1;
+ struct block_list *l2 = (struct block_list *)d2;
+ return strcmp(l1->txt, l2->txt);
+}
+
+static int compare_num(const void *d1, const void *d2)
+{
+ struct block_list *l1 = (struct block_list *)d1;
+ struct block_list *l2 = (struct block_list *)d2;
+ return l2->num - l1->num;
+}
+
+static void add_list(char *buf, int len)
+{
+ if (list_size != 0 &&
+ len == list[list_size-1].len &&
+ memcmp(buf, list[list_size-1].txt, len) == 0) {
+ list[list_size-1].num++;
+ return;
+ }
+ if (list_size == max_size) {
+ printf("max_size too small??\n");
+ exit(1);
+ }
+ list[list_size].txt = malloc(len+1);
+ list[list_size].len = len;
+ list[list_size].num = 1;
+ memcpy(list[list_size].txt, buf, len);
+ list[list_size].txt[len] = 0;
+ list_size++;
+ if (list_size % 1000 == 0) {
+ printf("loaded %d\r", list_size);
+ fflush(stdout);
+ }
+}
+
+int main(int argc, char **argv)
+{
+ FILE *fin, *fout;
+ char buf[1024];
+ int ret, i, count;
+ struct block_list *list2;
+ struct stat st;
+
+ fin = fopen(argv[1], "r");
+ fout = fopen(argv[2], "w");
+ if (!fin || !fout) {
+ printf("Usage: ./program <input> <output>\n");
+ perror("open: ");
+ exit(2);
+ }
+
+ fstat(fileno(fin), &st);
+ max_size = st.st_size / 100; /* hack ... */
+
+ list = malloc(max_size * sizeof(*list));
+
+ for(;;) {
+ ret = read_block(buf, fin);
+ if (ret < 0)
+ break;
+
+ buf[ret] = '\0';
+ add_list(buf, ret);
+ }
+
+ printf("loaded %d\n", list_size);
+
+ printf("sorting ....\n");
+
+ qsort(list, list_size, sizeof(list[0]), compare_txt);
+
+ list2 = malloc(sizeof(*list) * list_size);
+
+ printf("culling\n");
+
+ for (i=count=0;i<list_size;i++) {
+ if (count == 0 ||
+ strcmp(list2[count-1].txt, list[i].txt) != 0)
+ list2[count++] = list[i];
+ else
+ list2[count-1].num += list[i].num;
+ }
+
+ qsort(list2, count, sizeof(list[0]), compare_num);
+
+ for (i=0;i<count;i++) {
+ fprintf(fout, "%d times:\n%s\n", list2[i].num, list2[i].txt);
+ }
+ return 0;
+}
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 63d9651..7bcb474 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -22,6 +22,7 @@ proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o
proc-$(CONFIG_PROC_VMCORE) += vmcore.o
+proc-$(CONFIG_PROC_PAGEOWNER) += pageowner.o
proc-$(CONFIG_PROC_DEVICETREE) += proc_devtree.o
proc-$(CONFIG_PRINTK) += kmsg.o
proc-$(CONFIG_PROC_PAGE_MONITOR) += page.o
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d84feb7..08cd32c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -94,6 +94,12 @@ struct page {
void *virtual; /* Kernel virtual address (NULL if
not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */
+
+#ifdef CONFIG_PROC_PAGEOWNER
+ int order;
+ unsigned int gfp_mask;
+ unsigned long trace[8];
+#endif
};
/*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1bcf9cd..69840d2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -66,6 +66,16 @@ config UNUSED_SYMBOLS
you really need it, and what the merge plan to the mainline kernel for
your module is.
+config PROC_PAGEOWNER
+ bool "Track page owner"
+ depends on DEBUG_KERNEL
+ help
+ This keeps track of what call chain is the owner of a page, may
+ help to find bare alloc_page(s) leaks. Eats a fair amount of memory.
+ See Documentation/page_owner.c for user-space helper.
+
+ If unsure, say N.
+
config DEBUG_FS
bool "Debug Filesystem"
depends on SYSFS
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5c44ed4..fd77809 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -358,6 +358,9 @@ static inline void set_page_order(struct page *page, int order)
{
set_page_private(page, order);
__SetPageBuddy(page);
+#ifdef CONFIG_PROC_PAGEOWNER
+ page->order = -1;
+#endif
}
static inline void rmv_page_order(struct page *page)
@@ -1460,6 +1463,62 @@ try_next_zone:
return page;
}
+#ifdef CONFIG_PROC_PAGEOWNER
+static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
+{
+ return p > (void *)tinfo &&
+ p < (void *)tinfo + THREAD_SIZE - 3;
+}
+
+static inline void __stack_trace(struct page *page, unsigned long *stack,
+ unsigned long bp)
+{
+ int i = 0;
+ unsigned long addr;
+ struct thread_info *tinfo = (struct thread_info *)
+ ((unsigned long)stack & (~(THREAD_SIZE - 1)));
+
+ memset(page->trace, 0, sizeof(long) * 8);
+
+#ifdef CONFIG_FRAME_POINTER
+ if (bp) {
+ while (valid_stack_ptr(tinfo, (void *)bp)) {
+ addr = *(unsigned long *)(bp + sizeof(long));
+ page->trace[i] = addr;
+ if (++i >= 8)
+ break;
+ bp = *(unsigned long *)bp;
+ }
+ return;
+ }
+#endif /* CONFIG_FRAME_POINTER */
+ while (valid_stack_ptr(tinfo, stack)) {
+ addr = *stack++;
+ if (__kernel_text_address(addr)) {
+ page->trace[i] = addr;
+ if (++i >= 8)
+ break;
+ }
+ }
+}
+
+static void set_page_owner(struct page *page, unsigned int order,
+ unsigned int gfp_mask)
+{
+ unsigned long address;
+ unsigned long bp = 0;
+#ifdef CONFIG_X86_64
+ asm ("movq %%rbp, %0" : "=r" (bp) : );
+#endif
+#ifdef CONFIG_X86_32
+ asm ("movl %%ebp, %0" : "=r" (bp) : );
+#endif
+ page->order = (int) order;
+ page->gfp_mask = gfp_mask;
+ __stack_trace(page, &address, bp);
+}
+#endif /* CONFIG_PROC_PAGEOWNER */
+
/*
* This is the 'heart' of the zoned buddy allocator.
*/
@@ -1668,6 +1727,10 @@ nopage:
show_mem();
}
got_pg:
+#ifdef CONFIG_PROC_PAGEOWNER
+ if (page)
+ set_page_owner(page, order, gfp_mask);
+#endif
return page;
}
EXPORT_SYMBOL(__alloc_pages_internal);
@@ -2668,6 +2731,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (!is_highmem_idx(zone))
set_page_address(page, __va(pfn << PAGE_SHIFT));
#endif
+#ifdef CONFIG_PROC_PAGEOWNER
+ page->order = -1;
+#endif
}
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 9114974..af12bc6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -15,6 +15,7 @@
#include <linux/cpu.h>
#include <linux/vmstat.h>
#include <linux/sched.h>
+#include "internal.h"
#ifdef CONFIG_VM_EVENT_COUNTERS
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -560,6 +561,97 @@ static int pagetypeinfo_showblockcount(struct seq_file *m, void *arg)
return 0;
}
+#ifdef CONFIG_PROC_PAGEOWNER
+static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
+ pg_data_t *pgdat,
+ struct zone *zone)
+{
+ int mtype, pagetype;
+ unsigned long pfn;
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = start_pfn + zone->spanned_pages;
+ unsigned long count[MIGRATE_TYPES] = { 0, };
+
+ /* Align PFNs to pageblock_nr_pages boundary */
+ pfn = start_pfn & ~(pageblock_nr_pages-1);
+
+ /*
+ * Walk the zone in pageblock_nr_pages steps. If a page block spans
+ * a zone boundary, it will be double counted between zones. This does
+ * not matter as the mixed block count will still be correct
+ */
+ for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
+ struct page *page;
+ unsigned long offset = 0;
+
+ /* Do not read before the zone start, use a valid page */
+ if (pfn < start_pfn)
+ offset = start_pfn - pfn;
+
+ if (!pfn_valid(pfn + offset))
+ continue;
+
+ page = pfn_to_page(pfn + offset);
+ mtype = get_pageblock_migratetype(page);
+
+ /* Check the block for bad migrate types */
+ for (; offset < pageblock_nr_pages; offset++) {
+ /* Do not go past the end of the zone */
+ if (pfn + offset >= end_pfn)
+ break;
+
+ if (!pfn_valid_within(pfn + offset))
+ continue;
+
+ page = pfn_to_page(pfn + offset);
+
+ /* Skip free pages */
+ if (PageBuddy(page)) {
+ offset += (1UL << page_order(page)) - 1UL;
+ continue;
+ }
+ if (page->order < 0)
+ continue;
+
+ pagetype = allocflags_to_migratetype(page->gfp_mask);
+ if (pagetype != mtype) {
+ count[mtype]++;
+ break;
+ }
+
+ /* Move to end of this allocation */
+ offset += (1 << page->order) - 1;
+ }
+ }
+
+ /* Print counts */
+ seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12lu ", count[mtype]);
+ seq_putc(m, '\n');
+}
+#endif /* CONFIG_PROC_PAGEOWNER */
+
+/*
+ * Print out the number of pageblocks for each migratetype that contain pages
+ * of other types. This gives an indication of how well fallbacks are being
+ * contained by rmqueue_fallback(). It requires information from PAGE_OWNER
+ * to determine what is going on
+ */
+static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
+{
+#ifdef CONFIG_PROC_PAGEOWNER
+ int mtype;
+
+ seq_printf(m, "\n%-23s", "Number of mixed blocks ");
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ seq_printf(m, "%12s ", migratetype_names[mtype]);
+ seq_putc(m, '\n');
+
+ walk_zones_in_node(m, pgdat, pagetypeinfo_showmixedcount_print);
+#endif /* CONFIG_PROC_PAGEOWNER */
+}
+
/*
* This prints out statistics in relation to grouping pages by mobility.
* It is expensive to collect so do not constantly read the file.
@@ -577,6 +669,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
seq_putc(m, '\n');
pagetypeinfo_showfree(m, pgdat);
pagetypeinfo_showblockcount(m, pgdat);
+ pagetypeinfo_showmixedcount(m, pgdat);
return 0;
}
Thread overview: 13+ messages
2009-03-17 9:00 oom-killer killing even if memory is available? Heiko Carstens
2009-03-17 9:46 ` Andrew Morton
2009-03-17 10:17 ` Heiko Carstens
2009-03-17 10:28 ` Heiko Carstens
2009-03-17 10:49 ` Nick Piggin
2009-03-17 11:39 ` Heiko Carstens
2009-03-20 5:08 ` Wu Fengguang
2009-03-20 15:27 ` Mel Gorman
2009-03-20 21:02 ` Andrew Morton
2009-03-23 11:55 ` Mel Gorman
2009-03-23 14:58 ` Mel Gorman
2009-03-17 9:51 ` Nick Piggin
2009-03-17 10:11 ` Heiko Carstens